Eindhoven University of Technology

MASTER

Process mining in intensive care unit data

Lybeshari, E.

Award date: 2012



EINDHOVEN UNIVERSITY OF TECHNOLOGY

Department of Mathematics and Computer Science

Master Thesis

Process mining in Intensive Care Unit Data

By Edlira Lybeshari

In partial fulfillment of the requirements for the degree of Master of Science in Business Information Systems

Supervisors:
dr.ir. I.T.P. Vanderfeesten (Irene), TU/e, IE&IS
prof.dr.ir. U. Kaymak (Uzay), TU/e, IE&IS
dr.ir. R.S. Mans (Ronny), TU/e, M&CS

Eindhoven, August 2012


Acknowledgements

It is a pleasure to thank the people who made this thesis possible.

This work would not have been possible without the great support of my supervisor Irene Vanderfeesten, under whose guidance I chose this topic. She has been abundantly helpful and has assisted me in numerous ways. I especially thank her for the frequent, valuable feedback and for believing in me from the first day of this project.

I would like to say a big thank you to Ronny Mans, who assisted me during the whole process of this thesis project. His advice and feedback were very important for the research done in this thesis.

I am grateful to professor Uzay Kaymak. Thank you for helping me in the most crucial phase of this thesis, for your feedback and for being part of the assessment committee.

I would also like to thank André Fialho who, even though he was not officially involved in this thesis project, was always available to answer my questions.

My next thanks go to Leo Celi (doctor at BIDMC hospital), Erik Korsten (doctor at Catharina hospital), Walther van Mook (doctor at Academic hospital of Maastricht) and Dennis Bergmans (doctor at Academic hospital of Maastricht), who supported me in the early phases of this thesis.

My final words go to my family. I want to thank my family, whose love and guidance are with me in whatever I pursue. Thank you mom and dad for your unconditional support throughout my studies. Thank you Lenci for being extremely patient with me and my strange attitude during this thesis project. The most special thank you is for my little baby boy, Princ, who made this thesis project an unforgettable journey. I feel lucky to have you all in my life.

Edlira Lybeshari

Eindhoven,

August 2012


Abstract

Intensive care units (ICUs) of hospitals provide health care services to critically ill patients. The number of such patients is continuously increasing and the services provided to them by ICUs take up a significant part of hospitals’ budgets. Improving the quality of ICU services is therefore important, but also challenging.

In this thesis, we use process mining to investigate ICU healthcare services by mining the data of the clinical processes (such as treatment, diagnosis and monitoring processes) related to these services. More specifically, we focus on applying process mining to check whether medical guidelines are followed correctly by ICU staff, thereby contributing to the improvement of ICU clinical processes and, as a result, to better health care in ICUs.

To reach this goal we have followed a research approach that moves from specific observations to broader generalizations: we first select a specific ICU database and some ICU clinical processes and then try to mine the data of these processes from the selected database. This attempt can succeed (when we obtain mining results) or it can fail due to a lack of the required process data (when we cannot do process mining). In the latter case, we suggest solutions on how to store process data and how to collect them so that they are suitable for process mining.

The results of applying our approach to the selected ICU database (the MIMICII database) and the selected ICU processes (the ‘Daily checklist’ and ‘ST‐elevation treatment’ processes) showed that the data of these processes stored in the MIMICII database were not suitable for process mining. Consequently, we have not obtained mining results, but we have designed a process oriented database schema for MIMICII and further generalized it to an abstract process oriented database schema that can be suitable for different healthcare clinical databases (including ICU databases). These schemas make clear what basic information is needed for process mining and how it should be stored in a database. In addition, we have provided procedural solutions on how to collect these process data.

Finally, in this thesis we found that even when a healthcare database is large and contains varied data, there is no guarantee that it contains the right data needed for process mining.

Keywords: intensive care, ICU, process mining, database


Table of Contents

Acknowledgements
Abstract
Table of Contents
Tables of Figures
Tables of Tables
1 Introduction
1.1 Context
1.2 Motivation
1.2.1 Why process mining?
1.2.2 Why process mining in healthcare and ICU?
1.3 Research question
1.4 Outline
2 Background
2.1 Description of intensive care units
2.2 Processes and Process mining
2.2.1 What is a business process?
2.2.2 Process mining basics
2.2.3 What data do we need to do process mining?
2.2.4 Process mining tools
2.3 Related work
2.3.1 Applications of process mining in healthcare
2.3.2 Mining ICU data
2.4 Structured Query Language (SQL)
3 Methodology
3.1 Select an ICU database
3.2 Select ICU process(es)
3.3 Look for process(es) data
3.4 Assess process(es) data
3.5 Apply process mining techniques
3.6 Identify potential issues
3.7 Design solution
4 Process mining the ICU data
4.1 Select an ICU database
4.1.1 Why MIMICII database?
4.1.2 MIMICII content
4.1.3 MIMICII clinical database structure
4.2 Select ICU process(es)
4.2.1 The checklist
4.2.2 ST‐elevation treatment pathway
4.2.3 Antibiotic administration in ICU
4.3 Look for process(es) data
4.3.1 Checklist
4.3.2 ST‐elevation
4.4 Assess process(es) data
4.5 Issues and challenges
4.5.1 No relationship between the process and its data in the database
4.5.2 Matching process guidelines with the process information found in the database
4.5.3 Unstructured process data in the database
4.5.4 Missing process data in the database
4.6 Conclusions
5 Design
5.1 Solutions
5.1.1 A new process oriented version of MIMICII
5.1.2 A process oriented ICU database abstract schema
5.1.3 Practical implications
5.2 Discussion
5.2.1 Solution of our faced issues
5.2.2 Feasibility of the proposed solutions
6 Conclusions and future work
6.1 Conclusions
6.2 Limitations and future work
Bibliography
Appendix A‐ MIMICII tables and their relationships
Appendix B ‐ How can we create a log file with the process data of the new MIMICII schema?
Appendix C ‐ Can we use the diagnosis to link events with processes?
Appendix D


Tables of Figures

FIGURE 1‐ PROCESS MINING OVERVIEW [30]
FIGURE 2‐ EXAMPLE OF AN EVENT LOG FOR A CERTAIN PROCESS WITH MINIMUM DATA
FIGURE 3‐ EXAMPLE OF AN EVENT LOG FOR A CERTAIN PROCESS
FIGURE 4‐ THE PHASES OF THE APPROACH
FIGURE 5‐ A SIMPLIFIED MODEL OF MIMICII THAT CONTAINS TABLES RELEVANT FOR THIS PROJECT AND THEIR RELATIONSHIP WITH THE D_PATIENTS TABLE. THIS MODEL IS BUILT BASED ON THE INFORMATION IN [7].
FIGURE 6‐ BIDMC DAILY CHECKLIST
FIGURE 7‐ EMERGENCY MANAGEMENT OF COMPLICATED ST‐ELEVATION MYOCARDIAL INFARCTION. THE EMERGENCY MANAGEMENT OF PATIENTS WITH CARDIOGENIC SHOCK, ACUTE PULMONARY EDEMA, OR BOTH IS OUTLINED. SBP INDICATES SYSTOLIC BLOOD PRESSURE; IV, INTRAVENOUS; BP, BLOOD PRESSURE; ACE, ANGIOTENSIN CONVERTING ENZYME; MI, MYOCARDIAL INFARCTION. *FUROSEMIDE LESS THAN 0.5 MG/KG FOR NEW‐ONSET ACUTE PULMONARY EDEMA WITHOUT HYPOVOLEMIA; 1 MG/KG FOR ACUTE OR CHRONIC VOLUME OVERLOAD, RENAL INSUFFICIENCY. [3]
FIGURE 8‐ NEW SCHEMA OF MIMICII‐ PART 1
FIGURE 9‐ NEW SCHEMA OF MIMICII‐ PART 2
FIGURE 10‐ TABLE PROCESSITEMS
FIGURE 11‐ TABLE PROCESS_OCCURRENCE
FIGURE 12‐ TABLE EVENTITEMS
FIGURE 13‐ PROCESS ORIENTED HEALTHCARE DATABASE SCHEMA
FIGURE 14‐ SPECIALIZATION OF TABLE 'EVENT_OCCURRENCE'
FIGURE 15‐ EXAMPLE OF SPECIALIZED TABLES FOR TABLE 'EVENT_OCCURRENCE'
FIGURE 16‐ MAJOR MIMICII CLINICAL DATABASE COMPONENTS AND THEIR RELATIONSHIPS (REF/ MIMICII GUIDE2)
FIGURE 17‐ PATIENT TO ICD‐9 AND DIAGNOSIS‐RELATED GROUP CODE
FIGURE 18‐ CAREGIVER TABLE AND ITS RELATIONSHIPS
FIGURE 19‐ CAREUNITS TABLE AND ITS RELATIONSHIPS
FIGURE 20‐ PATIENT MEDICATION TABLES AND THEIR RELATIONSHIPS
FIGURE 21‐ STRUCTURE OF TABLE CHARTEVENTS
FIGURE 22‐ STRUCTURE OF TABLE D_CHARTITEMS
FIGURE 23‐ STRUCTURE OF TABLE ICUSTAYEVENTS
FIGURE 24‐ STRUCTURE OF TABLE CENSUSEVENTS
FIGURE 25‐ STRUCTURE OF TABLE NOTEEVENTS
FIGURE 26‐ STRUCTURE OF TABLE DEMOGRAPHICSEVENTS
FIGURE 27 ‐ RICHMOND AGITATION SEDATION SCALE
FIGURE 28‐ PROCEDURE OF RASS ASSESSMENT


Tables of Tables

TABLE 1‐ EXPLANATION OF CHECKLIST ITEMS
TABLE 2‐ PROCESSITEMS TABLE WITH TWO RECORDS (THE FIRST ONE FOR THE CHECKLIST PROCESS AND THE SECOND ONE FOR THE STEMI TREATMENT PROCESS)
TABLE 3‐ TABLE PROCESS_OCCURRENCE WITH FOUR SAMPLE RECORDS
TABLE 4‐ EVENTITEMS TABLE WITH TWO SAMPLE RECORDS
TABLE 5‐ ONE RECORD SAVED IN TABLE CHARTEVENTS IN THE NEW VERSION OF MIMICII
TABLE 6‐ LIST OF CHART ITEMS RELATED TO SKIN IMPAIRMENTS CHECK ITEM OF THE CHECKLIST OBTAINED IN STEP 2
TABLE 7‐ REDUCED LIST OF CHART ITEMS RELATED TO SKIN IMPAIRMENTS CHECK ITEM OF THE CHECKLIST
TABLE 8‐ EXAMPLES OF PROCESS MINING PRODUCTS: COMMERCIAL TOOLS C, ACADEMIC TOOLS A AND OPEN‐SOURCE TOOLS O. [1:271]


1 Introduction

This chapter introduces the context, the motivation and the research question for this thesis project, respectively in sections 1.1, 1.2 and 1.3. Finally in section 1.4, it shows the outline of this thesis report.

1.1 Context

Today, Business Process Management (BPM) helps many organizations in the public and private sectors by attempting to improve processes continuously. As a result of this process improvement, BPM can deliver significant benefits such as improved process quality and customer satisfaction, reduced costs etc.

One way to analyze and improve business processes is by using process mining. Process mining mines the process related data logged by information systems in order to discover process knowledge and provide more insight into processes. It can discover processes based on the logged data, which makes it possible to understand how processes are executed in reality and consequently to check whether they conform to the corresponding guidelines. Process mining can also help in detecting bottlenecks and points for improvement in a process. Process mining has been used in different domains, among others in the healthcare domain [26,29,30,34,38].

The healthcare industry diagnoses, treats and administers care to millions of people, from newborns to the terminally ill. The care provided to critically ill patients is of particular importance for hospitals because these patients need constant, close monitoring and support from equipment and medication, which are usually expensive [27], in order to maintain normal bodily functions. Furthermore, the number of patients in Intensive Care Units (ICUs) is increasing [45]. Therefore, it is important to avoid complications and death in ICU patients, both to increase the service quality and to reduce costs in these hospital units.

According to Garland [15], all efforts to improve ICU performance require changing its structures and processes. Curtis et al. [8] also emphasize the importance of quality improvement in intensive care; one step in their guide is to do an environmental scan to understand the current situation (structure, process, or outcome) and the potential barriers, opportunities, and resources for the project.

Strategies to improve performance in health‐care systems include those that aim at increasing or improving the use of specific evidence‐based best practices, among others clinical practice guidelines [15]. ICU care is usually ad hoc (driven by complications), because ICU shift leaders make a vast number of ad hoc decisions concerning the entire ICU care process [28], but there are also parts of ICU care that are regulated by guidelines and protocols (e.g. treatment processes for certain diagnoses).

So, checking whether these treatment guidelines are followed in ICUs can help in understanding and assessing the current use of the advised treatment processes (the ones given in the guidelines), thereby contributing to quality improvement in the ICU.


Process mining can be used to check whether ICU treatment guidelines are followed by the ICU staff, and to do so it needs logged process data. These data are usually logged by an ICU clinical information system and they should contain information about the activities performed on ICU patients as part of their treatment. So, it is important to have the right type of logged data in order to use process mining for this purpose.

In summary, improving ICU performance requires assessing and improving the ICU systems and processes [27]. With process mining techniques, it becomes possible to analyze the actual run‐time behavior of ICU treatment processes and compare the mining results with the corresponding guidelines, thereby contributing to the assessment and improvement of such processes. However, process mining needs the right types of logged data about these processes, and these data may or may not be present in ICU databases.

1.2 Motivation

This section explains why we want to research process mining (subsection 1.2.1) and why we apply it to the healthcare domain, and more specifically to ICU processes (subsection 1.2.2).

1.2.1 Why process mining?

The benefits of process mining are the main reasons for our choice to use it in this project.

Firstly, process mining helps in getting valuable insights into an organization’s processes. It gives an accurate, quantitative picture of what the organization has been doing. Understanding the business in this way can help answer the questions “How well are we doing?” and “Can we do better?”.

Also, process mining can be used to discover how processes are executed in general (‘the general way of working’) and to monitor reality and check whether processes are being executed as expected (i.e. in compliance with certain guidelines or standards). So, it is a good approach to see whether there are deviations in process executions, and it can show the best practices in running those processes.

Considering the above benefits of process mining, we decided to investigate whether it can help in solving the issues encountered in ICUs (mentioned in section 1.1), such as high costs, an increasing number of patients, the need for improving service quality etc. This is discussed in the next subsection.

1.2.2 Why process mining in healthcare and ICU?

In this subsection, we explain why we focus our research in the healthcare domain and more specifically in the intensive care units. Here, we also show how process mining can help in this specific area.

Firstly, healthcare is important because everyone needs it at several points in their lifetime. It gives us the ability to live longer and healthier. So, trying to improve patient care processes and workflow usability by process mining can have a big positive impact on the health of society, because process mining can suggest ways to enhance the effectiveness and efficiency of healthcare processes.

Having chosen the healthcare domain for this project, we decided to narrow our scope to a specific type of healthcare, because the healthcare domain is quite varied and considering all types would take much more time than we have for this project. So, we chose to focus our research on the intensive care processes provided in the intensive care units of a hospital.

The motivation behind our choice to apply process mining to ICU process data is related firstly to the type of care ICUs offer. ICUs provide care to patients with the most serious injuries and illnesses, and the number of patients that require intensive care is increasing [45]. Improving the service offered by ICUs is therefore of great benefit to the community.

Also, process mining can detect deviations from intended process models (such as the models described in medical guidelines and protocols), which is relevant to minimizing medical errors and maximizing patient safety. An ICU has protocols and guidelines that its staff is supposed to use, but it is not always clear whether the staff follow these protocols and, if so, whether they follow them correctly. Medical errors are common and cause morbidity and mortality in critically ill patients [16]. In a study of surgical ICUs, the types of events reported were related to medications, tests, treatments, or procedures [41]. So, it is important to prevent such errors, and Valentin and Bion have shown in [43] that many improvements in patient safety in ICUs could be implemented through changes in clinical behavior and within existing resources, for instance by improving the reliability and standardization of processes of care, reducing unnecessary variation and complexity, and encouraging team working.

From the financial point of view it is also useful to apply process mining to ICU processes, because ICUs spend a big part of the hospital budget [27] and process mining can help reduce these costs by optimizing ICU processes.

Finally, to our knowledge, only little research has been done in the process mining field with ICU data [19]. So, we think that our project can start contributing to this area by using process mining on ICU process data to investigate whether medical guidelines are followed by ICU staff.

1.3 Research question

In this section the goal and the main research question of the thesis are presented.

We have assumed that process mining can help improve ICU processes, as explained earlier in this chapter, so the problem now is a matter of applying it in this area. Considering what process mining is and how this technique works, in general all we need for doing process mining is logged data about processes, and in our project logged data about ICU processes.

The goal of this project is to apply process mining to ICU data in order to check whether specific medical guidelines are followed correctly in the ICU. Considering this goal, our research question is:

1. Can process mining check whether medical guidelines are followed by ICU staff, based on the logged ICU process data?

Based on this research question we have the following sub‐questions:


1.1. Which ICU database should we use to get ICU process data? The time reserved for this thesis project makes it very difficult to look at several ICU databases; as a result, we should select one ICU database and use its data for the rest of the research.

1.2. Data of which ICU clinical processes should we select? Applying process mining to all the data of the ICU database, without filtering or selecting parts of these data, would result in spaghetti process models [44], which are difficult to understand and analyze. Moreover, separately mining the data of all ICU processes requires much more time than is available for this project. So, it makes sense to select one or a few specific ICU processes and use them further in our research project.

1.3. Where can we find the data of the selected ICU processes in the ICU database? It is important to find the data that correspond to the selected ICU processes in the ICU database, because these data should be the input of process mining.

1.4. To what degree are the data stored in the ICU database about the selected processes suitable for process mining? According to the literature [1:96], it is not always certain that the process data required by process mining can be provided. So, with this question we determine the current situation by deciding whether the process data are sufficient to do process mining. Depending on the result of this research sub‐question, we either decide that we can apply process mining and answer the next research sub‐question (sub‐question 1.5), or we decide that we cannot apply process mining and answer the remaining research sub‐questions (sub‐questions 1.6, 1.7 and 1.8).

1.5. What are the results of applying process mining to the ICU process data with the goal of checking whether guidelines are followed?

1.6. What are the issues that make the ICU process data unsuitable for process mining?

1.7. How should we store the ICUs’ clinical process related information in an ICU database so that we can do process mining with it?

1.8. What procedures should we use in the ICUs to collect and store all the clinical process related data required for doing process mining?

Finally, to achieve the project goal and to answer the above research question and sub‐questions, in this project, we have developed and followed a certain approach that is described in chapter 3.

1.4 Outline

The remainder of the thesis is organized as follows.

Chapter 2 provides all the necessary background for the reader of this thesis: a description of intensive care units of hospitals, an overview of the process mining discipline, related work done in mining healthcare data and a brief overview of basic knowledge of Structured Query Language.

Chapter 3 describes the approach used in this thesis project.

Chapter 4 presents the attempt to apply process mining on concrete ICU data.


Chapter 5 gives solutions (in the form of database design solutions and procedural solutions) to different issues encountered in our attempt to apply process mining in concrete ICU data.

Chapter 6 concludes the report by summarizing the obtained results, the limitations of this project, and proposing ideas for future work.


2 Background

This chapter provides background information on the research area of this project, which is needed to better understand the content of the following chapters. It starts by giving some general information about the intensive care units of a hospital in section 2.1 and continues by explaining what a process is and the basics of process mining in section 2.2. Then, in section 2.3, it shows the related work done in mining healthcare data. Finally, in section 2.4, it gives a brief overview of the basics of Structured Query Language.

2.1 Description of intensive care units

In this section, we briefly give some general information about the Intensive Care Units (ICUs) of a hospital, because our research project is focused on mining the data of these hospital units.

Intensive care units (ICUs), also called critical care departments, look after patients whose conditions are life‐threatening and who need constant, close monitoring and support from equipment and medication to keep normal body functions going. Intensive care units are run and staffed by specialists trained in intensive care.

Patients in an ICU may be experiencing multiple organ failure, respiratory arrest, or other serious problems which require intensive monitoring and complex treatments. Once a patient is admitted to the unit, the intensive care team will manage the care of the patient in consultation with the original team that admitted the patient to the hospital and any other specialists that they think can help to aid the patient's recovery.

Effectiveness and efficiency of care of the critically ill patient are subject to a number of influences, including skills of individual physicians/nurses (technical and non‐technical), team working in the ICU, and the ICU environment [42]. Therefore, by using process mining we hope to get a better understanding of these human and non‐human elements of the care system of the critically ill patient and consequently improve care delivery in ICU. So, in the next section, we introduce the basic concepts of process mining.

2.2 Processes and Process mining

This section gives a general overview of process mining by explaining its basic concepts and showing its applicability in the healthcare domain, in order to give the clear picture that is needed for better understanding the rest of the chapters in this report.

Firstly, it introduces the notion of a business process in subsection 2.2.1, followed by an explanation of what process mining is and what it can do in subsection 2.2.2. Then, subsection 2.2.3 shows what type of data process mining needs. Finally, we give a brief overview of available process mining tools in subsection 2.2.4.


2.2.1 What is a business process?

In this subsection, we give some definitions of a business process so that it is clear what we mean by a business process (also referred to in this report as a process) throughout the report.

There are many definitions of business processes in the literature; below we show some of them.

Davenport (1993) [9] defines a (business) process as:

“a structured, measured set of activities designed to produce a specific output for a particular customer or market. It implies a strong emphasis on how work is done within an organization, in contrast to a product focus’s emphasis on what. A process is thus a specific ordering of work activities across time and space, with a beginning and an end, and clearly defined inputs and outputs: a structure for action. ... Taking a process approach implies adopting the customer’s point of view. Processes are the structure by which an organization does what is necessary to produce value for its customers.”

Hammer & Champy (1993) [20] define a process as:

“a collection of activities that takes one or more kinds of input and creates an output that is of value to the customer.”

Johansson et al. (1993) [22] define a process as:

“a set of linked activities that take an input and transform it to create an output. Ideally, the transformation that occurs in the process should add value to the input and create an output that is more useful and effective to the recipient either upstream or downstream.”

So, based on the definitions above, a business process mainly focuses on how work is done instead of what is done and it consists of related structured activities which are ordered in time (they can be executed sequentially or in parallel and they can be optional or not) and that must add value to the recipient of the process’ outcome (customer).

Examples of business processes could be handling customers’ orders, producing certain products in a factory, handling customers’ complaints in a bank etc.

Healthcare processes are also business processes. For instance, the treatment of patients with a specific diagnosis in a hospital is a business process, because it consists of a set of activities performed in a certain order on the patient in order to treat him/her. The input of the process is the sick patient and the output is the treated patient. Other healthcare processes are diagnostic processes, such as taking an x‐ray of a patient in a hospital, and administrative processes, such as the scheduling of appointments in hospitals.

2.2.2 Process mining basics

In this subsection, we give a general overview of what process mining is and what it can do.


The idea of process mining is to discover, monitor and improve real processes (i.e. not assumed processes) by extracting knowledge from event logs [1:8].

An event log is a file which contains information about events executed in a certain information system. An event represents the execution of an activity or a task at a certain point in time. Usually, these events are related to activities of a specific process and each of them refers to a single instance of that process, which is known as a case.
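To make the notions of event, case and event log concrete, the following is a minimal Python sketch. The record layout (case id, activity, timestamp), the activity names and all values are invented for illustration and do not come from any ICU database; the function `build_traces` is likewise a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: one tuple per event (case id, activity, timestamp).
# All values are invented for illustration.
events = [
    ("patient-1", "Admit to ICU", "2012-01-01 08:00"),
    ("patient-2", "Admit to ICU", "2012-01-01 09:30"),
    ("patient-1", "Order ECG", "2012-01-01 08:15"),
    ("patient-1", "Give medication", "2012-01-01 08:40"),
    ("patient-2", "Order ECG", "2012-01-01 09:45"),
]


def build_traces(event_log):
    """Group events by case and order each case's activities by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        moment = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
        by_case[case_id].append((moment, activity))
    # Sorting the (timestamp, activity) pairs yields the trace of each case.
    return {
        case_id: [activity for _, activity in sorted(case_events)]
        for case_id, case_events in by_case.items()
    }


traces = build_traces(events)
```

Each entry of `traces` is the ordered list of activities of one case, which is the form in which the later sketches in this section consume an event log.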

Figure 1‐ Process mining overview [30]

Figure 1 gives a clear picture of the different elements related to process mining.

The cloud in Figure 1 represents the “world”, i.e. everyone and everything interacting with and being affected by the environment in which the ‘(software) system’ operates, e.g. business processes, people, machines, components, organizations, etc. The goal is to run these processes using optimal resources while still getting the best output from them. Of course, this is not straightforward and requires analysis of these processes. Process models (Figure 1, ‘process model’ component) can help in analyzing processes and gaining more insight into them. Usually, these processes are integrated in the information systems (Figure 1, ‘software system’ component) of the company. These information systems on the one hand support and/or control processes and on the other hand keep recording all the data related to the execution of these processes. Event logs (Figure 1, ‘event logs’ component) can be created from these data, and this is where process mining comes in. The event log can be used to do three different types of process mining (as shown in Figure 1): 1) discovery, 2) conformance and 3) extension.

Discovery is the first type of process mining. It takes an event log as input and produces a process model, as shown in Figure 1. This process model is built using only the data of the event log, without any a‐priori model. This technique can be used, for instance, to find out which activities are executed for a certain process and in which order.
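As a rough illustration of discovery, the sketch below counts how often one activity is directly followed by another across all traces. This directly‐follows relation is only a simplified stand‐in for a discovered process model, not one of the discovery algorithms implemented in process mining tools; the traces and activity names are hypothetical.

```python
from collections import Counter


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical traces, one list of activities per case.
traces = [
    ["Admit", "ECG", "Medication", "Discharge"],
    ["Admit", "ECG", "Discharge"],
    ["Admit", "Medication", "ECG", "Discharge"],
]

dfg = directly_follows(traces)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

Pairs with high counts indicate the usual order of work, while rare pairs hint at exceptional paths.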


Conformance is the second type of process mining. It takes as input an event log and a process model and compares them, as shown in Figure 1. Here, it is checked whether the reality recorded in the event log adheres to the process model and vice versa. This type of process mining can reveal deviations from the model during execution, for instance when certain steps are skipped or different paths are followed.
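A very simple form of conformance checking can be sketched as follows. The model is assumed to be given as a start activity, an end activity and a set of allowed directly‐following pairs; this representation, the guideline and the function `find_deviations` are illustrative assumptions, far simpler than the conformance techniques available in process mining tools.

```python
def find_deviations(trace, allowed, start, end):
    """List the points where a trace departs from a simple model.

    The model is given as a start activity, an end activity and the set of
    allowed (a, b) pairs, meaning that b may directly follow a.
    """
    if not trace:
        return ["empty trace"]
    deviations = []
    if trace[0] != start:
        deviations.append(f"starts with {trace[0]!r}, expected {start!r}")
    for a, b in zip(trace, trace[1:]):
        if (a, b) not in allowed:
            deviations.append(f"{a!r} -> {b!r} is not allowed by the model")
    if trace[-1] != end:
        deviations.append(f"ends with {trace[-1]!r}, expected {end!r}")
    return deviations


# Hypothetical guideline: Admit, then ECG, then Medication, then Discharge.
allowed = {("Admit", "ECG"), ("ECG", "Medication"), ("Medication", "Discharge")}

conforming = find_deviations(
    ["Admit", "ECG", "Medication", "Discharge"], allowed, "Admit", "Discharge"
)
skipped_step = find_deviations(
    ["Admit", "Medication", "Discharge"], allowed, "Admit", "Discharge"
)
```

In this example the second trace skips the ECG, which shows up as a single disallowed step from 'Admit' to 'Medication'.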

Extension is the last type of process mining. Here, we again use an event log and an existing process model as input but, in contrast to conformance checking, extension uses the information in the event log to extend the input model. So, modifying the model to be more compatible with reality or adding new perspectives to it are possible outputs of extension. For example, if two different laboratory tests are modeled to be done sequentially but in reality can happen in any order, then the model can be corrected to reflect this.

In addition to the three types of process mining shown in Figure 1, we can identify different perspectives. According to [1:10–11], these perspectives are:

• The control‐flow perspective focuses on the control‐flow, i.e., the ordering of activities. The goal of mining this perspective is to find a good characterization of all possible paths.
• The organizational perspective focuses on information about resources hidden in the log, i.e., which actors (such as people, systems, roles, and departments) are involved and how are they related. The goal is to either structure the organization by classifying people in terms of roles and organizational units or to show the social network.
• The case perspective focuses on properties of cases. Obviously, a case can be characterized by its path in the process or by the originators working on it. However, cases can also be characterized by the values of the corresponding data elements.
• The time perspective is concerned with the timing and frequency of events. When events bear timestamps it is possible to discover bottlenecks, measure service levels, monitor the utilization of resources, and predict the remaining processing time of running cases.

2.2.3 What data do we need to do process mining?

Here, we show what type of process data is needed to do process mining.

Process mining requires that the process data are available in a database: if there are no data to be mined, or only very few, then we cannot talk about process mining at all. An obvious solution to this issue is to design the information systems (through which the processes are executed) so that they log process related data digitally in a database or in different types of files.

Furthermore, the digitally stored process data should be activity oriented. What type of data we need is highly related to the goal of using process mining, but these activity oriented data should contain at least the minimum data required to do process mining. The minimum data required for process mining are those from which we can build a sequence of events, and they should contain:

1. CaseId. This is the unique identification code of the case. So, once a process instance is executed, it should have a unique identifier in order to distinguish it from other executed process instances.
2. Activity. This is the name of the activity or task executed as part of the process.
3. DateTimestamp. This shows the time when a certain activity of a certain process instance was executed.

Figure 2 below shows an example of the minimum set of data needed to do process mining. It contains three different cases (with IDs 4, 5 and 6), and for each case there is a list of the activities performed and their timestamps. For instance, the cases with IDs 4 and 5 have the same sequence of events: 1) take patient to exam room, 2) check current medication, 3) show/modify allergies and 4) patient examination. Meanwhile, the case with ID 6 has a longer sequence of events with 5 different activities.

Figure 2‐ Example of an event log for a certain process with minimum data
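To make the role of the three minimum fields concrete, the following Python sketch groups event records by CaseId and orders them by DateTimestamp to obtain the sequence of activities per case. The rows and in particular the timestamps are invented for this illustration, and the function name `build_traces` is our own.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows (CaseId, Activity, DateTimestamp); timestamps are invented
# and deliberately not stored in chronological order.
events = [
    (4, "check current medication", "2012-01-05 09:10"),
    (4, "take patient to exam room", "2012-01-05 09:00"),
    (5, "take patient to exam room", "2012-01-05 10:00"),
    (4, "show/modify allergies", "2012-01-05 09:15"),
    (4, "patient examination", "2012-01-05 09:30"),
    (5, "check current medication", "2012-01-05 10:05"),
]


def build_traces(rows):
    """Group events per case and order each group by its timestamp."""
    per_case = defaultdict(list)
    for case_id, activity, stamp in rows:
        moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        per_case[case_id].append((moment, activity))
    # Sorting the (timestamp, activity) pairs yields the sequence of events.
    return {case_id: [activity for _, activity in sorted(pairs)]
            for case_id, pairs in per_case.items()}


traces = build_traces(events)
for case_id, trace in sorted(traces.items()):
    print(case_id, trace)
```

Without any one of the three fields this reconstruction is not possible: the case identifier is needed for grouping, the timestamp for ordering and the activity name for labelling the events.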

As mentioned, this is just the minimum required data. Depending on the questions we want to answer through process mining, additional process related information may be needed, for instance:

• Resource related information. This kind of information can be about persons, roles, rooms, devices etc. that are involved in the process.
• Other data. Here, we can include process information about diagnosis, treatments, resource costs, facilities, patients’ demographics etc.

A sample of an event log with more data than the minimum requirement is shown in Figure 3. This event log contains the cases displayed in Figure 2 but with more data for each case. The additional data specify the facility the process belongs to, the user who performed the activity and the state of the patient (sick or well).

Figure 3‐ Example of an event log for a certain process

2.2.4 Process mining tools

The application of process mining techniques is supported by tools. Nowadays, many tools are available for doing process mining, for example ProM, Reflect|one, ARIS, Disco etc. [1:271]. Table 8 in Appendix D shows a longer list of process mining tools, with their names and some more details about them.

2.3 Related work

In this section, we start by giving an overview of how process mining is applied in the healthcare domain in subsection 2.3.1. Considering that we are focused on process mining of critical care data, we find it relevant to briefly introduce some of the research done on mining ICU data in subsection 2.3.2.

2.3.1 Applications of process mining in healthcare

Considering what process mining is and how it works, it can be used to extract valuable knowledge from the process data of quite a wide range of industries, such as the service industry, production industries, healthcare etc. In this section, we focus on the applicability of process mining techniques in the healthcare domain only and we show some of the work done in this area.

Most of the literature tries to identify the problems and show the applicability of process mining in healthcare. So, firstly we describe the challenges of process mining in healthcare in subsection 2.3.1.1 and then its applicability by mentioning some case studies in subsection 2.3.1.2.

2.3.1.1 Challenges for process mining in healthcare

Healthcare processes can be treatment processes, diagnosis processes or administrative processes (such as scheduling an appointment). It is not trivial to mine these processes because of their characteristics and the fact that they are executed in a continuously changing and complex environment [38]. According to Rebuge and Ferreira [38], healthcare processes have the following characteristics:

• Highly dynamic. Medical knowledge evolves continuously, meaning that new treatments, new drugs and diagnostic procedures are discovered. As a result, the medical processes change as well.
• Highly complex. The basis of healthcare clinical processes is medical decision processes, which use large amounts and various types of data. Moreover, medical decisions and treatment outcomes may be unpredictable because the individual treatment processes of different patients (even with the same illness) are unique. Consequently, related clinical processes, once instantiated, can have unpredictable behavior too.
• Increasingly multi‐disciplinary. Healthcare organizations are characterized by an increasing number of specialized departments and medical disciplines, and healthcare processes are increasingly executed as a set of activities distributed over different departments and performed through the collaborative effort of professionals with different skills, knowledge and organizational culture. So it is difficult to get an integrated view, as data are found in different information systems.
• Ad hoc. As physicians have the power to act according to their knowledge and experience, and need to deviate from defined guidelines to deal with specific patient situations, the result is processes with a high degree of variability, a non‐repetitive character, and an order of execution that is non‐deterministic to a large extent.

2.3.1.2 Process mining in Healthcare ‐ Case studies overview

Despite the challenges mentioned in subsection 2.3.1.1, process mining has been applied in several case studies in the healthcare domain. Here, we mention some of them so that the reader can get a better overview of what process mining can do in the healthcare domain.

R. Mans et al. [30] applied process mining to a real case of a gynecological oncology process in a Dutch hospital. They analyzed this healthcare process from three different perspectives: (1) the control flow perspective, (2) the organizational perspective and (3) the performance perspective, using various process mining techniques in order to obtain more insight into so‐called careflows (typical paths followed by particular groups of patients). Also, they derived understandable models for large groups of patients.

Another case study was conducted by Rebuge and Ferreira at the emergency department of a hospital in Portugal [38]. They applied process mining to analyze the careflows of emergency patients, which involve activities comprising the triage, treatments, diagnosis, medical exams, and forwarding of patients.

Using the radiology workflow as an example, they showed that process mining helps in determining the regular behavior of the process and in gaining insight into the variants and exceptions, into the performance of the process, and into potential deviations from medical guidelines.

Moreover, R. Mans et al. [29] also applied process mining to datasets from different hospitals concerning processes related to stroke patients. Their work showed that process mining techniques can be applied successfully to clinical data to gain a better understanding of the different clinical pathways adopted by different hospitals and for different groups of patients. Also, they demonstrated that process mining is useful for comparing the same type of process across different hospitals, in order to discover the different practices that are used to treat similar patients, and also to highlight unexpected behavior.

In addition to the case studies shown above, there are other scholarly publications in this field, such as: the process mining analysis of clinical imaging processes in [26]; the investigation of treatment processes for breast cancer patients in [34]; and the investigation of the treatment of patients in an Intensive Care Unit in [19].

To sum up, the above mentioned research shows that process mining can be useful in healthcare to analyze and get valuable insights into the careflows of different hospital departments, but not much of this research has been done on ICU healthcare data.

2.3.2 Mining ICU data

This section introduces some of the research done on mining the ICU data. Here, we mention not only process mining application on ICU data but also data mining application on them in order to show the great interest of researchers in this area.

In [21], data mining is defined as “the analysis of (often large) data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner”. The input data is usually given as a table and the output may be association rules, clusters, tree structures, graphs, patterns etc.

To our knowledge, as already mentioned in subsection 2.3.1.2, process mining has been used to investigate the treatment of patients within an intensive care unit only in [19]. Meanwhile, data mining seems to have found its way into hospital Intensive Care Units, because a lot of research has been done on mining ICU data with data mining techniques.

Data mining is mainly applied to create different prediction models on ICU data. In [23], data mining is used to develop an intensive care unit (ICU) mortality prediction model, and in [18] it is used to improve the prediction of ICU patient survival. Also by means of data mining, Ramon et al. [37] performed several prediction tasks related to intensive care patients, e.g. prediction of patient survival, prediction of length of stay longer than 3 days, prediction of inflammation N days from today, prediction of inflammation‐shock N days from today, prediction of kidney dysfunction N days from today etc. In addition, Fialho et al. used data mining in [12] to predict ICU patient readmission between 24 and 72 hours after ICU discharge. In [14] they used fuzzy modeling to predict the administration of vasopressors in intensive care unit patients, and in [13] they tried to predict the outcomes (survived or deceased) of septic shock patients based on their data recorded in 71 German intensive care units.

Data mining has been used also for analyzing blood glucose monitoring data from hospitalized intensive care unit patients [5].

Moreover, there is commercial software that uses data mining techniques, e.g. the ICU monitoring software of Predictive Medical Technologies, which can predict adverse events like a cardiac arrest up to 24 hours before they happen based on prediction models [55].

Finally, we can say that the above mentioned research on mining ICU data shows not only the interest of the researchers in this area but also the important benefits that ICUs of a hospital can have from it in practice.

2.4 Structured Query Language (SQL)

This section briefly introduces the Structured Query Language (SQL) and some of its basic concepts, which are needed to understand the part of this report that explains the SQL queries used for searching process data (section 4.3 of chapter 4).

SQL (Structured Query Language) is a data sublanguage for access to relational databases that are managed by relational database management systems [31]. It defines the methods used to create and manipulate relational databases on all major platforms. SQL commands can be divided into two main sublanguages: the Data Definition Language (DDL) and the Data Manipulation Language (DML). DDL contains the commands used to create and destroy databases and database objects. Meanwhile, DML is used to insert, retrieve/query and modify the data contained within a database. Here, we briefly explain only the commands needed to create queries in a relational database.

The SELECT statement is used to form queries for extracting information out of the database and its syntax (a simple form of select command) is shown below:

SELECT column_name1, column_name2,... FROM table_name WHERE column_name operator value

The query shown above selects data stored in one or more fields (in this case named ‘column_name1’, ‘column_name2’ etc.) of a table (in this case named ‘table_name’). Here, the ‘WHERE’ clause can be used to limit the records that are retrieved to those that meet specified criteria.

When we want to select all the fields of the records that fulfill certain criteria, we use the asterisk symbol (after the ‘SELECT’ keyword) and the query looks as follows:

SELECT * FROM table_name WHERE column_name operator value

For more information about building SQL queries we can refer to [31].
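The two forms of the SELECT statement can be tried out with the following Python sketch, which uses the standard `sqlite3` module and an in‐memory database. The table `lab_test`, its columns and its rows are hypothetical and are not part of any database discussed in this report; the ORDER BY clause is added only to make the order of the results predictable.

```python
import sqlite3

# In-memory database with a small, invented table for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lab_test (patient_id INTEGER, test_name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO lab_test VALUES (?, ?, ?)",
    [(1, "glucose", 5.4), (1, "lactate", 2.1), (2, "glucose", 9.8)],
)

# Form 1: SELECT column_name1, column_name2 FROM table_name WHERE ...
selected = conn.execute(
    "SELECT patient_id, value FROM lab_test "
    "WHERE test_name = ? ORDER BY patient_id",
    ("glucose",),
).fetchall()

# Form 2: SELECT * FROM table_name WHERE ...
all_columns = conn.execute(
    "SELECT * FROM lab_test WHERE value > 5.0 ORDER BY patient_id"
).fetchall()

print(selected)
print(all_columns)
conn.close()
```

In the first query only the two named fields are returned, whereas the second query returns every field of each record that satisfies the WHERE clause.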

3 Methodology

This chapter describes the approach taken to reach the project goal and to answer the research question and sub‐questions discussed in section 1.3. We decided to follow a research approach moving from specific observations to broader generalizations.

So, we start our research project by analyzing one or more ICU processes in an ICU database in order to mine their data with process mining. Then, we check what data we have about them in this database and based on the found data we decide if we can apply process mining on them. So, in case we have process data to do process mining, we proceed with applying different process mining techniques. Otherwise, we discuss different issues faced in our attempt to mine the selected process data and we suggest solutions to those issues.

Figure 4‐ The phases of the approach

Our research approach is shown in Figure 4 and it contains several phases which are shown below:

‐ Select ICU database
‐ Select ICU process(es)
‐ Look for process(es) data
‐ Assess process(es) data (Do we have data to do process mining?)
‐ Apply process mining techniques
‐ Identify potential issues
‐ Design solution

Running our approach can lead to different final phases (shown in red in Figure 4), depending on the decision taken in the phase named ‘Assess process(es) data’. So, the final phase of our approach is either the phase ‘Apply process mining techniques’ or the phase ‘Design solution’. In the following sections we describe in detail all the phases of our approach shown in Figure 4.

3.1 Select an ICU database

This is the initial phase of our approach and it answers the sub‐question 1.1 (described in section 1.3) that is about which ICU database we should use to get ICU process data.

The time reserved to do this thesis project makes it very difficult to look at several ICU databases and as a result we select one ICU database as a first step of our research approach. The selection of the ICU database should be based on the following criteria:

1. the database should contain diverse types of ICU clinical data
2. the database should contain a vast amount of ICU clinical data
3. the database should be well supported by documentation
4. the database and its documentation should be available for access

As with any other mining technique, process mining results depend heavily on the amount and the types of data available. Therefore, it is important to have an ICU database that contains various ICU clinical data because this increases the chances of having the type of data needed for process mining. Also, having a larger quantity of these diverse data positively affects the results of process mining, as they reflect reality better (e.g. it is easier to distinguish noise and exceptional cases).

Moreover, in the following phases of our approach we need to investigate the data of the selected ICU database. This requires that we have full access to its data and understand its structure and content well enough. So, the ICU database should be supported by documentation that helps us understand it properly and, obviously, we should also be able to access this documentation.

In addition, the documentation about the database helps us check whether an ICU database fulfills the first, the second and the fourth above mentioned criteria in a relatively short time compared to an independent investigation of its data (which can be too time consuming). On the other hand, during this phase we should also investigate its data (but not in detail) in order to be sure that the provided documentation describes the database properly.

To sum up, in this phase of our approach we select an ICU database based on some specific criteria. The selected database is the input for the next phase of our approach named ‘Select ICU process(es)’ which is described in the following section.

3.2 Select ICU process(es)

This is the second phase of our approach and it answers the research sub‐question 1.2 (described in section 1.3) that is about which ICU clinical process(es) data we should select.

As mentioned in section 3.1, the selected ICU database should be large and diverse in data. Consequently, applying process mining without filtering or selecting parts of these data would result in spaghetti process models [44], which are difficult to understand and analyze. Also, analyzing many ICU processes requires a lot of time. So, it makes sense to select one or a few specific ICU processes for further use in our research project.

During this phase, we conduct interviews with ICU experts and do online investigation about which ICU processes to select so that we can later apply process mining to them. We use the ICU experts’ suggestions because our knowledge of this medical area is limited and it is difficult for us to decide which processes to pick. ICU experts are the ones who face these processes every day and obviously they have good knowledge of them.

During these interviews, we should first briefly introduce to the ICU experts what process mining is and how it can be used, among other things, for checking whether medical guidelines are followed properly based on the logged data. In this way we explain our project goal, and then we ask them which ICU processes they find interesting and important to mine with process mining (with the above mentioned goal).

Furthermore, to get a better understanding of the selected processes, we try to gather as much information as possible about them by using online resources such as guidelines, protocols or other ICU documentation.

In summary, during this phase of our approach, we select one or some ICU processes so that in the next phase of our approach we can look for their data.

3.3 Look for process(es) data

The third phase of our approach is phase ‘Look for process(es) data’. During this phase, we answer the research sub‐question 1.3 (described in section 1.3) that is about finding the data of the selected ICU process(es), in the ICU database.

So, once we have selected one or a few ICU processes in the previous phase, the next step is to investigate these processes’ data in the database. It is important to have the right data about the processes before exploring them with different process mining techniques. So, during this phase, the emphasis is on the process data that can be used for process mining. Here, we investigate the selected database in more detail to see what the ICU data really consist of and where the data about the selected ICU processes are. So, we would like to get enough information to answer questions such as:

‐ What types of data are stored in such a database?
‐ How are the data structured in this database?
‐ Can we find data about the selected ICU process(es) in this database?
‐ What part of the selected ICU process(es) data can be interesting from the process mining perspective? Etc.

This can be achieved by using different database management systems that have user friendly graphical interfaces to explore the data of a database (such as MySQL, SQL, Oracle etc.).

Finally, the output of this phase can be some process data or no process data at all (if we did not find any data about the selected process(es)). So, based on the output of this phase, in the next phase of our approach we decide whether we can apply process mining.

3.4 Assess process(es) data

This is a special phase of our approach because it determines, for each of our selected processes, which of two possible paths we follow in our project. In this phase, we answer the research sub‐question 1.4 (described in section 1.3) about the degree to which the process(es) data stored in ICU databases are suitable for mining by means of process mining techniques.

So, during this phase, we assess the output of the previous phase of our approach (phase ‘Look for process(es) data’). This output can be no data or some data about the ICU process(es). In case we have some process data, we check whether these data are sufficient and of the right type to conduct process mining on them with the goal of checking whether the corresponding medical guidelines are followed.

To do so, we first check whether the process data contain the minimum data required by process mining. These data usually give information about the sequence of events of the process. The minimum data required by process mining are explained in detail in subsection 2.2.3.

Secondly, based on the information in the guidelines, we specify what other data about the selected ICU process(es) (beyond the minimum) we need in order to apply process mining for checking whether the corresponding medical guidelines are followed. Then, we investigate whether the process data contain these other required data. These can be data that show detailed information about events, such as the name of the medication and the dose used in the event of administering medication to the patient, or the specimen tested in a microbiology test event etc.

So, if the selected process(es) data contain all the needed data (described above) then we decide to apply process mining techniques on them, which is done in phase ‘Apply process mining techniques’ of our approach. Otherwise, we decide that we cannot apply process mining due to the lack of data and in such cases the next phase in our approach is phase ‘Identify potential issues’. Also, we take the same decision in cases when we have not found any data about the ICU processes in phase ‘Look for process(es) data’.

In summary, during this phase of our approach we answer the question ‘Do we have data to do process mining’ and if the answer is ‘Yes’ then the next phase in our approach is ‘Apply process mining techniques’ otherwise (if the answer is ‘No’) the next phase is ‘Identify potential issues’.
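The decision taken in this phase can be sketched in Python as follows. The sketch assumes that the process data found in the previous phase are available as a list of records (dictionaries) and that the minimum fields carry the names used in subsection 2.2.3; the function name `assess_process_data` and the example field ‘Dose’ are hypothetical.

```python
# Minimum fields required by process mining (see subsection 2.2.3).
REQUIRED_FIELDS = {"CaseId", "Activity", "DateTimestamp"}


def assess_process_data(rows, guideline_fields=frozenset()):
    """Return the name of the next phase, given the process data found."""
    if not rows:
        # No data at all was found for the selected process.
        return "Identify potential issues"
    needed = REQUIRED_FIELDS | set(guideline_fields)
    for row in rows:
        # A field counts as missing when it is absent or empty.
        if any(row.get(field) in (None, "") for field in needed):
            return "Identify potential issues"
    return "Apply process mining techniques"


complete = [{"CaseId": 4, "Activity": "patient examination",
             "DateTimestamp": "2012-01-05 09:30"}]
no_timestamp = [{"CaseId": 4, "Activity": "patient examination"}]

print(assess_process_data(complete))
print(assess_process_data(no_timestamp))
print(assess_process_data(complete, guideline_fields={"Dose"}))
```

The third call shows the second check of this phase: even if the minimum fields are present, data that lack a field required by the guideline (here the invented field ‘Dose’) lead to the phase ‘Identify potential issues’.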

3.5 Apply process mining techniques

In this phase, we apply different process mining techniques on selected process data in order to check if guidelines (the ones that correspond to selected process‐es) are followed by ICU medical staff. So, here we answer the research sub‐question 1.5 (described in section 1.3) about getting the results of applying process mining on the ICU process data with the goal of checking if guidelines are followed.

Considering the flexible and complex nature of healthcare processes (and assuming the same for ICU processes), in this phase we first choose several different process mining algorithms that are good at dealing with such processes, e.g. Heuristics miner, Fuzzy miner etc. Then, we find tools that implement these algorithms and run them on the process data. As a last step, we analyze and compare the results of mining the process data with the selected algorithms in order to find out whether we can check if the ICU staff follows the guidelines correctly. Consequently, we also answer the main research question in section 1.3.

After conducting this phase, there are no other phases that follow it in our approach because this is one of the end phases of the approach.

3.6 Identify potential issues

Doing process mining is not always straightforward in healthcare [38]. So, during this phase, we clearly identify the different problems faced during phase ‘Look for process(es) data’ of the approach, thereby answering the research sub‐question 1.6 (described in section 1.3) that is about specifying the issues that make the ICU process data not suitable for process mining.

As already mentioned above in this chapter, we do this phase of our approach only if we have decided in phase ‘Assess process(es) data’ that it is not possible to do process mining on the process data found in phase ‘Look for process(es) data’.

These issues are discussed and elaborated in detail in order to have a clear picture of them, which is necessary as input for the next phase, where we design their solutions. This phase is ‘Design solution’ and it is described in the next section.

3.7 Design solution

The main goal of this phase is to give solutions to problems listed in phase ‘Identify potential issues’. We give solutions to these problems in order to show how ICU process data should be stored in a database so that it can be used by process mining.

As part of the solutions, we first redesign the selected ICU database in order to show how it should structure ICU process data. Then, we discuss its generalizability. If the redesigned version of the ICU database is general enough, then we suggest using it for other ICU databases as well (not just the selected one). Otherwise, we design an ICU database at a high level of abstraction that makes clear and concrete how data need to be stored so that process mining is possible. Then, this last design is suggested for use in different ICU databases. In this way, we answer the research sub‐question 1.7 (described in section 1.3).

The redesign of the selected ICU database (selected in the first phase of our approach) or the abstract design of an ICU database should fulfill these criteria:

• Allow storing standardized information according to medical guidelines/protocols about ICU clinical processes, e.g. a list of predefined clinical ICU processes for different diagnoses.
• Allow storing standardized information according to medical guidelines/protocols about ICU events that are part of ICU clinical processes, e.g. a list of predefined events that can happen to ICU patients.
• Allow storing information about the occurrence of the events and processes.
• Allow storing information that maps the events’ occurrences to the initialized processes.
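To make these four criteria more tangible, the following Python sketch creates a very small relational schema in an in‐memory SQLite database and derives an event log from it. It is only one possible reading of the criteria: all table names, column names and sample rows are assumptions made for this illustration and do not represent the design proposed later in this report.

```python
import sqlite3

SCHEMA = """
CREATE TABLE process_type (      -- predefined clinical processes (criterion 1)
    process_type_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE event_type (        -- predefined events (criterion 2)
    event_type_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE process_instance (  -- occurrence of a process (criterion 3)
    instance_id INTEGER PRIMARY KEY,
    process_type_id INTEGER NOT NULL REFERENCES process_type(process_type_id),
    patient_id INTEGER NOT NULL
);
CREATE TABLE event_occurrence (  -- occurrence of an event, mapped to its
    occurrence_id INTEGER PRIMARY KEY,  -- process instance (criteria 3 and 4)
    instance_id INTEGER NOT NULL REFERENCES process_instance(instance_id),
    event_type_id INTEGER NOT NULL REFERENCES event_type(event_type_id),
    occurred_at TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Invented sample data: one process instance with two events.
conn.execute("INSERT INTO process_type VALUES (1, 'example process')")
conn.executemany("INSERT INTO event_type VALUES (?, ?)",
                 [(1, "lab test"), (2, "administer medication")])
conn.execute("INSERT INTO process_instance VALUES (10, 1, 1001)")
conn.executemany("INSERT INTO event_occurrence VALUES (?, ?, ?, ?)",
                 [(1, 10, 1, "2012-01-05 09:00"),
                  (2, 10, 2, "2012-01-05 09:30")])

# The minimum event log (CaseId, Activity, DateTimestamp) follows from a join.
event_log = conn.execute("""
    SELECT o.instance_id, t.name, o.occurred_at
    FROM event_occurrence AS o
    JOIN event_type AS t ON t.event_type_id = o.event_type_id
    ORDER BY o.instance_id, o.occurred_at
""").fetchall()
print(event_log)
conn.close()
```

Because every event occurrence refers to both a predefined event type and a process instance, the case identifier, the activity name and the timestamp needed for process mining can be read directly from the stored data.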

Furthermore, we can build these database design solutions by using different tools that allow modeling and designing databases, e.g. UML tools, Entity Relationship diagramming tools (if we have a relational database) etc.

In addition, in this phase of the approach, we also provide ideas about the data collection procedures in ICUs, thereby answering the research sub‐question 1.8 (described in section 1.3). These procedural solutions are important because they show how to put the above mentioned design solutions into practice.

In summary, we show not only what an ICU database should look like but also how to make data available to it, with the goal of having enough information to conduct process mining on it. This phase is an end phase of our approach and as a result there are no other phases following it.

4 Process mining the ICU data

As already mentioned in Chapter 1, our goal in this project is to check whether process mining can find out if treatment guidelines are followed correctly in the ICU. To reach this goal, we have developed an approach as described in Chapter 3.

In this chapter, we show how we have applied our approach in this project by reporting our attempt to do process mining on the ICU data source selected for this project. In this attempt, we failed to do process mining. So here, we describe the application of the following approach phases: 1) Select ICU database, 2) Select ICU process(es), 3) Look for process(es) data, 4) Assess process(es) data and 5) Identify potential issues, in sections 4.1, 4.2, 4.3, 4.4 and 4.5 respectively.

Then, in chapter 5, we explain the application results of phase ‘Design solution’.

4.1 Select an ICU database

This section describes the application of the first phase of our approach ‘Select an ICU database’.

In this project, we have chosen to use the logged ICU clinical data of a hospital in Boston called Beth Israel Deaconess Medical Center (BIDMC). These data are stored in a database named Multiparameter Intelligent Monitoring in Intensive Care (MIMIC II).

Firstly, we show the motivation for choosing MIMICII database in subsection 4.1.1 and then in subsection 4.1.2, we describe the content of this database. Finally, we give the structure of MIMICII database in more detail in subsection 4.1.3.

4.1.1 Why MIMICII database?

The MIMICII database seemed the right choice for this thesis because, first of all, it is a massive ICU database. It contains the data recorded in different ICUs of the BIDMC hospital, collected over a seven‐year period. It has more than 36000 hospital admissions and around 33000 patients recorded in it. This gives us plenty of ICU data, which is a good start if we want to do process mining.

Furthermore, MIMICII is not only a large‐scale database but it also contains various kinds of information. It records most of the clinical data and wave data of the ICU patients. Its information includes data pertaining to patient events (e.g. movement between wards), patient diagnoses, data from bedside monitors and patient clinical data. This diversity of stored data provides a wealth of information, all of which is potentially useful to researchers for a variety of purposes. As a result, there is a good chance that it contains the data we need for our case study, because the developers of this database have tried to store any kind of information that can be useful to researchers in the area of mining ICU medical data.

In addition, MIMICII is built for research purposes and as a result it is a well‐structured database supported by good documentation and a support team which offers help to researchers with different kinds of issues related to MIMICII.

Last but not least, MIMICII is freely available for research purposes, allowing us to access and use its data and documentation easily for our research project.

In summary, this database fulfills all four criteria mentioned in section 3.1 because it contains a vast amount of diverse data, and we can access both the data and its good documentation.

4.1.2 MIMICII content

MIMICII database contains physiologic signals and vital signs time series captured from patient monitors, and comprehensive clinical data obtained from hospital medical information systems, for tens of thousands of ICU patients.

MIMICII data were collected from 2001 to 2008 from a variety of ICUs (medical, surgical, coronary care, and neonatal) in the BIDMC hospital and represent around 25000 adult patients and around 8000 neonate patients (version 2.6 of MIMICII). For privacy reasons, its data are de‐identified by anonymizing the patients and ICU staff and by shifting all dates into the future (the time intervals are preserved) [40]. Source data for the MIMICII database consist of bedside monitor waveforms and associated numeric trends derived from the raw signals, clinical data derived from Philips' CareVue system, data from hospital electronic archives and mortality data from the Social Security Death Index [7].
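To illustrate why shifting dates does not harm later time‐based analysis, the following minimal sketch shifts all timestamps of one patient by the same offset and compares the intervals between consecutive events. The events, the offset and the function names are assumptions made for illustration only; they are not taken from MIMICII and do not describe the actual de‐identification procedure used by its developers.

```python
from datetime import datetime, timedelta

def shift_dates(events, offset_days):
    """Shift every timestamp of one patient by the same number of days."""
    offset = timedelta(days=offset_days)
    return [(label, ts + offset) for label, ts in events]

def intervals(events):
    """Time differences between consecutive events."""
    return [later[1] - earlier[1] for earlier, later in zip(events, events[1:])]

# Hypothetical events of a single patient (not real MIMICII data).
events = [
    ("admission", datetime(2005, 3, 1, 8, 0)),
    ("lab test", datetime(2005, 3, 1, 14, 30)),
    ("discharge", datetime(2005, 3, 4, 10, 0)),
]

# Assumed offset of roughly one hundred years into the future.
shifted = shift_dates(events, offset_days=365 * 100)

original_gaps = intervals(events)
shifted_gaps = intervals(shifted)
```

Because the same offset is added to every timestamp of the patient, the order of events and the durations between them stay the same, which is what process mining relies on.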

MIMICII has two distinctive groups of data which are saved in two different databases known as the Clinical database and Waveform database.

The MIMICII clinical database contains clinical data of ICU patients. Its data include medications administered to patients, pharmacy provider order entry (POE) records, admission and death records, patient demographic details, discharge summaries, ICD‐9¹ diagnostic codes, procedure codes, microbiology and lab tests, imaging and ECG² reports and the ICU central database (which includes some subset of the bedside monitor trends, drip rates, free text nursing notes and nurse‐verified down‐sampled trends, amongst other information) [7]. Some of the main tables of the MIMICII clinical database and the relationships between them are shown in Figure 16‐Figure 19 in Appendix A‐ MIMICII tables and their relationships.

An overview of its clinical data categories is [52]:

• General ‐ Patient demographics, hospital admissions & discharge dates, room tracking, death dates (in or out of the hospital), ICD‐9 codes, unique code for health care provider and type.
• Physiological ‐ Hourly vital sign metrics, SAPS, SOFA, ventilator settings, etc.
• Medications ‐ IV (intravenous) meds, provider order entry data, etc.
• Lab Tests ‐ Chemistry, hematology, imaging, etc.
• Fluid Balance ‐ Intake (solutions, blood, etc.) and output (urine, estimated blood loss, etc.).
• Notes & Reports ‐ Discharge summary, nursing progress notes, etc.; cardiac catheterization, ECG, radiology, and echo reports.

1 International Classification of Diseases is a health care classification system that provides codes to classify diseases and ICD‐9 is the 9th version of ICD [49].
2 The electrocardiogram or ECG is a test that records the electrical activity of the heart [50].

Meanwhile MIMICII Waveform database stores the high resolution waveforms produced by the bedside monitors in ICUs. So, it includes records of continuous high‐resolution physiologic waveforms and minute‐by‐minute numeric time series (trends) of physiologic measurements. Many, but not all, of the Waveform Database records are matched to corresponding Clinical database records.

For more information regarding MIMICII content we can refer to [53].

In our case study, we decided to work only with the clinical data of MIMICII, stored in the MIMICII clinical database, and left the waveform database out of our project. We made this choice because we wanted to investigate the clinical ICU processes and we expected the clinical database to hold all the kinds of information we could need. So, in the next subsection, we describe the structure of only the MIMICII clinical database.

4.1.3 MIMICII clinical database structure

In this subsection, we describe the database structure of the MIMICII clinical database because it contains the data we chose to work with as explained in subsection 4.1.2. So, we give a brief overview of its main tables and the relationships between them without getting into implementation details. This overview is needed to understand part of this chapter and the content of chapter 5.

The MIMICII clinical database is a relational database¹ which contains several tables connected with each other. It stores information about the patient (e.g. date of birth, gender, demographic information etc.), different patient events (e.g. laboratory test events, medication events etc.) and items which can be recorded for a particular event (e.g. medication items, chart items etc.) in separate tables.

The patient table named D_PATIENTS is central to the database model; other information such as admission, demographics, procedures or medications can be readily accessed once a particular patient is identified. The tables in Figure 17 of Appendix A‐ MIMICII tables and their relationships show the relationship between table D_PATIENTS and other major components in the database.

A patient may have been admitted several times during the period in which MIMICII data were collected. Therefore, it is important to understand how to identify patients and their stay(s). According to [7], there are three identifiers for data associated with any given patient:

• Subject ID (Subject_ID) ‐ an integer number identifying a particular patient. This can be thought of as a substitute for a unique medical record number and it is the same for all hospital admissions of the patient.
• Hospital admission ID (Hadm_ID) ‐ an integer number identifying a particular admission to the hospital. Each patient may have many Hadm_IDs associated with his/her unique Subject ID.
• ICU stay ID (ICUstay_ID) ‐ an integer number identifying an ICU stay. An ICU stay refers to the period of time when the patient is cared for continuously in an Intensive Care Unit. During a hospital admission (identified by Hadm_ID), the patient (identified by Subject_ID) may have one or more ICU stays associated.

1 Relational database is a collection of data items organized as a set of formally‐described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables [47].

Moreover, there are several tables related to patients’ events, such as: MEDEVENTS, where medication(s) given to a patient are recorded; IOEVENTS, where fluid input/output events related to a patient are stored; TOTALBALEVENTS, where total fluid balance events are stored; LABEVENTS, where laboratory tests performed on a patient are stored; MICROBIOLOGYEVENTS, where events indicating microbiology tests taken from a patient are stored; PROCEDUREEVENTS, where events indicating procedures performed on a patient are stored; CHARTEVENTS, where patient medical chart data are stored; NOTEEVENTS, where patient notes made by hospital staff are recorded; etc.

A simplified database model of MIMICII is shown in Figure 5 that contains tables relevant for this project and their relationship with the D_PATIENTS table. More detailed information about events tables is shown in Figure 16 and Figure 18 in Appendix A‐ MIMICII tables and their relationships.

Figure 5‐ A simplified model of MIMICII that contains tables relevant for this project and their relationship with the D_PATIENTS table. This model is built based on the information in [7].


In addition to the above mentioned tables, there are also tables which record separately items related to particular events such as medications items that are possible medications which can be administered to a patient, items which can be entered on patients’ chart etc.

Also, MIMICII has tables that store detailed information about the durations of certain events, such as A_MEDDURATIONS for the duration of medication events, A_IODURATIONS for the duration of fluid input/output events and A_CHARTDURATIONS for chart event durations. Figure 20 in Appendix A‐ MIMICII tables and their relationships shows the structure of the tables which record data about medication events.

Finally, we can conclude that MIMICII is quite a structured database and that the above mentioned tables are well linked to each other by the patient identifiers Subject_ID, Hadm_ID and ICUstay_ID. As a result, MIMICII not only contains activity‐oriented data (such as patients’ events data) that are relevant for process mining, but it also connects all these data (admission, procedures, medications etc.) with the patient information through the patient identifiers. So, applying process mining to such a set of data seems promising. Therefore, we use this database as input for the second phase of our approach, described in the next section.
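The link between event tables and patient identifiers can be sketched as follows. This is a minimal illustration that uses an in‐memory SQLite database with two heavily simplified stand‐ins for MEDEVENTS and LABEVENTS; the column names (icustay_id, itemid, charttime), the sample rows and the choice of the ICU stay as the case are assumptions made for this example and do not reproduce the real MIMICII schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, hypothetical versions of two MIMICII event tables.
CREATE TABLE medevents (subject_id INTEGER, icustay_id INTEGER,
                        itemid INTEGER, charttime TEXT);
CREATE TABLE labevents (subject_id INTEGER, icustay_id INTEGER,
                        itemid INTEGER, charttime TEXT);
INSERT INTO medevents VALUES
  (1, 10, 501, '2005-03-01 09:00'),
  (2, 20, 502, '2005-03-02 11:00');
INSERT INTO labevents VALUES
  (1, 10, 901, '2005-03-01 08:00'),
  (1, 10, 902, '2005-03-01 12:00');
""")

# One row per event: case (ICU stay), activity name and timestamp.
rows = conn.execute("""
SELECT icustay_id AS case_id, 'medication ' || itemid AS activity, charttime
FROM medevents
UNION ALL
SELECT icustay_id, 'lab test ' || itemid, charttime
FROM labevents
ORDER BY case_id, charttime
""").fetchall()

# Group the ordered events into one trace per case.
traces = {}
for case_id, activity, _timestamp in rows:
    traces.setdefault(case_id, []).append(activity)
```

Each trace is the time‐ordered list of activities of one ICU stay, which is the basic shape of an event log that process mining tools expect.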

4.2 Select ICU process(es).

This section describes the application of the second phase of our approach, named ‘Select ICU process(es)’, which is described in section 3.2 of Chapter 3.

ICU data recorded in MIMICII are of a large quantity and a great diversity. Consequently, we have to make a choice and select one or several processes done in ICUs of BIDMC because doing process mining on the whole database would require more time than we have available for this project.

Moreover, we have decided to apply process mining techniques to the data of the selected process(es) in order to check whether the ICU hospital staff adheres to the protocols or guidelines for the chosen process(es). This is important because deviations from guidelines can lead to medical errors, as explained in subsection 1.2.2. To do so, we should first select the ICU process(es) to be mined and then also obtain the guidelines or protocols that describe the selected process(es).

So, we contacted several ICU experts (doctors from ICUs of BIDMC, CZE¹ and AZM²) and researchers (in the area of mining healthcare data) and asked them which processes to investigate. Their suggestions were:

1. An evidence‐based items checklist done daily to every patient
2. ST‐elevation treatment pathway
3. Antibiotic administration in ICU

In fact, we got the first two suggestions from a doctor of BIDMC and then discussed them with other doctors from CZE and AZM. They confirmed that there are guidelines that describe them but they do not know to what extent doctors and nurses follow them. Therefore, they all considered them interesting options to investigate with process mining in order to check if the corresponding guidelines are followed or not.

1 CZE stands for Catharina Ziekenhuis Eindhoven that is a hospital located in Eindhoven.
2 AZM stands for Academisch Ziekenhuis Maastricht that is a university hospital located in Maastricht.

In addition, we have searched the internet and the literature for guidelines that clearly describe treatment pathways of certain diseases which can be treated in the ICUs of a hospital. Unfortunately, based on this investigation we still could not choose which process(es) to select due to our limited knowledge of the medical domain.

As a result, we have decided to continue and mine separately the data related to all three options suggested by the experts because we considered all of them interesting. One interesting point in mining all these options is how they differ. The daily checklist is a process which includes actions done daily to all ICU patients, but the actions of the ST‐elevation treatment processes are done only to patients with an ST‐elevation diagnosis. Moreover, antibiotic administration processes are more about rules than actions, and these rules are applied to patients with certain health conditions (such as patients infected by certain bacteria). So, from a process mining perspective these differences are interesting because they represent three different types of healthcare processes in the ICU.

Also, choosing three different processes instead of one or two increases the chances of having some mining results at the end, because we were not sure if the data related to those processes stored in MIMICII could be used for process mining.

So, in the following subsections 4.2.1, 4.2.2 and 4.2.3, we describe each of these suggested options separately and explain why we find them interesting to analyze with process mining in order to check whether they adhere to the corresponding guidelines.

4.2.1 The checklist

According to Gawande in [17:12–13] the source of greatest difficulties and stresses in medicine is not money or government or threat of malpractice lawsuits (although they play a role) but it is the volume and complexity of knowledge and actions that science has dropped upon doctors. So, he suggests using checklists (the simplest of techniques) in medicine to cope with this complexity and as a result prevent many different failures deriving from it.

In other words, checklists are quite simple (as they remind us of the minimum necessary steps and make them explicit) but at the same time quite powerful in medicine (as they significantly reduce errors made by physicians and nurses) [17]. In 2008, Gawande introduced a surgery checklist in eight hospitals and the practice resulted in 36 percent fewer major complications and 47 percent fewer deaths [17].

BIDMC hospital staff also use a checklist every day for each patient hospitalized in their ICUs. This checklist contains several check items which are related to the nutrition of the patient, prevention steps taken for several medical conditions, evaluation of the consciousness level of the patient, information about communication with the patient’s family, consulting teams involved in the patient’s treatment etc. The full checklist used in BIDMC hospital is shown in Figure 6.


Figure 6‐ BIDMC daily checklist

Our goal here is to check if the checklist was done on each patient on a daily basis and check the order in which the items of the checklist were executed. This last investigation could be interesting if it is related somehow with the patients’ state/conditions.

4.2.2 ST‐elevation treatment pathway

ST‐elevation is related to some specific heart diseases. It refers to a finding on an electrocardiogram, wherein the trace in the ST segment is abnormally high above the isoelectric line [54]. ST‐elevation can be associated with different diseases and, among others, it is related to myocardial infarction, commonly known as a heart attack. Here, we focus on treatment pathways for ST‐elevation myocardial infarction (acute or not), which is known in the ICU as STEMI. This disease is of interest to us because according to [40] it is quite a common diagnosis in ICU patients of BIDMC hospital. Moreover, STEMI continues to be a significant public health problem in industrialized countries and is becoming an increasingly significant problem in developing countries [39].

An example of different pathways followed in the emergency management of complicated ST‐elevation myocardial infarction is shown in Figure 7. These pathways are part of a guideline [3]. We do not know whether this guideline is used in BIDMC hospital because we were not provided with the STEMI treatment guidelines used there.


Figure 7‐Emergency management of complicated ST‐elevation myocardial infarction. The emergency management of patients with cardiogenic shock, acute pulmonary edema, or both is outlined. SBP indicates systolic blood pressure; IV, intravenous; BP, blood pressure; ACE, angiotensin converting enzyme; MI, myocardial infarction. *Furosemide less than 0.5 mg/kg for new‐onset acute pulmonary edema without hypovolemia; 1 mg/kg for acute or chronic volume overload, renal insufficiency. [3]

Here, our goal is to mine STEMI treatment pathways based on the data in MIMICII and check if they adhere to the specific treatment guidelines for this diagnosis.

4.2.3 Antibiotic administration in ICU

Badar and Navale in [4] claim that antibiotics are the most frequently prescribed drugs among hospitalized patients especially in intensive care. Also in [4] they concluded that antibiotic resistance is increasing at an alarming rate leading to increasing morbidity, mortality and treatment cost and that a key factor in the development of an antibiotic resistance is inappropriate use of antibiotics.

Therefore, antibiotic administration in critical care is considered interesting to be investigated because even though there are well described guidelines that explain the antibiotic administration rules in ICU, still there are reported cases about the continuous indiscriminate and excessive use of antibiotics prescribed by doctors to their patients [25,33,36].

The antibiotic administration guidelines mainly specify, for specific bacteria, which antibiotics to use, in which doses and how frequently to administer the antibiotic to the patient. So, they are rule‐based and not process‐oriented, because these guidelines describe rules that correspond to individual actions (not a set of actions) of administering a certain antibiotic in certain conditions. In other words, we cannot talk about processes in cases when we have just one activity (such as giving a certain antibiotic to an ICU patient who has certain conditions).

So, if we do not have real processes (according to the definitions given in subsection 2.2.1) involved in antibiotic administrations to patients then it is not relevant to use process mining because process mining mines processes’ data.

In summary, at first it seemed interesting to check whether antibiotic administration is done according to the corresponding guidelines, but after some investigation of these guidelines we noticed that we are not dealing with processes. As a result, we decided not to proceed with this option in our project because it offers no direction of investigation that fits the goal of this thesis.

4.3 Look for process(es) data.

In the previous section, we concluded that it is interesting to conduct process mining on only two of the three options suggested by experts: the checklist process and the ST‐elevation treatment process. In this section, we describe our attempt to find the data of these processes, so that we can use them later to create a log file to be mined. We show the results of applying the phase ‘Look for process(es) data’ to the checklist process and the ST‐elevation treatment process in subsections 4.3.1 and 4.3.2, respectively.

4.3.1 Checklist

In this subsection, we describe the application of the phase ‘Look for process(es) data’ of our approach to the checklist process (described in subsection 4.2.1). We show how we tried to find checklist process data in MIMICII and what data we could identify as a result of our search.

It was a challenge to find the information related to the checklist in the database because the checklist’s data were not saved in a structured way in the MIMICII clinical database, and we had to look at the data itself to understand whether they were related to the checklist items or not. So, it is important to have checklist data structured in one or more tables of the database in order to distinguish them clearly from other types of data.

We tried to analyze the data related to a few randomly chosen patients during their last admission in ICU in order to find the checklist data recorded for them, but we did not succeed.

So, we decided to first understand the meaning of checklist’s items explained in subsection 4.3.1.1 and then search for its data in MIMICII database in subsection 4.3.1.2.

4.3.1.1 Understand the meaning of checklist’s items.

Obviously, we should know enough about the checklist items and understand them properly in the ICU given context before starting to search them in the database.

We have used different online resources to understand the meaning of the checklist items and below we give an overview of it.


As shown in Figure 6, the checklist has 11 check items. Their meaning, based on our online investigation, is shown in Table 1.

No. | Checklist item | Explanation
1 | Nutrition • Is the pt receiving nutrition? Should they be? | This is about the nutrition of the patient. Here, it should be checked if the patient is receiving any nutrition and, if not, whether they should get nutrition.
2 | Prophylaxis • DVT • GI • VAP bundle | Prophylaxis, in general, refers to the steps taken to prevent certain medical conditions. • Prophylaxis DVT refers to the steps taken to prevent asymptomatic deep vein thrombosis in hospitalized medical patients. There are two general types of prophylaxis: mechanical methods and pharmacological agents. More information about prophylaxis DVT can be found in [2]. • Prophylaxis GI refers to the steps taken to prevent gastrointestinal ulcers, which are explained in [10]. • Prophylaxis VAP refers to the steps taken to prevent ventilator‐associated pneumonia, which are explained in [32]. So, this item should check if different steps are taken to prevent DVT, GI and VAP in the ICU patient.
3 | Sedation/Delirium/Vent weaning/Mobility • Current RASS? • Goal RASS? • CAM‐ICU: Positive or Negative? • Current sedation? Plan for weaning? | This check item is done to evaluate how conscious the patient is. RASS stands for Richmond Agitation‐Sedation Scale and is related to the level of consciousness. CAM‐ICU stands for Confusion Assessment Method for the Intensive Care Unit and is related to the content of consciousness. • Current and goal RASS show respectively the current and the goal score of RASS (for more information on this score refer to Figure 27 and Figure 28 in Appendix D). • The value of CAM‐ICU for the patient. It can be positive or negative. • Evaluate the current sedation of the patient, which means checking if the patient is experiencing pain, anxiety, delirium etc. Plan for weaning (removing) the patient from mechanical ventilation¹ [48].
4 | IV Access • What does the pt have for access? • Is it enough? • Should it be DC’d? | IV Access stands for Intravenous Access. This is related to the infusion of liquid substances directly into a vein through different devices such as a hypodermic needle, peripheral cannula, peripherally inserted central catheter etc. [51] • What is the intravenous access device that the patient has? • Is this device enough? • Should it be discontinued?
5 | Skin Impairments • Any notable skin breakdown, ulcerations? • If yes, treatments? | Pressure ulcers (or ulcerations) are areas of soft tissue breakdown that result from sustained mechanical loading of the skin and underlying tissues [6]. This check item is about examining the patient’s skin for different types of visible skin breakdowns/ulcerations. Also, when impairments are noticed on the patient’s skin, it should be checked if the patient is being treated for them.
6 | Restrained orders (if applicable) • Does it need to be written, renewed or DC’d? | If there are restraint orders made by physicians, it should be checked if they need to be written, renewed or discontinued. Examples of restraint orders could be maintaining nutrition and hydration of the patient, physical restraints to ICU patients in order to prevent dislodgement of medical equipment and so on.
7 | ICU consent • Has it been signed? Yes, date_ | The hospital requires consent of the patient for all special procedures, tests and operations. If the patient is incompetent and cannot make his or her own decision, the next of kin will be asked to sign the consent form. This item checks, in case a consent form exists, if it has been signed and when.
8 | Consulting teams • What consults are following this pt? | The treatment of a critically ill patient is usually followed by physicians of different specialties, e.g. cardiologist, dermatologist etc. This item checks what consults are following the patient.
9 | Family communication • Is there a HCP? • Who is the spoke person or contact? • Has the PCP been contacted? • Last family meeting? Next family meeting? | Family communication can be used to share information about illness, to engage families in treatment and decision‐making and so on. HCP stands for Health Care Provider and PCP stands for Primary Care Physician. So, it should be checked who is the spokesperson from the family or any other contact and, by communicating with him/her, it should be checked if there is a HCP and if the PCP has already been contacted or not. Also, it should be checked when the last family meeting was and when the next one is scheduled.
10 | Code status • Has it been discussed? Should it be? • Has NEOB been contacted? | Code status is the medical term that describes what a patient’s wishes are when his or her heart stops or lungs fail. It can take values like ‘Do Not Resuscitate’ (if the heart fails), ‘Do Not Intubate’ (if the lungs fail) etc. This item checks if the code status has been discussed with the patient or not, and also if it should be discussed or not. Moreover, this item checks if NEOB has been contacted or not for organ donation. NEOB stands for New England Organ Bank.
11 | Disposition • Is the patient staying at ICU today or not? | This item checks the information about the disposition of the patient from ICU, by answering the question whether the patient is staying at the ICU today or not.

Table 1‐ Explanation of checklist items

1 Mechanical ventilation is indicated when the body can't meet its oxygen demand through spontaneous breathing or when the body can't adequately remove carbon dioxide (CO2). Mechanical ventilation aims to provide adequate ventilatory support to meet the patient's oxygen demands without harming the patient. Ventilation is delivered via an artificial airway: an oral or nasal endotracheal tube or a surgically placed tracheostomy tube [35].

In summary, we could give some explanation to each checklist item in order to proceed with searching its data in the database. This is explained in the next subsection.


4.3.1.2 Retrieve the information related to the checklist’s items from database.

In this step, we have searched into the MIMICII database by using different SQL queries in order to find the checklist’s data saved in it.

We tried to find all the checklist‐related information in the MIMICII database, but that was not straightforward. By running different SQL queries, we investigated the tables where the data related to the checklist could be saved; those tables were ‘ChartEvents’, ‘D_ChartItems’ and ‘NoteEvents’ (see Figure 21, Figure 22 and Figure 25 in Appendix A‐ MIMICII tables and their relationships for detailed information about these tables). These queries have the following structure:

    select * from ChartEvents inner join D_ChartItems on D_ChartItems.itemid = ChartEvents.itemid where lower(label) like '%keyword%'

The keyword is a word or a group of words that is used to find which patient chart items are used to save the checklist items. So, for instance, to find the information that corresponds to the checklist item about nutrition, the query used is:

    select * from ChartEvents inner join D_ChartItems on D_ChartItems.itemid = ChartEvents.itemid where lower(label) like '%nutrition%'
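The keyword search described above can be tried out with the following minimal sketch. It runs the same join against an in‐memory SQLite database instead of the real MIMICII database; the reduced table definitions, the value1 column, the sample rows and the helper name find_chart_events are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced, hypothetical versions of the two MIMICII tables.
CREATE TABLE D_ChartItems (itemid INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE ChartEvents (subject_id INTEGER, itemid INTEGER,
                          charttime TEXT, value1 TEXT);
INSERT INTO D_ChartItems VALUES
  (1, 'Nutrition Type'), (2, 'Heart Rate'), (3, 'Code Status');
INSERT INTO ChartEvents VALUES
  (100, 1, '2005-03-01 08:00', 'Tube feeding'),
  (100, 2, '2005-03-01 08:05', '82'),
  (101, 3, '2005-03-01 09:00', 'Full code');
""")

def find_chart_events(conn, keyword):
    """Return chart events whose item label contains the keyword."""
    pattern = f"%{keyword.lower()}%"
    return conn.execute(
        """
        SELECT d.label, c.subject_id, c.charttime, c.value1
        FROM ChartEvents c
        INNER JOIN D_ChartItems d ON d.itemid = c.itemid
        WHERE lower(d.label) LIKE ?
        """,
        (pattern,),  # bound parameter instead of string concatenation
    ).fetchall()

nutrition_rows = find_chart_events(conn, "nutrition")
```

Passing the keyword as a bound parameter keeps the query text fixed, so the same helper can be reused for each checklist item. A match on the label only shows that a chart item mentions the keyword; it does not prove that the entry was recorded as part of the checklist.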

Unfortunately, the results of this step did not show all the information related to each of the checklist’s items. Not only did we fail to get data from the database for some checklist items (such as the item that checks if the patient should receive nutrition, the item checking the CAM‐ICU of the patient etc.), but for some other checklist items we were also not sure that the retrieved information was fully relevant to the corresponding item (such as the item checking for skin impairments, the item checking for sedation etc.). This uncertainty was due to the way we searched the database, namely by using terms found in the checklist items’ descriptions as keywords in our SQL queries. The main focus here was to find which of the patient’s charts is related to which item of the checklist.

Based on the results of this phase, we concluded that we cannot do process mining on the checklist process in the MIMICII database because we could not find in the database most of the information related to the checklist process. So, without the logged data for the checklist process we cannot conduct process mining in order to check if the checklist was applied correctly in ICU. The reasons why we were not able to find the checklist data in the database are mentioned below:

‐ The meaning of some checklist items was not clear to us (e.g. nutrition of the patient) because more medical knowledge was required. Here, we must emphasize that for these items we could not provide an explanation in the context in which they are used in BIDMC’s ICUs. Consequently, we could not identify where these checklist items were saved in the database.
‐ For some other checklist items, even though their meaning was clear to us, it was not possible to distinguish them in the database from other patient data. This is the case when the same parameter checked by a checklist item is also checked for other purposes and events on the same day, so that the database has multiple entries for that parameter, as for instance for the ‘code status’ checklist item. As a result, we cannot know which one of these values corresponds to the checklist item. For instance, the code status of the patient is checked several times and we cannot decide which one of these events of the same type is done as part of the checklist process.
‐ It was also noticed that even when a checklist item is performed, this event is not always saved in the database. For instance, if there is no need for family communication then nothing is saved in the database, even though this item of the checklist was performed, as the nurse already checked whether there was a need to communicate with the patient's family. Moreover, later in our project we were told by MIMICII support staff that there is no requirement for nurses to record the output of every checklist.
‐ We found several possible patient charts in the database that could be relevant to a single checklist item, and consequently we cannot decide which one(s) was (were) used to store the results of this checklist item in the database, as for instance the charts related to the skin impairments check shown in Table 7 in Appendix D.
‐ Some checklist items are not saved in the database in a structured way but are part of notes made by nurses (e.g. disposition of the patient from ICU), and consequently it is too difficult to extract the checklist information from those notes by using simple SQL (Structured Query Language) queries. Text mining¹ could be a solution in this case, but it would require too much extra effort and time given the time limit of this thesis project, and text mining is out of our scope.
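The ambiguity caused by multiple entries of the same parameter on one day can be made visible with a simple count per patient and day. The sketch below is a minimal illustration on an in‐memory SQLite table; the table chart_entries, its columns and its rows are hypothetical and only mimic the situation described for the ‘code status’ item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical chart entries; not real MIMICII data.
CREATE TABLE chart_entries (subject_id INTEGER, label TEXT, charttime TEXT);
INSERT INTO chart_entries VALUES
  (100, 'Code Status', '2005-03-01 07:30'),
  (100, 'Code Status', '2005-03-01 15:10'),
  (100, 'Code Status', '2005-03-02 07:45'),
  (101, 'Code Status', '2005-03-01 08:00');
""")

# Patient-days with more than one code status entry: for these days it is
# unclear which entry, if any, belongs to the daily checklist.
ambiguous_days = conn.execute("""
SELECT subject_id, substr(charttime, 1, 10) AS day, COUNT(*) AS n
FROM chart_entries
WHERE label = 'Code Status'
GROUP BY subject_id, day
HAVING COUNT(*) > 1
ORDER BY subject_id, day
""").fetchall()
```

A count like this only flags the days on which the entries cannot be told apart; it cannot decide which entry was made during the checklist.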

In summary, we have noticed that in the MIMICII database we cannot identify the checklist process as a whole: we could find data which can be related to only nine of its items. In the database we found information about the nutrition of the patient, GI prophylaxis, current sedation of the patient, skin impairments of the patient, IV access, consulting teams following the patient, family communication, code status of the patient and disposition of the patient from ICU. Moreover, this information cannot be related to any instance of the checklist process because we did not find instances of the checklist process at all. As a result, we cannot claim with certainty whether the information found in the database for some specific checklist items is related to the checklist process or not (because it can be related to other processes and events).

4.3.2 ST‐elevation

In this subsection, we describe the application of phase ‘Look for process(es) data’ by explaining our attempt to find ST‐elevation treatment process data in MIMICII database.

Firstly, we tried to get some knowledge about these processes from guidelines and online medical resources, so that we know what to search for in the database. Also, we asked ICU physicians from BIDMC hospital to explain how these processes are run in their hospital. From this investigation we found out, among other things, that the ST‐elevation treatment process starts in the emergency department and that the interventions are performed in the cardiac catheterization suite. Then, the patient can be sent to intensive care to recover. So, different information systems are used to record ST‐elevation treatment data.

1 In a manner analogous to data mining, text mining seeks to extract useful information from data sources through the identification and exploration of interesting patterns [11]. In the case of text mining, the data sources are document collections, and interesting patterns are not found among formalized database records but in unstructured textual data in the documents in these collections [11].

As a second step, we searched for ST‐elevation treatment process data in MIMICII by first selecting the patients diagnosed with ST‐elevation myocardial infarction (STEMI). It was possible to distinguish such patients in the database based on their recorded diagnosis information (table ICD9, shown in Figure 17 of Appendix A‐ MIMICII tables and their relationships, stores the diagnoses of ICU patients for each of their ICU admissions), but we could not find all the data related to these treatment processes because those data were not in MIMICII. The data related to the first part of the process (the data from the emergency department and the cardiac catheterization suite) are not captured in MIMICII and, as a result, we could not find the full processes for the management of ST‐elevation.

Furthermore, patients diagnosed with STEMI in MIMICII had several different diagnoses during their hospitalization in the ICU. Of the 1499 patients in total diagnosed with STEMI, 1340 have five or more different diagnoses at the same time during their ICU stay. Also, in this group of patients, 19 patients have only two different diagnoses and there is no patient with only one diagnosis during their ICU stay. Consequently, it is difficult to link the events that happened to a patient with treatment processes. For example, without medical knowledge it is not possible to know to which treatment process a medicine belongs when it is administered to a patient who has three different diagnoses at the same time and is being treated for all of them.
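
The kind of query behind such counts can be sketched as follows. This is only an illustration under stated assumptions: the in‐memory table is a simplified stand‐in for the MIMICII table ICD9, its column names (SUBJECT_ID, HADM_ID, CODE) and all rows are invented for the example, and the STEMI filter uses the ICD‐9 codes that this thesis lists for STEMI (410.0 to 410.6 and 410.8). The function name is hypothetical.

```python
import sqlite3

# Simplified, hypothetical stand-in for the MIMICII ICD9 table. The column
# names and all rows are assumptions made for this sketch, not MIMICII data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ICD9 (SUBJECT_ID INTEGER, HADM_ID INTEGER, CODE TEXT)")
conn.executemany(
    "INSERT INTO ICD9 VALUES (?, ?, ?)",
    [
        (1, 10, "410.11"), (1, 10, "428.0"), (1, 10, "584.9"),
        (2, 20, "410.41"), (2, 20, "401.9"),
        (3, 30, "486"),
    ],
)

# Code prefixes treated as STEMI in this sketch (410.0-410.6 and 410.8).
STEMI_PREFIXES = ["410.0", "410.1", "410.2", "410.3",
                  "410.4", "410.5", "410.6", "410.8"]


def diagnoses_per_stemi_patient(connection):
    """Map each STEMI patient to the number of distinct diagnosis codes."""
    placeholders = ", ".join("?" for _ in STEMI_PREFIXES)
    query = f"""
        SELECT SUBJECT_ID, COUNT(DISTINCT CODE)
        FROM ICD9
        WHERE SUBJECT_ID IN (
            SELECT SUBJECT_ID FROM ICD9
            WHERE substr(CODE, 1, 5) IN ({placeholders})
        )
        GROUP BY SUBJECT_ID
    """
    return dict(connection.execute(query, STEMI_PREFIXES).fetchall())


counts = diagnoses_per_stemi_patient(conn)
print(counts)
```

In this toy data, patients 1 and 2 are selected as STEMI patients and both carry more than one diagnosis, which is exactly the situation that makes it hard to attribute an event to a single treatment process.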

In addition, we could have done it the other way around: checking whether all steps described in the STEMI treatment guideline were present in the data of the patients diagnosed with STEMI. However, this requires having the STEMI treatment guidelines from BIDMC and knowing exactly which part of the treatment process is done in the ICU. In our case, we could not obtain the STEMI treatment guidelines from BIDMC.

To sum up, we could distinguish patients with STEMI, but we could not obtain all the data about the STEMI treatment process from MIMICII, because parts of this process are not done in ICUs (but in other departments of the hospital) and, as a result, data related to these parts are not recorded in the MIMICII database. Also, we did not know which parts of this process were done in ICUs, and patients with STEMI have multiple diagnoses at the same time. Consequently, we could not identify the data about the ST‐elevation treatment process in MIMICII.

4.4 Assess process(es) data.

After searching for data of the selected processes (checklist process and STEMI treatment process) in the phase ‘Look for process(es) data’ of our approach (described in section 4.3), we realized that we cannot apply process mining techniques on these processes’ data because we could not obtain enough of these data.

So, we have decided that we cannot continue with applying process mining techniques on the found process data because they lack the minimum required data needed for process mining. These data do not include information based on which we can build the sequence of events of these processes. A complete description of the problems related to these data is presented in the next phase of our approach ‘Identify potential issues’.

In summary, for our selected processes, the phase ‘Apply process mining techniques’ of our approach is never done and the next phase after the phase ‘Assess process(es) data’ is always the phase ‘Identify potential issues’. The application of this last phase is explained in the next section.

4.5 Issues and challenges.

This section describes the application of phase ‘Identify potential issues’ of our approach (described in section 3.6) in which we summarize different issues faced in phase ‘Look for process(es) data’ of our approach. Mainly, these issues are related to the acquiring of the process data that will be mined from the database. So, in other words, they are related to providing the process logged data in order to create the event log file that will be used by process mining. These issues are grouped as below:

1. No relationship between the process and its data in the database.
2. Matching process guidelines with the process information found in the database.
3. Unstructured process data in the database.
4. Missing process data in the database.

The above mentioned issues are elaborated further in the following subsections.

4.5.1 No relationship between the process and its data in the database.

One of the issues faced in phase ‘Look for process(es) data’ of our approach (described in section 4.3) concerns the relationship between the process and its events’ data in the database. We were not able to identify this relationship in the database (MIMICII) for our selected processes (checklist and ST‐elevation treatment processes) and, as a result, we are not sure whether these event data are really related to those processes. It is important to mention here that MIMICII already has information about different events that happened to the patients. This information includes, among other things, the type of the event, the patient to whom the event happened, timestamps that show when the event was done, the caregiver who did it etc. So the basic information needed for process mining about events is present in the database, but we cannot relate the events to the process they belong to.

So, for the checklist process we found some data that we considered relevant to it based on the meaning and terminology (as explained in subsection 4.3.1.2), but we never had 100% confidence that these data were related to the checklist process. This happened because we could not find a link between the data found in MIMICII and the checklist process. Consequently, we could not exclude the possibility that these data are related to other processes. Also, the terminology used in the database was not always exactly the same as the one found in the checklist description. This leads to uncertainty in cases where we are not able to map data to processes because we are not sure whether the data belong to the process or not.

Moreover, the same problem was encountered for the STEMI treatment process, because we could not identify a link between the STEMI treatment process and the events that happened to a patient.

This last process is different from the checklist, because to get its data we can use the patient diagnosis information. So, we could select the data of patients diagnosed with STEMI and consequently consider these data relevant to the STEMI treatment process. Unfortunately, this does not solve the problem because, as mentioned in subsection 4.3.2, patients with STEMI have several different diagnoses at the same time. As a result, they are being treated simultaneously for all of their diseases and we (who are not experts in the healthcare domain) could not distinguish the patient data related to the STEMI treatment from the rest of the patient's data (which are also about his/her other treatments).

To sum up, we lack confidence even in the partial process‐related information found in MIMICII, because it was unclear to which process this information was related.

4.5.2 Matching process guidelines with the process information found in the database

Understanding the guideline that describes the process to be mined, as well as the meaning of database items, is another issue faced in phase ‘Look for process(es) data’ of our approach (described in section 4.3). This is especially important when we need to select the process data in a database with various types of data (including data related to many other processes). Moreover, this understanding is even more challenging when we are dealing with healthcare processes because of the domain knowledge required.

For instance, we tried to select the checklist process data among the data of a few randomly chosen patients during their last admission in the ICU, but we did not succeed because it was difficult to distinguish checklist data in the large and varied data sets of these patients recorded in the database. So, we had to get some general information about the medical meaning of the checklist items before searching for their data in the database, but this was not straightforward. For some checklist items, the terminology used in the database differed from their description in the checklist guideline document and, as a result, we could not match the information from the guideline with the data in the database. In such cases, asking physicians can help, but they are not always the solution to this problem: they can explain the medical meaning of the process activities, but they might not be aware of the terminology that is used to save the process data in the database.

So, in cases of ICU databases (which have a great variety of data), understanding guidelines information about ICU processes and matching it with the corresponding process data in those databases is not an easy task because different terminology can be used.

4.5.3 Unstructured process data in the database.

Here, we explain the cases when we could find some process data in phase ‘Look for process(es) data’ of our approach (described in section 4.3) but we could not use them because they were unstructured.

For instance, we noticed that some of the checklist related data are saved in the database as part of the nurses’ and doctors’ notes (e.g. disposition of the patient from ICU). Extracting this information is challenging because we would need other mining techniques (e.g. text mining) and not just process mining. Process mining techniques cannot mine the text notes in order to extract information about what event happened, when it happened, who performed it and other information that can be related to the event. So, these text notes represent unstructured data that process mining cannot use.

4.5.4 Missing process data in the database.

The last issue that we have encountered in phase ‘Look for process(es) data’ of our approach (described in section 4.3), is about missing process data in the database that was available to us (MIMICII database). So, these data either were not recorded at all or they were recorded in other systems/databases that were not part of our source of data (MIMICII database).

For instance, in the case of the checklist process discussed in subsection 4.3.1, there are checklist items for which we could not find any information in the database (e.g. the CAM‐ICU checklist item). We also could not find data about some checklist items that are formulated in the checklist document as questions asking for suggestions, because we could not find their answers in the MIMICII database. An example is the check item ‘Should the patient retrieve nutrition?’. We could identify in the database whether the patient was getting nutrition or not, but we could not find data about whether he/she should or should not retrieve nutrition. Other examples of missing data for the checklist process are given in subsection 4.3.1.2.

Also, parts of the data related to the STEMI treatment process performed on ICU patients are missing because they are spread over multiple systems and not all the data from these systems are present in MIMICII. As mentioned in subsection 4.3.2, this treatment process is not run completely in the ICUs of the hospital: it is started in the emergency room and some interventions are performed in the cardiac catheterization suite.

Finally, it is important to have complete logged data about processes in order to get useful results from mining them. If some activities of the process are done but no information about them is recorded in the database (as was the case for some checklist items in the checklist process explained in subsection 4.3.1.2), then the mined process will not reflect reality correctly. It will have fewer activities than the real process executed in practice.

4.6 Conclusions

In this section, we summarize our attempt to apply process mining in the MIMICII database.

Our goal in this case study was to apply process mining on one or several ICU processes in order to check whether these processes are done in compliance with the guidelines that describe them. Firstly, we considered three processes to look at: the checklist, the ST‐elevation treatment pathways and the antibiotics administration processes.

Then, as described in section 4.3, we searched the MIMICII database for data related to only two of these processes: the checklist and the ST‐elevation treatment pathways. We decided to put aside the antibiotics administration processes because, as explained in subsection 4.2.3, they were not real processes. During this phase we faced several issues that made our attempt to get these processes’ data fail, thereby confirming that process mining is impossible without proper event logs [1:95]. As a result, we never conducted the phase ‘Apply process mining techniques’ of our approach.

Mainly, as explained in section 4.5, these issues are about no clear relationship between the process and its data in the database, difficulties in matching process guidelines with the process information found in the database and unstructured and missing process data in the database. We have encountered almost all of these problems for both selected processes (the checklist process and the ST‐elevation treatment process).

Furthermore, these issues seem to be general problems in providing process data. Rebuge and Ferreira have mentioned that one of the challenges of healthcare processes is that they are increasingly executed as a set of activities distributed over different departments, making it difficult to get an integrated view as data are found in multiple information systems [38]. Also, Van der Aalst says that event data is typically scattered over different data sources and that often considerable effort is needed to collect the relevant data [1:95–96]. In addition, he confirms that in many situations the process data is unstructured or important metadata describing the data sources is missing [1:96].

So, in the next chapter we give solutions to these issues in the form of a database design and procedural design.

5 Design

In section 4.5 of the previous chapter, we have identified several issues when trying to mine some ICU processes in the MIMICII database. These issues were encountered because the database structure is not process oriented and the ICU way of working is not directed towards logging process data. Firstly, in this chapter, we suggest technical (database design) solutions and procedural solutions (on how to provide the process data to the database design) to those issues in section 5.1. Then, in section 5.2, we discuss these solutions in order to highlight their strong and weak points.

5.1 Solutions

Doing process mining for whatever purpose requires process related data, but what data we need depends on the questions we want to answer by means of process mining.

In our case, we want to use process mining on ICU clinical processes in order to check whether these processes are done in conformance with the guidelines/protocols that describe them, but we faced problems in obtaining the needed process data and in having them in a structured form (as described in section 4.5). By structured process data, we mean that they are stored in a database (in our case the MIMICII database) in a structured way so that they can be easily connected to the process itself. In other words, we are able to identify process instances (or cases) in those data.

So, in this section we describe the application of phase ‘Design solution’ of our approach (described in section 3.7). Here, we suggest solutions to the issues listed in section 4.5 but before doing so, we have assumed that:

1. There are guidelines or protocols in a hospital that document checklist processes and treatment processes related to a certain diagnosis.
2. There is a clinical information system that is used by ICU nurses/physicians for entering patient data, for example the medications supplied to him/her, the laboratory tests done to him/her, the fluids supplied to him/her etc. So, the ICU clinical information system is an information system that provides electronic medical records for the patients hospitalized in intensive care.

Firstly, in subsection 5.1.1, we propose a new design schema for MIMICII database as a solution to issues faced in our attempt to do process mining in this database (described in section 4.5), and discuss its generalizability. Then, in subsection 5.1.2, we give a highly abstract ICU database design schema suitable for saving process data of critical care units of hospitals. Finally, in subsection 5.1.3, we give general procedural suggestions on how to provide the process data to the database schemas (the ones presented in subsections 5.1.1 and 5.1.2).

5.1.1 A new process oriented version of MIMICII

In this subsection, we propose a new design of the MIMICII database. This new version of MIMICII is suitable for storing the data of different treatment processes and checklist processes in a structured way, so that it is easy to distinguish in the database which data are related to which processes. As a result, it can help considerably in the phase of extracting the processes’ data that are used to create the log file to be mined by process mining techniques. This new MIMICII database schema is shown in Figure 8 and Figure 9.

Moreover, this schema fulfills the design criteria mentioned in subsection 3.7. This schema is not a full schema of MIMICII database but it just shows the changes that should be done to its existing schema (see Figure 5 in section 4.1.3 for the existing schema of MIMICII). So, it shows only the existing tables that are changed and the ones that are added (we do not show the tables of MIMICII that do not change). Also, it shows only the added new relationships between tables of this schema. So, the rest of relationships between the existing tables (not the new ones) of MIMICII are shown in Figure 5.

The new schema is shown in two figures (Figure 8 and Figure 9) only to make the relationships between its tables visually clearer. So, Figure 8 shows the relationships between table ‘PROCESS_OCCURRENCE’ and the other tables of the schema, and Figure 9 shows the relationships between table ‘EVENTITEMS’ and the other tables of the schema.

Figure 8‐ New schema of MIMICII‐ part 1

Figure 9‐ New schema of MIMICII‐ part 2

In addition, in the proposed new schema of MIMICII we identify the patient and his stay at ICU in the same way as in its existing schema (as explained in section 4.1.3). So, the patient is identified by the field ‘SUBJECT_ID’ and his stay at ICU is identified by the combination of fields ‘SUBJECT_ID’ and ‘ICUSTAY_ID’ or ‘HADM_ID’ and ‘SUBJECT_ID’ in each of the schemas’ tables where these fields are present.

In Figure 8 and Figure 9, the lines connecting tables show the relationships between these tables and the cardinality of these relationships is shown by the text typed in the start and end of the line. Meanwhile, the tables are presented graphically as rectangles and the text written in the top of each of them shows the table name. The tables’ names typed in red color are new tables added to the existing database schema. Moreover, in the tables of the schema, only the new added fields and some of the existing fields are shown. The shown fields are used to relate them with other tables in the new schema. So, their other existing fields are not shown specifically but they are presented by the field named ‘…(other existing fields)’. This is done because it is not necessary to show them (as they are not affected by the new schema) and to make the new schema more readable and visually clearer. For complete information about the fields of these tables, we can refer to Figure 16 and Figure 18 in Appendix A‐ MIMICII tables and their relationships.

So, three new tables named ‘PROCESSITEMS’, ‘PROCESS_OCCURRENCE’ and ‘EVENTITEMS’ are added to the existing schema of the MIMICII database and their structure is shown in Figure 10, Figure 11 and Figure 12 respectively. These figures specify which fields the table contains (column ‘COLUMN_NAME’), whether a field can have NULL values (column ‘NULLABLE’) and a short description of the field (column ‘COMMENTS’). In the ‘NULLABLE’ column, the symbol ‘Y’ (short form for ‘Yes’) shows that the field can be empty or have NULL values and ‘N’ (short form for ‘No’) shows that the field must always have a value different from NULL.

The first new table in the new MIMICII schema is table ‘PROCESSITEMS’, which stores a predefined list of treatment processes and checklist processes. The structure of this table is shown in Figure 10 and it has three fields: ‘PROCESS_ID’ to uniquely identify the process in the database, ‘NAME’ to store the name of the process and ‘DIAGNOSIS’ to link processes with a certain diagnosis (e.g. treatment processes). The field ‘DIAGNOSIS’ can be specified in the table by using the ICD‐9 codes or DRGs code¹ because these codes are already used in MIMICII to record patients’ diagnoses.

In Figure 10, it is also shown that the fields named ‘PROCESS_ID’ and ‘NAME’ are always required to have a value but the field ‘DIAGNOSIS’ can have NULL values (in records that are about processes not related to specific diagnoses).

COLUMN_NAME   NULLABLE   COMMENTS
PROCESS_ID    N          Table record unique identifier, the process ID
NAME          N          Name of the process
DIAGNOSIS     Y          ICD‐9 code or DRGs code

Figure 10‐ Table PROCESSITEMS

Examples of records of this table for the checklist process (the first record of the table) and the STEMI treatment process (the second record of the table) are shown in Table 2. The record for the checklist process has the value ‘1’ for the field ‘PROCESS_ID’, the value ‘Daily checklist’ for the field ‘NAME’ and the value NULL for ‘DIAGNOSIS’ (because the daily checklist is done for every patient regardless of their diagnosis). Meanwhile, the record for the STEMI treatment process has the value ‘2’ for the field ‘PROCESS_ID’, the value ‘STEMI treatment’ for the field ‘NAME’ and the value 410 (one of the ICD‐9 codes for STEMI) for ‘DIAGNOSIS’. The ICD‐9 codes in the range 410.0‐410.6 and the value 410.8 are different codes used for STEMI, depending on the wall of the heart in which the myocardial infarction occurred [56].

PROCESS_ID   NAME              DIAGNOSIS
1            Daily checklist   NULL
2            STEMI treatment   410

Table 2‐ PROCESSITEMS table with two records (the first one for the checklist process and the second one for the STEMI treatment process)
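
A minimal sketch of how table ‘PROCESSITEMS’ could be declared and filled with the two records of Table 2 is given here. It uses an in‐memory SQLite database purely for illustration; the column types (INTEGER, TEXT) are assumptions, since Figure 10 only specifies the field names and whether they are nullable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE PROCESSITEMS (
        PROCESS_ID INTEGER PRIMARY KEY,   -- NULLABLE = N
        NAME       TEXT NOT NULL,         -- NULLABLE = N
        DIAGNOSIS  TEXT                   -- NULLABLE = Y (ICD-9 or DRG code)
    )
    """
)
conn.executemany(
    "INSERT INTO PROCESSITEMS (PROCESS_ID, NAME, DIAGNOSIS) VALUES (?, ?, ?)",
    [(1, "Daily checklist", None), (2, "STEMI treatment", "410")],
)

# Processes that are not tied to a specific diagnosis have DIAGNOSIS = NULL.
generic_processes = [
    name
    for (name,) in conn.execute(
        "SELECT NAME FROM PROCESSITEMS WHERE DIAGNOSIS IS NULL ORDER BY PROCESS_ID"
    )
]
print(generic_processes)
```

The query returns only the daily checklist, which is the one process in Table 2 that applies to every patient regardless of diagnosis.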

The second table that we have added in the new design of the MIMICII database is table ‘PROCESS_OCCURRENCE’. This table records the data which show when a process has started for a certain patient during his/her stay in the intensive care unit, and each of its records is uniquely identified by the field ‘PROCESS_INSTANCE_ID’. Here, the patient is identified by the field ‘SUBJECT_ID’ and his/her stays at the ICU are identified by the combination of fields ‘HADM_ID’ and ‘ICUSTAY_ID’. These fields are already being used in MIMICII for the same purpose in many tables, as mentioned in subsection 4.1.3. Meanwhile, the fields ‘PROCESS_ID’ and ‘STARTDATETIME’ are used respectively to show the process and the date and time when this process has started. All the fields of this table are always required to have a value different from NULL (as the values of column ‘NULLABLE’ in Figure 11 show).

1 Diagnosis Related Groups (DRGs): A classification system that groups patients according to principal diagnosis, presence of a surgical procedure, age, presence or absence of significant comorbidities or complications, and other relevant criteria [46]

COLUMN_NAME           NULLABLE   COMMENTS
PROCESS_INSTANCE_ID   N          The unique identifier of the process instance
SUBJECT_ID            N          The unique identifier of the patient
HADM_ID               N          The unique identifier of the hospital admission
ICUSTAY_ID            N          The unique identifier of the ICU stay of the patient
PROCESS_ID            N          The unique identifier of the process
STARTDATETIME         N          The date and time when the process started

Figure 11‐ Table PROCESS_OCCURRENCE.

As shown in Figure 8, table ‘PROCESS_OCCURRENCE’ is linked with the table ‘PROCESSITEMS’ through the field ‘PROCESS_ID’. Based on the cardinality of this relationship, each record of table ‘PROCESS_OCCURRENCE’ is related to exactly one record of table ‘PROCESSITEMS’, meaning that a process instance can be related to only one process. On the other hand, each record of table ‘PROCESSITEMS’ can be related to zero or more records of table ‘PROCESS_OCCURRENCE’ (shown graphically as ‘0..*’), which means that a process may never happen or may happen once or more. Also, table ‘PROCESS_OCCURRENCE’ is related to table ‘D_PATIENTS’ based on their common field ‘SUBJECT_ID’. This relationship shows that a process instance is related to only one patient and that a patient can be linked with no process instance or with several process instances, of the same process or of different processes.

In order to be clearer about the data that this table stores, we give four sample records of table ‘PROCESS_OCCURRENCE’ in Table 3. So, in Table 3, we have records related to processes started for two different patients (one with ‘SUBJECT_ID’ equal to 16404 and the other with ‘SUBJECT_ID’ equal to 16405) during their hospitalization in the ICU (identified by the values of columns ‘HADM_ID’ and ‘ICUSTAY_ID’). Here, we notice that the patient with ‘SUBJECT_ID’ equal to 16404 is involved in three process instances: two of them are instances of the checklist process (‘PROCESS_ID’ equal to 1 corresponds to the ‘Daily checklist’ process according to the sample records given in Table 2) and one is an instance of the STEMI treatment process (‘PROCESS_ID’ equal to 2 corresponds to the ‘STEMI treatment’ process according to the sample records given in Table 2). So, the checklist has been done twice for this patient, on 03/25/2012 08:00 and 03/26/2012 08:10. Meanwhile, for the patient with ‘SUBJECT_ID’ equal to 16405, the checklist process is done once, on 03/26/2012 08:30, and this patient is not involved in other processes. Each of these four process instances has a unique value in field ‘PROCESS_INSTANCE_ID’ that distinguishes it from the other process instances in the database.

PROCESS_INSTANCE_ID   SUBJECT_ID   HADM_ID   ICUSTAY_ID   PROCESS_ID   STARTDATETIME
1                     16404        20387     236          2            03/24/2012 05:00
2                     16404        20387     236          1            03/25/2012 08:00
3                     16404        20387     236          1            03/26/2012 08:10
4                     16405        20100     300          1            03/26/2012 08:30

Table 3‐ Table PROCESS_OCCURRENCE with four sample records.
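
The relationship between ‘PROCESS_OCCURRENCE’ and ‘PROCESSITEMS’ and the sample records of Table 3 can be sketched as follows. This is an illustration under assumptions: an in‐memory SQLite database is used, the column types are invented, timestamps are stored as ISO‐formatted text, and the link to table ‘D_PATIENTS’ is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript(
    """
    CREATE TABLE PROCESSITEMS (
        PROCESS_ID INTEGER PRIMARY KEY,
        NAME       TEXT NOT NULL,
        DIAGNOSIS  TEXT
    );
    CREATE TABLE PROCESS_OCCURRENCE (
        PROCESS_INSTANCE_ID INTEGER PRIMARY KEY,
        SUBJECT_ID          INTEGER NOT NULL,
        HADM_ID             INTEGER NOT NULL,
        ICUSTAY_ID          INTEGER NOT NULL,
        PROCESS_ID          INTEGER NOT NULL REFERENCES PROCESSITEMS (PROCESS_ID),
        STARTDATETIME       TEXT NOT NULL
    );
    INSERT INTO PROCESSITEMS VALUES
        (1, 'Daily checklist', NULL),
        (2, 'STEMI treatment', '410');
    INSERT INTO PROCESS_OCCURRENCE VALUES
        (1, 16404, 20387, 236, 2, '2012-03-24 05:00'),
        (2, 16404, 20387, 236, 1, '2012-03-25 08:00'),
        (3, 16404, 20387, 236, 1, '2012-03-26 08:10'),
        (4, 16405, 20100, 300, 1, '2012-03-26 08:30');
    """
)

# Number of process instances per patient and process.
instances_per_patient = conn.execute(
    """
    SELECT o.SUBJECT_ID, p.NAME, COUNT(*)
    FROM PROCESS_OCCURRENCE AS o
    JOIN PROCESSITEMS AS p ON p.PROCESS_ID = o.PROCESS_ID
    GROUP BY o.SUBJECT_ID, p.NAME
    ORDER BY o.SUBJECT_ID, p.NAME
    """
).fetchall()
print(instances_per_patient)

# A process instance must refer to exactly one known process:
# PROCESS_ID 99 does not exist in PROCESSITEMS, so the insert is rejected.
try:
    conn.execute(
        "INSERT INTO PROCESS_OCCURRENCE VALUES (5, 16405, 20100, 300, 99, '2012-03-27 08:00')"
    )
    unknown_process_rejected = False
except sqlite3.IntegrityError:
    unknown_process_rejected = True
print(unknown_process_rejected)
```

The grouped query reproduces the reading of Table 3 given above: two checklist instances and one STEMI treatment instance for patient 16404, and one checklist instance for patient 16405.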

The last table added in the new MIMICII database schema is the table named ‘EVENTITEMS’. It records a list of predefined activities/events that are done as part of ICU treatment or checklist processes. Actually, MIMICII already contains some structured information about events, as it stores different types of events in different tables (such as MEDEVENTS, LABEVENTS, PROCEDUREEVENTS etc.) and it also has a list of predefined patient charts (used when different chart events are stored in table ‘CHARTEVENTS’). However, we are not sure whether these events are named according to the terminology that is usually used in medical guidelines. So, we propose to have a list of all possible events centralized in the new table ‘EVENTITEMS’.

For each event stored in this table, there is an ID (stored in field ‘EVENT_ID’) that uniquely identifies the event in the database and a name (stored in field ‘NAME’). The name of events should be standardized so that they match the terminology of different treatment guidelines or checklists guidelines. Both fields of this table should not have NULL values.

COLUMN_NAME   NULLABLE   COMMENTS
EVENT_ID      N          Table record unique identifier, the event ID
NAME          N          Name of the event

Figure 12‐ Table EVENTITEMS

In Table 4, two sample records of table ‘EVENTITEMS’ are shown. The first record corresponds to one of the daily checklist events (discussed in subsection 4.3.1.1), which is about checking the patient’s content of consciousness. Meanwhile, the second record is about the event of administering medication to the patient via IV access.

EVENT_ID   NAME
1          CAM‐ICU
2          Medication administration via IV access

Table 4‐ EVENTITEMS table with two sample records

Table ‘EVENTITEMS’ is related to other tables in the new schema (as shown in Figure 9). These relationships are explained later in this subsection.

Furthermore in the new schema shown in Figure 8 and Figure 9, every table of MIMICII that records the events done to a patient (as part of treatment processes or checklist processes) will have two new fields named ‘EVENT_ID’ and ‘PROCESS_INSTANCE_ID’. These tables are some of the existing tables of MIMICII which are listed below:

‐ MEDEVENTS (where medications given to patients are stored),
‐ IOEVENTS (where fluid input/output events done to patients are stored),
‐ TOTALBALEVENTS (where patients’ total fluid balance events are stored),
‐ LABEVENTS (where laboratory tests done to patients are stored),
‐ MICROBIOLOGYEVENTS (where events indicating microbiology tests taken from patients are stored),
‐ PROCEDUREEVENTS (where events indicating procedures performed on patients are stored),
‐ CHARTEVENTS (where events which occur on patient charts are stored).

The information that these tables already include is, among others, the patient to whom the event was done, the date and time when the event happened, the caregiver who performed the event etc. In other words, all we need to add to make this information usable for process mining is the standardization of the event types (according to the list of events stored in table ‘EVENTITEMS’) and the link between events and process instances. This link cannot be achieved by using only information about patient diagnosis and this issue is discussed further in Appendix C.

So, these tables keep all the information they already have in the existing MIMICII database structure but in the new schema, they will contain also the ID of the events and the ID of the process instances to which the events are related which are recorded respectively in fields ‘EVENT_ID’ and ‘PROCESS_INSTANCE_ID’. In these tables, values of field ‘EVENT_ID’ refer to the list of event IDs stored in table ‘EVENTITEMS’ and values of field ‘PROCESS_INSTANCE_ID’ refer to the list of process instance IDs stored in table ‘PROCESS_OCCURRENCE’.

So, the event tables (such as MEDEVENTS, LABEVENTS etc.) are related to the table ‘EVENTITEMS’ by their common field ‘EVENT_ID’. Based on the cardinality of these relationships shown in Figure 9, each record of the event tables is related to exactly one record of table ’EVENTITEMS’ and on the other hand, each record of table ‘EVENTITEMS’ is related to zero or more records of each of the event tables. This means that the occurrence of the event (stored in the event tables) is related to only one event in the list of events stored in table ‘EVENTITEMS’ and it can happen that each event of this list does not occur at all or it occurs once or more times (to the same patient or not).

Furthermore, the event tables are related to table ‘PROCESS_OCCURRENCE’ by their common field ‘PROCESS_INSTANCE_ID’. In Figure 8, the cardinality of these relationships shows that every record of each of the event tables can be linked with one record of table ‘PROCESS_OCCURRENCE’ or cannot be linked at all to table ‘PROCESS_OCCURRENCE’. This means that if an event happened to a patient then this event can be part of one process instance or it is not related to any process. Moreover, if the same event is part of some different process instances then its data are stored in new records for each of the process instances automatically. On the other hand, each record of table ‘PROCESS_OCCURRENCE’ is related to zero or more records of each of the event tables. This means that it can happen that a process instance has no events of a certain type but also it can have one or more events (of the same or different types).
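
To illustrate how the two new fields make the event tables usable for process mining, the following sketch extracts a simple event log (case, activity, timestamp) from two event tables. It is a sketch under assumptions, not the MIMICII structure: the event tables are reduced to four columns, the column ‘CHARTTIME’ is assumed as the timestamp field of both tables, an in‐memory SQLite database is used, and all event rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EVENTITEMS (EVENT_ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
    -- Heavily simplified event tables: only the columns needed for the sketch.
    CREATE TABLE CHARTEVENTS (
        SUBJECT_ID INTEGER NOT NULL, CHARTTIME TEXT NOT NULL,
        EVENT_ID INTEGER NOT NULL, PROCESS_INSTANCE_ID INTEGER
    );
    CREATE TABLE MEDEVENTS (
        SUBJECT_ID INTEGER NOT NULL, CHARTTIME TEXT NOT NULL,
        EVENT_ID INTEGER NOT NULL, PROCESS_INSTANCE_ID INTEGER
    );
    INSERT INTO EVENTITEMS VALUES
        (1, 'CAM-ICU'),
        (2, 'Medication administration via IV access');
    -- Invented rows: the second chart event belongs to no process instance.
    INSERT INTO CHARTEVENTS VALUES
        (16404, '2012-03-25 08:05', 1, 2),
        (16404, '2012-03-25 14:00', 1, NULL);
    INSERT INTO MEDEVENTS VALUES
        (16404, '2012-03-24 05:30', 2, 1);
    """
)

# One row per event: process instance (case), standardized activity name, timestamp.
event_log = conn.execute(
    """
    SELECT e.PROCESS_INSTANCE_ID, i.NAME, e.CHARTTIME
    FROM (
        SELECT PROCESS_INSTANCE_ID, EVENT_ID, CHARTTIME FROM CHARTEVENTS
        UNION ALL
        SELECT PROCESS_INSTANCE_ID, EVENT_ID, CHARTTIME FROM MEDEVENTS
    ) AS e
    JOIN EVENTITEMS AS i ON i.EVENT_ID = e.EVENT_ID
    WHERE e.PROCESS_INSTANCE_ID IS NOT NULL
    ORDER BY e.PROCESS_INSTANCE_ID, e.CHARTTIME
    """
).fetchall()
for row in event_log:
    print(row)
```

Events whose ‘PROCESS_INSTANCE_ID’ is NULL are left out of the log, which corresponds to the case described above of an event that is not related to any process.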


Storing the data of an event that is part of several process instances in this way is not optimal, because it results in data redundancy: the same data about the same event occurrence are stored in a separate record for each process instance that the event is part of. We chose this design because the event tables do not have a primary key that could be used to link an event occurrence (stored in one of the event tables) to the process instances it belongs to in a separate table, which would avoid the redundancy. The timestamps of the records in the event tables could serve as primary keys of these tables, alone or in combination with other fields, but we are not sure whether they uniquely identify the records. In other words, our proposed new MIMICII schema shows the data needed to link event occurrences to the process instances they are part of, and we leave it to the MIMICII database administrators to decide how to avoid this data redundancy.

In addition to the previous explanations about the event tables in the new schema, we give an example of how data can be stored in these tables. Table 5 shows how one of the checklist items (an event of the checklist process) can be stored in the new version of table ‘CHARTEVENTS’. The first column of Table 5 (named ‘FIELD_NAME’) shows the fields of table ‘CHARTEVENTS’, the second column (named ‘VALUE’) shows the value of the sample record for each field, and the third column (named ‘COMMENT’) gives a short explanation of the meaning of each field. In Table 5, the field typed in red (‘PROCESS_INSTANCE_ID’) is a new field added to the existing version of table ‘CHARTEVENTS’ of MIMICII. This checklist item is a chart event recorded for the patient during his/her ICU hospitalization to check for impaired skin as part of the checklist process (see the value of field ‘PROCESS_INSTANCE_ID’ in Table 5). If this checklist item were saved in table ‘CHARTEVENTS’ of the existing MIMICII database (all the information in Table 5 except the field typed in red), it would not be clear to which process the event belongs. In contrast, the data stored in the new version of this table (shown in Table 5) link the event to a process instance.

| FIELD_NAME | VALUE | COMMENT |
|---|---|---|
| SUBJECT_ID | 16404 | The patient for the chart event. |
| ICUSTAY_ID | 20387 | The ICU stay for the chart event. |
| ITEM_ID | 374 | The chart item for the event record. In this case it is the ‘Impaired Skin Site#1’ chart item. |
| CHARTTIME | 03/25/2012 08:23 | The time of the chart event. |
| ELEMID | 0 | The element of the chart event. |
| REALTIME | 03/25/2012 09:00 | The real time of the chart event. |
| CGID | 199 | The care giver for the chart event. |
| CUID | 1 | The care unit where the chart event took place. |
| VALUE1 | Leg, Left Lower | The first chart event value. |
| VALUE1NUM | null | The first chart event value (cast to numeric value). |
| VALUE1UOM | null | The units of measurement for the first chart event value. |
| VALUE2 | null | The second chart event value. |
| VALUE2NUM | null | The second chart event value (cast to numeric value). |
| VALUE2UOM | null | The units of measurement for the second chart event value. |
| RESULTSTATUS | null | The result status. |
| STOPPED | NotStopd | Whether or not the event was stopped. |
| PROCESS_INSTANCE_ID | 2 | The process instance item. In this case, it is an instance of the ‘Daily checklist’ process according to the sample records given in Table 2 and Table 3. |

Table 5‐One record saved in table CHARTEVENTS in the new version of MIMICII
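The effect of the added field can be illustrated with a minimal sketch. It assumes an in-memory SQLite database and a simplified ‘CHARTEVENTS’ table that keeps only a subset of the fields of Table 5; the function names are hypothetical, and the timestamp is rewritten in ISO format so that it sorts correctly as text.

```python
import sqlite3


def build_chartevents_demo() -> sqlite3.Connection:
    """Create a simplified CHARTEVENTS table and load the Table 5 record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE CHARTEVENTS (
            SUBJECT_ID          INTEGER,
            ICUSTAY_ID          INTEGER,
            ITEM_ID             INTEGER,
            CHARTTIME           TEXT,
            CGID                INTEGER,
            VALUE1              TEXT,
            PROCESS_INSTANCE_ID INTEGER  -- new field; NULL if the event is part of no process
        )
        """
    )
    # Sample record of Table 5 (subset of fields, timestamp in ISO format).
    conn.execute(
        "INSERT INTO CHARTEVENTS VALUES (?, ?, ?, ?, ?, ?, ?)",
        (16404, 20387, 374, "2012-03-25 08:23", 199, "Leg, Left Lower", 2),
    )
    return conn


def events_of_instance(conn: sqlite3.Connection, process_instance_id: int) -> list:
    """Return the chart events linked to one process instance, ordered by time."""
    rows = conn.execute(
        "SELECT ITEM_ID, CHARTTIME, VALUE1 FROM CHARTEVENTS "
        "WHERE PROCESS_INSTANCE_ID = ? ORDER BY CHARTTIME",
        (process_instance_id,),
    )
    return rows.fetchall()
```

With the existing MIMICII structure such a selection is not possible, because no field tells to which process instance a chart event belongs.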

Other tables of MIMICII, which store information about other events such as:

‐ events for ICU stays of a patient (see Figure 23 in Appendix A‐ MIMICII tables and their relationships for its structure),
‐ events related to ICU transfers of a patient (see Figure 24 in Appendix A‐ MIMICII tables and their relationships for its structure),
‐ events indicating demographic information about a patient (see Figure 26 in Appendix A‐ MIMICII tables and their relationships for its structure),
‐ events related to notes done by nurse/doctor for a patient (see Figure 25 in Appendix A‐ MIMICII tables and their relationships for its structure),
‐ events related to DRGs or ICD‐9 codes of the patients (see table DRGEVENTS and table ICD9 in Figure 17 in Appendix A‐ MIMICII tables and their relationships),

do not need to be changed in the new proposed schema of MIMICII, because we think that their information is not related to events that could be part of treatment or checklist processes. Notes taken by nurses or physicians can be used to store the result of certain activities of a treatment/checklist process, but as mentioned in section 4.5.3 this information cannot be used by process mining techniques.

In such cases, the information in the notes about patient events needs to be structured. We therefore suggest adding new event types (new records) to table ‘EVENTITEMS’ that correspond to the event types described in notes, and recording the occurrences of these events in table ‘CHARTEVENTS’ or in one of the other event tables. As a result, the information about events is structured: once an event happens to a patient, its type is either one of the event types stored in one of the event tables (e.g. a specific lab test done on the patient is always recorded in table ‘LABEVENTS’) or one of the event types stored in table ‘D_CHARTITEMS’.

So, the new proposed schema for MIMICII solves the problems we faced (described in section 4.5) and makes it easy to identify and select the data of a process in order to create an event log that can be mined by process mining techniques. The information on how event logs can be created is shown in Appendix B. But is this new schema of MIMICII general and optimal enough to be used for other ICU databases and different ICU clinical processes (treatment, checklist or diagnosis processes)?

The problems described in section 4.5 are a big challenge for ICU databases in general, because such a database holds a great diversity of data. Identifying process related data in it without proper connections/relationships between those data is not only a nontrivial task, but also carries a great risk of failure. Supposing that other ICU databases have problems regarding clinical ICU process data similar to those of MIMICII, it is important to discuss whether the new MIMICII schema (shown in Figure 8 and Figure 9) is general and optimal enough to be applied to different ICU databases. This discussion is done in the remainder of this subsection.

The new MIMICII database schema structures the occurrences of events in different tables (such as MEDEVENTS, CHARTEVENTS, MICROBIOLOGYEVENTS etc.). On the one hand, this makes it possible to store different types of relevant information for different types of events. On the other hand, it is not general enough, because other ICU databases do not necessarily store their event data in a way similar to MIMICII.

Also, as mentioned earlier in this subsection, the new schema of MIMICII is not optimal because it allows for data redundancy. There is no ID that uniquely identifies the occurrence of an event in the event tables of the MIMICII database, and this is the main cause of data redundancy when mapping event occurrences to process instances.

In addition, the new MIMICII schema allows the list of all ICU treatment and checklist processes, and of the ICU events that are part of these processes, to be structured in a general way. It also stores information about when a process has been initialized. In this respect, we find its structure suitable for storing diagnosis processes too.

To conclude, the new MIMICII schema (shown in Figure 8 and Figure 9) gives general information on how to store process data, but it is still not general and optimal enough to be applied to other ICU databases as well. Therefore, in the next subsection, we propose an abstract ICU database schema that shows in a general way how to store ICU process data (relevant for doing process mining).

5.1.2 A process oriented ICU database abstract schema

In this subsection, we give an abstract database schema which shows the main data elements that a process‐aware ICU clinical database should have and how these data can be structured in this database.

Our proposed process‐aware ICU database aims to record data about the clinical processes done in ICUs in a structured way, so that these processes can be discovered by means of process mining. We therefore suggest considering (or using) this database schema when creating or modifying ICU databases, in order to make their data usable by process mining and consequently benefit from process mining findings.

As a starting point in designing the ICU process oriented database, we describe an ICU clinical process (also referred to in this report as ICU process) conceptually. An ICU process contains a set of events. An event happens at a certain time, to a certain patient, is performed by a certain caregiver, and can occupy certain resources. An ICU process can be initialized several times, for one specific patient or for different ones.

So, in order to have ICU clinical process related data in a database, we should have information about these processes, their events, the occurrences of the processes and the occurrences of the events. Moreover, these kinds of data should be properly mapped to each other, so that we are able to identify complete process instances in the database. Consequently, we can select these process instances easily (because they are clearly shown in the database) and create the event log with them. The ICU database schema given in Figure 13 shows how the above mentioned data about ICU clinical processes can be structured in a database.

This ICU database schema is an abstract database schema because in it we abstract from details about patients’ data, details about the different types of events that can happen to a patient, the different resources related to those events etc. Meanwhile, we give more details on how to map processes to event data, patients to their diagnoses etc. The choices on which data to abstract from and which data to detail are based on the purpose of the schema as described above.

Figure 13‐ Process oriented healthcare database schema

Below, we explain in detail this schema by firstly showing its tables and their content in subsection 5.1.2.1 and then showing the relationships between its tables in subsection 5.1.2.2. Finally, in subsection 5.1.2.3, we discuss how more detailed information can be added to this schema.

5.1.2.1 Tables

The database schema in Figure 13 contains table ‘PATIENT’, ‘DIAGNOSIS’, ‘PATIENT_DIAGNOSIS’, ‘EVENT’, ‘PROCESS’, ‘EVENT_OCCURRENCE’, ‘PROCESS_OCCURRENCE’ and ‘EVENT_PROCESS_OCCURRENCE’. Each of these tables is explained in the following paragraphs of this subsection.

Table ‘PATIENT’ records the information about patients. Each patient has an ID stored in field ‘PATIENT_ID’ that identifies him uniquely in the database during his ICU hospitalization and a date stored in field ‘DISPOSITATION_DATE’ that shows the date the patient leaves the ICU. Any other data related to the patient can also be recorded in this table, but we abstract from these kinds of data and just show a field named ‘other relevant data’, which represents the rest of the information about a patient that can be saved in a healthcare database.


Table ‘DIAGNOSIS’ records the list of all possible diagnoses that patients can have by storing for each diagnosis an ID and a name in fields ‘DIAGNOSIS_ID’ and ‘NAME’ respectively. Other information about diagnoses can be stored here, but we abstract from it by using the field ‘other relevant data’, which represents any other data about diagnoses that can be stored in a healthcare database.

Table ‘PATIENT_DIAGNOSIS’ stores information that shows what diagnosis patients have. Here, patients are identified by field ‘PATIENT_ID’ and diagnoses by field ‘DIAGNOSIS_ID’. In the same way as for the previous discussed tables, here we have also a field named ‘other relevant data’ that abstracts on any other information that can be saved in this table about patients and their diagnoses.

Table ‘EVENT’ records the list of possible events/activities/tasks (in the remainder of the report we refer to them as events) that caregivers can perform on their patients as part of the patients’ diagnosis, treatment or monitoring. Each event of this table has an ID (that uniquely identifies the event in the database) and a name, stored in fields ‘EVENT_ID’ and ‘NAME’ respectively. Here too, the field named ‘other relevant data’ abstracts from any other information that can be saved in this table about events.

Table ‘PROCESS’ records the list of all healthcare processes that are related to patients’ diagnosis, treatment or monitoring. Each process in this table has an ID stored in field ‘PROCESS_ID’ that uniquely identifies it in the database and a name stored in field ‘NAME’. In addition, the field ‘DIAGNOSIS_ID’ of this table stores information that links processes to diagnoses (e.g. for treatment processes).

Table ‘EVENT_OCCURRENCE’ records when a certain event happened to a certain patient, by which caregiver and what value is related to it. Here, events are identified by the values of the field ‘EVENT_ID’, patients by the values of the field ‘PATIENT_ID’, caregivers by the values of the field ‘CAREGIVER’, and the time when the event occurred is stored in field ‘DATETIME’. The value associated with an event can be the output result of the event (e.g. the value of blood pressure in case of a blood pressure measurement event) or a detail related to it (e.g. the name of the medicine in case of a medicine administration event). Also, each event occurrence has an ID stored in field ‘EVENT_OCCURRENCE_ID’ that uniquely identifies it in the database. The occurrence of events in a hospital can be related to many other types of data, but we abstract from such information by including all of it in the field ‘other relevant data’ of this table.

Table ‘PROCESS_OCCURRENCE’ records information that shows when a certain process has started on a certain patient. Here, processes are identified by values of the field ‘PROCESS_ID’, patients by the values of the field ‘PATIENT_ID’ and the time when processes have started by values of the field ‘STARTDATETIME’. Moreover, each process occurrence has an ID stored in field ‘PROCESS_OCCURRENCE_ID’ that uniquely identifies this type of occurrences in the database.

The last table of the database schema shown in Figure 13 is table ‘EVENT_PROCESS_OCCURRENCE’. This table maps occurrences of events to occurrences of processes. In other words, once an event happens, the information of this table shows which started process(es) the event is part of. In this table, the started process and the occurred event are identified by the fields ‘PROCESS_OCCURRENCE_ID’ and ‘EVENT_OCCURRENCE_ID’ respectively.


Finally, the above mentioned tables are linked to each other in order to map their information throughout the database and these relationships are described in the next subsection.

5.1.2.2 Relationships between tables

In this subsection, we describe the relationships between the tables of the database schema shown in Figure 13. A relationship between two tables is shown graphically by the line connecting them, and its cardinality is shown by the text written on the ends of this line.

So, the relationship between table ‘PATIENT’ and table ‘PATIENT_DIAGNOSIS’ shows that each record of the table ‘PATIENT’ is related to zero or more records of table ‘PATIENT_DIAGNOSIS’ (graphically shown by ‘0..*’) and each record of the table ‘PATIENT_DIAGNOSIS’ is related to only one patient in table ‘PATIENT’. This relationship is based on the values of their common field ‘PATIENT_ID’. In other words, this relationship shows that each patient can have zero to several diagnoses.

Table ‘PATIENT_DIAGNOSIS’ is also related with table ‘DIAGNOSIS’ based on the values of their common field ‘DIAGNOSIS_ID’. This relationship shows that each record of the table ‘DIAGNOSIS’ is related to zero or more records of table ‘PATIENT_DIAGNOSIS’ (graphically shown by ‘0..*’) and each record of the table ‘PATIENT_DIAGNOSIS’ is related to only one record in table ‘DIAGNOSIS’. So, the meaning of this relationship is that the same diagnosis can be given to zero or more patients.

Moreover, table ‘DIAGNOSIS’ is related to table ‘PROCESS’ based on the values of their common field ‘DIAGNOSIS_ID’. This relationship shows that each record of the table ‘DIAGNOSIS’ is related to zero or more records of table ‘PROCESS’ and each record of the table ‘PROCESS’ is related to zero or one record (graphically shown by ‘0..1’) in table ‘DIAGNOSIS’. This means that a process can be related to no diagnosis at all or to a specific diagnosis but for a specific diagnosis we can have zero or more processes related to it.

In addition, table ‘PROCESS_OCCURRENCE’ is related separately to table ‘PROCESS’ and table ‘PATIENT’ based on the values of their common fields ‘PROCESS_ID’ and ‘PATIENT_ID’ respectively. The relationship between table ‘PROCESS_OCCURRENCE’ and table ‘PROCESS’ shows that the occurrence of a process is related to exactly one process, and that a process may never occur or may occur one or more times. Meanwhile, the relationship between table ‘PROCESS_OCCURRENCE’ and table ‘PATIENT’ shows that the occurrence of a process is always related to one patient, and that a patient can be involved in zero to several started processes.

In the database schema shown in Figure 13, we also have the relationships of table ‘EVENT_OCCURRENCE’ with tables ‘EVENT’ and ‘PATIENT’ based on the values of their common fields ‘EVENT_ID’ and ‘PATIENT_ID’ respectively. The relationship between table ‘EVENT_OCCURRENCE’ and table ‘EVENT’ specifies that the occurrence of an event is related to exactly one event, and that an event may never occur or may occur one or more times. Moreover, the relationship between table ‘EVENT_OCCURRENCE’ and table ‘PATIENT’ shows that the occurrence of an event is always related to one patient, and that zero to several events (different or the same) can happen to a patient.


The last relationships in the ICU abstract database schema (shown in Figure 13) are those of table ‘EVENT_PROCESS_OCCURRENCE’. It is related to both table ‘EVENT_OCCURRENCE’ and table ‘PROCESS_OCCURRENCE’ based on the values of their common fields ‘EVENT_OCCURRENCE_ID’ and ‘PROCESS_OCCURRENCE_ID’ respectively. The relationship between tables ‘EVENT_PROCESS_OCCURRENCE’ and ‘EVENT_OCCURRENCE’ shows that the occurrence of an event can be part of no process, or of one or more started processes. Meanwhile, the relationship between tables ‘EVENT_PROCESS_OCCURRENCE’ and ‘PROCESS_OCCURRENCE’ specifies that a started process can be related to no event occurrences, or to one or more event occurrences.
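The tables and relationships described in subsections 5.1.2.1 and 5.1.2.2 can be sketched as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database, omits the ‘other relevant data’ fields, uses the assumed column name EVENT_VALUE for the value associated with an event occurrence, and loads a few made-up rows. The helper names (`create_database`, `load_sample`, `event_log`) are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE PATIENT (PATIENT_ID INTEGER PRIMARY KEY, DISPOSITATION_DATE TEXT);
CREATE TABLE DIAGNOSIS (DIAGNOSIS_ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
CREATE TABLE PATIENT_DIAGNOSIS (
    PATIENT_ID   INTEGER NOT NULL REFERENCES PATIENT(PATIENT_ID),
    DIAGNOSIS_ID INTEGER NOT NULL REFERENCES DIAGNOSIS(DIAGNOSIS_ID)
);
CREATE TABLE EVENT (EVENT_ID INTEGER PRIMARY KEY, NAME TEXT NOT NULL);
CREATE TABLE PROCESS (
    PROCESS_ID   INTEGER PRIMARY KEY,
    NAME         TEXT NOT NULL,
    DIAGNOSIS_ID INTEGER REFERENCES DIAGNOSIS(DIAGNOSIS_ID)  -- nullable: the 0..1 side
);
CREATE TABLE EVENT_OCCURRENCE (
    EVENT_OCCURRENCE_ID INTEGER PRIMARY KEY,
    EVENT_ID    INTEGER NOT NULL REFERENCES EVENT(EVENT_ID),
    PATIENT_ID  INTEGER NOT NULL REFERENCES PATIENT(PATIENT_ID),
    CAREGIVER   TEXT,
    DATETIME    TEXT NOT NULL,
    EVENT_VALUE TEXT  -- assumed name for the value related to the event
);
CREATE TABLE PROCESS_OCCURRENCE (
    PROCESS_OCCURRENCE_ID INTEGER PRIMARY KEY,
    PROCESS_ID    INTEGER NOT NULL REFERENCES PROCESS(PROCESS_ID),
    PATIENT_ID    INTEGER NOT NULL REFERENCES PATIENT(PATIENT_ID),
    STARTDATETIME TEXT NOT NULL
);
CREATE TABLE EVENT_PROCESS_OCCURRENCE (
    PROCESS_OCCURRENCE_ID INTEGER NOT NULL
        REFERENCES PROCESS_OCCURRENCE(PROCESS_OCCURRENCE_ID),
    EVENT_OCCURRENCE_ID   INTEGER NOT NULL
        REFERENCES EVENT_OCCURRENCE(EVENT_OCCURRENCE_ID),
    PRIMARY KEY (PROCESS_OCCURRENCE_ID, EVENT_OCCURRENCE_ID)
);
"""


def create_database() -> sqlite3.Connection:
    """Create an empty in-memory database with the abstract schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def load_sample(conn: sqlite3.Connection) -> None:
    """Insert a few made-up rows (illustrative only, not MIMICII data)."""
    conn.execute("INSERT INTO PATIENT VALUES (1, NULL)")
    conn.execute("INSERT INTO PROCESS VALUES (1, 'Daily checklist', NULL)")
    conn.executemany(
        "INSERT INTO EVENT VALUES (?, ?)",
        [(10, "Check impaired skin"), (11, "Administer medication")],
    )
    conn.executemany(
        "INSERT INTO PROCESS_OCCURRENCE VALUES (?, 1, 1, ?)",
        [(100, "2012-03-25 08:00"), (101, "2012-03-26 08:00")],
    )
    conn.executemany(
        "INSERT INTO EVENT_OCCURRENCE VALUES (?, ?, 1, ?, ?, ?)",
        [
            (1000, 10, "nurse A", "2012-03-25 08:23", "Leg, Left Lower"),
            (1001, 11, "nurse A", "2012-03-25 09:10", "drug X"),
            (1002, 10, "nurse B", "2012-03-26 08:30", None),
        ],
    )
    conn.executemany(
        "INSERT INTO EVENT_PROCESS_OCCURRENCE VALUES (?, ?)",
        [(100, 1000), (100, 1001), (101, 1002)],
    )


def event_log(conn: sqlite3.Connection) -> list:
    """Return (case id, activity, timestamp, caregiver) rows ordered per process occurrence."""
    return conn.execute(
        """
        SELECT epo.PROCESS_OCCURRENCE_ID, e.NAME, eo.DATETIME, eo.CAREGIVER
        FROM EVENT_PROCESS_OCCURRENCE AS epo
        JOIN EVENT_OCCURRENCE AS eo
          ON eo.EVENT_OCCURRENCE_ID = epo.EVENT_OCCURRENCE_ID
        JOIN EVENT AS e ON e.EVENT_ID = eo.EVENT_ID
        ORDER BY epo.PROCESS_OCCURRENCE_ID, eo.DATETIME
        """
    ).fetchall()
```

Because each event occurrence has its own ID, linking one occurrence to several process occurrences only adds rows to ‘EVENT_PROCESS_OCCURRENCE’; the event data themselves are stored once.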

So, after explaining the content of general ICU schema shown in Figure 13, it is important to mention that this schema is not static because it allows adding more specific information in it. This issue is explained in the next subsection.

5.1.2.3 More specific data

The information stored in the general ICU schema shown in Figure 13 represents the basic general information required to do process mining and it abstracts on details about processes, events, patients and diagnoses. But on the other hand, this schema allows for specialization of its data. This specialization can be done by using ‘other relevant data’ field in some of its tables or by adding specialized tables.

As already mentioned in subsection 5.1.2.1, the field ‘other relevant data’ in some of the tables of general ICU schema represents other relevant information that could be saved in those tables. For instance, in table ‘PATIENT’ there can be other fields that record information about admission date of the patient to ICU, or information telling whether or not the patient died in the hospital etc.

Meanwhile, the second way to store more specific data in the general ICU schema is by adding specialized tables. Figure 14 shows an example of how these specialized tables can be added for table ‘EVENT_OCCURRENCE’.

Figure 14‐ Specialization of table 'EVENT_OCCURRENCE'


So, for table ‘EVENT_OCCURRENCE’ we can add a finite number of specialized tables, as shown symbolically by the names of these tables (‘EVENT_TYPE 1 OCCURRENCE’, ‘EVENT_TYPE 2 OCCURRENCE’ and ‘EVENT_TYPE N OCCURRENCE’), and to each of these tables we can add a finite number of fields that store the specific data. To be more concrete, in Figure 15 we show examples of two possible specialized tables for table ‘EVENT_OCCURRENCE’ based on the type of the event.

Figure 15‐ Example of specialized tables for table 'EVENT_OCCURRENCE'

These specialized tables are ‘MEDEVENT’ and ‘MICROBIOLOGYEVENT’, which store specific information about the occurrence of medication administration events and of microbiology tests taken from patients respectively. Table ‘MEDEVENT’ records data that show the name, the dose (and the unit of the dose), and the volume of the medication given to the patient. This table also holds information about the route through which the medication was given to the patient and the care unit in which this event took place. Meanwhile, table ‘MICROBIOLOGYEVENT’ records information about the specimen and the organism tested, about the antibacterium used, and about the interpretation of the microbiology test events.
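A specialized table can be sketched as a table that shares its key with ‘EVENT_OCCURRENCE’. The following is a minimal illustration, assuming an in-memory SQLite database and a reduced ‘EVENT_OCCURRENCE’ table; the column names of ‘MEDEVENT’ and the sample values are assumptions, since Figure 15 defines the actual fields.

```python
import sqlite3


def build_specialized_demo() -> sqlite3.Connection:
    """Create EVENT_OCCURRENCE plus a specialized MEDEVENT table with one sample row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE EVENT_OCCURRENCE (
            EVENT_OCCURRENCE_ID INTEGER PRIMARY KEY,
            EVENT_ID   INTEGER NOT NULL,
            PATIENT_ID INTEGER NOT NULL,
            DATETIME   TEXT NOT NULL
        );
        -- Specialized table: at most one row per event occurrence (assumed columns).
        CREATE TABLE MEDEVENT (
            EVENT_OCCURRENCE_ID INTEGER PRIMARY KEY
                REFERENCES EVENT_OCCURRENCE(EVENT_OCCURRENCE_ID),
            MEDICATION_NAME TEXT,
            DOSE            REAL,
            DOSE_UNIT       TEXT,
            VOLUME          REAL,
            ROUTE           TEXT,
            CARE_UNIT       TEXT
        );
        INSERT INTO EVENT_OCCURRENCE VALUES (1, 11, 1, '2012-03-25 09:10');
        INSERT INTO MEDEVENT VALUES (1, 'drug X', 5.0, 'mg', 50.0, 'IV', 'unit 1');
        """
    )
    return conn


def medication_details(conn: sqlite3.Connection, event_occurrence_id: int):
    """Join the general occurrence data with its medication-specific details."""
    return conn.execute(
        "SELECT eo.DATETIME, m.MEDICATION_NAME, m.DOSE, m.DOSE_UNIT, m.ROUTE "
        "FROM EVENT_OCCURRENCE AS eo "
        "JOIN MEDEVENT AS m ON m.EVENT_OCCURRENCE_ID = eo.EVENT_OCCURRENCE_ID "
        "WHERE eo.EVENT_OCCURRENCE_ID = ?",
        (event_occurrence_id,),
    ).fetchone()
```

In this arrangement the data needed for process mining stay in ‘EVENT_OCCURRENCE’, while the type-specific details are looked up only when they are needed.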

In the same way, we can create specialized tables for other tables of the general ICU schema, such as ‘PATIENT’, ‘DIAGNOSIS’ etc.

To conclude, although the general ICU schema shown in Figure 13 contains only the basic data needed for process mining, it allows more specific data to be added to it. Once the structure of a process oriented ICU database has been specified and explained (for the MIMICII database or for a general ICU database), it is important to provide it with the right data so that it can be mined by process mining techniques. This issue is discussed in the next subsection.

5.1.3 Practical implications

A database with a process oriented structure but no process data is not useful for process mining. So, in this subsection, we discuss how the proposed new schemas (the abstract ICU database schema given in subsection 5.1.2 and the new schema of MIMICII given in subsection 5.1.1) can be put into practice by giving procedural solutions. We discuss in general how to provide process related data to the abstract ICU database schema (given in subsection 5.1.2) and we give more specific information on how to provide data to the new MIMICII schema (given in subsection 5.1.1).

Here, by process related data we mean the types of process data that are specified in the proposed new schemas. Part of these data is information about processes in general (e.g. process names and the diagnosis to which a process is related), and part is information about the occurrences of processes, such as when processes have started, which patients are involved in them, what activities are part of them, who performed the activities and when etc.

The most common way of providing data to a database is through an information system. So, in this case, we suggest using the ICU clinical information system for recording the above mentioned process data; whether this is done in real‐time or afterwards is not important. This information system should provide the right GUIs to allow its users (ICU nurses and physicians) to do so.

So, the ICU staff should first provide a list of all the treatment processes, checklist processes and diagnosis processes that they carry out in the ICU. These processes should be given a name, e.g. ‘treatment of ST‐elevation’, ‘daily checklist’ and so on. Usually the medical staff can come up with such names, but the titles of the guidelines that describe such processes can also be used as the names of these processes. Also, for processes that are done to patients with a specific diagnosis (e.g. a treatment process done for a specific diagnosis), this list should specify the diagnosis by giving the corresponding diagnosis code (in case of the new MIMICII schema the diagnosis code can be an ICD‐9 code or a DRG code).

Furthermore, the ICU staff should also provide a list of predefined events that are part of ICU clinical processes that are named according to the terminology used in general in medical guidelines.

In addition, it is required that the staff responsible for maintaining the database keeps updating the list of processes (saved in table ‘PROCESSITEMS’) and events (saved in table ‘EVENTITEMS’) by adding respectively new processes and new events when it is necessary.

Then, when a process (treatment or checklist process) starts for a certain patient, this information should be recorded, automatically or manually, via an ICU clinical information system in table ‘PROCESS_OCCURRENCE’ of the general ICU database (or, in case of MIMICII, in table ‘PROCESS_OCCURRENCE’ of its new schema).

So, if it is a treatment process that is executed for the first time during the ICU hospitalization of a certain patient who already has a diagnosis (or multiple ones), then the system can use the diagnosis of the patient (as specified by the user of the information system) to select automatically the process that should be started, in cases when there is only one treatment process linked to the diagnosis. In case there are several treatment processes for a certain diagnosis, the system can ask in one of its GUIs which process to start for the patient. As a result, we have a process instance initialized.
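The selection rule described above can be sketched as follows. The function name, the dictionary that maps a diagnosis code to its treatment processes, and the placeholder diagnosis codes are assumptions made only for illustration.

```python
from typing import Dict, List, Optional, Tuple


def choose_treatment_process(
    diagnosis_code: str,
    processes_by_diagnosis: Dict[str, List[str]],
) -> Tuple[Optional[str], List[str]]:
    """Return (process to start automatically, options to show to the user).

    If exactly one treatment process is linked to the diagnosis, it is
    selected automatically; otherwise the user has to choose from the list.
    """
    candidates = processes_by_diagnosis.get(diagnosis_code, [])
    if len(candidates) == 1:
        return candidates[0], []
    return None, list(candidates)


# Hypothetical catalogue: placeholder diagnosis codes, not real ICD-9 or DRG codes.
catalogue = {
    "DX-A": ["treatment of ST-elevation"],
    "DX-B": ["treatment pathway 1", "treatment pathway 2"],
}
```

For ‘DX-A’ the single linked process is started without asking; for ‘DX-B’ nothing is started and both pathways are offered to the user.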

Moreover, in case a treatment process needs to be repeated on the same patient during the same ICU hospitalization, or a checklist process needs to be run (for the first time or not), the start of the process must be recorded manually by the user via a GUI of the information system, because the system cannot determine it on its own. For such cases, the information system can allow configurations that make processes start automatically within a certain period of time and for a certain patient. For instance, the daily checklist process (described in section 4.2.1) can be configured to start a new instance automatically every day at 8:00 AM for a certain ICU patient during a specific period of time (starting from a certain date and ending at a certain date). Another way to deal with processes that are repeated on a regular basis is to have the ICU information system alert the ICU staff to start these processes.
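Such a configuration can be sketched as a function that lists the start times of the instances to be created. The function name and its parameters are hypothetical; only the 8:00 AM daily schedule and the start and end dates come from the example above.

```python
from datetime import date, datetime, time, timedelta
from typing import List


def daily_start_times(
    first_day: date,
    last_day: date,
    start_at: time = time(8, 0),
) -> List[datetime]:
    """Return one start time per day from first_day to last_day (inclusive)."""
    number_of_days = (last_day - first_day).days + 1
    return [
        datetime.combine(first_day + timedelta(days=offset), start_at)
        for offset in range(number_of_days)
    ]
```

Each returned timestamp would correspond to one new record in table ‘PROCESS_OCCURRENCE’ for the configured patient and process; an end date before the start date yields no instances.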

So, once a process has been started for a patient (for the first time or not), the clinical information system (which stores the data in the general ICU database or, more specifically, in MIMICII) should adapt its GUIs so that the users (the nurses and/or the physicians) can link the results of the events done to the patient to the process these events belong to. For instance, when a nurse records in the information system (via a GUI of the system) the administration of a medication to a patient, she should also specify the name of the process that this event is part of. The information system then finds the corresponding process instance to map the event to: it selects the latest process instance of the specified process and the given patient that started before the event occurrence time.
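The lookup of the process instance can be sketched as follows. The record type and the function name are hypothetical, and the sketch assumes that an instance starting at exactly the event time still counts as started before the event.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class ProcessOccurrence(NamedTuple):
    """One record of table PROCESS_OCCURRENCE (illustrative field names)."""
    process_occurrence_id: int
    process_id: int
    patient_id: int
    start: datetime


def find_process_instance(
    occurrences: Sequence[ProcessOccurrence],
    process_id: int,
    patient_id: int,
    event_time: datetime,
) -> Optional[ProcessOccurrence]:
    """Return the latest instance of the process for the patient that started
    no later than event_time, or None if there is no such instance."""
    candidates = [
        occ
        for occ in occurrences
        if occ.process_id == process_id
        and occ.patient_id == patient_id
        and occ.start <= event_time
    ]
    return max(candidates, key=lambda occ: occ.start, default=None)
```

When no instance is found, the event cannot be mapped and the system would have to ask the user or store the event without a process.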

In some cases, the linking of events to processes can also be done partly automatically. First, the user chooses whether the activity done to a patient is related to a treatment process, a checklist process or another type of process. If it is a treatment process, the corresponding process/guideline is then selected automatically by the information system based on the diagnosis of the patient (in cases when there is only one treatment process for that diagnosis). In this way, even when the user does not know which process the data he/she is entering belong to, the system selects the process based on the diagnosis. An example is a nurse who registers the result of a laboratory test in the system but is not aware of the treatment process to which this lab test should be linked. In other cases, when there are several treatment processes related to a certain diagnosis, the information system can show the user the list of these processes, so that he/she can select the process from a relatively short list (because it does not show all the processes but just the ones related to the diagnosis of the patient).

Moreover, the ICU clinical information system should also allow registering activities that are not part of a certain process.

In addition, the end date of a process instance can be determined automatically, so we do not need to store it. If the process instance is related to a process done only once to the patient during his hospitalization at the ICU, then the disposition date of the patient from the ICU (stored in table ‘PATIENT’ of the general ICU schema or in table ‘ADMISSIONS’ of the MIMICII database) is the end date of this process instance. In other cases, when the same process is repeated several times for a specific patient (e.g. the daily checklist), a process instance ends when a new process instance related to the same patient and the same process starts.
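This rule can be sketched as a small function. The function name and parameters are hypothetical, and the sketch additionally assumes that the last instance of a repeated process ends at the disposition date of the patient, a case that the rule above does not cover explicitly.

```python
from datetime import datetime
from typing import Sequence


def process_instance_end(
    instance_start: datetime,
    all_starts: Sequence[datetime],
    disposition: datetime,
) -> datetime:
    """Derive the end of a process instance.

    all_starts holds the start times of every instance of the same process
    for the same patient. The instance ends when the next instance starts;
    if there is no later instance, it ends at the disposition date
    (assumption for the last instance of a repeated process).
    """
    later_starts = [start for start in all_starts if start > instance_start]
    return min(later_starts) if later_starts else disposition
```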


In providing the above procedural solutions, our main concern is not the availability of an ICU clinical information system that supports the above mentioned tasks, but the change that it implies in the ICU work processes. The new way of recording ICU data (with a focus on ICU processes) requires not just storing the results of certain events that happened to a patient (such as the results of a blood test or the medication given to a patient) but also storing information about the processes to which these events belong. Consequently, most of the time the ICU staff have to be aware of the processes in which their patients are involved before registering patients’ data. This can be a problem for ICU staff because they might not know all the processes that have been started for a patient.

So, they have to think in terms of running processes (which have a set of events executed in a specified order) and not just of executing distinct events on patients. They should know that these events are usually part of processes, but this may not be trivial for them. An ICU patient can be involved in several clinical processes (such as several treatments, diagnosis processes and checklist processes) that are initialized by different ICU staff members with different roles (and knowledge), and mapping the different events that happened to the patient to those processes can be a problem. Lacking the right knowledge about processes can be the reason for this. For instance, a nurse may not have enough knowledge about the different pathways of treating a disease, because this is not part of her job (but of the physicians’ job), while she is required to do several activities as part of this treatment process (e.g. administering a medicine). In this case, she might not be able to map these activities to the treatment process because of insufficient knowledge.

As mentioned above in this subsection, the ICU clinical information system can help in such situations by automatically linking the events related to a patient to the treatment process that has started for him, if the user has already specified that they are part of treatment (not of diagnosis or checklist processes). But this is not always a solution, because several different treatment processes can be started for a patient for different diagnoses. In such cases, the ICU clinical information system cannot map events to treatment processes, because it does not have the information to do so (information that it would have to get from the proposed ICU database). The process oriented ICU database does not link events to processes before these events happen. So, once an event occurs, it is most of the time the users of the information system who connect the event to one or more initialized processes. Also, an activity may not be part of any specified process, and in these cases too, ICU staff should be able to distinguish such activities from the rest.

To sum up, the data about executed ICU processes and their events can be recorded via the ICU clinical information system, but this information system requires some modifications to allow entering these data according to the proposed process oriented ICU database schemas (the general one or the MIMICII one). These changes can be challenging to implement, not only from the technical point of view but also because the ICU work processes must be adapted to the new way of recording data. This new way of recording data requires that ICU staff be aware of the process for which they are recording data, because in most cases the system cannot automatically map patient events to processes and it is the user who should make this mapping. Unfortunately, users may not be able to map events to processes due to their insufficient knowledge about these processes. The implications of this issue are discussed in the next section.

5.2 Discussion

In this section, we reflect on both the technical and procedural solutions proposed in section 5.1. Firstly, in subsection 5.2.1, we discuss how the newly proposed schema of MIMICII (described in subsection 5.1.1) solves the problems that we faced (described in section 4.5). Then, in subsection 5.2.2, we discuss how feasible the suggested solutions (both the design and the procedural solutions) can be in practice.

5.2.1 Solution of our faced issues

The solutions proposed in section 5.1 include, among others, a new process oriented schema of the MIMICII database (discussed in subsection 5.1.1) and a way to make process data available to this database (discussed in subsection 5.1.3). In this subsection, we discuss how these solutions solve the issues that we faced in the MIMICII database, which are listed in section 4.5.

Firstly, the problem of the missing relationship between a process and its data recorded in the database (as discussed in subsection 4.5.1) is solved. In the new database schema, all tables that store events also store the process instance to which these events belong. This is done through the field named ‘PROCESS_INSTANCE_ID’ of the event tables (tables such as MEDEVENTS, IOEVENTS, TOTALBALEVENTS, LABEVENTS, MICROBIOLOGYEVENTS etc.). This field records the ID of the process instance that the event is part of, thereby mapping the event to the process. Furthermore, with the information in table ‘PROCESS_OCCURRENCE’, this mapping of the event to the process becomes a mapping of the event to the process instance, which makes clear which events' data belong to each process instance. Consequently, this new schema of MIMICII also allows distinguishing which data are related to the checklist process and which are related to the STEMI treatment process.
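The link described above can be illustrated with a minimal sketch in Python and SQLite. It is not the schema of subsection 5.1.1: only the table names and the field PROCESS_INSTANCE_ID are taken from the text, while the columns PROCESS_NAME, EVENT_NAME and CHARTTIME, the sample rows and the function `build_event_log` are assumptions made for illustration. The sketch shows how an event log (case, activity, timestamp) for one process could be retrieved once every event row carries its process instance.

```python
import sqlite3

# Only the table names and PROCESS_INSTANCE_ID come from the text above;
# PROCESS_NAME, EVENT_NAME and CHARTTIME are assumed, simplified columns.
SCHEMA = """
CREATE TABLE PROCESS_OCCURRENCE (
    PROCESS_INSTANCE_ID INTEGER PRIMARY KEY,
    PROCESS_NAME        TEXT NOT NULL
);
CREATE TABLE MEDEVENTS (
    EVENT_NAME          TEXT NOT NULL,
    CHARTTIME           TEXT NOT NULL,  -- ISO 8601 text sorts chronologically
    PROCESS_INSTANCE_ID INTEGER REFERENCES PROCESS_OCCURRENCE (PROCESS_INSTANCE_ID)
);
CREATE TABLE LABEVENTS (
    EVENT_NAME          TEXT NOT NULL,
    CHARTTIME           TEXT NOT NULL,
    PROCESS_INSTANCE_ID INTEGER REFERENCES PROCESS_OCCURRENCE (PROCESS_INSTANCE_ID)
);
"""

# Collect events from several event tables and keep those whose
# process instance belongs to the requested process.
EVENT_LOG_QUERY = """
SELECT e.PROCESS_INSTANCE_ID, e.EVENT_NAME, e.CHARTTIME
FROM (
    SELECT EVENT_NAME, CHARTTIME, PROCESS_INSTANCE_ID FROM MEDEVENTS
    UNION ALL
    SELECT EVENT_NAME, CHARTTIME, PROCESS_INSTANCE_ID FROM LABEVENTS
) AS e
JOIN PROCESS_OCCURRENCE AS p
  ON p.PROCESS_INSTANCE_ID = e.PROCESS_INSTANCE_ID
WHERE p.PROCESS_NAME = ?
ORDER BY e.PROCESS_INSTANCE_ID, e.CHARTTIME
"""


def build_event_log(conn, process_name):
    """Return (case id, activity, timestamp) rows for one process."""
    return conn.execute(EVENT_LOG_QUERY, (process_name,)).fetchall()


# Invented sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO PROCESS_OCCURRENCE VALUES (?, ?)",
    [(1, "STEMI treatment"), (2, "Daily checklist")],
)
conn.executemany(
    "INSERT INTO MEDEVENTS VALUES (?, ?, ?)",
    [
        ("Aspirin administered", "2012-01-01T08:30:00", 1),
        ("Sedation adjusted", "2012-01-01T09:00:00", 2),
    ],
)
conn.executemany(
    "INSERT INTO LABEVENTS VALUES (?, ?, ?)",
    [("Troponin measured", "2012-01-01T08:00:00", 1)],
)

stemi_log = build_event_log(conn, "STEMI treatment")
for case_id, activity, timestamp in stemi_log:
    print(case_id, activity, timestamp)
```

In this sketch the process instance ID plays the role of the case identifier that process mining tools expect, so events of the checklist process and of the STEMI treatment process end up in separate logs.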

Secondly, the difficulty in matching process information from guidelines with the process data in the database (as discussed in subsection 4.5.2) is solved by the new database schema. This difficulty arises mainly from the different terminology used to describe process activities in the guidelines and in the database, and sometimes from the medical domain knowledge required to understand the processes. The problem is solved if the process data are stored according to the new database schema, because it explicitly shows the connection between the process events/activities and the process instances (as described in subsection 5.1.1). Also, the new database schema of MIMICII structures the data about processes and events by allowing a standardized list of each of them (as explained in subsection 5.1.1), which makes it easier to match guideline process information with the process information in the database.

Thirdly, for the issue explained in subsection 4.5.3, related to unstructured process data stored in the database as part of text notes, the new database schema offers the possibility to structure these data. An event that happened to a certain patient is no longer stored in free text notes but in a structured way (as explained in subsection 5.1.1): the event type itself is added as a new record in table ‘EVENTITEMS’ and the occurrences of the event are recorded in the corresponding event table. As a result, the process information can be retrieved easily to create the event log (because it is no longer part of text notes).

Fourthly, the issue of missing process data in the database (as described in subsection 4.5.4) can also be solved by the new way of recording event data in the new MIMICII database schema. Two possible reasons why these process data are not found in the database are that the existing MIMICII database schema is not process oriented and that ICU work processes are not oriented towards logging process data. This was noticed in the daily checklist process: even when a checklist item was performed, the event was not saved in the database (as discussed in subsection 4.3.1.2). For instance, the nurse checks whether she should communicate with the patient's family, but she does not store the result of this activity (the family communication check item) when there is no need for family communication. So, having an ICU database that can store process oriented data (the technical solutions described in subsection 5.1.1) and an ICU clinical information system that demands that the output of every activity done as part of a certain process be registered (the procedural solutions described in subsection 5.1.3) can reduce the amount of missing information about processes and their activities.

The other case in which we do not have complete process data is when these data are stored in multiple data sources (e.g. multiple databases). In such cases, our proposed solutions in section 5.1 do not solve the problem, because the database design covers only one database (not multiple databases), but they contribute to solving it. A solution to this problem can be to put all the process data (e.g. spread over multiple databases) together in one data source (e.g. one database), which is not easy because the data can be in different formats. But if all the databases in which process data are stored have a process oriented structure (similar to the general ICU database schema described in subsection 5.1.2), in which process instances are well defined and easy to select, then it is easier to put all the process data in one database. This can be done by first collecting the process instances' data from the different databases and then linking these process instances, which is easier than linking sparse process data found in databases that are not process oriented.
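The two steps mentioned above (collecting process instances from each database, then linking them) can be sketched as follows. This is only an illustration under strong assumptions: the thesis does not specify how instances from different databases are linked, so the linking key used here (patient ID plus process name), the row layout and the source names `icu_db` and `lab_db` are all hypothetical.

```python
from collections import defaultdict


def merge_instances(sources):
    """Merge process-instance events coming from several databases.

    `sources` maps a (hypothetical) source name to rows of the form
    (patient_id, process_name, activity, timestamp). Rows are linked by
    the assumed key (patient_id, process_name) and sorted by timestamp.
    """
    merged = defaultdict(list)
    for source_name, rows in sources.items():
        for patient_id, process_name, activity, timestamp in rows:
            merged[(patient_id, process_name)].append(
                (timestamp, activity, source_name)
            )
    # ISO 8601 timestamps sort chronologically as plain strings.
    return {key: sorted(events) for key, events in merged.items()}


# Invented sample data, for illustration only.
sources = {
    "icu_db": [
        (100, "STEMI treatment", "Aspirin administered", "2012-01-01T08:30:00"),
    ],
    "lab_db": [
        (100, "STEMI treatment", "Troponin measured", "2012-01-01T08:00:00"),
        (101, "Daily checklist", "Sedation check", "2012-01-02T07:00:00"),
    ],
}

merged = merge_instances(sources)
for key, events in merged.items():
    print(key, events)
```

The sketch works only because each source already exposes well defined process instances; with databases that are not process oriented, there would be no such key to group on.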

In summary, the solutions suggested in section 5.1 solve the problem of the missing relationship in the database between a process and its data (discussed in subsection 4.5.1), the issue of matching process guidelines with process data in the database (discussed in subsection 4.5.2) and the issue related to unstructured process data in the database (discussed in subsection 4.5.3). They also solve the problem of missing process data in a single database and contribute to solving the issue of process data distributed over multiple databases (described in subsection 4.5.4).

5.2.2 Feasibility of the proposed solutions

In this subsection, we discuss in general how feasible our solutions proposed in section 5.1 are in practice. Firstly, we explain the feasibility of implementing the new version of MIMICII (proposed in subsection 5.1.1). Then, we discuss the feasibility of the general ICU database (proposed in subsection 5.1.2) and of the procedural solutions (proposed in subsection 5.1.3).

The new schema of MIMICII suggests adding information related to processes (new fields in existing tables and new tables) to the existing database without changing the way the rest of the information is saved. So, we keep the existing information of the MIMICII database as it is and only propose to store extra information in the database, in order to be able to connect the different activities done to a patient with the different process instances. This makes the technical implementation of the proposed schema more feasible in practice.

Furthermore, the design solutions (suggested in subsections 5.1.1 and 5.1.2) are conceptual solutions that do not give implementation details. We do not analyze implementation issues (such as the disk space occupied by the new database schema, the processing time of the database etc.), because our aim is to show through this schema a possible way to structure the existing information about activities and map it to processes at a conceptual level. These implementation issues are left for the MIMICII database specialists to investigate.

In addition, the abstract ICU database (proposed in subsection 5.1.2) leaves room for adapting it to real ICU databases (which have very detailed information about events) or vice versa. This adaptation should always keep the information about the occurrences of processes and events and the relations between them, because this is what makes the database a process oriented one. This process oriented information is enough to discover processes (by means of process mining) but not always enough to check whether medical guidelines are followed, because this check requires more detailed information. Therefore, when implementing this abstract schema, the database designer together with the ICU staff should also consider adding detailed information (as explained in subsection 5.1.2.3). We suggest that they refer to the content of medical guidelines to specify these details and also consider the type of detailed data stored for the different types of events in the MIMICII database.

Once the abstract ICU database schema (or the new schema of the MIMICII database) is implemented and up and running, data can be recorded in it. As already mentioned in subsection 5.1.3, these data can be recorded via an ICU clinical information system, and most of the time the user of the system (ICU staff) should do the mapping between events and processes. However, users may be unable to do this mapping due to insufficient knowledge about these processes. This insufficient knowledge can be fully explained by the role of the user, and as a result we should discuss its implications for the quality of the process data recorded in the database. Usually, doctors are expected to be mostly aware of the treatment and diagnosis processes, whereas nurses often have full knowledge of checklist processes. Moreover, the doctors of the AZM hospital told us that nurses in general are much better at knowing and following protocols and guidelines than physicians, who only see the exception cases.

For instance, we are not sure how much a nurse knows about a treatment process. It can happen that a nurse registers in the system the medication administered to a patient and, if the system asks her to specify the process for this event (the administration of the medication), she might not know to which treatment process to link it (the patient can be involved in several treatment processes for different diagnoses).

So, it can happen that event occurrences are not mapped to initialized processes because users do not know how to do that (and not because these events are not part of a process). This leads to incomplete data about processes. In addition, users can make a wrong mapping between the initialized process and the event occurrence (without being aware of it). This leads to incorrect data about processes.

Incomplete and incorrect data about processes negatively affect the results of process mining. We can obtain unrealistic results from mining these data, because the data do not show the way the process was really carried out. We can consider these data as noise if their amount is relatively small compared with the whole set of data to be mined. In such cases, we solve the problem by using process mining techniques that deal well with noise. But if incomplete and incorrect process data occur often in the database, then they can no longer be considered noise and the problem is not solved.
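A simple way to make the distinction above measurable is to compute the share of event occurrences that have no process instance. The sketch below is an assumption-laden illustration: the record layout is hypothetical and the threshold `NOISE_THRESHOLD` is an arbitrary value chosen for the example, not a figure from this thesis. Note that it only detects unmapped events (incomplete data); wrongly mapped events (incorrect data) cannot be found this way.

```python
# Arbitrary illustrative cut-off (an assumption, not a value from the thesis):
# below it, unmapped events are treated as noise.
NOISE_THRESHOLD = 0.05


def unmapped_share(events):
    """Fraction of events whose process_instance_id is missing (None)."""
    if not events:
        return 0.0
    unmapped = sum(1 for event in events if event["process_instance_id"] is None)
    return unmapped / len(events)


def is_noise(share, threshold=NOISE_THRESHOLD):
    """True if the unmapped share is small enough to be handled as noise."""
    return share <= threshold


# Invented sample data: one of four events has no process instance.
events = [
    {"activity": "Aspirin administered", "process_instance_id": 1},
    {"activity": "Troponin measured", "process_instance_id": 1},
    {"activity": "Sedation check", "process_instance_id": 2},
    {"activity": "Medication administered", "process_instance_id": None},
]

share = unmapped_share(events)
print(f"unmapped share: {share:.2f}, treat as noise: {is_noise(share)}")
```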

It can also happen that ICU staff postpone the registration of certain events because they do not know with certainty to which process(es) to link these events. This can delay the registration of the results of activities and, as a consequence, the sharing of information (making information immediately available to all users is one of the benefits of ICU clinical information systems), which can be really problematic for ICU patients who are critically ill and need immediate interventions and treatments. This does not mean that all the information related to process activities should be recorded in real time. Actually, this is not a requirement of the solution proposed in subsection 5.1.3: it makes no difference for our proposed solutions as long as the correct date and time of an event occurrence are recorded. So, having the ICU clinical information system force ICU staff to always map activities to processes can delay the recording of this information, which can have bad consequences for their way of working, as explained above. On the other hand, leaving this mapping optional in the ICU clinical information system (so that the user can postpone entering such data) increases the chance of missing information in it.

In such cases, we suggest that ICU staff decide which activities' data should be available immediately (without delay) in the ICU clinical information system, based on how important it is to access this information as soon as possible. For these activities, the ICU clinical information system should not force users to map the activities to processes. In other words, the ICU clinical information system should allow configurations that specify which activities must always be linked to processes at the moment their data are recorded and which activities can be mapped to processes later (not necessarily at the moment their data are first recorded).
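Such a configuration could look like the following minimal sketch. Everything in it is hypothetical: the activity names, the dictionary `MAPPING_REQUIRED_AT_ENTRY`, the record type `EventRecord` and the choice to require a mapping for activities that are not configured are assumptions made for illustration, not part of the proposed solution.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical configuration: True means the activity must be linked to a
# process instance when it is recorded; False means the link may be added later.
MAPPING_REQUIRED_AT_ENTRY = {
    "family communication check": True,
    "medication administration": False,
}


@dataclass
class EventRecord:
    activity: str
    timestamp: str
    process_instance_id: Optional[int] = None


def can_save(record: EventRecord) -> bool:
    """Accept a record if it is mapped, or if its mapping may be deferred.

    Activities missing from the configuration are assumed to require a mapping.
    """
    required = MAPPING_REQUIRED_AT_ENTRY.get(record.activity, True)
    return record.process_instance_id is not None or not required


def pending_mappings(records):
    """Saved records that still have to be linked to a process instance."""
    return [r for r in records if r.process_instance_id is None]


# Invented sample records, for illustration only.
urgent = EventRecord("medication administration", "2012-01-01T08:30:00")
checklist = EventRecord("family communication check", "2012-01-01T09:00:00")

print(can_save(urgent))     # mapping can be added later
print(can_save(checklist))  # mapping is required at entry
```

A list such as the one returned by `pending_mappings` could later be presented to a user with sufficient process knowledge, which keeps urgent data available immediately without silently losing the process link.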

Furthermore, the new way of recording ICU clinical data (described in subsection 5.1.3) has an impact on the daily work processes of ICU staff (nurses and physicians). Firstly, ICU staff should register more data about patient events (i.e. the information about the process to which the event belongs) and in a structured way (activities are no longer described in free text notes). Recording more data can take more of their time but, on the other hand, recording data in a structured way is easier and faster than writing long text notes. Another issue that arises here is how to make sure that ICU staff always log all the information about performed activities, rather than only the results of those activities. For instance, the check for the ‘family communication’ checklist item (part of the checklist process described in section 4.3.1) should always be recorded when it is done, no matter whether family communication is needed or not (doing family communication is a different activity). For such cases, we cannot give a concrete solution, but training the ICU staff in the new way of recording data can contribute to solving this kind of issue.

We are also concerned about the lists of ICU processes and events created by ICU staff (stored in tables ‘PROCESSITEMS’ and ‘EVENTITEMS’ of the new MIMICII schema, or in tables ‘PROCESS’ and ‘EVENTS’ of the abstract ICU database schema, respectively). In our proposed schemas, it is important to have complete lists of ICU processes and events and to update them whenever necessary. As a result, events (part of a standardized list of events) can always be linked to a certain process (part of a standardized list of processes) in the database. In addition, these schemas make it easy to add new processes and new events (by just adding new records to the corresponding tables).

Our next concern is how compatible the ICU clinical information system can be with the changes proposed in subsection 5.1.3: the user should be provided with the right GUIs to enter the new data, otherwise the new schema of MIMICII is not useful. We do not know whether it is technically easy to adapt the existing ICU clinical information system to the changes described in subsection 5.1.3, and adapting it can cause high costs.

To sum up, implementing the design solutions requires that database experts and ICU staff work together on structuring the detailed information of the ICU database that is not shown explicitly in the abstract ICU database schema. Database experts should also think about implementation issues. Implementing the procedural solutions, in turn, requires that the ICU clinical information system be adapted to the new requirements. In addition, once the database and the ICU clinical information system are implemented according to our proposed solutions in section 5.1, many issues that have an impact on the ICU work processes can arise when putting the new system to work. ICU staff may have insufficient knowledge about processes, resulting in incomplete and incorrect data about processes in the ICU database, which can negatively affect the results of process mining. Also, the ICU staff need to register more data than before (on a daily basis and in building the standardized lists of ICU processes and events), and we should ensure that they do it properly by training them in the new way of recording ICU data.

6 Conclusions and future work

In this chapter, we summarize the results of the work accomplished in this graduation project. Section 6.1 summarizes the work undertaken in this master project and the findings in view of the goals and research questions that were set out in the introduction chapter. Section 6.2 lists some of the limitations and possible directions for future work.

6.1 Conclusions

Healthcare provided in intensive care units (ICUs) of hospitals is both highly important and expensive. Therefore, it is crucial to improve the quality of their services and to reduce medical errors and costs. One way to achieve this is by analyzing ICU clinical processes (treatment, monitoring and diagnosis processes) by means of process mining techniques, in order to check whether these processes are carried out in adherence to guidelines and protocols. With this goal, we considered it interesting to apply process mining to the ICU data of the MIMICII database about a monitoring process (the checklist process described in subsection 4.2.1), a treatment process (the ST-elevation treatment process described in subsection 4.2.2) and a rule-based process (the antibiotics administration process described in subsection 4.2.3).

After getting more insight from the literature about the selected processes, we noticed that the antibiotic administration process is not really a process, because it consists of rules that correspond to individual actions not related to each other. So, we decided not to proceed with our attempt to mine it, because there was no direction in which to investigate it in line with the thesis goal.

We continued our attempt to mine the other two selected processes. This attempt failed because MIMICII could not provide all the data about these processes that are required to mine them according to our goal. Part of these data was missing or unstructured, and in the existing data the link between the event occurrences and the processes was missing.

It is interesting to mention that even though the MIMICII database is a well structured database that contains a vast amount of diverse ICU data, we are still not able to discover processes from its data by means of process mining techniques. Also, even though the MIMICII database is highly recommended for data mining (and much successful research has been done on it in this area), it is still not suitable for process mining. The main reason can be that it contains data on the activities but lacks the link between them. It also lacks the link to the process, and there are too many processes to assume that all activities belong to one process.

Furthermore, the issues faced in our attempt to mine the two selected processes in MIMICII seem to be general problems in providing process data, and as a result it is important to solve them. As solutions, in chapter 5 we designed technical solutions and suggested procedural solutions, first focusing on solving these issues for the MIMICII database and then providing general abstract solutions that can be applied to any ICU database. The technical solutions are ICU database schemas (a MIMICII database schema and an abstract ICU database schema) that show how to store data related to ICU clinical processes in a structured way, thereby allowing these processes to be discovered by means of process mining. The procedural solutions complement the technical ones and suggest different ways to collect and record these process data by using an ICU clinical information system.

The big challenge in applying our suggested solutions in ICUs is not their technical implementation but the change that they can imply for ICU work processes. The new way of recording ICU data (with a focus on ICU processes) requires storing not just the results of certain events that happened to a patient but also information about the processes to which these events belong. The ICU staff (ICU nurses and physicians) are responsible for recording this kind of information, but they may be unable to map event occurrences to the initialized processes of patients due to insufficient knowledge about ICU clinical processes. This insufficient knowledge is related to the different roles of ICU staff (such as nurses, physicians etc.), who have different levels of education on intensive care clinical processes. For instance, a nurse is specialized in doing different things than a doctor. Missing or wrong mappings between event occurrences and initialized processes result in incomplete and incorrect process data. Since the output of process mining depends heavily on the quality of the process data, mining incomplete and incorrect process data will lead to results that do not reflect reality.

In addition, the general abstract solutions about ICU databases proposed in section 5.1 can also be applied to other types of clinical healthcare databases, due to their high level of abstraction. These solutions contain no details that are specific to ICU clinical processes, and as a result they can be used to structure the data of other clinical healthcare processes.

Finally, in this thesis project, we realized that having plenty of varied data in a healthcare database is not enough to apply process mining to them, because we need process oriented data. So, a database that is good for data mining is not necessarily good for process mining as well.

6.2 Limitations and future work

In this section, we mention the limitations of the work presented in this thesis and we give suggestions for possible future work.

The main limitation of this project is that ICU experts were involved only at the starting phase of the project and not in its other phases. We consulted doctors only to select some specific ICU processes to be mined by process mining techniques, and afterwards we carried out the research work without their participation. But it turned out that ICU experts were also needed for understanding ICU processes and their data. Moreover, it would be good to validate our ideas and solutions with experts (e.g. ICU doctors) in order to find out whether they are feasible, even without implementing them. Doing so can help in improving our solutions in time (before implementing them) and consequently avoid undesired implications such as unfeasible solutions, higher costs etc.

So, for similar projects, we suggest including experts in the field, because their help is needed for understanding the processes and the context in which they run. Also, their participation can positively affect the results of the project.

Moreover, this project proposes an abstract process oriented database schema for storing process related data in such a way that the database is a useful source of data for process mining. In this thesis, we suggest using this schema mainly for ICU databases but also for other types of clinical healthcare databases. So, in order to prove its value, we suggest as future work investigating the applicability of this abstract database design to existing clinical healthcare databases (ICU databases or not), so that we can check in a concrete way whether processes can be discovered in them by process mining techniques.

In healthcare, data mining is becoming increasingly popular, if not increasingly essential [24]. Our suggested database design considers only the type of data that process mining requires and abstracts from the rest of the data. This can make our design less suitable for data mining. So, creating a new healthcare database structure that is good for both process mining and data mining can be future work following the work done in this project. Such a database provides more value for the healthcare domain, because its data can be mined with both techniques and consequently take advantage of the benefits of both.

Bibliography

1. Aalst, W.M.P. van der. Process mining : discovery, conformance and enhancement of business processes. Springer, Heidelberg [etc.], 2011. 2. Anderson, F.A. and Audet, A.‐M. Best Practices. Preventing Deep Vein Thrombosis and Pulmonary Embolism. A Practical Guide to Evaluation and Improvement. http://www.outcomes‐ umassmed.org/dvt/best_practice/index.htm#Section4. 3. Antman, E.M., Anbe, D.T., and Armstrong, P.W. ACC/AHA Guidelines for the Management of Patients With ST‐Elevation Myocardial Infarction‐‐Executive Summary: A Report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Writing Committee to Revise the 1999 Guidelines for the Management of Patients With Acute Myocardial Infarction). Circulation 110, 5 (2004), 588–636. 4. Badar, V. and Navale, S. Study of Prescribing Pattern of Antimicrobial Agents in Medicine Intensive Care Unit of a Teaching Hospital in Central India. 60, (2012). 5. Bellazzi, R. and Hanna, A.A. Data Mining Technologies for Blood Glucose and Diabetes Management. 3, (2009). 6. Bouten, C.V., Oomens, C.W., Baaijens, F.P., and Bader, D.L. The etiology of pressure ulcers: Skin deep or muscle bound? Archives of Physical Medicine and Rehabilitation 84, 4 (2003), 616–619. 7. Clifford, G., Scott, D.J., and Villarroel, M. User Guide and Documentation for the MIMIC II Database. 2011. 8. Curtis, J.R., Cook, D.J., Wall, R.J., et al. Intensive care unit quality improvement: A how‐to guide for the interdisciplinary team*. Critical Care Medicine 34, 1 (2006), 211–218. 9. Davenport, T.H. Process innovation : reengineering work through information technology. Harvard Business School Press, Boston, Mass., 1993. 10. Deepak, B.L., Scheiman, J., Abraham, N.S., et al. ACCF/ACG/AHA 2008 Expert Consensus Document on Reducing the GastrointestinalRisks of Antiplatelet Therapy and NSAID Use : A Report of the American College ofCardiology Foundation Task Force on Clinical Expert Consensus Documents. 
52, (2008), 1502–1517. 11. Feldman, R. and Sanger, J. The text mining handbook : advanced approaches in analyzing unstructured data. Cambridge University Press, Cambridge; New York, 2007. 12. Fialho, A.S., Cismondi, F., Vieira, S.M., Reti, S.R., Sousa, J.M.C., and Finkelstein, S.N. Data mining using clinical physiology at discharge to predict ICU readmissions. Expert Systems with Applications, (2012). 13. Fialho, A.S., Cismondi, F., Vieira, S.M., et al. Predicting Outcomes of Septic Shock Patients Using Feature Selection Based on Soft Computing Techniques. In, E. Hüllermeier, R. Kruse and F. Hoffmann, eds., Information Processing and Management of Uncertainty in Knowledge‐Based Systems. Applications. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 65–74. 14. Fialho, A.S., Cismondi, F., Vieira, S.M., et al. Fuzzy modeling to predict administration of vasopressors in intensive care unit patients. IEEE (2011), 2296–2303. 15. Garland, A. Improving the ICU: Part 2. Chest 127, 6 (2005), 2165–2179. 16. Garrouste‐Orgeas, M., Timsit, J.F., Vesin, A., et al. Selected Medical Errors in the Intensive Care Unit: Results of the IATROREF Study: Parts I and II. American Journal of Respiratory and Critical Care Medicine 181, 2 (2009), 134–142. 17. Gawande, A. The checklist manifesto : how to get things right. Profile, London, 2010. 18. Gortzis, L.G., Sakellaropoulos, F., Ilias, I., Stamoulis, K., and Dimopoulou, I. Predicting ICU survival: A meta‐level approach. BMC Health Services Research 8, 1 (2008), 157. 19. Gupta, S. Workflow and Process Mining in Healthcare. 2007.

76

20. Hammer, M. and Champy, J. Reengineering the corporation a manifesto for business revolution. Harper Collins, New York, 2006. 21. Hand, D.J., Mannila, H., and Smyth, P. Principles of Data Mining. MIT Press, Cambridge,MA, 2001. 22. Johansson, H.J., McHugh, P., Pendlebury, A.J., and Wheeler, W.A. Business process reengineering : breakpoint strategies for market dominance. John Wiley, Chichester [etc.], 1993. 23. Kim, S., Kim, W., and Park, R.W. A Comparison of Intensive Care Unit Mortality Prediction Models through the Use of Data Mining Techniques. Healthcare Informatics Research 17, 4 (2011), 232–243. 24. Koh, H.C. and Tan, G. Data Mining Applications in Healthcare. 19, 2 (2005), 64–72. 25. Kollef, M.H. Optimizing antibiotic therapy in the intensive care unit setting. 5, (2001), 189–195. 26. Lang, M., BÜRKLE, T., Laumann, S., and prokosch, H.‐U. Process Mining for Clinical Workflows: Challenges and Current Limitations. In, eHealth Beyond the Horizon – Get IT There. IOS Press, 2008, pp. 229–234. 27. Luce, J.M. and Rubenfeld, G.D. Can Health Care Costs Be Reduced by Limiting Intensive Care at the End of Life? 165, (2002), 750–754. 28. Lundgrén‐Laine, H., Kontio, E., Perttilä, J., Korvenranta, H., Forsström, J., and Salanterä, S. Managing daily intensive care activities: An observational study concerning ad hoc decision making of charge nurses and intensivists. Critical Care 15, 4 (2011), R188. 29. Mans, R.S., Schonenberg, H., Leonardi, G., et al. Process mining techniques: an application to stroke care. 136, (2008), 573–578. 30. Mans, R.S., Schonenberg, M.H., Song, M., Aalst, W.M.P., and Bakker, P.J.M. Application of Process Mining in Healthcare – A Case Study in a Dutch Hospital. In, A. Fred, J. Filipe and H. Gamboa, eds., Biomedical Engineering Systems and Technologies. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 425–438. 31. Melton, J. and Simon, A.R. Understanding the new SQL : a complete guide. Kaufmann, San Francisco Calif., 1997. 32. 
Muscedere, J., Dodek, P., Keenan, S., Fowler, R., Cook, D., and Heyland, D. Comprehensive evidence‐ based clinical practice guidelines for ventilator‐associated pneumonia: Prevention. Journal of Critical Care 23, 1 (2008), 126–137. 33. Niederman, M.S. Appropriate use of antimicrobial agents: Challenges and strategies for improvement. Critical Care Medicine 31, 2 (2003), 608–616. 34. Poelmans, J., Dedene, G., Verheyden, G., Mussele, H., Viaene, S., and Peters, E. Combining Business Process and Data Discovery Techniques for Analyzing and Improving Integrated Care Pathways. In, P. Perner, ed., Advances in Data Mining. Applications and Theoretical Aspects. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 505–517. 35. Pruitt, B. Weaning patients from mechanical ventilation. 36, (2006), 36–41. 36. Pulcini, C., Pradier, C., Long, C.S., et al. Factors associated with adherence to infectious diseases advice in two intensive care units. Journal of Antimicrobial Chemotherapy 57, 3 (2006), 546–550. 37. Ramon, J., Fierens, D., Güiza, F., et al. Mining data from intensive care patients. Advanced Engineering Informatics 21, 3 (2007), 243–256. 38. Rebuge, Á. and Ferreira, D.R. Business process analysis in healthcare environments: A methodology based on process mining. Information Systems 37, 2 (2012), 99–116. 39. Rogers, W.J., Canto, J.G., Lambrew, C.T., Tiefenbrunn, A.J., and Kinkaid, B. Temporal trends in the treatment of over 1.5 million patients with myocardial infarction in the US from 1990 through 1999: the National Registry of Myocardial Infarction 1, 2 and 3. 36, (2000), 2056–2063. 40. Saeed, M., Villarroel, M., Reisner, A.T., et al. Multiparameter Intelligent Monitoring in Intensive Care II: A public‐access intensive care unit database*. Critical Care Medicine 39, 5 (2011), 952–960.


41. Schuerer, D.J.E., Nast, P.A., Harris, C.B., et al. A New Safety Event Reporting System Improves Physician Reporting in the Surgical Intensive Care Unit. Journal of the American College of Surgeons 202, 6 (2006), 881–887.
42. Sevdalis, N. and Brett, S.J. Improving care by understanding the way we work: human factors and behavioural science in the context of intensive care. Critical Care 13, 2 (2009), 139.
43. Valentin, A. and Bion, J. How safe is my intensive care unit? An overview of error causation and prevention. Current Opinion in Critical Care 13, 6 (2007), 697–702.
44. Veiga, G.M. and Ferreira, D.R. Understanding Spaghetti Models with Sequence Clustering for ProM. In S. Rinderle-Ma, S. Sadiq and F. Leymann, eds., Business Process Management Workshops. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 92–103.
45. Zilberberg, M.D., de Wit, M., Pirone, J.R., and Shorr, A.F. Growth in adult prolonged acute mechanical ventilation: Implications for healthcare delivery*. Critical Care Medicine 36, 5 (2008), 1451–1455.
46. Medical Technology and Costs of the Medicare Program, Diagnosis Related Groups (DRGs) and the Medicare Program: Implications for Medical Technology - A Technical Memorandum. U.S. Government Printing Office, Washington, 1983.
47. Relational database. 2006. http://searchsqlserver.techtarget.com/definition/relational-database.
48. ICU Sedation Guidelines of Care. 2009. http://www.chpso.org/meds/sedation.pdf.
49. International Statistical Classification of Diseases and Related Health Problems. 2012. http://en.wikipedia.org/wiki/ICD.
50. Electrocardiography. 2012. http://en.wikipedia.org/wiki/Electrocardiography.
51. Intravenous therapy. 2012. http://en.wikipedia.org/wiki/Intravenous_therapy#Intravenous_access_devices.
52. MIMICII clinical overview. 2012. http://physionet.org/mimic2/mimic2_clinical_overview.shtml.
53. MIMICII. 2012. http://physionet.org/mimic2/.
54. ST-elevation. 2012. http://en.wikipedia.org/wiki/ST_elevation.
55. Predictive Medical Technologies. 2012. http://www.predictive-medical.com/.
56. DRGs 124/125—Circulatory Disorders Except AMI with Cardiac Catheterization with and without Complication/Comorbidity. ICD-9-CM Coding Guidelines. http://www.primaris.org/sites/default/files/resources/HPMP/coding%20guidelines%20booklet_DRG_124_125.pdf.


Appendix A‐ MIMICII tables and their relationships


Figure 16‐ Major MIMICII clinical database components and their relationships (ref/ MIMICII guide2)


Figure 17‐ Patient to ICD‐9 and diagnosis‐related group code


Figure 18‐ Caregiver table and its relationships


Figure 19‐ Careunits table and its relationships.


Figure 20‐ Patient medication tables and their relationships


Figure 21‐ Structure of table ChartEvents

Figure 22‐ Structure of table D_ChartItems

Figure 23‐ Structure of table ICUSTAYEVENTS

Figure 24‐ Structure of table CENSUSEVENTS


Figure 25‐ Structure of table NoteEvents

Figure 26‐ Structure of table DEMOGRAPHICSEVENTS


Appendix B ‐ How can we create a log file with the process data of the new MIMICII schema?

In the new MIMICII database schema explained in subsection 5.1.1, the data of a given process no longer has to be laboriously identified and retrieved before an event log can be created. The schema structures process information so that individual process instances are easy to distinguish. These process instances (or cases) are the basic information needed for the log file. In this appendix, we explain how a log file can be created from the process data stored in the newly suggested database schema.

Before extracting the data for the event log, we must define the ‘case’. A case is the sequence of activities/events that belong to a single instance of a specific process, and the event log is a set of cases (as described in subsection 2.2.2). The case for a treatment process or a checklist process is defined as the combination of the following information from the new version of the MIMICII database:

1) Information related to the patient and his/her ICU stay
This information uniquely identifies the patient in the database and is a combination of the fields SUBJECT_ID, HADM_ID and ICUSTAY_ID, as described in subsection 4.1.3.

2) Information related to the process name
This information specifies the process for which we want to create the log file. The process is identified by the field PROCESS_ID.

3) Information related to the time the process started
This information specifies the date and time at which the process identified in 2) started for the patient selected in 1).

4) Information related to the activities/events of the process
This information covers all events that happened to the patient (selected in 1)) as part of the process (selected in 2)) that started at a certain time (specified in 3)), up to the discharge of the patient from the ICU or up to the next start of the same process (in case the same process is repeated several times for the same patient during the same ICU hospitalization).

The information about when a process started for a specific patient is found in table ‘PROCESS_OCCURRENCE’ (the data related to 1), 2) and 3)). The information about all events related to this started process (the data related to 4)) is recorded in the events tables (such as MEDEVENTS, IOEVENTS, LABEVENTS etc.) of the new version of MIMICII. The date of the patient’s discharge from the ICU is found in table ‘ADMISSIONS’.
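The case definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the extraction procedure used in this thesis: the row classes, the column names `start_time`, `activity` and `time`, and the function `build_event_log` are hypothetical, and the rows are assumed to have been read already from PROCESS_OCCURRENCE, the events tables and ADMISSIONS. Each case collects the events of one process occurrence from its start until the next start of the same process or, for the last occurrence, until discharge.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProcessOccurrence:
    """Simplified row of PROCESS_OCCURRENCE (start_time is an assumed column name)."""
    subject_id: int
    hadm_id: int
    icustay_id: int
    process_id: int
    start_time: datetime


@dataclass(frozen=True)
class Event:
    """Simplified row of an events table such as LABEVENTS (assumed columns)."""
    subject_id: int
    hadm_id: int
    icustay_id: int
    process_id: int
    activity: str
    time: datetime


# A case is identified by patient, ICU stay, process and process start time.
CaseId = Tuple[int, int, int, int, datetime]


def build_event_log(
    occurrences: List[ProcessOccurrence],
    events: List[Event],
    discharge_times: Dict[Tuple[int, int], datetime],
) -> Dict[CaseId, List[str]]:
    """Group events into cases; discharge_times maps (subject_id, hadm_id) to discharge."""
    starts_by_key: Dict[Tuple[int, int, int, int], List[datetime]] = {}
    for occ in occurrences:
        key = (occ.subject_id, occ.hadm_id, occ.icustay_id, occ.process_id)
        starts_by_key.setdefault(key, []).append(occ.start_time)

    log: Dict[CaseId, List[str]] = {}
    for key, starts in starts_by_key.items():
        starts.sort()
        discharge = discharge_times[(key[0], key[1])]
        for i, start in enumerate(starts):
            is_last = i + 1 == len(starts)
            # The case ends at the next start of the same process, or at discharge.
            end = discharge if is_last else starts[i + 1]
            trace = sorted(
                (e.time, e.activity)
                for e in events
                if (e.subject_id, e.hadm_id, e.icustay_id, e.process_id) == key
                and start <= e.time
                and (e.time <= end if is_last else e.time < end)
            )
            log[key + (start,)] = [activity for _, activity in trace]
    return log
```

The returned dictionary holds one time-ordered trace of activity names per case, which is the shape of information an event log needs.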

To sum up, the new version of the MIMICII database lets us identify process instances and all the events related to them. This makes it straightforward to select the process data needed for the event log files that will be analyzed with process mining techniques.


Appendix C ‐ Can we use the diagnosis to link events with processes?

As shown in the new schema of MIMICII in Figure 8, processes are linked with diagnoses in table ‘PROCESSITEMS’. In this appendix, we discuss whether the diagnosis of the patient alone can be used to link events that happened to a patient with a process. In other words, once an event happens to a patient, can the ICU clinical information system automatically determine to which process the event belongs, based only on the patient diagnosis information found in the new schema of MIMICII?

To answer this question, we consider the following possible scenarios:

1‐ The activity/event can be part of one checklist process.
2‐ The activity/event can be part of several checklist processes.
3‐ The activity/event can be part of one treatment process.
4‐ The activity/event can be part of several treatment processes.
5‐ The activity/event can be part of one checklist process and one treatment process.
6‐ The activity/event can be part of one checklist process and several treatment processes.
7‐ The activity/event can be part of several checklist processes and one treatment process.
8‐ The activity/event can be part of several checklist processes and several treatment processes.
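One way to read this list is as a grid: the number of checklist processes an event belongs to (none, one or several) combined with the number of treatment processes (none, one or several), leaving out the combination in which the event belongs to no process at all, which gives 3 × 3 − 1 = 8 scenarios. The sketch below maps a pair of counts to the scenario number used in this appendix; the function and table names are hypothetical and serve only as an illustration of this reading.

```python
from typing import Dict, Tuple


def _level(count: int) -> str:
    """Reduce a process count to 'none', 'one' or 'several'."""
    if count < 0:
        raise ValueError("process counts cannot be negative")
    if count == 0:
        return "none"
    return "one" if count == 1 else "several"


# (checklist level, treatment level) -> scenario number from the list above.
SCENARIOS: Dict[Tuple[str, str], int] = {
    ("one", "none"): 1,
    ("several", "none"): 2,
    ("none", "one"): 3,
    ("none", "several"): 4,
    ("one", "one"): 5,
    ("one", "several"): 6,
    ("several", "one"): 7,
    ("several", "several"): 8,
}


def scenario_number(n_checklist: int, n_treatment: int) -> int:
    """Return the scenario for an event with the given numbers of processes."""
    key = (_level(n_checklist), _level(n_treatment))
    if key == ("none", "none"):
        raise ValueError("the event is not part of any process")
    return SCENARIOS[key]
```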

In the following paragraphs, we explain the given scenarios.

Scenario 1‐ The activity/event can be part of one checklist process.

In this scenario, the activity performed on the patient is part of one checklist process. There are two possible cases: the checklist is applied to every patient (as was the checklist process discussed in 4.2.1), or the checklist is applied only to specific patients (i.e. to patients with a certain diagnosis, to check some specific health conditions). In the first case, there is no link between the process and the diagnosis of the patient, so diagnosis information cannot be used to determine to which process the activity belongs. In the second case, if the checklist is applied only to patients with a certain diagnosis, the system can automatically suggest to its user the processes related to that diagnosis.

Scenario 2‐ The activity/event can be part of several checklist processes.

This scenario is similar to the first one, except that the activity is common to several checklist processes. The user of the information system must manually link the activity with all checklist processes it is part of, and in the database the activity is stored multiple times, in a separate record for each checklist.

Scenario 3‐ The activity/event can be part of one treatment process.


In this scenario, the activity is part of one treatment process. A treatment process is performed only on patients who have certain health conditions, e.g. a certain diagnosis. ICU medical staff therefore start a treatment process on a patient after a diagnosis (temporary or permanent) has been specified for the patient. In such cases, the system can suggest to its user the list of treatment processes related to the patient’s diagnosis. If the list contains just one process, the system selects it automatically; otherwise, it asks the user to select the process via a GUI of the information system.
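The suggestion rule of this scenario can be sketched as follows. This is only an illustration under assumptions: the dictionary `links` is a hypothetical in-memory stand-in for the diagnosis-to-process links stored in PROCESSITEMS, the diagnosis codes are placeholders, and the function names are not part of any existing system. The process is selected automatically only when exactly one candidate exists; otherwise the candidates are returned so that the user can choose.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical stand-in for PROCESSITEMS: diagnosis code -> treatment PROCESS_IDs.
ProcessLinks = Dict[str, Set[int]]


def suggest_treatment_processes(
    diagnoses: Iterable[str], links: ProcessLinks
) -> List[int]:
    """Return the sorted treatment processes linked to any of the diagnoses."""
    suggested: Set[int] = set()
    for code in diagnoses:
        suggested |= links.get(code, set())
    return sorted(suggested)


def select_process(
    diagnoses: Iterable[str], links: ProcessLinks
) -> Tuple[Optional[int], List[int]]:
    """Return (automatically selected process or None, list of candidates)."""
    candidates = suggest_treatment_processes(diagnoses, links)
    # Select automatically only when there is exactly one candidate;
    # otherwise the user has to choose from the candidates.
    selected = candidates[0] if len(candidates) == 1 else None
    return selected, candidates
```

When `select_process` returns `None` as its first element, the candidate list is what the GUI would present to the user.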

Scenario 4‐ The activity/event can be part of several treatment processes.

This scenario can occur when an event happens to a patient who has several different diagnoses. The patient is being treated for all of his/her diagnoses, and an event that happened to him/her can be related to more than one of the patient’s treatment processes. For example, a lab test is performed on a patient who is being treated for three different diagnoses, and this test is part of two of his/her treatment processes. In this case, the activity is common to different treatment processes, and the system cannot determine automatically (based on the patient diagnosis) of which treatment processes this activity is part. Here, the system can suggest (in a similar way as in scenario 3) the list of treatment processes related to all of the patient’s diagnoses, and it is the user who links the event with the corresponding processes (in this case with more than one process). Then, in the new schema of MIMICII, the event is registered in a separate record for each process that it belongs to.
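The storage rule at the end of this scenario, one record per process the event belongs to, can be illustrated with a short sketch. The record layout and the function name are hypothetical simplifications; a real events table has more columns.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EventRecord:
    """Simplified event row: one record ties one event to one process."""
    icustay_id: int
    activity: str
    process_id: int


def records_for_processes(
    icustay_id: int, activity: str, process_ids: Iterable[int]
) -> List[EventRecord]:
    """Create one record per distinct process selected by the user."""
    return [
        EventRecord(icustay_id, activity, process_id)
        for process_id in sorted(set(process_ids))
    ]
```

For the lab test in the example above, which is part of two treatment processes, the function yields two records that differ only in their process identifier.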

In the other scenarios (5 to 8), where an activity that happened to a patient can be part of one or several checklist processes and one or several treatment processes, the system again cannot decide automatically which processes to link with this activity based only on the diagnosis of the patient. In these cases, the system can only suggest the treatment processes related to the patient (based on his/her diagnoses), and it is left to the user of the information system to specify the link between the activity and the processes.

In summary, the information that relates a diagnosis to a process (stored in the new version of MIMICII) can help the clinical information system connect an activity/event that happened to the patient with a process, but this cannot always be done fully automatically. When the diagnosis of the patient is related to only one treatment process, then once the user has specified that the event is part of a treatment process (and not a checklist process), the system can automatically link the event with that treatment process. In all other cases, the information system cannot decide which process to select, and this choice is left to the user of the information system.


Appendix D

Figure 27 ‐ Richmond Agitation Sedation Scale

Figure 28‐ Procedure of RASS assessment


Circulation/SkinInt

Imp Skin Cleanse #1

Imp Skin Cleanse #2

Imp Skin Cleanse #3

Imp Skin Cleanse #4

Imp Skin Cleanse #5

Imp Skin Cleanse #6

ImpSkin Character #1

ImpSkin Character #2

ImpSkin Character #3

ImpSkin Character #4

ImpSkin Character #5

ImpSkin Character #6

ImpSkin Drain/Amt #3

ImpSkin Drain/Amt #6

ImpSkin Drain/Amt#1

ImpSkin Drain/Amt#2

ImpSkin Drain/Amt#4

ImpSkin Drain/Amt#5

ImpSkin Treatment #1

ImpSkin Treatment #2

ImpSkin Treatment #3

ImpSkin Treatment #4


ImpSkin Treatment #5

ImpSkin Treatment #6

ImpSkin Wound Base#1

ImpSkin Wound Base#2

ImpSkin Wound Base#3

ImpSkin Wound Base#4

ImpSkin Wound Base#5

ImpSkin Wound Base#6

ImpSkinWth/Lth #1

ImpSkinWth/Lth #2

ImpSkinWth/Lth #3

ImpSkinWth/Lth #4

ImpSkinWth/Lth #5

ImpSkinWth/Lth #6

Impaired Skin Site#1

Impaired Skin Site#2

Impaired Skin Site#3

Impaired Skin Site#4

Impaired Skin Site#5

Impaired Skin Site#6

Skin Care

Skin Color

Skin Integrity

Skin Temp/Condition


skin

Skin tags

SKIN

Splint off,skinchk

Sheepskin

SkinInteg1/1

Temp Skin [C]

Skin

Mineral oil/ skin

sheepskin for heels

Critic‐Aid Skin Pst

skin laceration

Table 6‐ List of chart items related to Skin impairments check item of the checklist obtained in step 2

Impaired Skin Site#1

Impaired Skin Site#2

Impaired Skin Site#3

Impaired Skin Site#4

Impaired Skin Site#5

Impaired Skin Site#6

Table 7‐ Reduced list of chart items related to Skin impairments check item of the checklist


Table 8‐ Examples of process mining products: commercial tools C, academic tools A and open‐source tools O. [1:271].
