PROCESS MINING IN STANDARDS-BASED HEALTHCARE INFORMATION SYSTEMS

Author: Emmanuel Helm MSc

Submission: Institute of Application-oriented Knowledge Processing (FAW)

First Supervisor: a.Univ.-Prof. Dr. Josef Küng

Second Supervisor: Prof. Marcos Sepúlveda PhD

February 2021

Doctoral Thesis to confer the academic degree of Doktor der technischen Wissenschaften in the Doctoral Program Technische Wissenschaften

JOHANNES KEPLER UNIVERSITY LINZ
Altenberger Straße 69
4040 Linz, Austria
jku.at

Sworn Declaration

I hereby declare under oath that the submitted Doctoral Thesis has been written solely by me without any third-party assistance, information other than provided sources or aids have not been used and those used have been fully documented. Sources for literal, paraphrased, and cited quotes have been accurately credited. The submitted document here present is identical to the electronically submitted text document.

Linz, 25th February, 2021

Emmanuel Helm


“Computer Science is no more about computers than astronomy is about telescopes.”
Edsger W. Dijkstra


Abstract

Healthcare organizations are bureaucracies where groups of trained professionals coordinate their work within functional units or departments. This coordination is based more on the standardization of skills and knowledge than on the standardization of work processes. However, by operating the user interfaces of their information systems and medical devices, healthcare personnel triggers a sequence of functions and procedures. From the point of view of the systems and their recorded event logs and databases, these actions constitute a process that is emerging over time. The research discipline of “process mining” aims to facilitate understanding and improvement of these processes.

This thesis addresses challenges that emanate from the application of process mining techniques to data of healthcare information systems – especially data collection, data integration, and data quality. To this end, a review of existing work in the field of process mining in healthcare is conducted and the characteristics of healthcare data are described. Based on the assumption that healthcare information systems strive for interoperability, methods are developed and tested to utilize the (process) data recorded in standardized “audit trails”. The contributions include an interface to access this data based on Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR). This thesis also presents a case study on guideline compliance checking for melanoma surveillance procedures based on process mining.

The conclusion is that different initiatives and standardization efforts gradually converge towards a better, more interoperable, health IT environment. Syntactic and semantic interoperability pave the road for “process interoperability”, and the standardization of data plays a key role in achieving a better understanding of the complex interactions in healthcare workflows.


Kurzfassung

Gesundheitseinrichtungen sind bürokratisch organisiert und Gruppen ausgebildeter Fachleute koordinieren ihre Arbeit innerhalb funktionaler Einheiten (Abteilungen). Die Koordination basiert dabei eher auf der Standardisierung von Fähigkeiten und Wissen als auf der Standardisierung von Arbeitsprozessen. Das Gesundheitspersonal initiiert durch die Bedienung von Benutzerschnittstellen der Informationssysteme und medizinischen Geräte jedoch eine Kette von Funktionsaufrufen und Interaktionen im Hintergrund. Diese Aktionen stellen, aus der Sicht der Systeme, aufgezeichnet in ihren Ereignisprotokollen und Datenbanken, einen Prozess dar, der sich über die Zeit entwickelt. Die Forschungsdisziplin des “Process Mining” zielt darauf ab, das Verständnis und die Verbesserung dieser Prozesse zu ermöglichen.

Diese Dissertation befasst sich mit den Herausforderungen, die sich aus der Anwendung von Process-Mining-Techniken auf Daten von Gesundheitsinformationssystemen ergeben - insbesondere Datenerfassung, Datenintegration und Datenqualität. Zu diesem Zweck wird ein Überblick über bestehende Arbeiten auf dem Gebiet des Process Mining im Gesundheitswesen gegeben und die Charakteristika von Gesundheitsdaten werden beschrieben. Ausgehend von der Annahme, dass Gesundheitsinformationssysteme nach Interoperabilität streben, werden Methoden entwickelt und getestet, um die in standardisierten “Audit Trails” aufgezeichneten (Prozess-)Daten zu nutzen. Der wissenschaftliche Beitrag umfasst zusätzlich eine Schnittstelle für den Zugriff auf diese Daten auf Basis des Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR) Standards. In dieser Dissertation wird auch eine Fallstudie zur Überprüfung der Einhaltung klinischer Leitfäden nach Melanom-Behandlungen basierend auf Process Mining präsentiert.

Verschiedene Initiativen und Standardisierungsvorhaben im Gesundheitswesen führen dazu, dass die IT-Systeme allmählich interoperabler werden. Syntaktische und semantische Interoperabilität ebnen den Weg für die “Prozessinteroperabilität”, und die Standardisierung von Daten spielt eine Schlüsselrolle beim Erreichen eines besseren Verständnisses für die komplexen Prozesse und deren Interaktionen im Gesundheitswesen.


Contents


Abstract v

Kurzfassung vii

1 Introduction 1
  1.1 Process Mining ...... 2
    1.1.1 Running Example ...... 3
  1.2 Healthcare Processes ...... 4
    1.2.1 A Distinction ...... 4
    1.2.2 Characteristics ...... 5
  1.3 Relevance ...... 6
    1.3.1 Process Characteristics Challenges ...... 7
    1.3.2 Event Log Quality ...... 7
    1.3.3 Data Collection and Integration ...... 8
  1.4 Research Questions ...... 8
  1.5 Contributions ...... 9
  1.6 Structure of the Thesis ...... 10

2 Process Mining 13
  2.1 Event Log as Starting Point for Process Mining ...... 14
    2.1.1 Extended Running Example ...... 15
    2.1.2 Data Quality Issues ...... 16
    2.1.3 Event Log Maturity Levels ...... 17
  2.2 Types of Process Mining ...... 18
    2.2.1 Process Discovery ...... 19
    2.2.2 Conformance Checking ...... 22
    2.2.3 Process Enhancement ...... 24
  2.3 Mining Different Perspectives ...... 24
    2.3.1 Control-Flow ...... 24
    2.3.2 Time ...... 25
    2.3.3 Organizational ...... 25
    2.3.4 Case ...... 26
  2.4 Standardized Event Log Representation ...... 26
  2.5 Recent Developments in Process Mining ...... 28

3 Interoperability and Healthcare Data Standards 29
  3.1 Interoperability ...... 30
    3.1.1 Technical Interoperability ...... 31
    3.1.2 Semantic Interoperability ...... 31
    3.1.3 Process Interoperability ...... 32
  3.2 Standards Development Organizations ...... 32
    3.2.1 DICOM ...... 32
    3.2.2 HL7 ...... 32
    3.2.3 IHE ...... 33
  3.3 Healthcare Data Exchange Standards ...... 33
    3.3.1 FHIR ...... 36
  3.4 Terminology Systems ...... 36
    3.4.1 ICD-10 ...... 37
    3.4.2 RadLex ...... 37
    3.4.3 SWIM Lexicon ...... 38
    3.4.4 LOINC ...... 39
    3.4.5 SNOMED CT ...... 39
  3.5 Integration Profiles ...... 40
    3.5.1 ATNA ...... 40
    3.5.2 SOLE ...... 42
  3.6 Discussion ...... 43

4 State of Process Mining in Healthcare 45
  4.1 Other Reviews ...... 46
  4.2 Literature Review Methodology ...... 47
    4.2.1 Selection of Clinically-relevant Case Studies ...... 48
    4.2.2 Process Mining Aspects ...... 50
    4.2.3 Clinical Aspects and Standard Coding Schemes ...... 50
  4.3 Results of the Review ...... 50
    4.3.1 Selected Case Studies ...... 50
    4.3.2 Process Mining Aspects ...... 50
    4.3.3 Clinical Aspects Using Standard Clinical Descriptors ...... 52
  4.4 Conclusion ...... 54
    4.4.1 Reporting Basic Characteristics of the Event Log Data ...... 55
    4.4.2 Adopting the Use of Standard Clinical Descriptors ...... 55
    4.4.3 The Need for a Reporting Template ...... 56
  4.5 Reporting Template Outline ...... 57

5 Mining Audit Trails 59
  5.1 Standardized Audit Logging ...... 60
    5.1.1 IHE Audit Message Semantics ...... 61
    5.1.2 HL7 FHIR AuditEvent Resource ...... 63
  5.2 Direct Mapping Approach ...... 63
    5.2.1 Transformation Architecture ...... 63
    5.2.2 Test Setting ...... 64
    5.2.3 Audit Messages for the Running Example ...... 66
    5.2.4 Transformation Result ...... 68
    5.2.5 Visualization ...... 69
    5.2.6 Discussion and Issues ...... 69
  5.3 Data Warehouse Approach ...... 70
    5.3.1 OpenSLEX Meta Model ...... 71
    5.3.2 Mapping and Integration ...... 72
    5.3.3 Discussion and Issues ...... 75
  5.4 Process Mining Interface ...... 76
    5.4.1 Simulate ...... 77
    5.4.2 Store & Provide ...... 78
    5.4.3 Analyze ...... 80
    5.4.4 Test Results ...... 81
    5.4.5 Discussion and Issues ...... 82
  5.5 Discussion ...... 84
    5.5.1 Data Quality in Audit Logs ...... 84

6 Compliance Checking 87
  6.1 Melanoma Surveillance ...... 88
  6.2 Characteristics of the Case Study ...... 89
  6.3 Methodology ...... 90
    6.3.1 Data Preparation ...... 90
    6.3.2 Time Boxing ...... 91
    6.3.3 Conformance Checking ...... 92
  6.4 Results of the Case Study ...... 92
    6.4.1 Data Preparation ...... 93
    6.4.2 Conformance Checking ...... 94
    6.4.3 Applied Process Discovery ...... 96
  6.5 Discussion ...... 97
    6.5.1 Reuse of Clinical Data for Process Mining ...... 97
    6.5.2 Events with Time Constraints Spanning a Long Period of Time ...... 99
    6.5.3 Medical Implications ...... 99
    6.5.4 Guideline Compliance Measurement ...... 101

7 Conclusions and Outlook 103
  7.1 Research Questions Revisited ...... 103
  7.2 Impact and Future ...... 105
    7.2.1 Impact on Standardization ...... 105
    7.2.2 Basic Research ...... 106
    7.2.3 Guideline Compliance Checking ...... 107
  7.3 Reflecting on Process Mining in Healthcare ...... 108
    7.3.1 Limitations and Opportunities ...... 109
    7.3.2 The Genesis of this Thesis ...... 111

A Running Example XES 113

B FHIR AuditEvent 117

C Acknowledgements 119

D Curriculum Vitae 121

List of Figures 127

List of Tables 129

Acronyms 131

Bibliography 135

CHAPTER 1 Introduction

Insufficient communication and missing information in medical management often lead to adverse events, i.e., unintended injuries. Although Business Process Management (BPM) technologies have the potential to improve this situation, they have not seen widespread adoption in the healthcare domain [1, 2]. One reason for the lack of BPM in healthcare is the complexity of the processes, where unforeseen events in the course of a disease or during the treatment are to some degree a “normal” phenomenon. Early attempts at IT-based healthcare process support have been unsuccessful whenever rigidity came with them, limiting the ability of the organization to respond to process changes and exceptional situations in an agile way [2].

Process mining, or more precisely process discovery, is able to adequately capture different behavioral aspects of non-trivial operational processes and produce human-readable process models. A goal of applying process mining techniques to the healthcare domain is to understand the complex interactions between multiple actors, both human and machine, and the underlying, partially implicit processes [3]. A deeper understanding of the processes could lead to better BPM systems that are able to coordinate organizational tasks and provide information at the point of care, while still allowing for flexible, adaptive and evolutionary processes [1].

While healthcare is “the largest remaining market for pens, paper and fax-machines” [4], the digital transformation continues steadily. Interoperability is the precondition for a seamless flow of information between the multiple actors, in order to enable collaboration and communication. Healthcare IT standards provide the basis for interoperability and can also help to identify, access, and understand the data needed for process mining.


1.1 Process Mining

The basic idea of process mining is simple and intriguing: event logs of information systems contain implicit information about possibly unknown processes running in the real world [5]. This implies that these ever-growing logs are more than dusty archives that are only recorded for legal reasons. They contain potential business value because they are projections of real-world processes that were executed using an information system.

In contrast to classic business process intelligence approaches, process mining does not rely on qualitative methods (e.g., expert interviews) to approximate the real processes. Essentially, process mining provides an a-posteriori empirical method to discover processes in observed system behavior, i.e., event logs. Additionally, the discipline of process mining covers methods for conformance checking and model enhancement (cf. figure 1.1) [6].

Figure 1.1: Positioning of the three main types of process mining: (a) discovery, (b) conformance checking, and (c) enhancement (from [6]).

All three types of process mining, discovery, conformance checking, and enhancement, presume that an information system has sequentially recorded events referring to real-world activities [7].


1.1.1 Running Example

Figure 1.2 shows a simple process model for an examination in a radiology practice using Business Process Model and Notation (BPMN). It shows the main steps from the appointment scheduling to the distribution of the diagnostic report. It is based on the work by Erickson et al. [8] on business analytics in radiology, the work in [9], and on the process model used for evaluation by [10]. This radiology practice workflow will be used as a running example throughout this thesis.

Figure 1.2: Business Process Model and Notation (BPMN) diagram of a workflow in a radiology practice based on [8, 9, 10].

In the first step, a patient calls the practice to schedule an appointment because he/she received a referral for a radiological examination. On the day of the examination, the patient arrives at the reception and is placed on the waiting list (patient admission). When called, the patient enters the procedure room and the radiological examination takes place. Afterwards, the radiologist makes a diagnosis and dictates the report. The report writing is done by trained specialists. The resulting report is attested by the radiologist. Finally, the report is sent to a requesting physician or handed out directly to the patient (report transmission).

An information system documenting the execution of this process, sequentially recording the corresponding events, would generate an event log similar to the one in table 1.1. Every line corresponds to one event. Activity, the description of the event, and Case id (Cid), the identifier of the single instance of the process executed (also called trace), are the minimum attributes necessary for process mining [11]. Moreover, the recorded events must be ordered, otherwise temporal dependencies (e.g., activity Diagnosis follows activity Radiological Examination) cannot be discovered. The ordering could, for example, be enabled by a consecutively numbered Event id or a Timestamp. In addition to the three attributes listed in the table, other properties of the events could be recorded, e.g., start and end time of an activity, the resources or persons involved, or costs. The attributes of the events are substantial for certain analysis techniques and allow for mining different perspectives of the process.


Cid  Timestamp       Activity
1    20190918101505  Schedule Appointment
1    20190918113123  Patient Reception
1    20190918114021  Radiological Examination
1    20190918115517  Diagnosis
1    20190918121019  Report Writing
1    20190918121548  Report Attestation
1    20190918123711  Report Transmission

Table 1.1: An event log based on the running example process model in figure 1.2.
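Grouping the events of table 1.1 by case id and ordering them by timestamp yields exactly the traces that process mining techniques consume. The following minimal sketch illustrates this in Python; the `traces` helper and the parsing format are illustrative assumptions, not part of any tooling described in this thesis.

```python
from collections import defaultdict
from datetime import datetime

# The event log of table 1.1 as (case id, timestamp, activity) tuples.
events = [
    ("1", "20190918101505", "Schedule Appointment"),
    ("1", "20190918113123", "Patient Reception"),
    ("1", "20190918114021", "Radiological Examination"),
    ("1", "20190918115517", "Diagnosis"),
    ("1", "20190918121019", "Report Writing"),
    ("1", "20190918121548", "Report Attestation"),
    ("1", "20190918123711", "Report Transmission"),
]

def traces(log):
    """Group events by case id and order them by timestamp, yielding
    one activity sequence (trace) per case."""
    by_case = defaultdict(list)
    for cid, ts, activity in log:
        by_case[cid].append((datetime.strptime(ts, "%Y%m%d%H%M%S"), activity))
    return {cid: [a for _, a in sorted(evs)] for cid, evs in by_case.items()}

print(traces(events)["1"][0])  # Schedule Appointment
```

With only one case the result is a single trace; a real log would contain many cases, and the same grouping would expose the variation between them.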

1.2 Healthcare Processes

Healthcare organizations, like hospitals, are organized in structured hierarchies. As noted by Leseure et al. [12], the “coordination of actions in these professional bureaucracies is based more on standardization of skills and knowledge rather than standardization of work processes” [12]. Groups of professionals with common education and skills, e.g., physicians or nurses, guarantee the coordination of work within the functional units of the healthcare organization. A work process in this kind of organization is typically not formally prescribed and standardized.

1.2.1 A Distinction

To provide a better understanding of the characteristics of healthcare processes, R. Lenz and M. Reichert [1] introduced a distinction between organizational processes and the medical treatment process.

Organizational processes coordinate the collaboration between healthcare professionals and organizational units. [1]

Organizational processes are not focused on the support of medical decision making and are in general of a repetitive nature. The running example of the radiology practice workflow in section 1.1.1 describes an organizational process. One example of an organizational process is patient administration in a hospital, comprising steps like admission, transfer between wards, discharge, and all the data collection and management involved. The order entry and result reporting process in the laboratory domain is another example, where the interdepartmental communication between a ward and the laboratory unit is described.

Medical treatment processes, also described as diagnostic-therapeutic cycle comprising observation, reasoning and action, aim with each iteration at decreasing the uncertainty about the patient’s disease or the actual state of the disease process. [1, 13]


In the medical treatment process (or diagnostic-therapeutic cycle, cf. figure 1.3), medical personnel makes informed decisions to choose the next diagnostic or therapeutic procedures in the treatment of a patient. Their decisions are based on the available information and their medical knowledge. Organizational processes enable the medical treatment process by providing the required resources (e.g., rooms, devices, personnel, information) in time at the point of care.

Figure 1.3: The diagnostic-therapeutic cycle (from [1]).

This distinction may be described on a very abstract level, but it is important to understand the different perspectives on the recorded event data. Although working together, using the same facilities and information systems, actors in a healthcare environment execute different types of processes. Moreover, some actors, e.g., physicians, are active participants in both types of processes, organizational and medical treatment, at the same time.

1.2.2 Characteristics

Rebuge and Ferreira [14] conclude in their work that healthcare processes, both organizational and medical treatment, are highly dynamic, highly complex, increasingly multi-disciplinary, and generally ad hoc.

• They are highly dynamic because of the high number of influence factors like technological developments, scientific progress and new legal regulations. When medical knowledge or best practice evolves, the treatment process also does. A momentary change in treatment processes could be caused for example by a flu epidemic or a global pandemic stressing the hospital’s resources.


• They are highly complex because of the high number of involved actors, different information systems, and the related interoperability issues. Moreover, the medical treatment process involves not only medical guidelines but also the individual experience of physicians.

• Healthcare processes are increasingly interdisciplinary because of the growing number of specialized departments that work together to deliver care services, partly across organizational boundaries.

• And they are ad hoc, because decisions are made by humans collaborating with humans all acting according to their personal knowledge and skill, dealing with specific patient situations. Thus, healthcare processes typically show high variability and do not follow predefined step-by-step execution plans.

The latter aspect, being ad hoc, cannot be stressed enough. Medical personnel handling large equipment (e.g., Computed Tomography (CT) or X-Ray machines) or the Hospital Information System (HIS) do not consciously execute a predefined step-by-step process, but focus on more high-level tasks and goals they want to achieve using these systems, namely the treatment of the patient. By operating the user interfaces of their information systems and complex machines, they will trigger a sequence of functions and procedures. From the point of view of the machines and the resulting event logs, however, these actions constitute a process that is emerging as the medical personnel is working.

1.3 Relevance

The characteristics of healthcare processes make it impossible to apply rigorous BPM, workflow management, and business process reengineering techniques. Mans et al. [15] highlight the fact that “a hospital is not a factory and patients cannot be cured using a conveyor belt system” [15]. However, systematic literature reviews [16, 17, 18] already list multiple case studies and projects that show how process mining can be used to gain insights into the healthcare domain and highlight room for improvement.

In 2011, W. van der Aalst et al. [6] listed 11 challenges in their Process Mining Manifesto that need to be addressed to advance the state of process mining. Several of these challenges are still faced by process mining projects in the healthcare domain. Especially the data-related issues, like finding, merging, and cleaning the event data, or dealing with complex event logs having diverse characteristics, are implied in multiple publications, e.g., [14, 19, 20] among others.

For a high-level classification of problems in process mining projects in general, Bose et al. [21] distinguished between two classes:

(1) Process characteristics: challenges emanating from characteristics such as fine-granular activities, process heterogeneity and variability, process evolution, and high-volume processes.

(2) Quality of event log: problems that stem from issues related to the quality of logging, including data quality, manifested in event logs.


For this thesis, we add a third class (3) Data collection and integration, which is actually the first challenge listed in the manifesto [6]. This class is about finding, merging, and cleaning event data for process mining.

1.3.1 Process Characteristics Challenges

The characteristics of healthcare processes are already discussed in section 1.2.2. One of the main challenges resulting from these characteristics is that process discovery algorithms typically generate spaghetti-like models, which are unstructured and hard to understand [3].

Figure 1.4: Spaghetti process model (from [22]).

Figure 1.4 is taken from the work of W. van der Aalst [22] on the discovery of process models. It shows a process model describing the diagnosis and treatment of 2,765 patients in a Dutch hospital. 619 different types of activities were executed by 266 different persons, leading to a total of 114,592 recorded events. Using the heuristic miner algorithm [23], infrequent behaviour was filtered out, but the model is still too difficult to comprehend. The author explicitly notes that the high complexity of the model is not caused by the chosen algorithm but duly reflects reality [22].
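The filtering step mentioned above can be illustrated with a toy sketch: count the directly-follows pairs over all traces and discard edges below a frequency threshold. This is only a simplified stand-in for the frequency handling of the heuristic miner [23] (which additionally computes dependency measures); the traces and threshold below are invented for demonstration.

```python
from collections import Counter

# Toy log: a frequent "happy path" and one rare deviating trace.
toy_traces = [["a", "b", "c"]] * 10 + [["a", "x", "c"]]

# Count how often each directly-follows pair occurs across all traces.
df_counts = Counter(
    (s, t) for trace in toy_traces for s, t in zip(trace, trace[1:])
)

# Keep only pairs observed more often than a noise threshold; the rare
# a->x and x->c edges are dropped from the discovered model.
threshold = 2
frequent = {pair for pair, n in df_counts.items() if n > threshold}
print(sorted(frequent))  # [('a', 'b'), ('b', 'c')]
```

In a log with hundreds of activity types, as in the Dutch hospital example, even aggressive thresholds can leave a model too dense to read, which is exactly the point made above.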

Long-Running Processes

Established conformance checking techniques in process mining work on completed processes. However, certain surveillance protocols or guidelines, e.g., in oncology, demand regular checkup visits over the course of months or even years. Hence, it is a challenge to determine the compliance of all actors involved while the processes are still running.
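One conceivable way to make such long-running cases comparable is to assign each recurring event to a follow-up window relative to the case start; chapter 6 develops a time-boxing method along these lines. The sketch below is a hedged simplification with invented dates and a fixed 365-day window, not the case-study method itself, which derives its windows from the clinical guideline.

```python
from datetime import datetime

# Checkup events of one long-running surveillance case (illustrative dates).
case_start = datetime(2013, 3, 1)
checkups = [datetime(2013, 3, 1), datetime(2013, 9, 10),
            datetime(2014, 4, 2), datetime(2016, 1, 20)]

def time_box(events, start, days_per_box=365):
    """Assign each event to a fixed-length window ("box") relative to the
    case start, so recurring checkups become comparable activities like
    'Checkup year 1', 'Checkup year 2', ... even while the case is open."""
    return [f"Checkup year {(e - start).days // days_per_box + 1}"
            for e in events]

print(time_box(checkups, case_start))
```

After this relabeling, standard conformance checking can compare the boxed activities of an ongoing case against the visits a guideline prescribes per year.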

1.3.2 Event Log Quality

The quality of the event log can be affected by various issues. Bose et al. [21] identified 27 different classes of issues that can be summarized in four categories that also apply to the healthcare domain [15]: missing, incorrect, imprecise, and irrelevant data. These data quality issues also affect the results of process mining [11].
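Some of these issues can be surfaced by simple automated checks before mining. The following sketch flags missing and imprecise (date-only) timestamps in an illustrative log; the function and the data are assumptions for demonstration, covering only two of the issue categories named above.

```python
# Events as (case id, timestamp or None, activity); illustrative data
# exhibiting a missing and an imprecise (date-only) timestamp.
events = [
    ("1", "2019-09-18 10:15:05", "Schedule Appointment"),
    ("1", None,                  "Patient Reception"),
    ("1", "2019-09-18",          "Radiological Examination"),
]

def timestamp_issues(log):
    """Flag events whose timestamp is absent or date-only. Both defects
    break or distort the event ordering that process mining relies on."""
    issues = []
    for i, (cid, ts, act) in enumerate(log):
        if ts is None:
            issues.append((i, "missing timestamp"))
        elif len(ts) == 10:  # "YYYY-MM-DD" only, no time of day
            issues.append((i, "imprecise timestamp"))
    return issues

print(timestamp_issues(events))
```

Running such checks first makes it explicit which traces can be ordered reliably and which need cleaning or exclusion before discovery or conformance checking.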


1.3.3 Data Collection and Integration

Most projects and case studies on process mining in healthcare encounter problems at the beginning, when they start with data collection and data integration [17]. This is a major issue delaying further analysis. It is primarily not a problem of either process characteristics or data quality, but ultimately stems from the very way the data is structured and recorded. Healthcare information systems reflect the complex environments they were developed for. Understanding the healthcare data exchange standards and their information models, i.e., their basis for interoperability, can help to better understand the data in order to collect and integrate the event logs.

1.4 Research Questions

Based on the challenges described above, this thesis addresses a number of research questions in the field of process mining in healthcare. The overall goal can be summarized by the title of this thesis: (to enable) process mining in standards-based healthcare information systems.

RQ1 How can healthcare IT standards be used to overcome the challenges of process mining in healthcare?

a) Which standards are relevant for process mining in healthcare?
b) How do existing studies on process mining in healthcare utilize standards?

RQ2 How can we reuse the data captured in the audit trails of healthcare IT systems to discover healthcare processes?

a) Do standardized audit trails provide the information necessary for process mining?
b) How can we make these data records accessible for process mining tools?

RQ3 How can we enable conformance checking in long-running healthcare processes?

a) How can data of recurring events with time constraints that span a long period of time be prepared to apply process mining?
b) How can we apply conformance checking techniques to measure medical guideline compliance?
c) What can we learn from process mining in the context of the surveillance of melanoma patients?

These questions will be reiterated in the respective chapters. Finally, the Conclusions and Outlook chapter 7 will revisit the research questions and sum up the findings.


1.5 Contributions

This thesis contributes to the field of process mining in healthcare. It aims to provide insights and methods to support different stages of process mining projects, from data collection and integration to the analysis of long-running processes.

C1 A description of possible data sources. [24, 25]
Although the world of healthcare IT is fragmented and heterogeneous, there are several initiatives and standards development organizations that aim to overcome these issues. This thesis gives an overview of widely-used standards for healthcare data encoding and exchange, and thus an analysis of possible data sources for process mining projects. This contribution addresses RQ1a and provides the basis for all following contributions.

C2 An overview of the current state of process mining in healthcare. [18, 26]
There are several literature reviews on process mining in healthcare. By analyzing existing secondary studies and by extending the well-received review of Rojas et al. [16] for the three-year period from January 2016 to December 2018, the current state of the field can be described in detail. This contribution examines how existing case studies report their study design, and provides a reporting template to improve comparability, transparency, and reproducibility. RQ1b is addressed here.

C3 Transformation of standardized audit data to mineable event logs. [10, 27, 28]
The international non-profit organization Integrating the Healthcare Enterprise (IHE) defines a widely adopted technical guideline for audit trail recording. This guideline determines syntactic and semantic aspects of the log messages in healthcare environments. To enable process mining, rules are defined to transform the audit logs into suitable event logs. A test system was developed to show the applicability of the approach in the radiology domain. RQ2a is addressed here.

C4 A novel method to query event logs from health information systems. [29]
With the simultaneous maturing of several technologies, it is now possible to develop new methods to query and retrieve event logs in a standardized way. An approach is presented that can be integrated in modern health information systems to provide a standardized process mining interface, utilizing the newest profiles, coding schemes, and communication standards. This contribution addresses RQ2b.

C5 Method and case study for conformance checking of long-running processes. [30, 31]
The Department of Dermatology at the Medical University Vienna (DDMUV) provided data of their malignant melanoma surveillance program over the course of seven years. Using process mining techniques we are able to visualize important aspects of the surveillance program including frequent deviations. Moreover, methods to enable the measurement of guideline compliance were developed and the results evaluated. This contribution addresses RQ3.


1.6 Structure of the Thesis

The structure of the remainder of this thesis is depicted in figure 1.5. The figure shows the main parts in boxes and the arrows indicate influences. The box at the bottom, the Introduction to Process Mining, forms the foundation for the parts above, i.e., for the entire thesis. The two boxes on the sides form the cornerstones for the middle parts by providing the necessary context. The dashed line between these two boxes indicates that they were not developed independently, but in constant consideration of each other.

Figure 1.5: Structure of this thesis.

The first part of this thesis, chapter 2, provides an introduction to the field of process mining. The basic concepts, types, and perspectives are explained based on an extended version of the running example. Furthermore, the eXtensible Event Stream (XES) standard for achieving interoperability in event logs and event streams is introduced.

Chapter 3 introduces the concepts of healthcare interoperability. It gives an overview of data exchange standards and terminologies relevant for process mining and describes recent developments and approaches in IHE integration profiles.

Chapter 4 analyzes existing secondary studies on process mining in healthcare and adds a literature review focussing on the aspects of standardized coding and study design. Based on the results of the review, an outline for a reporting template for case studies is presented.


The utilization of standardized audit logs for process mining is discussed in chapter 5. Three approaches, (1) a direct transformation approach, (2) a data warehousing approach, and (3) a query interface based on Health Level Seven (HL7) Fast Healthcare Interoperability Resources (FHIR), are presented. All three approaches, their relationships, and their issues are discussed.

In chapter 6, new methods for data preparation and analysis techniques for the measurement of guideline compliance are introduced. Their application in a case study on melanoma surveillance is described and the results are evaluated.

Chapter 7 concludes the thesis, revisits the research questions, and highlights the relevance and impact of the contributions. Furthermore, this chapter provides a discussion of future research in the field of process mining in healthcare.

Appendix A provides a listing of the full extended running example in the XES standard. Appendix B provides a listing of the HL7 FHIR AuditEvent resource used in chapter 5. In Appendix C the author thanks all the people who have had an influence on this work in one way or another. Appendix D is the author’s CV, including work experience and education, academic activities, and a list of all scientific publications.


“All models are wrong, but some are useful.” George E. P. Box

CHAPTER 2 Process Mining


Process mining as a research field is relatively new, with the term “Process Mining” being coined in the early 2000s by Wil van der Aalst [32]. It connects to the broader fields of data science and process science and thus naturally overlaps with some of their approaches and principles [11].

“The idea of process mining is to discover, monitor and improve real processes (i.e., not assumed processes) by extracting knowledge from event logs readily available in today’s (information) systems.” Process Mining Manifesto [6]

• Section 2.1 highlights the importance of high-quality event logs as a starting point for process mining. The minimum requirements are listed, and the quality metric and scoring system by the IEEE Task Force on Process Mining are presented. This section also introduces an extended running example.

• In section 2.2, the three types of process mining (discovery, conformance, and enhancement) are described in more detail.

• Section 2.3 adds four perspectives (control-flow, time, organizational, and case) to the types of process mining.

• Section 2.4 introduces the eXtensible Event Stream (XES) standard and model for event log representation.

• Finally, section 2.5 concludes this chapter and outlines recent developments in process mining.

2.1 Event Log as Starting Point for Process Mining

Event logs are generated by information systems. Information systems are ubiquitous, from home computers to small enterprises like supermarkets and family physicians to large corporations like car manufacturers or hospital networks. By operating the information systems’ User Interfaces (UIs) and Application Programming Interfaces (APIs), users and other information systems trigger sequences of functions and procedures. These actions are recorded in the event logs. Whether these actions were executed in an orchestrated manner by a Process Aware Information System (PAIS) or whether the execution was ad hoc, certain requirements must be met to enable their use for process mining. Every single entry in the event log, that is, every event, must provide at least the following information:

• The name of the recorded activity, to identify which action in the real world led to the recording of the event. For the log of the extended running example in table 2.1 the events are named like the activities in the Business Process Model and Notation (BPMN) diagram of the simple running example in figure 1.2.


• Some kind of ordering attribute to determine the order of the recorded events. This could also be an incrementing number, but in practice, timestamps (like in table 2.1) are most commonly used.

• To distinguish the different process instances, each recorded event also needs to provide information about the context of its recording. This attribute basically assigns all recorded events to their respective process instances. For example, this could be a case number in a hospital (Cid in table 2.1).

Events can have all kinds of additional attributes, like resource, costs, lifecycle information, or semantic extensions. For the running examples, the events could also carry the patient’s ID, the IDs of the examining radiologist and other involved specialists, the actual diagnosis, information about the referral, social insurance, or billing information. All these attributes could be utilized in process mining to unveil potentially useful information about the processes. However, the backbone of process mining is the control-flow perspective, which requires only the three attributes describing the aforementioned mandatory aspects – activity name, ordering, and context [33].
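As an illustration, the three mandatory attributes suffice to reconstruct traces from a flat list of events. The following Python sketch (the helper to_traces is hypothetical, not part of any process mining library) groups events by case identifier and orders them by timestamp:

```python
from collections import defaultdict

# Each event carries the three mandatory attributes:
# case id (context), timestamp (ordering), and activity name.
events = [
    ("1", "20190918101505", "Schedule Appointment"),
    ("2", "20190918114123", "Schedule Appointment"),
    ("1", "20190918113123", "Patient Reception"),
    ("2", "20190918114921", "Patient Reception"),
]

def to_traces(events):
    """Group events by case id and order them by timestamp.

    The fixed-width numeric timestamps sort chronologically as strings."""
    cases = defaultdict(list)
    for cid, ts, activity in events:
        cases[cid].append((ts, activity))
    return {cid: [a for _, a in sorted(evs)] for cid, evs in cases.items()}

print(to_traces(events))
```

Any additional attributes (resource, role, costs) can be carried along in the same way without affecting the control-flow backbone.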

2.1.1 Extended Running Example

The simple running example in table 1.1 already provides the mandatory attributes to discover the control flow. However, to introduce different aspects and perspectives of process mining, an extended running example is presented in table 2.1. The domain, that is, the organizational workflow in a radiology practice, is the same. In addition to Cid (providing the context as clinical case identifier), timestamp (for ordering), and activity (name of the workflow step), the extended running example provides resource and role attributes.

The resource attribute records the names of actual people participating in the workflow, e.g., the first event records a schedule appointment activity executed by A. Allen (cf. table 2.1). The role attribute records the organizational context, that is, the role in which a certain person contributed to the workflow. In this example, the different activities are carried out by medical and administrative personnel (resources) acting in specific roles. A. Allen, for example, is recorded to contribute in different roles: administration, receptionist, and report specialist. C. Cooper, on the other hand, filled the role of a radiologist in the diagnosis and report attestation activities.

The extended running example records a total of 21 events in three different process instances (i.e., cases) separated by dashed lines in table 2.1. The cases are of different length, both in the number of activities and in duration, and do not follow the same sequence of activities. The implications will be discussed in greater detail in section 2.2.1.


Cid  Timestamp       Activity                  Resource    Role
1    20190918101505  Schedule Appointment      A. Allen    Administration
1    20190918113123  Patient Reception         A. Allen    Receptionist
1    20190918114021  Radiological Examination  B. Baker    Rad. Technologist
1    20190918115517  Diagnosis                 C. Cooper   Radiologist
1    20190918121019  Report Writing            A. Allen    Report Specialist
1    20190918121548  Report Attestation        C. Cooper   Radiologist
1    20190918123711  Report Transmission       D. Dyson    Administration
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
2    20190918114123  Schedule Appointment      D. Dyson    Administration
2    20190918114921  Patient Reception         D. Dyson    Receptionist
2    20190918115917  Radiological Examination  B. Baker    Rad. Technologist
2    20190918121519  Diagnosis                 E. Eder     Radiologist
2    20190918122248  Report Writing            A. Allen    Report Specialist
2    20190918123311  Report Transmission       D. Dyson    Administration
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
3    20190918122300  Patient Reception         D. Dyson    Receptionist
3    20190918122929  Radiological Examination  F. Faraday  Rad. Technologist
3    20190918125011  Diagnosis                 C. Cooper   Radiologist
3    20190918125119  Report Writing            A. Allen    Report Specialist
3    20190918130242  Report Attestation        C. Cooper   Radiologist
3    20190918131411  Report Writing            A. Allen    Report Specialist
3    20190918131755  Report Attestation        C. Cooper   Radiologist
3    20190918133001  Report Transmission       A. Allen    Administration

Table 2.1: An extended example event log for the running example workflow from section 1.1.1. The dashed lines separate the cases recorded in the log.

2.1.2 Data Quality Issues

“Data Quality - The degree to which data items are accurate, complete, relevant, timely, sufficiently detailed, appropriately represented (for example, consistently coded using a clinical coding system), and retain sufficient contextual information to support decision making.” Basic concepts in medical informatics [34]

The quality of an event log can be affected by various issues. Data quality (DQ) issues in general can be classified in different ways depending on the perspective [35]. Rahm and Do [36] for example distinguish between single-source and multi-source DQ problems and how to approach them for data cleansing on either instance-level or schema-level. They also elaborate on transformation steps for single-source problems, e.g., standardization of attribute values using coding schemes and dictionaries. Another approach was presented by Müller and Freytag [37], classifying data anomalies into (1) syntactical anomalies (e.g., lexical errors or wrong units), (2) semantical anomalies (e.g., duplicates or contradictions), and (3) coverage anomalies (e.g., missing values).


In their taxonomy of dirty data [38] Kim et al. classify DQ problems on the top-level as either (1) missing or (2) not-missing, with the not-missing branch being further split into (2.1) wrong and (2.2) not wrong, but unusable. A similar taxonomy was later chosen by Bose et al. [21] to classify DQ issues in the context of process mining. Analyzing the characteristics and the requirements of process mining data (i.e., event logs), Bose et al. [21] identified 27 different classes of possible DQ issues that can be summarized in four categories: (1) missing data, (2) incorrect data, (3) imprecise data, and (4) irrelevant data. For the healthcare domain R. Mans et al. provide a description of these four categories [15]:

• Missing data: Different kinds of process mining information are missing (e.g., missing events, attributes, or values).

• Incorrect data: The logged data is incorrect (e.g., impossible timestamps like birthdays in the future).

• Imprecise data: The logged data is too coarse leading to a loss of precision (e.g., timestamps in the order of a day).

• Irrelevant data: Some recorded information may be considered irrelevant for analysis (e.g., a Laboratory Information System (LIS)’s numerous automatic log entries when only the lab orders are relevant). In some cases, this information can be filtered or aggregated to derive relevant entities.
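To illustrate two of these categories, the following Python sketch flags events with missing mandatory attributes (missing data) and timestamps that lie in the future (incorrect data). The helper find_quality_issues is hypothetical and deliberately simplified; it is not a complete data quality audit:

```python
from datetime import datetime

def find_quality_issues(events, now):
    """Report events with missing mandatory attributes or future timestamps."""
    issues = []
    for i, ev in enumerate(events):
        # Missing data: a mandatory attribute is absent or empty.
        for key in ("case_id", "timestamp", "activity"):
            if not ev.get(key):
                issues.append((i, f"missing {key}"))
        # Incorrect data: an impossible (future) timestamp.
        ts = ev.get("timestamp")
        if ts and ts > now:
            issues.append((i, "timestamp in the future"))
    return issues

events = [
    {"case_id": "1", "timestamp": datetime(2019, 9, 18, 10, 15, 5),
     "activity": "Schedule Appointment"},
    {"case_id": "1", "timestamp": datetime(2099, 1, 1),
     "activity": "Patient Reception"},
    {"case_id": "1", "timestamp": datetime(2019, 9, 18, 11, 40, 21),
     "activity": None},
]

print(find_quality_issues(events, now=datetime(2021, 2, 25)))
# [(1, 'timestamp in the future'), (2, 'missing activity')]
```

Imprecise and irrelevant data require domain knowledge (e.g., the required timestamp granularity, or which log entries matter) and cannot be detected as mechanically.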

DQ problems in general can lead to wrong conclusions in decision making and incorrect or misleading statistics (“garbage in, garbage out”) [36]. DQ problems also affect the results of process mining [11]. For some of the listed issues above, process mining techniques were developed to reduce their impact (e.g., fuzzy [39] or heuristic [23] approaches to handle missing data). The next section outlines how to tackle DQ from the input-side, reducing the effort in data cleaning and data preparation for process mining.

2.1.3 Event Log Maturity Levels

For systems that aim to actively improve their data quality with the goal of enabling process mining on their event data, a list of 12 guidelines (GL) for logging was published in [40]. They describe, for example, the necessity of: (GL1) clear semantics for names of references and attributes, (GL2) structured and managed collections of reference and attribute names (e.g., taxonomies or ontologies), (GL6) at least some ordering of events, or (GL11) provenance (i.e., ensuring reproducibility). The guidelines aim to create awareness of data quality problems directly affecting the results of process mining [11].

To define the applicability of a data source for process mining, the Process Mining Manifesto [6] introduces five maturity levels for event logs. The classification is based on the following quality criteria:


• Event logs should be trustworthy, i.e., it should be safe to assume that the recorded events actually happened and that the attributes of the events are correct.

• Event logs should be complete, i.e., given a particular scope, no events may be missing.

• Any recorded event should have well-defined semantics (cf. GL1 and GL2 in [40]).

• Moreover, the event data should be safe, i.e., privacy and security concerns are addressed while recording the events.

The first level describes event logs of poor quality, where recorded events may not correspond to reality and events may be missing. Examples of level 1 event logs are paper documents routed through the organization or paper-based medical records. While it can be possible to apply process mining techniques to these logs, it “does not make much sense” [6].

In the second level, events are already recorded automatically by an information system. However, there is no systematic approach to the recording of events, and the information system is not part of all possible steps in the processes or can be bypassed. This results in incomplete logs that do not always properly reflect the processes [6].

The third level is similar to the second in that the information system still does not follow a systematic approach to record events. The main difference is the trustworthiness of the recorded events to match reality. Van der Aalst names Enterprise Resource Planning (ERP) systems as sources of level 3 logs because the recorded events, while distributed over many tables, can be assumed to be correct [6]. Many hospital information systems are based on ERP systems like SAP; without further effort to improve their data quality, these systems are classified within maturity level 3 in most cases.

Level four records events not only in a complete and trustworthy, but also in a systematic way. This means that the information system is aware of the notions of process instance (e.g., a case id) and activity [6]. Workflow engines and other Business Process Management (BPM) systems can record level four logs.

The highest level, five, adds semantics on top of trustworthiness, completeness, and safety. For a log to be classified in level five, the meaning of all recorded events and their attributes must be well-defined [6]. This can be achieved by using ontologies.

2.2 Types of Process Mining

Van der Aalst et al. [6] describe three types of process mining, (1) process discovery, (2) conformance checking, and (3) process enhancement. This section briefly introduces the basics of process discovery and the ideas behind conformance checking and process enhancement.


2.2.1 Process Discovery

Process discovery starts with an event log. The goal is to construct a process model of the process that made use of the information system to produce the given log (cf. figure 2.1).

Figure 2.1: Input and output of process discovery (from [6]).

There are different techniques and algorithms for process discovery. The α algorithm was one of the first approaches to discover a process model based on an event log. It discovers a Petri net by identifying basic process patterns [5]. Algorithm 2.1 provides a description of the eight steps leading from an event log to a Petri net. Note that both symbols → and # have a specific meaning in the context of this algorithm:

→ Causality: for two activities x and y, x→y iff in some case x is directly followed by y and in no case y is directly followed by x.

# Independence: for two activities x and y, x#y iff in no case x is directly followed by y and in no case y is directly followed by x.
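Both relations can be derived from the directly-follows relation of the log. The following Python sketch computes the full footprint matrix, including the parallel relation || (both orders observed), which the two definitions above do not cover but which separates concurrent from independent activities:

```python
from itertools import product

# Two example traces: b and c occur in either order after a.
traces = [["a", "b", "c", "d"],
          ["a", "c", "b", "d"]]

# x > y: x is directly followed by y in at least one case.
df = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
activities = {a for t in traces for a in t}

footprint = {}
for x, y in product(activities, repeat=2):
    if (x, y) in df and (y, x) not in df:
        footprint[(x, y)] = "->"   # causality
    elif (x, y) not in df and (y, x) in df:
        footprint[(x, y)] = "<-"   # reversed causality
    elif (x, y) in df and (y, x) in df:
        footprint[(x, y)] = "||"   # parallel: both orders observed
    else:
        footprint[(x, y)] = "#"    # independence: never directly follow

print(footprint[("a", "b")], footprint[("b", "c")], footprint[("a", "d")])
# -> || #
```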

Algorithm 2.1: α algorithm [5]. Let L be an event log over some set T of activities. A trace σ ∈ L is a sequence of events. Then, α(L) is defined as follows:

1  T_L = {t ∈ T | ∃σ∈L: t ∈ σ}                          // determine transitions
2  T_I = {t ∈ T | ∃σ∈L: t = first(σ)}                   // determine start events
3  T_O = {t ∈ T | ∃σ∈L: t = last(σ)}                    // determine end events
4  X_L = {(A, B) | A ⊆ T_L ∧ A ≠ ∅ ∧ B ⊆ T_L ∧ B ≠ ∅
          ∧ ∀a∈A ∀b∈B: a →_L b
          ∧ ∀a1,a2∈A: a1 #_L a2
          ∧ ∀b1,b2∈B: b1 #_L b2}                        // find pairs
5  Y_L = {(A, B) ∈ X_L | ∀(A′,B′)∈X_L:
          A ⊆ A′ ∧ B ⊆ B′ ⟹ (A, B) = (A′, B′)}         // delete non-maximal pairs
6  P_L = {p_(A,B) | (A, B) ∈ Y_L} ∪ {i_L, o_L}          // determine places
7  F_L = {(a, p_(A,B)) | (A, B) ∈ Y_L ∧ a ∈ A}
          ∪ {(p_(A,B), b) | (A, B) ∈ Y_L ∧ b ∈ B}
          ∪ {(i_L, t) | t ∈ T_I} ∪ {(t, o_L) | t ∈ T_O} // determine flow relation
8  α(L) = (P_L, T_L, F_L)                               // bring together
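The eight steps can be transcribed almost literally into Python. The sketch below is a naive, unoptimized rendering (exponential in the number of activities, so only suitable for very small logs) and represents each place p(A,B) as a pair of frozensets:

```python
from itertools import chain, combinations

def alpha(traces):
    """Naive transcription of the eight steps of the α algorithm."""
    df = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    causal = lambda x, y: (x, y) in df and (y, x) not in df
    indep = lambda x, y: (x, y) not in df and (y, x) not in df

    T_L = sorted({a for t in traces for a in t})          # step 1: transitions
    T_I = {t[0] for t in traces}                          # step 2: start events
    T_O = {t[-1] for t in traces}                         # step 3: end events

    def nonempty_subsets(s):
        return chain.from_iterable(combinations(s, r) for r in range(1, len(s) + 1))

    # step 4: pairs (A, B) with causality between and independence within
    X_L = [(set(A), set(B))
           for A in nonempty_subsets(T_L) for B in nonempty_subsets(T_L)
           if all(causal(a, b) for a in A for b in B)
           and all(indep(a1, a2) for a1 in A for a2 in A)
           and all(indep(b1, b2) for b1 in B for b2 in B)]

    # step 5: keep only maximal pairs
    Y_L = [(A, B) for (A, B) in X_L
           if not any(A <= A2 and B <= B2 and (A, B) != (A2, B2)
                      for (A2, B2) in X_L)]

    # steps 6 and 7: places (one per maximal pair, plus source i and sink o)
    # and the flow relation connecting transitions and places
    P_L = [(frozenset(A), frozenset(B)) for (A, B) in Y_L] + ["i", "o"]
    F_L = ([(a, (frozenset(A), frozenset(B))) for (A, B) in Y_L for a in A]
           + [((frozenset(A), frozenset(B)), b) for (A, B) in Y_L for b in B]
           + [("i", t) for t in T_I] + [(t, "o") for t in T_O])

    return P_L, T_L, F_L                                  # step 8

# An XOR split: after a, either b or c, then d.
P, T, F = alpha([["a", "b", "d"], ["a", "c", "d"]])
print(len(P))  # 4 places: p({a},{b,c}), p({b,c},{d}), i, o
```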


Figure 2.2 shows the resulting Petri net after applying an extension of the α algorithm (α+) to the event log of the extended running example in table 2.1. The algorithm discovered some deviations in the log from the process model of the simple running example (cf. figure 1.2):

• Some traces skipped Schedule Appointment and started with Patient Reception.

• Some traces skipped Report Attestation and went directly to Report Transmission.

• Some traces went back to Report Writing after Report Attestation.

Figure 2.2: The resulting Petri net after applying the α+ algorithm to the event log in table 2.1

The original α algorithm has several limitations, e.g., it is not able to discover short loops or non-free choice relations [5]. The extended event log of the running example in table 2.1 contains a short, length-two loop in case 3 (Writing>Attestation>Writing>Attestation). These limitations have been overcome with the successor, the α+ algorithm [41], and advanced algorithms are able to deal with noise and incomplete event logs as well or try to distinguish what is important and what is not [11]. In 2012, a review by De Weerdt et al. [42] identified 26 different approaches to process discovery. Building on that study, Augusto et al. [43] published a systematic literature review and benchmarking on process discovery algorithms in 2018, identifying 35 additional approaches.

While earlier approaches typically produced Petri nets as results (like the α algorithm), Augusto et al. [43] observe an increasing number of methods producing output in other model languages, like BPMN or declarative constraints. Regardless of the approach, the resulting models must aim to meet four competing quality criteria, as depicted in figure 2.3. They can be described as follows [11, 44, 45]:


Figure 2.3: The four competing quality criteria in process discovery [11].

• Fitness: the ability to replay the recorded event log on the discovered process model. In a model with perfect fitness, all traces in the log can be replayed on the model.

• Precision: avoid underfitting. Precision measures are used to quantify how much a process model overapproximates the behavior observed in an event log [46]. A model with perfect precision can only replay traces that were recorded in the log.

• Generalization: avoid overfitting. In contrast to precision, generalization aims at a model that is not restricted to the observed behaviour. While an enumerating model (i.e., a model where every trace is modelled separately) has perfect precision and fitness, it fails at the generalization criterion.

• Simplicity (earlier: Structure [44] or Appropriateness [45]): the simplest model that explains the behaviour observed in the log is the best model. Simplicity basically expresses whether a model is understandable by human analysts, which is a highly subjective and ambiguous definition [47]. However, methods to quantify simplicity exist, e.g., Structural Appropriateness [45].
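For toy models whose allowed traces can be enumerated, the first two criteria can be illustrated directly. The following Python sketch is only an approximation: real fitness and precision measures rely on token replay or alignments rather than set comparisons of complete traces:

```python
def trace_fitness(log, model_traces):
    """Fraction of log traces the model can replay (trace-level fitness)."""
    replayable = sum(1 for t in log if tuple(t) in model_traces)
    return replayable / len(log)

def log_precision(log, model_traces):
    """Fraction of the model's traces actually observed (crude precision proxy)."""
    observed = {tuple(t) for t in log}
    return len(observed & model_traces) / len(model_traces)

log = [["a", "b", "d"], ["a", "b", "d"], ["a", "c", "d"], ["a", "d"]]
model = {("a", "b", "d"), ("a", "c", "d"), ("a", "e", "d")}

print(trace_fitness(log, model))   # 0.75: the trace <a,d> cannot be replayed
print(log_precision(log, model))   # ~0.67: <a,e,d> is allowed but never observed
```

An enumerating model containing exactly the observed traces would score 1.0 on both measures here, which is precisely why generalization and simplicity are needed as counterweights.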

Commercial tools often use abstractions of logs to visualize the data, most notably the directly-follows graph. However, these abstractions have several limitations; for example, the resulting models can become very complex (i.e., lack simplicity) and have low fitness [48].

Figure 2.4 shows a model of the extended running example, visualized using the commercial tool Disco by Fluxicon [49]. In contrast to the Petri net in figure 2.2, the directly-follows graph does not show splits and joins. Thus, the information whether certain activities run in parallel or exclusively is not presented in the model.

The model in figure 2.4 shows some additional information. It is called a weighted directly-follows graph, as it also visualizes the frequencies of events and their connections. The differences in thickness and shade of arrows and boxes in figure 2.4 correspond to the relative frequencies. A thick arrow, for example connecting diagnosis with report writing,


Figure 2.4: Redrawn directly-follows graph of the extended running example mined with the tool Disco.

indicates a high relative frequency – it was observed three times in the log. A thin arrow, for example the sequence report attestation > report writing, indicates a low frequency – it was only observed once in the log. The same principle holds for the shading of the boxes representing the activities. Schedule appointment has the lightest shade, with the lowest frequency of two. The darkest shade, with the highest frequency, belongs to the report writing activity. The dashed and numbered lines at the beginning and at the end of the process show which activities occur first or last in the process, and how often. For example, patient reception has a total frequency of three, and one of these three times it acts as the starting activity.
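The frequencies underlying such a weighted directly-follows graph are simple pair counts. The following Python sketch derives them from the activity sequences of the three cases in table 2.1:

```python
from collections import Counter

# Activity sequences of the three cases in table 2.1.
traces = [
    ["Schedule Appointment", "Patient Reception", "Radiological Examination",
     "Diagnosis", "Report Writing", "Report Attestation", "Report Transmission"],
    ["Schedule Appointment", "Patient Reception", "Radiological Examination",
     "Diagnosis", "Report Writing", "Report Transmission"],
    ["Patient Reception", "Radiological Examination", "Diagnosis",
     "Report Writing", "Report Attestation", "Report Writing",
     "Report Attestation", "Report Transmission"],
]

# Edge weights of the directly-follows graph: how often activity x is
# directly followed by activity y anywhere in the log.
dfg = Counter((t[i], t[i + 1]) for t in traces for i in range(len(t) - 1))

print(dfg[("Diagnosis", "Report Writing")])           # 3 (the thick arrow)
print(dfg[("Report Attestation", "Report Writing")])  # 1 (the thin arrow)
```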

2.2.2 Conformance Checking

Conformance checking techniques need an event log and a model as input (cf. figure 2.5). The objective is to check how well the recorded reality in the log conforms to the model. These techniques enable the quantification of conformance and differences, unveiling possible shortcomings of the model or of the recorded reality in the log [7].

Figure 2.5: Input and output of conformance checking.

Leemans [47] outlined the different concepts of conformance based on (1) the behaviour of the observed system, (2) the behaviour recorded in the log, and (3) the behaviour allowed by the (discovered) model. Figure 2.6 shows two highlighted versions of the system-log-model Venn diagram. The whole diagram with all seven areas is discussed in


further detail by Buijs in [50]. The two log conformance concepts depicted in figure 2.6 can be described as follows:

• To the left, behaviour that is recorded in the event log and also present (i.e., possible) in the model is deemed fitting (light grey). The opposite, unfitting behaviour, is characterized by recorded behaviour that cannot be reproduced in the model (dark grey).

• To the right, log-precise behaviour is characterized as behaviour recorded in the log and possible on the model (light grey), and log-imprecise behaviour as behaviour that is possible on the model but never recorded in the log (dark grey).

Figure 2.6: Log conformance concepts. Left: fitting and unfitting behaviour. Right: log-precise and -imprecise behaviour. (based on [47] and [50])

The terms fitting and (log-)precise correspond to the quality criteria fitness and precision described in the previous section 2.2.1. Ultimately, determining the quality of a model, that is, calculating how well the model reflects the log, is already a form of conformance checking. Munoz-Gama [51] even names the four quality criteria “dimensions of conformance checking”.

There are different approaches to conformance checking. Van der Aalst lists token replay (on Petri nets), trace alignments, and footprint comparison as the traditional approaches and gives a brief introduction in [11]. Despite being named a traditional approach, a recent literature review identified trace alignment to still be the current state of the art in conformance checking [52]. The method, developed around 2010 by Bose and van der Aalst [53], was inspired by Multiple Sequence Alignment (MSA) techniques from bioinformatics; progressive [54] and iterative [55] alignment strategies were adopted [53]. However, initial trace alignment approaches focused only on the control-flow perspective, that is, the ordering of activities. More recent approaches also take the timing, resources, and other data of events into account when measuring the conformance of log and model [56, 57].
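As a simplified illustration of the idea behind trace alignment, the following Python sketch computes the minimal edit cost between a recorded trace and a single model trace. Actual alignment-based conformance checking minimizes this cost over all runs allowed by the model; this toy version only aligns against one given run:

```python
def alignment_cost(trace, model_trace):
    """Minimal number of insertions, deletions, and substitutions
    (Levenshtein distance) needed to align two activity sequences."""
    m, n = len(trace), len(model_trace)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if trace[i - 1] == model_trace[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # move only in the log
                          d[i][j - 1] + 1,      # move only in the model
                          d[i - 1][j - 1] + cost)  # synchronous move
    return d[m][n]

print(alignment_cost(["a", "b", "d"], ["a", "b", "c", "d"]))  # 1: step c skipped
```

A cost of zero means the trace fits the model run perfectly; each unit of cost corresponds to one deviation (a skipped or inserted activity).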


2.2.3 Process Enhancement

Enhancement goes one step further than conformance checking. It also starts with the recorded event log and the a-priori model (cf. figure 2.7) but uses the resulting information not to identify differences but to improve or extend the model [6].

Figure 2.7: Input and output of process enhancement.

Improving the model to better reflect the recorded reality is one example of process enhancement. Adding information about bottlenecks or resources to the model is another. This type of process mining is also called process reengineering in more recent publications [48].

Operational Support

Originally not explicitly listed as one of the types of process mining, operational support can either be subsumed under the term process enhancement [11] or described as a separate type [48]. From a functional perspective, the main difference is that the aim of operational support is not to enhance the process model but to support the process in execution. This can be achieved by providing evidence-based (i.e., data-driven) warnings, predictions, or recommendations in the running process [11, 48].

2.3 Mining Different Perspectives

Van der Aalst et al. identified four perspectives towards the analysis of event logs [6]: control-flow, time, organizational, and case. These perspectives can be depicted as orthogonal to the three types of process mining described above [20, 48].

2.3.1 Control-Flow

The control-flow perspective focuses on the ordering of activities. The goal of mining with this perspective is a model that best describes the observed behavior, i.e., all (or most) cases in the event log. Control-flow oriented algorithms are typically independent of specific notations like Petri nets, BPMN, or Event-driven Process Chain (EPC). The α algorithm is an example of a method to mine the control-flow (see figure 2.2). The control-flow perspective is the basis for the other process perspectives [48].


2.3.2 Time

The time perspective makes use of the information about timing and frequency of events. It enables the discovery of bottlenecks and can be used for operational support, e.g., the prediction of the remaining time in a running case based on recorded finished cases.

Figure 2.8 shows a directly-follows graph of the extended running example from table 2.1. The edges are weighed not based on the frequency (cf. figure 2.4) but based on the mean duration, i.e., the average time between the activities. For example, the average time between Report Attestation and Report Transmission is 16.7 minutes.

Figure 2.8: Directly-follows graph of the extended running example highlighting the time perspective.
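Such mean durations are computed by averaging the time differences of directly-following events over all cases. The following Python sketch reproduces the 16.7-minute figure from the two attestation-to-transmission steps in table 2.1:

```python
from collections import defaultdict
from datetime import datetime

fmt = "%Y%m%d%H%M%S"
# The (timestamp, activity) pairs from table 2.1 that feed the
# attestation-to-transmission edge: it occurs in cases 1 and 3 only.
cases = {
    "1": [("20190918121548", "Report Attestation"),
          ("20190918123711", "Report Transmission")],
    "3": [("20190918131755", "Report Attestation"),
          ("20190918133001", "Report Transmission")],
}

# Collect the duration of every directly-following pair of events per case.
durations = defaultdict(list)
for events in cases.values():
    for (t1, a1), (t2, a2) in zip(events, events[1:]):
        delta = datetime.strptime(t2, fmt) - datetime.strptime(t1, fmt)
        durations[(a1, a2)].append(delta.total_seconds())

secs = durations[("Report Attestation", "Report Transmission")]
print(round(sum(secs) / len(secs) / 60, 1))  # 16.7 (minutes)
```

The same aggregation over all edges yields the weights of the time-perspective directly-follows graph; replacing the mean with the maximum would instead highlight worst-case waiting times.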

2.3.3 Organizational

The organizational perspective aims to identify the involved actors and show their relationships. Of course, these analyses require the event log to contain attributes beyond the three listed in the simple running example in table 1.1.

Based on the extended running example in table 2.1, figure 2.9 shows the handover of work between roles in the radiology practice. This very basic visualization is created by generating a directly-follows graph with the role attribute in place of the activity attribute. This means that the control flow is not shown between activities but between roles that were recorded to execute the activities.

For example, radiology technologists always receive their work (i.e., the order) from receptionists after the patient reception and hand over their work (i.e., the completed images after the examination) to the radiologist for diagnosis.

Another possible result of mining the organizational perspective is a collaboration network showing the handover of work and the connections between specific people.


Figure 2.9: Directly-follows graph of the extended running example highlighting the handover of work between roles.
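The substitution of the role attribute for the activity attribute can be sketched in a few lines of Python, counting handovers over the role sequences of the three cases in table 2.1:

```python
from collections import Counter

# Role sequences of the three cases from table 2.1 (one entry per event).
role_traces = [
    ["Administration", "Receptionist", "Rad. Technologist", "Radiologist",
     "Report Specialist", "Radiologist", "Administration"],
    ["Administration", "Receptionist", "Rad. Technologist", "Radiologist",
     "Report Specialist", "Administration"],
    ["Receptionist", "Rad. Technologist", "Radiologist", "Report Specialist",
     "Radiologist", "Report Specialist", "Radiologist", "Administration"],
]

# Handover of work: how often one role directly passes work to another.
handover = Counter((t[i], t[i + 1])
                   for t in role_traces for i in range(len(t) - 1))

print(handover[("Receptionist", "Rad. Technologist")])  # 3
print(handover[("Rad. Technologist", "Radiologist")])   # 3
```

Substituting the resource attribute (the individual people) instead of the role yields the collaboration network mentioned above.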

2.3.4 Case

The case perspective, also called “data and decision” [48], differentiates cases not only via their path through the process model (control-flow) or via the involved actors (organizational) but focuses on the special properties of cases, e.g., values associated with certain activities. Based on these properties, dependencies and decision rules can be learned. For the extended running example, this could mean that a specific report specialist collaborates better with a specific radiologist than his or her colleagues do, i.e., they rarely need additional writing-attestation cycles, and should therefore be matched more often to increase the overall performance.

2.4 Standardized Event Log Representation

Log data is created by a variety of different systems with their own proprietary data models, formats, and semantics. Event log data is the key input for process mining; thus, a standardized data format for event logs is needed to allow for generalized tooling. The first approach was the Mining eXtensible Markup Language (MXML), an XML-based format for log exchange. To overcome the limitations of MXML, primarily concerning extensibility, XES was developed [11]. In September 2010, the IEEE Task Force on Process Mining accepted XES as the standard for log data exchange [58].

XES defines three basic objects (see figure 2.10): log, trace, and event. A log (the process) contains a collection of traces (execution instances), and a trace contains a collection of events. Each object can contain an arbitrary set of strongly typed attributes in the form of key-value pairs. Every attribute value has a data type, like string, boolean, or date. To add semantics to these data types, XES defines the concept of extensions. An extension defines a set of attributes, their types, and keys with a specific semantic meaning.


Figure 2.10: The XES meta-model as described in the IEEE 1849-2016 standard [59].

To improve the semantic interpretation of XES logs, additional resources were defined in the form of standard extensions (e.g., concept, organizational, or time). A standard extension refers to an external resource, identified by a Uniform Resource Identifier (URI) (e.g., http://www.xes-standard.org/time.xesext for time). This resource defines, e.g., for the “Time” extension, that the value of the timestamp attribute must be of the type date and must describe “the UTC time when the event occurred” [60]. With these standard extensions, process mining tools like ProM, Disco, or Celonis can semantically interpret parts of the data [58].

To define mandatory fields in XES, global attributes can be used. For example, if event is defined to have certain global attributes, like a timestamp or resource information, all events recorded in the log must contain those attributes. Listing 2.1 shows the first line (i.e., the first event) of the extended running example (cf. table 2.1), represented in XES. Here, the standard extensions time (for the timestamp), concept (for the name of the recorded activity), and lifecycle (for the status of the activity, e.g., complete) are used.


Listing 2.1: XES representation of the first event in the extended running example in table 2.1.

<event>
    <date key="time:timestamp" value="2019-09-18T10:15:05+02:00"/>
    <string key="concept:name" value="Schedule Appointment"/>
    <string key="lifecycle:transition" value="complete"/>
</event>

A complete XES representation of the extended running example can be found in Appendix A in listing A.1.
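Since XES is plain XML, a log in this format can be produced with standard tooling. The following Python sketch (a minimal illustration that omits global attributes and classifiers) builds a log with one trace and one event using xml.etree:

```python
import xml.etree.ElementTree as ET

# Declare the log and the three standard extensions it uses.
log = ET.Element("log", {"xes.version": "1.0"})
for prefix in ("concept", "time", "lifecycle"):
    ET.SubElement(log, "extension", {
        "name": prefix.capitalize(), "prefix": prefix,
        "uri": f"http://www.xes-standard.org/{prefix}.xesext"})

# One trace (case 1 of the running example) containing one event.
trace = ET.SubElement(log, "trace")
ET.SubElement(trace, "string", {"key": "concept:name", "value": "1"})

event = ET.SubElement(trace, "event")
ET.SubElement(event, "date",
              {"key": "time:timestamp", "value": "2019-09-18T10:15:05+02:00"})
ET.SubElement(event, "string",
              {"key": "concept:name", "value": "Schedule Appointment"})
ET.SubElement(event, "string",
              {"key": "lifecycle:transition", "value": "complete"})

print(len(log.findall("extension")))  # 3
```

The remaining 20 events of table 2.1 would be appended in the same way; tools like ProM or Disco can then import the serialized result directly.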

2.5 Recent Developments in Process Mining

Due to the increasing number of scientific publications, the diverse use cases, and the different approaches, techniques, and algorithms, process mining has become a scientific discipline of its own over the last 20 years [61]. The adoption of process mining in industry started around 2010 and has gained increasing interest over the last five years [61]. In his Market Guide for Process Mining [62], Kerremans sized the market at about 25 vendors of process mining tools and presented 19 of them in detail. Celonis1 is a German company based in Munich and the market leader for process mining software. Its main feature is the native integration into major business software, like SAP, Salesforce, or ServiceNow [62].

The tools used in the course of this thesis are Disco, from Fluxicon2, and ProM3. Disco is an easy-to-use tool that utilizes an improved fuzzy mining approach for process discovery [49]. The founders of Fluxicon also developed the initial fuzzy mining algorithm [39]. ProM, on the other hand, is a framework that supports a wide variety of process mining techniques in the form of plug-ins [63]. There are more than 1500 plug-ins, mostly developed by researchers, that present unique approaches not found in any other tool. ProM is the leading process mining research platform [62].

While the basic ideas and principles of process mining are easily explained and applied in practice, there are still multiple open challenges and ongoing research (see also section 1.3.1). In a recent publication on the academic outlook for the process mining discipline, van der Aalst [61] identified new scientific and practical challenges, including dealing with uncertainty, the gap between process modeling and process mining, and multiple processes using different case notions. The latter challenge has already led to a new field of research, namely object-centric process mining [64].

1 https://www.celonis.com/, last access 17.01.2021
2 https://fluxicon.com/, last access 17.01.2021
3 http://www.promtools.org/, last access 17.01.2021

“The nice thing about standards is that you have so many to choose from.” Andrew S. Tanenbaum

CHAPTER 3 Interoperability and Healthcare Data Standards


Modern healthcare depends on collaboration and communication across different specialized departments or even enterprises. Interoperability is the precondition for a seamless flow of information between the multiple actors: medical personnel and information systems [4].

“The field of healthcare IT brings together the two communities of practice most known for their use of complex and difficult language: medicine and computer science” Michael Leavitt, former US Secretary of Health and Human Services [65]

This chapter presents a foundation for the remainder of the thesis and is in itself a contribution to the field by attempting to come to terms with the relevant standards and vocabularies in the context of process mining in healthcare.

• In section 3.1 the term interoperability is defined with regard to healthcare IT and workflow management, providing the context for the following sections.

• Section 3.2 gives a short overview of Standards Development Organizations (SDOs) responsible for the standards described in this thesis.

• In section 3.3 healthcare data exchange standards are discussed, with a focus on their approaches to communicate process information or metadata.

• Then, section 3.4 describes terminologies and classification systems important for process interoperability.

• Section 3.5 describes the concepts behind two integration profiles relevant for the remainder of the thesis.

• Finally, section 3.6 concludes this chapter and reflects on the implications of the latest developments in profiling and standardization.

3.1 Interoperability

It is not without irony that there are many different definitions of the term interoperability, and that it means different things to different people [4, 65]. In the IEEE Standard Computer Glossary [66], interoperability is explained as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged”. Furthermore, it introduces the distinction between syntactic and semantic interoperability: syntactic interoperability describes the basic ability to communicate and exchange data, while semantic interoperability describes the ability to also transport the meaning of the data [66].


Level of Interop. | Geraci [66] (IEEE, 1991) | Gibbons [65] (HL7 Int., 2007) | Blobel [67] (HL7 Ger., 2008) | HIMSS [69] (2017)
high              | –                        | process                      | organizational / social / service | –
medium            | semantic                 | semantic                     | semantic, syntactic               | semantic
low               | syntactic                | technical                    | structural, technical             | structural, foundational

Table 3.1: Comparison of different definitions for the levels of interoperability.

3.1.1 Technical Interoperability

A more detailed view on the different levels of interoperability reveals more ambiguities (see table 3.1). In an exhaustive study, Gibbons et al. [65] collected approximately 100 definitions directly related to healthcare IT interoperability. They identified three interoperability classification types and, for the lowest level, the term technical interoperability was chosen. For them, it explicitly includes the term syntactic interoperability, among others (e.g., functional or exchange). Blobel [67] has a more differentiated view here, putting technical interoperability at the very bottom and placing structural and syntactic interoperability on top of it. Syntactic interoperability, as defined by Geraci et al., overlaps with the term technical interoperability, as defined by Gibbons et al., and with Blobel’s lower three levels in that it focuses on the conveyance of data, not on its meaning [65, 67, 66]. Thus, messages or documents can be exchanged without any consideration of their contents [68].

“Technical interoperability neutralizes the effects of distance.” Gibbons et al. [65]

3.1.2 Semantic Interoperability

Semantic interoperability can only be reached on the basis of technical interoperability [66]. Dolin and Alschuler provide an operational definition of semantic interoperability as “the ability to import utterances from another computer without prior negotiation, and have your decision support, data queries and business rules continue to work reliably against these utterances” [70]. To achieve this, the communicating information systems must defer to a common information exchange reference model [66], enabling a common understanding of the meaning of the content in the exchanged messages or documents. Unambiguous codes and identifiers are the foundation for semantic interoperability [4].

“Semantic interoperability communicates meaning.” Gibbons et al. [65]


3.1.3 Process Interoperability

The highest level of interoperability aims not only at the exchange of information but also at the coordination of work processes. This additional level, on top of semantic interoperability, has been named differently in the literature (see table 3.1), but in any case the goal is to support the establishment of business processes across departments and even enterprises. Gibbons et al. [65] concluded that “process interoperability is another way of talking about workflow management”.

“Process interoperability coordinates work processes.” Gibbons et al. [65]

3.2 Standards Development Organizations

A standard is defined by the International Organization for Standardization (ISO) as a document, established by consensus and approved by a recognized body, that provides, for common and repeated use, rules, guidelines or characteristics for activities or their results, aimed at the achievement of the optimum degree of order in a given context [71]. In this context, a recognized body is an internationally recognized SDO.

This section briefly introduces the organizations that develop the well-established standards relevant for the running example. All of these SDOs are, among others, members of the Joint Initiative Council (JIC)1 that operates on a strategic level to harmonize their standardization efforts [4].

3.2.1 DICOM

Digital Imaging and Communications in Medicine (DICOM) is the name of a standard for digital imaging and related information. Although originally developed by the American College of Radiology (ACR) in collaboration with the National Electrical Manufacturers Association (NEMA) in 1985, today DICOM is also used as the name for the SDO [68, 4]. It is recognized by ISO as the ISO 12052 standard. DICOM version 3 was released in 1993 and has remained backwards compatible since then. It is used in most imaging modalities, including X-ray, magnetic resonance imaging (MRI), Computed Tomography (CT), and ultrasound, and it defines the communication between these modalities, the workstations, and the digital image archives. DICOM is the basis for all digital imaging in medicine and was the key driver for the transition from X-ray films to a fully digital workflow in radiology [72].

3.2.2 HL7

The organization Health Level Seven (HL7) was founded in 1987 with the goal to develop standards for the electronic data exchange in healthcare, especially between applications

1http://www.jointinitiativecouncil.org/, last access 17.01.2021

within a hospital. It was accredited by the American National Standards Institute (ANSI) in 1994 and their subsequent publications of the HL7 2.x standards family became widely adopted [68].

Their standards were initially designed only to communicate patient-related administrative data within one hospital, but nowadays cover the whole spectrum of healthcare-related data exchange. Table 3.2 lists several important HL7 standards, among others. For this thesis, the newest member of the HL7 standards family, Fast Healthcare Interoperability Resources (FHIR), will be further investigated.

3.2.3 IHE

Integrating the Healthcare Enterprise (IHE) is an international initiative by healthcare professionals and industry to improve the integration and interoperability of healthcare information systems. It started in 1998 with the goal to define how existing standards, like DICOM and HL7, can be implemented to overcome common interoperability problems in radiology [73]. IHE does not define its own standards, but re-uses and constrains the standards developed by other groups for application to certain use cases. Oemig and Snelick [68] thus distinguish IHE from the other SDOs and call them Profile Development Organizations (PDOs). However, Benson and Grieve [4] list IHE as an SDO.

IHE’s approach comprises four consecutive steps, where (1) interoperability problems are identified by clinicians and IT experts, (2) healthcare IT experts define in integration profile documents how to use the established standards to solve these problems, (3) vendors implement these integration profiles and test their products at annual Connectathon events, and (4) the results of the tests are published to be used in requests for proposals, simplifying the acquisition process [4].

Over the years, more specialties followed. On their website2, IHE lists 12 active domains, among them cardiology, radiation oncology, and IT infrastructure. Nowadays, IHE integration profiles are the basis for information systems of major vendors, national healthcare programs like the Austrian Elektronische Gesundheitsakte (ELGA), the Smart Open Services for European Patients (epSOS) project of the European Union, and the transatlantic Trillium Bridge project. This thesis will relate to integration profiles developed for the domains of radiology and IT infrastructure.

3.3 Healthcare Data Exchange Standards

Over the years, different standards for data exchange in healthcare have been developed and specified by a number of organizations and institutions, some of them complementary, others competing [68]. Schulz et al. [74] give an overview and a short description of the scope of important medical data standards, cf. tables 3.2 and 3.3.

2https://www.ihe.net/ihe_domains/, last access 17.01.2021


SDO | Standard | Scope
Federative Committee on Anatomical Terminology (FCAT) | Terminologia Anatomica (TA) | Anatomy terms in English and Latin
HL7 | v2 | Messaging protocol; several of the chapters of this standard cover clinical content
HL7 | v3 Reference Information Model (RIM) | Information ontology; especially the “Clinical Statement” work aims to create reusable clinical data standards
HL7 | Clinical Document Architecture (CDA) Level 1-3 | Information model for clinical documents (embedding of terminology standards in levels 2 and 3); especially the Continuity of Care Document (CCD) and the Consolidated CDA (C-CDA) specifications add detail to standards for clinical documents
HL7 | FHIR | Information and document model; several parts of the core specification deal with clinical content
IHE | Several integration profiles | Clinical workflows including references to clinical data standards to be used
ISO | TS 22220:2011 | Identification of subjects of care
ISO | 21090:2011 | Harmonized data types for information exchange
ISO | 13606 | High-level description of clinical information models
ISO | 23940 (ContSys) | Processes for continuity of care
ISO | 14155 | Clinical investigations
ISO | IDMP | Identification of medicinal products (IDMP)
DICOM/NEMA | DICOM | Medical imaging and related data
openEHR Foundation | openEHR | Clinical information model specification

Table 3.2: Important medical data standards, Part 1 (from [74]).


SDO | Standard | Scope
Regenstrief Institute | Logical Observation Identifiers Names and Codes (LOINC) | Terminology for lab and other observables
Regenstrief Institute | Unified Code for Units of Measure (UCUM) | Standardised representation of units of measure according to the SI units (ISO 80000)
Personal Connected Health Alliance (PCHA) | Continua Design Guidelines | Collecting data from personal health devices
SNOMED International | Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) | Terminology / ontology for representing the electronic health record (“context model” = information model for SNOMED CT)
World Health Organization (WHO) | International Classification of Diseases (ICD-10)/ICD-11 | Disease classification
WHO | International Classification of Functioning, Disability and Health (ICF) | Classification of functioning, disability and health
WHO | International Classification of Health Interventions (ICHI) | Classification of health interventions
WHO | International Nonproprietary Names (INN) | Generic names for pharmaceutical substances
WHO | Anatomical Therapeutic Chemical Classification System (ATC) | Drug ingredient classification
World Organization of Family Doctors (WONCA) | International Classification of Primary Care (ICPC) | Classification of primary care

Table 3.3: Important medical data standards, Part 2 (from [74]).


3.3.1 FHIR

FHIR is the latest addition to the family of healthcare interoperability standards maintained and published by HL7 International [75]. FHIR provides a comprehensive information model which is geared towards supporting semantic interoperability of clinical data. The fundamental building blocks of this information model are resources. A resource, as described by Mandel et al. [76], is a coherent expression of clinical data and is based on a set of well-defined fields and data types. Every resource comprises the standard-defined data content and a human-readable representation of the respective content, and has an identity. The FHIR specification defines resources for common clinical concepts, e.g., Patient, Observation, Condition.

Besides that, FHIR leverages modern web technologies together with a strong foundation of web standards and offers support for Representational State Transfer (REST)ful architectures. Following the RESTful paradigm, FHIR allows altering the state of a particular resource using a set of predefined actions for Create/Read/Update/Delete (CRUD). If required by a given use case, it is also possible to apply a more remote procedure call (RPC)-like interaction paradigm. This is achieved by defining operations that work on input and produce an output [77]. The operations can be executed on the server level, on the resource type level, or on the instance level of a specific resource; they are typically invoked by a Hypertext Transfer Protocol (HTTP) POST, or alternatively by an HTTP GET if no changes are caused on the server.
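The CRUD and operation interaction styles described above can be illustrated by composing the corresponding HTTP requests. The following Python sketch is illustrative only: the server base URL and resource ID are hypothetical, while `$meta` and `$everything` are operations defined in the FHIR specification.

```python
BASE = "https://fhir.example.org/r4"  # hypothetical FHIR server base URL

def crud_requests(resource_type, resource_id):
    """Map the four CRUD interactions to HTTP verb/URL pairs."""
    return {
        "create": ("POST",   f"{BASE}/{resource_type}"),
        "read":   ("GET",    f"{BASE}/{resource_type}/{resource_id}"),
        "update": ("PUT",    f"{BASE}/{resource_type}/{resource_id}"),
        "delete": ("DELETE", f"{BASE}/{resource_type}/{resource_id}"),
    }

def operation_url(name, resource_type=None, resource_id=None):
    """Build an RPC-like operation URL on the server, type, or instance level."""
    parts = [BASE]
    if resource_type:
        parts.append(resource_type)
    if resource_id:
        parts.append(resource_id)
    parts.append(f"${name}")  # FHIR operations are prefixed with '$'
    return "/".join(parts)

print(crud_requests("Patient", "123")["read"])
print(operation_url("meta"))                          # server-level operation
print(operation_url("everything", "Patient", "123"))  # instance-level operation
```

The sketch only builds the request targets; an actual client would additionally send FHIR resources as JSON or XML payloads in the request bodies.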

According to HL7 International [75], a central challenge for the FHIR specification is handling the wide variety and variability of diverse healthcare processes. This challenge is addressed by offering a simple framework for extending the existing resources and describing use cases based on profiles. Profiling a resource allows constraining and extending a resource specification for a given context [76]. By providing reference implementations for the specification, HL7 intends to reduce the entry barrier for developing FHIR-conformant solutions. The development of the specification and the standard follows a developer-first approach, which is reflected by the specification being a mixed standard comprising normative portions and parts still undergoing trial use [75].

3.4 Terminology Systems

Medical language is full of homonyms, synonyms, eponyms, acronyms and abbreviations [4]. However, to enable semantic interoperability, and thus prepare for process interoperability, a common understanding of the terms in the domain of discourse (i.e., medicine) must be reached. Schulz et al. [74] distinguish between (1) thesauri, like the Medical Subject Headings (MeSH) [78], (2) aggregation terminologies or classification systems, like ICD-10, and (3) ontologies, like SNOMED CT. Meta-repositories like the Unified Medical Language System (UMLS) [79] and BioPortal [80] list hundreds of clinical terminology systems; again, some of them complementary, others competing.


“Many researchers still tend to create their own ontologies to suit their specific use case. Re-use of existing ontologies is only a rarity. If left unchecked, this tendency has the potential of growing into the very problem that ontologies are created to solve – the multitude of ontologies will itself become the barrier to data interoperability and integration.” Fung and Bodenreider, Knowledge Representation and Ontologies [81]

This section introduces the terminology systems referenced in this thesis and explains their basic concepts.

3.4.1 ICD-10

For the classification of clinical diagnoses and health problems, the commonly accepted system is the International Classification of Diseases (ICD), which is maintained by the WHO. The most current version is ICD-10; it utilizes an alphanumeric coding scheme with more than 14,000 single clinical codes of medical terms organized hierarchically into 22 chapters. ICD-10 focuses on diagnoses, but also allows for the coding of location, severity, cause, manifestation and type of a health problem [82].

ICD-10 Example Chapter 19 Injury, poisoning and certain other consequences of external causes (S00-T98), lists Injuries to the elbow and forearm (S50-S59) and specifically Fracture of forearm (S52) with Fracture of shafts of both ulna and radius having the ICD-10 code S52.4.
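Because ICD-10 categories are ordered alphanumerically, membership of a code in a chapter or block can be checked mechanically. The following small sketch is illustrative only and not part of any official ICD-10 tooling; it compares only the three-character category (e.g., S52 for S52.4) against the block boundaries.

```python
def in_icd10_block(code: str, block: str) -> bool:
    """Check whether an ICD-10 code falls into a block like 'S50-S59'.

    Only the three-character category (e.g. 'S52' from 'S52.4') is
    compared against the block boundaries; alphanumeric categories
    (letter + two digits) compare correctly as plain strings.
    """
    category = code.split(".")[0].upper()
    low, high = block.upper().split("-")
    return low <= category <= high

# The example from the text: S52.4 (fracture of shafts of both ulna and
# radius) lies in block S50-S59 of chapter 19 (S00-T98).
print(in_icd10_block("S52.4", "S50-S59"))  # True
print(in_icd10_block("S52.4", "S00-T98"))  # True
print(in_icd10_block("M54.5", "S50-S59"))  # False
```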

3.4.2 RadLex

RadLex is a standard terminology for radiology that was created by the Radiological Society of North America (RSNA) with the aim to improve the clarity of reports, to reduce variation among radiologists, to enable access to imaging information, and to improve the quality of practice [83]. With the current version 4.0 of RadLex, released in January 2019, the whole base ontology was transformed into a native OWL representation and can be found online3.

RadLex Example To code the fact that an examination was made using a CT imaging modality, RadLex provides directly under the top-level RadLex entity (RID1) a class called imaging modality (RID10311) with a sub-class tomography (RID28840) that has a sub-class computed tomography with the RadLex ID RID10321.

3http://www.radlex.org/, last access 17.01.2021


3.4.3 SWIM Lexicon

In 2012, the Society for Imaging Informatics in Medicine (SIIM) formed an initiative to improve operational processes in healthcare, the SIIM Workflow Initiative in Medicine (SWIM). This initiative aimed to create a lexicon that consistently names, codes and describes the workflow steps (activities) in radiology departments [84]. Recently, this lexicon was fully integrated into the RadLex ontology and all SWIM terms are children of workflow term (RID45812).

SWIM Example Coding the different organizational workflow steps, SWIM defines the RadLex ID RID45825 for the event of a patient check-in at the imaging facility (preferred name: Pt Arrived).

Coding of the Running Example

The running example from section 1.1.1 comprises seven consecutive activities in a radiology practice. Different transactions, standards and codings are involved in recording the occurrence of the respective events. Table 3.4 lists the activities and their matching codes from the SWIM Lexicon [24].

Activity | RadLex ID | Pref. Name | Definition
Schedule Appointment | RID45821 | PtAccept | Patient accepts appointment, or may modify scheduling
Patient Admission | RID45825 | Pt Arrived | Patient check-in at the imaging facility
Radiological Examination | RID45897 | PatientIn | (Time when) the patient enters the procedure room
Diagnosis | RID45859 | Dictated | Physician reviews image and renders a report in electronic audio format
Report Writing | RID45832 | AudioTransmit | Audio transmitted to ’speech-to-text’ system (human transcriber or speech recognition)
Report Attestation | RID45924 | Final report approved | Final electronic text report is approved (signed) by the interpreting physician
Report Transmission | RID45865 | FinalPublish | Final report is sent to ordering physician (EMR confirmation of receipt)

Table 3.4: Coded activities of the running example using the SWIM lexicon in RadLex.
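For process mining, such a coding table can serve as a simple lookup when building event logs. The following minimal sketch merely transcribes table 3.4; the dictionary and function names are illustrative.

```python
# Mapping of the running example's activities to SWIM codes in RadLex,
# transcribed from Table 3.4: activity -> (RadLex ID, preferred name).
SWIM_CODES = {
    "Schedule Appointment":     ("RID45821", "PtAccept"),
    "Patient Admission":        ("RID45825", "Pt Arrived"),
    "Radiological Examination": ("RID45897", "PatientIn"),
    "Diagnosis":                ("RID45859", "Dictated"),
    "Report Writing":           ("RID45832", "AudioTransmit"),
    "Report Attestation":       ("RID45924", "Final report approved"),
    "Report Transmission":      ("RID45865", "FinalPublish"),
}

def encode_activity(activity: str) -> str:
    """Return the RadLex ID for an activity of the running example."""
    radlex_id, _preferred_name = SWIM_CODES[activity]
    return radlex_id

print(encode_activity("Patient Admission"))  # RID45825
```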


3.4.4 LOINC

Logical Observation Identifiers Names and Codes (LOINC) is a database and coding system for laboratory observations. It was developed in 1994 and is still maintained by the US nonprofit medical research organization Regenstrief Institute. LOINC is publicly available at no cost at https://loinc.org/.

LOINC provides codes for observation names (e.g., eye color), not observation findings (e.g., blue eyes) [4]. Moreover, LOINC distinguishes between tests and observations and describes the details in their fully specified names. These fully specified names use a six-part semantic model to unambiguously identify them [85].

LOINC Example

The fully specified name for the most common laboratory observationa is Creatinine|SCnc|Pt|Ser/Plas|Qn. The LOINC code is 14682-9. The parts of the fully specified name are:

(1) Component or analyte, the measured or observed substance or entity: Creatinine;
(2) Property, the characteristic or attribute of the component or analyte: SCnc (substance concentration);
(3) Time, the time interval over which an observation was made: Pt (point in time);
(4) System, the specimen or thing upon which the observation was made: Ser/Plas (Serum or Plasma);
(5) Scale, describing how the observation is quantified or expressed: Qn (quantitative).

The sixth part is optional and describes the Method, providing a classification of how the observation was made. It is only used when the clinical interpretation of the results is affected by the technique [4]. LOINC also provides alternative names for different uses. The Long Common Name is Creatinine [Moles/volume] in Serum or Plasma, the Short Name is Creat SerPl-sCnc and the Display Name is Creatinine [Moles/Vol].

ahttps://loinc.org/usage/, last access 17.01.2021
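Because the six parts are separated consistently, a fully specified name can be decomposed mechanically. The following small sketch is illustrative only (not an official LOINC tool) and uses the separator shown in the example above.

```python
def parse_loinc_name(fully_specified_name: str) -> dict:
    """Split a LOINC fully specified name into its semantic parts.

    The sixth part, Method, is optional; when it is absent, the
    returned dict simply has no 'method' key.
    """
    labels = ["component", "property", "time", "system", "scale", "method"]
    return dict(zip(labels, fully_specified_name.split("|")))

name = parse_loinc_name("Creatinine|SCnc|Pt|Ser/Plas|Qn")
print(name["component"])  # Creatinine
print(name["system"])     # Ser/Plas
print("method" in name)   # False (only five parts given)
```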

3.4.5 SNOMED CT

The Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) is an internationally recognized standard that classifies clinically relevant terminology and concepts, along with their synonyms and relationships, into numeric coded values [4]. Available in multiple languages and maintained by SNOMED International, there are currently over 340,000 numerically coded concepts that can be combined grammatically to create an expression.

SNOMED CT Example Accident and Emergency department (environment) is the fully qualified name and 225728007 is the SNOMED CT ID (SCTID). Synonyms are Accident and Emergency department, A & E - Accident and Emergency Department, AED - Accident and Emergency department, and Casualty department. It has an Is-a relationship to Hospital department (environment), SCTID: 284548004.


3.5 Integration Profiles

The statement of Fung and Bodenreider [81], also cited in section 3.4, that “... the multitude of ontologies will itself become the barrier to data interoperability and integration” is as true for ontologies as it is for data exchange standards. One approach to tackling this issue of too many standards to choose from was developed by IHE: so-called integration profiles.

IHE integration profiles are defined in the technical frameworks (TFs) of each domain. These TFs comprise a set of documents, called volumes, and the structure is the same for each TF. Volume 1 is called Integration Profiles and describes specific use cases, e.g., Cross-Enterprise Document Sharing (XDS) for exchanging medical documents across organizational boundaries. These integration profiles follow a structured approach, listing for every use case all the actors involved as well as the transactions that define how the actors communicate.

“Actors are information systems or components of information systems that produce, manage, or act on information associated with operational activities in the enterprise.” IHE Actor Definitions [86]

Volume 2 specifies the details of the transactions, i.e., how to use established standards (e.g., HL7 v2 or DICOM) to solve the use cases. Some domains provide a Volume 3 with content modules containing value sets to ensure semantic interoperability. IT infrastructure and radiology additionally provide a Volume 4 with national extensions.

An integration profile starts with a diagram depicting actors and transactions (e.g., figure 3.1). The actors are represented as boxes, the transactions as arrows. The transactions are coded with the domain abbreviation (e.g., ITI for IT Infrastructure, or RAD for Radiology) and the corresponding chapter number in Volume 2 of the respective technical framework of that domain. Actors are defined by their ability to support those transactions.

3.5.1 ATNA

Audit Trail and Node Authentication (ATNA) is an IHE integration profile in the IT infrastructure domain. Being one of the basic profiles dealing with IT infrastructure in healthcare, ATNA defines how to build up a secure domain that provides patient information confidentiality, data integrity, and user accountability [87]. A secure domain can scale from department to enterprise to cross-enterprise size. To ensure user accountability, ATNA specifies the use of a centralized Audit Record Repository (ARR) where all audit messages are stored. Consequently, violations of security policies can be detected, especially regarding protected health information, which includes all kinds of patient-identifiable information records [87].


In a joint effort, IHE, HL7, DICOM, ASTM4 E31, and the Joint NEMA/COCIR5/JIRA6 Security and Privacy Committee defined the structure of these audit messages using an Extensible Markup Language (XML) schema. The normative specification of the messages is defined in the DICOM standard PS3.15: A.5 Audit Trail Message Format Profile.

The original intention of ATNA event audit logging was to provide surveillance logging functions to detect security events and deviations from normal operations. It was not designed for forensic or workflow performance analysis. However, the integration profile states that forensic or workflow analysis logs may also use the same XML schema and IHE transactions [87], and recent developments propose the use of ATNA for keeping track of the whole workflow (see section 3.5.2).

A recent development is the addition of RESTful ATNA, which utilizes the HL7 FHIR standard to feed audit messages to an ARR and query them from it. At the time of writing, the respective TF supplement is in its third revision of the draft for public comment [88].
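To give an impression of the general shape of such an audit message, the following sketch constructs a strongly simplified, non-normative fragment. The element and attribute names are abridged from the DICOM audit schema and the values are hypothetical; consult PS3.15 for the normative definition.

```python
import xml.etree.ElementTree as ET

# Simplified, non-normative sketch of an ATNA audit message in the
# spirit of DICOM PS3.15; element/attribute names are abridged and the
# concrete values are hypothetical.
msg = ET.Element("AuditMessage")
event = ET.SubElement(msg, "EventIdentification", {
    "EventActionCode": "E",                   # illustrative action code
    "EventDateTime": "2021-02-25T10:30:00Z",  # hypothetical timestamp
    "EventOutcomeIndicator": "0",             # illustrative: success
})
ET.SubElement(event, "EventID", {
    "csd-code": "110100",                     # illustrative code value
    "codeSystemName": "DCM",
})
ET.SubElement(msg, "AuditSourceIdentification", {
    "AuditSourceID": "RIS-1",                 # hypothetical reporting system
})

xml_text = ET.tostring(msg, encoding="unicode")
print(xml_text)
```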

Actors and Transactions

Figure 3.1 shows the actors and transactions of the ATNA integration profile together with the actors and transactions of the latest supplement for RESTful query.

Figure 3.1: ATNA actors and transactions [87] including RESTful query [88] (grey).

4International SDO formerly known as American Society for Testing and Materials
5European coordination committee of the radiological, electromedical and healthcare IT industry
6Japan medical imaging and radiological systems industries association


Secure Node actors and Secure Application actors provide security and privacy services (user authentication, secure communications, security audit recording, and security policy enforcement) [87]. Secure Nodes are complete systems, controlling everything from hardware over user interface to network connections (e.g., an ultrasound machine). Secure Applications do not control the whole stack; they only provide security functions to other grouped actors (e.g., to a radiology information system (RIS)). Details on those actors (as well as on the Audit Record Forwarder actor) and on the Node Authentication [ITI-19] transaction are not relevant for this thesis and can be found directly in the integration profile.

The ARR’s role is to store the audit messages sent by Secure Application and Secure Node actors. It shall support the Record Audit Event [ITI-20] transaction specified in Volume 2a [89]. RESTful ATNA adds the Audit Consumer actor that “queries an Audit Record Repository for syslog and ATNA audit records using Syslog metadata and ATNA audit record content” [88].

Of all transactions listed in this integration profile, only the Retrieve ATNA Audit Event [ITI-81] transaction is relevant for this thesis. It is described in [88] and is based on HL7 FHIR. It will be a central part of the process mining interface presented in section 5.4.
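Since [ITI-81] is based on HL7 FHIR, a retrieval is essentially a FHIR search on AuditEvent resources. The following sketch builds such a search URL; the ARR base URL and the parameter values are hypothetical, while `date` and `outcome` are search parameters defined for the FHIR AuditEvent resource.

```python
from urllib.parse import urlencode

BASE = "https://arr.example.org/fhir"  # hypothetical Audit Record Repository

def audit_event_query(start: str, end: str, **params) -> str:
    """Build an [ITI-81]-style FHIR search URL for AuditEvent resources
    recorded in a given time window (inclusive date prefixes ge/le)."""
    query = [("date", f"ge{start}"), ("date", f"le{end}")]
    query.extend(params.items())
    return f"{BASE}/AuditEvent?{urlencode(query)}"

url = audit_event_query("2021-01-01", "2021-01-31", outcome="0")
print(url)
# https://arr.example.org/fhir/AuditEvent?date=ge2021-01-01&date=le2021-01-31&outcome=0
```

An Audit Consumer would issue an HTTP GET on this URL and receive a FHIR Bundle of matching AuditEvent resources, which can then be transformed into an event log for process mining.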

3.5.2 SOLE

Standardized Operational Log of Events (SOLE) is a recently developed IHE integration profile. It is a supplement to the radiology technical framework and currently in revision 1.2, published for trial implementation in mid-2018 [90]. SOLE describes the capture and retrieval of operational events in the radiology domain and utilizes the actors and transactions from the ATNA profile, including the new RESTful ATNA [88]. The profile authors’ incentive for writing the SOLE integration profile was the strong desire of healthcare providers “to increase throughput and efficiency, both to improve the quality and timeliness of care and to control costs” [90]. Thus, workflow events must be captured in order to be able to apply business intelligence tools. SOLE provides [90]:

• Event descriptions for the commonly reported events based on the SWIM Lexicon.
• A standard method to collect the event reports as they are logged from many different systems (based on ATNA).
• The ability to analyze data provided by many different vendors without writing special software for each vendor.
• The ability to compare experiences between different organizations.

The profile does not elaborate on the business intelligence tools that should be applied to the data. Chapter 5 will show how process mining can be used to analyze event logs created based on SOLE.


3.6 Discussion

Although there are still many standards to choose from, this chapter emphasizes their convergence. The developments of recent years, from coding schemes through data exchange standards to profiles, show the efforts to enable semantic and process interoperability. In the field of radiology alone, there is an increasing number of cooperations and initiatives to harmonize coding systems and create mappings between existing ones. The RadLex terminology now incorporates the SWIM Lexicon [84], and Regenstrief and RSNA agreed to provide mappings for their radiology procedure codes in the LOINC/RSNA Radiology Playbook and developed a joint governance process for the ongoing maintenance of the terminology [91]. The IHE integration profile SOLE builds upon all these coding schemes to enable the transport of the meaning of workflow steps. Moreover, only with the recent development of the HL7 FHIR RESTful transactions for the ATNA integration profile can SOLE now integrate devices and systems that were not able to provide workflow-related messages using the syslog protocol (e.g., mobile applications).

This chapter introduced all data exchange standards, codes and profiles necessary to fully describe all workflow steps and transactions involved in the running example of the radiology workflow [24].


“The potential of IT to prevent medical errors and thereby improve healthcare quality is un- deniably attractive.” Richard Lenz and Manfred Reichert

CHAPTER 4 State of Process Mining in Healthcare


This chapter gives an overview of the state of process mining in healthcare and contributes to the field by providing a review of recent case studies and by presenting a reporting template for case studies. It is mostly based on the author’s work in [18] and [26].

• Section 4.1 shows the development of the field based on existing recent literature reviews and discusses the classifiers and descriptors used in those reviews.

• Afterwards, in section 4.2, a methodology for analyzing existing case studies is introduced, together with new descriptors. The descriptors are utilized to specify the patient encounter environment, clinical specialty, and medical diagnoses using standard clinical coding schemes.

• Section 4.3 describes the results of the literature review conducted for a three-year period, applying the descriptors in practice.

• Section 4.4 analyzes the results and discusses the need for a reporting template.

• The last section, 4.5, presents an outline for the reporting template. It can be used as a checklist for the reporting of case studies on process mining in healthcare.

4.1 Other Reviews

In recent years, several review papers have provided an overview of the state of process mining in healthcare, e.g., [16, 17]. Rojas et al. [16] in 2016 wrote a literature review that was well received by the community, with over 360 citations by the end of 2020 according to Google Scholar. The two main objectives were to (1) identify existing case studies and (2) describe the most important aspects of these studies. They ended up with eleven common aspects across 74 case studies. These aspects include methodologies, techniques or algorithms, medical fields and healthcare specialty (cf. table 4.1). In 2018, Erdogan and Tarhan [17] conducted a systematic mapping of 172 case studies with mostly the same metrics and aspects. In addition, they also created a list of secondary studies, that is, surveys and literature reviews, in the field of process mining in healthcare, and provided the respective numbers of primary studies, highlighting the increasing interest in the field. Erdogan and Tarhan also analyzed the 172 case studies by venue to identify the most important conferences and journals. With 21 studies, the International Conference on Business Process Management (BPM) ranked first, and the Journal of Biomedical Informatics second (8 studies). These review papers are very specific as to how the case studies were conducted, which enhances comparability between different process mining techniques in different settings. However, from a medical perspective, the terms and categories listed as medical fields [16] and later as healthcare specialty [17] are not structured in a uniform way and do not follow a common coding scheme or standard.


Aspects                      Summary
Process types                Two classifications: Medical Treatment Processes and Organizational Processes; or Non-elective and Elective Care
Data types                   Two classifications: , events, bolus drugs, infusion drugs and inhalation drugs; or data from administrative systems, clinical support systems, healthcare logistic systems, and medical devices
Frequently posed questions   Two types: specific (e.g., do we comply with internal and external guidelines?); or generic (e.g., what happened?)
Process mining perspectives  Four perspectives: Control Flow, Performance, Conformance and Organizational
Tools                        Three tools: ProM, Disco and RapidProM
Techniques or algorithms     Three main techniques or algorithms: Fuzzy Miner, Heuristics Miner and Trace Clustering
Methodologies                Three main methodologies: Clustering Methodology, L* life-cycle Model and Ad Hoc Methodologies
Implementation strategies    Three implementation strategies: Direct, Semi-Automated and using an Integrated suite
Analysis strategies          Three analysis strategies: Basic (without new implementation), with new implementation, or with analysis from other fields
Geographical analysis        Two main countries: The Netherlands and Germany
Medical fields               Two main medical fields: Oncology and Surgery

Table 4.1: Tabular summary of the 2016 literature review of Rojas et al. [16].

Both reviews [16, 17] ultimately aimed to “show the multidisciplinary character of process mining in healthcare, and its potential application to all medical fields” [16]. Further, basic characteristics of the event log data (like timeframe, number of cases/patients, or healthcare facility/organization) are not always clearly reported.

4.2 Literature Review Methodology

For this thesis, further to the studies examined by Rojas et al., a forward search of process mining case studies in healthcare was conducted for the three-year period from January 2016 to December 2018 [18]. We identified case studies that described basic characteristics of the event log data, and where information on the patient encounter environment, clinical specialty, and medical diagnoses could be assigned under a standard clinical coding scheme. Section 4.2.1 describes how the forward search was conducted and which criteria we applied to filter the results.


Figure 4.1: Timeline and dependency of the reviews of Rojas et al. [16] and Helm et al. [18]. The arrows indicate that studies in the latter reference studies in the former.

The review focused on answering three questions: (1) Which clinically-relevant case studies of process mining in healthcare will be selected for this study? (2) What were the technical aspects identified? (3) How can we improve the clarity and comparability of the clinical terms and aspects described?

4.2.1 Selection of Clinically-relevant Case Studies

Our starting point was the review paper by Rojas et al. [16], which identified 74 case studies where process mining tools, techniques or algorithms were applied in the healthcare domain. We then performed a forward search using Google Scholar, in reference to the 74 identified articles and the review paper itself. The inclusion criteria (IC) were applied directly within the Google Scholar search and the exclusion criteria (EC) were applied manually afterwards (see Figure 4.2).

• IC1: All articles that reference either the review paper by Rojas et al. [16] or any of the 74 articles identified in their review were included (cf. figure 4.1).

• IC2: All articles published between 01.01.2016 and 31.12.2018 were included.

• IC3: All articles published in English were included.

• EC1: Articles that do not include evidence of a clinically-relevant1 case study of process mining in healthcare were excluded.

• EC2: Articles that present a case study based on data that was already used for an earlier case study were excluded.

• EC3: Articles that do not describe the characteristics of the event log data (e.g., timeframe, number of cases or patients, healthcare facility) or do not describe which process mining technique or algorithm was applied were excluded.

• EC4: Articles that did not describe any clinical context (i.e., clinical specialty or medical diagnosis) were excluded.

1This decision was made by a medical doctor, Alvin C. Lin, who is also co-author of the study [18, 26].


Figure 4.2: Flowchart on the case study selection strategy.


4.2.2 Process Mining Aspects

A detailed account of the techniques or algorithms and tools used in process mining case studies in healthcare has been previously described in [16]. Also, other technical descriptors such as the data type and geographical analysis have been used to describe the event log data (cf. table 4.1). In this work, regarding the process mining aspects, the focus was on (1) the tools used in the case studies, (2) the techniques or algorithms used, and (3) the process mining perspectives.

4.2.3 Clinical Aspects and Standard Coding Schemes

To improve the clarity and comparability of the clinical aspects described in the selected papers, we adopted the standard clinical coding schemes SNOMED CT and ICD-10. Namely, the clinical terms were matched to their best corresponding standard clinical descriptor, with respect to three clinical categories: (1) the type of patient encounter environment, (2) clinical specialty, and (3) medical diagnosis (i.e., disease or health problem). The SNOMED CT international browser2 was used in version v20190131 for clinical descriptors of the patient encounter environment and clinical specialty. The WHO ICD-10 browser in the 2016 version3 was used for clinical descriptors of medical diagnoses.

4.3 Results of the Review

Following the structure of section 4.2, the results of the case study selection are presented first. Second, the process mining aspects in the selected case studies are analyzed, including tools, techniques or algorithms, and process mining perspectives. Finally, the clinical aspects (encounter environment, clinical specialty, and medical diagnosis) are described in a standardized way, using SNOMED CT and ICD-10.

4.3.1 Selected Case Studies

The forward search initially yielded a total of 540 papers; after the inclusion and exclusion criteria were applied, 38 articles were selected (cf. Figure 4.2). For all 38 papers, basic characteristics of the event log data were retrieved (e.g., origin of data, number of cases or patients, healthcare facility, timeframe of the study). The results of the technical and clinical aspects are described below.

4.3.2 Process Mining Aspects

The description of the process mining aspects follows the terminology and method of the review by Rojas et al. [16].

2https://browser.ihtsdotools.org/, last access 17.01.2021 3https://icd.who.int/browse10/2016/en, last access 17.01.2021


Tools

Table 4.2 summarizes our findings on the most commonly used tools to enable process mining techniques and algorithms. ProM [58] was the most frequent, found in 18 of the selected case studies, and was also the most frequent in earlier reviews [16]. Disco [49] is becoming more prevalent, with a total of 11 case studies. To complete the picture, PALIA was used twice, each time in combination with another tool or technique.

Table 4.2: Studies with their most commonly used tools (non-disjoint).

Tool            Papers
ProM            [92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 31, 104, 105, 106, 107, 108]
Disco           [93, 94, 95, 109, 110, 31, 111, 112, 113, 114, 115]
PALIA           [116, 117]
pMineR          [118]
Others          [119, 120, 121, 98, 92, 111, 116, 122, 117, 123, 124, 115, 104]
Self-Developed  [119, 125, 126, 127, 113, 102]

There are a variety of other tools, often used together with ProM, listed in the table for a total of 13 papers. Six case studies introduced self-developed tools.

Techniques or Algorithms

Table 4.3 describes the four most used techniques and algorithms amongst the selected case studies. Our analysis revealed that the Fuzzy miner (as implemented in Disco [49]) was most frequently used, appearing in 11 of the case studies. Of note, several papers that utilized ProM also presented self-developed, case-specific approaches based on the ProM environment. Further, the Inductive visual miner [128] is one of the more recent built-in miners in ProM, and is now more frequently used and reported as such. Five case studies used the Trace Clustering [129] technique. Other types like BPMN, analysis of variance (ANOVA) and machine learning were used occasionally. While the Heuristic miner was the most frequently used algorithm in the previous reviews [16, 17], it appeared in only two of our 38 selected papers.

Table 4.3: Papers with their corresponding techniques or algorithms.

Technique/Algorithm  Papers
Fuzzy Miner          [115, 93, 94, 95, 109, 110, 31, 111, 112, 113, 114]
Inductive Miner      [99, 101, 100, 96, 105, 103]
Clustering           [120, 119, 112, 92, 118]
Heuristic Miner      [106, 96]
Self-Developed       [96, 97, 99, 127, 101, 130, 102, 113, 125, 123, 126]


Process Mining Perspectives

Our analysis showed that the majority of the case studies (30 in total) mainly aimed for the Control Flow perspective in their dataset (cf. Table 4.4). Of those remaining, five papers analyzed the Conformance perspective, two focused on Organizational, and one on Performance.

Table 4.4: Papers with their corresponding process mining perspectives.

Perspective     Papers
Control Flow    [93, 92, 120, 119, 121, 96, 97, 98, 116, 99, 100, 115, 109, 127, 122, 111, 101, 130, 112, 102, 113, 103, 125, 114, 104, 105, 123, 107, 108, 124]
Conformance     [118, 31, 95, 106, 126]
Organizational  [110, 117]
Performance     [94]

4.3.3 Clinical Aspects Using Standard Clinical Descriptors

Encounter Environment

From the patient’s perspective, we considered five clinical settings or encounter environments: (1) Inpatient, (2) Outpatient, (3) Accident and Emergency department (AED), (4) General practitioner (GP) practice site, and (5) Pharmacy. All five encounter environments can be coded using SNOMED CT. For each paper, at least one of these five encounter environments was retrieved. Most of the papers examined events within the Inpatient environment, followed by the AED environment (cf. Table 4.5).

Table 4.5: Papers with their corresponding SNOMED CT encounter environment.

SNOMED CT  Environment       Papers
440654001  Inpatient         [101, 98, 124, 102, 120, 123, 92, 94, 130, 125, 127, 126, 112, 108, 106, 99, 104, 111, 119, 122, 31, 114, 109, 105, 115, 100, 107, 121, 113, 97]
440655000  Outpatient        [103, 124, 93, 118, 95]
225728007  AED               [110, 103, 96, 116, 99, 104, 122, 114, 107, 121, 113, 97]
394761003  GP practice site  [117]
264372000  Pharmacy          [109]

Clinical Specialty

SNOMED CT offers the code 394658006 for Clinical specialty, which further contains 18 high-level specialties. Table 4.6 shows that 11 of the 18 high-level clinical specialties were identified in our selected papers. The most frequently identified clinical specialty was Medical specialty, followed by Surgical specialty and Emergency medicine.

Some of the 18 high-level specialties in SNOMED CT are further divided into sub-specialties of greater clinical specificity. For example, Medical specialty has 44 sub-specialties that include, e.g., Dermatology, Neurology and Cardiology. In this work, we identified and assigned sub-specialties to their corresponding high-level Clinical specialty. If, for example, several different medical sub-specialties were described in one paper, we counted them together as Medical specialty.

Table 4.6: Papers with their corresponding SNOMED CT clinical specialty.

SNOMED CT  Clinical Specialty       Papers
394592004  Clinical oncology        [124, 123, 100]
394581000  Community medicine       [117]
722163006  Dentistry                [93, 95]
722164000  Dietetics and nutrition  [117]
773568002  Emergency medicine       [110, 103, 98, 96, 116, 99, 104, 122, 114, 107, 121, 97]
394814009  General practice         [117, 93, 95]
408446006  Gynecological oncology   [127]
394733009  Medical specialty        [101, 98, 102, 92, 118, 130, 125, 126, 106, 95, 120, 111, 119, 31, 109, 115, 107, 121]
722165004                           [110, 117, 95]
394585009  Obstetrics and gyn.      [102, 127]
394732004  Surgical specialty       [98, 123, 118, 94, 127, 108, 112, 95, 105, 100, 97]

Medical Diagnosis

For each paper, we focused on identifying the medical diagnosis (i.e., disease or health problem) or a description of a medical diagnosis. We then assigned these terms to their corresponding highest chapter or block category in ICD-10. Table 4.7 shows that 15 of the 22 ICD-10 chapter categories for diseases and health-related problems were covered amongst the papers. The category with the most papers listed was Diseases of the circulatory system, followed by Neoplasms. Two papers [102, 106] were not included in Table 4.7, since several hundred diseases and health problems were cited and classified using ICD-9. Of the remaining 36 case studies, ICD-10 was already used in 8 papers to code the diagnosis [107, 119, 124, 100, 98, 106, 126, 120].


Table 4.7: Papers with their corresponding ICD-10 medical diagnosis.

ICD-10   Diagnosis                                                      Papers
A00-B99  Certain infectious and parasitic diseases                      [98, 99, 104, 122, 109, 107]
C00-D48  Neoplasms                                                      [98, 124, 100, 123, 127, 105, 111, 31]
E00-E90  Endocrine, nutritional and metabolic diseases                  [118, 130, 98, 101, 117, 95]
F00-F99  Mental and behavioural disorders                               [126, 98]
G00-G99  Diseases of the nervous system                                 [98]
H60-H95  Diseases of the ear and mastoid process                        [98]
I00-I99  Diseases of the circulatory system                             [130, 121, 98, 92, 125, 111, 94, 101, 119, 116, 120]
J00-J99  Diseases of the respiratory system                             [110, 98]
K00-K93  Diseases of the digestive system                               [110, 93, 114, 98]
M00-M99  Diseases of the musculoskeletal system and connective tissue   [98, 126, 110]
N00-N99  Diseases of the genitourinary system                           [98]
O00-O99  Pregnancy, childbirth and the puerperium                       [98]
R00-R99  Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified  [110, 96, 98]
S00-T98  Injury, poisoning and certain other consequences of external causes  [115, 98, 110, 103, 96, 113, 97]
Z00-Z99  Factors influencing health status and contact with health services   [118, 98, 112, 108, 123]

4.4 Conclusion

Whether for process discovery, conformance checking, or enhancement, process mining case studies are influenced by the quality of the labeled data. The benefits of high-quality, labeled data include improved accuracy, efficiency, and predictability of processes, not only for the study itself but also for comparability across studies. Further, high-quality, labeled data can make other kinds of future analyses and even machine learning techniques (e.g., supervised learning, trend estimation, or clustering) easier and more efficient to achieve. In process mining case studies in healthcare, labeled data often encompasses clinical aspects and terms. As such, the aim was to examine clinically-relevant case studies since Rojas et al. [16] and determine how to improve upon the clarity and comparability of clinical aspects and terms described.


4.4.1 Reporting Basic Characteristics of the Event Log Data

For our analysis, we selected papers that described basic characteristics of the event log data. These characteristics included the origin or source of the data, the healthcare facility, the number of cases or patients, and the timeframe of the study. For example, in Rinner et al. [31], event logs were extracted for a total of 1023 patients starting melanoma surveillance between January 2010 and June 2017, from a local melanoma registry at a medical university and a Hospital Information System (HIS) in Austria. In papers where these characteristics were not clearly reported, the retrieval process was time-consuming. Several papers provided additional details (e.g., patient age, data from private insurance or public health records). Presumably for reasons of privacy and anonymity, specifics on the healthcare facility (e.g., hospital name) were not always provided; however, the country of origin was always reported. While variations exist in the style of reporting, we recommend that case studies include these basic characteristics when reporting the event log data.

4.4.2 Adopting the Use of Standard Clinical Descriptors

Encounter Environment

A patient can have vastly different experiences within the healthcare system depending on the clinical setting or encounter environment. For example, a patient with heart failure who presents to the AED may require admission as a hospital inpatient, follow-up at their GP practice site or outpatient clinic, and prescription drugs at a pharmacy. As such, in our analysis of the selected papers, we focused on five patient encounter environments: Inpatient, Outpatient, AED, GP practice site, and Pharmacy. All five encounter types can be coded by SNOMED CT. While further details can be provided (e.g., an outpatient clinic for thyroid disease [118]), we recommend that case studies report at least the patient encounter environment using standard clinical codes, e.g., SNOMED CT.

Clinical Specialty

Different clinical specialties are often involved in the care of a patient. For example, for a patient diagnosed with cancer, a multidisciplinary care plan can encompass input from a medical specialty, a surgical specialty and clinical oncology. As each specialty offers its own unique set of knowledge and expertise, it is important to identify which clinical specialty is involved.

For each of our selected papers, we identified at least one of the 18 high-level clinical specialties coded by SNOMED CT. For greater specificity, SNOMED CT offers further standard clinical codes for sub-specialties. In fact, Baek et al. [98] list multiple sub-specialties along with their corresponding SNOMED CT codes in their study. Also, instead of Clinical specialty, another category of clinical descriptors, such as the type of medical practitioner or occupation, could have been considered (e.g., mapping to surgeon instead of surgical specialty).

In any event, the task of identifying and assigning such standard clinical codes is time-consuming, and beyond the scope of this work. For future case studies, we recommend reporting the clinical specialty (or a similar clinical descriptor such as the medical practitioner) by adopting standard clinical codes, e.g., SNOMED CT.

Medical Diagnosis

There are thousands of different medical diagnoses, and each diagnosis comes with its own treatment and management plan. ICD-10 is a standard coding scheme in healthcare that provides specific clinical descriptors and codes for diseases and health conditions.

In our analysis, we were able to identify at least one medical diagnosis or description of a medical diagnosis in each paper, which we could map to the corresponding ICD-10 code. Further, over 25% (10 out of 38) of our selected papers utilized either ICD-9 or ICD-10 codes in their study. For broader comparison across studies, we assigned the selected papers to one or more of the 22 ICD-10 chapters or block categories. In Table 4.7 we only listed the ICD-10 chapters that were covered in the case studies.

It is important to distinguish a medical diagnosis (i.e., the process of identifying the disease or medical condition that explains a patient’s signs and symptoms) from a patient’s signs (e.g., rash) or symptoms (e.g., cough). While the majority of ICD-10 chapters describe a group of medical diagnoses, some cover other clinical descriptors, such as symptoms and signs (R00-R99), external causes of morbidity and mortality (V01-Y98), and codes for special purposes (U00-U99). In our own analysis (albeit from a selected number of papers), not all of the ICD-10 chapters were covered, since our focus was on identifying the medical diagnosis in these papers. ICD-10 also allows for the coding of location, severity, cause, manifestation and type of health problem [82].

Taken together, we recommend adopting the use of a standard coding scheme, e.g., ICD-10, for clinical terms and aspects relating to medical diagnosis in process mining case studies in healthcare. The recently developed ICD-11 is not yet widely adopted but provides backward compatibility, i.e., ICD-10 coded case studies will remain comparable to newer ICD-11 coded ones once the new coding scheme is taken up by information system vendors.

4.4.3 The Need for a Reporting Template

In summary, we propose adopting a standard for describing event log data and reporting medical terminology using standard clinical descriptors and coding schemes. In doing so, the goal is to improve accuracy and comparability across future clinically-relevant process mining case studies in health care. As such, we provide a sample checklist template of standard criteria for the reporting of such case studies, in section 4.5.


In scientific research, the idea of having a set of guidelines, criteria, or standards for peer-reviewed publications is not new. In fact, journals such as Nature4 are taking initiatives by creating mandatory reporting summary templates, in order to improve comparability, transparency, and reproducibility of the work they publish [131]. Other journals and disciplines, including biomedical informatics, are following suit [132]. Thus, as data sets become more transparent and available, consistency in reporting the characteristics of the event log data (e.g., origin of data, number of patients or cases, healthcare facility, timeframe of the study) will aid in improving comparability and reproducibility.

Further to the work by Rojas et al. [16], we identified and described the clinical terms and aspects in our selected papers with respect to three categories: the patient encounter environment, clinical specialty, and medical diagnosis. We then correlated the clinical terms and aspects to their respective standard clinical descriptors and codes found in SNOMED CT and ICD-10. For studies where a higher granularity of patient encounter environments is needed, SNOMED CT offers more codes. Similarly, for Clinical specialty in SNOMED CT, reporting of sub-specialties under, e.g., Medical specialty will provide increased specificity for clarity and comparison.

As aforementioned, several case studies have already adopted the use of a standard clinical coding scheme to describe medical diagnoses. However, our consideration of SNOMED CT and ICD-10 serves only as a starting point. In fact, SNOMED CT also provides standard codes for medical diagnoses, which can provide further specificity and clarity. For example, instead of ICD-10, the Systematized Nomenclature for Dentistry (SNODENT CT), part of SNOMED CT, could be used to code the clinical descriptors of missing and filled teeth in [93].
Finally, when adopting the use of standard clinical descriptors, we recognize that other fundamental clinical categories to consider are medical investigations and procedures. As such, the use of standard clinical descriptors is becoming increasingly relevant, not only for clarity and comparability, but also for efficiency in outcome measurements such as length of stay (LOS) and financial cost. For example, Baek et al. [98] utilized process mining techniques and statistical methods to identify the factors associated with LOS in a South Korean hospital. This study is just one example of how a more detailed description of the medical context in process mining case studies could allow for future meta-studies, e.g., benchmarking LOS in different hospitals or countries based on diagnoses, while also considering other important factors like the patient encounter environment.

4.5 Reporting Template Outline

The following tables can be used as a checklist template when writing a case study on process mining in healthcare. They are intended to help address key base data (table 4.8), clinical (table 4.9), and technical (table 4.10) aspects.

4https://www.nature.com/documents/nr-reporting-summary-flat.pdf, l.a. 17.01.2021


Table 4.8: Aspects that describe the basic characteristics of the data.

Aspect                  Description / Example
Data Source             E.g., administrative system, clinical support system, medical devices
Descriptive Statistics  Statistics of the base data; e.g., number of cases or patients
Timeframe               The period during which the underlying data was collected
Geographical Area       Country or region where the data was collected

Table 4.9: Clinical aspects of the mined healthcare process.

Aspect                        Coding Scheme  Listing / Example
Process Type                  -              Organizational or medical treatment process; following the definition in [133]
Encounter Type                -              Elective or non-elective care
Encounter Environment         SNOMED CT      See Table 4.5; e.g., Inpatient (440654001)
Clinical Specialty            SNOMED CT      See Table 4.6; e.g., Dentistry (722163006)
Diagnosis                     ICD-10         E.g., J10.0 Influenza with pneumonia, seasonal influenza virus identified
Investigations or Procedures  -              E.g., Complete blood count, X-ray imaging, Colonoscopy, Appendectomy

Table 4.10: Aspects of the process mining techniques.

Aspect                   Listing / Example
Type                     Discovery, Conformance or Enhancement
Perspective              Control-flow, Organizational, Case, Time
Tools (Version)          E.g., ProM 6.9 or Disco 2.2.1
Implementation Strategy  Direct, Semi-Automated, Integrated suite
Analysis Strategy        Basic, New implementation, Extended analysis
Methodology              E.g., L* life-cycle model
Techniques/Algorithms    E.g., Fuzzy mining, Inductive mining
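To make such reports comparable in an automated way, the checklist could be instantiated as a machine-readable record per case study. The following sketch is only an illustration; neither the field names nor the example values are prescribed by the template:

```python
# Hypothetical machine-readable instance of the reporting template
# (tables 4.8-4.10); field names and values are illustrative only.
case_study_report = {
    "base_data": {
        "data_source": "Administrative system",
        "descriptive_statistics": {"cases": 1200, "patients": 950},
        "timeframe": "2016-01 to 2018-12",
        "geographical_area": "Austria",
    },
    "clinical_aspects": {
        "process_type": "Medical treatment process",
        "encounter_type": "Elective care",
        "encounter_environment": {"scheme": "SNOMED CT",
                                  "code": "440654001", "display": "Inpatient"},
        "clinical_specialty": {"scheme": "SNOMED CT",
                               "code": "722163006", "display": "Dentistry"},
        "diagnosis": {"scheme": "ICD-10", "code": "J10.0"},
    },
    "process_mining": {
        "type": "Discovery",
        "perspective": "Control-flow",
        "tools": ["ProM 6.9"],
        "techniques": ["Inductive mining"],
    },
}

# A trivial completeness check against the three checklists:
assert {"base_data", "clinical_aspects", "process_mining"} <= case_study_report.keys()
```

Such a record could then be validated automatically for missing aspects before a case study is submitted.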

This template was used for the case study in section 6.2; a detailed response [134] to its publication is addressed in section 7.3.1.

“A hospital is not a factory and patients cannot be cured using a conveyor belt system.” Ronny S. Mans et al.

CHAPTER 5 Mining Audit Trails


This chapter is largely based on the author’s work in [10] and the joint work with de Murillas in [28]. The goal was to find a way to enable process mining in distributed health information systems without an ever-increasing preprocessing effort. The core questions were: (1) Does log data, produced by means of the standardized ATNA integration profile, provide sufficient information to apply process mining methods? (2) How can we make this data usable for process mining projects?

• Section 5.1 introduces the background for standardized audit logging in healthcare based on the IHE integration profile ATNA and the new RESTful ATNA. Structure and semantics of audit messages and resources are described.

• In section 5.2 a direct mapping approach from ATNA audit messages to XES event logs is described and evaluated based on the running example.

• Section 5.3 shows how the Open SQL Log Exchange format (OpenSLEX) meta model can be utilized to overcome some of the issues of the direct mapping approach and to enable multi-perspective process mining.

• Section 5.4 builds upon the concepts developed in the former sections and introduces an HL7 FHIR interface to query and retrieve audit logs based on RESTful ATNA.

• Section 5.5 concludes this chapter and discusses the data quality and usability of audit trails in the context of process mining.

5.1 Standardized Audit Logging

As described in section 3.5.1, the IHE integration profile ATNA defines how to build up a secure domain that provides patient information confidentiality, data integrity and user accountability. It enables the detection of violations of security policies, especially regarding Protected Health Information (PHI) [87]. The ATNA profile defines how event data should be collected within (distributed) healthcare information systems and states four questions that must be answerable based on the information in an Audit Record Repository (ARR) [87]:

• “For some user: which patients’ PHI was accessed?”
• “For some patient PHI: which users accessed it?”
• “What user authentication failures were reported?”
• “What node authentication failures were reported?”

Depending on the physical representation of the ARR (e.g., log file, Structured Query Language (SQL) database, or NoSQL database), the four questions can be answered by, e.g., utilizing a SQL query or running a simple script. However, there are no mechanisms described to answer more sophisticated questions like:


• “What are the typical clinical pathways in our hospital?”
• “Which medical departments collaborate frequently?”
• “Where are the bottlenecks in our clinical pathways?”
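For illustration, the first of the four basic questions can indeed be answered directly against a relational ARR. The following sketch assumes a hypothetical audit_event table with one row per audit message; the table and column names are illustrative and not defined by ATNA:

```python
import sqlite3

# Hypothetical relational ARR: one row per audit message.
# Table and column names are illustrative only; ATNA does not
# prescribe a database schema for the Audit Record Repository.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE audit_event (
    event_id        TEXT,     -- EventID, e.g., a DICOM event type code
    event_date_time TEXT,     -- EventDateTime (UTC)
    outcome         INTEGER,  -- EventOutcomeIndicator (0/4/8/12)
    user_id         TEXT,     -- UserID of the ActiveParticipant
    patient_id      TEXT      -- ParticipantObjectID, if a patient is involved
)""")
con.executemany(
    "INSERT INTO audit_event VALUES (?, ?, ?, ?, ?)",
    [("110112", "2021-02-25T10:00:00Z", 0, "dr.smith", "patient-42"),
     ("110112", "2021-02-25T10:05:00Z", 0, "dr.smith", "patient-43"),
     ("110114", "2021-02-25T11:00:00Z", 8, "dr.jones", None)])

# "For some user: which patients' PHI was accessed?"
rows = con.execute(
    "SELECT DISTINCT patient_id FROM audit_event "
    "WHERE user_id = ? AND patient_id IS NOT NULL "
    "ORDER BY patient_id", ("dr.smith",)).fetchall()
print([r[0] for r in rows])  # → ['patient-42', 'patient-43']
```

The sophisticated questions above, in contrast, require correlating many such events into cases and ordering them over time, which is exactly what process mining techniques provide.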

To map real-world activities to event logs, ATNA makes use of the “Security Audit and Access Accountability Message XML Data Definitions for Healthcare Applications” (RFC-3881). It incorporates the viewpoints of different organizations like HL7, IHE, DICOM, ASTM and the NEMA/COCIR/JIRA Security and Privacy Committee [135]. DICOM standardized parts of the RFC-3881 vocabulary and defined additional and optional elements [136]. IHE specifies the use of the DICOM vocabulary and provides extensions. Events that cannot be defined on the basis of the DICOM vocabulary have to be reported using the more general RFC-3881 schema. Events that cannot be described by that schema cannot be reported to an ARR [87]. Fig. 5.1 shows the respective schema diagram for audit messages.

Figure 5.1: IHE audit message schema diagram based on RFC-3881 and DICOM [87].

According to this schema, the event must be identified (EventIdentification), the event has one or more active participants (ActiveParticipant), it is reported by one or more sources (AuditSourceIdentification), and it may have other objects involved (ParticipantObjectIdentification).

5.1.1 IHE Audit Message Semantics

The content of an IHE Audit Message depends on the type of action performed. Since Audit Messages have to capture a broad range of different events happening in an IHE environment, the format is modular and audit logs can be diverse. According to DICOM, fields 1-6 in table 5.1 are mandatory, whereas 7-8 are mandatory only in the context of the ParticipantObjectIdentification section, which is optional as a whole. As shown in Fig. 5.1, the IHE model conforms to that specification. The mandatory fields are described in RFC-3881 [135] as follows:


Table 5.1: Selected RFC-3881 fields. Fields 1-6 are mandatory according to [136]; 7-8 are mandatory if the ParticipantObjectIdentification section is present.

Nr  Name                         Location in Schema

1   EventID                      EventIdentification
2   EventDateTime                EventIdentification
3   EventOutcomeIndicator        EventIdentification
4   UserID                       ActiveParticipant
5   UserIsRequestor              ActiveParticipant
6   AuditSourceID                AuditSourceIdentification
7   ParticipantObjectID          ParticipantObjectIdentification
8   ParticipantObjectIDTypeCode  ParticipantObjectIdentification

1. EventID: “Identifier for a specific audited event, e.g., a menu item, program, rule, policy, function code, application name, or URL. It identifies the performed function.”

2. EventDateTime: “Universal coordinated time (UTC), i.e., a date/time specification that is unambiguous as to local time zones.”

3. EventOutcomeIndicator: “Indicates whether the event succeeded or failed.”

4. UserID: “Unique identifier for the user actively participating in the event.”

5. UserIsRequestor: “Indicator that the user is or is not the requestor, or initiator, for the event being audited.”

6. AuditSourceID: “Identifier of the source where the event originated.”

7. ParticipantObjectID: “Identifies a specific instance of the participant object.”

8. ParticipantObjectIDTypeCode: “Describes the identifier that is contained in Participant Object ID.”

DICOM [136] adds value sets to the verbal descriptions for the different fields. For example, the EventOutcomeIndicator can have the values: “0” Nominal Success (use if status otherwise unknown or ambiguous), “4” Minor failure (per reporting application definition), “8” Serious failure (per reporting application definition), or “12” Major failure (reporting application now unavailable). The (optional) EventActionCode can have the values “C” create, “R” read, “U” update, “D” delete, and “E” execute (CRUDE).
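These value sets are small enough to express as lookup tables. A minimal sketch (the dictionaries mirror the values quoted above; the describe helper is ours, not part of any standard):

```python
from typing import Optional

# Value sets as standardized in DICOM [136] (descriptions abbreviated).
EVENT_OUTCOME = {0: "Nominal Success", 4: "Minor failure",
                 8: "Serious failure", 12: "Major failure"}
EVENT_ACTION = {"C": "create", "R": "read", "U": "update",
                "D": "delete", "E": "execute"}  # "CRUDE"

def describe(outcome: int, action: Optional[str] = None) -> str:
    """Render the two coded fields in human-readable form."""
    parts = [EVENT_OUTCOME.get(outcome, "unknown outcome")]
    if action is not None:
        parts.append(EVENT_ACTION.get(action, "unknown action"))
    return ", ".join(parts)

describe(8, "U")  # 'Serious failure, update'
```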


5.1.2 HL7 FHIR AuditEvent Resource

Together with the RFC-3881 and IHE definitions, the schema from figure 5.1 was also the basis for the newer HL7 FHIR specification of the AuditEvent resource [137]. Thus, the aspects of event identification, active participants, source identification, and other objects are also part of the specification. A full XML template of the AuditEvent resource can be found in appendix B. The HL7 FHIR AuditEvent resource is managed collaboratively between HL7, DICOM, and IHE [137]. Multiple value sets for the different elements of the resource are provided in the specification [137], mostly taken directly from the DICOM specification [136].

5.2 Direct Mapping Approach

This is the first of three sections taking different approaches to the utilization of audit logs of healthcare information systems for process mining. It describes a direct transformation of the recorded ATNA audit messages to the standardized log format XES. The section starts with the description of the direct mapping approach, followed by a brief description of the test infrastructure for executing the activities and recording audit messages. After that, the coding and mapping aspects of the running example in terms of ATNA audit messages are presented and test results are shown.

5.2.1 Transformation Architecture

Process mining tools like Disco or ProM provide import interfaces that allow mapping certain fields of a database or a comma-separated values (CSV) file to the respective XES fields. The mapping strongly depends on the process mining task at hand and the questions that should be answered (e.g., the perspective). For example, ProM includes the XESame application to support the import of non-event log databases [58], and Disco allows the import of CSV files. In both cases, the mapping task must be carried out manually; otherwise, the data cannot be imported. The goal of the transformation approach was to keep as much information as possible and to provide an automatic but semantically correct mapping from an ATNA log to an XES event log, thus making it possible to first conduct process mining tasks and then decide how to continue processing the log to answer more specific questions like the ones stated in 5.1. In order to convert an event log recorded by means of ATNA into an XES event log, we developed a transformation architecture based on the Meta Object Facility (MOF) standard [138]. It is influenced by the Model Driven Interoperability (MDI) approach in [139]. Fig. 5.2 shows the hierarchy of models. XML serves as the meta-meta model for RFC-3881 and XES. Between the two meta models at M2, a definition exists of how to map the components of RFC-3881 onto XES. At M1, the specific instance of the RFC-3881 model,


the Audit Trail, is transformed into a specific instance of the XES model, the Mining Log. Both models on the M1 layer conform to their respective meta model. Of course, according to the MOF standard, the Audit Trail is also just a model representing the actual real-world events on M0.

Figure 5.2: Transformation architecture to convert RFC-3881 based Audit Trails into standardized XES Mining Logs.

In case of a log file, the actual transformation of the Audit Trail into the Mining Log is conducted via Extensible Stylesheet Language Transformations (XSLT). The mapping on M2 represents a Model-to-Model (M2M) transformation, thus enabling an automatic transformation of model instances on M1. The transformation is a three-step process: (1) First, the Audit Trail is checked for validity against the DICOM audit message schema [136]. (2) Second, the transformation is executed by means of XSLT. (3) Finally, the resulting XES Mining Log file is validated against the XES schema [59].

5.2.2 Test Setting

For the creation of audit messages, an IHE test system based on the OpenHealthTools was utilized. Maintained by the Open eHealth Foundation, the OpenHealthTools framework was later replaced by the Open eHealth Integration Platform (IPF) [140]. The environment was initially set up on behalf of the research project Workflow for Image prefetching in Radiology for ELGA (WIRE) with the aim of testing different prefetching mechanisms for radiological image data [9]. The system records the audit messages sent by IHE actors executing transactions (cf. figure 5.3). The test environment was designed in analogy to parts of the ELGA infrastructure. This yields the benefit of being able to analyze auditing conditions and information content within the scope of a nationwide implementation of an IHE-based electronic health record.


Figure 5.3: Test setting for the direct mapping approach (implemented in [9]).

Besides the ATNA ARR, relevant actors and transactions originate from the IHE integration profiles XDS, Patient Identifier Cross Referencing (PIX), and Patient Demographics Query (PDQ), all described in volume 1 of the ITI technical framework [141], cf. figure 5.3. Together, the components of the test system resemble a distributed health information system in radiology. Patient management is handled by the OpenPIX/PDQ component (transactions ITI-9 and ITI-29). The OpenXDS component handles the document management. Imaging data, e.g., from CTs, MRIs, or X-Ray machines, is stored in a Picture Archiving and Communication System (PACS), connected via DICOM’s Web Access to DICOM Objects (WADO) standard. The following transactions are executed in this test system:

• ITI-9 – PIX Query, to query patient identifiers.

• ITI-29 – PDQ Query, to query a patient’s demographic data.

• ITI-41 – XDS Provide and Register Document Set, to upload documents.

• ITI-18 – XDS Registry Stored Query, to search for documents.

• ITI-43 – XDS Retrieve Document Set, to download documents.

All actors are grouped with an ATNA Secure Application actor, sending audit messages to the OpenATNA ARR component. Any transaction between actors is recorded bilaterally: both participating actors send audit messages, so the audit information is saved twice in the ARR.


5.2.3 Audit Messages for the Running Example

In the course of the research project WIRE, the organizational workflows of three radiologists were analyzed and seven major steps were identified [142, 10]. This simplified radiology practice workflow also acts as the running example in this thesis. Figure 5.4 shows the process, modeled by means of BPMN. It was simulated by manually operating the IHE actors and executing the transactions implemented in the test system.

Figure 5.4: The simplified radiological workflow identified in the WIRE project [142, 10].

To relate the recorded events to the different steps of the process, post-processing of the transformed log was necessary. Since the test system did not use the SWIM lexicon to log the events, assumptions were made based on the executed transactions in the IHE environment. The EventID and the EventTypeCode (mandatory for most transactions), together with the EventActionCode, mapped to the fields of the XES Concept extension, were used to identify the events. Table 5.2 outlines the mapping of attributes from RFC-3881 to the corresponding XES fields.

RFC-3881                   XES extension   XES extension description [59]

EventDateTime              time:timestamp  The date and time at which the event has occurred.
EventTypeCode + EventID +  concept:name    The name of the executed activity represented by the event.
EventActionCode
UserID                     org:resource    The name, or identifier, of the resource having triggered the event.

Table 5.2: Mapping of RFC-3881 fields to corresponding XES fields.
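The mapping of table 5.2 can be expressed directly in code. The following stdlib sketch applies it to one simplified audit message (element and attribute layout abbreviated from the RFC-3881 schema, namespaces omitted; this is an illustration, not the XSLT actually used):

```python
import xml.etree.ElementTree as ET

# Minimal RFC-3881-style audit message (abbreviated, no namespaces).
AUDIT = """
<AuditMessage>
  <EventIdentification EventDateTime="2021-01-05T10:00:00Z"
                       EventActionCode="C">
    <EventID code="110107"/>
    <EventTypeCode code="ITI-41"/>
  </EventIdentification>
  <ActiveParticipant UserID="drsmith"/>
</AuditMessage>
"""

def to_xes_event(audit_xml: str) -> ET.Element:
    """Apply the table 5.2 mapping: one audit message -> one XES event."""
    msg = ET.fromstring(audit_xml)
    ident = msg.find("EventIdentification")
    event = ET.Element("event")
    # EventDateTime -> time:timestamp
    ET.SubElement(event, "date", key="time:timestamp",
                  value=ident.get("EventDateTime"))
    # EventTypeCode + EventID + EventActionCode -> concept:name
    name = "+".join([ident.find("EventTypeCode").get("code"),
                     ident.find("EventID").get("code"),
                     ident.get("EventActionCode")])
    ET.SubElement(event, "string", key="concept:name", value=name)
    # UserID -> org:resource
    ET.SubElement(event, "string", key="org:resource",
                  value=msg.find("ActiveParticipant").get("UserID"))
    return event

evt = to_xes_event(AUDIT)
```

Applied to the message above, the resulting event carries the concept:name ITI-41+110107+C, the time:timestamp 2021-01-05T10:00:00Z, and the org:resource drsmith.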

The following subsections go through the seven activities of the running example and explain how they were executed and recorded in the ARR. The assumptions for the mapping are based on the interviews and observations in [10, 9, 143].


To show the content and structure of audit messages, one of the activities, Report Writing, also lists the respective audit message in the XML format.

Schedule Appointment and Patient Admission

Interacting with the patient on the phone and later at the admission desk, the first two activities are associated with PIX and PDQ transactions, querying, updating, or writing patient data. For example, the IHE transaction ITI-9 was mapped to the Patient Admission activity because admitting the patient to a radiology practice requires a query for the patient’s identifiers. We did not map the Schedule Appointment activity.

Radiological Examination

There are no recorded events in our test setting that refer to the patient entering the procedure room (cf. table 3.4, PatientIn). To identify the activity Radiological Examination, events triggered by the information systems handling the imaging data were taken into account: (1) the registration of a DICOM Key Object Selection (KOS) Document referring to the images (ITI-41), and (2) the DICOM Instances Transferred event triggered by the PACS (EventID: 110104, no EventTypeCode).

Diagnosis

A radiologist’s diagnosis task starts with the retrieval of the current images and previous reports and results in the registration of an audio file (EventID: 110106, EventTypeCode: ITI-41).

Report Writing

In this step the report was registered in the affinity domain (EventID: 110107, EventTypeCode: ITI-41). Listing 5.1 shows the audit message generated by the OpenXDS component, i.e., by the XDS Repository receiving the report. Both active participants, the Document Source and the XDS Repository, are documented. Following good practice, the related patient information is also transmitted, to facilitate detection of improper creation, access, modification, and deletion of PHI.

Listing 5.1: Provide & Register Document Set-b Audit Message.

(The XML content of the audit message is not reproduced here; besides the two active participants, it identifies the affected patient, Musterfrau^Maria.)

Report Attestation

For the attestation, only the XDS metadata of the report were changed – a legal authenticator was added (EventID: 110106, EventTypeCode: ITI-41, EventActionCode: “U” Update).

Report Transmission

For the report transmission, the report and the images were handed to the patient, thus triggering an export audit event (EventID: 110106, no EventTypeCode, EventActionCode: “R” Read). There are more potential classifiers, and the post-processing has to be adapted to the specific process mining task. In this case the goal was to reconstruct a process looking like the one in figure 5.4.

5.2.4 Transformation Result

Listing 5.2 shows the transformation result of the Provide & Register Document Set-b audit message from listing 5.1, corresponding to the Report Writing step.

Listing 5.2: Transformation result XES event.

(The XML content of the XES event is not reproduced here; its structure is described below.)


As described in Table 5.2, the element date is based on RFC-3881 EventDateTime, and the values in the string elements with the keys concept:name (line 3) and concept:instance (line 4) are based on the codes in RFC-3881 EventTypeCode (together with EventActionCode) and EventID. The two RFC-3881 ActiveParticipant elements in the audit message were transformed into the two container elements (lines 5-9 and 10-13), comprising two string elements, org:resource (based on UserID) and org:role (based on displayName in RoleIDCode). Note that the container elements were part of the XES 2.0 standard at the time when the work on the direct mapping approach was conducted in [10]. In the Institute of Electrical and Electronics Engineers (IEEE) XES standard [59], the container elements were dropped in favor of the list elements, since they suffice to group and order elements.

5.2.5 Visualization

After filtering the relevant events, the process mining tool ProM was used to visualize the result, with the AlphaMiner plugin generating a Petri net from the event log (cf. figure 5.5). This plugin uses the Alpha-algorithm presented in Algorithm 2.1 in section 2.2.1.

Figure 5.5: Discovered Petri net with the AlphaMiner.

The visualization showed that the recorded audit trails provided sufficient information to allow the reconstruction of radiology workflow steps of the running example. The resulting Petri net depicts the process steps from figure 5.4 using places and transitions.
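The footprint relations underlying the Alpha-algorithm can be illustrated in a few lines (toy traces loosely shaped after the running example, with invented, abbreviated activity names; this shows only the directly-follows abstraction, not the full plugin):

```python
# Toy traces, purely illustrative, not the recorded data.
traces = [
    ["Schedule", "Admission", "Exam", "Diagnosis", "Report"],
    ["Admission", "Exam", "Diagnosis", "Report"],
]

# Directly-follows relation a > b over all traces -- the starting
# point for the Alpha-algorithm's footprint matrix.
follows = {(a, b) for t in traces for a, b in zip(t, t[1:])}

# Causality a -> b: a directly precedes b, but never the reverse.
causal = {(a, b) for (a, b) in follows if (b, a) not in follows}
```

From these relations, the Alpha-algorithm derives the places and transitions of a Petri net like the one in figure 5.5.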

5.2.6 Discussion and Issues

The first approach is a straightforward mapping of fields from IHE ATNA audit messages to the standard extensions of the IEEE XES standard. It is simple and works for the test scenario of the running example, but has several shortcomings.

Trace Identification

Since the original scope of the ATNA profile covers only security aspects and not business analytics, the audit log lacks an explicit notion of process instances and activities. We handled this by assigning the traces based on the patient identifiers and by mapping combinations of attributes to specific activities. However, the trace identification presents a major problem if the same patient visits the radiologist


multiple times. To apply this approach to real-world audit logs, a preprocessing step is needed to distinguish the visits, e.g., based on a combination of patient identifier and date.
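Such a preprocessing step could be sketched as follows (the visit heuristic of one case per patient per calendar day is an assumption for illustration, not part of ATNA):

```python
from datetime import datetime

def case_id(patient_id: str, event_time: str) -> str:
    """Derive a visit-level trace identifier from patient ID and day.

    Assumes ISO-8601 EventDateTime values; two events of the same
    patient on the same calendar day are treated as one visit -- a
    heuristic that real logs may require refining, e.g., with a
    session-gap threshold.
    """
    day = datetime.fromisoformat(event_time.replace("Z", "+00:00")).date()
    return f"{patient_id}@{day.isoformat()}"

case_id("PID-1", "2021-01-05T10:00:00Z")  # 'PID-1@2021-01-05'
```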

Fixed Perspective

The XSLT always maps the same attributes in the same way. The concept:name of the events is always based on the combination of the same EventIdentification attributes. Moreover, the transformation only provides a fixed minimum set of attributes, i.e., name, ordering, and context. Thus, different perspectives, e.g., based on roles and resources, cannot be taken. Since the trace identifier is also hard-wired to the patient identifier, different, non-patient-centric views are not available.

Ad-hoc Semantic Match

The mapping from the audit trail to the mining log is based on the mapping between the two respective meta models, RFC-3881 (with DICOM and IHE extensions) and the IEEE XES standard extension definition. However, the mapping did not follow a standardized, structured methodology but was rather ad hoc. This is sufficient for a proof-of-concept on the running example, but a validation of the chosen mappings should be conducted in future work.

5.3 Data Warehouse Approach

This is the second of the three sections focusing on the utilization of audit messages for process mining. It goes further by solving a number of remaining issues of the previous approach in section 5.2, providing the basis for automated process mining in different contexts. It describes an approach utilizing the OpenSLEX meta model by de Murillas et al. [144], and is mainly based on the joint work on this topic in [28]. Although the approach presented in the previous section 5.2 showed that the information recorded in audit trails is, in theory, sufficient to enable process mining, several major issues remained unsolved. (1) The approach was not able to automatically identify traces. (2) The manually chosen, fixed trace identifier (the patient ID) led to snapshot problems and limited the possible process mining perspectives. (3) The hard-wired mapping of fields from ATNA audit messages to event logs could sometimes lead to incorrect mappings. The main goal of the joint work in [28] was to solve these issues and to enable automated process mining on the extracted and transformed data. Section 5.3.1 introduces the OpenSLEX meta model and section 5.3.2 describes the mapping and transformation of the IHE ATNA audit messages to OpenSLEX. Finally, section 5.3.3 discusses the potential benefits and issues of this approach. While this chapter does not provide a validation of the presented approach, it sets the ground for the next and final chapter on audit-trail-based process mining in healthcare.


5.3.1 OpenSLEX Meta Model

Data extraction and transformation are, very often, the most time-consuming stages of a process mining project. The difficulty of these tasks stems from the variability of the representations in which the original data can be found. Most applications of process mining in real-world systems provide ad-hoc solutions for the specific environment of the application. Some examples of these systems are SAP [145, 146, 147] and other ERPs [148]. Nevertheless, efforts have been made to develop standards for data representation in process mining. The IEEE XES standard [59] is the most important example, being extensively used both in academic and industrial solutions. However, despite its success at capturing event data in an exchangeable format, the standard misses the data perspective on the original system.

Figure 5.6: Diagram of the OpenSLEX meta model at a high level.

With the purpose of mitigating the limitations of current event data representation standards, de Murillas et al. proposed OpenSLEX [144], which provides a meta model that takes into account not only the process view (events, instances, and processes), but also the data view (data models, objects, and object versions). Figure 5.6 shows a high-level description of the meta model, where the granularity of the data increases as the level of abstraction decreases. In other words, a data model is a more abstract representation of the data than the objects or the object versions, while the latter have a greater level of granularity than the data model. The same can be said about the process view, where processes are abstract descriptions of the events, which are much more granular data. Additionally, the fact that in this meta model the process side is combined with the data side allows capturing a richer snapshot of the system under study. Unlike other meta models, like the one proposed in XES, which requires the existence of a case notion to group events into process instances, OpenSLEX enables the adoption of different


perspectives. Events are stored independently of any case notion. Afterwards, one or many case notions can be defined, generating the cases that will group events in different event logs. This is the key to enabling multi-perspective process mining on the extracted data. The fact that no single case notion is enforced during the data extraction phase avoids the loss of data and flexibility. Additionally, it enables the application of automated techniques that correlate events in multiple ways to show different processes or perspectives coexisting in the same system. To summarize, OpenSLEX provides a layer of standardization for the representation of data, while considering both process and data views, unlike other existing, flatter event models. This makes it possible to decouple the application of analysis techniques from the data extraction and transformation stages. Additionally, it enables smarter ways to analyze the information, considering the data side to enrich the process perspective. A more detailed description of the OpenSLEX meta model is available online1 and an evaluation can be found in [144].

5.3.2 Mapping and Integration

The use of IHE ATNA audit messages in order to extract event data for process mining has been demonstrated in section 5.2. Now we aim at stepping up the level of generalization, using the OpenSLEX meta model as an intermediate format for data collection. This meta model can be seen as a data schema for a data warehouse, acting as an ARR, capturing event data together with data objects, data models, and historical data, ready to be exploited by existing process mining techniques.

Mapping of ATNA messages to OpenSLEX

The specific characteristics of ATNA messages make them good candidates for event data extraction. Figure 5.7 shows how different fields of the ATNA message can be mapped to fields of the OpenSLEX meta model. The minimally required attributes in order to obtain events are activity names and timestamps. These two attributes can be directly mapped to the ATNA message’s fields EventID and EventDateTime, respectively. In addition, Active Participant fields such as UserID and UserName provide valuable resource data to enrich the events. However, what makes ATNA messages especially attractive from the process mining perspective is the presence of ParticipantObject data. The fields within this part of the message contain not only object data information such as role (ParticipantObjectTypeCodeRole) and life-cycle (ParticipantObjectDataLifeCycle), but also object type (ParticipantObjectTypeCode) and unique object identifiers (ParticipantObjectID), which enable the traceability of data changes and behavior at object level. Additionally, detailed value pair data (ParticipantObjectDetail) of the participating object can be present. Such key-value pairs represent a snapshot of the relevant attributes of a participant object

1https://github.com/edugonza/OpenSLEX/blob/master/doc/meta-model.png

at the time of occurrence of the event, which can be seen as an object version. Object versions reveal the evolution of objects through time, and are related to the events that caused the modifications.

Figure 5.7: The dashed lines show the mapping of the fields of Audit Messages to the OpenSLEX meta model.

Data extraction and transformation are difficult tasks that require a significant amount of domain knowledge to be carried out. It is common that during this transformation choices are made that affect the final picture we obtain of the system under study. Considering the ATNA message fields we just discussed, we seem to be able to capture event information, which may be mapped to the corresponding OpenSLEX elements. The next section explains the transformation of the captured event data in order to infer new information. This allows obtaining a more complete picture of the whole system.
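The routing of one message into the two views can be sketched as follows (a flat dictionary stands in for the parsed XML; the two dataclasses are a drastic simplification of the OpenSLEX entities, introduced here only for illustration):

```python
from dataclasses import dataclass, field
from typing import Tuple

@dataclass
class Event:               # process view of the meta model
    activity: str          # from EventID
    timestamp: str         # from EventDateTime
    resource: str          # from ActiveParticipant UserID

@dataclass
class ObjectVersion:       # data view of the meta model
    object_id: str         # from ParticipantObjectID
    obj_class: str         # from ParticipantObjectTypeCode
    attributes: dict = field(default_factory=dict)  # ParticipantObjectDetail

def split_atna(msg: dict) -> Tuple[Event, ObjectVersion]:
    """Route one flattened ATNA message into both meta-model views."""
    return (Event(msg["EventID"], msg["EventDateTime"], msg["UserID"]),
            ObjectVersion(msg["ParticipantObjectID"],
                          msg["ParticipantObjectTypeCode"],
                          dict(msg.get("ParticipantObjectDetail", {}))))
```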

Integration of ATNA messages

OpenSLEX provides the meta model to shape the extracted information, with the purpose of minimizing the impact of the data extraction and transformation stages on the result of the analysis. Transforming the ATNA messages into this new representation enables the application of process mining without any semantic loss of the original data. This is achieved by considering the data view in addition to the process view, avoiding flattening multidimensional data into simple event logs. Figure 5.8 shows the steps in the data transformation process in order to capture a picture as close as possible to the original system:

a) Figure 5.8.a shows the situation in which we only obtained event information from the system. This matches the situation we face when dealing solely with ATNA messages. These messages are, in the end, events emitted by different actors in the healthcare ecosystem under study. These events contain valuable information that will let us infer some of the other sectors of the meta model. Using the information



Figure 5.8: Inference of the missing elements in the meta model, starting from the events (a) and finishing mining a model (e).

in the ParticipantObjectTypeCode field, we can infer the classes involved in the data model of the system. The ParticipantObjectDetail key-value pairs provide the information about the attributes of such classes.

b) Figure 5.8.b represents the next step in which, after discovering the data model underneath the global system, object instances are mapped into it. To do so, the field ParticipantObjectID helps us to identify the unique entities that exist for each data class.

c) Figure 5.8.c depicts the subsequent step, in which we infer the object versions involved in the process. These object versions are object snapshots at a certain moment in time, and they can be reconstructed by matching the key-value pairs in the ParticipantObjectDetail field of the ATNA events to the object id obtained in the previous step (Figure 5.8.c). Reconstructing these object versions will help us understand the object’s life-cycle within the process. Moreover, applying primary and foreign key discovery techniques [149, 150] will make it possible to uncover links between objects belonging to different classes. In the next step we will see how these connections can be exploited to correlate events in different ways.

d) So far we have been able to capture events, infer the data model, and extract object and object version information from the data. However, a case notion, the context, is needed in order to group events into cases. These cases will build the event logs necessary for process mining. It is in this step (Figure 5.8.d) that one of the main benefits of this technique arises: the independence of the case notion from the event capturing phase. Events can be correlated in many different ways. One of them is to select a common attribute as a case identifier, which is the most common way to build logs nowadays. However, our meta model gives us an advantage with respect to traditional methods: the existence of links between events and object versions. As has been described in the previous step, relations


between object versions can be discovered. This means that objects can point to others, as they do in databases with foreign keys, e.g., a report object points to a doctor and a patient. This enables a new way to correlate events that might not share any common attribute (doctor and patient events), by means of a linking object (report). The data model structure discovered from the data will determine all the possible combinations (case notions) that can be made in order to build event logs, making it possible to have a multi-perspective view on the data.

e) Only when case notions have been discovered, and the logs representing the different views have been built, can we proceed with the process discovery. Figure 5.8.e shows the step in which process models are discovered using existing process mining techniques. This is the step that enables further analysis of the data, combining process and data views in a centralized and comprehensive manner.
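The correlation via a linking object described in step (d) can be sketched as follows (all identifiers hypothetical):

```python
# Two events from different actors that share no direct attribute;
# a report object links them.
events = [
    {"activity": "Write Report", "doctor": "drsmith", "report": "R-7"},
    {"activity": "Read Report", "patient": "PID-1", "report": "R-7"},
]

# Case notion: the linking report object groups both events into one
# trace, although the doctor and patient events never share a key.
by_report = {}
for e in events:
    by_report.setdefault(e["report"], []).append(e["activity"])
```

Choosing a different case notion, e.g., grouping by patient, would yield a different event log from the same stored events.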

5.3.3 Discussion and Issues

Section 5.3.2 above describes the integration of IHE ATNA messages into the OpenSLEX format. This requires inferring the missing pieces of information step by step from those available in the ATNA message attributes. As a result, we obtain a global, multi-perspective view of the data and process sides, minimizing data loss during the extraction and transformation processes. One of the benefits of this approach is that collecting all the information in a standardized meta model enables the application of analysis techniques independently of the origin of the data. Decoupling the analysis from the data extraction by means of a standard integration process makes it possible to apply many different analysis techniques with minimum effort.

Traces, Perspectives, Semantics

Regarding the issues of the previous approach, discussed in section 5.2.6, the integration of IHE ATNA audit messages in OpenSLEX presents viable solutions. Both flexible trace identification and the enablement of different perspectives are inherently present in the OpenSLEX meta model. The mapping, based on the semantic matching of the meta models, is less problematic than in the direct mapping approach, since all attributes are mapped 1:1 (no concatenations) and the relationship between the audit message fields is maintained via the event, object, and version relationships in the OpenSLEX meta model.

Audit Message Assumptions

As described in section 5.3.2, this approach assumes the presence of the ParticipantObject data in the audit messages to (1) populate the data model and to (2) derive objects and object versions from it. However, the ParticipantObject information is not always


present in IHE ATNA audit messages. The existence and quality of this information depend on the respective IHE integration profile and the implementation of the actors and transactions.

Log Data Access

To retrieve a mineable XES event log from an OpenSLEX database, a detailed SQL query has to be composed (cf. listing 2 in [144]). This query has to encompass all aspects regarding the choices of perspective and the definitive mapping to the fields of the XES Concept extension. While SQL is the widely accepted standard for querying relational databases, the composition of OpenSLEX queries for the purpose of generating process mining logs is not common ground and not standardized.
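As a minimal illustration of such a perspective-specific query (the two tables below are a drastic simplification invented for this sketch, not the actual OpenSLEX schema; cf. listing 2 in [144] for a real query):

```python
import sqlite3

# Simplified stand-in for the case and event storage of OpenSLEX.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "case" (case_id INTEGER, name TEXT);
    CREATE TABLE event (event_id INTEGER, case_id INTEGER,
                        activity TEXT, ts TEXT);
    INSERT INTO "case" VALUES (1, 'PID-1@2021-01-05');
    INSERT INTO event VALUES
        (1, 1, 'Patient Admission', '2021-01-05T09:00:00Z'),
        (2, 1, 'Diagnosis',         '2021-01-05T10:00:00Z');
""")

# One XES trace per case, events ordered by timestamp; the activity
# column would feed the concept:name attribute.
rows = conn.execute("""
    SELECT c.name, e.activity, e.ts
    FROM "case" c JOIN event e ON e.case_id = c.case_id
    ORDER BY c.name, e.ts
""").fetchall()
```

Every change of perspective requires composing such a query anew, which underlines the point that this access path is not standardized.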

OpenSLEX provides the means to store audit messages in a way that taking different perspectives on the process data remains possible. It can be the basis for future process-aware ARRs, but only if we find a way to make the process data accessible in a standardized way.

5.4 Process Mining Interface

To overcome the issues mentioned above in sections 5.2.6 and 5.3.3, an open-standards-based process analytics interface for healthcare information systems is proposed in this section. It enables the development of tools that combine the easy applicability of an integrated suite with the ability to integrate different data sources. It aims to make existing process mining tools the business intelligence tools the radiology community wants (cf. 3.5.2), but is, of course, not limited to the domain of radiology.

To this end, this third approach aims to show how existing concepts can be utilized and what changes in the standard are necessary to build a process mining interface based on HL7 FHIR. The validation of this approach also contributes to the field by presenting a novel method to utilize a process simulation tool in a healthcare environment.

The next three sections describe which standards and tools were used in building the interface test environment and how we utilized and extended them to enable process mining based on HL7 FHIR. Figure 5.9 depicts the three steps, (1) simulate, (2) store & provide, and (3) analyze, that aim to show how the open standardized process analytics suite works. The circles represent the data consumed and produced in those three steps.

To test the interface, a simple process, basically the running example, was used. Figure 5.10 shows the simplified process model for an examination in a radiology practice using BPMN. It comprises the main steps from the appointment scheduling to the distribution of the diagnostic report (cf. section 1.1.1). The main difference is that in some cases of our simulation, the first step (Schedule Appointment) can be skipped and the patient arrives at the admission desk without an appointment.


Figure 5.9: The three steps of the interface test setting including the respective consumed and produced data. The numbers correspond to sections or figures.

5.4.1 Simulate

In order to be able to automatically generate process data, some sort of process engine or simulator is required. Burattin [151] developed a tool specifically designed to simulate processes and generate event logs for process mining, the Processes and Logs Generator (PLG2). The tool allows generating and simulating random BPMN models, and adding randomized noise (e.g., double activity execution, skipping activities). The tool also allows loading an existing model, in our case the model from figure 5.10, and simulating it.

Figure 5.10: BPMN process model of the radiology practice workflow.

To use PLG2 for the simulation, we needed to make REST calls to our HL7 FHIR server. PLG2 allows specifying the execution time of different activities using Python scripts [151]. We adapted those scripts to execute REST calls using Client for URLs (cURL). By default, PLG2 provides a single parameter, the case identifier (caseId), to these Python functions. We used this parameter to make the process instances distinguishable by deriving resource identifiers from it (i.e., patientId and encounterId).
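A minimal sketch of such an activity hook is shown below. The derivation scheme (`patient-<caseId>`, `encounter-<caseId>`), the server URL, and the function name are assumptions for illustration; they are not taken from the actual PLG2 scripts used in the thesis.

```python
def build_curl(case_id: str, fhirserver: str = "http://localhost:8080/fhir") -> str:
    """Sketch of a PLG2 activity hook: derive resource identifiers from the
    case id and build the cURL command creating a DiagnosticReport.
    The id derivation scheme below is a hypothetical example."""
    patient_id = f"patient-{case_id}"
    encounter_id = f"encounter-{case_id}"
    body = ('{"resourceType":"DiagnosticReport",'
            f'"subject":{{"reference":"Patient/{patient_id}"}},'
            f'"encounter":{{"reference":"Encounter/{encounter_id}"}},'
            '"status":"preliminary"}')
    return (f"curl -X POST -H 'Content-Type: application/fhir+json' "
            f"-d '{body}' {fhirserver}/DiagnosticReport")
```

Because both identifiers are derived from the same caseId, all REST calls of one simulated run end up referencing the same patient and encounter, which makes the process instances distinguishable on the server side.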


Each activity in the process from figure 5.10 was extended with REST calls, creating, reading, or updating resources and executing operations on the FHIR server (according to the mapping described in the next section). For the evaluation, the process was then simulated 10 times, each run resulting in one process instance recorded on the server.

5.4.2 Store & Provide

In the second step, we set up a FHIR server including the required extensions and operations to automatically record audit trails, and to transform and provide this information in the XES format for process mining.

FHIR Server. We implemented our FHIR server based on the open-source project HAPI-FHIR Starter². This project provides a fully working FHIR server, including a database connection, based on the HL7 API (HAPI) FHIR JPA project. Adjustable configuration files and the interceptor framework [152] create high flexibility for custom changes and for adding extensions to the existing server implementation. The HAPI FHIR library [77] supports the custom definition of operations via the Operation annotation in a provider class. Once the implemented provider is registered as part of the server initialization, the operation is available on the server.

We utilized the Consent Interceptor, which amongst other functionalities has the ability to hook into the point in the server code where a CRUD operation (e.g., creating an appointment or reading a patient record) has been finished. One of the Consent Interceptor’s roles is to write audit trail records, creating an AuditEvent resource every time an operation has been finished successfully or with a failure.

In addition to the interceptor implementation, we provided the FHIR operation $fhirToCDA as part of our custom extensions to the server implementation. The operation can be executed on a specific instance of the DiagnosticReport resource and it returns an empty document to the client (see listing 5.3). An AuditEvent recording the execution of this operation in the context of a radiology workflow encounter will, for mapping purposes, be interpreted as a report transmission activity.

Listing 5.3: Query for a DiagnosticReport resource invoking the $fhirToCDA operation.

GET [fhirserver]/DiagnosticReport/[Id]/$fhirToCDA

To query for an event log in the XES format, we extended our FHIR server by the $xes operation. This operation is defined to work on the AuditEvent resource type and performs the actual identification and transformation of all AuditEvents of the radiological workflow “rad-wf” into the XES format (see listing 5.4).

Listing 5.4: Query for AuditEvent resources invoking the $xes operation.

GET [fhirserver]/AuditEvent/$xes?plandefinition=PlanDefinition/rad-wf

² https://github.com/hapifhir/hapi-fhir-jpaserver-starter


Extending AuditEvent. We filled the AuditEvent resource with request details that are automatically provided for any standard CRUD operation. In order to be able to query for relevant AuditEvent resources, we needed to identify grouping elements. We decided to extend the AuditEvent resource by references to the Encounter and PlanDefinition resources (cf. section 7.2.1). In line with the other resources containing the Encounter resource reference as part of their standard FHIR resource definition, we named the extended AuditEvent element “encounter”. An additional extension “basedOn” is used to reference the PlanDefinition resource “rad-wf” that defines the radiological workflow. This element can later be used to filter AuditEvent resources related to executions of the radiological workflow process, while Encounter references are used to distinguish the single process instances, that is, the traces.

Mapping FHIR AuditEvent to XES. For the test setting, we base our mapping on the assumption that Encounter identifiers can be utilized as trace identifiers and that recorded events refer to a common process description, i.e., a medical guideline or pathway defined as a PlanDefinition. Of course, this is just one perspective, and different perspectives can be considered (cf. section 5.4.5).

Figure 5.11: Venn diagram depicting the grouping of AuditEvent (A) resources based on their references to PlanDefinition (P) and Encounter (E) resources.


Let R be the set of all resources on the FHIR server. Let A ⊆ R be the set of all AuditEvent resources, E ⊆ R be the set of all Encounter resources, and P ⊆ R be the set of all PlanDefinition resources. All three subsets are pairwise disjoint, i.e., A ∩ P = ∅, A ∩ E = ∅, and E ∩ P = ∅. Resources can refer to other resources via the predicate refersTo ⊆ R × R, where refersTo(r, r′) holds iff r′ is referenced by r, i.e., r contains the identifier of r′. A visual representation of the grouping of the AuditEvent resources A is presented in figure 5.11.

Let pw ∈ P be the PlanDefinition resource “rad-wf” defining the radiology workflow. Then, Aw = {a ∈ A | refersTo(a, pw)} is the set of all AuditEvent resources recorded during the execution of radiology workflows.

For our mapping, let Aw be partitioned into disjoint subsets Awi, where every Awi represents the set of AuditEvents recorded during a specific radiology workflow encounter e ∈ E of one patient. Then, every Awi is mapped to a trace σ in an XES event log L. For testing the approach, we only map to mandatory fields in L, e.g., concept:name of the event (providing the activity name) and time:timestamp of the event (for ordering). Table 5.3 describes which recorded combination of operation and resource is mapped to which activity name. The timestamp is mapped directly from the recording time AuditEvent.recorded.

Table 5.3: Mapping table of operations on specific FHIR resources to activities of the radiology practice workflow, ordered by occurrence in the simulated model in figure 5.10.

Operation   FHIR Resource      ↦  Activity
create      Appointment           Schedule Appointment
update      Appointment           Patient Admission
create      Procedure             Radiological Examination
create      Media                 Diagnosis
create      DiagnosticReport      Report Writing
update      DiagnosticReport      Report Attestation
execute     *$fhirToCDA           Report Transmission
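The mapping and the Encounter-based trace split can be sketched in a few lines of Python. The AuditEvent dicts below are simplified stand-ins (flat keys instead of full FHIR resources), and the action codes follow the FHIR convention C=create, U=update, E=execute.

```python
# Mapping from (action, manipulated resource) to activity names,
# following table 5.3. The "entity" key is a simplified stand-in for
# the manipulated resource type of the AuditEvent.
ACTIVITY_MAP = {
    ("C", "Appointment"):      "Schedule Appointment",
    ("U", "Appointment"):      "Patient Admission",
    ("C", "Procedure"):        "Radiological Examination",
    ("C", "Media"):            "Diagnosis",
    ("C", "DiagnosticReport"): "Report Writing",
    ("U", "DiagnosticReport"): "Report Attestation",
    ("E", "$fhirToCDA"):       "Report Transmission",
}

def to_traces(audit_events):
    """Group simplified AuditEvent dicts by their Encounter reference and
    map each to an activity name, ordered by recording time."""
    traces = {}
    for a in audit_events:
        activity = ACTIVITY_MAP[(a["action"], a["entity"])]
        traces.setdefault(a["encounter"], []).append((a["recorded"], activity))
    # ISO 8601 timestamps sort correctly as strings
    return {enc: [act for _, act in sorted(evts)] for enc, evts in traces.items()}

example = to_traces([
    {"action": "U", "entity": "Appointment", "encounter": "Encounter/1",
     "recorded": "2020-08-14T08:00:00"},
    {"action": "C", "entity": "DiagnosticReport", "encounter": "Encounter/1",
     "recorded": "2020-08-14T08:42:51"},
])
```

Each value of the returned dictionary corresponds to one trace σ of the event log L described above.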

5.4.3 Analyze

Querying the FHIR server for AuditEvent resources using the $xes operation returns an XES event log. Since the operation already utilizes XES standard extensions (i.e., Concept and Time), the semantics of the fields are clear for process mining tools. The next step is to analyze whether the simulated process matches the one stored and provided by the HL7 FHIR server. Thus, we want to compare the input model with a model generated based on the retrieved XES event log. We use the process mining tool ProM 6.9 [58] with the Visual Inductive Miner plugin [128] to generate a model.
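Any XES-aware tool can consume the returned log because the field semantics are fixed by the Concept and Time extensions. The following sketch parses a minimal, hand-written XES fragment (a hypothetical stand-in for the server response, not the actual listing 5.7) and extracts the activity sequence per trace.

```python
import xml.etree.ElementTree as ET

# Minimal hand-written XES fragment, shaped like the log returned by the
# $xes operation: one trace (named after its Encounter), one event.
XES = """<log>
  <trace>
    <string key="concept:name" value="Encounter/enc-1"/>
    <event>
      <string key="concept:name" value="Report Writing"/>
      <date key="time:timestamp" value="2020-08-14T08:42:51.523+02:00"/>
    </event>
  </trace>
</log>"""

def activities(xes_text):
    """Return, per trace, the ordered list of event activity names."""
    root = ET.fromstring(xes_text)
    result = {}
    for trace in root.findall("trace"):
        name = trace.find("string[@key='concept:name']").get("value")
        result[name] = [
            e.find("string[@key='concept:name']").get("value")
            for e in trace.findall("event")]
    return result
```

This is essentially what a discovery plugin such as the Inductive Miner does first: read traces and their ordered activities before inferring a model.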


5.4.4 Test Results

This section shows three exemplary results of the implementation: (1) a FHIR resource generated by the simulator, (2) the corresponding event in the XES event log, and (3) the process model created based on the event log. To provide a comparison to the first direct mapping approach, again, the Report Writing step is taken as an example.

FHIR Resources

As described in the mapping in table 5.3, the Report Writing activity is associated with creating a DiagnosticReport resource. The simulator thus executes the cURL statement shown in listing 5.5, providing a JavaScript Object Notation (JSON) representation of the DiagnosticReport resource in the body.

Listing 5.5: cURL statement for creating a DiagnosticReport resource.

POST [fhirserver]/DiagnosticReport
{
  "resourceType": "DiagnosticReport",
  "subject": {"reference": "Patient/[patientId]"},
  "encounter": {"reference": "Encounter/[encounterId]"},
  "status": "preliminary",
  "code": {
    "coding": [{
      "system": "http://loinc.org",
      "code": "LP31534-8",
      "display": "Study report"
    }]
  }
}

This triggers the creation of an AuditEvent resource. Listing 5.6 shows the resulting resource in abbreviated form, focusing on the elements relevant for the mapping.

Listing 5.6: JSON representation of the generated AuditEvent resource.

{
  "resourceType": "AuditEvent",
  "extension": [
    {"url": "https://fhirserver.com/extensions/auditevent-encounter",
     "valueReference": {"reference": "Encounter/[encounterId]"}},
    {"url": "https://fhirserver.com/extensions/auditevent-basedon",
     "valueReference": {"reference": "PlanDefinition/rad-wf"}}
  ],
  "action": "C",
  "recorded": "2020-08-14T08:42:51.523+02:00",
  "entity": [{
    "what": {"type": "DiagnosticReport"},
    "detail": [{
      "type": "RequestedURL",
      "valueString": "[fhirserver]/DiagnosticReport/"
    }]
  }]
}


As specified, the created AuditEvent resource refers via extensions to the respective Encounter resource and to the PlanDefinition resource “rad-wf” that defines the radiology workflow. The action field indicates the type of operation (C=Create) and the entity element contains details about the manipulated resource, i.e., the DiagnosticReport. The recorded field contains the timestamp.

XES Log

The query for AuditEvent resources with the $xes operation returns an XES event log. Listing 5.7 shows the event log in abbreviated form, focusing on the part mapped from the AuditEvent resource in listing 5.6.

Listing 5.7: Part of the XES event log as returned by the AuditEvent query invoking the $xes operation.

<log xes.version="1.0">
  <string key="concept:name" value="PlanDefinition/rad-wf"/>
  <trace>
    <string key="concept:name" value="Encounter/[encounterId]"/>
    <event>
      <string key="concept:name" value="Report Writing"/>
      <date key="time:timestamp" value="2020-08-14T08:42:51.523+02:00"/>
    </event>
  </trace>
</log>

The detail of the resulting XES log shown in listing 5.7 contains the concept:name attributes on log and trace level, derived from the referenced PlanDefinition and Encounter resources, respectively. The event (Report Writing) was generated for the AuditEvent resource presented in the previous section, according to the mapping from table 5.3.

Process Model

Figure 5.12 shows the resulting model after importing the XES event log into ProM and analyzing it with the Inductive Visual Miner [128]. It is split into two parts to match the page width and to highlight the similarity to the input model in figure 5.10. All traces were identified based on their Encounter reference and all AuditEvents were correctly mapped according to table 5.3. All 10 recorded executions are visible, with 5 skipping the first (Schedule Appointment) activity. Beyond the 10 executions published in [29] and shown in figure 5.12, we conducted tests with up to 1,000 simulated runs.

5.4.5 Discussion and Issues

The presented work is a proof of concept, making the case for a standards-based process analytics interface and making sure that the standard in development, HL7 FHIR, is aware of the capabilities and requirements of process mining. We were able to show how only minor extensions, namely the addition of Encounter and PlanDefinition references, and a simple mapping, enabled the analysis of the running example radiology practice workflow with process mining tools.

Figure 5.12: Process model generated with the Inductive Visual Miner.

AuditEvent vs. Provenance

In this third approach we analyzed AuditEvent resources, building on existing approaches that aimed to analyze audit data [10, 28, 153]. However, HL7 FHIR also makes use of the concept of provenance, recording “information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness” [154]. A Provenance resource is created by the client (i.e., the person or system conducting the work), as opposed to the AuditEvent resource, which is created automatically by a server. The client should explain for what purpose a resource was edited (created, updated, deleted). In addition, a client can add information about the process (or policy) behind the edit, and provide reasoning why something was done (i.e., which path of a process model was taken).

However, Provenance is (1) not widely used (yet), and (2) does not document non-modifying access to a resource (i.e., read). To summarize, Provenance can provide more detailed information on a process, but relies on the clients to record it and might thus not be present at all. Further research on the utilization of the Provenance resource for process mining is needed.

Considering Different Perspectives

In our example, Aw, the set of all AuditEvent resources recorded during the execution of a radiology workflow (as defined by the referenced PlanDefinition “rad-wf”), was split into traces based on the referenced Encounter resources. In fact, however, Aw represents a multiset of traces that can be split based on the perspective taken on the data. A more generic approach should thus indicate the grouping behaviour in the query, based on the concepts developed in [88].

Another viable perspective would be, for example, to look at the active participants of the workflow. AuditEvent.agent is described as “an actor taking an active role in the event or activity that is logged” [155]. Mapping name and role to the corresponding fields of the XES Organizational extension allows for additional analysis, e.g., social networks or handover of work for medical or care personnel.

To trace specific patients over multiple encounters, e.g., for long-term studies spanning numerous visits, AuditEvent.entity presents a solution and can act as the grouping query parameter. While the presence of identifying patient information is not guaranteed, the resource definition explicitly states: “It is a best practice to include a reference to the Patient/Subject affected by any auditable event, in order to enable Privacy Accounting of Disclosures and Access Logs, and to enable privacy office and security office audit log analysis.” [155].
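Switching the trace notion to the patient referenced in AuditEvent.entity can be sketched as follows. The flat event dicts are, again, simplified stand-ins for real AuditEvent resources, and the `Patient/` prefix check is an assumed convention for spotting the patient reference among the entities.

```python
from collections import defaultdict

def group_by_patient(audit_events):
    """Alternative perspective: one trace per patient, spanning encounters.
    Events are simplified dicts; the patient reference is assumed to be
    present in the entity list, per the FHIR best practice quoted above."""
    traces = defaultdict(list)
    for a in audit_events:
        patient = next((e["what"] for e in a["entity"]
                        if e["what"].startswith("Patient/")), None)
        if patient:  # skip events without an identifiable patient
            traces[patient].append(a["activity"])
    return dict(traces)

example = group_by_patient([
    {"entity": [{"what": "Patient/p1"}], "activity": "Schedule Appointment"},
    {"entity": [{"what": "DiagnosticReport/d9"}], "activity": "Report Writing"},
    {"entity": [{"what": "Patient/p1"}], "activity": "Patient Admission"},
])
```

Events without a patient reference silently drop out of such a log, which is exactly why the quoted best practice matters for this perspective.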

5.5 Discussion

This chapter showed how audit trails, generated by means of IHE ATNA, can be utilized for process mining. Three approaches that partly build on each other are presented. (1) The first approach transformed the audit messages directly to XES events and had a hard-wired mapping based on patient identifiers. (2) The second approach enabled the integration of audit messages into the OpenSLEX meta model. It provided a basis for a process-aware ARR. (3) The third approach described an interface to query for event logs in a standardized way.

While in hindsight the first approach seems naive, it provided the basis for the following ones. The process-aware ARR allows obtaining a global, multi-perspective view of the data. The standardized FHIR-based interface makes the process data available to a broader audience and could also be used for open health data platforms.

5.5.1 Data Quality in Audit Logs

In 2013, Cruz-Correia et al. [153] were the first to explicitly make the connection between standardized auditing in healthcare and process mining. They also looked at the IHE integration profile ATNA and analyzed audit trails from four different hospitals in Portugal. They identified several data quality issues and found that “there is no Portuguese law or regulation enforcing health care institutions to have complete and secure audit trails” [153].

“Although there is some awareness for the need to have quality in audit trails, the existing AT are very poor and not able to provide proper traceability or help analyse Health Information Systems usage. Regarding the AT analysed, the lack of internal structure, data quality and precision limits the usefulness for legal issues and health information systems improvement.” Cruz-Correia et al., Analysis of the quality of hospital information systems audit trails [153]


The main reason for the bad quality of the recorded audit trails was the fact that the recording information systems did not follow a standard. The authors checked multiple audit trails for different data quality dimensions and came to the conclusion that IHE ATNA and RFC-3881 would provide good starting points for improving the general situation.

In this thesis, all approaches were developed and tested in a lab environment and in the context of the radiology practice workflow in the running example. The scope of the test cases was set to check the general applicability of ATNA logs for process mining. Of course, a higher effort regarding preprocessing, mapping, and analysis will be required to mine logs recorded by real-life information systems.

Based on the event log maturity level classification system described in section 2.1.3, we classify ATNA audit trails, as described in the standard [87] and [88], as level 3 because:

• recorded events do match reality (trustworthy),

• there is an automatic recording mechanism (complete),

• privacy and security are highly valued and the very reason for the recordings (safe),

• and there is no explicit notion of process instance and activities are coded, but with a very different use case in mind (semantics).

It does not qualify for level 4 because of the lack of explicit notions of process instances and activities. Still, the recorded event log exceeds the criteria for level 2, as completeness is guaranteed and it is not possible to bypass the information system. Additionally, the semantics of the recorded audit messages are well-defined, as all fields in the messages have to be filled according to vocabularies defined in the IHE, RFC-3881, and DICOM standards [87].

Mans et al. [15] point out a major issue regarding data quality in HIS: the low granularity of timestamps (in some systems, only the day of the event is recorded) leads to problems identifying the correct order of events. By using IHE ATNA audit messages we can guarantee timestamps with very high granularity. This is ensured because the systems in an IHE compliant environment also implement the Consistent Time integration profile [156], which defines mechanisms to synchronize the time base between multiple actors with a median error of less than 1 second.
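The ordering problem caused by coarse timestamps can be made concrete with two invented events on the same day: with day granularity their order is ambiguous, while with second granularity (as guaranteed by Consistent Time) it is not.

```python
from datetime import datetime

# Two hypothetical events recorded on the same day.
events = [
    ("Report Writing",    datetime(2020, 8, 14, 8, 42, 51)),
    ("Patient Admission", datetime(2020, 8, 14, 8, 0, 0)),
]

# Day-granularity keys collapse to the same value: order is ambiguous.
day_keys = [ts.date() for _, ts in events]
ambiguous_at_day_granularity = day_keys[0] == day_keys[1]

# Second-granularity keys recover the true order.
ordered = [name for name, ts in sorted(events, key=lambda e: e[1])]
```

With only the date available, a mining algorithm could just as well conclude that the report was written before the patient was admitted.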


“Those with bad luck should at least attempt to balance it with good sense.” Joe Abercrombie

CHAPTER 6 Compliance Checking


As discussed in Section 1.3.1, long-running processes pose special challenges for process conformance checking. In this chapter a case study on measuring guideline compliance for melanoma surveillance based on conformance checking is presented. With the surveillance period after the excision of malignant melanoma being 10 years, this qualifies as a long-running process. This chapter is mostly based on the author’s joint work with Rinner et al. in [31] and [30].

• In Section 6.1 the motivation and background for the case study on melanoma surveillance are described.

• Section 6.2 utilizes the reporting template for case studies presented in section 4.5 to describe the characteristics of the case study at hand.

• Section 6.3 provides conceptual extensions towards data preparation for imprecise log data. A method for data preparation using a specific naming convention to model the time aspects is described, and the chosen conformance checking technique is introduced.

• Section 6.4 shows the results of the case study with follow-up guidelines for melanoma patients and anonymous patient data from the Department of Dermatology at the Medical University Vienna (DDMUV).

• In section 6.5 the results of the case study and possible implications are discussed.

6.1 Melanoma Surveillance

Skin malignancies are recognized as a major and global health problem. Accounting for about 5% of all skin cancer cases, melanoma is the most dangerous form of skin malignancy and causes about 90% of skin cancer mortalities [157]. Incidence rates in many European countries currently range between 12 and 15 cases per 100,000 inhabitants. The increasing rates are levelling off in some countries; in contrast, for distinct subpopulations such as elderly men the rates are still increasing [158].

Early detection of melanoma is of utmost importance and leads to a favourable prognosis. Since melanoma may recur years after the excision of the primary tumor, patients with melanoma are monitored closely, usually following a predefined protocol, to allow timely detection of recurrent disease [159].

Traditional studies aiming to improve the surveillance of patients with melanoma depended on manual data acquisition. The goal of this work is to show how existing data from routine care can be combined from different sources and reused for process mining to automatically detect processes and compare them to medical guidelines using conformance checking. Previous studies have indicated the potential of process mining in this area [160, 161]. In these studies, however, only a maximum of 10 process instances were analyzed.


Moreover, several data challenges were pointed out, specifically that the time granularity of the logged data was too coarse [160]. Hence, this case study addresses the following research questions:

1. How can the existing clinical data be reused for the application of process mining?

2. How can data of recurring events with time constraints that span a long period of time be prepared to apply process mining?

3. How can we apply process mining to check guideline compliance?

4. What can we learn from process mining in the context of surveillance of melanoma patients?

6.2 Characteristics of the Case Study

Following the reporting template presented in section 4.5 and the terminology introduced by Rojas et al. [16], this section provides an overview of basic characteristics of the data, clinical aspects, and aspects of the process mining techniques.

Basic characteristics of the data

The data comes from a clinical support system for oncology in Austria. The observation period of this study covers 7 years (January 2010 to June 2017), with a total of 1,023 cases, i.e., patients. More details on the base data are listed in section 6.4.

Clinical aspects

This case study analyzes organizational processes in the context of follow-up visits of patients diagnosed with malignant melanoma (ICD-10: C43.-). The follow-up visits typically have an outpatient character (SCTID: 440655000). We categorize it under the Medical specialty (SCTID: 394733009), since it belongs to the sub-specialty Dermatology (SCTID: 394582007).

Process mining aspects

The study poses a specific question (i.e., guideline compliance) and utilizes the conformance perspective. This case study presents a semi-automated implementation strategy, providing a novel data preparation approach facilitating the use of the tool ProM. The analysis follows the basic approach, using the ProM plugin Multi-perspective Process Explorer (MPE) [162].


6.3 Methodology

The process of surveillance of melanoma patients starts with the detection and the excision of the primary tumor (i.e., the baseline visit). Melanoma patients are staged according to the American Joint Committee on Cancer (AJCC) staging system (i.e., stage I to IV). After excision of the primary tumor, patients start a 10-year surveillance period. Depending on the AJCC stage, follow-up visits have different surveillance intervals and include different types of examinations (e.g., clinical examination, analyzing tumor markers, lymph node sonography, computed tomography of the abdomen, PET-CT). In AJCC stage I, for example, the interval of the follow-up visits is 6 months in the first 5 years and one year between the 5th and the 10th year. In the other AJCC stages, intervals of 3 months in the first five years and 6 months between the 5th and the 10th year are scheduled. The higher the AJCC stage, the more often examinations are performed as part of a follow-up program.

During surveillance, the AJCC stages are re-evaluated and patients can be assigned a higher AJCC stage and start the corresponding follow-up surveillance from the beginning. In this work we refer to this upgrading as a stage change. Since the observation period of this study only covers 7 years (January 2010 to June 2017), patients are still compliant with the guideline if they have not missed the next-to-last or last follow-up visit before the end of the study (i.e., June 2017). Patients are considered lost to follow-up when the surveillance is terminated prematurely at the DDMUV (e.g., the patient changed clinic).

The events occurring during melanoma surveillance are depicted in Figure 6.1. The start event of the surveillance is the excision of the melanoma, followed by the AJCC stage classification and the follow-up visits. Depending on the AJCC stage, the number of follow-up visits can vary and the AJCC stage can be re-assessed after each follow-up visit. A patient can be lost to follow-up or complete the surveillance successfully.

Figure 6.1: BPMN representation of the process of melanoma surveillance.
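The stage-dependent visit schedules described above can be sketched as a small helper that lists the scheduled visit months. This is a simplification for illustration: it ignores stage changes and the exact examination sets, and the handling of the 5-year boundary month is an assumption.

```python
def followup_months(stage: str) -> list:
    """Months after excision at which follow-up visits are scheduled,
    per the intervals described above: stage I every 6 months in the
    first 5 years and then yearly; stages II-IV every 3 months in the
    first 5 years and then every 6 months (sketch, years 1-10 only)."""
    first = 6 if stage == "I" else 3    # interval in months, years 1-5
    later = 12 if stage == "I" else 6   # interval in months, years 5-10
    return list(range(first, 61, first)) + list(range(60 + later, 121, later))
```

For stage I this yields 15 scheduled visits over 10 years, while the other stages yield 30, which already hints at why guideline compliance is hard to assess by hand.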

6.3.1 Data Preparation

We took the clinical data needed to perform the process mining from a local melanoma registry stored in the Research, Documentation, and Analysis (RDA) Platform of the Medical University of Vienna [163]. Additional information about the transfer of patients between different departments within the hospital, laboratory results, as well as treatment information is obtained from the local HIS. Since the melanoma registry is maintained manually, the information from the HIS is used to detect additional follow-up visits, re-using information from routine care. The event logs used in this study are created using the Java programming language, a JDBC driver to access the data in the Oracle database, and the OpenXES library to create event logs in the MXML format. Conformance checking was performed in the ProM framework. The study was approved by the ethics committee of the Medical University of Vienna (EK Nr.: 1297/2014).

6.3.2 Time Boxing

According to the guideline [157], depending on the AJCC stage of the patient, the follow-up treatment takes place at certain time intervals (i.e., every three, six or twelve months) in a repeated fashion for ten years. Figure 6.2 shows a Petri net model where for each AJCC stage (i.e., I, II, III, and IV) the follow-up visits after three, six or twelve months (i.e., 2Q means 2nd quarter, 3Q 3rd quarter, and 4Q 4th quarter) for ten years are shown. For example, I_F_01_1Q corresponds to an AJCC stage I follow-up visit in the first quarter of the first year after the excision.

Coding scheme for follow-up events: [AJCC stage]_F_[year]_[quarter]

After each follow-up visit a patient can (a) proceed to any later follow-up visit, (b) have a stage change, (c) be lost to follow-up (LTFU), or (d) complete the surveillance (i.e., IN_FUP, meaning still in follow-up).

The existing process execution logs record the same event for each occurrence (e.g., follow-up visit for each follow-up visit), so the process mining algorithms are not able to distinguish between these events depending on the fixed time period specified in the guideline (e.g., second follow-up visit after one year). Using the simplified process model depicted in Figure 6.1, it is not possible to distinguish the different follow-up visits automatically, and as a consequence conformance to the guideline cannot be checked using current process mining algorithms.

To overcome this problem we propose a naming convention based on time boxing for recurring events commonly described in medical guidelines. During the time boxing, each activity (e.g., each follow-up visit) is allocated (i.e., aligned) to the predefined fixed time period it matches, called a time box. Each time box corresponds to an event in the medical guideline and the events in each time box are named according to the name of the time box. The event log follows the same naming convention as the process model. To generate the event log, all follow-up visits are assigned to the corresponding (i.e., temporally closest) time boxes. All follow-up visits in one time box are merged and represented as one. In order to analyze over-compliance, multiple events could be assigned to the same time box (without merging) and the resulting self-loops considered during the analysis.
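The naming convention can be sketched as a small labelling function. Note the simplification: the sketch buckets visits by calendar quarter since excision, whereas the actual preparation assigns each visit to the temporally closest guideline time box.

```python
from datetime import date

def timebox_label(stage: str, excision: date, visit: date) -> str:
    """Assign a follow-up visit to a time box and emit its name according
    to the convention [AJCC stage]_F_[year]_[quarter]. Simplified:
    plain quarter bucketing instead of nearest-time-box assignment."""
    months = (visit.year - excision.year) * 12 + (visit.month - excision.month)
    year = months // 12 + 1            # 1-based year since excision
    quarter = (months % 12) // 3 + 1   # 1-based quarter within that year
    return f"{stage}_F_{year:02d}_{quarter}Q"
```

Applying this to every visit in a patient's record yields events such as I_F_01_1Q that a conformance checker can match one-to-one against the transitions of the Petri net in figure 6.2.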


Figure 6.2: A simplified Petri net model with applied time boxing corresponding to the guideline used at the DDMUV [157].

6.3.3 Conformance Checking

The conformance checking was done using the process mining framework ProM in version 6.6 and the respective plug-in MPE [162]. It allows for fitness and precision calculation and provides different views on the data, including (1) a model view, depicting the base model Petri net, (2) a trace view, making it possible to investigate individual traces, and (3) a chart view, showing the distribution of attribute values in the log for certain parts of the model. The MPE is an advanced tool that integrates state-of-the-art algorithms described in [57] (fitness) and [164] (precision), which are also able to integrate different perspectives, that is, data, resource, and time.

The configuration for penalties on log and event moves was adapted to the specific use case. A valid configuration for the trace alignment parameters (penalties for moves on the log/model) had to be identified. Due to the pre-processing, there are no wrong events (events present in the log but not in the model) save for the LTFU (lost to follow-up) event, so the alignment algorithm must always identify the missing events (events in the model that are missing in the log).
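To illustrate what "moves on model" means in this setting, the following toy stand-in aligns a trace against a strictly sequential model and counts the model events absent from the log. This is not the MPE alignment algorithm (which searches for an optimal alignment under configurable penalties on a full Petri net); it only conveys the intuition for the special case described above, where extra log events such as LTFU are the only mismatch on the log side.

```python
def missing_events(model_sequence, trace):
    """Toy stand-in for alignment on a strictly sequential model: return
    the model events that do not appear, in order, in the trace (the
    'moves on model'). Extra log events such as LTFU are scanned over."""
    i, missing = 0, []
    for expected in model_sequence:
        j = i
        while j < len(trace) and trace[j] != expected:
            j += 1               # scan forward over extra log events
        if j < len(trace):
            i = j + 1            # synchronous move: log matches model
        else:
            missing.append(expected)  # move on model: event missing in log
    return missing

example = missing_events(
    ["I_F_01_2Q", "I_F_01_4Q", "I_F_02_2Q", "I_F_02_4Q"],
    ["I_F_01_2Q", "I_F_02_2Q", "LTFU"])
# Naive fitness: fraction of model events found in the log.
fitness = 1 - len(example) / 4
```

Here the patient skipped two of the four scheduled visits, giving a naive fitness of 0.5; the actual alignment-based fitness in the MPE additionally weighs moves according to the configured penalties.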

6.4 Results of the Case Study

The DDMUV is a tertiary referral centre that offers a long-term surveillance program for melanoma patients based on the European guideline on melanoma treatment [157]. An example of the follow-up sub-process in the European guideline on melanoma treatment modelled in BPMN can be found in [160]. The melanoma registry at the DDMUV contains data of baseline and follow-up visits of melanoma patients. Excisions have been documented since the early 1990s; a continuous documentation of the follow-up visits started in 2010. In 2017, the melanoma registry covered about 2,200 patients. In this study we included all 1,023 patients (43% females, mean age 59 ± 17.5 years) with a baseline visit (i.e., excision) after January 2010 and at least one follow-up visit, since patients without a single follow-up visit only had the excision at the DDMUV and no further data is available in the melanoma registry. Besides the demographic data, different characteristics of the identified melanoma are documented. For the baseline visit this includes, among others, (1) melanoma subtype (superficial spreading melanoma, nodular melanoma, lentigo maligna melanoma, acral lentiginous melanoma, and others), (2) anatomic site (e.g., abdomen, hand, foot, head), (3) depth of invasion, (4) date of surgery for the primary tumor, and (5) staging information. More than one primary tumor can be documented. Only the melanoma staging is used for conformance checking.

6.4.1 Data Preparation

We extracted five different event logs from our real-world data, one including all patients (I-IV) and one for each AJCC stage separately (I, II, III, IV), based on the highest AJCC stage reached by the patient. If a patient initially started with AJCC stage I and then moved to AJCC stage II, the patient is represented in the AJCC II log file. Table 6.1 lists the number of patients in each log.
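The log splitting by highest AJCC stage can be sketched as follows. The trace encoding (`StageChange:<stage>` labels inside a case-id-to-trace mapping) is an illustrative assumption, not the actual log format used in the study:

```python
STAGE_ORDER = {"I": 1, "II": 2, "III": 3, "IV": 4}

def highest_stage(trace):
    """Return the highest AJCC stage reached in a trace, read from
    StageChange:<stage> event labels (assumed encoding)."""
    stages = [ev.split(":")[1] for ev in trace if ev.startswith("StageChange:")]
    return max(stages, key=STAGE_ORDER.get)

def split_logs(log):
    """Build one sub-log per highest AJCC stage, plus the complete log."""
    sublogs = {"I-IV": dict(log)}
    for case, trace in log.items():
        sublogs.setdefault(highest_stage(trace), {})[case] = trace
    return sublogs
```

A patient who starts in stage I and is later re-staged to II thus ends up in the AJCC II sub-log, matching the rule described above.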

Table 6.1: Number of patients and mean number of events per case in the base data.

              No of patients              Mean no of events per case
          I-IV    I   II  III   IV      I-IV     I    II   III    IV
F  LTFU    286  146   95   20   25      8.23  6.78   9.2  10.6  11.2
   IN_FUP  153   45   50   17   41      12.8  10.1  14.9  13.4  13
   Total   439  191  145   37   66      9.82  7.57  11.2  11.9  12.3
M  LTFU    379  167  124   42   46      8.67  7.09  8.90  10.7  11.9
   IN_FUP  205   43   70   33   59      13.3  10.7  13.9  15.1  13.4
   Total   584  210  194   75  105      10.3  7.84  10.7  12.6  12.7
P  LTFU    665  313  219   62   71      8.48  6.95  9.03  10.7  11.6
   IN_FUP  358   88  120   50  100      13.1  10.4  14.3  14.5  13.2
   Total 1,023  401  339  112  171      10.1  7.71  10.9  12.4  12.6

The number of patients per AJCC stage decreases with higher AJCC stage, which corresponds to the fact that most melanomas in Austria are diagnosed in early stages [165]. Most patients (n=401) were in AJCC stage I. This group also had the highest number of patients lost to follow-up (n=313, 78%). The ratio of patients IN_FUP (i.e., in follow-up) was highest in AJCC stage IV with 58% (n=100). There was no difference in the proportion of individuals lost to follow-up (LTFU) between men and women. Men were generally older than women, and there was no significant difference between the LTFU and IN_FUP groups with respect to age. Patients in lower AJCC stages were generally younger (I: mean age 57 ± 17 years; II: mean age 59 ± 18 years; III: mean age 60 ± 18 years; IV: mean age 63 ± 16 years).

6.4.2 Conformance Checking

To check the conformance of our guideline models in regard to the recorded event logs, we replayed the logs on the models using the MPE. For the alignment, the default costs of the MPE for missing events in the log (value: 2) and missing activities in the model (value: 3) lead to undesired behaviour: the alignment algorithm identifies follow-up visits after a long period of time as wrong events. When the penalty for a sequence of missing events exceeds the penalty for a wrong event, the alignment algorithm will declare the current event wrong. In order to ensure a correct alignment, the maximum number of skipped follow-up visits in all traces was identified and the penalties adapted accordingly. Since the maximum number of consecutive skipped events for one trace is 19 in our data, we chose a penalty of 1 for missing events and 20 for wrong events. For the LTFU event we reduced the wrong event penalty to 0, thus only penalizing the missing IN_FUP event at the end and not overvaluing the outcome indicator in the fitness calculation. The results in the form of fitness and precision indicators can be seen in Table 6.2.

Table 6.2: Average fitness and precision for each log.

AJCC stage   No of patients   Avg. fitness % (min - max)   Avg. precision % (#observed / #possible behaviour)
I                 401         98.6% (91% - 100%)           75.1%
II                339         98.0% (82% - 100%)           71.4%
III               112         98.2% (85% - 100%)           65.0%
IV                171         98.7% (88% - 100%)           63.1%
I-IV            1,023         98.4% (53% - 100%)           87.0%
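The rationale for the penalty choice described above (skipping must always stay cheaper than declaring an event wrong) can be sketched in Python. The function names are hypothetical, and the sketch assumes traces are gappy subsequences of a linear model, as guaranteed by the time boxing; it is not the MPE's actual cost mechanism:

```python
def max_consecutive_skips(model_seq, trace):
    """Count the longest run of model activities missing from a trace,
    assuming the trace is a (possibly gappy) subsequence of the model."""
    longest = run = 0
    it = iter(trace)
    nxt = next(it, None)
    for act in model_seq:
        if act == nxt:
            nxt = next(it, None)
            run = 0
        else:
            run += 1
            longest = max(longest, run)
    return longest

def alignment_costs(logs, model_seq):
    """Choose a wrong-event penalty strictly above the worst-case cost of
    skipping, so the aligner never prefers declaring events wrong."""
    worst = max(max_consecutive_skips(model_seq, trace) for trace in logs)
    return {"missing_event": 1, "wrong_event": worst + 1}
```

With the case study's worst case of 19 consecutive skipped events, this rule yields exactly the penalties 1 and 20 chosen above.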

Our measurements show that the guideline models have an overall comparable and good fitness value, that is, the models generally explain the behaviour seen in the logs. This originates from three facts: (1) the renaming and clustering of activities was done based on the terminology that was also used for the guideline model, (2) the time boxing method presented in section 6.3.2 leads to an ordered sequence of events, where loops and duplicates cannot occur, and (3) the only wrong events (i.e., events present only in the log, not in the model) are the LTFU events. The precision of the model for stage I is 75.1% and declines to 63.1% for stage IV. A low ratio between observed and possible behaviour indicates under-fitting. The explanation for the generally lower precision values is that the guideline models cover the whole time period of ten years of follow-up visits, while the event logs only cover a maximum of seven and a half years. Thus, modelled events like I_F_08_1Q (i.e., stage I, eighth year, first quarter) will never be reached during replay, leading to a lower precision. The explanation for the declining values of precision is that the guideline models for

higher stages also allow for all the lower stages' events, since a patient can start in stage I and be re-evaluated to stages II, III or IV during their follow-up visits. The amount of possible behaviour is thus higher, while the number of actual patients in these stages is similar (II) or significantly lower (III and IV) than in stage I.

Figure 6.3 shows the most frequent trace recorded in the complete log. 148 of the 1,023 patients follow this trace, where they (1) start with the excision (Start), (2) are staged in AJCC I (StageChange), (3) go to their first follow-up (I_F_00_3Q), and (4) are afterwards lost to follow-up (wrong event LTFU). The following missing event (IN_FUP) is in the guideline model but was not present for those traces in the log. Finally, the End event concludes the trace.

Figure 6.3: The most frequent trace in the complete log (I-IV). [fitness 98.8%]

Figure 6.4 shows a trace where the patients skipped the second, third and fourth follow-up visits before dropping out of the monitoring entirely. With our parameters for the alignment, these traces have a fitness of 96.19%.

Figure 6.4: A trace of a patient that skipped three follow-up visits. [fitness 96.19%]

Figure 6.5 shows a patient that started in stage I and was re-evaluated to stage II and later to stages III and IV. All in all, just one follow-up visit (during stage II) was missed, and the fitness is very high. The trace spans the whole observation period, with the start in 2010 and the last follow-up in late 2017. Thus, the patient was identified as in follow-up (IN_FUP).

Figure 6.5: A trace comprising all four stages and only one missing follow-up visit. [fitness 99.8%]

In Figure 6.6 the patient classified in stage II skipped multiple follow-up visits and left the monitoring entirely after four years. The low fitness value correlates with the low guideline compliance.


Figure 6.6: A trace with multiple skipped events and thus relatively low fitness. [fitness 89.6%]

6.4.3 Applied Process Discovery

In addition to conformance checking, we applied several techniques and tools to the data at hand in an explorative manner, to find interesting trends in the data and to validate our results with domain experts. Figure 6.7 shows the dotted chart analysis [166] of the stage I event log. The x-axis shows the timestamps of the events. The y-axis lists all cases sorted by the timestamps of their start event, in descending order. The dots represent the events, color-coded by event label according to the legend on the left.

Figure 6.7: Dotted chart analysis of the stage I event log.

The recording period of 7.5 years, from January 2010 to June 2017, can be seen on the x-axis. Observations made based on this dotted chart:

1. A clear ’rainbow’ pattern is visible since the sequence of events recorded for stage I is linear and follow-up visits occur at generally regular intervals.

2. The outcome indicator (IN_FUP/LTFU event) is not visible since the End event occurs at the same timestamp +20 hours and the latter dot overlaps the former.


3. The most frequent trace (i.e. ending the follow-up after the first visit in stage I - see Figure 6.3) can be observed due to the high number of End events that occur shortly after the Start and I_F_00_3Q events.
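The coordinates underlying such a dotted chart can be computed with a short helper; the actual rendering (e.g., with a plotting library) is omitted here, and the log format (case id mapped to a list of timestamped, labelled events) is an illustrative assumption:

```python
def dotted_chart_points(log):
    """Compute (x = timestamp, y = case row, label) points for a dotted
    chart. `log` maps case id -> list of (timestamp, label); cases are
    sorted by the timestamp of their first event, descending."""
    order = sorted(log, key=lambda c: min(ts for ts, _ in log[c]), reverse=True)
    points = []
    for row, case in enumerate(order):
        for ts, label in log[case]:
            points.append((ts, row, label))
    return order, points
```

Feeding the resulting points to any scatter plot reproduces the layout of Figure 6.7: one row per case, one color per event label.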

For stages II-IV the number of distinct event types increases and additional stage changes speckle the diagram, so the rainbow pattern becomes less noticeable. Figure 6.8 shows a part of the model depicting the flow of patients in stage I. The tool Disco was set to show all activities (100%) and the most frequent pathways (30%). The color of the events (dark - more, light - fewer) and the thickness of the pathways (thicker - more) represent the frequencies in the log. The model also includes the lost to follow-up step that marks an early dropout. Observations that can be made on this model:

1. All 401 patients in stage I start with the I_F_00_3Q event. 148 patients drop out (i.e., LTFU) after this step. This corresponds to the 148 patients of the most frequent trace in figure 6.3.

2. The frequency of the following events declines steadily. There are fewer patients in later steps than at the beginning. There are two reasons for that: (1) not all patients started at the same time, thus not all patients can reach the seventh year's follow-up visits within the recorded time interval; (2) after each step some of the patients drop out.

3. The sequence of follow-up events is not linear but makes a “braided” impression due to the skipping of single follow-up visits (see also Figure 6.6).

6.5 Discussion

6.5.1 Reuse of Clinical Data for Process Mining

We reused existing patient data available from a local electronic health record system in the context of melanoma surveillance. In combination with the local melanoma registry, additional follow-up visits and laboratory data were identified and added to the event log. Creating the log file using a procedural programming approach allowed us to add pre-processing steps. For example, during the creation of the log file we tagged patients that successfully terminate the process, i.e., patients that are still in the surveillance program (IN_FUP), based on how long they did not show up before the end of the recording period, i.e., the number of time boxes missed. Besides the melanoma registry, more than 70 other registries are documented in the RDA platform. In recent work, a mapping of the melanoma registry data from the RDA data model [167] to the i2b2 star schema [168] and the OMOP common data model [169] was performed. By adapting our approach to these two widely used data models, a greater variety of data could be made available to process mining and conformance checking in particular.
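The outcome tagging mentioned above can be sketched as a pre-processing step. The threshold of trailing missed time boxes that still counts as "in follow-up" is an illustrative assumption, not the rule used in the study:

```python
def tag_outcome(attended_boxes, scheduled_boxes, max_trailing_missed=1):
    """Tag a patient IN_FUP if at most `max_trailing_missed` of the last
    scheduled time boxes before the end of recording were missed,
    otherwise LTFU. The threshold value is an illustrative assumption."""
    missed_at_end = 0
    for box in reversed(scheduled_boxes):
        if box in attended_boxes:
            break
        missed_at_end += 1
    return "IN_FUP" if missed_at_end <= max_trailing_missed else "LTFU"
```

A patient who attended the last scheduled visit is tagged IN_FUP; one who missed the last three is tagged LTFU.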


Figure 6.8: A model of the stage I log created using the process mining tool Disco.


6.5.2 Events with Time Constraints Spanning a Long Period of Time

To perform conformance checking, events in event logs are compared to activities in a process model. Our process model, derived from the melanoma guidelines, covers a time period of 10 years, while the event log only covers 7.5 years (the period covered by the melanoma registry). Such long-running processes are common in the medical domain and have to be considered when creating the process model. Our approach of time boxing, by coding AJCC stage information and the time dimension into the event names, enabled us to use conformance checking based on imperative Petri net models. We were able to check if patients' follow-up visits conformed to the time frames given by the medical guidelines. Our approach was solely used to check whether a patient attended a specific follow-up visit. Generally, this way of pushing additional information into the model can be widely used to answer diverse questions; the structure of the generated log files has to be adapted to the task at hand. The labelling convention using time information and the pushing of additional data into the event names lead to bloated models and add complexity. In order to prevent uncontrolled growth of the number of events, a coarser granularity of time information was used. Depending on the AJCC stage the granularity is "three months", "six months" or "one year". The model with time boxes only consists of linear paths. To ensure a consistent order of events on the same day (e.g., a stage change detected on the same day as the follow-up visit), we applied activity sequencing by assigning a hard-coded time of day to each type of event (stage change always at 8 a.m., follow-up visit at 10 a.m.). By only coding the lost to follow-up into the event logs and not into the model, we were able to easily penalize this event in the MPE's trace alignment, since LTFU is the only "wrong event".
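The activity sequencing via hard-coded times of day can be sketched as follows; the 8 a.m. / 10 a.m. values come from the text above, while the event tuple format is an illustrative assumption:

```python
from datetime import datetime, time

# Hard-coded times of day ensure a deterministic order for events
# recorded on the same calendar day: stage changes before follow-ups.
TIME_OF_DAY = {"StageChange": time(8, 0), "FollowUp": time(10, 0)}

def sequence_events(events):
    """Rewrite date-only events as datetimes with a fixed time of day per
    event type, so same-day events sort in a consistent order.
    `events` is a list of (date, kind, label) tuples (assumed format)."""
    stamped = [
        (datetime.combine(day, TIME_OF_DAY[kind]), kind, label)
        for day, kind, label in events
    ]
    return sorted(stamped)
```

Even if the registry stores both records with the same date, the stage change now deterministically precedes the follow-up visit in the event log.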

6.5.3 Medical Implications

In the work of Kittler et al. [170], the prognosis among patients with thin melanomas depending on the surveillance compliance was analyzed. Patients were considered to be compliant with the follow-up regimen if they had at least one annual follow-up examination and non-compliant if they had follow-up intervals of more than one year. They showed that patients who were compliant before the onset of recurrence had a significantly better prognosis than non-compliant patients. When using our calculated fitness instead of the fixed time intervals of Kittler et al. [170] to evaluate the survival, the same effect can be observed in our data, as seen in Figure 6.9. We split all 246 patients that stayed in follow-up for more than two years into three equal-sized groups based on their fitness value and used a Kaplan–Meier estimator for survival analysis. We used the tool R (v3.4.4) and the Kaplan–Meier estimator (packages survival v2.41, Hmisc v4.1, survminer v0.4). The survival probability of patients with high guideline compliance after five years is about 5% higher compared to the least compliant group.
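The analysis above was done in R; for intuition, the product-limit (Kaplan–Meier) estimator it relies on can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the study:

```python
def kaplan_meier(times, events):
    """Product-limit estimator. `times` are follow-up durations and
    `events` flags death (1) vs. censoring (0). Returns the survival
    curve as a list of (time, S(t)) points at each death time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, survival))
        at_risk -= sum(1 for ti in times if ti == t)  # deaths + censored leave
    return curve
```

Running one such estimate per fitness group and comparing the curves corresponds to the grouped analysis shown in Figure 6.9.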


Figure 6.9: Survival analysis for all 246 patients that stayed in follow-up for more than two years, sampled into three equal-sized groups depending on their fitness.

However, adding the patients that stayed for less than two years to the estimator, i.e., looking at all 358 patients in follow-up, showed a reversed effect (see Figure 6.10). The main reason was that a higher fitness is easier to achieve with a shorter stay, and many patients with a short stay died early, e.g., after being staged IV and attending the first follow-up visit.

Figure 6.10: Survival analysis for all 358 patients in follow-up.

In interviews with experts from the DDMUV we identified other possible explanations: (1) patients with a more severe progression tend to follow the guideline more strictly, (2) physicians demand higher compliance when the prognosis is worse, (3) patients with lower fitness have follow-up visits less frequently, hence the bias in death reporting could be

lower, and (4) patients that died during follow-up are not lost to follow-up and hence lead to a higher fitness value. This trend is independent of the AJCC stage.

6.5.4 Guideline Compliance Measurement

In our approach we pushed the time dimension into the process structure to be able to use the conformance checking capabilities of ProM on imperative models. However, there are two viable alternative approaches: (1) Using a data-aware alignment algorithm would allow keeping the time dimension hidden in the data, thus labelling all follow-up visits simply follow-up and avoiding the initially confusing time boxing notation (e.g., I_F_00_3Q). However, we decided to use our labelling approach to also make all missing steps easily visible in the model. (2) The current version of the ProM framework also includes a declarative mining module that derives sets of constraints in the form of a declarative model from log files and also offers conformance checking [171, 172]. This needs further investigation, especially regarding the preparation of a correct declarative constraint set based on the guideline as well as an adapted real log to be replayed. The compliance measurement, calculated and formalized using the fitness dimension in the MPE, is promising. Yet it has to be further analyzed under which circumstances it correlates with the outcome of the patients. Further, we plan to analyze how the compliance affects the tumor progression of the patients, i.e., whether patients with a higher compliance are less likely to progress to a higher AJCC stage.


“Whatever it is you’re seeking won’t come in the form you’re expecting.” Haruki Murakami

CHAPTER 7 Conclusions and Outlook

This chapter concludes the thesis. The overall objective was to advance the topic of process mining in healthcare. This was achieved by providing a better understanding of the base data and by developing new methods for accessing and analyzing the data.

• In Section 7.1 the research questions from section 1.4 are revisited. Based on these questions it is described how this thesis advanced the topic.

• Section 7.2 shows the impact of this thesis, points out potential future research directions in the field, and discusses the next steps to further advance the topic.

• The last section, 7.3, reflects on the field of process mining in healthcare in general and on this thesis in particular. Limitations of the presented work are discussed and the genesis of the thesis is briefly described.

7.1 Research Questions Revisited

This section lists the research questions from section 1.4, provides a short answer and refers to the respective chapters and publications.

RQ1 How can healthcare IT standards be used to overcome the challenges of process mining in healthcare? To answer this question, we raise two sub-questions that address the current state of standardization and the state of the art in process mining in healthcare.


a) Which standards are relevant for process mining in healthcare? To answer this question, chapter 3 provides an introduction and overview of the field of healthcare IT standards. The chapter later focuses on two specific IHE integration profiles that are relevant for the other parts of this thesis. The author's work in [24] and [25] was the basis for this chapter.

b) How do existing studies on process mining in healthcare utilize standards? A literature review on recent case studies was conducted [26] to answer this question in chapter 4. It showed that most case studies do not properly report the details of their data origin and data characteristics, including their use of healthcare data standards. Thus, the review was later extended to include a reporting template for future case studies in [18] that also takes standards into account.

While the answer to the first sub-question highlights the potential of utilizing standards, the second answer shows the gap that still exists in recent approaches. This leads to the second research question.

RQ2 How can we reuse the data captured in the audit trails of healthcare IT systems to discover healthcare processes? Based on the findings that result from answering RQ1, we focused on standardized audit trails, namely IHE ATNA.

a) Do standardized audit trails provide the information necessary for process mining? This question is answered in chapter 5. A description of IHE ATNA audit events and the more recent HL7 FHIR AuditEvent resource is followed by three approaches to make them usable for process mining [10, 27, 28].

b) How can we make these data records accessible for process mining tools? This question is also answered in chapter 5. Mapping approaches were developed and the audit messages were converted to the XES format [10, 27] for process mining. The integration in OpenSLEX [28] allows for multi-perspective data analysis. An HL7 FHIR based approach to access the data concludes this chapter [29].

In answering the first sub-question, we developed approaches to make the data usable, and in answering the second sub-question we provided standardized interfaces to the data.

RQ3 How can we enable conformance checking in long-running healthcare processes? To answer this question, we conducted a case study on melanoma surveillance data at the DDMUV [31, 30]. The developed method and the results of the case study are described in chapter 6.


a) How can data of recurring events with time constraints that span a long period of time be prepared to apply process mining? We developed a method to move the time dimension into the control flow dimension by introducing a new naming convention for events. This naming convention was then applied to both the recorded cases and the guideline model.

b) How can we apply conformance checking techniques to measure medical guideline compliance? We used the MPE plugin of ProM [162] to calculate the fitness of the recorded cases with respect to the guideline model. To enable conformance checking on unfinished surveillance periods, we added two events, IN_FUP (in follow-up) and LTFU (lost to follow-up), to the event data. Thus, a surveillance process can end after the first of ten years but still be compliant with the guideline, e.g., if it only started in the last year of the recorded period of time.

c) What can we learn from process mining in the context of the surveillance of melanoma patients? We learned how to deal with incomplete process instances in conformance checking. Moreover, we compared our results with the results of an earlier study [170] that did not use conformance checking techniques. We likewise concluded that compliance improves the prognosis for melanoma patients.

For a better understanding of the underlying concepts of process mining and healthcare IT standards, in the course of answering RQ1 and RQ2, this thesis also introduced a running example. The running example was first described in the introduction and extended in the process mining chapter 2. The respective events are later labeled with standard codes in the standards chapter 3. Finally, the running example is used to validate the approaches in chapter 5.

7.2 Impact and Future

The impact is twofold: (1) the work on this thesis already had an impact on the standardization of HL7 FHIR (section 7.2.1), and (2) a new basic research project based on the findings in this thesis was funded and started in 2020 (section 7.2.2). Based on this, new research on standards-based guideline compliance checking is planned, combining the findings of RQ1, RQ2, and RQ3 (section 7.2.3).

7.2.1 Impact on Standardization In the course of writing the process mining interface paper [29], the authors contributed to the HL7 FHIR Workflow project. There, the authors made a case for checking the usability of FHIR resources for process mining. Together, the working group members proposed the addition of a trace identifier to the AuditEvent and Provenance resources:


“We want to be able to search on all events (creates, updates, deletes, etc.) that happened during a given encounter, that happened based on a particular protocol or as a result of a particular order.” reported by Lloyd McKenzie (https://jira.hl7.org/browse/FHIR-28100, last access 17.01.2021)

Based on the discussions in that working group, we decided to use the PlanDefinition and Encounter references for the grouping and mapping approach (cf. section 5.4). Our proposal to extend AuditEvent to support this was reviewed and accepted for inclusion in the next FHIR release R5.
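The grouping idea can be illustrated with a small sketch: audit events that reference the same Encounter are collected into one trace. The flat record format below is an illustrative simplification; real FHIR AuditEvent resources carry such references in nested `entity` and `agent` elements:

```python
def group_into_traces(audit_events):
    """Group simplified AuditEvent-like records into process mining
    traces, using the referenced Encounter as the case identifier and
    ordering events by their recorded time (ISO 8601 strings)."""
    traces = {}
    for ev in sorted(audit_events, key=lambda e: e["recorded"]):
        traces.setdefault(ev["encounter"], []).append(ev["action"])
    return traces
```

The same pattern applies when a PlanDefinition reference is used as the grouping key instead, yielding one trace per protocol execution.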

7.2.2 Basic Research

Based on the preliminary findings of this work, the author applied for a research grant in the field of process analytics and model transformation (BPMN to FHIR). In 2020, the funding was approved by the Center for Technological Innovation in Medicine (TIMed Center) of the University of Applied Sciences Upper Austria. The goals of this project are (1) conducting basic research in the field and, based on this, (2) setting up new research projects. As for the basic research, a standards-based model transformation between BPMN and HL7 FHIR is developed in the project. The goal is to define clinical guidelines and treatment plans in BPMN and to automatically generate FHIR resources from them, which will then be used in IT systems for process control and documentation. The description and implementation of the model transformation (see figure 7.1) should become part of the HL7 standard.

Figure 7.1: Transformation concept for BPMN to FHIR PlanDefinition.

The main focus of the project is the transformation of a source model, in BPMN, into a target model, defined by the HL7 FHIR standard. In the scope of the project, this means a graph transformation GT = (T,R) consisting of the type graph T and the transformation rules R. Thus, an initial model is mapped to an abstract, intermediate

graph model. On the resulting graph model, rules are subsequently applied to enable a transformation into a target model. This process is illustrated in figure 7.1. While the concrete use case, the description of clinical guidelines, is transferred into a suitable FHIR resource representation on the instance layer, the model layer contains the valid metamodels. In the case of the initial model, this is the BPMN metamodel, defined by the Object Management Group (OMG). The target model is based on the FHIR resource specification.
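A toy version of such a rule-based transformation can be sketched as follows. Both the BPMN input structure and the PlanDefinition output are heavily simplified illustrative assumptions; the sketch only shows the flavor of the rules R (tasks become actions, sequence flows become ordering constraints), not the project's actual transformation:

```python
def bpmn_to_plan_definition(bpmn):
    """Toy transformation rules: map each BPMN task to a PlanDefinition
    action, and each sequence flow to a relatedAction ordering constraint
    using the FHIR 'after-end' relationship code."""
    actions = {
        task["id"]: {"id": task["id"], "title": task["name"], "relatedAction": []}
        for task in bpmn["tasks"]
    }
    for src, tgt in bpmn["flows"]:  # rule: flow -> ordering constraint
        actions[tgt]["relatedAction"].append(
            {"actionId": src, "relationship": "after-end"}
        )
    return {"resourceType": "PlanDefinition", "action": list(actions.values())}
```

In the project itself, this mapping is mediated by the intermediate graph model described above rather than applied directly.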

7.2.3 Guideline Compliance Checking

We plan to follow up on this thesis and the work in the basic research project by researching the field of standards-based guideline compliance checking. A new project will combine the findings of all three research questions and the improved mapping techniques from the basic research project.

Motivation

The workflows in hospitals are subject to a multitude of legal regulations, medical recommendations and guidelines as well as internal company guidelines. The objectives range from cost control to evidence-based medicine. In the treatment of patients, but also in organizational processes in the hospital, several guidelines must be taken into account simultaneously. It is not trivial to make well-founded statements about the compliance with these guidelines during operation. Therefore, a tool is needed that automatically checks compliance with specifications and provides a basis for decision-making and support for the responsible personnel.

Figure 7.2: Basic idea of the follow-up project.

Goals

Although it can build on the concepts developed in this thesis and the basic research project in section 7.2.2, there are still several open issues for the follow-up project, e.g.:


(1) the relevant data sources in the different HIS (e.g., ATNA audit trails) have to be identified and examined for their suitability, (2) different conformance checking techniques have to be evaluated, and (3) a list of relevant guidelines has to be identified and formally modelled for test purposes.

The precondition is that the different HISs provide a uniform, standardized interface for process analysis. Thus, the developed tool can check the conformance of the processes to various predefined models retrospectively or, ideally, already during operation, and make statements on compliance. Using the standardized interface described in section 5.4, multiple perspectives on the processes can be queried and subsequently checked against different target models. Figure 7.2 describes a scenario where guideline A was adhered to, B was not adhered to, and C cannot yet be assessed because the process is still ongoing. A secondary goal is to move from predictive analytics (is the guideline adhered to?) to prescriptive analytics (what must happen in order to adhere to the guideline?). This would facilitate the development of decision-support systems.

7.3 Reflecting on Process Mining in Healthcare

In their work “Process management in healthcare: investigating why it’s easier said than done”, Hellström et al. [12] conclude:

“... it is obvious that bureaucracy is still present and in itself acts as efficient resistance to the new ideas. This presence manifests itself in the form of intra- and inter-subjective cognitive structures as well as in materialized artefacts. ... The materialized artefacts that are reminders of the traditional organization are numerous, from the most obvious such as organizational charts and buildings to more sophisticated ones such as systems for budgeting and reimbursement systems.” Andreas Hellström et al. [12]

We want to highlight the last part and add medical information systems in general to the list of materialized artefacts. Hospitals were not designed with process management in mind and, consequently, the information systems implemented for hospitals lack process notions. Capturing workflow events was never a requirement of medical information systems or their support systems. The IHE integration profile ATNA even rules out the use of the event logs for “workflow performance analysis” [87].

However, in recent years the medical and standardization communities have slowly shifted their focus. With initiatives like SWIM [84] or the Radiology Playbook [91], the semantic basis for recording workflow events was created. A common language for terms in healthcare workflows is being developed, and respective IHE integration profiles, e.g., SOLE [90], follow suit by providing a description of the transactions needed for exchanging the recorded information.


7.3.1 Limitations and Opportunities

This thesis attempts to bring together some of the recently developed standards and vocabularies, establish a common understanding of them, and develop methods showing how they can be utilized for process mining. This comes with some limitations and opportunities.

Data Quality

A major threat to the validity of our assumptions in chapter 5 is the questionable data quality in IHE ATNA audit event logs. As described in section 5.5.1, a study by Cruz-Correia et al. [153] found that “the lack of internal structure, data quality and precision limits the usefulness for legal issues and health information systems improvement”. This highlights the need for data quality improvement and enforcement. Process mining has some strict requirements regarding the structure and content of the event logs (cf. section 2.1). While there are methods to clean and repair event logs to some extent, the best way to ensure the applicability of analysis techniques is a rigorous, systematic logging approach. In “Extracting Event Data from Databases to Unleash Process Mining”, Wil van der Aalst [40] collected 12 guidelines (GL) for logging. This thesis supports the implementation of these guidelines. For example, the work on standards in chapter 3 can be used to identify clear semantics for names of references and attributes (GL1) and respective taxonomies or ontologies (GL2). A promising field for further research could be automated data quality measurement to verify the compliance with these 12 guidelines. This would be on a lower level than the medical guideline compliance checking in the follow-up project of section 7.2.3, but it is a vital precondition for any such continuous monitoring approach.
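Such automated quality measurement could start with simple structural checks on the event log. The two checks below are illustrative, loosely inspired by the guidelines (mandatory case/activity/timestamp fields; activity names drawn from a controlled vocabulary), and do not implement the full set of 12:

```python
def check_event_log(log, vocabulary):
    """Run two illustrative quality checks on an event log given as a
    list of dicts: (1) every event carries case id, activity, and
    timestamp; (2) activity names come from a controlled vocabulary.
    Returns a list of (event index, issue description) pairs."""
    issues = []
    for i, event in enumerate(log):
        missing = {"case", "activity", "timestamp"} - event.keys()
        if missing:
            issues.append((i, "missing fields: %s" % sorted(missing)))
        elif event["activity"] not in vocabulary:
            issues.append((i, "unknown activity: %s" % event["activity"]))
    return issues
```

Run continuously against incoming audit messages, an empty issue list would be the precondition for feeding the log into any downstream compliance monitoring.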

Mapping

The issue of mapping data from the IHE ATNA information model to the XES event log format is discussed in the three subsequent approaches in chapter 5, ranging from the fixed perspective of the direct mapping approach to the potentially incomplete data and non-standardized access of the data warehouse approach.

• The first approach took away all decisions from the user (cf. section 5.2.6). However, defining a fixed mapping and a fixed perspective is not viable in practice. This “naive” approach must be seen as a first attempt and a proof of concept that ATNA audit trails can contain enough information and are potential sources for process mining.

• The subsequent data warehouse approach solved some of the issues by storing the audit messages in a standardized meta model (cf. section 5.3.3). Still, the mapping of fields from the messages to the fields in the meta model is fixed, but de Murillas et al. [144] describe how SQL queries can be composed to extract various perspectives from the data.


• The final approach, the process mining interface, can be used stand-alone, like in the author’s work in [29]. However, the mapping defined there (cf. table 5.3) is, again, arbitrary.

All three approaches work because the validation is conducted on simulated data from the running example. In real-world scenarios, the lack of dedicated process notions becomes a major limitation: the mapping of arbitrary fields to trace or event identifiers is not viable in practice. This limitation can be tackled by introducing explicit process notions. Initiatives like SWIM [84] and integration profiles like SOLE [90] or the Radiology Playbook [91] work in that direction. Also, the latest additions to the HL7 FHIR standard described in section 7.2.1 will, if approved, introduce concepts enabling the unambiguous identification of trace and event identifiers.
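Why the choice of trace identifier matters can be shown with a small sketch. The field names (patient, encounter, action, recorded) are illustrative stand-ins for ATNA/FHIR AuditEvent attributes, not the standardized element names; the point is that the chosen trace notion determines the case boundaries of the resulting log.

```python
from collections import defaultdict

def to_traces(audit_records, trace_key="patient"):
    """Group flat audit records by a chosen trace identifier, sorted by time."""
    traces = defaultdict(list)
    for rec in audit_records:
        traces[rec[trace_key]].append(rec)
    for events in traces.values():
        events.sort(key=lambda e: e["recorded"])
    return dict(traces)

records = [
    {"patient": "p1", "encounter": "e1", "action": "order placed", "recorded": "2020-03-01T08:00"},
    {"patient": "p2", "encounter": "e2", "action": "order placed", "recorded": "2020-03-01T08:05"},
    {"patient": "p1", "encounter": "e1", "action": "images acquired", "recorded": "2020-03-01T08:30"},
]

# Grouping by patient yields one trace per patient; grouping by encounter
# would yield different case boundaries for the same raw records.
by_patient = to_traces(records, "patient")
print([e["action"] for e in by_patient["p1"]])  # → ['order placed', 'images acquired']
```

An explicit process notion in the standard, such as a dedicated workflow reference, would remove this arbitrariness by fixing the trace identifier at recording time rather than at extraction time.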

Choice of Coding Systems

In response to our literature review [18], Dylan A. Mordaunt published a direct comment in the same journal [134]. He highlights some of the major problems in the documentation of healthcare services that in turn could lead to false outcomes in process mining projects. He is particularly critical of our use of the ICD-10 code for coding diagnoses and emphasizes the fact that in certain areas of the world ICD-10 is used primarily for funding and planning rather than clinical purposes. For example, his experience with re-designing state-wide stroke pathways in South Australia showed that 25% of the codes were incorrect [134]. In terms of clinical specialty, the use of SNOMED CT is also not without problems. Mordaunt [134] points out the specificities in the United Kingdom and New Zealand, where separate medical councils determine the coding of healthcare specialty and there is no standardized mapping to the SNOMED CT ontology.

“In summary, Helm and colleagues focused on systematising aspects of the HCM [healthcare process mining] case study reporting. There are problems with existing terminologies/ontologies/coding systems and limitations to solely rely on these. There is a precedent based on similar guidelines for a structured guideline or tool to be developed that will guide reporting and encourage standardisation. In addition to the aspects Helm et al. reported, it would need to cover core clinical/epidemiological/administrative questions such as PICO [Population, Intervention, Comparison, Outcome], and reports would ideally discuss the clinical/business implications of the insights obtained from the modelling.” Dylan A. Mordaunt, On Clinical Utility and Systematic Reporting in Case Studies of Healthcare Process Mining [134]


This comment [134] thus highlights both limitations and opportunities. Although the terminologies described are generally adopted internationally, multiple issues can limit their usefulness. However, the need for a “structured guideline or tool that will guide reporting and encourage standardisation” [134] is clearly there. This is a great opportunity for further research.

7.3.2 The Genesis of this Thesis

The enthusiasm for this topic emerged during my bachelor’s studies and my first employment as a junior researcher in 2009. I studied software engineering for medicine and was asked to join a research project called IHExplorer. The project’s main goal was to explore the possibilities of analyzing data from IHE-based systems. Due to the funding characteristics of the project and my supportive supervisors, Franz Pfeiffer and Josef Altmann, I had very few constraints and could work, well, exploratively. I was not aware of the, back then, very young and rather small field of process mining. However, I tried something new in defining and reconstructing clinical processes based on IHE and BPMN, which was also the topic of my bachelor’s thesis (and a paper [173]).

The project ended and I shifted my focus to medical informatics and healthcare data standards, where I also finished my master’s degree. I stayed as a researcher at the University of Applied Sciences in Hagenberg, and in 2014 I discovered the work of Wil van der Aalst et al. on process mining, the manifesto [6]. Without funding in this new field, I was lucky to find a professor willing to supervise my work, Josef Küng. Since there were no classes on process mining at any university around, I enrolled in the first edition of the online course by Wil van der Aalst, “Process Mining: Data Science in Action”1. Together with a master’s student I co-supervised, Ferdinand Paster, I started on the first attempts to enable process mining on IHE-based systems [10, 27].

What I was still missing was the connection to the process mining community. In 2016, my supervisor pointed me to Barbara Weber, who was working at the BPM research cluster at the University of Innsbruck. She pointed me to her colleague Andrea Burattin.
In late 2016, I visited Andrea and he took the time to give me an introduction to the most important areas and to point me to the research group of Wil van der Aalst, more specifically to Joos Buijs. In early 2017 I visited Andrew Partington in Adelaide, South Australia. I was fascinated by his publication on process mining for clinical processes [20] and he gave me some insights into his work. It helped me sharpen the focus of my own work, since he highlighted the troubles in the comparative analysis of healthcare processes – a field that I initially also planned to work on in this thesis.

In May 2017 I visited the group of Wil van der Aalst and Joos Buijs at the TU Eindhoven. I had very insightful conversations and met another PhD student, Eduardo Gonzales Lopez de Murillas, with whom I started collaborating on the data warehouse approach [28]. I also met Felix Mannhardt there, who introduced me to his great tool MPE, which I used for conformance checking later on.

The biggest leap forward, and the most influential development, came as a result of the successful submission to the first Process-Oriented Data Science for Healthcare (PODS4H) workshop [30, 31]. Together with Christoph Rinner of the DDMUV I submitted the paper on conformance checking on melanoma surveillance data. At the workshop in Sydney, Australia, I finally met the community of people focusing on the topic of process mining in healthcare and was appointed to the workshop’s program committee. For the second edition of the PODS4H workshop in 2019 I worked with my team in Hagenberg and Alvin C. Lin, a researcher from the Faculty of Medicine, University of Toronto, on the literature review and standard clinical descriptors for case studies [26, 18]. We had good discussions at the workshop in Vienna and after the conference I was asked to join the steering committee of the workshop.

With the thesis taking shape I was missing one last part – the process mining interface. Again, together with my team in Hagenberg, I submitted the work on process mining on FHIR [29], describing the standards-based approach and completing this part of the topic. As a nice conclusion it may be noted that this last contribution also won the best student paper award at the workshop.

1https://www.coursera.org/share/f8a248c52c81404831fef6a9d7b71c7c, last access 17.01.2021

APPENDIX A Running Example XES

Listing A.1: XES event log of the extended running example in table 2.1


APPENDIX B FHIR AuditEvent

Listing B.1: XML template of the HL7 FHIR AuditEvent [137].


APPENDIX C Acknowledgements

This work took me more than seven years. I already described a bit of this journey in section 7.3.2, The Genesis of this Thesis, and I want to thank all the people mentioned there. They influenced me and the course of this work in many different ways.

Josef Küng supervised me although initially I wasn’t associated with the university or the institute and I didn’t bring any projects or funding to his group. I also thank Marcos Sepúlveda for the interesting conversations at the PODS4H workshops and for being my second supervisor.

Herwig Mayr taught me that I should approach authorities with respect, but never with awe. A lesson I have learned to appreciate a lot. I hope we can find another opportunity to put your smoker to work.

I thank my office mates Andreas Schuler and Oliver Krauss for countless hours of reflecting on academia, research papers, crazy ideas, healthcare IT, and height-adjustable tables. I hope we will stay in touch throughout our professional careers.

My research group, in no specific order: Anna Lin, Jacqueline Schwebach, Barbara Traxler, Gerald Zwettler, Andreas Pointner, Christoph Praschl, Rainer Meindl, David Baumgartner, Eva-Maria Spitzer, Simone Sandler, Johann Aichberger, and Martin Hanreich. I continue to learn a lot from you all and I hope to pay it back in one way or the other – at least by bringing cake or organizing coffee calls.

Stefan Sabutsch, for telling me what interoperability is really about. Dietmar Keimel and the whole CAS, for many years of interesting research projects and great Christmas parties. Wolfgang Hießl, for many discussions about eHealth on our train rides to Vienna. Reinhard Egelkraut, because we can easily switch between pop culture and work during a conversation. Silvia Winkler, for managing the Austrian DICOM community and for many great discussions.


The PODS4H gang, Jorge Munoz-Gama, Carlos Fernandez-Llatas, Niels Martin, Owen Johnson, and Marcos Sepúlveda. Thank you for bringing together a community that talks about the things I also love to talk about.

Moni, for accompanying me on my many conference trips. Whenever I was too nervous or too focused (happened more than once), you took care of everything.

Lisa, for helping me to organize my thoughts and for showing me the ropes of the university processes. It’s quite possible that I would have needed another year or two without you.

Lexx, Max, Georg1, Flo, Alex, Armin, Andrea, Phil, Hansi, Maja, and Oliver (again). Thank you for many fantastic evenings of D&D, board games, and video games. You made sure there was more to the week than work and sleep.

My parents, Elisabeth and Franz, who taught me how to think2 and how to be a nice human being. My sister, Rebecca, to whom I secretly look up a lot. My brother, Joseph, who supposedly always tried to catch up with me, but probably surpassed me long ago. My other brothers, Benjamin and Jonathan. Not much to say here. Thank you for being awesome. Hopefully we will have as much fun in the future as we had in the past. And this, of course, includes Klaus and Sebastian as well. Some day we will determine if the dice are red or orange.

1u/f+4
2allegedly a prerequisite for an academic career

“We all become what we pretend to be.” Patrick Rothfuss

APPENDIX D Curriculum Vitae

Emmanuel Helm MSc
born September 11th, 1986 in Amstetten, Austria
email: [email protected]

Experience

since 10/2019 FH Assistant Professor at University of Applied Sciences Upper Austria, School of Informatics, Communications and Media, Hagenberg. Teaching in Bachelor’s and Master’s programs.

since 05/2019 Lecturer at St. Pölten University of Applied Sciences, St. Pölten. Teaching in the “Digital Healthcare” Master’s program.

since 10/2018 Research Project Manager at University of Applied Sciences Upper Austria, R&D, Hagenberg. Deputy head of the “Advanced Information Systems and Technology” research group.

since 05/2014 Lecturer at University of Applied Sciences for Health Professions, Steyr. Teaching in the “Radiological Technology” Bachelor’s program.

since 10/2012 Lecturer at University of Applied Sciences Upper Austria, School of Informatics, Communications and Media, Hagenberg. Teaching in the “Software Engineering” and “Medical and Bioinformatics” Bachelor’s programs and the “Data Science and Engineering” Master’s program.


since 09/2010 Researcher at University of Applied Sciences Upper Austria, R&D, Hagenberg. Research in the “eHealth” and “Advanced Information Systems and Technology” groups.

03/2010–07/2010 Intern at University of Applied Sciences Upper Austria, R&D, Hagenberg. Research in the “eHealth” group.

10/2008–07/2009 Tutor at University of Applied Sciences Upper Austria, School of Informatics, Communications and Media, Hagenberg. Tutor for programming lectures in the “Software Engineering” Bachelor’s program.

04/2007–12/2009 Software Developer at Magna Powertrain Engineering Center Steyr, St. Valentin. Developing report management software and workflows for the technical information systems department at ECS.

Education

since 2014 Doctorate program in Technical Sciences at Johannes Kepler University Linz. Topic of PhD thesis: “Process Mining in Healthcare”

2010–2012 Master’s program in Biomedical Informatics at University of Applied Sciences Upper Austria, School of Informatics, Communications and Media, Hagenberg. Thesis: “Telemedicine in Cardiac Rehabilitation”

2007–2010 Bachelor’s program in Software Engineering at University of Applied Sciences Upper Austria, School of Informatics, Communications and Media, Hagenberg

Academic Activities

Reviewer for Journals and Conference Proceedings
• Taylor&Francis, Enterprise Information Systems (Journal)
• IEEE Journal of Biomedical and Health Informatics
• MDPI, International Journal of Environmental Research and Public Health
• IOS Press, Studies in Health Technologies and Informatics (dHealth conference)
• Springer, Lecture Notes in Business Information Processing (PODS4H workshop)

Invited talks (recent)
• Zukunftsforum OÖ, talk about Process Mining in healthcare (Linz, 2019).
• Charles University Prague, Czech Institute of Informatics, Robotics, and Cybernetics (CIIRC CTU), guest lecture and tutorial on Process Mining (Prague, 2018).
• HL7 Swiss technical committee meeting, talk about radiology image exchange in Austria (Zürich, 2018).
• eGov-Meeting “OÖ digital gesund”, talk about future trends in medical informatics (Linz, 2018).

Steering committee or board member
• Process-Oriented Data Science for Healthcare Alliance1 (chapter within the IEEE Task Force on Process Mining), organizing the PODS4H workshop at the International Conference on Process Mining (ICPM).
• DICOM Usergroup Austria2, Co-Chair of the technical committee.

Working groups
• HL7 FHIR Workflow3, working on the standardization of workflow notions in healthcare (since 2015).
• Austrian Interoperability Forum4, working on interoperability of Austrian eHealth projects (since 2014).
• ELGA Patient Summary, working on the Patient Summary technical guideline for the Austrian electronic health record ELGA (since 2016).
• ELGA Imaging, working on the Imaging technical guideline for the Austrian electronic health record ELGA (since 2016).
• ELGA Outpatient Report, working on the Outpatient Report technical guideline for the Austrian electronic health record ELGA (since 2016).

1http://pods4h.com/alliance
2https://dicom-austria.at/
3https://confluence.hl7.org/pages/viewpage.action?pageId=40743450
4https://hl7.at/home/iopf/


Trainer
• Trainer for the “ELGA, IHE, HL7 Certificate” program at Technikum Wien Academy5.

Visit to external institute
• One week at TU/e, research group “Analytics for Information Systems” of Wil van der Aalst (Eindhoven, 2017).

Publications - Dissertation related
• Helm, E., Krauss, O., Lin, A. M., Pointner, A., Schuler, A., and Küng, J. (2020). “Process Mining on FHIR – An Open Standards-Based Process Analytics Approach for Healthcare”. Proceedings of the Workshop for Process-Oriented Data Science for Healthcare6. (not yet published)
• Helm, E. (2020). “Towards Process Mining in Radiology: Utilization of IHE SOLE”. In dHealth 2020 – Biomedical Informatics for Health and Care (pp. 108–109). Studies in Health Technology and Informatics Vol. 271. IOS Press Amsterdam. https://doi.org/10.3233/SHTI200082
• Helm, E., Lin, A. M., Baumgartner, D., Lin, A. C., and Küng, J. (2020). “Towards the Use of Standardized Terms in Clinical Case Studies for Process Mining in Healthcare”. International Journal of Environmental Research and Public Health, 17(4). https://doi.org/10.3390/ijerph17041348
• Helm, E., Lin, A. M., Baumgartner, D., Lin, A. C., and Küng, J. (2019). “Adopting Standard Clinical Descriptors for Process Mining Case Studies in Healthcare”. In Business Process Management Workshops – BPM 2019 International Workshops, Revised Selected Papers (pp. 608–619). Lecture Notes in Business Information Processing Vol. 362. Springer. https://doi.org/10.1007/978-3-030-37453-2_49
• Rinner, C., Helm, E., Dunkl, R., Kittler, H., and Rinderle-Ma, S. (2019). “An Application of Process Mining in the Context of Melanoma Surveillance Using Time Boxing”. In Business Process Management Workshops – BPM 2018 International Workshops, Revised Papers (pp. 175–186). Lecture Notes in Business Information Processing Vol. 342. Springer. https://doi.org/10.1007/978-3-030-11641-5_14
• Rinner, C., Helm, E., Dunkl, R., Kittler, H., and Rinderle-Ma, S. (2018). “Process Mining and Conformance Checking of Long Running Processes in the Context of Melanoma Surveillance”. International Journal of Environmental Research and Public Health, 15(12). https://doi.org/10.3390/ijerph15122809
• Helm, E., Schuler, A. H., and Mayr, H. (2018). “Cross-Enterprise Communication and Data Exchange in Radiology in Austria: Technology and Use Cases”. In Health Informatics Meets eHealth: Biomedical Meets eHealth – From Sensors to Decisions – Proceedings of the 12th eHealth Conference (pp. 64–71). Studies in Health Technology and Informatics Vol. 248. IOS Press Amsterdam. https://doi.org/10.3233/978-1-61499-858-7-64
• de Murillas, E. G.-L., Helm, E., Reijers, H. A., and Küng, J. (2017). “Audit Trails in OpenSLEX: Paving the Road for Process Mining in Healthcare”. In Information Technology in Bio- and Medical Informatics – 8th International Conference, ITBAM 2017, Proceedings (pp. 82–91). Lecture Notes in Computer Science Vol. 10443. Springer. https://doi.org/10.1007/978-3-319-64265-9_7
• Helm, E., Traxler, B., Schuler, A. H., Krauss, O., and Küng, J. (2017). “Towards standards based health data extraction facilitating process mining”. In 6th International Workshop on Innovative Simulation for Health Care, IWISH 2017, Held at the International Multidisciplinary Modeling and Simulation Multiconference, I3M 2017. CAL-TEK S.r.l.
• Helm, E., and Küng, J. (2016). “Process Mining: Towards Comparability of Healthcare Processes”. In Information Technology in Bio- and Medical Informatics – 7th International Conference, ITBAM 2016, Proceedings (pp. 249–252). Lecture Notes in Computer Science Vol. 9832. Springer. https://doi.org/10.1007/978-3-319-43949-5_20
• Helm, E., and Paster, F. F. (2015). “First Steps Towards Process Mining in Distributed Health Information Systems”. International Journal of Electronics and Telecommunications, 61(2), 137–142. https://doi.org/10.1515/eletel-2015-0017
• Paster, F. F., and Helm, E. (2015). “From IHE Audit Trails to XES Event Logs Facilitating Process Mining”. In Digital Healthcare Empowering Europeans – Proceedings of MIE 2015 (pp. 40–44). Studies in Health Technology and Informatics Vol. 210. IOS Press. https://doi.org/10.3233/978-1-61499-512-8-40
• Helm, E., Schuler, A. H., Krauss, O., and Traxler, B. (2015). “Prefetching of Medical Imaging Data Across XDS Affinity Domains”. In eHealth2015 – Health Informatics Meets eHealth: Innovative Health Perspectives: Personalized Health (pp. 211–218). Studies in Health Technology and Informatics Vol. 212. IOS Press Amsterdam. https://doi.org/10.3233/978-1-61499-524-1-211
• Strasser, M., Pfeifer, F., Helm, E., Schuler, A. H., and Altmann, J. (2011). “Defining and reconstructing clinical processes based on IHE and BPMN 2.0”. In User Centred Networked Health Care – Proceedings of MIE 2011 (pp. 482–486). Studies in Health Technology and Informatics Vol. 169. IOS Press. https://doi.org/10.3233/978-1-60750-806-9-482

5https://academy.technikum-wien.at/zertifizierungen/elga-ihe-hl7-zertifizierung/
6https://pods4h.com/wp-content/uploads/2020/10/PODS4H_2020_paper_3.pdf

Publications - Others
• Lin, A. M., Krauss, O., and Helm, E. (2019). “Automated Verification of Structured Questionnaires Using HL7® FHIR®”. In ICT for Health Science Research – Proceedings of the EFMI 2019 Special Topic Conference (pp. 11–15). Studies in Health Technology and Informatics Vol. 258. IOS Press Amsterdam. https://doi.org/10.3233/978-1-61499-959-1-11
• Baumgartner, D., Haghofer, A., Limberger, M., and Helm, E. (2019). “Process pruner: A tool for sequence-based event log preprocessing”. CEUR Workshop Proceedings Vol. 2374, pp. 1–4. ICPM Demo Track.
• Lin, A. M., Lin, A. C., Krauss, O., Hearn, J., and Helm, E. (2018). “A Model for Implementing an Interoperable Electronic Consent Form for Medical Treatment Using HL7 FHIR”. European Journal for Biomedical Informatics, 14(3), 37–47. https://doi.org/10.24105/ejbi.2018.14.3.6
• Krauss, O., Angermaier, M., and Helm, E. (2016). “Multidisciplinary team meetings – A literature based process analysis”. In Information Technology in Bio- and Medical Informatics – 7th International Conference, ITBAM 2016, Proceedings (pp. 115–129). Lecture Notes in Computer Science Vol. 9832. Springer. https://doi.org/10.1007/978-3-319-43949-5_8
• Helm, E., Schuler, A. H., and Mayr, H. (2013). “Regelbasierte Entwicklung von Barrierefreien und Plattformunabhängigen Mobilen Benutzeroberflächen” [Rule-based development of accessible and platform-independent mobile user interfaces]. In eHealth2013 – Von der Wissenschaft zur Anwendung und zurück (pp. 207–218).
• Traxler, B., Schuler, A. H., and Helm, E. (2013). “Analysis of clinical documents to enable semantic interoperability”. In Database and Expert Systems Applications – 24th International Conference, DEXA 2013, Proceedings, Part 2 (pp. 466–473). Lecture Notes in Computer Science Vol. 8056. Springer. https://doi.org/10.1007/978-3-642-40173-2_40
• Strasser, M., Helm, E., Schuler, A. H., Fuschlberger, M. G., and Altendorfer, B. (2012). “Mobile Access to Healthcare Monitoring Data for Patients and Medical Personnel”. In Quality of Life through Quality of Information. IOS Press Amsterdam.
• Strasser, M., Helm, E., Traxler, B., and Mayr, H. (2012). “Mobile health solutions for empowered, health-conscious individuals and patients”. In Proceedings of the 10th International Conference on Information Communication Technologies in Health (pp. 422–432).
• Strasser, M., Helm, E., Schuler, A. H., Traxler, B., Mayr, H., and David, C. (2012). “Telemonitoring für mobile Pflegedienste: Entwicklung von standardkonformen Schnittstellen” [Telemonitoring for mobile care services: development of standards-compliant interfaces]. In eHealth2012 – Health Informatics meets eHealth – von der Wissenschaft zur Anwendung und zurück (pp. 179–184).
• Pfeifer, F., Helm, E., Strasser, M., and Altmann, J. (2011). “Analyse Klinischer Pfade” [Analysis of clinical pathways]. In e-Health – Die IT-Basis für eine Integrierte Versorgung (pp. 125–132). Wagner Verlag.
• Strasser, M., Pfeifer, F., Helm, E., Schuler, A. H., and Altmann, J. (2011). “Reconstruction of clinical workflows based on the IHE integration profile Cross-Enterprise Document Workflow”. In 23rd European Modeling and Simulation Symposium, EMSS 2011 (pp. 272–277).

List of Figures

1.1 Positioning of the three main types of process mining: (a) discovery, (b) conformance checking, and (c) enhancement (from [6])...... 2 1.2 BPMN diagram of a workflow in a radiology practice based on [8, 9, 10]. . 3 1.3 The diagnostic-therapeutic cycle (from [1])...... 5 1.4 Spaghetti process model (from [22])...... 7 1.5 Structure of this thesis...... 10

2.1 Input and output of process discovery (from [6])...... 19 2.2 The resulting Petri net after applying the α+ algorithm to the event log in table 2.1 ...... 20 2.3 The four competing quality criteria in process discovery [11]...... 21 2.4 Redrawn directly-follows graph of the extended running example mined with the tool Disco...... 22 2.5 Input and output of conformance checking...... 22 2.6 Log conformance concepts. Left: fitting and unfitting behaviour. Right: log-precise and -imprecise behaviour. (based on [47] and [50]) ...... 23 2.7 Input and output of process enhancement...... 24 2.8 Directly-follows graph of the extended running example highlighting the time perspective...... 25 2.9 Directly-follows graph of the extended running example highlighting the handover of work between roles...... 26 2.10 The XES meta-model as described in the IEEE 1849-2016 standard [59]. . 27

3.1 ATNA actors and transactions [87] including RESTful query [88] (grey). . . 41

4.1 Timeline and dependency of the reviews of Rojas et al. [16] and Helm et al. [18]. The arrows indicate that studies in the latter reference studies in the former...... 48 4.2 Flowchart on the case study selection strategy...... 49

5.1 IHE audit message schema diagram based on RFC-3881 and DICOM [87]. . 61 5.2 Transformation architecture to convert RFC-3881 based Audit Trails into standardized XES Mining Logs...... 64 5.3 Test setting for the direct mapping approach (implemented in [9]). . . . . 65

5.4 The simplified radiological workflow identified in the WIRE project [142, 10]. 66 5.5 Discovered Petri net with the AlphaMiner...... 69 5.6 Diagram of the OpenSLEX meta model at a high level...... 71 5.7 The dashed lines show the mapping of the fields of Audit Messages to the OpenSLEX meta model...... 73 5.8 Inference of the missing elements in the meta model, starting from the events (a) and finishing mining a model (e)...... 74 5.9 The three steps of the interface test setting including the respective consumed and produced data. The numbers correspond to sections or figures. . . . . 77 5.10 BPMN process model of the radiology practice workflow...... 77 5.11 Venn diagram depicting the grouping of AuditEvent (A) resources based on their references to PlanDefinition (P) and Encounter (E) resources. . . . . 79 5.12 Process model generated with the Inductive Visual Miner...... 83

6.1 BPMN representation of the process of melanoma surveillance...... 90 6.2 A simplified Petri net model with applied time boxing corresponding to the guideline used at the DDMUV [157]...... 92 6.3 The most frequent trace in the complete log (I-IV). [fitness 98.8%] . . . . 95 6.4 A trace of a patient that skipped three follow-up visits. [fitness 96.19%] . 95 6.5 A trace comprising all four stages and only one missing follow-up visit. [fitness 99.8%] ...... 95 6.6 A trace with multiple skipped events and thus relatively low fitness. [fitness 89.6%] ...... 96 6.7 Dotted chart analysis of the stage I event log...... 96 6.8 A model of the stage I log created using the process mining tool Disco. . . 98 6.9 Survival analysis for all 246 patients that stayed in follow-up for more than two years, sampled into three equal-sized groups depending on their fitness. 100 6.10 Survival analysis for all 358 patients in follow-up...... 100

7.1 Transformation concept for BPMN to FHIR PlanDefinition...... 106 7.2 Basic idea of the follow-up project...... 107

List of Tables

1.1 An event log based on the running example process model in figure 1.2. . 4

2.1 An extended example event log for the running example workflow from section 1.1.1. The dashed lines separate the cases recorded in the log...... 16

3.1 Comparison of different definitions for the levels of interoperability...... 31 3.2 Important medical data standards, Part 1 (from [74])...... 34 3.3 Important medical data standards, Part 2 (from [74])...... 35 3.4 Coded activities of the running example using the SWIM lexicon in RadLex. 38

4.1 Tabular summary of the 2016 literature review of Rojas et al. [16]. . . . . 47 4.2 Studies with their most commonly used tools (non-disjoint)...... 51 4.3 Papers with their corresponding techniques or algorithms...... 51 4.4 Papers with their corresponding process mining perspectives...... 52 4.5 Papers with their corresponding SNOMED CT encounter environment. . 52 4.6 Papers with their corresponding SNOMED CT clinical specialty...... 53 4.7 Papers with their corresponding ICD-10 medical diagnosis...... 54 4.8 Aspects that describe the basic characteristics of the data...... 58 4.9 Clinical aspects of the mined healthcare process...... 58 4.10 Aspects of the process mining techniques...... 58

5.1 Selected RFC-3881 fields. 1-6 are mandatory according to [136]. 7-8 are mandatory if the ParticipantObjectIdentification is present...... 62 5.2 Mapping of four RFC-3881 fields to corresponding XES fields...... 66 5.3 Mapping table of operations on specific FHIR resources to activities of the radiology practice workflow, ordered by occurrence in the simulated model in figure 5.10...... 80

6.1 Number of patients and mean number of events per case in the base data. 93 6.2 Average fitness and precision for each log...... 94


Acronyms

ACR American College of Radiology. 32

AED Accident and Emergency department. 52, 55

AJCC American Joint Committee on Cancer. 90, 91, 93, 99, 101

ANOVA analysis of variance. 51

ANSI American National Standards Institute. 33

API Application Programming Interface. 14

ARR Audit Record Repository. 40–42, 60, 65, 66, 72, 76, 84

ATC Anatomical Therapeutic Chemical Classification System. 35

ATNA Audit Trail and Node Authentication. 40–43, 60, 63, 65, 69, 70, 72–76, 84, 85, 104, 108, 109

BPM Business Process Management. 1, 6, 18, 111

BPMN Business Process Model and Notation. 3, 14, 20, 24, 51, 66, 76, 77, 90, 92, 106, 111, 127, 128

C-CDA Consolidated CDA. 34

CCD Continuity of Care Document. 34

CDA Clinical Document Architecture. 34

CRUD Create/Read/Update/Delete. 36, 78, 79

CSV comma-separated values. 63

CT Computed Tomography. 6, 32, 65

cURL Client for URLs. 77, 81

DDMUV Department of Dermatology at the Medical University Vienna. 9, 88, 90, 92, 93, 100, 104, 112

DICOM Digital Imaging and Communications in Medicine. 32–34, 41, 61–65, 70, 85

DQ Data quality. 16, 17

ELGA Austrian electronic health record. 33, 64

EPC Event-driven Process Chain. 24

epSOS Smart Open Services for European Patients. 33

ERP Enterprise Resource Planning. 18

FCAT Federative Committee on Anatomical Terminology. 34

FHIR Fast Healthcare Interoperability Resources. 11, 33, 34, 36, 41–43, 60, 63, 76–78, 80, 81, 84, 104–107, 110, 112

GP general practitioner. 52, 55

HAPI HL7 API. 78

HIS Hospital Information System. 6, 55, 85, 91, 108

HL7 Health Level Seven. 11, 32–34, 36, 41–43, 60, 61, 63, 76, 77, 104–106, 110

HTTP Hypertext Transfer Protocol. 36

ICD-10 International Classification of Diseases. 35–37, 50, 53, 56, 57, 89, 110

ICF International Classification of Functioning, Disability and Health. 35

ICHI International Classification of Health Interventions. 35

ICPC International Classification of Primary Care. 35

IDMP Identification of medicinal products. 34

IEEE Institute of Electrical and Electronics Engineers. 69–71

IHE Integrating the Healthcare Enterprise. 9, 10, 33, 34, 40, 41, 60, 61, 63, 64, 66, 67, 69, 70, 72, 75, 76, 84, 85, 104, 108, 109, 111

INN International Nonproprietary Names. 35

ISO International Organization for Standardization. 32, 34

JCI Joint Initiative Council. 32

JSON JavaScript Object Notation. 81

LIS Laboratory Information System. 17

LOINC Logical Observation Identifiers Names and Codes. 35

LOS length of stay. 57

MDI Model Driven Interoperability. 63

MeSH Medical Subject Headings. 36

MOF Meta Object Facility. 63

MPE Multi-perspective Process Explorer. 89, 92, 94, 99, 101, 105, 112

MRI magnetic resonance imaging. 32, 65

MSA Multiple Sequence Alignment. 23

MXML Mining eXtensible Markup Language. 26

NEMA National Electrical Manufacturers Association. 32, 34, 61

OpenSLEX Open SQL Log Exchange format. 60, 70–73, 75, 76, 84, 104

PACS Picture Archiving and Communication System. 65

PAIS Process Aware Information System. 14

PCHA Personal Connected Health Alliance. 35

PDO Profile Development Organization. 33

PDQ Patient Demographics Query. 65, 67

PHI Protected Health Information. 60, 67

PIX Patient Identifier Cross Referencing. 65, 67

PLG2 Processes and Logs Generator. 77

PODS4H Process-Oriented Data Science for Healthcare. 112, 119, 120

RDA Research, Documentation, and Analysis. 90, 97

REST Representational state transfer. 36, 41–43, 60, 77, 78

RIM Reference Information Model. 34

RIS radiology information system. 42

RPC remote procedure call. 36

RSNA Radiological Society of North America. 37, 43

SDO Standards Development Organization. 30, 32, 33

SIIM Society for Imaging Informatics in Medicine. 38

SNODENT Systematized Nomenclature of Dentistry. 57

SNOMED CT Systematized Nomenclature of Medicine Clinical Terms. 35, 36, 50, 52, 53, 55–57, 110

SOLE Standardized Operational Log of Events. 42, 43, 108, 110

SQL Structured Query Language. 60, 76, 109

SWIM SIIM’s Workflow Initiative for Medicine. 38, 43, 66, 108, 110

TA Terminologia Anatomica. 34

TF technical framework. 40, 41

UCUM Unified Code for Units of Measure. 35

UI User Interface. 14

UMLS Unified Medical Language System. 36

URI Uniform Resource Identifier. 27

WADO Web Access to DICOM Objects. 65

WHO World Health Organization. 35, 37, 50

WIRE Workflow for Image prefetching in Radiology for ELGA. 64, 66

WONCA World Organization of Family Doctors. 35

XDS Cross-Enterprise Document Sharing. 40, 65, 67, 68

XES eXtensible Event Stream. 10, 11, 14, 26, 27, 60, 63, 64, 66, 68–71, 76, 78, 80–82, 84, 104, 109

XML Extensible Markup Language. 41, 63, 67

XSLT Extensible Stylesheet Language Transformations. 64, 70

Bibliography

[1] R. Lenz, R. Blaser, M. Beyer, O. Heger, C. Biber, M. Bäumlein, and M. Schnabel, “It support for clinical pathways – lessons learned,” International Journal of Medical Informatics, vol. 76, pp. S397–S402, 2007.

[2] M. Reichert, “What bpm technology can do for healthcare process support,” in Artificial Intelligence in Medicine: 13th Conference on Artificial Intelligence in Medicine, AIME 2011, Bled, Slovenia, July 2-6, 2011, Proceedings, vol. 6747, pp. 2–13, Springer, 2011.

[3] U. Kaymak, R. Mans, T. van de Steeg, and M. Dierks, “On process mining in health care,” in Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on, pp. 1859–1864, IEEE, 2012.

[4] T. Benson, Principles of Health Interoperability. Springer, 2016.

[5] W. M. P. van der Aalst, T. Weijters, and L. Maruster, “Workflow mining: Discovering process models from event logs,” IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 9, pp. 1128–1142, 2004.

[6] W. M. P. van der Aalst, A. Adriansyah, A. K. A. Medeiros, F. Arcieri, T. Baier, T. Blickle, J. C. Bose, P. Brand, R. Brandtjen, J. Buijs, et al., “Process mining manifesto,” in Business Process Management Workshops, pp. 169–194, Springer Berlin Heidelberg, 2012.

[7] W. M. P. van der Aalst, A. Adriansyah, and B. van Dongen, “Replaying history on process models for conformance checking and performance analysis,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 182–192, 2012.

[8] B. J. Erickson, C. Meenan, and S. Langer, “Standards for business analytics and departmental workflow,” Journal of Digital Imaging, vol. 26, no. 1, pp. 53–57, 2013.

[9] E. Helm, A. Schuler, O. Krauss, and B. Franz, “Prefetching of medical imaging data across xds affinity domains,” Studies in Health Technology and Informatics, vol. 212, pp. 211–218, 2015.

[10] E. Helm and F. Paster, “First steps towards process mining in distributed health information systems,” International Journal of Electronics and Telecommunications, vol. 61, no. 2, pp. 137–142, 2015.

[11] W. M. P. van der Aalst, Process Mining: Data Science in Action. Springer, 2016.

[12] A. Hellström, S. Lifvergren, and J. Quist, “Process management in healthcare: investigating why it’s easier said than done,” Journal of Manufacturing Technology Management, 2010.

[13] M. A. Musen and J. H. van Bemmel, Handbook of Medical Informatics. Bohn Stafleu Van Loghum Houten, the Netherlands, 1997.

[14] Á. Rebuge and D. R. Ferreira, “Business process analysis in healthcare environments: A methodology based on process mining,” Information Systems, vol. 37, no. 2, pp. 99–116, 2012.

[15] R. S. Mans, W. M. P. van der Aalst, and R. J. B. Vanwersch, Process Mining in Healthcare: Evaluating and Exploiting Operational Healthcare Processes. Springer, 2015.

[16] E. Rojas, J. Munoz-Gama, M. Sepúlveda, and D. Capurro, “Process mining in healthcare: A literature review,” Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.

[17] T. G. Erdogan and A. Tarhan, “Systematic mapping of process mining studies in healthcare,” IEEE Access, vol. 6, pp. 24543–24567, 2018.

[18] E. Helm, A. M. Lin, D. Baumgartner, A. C. Lin, and J. Küng, “Towards the use of standardized terms in clinical case studies for process mining in healthcare,” International Journal of Environmental Research and Public Health, vol. 17, no. 4, 2020.

[19] R. S. Mans, M. Schonenberg, M. Song, W. M. P. van der Aalst, and P. J. Bakker, “Application of process mining in healthcare–a case study in a dutch hospital,” Biomedical Engineering Systems and Technologies, pp. 425–438, 2009.

[20] A. Partington, M. Wynn, S. Suriadi, C. Ouyang, and J. Karnon, “Process mining for clinical processes: a comparative analysis of four australian hospitals,” ACM Transactions on Management Information Systems (TMIS), vol. 5, no. 4, pp. 1–19, 2015.

[21] R. J. C. Bose, R. S. Mans, and W. M. P. van der Aalst, “Wanna improve process mining results?,” in Computational Intelligence and Data Mining (CIDM), 2013 IEEE Symposium on, pp. 127–134, IEEE, 2013.

[22] W. M. P. van der Aalst, “Process mining: discovering and improving spaghetti and lasagna processes,” in 2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pp. 1–7, IEEE, 2011.

[23] A. Weijters, W. M. P. van der Aalst, and A. A. De Medeiros, “Process mining with the heuristics miner-algorithm,” Technische Universiteit Eindhoven, Tech. Rep. WP, vol. 166, pp. 1–34, 2006.

[24] E. Helm, “Towards process mining in radiology: Utilization of ihe sole,” in dHealth 2020–Biomedical Informatics for Health and Care: Proceedings of the 14th Health Informatics Meets Digital Health Conference, vol. 271, pp. 108–109, IOS Press, 2020.

[25] B. Traxler, E. Helm, O. Krauss, A. Schuler, and J. Kueng, “Towards semantic interoperability in health data management facilitating process mining,” International Journal of Privacy and Health Information Management (IJPHIM), vol. 6, no. 2, pp. 1–12, 2018.

[26] E. Helm, A. M. Lin, D. Baumgartner, A. C. Lin, and J. Küng, “Adopting standard clinical descriptors for process mining case studies in healthcare,” in International Conference on Business Process Management, pp. 608–619, Springer, 2019.

[27] F. Paster and E. Helm, “From ihe audit trails to xes event logs facilitating process mining,” Digital Healthcare Empowering Europeans – Proceedings of MIE 2015, Studies in Health Technology and Informatics, vol. 210, pp. 40–44, 2015.

[28] E. G. L. de Murillas, E. Helm, H. A. Reijers, and J. Küng, “Audit trails in openslex: Paving the road for process mining in healthcare,” in International Conference on Information Technology in Bio- and Medical Informatics, pp. 82–91, Springer, 2017.

[29] E. Helm, O. Krauss, A. M. Lin, A. Pointner, A. Schuler, and J. Küng, “Process mining on fhir - an open standards-based process analytics suite for healthcare,” in International Conference on Process Mining, pp. 608–619, Springer, 2020.

[30] C. Rinner, E. Helm, R. Dunkl, H. Kittler, and S. Rinderle-Ma, “An application of process mining in the context of melanoma surveillance using time boxing,” in International Conference on Business Process Management, pp. 175–186, Springer, 2018.

[31] C. Rinner, E. Helm, R. Dunkl, H. Kittler, and S. Rinderle-Ma, “Process Mining and Conformance Checking of Long Running Processes in the Context of Melanoma Surveillance,” International Journal of Environmental Research and Public Health, vol. 15, no. 12, p. 2809, 2018.

[32] W. Van der Aalst, “Process design by discovery: Harvesting workflow knowledge from ad-hoc executions,” in Knowledge Management: An Interdisciplinary Approach, Dagstuhl Seminar Report, no. 281, 2000.

[33] W. Van Der Aalst, “Process mining: Overview and opportunities,” ACM Transactions on Management Information Systems (TMIS), vol. 3, no. 2, pp. 1–17, 2012.

[34] J. Wyatt and J. Liu, “Basic concepts in medical informatics,” Journal of Epidemiology & Community Health, vol. 56, no. 11, pp. 808–812, 2002.

[35] T. Gschwandtner, J. Gärtner, W. Aigner, and S. Miksch, “A taxonomy of dirty time-oriented data,” in International Conference on Availability, Reliability, and Security, pp. 58–72, Springer, 2012.

[36] E. Rahm and H. H. Do, “Data cleaning: Problems and current approaches,” IEEE Data Engineering Bulletin, vol. 23, no. 4, pp. 3–13, 2000.

[37] H. Müller and J.-C. Freytag, Problems, Methods, and Challenges in Comprehensive Data Cleansing. 2003.

[38] W. Kim, B.-J. Choi, E.-K. Hong, S.-K. Kim, and D. Lee, “A taxonomy of dirty data,” Data mining and knowledge discovery, vol. 7, no. 1, pp. 81–99, 2003.

[39] C. Günther and W. van der Aalst, “Fuzzy mining–adaptive process simplification based on multi-perspective metrics,” Business Process Management, pp. 328–343, 2007.

[40] W. M. P. van der Aalst, “Extracting event data from databases to unleash process mining,” in BPM-Driving innovation in a digital world, pp. 105–128, Springer, 2015.

[41] A. A. De Medeiros, B. F. van Dongen, W. M. P. van der Aalst, and A. Weijters, “Process mining: Extending the α-algorithm to mine short loops,” BETA Working Paper Series, WP 113, Eindhoven University of Technology, Eindhoven, 2004.

[42] J. De Weerdt, M. De Backer, J. Vanthienen, and B. Baesens, “A multi-dimensional quality assessment of state-of-the-art process discovery algorithms using real-life event logs,” Information Systems, vol. 37, no. 7, pp. 654–676, 2012.

[43] A. Augusto, R. Conforti, M. Dumas, M. La Rosa, F. M. Maggi, A. Marrella, M. Mecella, and A. Soo, “Automated discovery of process models from event logs: Review and benchmark,” IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 4, pp. 686–705, 2018.

[44] A. Rozinat, A. A. De Medeiros, C. W. Günther, A. Weijters, and W. M. P. van der Aalst, “Towards an evaluation framework for process mining algorithms,” BPM Center Report BPM-07-06, BPMcenter.org, vol. 123, p. 142, 2007.

[45] A. Rozinat and W. M. P. van der Aalst, “Conformance checking of processes based on monitoring real behavior,” Information Systems, vol. 33, no. 1, pp. 64–95, 2008.

[46] N. Tax, X. Lu, N. Sidorova, D. Fahland, and W. M. P. van der Aalst, “The imprecisions of precision measures in process mining,” Information Processing Letters, vol. 135, pp. 1–8, 2018.

[47] S. J. Leemans, Robust Process Mining with Guarantees. PhD thesis, Eindhoven University of Technology, 2017.

[48] W. M. P. van der Aalst, “Process discovery from event data: Relating models and logs through abstractions,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 3, p. e1244, 2018.

[49] C. W. Günther and A. Rozinat, “Disco: Discover your processes,” BPM (Demos), vol. 940, pp. 40–44, 2012.

[50] J. Buijs, Flexible Evolutionary Algorithms for Mining Structured Process Models. PhD thesis, Eindhoven University of Technology, 2014.

[51] J. Munoz-Gama, Conformance Checking and Diagnosis in Process Mining. Springer, 2016.

[52] S. Dunzer, M. Stierle, M. Matzner, and S. Baier, “Conformance checking: a state-of-the-art literature review,” in Proceedings of the 11th International Conference on Subject-Oriented Business Process Management, pp. 1–10, 2019.

[53] R. J. C. Bose and W. M. P. van der Aalst, “Trace alignment in process mining: opportunities for process diagnostics,” in International Conference on Business Process Management, pp. 227–242, Springer, 2010.

[54] D.-F. Feng and R. F. Doolittle, “Progressive sequence alignment as a prerequisite to correct phylogenetic trees,” Journal of Molecular Evolution, vol. 25, no. 4, pp. 351–360, 1987.

[55] G. J. Barton and M. J. Sternberg, “A strategy for the rapid multiple alignment of protein sequences: confidence levels from tertiary structure comparisons,” Journal of Molecular Biology, vol. 198, no. 2, pp. 327–337, 1987.

[56] M. De Leoni and W. M. P. van der Aalst, “Aligning event logs and process models for multi-perspective conformance checking: An approach based on integer linear programming,” in Business Process Management, pp. 113–129, Springer, 2013.

[57] F. Mannhardt, M. De Leoni, H. A. Reijers, and W. M. P. van der Aalst, “Balanced multi-perspective checking of process conformance,” Computing, vol. 98, no. 4, pp. 407–437, 2016.

[58] H. Verbeek, J. C. Buijs, B. F. Van Dongen, W. M. P. van der Aalst, et al., “Xes, xesame, and prom 6,” in CAiSE Forum, vol. 72, pp. 60–75, Springer, 2010.

[59] IEEE Task Force on Process Mining, “IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams,” IEEE Std. 1849-2016, pp. 53–57, 2016.

[60] IEEE Task Force on Process Mining, XES standard extension for time, 2016. http://www.xes-standard.org/time.xesext, last access 17.01.2021.

[61] W. van der Aalst, “Academic view: Development of the process mining discipline,” in Process Mining in Action: Principles, Use Cases and Outlook, pp. 181–196, Springer, 2020.

[62] M. Kerremans, “Gartner market guide for process mining,” Report G00353970. Gartner, 2018 (updated 2019).

[63] B. F. Van Dongen, A. K. A. de Medeiros, H. Verbeek, A. Weijters, and W. M. P. van der Aalst, “The prom framework: A new era in process mining tool support,” in ICATPN, vol. 3536, pp. 444–454, Springer, 2005.

[64] W. M. P. van der Aalst, “Object-centric process mining: Dealing with divergence and convergence in event data,” in International Conference on Software Engineering and Formal Methods, pp. 3–25, Springer, 2019.

[65] P. Gibbons, N. Arzt, et al., “Coming to terms: Scoping interoperability for health care,” Health Level Seven, EHR Interoperability Work Group, pp. 4–31, 2007.

[66] A. Geraci, F. Katki, L. McMonegal, B. Meyer, J. Lane, P. Wilson, J. Radatz, M. Yee, H. Porteous, and F. Springsteel, IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries. IEEE Press, 1991.

[67] B. Blobel, “Introduction into advanced ehealth–the personal health challenge,” Studies in Health Technology and Informatics, vol. 134, pp. 3–14, 2008.

[68] F. Oemig and R. Snelick, Healthcare Interoperability Standards Compliance Handbook. Springer, 2016.

[69] Healthcare Information and Management Systems Society, HIMSS Dictionary of Health Information Technology Terms, Acronyms, and Organizations. CRC Press, 2017.

[70] R. H. Dolin and L. Alschuler, “Approaching semantic interoperability in health level seven,” Journal of the American Medical Informatics Association, vol. 18, no. 1, pp. 99–103, 2010.

[71] ISO/IEC, Standardization and Related Activities — General Vocabulary. ISO/IEC GUIDE 2:2004(E/F/R), 2004.

[72] O. S. Pianykh, Digital Imaging and Communications in Medicine (DICOM): A Practical Introduction and Survival Guide. 2nd Ed. Springer Science & Business Media, 2012.

[73] E. L. Siegel and D. S. Channin, “Integrating the healthcare enterprise: a primer: part 1. introduction,” Radiographics, vol. 21, no. 5, pp. 1339–1341, 2001.

[74] S. Schulz, R. Stegwee, and C. Chronaki, “Standards in healthcare data,” in Fundamentals of Clinical Data Science, pp. 19–36, Springer, 2019.

[75] HL7 International, FHIR Specification (v4.0.1: R4) Executive Summary, 2019. https://hl7.org/fhir/summary.html, last access 17.01.2021.

[76] J. C. Mandel, D. A. Kreda, K. D. Mandl, I. S. Kohane, and R. B. Ramoni, “SMART on FHIR: a standards-based, interoperable apps platform for electronic health records,” Journal of the American Medical Informatics Association, vol. 23, no. 5, pp. 899–908, 2016.

[77] HL7 International, FHIR Specification (v4.0.1: R4) Operations, 2019. https://hl7.org/fhir/operations.html, last access 17.01.2021.

[78] C. E. Lipscomb, “Medical subject headings (mesh),” Bulletin of the Medical Library Association, vol. 88, no. 3, p. 265, 2000.

[79] O. Bodenreider, “The unified medical language system (umls): integrating biomedical terminology,” Nucleic Acids Research, vol. 32, no. suppl_1, pp. D267–D270, 2004.

[80] P. L. Whetzel, N. F. Noy, N. H. Shah, P. R. Alexander, C. Nyulas, T. Tudorache, and M. A. Musen, “Bioportal: enhanced functionality via new web services from the national center for biomedical ontology to access and use ontologies in software applications,” Nucleic Acids Research, vol. 39, no. suppl_2, pp. W541–W545, 2011.

[81] K. W. Fung and O. Bodenreider, “Knowledge representation and ontologies,” in Clinical Research Informatics, pp. 313–339, Springer, 2019.

[82] World Health Organization, International Statistical Classification of Diseases and Related Health Problems, vol. 2. World Health Organization, 2004.

[83] D. L. Rubin, “Creating and curating a terminology for radiology: ontology modeling and analysis,” Journal of Digital Imaging, vol. 21, no. 4, pp. 355–362, 2008.

[84] C. Meenan, B. Erickson, N. Knight, J. Fossett, E. Olsen, P. Mohod, J. Chen, and S. G. Langer, “Workflow lexicons in healthcare: validation of the swim lexicon,” Journal of Digital Imaging, vol. 30, no. 3, pp. 255–266, 2017.

[85] C. J. McDonald, S. M. Huff, J. G. Suico, G. Hill, D. Leavelle, R. Aller, A. Forrey, K. Mercer, G. DeMoor, J. Hook, et al., “Loinc, a universal standard for identifying laboratory observations: a 5-year update,” Clinical Chemistry, vol. 49, no. 4, pp. 624–633, 2003.

[86] IHE International, Inc., “Appendix a: Ihe actor definitions,” IHE Technical Frameworks, General Introduction, 2018.

[87] IHE ITI Technical Committee, “Audit trail and node authentication (atna) profile,” IHE IT Infrastructure (ITI) Technical Framework, Volume 1 (ITI TF-1) Integration Profiles, Rev. 17, pp. 68–80, 2020.

[88] IHE ITI Technical Committee, “Add restful atna (query and feed),” IHE IT Infrastructure (ITI) Technical Framework Supplement, Rev. 3.2 – Trial Implementation, 2020.

[89] IHE ITI Technical Committee, “Record audit event,” IHE IT Infrastructure (ITI) Technical Framework, Volume 2a (ITI TF-2a) Transactions Part A – Sections 3.1 – 3.28, Rev. 17, pp. 139–153, 2020.

[90] IHE Radiology Technical Committee, “Standardized operational log of events (sole),” IHE Radiology Technical Framework Supplement, Rev. 1.2 – Trial Implementation, 2018.

[91] D. J. Vreeman, S. Abhyankar, K. C. Wang, C. Carr, B. Collins, D. L. Rubin, and C. P. Langlotz, “The loinc rsna radiology playbook-a unified terminology for radiology procedures,” Journal of the American Medical Informatics Association, vol. 25, no. 7, pp. 885–893, 2018.

[92] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, “Data-driven modeling of clinical pathways using electronic health records,” Procedia Computer Science, vol. 121, pp. 835–842, 2017.

[93] F. Fox, V. R. Aggarwal, H. Whelton, and O. Johnson, “A data quality framework for process mining of electronic health record data,” in International Conference on Healthcare Informatics (ICHI), pp. 12–21, IEEE, 2018.

[94] T. G. Erdogan and A. Tarhan, “A Goal-Driven Evaluation Method Based On Process Mining for Healthcare Processes,” Applied Sciences, vol. 8, no. 6, p. 894, 2018.

[95] J. Lismont, A.-S. Janssens, I. Odnoletkova, and et al., “A guide for the application of analytics on healthcare processes: A dynamic view on patient pathways,” Computers in Biology and Medicine, vol. 77, pp. 125–134, 2016.

[96] D. Duma and R. Aringhieri, “An ad hoc process mining approach to discover patient paths of an Emergency Department,” Flexible Services and Manufacturing Journal, pp. 1–29, 2018.

[97] S. Yang, A. Sarcevic, R. A. Farneth, and et al., “An approach to automatic process deviation detection in a time-critical clinical process,” Journal of Biomedical Informatics, vol. 85, pp. 155–167, 2018.

[98] H. Baek, M. Cho, S. Kim, H. Hwang, M. Song, and S. Yoo, “Analysis of length of hospital stay using electronic health records: A statistical and data mining approach,” PloS one, vol. 13, no. 4, 2018.

[99] F. Mannhardt and D. Blinde, “Analyzing the trajectories of patients with sepsis using process mining,” CEUR Workshop Proceedings, vol. 1859, pp. 72–80, 2017.

[100] K. Tóth, K. Machalik, G. Fogarassy, and Á. Vathy-Fogarassy, “Applicability of process mining in the exploration of healthcare sequences,” in 30th Neumann Colloquium (NC), pp. 151–156, IEEE, 2017.

[101] A. Alharbi, A. Bulpitt, and O. Johnson, “Improving Pattern Detection in Healthcare Process Mining Using an Interval-Based Event Selection Method,” in International Conference on Business Process Management, pp. 88–105, Springer, 2017.

[102] Y. Chen, A. N. Kho, D. Liebovitz, C. Ivory, S. Osmundson, J. Bian, and B. A. Malin, “Learning bundled care opportunities from electronic medical records,” Journal of Biomedical Informatics, vol. 77, pp. 1–10, 2018.

[103] R. Andrews, M. T. Wynn, K. Vallmuur, A. H. ter Hofstede, E. Bosley, M. Elcock, and S. Rashford, “Pre-hospital retrieval and transport of road trauma patients in queensland,” in International Conference on Business Process Management, pp. 199–213, Springer, 2018.

[104] F. Mannhardt and P. J. Toussaint, “Revealing Work Practices in Hospitals Using Process Mining,” Building Continents of Knowledge in Oceans of Data: The Future of Co-Created eHealth, 2018.

[105] A. Stefanini, D. Aloini, R. Dulmin, and V. Mininno, “Service Reconfiguration in Healthcare Systems: The Case of a New Focused Hospital Unit,” in Int. Conf. on Health Care Systems Engineering, pp. 179–188, Springer, 2017.

[106] A. P. Kurniati, E. Rojas, D. Hogg, G. Hall, and O. Johnson, “The assessment of data quality issues for process mining in healthcare using Medical Information Mart for Intensive Care III, a freely available e-health record database,” Health Informatics Journal, 2018.
[107] G.-J. de Vries, R. A. Q. Neira, G. Geleijnse, P. Dixit, and B. F. Mazza, “Towards Process Mining of EMR Data,” in International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC), 2017.

[108] K. Kirchner and P. Marković, “Unveiling Hidden Patterns in Flexible Medical Treatment Processes – A Process Mining Case Study,” in Int. Conference on Decision Support System Technology, pp. 169–180, Springer, 2018.

[109] E. Rojas and D. Capurro, “Characterization of drug use patterns using process mining and temporal abstraction digital phenotyping,” in International Conference on Business Process Management, pp. 187–198, Springer, 2018.

[110] C. Alvarez, E. Rojas, M. Arias, J. Munoz-Gama, and et al., “Discovering role interaction models in the Emergency Room using Process Mining,” Journal of Biomedical Informatics, vol. 78, pp. 60–77, 2018.

[111] O. Metsker, A. Yakovlev, E. Bolgova, A. Vasin, and S. Kovalchuk, “Identification of Pathophysiological Subclinical Variances During Complex Treatment Process of Cardiovascular Patients,” Procedia Computer Science, vol. 138, pp. 161–168, 2018.

[112] K. Kirchner, P. Marković, and P. Delias, “Automatic creation of clinical pathways - a case study,” Data Science and Business Intelligence, vol. 179, p. 188, 2016.

[113] S. Yang, M. Zhou, S. Chen, X. Dong, O. Ahmed, R. S. Burd, and I. Marsic, “Medical Workflow Modeling Using Alignment-Guided State-Splitting HMM,” in International Conference on Healthcare Informatics (ICHI), pp. 144–153, IEEE, 2017.

[114] E. Rojas, M. Sepúlveda, J. Munoz-Gama, D. Capurro, V. Traver, and C. Fernandez-Llatas, “Question-driven methodology for analyzing emergency room processes using process mining,” Applied Sciences, vol. 7, no. 3, 2017.

[115] A. Stell, I. Piper, and L. Moss, “Automated Measurement of Adherence to (TBI) Guidelines using Neurological ICU Data,” in International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC), SCITEPRESS, 2018.

[116] C. Fernandez-Llatas, G. Ibanez-Sanchez, and et al., “Analyzing Medical Emergency Processes with Process Mining: The Stroke Case,” in International Conference on Business Process Management, pp. 214–225, Springer, 2018.

[117] T. Conca, C. Saint-Pierre, V. Herskovic, M. Sepúlveda, D. Capurro, F. Prieto, and C. Fernandez-Llatas, “Multidisciplinary Collaboration in the Treatment of Patients With Type 2 Diabetes in Primary Care: Analysis Using Process Mining,” Journal of Medical Internet Research, vol. 20, no. 4, 2018.

[118] R. Gatta, M. Vallati, J. Lenkowicz, C. Casa, F. Cellini, A. Damiani, and V. Valentini, “A Framework for Event Log Generation and Knowledge Representation for Process Mining in Healthcare,” in International Conference on Tools with Artificial Intelligence (ICTAI), pp. 647–654, IEEE, 2018.

[119] A. Najjar, D. Reinharz, C. Girouard, and C. Gagné, “A two-step approach for mining patient treatment pathways in administrative healthcare databases,” Artificial Intelligence in Medicine, vol. 87, pp. 34–48, 2018.

[120] J. Chen, L. Sun, C. Guo, W. Wei, and Y. Xie, “A data-driven framework of typical treatment process extraction and evaluation,” Journal of Biomedical Informatics, vol. 83, pp. 178–195, 2018.

[121] H. Yan, P. Van Gorp, U. Kaymak, and et al., “Aligning event logs to task-time matrix clinical pathways in BPMN for variance analysis,” Journal of Biomedical and Health Informatics, vol. 22, no. 2, pp. 311–317, 2018.

[122] R. A. Q. Neira, G.-J. de Vries, J. Caffarel, and E. Stretton, “Extraction of Data from a Hospital Information System to Perform Process Mining,” in MedInfo, pp. 554–558, 2017.

[123] A. Dagliati, L. Sacchi, A. Zambelli, V. Tibollo, L. Pavesi, J. H. Holmes, and R. Bellazzi, “Temporal electronic phenotyping by mining careflows of breast cancer patients,” Journal of Biomedical Informatics, vol. 66, pp. 136–147, 2017.

[124] K. Baker, E. Dunwoodie, and et al., “Process mining routinely collected elec- tronic health records to define real-life clinical pathways during chemotherapy,” International Journal of Medical Informatics, vol. 103, pp. 32–41, 2017.

[125] Z. Huang, Z. Ge, W. Dong, K. He, and H. Duan, “Probabilistic modeling personalized treatment pathways using electronic health records,” Journal of Biomedical Informatics, vol. 86, pp. 33–48, 2018.

[126] O. Johnson, T. B. Dhafari, A. Kurniati, F. Fox, and E. Rojas, “The clearpath method for care pathway process mining and simulation,” in International Conference on Business Process Management, pp. 239–250, Springer, 2018.

[127] A. Jimenez-Ramirez, I. Barba, M. Reichert, B. Weber, and C. Del Valle, “Clinical Processes-The Killer Application for Constraint-Based Process Interactions?,” in International Conference on Advanced Information Systems Engineering, pp. 374–390, Springer, 2018.

[128] S. J. Leemans, D. Fahland, and W. van der Aalst, “Process and deviation exploration with inductive visual miner,” BPM (Demos), vol. 1295, no. 8, 2014.

[129] M. Song, C. W. Günther, and W. M. P. van der Aalst, “Trace clustering in process mining,” in International Conference on Business Process Management, pp. 109–120, Springer, 2008.

[130] Z. Huang, W. Dong, L. Ji, C. He, and H. Duan, “Incorporating comorbidities into latent treatment pattern mining for clinical pathways,” Journal of Biomedical Informatics, vol. 59, pp. 227–239, 2016.

[131] M. R. Munafò, B. A. Nosek, D. V. M. Bishop, and et al., “A manifesto for reproducible science,” Nature Human Behaviour, vol. 1, no. 1, p. 21, 2017.

[132] S. Bakken, “The journey to transparency, reproducibility, and replicability,” Journal of the American Med. Informatics Association, vol. 26, pp. 185–187, 2019.

[133] R. Lenz and M. Reichert, “It support for healthcare processes–premises, challenges, perspectives,” Data & Knowledge Engineering, vol. 61, no. 1, pp. 39–58, 2007.

[134] D. A. Mordaunt, “On clinical utility and systematic reporting in case studies of healthcare process mining,” International Journal of Environmental Research and Public Health, vol. 17, no. 22, p. 8298, 2020.

[135] G. Marshall, “Rfc 3881-security audit and access accountability message xml data definitions for healthcare applications,” Request for Comments, vol. 3881, 2004.

[136] NEMA, DICOM PS3.15 2020c - Security and System Management Profiles, Audit Trail Message Format Profile, 2020. http://dicom.nema.org/medical/dicom/current/output/html/part15.html#sect_A.5, last access 17.01.2021.

[137] HL7 International, FHIR Specification (v4.0.1: R4) Resource AuditEvent, 2019. http://hl7.org/fhir/auditevent.html, last access 17.01.2021.

[138] Object Management Group, “Meta object facility (mof) core specification,” 2016. www.omg.org/spec/MOF/2.5.1, last access 17.01.2021.

[139] B. Elvesæter, A. Hahn, A.-J. Berre, and T. Neple, “Towards an interoperability framework for model-driven development of software systems,” in Interoperability of enterprise software and applications, pp. 409–420, Springer, 2006.

[140] Open eHealth Foundation, IPF Open eHealth Integration Platform, 2020. https://oehf.github.io/ipf-docs/project-info/, last access 17.01.2021.

[141] IHE ITI Technical Committee, “Volume 1 (iti tf-1) integration profiles,” IHE IT Infrastructure (ITI) Technical Framework, Rev. 17, 2020.

[142] L. Eichinger, “Änderungen im radiologischen Workflow durch die elektronische Gesundheitsakte ELGA [Changes in the radiological workflow due to the electronic health record ELGA],” Bachelor thesis at University of Applied Sciences Upper Austria, School of Informatics, Communications and Media, Hagenberg, 2014.

[143] E. Helm, A. Schuler, and H. Mayr, “Cross-enterprise communication and data exchange in radiology in austria: Technology and use cases,” Studies in Health Technology and Informatics, vol. 248, pp. 64–71, 2018.

[144] E. González López de Murillas, H. A. Reijers, and W. M. P. van der Aalst, “Connecting databases with process mining: A meta model and toolset,” in International Workshop on Business Process Modeling, Development and Support, pp. 231–249, Springer, 2016.

[145] J. E. Ingvaldsen and J. A. Gulla, “Preprocessing support for large scale process mining of SAP transactions,” in Business Process Management Workshops, pp. 30–41, Springer, 2008.

[146] E. Mahendrawathi, H. M. Astuti, and I. R. K. Wardhani, “Material movement analysis for warehouse business process improvement with process mining: A case study,” in Asia Pacific Business Process Management, pp. 115–127, Springer, 2015.

[147] J. Štolfa, M. Kopka, S. Štolfa, O. Koběrský, and V. Snášel, “An application of process mining to invoice verification process in SAP,” in Innovations in Bio-inspired Computing and Applications, pp. 61–74, Springer, 2014.

[148] N. Mueller-Wickop and M. Schultz, “ERP event log preprocessing: Timestamps vs. accounting logic,” in Design Science at the Intersection of Physical and Virtual Design, vol. 7939 of Lecture Notes in Computer Science, pp. 105–119, Springer Berlin Heidelberg, 2013.

[149] Y. Sismanis, P. Brown, P. J. Haas, and B. Reinwald, “GORDIAN: efficient and scalable discovery of composite keys,” in Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 691–702, VLDB Endowment, 2006.

[150] M. Zhang, M. Hadjieleftheriou, B. C. Ooi, C. M. Procopiuc, and D. Srivastava, “On multi-column foreign key discovery,” Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 805–814, 2010.

[151] A. Burattin, “PLG2: Multiperspective process randomization with online and offline simulations,” in BPM (Demos), pp. 1–6, 2016.

[152] Smile CDR Inc., HAPI FHIR Interceptors: Overview, 2020. https://hapifhir.io/hapi-fhir/docs/interceptors/interceptors.html, last access 17.01.2021.

[153] R. Cruz-Correia, I. Boldt, L. Lapão, C. Santos-Pereira, P. P. Rodrigues, A. M. Ferreira, and A. Freitas, “Analysis of the quality of hospital information systems audit trails,” BMC Medical Informatics and Decision Making, vol. 13, no. 1, p. 84, 2013.

[154] P. Groth and L. Moreau, W3C Working Group Note “PROV-Overview”, 2013. https://w3.org/TR/2013/NOTE-prov-overview-20130430/, last access 17.01.2021.

[155] HL7 International, FHIR Specification (v4.0.1: R4) Resource AuditEvent - Detailed Desc., 2019. https://hl7.org/fhir/auditevent-definitions.html, last access 17.01.2021.

[156] IHE ITI Technical Committee, “Consistent Time Integration Profile (CT),” IHE IT Infrastructure (ITI) Technical Framework, Volume 1 (ITI TF-1) Integration Profiles, Rev. 17, pp. 60–62, 2020.

[157] C. Garbe, K. Peris, A. Hauschild, P. Saiag, M. Middleton, L. Bastholt, J.-J. Grob, J. Malvehy, J. Newton-Bishop, A. J. Stratigos, et al., “Diagnosis and treatment of melanoma. European consensus-based interdisciplinary guideline – update 2016,” European Journal of Cancer, vol. 63, pp. 201–217, 2016.

[158] A. Jemal, M. Saraiya, P. Patel, S. S. Cherala, J. Barnholtz-Sloan, J. Kim, C. L. Wiggins, and P. A. Wingo, “Recent trends in cutaneous melanoma incidence and death rates in the United States, 1992–2006,” Journal of the American Academy of Dermatology, vol. 65, no. 5, pp. S17–e1, 2011.

[159] A. B. Francken, E. Bastiaannet, and H. J. Hoekstra, “Follow-up in patients with localised primary cutaneous melanoma,” The Lancet Oncology, vol. 6, no. 8, pp. 608–621, 2005.

[160] M. Binder, W. Dorda, G. Duftschmid, R. Dunkl, K. A. Fröschl, W. Gall, W. Grossmann, K. Harmankaya, M. Hronsky, S. Rinderle-Ma, et al., “On analyzing process compliance in skin cancer treatment: An experience report from the evidence-based medical compliance cluster (EBMC2),” in International Conference on Advanced Information Systems Engineering, pp. 398–413, Springer, 2012.

[161] R. Dunkl, K. A. Fröschl, W. Grossmann, and S. Rinderle-Ma, “Assessing medical treatment compliance based on formal process modeling,” in Symposium of the Austrian HCI and Usability Engineering Group, pp. 533–546, Springer, 2011.

[162] F. Mannhardt, M. De Leoni, and H. A. Reijers, “The multi-perspective process explorer,” BPM (Demos), vol. 1418, pp. 130–134, 2015.

[163] W. Dorda, T. Wrba, G. Duftschmid, P. Sachs, W. Gall, C. Rehnelt, G. Boldt, and W. Premauer, “ArchiMed: a medical information and retrieval system,” Methods of Information in Medicine, vol. 38, no. 01, pp. 16–24, 1999.

[164] F. Mannhardt, M. De Leoni, H. A. Reijers, and W. M. P. van der Aalst, “Measuring the precision of multi-perspective process models,” in International Conference on Business Process Management, pp. 113–125, Springer, 2016.

[165] M. Hackl and P. Ihle, Krebserkrankungen in Österreich [Cancer Incidence in Austria], 6th ed. Statistik Austria, 2018.

[166] M. Song and W. M. P. van der Aalst, “Supporting process mining by showing events at a glance,” in Proceedings of the 17th Annual Workshop on Information Technologies and Systems (WITS), pp. 139–145, 2007.

[167] C. Rinner, G. Duftschmid, T. Wrba, and W. Gall, “Making the complex data model of a clinical research platform accessible for teaching,” Journal of Innovation in Health Informatics, vol. 24, no. 1, 2017.

[168] S. N. Murphy, G. Weber, M. Mendis, V. Gainer, H. C. Chueh, S. Churchill, and I. Kohane, “Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2),” Journal of the American Medical Informatics Association, vol. 17, no. 2, pp. 124–130, 2010.

[169] J. M. Overhage, P. B. Ryan, C. G. Reich, A. G. Hartzema, and P. E. Stang, “Validation of a common data model for active safety surveillance research,” Journal of the American Medical Informatics Association, vol. 19, no. 1, pp. 54–60, 2011.

[170] H. Kittler, R. Weitzdorfer, H. Pehamberger, K. Wolff, and M. Binder, “Compliance with follow-up and prognosis among patients with thin melanomas,” European Journal of Cancer, vol. 37, no. 12, pp. 1504–1509, 2001.

[171] F. M. Maggi, “Declarative process mining with the Declare component of ProM,” in BPM (Demos), pp. 31–36, 2013.

[172] M. Rovani, F. M. Maggi, M. de Leoni, and W. M. P. van der Aalst, “Declarative process mining in healthcare,” Expert Systems with Applications, vol. 42, no. 23, pp. 9236–9251, 2015.

[173] M. Strasser, F. Pfeifer, E. Helm, A. Schuler, and J. Altmann, “Defining and reconstructing clinical processes based on IHE and BPMN 2.0,” Studies in Health Technology and Informatics, vol. 169, pp. 482–486, 2011.
