
UPTEC F 18029, Degree project 30 hp, June 2018

Investigation and Implementation of a Log Management and Analysis Framework for the Treatment Planning System RayStation

Elias Norrby

Abstract

The purpose of this thesis is to investigate and implement a framework for log management and analysis tailored to the treatment planning system (TPS) RayStation. A TPS is a highly advanced software package used in radiation oncology clinics, and the complexity of the software makes writing robust code challenging. Although the product is tested rigorously during development, bugs are present in released software. The purpose of the framework is to allow the RayStation development team insight into errors encountered in clinics by centralizing log file data recorded at clinics around the world.

A framework based on the Elastic stack, a suite of open-source products, is proposed, addressing a set of known issues described as the access problem, the processing problem, and the analysis problem. Firstly, log files are stored locally on each machine running RayStation, some of which may not be connected to the Internet. Gaining access to the data is further complicated by legal frameworks such as HIPAA and GDPR that put constraints on how clinic data can be handled. The framework allows for access to the files while respecting these constraints. Secondly, log files are written in several different formats. The framework is flexible enough to process files of multiple different formats and consistently extracts relevant information. Thirdly, the framework offers comprehensive tools for analyzing the collected data. Deployed in-house on a set of 38 machines used by the RayStation development team, the framework was demonstrated to offer solutions to each of the listed problems.

Supervisor: Karl Lundin. Subject reader: Carl Nettelblad. Examiner: Tomas Nyberg. ISSN: 1401-5757, UPTEC F 18029


Summary

Many fields have benefited from the improved computational capabilities offered by modern computers. In radiation therapy for cancer patients, advanced computer programs in the form of treatment planning systems are used today. With their help, treatments can be designed so that tumors receive the prescribed amount of radiation while the surrounding healthy tissue is spared. One such treatment planning system is RayStation, developed by RaySearch. The complexity of the software makes it difficult to write robust code, and despite thorough testing procedures, software defects are present in the versions used at clinics. An important step in assuring the quality of the product is to process the contents of the log files written by RayStation during use. Today, RaySearch has very limited access to log information from clinics. Access is complicated by the fact that the machines used at clinics are often isolated and lack an Internet connection. In addition, legal frameworks such as HIPAA and GDPR place strict requirements on how clinic data is to be handled.

The purpose of this project has been to develop a framework for log file management, adapted to RayStation and the environment in which the software is used at clinics. Beyond solving the access problem, the framework provides examples of how the information in the files can be processed and analyzed.

The proposed framework is based on the Elastic stack, an open-source software suite that is a popular choice for log file management. The functionality of the framework was demonstrated at RaySearch's development department, where information from log files on 38 machines was collected and analyzed.


Acknowledgements

Firstly, I would like to thank my supervisor, Karl Lundin, as well as the rest of the RayStation Core development team at RaySearch, for aiding me in my work and for welcoming me into their office space. I will miss our morning meetings!

Secondly, I would like to thank my subject reader, Carl Nettelblad, for enthusiastically and meticulously reviewing draft upon draft, and for offering insightful comments during the writing process.


Contents

1 Introduction
  1.1 Background
  1.2 Purpose and tasks
    1.2.1 Objective
    1.2.2 Strategy
  1.3 Delimitations
    1.3.1 Focus on exceptions
    1.3.2 Focus on functionality
    1.3.3 Focus on compatibility with current and future releases
  1.4 Restatement of the problem
  1.5 Restatement of the response

2 Technical background
  2.1 Context
    2.1.1 A brief introduction to radiation therapy
    2.1.2 The role of treatment planning systems
    2.1.3 The role of RaySearch
    2.1.4 Patient data privacy
      The Health Insurance Portability and Accountability Act (HIPAA)
      The General Data Protection Regulation (GDPR)
  2.2 General log management concepts
    2.2.1 Remote log analysis
    2.2.2 Secure data transfers
    2.2.3 A typical log message
    2.2.4 Writing logs
    2.2.5 Managing logs
  2.3 The Elastic stack
    2.3.1 Elasticsearch
      Glossary and architecture
      Indexing
      Scaling and clusters
    2.3.2 Logstash
      Configuring Logstash
    2.3.3 Kibana
    2.3.4 Filebeat
  2.4 Log files in RayStation
    2.4.1 The RayStation Storage Tool log
    2.4.2 The RayStation Index Service log
    2.4.3 The RayStation Error log
    2.4.4 The RaaS logs

3 Implementation
  3.1 A: Setting up a pattern debugging pipeline
  3.2 B: Monitoring performance and collecting system data
  3.3 C1: Processing RayStation logs and managing multi-line messages
  3.4 C2: Centralizing logs in a virtual environment
  3.5 C3: Monitoring multiple workstations
  3.6 C4: Simulating the clinic-to-RaySearch relationship
  3.7 C5: Finalizing the proof-of-concept solution

4 Evaluation and results
  4.1 File output impact on performance
  4.2 Comparison of two logging libraries
  4.3 Logstash performance with structured messages

5 Discussion
  5.1 Evaluation results
    5.1.1 File output impact on performance
    5.1.2 Performance of logging libraries
    5.1.3 Logstash performance with structured messages
  5.2 Security considerations
  5.3 Privacy considerations
    5.3.1 HIPAA
    5.3.2 GDPR
  5.4 Issues encountered during development
    5.4.1 Managing clinic-side component configurations
    5.4.2 Logstash output limitations
    5.4.3 Sparse indices vs. many indices
    5.4.4 Clinics not fulfilling the minimal networking conditions

6 Conclusion
  6.1 The access problem
  6.2 The processing problem
  6.3 The analysis problem

1 Introduction

1.1 Background

A treatment planning system (TPS) is a highly advanced software package that is used in radiation oncology clinics to generate radiation plans for cancer patients. The overall objective of the TPS is to create treatment plans that give the prescribed amount of absorbed radiation dose to the tumor, while sparing the surrounding healthy tissue as much as possible.

RayStation is a TPS developed by RaySearch Laboratories. Used by 400 clinics in 25 countries, it is one of the most widely used treatment planning systems on the market [35]. The complexity of the software makes writing robust code challenging. With the correctness of calculations conceivably carrying the weight of life or death for patients, bugs and otherwise unexpected states have to be given grave consideration during runtime, usually meaning the termination of the program. Even though the product is put through rigorous testing pre-release, bugs are present in live software.

A key part in the current process for troubleshooting RayStation software problems, and eliminating bugs, is to analyze the contents of log files written by the software during runtime and when the application crashes. Today, the information contained in these log files is hard to come by. Due to the sensitive nature of the data handled, workstations used in clinics are often isolated. Frequently, they are not directly connected to the Internet. It is believed that access to this information would allow for major improvements to existing software.

The aim of this thesis is to investigate, design and validate a framework for centralized log management for the RayStation TPS.

1.2 Purpose and tasks

The purpose of this thesis is to propose a solution that: grants access to remotely stored log files (the access problem); processes the data contained in said log files, extracting important key-value pairs (the processing problem); and enables analysis of the data (the analysis problem). All parts of the problem come with their own sets of issues and considerations.

The access problem: Legal frameworks such as HIPAA and GDPR set high standards for how patient data is to be handled. Meeting these criteria is a requirement for RaySearch to receive clearance and thereby be able to sell its products. No framework that risks violating any of these criteria can be used in production. While a certain amount of lenience towards these issues can be employed during the development of an in-house-only, proof-of-concept solution, security and privacy concerns should be given due consideration.

The processing problem: The sought-after data is contained in log files of several different types. The structure and format of the data vary between file types. The proposed solution must allow filtering and searching of log messages with regard to specific fields and features. Different log files may need to be parsed according to individual patterns to extract the necessary information.

The analysis problem: With no previous centralized system in place, there are no strict requirements regarding the details of how the data is visualized or aggregated. Although there are no demands on the specifics of the implementation, some desired general functionality has been specified. The framework should allow for: filtering events by distinguishing features such as application version, error message contents, or the type of exception thrown; viewing similar events registered at separate clinics so as to discover global or regional trends; and viewing an event within a context provided by merging data from several sources. A platform is sought that is customizable enough to adapt visualizations, and other tools for analysis, to new scenarios.

1.2.1 Objective

The proposed framework should solve the access, processing, and analysis problems when deployed in-house on 38 computers. The machines used during validation should be configured in such a way that a scenario with at least two different clinics supplying RaySearch's facilities with log data is represented. The solution should

• Allow for complete transparency (towards clinics) as to what data is being transmitted from clinics to RaySearch.
• Use encrypted connections for all data transfers.
• Filter out any sensitive data such as PII (Personally Identifiable Information) of patients or clinic staff members.
• Enable analysis of the collected data. A sample of visual elements and tools for analysis should be developed as proof of concept.
• Have minimum impact on the current application performance and user experience.
• Provide a way to persistently store logs at a central location.

1.2.2 Strategy

At no point during the development of the framework will it be employed in actual clinics. Although no patients are directly treated by RaySearch, the development environment at RaySearch is sufficiently similar to a clinic environment: it involves a considerable number of users running RayStation on a daily basis, generating log data all the time. The processing and analysis problems will be considered solved when there is a framework deployed in-house that enables users to survey and filter logs from several machines in near real-time. The access problem will be considered solved when the framework has been demonstrated to support the required amounts of transparency and security measures.

1.3 Delimitations

1.3.1 Focus on exceptions

The log files contain events associated with different degrees of severity (see Section 2.2). The framework is developed primarily with the monitoring of critical errors in mind, i.e. crashes of RayStation. While the discovery of serious bugs and the investigation of their origins are a primary concern for the development team at RaySearch, it is of great interest also to have a system for monitoring less critical program behavior. Potential use cases include: comparing performance metrics for different program functions on the diverse set of hardware components employed by clinics in their workstations; monitoring license use, available database storage and other system resources; and gathering data about which features of the software are used most frequently.

The monitoring of non-critical program events and performance metrics is judged to be beyond the scope of this thesis, mainly because the information required to carry out any meaningful analysis is not currently contained in the log file templates. With a few exceptions, the thesis work will not involve any changes to actual RayStation code. The proposed framework will handle the files and events that RayStation records in its current state. No changes are made with regard to how, in what format and at what times logs are written, although some suggestions for potential improvements are offered (see Chapter 5).

1.3.2 Focus on functionality

This work focuses on the benefits RaySearch would experience as a result of a logging framework being employed by clinics. It is to be expected, however, that clinics would show a certain amount of reluctance upon being asked to supply RaySearch with the log data. Any framework to be implemented in production must provide clinics with benefits of their own. While it could be argued that shipping log data to RaySearch will help RaySearch improve their products, which, in turn, will benefit clinics, this is not expected to be enough of a selling point. Developers of future implementations must consider what value the framework can produce for customers. Only limited effort has been dedicated to this issue during the design of the framework.

1.3.3 Focus on compatibility with current and future releases

Tied to the release of any medical software are lengthy processes of approval by different entities such as the Food and Drug Administration (FDA) in the United States and the China Food and Drug Administration (CFDA) in China. While new versions of RayStation are released every year, older versions see widespread use and are still being actively supported by RaySearch. Ideally, a logging framework should be adaptable to every supported version. Development of a general (version-agnostic) framework is impeded by the fact that log file formats vary between software versions. The matter is further complicated by the lack, in many log file templates, of a version identifier: even if custom filters and patterns for parsing files were developed for each version, determining the version of the software being the source of an arbitrary log message is non-trivial.

With these matters considered, the scope of this thesis has been limited to supplying a framework compatible with the version of RayStation currently in development (RayStation 8), with limited support offered to systems running the most recent release (RayStation 7).

1.4 Restatement of the problem

The complexity and large scope of TPS software code make writing robust code difficult. Some bugs are present in software versions released to clinics. Access to crash reports and log data would help accelerate development and improve quality, but due to safety regulations such data is predominantly stored only locally at clinics, frequently on workstations lacking an Internet connection. At its highest level, the problem is three-fold: firstly, there is the problem of transferring data from clinics to a central location; secondly, there is the problem of parsing and categorizing the collected data; and thirdly, there is the problem of offering comprehensive tools for analyzing and investigating the data.

1.5 Restatement of the response

An investigation will be made into the specific limitations imposed by legal frameworks such as HIPAA and GDPR. Existing solutions for log management will be evaluated against the requirements presented in Section 1.2.1. A log management framework based on the Elastic stack will be proposed that fulfills the requirements, while adhering to the legal frameworks governing the handled data, both at rest (at clinics and at RaySearch) and in transit between them.

2 Technical background

2.1 Context

The following sections serve to give a comprehensive background of the state of radiation therapy today and the role of treatment planning systems in the work of clinics. A brief presentation of the position of RaySearch within this ecosystem is offered, as well as a cursory overview of the potential limitations and considerations introduced by patient data privacy frameworks such as HIPAA and GDPR.

2.1.1 A brief introduction to radiation therapy

Following the discovery of x-rays by Wilhelm Conrad von Röntgen in 1895, it was not long before pioneers began experimenting with medical applications of the new technology. One of the first documented cases of radiation therapy is the treatment of a woman, suffering from cancer, carried out by the American Emil Grubbe only months after Röntgen's discovery [39]. During 1898 and 1899, two Swedish pioneers, Thor Stenbeck and Tage Sjögren, treated three patients suffering from facial skin tumors using x-rays [11, p. 28].

The development of radiotherapy practices had a trial-and-error nature during its initial years. Little was known about the degree to which healthy tissue can tolerate radiation, putting both patients and radiologists at risk. Only by 1910 were any means of radiation protection built into the equipment used. By 1922, somewhere around 100 radiologists had died as a direct consequence of their exposure to radiation in their work environment [29].

Radiotherapy uses radiation to damage the DNA of cancerous cells. With the exception of hyperthermia (or thermal therapy), a type of cancer treatment currently under study in clinical trials in which microwaves or ultrasound are used to expose body tissue to high temperatures, radiotherapy makes use of ionizing radiation only [24][11, p. 24]. Treatment falls into one of two main categories: external beam radiotherapy or brachytherapy.

FIGURE 2.1: Dose depth curves for different types of radiation. X-rays exhibit a high dose close to the surface and an exponential decay with increasing thickness. Electron beams are good for targeting tumors close to the surface since their rapid decay spares deeper, healthy tissue. Proton beams deliver their maximum dose beneath the surface tissue, resulting in a surface sparing effect. Image source: https://commons.wikimedia.org/wiki/File:Dose_Depth_Curves.svg

In external beam radiotherapy, a radiation source is used to aim a beam at the patient’s body from a distance. The field of external beam radiotherapy is further divided by the type of radiation used. Photon therapy makes use of x-rays or gamma rays and is the most widely used form of treatment. Electron therapy, as the name implies, uses an electron beam. Compared to the alternatives, the electron beam has a limited range, after which the dose falls off rapidly. This makes it suitable for tumors close to the patient’s skin, since it spares deeper healthy tissue [34]. Particle therapy uses beams of energetic protons, neutrons or positive ions. These types of beams have a surface sparing effect in that they exhibit what is called a Bragg peak. Particles continuously lose energy while penetrating tissue, and as they slow down, the dose reaches a higher concentration. Whereas an x-ray beam delivers its maximum dose close to or at the surface, thereafter falling off exponentially, the maximum dose of the particle beam is delivered over the last few millimeters of the beam’s range, as can be seen in Figure 2.1. In brachytherapy, the radiation source, contained in a protective capsule, is placed inside or in close vicinity to the site of the cancerous tumor. This allows for high doses of very localized radiation. The implants may later be removed, or in some cases be allowed to remain in the body permanently [17].

2.1.2 The role of treatment planning systems

Treatment planning is a process that takes place before treatment of a patient starts. The goals of treatment planning are to make sure

• the prescribed radiation dose is delivered to the target volume, i.e. the tumor;
• the distribution of dose in the target volume is as even as possible;
• healthy tissue is exposed to as little radiation as possible;
• the total dose delivered to the patient is as low as possible; and
• the treatment plan is viable in a practical sense, i.e. possible to repeat daily with high precision. [11, p. 129]

Decisions involved in the treatment planning include determining the total dose and the number of fractions to deliver it in, how many beams to use, their respective angles of delivery and which multileaf collimator (MLC) configurations will be used to shape the beams.

Godfrey Hounsfield's 1971 invention of computed tomography (CT) allowed for a shift from the previously employed 2D planning to the superior 3D planning [3]. Instead of representing the target volume using one or several cross sections, tumors and organs could now be modeled in their entirety as volumes. In 3-dimensional conformal radiation therapy (3DCRT), a variable number of beams are used, each of which is shaped to fit the profile of the target volume using an MLC [11, p. 32]. Intensity-modulated radiation therapy (IMRT) is an advancement within 3DCRT. It allows for modulation of the intensity of beams, creating areas of high or low intensity, resulting in even higher control of the dose distribution reaching the target [25, 8]. Volumetric modulated arc therapy (VMAT) is an advanced type of IMRT where radiation is delivered by means of a rotating gantry. Depending on the cancer type, VMAT is superior to IMRT with regards to sparing healthy tissue [45]. It has the added benefit of faster delivery, reducing patient treatment times [5, 50].

The complexity of the calculations involved is drastically higher when working with a 3D model, requiring powerful computer systems. Today, there are several treatment planning systems on the market, and RayStation, developed by RaySearch, is one of them.

2.1.3 The role of RaySearch

RaySearch is purely a software development company. As such, it has no direct contact with actual patients, merely providing clinics with software to aid treatment. RaySearch works continually to improve RayStation. Improvements range from the elimination of software bugs to the addition of requested features and support for additional treatment methods and medical equipment. Problems experienced by clinics are communicated to RaySearch by way of e-mail, with log file information attached when the occasion calls for it.

FIGURE 2.2: RaySearch supplies clinics with the TPS RayStation. Clinics use it to plan the treatment of their patients. Left side: The current state of affairs. Access to log file data is occasional, and sharing is always initiated by the clinic. Right side: The envisioned state with a centralized logging framework in place. Sharing of log file data is persistent. RaySearch is made aware of clinic-side problems in real-time. Both RaySearch and clinics have access to tools for analyzing the data.

The proposed framework aims to facilitate the transfer of log file information, and to enable both clinics and RaySearch to analyze the data. A comparison of the current state and the vision is shown in Figure 2.2. Having real-time access to log file information would allow RaySearch to respond faster to individual support cases, as well as to discover software errors that are not being reported today.

2.1.4 Patient data privacy

Medical software is subject to strict regulations regarding safety and privacy, imposed by legal frameworks such as the Health Insurance Portability and Accountability Act (HIPAA), a United States federal law, and the General Data Protection Regulation (GDPR) of the European Union. Compliance with these regulations is a requirement for selling products in the respective markets covered. RayStation and its supporting functions already fulfill the requirements put forth by HIPAA and GDPR. The proposed framework must also comply with these standards.

The Health Insurance Portability and Accountability Act (HIPAA)

HIPAA sets a standard for protecting sensitive patient data. Protected health information (PHI) is broadly interpreted to include any part of a patient's medical record or payment history. HIPAA dictates that protected health information that is linked to any of 18 identifiers (including e.g. names, phone numbers and social security numbers) must be treated with special care. The contents of HIPAA and how it applies to RayStation can be summarized in five major areas:

• Physical safeguards must be in place, including policies about use of and access to workstations and electronic media.
• Technical safeguards must be in place. Access control should allow only authorized personnel access to electronic protected health information. Appropriate measures include automatic log-off of users, having an emergency access procedure, and encrypting stored sensitive data.
• Audit reports must be implemented. Audit reports are records of all access to and changes to protected health information.
• Technical policies reducing the risk of patient data loss, such as keeping off-site backups in case of disastrous hardware failure, must be employed.
• Network, or transmission, security puts demands on all methods of transmitting data, be it within a private network or over the Internet. [46]

The General Data Protection Regulation (GDPR)

The General Data Protection Regulation is a regulation on data protection and privacy for all members of the European Union. Its main purpose is to give individuals insight into and control over the personal data recorded about them by companies and other actors. It lays down rules for how personal data about data subjects is to be handled by controllers and processors. Article 4 of the GDPR defines the terms as follows:

For the purposes of this Regulation:

1. ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
...
7. ‘controller’ means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data; [. . . ]
8. ‘processor’ means a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller; [36]

With regards to RayStation and the intended framework for log management, there are three relationships that have to be considered: the RaySearch-to-Clinic relationship, the Clinic-to-Patient relationship and the RaySearch-to-Patient relationship. In the RaySearch-to-Clinic relationship, RaySearch has the role of controller, in some cases handling personal data relating to individuals employed by the clinic. The

Clinic-to-Patient relationship similarly has the clinic in the role of controller, handling the personal data of patients. RaySearch has an obligation to clinics to supply them with software that meets the standards of the GDPR, lest clinics violate their agreements with patients. The RaySearch-to-Patient relationship has to be considered in cases where personal data about patients (data subjects) are transmitted from a clinic (a controller) to RaySearch (a processor). The GDPR specifies numerous requirements for controllers and processors of personal data. The ones potentially relevant during the design of a log management framework are

• Lawful basis for processing: There are six lawful bases for processing of personal data, and at least one of them must apply for processing to be allowed.
• Privacy by design and default: Data controllers must implement technical and organizational measures which meet the principles of data protection by design and default, meaning privacy settings must be a high-level priority.
• Pseudonymization: Stored personal data has to be transformed in such a way that the resulting data cannot be tied to a specific data subject without the use of additional information. Encryption of data is an example, requiring additional data in the form of a decryption key.
• Right of access: Data subjects have the right to access their personal data and information about how that data is being processed.
• Right of erasure: Data subjects have the right to demand their data to be erased on a number of grounds.
• Data breaches: Data controllers and processors are required to notify their member state's supervisory authority without delay in the event of a data breach, unless it is unlikely that the rights and freedoms of individuals are at risk. [36]

RaySearch already has rigorous data anonymization procedures in place for when clinics have to share patient data with RaySearch. The main responsibility is put on clinics by contractually obliging them to anonymize patient data prior to disclosure to RaySearch. As an additional precaution, data is run through RaySearch's own anonymization tool, an application that goes through data sets, deleting or otherwise modifying attributes that could risk identification of individual patients.

2.2 General log management concepts

Log management is the process of dealing with computer-generated log messages in large volumes. The processing of log messages can be motivated by different aims, and different types of processing may be appropriate during different stages of a product's development and release. Different approaches overlap in several areas and are often employed in tandem with each other. Log analysis is the investigation of log messages, possibly spanning a long time frame, in order to draw conclusions about the data. Answers to the questions Who did what?, At what time?, and In what order? might be of interest in matters such as compliance with security regulations, forensics and research into user behavior. Error monitoring is applied mainly in development to validate new functionality, although it is also an approach used in production to weed out bugs [49]. Application performance management (APM) is an approach focused on immediate operations and system uptime. It may involve, for example, ensuring a web service remains available to users [43, p. 2].

Data can be collected as a result of a user action, e.g. agreeing to send a crash report to the developer when an error is encountered, or collected automatically. The automatic (and periodic) collection and transmission of diagnostic information is known as software telemetry and is an important branch of APM. Users are typically allowed to choose the extent to which they are willing to share data with developers. The information collected can be of varying types and ranges from the general to the specific. Windows 10 diagnostics data sent to Microsoft can range from information about hardware components, network capabilities and connected peripherals, to application usage, browsing history, and samples of typing input [12]. The framework developed in this thesis is tailored for error monitoring. Possible implementations of APM are discussed in some detail. Related works include that of Ghanta and Mukherejee [19], in which a cloud-based framework based on the Elastic stack was implemented to tackle the issues associated with many popular tools for traditional log management, such as cat, tail, and grep.

2.2.1 Remote log analysis

The use of diagnostic data to aid software development and the upkeep of services is not a new practice. Operating system manufacturers like the Microsoft Corporation and Apple Inc. and various software manufacturers like the Mozilla Corporation have made extensive use of diagnostic information to improve their products for years [16, 27]. Data mining of user data is routinely used in game development to improve the user experience [30]. Similarly, usage data is used extensively by the Microsoft Office development team to inform design decisions. Information about how many files contain elements such as tables and images, and in what ways users typically interact with such elements, is used to focus development resources on matters important to the user base [7].

A crash is an event usually requiring a restart of an application or the entire operating system. A crash can be the result of a problem at the user level (application level) or the kernel level (operating system level). While corporations such as Microsoft collect vast amounts of data for their operating systems and products, legal concerns prevent them from sharing their data with research groups [15]. In 2006, Ganapathi et al. [16] studied the dominant failure causes of the Windows operating system, concluding that the majority of operating system crashes were caused by poorly-written device driver code.

With many software systems automatically reporting failures to developers, the volume of crash data to process increases. Assessing which failures occur most frequently is important to better allocate maintenance and development efforts. Kim et al. [27] have studied how machine learning can be applied to crash reports of Mozilla Firefox and Thunderbird to predict top crashes before a new software release.

2.2.2 Secure data transfers

Asymmetric encryption, or public key cryptography, is commonly used to authenticate servers and clients, and to enable secure transfer of data between them. Pairs of keys are used to encrypt and decrypt messages, and certificates are used to authenticate entities. Clients can authenticate servers by means of their server certificate, typically signed by a certificate authority. Conversely, servers can request a client certificate from a connecting client to verify its identity. Server certificates are widely used, and server authentication is an integral part of any SSL/TLS session. Client certificates can be used to further improve security, but are used more rarely for common networking purposes due to the difficulties involved in adding them to client browsers [22]. While certificates are usually issued by a certificate authority, self-signed certificates can be generated for free using tools such as OpenSSL.
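As an illustration, a private key and a matching self-signed certificate can be generated with a single OpenSSL command. This is a sketch: the key size, validity period, file names and common name below are arbitrary choices, not values prescribed by any particular framework.

    # Generate a 2048-bit RSA private key (logserver-key.pem) and a
    # self-signed X.509 certificate (logserver-cert.pem) valid for one year.
    # -nodes leaves the key unencrypted so that services can read it
    # without a passphrase.
    openssl req -x509 -newkey rsa:2048 -nodes \
      -keyout logserver-key.pem -out logserver-cert.pem \
      -days 365 -subj "/CN=logserver.example.com"

A client that should trust such a certificate must be configured to do so explicitly, since no certificate authority vouches for it.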

2.2.3 A typical log message

A typical log message consists of a single line, or multiple lines, of text, with information contained in fields separated by some delimiter. Three of the most commonly present fields are the timestamp, severity and message fields.

The timestamp field contains the time when the event occurred. The format may vary depending on the context. Although it is common for the field to contain both date and time, oftentimes with a precision of milliseconds, log messages that are only of interest during a small span of time following their creation may omit the year or the date entirely, only recording the time.

The severity field denotes the severity level of the message. A higher severity level signals a more severe program state and might call for a response from a system administrator. The log levels documented in the syslog standard, a standard for message logging, are, in ascending order of severity: Debug, Informational, Notice, Warning, Error, Critical, Alert and Emergency [18, p. 11]. Some variation on this hierarchy of severity levels is almost universally applied in computer logging.

Messages are typically written to files under some implementation of log file rotation, a measure taken to limit the size of individual log files. On some condition, the targeted output file is swapped for another. The condition triggering a rotation is commonly tied to a maximum file size or based on date, and multiple conditions can be combined to improve categorization. The original file may be kept indefinitely or remain in storage for only a set length of time. Log files in RayStation are rotated when a file grows larger than 5 MB and generally remain in storage until they are removed manually.

2.2.4 Writing logs

A logging library is a programming package used to add logging support to software. Popular logging libraries for .NET applications include Microsoft Enterprise Library Logging, NLog, Log4net and Serilog. Although they differ in their implementations, their main objective is to encapsulate the logic required for application logging and to provide a consistent model for logging tasks. RayStation log messages are written mostly using the Microsoft Enterprise Library, which has not been updated since April 2013 [14]. NLog [32], Log4net [2] and Serilog [40] are open-source alternatives under active development.

Application logging traditionally produces unstructured text data. Log messages are written chronologically in plain text. Filtering this data to show only events related to e.g. a specific user ID, or transaction number, requires preparatory parsing and categorization of the unstructured data. Structured logging is a practice in which log messages are written using a structured format, typically JSON, facilitating the filtering and analysis of data. JSON (JavaScript Object Notation) is an open-standard file format using human-readable text. Objects are defined by specifying a number of attribute-value pairs. It is lightweight, language independent, and is easily parsed and generated by machines [44]. Microsoft Enterprise Library supports only traditional logging. NLog and Log4net support both traditional and structured logging, whereas Serilog is geared almost entirely towards the latter [37].

Log messages are commonly written by calling one of the methods associated with a logger object. In traditional logging, the call

_logger.Information("Applying patch 4.19 -> 4.20");

results in a message similar to

2018-05-29 10:29:03:7637 INFO Applying patch 4.19 -> 4.20

being written to file. To filter such events by e.g. severity level, the message has to be parsed to produce an object representation with separate fields for different parts of the message contents. In structured logging, information is already split into different fields in the logger method call. The call

_logger.Warning("{message}", "Applying patch 4.19 -> 4.20");

results in the JSON object

{
  "@t": "2018-05-29T08:29:03.7887720Z",
  "@mt": "{message}",
  "@l": "Warning",
  "message": "Applying patch 4.19 -> 4.20"
}

with important information in separate fields. The message template field (@mt above) can be used to reconstruct the original message by substituting the field names contained in brackets with their values. If, in this example, the particular version numbers were of interest, the call

_logger.Warning("Applying patch {from_version} -> {to_version}", "4.19", "4.20");

is more appropriate, resulting in the JSON object

{
  "@t": "2018-05-29T08:50:47.8206799Z",
  "@mt": "Applying patch {from_version} -> {to_version}",
  "@l": "Warning",
  "from_version": "4.19",
  "to_version": "4.20"
}

2.2.5 Managing logs

There are several products and product suites available for log management, some of them commercial and some of them open-source. Many of the commercial options are offered as hosted services, i.e. Software as a Service (SaaS). These offerings commonly consist of a data storage service and a search engine coupled with a visual interface to perform analysis on the collected data. The means by which log data reaches the data storage service vary between options, some being tailored to ingest logs of common formats, such as Apache web server logs, and some offering more customization as to how log data is processed and ingested.

In summary, logs need to be forwarded from edge clients, parsed to extract data fields, stored in a searchable environment, and visualized in a comprehensive way. These tasks can be solved somewhat independently. Once log messages are forwarded to a central location and parsed into a commonly recognized format, the data is compatible with a wide range of options for storage and analysis, be they commercial or open-source, on-premise or hosted.

2.3 The Elastic stack

A popular open-source suite of products for log management is the Elastic stack. It is made up of components dedicated to each of the tasks mentioned above: Filebeat is a light-weight log message forwarder; Logstash is a message parser and shipper; Elasticsearch is a search engine and storage option; and Kibana is a visualization and analysis platform.

Many commercial and open-source products have Elasticsearch at their core, such as Loggly [13, 20], Graylog [26, 47], Logsene [1] and Logz.io [42]. Some simply offer a hosted installation of Elasticsearch and Kibana, while others use Elasticsearch as their search engine, building a custom tool for analysis on top of it. Standardized log message formats are commonly supported out-of-the-box, but messages with custom formats, such as those produced by RayStation, usually require prior parsing. Logstash is compatible with not only Elasticsearch, but with a wide range of commercially offered storage options. Because of the flexibility of Logstash and the wide use of Elasticsearch as a back-end in commercial solutions, the Elastic stack was chosen as the development environment for the proof-of-concept log management framework produced in this thesis.

2.3.1 Elasticsearch

Elasticsearch is a highly scalable full-text search engine, and is the heart of any Elastic stack setup. It stores data centrally and allows the user to perform advanced searches. It is based on the open-source information retrieval software library Apache Lucene [41, p. 17]. Apache Lucene was created to provide a fast, scalable alternative to the SQL databases previously employed for full-text search operations [38, p. 1].

Glossary and architecture

Elasticsearch handles data in the form of documents. A document is made up of a number of fields and can be considered analogous to a row in a relational database [21, pp. 36-37]; it is represented as a JSON object [33, p. 40]. In a document, a field has two parts: a name and a value. A field value is in turn made up of one or more terms, the unit of measure when conducting searches. A node is a single instance of Elasticsearch, i.e. a server. A cluster is a set of one or more nodes (servers) that together holds data and handles search requests [21, p. 4].
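To illustrate the terminology, a parsed log event could be stored as a document by sending a single HTTP request to a node; the index name raystation-logs, the document ID and the field values below are hypothetical:

    PUT /raystation-logs/_doc/1
    {
      "log_timestamp": "2018-04-17T09:40:34.000Z",
      "severity": "Error",
      "category": "RayStation Error Log",
      "message": "The schema version of the data source ClinicDB is not supported."
    }

The values of the fields are analyzed into terms ("schema", "version", and so on), which are the units that search queries are subsequently matched against, as described next.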

Indexing

Whereas relational database tables map documents (rows) to fields, Elasticsearch works the other way around, mapping terms to documents in an inverted index. When a new document is added to Elasticsearch, its fields are analyzed: the occurrence of each term is counted and the inverted index is updated. In contrast to a relational database, Elasticsearch is schema-flexible. A document is not required to contain a predetermined set of fields, as in a relational database, but can rather contain an arbitrary number of fields, regardless of whether they are present in other documents or not [21, pp. 9, 81-82].
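As a toy example, consider three documents each holding a single message field; the inverted index maps every term to the set of documents containing it:

    Documents:
      1: "database error"
      2: "login failed"
      3: "database login"

    Inverted index:
      "database" -> {1, 3}
      "error"    -> {1}
      "failed"   -> {2}
      "login"    -> {2, 3}

A query for the term "database" is then answered with a single lookup in the index, rather than a scan over all stored documents.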

Scaling and clusters

Elasticsearch is optimized for speed, with one of its goals being to provide near real-time search. Improving the performance and capacity of Elasticsearch can be done by scaling vertically or horizontally. Vertical scaling means improving the hardware used, and can benefit Elasticsearch to some extent. Horizontal scaling means adding more nodes to a cluster to distribute load and improve reliability. While scaling horizontally can be troublesome in classical database implementations, Elasticsearch is built from the ground up to be distributed, facilitating horizontal scaling. Even so, depending on the specific performance demands of different environments, a cluster with a single node may be a perfectly viable solution [21, p. 26].

2.3.2 Logstash

Logstash is an engine for data collection and transformation, functioning as a pipeline between primary data sources and their destination (e.g. Elasticsearch). It is capable of ingesting data from multiple different sources simultaneously, transforming or adding information based on specific contexts, and sending the data to a suitable output [19, 41, p. 1]. Logstash handles input data in the form of message events and supports a multitude of input plug-ins. Examples include the file, kafka, imap, and stdin plug-ins, enabling Logstash to read events from a file, Apache Kafka topic, IMAP e-mail server, or standard input, respectively.

Configuring Logstash

A Logstash instance's settings are specified in a YAML (.yml) file; YAML (YAML Ain't Markup Language) is a human-readable serialization language commonly used for configuration files [4, p. 1]. The event pipeline itself is defined in a separate configuration file using Logstash's own configuration syntax, illustrated below. The pipeline has three stages: inputs → filters → outputs. The three stages are isolated and configured independently of each other. In other words, an event has to pass any and all configured filters before being shipped to any output. It is not possible to apply one filter, then output to one destination, then apply a second filter, and then output to a second destination.

The input section specifies which types of incoming connections Logstash should listen for, whether to use secure connections (SSL) and whether to decode the input using a supported codec (e.g. JSON). The filter section allows manipulation and mapping of the data contained in the event. Fields can be added, removed, or updated; simple logical expressions can be used to determine execution flow; plug-ins can be used to enrich messages with geographical data based on IP addresses [10]. Log messages shipped to Logstash from a plain text log file via Filebeat (see Section 2.3.4) take the form of a JSON object with a number of meta fields and a message field containing the original log event information. The grok filter can be used to match contents of existing fields and map that information to new fields.

The filter

LISTING 2.1: Filter example

grok {
  match => { "message" => "Timestamp: %{ST_TIMESTAMP:log_timestamp}\nCategory: %{DATA:category}\nSeverity: %{LOGLEVEL:severity}" }
}

applied to a message field equal to

LISTING 2.2: Message example

Timestamp: 17 Apr 2018, 09:40:34
Category: RayStation Error Log
Severity: Error

will update the log JSON object, adding the fields log_timestamp, category, and severity, populating those fields with the values found at the corresponding matching locations in the text contents of the message field.

LISTING 2.3: Filter result example

{
  "log_timestamp": "2018-04-17T09:40:34:000",
  "category": "RayStation Error Log",
  "severity": "Error",
  ...
}

The syntax %{DATA:category} in Listing 2.1 tells Logstash to take anything matching the regular expression defined by the placeholder DATA and store it in the field category, creating the field if necessary. A regular expression is a sequence of characters defining a search pattern [31, p. 754]. DATA is one of the standard patterns supported by Logstash and translates to .*?, matching zero or more occurrences (*) of any character (.), matching as little as possible when there are multiple matches (?). In the log message in Listing 2.2, the words RayStation Error Log match the DATA pattern, and can be seen to populate the category field in the JSON representation of the event shown in Listing 2.3. Users configuring Logstash are not limited to pre-defined patterns; rather, arbitrary patterns can be created when needed.

The output section specifies which destinations to output to. Multiple output plug-ins of different types can be used, including, but not limited to, Elasticsearch, comma-separated file (.csv), e-mail, file, HTTP and numerous log management platforms like Loggly and GrayLog [28, p. 284].
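Putting the three stages together, a minimal pipeline configuration could look as follows. This is a sketch: the port, certificate paths, grok pattern and Elasticsearch address are chosen for illustration and are not taken from the setup described in this thesis.

    # Listen for encrypted connections from Filebeat (the beats protocol).
    input {
      beats {
        port => 5044
        ssl => true
        ssl_certificate => "C:/logstash/certs/logserver-cert.pem"
        ssl_key => "C:/logstash/certs/logserver-key.pem"
      }
    }

    # Split a "timestamp level message" line into separate fields.
    filter {
      grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{LOGLEVEL:severity} %{GREEDYDATA:log_message}" }
      }
    }

    # Ship the parsed events to an Elasticsearch node.
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
      }
    }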

2.3.3 Kibana Kibana is a visualization and analysis platform that provides a graphical interface for interacting with Elasticsearch data [19]. It can be configured to display dashboards with visualizations updating in real time as data is indexed. The Lucene query syn- tax can be used to perform a wide range of searches on the data [9]. Some examples of search queries are shown in Table 2.1. Kibana provides a wide array of differ- ent visualization for the data stored in Elasticsearch. Visualizations are highly cus- tomizable and can be made to suit many types of data. Multiple visualizations can be combined, forming a dashboard. Visualizations can be interacted with to view the underlying data in a table format or to quickly filter for certain properties [41, pp. 39-41]. An example of a pie chart visualization is shown in Figure 2.3. 16 Chapter 2. Technical background

Query | Finds documents with...
severity:Critical | ...a Critical severity level.
severity:Critical OR severity:Error | ...a Critical or Error severity level.
message:database | ...the term "database" in their message field (case-insensitive).
exception_class:*CorePlatform* | ..."CorePlatform" being a part of their exception class. * is a wildcard symbol matching multiple characters.
message:te?t | ...e.g. the words "test" or "text" in their message field. ? is a wildcard symbol matching a single character.
message:"exception property"~10 | ...the terms "exception" and "property" in their message field, with the terms no more than 10 words apart.

TABLE 2.1: Examples of search queries usable in Kibana.


FIGURE 2.3: Example of a Kibana visualization. (A) An example of a pie chart visualization in Kibana, showing how submitted log messages are distributed over different machines. Clicking one of the machine names or circle segments in the pie chart filters for that particular machine only. Filters are applied to the entire dashboard, thereby updating any other visualizations contained in it. Clicking the small chevron in the bottom left corner opens the table view. (B) One of the table views associated with the pie chart visualization. Additional information is contained in the tabs listed at the top. Clicking the small chevron in the bottom left corner shows the pie chart.

In addition to visualizations, saved searches can be added to a dashboard. A saved search is a query with a set of specified search parameters, such as whether to filter out certain documents or which fields to display in the results. The result is a list of matching documents with entries that can be expanded to show more information. Selecting a document reveals the option to view surrounding documents, allowing insight into the context of a particular message by displaying a window of documents surrounding it, potentially from different sources.

2.3.4 Filebeat

Filebeat is a member of the Beats family of products, a set of lightweight data shippers that joined the Elasticsearch suite in 2016, facilitating the ingestion of data [23]. Each Beat is tailored to a single form of data, e.g. network data or audit data. The purpose of Filebeat is to monitor log files and to ship log messages to targets, e.g. Logstash or Elasticsearch. Filebeat works by setting up one or more prospectors, each managing a set of harvesters. A prospector has a type, determining the type of input it expects. A prospector of the log type monitors one or several local file paths.

LISTING 2.4: Filebeat prospector configuration example

- type: log
  paths:
    - /var/log/*.log
    - /var/log/messages.txt

The prospector in Listing 2.4 monitors the /var/log/ folder. For every matching file found at the specified paths (i.e. the file messages.txt, as well as any file with the .log extension), a harvester is created. The harvester reads the file line by line, thereafter shipping each line to a specified output. Filebeat keeps a registry of file states to keep track of how many lines have been read in every file. Filebeat guarantees that every message is delivered at least once by repeatedly trying to send messages until acknowledgement is received from the defined output. If, when Filebeat is shut down, there are any messages that have been sent, but have not been acknowledged by the output, those messages are sent again when Filebeat is restarted, potentially resulting in duplicates in the output. Filebeat makes no effort to filter out duplicate events contained in the monitored log files, forwarding each message regardless of whether it is unique or not. Distinguishing identifiers have to be handled later in the logging pipeline, e.g. in Elasticsearch where every message is stored as a document with a unique ID. Specifically built for monitoring log files, Filebeat handles log rotation, i.e. the renaming and archiving of log files, well [48].
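Complementing the prospector section, an output section tells Filebeat where to ship harvested messages. A sketch of an output shipping to a Logstash instance over an encrypted connection could look as follows; the host name and certificate path are illustrative:

    output.logstash:
      # Ship events to the beats input of a central Logstash instance.
      hosts: ["logserver.example.com:5044"]
      # Trust the (possibly self-signed) certificate presented by the server.
      ssl.certificate_authorities: ["C:/filebeat/certs/logserver-cert.pem"]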

2.4 Log files in RayStation

A running RayStation instance writes information to a number of different log files, each with its own structure and set of data points.

2.4.1 The RayStation Storage Tool log

The RayStation Storage Tool log consists largely of single-line, Information level events. When an exception is logged, however, the message is extended with a multi-line stack trace.

LISTING 2.5: Example of a set of RayStation Storage Tool log messages.

08 nov 2017, 14:33:17 Information Applying patch 5.59 -> 5.60
08 nov 2017, 14:33:17 Information Model schema upgrade complete.
08 nov 2017, 14:33:17 Information Validating database checksums.
08 nov 2017, 14:33:45 Information Copying template lungtum_1_2
08 nov 2017, 14:33:51 Information Upgrade of database schema from structure version 1.11 to 1.19 started.
08 nov 2017, 14:33:51 Information Structure schema upgrade complete.
08 nov 2017, 14:33:51 Information Upgrade of database schema from model version 1.290 to 5.60 started.
08 nov 2017, 14:33:51 Information Applying patch 1.290 -> 1.291
08 nov 2017, 14:33:51 Information Applying patch 1.291 -> 1.292
24 mar 2017, 11:12:56 Critical System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. --->
......

2.4.2 The RayStation Index Service log

Like the RayStation Storage Tool log, the RayStation Index Service log consists predominantly of single-line, Information level events, with some multi-line messages for certain errors.

LISTING 2.6: Example of a set of RayStation Index Service log messages.

2017-12-04 00:31:11.655Z Information [...] Updating patient index...
2017-12-04 00:31:21.692Z Information Unhandled SqlException - Error index #0
Source: .Net SqlClient Data Provider
Number: 4060
State: 1
Class: 11
Server: ...
Message: Cannot open database "..." requested by the login. The login failed.
Procedure:
LineNumber: 65536
2017-12-04 00:31:21.692Z Information Unhandled SqlException - Error index #1
...

2.4.3 The RayStation Error log

The RayStation Error log is the source of more complex log messages. It contains several fields beyond the standard timestamp, severity and message fields, e.g. the current user and application version. Some of these fields are single-line, whereas others, such as the callstack, are multi-line in structure. This is the most interesting log file with regards to logged exceptions and program crashes, and thus for the scope of this thesis.

LISTING 2.7: Example of a RayStation Error log message.

------
Timestamp: 10 Aug 2017, 15:06:52
Category: RayStation Error Log
Severity: Error
Machine: SE-XXXXX-WKS
App Domain: RayStation.exe
ProcessId: 10384
Process Name: ...
Thread Name: ...
Win32 ThreadId: 26264
Message: RaySearch.CorePlatform.Framework.SchemaNotSupportedException: The schema version of the data source ClinicDB is not supported.
......
ApplicationVersion: N/A
User: ...
IP: ...
CallStack: at System.Environment......
------

2.4.4 The RaaS logs

The RaaS (RayStation as a Service) logs are written by services used for integration between RayStation and other RaySearch products. These include the PatientService log, the SenderService log and the TreatmentService log. The logs are similar in structure, consisting of single-line messages with tab-delimited fields.

LISTING 2.8: Example of a set of RayStation PatientService log messages.

#Module: PatientService
#Date: 2018-04-20 07:27:12
#Fields: timestamp level correlationId message exception ipaddress result duration object
2018-04-20 09:36:20.074Z INFO c8e26b396a2f413c939c0c2d913599bc Percentage: 10, Details: DICOM retrieve series
2018-04-20 09:36:21.792Z INFO c8e26b396a2f413c939c0c2d913599bc Percentage: 20, Details: Importing DICOM series
2018-04-20 09:36:21.801Z INFO c8e26b396a2f413c939c0c2d913599bc Percentage: 25, Details: DICOM import initiated
2018-04-20 09:36:21.817Z INFO c8e26b396a2f413c939c0c2d913599bc Percentage: 25, Details: Importing patient information
2018-04-20 09:36:22.107Z REQIN GET Progress/GetProgress ["c8e26b39-6a2f-413c-939c-0c2d913599bc"]
2018-04-20 09:36:22.108Z RSPIN GET Progress/GetProgress 200 1 {"Status":"Started","Details":"Work started at ...


3 Implementation

The framework was developed over a series of iterations, each increasing in scale. Installations of the Elastic stack were deployed locally, remotely, and in virtual environments to test their functionality and to simulate relationships between different network domains (i.e. RaySearch and a clinic). The number of machines involved differed greatly between prototypes, with only a single machine being used in the simplest implementation and 38 machines being used in the most advanced implementation. The seven different versions had different scopes and aims, as listed below:

A: Setting up a grok pattern debugging pipeline
B: Monitoring performance and collecting system data
C1: Processing RayStation logs and managing multi-line messages
C2: Centralizing logs in a virtual environment
C3: Monitoring multiple workstations
C4: Simulating the clinic-to-RaySearch relationship
C5: Finalizing the proof-of-concept solution

The conditions and requirements posed by different clinics vary widely. Of particular interest for this thesis is the network environment of clinics, as whether or not clinic workstations are equipped with an Internet connection drastically affects the conditions for a solution to the access problem. Development has been targeted towards clinics with the following minimal networking conditions:

• Any workstation lacking a direct Internet connection has a local connection to at least one central machine.
• The central machine(s) either have an Internet connection themselves, or, in case clinics employ multiple network domains, have connections to at least one Internet-connected machine within another domain.

Potential solutions to the access problem at clinics not fulfilling these minimal conditions are discussed in Section 5.4.4. The framework was developed in a Windows-based environment using version 6.2.1 of all Elastic stack components (Elasticsearch, Logstash, Kibana and Filebeat).

3.1 A: Setting up a grok pattern debugging pipeline

Purpose The purpose of the first prototype was to establish a connection between Filebeat and Logstash and to test filtering of simple messages.

Description The layout of the pipeline is shown in Figure 3.1. The first prototype featured installations of both Filebeat and Logstash. A small application was developed for generating log messages with a simple syntax. The application features a text-based command window interface, shown in Figure 3.2, where a user is asked for a series of inputs within a set range. The program flow is documented in the logs with numerous DEBUG level messages. Invalid user input results in different types of exceptions being thrown. These events are documented in the logs as ERROR level messages. Log message examples are shown in Listing 3.1.

FIGURE 3.1: A log file is monitored by Filebeat, forwarding any new log messages to Logstash. Logstash matches the message against a grok pattern and outputs the result to the console, formatted as a JSON object.

Choose from the following actions:

------
1. Add a patient
2. Print list of patients
3. Start timer
4. Stop timer
5. Performance intensive operation
6. Error prone operation with 10% failure rate
7. Quit
------

Choice: _

FIGURE 3.2: A simple command window interface was used to generate log messages as responses to user input.

Filebeat was set up to monitor the generated log file and to continuously forward new messages to a Logstash instance running on the same machine. A pattern was written for the grok filter plug-in, allowing Logstash to extract information about the fields timestamp, severity, message, and, when present in the log entry, the exception_type. After parsing the message, Logstash displayed the result, represented as a JSON object, in a console window as shown in Listing 3.2.

LISTING 3.1: A sample of log messages (entries) printed to file during a run of the application. Log entries contain fields for date and time, severity and a message.

1 2018-02-13 12:34:49 [INFO] | === New run started ===
2 2018-02-13 12:34:49 [DEBUG] | User asked for input between 1 and 5
3 2018-02-13 12:34:51 [ERROR] | InputOutOfBoundsException thrown: Input out of bounds: 8 not in range [1, 5]
4 2018-02-13 12:34:51 [DEBUG] | Read invalid input: 8
5 2018-02-13 12:35:00 [DEBUG] | Read valid input: 2
6 2018-02-13 12:35:01 [DEBUG] | User asked for input between 1 and 5
7 2018-02-13 12:35:04 [ERROR] | UnrecognizedInputFormatException thrown: Input "asdf" is not of the correct format: int
8 2018-02-13 12:35:04 [DEBUG] | Read invalid input: asdf

LISTING 3.2: The Logstash output printed to console after parsing the message on line 7 in Listing 3.1.

{
        "@timestamp" => 2018-02-13T12:35:04.000Z,
          "severity" => "ERROR",
           "message" => "UnrecognizedInputFormatException thrown: Input \"asdf\" is not of the correct format: int",
    "exception_type" => "UnrecognizedInputFormatException",
            "source" => "C:\\Data\\Logs\\log.txt",
            "offset" => 115,
        "prospector" => {
            "type" => "log"
        },
              "beat" => {
                "name" => "SE-XXXXX-WKS",
            "hostname" => "SE-XXXXX-WKS",
             "version" => "6.2.1"
        },
              "host" => "SE-XXXXX-WKS",
          "@version" => "1",
              "tags" => [
            [0] "beats_input_codec_plain_applied"
        ]
}

This simple pipeline was not designed to serve any direct purpose in a production environment, but to be an aid during development. It allowed for immediate feedback about the performance and correctness of grok filters currently under development. It was used continuously for debugging during the development of more sophisticated framework prototypes.
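As a concrete illustration, a grok filter along the following lines could produce the output in Listing 3.2 from the messages in Listing 3.1. This is a sketch rather than the exact configuration used; the pattern names are standard grok patterns, but the field layout is inferred from the listings above.

filter {
  # Extract timestamp, severity and message from lines such as
  # 2018-02-13 12:35:04 [ERROR] | UnrecognizedInputFormatException thrown: ...
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{LOGLEVEL:severity}\] \| %{GREEDYDATA:message}" }
    overwrite => [ "message" ]
  }
  # For ERROR messages, attempt to extract the exception class name
  if [severity] == "ERROR" {
    grok {
      match => { "message" => "^%{WORD:exception_type} thrown" }
    }
  }
}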

3.2 B: Monitoring performance and collecting system data

Purpose The second prototype explored ways of collecting and visualizing system information, like OS version, GPU driver version and IP address, and real-time metrics, like CPU and RAM usage. At the time of writing, these types of data are not present in any RayStation logs. As such, this prototype served to show potential use cases for such information, in case it is added to the logs in a future release.

Description Instead of using real RayStation log data, a simple program was written to randomly generate log messages with an adjustable frequency. A set of several different machine configurations was created, each being a set of properties like the ones listed above, i.e. a combination of software versions and hardware components. The log-generating program was designed to load one of the configurations and then to simulate work, alternating between two different tasks and logging its progress, each log message including information about the configuration being used. An IP address, chosen from a list of randomized addresses, was attached to each message to provide geographical information. Only one physical machine was used, but multiple instances of the log-generating program were run simultaneously, each writing to its own log file, to simulate logs being generated in a multi-machine environment.

Filebeat was set up to monitor files created by all instances of the program and to forward any new log messages to a Logstash instance running on the same machine. The custom grok pattern from prototype A was improved to allow Logstash to extract the logged system properties, in addition to the timestamp, severity and message fields. The geoip plug-in was used to make inferences about the geographical source of the messages based on the registered IP addresses. Elasticsearch and Kibana instances were added to the pipeline for storage and visualization of the data. The entire pipeline is shown in Figure 3.3.
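The geoip step amounts to a single filter entry; a minimal sketch, assuming the grok stage has placed the logged address in a field named ip:

filter {
  geoip {
    # Look up geographical information for the extracted IP address
    source => "ip"
  }
}

The filter adds a geoip object with fields such as country_name and location, the latter of which Kibana can plot on a map.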

FIGURE 3.3: Several log files are monitored by Filebeat, forwarding any new log messages to Logstash. Logstash matches the message against a grok pattern and ships the resulting JSON object to a local Elasticsearch instance. An instance of Kibana enables interaction with the data via queries and dashboards.

Elasticsearch was set up using its default configuration. In Kibana, nine visualizations aggregating the logged data in different ways were designed and implemented in a custom dashboard. Within the chosen timespan, the visualizations showed

• the total number of recorded log messages;
• the distribution of failed or successful operations;
• the volume of incoming log messages over time;
• the regional distribution of log messages in general, and failed operations in particular; and
• the maximum, minimum, and average performance ratings registered over time (CPU usage, RAM usage, and time taken for operations).

Information could be easily filtered by interacting with the visualizations or by using the Kibana filter interface. The dashboard could be configured to show only data from, e.g., China, computers running Windows 10, operations throwing exceptions, or any combination of the above. A snapshot of the dashboard overview is shown in Figure 3.4.

FIGURE 3.4: A Kibana dashboard showing information recorded in a 24 hour timespan. The number in the top right shows the total number of messages recorded. The pie chart shows the distribution of failed and successful events (failed events throw an exception). The bar charts show the volume of messages recorded over time: they show a disruption in the amount of messages logged between 12 PM and 1 PM, with an increase in both failed and successful operations following the disruption. The line charts plot various performance metrics over time. The maps show which regions register the most activity, the left showing all events and the right filtering for only failed events.

3.3 C1: Processing RayStation logs and managing multi-line messages

Purpose The purpose of the third prototype was to improve the parsing capabilities of the pipeline, using real RayStation log data as input: multi-line messages with complex structures.

Description During development of the third prototype, real RayStation log data was processed for the first time. The data consisted of log messages written by a single instance of RayStation during a year of sporadic usage within the RaySearch development environment. Although considerable work had to go into tailoring the Filebeat and Logstash configurations to fit the new data, the overall structure of the pipeline remained as described in Figure 3.3.

Log messages were contained in multiple files of three different types: RayStation Error log files, RayStation Storage Tool log files, and RayStation Index Service log files. Filebeat was set up with three separate prospectors, each monitoring files of one of the respective types. With each file type having at least some multi-line log messages, each Filebeat prospector had to be tailored to the specific format of its file type to recognize messages in their entirety, as opposed to blindly forwarding every new line to Logstash. This was accomplished using the multiline set of parameters:

• The parameter multiline.pattern defines a pattern that is central to finding the boundary between messages. The pattern ^[0-9]{4}-[0-9]{2}-[0-9]{2} matches a set of digits arranged as a date such as 2018-05-23, i.e. the first part of a log message timestamp.
• The parameter multiline.negate determines the role that lines matching the multiline.pattern have. If it is set to false, all lines matching the multiline.pattern are considered to be part of a larger message. If set to true, all lines not matching the multiline.pattern are considered to be part of a larger message.
• The parameter multiline.match is set to either before or after and determines whether lines (not) matching the multiline.pattern should be appended to the lines preceding or following them.

The example configuration for the Index Service log prospector shown in Listing 3.3 has the prospector look for lines beginning with a date on the form YYYY-MM-dd. When such a line is read, the lines following it are read one by one. Each line not matching the defined pattern, i.e. not beginning with a date, is appended to the first. When a new line matching the pattern is found, the series of concatenated lines is forwarded as a whole. The caret (^) in the multiline.pattern denotes a position, namely the beginning of a line. As such, the processing of multi-line messages is not ended prematurely in case a date is featured somewhere in the body of the message: only a line beginning with a date will be considered a message boundary.

LISTING 3.3: Part of the configuration for Filebeat, showing the prospectors for Error and Index service logs.

filebeat.prospectors:
# Prospector for Index Service logs
- type: log
  enabled: true
  paths:
    - ${PROGRAMDATA}/RaySearch/RayStation_Index_Service_log*
  fields:
    service_owner: RayStation_Index_Service
    index_base: raystation-index-service
  fields_under_root: true
  multiline.pattern: ^[0-9]{4}-[0-9]{2}-[0-9]{2}
  multiline.negate: true
  multiline.match: after

# Prospector for RayStation Error logs
- type: log
  enabled: true
  paths:
    - ${PROGRAMDATA}/RaySearch/*RayStation_error_log*
  fields:
    service_owner: RayStation_Error_Log
    index_base: raystation-error-logs
  fields_under_root: true
  multiline.pattern: Timestamp
  multiline.negate: true
  multiline.match: after
...

The fields parameter allows for prospector-specific information to be included with the message as it is forwarded to the next stage in the pipeline. The addition of the service_owner field, seen in both prospector configurations in Listing 3.3, is essential as a way of communicating to Logstash which message structure to expect.

The Logstash configuration was structured by means of a series of if-then-else statements. Each message passed to Logstash was subjected to a different set of filters based on the value of the service_owner field, populated by Filebeat. In addition to the grok filters shown in Listing 3.4, several additional steps were carried out.

Modification of the timestamp By default, the timestamp field of the log message denotes the time at which Logstash receives the message. The date filter plug-in was used to update the field so as to carry the date and time at which the message was written by RayStation.

Extracting the exception class name Some log messages carry information about an exception having been thrown. These messages typically have a higher severity level, Error or Critical. Logstash was configured to subject such messages to a secondary grok filter evaluation in an attempt to extract the class name of the exception thrown.

Collecting additional information Each of the processed log types contains messages describing database-related errors. These messages contain an extra set of fields that are of interest during debugging. Logstash was configured to subject such messages to a secondary grok filter evaluation in order to extract the additional information points.
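The timestamp adjustment can be expressed with a short date filter. The following is a sketch, not the configuration actually used: the field name log_timestamp and the listed formats are assumptions based on the log samples in Section 2.4.

filter {
  # Replace the ingestion time in @timestamp with the time the message
  # was originally written, previously extracted into "log_timestamp".
  date {
    match => [ "log_timestamp",
               "ISO8601",                  # Index Service style timestamps
               "dd MMM yyyy, HH:mm:ss" ]   # Error log style timestamps
    target => "@timestamp"
  }
}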

LISTING 3.4: Part of the configuration for Logstash, showing how different types of messages were handled in the same pipeline. Additional filtering tasks such as adjusting the log timestamp and extracting exception class names have been substituted by ellipses (...) to improve readability.

filter {
  # === FILTERS FOR THE STORAGE TOOL LOG ===
  if [service_owner] == "RayStation_Storage_Tool" {
    grok {
      match => { "message" => "%{PATTERN_FOR_THE_STORAGE_TOOL_LOG}" }
    }
    ...
  # === FILTERS FOR THE INDEX SERVICE LOG ===
  } else if [service_owner] == "RayStation_Index_Service" {
    grok {
      match => { "message" => "%{PATTERN_FOR_THE_INDEX_SERVICE_LOG}" }
    }
    ...
  # === FILTERS FOR THE ERROR LOG ===
  } else if [service_owner] == "RayStation_Error_Log" {
    grok {
      match => { "message" => "%{PATTERN_FOR_THE_ERROR_LOG}" }
    }
    ...
  }
}

To simulate data being written in real time by RayStation, a simple program was written to copy existing log files to a new location one line at a time. While this behavior might have been better demonstrated by actually running RayStation and performing treatment planning tasks, copying existing files proved to be a more efficient way of generating relevant log messages in a reliable way.

Logstash was configured to ship data to several different indices in Elasticsearch. Each message was assigned to an index based on its date and its source. A message from the RayStation Error log dated March 13th, 2018 would be indexed to the logstash-raystation-error-logs.2018.03.13 index in Elasticsearch. Visualizations suited to the new types of data were developed in Kibana.
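The per-source, per-date index naming can be achieved by interpolating fields in the Elasticsearch output. A sketch reusing the index_base field set by Filebeat in Listing 3.3; the host address is an assumption:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # Expands to e.g. logstash-raystation-error-logs.2018.03.13
    index => "logstash-%{index_base}.%{+YYYY.MM.dd}"
  }
}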

3.4 C2: Centralizing logs in a virtual environment

Purpose The purpose of the fourth prototype was to investigate how components of the logging solution could be deployed on different machines, and to investigate security measures for limiting access to the Elasticsearch and Kibana instances.

Description The fourth prototype employed a virtual machine to simulate shipping data to a central location. The log generation software and the Filebeat client were installed locally, and log data was shipped to the virtual machine running Logstash, Elasticsearch and Kibana, as seen in Figure 3.5. The virtual machine was set up to run Ubuntu Server 18.04 LTS using Oracle VM VirtualBox. No considerable changes were made to the configurations of Filebeat, Logstash, Elasticsearch or Kibana, apart from redirecting the Filebeat output to a remote IP address instead of the local loopback address. The same utility that was used in prototype C1 was used to generate log messages by copying existing files. Although Kibana was hosted on the virtual machine, the web interface was accessed from the local workstation.

FIGURE 3.5: Several log files are monitored by Filebeat, forwarding any new log messages to a Logstash instance hosted on a virtual machine. Logstash matches the messages against one or more grok patterns and ships the resulting JSON object to a local Elasticsearch instance. An instance of Kibana enables interaction with the data via queries and dashboards. Access to Kibana is restricted by means of a reverse proxy server. The connection between Filebeat and Logstash was secured using self-signed certificates.

The real environment counterpart to the virtual server would be a central machine within a clinic, to which workstations running RayStation forward their logs. Access to the Kibana web interface from each workstation would allow for monitoring of the log messages sent, offering transparency about their contents. Whether the workstations should have access and insight into all data collected on the central machine (i.e. data from other workstations), or whether access should be restricted to privileged users, remains a design parameter.

To demonstrate how access to Kibana can be restricted, a reverse proxy server was set up on the virtual machine using the open-source web server Nginx. The reverse proxy acts as an intermediary between the Kibana server and its clients. Instead of exposing Kibana to the entire network, clients send requests to the proxy, which in turn forwards the requests to Kibana. Kibana and Elasticsearch lack security features such as TLS (Transport Layer Security) out of the box, but these are easily implemented in Nginx. OpenSSL was used to generate the self-signed certificates required for establishing encrypted connections. Clients requesting access to the Kibana server were forced to authenticate with a username and password over an encrypted (HTTPS) connection. Successfully authenticated requests were then forwarded by the proxy to Kibana locally over HTTP.

OpenSSL was used to generate self-signed server and client certificates for Logstash and Filebeat. The certificates were used to set up a secure connection (SSL) between the Filebeat client and the Logstash server, having Logstash only accept incoming connections from trusted sources.
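An Nginx server block implementing this behavior could look roughly as follows. This is a minimal sketch: the hostname, certificate paths and password file are assumptions, and Kibana is assumed to listen on its default port 5601.

server {
    listen 443 ssl;
    server_name kibana.clinic.local;                      # hypothetical hostname

    # Self-signed certificate generated with OpenSSL
    ssl_certificate     /etc/nginx/ssl/kibana.crt;
    ssl_certificate_key /etc/nginx/ssl/kibana.key;

    # Username/password authentication
    auth_basic           "Restricted access";
    auth_basic_user_file /etc/nginx/htpasswd;

    location / {
        # Forward authenticated requests to Kibana over plain HTTP
        proxy_pass http://localhost:5601;
    }
}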

3.5 C3: Monitoring multiple workstations

Purpose The purpose of the fifth prototype was to move the solution to multiple physical machines. The virtual server was replaced, and Filebeat was set up to monitor multiple client machines. In order to investigate options for persistent storage of log files, Logstash was configured to not only output events to Elasticsearch, but to also save data to file.

Description The Logstash-Elasticsearch-Kibana pipeline, previously hosted on a virtual server, was moved to a physical server running Windows Server 2008. No changes were made to the Filebeat configuration apart from redirecting the output to the physical server.

The Logstash configuration was expanded to include file outputs for persistent storage of log messages outside Elasticsearch. Two file outputs were configured: one storing log messages as one-line JSON objects, and one being a copy of the original file. The first was intended to store data in a way that would easily allow importing it into another Elasticsearch instance. The second was intended to store the data in a format that was more human-readable. In case of data loss at the later stages of the pipeline, these files could be processed instead of having to collect every log message from the edge machines. In addition to the Elasticsearch and file outputs, a console output was added for debugging purposes. The setup is shown in Figure 3.6; a sketch of the expanded output section is shown at the end of this description.

Filebeat was distributed to machines used by a subset of the general RaySearch development team. Prospectors were set up to monitor actual RayStation log files. All previously logged messages, some several years old, were deemed to be of interest and were therefore forwarded to Logstash.

The collection of log messages from multiple machines increased the size of the data set. Messages with previously unseen structures were discovered and had to be catered for with improved grok filter patterns. Logstash was configured to send log messages with unrecognized structures to specific Elasticsearch indices, facilitating the discovery of filter errors by using Kibana to browse the messages in these indices.
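The expanded output section could look roughly like the following sketch; the backup paths are hypothetical, and the field interpolations reuse names introduced in Listing 3.3.

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  # One-line JSON documents, easy to re-import into Elasticsearch
  file {
    path => "D:/LogBackups/%{service_owner}-%{+YYYY.MM.dd}.json"   # hypothetical path
    codec => "json_lines"
  }
  # Human-readable copy of the original messages
  file {
    path => "D:/LogBackups/%{service_owner}-%{+YYYY.MM.dd}.log"    # hypothetical path
    codec => line { format => "%{message}" }
  }
  # Console output for debugging
  stdout {
    codec => "rubydebug"
  }
}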

FIGURE 3.6: Several Filebeat instances monitor multiple log files each, forwarding any new log messages to a Logstash instance hosted on another machine. Logstash matches the messages against one or several grok patterns and ships the resulting JSON object to a local Elasticsearch instance. It also creates local backups of the original files. An instance of Kibana enables interaction with the data via queries and dashboards.

3.6 C4: Simulating the clinic-to-RaySearch relationship

Purpose The sixth prototype was a fully featured in-house solution. Multiple machines, all within the private RaySearch domain, were employed to simulate separate Clinic and RaySearch environments. Filebeat clients were distributed on multiple machines in the clinic environment. The machines were running RayStation, continually generating log messages. A server within the clinic environment hosted Logstash, Elasticsearch and Kibana instances.

Description The clinic-side Logstash instance shipped data not only to the local Elasticsearch instance, but also to a remote Elasticsearch instance. This instance was hosted on a server within the RaySearch environment, also running a Kibana instance, enabling RaySearch to analyze the log data supplied by the clinic. The purpose of the clinic-side Elasticsearch and Kibana installations was to give clinics insight into the collected data and to enable them to perform analyses of their own.

Logstash instances were used in both environments to different degrees. The clinic-side instance was set up to serve Elasticsearch instances in both environments, as well as to create backup files following the same procedure as in prototype C3. The RaySearch Logstash instance, served by a Filebeat instance set up to monitor the clinic-side backup files, was used to recreate the backup files in the RaySearch environment as well. The setup is shown in Figure 3.7.

FIGURE 3.7: Several Filebeat instances monitor several log files each, forwarding any new log messages to a Logstash instance hosted on another machine within the Clinic environment. Logstash matches the messages against one or several grok patterns and ships the resulting JSON object to both local and remote Elasticsearch instances. It also creates local backups of the original files. A clinic-side Filebeat instance works with a Logstash instance in the RaySearch environment to recreate the backup files there. Kibana instances in both environments enable interaction with the data.

Support for additional log file formats (the PatientService, SenderService and TreatmentService logs) was added by supplying Logstash with new grok filter patterns. The clinic-side Logstash instance was configured to add a new field, clinic_name, to every processed message. The field could be used within the RaySearch environment to distinguish between data coming from different clinics.
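Adding such a field amounts to a one-line mutate filter; a sketch with a hypothetical clinic identifier:

filter {
  mutate {
    # The value would be set per clinic during deployment
    add_field => { "clinic_name" => "example-clinic" }   # hypothetical identifier
  }
}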

3.7 C5: Finalizing the proof-of-concept solution

Purpose The purpose of the final prototype was to test the logging solution in a larger-scale environment.

Description The seventh prototype featured various improvements and optimizations to the previous version. The scope of testing was greatly increased as Filebeat clients were installed on 38 machines running RayStation. The increased workload resulted in severe performance issues. These issues were combated by tuning the Elasticsearch configuration, namely by reducing the number of indices used.

The file backups generated in prototype C4 were discontinued due to performance and functionality considerations. The Logstash console output used for debugging was disabled as well, further increasing performance. The setup is shown in Figure 3.8.

FIGURE 3.8: Several Filebeat instances monitor several log files each, forwarding any new log messages to a Logstash instance hosted on another machine within the Clinic environment. Logstash matches the messages against one or several grok patterns and ships the resulting JSON object to both local and remote Elasticsearch instances. Kibana instances in both environments enable interaction with the data via queries and dashboards.

In Kibana, custom visualizations and a dashboard were designed to meet the criteria specified in Section 1.2. The designs were informed by conversations with parts of the development team to meet their specific needs and to handle common use cases. Figure 3.9 shows a snapshot of the dashboard displaying data from the last 24 hours with a set of filters applied. The time setting can be changed freely to display data between any two points in time, and every visual element can be interacted with to filter the data further. The message entries displayed can be expanded to reveal additional information about each event.

FIGURE 3.9: The dashboard developed for the in-house deployment, set to display data from the last 24 hours. The topmost row of red and green elements shows the filters currently applied: currently, only Error and Critical level alerts from the Index Service or Error logs are shown. The drop-down menus provide shortcuts to custom filters for easy access. 706 messages sent from 28 different machines match the criteria within the given time frame. The top pie chart shows the distribution of messages over different source file categories: 75 % of messages are from Index Service logs. The bottom pie chart shows the distribution of messages per machine: close to 80 % of messages come from only two machines. The vertical bar chart shows the distribution of messages over time, with different log file categories separated by color. The horizontal bar chart shows the distribution of different severity levels. The word cloud at the bottom displays the class names of a set of frequently occurring exceptions, with font sizes relative to their respective frequencies. 68 messages have been tagged with the _grokparsefailure tag, indicating a failure in some part of the filtering process. Every panel can be expanded to show a table of the underlying data. The message entries displayed can be expanded to reveal additional information about each event.

4 Evaluation and results

A series of performance tests were carried out to inform design decisions and to research potential improvements. The viability of centralized file storage was evaluated, as well as the benefits offered by alternative logging libraries and log message formats. All tests were run on a Mac Pro machine with an Intel Xeon E5-1680 v2 CPU at 3.00 GHz and 32 GB of RAM, running a (non-virtual) version of Windows 10.

4.1 File output impact on performance

One of the framework objectives was to provide persistent, centralized storage of log messages. While it can be argued that storage on a centralized Elasticsearch server sufficiently addresses this requirement, the implementations described in Sections 3.5 and 3.6 researched the viability of also recreating the original log files. In addition to considering the practical benefits and limitations offered by such a solution, the performance impact on the processing pipeline was measured. File creation is handled by Logstash.

Filebeat, Logstash, Elasticsearch and Kibana were set up similarly to the C3 implementation shown in Figure 3.6, with the exception of Filebeat being deployed on the same machine as the other components, and isolated from any real RayStation log files. A sample of 50 000 Error log messages was prepared, spread across five log files. Logstash was configured to attach its own timestamp, with millisecond precision, during processing, instead of using the timestamp field contained in the log messages. Only a single file output was configured, as opposed to the two shown in Figure 3.6.

The messages were fed to Filebeat, which forwarded them to Logstash, which in turn output the processed messages to Elasticsearch and local files. Kibana was used to browse the messages sent to Elasticsearch, and the processing time was determined as the time difference between the timestamps of the first and last messages. Measurements were carried out three times and the measured times were averaged. The results are shown in Figure 4.1. Enabling the file output results in an average increase of 33.8 % in time taken per 50 000 processed events.

FIGURE 4.1: Difference in Logstash processing time of 50 000 log messages, with and without a file output enabled. Measurements were performed three times and the results averaged. Error bars show one standard deviation. The addition of the file output results in an average increase of 33.8 % compared to a pipeline with a single (Elasticsearch) output.

4.2 Comparison of two logging libraries

RayStation currently lacks a consistent model for logging. Message formats and structures vary across different log files, and even across messages within the same files. The standardization of a logging format would greatly enhance the robustness of any framework parsing log messages. In particular, a transition to structured, as opposed to traditional, logging would solve many of the issues involved with maintaining a working configuration of Logstash. The practical benefits of structured logging are discussed in detail in Chapter 5.

The performance of the currently used logging library, Microsoft Enterprise Library Logging, was compared to that of Serilog, an open-source alternative with extensive support for structured logging. Two sample messages were prepared, one being a short single-line message and the other a long multi-line message (49 lines). A utility was programmed in C# to write the log messages and to time the execution. Microsoft Enterprise Library Logging was compared to both unstructured and structured (JSON) printouts of Serilog. The Serilog JSON output was configured to use the CompactJsonFormatter [6]. The resulting log messages were one-line JSON objects consisting of a series of key-value pairs, one of which is a message template field describing the layout of the fields in the original message.

Each measurement involved writing 10 000 copies of the long and short messages using each respective logging library configuration. Measurements were performed 10 times and the results were averaged. Figure 4.2 shows a comparison of the number of seconds taken to write 10 000 messages using the different configurations.
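For reference, a Serilog logger producing compact JSON output can be configured in a few lines of C#. This is an illustrative sketch, not the benchmarking utility itself; the file name is arbitrary.

using Serilog;
using Serilog.Formatting.Compact;

class LoggingExample
{
    static void Main()
    {
        // Write structured events as one-line JSON objects
        Log.Logger = new LoggerConfiguration()
            .WriteTo.File(new CompactJsonFormatter(), "log.json")
            .CreateLogger();

        // "Input" is stored as a named property alongside the message template
        Log.Information("Read valid input: {Input}", 2);

        Log.CloseAndFlush();
    }
}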

FIGURE 4.2: Time taken to write 10 000 log messages, both short (in blue) and long (in red), using different logging libraries and configurations. Measurements were performed ten times and the results averaged. Error bars show one standard deviation. For long, plain text messages, Serilog and Microsoft Enterprise Library Logging perform very similarly. For shorter messages, Serilog has an edge, sporting a 32 % reduction in average time taken. Writing short messages with Serilog is slightly slower in JSON than in plain text, but in both cases faster than with Microsoft Enterprise Library Logging. The biggest difference is seen when comparing long messages written in JSON and those written in plain text: writing messages in a structured format reduces the average time taken by 35 %.

4.3 Logstash performance with structured messages

Messages written using structured logging do not require the same attention from Logstash. The number of filters used can be drastically reduced, and (sometimes) costly regular expression comparisons can be avoided. The performance of Logstash was evaluated by having it process 50 000 log messages of each type, i.e. plain text and JSON formatted messages. The messages were sent to Logstash from Filebeat and later shipped to Elasticsearch. As in Section 4.1, Kibana was used to compare the timestamps attached by Logstash to measure the processing time. The process was repeated three times for each respective message format and the measured times were averaged. Figure 4.3 shows a comparison of the number of seconds taken to process 50 000 events.
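For JSON formatted messages, the battery of grok filters can be replaced by a single json filter; a minimal sketch, assuming Filebeat delivers one JSON object per line in the message field:

filter {
  json {
    # Parse the JSON document and promote its attributes to top-level fields
    source => "message"
  }
}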

FIGURE 4.3: Difference in average time taken by Logstash to process 50 000 messages formatted in plain text and JSON, respectively. Each set of messages contained 50 000 copies of identical messages. Measurements were carried out three times and the measured times were averaged. Error bars show one standard deviation. The average processing time of JSON formatted messages is 18.9 % lower than that of plain text messages.

5 Discussion

5.1 Evaluation results

5.1.1 File output impact on performance

The measurements on the performance impact of implementing file outputs from Logstash showed that doing so resulted in a significant increase – 33.8 % – in time taken to process log messages. While this is a significant slowdown in relative terms, the absolute increase in time – 4.12 seconds per 50 000 log messages, or just 0.08 milliseconds per single message – is negligible, at least from a design perspective. The error monitoring applications the framework is designed for do not require log messages to be delivered so urgently: a typical troubleshooting process handles data hours, days or even weeks old.

Nevertheless, the figures have to be compared to the expected load of the clinic-side Logstash instance. Too large an increase in the processing time per log message would cause the processing queue to build up indefinitely if log messages from clinic workstations are forwarded with a high enough frequency. The results suggest that a single Logstash instance, on a machine with moderate specifications and a single Elasticsearch output configured, can handle 50 000 / 12.22 ≈ 4 092 messages per second, or ≈ 245 500 events per minute. Enabling a file output slows the rate to ≈ 3 058 messages per second, or ≈ 183 519 messages per minute.

The framework was deployed in-house, monitoring 38 workstations actively running RayStation for testing during a formal validation phase. During one such week of use, a total of 279 418 log messages were collected from 37 machines, averaging 27.7 events per minute or 0.46 events per second. It is safe to say that this amount of usage is well within the bounds of Logstash's processing capabilities, regardless of whether a file output is configured or not. Whether this amount of usage is representative of a medium or large clinic is hard to determine, but it is hard to imagine that usage could be upwards of 665 000 % of that recorded during in-house testing, which is the point at which the delay introduced by the file output begins to create concerns. Even then, Logstash can easily be scaled horizontally by allowing multiple instances to share the load. It should be concluded that an increase in processing time from file output is of no concern in deciding whether recreated log files should be used as a means for persistent back-up storage of logs.

File backups do, however, come with other costs and issues. Storing messages not only in Elasticsearch, but also in a file (or even two, as described in Sections 3.5 and 3.6 on prototypes C3 and C4), increases the disk space requirements by roughly 100 % (or 200 %) compared to only using Elasticsearch as storage. Additionally, because of Logstash's ability to utilize multiple processor cores to handle multiple messages in parallel, messages written to files are not always written in chronological order. Messages being processed and shipped in parallel is of no concern to Elasticsearch, since messages are sorted by log timestamps and not by the time of ingestion. While the loss of chronology does not render the option of files as a backup solution completely useless, it is highly impractical should one wish to use the traditional method of manually reading the log files. The use of file backups was ultimately abandoned in implementation C5 (see Section 3.7).
Here, a log message generated on any clinic workstation exists in three separate physical locations in a matter of seconds: on the local disk drive of the workstation itself, in the clinic-side Elasticsearch cluster and in RaySearch’s own cluster. This was deemed to be a more than sufficient amount of redundancy to fulfill the requirement of reliable, persistent storage.

5.1.2 Performance of logging libraries

Section 4.2 compared the performance of two logging libraries, namely Microsoft Enterprise Library Logging, used extensively in RayStation, and Serilog, an open-source alternative focused on structured logging. Serilog was found to perform on a par with Microsoft Enterprise Library Logging, even outperforming it in some configurations.

The performance of Microsoft Enterprise Library Logging has never been questioned, and the amount of work involved in switching logging libraries is too great to be motivated by a meager performance increase. It could be argued that one or the other is more easily implemented and used during development, but that question is outside the scope of this thesis. Serilog is of interest because of its structured logging capabilities, not its potential performance benefits. These measurements were mainly made to ensure that it does not come with any performance deficits, unhampered performance being one of the requirements listed in Section 1.2.

5.1.3 Logstash performance with structured messages

The structured log messages written by Serilog and formatted in JSON allow the previously implemented, expensive grok patterns in Logstash to be bypassed. With information already categorized in separate fields, messages require minimal processing by Logstash. The measurements conducted in Section 4.3 served to show that doing so can result in a 2.3 second, or 18.9 %, reduction in time taken per 50 000 processed messages. In a real scenario, the reduction might be even bigger, considering that the pipeline using grok filters for information extraction was operating under near optimal conditions: all messages were identical and of moderate length. A real application has to handle diverse messages of varying lengths. During development, messages with particularly large field values or unexpected structures have been seen to cause grok timeout failures, meaning the filter evaluation exceeded the default maximum processing time of 30 seconds.

Following the same reasoning as in Section 5.1.1, this performance improvement is not significant. In the test environment, the volume of messages handled has Logstash idling the majority of the time. Indeed, should the information contained in the messages causing timeouts be deemed important enough, settings in Logstash can be modified to increase the processing time limit. Conversely, should the message volume increase, the limit can be lowered so as to minimize the impact of large message processing issues on the overall performance of the pipeline. The benefits of structured logging are, however, not mainly related to performance, but to robustness. The grok filters used in the proof-of-concept framework are the single most likely point of failure in the entire pipeline. Pattern comparisons can time out, or fail to match message contents, either entirely or only in parts. Relying on a set of grok filters, maintenance of the framework becomes an issue as well. Once the Filebeat-Logstash pipeline is deployed in clinics, any update to the logging behavior of RayStation (e.g. the addition of new fields in an existing log or other changes to the log message structure) must be accounted for by updating the clinic-side Logstash configuration. Even if this could be achieved, the pipeline's ability to process old log messages would be seriously impeded, if not destroyed entirely.

Structured logging, on the other hand, puts developers in immediate charge of log message characteristics. A new field added using structured logging is just another attribute-value pair in a JSON object, which Logstash readily passes on to Elasticsearch, even if it is encountering the new message structure for the first time. Structured logging has some drawbacks of its own. With developers in charge of categorizing log message information, conflicts and information duplication can occur. Two developers working on different program modules might want to log an IP address and decide on different names, one choosing ip and the other ip_address, for example. The result is documents having differently named fields containing the same information. The query ip:192.1.1.1 will only turn up the messages logged using the first of the two logging calls.

The most straightforward remedy to this issue is agreeing on a company standard for log message attribute names. Although less elegant, an alternative solution is to set up filters in Logstash for renaming and/or consolidating information contained in similarly named fields. An example of this practice is seen in Listing 5.1.

LISTING 5.1: Renaming a field using the mutate filter.

if "" in [ip_address] {
  mutate {
    rename => { "ip_address" => "ip" }
  }
}

This, naturally, constitutes a step back towards the complications of maintaining consistent Logstash configurations described with regards to traditional logging. The grok filtering solution, on the other hand, centralizes the naming of fields. Developers only need to log the information, e.g. an IP address, letting Logstash take care of the naming of the field further down the pipeline, reducing the risk of naming conflicts.

Another drawback of structured logging is the reduced readability of the raw log files. A plain text log message is more easily parsed by a human than a JSON object defined on a single line. Depending on the total overhead that logging constitutes in RayStation, a possible solution would be to write two sets of logs: one set in JSON and one human-readable. Another solution would be to use a tool for parsing JSON message templates and substituting attributes to make the messages easier to read.¹

¹ An example of such a tool is clef-tool, found at https://github.com/datalust/clef-tool.

5.2 Security considerations

Prototype C2 (see Section 3.4) explored ways in which data transmissions within the framework can be secured: data transmissions between Filebeat and Logstash instances were secured using self-signed certificates, and access to Kibana was restricted using a reverse proxy server. More advanced prototypes, namely C4 and C5 (see Sections 3.6 and 3.7), introduced separate clinic and RaySearch environments. In a real deployment of the framework, securing the connection between the clinic-side Logstash instance and the RaySearch Elasticsearch cluster should be considered a top priority. This is the only connection using the Internet, since both Filebeat-Logstash data transfers and uses of the Kibana interface are intended to take place within the local network of the clinic, or of RaySearch, respectively. To achieve a secure connection between the clinic and RaySearch environments, both methods previously considered can be employed in tandem.

A reverse proxy server should be set up at RaySearch in front of the central Elasticsearch cluster. The server should be configured to accept incoming connections over HTTPS, requesting a username and password combination before forwarding requests to Elasticsearch. Separate username and password combinations can be set up for each clinic. While this solution offers sufficient security for access to Kibana, authentication solely based on username and password prompts might not be enough for a server exposed to the Internet. Having the reverse proxy server accept any incoming connection makes the central storage susceptible to brute force or denial-of-service (DoS) attacks. An attacker could try to guess a clinic's username and password combination by repeatedly connecting to the proxy server using different credentials every time. Even if such an attack should not result in a breach, the increased traffic to RaySearch could result in clinics not being able to successfully deliver their data, the required bandwidth being occupied by malicious requests.

To improve on the security provided by the reverse proxy server, (self-signed) client certificates can be issued to clinics, to be used by Logstash when sending log data. This effectively produces a two-factor authentication process, where the clients behind incoming connections not only have to supply the correct credentials, but must also offer a valid certificate.
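On the clinic side, the corresponding Logstash output could look roughly like the following sketch. The endpoint, credentials and certificate path are assumptions; client certificate authentication would additionally require a keystore configured on the output.

output {
  elasticsearch {
    hosts => ["https://logs.raysearch.example:443"]   # hypothetical endpoint behind the proxy
    user => "clinic_user"                             # per-clinic credentials
    password => "${CLINIC_PASSWORD}"
    ssl => true
    cacert => "/etc/logstash/certs/proxy-ca.crt"      # certificate used to verify the proxy
  }
}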

5.3 Privacy considerations

A critical objective for the framework is to comply with the necessary safety and privacy regulations. The processes and components involved in the framework must protect and respect the privacy of individuals, be they patients or clinic employees. The HIPAA and the GDPR lay out separate sets of safety and privacy requirements that must be fulfilled.

The straightforward solution is elegantly simple: handle no personal data. Access to the personal data of patients or clinic employees provides no significant benefit to the intended and potential future uses of the proposed log management framework. The framework should thus be implemented in a way that eliminates the transmitting, processing and storing of personal data. Then again, appropriate measures have to be implemented as safeguards should such information make its way into the log files.

5.3.1 HIPAA

Section 2.1.4 mentions five major areas where HIPAA impacts the design and deployment of RayStation. To the extent that these requirements extend to the log management framework, most are neatly fulfilled by designs already in place. Physical safeguards already protecting RayStation apply to components of the log management framework as well, and are largely enforced by the clinics themselves. If no personal data is handled, no Technical safeguards or Audit reports are needed. If future developments of the framework should incorporate the handling of personal data in any way, there are tools for both role-based access and audit logging. Even so, reverse proxy servers have been demonstrated to be a straightforward way to restrict access to the collected data (see Section 3.4). The requirements for Technical policies to reduce the risk of patient data loss are irrelevant with regard to the framework. No personal data is intended to be stored throughout the logging solution, and even if that were the case, such data is not needed to provide patients with the treatment they require. Off-site backups of the collected data might be of interest nevertheless, but that is a concern related to the general functionality and robustness of the framework, not a concern related to personal data privacy.

The one item in the list of HIPAA requirements of substantial concern to the design of the log management framework is the one regarding Network, or transmission, security, which puts demands on the security of all data transfers. Various means of securing data transmissions throughout the framework have been tested and are described in further detail in Section 5.2. The measures described have been deemed necessary to fulfill the requirements put forth in the HIPAA.

5.3.2 GDPR

Section 2.1.4 outlines a number of demands stated in the GDPR. The default approach to fulfilling the requirements laid out is to eliminate or otherwise pseudonymize all personal data at the earliest possible point in the framework pipeline. The earliest point of elimination is of course the moment a log message containing sensitive information is written. The types of log messages written by RayStation are defined and formatted by RayStation developers. RaySearch has already successfully implemented privacy by design in its development of RayStation, but before any kind of log management framework is distributed to clinics, logging practices must be thoroughly re-examined. With the advent of logs leaving clinics, personal data that was previously considered to be safely contained within the clinic environment might be inadvertently transmitted to RaySearch. While modifying RayStation's logging behavior is outside the scope of this thesis, the previously suggested transition to a structured logging model could be argued to have some security and privacy benefits, in addition to the increase in robustness it would provide the logging solution. In a structured logging call, attributes have to be named, preferably in a way that is descriptive of their contents. The naming process of logged attributes has the potential to increase awareness about whether any personal data is logged or not.

The problem becomes instead that of making sure that personal data is filtered out if it is logged despite the developers' intentions. The proposed framework offers multiple ways to handle inadvertently logged personal data. Although Filebeat offers some simple tools for dropping individual lines based on some pattern, Logstash provides much more sophisticated options. Besides, it is not essential to contain the data on a single machine. If personal data is logged on a workstation, the issue is not worsened significantly by its transmission to a local server running Logstash. The main objective is to keep the data from making it into persistent storage in either of the Elasticsearch clusters at the clinic or, most importantly, at RaySearch.

Logstash can easily be configured to remove specific fields or to drop entire messages based on some criterion, but this approach may be found to be too inflexible. While most personal data is entirely irrelevant in an error monitoring context, there are a few attributes currently contained in the logs that are of potential use. The RayStation Error log in particular contains a number of attributes that can be considered personal data with regard to clinic employees. These include the name of the computer the message was logged on, its associated IP address, and the name of the currently logged-in user. It might be of interest, for example, whether a specific error occurs on one or multiple machines. Moreover, support might be offered more quickly and reliably if the users and machines having problems can be identified.

One solution to this issue is to contractually agree with clinics to use certain types of data for purposes to which they offer their consent. Another would be to anonymize or pseudonymize the information, removing the possibility of identifying an individual while preserving the capability to cross-reference related messages. Logstash has filters for this purpose, allowing for sensitive data to be protected with hashing.
Using the fingerprint filter plug-in, values of sensitive fields can be replaced with hash values. When a log message is processed, two documents are produced. One contains the pseudonymized data, i.e. hash values. The other contains the original values, each of which is coupled with a hash value constituting a key. This document can be stored in another location, e.g. at the clinic. A flowchart of what the pseudonymization process could look like is shown in Figure 5.1.
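A minimal sketch of the hashing step, assuming the user name has been extracted into a field named user; the key would be a secret held by the clinic:

filter {
  fingerprint {
    source => "user"
    target => "user"             # overwrite the original value with its hash
    method => "SHA256"
    key => "${CLINIC_HASH_KEY}"  # hypothetical secret key
  }
}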

FIGURE 5.1: Values of sensitive fields can be replaced with hash values before a log message is sent to RaySearch. The original data can be stored at the clinic, where the hash values are used as keys to identify original values.

5.4 Issues encountered during development

During development of the framework, a set of issues was encountered. Whereas some were dealt with successfully, others will require consideration in future implementations of the framework.

5.4.1 Managing clinic-side component configurations

Installation of the Filebeat client requires action on the part of the user of every client machine. If the specific configuration of Filebeat must be changed, each machine has to be tended to separately. If, e.g., a new log file is introduced in RayStation, a prospector and the appropriate multi-line configuration have to be added.

This issue is deemed to be of moderate concern in a real clinic setting. An actual clinic environment is more static than the general development environment at RaySearch, in that clinics typically purchase one major version of RayStation and use it for an extended period of time. While software features are not added or removed on a daily basis, as in development, software updates are typically installed every 6–12 months. While Filebeat and RayStation are installed separately in this proof-of-concept solution, future implementations should merge these processes. Filebeat should be considered an extra component during an install of RayStation, and any updates applied to RayStation should be able to update the Filebeat configuration files as necessary.

As was mentioned in Section 5.1.3, the clinic-side Logstash installation suffers from similar issues, with the added complication that the clinic's central machine cannot be expected to have RayStation installed. As such, the solution proposed with regards to the Filebeat configuration issues is not applicable. Section 5.1.3 covers structured logging as a means to improve the robustness of the framework. An alternative would be to opt for layers of Logstash filters. The clinic-side Logstash instance could be set up to only perform minimal filtering tasks, forwarding messages to another Logstash instance at RaySearch. This instance could be dedicated to more advanced filtering, and developers would have immediate control of its configuration.

This approach comes with two major drawbacks. Firstly, it inhibits the capability of clinics to make meaningful analyses of their own, since the data shipped to their Elasticsearch cluster will only be partially processed. Secondly, it increases the risk of sensitive data being transmitted to RaySearch, potentially violating the security and privacy requirements laid out in Section 2.1.4.

5.4.2 Logstash output limitations

When processing and forwarding a log message, Logstash makes sure to receive acknowledgements from all outputs before moving on to the next message in the queue. This ensures that every message reaches every configured output without data loss. A consequence of this is that, should one of the configured outputs be unavailable, log message forwarding to all other outputs will be halted as well (see the sketch at the end of this section).

This issue is deemed to be of moderate concern for future deployments. Responsibility for the upkeep of the clinic-side Elasticsearch instance will likely be carried by each clinic's IT department, limiting the amount of control RaySearch has. Should the clinic-side Elasticsearch instance be disabled for some reason or other, no log messages will reach RaySearch from that particular clinic. This is problematic from RaySearch's point of view, but might be considered a benefit to clinics, since it offers a switch whereby clinics can start or stop the flow of data to RaySearch. There are, however, more elegant ways of implementing such a switch should it be requested by a clinic. Clinics are expected to have full control over the clinic-side Logstash instance, and will thereby be able to configure the outputs as they see fit.

This proof-of-concept solution has only made use of single-node Elasticsearch clusters, in both the clinic and RaySearch environments. Single-node clusters are more susceptible to availability issues, since the whole cluster goes down should the single node experience problems. To improve availability and robustness, clusters may be expanded to consist of several nodes. For the RaySearch cluster, this approach would also offer the benefit of load balancing between nodes, should the workload of multiple clinics submitting data simultaneously prove too much for a single-node cluster.
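The behavior described above follows from the fact that all outputs in a single Logstash pipeline share one queue. A clinic-side output section along the lines of the sketch below (host names are hypothetical) would therefore stall deliveries to RaySearch whenever the local cluster is unreachable, and vice versa.

output {
  # Clinic-local storage, used for the clinic's own analyses.
  elasticsearch {
    hosts => ["https://clinic-es.clinic.local:9200"]
  }

  # Forwarding to RaySearch. If either output becomes unavailable,
  # Logstash stops acknowledging events and both deliveries halt.
  elasticsearch {
    hosts => ["https://logs.raysearch.example:9200"]
  }
}

In more recent Logstash versions, splitting the flow into multiple pipelines connected with the pipeline input and output plug-ins could decouple the two destinations, at the cost of a somewhat more involved configuration.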

5.4.3 Sparse indices vs. many indices

One of the general recommendations for maintaining good Elasticsearch performance is to avoid sparsity within indices. In a dense index, almost every document (log message) has a value for each field (e.g. severity or ip). In a sparse index, the opposite is true: the index maps a large number of fields, but each document only makes use of a small subset of them. The number of fields in an index is directly related to the size of each document stored within it: more fields mean larger documents, regardless of whether the documents have values for the fields or not. The most notable impact of sparsity is increased storage requirements, but indexing and search speeds are impaired as well.

During development of the logging solution, measures were taken to limit the impact of sparse indices. Since the different log files processed each specify a different set of fields, Elasticsearch was set up to create separate indices for each file. Indices were further separated by date, with a new index being created for each log file every day. This resulted in a very large number of indices, especially when data was collected from more machines in prototype C5 (see Section 3.7). A large number of indices (and thereby a large number of shards, or index subdivisions) results in performance issues as well, requiring an enormous memory footprint. This is known as the Kagillion shards problem [21, pp. 586–587].

During development of prototype C5, the performance issues encountered were combated by drastically reducing the number of indices used. Elasticsearch was reconfigured to put log messages from all file types in the same index, and to segment indices not on a daily but on a yearly basis. It was concluded that some sparsity within indices is acceptable, and that for an implementation of this scale, with each message populating around 30 of the 100 fields available in the index, limiting the number of indices created is of much greater importance.
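The change in indexing strategy amounts to a small change in the Logstash Elasticsearch output. The sketch below contrasts the two schemes; the index name prefix and the log_type field are hypothetical stand-ins for the actual configuration.

output {
  elasticsearch {
    # Original scheme: one index per log file type and day, e.g.
    # "raystation-errorlog-2018.05.14", yielding many small, sparse indices.
    # index => "raystation-%{log_type}-%{+YYYY.MM.dd}"

    # Revised scheme: a single yearly index shared by all file types.
    index => "raystation-%{+YYYY}"
  }
}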

5.4.4 Clinics not fulfilling the minimal networking conditions

There may be clinics not fulfilling the minimal networking conditions stated in Chapter 3. The most likely reason is that the clinic workstations are completely isolated, even lacking a local connection to a central machine. In such a case, two main options are available.

The first, naturally, is to agree with the clinic to install the hardware required to fulfill the minimal conditions. The second is to arrange for the transfer of log files from workstations to a central machine by other means than a local connection. Portable physical drives could be used to periodically move log files from each workstation to the central machine responsible for shipping the data to RaySearch. Instead of being installed on every workstation, Filebeat would then be set up on the central machine, monitoring some folder to which log files from other machines are copied periodically or on demand.

Setting aside, for a moment, the administrative difficulties of organizing these transfers, this approach would limit the usefulness of the information appended to the log entries. The machine name, for instance, is not logged by RayStation in files other than the Error log. However, when Filebeat performs its initial processing, it enriches each entry with information about the machine it is itself running on. In this manner, all other types of log messages are traceable to the machine that logged them, as long as Filebeat is running on that same machine. This functionality would be lost by having a single Filebeat instance process the log messages from all workstations within a clinic. One way of getting around this issue is to modify RayStation's logging behavior so that more fields are logged. Another option would be to, instead of having a common location where the logs from all machines are copied, set up a set of different folders, one for each unique machine, and copy each machine's log files to its respective folder. The file location could then be used to discern the machine name, as sketched at the end of this section.

With these complications taken into consideration, the main issue should still be considered to be the effort required of clinic staff to carry out the manual transfer of data from workstations to a central machine.
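To sketch the folder-per-machine workaround, suppose logs are copied to a hypothetical layout such as D:\collected-logs\<machine name>\<file>. Filebeat records the path of each harvested file in the source field, from which a Logstash grok filter can recover the machine name. The field names and folder layout here are assumptions for illustration only.

filter {
  # Extract the originating machine name from the folder structure,
  # e.g. "D:\collected-logs\WORKSTATION-01\Error.log" (hypothetical).
  # The character class [\\/] matches either path separator.
  grok {
    match => { "source" => "collected-logs[\\/](?<machine_name>[^\\/]+)[\\/]" }
  }
}

The extracted machine_name field could then stand in for the enrichment that a per-workstation Filebeat instance would otherwise provide.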


6 Conclusion

The solution developed and implemented during this thesis succeeds in demonstrating the viability of a log management framework based on the Elastic stack. All essential requirements for general functionality and patient data privacy are met. Issues pertaining to the framework in general and to RayStation's unique conditions in particular have been discussed, and suggestions are offered on potential improvements.

6.1 The access problem

The proposed framework offers a solution to the access problem based on the minimal networking conditions stated in Chapter 3: that every workstation has at least a local connection to at least one machine within the clinic that is connected to the Internet.

The proposed solution has two main software components: Filebeat and Logstash. Clinic workstations run Filebeat, continuously forwarding log messages to a central machine within the clinic. The central machine runs Logstash, shipping data to RaySearch and to local data storage. A combination of reverse proxy servers and transport layer security (TLS/SSL) with client certificates has been demonstrated to be a suitable method for securing all data transfers. A clinic-side platform for data analysis and insight has been shown to provide the required amount of transparency.

6.2 The processing problem

The proposed framework offers a solution to the processing problem through the use of Logstash, an engine for data collection and transformation, with custom filters designed specifically for RayStation log files.

Data extraction has been shown to be customizable and fine-grained enough to provide a basis for analysis. Logstash has been demonstrated to provide tools suitable for filtering out sensitive information contained in log messages. The benefits of transitioning to structured logging, with regard to both data extraction and security, have been discussed.

6.3 The analysis problem

The proposed framework offers a solution to the analysis problem through the use of two software components: the search engine Elasticsearch and the visualization platform Kibana.

Through an in-house deployment of the framework, it has been demonstrated that, together, Elasticsearch and Kibana provide the desired storage, search, and filtering capabilities. A sample of visualizations and dashboards tailored to the analysis of RayStation software errors has been designed for Kibana.


Bibliography

[1] Jordi Aballó Martínez. “Wi-fi tracking system and analysis”. Bachelor thesis. 2017. URL: https://upcommons.upc.edu/handle/2117/114124.
[2] Apache log4net. URL: https://logging.apache.org/log4net/.
[3] E C Beckmann. “CT scanning the early days”. In: The British Journal of Radiology 79.937 (Jan. 2006), pp. 5–8. ISSN: 0007-1285. DOI: 10.1259/bjr/29444122. URL: https://www.birpublications.org/doi/10.1259/bjr/29444122.
[4] Oren Ben-Kiki. YAML Ain’t Markup Language (YAML™) Version 1.1. URL: http://yaml.org/spec/1.1/current.pdf.
[5] Anders Bertelsen et al. “Single Arc Volumetric Modulated Arc Therapy of head and neck cancer”. In: Radiotherapy and Oncology 95.2 (May 2010), pp. 142–148. ISSN: 0167-8140. DOI: 10.1016/j.radonc.2010.01.011. URL: http://www.sciencedirect.com/science/article/pii/S0167814010000629.
[6] Nicholas Blumhardt. Serilog 2.0 JSON improvements. URL: https://nblumhardt.com/2016/07/serilog-2-0-json-improvements/.
[7] Tim Briggs. How does usage data improve the Office User Experience? – Microsoft Office 2010 Engineering. Feb. 2010. URL: https://blogs.technet.microsoft.com/office2010/2010/02/09/how-does-usage-data-improve-the-office-user-experience/.
[8] M. Kara Bucci, Alison Bevan, and Mack Roach. “Advances in Radiation Therapy: Conventional to 3D, to IMRT, to 4D, and Beyond”. In: CA: A Cancer Journal for Clinicians 55.2 (2009), pp. 117–134. ISSN: 1542-4863. DOI: 10.3322/canjclin.55.2.117. URL: https://onlinelibrary.wiley.com/doi/abs/10.3322/canjclin.55.2.117.
[9] Peter Carlson. Apache Lucene – Query Parser Syntax. Tech. rep. 2006. URL: https://lucene.apache.org/core/2_9_4/queryparsersyntax.pdf.
[10] Marji Cermak. Drupal and Logstash: centralised logging. Feb. 2016. URL: https://events.drupal.org/neworleans2016/sessions/drupal-and-logstash-centralised-logging.
[11] Jan Degerfält, Ing-Marie Moegelin, and Lena Sharp. Strålbehandling. Studentlitteratur, 2008.
[12] Diagnostics, feedback, and privacy in Windows 10 – Microsoft privacy. URL: https://privacy.microsoft.com/en-us/windows-10-feedback-diagnostics-and-privacy.
[13] Christof Ebert et al. “DevOps”. In: IEEE Software 33.3 (2016), pp. 94–100. ISSN: 0740-7459.
[14] Enterprise Library 6 – April 2013. URL: https://docs.microsoft.com/en-us/previous-versions/msp-n-p/dn169621(v%3dpandp.10).

[15] A. Ganapathi and D. Patterson. “Crash data collection: a Windows case study”. In: 2005 International Conference on Dependable Systems and Networks (DSN’05). June 2005, pp. 280–285. DOI: 10.1109/DSN.2005.32.
[16] Archana Ganapathi, Viji Ganapathi, and David Patterson. “Windows XP Kernel Crash Analysis”. In: Proceedings of LISA ’06: 20th Large Installation System Administration Conference (Dec. 2006), pp. 149–159.
[17] Alain Gerbaulet and European Society for Therapeutic Radiology and Oncology. The GEC ESTRO handbook of brachytherapy. OCLC: 52988578. Brussel: ESTRO, 2002.
[18] R. Gerhards. The Syslog Protocol. Tech. rep. RFC5424. Mar. 2009. DOI: 10.17487/rfc5424. URL: https://www.rfc-editor.org/info/rfc5424.
[19] Sai Rakesh Ghanta and Ayoush Mukherjee. “Cloud and Virtualization Based Log Management Service”. In: Advances in Computational Intelligence. Springer, 2017, pp. 211–219. ISBN: 978-981-10-2524-2 978-981-10-2525-9. DOI: 10.1007/978-981-10-2525-9_21. URL: http://link.springer.com/chapter/10.1007/978-981-10-2525-9_21.
[20] Jon Gifford. Why Loggly Chose ElasticSearch Over Solr. July 2014. URL: https://www.loggly.com/blog/loggly-chose-elasticsearch-reliable-scalable-log-management/.
[21] Clinton Gormley and Zachary Tong. Elasticsearch: the definitive guide. First edition. O’Reilly, 2015. ISBN: 978-1-4493-5854-9.
[22] E. Grosse and M. Upadhyay. “Authentication at Scale”. In: IEEE Security Privacy 11.1 (Jan. 2013), pp. 15–22. ISSN: 1540-7993. DOI: 10.1109/MSP.2012.162.
[23] Heya, Elastic Stack and X-Pack. Learn/Blog. Feb. 2016. URL: https://www.elastic.co/blog/heya-elastic-stack-and-x-pack.
[24] Hyperthermia in Cancer Treatment. URL: https://www.cancer.gov/about-cancer/treatment/types/surgery/hyperthermia-fact-sheet.
[25] “Intensity-modulated radiotherapy: current status and issues of interest”. In: International Journal of Radiation Oncology*Biology*Physics 51.4 (2001), pp. 880–914. ISSN: 0360-3016. DOI: 10.1016/S0360-3016(01)01749-7. URL: http://www.sciencedirect.com/science/article/pii/S0360301601017497.
[26] R. Kandan et al. “CLOF: A proposed containerized log management orchestration framework”. In: 2017 IEEE Conference on Open Systems (ICOS). Nov. 2017, pp. 13–16. DOI: 10.1109/ICOS.2017.8280266.
[27] D. Kim et al. “Which Crashes Should I Fix First?: Predicting Top Crashes at an Early Stage to Prioritize Debugging Efforts”. In: IEEE Transactions on Software Engineering 37.3 (May 2011), pp. 430–447. ISSN: 0098-5589. DOI: 10.1109/TSE.2011.20.
[28] Rauno Kuusisto and Erkki Kurkinen. Proceedings of the 12th European Conference on Information Warfare and Security: ECIW 2013. Academic Conferences Limited, Nov. 2013. ISBN: 978-1-909507-34-0.
[29] Kurt Lidén, Sören Mattsson, and R. Bertil R. Persson. Strålande miljö. Miljökunskap. Lund: Gleerup, 1971.
[30] Tim Menzies and Thomas Zimmermann. “Goldfish bowl panel: Software development analytics”. In: IEEE, June 2012, pp. 1032–1033. ISBN: 978-1-4673-1066-6 978-1-4673-1067-3. DOI: 10.1109/ICSE.2012.6227117. URL: http://ieeexplore.ieee.org/document/6227117/.

[31] Ruslan Mitkov. The Oxford Handbook of Computational Linguistics. OUP Oxford, 2004. ISBN: 978-0-19-927634-9.
[32] NLog. URL: http://nlog-project.org/.
[33] Alberto Paro. ElasticSearch Cookbook. Packt Publishing, 2013. ISBN: 978-1-78216-663-4. URL: http://ebookcentral.proquest.com/lib/uu/detail.action?docID=1572920.
[34] Carlos A. Perez, Luther W. Brady, and Edward C. Halperin. Principles and Practice of Radiation Oncology. LWW (PE), 2003. ISBN: 978-1-4698-8568-1.
[35] RaySearch Annual Review (2016). 2016. URL: https://www.raysearchlabs.com/globalassets/about-overview/media-center/wp-re-ev-n-pdfs/brochures/annualreview_2016_web_spreads.pdf.
[36] REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).
[37] Fanie Reynders. “Logging and Error Handling”. In: Modern API Design with ASP.NET Core 2. Apress, Berkeley, CA, 2018, pp. 113–130. ISBN: 978-1-4842-3518-8 978-1-4842-3519-5. DOI: 10.1007/978-1-4842-3519-5_7. URL: http://link.springer.com/chapter/10.1007/978-1-4842-3519-5_7.
[38] Marek Rogozinski and Rafal Kuc. ElasticSearch Server. Packt Publishing, 2014. ISBN: 978-1-78398-053-6.
[39] American Association for the Advancement of Science. “News of Science”. In: Science 125.3236 (Jan. 1957), pp. 18–22. ISSN: 0036-8075, 1095-9203. DOI: 10.1126/science.125.3236.18.
[40] Serilog. URL: https://serilog.net/.
[41] Vishal Sharma. Beginning Elastic Stack. Berkeley, CA: Apress, 2016. ISBN: 978-1-4842-1693-4 978-1-4842-1694-1. DOI: 10.1007/978-1-4842-1694-1. URL: http://link.springer.com/10.1007/978-1-4842-1694-1.
[42] Ronald Slocum. “Performance and Health Monitoring and Analysis of Hive Scales Portal Web Application”. In: Technical Library (Jan. 2016). URL: https://scholarworks.gvsu.edu/cistechlib/234.
[43] Michael J. Sydor. APM Best Practices. Berkeley, CA: Apress, 2011. ISBN: 978-1-4302-3141-7 978-1-4302-3142-4. DOI: 10.1007/978-1-4302-3142-4. URL: http://link.springer.com/10.1007/978-1-4302-3142-4.
[44] T. Bray, ed. The JavaScript Object Notation (JSON) Data Interchange Format. Tech. rep. 2017. DOI: 10.17487/RFC8259. URL: https://www.rfc-editor.org/info/rfc8259.
[45] M Teoh et al. “Volumetric modulated arc therapy: a review of current literature and clinical use in practice”. In: The British Journal of Radiology 84.1007 (Nov. 2011), pp. 967–996. ISSN: 0007-1285. DOI: 10.1259/bjr/22373346. URL: https://www.birpublications.org/doi/10.1259/bjr/22373346.
[46] The Health Insurance Portability and Accountability Act of 1996 (HIPAA). 1996.
[47] The thinking behind the Graylog architecture and why it matters to you — Graylog 2.4.4 documentation. URL: http://docs.graylog.org/en/2.4/pages/ideas_explained.html.

[48] James Turnbull. The Logstash Book. 2017. ISBN: 978-0-9888202-1-0. URL: https://logstashbook.com/TheLogstashBook_sample.pdf.
[49] K.P. Valavanis and A.I. Kokkinaki. “Error specification, monitoring and recovery in computer-integrated manufacturing: An analytic approach”. In: IEE Proceedings - Control Theory and Applications 143.6 (Nov. 1996), pp. 499–508. ISSN: 1350-2379, 1359-7035. DOI: 10.1049/ip-cta:19960768. URL: http://digital-library.theiet.org/content/journals/10.1049/ip-cta_19960768.
[50] Dirk Van Gestel et al. “RapidArc, SmartArc and TomoHD compared with classical step and shoot and sliding window intensity modulated radiotherapy in an oropharyngeal cancer treatment plan comparison”. In: Radiation Oncology 8 (Feb. 2013), p. 37. ISSN: 1748-717X. DOI: 10.1186/1748-717X-8-37.