UPTEC F 18029
Degree project, 30 credits
June 2018

Investigation and Implementation of a Log Management and Analysis Framework for the Treatment Planning System RayStation

Elias Norrby

Faculty of Science and Technology, UTH unit
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, House 4, Floor 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Abstract

The purpose of this thesis is to investigate and implement a framework for log management and analysis tailored to the treatment planning system (TPS) RayStation. A TPS is a highly advanced software package used in radiation oncology clinics, and the complexity of the software makes writing robust code challenging. Although the product is tested rigorously during development, bugs are present in released software. The purpose of the framework is to give the RayStation development team insight into errors encountered in clinics by centralizing log file data recorded at clinics around the world. A framework based on the Elastic stack, a suite of open-source products, is proposed, addressing three known issues referred to as the access problem, the processing problem, and the analysis problem. Firstly, log files are stored locally on each machine running RayStation, some of which may not be connected to the Internet. Gaining access to the data is further complicated by legal frameworks such as HIPAA and GDPR, which put constraints on how clinic data can be handled. The framework allows access to the files while respecting these constraints. Secondly, log files are written in several different formats. The framework is flexible enough to process files in multiple formats and consistently extracts the relevant information. Thirdly, the framework offers comprehensive tools for analyzing the collected data.
Deployed in-house on a set of 38 machines used by the RayStation development team, the framework was demonstrated to offer solutions to each of the listed problems.

Supervisor: Karl Lundin
Subject reader: Carl Nettelblad
Examiner: Tomas Nyberg
ISSN: 1401-5757, UPTEC F 18029

Summary

Many fields have benefited from the improved computational capabilities offered by modern computers. In radiation therapy for cancer patients, advanced software in the form of treatment planning systems is used today. With their help, treatments can be designed so that tumors receive the prescribed amount of radiation while the surrounding healthy tissue is spared. One such treatment planning system is RayStation, developed by RaySearch. The complexity of the software makes it difficult to write robust code, and despite careful testing procedures, software bugs exist in the versions used at clinics. An important part of ensuring the quality of the product is processing the contents of the log files that RayStation writes during use. RaySearch currently has very limited access to log information from clinics. Access is made more difficult by the fact that the machines used at clinics are often isolated and lack an Internet connection. In addition, legal frameworks such as HIPAA and GDPR place strict requirements on how clinic data must be handled. The aim of this project has been to develop a framework for log file management, adapted to RayStation and to the environment in which the software is used at clinics. In addition to solving the access problem, the framework provides examples of how the information in the files can be processed and analyzed. The proposed framework is based on the Elastic stack, an open-source software suite that is a popular choice for log file management. The functionality of the framework was demonstrated at RaySearch's development department, where information from log files on 38 machines was collected and analyzed.
Acknowledgements

Firstly, I would like to thank my supervisor, Karl Lundin, as well as the rest of the RayStation Core development team at RaySearch, for aiding me in my work and for welcoming me into their office space. I will miss our morning meetings! Secondly, I would like to thank my subject reader, Carl Nettelblad, for enthusiastically and meticulously reviewing draft upon draft, and for offering insightful comments during the writing process.

Contents

1 Introduction
  1.1 Background
  1.2 Purpose and tasks
    1.2.1 Objective
    1.2.2 Strategy
  1.3 Delimitations
    1.3.1 Focus on exceptions
    1.3.2 Focus on functionality
    1.3.3 Focus on compatibility with current and future releases
  1.4 Restatement of the problem
  1.5 Restatement of the response
2 Technical background
  2.1 Context
    2.1.1 A brief introduction to radiation therapy
    2.1.2 The role of treatment planning systems
    2.1.3 The role of RaySearch
    2.1.4 Patient data privacy
      The Health Insurance Portability and Accountability Act (HIPAA)
      The General Data Protection Regulation (GDPR)
  2.2 General log management concepts
    2.2.1 Remote log analysis
    2.2.2 Secure data transfers
    2.2.3 A typical log message
    2.2.4 Writing logs
    2.2.5 Managing logs
  2.3 The Elastic stack
    2.3.1 Elasticsearch
      Glossary and architecture
      Indexing
      Scaling and clusters
    2.3.2 Logstash
      Configuring Logstash
    2.3.3 Kibana
    2.3.4 Filebeat
  2.4 Log files in RayStation
    2.4.1 The RayStation Storage Tool log
    2.4.2 The RayStation Index Service log
    2.4.3 The RayStation Error log
    2.4.4 The RaaS logs
3 Implementation
  3.1 A: Setting up a grok pattern debugging pipeline
  3.2 B: Monitoring performance and collecting system data
  3.3 C1: Processing RayStation logs and managing multi-line messages
  3.4 C2: Centralizing logs in a virtual environment
  3.5 C3: Monitoring multiple workstations
  3.6 C4: Simulating the clinic-to-RaySearch relationship
  3.7 C5: Finalizing the proof-of-concept solution
4 Evaluation and results
  4.1 File output impact on performance
  4.2 Comparison of two logging libraries
  4.3 Logstash performance with structured messages
5 Discussion
  5.1 Evaluation results
    5.1.1 File output impact on performance
    5.1.2 Performance of logging libraries
    5.1.3 Logstash performance with structured messages
  5.2 Security considerations
  5.3 Privacy considerations
    5.3.1 HIPAA
    5.3.2 GDPR
  5.4 Issues encountered during development
    5.4.1 Managing clinic-side component configurations
    5.4.2 Logstash output limitations
    5.4.3 Sparse indices vs. many indices
    5.4.4 Clinics not fulfilling the minimal networking conditions
6 Conclusion
  6.1 The access problem
  6.2 The processing problem
  6.3 The analysis problem

1 Introduction

1.1 Background

A treatment planning system (TPS) is a highly advanced software package used in radiation oncology clinics to generate radiation treatment plans for cancer patients. The overall objective of the TPS is to create treatment plans that deliver the prescribed absorbed radiation dose to the tumor while sparing the surrounding healthy tissue as much as possible.

RayStation is a TPS developed by RaySearch Laboratories. Used by 400 clinics in 25 countries, it is one of the most widely used treatment planning systems on the market [35]. The complexity of the software makes writing robust code challenging. Because the correctness of calculations can be a matter of life or death for patients, bugs and other unexpected states must be handled with great care at runtime, which usually means terminating the program. Even though the product undergoes rigorous testing before release, bugs are present in released software.

A key part of the current process for troubleshooting RayStation software problems and eliminating bugs is analyzing the contents of log files written by the software during runtime and when the application crashes. Today, the information contained in these