
DKTK Kaapana Technical Documentation, Release 0.1.1

Kaapana Team Heidelberg

Sep 24, 2021

CONTENTS

1 What is Kaapana?
2 Getting started
  2.1 What's needed to run Kaapana?
3 Build Kaapana
  3.1 Build Requirements
  3.2 Build modes
  3.3 Build Dockerfiles and Helm Charts
4 Install Kaapana
  4.1 Step 1: Server Installation
  4.2 Step 2: Platform Deployment
5 User guide
  5.1 Default configuration
  5.2 Storage stack: Kibana, Elasticsearch, OHIF and DCM4CHEE
  5.3 Processing stack: Airflow, Kubernetes namespace flow-jobs and the working directory
  5.4 Core stack: Landing Page, Traefik, Louketo, Keycloak, Grafana, Kubernetes and Helm
6 Extensions
  6.1 Workflows
  6.2 Applications
7 Development guide
  7.1 Getting started
  7.2 Write your first own DAG
  7.3 Deploy an own processing algorithm to the platform
  7.4 Deploy a Flask Application on the platform
8 Frequently Asked Questions (FAQ)
  8.1 There seems to be something wrong with the landing-page visualization in the Browser
  8.2 Kibana dashboard does not work
  8.3 Proxy configuration
  8.4 Setup a connection to the Kubernetes cluster from your local workstation
  8.5 Failing to install an extension
9 Glossary
10 Releases
  10.1 Version 0.1.0
11 License and contact
  11.1 License
  11.2 Contact

CHAPTER ONE

WHAT IS KAAPANA?

Kaapana (from the Hawaiian word kaʻāpana, meaning "distributor" or "part") is an open-source toolkit for state-of-the-art platform provisioning in the field of medical data analysis. The applications comprise AI-based workflows and federated learning scenarios with a focus on radiological and radiotherapeutic imaging.

Obtaining large amounts of medical data necessary for developing and training modern machine learning methods is an extremely challenging effort that often fails in a multi-center setting, e.g. due to technical, organizational and legal hurdles. A federated approach, where the data remain under the authority of the individual institutions and are only processed on-site, is in contrast a promising approach ideally suited to overcome these difficulties. Following this federated concept, the goal of Kaapana is to provide a framework and a set of tools for sharing data processing algorithms, for standardized workflow design and execution, as well as for performing distributed method development. This facilitates data analysis in a compliant way, enabling researchers and clinicians to perform large-scale multi-center studies.

By adhering to established standards and by adopting widely used open technologies for private cloud development and containerized data processing, Kaapana integrates seamlessly with the existing clinical IT infrastructure, such as the Picture Archiving and Communication System (PACS), and ensures modularity and easy extensibility.

Core components of Kaapana are:

• dcm4chee: open-source PACS system serving as the central DICOM data storage in Kaapana
• Elasticsearch: search engine used to make the DICOM data searchable via their tags and meta information
• Kibana: visualization dashboard enabling the interactive exploration of the DICOM data stored in Kaapana and indexed by Elasticsearch
• Airflow: workflow management system that enables complex and flexible data processing workflows in Kaapana via container chaining
• Kubernetes: container orchestration
• Keycloak: user authentication
• Docker: container system used to provide the algorithms as well as the platform components themselves

Kaapana is constantly evolving and currently includes the following key features:


• Large-scale image processing with state-of-the-art (SOTA) deep learning algorithms, such as nnU-Net image segmentation
• Analysis, evaluation and viewing of processed images and data
• Simple integration of new, customized algorithms and applications into the framework
• System monitoring
• User management

Currently the most widely used platform realized with Kaapana is the Joint Imaging Platform (JIP) of the German Cancer Consortium (DKTK). The JIP is currently being deployed at all 36 German university hospitals with the objective of distributed radiological image analysis and quantification. For more information, please also take a look at our recent publication of the Kaapana-based Joint Imaging Platform in JCO Clinical Cancer Informatics (JCO).

CHAPTER TWO

GETTING STARTED

This manual is intended to provide a quick and easy way to get started with Kaapana. Kaapana is not a ready-to-use software but a toolkit that enables you to build the platform that fits your specific needs. The steps described in this guide will build an example platform, which is a default configuration and contains many of the typical platform components. This basic platform can be used as a starting point to derive a customized platform for your specific project.

2.1 What’s needed to run Kaapana?

1. Host System

You will need some kind of server to run the platform on. Minimum specs:

• OS: CentOS 8, Ubuntu 20.04 or Ubuntu Server 20.04
• CPU: 4 cores
• Memory: 8GB (for processing >30GB recommended)
• Storage: 100GB (deploy only) / 150GB (local build) -> recommended >200GB

2. Container Registry

Hint: Get access to our docker registry
In case you just want to try out the platform, you are very welcome to reach out to us via Slack or email. In this case, we will provide you with credentials for our docker registry, from which you can directly install the platform and skip the building part!

To provide the services in Kaapana, the corresponding containers are needed. They can be regarded as the binaries of Kaapana and therefore only need to be built if you do not have access to already built containers via a container registry. The following overview should help you decide whether you need to build Kaapana and which build mode to choose:


3. Build

If you need to build the containers yourself, see Build Kaapana.

4. Installation

To deploy the platform, see Install Kaapana.

CHAPTER THREE

BUILD KAAPANA

3.1 Build Requirements

Important: Disk space needed
For the complete build of the project, ~50GB of container images will be stored at /var/snap/docker/common/var-lib-docker. If you use the build mode local, it will be ~120GB, since each container will also be imported separately into containerd. In the future we will also provide an option to delete the docker image after the import.

Before you get started, you should be familiar with the basic concepts and components of Kaapana (see What is Kaapana?). You should also have the following packages installed on your build system.

1. Dependencies

Ubuntu:

sudo apt update && sudo apt install -y curl git python3 python3-pip

CentOS:

sudo yum install -y curl git python3 python3-pip

2. Clone the repository:

git clone https://github.com/kaapana/kaapana.git

3. Python requirements

python3 -m pip install -r kaapana/build-scripts/requirements.txt


4. Snap

Ubuntu:

Check if snap is already installed: snap help --all
If not, run the following command: sudo apt install -y snapd
A reboot is needed afterwards!

CentOS:

Check if snap is already installed: snap help --all
If not, run the following commands:

sudo yum install -y epel-release
sudo yum update -y
sudo yum install snapd
sudo systemctl enable --now snapd.socket
sudo snap wait system seed.loaded

5. Docker

sudo snap install docker --classic --channel=latest/stable

6. In order to run docker commands as a non-root user, you need to execute the following steps:

sudo groupadd docker
sudo usermod -aG docker $USER

For more information visit the Docker docs.

7. Helm

sudo snap install helm --classic --channel=3.5/stable

8. Reboot

sudo reboot

9. Test Docker

docker run hello-world

-> This should now work without root privileges.

10. Helm plugins


helm plugin install https://github.com/chartmuseum/helm-push
helm plugin install https://github.com/instrumenta/helm-kubeval

3.2 Build modes

If you don't have access to a container registry with already built containers for Kaapana, you need to build them first. This is comparable to the binaries of regular software projects: if you already have access to prebuilt containers, you can skip the build and continue directly with the installation (Install Kaapana).

The complete build will take ~1h (depending on the system)! Currently Kaapana supports two different build-modes:

1. Local build

By choosing this option you will need no external container registry to install the platform. All containers will be built and used locally on the server.

2. Container registry

This option will use a remote container registry. Since we are also using charts and other artifacts, the registry must have OCI support. We recommend GitLab or Harbor as registry software. Unfortunately, Docker Hub does not yet support OCI and thus cannot currently be used with Kaapana. We recommend gitlab.com as a replacement.

The following sections include a configuration example for each of the options (if applicable).

3.3 Build Dockerfiles and Helm Charts

The build process is handled by a build script, which you can find within the repository at kaapana/build-scripts/start_build.py. Before you start the build process, you should have a look at the build configuration at kaapana/build-scripts/build-configuration.yaml and adapt it according to your chosen build configuration as shown below.

Local build:


http_proxy: "" default_container_registry: "" log_level: "WARN" build_containers: true push_containers: false push_dev_containers_only: false build_charts: true push_charts: false create_package: true

Private registry:

You need to log in first: docker login <registry-url>. Then you must adjust the configuration as follows:

http_proxy: "" default_container_registry: "" (e.g. registry.gitlab.com// ,→ .) log_level: "WARN" build_containers: true push_containers: true push_dev_containers_only: false build_charts: true push_charts: true create_package: false

Adjust the build configuration:

nano kaapana/build-scripts/build-configuration.yaml

Start the build process:

python3 kaapana/build-scripts/start_build.py

CHAPTER FOUR

INSTALL KAAPANA

Important: Access to a docker registry with built docker containers
Before proceeding with further installation steps, make sure you have access to a docker registry with the built Kaapana docker containers; otherwise please visit Getting started.

The domain, hostname or IP address has to be known and correctly configured for the system. If a proxy is needed, it should already be configured at /etc/environment (a reboot is needed after configuration!).

Hint: Supported browsers
We recommend Chrome as a browser. Supported are the newest versions of Google Chrome and Firefox. Safari has some known issues with the user interface of Traefik, some functionalities of the OHIF viewer as well as no-vnc-based applications (like MITK). Internet Explorer and Microsoft Edge are not really tested.

4.1 Step 1: Server Installation

This part describes the preparation of the host system for Kaapana. Besides a few required software packages, mainly Microk8s is installed to set up Kubernetes.

Hint: GPU support -> Currently only Nvidia GPUs are supported!
GPU support requires installation of the Nvidia drivers. For Ubuntu Server 20.04, sudo apt install nvidia-driver-450-server should also work, BUT check the settings afterwards (see below):


sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target

-> A reboot is required afterwards! Please make sure the nvidia-smi command is working as expected!

Before the example platform "Kaapana-platform" can be deployed, all dependencies must be installed on the server. To do this, you can use the server-installation-script, located at kaapana/server-installation/server_installation.sh, by following the steps listed below.

1. Copy the script to your target system (server)

2. Make it executable:

chmod +x server_installation.sh

3. Execute the script:

sudo ./server_installation.sh

4. Reboot the system

sudo reboot

5. (optional) Enable GPU support for Microk8s

sudo ./server_installation.sh -gpu

4.2 Step 2: Platform Deployment

Hint: Filesystem directories
In the default configuration there are two locations on the filesystem which are used for stateful data on the host machine:

1. fast_data_dir=/home/kaapana: Location of data that do not take a lot of space and should be loaded fast. Preferably, an SSD is mounted here.
2. slow_data_dir=/home/kaapana: Location of huge files, like images; our object store is also located here. Preferably, an HDD is mounted here.

They can be adjusted in the platform-installation-script and can also be identical (everything is stored in one place).


The platform is deployed using the platform-installation-script, which you can find at kaapana/platforms/kaapana-platform/platform-installation/install_platform.sh. Copy the script to your target system (server) and adjust it as described below:

1. Open the install_platform.sh script on the server

nano install_platform.sh

2. Have a look at the variables at the top of the script. You need to make at least the following customizations:

Local build:

...
CONTAINER_REGISTRY_URL=""
...

Private registry:

...
CONTAINER_REGISTRY_URL="<registry-url>"
...

3. Make it executable with chmod +x install_platform.sh

4. Execute the script:

Local build:

./install_platform.sh --chart-path kaapana/build/kaapana-platform-<version>.tgz

Private registry:

./install_platform.sh

You may be asked the following questions:

1. Please enter the credentials for the Container-Registry: Use the same credentials you used before with docker login.
2. Enable GPU support? Answer yes if you have a Nvidia GPU, installed drivers and enabled GPU support for Microk8s.
3. Please enter the domain (FQDN) of the server. You should enter the domain, hostname or IP address under which the server is accessible from client workstations. Keep in mind that valid SSL certificates only work with FQDN domains.
4. Which version do you want to install?: Specify the version you want to install.

The script will stop and wait until the platform is deployed. Since all Docker containers must be downloaded, this may take some time (~15 min). After a successful installation you'll get the following message:


Installation finished.
Please wait till all components have been downloaded and started.
You can check the progress with:

watch microk8s.kubectl get pods --all-namespaces

When all pods are in the "running" or "completed" state, you can visit: https://<domain>

You should be welcomed by the login page. Initial credentials:

username: kaapana
password: kaapana

CHAPTER FIVE

USER GUIDE

This manual is intended to provide a quick and easy way to get started with the kaapana-platform. After presenting the default configuration, we will introduce the storage, processing and core stacks of the platform and show examples of how to use them.

Hint: This project should not be considered a finished platform or software.

5.1 Default configuration

5.1.1 Default credentials

Main Kaapana login:
username: kaapana
password: kaapana

Keycloak user management administrator:
username: admin
password: Kaapana2020

Minio:
username: kaapanaminio
password: Kaapana2020

Most likely you will not need the Minio admin password. Use the Login with OpenID instead.


5.1.2 Port Configuration

In the default configuration only four ports are open on the server:

1. Port 80: Redirects to HTTPS port 443
2. Port 443: Main HTTPS communication with the server
3. Port 11112: DICOM receiver port, which should be used as the DICOM node in your PACS
4. Port 6443: Kubernetes API port -> used for external kubectl communication and secured via the certificate

5.2 Storage stack: Kibana, Elasticsearch, OHIF and DCM4CHEE

In general, the platform is a processing platform, which means that it is not a persistent data storage. Ideally, all the data on the platform should only be a copy of the original data. Data in DICOM format are stored in an internal PACS called DCM4CHEE. For all non-DICOM data, an object store called Minio is available. In Minio, data are stored in buckets and are accessible via the browser for download. Most of the results that are generated during a pipeline will be saved in Minio. Finally, a component called Clinical Trial Processor (CTP) was added to manage the distribution and acceptance of images. It opens port 11112 on the server to accept DICOM images directly from, e.g., a clinical PACS.

If you are more interested in the technologies, you can get started here:

• Kibana
• Elasticsearch
• Minio
• OHIF
• Clinical Trial Processor (CTP)

5.2.1 Getting images into the platform

In order to get started with the platform, you will first have to send images to it. There are two ways of getting images into the platform. The preferred way is to use the provided DICOM receiver on port 11112. If you have images locally, you can use e.g. DCMTK; however, any tool that sends images to a DICOM receiver can be used. Alternatively, you can upload DICOM images via drag&drop on the landing page, but we only recommend this for small image sizes. Also, the upload expects a zip file of images with a .dcm ending. Here is an example of sending images with DCMTK (placeholders in angle brackets need to be replaced):

dcmsend -v <server-ip-or-domain> 11112 --scan-directories --call <dataset-name> --scan-pattern '*' --recurse <path-to-dicom-folder>

The AE title should represent your dataset since we use it for filtering images on our Meta-Dashboard in Kibana.
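If you prefer to stay in Python rather than installing DCMTK, the same C-STORE can be issued with pynetdicom. The following is only a minimal sketch under assumptions: pydicom/pynetdicom are installed on your workstation (pip install pydicom pynetdicom), and the server address, called AE title and file path are placeholders you need to replace.

# Hedged sketch: sending a single DICOM file to the Kaapana DICOM receiver with pynetdicom.
from pydicom import dcmread
from pynetdicom import AE, StoragePresentationContexts

ae = AE(ae_title="WORKSTATION")                       # calling AE title (your side)
ae.requested_contexts = StoragePresentationContexts   # offer the common storage SOP classes

# The called AE title (ae_title below) ends up as the dataset name on the Meta dashboard.
assoc = ae.associate("kaapana-server.example.org", 11112, ae_title="MY-DATASET")
if assoc.is_established:
    ds = dcmread("/path/to/image.dcm")                # hypothetical local DICOM file
    status = assoc.send_c_store(ds)
    print("C-STORE status: 0x%04X" % status.Status)
    assoc.release()
else:
    print("Could not establish an association with the DICOM receiver")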


When DICOMs are sent to the DICOM receiver of the platform, two things happen. Firstly, the DICOMs are saved in the local PACS system called DCM4CHEE. Secondly, the metadata of the DICOMs are extracted and indexed by a search engine (powered by Elasticsearch), which makes the metadata available for Kibana. The Kibana dashboard called "Meta dashboard" is mainly responsible for visualizing the metadata but also serves as a filtering tool in order to select images and to trigger a processing pipeline. Image cohorts on the Meta dashboard can be selected via custom filters at the top. To ease this process, it is also possible to add filters automatically by clicking on the graphs (+/- pop-ups).

5.2.2 Deleting images from the platform

Information about the images is saved in the PACS and in Elasticsearch. A workflow called delete-series-from-platform is provided to delete images from the platform. Simply go to the Meta Dashboard, select the images you want to delete and start the workflow. On the Airflow dashboard you can see when the DAG delete-series-from-platform has finished; then all your selected images should be deleted from the platform. For more information check out the documentation of the workflow at Delete series from platform (delete-series-from-platform).

5.2.3 Viewing images with OHIF

A web-based DICOM viewer (OHIF) has been integrated to show images in the browser. The functionality of this viewer is limited at the moment, but more features will come soon. To view images, go to OHIF and click on the study. When e.g. a segmentation is available you can visualize the segmentation by dragging it into the main window.

5.3 Processing stack: Airflow, Kubernetes namespace flow-jobs and the working directory

In order to apply processing pipelines to images, in which different operations are performed in a certain order, a framework is necessary which allows us to define and trigger such a pipeline. We decided to use Airflow for that. In Airflow, a workflow is called a DAG (directed acyclic graph, a graph type in which you can only traverse forwards). It consists of operators, which are the bricks of your pipeline. Ideally, every operator triggers a Docker container in which some kind of task is performed. A detailed overview of the concepts can be found here. Besides Airflow, Kubernetes is used to manage the Docker containers that are triggered by Airflow. On the platform, we introduce a namespace called flow-jobs in which all containers initiated by Airflow are started.

If you are more interested in the technologies, you can get started here:

• Airflow
• Kubernetes


5.3.1 Triggering workflows with Kibana

As mentioned above, Kibana visualizes all the metadata of the images and is therefore a good option to also filter the images to which a workflow should be applied. To trigger a workflow from Kibana, a panel send_cohort was added to the Kibana dashboard which contains a dropdown to select a workflow, the choice between single file and batch processing and a send button to send the request to Airflow.

Hint: Check out the difference between single file and batch processing

In order to trigger a workflow on images, filter the images to which you want to apply the pipeline and trigger a workflow, e.g. collect-metadata, batch processing, Send x results. Once Kibana has sent its request, the Airflow pipeline is triggered. If you navigate to Airflow, you should see that the DAG collect-metadata is running. By clicking on the DAG you will see the different processing steps, which are called operators. In the operators, first the query of Kibana is used to download the selected images from the local PACS system DCM4CHEE to a predefined directory on the server, so that the images are available for the upcoming operators (get-input-data); then the DICOMs are anonymized (dcm-anonymizer), the metadata are extracted and converted to JSONs (dcm2json), the generated JSONs are concatenated (concatenated-metadata), the concatenated JSON is sent to Minio (minio-actions-put) and finally, the local directory is cleaned again. You can check out the Development guide to learn how to write your own DAGs. Also, you can go to Minio to see if you find the collected metadata.

5.3.2 Debugging

This short section will show you how to debug in case a workflow throws an error.

Syntax errors: If there is a syntax error in the implementation of a DAG or of an operator, the errors are normally shown directly at the top of the Airflow DAGs view in red. For further information, you can also consult the log of the container that runs Airflow. For this, you have to go to Kubernetes, select the namespace flow and click on the Airflow pod. At the top right there is a button to view the logs. Since Airflow starts two containers at the same time, you can switch between the two outputs at the top in 'Logs from…'.

Operator errors during execution:

• Via Airflow: when you click in Airflow on the DAG you are guided to the 'Graph View'. Clicking on the red, failed operator, a popup opens where you can click on 'View Log' to see what happened.
• Via Kubernetes: in the namespace flow-jobs, you should find the running pod that was triggered from Airflow. Here you can click on the logs to see why the container failed. If the container is still running, you can also click on 'Exec into pod' to debug directly inside the container.


After you have resolved the bug in the operator, you can either restart the whole workflow from Kibana, or you can click on the operator in the 'Graph View', select 'Clear' in the popup and confirm the next dialog. This will restart the operator.

5.4 Core stack: Landing Page, Traefik, Louketo, Keycloak, Grafana, Kubernetes and Helm

From a technical point of view, the core stack of the platform is Kubernetes, a container-orchestration system managing all the Docker containers. Helm is the tool that we use to ship our Kubernetes deployments. Traefik is a reverse proxy, managing the conversation between all components. Louketo and Keycloak form the base for user authentication. Finally, the landing page wraps all of the services in the kaapana-platform into one uniform webpage.

To find out more about the technologies, check out:

• Helm
• Kubernetes
• Grafana
• Traefik
• Keycloak

5.4.1 Launching extensions via the landing page

On the landing page you can find a section called Extensions. Extensions can be workflows (that are used in Airflow) or static applications like a Jupyter Notebook. In general, the extensions can be understood like an app store, where new services and workflows can be installed and managed. Under the hood, Helm charts are installed and uninstalled via the GUI. Most of the applications that are launched mount the Minio directory, so that you can directly work with the data that are generated in a workflow. For example, you can trigger the download-selected-files DAG to download images to Minio and then inspect the data by starting an MITK-Volume instance. In the Development guide you will learn how to write and add your own extensions.

5.4.2 Keycloak: Add users to the platform

Keycloak is an open source identity and access management solution that we integrated into our platform to manage authentication and different user roles. You can access Keycloak via the dashboard (only if you have admin rights) or directly via /auth/. Please check out the documentation of Keycloak to find out what Keycloak is capable of. Here is an example of how to add new users to the platform: depending on your needs, you can add users manually or connect the Keycloak instance, e.g., to an Active Directory.


• Adding a user manually: Once you are logged in, you can add users in the section Users. By selecting a user you can change, e.g., their password in the tab Credentials or change their role under Role Mappings. Try, for example, to add a user who has no admin rights, only user rights. Currently there are only two user roles. The admin has some more privileges than a normal user; e.g., a user cannot access the Kubernetes dashboard and cannot see all components on the landing page.
• Connecting with an Active Directory: In order to connect to an Active Directory, go to the tab User Federation. Depending on your needs, select ldap or kerberos. You should be able to get the necessary configuration from your institution. If everything is configured correctly, you should be able to log in with your credentials from the Active Directory.

5.4.3 Grafana and Prometheus

As with all platforms, a system to monitor the current system status is needed. To provide this, Kaapana utilizes the commonly used combination of Prometheus and Grafana. The graphical dashboards present states such as disk space, CPU and GPU memory usage, network pressure etc.

5.4.4 Kubernetes: Your first place to look if something does not work

As mentioned above, Kubernetes is the basis of the whole platform. You can talk to Kubernetes either via the Kubernetes Dashboard, accessible on the landing page, or via the terminal directly on your server. You can even talk to the Kubernetes cluster from another machine by setting up a connection to it (see here). In case anything on the platform is not working, Kubernetes is the first place to go. Here are two use cases when you might need to access Kubernetes.

Case 1: Service is down

In case you can't access a resource anymore, most probably a pod is down. In this case you first need to check why. For this, go to the Kubernetes Dashboard, select a namespace at the top and then click on Pods. The pod which is down should appear in a red/orange color. Click on the pod. At the top right you see four buttons. First click on the left one; this will show the logs of the container. In the best case you see here why your pod is down. To restart the pod you simply need to delete it. If it was not triggered by an Airflow DAG, it should restart automatically (the same steps can be done via the console, see below). In case the component/service crashes again, there might be some deeper error.

Case 2: Platform is not responding

When your platform does not respond, this can have different reasons.

• Pods are down: In order to check if and which services are down, please log in to your server, where you can check if pods are down with:

kubectl get pods -A

If all pods are running, most probably there are network errors. If not, a first try would be to delete the pod manually; it will then be restarted automatically. To delete a pod via the console, copy the NAME and note the NAMESPACE of the pod you want to delete and then execute:

kubectl delete pods <pod-name> -n <namespace>

• Network errors: In case of network errors, the problem most likely lies within your local network, e.g. your server domain might not work.

CHAPTER SIX

EXTENSIONS

6.1 Workflows

6.1.1 Collect metadata (collect-metadata)

What's going on?

1) DICOMs are anonymized by removing a list of personal tags (an illustrative pydicom sketch of this step follows below)
2) Metadata of the DICOMs are extracted and written to JSON files
3) JSON files are concatenated to one JSON file
4) The JSON file is zipped and sent with a timestamp to the bucket download in Minio, where the file can be downloaded

Input data: DICOMs

Start processing: Select collect-metadata + BATCH PROCESSING and click SEND x RESULTS
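For illustration only, the tag-removal idea behind step 1 can be sketched with pydicom. This is not the anonymizer used by the workflow, and the tag list below is a hypothetical, shortened example.

# Hedged sketch of removing personal tags from a single DICOM file with pydicom.
# The tag selection is illustrative; the collect-metadata workflow uses its own list.
import pydicom

PERSONAL_TAGS = ["PatientName", "PatientBirthDate", "PatientAddress", "ReferringPhysicianName"]

ds = pydicom.dcmread("/path/to/series/slice_0001.dcm")   # hypothetical input file
for keyword in PERSONAL_TAGS:
    if keyword in ds:
        delattr(ds, keyword)                              # remove the data element entirely
ds.save_as("/path/to/output/slice_0001_anon.dcm")         # hypothetical output path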

6.1.2 Delete series from platform (delete-series-from-platform)

What's going on?

1) DICOMs are deleted from the PACS.
2) Metadata of the DICOMs are deleted from the Elasticsearch database.

Input data: Filter for DICOMs that you want to remove from the platform. Since in the current version the files are copied to the local SSD drive, please do not select too many images at once.

Start processing: Select delete-dcm-from-platform + BATCH PROCESSING and click SEND x RESULTS

Hint:


DCM4CHEE needs some time (maybe around 10-15 min) to fully delete the images.

6.1.3 Download series from platform (download-selected-files)

What's going on?

1) DICOMs are sent to the bucket download in Minio. If the option zipped is used, they are saved with a timestamp in the download bucket.

Input data: DICOMs

Start processing: Select download-selected-files + BATCH PROCESSING or SINGLE FILE PROCESSING and click SEND x RESULTS

6.1.4 nnUNet (nnunet-predict)

Method: “Automated Design of Deep Learning Methods for Biomedical Image Segmentation” Authors: Fabian Isensee, Paul F. Jäger, Simon A. A. Kohl, Jens Petersen, Klaus H. Maier-Hein Cite as: arXiv:1904.08128 [cs.CV]

What's going on?

1) Model is downloaded
2) DICOM will be converted to .nrrd files
3) Selected task is applied on the input image
4) .nrrd segmentations will be converted to DICOM Segmentation (DICOM SEG) objects
5) DICOM SEGs will be sent to the internal platform PACS

Input data: Depends on the task; see GitHub for more information.

Start processing: Select nnunet + SINGLE FILE PROCESSING and click SEND x RESULTS


6.1.5 Automatic organ segmentation (shapemodel-organ-seg)

Method: “3D Statistical Shape Models Incorporating Landmark-Wise Random Regression Forests for Omni-Directional Landmark Detection” Authors: Tobias Norajitra and Klaus H. Maier-Hein DOI: 10.1109/TMI.2016.2600502

What's going on?

1) DICOM will be converted to .nrrd files
2) Normalization of input images
3) Parallel segmentation of liver, spleen and kidneys (left and right)
4) .nrrd segmentations will be converted to DICOM Segmentation (DICOM SEG) objects
5) DICOM SEGs will be sent to the internal platform PACS

Input data: Filter for abdominal CT scans within the meta dashboard.

Start processing: Select organ-segmentation + SINGLE FILE PROCESSING and click SEND x RESULTS

6.1.6 Radiomics (radiomics-dcmseg)

TBA

What's going on?

1) Selected DICOM SEGs are converted to .nrrd files
2) The corresponding CT file is downloaded from the PACS
3) Downloaded CT files are converted to *.nrrd
4) Radiomics is applied on the selected DICOMs
5) Extracted radiomics data are pushed to the bucket radiomics in Minio and can be downloaded there

Input data: DICOM Segmentations

Start processing: Select radiomics + BATCH PROCESSING or SINGLE FILE PROCESSING and click SEND x RESULTS


6.2 Applications

6.2.1 Code server

What's going on? The code server is used for developing new DAGs and operators for Airflow. It mounts the workflows directory of Kaapana.

Mount point: /workflows

6.2.2 Jupyter lab

What's going on? The Jupyter lab can be used to quickly analyse data that are saved to the object store Minio. We tried to preinstall most of the common Python packages. Please do not use the Jupyter notebook for sophisticated calculations. Here, it is better to write an Airflow DAG.

Mount point: /minio
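Since the Minio buckets are mounted into the JupyterLab container at /minio, workflow results can be browsed like normal files. The snippet below is a minimal sketch: the bucket and file names depend on the workflows you have actually run, so the JSON lookup used here is only a hypothetical example.

# Hedged sketch for exploring workflow results via the /minio mount inside JupyterLab.
import glob
import json

# List a few of the objects that workflows have pushed to Minio so far.
for path in sorted(glob.glob("/minio/**/*", recursive=True))[:20]:
    print(path)

# Try to load a JSON result, e.g. metadata produced by a previous workflow run (hypothetical name).
matches = glob.glob("/minio/**/*.json", recursive=True)
if matches:
    with open(matches[0]) as f:
        content = json.load(f)
    print(f"Loaded {matches[0]} ({type(content).__name__} with {len(content)} entries)")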

6.2.3 MITK Flow

What's going on? The MITK Flow is an instance of MITK to view image data.

Mount point: /minio

6.2.4 Tensorboard

What's going on? Tensorboard can be launched to analyse results generated during a training, which will come in the future. It also mounts the Minio directory.

Mount point: /minio

CHAPTER SEVEN

DEVELOPMENT GUIDE

This guide is intended to provide a quick and easy way to get started with developments on the platform. The guide currently consists of three parts. The parts Write your first own DAG and Deploy an own processing algorithm to the platform focus on the implementation of pipelines for Airflow in order to apply processing steps to images. The part Deploy a Flask Application on the platform explains how to develop a Flask web application and integrate it as an extension into the Kaapana technology stack.

7.1 Getting started

7.1.1 List of the technologies used within this guide

These tutorials/technologies are good references when starting with the Kaapana deployment:

• Docker: Necessary when you want to build containers on your local machine
• Airflow: Our pipeline tool for processing algorithms
• Kubernetes: (Advanced) On your local machine - necessary when you want to talk to the Kubernetes cluster from your local machine
• Helm: (Super advanced) Our package manager for Kubernetes. Necessary when you want to build Helm packages on your local machine

All of the examples below are taken from the templates_and_examples folder of our GitHub repository!

7.1.2 Preparations for the development

Requirements:

• A running version of the Kaapana platform and access to a terminal where the platform is running
• Installation of Docker on your local machine
• Installation of DCMTK (optional: convenient way to send images to the platform)

Upload images to the platform


• You have two options to upload images to the platform:
  – Using the Data Upload: Create a zip file of images that end with .dcm and upload the images via drag&drop on the landing page in the section "Data upload"
  – Send images with DCMTK, e.g.:

dcmsend -v <server-ip-or-domain> 11112 --scan-directories --call <dataset-name> --scan-pattern '*' --recurse <path-to-dicom-folder>

• Go to Meta on the landing page to check if the images were successfully uploaded
• In order to create a development environment to add new DAGs to the platform, go to the extension section on the landing page and install the code-server-chart. Clicking on the link, you will be served with a Visual Studio Code environment in the directory of Airflow, where you will find the Kaapana plugin (workflows/plugins), the data during processing (workflows/data), the models (workflows/models) and the directory for the DAG definitions (workflows/dags).

In order to get a general idea about how to use the platform, check out TODO. Furthermore, it might be helpful to check out the TODO in order to get an idea of the concepts of the Kaapana platform.

7.2 Write your first own DAG

Aim: Create a DAG that converts DICOMs to .nrrd files

In order to deploy a new DAG that converts DICOMs to .nrrd files, create a file called dag_example_dcm2nrrd.py inside the dags folder with the following content:

from airflow.utils.log.logging_mixin import LoggingMixin
from airflow.utils.dates import days_ago
from datetime import timedelta
from airflow.models import DAG
from kaapana.operators.DcmConverterOperator import DcmConverterOperator
from kaapana.operators.LocalGetInputDataOperator import LocalGetInputDataOperator
from kaapana.operators.LocalWorkflowCleanerOperator import LocalWorkflowCleanerOperator
from kaapana.operators.LocalMinioOperator import LocalMinioOperator

log = LoggingMixin().log

ui_forms = {
    "workflow_form": {
        "type": "object",
        "properties": {
            "single_execution": {
                "title": "single execution",
                "description": "Should each series be processed separately?",
                "type": "boolean",
                "default": False,
                "readOnly": False,
            }
        }
    }
}

args = {
    'ui_forms': ui_forms,
    'ui_visible': True,
    'owner': 'kaapana',
    'start_date': days_ago(0),
    'retries': 0,
    'retry_delay': timedelta(seconds=30)
}

dag = DAG(
    dag_id='example-dcm2nrrd',
    default_args=args,
    schedule_interval=None
)

get_input = LocalGetInputDataOperator(dag=dag)
convert = DcmConverterOperator(dag=dag, input_operator=get_input, output_format='nrrd')
put_to_minio = LocalMinioOperator(dag=dag, action='put', action_operators=[convert], file_white_tuples=('.nrrd'))
clean = LocalWorkflowCleanerOperator(dag=dag, clean_workflow_dir=True)

get_input >> convert >> put_to_minio >> clean

That's it basically. Now we can check if the DAG was successfully added to Airflow and then we can test our workflow!

• Go to Airflow and check if your newly added DAG example-dcm2nrrd appears under DAGs (it might take up to five minutes for Airflow to recognize the DAG! Alternatively, you could restart the Airflow pod in Kubernetes)
• If there is an error in the created DAG file, like indexing, library imports, etc., you will see an error at the top of the Airflow page
• Go to the Meta-Dashboard
• Filter, via the name of your dataset and the +/- icons on the different charts, the images to which you want to apply the algorithm
• From the drop-down, choose the DAG you have created, i.e. example-dcm2nrrd, and press the start button. In the appearing pop-up window press start again, and the execution of your DAG is triggered.
• In order to check if your DAG runs successfully, you can go back to Airflow and watch how the pipeline jumps from one operator to the next. If an error occurs, please check out the TODO section.
• If everything was successful, you can go to Minio where you will find a bucket called example-dcm2nrrd. Inside this folder you will find the .nrrd files of the selected images.


7.3 Deploy an own processing algorithm to the platform

Aim: We will write a workflow that opens a DICOM file with Pydicom, extracts the study id, saves the study id in a json file and pushes the json file to Minio.

7.3.1 Step 1: Check if our scripts works locally

First of all, it is important that your script works. For this, we will simulate the folder structure that we expect on the platform and apply our algorithm locally to the files. In order to simulate the folder structure, go back to the Meta-Dashboard, select the files you want to develop with and trigger the DAG download-selected-files with the option zip files set to False. This will download the selected images to a folder in Minio. Please go to Minio, download the folder called batch and save the extracted content to a folder called data. Now the data folder corresponds to the data folder that you have seen in the workflows folder.

Hint: In case your algorithm works with .nrrd files you could simply download the batch folder that we generated in the example-dcm2nrrd folder
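Before writing the actual operator script, it can help to verify that the simulated layout matches what the operators expect, i.e. <WORKFLOW_DIR>/<BATCH_NAME>/<series>/<OPERATOR_IN_DIR>. The following is only a small sketch; the directory names are the defaults used in the script below and may differ in your setup.

# Hedged sketch: quick local check of the simulated Kaapana folder structure
# (<data>/batch/<series>/initial-input/*).
import glob
import os

workflow_dir = "data"               # the folder you extracted from the Minio download
batch_name = "batch"
operator_in_dir = "initial-input"   # for already converted data this could be e.g. "dcm-converter"

for series_dir in sorted(glob.glob(os.path.join(workflow_dir, batch_name, "*"))):
    files = glob.glob(os.path.join(series_dir, operator_in_dir, "*"))
    print(f"{os.path.basename(series_dir)}: {len(files)} file(s) in {operator_in_dir}")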

• Now we create the following python script extract_study_id.py. Make sure that you have Pydicom installed:

import sys, os
import glob
import json
import pydicom
from datetime import datetime

# For local testing

# os.environ["WORKFLOW_DIR"] = "<path-to-your-data-folder>"
# os.environ["BATCH_NAME"] = "batch"
# os.environ["OPERATOR_IN_DIR"] = "initial-input"
# os.environ["OPERATOR_OUT_DIR"] = "output"

# From the template
batch_folders = sorted([f for f in glob.glob(os.path.join('/', os.environ['WORKFLOW_DIR'], os.environ['BATCH_NAME'], '*'))])

for batch_element_dir in batch_folders:

    element_input_dir = os.path.join(batch_element_dir, os.environ['OPERATOR_IN_DIR'])
    element_output_dir = os.path.join(batch_element_dir, os.environ['OPERATOR_OUT_DIR'])
    if not os.path.exists(element_output_dir):
        os.makedirs(element_output_dir)

    # The processing algorithm
    print(f'Checking {element_input_dir} for dcm files and writing results to {element_output_dir}')
    dcm_files = sorted(glob.glob(os.path.join(element_input_dir, "*.dcm*"), recursive=True))

    if len(dcm_files) == 0:
        print("No dicom file found!")
        exit(1)
    else:
        print(("Extracting study_id: %s" % dcm_files))

        incoming_dcm = pydicom.dcmread(dcm_files[0])
        json_dict = {
            'study_id': incoming_dcm.StudyInstanceUID,
            'series_uid': incoming_dcm.SeriesInstanceUID
        }

        if not os.path.exists(element_output_dir):
            os.makedirs(element_output_dir)

        json_file_path = os.path.join(element_output_dir, "{}.json".format(os.path.basename(batch_element_dir)))

        with open(json_file_path, "w", encoding='utf-8') as jsonData:
            json.dump(json_dict, jsonData, indent=4, sort_keys=True, ensure_ascii=True)

Hint: When creating a new algorithm you can always take our templates (templates_and_examples/templates/processing-container) as a starting point and simply add your code snippet in between for the processing.

In order to test the script, we uncomment the os.environ section and adapt WORKFLOW_DIR to the data location on our local file system. Then we execute the script. On the platform, all the environment variables will be set automatically. If the algorithm runs without errors, the most difficult part is already done: we have a running workflow!

7.3.2 Step 2: Check if our scripts runs inside a Docker container

The next step is to put the algorithm into a Docker container and test if everything works as expected. For this, we put the extract_study_id.py into a folder called files, comment out the os.environ section again and create the following file called Dockerfile next to the files directory:

FROM python:3.9-alpine3.12

LABEL IMAGE="example-extract-study-id"
LABEL VERSION="0.1.0"
LABEL CI_IGNORE="True"

RUN pip3 install pydicom==2.0.0

COPY files/extract_study_id.py /

CMD ["python3","-u","/extract_study_id.py"]

The Dockerfile basically copies the python script and executes it.

Hint: Also here you can take our templates as a starting point.

In order to build and test the Dockerfile and the resulting container, proceed as follows:

• Build the docker container by executing:

sudo docker build -t <docker-registry>/<docker-repo>/example-extract-study-id:0.1.0 .

Hint: Depending on your docker registry, <docker-repo> might not be defined (docker-repo='') or is the name of the docker repository!

• Run the docker image; specify the environment variables and mount your local file system into the Docker container at a directory called /data. This mount will also be made automatically on the platform.

sudo docker run -v <path-to-your-data-dir>:/data -e WORKFLOW_DIR='data' -e BATCH_NAME='batch' -e OPERATOR_IN_DIR='dcm-converter' -e OPERATOR_OUT_DIR='segmented-nrrd' <docker-registry>/<docker-repo>/example-extract-study-id:0.1.0

In order to debug directly inside the container, you can execute:

sudo docker run -v <path-to-your-data-dir>:/data -e WORKFLOW_DIR='data' -e BATCH_NAME='batch' -e OPERATOR_IN_DIR='dcm-converter' -e OPERATOR_OUT_DIR='segmented-nrrd' -it <docker-registry>/<docker-repo>/example-extract-study-id:0.1.0 /bin/sh

• Finally, you need to push the docker container to make it available for the workflow. If not already done, log in to the docker registry:

sudo docker login <docker-registry>

and push the docker image with:

sudo docker push <docker-registry>/<docker-repo>/example-extract-study-id:0.1.0


7.3.3 Step 3: Create a DAG and Operator for the created Docker container

Now, we will embed the created Docker container into an operator that will be part of an Airflow DAG.

• Go again to the code-server at <server-domain>/code
• Create a folder called example inside the dags directory
• Create a file called ExtractStudyIdOperator.py with the following content inside the example folder:

from datetime import timedelta
from kaapana.operators.KaapanaBaseOperator import KaapanaBaseOperator, default_registry, default_project


class ExtractStudyIdOperator(KaapanaBaseOperator):

    def __init__(self,
                 dag,
                 execution_timeout=timedelta(seconds=30),
                 *args, **kwargs):

        super().__init__(
            dag=dag,
            name='extract-study-id',
            image=f"{default_registry}/example-extract-study-id:0.1.0",
            image_pull_secrets=["registry-secret"],
            execution_timeout=execution_timeout,
            *args, **kwargs
        )

Hint: Since the operator inherits from KaapanaBaseOperator.py, all the environment variables that we previously defined manually are now passed automatically to the container. Studying KaapanaBaseOperator.py, you will see that you can, e.g., also pass a dictionary called env_vars in order to add additional environment variables to your Docker container!
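As an illustration of the env_vars mechanism mentioned in the hint, the operator instantiation could forward an extra variable to the container as sketched below. This is only a hedged fragment meant as a drop-in for the extract = ... line of the DAG created in the next step; MY_THRESHOLD is a made-up parameter that your extract_study_id.py would have to read itself.

# Hedged sketch: forwarding an additional environment variable into the processing container.
# MY_THRESHOLD is hypothetical; the script inside the container could read it via
# os.environ["MY_THRESHOLD"].
from example.ExtractStudyIdOperator import ExtractStudyIdOperator

extract = ExtractStudyIdOperator(
    dag=dag,                      # `dag` as defined in the DAG file below
    input_operator=get_input,     # `get_input` as defined in the DAG file below
    env_vars={"MY_THRESHOLD": "0.5"},
)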

• In order to use this operator, create a file called dag_example_extract_study_id.py with the following content inside the dags directory:

from airflow.utils.log.logging_mixin import LoggingMixin
from airflow.utils.dates import days_ago
from datetime import timedelta
from airflow.models import DAG
from kaapana.operators.LocalGetInputDataOperator import LocalGetInputDataOperator
from kaapana.operators.LocalMinioOperator import LocalMinioOperator
from kaapana.operators.LocalWorkflowCleanerOperator import LocalWorkflowCleanerOperator
from example.ExtractStudyIdOperator import ExtractStudyIdOperator

ui_forms = {
    "workflow_form": {
        "type": "object",
        "properties": {
            "single_execution": {
                "title": "single execution",
                "description": "Should each series be processed separately?",
                "type": "boolean",
                "default": False,
                "readOnly": False,
            }
        }
    }
}

log = LoggingMixin().log

args = {
    'ui_forms': ui_forms,
    'ui_visible': True,
    'owner': 'kaapana',
    'start_date': days_ago(0),
    'retries': 0,
    'retry_delay': timedelta(seconds=30)
}

dag = DAG(
    dag_id='example-dcm-extract-study-id',
    default_args=args,
    schedule_interval=None
)

get_input = LocalGetInputDataOperator(dag=dag)
extract = ExtractStudyIdOperator(dag=dag, input_operator=get_input)
put_to_minio = LocalMinioOperator(dag=dag, action='put', action_operators=[extract])
clean = LocalWorkflowCleanerOperator(dag=dag, clean_workflow_dir=True)

get_input >> extract >> put_to_minio >> clean

• Now you can again test the final DAG by applying it via the Meta-Dashboard to some image data. If everything works fine, you will find the generated data in Minio.

In the templates_and_examples folder you will find even more examples for DAGs and Docker containers!


7.4 Deploy a Flask Application on the platform

Aim: Deploy a hello-world flask application to the Kaapana platform

7.4.1 Step 1: Create and run our Flask app locally

As a starting point, we first develop a Flask application and run it locally. The source code of the Hello-World Flask application can be found in templates_and_examples/examples/services/hello-world! In case you have never worked with Flask, this tutorial will get you started! First of all, install the requirements:

pip install -r requirements.txt

Now you can try to run the Flask application locally:

flask run

When you go to http://localhost:5000 you should see a hello message! Since flask run is only for development, we use Gunicorn to run in production. Gunicorn is started via the boot.sh bash script. To try it please run:

SCRIPT_NAME=/hello-world gunicorn -b :5000 -e SECRET_KEY='test' -e HELLO_WORLD_USER='klaus' -e APPLICATION_ROOT='/hello-world' run:app

Now the application can be accessed via http://localhost:5000/hello-world. As you can see we adapted the application to run on the prefix path /hello-world. A requirement for any application running on our platform is that it runs within its own subpath, otherwise, it is only possible to serve it via http on a specific port.
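To make the subpath requirement concrete, here is a minimal, hedged Flask sketch (not the actual hello-world sources from templates_and_examples): the route itself stays at /, and the /hello-world prefix is supplied from outside via the SCRIPT_NAME environment variable that gunicorn copies into the WSGI environ.

# Minimal prefix-aware Flask sketch (illustrative only, assuming this file is saved as run.py).
# Run e.g. with: SCRIPT_NAME=/hello-world gunicorn -b :5000 -e HELLO_WORLD_USER='Kaapana' run:app
import os
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # The reverse proxy / SCRIPT_NAME takes care of the /hello-world prefix,
    # so the application code only ever handles "/".
    user = os.environ.get("HELLO_WORLD_USER", "World")
    return f"Hello {user} from Kaapana!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)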

7.4.2 Step 2: Create a Docker container with the Flask application

First of all, we build the docker container with:

sudo docker build -t <docker-registry>/<docker-repo>/hello-world:0.1.0 .

Hint: Depending on your docker registry, <docker-repo> might not be defined (docker-repo='') or is the name of the docker repository!

Then check locally if the docker container works as expected:

sudo docker run -p 5000:5000 -e SECRET_KEY='some-secret-key' -e HELLO_WORLD_USER='Kaapana' -e APPLICATION_ROOT='/hello-world' <docker-registry>/<docker-repo>/hello-world:0.1.0


Again, you should be able to access the application via http://localhost:5000/hello-world.

Now we need to push the docker image to the docker registry. If not already done, log in to the docker registry:

sudo docker login <docker-registry>

and push the docker image with:

sudo docker push <docker-registry>/<docker-repo>/hello-world:0.1.0

7.4.3 Step 3: Write the Kubernetes deployments

Since the Kaapana platform runs in Kubernetes, we will create a Kubernetes deployment, service and ingress in order to get the application running inside the platform. The following steps show how to do that:

• Inside the hello-world-chart/templates/deployment.yaml file, replace the <docker-registry>/<docker-repo> placeholder with your docker registry
• Copy the folder hello-world-chart to the instance where the platform is running
• Log in to the server and go to the templates directory

Now you should be able to deploy the application. Go on the server to the directory containing the hello-world-chart folder and execute:

kubectl apply -f hello-world-chart/templates/

If everything works well, the docker container is started inside the Kubernetes cluster. When going to /hello-world on your platform, you should see the hello kaapana page again. Furthermore, you should also be able to see the application running on port 5000. This is because we specified a NodePort in the service.yaml file. If the pod started successfully, you can also execute:

kubectl get pods -A

Then you should see your pod starting or running! In order to remove the deployment again, execute:

kubectl delete -f hello-world-chart/templates/


7.4.4 Step 4: Write a helm chart and provide it as an extensions

For local testing only, you can go to the hello-world directory and build the Helm chart locally with:

helm package hello-world-chart

This will generate a file called hello-world-chart-0.1.0.tgz, which can be installed on the platform with:

helm install hello-world-chart hello-world-chart-0.1.0.tgz

Now you should have the same result as before, when you created the deployment with kubectl. With helm ls you can view all Helm releases that are currently running. In order to remove the chart, execute:

helm uninstall hello-world-chart

In case you want to push the Helm chart to a registry, you first need to do the following steps:

• Install two plugins on the server (if you are on CentOS you might need to install git first with sudo yum install git):

helm plugin install https://github.com/instrumenta/helm-kubeval
helm plugin install https://github.com/chartmuseum/helm-push

• Add the helm repo to which you want to push the data:

helm repo add --username <username> --password <password> <repo-name> https://dktk-jip-registry.dkfz.de/chartrepo/<project>

• Push the Helm chart to your repo:

helm push hello-world-chart <repo-name>

• Finally, after a helm repo update, you can install the hello-world-chart with:

helm install --version 0.1.0 hello-world-chart <repo-name>/hello-world-chart

Also here, the chart can be deleted again with:

helm uninstall hello-world-chart

Since we have added kaapanaapplication to the keywords in the Chart.yaml definition, your application should also appear in the extension list. If it does not, you might need to update the extension list via:

./install_platform.sh --update-extensions

CHAPTER EIGHT

FREQUENTLY ASKED QUESTIONS (FAQ)

8.1 There seems to be something wrong with the landing-page visualization in the Browser

Most probably the browser version is not supported. We try to support as many browsers as possible.

8.2 Kibana dashboard does not work

You open Kibana/Meta and you see something like this?

The error occurred because the dashboard was opened while not all of the metadata of the images had been extracted yet. You can resolve this by going to https://<domain>/meta, which is the Kibana dashboard. Select "Management" on the left-hand side and then "Index Patterns". Then you should see a panel called "meta-index". In the top right corner there is a refresh button. By clicking this button, the metadata will be updated for the view. Now your dashboard should work as expected!

8.3 Proxy configuration

If you need to configure a proxy in your institution to access the internet, you can do this as follows:

Open /etc/environment with vi and insert:

http_proxy="your.proxy.url:port"
https_proxy="your.proxy.url:port"
HTTP_PROXY="your.proxy.url:port"
HTTPS_PROXY="your.proxy.url:port"

Log out and log in again, then check the network connection:

ping www.dkfz-heidelberg.de

If this works, the network connection is working.

8.4 Setup a connection to the Kubernetes cluster from your lo- cal workstation

Since the whole software runs within Kubernetes, you can connect your local workstation directly to the server and check whether the containers are running or not.

8.4.1 Installation of kubectl

Follow these instructions: How to install Kubectl

To enable the communication between kubectl and the Kubernetes API server, you need to configure your kubectl with the Kubernetes certificate of the server. To get this, you need to use the "jip_tools.sh" on the server with:

cat ~/.kube/config

To configure your local machine, you need to create a config file at:


nano ~/.kube/config

Paste the certificate from above in the file. You should now be able to communicate with the Kubernetes instance.

To check the functionality, you can try:

kubectl get pods --all-namespaces

You should now see a list of some Kubernetes resources.

IF NOT: Check the IP address at the beginning of your config file:

server: https://<server-ip>:<port>

This should match the IP you are using to SSH into the server.

ELSE: Check the date on the server! Check if the datetime is correct with:

date

8.5 Failing to install an extension

Since we use deletion hooks for extensions, the Helm release of an extension may get stuck in the uninstalling process. To check if this is the case, or if the release is stuck in another stage, get a terminal on your server and execute:

helm ls --uninstalled
helm ls --pending
helm ls --failed

Then delete the resource with:

helm uninstall <release-name>

If the resource is still there, delete it with the --no-hooks option:

helm uninstall <release-name> --no-hooks

CHAPTER NINE

GLOSSARY

application: When we speak of an application, we mean a tool or service that can be installed into a running platform via the extensions. An application can be started and deleted and runs statically. An example of an application is JupyterLab.

chart: A chart is a collection of Kubernetes configuration files. For example, there is a kaapana-platform chart containing all the configuration needed for the plain Kaapana platform. Each extension is also wrapped in a Helm chart.

component: …

dag: A DAG (Directed Acyclic Graph) is a Python script describing an Airflow pipeline. It links multiple operators (output to input) to realize a multi-step processing workflow, typically starting with an operator that collects the data and ending with an operator that pushes the processing results back to some data storage.

docker: Docker is the technology that makes it possible to package a whole operating system with all necessary requirements and the program itself into a so-called Docker container. When running such a container, only the physical resources of the host system are used. In Kaapana, every service and workflow runs within a Docker container.

extension: Extensions are either workflows or applications that can additionally be installed on the platform.

helm: We use Helm to distribute and manage our Kubernetes configuration files. This way we only need one Helm chart that contains the whole platform.

kaapana-platform: The kaapana-platform is an example platform with a default configuration that contains many of the typical platform components. This basic platform can be used as a starting point to derive a customized platform for your specific project.

kubernetes: Kubernetes is an open-source container-orchestration system that we use to manage all the Docker containers needed for Kaapana.

microk8s: MicroK8s is a single-package, lightweight Kubernetes distribution that we use to set up our Kubernetes cluster.

operator: Each method or algorithm that is included in Kaapana as a Docker container requires an associated operator. An operator is a Python script that can be included in an Airflow DAG as a processing step and interfaces with the Docker container.

pipeline: See workflow.

platform: A platform describes a system that runs on a remote server and is accessible via the browser. The kaapana-platform is an example of a platform. Using Kaapana, you can basically build your own platform by putting together the services and extensions that you need.

platform-installation-script: This script is used to install a platform into the Kubernetes cluster. Basically, this is done by installing the kaapana-platform chart. In addition, it can be used to reinstall, update and uninstall the platform. Moreover, it can be used to update the extensions, to prefetch all Docker containers needed for the extensions or to install certificates. To see its full functionality, simply execute it with the flag --help. For changes on a running platform itself, execute it without any flag.

registry: A registry is a storage and content-delivery system holding named Docker images, available in different tagged versions.

server: A dedicated physical or virtual machine with a supported operating system on which the platform can run.

server-installation-script: This script is used to install all required dependencies on the server. It can be found within the Kaapana repository: kaapana/server-installation/server_installation.sh. Currently the following operating systems are supported by the script:
• CentOS 8
• Ubuntu 20.04
• Ubuntu Server 20.04
The script will do the following:
1. Configure a proxy (if needed)
2. Install packages if not present: snap, nano, jq and curl
3. Install and configure microk8s
4. (optional) Enable GPU support for microk8s
5. (optional) Change the SSL certificates
It will also add some commands to the .bashrc of each user to provide a shortcut to the kubectl command and to enable auto-completion.

service: Every Docker container that runs statically inside Kaapana is a service. Examples of services are Minio, OHIF, etc.

single file and batch processing: The difference between single file and batch processing is that in single file processing a separate DAG is triggered for every image, so each operator within the DAG only receives a single image at a time. With batch processing, only one DAG is started for all the selected images and every operator receives all images in the batch. In general, batch processing is recommended. Single file processing is only necessary if an operator within the workflow can only handle one image at a time.

workflow: A workflow in our definition is basically an Airflow DAG: a number of processing steps applied to a cohort of images. Synonyms used for "workflow" are "pipeline" and "DAG". Some of the workflows are preinstalled in the platform. Other workflows can be installed and added to Airflow via the extensions.

CHAPTER TEN

RELEASES

10.1 Version 0.1.0

Initial release of Kaapana

CHAPTER ELEVEN

LICENSE AND CONTACT

11.1 License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program (see file LICENCE). If not, see gnu.org.

11.1.1 Considerations on our license choice

Since Kaapana is a highly modular system, you can use it to build any product you like, including commercial closed-source ones. Kaapana is licensed under the GNU Affero General Public License for now, since we want to ensure that we can integrate all developments and contributions into its core system for maximum benefit to the community and give everything back. We consider switching to a more liberal license in the future. This decision will depend on how our project develops and what the feedback from the community is regarding the license.

Kaapana is built upon the great work of many other open source projects, see the documentation for details. For now we only release source code we created ourselves, since providing pre-built Docker containers and licensing for highly modular container-based systems is a complex task. We have done our very best to fulfil all requirements, and the choice of AGPL was motivated mainly by making sure we can improve and advance Kaapana in the best way for the whole community. If you have thoughts about this, if you disagree with the way we use a particular third-party toolkit, or if you miss something, please let us know and get in touch. We are open to any feedback and advice on this challenging topic.


11.2 Contact

Kaapana Team Heidelberg [email protected]

German Cancer Research Center (DKFZ) Division of Medical Image Computing (MIC)

Deutsches Krebsforschungszentrum
Radiologisches Forschungs- und Entwicklungszentrum
Im Neuenheimer Feld 280
69120 Heidelberg

Copyright (C) 2020 German Cancer Research Center (DKFZ)
