SIIM 2017 Scientific Session Posters & Demonstrations

Creating an Open Source Infrastructure for Image Phenotyping in Clinical Research

Brian E. Chapman, PhD, University of Utah; John A. Roberts, PhD; Adam Sorenson

Background

Biomedical images are a rich source of information that can be exploited for biological discovery. Mature ecosystems exist for phenotyping from microscopy images (e.g. Carpenter 2006), but infrastructure for the analysis of clinical images is less mature. While open source tools such as 3D Slicer (Pieper 2004) exist, image analysis needs to be incorporated into a larger ecosystem that provides case identification, subject de-identification with linkage to other data, and knowledge-based data management. Here we describe an open-source ecosystem that can be used to facilitate phenotyping with biomedical images.

Case Presentation

We have built an image phenotyping ecosystem that is based entirely on open-source tools (primarily Orthanc, Girder, and Jupyter) and leverages Docker for ease of deployment and administration. Figure 1 shows the schematic of our system. Separate physical or logical networks are used to isolate services touching PHI. Further, Docker is used to provide individual compute environments for users interacting with data stored in our repository.

Figure 1

We use existing research and operational databases to identify cases of interest. Database queries are run, and their results stored, within a HIPAA-compliant infrastructure. The identified accession numbers, medical record numbers, and dates are then used to push the selected cases from our clinical PACS using dcmtk (http://dicom.offis.de/dcmtk.php.en) scripts. Images are then pulled into an Orthanc system. Orthanc (http://www.orthanc-server.com/) is an open-source, lightweight DICOM server with a RESTful API that makes it highly configurable.

We use a two-tier Orthanc system. The first (anonymizing) Orthanc server de-identifies the images using configurable Lua scripts, stores the identifying information in a secure database, and then relays the anonymized images to a second (safe harbor) Orthanc server. The two tiers isolate PHI: downstream systems such as Girder never interact with the system containing PHI, and data flow within the Orthanc system is one-way, so the safe harbor Orthanc server cannot pull data from the anonymizing server. Further scripts invoke the Orthanc RESTful API to push images from the safe harbor server into the Girder data store.

We have extended the anonymizing Orthanc server to support longitudinal studies. Images of the same subject acquired at different time points receive consistent anonymized patient identifiers. The Lua script maintains a map within a PostgreSQL database linking original and anonymized data. Further care is taken to maintain relative chronological order between anonymized longitudinal studies while still randomizing study dates as per HIPAA safe harbor requirements.
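The longitudinal behaviour described above can be illustrated with a short Python sketch. It is not the Lua script used in the system: the class name, the in-memory dictionary (standing in for the database map), and the choice of a single random day offset per subject are all assumptions made for illustration. A constant per-subject offset is one simple way to randomize study dates while keeping a subject's studies in their original order; whether a particular shifting scheme meets local de-identification policy is outside the scope of this sketch.

```python
import secrets
from datetime import date, timedelta


class LongitudinalPseudonymizer:
    """Hypothetical in-memory stand-in for the identifier map kept in the database."""

    def __init__(self, max_shift_days=365):
        self.max_shift_days = max_shift_days
        self._subjects = {}  # original MRN -> (anonymized ID, day offset)

    def _entry(self, mrn):
        # Create the pseudonym and the date offset once per subject, then reuse them.
        if mrn not in self._subjects:
            anon_id = "ANON-" + secrets.token_hex(8)
            span = 2 * self.max_shift_days + 1
            offset = secrets.randbelow(span) - self.max_shift_days
            self._subjects[mrn] = (anon_id, offset)
        return self._subjects[mrn]

    def anonymize(self, mrn, study_date):
        """Return (anonymized ID, shifted study date) for one study."""
        anon_id, offset = self._entry(mrn)
        return anon_id, study_date + timedelta(days=offset)


pseudonymizer = LongitudinalPseudonymizer()
baseline = pseudonymizer.anonymize("MRN001", date(2015, 3, 1))
follow_up = pseudonymizer.anonymize("MRN001", date(2016, 3, 1))
```

Because both studies share one offset, the interval between the shifted dates equals the interval between the original dates, and the two studies carry the same anonymized identifier.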

Our persistent data storage is built on Girder (https://girder.readthedocs.io/en/latest/index.html), an open source, web-based data management system. Girder is agnostic to the type of data being stored, but through a flexible plugin architecture it can provide customized support for a variety of data types. Girder provides access management and web visualization of data and associated metadata. Metadata are stored as JSON attribute-value pairs. To ensure effective data retrieval within Girder, all data carry ontological metadata that describe the anatomical regions contained within the images as well as any disease process(es) associated with the images (e.g. reason for exam or findings identified within the studies). We use BioPortal (http://bioportal.bioontology.org/) as our ontology service. The persistent uniform resource locators (PURLs) associated with the ontological terms are stored in the metadata for each image. We currently use the Uberon Ontology (http://uberon.github.io/) for anatomical concepts and the Human Disease Ontology (http://disease-ontology.org/) for disease concepts.
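As an illustration of what such ontological metadata might look like, the following Python sketch builds a JSON attribute-value record holding PURLs. The key names ("anatomy", "disease") and the helper functions are hypothetical, and the sketch assumes OBO-style PURLs of the form http://purl.obolibrary.org/obo/PREFIX_ID. The two term identifiers are illustrative (intended as lung and pulmonary hypertension) and should be verified against BioPortal before use.

```python
import json

OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def obo_purl(curie):
    """Turn a compact identifier such as 'UBERON:0002048' into an OBO-style PURL."""
    prefix, local_id = curie.split(":")
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def build_ontology_metadata(anatomy_curies, disease_curies):
    """Build a JSON-serializable record; the key names are assumptions."""
    return {
        "anatomy": [obo_purl(c) for c in anatomy_curies],
        "disease": [obo_purl(c) for c in disease_curies],
    }


# Illustrative identifiers only; check them against the ontology service.
metadata = build_ontology_metadata(["UBERON:0002048"], ["DOID:6432"])
print(json.dumps(metadata, indent=2))
```

Storing the full PURL rather than a free-text label means that two images tagged with the same concept can be matched by exact string comparison.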

When DICOM images are ingested into Girder, the public DICOM header fields are extracted into the JSON metadata structures associated with each data item. DICOM images are transferred from Orthanc to Girder using the web APIs of Orthanc and Girder.
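A minimal Python sketch of the header extraction step follows. It assumes that "public" means standard (non-private) data elements, which DICOM distinguishes by group number: private elements use odd group numbers. The function name, the input format (a dictionary keyed by (group, element) pairs), and the "GGGG,EEEE" key style are assumptions for illustration, not the ingestion code used in the system.

```python
def public_header_to_metadata(header):
    """Keep public (even-group) elements, keyed by 'GGGG,EEEE' tag strings."""
    metadata = {}
    for (group, element), value in header.items():
        if group % 2 == 1:  # odd group numbers denote private elements
            continue
        metadata[f"{group:04X},{element:04X}"] = value
    return metadata


# Hypothetical header: Modality, Slice Thickness, and one private element.
example_header = {
    (0x0008, 0x0060): "MR",
    (0x0018, 0x0050): "3.0",
    (0x0029, 0x1010): "vendor-specific value",
}
public_metadata = public_header_to_metadata(example_header)
```

The resulting dictionary contains only the two public elements and can be serialized directly to JSON.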

Data can be downloaded from Girder through the web pages, through the RESTful API, or via direct interaction between an application and Girder. Applications such as 3D Slicer (https://www.slicer.org/) and XTK (https://github.com/xtk/X#readme) can be configured to interact directly with Girder, bypassing the need to download images and upload results. In addition to these specialized applications, we have built a flexible interface to Girder based on Jupyter notebooks (http://jupyter.org/). Jupyter notebooks allow programmatic interaction with data through a web browser via a number of programming languages, including Python and R. Data interactivity is achieved with JavaScript. Python is a particularly useful language for image analysis because of the large number of relevant packages available, including SimpleITK, scikit-learn, and TensorFlow. Using nvidia-docker, it is easy to provide GPU computing resources through the Jupyter notebook. We have created Python packages for intuitive interaction with Girder, including Jupyter notebook widgets for data uploading that incorporate tagging data with ontological descriptions of anatomical regions and depicted disease (Figure 2).
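To give a sense of programmatic access from a notebook, the sketch below filters the items in a Girder folder by an anatomy PURL. It is not the package described above. It assumes a client object that offers the listItem method of the girder_client Python library, and it assumes a hypothetical metadata layout in which each item's "meta" field holds a list of PURLs under an "anatomy" key. The function name and the URL in the comments are also placeholders.

```python
def find_items_by_anatomy(client, folder_id, anatomy_purl):
    """Return items in a folder whose metadata lists the given anatomy PURL."""
    matches = []
    for item in client.listItem(folder_id):
        meta = item.get("meta", {})
        # The "anatomy" key is an assumed convention, not a Girder built-in.
        if anatomy_purl in meta.get("anatomy", []):
            matches.append(item)
    return matches


# In a notebook, the client would typically be created along these lines
# (left commented out because it needs a running Girder server):
#
#   import girder_client
#   client = girder_client.GirderClient(apiUrl="https://girder.example.org/api/v1")
#   client.authenticate(apiKey="...")
#   lung_items = find_items_by_anatomy(
#       client, folder_id, "http://purl.obolibrary.org/obo/UBERON_0002048")
```

Passing the client in as an argument keeps the filtering logic independent of the server connection, so it can be exercised with a stub object.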

Figure 2

Outcome

The image phenotyping infrastructure we have created is being used to support a number of clinical research projects including mammography, MR imaging of carotid disease, CTPA evaluation of pulmonary hypertension, and dermatological imaging of melanoma.

Discussion

The use of Docker greatly simplifies deploying and distributing the software. Jupyter notebooks provide a powerful web interface to the data and facilitate programming with Python, which has a rich set of packages for medical image analysis. Serving the notebooks to users from Docker containers gives each user an isolated compute environment.

Conclusion

The software infrastructure we have created is built entirely using open source tools and will be freely distributed. Our infrastructure leverages existing projects and will allow us to take advantage of expanding web-based computing capabilities as they are developed.

References

1. Carpenter, Anne E., et al. "CellProfiler: image analysis software for identifying and quantifying cell phenotypes." Genome biology 7.10 (2006): R100.
2. Pieper, Steve, Michael Halle, and Ron Kikinis. "3D Slicer." Biomedical Imaging: Nano to Macro, 2004. IEEE International Symposium on. IEEE, 2004.

Keywords

phenotyping, Python, Docker, Girder, Orthanc