Capsule Nets for Complex Medical Image Segmentation Tasks

Shanmugapriya Survarachakan1, Jenny Stange Johansen1, Mathias Aarseth Pedersen1, Mahdi Amani1,2, and Frank Lindseth1

1 Norwegian University of Science and Technology, Department of Computer Science, Trondheim, Norway
fshanmugapriya.survarachakan, mahdi.amani, [email protected]
fjennysj, [email protected]
2 SimulaMET, Simula Research Laboratory, Oslo, Norway

Abstract. In recent years, automatic image segmentation using deep learning has shown great potential. U-Net is commonly applied to the medical image segmentation task, and a newer architecture called SegCaps, based on capsule networks, has also been introduced. Even though medical image segmentation has made great strides, it is still a complex task and continued research in this area is important. In this paper, the method from the SegCaps paper was implemented and trained on the same dataset, achieving a similar result. In addition, a Multi-SegCaps model, an EM-routing SegCaps model and a U-Net model able to segment an arbitrary number of classes were developed. The models segment a single 2D slice using two neighboring slices on each side as additional input (five slices in total). They were trained to perform left atrium and hippocampus (anterior and posterior) segmentation on the MSD datasets, and the performance of the Multi-SegCaps model and the EM-routing SegCaps model was compared to that of the U-Net model. The 2.5D U-Net model performed best overall on all the datasets, achieving a Dice score of 91.39% for the left atrium segmentation and Dice scores of 85.18% and 83.90% for the anterior and posterior hippocampus segmentation, respectively. The multi-class SegCaps model was also applied to two different datasets, showing the ability to segment several classes reasonably well.
It achieved a Dice score of 68.2% for the left atrium and Dice scores of 72.42% and 70.49% for the anterior and posterior hippocampus, respectively. The EM-SegCaps model was applied to the hippocampus dataset, where it achieved a Dice score of 54.50% for the whole hippocampus and Dice scores of 18.67% and 24.52% for the anterior and posterior hippocampus, respectively.

Keywords: Medical Image Segmentation · U-Net · SegCaps · Capsule Network.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Colour and Visual Computing Symposium 2020, Gjøvik, Norway, September 16-17, 2020.

1 Introduction

Medical image segmentation methods attempt to extract and precisely locate organs, tumors and other structures of interest, with the intention of aiding health professionals in making accurate diagnoses in a shorter amount of time.

In recent years, convolutional neural networks (CNNs) have been commonly used for image analysis tasks. A successful application of CNNs to image segmentation was shown by [9], who introduced fully convolutional networks (FCNs) in which skip connections directly connect the opposing convolutional layers in the contraction and expansion paths. The network works regardless of the original image size, without requiring any fixed number of units. Performance was improved by the elegant fully convolutional U-Net architecture, published by [10]. The architecture is symmetric, and the skip connections between the downsampling path and the upsampling path apply a concatenation operator instead of a sum. The network combines the location information from the downsampling path with the contextual information in the upsampling path and preserves the spatial resolution of the output. SegNet, an encoder-decoder network similar to FCN, was introduced in [1].
It uses max unpooling in the upsampling path, which eliminates the need for the network to learn the upsampling and provides a more efficient way to achieve segmentations than FCN. [6] created a new U-Net type architecture, showing state-of-the-art performance on the task of multimodal brain tumor segmentation using the BraTS dataset. The architecture was residual and worked directly with 3D patches of medical images.

Though CNNs are widely used, they are limited by their lack of rotational invariance and of instantiation parameters, so many images with different views are needed to train the network. To overcome this issue, [11] presented a new capsule neural network architecture. Unlike CNNs, capsule networks are equivariant: they can detect an object when it is rotated, as well as by how many degrees it is rotated. This reduces the number of images required to train the network. [11] showed that capsule networks gave state-of-the-art performance on the task of classifying highly overlapping digits in the MNIST dataset, proving CapsNet to be an efficient architecture for learning this type of problem. In [3], a capsule network consisting of two convolutional layers, two capsule layers, one fully connected layer and two deconvolutional layers, employing a dynamic routing mechanism, was proposed to segment the left ventricle (LV). Building on the work on capsule networks, [8] presented a method for performing binary image segmentation. The number of parameters needed by the architecture was reduced substantially by constraining the dynamic routing to capsules within a defined kernel. The network showed good results for lung segmentation using the LUNA16 dataset. [5] claimed that the EM-routing algorithm solves most of the issues with the routing algorithm from [11]. It reduces the number of parameters needed and uses a metric of agreement that does not saturate with highly confident predictions.
It gave state-of-the-art performance at detecting objects at novel viewpoints on the smallNORB dataset. In [2], a novel capsule named Matwo-Caps is introduced, which combines pose and appearance information encoded as capsules. Additionally, a new routing mechanism, dual routing, which combines these two kinds of information, was proposed. The method was evaluated for the semantic segmentation of the JSRT chest X-ray dataset. In [4], Fourier analysis and the circular Hough transform were applied to indicate the approximate location of the LV, and a capsule network with dynamic routing was used to precisely segment the LV. Thresholding and morphological processing were used as postprocessing methods to increase the accuracy of LV segmentation.

In our work, the method and the results from [8] were reproduced, and the model was trained on other medical datasets and different modalities. The architecture was, as reported in the paper, capable of segmenting images of lungs. However, the experiment also revealed that the architecture struggles to adapt to another medical dataset. A SegCaps architecture was further developed and evaluated for multi-label image segmentation. It was experimentally shown to be able to segment datasets with multiple label classes, but it struggles to segment structures in imbalanced datasets. Additionally, a framework for image segmentation using capsule networks and the EM-routing algorithm was developed and evaluated. To the authors' knowledge, no such attempt has been reported in the literature so far. Although the model did not give very good results, it was capable of segmenting some simple structures from the hippocampus dataset.
A 2.5D architecture based on a U-Net variant was implemented, and its performance was compared with that of the SegCaps, the Multi-SegCaps and the EM-SegCaps models on the cardiac and the hippocampus datasets from the Medical Segmentation Decathlon (MSD).

This paper consists of five sections. The first section gives an introduction to the paper and the motivation behind it. The various methods and the datasets used in the paper are presented in the second section. In the third section, the experiments comparing different methods and their corresponding results are presented. The fourth section summarizes the discussion of the results, and the fifth section presents the conclusion and suggests future work.

2 Methodology

This section presents the different models developed and the datasets used in the experiments.

2.1 SegCaps

The SegCaps architecture presented in [8] was used to perform binary segmentation of medical images (see Fig. 1). The architecture used is called SegCaps R3, referring to the use of three iterations of the dynamic routing algorithm in each capsule.

Fig. 1. SegCaps architecture

A regular 2D convolution is performed on an input image to produce 16 feature maps. The tensor consisting of 16 feature maps is reshaped to give it a new dimension of length one, such that the reshaped tensor represents a single 16D capsule. The 16D capsule is forwarded to the primary capsule layer, which is a regular convolutional capsule layer with one routing iteration and which returns predictions of two 16D capsules. Following the primary capsule layer, there are sets of two convolutional capsule layers at every level for the remainder of the contracting path. The first of the two operations is a 5x5 convolutional capsule layer with stride two, which also doubles the number of feature maps that are output. The second layer consists of 5x5 convolutional capsules with stride one.
Output features from the latter operation being forwarded to the next level of the contracting
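The dynamic routing that SegCaps R3 runs for three iterations can be illustrated with a minimal sketch of routing in its commonly described routing-by-agreement form. This is an illustration under stated assumptions, not the implementation used in the paper: the function names (squash, softmax, dynamic_routing) and the toy prediction vectors are hypothetical, routing is performed over all input capsules rather than only those within a kernel as in [8], and the learned transformation that produces the prediction vectors is omitted.

```python
import math


def squash(s):
    """Squash nonlinearity: keeps the direction of s and maps its norm into [0, 1)."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0 for _ in s]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(b - m) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, iterations=3):
    """Route prediction vectors to output capsules.

    u_hat[i][j] is the prediction vector of input capsule i for output capsule j
    (assumed to be already computed). Returns one output vector per output capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start at zero
    v = []
    for _ in range(iterations):
        # Coupling coefficients: each input capsule distributes itself over outputs.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            # Weighted sum of the predictions for output capsule j, then squash.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement update: predictions aligned with the output raise their logit.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


# Toy example (hypothetical values): two input capsules, two output capsules.
# Both inputs agree on output 0 and contradict each other on output 1.
toy = [[[1.0, 0.0], [0.1, 0.0]],
       [[1.0, 0.0], [-0.1, 0.0]]]
for j, capsule in enumerate(dynamic_routing(toy, iterations=3)):
    print(j, math.sqrt(sum(x * x for x in capsule)))
```

In this toy input the two predictions for the first output capsule agree, so its output vector ends up longer than that of the second output capsule, whose predictions cancel. The length of an output vector can be read as a measure of how strongly the corresponding entity is present.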
