DUG-RECON: A Framework for Direct Image Reconstruction Using Convolutional Generative Networks

V.S.S. Kandarpa∗, Alexandre Bousse, Didier Benoit and Dimitris Visvikis

All the authors are affiliated to LaTIM, INSERM, UMR 1101, Université de Bretagne Occidentale.
∗ (email: [email protected])

Abstract—This paper explores convolutional generative networks as an alternative to iterative reconstruction in medical image reconstruction. The task of medical image reconstruction involves mapping projection domain data collected from the detector to the image domain. This mapping is typically done through iterative reconstruction algorithms, which are time consuming and computationally expensive. Trained deep learning networks provide faster outputs, as proven in various tasks across computer vision. In this work we propose a direct reconstruction framework built exclusively with deep learning architectures. The proposed framework consists of three segments, namely denoising, reconstruction and super resolution. The denoising and the super resolution segments act as processing steps. The reconstruction segment consists of a novel double U-Net generator (DUG) which learns the sinogram-to-image transformation. This entire network was trained on positron emission tomography (PET) and computed tomography (CT) images. The reconstruction framework approximates two-dimensional (2-D) mapping from the projection domain to the image domain. The architecture proposed in this proof-of-concept work is a novel approach to direct image reconstruction; further improvement is required to implement it in a clinical setting.

Index Terms—Medical Image Reconstruction, Deep Learning, Generative Adversarial Networks

I. INTRODUCTION

The use of deep learning in medical imaging has been on the rise over the last few years. It has widely been used in various tasks across medical imaging such as image segmentation [1]–[5], image denoising [6]–[9] and image analysis [10]–[12]. The utilization of deep learning for image reconstruction is a more challenging task. Image reconstruction using deep learning corresponds to the task of mapping raw projection data retrieved from the detector to image domain data. One can broadly identify three different categories of approaches for the implementation of deep learning within the framework of medical image reconstruction:

(i) methods that use deep learning as an image processing step that improves the quality of the raw data and/or the reconstructed image [13], [14];
(ii) methods that embed deep-learning image processing techniques in the iterative reconstruction framework to accelerate convergence or to improve image quality [15]–[17];
(iii) direct reconstruction with deep learning alone, without any use of traditional reconstruction methods [18]–[20].

The use of deep learning for the development of either data corrections or post-reconstruction image based approaches (i) has shown potential to improve the quality of reconstructed images. An example of data corrections that improve the raw data through scatter correction is proposed in [14]. In that work a modified U-Net is used to estimate scatter and correct the raw data in order to improve computed tomography (CT) images. Denoising the reconstructed positron emission tomography (PET) images with a deep convolutional network was done in [13]. The authors used perceptual loss along with mean squared error (MSE) to preserve the qualitative and quantitative accuracy of the reconstructed images. The network was initially trained on simulated data and then fine-tuned on real patient data. Despite resulting in an improvement of the reconstructed output, the above mentioned methods do not directly intervene in the reconstruction process. This can be done using the two distinct frameworks (ii) and (iii).

The first one involves the incorporation of a deep neural network into an unrolled iterative algorithm, where a trained neural network accelerates the convergence by improving the intermediate estimates in the iterations [15]–[17]. The paper by Gong et al. [17] used a modified U-Net to represent images within the iterative reconstruction framework for PET images. The deep learning architecture was trained on low-dose reconstructed images as input and high-dose reconstructed images as the output. The work by Xie et al. [15] further extended this work by replacing the U-Net with a generative adversarial network (GAN) for image representation within the iterative framework. Kim et al. [16] incorporated a trained denoising convolutional neural network (DnCNN) along with a novel local linear fitting function into the iterative algorithm. The DnCNN, which is trained on data with multiple noise levels, improves the image estimate at each iteration. They used simulated and real patient data in their study. The second framework, referred to as direct reconstruction, is based on performing the whole reconstruction process (replacing the classical framework) with a deep learning algorithm, using raw data as input and reconstructed images as output [18]–[20]. In contrast to (i) and (ii), which have been extensively investigated, direct reconstruction (iii) using deep learning has been much less explored.

There have been to date three particular approaches that are relevant to this strategy. The deep learning architecture proposed by Zhu et al. [19], called AUTOMAP, uses fully connected (FC) layers (which encode the raw data information) followed by convolutional layers. The first three layers in this architecture are FC layers with dimensions 2n², n² and n² respectively, where n × n is the dimension of the input image. AUTOMAP therefore requires the estimation of a huge number of parameters, which makes it computationally intensive. Although initially developed for magnetic resonance imaging (MRI), AUTOMAP has been claimed to work on other imaging modalities too. Brain images encoded with sensor-domain sampling strategies and varying levels of additive white Gaussian noise were reconstructed with AUTOMAP. Within the same concept of FC-layer architectures, a three-stage image reconstruction pipeline called DirectPET has been proposed to reduce the associated computational issues [18]. The first stage down-samples the sinogram data, following which a unique Radon inversion layer encodes the transformation from sinogram to image space. Finally the estimated image is improved using a super resolution block. This work was applied to full body PET images and remains the only approach that can reconstruct multiple slices simultaneously (up to 16 images). DeepPET is another approach, implemented on simulated images, using an encoder-decoder architecture based on the neural network proposed by the Visual Geometry Group [20]. Using realistic simulated data, the authors demonstrated a network that could reconstruct images faster, and with an image quality (in terms of root mean squared error) comparable to that of conventional iterative reconstruction techniques.

In our work we explore the use of U-Net based deep learning architectures [1] to perform a direct reconstruction from the sinogram to the image domain using real patient datasets. Our aim is to reduce the number of trainable parameters while exploring a novel strategy for direct image reconstruction using generative networks. More specifically, our approach consists of a three-stage deep-learning pipeline comprising denoising, image reconstruction and super resolution segments. Our experiments included training the deep learning pipeline on PET and CT sinogram-image pairs. A single pass through the trained network transforms the noisy sinograms into reconstructed images. The reconstruction of both PET and CT datasets was considered and is presented in the following sections. At this stage the work presented is a proof of concept and needs further improvement before being applied in a clinical setting.

Fig. 1: Deep Learning in Medical Image Reconstruction. The figure outlines where deep learning can intervene: data corrections in the projection domain, iterative, unrolled iterative or fully deep learning reconstruction algorithms, and post-processing in the image domain.

II. MATERIALS AND METHODS

A. Image Reconstruction Model

In medical imaging, image reconstruction corresponds to the task of reconstructing an image x ∈ R^m from a scanner measurement y ∈ R^n, where m is the number of voxels defining the image and n is the number of detectors in the scanner. In CT, the image x = µ corresponds to X-ray attenuation, measured by the proportion of X-rays scattered or absorbed as they pass through the object. In PET, x = λ is the distribution of a radiotracer delivered to the patient by injection, and is measured through the detection of pairs of γ-rays emitted in opposite directions (indirectly from the positron-emitting radiotracer).

The measurement y is a random vector modeling the number of detections (photon counting) at each of the n detector bins, and follows a Poisson distribution with independent entries:

    y ∼ Poisson(ȳ(x))                                    (1)

where ȳ(x) ∈ R^n is the expected number of counts (noiseless), which is a function of the image x. In a simplified setting, the expected number of counts in CT is

    ȳ(µ) = exp(−Lµ)                                       (2)

where L ∈ R^{n×m} is a system matrix such that each entry [L]_{i,j} represents the contribution of the j-th image voxel to the i-th detector. In PET, the expected number of counts is (also in a simplified setting)

    ȳ(λ) = Pλ                                             (3)

where P ∈ R^{n×m} is a system matrix such that each entry [P]_{i,j} represents the probability that a photon pair emitted from voxel j is detected at the i-th detector. For simplification we assume P = L, which is a reasonable assumption in a non time-of-flight setting. Image reconstruction is achieved by finding a suitable image x̂ = µ̂ or λ̂ that approximately solves

    y = ȳ(x).                                             (4)
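To make the simplified forward models (1)-(3) concrete, the sketch below simulates noisy CT and PET measurements from a 2-D slice. It assumes that the action of the system matrices L and P can be approximated by the 2-D Radon transform of scikit-image (the data preparation in Section II-C uses a Python Radon transform in the same spirit); the function names, count levels and number of projection angles are illustrative and not the values used in this work.

```python
# Minimal sketch of the simplified forward models (1)-(3). The action of the system
# matrices L and P is approximated here by the 2-D Radon transform of scikit-image;
# function names, count levels and the number of projection angles are illustrative.
import numpy as np
from skimage.transform import radon

def forward_project(image, angles):
    """Approximate y_bar = P @ x (eq. 3) / the line integrals L @ mu (eq. 2)."""
    return radon(image, theta=angles, circle=False)

def simulate_pet_counts(lam, angles, total_counts=1e6, rng=None):
    """PET measurement: y ~ Poisson(P @ lambda), eqs. (1) and (3)."""
    rng = np.random.default_rng(rng)
    y_bar = forward_project(lam, angles)
    y_bar *= total_counts / y_bar.sum()        # set the expected count level (noise level)
    return rng.poisson(y_bar)

def simulate_ct_counts(mu, angles, blank_counts=1e4, rng=None):
    """CT measurement: y ~ Poisson(b * exp(-L @ mu)), eqs. (1) and (2);
    the blank-scan factor b is an illustrative addition for a meaningful count level."""
    rng = np.random.default_rng(rng)
    y_bar = blank_counts * np.exp(-forward_project(mu, angles))
    return rng.poisson(y_bar)

angles = np.linspace(0.0, 180.0, 180, endpoint=False)   # projection angles in degrees
```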
Filtered-backprojection (FBP) techniques (see [21] for a review) can efficiently solve (4), but they are vulnerable to noise as the system matrix P is ill-conditioned. Since the 80's, model-based iterative reconstruction (MBIR) techniques [22], [23] became the standard approach. They consist in iteratively approximating a solution x̂ such that ȳ(x̂) maximizes the likelihood of the measurement y. As they model the stochasticity of the system, they are more robust to noise as compared with FBP, and can be completed with a penalty term for additional control over the noise [24]. However, they are computationally expensive, as each iteration requires the computation of a projection and a backprojection.
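As an illustration of that per-iteration cost, a minimal maximum-likelihood expectation-maximization (MLEM) loop in the spirit of [22] is sketched below, reusing the Radon-based projector from the previous sketch. The use of skimage's iradon with filter_name=None as an (unnormalized) backprojector, and the exact argument names, are assumptions that may need adjusting to the scikit-image version; the point is only that every update performs one projection and one backprojection.

```python
# Minimal MLEM sketch in the spirit of [22], reusing forward_project() from the sketch
# above. skimage's iradon with filter_name=None is used as an approximate, unnormalized
# backprojector; this is purely illustrative of the per-iteration projection/backprojection cost.
import numpy as np
from skimage.transform import iradon

def backproject(sino, angles, size):
    return iradon(sino, theta=angles, filter_name=None, circle=False, output_size=size)

def mlem(y, angles, size, n_iter=50, eps=1e-9):
    lam = np.ones((size, size))                                       # initial estimate
    sens = backproject(np.ones_like(y, dtype=float), angles, size)    # sensitivity image
    for _ in range(n_iter):                     # one projection + one backprojection per pass
        y_bar = forward_project(lam, angles)                          # P @ lambda
        correction = backproject(y / np.maximum(y_bar, eps), angles, size)
        lam *= correction / np.maximum(sens, eps)
    return lam
```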
Image reconstruction with deep learning, however, is a data-driven approach in which there is a training phase and a prediction phase. Given a set of training data, consisting of a subset of the raw data (y) and its corresponding images (x), a deep learning architecture learns the mapping from raw data to image and improves this mapping through the training process. During the prediction phase, a subset of raw data different from the training data serves as the input to the trained deep learning architecture. The output is a reconstructed image obtained in a single forward pass through the network, hence making the reconstruction process through deep learning instantaneous as opposed to an iterative process. This makes direct reconstruction with deep learning faster and less computationally expensive than iterative algorithms.

B. Deep Learning Architectures

As shown in Figure 2, we propose a three-stage deep learning pipeline for the task of tomographic reconstruction. In the first step the raw data (projection space) are denoised. Next, the denoised sinograms are transformed to the image domain in the image reconstruction segment. The third and final segment operates in the image domain to improve the image produced after domain transformation. The following sections discuss these segments in detail.

Fig. 2: Proposed deep learning pipeline for direct image reconstruction (Denoising → Image Reconstruction → Super Resolution).

1) Denoising: We used a modified U-Net architecture to denoise the Poisson sampled sinograms, based on the work previously carried out for ultrasound denoising [25]. The U-Net is an encoder-decoder network which was initially implemented for segmentation, but over the years its applications have broadened. As shown in Figure 3, an increasing number of convolutions along with max pooling arrive at an encoding of the input, and convolutions followed by upsampling then arrive at an output with an identical dimension as the input. The important modification in the architecture of [25] with respect to the U-Net was the residual connection from the input to the final output. Perdios et al. trained the denoising architecture on simulated ultrasound images so as to enhance ultrafast ultrasound imaging. This denoising architecture corresponds to the first segment in our proposed framework. It was trained on raw data pairs, i.e., low-count and high-count sinograms, considering multiple noise levels. The detailed architecture is represented in Figure 3.

Fig. 3: Representation of the denoising network (successive blocks of 3 convolutional layers with 32, 64, 128, 256 and 512 filters of size 3×3, with concatenation skip connections). The inputs to the network were two-dimensional (2-D) grayscale slices with resolution 128×128 and the outputs were denoised sinograms.

We defined the loss function between a true sinogram y* = [y*_1, ..., y*_n]^T ∈ R^n and a prediction ŷ = [ŷ_1, ..., ŷ_n]^T ∈ R^n as the MSE:

    MSE(y*, ŷ) = (1/n) Σ_{i=1}^{n} (y*_i − ŷ_i)²          (5)

where n is the number of pixels in the sinogram, corresponding to the number of detectors in the scanner.
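A compact Keras sketch of a residual encoder-decoder denoiser in the spirit of this segment is given below; the depth, filter counts and training call are illustrative and do not reproduce the exact configuration of Figure 3.

```python
# Compact Keras sketch of a residual encoder-decoder sinogram denoiser in the spirit of
# Fig. 3 and [25]; depth and filter counts are illustrative, not the exact configuration.
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_denoiser(shape=(128, 128, 1)):
    inp = layers.Input(shape=shape)
    e1 = conv_block(inp, 32)
    e2 = conv_block(layers.MaxPooling2D()(e1), 64)
    b = conv_block(layers.MaxPooling2D()(e2), 128)
    d2 = conv_block(layers.Concatenate()([layers.UpSampling2D()(b), e2]), 64)
    d1 = conv_block(layers.Concatenate()([layers.UpSampling2D()(d2), e1]), 32)
    out = layers.Conv2D(1, 1, padding="same")(d1)
    out = layers.Add()([inp, out])              # residual connection from input to output
    return Model(inp, out, name="denoiser")

denoiser = build_denoiser()
denoiser.compile(optimizer="adam", loss="mse")  # MSE loss of eq. (5)
# denoiser.fit(low_count_sinograms, high_count_sinograms, epochs=100, batch_size=16)
```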

2) Image Reconstruction: The novelty in this work is the proposed U-Net based network, in contrast to previous works in direct image reconstruction that use FC-layer architectures. The design of the network draws its inspiration from the conditional GAN for image-to-image translation called Pix2Pix [26]. The proposed network, namely the double U-Net generator (DUG), consists of two cascaded U-Nets. The first U-Net transforms the raw data to an image, while the second U-Net takes as input the generated image and transforms it back to the raw data. The second U-Net assesses the reconstructed image output, reiterating the relation between the sinogram and the image. This architecture differs from Pix2Pix, which consists of a generator (a U-Net like network) and a discriminator (a classification convolutional network). While the generator in both architectures serves the purpose of transforming images from one domain to the other, the discriminator in Pix2Pix classifies inputs as real/fake. The objective function for this architecture can be written as:

    L_total = L_G1 + L_G2 + L_{G1+G2}                      (6)

where

    L_G1 = (1/n) Σ_{i=1}^{n} |x*_i − x̂_i|                  (7)

    L_G2 = (1/n) Σ_{i=1}^{n} |y*_i − ŷ_i|                  (8)

G1 and G2 are Generator 1 and Generator 2, which predict the image and the sinogram respectively; x*, x̂ are the true and predicted image, and y*, ŷ the true and predicted sinogram. L_{G1+G2} is defined similarly to L_G2 with the combined architecture G1 + G2, keeping the weights of G2 fixed. The architecture is represented in detail in Figure 4. The training for this architecture is summarized in Algorithm 1. A comparison of the trainable parameters for the various segments used to perform the domain mapping from sinogram to image is provided in Table I, considering AUTOMAP from [19], the Radon inversion layer from [18] and the proposed architecture DUG along with the denoising and super resolution segments.

Fig. 4: Representation of the DUG, the image reconstruction block. This network was trained on denoised sinograms, which were the outputs of the previous segment.

Algorithm 1: Training the DUG
    M = number of epochs
    N = total training data (images/sinograms)
    for i = 1, 2, ..., M do
        for j = 1, 2, ..., N do
            Train G1: minimizing L_G1
        end
        for j = 1, 2, ..., N do
            Train G2: minimizing L_G2
        end
        for j = 1, 2, ..., N do
            Train the combined architecture, freezing the weights of G2: minimizing L_{G1+G2}
        end
    end
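The three-step schedule of Algorithm 1 can be sketched with Keras as below. Here g1 (sinogram to image) and g2 (image to sinogram) are assumed to be U-Net style models built along the lines of the denoiser above, mean absolute error stands in for eqs. (7) and (8), and G2 is kept fixed during the cascade step by toggling its trainable flag and recompiling; this is a hedged sketch of the training logic, not the exact implementation used in this work.

```python
# Hedged sketch of Algorithm 1: alternating training of the double U-Net generator (DUG).
# g1 maps denoised sinograms to images, g2 maps images back to sinograms; MAE stands in
# for the losses of eqs. (7)-(8). Recompiling after toggling `trainable` is what makes the
# freeze of G2 take effect (and, for simplicity here, resets the optimizer state).
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_cascade(g1, g2, input_shape=(128, 128, 1)):
    g1.compile(optimizer="adam", loss="mae")                   # L_G1, eq. (7)
    sino_in = layers.Input(shape=input_shape)
    return Model(sino_in, g2(g1(sino_in)), name="g1_plus_g2")  # cascade for L_{G1+G2}

def train_dug(g1, g2, cascade, sino_denoised, images_true, sino_true, epochs=50, batch=16):
    for _ in range(epochs):
        # Step 1: G1 learns denoised sinogram -> image (minimise L_G1).
        g1.fit(sino_denoised, images_true, batch_size=batch, epochs=1, verbose=0)
        # Step 2: G2 learns predicted image -> true sinogram (minimise L_G2).
        g2.trainable = True
        g2.compile(optimizer="adam", loss="mae")
        x_hat = g1.predict(sino_denoised, verbose=0)
        g2.fit(x_hat, sino_true, batch_size=batch, epochs=1, verbose=0)
        # Step 3: train the cascade G2(G1(.)) with the weights of G2 frozen (minimise L_{G1+G2}).
        g2.trainable = False
        cascade.compile(optimizer="adam", loss="mae")
        cascade.fit(sino_denoised, sino_true, batch_size=batch, epochs=1, verbose=0)
```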
3) Super Resolution: The function of the super resolution (SR) block is to improve the estimate produced by the image reconstruction network. Several works already exist concerning single image super resolution [27], [28]. In this work we employed a basic super-resolution residual network architecture to improve the reconstruction. It consists of convolutional blocks followed by batch normalization with parametric rectified linear unit (PReLU) activation. There were a total of 8 residual blocks in the network, as represented in Figure 5.

Fig. 5: Representation of the super resolution block. It consists of 8 residual blocks with convolution (32 filters of size 3×3), batch normalization and PReLU.

The loss function used in this architecture was the perceptual loss:

    PerceptualLoss = |VGG16(x*) − VGG16(x̂)|                (9)

where VGG16(x*) and VGG16(x̂) are the features extracted with the VGG16 convolutional neural network [29] for the true and predicted image. The features are extracted from the 10th layer of the VGG architecture, i.e., only the first three convolutional blocks are considered. We observed that extracting deeper features led to the network hallucinating features in the reconstructed images.
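A hedged Keras sketch of the perceptual loss of eq. (9) is given below, comparing VGG16 feature maps of the prediction and the ground truth. The cut after roughly the 10th layer follows the text, while the grayscale-to-RGB tiling and the exact layer indexing of tf.keras.applications.VGG16 are assumptions to verify.

```python
# Hedged sketch of the perceptual loss of eq. (9): an L1 distance between VGG16 features
# of the true and predicted images, cut after the first three convolutional blocks.
import tensorflow as tf
from tensorflow.keras.applications import VGG16

_vgg = VGG16(include_top=False, weights="imagenet", input_shape=(128, 128, 3))
_features = tf.keras.Model(_vgg.input, _vgg.layers[10].output)   # ~10th layer of VGG16
_features.trainable = False

def perceptual_loss(x_true, x_pred):
    # Grayscale slices are tiled to 3 channels to match the VGG16 input.
    f_true = _features(tf.image.grayscale_to_rgb(x_true))
    f_pred = _features(tf.image.grayscale_to_rgb(x_pred))
    return tf.reduce_mean(tf.abs(f_true - f_pred))

# sr_model.compile(optimizer="adam", loss=perceptual_loss)   # when training the SR block
```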

TABLE I: Trainable parameters comparison

    Architecture                              Input Size   Output Size   Trainable Parameters
    AUTOMAP                                   200 × 168    200 × 200     6,545,920,000
    Radon Inversion Layer (40 × 40 patches)   200 × 168    200 × 200     382,259,200
    DeepPET                                   128 × 128    128 × 128     62,821,473
    DUG-RECON                                 128 × 128    128 × 128     17,444,140

C. Dataset Description

We applied our methodology on fluorothymidine (FLT) PET/CT images from the American College of Radiology Imaging Network (ACRIN) FLT Breast PET/CT database [30]. The details of the dataset are given in Table II. The sinograms were initially generated by projecting 2-D PET and CT image slices with the Python SKLEARN Radon transform, following the models (2) and (3) for CT and PET respectively, with Poisson noise added. The methodology represented in Figure 6 was used for data preparation for the PET and CT modalities. Sample pairs from the PET and CT datasets are shown in Figures 7 and 8. The CT images were downsized from 512 × 512, and the reconstruction was implemented for 2-D 128 × 128 images.

TABLE II: Dataset description

    Modalities                        CT, PET
    Number of patients                83
    Number of PET 2-D image slices    76,000
    Number of CT 2-D image slices     21,104
    PET matrix size                   128
    CT matrix size                    512
    Scanner                           GE Discovery ST

Fig. 6: Data preparation. The PET and CT image volumes are split into 2-D image slices, projected and Poisson resampled into sinograms, and paired with the initial and final reconstructed images.

D. Training

TensorFlow [31] and Keras [32] were used for the realization of the architectures described in the section above. These architectures were implemented on a single Nvidia GeForce GTX 2080Ti GPU. A collection of images {x*_k}_{k=1}^N was used to generate a corresponding collection of noiseless sinograms {y*_k}_{k=1}^N following models (2) and (3); low-count and high-count sinograms, {y_k^LC}_{k=1}^N and {y_k^HC}_{k=1}^N, were generated by adding Poisson noise, where the expected number of counts was adjusted by rescaling the intensity. The denoising segment was trained with the collection {(y_k^LC, y_k^HC)}_{k=1}^N, using the high-count sinograms as ground truth, with a total number of training samples N = 120000 and a number of epochs M = 100. For the training of the second segment, we used a collection of denoised sinograms, namely {ŷ_k}_{k=1}^N, and their corresponding ground truth images {x*_k}_{k=1}^N. The image reconstruction segment was trained according to Algorithm 1, that is to say by alternating between training G1 with {(ŷ_k, x*_k)}_{k=1}^N and training G2 with {(x̂_k, y*_k)}_{k=1}^N, where x̂_k is a prediction from G1 with ŷ_k as input. This segment was trained with N = 40000 and M = 50. The CT data were augmented by rotating the data by 90 degrees to generate the required training data. Owing to the larger PET dataset it was not necessary to perform data augmentation.

Finally, the SR segment was trained on the images predicted by the DUG and the ground truth (GT) images, {(x̂_k, x*_k)}_{k=1}^N. The SR block was trained with N = 20000 for 100 epochs. For the testing, a set of 2000 sinogram-image pairs was used.
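As an illustration of the paired data generation described above, the sketch below produces one low-count/high-count sinogram pair from a noiseless sinogram by rescaling the expected counts before Poisson sampling; the count levels are illustrative, not the values used in this study.

```python
# Sketch of low-count / high-count sinogram pair generation from a noiseless sinogram
# y_bar by rescaling the expected counts before Poisson sampling (illustrative levels).
import numpy as np

def make_count_pair(y_bar, low_counts=1e5, high_counts=1e7, rng=None):
    rng = np.random.default_rng(rng)
    y_low = rng.poisson(y_bar * (low_counts / y_bar.sum()))    # noisy, low-count input
    y_high = rng.poisson(y_bar * (high_counts / y_bar.sum()))  # high-count ground truth
    return y_low, y_high
```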

E. Quantitative Analysis

Testing for the aforementioned architectures was done on samples that were not a part of the training data. The metrics used for this analysis are the root mean squared error (RMSE) and the structural similarity index (SSIM). They are defined below:

    RMSE(x*, x̂) = sqrt( (1/m) Σ_{j=1}^{m} (x*_j − x̂_j)² )   (10)

where m is the number of pixels, x* is the GT and x̂ the predicted output.

Fig. 7: Example PET sinogram-image pairs from the dataset.

    SSIM(x*, x̂) = [(2 µ_{x*} µ_{x̂} + c1)(2 σ_{x*x̂} + c2)] / [(µ_{x*}² + µ_{x̂}² + c1)(σ_{x*}² + σ_{x̂}² + c2)]   (11)

where µ_{x*} and µ_{x̂} are the averages of x* and x̂ respectively, σ_{x*}² and σ_{x̂}² are the variances of x* and x̂, σ_{x*x̂} is the covariance between x̂ and x*, and c1 = (k1 L)² and c2 = (k2 L)² with k1 = 0.01 and k2 = 0.03 by default (L being the dynamic range of the pixel values).
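The two metrics can be computed as in the short sketch below; scikit-image's structural_similarity uses the same default constants k1 = 0.01 and k2 = 0.03, and the choice of data_range here is an assumption.

```python
# Sketch of the evaluation metrics of eqs. (10)-(11).
import numpy as np
from skimage.metrics import structural_similarity

def rmse(x_true, x_pred):
    return np.sqrt(np.mean((x_true - x_pred) ** 2))

def ssim(x_true, x_pred):
    return structural_similarity(x_true, x_pred,
                                 data_range=x_true.max() - x_true.min())
```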

Fig. 8: Example CT sinogram-image pairs from the dataset.

F. Region of Interest Analysis

The signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) were studied for four regions of interest identified within the patient body. The SNR and CNR were evaluated by treating a region as foreground and the other three regions as background:

    SNR = (µ_r − µ_b) / σ_b                                 (12)

    CNR = |µ_r − µ_b| / sqrt(σ_r² + σ_b²)                   (13)

where µ_r and µ_b, σ_r and σ_b correspond to the mean and standard deviation in the region of interest (ROI) and the background respectively. In this study we compared the initial reconstructed output of the DUG, the final reconstruction along with SR, and the original GT, which was reconstructed on the GE Discovery ST scanner using an ordered-subsets expectation-maximisation (OSEM) algorithm.
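Given boolean masks for a foreground region and for the background (here, the union of the other three regions), the two figures of merit of eqs. (12)-(13) reduce to a few lines:

```python
# Sketch of the ROI figures of merit of eqs. (12)-(13).
import numpy as np

def snr_cnr(image, roi_mask, background_mask):
    mu_r, mu_b = image[roi_mask].mean(), image[background_mask].mean()
    sigma_r, sigma_b = image[roi_mask].std(), image[background_mask].std()
    snr = (mu_r - mu_b) / sigma_b                                   # eq. (12)
    cnr = abs(mu_r - mu_b) / np.sqrt(sigma_r ** 2 + sigma_b ** 2)   # eq. (13)
    return snr, cnr
```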

G. Comparison with DeepPET

We implemented the DeepPET architecture [20] and compared its predictions with our proposed approach for the reconstruction of PET images. DeepPET was trained on {(ŷ_k, x*_k)}_{k=1}^N, with notation similar to the training section above, with N = 120000, exclusively on PET data. It is worth noting that the input and output dimensions in our study are identical, while this was not originally the case for DeepPET. The architecture of DeepPET is summarized in Figure 9. This architecture was trained for 100 epochs with an Adam optimiser.

Fig. 9: Representation of DeepPET (an encoder-decoder of 3×3 convolutions with stride 1 or 2, batch normalization, ReLU and upsampling, for 128×128 inputs). The number of filters in each convolutional layer is labeled on top of each block.

III. RESULTS

The predictions from the architectures along with the GT and the sinogram are shown in Figure 10 for PET images. The results are displayed for four test image slices across the columns. Each column shows the predicted output by the proposed DUG-RECON architecture and the DeepPET architecture, as well as the GT. With regards to the proposed architecture, it is observed that the initial reconstructed image, i.e., the output of the DUG, looks blurred, while the final reconstructed output from the super resolution block has noticeably improved details. The predictions by DeepPET are also visibly blurred compared to the final reconstructed output of the proposed architecture and the GT. These observations are further ascertained in Table III, where the quantitative metrics are tabulated. Figure 11 provides a comparison of the intensity profiles for the predictions by DUG, DUG+SR and DeepPET with respect to the GT for PET images. These intensity values are observed along the line marked in yellow in these figures. As this figure shows, the intensity profile of the final reconstructed image of the proposed architecture is closest to the GT. The predictions by DUG and DeepPET are smoother compared to the predictions by DUG+SR and the GT. The ROI analysis is tabulated in Table V for the four regions marked in Figure 12. This analysis was carried out for the final predictions by the proposed architecture and maximum likelihood expectation-maximization (MLEM). Looking closely at Table V we notice that the mean values of the deep learning predicted image and the MLEM reconstructed image are comparable.

The results for CT images are displayed in Figure 13. This figure provides a comparison between the reconstructed image predictions of the proposed architecture and DeepPET with respect to the GT. The high-resolution nature of the CT images and a smaller dataset presented challenges during the training of the proposed architecture. The predictions by DeepPET appear to be better resolved than those by the proposed architecture. However, the tissue and the bone structures are not clearly seen in the predictions by the deep learning architectures, thereby requiring further work to improve the reconstruction. The intensity plots are compared for two different images in Figure 14. The ROI analysis was carried out for 4 regions marked in the images reconstructed with deep learning and FBP. The image reconstructed with FBP has better SNR and CNR compared to the image reconstructed with the proposed architecture.

TABLE III: The SSIM and RMSE for the PET images are evaluated for 4 different 2-D slices. These metrics are compared amongst predictions by DUG, DUG+SR and DeepPET.

    Image   Architecture   RMSE    SSIM
    1       DUG            0.059   0.74
            DUG+SR         0.038   0.84
            DeepPET        0.047   0.80
    2       DUG            0.043   0.76
            DUG+SR         0.046   0.86
            DeepPET        0.054   0.85
    3       DUG            0.050   0.76
            DUG+SR         0.038   0.85
            DeepPET        0.043   0.83
    4       DUG            0.061   0.70
            DUG+SR         0.045   0.82
            DeepPET        0.048   0.79

TABLE IV: The SSIM and RMSE for the CT images are evaluated for different 2-D slices; predictions by DUG, DUG+SR and DeepPET are compared.

    Image   Architecture   RMSE     SSIM
    1       DUG            0.0083   0.90
            DUG+SR         0.0015   0.98
            DeepPET        0.0012   0.99
    2       DUG            0.0081   0.90
            DUG+SR         0.0015   0.99
            DeepPET        0.0014   0.99
    3       DUG            0.0015   0.91
            DUG+SR         0.0018   0.98
            DeepPET        0.0013   0.99
TABLE V: ROI analysis: the mean, standard deviation (SD), SNR and CNR for the 4 regions of interest marked in Figure 12 (PET).

    Region   Image     Mean    SD      SNR     CNR
    1        DUG+SR    0.706   0.024   7.15    5.71
             MLEM      0.676   0.035   5.72    4.55
    2        DUG+SR    0.713   0.091   4.81    3.42
             MLEM      0.648   0.11    4.38    3.26
    3        DUG+SR    0.744   0.071   2.73    1.65
             MLEM      0.547   0.154   3.23    1.22
    4        DUG+SR    0.117   0.008   14.96   10.64
             MLEM      0.057   0.010   8.01    4.8

TABLE VI: ROI analysis: the mean, SD, SNR and CNR for the 4 regions of interest marked in Figure 15 (CT).

    Region   Image     Mean    SD         SNR     CNR
    1        DUG+SR    0.011   2.91e-4    7.61    6.66
             FBP       0.011   3.45e-4    15.86   10.69
    2        DUG+SR    0.004   1.53e-4    9.34    9.02
             FBP       0.004   1.84e-4    15.36   13.74
    3        DUG+SR    0.005   5.90e-4    9.40    5.49
             FBP       0.005   4.88e-4    16.19   7.78
    4        DUG+SR    0.012   8.319e-4   15.47   5.92
             FBP       0.012   2.736e-4   14.74   11.47

Fig. 10: Image predictions by DUG+SR, DeepPET and GT for four PET images from different parts of the patient volume.

IV. DISCUSSION

Deep learning has been applied to different fields of medical imaging. The vast majority of developments concern primarily image processing and analysis/classification tasks. The few works devoted to image reconstruction have largely concentrated on the use of deep learning within classical tomographic reconstruction algorithms. The main objectives of these works have been an improvement in the speed of convergence and in the quality of the successive image estimates within the iterative reconstruction process. The alternative approach, involving direct image reconstruction through the use of deep learning to estimate images directly from the raw data (sinograms or projections), has been much less explored, both for PET and CT.

Most implementations in direct image reconstruction concern the use of fully connected layers which encode the raw data, followed by convolutional layers. In most of the proposed implementations a large number of parameters needs to be optimised, which increases the computational burden and reduces the overall robustness. In this work we have proposed an original direct image reconstruction deep learning framework based on an architecture inspired by the convolutional generative adversarial networks used in image-to-image translation. The implementation is based on the use of a double U-Net generator (DUG) consisting of two cascaded U-Nets. While the first network transforms the raw data to an image, the second one assesses the reconstructed image output of the first network by reiterating the relationship between the reconstructed image and the raw data. Two additional blocks were added, namely a network denoising the raw data prior to their input to the DUG network, and a super-resolution block operating on the DUG output image in order to improve its overall quality. The proposed network was directly trained on clinical datasets for both PET and CT image reconstruction and its performance was assessed qualitatively and quantitatively.

Deep neural networks usually result in blurred output. This fact is clear in the predictions made by the DUG network. Both the qualitative analysis using the profiles through the reconstructed images and the quantitative metrics, SSIM and MSE, demonstrate the improvement of the reconstructed images resulting from the incorporation of the SR block. The qualitative analysis also clearly demonstrates the superiority of the proposed algorithm for direct PET image reconstruction in comparison to alternative approaches such as DeepPET. Finally, in the ROI analysis we observed that the SNR and CNR are higher with the deep learning approach for the PET images, while they are lower than the traditional methods for CT images. This is consistent with the observations in the qualitative analysis, where the proposed approach was not able to sufficiently resolve different tissue classes in the resulting reconstructed CT images in comparison with the ground truth. One of the potential reasons for the worse performance of DUG-RECON for CT reconstruction, relative to the superior performance observed for PET image reconstruction, may be the lower number of available CT images in the training process. This limitation will be addressed as part of future work. Despite the lower performance of the proposed architecture for CT images, it still presents comparable predictions and opens up avenues for deep learning architectures in tomographic reconstruction. In general, a limitation of deep learning based reconstruction is its adaptability to new data which is very different from the training sample space. Once a practical methodology is identified, one could have a deep learning pipeline with an ensemble of networks trained on different datasets to perform the reconstruction task.

Fig. 11: Intensity profile across the image (highlighted by a yellow line) for a PET image prediction by DUG, DUG+SR and DeepPET compared with the GT.

V. CONCLUSION

We have demonstrated the use of generative convolutional networks for the tomographic image reconstruction task. More specifically, we have proposed a new architecture for direct reconstruction that approximates the 2-D reconstruction process. We have also significantly reduced the parameters required for the domain transform task in image reconstruction. The three-step training pipeline, based exclusively on deep learning, decentralises the various tasks involved in image reconstruction into denoising, domain transform and super resolution. Various super resolution strategies are currently being explored to improve the reconstructed image. Our proposed strategy for tomographic reconstruction will eventually lead to a network based reconstruction as we continue to improve the framework. Currently it does not perform better than traditional methods in terms of utility metrics, but it still has the advantage of instantaneous reconstruction and an effective denoising strategy. We plan to extend the work to realistic detector data generated through Monte Carlo simulations, in addition to sinograms obtained through the Radon transform. We are also working on adapting the architecture to raw detector data. Another important aspect of the data based deep learning approach is that the predictions are limited by the quality of the dataset. It becomes essential to have realistic datasets, without compromising on the image quality, to improve the training of the neural networks.

Fig. 12: SNR and CNR comparison amongst DUG+SR and OSEM for a PET image along 4 regions of interest.

Fig. 13: Image predictions by DUG+SR, DeepPET and GT displayed for 3 CT images along different parts of the patient volume.

Fig. 14: Intensity profile for two CT images (highlighted by a yellow line) predicted by DUG+SR and DeepPET compared with the GT.

Fig. 15: SNR and CNR comparison amongst DUG+SR and FBP for a CT image along 4 regions of interest.

REFERENCES

[1] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[2] Z. Guo, X. Li, H. Huang, N. Guo, and Q. Li, "Deep learning-based image segmentation on multimodal medical imaging," IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 3, no. 2, pp. 162–169, 2019.
[3] A. Sinha and J. Dolz, "Multi-scale guided attention for medical image segmentation," arXiv preprint arXiv:1906.02849, 2019.

[4] J. Dolz, K. Gopinath, J. Yuan, H. Lombaert, C. Desrosiers, and I. B. Ayed, "HyperDense-Net: a hyper-densely connected CNN for multi-modal image segmentation," IEEE Transactions on Medical Imaging, vol. 38, no. 5, pp. 1116–1126, 2018.
[5] M. Hatt, B. Laurent, A. Ouahabi, H. Fayad, S. Tan, L. Li, W. Lu, V. Jaouen, C. Tauber, J. Czakon et al., "The first MICCAI challenge on PET tumor segmentation," Medical Image Analysis, vol. 44, pp. 177–195, 2018.
[6] V. S. Kadimesetty, S. Gutta, S. Ganapathy, and P. K. Yalavarthy, "Convolutional neural network-based robust denoising of low-dose computed tomography perfusion maps," IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 3, no. 2, pp. 137–152, 2018.
[7] M. Li, W. Hsu, X. Xie, J. Cong, and W. Gao, "SACNN: Self-attention convolutional neural network for low-dose CT denoising with self-supervised perceptual loss network," IEEE Transactions on Medical Imaging, 2020.
[8] H. Chen, Y. Zhang, W. Zhang, P. Liao, K. Li, J. Zhou, and G. Wang, "Low-dose CT denoising with convolutional neural network," in 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). IEEE, 2017, pp. 143–146.
[9] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, "Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1348–1357, 2018.
[10] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I. Sánchez, "A survey on deep learning in medical image analysis," Medical Image Analysis, vol. 42, pp. 60–88, 2017.
[11] A. Amyar, S. Ruan, I. Gardin, C. Chatelain, P. Decazes, and R. Modzelewski, "3-D RPET-NET: development of a 3-D PET imaging convolutional neural network for radiomics analysis and outcome prediction," IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 3, no. 2, pp. 225–231, 2019.
[12] S. Cui, Y. Luo, H.-H. Tseng, R. K. Ten Haken, and I. El Naqa, "Artificial neural network with composite architectures for prediction of local control in radiotherapy," IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 3, no. 2, pp. 242–249, 2018.
[13] K. Gong, J. Guan, C.-C. Liu, and J. Qi, "PET image denoising using a deep neural network through fine tuning," IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 3, no. 2, pp. 153–161, 2018.
[14] J. Maier, S. Sawall, M. Knaup, and M. Kachelrieß, "Deep scatter estimation (DSE): Accurate real-time scatter estimation for X-ray CT using a deep convolutional neural network," Journal of Nondestructive Evaluation, vol. 37, no. 3, p. 57, 2018.
[15] Z. Xie, R. Baikejiang, K. Gong, X. Zhang, and J. Qi, "Generative adversarial networks based regularized image reconstruction for PET," in 15th International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine, vol. 11072. International Society for Optics and Photonics, 2019, p. 110720P.
[16] K. Kim, D. Wu, K. Gong, J. Dutta, J. H. Kim, Y. D. Son, H. K. Kim, G. El Fakhri, and Q. Li, "Penalized PET reconstruction using deep learning prior and local linear fitting," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1478–1487, 2018.
[17] K. Gong, J. Guan, K. Kim, X. Zhang, J. Yang, Y. Seo, G. El Fakhri, J. Qi, and Q. Li, "Iterative PET image reconstruction using convolutional neural network representation," IEEE Transactions on Medical Imaging, vol. 38, no. 3, pp. 675–685, 2019.
[18] W. Whiteley and J. Gregor, "Direct image reconstruction from raw measurement data using an encoding transform refinement-and-scaling neural network," in 15th International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine, vol. 11072. International Society for Optics and Photonics, 2019, p. 1107225.
[19] B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen, and M. S. Rosen, "Image reconstruction by domain-transform manifold learning," Nature, vol. 555, no. 7697, p. 487, 2018.
[20] I. Haeggstroem, C. R. Schmidtlein, G. Campanella, and T. J. Fuchs, "DeepRec: A deep encoder-decoder network for directly solving the PET reconstruction inverse problem," arXiv preprint arXiv:1804.07851, 2018.
[21] F. Natterer, The Mathematics of Computerized Tomography, ser. Classics in Applied Mathematics. SIAM, 2001.
[22] L. A. Shepp and Y. Vardi, "Maximum likelihood reconstruction for emission tomography," IEEE Transactions on Medical Imaging, vol. 1, no. 2, pp. 113–122, 1982.
[23] J. A. Fessler, M. Sonka, and J. M. Fitzpatrick, "Statistical image reconstruction methods for transmission tomography," Handbook of Medical Imaging, vol. 2, pp. 1–70, 2000.

[24] A. R. De Pierro, "A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography," IEEE Transactions on Medical Imaging, vol. 14, no. 1, pp. 132–137, 1995.
[25] D. Perdios, M. Vonlanthen, A. Besson, F. Martinez, M. Arditi, and J.-P. Thiran, "Deep convolutional neural network for ultrasound image enhancement," in 2018 IEEE International Ultrasonics Symposium (IUS). IEEE, 2018, pp. 1–4.
[26] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
[27] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4681–4690.
[28] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, "Enhanced deep residual networks for single image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136–144.
[29] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[30] L. Kostakoglu, F. Duan, M. O. Idowu, P. R. Jolles, H. D. Bear, M. Muzi, J. Cormack, J. P. Muzi, D. A. Pryma, J. M. Specht et al., "A phase II study of 3'-deoxy-3'-18F-fluorothymidine PET in the assessment of early response of breast cancer to neoadjuvant chemotherapy: results from ACRIN 6688," Journal of Nuclear Medicine, vol. 56, no. 11, pp. 1681–1689, 2015.
[31] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard et al., "TensorFlow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.
[32] F. Chollet et al., "Keras," https://github.com/fchollet/keras, 2015.
