
PTYCHNET: CNN BASED FOURIER PTYCHOGRAPHY

Armin Kappeler, Sushobhan Ghosh, Jason Holloway, Oliver Cossairt, Aggelos Katsaggelos

Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, USA

ABSTRACT

Fourier ptychography is an imaging technique that overcomes the diffraction limit of conventional cameras, with applications in microscopy and long range imaging. Diffraction blur causes resolution loss in both cases. In Fourier ptychography, a coherent light source illuminates an object, which is then imaged from multiple viewpoints. The reconstruction of the object from this set of recordings can be obtained by an iterative algorithm. However, the retrieval process is slow and does not work well under certain conditions. In this paper, we propose a new reconstruction algorithm that is based on convolutional neural networks and demonstrate its advantages in terms of speed and performance.

Index Terms— Fourier Ptychography, Convolutional Neural Network, CNN

Fig. 1: Example setup for Fourier Ptychography (FP). Coherent light diffracts through a translucent medium into the far-field. A lens samples a portion of the Fourier domain, which is recorded as intensity images at the sensor. See Section 2.1 for details.

1. INTRODUCTION

Imaging using traditional optical systems is constrained by the space-bandwidth product (SBP) [1], which describes the trade-off between high resolution and large field of view. Fourier ptychography (FP) is a coherent imaging technique which aims to overcome this limitation by capturing a sequence of SBP-limited images and computationally combining them to recover a high resolution, large field-of-view image, thus overcoming the SBP barrier. Fourier ptychography has been applied to wide field, high resolution microscopy [2], quantitative phase imaging [3], adaptive Fourier ptychographic imaging [4], long distance, sub-diffraction imaging [5] and other applications. In Fourier ptychography, a high resolution image is recovered from a set of frequency-limited low resolution images of an object illuminated with a coherent light source. To achieve this, an iterative phase retrieval algorithm [6] recovers the phase information that is lost in the incoherent imaging process. A detailed overview of different phase reconstruction techniques can be found in [7, 8].

Iterative phase retrieval algorithms perform well if the set of low resolution images have overlapping frequency bands in the Fourier domain, but the reconstruction quality quickly degrades as the overlap between the Fourier patches decreases [9]. The requirement of overlap between neighboring patches requires sequential scanning to obtain all the low resolution images, and is hence a major barrier to single shot ptychography [10]. Reducing or eliminating the overlap requirement would lead to a much faster acquisition time.

In this paper, we focus on the algorithm for retrieving the high resolution image. In place of a phase retrieval algorithm, we propose a Convolutional Neural Network (CNN) based solution (PtychNet) that directly restores the image in the spatial domain without explicitly recovering the phase information. CNNs have been proven to be very effective for image classification [11–13], and have become increasingly popular for other image processing tasks such as super-resolution [14–16], image segmentation [17], etc. We show that PtychNet obtains better reconstruction results in considerably less time if the low resolution images have no overlapping frequency bands. When the low resolution images contain overlapping support in the frequency domain, we can use PtychNet to significantly reduce the computation time of an iterative phase retrieval algorithm.

The remainder of the paper is organized as follows. In Section 2 we briefly introduce Fourier ptychography, in Section 3 we explain our proposed framework PtychNet, Section 4 contains our results and experimental evaluation, and Section 5 concludes the paper.

2. FOURIER PTYCHOGRAPHY

2.1. Image Formation Model

Consider the generalized imaging setup shown in Figure 1. A monochromatic source with wavelength λ illuminates a transparent object. Let the 2D complex field that emanates from the object be denoted as ψ(x, y). If a camera is placed in the far-field and satisfies the Fraunhofer approximation, the field incident on the lens is a scaled Fourier transform of the scene,

ψ̂(u, v) = F_{1/λz} {ψ(x, y)},

where λ is the wavelength of illumination, z is the distance between object and lens, ψ̂(u, v) is the field at the lens, and (u, v) are coordinates in the frequency domain. The frequency spectrum is limited by the finite aperture of the lens, A(u − c_u, v − c_v), where (c_u, c_v) is the center of the lens. For an ideal lens with radius r, light within the support is passed uniformly and all other light is rejected, i.e. A = 1 where ||(u − c_u, v − c_v)||_2 ≤ r and A = 0 elsewhere. The lens focuses the light onto the image plane (which also satisfies the Fraunhofer approximation) and the intensity of the resulting field is recorded by the sensor. The measured intensity is thus given by

I(x, y, c_u, c_v) ∝ |F {ψ̂(u, v) ⊙ A(u − c_u, v − c_v)}|²,   (1)

where ⊙ signifies an element-wise multiplication. For simplicity, we will drop the scaling factor of the Fraunhofer approximation in this paper, though it may be accounted for after image reconstruction if desired.

To emulate capturing the scene with a larger lens, N images are captured by translating the lens, (c_u, c_v), to cover a larger portion of the Fourier spectrum. An example of the data acquisition process is shown in Figure 2.

Fig. 2: Example of image acquisition in Fourier ptychography. N × N images with limited, overlapping frequency bands are captured to recover one high resolution image. Image used from [5] with permission.
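The forward model of Eq. (1) is easy to simulate. The following is a minimal NumPy sketch under our own conventions (fftshift-centered spectra, binary circular apertures, the Fraunhofer scaling dropped as described above); all names are illustrative rather than taken from the paper:

```python
import numpy as np

def capture_intensity(psi, center, radius):
    """Simulate one capture from Eq. (1): band-limit the object field
    with a circular aperture A centered at (cu, cv) in the Fourier
    domain and record the intensity of the propagated field."""
    h, w = psi.shape
    psi_hat = np.fft.fftshift(np.fft.fft2(psi))  # field at the lens plane
    v, u = np.meshgrid(np.arange(h) - h // 2,
                       np.arange(w) - w // 2, indexing="ij")
    A = (u - center[0]) ** 2 + (v - center[1]) ** 2 <= radius ** 2
    field = np.fft.ifft2(np.fft.ifftshift(psi_hat * A))  # field at sensor
    return np.abs(field) ** 2  # the sensor records only the intensity

def capture_all(psi, N, radius, step):
    """Capture an N x N grid of lens positions covering the Fourier
    spectrum, as in Figure 2. Returns the N**2 intensity images."""
    offsets = (np.arange(N) - N // 2) * step
    return [capture_intensity(psi, (cu, cv), radius)
            for cv in offsets for cu in offsets]
```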

2.2. Iterative Error Reduction Algorithm

Recovering the complex field ψ̂(u, v) from the set of measured intensity images I_i, i = 1, ..., N, is a non-convex optimization problem. That is, recovering ψ̂(u, v) reduces to solving the optimization problem

ψ̂* = argmin_ψ̂ Σ_i ||ψ_i − F⁻¹{A_i ψ̂}||_2,   s.t. |ψ_i|² = I_i,

where the spatial arguments have been omitted for compactness. Conventional methods estimate ψ̂(u, v) using variations on iterative error reduction algorithms (IERAs) that enforce magnitude constraints in the spatial domain and support constraints in the Fourier domain [6, 7]. Figure 3 shows the block diagram of the IERA used in [5]: the complex fields at the camera planes are computed from the current estimate, ψ_i^k = F⁻¹{A_i ψ̂^k}; the captured magnitudes are enforced, ψ_i^k ← √I_i · ψ_i^k / |ψ_i^k|; and, back in the Fourier domain, the high resolution Fourier domain image is recomputed under the aperture constraint, ψ̂^{k+1} ← argmin_ψ̂ Σ_i ||ψ_i^k − F⁻¹{A_i ψ̂}||_2.

Fig. 3: Block diagram for the IERA used in [5], modified with permission.
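For concreteness, here is a minimal NumPy sketch of this loop, assuming the capture conventions of the earlier sketch; the least-squares Fourier update is implemented as the usual aperture-averaged replacement, and all names are ours rather than from [5]:

```python
import numpy as np

def make_apertures(shape, N, radius, step):
    """Binary masks A_i on the same grid of lens positions used in
    `capture_all` above (fftshift-centered)."""
    h, w = shape
    v, u = np.meshgrid(np.arange(h) - h // 2,
                       np.arange(w) - w // 2, indexing="ij")
    offsets = (np.arange(N) - N // 2) * step
    return [(u - cu) ** 2 + (v - cv) ** 2 <= radius ** 2
            for cv in offsets for cu in offsets]

def iera(images, apertures, n_iter=100, psi_hat0=None):
    """Minimal sketch of the IERA loop of Fig. 3. `images` holds the
    measured intensities I_i, `apertures` the matching masks A_i."""
    if psi_hat0 is None:
        # default: initialize with the spectrum of the mean image
        psi_hat0 = np.fft.fftshift(np.fft.fft2(np.mean(images, axis=0)))
    psi_hat = psi_hat0.astype(complex)
    for _ in range(n_iter):
        acc = np.zeros_like(psi_hat)
        cnt = np.zeros(psi_hat.shape)
        for I, A in zip(images, apertures):
            # complex field at camera plane i
            psi_i = np.fft.ifft2(np.fft.ifftshift(psi_hat * A))
            # enforce the measured magnitudes, keep the estimated phase
            psi_i = np.sqrt(I) * np.exp(1j * np.angle(psi_i))
            # back to the Fourier domain, restricted to aperture i
            acc += np.fft.fftshift(np.fft.fft2(psi_i)) * A
            cnt += A
        # aperture-constrained least-squares update of the HR spectrum
        psi_hat = acc / np.maximum(cnt, 1)
    return np.fft.ifft2(np.fft.ifftshift(psi_hat))
```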

3. PTYCHNET

We propose a learning based algorithm for recovering the high resolution image based on Convolutional Neural Networks. An overview of the algorithm is shown in Figure 4. Our network learns a non-linear mapping from the intensity images I_i to the original input light field ψ. Both the input I_i and the output ψ are in the spatial domain. The inverse filters of the band-passes applied to the original light field can be approximated with convolutional filters, and the reconstruction process is locally independent, which makes this a well-suited problem for a CNN. The input data of the CNN consists of the concatenation of all the intensity images I_i into a 3D cube with dimensions w × h × N², where w and h are the width and height of the image and N² is the number of sampled images. The output of the CNN is directly the desired high resolution field ψ.
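As a concrete illustration of this input layout, continuing the simulation sketch above (the grid parameters are placeholders, not values from the paper):

```python
import numpy as np

# Assemble the CNN input: the N**2 intensity images are stacked into a
# single w x h x N**2 cube, here using `capture_all` from Section 2.1.
psi = np.random.rand(512, 512)               # stand-in for the object field
images = capture_all(psi, N=7, radius=40, step=36)
cube = np.stack(images, axis=-1)             # shape (512, 512, 49)
```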

3.1. Architecture

The proposed CNN is based on the architecture used in [14]. It consists of three convolutional layers. The two hidden layers H1 and H2 are each followed by a ReLU activation function. The first layer has 64 kernels with a kernel size of 9 × 9. The second layer has 32 kernels with a size of 5 × 5, and the output layer has a kernel size of 5 × 5. The output layer has only one kernel, which directly produces the reconstructed image ψ in the spatial domain. The weights are initialized with random Gaussian distributed values with a standard deviation of 0.001. We use the Euclidean distance as our loss function. Experiments with TV-minimization as the loss function did not lead to any improvements in PSNR.
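The model itself is small. Below is an equivalent sketch in PyTorch of the three layers just described (the paper's implementation used Caffe, see Section 3.2); the zero-padding and bias initialization are our assumptions:

```python
import torch.nn as nn

class PtychNet(nn.Module):
    """Illustrative PyTorch version of the three-layer CNN described
    above. The input has N**2 channels, one per intensity image."""

    def __init__(self, n_images=49):  # N**2 input channels
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_images, 64, kernel_size=9, padding=4),  # H1
            nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),        # H2
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),         # output
        )
        for m in self.net:
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, std=0.001)  # Gaussian, sigma 0.001
                nn.init.zeros_(m.bias)

    def forward(self, x):
        # x: (batch, N**2, h, w) -> reconstruction (batch, 1, h, w)
        return self.net(x)
```

The Euclidean loss of the paper corresponds to a mean-squared-error criterion in this framing.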

3.2. Training Procedure

Our algorithm is implemented with the Caffe framework [18]. We trained our CNN on 91 publicly available images from Set91 [19]. We converted the images to gray-scale and resized them to w × h, where w = h = 512 pixels. These images represent our ground truth data ψ. The forward model from Equation 1 was applied to these images to obtain a collection of low quality intensity images I_i. The resulting intensity images I_i were resized to w × h pixels and then concatenated into a 3D cube of size w × h × N². From these cubes, we extracted about 15,000 patches of size 48 × 48 × N², which were used as the training database for the CNN. Note that since both the input and the output image of the CNN are in the spatial domain, our reconstruction algorithm is spatially invariant, and therefore we can divide the input and output data into patches and process them independently. In order to avoid border effects due to the zero-padding in the convolutional layers, we only use the 32 × 32 center pixels of a training patch to calculate the Euclidean loss. We created training datasets of input images with overlapping and non-overlapping frequency bands. For the non-overlapping case, we achieved better performance if we subtracted the center input image (the image at coordinates (0, 0) in Figure 2) from the reconstructed output image. This approach is similar to the idea of residual networks [13]. Our networks were trained for 200,000 iterations with a batch size of 256.

Fig. 4: PtychNet overview.
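A sketch of the patch extraction and center-cropped Euclidean loss described above, in the same illustrative PyTorch framing (the stride is an assumption; the original pipeline used Caffe):

```python
import torch.nn.functional as F

def extract_patches(cube, gt, size=48, stride=32):
    """Cut the w x h x N**2 input cube and its ground truth image into
    training patches (patch size from Section 3.2; the stride is our
    choice for illustration)."""
    h, w = gt.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield cube[y:y + size, x:x + size, :], gt[y:y + size, x:x + size]

def center_euclidean_loss(pred, target, crop=8):
    """Euclidean (MSE) loss on the 32 x 32 center of each 48 x 48 patch,
    avoiding the border effects of the zero-padded convolutions."""
    return F.mse_loss(pred[..., crop:-crop, crop:-crop],
                      target[..., crop:-crop, crop:-crop])
```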

4. EXPERIMENTAL RESULTS

In this section, we tested the effectiveness of our CNN by comparing it against the IERA algorithm proposed in [5]. We tested our algorithm on the commonly used resolution chart (resChart) and Lena image. In addition, we used the Set5 images from [19]. We used PSNR and SSIM as our performance metrics. The IERA algorithm was evaluated at 100 iterations, as the results of IERA did not improve much after 100 iterations. We tested it with the following two overlap configurations:

• With 0% overlap

• With 61% overlap

where overlap is the percentage of overlapping area of the input images in the frequency domain (see Figure 2b). As a baseline, we show the center image (referred to as Center) from the input image set (the image at position (0, 0) in Figure 2d), which corresponds to the low pass filtered original image.

4.1. Without overlap

Table 1 shows the PSNR and SSIM results for the non-overlapping case. Results are shown for the Center image, IERA and PtychNet. Figure 5 shows the original image and the reconstructed images for the center image, Lena and resChart. We can see that PtychNet produces superior results, both visually as well as in terms of PSNR/SSIM. Gains from IERA to PtychNet are between 0.6 and 2.1 dB.

Image      Metric   Center   IERA     PtychNet
lena       PSNR     25.11    24.68    25.82
           SSIM     0.6828   0.6488   0.7146
resChart   PSNR     14.41    12.92    15.05
           SSIM     0.1981   0.1357   0.2536
baby       PSNR     25.97    25.46    26.50
           SSIM     0.6836   0.6488   0.7030
bird       PSNR     28.49    28.50    29.70
           SSIM     0.8175   0.8068   0.8537
butterfly  PSNR     21.47    21.57    23.24
           SSIM     0.6201   0.5866   0.7258
head       PSNR     30.48    30.01    30.67
           SSIM     0.7295   0.6964   0.7410
woman      PSNR     24.93    24.82    25.83
           SSIM     0.7611   0.7431   0.8106

Table 1: PSNR and SSIM without overlap.

4.2. With overlap

The IERA and PtychNet images for 61% overlap are shown in Figure 6. Visually, the IERA-reconstructed resChart looks more detailed and sharper. The difference for Lena is much less obvious. IERA also outperforms PtychNet in terms of PSNR. Interestingly, the difference for resChart (IERA: 18.28, PtychNet: 18.04) is much smaller than for Lena (IERA: 31.52, PtychNet: 29.53). For Set5, the IERA outperforms PtychNet by an average of 2.4 dB (IERA: 35.02, PtychNet: 32.61).

However, PtychNet has a much lower runtime than IERA. As a comparison, the runtime for a 512 × 512 image for IERA with 100 iterations is about 1 minute, while PtychNet completes in about 0.5 seconds. The IERA algorithm is initialized with the mean image (the mean over the 49 input images). Alternatively, we use the output of PtychNet as the initialization image. This leads to a significantly faster convergence of the algorithm, since it is by itself already a good reconstruction of the original image. In Figure 7, we show the average PSNR versus iteration graph for Set5. For comparison, we also initialized the IERA with the center input image, whose bandpass filter is centered around the zero frequency (0, 0). While IERA with mean initialization needs roughly 30 iterations to converge, the PtychNet initialization only requires about 6 iterations to reach the maximum PSNR, which reduces the recovery time by a factor of 5 and converges to the same PSNR. For the resChart image, the PtychNet initialization even results in a slightly better quality. For all seven test images, the PSNR differs by no more than 0.02 dB between the different initialization methods. Hence we can achieve the same performance with IERA, but 5 times faster.
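Tying the sketches above together, PtychNet-initialized IERA looks roughly like this (an illustrative sketch reusing our earlier definitions, not the paper's exact pipeline; the network here is untrained):

```python
import numpy as np
import torch

# Reusing the sketches above: run PtychNet once and hand its output to
# IERA as the initial Fourier-domain estimate instead of the mean image.
apertures = make_apertures(psi.shape, N=7, radius=40, step=36)
model = PtychNet(n_images=49)
x = torch.from_numpy(cube).float().permute(2, 0, 1).unsqueeze(0)
with torch.no_grad():
    recon = model(x)[0, 0].numpy()        # PtychNet reconstruction

init = np.fft.fftshift(np.fft.fft2(recon))
# About 6 iterations suffice with this initialization (vs. ~30 from the
# mean image), for roughly a 5x reduction in recovery time.
psi_rec = iera(images, apertures, n_iter=6, psi_hat0=init)
```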

Fig. 5: Results with 0% overlap (original, Center image, IERA, PtychNet).

Fig. 6: Results with 61% overlap (IERA, PtychNet).

Fig. 7: Avg. PSNR over Set5 vs. iterations for IERA with different initializations. PtychNet initialization shows faster convergence compared to mean and center initializations.

5. CONCLUSION

We introduced a recovery algorithm for Fourier ptychography based on deep learning. To the best of our knowledge, there is no pre-existing work on CNN based Fourier ptychography algorithms. We show that in the case of non-overlapped Fourier sampling, CNNs perform significantly better than the existing IERA algorithm, both in terms of speed and resolution. Although IERA performs better than PtychNet with overlapping Fourier sampling, PtychNet reduces the runtime of the IERA by a factor of 5.

6. REFERENCES

[1] Adolf W. Lohmann, Rainer G. Dorsch, David Mendlovic, Carlos Ferreira, and Zeev Zalevsky, "Space–bandwidth product of optical signals and systems," J. Opt. Soc. Am. A, vol. 13, no. 3, pp. 470–473, Mar 1996.

[2] Guoan Zheng, Roarke Horstmeyer, and Changhuei Yang, "Wide-field, high-resolution Fourier ptychographic microscopy," Nature Photonics, vol. 7, no. 9, pp. 739–745, Sep 2013.

[3] Xiaoze Ou, Roarke Horstmeyer, Changhuei Yang, and Guoan Zheng, "Quantitative phase imaging via Fourier ptychographic microscopy," Opt. Lett., vol. 38, no. 22, pp. 4845–4848, Nov 2013.

[4] Zichao Bian, Siyuan Dong, and Guoan Zheng, "Adaptive system correction for robust Fourier ptychographic imaging," Opt. Express, vol. 21, no. 26, pp. 32400–32410, Dec 2013.

[5] Jason Holloway, M. Salman Asif, Manoj Kumar Sharma, Nathan Matsuda, Roarke Horstmeyer, Oliver Cossairt, and Ashok Veeraraghavan, "Toward long distance, sub-diffraction imaging using coherent camera arrays," IEEE Transactions on Computational Imaging, submitted for review.

[6] R. W. Gerchberg and W. O. Saxton, "A practical algorithm for the determination of the phase from image and diffraction plane pictures," Optik (Jena), vol. 35, pp. 237–246, 1972.

[7] J. R. Fienup, "Phase retrieval algorithms: a comparison," Appl. Opt., vol. 21, no. 15, pp. 2758–2769, Aug 1982.

[8] C. Yang, J. Qian, A. Schirotzek, F. Maia, and S. Marchesini, "Iterative algorithms for ptychographic phase retrieval," ArXiv e-prints, May 2011.

[9] Jianliang Qian, Chao Yang, A. Schirotzek, F. Maia, and S. Marchesini, "Efficient algorithms for ptychographic phase retrieval," Inverse Problems and Applications, Contemp. Math., vol. 615, pp. 261–280, 2014.

[10] Pavel Sidorenko and Oren Cohen, "Single-shot ptychography," Optica, vol. 3, no. 1, pp. 9–14, Jan 2016.

[11] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds., pp. 1097–1105. Curran Associates, Inc., 2012.

[12] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[14] Armin Kappeler, Seunghwan Yoo, Qiqin Dai, and Aggelos K. Katsaggelos, "Video super-resolution with convolutional neural networks," IEEE Transactions on Computational Imaging, vol. 2, no. 2, pp. 109–122, 2016.

[15] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, Feb 2016.

[16] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al., "Photo-realistic single image super-resolution using a generative adversarial network," arXiv preprint arXiv:1609.04802, 2016.

[17] Jonathan Long, Evan Shelhamer, and Trevor Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[18] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014, pp. 675–678.

[19] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Transactions on Image Processing, vol. 19, no. 11, pp. 2861–2873, Nov 2010.
