
PTYCHNET: CNN BASED FOURIER PTYCHOGRAPHY

Armin Kappeler, Sushobhan Ghosh, Jason Holloway, Oliver Cossairt, Aggelos Katsaggelos

Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, USA

ABSTRACT

Fourier ptychography is an imaging technique that overcomes the diffraction limit of conventional cameras, with applications in microscopy and long range imaging. Diffraction blur causes resolution loss in both cases. In Fourier ptychography, a coherent light source illuminates an object, which is then imaged from multiple viewpoints. The reconstruction of the object from this set of recordings can be obtained by an iterative algorithm. However, the retrieval process is slow and does not work well under certain conditions. In this paper, we propose a new reconstruction algorithm that is based on convolutional neural networks and demonstrate its advantages in terms of speed and performance.

Index Terms— Fourier Ptychography, Convolutional Neural Network, CNN

Fig. 1: Example setup for Fourier Ptychography (FP). Coherent light diffracts through a translucent medium into the far-field. A lens samples a portion of the Fourier domain, which is recorded as intensity images at the sensor. See Section 2.1 for details.

1. INTRODUCTION

Imaging using traditional optical systems is constrained by the space-bandwidth product (SBP) [1], which describes the trade-off between high resolution and large field of view. Fourier ptychography (FP) is a coherent imaging technique which aims to overcome this limitation by capturing a sequence of SBP-limited images and computationally combining them to recover a high resolution, large field-of-view image, thus overcoming the SBP barrier. Fourier ptychography has been applied to wide field, high resolution microscopy [2], quantitative phase imaging [3], adaptive Fourier ptychographic imaging [4], long distance, sub-diffraction imaging [5] and other applications. In Fourier ptychography, a high resolution image is recovered from a set of frequency-limited low resolution images of an object illuminated with a coherent light source. To achieve this, an iterative phase retrieval algorithm [6] recovers the phase information that is lost in the incoherent imaging process. A detailed overview of different phase reconstruction techniques can be found in [7, 8].

Iterative phase retrieval algorithms perform well if the set of low resolution images have overlapping frequency bands in the Fourier domain, but the reconstruction quality quickly degrades as the overlap between the Fourier patches decreases [9]. The requirement of overlap between neighboring patches requires sequential scanning to obtain all the low resolution images, and is hence a major barrier to single shot ptychography [10]. Reducing or eliminating the overlap requirement would lead to a much faster acquisition time.

In this paper, we focus on the algorithm for retrieving the high resolution image. In place of a phase retrieval algorithm, we propose a Convolutional Neural Network (CNN) based solution (PtychNet) that directly restores the image in the spatial domain without explicitly recovering the phase information. CNNs have been proven to be very effective for image classification [11–13], and have become increasingly popular for other image processing tasks such as super-resolution [14–16], image segmentation [17], etc. We show that PtychNet obtains better reconstruction results in considerably less time if the low resolution images have no overlapping frequency bands. When the low resolution images contain overlapping support in the frequency domain, we can use PtychNet to significantly reduce the computation time of an iterative phase retrieval algorithm.

The remainder of the paper is organized as follows. In Section 2 we briefly introduce Fourier ptychography, in Section 3 we explain our proposed framework PtychNet, Section 4 contains our results and experimental evaluation, and Section 5 concludes the paper.

2. FOURIER PTYCHOGRAPHY

2.1. Image Formation Model

Consider the generalized imaging setup shown in Figure 1. A monochromatic source with wavelength λ illuminates a transparent object. Let the 2D complex field that emanates from the object be denoted as ψ(x, y). If a camera is placed in the far-field and satisfies the Fraunhofer approximation, the field incident on the lens is a scaled Fourier transform of the scene,

ψ̂(u, v) = F_{1/λz} {ψ(x, y)},

where λ is the wavelength of illumination, z is the distance between object and lens, ψ̂(u, v) is the field at the lens, and (u, v) are coordinates in the frequency domain. The frequency spectrum is limited by the finite aperture of the lens, A(u − c_u, v − c_v), where (c_u, c_v) is the center of the lens. For an ideal lens with radius r, light within the support is passed uniformly and all other light is rejected, i.e. A = 1 where ||(u − c_u, v − c_v)||_2 ≤ r and A = 0 elsewhere. The lens focuses the light onto the image plane (which also satisfies the Fraunhofer approximation) and the intensity of the resulting field is recorded by the sensor. The measured intensity is thus given by

I(x, y, c_u, c_v) ∝ |F {ψ̂(u, v) ⊙ A(u − c_u, v − c_v)}|²,   (1)

where ⊙ signifies an element-wise multiplication. For simplicity, we will drop the scaling factor of the Fraunhofer approximation in this paper, though it may be accounted for after image reconstruction if desired.

To emulate capturing the scene with a larger lens, N images are captured by translating the lens, (c_u, c_v), to cover a larger portion of the Fourier spectrum. An example of the data acquisition process is shown in Figure 2.

Fig. 2: Example of image acquisition in Fourier ptychography. N × N images with limited, overlapping frequency bands are captured to recover one high resolution image. Image used from [5] with permission.
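The forward model of Eq. (1) is easy to simulate. The following is a minimal NumPy sketch under our own conventions (fftshift-centered spectra, binary circular apertures, the Fraunhofer scaling dropped as described above); all names are illustrative rather than taken from the paper:

```python
import numpy as np

def capture_intensity(psi, center, radius):
    """Simulate one capture from Eq. (1): band-limit the object field
    with a circular aperture A centered at (cu, cv) in the Fourier
    domain and record the intensity of the propagated field."""
    h, w = psi.shape
    psi_hat = np.fft.fftshift(np.fft.fft2(psi))  # field at the lens plane
    v, u = np.meshgrid(np.arange(h) - h // 2,
                       np.arange(w) - w // 2, indexing="ij")
    A = (u - center[0]) ** 2 + (v - center[1]) ** 2 <= radius ** 2
    field = np.fft.ifft2(np.fft.ifftshift(psi_hat * A))  # field at sensor
    return np.abs(field) ** 2  # the sensor records only the intensity

def capture_all(psi, N, radius, step):
    """Capture an N x N grid of lens positions covering the Fourier
    spectrum, as in Figure 2. Returns the N**2 intensity images."""
    offsets = (np.arange(N) - N // 2) * step
    return [capture_intensity(psi, (cu, cv), radius)
            for cv in offsets for cu in offsets]
```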

2.2. Iterative Error Reduction Algorithm

Recovering the complex field ψ̂(u, v) from the set of measured intensity images I_i, i = 1, ..., N, is a non-convex optimization problem. That is, recovering ψ̂(u, v) reduces to solving the optimization problem

ψ̂* = argmin_ψ̂ Σ_i ||ψ_i − F⁻¹{A_i ψ̂}||_2,   s.t. |ψ_i|² = I_i,

where the spatial arguments have been omitted for compactness. Conventional methods estimate ψ̂(u, v) using variations on iterative error reduction algorithms (IERAs) that enforce magnitude constraints in the spatial domain and support constraints in the Fourier domain [6, 7]. Figure 3 shows the block diagram of the IERA used in [5]: the complex fields at the camera planes are computed from the current estimate, ψ_i^k = F⁻¹{A_i ψ̂^k}; the captured magnitudes are enforced, ψ_i^k ← √I_i · ψ_i^k / |ψ_i^k|; and, back in the Fourier domain, the high resolution Fourier domain image is recomputed under the aperture constraint, ψ̂^{k+1} ← argmin_ψ̂ Σ_i ||ψ_i^k − F⁻¹{A_i ψ̂}||_2.

Fig. 3: Block diagram for the IERA used in [5], modified with permission.
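For concreteness, here is a minimal NumPy sketch of this loop, assuming the capture conventions of the earlier sketch; the least-squares Fourier update is implemented as the usual aperture-averaged replacement, and all names are ours rather than from [5]:

```python
import numpy as np

def make_apertures(shape, N, radius, step):
    """Binary masks A_i on the same grid of lens positions used in
    `capture_all` above (fftshift-centered)."""
    h, w = shape
    v, u = np.meshgrid(np.arange(h) - h // 2,
                       np.arange(w) - w // 2, indexing="ij")
    offsets = (np.arange(N) - N // 2) * step
    return [(u - cu) ** 2 + (v - cv) ** 2 <= radius ** 2
            for cv in offsets for cu in offsets]

def iera(images, apertures, n_iter=100, psi_hat0=None):
    """Minimal sketch of the IERA loop of Fig. 3. `images` holds the
    measured intensities I_i, `apertures` the matching masks A_i."""
    if psi_hat0 is None:
        # default: initialize with the spectrum of the mean image
        psi_hat0 = np.fft.fftshift(np.fft.fft2(np.mean(images, axis=0)))
    psi_hat = psi_hat0.astype(complex)
    for _ in range(n_iter):
        acc = np.zeros_like(psi_hat)
        cnt = np.zeros(psi_hat.shape)
        for I, A in zip(images, apertures):
            # complex field at camera plane i
            psi_i = np.fft.ifft2(np.fft.ifftshift(psi_hat * A))
            # enforce the measured magnitudes, keep the estimated phase
            psi_i = np.sqrt(I) * np.exp(1j * np.angle(psi_i))
            # back to the Fourier domain, restricted to aperture i
            acc += np.fft.fftshift(np.fft.fft2(psi_i)) * A
            cnt += A
        # aperture-constrained least-squares update of the HR spectrum
        psi_hat = acc / np.maximum(cnt, 1)
    return np.fft.ifft2(np.fft.ifftshift(psi_hat))
```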

3. PTYCHNET

We propose a learning based algorithm for recovering the high resolution image based on Convolutional Neural Networks. An overview of the algorithm is shown in Figure 4. Our network learns a non-linear mapping from the intensity images I_i to the original input light field ψ. Both the input I_i and the output ψ are in the spatial domain. The inverse filters of the band-passes applied to the original light field can be approximated with convolutional filters, and the reconstruction process is locally independent, which makes this a well-suited problem for a CNN. The input data of the CNN consists of the concatenation of all the intensity images I_i into a 3D cube with dimensions w × h × N², where w and h are the width and height of the image and N² is the number of sampled images. The output of the CNN is directly the desired high resolution field ψ.
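As a concrete illustration of this input layout, continuing the simulation sketch above (the grid parameters are placeholders, not values from the paper):

```python
import numpy as np

# Assemble the CNN input: the N**2 intensity images are stacked into a
# single w x h x N**2 cube, here using `capture_all` from Section 2.1.
psi = np.random.rand(512, 512)               # stand-in for the object field
images = capture_all(psi, N=7, radius=40, step=36)
cube = np.stack(images, axis=-1)             # shape (512, 512, 49)
```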

3.1. Architecture

The proposed CNN is based on the architecture used in [14]. It consists of three convolutional layers. The two hidden layers H1 and H2 are each followed by a ReLU activation function. The first layer has 64 kernels with a kernel size of 9 × 9. The second layer has 32 kernels with a size of 5 × 5, and the output layer has a kernel size of 5 × 5. The output layer has only one kernel, which directly produces the reconstructed image ψ in the spatial domain. The weights are initialized with random Gaussian distributed values with a standard deviation of 0.001. We use the Euclidean distance as our loss function. Experiments with TV-minimization as the loss function did not lead to any improvements in PSNR.
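The model itself is small. Below is an equivalent sketch in PyTorch of the three layers just described (the paper's implementation used Caffe, see Section 3.2); the zero-padding and bias initialization are our assumptions:

```python
import torch.nn as nn

class PtychNet(nn.Module):
    """Illustrative PyTorch version of the three-layer CNN described
    above. The input has N**2 channels, one per intensity image."""

    def __init__(self, n_images=49):  # N**2 input channels
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_images, 64, kernel_size=9, padding=4),  # H1
            nn.ReLU(),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),        # H2
            nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),         # output
        )
        for m in self.net:
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight, std=0.001)  # Gaussian, sigma 0.001
                nn.init.zeros_(m.bias)

    def forward(self, x):
        # x: (batch, N**2, h, w) -> reconstruction (batch, 1, h, w)
        return self.net(x)
```

The Euclidean loss of the paper corresponds to a mean-squared-error criterion in this framing.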

3.2. Training Procedure

Our algorithm is implemented with the Caffe framework [18]. We trained our CNN on 91 publicly available images from Set91 [19]. We converted the images to gray-scale and resized them to w × h, where w = h = 512 pixels. These images represent our ground truth data ψ. The forward model from Equation 1 was applied to these images to obtain a collection of low quality intensity images I_i. The resulting intensity images I_i were resized to w × h pixels and then concatenated into a 3D cube of size w × h × N². From these cubes, we extracted about 15,000 patches of size 48 × 48 × N², which were used as the training database for the CNN. Note that since both the input and the output image of the CNN are in the spatial domain, our reconstruction algorithm is spatially invariant, and therefore we can divide the input and output data into patches and process them independently. In order to avoid border effects due to the zero-padding in the convolutional layers, we only use the 32 × 32 center pixels of a training patch to calculate the Euclidean loss. We created training datasets of input images with overlapping and non-overlapping frequency bands. For the non-overlapping case, we achieved better performance if we subtracted the center input image (the image at coordinates (0, 0) in Figure 2) from the reconstructed output image. This approach is similar to the idea of residual networks [13]. Our networks were trained for 200,000 iterations with a batch size of 256.

Fig. 4: PtychNet overview.
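A sketch of the patch extraction and center-cropped Euclidean loss described above, in the same illustrative PyTorch framing (the stride is an assumption; the original pipeline used Caffe):

```python
import torch.nn.functional as F

def extract_patches(cube, gt, size=48, stride=32):
    """Cut the w x h x N**2 input cube and its ground truth image into
    training patches (patch size from Section 3.2; the stride is our
    choice for illustration)."""
    h, w = gt.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield cube[y:y + size, x:x + size, :], gt[y:y + size, x:x + size]

def center_euclidean_loss(pred, target, crop=8):
    """Euclidean (MSE) loss on the 32 x 32 center of each 48 x 48 patch,
    avoiding the border effects of the zero-padded convolutions."""
    return F.mse_loss(pred[..., crop:-crop, crop:-crop],
                      target[..., crop:-crop, crop:-crop])
```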

4. EXPERIMENTAL RESULTS

In this section, we tested the effectiveness of our CNN by comparing it against the IERA algorithm proposed in [5]. We tested our algorithm on the commonly used resolution chart (resChart) and Lena image. In addition, we used the Set5 images from [19]. We used PSNR and SSIM as our performance metrics. The IERA algorithm was evaluated at 100 iterations, as the results of IERA did not improve much after 100 iterations. We tested it with the following two overlap configurations:

• With 0% overlap

• With 61% overlap

where overlap is the percentage of overlapping area of the input images in the frequency domain (see Figure 2b). As a baseline, we show the center image (referred to as Center) from the input image set (the image at position (0, 0) in Figure 2d), which corresponds to the low pass filtered original image.

4.1. Without overlap

Table 1 shows the PSNR and SSIM results for the non-overlapping case. Results are shown for the Center image, IERA and PtychNet. Figure 5 shows the original image and the reconstructed images for the center image, Lena and resChart. We can see that PtychNet produces superior results, both visually as well as in terms of PSNR/SSIM. Gains from IERA to PtychNet are between 0.6 and 2.1 dB.

Image      Metric   Center   IERA     PtychNet
lena       PSNR     25.11    24.68    25.82
           SSIM     0.6828   0.6488   0.7146
resChart   PSNR     14.41    12.92    15.05
           SSIM     0.1981   0.1357   0.2536
baby       PSNR     25.97    25.46    26.50
           SSIM     0.6836   0.6488   0.7030
bird       PSNR     28.49    28.50    29.70
           SSIM     0.8175   0.8068   0.8537
butterfly  PSNR     21.47    21.57    23.24
           SSIM     0.6201   0.5866   0.7258
head       PSNR     30.48    30.01    30.67
           SSIM     0.7295   0.6964   0.7410
woman      PSNR     24.93    24.82    25.83
           SSIM     0.7611   0.7431   0.8106

Table 1: PSNR and SSIM without overlap.

4.2. With overlap

The IERA and PtychNet images for 61% overlap are shown in Figure 6. Visually, the IERA-reconstructed resChart looks more detailed and sharper. The difference for Lena is much less obvious. IERA also outperforms PtychNet in terms of PSNR. Interestingly, the difference for resChart (IERA: 18.28, PtychNet: 18.04) is much smaller than for Lena (IERA: 31.52, PtychNet: 29.53). For Set5, the IERA outperforms PtychNet by an average of 2.4 dB (IERA: 35.02, PtychNet: 32.61).

However, PtychNet has a much lower runtime than IERA. As a comparison, the runtime for a 512 × 512 image for IERA with 100 iterations is about 1 minute, while PtychNet completes in about 0.5 seconds. The IERA algorithm is initialized with the mean image (the mean over the 49 input images). Alternatively, we use the output of PtychNet as the initialization image. This leads to a significantly faster convergence of the algorithm, since it is by itself already a good reconstruction of the original image. In Figure 7, we show the average PSNR versus iteration graph for Set5. For comparison, we also initialized the IERA with the center input image, whose bandpass filter is centered around the zero frequency (0, 0). While IERA with mean initialization needs roughly 30 iterations to converge, the PtychNet initialization only requires about 6 iterations to reach the maximum PSNR, which reduces the recovery time by a factor of 5 and converges to the same PSNR. For the resChart image, the PtychNet initialization even results in a slightly better quality. For all seven test images, the PSNR differs by no more than 0.02 dB between the different initialization methods. Hence we can achieve the same performance with IERA, but 5 times faster.
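Tying the sketches above together, PtychNet-initialized IERA looks roughly like this (an illustrative sketch reusing our earlier definitions, not the paper's exact pipeline; the network here is untrained):

```python
import numpy as np
import torch

# Reusing the sketches above: run PtychNet once and hand its output to
# IERA as the initial Fourier-domain estimate instead of the mean image.
apertures = make_apertures(psi.shape, N=7, radius=40, step=36)
model = PtychNet(n_images=49)
x = torch.from_numpy(cube).float().permute(2, 0, 1).unsqueeze(0)
with torch.no_grad():
    recon = model(x)[0, 0].numpy()        # PtychNet reconstruction

init = np.fft.fftshift(np.fft.fft2(recon))
# About 6 iterations suffice with this initialization (vs. ~30 from the
# mean image), for roughly a 5x reduction in recovery time.
psi_rec = iera(images, apertures, n_iter=6, psi_hat0=init)
```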

Fig. 5: Results with 0% overlap (original, Center image, IERA, PtychNet).

Fig. 6: Results with 61% overlap (IERA, PtychNet).

Fig. 7: Avg. PSNR over Set5 vs. iterations for IERA with different initializations. PtychNet initialization shows faster convergence compared to mean and center initializations.

5. CONCLUSION

We introduced a recovery algorithm for Fourier ptychography based on deep learning. To the best of our knowledge, there is no pre-existing work on CNN based Fourier ptychography algorithms. We show that in the case of non-overlapped Fourier sampling, CNNs perform significantly better than the existing IERA algorithm, both in terms of speed and resolution. Although IERA performs better than PtychNet with overlapping Fourier sampling, PtychNet reduces the runtime of the IERA by a factor of 5.

6. REFERENCES

[1] Adolf W. Lohmann, Rainer G. Dorsch, David Mendlovic, Carlos Ferreira, and Zeev Zalevsky, "Space–bandwidth product of optical signals and systems," J. Opt. Soc. Am. A, vol. 13, no. 3, pp. 470–473, Mar 1996.

[2] Guoan Zheng, Roarke Horstmeyer, and Changhuei Yang, "Wide-field, high-resolution Fourier ptychographic microscopy," Nature Photonics, vol. 7, no. 9, pp. 739–745, Sep 2013.

[3] Xiaoze Ou, Roarke Horstmeyer, Changhuei Yang, and Guoan Zheng, "Quantitative phase imaging via Fourier ptychographic microscopy," Opt. Lett., vol. 38, no. 22, pp. 4845–4848, Nov 2013.

[4] Zichao Bian, Siyuan Dong, and Guoan Zheng, "Adaptive system correction for robust Fourier ptychographic imaging," Opt. Express, vol. 21, no. 26, pp. 32400–32410, Dec 2013.

[5] Jason Holloway, M. Salman Asif, Manoj Kumar Sharma, Nathan Matsuda, Roarke Horstmeyer, Oliver Cossairt, and Ashok Veeraraghavan, "Toward long distance, sub-diffraction imaging using coherent camera arrays," IEEE Transactions on Computational Imaging, submitted for review.

[6] R. W. Gerchberg and W. O. Saxton, "A practical algorithm for the determination of the phase from image and diffraction plane pictures," Optik (Jena), vol. 35, pp. 237–246, 1972.

[7] J. R. Fienup, "Phase retrieval algorithms: a comparison," Appl. Opt., vol. 21, no. 15, pp. 2758–2769, Aug 1982.

[8] C. Yang, J. Qian, A. Schirotzek, F. Maia, and S. Marchesini, "Iterative algorithms for ptychographic phase retrieval," ArXiv e-prints, May 2011.

[9] Jianliang Qian, Chao Yang, A. Schirotzek, F. Maia, and S. Marchesini, "Efficient algorithms for ptychographic phase retrieval," Inverse Problems and Applications, Contemp. Math., vol. 615, pp. 261–280, 2014.

[10] Pavel Sidorenko and Oren Cohen, "Single-shot ptychography," Optica, vol. 3, no. 1, pp. 9–14, Jan 2016.

[11] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds., pp. 1097–1105. Curran Associates, Inc., 2012.

[12] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.

[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[14] Armin Kappeler, Seunghwan Yoo, Qiqin Dai, and Aggelos K. Katsaggelos, "Video super-resolution with convolutional neural networks," IEEE Transactions on Computational Imaging, vol. 2, no. 2, pp. 109–122, 2016.

[15] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, Feb 2016.

[16] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al., "Photo-realistic single image super-resolution using a generative adversarial network," arXiv preprint arXiv:1609.04802, 2016.

[17] Jonathan Long, Evan Shelhamer, and Trevor Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.

[18] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014, pp. 675–678.

[19] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Transactions on Image Processing, vol. 19, no. 11, pp. 2861–2873, Nov 2010.
