
Improving Resolution and Depth-of-Field of Light Field Cameras Using a Hybrid Imaging System

Vivek Boominathan, Kaushik Mitra, Ashok Veeraraghavan
Rice University, 6100 Main St, Houston, TX 77005
[vivekb, Kaushik.Mitra, vashok]@rice.edu

Abstract

Current light field (LF) cameras provide low spatial resolution and limited depth-of-field (DOF) control when compared to traditional digital SLR (DSLR) cameras. We show that a hybrid imaging system consisting of a standard LF camera and a high-resolution (HR) standard camera enables us to achieve (a) high-resolution digital refocusing, (b) better DOF control than LF cameras, and (c) graceful high-resolution viewpoint variations, all of which were previously unachievable. We propose a simple patch-based algorithm to super-resolve the low-resolution (LR) views of the light field using the high-resolution patches captured using a HR SLR camera. The algorithm does not require the LF camera and the DSLR to be co-located or any calibration information regarding the two imaging systems. We build an example prototype using a Lytro camera (380×380 pixel spatial resolution) and an 18 megapixel (MP) Canon DSLR camera to generate a light field with 11 MP resolution (9× super-resolution) and about 1/9th the DOF of the Lytro camera. We show several experimental results on challenging scenes containing occlusions, specularities and complex non-Lambertian materials, demonstrating the effectiveness of our approach.

Figure 1: Fundamental resolution trade-off in light-field imaging: Given a fixed resolution sensor there is an inverse relationship between the spatial resolution and the angular resolution that can be captured. By using a hybrid imaging system containing two sensors, one a high spatial resolution camera and another a light-field camera, one can reconstruct a high resolution light field. [Diagram: a DSLR camera captures 11 megapixels but no angular information; the Lytro captures 9×9 angular resolution at 0.1 megapixels; our hybrid imager achieves 11 megapixels with 9×9 angular resolution.]

1. Introduction

A light field (LF) is a 4-D function that measures the spatial and angular variations in the intensity of light [2]. Acquiring light fields provides us with three important capabilities that a traditional camera does not allow: (1) render images with small viewpoint changes, (2) render images with post-capture control of focus and depth-of-field, and (3) compute a depth map or a range image by utilizing either multi-view stereo or depth from focus/defocus methods. The growing popularity of LF cameras is attributed to these three novel capabilities. Nevertheless, current LF cameras suffer from two significant limitations that hamper their widespread appeal and adoption: (1) low spatial resolution and (2) limited DOF control.

1.1. Motivation

Fundamental Resolution Trade-off: Given an image sensor of a fixed resolution, existing methods for capturing light fields trade off spatial resolution in order to acquire angular resolution (with the exception of the recently proposed mask-based method [25]). Consider an example of an 11 MP image sensor, much like the one used in the Lytro camera [23]. A traditional camera using such a sensor is capable of recording 11 MP images, but acquires no angular information and therefore provides no ability to perform post-capture refocusing. In contrast, the Lytro camera is capable of recording 9 × 9 angular resolution, but has a low spatial resolution of 380 × 380 pixels (since it captures 9 × 9 pixels per microlens). This resolution loss is not restricted to microlens array-based LF cameras but is a common handicap faced by other LF cameras including traditional mask-based LF cameras [34], camera arrays [29], and angle-sensitive pixels [35]. Thus, there is an imminent need for improving the spatial resolution characteristics of LF sensors. We propose using a hybrid imaging system containing two cameras, one with a high spatial resolution sensor and the second being a light-field camera, which can be used to reconstruct a high resolution light field (see Figure 1).
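To make this trade-off concrete, the following back-of-the-envelope arithmetic (our illustration; the exact Lytro view size ultimately depends on its decoding pipeline) divides the sensor's pixel budget between spatial and angular samples:

```python
# Illustrative arithmetic for the spatial/angular trade-off described above.
# Numbers follow the paper's example; treat the result as approximate.
sensor_pixels = 11e6          # 11 MP sensor
angular_samples = 9 * 9       # 9 x 9 views behind each microlens

spatial_pixels = sensor_pixels / angular_samples   # ~0.136 MP
side = spatial_pixels ** 0.5                       # ~369 pixels per side

print(f"{spatial_pixels / 1e6:.2f} MP spatial, ~{side:.0f} x {side:.0f} per view")
# -> 0.14 MP spatial, ~369 x 369 per view (close to the Lytro's 380 x 380 views)
```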


Figure 2: Traditional cameras capture high resolution photographs but provide no post-capture focusing controls. Common light field cameras provide depth maps and post-capture refocusing ability but at very low spatial resolution. Here, we show that a hybrid imaging system comprising a high-resolution camera and a light-field camera allows us to obtain high-resolution depth maps and post-capture refocusing.

Depth Map Resolution: LF cameras enable the computation of depth information by the application of multi-view stereo or depth from focus/defocus methods on the rendered views. Unfortunately, the low spatial resolution of the rendered views results in low resolution depth maps. In addition, since the depth resolution of the depth maps (i.e., the number of distinct depth profiles within a fixed imaging volume) is directly proportional to the disparity between views, the low resolution of the views directly results in very few depth layers in the recovered range map. This results in errors when the depth information is directly used for vision tasks such as segmentation and object/activity recognition.

DOF Control: The DOF of an imaging system is inversely proportional to the image resolution (for a fixed f# and sensor size). Since the rendered views in a LF camera are low-resolution, this results in a much larger DOF than can be attained using high resolution DSLR cameras with similar sensor size and f#. This is a primary reason why DSLR cameras provide shallower DOF and are favored for photography. There is a need for a high resolution, shallow DOF LF imaging device.

1.2. Contributions

In this paper, we propose a hybrid imaging system consisting of a high resolution standard camera along with the low-resolution LF camera. This hybrid imaging system (Figure 2), along with the associated algorithms, enables us to capture/render (a) a high spatial resolution light field, (b) high spatial resolution depth maps, (c) higher depth resolution (more depth layers), and (d) shallower DOF.

2. Related Work

LF capture: Existing LF cameras can be divided into two main categories: (a) single shot [28, 18, 34, 23, 30, 17, 29, 35], and (b) multiple shot [21, 4]. Single shot light field cameras multiplex the 4-D LF onto the 2-D sensor, losing spatial resolution to capture the angular information in the LF. Such cameras employ either a lenslet array close to the sensor [28, 17], a mask close to the sensor [34], angle-sensitive pixels [35] or an array of lenses/prisms outside the main lens [18]. An example of multiple shot LF capture is programmable aperture imaging [21], which allows capturing light fields at the spatial resolution of the sensor. Recently, Babacan et al. [4], Marwah et al. [25] and Tambe et al. [32] showed that one can use compressive sensing and dictionary learning to reduce the number of images required. The reinterpretable imager by Agrawal et al. [3] has shown resolution trade-offs in a single image capture. Another approach for capturing LF is to use a camera array [19, 20, 37]. However, such approaches are hardware intensive, costly and require extensive bandwidth, storage and power.
LF Super-resolution and Plenoptic 2.0: The Plenoptic 2.0 camera [17] recovers the lost resolution by placing the microlens array at a different location compared to the original design [28]. Similarly, the Raytrix camera [30] uses a microlens array with lenses of different focal lengths to improve spatial resolution. Apart from these hardware modifications to the plenoptic camera, several super-resolution algorithms have been proposed to recover the lost resolution [7, 36, 26]. Bishop et al. [7] proposed a Bayesian framework in which they assume Lambertian textural priors in the image formation model and estimate both the high resolution depth map and light field. Wanner et al. [36] propose to compute continuous disparity maps using the epipolar plane image (EPI) structure of the LF; they then use this disparity map and variational techniques to compute super-resolved novel views. Mitra et al. [26] learn a Gaussian mixture model (GMM) for light field patches and perform Bayesian inference to obtain a super-resolved LF. Most of these methods show modest super-resolution by a factor of 4×. Here, we exploit the presence of a high resolution camera to obtain significantly higher resolution light fields.

Hybrid Imaging: The idea of hybrid imaging was proposed in the context of motion deblurring [6], where a low resolution high speed video camera co-located with a high resolution still camera was used to deblur the blurred images. Following this, several examples of hybrid imaging have found utility in different applications. Cao et al. [9] proposed a hybrid imaging system consisting of an RGB video camera and a LR multi-spectral camera to produce HR multispectral video using a co-located system. Another example of a hybrid imaging system is the virtual view synthesis system proposed by Tola et al. [33], where four regular video cameras and a time-of-flight sensor are used. They show that by adding the time-of-flight camera they could render better quality virtual views than just using a camera array of similar sparsity. Recently, a high resolution camera co-located with a Shack-Hartmann sensor has been used to improve the resolution of 3D images from a microscope [22]. All the above-mentioned hybrid imaging systems require that the different sensors be co-located in order for the algorithms to be able to effectively super-resolve image information. Motivated by the recent success of patch-based matching algorithms, we propose a patch-based strategy for super-resolution, absolving the need for co-location and providing significant ease of practical realization.

Patch Matching-based Algorithms: Patch matching-based techniques have been used in a variety of applications including texture synthesis [14], image completion [31], denoising [8], deblurring [12], and image super-resolution [15, 16]. Patch-based image super-resolution has either performed matching against a database of images [16] or exploited the self-similarity within the input image [15]. In our hybrid imaging method, patches from each view of a LF are matched with a reference high resolution image of the same scene. Since the high resolution image has the exact details of the scene, the super-resolved LF contains true information, as opposed to the hallucinated information of [16, 15]. Recently, fast approximate nearest patch search algorithms have been introduced [5]. We use the fast library for approximate nearest neighbors (FLANN) [27] to search for matching patches in the reference high-resolution image.

3. Hybrid Light Field Imaging

The hybrid imager we propose is a combination of two imaging systems: a low resolution LF device (Lytro camera) and a high-resolution camera (DSLR). The Lytro camera captures the angular perspective views and the depth of the scene while the DSLR camera captures a photograph of the scene. Our algorithm combines these two imaging systems to produce a light field with the spatial resolution of the DSLR and the angular resolution of the Lytro.

3.1. Hybrid Super-resolution Algorithm

Motivated by the recent success and adoption of patch-based algorithms for image super-resolution, we adapt existing patch-based super-resolution algorithms for our hybrid LF reconstruction. Traditional patch-based super-resolution algorithms replace low-resolution patches from the test image with high-resolution patches from a large database of natural images [16]. These super-resolution techniques work reliably up to a factor of about 4; artifacts become noticeable in the reconstructed image when used for larger upsampling problems. In our setting, the loss in spatial resolution due to light-field capture is significantly larger (9× in the case of the Lytro camera), beyond the scope of these techniques. Our hybrid imaging system contains a single high-resolution detailed texture of the same scene (albeit from a slightly different viewpoint), which, as we will show, significantly improves our ability to perform super-resolution with large upsampling factors.

Overview: Consider the process of super-resolving a LF by a factor of $N$. Using our setup, the DSLR captures an image that has a spatial resolution $N$ times that of the LF camera. From the high-resolution image, we extract patches $\{h_{ref,i}\}_{i=1}^{n}$ and store them in a dictionary $D_h$, where $n$ is the total number of patches in the HR image. Low-resolution features $\{f_{ref,i}\}_{i=1}^{n}$ are computed from each of the HR patches by down-sampling by a factor of $N$ and then computing first- and second-order gradients. The low-resolution features are stored in a dictionary $D_f$.

For super-resolving the low-resolution LF, we super-resolve each view separately using the reference dictionary pair $D_h$/$D_f$. For each patch $l_j$ in a given view of the low-resolution LF, a gradient feature $f_j$ is calculated. Following [39], the 9 nearest neighbors in dictionary $D_f$ with the smallest L2 distance from $f_j$ are computed; these 9 nearest neighbors are denoted $\{f_{ref,j}^{k}\}_{k=1}^{9}$. The estimated high-resolution patch $\hat{h}_j$ corresponding to $l_j$ is a weighted linear combination of the patches $\{h_{ref,j}^{k}\}_{k=1}^{9}$ taken from $D_h$ that correspond to the 9 nearest neighbors in $D_f$. In practice, we extract an overlapping set of low-resolution patches with a 1-pixel shift between adjacent patches and reconstruct the high-resolution intensity at a pixel as the average of the recovered intensities for that pixel.
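To make the dictionary construction concrete, the following sketch (ours, not the authors' released code; the patch stride and the box-filter down-sampling are our assumptions) builds $D_h$ and $D_f$ from the reference image:

```python
import numpy as np
from itertools import product

def gradient_feature(lr_patch):
    # Stand-in feature: first-order gradients of the LR patch. The paper's
    # exact first- and second-order filters are given in Eqs. (1)-(2) below,
    # and a fuller version of this function follows them.
    gy, gx = np.gradient(lr_patch.astype(float))
    return np.concatenate([gx.ravel(), gy.ravel()])

def build_dictionaries(hr_image, N=9, lr_patch=8, step=4):
    """Build the HR patch dictionary D_h and LR feature dictionary D_f.

    hr_image: 2-D grayscale DSLR reference image.
    N: super-resolution factor; HR patches are (lr_patch*N) x (lr_patch*N)
       (72 x 72 for the paper's 8 x 8 LR patches and N = 9).
    """
    hp = lr_patch * N
    D_h, D_f = [], []
    for y, x in product(range(0, hr_image.shape[0] - hp + 1, step),
                        range(0, hr_image.shape[1] - hp + 1, step)):
        h = hr_image[y:y + hp, x:x + hp]
        # Down-sample by N (box filter here) to mimic the LF camera's sampling.
        l = h.reshape(lr_patch, N, lr_patch, N).mean(axis=(1, 3))
        D_h.append(h.ravel())
        D_f.append(gradient_feature(l))
    return np.asarray(D_h), np.asarray(D_f)
```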
Feature selection: Gradient information can be incorporated into patch matching algorithms to improve accuracy when searching for similar patches. Chang et al. [10] use first- and second-order derivatives as features to facilitate matching. We also use first- and second-order gradients as the feature extracted from the low-resolution patches. The four 1-D gradient filters used to extract the features are:

$$g_1 = [-1, 0, 1], \qquad g_2 = g_1^{T} \tag{1}$$

$$g_3 = [1, 0, -2, 0, 1], \qquad g_4 = g_3^{T} \tag{2}$$

where the superscript $T$ denotes transpose. For a low-resolution patch $l$, the filters $\{g_1, g_2, g_3, g_4\}$ are applied and the feature $f_l$ is represented as the concatenation of the vectorized filter outputs.

Reconstruction weights: For a test patch $l_j$, let us assume that the 9 nearest neighbors to the feature $f_j$ in the dictionary $D_f$ are $\{f_{ref,j}^{k}\}_{k=1}^{9}$. Let $\{h_{ref,j}^{k}\}_{k=1}^{9}$ denote the corresponding high-resolution patches in $D_h$. The up-sampled patch $\hat{h}_j$ is then estimated by:

$$\hat{h}_j = \frac{\sum_{k=1}^{9} w_k\, h_{ref,j}^{k}}{\sum_{k=1}^{9} w_k}, \qquad \text{where } w_k = \exp\!\left(\frac{-\|f_j - f_{ref,j}^{k}\|_2^2}{2\sigma^2}\right) \tag{3}$$

We use the Stanford light-field database [1] to cross-validate and estimate the value of $\sigma^2$ that minimizes the prediction error, and use that value in all of our experiments.
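Eqs. (1)-(3) translate almost directly into code. The sketch below is our transcription, with SciPy's `convolve1d` standing in for whatever filtering the authors used; the default $\sigma^2 = 40$ follows the value reported under Algorithmic Details in Section 4:

```python
import numpy as np
from scipy.ndimage import convolve1d

def gradient_feature(lr_patch):
    """Eqs. (1)-(2): concatenated first- and second-order gradient responses."""
    l = lr_patch.astype(float)
    g1 = np.array([-1.0, 0.0, 1.0])          # first-order filter (g2 = g1^T)
    g3 = np.array([1.0, 0.0, -2.0, 0.0, 1.0])  # second-order filter (g4 = g3^T)
    # Applying each 1-D filter along both axes realizes g1..g4.
    outs = [convolve1d(l, g, axis=ax) for g in (g1, g3) for ax in (1, 0)]
    return np.concatenate([o.ravel() for o in outs])

def combine_neighbors(f_j, f_refs, h_refs, sigma2=40.0):
    """Eq. (3): Gaussian-weighted average of the 9 nearest HR patches.

    f_j: feature of the LR test patch; f_refs: (9, d) neighbor features;
    h_refs: (9, p) corresponding flattened HR patches; sigma2 via cross-validation.
    """
    w = np.exp(-np.sum((f_refs - f_j) ** 2, axis=1) / (2.0 * sigma2))
    return (w[:, None] * h_refs).sum(axis=0) / w.sum()
```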

3.2. Performance Analysis

To characterize the performance of image restoration algorithms, Zontak et al. [39] have proposed two measures: 1) prediction error and 2) prediction uncertainty. We use these performance metrics to compare our LF super-resolution approach against super-resolution approaches based on external image statistics [16]. All of these approaches are instances of "example-based super-resolution" [16], in which a dataset of high-res/low-res patch pairs $(h_i, l_i)$, $i = 1, 2, \ldots, n$ is used to super-resolve a given image: patches are extracted from the given image, matched to low resolution patches in the dataset, and replaced with the corresponding high resolution patches. In this paper, we match low resolution features (see Section 3) instead of low resolution patches.

In our super-resolution approach, we create the HR-LR patch pair database from the reference HR image. In the external image statistics method, we obtain the patch pairs from the Berkeley segmentation dataset (BSD) [24], as was done in [39]. For testing, we use light-field data from the Stanford light-field dataset [1]. While [39] analyze the prediction error and uncertainty for a 2× image super-resolution problem, we do so for a 4× light-field super-resolution problem, since the upsampling factors for LF capture are typically larger. Given a LR patch $l_j$ of size $8 \times 8$, we compute its 9 nearest neighbors $\{l_j^{k}\}_{k=1}^{9}$ and corresponding HR patches $\{h_j^{k}\}_{k=1}^{9}$ with respect to the HR-LR databases for the two approaches; the HR patches are of size $32 \times 32$. The reconstructed HR patch, $\hat{h}_j$, is then given by Equation 3. We choose the parameter $\sigma^2$ in the weight term by cross-validation. As in [39], the prediction error is defined as $\|h_{GroundTruth} - \hat{h}_j\|_2^2$ and the prediction uncertainty is the weighted variance of the predictors $\{h_j^{k}\}_{k=1}^{9}$. The prediction error and prediction uncertainty were averaged over all the views of the input light-field; this was done for 9 different light-field datasets and the results averaged.

Since the Stanford light field database contains $17 \times 17$ views, we are able to analyze the reconstruction performance of a $5 \times 5$ light-field with (a) a co-located reference camera, (b) a reference camera with a medium baseline and (c) another with a large baseline. Figure 3 (Left) shows the prediction error for both approaches under varying operating conditions. This result shows that our approach has a lower prediction error even when the reference image is separated from the LF camera by a wide baseline. Figure 3 (Right) shows the prediction uncertainty when finding the nearest neighbors; this plot shows the reliability of the prediction. High uncertainty (entropy) among the HR candidates $\{h_j^{k}\}_{k=1}^{9}$ of a given LR patch indicates high ambiguity in prediction and results in artifacts like hallucinations and blurring. Our hybrid approach has a consistently low uncertainty, suggesting that the computed nearest neighbors have low entropy. These experiments clearly establish the efficacy of hybrid imaging over traditional light field super-resolution.
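As a rough sketch of the two measures (our reading of [39]; the paper does not spell out the weighted-variance formula, so we reuse the Eq. (3) weights):

```python
import numpy as np

def prediction_error(h_gt, h_hat):
    """Squared L2 error between the ground-truth and predicted HR patch."""
    return np.sum((h_gt.astype(float) - h_hat) ** 2)

def prediction_uncertainty(h_refs, w):
    """Weighted variance of the 9 HR candidate patches.

    h_refs: (9, p) flattened candidate patches; w: (9,) unnormalized
    weights from Eq. (3). High variance signals ambiguous predictions.
    """
    w = w / w.sum()
    mean = (w[:, None] * h_refs).sum(axis=0)
    var = (w[:, None] * (h_refs - mean) ** 2).sum(axis=0)
    return var.mean()
```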

Figure 3: Prediction error and prediction uncertainty using a ground truth LF: (Left) We compare the prediction error of our hybrid imaging approach with that of the external image statistics approach [16]. Our approach has a lower prediction error than the other techniques, even when the reference image is separated by a wide baseline. (Right) We show the prediction uncertainty when finding the 9 nearest neighbors in the low-resolution dictionary. Our approach consistently has less uncertainty than using a database of images. These experiments clearly establish the superiority of hybrid imaging over using a database of images. (Pixel intensities vary between 0 and 255.) [Plots: RMSE per pixel (left) and uncertainty (right) versus mean gradient magnitude per patch, for external databases of 5, 10 and 40 images and for the hybrid system with co-located, medium-baseline and large-baseline reference cameras.]

4. Experiments

Experimental Prototype: We decode the Lytro camera's raw sensor output using the toolbox provided by Dansereau et al. [13]. This produces a light field with a spatial resolution of 380 × 380 pixels and an angular resolution of 9 × 9 views. For the high-resolution camera, we use the 18 MP Canon T3i DSLR with a Canon EF-S 18-55mm f/3.5-5.6 IS II lens, which has a spatial resolution of 5184 × 3456 pixels. The lens is positioned such that the FOV of the Lytro camera occupies the maximum part of the Canon FOV. The overlapping FOVs result in a 3420 × 3420 pixel region of interest on the Canon image, 9 times the spatial resolution of the LF produced by the Lytro camera. The product of our algorithm is therefore a 9× spatially super-resolved LF.
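For notation, it is convenient to hold the decoded light field as a 5-D array. The layout below is our assumption, loosely following the view-major ordering of the toolbox of Dansereau et al. [13], and is shown only to fix how views and EPIs are indexed:

```python
import numpy as np

# Hypothetical decoded Lytro light field: 9x9 views, each 380x380 RGB.
lf = np.zeros((9, 9, 380, 380, 3), dtype=np.float32)  # (v, u, y, x, channel)

center_view = lf[4, 4]     # central view of the 9x9 grid, shape (380, 380, 3)
row_of_views = lf[4]       # a horizontal "track" of 9 views across u

# An epipolar-plane image (EPI): fix a scanline y and stack it across u.
epi = lf[4, :, 190, :, :]  # shape (9, 380, 3); scene depth appears as line slopes
```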


Figure 4: 9× LF super-resolution of a real scene: The scene is captured using a Lytro camera with a spatial resolution of 380 × 380 for each view. We show the central view of the 9× super-resolved LF, reconstructed using bicubic interpolation, Mitra et al. [26], Cho et al. [11], and our algorithm. As shown in the insets, our algorithm is better able to retain the highly textured regions of the scene.

Algorithmic Details: We consider low resolution patches of size 8 × 8 and high resolution patches of size 72 × 72. To reduce the reconstruction time, we learn a tree structure on the dictionary $D_f$ using the FLANN library [27] instead of searching the dictionary directly. The value of $\sigma^2$ in Equation 3 was chosen to be 40. For the rest of the reconstruction details, refer to Section 3.
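Putting the pieces together, one pass over a single view might look like the sketch below (ours; SciPy's `cKDTree` stands in for the FLANN index the paper uses, and `build_dictionaries`, `gradient_feature` and `combine_neighbors` refer to the earlier sketches):

```python
import numpy as np
from scipy.spatial import cKDTree

def superresolve_view(lr_view, hr_ref, N=9, lr_patch=8):
    """Super-resolve one LF view against the HR reference image.

    Overlapping LR patches are extracted with a 1-pixel shift; each recovered
    HR patch is accumulated and the overlaps averaged, as in Section 3.1.
    """
    D_h, D_f = build_dictionaries(hr_ref, N=N, lr_patch=lr_patch)
    tree = cKDTree(D_f)                    # stand-in for a FLANN kd-tree index
    H, W = lr_view.shape
    hp = lr_patch * N
    out = np.zeros((H * N, W * N))
    hits = np.zeros_like(out)
    for y in range(H - lr_patch + 1):
        for x in range(W - lr_patch + 1):
            f_j = gradient_feature(lr_view[y:y + lr_patch, x:x + lr_patch])
            _, idx = tree.query(f_j, k=9)  # 9 nearest features for Eq. (3)
            h_hat = combine_neighbors(f_j, D_f[idx], D_h[idx])
            ys, xs = y * N, x * N
            out[ys:ys + hp, xs:xs + hp] += h_hat.reshape(hp, hp)
            hits[ys:ys + hp, xs:xs + hp] += 1.0
    return out / np.maximum(hits, 1.0)     # average overlapping reconstructions
```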

Figure 5: 9× LF super-resolution of a complex scene: We show the central view of the 9× super-resolved LF of another scene, reconstructed using bicubic interpolation, Mitra et al. [26], Cho et al. [11] and our algorithm. This scene features complex elements such as colored liquids and a reflective ball. Our algorithm accommodates these materials and provides a better reconstruction than the other methods.

Figure 6: Epipolar constraints: We show EPIs from our reconstructed LF. (a) The red box and green box correspond to EPIs of the lines shown in our reconstruction (right image) in Figure 4. (b) The violet box and yellow box correspond to EPI plots of the lines shown in our reconstruction (right image) in Figure 5. We do not explicitly enforce epipolar constraints in our algorithm, but it can be seen that the depth-dependent slopes of textured surfaces are generally preserved in our reconstruction.

Figure 7: Shallow depth of field: Using the Blender software, we simulate a low-resolution LF camera with a focal length of 40mm, focused at 200mm and having an aperture of f/8. A high-resolution image of 9 times the spatial resolution of the LF is also generated, and a high-resolution LF is reconstructed using our algorithm. We synthetically refocus the high-resolution LF to represent a camera with an aperture corresponding to f/0.8. (a) Refocusing using disparity obtained from the low-resolution LF. The region between the red dashed lines is in focus and the resulting DOF is 5mm; note that the blurring varies slowly outside the in-focus region. (b) Refocusing using disparity obtained from the 9× super-resolved LF. The resulting DOF is 0.55mm, nine times smaller than (a). This illustrates that our hybrid system can obtain a DOF that is 9 times narrower than the DOF of the low-resolution LF.

Shallow DOF: In order to quantitatively analyze the shallow depth of field produced by our hybrid system, we render (using Blender) a scene containing a slanted plane and a ruler. By performing super-resolution and refocusing, we clearly see that the hybrid imaging approach produces about 9× shallower DOF than traditional LF refocusing directly from the Lytro data (see Figure 7).

High resolution LF and Refocusing Comparisons: We compare our algorithm with the super-resolution techniques of Mitra et al. [26] and Cho et al. [11], as well as with bicubic interpolation. Mitra et al. [26] present a method to super-resolve LFs using a GMM-based approach. Cho et al. [11] present an algorithm that is tailored to decode a higher spatial resolution LF from the Lytro camera and then uses a dictionary-based method [38] to further super-resolve the LF.

Figure 4 shows the central view of the 9× super-resolved LF using bicubic interpolation, the GMM approach [26], Cho et al. [11] and our algorithm. Clearly, our algorithm recovers the highly textured regions of the LF better than the other algorithms. Figure 5 shows the reconstructed LF for a more complex scene with translucent colored liquids and a shiny ball. Again, our algorithm produces the better reconstruction. We next show refocusing results. Figures 8 and 9 show refocused images from the first and second scenes respectively. From the insets, it is clear that our refocusing results are superior to those of the current state-of-the-art algorithms.

Figure 8: Refocusing results for the real scene: On the top, we show the LF super-resolved using our method, refocused at different depths. Below that, we show zoomed insets from refocused results produced by bicubic interpolation, Mitra et al. [26], Cho et al. [11] and our algorithm. The focused regions produced by our algorithm are much more detailed compared to the other results.

Figure 9: Refocusing results for the complex scene: On the top, we show the LF super-resolved using our method, refocused at different depths. Below that, we show zoomed insets from refocused results produced by bicubic interpolation, Mitra et al. [26], Cho et al. [11] and our algorithm. Objects in the green snippet and the red snippet are only 2 inches apart in depth. By refocusing between them, we show that we attain a shallow DOF.

Epipolar Constraints: In our approach, we do not explicitly constrain the super-resolution to follow epipolar geometry. But, as shown in Figure 6, our reconstruction adheres to epipolar constraints. In the future, we would like to incorporate these constraints in the reconstruction.

Computational Efficiency: The algorithm was implemented in MATLAB and takes 1 hour on an Intel i7 third-generation processor with 32GB of RAM to compute a super-resolved LF of 11 MP resolution given an input LF with a spatial resolution of 0.1 MP. The reconstruction can be significantly sped up by incorporating the epipolar constraint and reducing the search space to the patches lying on the same epipolar line in the Lytro camera's light field.

5. Conclusions

Current LF cameras such as the Lytro camera have low spatial resolution. To improve the spatial resolution of the LF, we introduced a hybrid imaging system consisting of a low-resolution LF camera and a high-resolution DSLR. Using a patch-based algorithm, we are able to marry the advantages of both cameras to produce a high-resolution LF. Our algorithm does not require the LF camera and the DSLR to be co-located or any calibration information relating the cameras. We show that using our hybrid system, a 0.1 MP LF captured using a Lytro camera can be super-resolved to an 11 MP LF while retaining high fidelity in textured regions. Furthermore, the system enables users to refocus images with DOFs that are nine times narrower than the DOF of the Lytro camera.
Acknowledgments

Vivek Boominathan, Kaushik Mitra and Ashok Veeraraghavan acknowledge support through NSF Grants NSF-IIS:1116718, NSF-CCF:1117939 and a research grant from Samsung Advanced Institute of Technology through the Samsung GRO program.

References

[1] Stanford light field archive. http://lightfield.stanford.edu/lfs.html.
[2] E. Adelson and J. Bergen. The plenoptic function and the elements of early vision. Computational Models of Visual Processing, MIT Press, pages 3–20, 1991.
[3] A. Agrawal, A. Veeraraghavan, and R. Raskar. Reinterpretable imager: Towards variable post-capture space, angle and time resolution in photography. Computer Graphics Forum, 29:763–773, May 2010.
[4] D. Babacan, R. Ansorge, M. Luessi, P. Ruiz, R. Molina, and A. Katsaggelos. Compressive light field sensing. IEEE Trans. Image Processing, 21:4746–4757, 2012.
[5] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman. PatchMatch: A randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics (Proc. SIGGRAPH), 28(3), Aug. 2009.
[6] M. Ben-Ezra and S. K. Nayar. Motion deblurring using hybrid imaging. In Computer Vision and Pattern Recognition (CVPR), volume 1, pages I-657. IEEE, 2003.
[7] T. E. Bishop, S. Zanetti, and P. Favaro. Light field super-resolution. In Proc. Int'l Conf. Computational Photography, pages 1–9, 2009.
[8] A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. In Computer Vision and Pattern Recognition (CVPR), volume 2, pages 60–65. IEEE, 2005.
[9] X. Cao, X. Tong, Q. Dai, and S. Lin. High resolution multispectral video capture with a hybrid camera system. In Computer Vision and Pattern Recognition (CVPR), pages 297–304, 2011.
[10] H. Chang, D.-Y. Yeung, and Y. Xiong. Super-resolution through neighbor embedding. In Proc. Conf. Computer Vision and Pattern Recognition, volume 1, pages 275–282, 2004.
[11] D. Cho, M. Lee, S. Kim, and Y.-W. Tai. Modeling the calibration pipeline of the Lytro camera for high quality light-field image reconstruction. In ICCV, 2013.
[12] S. Cho, J. Wang, and S. Lee. Video deblurring for hand-held cameras using patch-based synthesis. ACM Transactions on Graphics, 31(4):64:1–64:9, 2012.
[13] D. Dansereau, O. Pizarro, and S. Williams. Decoding, calibration and rectification for lenselet-based plenoptic cameras. In Computer Vision and Pattern Recognition (CVPR), pages 1027–1034, 2013.
[14] A. A. Efros and W. T. Freeman. Image quilting for texture synthesis and transfer. In Proceedings of SIGGRAPH 2001, pages 341–346, August 2001.
[15] G. Freedman and R. Fattal. Image and video upscaling from local self-examples. ACM Transactions on Graphics (TOG), 30(2):12, 2011.
[16] W. Freeman, T. Jones, and E. Pasztor. Example-based super-resolution. IEEE Computer Graphics and Applications, 22(2):56–65, 2002.


[17] T. Georgiev and A. Lumsdaine. Superresolution with plenoptic camera 2.0. Adobe Systems Inc., Tech. Rep., 2009.
[18] T. Georgiev, C. Zheng, S. Nayar, B. Curless, D. Salesin, and C. Intwala. Spatio-angular resolution trade-offs in integral photography. In Eurographics Symposium on Rendering, pages 263–272, 2006.
[19] M. Levoy, B. Chen, V. Vaish, M. Horowitz, I. McDowall, and M. Bolas. Synthetic aperture confocal imaging. ACM Trans. Graph., 23(3):825–834, 2004.
[20] M. Levoy and P. Hanrahan. Light field rendering. In SIGGRAPH, pages 31–42, 1996.
[21] C.-K. Liang, T.-H. Lin, B.-Y. Wong, C. Liu, and H. Chen. Programmable aperture photography: Multiplexed light field acquisition. ACM Trans. Graph., 27(3):55:1–55:10, 2008.
[22] C.-H. Lu, S. Muenzel, and J. Fleischer. High-resolution light-field microscopy. In Computational Optical Sensing and Imaging. Optical Society of America, 2013.
[23] Lytro. The Lytro camera. https://www.lytro.com/.
[24] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int'l Conf. Computer Vision, volume 2, pages 416–423, July 2001.
[25] K. Marwah, G. Wetzstein, Y. Bando, and R. Raskar. Compressive light field photography using overcomplete dictionaries and optimized projections. ACM Transactions on Graphics, 32(4), 2013.
[26] K. Mitra and A. Veeraraghavan. Light field denoising, light field superresolution and stereo camera based refocussing using a GMM light field patch prior. In Computer Vision and Pattern Recognition Workshops (CVPRW), pages 22–28. IEEE, 2012.


[27] M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Applications (VISAPP '09), pages 331–340. INSTICC Press, 2009.
[28] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan. Light field photography with a hand-held plenoptic camera. Technical report, Stanford Univ., 2005.
[29] Point Grey. ProFusion 25 camera.
[30] Raytrix. 3D light field camera technology. http://www.raytrix.de/.
[31] J. Sun, L. Yuan, J. Jia, and H.-Y. Shum. Image completion with structure propagation. ACM Transactions on Graphics (ToG), 24(3):861–868, 2005.
[32] S. Tambe, A. Veeraraghavan, and A. Agrawal. Towards motion-aware light field video for dynamic scenes. In IEEE International Conference on Computer Vision. IEEE, 2013.
[33] E. Tola, C. Zhang, Q. Cai, and Z. Zhang. Virtual view generation with a hybrid camera array. Technical report, 2009.
[34] A. Veeraraghavan, R. Raskar, A. Agrawal, A. Mohan, and J. Tumblin. Dappled photography: Mask enhanced cameras for heterodyned light fields and coded aperture refocusing. ACM Trans. Graph., 26(3):69:1–69:12, 2007.
[35] A. Wang, P. R. Gill, and A. Molnar. An angle-sensitive CMOS imager for single-sensor 3D photography. In IEEE International Solid-State Circuits Conference (ISSCC), pages 412–414. IEEE, 2011.
[36] S. Wanner and B. Goldluecke. Variational light field analysis for disparity estimation and super-resolution. 2013 (to appear).
[37] B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy. High performance imaging using large camera arrays. ACM Trans. Graph., 24(3):765–776, 2005.
[38] J. Yang, J. Wright, T. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861–2873, 2010.
[39] M. Zontak and M. Irani. Internal statistics of a single natural image. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 977–984, 2011.