
Accurate Depth Map Estimation from a Lenslet Field Camera

Hae-Gon Jeon, Jaesik Park, Gyeongmin Choe, Jinsun Park, Yunsu Bok, Yu-Wing Tai, In So Kweon
Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea
[hgjeon,jspark,gmchoe,ysbok]@rcv.kaist.ac.kr, [zzangjinsun,yuwing]@gmail.com, [email protected]

Abstract

This paper introduces an algorithm that accurately estimates depth maps using a lenslet light field camera. The proposed algorithm estimates the multi-view stereo correspondences with sub-pixel accuracy using the cost volume. The foundation for constructing accurate costs is threefold. First, the sub-aperture images are displaced using the phase shift theorem. Second, the gradient costs are adaptively aggregated using the angular coordinates of the light field. Third, the feature correspondences between the sub-aperture images are used as additional constraints. With the cost volume, the multi-label optimization propagates and corrects the depth map in the weak texture regions. Finally, the local depth map is iteratively refined through fitting a local quadratic function to estimate a non-discrete depth map. Because micro-lens images contain unexpected distortions, a method that corrects this error is also proposed. The effectiveness of the proposed algorithm is demonstrated through challenging real-world examples, including comparisons with the performance of advanced depth estimation algorithms.

Figure 1. Synthesized views of the two depth maps acquired from Lytro software [1] and our approach.

1. Introduction

The problem of estimating an accurate depth map from a lenslet light field camera, e.g. Lytro [1] and Raytrix [19], is investigated. Different from conventional cameras, a light field camera captures not only a 2D image, but also the directions of the incoming light rays. The additional light directions allow the image to be re-focused and the depth map of a scene to be estimated, as demonstrated in [12, 17, 19, 23, 26, 29].

Because the baseline between sub-aperture images from a lenslet light field camera is very narrow, directly applying existing stereo matching algorithms such as [20] cannot produce satisfying results, even if the applied algorithm is a top-ranked method in the Middlebury stereo matching benchmark. As reported in Yu et al. [29], the disparity range of adjacent sub-aperture images in Lytro is between −1 and 1 pixels. Consequently, it is very challenging to estimate an accurate depth map, because a one-pixel disparity error is already a significant error in this problem.

In this paper, an algorithm for stereo matching between sub-aperture images with an extremely narrow baseline is presented. Central to the proposed algorithm is the use of the phase shift theorem in the Fourier domain to estimate the sub-pixel shifts of sub-aperture images. This enables the estimation of the stereo correspondences at sub-pixel accuracy, even with a very narrow baseline. The cost volume is computed to evaluate the matching cost of different disparity labels, which is defined using the similarity measurement between the sub-aperture images and the center-view sub-aperture image shifted at different sub-pixel locations. Here, the gradient matching costs are adaptively aggregated based on the angular coordinates of the light field camera.

In order to reduce the effects of image noise, a weighted median filter is adopted to remove the noise in the cost volume, followed by multi-label optimization to propagate reliable disparity labels to the weak texture regions. In the multi-label optimization, confident matching correspondences between the center view and the other views are used as additional constraints, which assist in preventing oversmoothing at the edges and texture regions. Finally, the estimated depth map is iteratively refined using quadratic polynomial interpolation to enhance the estimated depth map with sub-label precision.

In the experiments, it was found that the micro-lens images of lenslet light field cameras contain depth distortions. Therefore, a method of correcting this error is also presented. The effectiveness of the proposed algorithm is demonstrated using challenging real-world examples that were captured by a Lytro camera, a Raytrix camera, and a lab-made lenslet light field camera. A performance comparison with advanced methods is also presented. An example of the results of the proposed method is presented in Fig. 1.

Compared with previous studies, the proposed algorithm computes a cost volume that is based on sub-pixel multi-view stereo matching. Unique to the proposed algorithm is the usage of the phase shift theorem when performing the sub-pixel shifts of the sub-aperture images. The phase shift theorem allows the reconstruction of the sub-pixel shifted sub-aperture images without introducing blurriness, in contrast to spatial domain interpolation. As demonstrated in the experiments, the proposed algorithm is highly effective and outperforms the advanced algorithms in depth map estimation using a lenslet light field image.

2. Related Work

Previous work related to depth map (or disparity map; we sometimes use disparity map to represent depth map) estimation from a light field image is reviewed. Compared with conventional approaches in stereo matching, lenslet light field images have very narrow baselines. Consequently, approaches based on correspondence matching do not typically work well, because the sub-pixel shift in the spatial domain usually involves interpolation with blurriness, and the matching costs of stereo correspondence are highly ambiguous. Therefore, instead of using correspondence matching, other cues and constraints have been used to estimate depth maps from a lenslet light field image.

Georgiev and Lumsdaine [7] computed a normalized cross correlation between micro-lens images in order to estimate the disparity map. Bishop and Favaro [4] introduced an iterative multi-view stereo method for a light field. Wanner and Goldluecke [26] used a structure tensor to compute the vertical and horizontal slopes in the epipolar plane of a light field image, and they formulated the depth map estimation problem as a global optimization subject to the epipolar constraint. Yu et al. [29] analyzed the 3D geometry of lines in a light field image and computed the disparity maps through line matching between the sub-aperture images. Tao et al. [23] introduced a fusion method that uses the correspondence and defocus cues of a light field image to estimate the disparity maps; after the initial estimation, a multi-label optimization is applied in order to refine the estimated disparity map. Heber and Pock [8] estimated disparity maps using low-rank structure regularization to align the sub-aperture images.

In addition to the aforementioned approaches, there have been recent studies that estimate depth maps from light field images. For example, Kim et al. [10] estimated depth maps from a DSLR camera with movement, which simulated the multiple viewpoints of a light field image. Chen et al. [6] introduced a bilateral consistency metric on the surface camera in order to estimate the stereo correspondence in a light field image in the presence of occlusion. However, it should be noted that the baselines of the light field images presented in Kim et al. [10] and Chen et al. [6] are significantly larger than the baseline of the light field images captured using a lenslet light field camera.

3. Sub-aperture Image Analysis

First, the characteristics of sub-aperture images obtained from a lenslet-based light field camera are analyzed, and then the proposed distortion correction method is described.

3.1. Narrow Baseline Sub-aperture Image

Narrow baseline. The 4D parameterization [7, 17, 26] is followed throughout: the pixel coordinates of a light field image I are defined using the 4D parameters (s, t, x, y), where s = (s, t) denotes the discrete index of the angular directions and x = (x, y) denotes the Cartesian image coordinates of each sub-aperture image. According to the lenslet light field camera projection model proposed by Bok et al. [5], the viewpoint (S, T) of a sub-aperture image with an angular direction s = (s, t) is

$$\begin{bmatrix} S \\ T \end{bmatrix} = (D + d)\,\frac{D}{d} \begin{bmatrix} s/f_x \\ t/f_y \end{bmatrix}, \qquad (1)$$

where D is the distance between the lenslet array and the center of the main lens, d is the distance between the lenslet array and the imaging sensor, and f is the focal length of the main lens. Under the assumption of a uniform focal length (i.e. f_x = f_y = f), the baseline between two adjacent sub-aperture images is defined as baseline := (D+d)D/(df).

Based on this, we need to shorten f, shorten d, or lengthen D for a wider baseline. However, f cannot be too short because it is proportional to the angular resolution of the micro-lenses in the lenslet array; the maximum baseline, which is the product of the baseline and the angular resolution of the sub-aperture images, therefore remains unchanged even if the value of f varies. Moreover, if the physical size of the micro-lenses is too large, the spatial resolution of the sub-aperture images is reduced. Shortening d enlarges the angular difference between the corresponding rays of adjacent pixels and might cause radial distortion of the micro-lenses. Finally, lengthening D increases the baseline, but the field of view is reduced. Due to these trade-offs, the disparity range of sub-aperture images is quite narrow. For example, the disparity range between adjacent sub-aperture views of the Lytro camera is smaller than ±1 pixel [29].
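To make the relationship concrete, Eq. (1) can be checked numerically. The following minimal sketch uses arbitrary placeholder values for D, d, and f (they are not calibrated Lytro parameters) and only verifies that the spacing between adjacent viewpoints equals the closed-form baseline:

```python
import numpy as np

# Numerical check of Eq. (1). D, d, and f are arbitrary placeholders,
# not calibrated Lytro parameters.
D, d, f = 1.0, 0.02, 0.5

def viewpoint(s, t, fx=f, fy=f):
    """Viewpoint (S, T) of the sub-aperture image at angular index (s, t)."""
    return (D + d) * (D / d) * np.array([s / fx, t / fy])

baseline = (D + d) * D / (d * f)                   # closed form, Sec. 3.1
spacing = viewpoint(1, 0)[0] - viewpoint(0, 0)[0]  # adjacent views
assert np.isclose(spacing, baseline)
```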

Sub-aperture image distortion. From the analyses conducted in this study, it is observed that lenslet light field images contain optical distortions caused by both the main lens (thin lens model) and the micro-lenses (pinhole model). Although the radial distortion of the main lens can be calibrated using conventional methods, the calibration is imperfect, particularly for light rays that have large angular differences from the optical axis. The distortion caused by these rays is called astigmatism [22]. Moreover, because the conventional distortion model is based on a pinhole camera model, the rays that do not pass through the center of the main lens cannot fit the model well. The distortion caused by those rays is called field curvature [22]. Because these are the primary causes of the depth distortion, the two distortions are compensated in the following subsection.

Figure 2. (a) and (b): EPI before and after distortion correction. (c): our compensation process for a pixel (the EPI slope is rotated about a pivot). (d): the slope difference between the two EPIs.

Figure 3. Disparity map before and after distortion correction (Sec. 3.2). A real-world planar scene is captured and the depth map is computed using our approach (Sec. 4).

3.2. Distortion Estimation and Correction

During the capture of a light field image of a planar object, spatially variant epipolar plane image (EPI) slopes (i.e. non-uniform depths) are observed, which result from the distortions mentioned in Sec. 3.1 (see Fig. 3). In addition, the degree of distortion varies for each sub-aperture image. To solve this problem, an energy minimization problem is formulated under a constant depth assumption as follows:

$$\hat{G} = \operatorname*{argmin}_{G} \sum_{\mathbf{x}} \bigl| \theta(I(\mathbf{x})) - \theta_o - G(\mathbf{x}) \bigr|, \qquad (2)$$

where |·| denotes the absolute operator, and θ_o, θ(·), and G(·) denote the slope without distortion, the slope of the EPI, and the amount of distortion at point x, respectively.

The amount of field curvature distortion is estimated for each pixel. An image of a planar checkerboard is captured, and the observed EPI slopes are compared with θ_o (a tilt error might exist if the sensor and calibration plane are not parallel; an optical table is used to avoid this). Points with strong gradients in the EPI are selected, and the difference (θ(·) − θ_o) in Eq. (2) is calculated. Then, the entire field curvature G is fitted to a second-order polynomial surface model.
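The surface fit amounts to a small linear system. The sketch below is illustrative rather than the released implementation: Eq. (2) minimizes an absolute (L1) deviation, whereas this sketch uses ordinary least squares for brevity (an iteratively reweighted loop would better approximate the L1 objective), and the six-coefficient parameterization of the second-order surface is an assumption.

```python
import numpy as np

def fit_field_curvature(x, y, slope_residual):
    """Fit G(x, y) = a*x^2 + b*y^2 + c*x*y + d*x + e*y + g to the EPI
    slope residuals (theta - theta_o) at strong-gradient points.
    Note: Eq. (2) uses an L1 deviation; plain least squares is used
    here only to keep the sketch short."""
    A = np.stack([x**2, y**2, x * y, x, y, np.ones_like(x)], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, slope_residual, rcond=None)
    return coeffs

def eval_field_curvature(coeffs, x, y):
    """Evaluate the fitted slope distortion G at the points (x, y)."""
    A = np.stack([x**2, y**2, x * y, x, y, np.ones_like(x)], axis=1)
    return A @ coeffs
```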

After solving Eq. (2), each point's EPI slope is rotated using Ĝ, with the pixel of the reference view (i.e. the center view) set as the pivot of the rotation (see Fig. 2(c)). However, due to the astigmatism, the field curvature varies according to the slice direction. To account for this, Eq. (2) is solved twice, once each for the horizontal and vertical directions; the correction order does not affect the compensation result. In order to avoid chromatic aberrations, the distortion parameters are estimated for each color channel. Figure 2 and Fig. 3 present the EPI and the estimated depth map before and after the proposed distortion correction, respectively. (It is observed that altering the focal length and zooming parameters affects the correction, which is a limitation of the proposed method; however, it is also observed that the distortion parameters are not scene dependent.)

The proposed method is classified as a low-order approach that targets the astigmatism and field curvature. A more generalized technique for correcting the aberration has been proposed by Ng and Hanrahan [16], and it is currently used in real products [2].

4. Depth Map Estimation

Given the distortion-corrected sub-aperture images, the goal is to estimate accurate dense depth maps. The proposed depth map estimation algorithm is developed using cost-volume-based stereo [20]. In order to manage the narrow baseline between the sub-aperture images, the pipeline is tailored with three significant differences. First, instead of traversing local patches to compute the cost volume, the sub-aperture images are directly shifted using the phase shift theorem and the per-pixel cost volume is computed. Second, in order to effectively aggregate the gradient costs computed from dozens of sub-aperture image pairs, a weight term that considers the horizontal/vertical deviation in the st coordinates between the sub-aperture image pairs is defined. Third, because the small viewpoint changes of sub-aperture images make feature matching more reliable, guidance from confident matching correspondences is also included in the discrete label optimization [11]. The details are described in the following sub-sections.

4.1. Phase Shift based Sub-pixel Displacement

A key contribution of the proposed depth estimation algorithm is matching the narrow-baseline sub-aperture images using sub-pixel displacements. According to the phase shift theorem, if an image I is shifted by ∆x ∈ R², the corresponding phase shift in the 2D Fourier transform is:

$$\mathcal{F}\{I(\mathbf{x} + \Delta\mathbf{x})\} = \mathcal{F}\{I(\mathbf{x})\}\,\exp(2\pi i\,\Delta\mathbf{x}), \qquad (3)$$

where F{·} denotes the discrete 2D Fourier transform. In Eq. (3), multiplying by the exponential term in the frequency domain is equivalent to convolving with a Dirichlet kernel (or periodic sinc) in the spatial domain. According to the Nyquist-Shannon sampling theorem [21], a continuous band-limited signal can be perfectly reconstructed through convolution with a sinc function, and if the centroid of the sinc function deviates from the origin, precisely shifted signals are obtained. In the same manner, Eq. (3) generates a precisely shifted image in the spatial domain if the sub-aperture image is band-limited. Therefore, the sub-pixel shifted image I′(x) is obtained using:

$$I'(\mathbf{x}) = I(\mathbf{x} + \Delta\mathbf{x}) = \mathcal{F}^{-1}\bigl\{\mathcal{F}\{I(\mathbf{x})\}\,\exp(2\pi i\,\Delta\mathbf{x})\bigr\}. \qquad (4)$$
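As an illustration, Eq. (4) reduces to a few lines with an FFT library. The following sketch (NumPy; the function name is illustrative, and the released implementation is in Matlab) applies the phase ramp to the spectrum of a grayscale image, with the sign convention matching NumPy's frequency layout:

```python
import numpy as np

def subpixel_shift(img, dx, dy):
    """Evaluate I(x + dx, y + dy) on the pixel grid via Eq. (4).
    `img` is a 2D grayscale array. The FFT implies a circular boundary,
    so artifacts appear only within a few pixels of the image border."""
    h, w = img.shape
    u = np.fft.fftfreq(w)  # normalized frequencies (cycles/pixel), x
    v = np.fft.fftfreq(h)  # normalized frequencies (cycles/pixel), y
    ramp = np.exp(2j * np.pi * (u[None, :] * dx + v[:, None] * dy))
    return np.real(np.fft.ifft2(np.fft.fft2(img) * ramp))

# Example: the shift used in Fig. 4, applied to a stand-in image.
img = np.random.rand(256, 256)
shifted = subpixel_shift(img, 2.2345, -1.5938)
```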

In practice, the light field image is not always a band-limited signal. This results from the weak pre-filtering used to fit the light field into the sub-aperture image resolution [13, 24]. However, the artifact is not obvious in regions where the texture originates from the source surface in the scene. For example, a sub-aperture image of a resolution chart captured by a Lytro camera is presented in Fig. 4. This image is shifted by ∆x = [2.2345, −1.5938] pixels. Compared with the displacements obtained from bilinear and bicubic interpolation, the sub-pixel shifted image obtained using the phase shift theorem is sharper and does not contain blurriness. Note that an accurate reconstruction of sub-pixel shifted images is significant for accurate depth map estimation, particularly when the baseline is narrow. The effect of the interpolation method on depth accuracy is analyzed in Sec. 5.

Figure 4. An original sub-aperture image shifted with bilinear interpolation, bicubic interpolation, and the phase shift theorem.

In this implementation, the fast Fourier transform with a circular boundary condition is used to manage the non-infinite signals. Because the proposed algorithm shifts the entire sub-aperture image instead of local patches, the artifacts that result from the periodicity only appear at the boundary of the image within a width of a few pixels (less than two), which is negligible.

4.2. Building the Cost Volume

In order to match the sub-aperture images, two complementary costs are used: the sum of absolute differences (SAD) and the sum of gradient differences (GRAD). The cost volume C is defined as a function of x and cost label l:

$$C(\mathbf{x}, l) = \alpha\, C_A(\mathbf{x}, l) + (1-\alpha)\, C_G(\mathbf{x}, l), \qquad (5)$$

where α ∈ [0, 1] adjusts the relative importance between the SAD cost C_A and the GRAD cost C_G. C_A is defined as

$$C_A(\mathbf{x}, l) = \sum_{\mathbf{s}\in V} \sum_{\mathbf{x}\in R_{\mathbf{x}}} \min\bigl(\,|I(\mathbf{s}_c, \mathbf{x}) - I(\mathbf{s}, \mathbf{x}+\Delta\mathbf{x}(\mathbf{s},l))|,\; \tau_1\bigr), \qquad (6)$$

where R_x is a small rectangular region centered at x, τ_1 is the truncation value of a robust function, and V contains the st coordinates s, except for the center view s_c. Equation (3) is used for the precise sub-pixel shifting of the images.

Equation (6) builds a matching cost through comparing the center sub-aperture image I(s_c, x) with the other sub-aperture images I(s, x) to generate a disparity map from a canonical viewpoint. The 2D shift vector ∆x in Eq. (6) is defined as

$$\Delta\mathbf{x}(\mathbf{s}, l) = l\,k\,(\mathbf{s} - \mathbf{s}_c), \qquad (7)$$

where k is the unit of the label in pixels; ∆x increases linearly with the angular deviation from the center viewpoint. The other cost volume, C_G, is defined as

$$C_G(\mathbf{x}, l) = \sum_{\mathbf{s}\in V} \sum_{\mathbf{x}\in R_{\mathbf{x}}} \Bigl[\, \beta(\mathbf{s})\,\min\bigl(\mathrm{Diff}_x(\mathbf{s}_c, \mathbf{s}, \mathbf{x}, l),\, \tau_2\bigr) + \bigl(1-\beta(\mathbf{s})\bigr)\,\min\bigl(\mathrm{Diff}_y(\mathbf{s}_c, \mathbf{s}, \mathbf{x}, l),\, \tau_2\bigr) \Bigr], \qquad (8)$$

where Diff_x(s_c, s, x, l) = |I_x(s_c, x) − I_x(s, x + ∆x(s, l))| denotes the difference between the x-directional gradients of the sub-aperture images, and Diff_y is defined similarly on the y-directional gradients. τ_2 is a truncation constant that suppresses outliers. β(s) in Eq. (8) controls the relative importance of the two directional gradient differences based on the relative st coordinates:

$$\beta(\mathbf{s}) = \frac{|s - s_c|}{|s - s_c| + |t - t_c|}. \qquad (9)$$

According to Eq. (9), β increases as the target view s lies along the horizontal extent of the center view s_c; in that case, only the gradient costs in the x direction are aggregated into C_G. Note that β is independent of the scene because it is determined purely by the relative position between s and s_c.

As a sequential step, every cost slice is refined using an edge-preserving filter [15] to alleviate coarsely scattered unreliable matches. Here, the central sub-aperture image is used to determine the weights for the filter; they are computed from the Euclidean distances between the RGB values of pixel pairs in the filter window, which preserves the discontinuities in the cost slices. From the refined cost volume C′, a disparity map l_a is determined using the winner-takes-all strategy. As depicted in Figs. 5(b) and (c), the noisy background disparities are substituted with the majority value (almost zero in this example) of the background disparity. For each pixel, if the variance over the cost slices is smaller than a threshold τ_reject, the pixel is regarded as an outlier because it does not have a distinctive minimum; the red pixels in Fig. 5(c) indicate these outlier pixels.
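A direct, unoptimized transcription of Eqs. (5)-(9) might look as follows. This sketch is illustrative, not the released implementation: it computes the per-pixel costs only, omitting the aggregation over the window R_x and the subsequent edge-preserving filtering, and it assumes grayscale sub-aperture images indexed by their st offset from the center view (`subpixel_shift` is the routine sketched in Sec. 4.1).

```python
import numpy as np

def grad_x(img):  # x-directional gradient (forward difference)
    return np.diff(img, axis=1, append=img[:, -1:])

def grad_y(img):  # y-directional gradient (forward difference)
    return np.diff(img, axis=0, append=img[-1:, :])

def cost_volume(views, center, k, labels, alpha=0.3, tau1=1.0, tau2=3.0):
    """Per-pixel costs of Eqs. (5)-(9). `views` maps the angular offset
    (s - s_c, t - t_c) of each non-center view to its grayscale image;
    the center view itself, offset (0, 0), must not be included."""
    C = np.zeros((len(labels),) + center.shape)
    Ix_c, Iy_c = grad_x(center), grad_y(center)
    for i, l in enumerate(labels):
        for (ds, dt), img in views.items():
            beta = abs(ds) / (abs(ds) + abs(dt))                   # Eq. (9)
            shifted = subpixel_shift(img, l * k * ds, l * k * dt)  # Eq. (7)
            c_a = np.minimum(np.abs(center - shifted), tau1)       # Eq. (6)
            c_g = (beta * np.minimum(np.abs(Ix_c - grad_x(shifted)), tau2)
                   + (1 - beta)
                   * np.minimum(np.abs(Iy_c - grad_y(shifted)), tau2))  # Eq. (8)
            C[i] += alpha * c_a + (1 - alpha) * c_g                # Eq. (5)
    l_a = np.argmin(C, axis=0)  # winner-takes-all labels
    return C, l_a
```

In the full pipeline, each slice of C would additionally be filtered with the edge-preserving weighted median before the winner-takes-all step.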

Figure 5. Estimated disparity maps at different steps of our algorithm. (a) The center-view sub-aperture image. (b)-(e) Disparity maps (b) based on the initial cost volume (winner-takes-all strategy), (c) after weighted median filter refinement (the red pixels indicate detected outlier pixels), (d) after the multi-label optimization, and (e) after the iterative refinement. The processes in (b) and (c) are described in Sec. 4.2, and those in (d) and (e) in Sec. 4.3.

4.3. Disparity Optimization and Enhancement

The disparity map from the previous step is enhanced through discrete optimization and iterative quadratic fitting.

Confident matching correspondences. Besides the cost volume, correspondences are also matched at salient feature points as strong guides for the multi-label optimization. In particular, local feature matching is conducted between the center sub-aperture image and the other sub-aperture images; the SIFT algorithm [14] is used for feature extraction and matching. From a pair of matched feature positions, the positional deviation ∆f ∈ R² in the xy coordinates is computed. If the amount of deviation ‖∆f‖ exceeds the maximum disparity range of the light field camera, the pair is rejected as an outlier. For each pair of matched positions, given s, s_c, ∆f, and k, the over-determined linear equation ∆f = lk(s − s_c) is solved for l; this is based on the linear relationship depicted in Eq. (7). Because a feature point in the center view is matched with those of multiple images, it has several candidate disparities; their median value is therefore taken and used as the sparse and confident disparity l_c.
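Stacking the two components of ∆f = lk(s − s_c) gives an over-determined system whose least-squares solution yields one disparity candidate per view; the median over views gives l_c. The following sketch assumes the SIFT matches are already available as pairs of a deviation vector and a view index (the names and the per-feature interface are illustrative, not from the released code):

```python
import numpy as np

def confident_disparity(matches, s_c, k, max_disp):
    """Median disparity candidate for one center-view feature point.
    `matches` is a list of (df, s) pairs, where df is the 2D deviation
    of the matched feature position in view s (SIFT matching is assumed
    to have been run already). Returns None if every pair is rejected."""
    candidates = []
    for df, s in matches:
        df = np.asarray(df, dtype=float)
        if np.linalg.norm(df) > max_disp:        # outlier rejection
            continue
        a = k * (np.asarray(s, dtype=float) - np.asarray(s_c, dtype=float))
        candidates.append(float(a @ df) / float(a @ a))  # LS solve of df = l*a
    return float(np.median(candidates)) if candidates else None
```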

Multi-label optimization. Multi-label optimization is performed using graph cuts [11] to propagate and correct the disparities using the neighboring estimates. The optimal disparity map is obtained through minimizing

$$l_r = \operatorname*{argmin}_{l}\; \sum_{\mathbf{x}} C'\bigl(\mathbf{x}, l(\mathbf{x})\bigr) + \lambda_1 \sum_{\mathbf{x}\in I} \bigl\|l(\mathbf{x}) - l_a(\mathbf{x})\bigr\| + \lambda_2 \sum_{\mathbf{x}\in M} \bigl\|l(\mathbf{x}) - l_c(\mathbf{x})\bigr\| + \lambda_3 \sum_{\mathbf{x}} \sum_{\mathbf{x}'\in N_{\mathbf{x}}} \bigl\|l(\mathbf{x}) - l(\mathbf{x}')\bigr\|, \qquad (10)$$

where I contains the inlier pixels determined in the previous step (Sec. 4.2), and M denotes the pixels that have confident matching correspondences. Equation (10) has four terms: matching cost reliability (C′(x, l(x))), data fidelity (‖l(x) − l_a(x)‖), confident matching cost (‖l(x) − l_c(x)‖), and local smoothness (‖l(x) − l(x′)‖). Figure 5(d) presents a corrected depth map after the discrete optimization. Note that even without the confident matching cost, the proposed approach estimates a reliable disparity map; the confident matching cost further enhances the estimated disparity at regions with salient matches.

Iterative refinement. The last step refines the discrete disparity map produced by the multi-label optimization into a continuous disparity with sharp gradients at the depth discontinuities. The method presented by Yang et al. [28] is adopted. A new cost volume Ĉ filled with ones is computed; then, for every x, Ĉ(x, l_r(x)) is set to 0, followed by weighted median filtering [15] of the cost slices. Finally, a non-discrete disparity l* is obtained via

$$l^{*} = l_r - \frac{C(l_{+}) - C(l_{-})}{2\bigl(C(l_{+}) + C(l_{-}) - 2C(l_r)\bigr)}, \qquad (11)$$

where l_+ (= l_r + 1) and l_− (= l_r − 1) are the adjacent cost slices of l_r, and x is omitted for simplicity. l* is the disparity with the minimum cost, derived from the least-squares quadratic fit over the three costs C(l_r), C(l_+), and C(l_−). Using the refined disparity, the overall procedure is applied again for better results; four iterations were found to be sufficient for appropriate results. Figure 6 presents a discrete disparity map obtained from Eq. (10) and a continuous disparity map after the refinement.
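Per pixel, Eq. (11) places the sub-label minimum at the vertex of the parabola through the costs at l_r − 1, l_r, and l_r + 1. A vectorized sketch (illustrative names; C here stands for the rebuilt and weighted-median-filtered cost volume Ĉ described above):

```python
import numpy as np

def sublabel_refine(C, l_r):
    """Vectorized Eq. (11). C has shape (num_labels, h, w); l_r is the
    integer label map produced by the graph-cut step."""
    l = np.clip(l_r, 1, C.shape[0] - 2)          # need both neighbors
    rows, cols = np.indices(l.shape)
    c0 = C[l, rows, cols]
    c_minus = C[l - 1, rows, cols]
    c_plus = C[l + 1, rows, cols]
    denom = c_plus + c_minus - 2.0 * c0
    offset = np.zeros_like(denom)
    ok = np.abs(denom) > 1e-12                   # guard flat parabolas
    offset[ok] = (c_plus - c_minus)[ok] / (2.0 * denom[ok])
    return l - offset                            # non-discrete disparity l*
```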

Figure 6. The effectiveness of the iterative refinement step described in Sec. 4.3: the graph-cuts and refined disparity maps of the central view, and views synthesized using each depth map.

5. Experimental Results

The performance of the proposed algorithm was evaluated using synthetic and real-world datasets. The 4D Light Fields benchmark dataset [27] was used for the synthetic evaluation. For the real-world experiments, images captured using three lenslet-based light field cameras were used: Lytro, Raytrix, and a lab-made light field camera.

The proposed algorithm required six minutes for the Lytro images and 25 minutes for the synthetic dataset. Among all computation steps, building the cost volume (Sec. 4.2) was the most time-consuming. The proposed algorithm is implemented in Matlab, but a significant speed-up is expected if this step is parallelized on a GPU. A machine equipped with an Intel i7 3.40 GHz CPU and 16 GB RAM was used for the computations. For the evaluation, the variables were empirically selected as α = 0.3, τ_1 = 1, τ_2 = 3, τ_reject = 5, λ_1 = 0.5, λ_2 = 10, and λ_3 = 1. Note that k varies according to the dataset; this is described individually. The source code and dataset are released on the project webpage [3].

5.1. Synthetic Dataset Results

For the quantitative evaluation, the proposed method was compared with three advanced algorithms: active wavefront sampling based matching (AWS) [9], globally consistent depth labeling (GCDL) [26], and Robust PCA [8]. The proposed algorithm was also evaluated with different sub-pixel shift methods (bilinear, bicubic, and phase shift, as discussed in Sec. 4.1) while keeping the other parameters consistent.

The benchmark dataset [27] used for validation is composed of sub-aperture images with a 9 × 9 angular resolution and 0.5∼0.7 megapixels. The disparity between two adjacent sub-aperture images in the st domain is smaller than 3 pixels; for this dataset, k = 0.03 was used. As suggested by Heber et al. [8], the relative depth error, which denotes the percentage of pixels whose disparity error is larger than 0.2%, was used. The author-provided implementation of GCDL was used, and parameter sweeps were conducted in order to achieve the best results. As the source codes of AWS and Robust PCA are not available, the error values reported in [8] are used.
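For reproducibility, the metric can be written as below; note that normalizing the error by the ground-truth disparity range is our reading of the metric in [8] and should be treated as an assumption:

```python
import numpy as np

def relative_depth_error(est, gt, threshold=0.002):
    """Percentage of pixels whose disparity error exceeds 0.2% of the
    ground-truth range (our reading of the metric in [8])."""
    rel_err = np.abs(est - gt) / (gt.max() - gt.min())
    return 100.0 * np.mean(rel_err > threshold)
```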

Figure 7 presents the relative depth errors, tabulated below (percentage of erroneous pixels; lower is better):

                      Buddha  Buddha2  Mona   Papillon  Still life  Horses  Medieval
GCDL [26]              7.28    26.55   15.08    7.13       4.51      16.44    21.76
AWS [9]                8.37    15.05   12.9     8.79       6.33      16.83    11.09
Robust PCA [8]         5.03    11.52   12.75    8          4.2       11.78    11.09
LAGC [29]             18.76    41.26   27.67   23.32      17.16      47.1     42.39
Only Matching          6.64    17.58   18.91   26.16       3.19      11.42    24.94
Bilinear               5.37    16.2    11.3     7.38       2.89      11.18     9.03
Bicubic                5.33    15.35    9.02    7.4        2.35       6.65     8.73
Ours w/o IterRefine    5.26    12.48   11.26   11.31       3.38       7.59     6.52
Ours                   4.69     9.88    8.91    6.06       2.27       6.22     6.38

Figure 7. Relative depth measures on the results of GCDL [26], AWS [9], Robust PCA [8], and our approach on the synthetic light fields benchmark [27]. The values indicate the percentage of erroneous pixels exceeding a relative depth error of 0.2%. The error values of AWS and Robust PCA are from [8].

The proposed method is compared with the other approaches, and it provides an accurate depth map on all seven datasets. Among the sub-pixel shift methods, the proposed phase-shift-based approach exhibits the best results, which supports the importance of accurate sub-pixel shifting. Figure 8 presents a qualitative comparison of the proposed approach with GCDL. At the depth boundaries and in homogeneous regions, the results of the proposed method do not have holes and exhibit reliable accuracy.

Figure 8. Zoom-up images of the Buddha2 dataset [27] (GT, GCDL, bilinear, bicubic, ours w/o IterRefine, and ours). The error maps correspond to a relative depth error of less than 1%.

The λ values in Eq. (10) were also altered in order to demonstrate their relative importance. The term that most influences the accuracy is the fourth term, which adjusts the local smoothness. After λ_3 was set to a reasonable value, λ_2 was altered in order to verify the confident matching correspondences (CMCs). Although the improvement is relatively small (from 9.32 to 8.91 on the Mona dataset), the third term assists in preserving fine structures, as depicted in Fig. 9.

Figure 9. Evaluation of the role of β (Sec. 4.2) and of the confident matching correspondences (CMCs) (Sec. 4.3), comparing graph cuts alone, with CMCs, with β, and with CMCs + β.

5.2. Light-Field Camera Results

Lytro camera. Figure 10 presents a qualitative comparison of the proposed approach with GCDL, the line-assisted graph cut (LAGC) [29], and the combinational approach of defocus and correspondence (CADC) [23] on two real-world Lytro images: a globe and a dinosaur. GCDL computes the elevation angles of lines in the epipolar plane images using structure tensors; on challenging Lytro images, it may produce noisy depths even if the smoothness value is increased in the optimization. LAGC utilizes matched lines between the sub-aperture images; its output is also noisy because the low-quality sub-aperture images hinder accurate line matching.

Figure 10. Qualitative comparison on the Lytro images (center view, GCDL [26], LAGC [29], CADC [23], and ours).

Although Tao et al. [23] presented reasonable results through combining the defocus and correspondence cues, the correspondence cue is not robust to noisy and textureless regions, and it fails to label the depth clearly. The CADC exhibited reasonable results with the aid of the defocus and correspondence cues, but these results also contain some holes and noisy disparities because its correspondence cue is not reliable in homogeneous regions. Because the proposed depth estimation method collects matching costs using robust clipping functions, it can tolerate significant outliers. In addition, the calculation of the exact sub-pixel shift using the phase shift theorem improves the matching quality, as demonstrated in the synthetic experiments. The Lytro software [1] also provides a depth map, as presented in Fig. 1; however, its quality is coarser than that obtained using the proposed method.

The proposed method was also verified using a structured-light-based 3D scanner. A 16-bit gray code was emitted onto the scene, and the disparity was computed through matching the observed code with the reference pattern. Figure 11 compares the two depths from the scanner and from the proposed approach. Except for the occluded regions, the depth from the proposed approach exhibits less than 0.2 pixels of absolute error. Using the geometric parameters acquired from [5], accurate depths are generated that can be used for view synthesis.

Figure 11. Evaluation of the estimated depth map using a structured-light-based 3D scanner (center view, emitted pattern, ours, synthesized view, depth from structured light, and absolute error in pixels).

Raytrix camera. A public Raytrix dataset [26] was used in this experiment. It has a 9 × 9 angular resolution and a spatial resolution of 992 × 628 pixels. Its disparity between adjacent sub-aperture images is less than 3 pixels, which is larger than that of Lytro, so k = 0.05 was set for a larger step size. Comparisons with GCDL, LAGC, and the built-in Raytrix algorithm [18] are presented in Fig. 12; the results of the built-in Raytrix algorithm were obtained from [25].

Figure 12. Comparisons of different disparity estimation techniques on Raytrix images (raw image with zoom-up, reference view, built-in [18], GCDL [26], LAGC [29], and ours).

Because the built-in Raytrix algorithm depends only on a standard SAD cost volume, it fails to preserve the disparity edges. GCDL exhibited a more reliable disparity on the Raytrix images than on the Lytro images. LAGC also exhibited well-preserved disparity edges, but it suffered from quantization errors, as seen in Fig. 12. In contrast, the proposed method produced a higher-quality depth map that does not contain staircase artifacts (see Fig. 6).

Our simple lens camera. Inspired by [24], we constructed our own lab-made light field camera and tested our algorithm.
A commercial mirrorless camera (Samsung NX1000) was modified through removing the cover glass on its CCD sensor and affixing a lenslet array. Each lenslet in the array has a diameter of 52 µm, an inter-lenslet distance of 79 µm, and a focal length of 233 µm. In order to demonstrate its applicability to smaller devices, the lab-made camera has only a single main lens, with a focal length of 50 mm and an F-number of 4. The camera was calibrated using an open geometric calibration toolbox [5]. The sub-aperture images have an 11 × 11 angular resolution and a spatial resolution of 392 × 256 pixels; the disparity range is smaller than 1 pixel.

The lab-made lenslet light field camera suffers from severe distortion, which negatively affects the depth map quality, as seen in Fig. 13. With the proposed distortion correction step, the proposed depth map algorithm can locate accurate correspondences.

Figure 13. Result of our method with and without distortion correction. The input images for these results were captured by our simple lens light field camera.

6. Conclusion

A novel method of sub-pixel-wise disparity estimation was proposed for light field images captured using several representative hand-held light field cameras. The significant challenges of estimating disparity with very narrow baselines were discussed, and the proposed method was shown to be effective through its use of sub-pixel shifts in the frequency domain. The adaptive aggregation of the gradient costs and the confident matching correspondences further enhanced the depth map accuracy. The effectiveness of the proposed method was verified on various synthetic and real-world datasets, where it outperformed three existing advanced methods.

Acknowledgement

This work was supported by the Study on Imaging Systems for the Next Generation Cameras funded by Samsung Electronics Co., Ltd. (DMC R&D center) (IO130806-00717), and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2010-0028680).

References

[1] The Lytro camera. http://www.lytro.com/.
[2] Lytro Illum features. http://blog.lytro.com/post/89103476855/lens-design-of-lytro-illum-turning-the-camera.
[3] Project webpage of this paper. https://sites.google.com/site/hgjeoncv/home/depthfromlf_cvpr15.
[4] T. E. Bishop and P. Favaro. The light field camera: Extended depth of field, aliasing, and superresolution. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 34(5):972–986, 2012.
[5] Y. Bok, H.-G. Jeon, and I. S. Kweon. Geometric calibration of micro-lens-based light-field cameras. In Proceedings of European Conference on Computer Vision (ECCV), 2014.
[6] C. Chen, H. Lin, Z. Yu, S. B. Kang, and J. Yu. Light field stereo matching using bilateral statistics of surface cameras. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[7] T. Georgiev and A. Lumsdaine. Reducing plenoptic camera artifacts. Computer Graphics Forum, 29(6):1955–1968, 2010.
[8] S. Heber and T. Pock. Shape from light field meets robust PCA. In Proceedings of European Conference on Computer Vision (ECCV), 2014.
[9] S. Heber, R. Ranftl, and T. Pock. Variational shape from light field. In Proceedings of Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 66–79, 2013.
[10] C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross. Scene reconstruction from high spatio-angular resolution light fields. ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH), 32(4):73:1–73:12, 2013.
[11] V. Kolmogorov and R. Zabih. Multi-camera scene reconstruction via graph cuts. In Proceedings of European Conference on Computer Vision (ECCV), 2002.
[12] C.-K. Liang, T.-H. Lin, B.-Y. Wong, C. Liu, and H. H. Chen. Programmable aperture photography: multiplexed light field acquisition. ACM Transactions on Graphics (TOG), 27(3):55, 2008.
[13] C.-K. Liang and R. Ramamoorthi. A light transport framework for lenslet light field cameras. ACM Transactions on Graphics (TOG), 34(16):16:1–16:19, 2015.
[14] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal on Computer Vision (IJCV), 60(2):91–110, 2004.
[15] Z. Ma, K. He, Y. Wei, J. Sun, and E. Wu. Constant time weighted median filtering for stereo matching and beyond. In Proceedings of International Conference on Computer Vision (ICCV), 2013.
[16] R. Ng and P. Hanrahan. Digital correction of lens aberrations in light field photography. In Proc. SPIE 6342, International Optical Design Conference, 2006.
[17] R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan. Light field photography with a hand-held plenoptic camera. Computer Science Technical Report CSTR, 2(11), 2005.
[18] C. Perwaß and L. Wietzke. Single lens 3D-camera with extended depth-of-field. In IS&T/SPIE Electronic Imaging, 2012.
[19] Raytrix. 3D light field camera technology. http://www.raytrix.de/.
[20] C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz. Fast cost-volume filtering for visual correspondence and beyond. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[21] C. E. Shannon. Communication in the presence of noise. Proceedings of the IEEE, 86(2):447–457, 1998.
[22] H. Tang and K. N. Kutulakos. What does an aberrated photo tell us about the lens and the scene? In Proceedings of International Conference on Computational Photography (ICCP), 2013.
[23] M. W. Tao, S. Hadap, J. Malik, and R. Ramamoorthi. Depth from combining defocus and correspondence using light-field cameras. In Proceedings of International Conference on Computer Vision (ICCV), 2013.
[24] K. Venkataraman, D. Lelescu, J. Duparré, A. McMahon, G. Molina, P. Chatterjee, R. Mullis, and S. Nayar. PiCam: An ultra-thin high performance monolithic camera array. ACM Transactions on Graphics (TOG), 32(6):166:1–166:13, 2013.
[25] S. Wanner and B. Goldluecke. Globally consistent depth labeling of 4D light fields. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[26] S. Wanner and B. Goldluecke. Variational light field analysis for disparity estimation and super-resolution. IEEE Trans. Pattern Anal. Mach. Intell. (PAMI), 2013.
[27] S. Wanner, S. Meister, and B. Goldluecke. Datasets and benchmarks for densely sampled 4D light fields. In Proceedings of Vision, Modelling and Visualization (VMV), 2013.
[28] Q. Yang, R. Yang, J. Davis, and D. Nistér. Spatial-depth super resolution for range images. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
[29] Z. Yu, X. Guo, H. Ling, A. Lumsdaine, and J. Yu. Line assisted light field triangulation and stereo matching. In Proceedings of International Conference on Computer Vision (ICCV), 2013.