
The “Vertigo Effect” on Your Smartphone: Dolly Zoom via Single View Synthesis
Supplementary Material

Yangwen Liang, Rohit Ranade, Shuangquan Wang, Dongwoon Bai, Jungwon Lee
Samsung Semiconductor Inc.
{liang.yw, rohit.r7, shuangquan.w, dongwoon.bai, jungwon2.lee}@samsung.com

1. View Synthesis based on Camera Geometry

Recall from our paper: consider two pin-hole cameras A and B with camera centers at locations C^A and C^B, respectively. From [2], based on the coordinate system of camera A, the projections of the same point P ∈ R^3 onto the two camera image planes have the following closed-form relationship:

\begin{bmatrix} u^B \\ 1 \end{bmatrix} = \frac{D^A}{D^B} K^B R^\top \left(K^A\right)^{-1} \begin{bmatrix} u^A \\ 1 \end{bmatrix} + \frac{K^B T}{D^B} ,   (1)

where T can also be written as

T = R^\top \left( C^A - C^B \right) .   (2)

Here, the 2 × 1 vector u^X, the 3 × 3 matrix K^X, and the scalar D^X are the pixel coordinates on the image plane, the intrinsic parameters, and the depth of P for camera X, X ∈ {A, B}, respectively. The 3 × 3 matrix R and the 3 × 1 vector T are the relative rotation and translation of camera B with respect to camera A.
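For concreteness, Eq. 1 can be evaluated directly. Below is a minimal NumPy sketch, not the implementation behind the paper; the function name and argument conventions are our own, and a real pipeline would reproject whole depth maps rather than one pixel at a time.

```python
import numpy as np

def reproject_point(u_A, D_A, K_A, K_B, R, C_A, C_B):
    """Map a pixel seen by camera A into camera B via Eq. 1.

    u_A      : (2,) pixel coordinates in camera A
    D_A      : depth of the 3D point P in camera A's frame
    K_A, K_B : (3, 3) intrinsic matrices
    R        : (3, 3) relative rotation of camera B w.r.t. camera A
    C_A, C_B : (3,) camera centers in camera A's coordinate system
    Returns (u_B, D_B), the pixel coordinates and depth in camera B.
    """
    T = R.T @ (C_A - C_B)                          # Eq. 2
    u_A_h = np.array([u_A[0], u_A[1], 1.0])        # homogeneous pixel
    # Eq. 1 multiplied through by D_B, so the third component is D_B.
    x = D_A * (K_B @ R.T @ np.linalg.inv(K_A) @ u_A_h) + K_B @ T
    D_B = x[2]
    return x[:2] / D_B, D_B
```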

1.1. Generalized Single Camera System

Under the dolly zoom condition, a generalized formula for camera movements in the horizontal and/or vertical directions along with the principal axis can be derived using Eq. 1. Let the camera center difference for camera 1 between positions C_1^A and C_1^B be

C_1^A - C_1^B = \left[ -m_1, -n_1, -t_1 \right]^\top ,   (3)

so that the camera moves by m_1, n_1 and t_1 in the horizontal, vertical and principal axis directions, respectively. We assume there is no relative rotation during the camera translation. This assumption is valid because we are creating a synthetic image at camera location C_1^B from the image captured at location C_1^A. As our paper shows, the intrinsic matrix K_1^B at camera center C_1^B is

K_1^B = K_1^A \,\mathrm{diag}\{k, k, 1\} ,   (4)

where k is the same as in the paper, chosen such that the subject size on the focus plane remains the same, i.e.,

k = \frac{f_1^B}{f_1^A} = \frac{D_0 - t_1}{D_0} .   (5)

From Eq. 1, we can then obtain the closed-form solution for u_1^B in terms of u_1^A as

u_1^B = \frac{D_1 (D_0 - t_1)}{D_0 (D_1 - t_1)} u_1^A + \frac{t_1 (D_1 - D_0)}{D_0 (D_1 - t_1)} u_0 - \frac{(D_0 - t_1) f_1^A}{D_0 (D_1 - t_1)} \begin{bmatrix} m_1 \\ n_1 \end{bmatrix} ,   (6)

where u_0 denotes the principal point, as in the main paper. By setting m_1 = n_1 = 0 and t_1 = t, Eq. 6 reduces to Eq. (3) in our paper.

1.2. Generalized Dual Camera System

Similarly, a generalized formula for camera movements can be derived using Eq. 1 for this case as well. Here, we assume the dual camera system is well calibrated and that there is no relative rotation between the cameras at locations C_2 and C_1^A (and C_1^B). As in Section 1.1, let the camera center difference between camera 2 at location C_2 and camera 1 at C_1^B be

C_2 - C_1^B = \left[ -m_2, -n_2, -t_2 \right]^\top ,   (7)

so that the camera moves by m_2, n_2 and t_2 in the horizontal, vertical and principal axis directions, respectively. The baseline b is assumed to be included in m_2 and/or n_2. As in the paper, the intrinsic matrix K_2 of camera 2 can be related to that of camera 1 at position C_1^A as

K_2 = K_1^A \,\mathrm{diag}\{k', k', 1\} ,   (8)

where the factor k' is given by

k' = \frac{f_2}{f_1^A} = \frac{\tan(\theta_1 / 2)}{\tan(\theta_2 / 2)} ,   (9)

with θ_1 and θ_2 the fields of view of cameras 1 and 2, as in the main paper. From Eq. 1, we can obtain the closed-form solution for u_1^B in terms of u_2 as

u_1^B = \frac{D_2 k}{(D_2 - t_2) k'} \left( u_2 - u_0 \right) + u_0 + \frac{f_1^A k}{D_2 - t_2} \begin{bmatrix} m_2 \\ n_2 \end{bmatrix} .   (10)

By setting m_2 = b, n_2 = 0 and t_2 = t, Eq. 10 reduces to Eq. (6) in our paper.
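Eqs. 6 and 10 are per-pixel affine maps once the scene scalars are fixed, which makes them cheap to apply on-device. The sketch below transcribes both closed forms literally; it is illustrative only, with hypothetical function names, scalar depths D1 and D2 standing in for per-pixel depth map values, and u0 the principal point.

```python
import numpy as np

def warp_single_camera(u1_A, D1, D0, t1, m1, n1, f1_A, u0):
    """Eq. 6: map a pixel u1_A of the captured view at C1_A to the
    synthetic view at C1_B, for a point at depth D1 with focus depth D0
    and camera motion (m1, n1, t1)."""
    denom = D0 * (D1 - t1)
    return (D1 * (D0 - t1) / denom * u1_A
            + t1 * (D1 - D0) / denom * u0
            - (D0 - t1) * f1_A / denom * np.array([m1, n1]))

def warp_dual_camera(u2, D2, t2, m2, n2, f1_A, k, k_prime, u0):
    """Eq. 10: map a pixel u2 of camera 2 to the synthetic view at C1_B,
    where k and k_prime are the zoom factors of Eqs. 5 and 9."""
    return (D2 * k / ((D2 - t2) * k_prime) * (u2 - u0)
            + u0
            + f1_A * k / (D2 - t2) * np.array([m2, n2]))
```

Setting m1 = n1 = 0 (or m2 = b, n2 = 0) recovers the pure dolly zoom special cases quoted from the main paper.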

1.3. Shallow Depth of Field (SDoF)

Since the focus of this paper is not shallow depth of field (SDoF) rendering, any acceptable synthetic SDoF method for mobile devices can be applied, cf. e.g. [4, 3, 1, 6]. However, to apply a synthetic SDoF effect to each dolly zoom frame, the size of the circle of confusion (CoC) c of the blur kernel needs to be determined. Unlike synthetic SDoF effects for still image capture, cf. e.g. [1], the blur strength varies not only with depth but also with the dolly zoom translation parameter t.

Assuming a thin lens camera model [2], the relation between c, the lens aperture A, the magnification factor m, the depth D_0 of the object under focus, and the depth D of another object can be given as [5]:

c = A m \frac{|D - D_0|}{D} ,   (11)

where the magnification factor m is defined as

m = \frac{f}{D_0 - f} .   (12)

Eqs. 11 and 12 hold when no zooming is applied, i.e., the focal length f of the thin lens is fixed [5]. Under the dolly zoom condition introduced in Section 2 of our paper, the focal length changes with the movement t along the principal axis. Here, we denote the focal length with respect to t as f(t). The relationship between f(t) and t is as shown in Section 2.1 of the paper, i.e.,

f(t) = f(0) \frac{D_0 - t}{D_0} .   (13)

Accordingly, the magnification factor m(t) with respect to t can be obtained as

m(t) = \frac{f(t)}{(D_0 - t) - f(t)} .   (14)

By substituting Eq. 13 into Eq. 14, the common factor (D_0 - t) cancels and we obtain

m(t) = \frac{f(0)(D_0 - t)/D_0}{(D_0 - t)\left(1 - f(0)/D_0\right)} = \frac{f(0)}{D_0 - f(0)} = m(0) = m .   (15)

Eq. 15 aligns perfectly with the pinhole camera model under dolly zoom, i.e., the magnification factor for subjects in focus is fixed. Also, the relative depth |D - D_0| between subjects within the scene remains constant for single image capture. Assuming the lens aperture A remains the same, we can obtain the CoC size as

c(t) = A m \frac{|D - D_0|}{D - t} = c(0) \frac{D}{D - t} ,   (16)

where c(t) is the CoC diameter for an object at depth D under camera translation t along the principal axis. As we can observe, the initial CoC c(0) is not of concern in our derivation, and any flavor of initial CoC size or shape can be applied, cf. e.g. [3, 6]. Eq. 16 provides the CoC size update needed to mimic the real dolly zoom effect.
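To make the CoC bookkeeping concrete, here is a minimal sketch of the update in Eq. 16, assuming a per-pixel depth map, a known focus depth D0, and t < D for every pixel (the camera does not pass any object); the names are our own, and the actual blur rendering is left to whichever SDoF method is chosen [4, 3, 1, 6].

```python
import numpy as np

def coc_map(depth, D0, aperture, m, t):
    """Eq. 16, first form: per-pixel CoC diameter after translating the
    camera by t along the principal axis. By Eqs. 13 and 15 the
    magnification m is constant under dolly zoom, so it enters as a
    fixed scalar. Assumes t < depth everywhere."""
    return aperture * m * np.abs(depth - D0) / (depth - t)

def update_coc(c0, depth, t):
    """Eq. 16, second form: rescale an initial CoC map c(0), of any
    flavor or shape [3, 6], by D / (D - t)."""
    return c0 * depth / (depth - t)
```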
2. Experiment Results

2.1. Supplementary Videos

We have included supplementary videos showing the results of our synthesis pipeline described in Section 2.6 of the main paper. In the dual camera synthesis case, the video shows the input images (I1 and I2), their depth maps (D1 and D2), the synthesized views and the corresponding ground truth for successively greater dolly zoom angles (without and with the SDoF effect), compiled into a video sequence. Similarly, the single camera synthesis case shows the input image (I) and its depth map (D), the generation of the two images (I1 and I2) and their depth maps (D1 and D2) (formed from image I and depth D of each image set as described in Section 2.6, Steps 2–3 of the main paper), and the results of our synthesis pipeline for successively greater dolly zoom angles. This case also shows the result of our pipeline applied to an image set from the smartphone dataset (without and with the SDoF effect).

2.2. Qualitative Results

Figures 1, 2 and 3 show the results of our method applied to images from the smartphone dataset. In each example, (a) and (b) are the input images I1 and I2 (formed from image I of each image set as described in Section 2.6, Steps 2–3 of the main paper) and (c) is the input image I1 with the SDoF effect applied. We then apply the single camera single shot view synthesis pipeline described in Section 2.6 (Steps 4–8) of the main paper to generate synthesized images for successively greater dolly zoom angles ((d)–(i)).

References

[1] Jonathan T. Barron, Andrew Adams, YiChang Shih, and Carlos Hernández. Fast bilateral-space stereo for synthetic defocus. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.

[2] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2004.

[3] David E. Jacobs, Jongmin Baek, and Marc Levoy. Focal stack compositing for depth of field control. Stanford Computer Graphics Laboratory Technical Report 2012-1, October 2012.

[4] M. Kraus and M. Strengert. Depth-of-field rendering by pyramidal image processing. Computer Graphics Forum, 26(3), 2007.

[5] Richard Szeliski. Computer Vision: Algorithms and Applications. Springer, 2011.

[6] Neal Wadhwa, Rahul Garg, David E. Jacobs, Bryan E. Feldman, Nori Kanazawa, Robert Carroll, Yair Movshovitz-Attias, Jonathan T. Barron, Yael Pritch, and Marc Levoy. Synthetic depth-of-field with a single-camera mobile phone. ACM Transactions on Graphics (TOG), 37(4):64:1–64:13, July 2018.


Figure 1: Single camera single shot dolly zoom view synthesis with the smartphone dataset - Example 1. Here (a) and (b) are the input images I1 and I2, (c) is the image I1 from (a) with the SDoF effect applied, while (d)–(i) are the synthesized images produced by our method (after the application of the SDoF effect), for successively greater dolly zoom angles.


Figure 2: Single camera single shot dolly zoom view synthesis with the smartphone dataset - Example 2. Here (a) and (b) are the input images I1 and I2, (c) is the image I1 from (a) with the SDoF effect applied, while (d)–(i) are the synthesized images produced by our method (after the application of the SDoF effect), for successively greater dolly zoom angles.


Figure 3: Single camera single shot dolly zoom view synthesis with the smartphone dataset - Example 3. Here (a) and (b) are the input images I1 and I2, (c) is the image I1 from (a) with the SDoF effect applied, while (d)–(i) are the synthesized images produced by our method (after the application of the SDoF effect), for successively greater dolly zoom angles.