Software Simulation of Depth of Field Effects in Video from Small Aperture Cameras

by

Jordan Sorensen

B.S., Electrical Engineering, M.I.T., 2009

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology

June 2010

© Massachusetts Institute of Technology 2010. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 21, 2010

Certified by: Ramesh Raskar, Associate Professor, MIT Media Laboratory, Thesis Supervisor

Accepted by: Dr. Christopher J. Terman, Chairman, Department Committee on Graduate Theses

Abstract

This thesis proposes a technique for post-processing digital video to introduce a simulated depth of field effect. Because the technique is implemented in software, it affords the user greater control over the parameters of the effect (such as the amount of defocus, aperture shape, and defocus plane) and allows the effect to be used even on hardware which would not typically allow for depth of field. In addition, because it is a completely post-processing technique and requires no change in capture method or hardware, it can be used on any video and introduces no new costs. This thesis describes the technique, evaluates its performance on example videos, and proposes further work to improve the technique's reliability.

Thesis Supervisor: Ramesh Raskar
Title: Associate Professor, MIT Media Laboratory
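The technique summarized in the abstract builds on the idea, developed in Chapter 2 and applied in Chapter 3, that a finite aperture can be modeled as a sum of pinhole cameras: defocus is simulated by combining the views of many slightly displaced virtual cameras, all aligned on a chosen focal plane. The short Python sketch below only illustrates that idea on two synthetic 3D points; the variable names and parameter values are invented for this example and are not the thesis implementation.

import numpy as np

W, H = 200, 200            # output image size in pixels
focal_len = 100.0          # pinhole focal length, in pixels
focus_depth = 4.0          # depth of the simulated focal plane
aperture_radius = 0.3      # radius of the simulated aperture, in scene units
n_samples = 64             # number of virtual pinhole cameras

# Two scene points: one on the focal plane (stays sharp), one nearer (blurs).
points = np.array([[0.5, 0.0, 4.0],
                   [-0.5, 0.0, 2.5]])

image = np.zeros((H, W))
for k in range(n_samples):
    # Sample a virtual pinhole position (cx, cy) inside a disk-shaped aperture.
    ang = 2.0 * np.pi * k / n_samples
    rad = aperture_radius * np.sqrt((k % 8 + 1) / 8.0)
    cx, cy = rad * np.cos(ang), rad * np.sin(ang)

    for X, Y, Z in points:
        # Project the point through the shifted pinhole ...
        u = focal_len * (X - cx) / Z
        v = focal_len * (Y - cy) / Z
        # ... then shift so every virtual camera agrees on the focal plane.
        u += focal_len * cx / focus_depth
        v += focal_len * cy / focus_depth

        col = int(round(u)) + W // 2
        row = int(round(v)) + H // 2
        if 0 <= row < H and 0 <= col < W:
            image[row, col] += 1.0 / n_samples

# The point at depth 4.0 always lands on the same pixel (in focus); the point
# at depth 2.5 is spread over a small disc whose size grows with the aperture.
print("non-zero pixels:", np.count_nonzero(image))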
Acknowledgements

Thanks to Ramesh Raskar for his guidance and suggestions as my thesis advisor, and for his help in brainstorming research ideas. Thanks to the rest of the Camera Culture group for allowing me a glimpse of the research and publication process, and for everything I have learned from them over the past year and a half. I would also like to thank Sharmeen Browarek for her help formatting this document and for her assistance in navigating the world of M.Eng. theses at MIT. Thanks to Erik, Jeff, and all my friends for their constant support throughout my MIT career. Finally, thanks to my family for helping me survive graduate life. Curt and Laura, your visit helped me relax when I was most worried. Mom and Dad, your care packages kept me well fed while I wrote this thesis and your weekly calls reminded me that someone was thinking of me.

Contents

1 Introduction
1.1 Motivation for Simulating Depth of Field
1.2 Proposal
1.3 Goals of Method
1.4 Structure of Thesis

2 Background on Depth of Field and View Synthesis
2.1 Modeling a Camera
2.1.1 Pinhole Camera Model
2.1.2 Finite Aperture Model
2.1.3 Modeling a Finite Aperture Camera Using the Pinhole Camera Model
2.2 Video Stabilization
2.2.1 Re-Cinematography
2.2.2 Light Field Video Stabilization
2.2.3 Content Preserving Warps
2.3 Depth of Field Simulation for Still Images
2.4 Unstructured Lumigraph Rendering
2.5 Feature Point Detection
2.6 Structure from Motion

3 Method for Simulating Depth of Field
3.1 Overview
3.2 Feature Point Detection
3.3 Structure from Motion
3.4 Focal Plane Selection
3.5 Camera Generation
3.6 Feature Point Projection
3.7 Image Warping
3.8 Image Matching
3.9 Image Averaging

4 Results of Simulation Method
4.1 Hardware Used
4.2 Test Videos
4.3 Feature Point Detection
4.3.1 Voodoo
4.3.2 Voodoo vs. Bundler
4.4 Structure from Motion
4.4.1 Voodoo
4.4.2 Bundler
4.5 Feature Point Projection
4.6 Image Warping
4.7 Image Matching and Averaging
4.8 Final Video

5 Further Work
5.1 Using Feature Points from Other Frames
5.2 Better Triangulation Method
5.3 Content Preserving Warps
5.4 Generalized View Synthesis Technique
5.5 Recinematography
5.6 Bokeh Effects

6 Conclusion

List of Figures

1-1 Can a cheap consumer camera, like the one on the left, capture video featuring advanced effects like the expensive professional camera on the right?
2-1 Pinhole camera capturing a point.
2-2 Finding a point's image location using similar triangles.
2-3 Increasing the size of the aperture to a finite area.
2-4 A lens focuses light rays at a point.
2-5 An image captured using a large aperture (bottom) together with the same image captured using a tiny aperture.
2-6 A lens can be modeled as a sum of pinhole cameras.
2-7 Profusion 25, a light field camera. Image courtesy Point Grey [7].
3-1 Feature points detected in one frame of test video.
3-2 Structure from motion provides 3D feature point locations and camera information.
3-3 The user selects four feature points on an object they would like to stay in focus.
3-4 Generated virtual camera positions (dots) surrounding the actual camera position (plus sign).
3-5 Original feature points marked with dots; projected points with crosses. Projected points are points as seen by a virtual camera slightly to the left of the actual camera.
3-6 Image is triangulated. The points will then be warped to their projected locations, marked by crosses.
3-7 Final averaged result of one frame of test video.
4-1 Feature points for three sample frames of lab space video using three detection schemes with Voodoo.
4-2 Histograms showing how many frames registered x feature points for each of the three methods.
4-3 Feature points for three sample frames of lab space video using Bundler.
4-4 Feature points for three sample frames of lab space video using Voodoo.
4-5 Histogram showing how many frames registered x feature points using Bundler.
4-6 Histogram showing how many frames registered x feature points using Voodoo.
4-7 Feature points for three sample frames of cake box sequence using Bundler.
4-8 Feature points for three sample frames of cake box sequence using Voodoo.
4-9 Histogram showing how many frames registered x feature points using Bundler.
4-10 Histogram showing how many frames registered x feature points using Voodoo.
4-11 Cake box sequence setup. The box labeled "A" is a red and yellow cake box, and is 18 cm tall. The box labeled "B" is a brown and yellow brownie box, and is 19 cm tall.
4-12 Single frame from cake box sequence. Feature points registered by Voodoo are marked with white dots. Selected feature points used for measurement are marked with white squares with black edges.
4-13 Estimated 3D coordinates of one frame's feature points using the Voodoo structure from motion technique. Scene is viewed from overhead. Selected feature points are again marked with white squares with black edges. Feature points from scene background are cropped from view.
4-14 Single frame from cake box sequence. Feature points registered by Bundler are marked with white dots. Selected feature points used for measurement are marked with white squares with black edges.
4-15 Estimated 3D coordinates of one frame's feature points using the Bundler structure from motion technique. Scene is viewed from overhead. Selected feature points are again marked with white squares with black edges. Feature points from scene background are cropped from view.
4-16 One frame from media lab video marked with actual feature point locations and locations projected from 3D coordinates of each feature point. The original feature point is marked with a '+', the projected 3D coordinates are marked with