
MULTIPLE VIEW GEOMETRY FOR VIDEO ANALYSIS AND POST-PRODUCTION

by

XIAOCHUN CAO
B.E. Beijing University of Aeronautics and Astronautics, 1999
M.E. Beijing University of Aeronautics and Astronautics, 2002

A dissertation submitted in partial fulfillment of the requirements
for the degree of Doctor of Philosophy
in the School of Electrical Engineering and Computer Science
in the College of Engineering and Computer Science
at the University of Central Florida
Orlando, Florida

Spring Term 2006

Major Professor: Dr. Hassan Foroosh

© 2006 by Xiaochun Cao

ABSTRACT

Multiple view geometry is the foundation of an important class of techniques for simultaneous recovery of camera motion and scene structure from a set of images. There are numerous important applications in this area. Examples include video post-production, scene reconstruction, registration, surveillance, tracking, and segmentation. In video post-production, which is the topic addressed in this dissertation, computer analysis of the motion of the camera can replace the currently used manual methods for correctly aligning an artificially inserted object in a scene. However, existing single view methods typically require multiple vanishing points, and therefore fail when only one vanishing point is available. In addition, current multiple view techniques, making use of either epipolar geometry or the trifocal tensor, do not fully exploit the properties of constant or known camera motion. Finally, there is no general solution to the problem of synchronizing N video sequences of distinct general scenes captured by cameras undergoing similar ego-motions, which is a necessary step for video post-production among different input sequences.

This dissertation proposes several advancements that overcome these limitations. These advancements are used to develop an efficient framework for video analysis and post-production with multiple cameras. In the first part of the dissertation, novel inter-image constraints are introduced that are particularly useful for scenes where minimal information is available. This result extends the current state-of-the-art in single view geometry techniques to situations where only one vanishing point is available. The property of constant or known camera motion is also exploited in this dissertation for applications such as calibration of a network of cameras in video surveillance systems, and Euclidean reconstruction from turn-table image sequences in the presence of zoom and focus. We then propose a new framework for the estimation and alignment of camera motions, including both simple (panning, tracking and zooming) and complex (e.g. hand-held) camera motions. The accuracy of these results is demonstrated by applying our approach to video post-production applications such as video cut-and-paste and shadow synthesis. As realistic image-based rendering problems, these applications require extreme accuracy in the estimation of camera geometry, the position and orientation of the light source, and the photometric properties of the resulting cast shadows. In each case, the theoretical results are fully supported and illustrated by both numerical simulations and thorough experimentation on real data.

Dedicated to my wonderful wife Zhe Tian, and my caring parents.

ACKNOWLEDGMENTS

I would like to express my sincere appreciation to my advisor, Dr. Hassan Foroosh, who has made this dissertation possible. I am indebted to him for his guidance, inspiration, encouragement and understanding throughout my graduate studies. I have learned a lot from him about doing research and presenting results.

I would also like to thank Dr. Michael Georgiopoulos, Dr. Mark Heinrich, Dr. Charles Hughes and Dr. Annie Wu for serving as my committee members, and for their invaluable comments and suggestions.

This dissertation is built upon the work that appeared in several papers, which I coauthored with my advisor and collaborators, Dr. Jiangjian Xiao, Yuping Shen, Murat Balci, Imran Junejo, Fei Lu, and Yun Zhai, who contributed to this work in many different ways. I would like to especially thank Dr. Mubarak Shah, with whom I worked for four semesters during my Ph.D. program, and who supported me during that time.

I wish to thank the truly amazing people of the Computational Imaging Lab, who made this lab such a stimulating and pleasant environment in which to carry out research.

Most importantly, I want to thank my wife, Zhe Tian, for her endless love and confidence in me.

TABLE OF CONTENTS

LIST OF TABLES ...... xiv

LIST OF FIGURES ...... xv

CHAPTER 1 INTRODUCTION ...... 1

1.1 Motivation ...... 2

1.1.1 Video Post-production ...... 3

1.1.2 Geometry Based Video Analysis ...... 4

1.1.3 Complex Camera Motion Recovery ...... 6

1.1.4 Photometric Information Recovery ...... 7

1.2 The Novel Framework ...... 9

1.3 Contributions ...... 11

1.4 Overview of the Dissertation ...... 13

CHAPTER 2 RELATED WORK ...... 17

2.1 Camera Calibration ...... 17

2.1.1 Camera Calibration Using Scene Properties ...... 18

2.1.2 Self-Calibration ...... 20

2.2 Video Analysis ...... 22

2.2.1 Camera Motion Characterization ...... 22

2.2.2 Layer Segmentation ...... 23

2.2.3 Video Alignment ...... 24

2.3 Video Post-production ...... 26

2.3.1 Video Cut and Paste ...... 26

2.3.2 Video Completion ...... 28

CHAPTER 3 BACKGROUND GEOMETRY ...... 29

3.1 Introduction ...... 29

3.2 History ...... 30

3.3 Notations ...... 31

3.4 Pin-hole Camera Model ...... 32

3.5 Perspective Mapping ...... 34

3.5.1 Planar Homography ...... 34

3.5.2 Planar Homology ...... 35

3.6 Image of the Absolute Conic ...... 36

3.7 Calibration from Vanishing Points ...... 38

3.8 Epipolar Geometry ...... 42

CHAPTER 4 CAMERA CALIBRATION USING OBJECTS ...... 44

4.1 Introduction ...... 45

4.2 Our Method ...... 46

4.2.1 Scene Geometry and Orthogonal Constraints ...... 48

4.2.2 Inter-image Constraints ...... 50

4.2.3 Maximum Likelihood Bundle Adjustment ...... 54

4.2.4 Light Source Orientation Estimation ...... 55

4.2.5 Algorithm Outline ...... 56

4.3 Implementation Details ...... 58

4.3.1 Feature Extraction ...... 58

4.3.2 Computation Details of the Cross Ratio Constraint ...... 60

4.4 Degenerate Configurations ...... 61

4.4.1 Vanishing Point at Infinity ...... 63

4.4.2 Difficulty in Computing v_x ...... 64

4.5 Experimental Results ...... 66

4.5.1 Computer Simulation ...... 67

4.5.2 Real Data ...... 74

4.6 Applications ...... 74

4.6.1 Calibration Using Symmetric and 1D Objects ...... 75

4.6.2 Reconstruction of Partially Viewed Symmetric Objects ...... 85

4.6.3 Image Based Metrology ...... 90

4.7 Conclusion ...... 94

CHAPTER 5 SELF-CALIBRATION USING CONSTANT MOTION ...... 96

5.1 Introduction ...... 97

5.2 The Method ...... 101

5.2.1 Self-calibration using Constant Inter-frame Motion ...... 101

5.2.2 Linear approach ...... 103

5.3 Two-stage Optimization ...... 107

5.3.1 Conic enforcement after compensation ...... 107

5.3.2 Reconstruction using bundle adjustment ...... 110

5.4 Experimental Results ...... 111

5.4.1 Computer simulation ...... 112

5.4.2 Real data ...... 115

5.5 Conclusion ...... 118

CHAPTER 6 VIDEO ANALYSIS ...... 122

6.1 Introduction ...... 123

6.2 3D Alignment Using Recovered Camera Motions ...... 126

6.3 2D Alignment of Similar General Motion ...... 130

6.4 2D Alignment of Similar Special Motion ...... 133

6.4.1 Pure Rotation ...... 135

6.4.2 Pure Translation ...... 137

6.4.3 Zooming ...... 140

6.4.4 Camera Motion Characterization ...... 141

6.5 The Implementation Details of Alignment Algorithm ...... 143

6.5.1 Computing the Camera Ego-Motion Features ...... 143

6.5.2 Timeline Model ...... 145

6.6 Experimental Results ...... 147

6.6.1 General Motion ...... 148

6.6.2 Pure Translation ...... 151

6.6.3 Pure Rotation ...... 152

6.7 Conclusion ...... 153

CHAPTER 7 3D VIDEO POST-PRODUCTION ...... 156

7.1 Introduction ...... 157

7.2 Key-frame depth estimation ...... 162

7.3 Video matting with soft shadow ...... 165

7.3.1 Object cutout and matting ...... 166

7.3.2 Shadow matting and editing ...... 168

7.3.3 Layer compositing with image-based rendering ...... 171

7.4 Experimental results ...... 174

7.5 Conclusion and discussion ...... 177

CHAPTER 8 2D VIDEO POST-PRODUCTION ...... 179

8.1 Introduction ...... 179

8.2 Shadow Synthesis ...... 183

8.2.1 Geometric Constraints ...... 183

8.2.2 Photometric Constraints ...... 192

8.2.3 Results of Shadow Synthesis ...... 196

8.3 Reflection Synthesis ...... 205

8.4 Discussions ...... 210

8.4.1 Shadow Synthesis ...... 210

8.4.2 Reflection Synthesis ...... 211

CHAPTER 9 CONCLUSION ...... 213

REFERENCES ...... 215

LIST OF TABLES

4.1 External Parameters of four different viewpoints...... 67

4.2 Calibration results for the first real image set...... 73

4.3 External Parameters for seven different viewpoints...... 78

4.4 Results for real data compared to Zhang’s results...... 82

4.5 Intrinsic parameters for real data ...... 83

4.6 Estimated 3D coordinates at 1.0 pixel noise level ...... 86

4.7 Performance vs noise (in pixels) ...... 87

7.1 The number of the key-frames used in our video sequences...... 174

LIST OF FIGURES

2.1 A partial summary list of previous camera calibration objects...... 18

3.1 Alberti’s Grid...... 31

3.2 Pinhole camera geometry...... 33

3.3 The Euclidean transformation between the world and camera coordinate frames. . . 33

3.4 Geometry of a planar homology...... 35

3.5 Examples of vanishing points in real images and paintings...... 38

3.6 The angle between two vanishing points ...... 39

3.7 The pole-polar relationship with respect to the image of the absolute conic. . . . . 40

4.1 Basic geometry of a scene with two vertical lines and their cast shadows...... 48

4.2 Applying the inter-image constraints to other line correspondences...... 52

4.3 Maximum likelihood estimation of the feature points...... 59

4.4 The performance of camera calibration under noise...... 68

4.5 The performance of light source orientation estimation under noise...... 69

4.6 The performance with respect to the relative orientation errors...... 70

4.7 The performance of camera calibration under the orthogonality errors...... 71

4.8 The performance of light source orientation est. under the orthogonality errors. . . 72

4.9 Three images of a standing person and a lamp...... 73

4.10 Four feature points forming an isosceles trapezoid in the world plane...... 75

4.11 Performance vs noise (in pixels) averaged over 100 independent trials ...... 79

4.12 Performance vs number of viewpoints...... 80

4.13 Four trapezoids projected into two images...... 81

4.14 Three collinear points along a TV antenna...... 81

4.15 Illustration of the unavailability of the pole-polar relationship...... 84

4.16 Reconstruction example for partially viewed synthesized symmetric objects. . . . . 88

4.17 Reconstruction example for partially viewed real symmetric objects...... 89

4.18 Measuring heights of vertical objects...... 91

4.19 Measurements which might be difficult in practice...... 91

4.20 The 3D points t_i cast shadows at different times j at positions s_ij...... 92

5.1 The geometric configuration of a single axis rotation in 3D space...... 98

5.2 The projected trajectories of 3D points under circular motion...... 99

5.3 The entities related to the geometry of a single axis motion...... 108

5.4 The geometrically equivalent sequence of a turn-table sequence...... 112

5.5 Performance of self-calibration under noise...... 113

5.6 Performance of self-calibration under errors in rotation angles...... 114

5.7 Eighteen views of the Tylenol sequence...... 116

5.8 Self-calibration results of the Tylenol sequence...... 117

5.9 The fitted conics and estimated rotation axis of the Tylenol sequence...... 118

5.10 The piecewise planar model with mapped texture of the Tylenol box...... 119

5.11 One frame of the dinosaur sequence and a subset of the point tracks...... 120

5.12 Self-calibration results of the Dinosaur sequence...... 120

5.13 Three consecutive sample frames of the zigzag dinosaur sequence...... 121

5.14 Four views of the 3D reconstruction of dinosaur from silhouettes...... 121

6.1 Example frames in source and target video sequences...... 125

6.2 Camera trajectories of both source and target video sequences ...... 126

6.3 3D camera trajectory alignment between two shots...... 127

6.4 (a) Camera trajectory alignment using H. (b) The decomposition of H...... 128

6.5 Cut and paste of a person among two complex shots...... 129

6.6 Two video sequences captured by cameras undergoing similar general movement. . 131

6.7 A synchronization example of synthetic video sequences...... 132

6.8 The geometry of a pin-hole camera and camera motion parameters...... 134

6.9 Two pairs of pure rotation shots, captured by the author...... 136

6.10 The 3D data volume and two basic spatio-temporal 2D slices...... 138

6.11 Spatio-temporal slices of pure translational and zoom shots ...... 139

6.12 Two frames of a zoom shot, from the ABC news...... 141

6.13 Camera motion characterization results...... 142

6.14 Compute the fundamental matrix using viewing graph...... 144

6.15 Synchronize general motion sequences using reference based method...... 147

6.16 The estimated affine timeline model for the two video sequences in Fig. 6.15.... 148

6.17 Synchronize general motion sequences using reference based method...... 149

6.18 Average distance of the fundamental ratios computed for different time delay. . . . 150

6.19 Synchronize pure translational camera motions using spatial-temporal slices. . . . 150

6.20 Results on synchronization of pure translational camera motion sequences...... 151

6.21 Computed rotation angles and the similarity matrix of two pure rotational sequences. . 153

6.22 Four appended frames after synchronization...... 154

7.1 3D object transfer between two videos for a “sit-talking” person...... 158

7.2 The flow chart of 3D video transfer process...... 159

7.3 3D camera trajectory alignment between “Beetle” and “flower-floor” sequences . . 161

7.4 Depth refinement of non-rigid motion and occlusion...... 163

7.5 Depth correction of the view-variant specular reflections...... 164

7.6 Object segmentation and alpha matting...... 167

7.7 Matting of complex objects...... 168

7.8 Object matting refinement after the temporal consistency enforcement...... 169

7.9 Shadow matting of one frame (Fig. 7.6.a) of “sit-talking” sequence...... 170

7.10 Local shadow matting editing...... 170

7.11 Layer compositing by image-based rendering...... 172

7.12 The sample depth estimation results in the other source video sequences...... 173

7.13 3D video transfer results ...... 176

8.1 Sample result from our realistic compositing algorithm for shadows...... 180

8.2 Camera calibration and reconstruction of a view of a Hollywood movie...... 185

8.3 Camera calibration and light source estimation for an image from the Internet...... 187

8.4 Synthesize shadow of Pito...... 188

8.5 An example of the computed shadow of an inserted real player in FIFA 2003. . . . 189

8.6 An example of computing shadows using planar homology...... 190

8.7 Illumination image estimation from a real image...... 193

8.8 Shadow synthesis for a child inserted in a Hollywood movie...... 197

8.9 The lit and shadow images of the Seattle sequence...... 198

8.10 Shadow synthesis for a statue inserted in an image from the Internet...... 198

8.11 Shadow synthesis for a parking sign post inserted in a real image...... 199

8.12 The performance of our method for the synthetic light source in a 3D game. . . . . 200

8.13 Image-based rendering applications...... 201

8.14 Three example frames of our composite of the Seattle sequence...... 203

8.15 Three example frames of our composite of The Pianist...... 204

8.16 Camera calibration and light source estimation of a view from The Pianist...... 205

8.17 Two real scenes containing reflection effects...... 207

8.18 Reflection synthesis of a parking sign post...... 208

8.19 Reflection synthesis of a character in movie Shrek 2...... 209

CHAPTER 1

INTRODUCTION

Image acquisition is the process of forming a two-dimensional representation of a three-dimensional world. Conversely, an important goal in computer vision is to recover the camera geometry and simultaneously deduce the 3D structure of the scene that appears in the images. Although the traditional shape-from-X approaches, such as shape from shading [73, 207], shape from texture [177, 116], shape from specularity [85, 126] and shape from defocus [130, 36], are extremely useful for specific applications, none of them is as flexible or as accurate as stereo or multi-view geometry-based methods [49, 55, 75]. One important application of multiple view geometry based 3D reconstruction techniques is video post-production, the problem addressed in this dissertation. While video post-production includes a variety of phases such as video editing, audio mixing, special effects and distributing the finished product, this dissertation mainly tackles the problem of generating visual special effects using computers. Automatic reconstruction techniques have recently become widely used in the film industry as a means for adding synthetic objects to real video sequences, where computer analysis of the motion of the camera and the structure of the scene is replacing the previously used manual methods for correctly modeling and aligning the artificially inserted objects.

1.1 Motivation

The problem of uncalibrated video analysis and post-production is tackled in this thesis. In designing a system for post-processing videos, each application poses different tradeoffs. Cost, accuracy, ease of use and robustness are the main parameters to be considered. In film and TV production, cranes are mostly used to control camera movement, while blue screen techniques are adopted to create special effects. Computer vision researchers aim to reduce the cost and simplify these processes, possibly by trading off some amount of accuracy, and by maximizing the use of information that is available in the images or videos. Toward the goal of post-producing videos using standard home and office equipment, such as a PC and a camcorder, five related technical challenges have been addressed in this thesis:

1. camera calibration using scene properties (Chapter 4);

2. self-calibration using camera motion (Chapter 5);

3. camera motion analysis and video synchronization (Chapter 6);

4. video post-production (Chapter 7);

5. the estimation of scene photometry and shadow synthesis (Chapter 8).

Each of these techniques has a long history in computer vision, computer graphics or multimedia, with a remarkably rich literature (an extensive bibliography on these subjects can be found in the corresponding chapters). However, it is rare to find works that efficiently handle both the analysis and the post-production of general videos in the same framework.

1.1.1 Video Post-production

In the film or TV industry, video post-production is the main step after shooting film or video and recording audio. Basically, video post-production includes gathering visual and audio assets (e.g. "clips"); developing visual effects, titles and graphics; editing the production; and distributing the finished product. In this work, we mainly tackle the computer-based problem of digital visual effects. The general term used in the industry is CG, short for computer-generated. For example, when talking to the artists you will hear them say things like, "That entire scene is CG," or "Those are all CG soldiers," or "The actors are real, but everything else is CG." Computer-generated effects make imaginary characters like Godzilla possible, and they also create almost every effect that used to be done using models. The advantages of CG effects are their realism, flexibility and relatively low cost (compared to the alternatives).

The origins of video post-production techniques can be traced back to photography and cinematography. One early example is the massive print "The Two Ways of Life", created by the photographer Oscar Rejlander in 1857 by selectively combining 32 different negatives [19]. As argued by [31], nearly all modern movies utilize cut and paste operations in their production, e.g. "Jurassic Park", "The Matrix", and "The Lord of the Rings". As of April 2004, six of the ten best-selling movies ever released had also won the "Best Visual Effects" Academy Award. In addition to being put to use for visual effects, digital video post-production is used in much of today's media, including magazines, 2D arts and graphics, television, advertising, and multimedia title development.

Although they were developed several decades ago, most existing video post-production techniques are still fairly labor-intensive, and often limited to solving a restricted version of the general problem. One major limitation in most of the past works, e.g. [22, 3, 102], is that they assume the cameras capturing both the source and target scenes are fixed or have a known 2D translation or scaling. Therefore, a direct alpha blending [144] is sufficient to composite the foreground objects into the target frames. However, camera motion is one of the most important camera techniques used in television and movie production, since there are certain common conventions that convey meaning through particular camera techniques. Therefore, for the video post-production to be compelling, it is important to analyze the motion of the cameras in the video shots, and to choose and align source and target shots which have similar motions. In addition to the correctness of the geometry, the other important factor in video post-production is the photometric consistency of the synthesized objects with respect to the existing objects in the original scene.

1.1.2 Geometry Based Video Analysis

The analysis of video data generally targets the identification of relevant objects or regions in a scene (segmentation), and the extraction of associated descriptive characteristics for each object/region and for the complete scene (feature extraction). The video analysis herein concentrates on low-level feature extraction, camera motion characterization, and temporal video alignment, which are necessary components for cutting and pasting objects among videos captured by moving cameras.

The recognition of video shots in which the camera is static, zooming, or rotating has been achieved using rather dedicated methods [13, 122, 47]. Usually, these methods rely on the exploitation of motion vectors issued from block-matching techniques, or on the search for specific distributions of motion vectors or a few global representative motion parameters [204]. However, one of the main shortcomings of these approaches is that they are generally not resilient to the presence of mobile objects of significant size. In addition, it is difficult for these methods to distinguish pure rotation shots from pure translation shots, e.g. panning versus sideways tracking shots, since they mostly use the inter-image planar homography, which is valid only when the camera positions are fixed or the scene is almost planar.

The problem of temporally aligning video sequences has become an active area in the computer vision community since Stein's first method [162]. More recent methods [170, 34, 166, 185, 96] tackle the problem of automatic video synchronization for independently moving cameras and overlapping dynamic scenes. The alignment of non-overlapping sequences was first addressed by Caspi and Irani [33] based on the assumption that the two sequences are captured by a stereo rig. Nevertheless, previous efforts [33, 42, 140] on temporal alignment of non-overlapping sequences typically utilize the inter-sequence relationship and hence inherently involve two sequences. For many video sequences, the cameras filming the scenes are moving in a random fashion, e.g. a hand-held shot. To post-process such video shots, it is necessary to accurately recover the motions of the cameras that capture the videos, which is detailed in the next section.

1.1.3 Complex Camera Motion Recovery

For video shots where the cameras are moving freely, the most important requirement for video post-production, e.g. realistic insertion of an artificial object in the sequence, is to compute the correct motion of the camera. This is not as simple as for shots captured by purely rotating or zooming cameras, as described in Section 1.1.2 above. Unless the camera motion is correctly determined [75], it is impossible to generate the correct sequence of views of the inserted object, or to re-render the objects matted from the source videos, in a way that will appear geometrically consistent with the target video. Generally, it is only the motion of the camera that is important here. One does not need to reconstruct the scene, since it is already present in the existing video, and novel views of the scene visible in the video are not required. The only requirement is to be able to generate correct perspective views of the inserted object or to re-render the objects matted from the source videos.

Unless the inserted object and the cameras are known in the same coordinate frame [161], the generated views of the inserted object will appear distorted with respect to the perceived structure of the scene in the target video. Since the geometry of the object to be inserted is in practice Euclidean, it is essential to compute the motion of the camera also in a Euclidean frame, or at least in a metric frame in the looser case. Therefore, it is not enough to know merely the projective motion of the camera, which can be obtained from the computed fundamental matrices or trifocal tensors [75]. Traditionally, the camera internal parameters and the external parameters (motion) can be calibrated using known objects, e.g. a 3D square grid [174], a 2D square grid [200], or scene properties such as vanishing points [44, 25, 105] and circular points [186, 45, 23]. Recently, self-calibration methods [172, 134, 4, 156] have been proposed to avoid the onerous task of calibrating cameras using special calibration objects, and thus make it possible to calibrate a camera directly from an image sequence despite unknown motion and changes in some of the internal parameters. However, the current state-of-the-art vanishing point based techniques are not able to handle situations where only one vanishing point is available, and the existing self-calibration methods do not fully exploit the properties of constant camera motion, which is common in the real world.

1.1.4 Photometric Information Recovery

Video post-production requires not only geometric correctness but also photometric consistency. The photometric information here is mainly the light source location and the illumination properties required to match the color characteristics of the synthesized shadows of the inserted objects with those of the original scene. Shadows provide important visual cues for depth, shape, contact, movement, and lighting in our perception of the world [90].

In previous efforts, shadows for video post-production are typically either created manually or composited using matted shadows from the source scenes. The former approach is commonly called a "faux shadow" in the film industry [187]. In this technique, artists use the foreground object's alpha matte to create its shadow manually. Consequently, the geometric accuracy of the "faux shadow" highly depends on the experience of the artists, and the color characteristics of the "faux shadow" are interactively adjusted by the compositor. In the recent work by [132], a semi-automatic method for creating shadow mattes in character animation is presented. Their system creates shadow mattes based on hand-drawn characters, given high-level guidance from the user regarding the depths of various objects. The method employs a scheme for "inflating" a 3D figure based on hand-drawn art. It provides simple tools for adjusting object depths, coupled with an intuitive interface by which the user specifies object shapes and their relative positions in a scene. Their system obviates the tedium of drawing shadow mattes by hand, and provides control over complex shadows falling over interesting shapes. However, the method has difficulty obtaining such models in post-production operations on given real videos.

The second kind of approach is to extract shadows from the source scenes using luma keying or alpha matting [144, 30], which are efficient when both the target and source scenes are accessible. Accessibility here means that the target scene setup is controllable and known, i.e. we are able to obtain information about the camera and the light source. However, these methods do not readily apply to general cases where the source and target video shots have different camera motions, since the relative orientation between the camera and the light source might not be the same in the source and the target video. In more recent work [123], a semi-automatic 2D shadow synthesis method is proposed, in which the video object plane and shadow contours are approximated from an existing reference. However, this approach still involves nontrivial interactive operations, and the quality of the generated shadows depends highly on how well its strict assumptions are satisfied.

1.2 The Novel Framework

The proposed framework tackles the problem of video analysis and post-production in a divide-and-conquer fashion. Based on an explicit geometry-based analysis of the camera motion of a shot, we classify and align that shot as either a simple camera motion shot, e.g. panning, tracking and zooming, or a complex camera motion shot, e.g. a hand-held shot, and then solve the different cases using dedicated methods.

For simple camera motions, such as zooming, pure rotation and pure translation, the inter-frame homographies and fundamental matrices have special forms and hence additional properties, which can be used to classify and align shots. In the case of pure translational camera motion, we compute the relative translational magnitude from slices cut from the three-dimensional data volume along the epipolar lines. For cameras with fixed locations undergoing pure rotation, we compute the rotation angles directly from the inter-frame planar homography by eigenvalue decomposition. The frames captured by cameras undergoing only zooming can be related by an inter-image planar homography, from which it is easy to extract the zoom factor with respect to a reference image under the assumption of a simplified pin-hole camera model.
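As a concrete illustration of the pure-rotation case (a minimal sketch under the stated assumptions, not the implementation used later in the dissertation): for a camera that only rotates about its center with fixed internal parameters K, consecutive frames are related by H ≃ K R K⁻¹, so H is similar to the rotation R up to scale and, after normalizing its determinant to one, its complex eigenvalues are e^(±iθ). The function name and the synthetic example below are hypothetical.

```python
import numpy as np

def rotation_angle_from_homography(H):
    """Estimate the rotation angle (radians) of a purely rotating camera
    from an inter-frame homography H ~ K R K^{-1}, known only up to scale.

    Illustrative sketch: assumes fixed internal parameters and pure rotation.
    """
    H = np.asarray(H, dtype=float)
    # Remove the unknown scale: det(K R K^{-1}) = det(R) = 1.
    H = H / np.cbrt(np.linalg.det(H))
    # H is now similar to a rotation; eigenvalues are {1, e^{i t}, e^{-i t}}.
    eigvals = np.linalg.eigvals(H)
    complex_eig = max(eigvals, key=lambda v: abs(v.imag))
    return abs(np.angle(complex_eig))

# Synthetic check: a 5-degree pan seen by a camera with focal length 800.
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), 0, np.sin(theta)],
              [0, 1, 0],
              [-np.sin(theta), 0, np.cos(theta)]])
K = np.diag([800.0, 800.0, 1.0])
H = 3.7 * K @ R @ np.linalg.inv(K)   # arbitrary unknown scale
print(np.rad2deg(rotation_angle_from_homography(H)))  # ~5.0
```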

To tackle a relatively more general problem, i.e. the quantification and alignment of N video se- quences of distinct general scenes captured by cameras undergoing similar ego-motion, we observe that similar camera displacements result in the same relative inter-frame translation and orienta- tion, i.e. the same essential matrices, between synchronized frame pairs, and also that the equality of the essential matrix is reflected in the uncalibrated fundamental matrix in that its upper-left 2×2

9 elements remain constant up to an unknown scale. Note that two camera motions are similar when the camera locations and poses in all corresponding time slots are related by a common 3D simi- larity transformation. Therefore, for each frame, we can obtain a homogeneous four-dimensional feature vector characterizing the camera ego-motion relative to a reference frame. The relative translational magnitude (for pure translational camera motion), rotation angles (for pure rotating camera motion), zoom factor (pure zooming camera motion) and the four-dimensional feature vec- tor (for general camera motion) are used to temporally align video shots which undergo the similar camera motion.

For shots with complex or combined camera motion, e.g. hand-held shots, there is not always another shot available with similar camera motion. In these cases, to ensure that the virtual pasted objects are consistent with the existing objects in the target shots, the proposed framework first explicitly estimates the poses of the cameras that captured both the source and target scenes using camera calibration techniques. Then, the videos are aligned both temporally and spatially along the computed camera trajectories in 3D space by minimizing the global differences between the source and transformed target viewing directions. Finally, the depth information of the foreground object in the source shot is recovered, which allows the foreground layers to be re-rendered and blended into the corresponding target frames.
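For the spatial part of this alignment, one standard building block (used here purely as an illustration, not necessarily the dissertation's exact formulation) is a least-squares 3D similarity transform between temporally corresponding camera centers, e.g. Umeyama's method:

```python
import numpy as np

def similarity_align(src_centers, tgt_centers):
    """Least-squares 3D similarity (s, R, t) mapping source camera centers onto
    target camera centers (Umeyama's method). Illustrative of the trajectory
    alignment step only; the dissertation also aligns viewing directions and
    time, which this sketch omits.

    src_centers, tgt_centers: Nx3 arrays of temporally corresponding centers.
    """
    X = np.asarray(src_centers, dtype=float)
    Y = np.asarray(tgt_centers, dtype=float)
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    # Cross-covariance and its SVD.
    U, S, Vt = np.linalg.svd(Yc.T @ Xc / len(X))
    D = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:   # enforce a proper rotation
        D[2, 2] = -1.0
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / Xc.var(axis=0).sum()
    t = my - s * R @ mx
    return s, R, t                   # maps x to s * R @ x + t
```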

After the alignment of the source and target videos, the objects are cut from the source video using the technique described in [194, 191], which combines graph-cut based motion layer segmentation and Poisson alpha matting. Finally, to increase the realism of the composited videos, the photometric properties, such as shadows of the inserted objects, are synthesized based on the geometric and photometric constraints extracted from the target images or videos. The geometric constraints are mostly obtained by camera calibration methods. We increase the accessibility of calibration objects by introducing shadows, symmetric and 1D objects, and make use of constant camera motion for self-calibration. The self-calibration can also be used to recover the 3D model of an object, which is useful in video augmented reality applications. For the case when camera calibration is not possible, we also develop a method for shadow synthesis of planar or distant objects using planar homography.

Overall, the new framework, compared to the current state-of-the-art techniques, has the following advantages. First, it is able to post-process, e.g. cut and paste, objects among general video shots. Second, the framework does not require the relationship between the dominant light source, the reference plane, and the camera to be the same in the source and target scenes. Third, the cameras capturing the source and target videos are not required to be fixed or to move along a simple linear path. Finally, no 3D knowledge about either the source or the target scenes is required in the new framework.

1.3 Contributions

This thesis improves the state of the art in various aspects of computer vision and in the understanding of photographs and visual art:

• The proposed inter-image constraints extend the power of the state-of-the-art projective geometry techniques to situations where only one vanishing point is available, and therefore are especially useful for scenes where only minimal information is available, e.g. an isosceles trapezoid obtained from symmetric objects.

• Three common objects (symmetric objects, 1D objects, and vertical objects and their shadows) are proposed for camera calibration, which have the benefit of increasing the accessibility of calibration objects.

• Constant or known camera displacement is used for self-calibration and is applied to Euclidean reconstruction from turn-table sequences in the presence of zoom and focus. These results can also be used for model-based post-production applications, e.g. augmented reality.

• A new video alignment method is proposed that can deal with video sequences of general distinct scenes captured by arbitrarily (both generally and specially) moving cameras. The proposed approach is a one-sequence process that computes the camera ego-motion for each sequence separately. Besides its efficiency, which allows a combination-free implementation, the algorithm, as a 2D solution, does not involve scene reconstruction or 3D recovery of the camera trajectories.

• An efficient framework is developed for video post-production applications such as cut-and-paste. Unlike previous work, this framework is able to handle cases where the source and the target videos have nontrivial unknown camera motions, and it requires no 3D knowledge about either the source or the target scenes.

• A pragmatic framework is proposed for rendering shadows composited into target views based on geometric and photometric analysis. This framework is flexible in that various techniques suitable for different scenes can be easily integrated into it.

1.4 Overview of the Dissertation

Chapter 2. Before discussing the proposed framework, we begin with a literature survey of the most relevant research conducted in the areas of camera calibration using objects, self-calibration, video analysis, and video post-production.

Chapter 3. This chapter provides a brief introduction to single and multiple view geometry used elsewhere in this dissertation. It first introduces the pinhole camera model, and the perspective mappings including planar homography and planar homology. It then describes the calibration methods using orthogonal constraints, i.e. the mutually orthogonal vanishing points and the image of the absolute conic. Finally, the epipolar geometry in two views is also discussed.

Chapter 4. This chapter describes in detail one of the main contributions of this work, the derivation of the new inter-image constraints. These constraints are utilized to develop algorithms for camera calibration making use of three calibration objects (symmetric objects, 1D objects, and vertical objects and their parallel shadows on the ground plane). The implementation details and degenerate configurations of the algorithm are also presented in this chapter. Finally, we demonstrate applications to the reconstruction of partially viewed symmetric objects and to image based metrology.

Chapter 5. This chapter proposes a new constraint using constant or known camera displacement for self-calibration. It explores equality properties of the essential matrix between neighboring frame pairs, which are used to develop a linear algorithm for the computation of focal lengths given the inter-frame fundamental matrices and knowledge of the remaining camera internal parameters. This constraint is applied to self-calibration from turn-table sequences in the presence of zoom and focus. The method is also extended to the Euclidean reconstruction of objects rotating on a turn-table, which can then be used, for instance, in augmented reality applications.
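For context on how fundamental matrices constrain the internal parameters (standard background, not the chapter's derivation): with internal parameter matrices K1 and K2 for the two frames, the essential and fundamental matrices are related by

    E = K2ᵀ F K1 .

If the inter-frame motion, and hence E, is constant across consecutive frame pairs, each measured F therefore constrains the unknown entries of K1 and K2 (e.g. the focal lengths); this is the kind of constraint the chapter develops into a linear algorithm.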

Chapter 6. This chapter addresses the problem of synchronizing video sequences of distinct scenes captured by cameras undergoing similar motions. For general motion and 3D scenes, the camera ego-motions are characterized by parameters obtained from the fundamental matrices. In the case of special motion, e.g. pure translation and rotation, the relative translational magnitude or rotation angles are used as motion features. These extracted features are invariant to the camera internal parameters, and can be computed without recovering camera trajectories along the image sequences. Experimental results show that our method can accurately synchronize sequences even when they have dynamic timeline maps, and when the scenes are totally different and have dense depths.

Chapter 7. This chapter describes a video-based framework for a video post-production operation, i.e. pulling the alpha mattes of rigid or approximately rigid 3D objects from one or more source videos, and then using them to augment a target video of a different scene in a geometrically correct fashion. This framework builds upon techniques in camera pose estimation, 3D spatiotemporal video alignment, depth recovery, key-frame editing, natural video matting, and image-based rendering. Experimental results on various content types, e.g. TV programs, home videos and feature films, and on different camera motions are reported to validate the proposed framework.

Chapter 8. This chapter mainly tackles another problem related to video post-production, shadow and reflection synthesis. To synthesize shadows and reflections of the inserted objects, the framework efficiently utilizes the geometric and photometric constraints extracted from the target images or videos. In addition to strong geometric constraints obtainable from camera calibration, the planar homology constraint is introduced for cases where camera calibration is not possible. This chapter also demonstrates how to constrain the synthesized shadows and reflections to be photometrically consistent with those in the original target views. Finally, to show the accuracy and the applications of the proposed method, this chapter presents results for a variety of target scenes, including footage from commercial Hollywood movies and 3D video games.

Chapter 9. The conclusion presents a summary of the work and points out directions for future research on the topics addressed in this dissertation.

CHAPTER 2

RELATED WORK

This chapter presents a survey of the most significant work in the fields of camera calibration, three-dimensional Euclidean reconstruction from two-dimensional images, video analysis, and video post-production.

2.1 Camera Calibration

There has been much work on camera calibration, both in photogrammetry and computer vision. Existing methods generally involve some trade-off between automation and accuracy, and can be broadly classified into two categories. The first category includes methods that rely on scene properties, such as the known 3D coordinates of a calibration grid, which is the topic of this section. The second category consists of methods, generally referred to as self-calibration or auto-calibration approaches, that aim to compute a metric reconstruction from multiple uncalibrated images and avoid the onerous task of calibrating cameras using special calibration objects.

Figure 2.1: A partial summary list of previous camera calibration objects: (a) 3D square grid, (b) 3D circular grid, (c) 2D square grid, (d) architectural buildings, (e) balls or spheres, (f) coplanar circles, (g) SOR, (h) fixed stars in the night sky. Note that this figure is by no means complete and only representative work is listed.

2.1.1 Camera Calibration Using Scene Properties

Traditional methods in the first category use a calibration object with a fixed 3D geometry, e.g. a 3D square grid [174] (see Figure 2.1.a) or a 3D circular grid [69] (see Figure 2.1.b). Recently, more flexible plane-based calibration methods [200, 153, 58, 107] have been proposed that use the orthonormal properties of the rotation matrix, which represents the relative rotation between the 3D world and the camera coordinate systems. Zhang [200] originally showed that it is possible to calibrate a camera using a planar pattern (see Figure 2.1.c) observed at a few different orientations, and obtained very accurate results. Sturm and Maybank [153] extended this technique and obtained the solution for the case when the camera intrinsic parameters can vary. They also describe all the singularities, naming the parameters that cannot be estimated, for some minimal cases. Gurdjos et al. [58] formulated the plane-based calibration problem in a more intuitive geometric framework, based on Poncelet's theorem, which makes it possible to give a Euclidean interpretation to the plane-based calibration techniques.
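To make the phrase "orthonormal properties of the rotation matrix" concrete (standard background for the plane-based methods cited above, not a contribution of this dissertation): writing the world-plane-to-image homography as H = [h1 h2 h3] ≃ K [r1 r2 t], the orthonormality of r1 and r2 yields two constraints per view on ω = K⁻ᵀ K⁻¹, the image of the absolute conic,

    h1ᵀ ω h2 = 0 ,      h1ᵀ ω h1 = h2ᵀ ω h2 ,

so that a few views give a linear system in the entries of ω, from which K is recovered, e.g. by Cholesky factorization.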

Some recent methods using calibration rigs also rely on non-planar objects. For instance, in [2, 176, 210], balls or spheres (see Figure 2.1.e) are used to solve for the camera intrinsic parameters and the locations of the spheres. Architectural buildings [44, 25, 105] (see Figure 2.1.d), surfaces of revolution (SOR) [186, 23] (see Figure 2.1.g), coplanar circles [111, 81, 45] (see Figure 2.1.f), non-textured Lambertian surfaces [94], and fixed stars in the night sky [84] (see Figure 2.1.h) have been used as alternative calibration objects, although the latter method is not able to extract highly accurate camera parameters from real images. Zhang [202] has also presented a method for camera calibration using a 1D object pivoting on a fixed point, thereby filling in the missing dimension in the use of calibration rigs. One-dimensional objects are also used in [27, 26, 184] for camera calibration. Figure 2.1 shows some previous camera calibration objects. Typically, methods falling into this category provide very accurate results. In some applications, however, it might not be possible to extract camera information off-line by using calibration objects due to the inaccessibility of the camera.

2.1.2 Self-Calibration

Self-calibration differs from conventional calibration, where the camera internal parameters are determined from the image of a known calibration grid or from properties of the scene. The prefix self- is added as soon as the world's Euclidean structure is unknown, which can be seen as a case of "0D" calibration. In self-calibration the metric properties of the cameras are determined directly from constraints on the internal and/or external parameters.

The first self-calibration method, originally introduced into computer vision by [56], involves the use of the Kruppa equations. The Kruppa equations are two-view constraints that require only the fundamental matrix to be known, and consist of two independent quadratic equations in the elements of the dual of the absolute conic. Luong and Faugeras [139] have shown that the Kruppa equations are equivalent to the Trivedi constraints [171] and the Huang-Faugeras constraints [70, 65], although the equivalence does not mean that they will produce the same results when used in self-calibration algorithms. Algorithms for computing the focal lengths of two cameras given the corresponding fundamental matrix and knowledge of the remaining intrinsic parameters are provided by Hartley [65] and Bougnoux [16]. Mendonça [110] generalized the results in [65, 16] to an arbitrary number of cameras and introduced a built-in method for the detection of critical motions for each pair of images in the sequence. Thorough analyses of critical motions which would result in ambiguous solutions by Kruppa-based methods are described in [163] and [93].

An alternative direct method for self-calibration was introduced by Triggs [172], which estimates the absolute dual quadric over many views. The basic idea is to transfer a constraint on the dual image of the absolute conic to a constraint on the absolute dual quadric, and hence determine the matrix representing the absolute dual quadric, from which a rectifying 3D homography can be decomposed that transforms a projective reconstruction into a metric one. Heyden and Åström [62] showed that metric reconstruction was possible knowing only the skew and aspect ratio, and Pollefeys et al. [134] and Heyden and Åström [63] showed that zero skew alone was sufficient. In addition, Pollefeys [134] developed a practical method for self-calibration of multiple cameras with varying intrinsic parameters, and showed results for real sequences.

Special motions can also be used for self-calibration. Agapito et al. [4] and Seo and Hong [148] solved the self-calibration of a rotating and zooming camera using the infinite homography constraint. Before their work, Hartley [66] solved the special case where the camera's internal parameters remain constant throughout the sequence. Frahm and Koch [54] showed that it is also possible to solve the problem of a generally moving camera with varying intrinsics but known rotation information. Triggs [173] provided a solution for self-calibration from scene planes when the internal parameters are constant, and Zisserman et al. [196] presented a method for self-calibration of a stereo rig. For planar motion of a monocular camera, the original method was published by Armstrong et al. in [8]. In more recent work [61], Gurdjos and Sturm consider the problem of camera self-calibration from images of a planar object with unknown Euclidean structure.

2.2 Video Analysis

2.2.1 Camera Motion Characterization

The recognition of parts of the video in which the camera is static, zooming, or rotating has been achieved using rather dedicated methods. Usually, these methods rely on the exploitation of motion vectors issued from block-matching techniques, or depend on the search for specific distributions of motion vectors or a few global representative motion parameters [204]. The MPEG-1 or MPEG-2 stream may also be directly exploited for camera motion characterization [136, 77], using motion vectors related to P- and B-frames. The method of [151] is resilient to the presence of mobile objects of significant size, by computing so-called optical flow streams built from the dominant optical flow over some extent in time. The algorithm depends, however, on many thresholds and assumes a constant camera motion type during the time extent over which the optical flow streams are built. The qualitative interpretation method [13] employs divergence, curl and hyperbolic terms, expressed from a 2D affine camera motion model, for a physically meaningful interpretation of the dominant motion. Nevertheless, these 2D transformation based methods have inherent difficulties in cases where the scenes have dense depths and the cameras are moving, since the 2D transformation is depth dependent when the camera is not fixed. In [47], Duan et al. develop nonparametric motion models, represented by the mean shift procedure, to overcome diverse camera shots and frequent occurrences of bad optical flow estimation.

2.2.2 Layer Segmentation

Automatic extraction of layers from a video sequence has broad applications, such as video compression and coding, recognition, and synthesis. In an earlier work, Wang and Adelson [181] propose the use of optical flow to estimate the motion layers, where each layer corresponds to a smooth motion field. Ayer and Sawhney [5] combine Minimum Description Length (MDL) and Maximum-Likelihood Estimation (MLE) in an Expectation-Maximization (EM) framework to estimate the number of layers and the motion model parameters for each layer. Several other approaches [92, 175, 145] use Maximum A Posteriori (MAP) or MLE estimation of the model parameters assuming different constraints and motion models. Another class of motion segmentation approaches groups the pixels in a region by using linear subspace constraints. Ke and Kanade [87, 86] expand seed regions into initial layers by using k-connected components. After enforcing a low-dimensional linear affine subspace constraint on multiple frames, they cluster these initial layers into several groups and assign the image pixels to these layers. Zelnik-Manor and Irani [203] use the homography subspace for planar scenes to extract a specific layer and to register the images based on this layer.

and Irani [203] use the homography subspace for planar scenes to extract a specific layer and to register the images based on this layer.

In motion segmentation, only a few researchers have tried to formulate the occlusion problem between overlapping layers. Giaccone and Jones [59] propose a four-label system, "background, uncovered, covered and foreground", to label image pixels. The uncovered and covered pixels correspond to reappeared or occluded pixels between two frames, respectively. Based on the observation that the segmentation boundary is usually not accurate due to occlusion, Bergen and Meyer [15] propose to use motion estimation errors to refine the segmentation boundary, but no occlusion pixels are identified in their results. In the 2D motion segmentation area, Shi and Malik [152] use the normalized graph cut to extract layers from a video sequence. Wills et al. [182] propose the use of graph cuts to extract layers between two wide-baseline images. After employing the RANSAC (Random Sample Consensus) technique, they first cluster the correspondences into several initial layers, then perform the dense pixel assignment via graph cuts. Beyond 2D motion segmentation, 3D motion (or multi-body) segmentation of multiple moving objects is another interesting topic in computer vision. In this area, the layer clustering is based on 3D geometry information, such as fundamental matrices [167] and trifocal tensors [178]. Currently, this multi-body segmentation is mainly focused on sparse point segmentation and clustering; the dense motion segmentation of 3D scenes is still an open research problem.

2.2.3 Video Alignment

The problem of synchronizing video sequences has become an active area in the computer vision community since Stein's first method [162]. Stein achieved the alignment of video sequences using tracking data obtained from multiple cameras, assuming the cameras are static and the images are related by a 2D homography. Giese and Poggio [60] proposed a method to find the spatiotemporal alignment of two video sequences using the dynamic shift of the time stamp of the spatial information. They assumed that a 2D action trajectory can be represented as a linear combination of prototypical views, and that the effect of viewpoint changes can be expressed by varying the coefficients of the linear combination. Caspi and Irani [32] proposed a direct approach to align two surveillance videos by finding the spatiotemporal transformation that minimizes the sum of squared differences between the two sequences. In [42], they also studied the problem of matching two unsynchronized video sequences of the same dynamic scene recorded by different stationary uncalibrated video cameras, based on matching space-time trajectories of moving objects. More recent methods [170, 197, 161, 34, 166, 185, 96] tackle the problem of automatic video synchronization for independently moving cameras and overlapping dynamic scenes.

The alignment of non-overlapping sequences was first addressed by Caspi and Irani [33] based on the assumption that the two sequences are captured by a stereo rig. In the two video sequences, the same motion induces the "same" changes in time. This correlated temporal behavior was used to recover spatiotemporal transformations between sequences. Wolf and Zomet [190] proposed a method for self-calibrating a moving rig when the camera internal parameters are allowed to change. The relation between the optical axes of the cameras is expressed using multilinear invariants, and a solution is extracted from these invariants. Moreover, a fundamental matrix is computed for synchronizing the sequences. Rao et al. [140] extended [42] to synchronize non-overlapping sequences of the same events, such as human activities, and were able to cope with a dynamic timeline.

2.3 Video Post-production

The modification of the object-based content of a video sequence is an essential aspect of many video post-production applications [129]. This generally consists of the composition [31, 43], the removal [82, 189, 209], or the modification of the trajectories of rigid (such as a car) or nonrigid (such as a human) video objects.

2.3.1 Video Cut and Paste

In image composition, a new image, I, can be blended from a background image, B, and a foreground image, F, with its alpha matte, α, by the compositing equation [129]:

I = αF + (1 − α)B. (2.1)
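As a minimal, illustrative implementation of equation (2.1) (a sketch, not the dissertation's compositing pipeline), the following NumPy snippet blends a foreground over a background using a per-pixel alpha matte; the array names are hypothetical.

```python
import numpy as np

def composite(foreground, background, alpha):
    """Per-pixel alpha compositing, I = alpha*F + (1 - alpha)*B (Eq. 2.1).

    foreground, background: HxWx3 float arrays in [0, 1]
    alpha: HxW float array in [0, 1], broadcast over the color channels
    """
    alpha = alpha[..., np.newaxis]          # HxW -> HxWx1 for broadcasting
    return alpha * foreground + (1.0 - alpha) * background

# Toy example: a 2x2 red foreground over a blue background, half transparent.
F = np.zeros((2, 2, 3)); F[..., 0] = 1.0    # red
B = np.zeros((2, 2, 3)); B[..., 2] = 1.0    # blue
a = np.full((2, 2), 0.5)
print(composite(F, B, a))                   # [0.5, 0, 0.5] per pixel
```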

On the other hand, the separation of α, F and B from a given image I is called matting. Compositing, as described by equation (2.1), is a straightforward operation. Matting is inherently under-constrained, since F, B, and I have three color channels each, and for each pixel we have a problem with three equations and seven unknowns. Most matting techniques solve it by controlling the background, adding images, or adding a priori assumptions about the foreground, background, and alpha. Some of these constraints and heuristics are nicely summarized by Smith and Blinn [14]. A

constrained, since F, B, and I have three color channels each, and for each pixel we have a prob- lem with three equations and seven unknowns. Most matting techniques solve it by controlling the background, adding images, or adding a priori assumptions about the foreground, background, and alpha. Some of these constraints and heuristics are nicely summarized by Smith and Blinn [14]. A

more recent improvement on blue screen matting was developed by Mishima [112] based on repre-

sentative foreground and background color samples. An alternative is to use dual-film techniques,

such as the infrared (IR) matting system designed by Debevec et al. [46] and the invisible-signal keying system developed by Ben-Ezra [11].

When filming with specialized backdrops is impossible or impractical, it is necessary to pull a matte from a photograph of the foreground object taken with a natural background. This problem is called natural image matting. Rotoscoping is a commonly used technique for solving this problem. Mitsunaga et al. [120] developed the AutoKey system to improve the rotoscoping process for video matting. Agarwala et al. proposed a keyframe-based system to effectively reduce the amount of human effort for rotoscoping [6]. Recently, there have also been further developments in natural image matting and compositing. Statistical approaches [143, 24, 72, 22, 3] have been proposed to address the deficiencies of rotoscoping. Sun et al. [150] formulated the problem of natural image matting as one of solving Poisson equations with the matte gradient field, while Rother et al. solved this problem using graph-cut based optimization [141]. Zongker et al. introduced environment matting and compositing to generalize the traditional matting and compositing processes to incorporate refraction and reflection [208]. Following their work, Chuang et al. proposed a novel compositing model for shadows and a practical process for shadow matting and compositing

[30], and thus improved the environment matting algorithm to acquire more accurate environment mattes.

2.3.2 Video Completion

Another important task in video post-production is video completion. The goal is to repair the missing pixels in holes created by damage to the video or by the removal of selected objects. Typically, this goal is achieved by prediction from information in the undamaged frames. Bertalmio et al. [10] proposed a frame-by-frame PDE-based video inpainting approach, which extends image inpainting techniques to video sequences. Wexler et al. [189] described a method for space-time completion of large space-time "holes" in video sequences, in which they treat video completion as a global optimization problem and enforce global spatio-temporal coherence through a well-defined objective function. A similar approach was proposed in [76].

Patwardhan et al. [137] extended the image inpainting techniques proposed in [35] to video inpainting by assigning a priority to spatio-temporal locations in the video and copying the spatial patch with the highest priority to holes in the background/foreground, frame by frame. Jia et al. [82] proposed an approach to repair videos with a periodically moving object under a static or moving camera. In their framework, the background is separated into multiple layers, which are repaired individually by a layered mosaics approach and then merged together. A homography blending technique is also used to remove small holes and overlaps at the boundaries of different layers. When repairing foreground regions of periodic motion, sample moving elements, or movels, which describe the periodic characteristics of the motion, are extracted. The missing motion pixels are then repaired by aligning sample movels to the damaged movel, which also ensures temporal coherence in the resulting video.

CHAPTER 3

BACKGROUND GEOMETRY

3.1 Introduction

Projective geometry, the theory of invariants of the group of projective transformations [157], provides this thesis with the basic mathematical background upon which an effective and robust video analysis and post-production framework is developed. If Euclidean geometry is interpreted as the geometry of the straight edge and compass, projective geometry is the geometry of the straight edge alone. In a nutshell projective geometry is linear algebra disguised as geometry.

This chapter presents a brief review of some background on projective geometry that will be necessary for understanding the remainder of this dissertation. No attempt has been made to provide a comprehensive survey, which can be found in many good references on projective geometry [157] and computer vision [49, 55, 75]. The treatment of the subject presented herein is non-standard, and some rather advanced topics are covered without preliminary introduction; a reader proficient in projective geometry may want to skip this chapter and proceed to Chapter 4.

3.2 History

The drop from the three-dimensional world to a two-dimensional image is a projection process in which we lose one dimension. The usual way of modelling this process is by central or perspective projection, in which a ray is drawn from a 3D world point through a fixed point in space, called the center of projection. Perspective projection is a framework used by artists, designers, engineers, etc. to represent three-dimensional objects on a two-dimensional surface. An artist uses perspective projection to represent nature or objects in the most effective way possible. It evolved from the "Costruzione Legittima" that was probably invented in the early fifteenth century, most likely by Filippo Brunelleschi. Leon Battista Alberti, Uccello and Piero della Francesca all improved upon Brunelleschi's theories.

These pre-Renaissance and Renaissance artists noticed that, although parallel lines never meet in Euclidean space, their perspective projections may meet at a so-called vanishing point on the horizon. Such early observations about perspective projection were based on a single vanishing point, and other families of parallel lines were not yet expected to meet at some point in the distance.

Prior to Wollaston's Camera Lucida of 1806 and Varley's Graphic Telescope of 1811, a range of devices was developed around the 15th and 16th centuries to aid artists in correctly illustrating linear perspective. Leon Battista Alberti, Leonardo Da Vinci and later Albrecht Dürer and the lesser-known Jacob de Keyser developed these devices. One example is shown in Figure

3.1.

Figure 3.1: Alberti's Grid (also called Alberti's Veil), 1450, also known as The Square Grid of the

Renaissance.

3.3 Notations

This thesis employs quite a standard notation convention, mostly consistent with the textbook by

Hartley and Zisserman [75]:

• 3D points and lines in general position are denoted by upper case bold symbols (e.g. X, L);

• 3D (2D) planes are denoted by upper (lower) case bold Greek symbols, e.g. Π (π);

• image positions and vectors by lower case bold symbols (e.g. x, l);

• values and scalars by normal face symbols (e.g. f, λ and s);

• matrices by upper case bold symbols (e.g. H, P);

• sets by symbols in "script" or "calligraphic" font (e.g. S);

• vectors are expressed in the homogeneous coordinate system1;

• equality up to multiplication by a non-zero scale factor is indicated by ∼, which is very common when working with homogeneous coordinates.

When necessary, further notation choices are described in each chapter.

3.4 Pin-hole Camera Model

A real-world camera can be modeled by a pin-hole or perspective camera model. More comprehensive imaging models, including single viewpoint [69] and non-single viewpoint [147] models, are outside the scope of this thesis. Algebraically, a pin-hole camera (see Figure 3.2) projects a region of R3 lying in front of the camera into a region of the image plane R2, based on the principle of collinearity. As

1Consider a point X in n-dimensional space with Cartesian coordinates given by the n-tuple (X1, X2, ..., Xn) ∈ R^n. The expression of X in homogeneous coordinates is the set of (n + 1)-tuples {w(X1, X2, ..., Xn, 1), ∀w ∈ R\{0}}. Conversely, given the homogeneous coordinates {w(X1, X2, ..., Xn, Xn+1), ∀w ∈ R\{0}} of a point X in n-dimensional space, the Cartesian coordinates of X are given by (X1, X2, ..., Xn)/Xn+1 if Xn+1 ≠ 0. If Xn+1 = 0, the point X is said to be at infinity in direction (X1, X2, ..., Xn), and it cannot be represented in Cartesian coordinates.


Figure 3.2: Pinhole camera geometry. C is the camera center and p the principal point. The camera center is here placed at the coordinate origin. Note that the image plane is placed in front of the camera center. Figures courtesy of Hartley and Zisserman [64].


Figure 3.3: The Euclidean transformation between the world and camera coordinate frames. Fig-

ures courtesy of Hartley and Zisserman [64].

is well known, a 3D point M = [X Y Z 1]^T and its corresponding image projection m = [u v 1]^T are related via a 3 × 4 matrix P as

$\mathbf{m} \sim \underbrace{K[\,\mathbf{r}_1\ \mathbf{r}_2\ \mathbf{r}_3\ \mathbf{t}\,]}_{P}\,\mathbf{M}, \qquad K = \begin{bmatrix} f & \gamma & u_0 \\ 0 & \lambda f & v_0 \\ 0 & 0 & 1 \end{bmatrix},$ (3.1)

where r1, r2, r3 are the columns of the 3 × 3 orthonormal rotation matrix R; t = −RC is the translation vector, with C = [Cx Cy Cz]^T representing the coordinates of the camera center in the world coordinate frame; and K is a nonsingular 3 × 3 upper triangular matrix known as the camera calibration matrix, which includes five parameters: the focal length f, the skew γ, the aspect ratio λ and the principal point at (u0, v0).

If the image axes are orthogonal to each other, which is often the case [75], γ will be equal

to zero. The intrinsic parameters in K define the internal imaging geometry of the camera, while

the extrinsic parameters (R and t, see Figure 3.3) relate the world coordinate frame to that of the

camera. Camera calibration is the process of estimating these parameters. A camera is said to be

calibrated if its intrinsic parameters are known. If both the intrinsic and the extrinsic parameters of

a camera are known, then the camera is said to be fully calibrated.
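To make the projection model of Equation (3.1) concrete, here is a minimal NumPy sketch (the numerical values are arbitrary examples, not taken from this dissertation) that builds K, forms P = K[R | t] with t = −RC, and projects a homogeneous world point:

```python
import numpy as np

def intrinsic_matrix(f, aspect, skew, u0, v0):
    """Camera calibration matrix K as in Eq. (3.1)."""
    return np.array([[f, skew, u0],
                     [0.0, aspect * f, v0],
                     [0.0, 0.0, 1.0]])

def projection_matrix(K, R, C):
    """P = K [R | t] with t = -R C, where C is the camera center in world coordinates."""
    t = -R @ C
    return K @ np.hstack([R, t.reshape(3, 1)])

# illustrative values only
K = intrinsic_matrix(f=1000.0, aspect=1.0, skew=0.0, u0=320.0, v0=240.0)
R = np.eye(3)                      # camera aligned with the world frame
C = np.array([0.0, 0.0, -10.0])    # camera center 10 units behind the world origin
P = projection_matrix(K, R, C)

M = np.array([1.0, 2.0, 5.0, 1.0])   # homogeneous world point [X Y Z 1]^T
m = P @ M
m = m / m[2]                          # back to inhomogeneous pixel coordinates
print(m[:2])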

3.5 Perspective Mapping

3.5.1 Planar Homography

An interesting specialization of the general central projection described above is a plane-to-plane

projection or a 2D-2D projective mapping. Points on a plane are mapped to points on another

plane by a plane-to-plane homography, also known as a planar projective transformation. It is a

bijective (thus invertible) mapping induced by the star of rays centered in the camera center (center

of projection). Planar homographies arise, for instance, when a world planar surface is imaged.

A planar surface viewed from two different viewpoints induces a homography between the two


Figure 3.4: Geometrically, a planar object, π1, and its shadow, illuminated by a point light source

v and cast on a ground plane π, are related by a planar homology.

images. Points on the world plane can be transferred from one image to the other by means of a

homography mapping.

3.5.2 Planar Homology

A planar homology is a special planar projective transformation which has a line of fixed points, called the axis, and a distinct fixed point v, not on the axis l, called the vertex of the homology.

In the error-free case, imaged shadow relations (illuminated by a point light source) are modeled

by a planar homology [157, 180, 41] (see Figure 3.4). In this imaged shadow relation, the vertex

v is the image of the light source, and the axis, l, is the image of the intersection of the vertical plane π1 and the ground plane π, on which π1 casts its shadow. Under this transformation, points

on the axis are mapped to themselves. Each point off the axis, e.g. t2, lies on a fixed line t2s2

through v intersecting the axis at i2, and is mapped to another point s2 on that line. Note that i2 is the intersection in the image plane, although the light ray t2s2 and the axis, l, are unlikely to intersect in the 3D world. Note also that the light source v does not have to be at infinity for the model to remain a planar homology, as long as it is a point light source, i.e. all light rays are concurrent.

One important property of a planar homology is that the corresponding lines, i.e. lines through

pairs of corresponding points, intersect with the axis: for example, the lines t1t2 and s1s2 intersect

at a point a on the axis l. Another important property of a planar homology is that the cross ratio

defined by the vertex, v, the corresponding points, ti and si, and the intersection, ii, of the line tisi

with the axis, is the characteristic invariant of the homology, and is the same for all corresponding

points. For example, the cross ratios of the four points {v, t1; s1, i1} and {v, t2; s2, i2} are equal.
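To make the fixed-point structure concrete, the short sketch below constructs a planar homology from a chosen vertex v, axis l and characteristic ratio μ, using the standard parameterization H = I + (μ − 1) v lᵀ / (vᵀl) (the specific numbers are arbitrary), and checks that axis points are fixed and that each off-axis point is mapped along a line through the vertex:

```python
import numpy as np

def planar_homology(v, l, mu):
    """Planar homology with vertex v, axis l (the line l^T x = 0) and ratio mu."""
    v = np.asarray(v, dtype=float)
    l = np.asarray(l, dtype=float)
    return np.eye(3) + (mu - 1.0) * np.outer(v, l) / (v @ l)

v = np.array([2.0, 5.0, 1.0])      # vertex (e.g. the imaged light source)
l = np.array([0.0, 1.0, -1.0])     # axis (a line not through v)
H = planar_homology(v, l, mu=0.5)

# a point on the axis is mapped to itself (up to scale)
x_axis = np.array([3.0, 1.0, 1.0])                 # satisfies l^T x = 0
assert np.allclose(np.cross(H @ x_axis, x_axis), 0.0)

# an off-axis point t maps to a point s on the line joining t and the vertex v
t = np.array([1.0, 3.0, 1.0])
s = H @ t
line_tv = np.cross(t, v)                           # line through t and v
assert abs(line_tv @ s) < 1e-9                     # s lies on that line
```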

3.6 Image of the Absolute Conic

Before the introduction of the image of the absolute conic, the expression of a conic in matrix form is

first described. Consider the equation

$a x_1^2 + 2b x_1 x_2 + d x_2^2 + 2c x_1 + 2e x_2 + f = 0$ (3.2)

of a point conic C. In homogeneous coordinates, (3.2) becomes

xT Cx = 0, (3.3)

where x = [x1 x2 1]^T and

$C \sim \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix}.$ (3.4)

The matrix C is the representation of the conic C in homogeneous coordinates. Note that the matrix C is always symmetric.

The Image of the Absolute Conic ω is an imaginary point conic directly related to the camera internal matrix K in Equation (3.1) by ω = K^{-T} K^{-1}, which can be expanded, up to the non-zero scale f^2, as:

$\omega \sim \begin{bmatrix} 1 & -\dfrac{\gamma}{f\lambda} & \dfrac{\gamma v_0 - \lambda f u_0}{f\lambda} \\[6pt] -\dfrac{\gamma}{f\lambda} & \dfrac{f^2+\gamma^2}{f^2\lambda^2} & -\dfrac{\gamma^2 v_0 - \gamma\lambda f u_0 + v_0 f^2}{f^2\lambda^2} \\[6pt] \dfrac{\gamma v_0 - \lambda f u_0}{f\lambda} & -\dfrac{\gamma^2 v_0 - \gamma\lambda f u_0 + v_0 f^2}{f^2\lambda^2} & \dfrac{v_0^2(f^2+\gamma^2) - 2\gamma v_0 \lambda f u_0}{f^2\lambda^2} + f^2 + u_0^2 \end{bmatrix}.$ (3.5)

Instead of directly determining K, it is possible to compute the symmetric matrix ω or its in- verse (the dual image of the absolute conic), and then compute the calibration matrix using either

Cholesky factorization [131] or uniquely as [200, 40]

$\lambda = \sqrt{1/(\omega_{22} - \omega_{12}^2)},$
$v_0 = (\omega_{12}\omega_{13} - \omega_{23})/(\omega_{22} - \omega_{12}^2),$
$u_0 = -(v_0\,\omega_{12} + \omega_{13}),$
$f = \sqrt{\omega_{33} - \omega_{13}^2 - v_0(\omega_{12}\omega_{13} - \omega_{23})},$
$\gamma = -f\lambda\,\omega_{12},$ (3.6)

Figure 3.5: Examples of vanishing points in real images and paintings. The vanishing points can be computed by intersecting image lines along parallel directions, which are plotted in the same color. The green line on the right is the horizon line.

where the subscripts of ωij denote the element’s row (i) and column (j) in the matrix ω. This

typically leads to simple, and in particular, linear calibration equations as shown next.
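A small numerical sketch of this recovery step: given an ω = K⁻ᵀK⁻¹ normalized so that ω11 = 1 (here generated synthetically from a known K with arbitrary example values), the closed-form expressions of Equation (3.6) return the five intrinsic parameters:

```python
import numpy as np

def intrinsics_from_omega(w):
    """Recover (f, aspect, skew, u0, v0) from the IAC via Eq. (3.6).

    w must be symmetric; it is rescaled internally so that w[0, 0] = 1.
    """
    w = w / w[0, 0]
    lam = np.sqrt(1.0 / (w[1, 1] - w[0, 1] ** 2))
    v0 = (w[0, 1] * w[0, 2] - w[1, 2]) / (w[1, 1] - w[0, 1] ** 2)
    u0 = -(v0 * w[0, 1] + w[0, 2])
    f = np.sqrt(w[2, 2] - w[0, 2] ** 2 - v0 * (w[0, 1] * w[0, 2] - w[1, 2]))
    gamma = -f * lam * w[0, 1]
    return f, lam, gamma, u0, v0

# build omega from a known K and check that the parameters are recovered
f, lam, gamma, u0, v0 = 1000.0, 1.06, 0.0, 8.0, 6.0
K = np.array([[f, gamma, u0], [0.0, lam * f, v0], [0.0, 0.0, 1.0]])
Kinv = np.linalg.inv(K)
omega = Kinv.T @ Kinv
print(intrinsics_from_omega(omega))   # approximately (1000, 1.06, 0, 8, 6)
```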

3.7 Calibration from Vanishing Points

Parallel world lines, such as railway lines, are imaged as converging lines, and their intersection is the vanishing point for the direction of the railway. Similarly, parallel world planes define a common vanishing line in the image, on which the vanishing points of all directions in those planes lie. The horizon line is a special vanishing line, the vanishing line of the ground plane, seen in images as the apparent intersection of the ground and the sky (see Figure 3.5). Vanishing points and vanishing lines are

extremely powerful geometric cues. They convey a wealth of information about direction of lines

and orientation of planes. These entities can be estimated directly from the images and no explicit

knowledge of the relative geometry between camera and viewed scene is required [113, 104, 149,

179]. Sometimes, they lie outside the physical boundaries of images or paintings, but this does not affect the computations.

Figure 3.6: The vanishing points of the world lines with directions d1 and d2 in 3-space are the intersections v1 and v2 of the image plane π with the rays through the camera C with directions d1 and d2. θ is the angle between the two rays d1 and d2.

In [105], Liebowitz and Zisserman formulated the calibration constraints provided by vanishing

points of mutually orthogonal directions in terms of the geometry of the ω. These intra-image

orthogonal constraints, together with our novel inter-image constraint associated with the “weak”

pole-polar relationship, will be used later in Section 4.2 to derive a simple technique for camera

calibration from shadows. The weak pole-polar relationship here means that the two independent

constraints on ω arising from the pole-polar relationship cannot be identified from a single view.

A simple derivation of Liebowitz and Zisserman’s result is given below.

In projective 3-space, the plane at infinity, π∞, is the plane of directions, and all lines with the

same direction intersect π∞ in the same point. The vanishing point is simply the image of this

intersection (see Figure 3.6). Therefore, if a line has direction d, then it intersects π∞ at the point

X∞ = [d^T 0]^T. Consequently, the vanishing point, v, of the lines with direction d is the image of X∞ (for simplicity, the world coordinate frame will be chosen to coincide with the camera coordinate frame) [75]

$\mathbf{v} \sim P\mathbf{X}_\infty = K[I \mid \mathbf{0}][\mathbf{d}^T\ 0]^T = K\mathbf{d}.$ (3.7)

Figure 3.7: Points v1 and v2, with polars l1 and l2 respectively. Since point v1 lies on the polar of point v2, the points v1 and v2 are conjugate with respect to the conic ω, and vice versa.

Conversely, the direction d is obtained from the vanishing point v as d = K−1v up to a scale.

Note that v depends only on the direction d of the line, not on its position. The angle between two

rays, with directions d1 and d2 corresponding to vanishing points v1 and v2 respectively, may be

obtained from the cosine formula for the angle between two vectors:

$\cos(\theta) = \frac{\mathbf{d}_1^T\mathbf{d}_2}{\sqrt{\mathbf{d}_1^T\mathbf{d}_1}\sqrt{\mathbf{d}_2^T\mathbf{d}_2}} = \frac{(K^{-1}\mathbf{v}_1)^T(K^{-1}\mathbf{v}_2)}{\sqrt{(K^{-1}\mathbf{v}_1)^T(K^{-1}\mathbf{v}_1)}\sqrt{(K^{-1}\mathbf{v}_2)^T(K^{-1}\mathbf{v}_2)}} = \frac{\mathbf{v}_1^T\omega\,\mathbf{v}_2}{\sqrt{\mathbf{v}_1^T\omega\,\mathbf{v}_1}\sqrt{\mathbf{v}_2^T\omega\,\mathbf{v}_2}}.$ (3.8)

In practice, it is often the case that the lines and planes which give rise to vanishing points are

orthogonal, i.e. cos(θ) in Equation (3.8) is zero. In this case the vanishing points, v1 and v2, of

two perpendicular world lines satisfy v1^T ω v2 = 0. Geometrically, the two points v1 and v2 are said to be conjugate with respect to the conic ω, as shown in Figure 3.7. The orthogonality

relations provide equations that are linear in the elements of ω, and hence can be used to calibrate the camera, e.g. [25, 105, 200, 153, 186, 23]. The earlier result reported by Caprile and Torre in

[44] is a special case where the camera has zero skew and unit aspect ratio and, thus, ω in Equation

(3.5) degenerates to

$\omega \sim \begin{bmatrix} 1 & 0 & -u_0 \\ 0 & 1 & -v_0 \\ -u_0 & -v_0 & u_0^2 + v_0^2 + f^2 \end{bmatrix}.$ (3.9)

Equivalently, ω is the conic

$(x - u_0)^2 + (y - v_0)^2 + f^2 = 0,$ (3.10)

which may be interpreted as a circle aligned with the axes, centered on the principal point, and

with radius if.
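Under this zero-skew, unit-aspect-ratio model, each pair of orthogonal vanishing points yields one equation linear in (u0, v0, u0^2 + v0^2 + f^2). A hedged sketch of the classical three-vanishing-point calibration that follows (the synthetic test data and function names are ours):

```python
import numpy as np

def calibrate_from_three_orthogonal_vps(v1, v2, v3):
    """Zero-skew, unit-aspect calibration from three mutually orthogonal vanishing
    points, using v_i^T omega v_j = 0 with omega as in Eq. (3.9).

    Each pair gives one equation linear in (u0, v0, c), with c = u0^2 + v0^2 + f^2.
    """
    def row(p, q):
        x1, y1, w1 = p
        x2, y2, w2 = q
        # coefficients of [u0, v0, c] and the constant term of the linear equation
        return [-(x1 * w2 + x2 * w1), -(y1 * w2 + y2 * w1), w1 * w2], -(x1 * x2 + y1 * y2)

    A, b = [], []
    for p, q in [(v1, v2), (v1, v3), (v2, v3)]:
        coeffs, rhs = row(np.asarray(p, float), np.asarray(q, float))
        A.append(coeffs)
        b.append(rhs)
    u0, v0, c = np.linalg.solve(np.array(A), np.array(b))
    f = np.sqrt(c - u0 ** 2 - v0 ** 2)
    return f, u0, v0

# synthetic test: image the three world axis directions with a known camera
rng = np.random.default_rng(0)
f, u0, v0 = 800.0, 10.0, -5.0
K = np.array([[f, 0.0, u0], [0.0, f, v0], [0.0, 0.0, 1.0]])
Rv = np.linalg.qr(rng.standard_normal((3, 3)))[0]   # a random orthonormal frame
vps = [K @ Rv[:, i] for i in range(3)]              # v = K d, Eq. (3.7)
print(calibrate_from_three_orthogonal_vps(*vps))    # approximately (800, 10, -5)
```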

Zhang's flexible calibration method [200] uses the orthogonality of the columns r1 and r2 of the rotation matrix R in Equation (3.1), i.e. p1^T ω p2 = 0, where pi (hi in Zhang's notation [200], i = 1, 2) denotes the ith column of the projection matrix P in Equation (3.1). This is actually equivalent to the constraint vx^T ω vy = 0, where vx (respectively vy) is the vanishing point along the world X-axis (respectively Y-axis) direction. To better understand this, note that p1 is the scaled vanishing point vx of the ray along the world X-axis direction [1 0 0]^T,

$\mathbf{v}_x \sim [\mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3\ \mathbf{p}_4][1\ 0\ 0\ 0]^T = \mathbf{p}_1.$ (3.11)

Similarly, p2 is the scaled vanishing point vy of the ray along the world Y-axis direction [0 1 0]^T. It is important to mention that the above derivation does not apply to the normal property of r1 and r2, i.e. p1^T ω p1 − p2^T ω p2 = 0 used in [200, 153], since vx and vy are likely scaled differently. They [200, 153] are able to determine these scales because they know the world-to-image homographies. In [186, 23], they utilize the pole-polar relationship with respect to ω: the vanishing point v of the normal direction to a plane is related to the plane vanishing line l as l = ωv. Since any vanishing point x that is orthogonal to v must satisfy x^T ω v = 0, and this holds for any point x satisfying x^T l = 0, it follows that l = ωv.

3.8 Epipolar geometry

The previous subsections have discussed the single view geometry. Among the geometric proper-

ties of a set of two cameras, the widely known property in computer vision is the epipolar geometry

[49]. It is algebraically represented by the fundamental matrix F,

$\mathbf{x}'^T F\,\mathbf{x} = 0,$ (3.12)

where x and x′ are a pair of corresponding points in two images. F is also known as the uncalibrated version of the essential matrix [49], E, because

$F = K'^{-T} E\, K^{-1},$ (3.13)

where K′ and K are matrices representing the internal calibration parameters of the stereo cameras.

In general, both matrices F and E are of rank two. For an arbitrary stereo pair, the rank two

constraint is the only constraint on F, and thus F generally has seven degrees of freedom. The

essential matrix, E = [t]×R, where the rotation matrix R and the translation vector t represent

the motion between the two positions and orientations of the cameras, has only five degrees of freedom, since both R and t have three degrees of freedom, but there is an overall scale ambiguity.

Note that the essential matrix, E, must have a zero singular value and two equal nonzero singular values, which is also known as the Huang-Faugeras constraint [70, 65].
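These relations are easy to verify numerically. A brief sketch with synthetic cameras (all values are illustrative), constructing E = [t]×R and F as in Equation (3.13), and checking the rank-2 property, the equal nonzero singular values of E, and the epipolar constraint:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x such that [t]_x y = t x y."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# synthetic intrinsics and relative motion (illustrative values only)
K = np.array([[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]])
Kp = K.copy()                                      # second camera, same intrinsics
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])   # small rotation about the Y-axis
t = np.array([1.0, 0.2, 0.1])

E = skew(t) @ R                                    # essential matrix
F = np.linalg.inv(Kp).T @ E @ np.linalg.inv(K)     # Eq. (3.13)

print(np.linalg.matrix_rank(F))                               # 2 (rank-two constraint)
print(np.round(np.linalg.svd(E, compute_uv=False), 6))        # [s, s, 0]: Huang-Faugeras

# a corresponding point pair satisfies the epipolar constraint x'^T F x = 0
X = np.array([0.5, -0.3, 4.0])            # 3D point in the first camera frame
x = K @ X                                 # projection in view 1 (P = K[I | 0])
xp = Kp @ (R @ X + t)                     # projection in view 2 (P' = K'[R | t])
print(float(xp @ F @ x))                  # ~0 up to numerical precision
```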

CHAPTER 4

CAMERA CALIBRATION USING OBJECTS

This chapter proposes new inter-image calibration constraints, which extend the current state-of- the-art calibration techniques to situations where only one vanishing point is known. Therefore, these constraints are particularly useful for scenes where only minimal information is available.

First, two vertical objects and their parallel cast shadows on the planar ground plane are used as a configuration to demonstrate the efficiency of the inter-image constraints. In this configuration, the camera parameters and the orientation of an infinite light source, e.g. the sun, can be recov- ered. The implementation details and degenerate configurations of the proposed algorithm are also described. Second, the inter-image constraints are applied to symmetric objects and 1D objects for camera calibration. Finally, applications to reconstruction of partially viewed symmetric objects and image based metrology are demonstrated.

4.1 Introduction

The camera calibration method using vertical objects and their cast shadows introduced in this

chapter relates to the existing methods [44, 200, 105, 153, 106, 186, 23, 89] that explore the cali-

bration constraints provided by vanishing points of orthogonal directions or pole-polar relationship

in terms of the geometry of the Image of the Absolute Conic (see Section 3.7). Similar to other

calibration methods from vanishing points, which require only the presence of mutually orthog-

onal directions, the technique presented herein requires only the calibration target to be vertical

objects and their parallel cast shadows on the ground plane, which provide two mutually orthogo-

nal directions. However, we relax some of the constraints imposed in other calibration algorithms such as [105], which requires a minimum of four views of the above configuration to determine the aspect ratio, the focal length, and the principal point of the camera. To this end we propose novel inter-image constraints on the camera intrinsic parameters. Generally, it is shown that two corresponding image projections of the same 3D line on the ground plane provide two constraints on the camera intrinsic parameters. Consequently, two views are sufficient to determine the aspect ratio, the focal length, and the principal point.

The proposed techniques fall halfway between calibration from known structures and full-fledged self-calibration [186], i.e. they require calibration objects, but do not need the measure-

ments of any distance or angle in the 3D world. Although some recent efforts using identifiable

targets of known shape such as architectural buildings [105], surfaces of revolution [186, 23] and

circles [45] are aimed at a similar goal, we argue that alternative calibration objects, such as symmetric objects, and vertical objects and their cast shadows, are also common in the real world.

Note that effects such as radial distortion (often arising in the slightly wide-angle lenses typically used in security cameras), which corrupt the central projection model, can generally be removed [75], and are therefore not detrimental to our methods.

The remainder of this chapter is organized as follows. We describe in Section 4.2 the details of the method based on using vertical objects and their cast shadows, followed by the implementation details in Section 4.3 and the discussion of the singular configurations in Section 4.4. In Section

4.5, we demonstrate the results of the method on both synthesized and real images. In addition, we show examples of the inter-image constraints applied to other calibration configurations including symmetric and 1D objects. Applications to reconstruction of partially viewed symmetric objects and image-based metrology are shown in Section 4.6. Finally, Section 4.7 concludes this chapter with observations and proposed areas of future work.

4.2 Our Method

In this section, vertical objects and their parallel cast shadows on the planar ground plane are used as a calibration configuration to demonstrate the efficiency of the inter-image constraints.

Shadows are important because they provide important visual cues, e.g. depth and shape, in our perception of the world [132]. They are also useful for computer vision applications such as building detection [98], surveillance systems [154, 71, 108, 80, 135, 121] and inverse lighting [160, 127]. However, shadows are not frequently addressed in 3D vision. A few examples

include the work on the relationship between the shadows and object structure by Kriegman and

Belhumeur [83], the work on object recognition by Van Gool et al. [180], and the work on weak

structured lighting by Bouguet and Perona [18]. Shadows are interesting for camera calibration

since the shadows of two parallel vertical lines cast on the planar ground plane by an infinite point

light source are also parallel, which together with the vertical vanishing point provide an orthogonality constraint on the image of the absolute conic.

Our geometric constructions are very close to the ones in [142] and [1]. However, our work is

different from Reid and North’s work [142] in three aspects: first, we aim at estimating the camera parameters and light source orientation, while they focus on the problem of affine reconstruction of a ball out of the ground plane. Second, they [142] have the input of the image of the ground

plane’s horizon line, and we have, rather than the horizon line, the bottom point of the second

vertical object. Third, our major contribution is the inter-image constraints on the IAC, and we

use two images. Although Antone and Bosse [1] made use of shadows for camera calibration

and have a similar setup to ours, they mainly utilize world-to-camera correspondences, and involve knowledge of absolute 3D quantities. Different from their techniques, the contribution of our method lies in removing the requirement for knowledge of 3D quantities.

This section first examines the geometry of the scenes and the orthogonal constraints available

from each view. Then we introduce the novel inter-image constraints associated with the weak

pole-polar relationships. Finally, the orientation of the light source can also be computed.



Figure 4.1: Basic geometry of a scene with two vertical lines t1b1 and t2b2 casting shadows s1b1 and s2b2 on the ground plane π by the distant light source v. π1 is the vertical plane consisting of two parallel lines t1b1 and t2b2. lxy is the vanishing line of the ground plane π, and thus passes through the intersection of the two cast shadow lines s1b1 and s2b2.

4.2.1 Scene Geometry and Orthogonal Constraints

The basic geometry of a scene containing two vertical lines and their shadows on the ground

plane cast by an infinite point light source is shown in Figure 4.1. Note that this figure shows

the projections of the 3D world points in the image plane denoted by corresponding lower-case

characters. For example, the world point B2 (not shown in Figure 4.1) is mapped to b2 in the

image plane. Below, we first explore the constraints available from a single view, given the above

configuration.

First, we can compute the vanishing point vz along the vertical direction by intersecting the two vertical lines as

vz ∼ (t1 × b1) × (t2 × b2). (4.1)

Since the light source, v, is at infinity, the two shadow lines S2B2 and S1B1 must be parallel in the

3D world. In other words, the two imaged parallel shadow lines will intersect in the image space

at the vanishing point v′,

$\mathbf{v}' \sim (\mathbf{s}_1 \times \mathbf{b}_1) \times (\mathbf{s}_2 \times \mathbf{b}_2).$ (4.2)

From the pole-polar relationship with respect to the Image of the Absolute Conic ω (see Section

3.7), the vanishing point vz of the normal direction to a plane (ground plane π in our case) is the

pole to the polar which is the vanishing line lxy of the plane,

$\mathbf{v}_x \times \mathbf{v}' \sim \mathbf{l}_{xy} \sim \omega\,\mathbf{v}_z,$ (4.3)

where vx is the intersection of the line b1b2 and the vanishing line lxy, i.e. another vanishing point

on the ground plane. Equation (4.3) can be rewritten, equivalently, as two constraints on the Image

of the Absolute Conic ω:

$\mathbf{v}'^T \omega\, \mathbf{v}_z = 0,$ (4.4)

$\mathbf{v}_x^T \omega\, \mathbf{v}_z = 0.$ (4.5)

In our case, we only have the constraint (4.4), since we cannot determine vx from a single view.

Without further assumptions, we are unlikely to extract more constraints on the camera internal

parameters from a single view of such a scene as shown in Figure 4.1.
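For illustration, a small NumPy sketch of this single-view step, assuming the six image points t1, b1, s1, t2, b2, s2 are available as homogeneous 3-vectors (the coordinates below are placeholders): it computes vz and v′ via Equations (4.1)-(4.2) and assembles the single linear constraint of Equation (4.4) on the entries of ω.

```python
import numpy as np

def vertical_and_shadow_vps(t1, b1, s1, t2, b2, s2):
    """Vanishing points of Eqs. (4.1) and (4.2) from the two vertical objects (t_i, b_i)
    and the shadows s_i of their top points."""
    pts = [np.asarray(p, float) for p in (t1, b1, s1, t2, b2, s2)]
    t1, b1, s1, t2, b2, s2 = pts
    vz = np.cross(np.cross(t1, b1), np.cross(t2, b2))   # intersection of the vertical lines
    vp = np.cross(np.cross(s1, b1), np.cross(s2, b2))   # intersection of the shadow lines
    return vz, vp

def orthogonality_row(vp, vz):
    """Coefficient row of the linear equation implied by v'^T omega v_z = 0 (Eq. 4.4),
    with omega parameterized by its distinct entries [w11, w12, w22, w13, w23, w33]."""
    a, b, c = vp
    x, y, z = vz
    return np.array([a * x, a * y + b * x, b * y, a * z + c * x, b * z + c * y, c * z])

# placeholder image points (homogeneous); in practice these come from Section 4.3.1
t1, b1, s1 = [100, 50, 1], [105, 400, 1], [300, 420, 1]
t2, b2, s2 = [500, 80, 1], [495, 390, 1], [700, 415, 1]
vz, vp = vertical_and_shadow_vps(t1, b1, s1, t2, b2, s2)
row = orthogonality_row(vp / vp[2], vz / vz[2])   # one linear constraint on omega per view
```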

Before we move further, we mention some possible configurations that may provide more constraints, although we will not make use of them. One possibility is to use the ratio between the heights h1 and h2 of the two points t1 and t2 from the ground plane, since

$\dfrac{h_1}{h_2} = \dfrac{(a - b_1)(v_x - b_2)}{(a - b_2)(v_x - b_1)},$ (4.6)

where a ∼ (t1 × t2) × (s1 × s2) lies on the line b1b2. Equation (4.6) can be directly used to recover the second vanishing point vx on the horizon line lxy. One special case is when t1 and t2 have the same height from the ground plane, in which case the vanishing point vx coincides with the intersection a. Other possibilities include utilizing knowledge of the orientation of the light source v, as shown in [1]. Such knowledge provides more constraints than the one we have in

Equation (4.4), and therefore it is possible to calibrate the camera using one view by assuming a simplified camera model [44, 105]. However, too many assumptions limit the applicability in the real world. To alleviate this problem, we propose to use an additional view and the resulting inter-image constraints.

4.2.2 Inter-image Constraints

The second view can be easily used to obtain one more constraint on ω from Equation (4.4).

Beyond that, we explore here more constraints based on the inter-image constraints associated with the weak pole-polar relationship.

Geometrically, equation (4.5) can be interpreted as the constraint that the vanishing point vx of the direction B1B2 must lie on the line ωvz, which is the vanishing line lxy of the ground plane.

Considering also that vx lies on the imaged line b1b2, we can express vx as a function of ω:

vx ∼ [b1 × b2]×ωvz, (4.7)

where [·]× denotes the skew-symmetric matrix [49] characterizing the cross product. We call this relationship the weak pole-polar relationship in the sense that we cannot identify the polar, lxy = ωvz, to the pole vz with respect to ω from a single view, since we only have one constraint, v′^T lxy = 0, on lxy.

However, there are inter-image constraints associated with the weak pole-polar relationship

(4.7) as shown below. We have two planes π and π1 as shown in Figure 4.1. The inter-image

planar homography Hπ corresponding to π can be computed from at least four pairs of image

points s1, s2, b1 and b2, while Hπ1 corresponding to π1 can be computed from t1, t2, b1 and b2.

In addition, the Fundamental matrix F between the two views can be computed using the methods

[138, 199] that minimize the reprojection errors, provided that there are more point correspon-

dences. Therefore, we have the following inter-image constraints in terms of ω,

$\mathbf{v}'_x \sim H_\pi \mathbf{v}_x,$ (4.8)

$\mathbf{v}'_x \sim H_{\pi_1} \mathbf{v}_x,$ (4.9)

$\mathbf{v}'^T_x F\, \mathbf{v}_x = 0,$ (4.10)

where v′x is the corresponding vanishing point of vx in the second image, which can also be expressed

as a function of ω.

Figure 4.2: The inter-image constraints (4.8, 4.10) can be applied to any corresponding ground lines between two views, except for the shadow lines cast by the vertical objects.

One interesting observation is that the inter-image constraints (4.8, 4.10) can also be applied

to any corresponding ground lines between two views, except for the shadow lines cast by the

vertical objects, since the concurrency of all shadow lines on the ground plane at the point v′ has already been enforced in Equation (4.4). For example, we have the image correspondences, s1s2 and s′1s′2, of the 3D line S1S2 between two views, and thus have the inter-image constraints on the vanishing point vs, which is the intersection of the line s1s2 and the horizon line lxy (Figure 4.2),

vs ∼ [s1 × s2]×ωvz. (4.11)

Note that the constraint (4.9) typically does not apply to vanishing points other than vx, since the vanishing line lxy intersects the vertical plane π1 only at vx.

Consequently, we obtain the correct solution by minimizing the following symmetric transfer errors of the geometric distances and epipolar distances of the vanishing points on the line lxy, which involve elements of ω:

$d_1(\mathbf{v}_x^2, H_{\pi_1}\mathbf{v}_x^1)^2 + \sum_i \left\{ d_1(\mathbf{v}_i^2, H_{\pi}\mathbf{v}_i^1)^2 + d_2(\mathbf{v}_i^2, F\mathbf{v}_i^1)^2 \right\},$ (4.12)

where vi^2 is the corresponding vanishing point of vi^1 in the second image, which can also be expressed as a function of ω, d1(x, y) is the geometric image distance between the two homogeneous image points represented by x and y, and d2(x, l) is the geometric image distance from an image point x to an image line l.
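To make the structure of this cost concrete, the following sketch builds the transfer and epipolar residuals for the line b1b2 under a zero-skew, unit-aspect parameterization of ω; the homographies, fundamental matrix, vanishing points and numbers below are placeholders (in practice they come from the steps above and from Section 4.3.1). In practice these residuals, stacked over all available ground lines, would be passed to a nonlinear least-squares solver starting from the closed-form solution discussed below.

```python
import numpy as np

def dehom(p):
    """Homogeneous point -> inhomogeneous image coordinates."""
    return p[:2] / p[2]

def d1(p, q):
    """Geometric distance between two homogeneous image points."""
    return np.linalg.norm(dehom(p) - dehom(q))

def d2(p, l):
    """Geometric distance from a homogeneous image point to an image line."""
    x = np.append(dehom(p), 1.0)
    return abs(l @ x) / np.hypot(l[0], l[1])

def omega_from(params):
    """Zero-skew, unit-aspect IAC built from the three remaining unknowns."""
    w13, w23, w33 = params
    return np.array([[1.0, 0.0, w13], [0.0, 1.0, w23], [w13, w23, w33]])

def vx_of_omega(omega, b1, b2, vz):
    """Eq. (4.7): v_x = [b1 x b2]_x omega v_z, expressed as a function of omega."""
    l = np.cross(b1, b2)
    lx = np.array([[0, -l[2], l[1]], [l[2], 0, -l[0]], [-l[1], l[0], 0.0]])
    return lx @ omega @ vz

def residuals(params, view1, view2, H_pi, H_pi1, F):
    """Transfer and epipolar residuals of the cost (4.12) for the line b1b2."""
    omega = omega_from(params)
    v1 = vx_of_omega(omega, view1["b1"], view1["b2"], view1["vz"])
    v2 = vx_of_omega(omega, view2["b1"], view2["b2"], view2["vz"])
    return np.array([d1(v2, H_pi1 @ v1), d1(v2, H_pi @ v1), d2(v2, F @ v1)])

# placeholder two-view measurements (homogeneous coordinates)
view1 = {"b1": np.array([105.0, 400, 1]), "b2": np.array([495.0, 390, 1]),
         "vz": np.array([950750.0, 43617500, 3300])}
view2 = {"b1": np.array([120.0, 405, 1]), "b2": np.array([510.0, 380, 1]),
         "vz": np.array([900000.0, 42000000, 3200])}
H_pi = H_pi1 = np.eye(3)                                   # placeholder homographies
F = np.array([[0, -1e-6, 1e-3], [1e-6, 0, -2e-3], [-1e-3, 2e-3, 0.0]])  # placeholder F
print(residuals([-300.0, -200.0, 1.0e6], view1, view2, H_pi, H_pi1, F))
```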

Generally, the inter-image constraints (4.12) provide two constraints on ω. As a result, under

the assumption of fixed intrinsic parameters and zero skew, it is possible to calibrate a camera

from two or more views of a scene containing two vertical objects and their cast shadows. The

assumption of zero skew is reasonable both theoretically and practically. As pointed out in [75],

the skew will be zero for most normal cameras and can take non-zero values only in certain unusual

instances, e.g. taking an image of an image. This argument also coincides with some previous

observations, e.g. [105, 153, 202, 40].

Different from the linear orthogonal constraints (4.4), the inter-image constraints in (4.12) are

not linear in the elements of ω, and thus need a non-linear optimization process. The starting point

can be obtained as follows. As shown in Figure 4.1, the four points vx, b2, b1, and a, are collinear,

and their cross-ratio is hence preserved under the perspective projection. Therefore, we have the

following equality between two given images:

$\{\mathbf{v}_x, \mathbf{b}_2; \mathbf{b}_1, \mathbf{a}\}^1 = \{\mathbf{v}_x, \mathbf{b}_2; \mathbf{b}_1, \mathbf{a}\}^2,$ (4.13)

where {·, ·; ·, ·}i denotes the cross ratio of four points, and the superscripts indicate the images in

which the cross ratios are taken. This provides us with a third constraint on ω, which can be parameterized by a 6D vector with five unknowns:

$\mathbf{w} \sim [1,\ \omega_{12},\ \omega_{22},\ \omega_{13},\ \omega_{23},\ \omega_{33}]^T,$ (4.14)

where ωij denotes the element in the ith row and jth column of ω in (3.5). The computation details of Equation (4.13) are given in Section 4.3.2. If we further assume the camera has zero skew and unit aspect ratio (resulting in ω12 = 0 and ω22 = 1, as shown in Equation (3.9)), then these three known constraints (two from (4.4) and one from (4.13)) are sufficient to solve for the three unknowns: the focal length, f, and the principal point coordinates u0 and v0. Note that Equation (4.13), expanded in Equation (4.26) in Section 4.3.2, is quadratic in the elements of ω, and thus the starting point has two solutions for ω13, ω23 and ω33, i.e. a two-fold ambiguity. This ambiguity can be eliminated during the minimization process, where only the correct solution minimizes the cost function (4.12).

4.2.3 Maximum Likelihood Bundle Adjustment

The cost function (4.12) described in the last subsection, 4.2.2, is not optimal in a statistical sense, since it minimizes only local distances rather than global geometric distances. We can refine it through standard maximum likelihood inference.

Let us assume that the noise in the measured image feature positions xj^i, the coordinates of the j-th point as seen by the i-th camera, is additive and has a Gaussian distribution with zero mean and standard deviation σ. Given n images and m corresponding image points, the maximum likelihood estimate can be obtained by minimizing the following function:

$\sum_{i=1}^{n}\sum_{j=1}^{m} d_1\!\left(\mathbf{x}_j^i,\ \hat{K}[\hat{R}_i \mid \hat{\mathbf{t}}_i]\hat{\mathbf{X}}_j\right)^2.$ (4.15)

That is, we seek the parameters K̂, R̂i, t̂i and X̂j which minimize the sum of the squared geometric image distances, d1(·, ·), between the measured feature locations xj^i and the true image points for all

points across all views. The minimum of this non-linear cost function, generically termed bundle

adjustment in the computer vision and photogrammetry communities, is sought using a Levenberg-

Marquardt algorithm [168]. It requires an initial guess of the camera parameters and the 3D world points, which can be obtained using the approach described in the last section and optimal tri- angulation [74]. Note that the bundle adjustment is efficient only when enough image-to-image

corresponding points are available.
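As a hedged illustration of the bundle adjustment step, the sketch below packs the intrinsics, per-view poses and 3D points into a single parameter vector, defines the reprojection residuals of Equation (4.15), and refines them with scipy.optimize.least_squares on a tiny synthetic, noise-free problem (all values are illustrative; a simplified zero-skew, unit-aspect camera is assumed):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(f, u0, v0, rvec, t, X):
    """Pinhole projection of 3D points X (N, 3) for one view (zero skew, unit aspect)."""
    K = np.array([[f, 0.0, u0], [0.0, f, v0], [0.0, 0.0, 1.0]])
    Xc = Rotation.from_rotvec(rvec).apply(X) + t
    x = Xc @ K.T
    return x[:, :2] / x[:, 2:3]

def reprojection_residuals(params, n_views, n_points, observations):
    """Stacked reprojection errors of Eq. (4.15) over all views and points."""
    f, u0, v0 = params[:3]
    X = params[3 + 6 * n_views:].reshape(n_points, 3)
    res = []
    for i in range(n_views):
        pose = params[3 + 6 * i: 3 + 6 * (i + 1)]
        res.append(project(f, u0, v0, pose[:3], pose[3:], X) - observations[i])
    return np.concatenate(res).ravel()

# tiny synthetic problem: 3 views of 10 points, noise-free, slightly perturbed start
rng = np.random.default_rng(0)
n_views, n_points = 3, 10
X_true = rng.uniform(-1.0, 1.0, (n_points, 3)) + np.array([0.0, 0.0, 6.0])
poses_true = [np.array([0.0, 0.05 * i, 0.0, -0.4 * i, 0.0, 0.0]) for i in range(n_views)]
obs = [project(800.0, 320.0, 240.0, p[:3], p[3:], X_true) for p in poses_true]

x0 = np.concatenate([np.array([780.0, 310.0, 250.0])]
                    + [p + 0.01 for p in poses_true]
                    + [(X_true + 0.05).ravel()])
fit = least_squares(reprojection_residuals, x0, args=(n_views, n_points, obs))
print(fit.x[:3])   # refined intrinsics, which should be close to (800, 320, 240) here
```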

4.2.4 Light Source Orientation Estimation

After calibrating the cameras, one would have no difficulty estimating the light source position

and orientation provided that the corresponding feature points along the lighting direction can be

identified from two views. For example, we can compute the orientation of the light source in 3D

by using the optimal triangulation method as follows. Without loss of generality, one can choose

the world coordinate frame of the scene shown in Figure 4.1 as follows: origin at B2, Z-axis along

the line B2T2 with the positive direction towards T2, X-axis along the line B1B2 with the negative

direction towards B1, and the Y -axis given by the right-hand rule. Then, using the triangulation

method [74], it is possible to compute the 3D locations of B1 and S1. Consequently, the orientation

of the light source is given by n = B1 − S1.

Alternatively, since in our case the light source is far away, we only need to measure the ori-

entation of the light source, which can be expressed by two angles: the polar angle φ between the

lighting direction and the Z-axis (the vertical direction), and the azimuthal angle θ in the ground

plane from the X-axis (the line B1B2). Note that the imaged light source v is the vanishing point along the lighting direction, since the infinite light source lies on the plane at infinity π∞, and also that v′ is the projection of v on the ground plane (the X-Y plane). Consequently, the polar angle φ with the vertical Z-axis and the azimuthal angle θ can be measured from the corresponding vanishing points using

$\phi = \cos^{-1}\frac{\mathbf{v}_z^T\omega\,\mathbf{v}}{\sqrt{\mathbf{v}^T\omega\,\mathbf{v}}\sqrt{\mathbf{v}_z^T\omega\,\mathbf{v}_z}},$ (4.16)

$\theta = \cos^{-1}\frac{\mathbf{v}_x^T\omega\,\mathbf{v}'}{\sqrt{\mathbf{v}'^T\omega\,\mathbf{v}'}\sqrt{\mathbf{v}_x^T\omega\,\mathbf{v}_x}},$ (4.17)

where v can be computed as,

v ∼ (t1 × s1) × (t2 × s2). (4.18)

In our experiments, we used the second method to compute φ and θ for each view, and the reported results are averaged over the views.
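A short sketch of Equations (4.16)-(4.18), assuming ω and the relevant vanishing points are already available from the preceding calibration steps (the numerical values below are placeholders):

```python
import numpy as np

def angle_between_vps(omega, va, vb):
    """Angle (degrees) between the 3D directions of two vanishing points, cf. Eq. (3.8)."""
    num = va @ omega @ vb
    den = np.sqrt(va @ omega @ va) * np.sqrt(vb @ omega @ vb)
    return np.degrees(np.arccos(num / den))

def light_source_angles(omega, v, v_ground, vz, vx):
    """Polar angle from the vertical (Eq. 4.16) and azimuth from the X-axis (Eq. 4.17).

    v: imaged light source (Eq. 4.18); v_ground: its projection v' on the ground plane;
    vz, vx: vertical and X-axis vanishing points.
    """
    phi = angle_between_vps(omega, vz, v)
    theta = angle_between_vps(omega, vx, v_ground)
    return phi, theta

# placeholder values; omega would come from the calibration of Section 4.2
f, u0, v0 = 1000.0, 8.0, 6.0
omega = np.array([[1.0, 0.0, -u0], [0.0, 1.0, -v0], [-u0, -v0, u0**2 + v0**2 + f**2]])
vz = np.array([10.0, -2000.0, 1.0])
vx = np.array([5000.0, 20.0, 1.0])
v = np.array([2500.0, -900.0, 1.0])
v_ground = np.array([3000.0, 30.0, 1.0])
print(light_source_angles(omega, v, v_ground, vz, vx))
```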

4.2.5 Algorithm Outline

The complete algorithm will now be summarized. The input is a pair of images of a scene contain-

ing two vertical objects and their shadows on the ground plane cast by an infinite light (e.g. the

sun). Both vertical objects and shadow lines should be visible in both images. The outputs are the camera parameters and the orientation of the light source. A top-level outline of the algorithm is as follows.

1. Identify a seed set of image-to-image matches between the two images. At least six points, t̂i, b̂i and ŝi (i = 1, 2), are needed, though more are preferable. The feature points can be extracted as shown in Section 4.3.1.

2. Obtain a closed-form solution:

(a) Compute vz (4.1) and v′ (4.2) for each view, and thus obtain two linear constraints on ω (4.4).

(b) Compute the third constraint (4.13) as described in Section 4.3.2.

(c) Solve two linear constraints (4.4) and one quadratic constraint (4.13) by assuming

ω12 = 0 and ω22 = 1. This gives us ω13, ω23 and ω33 up to an ambiguity.

(d) Compute the fundamental matrix F (when enough point correspondences are available)

and the planar homographies Hπ and Hπ1 as described in Section 4.2.2.

(e) Minimize (4.12) using the above ω13, ω23 and ω33 as starting points.

(f) Compute the camera internal parameters as shown in Equation (3.6) and the camera external pa-

rameters as described in [40].

3. Use Bundle Adjustment (4.15) to refine the closed-form solution.

4. Compute the orientation of the light source (4.16,4.17).

4.3 Implementation Details

4.3.1 Feature Extraction

Features, extracted either automatically (e.g. using an edge or corner detector) or interactively, are subject to errors. The features are mostly the image locations of the top, ti, and the base, bi of the vertical objects, and the shadow positions si of the top locations ti (i = 1, 2). In this implementation, we use a maximum likelihood estimation (MLE) method similar to [39] with the

uncertainties in the locations of points ti, bi and si modeled by the covariance matrices Λti , Λbi

and Λsi respectively.

In the error-free case, imaged shadow relations (illuminated by a point light source) are mod- eled by a planar homology as introduced in Section 3.5.2 (see Figure 3.4 and also Figure 4.3). One important property of a planar homology is that the corresponding lines, i.e. lines through pairs of corresponding points, intersect on the axis: for example, for the lines t1t2 and s1s2,

b1b2·(t1t2 × s1s2) = 0. (4.19)

Another important property of a planar homology is that the cross ratio defined by the vertex, v, the corresponding points, ti and si, and the intersection, ii, of their join tisi with the axis, is the characteristic invariant of the homology, and is the same for all corresponding points,

{v, t1; s1, i1} = {v, t2; s2, i2}. (4.20)


Figure 4.3: Maximum likelihood estimation of the feature points: (left) The uncertainty ellipses are

shown. These ellipses are specified by the user, and indicate the confidence region for localizing

the points. (right) Estimated feature points satisfying the alignment constraints (4.19) and (4.20).

Therefore, we determine the maximum likelihood estimates of the features' true locations (t̂i, b̂i and ŝi) by minimizing the sum of the Mahalanobis distances between the input points and their MLE,

$\arg\min_{\hat{\mathbf{x}}_j} \sum_{\mathbf{x}=\mathbf{t},\mathbf{b},\mathbf{s}} \sum_{j=1,2} (\mathbf{x}_j - \hat{\mathbf{x}}_j)^T \Lambda_{\mathbf{x}_j}^{-1} (\mathbf{x}_j - \hat{\mathbf{x}}_j),$ (4.21)

subject to the alignment constraints (4.19) and (4.20).

The covariance matrices Λt_i, Λb_i and Λs_i are not necessarily isotropic or equal. For example, in our experiments for the image shown in Figure 4.3, the second view of the first real image set shown in Figure 4.9, we set Λt1 = Λb1 = diag([20^2, 10^2]) and Λt2 = Λb2 = Λs1 = Λs2 = 8^2 I2×2. The image is of size 2272 × 1704.

Once the six points, ti, bi and si, are interactively identified in one view, we first initialize the positions of the feature points in the other views using wide-baseline matching [195] and then refine them using the same MLE estimation (4.21). In our implementation, we assume the uncertainties in

the locations of points ti, bi and si are similar from view to view, and hence we use the same Λti ,

Λbi and Λsi for all images.

Note that, to obtain more robust solutions, we also compute point correspondences other than

the six feature points ti, bi and si, i = 1, 2, for the computation of inter-image planar homographies and fundamental matrix in Equation (4.12).

4.3.2 Computation Details of the Cross Ratio Constraint

This section provides the computation details on the cross ratio constraint in Equation (4.13).

Denoting the computed homogeneous image line b1b2 as [l1, l2, l3]^T, and the vanishing point vz as [vz1, vz2, vz3]^T, vx in Equation (4.7) can be expressed as

$\mathbf{v}_x = [\mathbf{D}_1^T\mathbf{w},\ \mathbf{D}_2^T\mathbf{w},\ \mathbf{D}_3^T\mathbf{w}]^T,$ (4.22)

where w is given in Equation (4.14), and Di, (i = 1, 2, 3), are defined as

$\mathbf{D}_1 = [0,\ l_3 v_{z_1},\ l_3 v_{z_2},\ -l_2 v_{z_1},\ l_3 v_{z_3} - l_2 v_{z_2},\ -l_2 v_{z_3}]^T,$ (4.23)

$\mathbf{D}_2 = [-l_3 v_{z_1},\ -l_3 v_{z_2},\ 0,\ l_1 v_{z_1} - l_3 v_{z_3},\ l_1 v_{z_2},\ l_1 v_{z_3}]^T,$ (4.24)

$\mathbf{D}_3 = [l_2 v_{z_1},\ l_2 v_{z_2} - l_1 v_{z_1},\ -l_1 v_{z_2},\ l_2 v_{z_3},\ -l_1 v_{z_3},\ 0]^T.$ (4.25)

Now we are ready to compute the cross ratio involving vx in Equation (4.13), i.e. a ratio of ratios

of lengths on the line b1b2. Since cross ratio is invariant to the projective coordinate frame chosen

for the line, we can use the coordinate differences of the four points (vx, b2, b1, a), rather than their lengths, to express the cross ratio and simplify the constraints on ω. In our implementation, we choose the x-coordinates of the image points due to the fact that the imaged lines b1b2 in our cases are more horizontal than vertical, i.e. the absolute value of the line's slope is less than one. As a result,

Equation (4.13) can be expanded as

$\{\mathbf{v}_x, \mathbf{b}_2; \mathbf{b}_1, \mathbf{a}\} = \frac{(v_x - b_2)(b_1 - a)}{(v_x - a)(b_1 - b_2)} = \frac{\mathbf{D}_4^T\mathbf{w}}{\mathbf{D}_5^T\mathbf{w}},$ (4.26)

with

$\mathbf{D}_4 = (b_{1_x} - a_x)(\mathbf{D}_1 - b_{2_x}\mathbf{D}_3),$
$\mathbf{D}_5 = (b_{1_x} - b_{2_x})(\mathbf{D}_1 - a_x\mathbf{D}_3),$

where b1x, b2x and ax are the x-coordinates of b1, b2 and a respectively.
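The following sketch assembles D1-D5 and evaluates the cross ratio of Equation (4.26) as a rational function of w for two views, using placeholder measurements; equating the two values gives the quadratic constraint of Equation (4.13) used as a starting point in Section 4.2.2.

```python
import numpy as np

def d_vectors(l, vz):
    """D1, D2, D3 of Eqs. (4.23)-(4.25) for the line l = b1 x b2 and the vertical VP vz."""
    l1, l2, l3 = l
    v1, v2, v3 = vz
    D1 = np.array([0.0, l3 * v1, l3 * v2, -l2 * v1, l3 * v3 - l2 * v2, -l2 * v3])
    D2 = np.array([-l3 * v1, -l3 * v2, 0.0, l1 * v1 - l3 * v3, l1 * v2, l1 * v3])
    D3 = np.array([l2 * v1, l2 * v2 - l1 * v1, -l1 * v2, l2 * v3, -l1 * v3, 0.0])
    return D1, D2, D3   # D2 (the y-component) is listed for completeness

def cross_ratio_of_w(w, b1, b2, a, vz):
    """{v_x, b2; b1, a} expressed as a rational function of w, Eq. (4.26)."""
    l = np.cross(b1, b2)
    D1, D2, D3 = d_vectors(l, vz)
    b1x, b2x, ax = b1[0] / b1[2], b2[0] / b2[2], a[0] / a[2]
    D4 = (b1x - ax) * (D1 - b2x * D3)
    D5 = (b1x - b2x) * (D1 - ax * D3)
    return (D4 @ w) / (D5 @ w)

# placeholder data for two views; in practice they come from the extracted features
w = np.array([1.0, 0.0, 1.0, -300.0, -200.0, 1.0e6])
view1 = dict(b1=np.array([105.0, 400, 1]), b2=np.array([495.0, 390, 1]),
             a=np.array([300.0, 395, 1]), vz=np.array([950750.0, 43617500, 3300]))
view2 = dict(b1=np.array([120.0, 405, 1]), b2=np.array([510.0, 380, 1]),
             a=np.array([330.0, 392, 1]), vz=np.array([900000.0, 42000000, 3200]))
print(cross_ratio_of_w(w, **view1), cross_ratio_of_w(w, **view2))
```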

4.4 Degenerate Configurations

The proposed algorithm, like almost any algorithm, has degenerate configurations where the parameters cannot be estimated. It is important to be aware of these configurations in order to obtain reliable results in practice by avoiding them.

Generally, as an approach based on vanishing points, the method is degenerate if the vanishing points along some directions are at infinity. Geometrically, vanishing points go to infinity when the perspective transformation degenerates to an affine transformation, where parallel lines in 3D remain parallel in the image plane, or when the vanishing point direction is parallel to the image plane, since the vanishing point of a line is obtained by intersecting the image plane with a ray parallel to the world line and passing through the camera center. Note that here parallelism to the image plane means parallelism in 3-space. Algebraically, a vanishing point going to infinity means that the third component of its homogeneous coordinates equals zero. In practice, if conditions are near degenerate, then the solution is ill-conditioned and the particular member of the family of solutions is dominated by "noise". For calibration algorithms using vanishing points, therefore, it is always important to work with images with relatively large projective distortion when possible.

The method mainly employs the orthogonality relationship (4.4) of the vanishing points, vz and v0, of two perpendicular 3D world directions, and the inter-image constraints (4.12, 4.13)

associated with vanishing points other than v′ on the vanishing line lxy, as shown in Figure 4.1.

Therefore, it is mainly important to verify the vanishing point, vz, along the vertical direction, and other vanishing points of world lines that are perpendicular to the vertical direction, i.e. vanishing points on the horizon line lxy.

The rest of this section is organized as follows. First, possible singularities with geometrical explanations are derived. Then, we especially give algebraic analysis on the situations where the inter-image constraints fail. For this analysis, the vanishing point vx is used as an example and, however, the derivation can be easily extended to others.

62 4.4.1 Vanishing Point at Infinity

Liebowitz and Zisserman [105] demonstrate an easy but efficient method to identify the degen-

eracies that occur when constraints on the ω are not independent, which exploits the linearity of

the equations in terms of the elements of ω. Although we introduce new constraints for camera calibration in Equation (4.12,4.13), the price to pay is increased difficulty in identifying degenerate

configurations. Although it is still possible to apply the approach [105] to check the dependency

between the linear constraints in Equation (4.4) that arise from the parallel property of shadows

cast by an infinite light source, we would like to derive all possible singularities with geometrical

explanations.

The following analyses are most related to Sturm and Maybank [153] but different mainly in

three aspects. First, although both their and our approaches utilize the orthogonality constraints

in equation (4.4), they have also the constraints arising from the normal property of r1 and r2 as

described in Section 3.7 (originally used by Zhang in [200]), while we derive the new inter-image

constraints in Equations (4.13,4.12). Notice that in their case the use of the normal properties of

r1 and r2 is viable due to the fact that they have the world to image planar homography. Second,

the two planes π and π1 (see Figure 4.1) in our case are perpendicular to each other, which is a special case of those addressed in [153]. Third, only a subset of their degenerate cases applies to our

configuration.

Our method degenerates in the following cases (parallelism and perpendicularity here are meant in the 3D domain): (I) the image plane is parallel to the plane π1, (II) the image plane is parallel to the

ground plane π, and (III) the image plane is perpendicular to both π1 and π. In case I, both vx and vz go to infinity. In case II, all the vanishing points on the line lxy will be infinite. Case II

is different from case I in that the principal point can be recovered in case II, but not in case I. In

case II, the vanishing points on the ground plane π, e.g. vx and v′, are the vanishing points for

directions parallel to the image plane. Therefore, the vertical direction, which is perpendicular to

the ground plane π, must be orthogonal to the image plane, and parallel to the principal axis of the camera. The vanishing point of this principal axis is nothing but the principal point, which is

thus given by the finite vanishing point vz. The same derivation is valid for case I only if v′ is

orthogonal to vx, which is not necessarily true. Note that unlike [153], the aspect ratio λ can not

be recovered in our case when one plane is parallel to the image plane, since we have no Euclidean

information about the 3D world coordinates. In case III, if the degenerate situation happens for

both views, we are able to intersect the two lines b1b2 in two views to obtain the principal point as analyzed above. These analyses are partially validated in Section 5.4.1.

4.4.2 Difficulty in Computing vx

One important step of the proposed algorithm is to express vx as a function of ω in equation (4.7). Therefore, the method degenerates when vx cannot be computed from equation (4.7), i.e. when lines b1b2 and lxy are parallel to each other in the image plane. To simplify the analysis, we adopt the world coordinate definition of Section 4.2.4, although the algebraic derivation is independent of the choice of world coordinate frame.

Line b1b2 is the imaged projection of the 3D X-axis and, therefore, passes through the image of the world origin [0 0 0 1]^T, which in our case is

$\mathbf{b}_2 \sim [\mathbf{p}_1\ \mathbf{p}_2\ \mathbf{p}_3\ \mathbf{p}_4][0\ 0\ 0\ 1]^T = \mathbf{p}_4.$ (4.27)

It also passes through the vanishing point vx ∼ p1 (shown in (3.11)) along the 3D X-axis direction [1 0 0]^T. Consequently, line b1b2 can be expressed as b1b2 ∼ p1 × p4. Similarly, line lxy is the horizon line of the ground plane (the X-Y plane), and therefore passes through the two vanishing points vx and vy (∼ p2), so lxy ∼ p1 × p2. When lines b1b2 and lxy are parallel to each other, the two lines intersect at infinity, i.e. the third component of the imaged intersection, b1b2 × lxy, equals zero. By using the property that the cofactor matrix governs how matrices distribute with respect to the cross product [67], one can obtain

$\{\mathbf{b}_1\mathbf{b}_2 \times \mathbf{l}_{xy}\}_3 = 0 \iff \{(\mathbf{p}_1 \times \mathbf{p}_4) \times (\mathbf{p}_1 \times \mathbf{p}_2)\}_3 = 0$
$\iff \{[K^{*}(\mathbf{r}_1 \times R\mathbf{C})] \times [K^{*}(\mathbf{r}_1 \times \mathbf{r}_2)]\}_3 = 0$
$\iff \{K[(\mathbf{r}_1 \times R\mathbf{C}) \times \mathbf{r}_3]\}_3 = 0$
$\iff \{K[([\mathbf{r}_1]_\times [\mathbf{r}_1\ \mathbf{r}_2\ \mathbf{r}_3]\mathbf{C}) \times \mathbf{r}_3]\}_3 = 0$
$\iff \{K[(C_y\mathbf{r}_3 - C_z\mathbf{r}_2) \times \mathbf{r}_3]\}_3 = 0$
$\iff \{C_z K\mathbf{r}_1\}_3 = 0$
$\iff C_z = 0 \ \text{or}\ \{\mathbf{v}_x\}_3 = 0,$ (4.28)

where K^{*} denotes the matrix of cofactors of K, and {z}3 denotes the 3rd element of the vector z.

Algebraically, Cz = 0 implies that line b1b2 coincides with the vanishing line lxy, in which case the

65 pole-polar relationship in equation (4.3) offers two independent constraints on the camera internal parameters per view. The two constraints might be used in camera calibration as shown in [106].

However, the configuration Cz = 0 indicates that the camera center lies on the ground plane π in Figure 4.1, which is unlikely to happen in practice since it is rare to capture images or videos like that. Practically, therefore, it is only necessary to avoid the case {vx}3 = 0, where vx goes to infinity.
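In practice, one can guard against this configuration with a simple numerical test on the homogenized intersection, as in the small helper sketch below (the threshold and data are arbitrary placeholders):

```python
import numpy as np

def is_near_degenerate(b1, b2, l_xy, tol=1e-6):
    """True when line b1b2 and the vanishing line l_xy are (nearly) parallel in the image,
    i.e. the third component of their intersection is (nearly) zero (cf. Eq. 4.28)."""
    line_b = np.cross(b1 / np.linalg.norm(b1), b2 / np.linalg.norm(b2))
    x = np.cross(line_b / np.linalg.norm(line_b), l_xy / np.linalg.norm(l_xy))
    return abs(x[2]) < tol * np.linalg.norm(x)

# placeholder image data
b1, b2 = np.array([105.0, 400, 1]), np.array([495.0, 390, 1])
l_xy = np.array([0.001, 0.02, -9.0])     # placeholder vanishing line of the ground plane
print(is_near_degenerate(b1, b2, l_xy))
```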

4.5 Experimental Results

The proposed method aims to directly calibrate cameras for applications where it is difficult to calibrate the cameras beforehand using special calibration patterns with known geometry, and where the number of views is insufficient to employ self-calibration methods. The experiments, therefore, focus on the cases where only two views are available.

First, synthetic data is used to examine the performance of the proposed method with respect to the following cases:

1. different noise levels varying from 0.3 pixels to 3.0 pixels,

2. different relative orientation (parallel to perpendicular) between the vertical plane passing

through the two vertical lines and the image plane,

3. different errors on the orthogonality constraint, i.e. the angles between the vertical objects

and the ground plane are varying (from 0.3 degrees to 3.0 degrees in the experiments).

66 Table 4.1: External Parameters of four different viewpoints.

View    Camera position                "at" position

1st     (10, -100+random(1), 40)       (-100, 0, 0)

2nd     (-150, -100+random(1), 40)     (-100, 0, 0)

3rd     (10, -100+random(1), 100)      (-100, 0, 0)

4th     (-150, -100+random(1), 100)    (-100, 0, 0)

Second, we compare the performance of our method with the ground truth using the real images.

4.5.1 Computer Simulation

The simulated camera has a focal length of f = 1000, an aspect ratio of λ = 1.06, a skew of γ = 0, and the principal point at u0 = 8 and v0 = 6. The synthetic light source is at infinity with polar angle φ = arctan(0.5) (about 26.6 degrees) and azimuthal angle θ = 60 degrees. The

two vertical objects have lengths 100 and 80 units respectively, and the distance between the two

vertical objects is 75 units. To approximate the real cases where there exist some other point

correspondences besides the vertical line segments and their cast shadows, we generated another

ten 3D points randomly located on a hemisphere above the ground plane, centered at (75, 0, 0) with

radius 80. In the experiments presented herein, we generated four views with camera and “look at”

positions listed in Table 4.1, where we follow the camera (eye) coordinate specification in OpenGL

fashion. Therefore, "at" − camera is the principal view direction.
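For reference, a small helper (a sketch under the stated look-at convention; the choice of the world Z-axis as the "up" direction is an assumption) that converts a camera and "at" position from Table 4.1 into a rotation R and translation t = −RC:

```python
import numpy as np

def look_at_extrinsics(camera, at, up=(0.0, 0.0, 1.0)):
    """Build R, t for a camera at `camera` looking toward `at` (principal axis along
    at - camera), using the world Z-axis as the approximate 'up' direction."""
    camera, at, up = (np.asarray(v, dtype=float) for v in (camera, at, up))
    z = at - camera
    z /= np.linalg.norm(z)                  # principal (viewing) direction
    x = np.cross(z, up)
    x /= np.linalg.norm(x)                  # camera x-axis, orthogonal to z and 'up'
    y = np.cross(z, x)                      # completes a right-handed orthonormal frame
    R = np.vstack([x, y, z])                # rows: camera axes expressed in world coordinates
    t = -R @ camera                         # so that X_cam = R X_world + t
    return R, t

# first viewpoint of Table 4.1 (random jitter omitted)
R, t = look_at_extrinsics(camera=(10.0, -100.0, 40.0), at=(-100.0, 0.0, 0.0))
# the "at" point lies on the positive Z camera axis
print(R @ (np.array([-100.0, 0.0, 0.0]) - np.array([10.0, -100.0, 40.0])))
```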

[Figure 4.4 plots: (a) focal length f; (b) aspect ratio λ; (c) u0 of the principal point; (d) v0 of the principal point. Each plot shows the relative error versus the noise level (pixels) for the two view combinations.]

Figure 4.4: The performance of our method on the camera calibration under different noise levels.

Performance versus noise: In this experiment, we used two combinations of image pairs (views) from Table 4.1. The first combination is composed of the 1st and 4th views, while the second one includes the 2nd and 3rd views. Gaussian noise with zero mean and a standard deviation of σ ≤ 3.0 pixels was added to the projected image points. The estimated camera parameters were then compared with the ground truth. For each noise level, we performed 1000 independent trials, and the final averaged results for the camera internal parameters are shown in Figure 4.4, while Figure

4.5 illustrates the performance of our method on the light source orientation estimation. As argued

Figure 4.5: The performance of our method on the light source orientation estimation under different noise levels (absolute error vs. noise level in pixels): (a) polar angle φ, (b) azimuthal angle θ.

As argued in [173, 202], the relative difference with respect to the focal length, rather than the absolute error, is a more geometrically meaningful error measure for the camera internal parameters. Therefore, we measured the relative error of f, u0 and v0 with respect to the true f while varying the noise level from 0.3 pixels to 3.0 pixels. For the aspect ratio λ, we measured the relative error with respect to itself. Evidently, the errors increase almost linearly with the noise level for both the camera internal parameters and the light source orientation. For σ = 1.5, which is comparable to the typical noise in practical calibration, the relative error of the focal length f is 1.93% for the first combination and 2.57% for the second combination. As more noise is added, the relative errors of the focal lengths keep increasing until they reach 7.16% for the first combination and 5.85% for the second one at σ = 3.0. The maximum relative error of the aspect ratio is 13.96% for the first combination and 5.22% for the second combination, while the maximum relative errors of the principal point are around 6.51% for u0 and about 9.74% for v0. Finally, the computed orientation of the light source is within ±2.5° for both the polar and azimuthal angles.

Figure 4.6: The performance of our method with respect to the relative orientation between the vertical plane passing through the two vertical lines and the image plane (errors vs. relative orientation in degrees): (a) the focal length and the aspect ratio, (b) the principal point, (c) the light source orientation.

Performance versus the orientation of the image plane: This experiment examines the influence of the relative orientation between the vertical plane passing through the two vertical lines and the image plane. Two views are used. The orientations of the image planes of the two views are chosen as follows: the image planes are initially parallel to the vertical plane; a rotation axis is randomly chosen from a uniform sphere; the image planes are then rotated around that axis by an angle ψ. Considering the fact that extracted feature points will in practice be affected by noise, we also add a typical noise level of σ = 1.0 pixels to all projected image points. We repeated this process 1000 times and computed the average errors. The angle ψ varies from 5° to 85°, and the results are shown in Figure 4.6. Several important observations are evident. First, the results partially validate the analysis of the degenerate cases in Section 4.4.1. In case I, where the image plane is parallel to the vertical plane, i.e. ψ = 0°, our method degenerates. In case III, i.e. ψ = 90°, our method is also degenerate, but it is still able to recover the principal point, since in our experiments the image planes of both views are also approximately perpendicular to the ground plane. The second observation is that the best performance seems to be achieved at an angle around 45°. Third, the minimum of the curve is sufficiently broad that acceptable results can be obtained in practice from a wide range of views. Note that in practice, when ψ increases, foreshortening makes feature detection less precise, which is not considered in this experiment.

Figure 4.7: The performance of our method on camera calibration with respect to the orthogonality errors (relative error vs. orthogonality error in degrees, using the 1st and 4th views and using the 2nd and 3rd views): (a) focal length f, (b) aspect ratio λ, (c) u0 of the principal point, (d) v0 of the principal point. The orthogonality errors here mean the errors in the angles between the vertical objects and the ground plane, which are assumed to be 90°.

Figure 4.8: The performance of our method on the light source orientation estimation with respect to the orthogonality errors (absolute error in degrees): (a) polar angle φ, (b) azimuthal angle θ.

Performance versus orthogonality error: The last experiment evaluates how sensitive the algorithm is to errors in the angles β between the vertical objects and the ground plane, which are assumed to be 90°. For the angle β of each vertical object, independent Gaussian noise with zero mean and a standard deviation of σ ≤ 3.0 degrees was added. Considering the fact that extracted feature points will in practice be affected by noise, we also added a typical noise level of σ = 1.0 pixels to all projected image points. We repeated this process 1000 times; the final averaged camera calibration results are shown in Figure 4.7, while Figure 4.8 illustrates the performance of our method on the light source orientation estimation. The errors in both the camera parameters and the light source orientation increase almost linearly with respect to the orthogonality errors.

Figure 4.9: Three images of a standing person and a lamp. The circle marks in the images are the minimal data computed by the method described in Section 4.3.

Table 4.2: Calibration results for the first real image set (relative errors in parentheses).

Parameter   Image pair (1,2)    Image pair (1,3)    Image pair (2,3)    Ground truth [105]
f           3200.8 (1.44%)      3137.1 (-0.58%)     3211.1 (1.77%)      3155.3
λf          3206.1 (-3.21%)     3171.0 (-4.27%)     3216.4 (-2.90%)     3312.5
u0          1172.4 (0.28%)      1334.9 (5.43%)      1170.1 (0.21%)      1163.6
v0          895.8 (-0.57%)      902.7 (-0.35%)      901.5 (-0.39%)      913.8

Notice also that the errors do not go to zero as the orthogonality errors approach zero, due to the added noise in the image projections. For example, the relative errors of the focal length are around 1.8% under an orthogonality error of σ = 0.3° and pixel noise of σ = 1.0 pixels (Figure 4.7 (a)), while the focal length errors are around 1.4% with the same pixel noise of σ = 1.0 pixels but no orthogonality error (Figure 4.4 (a)).

4.5.2 Real Data

We also applied our method to real images. The image set consists of three views of a standing person and a lamp, which provided two vertical lines for camera calibration (see Figure 4.9). For each pair of images, we applied our algorithm independently, and the results are shown in Table 4.2.

To evaluate our results, we obtained a least-squares solution for the internal parameters of a non-natural camera from over-determined noisy measurements, i.e. five images with three mutually orthogonal vanishing points per view, using the constraints described in [105]. We compared our results to those listed in the last column of Table 4.2. The largest relative error of the focal length, in our case, is less than 4.5%, and the maximum relative error of the principal point is around 5%. In addition, the computed polar angle φ and azimuthal angle θ are 44.54 and 33.22 degrees respectively, while they are 45.09 and 32.97 degrees using the camera intrinsic parameters in the last column of Table 4.2. The errors could be attributed to several sources. Besides noise, non-linear distortion and imprecision of the extracted features, one source is the casual experimental setup using minimal information, which is deliberately targeted at a wide spectrum of applications. Despite all these factors, our experiments indicate that the proposed algorithm provides good results.

4.6 Applications

To validate our calibration techniques, we present three practical applications. To further demonstrate the efficiency and applicability of the inter-image constraints, we first apply them to symmetric and 1D objects, which is especially useful in the calibration of a large number of cameras [202]. The second application is the reconstruction of partially viewed symmetric objects, and the last one is image-based metrology.

Figure 4.10: Four feature points forming an isosceles trapezoid in the world plane. The plane of symmetry is defined by the z-axis and the line OQ. The points M1 and M2 are two arbitrary feature points on the calibration object, and M1′ and M2′ are their symmetric counterparts. Alternatively, M1 and M2 are two feature points on a line through the point Q, and M1′ and M2′ are their positions after pivoting the line around the point Q.

4.6.1 Calibration Using Symmetric and 1D Objects

When the cameras observe an object that has a plane of symmetry Π, then for each object point M there exists a point M′ on the object which is symmetrically situated with respect to Π.

Any such pair of symmetric points defines a 3D line M′M, which is perpendicular to the plane of symmetry. Our method assumes that two such pairs are viewed by a camera. Denote the first pair by M1 and M1′, and the second pair by M2 and M2′. Clearly, the two lines M1M1′ and M2M2′ are coplanar and parallel, and in general yield an isosceles trapezoid as shown in Figure 4.10. Therefore, the vanishing point vx along the direction perpendicular to the plane of symmetry Π is known, and is given by

vx = (m1 × m1′) × (m2 × m2′).    (4.29)

Similar to the work in [186], the intersection between Π and the plane passing through the two lines m1m1′ and m2m2′ can be uniquely defined as the line

ls = s × q,    (4.30)

where s is the intersection of m1m2′ and m2m1′, and q is the intersection of m1m2 and m1′m2′.

However, the main difference is that the pole-polar relationship with respect to vx and ls does not hold in our configuration as shown in section 4.6.1.1.
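As a small illustration of equations (4.29) and (4.30), the sketch below computes vx, s, q and ls from the four imaged trapezoid corners using homogeneous cross products; the coordinates in the example call are arbitrary made-up values, not data from the experiments.

```python
import numpy as np

def to_h(p):
    """Homogeneous 2D point."""
    return np.array([p[0], p[1], 1.0])

def vx_and_ls(m1, m1p, m2, m2p):
    """Vanishing point v_x (Eq. 4.29) and the line l_s (Eq. 4.30) from the
    imaged trapezoid corners; m1p, m2p are the images of M1', M2'."""
    m1, m1p, m2, m2p = map(to_h, (m1, m1p, m2, m2p))
    vx = np.cross(np.cross(m1, m1p), np.cross(m2, m2p))   # intersection of the two parallel sides
    s = np.cross(np.cross(m1, m2p), np.cross(m2, m1p))    # intersection of the diagonals
    q = np.cross(np.cross(m1, m2), np.cross(m1p, m2p))    # intersection of the legs
    return vx, np.cross(s, q)

vx, ls = vx_and_ls((120, 310), (420, 300), (180, 210), (360, 205))
print(vx, ls)
```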

It is interesting to note that this configuration of coplanar points may also be viewed as two separate configurations of collinear points in 3D space. In other words, if we take the lines M1M2 and M1′M2′ as two different positions of a 1D object pivoting around the point Q, then we can also apply the technique proposed herein to the images of this 1D object. The only difference now is that the set of four coplanar points M1, M2, M1′, and M2′ is captured in two separate images. Therefore, the number of images required in the 1D case is twice the number required in the 2D case, i.e. a minimum of four: one pair from the same viewpoint but with the 1D object at two distinct positions, and another pair from a different viewpoint but with the same two positions of the object that were used in the first pair. Accordingly, two arbitrary points M1 and M2 with unknown locations on the 1D object, and the corresponding points M1′ and M2′ after moving the 1D object to another position (with Q kept fixed), yield one isosceles trapezoid M1M2M2′M1′. We call this a viewpoint of the isosceles trapezoid.

In the coordinate frame described above, the vanishing point vx and the line ls play the same roles as vz and the line b1b2 in Figure 4.1, and therefore the inter-image constraints (4.12) can be applied, with the starting point similarly computed using (4.13). However, no orthogonality constraint of the form (4.4) is available, since only one vanishing point is known. As a compensation, therefore, we assume a unit aspect ratio and zero camera skew, in which case the camera calibration matrix K in Equation (3.1) reduces to

K = \begin{bmatrix} f & 0 & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix}.    (4.31)

Below, we use an alternative method, as detailed in [29, 26], to calibrate the camera. The basic idea is that both the camera internal and external parameters can be expressed as functions of the principal point c = (u0, v0). We therefore formulate our problem in terms of the inter-image homography that minimizes the symmetric transfer error of geometric distances,

(f, c) = \arg\min_{\Gamma} \sum_i d(m_i, H_{f,c}^{-1} m_i')^2 + d(m_i', H_{f,c} m_i)^2,    (4.32)

where d(·, ·) is the Euclidean distance between image points, Γ is the 2D search space of the solution for c, and H_{f,c} = \hat{H}_w' \hat{H}_w^{-1} is the inter-image homography (which is only a function of c).

Table 4.3: External parameters for the seven different viewpoints.

Viewpoint   θx    θy     θz     tx    ty    tz
1st         10    -7      3.8   20     40   350
2nd         12     6     -5     20    -24   350
3rd         12    13    -12     10     70   360
4th         12    30    -12     10     20   350
5th         12    16     -5     10    -40   360
6th         10     5      3.8   20     26   370
7th         12    16    -15     10     -4   380

In the implementation, we take advantage of the fact that the principal points of recent CCD cameras are very close to the center of the image [75]. The search space Γ that minimizes the cost function in (4.32) is therefore narrowed down to a 2D window around the image center. The solution is then found without resorting to non-linear minimization techniques, i.e. by exhaustively sampling the solution space within the 2D search window.
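The exhaustive search can be sketched as follows. The callback homography_for is a placeholder for the construction of H_{f,c} described in [29, 26], which is not reproduced here, and the window size and step are illustrative choices rather than the values used in the experiments.

```python
import numpy as np

def symmetric_transfer_error(H, m, m_prime):
    """Sum of squared symmetric transfer distances of Eq. (4.32) for
    corresponding point sets m, m_prime (N x 2 arrays)."""
    def apply(H, pts):
        ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
        return ph[:, :2] / ph[:, 2:3]
    return (np.sum((apply(H, m) - m_prime) ** 2) +
            np.sum((apply(np.linalg.inv(H), m_prime) - m) ** 2))

def grid_search_principal_point(center, m, m_prime, homography_for,
                                half_window=7, step=1.0):
    """2D search over a window around the image center; homography_for(c)
    must return the inter-image homography implied by a candidate c."""
    best_err, best_c = np.inf, None
    offsets = np.arange(-half_window, half_window + step, step)
    for du in offsets:
        for dv in offsets:
            c = (center[0] + du, center[1] + dv)
            err = symmetric_transfer_error(homography_for(c), m, m_prime)
            if err < best_err:
                best_err, best_c = err, c
    return best_c, best_err
```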

The proposed approach was tested on an extensive set of simulated and real data. Below, we first show the synthetic simulations for the 1D object (the 2D case is similar); we then show the results for real images. We used 1D objects with configurations similar to those in [201] for comparison. The simulated camera has a focal length of f = 1020, unit aspect ratio, zero skew, and the principal point at (316, 243). The image resolution is 640 × 480. We observed the 1D objects randomly at the seven positions listed in Table 4.3.

Figure 4.11: Performance vs. noise (in pixels) averaged over 100 independent trials: (a), (b) and (c) relative errors for f, the principal point, and the translation; (d) absolute errors in the rotation angles.

For each observation, we switched the 1D object between two positions given by the 3D points M1 = [−75 0 0]^T, M2 = [−45 −150 0]^T, M1′ = [75 0 0]^T, M2′ = [45 −150 0]^T. The 2D search space is 15 × 15 with a 1.0 pixel interval in each direction. Of course, once we get close enough to the solution, we can further refine the search window if higher accuracy is desired.

Performance Versus Noise Level: In this experiment, we used the first five image pairs (viewpoints) in Table 4.3.

Figure 4.12: Performance vs. number of viewpoints averaged over 100 independent trials: (a) relative error for f, (b) absolute errors in the principal point.

The estimated camera intrinsic and extrinsic parameters were compared with the ground truth, while adding zero-mean Gaussian noise varying from 0.1 pixels to 1.5 pixels. Results are shown in Figure 4.11. For a noise level of 1.5 pixels, which is larger than the typical noise in practical calibration [200], the relative error of the focal length f is 1.92%, and the maximum relative error of the principal point is around 0.24%. Excellent performance is achieved for all extrinsic parameters, i.e. relative errors of less than 0.36% for tx and 0.5% for ty, and absolute errors of less than 0.63 degrees for θx, less than 0.28 degrees for θy, and less than 0.086 degrees for θz. At a 1-pixel noise level, the relative errors of the focal length and the principal point in our 1D algorithm do not exceed 1.56%.

Performance Versus Number of Viewpoints: We also examined the performance with respect to the number of viewpoints (i.e. the number of image pairs). We varied the number of available viewpoints from 2 to 7. Results are shown in Figure 4.12. For this set of experiments the noise level was kept at 1.5 pixels, and the results were again averaged over 100 independent trials.


Figure 4.13: Four trapezoids projected into two images.

Figure 4.14: Three collinear points along a TV antenna.

For four or more viewpoints, the relative error of f drops sharply to an average of 0.85%, and the

relative errors of u0 and v0 drop sharply to an average of 0.71% and 0.52%, respectively. The more viewpoints we have, the more accurate the camera calibration will be in practice.

For real data, we applied our method to both 1D and 2D objects, and compared our algorithm to Zhang's flexible calibration method [200]. The comparison is meant to be only informal, since our camera model is simpler (but does not require 3D-to-2D correspondences). For 2D calibration, we used the data set used by Zhang, with no knowledge about the 3D coordinates of the calibration grid.

Table 4.4: Results for real data compared to Zhang's results.

Quadruple    (1234)    (1235)    (1245)    (1345)    (2345)    Mean      Dev.
α [200]      831.81    832.09    837.53    829.69    833.14    832.85    2.90
β [200]      831.82    832.10    837.53    829.91    833.11    832.90    2.84
f (Ours)     833.55    834.29    835.96    832.33    834.28    834.08    1.32
u0 [200]     304.53    304.32    304.57    303.95    303.53    304.18    0.44
u0 (Ours)    303.50    304.25    303.25    304.00    304.50    303.90    0.52
v0 [200]     206.79    206.23    207.30    207.16    206.33    206.76    0.48
v0 (Ours)    217.25    213.25    212.25    205.25    208.50    211.30    4.60

The radial distortion was removed according to his experimental results. We used four trapezoids (i.e. 16 corners), as shown in Figure 4.13, for gathering the error statistics. To evaluate our results, we also used an approach similar to [200], based on estimating the uncertainty of the results using the standard deviation of the estimated internal parameters f and (u0, v0). We evaluated the variation of the calibration results among all quadruples of images. The results are compared to those by Zhang in Table 4.4.

For the 1D object, we used the antenna of a home TV and took 16 images from 8 different viewpoints. One pair of images observed from the same viewpoint is shown in Figure 4.14. Three points along the antenna are chosen to generate the isosceles trapezoids. The calculated intrinsic parameters using these images are listed in Table 4.5. We used the sample mean as the ground truth and also show the relative difference with respect to the mean value of f.

Table 4.5: Intrinsic parameters for the real data.

Image set          f        Rel. err.   u0        Rel. err.   v0       Rel. err.
(1 2 3 4 5 6 7)    2443.22  -0.61%      1137.00    0.12%      838.67   -0.02%
(1 2 3 4 5 6 8)    2604.29   5.94%      1143.00    0.36%      844.00    0.20%
(1 2 3 4 5 7 8)    2510.88   2.14%      1125.33   -0.36%      838.00   -0.05%
(1 2 3 4 6 7 8)    2347.97  -4.48%      1137.83    0.15%      838.00   -0.05%
(1 2 3 5 6 7 8)    2472.31   0.57%      1131.00   -0.13%      838.00   -0.05%
(1 2 4 5 6 7 8)    2411.91  -1.88%      1137.00    0.12%      838.00   -0.05%
(1 3 4 5 6 7 8)    2456.22  -0.08%      1131.00   -0.13%      840.67    0.06%
(2 3 4 5 6 7 8)    2418.83  -1.60%      1131.00   -0.13%      838.00   -0.05%

The largest relative difference of f, in our case, is less than 6%.

4.6.1.1 Relation to Existing Methods

The proposed camera calibration approach using symmetric objects is related to, but different from, the work in [186] using surfaces of revolution. In [186], Wong et al. use the line ls corresponding to the projection of the axis of revolution in the image plane for calibration. In their case, this line is the vanishing line of the plane containing the camera center and the axis of revolution. By choosing their vx as the vanishing point along the normal to this plane, they obtain a pole-polar relationship between vx and ls, i.e. ls = ωvx (see [186] for the derivation). This pole-polar relationship provides two linear constraints per image, and therefore Wong et al. can solve for the intrinsic parameters using two images. In our case, however, as depicted in Figure 4.15, vx is not the vanishing point along the normal of the plane Πs containing the camera center and the line SQ. Therefore, it does not have a pole-polar relationship with the vanishing line l of that plane.

Figure 4.15: Our configuration: the plane Πs that defines the image line l is different from the plane of symmetry Π, whose normal Nx defines the vanishing point vx. As a result, in our configuration, l ≠ ωvx.

Formally, in our configuration

vx = P Nx    (4.33)

and

Nx = Ω Π,    (4.34)

where Ω denotes the absolute dual quadric [75]. But clearly, as can be seen in Figure 4.15, back-projecting the line l would not in general yield the plane of symmetry Π, i.e.

Π ≠ P^T l.    (4.35)

From (4.33), (4.34), and (4.35), we deduce that in our case

l ≠ ω vx.    (4.36)

Therefore, we do not have a pole-polar relationship between the image line l and the vanishing point vx.

4.6.2 Reconstruction of Partially Viewed Symmetric Objects

This section demonstrates the application of the proposed method to the 3D Euclidean reconstruction of a partially viewed symmetric object. We use a two-step method to build the 3D model of a scene given two images of a symmetric object [50]. The first step is direct camera calibration, as described in Section 4.6.1. The second step is an extended triangulation, which exploits the symmetry property to handle points that are occluded or not seen in one image. Symmetry, which is a property shared by many natural and man-made objects, is a rich source of information in images.

Methods that exploit symmetry to impose constraints on the 3D structure of the scene have occasionally been explored in the past [117, 206]. Francois et al. [7] have presented a method for reconstructing mirror-symmetric scenes from a single view by synthesizing a second camera based on the first one. However, they assume that their cameras are positioned in a restricted mirror-symmetric setup.

Table 4.6: Estimated 3D coordinates at the 1.0 pixel noise level.

Point    True X    Est. X     True Y    Est. Y     True Z     Est. Z
1st        -85     -82.47        0        0.002        0       -0.127
2nd       -125    -121.72     -150     -144.57         0       -0.037
3rd        125     121.44     -150     -144.31         0       -0.455
4th         85      82.13        0        0.133        0       -0.552
5th        -85     -83.43        0        0.298     -100       -95.73
6&7th     ±125    ±124.37     -100      -96.30     -115.69    -109.70
8th         85      81.98        0        0.577     -100       -95.93

As a result, the calibration is fixed by the restricted geometric configuration of the cameras, and hence is assumed known. We describe a more general framework, where unknown cameras can view a symmetric object from most positions and orientations, except for the degenerate configurations discussed in Section 4.4.

Assume that we have found the projection matrices of the two cameras, say P1 and P2, using the method described in Section 4.6.1. We can use the optimal triangulation method [74] to extract the 3D coordinates from point correspondences. Triangulation for the commonly assumed case, where the correspondences are visible in both images, is well known. However, in real scenarios we may face a situation where a point M = [X Y Z 1]^T is visible in one image and not in the second one (or vice versa).

Table 4.7: Performance vs. noise (in pixels).

Noise   Avg.    Std. Dev.   Avg.    Std. Dev.   Avg.    Std. Dev.
0.2     1.60      0.78      1.48      1.77      1.80      1.88
0.4     1.44      0.51      1.24      1.40      1.43      1.60
0.6     1.76      0.37      1.37      1.60      1.46      1.76
0.8     2.14      0.66      2.51      2.82      2.86      2.92
1.0     3.30      0.83      2.95      3.48      3.10      3.14
1.2     1.97      0.92      1.94      2.33      2.88      2.55
1.5     2.70      0.65      2.87      2.83      2.58      2.77

For a symmetric object, however, its symmetric counterpart, i.e. the point M′, might be visible, in which case we can show that the 3D coordinates of these points can still be recovered unambiguously.

In the world coordinate frame, the relationship between the two symmetric points M and M′ (symmetric with respect to the world Y-Z plane) is given by:

M′ = D M,    (4.37)

where D = diag(−1, 1, 1, 1). Thus the image point m′ is given by

m′ = P2 M′ = P2 D M.    (4.38)

In other words, the image points m and m′ of two symmetric points may be viewed as projections of the same 3D world point M by the projection matrices P1 and P2D. Therefore, 3D reconstruction can be achieved for regions that are viewed by only one camera (e.g. due to occlusions or a partial view), using the symmetric part of the object.

Figure 4.16: Synthetic results for the 1.0 pixel noise level: (a) & (b) input image pair, (c) & (d) two snapshots of the reconstructed textured model.
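A minimal sketch of the mirrored-camera triangulation described above is given below. It uses a plain linear (DLT) triangulation in place of the optimal method of [74], and the camera matrices and the point M in the toy check are made-up values.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views."""
    A = np.vstack([x1[0] * P1[2] - P1[0], x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0], x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X / X[3]

def triangulate_with_symmetry(P1, P2, m, m_mirror):
    """m images M in view 1; m_mirror images the mirrored point M' = D M in
    view 2 (M itself being occluded there). Using P2 @ D as the second
    camera makes both observations project the same 3D point M."""
    D = np.diag([-1.0, 1.0, 1.0, 1.0])     # reflection about the world Y-Z plane
    return triangulate_dlt(P1, P2 @ D, m, m_mirror)

# Toy check with two synthetic cameras and the point M = (2, 1, 5).
K = np.diag([1000.0, 1000.0, 1.0])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
M = np.array([2.0, 1.0, 5.0, 1.0])
m1 = P1 @ M; m1 = m1[:2] / m1[2]
m2p = P2 @ np.diag([-1.0, 1.0, 1.0, 1.0]) @ M; m2p = m2p[:2] / m2p[2]
print(triangulate_with_symmetry(P1, P2, m1, m2p))   # approx. (2, 1, 5, 1)
```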

The proposed approach has been tested on both simulated and real data. For the synthetic data shown herein, the image resolution was 480 × 320. Eight points, with the 3D coordinates shown in Table 4.6, were reconstructed to recover four planes, as shown in Figure 4.16. Note that the point pair 6 and 7 in Table 4.6 are not both visible in the images: the 7th point is not seen in the left image, while the 6th point cannot be seen in the right image. Hence, they only differ in their x-coordinates by a sign.

Figure 4.17: Reconstruction example of partially viewed symmetric objects: (a) & (b) two real images of a partially viewed symmetric object, (c) & (d) snapshots of the reconstructed 3D model, including the occluded left and right portions.

In the experiments presented herein, Gaussian noise with zero mean and a standard deviation of σ ∈ [0, 0.5] pixels was added to the projected image points. For each noise level, we ran our algorithm independently 100 times, and all the results shown are averaged. The reconstructed 3D points were compared with the ground truth; at the 1.0 pixel noise level, the results are shown in Table 4.6. To evaluate the performance with respect to noise, we also measured the absolute errors of X, Y and Z for every point. The mean and standard deviation of all coordinates are shown in Table 4.7 for different noise levels. Finally, two snapshots of the resulting textured model are shown in Figure 4.16 together with the synthetic input images.

We experimented with various real objects that contain symmetric structures. The experimental results were verified against ground truth and indicate an excellent performance for our approach, with the standard deviation of the errors in the recovered distance ratios under 3.5. Figure 4.17 shows the reconstructed VRML model from a pair of real images of a symmetric object. Note that the left side and the right side are each visible in only one image. However, our technique accurately recovers both sides, as shown in the snapshots of the reconstructed 3D model in Figure 4.17 (c) and (d).

4.6.3 Image Based Metrology

In [37], Criminisi proposed an approach for single view metrology, and showed that affine scene structure may be recovered from a single image without any prior knowledge of the camera calibration parameters. The limitation of their approach is that they require three mutually orthogonal vanishing points to be available simultaneously in the image plane. Also, to recover metric measurements they require three reference distances. Their advantage, however, is that they need only one image to solve the problem. In this dissertation, we propose two new methods for metrology.

Figure 4.18: Measuring the heights of vertical objects: (a) the standing person has a known height (175cm), (b) computed heights of two sign board posts (245.44cm and 248.25cm).

Figure 4.19: Measurements which might be difficult to make in practice (annotated values: 154.04cm, 263.62cm, 140.14cm).

The first approach requires two vertical objects, and therefore only one vanishing point, along the direction perpendicular to a reference plane. However, this method requires two images to solve the problem with only one reference distance. Examples of images where such a scenario applies are commonly encountered in indoor and outdoor environments, where there is a ground plane and some upright objects, e.g. humans, street lamps, trees, etc., but not all vanishing points are available; see for instance Figure 4.18.

Figure 4.20: The 3D points ti cast shadows at different times j at positions sij.

Note also that Criminisi et al. [37] can only perform measurements in the reference plane and in planes parallel to it. In our approach, we can directly perform measurements outside the reference plane and along non-parallel lines. The details of this method are described in [28, 51]; this section only demonstrates the results shown in Figures 4.18 and 4.19, where we used the height of the standing person as the reference. In Figure 4.18, the computed heights of the sign board posts are similar, which coincides with the ground truth. To test the accuracy of our algorithm, we also compared the computed results with ground truth measurements. For instance, in Figure 4.19, the estimated height of the stick is 100.75cm, while the ground truth is 99.4cm. The distance between the bottom of the standing person and the bottom of the stick is 116cm, while the estimated distance is 119.47cm. The approach can also be used to measure heights of objects that are not accessible for direct measurement; for instance, we estimated the tree's height as 263.62cm.

Other estimated distances which might be difficult to measure in practice are also shown.

The second method does not require the presence of objects, but only the shadows cast on the ground plane by two unknown stationary 3D points. We first utilize the tracked shadow positions to compute the horizon line. This is based on our observation that any two corresponding 3D lines on the shadow trajectories are parallel to each other, and therefore the imaged lines provide vanishing points along directions perpendicular to the vertical direction, whose vanishing point can be fitted from detected vertical objects.

The basic idea is illustrated in Fig. 4.20. Suppose ti is a 3D point and bi is the closest point on the ground plane to ti, so that the line tibi is perpendicular to the ground plane, and sij is the shadow cast by ti at time j. Because the sun is so far away from the earth (approximately 1.5 × 10^11 meters), it is reasonable to consider the sunlight as parallel rays. Therefore, the line pairs b_i s_{ij} and b_{i′} s_{i′j} are parallel. We also have

\frac{b_1 s_{11}}{b_2 s_{21}} = \frac{b_1 s_{12}}{b_2 s_{22}} = \frac{t_1 b_1}{t_2 b_2}.    (4.39)

Note that this equation holds in 3D, not in the image plane. Consequently, the two world triangles △b1s11s12 and △b2s21s22 are similar, and the world lines s11s12 and s21s22 are parallel to each other. Therefore, the shadows observed over time are sufficient to provide the horizon line l∞, for which the object top and bottom points (ti, bi) are not required to be present in the image. If we assume that the vertical vanishing point is also available, using the methods of [89] and [106], the computed horizon line, together with the vertical vanishing point, provides a pole-polar relationship for camera calibration. In this section, however, we are only interested in measuring the relative heights of two objects tibi and tjbj.

First, we recover the affine properties of the ground plane by a projective transformation (affine rectification):

H_p = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ l_1 & l_2 & l_3 \end{bmatrix},    (4.40)

where [l_1 l_2 l_3]^T is the vanishing line l∞. As a result, affine properties such as the ratio of lengths on parallel lines are invariant. From equation (4.39) and the fact that the two world triangles △b1s11s12 and △b2s21s22 are similar, we have the equivalence

\frac{s_{ik} s_{il}}{s_{jk} s_{jl}} = \frac{t_i b_i}{t_j b_j}, \quad \forall i, j, k, l.    (4.41)

Therefore, given n observations of the shadows of two objects i and j, we have C_n^2 solutions for t_i b_i / t_j b_j, from which the optimal solution can be computed as a weighted mean. The weight depends on the relative distance between the observation times k and l.
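The shadow-based height ratio can be sketched as follows, assuming the imaged horizon line l∞ has already been estimated. The weighting of the C(n,2) pairs by their rectified shadow displacement is our illustrative choice, since the text leaves the exact weighting open.

```python
import numpy as np

def rectify(l_inf, pts):
    """Affine-rectify image points with H_p from Eq. (4.40); l_inf = (l1, l2, l3)."""
    Hp = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   list(l_inf)])
    ph = np.hstack([np.asarray(pts, float), np.ones((len(pts), 1))]) @ Hp.T
    return ph[:, :2] / ph[:, 2:3]

def relative_height(l_inf, shadows_i, shadows_j):
    """Estimate t_i b_i / t_j b_j from n tracked shadow positions of two
    objects (Eq. 4.41), averaged over all pairs of times (k, l)."""
    si, sj = rectify(l_inf, shadows_i), rectify(l_inf, shadows_j)
    ratios, weights = [], []
    n = len(si)
    for k in range(n):
        for l in range(k + 1, n):
            di = np.linalg.norm(si[k] - si[l])   # rectified shadow displacement, object i
            dj = np.linalg.norm(sj[k] - sj[l])   # rectified shadow displacement, object j
            ratios.append(di / dj)
            weights.append(dj)                   # longer baselines are less noise-sensitive
    return np.average(ratios, weights=weights)
```

Because the two shadow segments are parallel in the world, the ratio of their lengths is an affine invariant, so it can be measured directly in the rectified image.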

4.7 Conclusion

The proposed novel inter-image constraints relax some of the limitations in other calibration algo-

rithms using vanishing points, and extend the current state-of-the-art to situations where only one

vanishing point is known. These constraints were applied to objects with mirror symmetry, 1D

objects, and vertical objects and shadows which are frequently found in both indoor and outdoor

environments, e.g. a chair, a vase, eye glasses, a face, a car, an airplane, an antenna, a street pole,

standing humans, etc. This work therefore also contributes to increasing the accessibility of calibration objects.

The fact that prior knowledge of the 3D coordinates of the calibration object is not required,

makes the method a versatile tool that can be used without requiring a precisely machined cali-

bration rig (e.g. grids), and also makes calibration possible when the object is not accessible for

measurements, e.g. in remote sensing, image-based rendering [78, 40, 43], or simply when the images are taken by other people, e.g. Zhang's real data. These flexibilities make the method an off-the-shelf and easy-to-use tool available to a wide spectrum of users. Experimental results show that the method provides very promising solutions even with minimal information. Applications to 3D reconstruction and image-based metrology demonstrate the usability and efficiency of the proposed method.

CHAPTER 5

SELF-CALIBRATION USING CONSTANT MOTION

The previous chapter describes a new technique for calibration using novel constraints derived from the scene geometry. In this chapter, we propose an alternative approach exploiting the motion of the camera itself. Ultimately, as described later in this thesis, both approaches will be used to facilitate video post-production under various practical scenarios. Below, we investigate using constant inter-frame motion for self-calibration from an image sequence of an object rotating around a single axis with varying camera internal parameters. Our approach makes use of the facts that in many commercial systems rotation angles are often controlled by an electromechanical system, and that the inter-frame essential matrices are invariant if the rotation angles are constant but not necessarily known. Therefore, recovering camera internal parameters is possible by making use of the equivalence of essential matrices, which relate the unknown calibration matrices to the fundamental matrices computed from the point correspondences. This chapter also describes a linear method that works under restrictive conditions on camera internal parameters, the solution of which can be used as the starting point for an iterative non-linear method with looser constraints.

The results are refined by enforcing the global constraint that the projected trajectory of any 3D point should be a conic after compensating for the focusing and zooming effects. Finally, using

the bundle adjustment method tailored to the special case, i.e. static camera and constant object rotation, the 3D structure of the object is recovered and the camera parameters are further refined simultaneously. To determine the accuracy and the robustness of the proposed algorithm, the results on both synthetic and real sequences are also presented in this chapter.

5.1 Introduction

Acquiring 3D models from circular motion sequences, particularly turn-table sequences, has been widely used by computer vision and graphics researchers, e.g. [164, 124, 17, 155], since these methods are simple and robust. Generally, the whole reconstruction procedure includes: first, the determination of camera positions at different viewpoints or, equivalently, the different positions of the rotating device; second, the detection of object boundaries or silhouettes; third, the extraction of a visual hull as the surface model from a volume representation [91]. Fitzgibbon et al. [52] extended the analysis of circular motion to recover unknown rotation angles from uncalibrated image sequences based on a projective geometry approach and multi-view geometric constraints. Mendonça et al. [118, 119] recovered the circular motion by using surface profiles. Wong et al. [186] also presented a method for camera calibration using surfaces of revolution, which is related to circular motion since an object placed on a turn-table spans a surface of revolution. Recently, Jiang et al. developed new methods to compute single axis motion by either fitting a conic to the locus of the tracked points in at least five images [81] or computing a plane homography from a minimum of two points in four images [79]. Colombo et al. [23] improved the approach [186] in

Figure 5.1: The geometric configuration of a single axis rotation in 3D space. The space points Ai and Bi move circularly around the fixed rotation axis on two different planes πA and πB. To aid visualization, we assume that the rotation axis is vertical, so that the 3D points rotate in horizontal planes. In our case, the relative angle between views i and i + 1 is constant and denoted by θ.

which the calibration of a natural camera, a pin-hole camera with zero skew and unit aspect ratio

[105], requires the presence of two different surfaces of revolution in the same view. In addition, the method [23] relaxes the condition, claimed by [81], that three ellipses are needed to compute

the imaged circular points.

However, most of these methods deal with the case in which a static camera with fixed internal

parameters views an object rotating on a turn-table (Figure 5.1), and utilize the fixed image entities

of the circular motion. These fixed image entities (Figure 5.3) include two lines: one is the image

of the rotation axis ls, a line of fixed points, while the other one, called the horizon line l∞, is the image of the vanishing line of the horizontal planes, e.g. π1 and π2. Unlike the image of the

rotation axis, the horizon line is a fixed line, but not a line of fixed points. Under the assumption


Figure 5.2: The projected trajectories of 3D points under circular motion. Both (a) and (b) are the projected trajectories across 90 views of a typical scene point under the configuration described in

Section 5.4.1. Different from (a) which uses a fixed camera, the solid curve in (b) has focal lengths chosen with a mean value of 1000 pixels and a standard deviation σf = 10 pixels, while the dashed curve in (b) has σf = 100 pixels.

of the fixed camera internal parameters, the image of the absolute conic is fixed for a rigid motion.

Therefore, there are two points, i and j, located at the intersection of the image of the absolute conic with the horizon line, that remain fixed in all images. These two fixed points are actually the images of the two circular points on the horizontal planes [75], and can be found as the intersections of the conic loci of corresponding points, since the trajectories of space points are circles in 3D space and intersect the plane at infinity in the circular points. However, these entities are fixed only when the camera has fixed internal parameters. For example, the projected trajectory of a 3D point is no longer a conic when the camera's internal parameters are varying (Figure 5.2 (b)).

In this chapter we concentrate on the situation where the stationary camera is free to zoom and focus, and make use of the fact that in many commercial systems rotation angles are often controlled by an electromechanical system [164, 124, 155, 115], i.e. they are constant and possibly even known. It is shown that the inter-frame essential matrices are invariant if the rotation angle is constant but not necessarily known and, therefore, recovering the camera internal parameters is possible by making use of the equivalence of the essential matrices. We also introduce a linear method that works under restrictive conditions on the camera internal parameters, such as known camera skew, aspect ratio and principal point, the solution of which can be used as the starting point of an iterative non-linear method with looser constraints. The results are optimized by enforcing the global constraint that the projected trajectories of 3D points should be conics after compensating for the focusing and zooming effects. Finally, using a bundle adjustment method tailored to the special case, i.e. a static camera and a constant rotation angle, the 3D structure of the object is recovered and the camera parameters are further refined simultaneously.

Different from the existing self-calibration methods as reviewed in Section 2.1.2, our algorithm makes use of constant inter-frame camera motion, i.e. a 3D rigid displacement described by the relative orientation and translation of two cameras. Inter-frame essential matrices are invariant in this case, since the essential matrix depends only on relative camera motion. In addition, we develop a novel linear algorithm for estimating the relative focal lengths of a camera for different frames. The input of the algorithm is only a set of fundamental matrices, and therefore there is no need for projective bundle adjustment before self-calibration.

The rest of the chapter is organized as follows. In Section 5.2, a practical calibration method, making use of constant inter-frame motion, is developed. A simple linear solution is also given which can be used as an initialization. Section 5.3 provides a two-stage optimization method. The

100 method is then validated through the experiments on both computer simulation and real data in

Section 5.4. Finally, Section 5.5 concludes the chapter with discussions on this work.

5.2 The Method

It is well known that when the projective image measurements alone are used it is only possible to recover the scene up to an unknown projective transformation [48, 65]. Additional scene, motion or calibration constraints are required for a metric or Euclidean reconstruction. We also use the constraints on camera internal parameters similar to previous self-calibration methods. However, the main difference is that constant inter-frame motion is exploited in this work.

5.2.1 Self-calibration using Constant Inter-frame Motion

In this section, we first elaborate on the equivalence between the scenario where the camera is static and the object is rotating around an unknown axis, and the case where the object is fixed while the camera is both rotating and translating.

The i-th camera projection matrix can be factorized as Pi = Ki[R | t], since our camera is static and thus has the same R and t throughout all views. We are interested in the case where the relative rotation angle between views i and i + 1 is constant (Figure 5.1). Let Rθ denote the 3 × 3 orthonormal rotation matrix of the object, which has only one degree of freedom, θ. Therefore, after applying the rotation i times, the projection matrix of the i-th frame becomes Ki[R Rθ^i | t]. This means that the new camera center is located at −(R Rθ^i)^T t, with the new rotation matrix R Rθ^i. Note that this equivalence is also true for non-constant rotations.

Let us then rewrite the i-th camera matrix such that the world origin coincides with the i-th camera center,

x_i \sim K_i [R R_\theta^i \mid t] X = K_i [I \mid 0] \begin{bmatrix} R R_\theta^i & t \\ 0^T & 1 \end{bmatrix} X.

For the (i + 1)-th view, we can derive that

x_{i+1} \sim K_{i+1} [R R_\theta R^T \mid (I - R R_\theta R^T) t] \begin{bmatrix} R R_\theta^i & t \\ 0^T & 1 \end{bmatrix} X.

Therefore, we obtain the essential matrix as

E_{i,i+1} = [(I - R R_\theta R^T) t]_\times R R_\theta R^T,    (5.1)

where [·]× is the notation for the skew symmetric matrix characterizing the cross product. Since

R, t and Rθ are all constants, the inter-frame essential matrices are invariant.

It is possible to use the invariance property of the inter-frame essential matrices to solve for the

camera matrices, Ki, given the set of fundamental matrices that encapsulate the intrinsic projective

geometry between two views. The equality of essential matrices can be expressed as

K_{i+2}^T F_{i+1,i+2} K_{i+1} \sim K_{i+1}^T F_{i,i+1} K_i,    (5.2)

where F_{i,i+1} is the fundamental matrix between the i-th and (i+1)-th views. A solution may be obtained using a non-linear least squares algorithm. The parameters to be computed are the unknown intrinsic parameters of each calibration matrix K_i, and the following criterion should be minimized:

\min \sum_{i=1}^{n-2} \left\| K_{i+2}^T F_{i+1,i+2} K_{i+1} - K_{i+1}^T F_{i,i+1} K_i \right\|_F^2,    (5.3)

where the subscript F indicates the Frobenius norm, and K_{i+2}^T F_{i+1,i+2} K_{i+1} and K_{i+1}^T F_{i,i+1} K_i are both normalized to have unit Frobenius norm. It is also important to enforce that two of each essential matrix's singular values are equal and the third is zero. In our implementation, we found that the final results are sensitive to errors in the computed fundamental matrices. Therefore, we recommend the methods [138, 199], which minimize the reprojection errors, to compute the fundamental matrices between pairs of images.
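A sketch of evaluating the residual of (5.3) is shown below; such a function would typically be handed to a generic non-linear least-squares routine. The sign disambiguation between the two normalized essential matrices is our addition, needed because fundamental matrices are only defined up to sign.

```python
import numpy as np

def normalized_essential(K_a, K_b, F_ab):
    """E ~ K_b^T F_ab K_a, scaled to unit Frobenius norm."""
    E = K_b.T @ F_ab @ K_a
    return E / np.linalg.norm(E)

def constant_motion_residual(Ks, Fs):
    """Residual of Eq. (5.3): Ks is a list of n calibration matrices and
    Fs[i] is the fundamental matrix between views i and i+1 (n-1 of them)."""
    total = 0.0
    for i in range(len(Fs) - 1):
        E1 = normalized_essential(Ks[i], Ks[i + 1], Fs[i])
        E2 = normalized_essential(Ks[i + 1], Ks[i + 2], Fs[i + 1])
        # resolve the sign ambiguity before comparing the two essential matrices
        total += min(np.linalg.norm(E2 - E1), np.linalg.norm(E2 + E1)) ** 2
    return total
```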

5.2.2 Linear approach

To obtain an initial starting point, we propose a linear approach to compute an approximate solution for the calibration. This linear solution can be obtained by assuming zero skew and a known aspect ratio and principal point. For instance, we set the principal point u_0 to (0, 0) and the aspect ratio to one. These assumptions yield

K_{i+1}^T F_{i,i+1} K_i = \begin{bmatrix} f_{i+1} F_i^1 f_i & f_{i+1} F_i^2 f_i & f_{i+1} F_i^3 \\ f_{i+1} F_i^4 f_i & f_{i+1} F_i^5 f_i & f_{i+1} F_i^6 \\ f_i F_i^7 & f_i F_i^8 & F_i^9 \end{bmatrix},    (5.4)

where f_i and f_{i+1} are the focal lengths of the i-th and (i+1)-th cameras respectively, and F_i^k denotes, in row-major order, the k-th component of F_{i,i+1}. From equation (5.4) and the equivalence property of the essential matrix, one obtains

\lambda_{i-1,i} f_i F_{i-1}^1 f_{i-1} = \lambda_{i,i+1} f_{i+1} F_i^1 f_i,
\lambda_{i-1,i} f_i F_{i-1}^2 f_{i-1} = \lambda_{i,i+1} f_{i+1} F_i^2 f_i,
\lambda_{i-1,i} f_i F_{i-1}^3 = \lambda_{i,i+1} f_{i+1} F_i^3,
\lambda_{i-1,i} f_i F_{i-1}^4 f_{i-1} = \lambda_{i,i+1} f_{i+1} F_i^4 f_i,
\lambda_{i-1,i} f_i F_{i-1}^5 f_{i-1} = \lambda_{i,i+1} f_{i+1} F_i^5 f_i,    (5.5)
\lambda_{i-1,i} f_i F_{i-1}^6 = \lambda_{i,i+1} f_{i+1} F_i^6,
\lambda_{i-1,i} f_{i-1} F_{i-1}^7 = \lambda_{i,i+1} f_i F_i^7,
\lambda_{i-1,i} f_{i-1} F_{i-1}^8 = \lambda_{i,i+1} f_i F_i^8,
\lambda_{i-1,i} F_{i-1}^9 = \lambda_{i,i+1} F_i^9,

where \lambda_{i-1,i}, \lambda_{i,i+1} \in R. In the cases where the elements F_i^9 of the fundamental matrices are not zero, the focal lengths f_{i-1}, f_i and f_{i+1} can be obtained from the equations in (5.5) by the left null space of the following matrix:

\begin{bmatrix}
F_i^9 F_{i-1}^1 & F_i^9 F_{i-1}^2 & 0 & F_i^9 F_{i-1}^4 & F_i^9 F_{i-1}^5 & 0 & F_i^9 F_{i-1}^7 & F_i^9 F_{i-1}^8 \\
0 & 0 & F_i^9 F_{i-1}^3 & 0 & 0 & F_i^9 F_{i-1}^6 & -F_{i-1}^9 F_i^7 & -F_{i-1}^9 F_i^8 \\
-F_{i-1}^9 F_i^1 & -F_{i-1}^9 F_i^2 & -F_{i-1}^9 F_i^3 & -F_{i-1}^9 F_i^4 & -F_{i-1}^9 F_i^5 & -F_{i-1}^9 F_i^6 & 0 & 0
\end{bmatrix}.    (5.6)

When more images are available, a linear estimate of the focal lengths (f_i)_{i=1}^n is given by the null space of A_{8(n-2)×n}, where

A_{8i-7}^T = [0_{(i-1)×1}, F_{i+1}^9 F_i^1, 0, -F_i^9 F_{i+1}^1, 0_{(n-2-i)×1}]^T,
A_{8i-6}^T = [0_{(i-1)×1}, F_{i+1}^9 F_i^2, 0, -F_i^9 F_{i+1}^2, 0_{(n-2-i)×1}]^T,
A_{8i-5}^T = [0_{i×1}, F_{i+1}^9 F_i^3, -F_i^9 F_{i+1}^3, 0_{(n-2-i)×1}]^T,
A_{8i-4}^T = [0_{(i-1)×1}, F_{i+1}^9 F_i^4, 0, -F_i^9 F_{i+1}^4, 0_{(n-2-i)×1}]^T,
A_{8i-3}^T = [0_{(i-1)×1}, F_{i+1}^9 F_i^5, 0, -F_i^9 F_{i+1}^5, 0_{(n-2-i)×1}]^T,
A_{8i-2}^T = [0_{i×1}, F_{i+1}^9 F_i^6, -F_i^9 F_{i+1}^6, 0_{(n-2-i)×1}]^T,
A_{8i-1}^T = [0_{(i-1)×1}, F_{i+1}^9 F_i^7, -F_i^9 F_{i+1}^7, 0_{(n-1-i)×1}]^T,
A_{8i}^T   = [0_{(i-1)×1}, F_{i+1}^9 F_i^8, -F_i^9 F_{i+1}^8, 0_{(n-1-i)×1}]^T,    (5.7)

where A_j denotes the j-th row of the matrix A_{8(n-2)×n}.

From the null space of A_{8(n-2)×n}, we have a solution for the focal lengths up to a global scale κ. There are several options for computing κ. One possibility is to pick the κ that best enforces the Huang-Faugeras constraint of equality of the singular values of the essential matrices [16, 110] for non-critical motion sequences. In our implementation, we compute the ratios ρ_i of f_i over f_1 (f_1 could be the focal length of any reference image, which is without loss of generality assumed to be the first view), and thus compensate for the effects of the varying focal lengths for each image point x_ij of the i-th image, as x̂_ij = x_ij / ρ_i. We then use the existing method [81] to obtain an initial solution for the focal length f_1 of the first image. The other focal lengths f_i can then simply be computed as f_i = ρ_i f_1.
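The linear step can be sketched as follows: the matrix of (5.7) is assembled from the row-major entries of consecutive fundamental matrices, and its null-space direction is taken from the SVD. The function below returns only the ratios ρ_i = f_i / f_1; fixing the global scale κ (e.g. via [81]) is left out.

```python
import numpy as np

def relative_focal_lengths(Fs):
    """Linear estimate (Eq. 5.7) of f_1..f_n up to a global scale, given the
    n-1 inter-frame fundamental matrices Fs[i] (views i and i+1), each with
    a non-zero (3,3) element. Returns the ratios rho_i = f_i / f_1."""
    n = len(Fs) + 1
    rows = []
    for i in range(n - 2):
        Fa = Fs[i].ravel()        # F_i in row-major order, Fa[8] = F_i^9
        Fb = Fs[i + 1].ravel()    # F_{i+1} in row-major order
        for k in (0, 1, 3, 4):    # elements 1, 2, 4, 5: relate f_i and f_{i+2}
            row = np.zeros(n); row[i] = Fb[8] * Fa[k]; row[i + 2] = -Fa[8] * Fb[k]
            rows.append(row)
        for k in (2, 5):          # elements 3, 6: relate f_{i+1} and f_{i+2}
            row = np.zeros(n); row[i + 1] = Fb[8] * Fa[k]; row[i + 2] = -Fa[8] * Fb[k]
            rows.append(row)
        for k in (6, 7):          # elements 7, 8: relate f_i and f_{i+1}
            row = np.zeros(n); row[i] = Fb[8] * Fa[k]; row[i + 1] = -Fa[8] * Fb[k]
            rows.append(row)
    A = np.vstack(rows)
    f = np.linalg.svd(A)[2][-1]   # singular vector of the smallest singular value
    f = np.abs(f)                 # a consistent solution has a single common sign
    return f / f[0]
```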

However, the above method using equation (5.7) will fail when the elements F_i^9 of the fundamental matrices are zero. Note that this special case is easy to detect, since all the essential matrices are equal up to an unknown scale; in other words, we only need to check the element F_i^9 of one inter-frame fundamental matrix. It is shown in [110] that whenever the optical axes of the i-th and (i+1)-th cameras intersect, F_i^9 is equal to zero, since in this case the principal points must satisfy the epipolar constraint, i.e. u_0^{i+1 T} F_{i,i+1} u_0^i = 0, where u_0^{i+1} = u_0^i = [0 0 1]^T. While the case where the optical axes of the two cameras intersect is a critical motion for Kruppa-based methods [93], the constant inter-frame motion is still able to provide enough constraints for the computation of the relative focal lengths. For example, one can use the following equations, based on the equivalence of the inter-frame essential matrices:

F_i^n F_{i+1}^m f_{i+1} - F_i^m F_{i+1}^n f_i = 0,   m = 1, 2, 4, 5;  n = 3, 6,    (5.8)

F_i^n F_{i+1}^m f_{i+2} - F_i^m F_{i+1}^n f_{i+1} = 0,   m = 1, 2, 4, 5;  n = 7, 8.    (5.9)

Similar to A_{8(n-2)×n}, we can build another matrix A′_{16(n-2)×n}, whose null space provides the solution for the focal lengths up to a scale κ. In this case, we can still use the existing method [81] to obtain κ, although the Huang-Faugeras constraint will fail.

106 5.3 Two-stage Optimization

Once the initial approximation for the focal lengths of the cameras has been found by the linear algorithm, a full nonlinear optimization can be run. The nonlinear optimization can be divided into two stages that gradually refine the initial solution obtained from the previous subsections.

5.3.1 Conic enforcement after compensation

Similar to [81], we first improve the results by enforcing the global constraint that the projected trajectories of 3D points should be conics after compensating for the focusing and zooming effects.

The main advantage of enforcing conic constraint is that it is intrinsically a multiple view approach as all geometric information from the whole sequence is nicely summarized in the conics as argued by [81]. Practically, conic enforcement efficiently improves the results as shown in Section 5.4.1.

The image points xi,j can be compensated as,

x̂_ij = K_1 K_i^{-1} x_ij,    (5.10)

where K_1 is the camera calibration matrix of a reference view, which is without loss of generality assumed to be the first view. After the compensation, the conic property of the correspondence tracks is fully restored, and the entities related to the conic and the planar motion become fixed again, such as the rotation axis l_s, the horizon line l_∞, and the circular points i and j shown in Figure 5.3.

Figure 5.3: The entities related to the geometry of a single axis motion observed by a fixed camera. The fixed entities include the rotation axis l_s, the horizon line l_∞, the circular points i and j, and the vanishing point v of the rotation axis. The projection o_i of the center of a circle C_i is the pole of the horizon line l_∞ with respect to the conic C_i, i.e. o_i = C_i^{-1} l_∞.

Given a conic C_j, there is a homography H_j that maps C_j into a unit circle O, such that

O = H_j^{-T} C_j H_j^{-1},

where H_j can be parameterized as [104, 81]

H_j = \begin{bmatrix} s_j & 0 & -\mu_j \\ 0 & s_j & -\nu \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{\beta} & -\frac{\alpha}{\beta} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ l_1 & l_2 & l_3 \end{bmatrix},    (5.11)

where s_j is the scale that enforces the radius of the circle O to be one, (µ_j, ν, 1)^T is the pole o_j (Figure 5.3) of l_∞ with respect to the conic C_j, (α ± iβ, 1, 0)^T are the circular points i and j, and (l_1, l_2, l_3)^T is the vanishing line l_∞. The parameters l_1, l_2, l_3, α and β are fixed, since the circular points and the vanishing line are fixed entities. In addition, ν can be assumed to be constant, in that the pole is constrained by the fixed rotation axis l_s. Given m 3D points, therefore, a total of 6 + 2m parameters needs to be estimated by minimizing the following MLE cost function:

\arg\min_{\Theta_1} \sum_{i=1}^{n} \sum_{j=1}^{m} d^2(\hat{x}_{ij}, C_j),    (5.12)

where Θ_1 = {l_1, l_2, l_3, α, β, ν, s_j, µ_j}, and d^2(x̂_ij, C_j) is the distance function from the point x̂_ij to the conic

C_j, defined as

d^2(\hat{x}_{ij}, C_j) =
\begin{cases}
\dfrac{(\hat{x}_{ij}^T C_j \hat{x}_{ij})^2}{4\big((C_j \hat{x}_{ij})_1^2 + (C_j \hat{x}_{ij})_2^2\big)} & \text{if } \hat{x}_{ij} \in C_j \\
0 & \text{otherwise}
\end{cases}    (5.13)

where (Cjxˆij)i is the i-th component of Cjxˆij. After substituting xˆij with Equation (5.10), we obtain

\arg\min_{\Theta_1, K_i} \sum_{i=1}^{n} \sum_{j=1}^{m} d^2(K_1 K_i^{-1} x_{ij}, C_j).    (5.14)

This cost function is minimized using the standard Levenberg-Marquardt algorithm [114].
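The compensation of (5.10) and the point-to-conic distance of (5.13) can be sketched as follows; the distance is the usual first-order (Sampson-style) approximation, and the unit-circle example at the end is only a sanity check.

```python
import numpy as np

def compensate(K1, Ki, x):
    """Eq. (5.10): remove the zoom/focus change of view i relative to the
    reference view (normalize the result before measuring distances)."""
    return K1 @ np.linalg.inv(Ki) @ x

def conic_distance_sq(x, C):
    """Squared point-to-conic distance of Eq. (5.13) for a homogeneous
    point x (last coordinate 1) and a 3x3 symmetric conic matrix C."""
    Cx = C @ x
    algebraic = float(x @ Cx)                    # x^T C x
    grad_sq = 4.0 * (Cx[0] ** 2 + Cx[1] ** 2)    # squared gradient magnitude
    return 0.0 if grad_sq == 0.0 else algebraic ** 2 / grad_sq

# Example: unit circle x^2 + y^2 - 1 = 0 and a point slightly off it.
C = np.diag([1.0, 1.0, -1.0])
x = np.array([1.1, 0.0, 1.0])
print(np.sqrt(conic_distance_sq(x, C)))   # ~0.1
```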

The 6 + 2m parameters are initialized as follows. First, the focusing and zooming effects are

compensated by using Equation (5.10). Second, each conic is fitted to corresponding points from

at least five views. Third, the pole of each conic with respect to the vanishing line is calculated as

shown in Figure 5.3, and the point on the rotation axis ls which is nearest to the pole, oi, is used to

estimate the initial value of µi. Fourth, the radius of each 3D circle, transformed from each imaged

conic, determines the initial value of sj. Finally, each conic is mapped to a unit circle with center

at the origin, and the points on the conic are mapped to points near the unit circle for the optimization procedure.

109 5.3.2 Reconstruction using bundle adjustment

After the refined camera matrices are obtained, the 3D points or structure can be determined by

triangulation from two or more views [74]. To minimize the overall reconstruction errors and to further refine the estimated camera parameters, here we use a bundle adjustment approach [168]

explicitly enforcing another available constraint: static camera and constant rotation angle. Given

n images and m corresponding image points, the maximum likelihood estimate (MLE) can be

obtained:

\arg\min_{\Theta_2} \sum_{i=1}^{n} \sum_{j=1}^{m} d^2(x_{ij}, K_i [R R_\theta^i \mid t] X_j),    (5.15)

where Θ_2 = {K_i, R, R_θ, t, X_j}, and d(·, ·) is the distance function between the image measurement x_ij and the projection of the estimated 3D point X_j. Nevertheless, as shown in [52, 75], circular motion has a fundamental ambiguity in the vertical apex v (Figure 5.3), which causes an unknown ratio between the horizontal and vertical directions in the 3D reconstruction. To remove this ambiguity, we assume a unit aspect ratio and zero skew for all cameras and specify a reasonable choice for the aspect ratio of the object.

Similar to most other self-calibration methods, such as [134, 4], we also have difficulty in precisely estimating the principal points, because the principal point u_0 is known to be a poorly constrained parameter which tends to fit to noise. In practice, we notice that u_0 is mostly located close to the image center. Using this prior information, we model the expectation of the principal point as a Gaussian distribution with its mean at the image center \bar{u}_0 and uncertainties Σ_{u_0} = diag(σ_u^2, σ_v^2). Therefore, we apply this prior information about the principal point to Equation

(5.15). Consequently, the bundle adjustment is rewritten as

\arg\min_{\Theta_2} \left( \sum_{i=1}^{n} \sum_{j=1}^{m} d^2(x_{ij}, K_i [R R_\theta^i \mid t] X_j) + (u_0^i - \bar{u}_0)^T \Sigma_{u_0}^{-1} (u_0^i - \bar{u}_0) \right),    (5.16)

where u_0^i is the estimate of the principal point for each view. Unless mentioned otherwise, we use 1/10 of the image width and height as σ_u and σ_v, and 1/2 of the image width and height as \bar{u}_0 and \bar{v}_0, respectively.
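A sketch of evaluating the cost of (5.16) is given below. For simplicity the rotation axis of R_θ is taken to be the world z-axis, and zero skew and unit aspect ratio are hard-coded as assumed in the text; in practice this cost would be minimized with a sparse Levenberg-Marquardt implementation rather than evaluated directly.

```python
import numpy as np

def rot_z(angle):
    """Rotation by `angle` about the z-axis (our stand-in for R_theta)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def cost_5_16(f, theta, R, t, X, x_obs, u0, u0_bar, sigma_u, sigma_v):
    """Cost of Eq. (5.16): f, u0 are per-view focal lengths / principal
    points, X is (m,3) structure, x_obs is (n,m,2) observations."""
    n, m = x_obs.shape[:2]
    Xh = np.hstack([X, np.ones((m, 1))])
    total = 0.0
    for i in range(n):
        K = np.array([[f[i], 0.0, u0[i, 0]],
                      [0.0, f[i], u0[i, 1]],
                      [0.0, 0.0, 1.0]])
        # R R_theta^i is a rotation by i*theta about the (assumed) z-axis
        P = K @ np.hstack([R @ rot_z(i * theta), t.reshape(3, 1)])
        proj = Xh @ P.T
        proj = proj[:, :2] / proj[:, 2:3]
        total += np.sum((proj - x_obs[i]) ** 2)                     # reprojection term
        du = u0[i] - u0_bar
        total += (du[0] / sigma_u) ** 2 + (du[1] / sigma_v) ** 2    # Gaussian prior on u0
    return total
```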

Note that our optimization process differs from the general reconstruction, e.g. the method in

Section 4.2.3, in that it explicitly encodes the specific non-general motion, i.e. constant R, t, and

Rθ. Consequently, a total of 3m + 4 parameters must be estimated for m views, where three is

the number of degrees of freedom of K (note that we enforce zero skew and unit aspect ratio), and four includes three rotation angles in R and one constant but unknown angle θ (Figure 5.1) in Rθ.

This is a considerable saving over the 9m that would be required for a projective reconstruction of

a general motion sequence if we make the same assumptions on the camera internal parameters,

which reduce the number of degrees of freedom of a projection matrix P of a pinhole camera from

eleven to nine.
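For concreteness, the following is a minimal sketch of the regularized bundle-adjustment cost of Equation (5.16): the reprojection residuals plus a Gaussian prior pulling each per-view principal point toward the image center. The helpers unpack and project are hypothetical placeholders for the parameter bookkeeping and for the projection K_i[R R_θ^i | t]X_j; they are not part of the dissertation.

    # Minimal sketch of the residual vector behind Equation (5.16).
    import numpy as np

    def ba_residuals(theta, observations, u0_bar, sigma_u, sigma_v, unpack, project):
        # observations: list of (view index i, point index j, measured 2D point x_ij)
        Ks, R, Rtheta, t, X, u0 = unpack(theta)      # u0[i] = principal point of view i
        res = []
        for i, j, x_ij in observations:
            x_hat = project(Ks[i], R, Rtheta, t, X[j], i)   # K_i [R Rtheta^i | t] X_j
            res.extend(x_ij - x_hat)                        # reprojection residuals
        for u0_i in u0:                                     # prior term of Eq. (5.16)
            res.append((u0_i[0] - u0_bar[0]) / sigma_u)
            res.append((u0_i[1] - u0_bar[1]) / sigma_v)
        return np.asarray(res)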

5.4 Experimental Results

The proposed approach has been tested on both simulated and real image sequences. First, a syn-

thetic image sequence is used to assess the quality of the algorithm under simulated circumstances.

Both the amount of noise on the projected image points and on the rotation angles of the objects


Figure 5.4: (a) A view of geometrically equivalent sequences used for simulation, where magenta points denote the positions of cameras. (b) Reconstructed 3D points, where blue cubes denote the ground truth while red cubes are reconstructed ones at the noise level σ = 2.5 pixels.

are varied. Then results are given for real image sequences to demonstrate the usability of this proposed solution.

5.4.1 Computer simulation

The simulations are carried out on a sequence of views of a synthetic scene, which consists of

100 points uniformly distributed on a sphere with a radius of 200 units and centered at the origin.

Our synthetic camera is located in front of the scene at a distance of 500 units with three rotation angles (20◦, 20◦ and 15◦) between the world coordinate system and the camera coordinate system.

In addition to a unit aspect ratio and zero skew, the camera’s other internal parameters are chosen

[Figure 5.5 plots three curves per panel: before optimal estimation, after conic enforcement, and after bundle adjustment; the horizontal axes show the noise level sigma in pixels.]

Figure 5.5: Performance of the focal length, f, and the principal point, u0, as a function of noise levels: (a) relative error of f, and (b) relative distance of the principal point u0 with respect to the true focal lengths.

as follows. The focal lengths are different for each view, randomly chosen with an expected value

of 1000 (in pixels) and a standard deviation of 250. To avoid the case that the chosen focal lengths

fall outside the reasonable range, e.g. below zero, we limit them to vary between 750 and 1250. √ T The principal point, u0, had an expected value of [0 0] with a standard deviation of 20 2. An

example view of the equivalent scene, where the camera is moving and the object is stationary, is

shown in Figure 5.4 (a).

Performance Versus Pixel Error: To assess the performance versus noise on the projected

image points, nine views are generated to compute the camera matrices. Gaussian noise with zero

mean and a standard deviation of σ ≤ 5.0 pixels was added to the projected image points. The

estimated camera parameters were then compared with the ground truth. As argued by [173] and

[202], the relative difference with respect to the focal length rather than the absolute error is a


Figure 5.6: Performance of focal length and 3D metric reconstruction as a function of rotation angle errors: (a) relative error of focal length and (b) relative 3D metric error. All results shown here are averaged over 100 independent trials.

geometrically meaningful error measure of camera internal parameters. Therefore, we measured

the relative error of focal length, f, and the principal point, u0, while varying the noise level from

0.5 pixels to 5.0 pixels. At each noise level, we perform 100 independent trials, and the averaged

results of the proposed self-calibration algorithm are shown in Figure 5.5. Errors increase almost

linearly with respect to the noise level for both focal lengths and principal points. Using our two-

stage non-linear optimization, the results are refined from a coarse starting point to a fine level for

both f and u0. The first-stage conic enforcement reduces the errors of the estimated focal lengths by around 18.5% on average (ranging from 7.4% to 36.7%). These errors are then further reduced by an average of 11% in the second-stage bundle adjustment after enforcing the constant rotation angle. In our experiments, we increase σ up to 5 pixels. For σ = 2.5 pixels, a typical large noise level in practical calibration, the relative error of the focal length f is 1.0%. Figure 5.4 (b) shows the

3D reconstructed scene in one trial. The maximum relative error of f (resp. u0) is around 2.55%

(resp. 1.39%) when σ ≤ 5.0.

Performance Versus Rotation Angle Error: Another experiment (nine views) is carried out

to evaluate how sensitive the algorithm is to noise in the rotation angles. Gaussian noise with zero

mean and a standard deviation of σ ≤ 2.0 degrees was added to the rotation angles. Considering the fact that extracted feature points will in practice be affected by noise, we also added a typical noise level of σ = 1.0 pixels to all projected image points. The final results after optimal estimation are shown in Figure 5.6. The influence of the orientation noise is larger than that of pixel noise (see

Figure 5.5), which of course depends on the absolute rotation angle between the views. Note that

this coincides with the observation by Frahm and Koch [54], in which case this is more evident

since they have smaller rotation angles. The errors in both focal lengths and 3D reconstruction

increase almost linearly with respect to the rotation angle noises. Notice also that the errors do not

go to zero as noise goes towards zero due to the added noise in image projections.

5.4.2 Real data

The first real sequence is the Tylenol sequence from Columbia Object Image Library (COIL-20).

The COIL sequences have previously resisted structure from motion extraction, due to their low

feature counts and variable focal length, which this work provides the machinery to overcome. We

use 18 frames out of the original 72 views of the box as shown in Figure 5.7. The tracks of the


Figure 5.7: Eighteen views of the Tylenol sequence.

corresponding points estimated using our previous work [195] are shown in Figure 5.9 (a), and the

final determined focal lengths for these images are shown in Figure 5.8 (a). The estimated focal

lengths are consistent with the real sequence. For example, the camera zoomed in considerably

to capture the 5th and 15th frames while it zoomed out when shooting the 10th frame. To evaluate the proposed method, we first compensate the frames according to the final estimated calibration matrices by using the 7th frame as the reference. The fitted conics and estimated rotation axis are shown in Figure 5.9 (b-d) for three compensated frames (frames 7, 5, and 11). We also show

the conics, rotation axis and horizontal line of the compensated frames in Figure 5.8 (b). Finally,

piecewise planar model with mapped texture is shown in Figure 5.10.

We also tested our approach on the popular dinosaur sequence from the University of Hannover.

The sequence contains 36 views of a dinosaur located on a turn-table which is rotating with a

constant angular motion of 10 degrees per frame. One frame with tracked points is shown in


Figure 5.8: (a) Computed focal lengths of the Tylenol sequence. (b) The conics, rotation axis and

horizontal line of the compensated frames. Note: we scale the vertical direction to show all the entities.

Figure 5.11 (a). The computed focal length for the image sequence is shown in Figure 5.12 (a).

The results are consistent with the known fact that the focal lengths are fixed.

In another dinosaur sequence, the focal length of the camera is set to change in a zigzag

fashion (0.8 − 1.0 − 1.2), by rescaling the original images. Three consecutive frames are shown

in Figure 5.13. When the static camera is free to zoom and focus, the 3D circular trajectory is

not projected to a conic anymore (Figure 5.11 (b)). The computed focal lengths for the dinosaur sequence are shown in Figure 5.12 (b), and closely follow the zigzag changing pattern. To assess the correctness of our proposed method, the visual hull of the dinosaur can be computed

[124] as shown in Figure 5.14. The processing of the volume is performed using a resolution in

space of 200³ unit cubes for the bounding box of the dinosaur. Although the error in the estimated

visual hull is expected to be higher than a calibrated sequence or a sequence captured by a camera


Figure 5.9: (a) One original frame (frame 7) of the Tylenol sequence and a subset of the point

tracks - only 94 tracks which survived for longer than 4 successive views out of the original 18 views are shown. (b-d) show the estimated conics on compensated frames 7, 5 and 11. The red vertical line is the rotation axis.

with fixed internal parameters, our method provides a powerful solution for reconstruction from

circular motion in the presence of changing internal parameters.

5.5 Conclusion

This chapter focused on the problem of self-calibration from an image sequence of an object ro-

tating around a single axis (or equivalently a camera moving on a circle around an object in the

scene) in the presence of varying camera internal parameters. Using the invariance property of the

inter-frame essential matrices when the rotation angle is constant, we present a new and elegant

solution for camera calibration. Compared to the existing methods, we effectively utilize the prior

information, such as constant rotation angle and circular motion, develop a linear algorithm to find

an initial solution assuming zero skew and known aspect ratio and the principal point, and design


Figure 5.10: Three snapshots of the piecewise planar models with mapped texture of the Tylenol box.

a two-stage optimization approach to gradually refine the camera parameters from coarse to fine.

The experimental results demonstrate the usability of this proposed solution.

There are two potential applications of this method. First, it can be used to recover the 3D model of an object, which is useful in model-based video post-production applications. Since the camera is free to zoom and focus, it can potentially provide higher-resolution 3D models. Second, this method can be used for simultaneous calibration of a camera network in a surveillance system. One solution to the occlusion problem in many heavily crowded surveillance sites (e.g. airports, subway stations and railway stations) is to use multiple sensors (cameras) [109].

In the general case, e.g. when the cameras have the same FOV and the scene geometry is general, the area covered by the sensors is maximized by placing them on a circle with equal angular separation, which is geometrically equivalent to the case of the turn-table sequence shown in Section 5.2.1.


Figure 5.11: One frame of the dinosaur sequence and a subset of the point tracks - only 107 tracks which survived for longer than 8 successive views are shown. (a) In the case of a static camera with fixed internal parameters, the point tracks are ellipses which are images of circles. (b) When the static camera is free to zoom and focus, the 3D circular trajectories are not projected to conics anymore.


Figure 5.12: (a) Computed focal lengths of the dinosaur image sequence with fixed focal length, and (b) Computed focal lengths of the dinosaur image sequence with focal length changing in a zigzag fashion.


Figure 5.13: Three consecutive sample frames of the zigzag dinosaur sequence. These frames are rescaled from the original images by scales 0.8 (a), 1.0 (b) and 1.2 (c).


Figure 5.14: Four views of the 3D reconstruction of dinosaur from silhouettes.

CHAPTER 6

VIDEO ANALYSIS

In video post-production applications that will be discussed in Chapter 7, camera motion analysis

and alignment are important to ensure the geometric correctness and temporal consistency [161].

The above two chapters discussed the accurate 3D methods for recovering the camera calibration

and geometry for general and practical video scenes. The recovered camera motion can be directly

used for 3D alignment as shown in Section 6.2. However, we are also willing to trade some generality in estimating and aligning camera motion for reduced computational complexity and a more image-based formulation.

Particularly, this chapter addresses the problem of synchronizing video sequences of distinct

scenes captured by cameras undergoing similar motions. For the general motion and 3D scene, the

camera ego-motions are featured by parameters obtained from the fundamental matrices. In the

case of special motion, e.g. pure translation or pure rotation, relative translational magnitude or

rotation angles are used as camera motion features. These extracted features are invariant to the

camera internal parameters, and therefore can be computed without recovering camera trajectories

along the image sequences. Consequently, the alignment problem reduces to matching sets of

122 feature points, obtained without any knowledge of other sequences. Experimental results show

that our method can accurately synchronize sequences even when they have dynamic timeline

maps, and the scenes are totally different and have dense depths.

6.1 Introduction

The problem of synchronizing video sequences has become an active area in computer vision com-

munity as described in Section 2.2.3. The proposed approach herein tackles a general problem, i.e.

the synchronization of N video sequences of distinct general scenes captured by cameras under-

going similar ego-motions. In this work, two camera motions are defined to be similar when the

camera locations and poses in all corresponding time slots are related by a common 3D similar-

ity transformation. In the case of general camera motion, similar camera displacements result in

the same relative inter-frame orientation and scaled relative inter-frame translation, i.e. the same

essential matrices up to an unknown scale, between the synchronized frame pairs. Our solution

is based on the key observation that the equality of the essential matrix is reflected in the uncali-

brated fundamental matrix in that its upper-left 2 × 2 elements remain constant up to an unknown scale. Therefore, for each frame, we can obtain a homogeneous four-dimensional feature vector characterizing the camera ego-motion relative to a reference frame. Of course only the ratios of the elements of those feature vectors are relevant. We call these ratios the fundamental ratios.

On the other hand, the fundamental ratios would fail to provide camera ego-motion information for non-general camera motion, in which case the number of degrees of freedom of the fundamental matrix is less than the seven of general motion, or the epipolar geometry is undefined, e.g. fixed camera centers or planar scenes. To overcome these degeneracies, we analyze the special motions and provide dedicated features to align them. For special motions, we mainly concentrate on three types of camera motions: pure translation, pure rotation and zooming. In the case of pure translational camera motion, we use the relative translational magnitude by slicing the 3D data volume along the epipolar lines. For cameras with fixed locations undergoing pure rotation, we compute the rotation angles directly from the inter-frame planar homography by eigenvalue decomposition. Finally, the zooming factor is used as the camera motion feature for the zooming video sequences.

The main advantage of the new method is that it can deal with video sequences of general distinct scenes captured by unknown moving cameras, whether the motion is general or special. Compared to the traditional synchronization methods which require video cameras observing the same site, our method is able to handle totally distinct scenes. Different from the previous efforts on temporal alignment of non-overlapping sequences, which typically utilize the inter-sequence relationship and hence inherently involve two sequences, the proposed approach is a one-sequence process and computes the camera ego-motion for each sequence separately. Consequently, the alignment problem reduces to matching N sets of feature points. Besides its efficiency, which allows a combination-free implementation, our algorithm, as a 2D solution, does not involve scene reconstruction or 3D recovery of the camera trajectories [125]. The input of our algorithm is only the point correspondences within each sequence. Finally, the approach is simple to implement.

124 (a) A source video frame (b) A target video frame

Figure 6.1: Example frames in the source and target video sequences.

This chapter begins with the illustration of how to make use of the explicitly recovered camera motions (See Chapters 4 and 5) for video alignment in Section 6.2. In Section 6.3, the concept of the fundamental ratios for the description of general camera ego-motions is introduced. Section 6.4 then describes a statistical approach to characterize special camera motions such as pan, zoom and track. The motion features for these special camera motions are also provided. Next, in Section 6.5, we present the algorithm and its implemental details. Experimental results on real video sequences of different camera motions are demonstrated in Section 6.6. Finally, we discuss the contributions and provide the future direction in Section 6.7.


(a) source video (b) target video

Figure 6.2: Camera trajectories of both source and target video sequences computed by the method

[200].

6.2 3D Alignment Using Recovered Camera Motions

For hand-held shots where camera motions cannot be simply related by a similarity transformation, we first estimate the poses of the cameras that captured both the source and the target scenes, and then align these two videos along 3D camera trajectories by minimizing the global differences between the source and the transformed target viewing directions.

The methods described in Chapters 4 and 5 can be used for the camera pose estimation. For example, given the two hand-held shots shown in Fig. 6.1, the computed camera positions and poses are shown in Fig. 6.2. As another example, Fig. 6.3 gives the estimated camera trajectories of two hand-held shots in Figure 7.1.


Figure 6.3: 3D camera trajectory alignment between two shots in Figure 7.1. The left is the 3D view and the right side is the top view. The red curve is the source camera trajectory. The blue and green curves are the target camera trajectories before and after alignment.

Theoretically, we have the freedom to choose the world coordinate frames of both the source and target shots, and directly match the source and target frames by minimizing the global differences between the source and the target viewing directions. However, the direct method would not always work well, since the viewing directions of the source (Cs) and target (Ct) cameras may be very different after overlapping their world coordinates as shown in Figure 6.4 (a). Consequently, in the video post-production applications discussed in Chapter 7, if we simply render the object from Ct using the source view captured from Cs, unpleasant distortions may occur as shown in

Figure 6.5 (c).

To minimize these distortions, we aim to choose the 3D world coordinate systems for the source and target scenes such that the source and target camera trajectories are globally matched. In other words, we need to find a 4 × 4 projective transformation matrix H to transform the specified


Figure 6.4: (a) Camera trajectory alignment using H. (b) The decomposition of H.

target world coordinate system (Figure 6.5 (b)) to a new one, with the intention that the new target

viewing directions, Co = HCt, are closer to source view directions Cs (Figure 6.4 (a)), and thus

the matted object can be rendered with the least global distortions as shown in Figure 6.5 (d).

Generally, the global coordinate transformation matrix H can be approximately decomposed into

four sub-transformations (see Figure 6.4 (b)):

H = TtSRθTs, (6.1)

where Ts is a translation matrix to translate the foreground object to be cut into the 3D origin of

the source video, Rθ is a rotation matrix around the Z axis, S is a global scaling matrix, and Tt is a translation to the object's destination in the target scene.
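The decomposition in Equation (6.1) can be assembled directly from its four factors. The following is a minimal sketch under the stated parameterization (user-chosen translations t_s and t_t, a rotation angle θ about the Z axis, and a global scale s); the function name is illustrative only.

    # Minimal sketch of building H = T_t S R_theta T_s from 4x4 homogeneous blocks.
    import numpy as np

    def build_H(t_s, theta, s, t_t):
        Ts = np.eye(4); Ts[:3, 3] = t_s             # move the object to the 3D origin
        Rt = np.eye(4)
        c, si = np.cos(theta), np.sin(theta)
        Rt[:2, :2] = [[c, -si], [si, c]]            # rotation about the Z axis
        S = np.diag([s, s, s, 1.0])                 # global scaling
        Tt = np.eye(4); Tt[:3, 3] = t_t             # destination in the target scene
        return Tt @ S @ Rt @ Ts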

As a result, the object is able to move on any reference plane, perpendicular to the Z axis, of the target environment with a flexible scale and rotation. Given camera matrices P_s^i and P_t^j for the

source frame i and the target frame j, a 3D point, X, inside the object is projected as

\[
\mathbf{x}_s^i = \mathbf{P}_s^i \mathbf{X}, \tag{6.2}
\]
\[
\mathbf{x}_o^j = \mathbf{P}_t^j \mathbf{H} \mathbf{X} = \mathbf{P}_o^j \mathbf{X}, \tag{6.3}
\]


(a) Source frame (b) Target frame

(c) Result without alignment (d) Result with alignment

Figure 6.5: Cut a person from (a) and paste it to (b). The world 3D coordinates are superimposed

in the source and target frames.

where x_o^j and x_s^i are the 2D projections of X, and P_o^j is the transformed target camera matrix.

To compute H, we enforce the least-distortion property that minimizes the viewing orientation differences. Given a camera matrix P = [M | t], the 3D camera location is computed as C = −M⁻¹t. After translating the 3D world origin to the "Look At" position, which typically lies at the principal point, we can use the vector C⃗ as the viewing direction of camera P. Therefore, the viewing orientation difference between P_o^j and P_s^i is computed as

\[
\beta_{i,j} = \arccos\!\left(\frac{\vec{C}_s^{\,i}\cdot\vec{C}_o^{\,j}}{\|\vec{C}_s^{\,i}\|\,\|\vec{C}_o^{\,j}\|}\right), \tag{6.4}
\]

where C⃗_s^i and C⃗_o^j are the viewing directions of P_s^i and P_o^j, respectively. Standard nonlinear mini-

mization methods can be utilized now to find the solution for H, given a reasonable initialization,

which can be chosen as follows. In practice, we want to transfer the object into the target scene at

a specified location, Tt, and a desired scale, S, with an optimal viewing field for the composited

video sequence. In addition, we can compute Ts directly by specifying the object location in the

source shot. Finally, the rotation angle θ can be estimated by minimizing the view angles between

the ending source and target frames. For example, Figure 6.3 shows the camera trajectories before

and after 3D alignment. Once H is determined, for each target frame j, the closest source view i can be found by minimizing β_{i,j}.
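A minimal sketch of this matching step is given below. It assumes that the world origin has already been translated to the "Look At" position, so that the camera center C = −M⁻¹t of P = [M | t] can be used directly as the viewing direction, and it simply evaluates Equation (6.4) for every source frame; the function names are illustrative.

    # Minimal sketch: find the source view i that minimizes beta_{i,j} (Eq. 6.4).
    import numpy as np

    def viewing_direction(P):
        M, t = P[:, :3], P[:, 3]
        C = -np.linalg.solve(M, t)              # camera centre used as view vector
        return C / np.linalg.norm(C)

    def closest_source_view(P_source, P_target_j):
        d_t = viewing_direction(P_target_j)     # transformed target camera P_o^j
        betas = [np.arccos(np.clip(viewing_direction(Ps) @ d_t, -1.0, 1.0))
                 for Ps in P_source]
        return int(np.argmin(betas))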

6.3 2D Alignment of Similar General Motion

A video sequence V is an ordered set of images {I_j}_{j=1}^{N} captured by a camera. Assuming a pin-hole

camera model and constant camera internal parameters, the camera projection matrix Pj of the

frame Ij can be factorized as Pj = K[Rj|tj], where K is the simplified camera calibration matrix

including the intrinsic parameters, i.e. the focal length f and the principal point x0 = (u0, v0),

and Rj and tj are camera rotation and translation, respectively. As shown in Fig. 6.6, if we have

another sequence V′, i.e. {I′_{j′}}_{j′=1}^{N′}, of distinct general scenes captured by a camera undergoing

Figure 6.6: Two video sequences V (left) and V′ (right) of distinct general scenes captured by

cameras undergoing the same general movement.

similar motion as that of V, the camera projection matrix P′_{j′} of frame j′ in V′ can be factorized as P′_{j′} = K′[R′_{j′} | t′_{j′}]H_s, where j′ = c(j) is the frame index in V′ corresponding to frame j in sequence V, and the 4 × 4 similarity transformation matrix Hs relates the locations and poses

of the two synchronized cameras that capture two corresponding frames. The correspondence

relationship, c(·), can be dynamic [140] or modeled by a 1D affine model [33, 34, 166].

In the noise-free case, it is simple to verify that the relative translation t_{i,j} (orientation R_{i,j}) between frames i and j in sequence V is equal up to an unknown scale (respectively identical) to the relative translation (orientation) between frames c(i) and c(j) in sequence V′. Therefore, we have

\[
\mathbf{E}_{i,j} \sim [\mathbf{t}_{i,j}]_\times \mathbf{R}_{i,j} \sim \mathbf{E}'_{c(i),c(j)}, \tag{6.5}
\]

where ∼ indicates equality up to multiplication by a non-zero scale factor, and [·]_× denotes the skew-symmetric matrix characterizing the cross product. The scale ambiguity, e.g. a similarity transformation of the form

\[
\begin{bmatrix} 0.8\,\mathbf{I} & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix},
\]

is demonstrated in Figure 6.7. As a result, when the camera calibration matrices K and K′ of the sequences V and V′ are identical, the corresponding uncalibrated fundamental matrices F_{i,j} and

Figure 6.7: (a) The locations and poses of the blue and green (respectively the red and cyan) cameras are related by a scale ambiguity; the poses are, however, noise-corrupted. (b) Using the fundamental ratio vg in Equation (6.7), the videos can be temporally aligned. Note that only the first three elements of vg are plotted in (b).

F′_{c(i),c(j)} should be equal, which can be used for video synchronization. However, we are interested in a more general case, where K and K′ are constant but different between the sequences.

In the case of a simplified camera model, i.e. unit aspect ratio and zero skew, it is easy to verify that the upper left 2 × 2 sub-matrix of F has the form

\[
\mathbf{F}^{2\times 2} \sim \begin{bmatrix} \epsilon_{1st}\, t_{i,j}^{s}\, r_1^{t} & \epsilon_{1st}\, t_{i,j}^{s}\, r_2^{t} \\[2pt] \epsilon_{2st}\, t_{i,j}^{s}\, r_1^{t} & \epsilon_{2st}\, t_{i,j}^{s}\, r_2^{t} \end{bmatrix}, \tag{6.6}
\]

where ε_{rst} for r, s, t = 1, ..., 3 is the permutation tensor, and r_i are the columns of the rotation matrix R_{i,j}. The interesting observation about F^{2×2} is that the ratios among its elements are invariant to the camera internal parameters and reflect only the camera ego-motion. In this work, we call these ratios the fundamental ratios. Therefore, we are able to extract an independent four-dimensional feature, v_g, for general camera ego-motion as

\[
\mathbf{v}_g = \operatorname{sign}(F_{11})\,[F_{11},\, F_{12},\, F_{21},\, F_{22}]\, /\, \|\mathbf{F}^{2\times 2}\|_F, \tag{6.7}
\]

where ‖·‖_F is the Frobenius norm.

It is unlikely for the 4D vector vg to uniquely characterize the relative camera ego-motion, which has five degrees of freedom: both the rotation Ri,j and translation ti,j have three degrees of freedom, but there is an overall scale ambiguity. In addition, there are four possible setups of relative camera position and orientation for the same essential matrix as shown in [75]. However, similar camera ego-motions would result in the same vg, which can be used to synchronize video sequences as shown in Section 6.5.
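A minimal sketch of extracting the fundamental-ratio feature of Equation (6.7) from an estimated fundamental matrix is shown below; the small constant added to the norm, and the use of a robust estimator such as OpenCV's findFundamentalMat to obtain F in the first place, are assumptions made here for illustration.

    # Minimal sketch of the fundamental ratios v_g of Equation (6.7).
    import numpy as np

    def fundamental_ratios(F):
        B = F[:2, :2]                                   # upper-left 2x2 block
        v = np.array([B[0, 0], B[0, 1], B[1, 0], B[1, 1]])
        v = v / (np.linalg.norm(B, 'fro') + 1e-12)      # remove the unknown scale
        return np.sign(v[0]) * v                        # resolve the sign ambiguity

    # Frame pairs related by similar camera ego-motions should yield nearly
    # identical feature vectors, which is what the alignment in Section 6.5 uses.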

6.4 2D Alignment of Similar Special Motion

There are certain special cases of motion where the number of degrees of freedom of the fundamental matrix is less than the seven of general motion, or the epipolar geometry is undefined. Consequently, the fundamental ratios in Equation (6.7) would fail to provide camera ego-motion information for non-general camera motion. In this work, we mainly concentrate on the following three common special motions: pure translation, pure rotation and zoom, and present a statistical method for classifying the special camera motions into the following categories (see Figure 6.8):

1. pure rotation: pan (tilt, swing respectively) with the rotation angle α (β, γ respectively);

2. pure translation: side-way track (boom, forward-outward track respectively) with the translation vector ∆t = (t_x, 0, 0)^T (∆t = (0, t_y, 0)^T, ∆t = (0, 0, t_z)^T respectively);

3. zoom: with the ratio (or zoom factor) s of the camera focal lengths;

Figure 6.8: The geometry of a pin-hole camera and camera motion parameters. The extrinsic parameters of a pin-hole camera represent the rigid-body transformation between the world coordinate system (centered at O) and the camera coordinate system (centered at C). The intrinsic parameters, e.g. the focal length f, stand for the camera internal geometry.

4. static: with identity inter-frame planar homography;

Except for the third category (zoom), the camera intrinsic parameters in Equation (3.4) are fixed.

For each different special camera motion, we give geometric explanation of their properties and the method to compute their motion magnitudes, e.g. the rotation angles and zoom factors, which are used for temporal alignment in Section 6.5.

Existing camera motion classification methods utilizing 2D parametric transformations, as summarized in Section 2.2.1, are reasonable only when the scenarios are restricted to 2D scenes or when the camera position is fixed (static, zooming, pure rotation), while methods [122] using spatio-temporal slices only involve the patterns of different camera motions without geometric explanation or quantification of the camera motion. The proposed method is based on existing works: pattern analysis of image slices in a spatio-temporal volume by Ngo [122], Bayesian model estimation [169], and stereo geometry [75]. However, the proposed method provides geometrically meaningful analyses of the orientations of the patterns depicting motions in slices. In addition, the

motion magnitudes, such as translational speed, are numerically quantified. Finally, we consider

the perspective effects in slicing the data volume.

In the following discussion, we denote by x and x′ the images of a 3D scene point X before

and after the camera motion. For simplicity, the world coordinate frame will be chosen to coincide

with the camera before moving, so that x ∼ PX = K[I | 0]X (see Section 3.4).

6.4.1 Pure Rotation

Pure rotation is a very important camera motion, e.g. for the PTZ cameras used in video surveillance systems. It is known [75] that the corresponding points x and x′ before and after the rotation are related by an infinite planar homography H∞ = KRK⁻¹, where R is the relative rotation, which has only one degree of freedom from the rotation angle: α (pan), β (tilt) or γ (swing).

As a similarity transformation, H∞ does not change the eigenvalues of the rotation matrix R, namely {1, e^{±iθ}}. Therefore, the rotation angle, θ, between views can be computed directly from

namely {1, e±iθ}. Therefore, the rotation angle, θ, between views can be computed directly from

the phase of the complex eigenvalues, e^{±iθ}, of H∞. The rotation angle θ, defined as the pure rotation feature v_r, can be used to temporally align two videos with purely rotating cameras. In addition, the eigenvector of H∞ corresponding to the real eigenvalue is the vanishing point, v_L, of

Figure 6.9: The computed rotation properties of two pairs of pure rotation shots, captured by the authors. (a) The rotation angle is 4.86°, and v_L = [0.2011, 0.9796, 0.0001]^T. (b) The rotation angle is 12.9°, and v_L = [0.0273, −0.9996, 0.0001]^T. The corresponding points are automatically found by the method described in [100].

the 3D rotation axis L, since L is the unit eigenvector of R and

\[
\mathbf{v}_L = \mathbf{K}\mathbf{R}\mathbf{K}^{-1}\,\mathbf{K}[\mathbf{I}\mid\mathbf{0}]\,[\mathbf{L}^T\ 0]^T = \mathbf{K}\mathbf{R}\mathbf{L} = \mathbf{K}\mathbf{L}. \tag{6.8}
\]

The location of v_L can be used as the criterion to differentiate panning (v_L ≈ [0 ∞ 0]^T), tilting (v_L ≈ [∞ 0 0]^T), and swinging (v_L is close to the image center). Two computation examples are

shown in Fig. 6.9.
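The rotation-angle computation of this section can be sketched as follows, assuming the inter-frame homography H has already been estimated (e.g. with a robust fitter on SIFT matches); dividing by the real eigenvalue to cancel the unknown scale of H and the numerical thresholds are illustrative choices.

    # Minimal sketch: rotation angle and axis vanishing point from H ~ K R K^{-1}.
    import numpy as np

    def pure_rotation_feature(H):
        w, V = np.linalg.eig(H)
        complex_idx = np.where(np.abs(w.imag) > 1e-8)[0]
        if len(complex_idx) == 0:                   # (near-)zero rotation angle
            return 0.0, None
        real_idx = int(np.argmin(np.abs(w.imag)))
        theta = abs(np.angle(w[complex_idx[0]] / w[real_idx]))  # phase of e^{+-i theta}
        v_L = np.real(V[:, real_idx])               # vanishing point of the axis (Eq. 6.8)
        return np.degrees(theta), v_L / np.linalg.norm(v_L)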

6.4.2 Pure Translation

In the pure translational motion case, the motion of the camera is a pure translation t0 with no rotation or change in the internal parameters. Geometrically, this is equivalent to the situation where the camera is static while points in 3-space move on straight lines parallel to t0. The imaged

intersection of these parallel lines is the vanishing point in the direction of t0, which is also the

epipole, e, for both views since the camera centers and the point at infinity along t0 are collinear.

It is known from [75] that, in this case, the fundamental matrix F between any two frames is the same and has a special form:

\[
\mathbf{F} = [\mathbf{e}]_\times \mathbf{K}[\mathbf{I}\mid\mathbf{t}_0]\,\mathbf{P}^{+} = [\mathbf{e}]_\times, \tag{6.9}
\]

where P+ is the pseudo-inverse of P, i.e. PP+ = I. It is evident that F has only two degrees of freedom, and can be determined uniquely from two point correspondences provided that the two

3D points are not coplanar with both camera centers. Consequently, the upper left 2×2 sub-matrix

of F in Equation (6.6) degenerates to:  

\[
\mathbf{F}^{2\times 2} \sim \begin{bmatrix} 0 & -e_z \\ e_z & 0 \end{bmatrix}, \tag{6.10}
\]

where ez is the third element of the homogeneous epipole coordinate. Since Equation (6.10) pro-

vides no information on the camera ego-motion and also it is equal for any frame pair, it is impos-

sible to make use of the fundamental ratios vg in Equation (6.7) to align video sequences in the

case of pure translation.

Figure 6.10: The 3D data volume (X: horizontal; Y: vertical; and T: temporal) and the two basic spatio-temporal 2D slices taken from the volume along the temporal dimension. A data volume is formed by putting together all the frames in a sequence, one behind the other. The two 2D slices are also known as X-T slices (yellow) and Y-T slices (red).

To quantify the camera ego-motion in the pure translational cases, we slice the three-dimensional data volume (see Fig. 6.10). Traditionally, there are two ways to make the slices, i.e. X-T or Y-T as shown in Fig. 6.10, which reveal the temporal behavior usually hidden from the viewer of the X-Y slices, i.e. the frames. Each spatio-temporal slice is a collection of 1D scans at the same selected position of every frame as a function of time. Due to its effectiveness in exploring temporal events along a larger temporal scale, the spatio-temporal slice is widely used in human action detection and recognition [101, 96], mosaicing [188] and video representation [122]. However, in the previous efforts, the geometry behind the visual spatio-temporal slices is not fully explored.

Figure 6.11: (a) Two frames of a pure translation shot. The computed epipole is located at (0.9742, −0.2258, −0.0012). (b) Two frames of a zooming shot. (Top): the cyan lines connect the corresponding points computed by the method proposed by Lowe [100], while the magenta lines are the epipolar lines. (Bottom): The slices are cut from the 3D volume along the three magenta epipolar lines to reveal the temporal behavior. Notice that the y-axis is the temporal direction.

We observe that the trajectories of the same 3D points are represented by two dimensional curves in the 2D slice images if we slice the data volume along the epipolar lines as shown in

Fig. 6.11 (a). The first order derivative of the trajectory characterizes the relative translational speed. In the case of constant translational speed, the trajectories degenerate to straight lines, e.g. the tree branch in Fig. 6.11 (a). We define the relative translational magnitude, i.e. the relative

distance between the reference frame and the current frame along the x-axis of the slice shown in

Fig. 6.11 (bottom), as the pure translational camera motion feature, vt. Note that our slices are

different from the traditional X-T slices [188] in that the projective distortion is considered

here.
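A minimal sketch of cutting such a slice is given below. It assumes the frames are grayscale arrays of equal size and that the epipolar line is specified by two image points; nearest-neighbor resampling and clipping to the image borders are simplifications made here.

    # Minimal sketch of a spatio-temporal slice along an epipolar line.
    import numpy as np

    def epipolar_slice(frames, p0, p1, n_samples=200):
        p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
        ts = np.linspace(0.0, 1.0, n_samples)
        xs = (1 - ts) * p0[0] + ts * p1[0]
        ys = (1 - ts) * p0[1] + ts * p1[1]
        h, w = frames[0].shape
        xi = np.clip(np.round(xs).astype(int), 0, w - 1)
        yi = np.clip(np.round(ys).astype(int), 0, h - 1)
        return np.stack([f[yi, xi] for f in frames])    # rows correspond to time

A tracked 3D point then traces a curve in this slice whose slope reflects the relative translational magnitude v_t defined above.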

6.4.3 Zooming

In the case of zooming, the image effect can be approximated as a simple magnification [75],

\[
\mathbf{x}' = \mathbf{K}'[\mathbf{I}\mid\mathbf{0}]\mathbf{X} = \mathbf{K}'\mathbf{K}^{-1}\mathbf{x}, \tag{6.11}
\]

where we assume that zooming does not perturb the effective camera center. If the zooming only changes the focal lengths of K′ and K, then a short calculation shows that

\[
\mathbf{H}_A = \mathbf{K}'\mathbf{K}^{-1} = \begin{bmatrix} s\mathbf{I} & (1-s)\,\mathbf{u}_0 \\ \mathbf{0}^T & 1 \end{bmatrix}, \tag{6.12}
\]

where u_0 is the inhomogeneous principal point, and s = f′/f is the zooming factor. Therefore, the special form of the affine matrix H_A can be used to compute the zooming magnitude and the principal point. For the example image pair in Figure 6.12,

\[
\mathbf{H}_A = \begin{bmatrix} 1.1571 & -0.0069 & -28.3778 \\ 0.0167 & 1.1556 & -20.8401 \\ -0.0000 & -0.0000 & 1.0000 \end{bmatrix}. \tag{6.13}
\]

In other words, s = 1.16, and the principal point is located at (180.6, 132.7), close to the center of a

352 × 240 frame.
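Reading the zoom factor and principal point off an estimated H_A of the form in Equation (6.12) can be sketched as follows; averaging the two diagonal entries and guarding against s ≈ 1 are illustrative choices, not part of the original method.

    # Minimal sketch: zoom factor s = f'/f and principal point u_0 from H_A.
    import numpy as np

    def zoom_parameters(H_A):
        H = H_A / H_A[2, 2]
        s = 0.5 * (H[0, 0] + H[1, 1])           # zoom factor from the diagonal
        if abs(1.0 - s) < 1e-6:
            raise ValueError("no zoom detected; principal point is undetermined")
        u0 = H[:2, 2] / (1.0 - s)               # from the (1 - s) u_0 translation
        return s, u0

    # Applied to the estimated H_A of Equation (6.13), this gives s of about 1.16
    # and a principal point of roughly (181, 133), consistent with the text.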


Figure 6.12: Two frames of a zoom shot, from the ABC news. The corresponding points are

automatically found by the method described in [100].

One interesting observation is that, since the effects of forward/outward translation and zoom-

ing are similar, the pure translational feature vt can also be used to synchronize zooming sequences

although it is not really a camera motion. For example, the two frames from a zoom out sequences,

shown in Fig. 6.11 (b), can be treated as a outward tracking along a direction parallel to the prin- cipal axis. In the example of Fig. 6.11 (b), the translational moving speed (or the zooming factor)

is not constant any more, and therefore the trajectories of the same 3D points are not as straight as

those in Fig. 6.11 (a).

6.4.4 Camera Motion Characterization

From the above analysis, we conclude that two frames undergoing pure rotation, zooming and pure

translation can be related by a general planar homography H, an affinity HA and a fundamental

matrix F respectively. In the first two cases, the camera centers are fixed and F is undefined, and

Figure 6.13: Camera motion characterization results of an ABC news video feed. The first two rows

denote the frame pairs of zoom shots. The middle two rows denote the frame pairs of panning

shots. The bottom two rows denote the frame pairs of tilting shots.

thus the computed fundamental matrix is error-prone. In the last case, the extent of the motion

depends on the inverse depth, and there is no single homography that satisfies all corresponding

points.

We adopt the Bayesian model estimation method [169] to automatically select the best inter-

frame relationship from the set {H, HA, F}. After the shot motion is identified, the shot can be further classified. For example, we can discriminate a pure rotation shot as panning, tilting or swinging by checking the eigenvector of H corresponding to the real eigenvalue as described

in Section 6.4.1. The computed motion magnitudes, i.e. rotation angles, zoom magnitudes and relative translation, can be directly utilized for temporal alignments of shots with similar motions.

Some characterized camera motions are illustrated in Fig. 6.13.

6.5 The Implementation Details of Alignment Algorithm

We assume that the given video sequences are known to have similar camera ego-motion (the camera motion model can be characterized [204, 13] or automatically selected [169]).

6.5.1 Computing the Camera Ego-Motion Features

In this work, we concentrate on three camera ego-motion features: the fundamental ratios vg in

Equation (6.7) for general camera motion, the translational magnitude described in Section 6.4.2 for pure camera translation, and the rotation angles described in Section 6.4.1 for pure camera rotation. The initial frame-to-frame correspondences are established using the SIFT features proposed by Lowe [100]. The planar homography for pure rotation and the fundamental matrix for general camera motion between the two views are computed using the MAPSAC (maximum a posteriori sample consensus) algorithm [169] that minimizes the reprojection errors, provided that there are sufficient point correspondences. The epipole of the pure translational shot is computed


Figure 6.14: (a) Given four views and five of the C_4^2 inter-view fundamental matrices, we are able

to compute the fundamental matrix between frame i and l which might have no overlapping areas.

(b) The 4g5 (four graph with five edges) is a solving graph [103].

using the eigenvalue decomposition of the matrix stacked by all lines connecting the corresponding

points.
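A minimal sketch of this epipole computation is given below. Each correspondence contributes the line joining the two matched points (their cross product in homogeneous coordinates), and the common intersection is recovered here as the smallest singular vector of the stacked line matrix, which is equivalent to the eigen-decomposition mentioned above; the normalization of the lines is an illustrative choice.

    # Minimal sketch: epipole of a pure-translation shot from point matches.
    import numpy as np

    def translation_epipole(pts1, pts2):
        h1 = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous points
        h2 = np.hstack([pts2, np.ones((len(pts2), 1))])
        lines = np.cross(h1, h2)                          # one line per correspondence
        lines /= np.linalg.norm(lines, axis=1, keepdims=True)
        _, _, Vt = np.linalg.svd(lines)
        e = Vt[-1]                                        # least-squares intersection
        return e / np.linalg.norm(e)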

The above scheme assumes that there are nontrivial overlapping areas between the two frames

of one video sequence. However, in the case of dynamic timeline model as discussed in Section

6.5.2, two frames i and l might be far away and hence have no correspondences for the computation of the fundamental matrices. In this case, the viewing graph theory [103] can be used for computing

the fundamental matrices between the frames i and l, when there are two frames j and k which

have overlapping areas with both i and l. In other words, the fundamental matrices inside two tri-views (i, j, k) and (j, k, l) are available as shown in Fig. 6.14.

In addition, the camera ego-motion, i.e. the relative translation and rotation, between the two

views should be reasonably big to overcome noise. In this Section, we assume the corresponding

relationship is simply a temporal shift c(j) = j + β. We can then readily compute the fundamental matrices between frames j and j + ∆t, j = 1, ..., N − ∆t, for the video sequence {I_j}_{j=1}^{N}. The fundamental matrix F_{j,j+∆t} is meaningful provided that ∆t is significant. In our experiments, we choose ∆t as the number of frames in 1 sec. Note that the same ∆t should be used throughout all sequences. The solution for β can then be simply recovered by an iterative point matching algorithm

[198].

Finally, to increase the robustness of the proposed method, we use the coarse-to-fine framework

since the synchronization in the coarser levels captures global features, and therefore an error in

computation of frame correspondences will not be propagated to the rest of the warping path.

6.5.2 Timeline Model

In this Section, we consider the most general case where the timeline is dynamic. This problem can

be formalized as follows: given two sets of camera ego-motion features, i.e. {v_j} and {v_{j′}}, estimate the dynamic matching indices j′ = c(j) into the other set of input camera ego-motion features v_{j′},

where v can be general motion vg, pure rotation vr, pure translational motion vt. The estimation

is subject to the constraint that no frames displayed in the past can be captured in the future, which

can be expressed as: for any given j1 and j2, if j1 < j2, then c(j1) ≤ c(j2). The error of alignment

is computed incrementally using:

\[
E(j, j') = \operatorname{dist}(j, j') + \min\{E(j, j'-1),\; E(j-1, j'-1),\; E(j-1, j')\}, \tag{6.14}
\]

where dist(j, j′) is computed using the mean squared error between v_j and v_{j′}. Therefore, the synchronization determined by the set of parameters c(j) is computed as

\[
c(\cdot) = \arg\min_{c(1)\le\cdots\le c(N)} \sum_{j=1}^{N} E(j, c(j)), \tag{6.15}
\]

where N is the number of input video frames. The optimization defined in Equation (6.15) is then solved using dynamic programming [20].
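The recursion of Equations (6.14) and (6.15) can be implemented with a standard dynamic-programming table and a backtracking pass, as sketched below; the feature sequences v1 and v2 stand for any of the per-frame motion features (v_g, v_r or v_t), and the mean-squared-error distance follows the definition of dist(j, j′) above.

    # Minimal sketch of the monotonic (DTW-style) alignment of Eqs. (6.14)-(6.15).
    import numpy as np

    def align_sequences(v1, v2):
        n, m = len(v1), len(v2)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for j in range(1, n + 1):
            for jp in range(1, m + 1):
                d = np.mean((np.asarray(v1[j - 1]) - np.asarray(v2[jp - 1])) ** 2)
                D[j, jp] = d + min(D[j, jp - 1], D[j - 1, jp - 1], D[j - 1, jp])
        # backtrack the minimum-cost monotonic path to recover c(j)
        path, (j, jp) = [], (n, m)
        while j > 0 and jp > 0:
            path.append((j, jp))
            step = np.argmin([D[j - 1, jp - 1], D[j - 1, jp], D[j, jp - 1]])
            if step == 0:
                j, jp = j - 1, jp - 1
            elif step == 1:
                j -= 1
            else:
                jp -= 1
        return path[::-1]          # matched frame-index pairs (j, c(j))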


Figure 6.15: (Top) Sample frames from the first video, with extracted SIFT features superim- posed, and the starting reference images (left). (Middle) Identical elements for the second video.

(Bottom) Frames of the first video corresponding to the frames of the second video shown in the row above, according to our algorithm.

6.6 Experimental Results

In this Section, we demonstrate our method in cases where cameras are undergoing both general

and special motions.

Figure 6.16: The estimated affine timeline model for the two video sequences in Fig. 6.15.

6.6.1 General Motion

As the first example, we took two sequences (70 frames and 160 frames respectively) of two non-

overlapping indoor scenes. The trajectories of the two cameras were controlled by a CRS Plus

robot to make sure they undergo similar motions. For this pair of sequences, we have ground-truth information on the temporal dilation (α = 2.0) of the two sequences, obtained by setting the ratio of the speeds of the robot arm motions to 2 : 1 for the two sequences. Some example frames are shown in Fig. 6.15. The cameras are non-static, with the main camera motion downwards. In

this example, we manually match one pair of frames between two videos as shown in Fig. 6.15

left. The extracted SIFT feature points of the starting reference frame are superimposed on the

reference images as cyan square markers, and the matched correspondences are shown as blue

square markers in each frame. The estimated affine timeline model is c(j) = 2.0014 ∗ j + 18.1617,

as shown in the solid line in Fig. 6.16. The temporal dilation parameter, α = 2.0014, accurately

Figure 6.17: Four appended frames after synchronization, where the left (resp. right) half is from the first (resp. second) video. matches the ground truth information. After applying the affine timeline, some corresponding frames in the first video sequences are demonstrated in Fig. 6.15 bottom.

In the second example, we take advantage of the available knowledge of the speed of camera motion, i.e. 2 : 1. The input sequences have around 500 frames and 1000 frames (33 seconds), respectively. The two sequences are non-overlapping and have different lighting conditions as shown in Fig. 6.17. The two sequences are challenging in that some frames are very textureless


Figure 6.18: Average distance (rescaled between [0, 1]) of the camera general ego-motion features

computed for different time delay.

(c) From Fig. 6.11 (a). (d) Edges of (c). [The curves in panels (e) and (f) are labeled: Target seq., F-G seq., Xformed F-G seq.]

Figure 6.19: (a) The slice cut from the 3D volume along the cyan epipolar line in Fig. 6.20. (b)

The two extracted curves of the foreground street lamps as marked by red arrows in Fig. 6.20. (e)

The computed translational speed of the two sequences. (f) A close view of (e).

such as the frames temporally around (a) and (d), and that there are nontrivial moving chairs and

a talking person in the sequence in Fig. 6.17. Fig. 6.18 shows the average distance between the

camera ego-motion features vg as a function of the time delay. The graph goes to zero at 9, i.e. the

150 Figure 6.20: The top left is one frame from the source sequence (the flower garden sequence) while the top right is one frame from the target tree sequence. Four consecutive target frames superimposed with the foreground tree layers, which are extracted from the corresponding frames after synchronization using our algorithm.

time delay between the two video sequences. Corresponding frames based on this time delay and

known time dilation 2 are shown in Fig. 6.17.

6.6.2 Pure Translation

To demonstrate our algorithm for pure translational motion, we use the standard flower garden (F-G) sequence and the sequence (called the target sequence hereafter) from [194] as shown in Fig. 6.20. Due to the non-constant moving speed of the camera and the shaking, the slice cut from

the 3D volume of the target sequence is not a straight line anymore as discussed in Section 6.4.2.

To synchronize the two sequences, we used the left street lamp (the left curve in Fig. 6.19 (b))

and the tree branch (the curve with smallest slope in Fig. 6.19 (d)) to compute the relative transla-

tional magnitudes, which are illustrated in Fig. 6.19 (e). After inverting and scaling the flower-garden curve, we have the red and blue curves. Evidently, there cannot be a linear correspondence relationship between the sequences. For example, in the close view in Fig. 6.19 (f), frame 6 of the target sequence should match frame 8 in the flower-garden sequence. After applying the dynamic programming, we obtain the temporal alignment between the two sequences. To verify our results, we

composited the layers of the foreground tree branch computed using the method proposed in [194]

into the target background. Some of the frames are demonstrated in Fig. 6.20.

6.6.3 Pure Rotation

The last example is a pair of pure rotation sequences, captured by the authors, as shown in Fig. 6.22.

The computed rotation angles using the method described in Section 6.4.1 are shown in Fig. 6.21

(a). To handle the computation errors, we fit a polynomial of degree 16 to the computed angles as

solid curves. Fig. 6.21 (b) shows the similarity matrix between the two set of rotation angles. The

value in this matrix is the absolute difference between the pure rotation features v_r^i and v_r^j, where the superscripts i and j indicate the frame indices of the two video sequences, respectively. The solid red

Figure 6.21: (a) The computed rotation angles of two video sequences. (b) The similarity matrix

and the time-warping path between two rotation sequences. The x-axis is the 1st sequence (the lower one in (a)), while the y-axis is the 2nd sequence.

line corresponds to minimum cost path through close-to-zero values. Based on the synchronization

results, some corresponding frames from two sequences are appended in Fig. 6.22.

6.7 Conclusion

This chapter proposed a novel method to synchronize two video sequences of distinct scenes cap-

tured by cameras undergoing similar ego-motions. The proposed algorithm makes use of camera

ego-motion features which are independent of the camera internal parameters, and therefore is

able to synchronize videos captured by cameras with constant but different internal parameters.

The strength of this method is that it can deal with an arbitrarily moving camera and general scenes,

as well as special camera motions. We demonstrated two common motions: pure translation and

pure rotation. Other types of motions, e.g. zooming, can also be integrated into this framework

153 Figure 6.22: Four appended frames after synchronization, where the left (resp. right) half is from the first (resp. second) video. provided that some feature vectors characterizing the camera ego-motions can be extracted from video sequences. The input of our algorithm is only the corresponding points.

In the cases when we know that the cameras have the same camera internal parameters, all elements of the fundamental matrices, rather than the four in Equation (6.7), should be utilized to improve the accuracy of the algorithm. In addition, in this work, the camera internal parameters

154 are assumed to be fixed throughout the video. We would like to relieve this constraint in the future

work.

Our method has the following limitations. First, for a pair of pure rotation or translation shots,

the capturing cameras must have fixed camera internal parameters, i.e. no zooming or focusing.

On the other hand, in the case of pure zooming we assume that the zooming will not perturb the

principal point. Second, in the computation of the simple camera motion magnitudes, the interval

of frame pairs should be relatively large. Otherwise, the useful information is below the noise level, e.g. the eigenvector decomposition of H in equation (??) is not meaningful. This limitation, however, can be overcome in practice by taking every n-th frame, with n sufficiently large to ensure that the condition holds. Third, for the camera pose estimation, the current framework works well when the camera internal parameters are fixed. In future work, we will extend the framework to deal with variable camera internal parameters using a self-calibration method.

CHAPTER 7

3D VIDEO POST-PRODUCTION

In this thesis, we describe two different methods for video post-production: three-dimensional and two-dimensional. In cases where the source and target video sequences have similar camera motions, the foreground layer can simply be cut out from the source frames and pasted into the corresponding target frames, and therefore the two-dimensional approach is suitable. On the other hand, the two-dimensional method would introduce obvious temporal inconsistency and perspective distortions for non-similar camera motions in the source and target videos. As a result, the 3D information of the object needs to be extracted in this case. Although the 2D method is geometrically simple, the photometric effects such as shadows and reflections need to be considered for realistic results, which is discussed in Chapter 8.

This chapter in particular describes a video-based 3D approach to pull the alpha mattes of rigid or approximately rigid 3D objects from one or more source videos, and then transform them into another target video of a different scene, in a geometrically correct fashion. Our framework builds upon techniques in camera pose estimation (Chapter 4 and 5), 3D spatiotemporal video alignment (Section 6.2 in Chapter 6), depth recovery, key-frame editing, natural video matting,

156 and image-based rendering. Based on the explicit camera pose estimation, the camera trajectories

of the source and target videos are aligned in 3D space. Combining with the estimated dense

depth information, we significantly relieve the burdens of key-frame editing and efficiently improve

the quality of video matting. During the transfer, our approach not only correctly restores the

geometric deformation of the 3D object due to the different camera trajectories, but also effectively

retains the soft shadow and environmental lighting properties of the object to ensure that it is

harmonic with the target scene.

7.1 Introduction

In commercial film and television production, video matting and compositing operations make it

possible for a director to transfer part of a scene between two video sequences. Currently, the film

industry is increasingly interested in transferring part of a 3D scene from one video to another, such that the audience can strongly feel the 3D effects after rendering and compositing. However, most of

the past work on video matting and compositing focuses on the matting side and assumes that the

camera poses in both the source and target scenes are the same or have a fixed 2D translation or

scaling [3, 31]. Therefore, with simple temporal video alignment, an alpha blending is sufficient to composite the foreground object into the target view.

Therefore, these methods cannot handle situations where the cameras filming source and target

videos have different motion. In such cases, these two videos must be aligned in a 3D space

Figure 7.1: 3D object transfer between two videos for a “sit-talking” person. (a) Two original frames from the source video. (b) The naive transfer by a cut-paste (with an alpha channel) process without 3D alignment. It is clear to see that the augmentation is incorrect, where the pose and size of the person is not consistent with the target cameras and the sitting location is also changed.

(c) Some distortion may happen during 3D transfer when an incorrect depth d = 0 is used (or called planar homography transfer). (d) Correct transfer when our estimated depth is used. Note: our framework also allows some small non-rigid motion of the foreground object, such as hand moving. to reduce global geometric distortion and the 3D foreground object must be re-rendered from a different target viewpoint. If the foreground object is directly cut and pasted into the target view, the 3D appearance of the object may not be consistent with the target scene due to the different viewing trajectories and lighting conditions. Fig. 7.1.(b) shows a naive result with the cut-pasting


[Figure 7.2 pipeline: pose estimation of the source and target videos; key-frame depth estimation and 3D video alignment; object cutout and matting; soft shadow matting; 3D object transfer.]

Figure 7.2: The flow chart of 3D video transfer process. process by Chuang’s video matting method [22], where the spatial artifact is clearly visible when the two videos have different poses.

To ensure the composited/transferred virtual objects are consistent with the existing objects in the target scene, our method explicitly estimates the poses of the cameras both for the source and target scenes, and aligns the videos along the computed camera trajectories in 3D space. Fig. 7.2 gives an overview of the 3D object transfer process. The inputs are two videos of different scenes acquired by two moving cameras: one is the source video with the foreground object, the other is the target video to receive the object. First, we estimate the camera poses for both the source and target video sequences. Next, the algorithm aligns these two videos along 3D camera trajectories

for the object transfer by minimizing the global difference between the source and transformed target viewing directions. In addition, after the pose estimation of the source video sequence, an initial depth of the foreground object is recovered, and an intuitive graphical user interface (GUI) is designed to remove the depth errors due to small non-rigid motion or specular reflection. Then, we incorporate the depth information into the video matting process to pull high-quality alpha mattes of the object layer and shadow mattes of the shadow layer from the source video separately. Finally, using the optimal target viewing position estimated from the 3D video alignment, both layers are re-rendered and blended into the corresponding target frame. As a result, we not only correctly restore the perspective deformation of the object during the transfer, but also render an object that is spatiotemporally consistent with the target scene, with realistic 3D effects as shown in Fig. 7.1.(d).

The proposed framework makes the following contributions. First, we perform 3D spatiotemporal alignment given only 2D imagery as input, and provide an effective solution to transfer a 3D object between videos of distinct scenes captured by two moving cameras, a case for which the existing methods would introduce noticeable, unpleasant artifacts. Second, we utilize the 3D information of the source scene and design a depth-driven GUI to effectively reduce the user interaction required for object segmentation and alpha matting. Third, our approach is flexible enough to handle small non-rigid motion and specular reflection. Finally, the proposed approach requires no explicit lighting computation, yet is able to render a high-quality video with realistic environmental illumination. Our work not only advances the video matting and compositing problem from 2D to 3D, but also provides a feasible alternative to the expensive camera control systems that have been widely used in the film industry.

Figure 7.3: 3D camera trajectory alignment between the source “Beetle” and the target “flower-floor” sequences (see Fig. 7.13). The left side shows the 3D view and the right side the top view. The red curve is the source camera trajectory; the blue and green curves are the target camera trajectories before and after alignment.

In this framework, the pose estimation is described in Chapters 4 and 5, and the 3D video alignment is presented in Section 6.2. The other steps are introduced in the remainder of this chapter as follows. Section 7.2 describes the depth estimation process and illustrates how to handle small non-rigid motion and specular reflection. Section 7.3 demonstrates alpha matting of the object and shadow layers, and the final rendering process. Then, several results are given in Section 7.4. Finally, we discuss the contributions and limitations of our approach, and outline future directions, in Section 7.5.

7.2 Key-frame depth estimation

Without the reference depth information, the object may not be correctly rendered, e.g. Fig. 7.1.(c).

Therefore, a rough depth estimate or depth proxy is necessary for rendering the objects from a different viewing point [97, 9, 146, 193]. Currently, a number of stereo algorithms have been developed to recover depth information from a stereo pair [158] or from a set of calibrated images [205, 57, 95]. Instead of using all of the video frames to compute the depth of the scene, a set of key-frames is selected from the source video for depth estimation to reduce the computational complexity. These selected key-frames cover the entire viewing field of the video with an approximately uniform gap. For example, for the “Sit-talking” sequence (Fig. 7.1), we obtained 13 key-frames out of a total of 252 source frames, with an approximate 3° viewing gap between neighboring key-frames.
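As an illustration of this key-frame selection step, the following minimal sketch (assuming per-frame rotation matrices are already available from the pose estimation of Chapters 4 and 5; the function and variable names are hypothetical, not part of our implementation) keeps a frame whenever its viewing direction has drifted by at least the chosen angular gap since the last key-frame.

```python
# Minimal sketch of key-frame selection by viewing-angle gap (illustrative only).
# Assumes per-frame camera rotation matrices are available from the pose estimation
# stage; the 3 degree threshold mirrors the gap reported in the text.
import numpy as np

def select_keyframes(rotations, gap_deg=3.0):
    """Pick frames whose viewing direction differs from the last key-frame by >= gap_deg."""
    keyframes = [0]
    last_dir = rotations[0][2]          # third row of R = camera viewing (principal) axis
    for k in range(1, len(rotations)):
        d = rotations[k][2]
        cosang = np.clip(np.dot(last_dir, d), -1.0, 1.0)
        if np.degrees(np.arccos(cosang)) >= gap_deg:
            keyframes.append(k)
            last_dir = d
    return keyframes

# Toy example: a camera panning 0.2 degrees per frame around the Y axis.
def rot_y(deg):
    a = np.radians(deg)
    return np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])

rotations = [rot_y(0.2 * k) for k in range(252)]
print(select_keyframes(rotations))      # roughly one key-frame every 15 frames
```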

Using every five consecutive key-frames, an initial depth is computed by a multi-way cut algorithm adopted from a multi-frame segmentation framework [192]. However, the initial estimated depth (Fig. 7.4.(b)) has apparent discontinuities between the discrete depth labels, since the multi-way algorithm is a labeling-based approach and the depth dimension is quantized into several labels (here we use 15−20 depth labels). Moreover, from the results (Fig. 7.4.(b)), we can see that there are some apparent errors around the non-rigid regions, such as the moving hand, and the specular reflection regions.¹ Therefore, we need to refine the depth map to reduce the discontinuities and fix the depth errors in those non-rigid regions.

¹Because the specular reflection areas are view-variant, the motion of these regions is not consistent with the camera motion and its epipolar geometry. We attribute this observation to the non-stationary, or non-rigid, property of the specular reflection.


Figure 7.4: Depth refinement of non-rigid motion and occlusion. (a) One key-frame from the “sit-talking” sequence. (b) The initial estimated depth of the object using the multi-way cut algorithm. The red pixels have not been assigned a depth label due to occlusion or the non-rigid hand movement (inside the dotted green circle). (c) The refined depth map, where the discontinuities are reduced and the unassigned pixels obtain a smooth depth by enforcing the depth consistency constraint.

We first determine the highly discontinuous regions based on the gradients of the depth map and smooth those regions with a Gaussian kernel. Then, we enforce a depth consistency constraint to ensure that the depth maps from different key-frames agree with each other. Intuitively, this means that given a pixel, p, with depth d in key-frame i, its 2D projection in key-frame j, πj,i(p, d), should have the same depth d. Therefore, the new depth of this pixel can be updated by iteratively

enforcing the constraint as

$$ D_i^{t+1}(p) = \sum_{i-1 \le j \le i+1} g(j - i)\, D_j^{t}\!\left(\pi_{j,i}\!\left(p, D_i^{t}(p)\right)\right), \qquad (7.1) $$

where Di(·) is the depth map of frame i, Di^t(p) is the depth of pixel p in frame i at iteration t, πj,i(p, d) is the projection of pixel p with depth d from key-frame i to key-frame j, and g(·) is a Gaussian weight function. Fig. 7.4.(c) shows the refined depth map of one key-frame in the “sit-talking” sequence.

Figure 7.5: Depth correction of the view-variant specular reflections. (a) Depth before correction. (b) The feature points, where the features in the bottom frame are directly projected from the top frame using the estimated depth. (c) Depth after correction.
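The following is a minimal sketch of the iterative update of Eq. (7.1). It assumes the pose-dependent projection πj,i is supplied as a callback, and that the Gaussian weights are normalized (a small deviation from the unnormalized sum written above); all names are illustrative rather than part of the actual implementation.

```python
# A minimal sketch of the iterative depth-consistency update of Eq. (7.1).
# project(j, i, ys, xs, d) is a placeholder for the geometric warp pi_{j,i}(p, d)
# defined by the recovered camera poses; g() is a Gaussian weight, and the weighted
# sum is normalized here (an assumption on top of the formula in the text).
import numpy as np

def refine_depth(depth_maps, project, sigma=1.0, iterations=5):
    """depth_maps: list of HxW arrays; project(j, i, ys, xs, d) -> (ys', xs') in frame j."""
    g = lambda k: np.exp(-0.5 * (k / sigma) ** 2)
    D = [d.astype(np.float64).copy() for d in depth_maps]
    H, W = D[0].shape
    ys, xs = np.mgrid[0:H, 0:W]
    for _ in range(iterations):
        new_D = []
        for i in range(len(D)):
            acc, wsum = np.zeros((H, W)), 0.0
            for j in range(max(0, i - 1), min(len(D), i + 2)):
                yj, xj = project(j, i, ys, xs, D[i])        # pi_{j,i}(p, D_i(p))
                yj = np.clip(np.round(yj).astype(int), 0, H - 1)
                xj = np.clip(np.round(xj).astype(int), 0, W - 1)
                acc += g(j - i) * D[j][yj, xj]              # neighbor depth at the projection
                wsum += g(j - i)
            new_D.append(acc / wsum)                        # normalized weighted average
        D = new_D
    return D

# Identity projection (fronto-parallel toy case) just to show the call pattern.
identity = lambda j, i, ys, xs, d: (ys, xs)
maps = [np.random.rand(4, 4) for _ in range(3)]
print(refine_depth(maps, identity)[1])
```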

164 However, if the non-rigid region is large, the smoothness and consistency constraints may not

fully eliminate the depth error. For example, the depth error due to the specular reflection is still

visible on the windshield of the Beetle, as shown in Fig. 7.5.(a). To remove the depth error, one

possible solution is to assume that the depth of those regions is similar to that of the neighboring areas. Thus,

after specifying the specular regions in the key-frame, a Poisson filling approach [133] is employed

to enforce the boundary condition and smoothly propagate the depth information from the region

boundary to the inside as shown in Fig. 7.5.(c).
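A minimal sketch of this Poisson-style filling of the marked specular region is given below. It assumes a zero guidance field (pure membrane interpolation), so the depth inside the mask is simply the harmonic interpolation of its boundary values; the mask, image size and iteration count are illustrative.

```python
# A minimal sketch of Poisson-style hole filling for the specular region, assuming
# a zero guidance field: depth inside the marked mask is replaced by the harmonic
# interpolation of its boundary values, which smoothly propagates depth from the
# region boundary inward.
import numpy as np

def poisson_fill(depth, mask, iterations=500):
    """depth: HxW array; mask: boolean HxW, True inside the region to be filled."""
    d = depth.astype(np.float64).copy()
    d[mask] = d[~mask].mean()                 # crude initialization inside the hole
    for _ in range(iterations):               # Jacobi iterations for Laplace's equation
        avg = 0.25 * (np.roll(d, 1, 0) + np.roll(d, -1, 0) +
                      np.roll(d, 1, 1) + np.roll(d, -1, 1))
        d[mask] = avg[mask]                   # boundary (unmasked) pixels stay fixed
    return d

# Toy example: a smooth depth ramp with a corrupted square patch.
depth = np.tile(np.linspace(1.0, 2.0, 32), (32, 1))
mask = np.zeros_like(depth, dtype=bool); mask[10:20, 10:20] = True
depth[mask] = 5.0                             # simulated specular-reflection error
filled = poisson_fill(depth, mask)
print(np.abs(filled - np.tile(np.linspace(1.0, 2.0, 32), (32, 1))).max())  # near zero
```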

For the next key-frame j, we project the feature points from the previous frame i to frame j using the projection function πj,i(p, Di(p)). If the projected regions correctly cover the specular

reflection regions at frame j, the same Poisson filling process is applied to fix the error. Otherwise, the user can drag the feature points to relocate the reflection regions before fixing the error. Using the projected features, the user interaction can be dramatically reduced for the depth correction.

Fig. 7.5 shows the feature projection and depth correction process. Once the correct depth map is

obtained for each key-frame, we again use interpolation to generate depth maps for the remaining

non-key-frames by Eq. (7.3).

7.3 Video matting with soft shadow

Objects in natural scenes are often accompanied by shadows, which have been ignored by most previous video matting approaches [22, 3]. In [30], Chuang et al. point out that the shadow element is essential for realistic compositing of natural scenes. However, they only study the simplified case where scenes are illuminated by one dominant, point-like light source, and they require a static camera. More importantly, they require that the relationship of the dominant light source, reference plane, and camera be matched in the source and target scenes. Here we investigate a more general case where the shadow mattes vary over a wide range under multiple light sources.

7.3.1 Object cutout and matting

We decompose our matting problem into a two-layer system: shadow and object layers. However, it is not trivial to automatically separate these two layers, especially around the mixed or low contrast regions such as the bumpers of the Beetle. To separate these two layers, a user only needs to mark a few lines to specify the shadow region roughly along the boundaries in our GUI. Then, combining the background difference map with the specified feature points, a precise segmentation for the foreground object is obtained as shown in Fig. 7.6.(a). In the succeeding key-frame j

(Fig. 7.6.(b)), these feature points are reused for the next frame’s segmentation through the 3D projection, πj,i(p, d), as in the depth correction. For the remaining frames, the foreground cutouts are interpolated from the neighboring key-frames using the 3D projection. Then, a trimap is created by slightly expanding and shrinking the cutout boundary, and the object matting is obtained as shown in Fig. 7.6.(c). If the object has a complicated silhouette (e.g., hair), a large unknown region may need to be specified in the key-frame using the GUI to estimate the object matting, as shown in Fig. 7.7. Then, applying the Poisson matting technique [150], the alpha mattes of the foreground object are computed.


Figure 7.6: Object segmentation and alpha matting. (a) The first frame with two kinds of feature points. The blue curves are used to exclude the shadow area from the foreground cutout, and the yellow curve marks a rough shadow region. (b) Top: the projected feature points in the succeeding key-frame using the estimated depth; bottom: the corresponding foreground segmentation. (c) The trimap of the object layer (top) and its corresponding alpha mattes (bottom).

However, the consecutive alpha mattes may not be consistent along the temporal domain. Therefore, we again apply a Gaussian kernel to each pixel p of the alpha channel to ensure that the alpha mattes are coherent over time:

$$ \alpha_i(p) = \sum_{i-5 \le j \le i+5} g(j - i)\, \alpha_j\!\left(\pi_{j,i}\!\left(p, D_i(p)\right)\right), \qquad (7.2) $$

where αi(·) is the alpha map of frame i and g(·) is a 1D Gaussian weight function. After one pass of temporal filtering, some boundary noise is removed, particularly along irregular boundaries such as hair. Some refined matting results are shown in Fig. 7.8.

Figure 7.7: Matting of complex objects. Left: a set of feature points in a key-frame of the “doll” sequence, where two large unknown regions are marked by the green curves. Middle: the corresponding trimap. Right: the final matting result.

7.3.2 Shadow matting and editing

For the shadow layer, a rough shadow region needs to be marked in the first key-frame, as shown by the yellow curve in Fig. 7.6.(a). Then a planar homography projects this shape to the other frames to cut out the shadow boundary. Instead of using a trimap [24] to estimate the alpha matting, we propose a new bi-map to extract the soft shadow matting. In this bi-map there are only two parts: the definite background B and the unknown region U, which are separated by the specified region boundary. Given an estimated observed color I (Fig. 7.9.(b)) and the lit color L without the shadow (Fig. 7.9.(c)), the initial shadow channel ρ0 for each pixel can be estimated as

$$ \rho_0 = \frac{I \cdot L}{\|L\|^{2} + \varepsilon}, \qquad (7.3) $$

where ε is a small constant to prevent division by zero. However, this matting equation does not enforce the boundary condition that the alpha values at the shadow boundary, ∂U, are 0. To enforce this boundary condition, we employ ∇ρ0 as a guidance field with the membrane model to re-estimate the alpha value, ρ, such that

$$ \min_{\rho} \iint_{U} \left\| \nabla\rho - \gamma\, \nabla\rho_0 \right\|^{2} \quad \text{with} \quad \rho|_{\partial U} = 0, \qquad (7.4) $$

where ρ|∂U = 0 enforces the boundary condition and γ is a constant coefficient that can be used to scale the guidance field. Using this approach, we not only enhance the alpha mattes by the guidance scale, but also effectively smooth the noise with the membrane model, as shown in Fig. 7.9.(e).

Figure 7.8: Object matting refinement after the temporal consistency enforcement. Top: results before the refinement. Bottom: results after the refinement. The left side shows results from the “doll” sequence; the right side shows results from the “sit-talking” sequence.

Figure 7.9: Shadow matting of one frame (Fig. 7.6.a) of the “sit-talking” sequence. (a) The bi-map of the shadow layer. (b) After removing the foreground object, the observation I is estimated by Poisson filling. (c) The lit image L obtained from a reference image without the object. (d) The initial shadow matting ρ0. (e) The enhanced shadow mattes using the membrane model.

Figure 7.10: Local shadow matting editing. (a) The temporally smoothed shadow with the manipulating feature points. After marking the undesired shadow region (inside the red curves), we remove it and obtain the final result (b). (c) The shadow mattes warped into another frame.

Once the shadow mattes are obtained for each frame, we further enforce temporal consistency to refine the shadow mattes based on their planar-homography invariance, such that

$$ \rho_1(p) = \frac{1}{N} \sum_{i=1}^{N} \rho_i\!\left(H_{1,i}\, p\right), \qquad (7.5) $$

where ρi(·) is the alpha map of frame i and H1,i is the homography from frame 1 to frame i with respect to the plane on which the shadow is cast. Fig. 7.10.(a) shows the temporally smoothed shadow mattes. However, the smoothed results still contain some errors due to noise and the imperfectly estimated I. To achieve the desired shadow matting, we introduce a local editing process that allows users to manipulate the shadow matting locally by specifying a set of regions, as shown in Fig. 7.10.
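The sketch below illustrates, under simplified assumptions, the initial shadow-channel estimate of Eq. (7.3) and the homography-based temporal averaging of Eq. (7.5); nearest-neighbour sampling stands in for proper interpolation, and the images and homographies are synthetic.

```python
# A minimal sketch of the initial shadow-channel estimate of Eq. (7.3) and the
# homography-based temporal averaging of Eq. (7.5). Nearest-neighbour sampling
# stands in for proper interpolation; inputs are synthetic.
import numpy as np

def initial_shadow_channel(I, L, eps=1e-6):
    """rho0 = (I . L) / (||L||^2 + eps), computed per pixel over the color channels."""
    num = np.sum(I.astype(np.float64) * L.astype(np.float64), axis=2)
    den = np.sum(L.astype(np.float64) ** 2, axis=2) + eps
    return num / den

def temporal_average(rhos, homographies_1i):
    """rho_1(p) = (1/N) * sum_i rho_i(H_{1,i} p)  (Eq. 7.5), sampled nearest-neighbour."""
    h, w = rhos[0].shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])    # pixels p of frame 1, homogeneous
    acc = np.zeros((h, w), np.float64)
    for rho_i, H_1i in zip(rhos, homographies_1i):
        q = H_1i @ pts                                          # H_{1,i} p expressed in frame i
        xq = np.clip(np.round(q[0] / q[2]).astype(int), 0, w - 1)
        yq = np.clip(np.round(q[1] / q[2]).astype(int), 0, h - 1)
        acc += rho_i[yq, xq].reshape(h, w)
    return acc / len(rhos)

# Toy example: two identical frames related by the identity homography.
I = np.full((8, 8, 3), 80, np.uint8)      # observed (shadowed) color
L = np.full((8, 8, 3), 160, np.uint8)     # lit color without the shadow
rho = initial_shadow_channel(I, L)        # about 0.5 everywhere
print(temporal_average([rho, rho], [np.eye(3), np.eye(3)])[0, 0])
```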

7.3.3 Layer compositing with image-based rendering

After performing the 3D camera trajectory alignment (Section 6.2), we have already determined the closest source view for each target frame. Then, using an image-based rendering technique, we re-render the shadow and object layers with their alpha channels separately, and composite them into the target view. The detailed steps are described as follows.


Figure 7.11: Layer compositing by image-based rendering. (a) The source frame. (b) The final result after compositing the rendered layers on the target view. (c) The rendering result of the object layer. (d) The rendering result of the shadow layer. (e) Layer compositing.

Given a new target view j, the nearest source frame i and its matting data are projected into the virtual view by the projection πj,i(p, Di(p)). The results for both the shadow and the foreground object are stored in separate buffers, each containing color, depth, and opacity. For each layer, a 3D mesh is created from its depth map. Then, the two layers are rendered and blended using the opacity channel to generate the final image, as shown in Fig. 7.11. Even though there are some gaps between the two projection matrices Po^j and Ps^i (the viewing orientation difference βi,j between the closest Po^j and Ps^i can be up to 10°–15°), as shown in Fig. 6.3, our rendering shows very promising results due to the reliable depth estimation.

Figure 7.12: Sample depth estimation results for the other source video sequences. From left to right, the columns are source frames, depth maps, object mattes, and shadow mattes.
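As a simplified illustration of the blending step only (the depth-based re-rendering itself is not reproduced here), the sketch below darkens the target by the shadow matte with a single assumed attenuation factor, rather than the per-channel model, and then composites the object layer with a standard "over" blend; all inputs are synthetic.

```python
# A minimal sketch of the final blending: the shadow layer attenuates the target by
# its matte (a single scalar strength is an assumption standing in for per-channel
# shading), and the re-rendered object layer is then composited with an "over" blend.
import numpy as np

def composite(target, shadow_rho, object_rgb, object_alpha, shadow_strength=0.6):
    """All inputs are float arrays in [0, 1]; shadow_rho and object_alpha are HxW mattes."""
    shaded = target * (1.0 - shadow_strength * shadow_rho)[..., None]   # darken under the shadow
    a = object_alpha[..., None]
    return a * object_rgb + (1.0 - a) * shaded                          # alpha-over the object

# Toy example: a gray target, a soft circular shadow, and a small opaque square object.
H, W = 64, 64
target = np.full((H, W, 3), 0.7)
yy, xx = np.mgrid[0:H, 0:W]
shadow = np.clip(1.0 - np.hypot(yy - 40, xx - 32) / 15.0, 0.0, 1.0)     # soft shadow matte
obj = np.zeros((H, W, 3)); obj[20:36, 24:40] = [0.2, 0.3, 0.8]
alpha = np.zeros((H, W)); alpha[20:36, 24:40] = 1.0
result = composite(target, shadow, obj, alpha)
print(result.shape, result.min(), result.max())
```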

Table 7.1: The number of key-frames used in our video sequences.

Name          Key-frames   Total frames
Sit-talking        13           252
Beetle             20           405
Doll               21           433
Toy dog            17           400
Blue car           23           504
Mug                16           300

7.4 Experimental results

We have applied our approach to a number of video sequences, all of which were captured by a hand-held camera without any assisting equipment. The resolution of all our video sequences is

640×480. To illustrate the soft shadow effects, our video sequences are taken from indoor scenes illuminated by multiple light sources.

Note that since the purpose of our approach is to transfer a 3D object from one video to another, we have more flexibility on the source side, provided that the environmental lighting conditions are similar to those of the target scene. If the source video acquisition is controllable, we recommend using a relatively bright background to capture the dim soft shadow of the object.

To test our approach, two groups of objects are selected. The first group contains objects with highly specular materials, such as the car and the mug. The second group includes objects with soft and irregular hair or fur, such as the doll, a real human, and the toy dog. Fig. 7.12 shows sample frames of the source objects, the corresponding estimated depth, alpha mattes, and shadow mattes. The number of key-frames required for each sequence is given in Table 7.1.

With the strong support of the estimated depth map, our framework also provides the flexibility

to change the target destination, Tt, and the rotation angle, θ, within a certain range, which allows the

user to create some special effects, such as moving the objects or duplicating the objects in the

target video.

Multiple Objects in One Scene: The first interesting application is to transfer multiple objects into one target video even though these objects are from different sources.

Object Moving, Colorizing, and Deforming: Due to the four degrees of freedom, we can

translate, rotate, scale, or even deform the objects in the process of the video transfer.

Object Duplication: Another interesting application is duplicating the object into multiple

copies, and each of them may have a slightly different appearance according to its own pose and

destination.

Fig. 7.13 shows four synthetic video sequences that demonstrate the above applications and convey the correct 3D visual effects. Using the proposed approach, we not only correctly recover the perspective deformation, leading to a consistent and realistic 3D effect, but also implicitly exploit the lighting information recorded in the source frames. Our approach avoids the challenges of complicated lighting computation, and effectively restores the subtle variations of the specular reflections so that the rendered objects plausibly respond to the lighting environment of the target scene.

Figure 7.13: Four synthetic video sequences with one or more transferred objects from different source video sequences. (a) The Beetle moves on the floor while its color keeps changing. (b) A blue car and a toy dog are transferred into the scene with an apparent scale change, as the calibration grid is gradually removed. (c) The “sit-talking” person is duplicated during the transfer. (d) One Beetle is split into two with quite different poses and locations, while the doll and mug are also correctly augmented into the target scene simultaneously.

7.5 Conclusion and discussion

In this chapter, we have presented a new system for geometrically correct and visually realistic 3D object transfer that is capable of compositing 3D objects from source videos into a distinct target video captured by a moving camera along a different trajectory. The proposed 3D spatiotemporal alignment approach provides a robust solution for aligning two non-overlapping videos of different scenes. With the assistance of our intuitive GUI, our approach is able to efficiently handle small non-rigid motion and specular reflection during depth recovery, and also to tackle the difficulties of matte extraction for the object and its soft shadow. The experimental results strongly demonstrate that our approach can generate a realistic 3D video with plausible environmental illumination from multiple video sources without expensive lighting computation.

While we have achieved realistic and correct 3D video transfer between two different scenes, some limitations of our framework remain to be addressed. Transferring an object from one video to another is an extremely difficult problem in the general case. Here we impose some constraints to simplify the problem and attempt to illustrate a solution. One constraint is that the transferred object is located on a plane, where the shadow is relatively easy to extract and transfer. This constraint is consistent with most natural videos, in which objects rest on the ground, a sea plane, or a man-made planar surface. The second constraint is that we require the scene to be nearly static, which provides more robust 3D information when the object is transferred between 3D scenes. However, we have also illustrated that our approach has some flexibility to handle small non-rigid motion of the objects, such as a moving hand. For large non-rigid motion, such as a human walking or running, the current framework does not work. One possible and interesting solution is to extend our framework by combining it with optical flow techniques [22] for non-rigid object transfer between 3D videos. Another limitation of our approach is that we need a sufficient amount of background texture or a wide field of view to perform camera calibration and pose estimation.

CHAPTER 8

2D VIDEO POST-PRODUCTION

The previous chapter describes a 3D method to solve the perspective distortions in video post-production applications when the source and target videos are captured by moving cameras along distinct trajectories. Although the 2D method introduced in this chapter is relatively simpler from a geometric point of view, this chapter aims to address one of the earliest criticisms of linear perspective, found in Leonardo da Vinci’s notebooks: that linear perspective is unable to account for the atmospheric effects of light, haze and smoke. It is impossible to address all such natural phenomena in a single chapter. This chapter mainly concentrates on shadow and reflection effects. It describes how geometrically correct and visually realistic shadows and reflections may be computed for objects pasted into target views.

8.1 Introduction

Compared to traditional compositing methods, which either do not deal with shadow or reflection effects or create them manually for the composited objects, our approach efficiently utilizes the geometric and photometric constraints extracted from a single target image to synthesize shadows and reflections for the inserted objects that are consistent with the overall target scene.

Figure 8.1: Sample result from our realistic compositing algorithm for shadows. Given an unknown video, e.g. a commercial video or a video taken by others, we recover the positions and orientations of the camera and light source. Then we place two video cameras at the camera and light source locations, respectively. (a) is the view from the camera’s viewpoint, and (b) is captured by a camera at the light source’s location with its principal axis coinciding with the computed lighting direction. Using a traditional “faux shadow” method [187], warping the foreground object’s alpha matte (a) to create its shadow, we obtain the result in (c). The results of our new shadow synthesis method (d) compare favorably with the ground truth (e). Note the correct silhouette of the shadow map, the visually convincing color characteristics of the shadow, and the seamless matte edges where the foreground and background shadows meet.

For shadows, we explore (i) the strong geometric constraints, i.e. the camera calibration and thus explicit determination of the locations of the camera and the light source; (ii) the relatively weaker geometric constraint, the planar homology, that models the imaged shadow relations when explicit camera calibration is not possible; and (iii) the photometric constraints that are required to match

the color characteristics of the synthesized shadows with those of the original scene. For each

constraint, we demonstrate working examples followed by our observations. In Figure 8.1, the results of our new shadow synthesis method (d) compare favorably with the ground truth (e), while

(c), obtained by the traditional “faux shadow” method, is obviously fake.

Another challenge in video post-production applications is to generate realistic-looking reflec-

tions of the inserted objects. Existing techniques either define beforehand a reflection model (e.g.

[128]), or explicitly extract the reflection models from the images. The reflectance recovery al-

gorithms, e.g. [12], typically either directly measure the reflectance on the object using a specific

device or extract the reflectance from a set of images or a single image. While using a specific de-

vice is unlikely in many compositing applications, the reflectance recovery methods from images

are mostly limited to perfectly diffuse surfaces and require a 3D geometrical description of the surfaces

of some object. Generally, synthesizing such reflections for compositing applications is inherently

difficult because we are typically given only limited input, i.e. one view or a short video clip of

a target scene. For example, suppose we want to insert a new synthetic object on the top of a

real anisotropic mirror inside a real scene. This operation clearly requires taking into account the

interaction between the new object and its environment (especially the mirror). This is impossible

to do, if we do not have an approximation of the reflectance properties of the real surfaces in the

image.

For reflection synthesis in this work, we focus on a slightly easier situation, where the target

scene contains a ground plane and some upright vertical objects, e.g. walls, crowds, desks, street

lamps, trees, etc., which are common in both indoor and outdoor environments. Basically, we

divide the problem of synthesizing reflections of inserted objects into two subproblems. From the geometric point of view, we need to synthesize reflections that would be seen by the same camera used to capture the target scene and that are reflected by the true reflective media in the target scene. Physically, we aim to infer the most likely rendered reflections given the set of known reflection patches in the target view. This problem is formulated as a Maximum A Posteriori

(MAP) estimation problem.

Our method has three advantages: first, it is a pragmatic approach since even in the single-view case, where auto-calibration based on motion alone is impossible, it can exploit the ubiquitous presence of buildings, vertical objects and other man-made structures, which can often provide sufficient geometric constraints for the determination of the correct shadow location of an inserted object. Second, the framework is flexible in that various techniques suitable for different scenes can be easily integrated into it. Third, it is simple and easy to implement. To show the accuracy and the applications of the proposed method, we present results for a variety of target scenes, including footage from commercial Hollywood movies and 3D video games.

The rest of this chapter is organized as follows. We first discuss the synthesis of shadows in

Section 8.2. Then, the method to synthesize reflections is described in Section 8.3. Finally,

Section 8.4 concludes this chapter with observations and proposed areas of future work.

8.2 Shadow Synthesis

This section describes the framework to synthesize shadows for objects cut from source videos and composited into target videos. First, we explore the geometric constraints when camera calibration is possible, and when it is not. Then, we describe the photometric constraint used to match the color characteristics of the synthesized shadows to those of the original scene. Finally, we demonstrate the results of our method applied to various real images and applications to film post-production.

8.2.1 Geometric Constraints

We first describe how to determine the position and orientation of the light source and camera when the imaged scene structure provides enough calibration constraints; we call these “strong geometric constraints”. Then we show that it is still possible to generate the shadow of an inserted object, provided that it is planar or distant, when camera calibration is not possible. We refer to this as the “weak geometric constraint”. For each case, we give some working examples followed by discussions.

8.2.1.1 Strong Geometric Constraints

The calibration methods presented in Chapter 4 can be used for camera calibration and light source

orientation estimation. For example, given a typical frame (Figure 8.2 (a)), we are able to extract

three sets of parallel lines for camera parameter estimation shown in red, green and blue respec-

tively. Therefore, by intersecting the image projections of the lines along each direction, we can

compute the three mutually orthogonal vanishing points, denoted by vx, vy and vz, along the world

X-axis, Y-axis and Z-axis, respectively. The three mutually orthogonal vanishing points together

with other constraints that can be obtained from the priors of a normal camera (e.g. zero skew,

γ = 0, and known aspect ratio λ, usually 1 (square pixels), 1.2 (widescreen) or 0.9), are sufficient

to solve for the five unknowns of the image of the absolute conic ω (see Section 3.1). After the

camera calibration, we can compute the camera internal and external parameters, i.e. the projection

matrix P in Equation (3.1), up to a scale as shown in [40]. The scale is related to the camera loca-

tion, and can be eliminated as follows. The fourth column, p4, of P is nothing but the projection

of the 3D world origin, [0 0 0 1]^T, since (denoting the i-th column of P as pi) one can show that

[p1 p2 p3 p4][0 0 0 1]^T = p4.

Therefore, given the image of the world origin, and by taking the length of a vertical object as

unit distance, one can remove the scale ambiguity. For instance, in Figure 8.2 (a), without loss

of generality, we can use the corner of the visible walls, at image location (313.4, 206.4), as the

imaged world origin, in which case p4 = α[313.4 206.4 1]^T, where α is the similarity ambiguity. If we now specify the height of the uppermost green line from the ground plane as the unit distance,


Figure 8.2: (a) One frame of the movie “Sleepless in Seattle” (1993) with the extracted feature

lines plotted in five different colors: red, green and blue lines are along the X, Y, and Z directions, respectively, in the 3D world coordinate system; cyan lines are shadow lines of the spiked fence

surrounding the 86th-floor observation deck of the Empire State Building, and the magenta lines

are along the light source direction. (b) The recovered 3D scene from a viewpoint other

than the original camera. The 3D feature lines are plotted in the same colors as those in (a), and

the yellow square pyramid on the right indicates the original camera location.

we can remove the unknown similarity ambiguity α, and hence compute the camera location

as (1.83, 8.13, 0.96).

A partially reconstructed scene of the image in Figure 8.2 (a) is shown in Figure 8.2 (b). Note

that we do not need to fully reconstruct the target scene for image compositing appli-

cations since it is already present in the existing image. Note also that we reconstruct the scene

in Figure 8.2 (b) using the planar homographies that map the world planes to the image planes, since the traditional triangulation method [74] would not work in our case due to the fact that there

might be only a single view available. For example, we compute the planar homography, Hz, that

maps the world plane Z = 0, i.e. the ground plane, to the image plane as Hz = [p1 p2 p4].
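A minimal sketch of this construction, with a synthetic projection matrix purely for illustration, is shown below: Hz is formed from the first, second and fourth columns of P, and mapping a ground-plane point through Hz agrees with the full projection of the corresponding 3D point with Z = 0.

```python
# A minimal sketch of the ground-plane homography H_z = [p1 p2 p4] built from the
# columns of a projection matrix P, and of mapping a point on the plane Z = 0 into
# the image. The projection matrix is synthetic, for illustration only.
import numpy as np

K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([[0.0], [0.0], [5.0]])
P = K @ np.hstack([R, t])                       # P = K [R | t], a 3x4 projection matrix

Hz = P[:, [0, 1, 3]]                            # columns p1, p2, p4: world plane Z = 0 -> image

X = np.array([1.0, 2.0, 1.0])                   # homogeneous ground-plane point (X, Y, 1)
x = Hz @ X
x /= x[2]                                       # image location of the ground point

# consistency check: the full projection of [X, Y, 0, 1]^T gives the same pixel
x_full = P @ np.array([1.0, 2.0, 0.0, 1.0]); x_full /= x_full[2]
print(x[:2], x_full[:2])
```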

The imaged light source position, v, can be computed by intersecting the images of the parallel lines along the light source direction, e.g. the magenta lines in Figure 8.2 (a). Note that v is not

visible in Figure 8.2 (a). However, since the target scene in Figure 8.2 is illuminated by an infinite

light source (the sunlight), the orientation of the light source, i.e. the angles φx (respectively φy and φz) between the light source direction and the world X axis (respectively the Y and Z axes), can be computed as (see also Section 4.2.4)

$$ \phi_j = \cos^{-1}\!\left( \frac{v_j^{T}\,\omega\, v}{\sqrt{v^{T}\omega\, v}\,\sqrt{v_j^{T}\omega\, v_j}} \right), \qquad j \in \{x, y, z\}. \qquad (8.1) $$

For the image in Figure 8.2, the computed angles are φx = 119.6°, φy = 138.9° and φz = 64.4°.
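The following minimal sketch evaluates Eq. (8.1) with synthetic intrinsics and vanishing points, chosen so that the result can be checked against the known 3D angle; it illustrates the formula only, not the calibration pipeline itself.

```python
# A minimal sketch of Eq. (8.1): the angle between the light-source direction and a
# world axis, computed from their vanishing points v and v_j and the image of the
# absolute conic omega = (K K^T)^(-1). The intrinsics and directions are synthetic.
import numpy as np

def angle_between_vps(v, vj, omega):
    num = vj @ omega @ v
    den = np.sqrt(v @ omega @ v) * np.sqrt(vj @ omega @ vj)
    return np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))

K = np.array([[1000.0, 0.0, 320.0],
              [0.0, 1000.0, 240.0],
              [0.0, 0.0, 1.0]])
omega = np.linalg.inv(K @ K.T)                  # image of the absolute conic

# for a camera with R = I, a 3D direction d maps to the vanishing point K d (up to scale)
d_light = np.array([1.0, -1.0, 2.0]); d_light /= np.linalg.norm(d_light)
d_x = np.array([1.0, 0.0, 0.0])
v = K @ d_light
vx = K @ d_x

print(angle_between_vps(v, vx, omega))          # matches the true 3D angle below
print(np.degrees(np.arccos(d_light @ d_x)))
```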

The second computation example is the single view shown in Figure 8.3, available from Uni-

versity of Washington. The same process for the camera calibration and the computation of P

described above can be simply applied using the feature lines in Figure 8.3 (a). The difference is

that Equation (8.1) cannot be used to compute the light source orientation, since it is difficult to identify more than one line along the light source direction. Consequently, it is unlikely that we can compute the imaged light source, v, from the single view shown in Figure 8.3. In this case,

however, we still can compute the light source orientation by using two feature points along the

lighting direction: t on the person’s head and its cast shadow position s on the ground shown in

Figure 8.3 (b). Without any difficulty, we can identify the corresponding bottom point b of t on the ground plane (the XY-plane), such that it has the same 3D X and Y coordinates as t. Then, we


Figure 8.3: (a) A view with the extracted feature lines for camera calibration plotted in three

different colors, each of which has the same definition as that in Figure 8.2. (b) The 3D world

coordinate system and the 3D coordinate values, denoted by corresponding upper-case characters,

of three image feature points (t, b and s) used to compute the orientation of the light source.

compute the points, b and s, in the world coordinate system as

$$ [X_b\; Y_b\; 1]^{T} \sim H_z^{-1}\, b, \qquad [X_s\; Y_s\; 1]^{T} \sim H_z^{-1}\, s. $$

The Z coordinate, Zt, of the feature point t is then estimated from the equation (two equations, one unknown) t ∼ P [Xb Yb Zt 1]^T. For the distant light source, the sun, the two 3D points T = [Xb, Yb, Zt]^T and S = [Xs, Ys, 0]^T along the lighting direction are enough to give us the light source direction as [Xb − Xs, Yb − Ys, Zt]^T.
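A minimal sketch of this computation is given below with a synthetic projection matrix: b and s are back-projected onto the ground plane through the inverse of Hz, Zt is solved from t ∼ P[Xb Yb Zt 1]^T by least squares on the cross product, and the recovered direction is compared against the ground-truth T − S.

```python
# A minimal sketch of recovering the light direction from the image points t (head),
# b (its foot on the ground) and s (the cast shadow), following the equations above.
# The projection matrix and 3D points are synthetic, used only to verify the steps.
import numpy as np

K = np.array([[1000.0, 0.0, 320.0], [0.0, 1000.0, 240.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.array([[0.0], [0.0], [10.0]])])   # P = K [I | t]
Hz = P[:, [0, 1, 3]]                                               # ground-plane homography

# synthetic ground truth: top point T, its foot B and its shadow S on the plane Z = 0
Xb, Yb, Zt, Xs, Ys = 1.0, 2.0, 1.7, 2.5, 3.0
t = P @ [Xb, Yb, Zt, 1.0]; b = P @ [Xb, Yb, 0.0, 1.0]; s = P @ [Xs, Ys, 0.0, 1.0]

# back-project b and s onto the ground plane with the inverse homography
Bw = np.linalg.inv(Hz) @ b; Bw /= Bw[2]
Sw = np.linalg.inv(Hz) @ s; Sw /= Sw[2]

# solve t ~ P [Xb Yb Zt 1]^T for the single unknown Zt (least squares on the cross product)
c = P[:, 0] * Bw[0] + P[:, 1] * Bw[1] + P[:, 3]
a = np.cross(t, P[:, 2])
Zt_est = -np.dot(a, np.cross(t, c)) / np.dot(a, a)

direction = np.array([Bw[0] - Sw[0], Bw[1] - Sw[1], Zt_est])
print(direction, np.array([Xb - Xs, Yb - Ys, Zt]))                 # the two should agree
```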

Consequently, as shown in Figure 8.4, the warping from the alpha matte (Figure 8.4 (c)) of the

object (Figure 8.4 (b)) captured from the light source to the background (Figure 8.4 (a)) can be

described as a planar homography H, that can be computed using the corresponding feature points

on the background plane of the two images (Figure 8.4 (a) and (b)). We then warp (Figure 8.4 (c))

to create the foreground object’s shadow matte (Figure 8.4 (d)).


Figure 8.4: (a) and (b) are two views defined in Figure 8.1 (a) and (b). (c) is the alpha matte of

(b). To warp the alpha matte (c) viewed from the light source direction into the final shadow map

(d), we need to compute the displacement map. When the background is planar, the displacement

map can be described as a planar homography H, which can be computed using the corresponding

feature points on the background plane of the two images (a) and (b).

Before we end this section, we would like to make a few observations. First, it might be

difficult to approximate the camera pose and lighting direction by simply looking at shadow lines

in a perspective view. For example, it is difficult to tell the angle β in the 3D world between the cyan

shadow lines (Figure 8.2 (a)) and the red lines, since the imaged shadow lines intersect at a finite

point, (678.8, 82.5), and hence each of them gives different values of that angle β. However, our method is able to compute β as 56.3◦. Second, single view calibration is necessary in most cases, even for clips from commercial movies, since static shot is the most basic camera shot in all motion pictures. The last remark is that the new findings of camera calibration objects, such as parallel shadows [40] and co-planar circles [45], could be easily integrated into this framework although

we do not give examples of the latter here.


Figure 8.5: Example of finding the correct shadow pixels of a real soccer player inserted into a snapshot of the game FIFA 2003. (a) shows three pairs of corresponding points under a planar homology, i.e. ⟨t2, s2⟩, ⟨t1, s1⟩ and ⟨b1, b1⟩. The point b2 is used for the computation of the vertical vanishing point, but not for computing H. (b) plots the computed shadow pixels of the inserted object, marked in white.

8.2.1.2 Weak Geometric Constraints

For the cases where camera calibration and the explicit geometric light source estimation are not

possible, we utilize a relatively weak constraint, the planar homology, which makes it possible to

synthesize the correct shadows of an inserted object if the object is planar or distant.

One example target scene is shown in Figure 8.5 (a), and the computation process is as follows.

We first choose three pairs of corresponding points under a planar homology, e.g. ⟨t2, s2⟩, ⟨t1, s1⟩ and ⟨b1, b1⟩, as input. Then, we compute the planar homology, H, directly as [75]:

$$ H = I + (\mu - 1)\, \frac{v\, l^{T}}{v^{T}\, l}, \qquad (8.2) $$

where I is the identity matrix and µ, v and l are given by

$$ v = (t_2 \times s_2) \times (t_1 \times s_1), $$
$$ l = \big((t_2 \times t_1) \times (s_2 \times s_1)\big) \times b_1, $$
$$ \mu = \{v, t_1;\, s_1, i_1\}. $$

Note that i1 is the intersection, in the image plane, of the light ray t1s1 with the axis, l, although the two are unlikely to intersect in the 3D world. Now we have a planar homology, H, which is the transformation between the image of a planar object, i.e. the plane π1 determined by the three points (t2, t1 and b), and the image of its shadow on the plane π.

Figure 8.6: The geometry for computing the shadow positions of any points on plane π2, given the planar homology, H, that relates the planar object, π1, and its shadow (cast on a ground plane π).

Next, we describe how to determine the correct shadow position of the inserted real player in

Figure 8.5 (b), which is matted from Figure 8.12. The distant standing person can be approximated

as a planar object in some vertical plane, denoted by π2, as shown in Figure 8.6. Since we have

already computed the planar homology, H, from the plane π1 to π, we know its vertex v and axis

l1. The new homology, H2, that maps the points on the plane π2 to their shadows on plane π, has the same vertex v as that of H, and its axis, l2 is the intersection of the plane π2 and π, which can

be found by specifying two points p1 and p2 on the intersection, i.e. l2 = p1 × p2. The manually

specified points are shown in Figure 8.5 (b). Denote by b the intersection of the two axes l1 and l2, and by vy the vertical vanishing point. vy can be computed by intersecting two vertical objects, such as t2b2 and t1b1 in Figure 8.5 (a). We then randomly choose one point t on the vertical line lv passing through b and vy. Finally, H2 is computed as:

$$ H_2 = I + (\mu_2 - 1)\, \frac{v\, l_2^{T}}{v^{T}\, l_2}, $$

where

µ2 = {v, t; Ht, (t × (Ht)) × l2}.

Note that the four points should be scaled to inhomogeneous coordinates to compute µ2. Conse-

quently, given any point, x, in the plane π2, it is easy to compute its shadow position, xs, on the

plane, π, by simply applying the 2D transformation, xs ∼ H2x. The computed shadow pixels for

the inserted object in Figure 8.5 (b) are marked as white.

The technique described in this section is mostly related to the popular technique, commonly

called “faux shadow” in the film industry [187], for which artists also use the foreground object’s alpha matte to create its shadow by warping or displacement-mapping the shadow. However, compared to “faux shadow” created by hand, the proposed approach has two advantages. First,

191 our method models the imaged shadow relations by the planar homology and is able to obtain

geometrically correct shadow positions, while the geometric accuracy of the “faux shadow” highly

depends on the experience of the compositor. Second, our method infers the most likely rendered

shadow colors from the existing shadows in the target scene, while color characteristics of the

“faux shadow” are manually adjusted by the compositor. This photometric constraint is described

in the next section. In addition, the proposed method is simple and easy to implement. The major

interactions consist of specifying three pairs of points ⟨t2, s2⟩, ⟨t1, s1⟩ and ⟨b1, b1⟩, as shown in

Figure 8.5 (a), and two points p1 and p2 as shown in Figure 8.5 (b). Notice that the examples given above are for vertical objects, since we observe that vertical objects are ubiquitous in the real world, and also because, typically, we are interested in inserting a new actor into the target scene, who is

often orthogonal to a reference plane.

8.2.2 Photometric Constraints

While the geometric constraints described in the previous section help us to place shadows at

correct positions, we also need to match the color characteristics of the shadows to those of the

target scene. To create visually realistic shadows in the target image, we enforce the shading

image values [21] of the synthesized shadows of the inserted objects to be the same as those of

the shadows cast by the existing objects in the target scene. The shading image (or illumination

image), S(x, y), together with the reflectance image, Iunshadow(x, y), are called the intrinsic images

[21]. Generally, the observed image, Ishadow(x, y), can be modelled as the product of these two


Figure 8.7: (a) A shadow patch in a target scene shown in Figure 8.11 (b). (b) The shadow bound-

ary detected by a Canny edge detector. (c) The histograms and (d) the fitted Gaussian distributions

of the computed shading image values, for all pixels along the shadow boundary.

intrinsic images:

Ishadow(x, y) = S(x, y)Iunshadow(x, y). (8.3)

Therefore, our problem reduces to recovering the shading image, S(x, y), from the input image

Ishadow(x, y).

Recently, many approaches [183, 165, 53] have been proposed to derive the illumination image and

the reflectance image from either a single image or a video of an object under different illumination

conditions. Theoretically, those decomposed light maps could be used as shadow mattes in our

work. However, the method [183] involves recording a sequence of images of a fixed outdoor scene over the course of a day, while the strategy used in [165] requires a trained classifier, which must incorporate knowledge about the structure of the surface in the target scene and how it appears when illuminated. Therefore, these are not practical for us because in our case the target scene and its structure are not accessible. On the other hand, the more recent method [53] aims to recover intrinsic images by entropy minimization from a single image. But it is based on assumptions such as narrow-band (or spectrally-sharpened) sensors and Planckian lights, and is unable to handle the compression effects caused, for instance, by JPEG. In contrast, our challenge is compositing objects into a given image or a frame of a video, which are typically compressed.

Our approach takes advantage of the property that changes in color between pixels indicate either reflectance changes or shading effects [165]. In other words, it is unlikely that signifi- cant shading boundaries and reflectance edges occur at the same point. Therefore, we make the assumption that every color change along the shadow boundaries, the edges caused by illumina- tion difference (e.g. Figure 8.7 (b)), is caused by shading only, i.e. the reflectance image colors across the shading boundaries should be the same or similar. In practice, considering the gradual change along the normal direction of the shadow boundaries, due to either compression effects or soft shadows, the input image pixel value, Ishadow(x, y), and the reflectance image pixel value,

Iunshadow(x, y), of boundary pixel, (x, y), are obtained as

$$ I_{shadow} = \mathrm{median}\{\, I_{shadow}(m, n) : (m, n) \in N_i \,\}, \qquad (8.4) $$

$$ I_{unshadow} = \mathrm{median}\{\, I_{shadow}(m, n) : (m, n) \in N_o \,\}, \qquad (8.5) $$

194 where Ni and No are subsets of the set of neighbor pixels of (x, y), and the subscripts denote whether the pixels are inside (Ni) or outside (No) of the shadows. Previous methods also use color differences on two sides of a shadow boundary for estimating the influence of an illumination source [159, 99]. In our implementation, we use a 7 × 7 neighborhood, e.g. the pixels bounded by

the red box in Figure 8.7 (a). From Equation (8.4), we can compute the shading image value as,

S(x, y) = Ishadow(x, y)/Iunshadow(x, y), for each pixel (x, y) along the shadow boundaries.

Notice that for each color channel (red, green and blue), S is computed independently. For example, the computed S(x, y) along the boundaries of the shadow map (Figure 8.7 (b)) is plotted

in Figure 8.7 (c,d). The interesting observation is that the changes in a color image due to shading

affect the three color channels disproportionately. Evidently, the shading affects the red color

channel the most and the blue color channel the least. This coincides with the observations in [121]

that shadow pixels appear more “bluish”. While richer models of this effect exist, we simply

use different shading image values for the three color channels to approximate it,

i.e. S = diag(βR, βG, βB). The S matrix is assumed approximately constant, and computed

using the median value of all computed S(x, y) for pixels (x, y) along the shadow boundary. For

each computed shadow pixel, (u, v), of the inserted object (e.g. white pixels in Figure 8.5 (b)), we

are able to compute its pixel value after shading as

Ishadow(u, v) = diag(βR, βG, βB)Iunshadow(u, v).

In the scene in Figure 8.7, the computed shading image values are βR = 0.67, βG = 0.78 and

βB = 0.91. Provided that the ground surface is locally flat and partially under shadow, which is

mostly true in the real world, our experiments show that this approximation works well.
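The following minimal sketch, on a synthetic image, estimates the per-channel shading values from samples just inside and just outside an existing shadow boundary (a simplified stand-in for the median-of-neighborhood rule of Eqs. (8.4)–(8.5)) and then applies S = diag(βR, βG, βB) to the computed shadow pixels of an inserted object.

```python
# A minimal sketch of the photometric step: per-channel shading values
# (beta_R, beta_G, beta_B) are medians of I_shadow / I_unshadow along an existing
# shadow boundary, and the synthesized shadow pixels of the inserted object are
# then darkened by diag(beta_R, beta_G, beta_B). The image here is synthetic.
import numpy as np

def shading_values(image, inside_samples, outside_samples):
    """Median per-channel ratio between shadowed and lit samples along the boundary."""
    ratios = []
    for (yi, xi), (yo, xo) in zip(inside_samples, outside_samples):
        I_sh = image[yi, xi].astype(np.float64)
        I_un = image[yo, xo].astype(np.float64)
        ratios.append(I_sh / np.maximum(I_un, 1e-6))
    return np.median(np.array(ratios), axis=0)        # (beta_R, beta_G, beta_B)

def apply_shadow(image, shadow_mask, betas):
    out = image.astype(np.float64)
    out[shadow_mask] *= betas                          # I_shadow = diag(betas) I_unshadow
    return np.clip(out, 0, 255).astype(np.uint8)

# synthetic target: a flat ground with an existing shadow in the left half
img = np.full((40, 40, 3), (200, 190, 180), np.uint8)
img[:, :20] = (img[:, :20] * np.array([0.6, 0.7, 0.85])).astype(np.uint8)
inside = [(y, 19) for y in range(5, 35)]              # samples just inside the shadow
outside = [(y, 21) for y in range(5, 35)]             # samples just outside
betas = shading_values(img, inside, outside)

new_shadow = np.zeros((40, 40), bool); new_shadow[25:35, 25:38] = True
composite = apply_shadow(img, new_shadow, betas)
print(betas)                                           # close to (0.6, 0.7, 0.85)
```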

8.2.3 Results of Shadow Synthesis

The proposed method has been tested on an extensive set of target scenes. The shown target scenes

range from frames in commercial movies, frames from videos available on the internet, and snapshots of 3D

video games, to images taken by the author. In Section 8.2.3.1, we apply our method to two target

images which can be calibrated. Then, in Section 8.2.3.2, we demonstrate the performance of our

method on target images, where strong geometric constraints are not available. Finally, we show

the applicability of our method to film production in Section 8.2.3.4.

8.2.3.1 Scenes Where Strong Geometric Constraints are Available

The first target frame is from the commercial movie “Sleepless in Seattle” (1993), as shown in

Figure 8.2. We first compute the positions and orientations of the camera and the light source

using the method described in Section 8.2.1.1. Then, we use the shadow edges marked by white

lines shown in Figure 8.8 (d) to obtain the shading image values (βR = 0.48, βG = 0.43 and

βB = 0.46). The color characteristics of our synthesized shadow in Figure 8.8 (d) and zoomed in

(e) is comparable to that by [31] in (b) and (c). However, our result is obtained by using a single

frame, while their method involves 512 frames, from which we find the darkest and brightest

value at each pixel as shown in Figure 8.9. In addition, it is difficult for [31] to ensure that the

relationship between the light source, the reference plane, and the camera matches the target due to

the potential perspective distortions. Note that we multiply the R, G and B values of each pixel of

Figure 8.8: Comparison of our method with the shadow matting method proposed by [31]. (a) Source image. (b) Result by [31]. (c) Close view of (b). (d) Our result. (e) Close view of (d). We aim to composite the child in the video sequence (a) into a frame sequence from the commercial movie “Sleepless in Seattle” (1993), as shown in Figure 8.2.

the composited foreground object (i.e. the child) by manually specified constant scales (0.50, 0.38 and 0.38) to match the intensity differences between the source and target image and the “reddish” effects in the target scene.

In the second experiment, we aim to paste a small statue (Figure 8.10 (a)) into the target

image (Figure 8.3), based on the computed relative position and orientation with its principal axis

coinciding with the computed lighting direction. The result shown in Figure 8.10 (c) demonstrates

that our method is able to synthesize shadows with correct geometric relationship and realistic

color characteristics. We use the shadow edges marked by red lines shown in Figure 8.10 (c) to

obtain the shading image values (βR = 0.34, βG = 0.42 and βB = 0.51). Note that we multiply

Figure 8.9: The lit and shadow images of the Seattle sequence: (a) target shadow image, (b) target lit image. Note that we are only interested in areas where the inserted foreground might cast shadows, which in our case are located at the bottom left, separated by the yellow lines.

the intensity of each pixel of the composited foreground object, the small statue, by a constant scale of 1.28 to match the intensity differences between the source and the target image.

Figure 8.10: We paste a small statue into the target shown in Figure 8.3: (a) the image from the camera’s viewpoint; (b) the image taken by a camera whose principal axis coincides with the computed lighting direction; (c) the final composite with convincing shadows obtained by our new compositing method.

Figure 8.11: (a) Source image. (b) Target image. (c) The result obtained by assuming that any change in a color image due to shading affects all three color channels proportionally. (d) Our result.

8.2.3.2 Scenes Where Only Weak Geometric Constraints are Available

We also created shadows for planar objects, e.g. a parking sign post as shown in Figure 8.11. The input feature points, including three pairs of points that are corresponding under a planar homology and one point used for the computation of the vertical vanishing point, are plotted in the same color in Figure 8.11 (b) as those in Figure 8.5 (a). In this experiment, we demonstrate the advantage of using different shading image values along each color channel. Our final result shown in Figure

Figure 8.12: Shadow synthesis for the real player inserted into the game snapshot shown in Figure 8.5. Left: the source image. Right: a zoomed view of our result.

8.11 (d) is noticeably more realistic than the result in Figure 8.11 (c), since shadow regions are illuminated by the sky, and the sky is assumed to be blue and the only source of illumination on shadowed regions [121]. The shading image for Figure 8.11 (d) is computed as in Section 8.2.2, while for the result in Figure 8.11 (c) we use the intensity image and compute the shading image value as βR = βG = βB = 0.72.

For all of the above experiments, the light source is the sun. We also demonstrate our method for the synthetic light source from the snapshot of a 3D video game, illustrated in Figure 8.5.

Whether the light source is finite or not, our method is able to synthesize a correct and realistic shadow for an inserted real player, as shown in Figure 8.12. Note that very impressive shadows are generated even for the raised hand of the real player. The weak geometric constraint is described in Section 8.2.1.2. We use the shadow edges marked by red lines in Figure 8.12 (right) to obtain the shading image values (βR = βG = βB = 0.50).

Figure 8.13: Image-based rendering applications. Starting from two views (a) and (b), we first calibrate the camera and compute the light source orientation. The square marks are the corresponding points between the last two images, which are computed using the method of [195]. As a result, we can render a virtual teapot with a known 3D model into the real scene (b), as shown in (c). Utilizing the computed geometric information, we can also insert another person (d) into (b), as shown in (f). (e) is the view from the light source and is used for the shadow map.

8.2.3.3 Application to Image-based Rendering

To demonstrate the strength and applicability of the proposed algorithm, we show two examples for

augmented reality by making use of the camera and light source orientation information computed

by our method. Given two views as shown in Figure 8.13 (a) and (b), the computed camera intrinsic

matrix K is (see Chapter 4)

$$ K = \begin{bmatrix} 2641.11 & 0 & 991.70 \\ 0 & 2783.97 & 642.28 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (8.6) $$

and the computed polar angle φ and azimuthal angle θ for the light source are 47.99° and 54.91°, respectively. As a result, we can render a virtual teapot with a known 3D model into the real scene, as shown in Figure 8.13 (b) and (c). The color characteristics are estimated using the methods presented in Section 8.2.2. Alternatively, we can also composite the standing person extracted from Figure 8.13 (d) into Figure 8.13 (b) and synthesize his shadow using the contour of the person in Figure 8.13 (e).

8.2.3.4 Application to Film Production

To demonstrate the strength of our method, we apply it to two commercial Hollywood movies: “Sleepless in Seattle” (Figure 8.14) and “The Pianist” (Figure 8.15). The camera and light source parameters of “Sleepless in Seattle” are recovered as described in Section 8.2.1.1. For “The Pianist”, we first calibrate the camera using the feature lines shown in Figure 8.16 (a). To recover the light source geometry, we manually choose twelve correspondences for the points t, s and b, as shown in Figure 8.16 (b) and (c), in the first twelve frames of the original clip.

Figure 8.14: Three example frames (a), (b) and (c) of our composite of the Seattle sequence. The bottom row (d), (e) and (f) shows zoomed-in views of the frames above them.

Based on the analysis of the target frames, we observe that the light source of the “Seattle sequence” is typical daytime sunlight, and the camera is located above the ground plane at

a height of a typical person. We captured the source video at some park with similar lighting

conditions using an off-the-shelf Sony video camera. For the “Pianist sequence”, it is very difficult

to place a light at the computed light source position and then transfer the shadows extracted from

the camera view to the target video, since, to do this, one needs a very dark and huge studio of

about 100 meters in length, according to our computation, to ensure that the geometry matches the target

background. In our case, alternatively, we place two video cameras on the 3rd and 2nd floors of a parking garage to capture the videos that would be seen from the original camera and light source. In other words, the shadow map [38] in 3D graphics is implemented using a real camera and applied in film production.

Figure 8.15: Three example frames (a), (b) and (c) of our composite of The Pianist. The bottom row (d), (e) and (f) shows zoomed-in views of the frames above them.

In both clips, our method successfully composites not only the foreground objects but also their geometrically correct and visually convincing shadows into the commercial movies, without any special setup. The results are shown in Figure 8.14 and Figure 8.15. In cases where the shadows of the added sequence overlap with the shadows in the original sequence, we retain the original appearance. The existing shadow areas can be computed by comparing the input frame with the target shadow image shown in Figure 8.9 (a).


Figure 8.16: (a) is one frame from the movie The Pianist, with the extracted feature lines for camera parameter estimation plotted in three different colors, with the same definitions as those in

Figure 8.2 (a). The two yellow dashed lines intersect at a point, O, which is specified as the image of the world coordinate origin. (b) & (c) are two close views of two frames with extracted features

(t, b and s) used to compute the light source geometry.

8.3 Reflection Synthesis

For synthesizing reflections, we assume that the reflecting surface is approximately planar and that the perturbations and turbulence (e.g. waves on water) are relatively small. Two typical example scenes are shown in Figure 8.17. The laws of planar (flat surface) reflection are of course simple and well known. However, we are dealing with an inverse problem here, since the data is only available in the 2D image space under perspective distortion. We have two options: either we use the geometry recovered as described earlier to determine the reflections in 3D and then render images from the viewpoint of the original camera, or, if possible, we avoid backprojection and work directly in the 2D image space. The former is rather hard and cannot provide convincing results unless the geometry and the projections are almost completely error-free.

To obtain better and more convincing results, we show that it is possible to directly obtain the correct reflection geometry under the perspective distortion of the camera. The key idea is that the reflection is governed by lines perpendicular to the reflecting surface. Therefore, points on the object and on its reflection lie on a single line perpendicular to the reflecting plane, and the reflected object appears at the same distance behind the plane as the actual object is in front of it. As a result, in 3D space a point t_i and its reflection r_i have the same perpendicular distance from the reflecting plane. However, in a perspectively distorted image, the object and its reflection will have identical heights only if the direction of view is nearly parallel to the reflecting surface. In the example of Figure 8.17 (a), the reflections of the standing persons are clearly viewed at a downward angle, which makes them appear foreshortened in comparison to the actual persons.

The key idea that allows us to solve the geometry of the reflection directly in the image plane is the fact that the following cross ratio is preserved under camera perspective projection:

\{v_z, t_i; b_i, r_i\} = \frac{(v_z - t_i)(b_i - r_i)}{(v_z - r_i)(t_i - b_i)} = 1, \qquad (8.7)

where v_z, as before, is the vertical vanishing point, and b_i is the intersection of the line t_i r_i with the reflecting surface.

Note that the line segments t_i b_i and b_i r_i in the image plane have the same length only when v_z goes to infinity, i.e. when the direction of view is parallel to the reflecting plane. The direct image-domain solution that we propose is therefore straightforward and is as follows. We first compute the vertical vanishing point by identifying two pairs (t_i, r_i), or three points t_i, b_i and r_i.

Figure 8.17: Two real scenes containing reflection effects.

Then we enforce the cross-ratio property to find the correct position of the reflection of each inserted object.
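To make this construction concrete, the sketch below solves Eq. (8.7) for the image position r_i of the reflection, given v_z, t_i and the footprint b_i, by working with signed 1D coordinates along the vertical line through v_z and t_i. The function name and the assumption that b_i lies (approximately) on that line are ours.

import numpy as np

def reflection_point(vz, ti, bi):
    # Image position r_i of the reflection of t_i, obtained by enforcing the
    # cross ratio {v_z, t_i; b_i, r_i} = 1 of Eq. (8.7).  vz, ti, bi are 2D
    # image points; bi is assumed to lie (approximately) on the line vz-ti.
    vz, ti, bi = (np.asarray(p, dtype=np.float64) for p in (vz, ti, bi))
    d = ti - vz
    d /= np.linalg.norm(d)                 # unit direction of the vertical line
    # signed 1D coordinates along that line, with vz at the origin
    v, t, b = 0.0, float(np.dot(ti - vz, d)), float(np.dot(bi - vz, d))
    # (v - t)(b - r) = (v - r)(t - b)  =>  r = (b(v - t) - v(t - b)) / (v - 2t + b)
    r = (b * (v - t) - v * (t - b)) / (v - 2.0 * t + b)
    return vz + r * d                      # back to 2D image coordinates

As a sanity check, when v_z is moved far away (the affine case) the formula reduces to r = 2b − t, i.e. the reflection appears exactly as far below the footprint as the object point is above it.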

While the geometric constraints help us to put reflections at the correct positions, similar to shadows we also need to match the color characteristics of the reflections to those of the real scene. The light reflected off the object surface can be approximated as a linear combination of two reflection components: a diffuse component, I_d, and a specular component, I_s. Assuming that the camera is relatively far away compared to the size of the objects being reflected, the differences between the light directions to different positions on the object are small, and hence we can use a constant Fresnel factor, \lambda_F, for blending between the diffuse I_d and specular I_s components, similar to [68]:

I_o(x, y) = \lambda_F \, I_s(x, y) + (1 - \lambda_F)\, I_d(x, y), \qquad (8.8)

where I_o is the radiance received by the camera.
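A minimal sketch of this blending step is given below, under the assumption that the per-channel Fresnel factors have already been estimated from the existing reflection regions (as described below) and that the foreshortened reflection layer I_s has already been warped into place.

import numpy as np

def blend_reflection(I_d, I_s, lambda_f, mask):
    # Composite a reflection following Eq. (8.8): I_o = lambda_F * I_s +
    # (1 - lambda_F) * I_d, applied only inside the reflection footprint `mask`.
    # I_d and I_s are float RGB images in [0, 1]; lambda_f holds one Fresnel
    # factor per color channel.
    lam = np.asarray(lambda_f, dtype=np.float64).reshape(1, 1, -1)
    I_o = np.clip(lam * I_s + (1.0 - lam) * I_d, 0.0, 1.0)
    out = I_d.copy()
    out[mask] = I_o[mask]
    return out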


Figure 8.18: Reflection synthesis of the source and target image in Figure 8.11. (a): A zoomed view of a composite with only shadow synthesis. The result (a) is noticeably fake. (b): A composite with geometrically correct and visually realistic reflections obtained by the proposed method.

The Fresnel factor can be estimated in a manner similar to the shadow opacity factor, by building the histogram of the existing reflection regions for each color channel, and is then used for reflection synthesis.

To illustrate the proposed method, we applied it to different cases. In the example shown in Figure 8.18, we insert a parking sign post into a target image of a beach scene. First, we use the method described in Section 8.2 to create shadows for the planar parking sign post, as shown in Figure 8.18 (a). However, in the target scene we also need to synthesize the reflection effects to increase the realism. Figure 8.18 (b) shows the composite with correct and realistic shadows and reflections.

Figure 8.19: Synthesizing the reflections. (a) is without reflection and (b) is with reflection.

The target scene in Figure 8.19, unlike the sparse reflection in Figure 8.18, introduces challenges both in estimating the Fresnel terms and in dealing with the distortion of the water surface. Our method synthesized promising reflections of the inserted Puss in Boots (a character in the movie Shrek 2), as shown in (b) and zoomed in (c). In the result (d), which uses only α blending, is the cat touching the water surface or floating in the air?

8.4 Discussions

In this chapter, we presented methods to synthesize shadows and reflections for objects composited into target views. These methods are especially useful when the target scene is not accessible, e.g., a given image or a video clip from a commercial movie. We exploit both geometric and photometric constraints, and utilize them for correct and realistic shadow and reflection synthesis.

The experimental results demonstrate that these methods are efficient and can be applied to a variety of target scenes.

8.4.1 Shadow Synthesis

Compared to the shadow matting methods [30], our method is able to handle cases where one aims to transfer objects into inaccessible scenes, and it requires only a single view. In addition, our framework has advantages over the traditional “faux shadow” [187] in that it increases the geometric accuracy and visual realism of the synthesized shadows. As a pragmatic and flexible framework, it is also simple and easy to implement.

Our shadow synthesis method has a number of restrictions. First, we approximate the cast shadow texture using the concept of intrinsic images and assume one dominant, point-like light source, which does not model potentially complex effects arising from inter-reflections. A relatively more general shadow texture model that considers penumbra effects [88, 123] can easily be applied within our framework. Second, we assume the target background to be planar. Although there exist solutions using shadow scans [30] or weak structured lighting [18], all of those methods require additional hardware and nontrivial human interaction. Third, we did not reproduce the noise present in natural shadows in our artificially created ones. This noise can simply be modeled as Gaussian noise to obtain more realistic visual results.

Despite these restrictions, we believe that in many settings the shadows created by our approach are more plausible than a manually generated “faux shadow”. Traditional shadow generation methods typically share our requirement for a single matched key light source and have difficulties modeling complex effects arising from inter-reflections. However, they cannot synthesize a geometrically correct shadow silhouette when the view from the lighting direction is too far from the camera’s viewpoint.

8.4.2 Reflection Synthesis

Synthesizing reflections for objects transferred from one natural scene into a novel scene with a different lighting condition remains a hard problem. In this chapter, we focused on a slightly easier situation, where the target scene contains some upright vertical objects. We showed that this assumption leads to a novel, simple algorithm for reflection synthesis, and demonstrated encouraging results for different target scenes. Since our approach is purely image-based, we cannot expect the level of realism that would be achievable with known geometry of the light, the object to be pasted, and the background objects. However, the proposed method advances image-based techniques one step further toward improving realism in matting and compositing applications.

There are a number of ways in which our current approach can be extended. First, we would like to relax the vertical-object constraint on the target scene. Second, we currently use tools in Photoshop to generate the ripple effects, and we would like to reduce this manual interaction in future work. Finally, we would also like to utilize richer models for learning the illumination conditions from the target scene and applying them to the composited objects.

CHAPTER 9

CONCLUSION

In this dissertation, we successfully showed that geometrically correct and visually realistic video post-production is possible even across videos captured by unknown moving cameras.

We proposed two novel constraints for camera calibration: first, the inter-frame constraints extend the current state-of-the-art to situations where only one vanishing point is available. Second, the constraint making use of constant motion is especially useful in applications such as self-calibration from turn-table sequences and calibration of a camera network where the cameras are arranged at equal angular positions on a circle. We also developed a new framework for video analysis and post-production among videos captured by cameras undergoing distinct general or special motions. This framework enforces both geometric correctness and photometric consistency.

This work addresses the following problems in different areas: camera calibration, camera motion analysis, single view geometry, shadow synthesis, video alignment, video segmentation, alpha matting and image-based rendering. However, all of these problems are closely related and

demonstrate the important fact that video post-production techniques will be useful and attractive if we successfully resolve the related vision problems.

LIST OF REFERENCES

[1] M. Antone and M. Bosse. “Calibration of outdoor cameras from cast shadows.” In Proc. IEEE Int. Conf. Systems, Man and Cybernetics, pp. 3040–3045, 2004.

[2] M. Agrawal and L. Davis. “Complete camera calibration using spheres: Dual space approach.” In Proc. IEEE ICCV, pp. 782–789, 2003.

[3] N. E. Apostoloff and A.W. Fitzgibbon. “Bayesian video matting using learnt image priors.” In Proc. IEEE CVPR, pp. 407–414, 2004.

[4] L. De Agapito, E. Hayman, and I. Reid. “Self-calibration of rotating and zooming cameras.” Int. J. Comput. Vision, 45(2):107–127, 2001.

[5] S. Ayer and H. Sawhney. “Layered Representation of Motion Video Using Robust Maximum-Likelihood Estimation of Mixture Models and MDL Encoding.” In Proc. IEEE ICCV, 1995.

[6] A. Agarwala, D. Salesin, and S. Seitz. “Keyframe-based tracking for rotoscoping and animation.” ACM Trans. Graph., 23(3):584–591, 2004.

[7] A. Francois, G. Medioni, and R. Waupotitsch. “Reconstructing mirror symmetric scenes from a single view using 2-view stereo geometry.” In Proc. ICPR, pp. 12–16, 2002.

[8] M. Armstrong, A. Zisserman, and R.I. Hartley. “Self-Calibration from Image Triplets.” In Proc. ECCV, pp. 3–16, 1996.

[9] C. Buehler, M. Bosse, L. McMillan, S. Gortler, and M. Cohen. “Unstructured Lumigraph Rendering.” In Proc. ACM SIGGRAPH, pp. 425–432, 2001.

[10] M. Bertalmio, A. Bertozzi, and G. Sapiro. “Navier-stokes, fluid dynamics, and image and video inpainting.” In Proc. IEEE CVPR, pp. 355–362, 2001.

[11] M. Ben-Ezra. “Segmentation with invisible keying signal.” In Proc. IEEE CVPR, pp. 32–37, 2000.

[12] S. Boivin and A. Gagalowicz. “Image-based rendering of diffuse, specular and glossy surfaces from a single image.” In Proc. ACM SIGGRAPH, pp. 107–116, 2001.

[13] P. Bouthemy, M. Gelgon, and F. Ganansia. “A unified approach to shot change detection and camera motion characterization.” IEEE Trans. Circuits Syst. Video Technol., 9(7):1030–1044, 1999.

[14] A. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.

[15] L. Bergen and F. Meyer. “A Novel Approach to Depth Ordering in Monocular Image Sequences.” In Proc. IEEE CVPR, pp. 536–541, 2000.

[16] S. Bougnoux. “From Projective to Euclidean Space Under any Practical Situation, a Criticism of Self-Calibration.” In Proc. IEEE ICCV, pp. 790–796, 1998.

[17] E. Boyer. “Object Models From Contour Sequences.” In Proc. ECCV, pp. 109–118, 1996.

[18] J. Bouguet and P. Perona. “3D Photography on Your Desk.” In Proc. IEEE ICCV, pp. 43–50. IEEE Computer Society, 1998.

[19] R. Brinkmann. The Art and Science of Digital Compositing. Morgan Kaufmann, 1999.

[20] A. Bryson. Dynamic Optimization. Addison Wesley, 1999.

[21] H.G. Barrow and J.M. Tenenbaum. Recovering intrinsic scene characteristics from images. Academic Press, 1978.

[22] Y. Chuang, A. Agarwala, B. Curless, D. Salesin, and R. Szeliski. “Video Matting of Complex Scenes.” ACM Trans. Graph., 21(3):243–248, 2002.

[23] C. Colombo, A.D. Bimbo, and F. Pernici. “Metric 3D Reconstruction and Texture Acquisition of Surfaces of Revolution from a Single Uncalibrated View.” IEEE Trans. Pattern Anal. Mach. Intell., 27(1):99–114, 2005.

[24] Y. Chuang, B. Curless, D. Salesin, and R. Szeliski. “A Bayesian Approach to Digital Matting.” In Proc. IEEE CVPR, volume 2, pp. 264–271, 2001.

[25] R. Cipolla, T. Drummond, and D. Robertson. “Camera calibration from vanishing points in images of architectural scenes.” In Proc. of BMVC, pp. 382–391, 1999.

[26] X. Cao and H. Foroosh. “Camera Calibration Without Metric Information Using 1D Objects.” In Proc. IEEE ICIP, pp. 1349–1352, 2004.

[27] X. Cao and H. Foroosh. “Easy Camera Calibration From Inter-Image Homographies.” In Proc. of IEEE International Workshop on Image and Video Registration in CVPR, 2004.

[28] X. Cao and H. Foroosh. “Metrology from Vertical Objects.” In Proc. of BMVC, 2004.

[29] X. Cao and H. Foroosh. “Simple Calibration Without Metric Information Using an Isosceles Trapezoid.” In Proc. ICPR, pp. 104–107, 2004.

[30] Y. Chuang, D. Goldman, B. Curless, D. Salesin, and R. Szeliski. “Shadow Matting and Compositing.” ACM Trans. Graph., 22(3):494–500, 2003.

[31] Y. Chuang. New Models and Methods for Matting and Compositing. PhD thesis, University of Washington, 2004.

[32] Y. Caspi and M. Irani. “A step towards sequence to sequence alignment.” In Proc. IEEE CVPR, pp. 682–689, 2000.

[33] Y. Caspi and M. Irani. “Alignment of Non-Overlapping Sequences.” In Proc. IEEE ICCV, pp. 76–83, 2001.

[34] R. Carceroni, F. Padua, G. Santos, and K. Kutulakos. “Linear Sequence-to-Sequence Alignment.” In Proc. IEEE CVPR, pp. 746–753, 2004.

[35] A. Criminisi, P. Perez, and K. Toyama. “Region filling and object removal by exemplar-based inpainting.” IEEE Trans. Image Process, 13(9):1200–1212, 2004.

[36] S. Chaudhuri and A. N. Rajagopalan. Depth from Defocus: A Real Aperture Imaging Approach. Springer-Verlag, 1999.

[37] A. Criminisi. Accurate Visual Metrology From Single and Multiple Uncalibrated Images. Springer-Verlag, 2001.

[38] F. C. Crow. “Shadow algorithms for computer graphics.” In Proc. of SIGGRAPH, pp. 242–248, 1977.

[39] A. Criminisi, I. Reid, and A. Zisserman. “Single View Metrology.” Int. J. Comput. Vision, 40(2):123–148, 2000.

[40] X. Cao and M. Shah. “Camera Calibration and Light Source Estimation from Images with Shadows.” In Proc. IEEE CVPR, pp. 918–923, 2005.

[41] X. Cao and M. Shah. “Creating Realistic Shadows of Composited Objects.” In Proc. WACV, pp. 294–299, 2005.

[42] Y. Caspi, D. Simakov, and M. Irani. “Feature-based sequence-to-sequence matching.” In Proc. VAMODS workshop with ECCV, 2002.

[43] X. Cao, Y. Shen, M. Shah, and H. Foroosh. “Single View Compositing with Shadows.” The Visual Computer, 21(8):639–648, 2005.

[44] B. Caprile and V. Torre. “Using Vanishing Points for Camera Calibration.” Int. J. Comput. Vision, 4(2):127–140, 1990.

[45] Q. Chen, H. Wu, and T. Wada. “Camera Calibration with Two Arbitrary Coplanar Circles.” In Proc. ECCV, pp. 521–532, 2004.

[46] P. Debevec, A. Wenger, C. Tchou, A. Gardner, J. Waese, and T. Hawkins. “A lighting reproduction approach to live-action compositing.” ACM Trans. Graph., 21(3):547–556, 2002.

[47] L. Duan, M. Xu, Q. Tian, and C. Xu. “Nonparametric motion model with applications to camera motion pattern classification.” In Proc. ACM MULTIMEDIA, pp. 328–331, 2004.

[48] O. Faugeras. “What can be seen in three dimensions with an uncalibrated stereo rig?” In Proc. ECCV, pp. 563–578, 1992.

[49] O. Faugeras. Computer Vision: a Geometric Viewpoint. MIT Press, 1993.

[50] H. Foroosh, M. Balci, and X. Cao. “Self-Calibrated Reconstruction of Partially Viewed Symmetric Objects.” In Proc. IEEE ICASSP, pp. 869–872, 2005.

[51] H. Foroosh, X. Cao, and M. Balci. “Metrology in Uncalibrated Images Given One Vanishing Point.” In Proc. IEEE ICIP, pp. 361–364, 2005.

[52] A. W. Fitzgibbon, G. Cross, and A. Zisserman. “Automatic 3D Model Construction for Turn-Table Sequences.” In SMILE Wkshp., pp. 155–170, 1998.

[53] G. Finlayson, M. Drew, and C. Lu. “Intrinsic Images by Entropy Minimization.” In Proc. ECCV, pp. 582–595, 2004.

[54] J.M. Frahm and R. Koch. “Camera Calibration with Known Rotation.” In Proc. IEEE ICCV, pp. 1418–1425, 2003.

[55] O. Faugeras and Q.T. Luong. The Geometry of Multiple Images. MIT Press, 2001.

[56] O. Faugeras, T. Luong, and S. Maybank. “Camera self-calibration: theory and experiments.” In Proc. of ECCV, pp. 321–334, 1992.

[57] A. Fitzgibbon, Y. Wexler, and A. Zisserman. “Image-Based Rendering Using Image-Based Priors.” In Proc. IEEE ICCV, 2003.

[58] P. Gurdjos, A. Crouzil, and R. Payrissat. “Another Way of Looking at Plane-Based Calibration: the Centre Circle Constraint.” In Proc. ECCV, pp. 252–266, 2002.

[59] P. Giaccone and G. Jones. “Segmentation of Global Motion using Temporal Probabilistic Classification.” In Proc. of BMVC, 1998.

[60] M. A. Giese and T. Poggio. “Synthesis and Recognition of Biological Motion Patterns Based on Linear Superposition of Prototypical Motion Sequences.” In Proc. IEEE Workshop on Multi-View Modeling & Analysis of Visual Scenes, p. 73, 1999.

[61] P. Gurdjos and P. Sturm. “Methods and Geometry for Plane-Based Self-Calibration.” In Proc. IEEE CVPR, pp. 491–496, 2003.

[62] A. Heyden and K. Astrom. “Euclidean reconstruction from image sequences with varying and unknown focal length and principal point.” In Proc. IEEE CVPR, pp. 438–443, 1997.

[63] A. Heyden and K. Astrom. “Flexible Calibration: Minimal Cases for Auto-Calibration.” In Proc. IEEE ICCV, pp. 350–355, 1999.

[64] http://www.robots.ox.ac.uk/˜az/HZbook/HZfigures.html.

[65] R. I. Hartley. “Estimation of Relative Camera Positions for Uncalibrated Cameras.” In Proc. ECCV, pp. 579–587, 1992.

[66] R. I. Hartley. “Self-Calibration of Stationary Cameras.” Int. J. Comput. Vision, 22(1):5–23, 1997.

[67] R.I. Hartley. “Theory and practice of projective rectification.” Int. J. Comput. Vision, 35(2):115–127, 1999.

[68] W. Heidrich. High-quality Shading and Lighting for Hardware-accelerated Rendering. PhD thesis, Univ. of Erlangen, 1999.

[69] J. Heikkila. “Geometric camera calibration using circular control points.” IEEE Trans. Pattern Anal. Mach. Intell., 22(10):1066–1077, 2000.

[70] T. S. Huang and O. Faugeras. “Some properties of the E matrix in two-view motion estimation.” IEEE Trans. Pattern Anal. Mach. Intell., 11(12):1310–1312, 1989.

[71] T. Horprasert, D. Harwood, and L.S. Davis. “A Statistical Approach for Real-Time Robust Background Subtraction and Shadow Detection.” In Proc. IEEE ICCV Workshop, pp. 1–19, 1999.

[72] P. Hillman, J. Hannah, and D. Renshaw. “Alpha Channel Estimation in High Resolution Images and Image Sequences.” In Proc. IEEE CVPR, pp. 1063–1068, 2001.

[73] B. K. P. Horn. Robot Vision. McGraw-Hill, 1986.

[74] R. I. Hartley and P. Sturm. “Triangulation.” In Proc. CAIP, pp. 190–197, 1995.

[75] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2004.

[76] Y.T. Jia, S.M. Hu, and R.R. Martin. “Video completion using tracking and fragment merging.” The Visual Computer, 21(8):601–610, 2005.

[77] R. Jin, A. Hauptmann, and Y. Qi. “A Probabilistic Model for Camera Zoom Motion Detection.” In Proc. ICPR, pp. 859–862, 2002.

[78] B. Johansson. Computer Vision Using Rich Features - Geometry and Systems. PhD thesis, Dept. of Mathematics, Lund University, 2002.

[79] G. Jiang, L. Quan, and H. T. Tsui. “Circular Motion Geometry Using Minimal Data.” IEEE Trans. Pattern Anal. Mach. Intell., 26(6):721–731, 2004.

[80] O. Javed and M. Shah. “Tracking and Object Classification for Automated Surveillance.” In Proc. ECCV, pp. 343–357, 2002.

[81] G. Jiang, H. T. Tsui, L. Quan, and A. Zisserman. “Single Axis Geometry by Fitting Conics.” IEEE Trans. Pattern Anal. Mach. Intell., 25(10):1343–1348, 2003.

[82] J. Jia, T. Wu, Y. Tai, and C. Tang. “Video repairing: inference of foreground and background under severe occlusion.” In Proc. IEEE CVPR, pp. 364–371, 2004.

[83] D. Kriegman and P. Belhumeur. “What Shadows Reveal about Object Structure.” In Proc. ECCV, pp. 399–414, 1998.

[84] A. Klaus, J. Bauer, K. Karner, P. Elbischger, R. Perko, and H. Bischof. “Camera Calibration from a Single Night Sky Image.” In Proc. IEEE CVPR, pp. 151–157, 2004.

[85] J. J. Koenderink and A. J. Van Doorn. “Photometric invariants related to solid shape.” Optica Acta, 27(7):981–996, 1980.

[86] Q. Ke and T. Kanade. “A Subspace Approach to Layer Extraction.” In Proc. IEEE CVPR, 2001.

[87] Q. Ke and T. Kanade. “A Robust Subspace Approach to Layer Extraction.” In Proc. IEEE Motion, 2002.

[88] B. Keating and N. Max. “Shadow Penumbras for Complex Objects by Depth-Dependent Filtering of Multi-Layer Depth Images.” In Proc. Eurographics Workshop on Rendering, pp. 205–220. Springer-Verlag, 1999.

[89] N. Krahnstoever and P. R. S. Mendonça. “Bayesian Autocalibration for Surveillance.” In Proc. IEEE ICCV, pp. 1858–1865, 2005.

[90] D. Kersten, P. Mamassian, and D. C. Knill. “Moving cast shadows induce apparent motion in depth.” Perception, 26(2):171–192, 1997.

[91] K.N. Kutulakos and S.M. Seitz. “A Theory of Shape by Space Carving.” In Proc. IEEE CVPR, pp. 307–314, 1999.

[92] S. Khan and M. Shah. “Object Based Segmentation of Video Using Color, Motion and Spatial Information.” In Proc. IEEE CVPR, pp. 746–751, 2001.

[93] F. Kahl, B. Triggs, and K. Åström. “Critical Motions for Auto-Calibration When Some Intrinsic Parameters Can Vary.” J. Math. Imaging Vis., 13(2):131–146, 2000.

[94] S. Kang and R. S. Weiss. “Can We Calibrate a Camera Using an Image of a Flat, Textureless Lambertian Surface?” In Proc. ECCV, pp. 640–653, 2000.

[95] V. Kolmogorov and R. Zabih. “Multi-camera Scene Reconstruction via Graph Cut.” In Proc. ECCV, 2002.

[96] I. Laptev, S. Belongie, P. Perez, and J. Wills. “Periodic Motion Detection and Segmentation via Approximate Sequence Alignment.” In Proc. IEEE ICCV, pp. 816–823, 2005.

[97] M. Levoy and P. Hanrahan. “Light Field Rendering.” In Proc. ACM SIGGRAPH, 1996.

[98] C. Lin, A. Huertas, and R. Nevatia. “Detection of Buildings Using Perceptual Groupings and Shadows.” pp. 62–69, 1994.

[99] Y. Li, S. Lin, H. Lu, and H. Shum. “Multiple-cue Illumination Estimation in Textured Scenes.” In Proc. IEEE ICCV, pp. 1366–1373, 2003.

[100] D. Lowe. “Distinctive Image Features from Scale-Invariant Keypoints.” Int. J. Comput. Vision, 60(2):91–110, 2004.

[101] F. Liu and R. Picard. “Finding periodicity in space and time.” In Proc. IEEE ICCV, pp. 376–383, 1998.

[102] Y. Li, J. Sun, and H. Shum. “Video Object Cut and Paste.” ACM Trans. Graph., 24(3):595–600, 2005.

[103] N. Levi and M. Werman. “The viewing graph.” In Proc. IEEE CVPR, pp. 518–524, 2003.

[104] D. Liebowitz and A. Zisserman. “Metric Rectification for Perspective Images of Planes.” In Proc. IEEE CVPR, pp. 482–488, 1998.

[105] D. Liebowitz and A. Zisserman. “Combining Scene and Auto-Calibration Constraints.” In Proc. IEEE ICCV, pp. 293–300, 1999.

[106] F. Lv, T. Zhao, and R. Nevatia. “Self-Calibration of a Camera from Video of a Walking Human.” In Proc. ICPR, pp. 562–567, 2002.

[107] E. Malis and R. Cipolla. “Camera self-calibration from unknown planar structures enforcing the multi-view constraints between collineations.” IEEE Trans. Pattern Anal. Mach. Intell., 24(9):1268–1272, 2002.

[108] I. Mikic, P. Cosman, G. Kogut, and M. Trivedi. “Moving Shadow and Object Detection in Traffic Scenes.” In Proc. ICPR, volume 1, pp. 321–324, 2000.

[109] A. Mittal and L. Davis. “M2Tracker: A Multi-view Approach to Segmenting and Tracking People in a Cluttered Scene Using Region-Based Stereo.” In Proc. ECCV, pp. 18–36, 2002.

[110] P. R. S. Mendonça. Multiview Geometry: Profiles and Self-Calibration. PhD thesis, University of Cambridge, Cambridge, UK, May 2001.

[111] X. Meng and Z. Hu. “A new easy camera calibration technique based on circular points.” Pattern Recognition, 36(5):1155–1164, 2003.

[112] Y. Mishima. “Soft edge chroma-key generation based upon hexoctahedral color space.” U.S. Patent 5,355,174, 1993.

[113] G. F. McLean and D. Kotturi. “Vanishing point detection by line clustering.” IEEE Trans. Pattern Anal. Mach. Intell., 17(11):1090–1095, 1995.

[114] J. More. The Levenberg-Marquardt Algorithm, Implementation, and Theory. Springer-Verlag, numerical analysis edition, 1977.

[115] W. Matusik, H. Pfister, A. Ngan, P. Beardsley, R. Ziegler, and L. McMillan. “Image-based 3D Photography using Opacity Hulls.” In Proc. ACM SIGGRAPH, pp. 427–437, 2002.

[116] J. Malik and R. Rosenholtz. “Computing local surface orientation and shape from texture for curved surfaces.” Int. J. Comput. Vision, 23(2):149–168, 1997.

[117] H. Mitsumoto, S. Tamura, K. Okazaki, N. Kajimi, and Y. Fukui. “3-D Reconstruction Using Mirror Images Based on a Plane Symmetry Recovering Method.” IEEE Trans. Pattern Anal. Mach. Intell., 14(9):941–946, 1992.

[118] P. R. S. Mendonça, K-Y. K. Wong, and R. Cipolla. “Camera pose estimation and reconstruction from image profiles under circular motion.” In Proc. ECCV, pp. 864–877, 2000.

[119] P. R. S. Mendonça, K-Y. K. Wong, and R. Cipolla. “Epipolar Geometry from Profiles under Circular Motion.” IEEE Trans. Pattern Anal. Mach. Intell., 23(6):604–616, 2001.

[120] T. Mitsunaga, T. Yokoyama, and T. Totsuka. “Autokey: Human assisted key extraction.” In Proc. ACM SIGGRAPH, pp. 265–272, 1995.

[121] S. Nadimi and B. Bhanu. “Physical Models for Moving Shadow and Object Detection in Video.” IEEE Trans. Pattern Anal. Mach. Intell., 26(8):1079–1087, 2004.

[122] C. Ngo. Analysis of Spatio-Temporal Slices for Video Content Representation. PhD thesis, Hong Kong University of Science & Technology, 2000.

[123] H. Nicolas. “Shadow Synthesis for Video Postproduction.” IEEE Signal Processing Letters, 12(4):321–324, 2005.

[124] W. Niem. “Robust and Fast Modelling of 3D Natural Objects from Multiple Views.” In Proc. SPIE, volume 2182, pp. 388–397, 1994.

[125] D. Nister. “An efficient solution to the five-point relative pose problem.” IEEE Trans. Pattern Anal. Mach. Intell., 26(6):756–770, 2004.

[126] M. Oren and S. K. Nayar. “A theory of specular surface geometry.” Int. J. Comput. Vision, 24(2):105–124, 1996.

[127] T. Okabe, I. Sato, and Y. Sato. “Spherical Harmonics vs. Haar Wavelets: Basis for Recovering Illumination from Cast Shadows.” In Proc. IEEE CVPR, pp. 50–57, 2004.

[128] A. Ostler. “The Primal seas: water on PlayStation 2.” In Proc. ACM SIGGRAPH on Sketches & applications, pp. 1–1, 2003.

[129] T. Porter and T. Duff. “Compositing digital images.” In Proc. ACM SIGGRAPH, pp. 253–259, 1984.

[130] A. P. Pentland. “A new sense for depth of field.” IEEE Trans. Pattern Anal. Mach. Intell., 9(4):523–531, 1987.

[131] W. Press, B. Flannery, S. Teukolsky, and W. Vetterling. Numerical Recipes in C. Cambridge University Press, 1988.

[132] L. Petrovic, B. Fujito, L. Williams, and A. Finkelstein. “Shadows for Cel Animation.” In Proc. ACM SIGGRAPH, pp. 511–516, 2000.

[133] P. Perez, M. Gangnet, and A. Blake. “Poisson Image Editing.” In Proc. ACM SIGGRAPH, pp. 313–318, 2003.

[134] M. Pollefeys, R. Koch, and L. V. Gool. “Self-Calibration and Metric Reconstruction in Spite of Varying and Unknown Internal Camera Parameters.” Int. J. Comput. Vision, 32(1):7–25, 1999.

[135] A. Prati, I. Mikic, M. Trivedi, and R. Cucchiara. “Detecting Moving Shadows: Algorithms and Evaluation.” IEEE Trans. Pattern Anal. Mach. Intell., 25(7):918–923, 2003.

[136] N. V. Patel and I. K. Sethi. “Video shot detection and characterization for video databases.” Pattern Recognition, 30(4):583–592, 1997.

[137] K. Patwardhan, G. Sapiro, and M. Bertalmio. “Video inpainting of occluding and occluded objects.” In Proc. IEEE ICIP, 2005.

[138] L. Quan and O.D. Faugeras. “The fundamental matrix: Theory, algorithms, and stability analysis.” Int. J. Comput. Vision, 17(1):43–75, 1996.

[139] L. Quan and O.D. Faugeras. “Self-Calibration of a Moving Camera from Point Correspondences and Fundamental Matrices.” Int. J. Comput. Vision, 22(3):261–289, 1997.

[140] C. Rao, A. Gritai, M. Shah, and T. Syeda-Mahmood. “View-invariant Alignment and Matching of Video Sequences.” In Proc. IEEE ICCV, pp. 939–945, 2003.

[141] C. Rother, V. Kolmogorov, and A. Blake. “‘GrabCut’: interactive foreground extraction using iterated graph cuts.” ACM Trans. Graph., 23(3):309–314, 2004.

[142] I. Reid and A. North. “3D Trajectories from a Single Viewpoint using Shadows.” In Proc. of BMVC, 1998.

[143] M. Ruzon and C. Tomasi. “Alpha Estimation in Natural Images.” In Proc. IEEE CVPR, pp. 18–25, 2000.

[144] A. R. Smith and J. F. Blinn. “Blue screen matting.” In Proc. ACM SIGGRAPH, pp. 259–268, 1996.

[145] P. Smith, T. Drummond, and R. Cipolla. “Layered Motion Segmentation and Depth Ordering by Tracking Edges.” IEEE Trans. Pattern Anal. Mach. Intell., 26(4):479–494, 2004.

[146] J. Shade, S. Gortler, L. He, and R. Szeliski. “Layered Depth Images.” In Proc. ACM SIGGRAPH, pp. 231–242, 1998.

[147] R. Swaminathan, M.D. Grossberg, and S.K. Nayar. “A Perspective on Distortions.” In Proc. IEEE CVPR, pp. 594–601, 2003.

[148] Y. Seo and K. Hong. “About the Self-Calibration of a Rotating and Zooming Camera: Theory and Practice.” In Proc. IEEE ICCV, pp. 183–189, 1999.

[149] J. A. Shufelt. “Performance Evaluation and Analysis of Vanishing Point Detection Techniques.” IEEE Trans. Pattern Anal. Mach. Intell., 21(3):282–288, 1999.

[150] J. Sun, J. Jia, C.K. Tang, and H.Y. Shum. “Poisson matting.” ACM Trans. Graph., 23(3):315–321, 2004.

[151] G. Sudhir and J. C. M. Lee. “Video annotation by motion interpretation using optical flow streams.” J. Visual Commun. Image Representation, 4:354–368, 1996.

[152] J. Shi and J. Malik. “Motion Segmentation and Tracking Using Normalized Cuts.” In Proc. IEEE ICCV, 1998.

[153] P. Sturm and S. Maybank. “On Plane-Based Camera Calibration: A General Algorithm, Singularities, Applications.” In Proc. IEEE CVPR, pp. 432–437, 1999.

[154] J. Stauder, R. Mech, and J. Ostermann. “Detection of Moving Cast Shadows for Object Segmentation.” IEEE Trans. Multimedia, 1(1):65–76, 1999.

[155] S. Sullivan and J. Ponce. “Automatic Model Construction, Pose Estimation, and Object Recognition from Photographs Using Triangular Splines.” IEEE Trans. Pattern Anal. Mach. Intell., 20(10):1091–1097, 1998.

[156] S. Sinha, M. Pollefeys, and L. McMillan. “Camera Network Calibration from Dynamic Silhouettes.” In Proc. IEEE CVPR, pp. 195–202, 2005.

[157] C. E. Springer. Geometry and Analysis of Projective Spaces. Freeman, 1964.

[158] D. Scharstein and R. Szeliski. “A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms.” Int. J. Comput. Vision, 47(1), 2002.

[159] I. Sato, Y. Sato, and K. Ikeuchi. “Stability issues in recovering illumination distribution from brightness in shadows.” In Proc. IEEE CVPR, pp. 400–407, 2001.

[160] I. Sato, Y. Sato, and K. Ikeuchi. “Illumination from shadows.” IEEE Trans. Pattern Anal. Mach. Intell., 25(3):290–300, 2003.

[161] P. Sand and S. Teller. “Video matching.” ACM Trans. Graph., 23(3):592–599, 2004.

[162] G.P. Stein. “Tracking from multiple view points: Self-calibration of space and time.” In DARPA IU Workshop, pp. 521–527, 1998.

[163] P. Sturm. “Critical Motion Sequences for Monocular Self-Calibration and Uncalibrated Euclidean Reconstruction.” In Proc. IEEE CVPR, pp. 1100–1105, 1997.

[164] R. Szeliski. “Shape from Rotation.” In Proc. IEEE CVPR, pp. 625–630, 1991.

[165] M. F. Tappen, W. T. Freeman, and E. H. Adelson. “Recovering Intrinsic Images from a Single Image.” In Advances in Neural Information Processing Systems, 2002.

[166] T. Tuytelaars and L. Van Gool. “Synchronizing Video Sequences.” In Proc. IEEE CVPR, volume 1, pp. 762–768, 2004.

[167] P. Torr and D. Murray. “Outlier Detection and Motion Segmentation.” In Proc. SPIE Sensor Fusion Conference V, pp. 432–443, 1993.

[168] B. Triggs, P. McLauchlan, R. I. Hartley, and A. Fitzgibbon. “Bundle Adjustment — A Modern Synthesis.” In Vision Algorithms: Theory and Practice, pp. 298–373, 1999.

[169] P. H. S. Torr. “Bayesian Model Estimation and Selection for Epipolar Geometry and Generic Manifold Fitting.” Int. J. Comput. Vision, 50(1):35–61, 2002.

[170] P. Tresadern and I. Reid. “Synchronizing Image Sequences of Non-Rigid Objects.” In Proc. BMVC, pp. 629–638, 2003.

[171] H. Trivedi. “Can multiple views make up for lack of camera registration?” Image Vision Comput., 6(1):29–32, 1988.

[172] B. Triggs. “Autocalibration and the Absolute Quadric.” In Proc. IEEE CVPR, pp. 609–614, 1997.

[173] B. Triggs. “Autocalibration from planar scenes.” In Proc. ECCV, pp. 89–105, 1998.

[174] R.Y. Tsai. “A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf tv cameras and lenses.” IEEE J. of Robotics and Automation, 3(4):323–344, 1987.

[175] H. Tao, H. Sawhney, and R. Kumar. “Object Tracking with Bayesian Estimation of Dynamic Layer Representations.” IEEE Trans. Pattern Anal. Mach. Intell., 24(1):75–89, 2002.

[176] H. Teramoto and G. Xu. “Camera Calibration by a Single Image of Balls: From Conics to the Absolute Conic.” In Proc. ACCV, pp. 499–506, 2002.

[177] S. Ullman. The interpretation of visual motion. MIT Press, Cambridge, USA, 1979.

[178] R. Vidal and Y. Ma. “A Unified Algebraic Approach to 2-D and 3-D Motion Segmentation.” In Proc. ECCV, 2004.

[179] R. Vidal, Y. Ma, and J. Piazzi. “A New GPCA Algorithm for Clustering Subspaces by Fitting, Differentiating and Dividing Polynomials.” In Proc. IEEE CVPR, pp. 510–517, 2004.

[180] L. Van Gool, M. Proesmans, and A. Zisserman. “Planar homologies as a basis for grouping and recognition.” Image and Vision Computing, 16:21–26, January 1998.

[181] J. Wang and E. Adelson. “Representing Moving Images with Layers.” IEEE Trans. on Image Processing, 3(5):625–638, 1994.

[182] J. Wills, S. Agarwal, and S. Belongie. “What Went Where.” In Proc. IEEE CVPR, 2003.

[183] Y. Weiss. “Deriving intrinsic images from image sequences.” In Proc. IEEE ICCV, pp. 68–75, 2001.

[184] F.C. Wu, Z.Y. Hu, and H.J. Zhu. “Camera calibration with moving one-dimensional objects.” Pattern Recognition, 38(5):755–765, 2005.

[185] A. Whitehead, R. Laganiere, and P. Bose. “Temporal Synchronization of Video Sequences in Theory and in Practice.” In Proc. IEEE WACV/MOTION, pp. 132–137, 2005.

[186] K.-Y. Wong, P. R. S. Mendonça, and R. Cipolla. “Camera Calibration from Surfaces of Revolution.” IEEE Trans. Pattern Anal. Mach. Intell., 25(2):147–161, 2003.

[187] S. Wright. Digital Compositing for Film and Video. Focal Press, 2001.

[188] Y. Wexler and D. Simakov. “Space-Time Scene Manifolds.” In Proc. IEEE ICCV, pp. 858–863, 2005.

[189] Y. Wexler, E. Shechtman, and M. Irani. “Space-time video completion.” In Proc. IEEE CVPR, pp. 120–127, 2004.

[190] L. Wolf and A. Zomet. “Sequence-to-Sequence Self Calibration.” In Proc. ECCV, pp. 370–382, 2002.

[191] J. Xiao, X. Cao, and H. Foroosh. “Object Transfer Between Non-Overlapping Videos.” In Proc. IEEE VR, 2006.

[192] J. Xiao and M. Shah. “Motion Layer Extraction in the Presence of Occlusion using Graph Cut.” In Proc. IEEE CVPR, pp. 972–979, 2004.

[193] J. Xiao and M. Shah. “Tri-view morphing.” Computer Vision and Image Understanding, 96(3), 2004.

[194] J. Xiao and M. Shah. “Accurate Motion Layer Segmentation and Matting.” In Proc. IEEE CVPR, pp. 698–703, 2005.

[195] J. Xiao and M. Shah. “Two-Frame Wide Baseline Matching.” In Proc. IEEE ICCV, pp. 603–609, 2003.

[196] A. Zisserman, P. Beardsley, and I. Reid. “Metric Calibration of a Stereo Rig.” In IEEE Wkshp on Representation of Visual Scenes, pp. 93–100, 1995.

[197] B. Zitova and J. Flusser. “Image registration methods: a survey.” Image and Vision Computing, 21:977–1000, 2003.

[198] Z. Zhang. “Iterative point matching for registration of free-form curves and surfaces.” Int. J. Comput. Vision, 13(2):119–152, 1994.

[199] Z. Zhang. “Determining the Epipolar Geometry and its Uncertainty: A Review.” Int. J. Comput. Vision, 27(2):161–195, 1998.

[200] Z. Zhang. “A Flexible New Technique for Camera Calibration.” IEEE Trans. Pattern Anal. Mach. Intell., 22(11):1330–1334, 2000.

[201] Z. Zhang. “Camera Calibration with One-Dimensional Objects.” In Proc. ECCV, pp. 161–174, 2002.

[202] Z. Zhang. “Camera Calibration with One-Dimensional Objects.” IEEE Trans. Pattern Anal. Mach. Intell., 26(7):892–899, 2004.

[203] L. Zelnik-Manor and M. Irani. “Multi View Subspace Constraints on Homographies.” In Proc. IEEE ICCV, pp. 710–715, 1999.

[204] H. Zhang, A. Kankanhalli, and S.W. Smoliar. “Automatic partitioning of full-motion video.” Multimedia Syst., 1(1):10–28, 1993.

[205] C. Zitnick, S. Kang, M. Uyttendaele, S. Winder, and R. Szeliski. “High-Quality Video View Interpolation Using a Layered Representation.” In Proc. ACM SIGGRAPH, 2004.

[206] Z. Zhang and H. Tsui. “3D reconstruction from a single view of an object and its image in a plane mirror.” In Proc. ICPR, volume 2, pp. 1174–1176, 1998.

[207] R. Zhang, P. Tsai, J. Cryer, and M. Shah. “Shape from Shading: A Survey.” IEEE Trans. Pattern Anal. Mach. Intell., 21(8):690–706, 1999.

[208] D. Zongker, D. Werner, B. Curless, and D. Salesin. “Environment Matting and Compositing.” In Proc. ACM SIGGRAPH, pp. 205–214, 1999.

[209] Y. Zhang, J. Xiao, and M. Shah. “Motion Layer Based Object Removal in Videos.” In Proc. WACV, pp. 516–521, 2005.

[210] H. Zhang, G. Zhang, and K.-Y. K. Wong. “Camera Calibration with Spheres: Linear Approaches.” In Proc. IEEE ICIP, volume II, pp. 1150–1153, 2005.
