3-D SCENE RECONSTRUCTION FROM LINE CORRESPONDENCES BETWEEN MULTIPLE VIEWS

A dissertation proposal submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science and Engineering

By

Michael Linger
Ph.D., Computer Science and Engineering, Wright State University, 2014
M.S., Computer Science, Wright State University, 2009
B.S., Computer Science, Wright State University, 2005
B.S., Mathematics, Wright State University, 2005
A.A.S., Electronics Engineering Technology, ITT Technical Institute, 1996

2014 Wright State University

WRIGHT STATE UNIVERSITY

GRADUATE SCHOOL

December 1, 2014 I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER MY SUPERVISION BY Michael Linger ENTITLED 3-D Scene Reconstruction from Line Correspondences between Multiple Views BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy.

Arthur A. Goshtasby, Ph.D. Dissertation Director and Director, Computer Science and Engineering Graduate Program

Robert E. W. Fyffe, Ph.D. Vice President for Research and Dean of the Graduate School

Committee on Final Examination

Arthur A. Goshtasby, Ph.D.

Mateen Rizki, Ph.D.

Bin Wang, Ph.D.

Thomas Wischgoll, Ph.D.

Fred Garber, Ph.D.

ABSTRACT

Linger, Michael Ph.D. Computer Science and Engineering, Wright State University, 2014. 3-D Scene Reconstruction from Line Correspondences between Multiple Views.

Three-dimensional scene reconstruction from 2-D images has many applications, such as surveillance, mission planning, autonomous navigation systems, cartography, and target recognition. Of specific interest to this research is the reconstruction of urban scenes containing man-made structures, such as roads and buildings, to support the burgeoning surveillance industry. Using 3-D maps to augment existing mission planning cartography products (DTED/SRTM, CADRG, CIB), mission and event planners will be able to compute strategic line-of-sight coverage for threat avoidance or threat prosecution. Forensic video analysts can use these models to recreate crime scenes, while law enforcement can build flight plans to minimize occlusions from tall structures in their persistent surveillance systems.

Traditional methods of 3-D scene reconstruction use image points as the primitive element. Various approaches detect and correlate points for use in triangulation and 3-D reconstruction. Little work has been done in 3-D reconstruction using lines as primitives. In my research, I detect line segments and their associated planar surfaces. Lines detected in 2-D images are back-projected to corresponding planar patches and triangulated via linear incidence relations, resulting in a reconstructed 3-D wireframe model. This research uses high-resolution imagery at close range, as could be collected by autonomous drones. It reduces the data volume by an order of magnitude by exploiting the point-line duality of projective geometry.


TABLE OF CONTENTS

1 Introduction
1.1 Purpose
1.2 Problem Description
1.3 Scope
1.4 Document Overview
2 Background
2.1 Introduction to Geometry
2.2 Geometric Primitives in Cartesian and Homogeneous Coordinates
2.3 Introduction to 2-D Projective Geometry
2.4 Introduction to 1-D Projective Geometry
2.5 Pinhole Camera Model
3 Method
3.1 Overview
3.2 Line Detection
3.3 Camera Computation
3.3.1 Calibration Box
3.3.1.1 Edge Detection
3.3.1.2 Hough Transform
3.3.1.3 Connected Graph
3.3.1.4 Camera Parameters
3.3.1.5 Correlation
3.3.2 Surveyed Camera
3.4 Planar Surface Detection
3.5 Homographic Registration
3.5.1 Global Rotation
3.5.2 Putative Correspondence
3.5.3 Point Descriptor
3.5.4 Template Matching Via Point Descriptor
3.5.5 2-D Homographic Registration from Correspondences
3.5.5.1 Registration from Point Correspondences
3.5.5.2 Registration from Line Correspondences
3.5.5.3 Registration from Line Correspondences and Their 1-D Homographies
3.6 Line Correspondence
3.7 3-D Wireframe
3.8 Polygon Boundary
4 Results
4.1 Edge Detection
4.2 Line Detection
4.3 3-D Models
4.3.1 Model Doll House
4.3.2 House Reconstruction
4.4 Error Analysis
5 Conclusion
5.1 Summary of Contributions
5.2 Challenges Encountered
5.3 Recommendations for Future Research
Appendix A Geometric Parameterization of a 2-D Homography
Appendix B Orthogonal Regression
Appendix C Multiple Regression
Appendix D Bilinear Interpolation
Appendix E Line Templates
Appendix F 2-D Image Registration with Points
Appendix G 2-D Image Registration with Lines
Appendix H Image Registration with Lines and 1-D Homographies
Appendix I Tensor Commutativity with the Kronecker Product
Encyclopedic / Textbook References
Scholarly References

LIST OF FIGURES

Figure 1 Isomorphic Geometries
Figure 2 Depiction of a 2-D line
Figure 3 Depiction of a plane in 3-D
Figure 4 Depiction of a line in 3-D
Figure 5 Two projections of a planar surface
Figure 6 1-D Projective Geometry
Figure 7 Pinhole Camera Model
Figure 8 Pinhole camera arithmetic
Figure 9 Pipelined 3-D model algorithm
Figure 10 Example of a Line Segment
Figure 11 Near-separability of the pixels from Figure 10
Figure 12 Line Orientation
Figure 13 Example of an elongated region about five pixels in width that contains an edge
Figure 14 Edge Detection using 2nd Moment Filter
Figure 15 Line Segment Templates
Figure 16 Line Segment Template Match
Figure 17 Quiver plot depicting edge orientation
Figure 18 Clusters of edge pixels
Figure 19 Calibration box
Figure 20 Linear correlation and distribution of calibration box
Figure 21 Hough Transform
Figure 22 Hough Transform
Figure 23 Cube graph edge numbering
Figure 24 Iterative Refinement of Calibration Cube
Figure 25 Depiction of Camera Center and Rotation
Figure 26 Intrinsic Camera Calibration
Figure 27 Longest scene lines
Figure 28 Scene subdivided by octants
Figure 29 Exhaustive Putative Correspondence
Figure 30 Distance between skew lines
Figure 31 Rotation histogram of two images
Figure 32 Correlation score and global rotation
Figure 33 Putative correspondence
Figure 34 Directed Polynomial Point Descriptor
Figure 35 Putative Line Correspondence
Figure 36 1-D homography
Figure 37 Homographic Line Correspondence
Figure 38 Resampled Line Segments
Figure 39 3-D Line Segment Endpoints
Figure 40 Transformation of 3-D coordinates to 2-D planar coordinates
Figure 41 Construction of a non-convex hull
Figure 42 Edge Detection Results
Figure 43 Hough space
Figure 44 Line Detection Results
Figure 45 Doll House Wireframe
Figure 46 Doll House Textures
Figure 47 House Wireframe
Figure 48 House Textures
Figure 49 Illustration of depth error
Figure 50 Relative depth error
Figure 51 Planar surfaces used in error analysis
Figure 52 Rotation Trigonometry
Figure 53 Equations of a 2-D line
Figure 54 Trigonometric Substitution
Figure 55 Similar Triangles
Figure 56 Bilinear Interpolation
Figure 57 Template Grid
Figure 58 Line Template Generation
Figure 59 Enumeration of 7x7 line segment templates
Figure 60 Images and Corresponding Points
Figure 61 2-D Image Registration from Points
Figure 62 Images and Corresponding Lines
Figure 63 2-D Image Registration from Lines
Figure 64 Images and Corresponding Lines

LIST OF TABLES

Table 1 Points in Cartesian and Homogeneous Coordinates
Table 2 Taxonomy of 2-D Geometries [5]
Table 3 Error Analysis
Table 4 Corresponding Points from Figure 60
Table 5 Normalized Data
Table 6 Endpoints from Corresponding Lines
Table 7 Normalized Endpoints from Corresponding Lines
Table 8 Normalized Lines from Right Image in Homogeneous Coordinates
Table 9 Endpoints from Corresponding Lines
Table 10 Möbius Transforms
Table 11 Normalized Endpoints from Corresponding Lines
Table 12 Normalized Möbius Transforms

LIST OF EQUATIONS

Equation 1 Factored 2-D Projective Transformation
Equation 2 Parametric equations of a line
Equation 3 Pseudo Inverse of the 1-D-to-2-D Line Parameterization
Equation 4 Projection of 2-D homography onto a 1-D homography
Equation 5 Pinhole Camera Matrix
Equation 6 Non-Linear 2nd Moment Filter
Equation 7 Camera Rotation Matrix
Equation 8 Orthonormal equations from a 3-D to 2-D planar homography
Equation 9 Conversion from Full-Scale to Normalized Coordinates
Equation 10 Three Linear Equations from a Line Correspondence and 1-D Homography
Equation 11 Full-Scale to Normalized Möbius Isomorphism
Equation 12 Normalized Equations with Line Correspondence and 1-D Homography

ACKNOWLEDGEMENT

I would like to thank my committee members for their advice and guidance in this research. I also would like to thank Restrepo, Mayer, Ulusoy, Mundy and the group at Brown University for providing image data used in this work, and Yiying Tong from Michigan State University for providing OpenGL starter code used in the visualization of 3-D models. Furthermore, I would like to thank those who have financially contributed to my education and research, including Glenn and Diana Linger, Robbins and Myers, TRW, Northrop Grumman and Wright State University.


To my wife, Roya, and my children, Bobby and Joey

1 Introduction

Three-dimensional scene recovery is the process of finding the 3-D geometry of a scene from a series of 2-D images. Historically, approaches to 3-D scene reconstruction rely on the triangulation of identifiable points distributed over 2-D imagery. A Delaunay triangulation is formed over the reconstructed 3-D point cloud, and the triangles are then texture mapped with the 2-D image data. For small image sizes of 1280 × 720 pixels, detected points number in the tens of thousands per image; for larger image sizes of 5456 × 3632 pixels, detected points number in the hundreds of thousands per image. In this research, I propose the detection of 2-D line segments as a primitive image feature and triangulation between their correspondences across multiple images to produce 3-D wireframe models of urban structures. In small images of 1280 × 720 pixels, detected line segments number in the hundreds; in larger images of 5456 × 3632 pixels, they number in the thousands per image. The advantage of using line segments as a primitive element over point-based reconstruction is the memory and computational savings of using fewer primitives.

1.1 Purpose

Three-dimensional scene reconstruction has many applications; examples are surveillance, mission planning, autonomous navigation, cartography, and target recognition. Of specific interest to this research is the reconstruction of urban scenes containing man-made structures, such as roads and buildings, which are abundant in line segments. Reconstructed structures can be used to augment existing mission planning cartography products (DTED/SRTM, CADRG, CIB), and mission and event planners will be able to compute strategic line-of-sight coverage for threat avoidance or threat prosecution. Forensic video analysts can use these models to recreate crime scenes, while law enforcement can generate flight plans to minimize coverage gaps from airborne sensors in their persistent surveillance systems.

1.2 Problem Description

A typical workflow in the reconstruction process involves an assumption of geometry, the detection of primitive elements in the images, correlation of primitives between images and 3-D reconstruction from triangulation predicated upon the assumption of scene geometry.

Various geometries are used in 3-D scene recovery, including Euclidean, similarity, affine, and projective geometry [4], [5], [6]. High-altitude surveillance applications, where scene relief is not detectable, may approximate a scene by a plane, while low-altitude applications require the use of projective geometry even when the scene is planar. More restrictive geometries such as similarity and Euclidean may be used when the images are captured from a nadir view. Knowledge about the scene geometry can be used to select the constraints needed to solve the problem.

Many approaches to the detection of primitives have been employed with varying degrees of success throughout the years. Of specific interest to this research are point and line primitives.

Much research exists in point detection; limited research exists in line detection. While my research does not leverage point primitives, I am interested in their detection to assess their feasibility in line detection and in correlation. Point detection algorithms employ image gradients to locate point landmarks. Image gradients are also used in edge detection [7], [9], [10], [11], [12] as a prerequisite to line detection, as is the case in [8] and [13]. Detected primitives are forwarded to an algorithm that finds the correspondence between them.

Primitive correspondence is the process of locating the same landmarks or features in multiple images. If I can find correspondence between line segments in successive video frames, I can employ transitivity to find correspondence between lines in multiple frames. Many correspondence algorithms have been attempted using descriptors of primitive elements [25]-[28], assumptions of scene/camera geometry [17]-[23], and tracking of points [45]-[47]. Point descriptors are widely used in the literature. A point descriptor is a real-valued vector usually based upon image intensity gradient information in the neighborhood of the point. Point descriptors are not unique, but they do constrain the search space when trying to find correspondence between points in images. Little research exists in the use of line descriptors in image correspondence. By using knowledge of the scene geometry, the location/size/orientation of a primitive in one image constrains the possible location/size/orientation in another image of the same scene, reducing the correspondence search space. In tracking, I can use a crude approximation of scene geometry or the assumption of a small perturbation between video frames to constrain the search space in finding a correspondence. Image registration [29]-[44] is the process of resampling a newly sensed image into the geometry of a reference image. Historical approaches use geometric constraints where the transformation between the images is completely known. Adaptive registration techniques partition a scene based upon localized image details and perform piecewise registration. It is then possible to align the transformed sensed image with the reference image so that corresponding primitives align.

Once correspondence between primitive elements is determined, 3-D scene reconstruction may be performed by using incidence relationships imposed by the underlying camera/scene geometry [48]-[57]. Given a geometry, a primitive may be back-projected from the camera center through the image plane to intersect with back-projections from other images. A 2-D image point back-projects to a 3-D line, which intersects other back-projected lines to determine the location of the 3-D point. A 2-D image line back-projects to a 3-D plane, which intersects corresponding back-projections from other images to form a 3-D line.

Current approaches to 3-D scene reconstruction leverage points as a primitive element. Correspondence between points is determined via point descriptors and the assumption of a two-view epipolar geometry. Given sufficient true-positive point correspondences in two images, the 2-D epipolar geometry can be constrained between image planes such that a point in one image is constrained to the line image of its back-projection in the other image. Used in conjunction with point descriptors, the epipolar constraint can reduce false-positive correspondences. Camera configurations and image planes (projection or camera matrix) may be extracted from the epipolar constraints (fundamental matrix) induced by point correspondences. Points may then be back-projected to lines via the camera parameters, and the intersections of these back-projections triangulate the locations of 3-D scene points.

The basic workflow of computer vision (geometry assumption, detection, correspondence, reconstruction via incidence of primitive back-projections) is followed in this theory of reconstruction. My theory differs from standard approaches in that I use line segments as primitive elements. Line segments introduce a sub-geometry (1-D projective, or Möbius) that I intend to leverage to reduce false-positive correspondences and to reduce the number of primitives required for determining geometric constraints between images. My intention is to exploit the point/line duality of projective geometry to augment reconstruction theory using lines, thereby inducing planar surfaces.

Traditional methods of 3-D scene reconstruction leverage image points as a primitive element. Various approaches detect and find correspondence between the points for use in triangulation and 3-D reconstruction. Little work has been done in 3-D reconstruction using lines as primitives. I believe reconstruction from lines is better suited than reconstruction from points to urban imagery, where edges are abundant. When using point triangulation, it is common to require operator intervention to group points into planar surfaces, resulting in semi-automatic reconstruction. I hope to overcome this guided reconstruction through the automatic detection of planar surfaces induced by line correspondences.

1.3 Scope

This research is limited to the theory required in the 3-D reconstruction of urban structures having planar surfaces. A natural scene typically contains curves and contours that are not well suited for reconstruction via line segment primitives. The piecewise approximation of a curve via line segments does not produce accurate correspondences between multiple images. Likewise, the approximation of a contour via small planar polygons does not produce accurate correspondences between images. Such curves and surfaces will not be detectable by this approach. Planar surfaces are assumed when clusters of coplanar line segments are found. However, such interpolation is not correct when the "planar" surface contains many holes, as in the rungs of a ladder. In such cases, the removal of holes is left to future research.

1.4 Document Overview

Section 2 introduces the geometry prerequisites used in the remainder of this work. It contains the minimal subset of projective geometry needed to understand the method of reconstruction. Section 3 details the reconstruction algorithm workflow by decomposing it into its constituent parts. Section 4 provides results from preliminary research in image registration as well as line detection and 3-D model reconstruction. Section 5 summarizes my contributions to the field of Structure from Motion (SFM) theory and computer vision; challenges are identified and recommendations for future research are provided. The appendices contain derivations of formulas and algorithms that would have detracted from the flow of this document. The interested reader can explore the appendices for a deeper understanding of the arithmetic and algorithms needed to reproduce an implementation of the 3-D reconstruction workflow.

2 Background

This section provides the reader with the minimum understanding of the geometries required for the method in Section 3. First, the reader is introduced to Felix Klein's definition of geometry; then 2-D and 1-D projective geometry are discussed.

2.1 Introduction to Geometry

Felix Klein defines a geometry as a pair (S, G) consisting of a non-empty set $S \neq \{\}$ and a transformation group G [4]. A transformation group G is a collection of transformations $T : S \to S$ such that G contains the identity I, the transformations of G are invertible with their inverses in G, and G is closed under composition. A figure is a subset of S, $f \subseteq S$. Figures are congruent (written $A \cong B$, with $A, B \subseteq S$) if and only if there exists a transformation $t \in G$ such that $A = t(B)$ in the sense of set equality. Congruence is reflexive, symmetric and transitive. Measurements are introduced as long as they are invariant with respect to congruent figures [4].

Given this abstract definition of geometry, one can define two different geometries $(S_1, G_1)$ and $(S_2, G_2)$ that are fundamentally the same. When this occurs, the geometries are said to be isomorphic, and there exists a continuous invertible conversion function $\mu : S_1 \to S_2$ known as an isomorphism. A transformation $g \in G_1$ may be lifted to $h \in G_2$ if, for every figure $A \subseteq S_1$, $\mu(g(A)) = h(\mu(A))$. In other words, a transformation g in one geometry $G_1$ may be expressed in terms of a corresponding transformation h in an isomorphic geometry $G_2$.

Figure 1 Isomorphic Geometries. Any transformation in the left image, g, may be lifted (converted) to its corresponding transformation in the right image, h, via the isomorphism μ.

Of specific interest to this research are the 2-D Cartesian plane $R^2$ and the 2-D projective plane $P^2$. In Cartesian coordinates, a point is specified by a pair of real numbers $(x, y)$. Cartesian coordinates are equal, $(x_1, y_1) = (x_2, y_2)$, if their corresponding elements are equal, that is, $x_1 = x_2$ and $y_1 = y_2$. In homogeneous coordinates, a point in the 2-D projective plane is specified by a triplet of real numbers $(x, y, w)$. Homogeneous coordinates are equal, $(x_1, y_1, w_1) = (x_2, y_2, w_2)$, if they are scalar multiples of each other: $x_1 = s x_2$, $y_1 = s y_2$ and $w_1 = s w_2$ for some real number s. The isomorphism $\mu : R^2 \to P^2$ maps the Cartesian plane onto the projective plane via $\mu(x, y) = (x, y, 1)$ and the projective plane onto the Cartesian plane via $\mu^{-1}(x, y, w) = (x/w,\ y/w)$. This isomorphism necessitates the extension of the Cartesian plane with ideal points having w = 0 [5].
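A minimal sketch of this isomorphism in NumPy (function names are illustrative, not from the dissertation):

```python
import numpy as np

def to_homogeneous(p):
    """mu: lift a Cartesian point (x, y) to homogeneous coordinates (x, y, 1)."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def to_cartesian(p):
    """mu^-1: map a homogeneous point (x, y, w) back to Cartesian (x/w, y/w)."""
    p = np.asarray(p, dtype=float)
    if np.isclose(p[-1], 0.0):
        raise ValueError("ideal point (w = 0): no Cartesian image")
    return p[:-1] / p[-1]

# Homogeneous coordinates are equal up to scale:
# (2, 4, 2) and (1, 2, 1) name the same projective point.
assert np.allclose(to_cartesian([2, 4, 2]), to_cartesian([1, 2, 1]))
```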

2.2 Geometric Primitives in Cartesian and Homogeneous Coordinates

This research uses homogeneous coordinate systems so that a rigid translation can be expressed as a linear operation. The conversion between the two coordinate systems is induced by the isomorphic relation identified in Table 1 below.

Table 1 Points in Cartesian and Homogeneous Coordinates

      Cartesian        Homogeneous        Isomorphic Relation
3-D   $[x\ y\ z]^t$    $[X\ Y\ Z\ W]^t$   $W[x\ y\ z]^t = [X\ Y\ Z]^t$
2-D   $[x\ y]^t$       $[X\ Y\ W]^t$      $W[x\ y]^t = [X\ Y]^t$
1-D   $[x]$            $[X\ W]^t$         $Wx = X$

In 2-D, points on a line have one degree of freedom, and a line may be expressed in Cartesian coordinates by the equation Ax + By + C = 0 or simply by the homogeneous coordinates $[A\ B\ C]^t$. The normal of the line is defined by the vector $[A\ B]^t$, and the distance from a point to the line is given by the point's projection onto the line normal, $d = A x_0 + B y_0 + C$ where $A^2 + B^2 = 1$, as shown in Figure 2 below. Points along the line lie in the null space of the line's coordinates and vice versa,

$[A\ B\ C] \begin{bmatrix} x \\ y \\ w \end{bmatrix} = 0$,

giving rise to the point-line duality leveraged in this dissertation.

Figure 2 Depiction of a 2-D line. This figure depicts a 2-D line, its normal vector and the projection of a point onto the normal via dot product, for the purpose of computing the orthogonal distance between the point and the line.
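The same duality makes joins and meets one-liners in homogeneous coordinates: the line through two points, and the intersection of two lines, are both cross products. A brief sketch (illustrative names):

```python
import numpy as np

def join(p, q):
    """Line through two homogeneous 2-D points (duality: l = p x q)."""
    return np.cross(p, q)

def meet(l, m):
    """Intersection point of two homogeneous 2-D lines (duality: x = l x m)."""
    return np.cross(l, m)

def point_line_distance(line, point):
    """Orthogonal distance |A*x0 + B*y0 + C| after normalizing A^2 + B^2 = 1."""
    A, B, C = line / np.hypot(line[0], line[1])
    x0, y0 = point[:2] / point[2]
    return abs(A * x0 + B * y0 + C)

l = join(np.array([0, 0, 1.0]), np.array([4, 4, 1.0]))    # the line y = x
print(point_line_distance(l, np.array([1.0, 3.0, 1.0])))  # sqrt(2)
```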

Similarly, in 3-D, points on a plane have two degrees of freedom, and a plane may be expressed in Cartesian coordinates by the equation Ax + By + Cz + D = 0 or simply by the homogeneous coordinates $[A\ B\ C\ D]^t$. The normal of the plane is defined by the vector $[A\ B\ C]^t$, and the distance from a point to the plane is given by the point's projection onto the plane normal, $d = A x_0 + B y_0 + C z_0 + D$ where $A^2 + B^2 + C^2 = 1$, as shown in Figure 3 below. Points on the plane lie in the null space of the plane's coordinates and vice versa,

$[A\ B\ C\ D] \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = 0$.

Figure 3 Depiction of a plane in 3-D. This figure depicts a 3-D plane, its normal vector and the projection of a point onto the normal via dot product, for the purpose of computing the orthogonal distance between the point and the plane.

In 3-D, lines must be constrained by two equations to achieve one degree of freedom. Two linear approaches to defining a line are the right null space of a pair of planes,

$\begin{bmatrix} A_1 & B_1 & C_1 & D_1 \\ A_2 & B_2 & C_2 & D_2 \end{bmatrix}$,

or the right null space of a pair of points,

$\begin{bmatrix} x_1 & y_1 & z_1 & w_1 \\ x_2 & y_2 & z_2 & w_2 \end{bmatrix}$,

as shown in Figure 4 below. Note the incidence relation

$\begin{bmatrix} A_1 & B_1 & C_1 & D_1 \\ A_2 & B_2 & C_2 & D_2 \end{bmatrix} \begin{bmatrix} x_1 & x_2 \\ y_1 & y_2 \\ z_1 & z_2 \\ w_1 & w_2 \end{bmatrix} = 0$,

illustrating the point-plane duality.

Figure 4 Depiction of a line in 3-D. This figure depicts a 3-D line. In the left image, a line is described by the right null space of two planes; a one-parameter family of points is induced by their intersection. In the right image, a line is described by the right null space of two 3-D points; a one-parameter family of planes (pencil of planes) is induced by their null space. The described line lies in each plane.
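Numerically, points on such a line can be recovered from the right null space of the 2x4 plane matrix; any combination of the two null-space basis vectors is a homogeneous point on the line. A minimal NumPy sketch with illustrative planes:

```python
import numpy as np

# Two planes in homogeneous coordinates [A B C D]: z = 0 and y = 0.
planes = np.array([[0.0, 0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0]])

# The right null space of the 2x4 matrix is spanned by the singular
# vectors associated with the two zero singular values.
_, _, Vt = np.linalg.svd(planes)
basis = Vt[2:]          # 2x4: two homogeneous points spanning the line

# Any linear combination is a point on the line (here, the x-axis).
point = 0.3 * basis[0] + 0.7 * basis[1]
assert np.allclose(planes @ point, 0.0)
```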

2.3 Introduction to 2-D Projective Geometry

Using set and group theory to define geometry, a taxonomy of sub-geometries may be defined by constraining the space to a subset of S and/or the transformation group to a subgroup of G. The most general 2-D geometry is projective geometry in homogeneous coordinates, $(P^2, \{H_{3\times3} : \det(H) \neq 0\})$. This transformation group has eight degrees of freedom and may be factored into scale s, rotation θ, translation $(t_x, t_y)$, skew α, scale ratio ρ and projective orientation $(v_1, v_2)$ as shown below (see Appendix A for the derivation):

$H = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ v_1 & v_2 & u \end{bmatrix} \begin{bmatrix} 1/\rho & 0 & 0 \\ \alpha & \rho & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}$

Equation 1 Factored 2-D Projective Transformation. By factoring a homography into its geometric parameterization, one can progressively estimate each parameter and refine a transformation while setting unestimated parameters to their identities; u is dependent upon the homographic scale factor.

Affine geometry has six degrees of freedom and can be derived by zeroing the projective orientation of the 2-D projective transformation, $(v_1, v_2, u) = (0, 0, 1)$. By setting the skew and scale ratio parameters to their respective identities, α = 0 and ρ = 1, the geometry is reduced to a similarity with four degrees of freedom. Finally, Euclidean geometry is derived by further constraining the scale to its identity, s = 1, giving the transformation three degrees of freedom.
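As a sanity check on Equation 1, the factored form can be composed numerically. The sketch below assumes the reconstruction above (in particular the matrix order and the 1/ρ entry, which were inferred from the garbled source); the function name is illustrative:

```python
import numpy as np

def homography(s, theta, tx, ty, alpha=0.0, rho=1.0, v1=0.0, v2=0.0, u=1.0):
    """Compose a 2-D homography from its 8 geometric parameters (Equation 1)."""
    projective = np.array([[1, 0, 0], [0, 1, 0], [v1, v2, u]], dtype=float)
    affine = np.array([[1 / rho, 0, 0], [alpha, rho, 0], [0, 0, 1]], dtype=float)
    similarity = np.array([[s * np.cos(theta), -s * np.sin(theta), tx],
                           [s * np.sin(theta),  s * np.cos(theta), ty],
                           [0, 0, 1]], dtype=float)
    return projective @ affine @ similarity

# With the non-similarity parameters at identity, H reduces to a similarity.
H = homography(s=2.0, theta=np.pi / 6, tx=1.0, ty=-2.0)
```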

Table 2 Taxonomy of 2-D Geometries [5]

Projective, 8 dof:
$\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}$

Affine, 6 dof:
$\begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ 0 & 0 & 1 \end{bmatrix}$

Similarity, 4 dof:
$\begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}$

Euclidean, 3 dof:
$\begin{bmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}$

Two-dimensional projective geometry is of interest to 3-D scene reconstruction because planar surfaces in a 3-D scene are transferred between image frames via a 2-D homography (Figure 5). Four point correspondences or four line correspondences between frames are sufficient to determine the homography mapping a planar surface between two views.

Figure 5 Two projections of a planar surface. Given a camera center and a viewing plane, the 3-D scene may be projected onto a 2-D surface. A planar homography exists between a pair of images of a planar surface. This homography determines a point-point correspondence as well as a line-line correspondence between views. Using these correspondences and the homography, one may reconstruct the primitives (points/lines) in 3-D.

2.4 Introduction to 1-D Projective Geometry

To produce 2-D projective geometry, a 3-D scene is projected onto a 2-D viewing plane. This research focuses on a planar surface within a 3-D scene and its corresponding projection onto multiple 2-D video frames. Similarly, another sub-geometry may be produced by projecting a 2-D space onto a viewing line. This research is not concerned with the entire 2-D space (images from video), but instead focuses on one detected line at a time.

Figure 6 1-D Projective Geometry A 1-D homography exists between corresponding views of a line.

In this research, a 2-D homography determined by two views of a 3-D plane is projected onto a series of 1-D homographies determined by pairs of corresponding lines between views. Consider the parametric equation of a line expressed in homogeneous coordinates:

$x = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} q_x - p_x & p_x \\ q_y - p_y & p_y \\ 0 & 1 \end{bmatrix} \begin{bmatrix} t \\ 1 \end{bmatrix} = P\,t$

Equation 2 Parametric equations of a line. P converts 1-D coordinates t, defined with respect to a detected line segment with endpoints $(p_x, p_y)$ and $(q_x, q_y)$ and orientation θ, back into 2-D image coordinates x.

This parameterization is not invertible because P is not a square matrix. However, it is a one-to-one transformation, leading one to believe that a pseudo-inverse exists. A commonly used pseudo-inverse such as $P^+_{2\times3} = (P^t P)^{-1} P^t$ is numerically unstable due to the matrix inverse and the number of additions/multiplications. A more appropriate pseudo-inverse can be constructed from a simple translation and rotation:

$P^+ = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -p_x \\ 0 & 1 & -p_y \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta & -p_x\cos\theta - p_y\sin\theta \\ 0 & 0 & 1 \end{bmatrix}$

Equation 3 Pseudo Inverse of the 1-D-to-2-D Line Parameterization. $P^+$ converts 2-D image coordinates to the 1-D coordinates of a detected line segment. Note that the pseudo-inverse can be computed by a simple translation and a clockwise (negative) rotation. The middle row of the 3x3 rotation is removed because y = 0 after rotation. By aligning a line segment with the x-axis in 2-D and discarding the y coordinate, a projection to 1-D geometry is formed.

Now, the 2-D planar homography is projected to a 1-D line homography, $M_{2\times2} = P'^{+}_{2\times3}\, H_{3\times3}\, P_{3\times2}$:

$\begin{bmatrix} \nu t' \\ \nu \end{bmatrix} = P'^{+} \begin{bmatrix} \nu x' \\ \nu y' \\ \nu \end{bmatrix} = P'^{+} H \begin{bmatrix} \omega x \\ \omega y \\ \omega \end{bmatrix} = P'^{+} H P \begin{bmatrix} \omega t \\ \omega \end{bmatrix}$

Equation 4 Projection of 2-D homography onto a 1-D homography. First, convert the 1-D parameterization of a line t in the first image into 2-D image coordinates by using the parametric equation of the line, P. Next, transform a 2-D image point from the first image into its corresponding 2-D point in the second image using the planar homography H. Finally, transform the 2-D coordinates of the point in the second image into the 1-D coordinates of its line, t', by using the pseudo-inverse $P'^{+}$ of the parameterization of the line in the second image.
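A compact numerical sketch of Equation 4, assuming the reconstructions of Equations 2 and 3 above (all names are illustrative):

```python
import numpy as np

def line_param(p, q):
    """P: 1-D homogeneous coords [t 1] -> 2-D homogeneous image coords (Equation 2)."""
    return np.array([[q[0] - p[0], p[0]],
                     [q[1] - p[1], p[1]],
                     [0.0, 1.0]])

def line_param_pinv(p, q):
    """P+: translate p to the origin, rotate the segment onto the x-axis,
    and drop the (now zero) y row (Equation 3)."""
    theta = np.arctan2(q[1] - p[1], q[0] - p[0])
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s, -p[0] * c - p[1] * s],
                     [0.0, 0.0, 1.0]])

def mobius_from_homography(H, p, q, p2, q2):
    """M = P'+ H P: the 1-D homography between corresponding line segments."""
    return line_param_pinv(p2, q2) @ H @ line_param(p, q)
```

Applying M to $[t\ 1]^t$ and dividing by the last component yields the 1-D coordinate t' of the corresponding point on the second segment.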

A 1-D homography, also known as a Möbius transform, is a 2x2 matrix M with $\det(M) \neq 0$. This transform can be lifted from 1-D projective geometry $P^1$ to Cartesian coordinates $R^1$ through the isomorphism

$\mu\left(\begin{bmatrix} \omega t \\ \omega \end{bmatrix}\right) = \frac{\omega t}{\omega} = t$,

necessitating an extension of the real axis to include a point at infinity. Thus, a 1-D homography is a rectangular hyperbola:

$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \implies f(t) = \frac{at + b}{ct + d}$

For small projective components c, the 1-D homography may be estimated by a straight line through a collection of 1-D point correspondences. Otherwise, the 1-D homography may be computed via three 1-D point correspondences using a 1-D projective invariant known as the cross ratio [4].
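Equivalently to evaluating the cross ratio, each correspondence $(t_i, t'_i)$ gives one equation $a t_i + b - c t_i t'_i - d t'_i = 0$, linear in $(a, b, c, d)$, so three correspondences determine M up to scale as a null vector. A sketch under that formulation (not necessarily the dissertation's implementation):

```python
import numpy as np

def mobius_from_three(t, t_prime):
    """Solve f(t) = (a t + b) / (c t + d) from three 1-D correspondences.
    Each pair yields a*t + b - c*t*t' - d*t' = 0, linear in (a, b, c, d)."""
    A = np.array([[ti, 1.0, -ti * tpi, -tpi] for ti, tpi in zip(t, t_prime)])
    _, _, Vt = np.linalg.svd(A)
    a, b, c, d = Vt[-1]          # null vector, defined up to scale
    return np.array([[a, b], [c, d]])

M = mobius_from_three([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])   # encodes f(t) = 2t + 1
t = 4.0
print((M[0, 0] * t + M[0, 1]) / (M[1, 0] * t + M[1, 1]))  # 9.0
```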

2.5 Pinhole Camera Model

In the pinhole camera model (Figure 7), 3-D points are projected to an image plane along a ray extending from the camera center through the point. Of specific interest are three coordinate systems and the transformations between them. First, we have a 3-D camera coordinate system, with the camera focal point at the origin and the z-axis extending normal to the image plane. Secondly, we have a 2-D coordinate system spanning the image plane. Finally, we have a global 3-D coordinate system defined for our scene.

Figure 7 Pinhole Camera Model In the Pinhole Camera Model, 3-D points project to an image plane via the ray extending from the camera center to the 3-D point.

We transform 3-D coordinates in the camera coordinate system to the image coordinate system via similar triangles,

$\begin{bmatrix} fX \\ fY \\ Z \end{bmatrix} = \begin{bmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$,

as shown in the right image of Figure 7. Rotation and translation are added to transform from camera coordinates to global coordinates. Finally, scaling is added to convert from image plane units to pixel units.

$K = \begin{bmatrix} m_x & 0 & 0 \\ 0 & m_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad P = K R [\,I \mid -\tilde{C}\,] = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix}$

Equation 5 Pinhole Camera Matrix. $m_x$ and $m_y$ account for rectangular pixels; $p_x$ and $p_y$ represent the image of the principal axis in the image plane; $\tilde{C}$ is the global coordinates of the camera center.

Given a camera matrix, point transfer from global coordinates to image coordinates is given by the homogeneous equation $x_{3\times1} = P_{3\times4}\, X_{4\times1}$, while line segments are back-projected from an image to a planar surface via $\pi^t_{1\times4} = l^t_{1\times3}\, P_{3\times4}$, as shown in the left image of Figure 8. All such planar surfaces intersect in the camera center, indicating that the center is the right null space of the camera matrix, $P\tilde{C} = 0$.

Figure 8 Pinhole camera arithmetic. In the left image, a 2-D line segment is back-projected to a 3-D planar surface via the camera matrix. In the right image, we see that the ideal line $[0\ 0\ 1]^t$ back-projects to the principal plane.

Note that 3-D points on the principal plane must satisfy the equation

$\begin{bmatrix} x \\ y \\ 0 \end{bmatrix} = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$,

indicating that the ideal line $[0\ 0\ 1]^t$ back-projects to the principal plane. The normal vector of the principal plane is $(p_{31}, p_{32}, p_{33})$. This vector represents the camera direction up to the homogeneous ambiguity of scale. We resolve the polarization of the camera direction by multiplying by the determinant of

$M = \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix}$,

so that the camera direction vector is given by $\det(M)\,(p_{31}, p_{32}, p_{33})^t$.
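The relations of this section fit together in a few lines of NumPy; the camera values below are illustrative, and the composition follows Equation 5 as reconstructed above:

```python
import numpy as np

# Assemble P = K R [I | -C~] for an illustrative camera (Equation 5).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                                   # camera aligned with world axes
C = np.array([1.0, 2.0, -10.0])                 # camera center in world coordinates
P = K @ R @ np.hstack([np.eye(3), -C[:, None]])

# The camera center is the right null space of P: P C~ = 0.
_, _, Vt = np.linalg.svd(P)
C_h = Vt[-1]
assert np.allclose(C_h[:3] / C_h[3], C)

# A 2-D image line l back-projects to the plane pi^t = l^t P;
# the ideal line [0 0 1]^t gives the principal plane.
pi = np.array([0.0, 0.0, 1.0]) @ P
direction = np.linalg.det(P[:, :3]) * P[2, :3]  # polarized camera direction
```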

3 Method

This section presents the theory of 3-D reconstruction by triangulating line segments in lieu of points. First, line segments are detected in the 2-D imagery. In parallel, camera matrices are computed for each frame. Next, planar surfaces are detected by clustering putative line correspondences in a consensus algorithm, and 2-D planar homographies are generated, resulting in the registration of planar surfaces between 2-D images. With an accurate registration, corresponding line segments are back-projected to planes, and the 3-D line segments are defined by their intersection, resulting in a 3-D wireframe model. The wireframe is then textured via boundary detection and Delaunay triangulation of each surface.

3.1 Overview

Figure 9 depicts a sequential algorithm for 3-D reconstruction from multiple images. Line segment detection is a prerequisite for building line-based models, and it may be performed in parallel with the camera matrix computation, as neither process depends on the other. In this theory of reconstruction, manufactured urban structures are assumed to be composed of flat surfaces containing many line segments. It is the existence of these surfaces that drives the 1-to-1 correspondence of line segments between images. We detect planar surfaces by considering putative line correspondences within a localized region of an image pair. Assuming a putative correspondence allows us to compute a 3-D line segment. These 3-D line segments are clustered according to coplanarity in a consensus algorithm; thus, 2-D line segments correspond if they contribute to a 3-D planar surface. Given a 3-D planar surface, random sample and consensus (RANSAC) determines a 2-D planar homography that registers the surface. The surface registration determines an accurate 1-to-1 line correspondence that is used in the wireframe reconstruction. The boundary of each surface is detected through existing research and a mesh is formed via Delaunay triangulation. The reconstructed surfaces are then textured with 2-D imagery and the 3-D model is produced.

Figure 9 Pipelined 3-D model algorithm The blue processes highlight the contributions of this research, while building upon the foundations shown in gray. Planar surface detection relies on an accurate correlation of primitives between 2-D images and registration produces a 1-to-1 line correspondence.

3.2 Line Detection

Various edge and line detection algorithms were surveyed in references [8] through [16], and none were found to meet the requirements of this theory of reconstruction. The leading line detectors employed by software packages such as Matlab and OpenCV use a two-stage approach of edge detection (Sobel, Prewitt, Roberts, LoG, Zero Cross, Canny) followed by a Hough transform.

Sobel and Prewitt filters are directional in that they respond to horizontal, vertical and diagonal edges more than other directions, biasing the detection of line segments toward these directions. Roberts, Laplacian of Gaussian (LoG) and Zero Cross detectors approximate image gradients and look for sharp changes in pixel intensity. Roberts and Zero Cross detectors implement naïve gradients by computing partial derivatives along the x and y directions and assuming that directional derivatives can be computed via a dot product of the gradient with direction cosines. While this may be true of smooth continuous surfaces, it is only a crude approximation in discrete imagery. For an accurate gradient in a discrete image, one must consider diagonal pixels as well; otherwise, the gradient is biased toward horizontal and vertical lines. The LoG tries to include diagonal pixels and satisfy 1-D separability by using a Laplace operator on a Gaussian surface. A Gaussian has infinite support, which cannot be justified with the finite data in an image. LoG does a much better job of producing a directionally independent response to edges, but I believe that Gaussians should be reserved for the larger filter sizes, when the number of data points can justify the use of transcendental functions.

Canny edge detection is an edge tracing algorithm that starts at a pixel of high gradient and follows an edge based upon upper and lower thresholds. Canny produces a fast result, but still depends on a directionally biased gradient.

Given edges within an image, the Hough transform parameterizes each edge pixel into a one-parameter family of lines. A consensus algorithm identifies edge pixels belonging to the same line; if a putative line contains enough pixels, then a line segment is detected. As a voting algorithm, the Hough transform works well when the scene contains only a few lines and performs poorly when the scene contains many lines. In this theory of reconstruction, we need hundreds of lines in a medium image (1280 × 720) or thousands of lines in a large image (5456 × 3632), making the Hough transform infeasible. However, we can use the Hough transform in the computation of cameras when our scene is circumscribed by a calibration box of only 12 lines. This is discussed in detail later.

A theory of line detection is described here and used in the theory of 3-D reconstruction. Line detection is a three-stage process. First, edge pixels are computed. Then, line segment templates are fit to edge pixels, indicating the orientation of each pixel and the strength of its contribution to a line. Finally, edge pixels are clustered into segments, and the best-fit line segment is computed through orthogonal regression.

A line segment is a collection of connected pixels nearly satisfying the equation of a line, Ax + By + C = 0, with the two sides of the line having different colors/intensities. In color images, one of the colors is generally brighter than the other, resulting in a brighter side of the line and a darker side. The brighter side of a line is denoted the positive side, and the darker side of the line is the negative side.

Figure 10 Example of a Line Segment A line segment is shown at progressively higher resolutions. When considering individual pixels, lines are not clearly demarcated due to the width of the line segment and the lack of contrast. Data courtesy of reference [3].

When drawn in a color space such as red-green-blue (RGB), one can see that the distributions of colors on the two sides of the line are easily separable. Figure 11 shows that the pixels from Figure 10 are clearly separable in RGB space. It is frequently the case that a few pixels cross over onto the wrong side of the plane of separation, but generally, darker pixels adhere to one side of the line while brighter pixels gravitate toward the other.

Figure 11 Near-separability of the pixels from Figure 10 The pixels from each side of a line segment are plotted in RGB space and shown to be easily separable by a plane.

Given the positive and negative sides of a line, it is easy to resolve the inherent orientation ambiguity of a line. Detected lines are oriented in such a way that the brighter side faces the positive direction in a right-handed coordinate system, as shown in Figure 12. This resolves the ambiguity of line orientation (θ vs. θ+180º).

Figure 12 Line Orientation. The "starting" endpoint of a line segment is selected so that the brighter pixels appear on the positive side of the orientation when plotted in polar coordinates with the starting endpoint at the origin.

The process of line detection begins by identifying edges with a band-pass filter. At each pixel, orientation is detected by template matching. Clusters of adjacent pixels having similar orientations are formed and orthogonal regression is used to find a best fit line segment to each cluster.

Edges are inherently fuzzy [7], as shown in Figure 13 below. The true edge may exist anywhere in an elongated region, from a few to several pixels in width, that separates two regions.

Figure 13 Example of an elongated region, about five pixels in width, that contains an edge.

Fuzzy edges motivate the non-linear 2nd moment filter [7]:

$K = \begin{bmatrix} 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \end{bmatrix}, \quad \mu_{i,j} = \sum_{m=-1}^{1} \sum_{n=-1}^{1} K_{m,n}\, I_{i+m,\,j+n}, \quad \sigma_{i,j} = \sum_{m=-1}^{1} \sum_{n=-1}^{1} K_{m,n} \left\lVert \mu_{i,j} - I_{i+m,\,j+n} \right\rVert_2$

Equation 6 Non-Linear 2nd Moment Filter. Note that the vector-valued mean μ and intensity I are taken in the context of a three-valued RGB color space, while the Euclidean distance $\lVert \mu_{i,j} - I_{i+m,j+n} \rVert_2$ yields a scalar value used in the computation of the moment σ. Non-linearity arises from the Euclidean norm.

The moment filter produces a measurement of color variance within a localized rectangular region. If the filter is applied to a pixel in a homogeneous region of an image, then the filter response is low (the dark regions in Figure 14). Areas with larger color variations (edges) produce higher values in the filter response. Application of this edge filter to a 3-channel RGB image results in a single-channel image highlighting edge pixels. Unlike a gradient measure, the response of the moment filter is not affected by the direction of the detected edge.


Figure 14 Edge Detection using 2nd Moment Filter The 2nd moment filter is applied at each pixel. The result is a measurement of color variance in a localized 3x3 region.

The filtered image is used to detect points that could lie along a line. Using these points, near- separability is tested in RGB space by using template matching.

Given a template size, width and keep-out region, an exhaustive set of line segment templates is defined. Figure 15 shows all possible line segment templates of size 7x7, width of 2.25 pixels and keep-out region of 0.5 pixels. There are 32 such templates at non-uniform angular increments (Appendix E, Line Templates). The number of templates is based upon the template size.

Figure 15 Line Segment Templates. An exhaustive set of line segment templates is generated given a size, width and keep-out distance. The number of templates increases quadratically as a function of length. Only lines longer than the template can be detected.

Using the pixels identified by the edge filter, each template is applied in succession. The template that best separates the pixels on the two sides of the line in color space is selected to represent the orientation of the edge. A specific template match is shown in Figure 16 for one of the pixels in the line segment of Figure 10. The best match is defined in terms of linear separability. In the example of Figure 16, the pixels from the two sides of the line are completely separable in RGB space, with zero pixels crossing onto the wrong side of the separation plane. Note that this is a rare case; it is typical for a few pixels to be on the wrong side of the best plane of separation. The best template is selected as the orientation of the edge at a single pixel.

Figure 16 Line Segment Template Match. The pixels in the left image are best shown when plotted in 3-D RGB space, as seen in the right image. The pixels from the brighter side of the template are plotted as plus symbols while the pixels from the darker side of the template are plotted as circles. Note that the spatial configuration of pixels within the template is correlated to intensity when plotted in RGB space. In this specific case, the + pixels are linearly separable from the o pixels.

Each pixel in the edge-filtered image is matched to a template in this manner, resulting in a vector field representing edge orientation. Potential template matches are reduced by starting with the small filters (3x3) and working toward the large filters (25x25). Templates are only checked if they are compatible with the orientation of the previous filter. After the exhaustive search of the eight 3x3 templates, subsequent searches for larger templates require no more than three templates per pixel. A mapping of compatible templates is pre-generated before any template matching occurs.

Figure 17 Quiver plot depicting edge orientation. Each edge pixel is matched to a template establishing the orientation of the edge at that point. Template matching yields a consistent edge orientation in contrast to the image gradient: gradients are subject to large errors due to noise, while templates consider many pixels (length × width) within a window.

Adjacent pixels of similar orientation are clustered, and line segments are fitted to each cluster using the orthogonal regression defined in Appendix B. Line segment endpoints are determined by projecting each pixel onto the best-fit line; the two projections yielding the longest line segment are used as the endpoints. This approach typically stops at or short of the true endpoint of the line; endpoints rarely overshoot it.
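Orthogonal regression of a line to a pixel cluster can be performed with a principal-component fit; the sketch below is an assumed stand-in for the Appendix B derivation, projecting pixels onto the fitted direction to recover endpoints:

```python
import numpy as np

def fit_segment(pixels):
    """Orthogonal regression: fit a line to an (N, 2) pixel cluster and
    return the segment endpoints spanning the projected extent."""
    pts = np.asarray(pixels, dtype=float)
    centroid = pts.mean(axis=0)
    # The direction minimizing orthogonal distances is the dominant
    # right singular vector of the centered point cloud.
    _, _, Vt = np.linalg.svd(pts - centroid)
    direction = Vt[0]
    # Project every pixel onto the line; the extremes give the endpoints.
    t = (pts - centroid) @ direction
    return centroid + t.min() * direction, centroid + t.max() * direction

p0, p1 = fit_segment([[0, 0], [1, 1.1], [2, 1.9], [3, 3.0]])
```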

Figure 18 Clusters of edge pixels. Pixels are clustered according to proximity and orientation.


3.3 Camera Computation

The 3x4 camera matrix must be computed for each frame before we can reconstruct a 3-D model. In this research, we have two approaches to the computation of the camera matrix. In the first approach, the entire scene is encompassed by a calibration box of known dimensions; its detection leads to a ground-truth solution for the camera matrix. In the second approach, the camera position and orientation are surveyed with bubble levels, plumb bobs and a laser range finder, along with some trigonometry. The camera intrinsic parameters are measured for a specific focal length, and the camera matrix is composed through matrix multiplication of each component.

3.3.1 Calibration Box

In the method of the calibration box, a 3-D rectangular parallelepiped of known dimensions circumscribes the scene, as shown in Figure 19. First, we detect the edges of the calibration box by applying a green filter to the image. Next, the 2-D lines of the box are discovered via the Hough transform. Incidence relationships determine the vertices of the box, and a connected graph is constructed to detect each of its faces. A small perturbation is assumed between frames, and the connected graphs are used to determine line correspondence between frames. Finally, the measured dimensions of the box are used to solve for the camera matrix in each frame.

Figure 19 Calibration box In the left image, a calibration box of known size circumscribes a 3-D scene. In the right image, a green filter is applied to isolate the edges of the calibration box.

3.3.1.1 Edge Detection

The calibration box is painted green in order to contrast with the scene. Samples of the green texture are taken from the 2-D imagery; it is noted that these samples form a Gaussian distribution and that their RGB components are linearly related when plotted in their color space, as seen in Figure 20 below.

Figure 20 Linear correlation and distribution of calibration box

The center of the distribution ellipsoid is found through a simple mean of the pixel RGB values, $\mu_{3\times1} = \frac{1}{n}\sum_i p_i$, and the axes are identified by the eigenvectors of the covariance matrix,

$V = \frac{1}{n-1} \sum_i (p_i - \mu)(p_i - \mu)^t$,

scaled by the square roots of the respective eigenvalues. Pixels are deemed to be within the green ellipsoid if their Mahalanobis distance from the ellipsoid center,

$d(p, \mu) = \sqrt{(p - \mu)^t\, V^{-1}\, (p - \mu)}$,

is less than 2.5; that is to say, the pixel is within 2.5 standard deviations of the center. Each pixel is tested and filtered out if its Mahalanobis distance is greater than 2.5, resulting in the image shown in Figure 19 above.
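A vectorized sketch of this Mahalanobis green filter (illustrative names; `samples` stands for the hand-taken green texture samples):

```python
import numpy as np

def green_filter(image, samples, threshold=2.5):
    """Keep pixels within `threshold` Mahalanobis units of the sampled
    green distribution. `image` is (H, W, 3); `samples` is (N, 3)."""
    mu = samples.mean(axis=0)
    V = np.cov(samples, rowvar=False)                 # 3x3 covariance of RGB samples
    Vinv = np.linalg.inv(V)
    diff = image.reshape(-1, 3).astype(float) - mu
    d2 = np.einsum('ij,jk,ik->i', diff, Vinv, diff)   # squared Mahalanobis distance
    return (d2 <= threshold ** 2).reshape(image.shape[:2])
```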

3.3.1.2 Hough Transform

The Hough transform is applied to the image of the green-filtered calibration box to detect the lines of the box. In the Hough transform, each pixel in the image is assumed to be on a line, but the direction of the line is unknown. The line is then parameterized according to its distance from the center of the image and its orientation. These parameters are discretized, and a one-parameter family of putative lines (pencil of lines) is generated for each pixel in the green-filtered image. Figure 21 depicts the parameterization of the one-parameter family of lines through a given pixel.

Figure 21 Hough Transform. In the left image, a line is parameterized according to its orientation and distance from the center. In the right image, the one-parameter family of lines through a pixel is depicted. Note that the signed distance from the center to the line is used; negative distances are not considered separately because the orientation is allowed to sweep a full circle.

Once discretized, the Hough transform becomes a consensus algorithm: the line parameters receiving the most votes correspond to lines containing many pixels.

Figure 22 Hough Transform This image depicts the Hough transform applied to a green-filtered calibration box. The twelve peaks correspond to the twelve edges comprising the box.

In this manner, the twelve edges of the calibration box are detected in each 2-D image.
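A minimal sketch of the discretized, voting form of this Hough transform (bin counts are assumed, not taken from the dissertation):

```python
import numpy as np

def hough_votes(mask, n_theta=360, n_dist=200):
    """Accumulate Hough votes for foreground pixels in a boolean mask.
    Lines are parameterized by orientation theta and signed distance
    from the image center; accumulator peaks correspond to detected lines."""
    H, W = mask.shape
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    max_d = np.hypot(cx, cy)
    acc = np.zeros((n_theta, n_dist), dtype=int)
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys, xs):
        # Signed distance from the center for each putative line through (x, y).
        d = (x - cx) * np.cos(thetas) + (y - cy) * np.sin(thetas)
        bins = np.clip(((d / max_d + 1) / 2 * (n_dist - 1)).astype(int),
                       0, n_dist - 1)
        acc[np.arange(n_theta), bins] += 1
    return acc
```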

3.3.1.3 Connected Graph

The discretization of the Hough transform line parameters causes a slight error in the detected lines of the calibration cube. The cube is used to compute the camera matrix, so the errors in its detection must be minimized. The cube lines can be iteratively refined given the structure of the cube. To find the cube's structure, a connected graph is formed. First, we locate the vertices of the cube by way of incidence relationships. If three lines are found to intersect, then we have found a cube vertex; if four lines are found to intersect, then we have found four parallel lines that intersect in an ideal point. Intersections are found by an exhaustive search over line pairs. With twelve lines, there are $\binom{12}{2} = \frac{12 \cdot 11}{2!} = 66$ line pairs. Given a line pair, we iterate over the remaining $12 - 2 = 10$ lines in search of a mutual intersection. Three lines form an over-determined system, so the point minimizing the sum of squared distances is selected as the common intersection. This point is the right null space of the three lines,

$\begin{bmatrix} A_1 & B_1 & C_1 \\ A_2 & B_2 & C_2 \\ A_3 & B_3 & C_3 \end{bmatrix}$.

The common intersection is tested to make sure that it is within a few pixels of each of the three lines via $d = A x_0 + B y_0 + C$ where $A^2 + B^2 = 1$. In this manner, the cube vertices are defined, as well as three sets of four parallel lines intersecting in ideal points.
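The mutual-intersection test can be sketched with an SVD null space plus the distance check above (illustrative names; the tolerance is assumed):

```python
import numpy as np

def mutual_intersection(lines, tol_pixels=3.0):
    """Least-squares intersection of three (or more) homogeneous 2-D lines.
    Returns the point if it lies within tol_pixels of every line, else None."""
    L = np.asarray(lines, dtype=float)            # (k, 3) stack of [A B C]
    _, _, Vt = np.linalg.svd(L)
    p = Vt[-1]                                    # point minimizing squared residuals
    if np.isclose(p[2], 0.0):
        return None                               # ideal point: parallel lines
    p = p / p[2]
    # Normalize each line so A^2 + B^2 = 1, then test |A x0 + B y0 + C|.
    norms = np.hypot(L[:, 0], L[:, 1])
    dists = np.abs(L @ p) / norms
    return p if np.all(dists <= tol_pixels) else None
```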

Knowing the lines that contribute to each intersection, a connected graph is constructed for use in the numerical refinement of cube lines as well as the correlation step in the next section. The twelve graph edges are formed from the twelve cube lines and the eight vertices lying upon them. Cube faces are identified by traversing cube edges: knowing that the connected graph forms a cube, a face is detected by a graph cycle of length four. When the six cube faces are discovered, opposite faces are paired by the fact that they share no vertices. A single face is arbitrarily selected as the first cube face and its edges are arbitrarily ordered according to a graph cycle. The ordering of the other eight cube edges is determined by the ordering of the initial face.


Figure 23 Cube graph edge numbering A cube face is arbitrarily selected as the first face and its edges are arbitrarily ordered according to a graph cycle. These cube edges are labeled 1 through 4. All other edge labels are determined by the initial four edges as shown above.

The numerical refinement of detected calibration cube edges proceeds as follows. Twelve lines determine the eight cube vertices and three ideal points in over-determined linear systems: a cube vertex is an over-determined system of three lines found by the right null space of a 3x3 matrix, and four parallel lines intersect in an ideal point determined by the right null space of a 4x3 matrix. These eleven intersections (8 vertices + 3 ideal points) in turn form twelve over-determined linear systems used to find the cube edges. A cube edge lies along two cube vertices and an ideal point; thus, the cube edges can be recomputed from the right null space of three homogeneous points. This is another over-determined system, so the sum of squared error is minimized by computing the singular value decomposition of a 3x3 matrix of points and using the vector corresponding to the smallest singular value as the homogeneous coordinates of the cube edge. These linear systems are determined by the connected graph of Figure 23. In summary, twelve cube edges over-determine eleven points, and these eleven points over-determine the twelve cube edges. Iterative refinement of cube edges and vertices visually converges after three iterations and converges with respect to 64-bit floating point numbers after only 15-20 iterations, as shown in Figure 24 below.


Figure 24 Iterative Refinement of Calibration Cube The top image illustrates the unrefined lines found by the raw Hough transform. The bottom images illustrate the result of iterative refinement. The bottom-right image shows four parallel lines intersecting in an ideal point.

3.3.1.4 Camera Parameters

The refined calibration cube from the previous section is combined with measured truth data in the computation of the 3×4 camera matrix. The truth data is input as the endpoints of 3-D line segments. These data are projected to each frame and are expected to lie upon the corresponding line segment detected in the previous section. Algebraically, we have $\underset{1\times3}{l^t}\,\underset{3\times4}{P}\,\underset{4\times2}{[X\ Y]} = \underset{1\times2}{0}$, where X and Y are the homogeneous 3-D endpoints of a line in the measured calibration box, P is the unknown 3×4 camera matrix, and l is the detected 2-D image of the edge of the calibration box. Using commutativity of the Kronecker tensor product as derived in Appendix I, we rewrite

t P1  X t    this equation as l t   Pt  0 , where Pt is row i of P written as a column vector. Each  t   2  i 13 Y 21   Pt  24  3  212 121

3-D calibration line to 2-D image correspondence yields two linear equations in the twelve variables of the camera matrix, P. Using all twelve such correspondences produces an over-determined linear system of 24 equations in only 12 variables. Theoretically, we need only find the right null space of this system. The vector corresponding to the smallest singular value should produce P up to an arbitrary homogeneous scale factor. Unfortunately, the standard approach to the SVD repeatedly applies Householder transformations to reduce the matrix to a bidiagonal form. The Householder transformation is generally unstable when the matrix has small diagonal terms, due to floating-point round-off error caused by fractions having small denominators. This is the case when we construct our linear system using data having a large range not centered on zero. It is important to note that this is only a numerical analysis issue.

Given infinite precision, this issue wouldn’t exist. So, the data must be normalized before computing the SVD and denormalized afterward for an accurate decomposition.

Given 3-D data in $\mathbb{R}^3$, we can translate the data to have a mean of 0 and limits of ±1, using

$$S \cdot T = \begin{bmatrix} 1/s_x & 0 & 0 & 0 \\ 0 & 1/s_y & 0 & 0 \\ 0 & 0 & 1/s_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -\mu_x \\ 0 & 1 & 0 & -\mu_y \\ 0 & 0 & 1 & -\mu_z \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Similarly, 2-D data may be normalized using

$$S \cdot T = \begin{bmatrix} 1/s_x & 0 & 0 \\ 0 & 1/s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -\mu_x \\ 0 & 1 & -\mu_y \\ 0 & 0 & 1 \end{bmatrix}.$$

These normalizations fit into our

t 1 1  1 1  model as l T2  S2 S2T2  PT1  S1  S1  T1 X Y 0 . We solve the system for a 34 13 33 33  33 33 44 44  44` 44 42 12

~ 1 1 normalized camera matrix P  S2 T2  PT1  S1 using a stable singular value decomposition

1 1 ~ and then denormalize via P  T2  S2  P  S1 T1 .
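The sketch below illustrates this normalized estimation of P under stated assumptions (numpy; row-major vectorization of P; illustrative function names and input conventions). It stacks the two Kronecker-product equations per 3-D line / 2-D line correspondence, solves by SVD, and denormalizes:

```python
import numpy as np

def normalizer_2d(pts):
    """S*T moving 2-D points to zero mean and unit spread."""
    mu, s = pts.mean(axis=0), pts.std(axis=0)
    return np.array([[1 / s[0], 0, -mu[0] / s[0]],
                     [0, 1 / s[1], -mu[1] / s[1]],
                     [0, 0, 1.0]])

def normalizer_3d(pts):
    """4x4 S*T for 3-D data, analogous to the 2-D case."""
    mu, s = pts.mean(axis=0), pts.std(axis=0)
    N = np.eye(4)
    N[0, 0], N[1, 1], N[2, 2] = 1 / s[0], 1 / s[1], 1 / s[2]
    N[:3, 3] = -mu / s
    return N

def camera_from_line_correspondences(lines_2d, endpoints_3d, image_pts_2d):
    """Estimate the 3x4 camera P from 2-D line / 3-D endpoint pairs.

    lines_2d     : n x 3 homogeneous image lines l
    endpoints_3d : n x 2 x 4 homogeneous 3-D endpoints (X, Y) per line
    image_pts_2d : m x 2 Cartesian image points (e.g. detected segment
                   endpoints) used only to build the 2-D normalization
    """
    E = np.asarray(endpoints_3d, dtype=float)
    N1 = normalizer_3d(E.reshape(-1, 4)[:, :3] / E.reshape(-1, 4)[:, 3:])
    N2 = normalizer_2d(np.asarray(image_pts_2d, dtype=float))
    rows = []
    for l, (X, Y) in zip(lines_2d, E):
        lt = np.linalg.inv(N2).T @ l        # lines transform contravariantly
        Xn, Yn = N1 @ X, N1 @ Y             # normalized 3-D endpoints
        rows.append(np.kron(lt, Xn))        # l^t P X = 0 in row-major vec(P)
        rows.append(np.kron(lt, Yn))        # l^t P Y = 0
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return np.linalg.inv(N2) @ vt[-1].reshape(3, 4) @ N1   # denormalize
```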

3.3.1.5 Correlation

Given the connected graph from the previous section, calibration cubes may be correlated between consecutive frames of a video sequence. Once calibration cube edges are correlated between frames, the truth data used in the camera matrix of the first frame may be applied to compute the camera matrices of subsequent frames. Assuming a small perturbation between frames, the ambiguity of correspondence is resolved by choosing the correspondence in which the camera center moves the least. Given the connected graph of previous sections, the first cube face must correspond to one of the six cube faces in the next image. For each cube face correspondence, there are four 90° rotations and two orientations (clockwise / counterclockwise graph traversal). Thus, there are $6 \cdot 4 \cdot 2 = 48$ possible correspondences. Given a putative correspondence between the twelve edges of the calibration cube, the 3-D truth data from the previous frame may be applied to the current video frame and the camera matrix may be computed. Out of the 48 possible camera matrices for a given frame, the correct camera matrix is the one in which the camera center moves the least between frames. The camera center is easily computed from the right null space of the camera matrix, and the camera principal axis is given by the bottom row of the camera matrix. To measure the perturbation between consecutive video frames, we can compute the Euclidean distance between camera centers or the angle between the principal axis vectors.

Given unit principal axis vectors, the cosine of the angle between them is given by their dot product. If the dot product is large, then the angle between the two principal axes is small. This allows us to select the correct correspondence and camera matrix from the 48 putative configurations.
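As a small illustration of this selection step, the following sketch (numpy assumed; hypothetical helper names) computes each candidate camera center from the right null space of P and keeps the candidate whose center moves least from the previous frame:

```python
import numpy as np

def camera_center(P):
    """Camera center C as the right null vector of the 3x4 camera matrix."""
    _, _, vt = np.linalg.svd(P)
    C = vt[-1]
    return C[:3] / C[3]

def best_candidate(candidate_Ps, previous_P):
    """Pick the camera whose center moves least between frames."""
    c0 = camera_center(previous_P)
    dists = [np.linalg.norm(camera_center(P) - c0) for P in candidate_Ps]
    return candidate_Ps[int(np.argmin(dists))]
```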

3.3.2 Surveyed Camera

In the second researched method of camera computation, the camera position and principal axis are surveyed with respect to a global coordinate system. Camera intrinsic parameters are measured, and the final camera matrix is composed of measured values. Recall Equation 5, $P = KR[I \mid -\tilde{C}]$. The camera center, $\tilde{C}$, is measured with respect to a global coordinate system as shown in Figure 25. Azimuth rotation, α, is measured with respect to the global coordinate system, while elevation and roll (θ, φ respectively) are measured with respect to the camera coordinate system as shown in the right of Figure 25.

Figure 25 Depiction of Camera Center and Rotation Note that azimuth, α, is measured with respect to the global coordinate system, while elevation and roll (θ, φ respectively) are measured with respect to the camera coordinate system.

The definition of the camera and global coordinate systems produces the rotation matrix shown in Equation 7 below.

$$R = \begin{bmatrix} \cos\varphi & \sin\varphi & 0 \\ -\sin\varphi & \cos\varphi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \sin\theta & -\cos\theta \\ 0 & \cos\theta & \sin\theta \end{bmatrix} \begin{bmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Equation 7 Camera Rotation Matrix

In order to measure the camera intrinsic parameters, $K = \begin{bmatrix} m_x & 0 & 0 \\ 0 & m_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}$, for a given focal length, a series of calibration pictures are taken with a chess board (Figure 26) to induce a series of planar homographies between the scene and the 2-D image.

Figure 26 Intrinsic Camera Calibration

The 3-D planar surface can be treated as a 2-D surface and a coordinate system can be imposed upon it. Then, the 3-D to 2-D planar homographies are given by $H = [h_1\ h_2\ h_3] = sK[r_1\ r_2\ t]$: a simple translation and rotation of the camera coordinate system followed by our camera intrinsic parameters, K, and an arbitrary homogeneous scale factor, s. Note that we need only two column vectors of the rotation matrix, because the 3-D planar surface was treated as if it were a 2-D surface with the z-coordinate set to zero. Solving for the K matrix of intrinsic parameters proceeds as follows.

Note that the $r_1$ and $r_2$ vectors are orthonormal. The orthogonality constraint, $r_1^t r_2 = 0$, gives rise to one equation, while the normality constraint gives us a second equation, $r_1^t r_1 = r_2^t r_2 = 1$. Direct substitution of $h_1 = sKr_1$ and $h_2 = sKr_2$ for the homography columns in the orthogonality and normality equations produces $h_1^t K^{-t} K^{-1} h_2 = 0$ and $h_1^t K^{-t} K^{-1} h_1 = h_2^t K^{-t} K^{-1} h_2$, respectively. Note that the homographic scale constant, s, cancels out at this step. At this point, both equations contain the factor $K^{-t} K^{-1}$. For simplicity, substitute $B = K^{-t} K^{-1}$ and solve for B in the orthonormality equations $h_1^t B h_2 = 0$ and $h_1^t B h_1 = h_2^t B h_2$. Using the commutativity of the Kronecker tensor product (Appendix I) yields two linear equations in the elements of B, as shown in Equation 8 below.

$$\left(h_1^t \otimes h_2^t\right) \begin{bmatrix} b^{1t} \\ b^{2t} \\ b^{3t} \end{bmatrix}_{9\times1} = 0, \qquad \left(h_1^t \otimes h_1^t - h_2^t \otimes h_2^t\right) \begin{bmatrix} b^{1t} \\ b^{2t} \\ b^{3t} \end{bmatrix}_{9\times1} = 0$$

Equation 8 Orthonormal equations from a 3-D to 2-D planar homography Given a planar homography, H, from an image of a chess board (Figure 26), we construct two linear equations in the 9 variables of $B = K^{-t}K^{-1}$. Five such homographies are sufficient to over-determine the matrix B.

Once $B = K^{-t}K^{-1}$ is solved, note that

$$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}, \quad K^{-1} = \begin{bmatrix} 1/f_x & 0 & -c_x/f_x \\ 0 & 1/f_y & -c_y/f_y \\ 0 & 0 & 1 \end{bmatrix},$$

$$K^{-t}K^{-1} = \begin{bmatrix} 1/f_x^2 & 0 & -c_x/f_x^2 \\ 0 & 1/f_y^2 & -c_y/f_y^2 \\ -c_x/f_x^2 & -c_y/f_y^2 & c_x^2/f_x^2 + c_y^2/f_y^2 + 1 \end{bmatrix}.$$

The elements of our camera intrinsic parameters, K, may then be extracted as $f_x = \frac{1}{\sqrt{b_{11}}}$, $f_y = \frac{1}{\sqrt{b_{22}}}$, $c_x = -\frac{b_{13}}{b_{11}}$ and $c_y = -\frac{b_{23}}{b_{22}}$. Further optimization can be achieved by noting the symmetry of $K^{-t}K^{-1}$: we need solve for only six variables instead of nine. This would allow us to find the intrinsic camera parameters with only three planar homographies.

3.4 Planar Surface Detection

Planar surface detection is a consensus algorithm. Putative line segment correspondences are generated between image frames, their 3-D line segments are reconstructed (including false positives), and the 3-D lines are clustered according to coplanarity. Those planar line segment clusters having membership above a minimum threshold are considered planar surfaces in the scene. Line clusters having fewer lines than the minimum threshold are discarded as false positives.

To reduce the number of putative correspondences between image frames, the shortest lines are discarded. Figure 27 shows the distribution of detected line segments in terms of length; most of the detected line segments are short. A minimum line segment length of 300 pixels is empirically selected to retain around 400 of the longest lines, so that only around 3% of the lines are kept for planar surface detection. The right image of Figure 27 demonstrates that the remaining lines are still representative of the scene geometry.


Figure 27 Longest scene lines

Putative correspondences between images are further reduced by encompassing the scene with a 3-D wireframe box and subdividing by octants to locally correlate line segments, as shown in Figure 28 below.


Figure 28 Scene subdivided by octants

Each octant subdivision is projected onto the 2-D images via the camera matrix, resulting in a flat quadrilateral or hexagon. Image lines intersecting or contained within this closed polygon boundary can be locally correlated between the images. Subdivision by octants continues recursively until the set of localized correspondences becomes small enough for RANSAC or exhaustive search. Three such subdivisions are sufficient for the imagery in Figure 28.

Assuming all possible line segment correspondences in a localized region, 3-D line segments are reconstructed via planar back-projection through the camera matrix. An exhaustive putative correspondence is generated between the localized octants of two frames, and their 3-D line segments are reconstructed as shown in Figure 29 below.

Figure 29 Exhaustive Putative Correspondence

Note that most of the 3-D line segments in a localized region are skew; few lines are coplanar. Lines are considered coplanar when they are parallel or pass within a short distance of each other.

Parallelism is detected by the dot product of vectors representing line orientation. If p and q are the endpoints of a 3-D line segment, then v = q − p is a vector representing the orientation of the line. For unit vectors $v_1$ and $v_2$, the dot product represents the cosine of the angle between them, $v_1 \cdot v_2 = \cos\theta$. If the angle is smaller than a threshold, $v_1 \cdot v_2 > \cos 1.0°$, then the line segments are considered parallel and coplanar. Skew lines are coplanar if they pass within a threshold distance of each other. Skew lines are parameterized as shown in Figure 30 below.

Figure 30 Distance between skew lines

The shortest distance between skew lines is represented by the vector v(s,t) that is orthogonal to both $v_1$ and $v_2$. The parameters s and t are found by solving the linear system

$$\begin{bmatrix} v_1^t \\ v_2^t \end{bmatrix}_{2\times3} \underset{3\times1}{v(s,t)} = \underset{2\times1}{0}.$$

The distance between the skew lines is given by $d = \|v(s,t)\|_2$. Coplanar lines are shown in blue in Figure 29 above. A planar surface is detected when a cluster of coplanar 3-D line segments of sufficient membership is formed.
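The skew-line test reduces to a 2×2 linear solve. A minimal numpy sketch (illustrative names, not the dissertation's implementation) follows:

```python
import numpy as np

def skew_line_distance(p1, q1, p2, q2):
    """Shortest distance between 3-D lines through (p1,q1) and (p2,q2).

    Solves the 2x2 system making the connecting vector v(s,t) orthogonal
    to both direction vectors v1 and v2.
    """
    v1, v2 = q1 - p1, q2 - p2
    d0 = p1 - p2
    A = np.array([[v1 @ v1, -(v1 @ v2)],
                  [v1 @ v2, -(v2 @ v2)]])
    b = -np.array([v1 @ d0, v2 @ d0])
    s, t = np.linalg.solve(A, b)             # singular only for parallel lines
    v = (p1 + s * v1) - (p2 + t * v2)        # the orthogonal connecting vector
    return np.linalg.norm(v)
```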

3.5 Homographic Registration

Planar surface detection yields an approximate 3-D planar surface, Ax + By + Cz + D = 0, and a cluster of line segments in each of two image frames that roughly correspond due to their contribution to the plane. The purpose of homographic registration is to align the 2-D images of these planar surfaces to induce a 1-to-1 line segment correspondence and to eliminate outliers.

This section discusses only the homographic registration. Refer to the Line Correspondence section for obtaining the 1-to-1 correspondence between line segments.

Given two images of a planar surface, there exists a homography between them. This homography, H, maps points in the first image to points in the second image under the transformation $x' = Hx$ (see reference [5]), or lines in the second image to lines in the first image under the transformation $l = H^t l'$ (see reference [5]). The purpose of computing a planar homography is to correlate all of its line segments between images. Once correlated, line segments may be used in 3-D scene reconstruction. At low enough resolutions, a 3-D scene may be approximated with a single planar surface. At higher resolutions, a planar homography may only approximate a true planar surface in the scene, such as a wall or a roof of a building.

First, a histogram approach is used to compute the global rotational difference between the two images. A putative correspondence is computed predicated upon the global rotation; this step serves to reduce the search space of true correspondences. Correspondences are further reduced using template matching. Random Sample Consensus (RANSAC) removes the false positives from the correspondences and determines an initial homography [23]. This homography is then iteratively improved in a feedback loop, producing a final planar homography as well as a set of true line correspondences. Once again, note that a homography can only be used to register planar surfaces. At high resolution, a separate homography is needed to register each planar surface in the 3-D scene.

3.5.1 Global Rotation

It is important to note that planar surfaces are related by a homography that does not preserve distances or angles. Thus, no global rotation can register the two images exactly. A global rotation is computed only to roughly align the images at low resolution to extract a set of putative correspondences. Using the length and disambiguated orientation of each detected line segment, an orientation histogram is produced by convolving a Gaussian with a series of delta functions representing line orientations weighted by length,

$$r(\theta) = e^{-\frac{1}{2}\theta^2} * \sum_{i=1}^{n} l_i\, \delta(\theta - \theta_i).$$

This results in an orientation histogram as shown in Figure 31. These two signals are then correlated via an inner product of a circular shift, and the peak correlation score is selected as the global rotation, as shown in Figure 32.


Figure 31 Rotation histogram of two images Global image rotation is estimated by computing an orientation histogram. Histogram signals of corresponding planar surfaces are most similar when the perturbation between corresponding images is minimized such as occurs between successive video frames.


Figure 32 Correlation score and global rotation Global rotation is computed by identifying the maximum correlation between orientation histogram signals. In urban imagery, it is common for line segments to exist in pairs (θ, θ+180°). The right image depicts an overlay of the orientation histograms after a cyclic shift to compensate for the global rotation of 358.242°.
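A simplified numpy sketch of the rotation estimate follows. It bins segment orientations weighted by length, smooths with a Gaussian kernel, and scans all circular shifts for the peak inner product; the bin size, kernel width, and input format are assumptions for illustration:

```python
import numpy as np

def orientation_histogram(segments, bins=360, sigma=2.0):
    """Length-weighted orientation histogram, smoothed by a Gaussian."""
    h = np.zeros(bins)
    for p, q in segments:                       # Cartesian endpoints
        d = np.asarray(q, float) - np.asarray(p, float)
        theta = np.degrees(np.arctan2(d[1], d[0])) % 360.0
        h[int(theta) % bins] += np.hypot(d[0], d[1])
    k = np.exp(-0.5 * (np.arange(-10, 11) / sigma) ** 2)
    padded = np.concatenate([h[-10:], h, h[:10]])    # circular padding
    return np.convolve(padded, k, mode='same')[10:-10]

def global_rotation(segments_a, segments_b, bins=360):
    """Circular shift (in degrees) maximizing the histogram inner product."""
    ha = orientation_histogram(segments_a, bins)
    hb = orientation_histogram(segments_b, bins)
    scores = [ha @ np.roll(hb, s) for s in range(bins)]
    return float(np.argmax(scores))
```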

3.5.2 Putative Correspondence

A putative correspondence between line segments is then inferred by a ±3° window predicated upon the global rotation (Figure 33). Notice that the putative correspondence is not 1-to-1.


Figure 33 Putative correspondence The tall blue line in the top image could correspond to any of the line segments in the ±3° window depicted in the bottom image after applying a −1.758° (+358.242°) global rotation.

The purpose of a putative correspondence is to reduce the possible search space for identifying the true correspondences. Consider the full, unconstrained search space consisting of two images with M detected lines in the first view and N lines in the second view. Line correspondence is not a 1-to-1 relationship due to fragmented line segments. An unconstrained correspondence produces M × N potential correspondences. Each of these correspondences can represent a true positive or a false positive, inducing a search space of $2^{M \times N}$ possibilities. In the imagery provided by Brown University [3], 200 to 300 line segments were detected per frame. An unconstrained correspondence could yield as many as 250 × 250 ≈ 62K line pairs. However, a constraint on global rotation reduces the putative correspondence to around 1200 line pairs.

Certainly, other constraints exist that could reduce the search space, such as global scale and global translation [34]. Neither of these constraints has been found to be as stable as global rotation. Approximating global scale with line segments is problematic because the line segment endpoints are inherently unreliable. With sufficiently many line segments global scale may be approximated, but this technique becomes unreliable as the number of detected line segments decreases.

Global translation has also been estimated using 2-D Fourier transforms, but this technique is dependent on a reliable scale and rotation and is not well suited to dissimilar (affine, projective) imagery. Global rotation using line segments is a much more reliable constraint.

3.5.3 Point Descriptor

A point descriptor is a feature selection technique intended to capture the signature of a neighborhood for the purpose of matching correspondences. Commonly used point descriptors often use image gradient to characterize a point. Gradients are nothing more than low-order, directed Taylor series with the bias term removed for intensity invariance. Knowing that low- order Taylor series approximate a function locally and that higher-order polynomials approximate over a larger domain, a point descriptor may be constructed from over-determined cubic polynomials to better approximate the image centered on a local point.


Figure 34 Directed Polynomial Point Descriptor The left image depicts the polar resampling pattern about each pixel. The right image shows an overlay of two descriptors from corresponding points in two images. This point descriptor captures the shape of the image intensity centered at a point.

The point descriptor is constructed by fitting low-order polynomials to the image data in a radial pattern. Figure 34 shows a point descriptor built with a radius of 10 pixels and sample rates of δθ = 45° and δr = 1 pixel. Subpixel accuracy is obtained through bilinear interpolation (Appendix D). A cubic polynomial is uniquely determined by four data points; the descriptor uses 11 points to define a cubic. This results in an over-determined system that may be fitted with a unique cubic using multiple regression (Appendix C). Minimizing the sum of squared error produces a polynomial that smooths the surface. Figure 34 depicts two descriptors from corresponding points in different images. One can see that matching points have descriptors that fit well. A fitness measure is needed to assess the dissimilarity of two point descriptors. For this dissimilarity measure, the root-mean-squared (RMS) error is employed,

N 1 1 R e  p r  q r 2 dr where R is the radius of the point descriptor (10 pixels) and rms   i   i   i0 R r0

N is the number of polynomials in the descriptor. A point descriptor is a set of coefficients.

There are 16 cubic polynomials, so the descriptor contains 16 × 4 coefficients. For rotation invariance, point descriptors are constructed in the tangential/normal (TN) coordinate system of a line segment. For scale invariance, note that polynomial coefficients are scaled under dilation,

$$p(kr) = p_0 + kp_1 r + k^2 p_2 r^2 + k^3 p_3 r^3.$$

Given two polynomials, $p(r)$ and $q(r) = p(kr)$, note that the sequence of nth-order approximations of k,

$$s_k(n) = \frac{q_1}{p_1},\ \sqrt{\frac{q_2}{p_2}},\ \sqrt[3]{\frac{q_3}{p_3}},\ \ldots,\ \sqrt[n]{\frac{q_n}{p_n}},$$

approaches the true value of the scale factor k. A three-term sequence (cubic) does not contain enough data to extrapolate a limit for $s_k(n)$, so the highest-order term is used to approximate k.
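The sketch below illustrates the descriptor under simplifying assumptions (numpy; nearest-neighbour sampling in place of the bilinear interpolation of Appendix D; the RMS integral approximated by a discrete mean):

```python
import numpy as np

def ray_descriptor(image, center, n_rays=16, radius=10):
    """Fit a cubic to the intensity profile along each ray about a point.

    Returns an n_rays x 4 array of polynomial coefficients.
    """
    cy, cx = center
    r = np.arange(0, radius + 1)                 # 11 samples per ray
    coeffs = []
    for k in range(n_rays):
        theta = 2 * np.pi * k / n_rays
        ys = np.clip((cy + r * np.sin(theta)).round().astype(int),
                     0, image.shape[0] - 1)
        xs = np.clip((cx + r * np.cos(theta)).round().astype(int),
                     0, image.shape[1] - 1)
        profile = image[ys, xs].astype(float)
        coeffs.append(np.polyfit(r, profile, 3))  # least-squares cubic
    return np.asarray(coeffs)

def rms_dissimilarity(desc_a, desc_b, radius=10):
    """Average RMS difference between paired ray polynomials."""
    r = np.linspace(0, radius, 50)
    errs = [np.sqrt(np.mean((np.polyval(a, r) - np.polyval(b, r)) ** 2))
            for a, b in zip(desc_a, desc_b)]
    return float(np.mean(errs))
```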

3.5.4 Template Matching Via Point Descriptor

After building a many-to-many putative line correspondence between views, template matching is performed to determine the correct matches. This template matching produces a reduced set of line correspondences as well as a 1-D homography mapping individual pixels between the line segments. Given the definition of a point descriptor, point descriptors are computed at one-pixel increments for all of the detected line segments in an image. Putative line correspondences are then confirmed or rejected by matching the best-fit point descriptors along the lines. Figure 35 and Figure 36 depict the best point matches along two lines of a putative correspondence. One can clearly see the truncation of the right line segment in the 1-D homography induced in Figure 36.

Notice that the 1-D homography appears as a straight line instead of a rectangular hyperbola.

This is because the projective component of the 1-D homography is nearly zero.


Figure 35 Putative Line Correspondence The red lines represent the detected line segments in an image pair. The blue line highlights one of the putative correspondences predicated upon a global rotation. Imagery courtesy of Brown University [3].

Line correspondences are filtered according to the average RMS fitness score along all of the points in the correspondence induced by the 1-D homography. Note that in general a 1-D homography is a rectangular hyperbola, but the line segments don’t extend into the horizon.

From a nadir view, the 1-D homographies appear as lines (zero projective component). In this case, the 1-D homography may be extracted by a Hough transform. In the general case, the 1-D homography may be determined by a 3-point RANSAC algorithm. The output of template matching is a reduced set of putative correspondences and their 1-D homographies.


Figure 36 1-D homography

The bottom image depicts a 1-D homography between the corresponding line segments of Figure 35. The horizontal axis determines a point along the line segment in the left image. The vertical axis determines a point along the line segment of the right image. An x appears where point descriptors were found to match. In the first 32 pixels of the line in the left image, the matching pixels were fitted along the line segment in the right image. The remaining pixels were not matched due to truncation of the left line segment during line detection. The top image shows the RMS fitness score at each pixel according to the detected 1-D homography.

If no 1-D homography is found or if the fitness score is too large, then the putative line correspondence is rejected. This filtering process removes most of the false positives.

3.5.5 2-D Homographic Registration from Correspondences

The purpose of determining a 2-D planar homography between images of corresponding 3-D planes is to establish a one-to-one, point-to-point correspondence between the planes with sub-pixel accuracy. Using this homography, all other point or line correspondences can be established in these planes. Given primitive (point / line) correspondences, one can begin to reconstruct the 3-D scene.

3.5.5.1 Registration from Point Correspondences

Corresponding points x and x' of a planar surface are related by a homography x' = Hx in homogeneous coordinates. The problem with homogeneous coordinates is that the equality is determined up to an unknown scale factor ω:

$$\begin{bmatrix} \omega x' \\ \omega y' \\ \omega \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}.$$

Each point correspondence introduces a new unknown scale factor ω. The division in the homogeneous-to-Cartesian isomorphism $x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}$ is not well suited when ω (the denominator) is near zero. A transformation is needed that will remove the unknown scale factor from each point correspondence and produce a linear system. The Direct Linear Transform (DLT) described in

[5] accomplishes this goal. Recognizing that x' and Hx are vectors in the same direction, their cross product should be zero, $x'_i \times Hx_i = 0$. Factoring out the $h_{ij}$ terms, two independent linear equations are formed in nine variables,

$$\begin{bmatrix} 0^t & -\omega'_i x_i^t & y'_i x_i^t \\ \omega'_i x_i^t & 0^t & -x'_i x_i^t \end{bmatrix} \begin{bmatrix} h^{1t} \\ h^{2t} \\ h^{3t} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$

where each $h^i$ is a row of the homography H. Four pairs of point correspondences produce eight independent linear equations, which is sufficient to solve for the eight degrees of freedom in H up to an arbitrary scale factor (the ninth variable). It is common to set $h_{33} = 1$ or $\|h\|_2 = 1$ for a unique non-homogeneous solution. An over-determined system $Ah = 0$ may be normalized by $A^tAh = 0$. In general, there is no exact solution to the over-determined system due to noise; instead, the minimization of $\|A^tAh\|$ is accepted as a best-fit solution. This term is minimized when h is the eigenvector corresponding to the minimum eigenvalue of $A^tA$.

It is important to note that the typical numerical algorithm for computing the eigenvalues and eigenvectors of a positive-definite symmetric matrix suffers from severe round-off errors when the matrix $A^tA$ is not diagonally dominant. This is due to the Jacobi-style rotations using diagonal elements in the denominator to compute the sine and cosine of the rotation angle. These ill-conditioned matrices $A^tA$ are produced whenever the datasets x and x' have a mean other than zero and a large variance. This is the case with image data having an origin in the upper-left corner and a resolution of thousands of pixels on each axis. The data must first be normalized using a translation and scale (Equation 9)

$$\tilde{x} = S \cdot T \cdot x = \begin{bmatrix} 1/\sigma_x & 0 & 0 \\ 0 & 1/\sigma_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -\mu_x \\ 0 & 1 & -\mu_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \omega x \\ \omega y \\ \omega \end{bmatrix}$$

Equation 9 Conversion from Full-Scale to Normalized Coordinates First translate coordinates to have a mean of zero and then scale the coordinates to have a variance of one.

in order to get a mean of zero and a small variance (e.g. σ = 1). The points in each image are normalized according to their scale and translation matrices, S, T, S' and T'. After computing the homography for the normalized coordinate systems, $\tilde{H}$, the transformation is denormalized using

the scale and translation matrices, $H = T'^{-1}S'^{-1}\tilde{H}\,S\,T$. Refer to Appendix F, 2-D Image Registration with Points, for an example.

This is the standard technique for 2-D image registration using points. In this research, the point / line duality of projective geometry is exploited to perform registration and reconstruction using corresponding line segments in lieu of corresponding points. These standard techniques serve as a basis for this research.

3.5.5.2 Registration from Line Correspondences

Corresponding lines l and l' of a planar surface are related by a homography $l = H^tl'$ in homogeneous coordinates. Once again, homogeneous coordinates cause a problem, because the equality is determined up to an unknown scale factor. If this unknown scale factor were incorporated into a system of linear equations, then an extra variable would be produced for each line correspondence. The homography has only eight degrees of freedom in nine variables, so it is desirable to develop a system of linear equations with no more than the nine variables of the homography. Notice that a line in homogeneous coordinates is a vector of the line's coefficients, $l = [A\ B\ C]^t$. Any point $p = [x\ y\ 1]^t$ along the line satisfies the equation $p^t l = 0$ (Figure 37). This motivates the model

$$\begin{bmatrix} p^t \\ q^t \end{bmatrix} H^t l' = 0.$$

Refactoring yields two linear equations in the coefficients of H for each line correspondence,

$$\begin{bmatrix} A'p^t & B'p^t & C'p^t \\ A'q^t & B'q^t & C'q^t \end{bmatrix} \begin{bmatrix} h^{1t} \\ h^{2t} \\ h^{3t} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$

where $h^j$ is a row of the homography.

Figure 37 Homographic Line Correspondence In the left image, line $l = [A\ B\ C]^t$ has endpoints $[p_x\ p_y]^t$ and $[q_x\ q_y]^t$. The corresponding line and endpoints appear in the right image. The planar homography H maps the two lines in homogeneous coordinates under the transformation $l = H^t l'$.

Four line correspondences produce eight independent linear equations in the nine variables of H. This is sufficient to solve for H up to an arbitrary scale factor. The over-determined system $Ah = 0$ may be solved by normalizing to $A^tAh = 0$. Once again, there is no exact solution to the over-determined system due to noise; the minimization of $\|A^tAh\|$ is accepted as the optimal solution. This term is minimized when h is selected to be the eigenvector corresponding to the minimum eigenvalue of $A^tA$. As in the case of point correspondences, $A^tA$ is a positive-definite symmetric matrix. The input data must be normalized so that this matrix is diagonally dominant, to reduce numerical instability in the computation of eigenvalues and eigenvectors. Points may be normalized as before by translation and scale,

$$\tilde{x} = S \cdot T \cdot x = \begin{bmatrix} 1/\sigma_x & 0 & 0 \\ 0 & 1/\sigma_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -\mu_x \\ 0 & 1 & -\mu_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \omega x \\ \omega y \\ \omega \end{bmatrix}.$$

Lines should be further normalized so that $\|[A\ B]\|_2 = 1$ in $l = [A\ B\ C]^t$. Once again, the final homography must be denormalized via $H = T'^{-1}S'^{-1}\tilde{H}\,S\,T$. Refer to Appendix G, 2-D Image Registration with Lines, for a numerical example.
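A minimal numpy sketch of this line-based DLT is given below; the normalizing transforms are passed in, the function names are illustrative, and the row-major vectorization of H is an implementation choice:

```python
import numpy as np

def homography_from_lines(lines2, segments1, N1, N2):
    """Estimate H (x' = Hx) from corresponding lines (Section 3.5.5.2).

    lines2    : n x 3 homogeneous lines l' in the second image
    segments1 : n x 2 x 3 homogeneous endpoints (p, q) of the matching
                line in the first image
    N1, N2    : 3x3 normalizing S*T transforms for each image
    """
    rows = []
    for l2, (p, q) in zip(lines2, segments1):
        l2n = np.linalg.inv(N2).T @ l2       # lines transform contravariantly
        pn, qn = N1 @ p, N1 @ q              # normalized endpoints
        rows.append(np.kron(l2n, pn))        # l'^t H p = 0, row-major vec(H)
        rows.append(np.kron(l2n, qn))        # l'^t H q = 0
    _, _, vt = np.linalg.svd(np.asarray(rows))
    Hn = vt[-1].reshape(3, 3)
    return np.linalg.inv(N2) @ Hn @ N1       # denormalize: H = N2^-1 Hn N1
```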

3.5.5.3 Registration from Line Correspondences and Their 1-D Homographies

Corresponding lines l and l' of a planar surface are related by a homography $l = H^tl'$ in homogeneous coordinates. In the coordinate system of each line's parameterization, points along corresponding lines are related by a Möbius transform (1-D homography). At least four pairs of line correspondences are needed to determine a 2-D homography; however, only two line correspondences with their 1-D homographies are needed. Inclusion of the 1-D homography reduces the search space of a 2-D homography by two orders of magnitude.

Recall the parametric equation of a line and the 2-D homography,

$$\underset{3\times2}{P} = \begin{bmatrix} (q_x - p_x)/l & p_x \\ (q_y - p_y)/l & p_y \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} p^1 & p^2 \end{bmatrix}, \qquad \underset{3\times3}{H} = \begin{bmatrix} h^1 \\ h^2 \\ h^3 \end{bmatrix},$$

where $p^j$ is a column of P and $h^i$ is a row of H. Also recall the pseudo inverse of the parametric equation,

$$\underset{2\times3}{P^+} = \begin{bmatrix} \cos\theta & \sin\theta & -(p_x\cos\theta + p_y\sin\theta) \\ 0 & 0 & 1 \end{bmatrix}.$$

First, project the 2-D homography onto the 1-D homography, $\underset{2\times2}{M} = \underset{2\times3}{P'^{+}}\, \underset{3\times3}{H}\, \underset{3\times2}{P}$. Note that P is derived from l, while the pseudo inverse $P'^{+}$ is derived from the corresponding line l'. This relationship defines three equations, because the

Möbius transform M has three degrees of freedom, and the Fundamental Theorem of Möbius Geometry [4] says that three point correspondences uniquely define a Möbius transform. Unfortunately, the Möbius transform in homogeneous coordinates is unique only up to a scale factor.

A similar problem is encountered as when performing registration with points or lines. The isomorphism converting a Möbius transform from homogeneous coordinates to Cartesian coordinates, $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \mapsto f(z) = \frac{az + b}{cz + d}$, is not well suited to ideal points, where the denominator is zero or even near zero. To avoid the division problem, note that $\text{adj}(M) \cdot \left(\underset{2\times3}{P'^{+}}\, \underset{3\times3}{H}\, \underset{3\times2}{P}\right) = |M|\, \underset{2\times2}{I}$ is the homogeneous identity $\begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}, k \neq 0$, where the adjoint of M is given by $\text{adj}(M) = \begin{bmatrix} m_{22} & -m_{12} \\ -m_{21} & m_{11} \end{bmatrix}$. This produces two equations where each off-diagonal element equals zero and a third equation where the diagonal elements are equal to each other. For convenience, make the following substitution for the adjoint of M and the pseudo inverse of the parameterization of the line,

$$\underset{2\times3}{Q} = \text{adj}(M) \cdot P'^{+}.$$

Now factor the homography coefficients $\underset{9\times1}{h} = \begin{bmatrix} h^{1t} \\ h^{2t} \\ h^{3t} \end{bmatrix}$ from the identity $Q\,H\,P = |M|\,I$. In block matrix form,

$$\underset{2\times3}{Q}\, \underset{3\times3}{H}\, \underset{3\times2}{P} = \begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \end{bmatrix} \begin{bmatrix} h^1 \\ h^2 \\ h^3 \end{bmatrix} \begin{bmatrix} p^1 & p^2 \end{bmatrix},$$

where $p^j$ is a column of P and $h^i$ is a row of H. Multiplication yields

$$Q\,H\,P = \begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \end{bmatrix} \begin{bmatrix} h^1p^1 & h^1p^2 \\ h^2p^1 & h^2p^2 \\ h^3p^1 & h^3p^2 \end{bmatrix}.$$

Another multiplication produces

$$Q\,H\,P = \begin{bmatrix} q_{11}h^1p^1 + q_{12}h^2p^1 + q_{13}h^3p^1 & q_{11}h^1p^2 + q_{12}h^2p^2 + q_{13}h^3p^2 \\ q_{21}h^1p^1 + q_{22}h^2p^1 + q_{23}h^3p^1 & q_{21}h^1p^2 + q_{22}h^2p^2 + q_{23}h^3p^2 \end{bmatrix} = |M|\,I.$$

Now, reshape to a 4×1 matrix,

$$\begin{bmatrix} q_{11}h^1p^1 + q_{12}h^2p^1 + q_{13}h^3p^1 \\ q_{11}h^1p^2 + q_{12}h^2p^2 + q_{13}h^3p^2 \\ q_{21}h^1p^1 + q_{22}h^2p^1 + q_{23}h^3p^1 \\ q_{21}h^1p^2 + q_{22}h^2p^2 + q_{23}h^3p^2 \end{bmatrix} = \begin{bmatrix} |M| \\ 0 \\ 0 \\ |M| \end{bmatrix}.$$

Grouping the top two rows and the bottom two rows and transposing each $h^ip^j = p^{jt}h^{it}$ yields

$$\begin{bmatrix} q_{11}P^t & q_{12}P^t & q_{13}P^t \\ q_{21}P^t & q_{22}P^t & q_{23}P^t \end{bmatrix}_{4\times9} \begin{bmatrix} h^{1t} \\ h^{2t} \\ h^{3t} \end{bmatrix}_{9\times1} = \begin{bmatrix} |M| \\ 0 \\ 0 \\ |M| \end{bmatrix}_{4\times1},$$

which is of the form of a Kronecker tensor product, $\left(\underset{2\times3}{Q} \otimes \underset{2\times3}{P^t}\right) \underset{9\times1}{h} = \begin{bmatrix} |M| & 0 & 0 & |M| \end{bmatrix}^t$. Subtracting the first and last row and substituting for Q produces three linear equations expressed in the coefficients of H,

$$\underset{3\times4}{\begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}} \left( \left(\text{adj}(M)\, P'^{+}\right) \otimes P^t \right)_{4\times9} \underset{9\times1}{h} = \underset{3\times1}{0}.$$

Equation 10 Three Linear Equations from a Line Correspondence and 1-D Homography The adjoint is employed instead of the inverse of M due to the arbitrary scale factor implicit in the homogeneous identity. Note that $M^{-1} = \frac{1}{|M|}\text{adj}(M)$; the determinant is not needed due to the arbitrary scale factor.

One more equation is needed from the line correspondence, because the projection onto the 1-D homography only produces equations expressed in the tangential direction of each line of the correspondence. An equation is borrowed from the model of line correspondence,

$$\begin{bmatrix} p^t \\ q^t \end{bmatrix} H^t l' = 0,$$

to constrain the direction normal to the lines of the correspondence. Either of these equations provides the required constraint; the top equation is selected arbitrarily. When used in conjunction with the three equations from the projection onto the 1-D homography, the second line/endpoint equation becomes linearly dependent.

Two pairs of line correspondences and their associated 1-D homographies produce eight linear equations in the nine variables of the 2-D homography H. This is sufficient to solve for the 2-D homography up to an arbitrary scale factor. However, there are typically more than two line correspondences and their homographies, so an over-determined system is solved as before by finding the eigenvector corresponding to the minimum eigenvalue of $A^tA$. Once again, numerical techniques will fail if $A^tA$ is ill-conditioned. The data must first be normalized so that it is diagonally dominant. For lack of an elegant solution, the following brute-force isomorphism is applied to the Möbius transform to lift from full-scale coordinates back to normalized space,

$$\tilde{M} = \tilde{P}'^{+}\, S'T'\, P'\, M\, P^{+}\, T^{-1}S^{-1}\, \tilde{P}.$$

Equation 11 Full-Scale to Normalized Möbius Isomorphism M is the full-scale Möbius transform and $\tilde{M}$ is the corresponding transform expressed in the normalized geometry. Reading the transformation from right to left, $\tilde{P}$ is the normalized parameterization of the line in the first image (Equation 2). $T^{-1}S^{-1}$ is the conversion from normalized coordinates to full-scale coordinates in the first image (Equation 9). Then apply the pseudo inverse of the parameterization (Equation 3) in the first image. The first half of the transform, $P^{+}T^{-1}S^{-1}\tilde{P}$, serves as an adapter to M by converting normalized coordinates in the tangential coordinate system of a line into full-scale coordinates. The second half of the transform, $\tilde{P}'^{+}S'T'P'$, performs the inverse of these operations with respect to the second image (prime annotation).

$\tilde{P}'^{+}$ (Equation 3) and $\tilde{P}$ (Equation 2) must also be computed using normalized coordinates (Equation 9). Applying these normalizations to the model yields $\tilde{M} = \tilde{P}'^{+}\, \tilde{H}\, \tilde{P}$. Proceed as before to build equations from $\tilde{Q}\,\tilde{H}\,\tilde{P} = |\tilde{M}|\,I$. Recall that this is the equation of a homogeneous identity, so the equality is known up to an arbitrary scale. Applying the former result yields three normalized equations,

$$\begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \left( \left(\text{adj}(\tilde{M})\, \tilde{P}'^{+}\right) \otimes \tilde{P}^t \right) \tilde{h} = 0.$$

Equation 12 Normalized Equations with Line Correspondence and 1-D Homography Using these three equations, solve a linear system for the normalized homography $\tilde{h} = \tilde{H}$.

As before, the denormalized homography is computed using the scale and translation matrices, $H = T'^{-1}S'^{-1}\tilde{H}\,S\,T$. Refer to Appendix H, Image Registration with Lines and 1-D Homographies, for examples.
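The projection $M = P'^{+}HP$ is easy to sanity-check numerically. The sketch below (numpy; synthetic data; helper names are illustrative) builds the parameterization and pseudo inverse of a segment, maps an arc-length parameter through a 2-D homography, and reads the corresponding parameter on the mapped segment:

```python
import numpy as np

def line_param(p, q):
    """3x2 parameterization P and 2x3 pseudo inverse P+ of segment p -> q
    (a reconstruction of Equations 2 and 3, with arc-length parameter)."""
    l = np.hypot(*(q - p))
    c, s = (q - p) / l                       # unit direction (cos, sin)
    P = np.array([[c, p[0]], [s, p[1]], [0.0, 1.0]])
    P_pinv = np.array([[c, s, -(p[0] * c + p[1] * s)], [0.0, 0.0, 1.0]])
    return P, P_pinv

# Synthetic check: map arc length t on the first segment through H and
# compare against the Mobius transform M acting directly on t.
H = np.array([[1.01, 0.02, 3.0], [-0.01, 0.99, -2.0], [1e-5, 2e-5, 1.0]])
p, q = np.array([10.0, 20.0]), np.array([110.0, 45.0])
P, _ = line_param(p, q)

x2 = H @ P @ [0.0, 1.0]                      # image of p under H
y2 = H @ P @ [np.hypot(*(q - p)), 1.0]       # image of q under H
P2, P2_pinv = line_param(x2[:2] / x2[2], y2[:2] / y2[2])

M = P2_pinv @ H @ P                          # the 1-D homography (Mobius)
u = M @ [30.0, 1.0]                          # parameter t = 30 on segment 1
print(u[0] / u[1])                           # arc length on segment 2
```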

3.6 Line Correspondence

Once a 3-D planar surface is established and a homography is found to register the 2-D image of this surface between two images, a 1-to-1 line correspondence may be established. For two lines to correspond, they must align and their line segments must overlap. Ideally, corresponding lines are related by the 2-D homography, $l_1^t = l_2^t H$, where $l_1$ is the line from the first image and $l_2$ is its corresponding line from the second image. The line segment endpoints, x and y, from image 1 must lie upon the line in image 2 when mapped by the homography, $l_2^t H [x_1\ y_1] = 0$. Due to noise and sampling errors, the mapping is not exact, so the distances between the image of the endpoints and the corresponding line segment are measured,

$$\underset{1\times3}{[A_2\ B_2\ C_2]}\, \underset{3\times3}{H}\, \underset{3\times2}{[x_1\ y_1]} = \underset{1\times2}{d}, \qquad A_2^2 + B_2^2 = 1.$$

The line segments align when the distances are small. For the high-resolution imagery in this research (5456 × 3632), the distance threshold for line segment alignment is 8 pixels. Alignment is only half of the correspondence detection; the line segments also have to overlap. Given two line segments with endpoints $(p_1, q_1)$ and $(p_2, q_2)$, overlap is detected by resampling the second line segment into the geometry of the first, scaling the x-axis by the length of the first, and then checking the resampled endpoints of the second line using the equation

$$\begin{bmatrix} 1/l_1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -p_{1x} \\ 0 & 1 & -p_{1y} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} p_{2x} & q_{2x} \\ p_{2y} & q_{2y} \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \bar{p}_{2x} & \bar{q}_{2x} \\ \bar{p}_{2y} & \bar{q}_{2y} \\ 1 & 1 \end{bmatrix},$$

as shown in Figure 38 below. The length of the first line segment is $l_1 = \sqrt{(p_{1x} - q_{1x})^2 + (p_{1y} - q_{1y})^2}$, the cosine of the rotation angle is $\cos\theta = \frac{q_{1x} - p_{1x}}{l_1}$, and the sine of the rotation angle is given by $\sin\theta = \frac{q_{1y} - p_{1y}}{l_1}$.

Figure 38 Resampled Line Segments The two line segments in the left image are resampled into the geometry of the first segment by a similarity transform aligning the first line segment to the x-axis and normalizing its length to one.

The lines do NOT overlap if both of the resampled endpoints, $\bar{p}_{2x}$ and $\bar{q}_{2x}$, are greater than 1.0 or both are less than 0.0. Otherwise, the two line segments overlap. When two corresponding line segments between image frames both align and overlap under a homography, then the lines correspond. A list of true correspondences is generated and all outliers are removed.
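A compact sketch of the combined alignment and overlap test follows, assuming numpy; here the overlap is checked after mapping the first segment into the second image, which is one reasonable reading of the procedure above:

```python
import numpy as np

def lines_correspond(H, line2, p1, q1, p2, q2, tol=8.0):
    """Alignment + overlap test for a putative line correspondence.

    H     : 3x3 homography mapping image-1 points into image 2
    line2 : homogeneous line [A, B, C] detected in image 2
    p1,q1 : Cartesian endpoints of the segment in image 1
    p2,q2 : Cartesian endpoints of the segment in image 2
    """
    # Alignment: the mapped endpoints must lie within tol of line2.
    line2 = np.asarray(line2, dtype=float)
    l2 = line2 / np.hypot(line2[0], line2[1])       # A^2 + B^2 = 1
    m = [H @ np.append(e, 1.0) for e in (p1, q1)]
    m = [v[:2] / v[2] for v in m]                   # dehomogenize
    if any(abs(l2 @ np.append(v, 1.0)) > tol for v in m):
        return False
    # Overlap: resample segment 2 into the geometry of the mapped
    # segment 1 (unit length along the x-axis) and check its extent.
    d = m[1] - m[0]
    l1 = np.linalg.norm(d)
    s = np.array([(d / l1) @ (e - m[0]) for e in (p2, q2)]) / l1
    return not ((s > 1.0).all() or (s < 0.0).all())
```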

3.7 3-D Wireframe

At this point, planar surfaces are identified and the lines within each planar surface are correlated in a 1-to-1 line segment correspondence between two images. In this section, the 3-D line segments are reconstructed from the 2-D correspondences and the camera matrices. Bundle adjustment is performed to remove reconstruction error by projecting each line onto a best-fit planar surface resulting in true coplanarity.

Given a 2-D line segment and the camera matrix for the image, the line may be back-projected through the image plane (Figure 8), yielding a planar surface, $\underset{1\times4}{\pi^t} = \underset{1\times3}{l^t}\, \underset{3\times4}{P}$. Note that the first three elements of the planar surface, $n = [\pi_1\ \pi_2\ \pi_3]^t$, represent the normal vector to the plane. Two such planar surfaces of corresponding back-projected lines intersect in the preimage of the reconstructed 3-D line. For numerical stability, these planes must not be parallel or near-parallel. The cosine of the angle between planes is computed via a dot product, and planes that form an angle less than 10° are rejected, $n_1 \cdot n_2 > \cos 10°$ where $\|n_1\|_2 = \|n_2\|_2 = 1$. The intersection of two planar surfaces results in an infinitely long line. To find the 3-D coordinates of each line segment endpoint, a third line through the endpoint is back-projected to a planar surface as well, as in Figure 39 below. The equation of a 2-D line in general form is

$Ax + By + C = 0$, and the normal vector to the line is given by $[A\ B]^t$, as depicted in Figure 2. An orthogonal line would have a normal vector of $[-B\ A]^t$. When drawn through the line's endpoints, (p, q), the normals have equations $-Bx + Ay + Bp_x - Ap_y = 0$ and $-Bx + Ay + Bq_x - Aq_y = 0$.

Figure 39 3-D Line Segment Endpoints

Each 3-D line segment endpoint is found through the intersection of three planes. Two planes are back-projected from line correspondences between two images. The third plane is back-projected through a normal to the 2-D line through the required endpoint. These three planes have a unique solution at the 3-D line segment endpoint. There are four such 3-D endpoints for the line segment correspondence. The two points furthest apart, $d = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2 + (p_z - q_z)^2}$, are saved as the 3-D line segment endpoints.
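A minimal numpy sketch of the endpoint triangulation follows (illustrative names); each plane is the back-projection $\pi = P^t l$, and the endpoint is the right null vector of the stacked 3×4 system:

```python
import numpy as np

def backproject(P, line):
    """Back-project an image line to a plane: pi = P^t l (Section 3.7)."""
    return np.asarray(P).T @ np.asarray(line, dtype=float)

def endpoint_3d(P1, l1, P2, l2, endpoint):
    """3-D segment endpoint as the intersection of three planes.

    endpoint is the Cartesian endpoint (x, y) of l1 = [A, B, C]; the
    third plane back-projects the orthogonal line through that endpoint.
    """
    A, B, _ = l1
    normal_line = np.array([-B, A, B * endpoint[0] - A * endpoint[1]])
    planes = np.vstack([backproject(P1, l1),
                        backproject(P2, l2),
                        backproject(P1, normal_line)])
    _, _, vt = np.linalg.svd(planes)     # right null vector of 3x4 system
    X = vt[-1]
    return X[:3] / X[3]
```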

Given 3-D line segment endpoints corresponding to the same planar surface, error must be removed from the back-projection process by fitting these points to a best-fit plane. The best-fit plane is found through the minimization of the sum of squared errors, by solving the over-determined linear system

$$\begin{bmatrix} P_{1x} & P_{1y} & P_{1z} & 1 \\ Q_{1x} & Q_{1y} & Q_{1z} & 1 \\ \vdots & \vdots & \vdots & \vdots \\ P_{nx} & P_{ny} & P_{nz} & 1 \\ Q_{nx} & Q_{ny} & Q_{nz} & 1 \end{bmatrix} \begin{bmatrix} \pi_1 \\ \pi_2 \\ \pi_3 \\ \pi_4 \end{bmatrix} = 0.$$

This is achieved by computing the SVD of the matrix of 3-D endpoints and selecting the vector corresponding to the smallest singular value.

Once the best-fit planar surface has been computed, $\pi = [\pi_1\ \pi_2\ \pi_3\ \pi_4]^t$, all 3-D line segment endpoints must be projected to this surface in a process known as bundle adjustment. Vector projection can be performed one point at a time by an analytic process as described in reference [2]. However, when working with hundreds of lines at a time, it is much easier to transform the 3-D coordinate system into the 2-D coordinate system of the planar surface. In this manner, we can project all of the points to the best-fit plane at once. First, translate the 3-D line segment endpoints to have a mean at the origin. Then rotate about the z-axis and finally rotate about the y-axis,

$$T = \begin{bmatrix} \cos\beta & 0 & -\sin\beta & 0 \\ 0 & 1 & 0 & 0 \\ \sin\beta & 0 & \cos\beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\alpha & \sin\alpha & 0 & 0 \\ -\sin\alpha & \cos\alpha & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & -\mu_x \\ 0 & 1 & 0 & -\mu_y \\ 0 & 0 & 1 & -\mu_z \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

where $\cos\alpha = \frac{x}{\sqrt{x^2 + y^2}}$, $\sin\alpha = \frac{y}{\sqrt{x^2 + y^2}}$, $\cos\beta = \frac{\sqrt{x^2 + y^2}}{\sqrt{x^2 + y^2 + z^2}}$, and $\sin\beta = \frac{z}{\sqrt{x^2 + y^2 + z^2}}$, as shown in Figure 40 below.

Figure 40 Transformation of 3-D coordinates to 2-D planar coordinates The normal vector to the bundled plane is represented by $[x\ y\ z]^t$.

Once transformed into this coordinate system, the x coordinates are zeroed and the points are mapped back to their bundled 3-D position via $T^{-1}$. In this manner, all of the 3-D line segment endpoints may be transformed at once with only two matrix multiplications. The wireframe model is produced by bundling the reconstructed 3-D line segments of each planar surface.
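The sketch below fits the plane by SVD and projects the endpoints in one vectorized step. Rather than forming the explicit rotations of the text, it subtracts each point's component along the fitted normal, which yields the same orthogonal projection (numpy assumed):

```python
import numpy as np

def bundle_to_plane(points):
    """Fit a best-fit plane to 3-D endpoints and project them onto it.

    Equivalent to the rotate / zero / rotate-back procedure of the text,
    expressed directly as an orthogonal projection along the normal.
    """
    pts = np.asarray(points, dtype=float)            # n x 3 endpoints
    A = np.hstack([pts, np.ones((len(pts), 1))])
    _, _, vt = np.linalg.svd(A)                      # least-squares plane pi
    pi = vt[-1]
    n = pi[:3] / np.linalg.norm(pi[:3])              # unit plane normal
    d = pi[3] / np.linalg.norm(pi[:3])
    dist = pts @ n + d                               # signed distances
    return pts - np.outer(dist, n)                   # projected endpoints
```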

3.8 Polygon Boundary

A triangular mesh and polygon boundary must be constructed for the purpose of texture mapping.

First, the 3-D line segments in a planar surface are transformed to a localized 2-D coordinate system as described in the previous section. Given a cloud of 2-D line segments, their endpoints may be used to construct a triangular mesh using the work of Delaunay [61]. The non-convex hull is constructed using a process defined by Duckham et al. [62]. Boundary edges are identified from the Delaunay triangulation and then sorted by length. The longest boundary edge is removed, provided it is longer than a specified threshold and the resulting boundary remains closed, as shown in Figure 41 below.


Figure 41 Construction of a non-convex hull Long edges are removed in succession until a tight polygon boundary is produced.

An edge belongs to an external boundary if it is a side of only one triangle in the mesh. An edge removal results in a closed polygon as long as its triangle has one internal vertex in the mesh.

Using this technique of edge removal, we obtain a tightly-fitting non-convex polygon boundary around the 2-D line segments. These segments and their triangulation are then transformed back to their 3-D position as specified in the previous section. Scene textures are then mapped from their 3-D triangles to the 2-D imagery via the camera matrix. OpenGL handles the texture mapping from the triangle in the 2-D imagery back to the 3-D model. In this manner, all reconstructed planar surfaces are texture mapped with 2-D imagery.
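The boundary-edge identification underlying the non-convex hull is straightforward with a standard Delaunay triangulation; the sketch below uses scipy (an assumption, since the text does not name an implementation) to list the edges belonging to exactly one triangle, which are the candidates for the longest-edge removal step:

```python
import numpy as np
from scipy.spatial import Delaunay

def boundary_edges(points_2d):
    """Edges of a Delaunay triangulation incident to only one triangle."""
    tri = Delaunay(np.asarray(points_2d, dtype=float))
    counts = {}
    for simplex in tri.simplices:
        for a, b in ((0, 1), (1, 2), (2, 0)):
            e = tuple(sorted((simplex[a], simplex[b])))
            counts[e] = counts.get(e, 0) + 1
    return [e for e, c in counts.items() if c == 1]   # external boundary
```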

4 Results

Results of edge detection, line detection, wireframe modelling and 3-D reconstruction are presented in this section.

4.1 Edge Detection

Given the raw image sizes used in this research, 5456 × 3632, only small windows can be shown due to page size and resolution limits (Figure 42).


Figure 42 Edge Detection Results The top image shows the result of Canny edge detection. This technique is the gold standard and produces binary edges. The bottom edge detection is grayscale and is produced by the non-linear filter of Equation 6.

Canny edge detection is an edge-walking algorithm that uses image gradients to detect the edge. The starting pixels for edge traversal are selected by an upper intensity threshold. The edge is traversed until the gradient falls below a lower threshold. If the lower threshold is too low, then the traversal loses track of the edge. If the lower threshold is too high, then only the most intense edges are preserved and lines are broken. With the non-linear filter, a grayscale image is produced that must still be thresholded to produce a binary image. In this approach, lines never lose track but may still be broken when thresholded. Care must be taken with the Canny edge detection to avoid track loss during traversal. For the purpose of line segment detection, both algorithms are comparably sufficient as long as the Canny detection doesn't lose track.

4.2 Line Detection

The gold standard for line segment detection is the Hough algorithm. However, it is concluded in this research that the Hough algorithm only works when there are few primitives in the scene. Hough works well for the detection of a calibration box of only 12 line segments, but fails to perform when the scene contains hundreds or even thousands of lines. The problem with the Hough transform is the location of peak values in the Hough space. With thousands of lines, the Hough space is clouded, as shown in Figure 43 below.

Figure 43 Hough space ρ resolution: 5 pixels; θ resolution: 0.5°; 323 peaks found.


Figure 44 Line Detection Results The top image shows the results of the Hough line detection in Matlab. The bottom image shows only the longest lines from the template matching line detection described in this research.

The Hough line detection algorithm in Matlab detected 235 lines in 16.86 seconds (13.9 lines per second). By contrast, the template matching line detection found 14K lines in 403.5 seconds (34.7 lines per second). When pipelined on a six-core processor with 12 hyperthreads, the template matching line detection processed 12 frames and detected 15.7K lines per frame on average, for a total bandwidth of 113.7 lines per second. The selling point of the template matching line detection is the density of the results. The Hough transform simply does not produce enough lines for 3-D reconstruction.

4.3 3-D Models

This section showcases the 3-D models that were produced using the techniques from the Method section above. Multiple snapshots are displayed to simulate 3-D in a 2-D document format.

4.3.1 Model Doll House

The reconstruction of the doll house discovered 18 planar surfaces containing 1199 line segments, as shown in Figure 45 and Figure 46. Lines were detected in a series of 28 images, each of size 5456 × 3632 pixels, producing 178,491 line segments in 1:08:53.668 hours. On average, line detection found 6.37K lines per frame in 147 seconds each. The average rate of line detection was 43.18 lines per second.

The 3-D reconstruction took 0.687 seconds on an Intel i7 six-core processor running at 3.2GHz with 16GB of RAM. CPU utilization remained at 100% for the reconstruction, but RAM utilization was negligible. The uncompressed image size is 56MB, resulting in a one-second disk read time on a SATA III drive at 7800 RPM. Both edge detection and line template matching are quick algorithms and can be performed in real time. Their CPU times were not measured because the system call to retrieve the high-resolution CPU counter would dominate the processing time. These algorithms could only be timed if they were run many times on the same image and an average were computed, diminishing the latency caused by reading the high-resolution timer. The predominant factor in the performance of line detection is the use of a flood-fill algorithm to cluster templates of compatible direction. It is a wasteful algorithm whose unnecessary and redundant memory read operations account for the 1-hour run time. The 3-D reconstruction was performed in 0.687 seconds because the doll house comprised few planar surfaces, each having few line segments.


Figure 45 Doll House Wireframe


Figure 46 Doll House Textures

4.3.2 House Reconstruction

The reconstruction of the full-scale house discovered 13 planar surfaces containing 21,147 line segments, as shown in Figure 47 and Figure 48. Lines were detected in a series of 12 images, each of size 5456 × 3632 pixels, producing 188,475 line segments in 0:27:37.870 hours. On average, line detection found 15.7K lines per frame in 127 seconds each. The average rate of line detection was 113.69 lines per second.

The 3-D reconstruction took 108.99 seconds on an Intel i7 six-core processor running at 3.2GHz with 16GB of RAM. Once again, CPU utilization remained at 100% for the reconstruction, but RAM utilization was negligible. The uncompressed image size is 56MB, resulting in a one-second disk read time on a SATA III drive at 7800 RPM. Both edge detection and line template matching are quick algorithms and can be performed in real time; hence, their CPU times were not measured. Once again, the predominant factor in the performance of line detection is the use of a non-optimal flood-fill algorithm to cluster templates of compatible direction. This 3-D reconstruction took significantly longer than the doll house because it contained many planar surfaces and many line segments in each surface. Uncontrollable shadows and occlusions made correlation between frames more difficult, as the overlapping area of detected lines was non-optimal. On the lighter side, the scene textures resulted in a dense 3-D line cloud. Photography of the scene was constrained by natural structures such as trees and bushes, as well as neighboring structures, effectively prohibiting many camera positions. The full-scale size of the scene denied aerial images at close range, resulting in negligible roof coverage.

Figure 47 House Wireframe


Figure 48 House Textures


4.4 Error Analysis

As mentioned in the 3-D wireframe reconstruction, back-projected planar surfaces forming an angle of less than 10° are rejected, because the distance of the reconstructed line to the camera center becomes sensitive to error. To illustrate this concept, consider the nadir view diagram below. We have two corresponding images of a primitive object. Both cameras are at a distance, d, from the scene. The angle, θ, represents the angle between back-projected planar surfaces and α represents the error in camera direction.

Figure 49 Illustration of depth error

The relative depth error ε/d is plotted as a function of the inscribed planar angle, θ, and the principal camera axis error, α. When the inscribed planar angle is above 10°, the relative depth error is insignificant. However, relative error is sensitive to small perturbations in α when the inscribed planar angle is less than 10°. For this reason, corresponding line segments that back-project to planar surfaces having an inscribed angle less than this threshold are rejected. Structure from motion (SFM) implies that there must be some motion between successive images; when the motion is too little, the triangulation of primitives is ill-conditioned.


Figure 50 Relative depth error

For the model doll house, precise measurements can be taken and compared with the 3-D wireframe model. In this analysis, three planar surfaces were selected as shown in Figure 51 below.

Figure 51 Planar Surfaces Used in Error Analysis
The three planar surfaces used for the error analysis in Table 3.

Table 3 Error Analysis

ID  measured  computed  absolute  relative
 1      1.75      1.85      0.10     5.71%
 2      1.56      1.63      0.07     4.49%
 3      3.14      3.22      0.08     2.55%
 4      3.14      3.22      0.08     2.55%
 5      2.34      2.42      0.08     3.42%
 6      2.34      2.42      0.08     3.42%
 7      9.40      9.47      0.07     0.74%
 8      1.74      1.87      0.13     7.47%
 9      1.56      1.64      0.08     5.13%
10      3.12      3.29      0.17     5.45%
11      2.34      2.46      0.12     5.13%
12      3.12      3.29      0.17     5.45%
13      2.34      2.43      0.09     3.85%
14      9.40      9.30      0.10     1.06%
15      3.76      3.83      0.07     1.86%
16      3.98      4.00      0.02     0.50%

Table 3 shows the measured and computed dimensions in inches, as well as the absolute and relative errors, for pairs of parallel lines. The sampled dimensions from the 3-D reconstruction contain an average absolute error of 0.09 inches with a standard deviation of 0.038 inches. Relative error ranges from 0.5% to 7.5%.

5 Conclusion

This section summarizes my contributions to the fields of computer vision and structure from motion. Research challenges are summarized and recommendations are provided for future research.

5.1 Summary of Contributions

The main contribution of this research is the use of lines, rather than the points of historical approaches, as the primitive elements for reconstructing 3-D scene geometry from multiview 2-D imagery. The result is an order-of-magnitude data reduction. The problem is a combination of camera matrix estimation and identification of line correspondences across multiple images. Zhang [63] performed early work in the field by reconstructing 3-D line segments when given a correlation between 2-D line segments; his iterative numerical algorithm refines camera motion to maximize overlap between the reconstructed 3-D segments. Similar to Zhang's approach, Bartoli and Sturm [66] also assume that the line correspondence is an input to the algorithm, and the camera position and orientation are computed in an optimization problem. However, a 2-D line-segment correlation is equivalent to having the 3-D reconstruction; the hard problem is the correlation of primitives, as addressed in section 3.5. Schindler, Krishnamurthy and Dellaert [64] improve upon optimization approaches by constraining detected lines to horizontal and vertical edges, thus reducing the search space of the camera motion. It seems a reasonable assumption that urban scenes are filled with mutually orthogonal lines, but real-world structures contain many angles. Early in my research I tried something similar, until I was foiled by a triangular building in Providence, RI [3]. Without knowing scene geometry a priori, it is difficult to determine the 3-D orientation of a line; the vertical and horizontal dispositions were given as input in their work. My research makes no assumption about 3-D line orientation. Taylor and Kriegman [65] improve upon optimization approaches by parameterizing the 3-D lines and camera matrix while a numerical algorithm perturbs these parameters until the 2-D projections of the lines match the actual images. In this approach, the correlation of lines between images is not needed as input; the tradeoff is an increased search space. Their approach has only been shown successful for small scenes containing few line segments (fewer than 50 lines).

Chandraker, Lim and Kriegman [67] reformulate the problem using a calibrated stereo rig. Two poses of the rig capture four images of the scene, where the two images of a single pose are related via the calibration of the rig. They solve for the motion of the rig by tracking lines between successive video frames having only small perturbations. The putative line correlation from their tracking algorithm contains errors that are eliminated via RANSAC. The idea is that four images of the same line produce back-projected planes intersecting in a common 3-D line, so a 4 × 4 matrix composed of these planes should have rank two. Given at least three such correspondences, they construct a 4n × 4 linear system in the unknown rotation and translation between poses of the camera rig.
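The rank-two condition is easy to check numerically. The following sketch stacks four synthetic back-projected planes through a common 3-D line into a 4 × 4 matrix; the line and the auxiliary points are made-up values used for illustration only.

import numpy as np

# Hypothetical back-projected planes pi = [n, d] from four views of one line.
# For a consistent correspondence all four planes contain the same 3-D line,
# so the stacked 4x4 matrix has rank two.
line_point = np.array([1.0, 2.0, 3.0])
line_dir = np.array([0.5, -1.0, 0.25])

def plane_through_line(aux):
    """Plane [n, d] containing the 3-D line and the auxiliary point aux."""
    n = np.cross(line_dir, aux - line_point)
    return np.append(n, -n @ line_point)

W = np.vstack([plane_through_line(np.array(a, dtype=float))
               for a in ([0, 0, 10], [4, 1, 8], [-3, 2, 9], [1, -5, 7])])
print(np.linalg.matrix_rank(W, tol=1e-9))   # prints 2 for a valid correspondence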

My research is the only known method that assumes urban scenes must have planar surfaces containing straight lines. This assumption allows planar surfaces to be located by the consensus algorithm of section 3.4. It also gives the correlation of primitives (section 3.6) via the registration techniques of section 3.5. Finally, the detection of planar surfaces enables the bundle adjustment and polygon boundaries of sections 3.7 and 3.8. It is this assumption of planar surfaces that makes my research novel.

Secondary contributions to the fields of computer vision and structure from motion are as follows. In the field of image processing, I have contributed a non-linear edge-detection filter that solves the lost-track problem of the Canny edge detector and improves upon separable FIR filters in that its response to an edge is independent of the edge direction. I have contributed the directional polynomial point descriptor used in matching corner points, as well as a crude matching along the points of a line. The linear algebra of the homographic registration utilizes 1-D homographies to reduce the number of line-segment correspondences needed for 2-D registration: in effect, the desired 2-D homography is projected to a 1-D homography, either by the point-descriptor correspondence or by projection of the camera matrix. I have contributed to the field of aerial image registration through an investigation of line-segment orientations. Finally, I have recommended the use of line segments for the reconstruction of manufactured scenes containing planar surfaces, reducing the primitive data by an order of magnitude.

5.2 Challenges Encountered

This research encountered many challenges; a few are highlighted in this section. The three largest problems in 3-D reconstruction from 2-D imagery by triangulation are the detection of primitives (points, lines), camera matrix estimation, and the correlation of primitives between images. I believe that I have solved the problem of line-segment detection: more-than-sufficient line-segment densities were produced. The problem of camera matrix estimation remains challenging. It can easily be solved by introducing a calibration box into the scene; this technique is viable in controlled laboratory environments and is useful for reconstructing 3-D models for virtual-reality simulations. Outside the laboratory, it is infeasible to build a calibration structure around the scene. GPS satellites are at known locations and always enclose the scene, but their altitudes are too high to produce useful camera matrices. For outdoor scenes, the camera position and orientation must be known. Finally, the correlation of primitives is a hard problem. The assumption of large planar surfaces and the octant-subdivision technique mitigate the correspondence problem, but all lines look alike; one must consider the configuration of lines within a planar surface for an accurate correspondence.

5.3 Recommendations for Future Research

The performance of the template-matching line-detection algorithm is slow for large images. This is due to the number of memory references made when clustering neighboring pixels of compatible directions with a recursive flood-fill algorithm. For each pixel contributing to the orientation of the line, the algorithm searches all eight of its neighboring pixels, resulting in many unnecessary memory references. It is recommended that the traversed orientation be used to limit the memory references to the line-segment pixels only, as sketched below. It is believed that the algorithm could then run in real time without hardware acceleration.
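As a rough illustration of this recommendation (not the implemented algorithm), the sketch below walks outward from a seed pixel along its quantized template orientation instead of flooding all eight neighbors. The array names and the 0.2-radian compatibility threshold are assumptions.

import numpy as np

def trace_segment(direction, seed, visited):
    """Walk both ways from `seed` along the orientation stored at the seed,
    touching only pixels whose orientation is compatible. `direction` is a
    per-pixel orientation field in radians; `visited` is a boolean mask."""
    theta = direction[seed]
    dy, dx = int(round(np.sin(theta))), int(round(np.cos(theta)))
    segment = [seed]
    visited[seed] = True
    for sy, sx in ((dy, dx), (-dy, -dx)):          # walk both directions
        y, x = seed[0] + sy, seed[1] + sx
        while (0 <= y < direction.shape[0] and 0 <= x < direction.shape[1]
               and not visited[y, x] and abs(direction[y, x] - theta) < 0.2):
            visited[y, x] = True
            segment.append((y, x))
            y, x = y + sy, x + sx
    return segment

direction = np.zeros((5, 8))                 # toy field: horizontal orientation
visited = np.zeros_like(direction, bool)
print(trace_segment(direction, (2, 3), visited))

Only the pixels along the traced segment are touched, rather than all eight neighbors of every pixel in the cluster.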

Data collection is also a challenging problem. Research should be performed to augment an existing 3-D model for the purpose of refinement: given a crude 3-D model, it could be refined through the introduction of higher-resolution imagery in a coarse-to-fine approach to modeling. The coarse model could guide the reconstruction of a fine model, and low-resolution texture maps could be replaced with high-resolution imagery.

Line segment detection results in the piecewise approximation of curved edges. Given the template matching technique defined in this research, it should be possible to detect smooth curves by tracking a gradual change in line orientation. It is believed that subpixel precision of curve detection can be achieved with this approach.

Finally, I recommend the exploration of drones and military GPS to build a dynamic calibration structure over an urban scene. I have found that there is no substitute for a calibration structure as opposed to mere knowledge of camera position and orientation; it is the difference between interpolation and extrapolation. Accurate camera matrices are a prerequisite to 3-D reconstruction.

Appendix A Geometric Parameterization of a 2-D Homography

Matrix factorization is not unique, but it is nevertheless a useful analysis technique. In this appendix, a 2-D homography is factored into pure projective, affine, and similarity components, H = H_P H_A H_S. To derive the similarity, the order of parameterization is important: this factorization begins with a rotation, follows with scaling, and applies a translation last. Using polar coordinates, a rotation reduces to a translation in the angular coordinate. Converting back to Cartesian coordinates produces

x' = r cos(α + θ)
y' = r sin(α + θ)

Figure 52 Rotation Trigonometry

By applying angle-addition formulas, with x = r cos α and y = r sin α, this becomes

x' = x cos θ − y sin θ
y' = x sin θ + y cos θ.

In matrix form, the rotation becomes [x'; y'] = [[cos θ, −sin θ],[sin θ, cos θ]] [x; y]. Scaling the result and then adding a translation component produces the similarity transform

[x'; y'] = [[s cos θ, −s sin θ],[s sin θ, s cos θ]] [x; y] + [tx; ty].

In homogeneous coordinates, the similarity is given by

[ωx'; ωy'; ω] = [[s cos θ, −s sin θ, tx],[s sin θ, s cos θ, ty],[0, 0, 1]] [ωx; ωy; ω].

a11 a12 0 a11 a12 A general Affine transform is of the form H  a a 0 ,  0 , while A  21 22  a21 a22  0 0 1

 1 0 0 a pure projective transform is of the form, H   0 1 0 containing only the vanishing line P   v1 v2 u of the plane. In block matrix form, the desired factorization of the homography is:

H = H_P H_A H_S = [[h11, h12, h13],[h21, h22, h23],[h31, h32, h33]] = [[I, 0],[vᵗ, u]] [[K, 0],[0ᵗ, 1]] [[R, t],[0ᵗ, 1]] = [[KR, Kt],[vᵗKR, vᵗKt + u]].

This form begins by factoring the rotation from the affine component in order to annihilate the h12 term using a Jacobi rotation,

[[h11, h12],[h21, h22]] = ([[h11, h12],[h21, h22]] [[cos θ, sin θ],[−sin θ, cos θ]]) [[cos θ, −sin θ],[sin θ, cos θ]] = KR.

Select θ to annihilate the h12 term: h11 sin θ + h12 cos θ = 0. Theta is then computed from sin θ = −h12 / √(h11² + h12²) and cos θ = h11 / √(h11² + h12²). From real analysis, h11² + h12² ≠ 0, because h11 = h12 = 0 would force the determinant of the affine transform to zero (a planar surface in one image would appear as a line in the other image). In that case there is not enough information in the images for a 3-D reconstruction, because the corresponding lines are degenerate in the second image.

a 0 Now, the expression is a lower triangular Affine transform of the form, A    . By b c factoring out the square root of the determinant, the desired form is produced,

 a  0 1 ρ 0  ac  A     ac   . Knowing that the determinant is 1, the diagonal entries  α ρ  b c   ac ac  must be reciprocals. Using this technique, continue to factor the expression:

[[h11, h12],[h21, h22]] = ([[h11, h12],[h21, h22]] [[cos θ, sin θ],[−sin θ, cos θ]]) [[cos θ, −sin θ],[sin θ, cos θ]]
= (1/√(h11² + h12²)) [[h11² + h12², 0],[h11 h21 + h12 h22, h11 h22 − h12 h21]] [[cos θ, −sin θ],[sin θ, cos θ]].

Matrix multiplication yields

KR = [[√(h11² + h12²), 0],[(h11 h21 + h12 h22)/√(h11² + h12²), det(H)/√(h11² + h12²)]] [[cos θ, −sin θ],[sin θ, cos θ]] = √det(H) [[1/ρ, 0],[α, ρ]] R.

At this point, most of the decomposition is determined; only the translation and the projective components are missing. From the block matrix form, Kt = [h13; h23], so

t = K⁻¹ [h13; h23] = [[ρ, 0],[−α, 1/ρ]] [h13; h23].

Back to the block matrix form, vᵗKR = [h31, h32]. Solving for vᵗ yields

vᵗ = [h31, h32] R⁻¹ K⁻¹ = [h31, h32] [[cos θ, sin θ],[−sin θ, cos θ]] [[ρ, 0],[−α, 1/ρ]].

Finally, only the dependent homographic scale term u remains. Returning to the block matrix form one more time yields h33 = vᵗKt + u. Solving for u gives u = h33 − vᵗKt.

Putting it all together produces the following factorization:

1. Δ = √(h11 h22 − h12 h21)
2. rad = hypot(h11, h12)
3. θ = atan2(−h12, h11)
4. ρ = Δ / rad
5. α = (h11 h21 + h12 h22) / rad
6. K = [[1/ρ, 0],[α, ρ]],  K⁻¹ = [[ρ, 0],[−α, 1/ρ]]
7. R = [[cos θ, −sin θ],[sin θ, cos θ]],  R⁻¹ = [[cos θ, sin θ],[−sin θ, cos θ]]
8. t = K⁻¹ [h13; h23]
9. vᵗ = [h31, h32] R⁻¹ K⁻¹
10. u = h33 − vᵗKt

This ten-step computation produces the required factorization

H = [[h11, h12, h13],[h21, h22, h23],[h31, h32, h33]] = [[1, 0, 0],[0, 1, 0],[v1, v2, u]] [[1/ρ, 0, 0],[α, ρ, 0],[0, 0, 1]] [[s cos θ, −s sin θ, tx],[s sin θ, s cos θ, ty],[0, 0, 1]]  (with s = Δ),

highlighting the geometric parameterization of a 2-D homography with eight degrees of freedom.
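As a numerical check, the following sketch implements the factorization with numpy and verifies that the product of the three factors reproduces H. It assumes h33 ≠ 0 and a positive determinant of the upper-left 2 × 2 block; the sample H is an arbitrary illustrative matrix, not one from the worked examples.

import numpy as np

def factor_homography(H):
    """Minimal sketch of the Appendix A factorization H = HP @ HA @ HS."""
    h = H / H[2, 2]
    B = h[:2, :2]
    s = np.sqrt(np.linalg.det(B))                 # similarity scale (step 1)
    theta = np.arctan2(-h[0, 1], h[0, 0])         # step 3: annihilate h12
    c, sn = np.cos(theta), np.sin(theta)
    R = np.array([[c, -sn], [sn, c]])
    K = (B @ R.T) / s                             # lower triangular, det(K) = 1
    t = np.linalg.solve(K, h[:2, 2])              # step 8
    v = (h[2, :2] @ R.T @ np.linalg.inv(K)) / s   # step 9
    u = h[2, 2] - v @ K @ t                       # step 10
    HP = np.array([[1, 0, 0], [0, 1, 0], [v[0], v[1], u]])
    HA = np.block([[K, np.zeros((2, 1))], [np.zeros((1, 2)), np.ones((1, 1))]])
    HS = np.block([[s * R, t[:, None]], [np.zeros((1, 2)), np.ones((1, 1))]])
    return HP, HA, HS

H = np.array([[0.97, 0.90, -181.35], [-0.27, 2.09, 1187.41], [-3.3e-6, 2.3e-4, 1.0]])
HP, HA, HS = factor_homography(H)
print(np.allclose(HP @ HA @ HS, H / H[2, 2]))     # True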

Appendix B Orthogonal Regression

Orthogonal regression is the process of computing a best-fit function to a data set minimizing the sum of squared errors (SSE) where the errors are measured orthogonal to the estimation function.

Of specific interest to this research is the estimation of a 2-D line through a point cloud. The most common technique involves the computation of the eigenvectors of a covariance matrix or a similar formulation [1]. The eigenvector computation can be circumvented by using a two-parameter definition of a line, resulting in a closed-form solution.

Figure 53 Equations of a 2-D Line
The left image depicts the general-form equation of a line in 2-D. The vector normal to the line is indicated by the coefficients [A B]. The general form of a line contains a degree of freedom in the scale of the coefficients; this necessitates the computation of eigenvectors to determine the ratios of the coefficients [A B C]. In the right image, [A B C] is converted to cylindrical coordinates [r θ C] with r = 1, removing the scale ambiguity of the line.

The orthogonal distance between a point and a line is given by d = x0 cos θ + y0 sin θ + C [2]. The sum of squared distances is D = Σ (xi cos θ + yi sin θ + C)². Minimize the SSE by computing the point where the partial derivatives are zero [2].

∂D/∂C = Σ 2(xi cos θ + yi sin θ + C) = 0
CN + Σ (xi cos θ + yi sin θ) = 0
C = −(1/N) Σ (xi cos θ + yi sin θ)

This equation is used as a variable substitution into ∂D/∂θ = 0 and is also used to compute the value of C after the determination of θ.


Now, examine ∂D/∂θ = 0:

∂D/∂θ = Σ 2(xi cos θ + yi sin θ + C)(−xi sin θ + yi cos θ) = 0.

Multiplication and refactoring yields

Σ ((yi² − xi²)/2) · 2 sin θ cos θ + Σ xi yi (cos²θ − sin²θ) + C Σ (yi cos θ − xi sin θ) = 0.

The trigonometric angle-addition formulas [2] simplify this to

Σ ((yi² − xi²)/2) sin 2θ + cos 2θ Σ xi yi + C Σ (yi cos θ − xi sin θ) = 0.

Now perform the substitution for C using the result from ∂D/∂C = 0:

Σ ((yi² − xi²)/2) sin 2θ + cos 2θ Σ xi yi − (1/N) Σ (xi cos θ + yi sin θ) · Σ (yi cos θ − xi sin θ) = 0.

After multiplication and refactoring, the equation is

Σ ((yi² − xi²)/2) sin 2θ + cos 2θ Σ xi yi − (1/N)(Σ xi)(Σ yi)(cos²θ − sin²θ) − (1/N)((Σ yi)² − (Σ xi)²) sin θ cos θ = 0.

Using the same angle-addition formulas as before produces

Σ ((yi² − xi²)/2) sin 2θ + cos 2θ Σ xi yi − (1/N)(Σ xi)(Σ yi) cos 2θ − ((Σ yi)² − (Σ xi)²)/(2N) · sin 2θ = 0.

One final factoring yields

sin 2θ [ (Σ yi² − Σ xi²)/2 − ((Σ yi)² − (Σ xi)²)/(2N) ] + cos 2θ [ Σ xi yi − (1/N)(Σ xi)(Σ yi) ] = 0,

which is of the form R cos 2θ + S sin 2θ = 0 for constants R and S. Factor out the amplitude to arrive at

√(R² + S²) [ (R/√(R² + S²)) cos 2θ + (S/√(R² + S²)) sin 2θ ] = 0.

Using the trigonometric substitutions cos φ = R/√(R² + S²) and sin φ = S/√(R² + S²) shown in Figure 54

Figure 54 Trigonometric Substitution

and dropping the amplitude term, cos φ cos 2θ + sin φ sin 2θ = 0. One final angle-addition formula yields cos(φ − 2θ) = 0. This equation is satisfied when θ = φ/2 + π/4 or θ = φ/2 + 3π/4.

Note that θ = φ/2 + 3π/4 is equivalent to θ = φ/2 − π/4 under the ambiguity of a line's orientation (θ vs. θ + 180°); the two roots θ = φ/2 + π/4 and θ = φ/2 + 3π/4 differ by 90° and correspond to the maximizing and minimizing orientations of the SSE, respectively, so the minimizing root is kept.

The following algorithm summarizes the 2-D least-squares solution for a line.

1. R = Σ xi yi − (1/N)(Σ xi)(Σ yi)
2. S = (Σ yi² − Σ xi²)/2 − ((Σ yi)² − (Σ xi)²)/(2N)
3. φ = atan2(S, R)
4. θ = φ/2 − π/4  (equivalently φ/2 + 3π/4; the other root, φ/2 + π/4, maximizes the SSE)
5. A = cos θ
6. B = sin θ
7. C = −(1/N)(cos θ Σ xi + sin θ Σ yi)
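A direct numpy transcription of this algorithm on synthetic data is sketched below; to stay robust with respect to the root selection, the sketch evaluates both candidate angles and keeps the one with the smaller SSE.

import numpy as np

def fit_line_orthogonal(x, y):
    """Closed-form orthogonal regression for A*x + B*y + C = 0, A^2 + B^2 = 1."""
    N = len(x)
    R = np.sum(x * y) - np.sum(x) * np.sum(y) / N
    S = (np.sum(y**2) - np.sum(x**2)) / 2 - (np.sum(y)**2 - np.sum(x)**2) / (2 * N)
    phi = np.arctan2(S, R)

    def solution(theta):
        A, B = np.cos(theta), np.sin(theta)
        C = -(A * np.sum(x) + B * np.sum(y)) / N
        return (A, B, C), np.sum((A * x + B * y + C) ** 2)

    # Evaluate both roots of cos(phi - 2*theta) = 0; keep the smaller SSE.
    return min((solution(phi / 2 + k * np.pi / 4) for k in (1, 3)),
               key=lambda s: s[1])[0]

# Synthetic check: points near y = 2x + 1; the normal should be ~(2, -1)/sqrt(5).
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + rng.normal(0, 0.05, x.size)
print(fit_line_orthogonal(x, y))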

Appendix C Multiple Regression

Multiple regression is used to fit a polynomial to a data set by minimizing the sum of squared errors (SSE), where the error is measured vertically: εi = yi − f(xi), with

f(x) = P0 + P1 x + P2 x² + ⋯ + PM x^M.

The sum of all such squared errors over the data set is

E = Σ_{i=1..N} (yi − P0 − P1 xi − P2 xi² − ⋯ − PM xi^M)².

Locate the minimum squared error by solving the system ∂E/∂Pj = 0 for 0 ≤ j ≤ M, where M is the order of the polynomial. First, compute all of the partial derivatives:

∂E/∂P0 = Σ −2(yi − P0 − P1 xi − ⋯ − PM xi^M) = 0
∂E/∂P1 = Σ −2(yi − P0 − P1 xi − ⋯ − PM xi^M) xi = 0
∂E/∂P2 = Σ −2(yi − P0 − P1 xi − ⋯ − PM xi^M) xi² = 0
  ⋮
∂E/∂Pj = Σ −2(yi − P0 − P1 xi − ⋯ − PM xi^M) xi^j = 0

In matrix form, with V the N × (M+1) Vandermonde matrix whose i-th row is [1, xi, xi², …, xi^M],

Vᵗ V [P0; P1; …; PM] = Vᵗ [y1; y2; …; yN],

which is a linear system of M+1 variables and as many independent equations. The polynomial coefficients may be solved for by any numerical technique.

Appendix D Bilinear Interpolation

Linear interpolation approximates the value of a function between two known points by using a straight line. By similar triangles, one can see that

y = py + ((x − px)/(qx − px)) (qy − py).

Figure 55 Similar Triangles

This equation is greatly simplified on a grid of uniform spacing, where qx − px = 1 and px is translated to the origin. With these simplifications, y = x (qy − py) + py. In matrix form, the interpolation becomes

y = [py, qy] [1 − x; x].

In bilinear interpolation (Figure 56), the interpolation is performed in two directions. First, use linear interpolation to compute two z values for x at y = 0 and y = 1 using

z = [[I00, I01],[I10, I11]] [1 − x; x],

then use linear interpolation again with {x, y = 0}, {x, y = 1} and the corresponding z values. The final interpolated z value is

z = [1 − y, y] [[I00, I01],[I10, I11]] [1 − x; x].


Figure 56 Bilinear Interpolation
In bilinear interpolation, linear interpolation is performed in the direction of the x-axis followed by linear interpolation in the direction of the y-axis. The final result is a quadratic surface from linear interpolation in two directions.
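The matrix form translates directly to code; a minimal sketch on a unit cell with illustrative corner values:

import numpy as np

def bilerp(I00, I01, I10, I11, x, y):
    """Bilinear interpolation on a unit cell, matching the matrix form above:
    z = [1-y, y] @ [[I00, I01], [I10, I11]] @ [1-x, x]^t."""
    M = np.array([[I00, I01], [I10, I11]])
    return np.array([1 - y, y]) @ M @ np.array([1 - x, x])

print(bilerp(0.0, 1.0, 2.0, 3.0, 0.5, 0.5))   # 1.5, the average of the corners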

Appendix E Line Templates

To compute the line-segment templates (Figure 15) required to exhaustively represent a line of any orientation, a template size must be selected. In this research, templates range from 3×3 through 25×25 pixels in size. For example, Figure 57 depicts a template of size 11×11. Every orientation is then considered by examining a line through each pixel center, as shown in the right of Figure 57.

Figure 57 Template Grid
Pixels are represented by black outlines of squares. Pixel centers are integers. In polar coordinates, 80 unique angles exist in this example.

Pixel centers are then converted to the polar coordinate system and sorted by angle. Duplicate angles are removed, resulting in 80 unique angles, as depicted in the right image of Figure 57. Each template is generated by selecting the pixels that are within a specified distance of the target line and outside a keep-out region. Figure 58 shows a specific template with a width of 2.25 pixels and a keep-out region of 0.5 pixels from the target line. The keep-out region is necessary because line segments are inherently blurry.


Figure 58 Line Template Generation
Each template is generated by drawing a center line through a pixel center. A keep-out region and template width are specified. Pixel centers within the template width and outside the keep-out region are selected to form the template.

An exhaustive list of templates is shown below for a template size of 7×7, a width of 2.25 pixels, and a keep-out of 0.5 pixels.

Figure 59 Enumeration of 7x7 line segment templates
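A small numpy sketch of this selection rule follows. It assumes the line passes through the center pixel of the template and uses the stated width of 2.25 pixels and keep-out of 0.5 pixels; the function name is illustrative.

import numpy as np

def line_template(size, theta, width=2.25, keep_out=0.5):
    """Select the pixels of a size x size grid whose centers lie within `width`
    of a line through the center pixel at angle `theta`, but outside the
    `keep_out` band around the line itself."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    d = np.abs(xs * np.sin(theta) - ys * np.cos(theta))   # distance to the line
    return (d <= width) & (d >= keep_out)

# One 7x7 template in the style of Figure 59.
print(line_template(7, np.radians(30)).astype(int))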

Appendix F 2-D Image Registration with Points

Given the images in Figure 60 and corresponding points in Table 4, determine the planar homography H relating the two views.

Figure 60 Images and Corresponding Points Two planar images are shown with corresponding points. Point data is shown in Table 4.

Table 4 Corresponding Points from Figure 60

Left Image          Right Image         Left Image          Right Image
px    py    qx    qy    px    py    qx    qy
968 1462 1551 2979 1190 393 1232 1555
351 255 361 1524 1394 742 1587 2025
3749 783 3558 1528 2295 568 2275 1572
92 1141 724 2807 2736 642 2669 1577
3558 10 3310 205 1538 1331 1924 2725
3725 1608 3585 2593 728 1036 1176 2548
2510 359 2396 1184 1538 1332 1924 2725
1936 374 1892 1354 3086 656 2969 1510
1354 288 1323 1346 3089 509 2948 1277
1188 537 1309 1775 3129 388 2959 1061
1801 635 1880 1784 3144 283 2953 870
2238 234 2109 1036 3124 162 2913 649
2051 119 1886 880 3167 40 2929 396
2662 84 2451 642 1664 290 1608 1278
1190 393 1234 1555 3441 272 3231 757
1461 637 1594 1862 3550 452 3354 1047
1957 511 1966 1567 3541 578 3358 1262
2926 292 2755 948 3396 1532 3334 2577
2671 266 2515 980 3443 1874 3402 2928
919 111 797 1136 1784 86 1617 884
535 569 748 1964 2516 228 2361 954
141 960 654 2578 1565 450 1599 1562
2250 1365 2464 2621 2095 275 1992 1149
2082 1543 2384 2849 2062 136 1905 909
395 368 488 1695 2162 192 2019 982
541 202 505 1392 202 380 320 1761
217 82 99 1260 259 505 462 1940
146 132 71 1368 147 132 72 1367
713 386 793 1650 58 534 302 2028
2320 141 2146 845 42 488 257 1965
2671 266 2515 981 2259 391 2186 1297
1704 218 1609 1143

First, compute normalization matrices using the mean and standard deviation of the coordinates in the data set,

x̃ = S T x = [[1/σx, 0, 0],[0, 1/σy, 0],[0, 0, 1]] [[1, 0, −μx],[0, 1, −μy],[0, 0, 1]] [ωx; ωy; ω].

Tl =
1  0  -1862.95238095238
0  1  - 0.52711111111111
0  0  1

Sl =
0.00088324578610  0  0
0  0.00224214662846  0
0  0  1

Tr =
1  0  -1865.22222222222
0  1  -1542.34920634921
0  0  1

Sr =
0.00098703175415  0  0
0  0.00149237843335  0
0  0  1

Normalized data is given in Table 5.

Table 5 Normalized Data

Left Image          Right Image         Left Image          Right Image
px    py    qx    qy    px    py    qx    qy
-0.7905 2.0962 -0.3101 2.1440 -0.5944 -0.3007 -0.6250 0.0189
-1.3354 -0.6101 -1.4847 -0.0274 -0.4142 0.4818 -0.2746 0.7203
1.6658 0.5737 1.6708 -0.0214 0.3816 0.0917 0.4045 0.0443
-1.5642 1.3764 -1.1264 1.8873 0.7711 0.2576 0.7934 0.0517
1.4971 -1.1594 1.4260 -1.9958 -0.2870 1.8024 0.0580 1.7650
1.6446 2.4235 1.6975 1.5680 -1.0024 1.1410 -0.6803 1.5008
0.5715 -0.3769 0.5239 -0.5348 -0.2870 1.8047 0.0580 1.7650
0.0645 -0.3433 0.0264 -0.2811 1.0803 0.2890 1.0895 -0.0483
-0.4495 -0.5361 -0.5352 -0.2930 1.0829 -0.0406 1.0687 -0.3960
-0.5961 0.0222 -0.5490 0.3472 1.1182 -0.3119 1.0796 -0.7184
-0.0547 0.2419 0.0146 0.3606 1.1315 -0.5473 1.0737 -1.0034
0.3313 -0.6572 0.2406 -0.7557 1.1138 -0.8186 1.0342 -1.3332
0.1661 -0.9150 0.0205 -0.9885 1.1518 -1.0922 1.0500 -1.7108
0.7058 -0.9935 0.5782 -1.3437 -0.1757 -0.5316 -0.2539 -0.3945
1.3938 -0.5720 1.3481 -1.1720 -0.5944 -0.3007 -0.6230 0.0189
1.4901 -0.1684 1.4695 -0.7392 -0.3550 0.2464 -0.2677 0.4770
1.4821 0.1141 1.4734 -0.4184 0.0831 -0.0361 0.0995 0.0368
1.3541 2.2531 1.4497 1.5441 0.9389 -0.5272 0.8782 -0.8870
1.3956 3.0199 1.5168 2.0679 0.7137 -0.5854 0.6414 -0.8392
-0.8337 -0.9330 -1.0544 -0.6064 -0.0697 -0.9890 -0.2450 -0.9825
-1.1729 0.0939 -1.1027 0.6293 0.5768 -0.6707 0.4893 -0.8780
-1.5209 0.9706 -1.1955 1.5456 -0.2632 -0.1729 -0.2628 0.0293
0.3419 1.8787 0.5910 1.6098 0.2050 -0.5653 0.1251 -0.5870
0.1935 2.2778 0.5121 1.9500 0.1758 -0.8769 0.0393 -0.9452
-1.2966 -0.3568 -1.3594 0.2278 0.2641 -0.7514 0.1518 -0.8363
-1.1676 -0.7289 -1.3426 -0.2244 -1.4670 -0.3298 -1.5252 0.3263
-1.4538 -0.9980 -1.7433 -0.4214 -1.4167 -0.0496 -1.3850 0.5934
-1.5165 -0.8859 -1.7710 -0.2602 -1.5156 -0.8859 -1.7700 -0.2617
-1.0157 -0.3164 -1.0583 0.1607 -1.5942 0.0154 -1.5429 0.7248
0.4037 -0.8657 0.2771 -1.0407 -1.6083 -0.0877 -1.5874 0.6308
0.7137 -0.5854 0.6414 -0.8377 0.3498 -0.3052 0.3166 -0.3662
-0.1404 -0.6931 -0.2529 -0.5960

 1  t t t h t  0  ω'i xi y'i xi   2  Next, build the matrix A A using the model    h  0 . ω' xt 0t  x' xt    i i i i   3  h 

62.0000 5.1353 0.0000 0 0 0 5.3871 -4.1094 -61.1786 5.1353 62.0000 0.0000 0 0 0 -4.1094 -30.3740 -13.5019 0.0000 0.0000 63.0000 0 0 0 -61.1786 -13.5019 0.0000 0 0 0 62.0000 5.1353 0.0000 -7.6727 -18.4557 21.3540 0 0 0 5.1353 62.0000 0.0000 -18.4557 -64.5672 -56.0047 0 0 0 0.0000 0.0000 63.0000 21.3540 -56.0047 -0.0000 5.3871 -4.1094 -61.1786 -7.6727 -18.4557 21.3540 183.9437 32.4739 8.2655 -4.1094 -30.3740 -13.5019 -18.4557 -64.5672 -56.0047 32.4739 241.2507 51.3510 -61.1786 -13.5019 0.0000 21.3540 -56.0047 -0.0000 8.2655 51.3510 124.0000

The eigenvector corresponding to the smallest eigenvalue of AᵗA is

h̃ = [0.54385262007774, 0.10133256663765, 0.00932392425904, -0.22385548456331, 0.57427911686487, 0.04675647471093, -0.00185046711013, 0.05188519458813, 0.55594932808672]ᵗ.

Reshape h̃ by rows to arrive at the normalized homography

H̃ =
0.54385262007774  0.10133256663765  0.00932392425904
-0.22385548456331  0.57427911686487  0.04675647471093
-0.00185046711013  0.05188519458813  0.55594932808672

Denormalization, H = Tr⁻¹ Sr⁻¹ H̃ Sl Tl, yields the final 2-D homography

H =
0.97175874112  0.89853510655  -181.35206002630
-0.27127636359  2.09419365998  1187.40976760642
-0.00000328412  0.00023375628  1.00000000000

Finally, resample the right image into the geometry of the left image, as shown in Figure 61.

Figure 61 2-D Image Registration from Points
The right image shows the right image of Figure 60 resampled into the geometry of the left image. The left image is an overlay of the left image of Figure 60 and the resampled image.
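The steps of this appendix translate into a short numpy sketch: normalization from the means and standard deviations, two DLT rows per correspondence, the null vector (computed here via SVD, which is equivalent to the smallest eigenvector of AᵗA), and denormalization. Function names are illustrative.

import numpy as np

def normalization(pts):
    """Combined S*T matrix: zero mean, unit standard deviation per axis."""
    mu, sd = pts.mean(axis=0), pts.std(axis=0)
    return np.array([[1/sd[0], 0, -mu[0]/sd[0]],
                     [0, 1/sd[1], -mu[1]/sd[1]],
                     [0, 0, 1.0]])

def homography_from_points(p, q):
    """p, q: Nx2 arrays of corresponding points (left, right images)."""
    Nl, Nr = normalization(p), normalization(q)
    ph = np.c_[p, np.ones(len(p))] @ Nl.T          # normalized homogeneous points
    qh = np.c_[q, np.ones(len(q))] @ Nr.T
    rows = []
    for x, xp in zip(ph, qh):                      # two DLT rows per correspondence
        rows.append(np.r_[np.zeros(3), -xp[2] * x, xp[1] * x])
        rows.append(np.r_[xp[2] * x, np.zeros(3), -xp[0] * x])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    Hn = Vt[-1].reshape(3, 3)                      # null vector, reshaped by rows
    H = np.linalg.inv(Nr) @ Hn @ Nl                # denormalize
    return H / H[2, 2]

# Synthetic check: recover a known homography from eight exact correspondences.
rng = np.random.default_rng(2)
p = rng.random((8, 2)) * 100
Ht = np.array([[1.2, 0.1, 5.0], [-0.05, 0.9, -3.0], [1e-4, 2e-4, 1.0]])
qh = np.c_[p, np.ones(8)] @ Ht.T
q = qh[:, :2] / qh[:, 2:]
print(np.allclose(homography_from_points(p, q), Ht / Ht[2, 2], atol=1e-6))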

Appendix G 2-D Image Registration with Lines

Given the images in Figure 62 and corresponding lines in Table 6, determine the planar homography H relating the two views.

Figure 62 Images and Corresponding Lines
Two planar images are shown with corresponding lines. A few of the lines correspond to actual lines in the image; because these are few in number, corresponding image features are joined to create artificial lines in the images. Endpoint data is shown in Table 6.

Table 6 Endpoints from Corresponding Lines

Left Image                  Right Image
px    py    qx    qy        px    py    qx    qy
69 875 386 257 3 318 334 1
122 877 434 261 3 366 381 3
262 883 560 257 3 493 503 3
322 877 592 295 6 549 552 5
1307 2806 3907 2897 68 2273 2502 2848
1098 2722 3829 2820 34 2189 2448 2757
3840 2822 3578 1672 2455 2749 2723 1634
3925 2882 3642 1657 2509 2844 2784 1663
4000 874 3728 229 3573 1213 3609 305
4020 807 3775 229 3627 1228 3659 305
112 371 3734 49 61 60 3708 135
695 554 3741 710 494 280 3381 757
709 1556 2946 1673 92 1141 2082 1542
1934 237 1737 2719 1764 119 492 2271
982 2148 1059 320 85 1674 919 111
3071 2916 2777 105 1630 2686 2663 84

Compute normalization matrices using the mean and standard deviation of the coordinates in the data set,

x̃ = S T x = [[1/σx, 0, 0],[0, 1/σy, 0],[0, 0, 1]] [[1, 0, −μx],[0, 1, −μy],[0, 0, 1]] [ωx; ωy; ω].

Tl =
1  0  -2090.40625
0  1  -1261.15625
0  0  1

Sl =
0.00064581338697  0  0
0  0.00095090405290  0
0  0  1

Tr =
1  0  -1535.84375
0  1  -1081.4375
0  0  1

Sr =
0.00071207205029  0  0
0  0.00097168831404  0
0  0  1

Normalized data is given in Table 7.

Table 7 Normalized Endpoints from Corresponding Lines

Left Image                  Right Image
px    py    qx    qy        px    py    qx    qy
-1.3055 -0.3672 -1.1007 -0.9549 -1.0915 -0.7418 -0.8558 -1.0498
-1.2712 -0.3653 -1.0697 -0.9511 -1.0915 -0.6952 -0.8223 -1.0479
-1.1808 -0.3596 -0.9884 -0.9549 -1.0915 -0.5718 -0.7355 -1.0479
-1.1421 -0.3653 -0.9677 -0.9187 -1.0894 -0.5174 -0.7006 -1.0460
-0.5059 1.4690 1.1732 1.5555 -1.0452 1.1578 0.6880 1.7165
-0.6409 1.3891 1.1228 1.4823 -1.0694 1.0762 0.6495 1.6281
1.1299 1.4842 0.9607 0.3907 0.6545 1.6204 0.8453 0.5369
1.1848 1.5413 1.0020 0.3764 0.6930 1.7127 0.8888 0.5651
1.2332 -0.3681 1.0576 -0.9815 1.4506 0.1278 1.4762 -0.7545
1.2462 -0.4319 1.0879 -0.9815 1.4891 0.1424 1.5118 -0.7545
-1.2777 -0.8465 1.0615 -1.1526 -1.0502 -0.9925 1.5467 -0.9196
-0.9012 -0.6724 1.0660 -0.5241 -0.7419 -0.7787 1.3139 -0.3153
-0.8921 0.2804 0.5526 0.3916 -1.0281 0.0579 0.3889 0.4475
-0.1010 -0.9739 -0.2282 1.3863 0.1625 -0.9352 -0.7433 1.1559
-0.7158 0.8433 -0.6661 -0.8949 -1.0331 0.5758 -0.4392 -0.9430
0.6333 1.5736 0.4434 -1.0994 0.0670 1.5591 0.8026 -0.9692

Compute the general-form equations of the normalized lines in the right image.

Table 8 Normalized Lines from Right Image in Homogeneous Coordinates

A    B    C
-0.30802519555053 -0.23569584864447 -0.51105269149527
-0.35272285799635 -0.26916323500788 -0.57211280908208
-0.47612727387937 -0.35603602514270 -0.72326414037284
-0.52859844283751 -0.38879133945583 -0.77697982734666
0.55872078057273 -1.73318337039468 2.59070795162263
0.55191896237446 -1.71894192938897 2.44016853008647
-1.08343247015408 -0.19083530947649 1.01833266745297
-1.14756389888069 -0.19581981382849 1.13058589026691
-0.88229298914790 -0.02563459381027 1.28313306771027
-0.89686831385849 -0.02278630560913 1.33873034474612
0.07287662355296 -2.59692676739087 -2.50096418589913
0.46349532579686 -2.05575200917397 -1.25705937684388
0.38964701392985 -1.41702338006796 0.48261609921922
2.09107325181308 0.90575564796304 0.50732949091664
-1.51874883484379 -0.59386808993803 -1.22708648026096
-2.52833299313087 -0.73557042794482 1.31636802708608

Next, build the matrix MᵗM using the model

[ A′pᵗ  B′pᵗ  C′pᵗ ] [h¹]   [0]
[ A′qᵗ  B′qᵗ  C′qᵗ ] [h²] = [0]
                     [h³]

MᵗM =
18.3199 5.1160 2.6254 4.6334 2.8805 -4.0696 -0.3347 6.7754 -20.9608
5.1160 17.5910 -1.1406 2.8805 2.5899 -3.2709 6.7754 -0.3130 -2.1172
2.6254 -1.1406 18.8631 -4.0696 -3.2709 4.4073 -20.9608 -2.1172 0.7791
4.6334 2.8805 -4.0696 12.6801 3.7392 -2.6254 6.0948 1.0805 -9.7829
2.8805 2.5899 -3.2709 3.7392 13.4090 1.1406 1.0805 -6.8234 -15.1178
-4.0696 -3.2709 4.4073 -2.6254 1.1406 13.1369 -9.7829 -15.1178 2.8369
-0.3347 6.7754 -20.9608 6.0948 1.0805 -9.7829 42.5495 9.6822 1.1769
6.7754 -0.3130 -2.1172 1.0805 -6.8234 -15.1178 9.6822 36.6314 -1.5423
-20.9608 -2.1172 0.7791 -9.7829 -15.1178 2.8369 1.1769 -1.5423 37.0555

The eigenvector corresponding to the smallest eigenvalue of MᵗM is

h̃ = [0.59447286301482, -0.19174266767672, -0.03470110912665, 0.11397172463850, 0.51895812731118, -0.01344908908096, -0.03174438911898, 0.00655337380029, 0.57017012784900]ᵗ.

Reshape h̃ by rows to arrive at the normalized homography

H̃ =
0.59447286301482  -0.19174266767672  -0.03470110912665
0.11397172463850  0.51895812731118  -0.01344908908096
-0.03174438911898  0.00655337380029  0.57017012784900

Denormalization, H = Tr⁻¹ Sr⁻¹ H̃ Sl Tl, yields the final 2-D homography

H =
0.838894255907  -0.407298178308  126.536482792066
0.088535239773  0.850339384220  -261.457872815009
-0.000033876553  0.000010297382  1.000000000000

Finally, resample the right image into the geometry of the left image, as shown in Figure 63.

Figure 63 2-D Image Registration from Lines
The right image shows the right image of Figure 62 resampled into the geometry of the left image. The left image is an overlay of the left image of Figure 62 and the resampled image.
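The line-based system differs from Appendix F only in how the rows are built: each left-image segment contributes two incidence rows forcing its mapped endpoints onto the corresponding right-image line. A minimal sketch follows (normalization omitted for brevity; inputs are assumed already normalized).

import numpy as np

def homography_from_lines(p, q, lines_r):
    """p, q: Nx3 homogeneous endpoints of the left-image segments;
    lines_r: Nx3 general-form lines [A', B', C'] in the right image.
    Each correspondence yields rows [A'p^t B'p^t C'p^t] and [A'q^t B'q^t C'q^t]."""
    rows = []
    for pi, qi, (A, B, C) in zip(p, q, lines_r):
        rows.append(np.r_[A * pi, B * pi, C * pi])
        rows.append(np.r_[A * qi, B * qi, C * qi])
    _, _, Vt = np.linalg.svd(np.asarray(rows))   # least-eigenvalue vector of M^t M
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]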

Appendix H Image Registration with Lines and 1-D Homographies

This exercise uses the following simulated data.

Figure 64 Images and Corresponding Lines
Two planar images are shown with corresponding lines. Line data is simulated with no error. The endpoints correspond as well.

Table 9 Endpoints from Corresponding Lines

Left Image                  Right Image
px    py    qx    qy        px    py    qx    qy
262.6694 400.6458 196.0328 433.6469 263.6642 429.3729 191.7353 455.2660
305.1730 247.8270 253.1436 199.6651 311.2203 266.3241 254.5679 208.3766
613.3986 468.7130 628.3603 499.8373 686.1814 571.5622 706.6285 613.1297
288.8755 239.5356 355.6127 263.5818 293.2525 255.4234 367.9035 289.6226
706.2751 278.8618 638.2791 289.5846 808.3272 354.4585 716.2787 357.4605
612.0384 503.8740 669.4182 472.6300 684.9293 614.4897 761.7333 588.3504
254.5390 335.1722 330.6194 313.7168 255.2133 356.1226 339.4809 342.4971
272.6969 301.3159 243.1168 365.1838 275.1982 321.3199 242.6431 387.4566
114.7959 279.2899 142.7347 340.8109 109.4940 279.1691 137.1718 347.6308
135.3541 300.1476 205.8481 253.0141 130.1000 303.5152 203.7325 261.1066

Table 10 Möbius Transforms, M = [[a, c],[b, d]]

a    b    c    d
1.04405736554517 0.00020920605475 0 1.00000000000000
1.16103487029300 0.00022214483497 -0.00000000000087 1.00000000000000
1.33352148095219 -0.00017062287859 0.00000000000368 1.00000000000000
1.13634542354705 -0.00025799114699 0.00000000000068 1.00000000000000
1.36338803872368 0.00027653869320 -0.00000000000513 1.00000000000000
1.22357206576533 -0.00022421375100 0.00000000000184 1.00000000000000
1.05996226133378 -0.00023331631531 0.00000000000012 1.00000000000000
1.05205531211818 0.00006442570677 0.00000000000025 1.00000000000000
1.08228599061387 -0.00014373061298 0 1.00000000000000
0.98670494044027 -0.00018038147639 -0.00000000000001 1.00000000000000

First, compute the normalization matrices so that the endpoint data in Table 9 has a mean of zero and a variance of one. Each point is normalized using x̃ = S T x. Normalized data appears in Table 11.

TL =
1.0000  0  -361.4491
0  1.0000  -339.3527
0  0  1.0000

TR =
1.0000  0  -386.9728
0  1.0000  -380.1327
0  0  1.0000

SL =
0.0050  0  0
0  0.0107  0
0  0  1.0000

SR =
0.0042  0  0
0  0.0079  0
0  0  1.0000

Table 11 Normalized Endpoints from Corresponding Lines

Left Image                  Right Image
px    py    qx    qy        px    py    qx    qy
-0.4928 0.6576 -0.8252 1.0116 -0.5161 0.3909 -0.8172 0.5965
-0.2807 -0.9819 -0.5403 -1.4986 -0.3171 -0.9035 -0.5542 -1.3636
1.2569 1.3878 1.3315 1.7217 1.2524 1.5198 1.3379 1.8498
-0.3620 -1.0708 -0.0291 -0.8129 -0.3923 -0.9901 -0.0798 -0.7186
1.7202 -0.6489 1.3810 -0.5339 1.7636 -0.2038 1.3783 -0.1800
1.2501 1.7650 1.5363 1.4298 1.2471 1.8606 1.5686 1.6531
-0.5333 -0.0448 -0.1538 -0.2750 -0.5515 -0.1906 -0.1988 -0.2988
-0.4427 -0.4081 -0.5903 0.2771 -0.4678 -0.4669 -0.6041 0.0581
-1.2305 -0.6444 -1.0911 0.0156 -1.1614 -0.8016 -1.0456 -0.2580
-1.1279 -0.4206 -0.7762 -0.9262 -1.0752 -0.6083 -0.7670 -0.9450

One-dimensional homographies are normalized according to Equation 11, M̃ = (P̃′ S′T′ P′⁺) M (P T⁻¹S⁻¹ P̃⁺), where ⁺ denotes a right inverse taking 1-D line coordinates back to points on the line. For example, the first normalized Möbius transform is computed as follows (Equation 9, Equation 2, Equation 3).

P̃′ =
-0.8258  0.5639  -0.6467
0  0  1.0000

P =
-0.8961  0.4438  57.5793
0  0  1.0000

S′ =
0.0042  0  0
0  0.0079  0
0  0  1.0000

T⁻¹ =
1.0000  0  361.4491
0  1.0000  339.3527
0  0  1.0000

T′ =
1.0000  0  -386.9728
0  1.0000  -380.1327
0  0  1.0000

S⁻¹ =
200.4576  0  0
0  93.2139  0
0  0  1.0000

P′ =
-0.9409  0.3387  102.6530
0  0  1.0000

P̃ =
-0.6845  0.7290  -0.8167
0  0  1.0000

M =
1.0441  0.0002
0.0000  1.0000

M̃ =
0.7623  0.0000
0.0320  1.0000

Proceeding in this fashion, compute the normalized Möbius transforms in Table 12.

Table 12 Normalized Möbius Transforms, M̃ = [[a, c],[b, d]]

a    b    c    d
0.76233770621599 0.03203339454061 0.00000000000000 1.00000000000000
0.90921671764552 0.02723882518475 -0.00000000000000 1.00000000000000
0.99057854686905 -0.01722160458745 0.00000000000002 1.00000000000000
0.96484418126937 -0.04345286213709 0.00000000000000 1.00000000000000
1.09821697434463 0.05314633933702 -0.00000000000002 1.00000000000000
0.85536574816969 -0.03323427874430 -0.00000000000000 1.00000000000000
0.81581011781032 -0.04155027861097 0.00000000000000 1.00000000000000
0.77747525568173 0.00646984640628 0.00000000000000 1.00000000000000
0.81585568134645 -0.01439701184878 -0.00000000000000 1.00000000000000
0.72974930993894 -0.02483511791033 0.00000000000000 1.00000000000000


1 0 0 1   ~ ~ ' ~ t  ~ Using 0 1 0 0 adjM P  P  h  0 (Equation 12), build the first three rows of   23 23  91 31 0 0 1 0  49 34

~ the linear system, Ah  0 with the first pair of line correspondences. ~ ~ M = P ' =b 0.7623 0.0000 -0.8258 0.5639 -0.6467 0.0320 1.0000 0 0 1.0000

~ ~ adj M =a P =g   -0.6845 -0.4928 1.0000 0.0000 0.7290 0.6576 -0.0320 0.7623 0 1.0000

1 0 0 1   ~ ~ ' ~ t  0 1 0 0 adjM P  P     23 23  0 0 1 0  49 34

0.5783 -0.6194 -0.0265 -0.3949 0.4230 0.0181 0.8285 -0.9863 -0.7831 0.4070 -0.5430 -0.8258 -0.2779 0.3708 0.5639 0.3187 -0.4252 -0.6467 0.0181 0.0193 0 0.0124 -0.0132 0 -0.5360 0.5709 0

The fourth equation extracted from this correspondence comes from the first equation of

[ A′pᵗ  B′pᵗ  C′pᵗ ] [h¹]   [0]
[ A′qᵗ  B′qᵗ  C′qᵗ ] [h²] = [0]
                     [h³]

using normalized coordinates. [A′ B′ C′] = [0.2056 0.3011 -0.0116] is the equation of the line through the normalized coordinates (-0.5161, 0.3909) and (-0.8172, 0.5965) from Table 11. The value pᵗ = [-0.4928 0.6576 1] also comes from Table 11. The fourth equation from the first line correspondence is

[A′pᵗ  B′pᵗ  C′pᵗ] = [-0.1013 0.1352 0.2056 -0.1484 0.1980 0.3011 0.0057 -0.0076 -0.0116].

Proceeding in this manner, with four equations for each correspondence, compute Aᵗ₍₉ₓ₄₀₎ A₍₄₀ₓ₉₎ =

8.8699 0.4004 0.3180 -1.2567 1.2672 0.1532 -5.7253 2.2351 -9.6054
0.4004 6.9107 -0.2940 1.2672 -0.5042 -1.2695 2.2279 1.6265 -0.1917
0.3180 -0.2940 5.8870 0.1532 -1.2695 -0.5898 -5.8283 -1.3217 0.6902
-1.2567 1.2672 0.1532 6.4285 4.0451 -1.1909 -0.8997 -0.1595 -3.7172
1.2672 -0.5042 -1.2695 4.0451 9.6821 -0.5341 -0.6540 -1.7159 -10.3624
0.1532 -1.2695 -0.5898 -1.1909 -0.5341 6.0548 -3.8101 -5.5465 0.5356
-5.7253 2.2279 -5.8283 -0.8997 -0.6540 -3.8101 42.6630 11.6768 8.1396
2.2351 1.6265 -1.3217 -0.1595 -1.7159 -5.5465 11.6768 40.0842 -0.1621
-9.6054 -0.1917 0.6902 -3.7172 -10.3624 0.5356 8.1396 -0.1621 19.2628

The eigenvector corresponding to the smallest eigenvalue of AᵗA is

h̃ = [-0.5888, 0.0000, 0.0318, -0.1393, -0.5093, 0.0224, 0.0322, 0.0028, -0.6098]ᵗ.

Reorganize into rows and columns,

H̃ =
-0.5888  0.0000  0.0318
-0.1393  -0.5093  0.0224
0.0322  0.0028  -0.6098

and finally denormalize to arrive at the final homography up to an arbitrary scale factor:

H = T′⁻¹ S′⁻¹ H̃ S T =
-0.6397  0.0116  -1.1184
-0.0265  -0.6769  10.2975
0.0002  0.0000  -0.6780

It is common to further normalize by h33, resulting in the 2-D registration homography

H/h33 =
0.9435  -0.0172  1.6496
0.0391  0.9984  -15.1884
-0.0002  -0.0000  1.0000

Appendix I Tensor Commutativity with the Kronecker Product

Computer vision algorithms frequently encounter the matrix equation A₍ₘₓₙ₎ X₍ₙₓₚ₎ B₍ₚₓq₎ = C₍ₘₓq₎, where A, B and C are matrices of constants and X is a matrix of unknowns. Even when A or B is square, they typically contain some geometric constraint rendering them non-invertible. To establish a set of linear equations in the elements of X, commute the X and B matrices. This is performed via the Kronecker tensor product as follows.

A X B = A [x₁; x₂; …; xₙ] [b¹ b² ⋯ b^q],

where xᵢ is row vector i of X and b^j is column vector j of B. Multiplying this expression out yields the m × q matrix whose (i, j) entry is Σ_k a_ik x_k b^j:

[ Σ_k a_1k x_k b¹   Σ_k a_1k x_k b²   ⋯   Σ_k a_1k x_k b^q ]
[ Σ_k a_2k x_k b¹   Σ_k a_2k x_k b²   ⋯   Σ_k a_2k x_k b^q ]
[        ⋮                 ⋮                     ⋮          ]
[ Σ_k a_mk x_k b¹   Σ_k a_mk x_k b²   ⋯   Σ_k a_mk x_k b^q ]

Next, reshape this matrix into an mq1matrix by writing each of its rows as a column vector and then proceed to collect and factor like terms. Note that the right-hand C matrix must also be reshaped into an matrix to preserve equality.

Collecting terms produces the block system

[ a_11 Bᵗ  a_12 Bᵗ  ⋯  a_1n Bᵗ ] [x₁ᵗ]   [c₁ᵗ]
[ a_21 Bᵗ  a_22 Bᵗ  ⋯  a_2n Bᵗ ] [x₂ᵗ] = [c₂ᵗ]
[    ⋮        ⋮           ⋮    ] [ ⋮ ]   [ ⋮ ]
[ a_m1 Bᵗ  a_m2 Bᵗ  ⋯  a_mn Bᵗ ] [xₙᵗ]   [cₘᵗ]

that is, (A ⊗ Bᵗ)₍ₘqₓₙₚ₎ [x₁ᵗ; …; xₙᵗ]₍ₙₚₓ₁₎ = [c₁ᵗ; …; cₘᵗ]₍ₘqₓ₁₎, where ⊗ is the Kronecker tensor product, xᵢᵗ is the i-th row of X written as a column vector, and cᵢᵗ is the i-th row of C written as a column vector. Notice that X and B have commuted under the Kronecker tensor product, as desired.
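This identity is easy to verify numerically. In the row-major convention used by numpy, stacking the rows of a matrix is just reshape(-1), so the commuted system reads as follows (the random matrix sizes are illustrative):

import numpy as np

# Check: stacking the rows of A @ X @ B equals (A kron B^t) applied to the
# stacked rows of X.
rng = np.random.default_rng(0)
A, X, B = rng.random((4, 3)), rng.random((3, 2)), rng.random((2, 5))
lhs = (A @ X @ B).reshape(-1)            # rows of C written end to end
rhs = np.kron(A, B.T) @ X.reshape(-1)    # rows of X written end to end
print(np.allclose(lhs, rhs))             # True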

Encyclopedic / Textbook References

[1] Eberly, David. "Least Squares Fitting of Data." Geometric Tools, LLC, 15 July 1999. Web. 21 Mar. 2013.
[2] Thomas, George B., and Ross L. Finney. Calculus and Analytic Geometry. 9th ed. Reading, Mass.: Addison-Wesley, 1996. Print.

Scholarly References

Data Set
[3] Restrepo, Maria I., et al. "Characterization of 3-d volumetric probabilistic scenes for object recognition." Selected Topics in Signal Processing, IEEE Journal of 6.5 (2012): 522-537.

Geometry
[4] Henle, Michael. Modern Geometries: the Analytic Approach. Upper Saddle River, NJ: Prentice Hall, 1997. Print.
[5] Hartley, Richard, and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge, UK: Cambridge University Press, 2000. Print.
[6] Anderson, James. Hyperbolic Geometry. London: Springer-Verlag, 1999. Print.

Primitive Detection
[7] Linger, Michael. "Color image segmentation algorithm: An approach to image segmentation through ellipsoidal clustering and edge detection." Aerospace and Electronics Conference (NAECON), Proceedings of the 2011 IEEE National. IEEE, 2011.
[8] Duda, Richard O., and Peter E. Hart. "Use of the Hough transformation to detect lines and curves in pictures." Communications of the ACM 15.1 (1972): 11-15.
[9] Canny, John. "A computational approach to edge detection." Pattern Analysis and Machine Intelligence, IEEE Transactions on 6 (1986): 679-698.
[10] Lindeberg, Tony. "Edge detection and ridge detection with automatic scale selection." International Journal of Computer Vision 30.2 (1998): 117-156.
[11] Kimmel, Ron, and Alfred M. Bruckstein. "Regularized Laplacian zero crossings as optimal edge integrators." International Journal of Computer Vision 53.3 (2003): 225-243.
[12] Haralick, Robert M. "Digital step edges from zero crossing of second directional derivatives." Pattern Analysis and Machine Intelligence, IEEE Transactions on 1 (1984): 58-68.
[13] Burns, J. Brian, Allen R. Hanson, and Edward M. Riseman. "Extracting straight lines." Pattern Analysis and Machine Intelligence, IEEE Transactions on 4 (1986): 425-455.
[14] Harris, Chris, and Mike Stephens. "A combined corner and edge detector." Alvey Vision Conference. Vol. 15. 1988.
[15] Anandan, P. "Computing dense displacement fields with confidence measures in scenes containing occlusion." 1984 Cambridge Symposium. International Society for Optics and Photonics, 1985.
[16] Anandan, Padmanabhan. "A computational framework and an algorithm for the measurement of visual motion." International Journal of Computer Vision 2.3 (1989): 283-310.

Correlation
[17] Ayache, N., and B. Faverjon. "Fast stereo matching of edge segments using prediction and verification of hypotheses." Proc. Computer Vision and Pattern Recognition. 1985.
[18] Jones, Graeme A. "Constraint, optimization, and hierarchy: Reviewing stereoscopic correspondence of complex features." Computer Vision and Image Understanding 65.1 (1997): 57-78.
[19] Li, Ze-Nian. "Stereo correspondence based on line matching in Hough space using dynamic programming." Systems, Man and Cybernetics, IEEE Transactions on 24.1 (1994): 144-152.

[20] Collins, Robert T., and J. Ross Beveridge. "Matching perspective views of coplanar structures using projective unwarping and similarity matching." Computer Vision and Pattern Recognition, 1993. Proceedings CVPR'93., 1993 IEEE Computer Society Conference on. IEEE, 1993.
[21] Hartley, R. "Multilinear relationships between coordinates of corresponding image points and lines." Proc. International Workshop on Computer Vision and Applied Geometry. 1995.
[22] Schmid, Cordelia, and Andrew Zisserman. "Automatic line matching across views." Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on. IEEE, 1997.
[23] Fischler, Martin A., and Robert C. Bolles. "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography." Communications of the ACM 24.6 (1981): 381-395.
[24] Chen, Min, et al. "Scale and rotation robust line-based matching for high resolution images." Optik-International Journal for Light and Electron Optics 124.22 (2013): 5318-5322.
[25] Ke, Yan, and Rahul Sukthankar. "PCA-SIFT: A more distinctive representation for local image descriptors." Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on. Vol. 2. IEEE, 2004.
[26] Liu, Hong-Min, Zhi-Heng Wang, and Chao Deng. "Extend point descriptors for line, curve and region matching." Machine Learning and Cybernetics (ICMLC), 2010 International Conference on. Vol. 1. IEEE, 2010.
[27] Wang, Zhiheng, Hongmin Liu, and Fuchao Wu. "HLD: A robust descriptor for line matching." Computer-Aided Design and Computer Graphics, 2009. CAD/Graphics' 09. 11th IEEE International Conference on. IEEE, 2009.
[28] Wang, Zhiheng, Fuchao Wu, and Zhanyi Hu. "MSLD: A robust descriptor for line matching." Pattern Recognition 42.5 (2009): 941-953.

Registration
[29] Zitova, Barbara, and Jan Flusser. "Image registration methods: a survey." Image and Vision Computing 21.11 (2003): 977-1000.
[30] Goshtasby, Ardeshir. Image Registration: Principles, Tools and Methods. London: Springer, 2012. Print.
[31] Stockman, George, Steven Kopstein, and Sanford Benett. "Matching images to models for registration and object detection via clustering." Pattern Analysis and Machine Intelligence, IEEE Transactions on 3 (1982): 229-241.
[32] Krüger, Wolfgang. "Robust and efficient map-to-image registration with line segments." Machine Vision and Applications 13.1 (2001): 38-50.
[33] Agarwal, Anubhav, C. V. Jawahar, and P. J. Narayanan. "A survey of planar homography estimation techniques." Centre for Visual Information Technology, Tech. Rep. IIIT/TR/2005/12 (2005).
[34] Reddy, B. Srinivasa, and Biswanath N. Chatterji. "An FFT-based technique for translation, rotation, and scale-invariant image registration." IEEE Transactions on Image Processing 5.8 (1996): 1266-1271.
[35] Leprince, Sébastien, et al. "Automatic and precise orthorectification, coregistration, and subpixel correlation of satellite images, application to ground deformation measurements." Geoscience and Remote Sensing, IEEE Transactions on 45.6 (2007): 1529-1558.
[36] Caner, Gulcin, et al. "Local image registration by adaptive filtering." Image Processing, IEEE Transactions on 15.10 (2006): 3053-3065.

[37] Long, Tengfei, et al. "Automatic Line Segment Registration Using Gaussian Mixture Model and Expectation-Maximization Algorithm." 1-12.
[38] Crispell, Daniel, Joseph Mundy, and Gabriel Taubin. "Parallax-Free Registration of Aerial Video." (2008).
[39] Yang, Gehua, et al. "Registration of challenging image pairs: Initialization, estimation, and decision." Pattern Analysis and Machine Intelligence, IEEE Transactions on 29.11 (2007): 1973-1989.
[40] Quan, Long, and Takeo Kanade. "Affine structure from line correspondences with uncalibrated affine cameras." Pattern Analysis and Machine Intelligence, IEEE Transactions on 19.8 (1997): 834-845.
[41] Holtkamp, David J., and A. Ardeshir Goshtasby. "Precision registration and mosaicking of multicamera images." Geoscience and Remote Sensing, IEEE Transactions on 47.10 (2009): 3446-3455.
[42] Walker, Michael W., Lejun Shao, and Richard A. Volz. "Estimating 3-D location parameters using dual number quaternions." CVGIP: Image Understanding 54.3 (1991): 358-367.
[43] Zeng, Hui, Xiaoming Deng, and Zhanyi Hu. "A new normalized method on line-based homography estimation." Pattern Recognition Letters 29.9 (2008): 1236-1244.
[44] Wu, Zhou, and Ardeshir Goshtasby. "Adaptive image registration via hierarchical voronoi subdivision." Image Processing, IEEE Transactions on 21.5 (2012): 2464-2473.

Tracking
[45] Faugeras, Olivier D., and Martial Hebert. "The representation, recognition, and locating of 3-D objects." The International Journal of Robotics Research 5.3 (1986): 27-52.
[46] Govindu, Venu, and Chandra Shekhar. "Alignment using distributions of local geometric properties." Pattern Analysis and Machine Intelligence, IEEE Transactions on 21.10 (1999): 1031-1043.
[47] Crowley, James L., et al. "Measurement and integration of 3-D structures by tracking edge lines." International Journal of Computer Vision 8.1 (1992): 29-52.

Reconstruction
[48] Zhou, Guoqing, et al. "A comprehensive study on urban true orthorectification." Geoscience and Remote Sensing, IEEE Transactions on 43.9 (2005): 2138-2147.
[49] Hartley, Richard I. "A linear method for reconstruction from lines and points." Computer Vision, 1995. Proceedings., Fifth International Conference on. IEEE, 1995.
[50] Liebowitz, David, and Andrew Zisserman. "Metric rectification for perspective images of planes." Computer Vision and Pattern Recognition, 1998. Proceedings. 1998 IEEE Computer Society Conference on. IEEE, 1998.
[51] Kaucic, Robert, Richard Hartley, and Nicolas Dano. "Plane-based projective reconstruction." Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on. Vol. 1. IEEE, 2001.
[52] Hartley, Richard I. "Projective reconstruction from line correspondences." Computer Vision and Pattern Recognition, 1994. Proceedings CVPR'94., 1994 IEEE Computer Society Conference on. IEEE, 1994.
[53] Pellejero, Oscar A., Carlos Sagüés, and J. Jesús Guerrero. "Automatic computation of the fundamental matrix from matched lines." Current Topics in Artificial Intelligence. Springer Berlin Heidelberg, 2004. 197-206.
[54] Spetsakis, Minas E., and John Yiannis Aloimonos. "Structure from motion using line correspondences." International Journal of Computer Vision 4.3 (1990): 171-183.

[55] Torr, Philip HS, and Andrew Zisserman. "Robust parameterization and computation of the trifocal tensor." Image and Vision Computing 15.8 (1997): 591-605.
[56] Liu, Yuncai, and Thomas S. Huang. "Estimation of rigid body motion using straight line correspondences." Computer Vision, Graphics, and Image Processing 43.1 (1988): 37-52.
[57] Faugeras, Olivier, and Bernard Mourrain. "On the geometry and algebra of the point and line correspondences between n images." Computer Vision, 1995. Proceedings., Fifth International Conference on. IEEE, 1995.
[58] Torr, Philip HS, and Andrew Zisserman. "Feature based methods for structure and motion estimation." Vision Algorithms: Theory and Practice. Springer Berlin Heidelberg, 2000. 278-294.
[59] Kanatani, Ken-Ichi. "The constraints on images of rectangular polyhedra." Pattern Analysis and Machine Intelligence, IEEE Transactions on 4 (1986): 456-463.
[60] Henle, James M. "Where the camera was." Mathematics Magazine 77.4 (2004): 251-259.
[61] Delaunay, B. "Sur la sphère vide. A la mémoire de Georges Voronoï." Bulletin de l'Académie des Sciences de l'URSS, Classe des sciences mathématiques et naturelles (1934), 6, 793-800.
[62] Duckham, Matt, et al. "Efficient generation of simple polygons for characterizing the shape of a set of points in the plane." Pattern Recognition 41.10 (2008): 3224-3236.

Reconstruction from Lines
[63] Zhang, Zhengyou. "Estimating motion and structure from correspondences of line segments between two perspective images." Pattern Analysis and Machine Intelligence, IEEE Transactions on 17.12 (1995): 1129-1139.
[64] Schindler, Grant, Panchapagesan Krishnamurthy, and Frank Dellaert. "Line-based structure from motion for urban environments." (2006).
[65] Taylor, Camillo J., and David Kriegman. "Structure and motion from line segments in multiple images." Pattern Analysis and Machine Intelligence, IEEE Transactions on 17.11 (1995): 1021-1032.
[66] Bartoli, Adrien, and Peter Sturm. "Structure-from-motion using lines: Representation, triangulation, and bundle adjustment." Computer Vision and Image Understanding 100.3 (2005): 416-441.
[67] Chandraker, Manmohan, Jongwoo Lim, and David Kriegman. "Moving in stereo: Efficient structure and motion using lines." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.
