
PROJECTIVE VISUAL HULLS
Beckman CVR Technical Report 2002–01

Svetlana Lazebnik, M.S.
Department of Computer Science
University of Illinois at Urbana-Champaign, 2002
Jean Ponce, Advisor

PROJECTIVE VISUAL HULLS
BECKMAN CVR TECHNICAL REPORT 2002–01

BY SVETLANA LAZEBNIK

B.S., DePaul University, 2000

THESIS

Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2002

Urbana, Illinois

© Copyright by Svetlana Lazebnik, 2002

ABSTRACT

This thesis presents an image-based method for computing the visual hull of an object bounded by a smooth surface and observed by a finite number of perspective cameras. The essential structure of the visual hull is projective: to compute an exact topological (combinatorial) description of its boundary, we do not need to know the Euclidean properties of the input cameras or of the scene. Unlike most existing visual hull computation methods, ours requires only a projective reconstruction of the camera matrices, or equivalently, the epipolar geometry between each pair of cameras in the scene. Starting with a rigorous theoretical framework of oriented projective geometry and projective differential geometry, we develop a suite of algorithms to construct the visual hull and associated data structures. The thesis discusses our implementation of the algorithms and presents experimental results on synthetic and real data sets.

To Max

ACKNOWLEDGMENTS

First and foremost, thanks are due to my advisor, Jean Ponce, for finding my research interesting, holding it to an exacting standard, and constantly telling me to be more positive. I gratefully acknowledge the National Science Foundation for supporting this research under the grant IRI-990709, and the Computer Science Department and the College of Engineering for supporting me with the SURGE fellowship and various awards.

I would also like to thank Edmond Boyer for providing the gourd data set, and Steve Sullivan for providing the squash and the Steve data sets. Both Edmond and Steve were responsible for inspiring the research that eventually developed into this thesis. Fred Rothganger also deserves a mention for taking time for idle conversations and for carrying on the constant uphill battle to keep the lab machines up and running.

Thanks are due to my family: my grandma and parents for constantly asking when the thesis will be done, and my sister Maria for her silent brand of commiseration. Good luck in grad school, Maria! Finally, I must say that this thesis would never have been completed on time without the help and loving care of my husband, Dr. Max Raginsky. I hope you will be there for me when I am writing my Ph.D. thesis!

TABLE OF CONTENTS


1 Introduction
  1.1 Defining the Visual Hull
  1.2 Previous Work: Computing Discrete Visual Hulls
    1.2.1 Volume Intersection
    1.2.2 Shape from Deforming Contours
    1.2.3 Applications of Visual Hulls
  1.3 Mathematical Ingredients
  1.4 Overview

2 Oriented Projective Geometry
  2.1 Basics
    2.1.1 Oriented Projective Space
    2.1.2 Flats
    2.1.3 Join, Meet, and Relative Orientation
    2.1.4 Oriented Projective Transformations
  2.2 Computing with Flats
    2.2.1 Oriented Projective Frames
    2.2.2 Simplex Orientation
    2.2.3 Representing General Flats
    2.2.4 Representing Projective Transformations
  2.3 Imaging Geometry of a Single Camera
  2.4 Oriented Multi-View Geometry
    2.4.1 Fundamental Matrix
    2.4.2 Oriented Trifocal Tensor
    2.4.3 Oriented Transfer
      2.4.3.1 Transfer Using Epipolar Geometry
      2.4.3.2 Transfer Using the Trifocal Tensor
  2.5 Oriented Projective Reconstruction

3 Projective Differential Geometry
  3.1 Curves
    3.1.1 Differential Equations of Curves
    3.1.2 Osculating Spaces
    3.1.3 Order of Contact
  3.2 Surfaces
    3.2.1 Order of Contact of Surfaces
    3.2.2 Developable Surfaces
    3.2.3 Conjugate Nets
    3.2.4 Asymptotic Directions
    3.2.5 Alternative Definitions of Conjugacy
    3.2.6 Local Shape
  3.3 Orienting Curves and Surfaces
    3.3.1 Orienting Plane Curves
    3.3.2 Orienting Surfaces

4 Visual Hulls
  4.1 Properties of Rims and Apparent Contours
  4.2 Frontier Points
  4.3 The Rim Mesh
    4.3.1 Oriented Structure of the Rim Mesh
    4.3.2 Reconstructing the Rim Mesh
    4.3.3 Combinatorial Complexity of the Rim Mesh
  4.4 Intersection Curves
    4.4.1 Geometric Properties of Intersection Curves
    4.4.2 Tracing Intersection Curves
  4.5 The 1-Skeleton of the Visual Hull
    4.5.1 Clipping Intersection Curves
    4.5.2 Intersection Points
    4.5.3 An Incremental Algorithm
  4.6 Computing the Faces of the Visual Hull
    4.6.1 Ray Intervals
    4.6.2 Vertical Decomposition
    4.6.3 Convex Objects: The Visual Hull and the Rim Mesh

5 Implementation and Results
  5.1 Implementation Details
    5.1.1 Discrete Contour Representation
    5.1.2 General Position Assumptions
    5.1.3 3D Reconstruction
    5.1.4 Efficiency
  5.2 Rim Mesh Results
  5.3 Visual Hull Results

6 Conclusion
  6.1 Summary
  6.2 Future Work

APPENDIX A Oriented Formulas
  A.1 Formulas for T^2
  A.2 Formulas for T^3
  A.3 Algebraic and Infinitesimal Properties of Join and Meet

REFERENCES

CHAPTER 1

Introduction

Suppose that we have taken a few snapshots of an object from a few known camera viewpoints and then extracted the silhouette of the object from each photograph. Clearly, we have lost most of the information about the 3D shape of the object. However, we can still try our best to reverse the imaging process by reconstructing this shape. We can imagine each camera as a slide projector emitting a cone of rays from its pinhole through the silhouette on its image plane. The object is then constrained to lie in the region of space that falls inside the cone due to each camera. This region, called the visual hull, is the least-committed estimate of the shape of the object based on silhouette data alone.

We can ask, what is the accuracy of the reconstruction provided by the visual hull? From a theoretical point of view, we may be interested in reconstruction in the limit: if we can observe all possible silhouettes of the object, does the visual hull reconstruct its shape exactly? As it turns out, the answer is negative: for instance, “dents” or concave parts of the surface will never appear on its silhouettes. From the practical point of view, we are concerned with discretization: since we can take only a finite number of pictures, what can we say about the shape of the computed discrete visual hull? How does it depend on the input camera positions? What does it tell us about the shape of the actual 3D object? How does it differ from the theoretical limiting case? In this thesis, we will attempt to answer some of these questions.

1.1 Defining the Visual Hull

Our point of departure is the standard theoretical definition of a visual hull introduced by Laurentini [29]:

Definition 1.1 (Visual Hull). The visual hull V of an object Ω relative to a viewing region R is a region of space consisting of all points X such that for each viewpoint (camera center) O in R, the visual ray L starting at O and passing through X contains at least one point of the object Ω.

Even though we are interested in reconstructing Ω based on the images taken by the cameras with centers in R, Definition 1.1 does not actually mention the image planes of the cameras. To arrive at an image-based definition of the visual hull (and also to clarify Definition 1.1), consider a point X that belongs to V. Let x be the projection of X in the image plane of the camera with center O. Because the ray L passes through Ω, the image point x must also be a projection of some point of Ω, so it must belong to the silhouette, or the two-dimensional region of the image plane occupied by the projection of Ω. Therefore, we can rephrase Definition 1.1 to state that V is the set of all points X that project inside the silhouette of Ω in the image plane of a perspective camera centered at any point O in R. In other words, the visual hull is the maximal (largest) shape that produces the same silhouettes as Ω when seen from all points in R.

In the opening, we mentioned the limiting visual hull, which can be obtained by observing the silhouettes of Ω from every possible viewpoint (for technical reasons, we have to require that the viewing region R be outside the convex hull of Ω [29]). This is the external visual hull, denoted V∗, and it is contained in any other visual hull of Ω. In theoretical study, then, the external visual hull can be thought of as the visual hull. There exist algorithms for computing the external visual hull of polyhedral and smooth objects, in 2D and in 3D [29, 30, 48], but they assume that an exact model of the target object is available.
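This image-based characterization translates directly into a point-membership test for a finite viewing region. The following sketch is our illustration, not an algorithm from the thesis; the inputs it assumes (3 × 4 projection matrices and boolean silhouette masks) and the helper name are hypothetical.

```python
import numpy as np

def in_visual_hull(X, projections, silhouettes):
    """Test whether a 3D point X (Euclidean coordinates) belongs to the
    discrete visual hull: X must project inside every silhouette.

    projections -- list of 3x4 camera projection matrices (assumed input)
    silhouettes -- list of 2D boolean masks, one per camera; mask[row, col]
                   is True where the object's silhouette covers the pixel
    """
    Xh = np.append(np.asarray(X, dtype=float), 1.0)   # homogeneous coordinates
    for P, mask in zip(projections, silhouettes):
        x = P @ Xh
        if x[2] <= 0:                  # point behind the camera's focal plane
            return False
        col = int(round(x[0] / x[2]))
        row = int(round(x[1] / x[2]))
        if not (0 <= row < mask.shape[0] and 0 <= col < mask.shape[1]):
            return False               # projects outside the image entirely
        if not mask[row, col]:
            return False               # outside this camera's silhouette
    return True
```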

Figure 1.1 illustrates the concept of visual hulls on a simple 2D example. On the left, we see the imaging setup for a bean-shaped object Ω and a three-camera viewing region R = {O1, O2, O3}. Note that in 2D, the retinas of the cameras are lines, and the projections of Ω (silhouettes) are one-dimensional. On the right, we see the discrete visual hull V formed by intersecting three back-projected cones K1, K2, and K3. Figure 1.2 shows the relationship between the object Ω, the external visual hull V∗, and the discrete visual hull V. Note that V∗ is contained in V, but is larger than Ω itself.


Figure 1.1 (a) Observing a 2D object Ω from a viewing region R = {O1, O2, O3}. (b) Reconstructing Ω: the three visual cones K1, K2, K3 and the discrete hull V as their intersection.

As mentioned earlier, in this work we are concerned with the discrete visual hull, where R is a finite set of isolated viewpoints. For this reason, we find it useful to state a definition of the visual hull based on volume intersection:

Definition 1.2. Let R = {O1, O2, ..., On} be the viewing region consisting of a finite number of camera centers. Consider the solid visual cone Ki of rays that originate at Oi and pass through any point on the surface of Ω. Then the visual hull V is equal to the (possibly unbounded) solid formed by the intersection of the viewing cones:

$$V = \bigcap_{i=1}^{n} K_i .$$

Figure 1.2 The relationship between the object Ω, the external visual hull V∗, and the discrete visual hull V, where R = {O1, O2, O3} as in Figure 1.1. Note that V∗ does not capture the concavity on one side of Ω (the subset of the boundary of V∗ that does not coincide with Ω is actually a bitangent to the boundary of Ω).

Although the visual hull as an intersection of visual cones is a familiar notion in computer vision, the issue of computing its exact geometric and topological structure has received little attention in previous work. In the current thesis, we attempt to fill this gap. To this end, we represent the visual hull as a generalized or topological polyhedron [17, 23]:

Definition 1.3. A generalized polyhedron M is a 3D solid (bounded or unbounded) whose boundary ∂M is a union of faces, edges, and vertices, subject to the following constraints:

• Each face is a 2-manifold with boundary;

• The intersection of two faces is either an edge, a vertex, or empty;

• The intersection of two edges is either a vertex or empty;

• Each vertex has a neighborhood of ∂M homeomorphic to a disk.


Figure 1.3 Terminology introduced so far: visual cone, silhouette, outline.

If we regard V as a topological polyhedron formed by the intersection of a finite number of visual cones, then the three components of its boundary description are characterized as follows:

Face: a maximal region of ∂V that belongs to a single visual cone and has the topology of a 2-manifold with boundary.

Edge: a maximal connected subset of an intersection between two cones that has the topology of a 1-manifold with boundary.

Vertex: an isolated point on ∂V that is the intersection of three or more faces.

To avoid confusion, we explain here our usage of the term exact. Namely, we say that a representation of the visual hull is exact when it correctly captures (up to arbitrary projective transformations) the topological and geometric features of the solid that results from intersecting a finite number of viewing cones, as defined above. By contrast, from a theoretical point of view, the external visual hull would be considered exact since it represents the “limit” of all possible visual hulls.

1.2 Previous Work: Computing Discrete Visual Hulls

There exist two conceptually different strategies for computing discrete visual hulls. The first strategy, which is the older one, is to directly implement volume intersection, while the second one is to assume a smooth object and a continuous camera motion and to reconstruct the object as the envelope of its tangent planes.

1.2.1 Volume Intersection

The main advantage of volume intersection algorithms is that they (in principle) work with any combination of calibrated input viewpoints and make no assumptions about the object shape, e.g., smoothness or topology. One common way to implement volume intersection is to approximate visual cones by polyhedra. The oldest such algorithm dates back to Baumgart’s 1974 PhD thesis [2]. In this work, a polyhedral visual hull is constructed by intersecting the viewing cones associated with polygonal approximations to the extracted silhouettes. Since 1974, many volume intersection systems have continued to rely on 3D polyhedral intersections, which can be tricky to implement. The main difficulties involve handling degenerate special cases and dealing with numerical instabilities that arise when intersecting polygons that are nearly tangent. Some reconstruction systems implement volume intersection using commercial software packages or publicly available generic solid modeling libraries [34].

Possibly the most popular technique for visual hull construction is to approximate visual cones by voxel volumes. For example, Szeliski [57] has introduced an efficient voxel-based algorithm that relies on hierarchical spatial data structures. Because of its robustness, simplicity, and speed, this basic approach remains popular. A practical modeling system today is likely to be similar to the one outlined by Wong and Cipolla [62], which uses octree-based carving, combined with the Marching Cubes algorithm [37], to extract a triangle mesh out of the voxel volume. Voxel carving is not susceptible to the numerical difficulties that can arise in exact computations with polyhedra. Moreover, its running time depends only on the number of input cameras and on the resolution of the volumetric representation, not on the intrinsic complexity of the visual hull. However, volumetric methods suffer from artifacts resulting from the quantization of the voxel volume.
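For illustration, a brute-force voxel carver in this spirit can be sketched as follows. This is our sketch, reusing the hypothetical in_visual_hull test given in Section 1.1; practical systems such as [57, 62] rely on octrees and other accelerations rather than this naive triple loop.

```python
import numpy as np

def carve_voxels(lo, hi, res, projections, silhouettes):
    """Boolean occupancy grid approximating the visual hull on the box
    [lo, hi]^3: a voxel is kept iff its center projects inside every
    silhouette. Runtime depends only on res and the number of cameras."""
    occupancy = np.zeros((res, res, res), dtype=bool)
    coords = [np.linspace(l, h, res) for l, h in zip(lo, hi)]
    for i, x in enumerate(coords[0]):
        for j, y in enumerate(coords[1]):
            for k, z in enumerate(coords[2]):
                occupancy[i, j, k] = in_visual_hull(
                    (x, y, z), projections, silhouettes)
    return occupancy
```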

For decades, 3D polyhedral intersections and voxel carving have remained the only approaches to computing visual hulls. In the last few years, vision and graphics researchers have become more interested in image-based rendering and have become more aware of multi-view geometry [14, 21]. This “paradigm shift” has resulted in new efficient algorithms that avoid general 3D intersections by taking advantage of epipolar geometry. Matusik et al. [38] describe a fast, simple algorithm that involves image-based sampling along visual rays emanating from a virtual camera. The main limitation of this algorithm is that its output is view-dependent: if one wishes to render the visual hull from a new virtual viewpoint, one must re-run the construction algorithm. Another disadvantage is that image-based visual hulls require custom-built rendering routines. For many applications, it is preferable to generate a standard polygonal model that can be displayed using standard graphics hardware.

Two later papers from the same research group [39, 54] extend the idea behind image-based visual hulls to produce view-independent polyhedral models. The most important contribution of these newer algorithms is the reduction of 3D polyhedral intersections to 2D. This is an important algorithmic advance, one which we extend in this thesis. Perhaps the most important way in which we go beyond state-of-the-art visual hull algorithms is in producing a representation of the visual hull in terms of its intrinsic topological features, instead of artifacts of discretization like polyhedra, voxels, or irregularly sampled points. We will also show how to compute visual hulls while making minimal assumptions about the form of the input and the imaging geometry.

A common thread in recent research involves using the visual hull as an initial step in surface optimization based on criteria other than silhouette-consistency. Sullivan and Ponce [56] describe a system that creates a rough model using polyhedral intersection, and then optimizes this model using triangular splines. In this work, the optimization component seeks to minimize the average ray-surface distance in 3D (as opposed to the image-plane distance between the input silhouettes and the reprojected model). An important optimization criterion introduced recently is photo-consistency, embodied in the photo hull [27]. Whereas the visual hull is required to conform only to the silhouettes in the original pictures, the photo hull must exactly reproduce all input images of the target object. Cross and Zisserman [10] discuss an “optimal” reconstruction algorithm that uses the visual hull as the initial input to an optimization routine based on photo-consistency. The idea of combining silhouette- and photo-consistency criteria in automatic object reconstruction holds much promise for future research. The algorithm proposed in this thesis relies on epipolar geometry, uses only two-dimensional computations, and, if necessary, constructs a completely image-based representation of the visual hull. For these reasons, it provides a good starting point for optimization using photo-consistency constraints.

1.2.2 Shape from Deforming Contours

Volume intersection is the most general approach to computing discrete visual hulls, requiring no assumptions about the configuration of the input viewpoints, the geometry of the surface, or its topology. However, this approach is rather ill-adapted to handling large numbers of almost coincident visual cones — a situation that arises when we wish to reconstruct an object from a video clip taken by a camera following a continuous trajectory.

If we assume that the object is smooth, we can use the elegant differential techniques of shape from deforming contours [9]. One of the key ideas is to consider the apparent contour as the image of a critical set of points on the surface at which the projection map folds [16]. The critical set is also known as the contour generator or the rim (see Figure 1.4). Observing a smooth point on the apparent contour in the image allows us to reconstruct the tangent plane to the object in space. If the camera moves continuously, the rim gradually “slips” along the surface, and the surface may be reconstructed as the envelope of its tangent planes. This is volume intersection in an infinitesimal sense, and the shape obtained as a result of such an algorithm is an approximation to the visual hull. This approach typically assumes that the target objects are smooth, and requires all outlines to be non-singular (conceptual difficulties arise when some parts of apparent contours become occluded by the surface, or when contours change topology as they evolve). In practice, finite-difference approximations have to replace derivative computations, introducing numerical instability.


Figure 1.4 The rim is the set of all points X on the surface for which the tangent plane passes through the camera center. The apparent contour in the image is formed by all points x that are projections of rim points.

Koenderink [24] was among the first to elucidate the relationship between the local geometry of the 3D surface and the geometry of the 2D contour in a single image. Giblin and Weiss [16] have introduced a mathematical framework for the problem of smooth surface reconstruction from apparent contours, along with a reconstruction algorithm for three-dimensional objects assuming orthographic projection and camera motion in a great circle. Subsequent approaches generalize this framework to handle perspective projection and general (known) camera motions. Cipolla and Blake [7] derive formulas for the depth and Gaussian curvature of the object surface along the evolving apparent contours. Based on extensive experimental results, the authors conclude that computing depth and curvature by taking finite-difference approximations of first and second derivatives of the spatiotemporal surface (the surface formed by apparent contours that evolve in time as the camera moves) is inherently sensitive to noise and computation error. Vaillant and Faugeras [58] present another treatment of the reconstruction problem, using a surface parametrization based on the Gauss map. They also note that computing depth and curvature requires derivatives of image measurements up to second order, and propose more robust solutions. Boyer and Berger [4] report a discrete algorithm based on a local approximation of the surface by an osculating quadric. All three approaches assume known camera motion and perspective projection (fully calibrated cameras), make extensive use of differential geometry, and focus on estimating local properties of the surface, such as curvature. The present work inherits some of these features. We assume smooth surfaces and use techniques of differential geometry to establish local properties of visual hulls. However, our reconstruction algorithm does not rely on approximations of local surface shape or curvature — the entity we compute is exactly the discrete visual hull.

1.2.3 Applications of Visual Hulls

Though applications per se are not the subject of the current thesis, it must be mentioned that visual hulls have considerable practical utility. Visual hulls are conceptually simple and robust, can be built from a very small number of views (≈ 10), and require as input only silhouettes and camera parameters — information that can be readily obtained in controlled lab or studio environments. We are aware of at least one commercial system [44] that has used visual hulls for automatic 3D model construction.

One of the most popular applications for visual hulls is virtual reality. Lok [35, 36] describes a system that relies on visual hulls to render avatars, or graphical representations of users in immersive virtual spaces. Leibe et al. [34] use visual hulls to automatically reconstruct objects placed on a special table by the users. The idea is that such objects can serve as natural components of a novel “wire-free” interface for human/computer interaction. In this work, the visual hull is constructed not from silhouettes in images taken by several cameras, but from shadows cast on the table by several strategically placed infrared light sources. Though this setup may seem unusual, the principle behind visual hull construction remains exactly the same.

If visual hull technology is to be useful for spontaneous human interaction with virtual environments, it needs to be fast enough to run in real time. This issue is addressed by Matusik et al. [38], who speed up the reconstruction process by introducing a view-dependent sampled representation and by applying clever optimizations to the computationally intensive operation of intersecting reprojected visual rays with silhouettes.

Besides virtual reality, another important application of visual hulls is 3D photography, or the acquisition of high-quality geometric and photometric models of real-world objects. For example, Shlyakhter et al. [54] use visual hulls to build models of trees. A recent state-of-the-art system for 3D photography is based on opacity hulls [40], which are essentially visual hulls augmented with transparency information. In this work, silhouettes are extracted using sophisticated matting techniques that involve photographing the object against a background of plasma monitors displaying specially calibrated sinusoidal patterns, and visual hulls are computed using a variant of the real-time image-based algorithm [38]. In the future, 3D photography is likely to be extended to dynamic objects and increasingly complex environments.

1.3 Mathematical Ingredients

In this thesis, our first task is to discover the “true” nature of the visual hull — that is, to find the largest set of transformations that does not change the structure of its surface as a topological polyhedron (recall Section 1.1). The key features of the visual hull surface are its edges and vertices. As already mentioned, edges of the visual hull lie on intersection curves between pairs of cones. An edge point is a point common to two visual cones, or equivalently, the intersection of two visual rays formed by back-projecting two points on two different outlines. In Chapter 4, we will show that visual hull vertices can be of two types:

Intersection Point: a point common to three visual cones. Equivalently, it is the point of intersection of three visual rays formed by back-projecting three outline points in three different views.

Frontier Point: a point where a plane passing through two of the camera centers is tangent to the object.

Though the technical details of the definitions of edges and vertices are not important at this stage, it is easy to see that these definitions are based on the notion of contact of lines and planes with a surface in 3D. Such contacts are the domain of projective geometry, as they remain invariant under all (smooth) transformations that leave lines and planes intact.

Having concluded that the true nature of the visual hull is projective, we state our primary goal: to develop algorithms for reconstructing the visual hull based on projective information alone. However, things are not so simple: projective geometry happens to lack many basic notions without which one cannot compute the visual hull. In a purely projective framework, we cannot define a ray or a segment; such intuitive relations as front/back and inside/outside have no meaning. Thus, in Chapter 2 we are forced to explore oriented projective geometry [55], an elegant extension of standard projective geometry that will give us just enough additional expressive power to describe the structure of visual hulls.

Besides oriented projective geometry, we will also need to explore projective differential geometry, an area of mathematics that deals with projectively invariant properties of smooth curves and surfaces. Chapter 3 will deal with the fundamentals of this subject, and state several important results which will be used in our main derivations in Chapter 4.

Finally, let us say a word about the key assumptions underlying our work. We assume that the object Ω is bounded by a smooth surface. The justification for the smoothness assumption comes from Koenderink, who insists that “real-life tolerances make everything smooth” [25]. In addition, we assume that Ω is generic. Informally, an object is generic with respect to some property if this property also holds for any object that is obtained by an infinitesimal perturbation. The genericity assumption also extends to the outlines seen by all input cameras: namely, outlines must remain topologically stable under infinitesimal movements of the camera center. This requirement rules out viewpoints located on the critical event surfaces that form the cell boundaries in perspective aspect graphs [47]. Finally, a general position assumption applies to the configuration of the cameras in the viewing region. Specifically, no four cameras can be coplanar, and no three cameras can be located in the same tangent plane to the surface.

The main purpose of the genericity assumption is to restrict our attention to properties that we expect to find in “typical” situations. For our purposes, generic objects and camera configurations are precisely those that persist under small perturbations that can easily be introduced by noise and numerical error in real-world applications.

1.4 Overview

The rest of the thesis is organized as follows. Chapter 2 presents the basics of oriented projective geometry and applies them to analyze the geometry of single- and multi-view configurations of perspective cameras. Appendix A includes a reference table of useful formulas derived using the framework of Chapter 2. Chapter 3 is a survey of projective differential geometry for curves in 2D and 3D, and surfaces in 3D. Chapter 4 is the heart of the thesis: it applies the mathematical framework of Chapters 2 and 3 to describe the geometric properties of visual hulls. In addition to several results of mostly theoretical interest, this chapter gives algorithms for constructing the visual hull and related data structures. The implementation of these algorithms is described in Chapter 5, which also contains results on four different data sets (one synthetic and three real). The thesis concludes in Chapter 6 with a summary and a discussion of future research directions.

A preliminary version of this research [32] was presented at the IEEE Conference on Computer Vision and Pattern Recognition in December 2001.

CHAPTER 2

Oriented Projective Geometry

The study of oriented projective geometry (OPG) was inaugurated by Stolfi [55], who argued for the adoption of this framework in the fields of computational geometry and computer graphics. Laveau and Faugeras [31] were probably the first to show an interest in OPG in the field of computer vision. Hartley [20] has built on oriented concepts to develop his ideas of quasi-affine reconstruction and chirality (these will be briefly mentioned in Section 2.5). Werner and Pajdla [60, 61] have described oriented matching constraints that are mathematically equivalent to the epipolar consistency constraints described in Section 2.4.

This chapter consists of two parts. The first part, Sections 2.1-2.2, presents a summary of Stolfi’s framework. This summary, though necessarily brief and dense, is intended to be self-contained (at least for the purposes of this thesis). The second part applies OPG to perspective cameras. Section 2.3 deals with the “anatomy” of a single camera, and Section 2.4 deals with multi-view geometry, in particular, the fundamental matrix and the trifocal tensor. To our knowledge, the formulas for oriented transfer (Section 2.4.3) are novel. Section 2.5 briefly considers the subject of oriented reconstruction. Also, refer to Appendix A for a list of oriented formulas for computing with flats in 2D and 3D.

2.1 Basics

2.1.1 Oriented Projective Space

Let us begin by recalling the standard definition of an unoriented projective space:

Definition 2.1 (Projective Space). The n-dimensional projective space P^n is formed from the real vector space R^{n+1} by taking away the null vector 0 and identifying all vectors that are non-zero scalar multiples of each other. In other words, P^n is the quotient of R^{n+1} \ {0} under the relation ∼, defined as

$$x \sim y \quad\text{iff}\quad x = \lambda y \ \text{for some}\ \lambda \neq 0 .$$

Using the usual notation for a quotient, we can write the above briefly as

$$P^n = (R^{n+1} \setminus \{0\}) / \sim .$$

An oriented projective space T^n is defined in the same way, except that we only identify vectors that are positive multiples of each other:

Definition 2.2 (Oriented Projective Space). The n-dimensional oriented projective space T^n is the quotient

$$T^n = (R^{n+1} \setminus \{0\}) / \approx \,, \quad\text{where}\quad x \approx y \ \text{iff}\ x = \lambda y \ \text{for some}\ \lambda > 0 .$$

In the following, we will use the analytic or vector space model of T^n [55, p. 16]. In this model, we can refer to points in T^n by their representative vectors in R^{n+1}. The statement X = p(x) says that X is the unique point of T^n that is the equivalence class consisting of the vector x and all its positive multiples. For any two vectors x and x′ in R^{n+1}, we have p(x) = p(x′) if and only if x ≈ x′. The points represented by vectors x and −x are called antipodal. We will denote the antipode of point X as ¬X.
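As a quick numerical illustration of Definition 2.2 (a sketch with our own helper names, not part of the original text), points of T^n can be stored as representative vectors and compared up to positive scale:

```python
import numpy as np

def same_oriented_point(x, y, tol=1e-9):
    """True iff p(x) = p(y) in T^n, i.e. x = lam * y for some lam > 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    i = np.argmax(np.abs(y))          # pivot on the largest entry of y
    if y[i] == 0.0:
        return False                  # y is the (excluded) null vector
    lam = x[i] / y[i]
    return lam > 0 and np.allclose(x, lam * y, atol=tol)

x = np.array([1.0, 2.0, -3.0])
assert same_oriented_point(x, 2.5 * x)   # positive multiples: the same point
assert not same_oriented_point(x, -x)    # -x is the antipode, a distinct point
```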

Vector space                  Oriented projective space
---------------------------   -------------------------
Ray                           Point
Linear subspace               Flat set
Direct sum of subspaces       Join of flats
Intersection of subspaces     Meet of flats
Linear map                    Projective map

Table 2.1 Structures and operations in the vector space R^{n+1}, together with the structures and operations they induce in the oriented projective space T^n.

In the vector space model, the geometric structure of the oriented projective space is induced by the structure of the underlying vector space (see Table 2.1). In the following section, we will define flats of T^n, which correspond to linear subspaces of R^{n+1}.

A Note On Topology. In the rest of this document, we will assume that R^{n+1} has the standard topology induced by the Euclidean metric.¹ Equivalently, this is the topology whose basis consists of sets of the form A1 × ... × An+1, where each Ai is an open interval of the real line R [43]. Then the topology of P^n (resp. T^n) is simply the quotient topology induced by the quotient map that takes each vector of R^{n+1} to its equivalence class under the relation ∼ (resp. ≈). That is, the open sets of P^n (resp. T^n) are exactly the sets whose pre-images under the respective quotient map are open in R^{n+1}. An important topological fact is that both P^n and T^n are n-dimensional manifolds — that is, every point in these spaces has a neighborhood homeomorphic to an open set of R^n. Even though P^n is more familiar to us, the oriented space T^n is actually simpler: it is homeomorphic to S^n, the n-dimensional sphere embedded in R^{n+1}.

¹ Actually, the metric induced by any p-norm will do.

2.1.2 Flats

Before defining an oriented flat, we will first introduce a simpler concept of unoriented flat sets.

Definition 2.3 (Flat Set). A set F of points of T^n is a d-dimensional flat set of T^n if there exists a (d+1)-dimensional vector subspace S of R^{n+1} such that

$$F = \{\, p(x) \mid x \in S \setminus \{0\} \,\} .$$

Flat sets of dimensions 0, 1, and 2 are unoriented points, lines, and planes, respectively. A line is uniquely determined by two non-coincident points, and a plane is uniquely determined by three non-collinear points. Overall, a set of d+1 points in general position, called a proper simplex, forms a basis for a d-dimensional flat set, in the same way that a set of d+1 linearly independent vectors forms a basis for a (d + 1)-dimensional vector subspace.

Definition 2.4 (Simplex). A d-simplex is an ordered tuple of d+1 vertices, or points in T^n. The simplex is proper if the vectors representing these points are linearly independent; otherwise, it is degenerate. The span of a simplex is the unique flat set of minimum dimension that contains it.

If a flat set is spanned by a simplex with vertices (X1, ..., Xd+1), we will denote that flat set as [X1, ..., Xd+1]. Next, we need to formulate a notion of equivalence for any two simplices spanning the same flat set. Intuitively, two simplices are equivalent if one can be continuously transformed into the other such that all intermediate simplices span the same flat set. To make this idea more precise, we first need to define basis equivalence for vector spaces.

Definition 2.5 (Basis Equivalence). Let x1, ..., xd+1 and y1, ..., yd+1 be two ordered bases for a (d+1)-dimensional vector subspace V of R^{n+1}. Then there exists a unique linear transformation L : V → V such that yi = L xi for i = 1, ..., d+1. The two bases are called equivalent (equivalently oriented) if L has positive determinant.

18 The above equivalence relation partitions the set of all ordered bases for the same vector subspace into two classes (that there are exactly two classes can be shown using the product rule for determinants and the fact that the determinant of L must be either positive or negative).

Definition 2.6 (Simplex Equivalence). Let (X1, ..., Xd+1) and (Y1, ..., Yd+1) be two proper simplices spanning the same d-dimensional flat set of T^n. Let x1, ..., xd+1 and y1, ..., yd+1 be any two sets of vectors such that Xi = p(xi) and Yi = p(yi) for i = 1, ..., d+1. Then (X1, ..., Xd+1) and (Y1, ..., Yd+1) are equivalently oriented if x1, ..., xd+1 and y1, ..., yd+1 are equivalently oriented.

It is easy to verify that multiplying any vector xi or yi by a positive scale factor does not change the determinant of the transformation L in Definition 2.5. Therefore, Definition 2.6 of simplex equivalence does not depend on the particular choice of vector representatives for the simplices (X1, ..., Xd+1) and (Y1, ..., Yd+1).

Proposition 2.1. Let σ be a permutation of the integers 1, ..., d+1. Then (X1, ..., Xd+1) and (Xσ(1), ..., Xσ(d+1)) are in the same equivalence class if and only if σ is an even permutation. Also, the simplices (X1, ..., Xi, ..., Xd+1) and (X1, ..., ¬Xi, ..., Xd+1) are in different equivalence classes.

One particularly useful oriented notion is the interior of a simplex. Without this notion, we cannot define what it means for a point to belong to a segment, a triangle, or a tetrahedron.

Definition 2.7 (Simplex Interior). The interior of a simplex (X1, ..., Xd+1) is the set of all points X that produce an equivalent simplex when substituted for any of the vertices Xi. That is, the simplex (X1, ..., Xi−1, X, Xi+1, ..., Xd+1) is equivalent to (X1, ..., Xd+1) for i = 1, ..., d+1.

All the simplices spanning a given flat set of T^n form two equivalence classes, identified with two orientations of that flat set. In this way, each flat set gives rise to two oppositely oriented flats, corresponding to the two simplex classes.

Definition 2.8 (Flat). An oriented flat is a flat set to which an orientation has been assigned by naming one equivalence class of simplices that span it. Given any oriented flat X, there exists an opposite flat, denoted ¬X, consisting of the same points, only taken with a different orientation.

Let F denote the set of all oriented flats of T^n, and F^d the set of flats of dimension d. We will postulate the existence of the set F^{−1}, consisting of two flats: Λ, the positive vacuum, and ¬Λ, the negative vacuum (the bases for Λ and ¬Λ consist of zero points). Similarly, the set F^n consists of two flats: Υ, the positive universe, and ¬Υ, the negative universe. An (n+1)-simplex will be called positive if it spans Υ, and negative if it spans ¬Υ.

An important comment about the members of F^0 is in order. By definition, a zero-dimensional flat set is an unordered set of antipodal points. This flat set gives rise to two oriented flats, each of which consists of the same two antipodal points, but with a different point singled out as the “positive” (spanning) simplex. Thus, zero-dimensional flats can be identified with the points of T^n. Nevertheless, it is important to keep in mind the technical distinction between the two types of entities. For instance, the operations of join, meet, and relative orientation, which will be presented in the next section, are defined for flats, not for points.

2.1.3 Join, Meet, and Relative Orientation

In the following, F = [X1, ..., Xd+1] and G = [Y1, ..., Ye+1] will denote d- and e-dimensional oriented flats of T^n, respectively.²

² Even though we will denote unoriented flat sets and oriented flats in the same way, the context will always make it clear which entity is being discussed.

Definition 2.9 (Join). The join of two disjoint flats F and G is the flat spanned by the simplex formed by concatenating the simplices that span F and G:

$$F \vee G = [X_1, \dots, X_{d+1}] \vee [Y_1, \dots, Y_{e+1}] = [X_1, \dots, X_{d+1}, Y_1, \dots, Y_{e+1}] .$$

Note that the join operation is not defined if (X1, ..., Xd+1, Y1, ..., Ye+1) is not a proper simplex. Whenever F ∨ G is undefined, we write F ∨ G = 0 (recall that 0 does not actually exist in the oriented projective space T^n, so we can use the null symbol to denote an “undefined” result of an operation).

Proposition 2.2. Join has the following properties:

1. dim(F ∨ G) = d + e + 1.

2. (¬F) ∨ G = F ∨ (¬G) = ¬(F ∨ G).

3. Identity: Λ ∨ F = F ∨ Λ = F, (¬Λ) ∨ F = F ∨ (¬Λ) = ¬F.

4. Associativity: (F ∨ G) ∨ H = F ∨ (G ∨ H).

5. Commutativity:

$$F \vee G = \begin{cases} G \vee F , & \text{if } (d+1)(e+1) \text{ is even,} \\ \neg(G \vee F) , & \text{if } (d+1)(e+1) \text{ is odd.} \end{cases}$$

Briefly, we write $F \vee G = \neg^{(d+1)(e+1)}(G \vee F)$.

The definition of the meet operator is a bit more tricky. In particular, the meet of two flats F and G is defined only with respect to a flat U of smallest dimension that contains both F and G.

Definition 2.10 (Meet). Let P, Q, R, and U be flats such that F = P ∨ Q, G = Q ∨ R, and P ∨ Q ∨ R = U. Then Q is the result of the meet of F and G with respect to U:

$$F \wedge_U G = Q .$$

In other words, for any three flats P, Q, and R such that P ∨ Q ∨ R = U, we have by definition

$$(P \vee Q) \wedge_U (Q \vee R) = Q .$$

In the future, we will write F ∧ G instead of F ∧_Υ G.

Proposition 2.3. Meet has the following properties:

1. dim(F ∧ G) = d + e − n.

2. (¬F) ∧ G = F ∧ (¬G) = ¬(F ∧ G).

3. Identity: Υ ∧ F = F ∧ Υ = F, (¬Υ) ∧ F = F ∧ (¬Υ) = ¬F.

4. Associativity: (F ∧ G) ∧ H = F ∧ (G ∧ H).

5. Commutativity: $F \wedge G = \neg^{(n-d)(n-e)}(G \wedge F)$.

In two dimensions, the meet operation is defined only for two lines, and in three dimensions, it is defined only for two lines, or a line and a plane.

One of the most useful things about an oriented projective space is the ability to define the relative orientation of various flats with respect to one another. We can answer questions like: does a given point lie to the left or the right of a given line? In the unoriented projective plane P^2, this question is meaningless, since P^2 is unorientable, and lines have no “sides”. In fact, if one removes a line from P^2, the remaining set of points is homeomorphic to a disk. On the other hand, T^2 is topologically equivalent to a sphere, and lines are great circles that partition the sphere into two connected components, or “sides”.

In general, relative orientation is defined for two flats F and G if d + e + 1 = n, that is, if F ∨ G is equal to the positive or negative universe. Relative orientation, denoted F ⋄ G, can be defined either in terms of the join or the meet operation, as follows:

Definition 2.11 (Relative Orientation).

$$F \diamond G = \begin{cases} +1 \\ 0 \\ -1 \end{cases} \quad\text{iff}\quad F \vee G = \begin{cases} \Upsilon \\ 0 \\ \neg\Upsilon \end{cases} \quad\text{iff}\quad F \wedge G = \begin{cases} \Lambda \\ 0 \\ \neg\Lambda \end{cases} .$$

In 2D, relative orientation is defined for points and lines, and in 3D, it is defined for points and planes, or for pairs of lines. The following result [55, p. 66, Theorem 2] will be useful to us:

Proposition 2.4. A flat F of dimension d is uniquely characterized by the sign-valued function σ_F : F^{n−d−1} → {−1, 0, 1}, defined by σ_F(G) = F ⋄ G, for any (n−d−1)-dimensional flat G.

2.1.4 Oriented Projective Transformations

Definition 2.12 (Oriented Projective Transformation). A function M : T^n → T^n is an oriented projective transformation of T^n if it takes positive (n+1)-simplices to positive (n+1)-simplices, and there exists a linear map L : R^{n+1} → R^{n+1} such that

$$p(L(x)) = M(p(x)) \quad\text{for all}\ x \in R^{n+1} \setminus \{0\} .$$

Two linear maps L and L′ give rise to the same oriented projective transformation if and only if L′ = λL, λ > 0. Oriented projective maps can be extended from maps on points to maps on flats as follows:

$$M(\Lambda) = \Lambda , \qquad M(\neg\Lambda) = \neg\Lambda , \qquad M([X_1, \dots, X_{d+1}]) = [M(X_1), \dots, M(X_{d+1})] .$$

Proposition 2.5. Let M : T^n → T^n be an oriented projective transformation. Then M has the following properties:

1. M(F ∨ G) = M(F) ∨ M(G).

2. M(F ∧ G) = M(F) ∧ M(G).

3. M(¬F) = ¬M(F).

4. M(F) ⋄ M(G) = F ⋄ G.

Apart from full-rank maps of the space T^n to itself that preserve the relative orientation of simplices, we are also interested in projective maps between oriented spaces of different dimensions. Of particular interest will be the camera projection map, taking points of T^3 to points of T^2.

Definition 2.13 (Generalized Projective Transformation). A function M : T^m → T^n is a generalized projective transformation if there exists a linear map L : R^{m+1} → R^{n+1} such that

$$p(L(x)) = M(p(x)) \quad\text{for all}\ x \in R^{m+1} \setminus \{0\} .$$

The null space and the range of M are two flat sets defined as follows:

$$\text{Null}(M) = \{\, X \in T^m \mid M(X) = 0 \,\} , \qquad \text{Range}(M) = \{\, Y \in T^n \mid Y = M(X) \ \text{for some}\ X \in T^m \,\} .$$

A generalized projective transformation extends to flats in the same way as an oriented projective transformation. Note that if F is a flat that is not disjoint from Null(M), then M(F) = 0. Unlike a true oriented projective map, a generalized map can return an undefined result even if the argument is a well-defined flat.

Proposition 2.6. Let M : T^m → T^n be a generalized projective transformation. M has the following properties:

1. M(F ∨ G) = M(F) ∨ M(G).

2. Let U be any flat of T^m disjoint from Null(M), and let V = M(U) be the well-defined flat of T^n that is the image of U under M. If F and G are both contained in U, then M(F ∧_U G) = M(F) ∧_V M(G).

3. M(¬F) = ¬M(F).

Note that property 4 of oriented projective maps has no analogue in the list above. In fact, the requirement that the orientation of simplices be preserved does not even make sense for a generalized map, which may take a full-rank simplex of the domain into a degenerate simplex of the range.

One interesting class of generalized projective maps consists of invertible maps from T^n to itself that map positive (n+1)-simplices onto negative (n+1)-simplices. Such maps are called orientation-reversing. By contrast, oriented projective transformations are constrained by definition to be orientation-preserving.

2.2 Computing with Flats

In this section, we will introduce the representation for flats used in the rest of the thesis, and present a complete list of oriented formulas for join, meet, and relative orientation.

2.2.1 Oriented Projective Frames

Points will be represented using signed homogeneous coordinates, defined with respect to an oriented projective basis. Intuitively, to define a projective basis it would seem to be sufficient to select a basis a1, ..., an+1 of R^{n+1} and to take the resulting n+1 points Ai = p(ai). However, the knowledge of these n+1 points is insufficient to determine a unique coordinate vector of a new point: even though we can unambiguously go from a vector basis a1, ..., an+1 to a set of points A1, ..., An+1, we cannot go backwards. After all, each point Ai is equally well represented not only by the basis vector ai, but by any positive multiple λi ai. Thus, we simply don’t know how to select a representative vector for each point Ai to form a unique linear combination. The solution is to add one extra point to the projective basis.

Definition 2.14 (Oriented Projective Basis). An oriented projective basis for T^n consists of n+2 points A1, ..., An+2 such that the following conditions are met:

1. (A1, ..., An+1) is a proper positive simplex. That is, [A1, ..., An+1] = Υ.

2. The point An+2 is in the interior of the simplex (A1, ..., An+1) (recall Definition 2.7).

3. There exists a basis a1, ..., an+1 of R^{n+1} such that

$$A_i = p(a_i), \quad i = 1, \dots, n+1, \qquad\text{and}\qquad A_{n+2} = p(a_1 + \dots + a_{n+1}) .$$

(A1, ..., An+1) is called the main simplex, and An+2 is called the unit point.

Note that the traditional definition of a basis for an unoriented projective space includes only the third requirement above.

With the additional constraint imposed by the unit point, the ambiguity in selecting a vector basis a1,...,an+1 effectively disappears: the n + 1 independent scale factors λi associated with all ai are now reduced to a single factor λ common to all the vectors.

Proposition 2.7. Let A1, ..., An+2 be an oriented projective basis of T^n, and let a1, ..., an+1 and a′1, ..., a′n+1 be two bases of R^{n+1} satisfying the conditions of Definition 2.14. Then there exists a scalar λ > 0 such that a′i = λ ai, i = 1, ..., n+1.

Now, given an oriented projective basis A1, ..., An+2 and an underlying vector basis a1, ..., an+1, we can easily determine homogeneous coordinates of new points. The vector x = (x1, ..., xn+1)^T is the signed homogeneous coordinate vector of a point X if and only if

$$X = p(x_1 a_1 + \dots + x_{n+1} a_{n+1}) .$$
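This recipe is directly computable. The sketch below is our illustration (coords_in_frame is a hypothetical helper): it recovers an admissible vector basis from arbitrary representatives of the n+2 frame points, using the scale factors guaranteed by Proposition 2.7, and then solves for the signed homogeneous coordinates.

```python
import numpy as np

def coords_in_frame(main_pts, unit_pt, x_rep):
    """Signed homogeneous coordinates of the point p(x_rep) in the oriented
    frame whose main simplex and unit point are given by representative
    vectors (each known only up to positive scale)."""
    B = np.column_stack(main_pts)        # representatives b_i of A_1..A_{n+1}
    lam = np.linalg.solve(B, unit_pt)    # unit point = sum(lam_i * b_i)
    assert np.all(lam > 0), "unit point must be interior to the main simplex"
    A = B * lam                          # a_i = lam_i * b_i satisfies Def. 2.14
    return np.linalg.solve(A, x_rep)     # x with X = p(x_1 a_1 + ... )

# In the canonical frame, coordinates coincide with the representative vector.
e = np.eye(3)
x = coords_in_frame([e[:, 0], e[:, 1], e[:, 2]], np.ones(3),
                    np.array([2.0, -1.0, 3.0]))
assert np.allclose(x, [2.0, -1.0, 3.0])
```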

2.2.2 Simplex Orientation

In this section, we describe an algebraic criterion that allows us to designate a simplex as positive or negative based on the homogeneous coordinates of its points.

Definition 2.15 (Simplex Orientation Using Determinants). Let (X1, ..., Xn+1) be a proper (n+1)-simplex and x1, ..., xn+1 be the homogeneous coordinate vectors of its points in some fixed oriented projective coordinate system. Then the simplex is defined as positive if and only if the determinant of the coordinate vectors, denoted |x1, ..., xn+1|, is positive.

We can actually show that the above convention is independent of the particular oriented frame used. That is, if a simplex is designated as positive in one frame, it will also be positive in any other frame that meets the criteria of Definition 2.14.
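In code, Definition 2.15 is a one-line determinant test. The following sketch is our illustration (the function name is ours); it also checks the behavior predicted by Proposition 2.1:

```python
import numpy as np

def simplex_sign(points):
    """+1 for a positive (n+1)-simplex, -1 for a negative one, and 0 (up to
    floating-point error) for a degenerate one (Definition 2.15)."""
    return int(np.sign(np.linalg.det(np.column_stack(points))))

# The canonical positive simplex of T^2 has determinant |e1, e2, e3| = 1.
e1, e2, e3 = np.eye(3)
assert simplex_sign([e1, e2, e3]) == +1
assert simplex_sign([e2, e1, e3]) == -1   # an odd permutation flips orientation
assert simplex_sign([-e1, e2, e3]) == -1  # so does replacing a vertex by its antipode
```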

To simplify working with projective coordinates, we can define certain canonical frames. In T^n, the canonical frame consists of the points Ei = p(εi), i = 1, ..., n+2:

$$\varepsilon_1 = (1, 0, \dots, 0)^T , \quad \varepsilon_2 = (0, 1, \dots, 0)^T , \quad \dots, \quad \varepsilon_{n+1} = (0, 0, \dots, 1)^T , \quad \varepsilon_{n+2} = (1, 1, \dots, 1)^T . \qquad (2.1)$$

We will designate (E1, ..., En+1) as the canonical positive simplex of T^n.

For purposes of computation, it is sometimes convenient to represent coordinate vectors in n-dimensional Euclidean space, E^n, as homogeneous coordinate vectors in T^n. For example, a Euclidean coordinate vector (x1, ..., xn)^T becomes a homogeneous vector w (x1, ..., xn, 1)^T, where w is some positive scalar. In the Euclidean interpretation, we consider the last entry of each vector to be the homogeneous coordinate. A positive value indicates a finite point, and a value of 0 indicates a point at infinity, or a vector with direction given by the first n entries.
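A minimal sketch of this Euclidean interpretation (the helper names are ours, not from the thesis):

```python
import numpy as np

def to_homogeneous(x_euclid, w=1.0):
    """Lift a point of E^n to signed homogeneous coordinates in T^n; any
    positive w yields a representative of the same oriented point."""
    assert w > 0
    return np.append(w * np.asarray(x_euclid, float), w)

def to_euclidean(x_hom):
    """Recover the Euclidean point from a finite point of the front range."""
    assert x_hom[-1] > 0, "point at infinity, or on the back range"
    return np.asarray(x_hom[:-1], float) / x_hom[-1]

p = to_homogeneous([3.0, 4.0], w=2.0)      # -> array([6., 8., 2.])
assert np.allclose(to_euclidean(p), [3.0, 4.0])
```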

The canonical oriented projective frame (2.1) has a convenient Euclidean interpretation. In two dimensions, we can interpret E1 as the point at infinity along the positive x-axis, E2 as the point at infinity along the positive y-axis, and E3 as the origin. To visualize the orientation of this triad, we can picture a circular arrow pointing counterclockwise from E1 to E2 to E3.

Similarly, in three dimensions, we can think of E1, E2, and E3 as the points at infinity along the positive x-, y-, and z-axes, respectively, and E4 as the origin. To visualize the orientation of this “tetrahedron”, we can imagine curling the fingers of the left hand clockwise from E1 to E2 to E3, with the thumb pointing towards E4.³

2.2.3 Representing General Flats

One common way to represent flats of general dimension is by using Plücker coordinates [55, Chapter 19]. A d-dimensional flat in T^n is represented using a vector of $\binom{n+1}{d+1}$ homogeneous coordinates. The Plücker coordinates of a point⁴ are simply its signed homogeneous coordinates, as defined in Section 2.2.1.

Let us take the case d = n. The two members of F^n are represented by scalars: Υ by positive numbers, and ¬Υ by negative numbers. This system fits rather well with the orientation convention described in the previous section. For instance, given homogeneous coordinate vectors of n+1 points, we can compute their span simply by taking their determinant (see Definition 2.15). Also, given any two flats F = [X1, ..., Xd+1] and G = [Y1, ..., Ye+1] such that d + e + 1 = n, we can compute F ∨ G as follows:

$$F \vee G = \begin{cases} \Upsilon \\ 0 \\ \neg\Upsilon \end{cases} \quad\text{iff}\quad \operatorname{sgn} |x_1, \dots, x_{d+1}, y_1, \dots, y_{e+1}| = \begin{cases} +1 \\ 0 \\ -1 \end{cases} ,$$

where xi are the coordinates of Xi, and yi are the coordinates of Yi.

³ Ideally, we would like to orient tetrahedra using the right-hand rule. But then the interpretation of the canonical simplex has to be changed to make E1 the origin, etc. This can be done consistently in all dimensions by considering the homogeneous coordinate to be the first entry of each vector. Unfortunately, in the vision and graphics literature it is customary to put the homogeneous coordinate last, and we have to stay faithful to this convention.

⁴ Here and in the future, when we say “point”, we mean “zero-dimensional flat”. Refer to the discussion at the end of Section 2.1.2.

A comparison of the above formula with the abstract formula for relative orientation (Definition 2.11) makes it immediately obvious how to compute the relative orientation of two flats of complementary dimension, given the coordinates of their representative simplices.

Next, let us consider hyperplanes, flats of dimension d = n − 1 (lines in T^2 and planes in T^3). For convenience of computation, we will represent hyperplanes not by their (n+1)-dimensional Plücker coordinate vectors, but by hyperplane coefficients, which can be obtained from Plücker coordinate vectors by reversing their entries and flipping certain signs. The purpose of this change is to simplify relative orientation formulas by converting them to dot products. The coefficients of a hyperplane H are defined (up to positive scale) by a vector h such that, for any point X with homogeneous coordinates x, we have

$$H \diamond X = \begin{cases} +1 \\ 0 \\ -1 \end{cases} \quad\text{iff}\quad \operatorname{sgn}(h^T x) = \begin{cases} +1 \\ 0 \\ -1 \end{cases} .$$

Finally, lines in T^3 will be represented using a vector of six Plücker coordinates, also defined up to positive scale. A Plücker coordinate vector L = (l12, l13, l14, l23, l24, l34)^T of a line L can be computed given the coordinates of two points that lie on L, or the coefficients of two planes that contain L. Significantly, not every sextuple of numbers represents valid Plücker coordinates of some line in space. For any line L, we must have L ∨ L = 0, and this places a quadratic constraint on the coordinate vector L, expressed in equation (A.18). To simplify the formula for relative orientation of two lines, we can also define a coefficient vector of the line, as shown in (A.17).
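As an illustration, the Plücker vector of the join of two points and its quadratic constraint can be computed as follows. This is our sketch, built from plain 2 × 2 minors; the exact sign conventions of (A.17)-(A.18) in Appendix A may differ.

```python
import numpy as np

PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # (l12, ..., l34)

def pluecker_join(x, y):
    """Pluecker coordinate vector of the line through p(x) and p(y) in T^3,
    built from the 2x2 minors of the 4x2 matrix [x, y]."""
    return np.array([x[i] * y[j] - x[j] * y[i] for i, j in PAIRS])

def pluecker_constraint(L):
    """Quadratic form that vanishes exactly on valid line coordinates."""
    return L[0] * L[5] - L[1] * L[4] + L[2] * L[3]

X = np.array([1.0, 0.0, 0.0, 1.0])
Y = np.array([0.0, 1.0, 0.0, 1.0])
L = pluecker_join(X, Y)
assert abs(pluecker_constraint(L)) < 1e-12     # L v L = 0
assert np.allclose(pluecker_join(Y, X), -L)    # swapping points flips orientation
```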

In the rest of this document, we will simplify notation by identifying flats with their coordinate (coefficient, Plücker) vectors. For example, we will generally make no distinction between a point X and its coordinate vector x. Lowercase letters will denote flats in T^2, and uppercase letters will denote flats in T^3. In the next section, we will show formulas for manipulating flats in T^2 and T^3 using their vector representation. The general method for deriving these formulas may be found in Stolfi [55, Chapter 20]. Even though join, meet, and relative orientation are defined only on flats, we will abuse notation by using the symbols ∨, ∧, and ⋄ to denote operations on vectors. For example, if x and y are the homogeneous coordinates of two 2D points X and Y, we will write x ∨ y to denote the coefficient vector of the line X ∨ Y. This notation is potentially ambiguous: if we do not know whether the 3-vectors x and y are supposed to represent points or lines, we cannot evaluate the statement x ∨ y. For example, if x and y are two points, then x ∨ y = x × y (A.1); however, if x is a point and y is a line, then x ∨ y = x^T y (A.4). We will avoid such problems by always clearly stating which flats the coordinate vectors are supposed to represent.

For a complete reference sheet of formulas for computing with flats in 2D and 3D, refer to Appendix A.
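In 2D these operations reduce to cross and dot products, so they are easy to exercise numerically. The sketch below is our illustration; it uses plain cross products and ignores any additional sign conventions that the oriented formulas of Appendix A may impose.

```python
import numpy as np

def join_points(x, y):
    """Coefficient vector of the oriented line X v Y through two 2D points,
    given in signed homogeneous coordinates: x v y = x cross y (cf. (A.1))."""
    return np.cross(x, y)

def rel_orientation(l, x):
    """Relative orientation of a line l and a point x: the sign of l . x."""
    return int(np.sign(np.dot(l, x)))

o = np.array([0.0, 0.0, 1.0])              # the origin
u = np.array([1.0, 0.0, 1.0])              # the point (1, 0)
l = join_points(o, u)                       # an oriented line through both
assert rel_orientation(l, np.array([0.0, 1.0, 1.0])) != 0  # (0,1) is off the line
assert np.allclose(join_points(u, o), -l)   # join of two points anticommutes:
                                            # (d+1)(e+1) = 1 is odd, x v y = not(y v x)
```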

2.2.4 Representing Projective Transformations

Oriented projective transformations in T^n correspond to (n+1) × (n+1) matrices with positive determinant. We will also refer to oriented projective transformations as orientation-preserving transformations. Orientation-reversing transformations are invertible transformations from T^n to T^n that map positive (n+1)-simplices to negative (n+1)-simplices (recall Section 2.1.4). Such transformations correspond to (n+1) × (n+1) matrices with negative determinant.

If we know the action of a given transformation on points, we can derive the induced action of the same transformation on higher-dimensional flats. The following result, which will be used in Section 2.5, shows how the coefficient vector of a plane changes under a transformation that acts on points:

Proposition 2.8. Let $P = X \vee Y \vee Z$, and suppose that the points $X$, $Y$, and $Z$ undergo an orientation-preserving or orientation-reversing projective transformation $M$, as $\tilde{X} = MX$, etc. Then the coefficient vector $P$ transforms as follows:
$$\tilde{P} = \tilde{X} \vee \tilde{Y} \vee \tilde{Z} = |M|\, M^{-T} P \,, \qquad (2.2)$$
where $|M|$ denotes the determinant of the matrix $M$.

Proof. Let $W = (w_1, w_2, w_3, w_4)^T$ be the coordinate vector of an arbitrary point, and $\tilde{W} = MW = (\tilde{w}_1, \tilde{w}_2, \tilde{w}_3, \tilde{w}_4)^T$ be the coordinates of the transformed point. According to Proposition 2.4, a flat is uniquely characterized by the function giving its relative orientation with respect to every other flat of complementary dimension. That is, if two candidate coefficient vectors satisfy $\tilde{P} \odot \tilde{W} = \tilde{P}' \odot \tilde{W}$ (equivalently, $\tilde{P} \vee \tilde{W} = \tilde{P}' \vee \tilde{W}$) for every point $\tilde{W}$, then $\tilde{P} = \tilde{P}'$. In the following, $A$, $B$, $C$, $D$ will denote the columns of the inverse transformation matrix $M^{-1}$.
$$\begin{aligned}
\tilde{P} \vee \tilde{W} &= |\tilde{X}, \tilde{Y}, \tilde{Z}, \tilde{W}| = |MX, MY, MZ, MW| = |M|\,|X, Y, Z, M^{-1}\tilde{W}| \\
&= |M|\,|X, Y, Z, \tilde{w}_1 A + \tilde{w}_2 B + \tilde{w}_3 C + \tilde{w}_4 D| \\
&= \tilde{w}_1 |M|\,|X, Y, Z, A| + \tilde{w}_2 |M|\,|X, Y, Z, B| + \tilde{w}_3 |M|\,|X, Y, Z, C| + \tilde{w}_4 |M|\,|X, Y, Z, D| \\
&= |M| \left( \tilde{w}_1 P^T A + \tilde{w}_2 P^T B + \tilde{w}_3 P^T C + \tilde{w}_4 P^T D \right) \\
&= |M|\, P^T M^{-1} \tilde{W} = \left( |M|\, M^{-T} P \right) \vee \tilde{W} \,.
\end{aligned}$$
Therefore, $\tilde{P} = |M|\, M^{-T} P$.

Proposition 2.9 (Corollary). Let $X = P \wedge Q \wedge R$, and suppose that the planes are transformed as $\tilde{P} = M^{-T} P$, etc. Then $X$ transforms as follows:
$$\tilde{X} = \tilde{P} \wedge \tilde{Q} \wedge \tilde{R} = |M^{-1}|\, M X \,. \qquad (2.3)$$
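Both transformation rules can be sanity-checked numerically. The following is a minimal NumPy sketch; the plane-from-points helper is our own construction, normalized so that $P \vee W = |X, Y, Z, W|$ holds exactly, in which case (2.2) holds with equality rather than up to positive scale.

```python
import numpy as np
rng = np.random.default_rng(0)

def plane_from_points(x, y, z):
    """Coefficient vector P of X v Y v Z, normalized so that
    P . w = det([x | y | z | w]) for every point w."""
    return np.array([np.linalg.det(np.column_stack([x, y, z, e]))
                     for e in np.eye(4)])

X, Y, Z = rng.standard_normal((3, 4))
P = plane_from_points(X, Y, Z)

M = rng.standard_normal((4, 4))        # generic: nonzero determinant, either sign
P_new = plane_from_points(M @ X, M @ Y, M @ Z)
P_rule = np.linalg.det(M) * np.linalg.inv(M).T @ P   # equation (2.2)
assert np.allclose(P_new, P_rule)
```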

We will also be interested in generalized projective transformations between spaces of different dimensions, represented by non-square matrices. The main such transformation, camera projection, will be explained in detail in Section 2.3.

Finally, we must mention that matrices of projective transformations, just like signed homogeneous coordinate vectors, are defined up to a positive scale factor. Two matrices $M$ and $M'$ represent the same transformation if and only if $M = \lambda M'$, $\lambda > 0$.

2.3 Imaging Geometry of a Single Camera

The 3D scene is the oriented space $\mathbb{T}^3$, and the image plane is modeled by $\mathbb{T}^2$. The imaging process of a perspective camera can be described as a generalized projective transformation from the scene to the image plane. This transformation is computed using a $3 \times 4$ camera projection matrix $\mathcal{P}$:
$$x \doteq \mathcal{P} X = \begin{pmatrix} P^T \\ Q^T \\ R^T \end{pmatrix} X \,. \qquad (2.4)$$
The rows $P$, $Q$, $R$ of the matrix $\mathcal{P}$ can be geometrically interpreted as coefficients of three projection planes of the camera [14]. Letting $x = (x_1, x_2, x_3)^T$ and using formula (A.14), we can rewrite the projection equation (2.4) as
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \doteq \begin{pmatrix} P \vee X \\ Q \vee X \\ R \vee X \end{pmatrix}.$$

If $X$ lies on the plane $P$, we have $P \odot X = \operatorname{sgn}(P \vee X) = 0$, so $x_1 = 0$. Similarly, if $X$ lies on $Q$, then $x_2 = 0$, and if $X$ lies on $R$, then $x_3 = 0$. The third plane, $R$, is usually accorded special status as the focal plane of the camera. This special status is not inherent in the mathematics of camera projection, but reflects the constraint that any point seen by an actual physical camera must lie in front of its focal plane:
$$x \text{ is a visible point in the image} \quad \text{iff} \quad \begin{cases} R \odot X = +1 \,, \\ R \vee X > 0 \,, \\ x_3 > 0 \,. \end{cases} \qquad (2.5)$$
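Concretely, projection and the visibility test (2.5) amount to a matrix product and a sign check. A minimal sketch with an arbitrary example camera of our own choosing (not one from the experiments):

```python
import numpy as np

# An arbitrary example camera: rows are the coefficient vectors of the
# three projection planes P, Q, R (R is the focal plane).
P_cam = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 2.0]])

X = np.array([0.5, -0.3, 4.0, 1.0])   # signed homogeneous scene point
x = P_cam @ X                          # image point, defined up to positive scale

visible = x[2] > 0                     # condition (2.5): in front of the focal plane
print(x, visible)
```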

A natural model for the image plane is the two-sided plane [55, p. 13]. The front range of the plane consists of points with $x_3 > 0$, and the back range consists of points with $x_3 < 0$. The two sides are "glued together" by a line at infinity, made up of points with $x_3 = 0$ (see [55, Figure 3, p. 14]).

The null space of the camera projection transformation $\mathcal{P}$ is the point $O$ such that $\mathcal{P} O = 0$. In standard projective geometry, $O$ determines, up to an arbitrary scale factor, the center of the camera. To give a well-defined orientation to $O$, we define it as the meet of the three projection planes:
$$O \doteq P \wedge Q \wedge R \,. \qquad (2.6)$$

For $\mathcal{P}$ to represent a valid camera transformation, its range has to be the whole image plane, $\mathbb{T}^2$. This means that the coordinate vectors $P$, $Q$, and $R$ have to be linearly independent (equivalently, the rank of $\mathcal{P}$ has to be three).

Proposition 2.10. Let $X$, $Y$, and $Z$ be three points in space with images $x \doteq \mathcal{P}X$, $y \doteq \mathcal{P}Y$, and $z \doteq \mathcal{P}Z$. Then
$$|x, y, z| \doteq |O, X, Y, Z| \,. \qquad (2.7)$$

Proof. Let $\tilde{\mathcal{P}}$ denote the $4 \times 4$ matrix formed by appending the row vector $O^T$ to the camera matrix $\mathcal{P}$:
$$\tilde{\mathcal{P}} = \begin{pmatrix} P^T \\ Q^T \\ R^T \\ O^T \end{pmatrix}.$$
It is easy to find the expression for the determinant of $\tilde{\mathcal{P}}$ by expanding along the last row and plugging in formula (A.11): $|\tilde{\mathcal{P}}| = -O^T O$.

Let $S$ denote the $4 \times 4$ matrix formed by concatenating the column vectors $O$, $X$, $Y$, and $Z$:
$$S = \begin{pmatrix} O & X & Y & Z \end{pmatrix}.$$

The determinant of the product $\tilde{\mathcal{P}} S$ can be derived as follows:
$$|\tilde{\mathcal{P}} S| = |\tilde{\mathcal{P}}|\,|S| = -O^T O\, |S| \doteq -|S| \,.$$

We can compute the same determinant in a different way, by explicitly multiplying the two matrices and expanding along the first column:
$$|\tilde{\mathcal{P}} S| = \begin{vmatrix} P^T O & P^T X & P^T Y & P^T Z \\ Q^T O & Q^T X & Q^T Y & Q^T Z \\ R^T O & R^T X & R^T Y & R^T Z \\ O^T O & O^T X & O^T Y & O^T Z \end{vmatrix} = \begin{vmatrix} 0 & P^T X & P^T Y & P^T Z \\ 0 & Q^T X & Q^T Y & Q^T Z \\ 0 & R^T X & R^T Y & R^T Z \\ O^T O & O^T X & O^T Y & O^T Z \end{vmatrix}$$
$$= -O^T O \begin{vmatrix} P^T X & P^T Y & P^T Z \\ Q^T X & Q^T Y & Q^T Z \\ R^T X & R^T Y & R^T Z \end{vmatrix} \doteq -|x, y, z| \,.$$

We have found that $|\tilde{\mathcal{P}} S| \doteq -|S|$ and $|\tilde{\mathcal{P}} S| \doteq -|x, y, z|$. The desired conclusion follows immediately: $|x, y, z| \doteq |O, X, Y, Z|$.

Proposition 2.11 (Corollary). Let $\Pi$ be a plane in space not containing the camera center $O$. Then the restriction of the camera projection map $\mathcal{P}$ to the plane $\Pi$ is orientation-preserving if and only if $\Pi \vee O < 0$ (that is, the plane $\Pi$ is facing away from the camera center).

Each pair of projection planes intersects along a projection ray of the camera. The oriented Plücker coordinates of these three rays are given by $Q \wedge R$, $R \wedge P$, and $P \wedge Q$. The next proposition identifies these rays with the first three points of the canonical projective basis (2.1) for the image plane, $\mathbb{T}^2$.

Proposition 2.12. Let $A$, $B$, and $C$ be points in $\mathbb{T}^3$ such that $Q \wedge R \doteq O \vee A$, $R \wedge P \doteq O \vee B$, and $P \wedge Q \doteq O \vee C$. Then we have
$$\mathcal{P} A \doteq (1, 0, 0)^T = \varepsilon_1 \,, \qquad \mathcal{P} B \doteq (0, 1, 0)^T = \varepsilon_2 \,, \qquad \mathcal{P} C \doteq (0, 0, 1)^T = \varepsilon_3 \,.$$


Figure 2.1 Illustration of Proposition 2.12.

Proof. Point $A$ lies in the intersection of planes $Q$ and $R$, so we have $Q \vee A = R \vee A = 0$. Analogously, $P \vee B = R \vee B = 0$ and $P \vee C = Q \vee C = 0$. Therefore, $\mathcal{P} A = (\alpha, 0, 0)^T$, $\mathcal{P} B = (0, \beta, 0)^T$, and $\mathcal{P} C = (0, 0, \gamma)^T$. Now we just have to show that $\alpha = P \vee A$, $\beta = Q \vee B$, and $\gamma = R \vee C$ are all positive.

Since the operation of meet is associative, we can rewrite (2.6) as $O \doteq P \wedge (Q \wedge R)$. Let us now follow the definition of meet (Definition 2.10). Let $L$ be a line such that $P \doteq L \vee O$. From the statement of the proposition, we already have a point $A$ such that $Q \wedge R \doteq O \vee A$. Because $O$ is the result of the meet $P \wedge (Q \wedge R)$, we know that $L \vee O \vee A \doteq P \vee A$ is equal to the positive universe $\Upsilon$. According to equation (A.14), $\alpha = P \vee A > 0$. Geometrically, the point $A$ on the projection ray $Q \wedge R$ is in front of the plane $P$. By similar reasoning, we get $\beta = Q \vee B > 0$ and $\gamma = R \vee C > 0$.

The three points $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$, together with the unit point $\varepsilon_4 = (1, 1, 1)^T$, form the canonical projective basis for the oriented image plane, and for the projectively equivalent space of oriented lines through the camera center $O$. Thus, we have identified the ray $Q \wedge R$ with the coordinate vector $\varepsilon_1$, $R \wedge P$ with $\varepsilon_2$, and $P \wedge Q$ with $\varepsilon_3$. The next proposition [14, p. 183] shows that the visual ray corresponding to the image point $(x_1, x_2, x_3)^T = x_1 \varepsilon_1 + x_2 \varepsilon_2 + x_3 \varepsilon_3$ can be written as a linear combination of the three basis visual rays with the same coefficients.

Proposition 2.13. Let $X$ be some (unknown) point in $\mathbb{T}^3$ that projects to the point $x = (x_1, x_2, x_3)^T$: $x \doteq \mathcal{P} X$. Then the viewing ray containing the point $X$ is given by
$$O \vee X \doteq x_1\, Q \wedge R + x_2\, R \wedge P + x_3\, P \wedge Q \,. \qquad (2.8)$$
In matrix form:
$$O \vee X \doteq \begin{pmatrix} Q \wedge R & R \wedge P & P \wedge Q \end{pmatrix} x \,.$$

Proof. First, we will derive (2.8) as an unoriented expression [14, p. 183], and then we will check that it indeed represents a line oriented from $O$ to $X$. Consider the three planes
$$x_1 Q - x_2 P \,, \qquad x_2 R - x_3 Q \,, \qquad x_3 P - x_1 R \,.$$
Each of these planes contains $X$. Let us take, for example, the first plane:
$$(x_1 Q - x_2 P) \vee X = x_1 (Q \vee X) - x_2 (P \vee X) = x_1 x_2 - x_2 x_1 = 0 \,.$$
Also, each of the planes contains $O$. Thus, we may form a line $L$ containing both $O$ and $X$ as the meet of any two of the planes. At least one of $x_1$, $x_2$, or $x_3$ must be nonzero. Let us suppose that $x_1 \neq 0$ (the arguments for $x_2$ and $x_3$ are very similar):
$$L \doteq (x_3 P - x_1 R) \wedge (x_1 Q - x_2 P) = x_1^2\, Q \wedge R + x_1 x_2\, R \wedge P + x_1 x_3\, P \wedge Q \,.$$

Since $L$ contains both $O$ and $X$, we know that either $L \doteq O \vee X$ or $L \doteq X \vee O$.

There are two cases to consider, based on the sign of $x_1$. First, suppose $x_1 > 0$. Then we have
$$L \doteq x_1\, Q \wedge R + x_2\, R \wedge P + x_3\, P \wedge Q \,, \qquad P \wedge L \doteq P \wedge (x_1\, Q \wedge R) \doteq O \,.$$
Let us follow the definition of meet to "deconstruct" the statement $O \doteq P \wedge L$. Let $M$ be some line such that $P \doteq M \vee O$. Also, we can write $L \doteq O \vee \tilde{X}$, such that either $\tilde{X} \doteq X$ or $\tilde{X} \doteq -X$. By definition of meet, $M \vee O \vee \tilde{X} \doteq P \vee \tilde{X} > 0$. Since $x_1 > 0$, we also have $P \vee X > 0$. Because both $P \vee \tilde{X} > 0$ and $P \vee X > 0$, we must have $\tilde{X} \doteq X$. Therefore, $L \doteq O \vee X$.

Next, suppose $x_1 < 0$. Then
$$L \doteq -x_1\, Q \wedge R - x_2\, R \wedge P - x_3\, P \wedge Q \,, \qquad P \wedge L \doteq P \wedge (-x_1\, Q \wedge R) \doteq -O \,.$$
The argument is very similar to the one above. Because $P \wedge L \doteq -O$ and $P \vee (-X) > 0$, by definition of meet, we have $L \doteq (-O) \vee (-X) = O \vee X$.

In many cases, we are interested not only in the projection of 3D points into an image, but in the projection of other flats, such as lines. The following proposition (for the unoriented version, see Faugeras et al. [14, p. 195]) gives the correct oriented formula for the coefficient vector of a line in 2D given by the projection of a line in 3D. This result will be used in Section 2.4.1 to define the oriented fundamental matrix between two views.

Proposition 2.14. Let $L$ be a line in space, and $l$ be its image under the perspective projection $\mathcal{P}$. Then we may compute the coefficient vector of the 2D line as follows:
$$l \doteq \begin{pmatrix} (Q \wedge R) \vee L \\ (R \wedge P) \vee L \\ (P \wedge Q) \vee L \end{pmatrix}. \qquad (2.9)$$

Proof. Let $X$ and $Y$ be any two points such that $L \doteq X \vee Y$. Then
$$\begin{aligned}
l &\doteq (\mathcal{P}X) \vee (\mathcal{P}Y) = \begin{pmatrix} P^T X \\ Q^T X \\ R^T X \end{pmatrix} \vee \begin{pmatrix} P^T Y \\ Q^T Y \\ R^T Y \end{pmatrix} \\
&= \begin{pmatrix} (Q^T X)(R^T Y) - (Q^T Y)(R^T X) \\ (R^T X)(P^T Y) - (R^T Y)(P^T X) \\ (P^T X)(Q^T Y) - (P^T Y)(Q^T X) \end{pmatrix} = \begin{pmatrix} X^T (Q R^T - R Q^T) Y \\ X^T (R P^T - P R^T) Y \\ X^T (P Q^T - Q P^T) Y \end{pmatrix} \\
&= \begin{pmatrix} (Q \wedge R) \vee X \vee Y \\ (R \wedge P) \vee X \vee Y \\ (P \wedge Q) \vee X \vee Y \end{pmatrix} \quad \text{by (A.10)} \\
&= \begin{pmatrix} (Q \wedge R) \vee L \\ (R \wedge P) \vee L \\ (P \wedge Q) \vee L \end{pmatrix}.
\end{aligned}$$

Using our notation for the coefficient vector of a line (A.17), we can also write (2.9) as
$$l \doteq \mathcal{P}^* L = \begin{pmatrix} (Q \wedge R)^{*T} \\ (R \wedge P)^{*T} \\ (P \wedge Q)^{*T} \end{pmatrix} L \,.$$
The matrix $\mathcal{P}^*$ is called the line projection matrix.
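In code, it is often simplest to realize (2.9) through the identity used in the proof: join the images of any two points of $L$. This avoids committing to a particular Plücker sign convention. A hedged NumPy sketch (helper name ours):

```python
import numpy as np

def project_line(P_cam, X, Y):
    """Image of the 3D line X v Y under a 3x4 camera: the join (here, the
    cross product) of the two projected image points, as in the proof of
    Proposition 2.14. Result is a line coefficient vector up to + scale."""
    return np.cross(P_cam @ X, P_cam @ Y)
```

Stacking the results of this computation for suitable basis lines yields the rows of the line projection matrix $\mathcal{P}^*$ if it is needed explicitly.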

The next result concerns the inverse operation, back-projection of lines. Of course, a line $l$ in the image does not uniquely determine a line in space. Let $L$ be any line that projects to $l$ in the image, and let $\Pi$ be the plane containing the camera center $O$ and the line $L$: $\Pi \doteq O \vee L$. Then any other (unknown) line $L'$ that also projects to $l$ must also lie on $\Pi$. The following proposition shows that the coefficient vector $\Pi$ can be computed using a simple formula. See Faugeras et al. [14, p. 184] for the unoriented version of the same result.

Proposition 2.15. Let $l$ be a line in the image, and $L$ be any line that projects to $l$ according to (2.9). Then the coefficient vector of the plane $\Pi \doteq O \vee L$ can be computed as follows:
$$\Pi \doteq \mathcal{P}^T l \,. \qquad (2.10)$$

Proof. Let $X$ and $Y$ be two points such that $L \doteq X \vee Y$, and $Z$ be any other point in space. Also, let $x$, $y$, and $z$ be the projections of these three points in the image. Note that $l \doteq x \vee y$. By equation (2.7), we have the following:
$$\begin{aligned}
|x, y, z| &\doteq |O, X, Y, Z| \\
(x \vee y) \vee z &\doteq (O \vee X \vee Y) \vee Z \\
l \vee z &\doteq \Pi \vee Z \,.
\end{aligned}$$
Thus, if $Z$ is some arbitrary point in space, it must have the same orientation relative to $\Pi$ as its projection, $z \doteq \mathcal{P} Z$, relative to $l$. Let us rewrite the expression $l \vee z$:
$$l \vee z \doteq l \vee (\mathcal{P} Z) = l^T (\mathcal{P} Z) = (l^T \mathcal{P}) Z = (\mathcal{P}^T l)^T Z = (\mathcal{P}^T l) \vee Z \,.$$
Therefore, $\Pi \doteq \mathcal{P}^T l$.

2.4 Oriented Multi-View Geometry

2.4.1 Fundamental Matrix

Let $\mathcal{P}_i$ and $\mathcal{P}_j$ be the projection matrices of two cameras:
$$\mathcal{P}_i = \begin{pmatrix} P_i^T \\ Q_i^T \\ R_i^T \end{pmatrix}, \qquad \mathcal{P}_j = \begin{pmatrix} P_j^T \\ Q_j^T \\ R_j^T \end{pmatrix}.$$

Suppose that $X$ is a scene point that projects onto image points $x_i = (x_{i1}, x_{i2}, x_{i3})^T \doteq \mathcal{P}_i X$ and $x_j = (x_{j1}, x_{j2}, x_{j3})^T \doteq \mathcal{P}_j X$ (refer to Figure 2.2). The following is an oriented adaptation of the derivation of Faugeras et al. [14, p. 264]. Using (2.8), the oriented lines from the respective camera centers $O_i$ and $O_j$ through $X$ are given by
$$L_i = O_i \vee X \doteq x_{i1}\, Q_i \wedge R_i + x_{i2}\, R_i \wedge P_i + x_{i3}\, P_i \wedge Q_i \,, \qquad (2.11)$$
$$L_j = O_j \vee X \doteq x_{j1}\, Q_j \wedge R_j + x_{j2}\, R_j \wedge P_j + x_{j3}\, P_j \wedge Q_j \,. \qquad (2.12)$$

Since $L_i$ and $L_j$ both pass through the point $X$, we must have $L_i \vee L_j = 0$. Because of the linearity of the join operator (Proposition A.1), it is easy to see that the expression for $L_i \vee L_j$ is bilinear in $(x_{i1}, x_{i2}, x_{i3})^T$ and $(x_{j1}, x_{j2}, x_{j3})^T$. We can write this expression compactly in matrix form as
$$L_i \vee L_j \doteq x_j^T F_{ij} x_i \,, \quad \text{where}$$
$$F_{ij} = \begin{pmatrix}
(Q_i \wedge R_i) \vee (Q_j \wedge R_j) & (R_i \wedge P_i) \vee (Q_j \wedge R_j) & (P_i \wedge Q_i) \vee (Q_j \wedge R_j) \\
(Q_i \wedge R_i) \vee (R_j \wedge P_j) & (R_i \wedge P_i) \vee (R_j \wedge P_j) & (P_i \wedge Q_i) \vee (R_j \wedge P_j) \\
(Q_i \wedge R_i) \vee (P_j \wedge Q_j) & (R_i \wedge P_i) \vee (P_j \wedge Q_j) & (P_i \wedge Q_i) \vee (P_j \wedge Q_j)
\end{pmatrix}$$
$$= \begin{pmatrix}
|Q_i, R_i, Q_j, R_j| & |R_i, P_i, Q_j, R_j| & |P_i, Q_i, Q_j, R_j| \\
|Q_i, R_i, R_j, P_j| & |R_i, P_i, R_j, P_j| & |P_i, Q_i, R_j, P_j| \\
|Q_i, R_i, P_j, Q_j| & |R_i, P_i, P_j, Q_j| & |P_i, Q_i, P_j, Q_j|
\end{pmatrix}. \qquad (2.13)$$
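Equation (2.13) translates directly into code: every entry of $F_{ij}$ is a $4 \times 4$ determinant of stacked projection-plane coefficient vectors. A minimal NumPy sketch (helper names ours; the cameras are random full-rank examples):

```python
import numpy as np

def oriented_fundamental(Pi, Pj):
    """F_ij from equation (2.13): entry (a, b) is the 4x4 determinant of the
    stacked plane pair b of camera i and plane pair a of camera j."""
    pairs_i = [(Pi[1], Pi[2]), (Pi[2], Pi[0]), (Pi[0], Pi[1])]  # (Q,R),(R,P),(P,Q)
    pairs_j = [(Pj[1], Pj[2]), (Pj[2], Pj[0]), (Pj[0], Pj[1])]
    F = np.empty((3, 3))
    for a, (Aj, Bj) in enumerate(pairs_j):
        for b, (Ai, Bi) in enumerate(pairs_i):
            F[a, b] = np.linalg.det(np.vstack([Ai, Bi, Aj, Bj]))
    return F

rng = np.random.default_rng(1)
Pi, Pj = rng.standard_normal((2, 3, 4))
F = oriented_fundamental(Pi, Pj)
X = rng.standard_normal(4)
assert abs((Pj @ X) @ F @ (Pi @ X)) < 1e-8   # epipolar constraint (2.16)
```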

To get a geometric interpretation of the fundamental matrix $F_{ij}$, let us use the result on projection of lines (2.9) to write out the coefficients of the epipolar line $l_{ji}$, which is the projection of $L_i$ onto the image plane of the $j$th view:
$$l_{ji} = \begin{pmatrix} (Q_j \wedge R_j) \vee L_i \\ (R_j \wedge P_j) \vee L_i \\ (P_j \wedge Q_j) \vee L_i \end{pmatrix}. \qquad (2.14)$$
After substituting (2.11) for $L_i$ and comparing with (2.13), it becomes easy to see that
$$l_{ji} = F_{ij} x_i \,. \qquad (2.15)$$


Figure 2.2 Epipolar geometry.

In this way, we can regard $F_{ij}$ as the matrix that transforms points in the $i$th view to corresponding epipolar lines in the $j$th view. The epipolar constraint
$$x_j^T F_{ij} x_i = 0 \qquad (2.16)$$
can be rewritten as $x_j \vee l_{ji} = 0$, which says that the point $x_j$ lies on the epipolar line $l_{ji}$.

This becomes obvious if we write an alternative expression for $l_{ji}$ as the projection of the optical ray $L_i = O_i \vee X$:
$$l_{ji} \doteq (\mathcal{P}_j O_i) \vee (\mathcal{P}_j X) = e_{ji} \vee x_j \,. \qquad (2.17)$$
The point $e_{ji} \doteq \mathcal{P}_j O_i$ is called the epipole.

If we switch the roles of $\mathcal{P}_i$ and $\mathcal{P}_j$, we get the epipolar constraint $x_i^T F_{ji} x_j = 0$. By inspection of (2.13), it is clear that $F_{ji} = F_{ij}^T$. The projection of the optical ray $L_j$ into the image plane of the $i$th camera is given by $l_{ij} = F_{ji} x_j$. Also, the second epipole is given by $e_{ij} \doteq \mathcal{P}_i O_j$.

From looking at (2.17), it is clear that $l_{ji} = 0$ when $x_j \doteq e_{ji}$. Thus, we can conclude that
$$e_{ji}^T F_{ij} = F_{ji} e_{ji} = 0 \,.$$
Similarly,
$$e_{ij}^T F_{ji} = F_{ij} e_{ij} = 0 \,.$$
Thus, $e_{ij}$ is the null vector of $F_{ij}$, and $e_{ji}$ is the null vector of $F_{ji}$.

We can explicitly write out the coordinates of the epipoles as follows:
$$e_{ij} \doteq \mathcal{P}_i O_j = \begin{pmatrix} P_i \vee (P_j \wedge Q_j \wedge R_j) \\ Q_i \vee (P_j \wedge Q_j \wedge R_j) \\ R_i \vee (P_j \wedge Q_j \wedge R_j) \end{pmatrix} = \begin{pmatrix} |P_i, P_j, Q_j, R_j| \\ |Q_i, P_j, Q_j, R_j| \\ |R_i, P_j, Q_j, R_j| \end{pmatrix}, \qquad (2.18)$$
$$e_{ji} \doteq \mathcal{P}_j O_i = \begin{pmatrix} P_j \vee (P_i \wedge Q_i \wedge R_i) \\ Q_j \vee (P_i \wedge Q_i \wedge R_i) \\ R_j \vee (P_i \wedge Q_i \wedge R_i) \end{pmatrix} = \begin{pmatrix} |P_j, P_i, Q_i, R_i| \\ |Q_j, P_i, Q_i, R_i| \\ |R_j, P_i, Q_i, R_i| \end{pmatrix}. \qquad (2.19)$$

The last coordinates $e_{i3}$ and $e_{j3}$ of $e_{ij}$ and $e_{ji}$ reflect the orientations of each camera center with respect to the other camera's focal plane:
$$\operatorname{sgn}(e_{i3}) = R_i \odot O_j \qquad \text{and} \qquad \operatorname{sgn}(e_{j3}) = R_j \odot O_i \,.$$

Previously, we have assumed that all scene points must be in front of the focal planes of each camera that can see them. This constraint, however, does not apply to camera centers. Whereas for regular image points we can simply assume that the third coordinate is positive, this assumption does not hold for epipoles. For this reason, it is crucial to compute epipoles using the oriented formulas above.
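Computing properly oriented epipoles is again a matter of $4 \times 4$ determinants, per (2.18)–(2.19). A sketch in the same style as before (helper names ours):

```python
import numpy as np
rng = np.random.default_rng(2)

def oriented_epipole(Pa, Pb):
    """e_ab = P_a O_b per (2.18): coordinate m is the determinant
    |row m of P_a, P_b, Q_b, R_b|. The overall sign carries orientation
    information and must not be normalized away."""
    return np.array([np.linalg.det(np.vstack([row, Pb])) for row in Pa])

Pi, Pj = rng.standard_normal((2, 3, 4))
e_ij = oriented_epipole(Pi, Pj)   # epipole of camera j's center in view i
e_ji = oriented_epipole(Pj, Pi)   # epipole of camera i's center in view j
```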

At the beginning of the section, we saw that if two points $x_i$ and $x_j$ are the projections of the same 3D point $X$, then the epipolar constraint (2.16) is satisfied. Let us now consider the converse of this statement. For example, if we have $x_i \doteq \mathcal{P}_i X$ and $x_j \doteq -\mathcal{P}_j X$, the epipolar constraint is still satisfied, but there is no point in space with a well-defined orientation that would project to the two image points.


Figure 2.3 The epipolar consistency criterion (2.20). The oppositely oriented arrows on the left and right sides of the figure indicate the orientation of $O_i \vee L_j$ and $O_j \vee L_i$, respectively.

Proposition 2.16. Let $x_i$ and $x_j$ be two points such that
$$x_j^T F_{ij} x_i = 0 \,, \qquad F_{ij} x_i \neq 0 \,, \qquad F_{ji} x_j \neq 0 \,.$$
There exists a unique 3D point $X$ such that $x_i \doteq \mathcal{P}_i X$ and $x_j \doteq \mathcal{P}_j X$ if and only if
$$O_i \vee L_j \doteq -O_j \vee L_i \,, \qquad (2.20)$$
where $L_i$ and $L_j$ are the two viewing rays given by
$$L_i = \begin{pmatrix} Q_i \wedge R_i & R_i \wedge P_i & P_i \wedge Q_i \end{pmatrix} x_i \,, \qquad L_j = \begin{pmatrix} Q_j \wedge R_j & R_j \wedge P_j & P_j \wedge Q_j \end{pmatrix} x_j \,.$$
Figure 2.3 illustrates (2.20).

Proof. Suppose there does exist a point $X$ such that $x_i \doteq \mathcal{P}_i X$ and $x_j \doteq \mathcal{P}_j X$. Then
$$O_i \vee L_j \doteq O_i \vee (O_j \vee X) = -O_j \vee (O_i \vee X) \doteq -O_j \vee L_i \,.$$
Now, suppose that the converse is true:
$$O_i \vee L_j \doteq -O_j \vee L_i \,.$$
Because $x_i$ and $x_j$ satisfy the epipolar constraint, the visual rays given by $L_i$ and $L_j$ intersect in space. That is, there exists a point $X$ such that $L_i \vee X = 0$ and $L_j \vee X = 0$. Note that the orientation of $X$ is not defined, since the same constraints are satisfied by $-X$. Let us choose the orientation such that $L_i \doteq O_i \vee X$. Then $L_j \doteq O_j \vee \tilde{X}$, where $\tilde{X} \sim X$. Finally, we have:
$$O_i \vee L_j \doteq O_i \vee O_j \vee \tilde{X} \,, \qquad -O_j \vee L_i \doteq O_i \vee O_j \vee X \,.$$
Since $O_i \vee L_j \doteq -O_j \vee L_i$, we get $\tilde{X} \doteq X$. By construction of the viewing rays (Proposition 2.13), $X$ projects to $x_i$ and $x_j$.

The epipolar consistency criterion (2.20) can be stated in several alternative ways.

• Since the epipolar lines $l_{ij} = F_{ji} x_j$ and $l_{ji} = F_{ij} x_i$ are oriented projections of the rays $L_j$ and $L_i$, respectively, we can invoke Proposition 2.15 to write (2.20) as
$$\mathcal{P}_i^T (F_{ji} x_j) \doteq -\mathcal{P}_j^T (F_{ij} x_i) \,. \qquad (2.21)$$

• By examining the second half of the proof of Proposition 2.16, we can notice that if (2.20) is not satisfied, there exists a point $X$ such that $x_i \doteq \mathcal{P}_i X$ and $x_j \doteq -\mathcal{P}_j X$. Then $O_j \vee X \doteq -L_j$. But the oriented projection of $L_j$ is $l_{ij} = F_{ji} x_j$, and the oriented projection of $-L_j$ is $e_{ij} \vee x_i$. Therefore, we must have
$$F_{ji} x_j \doteq -e_{ij} \vee x_i$$
and, by similar reasoning,
$$F_{ij} x_i \doteq -e_{ji} \vee x_j \,.$$
Thus, the alternative form of (2.20) becomes
$$F_{ji} x_j \doteq e_{ij} \vee x_i \quad \text{or equivalently,} \quad F_{ij} x_i \doteq e_{ji} \vee x_j \,. \qquad (2.22)$$
This is the "strong realizability" condition of Werner and Pajdla [60, 61]; a small computational version of this test is sketched below.
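In an implementation, (2.22) is a convenient orientation test because it involves only image-plane quantities. A minimal sketch (helper names ours; `same_orientation` tests equality of homogeneous vectors up to a positive factor):

```python
import numpy as np

def same_orientation(u, v, tol=1e-9):
    """True if u and v are equal up to a *positive* scale factor."""
    k = int(np.argmax(np.abs(v)))
    if abs(v[k]) < tol or u[k] / v[k] <= 0:
        return False
    return np.allclose(u, (u[k] / v[k]) * v, atol=tol * np.linalg.norm(u))

def strongly_realizable(F_ij, e_ji, x_i, x_j):
    """Test (2.22): F_ij x_i must match e_ji v x_j with consistent orientation.
    In 2D the join of two points is the cross product."""
    return same_orientation(F_ij @ x_i, np.cross(e_ji, x_j))
```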

2.4.2 Oriented Trifocal Tensor

The following derivation is after Faugeras et al. [14, p. 419].

Let $\mathcal{P}_i$, $\mathcal{P}_j$, and $\mathcal{P}_k$ be three camera matrices, and $L$ be a 3D line that projects onto lines $l_i = (l_{i1}, l_{i2}, l_{i3})^T$, $l_j = (l_{j1}, l_{j2}, l_{j3})^T$, and $l_k = (l_{k1}, l_{k2}, l_{k3})^T$ in the three images (see Figure 2.4). Suppose that we know the (properly oriented) coefficient vectors $l_i$ and $l_j$ in two of the images.


Figure 2.4 The trifocal tensor (see text).

According to Proposition 2.15, these two lines back-project to the following planes:

$$\Pi_i = O_i \vee L \doteq \mathcal{P}_i^T l_i \,, \qquad \Pi_j = O_j \vee L \doteq \mathcal{P}_j^T l_j \,.$$

By intersecting the above planes, we will retrieve the (unoriented) coordinates of the line $L$:
$$\Pi_i \wedge \Pi_j = (O_i \vee L) \wedge (O_j \vee L) = (O_i \vee L) \wedge (L \vee O_j) = \begin{cases} L \,, & O_i \vee L \vee O_j > 0 \,, \\ -L \,, & O_i \vee L \vee O_j < 0 \,. \end{cases}$$
The last step follows by definition of meet. To keep orientation consistent, we will assume that $O_i \vee L \vee O_j = \Pi_i \vee O_j = O_i \vee \Pi_j > 0$. In practice, we can compute the properly oriented coordinates of $\Pi_i$ or $\Pi_j$ without knowing $L$, so we can always find the sign of $O_i \vee L \vee O_j$ and interchange $i$ and $j$ if necessary. Let us expand the expression $L = \Pi_i \wedge \Pi_j$:
$$\begin{aligned}
L \doteq (\mathcal{P}_i^T l_i) \wedge (\mathcal{P}_j^T l_j) &= (P_i l_{i1} + Q_i l_{i2} + R_i l_{i3}) \wedge (P_j l_{j1} + Q_j l_{j2} + R_j l_{j3}) \\
&= l_{i1} l_{j1}\, P_i \wedge P_j + l_{i1} l_{j2}\, P_i \wedge Q_j + l_{i1} l_{j3}\, P_i \wedge R_j \\
&\quad + l_{i2} l_{j1}\, Q_i \wedge P_j + l_{i2} l_{j2}\, Q_i \wedge Q_j + l_{i2} l_{j3}\, Q_i \wedge R_j \\
&\quad + l_{i3} l_{j1}\, R_i \wedge P_j + l_{i3} l_{j2}\, R_i \wedge Q_j + l_{i3} l_{j3}\, R_i \wedge R_j \,.
\end{aligned}$$

Next, let us compute the projection of $L$ onto the line in the third image, $l_k$, using the line projection equation (2.9):
$$l_k = \begin{pmatrix} (Q_k \wedge R_k) \vee L \\ (R_k \wedge P_k) \vee L \\ (P_k \wedge Q_k) \vee L \end{pmatrix},$$
where, substituting the expansion of $L$ above,
$$l_{k1} = l_i^T \begin{pmatrix}
(Q_k \wedge R_k) \vee (P_i \wedge P_j) & (Q_k \wedge R_k) \vee (P_i \wedge Q_j) & (Q_k \wedge R_k) \vee (P_i \wedge R_j) \\
(Q_k \wedge R_k) \vee (Q_i \wedge P_j) & (Q_k \wedge R_k) \vee (Q_i \wedge Q_j) & (Q_k \wedge R_k) \vee (Q_i \wedge R_j) \\
(Q_k \wedge R_k) \vee (R_i \wedge P_j) & (Q_k \wedge R_k) \vee (R_i \wedge Q_j) & (Q_k \wedge R_k) \vee (R_i \wedge R_j)
\end{pmatrix} l_j \,,$$
and similarly for $l_{k2}$ and $l_{k3}$, with $Q_k \wedge R_k$ replaced by $R_k \wedge P_k$ and $P_k \wedge Q_k$, respectively. Overall, we can express the coordinates of $l_k$ as a bilinear function of $l_i$ and $l_j$ as follows:
$$l_k = \begin{pmatrix} l_i^T G_k^1 l_j \\ l_i^T G_k^2 l_j \\ l_i^T G_k^3 l_j \end{pmatrix}, \quad \text{where} \qquad (2.23)$$
$$G_k^1 = \begin{pmatrix}
|Q_k, R_k, P_i, P_j| & |Q_k, R_k, P_i, Q_j| & |Q_k, R_k, P_i, R_j| \\
|Q_k, R_k, Q_i, P_j| & |Q_k, R_k, Q_i, Q_j| & |Q_k, R_k, Q_i, R_j| \\
|Q_k, R_k, R_i, P_j| & |Q_k, R_k, R_i, Q_j| & |Q_k, R_k, R_i, R_j|
\end{pmatrix}, \qquad (2.24)$$
$$G_k^2 = \begin{pmatrix}
|R_k, P_k, P_i, P_j| & |R_k, P_k, P_i, Q_j| & |R_k, P_k, P_i, R_j| \\
|R_k, P_k, Q_i, P_j| & |R_k, P_k, Q_i, Q_j| & |R_k, P_k, Q_i, R_j| \\
|R_k, P_k, R_i, P_j| & |R_k, P_k, R_i, Q_j| & |R_k, P_k, R_i, R_j|
\end{pmatrix}, \qquad (2.25)$$
$$G_k^3 = \begin{pmatrix}
|P_k, Q_k, P_i, P_j| & |P_k, Q_k, P_i, Q_j| & |P_k, Q_k, P_i, R_j| \\
|P_k, Q_k, Q_i, P_j| & |P_k, Q_k, Q_i, Q_j| & |P_k, Q_k, Q_i, R_j| \\
|P_k, Q_k, R_i, P_j| & |P_k, Q_k, R_i, Q_j| & |P_k, Q_k, R_i, R_j|
\end{pmatrix}. \qquad (2.26)$$

The three matrices $G_k^1$, $G_k^2$, and $G_k^3$ are called the trifocal matrices. The whole trio is also sometimes known as the trifocal tensor. For short, we will write (2.23) as
$$l_k = T_{ijk}(l_i, l_j) \,.$$
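Like the fundamental matrix, the trifocal matrices are arrays of $4 \times 4$ determinants over rows of the camera matrices, so they can be assembled in the same style as the earlier sketch. A hedged NumPy illustration of (2.24)–(2.26) and the transfer (2.23) (helper names ours):

```python
import numpy as np

def trifocal_matrices(Pi, Pj, Pk):
    """G_k^1..G_k^3 from (2.24)-(2.26): entry (a, b) of G_k^m is the 4x4
    determinant of plane pair m of camera k, row a of camera i, row b of
    camera j, all stacked as rows."""
    pairs_k = [(Pk[1], Pk[2]), (Pk[2], Pk[0]), (Pk[0], Pk[1])]  # (Q,R),(R,P),(P,Q)
    G = np.empty((3, 3, 3))
    for m, (A, B) in enumerate(pairs_k):
        for a in range(3):
            for b in range(3):
                G[m, a, b] = np.linalg.det(np.vstack([A, B, Pi[a], Pj[b]]))
    return G

def transfer_line(G, l_i, l_j):
    """Equation (2.23): l_k has coordinates l_i^T G^m l_j for m = 1, 2, 3."""
    return np.array([l_i @ G[m] @ l_j for m in range(3)])
```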

We can regard the trifocal tensor $T_{ijk}$ as a function that, given lines $l_i$ in view $i$ and $l_j$ in view $j$, returns a line $l_k$ in view $k$ such that $l_i$, $l_j$, and $l_k$ are projections of the same 3D line $L$. Just as the fundamental matrix is a device for transferring points from one view into lines in the second view, the trifocal tensor is a device for transferring lines in two views into a line in a third view. This transfer fails (that is, we have $T_{ijk}(l_i, l_j) = 0$) in two cases [14, p. 423]:

1. $l_i$ and $l_j$ are epipolar lines with respect to the $k$th view. Since $l_i$ passes through the epipole $e_{ik}$ and $l_j$ passes through the epipole $e_{jk}$, the planes $\mathcal{P}_i^T l_i$ and $\mathcal{P}_j^T l_j$ formed by back-projecting these lines both contain the camera center $O_k$. The line $L$ formed by the intersection of these planes is actually a visual ray that passes through $O_k$, and its image in the $k$th view degenerates to a point.

2. $l_i$ and $l_j$ are corresponding epipolar lines with respect to the $i$th and $j$th views. Then the (unoriented) planes $\mathcal{P}_i^T l_i$ and $\mathcal{P}_j^T l_j$ coincide, and the 3D line $L$ is undetermined.

At this stage, one important disclaimer must be made. In the unoriented case, given two lines $l_i$ and $l_j$ that do not satisfy either of the above conditions, there always exists a 3D line $L$ that projects onto these lines. In the oriented case, there may not be a consistent way of assigning orientation to this 3D line. That is, it may only be possible to find a line $L$ that projects onto $l_i$ in the first image, and onto $-l_j$ in the second one.

Figure 2.5 Orientation consistency for two images of the same line (Proposition 2.17). The $i$th image plane is on the left, and the $j$th image plane is on the right.

Proposition 2.17. Given two lines $l_i$ and $l_j$ in the $i$th and $j$th views, respectively, there exists an oriented 3D line $L$ such that $l_i \doteq \mathcal{P}_i^* L$ and $l_j \doteq \mathcal{P}_j^* L$ if and only if
$$O_i \vee \Pi_j \doteq -O_j \vee \Pi_i \,, \qquad (2.27)$$
where $\Pi_i$ and $\Pi_j$ are the two back-projected planes
$$\Pi_i \doteq \mathcal{P}_i^T l_i \qquad \text{and} \qquad \Pi_j \doteq \mathcal{P}_j^T l_j \,.$$

An image-based equivalent of condition (2.27) is
$$e_{ij} \vee l_i \doteq -e_{ji} \vee l_j \,.$$
This condition is illustrated in Figure 2.5. It is related to a result due to Werner and Pajdla [61] on matching constraints for lines in two images.

2.4.3 Oriented Transfer

Suppose we know two points $x_i$ and $x_j$ in the $i$th and $j$th views that are the projections of the same 3D point $X$. From Proposition 2.16, we know that this is equivalent to the epipolar consistency requirement (2.20). Without explicitly computing $X$ itself, how can we find the properly oriented point $x_k \doteq \mathcal{P}_k X$ in the third view?

2.4.3.1 Transfer Using Epipolar Geometry

First, we will solve this problem with the help of epipolar geometry, that is, with the knowledge of the fundamental matrices between the three views. We can apply the matrices $F_{ik}$ and $F_{jk}$ to map the points $x_i$ and $x_j$ onto their respective epipolar lines in the $k$th view:
$$l_{ki} = F_{ik} x_i \qquad \text{and} \qquad l_{kj} = F_{jk} x_j \,.$$

But also, from our oriented derivation in Section 2.4.1, we know that
$$l_{ki} \doteq e_{ki} \vee x_k \qquad \text{and} \qquad l_{kj} \doteq e_{kj} \vee x_k \,.$$
Let us take the intersection (meet) of the two epipolar lines:
$$l_{ki} \wedge l_{kj} \doteq (e_{ki} \vee x_k) \wedge (e_{kj} \vee x_k) = -(e_{ki} \vee x_k) \wedge (x_k \vee e_{kj}) = \begin{cases} x_k \,, & e_{ki} \vee x_k \vee e_{kj} < 0 \,, \\ -x_k \,, & e_{ki} \vee x_k \vee e_{kj} > 0 \,. \end{cases} \qquad (2.28)$$
Now, we have
$$e_{ki} \vee x_k \vee e_{kj} \doteq l_{ki} \vee e_{kj} = (F_{ik} x_i) \vee e_{kj} \doteq -l_{kj} \vee e_{ki} = -(F_{jk} x_j) \vee e_{ki} \,. \qquad (2.29)$$
In this way, we do not need to know $x_k$ to find the sign of $e_{ki} \vee x_k \vee e_{kj}$. Putting together (2.28) and (2.29), we get the following formula for computing the properly oriented coordinates of $x_k$:
$$x_k = \begin{cases} (F_{ik} x_i) \wedge (F_{jk} x_j) \,, & (F_{ik} x_i) \vee e_{kj} < 0 \ \text{ or } \ (F_{jk} x_j) \vee e_{ki} > 0 \,; \\ (F_{jk} x_j) \wedge (F_{ik} x_i) \,, & (F_{ik} x_i) \vee e_{kj} > 0 \ \text{ or } \ (F_{jk} x_j) \vee e_{ki} < 0 \,. \end{cases} \qquad (2.30)$$
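Formula (2.30) is compact to implement: in 2D, both the join of two points and the meet of two lines are cross products. A minimal sketch (our own helper; the degenerate configurations discussed next are not handled):

```python
import numpy as np

def epipolar_transfer(F_ik, F_jk, e_ki, e_kj, x_i, x_j):
    """Oriented point transfer into view k, equation (2.30).
    The scalar l . e is the relative orientation of a 2D line and a point."""
    l_ki, l_kj = F_ik @ x_i, F_jk @ x_j    # epipolar lines in view k
    if l_ki @ e_kj < 0:                    # sign of e_ki v x_k v e_kj, per (2.29)
        return np.cross(l_ki, l_kj)
    return np.cross(l_kj, l_ki)
```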

Epipolar transfer is illustrated in Figure 2.6. It is well known [14, p. 413] that point transfer using fundamental matrices fails in the following cases:


Figure 2.6 Transfer using epipolar geometry (see text).

1. The 3D point $X$ lies on the line defined by camera centers $O_i$ and $O_j$. In this case, we have $l_{ki} \sim (\mathcal{P}_k O_i) \vee (\mathcal{P}_k O_j) \doteq e_{ki} \vee e_{kj}$. Similarly, $l_{kj} \sim e_{kj} \vee e_{ki}$. Thus, the two (unoriented) epipolar lines $l_{ki}$ and $l_{kj}$ coincide.

2. The 3D point $X$ lies on the line defined by $O_i$ and $O_k$, or $O_j$ and $O_k$. In the first case, we have $x_i \sim e_{ik}$ and $F_{ik} x_i = 0$; in the second case, $x_j \sim e_{jk}$ and $F_{jk} x_j = 0$.

3. The three camera centers $O_i$, $O_j$, and $O_k$ are collinear. Then $e_{ki} \sim e_{kj}$, and the (unoriented) epipolar lines $l_{ki} \doteq e_{ki} \vee x_k$ and $l_{kj} \doteq e_{kj} \vee x_k$ coincide.

4. The 3D point $X$ is in the trifocal plane containing $O_i$, $O_j$, and $O_k$. Then the image points $e_{ki}$, $e_{kj}$, and $x_k$ are collinear, and the lines $l_{ki}$ and $l_{kj}$ once again coincide.

In the last three cases above, we actually have enough information to predict the position of $x_k$. Since the visual rays formed by back-projecting $x_i$ and $x_j$ are distinct, we can recover $X$ as the unique point of their intersection in space, and find $x_k$ by projecting $X$ into the third image. The failure of equation (2.30) in these cases is an unsatisfying feature of the method. In practice, the problem may be even more serious: whenever the configuration of the three cameras approaches one of the degenerate cases above, (2.30) becomes numerically unstable.

2.4.3.2 Transfer Using the Trifocal Tensor

Next, let us see how the transfer of points can be accomplished with the help of the trifocal tensor. Let $l_i$ be a line in the $i$th view containing the point $x_i$. Also, let $l_j$ and $l_j'$ be two lines in the $j$th view such that $l_j \wedge l_j' \doteq x_j$ (see Figure 2.7). Consider the three back-projected planes
$$\Pi_i = \mathcal{P}_i^T l_i \,, \qquad \Pi_j = \mathcal{P}_j^T l_j \,, \qquad \Pi_j' = \mathcal{P}_j^T l_j' \,.$$
Let us assume that the pairs $(l_i, l_j)$ and $(l_i, l_j')$ meet the trifocal consistency constraints of Proposition 2.17:
$$\Pi_i \vee O_j \doteq O_i \vee \Pi_j \doteq O_i \vee \Pi_j' > 0 \,. \qquad (2.31)$$

These constraints guarantee the existence of two 3D lines $L$ and $L'$ such that
$$L = \Pi_i \wedge \Pi_j \qquad \text{and} \qquad L' = \Pi_i \wedge \Pi_j' \,.$$
Moreover, the properly oriented projections of these lines into the $k$th image are given by
$$l_k = T_{ijk}(l_i, l_j) \qquad \text{and} \qquad l_k' = T_{ijk}(l_i, l_j') \,.$$
Let $y_j$ and $y_j'$ be any two points in the $j$th image such that
$$l_j \doteq y_j \vee x_j \qquad \text{and} \qquad l_j' \doteq x_j \vee y_j' \,.$$


Figure 2.7 Transfer using the trifocal tensor.

Because we assumed that $l_j \wedge l_j' \doteq x_j$, we must have
$$|y_j, x_j, y_j'| > 0 \,. \qquad (2.32)$$

Let $X$, $Y$, and $Y'$ be points on the plane $\Pi_i$ that project onto $x_j$, $y_j$, and $y_j'$ (recall that $x_i$ and $x_j$ are constrained to meet the epipolar consistency criterion (2.20), so $X$ must project onto $x_i$ as well). Because $l_j \doteq y_j \vee x_j$ and $l_j' \doteq x_j \vee y_j'$ are oriented projections of the lines $L$ and $L'$, respectively, we can write
$$L \doteq Y \vee X \qquad \text{and} \qquad L' \doteq X \vee Y' \,.$$
By Proposition 2.10, (2.32) implies that
$$|O_j, Y, X, Y'| > 0 \,.$$
By (2.31), we know that $\Pi_i \vee O_j > 0$. Since the points $Y$, $X$, and $Y'$ all belong to the plane $\Pi_i$, we must conclude that
$$Y \vee X \vee Y' \doteq -\Pi_i \,.$$

Now, let us consider the transformation induced on the plane $\Pi_i$ by the third projection matrix $\mathcal{P}_k$. By Proposition 2.10,
$$\Pi_i \vee O_k \doteq |O_k, Y, X, Y'| \doteq |y_k, x_k, y_k'| \,,$$
where $y_k \doteq \mathcal{P}_k Y$ and $y_k' \doteq \mathcal{P}_k Y'$. Since $l_k$ and $l_k'$ are projections of $L \doteq Y \vee X$ and $L' \doteq X \vee Y'$, respectively, we have
$$l_k \doteq y_k \vee x_k \qquad \text{and} \qquad l_k' \doteq x_k \vee y_k' \,.$$
If $|y_k, x_k, y_k'| > 0$, then $x_k \doteq l_k \wedge l_k'$, and if $|y_k, x_k, y_k'| < 0$, then $x_k \doteq l_k' \wedge l_k$. Finally, we have the procedure for finding the properly oriented coordinates of $x_k$:
$$x_k = \begin{cases} T_{ijk}(l_i, l_j) \wedge T_{ijk}(l_i, l_j') \,, & \Pi_i \vee O_k > 0 \,; \\ T_{ijk}(l_i, l_j') \wedge T_{ijk}(l_i, l_j) \,, & \Pi_i \vee O_k < 0 \,. \end{cases} \qquad (2.33)$$

Though it is not necessary for the derivation presented above, it is useful to explicitly write out the coordinates of the point $X$, defined as the point that projects onto $x_j$ in the $j$th image and lies on the plane $\Pi_i = \mathcal{P}_i^T l_i$. Because $l_i$ passes through the point $x_i$ in the $i$th image, and $x_i$ and $x_j$ are in exact epipolar correspondence, $X$ also projects to $x_i$. The coordinates of $X$ can be found by taking the meet of the visual ray $M_j \doteq O_j \vee X$ with the plane $\Pi_i$:
$$M_j \wedge \Pi_i \doteq (O_j \vee X) \wedge \Pi_i = -X \,, \quad \text{since } O_j \vee \Pi_i < 0 \text{ by (2.31)} \,.$$
Plugging in formula (2.8) to find $M_j$, we end up with the following expression for $X$:
$$X = -M_j \wedge \Pi_i = -\begin{pmatrix} Q_j \wedge R_j \wedge \Pi_i & R_j \wedge P_j \wedge \Pi_i & P_j \wedge Q_j \wedge \Pi_i \end{pmatrix} x_j \,. \qquad (2.34)$$

This is one simple example of a formula for reconstructing the 3D coordinates of a point based on its projections in two images.
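A hedged sketch of (2.34) in the same NumPy style; `meet_three_planes` is our own determinant-based helper, and its output matches the oriented meet only up to the sign conventions adopted in these sketches:

```python
import numpy as np

def meet_three_planes(a, b, c):
    """A point lying on all three planes (coefficient row vectors a, b, c),
    with the orientation fixed by this determinant construction."""
    return np.array([np.linalg.det(np.vstack([a, b, c, e])) for e in np.eye(4)])

def reconstruct_point(Pi, Pj, l_i, x_j):
    """Point on the plane back-projected from line l_i in view i that
    projects to x_j in view j, in the spirit of (2.34)."""
    Plane_i = Pi.T @ l_i                 # Pi^T l_i, Proposition 2.15
    P, Q, R = Pj                         # projection planes of camera j
    cols = [meet_three_planes(Q, R, Plane_i),
            meet_three_planes(R, P, Plane_i),
            meet_three_planes(P, Q, Plane_i)]
    return -np.column_stack(cols) @ x_j
```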

2.5 Oriented Projective Reconstruction

In our derivations of the fundamental matrix and trifocal tensor in Section 2.4, we have assumed that the projection matrices are known for all cameras in the scene. In practice, fundamental matrices or trifocal tensors are often estimated first, and the camera matrices are derived from them. However, the goal of our research is to reconstruct objects in space given not just two or three, but an arbitrary number of views. Because it is inefficient to independently estimate trifocal tensors for each triple of views in the scene, we would like to have all the camera matrices computed at once. In Section 2.3, we have placed no affine or Euclidean constraints on the camera matrices, only requiring them to have full rank. There exist several methods in the literature [14, 21] for projective reconstruction under this assumption. However, the output of these methods does not respect oriented constraints. In this section, we will discuss the issues involved in upgrading a standard projective reconstruction to an oriented projective reconstruction.

Let us formulate the reconstruction problem more precisely. Suppose we take $n$ images of a scene consisting of $m$ points. Let $x_i^j$ denote the image of the $i$th point in the $j$th picture. Note that if $x_i^j = (x_{i1}^j, x_{i2}^j, x_{i3}^j)^T$ is defined, this means that the $i$th point is visible to the $j$th camera, and we can assume that $x_{i3}^j > 0$ (recall (2.5) in Section 2.3). A projective reconstruction consists of estimated 3D points $X_1, \ldots, X_m$ and camera matrices $\mathcal{P}_1, \ldots, \mathcal{P}_n$ such that
$$x_i^j \sim \mathcal{P}_j X_i \,. \qquad (2.35)$$
However, we need a reconstruction meeting a stronger assumption. Since each $x_i^j$ is determined up to a positive factor by the constraint $x_{i3}^j > 0$, the projected point $\mathcal{P}_j X_i$ must also match $x_i^j$ up to a positive factor:
$$x_i^j \doteq \mathcal{P}_j X_i \,. \qquad (2.36)$$

If the cameras $\mathcal{P}_1, \ldots, \mathcal{P}_n$ and the 3D points $X_1, \ldots, X_m$ meet the above constraint, they are said to constitute a strong realization or an oriented projective reconstruction of the scene. Fortunately, it is relatively simple to convert a reconstruction satisfying (2.35) to a reconstruction satisfying (2.36). We have the following result, due to Hartley [20], [21, p. 508]:

Proposition 2.18. Suppose there exists a set of points $\hat{X}_1, \ldots, \hat{X}_m$ and cameras $\hat{\mathcal{P}}_1, \ldots, \hat{\mathcal{P}}_n$ such that
$$x_i^j \doteq \hat{\mathcal{P}}_j \hat{X}_i \,.$$
Moreover, the third coordinate of each $x_i^j$ is positive. That is, the point $\hat{X}_i$ lies in front of the focal plane of camera $\hat{\mathcal{P}}_j$ if $x_i^j$ is defined. Let $\mathcal{P}_1, \ldots, \mathcal{P}_n$; $X_1, \ldots, X_m$ be a projective reconstruction of the same scene, such that (2.35) is satisfied. Then there exist camera matrices $\tilde{\mathcal{P}}_1 = \pm \mathcal{P}_1, \ldots, \tilde{\mathcal{P}}_n = \pm \mathcal{P}_n$ and 3D points $\tilde{X}_1 = \pm X_1, \ldots, \tilde{X}_m = \pm X_m$ such that, for each $x_i^j$,
$$x_i^j \doteq \tilde{\mathcal{P}}_j \tilde{X}_i \,.$$

To compute the sign-adjusted matrices and 3D points, we can use the following procedure. Begin with any point $x_i^j$. If $x_i^j \doteq \mathcal{P}_j X_i$, leave $\mathcal{P}_j$ and $X_i$ alone. If $x_i^j \doteq -\mathcal{P}_j X_i$, multiply $\mathcal{P}_j$ by $-1$. For every other point $X_{i'}$ such that $x_{i'}^j$ is defined, multiply $X_{i'}$ by $-1$ if $x_{i'}^j \doteq -\mathcal{P}_j X_{i'}$. Next, consider all cameras that can see these points and flip their signs if necessary. In this way, continue propagating the changes until (2.36) is satisfied for all cameras and points.
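The propagation above is essentially a breadth-first traversal of the bipartite visibility graph of cameras and points. A minimal sketch under our own data layout (`x[i][j]` is the observed image point or `None`); this illustrates the procedure and is not the implementation used in the thesis:

```python
import numpy as np
from collections import deque

def fix_signs(P, X, x):
    """Flip the signs of cameras P[j] (3x4 arrays) and points X[i] (4-vectors),
    in place, until every defined observation satisfies x[i][j] ~ +P[j] @ X[i].
    Assumes an exact reconstruction up to sign (Prop. 2.18) and a connected
    camera-point visibility graph."""
    def wrong_sign(i, j):
        proj = P[j] @ X[i]
        k = int(np.argmax(np.abs(proj)))
        return x[i][j][k] / proj[k] < 0      # negative scale: flip needed

    m, n = len(X), len(P)
    i0, j0 = next((i, j) for i in range(m) for j in range(n)
                  if x[i][j] is not None)
    if wrong_sign(i0, j0):
        P[j0] *= -1                          # anchor the global sign choice
    seen_cams, seen_pts = {j0}, set()
    queue = deque([('cam', j0)])
    while queue:
        kind, a = queue.popleft()
        if kind == 'cam':                    # fix all points seen by camera a
            for i in range(m):
                if x[i][a] is not None and i not in seen_pts:
                    if wrong_sign(i, a):
                        X[i] *= -1
                    seen_pts.add(i)
                    queue.append(('pt', i))
        else:                                # fix all cameras seeing point a
            for j in range(n):
                if x[a][j] is not None and j not in seen_cams:
                    if wrong_sign(a, j):
                        P[j] *= -1
                    seen_cams.add(j)
                    queue.append(('cam', j))
```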

So far, we have described how to upgrade a projective reconstruction to an oriented projective reconstruction, where each camera matrix $\mathcal{P}_j$ is determined up to a positive scale factor. Notice, however, that the upgrade is not unique. If $\mathcal{P}_1, \ldots, \mathcal{P}_n$; $X_1, \ldots, X_m$ is an oriented reconstruction, we could multiply all the cameras and all the points by $-1$ and obtain another oriented reconstruction, since the overall sign of each $\mathcal{P}_j X_i$ would remain unchanged. This observation brings us to the subject of the ambiguity inherent in any reconstruction.

Let us assume that $\mathcal{P}_1, \ldots, \mathcal{P}_n$; $X_1, \ldots, X_m$ meets (2.36), and consider what happens when we respectively transform the cameras and the points as
$$\tilde{\mathcal{P}}_j = \mathcal{P}_j M^{-1} \qquad \text{and} \qquad \tilde{X}_i = M X_i \,, \qquad (2.37)$$
where $M$ is a non-singular $4 \times 4$ matrix. This is called the projective ambiguity of the scene:
$$\tilde{\mathcal{P}}_j \tilde{X}_i = (\mathcal{P}_j M^{-1})(M X_i) = \mathcal{P}_j X_i \doteq x_i^j \,. \qquad (2.38)$$

Based on the knowledge of the 2D points $x_i^j$ alone, we cannot tell whether the "real" reconstruction should be $\mathcal{P}_1, \ldots, \mathcal{P}_n$; $X_1, \ldots, X_m$ or $\tilde{\mathcal{P}}_1, \ldots, \tilde{\mathcal{P}}_n$; $\tilde{X}_1, \ldots, \tilde{X}_m$. The degrees of freedom of the unknown transformation $M$ encapsulate our ignorance about the scene based on the available 2D data. Note that we have not constrained the sign of the determinant of $M$; it may be either an orientation-preserving or an orientation-reversing transformation. If $M$ is orientation-reversing, it simultaneously flips the orientations of the camera focal planes and the image points, so their relative orientation remains the same. It is also possible to show that an arbitrary projective transformation does not affect the relative orientation of camera centers and scene planes. However, as the following argument will demonstrate, the relative orientation of one camera's center with respect to another camera's focal plane does depend on the sign of the determinant of $M$. Consider the transformation of a single camera $\mathcal{P}_j$:

$$\tilde{\mathcal{P}}_j = \begin{pmatrix} \tilde{P}_j^T \\ \tilde{Q}_j^T \\ \tilde{R}_j^T \end{pmatrix} = \mathcal{P}_j M^{-1} = \begin{pmatrix} P_j^T M^{-1} \\ Q_j^T M^{-1} \\ R_j^T M^{-1} \end{pmatrix}.$$
By Proposition 2.9, the center $O_j$ of the camera transforms as
$$\tilde{O}_j = \tilde{P}_j \wedge \tilde{Q}_j \wedge \tilde{R}_j = (M^{-T} P_j) \wedge (M^{-T} Q_j) \wedge (M^{-T} R_j) = |M^{-1}|\, M O_j \,.$$

Consider some other camera in the scene, say $\mathcal{P}_k$. Let us also transform this camera as $\tilde{\mathcal{P}}_k = \mathcal{P}_k M^{-1}$, and compute the relative orientation of its focal plane $\tilde{R}_k$ and the center $\tilde{O}_j$ of the first camera:
$$\begin{aligned}
\tilde{R}_k \odot \tilde{O}_j \doteq \tilde{R}_k \vee \tilde{O}_j &= (M^{-T} R_k) \vee (|M^{-1}| M O_j) = (M^{-T} R_k)^T (|M^{-1}| M O_j) \\
&= |M^{-1}|\, R_k^T M^{-1} M O_j = |M^{-1}|\, R_k^T O_j = |M^{-1}| (R_k \vee O_j) \,.
\end{aligned}$$

If $|M^{-1}| > 0$, the relative orientation of the camera centers and focal planes does not change. If we know whether the center of the $j$th camera is actually visible to the $k$th camera (this is equivalent to knowing the proper orientation of the epipole $e_{kj}$), we can constrain $M$ to have either positive or negative determinant, to enforce the correct visibility relationship between the cameras.

For the purposes of this thesis, we have not needed to estimate properly oriented camera matrices based on multi-view point matches, since metric camera calibration was available for every data set demonstrated in Chapter 5. However, in the future it will be necessary to further study the process of oriented reconstruction, and to implement and test this process on real-world data.

CHAPTER 3

Projective Differential Geometry

Projective differential geometry (PDG) deals with the properties of curves and surfaces that remain invariant under projective transformations. Many familiar constructions of Euclidean differential geometry are absent here, including surface normals, Gaussian curvature, and the Gauss map. We are interested only in the simplest projective differential invariants that require partial derivatives up to second order. The most important such invariant is local shape, discussed in Section 3.2.6.

As a branch of mathematics, PDG seems to have seen most of its activity before the 1950s. The bulk of our presentation in this chapter follows the 1932 textbook of Lane [28], which seems to be the most recent elementary English-language source. Currently, PDG is all but unused in the field of computer vision. We tend to agree with Koenderink [25], who states that PDG has many applications in computer vision, and its neglected status is due largely to the unavailability of accessible introductory texts. We are interested in reviving PDG because it offers the right framework for the problem of reconstructing smooth curves and surfaces based on projective information alone. Over the past decade, vision researchers have been acquiring a deeper understanding of projective reconstruction techniques for points, lines, and planes [14, 21]. In our opinion, the addition of tools applicable to the reconstruction of more complex geometric entities would greatly enrich the subject of multi-view geometry.

3.1 Curves

We will deal with smooth curves in $\mathbb{P}^n$ for $n = 2, 3$.

Definition 3.1 (Curve). A parametrized smooth curve in $\mathbb{P}^n$ is a smooth mapping $x : I \to \mathbb{P}^n$, where $I$ is an open interval of the real line $\mathbb{R}$. That is, the curve point corresponding to any parameter value $t \in I$ can be written in homogeneous coordinates as
$$x(t) = \big( x_1(t), \ldots, x_{n+1}(t) \big)^T$$
such that the coordinate functions $x_i(t)$ are infinitely continuously differentiable.

A point $x(t)$ is called regular if the derivative $x'(t) = \big( x_1'(t), \ldots, x_{n+1}'(t) \big)^T$ exists and is independent of $x(t)$; that is, $x'(t) \neq 0$ and $x'(t)$ is not a scalar multiple of $x(t)$. The trace of the curve is the set $\Gamma \subset \mathbb{P}^n$ that is the image of the domain $I$ under the mapping $x$. In the subsequent sections, when we refer to a curve as $\Gamma$, we mean that $\Gamma$ is the trace of a differentiable map, as defined above. Beginning in Section 3.3.1, we will also be interested in closed curves. A closed curve can be defined as a differentiable map from the unit circle to $\mathbb{P}^n$. Another definition, after Do Carmo [12, p. 30], is as follows:

Definition 3.2 (Closed Curve). A curve defined on a closed interval $I = [a, b]$ is the restriction of a curve defined on an open interval that contains $[a, b]$. A closed curve is a mapping $x : I \to \mathbb{P}^n$ defined on the interval $I = [a, b]$ such that all the derivatives of $x$ agree at $x(a)$ and $x(b)$:
$$x^{(i)}(a) = x^{(i)}(b) \,, \qquad i = 0, \ldots, \infty \,.$$

3.1.1 Differential Equations of Curves

Following Lane [28], we begin the study of curves in projective space by showing that a curve contained in a $k$-dimensional flat of $\mathbb{P}^n$ must satisfy a differential equation of order $k+1$. Incidentally, this characteristic differential equation is the same for two curves related by a projective transformation, so it can serve as a kind of projective differential invariant.

Definition 3.3 (Immersion). A curve $\Gamma$ is said to be immersed in a linear subspace of dimension $k$ (that is, a $k$-dimensional flat) if it is contained in a linear subspace of dimension $k$, but not in any subspace of lower dimension.

Proposition 3.1. A curve $\Gamma$ is immersed in a linear subspace of dimension $k$ if and only if the coordinates $x(t)$ of a variable point on the curve satisfy a linear homogeneous differential equation of order $k+1$, but not of lower order [28, p. 5].

Proof. Suppose that $\Gamma$ is immersed in a $k$-dimensional subspace defined by $k+1$ points $a_1, \ldots, a_{k+1}$. We will denote the coordinates of the $i$th point as $a_i = (a_{i,1}, \ldots, a_{i,n+1})^T$. Then we may express a point $x$ on the curve as a linear combination of $k+1$ independent scalar functions $\xi_i$ associated with each $a_i$:
$$x(t) = \sum_{i=1}^{k+1} \xi_i(t)\, a_i \,, \qquad (3.1)$$
$$x_j(t) = \sum_{i=1}^{k+1} \xi_i(t)\, a_{ij} \qquad \text{for } j = 1, \ldots, n+1 \,. \qquad (3.2)$$

Equation (3.2) states that the function $x_j(t)$ is linearly dependent on the functions $\xi_1(t), \ldots, \xi_{k+1}(t)$. This means that we can find $k+2$ constants $c_0, \ldots, c_{k+1}$ such that the following equation holds:
$$c_0 x_j + c_1 \xi_1 + \ldots + c_{k+1} \xi_{k+1} = 0 \,.$$
Differentiating $k+1$ times, we get the following system:
$$\begin{cases}
c_0 x_j + c_1 \xi_1 + \ldots + c_{k+1} \xi_{k+1} = 0 \\
c_0 x_j' + c_1 \xi_1' + \ldots + c_{k+1} \xi_{k+1}' = 0 \\
\qquad \vdots \\
c_0 x_j^{(k+1)} + c_1 \xi_1^{(k+1)} + \ldots + c_{k+1} \xi_{k+1}^{(k+1)} = 0 \,.
\end{cases} \qquad (3.3)$$
There exists a non-trivial solution for $c_0, \ldots, c_{k+1}$ if and only if the matrix of the system is singular:
$$\begin{vmatrix}
x_j & \xi_1 & \ldots & \xi_{k+1} \\
x_j' & \xi_1' & \ldots & \xi_{k+1}' \\
\vdots & \vdots & \ddots & \vdots \\
x_j^{(k+1)} & \xi_1^{(k+1)} & \ldots & \xi_{k+1}^{(k+1)}
\end{vmatrix} = 0 \,. \qquad (3.4)$$

Expanding the above determinant along the first column, and dividing out by the coefficient of $x_j^{(k+1)}$ (since we assumed that $\Gamma$ cannot be immersed in a space of dimension less than $k$, we know that none of the minor determinants are zero), we get a differential equation of the form
$$x_j^{(k+1)} + \varphi_1 x_j^{(k)} + \ldots + \varphi_{k+1} x_j = 0 \,, \quad j = 1, \ldots, 4 \,, \quad \text{or}$$
$$x^{(k+1)} + \varphi_1 x^{(k)} + \ldots + \varphi_{k+1} x = 0 \,, \qquad (3.5)$$
where $\varphi_1, \ldots, \varphi_{k+1}$ are scalar functions of $t$. It is clear that (3.5) is a linear homogeneous differential equation of order $k+1$.

For example, a curve that reduces to a single point satisfies the equation
$$x' + \varphi x = 0 \,. \qquad (3.6)$$
We can see this by realizing that the solution is of the form $x = \lambda a$, where $\lambda$ is a scalar function of $t$ and $a$ is a fixed point. A line, which can be written as $x = \lambda_1 a + \lambda_2 b$, satisfies the equation
$$x'' + \varphi_1 x' + \varphi_2 x = 0 \,. \qquad (3.7)$$
Finally, the equation of a plane curve is
$$x''' + \varphi_1 x'' + \varphi_2 x' + \varphi_3 x = 0 \,. \qquad (3.8)$$

Obviously, in $\mathbb{P}^2$, every curve satisfies (3.8). The general solution to equation (3.5) has the form (3.1): it is a linear combination of the functions $\xi_1, \ldots, \xi_{k+1}$ weighted by arbitrary constants $a_1, \ldots, a_{k+1}$. Any two sets of $k+1$ independent points $a_1, \ldots, a_{k+1}$ and $\tilde{a}_1, \ldots, \tilde{a}_{k+1}$ can be related by a projective transformation of the form
$$\tilde{a}_i = M a_i \,, \qquad (3.9)$$
where $M$ is some $(n+1) \times (n+1)$ non-singular matrix. Thus, the differential equation (3.5) is the same for any two curves $\Gamma$ and $\tilde{\Gamma}$ whose points $x(t)$ and $\tilde{x}(t)$ are related by a projective transformation $\tilde{x}(t) = M x(t)$. Note that the exact form of (3.5) is not preserved under certain transformations that do not change the curve, such as multiplying $x(t)$ by a scalar function $\mu(t)$ or reparametrizing the curve as $x\big(t(s)\big)$. However, this is not a problem for us since we will not use the differential equations themselves as curve invariants.

3.1.2 Osculating Spaces

Equation (3.5) says that $x^{(k+1)}$ lies in the span of the independent points $x, x', x'', \ldots, x^{(k)}$. These $k+1$ points are useful for defining a series of subspaces that locally characterize $\Gamma$ at $x$ [28, p. 11]:

Definition 3.4 (Osculating Space). At a regular point $x$ of a curve $\Gamma$ immersed in $n$-dimensional space, the osculating linear space of order $k$ is the flat spanned by $x, x', x'', \ldots, x^{(k)}$.

We will use the join operator to write the order-$k$ osculating subspace as $x \vee x' \vee \ldots \vee x^{(k)}$. The most important osculating space is the order-1 space, the tangent line $x \vee x'$. The tangent line can be defined as the limit of a line connecting $x$ with a "nearby" point that approaches $x$ along $\Gamma$. In this framework, we can give a geometric interpretation to $x'$ as the derivative point that lies on the tangent line and is defined by a process of convergence¹:
$$x'(t) = \lim_{\delta t \to 0} \frac{x(t + \delta t) - x(t)}{\delta t} \,. \qquad (3.10)$$

Consider what happens when $x$ is multiplied by a scalar function $\mu(t)$, a transformation that does not change the image of the curve in $\mathbb{P}^n$. Taking the derivative, we get
$$(\mu x)' = \mu' x + \mu x' \,. \qquad (3.11)$$

¹ Strictly speaking, we need to define convergence in $\mathbb{P}^n$, but this can be easily done [50, p. 72].

As it turns out, we can scale $x$ to place the derivative point $(\mu x)'$ anywhere on the tangent to the curve $\Gamma$ at $x$, though it cannot coincide with $x$ itself unless $\Gamma$ is zero-dimensional (this fact is expressed by equation (3.6)).

Proposition 3.2. The curve tangent is a projective differential invariant. That is, the tangent is not affected by scaling of the homogeneous coordinates of $x$ by a scalar function or by reparametrization of the curve. If a projective transformation $\tilde{x}(t) = M x(t)$ is applied to the curve, the tangent transforms in a natural way. That is, the tangent line at the transformed point $\tilde{x}$ is given by $(Mx) \vee (Mx')$.

Proof. First, consider the effect of multiplying the curve by a scalar function $\mu(t)$: $\tilde{x}(t) = \mu(t)\, x(t)$. As can be seen from (3.11), the new point $\tilde{x}' = (\mu x)'$ is still in the span of $x$ and $x'$, so the tangent line remains invariant to scaling of homogeneous coordinates. If we reparametrize the curve as $\tilde{x}(s) = x\big(t(s)\big)$, where $t(s)$ is a differentiable function representing the parameter change, we can use the chain rule to get
$$\tilde{x}' = x' \frac{dt}{ds} \,.$$
Thus, the linear span of $x$ and $x'$ is not changed by reparametrization. Finally, the fact that the tangent line at $\tilde{x}$ is given by $(Mx) \vee (Mx')$ is obvious:
$$\tilde{x} \vee \tilde{x}' = (Mx) \vee (Mx)' = (Mx) \vee (Mx') \quad \text{by linearity of differentiation.}$$

The order-2 osculating space $x \vee x' \vee x''$, or the osculating plane, is the limit of the plane through $x$ and two points that independently approach $x$. Looking back at equation (3.7), we can see that for a curve that is contained in a one-dimensional subspace of $\mathbb{P}^3$, $x''$ is actually linearly dependent on $x$ and $x'$, so a proper osculating plane does not exist, degenerating instead to a line. More generally, (3.5) states that for a curve contained in a $k$-dimensional subspace, $x^{(k+1)}$ is dependent on $x, x', \ldots, x^{(k)}$. That is, $x^{(k+1)}$ lies in the osculating subspace of order $k$, so the osculating subspace of order $k+1$ is degenerate.

Type                                          Dimensions of osculating spaces
Regular point                                 0, 1
Inflection point                              0, 1, 1
Flat point                                    0, 1, 1, 0
Handle point (stationary osculating plane)    0, 1, 2, 2
Cusp                                          0, 0

Table 3.1 Projectively invariant classification of curve points.

Using the same techniques as in the proof of Proposition 3.2, we can establish that projective osculating subspaces of any order are projective differential invariants [50, p. 72].

Taking advantage of this result, we can obtain a projectively invariant classification of curve points based on the sequence of dimensions of their osculating spaces of order 0, 1, etc. Table 3.1 shows the characteristic sequences for a few useful types [50, p. 73].

In particular, an inflection point has a degenerate osculating plane, which means that the point $x''$ is in the span of the points $x$ and $x'$. This is precisely the condition expressed in (3.7). Even though we derived (3.7) using the global assumption that all points on the curve lie on a fixed line, the equation can also be interpreted locally. Namely, (3.7) holds at a particular point $x$ if the curve instantaneously becomes one-dimensional at $x$. In the same way, it is easy to see that a cusp locally satisfies the equation (3.6) of a zero-dimensional curve, and a handle point locally satisfies the equation (3.8) of a plane curve.

3.1.3 Order of Contact

Possibly the most important invariant of projective differential geometry is order of contact², which describes how "intimately" curves interact with other curves and lines in space.

² In fact, since the definition of order of contact relies only on the differential properties of the curves, and not on the linear structure of space, order of contact is invariant not only under projective transformations, but under a more general class of diffeomorphisms.

Definition 3.5 (Order of Contact). Two curves $\Gamma_1$ and $\Gamma_2$, parametrized by $x_1(s)$ and $x_2(t)$ respectively, have contact of order $k$ at a point $x = x_1(s_0) = x_2(t_0)$ if there exists a parameter transformation $t = t(s)$ with $t_0 = t(s_0)$ such that
$$x_1^{(i)}(s_0) = x_2^{(i)}\big(t(s_0)\big) \,, \qquad i = 0, \ldots, k \,.$$

The order of contact of two curves is the maximum k such that the curves have contact of order k there [50, p. 75].

Intuitively, two curves have order of contact k at a certain point x if they intersect each other at k+1 “consecutive” points. By visualizing contact in this way, one can easily become convinced of the following well-known facts.

Proposition 3.3. A tangent line to a curve $\Gamma$ at a point $x$ has order of contact $\geq 1$ with $\Gamma$ at $x$. If $x$ is an inflection point, then the tangent has order of contact $\geq 2$ with $\Gamma$ at $x$.

3.2 Surfaces

In our work, we will use the word surface to denote a two-dimensional manifold immersed in $\mathbb{P}^3$, each of whose points can locally be expressed in parametric form using smooth coordinate functions.

Definition 3.6 (Surface). A set $\Sigma \subset \mathbb{P}^3$ is a surface if each point in $\Sigma$ is contained in an open coordinate neighborhood $V \subset \mathbb{P}^3$, and there exists a map $x : U \to V \cap \Sigma$, where $U$ is an open subset of $\mathbb{R}^2$, such that $x$ is smooth (has continuous partial derivatives of all orders) and has a continuous inverse map $x^{-1} : V \cap \Sigma \to U$.

Using homogeneous coordinates, we can write any point in $\Sigma$ as
$$x(u, v) = \big( x_1(u, v), x_2(u, v), x_3(u, v), x_4(u, v) \big)^T \,,$$
where each $x_i(u, v)$ is a smooth coordinate function. Next, let us define the notion of a tangent plane [28, p. 35]:

Definition 3.7 (Tangent Plane). The tangent plane at a point of a surface is the plane containing the tangent lines at the point of all curves on the surface through the point.

Note that this definition makes no reference to the concept of a normal to the surface, which does not exist in projective space. We can easily verify that the tangent plane is determined by $x \vee x_u \vee x_v$, where subscripts denote partial differentiation with respect to $u$ and $v$. Let $\Gamma$ be a curve on the surface $\Sigma$ passing through the point $x$. The equation of $\Gamma$ is $x(t) = x\big(u(t), v(t)\big)$. The tangent to $\Gamma$ at $x$ is spanned by $x$ and
$$x' = x_u u' + x_v v' \,. \qquad (3.12)$$
Thus, we can see that the tangent to any surface curve $\Gamma$ at $x$ lies in the subspace spanned by $x$, $x_u$, and $x_v$ (see Figure 3.1). A point $x$ on $\Sigma$ is called regular if $x$, $x_u$, and $x_v$ are independent, so they indeed span a plane. A surface that contains exclusively regular points is called smooth or regular³.


Figure 3.1 The surface tangent plane.

³ We can modify Definition 3.6 to exclude the possibility of non-regular (singular) points: simply add the condition that the inverse map $x^{-1} : V \cap \Sigma \to U$ must also be smooth.

3.2.1 Order of Contact of Surfaces

In this section, we state the definitions of order of contact for two surfaces, and for a surface and a curve. In Section 3.2.6, we will use these notions to classify the local shape of a surface in the neighborhood of a point by the number of tangent lines having order-2 contact with the surface at that point.

Definition 3.8 (Contact of Two Surfaces). Two surfaces $\Sigma_1$ and $\Sigma_2$, parametrized by $x_1(s, t)$ and $x_2(u, v)$ in the neighborhood of a common point $x = x_1(s_0, t_0) = x_2(u_0, v_0)$, have contact of order $k$ at $x$ if there exists a locally regular parameter transformation $u = u(s, t)$, $v = v(s, t)$ such that all derivatives of $x_1(s, t)$ and $x_2\big(u(s, t), v(s, t)\big)$ up to order $k$ are equal [50, p. 79].

Definition 3.9 (Contact of a Curve and a Surface). A curve $\Gamma$ and a surface $\Sigma$, which have a common point $x$, have contact of order $k$ if there exists a curve $\Gamma' \subset \Sigma$ passing through $x$ such that $\Gamma$ and $\Gamma'$ have contact of order $k$ [12, p. 171].

For example, a tangent line (as well as the tangent plane itself) has order $\geq 1$ contact with the surface, and a curve has order $\geq 2$ contact with its osculating plane.

3.2.2 Developable Surfaces

In the rest of this document, we will make extensive use of the properties of a developable, a special kind of surface made up of a one-parameter family of lines. For our purposes, a developable is a surface that locally (that is, in the neighborhood of each of its points) can be described as either a tangent developable or a cone, as defined below.

Definition 3.10 (Tangent Developable). A tangent developable is a surface that contains all the tangent lines of a curve $y(u)$. The equation of a tangent developable can be written as
$$x(u, v) = y(u) + v\, y'(u) \,. \qquad (3.13)$$

The tangent lines to $y$ are called the generators of the tangent developable, and the curve $y$ itself is called the edge of regression. The edge of regression can be regarded as the locus of intersections of "consecutive" straight lines contained in the surface.

Definition 3.11 (Cone). A cone is a surface that has the parametric form
$$x(u, v) = a + v\, y(u) \,, \qquad (3.14)$$
where $y(u)$ is some curve and $a$ is a constant point. The point $a$ is called the apex of the cone. Note that a cone and a cylinder are projectively equivalent (we may regard a cylinder as a cone with the apex at infinity).

Developable surfaces can also be defined as envelopes of a one-parameter family of planes. The following proposition deals with a property of developable surfaces that follows directly from this definition.

Proposition 3.4. The tangent plane of a developable remains the same for any point along a single generator.

Proof. Let us first compute the tangent plane of a tangent developable (3.13):
$$\begin{aligned}
x &= y + v\, y' \\
x_u &= y' + v\, y'' \\
x_v &= y' \\
x \vee x_u \vee x_v &\sim y \vee y' \vee y'' \,.
\end{aligned}$$
The tangent plane at $x(u, v)$ is spanned by $y$, $y'$, and $y''$, and there is no dependence on $v$. Note that for values of $u$ where the curve $y$ has an inflection point, the tangent plane of the developable is not defined.

We can verify the same property for the equation of a cone (3.14):
$$\begin{aligned}
x &= a + v\, y \\
x_u &= v\, y' \\
x_v &= y \\
x \vee x_u \vee x_v &\sim a \vee y \vee y' \,.
\end{aligned}$$
Note that the tangent plane of a cone is not defined when $v = 0$.
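Proposition 3.4 is also easy to verify symbolically for a concrete curve. A SymPy sketch (our own construction; the twisted-cubic-style curve below is an arbitrary example):

```python
import sympy as sp

u, v = sp.symbols('u v')
# An arbitrary example curve y(u) in homogeneous coordinates.
y = sp.Matrix([u, u**2, u**3, 1])
x = y + v * y.diff(u)                 # tangent developable, equation (3.13)

# Coefficient vector of the tangent plane x v x_u v x_v: entry m is the
# 4x4 determinant with columns x, x_u, x_v, and the m-th basis vector.
frame = sp.Matrix.hstack(x, x.diff(u), x.diff(v))
plane = sp.Matrix([frame.row_join(sp.eye(4).col(m)).det() for m in range(4)])

# The coefficients equal v times a vector depending on u alone, so the
# tangent plane (defined up to scale) is constant along each generator.
w = (plane / v).expand()
assert all(sp.diff(c, v) == 0 for c in w)
```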

3.2.3 Conjugate Nets

Our goal in this section is to derive a formula (3.22) relating pairs of special conjugate directions on a surface tangent plane. By a direction on the tangent plane to surface $\Sigma$ at point $x$, we will mean any point on the line spanned by the first-order partial derivatives $x_u$ and $x_v$. The bilinear form appearing in (3.22) is actually the projective version of the famous second fundamental form from Euclidean differential geometry. We will introduce the notion of conjugacy through a geometric definition of a conjugate net.

Definition 3.12 (Net of Curves). Two one-parameter families of curves on the surface $\Sigma$ are said to form a net if exactly one curve of each family passes through each point of $\Sigma$, and the tangents of the two curves passing through the same point are always distinct [28, p. 34].

Definition 3.13 (Conjugate Net). A net is conjugate if the tangents of the curves of one family of the net, constructed at the points of each fixed curve of the other family, form a developable surface. The tangents to the two curves passing through any point are said to be conjugate, or to lie in conjugate directions [28, p. 122].

It is known that a surface immersed in P3 can support infinitely many conjugate nets.

In fact, one of the families of a conjugate net can be assigned arbitrarily [28, p. 127].

Proposition 3.5. Consider the neighborhood of a point $x = x(s_0, t_0)$ on the surface $\Sigma$, and let the two families of curves of a conjugate net be given by
$$x(s, t_0) = x\big(u(s, t_0), v(s, t_0)\big) \qquad \text{and} \qquad x(s_0, t) = x\big(u(s_0, t), v(s_0, t)\big) \,. \qquad (3.15)$$
(For the first family, $t$ is treated as a constant, and for the second family, $s$ is treated as a constant.) Then the tangent directions $x_s = x_u u_s + x_v v_s$ and $x_t = x_u u_t + x_v v_t$ of the two curves passing through the point $x = x(s_0, t_0)$ satisfy the following curvilinear differential equation:
$$l\, u_s u_t + m\, (u_s v_t + u_t v_s) + n\, v_s v_t = 0 \,, \qquad (3.16)$$
where $l = |x, x_u, x_v, x_{uu}|$, $m = |x, x_u, x_v, x_{uv}|$, and $n = |x, x_u, x_v, x_{vv}|$.


Figure 3.2 Illustration for the proof of Proposition 3.5.

Proof. Let us construct two "consecutive" tangents in the $t$-direction at $x(t_0, s_0)$ and $x(t_0, s_0 + \delta s)$. The two tangent lines are given by
$$x(t_0, s_0) \vee x_t(t_0, s_0) \qquad \text{and} \qquad x(t_0, s_0 + \delta s) \vee x_t(t_0, s_0 + \delta s) \,. \qquad (3.17)$$

(See Figure 3.2.) By Definition 3.13, the one-parameter family of lines defined by $x(t_0, s) \vee x_t(t_0, s)$ as $s$ varies is supposed to sweep out a developable surface. That is, as $\delta s$ approaches $0$, we expect the two consecutive tangents $x(t_0, s_0) \vee x_t(t_0, s_0)$ and $x(t_0, s_0 + \delta s) \vee x_t(t_0, s_0 + \delta s)$ to intersect in a point lying on the edge of regression (or apex) of the developable. Let us find the limit of the two lines defined by $x(t_0, s_0) \vee x(t_0, s_0 + \delta s)$ and $x_t(t_0, s_0) \vee x_t(t_0, s_0 + \delta s)$:
$$\lim_{\delta s \to 0} \big[ x(t_0, s_0) \vee x(t_0, s_0 + \delta s) \big] \sim \lim_{\delta s \to 0} \left[ x(t_0, s_0) \vee \frac{x(t_0, s_0 + \delta s) - x(t_0, s_0)}{\delta s} \right] = x(t_0, s_0) \vee x_s(t_0, s_0) \,, \qquad (3.18)$$
$$\lim_{\delta s \to 0} \big[ x_t(t_0, s_0) \vee x_t(t_0, s_0 + \delta s) \big] \sim \lim_{\delta s \to 0} \left[ x_t(t_0, s_0) \vee \frac{x_t(t_0, s_0 + \delta s) - x_t(t_0, s_0)}{\delta s} \right] = x_t(t_0, s_0) \vee x_{st}(t_0, s_0) \,. \qquad (3.19)$$

If the two lines defined by (3.17) intersect in the limit, the two lines (3.18) and (3.19) must lie in the same plane. This constraint can be written as
$$|x, x_s, x_t, x_{st}| = 0 \,. \qquad (3.20)$$
Let us write the expression for the derivative point $x_{st}$ using the chain rule (omitting the terms $x_u u_{st} + x_v v_{st}$, which lie in the tangent plane and therefore do not affect (3.20)):
$$x_{st} = (x_{uu} u_t + x_{uv} v_t) u_s + (x_{vu} u_t + x_{vv} v_t) v_s = x_{uu} u_s u_t + x_{uv} (u_s v_t + v_s u_t) + x_{vv} v_s v_t \,. \qquad (3.21)$$

Now we can expand (3.20) by plugging in the expression (3.21) for $x_{st}$:
$$\begin{aligned}
|x, x_s, x_t, x_{st}| &= |x,\; x_u u_s + x_v v_s,\; x_u u_t + x_v v_t,\; x_{uu} u_s u_t + x_{uv}(u_s v_t + v_s u_t) + x_{vv} v_s v_t| \\
&= (u_s v_t - v_s u_t)\, |x, x_u, x_v,\; x_{uu} u_s u_t + x_{uv}(u_s v_t + v_s u_t) + x_{vv} v_s v_t| \\
&= (u_s v_t - v_s u_t) \big[ u_s u_t\, |x, x_u, x_v, x_{uu}| + (u_s v_t + v_s u_t)\, |x, x_u, x_v, x_{uv}| + v_s v_t\, |x, x_u, x_v, x_{vv}| \big] \,.
\end{aligned}$$
Since the factor $u_s v_t - v_s u_t$ is nonzero (the two tangent directions of a net are distinct), setting the above expression to zero gives the desired result (3.16). In matrix form,
$$\begin{pmatrix} u_s & v_s \end{pmatrix} \begin{pmatrix} l & m \\ m & n \end{pmatrix} \begin{pmatrix} u_t \\ v_t \end{pmatrix} = 0 \,, \qquad (3.22)$$
where $l = |x, x_u, x_v, x_{uu}|$, $m = |x, x_u, x_v, x_{uv}|$, and $n = |x, x_u, x_v, x_{vv}|$.

If the conjugate curves (3.15) are actually parametric curves, then the condition (3.20) becomes simply m = 0. From this observation, we easily obtain the following result [28, p.

122]:

Proposition 3.6 (Corollary). If the parametric curves on the surface Σ form a conjugate net, then x satisfies the following differential equation:

xuv = cx + axu + bxv , where a, b,andc are scalar constants.

3.2.4 Asymptotic Directions

In the previous section, we introduced the concept of conjugate directions. In this section, we will consider asymptotic directions that are conjugate to themselves. But first, we will give a purely geometric definition of a curve whose tangent at every point lies in an asymptotic direction.

Definition 3.14 (Asymptotic Curve). AcurveΓ on a surface Σ is called asymptotic if at each point of the curve, the tangent plane to Σ coincides with the osculating plane to Γ.[28, p. 35].

Proposition 3.7. Let the curve defined by x = x u(t),v(t) be an asymptotic curve. Then at each point x of the curve, the following differential equation is satisfied:

l (u)2 +2muv + n (v)2 =0, (3.23) where l, m,andn are defined as in (3.16).

72 Proof. The points x, x,andx that span the osculating plane must also lie in the tangent plane x ∨ xu ∨ xv.Wehave

   x = xuu + xvv ,

         x =(xuuu + xuvv )u + xuu +(xvuu + xvvv )v + xvv

 2    2   = xuu(u ) +2xuvu v + xvv(v ) + xuu + xvv .

Clearly, x and x already lie in the tangent plane. The remaining point x must be dependent on x, xu,andxv, so the following expression must vanish:

  2    2   |x, xu, xv, x | = |x, xu, xv, xuu(u ) +2xuvu v + xvv(v ) + xuu + xvv |

 2    2 = |x, xu, xv, xuu(u ) +2|xuvu v + xvv(v ) |

 2    2 =(u ) |x, xu, xv, xuu| +2u v |x, xu, xv, xuv| +(v ) |x, xu, xv, xvv|

= l (u)2 +2muv + n (v)2 .

In matrix form, we get lm u u v =0. (3.24) mn v

By statement of Proposition 3.7, (3.24) holds at every point of an asymptotic curve on a surface. However, by examining the proof one ascertains that (3.24) also applies locally at a point x of any curve Γ lying on the surface Σ if the osculating plane to Γ at x coincides with the tangent plane to Σ, or if the point x is in the span of x and x, in which case Γ has an inflection and the osculating plane is not defined. This observation motivates the following definition.

Definition 3.15 (Asymptotic Directions). Let αxu + βxv be a direction in the tangent plane to surface Σ at point x. This direction is asymptotic for Σ at x if the tangent line t = x∨ αxu +βxv has order ≥ 2 contact with Σ at x (t can be referred to as an asymptotic tangent).

73

Proposition 3.8. If t = x ∨ αxu + βxv is an asymptotic tangent to Σ at x,then

lα2 +2mαβ+ nβ2 =0, (3.25)

where l = |x, xu, xv, xuu|, m = |x, xu, xv, xuv|,andn = |x, xu, xv, xvv|.

Proof. By Definition 3.9, if t has order 2 contact with Σ at x, Σ must contain a curve Γ passing through x and having order 2 contact with t. This can only happen if t is the the tangent of Γ at x,andx is also an inflection point of Γ. Let Γ be parametrized as x u(t),v(t) . If Γ has an inflection at x, this means that the second derivative point x is

 contained in x ∨ x ∼ t.Sincet lies in the tangent plane x ∨ xu ∨ xv, we can also write

 |x, xu, xv, x | = 0. By proof of Proposition 3.7, this is equivalent to (3.25).

Incidentally, by comparing (3.25) with (3.22), we can easily see why asymptotic directions are sometimes referred to as self-conjugate.

Proposition 3.9. At any point of a developable surface, there is a single asymptotic direc- tion, which is the direction of the generator at that point.

Proof. Let us compute the coefficients l, m,andn of the equation (3.23) for a tangent developable surface (3.13).

   x = y + vy xuu = y + vy

   xu = y + vy xuv = y

 xv = y xvv = 0 .

      2    l = |x, xu, xv, xuu| = |y + vy , y + vy , y , y + vy | = −v |y, y , y , y |

     m = |x, xu, xv, xuv| = |y + vy , y + vy , y , y | =0

n = |x, xu, xv, xvv| =0.

74 Now, let us repeat the same procedure for a cone (3.14):

 x = a + vyxuu = vy

  xu = vy xuv = y

xv = yxvv = 0 .

    l = |x, xu, xv, xuu| = |a + vy,vy , y,vy | = −v |a, y, y , y |

  m = |x, xu, xv, xuv| = |a + vy,vy , y, y | =0

n = |x, xu, xv, xvv| =0.

In both cases, the equation (3.23) reduces to l(u)2 = 0 which has a single solution u =0.

Thus, a developable surface has a single asymptotic direction, which is the direction of the generator. A developable surface does not contain an asymptotic net; its single family of asymptotic curves is precisely the set of its generators.

3.2.5 Alternative Definitions of Conjugacy

In this section, we look at two alternative definitions of conjugate directions, and show how the fundamental equation (3.22) may be derived from each of them. The first alternative definition makes use of the notion of harmonic position.

Definition 3.16 (Harmonic Position). Let x1, x2, x3, x4 be four collinear points. The two pairs (x1, x2) and (x3, x4) are said to be in harmonic position if cr(x1, x2; x3, x4)=−1, where cr denotes cross ratio [53, p. 48].

Definition 3.17 (Conjugacy and Harmonic Position). At a point of a surface in ordinary space (P3) two tangents are said to be conjugate, or lie in conjugate directions, if they separate the asymptotic tangents harmonically. A net of curves on such a surface is said to be a conjugate net if the two tangents of the curves of the net at each point of the surface are conjugate tangents.

75 We can give an alternative proof of a variant of Proposition 3.5 using the definition above.

Proposition 3.10. Let x1 = α1xu +β1xv and x2 = α2xu +β2xv be two asymptotic directions at the point x on a surface Σ.Alsoletx3 = α3xu + β3xv and x4 = α4xu + β4xv be two conjugate directions. By the above definition, we can write

cr(x1, x2; x3, x4)=−1 .

Then the conjugate directions satisfy

lα3α4 + m (α3β4 + β3α4)+nβ3β4 =0,

where l = |x, xu, xv, xuu|, m = |x, xu, xv, xuv|,andn = |x, xu, xv, xvv|.

Proof. The points x1,...,x4 all lie on the line xu ∨ xv. To compute cr(x1, x2; x3, x4), we

∨ 1 0 need a projective coordinate system for xu xv. Let us assign coordinates 0 and 1 to xu

1 and xv, respectively, and make 1 the unit point. Then the coordinate vector for each of the

αi i four points x becomes βi . To simplify the calculations, we will represent each point using

αi i 1 2 3 4 − its projective parameter θ = βi . It is easy to show (see [53, p. 48]) that cr(θ ,θ ; θ ,θ )= 1 if and only if

(θ1 + θ2)(θ3 + θ4)=2(θ1θ2 + θ3θ4) . (3.26)

Because θ1 and θ2 are asymptotic directions, they each satisfy the equation

2 lθi +2mθi + n =0

(this is the same equation as (3.24), divided by v). By two well-known identities derived from the quadratic formula, the sum and product of the roots of the equation are

2m n θ1 + θ2 = − and θ1θ2 = . l l

76 Plugging back into (3.26), we get 2m n − (θ3 + θ4)=2 +2θ3θ4 l l

lθ3θ4 + m(θ3 + θ4)+n =0 lm θ4 θ3 1 =0, mn 1 yielding an expression equivalent to (3.22).

Yet another way to define conjugacy is via a construction involving an involution with respect to an osculating quadric [50, p. 81]. An involution is a projective transformation that is its own inverse, but is not the identity. An osculating quadric of Σ at point x is any quadric that has contact of order 2 with the surface (an osculating quadric is not unique).

The standard way to represent a quadric surface Ω is using a 4 × 4 symmetric matrix Q, such that for all points y belonging to Ω, yT Qy =0. Q can be regaded as a projective transformation that maps points in space to coefficient vectors of their polar planes with respect to Ω. For a point on Ω, the polar plane is the tangent plane. For example, Qy is the coefficient vector of the tangent plane to Ω at y. The intersection of Ω with the plane defined by Qy is a conic containing all points z such that the line y ∨ z is tangent to Ω at z.For this reason, this conic is known as the rim or contour generator of Ω with respect to y [21, p.190]. This gives us enough background information to introduce the second alternative definition of conjugate surface tangents.

Definition 3.18 (Conjugacy as Involution of Tangents of an Osculating Quadric). Let x1 =

α1xu + β1xv and x2 = α2xu + β2xv be two directions on the tangent plane Π = x ∨ xu ∨ xv.

Then x1 is conjugate to x2 if

x ∨ x2 ∼ Π ∧ (Qx1) , (3.27) where Q is the matrix of an osculating quadric to Σ at x.

Let us discuss in more detail the geometric construction underlying the definition (refer

∗ to Figure 3.3). The plane Π = Qx1 is the polar plane of the point x1. The intersection of

77 ∗ Π with the osculating quadric Ω is the rim of Ω with respect to x1. Since the line x ∨ x1 is tangent to Σ and also to Ω, it is clear that Π∗ contains the point x. The line of intersection between the tangent plane Π and the polar plane Π∗, denoted by Π∧Π∗, is conjugate to the

∗ tangent line x ∨ x1.NotethatΠ ∧ Π is also the tangent to the rim of Ω with respect to x1.

Once again, we can give an alternative proof of Proposition 3.5 using the above construction.

P*

rim P

x x 1

x 2 W

Figure 3.3 Conjugacy as involution of tangents of an osculating quadric.

Proposition 3.11. Let x1 = α1xu +β1xv and x2 = α2xu +β2xv be two conjugate directions in the tangent plane Π = x ∨ xu ∨ xv. By Definition 3.18, x1 and x2 satisfy (3.27). Then we have

lα1α2 + m (α1β2 + β1α2)+nβ1β2 =0, where l = |x, xu, xv, xuu|, m = |x, xu, xv, xuv|,andn = |x, xu, xv, xvv|.

78 ∗ T Proof. The point x2 lies on the plane Π = Qx1,sowemusthavex2 Qx1 =0.

T T x2 Qx1 =(α2xu + β2xv) Q(α1xu + β1xv)

T T T = α1α2 x Qxu +(α1β2 + β1α2) x Qxv + β1β2 x Qxv u u v T T xu Qxu xu Qxv α2 = α1 β1 T T . xu Qxv xv Qxv β2

Qx ∼ x ∨ xu ∨ xv

Qxu =(Qx)u ∼ (x ∨ xu ∨ xv)u

=(x ∨ xu)u ∨ xv +(x ∨ xu) ∨ xuv

=(xu ∨ xu + x ∨ xuu) ∨ xv +(x ∨ xu) ∨ xuv

= x ∨ xuu ∨ xv + x ∨ xu ∨ xuv

T xu Qxu ∼|xu, x, xuu, xv| + |xu, x, xu, xuv| = |x, xu, xv, xuu| .

By an analogous process, we derive

T T xu Qxv ∼|x, xu, xv, xuv| and xv Qxv ∼|x, xu, xv, xvv| .

T T (Note that the constant of proportionality concealed by ∼ is the same for xu Qxu, xu Qxv,

T and xv Qxv.) The result is once again identical to (3.22): lm α2 α1 β1 =0. (3.28) mn β2

3.2.6 Local Shape

In Section 3.2.4, we have seen that for a developable surface, the quadratic equation

2 α α lα2 +2mαβ+ nβ2 =0 or l +2m + n = 0 (3.29) β β has a single root. Let us consider the quantity ln−m2, which is the determinant of the matrix

(3.22) and (one-fourth) the discriminant of (3.29). For a developable surface, ln − m2 =0.

79 More generally, (3.29) will have a single real solution at any point on the surface where ln − m2 = 0, two real solutions where ln − m2 < 0, and no solutions where ln − m2 > 0. We may ask, is there anything fundamental about the sign of this quantity?

Proposition 3.12. The sign of the determinant ln− m2 is a projective geometric invariant.

That is, sgn(ln − m2) is invariant to the following:

1. Rescaling: x˜(u, v)=μ(u, v) x(u, v).

2. Reparametrization: x˜(s, t)=x u(s, t),v(s, t) ;

3. Projective transformations: x˜ = Mx.

Proof.

1. Rescaling:

(μx)u = μux + μxu (μx)uu = μuux +2μuxu + μxuu

(μx)v = μvx + μxv (μx)uv = μuvx + μuxv + μvxu + μxuv

(μx)vv = μvvx +2μvxv + μxvv

|μx, (μx)u, (μx)v, (μx)uu| = |μx,μux + μxu,μvx + μxv,μuux +2μuxu + μxuu|

4 = |μx,μxu,μxv,μxuu| = μ l

4 |μx, (μx)u, (μx)v, (μx)uv| = μ m

4 |μx, (μx)u, (μx)v, (μx)vv| = μ n.

The determinant transforms to μ4(ln − m2), which has the same sign.

l m 2. Reparametrization transforms the matrix m n to us vs lm us ut (usvt − utvs) . ut vt mn vs vt

4 2 The determinant of the transformed matrix is (usvt −utvs) (ln−m ), which once again

has the same sign.

80 3. The effect of the projective transformation on l is as follows:

|Mx, (Mx)u, (Mx)v, (Mx)uu| = |Mx,Mxu,Mxv,Mxuu|

= |M||x, xu, xv, xuu| = |M| l.

Similarly, m transforms to |M| m and n to |M| n. The value of the determinant becomes

|M|2(ln − m2), so the sign remains the same.

In all cases, ln − m2 changes only up to a positive scale factor. We may treat the sign of this quantity as a projective invariant.

By analogy with Euclidean differential geometry, we may classify each point x of a surface

Σ embedded in P3 as follows (also see Figure 3.4):

Elliptic: ln − m2 > 0, no asymptotic tangents.

Parabolic: ln − m2 =0, one asymptotic tangents.

Hyperbolic: ln − m2 < 0, two asymptotic tangents.

In addition, a point is called flat if all second-order partial derivatives of x are zero. Overall, we can see that the property of being an elliptic, hyperbolic, parabolic, or flat point is a property of projective differential geometry [50, p. 80]. As the above table indicates, local surface shape can also be characterized by the number of lines (asymptotic tangents) that have order 2 contact with the surface. These numbers are 0 for elliptic points, 1 for parabolic points, 2 for hyperbolic points, and infinity (all lines) for flat points.

We will call the matrix lm mn the local shape matrix. The next proposition calls attention to some interesting algebraic properties of this matrix. The geometric interpretation of these properties will become apparent in Section 3.3.

81 Elliptic Parabolic Hyperbolic

Figure 3.4 The local shape of a surface at a point.

Proposition 3.13.

l m α 2 2 (αβ) 1. Suppose that x is an elliptic point on Σ.Then m n β = lα +2mαβ + nβ

has the same sign for all points αxu + βxv on the line xu ∨ xv. If the sign is positive, x will be called a convex point. Otherwise, x will be called concave.

2. Suppose that x is a hyperbolic point on Σ,andletx1 = α1xu + β1xv and x2 =

α2xu + β2xv be two points such that x ∨ x1 and x ∨ x2 lie in conjugate directions.

2 2 2 2 Then the expressions lα1 +2mα1β1 + nβ1 and lα2 +2mα2β2 + nβ2 have opposite signs.

 3. Suppose that x is a parabolic point of Σ,andletx = αxu + βxv be a point such that

∨  l m α x x is the unique asymptotic direction. Then m n β = 0 .

The proof is a straightforward exercise in solving quadratic equations.

Proposition 3.14 (Corollary to Proposition 3.13, Part 3). The conjugate to any tangent at a parabolic point is the single asymptotic tangent at that point. The asymptotic tangent itself has no unique conjugate, since every other direction is conjugate to it.

82 Side Note: Euclidean Local Shape. In Euclidean space E3, we represent points by non- homogeneous coordinate vectors x =(x, y, z)T . The non-homogeneous coordinate vector

(x, y, z)T becomes the homogeneous vector w(x, y, z, 1)T . For the rest of this section, we will use non-homogeneous coordinates and denote by xˆ the “homogenized” vector x. One useful notion in Euclidean space is of a unit normal vector n to the tangent plane of Σ at point x.

The unit normal is computed as n = xu × xv, assuming xu and xv are unit tangent vectors.

Let us rewrite the matrix (3.22) using “homogenized” coordinates:   x xu xv xuu  l = |xˆ, xˆu, xˆv, xˆuu| =   1 0 0 0

= −|xu, xv, xuu| = −(xu × xv) · xuu = −n · xuu ,

m = |xˆ, xˆu, xˆv, xˆuv| = −(xu × xv) · xuv = −n · xuv ,

n = |xˆ, xˆu, xˆv, xˆvv| = −(xu × xv) · xvv = −n · xuv , lm n · xuu n · xuv = − . mn n · xuv n · xvv

This is minus the matrix of the famous second fundamental form.

3.3 Orienting Curves and Surfaces

So far in this document, we have not used the machinery of Oriented Projective Geom- etry, as introduced in Chapter 2. In fact, we have explicitly treated curves and surfaces in projective spaces P2 and P3. In this section, we will once again transfer discussion to the oriented spaces T2 and T3 (because Pn is a more general space than Tn, all the results of Sections 3.1 and 3.2 apply without modifications). In Section 3.3.1, we will introduce conventions for orienting plane curves, and in Section 3.3.2, we will move on to smooth surfaces.

83 3.3.1 Orienting Plane Curves

Recall that a curve in T2 is given by a smooth function of a single parameter (Definition

3.1). The trace of this function, denoted Γ, is naturally oriented in the direction of increasing values of the parameter. In accordance with this convention, the oriented curve tangent x(t) ∨ x(t) is actually the limit of a line “pointing” from x(t) towards a “subsequent” point x(t + δt) (see Section 3.1.2).

G G t x t x t x G

Γ is on the positive Γ is on the negative Γ crosses side of the tangent line side of the tangent line the tangent line

Figure 3.5 The local relationship between a curve and its tangent line. The positive side of the tangent line is indicated by darker shading.

Ifapointx of Γ is not an inflection, then Γ locally lies either on the positive or on the negative side of the oriented tangent line (see Figure 3.5). In the former case, we can say that Γ is locally convex, and in the latter case, it is locally concave. Given the parametric equation of Γ in the neighborhood of x, how can we tell which of the above cases holds?

Proposition 3.15. Consider a point x on Γ that is not an inflection. Then Γ is locally on the positive side of the tangent line t = x ∨ x if and only if

|x, x, x| > 0 . (3.30)

Proof. Suppose that Γ is on the positive side of t in the neighborhood of x. Then for any point x(t + δt) infinitesimally close to x,wehave

t ∨ x(t + δt)=|x(t), x(t), x(t + δt)| > 0 .

84 Conversely, if Γ is on the negative side of t, then the determinant |x(t), x(t), x(t + δt)| is negative. Let us take the limit of this quantity as δt approaches 0:        2   lim |x(t), x (t), x(t + δt)| lim x(t), x (t), x(t + δt) − x(t) − x (t)δt  . (3.31) δt→0 δt→0 δt2

If we rewrite x(t + δt) with the help of a Taylor expansion, we obtain the following: 1 1 x(t + δt)=x(t)+x(t)δt + x(t)δt2 + x(t)+... δt3 2 6

2 − −   1  2 x(t + δt) x(t) x (t)δt = x + x (t)+... δt δt 6 2 lim x(t + δt) − x(t) − x(t)δt = x . δt→0 δt2

Plugging the above result back into (3.31), we obtain

lim |x, x, x(t + δt)| = |x, x, x| . δt→0

Let us look at a simple example of how reparametrization affects the orientation of a curve. Let x and x˜ be two functions with the same trace Γ, such that x(t)=x˜(−t). It is easy to determine the relationships between same-order derivatives of the two maps:

d d d x = x(t)= x˜(−t)=− x˜(−t)=−x˜ , dt dt d(−t) d d x = − x˜(−t)= x˜(−t)=x˜ . dt d(−t)

Since x˜ = −x, reparametrizing Γ from x to x˜ changes the orientation of the tangent to Γ at each point x(t)=x˜(−t). Consequently, the orientation of Γ becomes reversed. If we have |x, x, x| > 0 for some point x(t) under the old parametrization, then Γ is convex in the neighborhood of x. For the “reversed” curve in the neighborhood of x˜(−t), we have

|x˜, x˜, x˜| = |x, −x, x| < 0 .

85 Therefore, Γ becomes locally concave. We can see that changing the orientation of the curve also flips the notions of convexity and concavity. However, if we apply only orientation- preserving transformations to a curve or to the whole plane T2, then these notions remain stable.

Proposition 3.16. The sign of the determinant |x, x, x| is an oriented projective invari- ant. It is invariant to the following transformations:

1. Rescaling: multiplication of x(t) by a positive scalar function μ(t).

dt 2. Orientation-preserving change of parameters: reparametrizing x(t) as x t(s) , ds > 0.

3. Orientation-preserving projective transformation: multiplying x by a 3 × 3 matrix M with positive determinant.

Proof.

1. Rescaling:

|μx, (μx), (μx)| = |μx,μx + μx,μx +2μx + μx|

= |μx,μx,μx| =(μ)3|x, x, x|

|x, x, x| if μ>0 .

2. Reparametrization:      2   2 2 2   dx d x  dx dt d x dt dx d t  x, ,  = x, , +  ds ds2  dt ds dt2 ds dt ds2    3  2  3 dt  dx d x dt   = x, ,  = |x, x , x | ds dt dt2 ds dt |x, x, x| if > 0 . ds

3. Oriented projective transformation:

|Mx, (Mx), (Mx)| = |Mx,Mx,Mx|

= |M||x, x, x||x, x, x| if |M| > 0 .

86 x

G W

y

Figure 3.6 Locating the curve interior. At the concave point x, the tangent is locally inside Ω. At the convex point y, the tangent is locally outside Ω.

If Γ is a simple closed curve bounding a region Ω in T2, then it can be oriented so that for each point x on the curve, Ω locally lies on the positive side of the oriented tangent line t  x ∨ x. The following definition will make this convention more precise:

Definition 3.19 (Locating the Curve Interior). Let Γ be a simple closed curve in T2 bounding aregionΩ, x be a point on Γ,andt  x ∨ x be the oriented tangent line at x. We say that

Ω is located on the positive side of t if, for any line m such that t ∧ m  x, there exists a point y such that m  x ∨ y and the segment from x to y is contained in Ω4.

If Γ does not have an inflection at x, two cases are possible: the tangent line t may locally lie either outside or inside of Ω (see Figure 3.6).

4This definition is an attempt to generalize the idea of an inward-pointing normal from Euclidean differ- ential geometry.

87 Proposition 3.17. Let Γ be the smooth closed boundary of a region Ω, oriented so that Ω is everywhere on the positive side of the curve tangent. Then for any non-inflection point x, the tangent line t = x ∨ x is locally inside Ω if and only if |x, x, x| > 0.

Proof. By Proposition 3.15, |x, x, x| > 0 if and only if Γ is locally on the positive side of t. If |x, x, x| > 0, then both the boundary and the interior of Ω are on the positive side of t in the neighborhood of x. In this case, Γ, the boundary of the region Ω, separates points on t from points in the interior of Ω. On the other hand, if |x, x, x| < 0, then the boundary Γisonthenegative side of t, while the interior of Ω is still locally on the positive side of t.

Therefore, t must be locally inside Ω.

3.3.2 Orienting Surfaces

Locally, the orientation at a point on a surface is determined by the orientation of the tangent plane at that point. Informally, a surface is called orientable if there exists a globally smooth choice of orientations of all the tangent planes.

Definition 3.20 (Orientable Surface). A surface Σ is called orientable if it is possible to cover it with a system of coordinate neighborhoods in such a way that if a point belongs to two such neighborhoods — that is, if this point can be written as x(u, v)=x˜ u(s, t),v(s, t) — then the Jacobian determinant     ∂(u, v)  us ut  =   ∂(s, t) vs vt is positive.

To better understand the above definition, consider a coordinate change in a neigborhood of Σ: x˜(s, t)=x u(s, t),v(s, t) . Let us compute the transformed tangent plane x˜ ∨ x˜s ∨ x˜t:

x˜ ∨ x˜s ∨ x˜t = x ∨ (xuus + xvvs) ∨ (xuut + xvvt)

= x ∨ xuus ∨ xvvt + x ∨ xvvs + xuut

=(usvt − utvs)(x ∨ xu ∨ xv) .

88 The orientation of the tangent plane is preserved if and only if usvt − utvs is positive. This quantity is the determinant of the Jacobian matrix of the coordinate transformation.

At this stage, we will talk about orienting a special class of surfaces that arise as bound- aries of smooth three-dimensional manifolds with boundary embedded in T3,orsmooth solids for short.

Definition 3.21 (Manifold With Boundary). Let H3 be the upper half-space of R3 consisting of all points with non-negative last coordinate. H3 can be assigned the standard subspace topology: each open set of H3 has the form U ∩ H3,whereU is an open set of R3.A set Ω ⊂ T3 is a 3-dimensional smooth manifold with boundary if every point of Ω has a neighborhood diffeomorphic to an open set of H3.Theboundary of Ω,denoted∂Ω,istheset of points whose neighborhoods are diffeomorphic to neighborhoods of points on the boundary of H3.Theinterior of Ω,denotedint(Ω),isthesetΩ \ ∂Ω.

The next well-known result establishes the connection between three-dimensional mani- folds with boundary and smooth surfaces, entities already familiar to us.

Proposition 3.18. If Ω is a three-dimensional smooth manifold with boundary, then ∂Ω is a two-dimensional smooth manifold without boundary [19, p. 59].

If we know that a smooth surface Σ is actually the boundary ∂ΩofsomesolidΩas in Definition 3.21, we can orient the tangent plane Π at every point x of Σ such that Ω is located on the positive side of Π (more precisely, Ω is located in the closed half-space consisting of points y for which Π ∨ y ≥ 0).

Definition 3.22 (Locating the Surface Interior). Let Σ=∂Ω for some smooth solid Ω, x be a point on Σ,andΠ be the tangent plane at x. We say that Ω is located on the positive side of Π if, for any line m such that Π ∧ m  x, there exists a point y such that m  x ∨ y and the segment from x to y is contained in Ω.

89 It is possible to show [19, p. 97] that the orientation of Σ selected in the above manner is globally consistent — that is, the boundary of a smooth solid is always orientable.

Proposition 3.19. Consider the surface Σ=∂Ω at point x, with the tangent plane given by Π = x ∨ xu ∨ xv. Assume that Ω is on the positive side of Π, as in Definition 3.22. Let t = x ∨ αxu + βxv be a non-asymptotic line in the tangent plane. Then t is locally outside of Ω (that is, there exists a neighborhood of x on t that is not contained in the interior of

Ω) if and only if

lα2 +2mαβ+ nβ2 > 0 , where l, m,andn are defined as in (3.16).

Proof. Our strategy is to reduce the three-dimensional case at hand to the two-dimensional case already covered in Proposition 3.17. Pick any line m such that Π ∧ m  x.Byour orientation convention (Definition 3.22), there exists a point y lying in int(Ω) such that m  x ∨ y. By definition of meet, Π ∧ m  x implies Π ∨ y > 0. Let Π˜ = t ∨ y,and let Γ be the curve formed by intersecting Σ with Π˜ , oriented so that its tangent line x ∨ x matches t. Clearly, the osculating plane to Γ at x is either Π˜ or −Π˜ . Depending on the sign of Π ∨ x, the two possibilities are as follows: Π ∨ x > 0: x ∨ x ∨ x  Π˜ , (3.32) Π ∨ x < 0: x ∨ x ∨ x −Π˜ .

Let Ω˜ (resp. int(Ω),˜ ∂Ω)˜ denote the intersection of Ω (resp. int(Ω), ∂Ω) with Π˜ .Atthis stage, we can invoke Definition 3.19 (modified in the obvious way to apply to the curve Γ contained in the plane Π˜ ) to determine whether Ω˜ is located on the positive side of t in

Π˜ . By construction, t ∨ y = Π˜ ,sowehavet ∧Π˜ m  x. Therefore, the line m meets the conditions of Definition 3.19, and Ω˜ is indeed on the positive side of t in Π˜ .

Just as Definition 3.19, Proposition 3.17 can be rewritten to apply to the case of a planar curve in T3. In its modified form, the proposition states that t is outside Ωin˜ Π˜ if and only if x ∨ x ∨ x  Π˜ .

90 From the proof of Proposition 3.7, we obtain

  2 2 Π ∨ x |x, xu, xv, x | = lα +2mαβ+ nβ .

By combining the above result with (3.32), we can see that t is locally outside Ωin˜ Π˜ if and only if lα2 +2mαβ+ nβ2 > 0. Because this is true for any planar section, the original statement of the proposition holds.

Oriented Conjugate Mapping. We conclude this section by deriving a linear map from one properly oriented conjugate direction to the other. This map will be used in the proof of Proposition 4.4 that will serve as an important ingredient in the algorithm for computing rim meshes.

Let us consider the conjugate directions x1 = α1xu + β1xv and x2 = α2xu + β2xv.Note that the relationship lm α1 α2 β2 =0 mn β1 does not depend on the orientation of x1 and x2. That is, we could independently multiply the directions by arbitrary (non-zero) scale factors, and the resulting pair would still be conjugate. At this point, we will introduce the orientation of an ordered conjugate pair as follows.

Definition 3.23. The pair (x1, x2) of conjugate directions will be called positive if

x ∨ x1 ∨ x2  x ∨ xu ∨ xv . (3.33)

Proposition 3.20. The pair (x1, x2) is positive if and only if

α1β2 − α2β1 > 0 . (3.34)

Proof. Let us expand the left side of (3.33):

x ∨ x1 ∨ x2 = x ∨ (α1xu + β1xv) ∨ (α2xu + β2xv)

=(α1β2 − α2β1)(x ∨ xu ∨ xv) .

91 We can conclude that (x1, x2) is a positive conjugate pair if and only if α1β2 − α2β1 > 0.

Incidentally, the sign of α1β2 −α2β1 is precisely the relative orientation of oriented projective

α1 α2 u ∨ v points β1 and β2 in the one-dimensional space spanned by the line x x .

Proposition 3.21. Consider the mapping α2 0 −1 lm α1  . (3.35) β2 10 mn β1 S

Then x1 = α1xu + β1xv and x2 = α2xu + β2xv are conjugate directions. Moreover, (x1, x2)

2 2 is a positive pair if and only if lα1 +2mα1β1 + nβ1 > 0 — that is, the tangent line x ∨ x1 is locally outside the surface.

α2  α1 Proof. If β2 S β1 ,wehave lm α1 α2 β2 = lα1α2 + m(α1β2 + α2β1)+nβ1β2 mn β1

 lα1(−mα1 − nβ1)+mα1(lα1 + mβ1)+

m(−mα1 − nβ1)β1 + nβ1(lα1 + mβ1)=0.

Let us now expand the constraint (3.34):

2 2 α1β2 − α2β1  α1 (lα1 + mβ1) − (−mα1 − nβ1) β1 = lα1 +2mα1β1 + nβ1 .

2 2 Thus, (x1, x2) is positive if and only if lα1 +2mα1β1 + nβ1 > 0.

92 CHAPTER 4

Visual Hulls

This chapter forms the heart of the thesis. Here, we apply the mathematical techniques introduced in Chapters 2 and 3 to give a thorough treatment of projective visual hulls. Sec- tion 4.1 opens with background information on the properties of rims and apparent contours.

The most interesting result in this section is Proposition 4.3, which gives a projective proof of the following qualitative relationship, first noted by Koenderink [24]:

The projection of an elliptic (resp. hyperbolic, parabolic) point on the rim is a

locally convex (resp. concave, inflection) point on the apparent contour.

The subject of Section 4.2 is frontier points, which form a major topological feature on the surface of the visual hull. Section 4.3 describes the rim mesh, which is the decomposition induced on the surface by a set of rims. Finally, Sections 4.4 through 4.6 deal with the three main stages of computing the visual hull itself: tracing intersection curves, computing the 1-skeleton, and finding the faces.

4.1 Properties of Rims and Apparent Contours

In Chapter 2, we described how flats (points, lines, and planes) in oriented projective space behave under perspective projection. Now we consider what happens to a smooth surface Σ bounding a generic solid Ω under the action of a camera with projection matrix

P and oriented center O (in this section, we return to the convention of denoting points in

93 T3 by capital letters, and points in T2 by lowercase letters). We will begin by giving precise definitions for some of the concepts first mentioned in the Introduction, namely rims and apparent contours, and by establishing their geometric properties.

Definition 4.1 (Rim and Apparent Contour). The rim or contour generator of Σ with respect to the camera center O is the curve Γ consisting of all points X on Σ for which O lies in the tangent plane X ∨ Xu ∨ Xv:

|X, Xu, Xv, O| =0.

The image of Γ under the action of camera matrix P is a curve γ in T2 called the apparent contour or outline of Σ.

The following well-known and fundamental result relates the viewing direction from the camera center O to a rim point X and the tangent to the rim at X.

Proposition 4.1. Let X be a point on the rim Γ, O be the center of a perspective camera, and T be the unoriented tangent line to Γ at X. Then the viewing direction O ∨ X and the tangent line T are in conjugate directions.

Proof (sketch). The proof of the above proposition has to do with the main property of conjugate directions (Proposition 3.5). At each point X of the curve Γ, we can construct a line along the viewing direction O ∨ X. Obviously, these lines sweep out a developable surface, namely a cone with apex at O. Then Definition 3.13 suggests that the tangent to the rim at X and the visual ray O ∨ X lie in conjugate directions. Another way to prove this result is by using Definition 3.18. Suppose that the surface

Σ is approximated in the neighborhood of X by an osculating quadric. Then the conjugate direction to the viewing ray O ∨ X is given by the tangent to the rim of the quadric with respect to the center of projection O. Since all local properties involving derivatives up to second order are the same for the surface and the approximating quadric, the tangent to the rim of the quadric is also the tangent to the rim of Σ.

94 Because the outline γ is the projection of the rim Γ, a parametrization of Γ as X(t)=

X u(t),v(t) induces a parametrization of γ as x(t)  P X(t). The tangent line T = X ∨X to Γ at X is related to the tangent line t = x ∨ x to γ at x as follows:

t  (P X) ∨ (P X) =(P X) ∨ (P X)=P ∗T , (4.1) where P ∗ is the line projection matrix, introduced in Section 2.3. Overall, the osculating linear spaces of γ are related to the osculating linear spaces of Γ in the way we expect, via the generalized projective transformation P (see Proposition 2.6 for properties of generalized projective transormations).

Local Contour Features. The infinitesimal “behavior” of the apparent contour depends on the local shape of the surface at the corresponding rim point, as well as on the direction of the viewing ray in the surface tangent plane. The following proposition identifies situations that give rise to inflection points, cusps, and higher-order singularities of the contour.

Proposition 4.2 (Local Contour Features). Let Γ be the rim of Σ with respect to center of projection O, parametrized as X u(t),v(t) in the neighborhood of some point X.Alsolet

γ be the outline of Σ on the image plane, parametrized as x(t)  P X(t). Then the following are true:

1. If X is a hyperbolic point of Σ and O ∨ X is an asymptotic direction of Σ at X,then x is a cusp point of γ.

2. If X is a parabolic point of Σ and O ∨ X is not an asymptotic direction of Σ at X, then x is an inflection point of γ.

3. If X is a parabolic point of Σ and O ∨ X is an asymptotic direction of Σ at X,then X is not a regular point of Γ.

95 Proof.

1. The rim tangent X ∨ X is conjugate to the viewing direction O ∨ X. However O ∨ X

is an asymptotic direction, so it is self-conjugate. This means that the derivative point X belongs to O ∨ X,sox ∨ x  (P X) ∨ (P X)=0. Therefore, γ has a cusp at x.

2. The outline γ has an inflection point at x if |x, x, x| = 0. By Proposition 2.10, we

have |x, x, x| = |P X,PX,PX| = |O, X, X, X| .

   Because O∨X is not asymptotic, the derivative point X = Xuu +Xvv is independent of O and X. The three points O, X,andX span the (unoriented) tangent plane, so

we can write

    2    2 |O, X, X , X |∼|X, Xu, Xv, X | = l (u ) +2muv + n (v ) .

The last step is expanded in the proof of Proposition 3.7. By Proposition 3.14, since

Σ is parabolic at X and X ∨ X is conjugate to O ∨ X, it must be the asymptotic

direction. Therefore, l (u)2 +2muv + n (v)2 = 0, and this completes the proof.

3. By Proposition 3.14, an asymptotic direction at a parabolic point is conjugate to every

other direction in the tangent plane. Therefore, a rim tangent cannot be defined at X

as the unique conjugate to the viewing direction, so X is not a regular point of Γ.

Cases 1 and 2 above, cusps and inflections, are actually generic features of the outline that remain stable under small perturbations of the viewing direction. We will be interested in one more generic feature, called a crossing. A crossing occurs when two non-adjacent points of Γ fall along the same visual ray. This feature is bilocal, and it cannot be charac- terized in terms of infinitesimal properties of the surface or the rim. It is important to note

96 O O O Fold Cusp T-junction

Figure 4.1 Generic contour features. The dashed parts of the rim and the apparent contour are occluded from view. that in the neighborhood of a point that projects to a cusp, an inflection, or a crossing, the rim itself generally remains smooth. By constrast, case 3 of the previous proposition shows a situation when the rim itself becomes singular. Fortunately, this situation is not generic. In our work, we will assume that the rim is a smooth closed curve (or a collection of smooth closed curves) with no singularities.

Visibility. In the preceding discussion of rims and active contours, we have not taken into account self-occlusion. Namely, a rim point X will be hidden from view if the ray L = O∨X enters the object Ω prior to grazing it at X. The projection of X cannot realistically be observed on the image contour, unless the object is transparent. This consideration affects the appearance of the generic contour features described above: a cusp is observed as a termination of the contour, and a crossing is observed as a T-junction, with the more distant of the two branches becoming occluded from view (see Figure 4.1). In general, visibility is not a local phenomenon: the infinitesimal properties of Σ in the neighborhood of X cannot tell us if the viewing ray L has already passed through the object elsewhere. However, there is one necessary condition for visibility: L must be locally outside Ω. Note that this condition is never satisfied for concave points (recall Part 1 of

97 Proposition 3.13) — hence the well-known fact that concavities or “dents” never show up on the silhouette of an object.

Definition 4.2. Let X be a point on the rim Γ due to projection center O. We say that X is locally visible (or simply visible for short) if the viewing ray O ∨ X is locally outside Ω (refer to Proposition 3.19).

OLX OLX W W

X is locally visible X is locally invisible

Figure 4.2 Local visibility (Definition 4.2).

Orienting the Rim. So far, we have explained how the orientation of the rim tangent at every point induces an orientation on the corresponding tangent to the outline. However, we have not specified how to orient the rim itself. To begin, let us assume that Σ is oriented so that Ω is everywhere on the positive side of the tangent plane. The contour tangent t at apointx  P X is related to the rim tangent T at X as in (4.1). By Proposition 2.15 on back-projection of lines, the plane resulting from the back-projection of t can be written as

O ∨ T ,whereO is the camera center. Since T lies in the tangent plane Π = X ∨ Xu ∨ Xv, we know that O ∨ T ∼ Π. In addition, we want the orientation of t to be consistent with the orientation of Π.Thatis,ifY is a point such that Π ∨ Y > 0, then y  P Y must satisfy t ∨ y > 0. Therefore, the back-projection of the contour tangent t must match the orientation of the tangent plane Π. To summarize, we want to orient the rim tangent T so

98 that it satisfies

 O ∨ T = O ∨ X ∨ X  X ∨ Xu ∨ Xv . (4.2)

The orientation convention is illustrated in Figure 4.3.

X X' X

Xu

Xv P P O

Figure 4.3 Orienting the rim. Left: the intrinsic orientation of the surface tangent plane, X ∨ Xu ∨ Xv (the orientation is indicated with a counterclockwise arrow). Right: orienting the rim tangent such that O ∨ X ∨ X matches the intrinsic orientation of Π.

Henceforth, we will always assume that the rim is oriented according to the above con- vention, and that the orientation of the apparent contour is induced by the orientation of the rim. If we let X1 = αXu + βXv be a direction such that O ∨ X  X ∨ X1, then it becomes

 easy to see that (X1, X ) is a positive pair of conjugate directions (recall Definition 3.23).

 Suppose also that X = α2Xu + β2Xv. Then Proposition 3.20 tells us that α1β2 − α2β1 > 0.

This simple observation will be used in several proofs in the subsequent sections.

Local Shape of the Surface and Convexity of the Outline. The next result estab- lishes the connection between the local shape at a rim point and the “curvature” of the corresponding point on the outline: specifically, elliptic points on the rim project onto con- vex points on the contour, and hyperbolic points project onto convex ones (the notions of convexity/concavity of the contour were introduced in Section 3.3.1). This qualitative rela-

99 tionship was first demonstrated by Koenderink as a consequence of his formula for Gaussian curvature of the rim point in terms of the apparent curvature of the contour and the radial curvature of the rim along the line of sight [24]. Unlike Koenderink’s formula, our result re- lies solely on oriented and projective differential geometry, without making use of Euclidean concepts.

Proposition 4.3. Let Γ be the rim of Σ with respect to center of projection O, parametrized as X u(t),v(t) in the neighborhood of a visible point X, and oriented so that (4.2) is satisfied. Let γ be the outline of Σ on the image plane, parametrized as x(t)  P X(t). Provided x is regular, we have the following three cases:

1 . |x, x, x| > 0: Σis elliptic at X.

2 . |x, x, x| < 0: Σis hyperbolic at X.

3 . |x, x, x| =0: Σis parabolic at X.

Figure 4.4 illustrates the above proposition.

Proof. Case 3 above has already been proved in Proposition 4.2, so we only need to consider cases 1 and 2. Let α1,α2,β1,β2 be constants such that O ∨ X  X ∨ α1Xu + β1Xv and

 X = α2Xu + β2Xv.Thenwehave

|x, x, x| = |O, X, X, X|

 = |X,α1Xu + β1Xv,α2Xu + β2Xv, X |

 =(α1β2 − β1α2) |X, Xu, Xv, X |

2 2 =(α1β2 − α2β1)(lα2 +2mα2β2 + nβ2 ) . (4.3)

First, let us consider the term α1β2 − α2β1 of the above expression. Because the rim tangent

 T = X ∨X is oriented to satisfy (4.2), (α1Xu +β1Xv,α2Xu +β2Xv) is a positive conjugate pair, so by Proposition 3.20, α1β2 − α2β1 > 0.

100 X

x Y O y z

Z

Figure 4.4 Illustration of Proposition 4.3. The dashed curve is the rim, and the dotted curve is the parabolic curve (that is, the curve on the surface consisting entirely of parabolic points). X is an elliptic point on the rim that projects onto a convex point x on the outline, Y is a hyperbolic point that projects onto a concave point y,andZ is a parabolic point that projects onto an inflection point z.

2 2 Now let us take on the second term of (4.3), lα2 +2mα2β2 + nβ2 . By hypothesis, the viewing ray O ∨ X is locally outside Ω at X. By Proposition 3.19, this means that

2 2 lα1 +2mα1β1 + nβ1 > 0. If X is an elliptic point, by part 1 of Proposition 3.13, we know

2 2 that lα2 +2mα2β2 + nβ2 > 0, which establishes case 1 above. Next, suppose that X is hyperbolic. Because the viewing ray is conjugate to the rim tangent, part 2 of Proposition

2 2 3.13 tells us that lα2 +2mα2β2 + nβ2 < 0. The overall sign of (4.3) is negative, and part 2 of the above proposition is established.

Note. If we assume that the visual ray O ∨ X is locally inside the object, then the result of

Proposition 4.3 is reversed: X is hyperbolic if |x, x, x| is positive, and elliptic if |x, x, x| is negative.

101 4.2 Frontier Points

Suppose that we are observing the object Ω using n camera matrices P1,...Pn with oriented centers O1,...,On. The rims Γ1,...,Γn associated with these camera centers form an arrangement on the surface Σ. The global properties of this arrangement will be treated in the next section; the current section is devoted to the local geometry of a point where two rims cross.

Suppose that X is a point where the rims Γi and Γj meet. Then the visual rays Oi ∨ X and Oj ∨ X both lie in the tangent plane to Σ at X, and the tangent plane coincides with the (unoriented) epipolar plane defined by Oi, Oj,andX. Such an intersection is generic: if either Oi or Oj is perturbed slightly, we can imagine taking the line joining the new centers, and “pivoting” a plane around this line until it touches the surface in a new point somewhere near the old intersection X. By constrast, an intersection of three rims is not generic. For another rim Γk to pass through X, the camera center Ok would have to belong to the epipolar plane. However, such an arrangement would be destroyed by a small perturbation of either Oi, Oj,orOk: the plane through the three camera centers has no degrees of freedom left to ensure tangency to the surface. In short, the only “interesting” local feature of an arrangement of rims on the surface is a crossing between two rims.

   Let Xi and Xj denote the derivative points of the rims Γi and Γj at X.Also,Ti = X∨Xi

 and Tj = X ∨ Xj will denote the tangents to the respective rims. Because of our orientation convention (4.2), we can write the tangent plane Π to Σ at X as follows:

Π  Oi ∨ Ti  Oj ∨ Tj .

 Letusconsiderjusttheith view. Since Oj belongs to the plane Oi ∨ Ti = Oi ∨ X ∨ Xi,we have

 |Oi, X, Xi, Oj| =0.

102   In the ith image, let xi  PiX be the projection of X,andxi  PiXi is the derivative point of γi at xi. By Proposition 2.10, we can conclude that

  |PiX,PiXi,PiOj||xi, xi, eij| =0.

Recall that eij  PiOj is the epipole in the ith view (Section 2.4.1). The expression

  |xi, xi, eij| = 0 can be rewritten as ti ∨ eij =0,whereti = xi ∨ xi is the tangent to γi at xi. Geometrically, the contour point xi is distinguished by the property that its tangent line ti passes through the epipole eij. Alternatively (and equivalently), we could say that

 the derivative point xi lies on the epipolar line lij  eij ∨ xi.Inthejth view, an analogous relationship holds:

  |Oj, X, Xj, Oi| = |xj, xj, eji| =0.

Overall, xi and xj are points of epipolar tangency, where the epipolar lines lij and lji have order 1 contact with γi and γj (see some figure). Following existing vision terminology, we will refer to xi and xj as frontier points.

Definition 4.3 (Frontier Points). Apointxi lying on the apparent contour γi is called a

 frontier point with respect to the jth view if the derivative point xi lies on the epipolar line lij  eij ∨ xi:

 |xi, xi, eij| =0. (4.4)

Two frontier points xi and xj in the ith and jth view, respectively, are called corresponding

T frontier points if they satisfy the epipolar constraint xj Fijxi =0.

Figure 4.5 illustrates the above definition. Note that the definition has been stated entirely in terms of image features, making no reference to a unique 3D point X that is the intersection of rims Γi and Γj, such that xi  PiX and xj  PjX (in the future, we will refer to X as a frontier point as well). However, we can use a general position argument to show that such a point will always exist. Generically, a plane passing through the camera

103 P X l ij lji

xi xj

O eij e i ji Oj

Figure 4.5 Frontier points. The dashed curves on the surface are rims. centers Oi and Oj can only be tangent to Σ at one point, and this is precisely the point X that projects to xi in the ith view and xj in the jth view. In the respective images, the matching epipolar lines lij and lji are each tangent to the respective contour exactly once at xi and xj, so the epipolar match between these two points is unique. Overall, we do not need to worry about a situation where xi and xj are projections of two distinct 3D points

Xi and Xj,oneontherimΓi, and the other on Γj. Another important observation is that in a generic situation, frontier points always come

 in matching pairs. That is, the condition |xi, xi, eij| = 0 for some point xi on γi always implies the existence of a point xj on γj that is in epipolar correspondence with xi and

  satisfies |xj, xj, eji| = 0. To see why this is true, let us first assume that |xi, xi, eij| =0 for some contour point xi in the ith view that is the back-projection of a unique point X

T belonging to the rim Γi on Σ. The tangent plane to X at Σ is given by Π  Pi ti,where

 ti = xi ∨ xi. Since the epipole eij lies on the tangent ti, the camera center Oj lies on the tangent plane Π. By Definition 4.1, the point X must also lie on the rim Γj,andbyan

T argument earlier in this section, there must exist a point xj on γj such that xj Fijxi =0

 and |xj, xj, eji| =0.

104 At this stage, we will begin to take the point of view of reconstruction: assuming we do not know the surface Σ, but we do know the matrices Pi of all the cameras in the scene, and can observe the properly oriented contours γi in all the images, what can we learn about Σ? An interesting reconstruction task is finding the relative orientation of the unknown rim

  tangents Ti = X ∨Xi and Tj = X ∨Xj in the tangent plane Π. We will call this orientation

  positive if X ∨ Xi ∨ Xj  Π,andnegative otherwise. The following proposition shows how the orientation may be determined from image information alone.

Proposition 4.4. Let X be a point locally visible to the ith and jth cameras, and define

  κi = |xi, xi, xi |,λi = lij · ti

(in particular, λi is positive if lij  ti and negative if lij −ti). Then

  X ∨ Xi ∨ Xj  Π iff κi λi > 0 . (4.5)

2 Proof. From Proposition 4.3, we already know how to interpret the sign of κi.Sinceln− m is positive for elliptic points and negative for hyperbolic points,

2 sgn(κi)=sgn(ln − m ) . (4.6)

Now consider the sign of λi. The epipolar line lij is the oriented projection of the visual ray

 Oj ∨ X, and the tangent ti is the projection of the rim tangent X ∨ Xi. Using Proposition

 2.15, we can conclude that lij  ti if and only if Oi ∨ Oj ∨ X  Oi ∨ X ∨ Xi.But

 Oi ∨ Oj ∨ X = X ∨ Oi ∨ Oj,andOi ∨ X ∨ Xi  Π. Overall, we can express the role of λi as

X ∨ Oi ∨ Oj  λiΠ . (4.7)

    Let us select constants αi,αi,βi,βi and αj,αj,βj,βj to satisfy the following relationships:

   Oi ∨ X  X ∨ (αiXu + βiXv) , X ∨ Xi  X ∨ (αiXu + βiXv) ,

   Oj ∨ X  X ∨ (αjXu + βjXv) , X ∨ Xj  X ∨ (αjXu + βjXv) .

105   Next, let us expand the expressions X ∨ Oi ∨ Oj and X ∨ Xi ∨ Xj:

X ∨ Oi ∨ Oj =(αiXu + βiXv) ∨ X ∨ Oj

=(αiXu + βiXv) ∨ (αjXu + βjXv) ∨ X

= X ∨ (αiXu + βiXv) ∨ (αjXu + βjXv)

=(αiX ∨ Xu + βiX ∨ Xv) ∨ (αjXu + βjXv)

=(αiβj − αjβi)(X ∨ Xu ∨ Xv) . (4.8)

     X ∨ Xi ∨ Xj = X ∨ (αiXu + βiXv) ∨ Xj

   = −(αiXu + βiXv) ∨ X ∨ Xj

    = −(αiXu + βiXv) ∨ X ∨ (αjXu + βjXv)

    = X ∨ (αiXu + βiXv) ∨ (αjXu + βjXv)

    =(αiβj − αjβi)(X ∨ Xu ∨ Xv) . (4.9)

We would like to use the conjugate mapping S (3.35) to express the derivative points as functions of the viewing directions:   αi αi αj αj  = S and  = S . βi βi βj βj But first, we must make sure that S preserves the proper orientation of the respective conju-

  gate pairs. By the orientation convention (4.2), we know that (αiXu + βiXv,αiXu + βiXv)

  and (αjXu + βjXv,αjXu + βjXv) must both be positive conjugate pairs. By Proposition

3.21, these pairs are positive if the tangents X ∨ (αiXu + βiXv)andX ∨ (αjXu + βjXv)are both locally outside the surface. But this is true since by hypothesis, X is a point visible

    by both cameras. Finally, we can use S to compute the determinant αiβj − αjβi:                αi αj    αi αj  | |  αi αj  − 2 −     S S  = S   =(ln m )(αiβj αjβi) . (4.10) βi βj βi βj βi βj

By comparing (4.8) and (4.7), it is easy to see that

sgn(λi)=sgn(αiβj − αjβi) . (4.11)

106 Putting the pieces together, we get

2 sgn(κi λi) = sgn[(ln − m )(αiβj − αjβi)] by (4.6) and (4.11)

    =sgn(αiβj − αjβi) . by (4.10)

2       When (ln − m )(αiβj − αjβi)  αiβj − αjβi > 0, (4.9) becomes X ∨ Xi ∨ Xj  Π.

2   Otherwise, when (ln − m )(αiβj − αjβi) < 0, X ∨ Xi ∨ Xj −Π.

 Let us briefly consider the situations in which κi λi vanishes. First of all, whenever xi is dependent on xi, the tangent line ti does not exist, and both κi and λi become zero. This can occur only if xi is a cusp point of γi.ButthenOi ∨X is actually an asymptotic tangent, so it is not locally outside the surface and X cannot be considered visible to the ith camera.

  Another situation is when xi is an inflection point of γi.Then|xi, xi, xi | =0,soκi vanishes.

Since there is only a finite number of inflectional tangents to γi,theepipoleeij generically will not fall on one of these tangents, and xi will not be a frontier point. Overall, if we exclude the finite number of inflection points and cusps of γi from consideration, we do not have to worry about degenerate cases where κi λi =0. Let us see what happens when we exchange the roles of the ith and jth views in the statement of Proposition 4.4. First of all, since the local shape of X does not depend on the viewpoint, and since both visual rays Oi ∨ X and Oj ∨ X are locally outside Σ, we must have sgn(κj)=sgn(κi), where κj is defined analogously to κi. However, because

    X ∨ Xi ∨ Xj −X ∨ Xj ∨ Xi, we know that κi λi and κj λj must have opposite signs.

Therefore, sgn(λj)=−sgn(λi). In other words, if lij  ti in the ith view, then lji −tj in the jth view. Since the rim tangents ti and tj both back-project to the tangent plane Π,we also know that Piti  Pjtj. ThisimpliesthatPilij −Pjlji. Looking back at the epipolar consistency criterion (2.21) in Proposition 2.16, we realize that this criterion is satisfied for xi and xj, just as expected. As will be discussed in Section 5.1.2, given real data, it is impossible to find a pair of frontier points that would lie on exactly corresponding epipolar

107 lines. For this reason, our implementation can check the consistency constraints

sgn(κj)=sgn(κi) and sgn(λj)=−sgn(λi)

to determine whether xi and xj really may be a pair of corresponding frontier points.

Finally, we should say a word about one interesting circumstance in which the quantity

κi λi arises naturally. The task of searching for all frontier points in the ith view can be thought of as finding the zeros of the differentiable function

 ϕi(u)=|xi(u), xi(u), eij| . (4.12)

The first derivative of this function with respect to u is       |xi, xi, xi | when eij ∨ xi  xi ∨ xi ϕi = |xi, xi , eij| = |eij, xi, xi | =    −|xi, xi, xi | when eij ∨ xi −xi ∨ xi .

Clearly,

 sgn(ϕi)=sgn(κi λi) . (4.13)

 The sign of ϕi(u)atapointwhereϕi(u) = 0 tells us whether the values of ϕi are changing from negative to positive or vice versa. The geometric interpretation of the sign of

ϕi(u)=ti(u) ∨ eij

 is the relative orientation of the tangent line ti and the epipole eij.Whenϕi > 0, the tangent line is turning in such a way that the epipole begins to appear on its positive side;

 the situation is reversed for ϕi < 0. Note that if u0 and u1 are two successive zeros of ϕi,the

  signs of ϕi(u0)andϕi(u1) must be different. (We know that the values of ϕi are either positive or negative on the interval (u0,u1). In the first case, ϕi is going from negative to positive

  at u0 and from positive to negative at u1, which means that ϕi(u0) > 0andϕi(u1) < 0. By

  similar reasoning, if ϕi is negative on (u0,u1), then ϕi(u0) < 0andϕi(u1) > 0.) Thus, the sign of κi λi must alternate for successive frontier points.

108 Because ti and eij are properly oriented projections of the tangent plane Π to Σ at X

 and the camera center Oj, the sign of ϕi can be given an equally easy 3D interpretation:

  when ϕi > 0 (resp. ϕi < 0), the tangent plane Π is turning along the rim Γi in such a way as to leave Oi on its positive (resp. negative) side.

Literature on Frontier Points. Before concluding this section, let us say a word about frontier points in computer vision literature. The importance of frontier points has been recognized by vision researchers for a long time. Rieger [51] uses frontier points to identify the rotation axis of an object under orthographic projection. Porrill and Pollard [49] describe how epipolar tangencies may be used for stereo calibration from space curves or view-dependent profiles. Cipolla et al. [8] define a continuous notion of the frontier of a curved surface as the envelope of the rims under some smooth camera motion, and use frontier points to iteratively recover unknown motion parameters. Epipolar tangencies are also the basis of recent efficient and practical solutions for the special case of rotational motion [41, 62]. Finally, Cross and Zisserman [10] note the fact that frontier points belong both to the visual hull and to the surface itself; consequently, the visual hull gives a good approximation of the surface in regions densely populated by frontier points. They also introduce the term epipolar net to describe the set of all frontier points on the surface.

4.3 The Rim Mesh

In the previous section, we have derived an image-based criterion (4.5) to determine the relative orientation of two rim tangents around a frontier point. In this section, we use this result to show how to compute the rim mesh in some restricted cases. But first, the rim mesh needs to be defined more precisely.

Definition 4.4 (Rim Mesh). Let O1,...,On be oriented centers of projection, and Γ1,...,Γn be the rims associated with these centers. The subdivision of the surface Σ induced by the

109 curves Γ1,...,Γn is called the rim mesh. The rim mesh is a partition of Σ into three col- lections of vertices, edges,andfaces. The vertices are points of intersection between two rims (frontier points), the edges are segments of the rims between successive vertices, and the faces are maximal connected regions of Σ bounded by closed paths consisting of vertices and edges.

In the language of computational geometry, the rim mesh is simply the arrangement of the rims on the surface [1]. Let us begin by examining the decomposition induced on Σ by just a single rim Γi, defined by the smooth implicit function

|X, Xu, Xv, Oi| =0.

For each point X belonging to Σ, we will denote the sign of |X, Xu, Xv, Oi| as the “signa- ture” of the point:

σi =sgn|X, Xu, Xv, Oi| . (4.14)

Clearly, the rim separates the points of Σ for which σi > 0 from the points for which σi < 0.

Let us state the geometric interpretation of the sign of σi. For a surface point X where σi > 0, the camera center Oi is located on the positive side of the tangent plane to Σ at X. By our convention, the interior of Σ is also on the positive side of the tangent plane, so we can say that Σ is facing away from the camera at X. On the other hand, Σ is facing towards the camera at points for which σi < 0. This sign convention may seem counter-intuitive, but it is consistent with the orientation conventions we have chosen to follow.
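In coordinates, (4.14) is just the sign of a single 4 × 4 determinant. A minimal numpy sketch, assuming points, derivatives, and the oriented camera center are all stored as homogeneous 4-vectors (the function name and layout are our own):

    import numpy as np

    def signature(X, Xu, Xv, Oi):
        # sigma_i = sgn |X, Xu, Xv, Oi| from (4.14): X is a surface point,
        # Xu and Xv its parametric derivatives, Oi the oriented camera center.
        return np.sign(np.linalg.det(np.column_stack((X, Xu, Xv, Oi))))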

In the one-rim case, the faces of the rim mesh are the maximal connected regions on the surface consisting of points for which σi ≥ 0 or σi ≤ 0. Note that the faces can have non-trivial topology (i.e., they do not have to be simply connected), even if Σ itself has genus 0. The edges of the rim mesh correspond to connected components of Γi. As discussed in Section 4.1, in the generic case, the rim does not contain any isolated points, crossings, or other singularities. Therefore, each edge is a closed loop. Since we are considering the case of a single rim, there can be no frontier points, or vertices.

Next, consider the full decomposition induced on Σ by the rims Γ1,...,Γn. In this case, each face of the mesh is a maximal connected region consisting of points for which the signature vector (σ1,...,σn) remains constant. Each edge of the rim mesh is a maximal region for which exactly one σi is zero. It is clear that the edge for which σi = 0 is incident to exactly two faces, for one of which σi > 0, and for the other σi < 0. Finally, each vertex is an isolated point for which σi = 0 and σj = 0 for some i ≠ j. As we have argued earlier, in the generic case, three or more signatures cannot be zero simultaneously. Since each vertex is the intersection of exactly two rims, it is incident to exactly four edges of the rim mesh. Note that, even though the signature vector has the same value for every point in the interior of a face of the rim mesh, this value in general does not uniquely identify a face; this is also demonstrated by the one-rim example of the sphere with the bump. However, if Σ is the boundary of a convex object, then each face can indeed be uniquely identified by stating whether it faces away from or towards each camera in the input set.

4.3.1 Oriented Structure of the Rim Mesh

Each edge of the rim mesh naturally inherits its orientation from the rim to which it belongs, and each face inherits its orientation from Σ. In addition, each face F induces a direction on each edge E belonging to its boundary. Intuitively, this is the orientation that E has to be given, regardless of the orientation of the underlying rim segment, to make F lie on the positive side of E. The direction of an edge can be determined as follows. Let X be a point belonging to the rim Γi and lying on the boundary of a particular face, and let Π be the tangent plane to Σ at X. Consider any curve Γ parametrized by X(t) = X(u(t), v(t)) such that X = X(t0), and X(t0 + δt) lies in the interior of the face if δt is positive. Let X'i and X' denote the respective derivative points of Γi and Γ at X, and Ti = X ∨ X'i the tangent line to Γi at X. If Ti ∨ X' ≃ Π, then the direction at X is given by Ti, and if Ti ∨ X' ≃ −Π, then the direction at X is given by −Ti. It is not difficult to see that this procedure gives the whole edge a direction that is either the same as or opposite to its intrinsic orientation. Note that the two faces incident on each edge induce opposite directions on that edge. The face for which the direction of the edge corresponds with its intrinsic orientation is said to lie on the positive side of the edge, and the other face is said to lie on the negative side of the edge.

One more piece of oriented information associated with the rim mesh is the circular order of the four edges incident on each vertex. Intuitively, this is the order in which we would encounter the edges if we were to walk around a vertex in the positive sense along a small closed loop. Suppose that X belongs to two rims Γi and Γj, and let X'i and X'j denote the derivative points of Γi and Γj, respectively. In the limit, the four points X'i, X'j, −X'i, and −X'j represent the directions in the tangent plane pointing along each of the four edges. We want to order these four directions into a circular list such that if direction Y immediately precedes direction Z, then X ∨ Y ∨ Z ≃ Π, where Π is the tangent plane to Σ at X. It is easy to see that there are only two possible orderings:

    (X'i, X'j, −X'i, −X'j)   if X ∨ X'i ∨ X'j ≃ Π ,
    (X'j, X'i, −X'j, −X'i)   if X ∨ X'i ∨ X'j ≃ −Π .   (4.15)
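In the same spirit, the test in (4.15) reduces to comparing the oriented plane X ∨ X'i ∨ X'j with the tangent plane Π. A sketch, assuming points and plane coefficient vectors are stored as 4-vectors; join3 and edge_order are our own helper names, not names from the thesis implementation:

    import numpy as np

    def join3(A, B, C):
        # Coefficient vector of the oriented plane A v B v C, obtained by
        # cofactor expansion of det[A; B; C; Y] with respect to the row Y.
        M = np.vstack((A, B, C))
        return np.array([(-1) ** (3 + k) * np.linalg.det(np.delete(M, k, axis=1))
                         for k in range(4)])

    def edge_order(X, Xi_p, Xj_p, Pi):
        # Circular list (4.15) of the four edge directions at a rim-mesh vertex.
        # Xi_p, Xj_p are the derivative points X'i, X'j; Pi is the coefficient
        # vector of the positively oriented tangent plane at X.
        if np.dot(join3(X, Xi_p, Xj_p), Pi) > 0:    # X v X'i v X'j ~ Pi
            return [Xi_p, Xj_p, -Xi_p, -Xj_p]
        else:                                        # X v X'i v X'j ~ -Pi
            return [Xj_p, Xi_p, -Xj_p, -Xi_p]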

4.3.2 Reconstructing the Rim Mesh

We would like to compute the topological structure (that is, all the incidence relationships of the vertices, edges, and faces) of the rim mesh given only the apparent contours γ1,...,γn and the camera matrices P1,...,Pn.¹ Unfortunately, while the rim mesh is well defined for any smooth surface Σ and any generic combination of viewpoints, the task of computing this mesh given only image information is limited by several practical considerations. The most important issue is self-occlusion. As discussed in Section 4.1, not every part of the rim projects to the 2D contour. In practice, only the parts of the contour belonging to the silhouette of the object can be detected reliably. For reasonably complex objects, it is also impossible to detect every corresponding pair of frontier points. In some cases, the frontier points will be occluded in both images; in others, one point may be visible while its mate in the second image is occluded.

¹ Note that even though the structure of the rim mesh depends only on the camera centers O1,...,On, we actually need to know the complete camera matrices in order to compute the epipolar geometry between pairs of views.

Because it is generally impossible to recover the complete structure of the rim mesh based on image information alone, we are forced to make several restrictive assumptions on the input.

(A1) The visual ray that touches a point on any rim has contact with Σ only at that point.

(A2) The edges and faces of the rim mesh are simply connected. That is, each edge of the rim mesh is homeomorphic to an open interval on the real line, and each face is homeomorphic to an open disk.

(A3) The surface Σ is connected.

Condition (A1) above makes sure that no information is lost due to self-occlusion in the process of projecting the rim onto the apparent contour, while (A2) restricts the rim mesh to the computationally convenient form of a simple subdivision [18]. In particular, it can be shown [18, Theorem 1.1] that the 1-skeleton or graph of a simple subdivision, that is, the collection of its vertices and edges, is connected if and only if the surface is connected. Since by (A3) Σ is connected, we can design an algorithm to compute the rim mesh based on the assumption that the 1-skeleton of the mesh is connected. Note that (A2) cannot hold in the case of a single rim, since all the edges in this case are loops. Therefore, we must also assume that at least two cameras are present in the scene.

An interesting special case occurs when the genus of Σ is 0. Then the 1-skeleton is connected if and only if the rim mesh is a simple subdivision. To prove the "only if" direction, suppose that the 1-skeleton is connected, but some face of the rim mesh is not simply connected. Because Σ is an orientable surface of genus 0, this face can only have the topology of a disk with holes. Then its boundary must consist of several disjoint loops; since such a face separates its boundary loops from one another on the surface, the 1-skeleton cannot be connected, a contradiction. Overall, if we know that Σ is a surface of genus 0, we have shown that (A2) holds if and only if the 1-skeleton is connected, a condition that is much simpler to verify in practice than (A2) itself.

In combination, the three assumptions above enable us to describe a simple algorithm to compute the rim mesh. First of all, we must describe the formal representation of the rim mesh that is manipulated by the algorithm. This representation consists of three lists, of vertices, edges, and faces, where the information stored for each object is as follows:

Vertex: pointers to the four incident edges, listed in the same order as the directions along each edge, determined according to (4.15).

Edge: beginning and ending vertices (U, V), such that the edge is oriented from U to V. Note that edges belonging to two different rims can have the same endpoints, so we need to annotate the edge with the index of the rim to which it belongs. Also, each edge stores pointers to the faces on its positive and negative side (positive and negative faces for short).

Face: circular list of edges on the boundary, with a direction tag for each edge. An edge E on the boundary of a face F has direction tag +1 if F is on the positive side of E, and −1 if F is on the negative side of E. We will denote the respective cases as (E, +1) and (E, −1).

The above representation is actually a variant of the winged-edge data structure [3, 23], one of the oldest formal schemes for describing the boundary of a manifold solid. Note that this data structure maintains only topological information about the connectivity of the rim mesh, as opposed to geometric information about the shape of Σ. In fact, if we are reconstructing the rim mesh based on apparent contours alone, very little geometric information is actually available to us. If a point belongs to two rims simultaneously, we can observe its projections as matching frontier points in two images, and then reconstruct the 3D coordinates of that point in some projective system. Even though an exact geometric reconstruction is impossible for any other point on Σ, we can compute an exact topological representation of the surface in the form of the rim mesh.
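To make the winged-edge variant concrete, one possible Python encoding of the three record types is sketched below (class and field names are our own, not taken from the thesis implementation; eq=False keeps comparisons identity-based, which is what pointer semantics require):

    from dataclasses import dataclass, field

    @dataclass(eq=False)
    class Vertex:
        point: tuple                # parameters of the matching frontier points
        edges: list = field(default_factory=list)   # four incident edges,
                                                    # in the circular order (4.15)

    @dataclass(eq=False)
    class Edge:
        U: "Vertex"                 # beginning vertex: the edge is oriented from U to V
        V: "Vertex"                 # ending vertex
        rim: int                    # index i of the rim Gamma_i carrying the edge
        pos_face: "Face" = None     # face on the positive side
        neg_face: "Face" = None     # face on the negative side

    @dataclass(eq=False)
    class Face:
        boundary: list = field(default_factory=list)  # circular list of (Edge, +1/-1)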

The algorithm to compute the rim mesh "fills in" the vertex, edge, and face lists in several steps. First, it finds the 1-skeleton (vertices and edges) of the mesh, then it determines the ordering of the edges incident on each vertex, and finally, it uses the ordering information to trace the boundaries of all the faces. These steps are listed in more detail below. At this stage, we have kept the description of the algorithm deliberately idealistic. Most significantly, we ignore (for the time being) such real-life issues as noise and error in contour extraction and camera calibration. The implementation details will be given in Chapter 5.

Algorithm 4.1 (Find Rim Mesh).

1. Find Vertices. For each two views i, j, find all pairs of matching frontier points, that is, points that satisfy Definition 4.3. Each matching pair in the images corresponds to a single vertex V belonging to the intersection of rims Γi and Γj.

2. Find Edges. For each contour γi, construct a circular list of frontier points where the ordering is induced by the orientation of the apparent contour (equivalently, by the increasing order of the parameter). For each interval between two successive frontier points, corresponding to vertices U and V, add (U, V) as an edge of the rim mesh.

3. Orient the Vertices. For each vertex of the rim mesh, create the circular list of edges incident on it by using (4.5) and (4.15).

4. Find Faces. For each edge E = (U, V), find the boundary of its positive and negative faces. If the positive face pointer of E is not already initialized, it can be initialized to a new face F, and the boundary of F can be found using the following loop (a Python version of this loop is sketched after the algorithm):

    Set U0 = U.
    Append (E, +1) to the boundary list of F. Set the positive face pointer of E to F.
    While U0 ≠ V
        Let E' be the predecessor of E in the circular edge list of V.
        If E' = (V, W)
            Append (E', +1) to the boundary list of F.
            Set the positive face pointer of E' to F.
        Else If E' = (W, V)
            Append (E', −1) to the boundary list of F.
            Set the negative face pointer of E' to F.
        End If
        Set V = W, E = E'.
    End While

If the negative face pointer of E = (U, V) is not initialized, create a new face F and find the boundary of F using the same loop as above, but with different initial conditions:

    Set U0 = V.
    Append (E, −1) to the boundary list of F. Set the negative face pointer of E to F.
    While U0 ≠ U
        ...
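Using the record types sketched earlier, the positive-face case of this loop can be written in a few lines of Python (a sketch under assumption (A2), not the thesis implementation; the negative-face trace differs only in its initial conditions):

    def trace_positive_face(E):
        # Step 4 of Algorithm 4.1: trace the boundary of the face on the
        # positive side of the edge E = (U, V).
        F = Face()
        U0, V = E.U, E.V
        F.boundary.append((E, +1))
        E.pos_face = F
        while V is not U0:
            # E' is the predecessor of E in the circular edge list of V
            # (this choice is justified in the proof of Proposition 4.5).
            k = V.edges.index(E)
            Ep = V.edges[(k - 1) % len(V.edges)]
            if Ep.U is V:                    # E' = (V, W): along its orientation
                W = Ep.V
                F.boundary.append((Ep, +1))
                Ep.pos_face = F
            else:                            # E' = (W, V): against its orientation
                W = Ep.U
                F.boundary.append((Ep, -1))
                Ep.neg_face = F
            E, V = Ep, W
        return F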

Proposition 4.5. If the assumptions (A1) through (A3) on p. 113 hold, then Algorithm 4.1 correctly finds the rim mesh of Σ.

Proof. First, we establish the correctness of Steps 1, 2, and 3 of the algorithm. As argued in Section 4.2, there exists a one-to-one correspondence between a pair of matching frontier points in views i and j and a unique 3D point lying on the intersection of the rims Γi and Γj. Therefore, Step 1, which identifies all pairs of matching frontier points, correctly finds each vertex of the rim mesh. The projection map Pi establishes a bijective correspondence between the points on the rim Γi and the points on the apparent contour γi (as a consequence of (A1), Pi is actually a diffeomorphism). In particular, each edge of the rim mesh that lies on Γi maps to an interval on γi between a pair of successive frontier points. Step 2, which accounts for all intervals on each γi between all successive pairs of frontier points, thus accounts for all the edges of the rim mesh. Step 3 makes straightforward use of the result of Proposition 4.4, so its correctness follows from the correctness of that proposition.

Now let us examine Step 4 of the algorithm. First of all, we can verify that the circular list of edges output by the algorithm as the boundary of each face represents a proper edge cycle. From the code, it is clear that each pair of successive edges must share a vertex V; moreover, the entrance condition of the while-loop makes sure that no two non-adjacent edges have a vertex in common. Next, we need to argue that the while-loop in Step 4 of the algorithm correctly assigns direction tags to all the edges: that is, E is marked with +1 in the boundary list of F if and only if F is the positive face of E. To begin, consider the initialization of the loop. If a face F is initially identified as the positive (resp. negative) face of an edge E, then E is added to the boundary list of F with direction tag +1 (resp. −1). Each subsequent iteration of the while-loop adds a new edge E' to the boundary of F (in the next paragraph, we will argue that E' is indeed an edge on the boundary of F, but for now, we will simply assume that this is true). E' shares a vertex V with its predecessor. Because the boundary of F must have a consistent circular orientation, the direction of E' must point from V to its other vertex. If V is the first vertex of E', then the direction of E' coincides with its intrinsic orientation. Thus, F is the positive face of E', so it receives the direction tag +1. Otherwise, if V is the second vertex of E', then F is the negative face of E'; accordingly, E' receives the direction tag −1.

Finally, we have to verify that Step 4 correctly chooses the successor of each edge on the boundary of F. Assume that the vertex V corresponds to a point X belonging to rims Γi and Γj, and that the edge E lies on Γi. Clearly, the edge E' following E on the boundary of F must be one of the two edges lying on Γj and having V as one endpoint. Because E' belongs to the closure of F, it must lie on the same side of the rim Γi as the rest of F. In terms of signatures (4.14), E' differs from F only in that σj = 0 for points on E', while σj = +1 or σj = −1 for points inside F. Since σi is the same for E' and F, E' lies on the same side of Γi as F.

The directions of E and E' can be extended to their common endpoint V as follows. Let X'i and X'j be the derivative points of Γi and Γj at X. If the direction tag of E is +1 (resp. −1), then X inherits the direction X'i (resp. −X'i) from E. Similarly, depending on the direction tag of E', X inherits either X'j or −X'j from E'. Let Yi and Yj denote the two respective directions. As concluded earlier, F and E' are located on the same side of Γi. If Γi is traversed according to the direction of E, then we can say that E' lies on the positive side of Γi. In the local neighborhood of X, this statement translates to

    X ∨ Yi ∨ Yj ≃ Π ,   (4.16)

where Π is the tangent plane to Σ at X. Since V is the first endpoint of E' in the circular traversal of the boundary of F, Yj points along E' and is associated with E' in the circular list of edges incident on V. On the other hand, since V is the last endpoint of E traversed, −Yi, and not Yi, is the direction corresponding to E in the list (see Figure 4.6). Then (4.16) implies that X ∨ Yi ∨ −Yj ≃ −Π, so E' must precede E in the edge list of V. We have shown that if E' is the successor of E on the boundary of F, then E' must be the predecessor of E in the list of edges incident to V. Thus, the selection of E' in the first line of the while-loop in Step 4 is correct.

Figure 4.6 Illustration for the proof of Proposition 4.5. The circular arrow around X shows the positive orientation of the tangent plane to Σ.

4.3.3 Combinatorial Complexity of the Rim Mesh

In computational geometry, the complexity of an arrangement is the total number of its vertices, edges, and faces [1]. Under the three assumptions listed on p. 113, the complexity of the rim mesh can be easily computed using the well-known Euler-Poincaré formula [23, p. 43]:

    v − e + f = 2 − 2g ,   (4.17)

where v, e, and f are the numbers of vertices, edges, and faces of the rim mesh, and g is the genus of Σ. Since each vertex has degree 4, and the sum of the degrees² of all the vertices is equal to twice the number of edges, we have 4v = 2e, or

    e = 2v .   (4.18)

Plugging the above expression into (4.17) and solving for the number of faces, we get f = 2 − 2g + v. Thus, the total complexity of the rim mesh is

    v + e + f = 4v + 2 − 2g ,

which is of course only linear in the number of vertices. However, it is possible for the number of vertices to be at least quadratic in n (the total number of rims), since each pair of rims can intersect in two or more frontier points. Unfortunately, it is difficult to put an exact upper bound on the total number of frontier points because it depends on the complexity of the shape of Σ. The more "crumpled" the surface, the more frontier points we can expect. In the simplest case when Σ bounds a convex object, each pair of rims intersects at most twice, and the algorithm for computing the rim mesh takes O(n²) time. In practice, the running time will also depend on the discrete representation of the apparent contours and on the desired precision of the computations. The nature of this dependence will be examined more closely in Chapter 5.

² By the degree of a vertex we mean the number of edges incident on that vertex, regardless of edge orientation.
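As a quick sanity check of (4.17) and (4.18), the counts for the convex, genus-0 best case (each of the n(n − 1)/2 rim pairs contributing exactly two frontier points) can be tabulated directly; the function below is a sketch of that arithmetic, not part of the thesis implementation:

    def rim_mesh_complexity(n, g=0):
        # Convex best case: every pair of rims meets in exactly 2 frontier points.
        v = n * (n - 1)          # vertices: 2 * C(n, 2)
        e = 2 * v                # equation (4.18)
        f = 2 - 2 * g + v        # solved from the Euler-Poincare formula (4.17)
        return v, e, f

    # For n = 3 views of a convex genus-0 object:
    # rim_mesh_complexity(3) == (6, 12, 8), and indeed 6 - 12 + 8 = 2.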

4.4 Intersection Curves

In this section, we begin to attack the problem of computing the visual hull of an object based on a finite number of views. As in Section 4.3, we model the object as a solid Ω bounded by a smooth closed surface Σ. However, our reconstruction scenario no longer needs the restrictive assumptions introduced on p. 113. Specifically, we do not require Σ to be connected, and we allow apparent contours to contain cusps and T-junctions. From now on, we will rely on only one realistic restriction: the object Ω must be fully visible to each camera in the scene; that is, any point belonging to the object must be located in front of the focal plane of each camera. As explained in the Introduction, the visual hull of an object with respect to a finite set of viewpoints is defined as the intersection of the solid visual cones associated with each viewpoint. First, let us define what a solid visual cone is.

Definition 4.5 (Solid Visual Cone). The solid visual cone associated with the solid Ω and the camera with matrix

    Pi = ( Pi^T )
         ( Qi^T )
         ( Ri^T )

and oriented center Oi = Pi ∨ Qi ∨ Ri is the solid consisting of all points X such that

(a) the visual ray Oi ∨ X intersects Ω;

(b) Ri ∨ X > 0 (that is, X is in front of the camera's focal plane).

The boundary of the visual cone is swept out by silhouette rays (more precisely, silhouette half-lines) that do not penetrate the interior of Ω. Each silhouette ray makes contact with Σ at a point (or possibly two points) on the rim Γi, though not every point of Γi has a silhouette ray passing through it. Overall, the set of all silhouette rays forms a one-parameter piecewise-smooth family, or a cone (developable surface) Ki with apex Oi. Each smooth patch of the cone consists of silhouette rays that touch an uninterrupted segment of the rim Γi; the cone is not smooth at a finite number of T-junctions, or rays that touch the surface at two isolated points.

Each silhouette ray is the back-projection of some point on the apparent contour γi. The subset of γi that gives rise to the surface of the visual cone will be referred to as the outline. From now on, we will assume that the outline is given a continuous parametrization as xi(u). Since the visual cone is located in front of the image plane, it follows that the third coordinate of xi(u) is always positive. In the following, we will have no need to refer to the apparent contour itself, so we will simply begin denoting the outline in the ith view as γi. Also, since the outline is not smooth only at a finite number of points, corresponding to T-junctions of the apparent contour, we will simply "pretend" that it is a smooth function. Perhaps the most important justification for this assumption comes from practice: our outline extraction algorithms have no mechanism for automatically identifying T-junctions or discontinuities.

4.4.1 Geometric Properties of Intersection Curves

To obtain the boundary description of a visual hull, we need to intersect the surfaces of the solid cones associated with each input viewpoint. To begin, we will study the intersection of just two cone surfaces Ki and Kj formed by back-projecting the outlines γi and γj. Ki and Kj meet along an intersection curve Iij consisting of points X for which there exists a choice of parameters u and v such that the rays Li(u) and Lj(v), formed by back-projecting xi(u) and xj(v), intersect. Li(u) and Lj(v) are obtained as

    Li(u) ≃ ( Qi ∧ Ri   Ri ∧ Pi   Pi ∧ Qi ) xi(u) ,
    Lj(v) ≃ ( Qj ∧ Rj   Rj ∧ Pj   Pj ∧ Qj ) xj(v) .

The task of finding Iij essentially reduces to the task of solving the bivariate implicit equation

    Li(u) ∨ Lj(v) = 0 .   (4.19)

A similar strategy for finding the intersection curve of two ruled surfaces has recently been published by Heo et al. [22] in the CAD community. Just as in Section 2.4.1, (4.19) can be written as the epipolar constraint

    f(u, v) = xj(v)^T Fij xi(u) = 0 ,   (4.20)

where Fij is the fundamental matrix between views i and j, whose explicit form in terms of the rows of the projection matrices Pi and Pj is given by (2.13). Note that the bivariate function defined in (4.19) is actually not identical to the function f(u, v) defined by (4.20). In fact, we can write

    f(u, v) = μi(u) μj(v) Li(u) ∨ Lj(v) ,

where μi and μj are two smooth positive scalar functions. Nevertheless, the zero sets of the two functions define an identical curve, denoted δij from now on.
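With discretized outlines, (4.20) can be evaluated over all parameter pairs at once; the zero crossings of the resulting array discretize δij. A numpy sketch (array shapes and names are our own assumptions):

    import numpy as np

    def f_grid(F_ij, outline_i, outline_j):
        # f(u, v) = x_j(v)^T F_ij x_i(u), equation (4.20), on a grid.
        # outline_i: (Nu, 3) homogeneous samples of gamma_i; outline_j: (Nv, 3).
        # Returns an (Nv, Nu) array whose sign changes locate delta_ij.
        return outline_j @ F_ij @ outline_i.T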

Note. Since we allow silhouettes in the image to have holes and multiple connected components, the boundaries of these silhouettes, γi and γj, may consist of several disjoint loops. Thus, we should think of u or v as being defined over a collection of disjoint circular intervals (closed intervals on the real line whose endpoints have been identified). The domain of the function f(u, v), which is given by the Cartesian product of the underlying spaces of u and v, has the topology of a collection of tori. In the simplest case when γi and γj are each connected, the domain is homeomorphic to a single torus. However, this seemingly complicated global structure will not affect our local analysis, since the domain is a manifold: around each point (u, v), it has the same structure as an open neighborhood of R².

Epipolar Consistency. While f(u, v) = 0 is a necessary condition for Ki and Kj to have an intersection point corresponding to parameter values u and v, it is not sufficient. That is, given two points xi(u) and xj(v) such that xj(v)^T Fij xi(u) = 0, there does not always exist a point X such that xi(u) ≃ Pi X and xj(v) ≃ Pj X. The additional condition to guarantee the existence of X is stated in Proposition 2.16. Namely, provided xi and xj do not coincide with the respective epipoles eij and eji, we must have

    Fij xi ≃ eji ∨ xj .   (4.21)

The above constraint can be written as the scalar inequality g(u, v) > 0, where

    g(u, v) = (Fij xi(u)) · (eji ∨ xj(v)) .

It is clear that g(u, v) is a smooth function. Moreover, if we assume that the outlines do not pass through the epipoles, g(u, v) never vanishes whenever f(u, v) = 0. We must conclude that g(u, v) never changes sign on a connected component of δij. As a result, finding the parts of δij that do not satisfy (4.21) is a relatively simple task. On p. 136, we will present an algorithm for tracing δij that begins by computing an abstract combinatorial description consisting of vertices and edges (Algorithm 4.2). To discard the components of δij where g(u, v) < 0, the algorithm will simply discard the vertices that fail (4.21).

Finally, note that if (4.21) is satisfied, we can immediately verify that Ri ∨ X > 0 and Rj ∨ X > 0, based on our earlier assumption that the third coordinate of xi and xj must always be positive.
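The consistency test is equally image-based: the join eji ∨ xj of two image points is their cross product, so g reduces to a single dot product. A sketch (function and argument names are ours):

    import numpy as np

    def epipolar_consistent(F_ij, e_ji, x_i, x_j):
        # g(u, v) = (F_ij x_i) . (e_ji v x_j) from (4.21); the pair (u, v) is
        # kept only if g > 0, and g has a constant sign on each component
        # of delta_ij, so one test per component suffices.
        return np.dot(F_ij @ x_i, np.cross(e_ji, x_j)) > 0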

Singularities of the Intersection Curve. Consider a point X lying on the intersection of the rays Li(u) and Lj(v). The tangent line to Iij at X is given by

    T = Πi ∧ Πj ,

where Πi and Πj are the tangent planes to Ki and Kj at X (these are the same as the tangent planes to Σ at the points where the respective rays Li and Lj contact Σ). The two tangent planes can also be written as the back-projections of the respective tangent lines to the outlines, ti = xi ∨ x'i and tj = xj ∨ x'j:

    Πi ≃ Pi^T ti   and   Πj ≃ Pj^T tj .

Whenever xi and xj are matching frontier points, we have Πi ≃ Πj, so the tangent planes to Ki and Kj coincide at X, and the tangent line to Iij is undefined. Thus, we can see that the intersection curve touches the surface at a rim crossing, and is singular there. This observation leads right into the next proposition.

Proposition 4.6. Let (u, v) be a point belonging to δij (that is, f(u, v) = 0). The two outline points xi(u) and xj(v) are matching frontier points if and only if δij is singular at (u, v): that is, fu = 0 and fv = 0.

Proof. We will show only the "if" part of the proof; the "only if" direction can then be obtained simply by reversing the argument. First, let us find the partial derivative of f(u, v) with respect to u:

    fu = xj^T Fij x'i = x'i^T Fji xj = x'i ∨ lij ≃ |xi, x'i, eij| ,   (4.22)

where x'i is the derivative of xi with respect to u, and lij = Fji xj ≃ eij ∨ xi is the epipolar line containing xi in the ith image. Thus, fu = 0 if and only if |xi, x'i, eij| = 0, that is, when xi is a frontier point (point of epipolar tangency) in the ith view. Similarly, we have for fv:

    fv = x'j^T Fij xi = x'j ∨ lji ≃ |xj, x'j, eji| .   (4.23)

Clearly, fv = 0 if and only if |xj, x'j, eji| = 0, which happens whenever xj is a frontier point in the jth view. When fu and fv vanish simultaneously, xi and xj are matching frontier points by Definition 4.3.

Note. Recall from Section 4.2 that in the absence of occlusion, frontier points in the ith and jth views always occur in matching pairs. However, not all parts of the respective apparent contours belong to the outlines γi and γj. Therefore, it is possible to observe a frontier point in only one view. This corresponds to a situation where fu = 0 and fv ≠ 0, or vice versa.
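Up to a positive factor, both partials are therefore 3 × 3 determinants on image measurements; a sketch (our function names, with x_i_p standing for the derivative point x'i):

    import numpy as np

    def f_u_sign(x_i, x_i_p, e_ij):
        # f_u ~ |x_i, x'_i, e_ij| from (4.22): zero exactly when x_i(u) is a
        # point of epipolar tangency (frontier point) in view i.
        return np.linalg.det(np.column_stack((x_i, x_i_p, e_ij)))

    def f_v_sign(x_j, x_j_p, e_ji):
        # f_v ~ |x_j, x'_j, e_ji| from (4.23).
        return np.linalg.det(np.column_stack((x_j, x_j_p, e_ji)))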

Using second-order techniques for analyzing singularities of an intersection curve of two implicit surfaces [23, 46], it is possible to determine the geometry of δij in the neighborhood of a singularity.

Proposition 4.7. Let (u0, v0) be a point on δij such that fu(u0, v0) = 0 and fv(u0, v0) = 0. Then δij in the neighborhood of (u0, v0) has an X-shaped crossing; that is, it consists of two transversally intersecting curve branches. Moreover, the tangent vectors to these branches are given by

    ( √(−fvv/fuu), 1 )   and   ( −√(−fvv/fuu), 1 ) .

Proof. We would like to find an explicit parametric form for δij in the neighborhood of (u0, v0). This form is given by a curve (u(t), v(t)) such that f(u(t), v(t)) ≡ 0. Actually, since we expect the singularity to look like a crossing of two curve branches, there should be two distinct parametric curves passing through (u0, v0) on which f vanishes identically. Assuming that u0 = u(t0) and v0 = v(t0), Taylor expansion of f(t) = f(u(t), v(t)) about t0 yields

    f(t0 + δt) = f(t0) + f'(t0) δt + (1/2) f''(t0) δt² + ...
               = f(u0, v0) + (fu u' + fv v') δt
                 + (1/2) [ fuu (u')² + 2 fuv u'v' + fvv (v')² + fu u'' + fv v'' ] δt² + ...
               = (1/2) [ fuu (u')² + 2 fuv u'v' + fvv (v')² ] δt² + ... .

We can simplify the above expression by noticing that the mixed second derivative

    fuv = x'j^T Fij x'i

vanishes because the derivative points x'i and x'j lie on the respective matching epipolar lines lij = Fji xj and lji = Fij xi. The Taylor expansion of f(t) now becomes

    f(t0 + δt) = (1/2) [ fuu (u')² + fvv (v')² ] δt² + ... .

To make sure that f(u(t), v(t)) ≡ 0, every coefficient of the Taylor series must be identically zero. In particular, the quantity fuu (u')² + fvv (v')² must vanish:

    fuu (u')² + fvv (v')² = 0 ,
    fuu (u'/v')² + fvv = 0 ,
    u'/v' = ±√(−fvv/fuu) .

We must verify that −fvv/fuu is positive. First, let us find the value of fuu:

    fuu = xj^T Fij x''i = x''i^T Fji xj = x''i ∨ lij ≃ {  |xi, x'i, x''i| ,  if ti ≃ lij ,
                                                         −|xi, x'i, x''i| ,  if ti ≃ −lij ,

where ti = xi ∨ x'i is the tangent line to γi at xi. As in Proposition 4.4, we can write

    fuu ≃ κi λi ,   (4.24)

where κi = |xi, x'i, x''i| and λi = lij · ti. Similarly, we have

    fvv ≃ κj λj .   (4.25)

Recall from Section 4.2 that the signs of κi and κj must be equal for a pair of matching frontier points, while the signs of λi and λj must be opposite. Moreover, we have argued that κi λi and κj λj cannot vanish for visible frontier points in generic situations. Overall, we can conclude that fuu and fvv are both nonzero and have opposite signs, which means that −fvv/fuu is indeed positive.

The parametric form of a branch of δij in the neighborhood of (u0, v0) is given by

    (u(t), v(t)) = (u0, v0) + (u', v') t + ... ,

where (u', v') ≃ (u'/v', 1) is the tangent vector to the branch. Since u'/v' has two distinct solutions, we must conclude that two distinct curve branches pass through (u0, v0), having tangent vectors proportional to

    ( √(−fvv/fuu), 1 )   and   ( −√(−fvv/fuu), 1 ) .

Instead of thinking of a singularity of δij as a transversal crossing of two curve branches, we can think of it as a vertex with four incident branches. The tangent vectors pointing along each of these branches are

    ( √(−fvv/fuu), 1 ) ,  ( −√(−fvv/fuu), −1 ) ,  ( −√(−fvv/fuu), 1 ) ,  ( √(−fvv/fuu), −1 ) .

Since √(−fvv/fuu) is a positive number, each of the above vectors points into a different "quadrant" obtained by partitioning a neighborhood of (u0, v0) with the lines u = u0 and v = v0. Consequently, it is possible to uniquely identify each of the four branches of δij incident on (u0, v0) by stating which quadrant it locally occupies. This technique will be used later in the section, where we develop a formal description of the combinatorial structure of δij (see Figure 4.7).

Let X be the frontier point corresponding to the parameter values (u0, v0) in space. We can parametrize Σ in the neighborhood of X such that the curve X(u, v0) is the rim Γi, and X(u0, v) is the rim Γj. Then f(u, v) = 0 defines an implicit curve in parameter space in the neighborhood of (u0, v0). Let X'i and X'j denote the derivative points of Γi and Γj at X. Clearly, X'i = Xu and X'j = Xv. Also, the directions corresponding to the two branches of the intersection curve are given by

    √(−fvv/fuu) Xu + Xv   and   −√(−fvv/fuu) Xu + Xv .   (4.26)

Proposition 4.8. Let (u0, v0) be a singular point of δij, and let X be the corresponding intersection point on Σ, with Σ parametrized in the neighborhood of X by u and v, as explained above. Then the directions (4.26) of the two branches of the intersection curve in the tangent plane to Σ at X are conjugate. That is,

    ( √(−fvv/fuu)  1 ) ( l  m ) ( −√(−fvv/fuu) )
                       ( m  n ) (       1      )   = 0 ,

where l = |X, Xu, Xv, Xuu|, m = |X, Xu, Xv, Xuv|, and n = |X, Xu, Xv, Xvv|.

Proof. Let us find the exact values of fuu and fvv under this special parametrization of Σ. For this, we need to explicitly represent the scale ambiguity inherent in the representation of xi and xj. Let μi(u) and μj(v) be smooth scalar functions such that

    xi = μi Pi X   and   xj = μj Pj X .

Then

    fuu = xj^T Fij x''i = (μj Pj X)^T Fij (μ''i Pi X + 2 μ'i Pi Xu + μi Pi Xuu)
        = (μj Pj X)^T Fij (μi Pi Xuu) = μi μj X^T (Pj^T Fij Pi) Xuu .

Similarly,

    fvv = μi μj Xvv^T (Pj^T Fij Pi) X .

Let A = Pj^T Fij Pi. Then

    −fvv/fuu = −(Xvv^T A X)/(X^T A Xuu) = −(Xvv^T A X)/(Xuu^T A^T X) .

The 4 × 4 matrix A can be viewed as a mapping from any 3D point X to the coefficient vector of the epipolar plane Π ≃ Oj ∨ Oi ∨ X. Moreover, the transpose of A maps X onto the oppositely oriented plane Oi ∨ Oj ∨ X:

    A X = Pj^T Fij Pi X ≃ Pj^T lji ≃ Oj ∨ Oi ∨ X ,
    A^T X = Pi^T Fji Pj X ≃ Pi^T lij ≃ Oi ∨ Oj ∨ X .

Thus, A X ≃ −A^T X for any X. It is not difficult to show that A must be a skew-symmetric matrix: A^T = −A.

Note also that the epipolar plane Π coincides with the tangent plane X ∨ Xu ∨ Xv, so we can write Π = α X ∨ Xu ∨ Xv, where α is a positive or negative scale factor. Using this substitution, we obtain

    −fvv/fuu = −(Xvv^T A X)/(Xuu^T A^T X) = (−Xvv ∨ Π)/(−Xuu ∨ Π)
             = (α |X, Xu, Xv, Xvv|)/(α |X, Xu, Xv, Xuu|) = n/l .

Finally, it becomes easy to verify that the directions (4.26) are conjugate:

    ( √(−fvv/fuu)  1 ) ( l  m ) ( −√(−fvv/fuu) )
                       ( m  n ) (       1      )
      = ( √(n/l)  1 ) ( l  m ) ( −√(n/l) )
                      ( m  n ) (    1    )
      = −l (n/l) + m √(n/l) − m √(n/l) + n = −n + n = 0 .

4.4.2 Tracing Intersection Curves

Let xi(u) and xj(v) be two outline points such that xj(v)^T Fij xi(u) = 0, and the epipolar consistency criterion of Proposition 2.16 is satisfied. As discussed in the previous section, the unique 3D point X that projects onto xi(u) and xj(v) also belongs to the intersection curve of Ki and Kj. The coordinates of X can be computed from xi(u) and xj(v) with the help of equation (2.34) or a similar reconstruction formula. The process of 3D reconstruction defines a mapping (u, v) → X from the curve δij in parameter space to the intersection curve Iij in T³. For now, we will simplify the discussion by assuming that the epipolar consistency criterion is satisfied for all xi(u) and xj(v) such that f(u, v) = 0, to make sure that δij is indeed the domain of the mapping. This mapping is invertible: given some point X on Iij, it is possible to recover the parameter values u and v by computing the projections Pi X and Pj X and seeing where they land on the respective outlines γi and γj. Thus, there exists a bijective correspondence between points on δij and Iij.

In the previous section, we have already discussed the intimate relationship between δij and Iij. Namely, a regular (resp. singular) point of δij gives rise to a regular (resp. singular) point of Iij. Moreover, (u, v) provides a local parametrization of the surface Σ in the neighborhood of a singular point of Iij; using this parametrization, we have shown that the singularities of δij and Iij have the same topological type (X-shaped crossing). Overall, δij can serve as a convenient topological representation of the intersection curve Iij in space. In this section, we will take advantage of this representation by showing how to trace the intersection curve Iij by tracing the curve δij in parameter space.

The most straightforward way to trace δij is to break it up into branches where one variable can be expressed as a function of the other and to trace each branch separately, also recording global information about the connectivity of branches. In the curve tracing algorithm described later in the section, we will take u to be the independent variable. A well-known rule of implicit differentiation can be used to find the derivative of v with respect to u:

    dv/du = −fu/fv .   (4.27)

This quantity is undefined whenever fv = 0. At any such critical point, v cannot be locally expressed as a function of u. Thus, we should define the branches of δij as the maximal connected segments of the curve for which the partial derivative fv does not vanish. However, to preserve symmetry between the variables u and v, we will also classify points for which fu = 0 as critical. Since the geometry of the intersection of two cones does not depend on the order in which these cones are taken, our tracing algorithm should obtain a curve with the same combinatorial description if the roles of γi and γj are interchanged. For future reference, let us state an "official" definition of a critical point:

Definition 4.6 (Critical Point). Let δij be the curve defined by the implicit function f(u, v) = 0 (4.20). A critical point of δij is a point (u0, v0) belonging to δij and having one of the following three types:

1. fu(u0, v0) = 0, fv(u0, v0) = 0;

2. fu(u0, v0) = 0, fv(u0, v0) ≠ 0;

3. fu(u0, v0) ≠ 0, fv(u0, v0) = 0.

The geometric interpretation of the three types above should be obvious from the equations (4.22) and (4.23) for fu and fv. In the first case, xi(u0) and xj(v0) are matching frontier points visible on the outlines of both views; in the second case, xi(u0) is a frontier point in the ith view and xj(v0) is a transversal intersection of the epipolar line lji = Fij xi(u0) with the outline γj; and in the third case, xj(v0) is a frontier point in the jth view and xi(u0) is a transversal intersection of lij = Fji xj(v0) with γi. As far as the shape of δij itself is concerned, the first type is a singularity of δij, while the other two are local extrema in the v- and u-directions, respectively (as will be seen later, inflection points are impossible). These critical points will form the vertices in the combinatorial description of the curve, while the branches of the curve between the different vertices will be the edges. It is clear that a vertex of type 1 has four incident branches, while a vertex of type 2 or 3 has two. Also note that by construction, a branch of the curve must be monotone (uniformly nondecreasing or nonincreasing) with respect to both the u- and the v-axis. Before formally specifying a combinatorial representation of the curve, we must further classify the last two types of critical points: namely, when is a point of type 2 or 3 a local minimum in the respective direction, and when is it a local maximum?

Proposition 4.9. Suppose that (u0, v0) is a type 2 critical point of δij: that is, fu(u0, v0) = 0, fv(u0, v0) ≠ 0. Let κi = |xi, x'i, x''i| and λi = lij · ti as in Proposition 4.4 (recall that ti = xi ∨ x'i), and introduce a new quantity νj = tj ∨ eji. Then (u0, v0) is a local maximum (resp. minimum) in the v-direction if

    κi λi νj > 0

(resp. κi λi νj < 0).

Analogously, if (u0, v0) is a type 3 critical point of δij, then it is a local maximum (resp. minimum) in the u-direction if

    κj λj νi > 0

(resp. κj λj νi < 0), where κj = |xj, x'j, x''j|, λj = lji · tj, and νi = ti ∨ eij.

Proof. Suppose (u0, v0) is a type 2 critical point of δij: fu = 0, fv ≠ 0. Then in the neighborhood of (u0, v0), it is possible to express v as a function of u such that f(u, v(u)) ≡ 0. By the well-known second derivative test, the function v(u) has a local maximum (resp. minimum) if v'' < 0 (resp. v'' > 0). To find an expression for v'', we will use a technique similar to the one in the proof of Proposition 4.7. Assuming that v0 = v(u0), let us write a Taylor expansion of f(u0 + δu) = f(u0 + δu, v(u0 + δu)) about u0:

    f(u0 + δu) = f(u0, v0) + (fu + fv v') δu
                 + (1/2) [ fuu + 2 fuv v' + fvv (v')² + fv v'' ] δu² + ...
               = fv v' δu + (1/2) [ fuu + 2 fuv v' + fvv (v')² + fv v'' ] δu² + ... .

In the first line, we used the fact that u' = du/du = 1 and u'' = 0, and in the second line, we plugged in f(u0, v0) = 0 and fu = 0. Because f(u, v(u)) ≡ 0, every term of the series must vanish identically. In particular, to make the first-order term vanish, v' must be zero. This is not surprising, since we expected the critical point (u0, v0) to be a local extremum in the v-direction. The Taylor expansion reduces to

    f(u0 + δu) = (1/2) (fuu + fv v'') δu² + ... .

Setting the second-order coefficient to zero, we obtain

    fuu + fv v'' = 0   or   v'' = −fuu/fv .

From (4.23),

    fv ≃ |xj, x'j, eji| ≃ tj ∨ eji = νj ,

and from (4.24),

    fuu ≃ κi λi .

Finally, we obtain

    sgn(v'') = sgn(−fuu/fv) = −sgn(κi λi νj) .

If (u0, v0) is a type 3 critical point (fu ≠ 0, fv = 0), we can use exactly the same technique to establish the second part of the proposition. We express u as a function of v such that f(u(v), v) ≡ 0, write a Taylor expansion, and set each term to zero in order to determine the value of u'':

    u'' = −fvv/fu .

Plugging in the values of fvv and fu from (4.25) and (4.22), we get

    sgn(u'') = −sgn(κj λj νi) .

Note. As argued in the proof of Proposition 4.7, both fuu and fvv are generically nonzero at critical points of δij. Therefore, critical points of type 2 and 3 cannot be inflections of the curve.

Using the results of the previous proposition, we can develop a more complete classification of critical points, as shown in Figure 4.7. The figure also shows symbolic labels for the branches incident on each critical point. The signs following u0 and v0 identify whether the values of the respective parameters increase or decrease as we follow a particular branch away from (u0, v0). Actually, since u and v are each defined on a circular interval (a closed interval where the first point is identified with the last point, as stated in Definition 3.2), we should be careful about the use of the terms "increasing" and "decreasing". Consider a closed interval [a, b] ⊂ R. The intrinsic orientation of this interval can be thought of as an arrow pointing in the increasing direction of the real numbers. Since the "arrows" at a and b point in the same direction, this convention defines a consistent orientation for the circular interval formed by identifying a and b. Then "increasing" (resp. "decreasing") simply refers to movement along the circular interval in the direction consistent with (resp. opposite to) its intrinsic orientation.

In our formal representation of δij, each curve branch will be identified by its endpoints (u0, v0), (u1, v1) and the sign label associated with (u0, v0). The sign label is necessary because several branches of δij may share both endpoints. Note that the label associated with (u1, v1) can be obtained by flipping the label associated with (u0, v0). This is because each curve branch is monotone with respect to both u and v, so the direction of travel from (u1, v1) to (u0, v0) is the reverse of the direction from (u0, v0) to (u1, v1). Since the branches of δij do not have an intrinsic orientation, the order of the two endpoints is not constrained a priori. However, we want the tracing procedure to proceed along the increasing direction of the independent parameter u, so we will let (u0, v0) be the endpoint whose sign label has the form +±. This way, in tracing out the given branch of the intersection curve, the algorithm will move along γi from u0 to u1 along the intrinsic orientation of the outline.

Figure 4.7 A complete classification of critical points with incident branch labels:
Type 1 (fu = 0, fv = 0): branches ++, −+, −−, +−.
Type 2A (fu = 0, fv ≠ 0, local minimum): branches ++, −+.
Type 2B (fu = 0, fv ≠ 0, local maximum): branches +−, −−.
Type 3A (fu ≠ 0, fv = 0, local minimum): branches ++, +−.
Type 3B (fu ≠ 0, fv = 0, local maximum): branches −+, −−.
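The classification is driven entirely by the signs established in Propositions 4.6 and 4.9. A possible encoding (our names; ki, li, nj, kj, lj, ni stand for the scalars κi, λi, νj, κj, λj, νi, and eps stands in for an exact zero test):

    def classify_critical_point(fu, fv, ki, li, nj, kj, lj, ni, eps=1e-12):
        # Types of Definition 4.6, refined into the five cases of Figure 4.7.
        if abs(fu) < eps and abs(fv) < eps:
            return "1"                                 # matching frontier points
        if abs(fu) < eps:                              # extremum in the v-direction
            return "2B" if ki * li * nj > 0 else "2A"  # Proposition 4.9
        if abs(fv) < eps:                              # extremum in the u-direction
            return "3B" if kj * lj * ni > 0 else "3A"
        return None                                    # regular point of delta_ij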

At this stage, let us summarize the formal representation of the combinatorial structure of the curve δij. This representation is essentially an undirected graph, consisting of a list of vertices and edges. Each vertex and edge is identified using the following information:

Vertex: coordinates (u, v), vertex type (Figure 4.7).

Edge: endpoints (u0, v0), (u1, v1) and sign label +± associated with (u0, v0).

For computational convenience, all the u and v vertex coordinates should be organized in two circular lists (the u-list and the v-list) ordered according to the increasing values of the respective curve parameters (in case γi or γj have several connected components, each component should have its own u- or v-list).

Next, we specify an algorithm for computing the combinatorial description of δij.

Algorithm 4.2 (Combinatorial Description of Intersection Curve).

1. Find Vertices.

    For Each xi(u) such that |xi, x'i, eij| = 0
        Let lji = Fij xi.
        For Each xj(v) such that lji ∨ xj = 0 and eji ∨ xj ≃ lji   (1)
            Find the type of (u, v) as shown on Figure 4.7 and add (u, v) to the vertex list.
        End For
    End For

    For Each xj(v) such that |xj, x'j, eji| = 0
        Let lij = Fji xj.
        For Each xi(u) such that lij ∨ xi = 0 and eij ∨ xi ≃ lij   (2)
            Find the type of (u, v) as shown on Figure 4.7 and add (u, v) to the vertex list.
        End For
    End For

It is implied that the above procedure correctly identifies matching frontier points in both views: that is, if xi(u) and xj(v) are points such that |xi, x'i, eij| = 0 and |xj, x'j, eji| = 0, then (u, v) is inserted into the vertex list exactly once.

2. Find Edges.

    For Each vertex (u0, v0)   (3)
        If (u0, v0) is of type 1, 2A, or 3A
            Create a new branch beginning at (u0, v0) and having label ++.
            Trace the branch using Algorithm 4.3, which returns the second endpoint (u1, v1) and a discretized representation of the branch.
        End If
        If (u0, v0) is of type 1, 2B, or 3A
            Create a new branch beginning at (u0, v0) and having label +−.
            Trace the branch using Algorithm 4.3.
        End If
    End For

    [The intersection curve is a simple loop having one branch.]
    If the vertex list is empty   (4)
        Let u0 = 0, lji = Fij xi(u0).
        Find the unique v0 such that lji ∨ xj(v0) = 0 and eji ∨ xj(v0) ≃ lji.
        If |xi, x'i, eij| |xj, x'j, eji| < 0   (5)
            Create a new branch beginning and ending at (u0, v0) and having label ++.
            Trace the branch using Algorithm 4.3.
        Else
            Create a new branch beginning and ending at (u0, v0) and having label +−.
            Trace the branch using Algorithm 4.3.
        End If
    End If

Part 1 of Algorithm 4.2 simply finds all pairs (u, v) that meet the definition of a critical point (Definition 4.6). In lines (1) and (2), two checks are performed. The first check (lji ∨ xj = 0 and lij ∨ xi = 0, respectively) simply verifies that f(u, v) = 0. The second check (eji ∨ xj ≃ lji and eij ∨ xi ≃ lij) verifies that the epipolar consistency constraint (4.21) holds. As discussed earlier, this constraint is satisfied for any point (u, v) of δij if and only if it is satisfied for the whole connected component containing (u, v). Therefore, to discard the components that fail (4.21), it is sufficient to discard the vertices that "anchor" these components. Note that the epipolar consistency check can only fail when γi (resp. γj) intersects lij (resp. lji) in points that lie on both sides of the epipole. If the epipoles lie outside the convex hulls of both outlines, (4.21) will always hold.

Part 2 of Algorithm 4.2 is also fairly straightforward. Given a vertex (u0, v0), we can identify whether it has a branch labeled ++ or +− simply by looking at its type. It is not difficult to see that a list consisting of all possible labels of the form (u0, v0)+± uniquely identifies each branch of δij. However, the for-loop that begins on line (3) fails to trace the intersection curve if it consists of a single branch with no critical points. The if-statement of line (4) is necessary to deal with this case. When the intersection curve has no critical points, it is necessary to find one point where to begin the tracing. This is done by picking some fixed value of u0 on the first contour (say, u0 = 0) and by finding the corresponding value v0. Now all that remains is to determine the label of the branch. Because (u0, v0) is a regular point of δij, we can simply compute the sign of the derivative dv/du as in (4.27). Then the label is ++ if dv/du > 0 and +− if dv/du < 0. Putting together (4.22) and (4.23), we get

    dv/du ≃ −|xi, x'i, eij| |xj, x'j, eji| .

This is exactly the quantity computed on line (5). Once the initial point (u0, v0) and the label have been determined, the intersection curve has to be traced until it loops back to (u0, v0).

Once the combinatorial description of δij is computed using Algorithm 4.2, it is easy to trace each branch of the intersection curve geometrically. The tracing process (Algorithm 4.3 below) can be thought of as computing the values of v as a function of u. Namely, if the branch has label ++, then v can be expressed as a function of u as follows:

    V+(u) = v such that f(u, v) = 0 and for any v' between v0 and v, f(u, v') ≠ 0 .

Similarly, if the branch has label +−, then the tracing function is

    V−(u) = v such that f(u, v) = 0 and for any v' between v and v0, f(u, v') ≠ 0 .

The two functions are illustrated on Figure 4.8.

Given a fixed value of u, the values V+(u) and V−(u) are computed simply by obtaining the epipolar line of the outline point xi(u) in the jth view, and finding a point where this line intersects γj. This process is shown in lines (1) and (2) of Algorithm 4.3. The only tricky part of the algorithm is detecting when the function trace should be terminated. One possibility is that the given intersection curve has no critical points at all, in which case u varies through the whole circular range over which it is defined, and the trace stops when u reaches u0 again. This case is detected in line (3) below. Otherwise, the trace must terminate when u passes a discontinuity of V+ or V−, as shown in Figure 4.8 (a) and (c), or a critical point of type 1 or 2, as shown in Figure 4.8 (b) and (d). These two conditions are checked in line (4).

Figure 4.8 The functions V+ and V− used for traversing branches of δij. The curve δij is shown as a thin line, and the graphs of V+(u) and V−(u) are superimposed as thicker lines. (a) and (b) show examples of V+, while (c) and (d) show V−. In (a) and (c), the functions have a discontinuity at u1, and in (b) and (d), they pass a type 2 critical point.

Algorithm 4.3 (Trace Intersection Curve Branch).

    Let u = u0, and let Δu be a fixed increment size.
    While the second endpoint is not encountered:
        [Compute v = V+(u) if the label is ++ (resp. v = V−(u) if the label is +−).]
        Let lji = Fij xi(u).   (1)
        Let v be the first successor (resp. predecessor) of v0 such that lji ∨ xj(v) = 0.   (2)
        Add (u, v) to the current branch of δij.
        If u0 ∈ [u, u + Δu]   (3)
            [Finished tracing a simple loop with no critical points.]
            The second endpoint is (u0, v0).
        Else If there exists u1 ∈ [u, u + Δu] in the u-list such that xi(u1) is a frontier point of γi, or u1 is a discontinuity of V+ (resp. V−)   (4)
            Let v1 be the first successor (resp. predecessor) of v0 such that (u1, v1) is a critical point. Then (u1, v1) is the second endpoint of the current branch.
        Else
            [Endpoint not detected; increment u and keep tracing.]
            Let u = u + Δu.
        End If
    End While
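On a polygonal outline, the core step of Algorithm 4.3 (lines (1) and (2)) amounts to scanning for a sign change of the incidence lji ∨ xj(v) along γj. A discrete sketch with our own helper name; outline_j is an (N, 3) array of homogeneous samples:

    import numpy as np

    def next_crossing(F_ij, x_i_u, outline_j, v0_idx, direction=+1):
        # Line (1): epipolar line of x_i(u) in view j.
        l_ji = F_ij @ x_i_u
        s = outline_j @ l_ji      # signed incidence l_ji v x_j(v) at each sample
        n = len(outline_j)
        # Line (2): first successor (direction=+1) or predecessor (-1) of v0
        # at which the incidence changes sign.
        for step in range(1, n):
            k = (v0_idx + direction * step) % n
            kp = (k - direction) % n
            if s[kp] * s[k] <= 0:
                denom = s[kp] - s[k]
                t = s[kp] / denom if denom != 0 else 0.0   # sub-sample estimate
                return kp, t
        return None               # no crossing found: a discontinuity of V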

Figure 4.9 shows an example of an intersection curve between two views of a gourd from the data set shown in Figure 5.6. Parts (a) and (b) of the figure show the two views with superimposed contours and epipolar lines corresponding to critical points. The epipolar lines tangent to the outlines are solid, and the epipolar lines corresponding to frontier points in the other view are dashed. Part (c) of the figure shows the curve δij in parameter space, with the critical points of type 2A, 2B, 3A, and 3B indicated using distinct symbols. Finally, part (d) is a plot of the intersection curve in 3D. The 3D points are computed using the reconstruction method described in Section 5.1.

Figure 4.9 Intersection curve tracing: an example (see text).

4.5 The 1-Skeleton of the Visual Hull

In this section, we describe how to compute the 1-skeleton of the visual hull boundary, ∂V. The 1-skeleton of the visual hull is the subset of ∂V that belongs to some intersection curve Iij, where (i, j) is any pair of distinct input views. When the visual hull is formed by intersecting only two cones, no special effort is required to compute its 1-skeleton. All one needs to do is trace the single intersection curve between the two cones, a task accomplished by Algorithm 4.2 of Section 4.4.2. With more than two cameras present in the scene, the task of computing the 1-skeleton becomes much more complicated. The first problem is how to determine which fragments of the n(n − 1)/2 different intersection curves actually lie on ∂V, and the second problem is how to find the connectivity of these fragments. In the two following subsections, we will address each of these problems in turn.

4.5.1 Clipping Intersection Curves

For a given intersection curve Iij, how can we compute its contribution to the 1-skeleton? Recall that the visual hull consists of points whose images do not fall outside the silhouette of the object in any input view. We can apply the definition by projecting Iij into each available view k ≠ i, j (there is no sense in projecting Iij into views i and j, because the projections will simply coincide with the silhouette boundaries) and "clipping away" the parts that fall outside ωk, the silhouette in the kth view.

Let ι^k_ij denote the projection of Iij into the kth view. Computing ι^k_ij is straightforward given our image-based representation of Iij as δij, a curve consisting of all pairs (u, v) such that the points xi(u) and xj(v), lying on γi and γj, respectively, are images of some point X on Iij (recall Section 4.4.1). For every (u, v) ∈ δij, we take xi(u) and xj(v), and then transfer them to the kth view to obtain a point x^k_ij of ι^k_ij. Transfer can be accomplished using one of the oriented formulas of Section 2.4.3. For example, the epipolar transfer formula (2.30)

can be written as

    x^k_ij ≃ sgn[(Fik xi) ∨ ekj] (Fjk xj) ∧ (Fik xi) .   (4.28)

Though (4.28) is a simple and convenient formula, its use is actually not desirable in a robust implementation. As stated on p. 49, (4.28) fails, among other cases, whenever the 3D intersection point X is located in the trifocal plane spanned by the camera centers Oi, Oj, and Ok. In implementation, the epipolar transfer relationship is numerically unstable whenever the intersection point lies near the trifocal plane. Because this situation is relatively common in practice, our implementation will rely on the more robust (though somewhat cumbersome) formula (2.33) based on the trifocal tensor Tijk:

    x^k_ij ≃ sgn(li ∨ eik) Tijk(li, lj) ∧ Tijk(li, l'j) ,   (4.29)

where li is a line passing through xi, and lj, l'j are two lines such that lj ∧ l'j ≃ xj.
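For reference, the simpler transfer (4.28) takes only a few lines: in the image plane, the meet ∧ of two lines is their cross product, and line ∨ point is a dot product. A numpy sketch under those conventions (names are ours; as noted above, a robust implementation should prefer the trifocal formula (4.29)):

    import numpy as np

    def epipolar_transfer(x_i, x_j, F_ik, F_jk, e_kj):
        # x_ij^k ~ sgn[(F_ik x_i) v e_kj] (F_jk x_j) ^ (F_ik x_i), eq. (4.28).
        l_i = F_ik @ x_i                 # epipolar line of x_i in view k
        l_j = F_jk @ x_j                 # epipolar line of x_j in view k
        s = np.sign(np.dot(l_i, e_kj))   # orientation factor (F_ik x_i) v e_kj
        return s * np.cross(l_j, l_i)    # meet of the two epipolar lines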

The next order of business is how to clip away the parts of ι^k_ij that fall outside ωk, the silhouette in the kth view; in other words, how to compute ι^k_ij ∩ ωk. Intersecting a curve with a solid region in the plane (actually, the two-sided plane T²) is a standard geometric operation that presents no conceptual difficulties. First, we find the points of intersection of ι^k_ij with the boundary of ωk, which happens to be γk; split ι^k_ij into components bounded by the intersection points; and then figure out which components lie in the interior of the silhouette. The only "novel" aspect of the problem is the oriented nature of both ι^k_ij and ωk. Because of our assumption that the 3D object Ω lies in front of the focal plane of each camera, the entire silhouette ωk is located in the front range of the kth image plane. However, the same assumption does not extend to Iij: some parts of the intersection curve may fall behind the focal plane of the kth camera, and so appear in the back range of the kth image plane. Obviously, any points of ι^k_ij that end up antipodal to some point of ωk must be excluded from ι^k_ij ∩ ωk. An implementation that does not keep track of orientation properly is likely to make a mistake in this case. This, along with additional implementation issues, will be discussed more thoroughly in Chapter 5.

Figure 4.10 Clipping intersection curves: an example (see text).

The final step in the clipping process involves lifting the results of the computation of ι^k_ij ∩ ωk back onto the original curve Iij. This is simply a matter of maintaining consistency between the data structures representing Iij and ι^k_ij. In our implementation, Iij does not even exist as such, but is represented by δij, a curve in the joint parameter space of γi and γj. Conceptually, the projected (rather, transferred) curve ι^k_ij is obtained by appending an additional data item, the coordinates of x^k_ij, to each point (u, v) in the representation of δij. After clipping away parts of ι^k_ij against ωk, we get back the corresponding clipped version of δij simply by discarding the coordinates in the kth view.

Figure 4.10 shows an example of clipping: the intersection curve shown in Figure 4.9 is clipped against a third view of the gourd. Part (a) shows the reprojected intersection curve superimposed upon the third image, part (b) shows the intersection curve clipped against the silhouette, and part (c) shows the clipped curve in 3D.

4.5.2 Intersection Points

So far, we have not discussed the combinatorial structure of the 1-skeleton. In particular, how do segments belonging to different intersection curves connect with each other? Generically, only one type of junction is possible: three segments belonging to curves Iij, Ijk, and Iik are incident to one another at an intersection point of Ki, Kj, and Kk, as shown in Figure 4.11. (In a generic situation, triples of cones meet transversally at a finite number of isolated points, and sets of more than three cones do not have any points in common.)

It is convenient to think of intersection points of Ki, Kj, and Kk as locations where the visual cone Kk cuts the intersection curve Iij. These locations may be spotted in the kth image as follows. As in (4.28) or (4.29), let xᵏij be a point of ιᵏij that is the result of transferring the pair xi(u) and xj(v), (u, v) ∈ δij. Whenever xᵏij ≃ xk(w) for some point xk(w) on γk, the triple of parameter values (u, v, w) corresponds to a unique intersection point in space. That is, there exists a point X such that xi(u) ≃ PiX, xj(v) ≃ PjX, and xk(w) ≃ PkX. An intersection point is essentially a trinocular stereo match, though in a generic situation, it cannot be a true physical point on the surface of the object being reconstructed (for the intersection point to lie on the surface, the trifocal plane would have to be tangent to the object, which does not happen when the cameras are in general position).

It is not difficult to see that the set of all xk(w) that are images of intersection points is equal to ιᵏij ∩ γk. This is rather convenient, since the computation of ιᵏij ∩ γk is actually the most important step in finding ιᵏij ∩ ωk, as explained in the previous subsection. All we have to do is explicitly keep track of all the intersection points computed by the clipping algorithm.

Let Ii (resp. Ij, Ik) denote the set of all points xi(u) (resp. xj(v), xk(w)) that are projections of some intersection point of Ki, Kj, and Kk. Then we can write

Ii = ιⁱjk ∩ γi , Ij = ιʲik ∩ γj , Ik = ιᵏij ∩ γk .

Thus, we "see" each intersection point exactly three times: when clipping Ijk against the ith silhouette, Iik against the jth silhouette, and Iij against the kth silhouette (see Figure 4.11). In the implementation, it is important to keep track of these repeated sightings.
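One simple way to keep track of repeated sightings is to assign each intersection point a canonical key that does not depend on which of the three clipping operations produced it. The sketch below (with an assumed quantization tolerance tol) illustrates the idea; the implementation described in Section 5.1.2 uses a greedy matching strategy instead.

```python
def canonical_id(views, params, tol=1e-4):
    """Order-independent key for an intersection point: sort the
    (camera index, parameter) pairs by camera index and quantize by tol."""
    pairs = sorted(zip(views, params))   # e.g. views = (i, j, k), params = (u, v, w)
    return tuple((cam, round(p / tol)) for cam, p in pairs)

# Example: all three sightings of the same point produce the same key.
seen = set()
key = canonical_id((2, 0, 1), (0.87, 0.51, 0.13))
is_duplicate = key in seen
seen.add(key)
```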


Figure 4.11 X is an intersection point between views i, j, and k. The thick black curves incident on X are fragments of Ijk, Iik, and Iij that belong to the 1-skeleton of the visual hull. The parts of the respective intersection curves that are "clipped away" by the cones Ki, Kj, and Kk are dashed. Note that X does not lie on the surface of the object Ω: the three outline points xi(u), xj(v), and xk(w) are actually projections of the white points on the rims Γi, Γj, and Γk.

4.5.3 An Incremental Algorithm

In our implementation, the 1-skeleton of the visual hull is represented using a graph-like data structure consisting of a vertex list and an edge list. Vertices come in two types: intersection points and critical points of individual intersection curves, or frontier points for short. The edge list consists of intersection curve segments connecting pairs of vertices. The following is a list of data items stored with each vertex and edge (a data-structure sketch follows the list).

Vertex:

• Frontier point: this is the same as a vertex in the intersection curve data structure described on p. 135. It is identified using the pair (i, j) of camera indices and the pair (u, v) of parameter values in the respective views.

• Intersection point: identified using a triple (i, j, k) of camera indices, and a triple (u, v, w) of parameter values in the respective images.

Edge: each edge stores two pointers into the vertex list to indicate its endpoints, as well as a sign label inherited from the "parent" edge on the intersection curve to which it belongs (see Section 4.4.2). In addition, the edge stores a piecewise-linear approximation of its interior in a list of (u, v) pairs obtained by tracing the curve using Algorithm 4.3. Note that we can get the (i, j) identifier of the edge by taking the two camera indices that its endpoints have in common.
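For concreteness, here is one possible encoding of this representation — a Python sketch, not the thesis code; the class and field names are illustrative:

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

@dataclass
class FrontierPoint:                    # critical point of a single intersection curve
    views: Tuple[int, int]              # camera indices (i, j)
    params: Tuple[float, float]         # parameter values (u, v)

@dataclass
class IntersectionPoint:                # generic junction of three cone boundaries
    views: Tuple[int, int, int]         # camera indices (i, j, k)
    params: Tuple[float, float, float]  # parameter values (u, v, w)

Vertex = Union[FrontierPoint, IntersectionPoint]

@dataclass
class Edge:
    endpoints: Tuple[int, int]          # indices into the vertex list
    sign: str                           # label inherited from the parent curve ('++' or '+-')
    polyline: List[Tuple[float, float]] = field(default_factory=list)  # traced (u, v) samples
```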

Note that the above representation is entirely image-based: it does not explicitly specify 3D coordinates of points in space, only the parameter values of their projections in two or three images.

We have implemented a simple incremental algorithm that, at the end of the ith iteration, produces a complete 1-skeleton of the intersection of cones K1, ..., Ki. An incremental algorithm is particularly desirable for certain practical situations. In a real-time setting, the incremental algorithm could run for as many steps as permitted by the time constraints. Alternatively, in an interactive system, a user could halt the computation as soon as the model achieves an acceptable degree of approximation to the object being reconstructed. An incremental algorithm could also be used with an experimental setup in which images are acquired not simultaneously, but one at a time. Instead of waiting until all images of the object have been taken, it may be preferable to build an intermediate model based on whatever data is available. Every time a new visual cone is added to the scene, the existing 1-skeleton is modified in two steps: first, all parts of existing intersection curves that fall outside the new cone are clipped away; and second, intersection curves due to the new cone are traced and clipped against all pre-existing views. A high-level summary of the steps is as follows:

Algorithm 4.4 (Compute the 1-Skeleton).

For i = 2, ..., n
    Clip each existing edge E against ωi. (1)
    For j = 1, ..., i − 1
        Trace the intersection curve Iij using Algorithms 4.2 and 4.3. (2)
        For k = 1, ..., i − 1, k ≠ j
            Clip each edge E of Iij against ωk.
        End For
    End For
End For
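In runnable form, the loop structure of Algorithm 4.4 looks as follows. This is a sketch only: trace_curve stands for Algorithms 4.2 and 4.3, clip for the clipping operation of Section 4.5.1, and both are passed in as hypothetical helpers; views are indexed from zero.

```python
def build_one_skeleton(views, trace_curve, clip):
    """Incremental 1-skeleton construction (Algorithm 4.4), sketched.

    After processing view i, `edges` holds the 1-skeleton of the
    intersection of the cones seen so far.
    """
    edges = []
    for i in range(1, len(views)):
        edges = clip(edges, views[i])                  # (1) clip existing edges against ω_i
        for j in range(i):
            new_edges = trace_curve(views[i], views[j])    # (2) trace I_ij
            for k in range(i):
                if k != j:
                    new_edges = clip(new_edges, views[k])  # clip I_ij against ω_k
            edges.extend(new_edges)
    return edges
```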

The pseudocode of Algorithm 4.4 omits all the steps involved in "bookkeeping", or maintaining the data structures representing the 1-skeleton. Most of these operations are straightforward, and do not need detailed explanation. For example, when clipping an intersection curve edge against an outline, we modify the vertex list by deleting intersection or frontier points that are clipped away and inserting newly discovered intersection points. Before adding an intersection point to the list, it is important to check whether this point has been encountered in a previous clipping operation (as explained earlier, each intersection point is seen three times, corresponding to its projections in three images). In addition, the clipping operation requires us to modify the edge list: previously existing edges may disappear entirely, acquire different endpoints, or give rise to several new edges as a result of splitting.

It is easy to see that Algorithm 4.4 is relatively inefficient. Intuitively, a lot of the work involved in computing the intermediate stages of the 1-skeleton is unnecessary. Here are a few aspects of the algorithm that are obviously wasteful:

• Algorithm 4.4 traces the complete intersection curve between each pair of viewpoints, whereas it would be far more desirable to trace only the parts of the curve that end up on the surface of the visual hull.

• The algorithm computes many intersection points that are present in the intersection of some subset of visual cones, but do not belong to the final visual hull. Ideally, we would not like to spend time on computing points that disappear after some intermediate stage.

• Each intersection point is found three times, which appears redundant.

The incremental algorithm can be improved by replacing step (2), which computes the full intersection curve Iij, with a step that computes only the part of Iij that lies on the visual hull of K1, ..., Ki. To do this, we take advantage of the fact that step (1) has already located all the intersection points of Ki with the cones K1, ..., Ki−1. Suppose that one of these intersection points has camera indices (i, j, k) and parameter values (u, v, w). Then all we have to do is trace the curve Iij beginning at (u, v), and Iik beginning at (u, w). Each curve should be traced until it terminates at another intersection point. Essentially, we are computing the combinatorial structure of the 1-skeleton at the same time as tracing intersection curve branches that belong to the visual hull. This approach does not require us to compute the global structure of intersection curves, so there is no need for Algorithm 4.2 in its complete form.

At this stage, we do not specify all the implementation details needed to make the modification work. These details include determining the direction of traversal of intersection curves (e.g., if we begin tracing Iij at (u, v), in which direction should we change u and v so as to follow the part of the curve that belongs to the visual hull), locating critical points of intersection curves "on the fly", and deciding when to terminate the trace of a particular curve segment. A significant complication for the improved incremental algorithm is presented by the fact that there may exist components of some intersection curve Iij that are not connected to the rest of the 1-skeleton by means of intersection points. We need to design a procedure to reliably and efficiently detect such components.

While the modification described above improves the running time of Algorithm 4.4, it still retains some inefficiencies. In particular, we are still forced to compute intermediate vertices and edges that are clipped away at subsequent stages. This is a fundamental limitation of the incremental approach that can be overcome by an algorithm that works with all n views simultaneously. Let us briefly sketch the outline of a better algorithm based on the idea of graph search. This algorithm would traverse the 1-skeleton beginning with some vertex that is known to belong to the visual hull, and branch out from this vertex to discover new vertices and edges. To initialize the search, we can use a frontier point (which under ideal conditions is guaranteed to lie on the visual hull), or some intersection point found by brute force. Next, we use the knowledge of the local connectivity of the 1-skeleton at this vertex to find its incident edges. Recall that the connectivity of a frontier point can be determined using the catalogue of Figure 4.7, while the connectivity of an intersection point is always fixed (see Figure 4.11). Having identified the incident edges of a given vertex, we trace each of them using Algorithm 4.3 while simultaneously looking for the next endpoint. If the edge belongs to the curve Iij, it needs to be traced concurrently in every other view k ≠ i, j until it crosses from the inside to the outside of the silhouette in one of the views.

Just as with the modified incremental algorithm, this improved algorithm becomes more complicated when we allow the 1-skeleton to have multiple connected components. To traverse all the connected components, one needs a method for efficiently finding an initial vertex in each component, or for detecting when certain components of the 1-skeleton remain to be traversed. Overall, the graph traversal method, while superior in principle, is much more difficult to specify and implement correctly than Algorithm 4.4. Within the scope of this thesis, simplicity and correctness have been given priority over computational efficiency. Thus, at this time we have not implemented any of the possible improvements to Algorithm 4.4.

4.6 Computing the Faces of the Visual Hull

Finally, we consider the problem of computing a full boundary representation of the visual hull, starting with the 1-skeleton output by Algorithm 4.4. Fortunately, very little additional work remains to be done: the 1-skeleton, which describes the boundaries of all visual hull faces, contains enough implicit information to allow us to identify the interiors of these faces. Recall that a visual hull face is a subset of a cone strip, or the part of ∂V lying on the surface of one of the original visual cones Ki. The whole cone strip may be implicitly represented as a collection of intervals along all possible visual rays

Li(u) ≃ [ Qi ∧ Ri   Ri ∧ Pi   Pi ∧ Qi ] xi(u) , (4.30)

where xi(u) is any point on the outline γi. Of course, each interval is bounded by a point belonging to an intersection curve of Ki and any other cone in the scene. The ability to identify the boundaries of all intervals along a particular visual ray, as well as the relative ordering of the intervals, enables us to perform many useful computational tasks involving ∂V. Possibly the most important task for our purposes is vertical decomposition, which will be discussed in Section 4.6.2. But before we can describe the vertical decomposition process, we need to introduce some basic techniques for reasoning about the ray intervals that are the "building blocks" of the faces of ∂V.

4.6.1 Ray Intervals

In this section, we will describe two primitive operations that are at the heart of the algorithm presented in the next section. These operations are: ordering interval endpoints along the orientation of the visual ray, and classifying endpoints of a ray interval as front or back with respect to the ith view.

First, let us define a linear ordering for the points on a given visual ray.

Definition 4.7 (Linear Ordering For Points). Let X and Y be two distinct points on the visual ray Li. Then we can define an ordering relation on these points as follows:

X < Y iff Li ≃ X ∨ Y . (4.31)

The above relation can be used to order all the intervals along which Li intersects ∂V. Intuitively, we can think of this as sorting all intervals in increasing order of their "distance" from the camera center Oi (one must be careful, though, because the notion of distance does not exist in projective geometry).

The ordering (4.31) can be easily computed by looking at the projections of the points X and Y into any other view j. The visual ray Li projects onto the epipolar line lji = Fij xi, while X and Y project onto points xj and yj.

The projection preserves the relative orientation of X and Y, so we have

Li ≃ X ∨ Y iff lji ≃ xj ∨ yj . (4.32)
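In coordinates, the test (4.32) is a single cross product followed by a dot product; a numpy sketch, assuming all arguments are oriented homogeneous 3-vectors:

```python
import numpy as np

def precedes(l_ji, x_j, y_j):
    # X < Y on the visual ray L_i iff the join x_j ∨ y_j of the two projections
    # is a positive multiple of the projected ray l_ji (Eq. 4.32)
    return float(np.cross(x_j, y_j) @ l_ji) > 0.0
```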

The notion of linear ordering can be easily extended from individual points lying on a visual ray to edges of the same cone strip that are intersected by the same visual ray.

Definition 4.8 (Linear Ordering For Edges). Let E and F denote edges of the ith cone strip that are touched by the ray Li(u) in two distinct points X(u) and Y(u). Then we have

E < F iff X(u) < Y(u) .

Notice that the ordering of edges must remain constant over the maximal open interval of u where the two edges are touched by the same visual ray. For the ordering to change, the two edges would have to intersect each other in their interiors. However, such an event is not possible by construction of the 1-skeleton (i.e., edges can meet only at endpoints).

The vertical decomposition algorithm uses edge ordering to build an active edge list that is dynamically maintained via insertions and deletions. An edge is inserted when the sweeping ray passes its first endpoint (the endpoint that is reached first as u varies in the increasing direction) and deleted when the ray passes its second endpoint. Before we can insert a new edge E into the list, we must locate its proper position with respect to the edges already in the list. Let E1, ..., El denote the edges in the active list, in order. We compute the points X1, ..., Xl where the sweeping ray intersects each respective edge, locate the first vertex X of E in this list of points using image-based computations as in (4.32), and insert E at the corresponding position in the edge list. Sometimes it is necessary to simultaneously insert two edges that share the same first endpoint X (X may be an intersection point or a type 3A critical point). In these cases, simply locating X is not sufficient to determine which of the two edges should come first in the list. One way to solve this problem is to select a value of u that falls in the interior of both edges, construct the points where they meet the ray Li(u), and test the relative orientation of the two points. However, when X is a type 3A critical point, a more elegant method can be used. This method involves the classification of edges as front or back. We will return to the ordering issue after introducing this classification.
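A sketch of the insertion step, where precedes_at is a hypothetical predicate implementing Definition 4.8 through the image-based test (4.32):

```python
def insert_active(active, new_edge, u, precedes_at):
    """Insert new_edge into the ordered active list at sweep position u.
    precedes_at(E, F, u) returns True when E < F along the ray L_i(u)."""
    pos = 0
    while pos < len(active) and precedes_at(active[pos], new_edge, u):
        pos += 1
    active.insert(pos, new_edge)
```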

We will begin by defining the front/back classification for intersection curve points belonging to the 1-skeleton.

Definition 4.9 (Front/Back Classification For Points). Let X be a point on the boundary of the ith cone strip, and let Li be the visual ray passing through X. Also, let S denote the interval on Li that lies on ∂V and has X as an endpoint. Then X is called front (resp. back) with respect to the ith view if it is the first (resp. last) point of S. That is, for any point Y in the interior of S, we have X < Y (resp. X > Y).

The following proposition gives a convenient image-based criterion for determining the front/back status of X.

Proposition 4.10 (Front/Back Status of X). The point X of Iij, considered as part of the boundary of the ith cone strip, is front (resp. back) if and only if

tj ∨ eji < 0 (resp. > 0) ,

where tj is the tangent to γj at the point xj ≃ PjX.

Proof. First, let us apply Definition 4.9. Let Y be a point on Li that is located in the interior of the strip. Then X is a front point if Li ≃ X ∨ Y, and is a back point if Li ≃ Y ∨ X. Let us consider the information available in the jth view. The visual ray Li projects onto the epipolar line lji = Fij xi, while X and Y project onto points xj and yj satisfying lji ≃ eji ∨ xj and lji ≃ eji ∨ yj. How can we tell by looking at the projection of X in the jth image whether X is the front or back endpoint of its ray interval? Ideally, we would like an expression that does not include the "fictitious" point yj. If X is front, then Li ≃ X ∨ Y, and lji ≃ eji ∨ xj ≃ xj ∨ yj. Therefore,

tj ∨ yj ≃ −tj ∨ eji , (4.33)

where tj = xj ∨ x′j is the tangent to γj at xj. By convention, tj is oriented so that the interior of the silhouette is located on its positive side. Since the interior of the ith cone strip projects into the interior of the silhouette in the jth image, we must have tj ∨ yj > 0. From (4.33), we can conclude that

tj ∨ eji < 0 .

Similarly, when X is back, we have

tj ∨ eji > 0 .

The above argument can be easily reversed to show that whenever tj ∨ eji is negative (resp. positive), then X is the front (resp. back) endpoint of its ray interval.

Notice that the sign of tj ∨ eji can change only at a frontier point. Since the edges of the 1-skeleton cannot contain frontier points in their interior, the front/back status is actually the same for all points in the interior of a single edge of the 1-skeleton. Thus, just as with linear ordering, we can extend the definition of front/back status to apply to whole edges of the 1-skeleton.

Definition 4.10 (Front/Back Classification for Edges). An edge E of the 1-skeleton, considered as part of the boundary of the ith cone strip, is called front (resp. back) if each point in its interior is a front (resp. back) point according to Definition 4.9.

How can we determine the front/back status for a given edge E that lies on an intersection curve Iij without picking a representative point X in its interior? More accurately, we would like to determine the status of E by taking X to be the first vertex of E encountered as the visual ray moves in the increasing direction of u. As long as the first vertex is not a type 1 or 3 critical point, we can simply compute the sign of tj ∨ eji. Otherwise, X projects to a frontier point xj(v0) in the jth view, so we have tj ∨ eji = 0. To get around this difficulty, we must consider the local behavior of the function

ϕj(v) = |xj(v), x′j(v), eji| = tj(v) ∨ eji

in the neighborhood of its zero v0. Recall that a similar function was first introduced in (4.12). As discussed on p. 108, the sign of the derivative ϕ′j(v0) ≃ κj λj (4.13) tells us about the sign of ϕj in the interior of E. At this stage, we have to take into account the sign label associated with E in the combinatorial representation of the 1-skeleton (see p. 147). If the sign label is ++, then v moves in the increasing direction away from v0 as we traverse E in the increasing direction of u. In this situation, a positive sign of ϕ′j(v0) means that ϕj is increasing at v0, so it becomes positive in the interior of E. Similarly, if ϕ′j(v0) is negative, then ϕj is negative in the interior of E. If the label is +−, then v decreases as u increases, and the rule becomes reversed. Namely, ϕ′j(v0) > 0 (resp. < 0) implies that ϕj is negative (resp. positive) in the interior of E. Below, we summarize the rule for determining the front/back status of the edge E based on its first vertex X.

Algorithm 4.5 (Determine Front/Back Status).

If X is a type 2 critical point or an intersection point on the curve Iij
    E is front (resp. back) if |xj, x′j, eji| < 0 (resp. > 0).
Else [X is a type 1 or 3 critical point on Iij]
    If the label is ++
        E is front (resp. back) if κj λj < 0 (resp. > 0).
    Else [the label is +−]
        E is front (resp. back) if κj λj > 0 (resp. < 0).
    End If
End If
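In code, Algorithm 4.5 is a chain of sign tests. A sketch; the two scalar arguments stand for the quantities |xj, x′j, eji| and κj λj computed from the image data:

```python
def front_back_status(first_vertex_kind, sign_label, det_tangent_epipole, kappa_lambda):
    """Front/back status of an edge from its first vertex (Algorithm 4.5)."""
    if first_vertex_kind in ('type 2', 'intersection'):
        return 'front' if det_tangent_epipole < 0 else 'back'
    # first vertex is a type 1 or type 3 critical point
    if sign_label == '++':
        return 'front' if kappa_lambda < 0 else 'back'
    return 'front' if kappa_lambda > 0 else 'back'   # sign label '+-'
```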

At this point, we can return to the issue of computing the relative linear ordering of two edges that share the same front endpoint. As mentioned earlier in the section, this ordering can be decided for type 3A critical points using only local information (as opposed to selecting sample points in the interior of the edges). The two different cases are shown in Figure 4.12. In the figure, the edge that comes first (resp. second) in the linear ordering is labeled near (resp. far). The two cases can be characterized as follows. In Figure 4.12 (a), where the front edge is near and the back edge is far, the visual ray Li is locally outside the ith cone strip at X — that is, all points of Li in the immediate neighborhood of X (except, of course, X itself) do not belong to the strip. By contrast, in Figure 4.12 (b), Li is locally inside the cone strip. The following proposition tells us how to distinguish the two cases.

Proposition 4.11. Let (u0, v0) be a type 3A critical point and X be the corresponding point of the 1-skeleton that projects onto xi(u0) and xj(v0) on γi and γj, respectively. Then the visual ray Li(u0) is locally outside (resp. inside) the ith cone strip at X if κj > 0 (resp. < 0), where

κj = |xj(v0), x′j(v0), x″j(v0)| .

Figure 4.12 Two ordering possibilities for two edges sharing the same first endpoint X.

Proof. First, consider the case when Li is locally outside the cone strip. In the jth image, Li projects onto the epipolar line lji = Fij xi that is tangent to γj at xj. We must have lji ∼ tj, where tj is the tangent to γj at xj. Since the interior of the ith cone strip projects to the interior of the silhouette ωj bounded by γj, all points of lji in the neighborhood of xj must lie outside the silhouette. In this case, the orientation of lji does not matter, so we can also say that the tangent tj is locally outside the silhouette. Because of the convention that the silhouette must lie on the positive side of tj, we can conclude that γj is locally convex at xj. By Proposition 3.15, we must have κj > 0. In the case when Li is locally inside the cone strip, the tangent tj ∼ lji must lie locally inside ωj, so γj is locally concave at xj and κj < 0.
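On the piecewise-linear contours actually used in our implementation (Section 5.1.1), the sign of κj can be estimated from three consecutive contour vertices; a sketch, assuming inhomogeneous 2D vertex coordinates and the orientation convention of Chapter 5 (silhouette interior on the left):

```python
import numpy as np

def curvature_sign(x_prev, x, x_next):
    # discrete analogue of sgn |x_j, x_j', x_j''|: the z-component of the cross
    # product of the incoming and outgoing edge vectors; positive where the
    # contour is locally convex
    e_in = np.asarray(x) - np.asarray(x_prev)
    e_out = np.asarray(x_next) - np.asarray(x)
    return np.sign(e_in[0] * e_out[1] - e_in[1] * e_out[0])
```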

4.6.2 Vertical Decomposition

Vertical decomposition breaks up each cone strip into a collection of monotone cells:

Definition 4.11 (Monotone Cells). A subset F of the visual cone Ki is called monotone if each visual ray of the form (4.30) intersects the interior of F along at most a single non-degenerate interval (an interval having two distinct endpoints).

Vertical decomposition is a technique widely used in computational geometry [17], and in our case, the set of "vertical" directions is simply the set of all visual rays that form a given cone. There are several important reasons for representing cone strips as collections of monotone cells:

• ∂V can have multiply connected faces that are bounded by two or more disjoint loops of intersection curve edges. To compute a valid description of ∂V, it is necessary to group all loops that form the boundary of a single face. As it turns out, decomposition into monotone cells is the most convenient way of solving this problem.

• In computational geometry, one of the main reasons for performing a vertical decomposition is to obtain a geometric structure where each unit or cell has "constant description complexity". The same consideration applies in our case. Whereas a face of ∂V may have a boundary consisting of an arbitrary number of loops, and each loop may consist of an arbitrary number of intersection curve edges, a single cell can be given a fixed-size description, consisting of a front intersection curve edge, a back edge, and at most two "vertical" edges.

• Vertical decomposition facilitates triangulation, a process that converts the formal description of ∂V into a mesh suitable for rendering by standard graphics software. The simple cells output by the vertical decomposition algorithm can be triangulated using a simple linear-time algorithm for monotone polygons [45]. All the 3D meshes shown in Chapter 5 (e.g., Figure 5.9) are obtained using this method.

The algorithm for vertical decomposition performs a sweep along the contour γi in the increasing direction of u, dynamically updating the set of ray intervals belonging to the ith cone strip as u varies. Each interval is identified by the near/far set of edges bounding it, and the edges are kept in the ordered active edge list, as described earlier in Section 4.6.1. The active edge list needs to be updated whenever the sweeping ray reaches the first or second endpoint of some edge. For the sake of presentation, we distinguish between critical events that change the number of intervals and combinatorial events that simply replace one edge by another in the list. Critical events are shown in Figure 4.13, and combinatorial events are shown in Figure 4.14. Critical events are of much more interest to us, and we describe them in detail below.

Figure 4.13 Critical events for traversing the ith cone strip: (a) appear, (b) split, (c) disappear, (d) merge. The direction of increasing u is from left to right.

Appear (Figure 4.13 (a)). For a new interval to appear along the visual ray, the point X on Li must be the first endpoint of two segments of the 1-skeleton. The front and back edges that share the endpoint X are labeled Ef and Eb in Figure 4.13. For this event, we have Ef < Eb, and Li is locally outside the ith cone strip at X (κj > 0 when X is a type 3A critical point). When the sweepline algorithm encounters an appear event, it needs to determine the relative order of Ef and Eb, and to insert them into the active edge list. Proposition 4.11 tells us how to recognize this event when X is a type 3A critical point. When X is an intersection point, we must determine the near/far status of Ef and Eb by sampling points in their interior as explained earlier.

Split (Figure 4.13 (b)). An existing ray interval splits when the first endpoint of two edges cuts into its interior. Unlike the appear event, the split event is characterized by Eb < Ef and by Li being locally inside the ith cone strip. In this case, X may be an intersection point or a critical point of type 3A with κj < 0 (Proposition 4.11). Upon encountering the split event, the decomposition algorithm locates the two edges Ep and Es that immediately precede and follow Eb and Ef in the active edge list, and breaks up the cone strip along the "vertical" interval of Li that connects Ep to Es. This interval will form part of the boundaries of three monotone cells in the final output.

Disappear (Figure 4.13 (c)). A ray interval disappears when Li passes the second endpoint of two edges Ef and Eb, with Ef < Eb. As in the appear event, Li is locally outside the strip at X (κj > 0 when X is a type 3A critical point), and the two edges are simply deleted from the active edge list.

Merge (Figure 4.13 (d)). Two distinct ray intervals merge when the visual ray passes the second endpoint of Ef and Eb, with Eb < Ef. In this case, Li is locally inside the strip at X (κj < 0 when X is a type 3A critical point), and, symmetrically to the split event, the algorithm breaks up the cone strip along the "vertical" interval of Li at this position.

Let us briefly discuss the combinatorial events illustrated in Figure 4.14. In parts (a) and (b) of the figure, the visual ray passes a type 1 or 2 critical point X (that is, a frontier point in the ith view). In parts (c) and (d) of the figure, the visual ray passes an intersection point X that connects two front (resp. back) edges bounding the cone strip. In each case,

the algorithm simply needs to delete the edges having X as their second endpoint (denoted Ef and Eb in the figures) and replace them with the edges having X as the first endpoint (denoted E′f and E′b).

Figure 4.14 Combinatorial events in the sweepline vertical decomposition algorithm: (a) type 1 critical point, (b) type 2 critical point, (c) front intersection point, (d) back intersection point.

The final output of the vertical decomposition algorithm consists of a list of monotone cells formed by breaking up the strip along the intervals created during the processing of split and merge events. The monotone cells themselves can be thought of as maximal contiguous collections of ray intervals having the same topological structure. The boundary of each cell is formed by a single front edge, a single back edge, and up to two vertical edges, or intervals along the first and last ray to come in contact with the cell.

Figure 4.15 shows an example of cell decomposition on the Steve data set shown in Figure 5.17. Part (a) shows a close-up of part of the boundary of one cone strip, with the split and merge points marked, and part (b) shows the boundary with the vertical edges inserted by the decomposition algorithm. Because the abstract representation of the cone strip boundary is difficult to visualize, we have plotted the boundary in the space parametrized by u, the contour parameter, and d, the Euclidean distance from the camera center. We must emphasize that distance is used here for visualization purposes only.


Figure 4.15 Vertical decomposition example. The horizontal axis corresponds to u, the contour parameter, and the vertical axis corresponds to d, the Euclidean distance from the camera center (see text).

Note. In the process of breaking up the cone strip into monotone cells, the vertical decomposition algorithm sometimes has to insert vertices in the interior of edges that belong to the 1-skeleton. However, the decomposition of two different cone strips that share a given edge will generally result in different vertices being created on that edge. To avoid inconsistencies between boundary descriptions of different strips, vertical decomposition should not modify the structure of the original 1-skeleton. Rather, it should output an independent, self-contained description of an individual cone strip.

4.6.3 Convex Objects: The Visual Hull and the Rim Mesh

We conclude this chapter by considering the relationship between the visual hull and the rim mesh of a convex object. First, let us recall the restrictive assumptions on p. 113 that enabled us to specify an algorithm to compute the rim mesh based on image information alone. In the case of a convex object, conditions (A1) and (A3) are satisfied automatically.

To satisfy the remaining condition (A2), we simply need to assume that two or more cameras are present in the scene and that the rim mesh is connected. In addition, we will assume that there is no noise in the data, so that all frontier points in the images come in matching pairs. Then the set of all singularities of intersection curves is exactly the set of all rim crossings, or vertices of the rim mesh.

Any outline γi of a convex object is a convex curve. Because the rim mesh is connected, there must exist at least one other outline γj that has two or more matching frontier points with γi. Let xi and xj be a single pair of matching frontier points. Because the epipolar lines lji = Fij xi and lij = Fji xj intersect γj and γi only at the respective frontier points, the intersection curve Iij cannot have type 2 or 3 critical points. It is also clear that the visual rays Li and Lj formed by back-projecting xi and xj graze ∂V at a single frontier point X that is the intersection of rims Γi and Γj, as well as a singularity of Iij (a vertex of the 1-skeleton). Overall, any visual ray through a frontier point has only one (degenerate) interval on ∂V. Because the absence of type 2 and 3 critical points rules out critical events that can change the number of ray intervals as we traverse a cone strip, we must conclude that every visual ray intersects its respective cone strip in exactly one interval.

Consider what happens as we sweep out the ith cone strip to perform vertical decomposition. At each possible position, the sweeping ray Li meets the strip along a single interval that contains a point on Γi in its interior (obviously, Li must touch the rim, and the point of tangency must lie on ∂V). When Li passes through a rim crossing, its ray interval degenerates to a single point, causing a critical event that terminates one monotone cell and initializes another. As Li moves along Γi from a particular frontier point to its successor, it sweeps out a single monotone cell. Thus, the set of cells into which vertical decomposition breaks up the ith cone strip is in one-to-one correspondence with the edges of the rim mesh that belong to Γi. In addition, each cell is completely bounded by a loop of intersection curve edges, so it is also a face of ∂V.

Let v, e, and f denote the number of vertices, edges, and faces of ∂V. We have

v = vr + vi ,

where vr is the number of vertices of the rim mesh, and vi is the number of intersection points. As we have argued above, the faces of ∂V are in one-to-one correspondence with edges of the rim mesh. Therefore, we also have

f = er = 2 vr ,

where vr and er are the numbers of vertices and edges of the rim mesh. The last equality follows from (4.18). Starting with Euler's formula v − e + f = 2 (the genus of a convex object is 0), we get

e = v + f − 2 = vr + vi + 2 vr − 2 = 3 vr + vi − 2 . (4.34)

But we also know that the sum of the degrees of all vertices is equal to twice the number of edges. In the 1-skeleton of ∂V , the degree of a frontier point is 4 and the degree of an intersection point is 3, so we have

2 e = 4 vr + 3 vi . (4.35)

Eliminating e from (4.34) and (4.35) and solving for vi, we obtain the number of intersection points as a function of the number of frontier points:

vi = 2 vr − 4 .

Now we can also express the total number of vertices and edges of ∂V in terms of vr:

v = vr + vi = vr + (2 vr − 4) = 3 vr − 4 ,
e = v + f − 2 = (3 vr − 4) + (2 vr) − 2 = 5 vr − 6 .

Finally, the total complexity (the total number of vertices, edges, and faces) of ∂V is

v + e + f = (3 vr − 4) + (5 vr − 6) + (2 vr) = 10 vr − 10 .

The most interesting thing about the above derivation is that we can find the exact combinatorial complexity of ∂V given only the number of vertices in the rim mesh. As in the case of many other geometric problems, the assumption of convexity allows us to carry out the kind of analysis that is difficult or impossible otherwise.
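These relations are easy to sanity-check numerically. The snippet below is a sketch that reproduces the counting argument; for vr = 24 it returns the sizes reported for the egg in Section 5.3.

```python
def convex_visual_hull_counts(v_r):
    """Combinatorial size of the visual hull of a convex object (Section 4.6.3),
    given the number v_r of rim-mesh vertices (frontier points)."""
    v_i = 2 * v_r - 4                    # intersection points
    v = 3 * v_r - 4                      # vertices of the hull boundary
    e = 5 * v_r - 6                      # edges
    f = 2 * v_r                          # faces
    assert v - e + f == 2                # Euler's formula, genus 0
    assert 2 * e == 4 * v_r + 3 * v_i    # degree sum: frontier points have degree 4,
                                         # intersection points degree 3
    return v, e, f

print(convex_visual_hull_counts(24))     # (68, 114, 48), as for the egg of Section 5.3
```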

CHAPTER 5

Implementation and Results

This chapter is concerned with implementation of the algorithms developed in Chapter 4. In Section 5.1 we discuss the main issues that arise in putting the theory of Chapter 4 into practice, including efficiency and robustness considerations. In Section 5.2, we present rim mesh examples published earlier [32]. Section 5.3 contains several examples of complete visual hull meshes computed from challenging real data.

5.1 Implementation Details

5.1.1 Discrete Contour Representation

The algorithms for computing the rim mesh and the visual hull require properly oriented contours of the object as input. So far, we have not said how these contours may be obtained.

Segmentation is a very difficult issue in computer vision, so we have not attempted to write an automatic procedure for contour extraction in general scenes. Instead, we have obtained contours using a two-stage process: in the first stage, original input images are manually converted to binary silhouette images such as the ones shown in Figure 5.4 (although in that case, the silhouette images were generated directly from synthetic data using OpenGL), and in the second stage, contours are extracted from the binary image using a subpixel-precision implementation of the Canny edge detection algorithm [5]. The silhouette images follow the convention of representing the foreground by white and background by black. Based on this

convention, the edges are oriented so that white is on their positive side (that is, on their left). For example, the contours of the egg from Figure 5.4 are all oriented counterclockwise.

Following extraction, contours are represented as piecewise-linear closed curves (so the silhouettes are simple polygons). In our original implementation [32], contours were explicitly modeled as C2 curves using cubic B-splines. However, in switching to a piecewise-linear representation we were able to simplify the implementation without reducing the quality of the results. For all our input datasets, contour resolution was relatively high (average spacing of about 1 pixel between consecutive vertices), and we have not seen any discretization artifacts. For cases where higher resolution is required, our program includes an option to upsample the contour using spline interpolation.

To derive the theoretical results of Chapters 2 and 4, we have assumed that all apparent contours in the images are smooth. To adapt this framework to the discrete setting, we need to redefine frontier points and to give alternative formulas for the differential quantities

κλ and ν defined in Propositions 4.4 and 4.9.

A frontier point on a piecewise-linear contour is a vertex x whose two incident edges have different relative orientations with respect to the epipole e. This definition is illustrated in Figure 5.1 (a). As in the figure, let xp and xs be the predecessor and the successor of x along the contour orientation, and let tp = xp ∨ x and t = x ∨ xs be the two "tangents" to the edges incident to x. Then x is a frontier point if

tp ∨ e ≃ −t ∨ e . (5.1)

Notice that we have to make the assumption that the epipole does not lie on either tp or t.

In the next section, we will argue why this assumption is justified.

In Section 4.2, we have shown that the sign of κλ tells us about the change of the relative orientation of the contour tangent and the epipole (recall the discussion on p. 108). To summarize, when κλ > 0, the epipole migrates from the negative to the positive side of the tangent, and when κλ < 0, it migrates from the positive side to the negative. This

reasoning directly translates to the following rule:

tp ∨ e < 0 and t ∨ e > 0 : κλ > 0 ,
tp ∨ e > 0 and t ∨ e < 0 : κλ < 0 . (5.2)

For example, in Figure 5.1 (a), κλ > 0 at x.


Figure 5.1 (a) Computing κλ: x is a frontier point. (b) Computing ν.

Another important quantity is ν, as defined in Proposition 4.9. Let x be a point of intersection of the contour and the epipolar line l (Figure 5.1 (b)). Assuming that x is not a vertex, we simply have

ν = t ∨ e , (5.3)

where t is the properly oriented line through the edge containing x. For example, in Figure 5.1 (b) we have ν < 0 at x.
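All three discrete rules reduce to sign tests on joins, which in coordinates are cross and dot products. A numpy sketch, assuming contour vertices and the epipole are given as oriented homogeneous 3-vectors:

```python
import numpy as np

def edge_tangents(x_prev, x, x_next):
    # oriented "tangents" t_p = x_p ∨ x and t = x ∨ x_s of the two incident edges
    return np.cross(x_prev, x), np.cross(x, x_next)

def is_frontier(x_prev, x, x_next, e):
    # rule (5.1): the incident edges have opposite orientations relative to e
    t_p, t = edge_tangents(x_prev, x, x_next)
    return (t_p @ e) * (t @ e) < 0

def kappa_lambda_sign(x_prev, x, x_next, e):
    # rule (5.2): +1 when the epipole migrates from the negative to the positive
    # side of the tangent, -1 for the opposite migration, 0 otherwise
    t_p, t = edge_tangents(x_prev, x, x_next)
    if t_p @ e < 0 < t @ e:
        return +1
    if t @ e < 0 < t_p @ e:
        return -1
    return 0

def nu_sign(x_a, x_b, e):
    # rule (5.3) for a point in the interior of the edge (x_a, x_b): ν = t ∨ e
    return np.sign(np.cross(x_a, x_b) @ e)
```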

In our implementation, we use (5.1) to localize frontier points, and (5.2) to determine the ordering of rims around frontier points for rim mesh construction (Proposition 4.4 and Algorithm 4.1). In addition, (5.2) and (5.3) are used to decide the critical point type for intersection curve tracing (Proposition 4.9 and Algorithm 4.2), as well as to find the front/back status of edges bounding the 1-skeleton of the visual hull (Algorithm 4.5).

5.1.2 General Position Assumptions

As already noted in the previous section, to make sure that the expressions (5.1), (5.2), and (5.3) are always valid, one must assume that the epipole never falls exactly on a tangent to one of the edges of the discrete contour, and that a reprojected epipolar line never intersects the contour at a vertex. In the implementation, we assume that these conditions always hold. That is, we make a general position assumption about the input. In computational geometry, general position assumptions are often made to reduce the number of special cases that need to be addressed when designing and analyzing algorithms. Though real-world inputs to geometric algorithms are frequently "degenerate" (a fact that greatly complicates implementation), in our situation the general position assumption appears to be justified. The main reason is that our input data is derived from imprecise measurements. Positions of contour vertices are affected by noise in the image, while camera matrices and epipole positions are affected by calibration error. In short, it is fair to assume that all the quantities used as input to our program (represented as double-precision floating point numbers) have a certain number of "significant digits", with the rest being random noise. This noise in our case performs a beneficial function, perturbing the data to eliminate degenerate situations.

For our purposes, the most important consequence of the general position assumption is the non-existence of pairs of matching frontier points. Recall from Definition 4.3 that matching frontier points are points of epipolar tangency that lie on exactly matching epipolar lines.

However, in practice, matching frontier points never lie on exactly matching epipolar lines. Errors in contour extraction and camera calibration effectively perturb the observed image locations of frontier points, so that the rays back-projected from the 2D points no longer meet in 3D. As a result, whenever we have a frontier point in one image, the corresponding epipolar line in the other image either misses the contour, or intersects it in a pair of nearby points. In the discrete implementation, frontier points are vertices of the contour, while the

general position assumption states that reprojected epipolar lines can intersect the contour only at points located in the interior of edges.

Figure 5.2 contrasts the “ideal” case of matching frontier points and the usual “degen- erate” case obtained with real-world data. In part (a), xi and xj are two matching points on γi and γj, respectively. The back-projected rays Li and Lj intersect in the 3D frontier point X, and the intersection curve Iij has a crossing, or a type 1 critical point as explained in Section 4.4.1. Figure 5.2 (b) shows a more realistic situation. There, the contours γi and

γj have been "perturbed" so that the frontier points xi and xj no longer correspond (the rays Li and Lj do not meet in space). The epipolar line lji = Fij xi does not coincide with the contour tangent tj, but intersects γj in two points xj1 and xj2, while the other epipolar line lij = Fji xj misses γi altogether. In Figure 5.2 (b), the intersection curve Iij loses the crossing and acquires two type 2 critical points instead.


Figure 5.2 (a) Ideal singularity (type 1 critical point) of the intersection curve. (b) Real-world data: type 2 critical points.

The non-existence of matching frontier points has both favorable and unfavorable practical consequences. For example, it complicates the procedure for computing the rim mesh (Algorithm 4.1), which assumes that all frontier points come in matching pairs. In the implementation, we use a greedy matching strategy based on the non-projective-invariant distance function

dist(xi, xj) = dist⊥(xi, Fji xj) + dist⊥(xj, Fij xi) ,

where dist⊥(x, l) is the perpendicular distance between the point x and the line l. In each round, we match the pair having the minimum value of dist(xi, xj), stopping either when all the points have been matched or when the distance values have exceeded some threshold. In case unmatched points remain when this process terminates, Algorithm 4.1 cannot be run to completion. In our experience, these failures have limited the range of inputs for which the rim mesh can be computed successfully. However, because the applicability of Algorithm 4.1 is limited a priori by the restrictive assumptions listed on p. 113, and because rim meshes provide only a limited amount of useful geometric information for reconstruction, we have not attempted to improve our implementation with more robust frontier point matching.
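A sketch of this greedy procedure in Python; the threshold and the representation of image points as (u, v, 1) homogeneous vectors are assumptions:

```python
import numpy as np

def perp_dist(x, l):
    # perpendicular distance between a point x = (u, v, 1) and a line l = (a, b, c)
    return abs(l @ x) / np.hypot(l[0], l[1])

def greedy_match(pts_i, pts_j, F_ij, F_ji, threshold):
    """Greedy frontier-point matching by the symmetric epipolar distance."""
    candidates = sorted(
        (perp_dist(xi, F_ji @ xj) + perp_dist(xj, F_ij @ xi), a, b)
        for a, xi in enumerate(pts_i) for b, xj in enumerate(pts_j))
    used_i, used_j, matches = set(), set(), []
    for d, a, b in candidates:
        if d > threshold:
            break
        if a not in used_i and b not in used_j:
            matches.append((a, b))
            used_i.add(a)
            used_j.add(b)
    return matches
```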

On the other hand, the apparent "instability" of frontier points in real input is actually an advantage when tracing intersection curves, since it implies the absence of type 1 critical points. Fortunately, noise and calibration error do not create any new special cases not described by the theory of Section 4.4.2. As shown in Figure 5.2 (b), perturbing an intersection curve crossing merely creates pairs of type 2 or 3 critical points that are already handled by Algorithm 4.2. Because we do not need to recognize and process type 1 critical points, the number of special cases that must be handled by the code of Algorithm 4.2 is greatly reduced.

Overall, we have verified the general position assumption on a wide range of data: from the synthetic egg sequence (Figure 5.4), where noise is minimal and calibration error is non-existent, to the complex Steve sequence (Figure 5.17), where large numbers of critical points and complex contour shapes offer ample opportunities for near-degeneracy.

The general position assumption has served us well at several stages of visual hull computation. In particular, we have not experienced any combinatorial problems when finding critical points or tracing intersection curves. However, just as with matching frontier points in Algorithm 4.1, matching triples of intersection points has proven to be the main source of instability. Recall from the discussion of Section 4.5.2 that each intersection point is located three times during the construction of the 1-skeleton: when clipping the intersection curve Iij against the kth view, Iik against the jth view, and Ijk against the ith view. Each intersection point is represented using the triple (u, v, w) of parameters in views i, j, and k, respectively. To match the triples, we use a greedy strategy very similar to the one used in our implementation of Algorithm 4.1. If (u1, v1, w1), (u2, v2, w2), and (u3, v3, w3) are three intersection points between the same triple of views, then the distance function for matching them is simply

Σ(l,m) ∈ {(1,2), (1,3), (2,3)} ( |ul − um| + |vl − vm| + |wl − wm| ) .

Though intersection point matching has worked reliably in most cases, we have seen a few problems in our most challenging input, the Steve dataset. To be specific, beginning with 3564 unmatched points, 1179 triples were found, while 27 points remained unmatched. Unlike the rim mesh construction algorithm, the visual hull construction algorithm does not attempt to compute a globally consistent topological representation, so it does not fail because of a small number of unmatched intersection points. Our code uses a variety of simple heuristics to repair the small geometric inconsistencies caused by unmatched points.
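The corresponding distance computation is straightforward (a sketch):

```python
def triple_distance(p1, p2, p3):
    # each p = (u, v, w); sum of coordinate-wise L1 distances over the three pairs
    pts = (p1, p2, p3)
    return sum(abs(a - b)
               for l, m in ((0, 1), (0, 2), (1, 2))
               for a, b in zip(pts[l], pts[m]))
```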

Part of the blame for the occasional failure of matching could reside with the transfer function used for tracing, in the third view, the intersection curve between the first two views.

We have implemented the epipolar transfer method as derived in (2.30) and (4.28), whose potential shortcomings are discussed in Sections 2.4.3 and 4.5.1, as opposed to the more robust and accurate trifocal tensor method of (2.33) and (4.29). Though we have not seen any severe degeneracy problems with transfer, insufficient numerical accuracy in certain geometric configurations may have caused failures in matching intersection points. In the future, it will be necessary to implement the second transfer method and compare its performance with the simpler method currently used.

5.1.3 3D Reconstruction

In order to transform the image-based representation of the visual hull output by the algorithms of Chapter 4 into a representation that is convenient for visualization, we need a method for computing the 3D coordinates of points based on their projections in two or more images. Recall that our image-based representation stores pairs (u, v) and triples (u, v, w) of parameter values for points in the interior of 1-skeleton branches and intersection points, respectively. From these parameter values, we derive the coordinates of the projections of 3D points in the respective images, and to compute the original 3D positions (in a projective coordinate system, of course), we need only apply a triangulation formula. One such formula is (2.34) in Section 2.4.3. However, this formula is not very suitable for implementation for several reasons: it works only with two views, assigns asymmetric roles to the two 2D points, and requires an arbitrary choice of a line through one of the 2D points. Instead, we have implemented a simple linear method as described in Forsyth and Ponce [15]. Briefly, let X be the (unknown) coordinate vector of a point in 3D, and let xi ≃ PiX and xj ≃ PjX be its projections in the ith and jth image. Then we can write the projection constraints as

xi × PiX = 0 and xj × PjX = 0 , or equivalently [ [xi]× Pi ; [xj]× Pj ] X = 0 , (5.4)

where the matrix [x]× is defined in (A.2). We get a system of six linear homogeneous equations in four unknowns whose least-squares solution gives us an estimate of X. This method can easily accommodate more than two cameras: we can stack as many matrices of the form [xi]× Pi as we want. We have used (5.4) to compute the 3D positions of all points on the 3D skeleton, enabling us to create figures such as Figure 5.9.
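A sketch of the linear method — standard least-squares triangulation via the SVD, mirroring but not reproducing the thesis code:

```python
import numpy as np

def skew(x):
    # [x]_x, the matrix such that skew(x) @ y == np.cross(x, y)
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def triangulate(points, cameras):
    """Least-squares solution of (5.4): stack [x_i]_x P_i for every view and
    take the singular vector of the smallest singular value."""
    A = np.vstack([skew(x) @ P for x, P in zip(points, cameras)])
    return np.linalg.svd(A)[2][-1]   # homogeneous 3D point X
```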

Note. In all the data sets used as input to our program, the cameras have been metrically calibrated. For this reason, the "projective" 3D coordinates computed using (5.4) are actually Euclidean. However, we must emphasize that we have used the metric calibration data only for visualization purposes. In the computation of visual hulls, only projective-invariant quantities have been used.

5.1.4 Efficiency

Before moving on to demonstrate our results, we briefly consider the issue of computational efficiency, which must be addressed if our visual hull modeling program is to be truly useful for large-scale general-purpose object modeling and reconstruction applications. As already stated in Section 4.5.3, in designing the algorithms for this thesis, we have emphasized correctness and ease of implementation, as opposed to computational efficiency. Given these priorities, it is not surprising that our current implementation is somewhat slow. Some of the inefficiencies are inherent in the design of our algorithms, especially Algorithm 4.4.

As explained below, however, our running times can be substantially improved simply by optimizing a few primitive operations.

The two most computationally expensive basic operations in visual hull construction are intersecting contours with lines during intersection curve tracing and clipping, and intersecting contours with reprojected intersection curves during the construction of the 1-skeleton. In our case, we are largely able to avoid the computational overhead of intersecting contours with epipolar lines by exploiting continuity during intersection curve tracing. Recall from Section 4.4.2 that a single intersection curve branch is traced by computing a function v = V(u), where u and v are parameters of the first and second contour, respectively. We move through a range of u values in increasing order, casting epipolar lines into the second image and finding the v values corresponding to the intersections of the epipolar lines with the second contour. In a single step, both u and v change only by a small amount. Moreover, because we know the sign of dv/du, we know whether v will increase or decrease when u is incremented. Thus, to find a new value V(u), we simply need to start searching at the previous value of v in the direction given by the sign of dv/du. Effectively, this approach finds each new intersection in constant time. Because most line-contour intersections performed by our algorithm are incremental, this operation is not a major source of inefficiency.
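A sketch of the incremental search: the contour is a closed polygon of homogeneous vertices, step is ±1 according to the sign of dv/du, and the search is capped at one full loop for safety:

```python
def advance_intersection(vertices, l, edge_prev, step):
    """Find the contour edge crossed by the new epipolar line l, starting the
    search at the previously intersected edge and walking in direction step."""
    n = len(vertices)
    e = edge_prev
    for _ in range(n):                   # at most one full loop around the contour
        a, b = vertices[e % n], vertices[(e + 1) % n]
        if (l @ a) * (l @ b) <= 0.0:     # sign change: l crosses edge (a, b)
            return e % n
        e += step
    return None                          # no crossing: the line misses the contour
```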

The clipping operation is perhaps the most serious bottleneck in the current version of our code. We have implemented a brute-force clipping algorithm that tests each edge of the intersection curve against each edge of the contour. This approach has O(lm) running time, where l and m are the number of vertices of the contour and the intersection curve, respectively. Much better results can be achieved with an output-sensitive algorithm whose running time depends on the number of intersection points between the two curves. It is relatively straightforward to implement a sweep algorithm that runs in O((l + m + k) log(l + m)) time, where k is the number of intersections between the two curves [17].

5.2 Rim Mesh Results

In this section, we show two examples of rim mesh computation first published in the preliminary version of this research [32]. Our first dataset consists of six synthetic egg images generated using OpenGL. The egg model is shown in Figure 5.3 and the silhouette images generated from this model are shown in Figure 5.4. Figure 5.5 shows the 1-skeleton of the rim mesh of the egg from three of the input viewpoints. The positions of the frontier points are calculated using the transfer equation (4.28), and the edges of the rim mesh are shown as straight segments connecting the frontier points (recall that frontier point position is the only geometric information we can recover from the rim mesh). The faces of the rim mesh are difficult to visualize, so they are not shown. The rim mesh of the egg has 24 vertices (frontier points), 48 edges, and 26 faces, obeying

Euler's formula v − e + f = 2 (the genus of the egg is 0, so it does not appear in the formula). In addition, we can verify the relations between vertices, edges, and faces of the rim mesh as derived in Section 4.3:

e = 2 v , f = 2 + v . (5.5)

Figure 5.3 A synthetic egg model.

The second dataset consists of nine images taken from a 30-image turntable sequence of a gourd, courtesy of Edmond Boyer. The original gourd images are shown in Figure 5.6. Figure 5.7 shows the 1-skeleton of the rim mesh of the gourd, represented in the same way as in Figure 5.5. The rim mesh of the gourd has 96 vertices, 192 edges, and 98 faces. Once again, these numbers verify Euler’s formula and the relations (5.5). Notice that most of the vertices (frontier points) are located very close together near the top and the bottom of the gourd, as is typical in turntable sequences. Interestingly, we are able to recover a consistent topological structure of the rim mesh even though the geometric configuration of the frontier points seems rather unfavorable. Figure 5.8 shows an alternative visualization of the rim mesh. We have “flattened” the

1-skeletons of the rim meshes with the help of a publicly available graph drawing program.

The two graphs reveal the regular structure of rim crossings which is more difficult to observe in Figure 5.5 and especially Figure 5.7.


Figure 5.4 The egg sequence: synthetic silhouette images generated by OpenGL.


Figure 5.5 The egg sequence: the 1-skeleton of the rim mesh shown in three of the original viewpoints.


Figure 5.6 The gourd sequence: original images.


Figure 5.7 The gourd sequence: the 1-skeleton of the rim mesh in three of the original images.

The egg (v = 24, e = 48, f = 26). The gourd (v = 96, e = 192, f = 98).

Figure 5.8 An alternative visualization of the 1-skeletons of the rim mesh for the gourd and for the egg. Note that both graphs are planar, though the layout on the right has intersecting edges.

5.3 Visual Hull Results

In this section, we show results of the visual hull construction algorithm for four data sets: the egg (Figure 5.4, 6 views), the gourd (Figure 5.6, 9 views), the squash (Figure 5.14, 3 views), and Steve (Figure 5.17, 11 views). Table 5.1 summarizes some statistics about the size of the computed visual hulls. The synthetic egg sequence was already introduced in the previous section. Because this sequence contains a minimum of distracting geometric detail, it serves as a particularly clear and simple illustration of our algorithm. Figure 5.9 shows the 1-skeleton and the visual hull mesh rendered from three different viewpoints, and Figure 5.10 shows one of the views decomposed into six different cone strips. Finally, Figure 5.11 shows the boundaries of all six cone strips using the same method of visualization as Figure 4.15.

In Section 4.6.3, we have derived formulas for the number of vertices, edges, and faces of the visual hull of a convex object. Using an earlier, less robust version of our visual hull algorithm that attempted to implement the reasoning of Section 4.6.3 to discover the exact topological structure of the visual hull, we have determined that the visual hull of the egg has 68 vertices (24 frontier points and 44 intersection points), 114 edges, and 48 faces. We can plug in these numbers and verify the relations

  v = 3v_r − 4 ,   e = 5v_r − 6 ,   f = 2v_r ,

where v, e, and f are the numbers of vertices, edges, and faces of the visual hull mesh, and v_r is the number of frontier points.
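These relations are straightforward to check numerically. The following minimal Python sketch (ours, purely illustrative, not part of the reconstruction program) substitutes the egg's v_r = 24:

    # Visual hull counts for a convex object (Section 4.6.3), egg example.
    vr = 24                        # number of frontier points
    v, e, f = 3*vr - 4, 5*vr - 6, 2*vr
    print(v, e, f)                 # 68 114 48: matches the reported counts
    assert v - e + f == 2          # the hull boundary is a topological sphere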

Note. According to Table 5.1, the visual hull of the egg has 48 critical points, while we have just stated that the “ideal” topological visual hull has 24 frontier points. This 2-to-1 relationship is to be expected, since we have established in Section 5.1.2 that each frontier point gives rise to two critical points as a result of noise and calibration error.

Figure 5.9 The egg sequence. Left: three views of the 1-skeleton. Intersection points are marked by squares, and critical points are marked by smaller circles. Right: the complete visual hull mesh rendered from the same viewpoints.

Figure 5.10 The visual hull decomposed into six cone strips (Strips 1-6). (The reference view on top is the same as the middle bottom view of Figure 5.9.)

Figure 5.11 The egg sequence: cone strips (Strips 1-6) visualized in (u, d)-space (u is the contour parameter and d is the distance from the camera). Critical points are shown as small circles, and intersection points are shown as larger squares.

Figure 5.12 displays the 1-skeleton and the visual hull of the gourd computed from the nine-image sequence shown in Figure 5.6. The model produced by our program is of fairly high quality: the triangle mesh is closed at the top and the bottom of the gourd, even though the geometry of the visual hull is somewhat “ugly” in these areas, with many closely-spaced intersection points and very short edges. Figure 5.13, which shows the nine cone strips making up the visual hull, allows us to get a better look at the detailed structure of the model.

Figure 5.12 The gourd sequence: the 1-skeleton (left) and the visual hull mesh (right).

Figure 5.13 The visual hull mesh of the gourd (top, boxed) decomposed into the nine constituent strips (Strips 1-9).

                        Squash          Egg            Gourd          Steve
                        (Figure 5.14)   (Figure 5.4)   (Figure 5.6)   (Figure 5.17)
  Views                      3               6              9             11
  Critical points           16              48             21           1008
  Intersection points        4              44             77           1179
  Edges                     13             114            128           2760
  Model vertices          5825            8588           5070         101978
  Model triangles         5717            8540           4903          98958

Table 5.1 Model statistics. The critical point, intersection point, and edge counts describe the combinatorial representation of the visual hull surface; the model vertex and triangle counts give the size of the final 3D triangle models (e.g., the model displayed at the top of Figure 5.10).

Another sample data set is the squash (courtesy of Steve Sullivan), reproduced in Figure 5.14. Because this set consists of only three images, it gives rise to a particularly simple visual hull having only 4 intersection points and 13 edges. The 1-skeleton and the visual hull mesh of the squash are shown in Figure 5.15, and the three cone strips in Figure 5.16. Despite its relatively simple geometric structure, the reconstructed model is a good representation of the original object.

Figure 5.14 The squash sequence (Views 1-3).

Figure 5.15 The squash sequence: the 1-skeleton (left) and the visual hull mesh (right).

Figure 5.16 The squash sequence: cone strips (Strips 1-3).

Our last example is the Steve data set (courtesy of Steve Sullivan and ILM), composed of the 11 images shown in Figure 5.17. It is far more challenging than the other examples presented in this section. First of all, it is the only set that has contours with multiple connected components (Figure 5.18). Second, the geometric complexity of the Steve data set is an order of magnitude higher than that of any of the other models, as Table 5.1 clearly shows. Whereas our program took seconds to compute the other visual hull models, it took almost a day to compute this one. Nevertheless, we can see from Figures 5.19 and 5.20 that our algorithm was successful. The quality of the visual hull mesh is on par with the quality of the meshes computed for the other examples, and the artifacts that exist (e.g., the small isolated component next to the right hand that can be seen in the right-most column of Figure 5.19) are not mistakes made by the algorithm; rather, they reflect the actual shape of the visual hull given the silhouettes and the camera positions. Finally, Figure 5.21 illustrates the incremental process of computing the visual hull mesh. The figure contains snapshots of the visual hull computed by intersecting the first 3, 5, and 7 visual cones. The reconstruction based on 7 cameras forms a fairly good approximation of the 3D object, and is very close to the full reconstruction of Figure 5.19.

Figure 5.17 The Steve sequence (Views 1-11).

Figure 5.18 Two of the silhouette images for the Steve sequence (Views 3 and 4): small holes in the silhouette give rise to multiple connected components of the contour.

Figure 5.19 The Steve sequence: the 1-skeleton (left) and the visual hull mesh (right).

Figure 5.20 The Steve sequence: two additional views of the 1-skeleton and the visual hull.

Figure 5.21 The Steve sequence: incremental construction of the visual hull using 3, 5, and 7 cameras.

CHAPTER 6

Conclusion

6.1 Summary

The research presented in this thesis has been motivated by the following observation:

The visual hull is a purely projective construction: its geometric and combinatorial structure is determined by contacts of lines with a smooth surface, and these contacts can be characterized using only the tools and techniques of projective geometry.

Our main goals were to produce a precise and mathematically rigorous description of the visual hull in terms of projectively invariant features, and to design algorithms to compute the visual hull using only projective information. To achieve these goals, it has been necessary to study two extensions to the standard projective framework: oriented projective geometry (OPG) and projective differential geometry (PDG).

OPG allows us to operate with useful geometric entities and notions that are absent in standard projective geometry: ray, segment, interval, convex/concave, front/back, inside/outside. Chapter 2 deals with two major topics: OPG (Sections 2.1-2.2) and the application of OPG to single- and multi-view camera geometry (Sections 2.3-2.5). We have rederived a number of textbook results in the oriented setting, and introduced formulas for transfer (Section 2.4.3) that to our knowledge have not been published before. We expect these results to prove useful to researchers interested in using OPG in computer vision applications.

Chapter 3 is essentially a tutorial on PDG. PDG offers the right framework for dealing with the problem of reconstructing smooth 3D objects based only on projective information. In this framework, we cannot use many familiar Euclidean notions: for example, we cannot construct surface normals, find principal directions, or measure curvature. Nevertheless, as shown in Section 3.2.6, some important properties can be traced back to projective geometry. One such property is the local shape of a point on a smooth surface (elliptic, hyperbolic, or parabolic). In Section 3.3, we integrate OPG and PDG by showing how to orient curves and surfaces in 2D and 3D.

In Chapter 4, we use the theoretical framework laid out in Chapters 2 and 3 to describe the properties of visual hulls and to design several algorithms for constructing visual hulls and related data structures. The early part of the chapter contains one interesting theoretical result: Proposition 4.3 gives a projective proof of a corollary of Koenderink's famous theorem relating the local shape of a point on the rim to the sign of the curvature of its projection on the apparent contour. The rest of the chapter introduces two data structures, the rim mesh and the visual hull mesh, and gives algorithms for their construction. The rim mesh is a decomposition induced on the surface by the set of rims associated with the input cameras. If the input data meets a certain set of restrictive assumptions, then the topological structure of the rim mesh may be reconstructed based only on 2D data. While the rim mesh describes the original surface, the visual hull mesh describes the boundary of the visual hull. The edges of this mesh are pieces of intersection curves between pairs of visual cones, and the algorithm for tracing these intersection curves is given in Section 4.4.

Section 4.5 describes how to compute the 1-skeleton, or the set of edges of the visual hull, and Section 4.6 completes the chapter by showing how to find the faces of the visual hull and simplify their shape for easier visualization and 3D rendering.

Finally, in Chapter 5, we describe our implementation of the algorithms introduced in Chapter 4 and present results on synthetic and real data. Examples of successfully computed rim meshes and visual hulls shown in Sections 5.2 and 5.3 help to validate the correctness of our algorithms. Though the results obtained by our preliminary implementation are encouraging, we have identified many directions for potential improvement. Several of these directions have already been discussed in Section 5.1, including the development of a more efficient and robust implementation. Next, we will discuss long-term extensions that go beyond the scope of the present thesis.

6.2 Future Work

Our research can be extended in several ways, both theoretical and practical. We conclude this thesis by highlighting a few of the most promising future directions.

Improving visual hull meshes. Visual hulls are not always "pretty" geometric models, nor are they completely faithful reconstructions of the original 3D objects. As discussed in the Introduction, visual hulls are intrinsically incapable of reproducing concavities and other fine details of surface geometry. In addition, visual hull models computed from a finite number of views tend to suffer from a number of geometric artifacts, including "floating" connected components and protrusions in areas that were occluded in the input set of images. Thus, visual hull meshes usually require some post-processing to make them more suitable for graphics applications such as texturing and simplification. In particular, the triangle meshes produced at the last stage of our reconstruction program consist mostly of "slivers": very long and thin triangles whose shape approximates the shape of the ray intervals that make up the cone strips. To eliminate these slivers, we plan to experiment with several methods for resampling the meshes, or with alternative methods of triangulating cone strips.

Because of the intrinsic limitations of visual hulls, several existing reconstruction systems use visual hulls as a rough initial approximation to the desired model, to be refined later through various optimization stages [10, 56]. One promising approach involves optimization based on the texture in the interior of the object. We are interested in refining the initial visual hull using the technique of model-based stereo introduced by Debevec et al. [11]. The main idea behind model-based stereo is illustrated in Figure 6.1.

Figure 6.1 Model-based stereo: the actual surface (with point X) and the approximate surface (with point X̂), their image projections x_i, x_j, and x̂_j, and the camera centers O_i and O_j.

Model-based stereo uses the process of reprojection: if x_i is a point in the i-th view, we can back-project it onto the point X̂ on the approximate surface, and then project X̂ onto x̂_j in the j-th view. To determine whether X̂ could be a true 3D point, we can compare the intensity patterns in the neighborhoods of x_i and x̂_j using cross-correlation or some other similarity function. If the similarity between the two neighborhoods is not sufficiently high, we can search for a better surface point by moving X̂ along the visual ray through the camera center O_i, obtaining new reprojected points on the same epipolar line in the j-th view until a maximum of the similarity function is reached, presumably when X̂ reaches X, the point on the true surface. It is possible to use the special properties of the visual hull to obtain useful constraints for this search. For example, since the visual hull is guaranteed to contain the object, it is sufficient to search in one direction along the epipolar line: the direction that corresponds to moving the 3D point farther away from the camera. In short, we believe that model-based stereo is a promising method for refining the visual hull to produce higher-quality geometric models.
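To make this search concrete, here is a minimal Python/NumPy sketch of one refinement step, written purely for illustration under simplifying assumptions (finite perspective cameras given as 3 × 4 matrices, grayscale images as arrays, and a fixed uniform sampling of the ray); the helper names refine_point, camera_center, patch, and ncc are ours and hypothetical, not functions of our reconstruction program:

    import numpy as np

    def camera_center(P):
        # Homogeneous null vector of the 3x4 camera matrix (finite camera assumed).
        _, _, Vt = np.linalg.svd(P)
        return Vt[-1]

    def ncc(a, b):
        # Normalized cross-correlation of two equal-sized patches.
        a = a - a.mean(); b = b - b.mean()
        d = np.linalg.norm(a) * np.linalg.norm(b)
        return float((a * b).sum() / d) if d > 0 else -1.0

    def patch(img, x, half=3):
        # Square window around the dehomogenized image point x; None near borders.
        u, v = int(round(x[0] / x[2])), int(round(x[1] / x[2]))
        if half <= v < img.shape[0] - half and half <= u < img.shape[1] - half:
            return img[v-half:v+half+1, u-half:u+half+1].astype(float)
        return None

    def refine_point(P_i, P_j, img_i, img_j, x_i, X0, t_max, steps=100):
        # Slide a candidate 3D point along the visual ray through x_i, starting
        # at the visual hull point X0 and moving only AWAY from camera i (the
        # object lies inside the hull), keeping the position whose reprojection
        # into view j best matches the reference patch around x_i.
        O_h = camera_center(P_i)
        O = O_h[:3] / O_h[3]
        X = X0[:3] / X0[3]
        direction = (X - O) / np.linalg.norm(X - O)
        ref = patch(img_i, x_i)
        best, best_score = X0, -np.inf
        for t in np.linspace(0.0, t_max, steps):
            Xt = np.append(X + t * direction, 1.0)   # candidate 3D point
            cand = patch(img_j, P_j @ Xt)            # its reprojection in view j
            if ref is not None and cand is not None:
                s = ncc(ref, cand)
                if s > best_score:
                    best, best_score = Xt, s
        return best, best_score

Since the candidate point moves along a fixed ray through O_i, its reprojections automatically trace the corresponding epipolar line in view j, as described above.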

Acquiring reflectance properties. While visual hulls address the problem of reconstructing the shape of 3D objects, they cannot capture surface properties like texture and reflectance. State-of-the-art applications of image-based rendering require not only geometry, but also an estimate of the surface light field or surface reflectance field: information that would allow the object to be rendered not only from novel viewpoints, but also under novel lighting conditions. One recent example in this line of research is the opacity hull of Matusik et al. [40]. The opacity hull consists of two parts: the geometry, provided by the visual hull extended with transparency mattes; and reflectance data, acquired from images of the object taken from different viewpoints and using moving light sources. Adding transparency to the visual hull enables the system to capture objects that cannot be successfully represented using standard visual hull technology (e.g., objects with fine-scale features like fur), and improves the appearance of rendered images. The philosophy underlying opacity hulls is different from the philosophy of methods that attempt to improve the quality of reconstructed models by producing more accurate geometric meshes. Namely, one may be able to create high-quality renderings with a relatively crude geometric model, as long as the shortcomings of the model are compensated for by detailed texture and reflectance data. In our future work, we would like to explore this idea by shifting the focus from geometry-based to appearance-based reconstruction techniques.

Alternative models for reconstruction. The discussion of appearance-based models brings us to the more general subject of different model classes for reconstruction. In particular, we are interested in models that can benefit from the oriented multi-view framework developed in Chapter 2. We have recently begun to study a promising modeling method that represents objects as collections of planar patches [52]. Like visual hulls, this method utilizes information about a sparse set of object features, only this information is gathered not from outlines, but from points in the interior of the silhouette. Multi-view structure and motion techniques, combined with image processing machinery developed in the context of affine-invariant interest point detection [42], allow us to obtain estimates of surface tangent planes at a sparse set of points with highly distinctive intensity patterns. An attractive feature of the patch-based method is that it is capable of automatically recovering camera motion parameters, so there is no need for prior calibration. We are interested in incorporating patch-based constraints into our current silhouette-based reconstruction system to provide camera projection matrices for visual hull construction, as well as alternative estimates of surface geometry for model refinement.

At this stage, the patch-based approach is somewhat limited, since it assumes affine camera projection. To increase the versatility of the approach, we would like to extend it to perspective camera projection, and to make it more robust by incorporating orientation consistency constraints for planar patches.

Object recognition. Historically, reconstruction and recognition of real-world objects have been two dominant threads of research in computer vision. While this thesis deals exclusively with silhouette-based reconstruction, we have also carried out research focusing on silhouette-based recognition [33]. We have developed a method for determining whether a given pair of silhouettes can come from two different views of the same 3D object. The geometric framework for this matching problem uses several of the constructions discussed at length in this thesis, most notably epipolar tangencies and frontier points. The existing approach, which relies on Euclidean information about the cameras, could be made more elegant by incorporating the oriented and differential invariants discussed in Chapters 2 and 3.

In this thesis, and in our earlier work on recognition, we have developed a framework that captures the geometric constraints between different views of a single object instance. A far more general problem is that of category-based object recognition. What geometric features derived from image data would allow us to classify pictures drawn from a semantic object category (e.g., cars, chairs, faces, badgers)? We expect that future modeling methods capable of "cracking" category-based recognition will be quite different from reconstruction methods like the one described in this thesis. Nevertheless, in developing mathematically sound algorithms that reason about invariant properties of 3D shape, we have followed a methodology that should prove useful for recognition as well as for reconstruction.

APPENDIX A

Oriented Formulas

In the next two sections, we list oriented formulas for manipulating flats in 2D and 3D. Refer to Section 2.2.3 for our conventions for representing flats. The general framework for deriving similar formulas in arbitrary dimensions can be found in Stolfi [55, Chapters 19 and 20].

A.1 Formulas for T^2

Points: x = (x_1, x_2, x_3)^T, y = (y_1, y_2, y_3)^T. Lines: l = (l_1, l_2, l_3)^T, m = (m_1, m_2, m_3)^T.

Join of two points (anti-commutative): l = x ∨ y, with

  l_1 = x_2 y_3 − x_3 y_2 ,
  l_2 = x_3 y_1 − x_1 y_3 ,   (A.1)
  l_3 = x_1 y_2 − x_2 y_1 .

In matrix form:

  l = [x]_× y ,   [x]_× = [ 0 −x_3 x_2 ; x_3 0 −x_1 ; −x_2 x_1 0 ] .   (A.2)

Meet of two lines (anti-commutative): x = l ∧ m, with

  x_1 = l_2 m_3 − l_3 m_2 ,
  x_2 = l_3 m_1 − l_1 m_3 ,   (A.3)
  x_3 = l_1 m_2 − l_2 m_1 .

In matrix form: x = [l]_× m.

Join of a line and a point (commutative):

  u = l ∨ x = l_1 x_1 + l_2 x_2 + l_3 x_3 .   (A.4)

Relative orientation of a line and a point (commutative):

  σ = sgn(l_1 x_1 + l_2 x_2 + l_3 x_3) .   (A.5)
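For readers who want to experiment with these formulas, here is a minimal Python/NumPy transcription of the T^2 operations (our illustration; the thesis implementation is separate). Note that np.cross applied to 3-vectors computes exactly the coordinates of (A.1) and (A.3):

    import numpy as np

    def join_points(x, y):
        # l = x ∨ y: oriented line through two T^2 points, Eq. (A.1).
        return np.cross(x, y)

    def meet_lines(l, m):
        # x = l ∧ m: oriented intersection of two T^2 lines, Eq. (A.3).
        return np.cross(l, m)

    def join_line_point(l, x):
        # u = l ∨ x, Eq. (A.4); its sign is the relative orientation (A.5).
        return l @ x

    # Example: the line through (1, 0) and (0, 1), met with the x-axis y = 0.
    l = join_points(np.array([1, 0, 1]), np.array([0, 1, 1]))
    x = meet_lines(l, np.array([0, 1, 0]))
    print(x)   # [-1  0 -1]: the point (1, 0), here carried with a negative
               # (antipodal) sign, as is typical of oriented computations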

A.2 Formulas for T^3

Points: X = (x_1, x_2, x_3, x_4)^T, Y = (y_1, y_2, y_3, y_4)^T, Z = (z_1, z_2, z_3, z_4)^T.
Planes: P = (p_1, p_2, p_3, p_4)^T, Q = (q_1, q_2, q_3, q_4)^T, R = (r_1, r_2, r_3, r_4)^T.
Lines: L = (l_12, l_13, l_14, l_23, l_24, l_34)^T, M = (m_12, m_13, m_14, m_23, m_24, m_34)^T.

Join of two points (anti-commutative): L = X ∨ Y, with

  l_12 = x_1 y_2 − x_2 y_1 ,   l_13 = x_1 y_3 − x_3 y_1 ,   l_14 = x_1 y_4 − x_4 y_1 ,
  l_23 = x_2 y_3 − x_3 y_2 ,   l_24 = x_2 y_4 − x_4 y_2 ,   l_34 = x_3 y_4 − x_4 y_3 .   (A.6)

Meet of two planes (anti-commutative): L = P ∧ Q, with

  l_12 = p_3 q_4 − p_4 q_3 ,   l_13 = p_4 q_2 − p_2 q_4 ,   l_14 = p_2 q_3 − p_3 q_2 ,
  l_23 = p_1 q_4 − p_4 q_1 ,   l_24 = p_3 q_1 − p_1 q_3 ,   l_34 = p_1 q_2 − p_2 q_1 .   (A.7)

Join of three points: P = X ∨ Y ∨ Z, with (rows of each 3 × 3 determinant separated by semicolons)

  p_1 = −det[ x_2 y_2 z_2 ; x_3 y_3 z_3 ; x_4 y_4 z_4 ] ,
  p_2 =  det[ x_1 y_1 z_1 ; x_3 y_3 z_3 ; x_4 y_4 z_4 ] ,
  p_3 = −det[ x_1 y_1 z_1 ; x_2 y_2 z_2 ; x_4 y_4 z_4 ] ,
  p_4 =  det[ x_1 y_1 z_1 ; x_2 y_2 z_2 ; x_3 y_3 z_3 ] .   (A.8)

Join of a line and a point (commutative): P = L ∨ X, with

  p_1 = −l_23 x_4 + l_24 x_3 − l_34 x_2 ,
  p_2 =  l_13 x_4 − l_14 x_3 + l_34 x_1 ,
  p_3 = −l_12 x_4 + l_14 x_2 − l_24 x_1 ,
  p_4 =  l_12 x_3 − l_13 x_2 + l_23 x_1 .   (A.9)

In matrix form:

  P = [   0  −l_34   l_24  −l_23 ;
       l_34      0  −l_14   l_13 ;
      −l_24   l_14      0  −l_12 ;
       l_23  −l_13   l_12      0 ] X .

If L = Q ∧ R, then the above formula can be written as

  P = (Q ∧ R) ∨ X = (R Q^T − Q R^T) X .   (A.10)

Meet of three planes: X = P ∧ Q ∧ R, with

  x_1 =  det[ p_2 q_2 r_2 ; p_3 q_3 r_3 ; p_4 q_4 r_4 ] ,
  x_2 = −det[ p_1 q_1 r_1 ; p_3 q_3 r_3 ; p_4 q_4 r_4 ] ,
  x_3 =  det[ p_1 q_1 r_1 ; p_2 q_2 r_2 ; p_4 q_4 r_4 ] ,
  x_4 = −det[ p_1 q_1 r_1 ; p_2 q_2 r_2 ; p_3 q_3 r_3 ] .   (A.11)

Meet of a line and a plane (commutative): X = L ∧ P, with

  x_1 =  l_12 p_2 + l_13 p_3 + l_14 p_4 ,
  x_2 = −l_12 p_1 + l_23 p_3 + l_24 p_4 ,
  x_3 = −l_13 p_1 − l_23 p_2 + l_34 p_4 ,
  x_4 = −l_14 p_1 − l_24 p_2 − l_34 p_3 .   (A.12)

In matrix form:

  X = [    0   l_12   l_13   l_14 ;
       −l_12      0   l_23   l_24 ;
       −l_13  −l_23      0   l_34 ;
       −l_14  −l_24  −l_34      0 ] P .

If L = Y ∨ Z, then the above formula can be written as

  X = (Y ∨ Z) ∧ P = (Y Z^T − Z Y^T) P .   (A.13)

Join of a plane and a point (anti-commutative):

  u = P ∨ X = p_1 x_1 + p_2 x_2 + p_3 x_3 + p_4 x_4 .   (A.14)

Relative orientation of a plane and a point (anti-commutative):

  σ = sgn(p_1 x_1 + p_2 x_2 + p_3 x_3 + p_4 x_4) .   (A.15)

Join of two lines (commutative): u = L ∨ M, with

  u = l_12 m_34 − l_13 m_24 + l_14 m_23 + l_23 m_14 − l_24 m_13 + l_34 m_12 .   (A.16)

In matrix form:

  u = L^{*T} M ,   L^* = (l_34, −l_24, l_23, l_14, −l_13, l_12)^T .   (A.17)

If L is the coordinate vector of a valid line, we must have

  l_12 l_34 − l_13 l_24 + l_14 l_23 = 0 .   (A.18)

Relative orientation of two lines (commutative):

  σ = sgn(l_12 m_34 − l_13 m_24 + l_14 m_23 + l_23 m_14 − l_24 m_13 + l_34 m_12) .   (A.19)
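As in the T^2 case, these formulas translate directly into code. The sketch below (ours, for illustration only) stores a line as the 6-vector (l_12, l_13, l_14, l_23, l_24, l_34) and implements the point-point join (A.6), the line-plane meet (A.12), and the Plücker constraint (A.18):

    import numpy as np

    def join_points_3d(X, Y):
        # L = X ∨ Y: Plücker vector (l12, l13, l14, l23, l24, l34), Eq. (A.6).
        return np.array([X[0]*Y[1] - X[1]*Y[0],
                         X[0]*Y[2] - X[2]*Y[0],
                         X[0]*Y[3] - X[3]*Y[0],
                         X[1]*Y[2] - X[2]*Y[1],
                         X[1]*Y[3] - X[3]*Y[1],
                         X[2]*Y[3] - X[3]*Y[2]])

    def meet_line_plane(L, P):
        # X = L ∧ P via the skew matrix of Eq. (A.12).
        l12, l13, l14, l23, l24, l34 = L
        A = np.array([[   0,  l12,  l13,  l14],
                      [-l12,    0,  l23,  l24],
                      [-l13, -l23,    0,  l34],
                      [-l14, -l24, -l34,    0]])
        return A @ P

    def plucker_residual(L):
        # Eq. (A.18): zero iff L is the coordinate vector of a valid line.
        l12, l13, l14, l23, l24, l34 = L
        return l12*l34 - l13*l24 + l14*l23

    # Example: the line through (0,0,0) and (1,0,0) meets the plane x = 1.
    L = join_points_3d(np.array([0, 0, 0, 1]), np.array([1, 0, 0, 1]))
    assert plucker_residual(L) == 0
    print(meet_line_plane(L, np.array([1, 0, 0, -1])))  # [1 0 0 1]: the point (1,0,0)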

A.3 Algebraic and Infinitesimal Properties of Join and Meet

By examining the formulas shown in Sections A.1 and A.2, it is easy to confirm the bilinearity of the meet and join operations. The general principle follows from the elegant study of the double or Grassmann-Cayley algebra [6], [14, Chapter 3].

Proposition A.1 (Bilinearity of Join and Meet).

1. Let λ be any scalar and f and g be any coordinate (coefficient, Plücker) vectors of two flats F and G such that the operation F ∨ G (resp. F ∧ G) is defined. Then

  (λf) ∨ g = f ∨ (λg) = λ(f ∨ g) ,
  (λf) ∧ g = f ∧ (λg) = λ(f ∧ g) .

2. Let f = f_1 + f_2. Then

  (f_1 + f_2) ∨ g = f_1 ∨ g + f_2 ∨ g ,
  (f_1 + f_2) ∧ g = f_1 ∧ g + f_2 ∧ g .

3. Let g = g_1 + g_2. Then

  f ∨ (g_1 + g_2) = f ∨ g_1 + f ∨ g_2 ,
  f ∧ (g_1 + g_2) = f ∧ g_1 + f ∧ g_2 .

Note that the scalar multiplication operation in property 1, and the addition in properties 2 and 3 are not geometric operations, but operations on coordinate vectors in the vector space of their respective dimension. Even though it is not geometrically meaningful to add flats or multiply them by scalars, it is perfectly legal to express the coordinate vector of a flat using a linear combination of several other coordinate vectors. The next proposition is useful in the setting of Chapter 3, where we deal with flats whose coordinate vectors change smoothly as a function of one or two scalar variables (e.g., a tangent line to a curve or a tangent plane to a surface).

Proposition A.2 (Product Rule for Join and Meet). Let f and g be as in Proposition A.1, except that now they are also smooth vector-valued functions of one or more scalar variables. Then the join and meet operations are compatible with the product rule for differentiation:

  (f ∨ g)′ = f′ ∨ g + f ∨ g′ ,
  (f ∧ g)′ = f′ ∧ g + f ∧ g′ ,

where the prime denotes (partial) differentiation with respect to the same variable.

Proposition A.2 is used in the proof of Proposition 3.11 in Chapter 3.
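Because join and meet are bilinear in the coordinate vectors, the product rule can also be spot-checked numerically. The following sketch (ours, for illustration only) compares a finite-difference derivative of the join x(t) ∨ y(t) of two moving T^2 points against the right-hand side of Proposition A.2:

    import numpy as np

    # Two smoothly moving T^2 points and their exact derivatives.
    x  = lambda t: np.array([np.cos(t), np.sin(t), 1.0])
    dx = lambda t: np.array([-np.sin(t), np.cos(t), 0.0])
    y  = lambda t: np.array([t, t**2, 1.0])
    dy = lambda t: np.array([1.0, 2*t, 0.0])

    t, h = 0.7, 1e-6
    # Finite-difference derivative of the join l(t) = x(t) ∨ y(t) ...
    numeric = (np.cross(x(t + h), y(t + h)) - np.cross(x(t), y(t))) / h
    # ... versus the product rule: l' = x' ∨ y + x ∨ y'.
    exact = np.cross(dx(t), y(t)) + np.cross(x(t), dy(t))
    print(np.max(np.abs(numeric - exact)))   # ~1e-6, the discretization error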

REFERENCES

[1] P. Agarwal and M. Sharir, “Arrangements and Their Applications,” in Handbook of Computational Geometry (J. Sack and J. Urrutia, eds.), North-Holland, Amsterdam, 2000.

[2] B. Baumgart, “Geometric Modeling for Computer Vision,” Ph.D. Thesis (Tech. Report AIM-249), Stanford University, 1974.

[3] B. Baumgart, “A Polyhedron Representation for Computer Vision,” in National Computer Conference, 1975, pp. 589-596.

[4] E. Boyer and M. Berger, “3D Surface Reconstruction Using Occluding Contours,” Int. Journal of Computer Vision, 22(3), 1997, pp. 219-233.

[5] J. Canny, “A Computational Approach to Edge Detection,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 8, 1986, pp. 679-698.

[6] S. Carlsson, “The Double Algebra: An Effective Tool for Computing Invariants in Computer Vision,” in Applications of Invariance in Computer Vision: Joint European-U.S. Workshop, 1993.

[7] R. Cipolla and A. Blake, “Surface Shape from the Deformation of Apparent Contours,” Int. Journal of Computer Vision, 9(2), 1992, pp. 83-112.

[8] R. Cipolla, K. Astrom, and P.J. Giblin, “Motion from the Frontier of Curved Surfaces,” in Proc. IEEE Int. Conf. on Computer Vision, 1995, pp. 269-275.

[9] R. Cipolla and P.J. Giblin, Visual Motion of Curves and Surfaces, Cambridge University Press, Cambridge, 1999.

[10] G. Cross and A. Zisserman, “Surface Reconstruction from Multiple Views Using Apparent Contours and Surface Texture,” in NATO Advanced Research Workshop on Confluence of Computer Vision and Computer Graphics, 2000, pp. 25-47.

[11] P. Debevec, C. J. Taylor, and J. Malik, “Modeling and Rendering Architecture from Photographs: A Hybrid Geometry and Image-based Approach,” in Proc. SIGGRAPH, 1996, pp. 11-20.

[12] M. do Carmo, Differential Geometry of Curves and Surfaces, Prentice-Hall, Englewood Cliffs, New Jersey, 1976.

[13] O. Faugeras and L. Robert, “What Can Two Images Tell Us About a Third One?”, Int. Journal of Computer Vision, 18(1), 1996, pp. 5-19.

[14] O. Faugeras, Q. Luong, and T. Papadopoulo, The Geometry of Multiple Images, MIT Press, Cambridge, Massachusetts, 2001.

[15] D. Forsyth and J. Ponce, Computer Vision: A Modern Approach, Prentice Hall, Upper Saddle River, NJ, 2002.

[16] P. Giblin and R. Weiss, “Reconstruction of Surfaces from Profiles,” in Proc. IEEE Int. Conf. on Computer Vision, 1987, pp. 136-144.

[17] J. Goodman and J. O’Rourke, eds. Handbook of Discrete and Computational Geometry, CRC Press, Boca Raton, FL, 1997.

[18] L. Guibas and J. Stolfi, “Primitives for the Manipulation of General Subdivisions and the Computation of Voronoi Diagrams,” ACM Transactions on Graphics, 4(2), 1985, pp. 74-123.

[19] V. Guillemin and A. Pollack, Differential Topology, Prentice-Hall, Englewood Cliffs, NJ, 1974.

[20] R. Hartley, “Chirality,” International Journal of Computer Vision, 26(1), 1998, pp. 41-61.

[21] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, Cambridge, 2000.

[22] H.-S. Heo, M.-S. Kim, and G. Elber, “The Intersection of Two Ruled Surfaces,” Computer-Aided Design 31, 1999, pp. 33-50.

[23] C. Hoffmann, Geometric and Solid Modeling: an Introduction, Morgan Kaufmann Publishers, Inc., San Mateo, CA, 1989.

[24] J. Koenderink, “What Does the Occluding Contour Tell Us About Solid Shape?”, Perception, 1984, pp. 321-330.

[25] J. Koenderink, Solid Shape, MIT Press, Cambridge, MA, 1990.

[26] K. Kutulakos and C. Dyer, “Recovering Shape by Purposive Viewpoint Adjustment,” Int. Journal of Computer Vision, 12(2/3), 1994, pp. 113-136.

[27] K. Kutulakos and S. Seitz, “A Theory of Shape by Space Carving,” in Proc. IEEE Int. Conf. on Computer Vision, 1999, pp. 307-314.

[28] E. Lane, Projective Differential Geometry of Curves and Surfaces, The University of Chicago Press, Chicago, IL, 1932.

[29] A. Laurentini, “The Visual Hull Concept for Silhouette-based Image Understanding,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 16(2), 1994, pp. 150-162.

[30] A. Laurentini, “The Visual Hull of Curved Objects,” in Proc. IEEE Int. Conf. on Computer Vision, 1999, pp. 356-361.

[31] S. Laveau and O. Faugeras, “Oriented Projective Geometry for Computer Vision,” in Proc. European Conference on Computer Vision, 1996, pp. 147-156.

[32] S. Lazebnik, E. Boyer, and J. Ponce, “On Computing Exact Visual Hulls of Solids Bounded by Smooth Surfaces,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2001, pp. 156-161.

[33] S. Lazebnik, A. Sethi, C. Schmid, D. Kriegman, J. Ponce and M. Hebert, “On Pencils of Tangent Planes and the Recognition of Smooth 3D Shapes from Silhouettes,” in Proc. of the European Conference on Computer Vision, 2002, pp. 651-665.

[34] B. Leibe, T. Starner, W. Ribarsky, Z. Wartell, D. Krum, J. Weeks, B. Singletary, and L. Hodges, “Toward Spontaneous Interaction with the Perceptive Workbench,” IEEE Computer Graphics and Applications, 20(6), 2000, pp. 54-65.

[35] B. Lok, “Avatar Advances,” Computer Graphics World, 24(2), 2001, pp. 17-20.

[36] B. Lok, “Online Model Reconstruction for Interactive Virtual Environments,” in Proc. Symposium on Interactive 3D Graphics, 2001, pp. 69-72.

[37] W. Lorensen and H. Cline, “Marching Cubes: a High-Resolution 3D Surface Construction Algorithm,” in Proc. SIGGRAPH, 1987, pp. 163-170.

[38] W. Matusik, C. Buehler, R. Raskar, S. Gortler, and L. McMillan, “Image-based Visual Hulls,” in Proc. SIGGRAPH, 2000, pp. 369-374.

[39] W. Matusik, C. Buehler, and L. McMillan, “Polyhedral Visual Hulls for Real-Time Rendering,” in Proc. Twelfth Eurographics Workshop on Rendering, 2001, pp. 115-125.

[40] W. Matusik, H. Pfister, A. Ngan, P. Beardsley, R. Ziegler, and L. McMillan, “Image-Based 3D Photography Using Opacity Hulls,” in Proc. SIGGRAPH, 2002, pp. 427-437.

[41] P. Mendonca, K. Wong, and R. Cipolla, “Camera Pose Estimation and Reconstruction from Image Profiles under Circular Motion,” in Proc. European Conf. on Computer Vision, 2000, pp. 864-877.

[42] K. Mikolajczyk and C. Schmid, “An Affine Invariant Interest Point Detector,” in Proc. European Conference on Computer Vision, 2002, pp. 128-142.

[43] J. Munkres, Topology, 2nd Edition, Prentice Hall, Upper Saddle River, NJ, 2000.

[44] W. Niem and R. Buschmann, “Automatic Modelling of 3D Natural Objects from Multiple Images,” in European Workshop on Combined Real and Synthetic Image Processing for Broadcast and Video Production, 1994.

[45] J. O’Rourke, Computational Geometry in C, 2nd ed., Cambridge University Press, Cambridge, 1998.

[46] J. Owen and A. Rockwood, “Intersection of General Implicit Surfaces,” in Geometric Modeling: Algorithms and Trends (G. Farin, ed.), SIAM Publications, Philadelphia, 1987.

[47] S. Petitjean, J. Ponce, and D. Kriegman, “Computing Exact Aspect Graphs of Curved Objects: Algebraic Surfaces,” Int. Journal of Computer Vision, 9(3), 1992, pp. 231-255.

[48] S. Petitjean, “A Computational Geometric Approach to Visual Hulls,” Int. Journal of Computational Geometry and Applications, 8(4), 1998, pp. 407-436.

[49] J. Porrill and S. Pollard, “Curve Matching and Stereo Calibration,” Image and Vision Computing, 9(1), 1991, pp. 45-50.

[50] H. Pottmann and J. Wallner, Computational Line Geometry, Springer-Verlag, Berlin, 2001.

[51] J. Rieger, “Three-dimensional motion from fixed points of a deforming profile curve,” Optics Letters 11, 1986, pp. 123-125.

[52] F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce, “3D Object Modeling and Recognition Using Affine-Invariant Patches and Multi-View Spatial Constraints,” submitted to IEEE Conf. on Computer Vision and Pattern Recognition, 2003.

[53] J. Semple and G. Kneebone, Algebraic Projective Geometry, Oxford University Press, Oxford, 1952.

[54] I. Shlyakhter, M. Rozenoer, J. Dorsey, and S. Teller, “Reconstructing 3D Tree Models from Instrumented Photographs,” IEEE Computer Graphics and Applications, 21(3), 2001, pp. 53-61.

[55] J. Stolfi, Oriented Projective Geometry: A Framework for Geometric Computations, Academic Press, San Diego, CA, 1991.

[56] S. Sullivan and J. Ponce, “Automatic Model Construction, Pose Estimation, and Object Recognition from Photographs using Triangular Splines,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 20(10), 1998, pp. 1091-1096.

[57] R. Szeliski, “Rapid Octree Construction From Image Sequences,” Computer Vision, Graphics, and Image Processing: Image Understanding, 1(58), 1993, pp. 23-32.

[58] R. Vaillant and O. Faugeras, “Using Extremal Boundaries for 3-D Object Modeling,” IEEE Trans. on Pattern Analysis and Machine Intelligence, 14(2), 1992, pp. 157-173.

[59] E. Weisstein, “Second Derivative Test,” Eric Weisstein’s World of Mathematics, 1999, http://www.mathworld.wolfram.com/SecondDerivativeTest.html.

[60] T. Werner and T. Pajdla, “Cheirality in Epipolar Geometry,” Czech Technical University Research Report, 2000.

[61] T. Werner and T. Pajdla, “Oriented Matching Constraints,” in British Machine Vision Conf., 2001, pp. 441-450.

[62] K.-Y.K. Wong and R. Cipolla, “Structure and Motion from Silhouettes,” in Proc. IEEE Int. Conf. on Computer Vision, 2001, pp. 217-222.
