Robust Algorithms for Object Localization
Dinesh Manocha† Aaron S. Wallack*
Department of Computer Science Computer Science Division
University of North Carolina University of California
Chapel Hill, NC 27599-3175 Berkeley, CA 94720
mano [email protected] wallack@robotics.eecs.berkeley.edu
Abstract
Object localization using sensed data features and corresponding model features is a
fundamental problem in machine vision. We reformulate object localization as a least squares
problem: the optimal pose estimate minimizes the squared error (discrepancy) between the
sensed and predicted data. The resulting problem is non-linear, and previous attempts to
estimate the optimal pose using local methods such as gradient descent suffer from local
minima and, at times, return incorrect results. In this paper, we describe an exact, accurate
and efficient algorithm based on resultants, linear algebra, and numerical analysis, for
solving the nonlinear least squares problem associated with localizing two-dimensional objects
given two-dimensional data. This work is aimed at tasks where the sensor features and the
model features are of different types and where either the sensor features or model features
are points. It is applicable to localizing modeled objects from image data, and estimates the
pose using all of the pixels in the detected edges. The algorithm's running time depends
mainly on the type of non-point features, and it also depends to a small extent on the number
of features. On a SPARC 10, the algorithm takes a few microseconds for rectilinear features,
a few milliseconds for linear features, and a few seconds for circular features.
* Supported by Fannie and John Hertz Fellowship.
† Supported in part by Junior Faculty Award, University Research Award, NSF Grant CCR-9319957 and
ARPA Contract DABT63-93-C-0048.
1 Introduction
Model-based recognition and localization are fundamental tasks in analyzing and
understanding two-dimensional and three-dimensional scenes [12, 10]. There are many applications
for vision systems: inspection and quality control, mobile robot navigation, monitoring and
surveillance. The input to such systems typically consists of two-dimensional image data or
range data. The model-based recognition problem is to determine which object O from a set
of candidate objects $\{O_1, O_2, \ldots\}$ best explains the input data. Since the object O can
have any orientation or location, the localization problem is to compute the pose p for a given
object O which best fits the sensed data.
The problems of recognition and localization have been extensively studied in the vision
literature [12, 10]. The majority of the object recognition algorithms utilize edge-based
data, where edges are extracted from image data using filter-based methods. Edge data
consists either of pixels which have exceeded some threshold, or of edge segments formed by
straight line approximations of those pixels. On the other hand, objects are usually modeled
as a set of non-point features, such as line segments, polygons, or parameterized curves.
For this reason, localization algorithms must handle matching sensor features and model
features which are of different types. We present an exact and robust model-based object
localization algorithm for features of different types, where either the sensed features or the
model features are points (refer to Figure 1). Unlike Kalman filtering based approaches to the
same problem, this algorithm is immune to problems of local minima, and unlike approaches
which reduce the problem to matching point features, which can be solved exactly [16, 26, 5],
this algorithm does not introduce error by sampling points from model features.
[Figure 1 panels: "Object" and "Edge Detected Pixels"]
Figure 1: Given image data, this localization algorithm estimates the object's pose directly
from the edge-detected pixels and the model features.
Our approach involves reducing the model-based object localization problem between
dissimilar features to a nonlinear least squares error problem, where the error of a pose is
defined to be the discrepancy between the predicted sensor readings and the actual sensor
readings. In this paper, we concentrate on the object localization problem, i.e., we assume
that the features have already been interpreted using a correspondence algorithm. We use
global methods to solve the problem of optimal pose estimation. In particular, these methods
compute the pose that minimizes the sum of the squared errors between the sensed data
and the predicted data. Rather than using local methods such as gradient descent, the
problem is formulated algebraically and all of the local extrema of the sum of squared error
function, including the global minimum, are computed. For the case of matching points to
circular features, we use an approximation of the squared error because the error cannot be
formulated algebraically without introducing an additional variable for each circular feature.
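The contrast between local and global methods can be illustrated with a small sketch. It uses a hypothetical one-variable quartic error function, not the pose error of this paper, and it finds stationary points with a sign-change scan and bisection rather than with resultants. All function names and constants are assumptions chosen for illustration.

```python
def error(x):
    """Hypothetical quartic error with two local minima (illustration only)."""
    return x**4 - 3.0 * x**2 + x


def d_error(x):
    """Derivative of the toy error function."""
    return 4.0 * x**3 - 6.0 * x + 1.0


def stationary_points(lo=-3.0, hi=3.0, steps=600, tol=1e-12):
    """Find the roots of d_error in [lo, hi] by a sign-change scan plus bisection."""
    roots = []
    h = (hi - lo) / steps
    for i in range(steps):
        a, b = lo + i * h, lo + (i + 1) * h
        fa, fb = d_error(a), d_error(b)
        if fa == 0.0:
            roots.append(a)
            continue
        if fa * fb < 0.0:
            # Bisection keeps a root of d_error bracketed in [a, b].
            while b - a > tol:
                mid = 0.5 * (a + b)
                if fa * d_error(mid) <= 0.0:
                    b = mid
                else:
                    a, fa = mid, d_error(mid)
            roots.append(0.5 * (a + b))
    return roots


def global_minimum():
    """Evaluate the error at every stationary point and keep the smallest."""
    return min(stationary_points(), key=error)


def gradient_descent(x0, rate=0.01, iters=5000):
    """Plain gradient descent; the result depends on the starting point x0."""
    x = x0
    for _ in range(iters):
        x -= rate * d_error(x)
    return x
```

In this toy setting, `gradient_descent(2.0)` settles in the shallower basin on the positive side, while `global_minimum()` compares all stationary points and returns the deeper minimum on the negative side.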
The algorithm's running time mainly depends upon the algebraic complexity of the
resulting system, which is a function of the type of non-point features. Since the algorithm takes
all of the sensor features and model features into account, the running time is linear in the size
of the data set. It has been implemented and works well in practice. In particular, objects
with rectilinear boundaries are localized in microseconds on a SPARC 10 workstation. The
running time increases with the complexity of the non-point features: the algorithm takes
a few tens of milliseconds to localize polygonal models, and one to two seconds to localize
models with circular arcs.
1.1 Previous Work
The localization techniques presented in this paper are applicable to the general field of
model-based machine vision. In machine vision applications, localization is usually
reformulated as a least squares error problem. Some methods have focused on interesting sensor
features such as vertices, and ignored the remainder of the image data [11, 5, 3, 14, 6, 7],
thereby reducing the problem to a problem of matching similar features. These approaches
are likely to be suboptimal because they utilize a subset of the data [23]. In order to utilize
all of the edge data, modeled objects must be compared directly to edge pixel data. In
the model-based machine vision literature, there are three main approaches for localizing
objects where the model features and the sensed features are of different types: sampling
points from the non-point features to reduce it to the problem of localizing point sets;
constructing virtual features from the point feature data in order to reduce it to the problem of
matching like features; and estimating the object's pose by computing the error function between
the dissimilar feature types and finding the minimum using gradient descent techniques or
Kalman filtering techniques.
One approach for object localization involves sampling points from non-point features in
order to reduce it to the problem of matching points to points. The main advantage of
reducing the problem of matching points to non-point features to the problem of matching
point sets is that the problem is exactly solvable in constant time. For two-dimensional
rigid transformations, there are only three degrees of freedom: $(x, y, \theta)$ [16, 26]. By
decoupling these degrees of freedom, the resultant problems become linear, and are exactly
solvable in constant time. The crucial observation was that the relative position, $(x, y)$, of
the point sets could be determined exactly irrespective of the orientation, since the average of
the x-coordinates and y-coordinates was invariant to rotation. This decoupling technique is valid
for three dimensions as well [5]. To reduce the problem of matching points to non-point
features to the solvable problem of matching points to points, points are sampled from the
non-point features. This approach is optimal when objects are modeled as point-sets, but
can be suboptimal if objects are modeled as non-point features, because sampling introduces
errors when the sampled model points are out of phase with the actual sensed points.
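The decoupling described above can be sketched for two-dimensional point sets with known correspondences. This is a minimal illustration of the standard closed-form least squares solution, not code from the cited works; the function name and the `(theta, tx, ty)` return convention are assumptions. The centroids fix the translation, and the rotation then follows from a single `atan2` of two sums over the centered points.

```python
import math


def register_point_sets(model, sensed):
    """Least squares rigid fit: sensed[i] ~ R(theta) * model[i] + (tx, ty).

    Both arguments are equal-length lists of (x, y) tuples; model[i] is
    assumed to correspond to sensed[i].
    """
    n = len(model)
    # Centroids of both sets; the optimal transform maps one onto the other.
    mcx = sum(p[0] for p in model) / n
    mcy = sum(p[1] for p in model) / n
    scx = sum(p[0] for p in sensed) / n
    scy = sum(p[1] for p in sensed) / n

    # Accumulate dot and cross products of the centered point pairs.
    s_dot = 0.0
    s_cross = 0.0
    for (mx, my), (sx, sy) in zip(model, sensed):
        mx, my = mx - mcx, my - mcy
        sx, sy = sx - scx, sy - scy
        s_dot += mx * sx + my * sy
        s_cross += mx * sy - my * sx

    # The rotation angle that minimizes the summed squared distance.
    theta = math.atan2(s_cross, s_dot)

    # Translation that carries the rotated model centroid to the sensed centroid.
    c, s = math.cos(theta), math.sin(theta)
    tx = scx - (c * mcx - s * mcy)
    ty = scy - (s * mcx + c * mcy)
    return theta, tx, ty
```

The cost is a fixed number of arithmetic operations per point with no iteration and no initial guess, which is why the point-to-point case has no local minima problem.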
Another approach to object localization for dissimilar features involves constructing
virtual features from sensed feature data and then computing the object's pose by matching
the like model features and virtual features. Such approaches solve an artificial problem,
that of minimizing the error between virtual features and model features rather than
minimizing the error between the sensed data and the modeled objects. Faugeras and Hebert [5]
presented an algorithm for localizing three-dimensional objects using pointwise positional
information. In their approach, virtual features were constructed from sensed data, and then
objects were identified and localized from the features rather than the sensed data. Forsyth
et al. described a three-dimensional localization technique which represented the data by
parameterized conic sections [7]. Objects were recognized and localized using the algebraic
parameters characterizing the conic sections. Recently, Ponce, Hoogs, and Kriegman have
presented algorithms to compute the pose of curved three-dimensional objects [21, 22]. They
relate image observables to models, reformulate the problem geometrically, and utilize
algebraic elimination theory to solve a least squares error problem. One drawback of their
approach is that they minimize the resultant equation, not the actual error function. Ponce
et al. use the Levenberg-Marquardt algorithm for the least squares solution, which may return
a local minimum [21].
Another approach to object localization involves reducing it to the correct least squares
problem, but then solving that least squares problem using gradient descent techniques.
The least squares approach involves formulating an equation to be minimized; consequently,
the error function between a point and a feature segment cannot take into account
feature boundaries. Gradient descent methods require an initial search point and suffer from
problems of local minima. Huttenlocher and Ullman used gradient descent techniques to
improve the pose estimates developed from comparing sets of three model point features and
three image point features [14]. Many localization algorithms have relied on Kalman filters
to match points to non-point features. Kalman filters suffer from local minima problems as
well. Given a set of points and corresponding non-point features, Ayache and Faugeras [3]
attempted to compute the least squares error pose by iteration and by Kalman filters. Kalman
filter-based approaches have demonstrated success in various machine-vision minimization
applications [17, 28, 4]. Kalman filter-based approaches suffer from two problems: sensitivity
to local minima problems, and the requirement of multiple data points.
A great deal of work has been done on matching features (points to points, lines to
lines, points to features, etc.) in the context of interpreting structure from motion [6, 13].
The problem involves matching sensor features from two distinct two-dimensional views of
an object; the problem of motion computation is reduced to solving systems of nonlinear
algebraic equations. In practice, the resulting system of equations is overconstrained and the
coefficients of the equations are inaccurate due to noise. As a result, the optimal solution
is usually found using gradient descent methods on related least squares problems. Most of
the local methods known in the literature do not work well in practice [15] and an algorithm
based on global minimization is greatly desired.
Sparse sensors have recently demonstrated success in recognition and localization
applications [27, 20]. The need for accurate localization algorithms has arisen with the emergence
of these sparse probing sensor techniques. This technique has been applied to recognition
and localization using beam sensor probe data and results in pose estimation with positional
accuracy of 1/1000" and orientational accuracy of 0.3° [27, ?].
The algorithm is applicable to both sparse and dense sensing paradigms; in sparse sensing
applications, such as probing, a small set of data is collected in each experiment, whereas in
dense sensing applications, such as machine vision, an inordinate amount of data is collected
in each experiment. Sparse sensing applications necessitated development of this localization
algorithm because sparse sensing strategies require an exact optimal pose strategy. First, a
small data set cannot accommodate sampling points from the data set in order to transform
the problem of matching features of dissimilar types to matching features of similar types,
and second, a small data set is more susceptible to problems of local minima which can arise
with local methods such as gradient descent or Kalman filtering.
1.2 Organization
The rest of the paper is organized in the following manner. In Section 2, we review
the algebraic methods used in this paper and explain how probabilistic analysis provides a
basis for using a least squares error model to solve the localization problem. We describe
the localization problem and show how algebraic methods can be applied for optimal pose
determination in Section 3. The algorithm for localizing polygonal objects given point data
is described in Section 4, and the algorithm for localizing generalized polygonal objects, i.e.,
objects with linear and circular features, is described in Section 5. In Section 6, we present
various examples, and use the localization technique to compare the localization
accuracy of utilizing all available data with that of using only an interesting subset of the
data. Finally, we conclude by highlighting the contributions of this work.
2 Background
2.1 Error Model
In this section, we describe the error model used for localization. Probability theory
provides the basis for reducing the object localization to a least squares error problem [25].
Assuming imperfect sensing and imperfect models, our goal is to tolerate these discrepancies
and provide the best estimate of the object's pose. Real data presents constraints which
are inconsistent with perfect models, and for this reason, we reformulate the problem into
a related solvable problem; minimizing the squared error is a common method for solving
problems with inconsistent constraints, and it provides the maximum likelihood estimate,
assuming that the errors (noise) approximate independent normal (Gaussian) distributions.
The reason we assume a normal (Gaussian) error distribution is the composition of error
from many different sources. Even though we may be using high precision, high accuracy
sensors to probe a two-dimensional object, such sensors have many intrinsic sources of error.
Furthermore, real objects can never be perfectly modeled, and this accounts for an additional
source of error. These error distributions define a conditional probability distribution on
the sensed data values given the boundary position, which depends on the object's pose
(equation (1)). The conditional probability can be rewritten in terms of the sum of the
squared discrepancies (equation (2)). The conditional probability of the pose given the
sensor data is related to the conditional probability of the sensor data given the pose by
Bayes' theorem (equations (3), (4)).
The minimum sum squared error pose corresponds to the maximum probability (maximum
likelihood) estimate, since minimizing the sum of squared discrepancies in the negated exponent
maximizes the value of the whole expression. Thereby, computing the least squares error pose
is analogous to performing maximum likelihood estimation.
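As a small numerical illustration of this equivalence, the sketch below scores a few candidate poses both by sum of squared error and by Gaussian log-likelihood. It assumes independent errors with a common standard deviation `sigma`; the sensor readings, the candidate pose names, and the predicted values are made up for the example.

```python
import math


def sum_squared_error(sensed, predicted):
    """Sum of squared discrepancies between sensed and predicted readings."""
    return sum((s - p) ** 2 for s, p in zip(sensed, predicted))


def log_likelihood(sensed, predicted, sigma):
    """Log-probability of the sensed data under independent Gaussian errors."""
    norm = -math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(norm - 0.5 * ((s - p) / sigma) ** 2
               for s, p in zip(sensed, predicted))


def best_by_sse(sensed, candidates):
    """Candidate whose predicted readings have the smallest squared error."""
    return min(candidates, key=lambda k: sum_squared_error(sensed, candidates[k]))


def best_by_likelihood(sensed, candidates, sigma):
    """Candidate whose predicted readings maximize the Gaussian log-likelihood."""
    return max(candidates, key=lambda k: log_likelihood(sensed, candidates[k], sigma))


# Hypothetical sensed readings and the readings predicted by three poses.
sensed = [1.02, 1.98, 3.05]
candidates = {
    "pose_a": [1.0, 2.0, 3.0],
    "pose_b": [1.2, 2.2, 3.2],
    "pose_c": [0.5, 2.5, 3.5],
}
```

Because the log-likelihood equals a constant minus the squared error divided by `2 * sigma**2`, both selectors pick the same candidate for any positive `sigma`.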
$\mathrm{Discrepancy}(s_1, s_{1,\mathrm{predicted}})^2 \qquad \mathrm{Discrepancy}(s_2, s_{2,\mathrm{predicted}})^2$