Robust Algorithms for Object Localization
Total Page:16
File Type:pdf, Size:1020Kb
Robust Algorithms for Ob ject Lo calizati on y Dinesh Mano cha Aaron S. Wallack Department of Computer Science Computer Science Division University of North Carolina University of California Chap el Hill, NC 27599-3175 Berkeley, CA 94720 mano [email protected] wallack@rob otics.eecs.b erkeley.edu Abstract Ob ject lo calization using sensed data features and corresp onding mo del features is a fun- damental problem in machine vision. We reformulate ob ject lo calization as a least squares problem: the optimal p ose estimate minimizes the squared error discrepancy b etween the sensed and predicted data. The resulting problem is non-linear and previous attempts to estimate the optimal p ose using lo cal metho ds such as gradient descent su er from lo cal minima and, at times, return incorrect results. In this pap er, we describ e an exact, accurate and ecient algorithm based on resultants, linear algebra, and numerical analysis, for solv- ing the nonlinear least squares problem asso ciated with lo calizing two-dimensional ob jects given two-dimensional data. This work is aimed at tasks where the sensor features and the mo del features are of di erenttyp es and where either the sensor features or mo del features are p oints. It is applicable to lo calizing mo deled ob jects from image data, and estimates the p ose using all of the pixels in the detected edges. The algorithm's running time dep ends mainly on the typ e of non-p oint features, and it also dep ends to a small extent on the number of features. On a SPARC 10, the algorithm takes a few microseconds for rectilinear features, a few milliseconds for linear features, and a few seconds for circular features. Supp orted byFannie and John Hertz Fellowship y Supp orted in part by Junior FacultyAward, University ResearchAward, NSF Grant CCR-9319957 and ARPA Contract DABT63-93-C-0048. 1 2 1 Intro duction Mo del-based recognition and lo calization are fundamental tasks in analyzing and under- standing two-dimensional and three-dimensional scenes [12 , 10 ]. There are many applications for vision systems: insp ection and quality control, mobile rob ot navigation, monitoring and surveillance. The input to such systems typically consists of two-dimensional image data or range data. The mo del-based recognition problem is to determine which ob ject O from a set of candidate ob jects fO ;O ;:::g b est explains the input data. Since the ob ject O can have 1 2 any orientation or lo cation, the lo calization problem is to compute the p ose p for a given ob ject O which b est ts the sensed data. The problems of recognition and lo calization have b een extensively studied in the vision literature [12, 10 ]. The ma jority of the ob ject recognition algorithms utilize edge-based data, where edges are extracted from image data using lter-based metho ds. Edge data is either comprised of pixels whichhave exceeded some threshold, or edge segments formed by straight line approximations of those pixels. On the other hand, ob jects are usually mo deled as a set of non-p oint features, such as line segments, p olygons, or parameterized curves. For this reason, lo calization algorithms must handle matching sensor features and mo del features which are of di erenttyp es. We present an exact and robust mo del-based ob ject lo calization algorithm for features of di erenttyp es, where either the sensed features or the mo del features are p oints refer Figure 1. Unlike Kalman ltering based approaches to the same problem, this algorithm is immune to problems of lo cal minima, and unlike approaches which reduce the problem to matching p oint features which can b e solved exactly [16, 26 , 5], this algorithm do es not intro duce error by sampling p oints from mo del features. Object Edge Detected Pixels Figure 1: Given image data, this lo calization algorithm estimates the ob ject's p ose directly from the edge detected pixels and the mo del features. Our approachinvolves reducing the mo del-based ob ject lo calization problem b etween dis- similar features to a nonlinear least squares error problem, where the error of a p ose is 3 de ned to b e the discrepancy b etween the predicted sensor readings and the actual sensor readings. In this pap er, we concentrate on the ob ject lo calization problem, i.e.,we assume that the features have already b een interpreted using a corresp ondence algorithm. We use global metho ds to solve the problem of optimal p ose estimation. In particular these metho ds compute the p ose that minimizes the sum of the squared errors b etween the sensed data and the predicted data. Rather than using lo cal metho ds such as gradient descent, the problem is formulated algebraically and all of the lo cal extrema of the sum of squared error function, including the global minimum, are computed. For the case of matching p oints to circular features, we use an approximation of the squared error b ecause the error cannot b e formulated algebraically without intro ducing an additional variable for each circular feature. The algorithm's running time mainly dep ends up on the algebraic complexity of the result- ing system, which is a function the typ e of non-p oint features. Since the algorithm takes all of the sensor features and mo del features into account, the running time is linear in the size of the data set. It has b een implemented and works well in practice. In particular, ob jects with rectilinear b oundaries are lo calized in microseconds onaSPARC10workstation. The running time increases with the complexity of the non-p oint features: the algorithm takes a few tens of mil liseconds to lo calize p olygonal mo dels, and one to two seconds to lo calize mo dels with circular arcs. 1.1 Previous Work The lo calization techniques presented in this pap er are applicable to the general eld of mo del-based machine vision. In machine vision applications, lo calization is usually reformu- lated as a least squares error problem. Some metho ds have fo cused on interesting sensor features suchasvertices, and ignored the remainder of the image data [11 , 5 , 3 , 14 , 6, 7], thereby reducing the problem to a problem of matching similar features. These approaches are likely to b e sub optimal b ecause they utilize a subset of the data [23]. In order to utilize all of the edge data, mo deled ob jects must b e compared directly to edge pixel data. In the mo del-based machine vision literature, there are three main approaches for lo calizing ob jects where the mo del features and the sensed features are of di erenttyp es: sampling p oints from the non-p oint features to reduce it to the problem of lo calizing p oint sets; con- structing virtual features from the p oint feature data in order to reduce it to the problem of matching like features; estimating the ob ject's p ose by computing the error function b etween the dissimilar feature typ es and nding the minimum using gradient descent techniques, or 4 Kalman ltering techniques. One approach for ob ject lo calization involves sampling p oints from non-p oint features in order to reduce it to the problem of matching p oints to p oints. The main advantage of reducing the problem of matching p oints to non-p oint features to the problem of matching p oint sets is that the problem is exactly solvable in constant time. For two-dimensional rigid transformations, there are only three degrees of freedom: x; y ; [16, 26]. By decoupling these degrees of freedom, the resultant problems b ecome linear, and are exactly solvable in constant time. The crucial observation was that the relative p osition, x; y , of the p oint sets could b e determined exactly irresp ective of the orientation, since the average of the x- co ordinates and y -co ordinates was invariant to rotation. This decoupling technique is valid for three dimensions as well [5]. To reduce the problem of matching p oints to non-p oint features to the solvable problem of matching p oints to p oints, p oints are sampled from the non-p oint features. This approach is optimal when ob jects are mo deled as p oint-sets, but can b e sub optimal if ob jects are mo deled as non-p oint features, b ecause sampling intro duces errors when the sampled mo del p oints are out of phase with the actual sensed p oints. Another approach to ob ject lo calization for dissimilar features involves constructing vir- tual features from sensed feature data and then computing the ob ject's p ose by matching the like mo del features and virtual features. Such approaches solve an arti cial problem, that of minimizing the error b etween virtual features and mo del features rather than mini- mizing the error b etween the sensed data and the mo deled ob jects. Faugeras and Heb ert [5] presented an algorithm for lo calizing three-dimensional ob jects using p ointwise p ositional in- formation. In their approach, virtual features were constructed from sensed data and then, ob jects were identi ed and lo calized from the features, rather than the sensed data. Forsyth et al.