In Proc. IEEE International Conference on Pattern Recognition, Brisbane, Australia, 1998

Multi-feature Hierarchical Template Matching Using Distance Transforms

D.M. Gavrila
Daimler-Benz AG, Research and Technology
Wilhelm Runge St. 11
89081 Ulm, Germany

Abstract

We describe a multi-feature hierarchical algorithm to efficiently match N objects (templates) with an image using distance transforms (DTs). The matching is under translation, but it can cover more general transformations by generating the various transformed templates explicitly. The novel part of the algorithm is that, in addition to a coarse-to-fine search over the translation parameters, the N templates are grouped off-line into a template hierarchy based on their similarity. This way, multiple templates can be matched simultaneously at the coarse levels of the search, resulting in various speed-up factors. Furthermore, in matching, features are distinguished by type and separate DTs are computed for each type (e.g. based on edge orientations). These concepts are illustrated in the application of traffic sign detection.

1 Introduction

Matching is a central problem in pattern recognition and computer vision. A common application is object detection and tracking. The various matching methods that have been proposed can be distinguished by what type of features are used [12]. At the one end there are pixel-based methods, which fit models directly to (filtered) image pixels. At the other end there are symbolic matching methods which operate on a few high-level features (e.g. parts of objects and their relations) and apply graph matching methods to establish correspondence.

In this paper, we consider methods for image matching using distance transforms (DTs). Matching using DTs involves intermediate-level features [2] which are extracted locally at various image locations, e.g. edge points. A DT converts the binary image, which consists of feature and non-feature pixels, into a DT image where each pixel denotes the distance to the nearest feature pixel. Similarly, the object of interest is represented by a binary template using the same feature representation. Matching proceeds by correlating the template against the DT image; the correlation value is a measure of similarity in image space.

Previous work on DT-based matching [1] [2] [7] [3] [11] [5] [10] [6] has dealt with the case of matching one template against an image, allowing certain geometrical transformations (e.g. translation, rotation, affine). Here we consider a more general case of matching N templates with an image under translation. Matching of one template under more general transformations can be seen as a special case when all the transformed templates are generated explicitly. In addition to a coarse-to-fine search over the translation parameters, the N templates are grouped off-line into a template hierarchy based on their similarity. Multiple templates can be matched simultaneously at the coarse levels of the search, resulting in various speed-up factors.

The outline of the paper is as follows. Section 2 reviews previous work on distance transforms, distance measures and matching strategies. Section 3 discusses the proposed extensions to the DT matching scheme, which involve the use of multiple features and an efficient match strategy by means of a template hierarchy. Section 4 lists experiments in the application of traffic sign detection. Finally, we conclude in Section 5.

2 Previous Work

2.1 Distance Transforms

A distance transform (DT) converts a binary image, which consists of feature and non-feature pixels, into an image where each pixel value denotes the distance to the nearest feature pixel. DTs approximate global distances by propagating local distances at image pixels. Particular DT algorithms depend on a variety of factors. One factor is whether they result in a Euclidean distance metric or not (EDTs vs. WDTs) [8] [13]. Figure 1 illustrates an EDT. WDTs define various approximations of the "true" Euclidean distance measure. One such approximation is the chamfer-2-3 metric [1] [2] [13], used in our experiments.
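
As an illustration of a WDT, the following minimal Python sketch computes a chamfer-2-3 distance transform with two raster-scan passes (axial steps cost 2, diagonal steps cost 3). The 9-by-9 triangular test pattern is our reading of the binary pattern of Figure 1; the names PATTERN and chamfer_2_3 are illustrative assumptions and not code from the experiments.

```python
INF = 10 ** 9  # larger than any distance that can occur in a small image

# Assumed test pattern: '#' marks a feature pixel (a small triangle outline).
PATTERN = [
    ".........",
    ".........",
    "....#....",
    "...#.#...",
    "..#...#..",
    ".#.....#.",
    ".#######.",
    ".........",
    ".........",
]

FORWARD = ((0, -1, 2), (-1, -1, 3), (-1, 0, 2), (-1, 1, 3))   # left and upper neighbours
BACKWARD = ((0, 1, 2), (1, 1, 3), (1, 0, 2), (1, -1, 3))      # right and lower neighbours


def chamfer_2_3(pattern):
    """Two-pass chamfer-2-3 DT; result is in units where one axial step equals 2."""
    height, width = len(pattern), len(pattern[0])
    dist = [[0 if pattern[r][c] == "#" else INF for c in range(width)]
            for r in range(height)]

    def relax(r, c, mask):
        # Propagate local distances from already-visited neighbours.
        for dr, dc, cost in mask:
            rr, cc = r + dr, c + dc
            if 0 <= rr < height and 0 <= cc < width:
                dist[r][c] = min(dist[r][c], dist[rr][cc] + cost)

    for r in range(height):                      # forward raster scan
        for c in range(width):
            relax(r, c, FORWARD)
    for r in reversed(range(height)):            # backward raster scan
        for c in reversed(range(width)):
            relax(r, c, BACKWARD)
    return dist


dt23 = chamfer_2_3(PATTERN)
```

Dividing the result by 2 gives values in pixel units. These never fall below the Euclidean distance, because a diagonal step is counted as 1.5 rather than the square root of 2.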

Another factor is how the distances are propagated over the image, whether in a raster scan or a contour scan fashion. Most algorithms use a raster scan fashion where the propagation of distances is in a manner independent of the feature locations in the image, with a mask of fixed size and shape. Contour scan algorithms propagate the distances from the feature locations. Some DT approaches also weigh the distances from features by their salience, where salient features (e.g. edge strength, length, curvature) result in comparably lower "distance" values [10]. Finally, there are sequential and parallel DT algorithms [4].

    4.2 3.6 2.8 2.2 2.0 2.2 2.8 3.6 4.2
    3.6 2.8 2.2 1.4 1.0 1.4 2.2 2.8 3.6
    2.8 2.2 1.4 1.0 0.0 1.0 1.4 2.2 2.8
    2.2 1.4 1.0 0.0 1.0 0.0 1.0 1.4 2.2
    1.4 1.0 0.0 1.0 1.4 1.0 0.0 1.0 1.4
    1.0 0.0 1.0 1.0 1.0 1.0 1.0 0.0 1.0
    1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
    1.4 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.4
    2.2 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.2

Figure 1: A binary pattern and its Euclidean Distance Transform

2.2 Match Measures and Strategies

Matching with DT is illustrated schematically in Figure 2. It involves two binary images, a segmented template T and a segmented image I, which we'll call "feature template" and "feature image". The "on" pixels denote the presence of a feature and the "off" pixels the absence of a feature in these binary images. What the actual features are does not matter for the matching method. Typically, one uses edge- and corner-points. The feature template is given off-line for a particular application, and the feature image is derived from the image of interest by feature extraction.

Figure 2: Matching using a DT (diagram labels: Feature Image (binary), Feature Template (binary), DT, correlation, DT Image, DT Template)

Matching T and I involves computing the distance transform of the feature image I. The template T is transformed (e.g. translated, rotated and scaled) and positioned over the resulting DT image of I; the matching measure D(T, I) is determined by the pixel values of the DT image which lie under the "on" pixels of the template. These pixel values form a distribution of distances of the template features to the nearest features in the image. The lower these distances are, the better the match between image and template at this location. There are a number of matching measures that can be defined on the distance distribution. One possibility is to use the average distance to the nearest feature. This is the chamfer distance.

$$D_{chamfer}(T, I) = \frac{1}{|T|} \sum_{t \in T} d_I(t) \qquad (1)$$

where $|T|$ denotes the number of features in T and $d_I(t)$ denotes the distance between feature t in T and the closest feature in I. The chamfer distance thus consists of a correlation between T and the distance image of I, followed by a division. Other more robust measures reduce the effect of missing features (i.e. due to occlusion or segmentation errors) by using the average truncated distance or the f-th quantile value (the Hausdorff distance) [7] [11]. In applications, a template is considered matched at locations where the distance measure D(T, I) is below a user-supplied threshold $\theta$

$$D(T, I) < \theta \qquad (2)$$

Figure 3 illustrates the matching scheme of Figure 2 for the typical case of edge features. Figure 3a-b shows an example image and template. Figure 3c-d shows the edge detection and DT transformation of the edge image. The distances in the DT image are intensity-coded; lighter colors denote increasing distance values.

The advantage of matching a template (Figure 3b) with the DT image (Figure 3d) rather than with the edge image (Figure 3c) is that the resulting similarity measure will be more smooth as a function of the template transformation parameters.
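
The chamfer distance of Equation (1) and the threshold test of Equation (2) can be sketched as follows. This is a minimal illustration, assuming a brute-force Euclidean DT and the small triangular pattern of Figure 1 used both as feature image and as feature template; the helper names (euclidean_dt, chamfer_distance, matches) and the threshold values are hypothetical.

```python
import math

# Assumed reading of the binary pattern of Figure 1 ('#' = feature pixel).
PATTERN = [
    ".........",
    ".........",
    "....#....",
    "...#.#...",
    "..#...#..",
    ".#.....#.",
    ".#######.",
    ".........",
    ".........",
]
FEATURES = [(r, c) for r, row in enumerate(PATTERN)
            for c, ch in enumerate(row) if ch == "#"]


def euclidean_dt(features, height, width):
    """Brute-force Euclidean DT: distance of every pixel to its nearest feature."""
    return [[min(math.hypot(r - fr, c - fc) for fr, fc in features)
             for c in range(width)] for r in range(height)]


def chamfer_distance(template, dt, d_row=0, d_col=0):
    """Eq. (1): mean DT value under the template translated by (d_row, d_col)."""
    return sum(dt[r + d_row][c + d_col] for r, c in template) / len(template)


def matches(template, dt, theta, d_row=0, d_col=0):
    """Eq. (2): the template matches where the distance is below theta."""
    return chamfer_distance(template, dt, d_row, d_col) < theta


dt = euclidean_dt(FEATURES, len(PATTERN), len(PATTERN[0]))
d_exact = chamfer_distance(FEATURES, dt)          # template exactly on the pattern
d_shifted = chamfer_distance(FEATURES, dt, 0, 1)  # template one pixel to the right
```

For this pattern the distance is 0 at the exact position and 8/14 (about 0.57) one pixel to the right, so the measure degrades gradually rather than dropping off abruptly when the template is slightly misplaced.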

This smoothness over the transformation parameters enables the use of various efficient search algorithms to lock onto the correct solution, as will be discussed shortly. It also allows more variability between a template and an object of interest in the image. Matching with the unsegmented (gradient) image, on the other hand, typically provides strong peak responses but rapidly declining off-peak responses.

Figure 3: (a) original image (b) template (c) edge image (d) DT image

A number of extensions have been proposed to the basic DT matching scheme. Some deal with hierarchical approaches to improve match efficiency and use multiple image resolutions [2]. Others use a pruning [3] [7] or a coarse-to-fine approach [11] in the parameter space of relevant template transformations. The latter approaches take advantage of the smooth similarity measure associated with DT-based matching; one need not match a template for each location, rotation or other transformation. Other extensions involve the use of an undirected ("symmetric") similarity measure between image and template [7] [5]. In this case, a DT is applied on both the image and the template. Matching then takes place in both directions (feature template against the DT image, and feature image against the DT template), as seen in Figure 2.

Here is a summary of various aspects covered in past work on DT-based matching:

- features: edge points, corner points
- multi-typing: none
- distance metric: chamfer-2-3, chamfer-3-4, Euclidean
- computation of DT image: serial vs. parallel, salience weighing
- match measures: Euclidean vs. robust measures, directed vs. undirected measures
- matching N templates: none
- global search algorithms: exhaustive vs. hierarchical (in transformation space, in image resolution)

3 Extensions

3.1 Multiple Feature-Types: Edge Orientation

So far, no distinction has been made regarding the type of features. All features would appear in one feature image (or template), and subsequently, in one DT image. If there are several feature types, and one considers the match of a template at a particular location of the DT image, it is possible that the DT image entries reflect shortest distances to features of non-matching type. The similarity measure would be too optimistic, increasing the number of false positives one can expect from matching.

A simple way to take advantage of the possibility to distinguish feature types is to use separate feature images and DT images for each type. Thus having M distinct feature types results in M feature images and M DT images. Similarly, the "untyped" feature template is separated into M "typed" feature templates. Matching proceeds as before, but now the match measure between image and template is the sum of the match measures between template and DT image of the same type.

We now consider the frequent case of the use of edge points as features. For this case, we propose the use of edge orientation as feature type by partitioning the unit circle in M bins

$$\left\{ \left[ \frac{i}{M} 2\pi, \; \frac{i+1}{M} 2\pi \right] \;\middle|\; i = 0, \ldots, M-1 \right\} \qquad (3)$$

Thus a template edge point with edge orientation $\phi$ is assigned to the typed template with index

$$\left\lfloor \frac{\phi}{2\pi} M \right\rfloor \qquad (4)$$

We still have to account for measurement error in the edge orientation and the tolerance we'll allow between the edge orientation of template and image points during matching.
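
Equations (3) and (4) amount to a simple binning of edge orientations. The following is a minimal sketch, assuming orientations are given in radians and edge points as hypothetical (row, col, phi) tuples; the function names are illustrative and not taken from the paper.

```python
import math


def orientation_bin(phi, num_bins):
    """Eq. (4): index of the orientation bin that contains phi (radians)."""
    phi = phi % (2.0 * math.pi)              # map any angle into [0, 2*pi)
    index = int(math.floor(phi / (2.0 * math.pi) * num_bins))
    return index % num_bins                  # guard against rounding up to num_bins


def split_by_orientation(edge_points, num_bins):
    """Separate an untyped feature template into num_bins typed templates."""
    typed = [[] for _ in range(num_bins)]
    for row, col, phi in edge_points:
        typed[orientation_bin(phi, num_bins)].append((row, col))
    return typed


# Usage: three edge points split over M = 8 orientation bins.
typed = split_by_orientation([(0, 0, 0.1), (1, 1, 3.0), (2, 2, -0.1)], 8)
```

The same split would be applied to the edge points of the image, giving one feature image (and hence one DT image) per bin.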

Let the absolute measurement error in edge orientation of the template and image points be $\Delta\phi_T$ and $\Delta\phi_I$, respectively. Let the allowed tolerance on the edge orientation during matching be $\Delta\phi_{tol}$. In order to account properly for these quantities, a template edge point is assigned to a range of typed templates, namely those with indices

$$\left\{ \left\lfloor \frac{\phi - \Delta\phi}{2\pi} M \right\rfloor, \ldots, \left\lfloor \frac{\phi + \Delta\phi}{2\pi} M \right\rfloor \right\} \qquad (5)$$

mapped cyclically over the interval $0, \ldots, M-1$, with

$$\Delta\phi = \Delta\phi_T + \Delta\phi_I + \Delta\phi_{tol} \qquad (6)$$

For applications where there is no sign information associated with the edge orientation, a template edge point is also assigned to the typed templates one obtains by substituting $\phi + \pi$ for $\phi$ in Equation (5).

3.2 Matching N Templates: Template Hierarchy

One often encounters the problem of matching N templates with an image. If the N templates bear no relationship to each other, there is little one can do better than match each of the templates separately. If there is some structure in the template distribution, one can do better. The proposed scheme to match the N related templates involves the use of a template hierarchy, in addition to a coarse-to-fine search over the image. The idea is that at a coarse level of search, when the image grid size of the search is large, it would be inefficient to match each of the N objects separately, if they are relatively similar to each other. Instead, one would group similar templates together and represent them by a prototype template; matching would be done with this prototype, rather than with the individual templates, resulting in a (potentially significant) speed-up. This grouping of templates is done at various levels, resulting in a hierarchy, where at the leaf levels there are the N templates one needs to match with, and on intermediate levels there are the prototypes.

To make matters more concrete, consider first the case of a coarse-to-fine search where one matches a single template under translation. Assume there are L levels of search ($l = 1, \ldots, L$), determined by the size $\sigma_l$ of the underlying uniform grid and the distance threshold $\theta_l$ which determines when a template matches sufficiently to consider matching on a finer grid (in the neighborhood of the promising solution). Let $\theta_{tol}$ denote the allowed tolerance on the distance measure between template and image at a "correct" location. Let $\delta$ denote the distance along the diagonal of a unit grid element. Then by having

$$\theta_l = \theta_{tol} + \frac{1}{2} \delta \sigma_l \qquad (7)$$

one has the desirable property that, using untruncated distance measures such as the chamfer distance, one can assure that the coarse-to-fine approach will not miss a solution. The second term accounts for the (worst) case that the solution lies at the center of the 4 enclosing grid points which form a square.

Now consider the case where the above L-level search is combined with an L-level template hierarchy. Matching can be seen as traversing the tree structure of templates. Each node corresponds to matching a (prototype) template p with the image at node-specific locations. For the locations where the distance measure between template and image is below the user-supplied threshold $\theta_p$, one computes new interest locations for the children nodes (generated by sampling the local neighborhood with a finer grid) and adds the children nodes to the list of nodes to be processed. The matching process starts at the root; the interest locations lie initially on a uniform grid over relevant regions in the image. The tree can be traversed in breadth-first or depth-first fashion. In the experiments, we use depth-first traversal, which has the advantage that one needs to maintain only $L - 1$ sets of interest locations.

Let p be the template corresponding to the node currently processed during the traversal and let $C = \{t_1, \ldots, t_c\}$ be the set of templates corresponding to its children nodes. Let $\delta_p$ be the maximum distance between p and the elements of C.

$$\delta_p = \max_{t_i \in C} D(p, t_i) \qquad (8)$$

Then by having

$$\theta_p = \theta_{tol} + \delta_p + \frac{1}{2} \delta \sigma_l \qquad (9)$$

one has the desirable property that, using untruncated distance measures such as the chamfer distance, one can assure that the coarse-to-fine approach using the template hierarchy will not miss a solution. The thresholds one obtains by Equation (9) are quite conservative; in practice one can use lower thresholds to speed up matching, at the cost of possibly missing a solution (see Experiments).
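
The traversal described above can be sketched on a toy problem. The following is a minimal, non-authoritative illustration: the templates are square outlines of different sizes (a stand-in for the shapes used in the experiments), the DT is a brute-force Euclidean one, the diagonal of a unit grid element is taken as the square root of 2, and a single prototype node is searched directly on the coarse grid. All function names, the dictionary-based tree layout and the parameter values are assumptions made for this example.

```python
import math


def ring(radius):
    """Toy template: square outline at Chebyshev distance `radius` from the anchor."""
    span = range(-radius, radius + 1)
    return [(dr, dc) for dr in span for dc in span
            if max(abs(dr), abs(dc)) == radius]


def euclidean_dt(features, height, width):
    """Brute-force Euclidean DT; adequate for a toy image only."""
    return [[min(math.hypot(r - fr, c - fc) for fr, fc in features)
             for c in range(width)] for r in range(height)]


def chamfer(template, dt, row, col):
    """Eq. (1): mean DT value under the template anchored at (row, col)."""
    return sum(dt[row + dr][col + dc] for dr, dc in template) / len(template)


def template_distance(p, t):
    """D(p, t): mean distance from the features of p to the nearest feature of t."""
    return sum(min(math.hypot(pr - tr, pc - tc) for tr, tc in t)
               for pr, pc in p) / len(p)


def search(node, locations, level, dt, grids, theta_tol, hits):
    """Depth-first traversal of the template tree using the thresholds of Eq. (9)."""
    children = node["children"]
    sigma = grids[level]
    delta_p = max((template_distance(node["template"], ch["template"])
                   for ch in children), default=0.0)   # Eq. (8); 0 for a leaf
    theta_p = theta_tol + delta_p + 0.5 * math.sqrt(2.0) * sigma
    for row, col in locations:
        d = chamfer(node["template"], dt, row, col)
        if d >= theta_p:
            continue                                   # prune this location
        if not children:
            hits.append((d, node["name"], (row, col)))  # leaf template matched
            continue
        half = sigma // 2
        steps = range(-half, half + 1, grids[level + 1])
        finer = [(row + i, col + j) for i in steps for j in steps]
        for child in children:
            search(child, finer, level + 1, dt, grids, theta_tol, hits)


# Toy scene: a 40x40 image containing the outline of "radius" 5 at (17, 22).
leaves = [{"name": r, "template": ring(r), "children": []} for r in (3, 4, 5)]
prototype = {"name": "p4", "template": ring(4), "children": leaves}
dt = euclidean_dt([(17 + dr, 22 + dc) for dr, dc in ring(5)], 40, 40)
coarse = [(r, c) for r in range(8, 33, 4) for c in range(8, 33, 4)]
hits = []
search(prototype, coarse, 0, dt, grids=(4, 1), theta_tol=0.5, hits=hits)
best = min(hits)
```

In this toy scene the object does not lie on the coarse grid, yet the prototype passes its threshold at a neighbouring grid point and the refinement recovers the exact leaf template and position. Lowering the thresholds below Equation (9) would prune more locations at the risk of losing that guarantee.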

4 Experiments

To illustrate the proposed matching method we apply it to the detection of circular and triangular (up/down) signs, as seen on highways and secondary roads. For the moment, we do not consider traffic signs which appear tilted and/or skewed in the image; the only shape parameter considered is scale. Edge points are used as features, further differentiated by their edge orientation. The edge orientations are discretized in 8 values. We use templates for circles and triangles with radii in the range of 7-18 pixels (the images are of size 360 by 288 pixels). This leads to a total of 36 templates, for which a template tree is specified "manually" as in Figure 4. The tree has three levels (not counting the root level, which contains no template). The root node has six children corresponding to two prototypes for each of the three main shapes to be matched: circle, triangle-up, triangle-down. The prototypes at the first level of the hierarchy are simply the templates with radii equal to the median value of intervals [7-12] and [13-18], namely 9 and 15. The prototypes at the second level are the templates with radii equal to the median value of intervals [7-9], [10-12], [13-15] and [16-18]. Each template (or prototype) is partitioned into 8 typed templates based on edge orientation. The direction of the edge orientation is specified; we only search for circles and triangles with a "light-inside-dark-outside" contour characteristic.

Figure 4: Template hierarchy (first-level prototypes C(9), C(15), Tu(9), Tu(15), Td(9), Td(15) for the radius intervals 7-12 and 13-18; the subtree shown under C(9) has second-level prototypes C(8), C(11) and leaves C(7) to C(12). Legend: C(R) circle, Tu(R) triangle-up, Td(R) triangle-down, each of size R)

Matching uses a depth-first traversal over the template tree, in the manner described by Subsection 3.2. Coarse-to-fine sampling uses a grid size of $\sigma = 8, 4, 1$ for the three levels of the template tree. We used distance thresholds $\theta_l = 3.5, 1.35, 0.6$ pixels for the three levels, respectively.

The experiments involved both off- and online tests. Off-line, we used a database of 1000 images, taken during day-time (sunny, rainy) and night-time. We obtained single-image detection rates of about 90%, when allowing solutions to deviate by 2 pixels and by radius 1 from the values obtained by a human. Typically, there were 4-6 false positives per image (in a later pictograph classification phase, more than 95% of these were rejected using an RBF network). Figure 5 illustrates the followed hierarchical approach. The white dots indicate locations where the match between image and a (prototype) template of the template tree was good enough to consider matching with more specific templates (the children), on a finer grid. The final detection result is also shown. More detection results are given in Figure 6, including some false positives. The traffic signs of the database that were not detected had low contrast, or were tilted or skewed. Improvement of the detection rate can thus be achieved in a relatively straightforward manner, by lowering the edge threshold and by adding more templates.

Given image width W, image height H, and K templates, a non-hierarchical matching algorithm would require $W \times H \times K$ correlations between template and image. In the presented hierarchical approach both factors $W \times H$ and K are pruned (by a coarse-to-fine approach in image space and in template space). It is not possible to provide an analytical expression for the speed-up, because it depends on the actual image data and template distribution. Typically, we have observed speed-up factors in the range of 200-400.

5 Conclusion

In this paper we proposed two extensions to DT-based matching. The first extension dealt with differentiating the features by type (i.e. by edge orientation) and the second dealt with matching using a template hierarchy. We observed that this approach can result in a significant speed-up when compared to the exhaustive approach, in the order of two orders of magnitude. Some interesting problems lie ahead regarding the automatic
generation of the template hierarchy.

Figure 5: Traffic sign detection: (a) day and (b) night (white dots denote intermediate results; the locations matched during hierarchical search)

Figure 6: More detection results

References

[1] H. Barrow et al. Parametric correspondence and chamfer matching: Two new techniques for image matching. In International Joint Conference on Artificial Intelligence, pages 659-663, 1977.

[2] G. Borgefors. Hierarchical chamfer matching: A parametric edge matching algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(6):849-865, November 1988.

[3] G.E. Ford, D.W. Paglieroni, and E.M. Tsujimoto. The position-orientation masking approach to parametric search for template matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(7):740-747, 1994.

[4] H. Embrechts and D. Roose. A parallel Euclidean distance transformation algorithm. CVIU, 63(1):15-26, January 1996.

[5] D. M. Gavrila and L. S. Davis. 3-D model-based tracking of humans in action: a multi-view approach. In IEEE Conference on Computer Vision and Pattern Recognition, pages 73-80, San Francisco, 1996.

[6] D. Huttenlocher. Monte Carlo comparison of distance transform based matching measures. In ARPA Image Understanding Workshop, pages 1179-1183, 1997.

[7] D. Huttenlocher, G. Klanderman, and W.J. Rucklidge. Comparing images using the Hausdorff distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(9):850-863, 1993.

[8] F. Leymarie and Martin D. Levine. Fast raster scan distance propagation on the discrete rectangular lattice. Computer Vision, Graphics, and Image Processing. Image Understanding, 55(1):84-94, January 1992.

[9] D. Mumford. Mathematical theories of shape: Do they model perception? In SPIE Vol. 1570 Geometric Methods in Computer Vision, pages 2-10, 1991.

[10] P.L. Rosin and G.A.W. West. Salience distance transforms. GMIP, 57(6):483-521, November 1995.

[11] W. Rucklidge. Locating objects using the Hausdorff distance. In International Conference on Computer Vision, pages 457-464, 1995.

[12] P. Suetens, P. Fua, and A. Hanson. Computational strategies for object recognition. ACM Computing Surveys, 24(1):6-61, 1992.

[13] E. Thiel and A. Montanvert. Chamfer masks: Discrete distance functions, geometrical properties, and optimization. In International Conference on Pattern Recognition, pages 244-247, The Hague, 1992.