LB_Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary Representations and Distance Measures Keogh, Wei, Xi, Lee &Vlachos Come, we shall learn of the indexing of shapes Set forth these figures as I have conceived their shape…* OutlineOutline ofof TalkTalk • The utility of shape matching • Shape representations • Shape distance measures • Lower bounding rotation invariant measures with the LB_Keogh • Accuracy experiments • Efficiency experiments • Conclusions

*Paradiso Canto XVIII 85 The Utility of Shape Matching I

…discovering mimicry, clustering petroglyphs, finding unusual arrowheads, tracking fish migration, finding anomalous fruit fly wings…

1st Discord (Castroville Cornertang)

Specimen 20773

2nd Discord 1st Discord (Martindale point) Drosophila melanogaster The Utility of Shape Matching II

…automatically annotating A old manuscripts, mining medical images, biometrics, spatial mining of horned lizards, indexing nematodes…

B

B C

C A

1st Discord

ABC Shape Representations I For virtually all shape matching problems, rotation is the problem

If I asked you to group these reptile skulls, rotation would not confuse you

There are two ways to be rotation invariant

1) Landmarking: Find the one “true” rotation 2) Rotation invariant features Landmarking Best Rotation Alignment Owl Monkey Owl Monkey Orangutan (species unknown) Northern Gray-Necked • Domain Specific Landmarking Find some fixed point in your domain, eg. the nose on a face, the stem of leaf, the tail of a fish …

A B C • Generic Landmarking Find the major axis of the shape and use that as the canonical alignment Generic Landmark Alignment

Generic Landmark Alignment Best Rotation Alignment A A

B B

The only problem with landmarking is that it does not work Rotation invariant Red Howler Monkey features

Possibilities include: Orangutan Ratio of perimeter to area, fractal measures, elongatedness, circularity, min/max/mean Orangutan curvature, entropy, perimeter of (juvenile) convex hull and histograms

Borneo Orangutan

Mantled Howler Monkey Histogram

The only problem with rotation invariant features is that in throwing away rotation information, you must invariably throw away useful information We can convert shapes into a 1D signal. Thus can we remove information about scale and offset. Rotation we must deal with in …so it seemed to our algorithms… change its shape, from running lengthwise to revolving round…*

0 200 400 600 800 1000 1200 There are many other 1D representations of shape, and our algorithm can work with any of them

*Dante Alighieri.The Divine Comedy Paradiso -- Canto XXX, 90. ShapeShape DistanceDistance MeasuresMeasures

Speak to me There of the useful are but distance Euclidean three… measures Distance Dynamic Time Warping Longest Common Subsequence For the next ten slides, temporarily forget about Mantled Howler Monkey rotation invariance Alouatta palliata

Euclidean Distance Euclidean Distance works well for matching Red Howler Monkey many kinds of shapes Alouatta seniculus seniculus Dynamic Time Warping is useful for natural shapes, Lowland Gorilla which often exhibit (Gorilla gorilla graueri) intraclass variability

DTW Alignment

Mountain Gorilla (Gorilla gorilla beringei) Is man an ape or an angel? Matching skulls is an important problem

This region will not be matched

LCSS can deal LCSS with missing or Alignment occluded parts

DTW For brevity, we will only give details of Euclidean distance in this talk

However, the main point of our paper is that the same idea works for DTW and LCSS with no overhead

We will present empirical results that do show that DTW can be significantly better than Euclidean distance Euclidean Distance Metric

Given two time

C series Q = q1…qn and C = c1…cn , the Q Euclidean distance

0 10 20 30 40 50 60 70 80 90 100 between them is defined as:

n I notice that you 2 D()Q,C ≡ ∑ qi − (ci ) Z-normalized i=1 the time series first The next slide shows a useful optimization Early Abandon Euclidean Distance

During the C computation, if current calculation sum of the squared abandoned at Q this point differences between

0 10 20 30 40 50 60 70 80 90 100 each pair of corresponding data I see, because points exceeds r2 , we incremental value can safely abandon the is always a lower calculation bound to the final value, once it is Abandon all hope greater than the ye who enter here best-so-far, we may as well abandon Most indexing techniques work by grouping objects into logical units, and defining a lower bound distance to the units Here we will use “wedges” as the logical For example, for indexing unit, and LB_Keogh as cities we can use MBRs and the lower bound the classic MIN-DIST distance function of Guttman Wedge Suppose two C1 shapes get converted to C2 time series…

U

W L

Having candidate sequences C1, .. , Ck , we can form two new sequences U and L :

Ui = max(C1i , .. , Cki ) Li = min(C1i , .. , Cki )

They form the smallest possible bounding envelope that encloses sequences C1, .. , Ck.

We call the combination of U and L a wedge, and denote a wedge as W. W = {U, L} A lower bounding C1 measure between an

C2 arbitrary query Q and the set of candidate sequences contained in U a wedge W, is the LB_Keogh W L

Q W ⎧(q −U ) 2 if q >U n i i i i ⎪ 2 LB _ Keogh(Q,W ) = ∑ ⎨ (qi − Li ) if qi < Li i=1 ⎪ ⎩ 0 otherwise Generalized Wedge

•Use W(1,2) to denote that a wedge is built from sequences C1 and C2 . • Wedges can be hierarchally nested. For

example, W((1,2),3) consists of W(1,2) and C3 .

C1 (or W1 ) C2 (or W2 ) C3 (or W3 ) Of course, fatter wedges mean looser lower bounds…

W(1, 2)

Q W(1,2)

W((1, 2), 3) Q W((1,2),3) We are finally ready to explain our idea for rotation invariance, an idea we have sidestepped to this point. Suppose we have a shape as before…

We can create every possible rotation of the shape, by considerer every possible circular shift of the time series, as shown at my left... But we already know how to index such time series by using wedges! We just need to figure out the best wedge making policy.. It sucks being a grad student Hierarchal Clustering

C3 (or W3)

W3 W3 W3

C5 (or W5) W((2,5),3)

W2

W(2,5) W(2,5) W(((2,5),3), (1,4))

C2 (or W2)

W5

C (or W ) 4 4 W1 W1

W(1,4)

C1 (or W1) W4 W4

K = 5 K = 4 K = 3 K = 2 K = 1

Which wedge set to choose ? Once we have all possible rotations of all the objects we want to index inserted into wedges, we can simply use any LB_Keogh indexer Since the introduction of LB_Keogh indexing at this conference 4 years ago, at least 50 groups around the world have used/extended/adapted the idea, making this work easily reimplementable What are the disadvantages of using LB_Keogh?

There are Nun "LB_Keogh has provided a convincing lower bound" T. Rath "LB_Keogh can significantly speed up DTW.". Suzuki "LB_Keogh is the best…". Zhou & Wong "LB_Keogh offers the tightest lower bounds". M. Cardle. "LB_Keogh makes retrieval of time-warped time series feasible even for large data sets". Muller et. al. "LB_Keogh can be effectively used, resulting in considerably less number of DTW computations." Karydis "exploiting LB_Keogh, we can guarantee indexability". Bartolini et. al. "LB_Keogh, the best method to lower bound.." Capitani. "LB_Keogh is fast, because it cleverly exploits global constraints that appear in dynamic programming" Christos Faloutsos. By using the LB_Keogh framework, we can leverage off the wealth of work in the literature AllAll ourour ExperimentsExperiments areare Reproducible!Reproducible! People that do irreproducible experiments should be boiled alive

Agreed! All our data is publicly available

www.cs.ucr.edu/~eamonn/shape/ WeWe testedtested onon manymany diversediverse datasetsdatasets

ĩ …and I recognized Leaf of mine, in whom I found pleasure

¥ Acer circinatum the face (Oregon Vine Maple)

…as a fish dives …the shape of that cold through water ₤ which stings and lashes people with its tail *

*Purgatorio -- Canto IX 5, ¥Purgatorio -- Canto XXIII, ₤Purgatorio -- Canto XXVI, ĩParadiso -- Canto XV 88 Classes Instances Euclidean DTW Error Other Techniques Name Error (%) (%) {R}

Face 16 2240 3.839 3.170{3}

Swedish 15 1125 13.33 10.84{2} Leaves

Chicken 5 446 19.96 19.96{1} 20.5 Discrete strings

MixedBag 9 160 4.375 4.375{1} Chamfer 6.0, Hausdorff 7.0

OSU Leaves 6 442 33.71 15.61{2}

Diatoms 37 781 27.53 27.53{1} 26.0 Morphological Curvature Scale Spaces

Plane 7 210 0.95 0.0{3} 0.55 Markov Descriptor

Fish 7 350 11.43 9.71{1} 36.0 Fourier /Power Cepstrum

Note that DTW is sometimes worth the little extra effort Implementation details should not matter, for example the results reported should be the same if reimplemented in Ret Hat Linux

We therefore use a cost model that is independent of hardware/software/buffer size etc. See the paper for details

We compare to brute force, and were possible a Fourier based approach (it can’t handle DTW) MainMain MemoryMemory ExperimentsExperiments • Projectile point database • Increasingly larger datasets • One-nearest-neighbor queries Euclidean DTW

1.0 1.0

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

0 2 0 32 3 64 5 64125 12 0 B 250 0 Bru 25 00 rute 50 00 B’ te for N 5 1000 0 FFT force Nu 10 000 Ea Forc ce umb 200 0 Earl mbe 2 4000 00 rly a e, R er o 400 000 0 y aba r of 80 000 We band =5 f ob 8 1600 Wedg ndon obj 16 dge on jects e ects i in da n dat tabas abase e (m) (m) Wedge: Euclidean Wedge: DTW 32 16 8

Heterogeneous

4

F r a c t i o n

o f

o b j e c t s

r e t r i e v e d 32

y

16 t i l a

8 n o i s n 4 e

m i

• point/Heterogenous Projectile databases • Increasingly large dimensionality • One-nearest-neighbor queries D 0 Indexing Experiments Indexing Experiments Projectile Points 0.1 0.12 0.08 0.06 0.04 0.02

s l n u ie h p k a S s s y e o n e k y m i n e o p o k a H s n M o o r e M m l r Lemuridae o w e Alouatta H o l w H o d e H r R d u le m (Humans) t e r n u a L il m M e ta - L g

d Papionini e in f f R u Papio – baboons Genus Mandrillus- Mandrill R Tribe

d All these are in the tribe Papionini These are in the family These are in the genus These are in the same species Homo sapiens e ll R i r n y d o e n o k a b n n a o M o B o b M y e a e v l k li B w n e O o O il n d M e l e k v c w u e O J n - d y e a k i r c k Cebidae a G e n S - Pongo d y (Hoolock a e r c a G f - i

te k i a h S

W d e i d k e r a il a S n e e B d v Cebidae (New World monkeys) e d ju r n

a n a Aotus trivirgatus White-nosed Saki, Chiropotes albinasus White-nosed

t satanas Saki, Chiropotes Black Bearded e a *Purgatorio -- XXIV 117 Canto t u e

B u g l Subfamily Aotinae Subfamily Pitheciinae sakis These are in the Genus g n a These are the same species Bunopithecus hooloc Gibbon) All these are in the family Family a m n r a le r O n o a O o b e m ib e n f r G o n k o B c b o l ib o , G o y k e H c k n lo n

o . o o n o m e e u l H il i a G n n Chlorocebus t o e d d n v e e e ju R h u Cercopithecus c y a t G e s d k n u e Cercopithecus neglectus een monkey, both of een monkey,

o Cercopithecus cephus h Cercopithecus M c m , Cercopithecus ascanius a t s y pygerythrus , Chlorocebus s ' e y ) as a e sabaceus , Chlorocebus u z z nk k M a o n r o m B a M e z n z e D a e Cercopithecini r r De Brazza's Monkey, Guenon, Mustached Monkey Green Vervet Monkey B Monkey Red-tailed G e r D o Cercopithecus Chlorocebus Cercopithecini

t Tribe … this its stock from tree was cultivated * except for the skull identified as being either a Vervet or Gr All these are in the genus which belong in the Genus of e which is in the same Tribe ( v r e V Flat-tailed Horned Lizard Phrynosoma mcallii

Dynamic Time Unlike the Warping primates, reptiles require warping…

Texas Horned Lizard Phrynosoma cornutum There is a special reason why this tree is so tall and inverted at its top*

Iguania

Alligatoridae

Crocodylidae Alligatorinae Amphisbaenia Chelonia

Al C lig aim Cr Xa El Gly Ph Ph Ph Ph Ph at a ic ntu sey p ry ry ry ry ry or n os s a tem no no no no no m cro au ia de y so so so so so iss co C T C ra vig nt s m m m m m m is d ro om ro ty il ata u a b a d a t a d a h sip ilu co is co pic is hl ra it au ou er pi s d to d a en co ma ru g na en ylu m ylu ber n rs s las n si s a s s g nie i si de s ca ch joh ii ri i si tap le ns hr ge to ac lii ni tus *Purgatorio -- Canto XXXIII 64 Petroglyphs are images incised in PetroglyphPetroglyph MiningMining rock, usually by prehistoric, • They appear worldwide peoples. They were an important form of pre-writing symbols, used • Over a million in America alone in communication from • Surprisingly little known about them approximately 10,000 B.C.E. to modern times. Wikipedia

who so sketched out the shapes there?* .. they would strike the subtlest minds with awe*

* Purgatorio -- Canto XII 6 Such complex shapes probably need DTW

FutureFuture Work:Work: DataData MiningMining

Limenitis (subset) Danaus (subset) We did not want to work on shape data mining until we could do fast Aterica galene Limenitis reducta Greta morgane Danaus plexippus matching, that would have been ass backwards

Limenitis archippus Tellervo zoilus Placidina euryanassa

Limenitis Danaus archippus plexippus .. so similar in act and coloration that I will put them both to one*

:

-2 -1.5 -1 -0.5 0 0.5 1 1.5 2 * Inferno -- Canto XXIII 29 Questions?Questions? Michail Vlachos: Public Feel free to email us with questions Nudity and Index Eamonn Keogh: Project Leader Structures [email protected] [email protected]

Li Wei: Lower Bounding Sang Hee Lee: [email protected] Anthropology and Primatology [email protected]

Xiaopeng Xi: Image Processing [email protected]