Local Feature Learning

Gustavo Carneiro. Tutorial, ICIP 2013: Local Feature Learning and Non-rigid Matching

Why Local Features?

[Figure: a model image and two test images; global vs. local representations compared in 2-dim feature spaces G and L, trading off robustness (MORE ROBUST) against distinctiveness (MORE DISTINCTIVE)]

G. Carneiro - University of Adelaide

Local Feature

• Limited spatial support
+ Robust to changes and partial occlusion
- Discriminating power (compensated by the number of descriptors)

• Applications
– Visual classification
– Image matching

Representation

• An image is represented by a set of N descriptors (or parts). Each descriptor combines appearance and geometry:

f1 = [ a1 , x1 , θ1 , σ1 ]  (appearance a1; geometry: position x1, dominant orientation θ1, scale σ1)

f2 = [ a2 , g2 ]  (appearance a2, geometry g2)

F = { f1 , f2 , f3 , ... , fN }
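In code, this representation might look like the following sketch (the class name `LocalFeature` and the toy values are my own, not from the tutorial):

```python
import math
from dataclasses import dataclass

@dataclass
class LocalFeature:
    """One local descriptor f_i = [appearance, geometry]."""
    appearance: list   # e.g. a 128-dim SIFT-like vector
    x: tuple           # position (row, col)
    theta: float       # dominant orientation in radians
    sigma: float       # scale

    @property
    def geometry(self):
        # g_i groups position, dominant orientation and scale
        return (self.x, self.theta, self.sigma)

# An image is represented by a set F of N such descriptors
F = [
    LocalFeature(appearance=[0.1, 0.9], x=(10, 20), theta=0.0, sigma=1.6),
    LocalFeature(appearance=[0.8, 0.2], x=(40, 35), theta=math.pi / 4, sigma=3.2),
]
```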

Image Matching

[Schaffalitzky & Zisserman, ECCV02]

Visual Classification

• Instance-based recognition [Lowe, IJCV04]

• Class recognition

Objective Functions

• Image matching
– Maximize precision and recall of feature matching

• Instance-based recognition and class recognition
– Minimize classification error

Hand-designed Local Features (1990s to 2000s)

• ‘Where’ step
– Repeatable

Harris “corners” [Harris88]: sum of squared differences; robust to rotation
DoG [Lowe99,04]: difference of Gaussians; robust to scale

• ‘What’ step
– Robust
– Distinctive

SIFT [Lowe99,04]: 128 dimensions; robust to rigid transforms and brightness

Questions

• Are these hand-designed features optimal?

• Image matching and instance-based recognition (do they maximize matching precision and recall?)
– If distinctive & robust, then max. precision and recall

• Class recognition (do they minimize classification error?)
– If robust & distinctive, then min. classification error

Positive Evidence

[Figure: descriptor evaluation under viewpoint and scale changes]

[Mikolajczyk and Schmid, PAMI’05]

• SIFT-like features showed superior performance
– Dominate matching and classification applications

Matching Problem

• Building Rome in a Day [Agarwal et al. ICCV09]
– Reconstruct 3D scenes from a large collection of images
• Matching based on SIFT features
• Image similarity also based on SIFT features

Class Recognition

• Bag of Features [Sivic and Zisserman, ICCV03]

[Pipeline: local features → visual-word histogram → SVM classifier]

• Also uses SIFT features
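A minimal sketch of the bag-of-features idea: descriptors are quantized to their nearest “visual word” and the image is summarized by a word histogram that a classifier such as an SVM would consume (the toy codebook and descriptors are my own; a real system quantizes SIFT descriptors with a K-means codebook):

```python
def nearest_word(desc, codebook):
    """Index of the codeword closest (in squared Euclidean distance) to desc."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: dist2(desc, codebook[k]))

def bag_of_features(descriptors, codebook):
    """Histogram of visual-word counts: the image representation fed to the classifier."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1
    return hist

codebook = [(0.0, 0.0), (1.0, 1.0)]           # toy 2-word vocabulary
descs = [(0.1, 0.2), (0.9, 1.1), (1.0, 0.8)]  # toy descriptors
print(bag_of_features(descs, codebook))        # [1, 2]
```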

However

• Good features to track [Shi and Tomasi, CVPR94]
– Feature selection based on model similarity

• Detectability, Uniqueness, and Reliability [Ohba and Ikeuchi, PAMI97]

But these are not SIFT…

• True, but D. Lowe noticed something similar about the discriminating power of SIFT

• Not all descriptors have the same discriminating power.
• Can a similar thing be said about the robustness properties of the feature?

Explicit Characterization of Robustness and Discriminating Power [Carneiro and Jepson, CVPR05]

• Robustness: P_on(s_f(f_l, f_o); f_l) ≈ P_β(s_f(f_l, f_o); a_on, b_on)
• Distinctiveness: P_off(s_f(f_l, f_o); f_l) ≈ P_β(s_f(f_l, f_o); a_off, b_off)
• Detectability: P_det(x_l)

Feature vector #260: P_det(x_l) = 87%
Feature vector #540: P_det(x_l) = 67%
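A hedged sketch of the idea: similarity scores of correct matches (“on”) and incorrect matches (“off”) are each modeled with a Beta density, and a likelihood ratio decides whether a score looks like a true match. The Beta parameters below are illustrative, not those of the paper:

```python
import math

def beta_pdf(s, a, b):
    """Beta density on (0, 1), computed in log space for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(s) + (b - 1) * math.log(1 - s))

# Illustrative parameters: 'on' (correct-match) similarities concentrate near 1,
# 'off' (wrong-match) similarities concentrate near 0.
a_on, b_on = 8.0, 2.0
a_off, b_off = 2.0, 8.0

def looks_like_match(s):
    # Likelihood-ratio test between the two Beta models
    return beta_pdf(s, a_on, b_on) > beta_pdf(s, a_off, b_off)

print(looks_like_match(0.9), looks_like_match(0.2))  # True False
```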

(the similarity s_f is computed with phase correlation)

Train a Classifier to Select and Characterize “Good” Features

• P(Obj | Match, Img) = (1/Z) P(Match | Obj, Img) P(Obj | Img)

Selecting and Characterizing Good Features…

• Does it lead to more effective matching?

• Does it lead to more effective classification?

• Why can’t we learn the features by maximizing the actual objective function?
– Instead of designing and characterizing individual features

In the Beginning…

[Rosenblatt 57]

[Diagram: perceptron with inputs f_i(x) and weights w_i]

y = sign( Σ_i f_i(x) w_i + b )

• Features to use?
– Again, hand-designed…

Learning Input Features?

• The perceptron can only deal with linear problems
• Multi-layer perceptron (non-linear activation functions)?
– Can deal with more complex problems!
– Can we finally learn the input features from the image?

[Diagram: two-layer network with weights w_i^(1), w_i^(2)]

y = σ( Σ_i v_i σ( w_i^T x + b_i ) + b )

Back-propagation [Rumelhart et al. 86]

• The algorithm that allowed training of multi-layer perceptrons could not handle more than 1-2 hidden layers
– Long time to converge (if it converges at all)
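The chain rule behind back-propagation can be sketched for a minimal two-layer net with scalar weights (a toy construction of my own, not from the tutorial; the back-propagated gradient is checked against a finite difference):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, v):
    """Two-layer net: hidden h = sigmoid(w*x), output y = sigmoid(v*h)."""
    h = sigmoid(w * x)
    y = sigmoid(v * h)
    return h, y

def backprop_grads(x, t, w, v):
    """Gradients of the squared loss L = 0.5*(y - t)^2 via the chain rule."""
    h, y = forward(x, w, v)
    delta_y = (y - t) * y * (1 - y)     # dL/d(pre-activation of the output)
    dv = delta_y * h                    # dL/dv
    dw = delta_y * v * h * (1 - h) * x  # dL/dw, back-propagated one layer
    return dw, dv

# Verify the analytic gradient against a central finite difference
x, t, w, v = 0.7, 1.0, 0.3, -0.5
dw, dv = backprop_grads(x, t, w, v)
eps = 1e-6

def loss(w_, v_):
    _, y = forward(x, w_, v_)
    return 0.5 * (y - t) ** 2

num_dw = (loss(w + eps, v) - loss(w - eps, v)) / (2 * eps)
assert abs(dw - num_dw) < 1e-6
```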

• Back to hand-designing/selection/characterization of features…

Traditional Methods

• Matching Problems

Hand-designed Features → Matching → Outlier Rejection

• Visual Classification Problems

Hand-designed Features → Supervised Classifier

The Race is on for the “Best” Hand-designed Features

• No transformation – gray values
• Frequency domain – Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT)
• Good reconstruction and uncorrelation – Principal Component Analysis (PCA)
• Good reconstruction and independence – Independent Component Analysis (ICA), sparse coding
• Linear class separability – Linear Discriminant Analysis (LDA)
• Gradient orientation histogram – SIFT, HOG, GLOH, RIFT, etc.
• Image differentials – local jets
• Sampling and representation variations – robust and distinctive

Also for the “Best” Classifiers

• Generative classifiers
– Mixture model
– Naïve Bayes

• Discriminative classifiers
– Logistic regression
– Multi-layer perceptron
– Nearest neighbor
– Support vector machine
– Boosting
– …

Outlier Rejection

• Random Sample Consensus (RANSAC) [Fischler and Bolles, 81; Torr and Murray, IJCV’97]

• MSAC (M-estimator), MLESAC (maximum likelihood), IMPSAC (importance sampling), etc.

• Consequence of feature matching failures

• More about this later in Dr. Chin’s session
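A minimal RANSAC sketch, fitting a line to correspondences corrupted by gross outliers (toy data of my own; real matching pipelines fit homographies or fundamental matrices the same way):

```python
import random

def ransac_line(points, n_iters=200, thresh=0.1, seed=0):
    """Fit y = a*x + b by repeatedly hypothesizing a line from a random
    pair of points and keeping the hypothesis with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # degenerate sample, skip
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# 8 points on y = 2x + 1, plus 2 gross outliers
pts = [(x, 2 * x + 1) for x in range(8)] + [(1.0, 9.0), (5.0, -3.0)]
(a, b), inliers = ransac_line(pts)
print(round(a, 3), round(b, 3), len(inliers))  # 2.0 1.0 8
```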

Feature Learning

• Using the traditional architecture, can we learn the features with the optimization function used for the classifier?

• Matching – maximize feature robustness and discriminating power

• Classification – minimize classification error

Feature Learning - Matching

• Active Shape and Appearance Models [Cootes et al. 95,98]

[Diagram: shape (position x) and appearance g]

• Combine shape and gray level in a single PCA space
• Gradient descent to perform matching

Feature Learning - Matching

• FERNS [M. Özuysal et al. CVPR’07]
• Semi-naïve Bayes classifier:

[Figure credit: D. Capel, 2009]

Feature Learning - Matching

• Explicitly learn a feature transform that is
– Robust and discriminating
• Photo Tourism dataset [Snavely et al. SIGGRAPH’06] used by Winder et al. [Winder and Brown, CVPR’07,09 and PAMI’11]
– More than 100,000 patches (3 scenes)
– Backprojecting 3D points to 2D images from scene reconstructions
– Variations in scene location, brightness and partial occlusion

Feature Learning - Matching

• Discriminative Learning of Local Features [Brown, Hua and Winder, PAMI’11]

Learning is carried out to maximize the AUC of the ROC curve

T-Block: steerable filters, gradients, DoG, etc.
S-Block: …
N-Block: normalization to account for photometric variations
E-Block: linear distance metric learning

(Linear) Distance Metric Learning [Chopra et al. CVPR05, Goldberger et al. NIPS04, Weinberger & Saul JMLR09]

• Image patches:
• Linear transform:
• Distance in T space:
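Assuming the usual form of these quantities (patches x, a linear map T, and distance d(x1, x2) = ||T x1 - T x2||^2), a small sketch follows; here `T_learned` is hand-picked to down-weight a nuisance coordinate rather than actually optimized, purely for illustration:

```python
def apply_T(T, x):
    """Linear transform y = T x, with T given as a list of rows."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def dist_T(T, x1, x2):
    """Squared distance in the transformed space: ||T x1 - T x2||^2."""
    y1, y2 = apply_T(T, x1), apply_T(T, x2)
    return sum((a - b) ** 2 for a, b in zip(y1, y2))

# Toy patches: (x_a, x_b) are a matching pair, (x_a, x_c) are not
x_a, x_b, x_c = [1.0, 0.2], [0.9, 0.8], [0.1, 0.25]

T = [[1.0, 0.0], [0.0, 1.0]]          # identity: the untrained metric
T_learned = [[1.0, 0.0], [0.0, 0.1]]  # discounts the second (nuisance) coordinate

# The learned metric pulls the matching pair closer than the identity does...
assert dist_T(T_learned, x_a, x_b) < dist_T(T, x_a, x_b)
# ...while the non-matching pair stays far relative to the match
assert dist_T(T_learned, x_a, x_c) > dist_T(T_learned, x_a, x_b)
```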

Results [Winder and Brown, PAMI’11]

• Error at 95% (the error rate when 95% of true positives are found)
• In parentheses: dimensionality

Feature Learning - Matching

• Learn feature transforms with non-linear distance metric learning [Carneiro, CVPR’10]
– Uses the original image input (no sampling stage; focus on the embedding stage)
– Uses the Photo Tourism dataset
– Non-linear distance metric learning [Sugiyama JMLR07]

Combining Feature Spaces

• Breiman’s idea about ensemble classifiers [Breiman 01]:
– Combine low-bias, high-variance (unstable) classifiers to produce low-bias, low-variance classifiers.

• Distance

Intuition

[Figure: an unknown target problem and random training problems; the learned transform T should give small distances for matching pairs and large distances for non-matching pairs]

Experiments

• Using cross-validation:
– 50 training classes for training each feature space
– 50 training feature spaces

Experiments

• Matching database of Mikolajczyk and Schmid

Feature Learning - Matching

• Convexify Brown et al.’s optimization function [Simonyan et al. ECCV’12]

• Use boosting to produce a non-linear feature transform [Trzcinski et al. NIPS’12]

• More to come :), but the idea is the same
– Given classes of local descriptors, find a transformation that keeps features from the same class together and separates features from distinct classes

Back to the Classification Problem

• Feature selection to minimize classification error
– Robust Real-time Object Detection [Viola and Jones, IJCV’01]

• Feature extraction to minimize Bayes error (BE)
– Minimum BE facilitates training [Carneiro and Vasconcelos, CRV’05]

• Feature learning
– Supervised convolutional networks [LeCun, 90s until today]

G. Carneiro - University of Adelaide 42 Supervised Convolutional Network [Lecun, 90s until today]

• Sparse connectivity
• Shared weights
• Max pooling

Hard to train :(, but top results (particularly on MNIST)

[Diagram: stacked convolution and pooling layers feeding a classifier]
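A 1-D sketch of two of the ingredients named above, shared-weight convolution (which also gives the sparse connectivity) and max pooling; the toy signal and edge-detecting kernel are my own:

```python
def conv1d(signal, kernel):
    """Valid convolution: one kernel of shared weights slid over the signal."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(feat, size=2):
    """Non-overlapping max pooling: keep the strongest response per window."""
    return [max(feat[i:i + size]) for i in range(0, len(feat) - size + 1, size)]

signal = [0, 0, 1, 1, 0, 0, 1, 0]
edge_kernel = [1, -1]        # responds to step edges

feat = conv1d(signal, edge_kernel)
print(feat)                  # [0, -1, 0, 1, 0, -1, 1]
print(max_pool(feat))        # [0, 1, 0]
```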

Hierarchical Feature Representation [Sivic and Zisserman, ICCV03]

• The current mainstream is to use a 2-layer hierarchy of features

Hand-crafted Low-level Features → Unsupervised Mid-level Features → Supervised Classifier

• The first layer is usually hand-designed (SIFT) – non-linearity
• The second is unsupervised (K-Means, LLC, LCC) – pooling
• Classification uses the representation from the 2nd layer

Feature Learning

• This architecture makes learning the features quite hard, particularly at the lower level

• Learning the mid-level
– K-means
– Sparse coding
– But this does not necessarily minimize the classification error

Back to Multi-layer

• In 2002, G. Hinton found a way to train multi-layer perceptrons with a large number of hidden layers
– Contrastive divergence: unsupervised + supervised training

[Diagram: a hidden Cause generating both Image and Class, contrasting the traditional and the ML formulation]

• Resurgence of interest
– Deep Belief Networks
– Convolutional Neural Networks

Deep Learning [Hinton 06]

• Learn auto-encoder (RBM)

• Unroll

• Fine-tune
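As a toy stand-in for the auto-encoder step: a tied-weight linear auto-encoder with a 1-dim code, trained by numerical gradient descent on the reconstruction error (my own simplification; the slide's actual method is an RBM trained with contrastive divergence):

```python
import math

def reconstruct(x, w):
    """Tied-weight linear auto-encoder: code = w.x, reconstruction = code * w."""
    code = sum(wi * xi for wi, xi in zip(w, x))
    return [code * wi for wi in w]

def recon_error(data, w):
    """Total squared reconstruction error over the dataset."""
    return sum(sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruct(x, w)))
               for x in data)

# Toy data lying (mostly) along the direction (1, 1)
data = [[1.0, 1.1], [2.0, 1.9], [-1.0, -1.0], [0.5, 0.6]]

w = [1.0, 0.0]  # initial encoder direction
for _ in range(300):
    # Numerical gradient of the reconstruction error (for brevity)
    grad = []
    for j in range(2):
        wp, wm = list(w), list(w)
        wp[j] += 1e-5
        wm[j] -= 1e-5
        grad.append((recon_error(data, wp) - recon_error(data, wm)) / 2e-5)
    w = [wj - 0.02 * g for wj, g in zip(w, grad)]
    norm = math.sqrt(sum(wj * wj for wj in w))
    w = [wj / norm for wj in w]  # keep the code direction unit-norm

# Training drives the reconstruction error well below its initial value
assert recon_error(data, w) < 0.1 * recon_error(data, [1.0, 0.0])
```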

Deep Learning

• Hierarchical features [Zeiler et al. ICCV’11]

Deep Learning

• Application in medical image analysis [Carneiro et al., TIP’12, PAMI’13]
• Top results in Left Ventricle segmentation from ultrasound data
• Jaccard index of <= .16, compared to .24 for multiple model data association [Nascimento et al., TIP08] and .18 for boosting [Georgescu et al., CVPR05]

Deep Learning

• Learn a hierarchy of representations and minimize classification error
– Classification on ImageNet [Krizhevsky et al. 2012]
– 1000 categories, 1.5 million labeled images
– Convolutional net (650K neurons, 630M synapses, 60M parameters)
– Trained with backpropagation on GPU and other tricks
– Error rate: 15% (an error whenever the true class isn’t in the top 5)
– Previous error rate: 25%
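The top-5 error criterion used above can be computed as follows (toy scores of my own: an image counts as an error only if its true class is absent from the five highest-scoring classes):

```python
def top5_error(scores_per_image, labels):
    """Fraction of images whose true class is not among the 5 highest scores."""
    errors = 0
    for scores, label in zip(scores_per_image, labels):
        top5 = sorted(range(len(scores)), key=lambda c: -scores[c])[:5]
        if label not in top5:
            errors += 1
    return errors / len(labels)

# Toy check with 8 classes and 2 images
scores = [
    [0.1, 0.5, 0.2, 0.05, 0.04, 0.03, 0.05, 0.03],     # true class 2 is in the top 5
    [0.9, 0.05, 0.04, 0.03, 0.02, 0.01, 0.001, 0.001], # true class 6 is not
]
print(top5_error(scores, [2, 6]))  # 0.5
```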

Deep Learning

• Scene Parsing [Farabet et al. ICML 2012]
– Convolutional net (fully supervised training)
– Top performance on S. Gould’s dataset

Conclusions

• Feature learning improves both matching and classificaon

• Matching
– Still uses typical features (gradient, steerable filters, etc.) and sampling
– Distance metric learning, randomized forests, boosting, etc.

• Classification
– Full learning of low-, mid- and high-level features from image patches
– Deep learning
