Local Feature Learning
Gustavo Carneiro, Tutorial, ICIP 2013: Local Feature Learning and Non-rigid Matching
Why Local Features?

[Figure: a model image matched against Test 1 and Test 2; 2-dim feature spaces G (global) and L (local) compared in terms of robustness and distinctiveness]
G. Carneiro - University of Adelaide

Local Features
• Limited spatial support
  + Robust to changes and partial occlusion
  – Discriminating power (compensated by the number of descriptors)
• Applications
  – Visual classification
  – Image matching
Representation

• An image is represented by a set of N descriptors (or parts)
Each descriptor combines appearance with geometry: fi = [ai, gi], where the geometry gi collects position xi, dominant orientation θi, and scale σi, e.g. f1 = [a1, x1, θ1, σ1] and f2 = [a2, g2].
F = { f1, f2, f3, ..., fN }
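As a concrete (hypothetical) data layout, the set-of-parts representation can be sketched as follows; the field names and toy values are illustrative, not from the tutorial:

```python
import math
from dataclasses import dataclass

@dataclass
class LocalFeature:
    """One part f_i = [a_i, g_i]: appearance plus geometry (hypothetical layout)."""
    appearance: list   # descriptor vector a_i (e.g. 128-dim for SIFT)
    x: tuple           # position x_i
    theta: float       # dominant orientation theta_i (radians)
    sigma: float       # scale sigma_i

def image_representation(features):
    """An image is just the set F = {f_1, ..., f_N} of its local features."""
    return list(features)

# Toy image with N = 2 parts
F = image_representation([
    LocalFeature(appearance=[0.1, 0.9], x=(12, 40), theta=0.0, sigma=1.6),
    LocalFeature(appearance=[0.8, 0.2], x=(55, 23), theta=math.pi / 4, sigma=3.2),
])
print(len(F))  # N = 2
```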
Image Matching
[Schaffalitzky & Zisserman, ECCV02]
Visual Classification
• Instance-based recognition [Lowe, IJCV04]
• Class recognition
Objective Functions
• Image matching: maximize the precision and recall of feature matching
• Instance-based recognition and class recognition: minimize the classification error
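The matching objective can be made concrete with a small sketch: given a set of proposed feature matches and the ground-truth correspondences, precision and recall are computed as follows (the toy index pairs are made up):

```python
def precision_recall(proposed, ground_truth):
    """Precision/recall of a set of proposed feature matches.

    proposed, ground_truth: sets of (index_in_image1, index_in_image2) pairs.
    """
    proposed, ground_truth = set(proposed), set(ground_truth)
    tp = len(proposed & ground_truth)  # true-positive matches
    precision = tp / len(proposed) if proposed else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# 3 proposed matches, 2 of them correct, out of 4 true correspondences
p, r = precision_recall({(0, 0), (1, 2), (2, 1)}, {(0, 0), (1, 2), (3, 3), (4, 4)})
print(p, r)  # 0.666..., 0.5
```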
Hand-designed Local Features (1990s to 2000s)

• ‘Where’ step
  – Repeatable
  – Harris “corners” [Harris88]: sum of squared differences; robust to rotation
  – DoG [Lowe99,04]: difference of Gaussians; robust to scale
• ‘What’ step
  – Robust
  – Distinctive
  – SIFT [Lowe99,04]: 128 dimensions; robust to rigid transforms and brightness changes

Questions
• Are these hand-designed features optimal?
• Image matching and instance-based recognition: do they maximize matching precision and recall?
  – If distinctive and robust, then maximum precision and recall
• Class recognition: do they minimize the classification error?
  – If robust and distinctive, then minimum classification error
Positive Evidence
[Figure: descriptor matching performance under viewpoint and scale changes; Mikolajczyk and Schmid, PAMI’05]
• SIFT-like features showed superior performance
  – They dominate matching and classification applications
Matching Problem
• Building Rome in a Day [Agarwal et al. ICCV09]
  – Reconstructs 3D scenes from large collections of images
  – Matching based on SIFT features
  – Image similarity also based on SIFT features
Class Recognition
• Bag of Features [Sivic and Zisserman, ICCV03]
• The bag-of-features histogram is fed to an SVM classifier
• Also uses SIFT features
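A minimal sketch of the bag-of-features step (the SVM itself is omitted): each local descriptor is quantized to its nearest codeword and the image becomes a normalized histogram. The codebook and descriptors below are toy values:

```python
import math

def bag_of_features(descriptors, codebook):
    """Quantize each local descriptor to its nearest codeword and histogram the counts."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: math.dist(d, codebook[k]))
        hist[nearest] += 1
    total = sum(hist)
    return [h / total for h in hist]  # L1-normalized histogram

codebook = [(0.0, 0.0), (1.0, 1.0)]           # 2 visual words (toy)
descriptors = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.2, 0.1)]
print(bag_of_features(descriptors, codebook))  # [0.5, 0.5]
```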
However
• Good Features to Track [Shi and Tomasi, CVPR94]
  – Feature selection based on model similarity
• Detectability, Uniqueness, and Reliability [Ohba and Ikeuchi, PAMI97]
But these are not SIFT…
• True, but D. Lowe noticed something similar about the discriminating power of SIFT
  – Not all descriptors have the same discriminating power
• Can a similar thing be said about the robustness properties of the features?
Explicit Characterization of Robustness and Discriminating Power [Carneiro and Jepson, CVPR05]
• Robustness: P_on(s_f(f_l, f_o); f_l) ≈ P_β(s_f(f_l, f_o); a_on, b_on)
• Distinctiveness: P_off(s_f(f_l, f_o); f_l) ≈ P_β(s_f(f_l, f_o); a_off, b_off)
• Detectability: P_det(x_l)

[Figure: P_on and P_off densities for feature vector #260 (P_det(x_l) = 87%) and feature vector #540 (P_det(x_l) = 67%)]
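Assuming, as on the slide, that on-class and off-class similarities follow Beta distributions, one illustrative proxy for detectability is the probability that an on-class similarity exceeds an off-class one. The Beta parameters below are made up, and this Monte-Carlo estimate is a sketch, not the paper’s exact definition of P_det:

```python
import random

def detectability(a_on, b_on, a_off, b_off, n=20000, seed=0):
    """Monte-Carlo estimate of P(on-class similarity > off-class similarity)
    when both similarities are modeled as Beta distributions."""
    rng = random.Random(seed)
    wins = sum(rng.betavariate(a_on, b_on) > rng.betavariate(a_off, b_off)
               for _ in range(n))
    return wins / n

# A feature whose correct-match similarities concentrate near 0.8 (Beta(8, 2))
# and whose spurious-match similarities concentrate near 0.3 (Beta(3, 7))
print(detectability(8, 2, 3, 7))  # close to 1: a highly detectable feature
```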
• The similarity s_f is computed with phase correlation

Train a Classifier to Select and Characterize “Good” Features
• P(Obj | Match, Img) = (1/Z) P(Match | Obj, Img) P(Obj | Img)

Selecting and Characterizing Good Features…
• Does it lead to more effective matching?
• Does it lead to more effective classification?
• Why can’t we learn the features by maximizing the actual objective function, instead of designing and characterizing individual features?
In the Beginning…
• Perceptron [Rosenblatt 57]

y = sign( Σ_i w_i f_i(x) + b )

• Which features f_i(x) to use?
  – Again, hand-designed…

Learning Input Features?
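Before moving to multi-layer networks, the perceptron just described can be sketched in a few lines; here the features f_i(x) are simply the raw inputs (i.e. hand-designed identity features), trained with the classic mistake-driven update on a toy linearly separable problem:

```python
def train_perceptron(samples, epochs=20):
    """Rosenblatt perceptron: y = sign(sum_i w_i * f_i(x) + b)."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:                      # y in {-1, +1}
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:                         # mistake-driven update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

# Linearly separable toy problem: AND-like labels
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(data)
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1 for x, _ in data]
print(preds)  # [-1, -1, -1, 1]
```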
• The perceptron can only deal with linear problems
• Multi-layer perceptron (non-linear activation functions)?
  – Can deal with more complex problems!
  – Can we finally learn the input features from the image?
With layer weights w_i^(1) and w_i^(2), the output is y = σ( Σ_i v_i w_i + b ), where the v_i are hidden-layer activations.

Back-propagation [Rumelhart et al. 86]

• The algorithm finally allowed training of multi-layer perceptrons, but could not handle more than 1-2 hidden layers
  – Long time to converge (if it converges at all)
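The chain rule that back-propagation applies layer by layer can be checked on a tiny two-layer network: the analytic gradient of the squared loss with respect to one output weight must agree with a finite-difference estimate (the weights and input below are arbitrary toy values):

```python
import math

def forward(x, w1, w2):
    """Two-layer perceptron: sigmoid hidden layer, sigmoid output."""
    h = [1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(row, x))))
         for row in w1]
    o = 1.0 / (1.0 + math.exp(-sum(v * hi for v, hi in zip(w2, h))))
    return h, o

def loss(x, y, w1, w2):
    _, o = forward(x, w1, w2)
    return 0.5 * (o - y) ** 2

x, y = [1.0, -1.0], 1.0
w1 = [[0.5, -0.3], [0.2, 0.8]]
w2 = [0.4, -0.6]
h, o = forward(x, w1, w2)

# Chain rule for dL/dw2[0]: (o - y) * o * (1 - o) * h[0]
analytic = (o - y) * o * (1 - o) * h[0]

# Finite-difference check: backprop must agree with a numerical gradient
eps = 1e-6
numeric = (loss(x, y, w1, [w2[0] + eps, w2[1]])
           - loss(x, y, w1, [w2[0] - eps, w2[1]])) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)  # True
```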
• Back to hand-designing, selecting, and characterizing features…
Traditional Methods
• Matching Problems
[Pipeline: hand-designed features → feature matching → outlier rejection]
• Visual Classification Problems
[Pipeline: hand-designed features → supervised classifier]
The Race Is On for the “Best” Hand-designed Features

• No transformation: gray values
• Frequency domain: Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT)
• Good reconstruction and uncorrelation: Principal Component Analysis (PCA)
• Good reconstruction and independence: Independent Component Analysis (ICA), sparse coding
• Linear class separability: Linear Discriminant Analysis (LDA)
• Gradient orientation histograms: SIFT, HOG, GLOH, RIFT, etc.
• Image differentials: local jets
• Sampling and representation variations: robust and distinctive

Also for the “Best” Classifiers
• Generative classifiers
  – Mixture models
  – Naïve Bayes
• Discriminative classifiers
  – Logistic regression
  – Multi-layer perceptron
  – Nearest neighbor
  – Support vector machine
  – Boosting
  – Random forest
Outlier Rejection
• Random Sample Consensus (RANSAC) [Fischler and Bolles, 81; Torr and Murray, IJCV’97]
• MSAC (M-estimator), MLESAC (maximum likelihood), IMPSAC (importance sampling), etc.
• Outlier rejection is needed as a consequence of feature matching failures
• More about this later in Dr. Chin’s session
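A minimal RANSAC sketch on line fitting shows the idea behind all of these variants: hypothesize from a minimal sample, count the consensus set, keep the best hypothesis. The data and threshold below are toy values:

```python
import random

def ransac_line(points, iters=200, thresh=0.1, seed=1):
    """Minimal RANSAC: repeatedly fit a line y = a*x + b to 2 random points
    and keep the hypothesis with the largest consensus (inlier) set."""
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # degenerate minimal sample
        slope = (y2 - y1) / (x2 - x1)
        inter = y1 - slope * x1
        inliers = [(x, y) for x, y in points if abs(y - (slope * x + inter)) < thresh]
        if len(inliers) > len(best_inliers):
            best, best_inliers = (slope, inter), inliers
    return best, best_inliers

# Points on y = 2x + 1, plus gross outliers (e.g. wrong feature matches)
pts = [(x, 2 * x + 1) for x in range(10)] + [(0, 9), (3, -5), (7, 0)]
(a, b), inliers = ransac_line(pts)
print(a, b, len(inliers))  # 2.0 1.0 10
```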
Feature Learning
• Using the traditional architecture, can we learn the features with the optimization function used by the classifier?
• Matching: maximize feature robustness and discriminating power
• Classification: minimize the classification error
Feature Learning - Matching
• Active Shape and Appearance Models [Cootes et al. 95, 98]
• Shape (position x) and gray-level appearance (g) are combined in a single PCA space
• Gradient descent is used to perform the matching
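The core operation, PCA on the combined shape-and-appearance vectors, can be sketched in closed form for 2-D vectors (real AAMs work in much higher dimensions; the toy data below is made up):

```python
import math

def principal_axis(vectors):
    """First PCA component of 2-D vectors, computed in closed form from
    the 2x2 covariance matrix."""
    n = len(vectors)
    mx = sum(v[0] for v in vectors) / n
    my = sum(v[1] for v in vectors) / n
    sxx = sum((v[0] - mx) ** 2 for v in vectors) / n
    syy = sum((v[1] - my) ** 2 for v in vectors) / n
    sxy = sum((v[0] - mx) * (v[1] - my) for v in vectors) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]]
    lam = 0.5 * ((sxx + syy) + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2))
    vx, vy = lam - syy, sxy          # un-normalized eigenvector
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Toy "shape + gray-level" vectors that vary mostly along the direction (1, 1)
data = [(t, t + 0.01 * s) for t, s in [(0, 1), (1, -1), (2, 1), (3, -1)]]
ax = principal_axis(data)
print(ax)  # approximately (0.707, 0.707)
```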
Feature Learning - Matching
• FERNS [M. Özuysal et al. CVPR’07]
• Semi-naïve Bayes classifier
(Figure credit: D. Capel, 2009)

Feature Learning - Matching
• Explicitly learn a feature transform that is robust and discriminating
• Photo Tourism dataset [Snavely et al. SIGGRAPH’06], used by Winder et al. [Winder and Brown, CVPR’07, ’09 and PAMI’11]
  – More than 100,000 patches (3 scenes)
  – Obtained by back-projecting 3D points onto 2D images from the scene reconstructions
  – Variations in scene location, brightness, and partial occlusion
Feature Learning - Matching
• Discriminative Learning of Local Features [Brown, Hua and Winder, PAMI’11]
• Learning is carried out to maximize the AUC of the ROC curve
• Pipeline blocks:
  – T-Block: steerable filters, gradients, DoG, etc.
  – S-Block: …
  – E-Block: linear distance metric learning
  – N-Block: normalization to account for photometric variations

(Linear) Distance Metric Learning [Chopra et al. CVPR05; Goldberger et al. NIPS04; Weinberger & Saul JMLR09]
• Image patches: x1, x2
• Linear transform: T
• Distance in T space: d(x1, x2) = || T x1 − T x2 ||
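Under the assumption that T is a fixed linear map (here hand-set, not learned), the distance in T space and the AUC objective can be sketched as follows; the patches and transform are toy values:

```python
import math

def t_distance(T, x1, x2):
    """Distance after a linear transform T: d = || T x1 - T x2 ||."""
    def apply(T, x):
        return [sum(t * xi for t, xi in zip(row, x)) for row in T]
    return math.dist(apply(T, x1), apply(T, x2))

def auc(match_d, nonmatch_d):
    """Area under the ROC curve: probability that a random match pair has a
    smaller distance than a random non-match pair (ties count 1/2)."""
    wins = sum((m < n) + 0.5 * (m == n) for m in match_d for n in nonmatch_d)
    return wins / (len(match_d) * len(nonmatch_d))

T = [[1.0, 0.0], [0.0, 0.1]]  # hypothetical T down-weighting a noisy dimension
matches = [([0.0, 0.0], [0.1, 2.0]), ([1.0, 5.0], [1.1, -1.0])]
nonmatches = [([0.0, 0.0], [3.0, 0.0]), ([1.0, 5.0], [5.0, 5.0])]
md = [t_distance(T, a, b) for a, b in matches]
nd = [t_distance(T, a, b) for a, b in nonmatches]
print(auc(md, nd))  # 1.0: the transform suppresses the nuisance dimension
```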
Results [Winder and Brown, PAMI’11]
• Error at 95% (the percentage of errors when 95% of the true positives are found)
• In parentheses: dimensionality
Feature Learning - Matching
• Learn feature transforms from non-linear distance metric learning [Carneiro, CVPR’10]
  – Uses the original image input (no sampling stage; focus on the embedding stage)
  – Uses the Photo Tourism dataset
  – Non-linear distance metric learning [Sugiyama JMLR07]
Combining Feature Spaces
• Breiman’s idea about ensemble classifiers [Breiman 01]:
  – Combine low-bias, high-variance (unstable) classifiers to produce low-bias, low-variance classifiers
• Distance: the ensemble distance averages the distances computed in the individual feature spaces
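One way to realize this idea for distances is sketched below, with random linear projections standing in for the individually unstable feature spaces (this specific construction is illustrative, not the paper’s):

```python
import random, math

def random_projection(dim_in, dim_out, rng):
    """One unstable (high-variance) feature space: a random linear map."""
    return [[rng.gauss(0, 1) for _ in range(dim_in)] for _ in range(dim_out)]

def ensemble_distance(x1, x2, projections):
    """Average the distances measured in several random feature spaces;
    averaging lowers the variance of any single unstable space."""
    def proj(P, x):
        return [sum(p * xi for p, xi in zip(row, x)) for row in P]
    return sum(math.dist(proj(P, x1), proj(P, x2))
               for P in projections) / len(projections)

rng = random.Random(0)
Ps = [random_projection(4, 2, rng) for _ in range(10)]
d_same = ensemble_distance([1, 2, 3, 4], [1.1, 2.0, 3.1, 4.0], Ps)
d_diff = ensemble_distance([1, 2, 3, 4], [4, 3, 2, 1], Ps)
print(d_same < d_diff)  # similar descriptors stay closer than dissimilar ones
```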
Intuition

[Figure: an unknown target problem compared, via a learned transform T, against random training problems; some lie a small distance away, others a large distance]

Experiments
• Using cross-validation:
  – 50 training classes for training each feature space
  – 50 training feature spaces
• Matching database of Mikolajczyk and Schmid
Feature Learning - Matching
• Convexify Brown et al.’s optimization function [Simonyan et al. ECCV’12]
• Use boosting to produce a non-linear feature transform [Trzcinski et al. NIPS’12]
• More to come, but the idea is the same
  – Given classes of local descriptors, find a transformation that keeps features from the same class together and separates features from distinct classes
Back to the Classification Problem
• Feature selection to minimize classification error
  – Robust Real-time Object Detection [Viola and Jones, IJCV’01]
• Feature extraction to minimize the Bayes error (BE)
  – Minimum BE facilitates training [Carneiro and Vasconcelos, CRV’05]
• Feature learning
  – Supervised convolutional networks [LeCun, 90s until today]
Supervised Convolutional Network [LeCun, 90s until today]
• Sparse connectivity
• Shared weights
• Max pooling
• Hard to train, but top results (particularly on MNIST)
• Convolution/pooling layers feed a final classifier
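The two characteristic operations, convolution with shared weights and max pooling, can be sketched directly (the image and kernel below are toy values, and the kernel is hand-set rather than learned):

```python
def conv2d_valid(img, kernel):
    """'Valid' 2-D convolution with one shared kernel: the shared weights
    that give convolutional nets their sparse connectivity."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            row.append(sum(kernel[a][b] * img[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the strongest response per cell."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

img = [[0, 0, 0, 0, 0],
       [0, 9, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 9, 0],
       [0, 0, 0, 0, 0]]
edge = [[1, -1]]                       # tiny hand-set edge kernel
fmap = conv2d_valid(img, edge)         # 5 x 4 feature map
print(max_pool(fmap))                  # [[9, 0], [0, 9]]
```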
Hierarchical Feature Representation [Sivic and Zisserman, ICCV03]

• The current mainstream is to use a 2-layer hierarchy of features: hand-crafted low-level features → unsupervised mid-level features → supervised classifier
• The first layer is usually hand-designed (SIFT), providing the non-linearity
• The second layer is unsupervised (K-means, LLC, LCC), providing the pooling
• Classification uses the representation from the 2nd layer
Feature Learning
• This architecture makes the learning of the features quite hard, particularly at the lower level
• Learning the mid-level features
  – K-means
  – Sparse coding
  – But this does not necessarily minimize the classification error
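The mid-level codebook step can be sketched with plain k-means (Lloyd’s algorithm) on toy 2-D “descriptors”; the deterministic initialization below is a simplification of what a real pipeline would use:

```python
import math

def kmeans(points, k, iters=10):
    """Plain k-means (Lloyd's algorithm): the usual way an unsupervised
    mid-level codebook is built from low-level descriptors."""
    # Deterministic init for this sketch (a real pipeline would use k-means++)
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        centers = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centers[c]
                   for c, cl in enumerate(clusters)]
    return centers

# Two well-separated blobs of toy 2-D "descriptors"
pts = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.1), (5.0, 5.1), (5.1, 4.9), (4.9, 5.0)]
codebook = sorted(kmeans(pts, 2))
print(codebook)  # one center per blob
```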
Back to Multi-layer Perceptrons
• In 2002, G. Hinton found a way to train multi-layer perceptrons with a large number of hidden layers
  – Contrastive divergence: unsupervised + supervised training
[Diagram: the roles of “Cause”, “Image”, and “Class” contrasted between traditional ML and deep learning]
• Resurgence of interest
  – Deep Belief Networks
  – Convolutional Neural Networks
Deep Learning [Hinton 06]

• Learn an auto-encoder (RBM)
• Unroll
• Fine-tune
Deep Learning
• Hierarchical features [Zeiler et al. ICCV’11]
Deep Learning
• Application in medical image analysis [Carneiro et al., TIP’12, PAMI’13]
• Top results in left ventricle segmentation from ultrasound data
• Jaccard index of ≤ 0.16, compared to 0.24 for multiple-model data association [Nascimento et al., TIP08] and 0.18 for boosting [Georgescu et al., CVPR05]
Deep Learning
• Learn a hierarchy of representations and minimize the classification error
  – Classification on ImageNet [Krizhevsky et al. 2012]
  – 1000 categories, 1.5 million labeled images
  – Convolutional net (650K neurons, 630M synapses, 60M parameters)
  – Trained with backpropagation on GPU and other tricks
  – Error rate: 15% (whenever the class isn’t in the top 5)
  – Previous error rate: 25%
Deep Learning
• Scene Parsing [Farabet et al. ICML 2012]
  – Convolutional net (fully supervised training)
  – Top performance on S. Gould’s dataset
Conclusions
• Feature learning improves both matching and classifica on
• Matching
  – Still uses typical features (gradients, steerable filters, etc.) and sampling
  – Distance metric learning, randomized forests, boosting, etc.
• Classification
  – Full learning of low-, mid-, and high-level features from image patches
  – Deep learning