Supervised Learning: Support Vector Machines

Taken from the slides of Andrew Zisserman.

Binary Classification

• Given training data (x_i, y_i) for i = 1...N, with x_i ∈ R^d and y_i ∈ {−1, +1}.
• Learn a classifier f(x) such that
    f(x_i) ≥ 0 if y_i = +1
    f(x_i) < 0 if y_i = −1
• i.e. y_i f(x_i) > 0 for a correct classification.

Linear Separability

• A dataset may be linearly separable or not linearly separable.

Linear Classifiers

• A linear classifier has the form f(x) = w^T x + b.
• In 2D the discriminant is a line: w is the normal to the line (the slope) and b is the bias (the intercept).
• w is also known as the weight vector.
• In 3D the discriminant is a plane, and in nD it is a hyperplane.
• How can we find this separating hyperplane?

The Perceptron Algorithm

• Without loss of generality, assume the separating hyperplane passes through the origin, so f(x) = w^T x.
• The perceptron algorithm starts with an initial guess w = 0.
• Cycle through the data points {x_i, y_i}:
    • Predict sign(f(x_i)) as the label for example x_i.
    • If incorrect, update w_{i+1} = w_i + α y_i x_i (with 0 < α < 1); else w_{i+1} = w_i.
• Repeat until all the data is correctly classified. (A runnable sketch follows at the end of this section.)

Perceptron Characteristics

• If the data is linearly separable, the algorithm will converge.
• Convergence can be slow.
• The separating line can end up close to the training data.

What is the best w?

• The maximum-margin solution: it is the most stable under perturbations of the inputs.

Support Vector Machine: Sketch Derivation

• Since w^T x + b = 0 and c (w^T x + b) = 0 define the same plane, we have the freedom to choose the normalization of w.
• Choose the normalization such that w^T x_+ + b = +1 and w^T x_− + b = −1 for the positive and negative support vectors respectively.
• Then the margin is given by
    (w / ||w||) · (x_+ − x_−) = w^T (x_+ − x_−) / ||w|| = 2 / ||w||

SVM Optimization

• Learning the SVM can be formulated as an optimization:
    max_w 2 / ||w||  subject to  w^T x_i + b ≥ +1 if y_i = +1, and w^T x_i + b ≤ −1 if y_i = −1, for i = 1...N
• Or equivalently:
    min_w ||w||^2  subject to  y_i (w^T x_i + b) ≥ 1 for i = 1...N
• This is a quadratic optimization problem subject to linear constraints, and it has a unique minimum.

Linear Separability Again: What is the best w?

• The points may be linearly separable, but only with a very narrow margin.
• The large-margin solution is possibly better, even though one constraint is violated.
• In general there is a trade-off between the margin and the number of mistakes on the training data.

Introduce Slack Variables: Soft Margin Solution

• The optimization problem becomes
    min_{w ∈ R^d, ξ_i ∈ R^+} ||w||^2 + C Σ_i ξ_i  subject to  y_i (w^T x_i + b) ≥ 1 − ξ_i for i = 1...N
• Every constraint can be satisfied if ξ_i is sufficiently large.
• C is a regularization parameter:
    • Small C allows constraints to be easily ignored: large margin.
    • Large C makes constraints hard to ignore: narrow margin.
    • C = ∞ enforces all constraints: hard margin.
• This is still a quadratic optimization problem with a unique minimum. Note that there is only one parameter, C.
• [Figures: C = ∞ gives the hard margin; C = 10 gives a soft margin.]

Application: Pedestrian Detection in Computer Vision

• Objective: detect (localize) standing humans in an image (cf. face detection with a sliding window classifier).
• This reduces object detection to binary classification: does an image window contain a person or not?
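To make the perceptron update above concrete, here is a minimal NumPy sketch of the algorithm exactly as the slides state it. The toy data, the learning rate α = 0.5, and the epoch cap are illustrative choices, not from the slides:

```python
import numpy as np

def perceptron(X, y, alpha=0.5, max_epochs=1000):
    """Perceptron through the origin: w starts at 0 and is nudged by
    alpha * y_i * x_i on every mistake (alpha and max_epochs are
    illustrative assumptions)."""
    w = np.zeros(X.shape[1])          # initial guess w = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):    # cycle through the data points
            if y_i * (w @ x_i) <= 0:  # sign(f(x_i)) disagrees with the label
                w = w + alpha * y_i * x_i
                mistakes += 1
        if mistakes == 0:             # all data correctly classified
            return w
    return w  # may never separate non-separable data

# Toy linearly separable data with labels in {-1, +1}
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([+1, +1, -1, -1])
w = perceptron(X, y)
print(np.sign(X @ w))  # should match y
```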
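The 2 / ||w|| margin and the hard-margin optimization can be checked numerically. A minimal sketch assuming scikit-learn (the slides name no library), reusing the toy data above; a very large C stands in for the slides' C = ∞:

```python
import numpy as np
from sklearn.svm import SVC  # scikit-learn is an assumption

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([+1, +1, -1, -1])

# A very large C approximates the hard-margin problem (C = infinity)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)  # the 2 / ||w|| margin from the derivation
print("w =", w, "b =", b, "margin =", margin)
print("support vectors:\n", clf.support_vectors_)
```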
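The effect of the regularization parameter C on the soft margin can likewise be illustrated. Again a hedged sketch with scikit-learn; the overlapping synthetic blobs are an assumption chosen so that slack variables are actually needed:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: not perfectly separable
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)),
               rng.normal(+1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# Small C ignores constraints easily (wide margin, many violations);
# large C makes them hard to ignore (narrow margin)
for C in (0.01, 10.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C}: margin={margin:.2f}, "
          f"support vectors={clf.n_support_.sum()}")
```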
Training Data and Features

• Feature: histogram of oriented gradients (HOG). (A hedged end-to-end training sketch appears near the end of this section.)
• [Figures: the training algorithm and the learned model.]

Non-linear SVMs

• Datasets that are linearly separable, perhaps with some noise, work out great.
• But what are we going to do if the dataset is just too hard?
• How about mapping the data to a higher-dimensional space? [Figures: 1-D data on the x axis that no threshold separates becomes separable after mapping x to (x, x²).]

Non-linear SVMs: Kernel Trick

• General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:
    Φ: x → φ(x)
  (A sketch follows after this section.)

Summary

• SVM is a robust classifier.
• It is widely used in computer vision.
• It handles high-dimensional data.

Task

• How would you model the problem of text classification using an SVM? (See the final sketch below.)
• Why can't we choose a KNN classifier instead?
• Why can't we do feature selection?
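The slides' 1-D picture can be reproduced directly: no threshold on x separates the classes, but the explicit lift x → (x, x²) does, and a polynomial kernel achieves the same implicitly. A minimal sketch assuming scikit-learn; the data points are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data where the positives sit between the negatives:
# no single threshold on x can separate them
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([-1, -1, +1, +1, +1, -1, -1])

# Explicit lift phi(x) = (x, x^2): in 2-D a line separates the classes,
# because a threshold on the x^2 coordinate now works
phi = np.column_stack([x, x ** 2])
lifted = SVC(kernel="linear", C=1e6).fit(phi, y)
print(lifted.predict(phi))                 # matches y

# The kernel trick computes such a lift implicitly: a degree-2 polynomial
# kernel corresponds to features containing x and x^2
implicit = SVC(kernel="poly", degree=2, coef0=1.0, C=1e6)
implicit.fit(x.reshape(-1, 1), y)
print(implicit.predict(x.reshape(-1, 1)))  # also matches y
```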
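Returning to the pedestrian-detection application: the window classifier can be sketched end to end with HOG features and a linear SVM. This assumes scikit-image and scikit-learn (the slides name neither library), Dalal-Triggs-style HOG parameters, and random placeholder windows standing in for a real labeled dataset:

```python
import numpy as np
from skimage.feature import hog    # scikit-image assumed for HOG extraction
from sklearn.svm import LinearSVC  # linear SVM as the window classifier

def hog_features(window):
    """HOG descriptor for one 128x64 grayscale window (parameter values
    are common Dalal-Triggs-style choices, not fixed by the slides)."""
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Placeholder data: real training would use cropped pedestrian and
# background windows from a labeled dataset
rng = np.random.default_rng(0)
windows = rng.random((20, 128, 64))       # 20 fake 128x64 windows
labels = np.array([+1] * 10 + [-1] * 10)  # person / not person

X = np.array([hog_features(w) for w in windows])
clf = LinearSVC(C=1.0).fit(X, labels)

# Detection: slide this classifier over every window of a test image and
# report windows where the SVM score w^T x + b is positive
score = clf.decision_function(hog_features(windows[0]).reshape(1, -1))
print("window score:", score)
```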
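For the first task question, one standard way to model text classification with an SVM is to turn each document into a high-dimensional sparse vector (e.g. tf-idf weights over the vocabulary) and train a linear SVM on it, which plays to the summary's point that SVMs handle high-dimensional data. A hedged sketch assuming scikit-learn and a tiny placeholder corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder corpus; a real task would use a labeled text dataset
docs = ["the match ended in a draw", "the striker scored twice",
        "parliament passed the budget", "the senate debated the bill"]
labels = ["sports", "sports", "politics", "politics"]

# Each document becomes a sparse tf-idf vector with one dimension per
# vocabulary term; a linear SVM copes well with this dimensionality
model = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))
model.fit(docs, labels)
print(model.predict(["the goalkeeper saved a penalty"]))
```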
