International Master in Computer Vision

Fundamentals of machine learning for computer vision

Eva Cernadas

Contents

Machine learning theory (Dr. Jaime Cardoso)

Linear regression and optimization (Dr. Jaime Cardoso)

Clustering

Model selection and evaluation

Classical classification models

Artificial neural networks

Support vector machines (SVM)

Ensembles: bagging, boosting and

Artificial neural networks

● ANN: Artificial Neural Networks

● Neural network: combination of local processing units (neurons)

● Neuron: simple processing unit with several inputs and one output.

● Input connections with weights for each input.

● The output is determined by an activation function of the inputs and weights.

● The weights are persistent: they are the memory of the neural net.

● The neurons are grouped in layers: multi-layer networks have input, hidden (one or more) and output layers.

● The weights are calculated to approximate n-dimensional functions.

Neural network

● The term ANN normally refers to multi-layer perceptrons (MLP).

● Supervised training: calculate the weights using input patterns and desired outputs.

● At test time, the pattern is propagated through the net to compute the output.

● There are supervised and unsupervised neural networks.

Perceptron (I)

● One neuron with n inputs and one output, with a binary (0/1) activation function (for example, for 2-class classification problems):

[Figure: one neuron with inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$ and output z, using a step activation function.]

$$z(x) = f\left(\sum_{i=1}^{n} w_i x_i\right), \qquad f(t) = \begin{cases} 1 & t \geq \Gamma \\ 0 & t < \Gamma \end{cases}$$

● A threshold Γ is needed (for example Γ = 0.5) to compute the output (0 or 1). Varying the value of Γ generates a ROC curve.

● Connection weights $w_1, \dots, w_n$: calculated iteratively.

Perceptron (II)

● Gradient descent: minimize the difference between the desired output (y) and the predicted output (z); μ: learning rate.

$$w(t+1) = w(t) + \mu (y_i - z_i)\, x_i, \qquad i = 1, \dots, N$$

Note that $y_i - z_i \in \{-1, 0, 1\}$: the weights change only for misclassified patterns.

● Repeat until all patterns are correctly classified, i.e. $\sum_{i=1}^{N} |y_i - z_i| = 0$. It converges only when the data are linearly separable.
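As an illustration, a minimal NumPy sketch of this perceptron training rule (the function name, the threshold handling and the stopping loop are our own choices, not taken from the slides):

import numpy as np

def train_perceptron(X, y, mu=0.1, gamma=0.5, max_epochs=100):
    # X: (N, n) input patterns; y: (N,) desired outputs in {0, 1}; mu: learning rate
    N, n = X.shape
    w = np.zeros(n)
    for _ in range(max_epochs):
        errors = 0
        for i in range(N):
            z = 1.0 if w @ X[i] >= gamma else 0.0      # step activation with threshold gamma
            if z != y[i]:
                w = w + mu * (y[i] - z) * X[i]         # w(t+1) = w(t) + mu (y_i - z_i) x_i
                errors += 1
        if errors == 0:                                # converges only if data are linearly separable
            break
    return w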

Perceptron (III)

● Differentiable activation function f: sigmoid or hyperbolic tangent:

Logistic sigmoid function, with range (0, 1):

$$f(t) = \sigma(t) = \frac{1}{1 + e^{-at}}$$

Hyperbolic tangent function, with range (−1, 1):

$$f(t) = \tanh(at) = \frac{e^{at} - e^{-at}}{e^{at} + e^{-at}}$$
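The two activations as small NumPy helpers (the slope parameter a follows the formulas above; the default values are ours):

import numpy as np

def sigmoid(t, a=1.0):
    # logistic sigmoid, output in (0, 1)
    return 1.0 / (1.0 + np.exp(-a * t))

def hyperbolic_tangent(t, a=1.0):
    # hyperbolic tangent, output in (-1, 1)
    return np.tanh(a * t)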

Perceptron (V)

● Kernel perceptron: linear classification $z = f(w^T \phi(x) + b)$ in the high-dimensional projected space (f: step function):

$$w = \sum_{i=1}^{N} a_i y_i \phi(x_i), \qquad b = \sum_{i=1}^{N} a_i y_i$$

● Initially $a_i = 0$, $i = 1, \dots, N$; update $a_i = a_i + 1$ when pattern $x_i$ is misclassified, and repeat until no pattern is misclassified.
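A possible sketch of this kernel perceptron update, assuming labels y_i in {-1, +1} and a Gaussian (RBF) kernel; both choices are our assumptions, the slides do not fix them:

import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2))

def train_kernel_perceptron(X, y, max_epochs=50):
    # a[i] counts how many times pattern i triggered an update
    N = X.shape[0]
    a = np.zeros(N)
    K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])
    for _ in range(max_epochs):
        errors = 0
        for i in range(N):
            # w^T phi(x_i) + b expanded with w = sum_j a_j y_j phi(x_j), b = sum_j a_j y_j
            s = np.sum(a * y * K[:, i]) + np.sum(a * y)
            if y[i] * s <= 0:        # misclassified pattern -> increment a_i
                a[i] += 1
                errors += 1
        if errors == 0:
            break
    return a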

Multi-layer neural network (MLP)

● Each neuron defines a hyperplane that partitions the input space.

● It divides the space into two subspaces, with outputs 0/1 (sigmoid function) or ±1 (hyperbolic tangent).

● Combining several neurons in one layer divides the space into regions with piecewise-linear borders (surfaces).

Multi-Layer Perceptron (MLP)

[Figure: MLP with input pattern $x_1, \dots, x_n$, hidden layers 1 to H−1 (with $I_1, \dots, I_{H-1}$ neurons) and output layer H producing $z_1, \dots, z_M$.]

● Hidden layers: sigmoid/tanh activation.

● Output layer: step activation (classification).

● We need: a training set, a cost function and a differentiable activation function.

● $w_j^k$: weight vector of neuron $j = 1, \dots, I_k$ in layer $k = 1, \dots, H$.

● $a_{ij}^k = (w_j^k)^T h_i^{k-1} + b_j^k$, with $i = 1, \dots, N$ (pattern), $k = 1, \dots, H$ (layer), $j = 1, \dots, I_k$ (neuron).

● $h_i^{k-1}$: output of layer $k-1$ ($I_{k-1}$ values) for pattern $x_i$.

● $h_i^k$: output of layer $k$ for pattern $x_i$: $h_{ij}^k = f(a_{ij}^k)$ with $j = 1, \dots, I_k$.

● $y_{ij}$: desired output for output neuron $j$ and pattern $x_i$.
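With this notation, a sketch of the direct (forward) propagation for one pattern; the data layout (a list of weight matrices W[k] of shape (I_k, I_{k-1}) and bias vectors b[k]) is our own choice:

import numpy as np

def forward(x, W, b, f=np.tanh):
    # h^0 = x; a^k = W^k h^{k-1} + b^k; h^k = f(a^k), for k = 1..H
    h = x
    activations = []          # the a^k, needed later by backpropagation
    outputs = [x]             # h^0, h^1, ..., h^H
    for Wk, bk in zip(W, b):
        a = Wk @ h + bk
        h = f(a)
        activations.append(a)
        outputs.append(h)
    return outputs, activations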

Backpropagation (I)

● Gradient descent:

$$\Delta w_j^k = -\mu \frac{\partial J}{\partial w_j^k}, \qquad \Delta b_j^k = -\mu \frac{\partial J}{\partial b_j^k}, \qquad k = 1, \dots, H$$

● Mean squared error $J_i$ per pattern, evaluated on the output layer, with $J = \sum_{i=1}^{N} J_i$:

$$J_i = \frac{1}{2} \sum_{j=1}^{I_H} (y_{ij} - h_{ij}^H)^2 = \frac{|y_i - h_i^H|^2}{2}$$

● The derivatives are (chain rule):

$$\frac{\partial J_i}{\partial w_j^k} = \frac{\partial J_i}{\partial a_{ij}^k} \frac{\partial a_{ij}^k}{\partial w_j^k}, \qquad \frac{\partial J_i}{\partial b_j^k} = \frac{\partial J_i}{\partial a_{ij}^k} \frac{\partial a_{ij}^k}{\partial b_j^k}$$

Backpropagation (II)

● Define: $\delta_{ij}^k \equiv \frac{\partial J_i}{\partial a_{ij}^k}$

● Since $a_{ij}^k = (w_j^k)^T h_i^{k-1} + b_j^k$, then:

$$\frac{\partial a_{ij}^k}{\partial w_j^k} = h_i^{k-1}, \qquad \frac{\partial a_{ij}^k}{\partial b_j^k} = 1$$

● So:

$$\Delta w_j^k = -\mu \sum_{i=1}^{N} \delta_{ij}^k h_i^{k-1}; \qquad \Delta b_j^k = -\mu \sum_{i=1}^{N} \delta_{ij}^k$$

Backpropagation (III)

● For the output layer (k=H):

Since $h_{ij}^H = f(a_{ij}^H)$:

$$\delta_{ij}^H = (y_{ij} - h_{ij}^H)\, f'(a_{ij}^H) = \varepsilon_{ij}^H f'(a_{ij}^H)$$

$$\Delta w_j^H = -\mu \sum_{i=1}^{N} \varepsilon_{ij}^H f'(a_{ij}^H)\, h_i^{H-1}; \qquad \Delta b_j^H = -\mu \sum_{i=1}^{N} \varepsilon_{ij}^H f'(a_{ij}^H)$$

● For a hidden layer k:

$$\delta_{ij}^k = \frac{\partial J_i}{\partial a_{ij}^k} = \sum_{l=1}^{I_{k+1}} \frac{\partial J_i}{\partial a_{il}^{k+1}} \frac{\partial a_{il}^{k+1}}{\partial a_{ij}^k} = \sum_{l=1}^{I_{k+1}} \delta_{il}^{k+1} \frac{\partial a_{il}^{k+1}}{\partial a_{ij}^k}$$

Backpropagation (IV)

● Since $a_{il}^{k+1} = (w_l^{k+1})^T h_i^k + b_l^{k+1}$ and $h_{ij}^k = f(a_{ij}^k)$, then:

$$a_{il}^{k+1} = \sum_{m=1}^{I_k} w_{lm}^{k+1} f(a_{im}^k) + b_l^{k+1} \quad\Rightarrow\quad \frac{\partial a_{il}^{k+1}}{\partial a_{ij}^k} = w_{lj}^{k+1} f'(a_{ij}^k)$$

● So $\delta_{ij}^k$ is a recursive function that depends on $\delta_{il}^{k+1}$ and $w_l^{k+1}$:

$$\delta_{ij}^k = \sum_{l=1}^{I_{k+1}} \delta_{il}^{k+1} w_{lj}^{k+1} f'(a_{ij}^k) = f'(a_{ij}^k) \sum_{l=1}^{I_{k+1}} \delta_{il}^{k+1} w_{lj}^{k+1}, \qquad k = H-1, \dots, 1$$

● Denoting $\varepsilon_{ij}^k = \sum_{l=1}^{I_{k+1}} \delta_{il}^{k+1} w_{lj}^{k+1}$, we obtain $\delta_{ij}^k = \varepsilon_{ij}^k f'(a_{ij}^k)$.

● The derivative is, respectively, $f'(t) = a f(t)[1 - f(t)]$ for the sigmoid and $f'(t) = a[1 - f^2(t)]$ for the tanh activation function.

Backpropagation (V)

Batch backpropagation pseudocode:

repeat                              # epoch loop
    for i = 1:N
        for k = 1:H                 # direct (forward) propagation
            for j = 1:I_k
                a_ij^k = (w_j^k)^T h_i^{k-1} + b_j^k ;  h_ij^k = f(a_ij^k)
            endfor
        endfor
        for j = 1:I_H               # output layer
            eps_ij^H = y_ij - h_ij^H ;  delta_ij^H = eps_ij^H f'(a_ij^H)
        endfor
        for k = H-1:-1:1            # backpropagation of the error
            for j = 1:I_k
                eps_ij^k = sum_{l=1}^{I_{k+1}} delta_il^{k+1} w_lj^{k+1} ;  delta_ij^k = eps_ij^k f'(a_ij^k)
            endfor
        endfor
    endfor
    for k = 1:H                     # weight update
        for j = 1:I_k
            Delta w_j^k = -mu sum_{i=1}^{N} delta_ij^k h_i^{k-1} ;  Delta b_j^k = -mu sum_{i=1}^{N} delta_ij^k
            w_j^k = w_j^k + Delta w_j^k ;  b_j^k = b_j^k + Delta b_j^k
        endfor
    endfor
until stop criterion                # batch processing

● Initial weights $(w_j^k, b_j^k)$: random and low.

● Stop criterion: 1) J or the gradient of J lower than a threshold; 2) maximum number of epochs.

● Intermediate learning rate μ: avoids slowness and oscillations.

● Falling into a local minimum with high J: select the best result among different initializations.

● Updating the weights pattern by pattern (online) can avoid local minima and converges faster.
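The same algorithm as a compact NumPy sketch for batch training with sigmoid units. Array shapes, initialization values and the sign convention (delta is taken as the derivative of J with respect to a, so the update subtracts mu times the accumulated gradient) are our own choices:

import numpy as np

def sigmoid(t, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * t))

def sigmoid_deriv(t, a=1.0):
    s = sigmoid(t, a)
    return a * s * (1.0 - s)                    # f'(t) = a f(t) [1 - f(t)]

def train_mlp(X, Y, sizes, mu=0.1, nepoch=1000):
    # X: (N, n) inputs; Y: (N, M) desired outputs; sizes = [n, I_1, ..., I_H]
    rng = np.random.default_rng(0)
    W = [rng.normal(0.0, 0.1, (sizes[k + 1], sizes[k])) for k in range(len(sizes) - 1)]
    b = [np.zeros(sizes[k + 1]) for k in range(len(sizes) - 1)]
    for _ in range(nepoch):
        dW = [np.zeros_like(w) for w in W]
        db = [np.zeros_like(v) for v in b]
        for x, y in zip(X, Y):
            hs, acts = [x], []                  # direct propagation
            for Wk, bk in zip(W, b):
                acts.append(Wk @ hs[-1] + bk)
                hs.append(sigmoid(acts[-1]))
            delta = (hs[-1] - y) * sigmoid_deriv(acts[-1])   # output layer
            for k in range(len(W) - 1, -1, -1):              # backpropagation
                dW[k] += np.outer(delta, hs[k])
                db[k] += delta
                if k > 0:
                    delta = sigmoid_deriv(acts[k - 1]) * (W[k].T @ delta)
        for k in range(len(W)):                 # batch weight update
            W[k] -= mu * dW[k]
            b[k] -= mu * db[k]
    return W, b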

Enhancements over backpropagation (I)

● Preprocessing: normalize the inputs to zero mean and a variance consistent with the activation range of f(t).

● Symmetrical activation (tanh) instead of the sigmoid function.

● Momentum (α): inertia in the learning:

$$\Delta w_j^k(t+1) = \alpha\, \Delta w_j^k(t) - \mu \sum_{i=1}^{N} \delta_{ij}^k h_i^{k-1}$$

t = iteration; α ∈ [0.7, 0.95]; $\Delta w_j^k$ is reduced approximately by a factor 1 − α.
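A one-step sketch of the momentum update for a single weight array (grad stands for the summed delta*h term of plain backpropagation; names and defaults are ours):

def momentum_step(w, delta_w_prev, grad, mu=0.1, alpha=0.9):
    # Delta w(t+1) = alpha * Delta w(t) - mu * grad;  w <- w + Delta w(t+1)
    delta_w = alpha * delta_w_prev - mu * grad
    return w + delta_w, delta_w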

Enhancements over backpropagation (II)

● RMSProp (root mean square propagation): use a second-order moment instead of a first-order moment; η, β: hyper-parameters. One epoch = one presentation of the training set. The squared gradient is used to scale the learning rate:

S_w = 0; S_b = 0
for epoch = 1:nepoch
    calculate Delta w and Delta b
    S_w = beta S_w + (1-beta) |Delta w|^2 ;  S_b = beta S_b + (1-beta) Delta b^2
    w(t+1) = w(t) - eta Delta w / (eps + sqrt(S_w)) ;  b(t+1) = b(t) - eta Delta b / (eps + sqrt(S_b))
endfor
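The same update as a NumPy sketch; here the gradient is squared element-wise (the usual RMSProp form) and eps avoids division by zero. Default hyper-parameter values are ours:

import numpy as np

def rmsprop_step(w, S_w, grad_w, eta=0.001, beta=0.9, eps=1e-8):
    # S_w accumulates the squared gradient; the step is scaled by 1/sqrt(S_w)
    S_w = beta * S_w + (1.0 - beta) * grad_w ** 2
    w = w - eta * grad_w / (eps + np.sqrt(S_w))
    return w, S_w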

Enhancements over backpropagation (III)

● Adaptive learning rate: μ(t=0) ∈ [0.03, 0.1]

– μ(t+1) = (1 + ε_i) μ(t) if J(t+1) < J(t)

– μ(t+1) = (1 − ε_d) μ(t) and α = 0 if J(t+1) > (1 + ε_c) J(t), with ε_i = 0.05, ε_d = 0.3 and ε_c = 0.04

● Learning rate decay: t = epoch; μ_0, β, k: hyper-parameters

$$\mu(t) = \frac{\mu_0}{1 + t\beta}, \qquad \mu(t) = 0.95^t\, \mu_0, \qquad \mu(t) = \frac{k\, \mu_0}{\sqrt{t}}$$
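The three decay schedules as Python helpers (the default hyper-parameter values are placeholders, not taken from the slides):

def mu_inverse(t, mu0=0.1, beta=0.01):
    # mu(t) = mu0 / (1 + t * beta)
    return mu0 / (1.0 + t * beta)

def mu_exponential(t, mu0=0.1):
    # mu(t) = 0.95^t * mu0
    return 0.95 ** t * mu0

def mu_sqrt(t, mu0=0.1, k=1.0):
    # mu(t) = k * mu0 / sqrt(t), for t >= 1
    return k * mu0 / t ** 0.5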

Enhancements over backpropagation (IV)

● Different learning rate for each weight: increase μ when the gradient of that weight has the same sign in two consecutive iterations.

● The desired outputs $y_{ij}$ should match the range of the activation function (sigmoid or hyperbolic tangent).

● If $y_{ij} \in [0, 1]$, they can be seen as probabilities and the cross entropy can be used as the cost function:

$$J = -\sum_{i=1}^{N} \sum_{j=1}^{I_H} \left[ y_{ij} \ln h_{ij}^H + (1 - y_{ij}) \ln(1 - h_{ij}^H) \right]$$
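A NumPy sketch of this cost (the clipping constant eps is our addition, used to avoid log(0)):

import numpy as np

def cross_entropy(Y, H, eps=1e-12):
    # Y, H: (N, I_H) arrays of desired outputs and network outputs in [0, 1]
    H = np.clip(H, eps, 1.0 - eps)
    return -np.sum(Y * np.log(H) + (1.0 - Y) * np.log(1.0 - H))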

Enhancements over backpropagation (V)

● The Kullback-Leibler divergence (relative entropy), using softmax activations for the output, can also be used as the cost function:

$$h_{ij}^H = \frac{e^{a_{ij}^H}}{\sum_{l=1}^{I_H} e^{a_{il}^H}}, \qquad J = -\sum_{i=1}^{N} \sum_{j=1}^{I_H} y_{ij} \ln \frac{h_{ij}^H}{y_{ij}}$$
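A NumPy sketch of the softmax output and this cost (the max-shift and the eps constant are our numerical-stability additions):

import numpy as np

def softmax(A):
    # row-wise softmax of the output activations A, shape (N, I_H)
    E = np.exp(A - A.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def relative_entropy(Y, H, eps=1e-12):
    # J = -sum_ij y_ij * ln(h_ij / y_ij); terms with y_ij = 0 contribute nothing
    mask = Y > 0
    return -np.sum(Y[mask] * np.log((H[mask] + eps) / Y[mask]))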

● The number H of layers and the number of neurons $I_k$ in each layer $k = 1, \dots, H$ must be decided: tuned hyper-parameters.

● If there are many neurons (and weights $w_j^k$, which are free parameters), overlearning occurs.

Enhancements over backpropagation (VI)

● In practice, it was shown that backpropagation does not provide good solutions when several hidden layers are used.

● Mathematically, it was proved that an ANN with only one hidden layer is a universal approximator of any function.

● The training error decreases, but not the test error: overlearning.

● Regarding the number of neurons per layer: normally we start with many neurons and use regularization to reduce the less informative weights (pruning).

Enhancements over backpropagation (VII)

● Weight decay: use $J'(W) = J(W) + \lambda |W|^2$, regularized by the norm of the weights $W = \{w_j^k\}$, $k = 1, \dots, H$, $j = 1, \dots, I_k$.

● Alternative to $|W|^2$:

$$\sum_{l=1}^{N_w} \frac{\theta_l^2}{\theta_h^2 + \theta_l^2}$$

where $\theta_l$ is the l-th weight ($l = 1, \dots, N_w$) and $\theta_h$ is a threshold: it tends to suppress the weights $\theta_l < \theta_h$.
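Both penalties as NumPy helpers (parameter defaults are ours); they would be added to the cost J before computing gradients:

import numpy as np

def weight_decay(weights, lam=1e-3):
    # lambda * |W|^2 over all weight arrays
    return lam * sum(np.sum(w ** 2) for w in weights)

def weight_elimination(weights, theta_h=1.0):
    # sum_l theta_l^2 / (theta_h^2 + theta_l^2): tends to suppress weights below theta_h
    return sum(np.sum(w ** 2 / (theta_h ** 2 + w ** 2)) for w in weights)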

Enhancements over backpropagation (VIII)

● Sensitivity analysis: the weights $\theta_l$ with low saliency $s_l = h_{ll} \theta_l^2 / 2$ are periodically suppressed (the saliency measures the effect of suppressing $\theta_l$ on J):

$$h_{ll} = \frac{\partial^2 J}{\partial \theta_l^2}$$

● Early stopping: after each epoch, test the network on a validation set (different from the training set).

● Stop the training when the error on that set starts to increase (a symptom of overlearning).
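A sketch of the early-stopping loop; the patience counter is our addition (the slides only say to stop when the validation error starts to increase):

def train_with_early_stopping(train_epoch, validation_error, patience=10, max_epochs=1000):
    # train_epoch(): runs one training epoch; validation_error(): error on the validation set
    best, wait = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch()
        err = validation_error()
        if err < best:
            best, wait = err, 0
        else:
            wait += 1
            if wait >= patience:        # validation error keeps increasing: overlearning
                break
    return best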

Enhancements over backpropagation (IX)

● Weight sharing: some connections are forced to have the same weights to guarantee certain invariances (e.g. translation, rotation and scale in images).

● Alternative: use input features which are invariant to these transformations.

Limitations of MLP (I)

● Slow training, finishing in local minima which are not optimal, especially when several hidden layers are used.

● Many hyper-parameters to tune: number of hidden layers (H), number of neurons in each layer ($I_1, \dots, I_H$), learning rate (μ), momentum (α), etc.

● For some time, multi-layer networks were abandoned, because an ANN with one hidden layer is a universal approximator.

● However, the number of neurons needed to learn a problem with only one hidden layer is higher than with several layers.

Limitations of MLP (II)

● Backpropagation is based on $f'(a_{ij}^{k+1})$, where f is the sigmoid or hyperbolic tangent (tanh), whose derivative is nearly zero over most of its domain (saturation).

● This produces vanishing gradients and stops the training, generating many problems.

MLP in Python
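The original slide content is not reproduced here; a minimal example using scikit-learn's MLPClassifier (the library, the iris data set and the hyper-parameter values are our choices, not necessarily those of the course):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# preprocessing: zero mean and unit variance (fitted on the training set only)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# one hidden layer with 20 neurons and tanh activation
clf = MLPClassifier(hidden_layer_sizes=(20,), activation="tanh", max_iter=1000)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))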

MLP in Octave

● Package nnet in octave: https://octave.sourceforge.io/nnet/

– prestd(): preprocesses the data so that the mean is 0 and the standard deviation is 1.

– trastd(): preprocess additional data for neural network simulation (for example the test set).

– newff(): create a feed-forward backpropagation network.

– train(): trains a feed-forward neural network.

– sim(): simulates a previously defined and trained neural network.

MLP in Matlab

● Neural Network Toolbox in Matlab: it provides functions similar to those in Octave.

MLP in R

● Package nnet in R: https://cran.r-project.org/web/packages/nnet/index.html

Extreme Learning Machine (ELM)

● Network with H = 2 layers: only one hidden layer; training uses direct propagation, not backpropagation.

● Input weights $W^1 = \{w_{jl}^1\}$ and biases $\{b_j^1\}$, with $j = 1, \dots, I_1$ and $l = 1, \dots, n$, are initialized with random values.

● Output weights $W^2 = \{w_{jl}^2\}$ and $\{b_j^2\}$, with $j = 1, \dots, I_2$ and $l = 1, \dots, I_1$, are calculated using the pseudo-inverse of the activity matrix H and the desired outputs Y as $W^2 = H^+ Y$: it is a direct method and therefore efficient.

Extreme Learning Machine (ELM)

● It is theoretically demonstrated that the training error is reduced when $I_1 \to N$.

● The activation function for the output is linear or sigmoid:

$$h_{ij} = \sum_{l=1}^{I_1} w_{jl}^2\, f\left(\sum_{m=1}^{n} w_{lm}^1 x_{im} + b_l^1\right), \qquad i = 1, \dots, N; \; j = 1, \dots, I_2$$

● Authors code: https://personal.ntu.edu.sg/egbhuang/elm_random_hidden_nodes.html
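A compact NumPy sketch of the ELM training step described above (our own illustration, not the authors' code; the output bias is omitted and np.linalg.pinv computes the pseudo-inverse H+):

import numpy as np

def train_elm(X, Y, I1=100, seed=0):
    # X: (N, n) inputs; Y: (N, I2) desired outputs; I1: number of hidden neurons
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W1 = rng.uniform(-1.0, 1.0, (n, I1))            # random input weights
    b1 = rng.uniform(-1.0, 1.0, I1)                 # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))        # hidden activity matrix (sigmoid)
    W2 = np.linalg.pinv(H) @ Y                      # direct solution: W2 = H+ Y
    return W1, b1, W2

def predict_elm(X, W1, b1, W2):
    H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return H @ W2                                   # linear output activation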

Other neural networks

● Radial Basis Function (RBF) networks

● Recurrent networks: networks with feedback from the outputs to the inputs.

● Self-Organizing Maps (SOM): unsupervised learning

● Learning vector quantization (LVQ): SOM with supervised learning

● Boltzmann machines

Other neural networks: deep learning (DL)

● Networks with many hidden layers. Types:

– Deep neural network (DNN)

– Deep belief network (DBN)

– Convolutional neural network (CNN)

– Deep network (DAN)

● Unsupervised pre-training of each layer separately: restricted Boltzmann machine (RBM). It does not use the desired outputs.

● Supervised training of the output weights

● Fine tuning of the intermediate and output weights using backpropagation.
