Ensemble Methods - Boosting
Bin Li
IIT Lecture Series
1 / 48
Boosting a weak learner
- Weak learner $L$ produces an $h$ with error rate $\beta = \frac{1}{2} - \epsilon < \frac{1}{2}$, with $\Pr \ge 1 - \delta$ for any $D$. $L$ has access to a continuous stream of training data and a class oracle.
1. $L$ learns $h_1$ on the first $N$ training points.
2. $L$ randomly filters the next batch of training points, extracting $N/2$ points correctly classified by $h_1$ and $N/2$ incorrectly classified, and produces $h_2$.
3. $L$ builds a third training set of $N$ points on which $h_1$ and $h_2$ disagree, and produces $h_3$.
4. $L$ outputs $h = \text{Majority Vote}(h_1, h_2, h_3)$.
- Theorem (Schapire, 1990), "The Strength of Weak Learnability":
  $$\text{error}_D(h) \le 3\beta^2 - 2\beta^3 < \beta$$
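A quick numeric check of the bound may help. Note that $3\beta^2 - 2\beta^3 = 3\beta^2(1 - \beta) + \beta^3$, and that $\beta - (3\beta^2 - 2\beta^3) = \beta(1 - \beta)(1 - 2\beta)$, which is positive for $0 < \beta < \frac{1}{2}$. The sketch below only evaluates the formula from the theorem; the function name `majority_vote_bound` is a hypothetical helper introduced for illustration.

```python
def majority_vote_bound(beta: float) -> float:
    """Evaluate the theorem's bound 3*beta^2 - 2*beta^3 for a weak-learner error beta."""
    return 3 * beta**2 - 2 * beta**3


# Tabulate the bound and the gain over the weak learner for a few error rates.
for beta in (0.1, 0.2, 0.3, 0.4, 0.45):
    bound = majority_vote_bound(beta)
    print(f"beta={beta:.2f}  bound={bound:.4f}  gain={beta - bound:.4f}")
```

The gain shrinks to zero as $\beta$ approaches $\frac{1}{2}$, where the bound equals $\beta$ itself.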
Plot of error rate $3\beta^2 - 2\beta^3$
[Figure: curve of $3\beta^2 - 2\beta^3$; both axes are ticked from 0.0 to 0.5.]
What is AdaBoost?
[Figure: schematic of boosting. Classifier $G_1(x)$ is fit to the training sample and $G_2(x), G_3(x), \ldots, G_M(x)$ to successively reweighted samples; the final classifier is $G(x) = \text{sign}\left[\sum_{m=1}^{M} \alpha_m G_m(x)\right]$. Figure from HTF 2009 (slide: Trevor Hastie, Stanford University, January 2003).]
AdaBoost (Freund & Schapire, 1996)
1. Initialize the observation weights: $w_i = 1/N$, $i = 1, 2, \ldots, N$.
2. For $m = 1$ to $M$ repeat steps (a)-(d):
   (a) Fit a classifier $G_m(x)$ to the training data using weights $w_i$.
   (b) Compute $err_m = \frac{\sum_{i=1}^{N} w_i I(y_i \ne G_m(x_i))}{\sum_{i=1}^{N} w_i}$.
   (c) Compute $\alpha_m = \log((1 - err_m)/err_m)$.
   (d) Update weights for $i = 1, \ldots, N$: $w_i \leftarrow w_i \cdot \exp[\alpha_m \cdot I(y_i \ne G_m(x_i))]$, and renormalize the $w_i$ to sum to 1.
3. Output $G(x) = \text{sign}\left[\sum_{m=1}^{M} \alpha_m G_m(x)\right]$.
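The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the choice of decision stumps as the base classifier, the helper names (`fit_stump`, `stump_predict`, `adaboost_fit`, `adaboost_predict`), and the clipping of $err_m$ away from 0 and 1 are all assumptions that do not appear in the slides. Labels are assumed to be in $\{-1, +1\}$.

```python
import numpy as np


def fit_stump(X, y, w):
    """Assumed base learner: exhaustive search for the stump with lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(X[:, j] <= thr, pol, -pol)
                err = np.sum(w * (pred != y))
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best[1:]


def stump_predict(stump, X):
    j, thr, pol = stump
    return np.where(X[:, j] <= thr, pol, -pol)


def adaboost_fit(X, y, M=10):
    N = len(y)
    w = np.full(N, 1.0 / N)                       # step 1: uniform weights
    ensemble = []
    for _ in range(M):                            # step 2
        stump = fit_stump(X, y, w)                # (a) fit with weights w
        miss = stump_predict(stump, X) != y
        err = np.sum(w * miss) / np.sum(w)        # (b) weighted error
        err = np.clip(err, 1e-10, 1 - 1e-10)      # numerical guard (assumption, not in the slides)
        alpha = np.log((1 - err) / err)           # (c)
        w = w * np.exp(alpha * miss)              # (d) upweight misclassified points
        w /= w.sum()                              #     and renormalize
        ensemble.append((alpha, stump))
    return ensemble


def adaboost_predict(ensemble, X):
    # step 3: sign of the alpha-weighted vote (returns 0 on an exact tie)
    score = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
    return np.sign(score)
```

For example, on a one-dimensional problem with labels `[1, 1, -1, -1, 1, 1]` at `x = 0, ..., 5`, no single stump separates the classes, but `adaboost_predict(adaboost_fit(X, y, M=3), X)` reproduces the labels.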
Example
[Figure: "An example of AdaBoost (2)"; example taken from R. Schapire. Figure annotations: $\epsilon_1 = 0.3$, $\alpha_1 = 0.42$; wrong patterns: $d_i = 0.166$, correct: $d_i = 0.07$. The figure's $\alpha_1 = 0.42$ equals $\frac{1}{2}\log((1 - err_1)/err_1)$, half the value used in the computation here.]
$err_1 = \sum_i w_i I(G_1(x_i) \ne y_i) = 0.3$
$\alpha_1 = \log\frac{1 - err_1}{err_1} = 0.847$
$w_i$ for misclassified points: $0.1 \times \exp(0.847) = 0.233$.
After normalizing, $w_i$ for wrong points is $\frac{0.233}{0.233 \times 3 + 0.1 \times 7} = 0.1667$.
For correct points, $w_i$ is $\frac{0.1}{0.233 \times 3 + 0.1 \times 7} = 0.0714$.
Example (cont.)
[Figure: "An example of AdaBoost (3)"; example taken from R. Schapire. Slide by Martin Riedmiller, Albert-Ludwigs-Universität Freiburg, Machine Learning.]
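The first-round numbers of the example (10 points with initial weight 0.1, three of them misclassified by $G_1$) can be reproduced with a short script. This is only an arithmetic check under those stated counts; the variable names are chosen for illustration.

```python
import math

N, n_wrong = 10, 3                 # counts read off the example: 3 wrong, 7 correct
w0 = 1.0 / N                       # initial weight of every point
err1 = n_wrong * w0                # weighted error of G_1
alpha1 = math.log((1 - err1) / err1)

w_wrong = w0 * math.exp(alpha1)    # unnormalized weight of a misclassified point
w_right = w0                       # correctly classified points are left unchanged
Z = n_wrong * w_wrong + (N - n_wrong) * w_right   # normalizer

print(round(alpha1, 3), round(w_wrong, 3), round(w_wrong / Z, 4), round(w_right / Z, 4))
# prints: 0.847 0.233 0.1667 0.0714
```

After normalization the three misclassified points together carry weight $3 \times 0.1667 = 0.5$, so the next classifier sees the previous mistakes and the previous successes with equal total weight.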