Decision Trees: Overfitting
CSE 446: Machine Learning
Emily Fox, University of Washington
January 30, 2017
©2017 Emily Fox CSE 446: Machine Learning
Decision tree recap
Loan status (Safe / Risky), root counts: 22 Safe, 18 Risky.

[Figure: example decision tree. The root splits on Credit? (excellent: 9 Safe, 0 Risky; fair: 9 Safe, 4 Risky; poor: 4 Safe, 14 Risky); deeper nodes split on Income? and Term?. For each leaf node, set ŷ = majority value.]
Greedy decision tree learning
• Step 1: Start with an empty tree
• Step 2: Select a feature to split the data (pick the feature whose split leads to the lowest classification error)
• For each split of the tree:
  - Step 3: If there is nothing more to do (stopping conditions 1 & 2), make predictions
  - Step 4: Otherwise, go to Step 2 and continue (recurse) on this split

A minimal code sketch of this recursion follows below.
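The following is a small Python sketch of the greedy recursion, not the course's reference implementation. It assumes X is a NumPy array of binary 0/1 features and y holds non-negative integer class labels (e.g., 0 = Risky, 1 = Safe); the node-dictionary format is an illustration choice.

import numpy as np

def classification_error(labels):
    """Fraction of points that are not in the majority class."""
    if len(labels) == 0:
        return 0.0
    return 1.0 - np.max(np.bincount(labels)) / len(labels)

def greedy_tree(X, y, features, depth=0, max_depth=10):
    """Steps 2-4: greedily pick the feature split with the lowest
    classification error, then recurse until a stopping condition fires."""
    # Stopping condition 1: all examples have the same target value
    # Stopping condition 2: no more features to split on
    if len(set(y)) <= 1 or not features or depth >= max_depth:
        return {'leaf': True, 'prediction': int(np.bincount(y).argmax())}

    best_feature, best_error = None, np.inf
    for f in features:
        mask = X[:, f] == 0
        left, right = y[mask], y[~mask]
        if len(left) == 0 or len(right) == 0:   # split does nothing, skip it
            continue
        err = (len(left) * classification_error(left)
               + len(right) * classification_error(right)) / len(y)
        if err < best_error:
            best_feature, best_error = f, err

    if best_feature is None:                    # no usable split: make a leaf
        return {'leaf': True, 'prediction': int(np.bincount(y).argmax())}

    remaining = [f for f in features if f != best_feature]
    mask = X[:, best_feature] == 0
    return {'leaf': False, 'feature': best_feature,
            'left':  greedy_tree(X[mask],  y[mask],  remaining, depth + 1, max_depth),
            'right': greedy_tree(X[~mask], y[~mask], remaining, depth + 1, max_depth)}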
Scoring a loan application
xi = (Credit = poor, Income = high, Term = 5 years)

[Figure: start at the root and follow the branches matching xi: Credit? = poor → Income? = high → Term? = 5 years → leaf prediction Safe.]

ŷi = Safe
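To make the scoring procedure concrete, here is a small sketch that walks a tree built by the greedy_tree sketch above (same assumed node-dictionary format and binary 0/1 features):

def predict(node, x):
    """Score a point by walking from the root to a leaf, following the
    branch that matches the point's feature value at each split."""
    while not node['leaf']:
        node = node['left'] if x[node['feature']] == 0 else node['right']
    return node['prediction']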
Decision trees vs logistic regression: Example
Logistic regression
Feature   Value   Learned weight
h0(x)     1        0.22
h1(x)     x[1]     1.12
h2(x)     x[2]    -1.07
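Spelling out how these weights are used (the standard logistic-regression decision rule, not written explicitly on the slide): Score(x) = 0.22 + 1.12 x[1] − 1.07 x[2], and the classifier predicts ŷ = +1 when Score(x) > 0 and ŷ = −1 otherwise; the line Score(x) = 0 is the boundary compared against the tree boundaries on the following slides.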
Depth 1: Split on x[1]
[Figure: root node y values: 18 negative, 13 positive. Split on x[1] at threshold −0.07: the x[1] < −0.07 branch holds 13 negative, 3 positive points; the x[1] ≥ −0.07 branch holds 4 negative, 11 positive points.]
Depth 2
[Figure: depth-2 tree. The root (18 negative, 13 positive) splits on x[1] at −0.07. The left child (13 −, 3 +) splits again on x[1] at −1.66 into (7 −, 0 +) and (6 −, 3 +); the right child (4 −, 11 +) splits on x[2] at 1.55 into (1 −, 11 +) and (3 −, 0 +).]
Threshold split caveat
For threshold splits, the same feature can be used multiple times.

[Figure: the depth-2 tree from the previous slide, in which x[1] is split on twice (at −0.07 and then at −1.66).]
Decision boundaries
[Figure: decision boundaries at depth 1, depth 2, and depth 10.]
Comparing decision boundaries

[Figure: decision tree boundaries at depth 1, depth 3, and depth 10 vs. logistic regression boundaries with degree 1, degree 2, and degree 6 features.]
Predicting probabilities with decision trees
Loan status, root counts: 18 Safe, 12 Risky.

[Figure: split on Credit?: excellent (9 Safe, 2 Risky → Safe), fair (6 Safe, 9 Risky → Risky), poor (3 Safe, 1 Risky → Safe).]

For a point x reaching the poor leaf:
P(y = Safe | x) = 3 / (3 + 1) = 0.75
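A minimal sketch of this probability computation (the leaf_probability name and the ±1 label encoding are illustrative assumptions):

import numpy as np

def leaf_probability(leaf_labels, positive_label=+1):
    """P(y = positive_label | x) at a leaf: the fraction of the leaf's
    training points carrying that label."""
    return float(np.mean(np.asarray(leaf_labels) == positive_label))

# The 'poor' leaf above holds 3 Safe (+1) and 1 Risky (-1) points:
leaf_probability([+1, +1, +1, -1])   # -> 0.75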
Depth 1 probabilities
[Figure: the depth-1 tree again (split on x[1] at −0.07, with leaf counts 13 −, 3 + and 4 −, 11 +); each leaf now reports the fraction of positive points it contains rather than just the majority label.]
Depth 2 probabilities
[Figure: the depth-2 tree again (splits on x[1] at −0.07 and −1.66, and on x[2] at 1.55), with per-leaf probabilities given by each leaf's fraction of positive points.]
Comparison with logistic regression
[Figure: class predictions and probability surfaces for logistic regression with degree 2 features vs. a decision tree of depth 2.]
Overfitting in decision trees
What happens when we increase depth?
Training error reduces with depth
Tree depth       depth = 1   depth = 2   depth = 3   depth = 5   depth = 10
Training error   0.22        0.13        0.10        0.03        0.00

[Figure: the decision boundary at each depth.]
Two approaches to picking simpler trees
1. Early Stopping: Stop the learning algorithm before tree becomes too complex
2. Pruning: Simplify the tree after the learning algorithm terminates
Technique 1: Early stopping
• Stopping conditions (recap):
  1. All examples have the same target value
  2. No more features to split on
• Early stopping conditions (see the sketch below):
  1. Limit tree depth (choose max_depth using a validation set)
  2. Do not consider splits that do not cause a sufficient decrease in classification error
  3. Do not split an intermediate node which contains too few data points
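A small sketch of how these three checks might sit inside a tree-growing loop; the threshold values are hypothetical and, as the slide notes for max_depth, would normally be chosen on a validation set:

MAX_DEPTH = 6               # early stopping condition 1: limit tree depth
MIN_ERROR_REDUCTION = 1e-3  # early stopping condition 2: require a real error decrease
MIN_NODE_SIZE = 10          # early stopping condition 3: don't split tiny nodes

def stop_early(y, depth, best_error_reduction):
    """Return True if any early stopping condition fires at this node."""
    if depth >= MAX_DEPTH:                          # condition 1
        return True
    if best_error_reduction < MIN_ERROR_REDUCTION:  # condition 2
        return True
    if len(y) < MIN_NODE_SIZE:                      # condition 3
        return True
    return False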
Challenge with early stopping condition 1
Hard to know exactly when to stop
Also, we might want some branches of the tree to go deeper while others remain shallow.

[Figure: classification error vs. tree depth, from simple trees to complex trees. Training error keeps decreasing with depth while true error eventually rises; the chosen max_depth is marked on the depth axis.]
Early stopping condition 2: Pros and Cons
• Pros:
  - A reasonable heuristic for early stopping that avoids useless splits
• Cons:
  - Too short-sighted: we may miss "good" splits that occur right after "useless" splits
  - We saw this with the "xor" example
Two approaches to picking simpler trees
1. Early Stopping: Stop the learning algorithm before tree becomes too complex
2. Pruning: Simplify the tree after the learning algorithm terminates
Complements early stopping
Pruning: intuition

Train a complex tree, simplify later
[Figure: a complex tree is simplified into a simpler tree.]
Pruning motivation
Simplify after the tree is built; don't stop too early.

[Figure: classification error vs. tree depth, from simple trees to complex trees, showing true error and training error.]
Scoring trees: Desired total quality format
Want to balance:
  i. How well the tree fits the data
  ii. Complexity of the tree

Total cost = measure of fit + measure of complexity
  - Measure of fit (classification error): a large value means a bad fit to the training data
  - Measure of complexity: a large value means the tree is likely to overfit
Simple measure of complexity of tree
L(T) = # of leaf nodes

[Figure: example tree splitting on Credit? into excellent, good, fair, bad, and poor, with five leaves (three Safe, two Risky), so L(T) = 5.]
Balance simplicity & predictive power
Too complex (risk of overfitting): [Figure: the full tree splitting on Credit?, then Income? and Term?.]

Too simple (high classification error): [Figure: a single root node predicting Risky.]
Balancing fit and complexity
Total cost C(T) = Error(T) + λ L(T)
λ is a tuning parameter.
- If λ = 0: complexity is ignored and we simply minimize training error, i.e. we keep the fully grown tree.
- If λ = ∞: complexity dominates and the best "tree" is a single root node predicting the majority class.
- If λ is in between: we trade off fit against complexity and get trees of intermediate size.
Tree pruning algorithm
Step 1: Consider a split
Tree T

[Figure: the root splits on Credit? (excellent, fair, poor); the poor branch splits on Income?, the fair branch on Term?. The lower Term? split is the candidate for pruning.]
Step 2: Compute total cost C(T) of split
Tree T, λ = 0.3

Tree   Error   #Leaves   Total
T      0.25    6         0.43

C(T) = Error(T) + λ L(T)

[Figure: same tree as before, with the lower Term? split marked as the candidate for pruning.]
Step 3: "Undo" the splits on Tsmaller

Tree Tsmaller, λ = 0.3

Tree       Error   #Leaves   Total
T          0.25    6         0.43
Tsmaller   0.26    5         0.41

C(T) = Error(T) + λ L(T)

[Figure: Tree Tsmaller, in which the candidate Term? split has been replaced by a leaf node.]

Replace split by leaf node?
Step 4: Prune if total cost is lower: C(Tsmaller) ≤ C(T)

Tsmaller has worse training error but lower overall cost.

Tree       Error   #Leaves   Total
T          0.25    6         0.43
Tsmaller   0.26    5         0.41

C(T) = Error(T) + λ L(T), λ = 0.3

Replace split by leaf node? YES!
Step 5: Repeat Steps 1-4 for every split
Decide if each split can be "pruned".

[Figure: the pruned tree; each remaining split (Credit?, Income?, Term?) is considered in turn.]
Summary of overfitting in decision trees
What you can do now…
• Identify when overfitting occurs in decision trees
• Prevent overfitting with early stopping
  - Limit tree depth
  - Do not consider splits that do not reduce classification error
  - Do not split intermediate nodes with only few points
• Prevent overfitting by pruning complex trees
  - Use a total cost formula that balances classification error and tree complexity
  - Use total cost to merge potentially complex trees into simpler ones
Boosting
CSE 446: Machine Learning
Emily Fox, University of Washington
January 30, 2017
©2017 Emily Fox CSE 446: Machine Learning
Simple (weak) classifiers are good!
[Figure: a decision stump. Income>$100K? Yes → Safe, No → Risky.]

Examples of weak classifiers: logistic regression with simple features, shallow decision trees, decision stumps.
Low variance. Learning is fast!
But high bias…
Finding a classifier that's just right

[Figure: classification error vs. model complexity, showing training error and true error; a weak learner sits at low complexity, and we need a stronger learner.]

Option 1: add more features or depth
Option 2: ?????
Boosting question
“Can a set of weak learners be combined to create a stronger learner?” Kearns and Valiant (1988)
Yes! Schapire (1990)
Boosting
Amazing impact: a simple approach, widely used in industry, that wins most Kaggle competitions
Ensemble classifier
A single classifier

Input: x
Income>$100K?
Yes No
Safe Risky
Output: ŷ = f(x), either +1 or −1
Ensemble methods: Each classifier “votes” on prediction
xi = (Income=$120K, Credit=Bad, Savings=$50K, Market=Good)
Income>$100K? (Yes → Safe, No → Risky): f1(xi) = +1
Credit history? (Bad → Risky, Good → Safe): f2(xi) = −1
Savings>$100K? (Yes → Safe, No → Risky): f3(xi) = −1
Market conditions? (Bad → Risky, Good → Safe): f4(xi) = +1
Combine into an ensemble model with learned coefficients:
F(xi) = sign(w1 f1(xi) + w2 f2(xi) + w3 f3(xi) + w4 f4(xi))
Prediction with ensemble
Income>$100K? (Yes → Safe, No → Risky): f1(xi) = +1
Credit history? (Bad → Risky, Good → Safe): f2(xi) = −1
Savings>$100K? (Yes → Safe, No → Risky): f3(xi) = −1
Market conditions? (Bad → Risky, Good → Safe): f4(xi) = +1
Combine into an ensemble model with learned coefficients w1 = 2, w2 = 1.5, w3 = 1.5, w4 = 0.5:

F(xi) = sign(w1 f1(xi) + w2 f2(xi) + w3 f3(xi) + w4 f4(xi))
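Plugging the example's votes and coefficients into the formula: F(xi) = sign(2·(+1) + 1.5·(−1) + 1.5·(−1) + 0.5·(+1)) = sign(−0.5) = −1, so the ensemble predicts Risky even though two of the four classifiers voted Safe.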
Ensemble classifier in general
• Goal: predict output y (either +1 or −1) from input x
• Learn ensemble model:
  - Classifiers: f1(x), f2(x), …, fT(x)
  - Coefficients: ŵ1, ŵ2, …, ŵT
• Prediction: ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
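A minimal sketch of this prediction rule, treating each classifier as a callable that returns +1 or −1 (the tie-breaking choice at score 0 is an assumption; the slide does not specify one):

def ensemble_predict(classifiers, coefficients, x):
    """y_hat = sign( sum_t w_t * f_t(x) )."""
    score = sum(w * f(x) for f, w in zip(classifiers, coefficients))
    return +1 if score > 0 else -1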
Boosting
Training a classifier
[Diagram: Training data → learn classifier → f(x) → predict ŷ = sign(f(x)).]
Learning decision stump
Credit   Income   y
A        $130K    Safe
B        $80K     Risky
C        $110K    Risky
A        $110K    Safe
A        $90K     Safe
B        $120K    Safe
C        $30K     Risky
C        $60K     Risky
B        $95K     Safe
A        $60K     Safe
A        $98K     Safe

Split on Income?
  > $100K: 3 Safe, 1 Risky → ŷ = Safe
  ≤ $100K: 4 Safe, 3 Risky → ŷ = Safe
Boosting = Focus learning on “hard” points
[Diagram: Training data → learn classifier → f(x) → predict ŷ = sign(f(x)) → evaluate → learn where f(x) makes mistakes.]

Boosting: focus the next classifier on the places where f(x) does less well.
Learning on weighted data: More weight on “hard” or more important points
• Weighted dataset:
  - Each (xi, yi) is weighted by αi
  - More important point = higher weight αi
• Learning:
  - Data point i counts as αi data points
  - E.g., αi = 2 → count the point twice
Learning a decision stump on weighted data
Increase weight α of harder/misclassified points
Credit   Income   y       Weight α
A        $130K    Safe    0.5
B        $80K     Risky   1.5
C        $110K    Risky   1.2
A        $110K    Safe    0.8
A        $90K     Safe    0.6
B        $120K    Safe    0.7
C        $30K     Risky   3
C        $60K     Risky   2
B        $95K     Safe    0.8
A        $60K     Safe    0.7
A        $98K     Safe    0.9

Split on Income?
  > $100K: total weight 2 Safe, 1.2 Risky → ŷ = Safe
  ≤ $100K: total weight 3 Safe, 6.5 Risky → ŷ = Risky
Learning from weighted data in general
• Usually, learning algorithms extend directly to weighted data:
  - Data point i counts as αi data points
• E.g., gradient ascent for logistic regression:
  w_j^(t+1) ← w_j^(t) + η Σ_{i=1}^N αi hj(xi) ( 1[yi = +1] − P(y = +1 | xi, w^(t)) )

  (sum over data points, with each point's contribution weighted by αi)
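A sketch of one such weighted gradient-ascent step (the function name, learning rate, and the symbol H for the feature matrix with entries hj(xi) are illustrative choices):

import numpy as np

def weighted_gradient_step(w, H, y, alpha, eta=0.1):
    """One gradient-ascent step for logistic regression on weighted data.
    H: N x D feature matrix; y: labels in {-1, +1}; alpha: per-point weights."""
    p = 1.0 / (1.0 + np.exp(-H @ w))          # P(y = +1 | x_i, w)
    indicator = (y == +1).astype(float)       # 1[y_i = +1]
    grad = H.T @ (alpha * (indicator - p))    # each point's term scaled by alpha_i
    return w + eta * grad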
Boosting = Greedy learning ensembles from data
[Diagram: Training data → learn classifier f1(x), predict ŷ = sign(f1(x)) → give higher weight to points where f1(x) is wrong → weighted data → learn classifier & coefficient ŵ2, f2(x) → predict ŷ = sign(ŵ1 f1(x) + ŵ2 f2(x)).]
AdaBoost
AdaBoost: learning ensemble [Freund & Schapire 1999]
• Start with same weight for all points: αi = 1/N
• For t = 1,…,T
- Learn ft(x) with data weights αi
- Compute coefficient ŵt
- Recompute weights αi
• Final model predicts by: ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
Computing coefficient ŵt
AdaBoost: Computing coefficient ŵt of classifier ft(x)
Is ft(x) good? Yes → ŵt large; No → ŵt small
• ft(x) is good → ft has low training error
• Measuring error on weighted data? Just use the weighted # of misclassified points
Weighted classification error
[Example: for each data point, the learned classifier's prediction ŷ is either correct or a mistake. Correct points contribute their weight α to the total weight of correct points; mistakes contribute to the total weight of mistakes. E.g., ("Sushi was great", α = 1.2) classified correctly contributes 1.2 to the weight of correct points, while ("Food was OK", α = 0.5) misclassified contributes 0.5 to the weight of mistakes.]
Weighted classification error
• Total weight of mistakes: Σ_{i: ft(xi) ≠ yi} αi
• Total weight of all points: Σ_{i=1}^N αi
• Weighted error measures the fraction of weight that is on mistakes:
  weighted_error(ft) = (total weight of mistakes) / (total weight of all points)
  - Best possible value is 0.0
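As a short sketch (assuming NumPy arrays of per-point weights, true labels, and predictions):

import numpy as np

def weighted_error(alpha, y, y_hat):
    """(total weight of mistakes) / (total weight of all points)."""
    return float(np.sum(alpha * (y_hat != y)) / np.sum(alpha))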
AdaBoost: formula for computing coefficient ŵt of classifier ft(x)

  ŵt = (1/2) ln( (1 − weighted_error(ft)) / weighted_error(ft) )

[Table: as weighted_error(ft) on training data gets smaller, the ratio (1 − weighted_error(ft)) / weighted_error(ft) grows and ŵt gets larger; a good ft receives a large coefficient, a poor one a small coefficient.]
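As a quick check of the formula with the numbers that appear in the decision-stump example later in this deck: weighted_error(ft) = 0.2 gives ŵt = ½ ln(0.8 / 0.2) = ½ ln 4 ≈ 0.69, while a classifier no better than random guessing (weighted_error = 0.5) gets ŵt = ½ ln 1 = 0.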
AdaBoost: learning ensemble
• Start with same weight for all points: αi = 1/N
• For t = 1, …, T
  - Learn ft(x) with data weights αi
  - Compute coefficient ŵt = (1/2) ln( (1 − weighted_error(ft)) / weighted_error(ft) )
  - Recompute weights αi
• Final model predicts by: ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
Recompute weights αi
AdaBoost: Updating weights αi based on where classifier ft(x) makes mistakes
Did ft get xi right? Yes → decrease αi; No → increase αi
AdaBoost: Formula for updating weights αi
  αi ← αi e^(−ŵt),  if ft(xi) = yi
  αi ← αi e^(ŵt),   if ft(xi) ≠ yi

Did ft get xi right? Yes → multiply αi by e^(−ŵt) (the weight decreases when ŵt > 0); No → multiply αi by e^(ŵt) (the weight increases)
AdaBoost: learning ensemble
• Start with same weight for all points: αi = 1/N
• For t = 1, …, T
  - Learn ft(x) with data weights αi
  - Compute coefficient ŵt = (1/2) ln( (1 − weighted_error(ft)) / weighted_error(ft) )
  - Recompute weights αi: αi ← αi e^(−ŵt) if ft(xi) = yi, αi ← αi e^(ŵt) if ft(xi) ≠ yi
• Final model predicts by: ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
AdaBoost: Normalizing weights αi
If xi is often a mistake, its weight αi gets very large; if xi is often correct, its weight αi gets very small. This can cause numerical instability after many iterations.

Normalize the weights to add up to 1 after every iteration:
  αi ← αi / Σ_{j=1}^N αj
AdaBoost: learning ensemble
• Start with same weight for all points: αi = 1/N
• For t = 1, …, T
  - Learn ft(x) with data weights αi
  - Compute coefficient ŵt = (1/2) ln( (1 − weighted_error(ft)) / weighted_error(ft) )
  - Recompute weights αi: αi ← αi e^(−ŵt) if ft(xi) = yi, αi ← αi e^(ŵt) if ft(xi) ≠ yi
  - Normalize weights αi: αi ← αi / Σ_{j=1}^N αj
• Final model predicts by: ŷ = sign( Σ_{t=1}^T ŵt ft(x) )

A compact code sketch of this loop follows below.
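The sketch below uses scikit-learn depth-1 trees as the decision-stump weak learner; that library choice, the label encoding y ∈ {−1, +1}, and the clipping of the error away from 0 and 1 are assumptions added to make the example runnable, not part of the slide.

import numpy as np
from sklearn.tree import DecisionTreeClassifier   # max_depth=1 acts as a decision stump

def adaboost(X, y, T=30):
    """AdaBoost as on this slide; y must be an array of labels in {-1, +1}."""
    N = len(y)
    alpha = np.full(N, 1.0 / N)                    # same weight for all points
    stumps, w_hat = [], []
    for t in range(T):
        f_t = DecisionTreeClassifier(max_depth=1)  # learn f_t with data weights alpha
        f_t.fit(X, y, sample_weight=alpha)
        pred = f_t.predict(X)

        err = np.sum(alpha * (pred != y)) / np.sum(alpha)
        err = np.clip(err, 1e-10, 1 - 1e-10)       # guard against log(0)
        w_t = 0.5 * np.log((1 - err) / err)        # coefficient w_hat_t

        alpha = alpha * np.exp(-w_t * y * pred)    # e^{-w_t} if correct, e^{+w_t} if mistaken
        alpha = alpha / np.sum(alpha)              # normalize to add up to 1

        stumps.append(f_t)
        w_hat.append(w_t)
    return stumps, np.array(w_hat)

def adaboost_predict(stumps, w_hat, X):
    """Final model: y_hat = sign( sum_t w_hat_t f_t(x) )."""
    score = sum(w * f.predict(X) for w, f in zip(w_hat, stumps))
    return np.sign(score)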
AdaBoost example
t=1: Just learn a classifier on original data
[Figure: original data (left) and the decision stump f1(x) learned on it (right).]
Updating weights αi
Increase weight αi of misclassified points
[Figure: the learned decision stump f1(x) and its boundary (left); the new data weights αi, larger for misclassified points (right).]
t=2: Learn classifier on weighted data
[Figure: f1(x) (left); the data reweighted using the αi chosen in the previous iteration, and the decision stump f2(x) learned on this weighted data (right).]
Ensemble becomes weighted sum of learned classifiers
ŵ1 f1(x) + ŵ2 f2(x) = 0.61 f1(x) + 0.53 f2(x)

[Figure: the two stump boundaries combine into the ensemble's decision boundary.]
Decision boundary of ensemble classifier after 30 iterations
training_error = 0
Boosted decision stumps
Boosted decision stumps
• Start with same weight for all points: αi = 1/N
• For t = 1, …, T
  - Learn ft(x): pick the decision stump with lowest weighted training error according to αi
  - Compute coefficient ŵt
  - Recompute weights αi
  - Normalize weights αi
• Final model predicts by: ŷ = sign( Σ_{t=1}^T ŵt ft(x) )
Finding best next decision stump ft(x)

Consider splitting on each feature:
Income>$100K? (Yes → Safe, No → Risky): weighted_error = 0.2
Credit history? (Bad → Risky, Good → Safe): weighted_error = 0.35
Savings>$100K? (Yes → Safe, No → Risky): weighted_error = 0.3
Market conditions? (Bad → Risky, Good → Safe): weighted_error = 0.4
ft = the stump with the lowest weighted error, Income>$100K? (Yes → Safe, No → Risky), with
  ŵt = (1/2) ln( (1 − weighted_error(ft)) / weighted_error(ft) ) = (1/2) ln(0.8 / 0.2) ≈ 0.69
Updating weights αi, with ŵt = 0.69 and ft: Income>$100K? (Yes → Safe, No → Risky):
  αi ← αi e^(−ŵt) = αi e^(−0.69) = αi / 2,  if ft(xi) = yi
  αi ← αi e^(ŵt)  = αi e^(0.69)  = 2 αi,    if ft(xi) ≠ yi

Credit   Income   y       ŷ       Previous weight α   New weight α
A        $130K    Safe    Safe    0.5                 0.5 / 2 = 0.25
B        $80K     Risky   Risky   1.5                 0.75
C        $110K    Risky   Safe    1.5                 2 * 1.5 = 3
A        $110K    Safe    Safe    2                   1
A        $90K     Safe    Risky   1                   2
B        $120K    Safe    Safe    2.5                 1.25
C        $30K     Risky   Risky   3                   1.5
C        $60K     Risky   Risky   2                   1
B        $95K     Safe    Risky   0.5                 1
A        $60K     Safe    Risky   1                   2
A        $98K     Safe    Risky   0.5                 1
Summary of boosting
Variants of boosting and related algorithms
There are hundreds of variants of boosting, most important:
• Gradient boosting: like AdaBoost, but useful beyond basic classification
Many other approaches to learn ensembles, most important:
• Bagging (random forests): pick random subsets of the data
  - Learn a tree in each subset
  - Average predictions
  - Simpler than boosting & easier to parallelize
  - Typically higher error than boosting for the same # of trees (# iterations T)
Impact of boosting (spoiler alert... HUGE IMPACT)
Amongst most useful ML methods ever created
• Extremely useful in computer vision: the standard approach for face detection, for example
• Used by most winners of ML competitions (Kaggle, KDD Cup, …): malware classification, credit fraud detection, ads click-through rate estimation, sales forecasting, ranking webpages for search, Higgs boson detection, …
• Most deployed ML systems use model ensembles, with coefficients chosen manually, with boosting, with bagging, or others
What you can do now…
• Identify the notion of an ensemble classifier
• Formalize ensembles as weighted combinations of simpler classifiers
• Outline the boosting framework: sequentially learn classifiers on weighted data
• Describe the AdaBoost algorithm
  - Learn each classifier on weighted data
  - Compute the coefficient of the classifier
  - Recompute the data weights
  - Normalize the weights
• Implement AdaBoost to create an ensemble of decision stumps