
Introduction to Machine Learning Amo G. Tong 1 Lecture 3 Supervised Learning

• Decision Trees

• Readings: Mitchell Ch 3.

• Some materials are courtesy of Vibhav Gogate and Tom Mitchell. • All pictures belong to their creators.

Introduction to Machine Learning Amo G. Tong 2 Supervised Learning • Given some training examples ⟨x, f(x)⟩ and an unknown function f. • Find a good approximation of f.

• Step 1. Select the features of x to be used. • Step 2. Select a hypothesis space H: a set of candidate functions to approximate f. • Step 3. Select a measure to evaluate the functions in H. • Step 4. Use a machine learning algorithm to find the best function in H according to your measure.

• Concept Learning. • Input: values of attributes; learn a discrete-valued function. • Output: Yes or No.

Introduction to Machine Learning Amo G. Tong 3 Decision Tree • Each instance is characterized by some discrete attributes. • Each attribute has several possible values. • Each instance is associated with a discrete target value.

• A decision tree classifies an instance by testing the attributes sequentially.

[Diagram: a decision tree. The root tests attribute1, with branches value11, value12, and value13; one branch leads to a test of attribute2, with branches value21 and value22 ending in leaves target value 1 and target value 2.]

Introduction to Machine Learning Amo G. Tong 4 Decision Tree • Play Tennis? [Table: the Play Tennis examples, described by their attributes and target value.]

Introduction to Machine Learning Amo G. Tong 5 Decision Tree • Play Tennis? [Table: the Play Tennis examples, described by their attributes and target value.]

Introduction to Machine Learning Amo G. Tong 6 Decision Tree • A decision tree.

• Outlook=Rain and Wind=Weak. Play tennis? • Outlook=Sunny and Humidity=High. Play Tennis?

Introduction to Machine Learning Amo G. Tong 7 Decision Tree • A decision tree.

• How do we build a decision tree from the training data so that it classifies unobserved instances well?

Introduction to Machine Learning Amo G. Tong 8 Decision Tree • We will learn some methods to build a decision tree. • We will discuss some issues regarding decision trees.

Introduction to Machine Learning Amo G. Tong 9 ID3 • To build a decision tree is to decide which attribute should be tested. • Which attribute is the best classifier? (We need an algorithm and a measure.)

Introduction to Machine Learning Amo G. Tong 10 ID3 • To build a decision tree is to decide which attribute should be tested. • Which attribute is the best classifier?

• The ID3 algorithm: select the attribute that can maximally reduce the impurity of the instances.

Introduction to Machine Learning Amo G. Tong 11 ID3 • To build a decision tree is to decide which attribute should be tested. • Which attribute is the best classifier?

• The ID3 algorithm: select the attribute that can maximally reduce the impurity of the instances.

• Entropy measures the purity of a collection of instances. • For a collection S of positive and negative instances, its entropy is defined as

• Entropy(S) := −p1 log2 p1 − p0 log2 p0,

• where p1 is the proportion of positive instances and p0 = 1 − p1.
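A minimal sketch of this definition in Python (the function name and the convention that 0 · log2(0) counts as 0 are my additions, not from the slides):

```python
import math

def entropy(num_pos, num_neg):
    """Entropy of a collection with num_pos positive and num_neg negative instances."""
    total = num_pos + num_neg
    result = 0.0
    for count in (num_pos, num_neg):
        p = count / total
        if p > 0:                       # convention: 0 * log2(0) is treated as 0
            result -= p * math.log2(p)
    return result

print(entropy(7, 7))   # half-half: 1.0
print(entropy(14, 0))  # all positive: 0.0
```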

Introduction to Machine Learning Amo G. Tong 12 ID3

• Entropy(S) := −p1 log2 p1 − p0 log2 p0

[Plot: Entropy(S) as a function of p1, from p1 = 0 (all negative) to p1 = 1 (all positive) — the entropy is 0 at both extremes and 1 when the collection is half-half.]

Smaller Entropy(S) ⇒ Higher purity.

Introduction to Machine Learning Amo G. Tong 13 ID3

• Generally, Entropy(S) := −∑_i p_i log2 p_i, where p_i is the proportion of instances that have the i-th value.

Introduction to Machine Learning Amo G. Tong 14 ID3

• Entropy(S) := −p1 log2 p1 − p0 log2 p0.

• p1 = 9/14, p0 = 5/14, Entropy(S) = 0.94
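Checking the arithmetic for this example: Entropy(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) ≈ 0.410 + 0.530 ≈ 0.940.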

Introduction to Machine Learning Amo G. Tong 15 ID3 • To build a decision tree is to decide which attribute should be tested. • Which attribute is the best classifier? • The ID3 algorithm: select the attribute that can maximally reduce the impurity of the instances.

• Entropy measures the purity of a collection of instances.

• What will happen if an attribute A is selected?

• If A has k values, the instances S will be divided into subsets: S_1, …, S_k.

• The expected entropy is ∑_i (|S_i|/|S|) Entropy(S_i).

Introduction to Machine Learning Amo G. Tong 16 ID3 • Consider Outlook.

• (5/14) Entropy(S_Sunny) + (4/14) Entropy(S_Overcast) + (5/14) Entropy(S_Rain)
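If the per-branch counts are the usual ones from Mitchell's PlayTennis table (Sunny: 2 positive, 3 negative; Overcast: 4 positive, 0 negative; Rain: 3 positive, 2 negative — check these against the table on the slide), this works out to (5/14)(0.971) + (4/14)(0) + (5/14)(0.971) ≈ 0.694, so Gain(S, Outlook) ≈ 0.940 − 0.694 ≈ 0.246.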

Introduction to Machine Learning Amo G. Tong 17 ID3 • What will happen if an attribute A is selected?

• If A has k values, the instances S will be divided into subsets: S_1, …, S_k. • After A is tested: the expected entropy is ∑_i (|S_i|/|S|) Entropy(S_i). • Before A is tested: Entropy(S).

• Information Gain:

• Gain(S, A) = Entropy(S) − ∑_i (|S_i|/|S|) Entropy(S_i) [the difference between the entropy before and after the test]

• ID3: select the attribute that maximizes Gain(S, A); repeat until every leaf is pure.
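A sketch of this measure and selection rule in Python (the function names and the encoding of instances as attribute → value dicts are my assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of target values (any number of classes)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(instances, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_i (|S_i|/|S|) * Entropy(S_i)."""
    total = len(labels)
    partitions = {}                      # value of A -> labels of the matching instances
    for x, y in zip(instances, labels):
        partitions.setdefault(x[attribute], []).append(y)
    expected = sum(len(subset) / total * entropy(subset)
                   for subset in partitions.values())
    return entropy(labels) - expected

# ID3's choice at a node: the attribute with the largest gain, e.g.
# best = max(attributes, key=lambda a: information_gain(S, labels, a))
```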

Introduction to Machine Learning Amo G. Tong 18 ID3

Step 1. Outlook has the highest gain.

Introduction to Machine Learning Amo G. Tong 19 ID3

Overcast is pure.

Step 1. Outlook has the highest gain.


Introduction to Machine Learning Amo G. Tong 20 ID3

Gain(S, A), where S is the set of instances belonging to the current branch.

Step 1. Outlook has the highest gain.


Introduction to Machine Learning Amo G. Tong 21 ID3

Information Gain: Gain(S, A) = Entropy(S) − ∑_i (|S_i|/|S|) Entropy(S_i)
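To make the recursion explicit, here is a sketch of the whole ID3 loop (it reuses entropy() and information_gain() from the earlier sketch; the dict-based tree encoding and the majority-vote fallback when attributes run out are my assumptions):

```python
from collections import Counter
# Reuses entropy() and information_gain() from the earlier sketch.

def id3(instances, labels, attributes):
    """Grow a decision tree until every leaf is pure (or no attributes remain)."""
    if len(set(labels)) == 1:                 # pure leaf: all instances agree
        return labels[0]
    if not attributes:                        # nothing left to test: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(instances, labels, a))
    tree = {best: {}}
    for value in {x[best] for x in instances}:            # one branch per observed value
        idx = [i for i, x in enumerate(instances) if x[best] == value]
        tree[best][value] = id3([instances[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree
```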

Introduction to Machine Learning Amo G. Tong 22 Issues • Issues.

• Hypothesis Space. • Inductive Bias. • Overfitting. • Real-valued features. • Missing values.

Introduction to Machine Learning Amo G. Tong 23 Hypothesis Space • Hypothesis Space.

• Fact: every finite discrete function can be represented by a decision tree. • ID3 searches a complete space.

• Note. Each n-feature Boolean function can be represented by a binary decision tree of depth n. • Just test the features one by one.

Introduction to Machine Learning Amo G. Tong 24 Inductive bias • Inductive bias

• ID3 outputs only one decision tree. There can be many decision trees consistent with the training data. • What is the inductive bias?

• ID3 prefers shorter trees. • Purity => fewer further tests needed => shorter trees.

Introduction to Machine Learning Amo G. Tong 25 Inductive bias • Concept learning algorithm. • Restriction Bias: assume the true target is a conjunction of constraints. • Preference Bias: None (or consistency). We output the whole version space.

• If there is no restriction bias: • (a) we need every possible instance to be presented in order to narrow the space down to exactly one hypothesis; • (b) predictions are no better than random guesses.

Introduction to Machine Learning Amo G. Tong 26 Inductive bias • + ID3 algorithm. • Restriction Bias: None. The space is complete. • Preference Bias: shorter trees are preferred.

• If no preference bias: • ??

Introduction to Machine Learning Amo G. Tong 27 Inductive bias • Fact: every finite discrete function can be represented by a decision tree. • ID3 searches a complete space.

• ID3 prefers shorter trees. • Purity => fewer further tests needed => shorter trees.

• Why not longer trees? • Occam’s Razor: Simplest model that explains the data should be preferred.

Introduction to Machine Learning Amo G. Tong 28 Overfitting • Overfitting.

• Consider the error of a hypothesis h over

• (a) the training data, error_train(h)

• (b) the entire distribution D of the data, error_D(h)

• Overfitting happens if there is another hypothesis h′ such that • error_train(h) < error_train(h′) and • error_D(h) > error_D(h′)

• Generalization/prediction is important.

Introduction to Machine Learning Amo G. Tong 29 Overfitting • Overfitting.

• Source of overfitting: training data cannot reflect the entire distribution. • (a) errors • Fitting data does not help.

• (b) imbalanced distribution • If a leaf has only one training instance, is it reliable?

Introduction to Machine Learning Amo G. Tong 30 Overfitting

[Four plots of y vs. x: the truth, good data, too much noise, and an ill-distributed sample.]

Introduction to Machine Learning Amo G. Tong 31 Overfitting • Overfitting.

• Source of overfitting: training data cannot reflect the entire distribution. • (a) errors • Fitting data does not help.

• (b) imbalanced distribution • If a leaf has only one training instance, is it reliable?

• It happens when the tree is too large [the rules are too specific].

Introduction to Machine Learning Amo G. Tong 32 Overfitting • Overfitting. • It happens when the tree is too large [the rules are too specific].

[Plot: accuracy vs. the size of the tree — accuracy on the training data keeps rising, while accuracy on the testing data eventually drops.]

Introduction to Machine Learning Amo G. Tong 33 Overfitting • Source of overfitting • (a) noise • If some training data have errors, they cannot reflect the entire distribution. • (b) imbalanced distribution • If a leaf has only one training instance, is it reliable?

• It happens when the tree is too large [the rules are too specific]. • How to avoid overfitting? Control the size of the tree.

Introduction to Machine Learning Amo G. Tong 34 Overfitting • Overfitting. • How to avoid overfitting? Control the size of the tree.

• Method 1. Stop growing the tree if the further classification is not statistically significant.

• Method 2. Grow the full tree and then prune the tree. • Use training data to grow the tree. • Use validation data to evaluate the utility of post-pruning. • Idea: the same noise is not likely to appear in both datasets. • Two approaches: reduced-error pruning and rule post-pruning.

Introduction to Machine Learning Amo G. Tong 35 Overfitting • Reduced-error pruning. • Pruning a node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training instances affiliated with that node.

Introduction to Machine Learning Amo G. Tong 36 Overfitting • Reduced-error pruning. • Pruning a node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training instances affiliated with that node.

• Pruning effect: the change of the accuracy on the validation set after pruning.

• Algorithm: • Prune the node that has the best pruning effect. • Repeat until no pruning is helpful [i.e., no pruning can increase the accuracy on the validation set].
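A sketch of the pruning effect on the dict-based trees from the ID3 sketch above (classify(), accuracy(), and the way the node's training labels are passed in are my assumptions, not the slides'):

```python
from collections import Counter

def classify(tree, x):
    """Follow the tests until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][x[attribute]]   # assumes the value was seen in training
    return tree

def accuracy(tree, xs, ys):
    return sum(classify(tree, x) == y for x, y in zip(xs, ys)) / len(ys)

def pruning_effect(root, branches, value, node_train_labels, val_x, val_y):
    """Change in validation accuracy if the subtree branches[value] is replaced by
    the most common classification of the training instances at that node."""
    before = accuracy(root, val_x, val_y)
    original = branches[value]
    branches[value] = Counter(node_train_labels).most_common(1)[0][0]
    after = accuracy(root, val_x, val_y)
    branches[value] = original                 # undo; the caller keeps the best pruning
    return after - before
```

Reduced-error pruning then applies the pruning with the largest positive effect and repeats until no pruning increases the validation accuracy.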

Introduction to Machine Learning Amo G. Tong 37 Decision Tree • Overfitting. • Reduced-error pruning.

Introduction to Machine Learning Amo G. Tong 38 Decision Tree • Overfitting.

• Rule Post-pruning

• Generate the decision tree using the full training set (allowing it to overfit). • Convert the decision tree to a set of rules. • Prune each rule by removing any precondition whose removal improves the estimated accuracy. • Estimate accuracy using a validation set. • Sort the rules by their estimated accuracy. • Classify new instances using the sorted sequence.
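A sketch of the per-rule pruning step, assuming a rule is stored as a dict of preconditions plus a predicted label (the names, the greedy search, and the use of a validation set to estimate accuracy are my assumptions):

```python
def rule_accuracy(preconditions, label, val_x, val_y):
    """Accuracy of the rule on the validation instances its preconditions cover."""
    covered = [(x, y) for x, y in zip(val_x, val_y)
               if all(x[a] == v for a, v in preconditions.items())]
    if not covered:
        return 0.0
    return sum(y == label for _, y in covered) / len(covered)

def prune_rule(preconditions, label, val_x, val_y):
    """Greedily drop any precondition whose removal improves estimated accuracy."""
    preconditions = dict(preconditions)
    improved = True
    while improved:
        improved = False
        best = rule_accuracy(preconditions, label, val_x, val_y)
        for attribute in list(preconditions):
            trial = {a: v for a, v in preconditions.items() if a != attribute}
            if rule_accuracy(trial, label, val_x, val_y) > best:
                preconditions, improved = trial, True
                break
    return preconditions

# E.g. prune_rule({"Outlook": "Sunny", "Humidity": "High"}, "No", val_x, val_y)
```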

Introduction to Machine Learning Amo G. Tong 39 Decision Tree • Overfitting.

• Rule Post-pruning

• Rule: IF (Outlook=Sunny) AND (Humidity=High) THEN PlayTennis = No. Here (Outlook=Sunny) and (Humidity=High) are the preconditions.

Introduction to Machine Learning Amo G. Tong 40 Decision Tree • Real-valued features.

• What if the features are real numbers? • Use a threshold.

Introduction to Machine Learning Amo G. Tong 41 Decision Tree • Real-valued features.

• What if the features are real numbers? • Use a threshold.

Introduction to Machine Learning Amo G. Tong 42 Decision Tree • Real-valued features.

• What if the features are real numbers? • Decision tree boundaries. [Plot: the (x1, x2) feature space divided into regions labeled 0 or 1.]

• A decision tree divides the feature space into axis-parallel rectangles, and labels each area with a class.

Introduction to Machine Learning Amo G. Tong 43 Decision Tree • Real-valued features.

• What if the features are real numbers? • Decision tree boundaries. [Plot: the (x1, x2) feature space divided into regions labeled 0 or 1.]

• A decision tree divides the feature space into axis-parallel rectangles, and labels each area with a class.

Introduction to Machine Learning Amo G. Tong 44 Decision Tree • Real-valued features.

• What if the features are real numbers? • Decision tree boundaries. [Plot: the (x1, x2) feature space divided into regions labeled 0 or 1.]

• A decision tree divides the feature space into axis-parallel rectangles, and labels each area with a class.

Introduction to Machine Learning Amo G. Tong 45 Decision Tree • Real-valued features.

• How to select the threshold?

• Candidate thresholds: values midway between adjacent examples whose classifications differ; one of these candidates yields the highest information gain.

• (48+60)/2 and (80+90)/2 • Select the one with the highest information gain.
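A sketch of this threshold search (it reuses the list-of-labels entropy() helper from the information-gain sketch; the helper names are my own):

```python
def candidate_thresholds(values, labels):
    """Midpoints between adjacent sorted values where the classification changes."""
    pairs = sorted(zip(values, labels))
    return [(v1 + v2) / 2
            for (v1, y1), (v2, y2) in zip(pairs, pairs[1:])
            if y1 != y2]

def best_threshold(values, labels):
    """Pick the candidate threshold with the highest information gain."""
    def gain_for(t):
        below = [y for v, y in zip(values, labels) if v <= t]
        above = [y for v, y in zip(values, labels) if v > t]
        total = len(labels)
        return entropy(labels) - (len(below) / total * entropy(below)
                                  + len(above) / total * entropy(above))
    return max(candidate_thresholds(values, labels), key=gain_for)

# Mitchell's Temperature example (40, 48, 60, 72, 80, 90 with No, No, Yes, Yes, Yes, No)
# gives the two candidates on the slide: (48+60)/2 = 54 and (80+90)/2 = 85.
```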

Introduction to Machine Learning Amo G. Tong 46 Decision Tree • Missing attribute value.

• An instance does not have the value of one attribute.

• Some straightforward method.

• Treat the missing value as a new value. • Ignore the instances with missing values. • Use the most common value among the instances at the current tree node. • Use the most common value among the instances with the same classification.
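A sketch of the third option (most common value at the current node); the function name and the use of None to mark a missing value are my assumptions:

```python
from collections import Counter

def fill_missing(instances, attribute, missing=None):
    """Replace a missing value of `attribute` with the most common observed value
    among the training instances at the current node."""
    observed = [x[attribute] for x in instances if x[attribute] is not missing]
    most_common = Counter(observed).most_common(1)[0][0]
    for x in instances:
        if x[attribute] is missing:
            x[attribute] = most_common
    return instances
```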

Introduction to Machine Learning Amo G. Tong 47 Decision Tree • Missing attribute value.

• An instance does not have the value of one attribute.


Introduction to Machine Learning Amo G. Tong 48 Decision Tree • Missing attribute value. • An instance does not have the value of one attribute.

Introduction to Machine Learning Amo G. Tong 49 Decision Tree • Missing attribute value. • An instance does not have the value of one attribute.

Treat missing value as a new value.

-> unknown

Introduction to Machine Learning Amo G. Tong 50 Decision Tree • Missing attribute value. • An instance does not have the value of one attribute.

Ignore the data with missing values.

Introduction to Machine Learning Amo G. Tong 51 Decision Tree • Missing attribute value. • An instance does not have the value of one attribute.

Use the most common value among the instances at the current node.

-> weak

Introduction to Machine Learning Amo G. Tong 52 Decision Tree • Missing attribute value. • An instance does not have the value of one attribute.

Use the most common value among the instances with the same classification.

-> weak

Introduction to Machine Learning Amo G. Tong 53 Decision Tree • Many Attribute Values

• Some attributes have many values. • High information gain. • Not helpful for generalization.

• E.g. date, index

• Use the gain ratio:
• GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
• SplitInformation(S, A) = −∑_i (|S_i|/|S|) log2 (|S_i|/|S|)
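A sketch of these two formulas (it reuses information_gain() from the earlier sketch; the guard for the degenerate case where A takes a single value is my addition):

```python
import math
from collections import Counter
# Reuses information_gain() from the earlier sketch.

def split_information(instances, attribute):
    """SplitInformation(S, A) = -sum_i (|S_i|/|S|) * log2(|S_i|/|S|)."""
    total = len(instances)
    counts = Counter(x[attribute] for x in instances)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def gain_ratio(instances, labels, attribute):
    split = split_information(instances, attribute)
    if split == 0:                 # attribute takes a single value: no useful split
        return 0.0
    return information_gain(instances, labels, attribute) / split
```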

Introduction to Machine Learning Amo G. Tong 54 Summary • How does a decision tree work? • Examine attributes sequentially. • How to build a decision tree? • Select the next attribute to test. • ID3 algorithm: Information gain. • Many practical methods.

Introduction to Machine Learning Amo G. Tong 55 Summary • Issues: • Hypothesis space • Complete • Inductive bias • Short tree/information gain • Overfitting • Two methods • Real-valued Attributes • Use threshold • Missing attribute values • Several common methods • Attributes with many values • Gain ratio

Introduction to Machine Learning Amo G. Tong 56 Questions •

• Can you draw a decision tree for a conjunction of constraints?

Introduction to Machine Learning Amo G. Tong 57