Introduction to Machine Learning

Lecture 3: Supervised Learning (Amo G. Tong)
• Decision Trees
• Readings: Mitchell, Ch. 3.
• Some materials are courtesy of Vibhav Gogate and Tom Mitchell. All pictures belong to their creators.

Supervised Learning
• Given some training examples <x, f(x)> and an unknown function f, find a good approximation of f.
• Step 1. Select the features of x to be used.
• Step 2. Select a hypothesis space H: a set of candidate functions to approximate f.
• Step 3. Select a measure to evaluate the functions in H.
• Step 4. Use a machine learning algorithm to find the best function in H according to your measure.
• Concept learning: the input is the values of the attributes; the output is Yes or No.
• Decision tree learning: learn a discrete-valued function.

Decision Tree
• Each instance is characterized by some discrete attributes.
• Each attribute has several possible values.
• Each instance is associated with a discrete target value.
• A decision tree classifies an instance by testing the attributes sequentially.
• [Figure: a tree that tests attribute1 at the root (values value11, value12, value13), tests attribute2 on one branch (values value21, value22), and ends in leaves labeled target value 1 and target value 2.]
• Running example: Play Tennis? [Table on the slides: the training examples, listing the attribute values and the target value for each instance.]
• Reading a decision tree: if Outlook = Rain and Wind = Weak, play tennis? If Outlook = Sunny and Humidity = High, play tennis?
• How can we build a decision tree from the training data so that it classifies unobserved instances well?
• We will learn some algorithms for building a decision tree and discuss some issues regarding decision trees.

ID3
• To build a decision tree is to decide which attribute should be tested at each node. Which attribute is the best classifier? We need an algorithm and a measure.
• The ID3 algorithm: select the attribute that can maximally reduce the impurity of the instances.
• Entropy measures the purity of a collection of instances. For a collection S of positive and negative instances, its entropy is defined as
  Entropy(S) := -p1 log2(p1) - p0 log2(p0),
  where p1 is the proportion of positive instances and p0 = 1 - p1.
• [Plot: Entropy(S) as a function of p1. It is 0 when all instances are negative (p1 = 0) or all positive (p1 = 1), and it reaches its maximum of 1 at the half-half split.] Smaller Entropy(S) ⇒ higher purity.
• Generally, Entropy(S) := -Σ_i pi log2(pi), where pi is the proportion of the instances that have the i-th target value.
• Play Tennis example: p1 = 9/14, p0 = 5/14, so Entropy(S) = 0.94.
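As a quick numerical check of the entropy formula above, here is a minimal Python sketch. It is not part of the original slides; the 9-positive / 5-negative label counts are the ones quoted in the Play Tennis example.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i pi * log2(pi), over the target values occurring in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Play Tennis: 9 positive and 5 negative training examples.
S = ["Yes"] * 9 + ["No"] * 5
print(round(entropy(S), 2))  # prints 0.94
```

The same helper also covers the general case with more than two target values, matching Entropy(S) = -Σ_i pi log2(pi).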
ID3: Information Gain
• What will happen if an attribute A is selected?
• If A has k values, the instances S will be divided into subsets S1, ..., Sk.
• After A is tested, the expected entropy is Σ_i (|Si| / |S|) · Entropy(Si).
• Example, consider Outlook: (5/14) Entropy(S_Sunny) + (4/14) Entropy(S_Overcast) + (5/14) Entropy(S_Rain).
• Before A is tested, the entropy is Entropy(S).
• Information Gain is the difference:
  Gain(S, A) = Entropy(S) - Σ_i (|Si| / |S|) · Entropy(Si).
• ID3: select the attribute that maximizes Gain(S, A), and repeat until every leaf is pure. (A small end-to-end code sketch of this procedure appears after the notes on inductive bias, below.)
• Building the example tree, Step 1: Outlook has the highest gain, so it becomes the root. The Overcast branch is already pure; the remaining branches still need to be grown. [Figure: partial tree with Outlook at the root and branches yet to be decided.]
• When computing Gain(S, A) deeper in the tree, S is the set of instances belonging to the current branch.

Issues
• Hypothesis space.
• Inductive bias.
• Overfitting.
• Real-valued features.
• Missing values.

Hypothesis Space
• Fact: every finite discrete function can be represented by a decision tree, so ID3 searches a complete space.
• Note: every n-feature Boolean function can be represented by a binary decision tree of depth n. Just test the features one by one.

Inductive Bias
• ID3 outputs only one decision tree, yet there can be many decision trees consistent with the training data. What is the inductive bias?
• ID3 prefers shorter trees: purity ⇒ less need for further classification ⇒ shorter trees.
• Compare with the concept learning algorithm:
  • Restriction bias: assume the true target is a conjunction of constraints.
  • Preference bias: none (or consistency); we output the whole version space.
  • Without the restriction bias, (a) we would need every possible instance to be presented to narrow the version space down to exactly one hypothesis, and (b) predictions would be no better than random guesses.
• Decision tree learning with ID3:
  • Restriction bias: none; the space is complete.
  • Preference bias: shorter trees are preferred.
  • What would happen with no preference bias at all?
• Why not longer trees? Occam's Razor: the simplest model that explains the data should be preferred.
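Before turning to overfitting, the following is a minimal end-to-end sketch of the ID3 procedure described above. It is not code from the lecture: the representation of examples as (attribute-dictionary, label) pairs and the nested-dictionary tree are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attr):
    """Gain(S, A) = Entropy(S) - sum_i |Si|/|S| * Entropy(Si)."""
    labels = [y for _, y in examples]
    n = len(examples)
    expected = 0.0
    for value in set(x[attr] for x, _ in examples):
        subset = [y for x, y in examples if x[attr] == value]
        expected += len(subset) / n * entropy(subset)
    return entropy(labels) - expected

def id3(examples, attributes):
    """Grow the tree until every leaf is pure (or no attributes remain)."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:                       # pure leaf
        return labels[0]
    if not attributes:                              # fall back to a majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a))
    tree = {best: {}}
    for value in set(x[best] for x, _ in examples):
        branch = [(x, y) for x, y in examples if x[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(branch, remaining)
    return tree
```

Run on the Play Tennis table, this selection puts Outlook at the root, matching Step 1 on the slides, since Outlook has the highest information gain.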
Overfitting
• Consider the error of a hypothesis h over
  (a) the training data, error_train(h), and
  (b) the entire distribution D of the data, error_D(h).
• Overfitting happens if there is another hypothesis h' such that
  error_train(h) < error_train(h') and error_D(h) > error_D(h').
• Generalization/prediction is what matters.
• Source of overfitting: the training data cannot reflect the entire distribution.
  (a) Noise: if some training data contain errors, fitting them does not help.
  (b) Imbalanced distribution: if a leaf has only one training instance, is it reliable?
• [Figure: four y-versus-x panels on the slides: the truth, good data, too much noise, and ill-distributed data.]
• Overfitting happens when the tree is too large (the rules are too specific).
• [Plot on the slides: accuracy on the training data and accuracy on the testing data as a function of the size of the tree.]
• How to avoid overfitting? Control the size of the tree.
• Method 1. Stop growing the tree if the further classification is not statistically significant.
• Method 2. Grow the full tree and then prune it.
  • Use the training data to grow the tree.
  • Use validation data to evaluate the utility of post-pruning.
  • Idea: the same noise is not likely to appear in both datasets.
  • Two approaches: reduced-error pruning and rule post-pruning.
• Reduced-error pruning: pruning a node consists of removing the subtree rooted at that node, making it a leaf node, and assigning it the most common classification of the training instances affiliated with that node.
• Pruning effect: the change in accuracy on the validation set after pruning.
• Algorithm: prune the node that has the best pruning effect, and repeat until no pruning is helpful (i.e., no pruning can increase the accuracy on the validation set). A code sketch of this procedure is given below.
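Here is a minimal sketch of reduced-error pruning, assuming trees in the nested-dictionary form produced by an ID3-style builder such as the one sketched earlier, a held-out validation set, and a hypothetical default label for attribute values never seen during training. None of this code or its helper names come from the slides.

```python
import copy
from collections import Counter

def classify(tree, x, default="Yes"):
    """Walk the nested-dict tree until a leaf label is reached."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(x[attr], default)
    return tree

def accuracy(tree, examples):
    return sum(classify(tree, x) == y for x, y in examples) / len(examples)

def reduced_error_pruning(tree, train, validation):
    """Repeatedly replace the subtree whose removal most improves validation
    accuracy with a majority-vote leaf; stop when no pruning helps."""
    def candidates(node, examples, path=()):
        # Yield (path, majority training label) for every internal node.
        if not isinstance(node, dict):
            return
        labels = [y for _, y in examples]
        yield path, Counter(labels).most_common(1)[0][0]
        attr = next(iter(node))
        for value, child in node[attr].items():
            subset = [(x, y) for x, y in examples if x[attr] == value]
            if subset:
                yield from candidates(child, subset, path + ((attr, value),))

    def pruned_copy(root, path, leaf):
        # Copy of the tree with the subtree at `path` collapsed into `leaf`.
        if not path:
            return leaf
        new = copy.deepcopy(root)
        parent = new
        for attr, value in path[:-1]:
            parent = parent[attr][value]
        attr, value = path[-1]
        parent[attr][value] = leaf
        return new

    current, current_acc = tree, accuracy(tree, validation)
    while isinstance(current, dict):
        options = [(accuracy(pruned_copy(current, p, leaf), validation), p, leaf)
                   for p, leaf in candidates(current, train)]
        best_acc, best_path, best_leaf = max(options, key=lambda t: t[0])
        if best_acc <= current_acc:        # no pruning increases validation accuracy
            break
        current = pruned_copy(current, best_path, best_leaf)
        current_acc = best_acc
    return current
```

The candidate leaf labels come from the training instances that reach each node, while the validation set alone decides whether a prune is kept, mirroring the idea that the same noise is unlikely to appear in both datasets.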