
Data Mining and Machine Learning
Madhavan Mukund
Lecture 19, Jan-Apr 2020
https://www.cmi.ac.in/~madhavan/courses/dmml2020jan/

Limitations of classification models

- Bias: the expressiveness of the model limits classification
  - For instance, linear separators
- Variance: variation in the model based on the sample of training data
  - The shape of a decision tree varies with the distribution of training inputs
- Models with high variance are expressive but unstable
- In principle, a decision tree can capture an arbitrarily complex classification criterion
- The actual structure of the tree depends on the impurity calculation
- Danger of overfitting: the model is tied too closely to the training set
- Is there an alternative to pruning?

Ensemble models

- Take a sequence of independent training data sets D1, D2, ..., Dk
- Generate models M1, M2, ..., Mk
- Take this ensemble of models and "average" them
  - For regression, take the mean of the predictions
  - For classification, take a vote among the results and choose the most popular one
- Challenge: it is infeasible to get a large number of independent training samples
- Can we build independent models from a single training data set?
  - The strategy to build the model is fixed
  - The same data will produce the same model

Bootstrap Aggregating = Bagging

- Training data has N items: TD = {d1, d2, ..., dN}
- Pick a random sample with replacement
  - Pick an item at random (probability 1/N)
  - Put it back into the set
  - Repeat K times
- Some items in the sample will be repeated
- If the sample size is the same as the data size (K = N), the expected number of distinct items is (1 - 1/e) * N, approximately 63.2% of the data
- A sample with replacement of size N is called a bootstrap sample
  - It covers approximately 2/3 of the full training data
- Take k such samples and build a model for each sample
  - Models will vary because each uses different training data
- Final classifier: report the majority answer
  - Assumptions: binary classifier, k odd
- Bagging provably reduces variance

Bagging with decision trees

[The original slides show a sequence of figures illustrating bagging with decision trees; the images are not recoverable here. A code sketch of the procedure is given after the next section.]

When to use bagging

- Bagging improves performance when there is high variance
  - Independent samples produce sufficiently different models
- A model with low variance will not show improvement
  - Example: the k-nearest neighbour classifier
  - Given an unknown input, find the k nearest neighbours and choose the majority
  - Across different subsets of the training data, the variation in the k nearest neighbours is relatively small
  - Bootstrap samples will produce similar models
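The following is a minimal sketch of the bagging procedure described above; it is not part of the original lecture. It assumes scikit-learn's DecisionTreeClassifier as the base learner and a synthetic binary dataset from make_classification; the dataset, the choice k = 11 and the helper bagged_predict are illustrative assumptions.

```python
# Sketch of bagging with decision trees: draw k bootstrap samples with
# replacement, fit one tree per sample, and classify by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

k = 11              # number of bootstrap samples (odd, so a binary vote cannot tie)
N = len(X)
trees = []

for _ in range(k):
    # Bootstrap sample: N draws with replacement, each item chosen with probability 1/N.
    idx = rng.integers(0, N, size=N)
    # len(np.unique(idx)) / N is roughly 0.632, matching the (1 - 1/e) calculation above.
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def bagged_predict(x_new):
    """Majority vote of the ensemble on a single input (labels 0/1)."""
    votes = np.array([t.predict(x_new.reshape(1, -1))[0] for t in trees])
    return int(round(float(votes.mean())))

print(bagged_predict(X[0]), y[0])
```

Each tree sees a different bootstrap sample, so the trees differ; the vote averages away much of that variance, which is the point of the "When to use bagging" discussion above.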
Random Forest

- Apply bagging to decision trees, with a further twist
- As before, take k bootstrap samples D1, D2, ..., Dk
- For each Di, build a decision tree Ti as follows
  - Each data item has M attributes
  - Normally, we choose the maximum impurity gain among all M attributes, then the best among the remaining M - 1, ...
  - Instead, fix a small limit m < M, say m = log2(M) + 1
  - At each level, choose a random subset of the available attributes, of size m
  - Evaluate only these m attributes to choose the next query
  - No pruning: build each tree to the maximum
- Final classifier: vote on the results returned by T1, T2, ..., Tk
- Theoretically, the overall error rate depends on two factors
  - Correlation between pairs of trees: higher correlation results in a higher overall error rate
  - Strength (accuracy) of each tree: higher strength of individual trees results in a lower overall error rate
- Reducing m, the number of attributes examined at each level, reduces both correlation and strength
  - These two changes influence the error rate in opposite directions
- Increasing m increases both correlation and strength
- Search for a value of m that optimizes the overall error rate

(Code sketches of this construction and of the oob error estimate below are given at the end of these notes.)

Out of bag error estimate

- Each bootstrap sample omits about 1/3 of the data items
- Hence, each data item is omitted by about 1/3 of the samples
- If data item d does not appear in bootstrap sample Di, then d is out of bag (oob) for Di
- Oob classification: for each d, vote only among those Ti for which d is oob for Di
- Oob samples serve as a test data set
  - Estimate the generalization error rate of the overall model from the error rate of oob classification, without a separate test data set
- Can also estimate the relative significance of attributes
  - For a given tree, perturb the values of attribute A in the oob data items
  - Measure the change in error rate for the oob samples
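As an illustration of the random forest construction above (again, not from the lecture), the only change relative to the earlier bagging sketch is that each split considers a random subset of m attributes and the trees are grown without pruning. The snippet below reuses X, y, N, k and rng from that sketch and assumes scikit-learn's DecisionTreeClassifier, whose max_features parameter restricts each split to a random subset of that size; growing to the maximum is essentially its default behaviour.

```python
# Sketch of the random forest twist, reusing X, y, N, k and rng from the
# bagging sketch: same bootstrap loop, but each split examines only a random
# subset of m attributes, and each tree is grown fully (no pruning).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

M = X.shape[1]                  # total number of attributes
m = int(np.log2(M)) + 1         # small fixed limit m < M, as suggested above

forest = []
for _ in range(k):
    idx = rng.integers(0, N, size=N)                 # bootstrap sample, as before
    tree = DecisionTreeClassifier(max_features=m)    # random subset of m attributes per split
    forest.append(tree.fit(X[idx], y[idx]))
```

The final classifier votes over the trees in forest exactly as in bagged_predict above. The equivalent off-the-shelf estimator is sklearn.ensemble.RandomForestClassifier(n_estimators=k, max_features=m), and tuning m trades off the correlation and strength effects discussed above.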
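Finally, a sketch of the out-of-bag error estimate, continuing from the previous snippets (X, y, N, k, m and rng are assumed to be defined there). It refits the forest while recording, for each tree, which items its bootstrap sample omitted, and then classifies each item by voting only among the trees for which that item is oob.

```python
# Sketch of the oob error estimate: for each data item d, vote only among the
# trees whose bootstrap sample omits d, and compare the oob vote with y[d].
# This estimates the generalization error without a separate test set.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

trees, oob_masks = [], []
for _ in range(k):
    idx = rng.integers(0, N, size=N)
    oob = np.ones(N, dtype=bool)
    oob[idx] = False                                  # items never drawn are oob for this tree
    trees.append(DecisionTreeClassifier(max_features=m).fit(X[idx], y[idx]))
    oob_masks.append(oob)

errors, counted = 0, 0
for d in range(N):
    votes = [t.predict(X[d].reshape(1, -1))[0]
             for t, oob in zip(trees, oob_masks) if oob[d]]
    if votes:                                         # d is oob for at least one tree
        counted += 1
        if round(float(np.mean(votes))) != y[d]:      # majority vote; ties broken arbitrarily
            errors += 1

print("oob error estimate:", errors / counted)
```

The attribute-significance idea in the last bullet above is what is usually called permutation importance: perturb (for instance, shuffle) one attribute in the oob items and measure how much the error rises. With scikit-learn, RandomForestClassifier(oob_score=True) exposes the oob estimate as oob_score_, and sklearn.inspection.permutation_importance shuffles one feature at a time in a given evaluation set and reports the drop in score.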