
Machine Learning: Training & Testing

Review
• Training Set / Testing Set (60/40)
  – In sample
  – Out of sample
• Validation Metrics (training and testing & different models)
  – Root Mean Square Error (RMSE)
  – Mean Absolute Error (MAE) (less sensitive to large residuals than RMSE)
  – Cross Validation
  – Correlation
  (numpy sketches of these metrics appear at the end of these notes)
• Over-fitting, Under-fitting: conclusions
  – KNN (k small: over-fitting; k large: under-fitting)
  – Parametric Models (Linear → Polynomials of increasing degree)
  – Compare in-sample & out-of-sample error to diagnose over- and under-fitting

Review: Learning Models
• Parametric Models (Linear to Polynomials)
• KNN
• Decision Tree
  – Node splitting
    • Which feature to split on?
      – Classification: maximize information gain [concepts: randomness, entropy, information gain, Gini impurity]
      – Regression: variance reduction, correlation, mean square error (RMSE), mean absolute error (MAE)
    • Which value to split on? Median, average, random
  – Depth of tree (to control complexity)
  – Aggregating leaves: return the average prediction
• Ensemble Learners
  – Same class of learners
  – Different classes of learners
• Bootstrap Aggregating
  – Manipulate the training sample (generate it randomly)
  – AdaBoost

Review: Learning Models (continued)
• Random Forest (Breiman, Cutler)
  – Ensemble of random decision trees grown using bagging.
  – Idea: a single tree in the forest is weaker than a full decision tree, but by putting them all together we get better overall performance thanks to diversity.
  – Motivation: overcome the over-fitting problem of single full decision trees.
• Randomness comes in two (sometimes three) ways:
  – Bagging: creating the training data.
  – Feature selection at a node: best feature among a random (smaller) subset of the features.
  – Value selection: randomly select the value/threshold to split on.
• Project 2 is a (simplified) variation of Random Forest.
  – Example: the random subset of features has size 1.

Project 2 Highlights / Overview (see "NEW")
• Thinking: which model to use for what type of data?
  – What does the data look like?
  – Adding time series: how do the assumptions change?
• NEW: all of the decision tree learners you create should be implemented using a matrix data representation (numpy ndarray).
• DTLearner (see the buildTree sketch at the end of these notes)
  – buildTree()
  – addEvidence() → calls buildTree()
  – selectSplitFeature(): best feature to split on
    • Absolute value of the correlation with Y.
• RTLearner
  – buildTree()
  – addEvidence()
  – selectSplitFeature() → the only method that differs from DTLearner
    • Randomly selects a feature.
• BagLearner() (see the bagging sketch at the end of these notes)
• InsaneLearner()
  – Bag of "BagLearners" (LinearRegression)
  – 20 BagLearner instances, where each instance is composed of 20 LinRegLearner instances.

Reference: http://quantsoftware.gatech.edu/images/4/4e/How-to-learn-a-decision-tree.pdf
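A minimal numpy sketch of the validation metrics listed above; the function names are mine, chosen for illustration:

import numpy as np

def rmse(y, y_pred):
    # Root Mean Square Error: squaring emphasizes large residuals
    return np.sqrt(np.mean((y - y_pred) ** 2))

def mae(y, y_pred):
    # Mean Absolute Error: less sensitive to large residuals than RMSE
    return np.mean(np.abs(y - y_pred))

def corr(y, y_pred):
    # Correlation between predictions and ground truth (closer to 1 is better)
    return np.corrcoef(y_pred, y)[0, 1]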
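A sketch of DTLearner's buildTree() and query in the ndarray representation, following the pseudocode in the linked how-to-learn-a-decision-tree PDF. The row layout [feature, split value, left offset, right offset], the -1 leaf marker, and the helper names are assumptions of this sketch, not the official project template:

import numpy as np

LEAF = -1  # marker in the 'feature' column that denotes a leaf row (my convention)

def select_split_feature(data):
    # DTLearner rule: the feature (column) whose absolute correlation with Y
    # (the last column) is largest.
    x, y = data[:, :-1], data[:, -1]
    corrs = np.nan_to_num(
        [abs(np.corrcoef(x[:, i], y)[0, 1]) for i in range(x.shape[1])]
    )
    return int(np.argmax(corrs))

def build_tree(data, leaf_size=1):
    # Tree as an ndarray: each row is [feature, split value, left, right],
    # where left/right are row offsets relative to the current row.
    if data.shape[0] <= leaf_size or np.all(data[:, -1] == data[0, -1]):
        return np.array([[LEAF, np.mean(data[:, -1]), np.nan, np.nan]])
    i = select_split_feature(data)
    split_val = np.median(data[:, i])
    left_mask = data[:, i] <= split_val
    if left_mask.all():  # degenerate split (median == max): make a leaf
        return np.array([[LEAF, np.mean(data[:, -1]), np.nan, np.nan]])
    left = build_tree(data[left_mask], leaf_size)
    right = build_tree(data[~left_mask], leaf_size)
    root = np.array([[i, split_val, 1, left.shape[0] + 1]])
    return np.vstack((root, left, right))

def query_point(tree, point):
    # Walk from the root row until a leaf row is reached.
    row = 0
    while tree[row, 0] != LEAF:
        feature, split_val = int(tree[row, 0]), tree[row, 1]
        row += int(tree[row, 2]) if point[feature] <= split_val else int(tree[row, 3])
    return tree[row, 1]

For example, build_tree(np.random.rand(100, 4), leaf_size=5) treats the first three columns as features and the last column as Y.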
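RTLearner, per the slides, differs from DTLearner only in selectSplitFeature(); a sketch of the random variant:

import numpy as np

def select_split_feature_random(data):
    # RTLearner rule: pick a feature at random instead of by correlation
    # (the last column is Y, so it is excluded). A fully random variant
    # would also pick the split value randomly rather than using the median.
    return np.random.randint(data.shape[1] - 1)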
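A sketch of BagLearner and InsaneLearner in the shape of the course's learner API (addEvidence/query); the constructor signature and the minimal LinRegLearner below are assumptions for illustration, not the official template:

import numpy as np

class LinRegLearner:
    # Minimal least-squares stand-in for the project's LinRegLearner.
    def addEvidence(self, x, y):
        a = np.hstack((x, np.ones((x.shape[0], 1))))  # append intercept column
        self.coef, *_ = np.linalg.lstsq(a, y, rcond=None)

    def query(self, x):
        return np.hstack((x, np.ones((x.shape[0], 1)))) @ self.coef

class BagLearner:
    def __init__(self, learner, kwargs=None, bags=20):
        # 'learner' is a learner class; each bag gets its own instance.
        self.learners = [learner(**(kwargs or {})) for _ in range(bags)]

    def addEvidence(self, x, y):
        # Bagging: train each instance on n rows sampled with replacement.
        n = x.shape[0]
        for l in self.learners:
            idx = np.random.randint(n, size=n)
            l.addEvidence(x[idx], y[idx])

    def query(self, x):
        # Aggregate by averaging the predictions of all bags.
        return np.mean([l.query(x) for l in self.learners], axis=0)

class InsaneLearner:
    # 20 BagLearner instances, each composed of 20 LinRegLearner instances.
    def __init__(self):
        self.learners = [BagLearner(LinRegLearner, bags=20) for _ in range(20)]

    def addEvidence(self, x, y):
        for l in self.learners:
            l.addEvidence(x, y)

    def query(self, x):
        return np.mean([l.query(x) for l in self.learners], axis=0)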