Review

• Training Set / Testing Set (60/40)
  – In sample
  – Out of sample
• Validation Metrics (training vs. testing, and across different models; see the sketch after this list)
  – Root Mean Square Error (RMSE) (penalizes large residuals)
  – Mean Absolute Error (MAE) (mitigates the effect of large residuals)
  – Cross Validation (training & testing)
  – Correlation
• Over-fitting / Under-fitting Conclusions
  – KNN (k small: over-fitting; k large: under-fitting)
  – Parametric Models (linear → polynomials of increasing degree)
  – Compare in-sample & out-of-sample error to diagnose over/under-fitting
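A minimal sketch of how these metrics and the 60/40 split might be computed with numpy; the arrays and variable names below are illustrative, not course data:

```python
import numpy as np

# Toy arrays standing in for a model's output (illustrative only).
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # true values
pred = np.array([1.1, 1.9, 3.2, 3.8, 5.3])    # predicted values

rmse = np.sqrt(np.mean((y - pred) ** 2))  # penalizes large residuals heavily
mae = np.mean(np.abs(y - pred))           # treats all residuals linearly
corr = np.corrcoef(pred, y)[0, 1]         # correlation of prediction vs. truth

# 60/40 in-sample / out-of-sample split on a (rows x cols) data matrix
data = np.column_stack([y, pred])         # any ndarray of samples works here
split = int(0.6 * data.shape[0])
train, test = data[:split], data[split:]
```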

Review: Learning Models
• Parametric Models (linear to polynomials)
• KNN
• Decision Tree
  – Node splitting
    • Which feature to split on?
      – Classification: maximize information gain [concepts: entropy, information gain, Gini impurity]
      – Regression: error reduction, measured by correlation, root mean square error (RMSE), or mean absolute error (MAE)
    • Which value to split on? Median, average, or random
  – Depth of tree (to control complexity)
  – Aggregating: leaves return the average prediction
• Ensemble Learners
  – Same class of learners
  – Different classes of learners
• Bootstrap Aggregating (bagging; see the sketch after this list)
  – Manipulate the training sample (generate it randomly)
  – AdaBoost

Review: Random Forests (Breiman, Cutler)
• Ensemble of random decision trees grown using bagging
  – Idea: a single tree in the forest is weaker than a full decision tree, but by putting them all together we get better overall performance thanks to diversity.
  – Motivation: overcome the over-fitting problem of single full decision trees.
• Randomness comes in two (sometimes three) ways:
  – Bagging: creating the training data
  – At each node: best feature chosen among a random (smaller) subset of features
  – Value selection/threshold: randomly select the value to split on
• Project: a variation (simplified) of Random Forest
  – Example: the subset of features is just 1 (Project 2 highlights)
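As referenced above, a minimal sketch of how bagging might generate one randomized training set; the function and toy data are illustrative, not the project's API:

```python
import numpy as np

def bootstrap_sample(x, y, rng):
    """Draw n rows with replacement to form one bag's training data."""
    n = x.shape[0]
    idx = rng.integers(0, n, size=n)  # indices sampled with replacement
    return x[idx], y[idx]

rng = np.random.default_rng(0)
x = np.arange(20, dtype=float).reshape(10, 2)  # toy feature matrix
y = x.sum(axis=1)                              # toy target
bag_x, bag_y = bootstrap_sample(x, y, rng)     # repeat once per learner in the bag
```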

Overview (see “NEW”):
• Thinking: which model to use for what type of data?
  – What does the data look like?
  – Adding time series: how do the assumptions change?
• DTLearner
  – BuildTree()
  – addEvidence() → calls BuildTree()
  – SelectSplitFeature() (see the sketch after this list)
    • Best feature to split on: highest absolute value of correlation with Y
• RTLearner
  – BuildTree()
  – addEvidence()
  – SelectSplitFeature() → the only method different from DTLearner
    • Randomly selects a feature

New: All of the decision tree learners you create should be implemented using a matrix data representation (numpy ndarray).
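A sketch of the two SelectSplitFeature variants just described, assuming x is a feature ndarray (one column per feature) and y is the target; the function names are illustrative, not the project's exact API:

```python
import numpy as np

def select_split_feature_dt(x, y):
    """DTLearner: choose the feature with the highest |correlation| with y."""
    corrs = np.nan_to_num(
        [abs(np.corrcoef(x[:, i], y)[0, 1]) for i in range(x.shape[1])])
    return int(np.argmax(corrs))

def select_split_feature_rt(x, y, rng=None):
    """RTLearner: choose a feature uniformly at random."""
    rng = rng or np.random.default_rng()
    return int(rng.integers(0, x.shape[1]))
```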

http://quantsoftware.gatech.edu/images/4/4e/How-to-learn-a-decision-tree.pdf
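Below is a compact sketch loosely following the build-tree pseudo-code in that PDF, storing the tree as a numpy ndarray with one row per node ([feature, split_val, left_offset, right_offset], with -1 in the feature column marking a leaf); this layout and the leaf handling are assumptions, not the official solution:

```python
import numpy as np

LEAF = -1  # sentinel in the feature column marking a leaf row

def build_tree(x, y, leaf_size=1):
    """Return the tree as an ndarray; each row: [feature, split_val, left, right]."""
    if x.shape[0] <= leaf_size or np.all(y == y[0]):
        return np.array([[LEAF, np.mean(y), np.nan, np.nan]])
    # DT version: best feature = highest |correlation| with y (RT: pick randomly)
    corrs = np.nan_to_num(
        [abs(np.corrcoef(x[:, i], y)[0, 1]) for i in range(x.shape[1])])
    feat = int(np.argmax(corrs))
    split_val = np.median(x[:, feat])             # split on the median value
    left_mask = x[:, feat] <= split_val
    if left_mask.all() or not left_mask.any():    # degenerate split -> make a leaf
        return np.array([[LEAF, np.mean(y), np.nan, np.nan]])
    left = build_tree(x[left_mask], y[left_mask], leaf_size)
    right = build_tree(x[~left_mask], y[~left_mask], leaf_size)
    root = np.array([[feat, split_val, 1, left.shape[0] + 1]])  # relative offsets
    return np.vstack([root, left, right])
```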

• BagLearner()
• InsaneLearner()
  – A bag of BagLearners (linear regression)
  – 20 BagLearner instances, where:
    • each instance is composed of 20 LinRegLearner instances.
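A sketch of the InsaneLearner composition described above. It assumes the project's BagLearner and LinRegLearner classes are importable with the usual addEvidence/query interface; the BagLearner constructor signature shown is an assumption, not the spec:

```python
import numpy as np
# Assumes BagLearner and LinRegLearner are defined/imported elsewhere in the
# project; their constructor arguments here are illustrative.

class InsaneLearner:
    """20 BagLearners, each itself bagging 20 LinRegLearner instances."""
    def __init__(self):
        self.learners = [BagLearner(learner=LinRegLearner, kwargs={}, bags=20)
                         for _ in range(20)]

    def addEvidence(self, data_x, data_y):
        for lrn in self.learners:
            lrn.addEvidence(data_x, data_y)

    def query(self, points):
        # Average the predictions of the 20 BagLearners.
        return np.mean([lrn.query(points) for lrn in self.learners], axis=0)
```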