Gradient Boosting to Build Additive Tree Models, for Example, for Representing the Logits in Logistic Regression


Boosting
Trevor Hastie, Statistics Department, Stanford University, January 2003
Collaborators: Brad Efron, Jerome Friedman, Saharon Rosset, Rob Tibshirani, Ji Zhu
http://www-stat.stanford.edu/~hastie/TALKS/boost.pdf

Outline
• Model Averaging
• Bagging
• Boosting
• History of Boosting
• Stagewise Additive Modeling
• Boosting and Logistic Regression
• MART
• Boosting and Overfitting
• Summary of Boosting, and its place in the toolbox.
Methods for improving the performance of weak learners. Classification trees are adaptive and robust, but do not generalize well. The techniques discussed here enhance their performance considerably. Reference: Chapter 10.

Classification Problem
[Figure: simulated two-class training data plotted in the (X1, X2) plane; Bayes error rate 0.25.]
Data (X, Y) ∈ R^p × {0, 1}. X is the predictor (feature); Y is the class label (response). (X, Y) have joint probability distribution D.
Goal: based on N training pairs (X_i, Y_i) drawn from D, produce a classifier Ĉ(X) ∈ {0, 1}.
Goal: choose Ĉ to have low generalization error
    R(Ĉ) = P_D(Ĉ(X) ≠ Y) = E_D[1(Ĉ(X) ≠ Y)].
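To make the definition of generalization error concrete, here is a minimal Python sketch, not from the talk, that estimates R(Ĉ) for a fitted tree by Monte Carlo; the two-class Gaussian generator and the use of scikit-learn are assumptions made purely for illustration.

```python
# Minimal sketch: Monte Carlo estimate of R(C_hat) = P_D(C_hat(X) != Y) for a
# fitted tree on an assumed toy two-class Gaussian problem (not the talk's example).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def draw(n, p=10):
    # Class 0: standard Gaussian; class 1: Gaussian shifted by 1 in each coordinate.
    y = rng.integers(0, 2, size=n)
    X = rng.standard_normal((n, p)) + y[:, None]
    return X, y

X_train, y_train = draw(200)
clf = DecisionTreeClassifier().fit(X_train, y_train)

# Average of 1(C_hat(X) != Y) over fresh draws from D approximates R(C_hat).
X_test, y_test = draw(100_000)
R_hat = np.mean(clf.predict(X_test) != y_test)
print(f"estimated generalization error: {R_hat:.3f}")
```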
Deterministic Concepts
[Figure: simulated two-class data in the (X1, X2) plane for a deterministic concept; the classes do not overlap and the Bayes error rate is 0.]
X ∈ R^p has distribution D. C(X) is a deterministic function in a concept class.
Goal: based on N training pairs (X_i, Y_i = C(X_i)) drawn from D, produce a classifier Ĉ(X) ∈ {0, 1}.
Goal: choose Ĉ to have low generalization error
    R(Ĉ) = P_D(Ĉ(X) ≠ C(X)) = E_D[1(Ĉ(X) ≠ C(X))].

Classification Tree
[Figure: a classification tree grown on a sample of size 200; the splits are on x.2 and x.1 (e.g. x.2 < −1.067, x.2 < 1.150, x.1 < 1.136), with each node labelled by its misclassification count.]

Decision Boundary: Tree
[Figure: the tree's decision boundary on the two-dimensional example; error rate 0.073.]
When the nested spheres are in R^10, CART produces a rather noisy and inaccurate rule Ĝ(X), with error rates around 30%.

Model Averaging
Classification trees can be simple, but often produce noisy (bushy) or weak (stunted) classifiers.
• Bagging (Breiman, 1996): Fit many large trees to bootstrap-resampled versions of the training data, and classify by majority vote.
• Boosting (Freund & Schapire, 1996): Fit many large or small trees to reweighted versions of the training data, and classify by weighted majority vote.
In general: Boosting ≻ Bagging ≻ Single Tree.
"AdaBoost … best off-the-shelf classifier in the world" (Leo Breiman, NIPS workshop, 1996).

Bagging
Bagging, or bootstrap aggregation, averages a given procedure over many samples to reduce its variance (a poor man's Bayes; see pp. 246).
Suppose G(X, x) is a classifier, such as a tree, producing a predicted class label at input point x. To bag G, we draw bootstrap samples X*1, ..., X*B, each of size N, with replacement from the training data. Then
    Ĝ_bag(x) = Majority Vote { G(X*b, x) : b = 1, ..., B }.
Bagging can dramatically reduce the variance of unstable procedures (like trees), leading to improved prediction. However, any simple structure in G (e.g. a tree) is lost.

[Figure: the tree grown on the original sample alongside five trees grown on bootstrap samples; the bootstrap trees differ noticeably in their splits and split points.]

Decision Boundary: Bagging
[Figure: the bagged decision boundary on the two-dimensional example; error rate 0.032.]
Bagging averages many trees, and produces smoother decision boundaries.
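The bagging recipe above translates almost line for line into code. The following is a minimal Python sketch, not the code behind the slides: draw B bootstrap samples of size N with replacement, grow one tree per sample, and classify by majority vote. The toy data generator and the scikit-learn trees are assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Toy two-class data (an assumption for illustration, not the nested-spheres example).
N = 200
y_train = rng.integers(0, 2, size=N)
X_train = rng.standard_normal((N, 10)) + y_train[:, None]

def bag_trees(X, y, B=50, seed=0):
    """Grow B trees, each on a bootstrap sample of size N drawn with replacement."""
    rs = np.random.default_rng(seed)
    trees = []
    for _ in range(B):
        idx = rs.integers(0, len(y), size=len(y))   # bootstrap indices
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def g_bag(trees, X_new):
    """Majority vote over the bagged trees (0/1 labels, so a mean threshold suffices)."""
    votes = np.stack([t.predict(X_new) for t in trees])   # shape (B, n_new)
    return (votes.mean(axis=0) > 0.5).astype(int)

trees = bag_trees(X_train, y_train)
print(g_bag(trees, X_train[:5]))
```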
Recommended publications
  • Regularizing AdaBoost with Validation Sets of Increasing Size
    Regularizing AdaBoost with validation sets of increasing size
    Dirk W.J. Meijer and David M.J. Tax, Delft University of Technology, Delft, The Netherlands
    Abstract: AdaBoost is an iterative algorithm to construct classifier ensembles. It quickly achieves high accuracy by focusing on objects that are difficult to classify. Because of this, AdaBoost tends to overfit when subjected to noisy datasets. We observe that this can be partially prevented with the use of validation sets, taken from the same noisy training set. But using less than the full dataset for training hurts the performance of the final classifier ensemble. We introduce ValidBoost, a regularization of AdaBoost that takes validation sets from the dataset, increasing in size with each iteration. ValidBoost achieves performance similar to AdaBoost on noise-free datasets and improved performance on noisy datasets, as it performs similarly at first, but does not start to overfit when AdaBoost does.
    Keywords: Supervised learning, Ensemble learning, Regularization, AdaBoost
    Algorithm 1 (AdaBoost), as printed in the paper:
    Input: dataset x, consisting of N objects ⟨x_1, ..., x_N⟩ ∈ X with labels ⟨y_1, ..., y_N⟩ ∈ Y = {1, ..., c}
    Input: weak learning algorithm WeakLearn
    Input: number of iterations T
    Initialize: weight vector w_i^1 = 1/N for i = 1, ..., N
    1  for t = 1 to T do
    2    Call WeakLearn, providing it with weights w_i^t
    3    Get back hypothesis h_t : X → Y
    4    Compute ε = Σ_{i=1}^{N} w_i^t [h_t(x_i) ≠ y_i]
    5    if ε > 1/2 then
    6      Set α_t = 0
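As a companion to Algorithm 1 above, here is a minimal Python sketch of the classical discrete AdaBoost loop, specialized to binary labels y ∈ {−1, +1} with decision stumps as the weak learner. It illustrates the reweighting scheme only and is not the paper's ValidBoost; the stump choice and the scikit-learn calls are assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=100):
    """Discrete AdaBoost for y in {-1, +1}; returns weak hypotheses and coefficients."""
    N = len(y)
    w = np.full(N, 1.0 / N)                    # w_i^1 = 1/N
    hypotheses, alphas = [], []
    for t in range(T):
        # WeakLearn: a depth-1 tree trained on the weighted sample.
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        eps = np.sum(w * (pred != y))          # weighted training error of h_t
        if eps >= 0.5 or eps == 0.0:           # no better than chance, or perfect: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)  # confidence assigned to h_t
        w = w * np.exp(alpha * (pred != y))    # up-weight the objects h_t got wrong
        w /= w.sum()                           # renormalize the weights
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def predict(hypotheses, alphas, X_new):
    """Weighted majority vote: sign of the alpha-weighted sum of weak predictions."""
    F = sum(a * h.predict(X_new) for h, a in zip(hypotheses, alphas))
    return np.sign(F)
```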
  • AdaBoost Artificial Neural Network for Stock Market Predicting
    AdaBoost Artificial Neural Network for Stock Market Predicting
    Xiao-Ming Bai (Information School, Capital University of Economics and Business, Beijing, China) and Cheng-Zhang Wang (School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, China; corresponding author)
    2016 Joint International Conference on Artificial Intelligence and Computer Engineering (AICE 2016) and International Conference on Network and Communication Security (NCS 2016). ISBN: 978-1-60595-362-5
    Keywords: AdaBoost, Artificial Neural Network, Stock Index Movement
    Abstract: In this work, we propose a new algorithm for predicting the direction of stock index movement, coined the Ada-ANN forecasting model, which exploits AdaBoost theory and ANNs to fulfill the predicting task. ANNs are employed as the weak forecasting machines to construct one strong forecaster. Technical indicators from the Chinese stock market and international stock markets such as the S&P 500, NASDAQ, and DJIA are selected as the predicting independent variables for the period under investigation. Numerical results are compared and analyzed between the strong forecasting machine and the weak one. Experimental results show that the Ada-ANN model works better than its rival for predicting the direction of stock index movement.
    Introduction: Stock price index movement is a primary factor that investors have to consider during the process of financial decision making. The core of stock index movement prediction is to forecast the close price at the end point of the time period. Research schemes based on technical indicator analysis assume that the behavior of a stock has the property of predictability on the basis of its performance in the past, and that all effective factors are reflected by the stock price.
  • A Gentle Introduction to Gradient Boosting
    A Gentle Introduction to Gradient Boosting
    Cheng Li, College of Computer and Information Science, Northeastern University
    Gradient Boosting is a powerful machine learning algorithm: it can do regression, classification and ranking, and it won Track 1 of the Yahoo Learning to Rank Challenge. Our implementation of Gradient Boosting is available at https://github.com/cheng-li/pyramid
    Outline of the tutorial: 1. What is Gradient Boosting; 2. A brief history; 3. Gradient Boosting for regression; 4. Gradient Boosting for classification; 5. A demo of Gradient Boosting; 6. Relationship between AdaBoost and Gradient Boosting; 7. Why it works. Note: this tutorial focuses on the intuition; for a formal treatment, see [Friedman, 2001].
    What is Gradient Boosting? Gradient Boosting = Gradient Descent + Boosting.
    AdaBoost (Figure: AdaBoost, source: Figures 1.1 and 1.2 of [Schapire and Freund, 2012]) fits an additive model (ensemble) H(x) = Σ_t ρ_t h_t(x) in a forward stage-wise manner. In each stage, it introduces a weak learner to compensate for the shortcomings of the existing weak learners; in AdaBoost, "shortcomings" are identified by high-weight data points.
    Gradient Boosting likewise fits an additive model Σ_t ρ_t h_t(x) in a forward stage-wise manner, introducing in each stage a weak learner to compensate for the shortcomings of the existing weak learners.
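In the spirit of the tutorial summarized above, here is a minimal Python sketch of gradient boosting for regression with squared loss: each stage fits a small tree to the current residuals (the negative gradient) and adds it to the ensemble with a shrinkage factor ρ. This is an illustration, not the pyramid implementation linked above; the toy data and parameter values are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, T=200, rho=0.1, max_depth=2):
    """Forward stage-wise fit of H(x) = f0 + rho * sum_t h_t(x) under squared loss."""
    f0 = y.mean()                      # initial constant fit
    F = np.full(len(y), f0)            # current ensemble prediction on the training set
    trees = []
    for t in range(T):
        residual = y - F               # negative gradient of 0.5*(y - F)^2 w.r.t. F
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        F += rho * h.predict(X)        # the new weak learner compensates the shortcomings
        trees.append(h)
    return f0, trees

def predict(f0, trees, X_new, rho=0.1):
    # rho must match the value used during fitting.
    return f0 + rho * sum(h.predict(X_new) for h in trees)

# Usage on toy data (assumed purely for illustration):
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
f0, trees = gradient_boost(X, y)
print(np.mean((predict(f0, trees, X) - y) ** 2))   # training MSE
```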
  • The Evolution of Boosting Algorithms from Machine Learning to Statistical Modelling∗
    The Evolution of Boosting Algorithms: From Machine Learning to Statistical Modelling
    Andreas Mayr (1), Harald Binder (2), Olaf Gefeller (1), Matthias Schmid (1, 3)
    (1) Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany; (2) Institut für Medizinische Biometrie, Epidemiologie und Informatik, Johannes Gutenberg-Universität Mainz, Germany; (3) Institut für Medizinische Biometrie, Informatik und Epidemiologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Germany
    Abstract
    Background: The concept of boosting emerged from the field of machine learning. The basic idea is to boost the accuracy of a weak classifying tool by combining various instances into a more accurate prediction. This general concept was later adapted to the field of statistical modelling. Nowadays, boosting algorithms are often applied to estimate and select predictor effects in statistical regression models.
    Objectives: This review article attempts to highlight the evolution of boosting algorithms from machine learning to statistical modelling.
    Methods: We describe the AdaBoost algorithm for classification as well as the two most prominent statistical boosting approaches, gradient boosting and likelihood-based boosting for statistical modelling. We highlight the methodological background and present the most common software implementations.
    Results: Although gradient boosting and likelihood-based boosting are typically treated separately in the literature, they share the same methodological roots and follow the same fundamental concepts. Compared to the initial machine learning algorithms, which must be seen as black-box prediction schemes, they result in statistical models with a straightforward interpretation.
    Conclusions: Statistical boosting algorithms have gained substantial interest during the last decade and offer a variety of options to address important research questions in modern biomedicine.
  • Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers
    Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers
    Abraham J. Wyner, Matthew Olson, Justin Bleich (Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA) and David Mease (Apple Inc.)
    Journal of Machine Learning Research 18 (2017) 1-33. Submitted 5/15; Revised 2/17; Published 5/17. Editor: Koby Crammer
    Abstract: There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm which creates what we denote as a "spiked-smooth" classifier, and we view AdaBoost in the same light. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees, without regularization or early stopping.
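The paper's closing recommendation, running boosting the way one runs a random forest (large trees, no early stopping), is easy to try. The sketch below is an illustration rather than the authors' experiments: it compares AdaBoost with stumps, AdaBoost with unpruned trees, and a random forest on noisy synthetic data. The dataset and parameter choices are assumptions, and scikit-learn is assumed to be available.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (flip_y), standing in for a "noisy" problem.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           flip_y=0.1, random_state=0)

models = {
    "AdaBoost + stumps": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                            n_estimators=500, random_state=0),
    "AdaBoost + deep trees": AdaBoostClassifier(DecisionTreeClassifier(max_depth=None),
                                                n_estimators=500, random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=500, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()   # 5-fold cross-validated accuracy
    print(f"{name}: {acc:.3f}")
```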
  • Survey of Metaheuristics and Statistical Methods for Multifactorial Diseases Analyses
    Survey of Metaheuristics and Statistical Methods for Multifactorial Diseases Analyses
    Hend Amraoui, Faouzi Mhamdi (corresponding author) and Mourad Elloumi, Laboratory of Technologies of Information and Communication and Electrical Engineering (LaTICE), National Superior School of Engineers of Tunis (ENSIT), University of Tunis, Tunisia
    AIMS Medical Science, 4(3): 291-331. DOI: 10.3934/medsci.2017.3.291. Received 20 April 2017, Accepted 31 July 2017, Published 14 August 2017. http://www.aimspress.com/journal/medicalScience
    Abstract: The identification of the interactions of polymorphisms with other genetic or environmental factors for the detection of multifactorial diseases has now become both a challenge and an objective for geneticists. Unlike monogenic Mendelian diseases, the classical methods have not proved very efficient for the identification of these interactions, especially with the exponential increase in the number of genetic interactions as well as the number of combinations of genotypes. Several methods have been proposed for the detection of susceptibility variants, such as metaheuristics and statistical methods. Using metaheuristics, we focus on the feature selection of variables, and more precisely on the determination of the genes that increase the susceptibility to the disease, especially as these methods are more suitable for the description of complex data. Statistical methods are divided into two submethods, linkage studies and association studies. Generally these two methods are used one after the other, since they are complementary. The linkage study is used initially because its objective is the localization of the chromosomal regions containing the gene(s) involved in the disease.
  • A Comparison of AdaBoost Algorithms for Time Series Forecast Combination
    A comparison of AdaBoost algorithms for time series forecast combination
    Devon K. Barrow (School of Strategy and Leadership, Coventry University, Priory Street, Coventry, CV1 5FB) and Sven F. Crone (Lancaster University Management School, Department of Management Science, Lancaster, Lancashire, LA1 4YX, UK)
    International Journal of Forecasting, June 2016. DOI: 10.1016/j.ijforecast.2016.01.006
    Abstract: Recently, combination algorithms from machine learning classification have been extended to time series regression, most notably seven variants of the popular AdaBoost algorithm. Despite their theoretical promise, their empirical accuracy in forecasting has not yet been assessed, either against each other or against any established approaches of forecast combination, model selection, or statistical benchmark algorithms. Also, none of the algorithms have been assessed on a representative set of empirical data, using only a few synthetic time series. We remedy this omission by conducting a rigorous empirical evaluation using a representative set of 111 industry time series and a valid and reliable experimental design.
  • Effectiveness of Boosting Algorithms in Forest Fire Classification
    Effectiveness of Boosting Algorithms in Forest Fire Classification
    C. Özkan (Erciyes University, Engineering Faculty, Geodesy and Photogrammetry Engineering Dept., 38039 Kayseri, Turkey), F. Sunar (Istanbul Technical University, Civil Engineering Faculty, Remote Sensing Division, 34469 Maslak Istanbul, Turkey), S. Berberoğlu and C. Dönmez (Cukurova University, Department of Landscape Architecture, Faculty of Agriculture, 01330 Adana, Turkey - (suha, cdonmez)@cu.edu.tr)
    Keywords: Forest fire, SPOT 4, AdaBoost, Logitboost, Multilayer Perceptron, Regression Tree
    Abstract: In this paper, it is aimed to investigate the capabilities of the boosting classification approach for forest fire detection using SPOT-4 imagery. The study area, Bodrum in the province of Muğla, is located on the south-western Mediterranean coast of Turkey, where the recent largest forest fires occurred in July 2007. Boosting is one of the recent advanced classifiers proposed in the machine learning community, alongside neural network classifiers based on the multilayer perceptron (MLP), radial basis functions and learning vector quantization. The AdaBoost (AB) and Logitboost (LB) algorithms, which are the most common boosting methods, were used for binary and multiclass classifications. The effectiveness of the boosting algorithms was shown through comparison with a Bayesian maximum likelihood (ML) classifier, a neural network classifier based on the multilayer perceptron (MLP) and regression tree (RT) classifiers. The pre- and post-fire SPOT images were corrected atmospherically and geometrically. Binary classification comprised burnt and non-burnt classes. In addition to the pixel-based classification, textural measures including gray level co-occurrence matrix features such as entropy, homogeneity, second angular moment, etc. were also incorporated. Instead of the traditional boosting weak (base) classifiers, such as a decision tree builder or the perceptron learning rule, a neural network classifier based on the multilayer perceptron was adapted as the weak classifier.
  • Performance Analysis of Boosting Classifiers in Recognizing Activities of Daily Living
    Performance Analysis of Boosting Classifiers in Recognizing Activities of Daily Living
    Saifur Rahman and Muhammad Irfan (Electrical Engineering Department, College of Engineering, Najran University, Najran 61441, Saudi Arabia), Mohsin Raza (Department of Computer and Information Sciences, Northumbria University, Newcastle-upon-Tyne NE1 8ST, UK), Khawaja Moyeezullah Ghori and Shumayla Yaqoob (Department of Computer Science, National University of Modern Languages, Islamabad 44000, Pakistan), and Muhammad Awais (Faculty of Medicine and Health, School of Psychology, University of Leeds, Leeds LS2 9JT, UK)
    International Journal of Environmental Research and Public Health. Received: 22 January 2020; Accepted: 5 February 2020; Published: 8 February 2020
    Abstract: Physical activity is essential for physical and mental health, and its absence is highly associated with severe health conditions and disorders. Therefore, tracking activities of daily living can help promote quality of life. Wearable sensors in this regard can provide a reliable and economical means of tracking such activities, and such sensors are readily available in smartphones and watches. This study is the first of its kind to develop a wearable sensor-based physical activity classification system using a special class of supervised machine learning approaches called boosting algorithms. The study presents a performance analysis of several boosting algorithms (extreme gradient boosting, XGB; light gradient boosting machine, LGBM; gradient boosting, GB; cat boosting, CB; and AdaBoost) in a fair and unbiased way, using a uniform dataset, feature set, feature selection method, performance metric and cross-validation technique.
  • A Comparative Performance Assessment of Ensemble Learning for Credit Scoring
    A Comparative Performance Assessment of Ensemble Learning for Credit Scoring
    Yiheng Li (corresponding author) and Weidong Chen, College of Management and Economics, Tianjin University, Tianjin 300072, China
    Mathematics. Received: 5 September 2020; Accepted: 29 September 2020; Published: 13 October 2020
    Keywords: credit scoring; ensemble learning; baseline classifiers; comparative assessment
    Abstract: Extensive research has been performed by organizations and academics on models for credit scoring, an important financial management activity. As novel machine learning models continue to be proposed, ensemble learning has been introduced into the application of credit scoring, and several studies have addressed the supremacy of ensemble learning. In this research, we provide a comparative performance evaluation of ensemble algorithms, i.e., random forest, AdaBoost, XGBoost, LightGBM and Stacking, in terms of accuracy (ACC), area under the curve (AUC), the Kolmogorov-Smirnov statistic (KS), the Brier score (BS), and model operating time, in the context of credit scoring. Moreover, five popular baseline classifiers, i.e., neural network (NN), decision tree (DT), logistic regression (LR), Naïve Bayes (NB), and support vector machine (SVM), are considered as benchmarks. Experimental findings reveal that the performance of ensemble learning is better than that of individual learners, except for AdaBoost. In addition, random forest has the best performance in terms of the five metrics, with XGBoost and LightGBM close challengers. Among the five baseline classifiers, logistic regression outperforms the other classifiers over most of the evaluation metrics. Finally, this study also analyzes reasons for the poor performance of some algorithms and gives some suggestions on the choice of credit scoring models for financial institutions.
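For readers unfamiliar with the evaluation metrics listed above, the following sketch computes ACC, AUC, KS and BS for a single classifier on a toy imbalanced dataset; the data, model and split are assumptions made for illustration, not the paper's experimental setup. The KS statistic is taken as the maximum gap between the score distributions of the two classes.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced classes, loosely mimicking good vs. bad credit cases.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.8, 0.2],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
score = model.predict_proba(X_te)[:, 1]            # predicted probability of the positive class

acc = accuracy_score(y_te, score > 0.5)            # ACC at a 0.5 threshold
auc = roc_auc_score(y_te, score)                   # AUC
ks = ks_2samp(score[y_te == 1], score[y_te == 0]).statistic   # KS between class score distributions
bs = brier_score_loss(y_te, score)                 # BS (mean squared error of the probabilities)
print(f"ACC={acc:.3f}  AUC={auc:.3f}  KS={ks:.3f}  BS={bs:.3f}")
```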
  • AdaBoost Is Consistent
    AdaBoost is Consistent
    Peter L. Bartlett (Department of Statistics and Computer Science Division, University of California, Berkeley, CA 94720-3860, USA) and Mikhail Traskin (Department of Statistics, University of California, Berkeley, CA 94720-3860, USA)
    Journal of Machine Learning Research 8 (2007) 2347-2368. Submitted 12/06; Revised 7/07; Published 10/07. Editor: Yoav Freund
    Abstract: The risk, or probability of error, of the classifier produced by the AdaBoost algorithm is investigated. In particular, we consider the stopping strategy to be used in AdaBoost to achieve universal consistency. We show that provided AdaBoost is stopped after n^(1−ε) iterations, for sample size n and ε ∈ (0, 1), the sequence of risks of the classifiers it produces approaches the Bayes risk.
    Keywords: boosting, AdaBoost, consistency
    1. Introduction: Boosting algorithms are an important recent development in classification. These algorithms belong to a group of voting methods (see, for example, Schapire, 1990; Freund, 1995; Freund and Schapire, 1996, 1997; Breiman, 1996, 1998) that produce a classifier as a linear combination of base or weak classifiers. While empirical studies show that boosting is one of the best off-the-shelf classification algorithms (see Breiman, 1998), theoretical results do not give a complete explanation of their effectiveness. The first formulations of boosting by Schapire (1990), Freund (1995), and Freund and Schapire (1996, 1997) considered boosting as an iterative algorithm that is run for a fixed number of iterations and at every iteration it chooses one of the base classifiers, assigns a weight to it, and eventually outputs the classifier that is the weighted majority vote of the chosen classifiers.
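As a small numerical illustration of the stopping rule, not taken from the paper, the snippet below shows how the allowed number of iterations n^(1−ε) grows sublinearly with the sample size n for ε = 0.1.

```python
# Stopping rule from the consistency result: run at most n^(1 - eps) iterations.
eps = 0.1
for n in (100, 1_000, 10_000, 100_000, 1_000_000):
    t_stop = int(n ** (1 - eps))
    print(f"n = {n:>9,d}  ->  stop after about {t_stop:,d} iterations")
```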
  • Explaining AdaBoost
    Explaining AdaBoost
    Robert E. Schapire
    Abstract: Boosting is an approach to machine learning based on the idea of creating a highly accurate prediction rule by combining many relatively weak and inaccurate rules. The AdaBoost algorithm of Freund and Schapire was the first practical boosting algorithm, and remains one of the most widely used and studied, with applications in numerous fields. This chapter aims to review some of the many perspectives and analyses of AdaBoost that have been applied to explain or understand it as a learning method, with comparisons of both the strengths and weaknesses of the various approaches.
    1 Introduction: Boosting is an approach to machine learning based on the idea of creating a highly accurate prediction rule by combining many relatively weak and inaccurate rules. The AdaBoost algorithm of Freund and Schapire [10] was the first practical boosting algorithm, and remains one of the most widely used and studied, with applications in numerous fields. Over the years, a great variety of attempts have been made to "explain" AdaBoost as a learning algorithm, that is, to understand why it works, how it works, and when it works (or fails). It is by understanding the nature of learning at its foundation, both generally and with regard to particular algorithms and phenomena, that the field is able to move forward. Indeed, this has been the lesson of Vapnik's life work. This chapter aims to review some of the numerous perspectives and analyses of AdaBoost that have been applied to explain or understand it as a learning method, with comparisons of both the strengths and weaknesses of the various approaches.