Model Selection in Regression: Application to Tumours in Childhood
Current Trends on Biostatistics & Biometrics (Lupine Publishers, Open Access)
Review Article. DOI: 10.32474/CTBB.2018.01.000101. ISSN: 2644-1381

Annah Managa*
Department of Statistics, University of South Africa, South Africa
Received: August 23, 2018; Published: September 10, 2018
*Corresponding author: Annah Managa, Department of Statistics, University of South Africa, South Africa

Summary

We give a chronological review of the major model selection methods that have been proposed from circa 1960. These model selection procedures include the residual mean square error (MSE), the coefficient of multiple determination (R²), the adjusted coefficient of multiple determination (Adj R²), the estimate of error variance (S²), stepwise methods, Mallows' Cp, the Akaike information criterion (AIC) and the Schwarz criterion (BIC). Some of these methods are applied to the problem of developing a model for predicting tumours in childhood using log-linear models. The theoretical review discusses the problem of model selection in a general setting; the application concerns log-linear models in particular.

Keywords: MSE; R²; Adj R²; S²; Stepwise methods; Cp; AIC; BIC

Introduction

Historical Development

The problem of model selection is at the core of progress in science. Over the decades, scientists have used various statistical tools to select among alternative models of data. A common challenge for the scientist is the selection of the best subset of predictor variables in terms of some specified criterion. Tobias Meyer (1750) established the two main methods, namely linear estimation and Bayesian analysis, by fitting models to observations. The 1900s to 1930s saw a great development of regression and statistical ideas, but these were based on hand calculations. In 1951, Kullback and Leibler developed a measure of discrepancy from information theory, which forms the theoretical basis for criteria-based model selection. In the 1960s, computers enabled scientists to address the problem of model selection, and computer programmes were developed to compute all possible subsets, for example via stepwise regression, Mallows' Cp, AIC, TIC and BIC. During the 1970s and 1980s there was a huge spate of proposals to deal with the model selection problem. Linhart and Zucchini (1986) provided a systematic development of frequentist criteria-based model selection methods for a variety of typical situations that arise in practice; these included the selection of univariate probability distributions, the regression setting, the analysis of variance and covariance, the analysis of contingency tables, and time series analysis. Bozdogan [1] gives an outstanding review of how AIC may be applied to compare the models in a set of competing models, and defines a statistical model as a mathematical formulation that expresses the main features of the data in terms of probabilities. In the 1990s, Hastie and Tibshirani introduced generalized additive models. These models assume that the mean of the dependent variable depends on an additive predictor through a nonlinear link function, and they permit the response probability distribution to be any member of the exponential family of distributions. They particularly suggested that, up to that date, model selection had largely been a theoretical exercise and that more practical examples were needed (see Hastie and Tibshirani, 1990).
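For reference, the classical subset-selection criteria listed in the Summary have the following standard textbook forms; these definitions are supplied here for convenience and are not reproduced from the paper itself. For a candidate linear model with p parameters fitted to n observations, with residual sum of squares SSE_p, total sum of squares SST, and s² the error-variance estimate from the full model:

\[
\mathrm{MSE}_p = \frac{\mathrm{SSE}_p}{n - p}, \qquad
R^2 = 1 - \frac{\mathrm{SSE}_p}{\mathrm{SST}}, \qquad
\mathrm{Adj}\,R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p}, \qquad
C_p = \frac{\mathrm{SSE}_p}{s^2} - n + 2p.
\]

A small MSE_p and a C_p value close to p indicate a well-fitting, parsimonious subset; R² alone always favours the largest model, which is why the adjusted version is preferred for comparing subsets of different sizes.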
Philosophical Perspective

The motivation for model selection is ultimately derived from the principle of parsimony [2]. Implicitly, the principle of parsimony (or Occam's Razor) has been the soul of model selection: to remove all that is unnecessary. To implement the parsimony principle, one has to quantify the "parsimony" of a model relative to the available data. Parsimony lies between the evils of under-fitting and over-fitting. Burnham and Anderson [3] define parsimony as "the concept that a model should be as simple as possible concerning the included variables, model structure, and number of parameters". Parsimony is a desired characteristic of a model used for inference, and it is usually defined by a suitable trade-off between the squared bias and the variance of the parameter estimators. According to Claeskens and Hjort [4], the focused information criterion (FIC) was developed to select the set of variables that is best for a given focus. Foster and Stine [5] predict the onset of personal bankruptcy using least squares regression. They use stepwise selection to find predictors of bankruptcy from a mix of payment history, debt load, demographics and their interactions, and show that three modifications turn stepwise regression into an effective methodology for predicting bankruptcy. Fresen provides an example illustrating the inadequacy of AIC and BIC in choosing models for ordinal polychotomous regression. Initially, during the 60s, 70s and 80s, the problem of model selection was viewed as the choice of which variables to include in the model. Nowadays, however, model selection also includes choosing the functional form of the predictor variables: for example, should one use a linear model, a generalized additive model, or even perhaps a kernel regression estimator to model the data? It should be noted that there is often no one best model, but that there may be various useful sets of variables (Cox and Snell, 1989). The purpose of this paper is to give a chronological review of some frequentist methods of model selection that have been proposed from circa 1960 and to apply these methods in a practical situation. This research is a response to Hastie and Tibshirani's (1990) call for more examples.

Data and Assumptions

In this paper the procedures described here will be applied to a data set collected at the Medical University of Southern Africa (Medunsa) in 2009. The data consist of all the tumours diagnosed in children and adolescents covering the period 2003 to 2008. The files of the Histopathology Department were reviewed, and all the tumours occurring during the first two decades of a patient's life were identified. The following variables were noted: age, sex and site. The binary response variable indicated the presence of either malignant (0) or benign (1) tumours. In our setting, the problem of model selection is not concerned with which predictor variables to include in the model but rather with which functional form should be used to model the probability of a malignant tumour as a function of age. For binary data it is usual to model the logit of a probability (the logit of the probability is the logarithm of the odds) rather than the probability itself. Our question was then to select a functional form for the logit on the basis of a model selection criterion such as the Akaike information criterion (AIC) or the Schwarz criterion (BIC). We considered various estimators for the logit, namely linear or quadratic predictors, or additive fits with 2, 3 and 4 degrees of freedom. As an alternative, the probabilities were modelled using a kernel estimator with a Gaussian kernel for various bandwidths, namely 8.0, 10.0 and 12.5. The model selection criteria used were AIC and BIC. Based on the above approach, recommendations will be made as to which criteria are most suitable for model selection. The outline of this paper is as follows. In Section 2, we provide a brief review of the related literature. Section 3 presents technical details of some of the major model selection criteria. The model selection methods applied to the data set are discussed in Section 4. Finally, Section 5 will provide conclusions and recommendations.
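To make the comparison described in Data and Assumptions concrete, the sketch below fits several candidate functional forms for the logit and reports AIC and BIC for each. It is a minimal illustration under stated assumptions, not the paper's own code: the Medunsa data are not available, so a synthetic stand-in is generated; B-spline terms (patsy's bs() in the statsmodels formula interface) stand in for the paper's additive fits, and a hand-rolled Nadaraya-Watson estimator illustrates the Gaussian-kernel alternative at the paper's bandwidths.

```python
# Sketch: choose a functional form for the logit by AIC/BIC.
# Synthetic stand-in for the (unavailable) Medunsa data; `y` is a binary
# tumour indicator (the paper codes malignant = 0, benign = 1).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
age = rng.uniform(0, 20, 500)                    # first two decades of life
p_true = 1 / (1 + np.exp(-(-2.0 + 0.2 * age)))   # placeholder "true" model
df = pd.DataFrame({"age": age, "y": rng.binomial(1, p_true)})

# Candidate forms for the logit; bs() gives B-spline (additive-style) terms.
candidates = {
    "linear":     "y ~ age",
    "quadratic":  "y ~ age + I(age ** 2)",
    "spline df3": "y ~ bs(age, df=3)",
    "spline df4": "y ~ bs(age, df=4)",
}
for name, formula in candidates.items():
    fit = smf.logit(formula, data=df).fit(disp=0)
    print(f"{name:10s}  AIC = {fit.aic:7.2f}   BIC = {fit.bic:7.2f}")

# Kernel alternative: Nadaraya-Watson estimate of P(y = 1 | age) with a
# Gaussian kernel, at the bandwidths considered in the paper.
def nw_prob(x0, x, y, h):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)       # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

for h in (8.0, 10.0, 12.5):
    est = nw_prob(10.0, age, df["y"].to_numpy(), h)
    print(f"bandwidth {h:4.1f}: estimated P(y = 1 | age = 10) = {est:.3f}")
```

Note that comparing the kernel fit with the parametric fits by AIC or BIC requires an effective-degrees-of-freedom notion, since the kernel estimator has no parameter count in the usual sense; this is one reason such comparisons are delicate.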
Literature Review

The problem of determining the best subset of independent variables in regression has long been of interest to applied statisticians, and it continues to receive considerable attention in the statistical literature [6-9]. The focus began with the linear model in the 1960s, when the first wave of important developments occurred and computing was expensive and time-consuming. There are several papers that can help us understand the state of the art in subset selection as it developed over the last few decades. Gorman and Toman [10] proposed a procedure based on a fractional factorial scheme in an effort to identify the better models with a moderate amount of computation, using Mallows' Cp as a criterion. Aitkin [11] discussed stepwise procedures for the addition or elimination of variables in multiple regression, which by that time were very commonly used. Akaike [12] adopted the Kullback-Leibler definition of information as a measure of discrepancy, or asymmetrical distance, between a "true" model and a proposed model, indexed on the parameter vector. A popular alternative to AIC, presented by Schwarz [13], that does incorporate sample size is BIC. Extending Akaike's original work, Sugiura (1978) proposed AICc, a corrected version of AIC for linear regression with normal errors. Bozdogan's ICOMP criterion penalizes the complexity of a model differently than AIC, or its variants, by taking into account the interdependencies of the parameter estimates as well as the dependencies of the model residuals. Browne [21] gives a review of cross-validation methods, the original application in multiple regression being considered first. Kim and Cavanaugh [22] looked at modified versions of the AIC (the "corrected" AICc and the "improved" AICM) and of the KIC (the "corrected" KICc and the "improved" KICM) in the nonlinear regression framework. Hafidi and Mkhadri derived a different version of the "corrected" KIC (KICc) and compared it to the AICc derived by Hurvich and Tsai in the linear mixed model for longitudinal data.
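The information criteria that recur throughout this review have compact standard definitions, given here in their usual textbook form (the paper's own technical details appear in its Section 3). For a model with k estimated parameters, maximized likelihood L(θ̂) and sample size n, Akaike's approach estimates the Kullback-Leibler discrepancy between the true density g and a fitted candidate f_θ:

\[
\mathrm{KL}(g, f_\theta) = \int g(x) \log \frac{g(x)}{f_\theta(x)}\, dx,
\]
\[
\mathrm{AIC} = -2 \log L(\hat{\theta}) + 2k, \qquad
\mathrm{BIC} = -2 \log L(\hat{\theta}) + k \log n, \qquad
\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1}.
\]

The AICc correction term vanishes as n grows relative to k, which is why the corrected criterion matters mainly in small samples or when many candidate predictors are screened.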