THE USE OF EFFECT SIZES IN CREDIT RATING MODELS

by

HENDRIK STEFANUS STEYN

submitted in accordance with the requirements for the degree of

MASTER OF SCIENCE

in the subject

STATISTICS

at the

UNIVERSITY OF SOUTH AFRICA

SUPERVISOR: PROF P NDLOVU

DECEMBER 2014

© University of South Africa 2015

Abstract

The aim of this thesis was to investigate the use of effect sizes to report the results of statistical credit rating models in a more practical way. Rating systems in the form of statistical probability models, such as logistic regression models, are used to forecast the behaviour of clients and to guide businesses in rating clients as "high" or "low" risk borrowers. Because these results guide business decisions, model results were reported both in terms of statistical significance and in business language (practical significance) that business experts can understand and interpret. In this thesis, statistical results were expressed as effect sizes, such as Cohen's d, which put the results into standardised, measurable units that can be reported in practical terms. These effect sizes indicated the strength of correlations between variables, the contribution of variables to the odds of defaulting, the overall goodness-of-fit of the models and the models' ability to discriminate between high-risk and low-risk customers.

Key Terms

Practical significance; Logistic regression; Cohen's d; Probability of default; Effect size; Goodness-of-fit; Odds ratio; Area under the curve; Multi-collinearity; Basel II

Contents

Abstract
Contents
List of Tables
List of Figures
ACKNOWLEDGEMENTS
DEDICATION
DECLARATION BY STUDENT
LIST OF ABBREVIATIONS
1 Introduction
  1.1 Effect sizes
  1.2 Credit rating models
    1.2.1 Credit risk and credit ratings
    1.2.2 Credit risk model
  1.3 Leading to the problem
    1.3.1 Current solutions
    1.3.2 Shortcomings
  1.4 Objective of the thesis
    1.4.1 Broad objective
    1.4.2 Specific objectives
  1.5 Data
    1.5.1 Data sources
    1.5.2 Data management
    1.5.3 Data set construction
    1.5.4 Treatment in the data set for companies that defaulted
    1.5.5 Treatment in the data set for companies that did not default
  1.6 Organisation of the thesis
2 Statistical methods of modelling probability of default
  2.1 Logistic modelling of probability of default
    2.1.1 Logistic regression model and parameter estimation
    2.1.2 Model diagnostics
    2.1.3 Multi-collinearity and variable selection methods
    2.1.4 The validation of the model
    2.1.5 Sensibility of choice of explanatory variables
  2.2 The final probability of default
  2.3 Effect size revisited
    2.3.1 Importance of effect sizes
    2.3.2 Different types of effect sizes
    2.3.3 Magnitude of effect
    2.3.4 Practical significance
    2.3.5 Reporting guidelines and benefits of reporting effect sizes and their practical significance
3 Data analysis and results
  3.1 Data analysis
    3.1.1 Data used for modelling
    3.1.2 Correlation of variables and cluster analysis
    3.1.3 Modelling data sets (months 1–48 before default)
  3.2 Fitting the full logistic regression model to the data
    3.2.1 Descriptive statistics and correlation analysis
    3.2.2 Checking model assumptions
    3.2.3 Checking for outliers and influential observations
    3.2.4 Stepwise variable selection
  3.3 Fitting the logistic regression model for selected variables to the 48-months-before-default data set
    3.3.1 Checking correlation and model assumptions
    3.3.2 The fitted logistic regression model and inferences
  3.4 Fitting the logistic regression models to the various months-before-default data sets
    3.4.1 Correlation analysis
    3.4.2 Goodness-of-fit of the models
    3.4.3 Test of the significance of individual model parameters
    3.4.4 Odds ratios as effect sizes
  3.5 Constructing the PD models
4 Conclusion, recommendations and applications
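The abstract describes expressing model results as effect sizes: Cohen's d for standardised differences between groups, odds ratios for the contribution of a variable to the odds of defaulting, and a discrimination measure such as the area under the curve. The Python sketch that follows illustrates these three calculations under stated assumptions: two groups of clients (defaulted and not defaulted), the pooled-standard-deviation form of Cohen's d, a logistic model whose outcome is coded 1 for default, and the AUC computed in its Mann-Whitney (pairwise comparison) form. The function names, the use of Cohen's conventional 0.2/0.5/0.8 benchmarks, and the example numbers are illustrative assumptions and are not taken from the thesis.

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2
                  + (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def magnitude(d):
    """Label |d| with Cohen's conventional benchmarks (an assumption; the thesis may use others)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


def odds_ratio(beta, delta=1.0):
    """Factor by which the odds of default are multiplied for a `delta`-unit
    increase in a predictor with logistic regression coefficient `beta`."""
    return math.exp(beta * delta)


def auc(default_scores, non_default_scores):
    """Probability that a randomly chosen defaulter scores higher than a randomly
    chosen non-defaulter (Mann-Whitney form of the AUC; ties count as one half)."""
    wins = 0.0
    for s_d in default_scores:
        for s_n in non_default_scores:
            if s_d > s_n:
                wins += 1.0
            elif s_d == s_n:
                wins += 0.5
    return wins / (len(default_scores) * len(non_default_scores))


if __name__ == "__main__":
    # Hypothetical debt-to-asset ratios; not data from the thesis.
    defaulted = [0.8, 0.9, 1.0]
    not_defaulted = [0.5, 0.6, 0.7]
    d = cohens_d(defaulted, not_defaulted)
    print(f"Cohen's d = {d:.2f} ({magnitude(d)})")
    # Hypothetical logistic coefficient for the same ratio.
    print(f"Odds ratio per 0.1 increase = {odds_ratio(2.5, delta=0.1):.3f}")
    print(f"AUC = {auc(defaulted, not_defaulted):.2f}")
```

The pairwise AUC loop takes time proportional to the product of the two group sizes and is written for clarity; for large portfolios, a rank-based computation of the same Mann-Whitney statistic gives an identical value more efficiently.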