Comparing Several Modeling Methods on Ncaa March Madness
Total Page:16
File Type:pdf, Size:1020Kb
COMPARING SEVERAL MODELING METHODS ON NCAA MARCH MADNESS A Dissertation Submitted to the Graduate Faculty of the North Dakota State University of Agriculture and Applied Science By Su Hua In Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY Major Department: Statistics July 2015 Fargo, North Dakota North Dakota State University Graduate School Title Comparing Several Modeling Methods on NCAA March Madness By Su Hua The Supervisory Committee certifies that this disquisition complies with North Dakota State University’s regulations and meets the accepted standards for the degree of DOCTOR OF PHILOSOPHY SUPERVISORY COMMITTEE: Dr. Rhonda Magel Co-Chair Dr. Gang Shen Co-Chair Dr. Seung Won Hyun Dr. Kenneth Magel Approved: 7/10/2015 Rhonda Magel Date Department Chair ABSTRACT This year (2015), according to the AGA’s (American Gaming Association) research, nearly about 40 million people filled out about 70 million March Madness brackets (Moyer, 2015). Their objective is to correctly predict the winners of each game. This paper used the probability self- consistent (PSC) model (Shen, Hua, Zhang, Mu, Magel, 2015) to make the prediction of all 63 games in the NCAA Men's Division I Basketball Tournament. PSC model was first introduced by Zhang (2012). The Logit link was used in Zhang’s (2012) paper to connect only five covariates with the conditional probability of a team winning a game given its rival team. In this work, we incorporated fourteen covariates into the model. In addition to this, we used another link function, Cauchit link, in the model to make the predictions. Empirical results show that the PSC model with Cauchit link has better average performance in both simple and doubling scoring than Logit link during the last three years of tournament play. In the generalized linear model, maximum likelihood estimation is a popular method for estimating the parameters; however, convergence failures may happen when using large dimension covariates in the model (Griffiths, Hill, Pope, 1987). Therefore, in the second phase in this study, Bayesian inference is used for estimating in the parameters in the prediction model. Bayesian estimation incorporates prior information such as experts’ opinions and historical results in the model. Predictions from three years of March Madness using the model obtained from Bayesian estimation with Logit link will be compared to predictions using the model obtained from maximum likelihood estimation. iii ACKNOWLEDGMENTS It is with immense gratitude that I acknowledge the support and help of my advisors, Dr. Rhonda Magel and Dr. Gang Shen. Thank you for giving me continuous guidance and support throughout this whole research process. This dissertation could not have been completed without the effort from you. I would like to thank all my committee members, Dr. Seung Won Hyun, and Dr. Kenneth Magel, as well as Dr. Changhui Yan and Mr. Curt Doetkott. At last, I would like to thank all professors and staff from the Department of Statistics. I would also like to thank my parents. They were always supporting me and encouraging me with their best wishes. Finally, I would like to thank my wife, Qian Wen. She was always there cheering me up and stood by me through the good times and bad. iv TABLE OF CONTENTS ABSTRACT ................................................................................................................................... iii ACKNOWLEDGMENTS ............................................................................................................. iv LIST OF TABLES ........................................................................................................................ vii LIST OF FIGURES ..................................................................................................................... viii 1. INTRODUCTION ......................................................................................................................1 1.1. Research Objective ..............................................................................................................1 1.2. The Playing Rule and Structure ...........................................................................................1 1.3. Qualifying Procedure ..........................................................................................................2 1.4. Bracket Scoring System .......................................................................................................6 2. LITERATURE REVIEW ...........................................................................................................7 3. PROBABILITY SELF-CONSISTENCY MODEL WITH CAUCHIT LINK .........................12 3.1. Introduction of Cauchit Link .............................................................................................12 3.2. Application .........................................................................................................................14 3.3. Model Selection ...............................................................................................................15 3.4. Prediction Result ...............................................................................................................16 4. BAYESIAN INFERENCE .......................................................................................................25 4.1. The Likelihood Function....................................................................................................26 4.2. The Prior Distribution of Logistic Coefficients .................................................................27 4.3. The Posterior Distribution of Logistic Coefficient............................................................31 4.4. Estimation .........................................................................................................................32 v 4.5. Sampling Algorithms .........................................................................................................32 4.6. Application .........................................................................................................................33 5. COMPARISON OF PREDICTION ACCURACY IN THE PAST THREE YEARS .............40 6. DISCUSSION ...........................................................................................................................44 REFERENCES .............................................................................................................................46 APPENDIX A. R CODE FOR PROBABILITY SELF-CONSISTENCY MODEL WITH CAUCHIT LINK ..........................................................................................................................50 APPENDIX B. R AND SAS CODE FOR BAYESIAN INFERENCE WITH LOGIT LINK ..............................................................................................................................................65 B.1. R Code Part .......................................................................................................................65 B.2. SAS Code Part ...................................................................................................................65 vi LIST OF TABLES Table Page 1. Automatic qualifiers for the 2015 NCAA March Madness ....................................................3 2. At-large qualifiers for the 2015 NCAA March Madness ........................................................5 3. First Four games in 2015 NCAA March Madness .................................................................6 4. Scoring systems ......................................................................................................................6 5. Covariates used in the model ................................................................................................15 6. Summary of the three models for PSCM with Cauchit link (2015 March Madness) ............17 7. Prediction accuracy for PSCM with Cauchit link (2015 March Madness) ...........................18 8. Summary of the three models for PSC model with Logit link (2015 March Madness) ........19 9. Summary of the restricted OLRE model for 2015 March Madness .....................................19 10. PSC model (Cauchit link) probability matrix in 2015 March Madness using MLE ..........21 11. PSC model (Logit link) probability matrix in 2015 March Madness using MLE ...............23 12. Restricted OLRE model probability matrix in 2015 March Madness using MLE .............24 13. Probability matrix based upon the seed number .................................................................29 14. Matchup information in NCAA March Madness Championship games from season 2002-2003 through 2013-2014 (n=12).................................................................................29 15. Summary table for posterior distribution in 2015 March Madness Rd64 model ...............35 16. Estimated coefficients (mean of posterior distribution) for 2015 March Madness using Bayesian inference (standard deviation of posterior distribution given in parenthesis) ......36 17. Prediction accuracy for PSC model (Logit link) using Bayesian estimation in 2015 March Madness ....................................................................................................................37 18. PSC model (Logit link) probability matrix in 2015 March Madness using Bayesian Est. ......................................................................................................................................38 19. The