Predicting Outcomes in Australian Rules Football

Richard Ryall
B.App.Sci(Stats)(Hons)

A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy

School of Mathematical and Geospatial Sciences
RMIT University
January 2011

Statement of Authorship

The candidate hereby declares that:

• except where due acknowledgement has been made, the work is that of the candidate alone;
• the work has not been submitted previously, in whole or in part, to qualify for any other academic award;
• the content of the thesis is the result of the work which has been carried out since the official commencement date of the approved research program;
• any editorial work, paid or unpaid, carried out by a third party is acknowledged.

Richard Ryall
January 2011

"Mathematics, rightly viewed, possesses not only truth, but supreme beauty - a beauty cold and austere, like that of sculpture." - Bertrand Russell

Acknowledgements

I wish to acknowledge the following people and organisations, without whose assistance this dissertation would not have been possible.

Dr. Anthony Bedford - Thank you for being such an inspiration throughout this journey; I could not have asked for a better senior supervisor. Your passion for statistics in sport will never be forgotten.

RMIT Sports Statistics Research Group - To all the members past and present, it has been great to share this experience with like-minded people and watch the reputation of this group grow each year. Special mention to Dr. Adrian Schembri and Dr. Cliff Da Costa for astute comments on earlier versions of this dissertation.

Dr. Mark Stewart - My career in sports statistics first started when I worked under your guidance as a Research Assistant. This research turned into an ongoing project which I felt fortunate to have been a part of.

Mr. Jason Ferris - For teaching me, amongst other things, how to write "good code" and for always making time available to answer any questions.
Prowess Sports - For providing data used at various stages throughout this dissertation.

Anonymous Referees - Your astute comments and advice on journal articles and conference proceedings helped improve the foundation of this dissertation.

Examiners - Thank you for your kind words and insightful suggestions which polished the overall quality of this dissertation.

Australian Postgraduate Award - For providing financial support without which this dissertation would not have been possible.

To my family - Mum, James and William; I'm always astonished by the obstacles we have overcome and I'm extremely proud of the people we are today. I draw strength from you all.

Finally, I wish to dedicate this dissertation to the memory of my father
Thomas Gordon Ryall (17/09/1947 - 21/09/2004)

Summary

The primary aim of this dissertation was to utilise mathematical models and computer programming techniques to provide further insight in relation to predicting outcomes in Australian Rules football (AFL). This thesis comprises a collection of research problems relating to home advantage, match prediction and the efficiency of betting markets in AFL.

Firstly, a new paradigm was proposed for predicting home advantage in AFL by separately evaluating a number of psychological (crowd intimidation), physiological (travel fatigue) and tactical (ground familiarity) factors. This novel method for quantifying home advantage was utilised for match prediction using a variant of the Elo ratings system. These predictions were applied to betting markets to see if consistent profits were attainable using betting strategies based around the Kelly criterion.

Due to a severe lack of accessible in-play betting data, a computer program was developed using the programming language Perl to integrate with the Betfair Application Programming Interface (API) to automatically record in-play betting data for AFL matches.
This information was updated in a MySQL database which could then be easily exported as a CSV file for manipulation in Excel. The in-play betting data was transformed to provide a visual representation of who is going to win the match and with what level of certainty. Tests of semi-strong efficiency were performed on the in-play betting data for the 2009 AFL season using logistic regression to see whether teams with certain characteristics are underbet or overbet relative to their chances of winning.

A real-time prediction model was developed using a Generalised Logistic Model which accounts for the interdependence, if any, between team quality and score difference as the match progresses. These predictions were applied to in-play betting markets to see if consistent profits were attainable using betting strategies based around the Kelly criterion.

If home advantage in AFL is comprised of a combination of psychological, physiological and tactical factors then it's plausible that home advantage is dependent upon the current state of the game (score) since the crowd, for example, react to performance. Therefore, home advantage was modelled at various stages during the game to see the difference, if any, between home teams with certain pre-game characteristics (favourite/underdog) and in-game characteristics (ahead/behind).

Finally, a macro was written in Excel to automate the transformation of a mass of "live-streaming" performance data into a single web-based phases of play plot. Statistically, the plot provides an effective representation of the state of the game at any point in time, illustrating which team is playing a style of football highly correlated with winning. Graphically the plot is enhanced by adding images of a player's guernsey when a goal is scored.

Contents

Declaration
Acknowledgements
Summary

1 Introduction
  1.1 Why Australian Rules Football?
  1.2 Applications of Sports Statistics
  1.3 Literature Review
    1.3.1 Home Advantage
    1.3.2 Rating Systems in Sport
    1.3.3 Market Efficiency
    1.3.4 Intra-match Home Advantage
    1.3.5 Real Time Predictions in Sport
    1.3.6 Phases of Play
  1.4 Research Questions and Publications
    1.4.1 Research Questions
    1.4.2 Publications

2 Australian Rules Football
  2.1 History
    2.1.1 Pre 1890's
    2.1.2 1890's to 1910's
    2.1.3 1920's to 1940's
    2.1.4 1950's and 1960's
    2.1.5 1970's
    2.1.6 1980's
    2.1.7 1990's
    2.1.8 2000's
  2.2 The Game
    2.2.1 Teams
    2.2.2 Field and Player Positions
    2.2.3 Scoring System
    2.2.4 Objectives and Rules
    2.2.5 The Fixture
    2.2.6 The Ladder
  2.3 Recruitment of Players
    2.3.1 History
    2.3.2 Trading
    2.3.3 National Draft
    2.3.4 Pre-Season Draft
    2.3.5 Rookie Draft
  2.4 AFL Statistics Providers
    2.4.1 Champion Data
    2.4.2 ProWess Sports
    2.4.3 Summary

3 Methods
  3.1 Introduction
  3.2 Linear Regression
    3.2.1 Introduction
    3.2.2 The Linear Regression Model
    3.2.3 Assumptions
    3.2.4 Least Squares Regression
    3.2.5 Analysis of Residuals
    3.2.6 Goodness of Fit
  3.3 Binary Logistic Regression
    3.3.1 Introduction
    3.3.2 Fitting the Logistic Regression Model
    3.3.3 Analysis of Residuals
    3.3.4 Goodness of Fit
    3.3.5 Odds Ratios
  3.4 Optimisation and Simulation
    3.4.1 Introduction
    3.4.2 Mathematical Formulation
    3.4.3 Important Considerations
    3.4.4 Optimisation Algorithms
  3.5 Elo Ratings
    3.5.1 Introduction
    3.5.2 Mathematical Formulation
    3.5.3 The World Football Elo Rating System
  3.6 Computer Programming
    3.6.1 VBA Programming
    3.6.2 Perl and MySQL
  3.7 Summary

I Pre-Game

4 Home Advantage
  4.1 Introduction
  4.2 Independent Effects
    4.2.1 Ground Familiarity
    4.2.2 Travel Fatigue
    4.2.3 Crowd Intimidation
    4.2.4 Referee Bias
  4.3 Methods
  4.4 Results

5 Ratings
  5.1 Introduction
  5.2 Initial Ratings
  5.3 Methods
  5.4 Results
  5.5 Applications to Betting Markets ...