Forecasting Serve Performance in Professional Tennis Matches
Total Page:16
File Type:pdf, Size:1020Kb
Journal of Sports Analytics xx (2021) x–xx 1 DOI 10.3233/JSA-200345 IOS Press 1 Forecasting serve performance 2 in professional tennis matches ∗ 3 Jacob Gollub 4 Harvard University, Department of Statistics 5 Abstract. Many research papers on tennis match prediction use a hierarchical Markov Model. To predict match outcomes, 6 this model requires input parameters for each player’s serving ability. While these parameters are often computed directly 7 from each player’s historical percentages of points won on serve and return, doing so fails to address bias due to limited 8 sample size and differences in strength of schedule. In this paper, we explore a handful of novel approaches to forecasting 9 serve performance that specifically address these limitations. By applying an Efron-Morris estimator, we provide a means 10 to robustly forecast outcomes when players have limited match data over the past year. Next, through tracking expected 11 serve and return performance in past matches, we account for strength of schedule across all points in a player’s match 12 history. Finally, we demonstrate a new way to synthesize historical serve data with the predictive power of Elo ratings. When 13 forecasting serve performance across 7,622 ATP tour-level matches from 2014-2016, all three of these proposed methods 14 outperformed Barnett and Clarke’s standard approach. 15 Keywords: Match prediction, point-based models, Efron-Morris estimator, Elo ratings 16 1. Introduction Assuming that serve and return performances fol- 35 low a normal distribution on a match-by-match basis, 36 17 Statistical prediction models have been applied to Newton and Aslam (2009) ran Monte Carlo simu- 37 18 tennis for decades, often to predict the winner of a lations to predict the winner of a match. Adapting 38 19 match. Laying the groundwork for research on point- Barnett and Clarke’s approach, Spanias and Knotten- 39 20 based models, Klaassen and Magnus (2001) first belt (2012) predicted serve performance by modeling 40 21 tested the assumption that points in a tennis match the probabilities of specific point outcomes (e.g. ace, 41 22 are independent and identically distributed, or i.i.d. rally win on first serve, etc.) within a service game. 42 23 While this was proven false, they concluded devi- In order to reduce bias from variations in strength 43 24 ations are small enough that the i.i.d. assumption of schedule, Knottenbelt et al. (2012) then proposed 44 25 provides a reasonable approximation. By estimating the Common-Opponent Model, which measures per- 45 26 each player’s probability of winning a point on serve formance relative to common adversaries in their 46 27 and computing win probability from these parame- respective match histories. More recently, Kovalchik 47 28 ters, they demonstrated a new way to forecast matches and Reid (2018) demonstrated a way to calibrate 48 29 under this assumption (Klaassen and Magnus, 2003). serve parameters to the win probability implied by 49 30 In the time since, we have seen a variety of ways each player’s Elo rating. 50 31 to forecast serve performance. Barnett and Clarke Building upon previous work in serve performance 51 32 (2005) estimated these parameters byUncorrected adjusting his- prediction, Author this exploration servesProof two purposes. 52 33 torical tournament statistics by the performance of Following a previous assessment of 11 different pre- 53 34 the server and returner, relative to the average player. match prediction models on 2014 ATP tour-level 54 matches (Kovalchik, 2016), we evaluate serve perfor- 55 ∗ mance prediction methods across the same dataset as 56 Corresponding author: Jacob Gollub, Harvard University, Department of Statistics. E-mail: [email protected]. a benchmark for future work. We also introduce sev- 57 ISSN 2215-020X © 2021 – The authors. Published by IOS Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (CC BY-NC 4.0). 2 J. Gollub / Forecasting serve performance in professional tennis matches 58 eral new methods to more robustly compute expected directly from point-level probabilities, while Elo rat- 103 59 serve performance. When forecasting thousands of ings infer player ability solely from match outcomes. 104 60 matches, the previously stated methods tend to run 61 into problems with limited match data and differences 2.1. Hierarchical markov model 105 1 62 in strength of schedule. To avoid these pitfalls, we 63 explore variations of Barnett and Clarke’s approach Consider a tennis match between player i and 106 64 that specifically address these issues. player j, where fij estimates the probability that 107 65 In section 2, we review the hierarchical Markov player i wins a point on serve against player j in a 108 66 Model and Elo ratings, two of the most often-cited given match. Given serve parameters fij ,fji we can 109 67 methods for predicting tennis match outcomes. In calculate the probability that player i wins the match, 110 68 section 3, we review Barnett and Clarke’s formula πij , from any score. To do so, consider how scores 111 69 and explore further approaches to serve performance are composed of points, games, and sets. With player 112 70 prediction. Section 3.1.1 introduces a Bayesian esti- i serving to player j, let ai,aj represent each player’s 113 71 mator to more robustly predict serve performance within-game score. Then we compute the probabil- 114 72 from limited match data. Section 3.1.2 demonstrates ity of player i winning the current game from score 115 73 an approach that tracks expected serve and return per- ai − aj as follows: 116 74 formances in every match, allowing us to consider 75 strength of schedule over the course of a player’s ⎧ ⎪ , a = ,a ≤ 76 entire match history. Section 3.2 harnesses Elo rat- ⎪1 if i 4 j 2 ⎪ 77 ings’ predictive power to more accurately predict ⎪ = ≤ ⎪0, if aj 4,ai 2 78 serve performance while maintaining the overall ⎪ ⎨ 2 79 serve ability between two players. In section 4, we (fij ) = , if a = a = 3 Pg(ai,aj) 2 + − 2 i j 80 present several case studies to examine how each ⎪ (fij ) (1 fij ) ⎪ 81 approach informs predictions for individual matches. ⎪ ∗ + + ⎪fij Pg(ai 1,aj) 82 In section 5, we evaluate all proposed methods ⎪ ⎪ 83 ⎩ alongside established prediction models. Section 6 (1 − fij )Pg(ai,aj + 1), otherwise. 84 summarizes findings and suggests future steps for 117 85 research with point-based models. Following this approach, one may calculate player 118 86 Effective serve performance prediction holds i’s probability of winning the current set, Ps(gi,gj), 119 87 strong implications for both player analytics and and then the match, Pm(si,sj), through similar recur- 120 88 in-match forecasting. With more effective forecasts, sive relationships. Barnett et al. (2002) describe the 121 89 coaches can better understand how their players recursion in full detail. 122 90 match up with opponents on serve and return, 91 and adjust their strategies accordingly. By using 92 the hierarchical Markov Model, we can also com- 2.2. Elo ratings 123 93 pute win probability as a function of each player’s 94 expected serve performance from any score. There- Elo was originally developed as a head-to-head rat- 124 95 fore, improving existing methods will provide means ing system for chess players (Elo, 1978). Recently, 125 96 to more confidently predict match outcomes while FiveThirtyEight’s Elo variant has gained prominence 126 97 play is in progress, a problem with significant applica- in the media (Bialik et al., 2016). For a match at 127 98 tion to betting markets and real-time sports analytics. time t between player i and player j with Elo rat- 128 ings Ei(t) and Ej(t), player i is forecasted to win 129 with probability: 130 99 2. Models for match win prediction − Ej(t)−Ei(t) 1 πˆ (t) = 1 + 10 400 . 131 100 Newly proposed methods will require the context ij 101 of two popular match prediction models.Uncorrected The hier- Author Proof 102 archical Markov Model computes win probability For the following match, player i’s rating is then 132 1Knottenbelt’s Common-Opponent model addresses strength updated accordingly: 133 of schedule at the cost of reducing the amount of available match data. Kovalchik and Reid’s approach indirectly addresses both of + = + ∗ − these issues through the use of Elo ratings. Ei(t 1) Ei(t) Kit (Wi(t) πˆ ij (t)). 134 J. Gollub / Forecasting serve performance in professional tennis matches 3 135 Wi(t) is an indicator for whether player i won the return. ft denotes the percentage of points won on 172 136 given match at time t, while Kit is the learning rate serve at the match’s given tournament and fav,gav 173 137 for player i at time t. represent tour-level averages for the percentages of 174 138 According to FiveThirtyEight’s analysts, Elo rat- points won on serve and return, respectively. 175 139 ings perform optimally when allowing Kit to decay While Barnett and Clarke’s dataset was limited to 176 140 slowly over time (Bialik et al., 2016). With mi(t) rep- year-to-date statistics, we may calculate fi,gi with 177 141 resenting the number of player i’s career matches at the past twelve months of match data for any given 178 2 Mi 142 time t, we update the learning rate as follows : match. Where (y,m) represents the set of player i’s 179 matches in year y, month m, we obtain the following 180 250 4 = statistics : 181 143 Kit .4 .