A Point-Based Bayesian Hierarchical Model to Predict the Outcome Of

J. Quant. Anal. Sports 2019; 15(4): 313–325 Martin Ingram* A point-based Bayesian hierarchical model to predict the outcome of tennis matches https://doi.org/10.1515/jqas-2018-0008 match can be determined from just the probabilities of winning a point on serve for each player. In addition to Abstract: A well-established assumption in tennis is that the match-winning probability, probabilities of winning a point outcomes on each player’s serve in a match are service game and winning a set with a given score can independent and identically distributed (iid). With this be derived from just the serve-winning probabilities. Bar- assumption, it is enough to specify the serve probabili- nett et al. (2006) derive a wealth of other metrics, includ- ties for both players to derive a wide variety of event dis- ing the number of points in a game, tiebreak, set and tributions, such as the expected winner and number of match. sets, and number of games. However, models using this Although many predictions can be made using the iid assumption, which we will refer to as “point-based”, have assumption, it is only an approximation. Pollard, Cross, typically performed worse than other models in the litera- and Meyer (2006) analyse patterns of set wins and find ture at predicting the match winner. This paper presents evidence that the probability of winning a set varies from a point-based Bayesian hierarchical model for predict- set to set. Klaassen and Magnus (2001) analyse 90,000 ing the outcome of tennis matches. The model predicts points from Wimbledon matches between 1992 and 1995 the probability of winning a point on serve given sur- and find deviations from iid behaviour. They suggest how- face, tournament and match date. Each player is given ever that these deviations are small and that the iid model a serve and return skill which is assumed to follow may still be useful for match prediction. Similarly, Newton a Gaussian random walk over time. In addition, each and Aslam (2006) investigate a variety of possible non-iid player’s skill varies by surface, and tournaments are given effects using a Monte Carlo simulation and find that the tournament-specific intercepts. When evaluated on the iid model remains robust even when non-iid effects are ATP’s 2014 season, the model outperforms other point- introduced. based models, predicting match outcomes with greater In Kovalchik (2016), the author compares 11 published accuracy (68.8% vs. 66.3%) and lower log loss (0.592 tennis prediction models by predicting the ATP’s 2014 sea- vs. 0.641). The results are competitive with approaches son. Models are broken into three classes: models using modelling the match outcome directly, demonstrating the iid model for match prediction, which we will refer the forecasting potential of the point-based modelling to as “point-based models” from now on; models based approach. on regression approaches; and paired comparison mod- Keywords: Bayesian modelling; random walk; sports fore- els. The best point-based model was found to have lower casting. accuracy and higher log loss than the best regression and paired comparison models. This paper introduces a new point-based model. The paper’s contributions are the following: firstly, it improves 1 Introduction on the previous best published point-based model and outperforms the regression models in Kovalchik (2016), A wealth of research in tennis, such as Newton and Keller coming close to matching the best reported model, an (2005), O’Malley (2008) and Riddle (1988) has shown that Elo model (Elo 1978) with a customised k-factor devised if it is assumed that point outcomes on each player’s serve by the website FiveThirtyEight (Morris and Bialik 2015). in a tennis match are assumed to be independent and Secondly, to the best of our knowledge, it is the first identically distributed (iid), the probability of winning a Bayesian hierarchical model presented for predicting tennis matches. The hierarchical model allows the fitting of *Corresponding author: Martin Ingram, University of Melbourne, player and surface-specific skills, as well as tournament- School of BioSciences, Melbourne, Australia; and Silverpond, specific adjustments, even when there are few or no obser- Melbourne, Australia, e-mail: [email protected] vations for some combinations. 314 | M. Ingram: A point-based Bayesian hierarchical model to predict the outcome of tennis matches 2 Methods ni serves in the match is the result of a Bernoulli trial with the same success probability θi, thus making use of the iid 2.1 Data assumption. In the following, we also divide time into periods. To be able to compare directly with the results obtained Players’ skills are assumed to be constant within a time in Kovalchik (2016), the ATP’s 2014 season is used as a period. Shorter periods allow the model to adapt more validation set. quickly to skill changes but also require a larger number Table 1 shows summaries of this validation set, bro- of parameters to be estimated. The shortest period length ken down by surface. The dataset was obtained by scrap- considered in this paper is one month, and the largest is ing MatchStat.com. Retirements, walkovers and matches twelve. without serving statistics (total points played and total The serve-winning probability θi is further broken points won on serve for each player) were discarded. Sur- down as follows: faces are broken down into three categories: clay, grass logit(θ ) = (α −β )+(g −g )+δ +θ and hard. Hard court matches are most prevalent, repre- i s(i)p(i) r(i)p(i) s(i)m(i) r(i)m(i) t(i) 0 (2) senting more than half of the tournaments and matches Here, the quantities α, β, g, δ and θ represent the played, followed by clay courts, followed by grass courts, 0 following: which only comprise about 10% of total matches played. – α is server s(i)’s serving skill in period p(i) Not all players play on all surfaces, with only just more s(i)p(i) – β is returner r(i)’s returning skill in period p(i) than half participating in at least one grass court match. r(i)p(i) – g is server s(i)’s additional skill on surface m(i) On average, a player serves 80 points per match, winning s(i)m(i) – g is returner r(i)’s additional skill on surface m(i) 51 of these, or about 64%. This percentage is higher on r(i)m(i) – δ is the adjusted serve intercept at tournament t(i), grass courts (67%) and lower on clay courts (62%). Grass t(i) and court matches average more service points played, which – θ is an intercept representing the average player’s is likely due to the fact that Wimbledon, which uses the 0 probability of winning a point on serve. longer best of five sets format, contributes over half of the matches played on grass in the dataset (124). Breaking down equation (2) into its individual terms, the first term in brackets represents the server’s serving skill 2.2 Model adjusted by the opponent’s return skill, the second represents the difference in skill preferences for the match 2.2.1 Likelihood surface, and the third corrects for tournament variation in the difficulty of winning a point on serve. In every tennis match, each player serves a number of Equation (2) builds on the opponent-adjusted model times n and wins y of those points. We divide each match by Barnett and Clarke (2005). They also adjust a player’s in the dataset into these two contests on serve, referring serve skill by the opponent’s return skill and have a to each as a “serve-match”. The likelihood for each serve- tournament-specific intercept, but do not add a surface- match i given by: specific offset. yi ∼ Binomial(ni , θi) (1) 2.2.2 Priors where θi is the serve-winning probability in that serve- match. This assumes that the outcome on each of a player’s The initial serve and return skills α and β are drawn from a normal distribution: Table 1: Summaries of the 2014 validation set. 2 α.1 ∼ N(0, σα0) (3) Clay Grass Hard Overall β ∼ N(0, σ2 ) (4) Tournaments 22 5 33 60 .1 β0 Matches 736 232 1240 2208 Unique players 186 144 215 267 The initial variance parameters are given hyperpriors: Average serve points played 78 95 79 80 Average serve points won 48 64 51 51 σα0 ∼ H(0, 1) (5) Fraction of serve points won 62% 67% 64% 64% σβ0 ∼ H(0, 1) (6) M. Ingram: A point-based Bayesian hierarchical model to predict the outcome of tennis matches | 315 where H is the half-normal distribution which assigns compare the influence of the period length, separate mod- non-zero probability only to positive values. This els with period lengths of 12, 6, 3, 2 and 1 month(s) hierarchical prior shrinks players’ skills towards the group are fit. In addition, for each period length, model fit- serve and return priors given by equations (3) and (4). ting is started in 2013, 2012 and 2011 to compare per- Similarly, the tournament-specific intercepts and formance. Convergence was assessed using the R^ statis- surface-specific preferences are independently drawn tic (Gelman and Rubin 1992), which is reported by Stan; from zero-centred normal distributions with half-normal all values were below 1.1. Code to fit the model is avail- variance hyperpriors: able online at: https://github.com/martiningram/tennis_ bayes_point_based. 2 g.

Load more