Extensions of the Elo Rating System for Margin of Victory

Extensions of the Elo Rating System for Margin of Victory MathSport International 2019 - Athens, Greece Stephanie Kovalchik 1 / 46 2 / 46 3 / 46 4 / 46 5 / 46 6 / 46 7 / 46 How Do We Make Match Forecasts? 8 / 46 It Starts with Player Ratings Assume the the ith player has some true ability θi. Models of player abilities assume game outcomes are a function of the difference in abilities Prob(Wij = 1) = F(θi − θj) 9 / 46 Paired Comparison Models Bradley-Terry models are a general class of paired comparison models of latent abilities with a logistic function for win probabilities. 1 F(θi − θj) = 1 + α−(θi−θj) With BT, player abilities are treated as Hxed in time which is unrealistic in most cases. 10 / 46 Bobby Fischer 11 / 46 Fischer's Meteoric Rise 12 / 46 Arpad Elo 13 / 46 In His Own Words 14 / 46 Ability is a Moving Target 15 / 46 Standard Elo Can be broken down into two steps: 1. Estimate (E-Step) 2. Update (U-Step) 16 / 46 Standard Elo E-Step For tth match of player i against player j, the chance that player i wins is estimated as, 1 W^ ijt = 1 + 10−(Rit−Rjt)/σ 17 / 46 Elo Derivation Elo supposed that the ratings of any two competitors were independent and normal with shared standard deviation δ. Given this, he likened the chance of a win to the chance of observing a difference in ratings for ratings drawn from the same distribution, 2 Rit − Rjt ∼ N(0, 2δ ) which leads to, Rit − Rjt P(Rit − Rjt > 0) = Φ( ) √2δ and ≈ 1/(1 + 10−(Rit−Rjt)/2δ) Elo's formula was just a hack for the cumulative normal density function. 18 / 46 Choice of σ Was based on the standard deviation of chessplayer ratings when Elo made the system, which was SD ≈ 200. Thus σ = 400 in Elo's system. Source: chess-site.com 19 / 46 Standard Elo U-Step For a binary result Wijt, the update to the ith player rating is, ^ Ri(t+1) = Rit + K(Wijt − W ijt) 20 / 46 Standard Elo U-Step For a binary result Wijt, the update to the ith player rating is, ^ Ri(t+1) = Rit + K(Wijt − W ijt) This adjusts according to the win residual and maximum possible gain (loss) of K. 20 / 46 Choice of K Elo would vary K depending on the tournament type but 32 was one value he often used. 21 / 46 Elo's Model-Based Connections State-space representation 1 P(Wij = 1|θi, θj) = 1 + 10−(θi−θj)/400 Abilities are assumed to follow a normal distribution over a rating period τ t+τ t 2 t 2 θi |θi, ν , t ∼ N(θi, ν t) Glicko (1999) is a Bayesian version, Fahrmeir and Tutz (1994) used Empirical Bayes 22 / 46 Elo's Model-Based Connections Glickman showed that the Elo model is a special case of a state-space paired comparison model that assumes 1. The same prior knowledge about a player's strength throughout time 2. The strengths of opponents are known constants 23 / 46 Elo's Model-Based Connections Glickman showed that the Elo model is a special case of a state-space paired comparison model that assumes 1. The same prior knowledge about a player's strength throughout time 2. The strengths of opponents are known constants Thus, we can consider Elo as a pared down version of Glicko. 23 / 46 Simplicity Works 24 / 46 Can Elo be Simple But Better? Men's 2019 French Open Final 25 / 46 Can Elo be Simple But Better? 26 / 46 Margin Of Victory Modelling Principles 27 / 46 Margin Of Victory Modelling Principles Consider two-step 'estimate then update' algorithms 27 / 46 Margin Of Victory Modelling Principles Consider two-step 'estimate then update' algorithms Targets of estimation must be functions of relative ratings 27 / 46 Margin Of Victory Modelling Principles Consider two-step 'estimate then update' algorithms Targets of estimation must be functions of relative ratings Ratings updates are functions of residuals 27 / 46 Margin Of Victory Modelling Principles Consider two-step 'estimate then update' algorithms Targets of estimation must be functions of relative ratings Ratings updates are functions of residuals The MOV is incorporated into estimation, updating, or both 27 / 46 MOV Models Linear Joint Additive Multiplicative Logistic 28 / 46 Linear E-Step Rit − Rjt M^ = ijt σ 29 / 46 Linear E-Step Rit − Rjt M^ = ijt σ U-Step ^ Ri(t+1) = Rit + K(Mijt − Mijt) 29 / 46 Joint Additive E-Step Rit − Rjt 1 M^ ijt = , W^ ijt = R R σ σ1 1 + 10−( it− jt)/ 2 30 / 46 Joint Additive E-Step Rit − Rjt 1 M^ ijt = , W^ ijt = R R σ σ1 1 + 10−( it− jt)/ 2 U-Step ^ ^ Ri(t+1) = Rit + K1(Mijt − Mijt) + K2(Wijt − W ijt) 30 / 46 Multiplicative E-Step 1 W^ ijt = 1 + 10−(Rit−Rjt)/σ2 31 / 46 Multiplicative E-Step 1 W^ ijt = 1 + 10−(Rit−Rjt)/σ2 U-Step α ^ Ri(t+1) = Rit + K(1 + |Mijt/σ1|) (Wijt − W ijt) α > 0 When σ1 = 1 this is the same Elo goal-based model of Hvattum and Arntzen (2010) 31 / 46 Logistic E-Step Rit − Rjt W^ ijt = L( ) σ2 32 / 46 Logistic E-Step Rit − Rjt W^ ijt = L( ) σ2 U-Step Mijt Rit − Rjt Ri(t+1) = Rit + K[L( ) − L( )] σ1 σ2 where L(x) = 1/(1 + α−x) is a generalized logistic function. 32 / 46 Kinetic Model for Elo Asymptotics Jabin and Junca (2015) propose a continuous kinetic model based on density f(t, r, θ), for players with rating r, true ability θ at time t, ∂ ∂ f + (a[f] f) = 0 ∂t ∂r where a[f] is a scalar vector Held, a[f] = w(r − r′)(b(θ − θ′) − b(r − r′))f(t, r′, θ′)dθ′dr′ ∫R2 w(. ) describes the probability of interactions between players of different ratings b(. ) is the update function, describing how ratings change after a new result 33 / 46 Validity Conditions Condition 1: Stationarity When players have reached their true rating, the expected change in ratings should be zero. 34 / 46 Validity Conditions Condition 1: Stationarity When players have reached their true rating, the expected change in ratings should be zero. Condition 2: Convergence The rating system should converge to player true strengths. Under the kinetic model, Jabin and Junca showed that any Elo system with update function b(. ) that meets the stationarity property and is Lipschitz continuous and strictly increasing satisBes this condition. 34 / 46 Validity 35 / 46 Validity The linear model update, (Mijt − M^ ijt) meets the stationarity and convergence conditions when E[Mijt] = M^ ijt. That is, when we have correctly speciHed the expectation for the margin. 35 / 46 Validity The linear model update, (Mijt − M^ ijt) meets the stationarity and convergence conditions when E[Mijt] = M^ ijt. That is, when we have correctly speciHed the expectation for the margin. The joint additive is the sum of the linear and standard Elo updates, so it's validity depends on the same conditions as the linear model. 35 / 46 Validity The linear model update, (Mijt − M^ ijt) meets the stationarity and convergence conditions when E[Mijt] = M^ ijt. That is, when we have correctly speciHed the expectation for the margin. The joint additive is the sum of the linear and standard Elo updates, so it's validity depends on the same conditions as the linear model. The multiplicative model's validity is established by showing that its update function can be reparameterized as standard Elo with a modiHed K ′. 35 / 46 Validity The linear model update, (Mijt − M^ ijt) meets the stationarity and convergence conditions when E[Mijt] = M^ ijt. That is, when we have correctly speciHed the expectation for the margin. The joint additive is the sum of the linear and standard Elo updates, so it's validity depends on the same conditions as the linear model. The multiplicative model's validity is established by showing that its update function can be reparameterized as standard Elo with a modiHed K ′. The logistic model needs the strongest set of conditions as it's update, L(Mijt/σ1) − L((Rit − Rjt)/σ2), is not a standard residual. 35 / 46 Simulation Study For N = 1000, Rin − Rjn ∼ N(0, 50) MOVijn|(Rin − Rjn) ∼ N((Rin − Rjn)/200, 1) −MOVijn/2 Wijn|MOVijn ∼ Bernoulli(1/(1 + 10 )) 36 / 46 37 / 46 Application Study ATP Dataset, Tuning 2000-2015, Testing 2016-2018 Margin Of Victory Median IQR % Positive for Winner SETS WON 2 1 100 GAMES WON 5 4 95 BREAK POINTS WON 2 2 90 TOTAL POINTS WON 14 10 94 SERVE PERCENTAGE WON 10 12 93 38 / 46 Model Tuning Optimization with loss function that combines RMSE of MOV and log-loss of win predictions, (M^ (θ) − M )2 √∑i,j,t ijt ijt L(θ) = 1/N ⎡ − log(P^ijt(θ)) 3SD ∑ ⎢ i,j,t ⎣ Initial values: Scaling rating difference to MOV 200/SDMOV Scaling learning rate to MOV residual 32/3SDMOV 39 / 46 Source: Horizontal line is standard Elo 40 / 46 Source: Horizontal line is standard Elo 41 / 46 Source: Horizontal lines are standard Elo 42 / 46 Takeaways 43 / 46 Takeaways Modellers have several valid options for incorporating MOV into their player ratings whether wins or the MOV are the target of interest 43 / 46 Takeaways Modellers have several valid options for incorporating MOV into their player ratings whether wins or the MOV are the target of interest When applied to men's tennis, MOV models improve predictive performance over standard Elo, the differences in gains depending more on the choice of MOV than model type 43 / 46 Takeaways Modellers have several valid options for incorporating MOV into their player ratings whether wins or the MOV are the target of interest When applied to men's tennis, MOV models improve predictive performance over standard Elo, the differences in gains depending more on the choice of MOV than model type State-space analogs to these models would allow for inference but aren't expected to improve predictive performance 43 / 46 The Rise of Tsitsipas 44 / 46 Wimbledon Prospects Player Grass Adjusted MOV Elo Novak Djokovic 2562 Rafael Nadal 2539 Roger Federer 2478 Dominic Thiem 2279 David Godn 2250 Kei Nishikori 2248 Gael MonHls 2244 John Isner 2238 Marin Cilic 2211 Roberto Bautista Agut 2207 Matteo Berrettini 2205 Alexander Zverev 2182 Milos Raonic 2178 Daniil Medvedev 2169 Stefanos Tsitsipas 2168 45 / 46 References Fahrmeir, L., Tutz, G., 1994.

Extensions of the Elo Rating System for Margin of Victory

Here Should That It Was Not a Cheap Check Or Spite Check Be Better Collaboration Amongst the Federations to Gain Seconds

The USCF Elo Rating System by Steven Craig Miller

Elo World, a Framework for Benchmarking Weak Chess Engines

Elo and Glicko Standardised Rating Systems by Nicolás Wiedersheim Sendagorta Introduction “Manchester United Is a Great Team” Argues My Friend

Package 'Playerratings'

Intrinsic Chess Ratings 1 Introduction

Alberto Santini Monte Dei Paschi Di Siena Outline

System Requirements Recommended Hardware Supported Hardware

A Brief Comparison Guide Between the ELO and the Glicko Rating Systems

Glossary of Chess

Predicting Chess Player Strength from Game Annotations

(12) United States Patent (10) Patent No.: US 7.496,943 B1 Goldberg Et Al