Sport Analytics

Dr. Jirka Poropudas, Director of Analytics, SportIQ
11.3.2019

Outline
1. Overview of sport analytics
  • Brief introduction through examples
2. Team performance evaluation
  • Ranking and rating teams
  • Estimation of winning probabilities
3. Assignment: “Optimal betting portfolio for Liiga playoffs”
  • Poisson regression for team ratings
  • Estimation of winning probabilities
  • Simulation of the playoff bracket
  • Optimal betting portfolio

1. Overview of sport analytics

What is sport analytics?
B. Alamar and V. Mehrotra (Analytics Magazine, Sep./Oct. 2011): “The management of structured historical data, the application of predictive analytic models that utilize that data, and the use of information systems to inform decision makers and enable them to help their organizations in gaining a competitive advantage on the field of play.”

Applications of sport analytics
• Coaches
  • Tactics, training, scouting, and planning
• General managers and front offices
  • Player evaluation and team building
• Television, other broadcasters, and news media
  • Entertainment, better content, storytelling, and visualizations
• Bookmakers and bettors
  • Betting odds and point spreads

Data sources
• Official summary statistics
  • Aggregated totals from game events
• Official play-by-play statistics
  • Record of game events as they take place
• Manual tracking and video analytics
  • More detailed team-specific events
  • Labor intensive approach
  • Data consistency?
• Automated tracking systems
  • Expensive
  • Consistency based on given event definitions
https://www.youtube.com/edit?video_id=7IdxFcy3PFA

Methodology
• Basic statistics and more advanced techniques
  • Signal vs. noise
• Mathematical modeling
  • Rules and scoring system specific factors
• Machine learning
  • Neural networks, deep learning, Bayesian networks etc.
• Optimization
• Simulation

EPL (football) – Expected goals
http://www.bbc.com/sport/football/40699431

NHL (ice hockey)
M.B. McCurdy, @ineffectivemath, https://twitter.com/i/web/status/899721405083906048.

NBA (basketball) – Houston Rockets
K. Goldsberry, Grantland.com, http://grantland.com/the-triangle/future-of-basketball-james-harden-daryl-morey-houston-rockets/.

MLB (baseball) – Launch angle and velocity
D. Sheinin and A. Emamdjomeh, Washington Post, https://www.washingtonpost.com/graphics/sports/mlB-launch-angles-story/.

NFL (American football) – 4th Down Bot
B. Burke and K. Quealy, 4th Down bot, New York Times. http://www.nytimes.com/newsgraphics/2013/11/28/fourth-downs/post.html

2. Team performance evaluation and prediction of future outcomes

Motivation for team performance evaluation and prediction
• Unbiased evaluation of performance
  • Signal vs. noise
  • Strength of schedule
• Strategy and planning
  • Team building and “tanking”
• Storytelling and entertainment
• Betting analytics
  • Betting lines
  • Predictive analytics

Team performance evaluation by ranking and rating
• The game results depend on (at least) three factors
  • Home advantage
  • Strength of the teams
  • Random variation (stochastic component)
• The game results are observed and the teams are ranked or rated according to their perceived level of performance.
• The objective of ranking and rating teams is to compare the underlying strengths of the teams.
  • Ranking: ordinal scale, i.e., the separation between successive teams is not evaluated.
  • Rating: interval scale, i.e., the differences between teams are measurable and have a meaningful interpretation.
• Team ratings can be used for predicting the winners of future games.

Prediction and winning probability
• Prediction of future results
  • When estimates for team strengths have been calculated, they can be used for estimating winning probabilities in future games.
• Modeling approach depends on the rules and the scoring system
  • How are the points/goals scored?
  • Assumptions about the underlying scoring processes
• N.B., there are always a number of alternative modeling choices.

Football
• Low scoring game
  • Limited number of scoring chances
  • EPL: 2.77 goals/game in 2016-17
• Poisson distribution
  • Scoring intensity
  • “Small chance of a goal at every time instant”
  • Rough approximation
https://dashee87.github.io/football/python/predicting-football-results-with-statistical-modelling/

Basketball
• High number of scoring chances
  • NBA teams average ≈100 possessions per game
  • Consecutive offensive possessions are more or less independent
• Central limit theorem
  • Distribution of points can be approximated using a normal distribution
J. Poropudas, Kalman filter algorithm for rating and prediction in basketball, 2011.

How certain is the outcome of the game?
• Law of large numbers
• Probability of an “upset”
  • In football, a match between a very good and a very bad team can still result in a tie or even an upset.
  • In basketball, the better team usually wins, because random variation averages out over the many possessions.

Bradley-Terry model
• Flexible model for almost(?) any game with two teams/players
• Bernoulli trial: the first team either wins or doesn’t.
  • Outcome of each game is 0 or 1.
  • Home advantage and scoring margin are not considered.
• Parameters
  • Team ratings $\beta_i$ representing team strengths
  • Winning probability $p_{ij}$ when team $i$ meets team $j$:
    $$\log \frac{p_{ij}}{1 - p_{ij}} = \beta_i - \beta_j$$
• Parameter estimation using maximum likelihood
  • No closed form solution
  • Numerical methods
  $$\ell(\boldsymbol{\pi}) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ w_{ij} \log \pi_i - w_{ij} \log(\pi_i + \pi_j) \right]$$
  where $\pi_i = e^{\beta_i}$, $w_{ij}$ is the number of times team $i$ has beaten team $j$, and $n$ is the number of teams.
E. Zermelo, Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung, Mathematische Zeitschrift, 29 (1): 436-460, 1929.
R.A. Bradley and M.E. Terry, Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons, Biometrika, 39 (3/4): 324-345, 1952.

Maher’s model for football
• “Scoring margin contains information.”
• Poisson scoring for home team $i$ and visiting team $j$:
  $$X_H \sim \mathrm{Poisson}(\alpha_i \beta_j), \qquad X_V \sim \mathrm{Poisson}(\gamma_j \delta_i)$$
• Four parameters per team
  • Offense at home and away: $\alpha_i$ and $\gamma_i$
  • Defense away and at home: $\beta_i$ and $\delta_i$
  • Number of parameters can be decreased with equality constraints.
• Parameter estimation using maximum likelihood
  • No closed form solution
  • Numerical methods
  $$\ell(\boldsymbol{\alpha}, \boldsymbol{\beta}) = \sum_{i=1}^{n} \sum_{j \neq i} \left[ x_{ij} \log(\alpha_i \beta_j) - \alpha_i \beta_j \right] + \text{const.}$$
  where $x_{ij}$ is the number of goals scored by home team $i$ against visiting team $j$; the visiting teams’ goals give an analogous expression in $\gamma$ and $\delta$.
• Not a “perfect fit” to actual data
  • Independence assumption!
M.J. Maher, Modelling association football scores, Statistica Neerlandica, 36 (3): 109-118, 1982.

Dixon-Coles model for football
• Refinement of Maher’s model
  • Modification to outcomes 0-0, 1-0, 0-1, and 1-1
  • Dependence between teams’ scoring
  • Better fit to actual results
• Parameter estimation using maximum likelihood
M.J. Dixon and S.G. Coles, Modelling association football scores and inefficiencies in the football betting market, Applied Statistics, 46 (2): 265-280, 1997.

3. Course assignment: Optimal betting portfolio for Liiga playoffs

Finnish ice hockey league: Liiga
• Top Finnish ice hockey league
  • 15 teams
  • 60 games for each team (30 home games)
  • 10 teams qualify for the playoffs
  • See http://liiga.fi/ottelut/2018-2019/runkosarja/.
• Regular season ends 14.3.2019
• Preliminary playoffs end 19.3.2019
  • N.B., you can use all the information available up to that date in your project work.
• Deadlines for this project
  • Presentation due 1.4.2019
  • Report due 13.4.2019

Liiga standings (as of 10.3.2019)
http://liiga.fi/tyokalut/laskuri/

Liiga playoff format
• The six best teams at the conclusion of the regular season proceed directly to the quarter-finals.
• Teams placing between 7th and 10th (inclusive) play best-of-three preliminary playoffs (“wild card round”).
• The two winners of the preliminary playoffs take the last two slots in the quarter-finals.
• All series after this are best-of-seven.
• In all playoff series, the team with the higher playoff seed holds the home advantage.
• In the semifinals, the matchups are determined by the regular season standings: the best remaining team plays against the worst remaining team (“re-seeding”).
• N.B., you can skip the preliminary playoffs, if you like.

Liiga playoffs (last season)

Poisson regression
• Poisson regression
  • Generalized linear model form of regression analysis
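
As an illustration of Poisson regression for team ratings, the sketch below fits a log-linear Poisson model with one attack rating and one defence rating per team plus a common home factor. This parametrisation is a simplification assumed here, not necessarily the one intended for the assignment. Instead of a general GLM routine it uses alternating closed-form maximum likelihood updates, which is one possible numerical method. The function names, the team labels A, B, C, and the scores are hypothetical.

```python
from collections import defaultdict


def fit_poisson_ratings(games, n_iter=500):
    """Maximum likelihood fit of the assumed model
        home goals ~ Poisson(home * attack[h] * defence[a])
        away goals ~ Poisson(attack[a] * defence[h])
    games: iterable of (home_team, away_team, home_goals, away_goals).
    A larger defence value means the team concedes more goals.
    """
    games = list(games)
    teams = sorted({t for g in games for t in g[:2]})
    scored = defaultdict(float)    # total goals scored by each team
    conceded = defaultdict(float)  # total goals conceded by each team
    for h, a, hg, ag in games:
        scored[h] += hg
        conceded[a] += hg
        scored[a] += ag
        conceded[h] += ag
    total_home_goals = sum(g[2] for g in games)

    attack = {t: 1.0 for t in teams}
    defence = {t: 1.0 for t in teams}
    home = 1.0
    for _ in range(n_iter):
        # Each update maximises the likelihood over one block of
        # parameters while the other blocks are held fixed.
        expo = defaultdict(float)
        for h, a, hg, ag in games:
            expo[h] += home * defence[a]
            expo[a] += defence[h]
        attack = {t: scored[t] / expo[t] for t in teams}

        expo = defaultdict(float)
        for h, a, hg, ag in games:
            expo[a] += home * attack[h]
            expo[h] += attack[a]
        defence = {t: conceded[t] / expo[t] for t in teams}

        home = total_home_goals / sum(
            attack[h] * defence[a] for h, a, hg, ag in games
        )

    # Remove the scale ambiguity: fix the mean attack rating to 1.
    scale = sum(attack.values()) / len(teams)
    attack = {t: v / scale for t, v in attack.items()}
    defence = {t: v * scale for t, v in defence.items()}
    return attack, defence, home


def expected_goals(attack, defence, home, home_team, away_team):
    """Expected goals (home, away) for a single game under the fitted model."""
    lam_home = home * attack[home_team] * defence[away_team]
    lam_away = attack[away_team] * defence[home_team]
    return lam_home, lam_away


# Made-up results for three hypothetical teams (not real Liiga data).
toy_games = [
    ("A", "B", 3, 1), ("B", "A", 2, 2), ("A", "C", 4, 0),
    ("C", "A", 1, 3), ("B", "C", 2, 1), ("C", "B", 0, 2),
]
attack, defence, home = fit_poisson_ratings(toy_games)
print(expected_goals(attack, defence, home, "A", "B"))
```

With real data, the same model could be fitted with a standard GLM package using team indicator variables and a log link. The hand-written updates are used here only to keep the sketch dependency-free.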

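The remaining assignment steps, estimating winning probabilities and simulating playoff series, can be sketched as follows under stated assumptions. Goals are taken to be independent Poisson variables. A tie after regulation is split 50/50, which is a simplification of overtime. The higher seed is assumed to host the odd-numbered games of a series, and this home-game pattern is an assumption, not taken from the Liiga rules. All scoring rates below are hypothetical.

```python
import math
import random


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def game_win_prob(lam_home, lam_away, max_goals=25):
    """P(home team wins) for independent Poisson scores.
    Ties are split evenly between the teams (assumed overtime coin flip)."""
    p_home = p_tie = 0.0
    for x in range(max_goals + 1):
        px = poisson_pmf(x, lam_home)
        for y in range(max_goals + 1):
            pxy = px * poisson_pmf(y, lam_away)
            if x > y:
                p_home += pxy
            elif x == y:
                p_tie += pxy
    return p_home + 0.5 * p_tie


def simulate_series(p_at_home, p_on_road, rng, wins_needed=4):
    """Simulate one series; return True if the higher seed wins it.
    p_at_home / p_on_road: higher seed's single-game win probability
    at home and away. Assumed pattern: higher seed hosts odd games."""
    wins = losses = 0
    game = 1
    while wins < wins_needed and losses < wins_needed:
        p = p_at_home if game % 2 == 1 else p_on_road
        if rng.random() < p:
            wins += 1
        else:
            losses += 1
        game += 1
    return wins == wins_needed


def series_win_prob(p_at_home, p_on_road, n_sims=20000, seed=0):
    """Monte Carlo estimate of the higher seed's series win probability."""
    rng = random.Random(seed)
    won = sum(simulate_series(p_at_home, p_on_road, rng) for _ in range(n_sims))
    return won / n_sims


# Hypothetical expected goals: higher seed at home, then on the road.
p_home_game = game_win_prob(3.0, 2.4)
p_road_game = 1.0 - game_win_prob(2.6, 2.7)  # lower seed is the home team
print(series_win_prob(p_home_game, p_road_game))
```

A full bracket simulation would repeat `simulate_series` round by round, re-seeding the semifinals as described in the playoff format. The resulting advancement frequencies are the probabilities that would feed into a betting portfolio.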