Nonlinear Dynamics of Sports
Total Page:16
File Type:pdf, Size:1020Kb
Parity and Predictability of Competitions: Nonlinear Dynamics of Sports Eli Ben-Naim Complex Systems Group & Center for Nonlinear Studies Los Alamos National Laboratory Federico Vazquez, Sidney Redner (Los Alamos & Boston University) Talk, papers available from: http://cnls.lanl.gov/~ebn 2006 Dynamics Days Asia Pacific, Pohang, Korea, July 12, 2006 Plan • Parity of sports leagues • Theory: competition model • Predictability of competitions • Competition and social dynamics What is the most competitive sport? Football Baseball Hockey Basketball American football What is the most competitive sport? Football Baseball Hockey Basketball American football Can competitiveness be quantified? How can competitiveness be quantified? Parity of a sports league Major League Baseball American League • Teams ranked by win-loss record 2005 Season-end Standings • Win percentage Number of wins x = Number of games • Standard deviation in win-percentage σ = x2 x 2 ! " − ! " • Cumulative distribution! = Fraction of teams with winning percentage < x In baseball 0.400 < x < 0.600 F (x) σ = 0.08 Data • 300,000 Regular season games (all games) • 5 Major sports leagues in US, England sport league full name country years games soccer FA Football Association England 1888-2005 43,350 baseball MLB Major League Baseball US 1901-2005 163,720 hockey NHL National Hockey League US 1917-2005 39,563 basketball NBA National Basketball Association US 1946-2005 43,254 american football NFL National Football League US 1922-2004 11,770 source: http://www.shrpsports.com/ http://www.the-english-football-archive.com/ Standard deviation in winning percentage 1 data 0.8 theory 0.6 F(x) 0.4 NFL NBA NHL 0.2 MLB 0 0 0.2 0.4 0.6 0.8 1 x Standard deviation in winning percentage 1 σ data theory 0.25 0.8 0.20 0.210 0.15 0.150 0.10 0.120 0.102 0.6 0.05 0.084 0 MLB FA NHL NBA NFL F(x) 0.4 NFL •Baseball most competitive? NBA American football least NHL • competitive? 0.2 MLB Distribution of winning percentage clearly 0 0 0.2distinguishes0.4 spor0.6 ts 0.8 1 x Theory: competition model • Two, randomly selected, teams play • Outcome of game depends on team record Better team wins with probability 1-q 1/2 random - q = - Worst team wins with probability q !1 deterministic (i + 1, j) probability 1 q (i, j) − i > j → !(i, j + 1) probability q - When two equal teams play, winner picked randomly • Initially, all teams are equal (0 wins, 0 losses) 1 Teams play once per unit time x = • ! " 2 Rate equation approach • Probability distribution functions k 1 gk = fraction of teams with k wins − ∞ G = g = fraction of teams with less than k wins H = 1 G = g k j k − k+1 j j=0 j=!k+1 • Ev!olution of the probability distribution dgk 1 2 2 = (1 q)(gk 1Gk 1 gkGk) + q(gk 1Hk 1 gkHk) + gk 1 gk dt − − − − − − − 2 − − better team wins worse team wins equal! teams pla"y • Closed equations for the cumulative distribution dGk 2 2 = q(Gk 1 Gk) + (1/2 q) Gk 1 Gk dt − − − − − Boundary Conditions G 0 = 0 G = 1 Initial Conditions Gk(t = 0) = 1 ∞ ! " Nonlinear Difference-Differential Equations An exact solution • Winner always wins (q=0) dGk = Gk(Gk Gk 1) dt − − • Transformation into a ratio Pk Gk = Pk+1 • Nonlinear equations reduce to linear recursion dPk = Pk 1 dt − Exact solution • 1 2 1 k 1 + t + 2! t + + k! t Gk = · · · 1 + t + 1 t2 + + 1 tk+1 2! · · · (k+1)! Long-time asymptotics Long-time limit • 1 k + 1 G 0.8 k → t • Scaling form 0.6 k F(x) 0.4 k=10 Gk F k=20 → t k=100 ! " 0.2 scaling theory Scaling function • 0 0 0.5 1 1.5 2 F (x) = x x Seek similarity solutions Use winning percentage as scaling variable Scaling analysis • Rate equation dGk 2 2 = q(Gk 1 Gk) + (1/2 q) Gk 1 Gk dt − − − − − ! " ∂G Treat number of wins as continuous Gk+1 Gk • − → ∂k ∂G ∂G + [q + (1 2q)G] = 0 ∂t − ∂k • Stationary distribution of winning percentage k G (t) F (x) x = k → t Scaling equation • dF [(x q) (1 2q)F (x)] = 0 − − − dx Scaling analysis • Rate equation dGk 2 2 = q(Gk 1 Gk) + (1/2 q) Gk 1 Gk dt − − − − − ∂G ! G "G Treat number of wins as continuous k+1 − k → ∂k Inviscid Burgers• equation ∂v ∂v ∂G ∂G + v = 0 + [q + (1 2q)G] = 0 ∂t ∂x ∂t − ∂k • Stationary distribution of winning percentage k G (t) F (x) x = k → t Scaling equation • dF [(x q) (1 2q)F (x)] = 0 − − − dx Scaling solution Stationary distribution of winning percentage • F (x) 0 0 < x < q 1 x q F (x) = − q < x < 1 q 1 2q − 1 − 1 q < x. − x q 1 q • Distribution of winning percentage is− uniform f(x) 0 0 < x < q 1 f(x) = F !(x) = q < x < 1 q 1 1 2q − 2q 1 0 − 1 q < x. − − • Variance in winning percentage 1/2 q q = 1/2 perfect parity σ = − √3 −→ !q = 1 maximum disparity Approach to scaling Numerical integration of the rate equations, q=1/4 1 0.5 Theory League games 0.8 t=100 NFL MLB 160 t=500 0.4 FA 40 0.6 NHL 80 NBA 80 ! 0.3 MLB NFL 16 F(x) 1/2 0.4 t− 1 0.2 √ 0.2 4 3 0 0.1 0 0.2 0.4 0.6 0.8 1 0 200 400 600 800 1000 x t •Winning percentage distribution approaches scaling solution •Correction to scaling is very large for realistic number of games •Large variance may be due to small number of games 1/2 q σ(t) = − + f(t) √3 Large! Variance inadequate to characterize competitiveness! The distribution of win percentage 1 0.8 0.6 F(x) 0.4 NFL NBA NHL 0.2 MLB Treat q as a fitting parameter, time=number of games • 0 0 0.2 0.4 0.6 0.8 1 •Allows to estimate qmodel xfor different leagues The upset frequency • Upset frequency as a measure of predictability Number of upsets q = Number of games • Addresses the variability in the number of games • Measure directly from game-by-game results - Ties: count as 1/2 of an upset (small effect) - Ignore games by teams with equal records - Ignore games by teams with no record The upset frequency 0.48 0.46 0.44 0.42 q 0.40 0.38 0.36 FA MLB 0.34 NHL NBA 0.32 NFL 0.30 1900 1920 1940 1960 1980 2000 year The upset frequency 0.48 League q qmodel FA 0.452 0.459 0.46 MLB 0.441 0.413 0.44 NHL 0.414 0.383 NBA 0.365 0.316 0.42 NFL 0.364 0.309 0.40 q q differentiates 0.38 the different 0.36 FA MLB sport leagues! 0.34 NHL NBA 0.32 NFL 0.30 Football, baseball most competitive 1900 1920 1940 1960 1980 2000 Basketball, American fyearootball least competitive Evolution with time 0.28 0.48 0.26 NFL 0.24 NBA 0.46 NHL 0.22 MLB 0.44 0.20 FA ! 0.18 0.42 0.16 0.14 0.40 0.12 q 0.10 0.38 0.08 1900 1920 1940 1960 1980 2000 year 0.36 FA MLB •Parity0.34, predictabilityNHL mirror each other NBA •American0.32 footballNFL , baseball increasing competitiveness •Football0.30 decreasing1/2 competitivq eness (past 60 years) 1900σ =1920− 1940 1960 1980 2000 √3 year Century versus Decade Century (1900-2005) Decade (1995-2005) 0.50 0.50 0.45 0.45 0.452 0.45 0.441 0.43 0.40 0.414 0.40 0.41 0.39 0.35 0.365 0.364 0.35 0.34 0.30 0.30 0.25 0.25 FA MLB NHL NBA NFL FA MLB NHL NBA NFL Football-American Football gap narrows from 9% to 2%! All-time team records 1 MLB 0.8 0.6 NY Yankees (0.567) F(x) 0.4 σ = 0.024 0.2 q = 0.458 0 0.44 0.46 0.48 0.5 0.52 0.54 0.56 x •Provides the longest possible record (t~13000) •Close to a linear function Discussion • Model limitation: it does not incorporate - Game location: home field advantage - Game score - Upset frequency dependent on relative team strength - Unbalanced schedule • Model advantages: - Simple, involves only 1 parameter - Enables quantitative analysis Conclusions • Parity characterized by variance in winning percentage - Parity measure requires standings data - Parity measure depends on season length • Predictability characterized by upset frequency - Predictability measure requires game results data - Predictability measure independent of season length • Two-team competition model allows quantitative modeling of sports competitions Competition and Social Dynamics • Teams are agents • Number of wins represents fitness or wealth • Agents advance by competing against age • Competition is a mechanism for social differentiation The social diversity model • Agents advance by competition (i + 1, j) rate p (i, j) i > j → (i, j + 1) rate 1 p ! − • Agent decline due to inactivity k k 1 with rate r → − • Rate equations dGk 1 2 = r(Gk+1 Gk) + pGk 1(Gk 1 Gk) + (1 p)(1 Gk)(Gk 1 Gk) (Gk Gk 1) dt − − − − − − − − − 2 − − • Scaling equations dF [(p + r 1 + x) (2p 1)F (x)] = 0 − − − dx Social structures 1. Middle class Agents advance at different rates 2.