Sports Analytics from A to Z

Total Page:16

File Type:pdf, Size:1020Kb

Table of Contents

About Victor Holman
About This Book
Introduction to Analytic Methods
Sports Analytics Maturity Model
  Sports Analytics Maturity Model Phases
  Sports Analytics Key Success Areas
Allocative and Dynamic Efficiency
  Optimal Strategy in Basketball
Backwards Selection Regression
  Competition between Sports Hurts TV Ratings: How to Shift League Calendars to Optimize Viewership
Bayesian Generalized Linear Model
  Shrinkage Estimation of NFL Field Goal Success Probabilities
Bayesian Hierarchical Latent Variable Models
  Eye on the Ball: The Relationship Between Sensorimotor Abilities and On-Field Performance in Professional Baseball
Bayesian Hierarchical Model
  Estimating an NBA Player's Impact on His Team's Chances of Winning
Bayesian Model and Simple OLS Regression
  Moral Hazard in Long-Term Guaranteed Contracts: Theory and Evidence from the NBA
Bayesian Multinomial Model
  Exploring the Potential of the Plus/Minus in NCAA Women's Volleyball via the Recovery of Court Presence Information
Binary Logic Model
  Matchup Models for the Probability of a Ground Ball and a Ground Ball Hit
Bipartite Graph Algorithms
  Player Centrality on NBA Teams
Bland Altman Plot Analysis
  Bland Altman Plot Analysis in Sports Analytics
Bonferroni Adjusted Comparisons
  A Closer Look at the Prevalence of Time Rule Violations and the Inter-Point Time in Men's Grand Slam Tennis
Bookmaker Consensus Model
  Searching for the GOAT of Tennis Win Prediction
Bootstrap Simulation
  Improving Fairness in Match Play Golf through Enhanced Handicap Allocation
Boruta Algorithm
  An Investigation of Three-Point Shooting through an Analysis of NBA Player Tracking Data
Brownian Motion Process
  The Implied Volatility of a Sports Game
Chapman-Kolmogorov Equations
  Evaluating NBA End-of-Game Decision-Making
Clustering
  Big 2's and Big 3's: Analyzing How a Team's Best Players Complement Each Other
Coaching Assistance Models
  Coaching Assistance Models as a Method of Sports Analytics
  Recognizing and Analyzing Ball Screen Defense in the NBA
Competitiveness Metrics
  The Competitiveness of Games in Professional Sports Leagues
Computer Vision Techniques
  Using Computer Vision and Machine Learning to Automatically Classify NFL Game Film and Develop a Player Tracking System
Conditional Probability
  Applying Conditional Probability to Sports Analytics
Conditional Random Field
  "Quality vs. Quantity": Improved Shot Prediction in Soccer using Strategic Features from Spatiotemporal Data
Conservation of Runs Framework
  openWAR: An Open Source System for Evaluating Overall Player Performance in Major League Baseball
Convex Hull
  Predicting Golf Scores at the Shot Level
Convolutional Neural Networks
  Mythbusting Set-Pieces in Soccer
Cross-Validated Root-Mean-Squared Error
  Improved NBA Adjusted +/- Using Regularization and Out-of-Sample Testing
Data Science Methods
  Data Science Methods in Sports Analytics
Data Tables
  The Effect of
Recommended publications
  • "What Raw Statistics Have the Greatest Effect on wRC+ in Major League Baseball in 2017?" Gavin D
    "What raw statistics have the greatest effect on wRC+ in Major League Baseball in 2017?" Gavin D. Sanford, University of Minnesota Duluth, Honors Capstone Project. Abstract: Major League Baseball has different statistics for hitters, fielders, and pitchers. The game has followed the same rules for over a century, and this has allowed for statistical comparison. As technology grows, so does the game of baseball, as there are more areas of the game that people can monitor and track, including pitch speed, spin rates, launch angle, exit velocity, and directional break. The website QOPBaseball.com is a newer website that attempts to track every pitch's horizontal and vertical break and grade it based on these factors (Wilson, 2016). Fangraphs has statistics on the direction players hit the ball and what percentage of the time. The game of baseball is all about quantifying players and being able to give a value to their contributions. Sabermetrics has given us the ability to do this in far more depth. Weighted Runs Created Plus (wRC+) is an offensive stat that attempts to quantify a player's total offensive value (wRC and wRC+, Fangraphs). It is era and park adjusted, meaning that players from different parks and years can be compared without altering the statistic further. In this paper, we look at what 2018 statistics have the greatest effect on an individual player's wRC+. Keywords: Sabermetrics, Econometrics, Spin Rates, Baseball. Introduction: Major League Baseball has been around for over a century and has given out awards for almost 100 years. These awards are given out based on statistics accumulated over the season.
  • Utilizing the Impact Test to Predict Performance in College Baseball Hitters by Ryan Nussbaum a Thesis Submitted to the Honors C
    Utilizing the ImPACT Test to Predict Performance in College Baseball Hitters. By Ryan Nussbaum. A Thesis Submitted to the Honors College in Partial Fulfillment of the Bachelors Degree with Honors in Psychology (B.S.), The University of Arizona, May 2013. The University of Arizona Electronic Theses and Dissertations Reproduction and Distribution Rights Form: The UA Campus Repository supports the dissemination and preservation of scholarship produced by University of Arizona faculty, researchers, and students. The University library, in collaboration with the Honors College, has established a collection in the UA Campus Repository to share, archive, and preserve undergraduate Honors theses. Theses that are submitted to the UA Campus Repository are available for public view. Submission of your thesis to the Repository provides an opportunity for you to showcase your work to graduate schools and future employers. It also allows for your work to be accessed by others in your discipline, enabling you to contribute to the knowledge base in your field. Your signature on this consent form will determine whether your thesis is included in the repository. Name (Last, First, Middle): Nussbaum, Ryan. Degree title (e.g. BA, BS, BSE, BSB, BFA): B.S. Honors area (e.g. Molecular and Cellular Biology, English, Studio Art): Psychology. Date thesis submitted to Honors College: May 2013. Title of Honors thesis: Utilizing the ImPACT Test to Predict Performance in College Baseball Hitters. The University of Arizona Library Release Agreement: I hereby grant to the University of Arizona Library the nonexclusive worldwide right to reproduce and distribute my dissertation or thesis and abstract (herein, the "licensed materials"), in whole or in part, in any and all media of distribution and in any format in existence now or developed in the future.
  • Twins Notes, 10-4 at NYY – ALDS
    MINNESOTA TWINS AT NEW YORK YANKEES. ALDS Game 1, Friday, October 4, 2019, 7:07 PM (ET). TV: MLB Network. Radio: TIBN-WCCO / ESPN Radio / twinsbeisbol.com. RHP José Berríos vs. LHP James Paxton. Postseason Game 1; Postseason Road Game 1.
    Upcoming Probable Pitchers & Broadcast Schedule (date, opponent, probable pitchers, time, television, radio / Spanish radio):
    10/5, at New York-AL, TBA vs. RHP Masahiro Tanaka, 5:07 pm (ET), Fox Sports 1, TIBN-WCCO / ESPN / twinsbeisbol.com
    10/6, OFF DAY
    10/7, vs. New York-AL, TBA vs. RHP Luis Severino, 7:40 or 6:37 pm (CT), Fox Sports 1, TIBN-WCCO / ESPN / twinsbeisbol.com
    SEASON AT A GLANCE: Record: 101-61; Home Record: 46-35; Road Record: 55-26; Current Streak: 2 losses; Record in series: 31-15-6; Series Openers: 36-16; Series Finales: ...
    TWINS VS. YANKEES: 2019 Record: 2-4; 2019 at Home: 1-2; 2019 at New York-AL: 1-2
    THE TWINS: The Twins completed the 2019 regular season with a record of 101-61, winning the American League Central on September 25 after defeating Detroit by a score of 5-1, followed by a White Sox 8-3 victory over the Indians later that night...the Twins will play Game 2 of this ALDS tomorrow, travel home for an off-day and a workout on Sunday (time TBA), followed by Game 3 on Monday, which would mark the Twins' first home playoff game since October 7, 2010 and third ever played at Target Field...the Twins spent 171 days in first place, tied
  • Intentional Walk in 2Nd Backfires on Clevinger by Jordan Bastian / MLB.Com | @Mlbastian | 2:05 AM ET + 1 COMMENT DENVER -- It Was a Strategically Sound Decision
    Intentional walk in 2nd backfires on Clevinger. By Jordan Bastian / MLB.com | @MLBastian | 2:05 AM ET. DENVER -- It was a strategically sound decision. With two outs, first base open and Rockies starter Antonio Senzatela standing in the on-deck circle, Indians manager Terry Francona called for an intentional walk to Tony Wolters in the second inning on Tuesday night. One pitch later, the groundwork had been laid for the Tribe's 11-3 loss at Coors Field. "Those are things that kind of make you stay up at night," Francona said. Colorado's lineup dismantled Cleveland's pitching to the tune of 12 hits, including two home runs off the bat of former Indians slugger Mark Reynolds. Even in the wake of the lopsided score, though, it was the second-inning, bases-clearing double by Senzatela -- one pitch after Wolters removed his shin guard and trotted to first base -- that felt like the game's turning point. With the bases loaded, Clevinger attacked Senzatela with a first-pitch fastball low in the strike zone. Heading into the night, the Colorado pitcher had three hits in 21 at-bats, and each of those came on elevated heaters. None of that mattered, as Senzatela was in attack mode and he split the right-center-field gap with a line drive that cleared the bases and put the Indians in a 3-0 hole. Clevinger wore a look of disgust as the ball bounced through the outfield grass. Senzatela clapped his hands hard and jumped into the air in excitement upon reaching second base.
  • Sportsdata MLB Stats Feed Product Description
    Updated 06.27.12. SPORTSDATA MLB STATISTICS FEEDS, 2012 Season. Table of Contents: Overview; Stats List – Game Information; Stats List – Team Information; Stats List – Venue Information; Stats List – Player Information; Stats List – Game Statistics; Stats List – Player Stats – Hitting; Stats List – Player Stats – Pitching; Stats List – Team Stats – Hitting; Stats List – Team Stats – Pitching; Event Info and Lineups Feed – Details
  • Pythagoras at the Bat
    Pythagoras at the Bat. Steven J Miller, Taylor Corcoran, Jennifer Gossels, Victor Luo and Jaclyn Porfilio. The Pythagorean formula is one of the most popular ways to measure the true ability of a team. It is very easy to use, estimating a team's winning percentage from the runs they score and allow. This data is readily available on standings pages; no computationally intensive simulations are needed. Normally accurate to within a few games per season, it allows teams to determine how much a run is worth in different situations. This determination helps solve some of the most important economic decisions a team faces: How much is a player worth, which players should be pursued, and how much should they be offered. We discuss the formula and these applications in detail, and provide a theoretical justification, both for the formula as well as simpler linear estimators of a team's winning percentage. The calculations and modeling are discussed in detail, and when possible multiple proofs are given. We analyze the 2012 season in detail, and see that the data for that and other recent years support our modeling conjectures. We conclude with a discussion of work in progress to generalize the formula and increase its predictive power without needing expensive simulations, though at the cost of requiring play-by-play data. Affiliations: Steven J Miller, Williams College, Williamstown, MA 01267; Taylor Corcoran, The University of Arizona, Tucson, AZ 85721; Jennifer Gossels, Princeton University, Princeton, NJ 08544; Victor Luo, Williams College, Williamstown, MA 01267; Jaclyn Porfilio, Williams College, Williamstown, MA 01267. arXiv:1406.0758v1 [math.HO] 29 May 2014. 1 Introduction: In the classic movie Other People's Money, New England Wire and Cable is a firm whose parts are worth more than the whole.
  • Vol. 58 No. 7 Florence, Alabama 35632 September 2006
    VOL. 58 NO. 7, FLORENCE, ALABAMA 35632, SEPTEMBER 2006. In This Issue: Passing the Gavel - Dull Becomes CoSIDA President; Nashville Workshop Photo Scrapbook; CoSIDA Hall of Fame Finds Permanent Home; 2006 CoSIDA Award Winners; Academic All-America Hall of Fame; CoSIDA to Celebrate 50th Anniversary in San Diego; LifeLock Becomes CoSIDA Sponsor; CoSIDA Adds Three New Board Members; Workshop Sponsors and Exhibitors; Crisis Public Relations; Using YouTube; NCCSIA Holds Fifth Annual Meeting; Update from NCAA Statistics Service; Five Questions With Maureen Nasser; CoSIDA Board Contact Information; A Wake Up Call. Behind CoSIDA: Schaffhauser; Marriott; ICS; Multi-Ad; Sports Illustrated; NBA/WNBA; ESPN. SEND CORRESPONDENCE TO:
  • Predicting College Basketball Match Outcomes Using Machine Learning
    Predicting NCAAB match outcomes using ML techniques: some results and lessons learned. Zifan Shi, Sruthi Moorthy, Albrecht Zimmermann, KU Leuven, Belgium. Abstract. Most existing work on predicting NCAAB matches has been developed in a statistical context. Trusting the capabilities of ML techniques, particularly classification learners, to uncover the importance of features and learn their relationships, we evaluated a number of different paradigms on this task. In this paper, we summarize our work, pointing out that attributes seem to be more important than models, and that there seems to be an upper limit to predictive quality. 1 Introduction. Predicting the outcome of contests in organized sports can be attractive for a number of reasons such as betting on those outcomes, whether in organized sports betting or informally with colleagues and friends, or simply to stimulate conversations about who "should have won". We would assume that this task is easier in professional leagues, such as Major League Baseball (MLB), the National Basketball Association (NBA), or the National Football League (NFL), since there are only relatively few teams and their quality does not vary too widely. As an effect of this, match statistics should be meaningful early on since the competition is strong, and teams play the same opponents frequently. Additionally, professional leagues typically play more matches per team per season, e.g. 82 in the NBA or 162 in MLB, than in college or amateur leagues in which the sport in question is (supposed to be) only
  • Inteligência Computacional Aplicada Ao Futebol Americano
    COMPUTATIONAL INTELLIGENCE APPLIED TO AMERICAN FOOTBALL. Diego da Silva Rodrigues. Doctoral thesis presented to the Graduate Program in Electrical Engineering, COPPE, of the Universidade Federal do Rio de Janeiro, as part of the requirements for the degree of Doctor in Electrical Engineering. Advisor: José Manoel de Seixas. Rio de Janeiro, March 2018. Thesis submitted to the faculty of the Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia (COPPE) of the Universidade Federal do Rio de Janeiro as part of the requirements for the degree of Doctor of Science in Electrical Engineering. Examined by: Prof. José Manoel de Seixas, D.Sc.; Prof. Alexandre Gonçalves Evsukoff, D.Sc.; Prof. Aline Gesualdi Manhães, D.Sc.; Prof. Guilherme de Alencar Barreto, Ph.D.; Prof. Sergio Lima Netto, D.Sc. Rio de Janeiro, RJ, Brazil, March 2018. Catalog entry: Rodrigues, Diego da Silva. Inteligência Computacional Aplicada ao Futebol Americano / Diego da Silva Rodrigues. Rio de Janeiro: UFRJ/COPPE, 2018. XIV, 70 p.: ill.; 29.7 cm. Advisor: José Manoel de Seixas. Thesis (doctorate), UFRJ/COPPE/Electrical Engineering Program, 2018. Bibliographic references: p. 56-63. 1. American Football. 2. Ranking Models. 3. Bayesian Ranking. 4. Sport Science. 5. Neural Networks. I. Seixas, José Manoel de. II. Universidade Federal do Rio de Janeiro, COPPE, Programa de Engenharia Elétrica. III. Title. Dedication: This work is dedicated to all the athletes of the country. Acknowledgments: I thank my family for having provided the conditions necessary so that, unlike the vast majority of Brazilians, I had access to education of the best quality possible in this country.
  • Mountain West Conference Strategic Scheduling
    Proceedings of the Annual General Donald R. Keith Memorial Conference West Point, New York, USA May 2, 2019 A Regional Conference of the Society for Industrial and Systems Engineering Mountain West Conference Strategic Scheduling Daniel Jones, Matthew Hargreaves, Noah White, and Jacob Fresella Operations Research Program United States Air Force Academy, Colorado Springs, CO Corresponding author: [email protected] Author Note: Daniel Jones, Matthew Hargreaves, Noah White, and Jacob Fresella are first class (senior) cadets at the United States Air Force Academy collectively working on this project as part of a year-long Operations Research Capstone course. The student authors would like to extend thanks to the advisors involved with this project as well as the client organization, the Mountain West Conference. Abstract: The Mountain West Conference is a Division I-A National Collegiate Athletic Association athletic conference. The MWC uses a tier system based on the Ratings Percentage Index (RPI) to stratify their teams in the sports of basketball, baseball, softball, volleyball, and soccer. They use this system in an attempt to maximize teams’ ability to win and obtain high RPIs to ultimately make post-season tournaments. Our mission is to analyze this strategy’s current performance, provide improvements and recommendations, and use forecasting to help coaches schedule in accordance with MWC strategy. We analyze the impact of individual games on a team’s Ratings Percentage Index ranking, and use this information in conjunction with predicted Ratings Percentage Index rankings in order to provide a tool for coaches to use. 1. Introduction Our client, the Mountain West Conference (MWC), was conceived on May 26, 1998, when the presidents of eight institutions decided to form a new National Collegiate Athletic Association (NCAA) Division One (D1) intercollegiate athletic conference.
  • The Numbers Game
    The Numbers Game: A Qualitative Study on Big Data Analytics and Performance Metrics in Sports. 11107766 Kyra Teklu, Faculty of Humanities, MA New Media and Digital Cultures. Thesis Supervisor: Dr. N.A.J.M. van Doorn. Table of Contents: Chapter 1. Introduction; Chapter 2. The Influence of Statistics (2.1 The Role of Quantification; 2.2 The Role of Statistics and Metrics; 2.3 Data and The Databases); Chapter 3. The Commercialisation of Sports (3.1 The Early Years; 3.2 Media Rights and Sponsorship; 3.3 Big Data Shapes the Sport Industry); Chapter 4. Methodology; Chapter 5. Findings (5.1 The Tools; 5.2 Player Recruitment; 5.3 Rise in More Interesting Data; 5.4 Unpredictability of Data; 5.4 Economic Value); Chapter 6. Conclusion; References; Chapter 7. Appendices (Appendix A: Examples of Metrics; Appendix B: Description of Participants; Appendix C: Participant Demographic Table; Appendix D: Transcripts). Chapter 1. Introduction: A New Science of Winning. The image you see on the first page of this thesis is a visualization of passes. This web depicts the England football team's passes during the first half of a game. The blue arrows indicate successful passes and their direction. Red indicates the failed attempts. Such examples of statistical visualizations focusing on performance are not uncommon nowadays, due to the advanced nature of data analytics and performance metrics within the professional sports industry. This is perhaps best indicated by the fact that today 19 of the 20 Premier League teams use Prozone (Medeiros 2014).
  • Application of Pagerank Algorithm to Division I NCAA Men's Basketball As Bracket Formation and Outcome Predictive Utility
    Journal of Sports Analytics 7 (2021) 1–9. DOI 10.3233/JSA-200425. IOS Press. Application of PageRank Algorithm to Division I NCAA men's basketball as bracket formation and outcome predictive utility. Nicole R. Matthews (a), Andrew McClain (a), Chase M.L. Smith (b) and Adam G. Tennant (c,*). (a) Business and Engineering Center, University of Southern Indiana, University Boulevard, Evansville, IN, USA. (b) Assistant Professor of Sport Management, Health Professions Center, University of Southern Indiana, University Boulevard, Evansville, IN, USA. (c) Assistant Professor Engineering, Business and Engineering Center, University of Southern Indiana, University Boulevard, Evansville, IN, USA. Abstract. This article examines the use of the PageRank algorithm to rank the teams and predict team performance in the tournament. This method has the potential to be utilized as an alternative method to choose tournament participants, as opposed to the traditional ranking and seeding process currently employed by the NCAA. PageRank allows for the consideration of all games played during the regular season (average of 5832 games per season) and for customizable performance weights in the prediction. The PageRank algorithm is a viable tool in predicting tournament outcomes due to depth and extensiveness of the data. The PageRank analysis helped to predict over half of the tournament game outcomes correctly in 2014-2018 and produced an average bracket score of 81.8 points out of 192 possible over the same 5 years. Keywords: Basketball, pagerank, ranking, NCAA. 1. Introduction. Each year, thousands of people fill out a bracket [...] to watch every game every team plays within the NCAA D1 (National Collegiate Athletic Association, Division-1) season.