Sports Data Mining

Robert P. Schumaker, Iona College, New Rochelle, New York
Osama K. Solieman, Tucson, Arizona
Hsinchun Chen, University of Arizona, Tucson, Arizona

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
PREFACE

CHAPTER 1. SPORTS DATA MINING
  Chapter Overview
  1. Definition
  2. History
  3. Societal Dimensions
  4. The International Landscape
  5. Criticisms
  6. Questions for Discussion

CHAPTER 2. SPORTS DATA MINING METHODOLOGY
  Chapter Overview
  1. Scientific Foundation
  2. Traditional Data Mining Applications
  3. Deriving Knowledge
  4. Questions for Discussion

CHAPTER 3. DATA SOURCES FOR SPORTS
  Chapter Overview
  1. Introduction
  2. Professional Societies
     2.1 The Society for American Baseball Research (SABR)
     2.2 Association for Professional Basketball Research (APBR)
     2.3 Professional Football Researchers Association (PFRA)
  3. Sport-related Associations
     3.1 The International Association on Computer Science in Sport (IACSS)
     3.2 The International Association for Sports Information (IASI)
  4. Special Interest Sources
     4.1 Baseball
     4.2 Basketball
     4.3 Football
     4.4 Cricket
     4.5 Soccer
     4.6 Multiple Sports
  5. Conclusions
  6. Questions for Discussion

CHAPTER 4. RESEARCH IN SPORTS STATISTICS
  Chapter Overview
  1. Introduction
  2. Sports Statistics
     2.1 History and Inherent Problems of Statistics in Sports
     2.2 Bill James
     2.3 Dean Oliver
  3. Baseball Research
     3.1 Building Blocks
     3.2 Runs Created
     3.3 Win Shares
     3.4 Linear Weights and Total Player Rating
     3.5 Pitching Measures
  4. Basketball Research
     4.1 Shot Zones
     4.2 Player Efficiency Rating
     4.3 Plus / Minus Rating
     4.4 Measuring Player Contribution to Winning
     4.5 Rating Clutch Performances
  5. Football Research
     5.1 Defense-Adjusted Value Over Average
     5.2 Defense-Adjusted Points Above Replacement
     5.3 Adjusted Line Yards
  6. Emerging Research in Other Sports
     6.1 NCAA Bowl Championship Series (BCS)
     6.2 NCAA Men’s Basketball Tournament
     6.3 Soccer
     6.4 Cricket
     6.5 Olympic Curling
  7. Conclusions
  8. Questions for Discussion

CHAPTER 5. TOOLS AND SYSTEMS FOR SPORTS DATA ANALYSIS
  Chapter Overview
  1. Introduction
  2. Sports Data Mining Tools
     2.1 Advanced Scout
     2.2 Synergy Online
     2.3 SportsVis
     2.4 Sports Data Hub
  3. Scouting tools
     3.1 Digital Scout
     3.2 Inside Edge
  4. Sports Fraud Detection
     4.1 Las Vegas Sports Consultants (LVSC)
     4.2 Offshore Gaming
  5. Conclusions
  6. Questions for Discussion

CHAPTER 6. PREDICTIVE MODELING FOR SPORTS AND GAMING
  Chapter Overview
  1. Introduction
  2. Statistical Simulations
     2.1 Baseball
Recommended publications
  • The Evolution of Basketball Statistics Is Finally Here
    TurboStats Software: The Evolution of Basketball Statistics Is Finally Here. The world's most advanced live game scoring software, with all-new advanced metrics, efficiencies and Four Factors. It offers outstanding live scoring with box score and play-by-play. It is easy to learn, automatically tags video and handles fast substitutions. Color-coded shot charts display uncontested shots, shots off turnovers, second-chance shots, shots in transition, zones vs. man-to-man, last-second shots and blocked shots, and you can view career shooting % after each shot. It includes the advanced statistics used by top pro and college teams, plus a new NET Rating system that highlights the most efficient players. You can score live or by video, sort video clips, create highlight films, and create CyberLink PowerDirector video project files for DVDs (PowerDirector sold separately). It also includes the eBook Theory of Evolution, which explains how the new formulas help you win. It is the only software that tracks actual rebound %. It provides statistics for individual plays and options, team stats by point guard, and per-minute statistics for all categories, with optional player photos. A customizable display shows Four Factors, the event list, player statistics or scouting. It imports game data from NCAA box scores (website HTML or PDF). It runs standalone on all Windows laptops, tablets and Ultrabooks (XP, Vista, 7 plus Windows 8 Pro) and broadcasts data to iPads and phones with a low-cost app. It also tracks effective field goal %, true shooting %, turnover %, offensive rebound %, individual possessions, offensive efficiency, time in game, +/- for five-player combos, defensive points given up and more. Score on the only tablets designed for data entry. (Simulated image on the Samsung ATIV SmartPC; actual screen size is 11.5. Visit Samsung.com for tablet pricing and availability.)
  • Chapter Two: Massey’s Method
    The Bowl Championship Series (BCS) is a rating system for NCAA college football that was designed to determine which teams are invited to play in which bowl games. The BCS has become famous, and perhaps notorious, for the ratings it generates for each team in the NCAA. These ratings are assembled from two sources, humans and computers. Human input comes from the opinions of coaches and media. Computer input comes from six computer and mathematical models (details are given in the aside on page 17). The BCS ratings for the 2001 and 2003 seasons are known as particularly controversial among sports fans and analysts. The flaws in the BCS selection system, as opposed to a tournament playoff, are familiar to most, including the President of the United States (read the aside on page 19).
    Initial Massey Rating Method. In 1997, Kenneth Massey, then an undergraduate at Bluefield College, created a method for ranking college football teams. He wrote about this method, which uses the mathematical theory of least squares, as his honors thesis [52]. In this book we refer to this as the Massey method, although several other methods are also attributed to Ken Massey. Massey has since become a mathematics professor at Carson-Newman College and continues to refine his sports ranking models. Professor Massey has created various rating methods, one of which is used by the Bowl Championship Series (BCS) system to select NCAA football bowl matchups (see the aside on page 17). The Colley method, described in the next chapter, is also one of the six computer rating systems used by the BCS.
  • Australian Basketball Statistics Association
    Basketball Statistics Calling Protocol, April 2009 Edition. Australian Basketball Statistics Committee.
    Written by the Australian Basketball Statistics Committee. The contents of this manual may not be altered or copied after alteration.
    Table of contents: Calling Protocol (Reasons for a Protocol; General Principles; Calling the Action); LiveStats-Specific Calls (Time Outs; Substitutions; Player Checks); Calling in Sequence.
  • Data-Driven Basketball Web Application for Support in Making Decisions
    Tomislav Horvat (1), Ladislav Havaš (1), Dunja Srpak (1) and Vladimir Medved (2). (1) Department of Electrical Engineering, University North, 104 Brigade 3, Varaždin, Croatia; (2) Department of General and Applied Kinesiology, Faculty of Kinesiology, University of Zagreb, Zagreb, Croatia. Keywords: Basketball, Information System, Making Decisions, Statistics Analysis, Web Application. Abstract: Statistical analysis combined with data mining and machine learning is increasingly used in sports. This paper presents an overview of existing commercial information systems used in game analysis and describes a new, improved version of the originally developed data-driven web application and information system called Basketball Coach Assistant (hereafter BCA) for sports statistics and analysis. The aim of BCA is to provide the essential information for decision making in the training process and in coaching basketball teams. Along with statistical analysis, special emphasis is given to player progress indicators and to statistical analysis based on data mining methods used to define classes of game point differences. The results obtained with the BCA information system, presented in tables, proved useful in programming the training process and in making strategic, tactical and operational decisions. Finally, guidelines for further development of the information system are given, primarily concerning the use of data mining and machine learning methods. 1 INTRODUCTION. Nowadays, sports statistics and analysis, and more particularly information and communication technologies, are omnipresent in sport and have become a very important factor in making decisions … information for decision making in the training process and in coaching basketball teams. The first version of the BCA information system, called AssistantCoach, was presented at the International Congress on Sport Sciences Research and Technology Support in Lisbon (Horvat et al., 2015).
  • Official Basketball Statistics Rules Basic Interpretations
    Official Basketball Statistics Rules, with Approved Rulings and Interpretations
    (Throughout this manual, Team A players have last names starting with “A” and Team B players have last names starting with “B.”)
    Basic Interpretations (indicated as “B.I.” references throughout the manual):
    1. APPROVED RULING: Approved rulings (indicated as A.R.s) are designed to interpret the spirit of the application of the Official Basketball Rules. A thorough understanding of the rules is essential to understanding and applying the statistics rules in this manual.
    2. STATISTICIAN’S JOB: The statistician’s responsibility is to judge only what has happened, not to speculate as to what would have happened. The statistician should not decide who would have gotten the rebound if it had not been for the foul.
    … the shooter tries to control and shoot the ball in the same motion with not enough time to get into a normal shooting position (squared up to the basket).
    Article 2. A field goal made (FGM) is credited to a player any time a FGA by the player results in the goal being counted or results in an awarded score of two (or three) points, except when the field goal is the result of a defensive player tipping the ball in the offensive basket.
    Related rules in the NCAA Men’s and Women’s Basketball Rules and Interpretations: (1) 4-33: Definition of “Goal”; (2) 4-49.2: Definition of “Penalty for Violation”; (3) 4-69: Definition of “Try for Field Goal” and definition of “Act of Shooting”; (4) 4-73: Definition of “Violation”; (5) 5-1: “Scoring”; (6) 9-16: “Basket Interference and Goaltending”.
  • Ranking the Greatest NBA Players: An Analytics Analysis
    An Honors Thesis by Jeremy Mertz. Thesis Advisor: Dr. Lawrence Judge. Ball State University, Muncie, Indiana. July 2015. Expected Date of Graduation: May 2015. Abstract: The purpose of this investigation was to present a statistical model to help rank the top National Basketball Association (NBA) players of all time. As the sport of basketball evolves, the debate over who is the greatest player of all time in the NBA never seems to reach consensus. This ongoing debate can sometimes become emotional and personal, leading to arguments and, in extreme cases, to violence and subsequent arrest. Creating a statistical model to rank players may also help coaches determine important variables for player development and aid in future approaches to the game via key data-driven performance indicators. However, computing this type of model is extremely difficult because of the many individual player statistics and achievements to consider, and because changes to the game over time affect individual player performance analysis. This study used linear regression to create an accurate model for the top 150 player rankings. The variables computed included points per game, rebounds per game, assists per game, win shares per 48 minutes, and number of NBA championships won. The results revealed that points per game, rebounds per game, assists per game, and NBA championships were all necessary for an accurate model, while win shares per 48 minutes were not significant.
  • Predicting Outcomes of NCAA Basketball Tournament Games
    Predicting Outcomes of NCAA Basketball Tournament Games. Aaron Cripe, March 12, 2008, Math Senior Project. Introduction: All my life I have really enjoyed watching college basketball. When I was younger, my favorite month of the year was March, solely because the NCAA Division I Basketball Tournament took place in March. I would look forward to filling out a bracket every year to see how well I could predict the winning teams. I quickly realized that I was not an expert on predicting the results of the tournament games. Then I started wondering whether anyone was really an expert at predicting the results of the tournament games. For my project I decided to find out, or at least compare some of the top rating systems and see which one is most accurate in predicting the winner of each game in the Men’s Basketball Division I Tournament. For my project I compared five rating systems (Massey, Pomeroy, Sagarin, RPI) with the actual tournament seedings. I compared these systems by looking at the pre-tournament ratings and the tournament results for 2004 through 2007. The goal of my project was to determine which, if any, of these systems is the best predictor of the winning team in the tournament games.
    Project Goals: Each system that I compared gave a rating to every team in the tournament. For my project I looked at each game and then compared the two teams’ ratings. In most cases the two teams had different ratings; however, there were a couple of games where the two teams had the same rating, which I will address later.
  • The Ranking of Football Teams Using Concepts from the Analytic Hierarchy Process
    University of Louisville, ThinkIR: The University of Louisville's Institutional Repository. Electronic Theses and Dissertations, 12-2009. Recommended citation: Sun, Yepeng 1976-, "The ranking of football teams using concepts from the analytic hierarchy process." (2009). Electronic Theses and Dissertations. Paper 1408. https://doi.org/10.18297/etd/1408
    THE RANKING OF FOOTBALL TEAMS USING CONCEPTS FROM THE ANALYTIC HIERARCHY PROCESS. By Yepeng Sun, Speed Engineering School, 2007. A thesis submitted to the Faculty of the Graduate School of the University of Louisville in partial fulfillment of the requirements for the degree of Master of Engineering, Department of Industrial Engineering, University of Louisville, Louisville, Kentucky, December 2009. Approved by the thesis committee on November 3, 2009.
    ACKNOWLEDGMENTS: I would like to thank my advisor, Dr. Gerald Evans, for his guidance and patience. I attribute the level of my Master's degree to his encouragement and effort, and without him this thesis, too, would not have been completed or written.
  • Basketball - Wikipedia, the Free Encyclopedia
    From Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Basketball). Basketball is a sport played by two teams of five players on a rectangular court. The objective is to shoot a ball through a hoop 18 inches (46 cm) in diameter and 10 feet (3.048 m) high mounted to a backboard at each end. Basketball is one of the world's most popular and widely viewed sports.[1] A team can score a field goal by shooting the ball through the basket during regular play. A field goal scores three points for the shooting team if the player shoots from behind the three-point line, and two points if shot from in front of the line. A team can also score via free throws, which are worth one point, after the other team is assessed certain fouls. The team with the most points at the end of the game wins, but additional time (overtime) is played when the score is tied at the end of regulation. The ball can be advanced on the court by bouncing it while walking or running, or by throwing it to a teammate. It is a violation to lift or drag one's pivot foot without dribbling the ball, to carry it, or to hold the ball with both hands and then resume dribbling. As well as many techniques for shooting, passing, dribbling and rebounding, basketball teams generally have player positions and offensive and defensive structures (player positioning).
    [Image: Michael Jordan goes for a slam dunk at the old Boston Garden.] Highest governing body: FIBA.
  • Sports Analytics from A to Z
    Table of Contents: About Victor Holman; About This Book; Introduction to Analytic Methods; Sports Analytics Maturity Model; Sports Analytics Maturity Model Phases; Sports Analytics Key Success Areas; Allocative and Dynamic Efficiency; Optimal Strategy in Basketball; Backwards Selection Regression; Competition between Sports Hurts TV Ratings: How to Shift League Calendars to Optimize Viewership
  • Minimizing Game Score Violations in College Football Rankings
    University of North Florida, UNF Digital Commons, Management Faculty Publications, Department of Management, 2005. Recommended citation: Coleman, B. Jay, "Minimizing Game Score Violations in College Football Rankings" (2005). Management Faculty Publications. 1. https://digitalcommons.unf.edu/bmgt_facpub/1
    © 2005 INFORMS. Vol. 35, No. 6, November-December 2005, pp. 483–496. doi 10.1287/inte.1050.0172; ISSN 0092-2102, eISSN 1526-551X.
    Minimizing Game Score Violations in College Football Rankings. B. Jay Coleman, Department of Management, Marketing, and Logistics, Coggin College of Business, University of North Florida, 4567 St. Johns Bluff Road, South, Jacksonville, Florida 32224-2645.
    One metric used to evaluate the myriad ranking systems in college football is retrodictive accuracy. Maximizing retrodictive accuracy is equivalent to minimizing game score violations: the number of times a past game’s winner is ranked behind its loser. None of the roughly 100 current ranking systems achieves this objective. Using a model for minimizing violations that exploits problem characteristics found in college football, I found that all previous ranking systems generated violations that were at least 38 percent higher than the minimum.
  • An Investigation of the Relationship Between Junior Girl’s Golf Ratings and NCAA Division I Women’s Golf Ratings
    AN INVESTIGATION OF THE RELATIONSHIP BETWEEN JUNIOR GIRL’S GOLF RATINGS AND NCAA DIVISION I WOMEN’S GOLF RATINGS. Patricia Lyn Earley. A thesis submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Master of Arts in the Department of Exercise and Sport Science (Sport Administration). Chapel Hill, 2011. Approved by: Edgar W. Shields, Jr., Ph.D.; Barbara Osborne, Esq.; Deborah L. Stroman, Ph.D.
    ABSTRACT. PATRICIA LYN EARLEY: An Investigation of the Relationship Between Junior Girl’s Golf Ratings and NCAA Division I Women’s Golf Ratings (under the direction of Edgar W. Shields, Jr., Ph.D.). Coaches often depend on rating systems to determine whom to recruit. But how reliable are these ratings? Do they help coaches find a player who will contribute to the team’s success? This research examined how much importance should be placed on junior golf ratings and whether these ratings help college coaches predict the impact of each recruit. Using simple regression, the research sought to discover the relationship between 207 subjects’ junior golf ratings (Golfweek/Sagarin Junior Girls Golf Ratings) and their freshman, sophomore, junior and senior year golf ratings (NCAA Division I Women’s College Golf Ratings), rate of improvement and number of starter years. A significant relationship was found between junior golf ratings and the freshman, sophomore, junior and senior year NCAA Division I women’s college golf ratings. A significant relationship was also found between junior golf ratings and starter years, but not with the rate of improvement.
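The TurboStats flyer in the list above mentions Four Factors, effective field goal %, true shooting %, turnover % and offensive rebound %, but it does not define them. The Python sketch below uses the widely cited Dean Oliver-style team definitions. It assumes the common 0.44 weight for free-throw attempts and the FT/FGA form of free-throw rate. TurboStats does not state its exact formulas, so these definitions are an assumption. The `BoxLine` type and the function names are hypothetical.

```python
from dataclasses import dataclass

FTA_WEIGHT = 0.44  # assumed: common estimate of possessions used per free-throw attempt


@dataclass
class BoxLine:
    """Hypothetical team box-score line (field names are illustrative)."""
    fgm: int      # field goals made
    fga: int      # field goal attempts
    tpm: int      # three-pointers made
    ftm: int      # free throws made
    fta: int      # free throw attempts
    tov: int      # turnovers
    orb: int      # offensive rebounds
    opp_drb: int  # opponent's defensive rebounds

    @property
    def points(self) -> int:
        # 2 per made field goal, 1 extra per made three, 1 per made free throw
        return 2 * self.fgm + self.tpm + self.ftm


def effective_fg_pct(b: BoxLine) -> float:
    # A made three is worth 1.5 made twos
    return (b.fgm + 0.5 * b.tpm) / b.fga


def true_shooting_pct(b: BoxLine) -> float:
    # Points per shooting attempt (FGA plus weighted FTA), halved to read like FG%
    return b.points / (2 * (b.fga + FTA_WEIGHT * b.fta))


def turnover_pct(b: BoxLine) -> float:
    # Share of plays (shots, weighted free throws, turnovers) ending in a turnover
    return b.tov / (b.fga + FTA_WEIGHT * b.fta + b.tov)


def offensive_rebound_pct(b: BoxLine) -> float:
    # Share of available offensive rebounds that the team collected
    return b.orb / (b.orb + b.opp_drb)


def free_throw_rate(b: BoxLine) -> float:
    # Free throws made per field goal attempt (some sources use FTA / FGA)
    return b.ftm / b.fga


if __name__ == "__main__":
    game = BoxLine(fgm=40, fga=85, tpm=10, ftm=15, fta=20, tov=12, orb=10, opp_drb=30)
    print(round(effective_fg_pct(game), 3))       # 0.529
    print(round(true_shooting_pct(game), 3))      # 0.56
    print(round(turnover_pct(game), 3))           # 0.113
    print(round(offensive_rebound_pct(game), 3))  # 0.25
    print(round(free_throw_rate(game), 3))        # 0.176
```

Keeping each metric a separate function makes it easy to swap in a variant, such as FTA/FGA for the free-throw rate or a different free-throw weight, without touching the others.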
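The Massey excerpt in the list above says the method "uses the mathematical theory of least squares" but stops before giving details. Below is a minimal sketch of the least-squares rating as it is commonly formulated, not necessarily Massey's exact presentation. It assumes each game contributes one equation, r_i - r_j = margin. The normal equations M r = p are singular because every row of M sums to zero, so the last row is replaced by ones, which forces the ratings to sum to zero. The function names are hypothetical. A small Gaussian elimination stands in for a linear-algebra library so the sketch has no dependencies.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: is every team connected by games?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def massey_ratings(teams, games):
    """Least-squares ratings from (team_i, team_j, margin) games.

    margin = points of team_i minus points of team_j, so each game asks
    for r_i - r_j = margin; the normal equations are M r = p.
    """
    index = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    m = [[0.0] * n for _ in range(n)]
    p = [0.0] * n
    for team_i, team_j, margin in games:
        i, j = index[team_i], index[team_j]
        m[i][i] += 1  # diagonal: games played
        m[j][j] += 1
        m[i][j] -= 1  # off-diagonal: minus the number of meetings
        m[j][i] -= 1
        p[i] += margin  # cumulative point differential
        p[j] -= margin
    # Replace the last (redundant) equation with "ratings sum to zero".
    m[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve_linear(m, p)))


if __name__ == "__main__":
    teams = ["A", "B", "C"]
    games = [("A", "B", 10), ("B", "C", 5), ("A", "C", 12)]
    ratings = massey_ratings(teams, games)
    for team, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(team, round(r, 3))  # A 7.333, then B -1.667, then C -5.667
```

In this three-team round robin, each rating equals the team's total point differential divided by the number of teams (22/3, -5/3, -17/3). Sorting the returned dictionary by value gives the ranking.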
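Coleman's abstract in the list above defines a game score violation as a past game whose winner is ranked behind its loser. Counting violations for a given ranking is simple, as the Python sketch below shows. Finding the minimum is the hard part. Coleman uses an optimization model that exploits the structure of college football. The hypothetical `min_violations` helper here is a swapped-in brute-force search over every ordering, practical only for a handful of teams. All function names are illustrative.

```python
from itertools import permutations


def count_violations(ranking, games):
    """Number of games whose winner is ranked behind its loser.

    ranking: sequence of teams, best first; games: (winner, loser) pairs.
    """
    position = {team: k for k, team in enumerate(ranking)}
    return sum(1 for winner, loser in games if position[winner] > position[loser])


def retrodictive_accuracy(ranking, games):
    # Share of past games that the ranking orders correctly
    return 1 - count_violations(ranking, games) / len(games)


def min_violations(teams, games):
    """Brute-force minimum over all orderings (O(n!); tiny leagues only)."""
    best = None
    for order in permutations(teams):
        v = count_violations(order, games)
        if best is None or v < best[0]:
            best = (v, list(order))
    return best


if __name__ == "__main__":
    # A three-team cycle: any ordering must violate at least one result
    games = [("A", "B"), ("B", "C"), ("C", "A")]
    print(count_violations(["A", "B", "C"], games))  # 1
    print(count_violations(["C", "B", "A"], games))  # 2
    print(min_violations(["A", "B", "C"], games))    # (1, ['A', 'B', 'C'])
```

As the abstract notes, maximizing retrodictive accuracy is the same objective as minimizing violations. `retrodictive_accuracy` simply rescales the violation count to a share of games played.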