Sports Analytics Algorithms for Performance Prediction

Total Page:16

File Type:pdf, Size:1020Kb

Sports Analytics Algorithms for Performance Prediction Sports Analytics Algorithms for Performance Prediction Paschalis Koudoumas SID: 3308190012 SCHOOL OF SCIENCE & TECHNOLOGY A thesis submitted for the degree of Master of Science (MSc) in Data Science JANUARY 2021 THESSALONIKI – GREECE -i- Sports Analytics Algorithms for Performance Prediction Paschalis Koudoumas SID: 3308190012 Supervisor: Assoc. Prof. Christos Tjortjis Supervising Committee Mem- Assoc. Prof. Maria Drakaki bers: Dr. Leonidas Akritidis SCHOOL OF SCIENCE & TECHNOLOGY A thesis submitted for the degree of Master of Science (MSc) in Data Science JANUARY 2021 THESSALONIKI – GREECE -ii- Abstract This dissertation was written as a part of the MSc in Data Science at the International Hellenic University. Sports Analytics exist as a term and concept for many years, but nowadays, it is imple- mented in a different way that affects how teams, players, managers, executives, betting companies and fans perceive statistics and sports. Machine Learning can have various applications in Sports Analytics. The most widely used are for prediction of match outcome, player or team performance, market value of a player and injuries prevention. This dissertation focuses on the quintessence of foot- ball, which is match outcome prediction. The main objective of this dissertation is to explore, develop and evaluate machine learning predictive models for English Premier League matches’ outcome prediction. A comparison was made between XGBoost Classifier, Logistic Regression and Support Vector Classifier. The results show that the XGBoost model can outperform the other models in terms of accuracy and prove that it is possible to achieve quite high accuracy using Extreme Gradient Boosting. -iii- Acknowledgements At this point, I would like to thank my Supervisor, Professor Christos Tjortjis, for offer- ing his help throughout the process and providing me with essential feedback and valu- able suggestions to the issues that occurred. Paschalis Koudoumas 4-1-2021 -iv- Contents ABSTRACT ................................................................................................................. III ACKNOWLEDGEMENTS ......................................................................................... IV CONTENTS ................................................................................................................... V LIST OF FIGURES .................................................................................................... VII LIST OF TABLES .................................................................................................... VIII 1 INTRODUCTION ...................................................................................................... 1 2 BACKGROUND ....................................................................................................... 3 2.1 HISTORICAL BACKGROUND ................................................................................ 3 2.1.1 Baseball ................................................................................................ 3 2.1.2 Basketball ............................................................................................. 5 2.1.3 Football ................................................................................................. 7 2.2 LITERATURE REVIEW ......................................................................................... 9 2.2.1 Introduction........................................................................................... 9 2.2.2 Football Prediction Overview........................................................... 10 2.2.3 Rating Systems ................................................................................. 12 2.2.4 Factors that affect the result of a football game ........................... 13 2.2.5 Expected Goals (xG) Models .......................................................... 14 3 GENERAL TERMS ................................................................................................ 16 3.1 DATA MINING ................................................................................................... 16 3.2 MACHINE LEARNING ......................................................................................... 17 3.2.1 Machine Learning Algorithm ............................................................ 21 3.3 SPORTS ANALYTICS ......................................................................................... 22 4 METHODOLOGY ................................................................................................... 24 4.1 PROCESS DESCRIPTION .................................................................................. 24 4.1.1 Data Collection .................................................................................. 25 4.1.2 Data Cleaning .................................................................................... 26 -v- 4.1.3 Feature Extraction and Feature Engineering ................................ 27 5 PREDICTION MODELS........................................................................................ 32 5.1 EXPLANATORY VARIABLES .............................................................................. 32 5.2 CLASSIFICATION MODELS ................................................................................ 33 5.2.1 Boosting with XGBoost..................................................................... 33 5.2.2 Hyperparameter Tuning ................................................................... 33 5.2.3 Grid Search Cross Validation .......................................................... 34 5.2.4 Logistic Regression........................................................................... 35 5.2.5 Hyperparameter Tuning ................................................................... 35 5.2.6 Support Vector Machines................................................................. 36 5.2.7 Hyperparameter Tuning ................................................................... 36 6 RESULTS AND DISCUSSION ............................................................................ 37 6.1 XGBOOST CLASSIFIER .................................................................................... 37 6.2 LOGISTIC REGRESSION .................................................................................... 38 6.3 SUPPORT VECTOR CLASSIFIER ....................................................................... 39 6.4 COMPARATIVE RESULTS FOR CLASSIFIERS .................................................... 40 6.5 EVALUATION OF RESULTS ............................................................................... 41 7 CONCLUSION AND FUTURE WORK ............................................................... 43 7.1 CONCLUSIONS .................................................................................................. 43 7.2 FUTURE WORK ................................................................................................. 43 REFERENCES ............................................................................................................. 45 -vi- List of Figures Figure 1: Branch Rickey's famous formula[2]. ........................................................... 4 Figure 2: Team salaries in 2002 in MLB [4]. .............................................................. 5 Figure 3: SportVU camera in NBA [10]....................................................................... 6 Figure 4: Kyrie Irving's report from SportVU [11]. ..................................................... 7 Figure 5: Long-ball theory [15] ..................................................................................... 8 Figure 6: Score and the Expected Goals [17]. .......................................................... 9 Figure 7: Data Mining process. [49] .......................................................................... 17 Figure 8: The hierarchy of Machine Learning [54]. ................................................. 18 Figure 9: Machine Learning structure and the most important tasks [55]. .......... 19 Figure 10: Supervised Learning [58]. ........................................................................ 19 Figure 11:Unsupervised Learning [60]. .................................................................... 20 Figure 12: Reinforcement Learning [62]. .................................................................. 21 Figure 13: Machine Learning algorithms [64]. ......................................................... 22 Figure 14: Sports Analytics [67]. ................................................................................ 23 Figure 15: Jupyter notebook [69]. .............................................................................. 24 Figure 16: Dataset example and dataset shape. .................................................... 27 Figure 17: Table example after initial feature extraction. ....................................... 28 Figure 18: Transforming full-time result into numeric data. ................................... 28 Figure 19: Table with averages. ................................................................................ 30 Figure 20: First 5 rows of the table with the home team form features. .............. 31 Figure 21: Table with features related to form both for home and away team. 31 Figure 22: Main metrics results and confusion matrix for XGBoost Classifier. .. 38 Figure 23: Main metrics results and confusion matrix for Logistic Regression. 39 Figure 24: Main metrics results and confusion matrix for SVC ............................
Recommended publications
  • Making It Pay to Be a Fan: the Political Economy of Digital Sports Fandom and the Sports Media Industry
    City University of New York (CUNY) CUNY Academic Works All Dissertations, Theses, and Capstone Projects Dissertations, Theses, and Capstone Projects 9-2018 Making It Pay to be a Fan: The Political Economy of Digital Sports Fandom and the Sports Media Industry Andrew McKinney The Graduate Center, City University of New York How does access to this work benefit ou?y Let us know! More information about this work at: https://academicworks.cuny.edu/gc_etds/2800 Discover additional works at: https://academicworks.cuny.edu This work is made publicly available by the City University of New York (CUNY). Contact: [email protected] MAKING IT PAY TO BE A FAN: THE POLITICAL ECONOMY OF DIGITAL SPORTS FANDOM AND THE SPORTS MEDIA INDUSTRY by Andrew G McKinney A dissertation submitted to the Graduate Faculty in Sociology in partial fulfillment of the requirements for the degree of Doctor of Philosophy, The City University of New York 2018 ©2018 ANDREW G MCKINNEY All Rights Reserved ii Making it Pay to be a Fan: The Political Economy of Digital Sport Fandom and the Sports Media Industry by Andrew G McKinney This manuscript has been read and accepted for the Graduate Faculty in Sociology in satisfaction of the dissertation requirement for the degree of Doctor of Philosophy. Date William Kornblum Chair of Examining Committee Date Lynn Chancer Executive Officer Supervisory Committee: William Kornblum Stanley Aronowitz Lynn Chancer THE CITY UNIVERSITY OF NEW YORK I iii ABSTRACT Making it Pay to be a Fan: The Political Economy of Digital Sport Fandom and the Sports Media Industry by Andrew G McKinney Advisor: William Kornblum This dissertation is a series of case studies and sociological examinations of the role that the sports media industry and mediated sport fandom plays in the political economy of the Internet.
    [Show full text]
  • Open Evan Bittner Thesis.Pdf
    THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF STATISTICS PREDICTING MAJOR LEAGUE BASEBALL PLAYOFF PROBABILITIES USING LOGISTIC REGRESSION EVAN J. BITTNER FALL 2015 A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Statistics with honors in Statistics Reviewed and approved* by the following: Andrew Wiesner Lecturer of Statistics Thesis Supervisor Murali Haran Associate Professor of Statistics Honors Adviser * Signatures are on file in the Schreyer Honors College. i ABSTRACT Major League Baseball teams are constantly assessing whether or not they think their teams will make the playoffs. Many sources publish playoff probabilities or odds throughout the season using advanced statistical methods. These methods are somewhat secretive and typically advanced and difficult to understand. The goal of this work is to determine a way to calculate playoff probabilities midseason that can easily be understood and applied. The goal is to develop a method and compare its predictive accuracy to the current methods published by statistical baseball sources such as Baseball Prospectus and Fangraphs. ii TABLE OF CONTENTS List of Figures .............................................................................................................. iii List of Tables ............................................................................................................... iv Acknowledgements .....................................................................................................
    [Show full text]
  • Sabermetrics Over Time: Persuasion and Symbolic Convergence Across a Diffusion of Innovations
    SABERMETRICS OVER TIME: PERSUASION AND SYMBOLIC CONVERGENCE ACROSS A DIFFUSION OF INNOVATIONS BY NATHANIEL H. STOLTZ A Thesis Submitted to the Graduate Faculty of WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES in Partial Fulfillment of the Requirements for the Degree of MASTER OF ARTS Communication May 2014 Winston-Salem, North Carolina Approved By: Michael D. Hazen, Ph. D., Advisor John T. Llewellyn, Ph. D., Chair Todd A. McFall, Ph. D. ii Acknowledgments First and foremost, I would like to thank everyone who has assisted me along the way in what has not always been the smoothest of academic journeys. It begins with the wonderful group of faculty I encountered as an undergraduate in the James Madison Writing, Rhetoric, and Technical Communication department, especially my advisor, Cindy Allen. Without them, I would never have been prepared to complete my undergraduate studies, let alone take on the challenges of graduate work. I also want to thank the admissions committee at Wake Forest for giving me the opportunity to have a graduate school experience at a leading program. Further, I have unending gratitude for the guidance and patience of my thesis committee: Dr. Michael Hazen, who guided me from sitting in his office with no ideas all the way up to achieving a completed thesis, Dr. John Llewellyn, whose attention to detail helped me push myself and my writing to greater heights, and Dr. Todd McFall, who agreed to assist the project on short notice and contributed a number of interesting ideas. Finally, I have many to thank on a personal level.
    [Show full text]
  • Changing Baseball Forever Jake Sumeraj College of Dupage
    ESSAI Volume 12 Article 34 Spring 2014 Changing Baseball Forever Jake Sumeraj College of DuPage Follow this and additional works at: http://dc.cod.edu/essai Recommended Citation Sumeraj, Jake (2014) "Changing Baseball Forever," ESSAI: Vol. 12, Article 34. Available at: http://dc.cod.edu/essai/vol12/iss1/34 This Selection is brought to you for free and open access by the College Publications at DigitalCommons@COD. It has been accepted for inclusion in ESSAI by an authorized administrator of DigitalCommons@COD. For more information, please contact [email protected]. Sumeraj: Changing Baseball Forever Changing Baseball Forever by Jake Sumeraj (Honors English 1102) idden in the back rooms of any modern major league baseball franchise are a select few individuals that are drastically changing the way teams operate. Using numbers and Hborderline obsessive tracking of each player’s every move, they see things that elude the everyday baseball fan. These are the baseball analysts. Although they do the research that can potentially decide which player becomes the face of the team, these analysts can likely walk the city streets without a single diehard fan knowing who they are. Baseball analysts get almost zero publicity. However, their work is clearly visible at any baseball game. A catcher’s decision to call for a 2-0 curveball to a power hitter, the manager’s choice to continuously play a hitter that’s only batting 0.238, and a defensive shift to the left that leaves the entire right side of the infield open are all moves that are the result of research done by analysts.
    [Show full text]
  • The Shrinkage of the Pythagorean Exponents
    Journal of Sports Analytics 2 (2016) 37–48 37 DOI 10.3233/JSA-160017 IOS Press The Shrinkage of the Pythagorean exponents John Chena,∗ and Tengfei Lib aKing George V School, 2 Tin Kwong Road, Homantin, Kowloon, Hong Kong bDepartment of Biostatistics, CB# 7420, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA Abstract. The Pythagorean expectation is a formula designed by Bill James in the 1980s for estimating the number of games that a baseball team is expected to win. The formula has since been applied in basketball, hockey, football, and soccer. When used to assess National Basketball Association (NBA) teams, the optimal Pythagorean exponent is generally believed to be between 14 and 17. In this study, we empirically investigated the accuracy of the formula in the NBA by using data from the 1993-1994 to 2013-2014 seasons. This study confirmed the results of previous studies, which found that the Pythagorean exponent is slightly higher than 14 in the fit scenario, in which the strengths and winning percentage of a team are calculated using data from the same period. However, to predict future winning percentages according to the current evaluations of team strengths, the optimal Pythagorean exponent in the prediction scenario decreases substantially from 14. The shrinkage factor varies from 0.5 early in the season to nearly 1 toward the end of the season. Two main reasons exist for the decrease: the current evaluated strengths correlate with the current winning percentage more strongly than they do with the future winning percentage, and the scales of strengths evaluated in the early or middle part of a season tend to exceed those evaluated at the end of the season because of the evening out of randomness or the law of averages.
    [Show full text]
  • Major Qualifying Project: Advanced Baseball Statistics
    Major Qualifying Project: Advanced Baseball Statistics Matthew Boros, Elijah Ellis, Leah Mitchell Advisors: Jon Abraham and Barry Posterro April 30, 2020 Contents 1 Background 5 1.1 The History of Baseball . .5 1.2 Key Historical Figures . .7 1.2.1 Jerome Holtzman . .7 1.2.2 Bill James . .7 1.2.3 Nate Silver . .8 1.2.4 Joe Peta . .8 1.3 Explanation of Baseball Statistics . .9 1.3.1 Save . .9 1.3.2 OBP,SLG,ISO . 10 1.3.3 Earned Run Estimators . 10 1.3.4 Probability Based Statistics . 11 1.3.5 wOBA . 12 1.3.6 WAR . 12 1.3.7 Projection Systems . 13 2 Aggregated Baseball Database 15 2.1 Data Sources . 16 2.1.1 Retrosheet . 16 2.1.2 MLB.com . 17 2.1.3 PECOTA . 17 2.1.4 CBS Sports . 17 2.2 Table Structure . 17 2.2.1 Game Logs . 17 2.2.2 Play-by-Play . 17 2.2.3 Starting Lineups . 18 2.2.4 Team Schedules . 18 2.2.5 General Team Information . 18 2.2.6 Player - Game Participation . 18 2.2.7 Roster by Game . 18 2.2.8 Seasonal Rosters . 18 2.2.9 General Team Statistics . 18 2.2.10 Player and Team Specific Statistics Tables . 19 2.2.11 PECOTA Batting and Pitching . 20 2.2.12 Game State Counts by Year . 20 2.2.13 Game State Counts . 20 1 CONTENTS 2 2.3 Conclusion . 20 3 Cluster Luck 21 3.1 Quantifying Cluster Luck . 22 3.2 Circumventing Cluster Luck with Total Bases .
    [Show full text]
  • Download a Copy of the 2019 Basketball Career Conference
    “To Catch a Foul Ball SMWW You Need a Ticket to the Game” - Dr. G. Lynn Lashbrook Basketball Career Conference July 6-7, 2019 The Global Leader in Sports Business Education | SMWW.com The Global Leader in Sports Education | USA & Canada 503-445-7105 | UK +44(0) 871-288-4799 | SMWW.com BASKETBALL CAREER CONFERENCE AGENDA SMWW SUCCESS STORIES Saturday, July 6th - SMWW Welcome Reception Over 15,000 graduates working in over a 160 countries! John Ross, Portland Trail Blazers Michael Gershon Keystone Ice Miners Brian Graham, National Scouting Report 5:00-7:00pm SMWW Welcome Reception with cocktails and hors’d’oeuvres Mark Warkentien, New York Knicks Travis Gibson Champion Hockey Brian Orth, Cloverdale Minor Hockey Association Alexa Atria, New York Yankees Frank Gilberti Chatham High School Brian Gioia, Chicago Bulls at Bahama Breeze, 375 Hughes Center Dr, Las Vegas, NV 89109 Simon Barrette Columbus Blue Jackets Bob Gillen Yellowstone Quake Brian Adams, Boston Celtics Sunday, July 7th - SMWW Career Conference Paul Epstein, San Francisco 49ers Jessica Gillis Hockey New Brunswick Chad Pennick, Denver Nuggets Demetri Betzios, Toronto Argonauts Tony Griffo London Knights Chris Cordero, Miami Heat Four Seasons Hotel, 3960 S Las Vegas Blvd, Las Vegas, NV 89119 Andre Sherard, Sporting Kansas City Mario Guido Rinknet Christian Alicpala, Toronto Raptors Taylore Scott, Dallas Cowboys Brian Guindon HockeyTwentyFourSeven Christian Stoltz, USAL Rugby Alireza Absalan, FIFA Agent Aaron Guli President Irish Ice Hockey Association Christian Payne, Dickinson
    [Show full text]
  • Baseball Prospectus – Escaping Bill James’ Shadow ...James Fraser
    By the Numbers Volume 10, Number 2 The Newsletter of the SABR Statistical Analysis Committee May, 2000 Welcome Phil Birnbaum, Editor Welcome to the May BTN. Thanks to Rob Wood, Clifford Blau, your help. Hope you enjoy the issue, and please contact our James Fraser, and Tom Hanrahan for their time, effort, and contributors if you liked their work. Deadline for contributions contribution. Thanks also to the anonymous peer reviewers for for next issue is July 24. A Brief Review of “Defense Independent Pitching Stats” Clifford Blau One of the most interesting articles I have read recently is correlation with the following season’s ERA than does the entitled “Defense Independent Pitching Stats, “ which was current season’s ERA, or component ERA (essentially ERA written by Voros McCracken. It can be found on the Internet at calculated using a runs predictor formula.) http://www.baseballstuff.com/fraser/articles/dips.html. Some I recommend it be read comments on by all. the article: A In this issue better test To summarize this and a would involve follow up article, Mr. A Brief Review of “Defense Independent a park- McCracken states that Pitching Stats” ..............................Clifford Blau.......................1 adjusted rate. the ratio of hits allowed News on the SABR Publication Front ........................Neal Traven.........................3 Also, the by pitchers to balls in Baseball Prospectus – Escaping Bill James’ Shadow ...James Fraser........................4 discussion on play is not a skill. “Best Teams” Logic Flawed..........................................Clifford Blau.......................6 the posting Specifically, he shows Probability of Performance – A Comment....................Rob Wood...........................7 board included that the correlation of Clutch Teams in 1999 ..................................................Tom Hanrahan ..................12 a consideration this ratio from one year How Often Does the Best Team Win the Pennant?.......Rob Wood.........................15 of career to another is so low that numbers.
    [Show full text]
  • CAWS Career Gauge Measure for Best Players of the Live Ball
    A Century of Modern Baseball: 1920 to 2019 The Best Players of the Era Michael Hoban, Ph.D. Professor Emeritus (mathematics) – The City U of NY Author of DEFINING GREATNESS: A Hall of Fame Handbook (2012) “Mike, … I appreciate your using Win Shares for the purpose for which it was intended. …thanks … Bill (James)” Contents Introduction 3 Part 1 - Career Assessment The Win Shares System 12 How to Judge a Career 18 The 250/1800 Benchmark - Jackie Robinson 27 The 180/2400 Benchmark - Pedro and Sandy 30 The 160/1500 Benchmark - Mariano Rivera 33 300 Win Shares - A New “Rule of Thumb” 36 Hall of Fame Elections in the 21st Century 41 Part 2 - The Lists The 21st Century Hall of Famers (36) 48 Modern Players with HOF Numbers at Each Position 52 The Players with HOF Numbers – Not Yet in the Hall (24) 58 The Pitchers with HOF Numbers – Not Yet in the Hall (6) 59 The 152 Best Players of the Modern Era 60 The Complete CAWS Ranking for Position Players 67 The Complete CAWS Ranking for Pitchers 74 The Hall of Famers Who Do Not Have HOF Numbers (52) 78 2 Introduction The year 2019 marks 100 years of “the live-ball era” (that is, modern baseball) – 1920 to 2019. This monograph will examine those individuals who played the majority of their careers during this era and it will indicate who were the best players. As a secondary goal, it will seek to identify “Hall of Fame benchmarks” for position players and pitchers – to indicate whether a particular player appeared to post “HOF numbers” during his on-field career.
    [Show full text]
  • The Numbers Game
    The Numbers Game A Qualitative Study on Big Data Analytics and Performance Metrics in Sports. 11107766 Kyra Teklu Faculty of Humanities MA New Media and Digital Cultures Thesis Supervisor: Dr. N.A.J.M. van Doorn 2 Table of Contents Chapter 1. Introduction 3 Chapter 2. The Influence of Statistics 7 2.1 The Role of Quantification 7 2.2. The Role of Statistics and Metrics 10 2.3. Data and The Databases 16 Chapter 3. The Commercialisation of Sports 25 3.1. The Early Years 25 3.2. Media Rights and Sponsorship 29 3.3. Big Data shapes the Sport Industry 38 Chapter 4. Methodology 43 Chapter 5. Findings 54 5.1. The Tools 54 5.2. Player Recruitment 61 5.3 Rise in More Interesting Data 67 5.4. Unpredictability of Data 72 5.4. Economic Value 74 Chapter 6. Conclusion 78 References. 81 Chapter 7. Appendices 93 Appendix A-Examples of Metrics 93 Appendix B-Description of Participants 94 Appendix C-Participant Demographic Table 96 Appendix D-Transcripts 97 3 Chapter 1. Introduction A New Science of Winning The image you see on the first page of this thesis is a visualization of passes. This web depicts the England football teams passes during the first half of a game.1 The blue arrows indicate successful passes and their direction. Red indicates the failed attempts. Such examples of statistical visualizations focusing on performance are not uncommon nowadays, due to the advanced nature of data analytics and performance metrics within the professional sports industry. This is perhaps best indicated by the fact, today, 19 of the 20 Premier League2 3 teams use Prozone (Medeiros 2014).
    [Show full text]
  • Forecasting Outcomes of Major League Baseball Games Using Machine Learning
    Forecasting Outcomes of Major League Baseball Games Using Machine Learning EAS 499 Senior Capstone Thesis University of Pennsylvania An undergraduate thesis submitted in partial fulfillment of the requirements for the Bachelor of Applied Science in Computer and Information Science Andrew Y. Cui1 Thesis Advisor: Dr. Shane T. Jensen2 Faculty Advisor: Dr. Lyle H. Ungar3 April 29, 2020 1 Candidate for Bachelor of Applied Science, Computer and Information Science. Email: [email protected] 2 Professor of Statistics. Email: [email protected] 3 Professor of Computer and Information Science. Email: [email protected] Abstract When two Major League Baseball (MLB) teams meet, what factors determine who will win? This apparently simple classification question, despite the gigabytes of detailed baseball data, has yet to be answered. Published public-domain work, applying models ranging from one-line Bayes classifiers to multi-layer neural nets, have been limited to classification accuracies of 55- 60%. In this paper, we tackle this problem through a mix of new feature engineering and machine learning modeling. Using granular game-level data from Retrosheet and Sean Lahman’s Baseball Database, we construct detailed game-by-game covariates for individual observations that we believe explain greater variance than the aggregate numbers used by existing literature. Our final model is a regularized logistic regression elastic net, with key features being percentage differences in on-base percentage (OBP), rest days, isolated power (ISO), and average baserunners allowed (WHIP) between our home and away teams. Trained on the 2001-2015 seasons and tested on over 9,700 games during the 2016-2019 seasons, our model achieves a classification accuracy of 61.77% and AUC of 0.6706, exceeding or matching the strongest findings in the existing literature, and stronger results peaking at 64-65% accuracy when evaluated monthly.
    [Show full text]
  • THE JAMES FUNCTION 1. Introduction In
    THE JAMES FUNCTION CHRISTOPHER N. B. HAMMOND, WARREN P. JOHNSON, AND STEVEN J. MILLER Abstract. We investigate the properties of the James function, associated with Bill James's so-called \log5 method," which assigns a probability to the result of a game between two teams based on their respective winning percent- ages. We also introduce and study a class of functions, which we call Jamesian, that satisfy the same a priori conditions that were originally used to describe the James function. 1. Introduction In his 1981 Baseball Abstract [8], Bill James posed the following problem: sup- pose two teams A and B have winning percentages a and b respectively, having played equally strong schedules in a game such as baseball where there are no ties. If A and B play each other, what is the probability P (a; b) that A wins? This question is perhaps more relevant to other sports, because in baseball the outcome is particularly sensitive to the pitching matchup. (In 1972, the Philadel- phia Phillies won 29 of the 41 games started by Steve Carlton, and 30 of the 115 games started by their other pitchers.) The answer is quite interesting, even if its applicability is somewhat limited by the tacit assumption of uniformity. For 0 < a < 1 and c > 0, define qc(a) by q (a) (1.1) a = c . qc(a) + c James calls q 1 (a) the log5 of a, and does not consider any other values of c. He 2 claims that q 1 (a) (1.2) P (a; b) = 2 . q 1 (a) + q 1 (b) 2 2 This technique is sometimes called the log5 method of calculating P (a; b), although we will avoid using this name as there is nothing obviously logarithmic about it.
    [Show full text]