Empirical Study on Relationship Between Sports Analytics and Success in Regular Season and Postseason in Major League Baseball
Journal of Sports Analytics 5 (2019) 205–222
DOI 10.3233/JSA-190269
IOS Press

David P. Chu* and Cheng W. Wang
Department of Mathematics and Statistics, University of the Fraser Valley, Abbotsford, BC, Canada

* Corresponding author: David P. Chu, University of the Fraser Valley, Department of Mathematics and Statistics, 33844 King Road, Abbotsford, BC, Canada V2S 7M8. Tel.: 604-504-7441/Ext. 4483; Fax: 604-855-7614; E-mail: [email protected].

2215-020X/19/$35.00 © 2019 – IOS Press and the authors. All rights reserved. This article is published online with Open Access and distributed under the terms of the Creative Commons Attribution Non-Commercial License (CC BY-NC 4.0).

Abstract. In this paper, we study the relationship between sports analytics and success in regular season and postseason in Major League Baseball via the empirical data of 2014-2017. The categories of analytics belief, the number of analytics staff, and the total number of research staff employed by MLB teams are examined. Conditional probabilities, correlations, and various regression models are used to analyze the data. It is shown that the use of sports analytics might have some positive impact on the success of teams in the regular season, but not in the postseason. After taking into account the team payroll, we apply partial correlations and partial F tests to analyze the data again. It is found that the use of sports analytics, with team payroll already in the regression model, might still be a good indicator of success in the regular season, but not in the postseason. Moreover, it is shown that neither the team payroll nor the use of sports analytics is a good indicator of success in the postseason. Decision tree models are also developed, under different kinds of input and target variables, to classify MLB teams into no playoffs or playoffs. It is interesting to note that 87 wins (or 0.537 winning percentage) in a regular season may well be the threshold of advancing into the postseason.

Keywords: Sports analytics, correlations, team payroll, regressions, decision trees

1. Introduction

In recent years, sports analytics has been very popular in professional baseball teams among other professional sports in North America. The movie "Moneyball" vividly portrayed a true story of the magical use of sabermetrics. A team under financial constraint in Major League Baseball (MLB) was formed. Nevertheless, it thrived and successfully competed with high payroll teams in the league. Many baseball management teams nowadays have allocated plenty of financial resources to sports analytics in order to improve teams' performance. In particular, they wish to analyze the statistics of their players and those of other teams' players to devise more strategic game plans that give their team an advantage in winning games. One could argue that the sports analytics belief in a team is an indicator of management style or management quality. One would then hypothesize that greater management quality, all else being equal, translates into more wins and hence greater success in both regular season and postseason. One might even desire that this eventually leads to winning a championship of the professional sport. But some management teams do not believe in the notion of sports analytics at all. They still utilize the traditional methods to set up game plans, using their professional knowledge and experience to guide their decisions rather than taking full advantage of the statistics of all players. Other management teams, however, are skeptical or are only gradually buying into the idea of sports analytics.

In this paper, we study the relationship between sports analytics and the performance of MLB teams. In particular, we look at the final standings of both regular season and postseason of 2014-2017. To assess the use of sports analytics in MLB teams, we will examine three aspects. The first aspect comes from a source outside of the teams, which is the ESPN sports analytics categorization of teams listed in The Great Analytics Rankings (2015). The sports analytics belief in MLB teams was categorized in five levels: All-in, Believers, One-foot-in, Skeptics, and Non-believers. The second and third aspects come from sources inside the teams. We look at the number of analytics staff hired and devoted to the analytics department, if any, of each team. In addition, we investigate the total number of research staff working in the baseball operations of each team. The research work may include analytics, statistical data analysis, mathematical modeling, data science, data architecture, decision sciences, informatics, performance science, research and development, etc. The categories of analytics belief and these numbers of analytics staff and research staff might reflect teams' commitment and their potential shift into more reliance on analytics and other innovative research. We will evaluate how the use of sports analytics influences the number of games won in the regular season, thereby affecting the chances of advancing to the postseason. Once a team is in postseason, we will study further how the use of sports analytics influences its chances of getting into the last 4 teams and last 2 teams of playoffs, and eventually its chances of winning the championship of World Series.

Through the historical data from 1977 to 2008, Schwartz and Zarrow (2009) showed that the team payroll had great influence in regular season, but not in postseason. They also tested several other potential indicators of postseason success and found that none of them was a significant predictor. They concluded that the success in October (i.e., playoffs) was a truly random event.

It makes sense that teams of high payroll in general would perform better than teams of low payroll. Teams of high payroll could afford to recruit more talented and experienced players, which leads to better chances of winning games. In other words, team payroll could be an indicator of the quality of players' talent, which is an input into producing wins and success in both regular season and postseason. To confirm this belief, we will calculate the Pearson sample correlation coefficients between the success of teams and their team payroll for both regular season and postseason of 2014-2017. The Pearson sample correlation coefficients between the success of teams and their use of sports analytics are calculated as well. These results are shown in Section 3.

Binary logistic regression models and multiple linear regression models are applied to test the significance of the levels of sports analytics on the success in regular season, while ordinal logistic regression models are applied to test the significance of the levels of sports analytics on the success in postseason. The findings are also given in Section 3.

In addition to the factor of team payroll, does the use of sports analytics play an important role in explaining the success of a team (in terms of winning more games in regular season and the advancement level in postseason towards a championship of World Series)? We will tackle this issue from two perspectives. First, as the team payroll is an essential component of a team's success, it is necessary to control this factor while calculating the correlation coefficient. Therefore, we have to calculate the partial correlation coefficient between the success of a team and its use of sports analytics, after the team payroll is accounted for. Second, we will use the partial F tests to evaluate the significance of adding the use of sports analytics in explaining the success of a team, after the factor of team payroll is already in the regression model. Moreover, we apply the model utility F test to show that neither the team payroll nor the use of sports analytics is a good indicator of success in the postseason. These results are shown in Section 4.

In Section 5, the predictive modeling of decision trees will be developed to classify MLB teams into no playoffs or playoffs. We will consider the situations where the number of games won in a regular season is available or not as an input variable. It is interesting to note that 87 wins (or 0.537 winning percentage) in a regular season may well be the threshold of advancing into the postseason. A decision tree shows that teams with pitchers' salary at least $29.7 million and analytics belief either All-in or Believers have significantly higher chances of advancing to playoffs. Summary and concluding comments will be presented in Section 6.

2. Data

The MLB teams' categories of analytics belief are listed in The Great Analytics Rankings. The standings, playoffs and payrolls of the MLB teams in 2014-2017 can be found on the corresponding websites listed in the references. The numbers of analytics staff employed by teams can be tracked down from the Baseball America Directories (2014-2017). The total numbers of research staff can also be tracked down and added up from the Baseball America Directories.

St. Louis Cardinals (BELIEVER), Kansas City Royals (BELIEVER, advanced to World Series), and San Francisco (NON-BELIEVER, the champion of World Series); Chicago Cubs (BELIEVER), Toronto Blue Jays (BELIEVER), New York Mets (BELIEVER, advanced to World Series), and Kansas City Royals (BELIEVER, the champion of World Series); Los Angeles Dodgers (BELIEVER), Toronto Blue Jays (BELIEVER), Cleveland Indi-
3.