Submission category: English Writing

Title: Data Science and Analysis Are Changing Baseball

Author: 林俊沂, 私立僑泰高中 (private senior high school), Grade 11, Class 1

Advising teacher: 胡雯俐

I. Introduction

1. Motivation

Anyone who loves baseball spends a great deal of valuable time paying close attention to it, and I am no exception. Reading articles and watching programs such as MLB games take up almost all of my free time. I have noticed that understanding baseball statistics is essential, because statistics are the most objective way to measure a player's ability. Although they sometimes take a lot of time to grasp, it is actually fun to deepen my baseball knowledge and to learn how data science influences players. Therefore, I decided to learn more about baseball statistics and to present their phenomena and effects through this research.

2. Purpose

This research examines the revolution in baseball statistics, including its origins and how statistics have been used in recent years. I also collect some opposing views on baseball statistics in order to contrast the two sides of the topic and examine it more deeply. Through arguments from both sides, this research aims to present and explain the influence of baseball statistics.

II. Body

1. Sabermetrics

Sabermetrics is the objective, statistical analysis of baseball activity, used to interpret and evaluate what happens during baseball games. The term, coined by Bill James, is derived from the acronym SABR, which stands for the Society for American Baseball Research, combined with the word "metrics".

1.1. The early history of Sabermetrics

The first statistical way to describe baseball activity was the box score, developed by Henry Chadwick in 1858. Box scores offer basic summary statistics for the players and the team. For a long time, Sabermetrics went unnoticed or was dismissed by most baseball teams and professionals, because they did not think such statistics were related to a team's standings or a player's ability. However, some people were still dedicated to Sabermetric research, such as Earnshaw Cook, Bill James, and even some players, notably one who applied it while playing for the Orioles in the 1970s. These statisticians tried to look at favorite statistics such as batting average from different points of view. Sabermetrics was a new concept in that era: it told everyone that a good measurement is determined by how well a player helps his team score more runs.

1.2. The measurements of Sabermetrics

Bill James and his colleagues found that traditional measurements had some flaws. For example, batting average ignores other ways of reaching base, such as walks and hit-by-pitches. Furthermore, some statisticians, such as Tom Tango, created statistics based on linear weights, which measure a player's overall offensive contribution per plate appearance more accurately. Mike Trout's 2020 season serves as an example to explain these new Sabermetric measurements.

In 2020, Mike Trout had 199 at bats, 56 hits, 35 bases on balls (including 4 intentional bases on balls), 3 hit by pitches, and 4 sacrifice flies. His hits included 28 singles (1B), 9 doubles (2B), 2 triples (3B), and 17 home runs (HR).

OBP (On-Base Percentage): the total of hits, bases on balls, and hit by pitches, divided by the sum of at bats (AB), bases on balls (BB), hit by pitches (HBP), and sacrifice flies (SF).

$$\text{OBP} = \frac{\text{H} + \text{BB} + \text{HBP}}{\text{AB} + \text{BB} + \text{HBP} + \text{SF}}$$

Example: Trout's OBP was $(56 + 35 + 3)/(199 + 35 + 3 + 4) = 94/241 \approx 0.390$.

SLG (Slugging Percentage): the total number of bases on all hits (TB) divided by at bats.

$$\text{SLG} = \frac{\text{TB}}{\text{AB}}$$

Example: Trout had 120 total bases in 2020, so his SLG was $120/199 \approx 0.603$.

OPS (On-base Plus Slugging):

$$\text{OPS} = \text{OBP} + \text{SLG}$$

Example: Trout's OPS in 2020 was $0.390 + 0.603 = 0.993$.

wOBA (Weighted On-Base Average), using the 2020 weights:

$$\text{wOBA} = \frac{0.699\,\text{NIBB} + 0.728\,\text{HBP} + 0.883\,\text{1B} + 1.238\,\text{2B} + 1.558\,\text{3B} + 1.979\,\text{HR}}{\text{AB} + \text{BB} - \text{IBB} + \text{SF} + \text{HBP}}$$

NIBB means non-intentional bases on balls ($\text{NIBB} = \text{BB} - \text{IBB}$). Example: Trout had 4 intentional bases on balls (IBB) in 2020, so his NIBB was 31, and his wOBA was

$$\frac{0.699(31) + 0.728(3) + 0.883(28) + 1.238(9) + 1.558(2) + 1.979(17)}{199 + 35 - 4 + 4 + 3} = \frac{96.478}{237} \approx 0.407$$

Note: the weights are recalculated every season from that year's game results, so they change annually.

Picture source: FanGraphs Sabermetrics Library (wOBA)

Table 1: Explanation of some Sabermetrics measurements. Table made by myself; information from FanGraphs Baseball
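As a check on the arithmetic in Table 1, the four measurements can be computed with a short Python sketch. The `BattingLine` class and the function names are hypothetical choices made for this illustration, and the weights are only the 2020 values quoted above; another season would need that season's weights from FanGraphs.

```python
from dataclasses import dataclass

# 2020 wOBA weights as listed in Table 1 (FanGraphs); they change every season.
WOBA_WEIGHTS_2020 = {
    "nibb": 0.699,  # non-intentional base on balls
    "hbp": 0.728,   # hit by pitch
    "1b": 0.883,    # single
    "2b": 1.238,    # double
    "3b": 1.558,    # triple
    "hr": 1.979,    # home run
}


@dataclass
class BattingLine:
    """One season of counting stats (hypothetical helper for this sketch)."""
    ab: int   # at bats
    bb: int   # bases on balls, including intentional ones
    ibb: int  # intentional bases on balls
    hbp: int  # hit by pitch
    sf: int   # sacrifice flies
    singles: int
    doubles: int
    triples: int
    hr: int

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.hr

    @property
    def total_bases(self) -> int:
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr


def obp(p: BattingLine) -> float:
    return (p.hits + p.bb + p.hbp) / (p.ab + p.bb + p.hbp + p.sf)


def slg(p: BattingLine) -> float:
    return p.total_bases / p.ab


def ops(p: BattingLine) -> float:
    return obp(p) + slg(p)


def woba(p: BattingLine, w: dict[str, float] = WOBA_WEIGHTS_2020) -> float:
    nibb = p.bb - p.ibb  # intentional walks are left out of wOBA
    numerator = (w["nibb"] * nibb + w["hbp"] * p.hbp + w["1b"] * p.singles
                 + w["2b"] * p.doubles + w["3b"] * p.triples + w["hr"] * p.hr)
    return numerator / (p.ab + p.bb - p.ibb + p.sf + p.hbp)


# Mike Trout, 2020 (numbers from the text above)
trout_2020 = BattingLine(ab=199, bb=35, ibb=4, hbp=3, sf=4,
                         singles=28, doubles=9, triples=2, hr=17)

for name, stat in [("OBP", obp), ("SLG", slg), ("OPS", ops), ("wOBA", woba)]:
    print(f"{name}: {stat(trout_2020):.3f}")
# OBP: 0.390
# SLG: 0.603
# OPS: 0.993
# wOBA: 0.407
```

Keeping the weights in a separate dictionary is a deliberate choice, since they are the only part of the wOBA formula that changes from season to season.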


1.3. Recent uses of Sabermetrics

Today, Sabermetrics is combined with higher mathematics, such as related rates and quantitative analysis, to examine information, statistics, and strategy for the organization and the front office. It can not only define players' market value and roles but also use correlations in the data to help decide whether to release or sign a player. Prediction is also part of Sabermetrics. Building predictive models with tools such as R or SQL allows more calculation and greater precision when they are applied to a large number of events. These predictions can help teams summarize variables, such as the runs their opponents score, and estimate their winning percentage.
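One classic Sabermetric way to estimate a team's winning percentage from runs is Bill James's Pythagorean expectation. It is swapped in here purely as an illustration, not as the model any particular team uses, and the run totals below are hypothetical.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Estimate winning percentage as RS^k / (RS^k + RA^k).

    k = 2 is the classic exponent; analysts often use values near 1.83.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("at least one run total must be positive")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical season: 800 runs scored, 700 runs allowed, 162 games.
pct = pythagorean_win_pct(800, 700)
print(f"expected win%: {pct:.3f}, about {pct * 162:.0f} wins")
# expected win%: 0.566, about 92 wins
```

A team that wins clearly more or fewer games than this estimate is often described as lucky or unlucky, which is one reason analysts use it to judge whether a record is likely to hold up.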

Sabermetrics has inspired many people who love both baseball and statistics, such as Nate Silver, who created the PECOTA projection system. Systems like this help people who are very interested in Sabermetrics to learn and discuss. Some technology, such as PITCHf/x, uses video cameras to record pitch-by-pitch data. MLB adopted it at the beginning of the 2007 season, and broadcasters often used it to report on the postseason. Private batting cages have also become a trend among professional players who want to change and improve their swings and approach. With the latest electronic devices, players can learn their launch angle (LA), exit velocity (EV), angular velocity, and other measurements within a second.

2. Events

Sabermetrics has a strong influence on the modern era of baseball. The following events helped make Sabermetrics part of the mainstream.

2.1. Moneyball: The Art of Winning an Unfair Game

Moneyball was a breakthrough for baseball statistics. It tells the story of how the Athletics (A's) used Sabermetrics to rebuild their team from 2001 to 2003. Their all-star players Jason Giambi, Johnny Damon, and Jason Isringhausen had just left, signing with the Yankees, the Red Sox, and the St. Louis Cardinals respectively, teams that were willing to pay whatever it took. The Athletics were a team with a tight budget: they had only a modest payroll of $50 million to rebuild the roster. General Manager Billy Beane therefore applied research and a revolutionary analytical approach to select players whom the baseball world did not value highly.

In addition, Billy Beane chose players with a high on-base percentage (OBP) rather than a high batting average (BA). He valued OBP because, in his mind, outs were a team's most precious resource. Following this philosophy, he could acquire players who contributed to scoring as much as high-priced stars did, for much less money. Scott Hatteberg, a catcher with an outstanding OBP throughout his career, is one example. Table 2 compares his OBP in 2000 and 2001, before he joined the Athletics in 2002, with that of Hall of Famer Mike Piazza, who had the best statistics among catchers at the time, and with the average OBP of all hitters with more than 250 plate appearances.

| Year | Scott Hatteberg | Mike Piazza | All hitters (more than 250 PA) |
|------|-----------------|-------------|--------------------------------|
| 2000 | 0.367 | 0.398 | 0.345 |
| 2001 | 0.332 | 0.384 | 0.332 |

Table 2: OBP of the players compared with all hitters. Table made by myself; information from Baseball Reference

As the table shows, Hatteberg's OBP was above the league average in 2000 and equal to it in 2001. What's more, he had only been a backup catcher for the Red Sox. Billy Beane signed him for just $950,000, while the average MLB salary in 2002 was about 2 million dollars. Hatteberg posted a 121 wRC+ in 2002, which means he created runs at a rate 21 percent above the league average. That is remarkable for a player paid like a bench player, or even less than bench players on richer teams.
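The comparison with the league figure in Table 2 can be made explicit by rescaling each OBP so that 100 means league average, the same kind of scale wRC+ uses. The helpers below are hypothetical and simplified; published "plus" statistics on sites such as FanGraphs may also apply park or league adjustments that this sketch leaves out.

```python
def relative_index(player_value: float, league_value: float) -> float:
    """Rescale a rate stat so that 100 = league average (no park adjustment)."""
    return 100 * player_value / league_value


def percent_vs_average(plus_stat: float) -> float:
    """Read a '+' stat such as wRC+ as percent above (+) or below (-) average."""
    return plus_stat - 100


# OBP values from Table 2; "league" = all hitters with more than 250 PA.
table_2 = {
    2000: {"Scott Hatteberg": 0.367, "Mike Piazza": 0.398, "league": 0.345},
    2001: {"Scott Hatteberg": 0.332, "Mike Piazza": 0.384, "league": 0.332},
}

for year, row in table_2.items():
    for player in ("Scott Hatteberg", "Mike Piazza"):
        index = relative_index(row[player], row["league"])
        print(f"{year} {player}: OBP index {index:.0f}")

print(f"Hatteberg 2002 wRC+ 121 -> {percent_vs_average(121):+.0f}% vs. average")
```

On this scale Hatteberg comes out at about 106 in 2000 and exactly 100 in 2001, while Piazza sits around 115 in both years.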

The goal of this approach was not to build the best team. The A's simply tried to do their best and become more competitive. Because the team could not compete with the contenders in the free-agent market, they kept mining for undervalued players and drafting prospects selected with their new system, developing those players until they outperformed their value, and then trading them away. This shaped the team's draft strategy: Billy Beane preferred mature players to young ones, and therefore college players over high school players. However, the A's struggles in the postseason were the dark side of this approach, and critical fans and traditional baseball people pointed them out. These critics disagreed with the A's philosophy, and Moneyball was widely misunderstood and seen as disrespectful to baseball. As Billy Beane told ESPN in 2009,

“It’s all about evaluating skills and putting a price tag on them,” Beane told ESPN. “They can choose a fund manager who manages their retirement by gut instinct, or one who chooses by research and analysis. I know which I’d choose.”

Moneyball gave sports analytics a huge boost. Most teams would probably have adopted analytics eventually, but Moneyball sped up the process. It put an unconventional idea into practice, refined it, and amazed everyone by keeping the A's competitive.

No baseball philosophy is perfect, and Moneyball has flaws. One is that the A's could not control short series, especially in the postseason. In the end, the biggest problem for the A's has likely been an obvious one: they cannot control luck.

"My shit doesn't work in the playoffs. My job is to get us to the playoffs. What happens after that is fucking luck." — Billy Beane, Oakland A's general manager.

Moneyball broke with tradition: instead of relying on traditional statistics and observation, it looked for inventive new approaches to baseball. It brought in new ideas such as defensive shifts, new ways of using the bullpen, and the rise in pitch velocity. Moneyball also showed people that a team does not need to spend a lot of money on the best players in order to build a winning team. Although some players, fans, and critics thought it spoiled the original flavor of baseball, for others, winning games and playing smart matter more.

2.2. Statcast

Statcast is an automated tracking system from MLB that analyzes players' movements. In recent years it has also become a popular Sabermetrics tool that both fans and teams are keen on.

At the start, Statcast used a Doppler radar and camera system, and it replaced the earlier PITCHf/x system. In 2020, Statcast was upgraded with technology from Hawk-Eye Innovations and began using Google Cloud as its cloud data partner.

Picture: A Doppler radar installed in Yankee Stadium. Picture source: 運動視界 (SportsV), 肉眼看不見的棒球 ("Baseball Invisible to the Naked Eye")

Picture: Statcast 2.0, an example of the graphic representation of data. Picture source: SportBusiness

Since Statcast was introduced, many teams have paid close attention to its metrics, such as exit velocity, spin rate, and pop time, as new and efficient tools for evaluating players. Entertaining new numbers such as projected home run distance, throw distance, arm strength, and catch probability also add more variety to broadcasts and to the viewing experience.

3. Opposition to and arguments about baseball statistics

Although statistics seem fantastic, there are still arguments and opposition against them. Here are some examples.

3.1. Front office aspects

This issue has been debated in the baseball world for years. In recent years, statistics may have become more important than scouts. Many teams, such as the Astros and the Dodgers, that have taken progressive steps to incorporate these new factors and change the game have all had successful seasons lately. They often bring up talented young players on cheap salaries who make an immediate impact on the team. The harsh truth is that a club only needs to tune its best win-rate model and predict which players will fit in best. Clubs just need passionate, enthusiastic, and cheap scouts with cameras to shoot videos and send them, together with basic statistics, to the front office, which makes the decisions and runs the draft. Scouts are seen as tools that can be quickly replaced.

However, leaning too heavily on analysis has its costs, too. Analysis tells a club each player's probable value, but this also creates a problem: because many clubs value players in similar ways, it is hard for any club to get an edge in the draft. Everyone looks only at the projections and ignores opportunities to improve the current situation. Clubs would rather choose three similar prospects than one superior prospect. They are afraid to make bold decisions; their goal is to win tomorrow, not today. This is also seen as the cause of a paralyzed offseason, in which clubs are reluctant to sign free agents or make trades, leaving fans bored and disappointed during the offseason.

3.2. Audience and game quality aspects

Statistics-driven events like Moneyball have brought new trends to baseball, but these trends do not always entertain the audience. The following points show how the game has changed in the era of statistics-driven baseball.

The first issue is the length of games. Bullpen usage has become more and more important, but every time a reliever enters the game, commercials and warm-ups take up a lot of the fans' and audience's time. Although MLB has announced new rules, such as the rule against one-batter relief appearances and stricter limits on timeouts, games are still very long. The average game lasted 3 hours and 7 minutes in 2020, whereas before the 2014 season average games were always under 3 hours, so games are now longer than in the past.

The second issue is excitement. Everyone knows that the most important things in baseball are avoiding outs and scoring efficiently. Statistics have highlighted and amplified this, but the result has been a worse experience for the audience. Statistics discourage stolen bases and sacrifice bunts; instead, they encourage extra-base hits and power, achieved by increasing the launch angle at contact. In the past, stolen bases and sacrifice bunts were seen as ways to extend an advantage and lift the team's spirits, so their decline has cost the game some of its passion. On the other hand, you might say that extra-base hits and long home runs bring excitement to the game. Hitting home runs sounds cool; however, it greatly increases strikeouts, because players who swing harder to hit home runs also raise their risk of striking out. Table 3 shows the strikeout rate (K%), the hard-hit rate (balls hit with an exit velocity of 95 mph or higher), and total home runs in recent years (2015 to 2019).

| Year | K% | HardHit% | Total HRs |
|------|-------|----------|-----------|
| 2015 | 20.4% | 33.3% | 4909 |
| 2016 | 21.1% | 34.5% | 5610 |
| 2017 | 21.6% | 33.3% | 6105 |
| 2018 | 22.3% | 35.3% | 5585 |
| 2019 | 23.0% | 36.5% | 6776 |

Table 3: Statistics from 2015 to 2019. Table made by myself; information from FanGraphs Baseball

As games feature more and more strikeouts and fly balls, the tension drops, and the lack of variety makes the game tiresome for the audience.
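The trend in Table 3 can be summarized with a least-squares slope and a percent change. This is a descriptive sketch using only the five values in the table; the helper names are hypothetical, and the numbers by themselves do not prove that the fly-ball approach caused the rise in strikeouts.

```python
from statistics import mean

# Values from Table 3 (FanGraphs), 2015-2019
years = [2015, 2016, 2017, 2018, 2019]
k_pct = [20.4, 21.1, 21.6, 22.3, 23.0]
total_hr = [4909, 5610, 6105, 5585, 6776]


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope: average change in y per unit of x."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def pct_change(start: float, end: float) -> float:
    """Percent change from start to end."""
    return 100 * (end - start) / start


print(f"K% trend: {ols_slope(years, k_pct):+.2f} points per season")
print(f"K% change 2015-2019: {k_pct[-1] - k_pct[0]:+.1f} points")
print(f"HR change 2015-2019: {pct_change(total_hr[0], total_hr[-1]):+.1f}%")
# K% trend: +0.64 points per season
# K% change 2015-2019: +2.6 points
# HR change 2015-2019: +38.0%
```

The slope uses every season rather than only the first and last, so a single unusual year (such as the dip in home runs in 2018) has less influence on the summary.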

3.3. Players and game results aspects

3.3.1. Fly-ball revolution

This part is related to the previous topic. The fly-ball revolution is the idea that hitters should stop hitting ground balls, because ground balls contribute the least to scoring runs. In 2016, batters hit .241 with a .715 slugging percentage and a wRC+ of 139 on fly balls, compared with a .238 average, a .258 slugging percentage, and a wRC+ of 27 on ground balls. By raising their launch angle, players can more easily hit the ball on the sweet spot of the bat (producing more extra-base hits and home runs) and lower their ground-ball rate. Table 4 shows the changes.

| Year | LA (average launch angle, degrees) | GB% (ground-ball rate) | FB% (fly-ball rate) |
|------|------|-------|-------|
| 2015 | 10.9 | 45.3% | 33.8% |
| 2016 | 11.6 | 44.7% | 34.6% |
| 2017 | 11.8 | 44.2% | 35.5% |
| 2018 | 12.3 | 43.2% | 35.4% |
| 2019 | 12.7 | 42.9% | 35.7% |
| 2020 | 12.7 | 42.7% | 35.7% |

Table 4: Statistics from 2015 to 2020. Table made by myself; information from FanGraphs Baseball
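How a higher launch angle turns ground balls into line drives and fly balls can be sketched by classifying individual batted balls by angle. The cutoffs below are the ranges commonly cited from MLB's Statcast glossary and are an assumption of this sketch; FanGraphs computes the GB% and FB% in Table 4 with its own batted-ball classification, so this code will not reproduce those numbers. The launch angles are made up.

```python
from collections import Counter
from statistics import mean


def batted_ball_type(launch_angle: float) -> str:
    """Classify a batted ball by launch angle in degrees.

    Assumed ranges: <10 ground ball, 10-25 line drive, 25-50 fly ball,
    >50 pop-up. How the exact boundary values are assigned is this
    sketch's own choice.
    """
    if launch_angle < 10:
        return "ground ball"
    if launch_angle < 25:
        return "line drive"
    if launch_angle <= 50:
        return "fly ball"
    return "pop-up"


def gb_rate(launch_angles: list[float]) -> float:
    """Share of batted balls classified as ground balls."""
    if not launch_angles:
        raise ValueError("need at least one batted ball")
    counts = Counter(batted_ball_type(la) for la in launch_angles)
    return counts["ground ball"] / len(launch_angles)


# Hypothetical hitter before and after a swing change (made-up angles).
before = [-5, 2, 8, 12, 4, 30, 15, -10, 6, 22]
after = [6, 14, 28, 33, 9, 18, 40, 2, 26, 55]

print(f"mean LA before: {mean(before):.1f}, after: {mean(after):.1f}")
print(f"GB rate before: {gb_rate(before):.0%}, after: {gb_rate(after):.0%}")
# mean LA before: 8.4, after: 23.1
# GB rate before: 60%, after: 30%
```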

Some players, such as Josh Donaldson and Brett Gardner, improved and achieved even better results than they expected, but not everyone had this kind of success with the change.

Jason Heyward struggled at the plate in 2016, which was a disastrous year for him. His wOBA was .282 and his wRC+ was 72, which means he created runs about 28 percent less efficiently than a league-average hitter. Table 5 shows the difference between his 2015 and 2016 seasons.

| Year | LA | FB% | wOBA | wRC+ |
|------|------|-------|-------|------|
| 2015 | 4.6 | 23.5% | 0.346 | 121 |
| 2016 | 10.6 | 33.3% | 0.282 | 72 |

Table 5: Jason Heyward's statistics in 2015 and 2016. Table made by myself; information from FanGraphs Baseball

As the table shows, adopting a pull-oriented swing with a higher launch angle did not improve Heyward's hitting; instead, his performance got worse. Other players, such as Enrique Hernandez and Juan Lagares, also struggled after changing their original swings. The fly-ball revolution may benefit some players and a team's overall performance, but it can also disrupt and distract individual players.

3.3.2. The Tampa Bay Rays' decision

This happened on October 27, 2020. The Tampa Bay Rays are a team that likes to use statistics and metrics to plan their strategy during games. They earned the top seed in the American League and reached the World Series on a low budget by using data analysis. However, in their most important game, a metrics-based decision was followed by the loss of the World Series. The Rays decided to pull their ace, Blake Snell, who had thrown more than five scoreless innings on only 73 pitches, right after he allowed a hit to Austin Barnes. That decision was the turning point of the game: the Dodgers took over in the sixth inning and won the championship. After the game, many people blamed Rays manager Kevin Cash for the decision.

Kevin Cash said the decision followed metrics the team had proven and came after assessing the whole situation they faced. In 2020, Snell had not performed well when facing a lineup for the third time, and he had also dealt with injuries and had not been at full strength during the season, let alone the postseason. It is hard to say that Cash's move was completely wrong, but it clearly changed the outcome of the game.


“I guess I regret it because it didn’t work out. The thought process was right… If we had to do it over again, I would have the utmost confidence in Nick Anderson to get through that inning.” (Kevin Cash, 2020)

III. Conclusion

"How can you not be romantic about baseball?" - Moneyball

Sabermetrics has undoubtedly driven the evolution of baseball. The game's systems and concepts have changed more quickly and become more strategic than in any previous era. Baseball statistics will certainly keep evolving, so we cannot stay stuck in old ways and refuse to learn.

As we have seen, baseball statistics are not always correct, nor are they always the best way to win a championship. Still, we should all agree that statistics are a powerful tool that affects every part of the game: from players to teams, from wins to championships, from scouts to statisticians, and from games to audiences. To me, baseball is the most academic and mysterious sport. Baseball is just that incredible.

Finally, as technology keeps growing, it is clearly spreading quickly across baseball, and with it comes the next wave of analytics.

"Baseball was, is, and always will be the best game in the world to me." -

IV. References

Ben Lindbergh, Travis Sawchik (2020). MVP 製造機:看大聯盟頂尖球隊如何用科技顛覆傳統、以成長心態擁抱創新,讓平凡 C 咖成為冠軍 A 咖 (Chinese translation of The MVP Machine). Taipei: 堡壘文化.

Michael Lewis (2004). 魔球:逆境中致勝的智慧 (Chinese translation of Moneyball: The Art of Winning an Unfair Game). Taipei: 早安財經.

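達斯 (2019). 肉眼看不見的棒球:認識大聯盟數據分析系統 (Baseball invisible to the naked eye: getting to know MLB's data analysis systems). 運動視界 (SportsV). Retrieved October 2, 2020, from: https://www.sportsv.net/articles/60638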

Rob Neyer (2016). Sabermetrics. Britannica. Retrieved from: https://www.britannica.com/sports/sabermetrics

Edwin Amenta (2019). Opinion: Moneyball is ruining baseball. MarketWatch. Retrieved from: https://www.marketwatch.com/story/moneyball-has-made-baseball-games-boring-2019-03-28


Jeff Passan (2018). 10 Degrees: How 'Moneyball' caused the largest changes to baseball since integration. Yahoo! Sports. Retrieved from: https://sports.yahoo.com/10-degrees-moneyball-caused-largest-changes-baseball-since-integration-050601410.html

Mike Boylan (2011). "Moneyball" and the Oakland A's: How Has It Been so Misunderstood? Bleacher Report. Retrieved from: https://bleacherreport.com/articles/679950-revisiting-moneyball-and-the-oakland-as-how-has-it-been-so-misunderstood

Jeff Sullivan (2017). Here Is Your Fly-Ball Revolution. FanGraphs. Retrieved from: https://blogs.fangraphs.com/here-is-your-fly-ball-revolution/

Rob Arthur (2017). The Fly Ball Revolution Is Hurting As Many Batters As It’s Helped. FiveThirtyEight. Retrieved from: https://fivethirtyeight.com/features/the-fly-ball-revolution-is-hurting-as-many-batters-as-its-helped/

Don Yaeger (2020). Kevin Cash's Critical World Series Decision That Went "Wrong" Was Actually Right. Forbes. Retrieved from: https://www.forbes.com/sites/donyaeger/2020/10/29/kevin-cashs-critical-world-series-decision-that-went-wrong-was-actually-right/?sh=2d06d6f27c56

Best Baseball Quotes From Players, Movies, & More. (2017). JustBats. Retrieved from: https://www.justbats.com/blog/post/best-baseball-quotes-from-players-movies-more/

Baseball Reference. Retrieved from: https://www.baseball-reference.com/

FanGraphs. Retrieved from: https://www.fangraphs.com/
