Sabermetric Analysis: Wins-Above-Replacement
Total Page:16
File Type:pdf, Size:1020Kb
Locus: The Seton Hall Journal of Undergraduate Research Volume 3 Article 7 October 2020 Sabermetric Analysis: Wins-Above-Replacement Karl Hendela Follow this and additional works at: https://scholarship.shu.edu/locus Recommended Citation Hendela, Karl (2020) "Sabermetric Analysis: Wins-Above-Replacement," Locus: The Seton Hall Journal of Undergraduate Research: Vol. 3 , Article 7. Available at: https://scholarship.shu.edu/locus/vol3/iss1/7 Hendela: Sabermetric Analysis: Wins-Above-Replacement Sabermetric Analysis: Wins-Above-Replacement Karl Hendela Seton Hall University Abstract 1. Introduction Baseball has increasingly turned to analytics The game of baseball has been America’s pas- in order to evaluate players and teams within the time for well over a century. The objective of the league. The latest statistic to become popular game—score more runs than you allow and you is Wins-Above-Replacement, abbreviated WAR. win—has led to the development of two axioms, WAR takes into account all factors of a player’s “A ball player’s purpose in playing baseball is to contribution to a team, including pitching, bat- do those things which create wins for his team, ting, base-running, and fielding, and combines while avoiding those things which create losses them to produce a single number which approxi- for his team,” and “Wins result from runs scored mates how many games the team has won by uti- while losses result from runs allowed” (James, lizing this player as compared to a replacement- 1984). Predicting which team will win a base- level or minor league level player at the same po- ball game and measuring how players contribute sition. The WAR statistic has become widely used to those wins has become an American pastime in as a way to compare players, even players play- itself with sabermetrics, the search for objective ing different positions or in different time peri- knowledge about baseball (James, 1984). Record- ods. However, current versions of WAR all have ing players’ statistics have been an integral part of one serious drawback: there is no publicly avail- Major League Baseball since the foundation of the able, standardized formula or process to calcu- game. It started with the simplest statistics: num- late it. Instead, separate organizations have for- ber of hits and walks; batting average; RBIs (runs mulated their own versions of the statistic using batted in) and home runs for position players; in- proprietary data, rendering the statistic both dif- nings pitched, strikeouts, bases on balls (walks), ficult to understand and difficult to replicate. In earned run average (ERA), and win/loss record for this project we create a standardized WAR statis- pitchers. Statistics such as these have been care- tic that accurately indicates the value of a player fully compiled since the dawn of the game for using only publicly available data, and show that managers and fans alike to look at and analyze our new WAR statistic strongly correlates with themselves. WAR statistics which use proprietary data. This While no statistics can tell the full story of the approach has the potential to create a standard players and the games they played, over time the WAR formula that an average baseball fan can question has arisen: Is it possible to have a sin- use and understand in the future as they evaluate gle statistic that measures the quality of a player? a baseball player’s value. We call it SHU-WAR. This question has led to the development of statis- tics such as runs-created, win-shares, and most Published by eRepository @ Seton Hall, 2020 1 Locus: The Seton Hall Journal of Undergraduate Research, Vol. 3, Iss. 1 [2020], Art. 7 recently wins-above-replacement (WAR). Wins- therefore is left to take the values generated from above-replacement attempts to answer this ques- these methods for granted since there is no way tion by taking into account all aspects of a player’s to reproduce the numbers these organizations cre- performance and translating it into the most im- ate themselves. Baseball Reference has named its portant part of the game: winning. The idea is version bWAR, FanGraphs is fWAR and Baseball if you had a team composed of replacement-level Prospectus is WARP, none of which are identical players, statistically they would be expected to fin- or uniform across all players. When the statistic ish a typical 162-game season with a record of is typically talked about though, it is referred to as roughly 48 wins and 114 losses. The WAR statis- WAR even though there are multiple versions of tic determines how many wins above or below 48 it. would result by replacing one of those replace- Given the amount of publicly available base- ment players with the player whose WAR is being ball statistics, however, and the various mathe- measured (WAR Explained, 2019). In the creation matical ideas and techniques already available in of this number, the unique factors of the game are sabermetrics, it seems reasonable to believe that normalized to be context, league, and park neutral a WAR statistic can be constructed that would so that every player is valued on an even playing be practical, understandable, and accessible to the field. Baseball is one of the few sports that has sig- general baseball-loving fan. Our objective in this nificant differences between fields and leagues as paper is to create that statistic, a new WAR statistic the sport has never standardized outfield or foul we call SHU-WAR, which can function as a stan- territory dimensions. This makes it difficult to dardized wins-above replacement statistic while compare players, as these differences cause prob- avoiding using any proprietary data that is not lems in determining how one player would fare on available on a typical baseball statistical database. a typical field. WAR takes this all into considera- tion, as adjustments are made to even the playing 2. The Linear Weights System field between players, making this a statistic com- parable across all leagues, fields, and time periods. The basis of the SHU-WAR statistic is the Normally, this is the part of the paper where Linear Weights System developed by John Thorn I would describe how WAR is calculated. Un- and Pete Palmer (Thorn & Palmer, 1985). In fortunately, however, this is impossible because this system various baseball events were indepen- WAR is a non-standardized metric: there are in dently evaluated and given a value that expresses fact multiple versions of the WAR statistic. To each event in terms of runs, specifically runs pro- compound matters, all of these WAR statistics, duced or prevented. The events evaluated in Thorn published by companies like Baseball Reference, and Palmer’s linear weights system encompass all FanGraphs and Baseball Prospectus, are based at physical plays, including even non-scoring plays. least in part on proprietary data not available to Examples of batting events, for example, include the public. Some of this data requires special- events like singles, doubles, triples but also non- ized technology and, while it is always nice to hit events like walks, bases on balls, and when a gain new angles on parts of the game, this can re- player is hit by a pitch (and thus advances to first sult in confusion about just what aspects are be- base automatically). Using a regression analysis, ing measured. There is also little public informa- Thorn and Palmer determined the linear weight of tion available about these statistics, which leads each event, as well as adjustments taking into ac- to confusion on how to actually calculate the fi- count the league that the player is in. Hitless at- nal value, as it may be constantly adjusted be- bats are given a negative score, so that the sum of hind closed doors. The typical baseball analyst all linear weights of all events within a league sum https://scholarship.shu.edu/locus/vol3/iss1/7 2 Hendela: Sabermetric Analysis: Wins-Above-Replacement up to zero, and so the walks, singles, and so on the final value of how many wins are accredited can be measured by the number of runs they pro- to the player above that of a replacement player. duce above an average at-bat. To give an example, An early version of this method for calculating while a home run in a baseball game can directly what we now call SHU-WAR first appears in the score between one to four runs (depending on book Understanding Sabermetrics (Costa, Huber how many people were on base) it was determined & Saccoman, 2nd ed.). through Thorn & Palmer’s work that a home run produces approximately 1.40 runs greater than the 3.1 Batting Runs average at-bat (Thorn & Palmer, 1985). Batting runs (BR) is a statistic that takes the For pitching statistics, the linear weights sys- important portions of the offensive player’s game tem is calibrated to the league’s earned-run- and translates each event into runs created. The average (ERA) and the amount of innings pitched. statistic used is as follows: This means a pitcher with the league average ERA would have a presumptive record of 81-81 if they BR = (0:47 × H) + (0:38 × 2B) + (0:55 × 3B) pitched every game of the 162 game season, while + (0:93 × HR) + [0:33 × (BB + HBP)] any deviation from the league average will either − [ABF ∗ (AB − H)] increase or decrease the number of wins for the season (Thorn & Palmer, 1985). This is parallel to The BR formula for an individual player is cal- batting where the system takes into account pos- culated using the total number of at-bats a player itive and negative events in terms of the league records (AB), as well as the various positive events averages to determine how many runs the player that can result: hits (H), doubles (2B), triples (3B), produced.