The Early Negro Leagues and Major League Baseball: A Comparative Analysis

Item Type: text; Electronic Thesis
Authors: Catallini, Joseph Louis, II
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Download date: 02/10/2021 11:50:23
Item License: http://rightsstatements.org/vocab/InC/1.0/
Link to Item: http://hdl.handle.net/10150/297520

Abstract

This study is an exploratory statistical study of the early Latin and Negro Leagues from 1904 to 1924. Data compiled in the seamheads.com Negro League Database, launched in September 2011, was analyzed and compared to data from Major League Baseball, via baseball-reference.com, over the same time period. Despite incomplete data from the Negro Leagues, the results show some interesting similarities and significant differences between the two data sets. The data shows that both leagues’ batting production progressed similarly throughout the era, with a similar spike in power production at the end of the “Dead Ball Era” in 1920. Results indicate that the primary difference between the leagues was that Negro League teams produced poorer fielding averages. The result was higher run production in the Negro Leagues in every year examined in the study, despite the fact that Major League teams often produced better batting statistics such as batting average, on-base percentage and slugging percentage.

Introduction

The Major League Baseball color line ended in 1947 with the debut of Jackie Robinson. Prior to this, several Latin American and black professional leagues were the outlets in which talented black players could play.
Many great players and teams played in these leagues; however, statistical data on these leagues had been very scarce, especially for their early years. While databases for Major League statistics date back to the 1800s and are available through many different sources, until recently there had not been anything close to a complete database for the early Latin American and Negro leagues. In September 2011, seamheads.com introduced its Negro League Database Project in an effort to change this. The project, organized by Gary Ashwill with many contributors, aims to compile data and statistics from early Latin American professional leagues, the Negro Leagues and other black professional leagues during the color line era in baseball into one database. The creation of the database allows baseball historians, researchers and enthusiasts a more complete statistical depiction of the early Negro and Latin American leagues than was ever before available.

The purpose of this exploratory study is to analyze the newly available data set and compare it to Major League data from the same time period (1904-1924). Before the Negro League Database existed, a thorough statistical comparative analysis involving the Negro Leagues had never been possible. The hope is that this study will bring to light similarities, differences and relationships between Major League Baseball and the Negro and Latin American leagues. Furthermore, an aim of this study is to compare what made teams and players successful in the Major Leagues versus the Negro and Latin American leagues.

The study progressed in three steps. The first step was to organize Major League Baseball data (from baseball-reference.com) together with data from the Negro League Database in one spreadsheet. The second step was to standardize the data where necessary and perform tests on the data.
While all teams in the Major Leagues played the same or a similar number of games each season, the Latin American and Negro league seasons were more sporadic. Because of this, statistics must be standardized with respect to events such as games, innings, at-bats or plate appearances before they can be compared to data from the Major Leagues. The final step was to perform statistical tests, such as t-tests, correlations and linear regressions, to determine statistical similarities, differences and relationships between the two separate leagues and to analyze the results.

Background

I. The databases

a. The Negro League Database via seamheads.com

The Negro League Database contains statistics from two areas of baseball that were closely connected. Some of the best black baseball players played in Latin American leagues and, likewise, black Latin American players came to the United States to compete in the Negro leagues. The “About” section of the database explains the significance of this data: “The statistics presented here document the play of many of the greatest players in baseball's history who never got the chance to compete in the major leagues, including many members of the Hall of Fame.” The data was compiled through research of “box scores and game accounts published in contemporary newspapers,” and includes batting, pitching and fielding statistics for several leagues from the early twentieth century, listed below as described on the website:

“1) Independent black professional teams deemed comparable in quality to the later organized Negro leagues, from 1912 through 1919;
2) The Negro National League in its first four seasons, 1920, 1921, 1922, and 1923;
3) Independent black teams of comparable quality to the NNL, 1920-1922;
4) Cuban major leagues (Liga Cubana, Liga Habanera, Liga General, and Liga Nacional, as well as the Premio de Verano, or Cuban Summer League) from the 1902/03 winter season through 1912/13.
5) Exhibition series played in Havana between U.S. major league teams or Negro league teams and Cuban League teams, 1904 through 1915.”

Since this list was published on the website, data from the 1924 and 1933 seasons of the Negro National League, the Eastern Colored League in 1923 and 1924 and the 1915/16 and 1916/17 seasons in the Florida Hotel League have been added. Additionally, data for independent black teams comparable to the organized Negro League teams from 1899 through 1900 and 1902 through 1911 have been added.

The result of this newspaper research was count data, such as hits (H), walks (BB), at-bats (AB), runs (R), earned runs (ER), errors (E) and any other statistic that can be obtained by counting occurrences in a game account or box score. After these observable statistics were gathered, the data was compiled into tables, and calculable statistics, such as Batting Average (BA), On-base Percentage (OBP), Slugging Percentage (SLG), On-base plus Slugging (OPS), Earned Run Average (ERA), Fielding Percentage (FLD%), etc., were computed. All the data was posted in tables separated into Batting, Fielding and Pitching categories.

A note on adjusted statistics: Researchers even calculated many contemporary sabermetric statistics for the Latin American and Negro leagues. These statistics had not yet been developed during the era being researched and include Weighted On-base Average (wOBA), League Adjusted On-base plus Slugging (OPS+), League Adjusted ERA (ERA+), League Adjusted FLD% (FLD%+), Win Shares (WS), Wins Above Replacement (WAR) and many more. These adjusted statistics were developed as an analysis tool to compare players and teams within one league. For example, consider OPS+, ERA+ and FLD%+, whose formulas are listed below:

$$\text{OPS+} = \left(\frac{\text{OBP}}{\text{lgOBP}^*} + \frac{\text{SLG}}{\text{lgSLG}^*} - 1\right) \times 100$$

$$\text{ERA+} = \left(\frac{\text{lgERA}^*}{\text{ERA}}\right) \times 100$$

$$\text{FLD\%+} = \left(\frac{\text{FLD\%}}{\text{lgFLD\%}}\right) \times 100$$

In OPS+, OBP and SLG are a player or team’s OBP and SLG.
lgOBP* and lgSLG* are the league averages for OBP and SLG adjusted to the player or team’s home park. The higher the OBP and SLG that players tend to produce at a certain park, the higher the adjusted league values, lgOBP* and lgSLG*, would be for that park. For ERA+, ERA is a player or team’s ERA and lgERA* is the league average ERA adjusted to the player or team’s home park. A “hitter’s park” would adjust up from the actual league average to find lgERA* and a “pitcher’s park” would adjust down from the league average to obtain lgERA*. In FLD%+, FLD% is a player or team’s FLD% and lgFLD% is the league average FLD%.

Each of these statistics takes the ratio of the player or team’s production to the league average or adjusted league average and puts it on a scale of 100. For ERA+, where a lower ERA is better, the ratio is taken as league average to player so that higher scores still indicate better performance. A score of 100 in these statistics suggests that the team or player’s performance is equal to the league average performance. A score of 110 suggests that the team or player’s performance is 10 percent better than the league average and a score of 90 suggests that the team or player’s performance is 10 percent worse than the league average. As shown, these statistics are used to evaluate a player or team in the context of their league. In any given year and in any given league the average for each of these statistics will always be 100. Since the purpose of this study is to compare data between two separate leagues, and each adjusted statistic is scaled to its own league’s average, these adjusted statistics, as they were calculated in the databases, were ignored in the project.

b. The Major League Database via baseball-reference.com

Statistics for Major League Baseball have been compiled and readily available for many years. Baseball-reference.com, which was established in 2000 by Sports Reference LLC, provides statistics and data from American professional baseball leagues dating back to the beginnings of American baseball.
The data, box scores and game accounts available in the online database were obtained from Retrosheet, an organization that compiles baseball data from newspapers, game accounts and box scores and provides it in a downloadable format. The website also contains data from American professional leagues that were considered separate from the Major Leagues, specifically the American Association, the Players League, the Union Association and the National Association.
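
To make the rate standardization and the adjusted statistics described earlier in this section concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the study's or either database's actual code: the function names and all sample numbers are hypothetical, park-adjusted league averages are simply passed in as inputs, and ERA+ is assumed to follow the conventional form in which the league value is divided by the player or team value.

```python
def per_game(count, games):
    """Standardize a counting statistic (e.g. runs) to a per-game rate."""
    if games <= 0:
        raise ValueError("games must be positive")
    return count / games


def ops_plus(obp, slg, lg_obp, lg_slg):
    """OPS+ from a player/team OBP and SLG and (park-adjusted) league averages."""
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


def era_plus(era, lg_era):
    """ERA+ (assumed conventional form): league ERA over player/team ERA."""
    return 100 * lg_era / era


def fld_plus(fld, lg_fld):
    """FLD%+ as the ratio of player/team FLD% to league FLD%."""
    return 100 * fld / lg_fld


# Hypothetical teams with very different schedule lengths:
# the raw run totals differ, but the per-game rates are comparable.
short_season = per_game(450, 90)    # 450 runs in 90 games
long_season = per_game(770, 154)    # 770 runs in 154 games
print(short_season, long_season)

# Hypothetical team that is 10% above league average in both OBP and SLG.
print(round(ops_plus(0.330, 0.396, 0.300, 0.360), 1))

# Hypothetical team ERA of 3.00 in a league with a 3.30 adjusted ERA.
print(round(era_plus(3.00, 3.30), 1))
```

Because each index is scaled to its own league's average, a value of 100 in one league says nothing about how that league compares with another, which is why per-event rates such as `per_game` are the more suitable basis for a cross-league comparison.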