
The Early Negro Leagues and Major League Baseball: A Comparative Analysis

Item Type text; Electronic Thesis

Authors Catallini, Joseph Louis, II

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.

Download date 02/10/2021 11:50:23

Item License http://rightsstatements.org/vocab/InC/1.0/

Link to Item http://hdl.handle.net/10150/297520

Abstract

This is an exploratory statistical study of the early Latin American and Negro Leagues from 1904 to 1924. Data compiled in the seamheads.com Negro League Database, launched in September 2011, was analyzed and compared to data from Major League Baseball, via baseball-reference.com, over the same time period. Despite incomplete data from the Negro Leagues, the results show some interesting similarities and significant differences between the two data sets. The data shows that both leagues’ production progressed similarly throughout the era, with a similar spike in power production at the end of the “Dead Ball Era” in 1920. Results indicate that the primary difference between the leagues was that Negro League teams produced poorer fielding averages. The result of this was higher run production in the Negro Leagues in every year examined in the study, despite the fact that Major League teams often performed better in batting statistics such as batting average, on base percentage and slugging percentage.

Introduction

The Major League color line ended in 1947 with the debut of Jackie Robinson.

Prior to this, several Latin American and black professional leagues were the outlet for talented black players. Many great players and teams played in these leagues; however, statistical data on these leagues had been very scarce, especially for their early years. While databases for Major League statistics date back to the 1800s and are available through many different sources, until recently there had not been anything close to a complete database for the early Latin American and Negro leagues. In September 2011, seamheads.com introduced its Negro League Database Project in an effort to change this.

The project, organized by Gary Ashwill with many contributors, aims to compile data and statistics from early Latin American professional leagues, the Negro Leagues and other black professional leagues during the color line era in baseball into one database. The creation of the database gives baseball historians, researchers and enthusiasts a more complete statistical depiction of the early Negro and Latin American leagues than was ever before available.

The purpose of this exploratory study is to analyze the newly available data set and compare it to Major League data from the same time period (1904-1924). Before the Negro League Database existed, a thorough statistical comparative analysis involving the Negro Leagues had never been possible. The hope is that this study will bring to light similarities, differences and relationships between Major League Baseball and the Negro and Latin American leagues. A further aim of this study is to compare what made teams and players successful in the Major Leagues versus the Negro and Latin American leagues.

The study progressed in three steps. The first step was to organize Major League Baseball data (from baseball-reference.com) together with data from the Negro League Database on one spreadsheet. The second step was to standardize the data where necessary so that tests could be performed on it. While all teams in the Major Leagues played the same or similar numbers of games each season, the Latin American and Negro league seasons were more sporadic. Because of this, statistics must be standardized with respect to events such as games, innings, at-bats or plate appearances in order for the data to be compared to data from the Major Leagues. The final step was to perform statistical tests, such as t-tests, correlations and linear regressions, to determine statistical similarities, differences and relationships between the two separate leagues and to analyze the results.

Background

I. The databases

a. The Negro League Database via seamheads.com

The Negro League Database contains statistics from two areas of baseball that were very

connected. Some of the best black baseball players played in Latin American leagues and

likewise, black Latin American players came to the United States to compete in the Negro

leagues. The “About” section of the database explains the significance of this data: “The

statistics presented here document the play of many of the greatest players in baseball's history

who never got the chance to compete in the major leagues, including many members of the Hall

of Fame.”

The data was compiled through research of “box scores and game accounts published in

contemporary newspapers,” and includes batting, pitching and fielding statistics for several

leagues from the early twentieth century, listed below as described on the website:

“1) Independent black professional teams deemed comparable in quality to the later

organized Negro leagues, from 1912 through 1919;

2) The Negro National League in its first four seasons, 1920, 1921, 1922, and 1923;

3) Independent black teams of comparable quality to the NNL, 1920-1922;

4) Cuban major leagues (Liga Cubana, Liga Habanera, Liga General, and Liga Nacional, as

well as the Premio de Verano, or Cuban Summer League) from the 1902/03 winter season

through 1912/13.

5) Exhibition series played in Cuba between U.S. major league teams or Negro league teams and Cuban teams, 1904 through 1915.”

Since this list was published on the website, data from the 1924 and 1933 seasons of the

Negro National League, the Eastern Colored League in 1923 and 1924 and the 1915/16 and

1916/17 seasons in the Florida Hotel League have been added. Additionally, data has been added for independent black teams comparable to the organized Negro League teams from 1899 through 1900 and from 1902 through 1911.

The result of this newspaper research was observable data, such as hits (H), walks (BB), at-bats (AB), runs (R), earned runs (ER), errors (E) and any other statistic that can be obtained by counting its occurrences in a game account or box score. After these observable statistics were obtained, the data was compiled into tables, and calculable data, such as Batting Average (BA), On-base Percentage (OBP), Slugging Percentage (SLG), On-base plus Slugging (OPS), Earned Run Average (ERA), Fielding Percentage (FLD%), etc., was derived. All the data was posted in tables separated into Batting, Fielding and Pitching categories.

A note on adjusted statistics:

Researchers also calculated many contemporary sabermetric statistics for the Latin American and Negro leagues. These statistics had not yet been developed during the era being researched and include Weighted On-base Average (wOBA), League Adjusted On-base plus Slugging (OPS+), League Adjusted ERA (ERA+), League Adjusted FLD% (FLD%+), Win Shares (WS), Wins Above Replacement (WAR) and many more. These adjusted statistics were developed as analysis tools to compare players and teams within one league. For example, consider OPS+, ERA+ and FLD%+, whose formulas are listed below:

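$$\mathrm{OPS+} = \left(\frac{\mathrm{OBP}}{\mathrm{lgOBP}^*} + \frac{\mathrm{SLG}}{\mathrm{lgSLG}^*} - 1\right) \times 100$$

$$\mathrm{ERA+} = \left(\frac{\mathrm{lgERA}^*}{\mathrm{ERA}}\right) \times 100$$

$$\mathrm{FLD\%+} = \left(\frac{\mathrm{FLD\%}}{\mathrm{lgFLD\%}}\right) \times 100$$

In OPS+, OBP and SLG are a player or team’s OBP and SLG. lgOBP* and lgSLG* are the league averages for OBP and SLG adjusted to the player or team’s home park. The higher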

OBP and SLG that players tend to produce at a certain park, the higher the adjusted league values, lgOBP* and lgSLG*, would be for that park. For ERA+, ERA is a player or team’s ERA and lgERA* is the league average ERA adjusted to the player or team’s home park. A “hitter’s park” would adjust up from the actual league average to find lgERA* and a “pitcher’s park” would adjust down from the league average to obtain lgERA*. In FLD%+, FLD% is a player or team’s FLD% and lgFLD% is the league average FLD%. Each of these statistics takes the ratio of the player or team’s production to the league average or adjusted league average (inverted for ERA+, where a lower ERA is better) and puts it on a scale of 100. A score of 100 in these statistics suggests that the team or player’s performance is equal to the league average performance. A score of 110 suggests that the team or player’s performance is ten percent better than the league average and a score of 90 suggests that the team or player’s performance is ten percent worse than the league average.
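The three adjusted statistics described above can be sketched in a few lines of Python. This is only an illustration: the function names and all input values are hypothetical, the park-adjusted league averages are assumed to be supplied already, and the ERA+ ratio is assumed to be league over team so that a lower ERA yields a higher score.

```python
def ops_plus(obp, slg, lg_obp_adj, lg_slg_adj):
    """OPS+ = (OBP / lgOBP* + SLG / lgSLG* - 1) * 100."""
    return (obp / lg_obp_adj + slg / lg_slg_adj - 1) * 100


def era_plus(era, lg_era_adj):
    """ERA+ = (lgERA* / ERA) * 100, assuming a lower ERA should score higher."""
    return lg_era_adj / era * 100


def fld_pct_plus(fld_pct, lg_fld_pct):
    """FLD%+ = (FLD% / lgFLD%) * 100, with no park adjustment."""
    return fld_pct / lg_fld_pct * 100


# Hypothetical team that matches its (adjusted) league averages exactly.
print(round(ops_plus(0.320, 0.350, 0.320, 0.350)))  # 100
# Hypothetical team whose ERA is lower than the adjusted league ERA.
print(round(era_plus(3.00, 3.30)))                   # 110
```

Because a league-average team always scores 100 in its own league, these values carry no information about how two different leagues compare, which is why the study sets them aside.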

As shown, these statistics are used for evaluation of a player or team in the context of their league. In any given year and in any given league, the league average for each of these statistics will always be 100. Since the purpose of this study is to compare data between two separate leagues, these adjusted statistics, as they were calculated in the databases, were ignored in the project.

b. The Major League Database via baseball-reference.com

Statistics for Major League Baseball have been compiled and readily available for many years. Baseball-reference.com, which was established in 2000 by Sports Reference LLC, provides statistics and data from American leagues dating back to the beginnings of American baseball. The data, box scores and game accounts available in the online database were obtained from Retrosheet, an organization that compiles baseball data from newspapers, game accounts and box scores and provides it in a downloadable format.

The website also contains data from American professional leagues that were considered separate from the Major Leagues, specifically the American Association, the Players League, the Union Association and the National Association. For the purposes of this study, only leagues regarded as Major Leagues were considered. The leagues evaluated in this study are listed below:

1) The National League from 1904 to 1924

2) The American League from 1904 to 1924

3) The Federal League from 1914 to 1915

As in the Negro League Database, statistics and data are separated into Batting, Fielding and Pitching categories, and include observable and calculable statistics. Baseball-reference.com has also applied contemporary statistics, such as OPS+ and ERA+, to all of its data.

c. Combining Leagues and Seasons

The Negro League Database is split into several different leagues and seasons. The purpose of this study is to compare the Negro and Latin American leagues as a whole to the

Major Leagues as a whole. Because of this, all leagues in the Negro League Database will be treated as one cohesive league, henceforth referred to as the “Negro Leagues” in this study.

Similarly, the National League, American League and Federal League are all combined to form the “Major Leagues” in the study. These combinations can be made because all leagues incorporated into the Negro Leagues were strongly related to one another just as the National,

American and Federal Leagues were all strongly related to one another. In the case of the Negro

Leagues, separate leagues were connected by teams that played in multiple leagues, as well as games independent of specific leagues. Several teams included in the Negro League Database would play in multiple leagues over the course of a year. Additionally, teams from separate leagues would play each other in independent matchups. Because of these two factors, teams and players in separate leagues would often face each other, relating all of the Negro Leagues together. The National, American and Federal Leagues are all considered Major Leagues in

American baseball history. There were several players and managers who were affiliated with teams in at least two of these leagues over their careers, and some even participated in all three at some point. Also, the best team from the National League would play the best team from the

American League in the World Series each season. With the evolution of spring training and the Florida Grapefruit League, teams from all of the Major Leagues also began to train near one another during the preseason. This allowed players and managers to observe and compete against teams in the other leagues.

This study is not solely a comparative analysis of the Major Leagues and the Negro and Latin American leagues over the entire era. The study is also designed to examine and compare the trends of the data in each league from season to season. For the Negro Leagues, summer leagues were combined with the following winter leagues to be compared to the Major League season from that summer. For example, the summer Negro Leagues seasons of 1904 are combined with the winter Negro Leagues seasons of 1904/05 to create a cohesive Negro Leagues season for 1904. This season will be compared to the Major Leagues season of 1904.
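The season-combining rule can be sketched as follows. The record layout, the function names and the counts are hypothetical and are not taken from the database; the only rule assumed from the text is that a winter label such as 1904/05 is folded into the season year of the preceding summer.

```python
from collections import defaultdict


def season_year(label):
    """Map '1904' to 1904 and a winter label such as '1904/05' to 1904."""
    return int(label.split("/")[0])


def combine_seasons(rows):
    """Sum count statistics over every league that shares a season year."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        year = season_year(row["season"])
        for stat, value in row["counts"].items():
            totals[year][stat] += value
    return {year: dict(stats) for year, stats in totals.items()}


# Hypothetical count data for one summer league and the following winter league.
rows = [
    {"season": "1904", "league": "PV", "counts": {"AB": 1000, "H": 250}},
    {"season": "1904/05", "league": "LH", "counts": {"AB": 1200, "H": 290}},
]
print(combine_seasons(rows)[1904])  # {'AB': 2200, 'H': 540}
```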

d. Two Approaches to Data Collection

In order to compare the Negro Leagues and the Major Leagues, two approaches to compiling and analyzing data will be utilized. In the first, team data will be collected for each season and each team’s production in a particular statistic, or array of statistics, will be treated as an observation. To ensure sufficient data to run tests on these observations, the criterion was set

that a Negro Leagues season must have batting, pitching and fielding data from at least six teams

to be considered in this approach. Additionally, for a team’s batting statistics to be considered

for any given season, a minimum of 500 team plate appearances (PA) must be recorded in the

data. For a team’s pitching statistics to be considered, a minimum of 100 innings pitched (IP)

must be recorded in the data. Data from the 1904 season through the 1924 season fit these

criteria, with only one exception: despite containing batting and pitching data from 17 teams

during the 1924 season, the Negro League database has no fielding data for 1924. Because of

the significant sample of batting and pitching data available for 1924, analysis of batting and

pitching data will range from the 1904 season through the 1924 season, while fielding data will

only be analyzed through 1923.

The second approach will be to analyze the league totals for each season over the same

span. To analyze this data, events such as games, innings and at-bats will be considered

observations, and the rate of results such as hits, and walks will be compared between

the leagues. In this approach, there will be no criteria for a team’s data to be considered, as the sum of every team’s count data will be used to calculate standardized data.
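The inclusion criteria for the first approach can be expressed as a simple filter. In this sketch the thresholds come from the text, while the record layout, the helper names and the sample season are hypothetical. It also assumes that a season qualifies when at least six teams pass each minimum; the presence of fielding data is not modeled here.

```python
MIN_PA = 500     # minimum team plate appearances for batting data
MIN_IP = 100     # minimum team innings pitched for pitching data
MIN_TEAMS = 6    # minimum number of teams for a season to be considered


def qualifying_teams(teams, stat, minimum):
    """Return the teams whose recorded total for `stat` meets the minimum."""
    return [t for t in teams if t.get(stat, 0) >= minimum]


def season_qualifies(batting, pitching):
    """Assumption: enough teams must pass both the PA and the IP minimum."""
    enough_batting = len(qualifying_teams(batting, "PA", MIN_PA)) >= MIN_TEAMS
    enough_pitching = len(qualifying_teams(pitching, "IP", MIN_IP)) >= MIN_TEAMS
    return enough_batting and enough_pitching


# Hypothetical season: seven teams, one of which falls short of 500 PA.
batting = [{"team": f"T{i}", "PA": 650} for i in range(6)]
batting.append({"team": "T6", "PA": 320})
pitching = [{"team": f"T{i}", "IP": 180.0} for i in range(7)]

print(len(qualifying_teams(batting, "PA", MIN_PA)))  # 6
print(season_qualifies(batting, pitching))           # True
```

In the second approach no such filter is applied, so the team with 320 plate appearances would still contribute to the league totals.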

II. History of Important Leagues in the Study

a. The Negro Leagues

The Cuban Winter League – LH, LG, LN, AN and LCA

The Cuban Winter League is covered in this study from the 1904/05 season through the 1916/17 season. Due to labor and ownership disputes, the league changed names and leadership multiple times. The data from this league came primarily from three teams: Almendares, Fe and Habana. Like the league itself, the teams often changed names. The Cuban Winter League

showcased both Latino and American colored players, and was one of the first ‘real’ leagues for

colored players.

The Cuban Summer League - PV

The Cuban Summer League or Premio de Verano is covered from 1904 to 1908 in the

study. The three major teams from the Cuban Winter League also played in the PV, although

under different names. Almendares went by , Fe by Alerta and Carmelita and Habana by Rojo, Punzó and . Although not the principal Cuban league, the Summer League

featured a similar quality of play, as Winter League stars who weren’t touring the United States

played in the PV over the summer.

La Temporada Americana – MLB and Negro League teams in Cuba

La Temporada Americana, or the American Series, began in the late 1800s when

American teams from various leagues started traveling to Cuba to compete against Cuban teams.

In the early 1900s, MLB and Negro League teams participated in the series for the first time, traveling to Cuba after their seasons ended in the fall and before the Cuban winter leagues began.

The series provided a unique opportunity to see Major League greats such as Ty Cobb of the Detroit Tigers and Christy Mathewson of the New York Giants pitted against some of the best black and Latino players of the time. Although the Negro League Database only contains statistics for the series between 1904 and 1915, the American Series continued in various seasons until 1959.

Independent Clubs – EAS, WES, IND and NYI

Before organized leagues of black players existed in America, independent teams played each other as well as white amateur, semi-pro and minor league teams. In the early years, fewer black teams existed and so, “the farther back you go in history, the fewer and fewer games

African American teams played against each other…So the [database], which at the moment only counts games between black teams in the United States, will represent thinner and thinner slivers of a team’s season, and a player’s career, with each step backwards in time.” (blog post,

Ashwill). In the database, EAS represents seasons where independent teams from the East played against one another and WES represents seasons where independent teams from the Midwest played one another. IND represents seasons where known “interleague” play was held between

Eastern and Midwestern teams. NYI is a group of three independent teams from New York that played against each other over three seasons. Just as many American colored players traveled to Cuba for their winter season, teams of Cuban colored stars toured the United States over the summer to play in these independent leagues.

National Association of Colored Professional Clubs of the United States and Cuba - NAC

The NAC is recognized by the database as the first ‘real’ American Negro Baseball

League. The NAC began in 1907, featuring four of the best Negro League teams from previously independent leagues playing a short schedule against one another and crowning a champion at the end of the season.

Florida Hotel League - FHL

For the 1915/16-17/18 winter seasons, the database contains data from the Florida Hotel

League. In the Florida Hotel League, teams or groups of players from the Independent League circuit played a winter series in Florida representing two hotels. One team would represent the

Royal Poinciana Hotel, while their opponents would represent the Breakers Hotel.

Negro National League - NNL

The Negro National League is the league that many people think of when they think of the Negro Leagues. It is the league that showcased many well-known colored stars such as

Satchel Paige and Josh Gibson, although this study predates these two players. Established in 1920, the NNL grouped successful independent teams together to form a full league that played a full season. The creation of this league marks the point in this study where the Negro League data sample size begins to approach that of the MLB data, with some teams having data from upwards of eighty games over a season. There was also a group of associate members of the NNL (denoted by NNA in this study) who were not full-fledged members of the league.

Eastern Colored League – ECL

The Eastern Colored League began in 1923 when some of the Negro National League Associate teams broke from their affiliations with the NNL and formed an east-coast based professional colored league. The formation of the league created a format similar to that of the Major Leagues, where two separate leagues existed that were comparable in talent.

Short Series – SOU, NOR and WS

The database also contains data from short series played between Negro League teams. In

1924, the winner of the NNL, the Kansas City Monarchs, defeated the winner of the ECL, the Hilldale Club, in the first ever Colored World Series, a ten game series played in four cities. Data for short series such as these was never sufficient to fit the criteria for a team season; however, it was included when considering data for season totals in the study.

b. The MLB

The National League – NL

Established in 1876, the National League was the first of the current Major Leagues and is sometimes referred to as the “Senior Circuit”. Throughout the period covered by this study, the National League consisted of the same eight franchises, although the team names may have changed over the years.

The American League – AL

In 1901, the American League joined the National League as a Major League. Stemming from a minor league called the Western League, the American League also consisted of the same

eight franchises in this study.

The Federal League – FL

The Federal League was an attempt to create a third major league to compete with both the National and American Leagues. The league began in 1914 and was short lived, lasting only two seasons, although some American and National leaguers did jump to the new league. Like the other two Major Leagues, the Federal League consisted of eight teams per season, with the Newark Peppers replacing the Indianapolis Hoosiers for the 1915 season.

III. The Dead Ball and Live Ball Era

An interesting aspect of this study is that the data spans a distinct change in Major League Baseball: from the Dead Ball Era to the Live Ball Era. The Dead Ball Era is said to have started in 1901 with the establishment of the American League. According to baseball-reference.com, several causes potentially led to this. For one, foul balls did not count as strikes against a batter until 1901 in the NL and 1903 in the AL. Additionally, “trick” pitches, such as the spitball, began to surface early in the twentieth century. Finally, and perhaps most significantly, defense improved remarkably at the turn of the century. All of these factors contributed to a decrease in scoring from the 1890s to the early 1900s.

The Dead Ball Era ended with the end of the 1919 season. In 1919, baseballs started being produced with higher quality materials and machines started winding the cores of baseballs. In 1920, the spitball was officially outlawed, with the exception of “17 ‘bona-fide’ spitballers” who were still allowed to use their signature pitch. Additionally, after a tragic death, umpires were instructed to replace dirty balls during a game so that players could see the ball better, in order to improve player safety. These factors, along with the emergence of Babe Ruth, who proved players could be successful trying to hit home runs, are all considered factors in the end of the Dead Ball Era and the emergence of the Live Ball Era.

IV. The Statistics in the Study

As this study is a comparative analysis, only statistics that were available in, or calculable from the statistics available in, both league databases were considered. The following notes describe some of the statistics utilized in the study. For a list of the statistics used and their descriptions see Appendix 2.

a. Batting

Caught Stealing – CS

Data for caught stealing as an offensive statistic is sporadic in both databases, with no offensive caught stealing recorded in either league until 1909 in the Negro Leagues. In the Major Leagues, the database seems to have complete offensive data for caught stealing from 1920 on; however, in the Negro Leagues, data for caught stealing is never complete. Caught stealing is also recorded as a fielding statistic in the Major Leagues, but is not available in the Negro Leagues in the fielding category. Because of this, caught stealing was not used in this study.

Ground into Double Plays – GIDP

GIDP was not officially recorded as a Major League statistic until 1933. It is noteworthy that, in 1915, 74 GIDP were recorded in Negro League games, showing that the Negro Leagues began recording GIDP as a statistic before the Major Leagues. However, as no GIDP data is available for the Major Leagues in the period studied, the statistic could not be used in this study beyond this observation. Double plays (DP) are recorded in both leagues as a fielding statistic, so double plays can be compared between the leagues in the fielding category.

Strikeouts – SO

While complete data for offensive strikeouts was available in the Major League Database, the Negro League Database often did not record strikeouts as an offensive statistic. In fact, from 1916 through 1924, not a single offensive strikeout is recorded in the Negro League Database. Because of this, strikeouts could not be analyzed in this study from an offensive perspective.

Hit by Pitch – HBP

The offensive data for hit by pitches is missing for only the 1924 season in the Negro

Leagues. As a result hit by pitches, as an offensive stat, could not be compared over the 1924 season, and certain tests using offensive hit by pitches had to be manipulated. Hit by pitches allowed were, however, available in 1924 as a pitching statistic, and so comparisons for hit by pitches could be made for the 1924 season by using data from the pitching category.

Sacrifice Flies – SF

The Major Leagues did not record separate values for sacrifice flies and sacrifice hits

(SH) until 1954. However, the Negro Leagues do have separate values for sacrifice flies over the period 1909-1916. This suggests that Negro League teams were the first to record sacrifice flies as its own statistic. Since there is no data for sacrifice flies in the Major League Database, no comparisons about sacrifice flies in the two leagues could be made. However, statistics such as on base percentage (OBP) and weighted on base average weights (wOBA Weights) reference sacrifice flies where available.

b. Pitching

Home Runs Allowed – HR

The pitching data for home runs allowed is missing for only the 1924 season in the Negro

Leagues. As a result home runs allowed, as a pitching stat, could not be compared over the 1924 season, and certain tests using home runs allowed had to be manipulated. Home runs were,

however, available in 1924 as an offensive statistic, and so comparisons for home runs could be

made for the 1924 season by using data from the batting category.

c. Fielding

Stolen Bases Allowed– SB

The fielding data for stolen bases allowed is missing for only the 1923 season in the

Negro Leagues. As a result stolen bases allowed, as a fielding stat, could not be compared over

the 1923 season. Stolen bases were, however, available in 1923 as an offensive statistic, and so comparisons for stolen bases could be made for the 1923 season by using data from the batting category.

Range Factor – RF

Range Factor is a statistic developed by sabermetrician Bill James that measures a team’s putouts and assists divided by the number of innings played by the team. The statistic is used to measure a team’s overall “range” in the field. In principle, a team with more range would be able to make more plays in the field and would collect more putouts and assists per inning, resulting in a higher Range Factor than a team that has less range.
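As a small worked example of the definition given here (putouts plus assists, divided by innings), the calculation can be written as below. The function name and the team totals are hypothetical, and other sources may scale the statistic differently, for example per nine innings or per game.

```python
def range_factor(putouts, assists, innings):
    """Range Factor as described in the text: (PO + A) / innings played."""
    return (putouts + assists) / innings


# Hypothetical team totals for one season.
print(round(range_factor(3600, 1700, 1200), 2))  # 4.42
```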

d. General

Accounting for Unrecorded Statistics

Some of the “count” statistics examined in this study did not become official MLB statistics until after 1904, when this study begins. Among these statistics are runs batted in and saves. To obtain each team’s values for statistics that were not officially recorded, compilers of the databases referenced individual game logs and box scores, applying the criteria of each statistic.

V. Completeness of data

The data in this study is not totally complete, especially in the Negro League database. In

the “About” section of the database, Ashwill explains, “Box scores and game accounts for Negro

league and independent black teams in the U.S. have been drawn from dozens of disparate and

sometimes very hard-to-find sources. Negro league statistics are thus almost never complete,

and it's highly unlikely we will ever achieve comprehensive coverage in every season. On the

other hand, the Cuban league statistics presented here are nearly complete, with the exception of

one missing Almendares/Habana game in the 1904/05 Liga Habanera season, and

home games for the 1907/08 and 1908/09 Liga General seasons.” In this study, both data sets are also incomplete in that some statistics, such as sacrifice flies, caught stealing and more, are not available for every team in every year.

VI. Questions to be explored in the study

The purpose of this study is to statistically compare the Negro Leagues and the Major

Leagues from the 1904 season through the 1924 season. The fundamental question to be

answered is: What were the major differences and similarities between the two leagues?

Through statistical tests, it is the goal of this study to determine what kind of events occurred

more often or less often in the Negro Leagues than in the Major Leagues. Furthermore the study

will use correlation and linear and logistic regression to explore what statistics serve as good

predictors of a team’s success. In this aspect of the study, the goal is to find what statistics for each league can be used to predict how well a team was able to score runs as batters, prevent runs as pitchers and win games as a team.

Methodology

I. Introduction

The data in this study was compiled and analyzed using two distinct approaches. In

Approach I, team data was collected for each season, treating each team’s statistical performance over the season as an observation. In Approach II, data for the league totals for each season was

compiled to display league averages. In addition, in Approach II, events such as games, innings and at bats were considered single observations in order to compare the rates at which certain results occurred in each league.
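The idea behind Approach II, pooling every team’s counts before forming a rate, can be sketched as follows. The function name and the team records are hypothetical; the point of the example is only that small-sample teams still contribute to the league total.

```python
def league_rate(teams, result, event):
    """Pool every team's counts, then divide total results by total events."""
    total_results = sum(t[result] for t in teams)
    total_events = sum(t[event] for t in teams)
    return total_results / total_events


# Hypothetical league: one team with a full season and one with very few games.
teams = [
    {"team": "A", "AB": 2000, "H": 520},
    {"team": "B", "AB": 150, "H": 30},
]
print(round(league_rate(teams, "H", "AB"), 3))  # 0.256
```

Note that the pooled rate (550 hits in 2,150 at bats) weights each at bat equally, so it differs from the simple average of the two team batting averages.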

II. Approach I

A. Data Collection

In Approach I, team data for batting, pitching and fielding from the Major Leagues was

obtained from baseball-reference.com and copied into an Excel spreadsheet for each season.

Each category of data, batting, pitching and fielding, was allocated its own worksheet on the

spreadsheet. Team data from the Negro Leagues was obtained via the Negro League Database

on seamheads.com. Using the database’s “Min PA” and “Min IP” functions, only data from

teams that logged at least 500 plate appearances over a season was collected for batting data, and

only data from teams that logged at least 100 innings pitched over a season was collected for

pitching data. Team batting data that fit the criteria was copied into the same Excel worksheet

that contained MLB batting data for each season. Team pitching data that fit the criteria was

copied into the same Excel worksheet that contained MLB pitching data for each season. Team

fielding data was copied into the same Excel worksheet that contained MLB fielding data for each

season. Finally, a “Master” Excel spreadsheet was created. Team batting, pitching and fielding

data for each league was copied into this spreadsheet. The format of this spreadsheet is the same

as the spreadsheet for each individual season, but each worksheet includes season totals for each

team for each season from 1904 through 1924 (1923 for fielding data). An extra worksheet was

created on this Master sheet where data from all three categories, batting, pitching and fielding,

was combined.

B. Data Standardization

Because the Negro League data is incomplete and Negro League and Major League seasons were of different lengths, statistics must be standardized with respect to events that can be considered “equal” in both leagues in order to compare data between the two leagues. For example, instead of comparing how many stolen bases teams in each league earned over a season, the number of stolen bases earned per game by teams in each league was compared. For Approach I, the following standardized statistics were calculated and analyzed

(see Appendix 2 for definitions and formulas of statistics):

i. Batting – R/G, BA, OBP, SLG, RBI%, BB%, 2B/AB, 3B/AB, HR/AB,

HBP/G, SH/G, SB/G

ii. Pitching – ERA, SHO%, SV%, IP/G, WHIP, H/9, HR/9, BB/9, SO/9,

SO/BB, CG%, ER%, WP/9, BF/9, R/9, HBP/9

iii. Fielding – Inn/G, Fld%, E/9, PB/9, SB/9, RF
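The kind of standardization listed above can be illustrated with a few helper functions. This is a sketch under stated assumptions: the helper names and the two team records are hypothetical, and the exact formulas used in the study are those defined in Appendix 2.

```python
def per_game(count, games):
    """Rate per game, e.g. SB/G or SH/G."""
    return count / games


def per_nine(count, innings):
    """Rate per nine innings, e.g. BB/9 or E/9."""
    return 9 * count / innings


def per_at_bat(count, at_bats):
    """Rate per at bat, e.g. HR/AB or 2B/AB."""
    return count / at_bats


# Hypothetical teams with very different season lengths.
long_season = {"G": 154, "SB": 200, "IP": 1386.0, "BB": 462}
short_season = {"G": 40, "SB": 60, "IP": 360.0, "BB": 120}

for team in (long_season, short_season):
    sb_per_game = per_game(team["SB"], team["G"])
    bb_per_nine = per_nine(team["BB"], team["IP"])
    print(round(sb_per_game, 2), round(bb_per_nine, 2))
```

Although the first team has far larger raw totals, the standardized values show that the second team stole bases at a higher rate per game while both issued walks at the same rate per nine innings.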

C. Data Analysis

i. T-Tests

In Approach I, the first step in analyzing and comparing the data was to perform t-tests on the two independent data sets for each statistic above in order to compare the means of each sample. For the t-tests, the “t-Test: Two-Sample Assuming Equal Variances” function from Excel’s Data Analysis ToolPak was utilized. For each season, a two-tailed t-test was performed on the two data sets for each statistic using this function.

After completing tests for each year, three t-tests were run for each statistic on the Master sheet using the same function. The first test considered data from each data set for the entire era,

1904-1924 (1923 for fielding data). The second test considered data from the Dead Ball Era,

1904-1919 and the third considered data from the Live Ball Era, 1920-1924 (1923 for fielding data).

For each t-test performed, the league which had a larger average value was recorded, as was the p-value measuring the significance in the difference of the sample means. The null hypothesis for each t-test was that there is no significant difference between the two leagues in a given statistic.
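As a rough illustration of what an equal-variance two-sample t-test computes, the following Python sketch derives the t statistic and degrees of freedom from two samples. It is not the Excel routine used in the study; the function name and the sample values are hypothetical. The two-tailed p-value would then be read from a t distribution with the returned degrees of freedom, a step omitted here because the Python standard library does not provide that distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical team R/G values for one season in each league (not study data).
league_a = [4.0, 5.0, 6.0]
league_b = [1.0, 2.0, 3.0]
t_stat, dof = pooled_t(league_a, league_b)
print(f"t = {t_stat:.3f} on {dof} degrees of freedom")
```

The sign of the statistic shows which sample has the larger mean, which corresponds to recording the league with the larger average value.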

ii. Correlation Tests

The second step of data analysis in Approach I was to perform correlation tests to determine what batting statistics correlated with scoring runs and what pitching statistics correlated with preventing runs. To perform correlation tests, the Excel function “Correlation” from the Data Analysis Toolpak was used to produce a correlation matrix. To test the offensive statistics that correlate to scoring runs, an “R/G Correlation Matrix” with the statistics R/G, BA,

OBP and SLG was created for each season. To test the pitching statistics that correlate to preventing runs, an “ERA Correlation Matrix” with the statistics ERA, BB/9, SO/9, H/9 and HR/9

was created for each season.

Correlation tests were also run on the Master sheet. Using the same Excel function and

the same statistics used in the season correlation tests, three more matrices for each of R/G and

ERA were created. One tested for correlation over the entire era (1904-1924), one for

correlation over the Dead Ball Era (1904-1919) and one for correlation over the live ball era

(1920-1924).
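A correlation matrix of this kind holds the Pearson correlation coefficient for every pair of statistics. The Python sketch below shows the calculation under that assumption; the function names and the team values are hypothetical and do not come from the study's spreadsheets.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def correlation_matrix(columns):
    """Map each pair of column names to its correlation coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical team-level values for one season (not study data).
season = {
    "R/G": [3.4, 3.9, 4.1, 4.6, 5.0],
    "OBP": [0.301, 0.312, 0.318, 0.330, 0.341],
    "SLG": [0.322, 0.345, 0.339, 0.371, 0.380],
}
matrix = correlation_matrix(season)
for stat in ("OBP", "SLG"):
    print(f"corr(R/G, {stat}) = {matrix['R/G'][stat]:.3f}")
```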

iii. Regressions

The final step for data analysis in Approach I was to run linear and logistic regressions on each data set to determine what statistics were most important in scoring runs, preventing runs and winning games in each league. Season data does not provide enough observations to perform a reliable regression, so when performing these regressions, only the Master sheet was used.

To test batting statistics that can predict run scoring, two multiple linear regressions were run. The first tested the dependent variable, R/G, in terms of independent variables BA, OBP and SLG. The second tested the dependent variable, R, in terms of independent variables BB,

HBP, 1B, 2B, 3B, HR and SB.

To test pitching statistics that can predict run prevention, one multiple linear regression was run. The regression tested the dependent variable, ERA, in terms of independent variables

BB/9, SO/9, H/9 and HR/9. To test all the statistics that can predict winning, a logistic regression was run. The regression tested the dependent variable, W%, which is equivalent to the probability that a team wins any given game, in terms of independent variables BA, OBP, SLG, BB/9, SO/9, H/9, HR/9 and Fld%.

The regressions were run using the software program R. For linear regressions, the function lm() was used and for logistic regressions the function glm() was used. After initially running each regression, the step() function was used to determine which subset of independent variables served as the best to predict the dependent variable. See Appendix 3 for all R code utilized.

In each of these regressions, a statistic that is missing data from the 1924 Negro League season is present in the independent variables (the omission of HBP in the batting statistics influences the data for OBP for the 1924 season). Because of this, all regressions were run using only data from the 1904 season through the 1923 season, with a conditional exception. If the step() function in R determined that the statistic with missing data was not an adequate predictor of the dependent variable, then a new regression was run without the statistic and including the

1924 season.

Three versions of each regression were run: one considering data from the entire era

(1904-1923/4), one considering data from the Dead Ball Era (1904-1919) and one considering data from the Live Ball Era (1920-1923/4).
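The combination of a least-squares fit and stepwise variable selection can be sketched in Python as follows. This is not the R code from Appendix 3. It assumes that the selection behaves like backward elimination on AIC, which is the default behavior of R's step() when no scope is given, and it uses the Gaussian AIC only up to an additive constant. All function names and the data are hypothetical.

```python
from math import log


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(y, predictors):
    """Least-squares fit with intercept; returns (coefficients, AIC)."""
    names = list(predictors)
    rows = [[1.0] + [predictors[k][i] for k in names] for i in range(len(y))]
    p = len(names) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    aic = len(y) * log(rss / len(y)) + 2 * p  # Gaussian AIC up to a constant
    return dict(zip(["(Intercept)"] + names, beta)), aic


def backward_step(y, predictors):
    """Drop one predictor at a time while doing so lowers the AIC."""
    current = dict(predictors)
    coefs, best = fit_ols(y, current)
    while current:
        trials = [(fit_ols(y, {k: v for k, v in current.items() if k != name})[1], name)
                  for name in current]
        aic, drop = min(trials)
        if aic >= best:
            break
        del current[drop]
        coefs, best = fit_ols(y, current)
    return coefs


# Hypothetical data: y depends on x1; x2 is unrelated (not study data).
x1 = [float(i) for i in range(1, 11)]
x2 = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
y = [2 + 3 * a + e for a, e in zip(x1, noise)]
coefs = backward_step(y, {"x1": x1, "x2": x2})
print(coefs)
```

The same selection idea applies to the logistic regression, where glm() replaces lm() and the AIC is computed from the binomial likelihood instead.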

D. Data Presentation

i. T-Tests

Results from the t-tests were presented in two ways. The first was by creating a table which counted the number of seasons that statistics showed significant difference between the leagues to three significance levels. There will be three tables for each statistic category, one for the entire era, 1904-1924, one for the Dead Ball Era, 1904-1919, and one for the Live Ball Era,

1920-1924. Each table will be split into two sections, one that displays times when the statistic was a larger number in the Negro Leagues and one that displays times when the statistic was a larger number in the Major Leagues. Each section will display the number of seasons within the era where the p-value was less than 0.01, between 0.01 and 0.05 and between 0.05 and 0.1.

Additionally there will be a table that displays the p-values for each statistic over the entirety of each era, along with the league where the statistic was a larger number.

The second way to present t-test data was through a heat map programmed in R (see

Appendix 3 for code). First the p-values for every test were compiled into a table, organized by year or era and the statistic tested. Next, the base 10 logarithm of each p-value was taken, and this value was signed based on which league average was larger for the given statistic, with positive values representing the Negro League average being larger than the Major League average. These values were used to create a heat map in R, using the ggplot2 package developed by Hadley Wickham. For each season and era, the map displays a colored block for each statistic. A blue block indicates that the Major League average was larger than the Negro

League average for that statistic in that season or era, while a red block indicates that the Negro

League average was larger than the Major League average for that statistic in that season or era.

The darker the color appears in the map, the more significant the difference between the two league averages. The map capped the signed log10(p-values) at negative three and three, so 0.001 is the smallest p-value discernible in the map; any smaller p-value appears the same. A light gray block indicates no difference between the two league averages in the statistic, and a dark gray block indicates that there was no data for the statistic.
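The transformation of p-values into heat map values can be sketched in Python as below. This is an interpretation of the description above, not the code from Appendix 3: it assumes the magnitude is the negative base 10 logarithm of the p-value (so that smaller p-values give larger magnitudes), that the sign is positive when the Negro League average is larger, and that values are capped at three. The function name is hypothetical.

```python
from math import log10


def signed_log_p(p_value, negro_larger, cap=3.0):
    """Signed, capped -log10(p); positive when the Negro League mean is larger."""
    if p_value <= 0:
        magnitude = cap  # guard against log10(0)
    else:
        magnitude = min(-log10(p_value), cap)
    return magnitude if negro_larger else -magnitude


# Hypothetical p-values (not study results).
for p, negro in ((0.2, True), (0.01, True), (0.00001, False)):
    print(p, negro, round(signed_log_p(p, negro), 3))
```

Under these assumptions, any p-value at or below 0.001 maps to the same darkest shade, red when the Negro League average is larger and blue otherwise.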

ii. Correlation Tests

Correlation tests in Approach I will be presented in a table displaying a statistic’s correlation coefficient for each season and era to the statistic being analyzed (R/G for batting and

ERA for pitching).

iii. Regressions

Results from regressions will be displayed and analyzed as they are returned in R.

III. Approach II

A. Data Collection

In Approach II, league totals for each season and era will be compiled into Excel spreadsheets. For Major League data, league totals were copied directly from baseball-reference.com. For Negro League data, team data was copied into worksheets for each season, this time with no restrictions on what data was included, then the totals for each season and era were summed up and compiled into an Excel worksheet.

B. Data Standardization

In Approach II, data will be standardized in the same manner that it is in Approach I.

Additionally, one new batting standardized statistic and five new pitching standardized statistics will be used for chi square tests. Those statistics are HBP% for batting and BB%, SO%, H%, HR% and HBP% for pitching.

C. Data Analysis

i. League Averages

Using the league totals, the standardized statistics for each season and era will be calculated for each league in order to compare league averages between the two leagues.

ii. Chi Square Tests

In Approach II, chi square tests will be run on the following statistics:

a. Batting – BB%, HBP%, RBI%, BA, 2B/AB, 3B/AB, HR/AB, OBP

b. Pitching – CG%, SHO%, SV%, ER%, BB%, SO%, H%, HR%,

HBP%

c. Fielding – Fld%

Each of these statistics can be considered a proportion statistic with events, successes and failures. For example, in BA, at bats are “events”. An at bat that ends in a hit is a “success” and an at bat that ends with no hit is a “failure”. The BA value is the proportion. Using the prop.test() function in R, chi square tests were applied to the above statistics to find significant differences in the proportions in each league.
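The comparison of two proportions can be sketched in Python as a chi square test on a 2×2 table of successes and failures. The sketch assumes the calculation follows what prop.test() is understood to do by default for two groups, namely a Pearson chi square statistic with Yates' continuity correction and one degree of freedom. The function name and the counts are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_chi_square(succ_a, events_a, succ_b, events_b, correct=True):
    """Return (chi square statistic, p-value) for two proportions, 1 df."""
    pooled = (succ_a + succ_b) / (events_a + events_b)
    observed = [succ_a, events_a - succ_a, succ_b, events_b - succ_b]
    expected = [events_a * pooled, events_a * (1 - pooled),
                events_b * pooled, events_b * (1 - pooled)]
    delta = abs(succ_a / events_a - succ_b / events_b)
    # Continuity correction, never larger than the observed discrepancy.
    yates = min(0.5, delta / (1 / events_a + 1 / events_b)) if correct else 0.0
    chi2 = sum((abs(o - e) - yates) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi square distribution with one degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical hit and at bat totals: the same .260 vs .250 gap at two sample sizes.
_, p_small = two_proportion_chi_square(260, 1000, 250, 1000)
_, p_large = two_proportion_chi_square(2600, 10000, 2500, 10000)
print(f"p (1,000 AB each) = {p_small:.3f}; p (10,000 AB each) = {p_large:.3f}")
```

In this hypothetical case the same difference in BA yields a smaller p-value when ten times as many at bats are observed.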

For each chi square test performed, the league which had a larger average value was recorded, as was the p-value measuring the significance in the difference of the sample proportions.

The null hypothesis for each test was that there is no significant difference between the two leagues in a given statistic.

D. Data Presentation

i. League Averages

The league totals and averages for each season and era will be displayed in data tables for each league. The tables will display totals of all the count data, as well as the calculated standardized statistics from Approach I. Additionally, time plots will be constructed in R. There will be a plot for each of the standardized statistics from Approach I. Each time plot displays a blue line for the Major League average and a red line for the Negro League average from 1904 through 1924. These plots will be utilized to analyze the trend of each statistic over time in each league.

ii. Chi Square Tests

Results of chi square tests will be displayed exactly how t-test results were displayed in

Approach I, with frequency tables and heat maps of signed log10(p-values).

A Note on Multiple Testing

As this study entails taking over 20 t-tests and/or chi square tests for each statistic tested, multiple testing must be considered. In over 20 tests, it should be expected that at least one test will display significance to p<0.05. Because of this, only statistics that show consistent significant differences will be considered conclusive data.
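The size of the multiple testing concern can be made concrete with a short Python sketch. It assumes, for illustration only, that the tests are independent and that the null hypothesis is true in every one of them; the season-by-season tests in this study are unlikely to be fully independent, so the figures are approximate. The function name is hypothetical.

```python
def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance that at least one of n independent true-null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


n_seasons = 21  # one test per season, 1904-1924
for alpha in (0.1, 0.05, 0.01):
    expected = n_seasons * alpha
    chance = prob_at_least_one_false_positive(n_seasons, alpha)
    print(f"alpha = {alpha}: expected false positives = {expected:.2f}, "
          f"P(at least one) = {chance:.2f}")
```

Under these assumptions, about one of 21 tests would be expected to reach p<0.05 by chance alone, which is why isolated significant seasons are not treated as conclusive.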

Results

I. Approach I – Team Data

a. Batting

i. T-tests

Table 1

TOTAL 1904-1924 (21 Seasons)

            Negro > MLB                MLB > Negro
STAT        p<0.1   p<0.05  p<0.01     p<0.1   p<0.05  p<0.01
R/G         2       5       8          0       0       0
BA          0       1       0          1       2       5
OBP         0       1       0          1       3       3
SLG         0       0       0          2       2       4
RBI%        0       0       0          1       4       12
BB%         2       2       3          1       0       0
2B/AB       0       0       0          0       3       8
3B/AB       0       0       0          2       2       6
HR/AB       0       1       0          1       3       0
HBP/G*      0       3       4          0       0       0
SH/G        0       1       1          0       0       0
SB/G        2       3       5          0       1       0

Table 1 displays the frequency of seasons over the 21 season span that a significant

difference was found for each statistic. The left side of the table shows the frequency of seasons

where the average value of each statistic was significantly greater in the Negro Leagues than in

the Major Leagues, while the right side of the table displays the opposite. Reference Appendix

4-XI-a and the “Batting T-test Heatmap” in Appendix 6-II for further information on batting t-test results.

ii. Correlations

Tables in Appendix 4-IX-b display correlation coefficients of BA, OBP and SLG to R/G for each season, as well as the entire era, the Dead Ball Era and the Live Ball Era. All correlation coefficients are positive and OBP correlated the strongest with R/G in most seasons for both leagues. OBP also had the largest correlation coefficient for every era in both leagues.

Correlation also grew stronger for each stat in the Live Ball Era, as both leagues show higher correlation coefficients for the Live Ball Era than for the Dead Ball Era. In the Negro Leagues, correlation coefficients inflated drastically.

iii. Regressions

Using the step() and lm() functions in R, two linear regressions were run to determine the best predictors for a team scoring runs. Each regression was run for the entire era, the Dead Ball

Era and the Live Ball Era. Reference sections I and II in Appendix 5 for R output from the following regressions.

For the first regression, R acted as the dependent variable to be described in terms of the independent variables BB, HBP, 1B, 2B, 3B, HR and SB. The step() function returns which subset of independent variables should be used to most accurately describe the dependent variable. A linear regression is then run using that subset of variables as the new set of independent variables. For the entire era, the regression for each league produced the following approximate linear models:

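\[ R_{\mathrm{MLB}} = -306.82 + 0.31 \times \mathrm{BB} + 0.54 \times \mathrm{HBP} + 0.41 \times \mathrm{1B} + 0.87 \times \mathrm{2B} + 1.19 \times \mathrm{3B} + 1.37 \times \mathrm{HR} + 0.29 \times \mathrm{SB} \]

\[ R_{\mathrm{Negro}} = -3.47 + 0.28 \times \mathrm{BB} + 0.40 \times \mathrm{1B} + 0.38 \times \mathrm{2B} + 0.83 \times \mathrm{3B} + 1.91 \times \mathrm{HR} + 0.19 \times \mathrm{SB} \]

The step() function in R determined HBP not to be a reliable predictor of R for the Negro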

Leagues. For the MLB regression, the coefficient for HBP has the lowest significance level at

p<0.01, while the other coefficients are significant to p<0.001. The regression results for the

Dead Ball Era, 1904-1919, produced the following approximate linear models:

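\[ R_{\mathrm{MLB}} = -292.51 + 0.27 \times \mathrm{BB} + 0.63 \times \mathrm{HBP} + 0.41 \times \mathrm{1B} + 0.96 \times \mathrm{2B} + 1.15 \times \mathrm{3B} + 1.20 \times \mathrm{HR} + 0.30 \times \mathrm{SB} \]

\[ R_{\mathrm{Negro}} = -1.81 + 0.16 \times \mathrm{BB} + 0.42 \times \mathrm{1B} + 0.45 \times \mathrm{2B} + 1.07 \times \mathrm{3B} + 1.79 \times \mathrm{HR} + 0.23 \times \mathrm{SB} \]

Once again, HBP was eliminated as a predictor for R for the Negro Leagues, and HBP’s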

coefficient has the lowest significance level at p<0.01. The regression results for the Live Ball

Era, 1920-1924, produced the following approximate linear models:

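\[ R_{\mathrm{MLB}} = -457.19 + 0.44 \times \mathrm{BB} + 0.51 \times \mathrm{1B} + 0.66 \times \mathrm{2B} + 1.49 \times \mathrm{3B} + 1.54 \times \mathrm{HR} \]

\[ R_{\mathrm{Negro}} = -9.22 + 0.44 \times \mathrm{BB} + 0.37 \times \mathrm{1B} + 0.31 \times \mathrm{2B} + 0.59 \times \mathrm{3B} + 1.94 \times \mathrm{HR} + 0.2 \times \mathrm{SB} \]

HBP was eliminated as a predictor from both leagues’ regressions. SB was also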

eliminated from the MLB’s regression. The Negro League regression has two predictors with

coefficients that are not significant to p<0.1, 2B and SB, but neither was eliminated by the step() function.

The step() function eliminating HBP as a reliable predictor of R allowed for all 21 seasons to be considered in these regressions. Due to missing data, all other regressions will be

run from 1904-1923, using 1920-1923 for the Live Ball Era.

The second batting regression is a linear regression to model the dependent variable, R/G, in terms of the independent variables BA, OBP and SLG. For the entire era, the regression for each league produced the following approximate linear models:

\[ (R/G)_{\mathrm{MLB}} = -4.12 + 16.15 \times \mathrm{OBP} + 8.59 \times \mathrm{SLG} \]

\[ (R/G)_{\mathrm{Negro}} = -2.09 + 14.54 \times \mathrm{OBP} + 6.71 \times \mathrm{SLG} \]

For the Dead Ball Era, the regression for each league produced the following approximate linear models:

\[ (R/G)_{\mathrm{MLB}} = -4.18 + 15.21 \times \mathrm{OBP} + 9.72 \times \mathrm{SLG} \]

\[ (R/G)_{\mathrm{Negro}} = -1.98 + 14.45 \times \mathrm{OBP} + 6.54 \times \mathrm{SLG} \]

The Live Ball Era produced the following results for the regression.

\[ (R/G)_{\mathrm{MLB}} = -5.63 + 17.4 \times \mathrm{OBP} + 11.15 \times \mathrm{SLG} \]

\[ (R/G)_{\mathrm{Negro}} = -2.58 + 12.33 \times \mathrm{OBP} + 9.93 \times \mathrm{SLG} \]

The stepwise regression eliminated BA as a reliable predictor of R/G for both leagues for all three eras.
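To show how such a model is read, the Python sketch below evaluates the whole-era R/G models at a given OBP and SLG. The coefficients are those reported in this section, with the intercepts taken as negative. The inputs are the 1921 league-average OBP and SLG quoted in the Discussion; applying league averages to a model fit on team data is done here purely for illustration, and the function and dictionary names are hypothetical.

```python
# Whole-era coefficients as reported in this section (intercepts read as negative).
RG_MODELS = {
    "MLB": {"intercept": -4.12, "OBP": 16.15, "SLG": 8.59},
    "Negro": {"intercept": -2.09, "OBP": 14.54, "SLG": 6.71},
}


def predicted_runs_per_game(league, obp, slg):
    """Evaluate the fitted linear model for one league."""
    model = RG_MODELS[league]
    return model["intercept"] + model["OBP"] * obp + model["SLG"] * slg


# 1921 league-average OBP and SLG as quoted in the Discussion.
mlb_1921 = predicted_runs_per_game("MLB", obp=0.348, slg=0.403)
negro_1921 = predicted_runs_per_game("Negro", obp=0.333, slg=0.368)
print(f"MLB: {mlb_1921:.2f} R/G, Negro Leagues: {negro_1921:.2f} R/G")
```

In this illustration the Negro League model returns the higher R/G despite the lower OBP and SLG inputs, which is in line with the t-test results for R/G.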

b. Pitching

i. T-Tests

Table 2

TOTAL 1904-1924 (21 Seasons)

            Negro > MLB                MLB > Negro
STAT        p<0.1   p<0.05  p<0.01     p<0.1   p<0.05  p<0.01
ERA         2       4       0          0       1       0
SHO%        0       0       0          1       5       2
SV%         0       0       0          1       6       10
IP/G        0       0       0          0       2       10
WHIP        0       2       0          3       0       0
H/9         1       0       0          1       0       3
HR/9*       0       0       0          0       6       1
BB/9        2       4       2          0       0       0
SO/9        2       2       9          1       1       0
SO/BB       4       0       5          1       0       2
CG%         5       1       12         0       0       0
ER%         0       0       0          3       0       17
WP/9        0       0       0          1       4       10
BF/9        0       1       13         0       0       0
R/9         4       4       7          0       0       0
HBP/9       1       4       1          0       0       0

Table 2 is set up in the same format as Table 1, but is compiled from the results of t-tests

performed on pitching statistics. Refer to Appendix 4-XII-b and the “Pitching T-test Heatmap” for further information on pitching t-tests.

ii. Correlation

See Appendix 4-IX-a for tables of correlation coefficients between ERA and BB/9, SO/9,

H/9 and HR/9. SO/9 generally had negative correlation coefficients with ERA, but not always, and H/9 generally provided the strongest correlation to ERA for both leagues.

iii. Regression

Using the step() and lm() functions in R, a linear regression was run to determine the best predictors for a team’s pitchers to prevent runs. The regression was run for the entire era, the Dead Ball Era and the Live Ball Era. Reference section III in Appendix 5 for R output from the following regressions.

The regression attempted to describe the dependent variable ERA in terms of BB/9,

SO/9, H/9 and HR/9. Running the regression on the entire era produced the following approximate models:

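\[ \mathrm{ERA}_{\mathrm{MLB}} = -2.7 + 0.33 \times \mathrm{BB/9} + 0.53 \times \mathrm{H/9} + 1.11 \times \mathrm{HR/9} \]

\[ \mathrm{ERA}_{\mathrm{Negro}} = -3.04 + 0.47 \times \mathrm{BB/9} + 0.54 \times \mathrm{H/9} + 1.52 \times \mathrm{HR/9} \]

Running the regression on the Dead Ball Era produced the following approximate models:

\[ \mathrm{ERA}_{\mathrm{MLB}} = -2.74 + 0.31 \times \mathrm{BB/9} + 0.54 \times \mathrm{H/9} + 1.43 \times \mathrm{HR/9} \]

\[ \mathrm{ERA}_{\mathrm{Negro}} = -2.74 + 0.47 \times \mathrm{BB/9} + 0.5 \times \mathrm{H/9} + 1.44 \times \mathrm{HR/9} \]

Running the regression on the Live Ball Era produced the following approximate models:

\[ \mathrm{ERA}_{\mathrm{MLB}} = -2.48 + 0.44 \times \mathrm{BB/9} - 0.1 \times \mathrm{SO/9} + 0.52 \times \mathrm{H/9} + 0.7 \times \mathrm{HR/9} \]

\[ \mathrm{ERA}_{\mathrm{Negro}} = -2.74 + 0.62 \times \mathrm{BB/9} + 0.12 \times \mathrm{SO/9} + 0.65 \times \mathrm{H/9} + 0.66 \times \mathrm{HR/9} \]

The Live Ball Era was the only era where SO/9 was not eliminated in the stepwise regression.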

c. Fielding

i. T-tests

Table 3

TOTAL 1904-1923 (20 Seasons)

            Negro > MLB                MLB > Negro
STAT        p<0.1   p<0.05  p<0.01     p<0.1   p<0.05  p<0.01
Inn/G       0       0       0          1       3       16
FLD%        0       0       0          0       0       20
E/9         0       1       19         0       0       0
DP/9        0       0       20         0       0       0
PB/9        0       3       0          0       0       2
SB/9*       0       3       6          1       0       0
RF          0       0       0          1       1       0

Table 3 has the same format as Tables 1 and 2, but considers data from t-tests performed on fielding data. Reference Appendix 4-VII-c and the “Fielding T-test Heatmap” in Appendix 6-

II for further information on the tests.

ii. Logistic Regression

Logistic regression entails describing the natural log of odds in a linear model. For this

study, this will be utilized with the glm() and step() functions in R to find a logistic model for

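W% in terms of BA, OBP, SLG, BB/9, SO/9, H/9, HR/9 and Fld%. The odds of W% are

\[ \frac{W\%}{1 - W\%}, \]

and the model takes the form

\[ \ln\left(\frac{W\%}{1 - W\%}\right) = f(\mathrm{BA}, \mathrm{OBP}, \mathrm{SLG}, \mathrm{BB/9}, \mathrm{SO/9}, \mathrm{H/9}, \mathrm{HR/9}, \mathrm{Fld\%}). \]

Consider f to be a linear combination of the independent variables. From this,

\[ \frac{W\%}{1 - W\%} = e^{f} \]

\[ W\% = e^{f}\,(1 - W\%) \]

\[ W\%\,(1 + e^{f}) = e^{f} \]

\[ W\% = \frac{e^{f}}{1 + e^{f}}. \]

The results from the entire era produce the following models: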

\[ W\%_{\mathrm{MLB}} = \frac{e^{f_{\mathrm{MLB}}}}{1 + e^{f_{\mathrm{MLB}}}}, \quad f_{\mathrm{MLB}} = -8.05 + 7.25 \times \mathrm{OBP} + 5.62 \times \mathrm{SLG} - 0.33 \times \mathrm{H/9} - 0.20 \times \mathrm{BB/9} - 0.49 \times \mathrm{HR/9} + 7.62 \times \mathrm{Fld\%} \]

\[ W\%_{\mathrm{Negro}} = \frac{e^{f_{\mathrm{Negro}}}}{1 + e^{f_{\mathrm{Negro}}}}, \quad f_{\mathrm{Negro}} = -6.17 + 11.47 \times \mathrm{OBP} - 0.32 \times \mathrm{H/9} - 0.22 \times \mathrm{BB/9} + 6.27 \times \mathrm{Fld\%} \]

The results from the Dead Ball Era produce the following results:

\[ W\%_{\mathrm{MLB}} = \frac{e^{f_{\mathrm{MLB}}}}{1 + e^{f_{\mathrm{MLB}}}}, \quad f_{\mathrm{MLB}} = -6.71 + 7.99 \times \mathrm{OBP} + 5.80 \times \mathrm{SLG} - 0.35 \times \mathrm{H/9} - 0.21 \times \mathrm{BB/9} - 0.72 \times \mathrm{HR/9} + 6.11 \times \mathrm{Fld\%} \]

\[ W\%_{\mathrm{Negro}} = \frac{e^{f_{\mathrm{Negro}}}}{1 + e^{f_{\mathrm{Negro}}}}, \quad f_{\mathrm{Negro}} = -5.55 + 11.92 \times \mathrm{OBP} - 0.32 \times \mathrm{H/9} - 0.26 \times \mathrm{BB/9} + 5.63 \times \mathrm{Fld\%} \]

The step() function considered BA and SO/9 not to be reliable predictors of W% for either league in the Dead Ball Era or the total era, and for the Negro Leagues, SLG and HR/9 were also eliminated as independent variables in the regressions. The regression for the Live Ball Era produces the following results:

\[ W\%_{\mathrm{MLB}} = \frac{e^{f_{\mathrm{MLB}}}}{1 + e^{f_{\mathrm{MLB}}}}, \quad f_{\mathrm{MLB}} = -10.36 + 4.76 \times \mathrm{OBP} + 4.44 \times \mathrm{SLG} - 0.28 \times \mathrm{H/9} - 0.2 \times \mathrm{BB/9} + 0.14 \times \mathrm{SO/9} + 10.27 \times \mathrm{Fld\%} \]

\[ W\%_{\mathrm{Negro}} = \frac{e^{f_{\mathrm{Negro}}}}{1 + e^{f_{\mathrm{Negro}}}}, \quad f_{\mathrm{Negro}} = -9.67 + 11.56 \times \mathrm{OBP} - 0.34 \times \mathrm{H/9} + 5.63 \times \mathrm{Fld\%} \]

In the MLB, BA and HR/9 were eliminated as predictors, while in the Negro Leagues,

BA, SLG, BB/9, SO/9 and HR/9 were all eliminated, leaving only OBP, H/9 and Fld% as

predictors of W%.
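The way a fitted logistic model converts team statistics into a winning percentage can be sketched in Python. The coefficients are the whole-era MLB values reported above, with signs as read from the R output summary in this section. The team profile is hypothetical and the function names are not from Appendix 3.

```python
from math import exp, log

# Whole-era MLB coefficients as reported in this section.
MLB_WHOLE_ERA = {
    "(Intercept)": -8.05, "OBP": 7.25, "SLG": 5.62,
    "H/9": -0.33, "BB/9": -0.20, "HR/9": -0.49, "Fld%": 7.62,
}


def log_odds(model, team):
    """Linear predictor f: intercept plus coefficient times statistic."""
    return model["(Intercept)"] + sum(
        coef * team[name] for name, coef in model.items() if name != "(Intercept)")


def win_probability(model, team):
    """Inverse logit: W% = e^f / (1 + e^f)."""
    f = log_odds(model, team)
    return exp(f) / (1 + exp(f))


# Hypothetical team profile (not study data).
hypothetical_team = {"OBP": 0.320, "SLG": 0.350, "H/9": 8.8,
                     "BB/9": 2.9, "HR/9": 0.15, "Fld%": 0.960}
f = log_odds(MLB_WHOLE_ERA, hypothetical_team)
w = win_probability(MLB_WHOLE_ERA, hypothetical_team)
print(f"f = {f:.4f}, W% = {w:.3f}")
```

Positive coefficients (OBP, SLG, Fld%) raise the log odds of winning, while the pitching rates H/9, BB/9 and HR/9 lower them.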

II. Approach II – League Data

a. League Averages

See Appendix 4 sections I through VI for total league averages for each standardized statistic studied for each year in each category. Additionally, see Appendix 6-I sections a

through c for time plots of each statistic studied over the time span 1904-1924.

b. Chi Square Tests

i. Batting

Table 4

TOTAL 1904-1924 (21 Seasons)

            Negro > MLB                MLB > Negro
STAT        p<0.1   p<0.05  p<0.01     p<0.1   p<0.05  p<0.01
BB%         0       0       10         1       0       1
HBP%        0       3       8          0       0       0
RBI%        0       0       0          0       2       16
BA          0       2       0          0       5       9
2B/AB       0       0       0          1       5       12
3B/AB       0       0       0          1       3       9
HR/AB       0       1       1          0       4       2
OBP%        3       1       2          1       2       7

Table 4 is formatted the same as Table 1, however it considers statistics that were tested

for differences in proportions using a chi square test. See Appendix 4-VIII-a and the “Batting

Chi Square Heatmap” in Appendix 6-II for more information and results on the chi square tests.

ii. Pitching

Table 5

TOTAL 1904-1924 (21 Seasons)

            Negro > MLB                MLB > Negro
STAT        p<0.1   p<0.05  p<0.01     p<0.1   p<0.05  p<0.01
CG%         0       5       15         0       0       0
SHO/G       0       0       0          3       3       1
SV/G        0       0       0          2       7       7
ER%         0       0       0          0       1       19
BB%         0       1       9          0       0       3
SO%         3       0       14         0       0       3
H/BF        0       0       0          2       0       14
HR/BF*      0       1       2          1       1       5
HBP/BF      9       1       7          0       0       0

Table 5 is the same format as Table 4, however it considers results from chi square tests

for proportion differences from pitching statistics. See Appendix 4-VIII-b and the “Pitching Chi

Square Map” in Appendix 6-II for more information and results from chi square tests.

iii. Fielding

Table 6

TOTAL 1904-1923 (20 Seasons)

            Negro > MLB                MLB > Negro
STAT        p<0.1   p<0.05  p<0.01     p<0.1   p<0.05  p<0.01
FLD%        0       0       0          0       0       20

Table 6 displays results from chi square tests run on the difference in proportions

between the Negro and Major Leagues. Fld% was the only fielding data analyzed in the chi

square tests, and the results showed that Major League teams had significantly (p<0.01) higher

fielding percentages for all 20 seasons where fielding data was available in the study.

Discussion

T-tests and Chi Square

The results of t-tests in Table 1 show that in 15 of the 21 seasons, Negro League teams scored significantly more runs per game than teams in the MLB (p<0.1), while the MLB never scored significantly more runs than the Negro Leagues. Referencing Appendix 4-VII-a, all eight seasons where t-tests showed extreme significance (p<0.01) favoring the Negro Leagues in R/G came during the 16 seasons studied during the Dead Ball Era. Over the five seasons studied during the Live Ball Era, t-tests showed a significant difference twice (p<0.1), with one of those seasons showcasing a very significant difference (p<0.05).

The “Batting T-test Heatmap” in Appendix 6-II shows that in each of the six final seasons of the Dead Ball Era (1914-1919), Negro League run production was significantly greater to p<0.05, and to p<0.01 in five of them. However, at the start of the Live Ball Era in 1920, there was no significant difference in the statistic, with the MLB team mean for R/G slightly greater than the Negro League team mean.

Table 1 shows that BA, OBP and SLG each significantly favored the MLB in at least one-third of the seasons studied. The “Batting T-test Heatmap” shows that these statistics most significantly favored the MLB during the early seasons of the study and during the Live Ball Era at the end of the study. In each of the 1904 and 1905 seasons, BA, OBP and SLG all favored the

MLB to significance p<0.01 and in 1907 Major League BA and SLG were both significantly greater to p<0.05. BA did not significantly favor the MLB in any season from 1908-1919, and

OBP and SLG each only did once over that span (p<0.1). In 1914, OBP and BA both significantly favored the Negro Leagues (p<0.05), the only

year where either statistic was significantly in favor of the Negro Leagues. However, the chi square tests suggest that Negro League OBP was significantly higher to at least p<0.1 in six seasons and was significantly greater to p<0.01 in two seasons. Additionally, chi square results show two seasons where BA is favored significantly (p<0.05) in the Negro

Leagues. This or any discrepancy between t-tests and chi square tests may be explained by a few

different things. For one, data that was omitted in the t-tests may be present in chi square tests,

because no criteria were set for whether a team’s data could be used in Approach II. Additionally,

the chi square test is treating shorter events as observations. In the case of OBP, any AB, BB,

HBP or SF is considered an observation where either a success (a H, BB or HBP) or failure occurred. For a full MLB season, that’s nearly 100,000 observations being considered, and potentially over 10,000 for a Negro League season. With such a large sample size of observations, it takes less of a difference between leagues to demonstrate significance.

Examining the batting heatmaps for both t-tests and chi square tests, it is seen that the two methods agree that there was a period from 1914 through 1918 where, significant or not, the

Negro Leagues typically had a higher OBP or BA.

This agreement held throughout the study. For example, RBI% favored the MLB with extreme significance (p<0.01) in four of the five Live Ball Era seasons and in 12 of 21 total seasons.

From the “Batting T-test Heatmap,” the four seasons that showed no significant difference all fell between 1914 and 1919. The chi square tests only show three seasons where the RBI% difference was not significant; however, those three seasons also all fall between 1914 and 1919.

Examining the heatmaps in Appendix 6-II shows that, although chi square tests usually displayed a higher significance level, the t-tests and chi square tests supported each other when analyzing

the same, or similar (e.g. HR/9 in t-test, HR% in chi square), statistics.

In addition to t-test and chi square results supporting each other, results between different

categories, batting, pitching and fielding, supported each other. For an obvious example, view

the timeplots for BB% and BB/9 below.

These plots, though on different scales, are nearly identical in shape, as they should be.

Other pairs of statistics from different categories, such as HR/AB and HR/9, BA and H/9, SB/G

and SB/9, R/G and R/9 also have similarly shaped timeplots.

An early question presented in this study was how the early Negro League teams scored more runs per game when most hitting statistics suggest better hitting in the MLB. This question too can be answered using other categories. First, consider the time plots for RBI% and ER% below.

Both of these plots follow a similar model, with the MLB value slowly and steadily

increasing, with the Negro League value also increasing, slowly approaching the MLB value.

These plots suggest that these two statistics are related to each other in some way. Examining

the fielding data, there are two more statistics that share something in common with these.

It can immediately be seen that the plot for Fld% follows a similar pattern as RBI% and

ER%, while E/9 is the mirror image of the model. As the errors committed in the Negro Leagues

go down, the Fld%, RBI% and ER% all go up. Analyzing the plots for R/G, it is seen that the

Negro League scoring averages don’t jump drastically until 1921 despite consistent increases in offensive production in statistics such as BA and OBP over the years. Additionally, the plot for

ERA shows not only that pitchers gave up more earned runs, but that hitters earned more runs

over the course of the era.

The difference in fielding acts as the primary difference between the leagues, which explains the anomaly of the early years of the Negro Leagues showcasing greater run production, despite lesser hitting production. The discrepancies in fielding statistics could also explain some of the other significant statistical differences in the data. One example is that shutouts were far more rare in the Negro Leagues than in the MLB, despite relatively insignificant differences in pitching statistics such as ERA and WHIP, and significant differences favoring the Negro

Leagues in SO/9 and SO/BB. Poorer Fld% and committing errors more often increase the potential for unearned runs as seen above, decreasing the likelihood of shutouts. For another example, Negro League teams consistently faced more batters per nine innings than MLB teams, even in seasons where MLB teams recorded hits more often. The Negro Leagues’ relatively low

Fld% and high E/9 allow more baserunners to reach base. In addition, Negro League teams generally drew more walks over the era and were hit by more pitches. The result of this is more baserunners reaching safely in the Negro Leagues. Even when MLB teams were superior at getting hits, walks, hit by pitches and errors in the Negro Leagues resulted in more baserunners and more batters faced.

This may also contribute to some other significantly different statistics. For one, SB/G and SB/9 generally favored the Negro Leagues, especially in the early seasons studied. With more baserunners, there are more opportunities to steal a base. By the same logic, with more

baserunners, there are more opportunities to execute double plays, which may contribute to the

extreme significance (p<0.01) favoring the Negro Leagues’ DP/9 values in all 20 seasons

studied. The large amount of stolen bases in the Negro Leagues may also contribute to double

plays. Assuming that Negro League teams not only stole more bases, but attempted to steal more

bases, than MLB teams, along with the fact that Negro League teams also generally recorded

more strikeouts, the potential for “strike-him-out throw-him-out” double plays may be increased, however it is difficult to make this assertion without data for CS. Another possibility, although difficult to confirm, is that baserunners in the Negro Leagues were more aggressive and risked being thrown out in caught line drive and situations.

Another interesting difference in the data sets is that Negro League teams both walked and struck out significantly more often. One hypothesis to attempt to explain this is that Negro

League players may have strategized to walk more often. It is possible that Negro Leaguers would take more pitches, or intentionally attempt to foul pitches off. If this were the case, fewer events involving balls hit in play would occur, and more events without balls hit in play would occur, including BB, SO and HBP. Another hypothesis is that with more baserunners and a higher potential for errors, pitchers may have been more careful in pitching to hitters, feeling pressured to strike them out and not allow the chance for an error.

The Dead Ball Era and Offensive Progression

One interesting thing about analyzing this data is the context of the era. 1920 is known as the end of the Dead Ball Era in Major League Baseball, when hitting statistics and run production began to improve. The study shows that both leagues progressed offensively at very similar rates and times. The first figure to look at when considering this is the timeplot for ERA, displayed above. Although ERA is a pitching statistic, it is a perfect statistic to consider for run production purely from hitting. As runs that scored with the aid of an error are not considered in this statistic, an increasing ERA truly indicates an increase in hitters producing runs. Looking at the plot, it can be seen that the Negro Leagues started with a significantly lower league ERA in the early seasons of the study, but that the Negro League ERA quickly caught up to that of the

MLB and continued to progress similarly to the MLB through the end of the Dead Ball Era. In the MLB, league average ERA jumped from 3.07 to 3.46 from 1919 to 1920 and then boomed to 4.03 in the 1921 season. In the Negro Leagues, league average ERA actually decreased slightly from 1919 to 1920, going from 3.42 to 3.35, but experienced a similar spike as the

MLB in 1921, jumping to 4.05.

Next, consider the timeplots for BA, OBP and SLG.

Like the ERA timeplot, these timeplots show that the Negro Leagues’ offensive production was significantly lower in the early years of this study. They also show that Negro League production in each offensive statistic progressed quickly, catching up to MLB production and rising out of the Dead Ball Era at a similar time.

In the MLB, the average slash line (BA/OBP/SLG) jumped from .263/.322/.348 in 1919 to .276/.335/.372 in 1920 to .291/.348/.403 in 1921. In the Negro Leagues, the jump came a year later than in the MLB, as .251/.328/.338 in 1919 remained about the same in 1920 at .251/.319/.322. However, the line jumped to .270/.333/.368 in 1921 and again to .277/.340/.384 in 1922. This data shows that while the Negro Leagues were slightly slower in transitioning out of the Dead Ball Era, they did follow the Major Leagues in doing so and in a similar fashion.

The final statistic to examine in considering the increase in offensive production out of the Dead Ball Era is HR/AB. While production in other extra base hits also increased, the most significant depiction of power becoming more prevalent in baseball is the rise of the home run.

After one look at the plot it is easy to see that the two leagues mirrored each other in HR production over the span of the study. The plot shows that Negro League data is more erratic and unpredictable; however, that is to be expected with smaller sample sizes of teams and games. In 1918, Major Leaguers averaged 0.0035 HR/AB (corresponding to 286 AB/HR) and Negro Leaguers averaged 0.0042 HR/AB (236 AB/HR). Three years later, in 1921, both the Major and Negro Leagues had more than doubled their home run production, at 0.011 HR/AB (91 AB/HR) and 0.0094 HR/AB (107 AB/HR) respectively, with both leagues peaking in the study at 0.0124 HR/AB (81 AB/HR), in 1922 for the MLB and in 1923 for the Negro Leagues.
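The AB/HR figures in parentheses are simply reciprocals of the HR/AB rates. As a minimal sketch, the conversion can be reproduced in R from the four-decimal rates printed in the Appendix 4 batting tables; the vector names below are illustrative only.

```r
# HR/AB rates as printed (rounded to four decimals) in the Appendix 4 batting tables
hr_per_ab <- c(MLB_1918 = 0.0035, NeL_1918 = 0.0042,
               MLB_1921 = 0.0110, NeL_1921 = 0.0094,
               peak     = 0.0124)

# AB/HR is the reciprocal of HR/AB
ab_per_hr <- round(1 / hr_per_ab)
print(ab_per_hr)
```

Working from the rounded rates gives 286, 238, 91, 106 and 81. The two Negro League values differ slightly from the 236 and 107 quoted in the text, presumably because the text was computed from unrounded rates; that explanation is an assumption, not something the source states.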

After examining run production through ERA, hitting production through slash lines and power production through home run rate, it is evident that both leagues experienced very similar increases in offensive production transitioning out of the Dead Ball Era.

Correlations and Regressions

Correlation coefficients between R/G and BA, OBP and SLG, found in Appendix 4-IX-b, show that, for both leagues, OBP correlated the most strongly with R/G, with BA and SLG both also strongly correlating with R/G. Although this does not hold for every season (for the 1913 Negro League season, R/G and OBP correlated only weakly, with a correlation coefficient of 0.29), most seasons and the totals for each era show OBP with the strongest correlation to scoring runs.

Stepwise multiple linear regressions run for each league in each era (Dead Ball, Live Ball and Total) affirmed that OBP is the most important linear predictor of R/G. The regression was set up to find the best subset of predictors of R/G from BA, OBP and SLG. In all six regressions, BA was eliminated and R/G was expressed in terms of OBP and SLG. Also in all six regressions, OBP was assigned the largest coefficient, suggesting that a percentage point increase in OBP raises run scoring more than a percentage point increase in SLG.
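To make the procedure concrete, the following is a minimal sketch of a stepwise regression of R/G on BA, OBP and SLG, following the lm() and step() template in Appendix 3. The data are simulated, and the coefficients used to generate them (17 for OBP, 10 for SLG, intercept of -5) are hypothetical values chosen for illustration; none of this reproduces the study's data or results.

```r
set.seed(1)  # simulated team-seasons only; not the study's data
n <- 80
teams <- data.frame(
  OBP = runif(n, 0.29, 0.35),
  SLG = runif(n, 0.30, 0.40)
)
teams$BA <- teams$OBP - runif(n, 0.05, 0.07)

# Hypothetical relationship used only to generate example R/G values
teams$RG <- -5 + 17 * teams$OBP + 10 * teams$SLG + rnorm(n, sd = 0.2)

# Full model, then AIC-based stepwise selection (step() works backward by default)
full <- lm(RG ~ BA + OBP + SLG, data = teams)
best <- step(full, trace = 0)
print(coef(best))
```

Passing the data frame through the `data` argument, rather than using attach(), is a design choice made here to keep the example self-contained.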

In the Live Ball Era, however, the coefficient for SLG was larger for both leagues (11.151 for the MLB and 9.934 for the Negro Leagues) than it had been in the other eras. In addition, in the Negro Leagues, the coefficient for OBP was smaller for the Live Ball Era, at 12.33, than it had been in any other era. This indicates that across baseball, and particularly in the Negro Leagues, slugging became more important to run scoring.

Stepwise multiple linear regressions were also run to model R in terms of the best subset of BB, HBP, 1B, 2B, 3B, HR and SB for each era. The test was designed to determine the run values of game events in which players either reach base or advance a base. While the results were interesting, they are also volatile. The output of each regression is an equation that describes a team’s total number of runs scored in a season based on its season total of each independent statistic. Since MLB teams played much longer, well-documented seasons, MLB teams scored many more runs and collected many more of each independent statistic than Negro League teams did. For this reason, the intercept of the MLB regressions is always a much more negative number. The coefficients, however, are in theory the number of runs each event is worth on average. The Negro League regressions valued home runs much more highly than any other event, at nearly two runs each. The next highest coefficient in any of the Negro League regressions is 3B at 1.07, and only 3B and HR ever exceed 0.5. In the MLB, more evenly distributed weights were found for each event. This may be a result of more baserunners in the Negro Leagues, causing home runs to be on average more valuable than in the MLB.
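For reference, the general form of the model described above can be written out as follows. The symbols $\beta_0$ (the intercept), $\beta_x$ (the run value of event $x$) and $\varepsilon$ (the error term) are notation introduced here for illustration; after stepwise selection, some of the terms may have been dropped from a given league's or era's equation.

```latex
R = \beta_0 + \beta_{BB}\,BB + \beta_{HBP}\,HBP + \beta_{1B}\,1B + \beta_{2B}\,2B
    + \beta_{3B}\,3B + \beta_{HR}\,HR + \beta_{SB}\,SB + \varepsilon
```

Read this way, a Negro League home run coefficient of nearly two means that each additional home run in a team's season total is associated with nearly two additional runs, holding the other events fixed.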

Correlation and regression tests were also run on ERA to determine predictors of pitcher run prevention. Appendix 4-IX-a contains tables of correlation coefficients between ERA and BB/9, SO/9, H/9 and HR/9 for each league. Correlation coefficients between ERA and SO/9 are generally negative, although not always. The tests also showed that H/9 consistently had the strongest correlation to ERA. Stepwise multiple linear regressions were also run for each league in each era to describe ERA in terms of the best subset of H/9, HR/9, BB/9 and SO/9. In both the entire era and the Dead Ball Era, SO/9 was excluded from both leagues’ regressions; however, it was included as an independent variable in each league’s regression during the Live Ball Era. The Negro Leagues’ coefficient for SO/9 in the Live Ball Era is positive, suggesting that strikeouts were associated with a higher ERA for a pitcher. The results suggest that HR/9 was much more detrimental to ERA in the Dead Ball Era than in the Live Ball Era and that BB/9 was generally more detrimental to ERA in the Negro Leagues than in the MLB.

Finally, a stepwise logistic regression was run using independent variables from batting, pitching and fielding data in order to create a model that describes W% in terms of BA, OBP, SLG, H/9, HR/9, BB/9, SO/9 and Fld% for each league in each era. As OBP was found to be the most important predictor of run scoring, it was expected, and the results affirmed, that OBP would receive the greatest coefficient of any offensive statistic (in all three of the Negro League logistic regressions, OBP was the only offensive statistic in the model). MLB models included SLG in every era and HR/9 in the Total and Dead Ball eras. The results also showed that Fld% was an important predictor of winning games and that giving up hits was more detrimental to W% than walking batters.
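As an illustration of what a logistic model of W% looks like in R, the sketch below fits a binomial GLM to simulated win and loss totals. It is one possible formulation, not the study's own code: the data, the 150-game season length, the generating coefficients and the choice to model wins and losses as counts with `family = binomial` are all assumptions made for this example.

```r
set.seed(2)  # simulated team-seasons only; not the study's data
n <- 60
teams <- data.frame(
  OBP = runif(n, 0.29, 0.35),
  Fld = runif(n, 0.93, 0.97),
  G   = 150                      # hypothetical season length
)

# Hypothetical relationship used only to generate example win totals
p_win <- plogis(-20 + 25 * teams$OBP + 12.5 * teams$Fld)
teams$W <- rbinom(n, size = teams$G, prob = p_win)
teams$L <- teams$G - teams$W

# Binomial GLM with logit link: models W / (W + L), i.e. W%
fit <- glm(cbind(W, L) ~ OBP + Fld, family = binomial, data = teams)
teams$W_pct_hat <- fitted(fit)   # predicted winning percentages, between 0 and 1
print(round(coef(fit), 2))
```

Because the response is supplied as counts of wins and losses, each team-season is weighted by its number of games, which matters when season lengths differ as much as they do between the two leagues.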

Completeness of Data

Certain data demonstrated that the Negro League database is incomplete. For one, consider the two timeplots below of IP/G and Inn/G (Inn/G is a fielding statistic, while IP/G is a pitching statistic).

Both of these charts show not only that data was incomplete in the Negro League Database, but that it became more incomplete over time. Another statistic from the Negro Leagues confirms this. GS% is a statistic that was used in this study to give the proportion of known games in the Negro Leagues for which the Negro League Database has data compiled and available.

This chart also demonstrates that incomplete data becomes more prevalent over the years of the study. The reason the incomplete data comes at the end of the study is that the early years in this study mostly contain data from the Cuban Winter and Summer Leagues. These were the premier leagues in Cuba, and thus their results were often published and meticulously compiled. In the early 1920s, on the other hand, most of the data comes from the Negro National League, or NNL. These were the first few years of the NNL, and with Major League Baseball very popular in the United States, records of the league are not as readily available. While data is proportionally less complete in the later years of this study, NNL seasons contained more teams and games than did Cuban League seasons. Because of this, data from the end of the study provided a larger sample size, despite being less complete overall.

While the seamheads.com Negro League Database will likely forever be incomplete, the contributors to the Database constantly research and update it with new seasons, teams, players and games. The more data uploaded into the database, the more cohesive and well-rounded a statistical look at the Negro Leagues becomes possible. With future updates, this study, as well as others like it, will undoubtedly continue. In the future the study will look into individual player and team data to determine the best individuals and teams to play in the Negro Leagues and what made them successful, along with players and teams they may have emulated.

References

Baseball Rules Chronology. (2006) Chronology: 1900-Present. [Online]. Available at: http://baseballlibrary.com (Accessed: 15 April 2013).

Figueredo, Jorge S. (2003), Cuban Baseball: A Statistical History, 1878–1961, Jefferson, North Carolina: McFarland & Company, ISBN 078641250X

MLB History. (2013) MLB History. [Online]. Available at: http://baseball-reference.com (Accessed: 1 August 2012).

The Negro League Database. (2013) The Negro League Database. [Online]. Available at: http://www.seamheads.com (Accessed: 1 August 2012).

Official Rules: 10.00 The Official Scorer. (2013) Official Rules: 10.00 The Official Scorer. [Online]. Available at: http://mlb.mlb.com (Accessed: 21 April 2013).

Acknowledgements

I would like to thank Dr. Joe Watkins for advising me and supervising my work on this project, while providing insight and suggestions for statistical analysis.

I would like to thank Eric New for advising me in compiling and writing the paper.

I would like to thank David Rockoff for advising me in compiling and writing the paper, as well as providing and explaining R code to me that was utilized in the study.

Finally, I would like to acknowledge Hadley Wickham, the creator of R package ggplot2, which was utilized in the development of heatmap graphics.

Appendix 1 - Glossary of Leagues and Teams

The glossaries that follow will be organized in the following format:

League Abbreviation – League Name, Summer/Winter, Seasons with statistics that were used in the study
    Team Abbreviation – Team Name(s), Seasons with statistics from this league that were used in the study

I. The Negro Leagues

AN – Asociación Nacional de Béisbol de la República de Cuba, Winter, 1915
    AP – Almendares Park, 1915
    HP – Habana Park, 1915
    SFP – San Francisco Park, 1915

ASM – La Temporada Americana: Major Leagues in Cuba, Winter, 1908-1913
    ALM – Almendares, 1908-1910, 1912, 1913
    APK – Almendares Park, 1911
    AST – All Stars, 1909
    BRG – Brooklyn Royal Giants, 1908
    BRO* – Brooklyn Dodgers, 1913
    CIN* – Cincinnati Reds, 1908
    DET* – Detroit Tigers, 1909, 1910
    HAB – Habana, 1908, 1909, 1912, 1913
    HPK – Havana Park, 1911
    NY1* – New York Giants, 1911
    PHA* – Philadelphia Athletics, 1910, 1912
    PHI* – Philadelphia Phillies, 1911

ASN – La Temporada Americana: Negro League Clubs in Cuba, Winter, 1904-1908, 1910, 1912, 1914, 1915
    ABC – Indianapolis ABCs, 1915
    AC1 – All Cubans, 1905
    ALM – Almendares, 1905-1908, 1910, 1912, 1914-1915
    AZU – Azul, 1904
    BRG – Brooklyn Royal Giants, 1908
    CTA – Carmelita, 1904
    CXG – Cuban X Giants, 1904-1906
    FE – Fe, 1905
    HAB – Habana, 1904-1908, 1910, 1912, 1914, 1915
    LEL – Leland Giants, 1910
    NLG – New York Lincoln Giants, 1912
    NLS – New York Lincoln Stars, 1914
    NUC – Nuevo Criollo, 1904
    PG – Philadelphia Giants, 1907
    SFP – San Francisco, 1915

EAS – Eastern Independent Clubs, Summer, 1905, 1910-1918, 1920
    AC – Atlantic City Bacharach Giants, 1916-1918, 1920
    AC1 – All Cubans, 1905
    AMC – Almendares Cubans, 1915
    BBS – Baltimore Black Sox, 1920
    BRG – Brooklyn Royal Giants, 1905, 1910-1914, 1916-1918, 1920
    CBG – Cuban Giants, 1911, 1912, 1914
    CGB – Cuban Giants of Buffalo, 1913
    CSE – New York Cuban Stars, 1916-1918, 1920
    CSW – Cuban Stars of Havana, 1910, 1911
    CXG – Cuban X Giants, 1905
    GCE – Grand Central Red Caps, 1918
    HAV – Havana Reds, 1915
    HIL – Hilldale Club, 1917, 1918, 1920
    JPC – Jersey City/Poughkeepsie Cubans, 1916
    LBC – Long Branch Cubans, 1915
    MOH – Schenectady Mohawk Giants, 1913, 1914
    NBS – New York Black Sox, 1910
    NCG – New York Colored Giants, 1914
    NLG – New York Lincoln Giants, 1911-1918, 1920
    NLS – New York Lincoln Stars, 1914-1916
    PAS – Philadelphia All Stars, 1917
    PG – Philadelphia Giants, 1905, 1910, 1911, 1913-1915
    PRC – Red Caps, 1917-1918, 1920
    SET – Patterson Smart Set, 1912-1913

ECL – Eastern Colored League, Summer, 1923, 1924
    AC – Atlantic City Bacharach Giants, 1923, 1924
    BBS – Baltimore Black Sox, 1923, 1924
    BRG – Brooklyn Royal Giants, 1923, 1924
    CSE – New York Cuban Stars, 1923, 1924
    HBG – , 1924
    HIL – Hilldale Club, 1923, 1924
    NLG – New York Lincoln Giants, 1923, 1924
    WP – Washington Potomacs, 1923, 1924

FHL – Florida Hotel League, Winter, 1915-1917
    BRK – Breakers Hotel, 1915-1917
    RP – , 1915-1917

IND – Independent Clubs, Summer, 1904, 1906, 1909
    ABC – Indianapolis
    AC1 – All Cubans, 1904
    BG – Birmingham Giants, 1909
    BRG – Brooklyn Royal Giants, 1906, 1909, 1919
    BUX – Buxton Wonders, 1909
    CBG – Cuban Giants, 1909
    CIU – Chicago Union Giants, 1904, 1906
    CSE – New York Cuban Stars, 1919
    CSW – Cuban Stars of Havana, 1906, 1909
    CXG – Cuban X Giants, 1904, 1906
    HVS – Havana, 1906
    ILL – Giants, 1909
    KCG – Kansas City Giants, 1909
    KEY – Minneapolis Keystones, 1909
    LEL – Chicago Leland Giants, 1906, 1909
    NLG – New York Lincoln Giants, 1919
    PG – Philadelphia Giants, 1904, 1906, 1909
    QG – Philadelphia Quaker Giants, 1906
    SBB – San Antonio Black Broncos, 1909
    SPG – St. Paul Giants, 1909

LCA – Liga Cubana-Americana de Béisbol, Winter, 1916
    ORI – , 1916
    RS – Red Sox, 1916
    WS – White Sox, 1916

LG – Liga General de Béisbol de la República de Cuba, Winter, 1905-1911
    ALM – Almendares, 1905-1910
    AMK – América Park, 1911
    APK – Almendares Park, 1911
    FE – Fe, 1905-1910
    HAB – Habana, 1905-1910
    HPK – Havana Park, 1911
    MTZ – Matanzas, 1907, 1908

LH – Liga Habanera de Béisbol, Winter, 1904
    ALM – Almendares, 1904
    FE – Fe, 1904
    HAB – Habana, 1904

LN – Liga Nacional de Béisbol de la República de Cuba, Winter, 1911-1914
    ALM – Almendares, 1911-1914
    FE – Fe, 1911-1914
    HAB – Habana, 1911-1914

NAC – National Association of Colored Professional Clubs of the United States and Cuba, Summer, 1907, 1908
    BRG – Brooklyn Royal Giants, 1907, 1908
    CBG – Cuban Giants, 1907, 1908
    CSW – Cuban Stars of Havana, 1907, 1908
    PG – Philadelphia Giants, 1907, 1908

NNA – Negro National League Associated Clubs, Summer, 1921-1923
    AC – Atlantic City Bacharach Giants, 1921
    ALC – All Cubans, 1921
    BBB – Birmingham Black Barons, 1923
    BBS – Baltimore Black Sox, 1921, 1922
    CTS – Cleveland Tate Stars, 1921, 1923
    HIL – Hilldale, 1921, 1922
    MRS – Memphis Red Sox, 1923
    NYB – New York Bacharach Giants, 1922
    PK – Pittsburgh Keystones, 1921

NNL – Negro National League, Summer, 1904-1924
    ABC – Indianapolis ABCs, 1920-1924
    BBB – Birmingham Black Barons, 1924
    CAG – Chicago American Giants, 1920-1924
    CBN – Cleveland Browns, 1924
    COB – Columbus Buckeyes, 1921
    COG – Chicago Giants, 1920, 1921
    CS – Cincinnati Cuban Stars, 1921
    CSW – Cuban Stars of Havana, 1920, 1922-1924
    CTS – Cleveland Tate Stars, 1922
    DM – Dayton Marcos, 1920
    DS – , 1920-1924
    KCM – Kansas City Monarchs, 1920-1924
    MB – Milwaukee Bears, 1923
    MRS – Memphis Red Sox, 1924
    PK – Pittsburgh Keystones, 1922
    SLG – St. Louis Giants, 1920
    SLS – St. Louis Stars, 1922-1924
    TT – Toledo Tigers, 1923

NOR – Northern Independent Clubs, Summer, 1908
    KEY – Minneapolis Keystones, 1908
    SPG – St. Paul Gophers, 1908

NYI – New York Independent Clubs, Summer, 1919, 1921, 1922
    BRG – Brooklyn Royal Giants, 1919, 1921, 1922
    CSE – New York Cuban Stars, 1919, 1921, 1922
    NLG – New York Lincoln Giants, 1919, 1921, 1922

PV – Premio de Verano, Summer, 1904-1908
    ALE – Alerta, 1905, 1906
    AZU – Azul, 1904-1908
    COL – Columbia, 1907
    CTA – Carmelita, 1904, 1908
    EMI – Eminencia, 1905
    PUN – Punzó, 1904
    ROJ – Rojo, 1906-1908
    SFO – San Francisco, 1907

SOU – Negro Championship of the South, Summer, 1907, 1908
    BRG – Birmingham Giants, 1907, 1908
    SBB – San Antonio Black Broncos, 1907, 1908

WES – Western Independent Clubs, Summer, 1907, 1908, 1910-1918
    ABC – Indianapolis ABCs, 1907, 1908, 1910-1918
    AC2 – All Cubans, 1911
    ALL – Kansas City , 1916, 1917
    BAS – Brooklyn All Stars, 1914
    CAG – Chicago American Giants, 1911-1918
    CBS – Chicago Black Sox, 1915
    CIU – Chicago Union Giants, 1907, 1908, 1913, 1914, 1916, 1917
    COG – Chicago Giants, 1910-1917
    CSW – Cuban Stars of Havana, 1912-1918
    DM – Dayton Marcos, 1918
    FLP – French Lick Plutos, 1912-1914
    IBA – Bowser’s ABCs of Indianapolis, 1916
    JBC – Jewell’s ABCs of Indianapolis, 1917
    KCG – Kansas City Giants, 1910, 1911
    KEY – Minneapolis Keystones, 1910, 1911
    KRG – Kansas City Royal Giants, 1910
    LEL – Chicago Leland Giants, 1907, 1908, 1910, 1911
    LWS – Louisville White Sox, 1914, 1915
    OKM – Oklahoma Monarchs, 1910
    SC1 – Stars of Cuba, 1910
    SLG – St. Louis Giants, 1911-1913, 1915-1917
    SPG – St. Paul Gophers, 1907, 1910
    TAS – Texas All Stars, 1917
    TCG – Twin City Gophers, 1911
    WBS – West Baden Sprudels, 1910-1915

WS – Colored World Series, Summer, 1924
    HIL – Hilldale Club, 1924
    KCM – Kansas City Monarchs, 1924

*Denotes that the team is a Major League team playing against Cuban teams in the ASM

II. The Major Leagues

AL – American League, Summer, 1904-1924
    BOS – Boston Americans, 1904-1907; Boston Red Sox, 1908-1924
    CHW – Chicago White Sox, 1904-1924
    CLE – Cleveland Naps, 1904-1914; Cleveland Indians, 1915-1924
    DET – Detroit Tigers, 1904-1924
    NYY – , 1904-1924
    PHA – Philadelphia Athletics, 1904-1924
    SLB – St. Louis Browns, 1904-1924
    WSH – Washington Senators, 1904-1924

FL – Federal League, 1914, 1915
    BAL – Baltimore Terrapins, 1914, 1915
    BTT – Brooklyn Tip-Tops, 1914, 1915
    BUF – Buffalo Buffeds, 1914; Buffalo Blues, 1915
    CHI – Chicago Chi-Feds, 1914; Chicago Whales, 1915
    IND – Indianapolis Hoosiers, 1914
    KCP – Kansas City Packers, 1914, 1915
    NEW – Newark Pepper, 1915
    PBS – Pittsburgh Rebels, 1914, 1915
    SLM – St. Louis Terriers, 1914, 1915

NL – National League
    BRO – Brooklyn Superbas, 1904-1910, 1913; Brooklyn Dodgers, 1911, 1912; Brooklyn Robins, 1914-1924
    BSN – Boston Beaneaters, 1904-1906; Boston Doves, 1907-1910; Boston Rustlers, 1911; Boston Braves, 1912-1924
    CHC – Chicago Cubs, 1904-1924
    CIN – Cincinnati Reds, 1904-1924
    NYG – New York Giants, 1904-1924
    PHI – Philadelphia Phillies, 1904-1924
    PIT – Pittsburgh Pirates, 1904-1924
    STL – St. Louis Cardinals, 1904-1924

Appendix 2 - Glossary of Statistics

The glossaries that follow will be organized in the following format:

Stat Abbreviation – Stat Name Stat definition or formula

I. Count Data

A. Batting

AB – At bats

The number of times a player completes a turn batting and does not

(i) Reach base by hit by pitch or walk

(ii) Hit a sacrifice hit or sacrifice fly

(iii) Reach base on interference or obstruction

BB – Base on balls

The number of times a player reaches base on a walk

CS** – Caught Stealing

The number of times a runner is put out while attempting to steal

G – Games

The number of games played

GIDP** - Ground into Double Plays

The number of ground balls that resulted in a double play

H – Hits

The number of times a batter reaches base safely unaided by an error or fielder’s choice on a batted ball in play

HBP* – Hit by pitches

The number of times a batter reaches base upon being hit by a pitch

HR – Home runs

The number of times a batter scores safely unaided by an error or fielder’s choice on a batted ball in play

PA – Plate Appearances

The number of times a player completes a turn batting

R – Runs

The number of runs scored

RBIᵠ – Runs batted in

The number of runs that score as a result of a batter’s plate appearance, unless

(i) An error allows a run to score that would not have scored without the error

(ii) The result of the plate appearance is a double play or an error while attempting a double play

SB – Stolen Bases

The number of times a base runner advances one base unaided by a hit, putout, error, force-out, fielder’s choice, passed ball, wild pitch or balk

SHǂ – Sacrifice Hits

The number of times a batter bunted to safely advance a runner to the next base or hit a fly ball that resulted in a base runner scoring safely

SFǂ – Sacrifice Flies

The number of times a batter hits a fly ball that resulted in a base runner scoring safely

SO*** - Strikeouts

The number of times a batter is put out by

(i) A third strike

(ii) A foul bunt on the third strike┼

2B – Doubles

The number of times a batter reaches second base safely unaided by an error or fielder’s choice on a batted ball in play

3B – Triples

The number of times a batter reaches third base safely unaided by an error or fielder’s choice on a batted ball in play

*Hit by pitches data missing for the 1924 Negro League season

**Denotes incomplete data in both leagues over the 1904-1924 time span

***Denotes incomplete data for the Negro Leagues over the 1904-1924 time span

ᵠRBIs was not an official statistic in the MLB until 1920

ǂSacrifice flies were not recorded as a separate statistic from sacrifice hits until 1954 in the MLB. From 1909-1916, some Negro League teams separated the statistic

┼Constraint (ii) not added in MLB until 1909

B. Pitching

BB – Base on balls allowed

The number of times a pitcher allows a base runner on a walk

BF – Batters faced

The number of batters faced by a pitcher

CG – Complete games

The number of times a pitcher pitches the entirety of a game

ER – Earned runs

The number of runs allowed credited to the pitcher, that is, the number of runs a pitcher would have allowed if there were no errors or passed balls committed in any inning

G – Games

The number of games played

GS** – Games started

The number of games started by a pitcher

H – Hits allowed

The number of times a pitcher allows a batter to reach base safely unaided by an error or fielder’s choice on a batted ball in play

HBP – Hit by pitches allowed

The number of times a pitcher allows a batter to reach base upon being hit by a pitch

HR* – Home runs allowed

The number of times a pitcher allows a batter to score safely on a batted ball in play

IP – Innings pitched

The total number of innings pitched by a pitcher; one out corresponds to one-third of an inning pitched

L – Losses

The number of games a team lost

R – Runs allowed

The number of runs allowed by a pitcher

SHO – Shutouts

The number of times a team’s pitchers allow no runs in a game

SO – Strikeouts

The number of times a pitcher puts out a batter by

(i) A third strike

(ii) A foul bunt on the third strike┼

SVǂ – Saves

The number of times the non-winning pitcher finishes a game won by his team, and

(i) He pitches at least one inning with a lead of no more than three runs

(ii) He enters the game with the potential tying run on base, at bat or on deck

(iii) He pitches at least three innings

W – Wins

The number of times a team wins

WP – Wild pitches

The number of times a pitch cannot be handled with ordinary effort by the catcher and the ball gets past the catcher

*Home runs allowed data missing for the 1924 Negro League season

**For the MLB, GS refers to the number of times the announced starting pitcher actually threw the first pitch of the game for his team. In the Negro League Database, GS refers to the number of games for which pitching data is available

┼Constraint (ii) not added in MLB until 1909

ǂSaves were not an official statistic in the MLB until 1969

C. Fielding*

A – Assists

The number of times a fielder contributes to putting out a batter by throwing or deflecting the ball

DP – Double Plays

The number of times a team records two outs on the result of a single pitch

E – Errors

The number of times a fielder’s action assisted the team batting

G – Games

The number of games played

Inn – Innings

The total number of innings played in the field; one out corresponds to one-third of an inning

PB – Passed balls

The number of times the catcher’s action allowed one or more base runners to advance

PO – Putouts

The number of times a fielder’s action is the last action to cause the out of a batter

SB** – Stolen bases allowed

The number of times a team allowed their opponents to steal a base

*All fielding data missing for the 1924 Negro League season

**Stolen bases allowed data missing for the 1923 Negro League season

II. Calculable Data

A. Batting

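BA – Batting average

$\mathrm{BA} = \frac{\mathrm{H}}{\mathrm{AB}}$

BB% – Walk percentage

$\mathrm{BB\%} = \frac{\mathrm{BB}}{\mathrm{PA}}$

HBP/G* – Hit by pitches per game

$\mathrm{HBP/G} = \frac{\mathrm{HBP}}{\mathrm{G}}$

HR/AB – Home runs per at bat

$\mathrm{HR/AB} = \frac{\mathrm{HR}}{\mathrm{AB}}$

OBP** – On base percentage

$\mathrm{OBP} = \frac{\mathrm{H} + \mathrm{BB} + \mathrm{HBP}}{\mathrm{AB} + \mathrm{BB} + \mathrm{HBP} + \mathrm{SF}}$

R/G – Runs per game

$\mathrm{R/G} = \frac{\mathrm{R}}{\mathrm{G}}$

RBI% – RBI percentage

$\mathrm{RBI\%} = \frac{\mathrm{RBI}}{\mathrm{R}}$

SB/G – Stolen bases per game

$\mathrm{SB/G} = \frac{\mathrm{SB}}{\mathrm{G}}$

SH/G – Sacrifice hits per game

$\mathrm{SH/G} = \frac{\mathrm{SH}}{\mathrm{G}}$

SLG – Slugging percentage

$\mathrm{SLG} = \frac{\mathrm{TB}}{\mathrm{AB}}$

TB – Total Bases

$\mathrm{TB} = \mathrm{1B} + 2 \cdot \mathrm{2B} + 3 \cdot \mathrm{3B} + 4 \cdot \mathrm{HR} = \mathrm{H} + \mathrm{2B} + 2 \cdot \mathrm{3B} + 3 \cdot \mathrm{HR}$

1B – Singles

$\mathrm{1B} = \mathrm{H} - \mathrm{2B} - \mathrm{3B} - \mathrm{HR}$

2B/AB – Doubles per at bat

$\mathrm{2B/AB} = \frac{\mathrm{2B}}{\mathrm{AB}}$

3B/AB – Triples per at bat

$\mathrm{3B/AB} = \frac{\mathrm{3B}}{\mathrm{AB}}$

*HBP/G data not available for the 1924 Negro League season

**OBP uses zero for HBP and SF when that data is not available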
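As a worked illustration of the batting formulas above, the following R sketch computes a slash line from count data. The function name `slash_line`, its argument names and the input counts are hypothetical and are not part of the study's code; HBP and SF default to zero, mirroring the note that OBP uses zero when that data is not available.

```r
# Hypothetical helper: BA, OBP, SLG and TB from count data, per the glossary formulas
slash_line <- function(AB, H, X2B, X3B, HR, BB, HBP = 0, SF = 0) {
  TB <- H + X2B + 2 * X3B + 3 * HR          # equivalent to 1B + 2*2B + 3*3B + 4*HR
  c(BA  = H / AB,
    OBP = (H + BB + HBP) / (AB + BB + HBP + SF),
    SLG = TB / AB,
    TB  = TB)
}

# Made-up counts for illustration only
ex <- slash_line(AB = 500, H = 130, X2B = 20, X3B = 5, HR = 5, BB = 40, HBP = 5)
print(round(ex, 3))
```

With these made-up counts, TB is 175, so BA is .260 and SLG is .350, while OBP is 175/545, or about .321.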

B. Pitching

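BB/9 – Walks allowed per nine innings

$\mathrm{BB/9} = \frac{9 \cdot \mathrm{BB}}{\mathrm{IP}}$

BB%ǂ – Walk percentage

$\mathrm{BB\%} = \frac{\mathrm{BB}}{\mathrm{BF}}$

BF/9 – Batters faced per nine innings

$\mathrm{BF/9} = \frac{9 \cdot \mathrm{BF}}{\mathrm{IP}}$

CG% – Complete game percentage

$\mathrm{CG\%} = \frac{\mathrm{CG}}{\mathrm{GS}}$

ERA – Earned run average

$\mathrm{ERA} = \frac{9 \cdot \mathrm{ER}}{\mathrm{IP}}$

ER% – Earned run percentage

$\mathrm{ER\%} = \frac{\mathrm{ER}}{\mathrm{R}}$

GS%*** – Games started percentage

$\mathrm{GS\%} = \frac{\mathrm{GS}}{\mathrm{G}}$

H/9 – Hits allowed per nine innings

$\mathrm{H/9} = \frac{9 \cdot \mathrm{H}}{\mathrm{IP}}$

H%ǂ – Hit percentage

$\mathrm{H\%} = \frac{\mathrm{H}}{\mathrm{BF}}$

HBP/9 – Hit by pitches allowed per nine innings

$\mathrm{HBP/9} = \frac{9 \cdot \mathrm{HBP}}{\mathrm{IP}}$

HBP%ǂ – Hit by pitch percentage

$\mathrm{HBP\%} = \frac{\mathrm{HBP}}{\mathrm{BF}}$

HR/9* – Home runs allowed per nine innings

$\mathrm{HR/9} = \frac{9 \cdot \mathrm{HR}}{\mathrm{IP}}$

HR%ǂ – Home run percentage

$\mathrm{HR\%} = \frac{\mathrm{HR}}{\mathrm{BF}}$

IP/G** – Innings pitched per game

$\mathrm{IP/G} = \frac{\mathrm{IP}}{\mathrm{G}}$

R/9 – Runs allowed per nine innings

$\mathrm{R/9} = \frac{9 \cdot \mathrm{R}}{\mathrm{IP}}$

SHO%** – Shutout percentage

$\mathrm{SHO\%} = \frac{\mathrm{SHO}}{\mathrm{G}}$

SO/BB – Strikeouts per walk

$\mathrm{SO/BB} = \frac{\mathrm{SO}}{\mathrm{BB}}$

SO/9 – Strikeouts per nine innings

$\mathrm{SO/9} = \frac{9 \cdot \mathrm{SO}}{\mathrm{IP}}$

SO%ǂ – Strikeout percentage

$\mathrm{SO\%} = \frac{\mathrm{SO}}{\mathrm{BF}}$

SV%** – Save percentage

$\mathrm{SV\%} = \frac{\mathrm{SV}}{\mathrm{G}}$

WHIP – Walks plus hits over innings pitched

$\mathrm{WHIP} = \frac{\mathrm{BB} + \mathrm{H}}{\mathrm{IP}}$

WP/9 – Wild pitches per nine innings

$\mathrm{WP/9} = \frac{9 \cdot \mathrm{WP}}{\mathrm{IP}}$

W% – Winning percentage

$\mathrm{W\%} = \frac{\mathrm{W}}{\mathrm{W} + \mathrm{L}}$

*HR/9 data not available for the 1924 Negro League season

**Substitute G with GS for Negro League calculations

***GS% only used for Negro Leagues as a measure of completeness of data

ǂUsed as replacements for “per nine” statistics for chi square tests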
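The “per nine” statistics above all share the same shape, so they can be computed with a single helper. The R sketch below is illustrative only: the function name `per_nine` and the season totals are hypothetical.

```r
# Hypothetical helper for any "per nine innings" rate: 9 * count / IP
per_nine <- function(count, IP) 9 * count / IP

# Made-up season totals for illustration only
ER <- 60; H <- 170; BB <- 50; IP <- 180

era  <- per_nine(ER, IP)   # ERA  = 9 * ER / IP
h9   <- per_nine(H, IP)    # H/9  = 9 * H / IP
whip <- (BB + H) / IP      # WHIP = (BB + H) / IP
print(c(ERA = era, H9 = h9, WHIP = round(whip, 3)))
```

With these totals, ERA is 3.00 and H/9 is 8.50.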

C. Fielding

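DP/9 – Double plays per nine innings

$\mathrm{DP/9} = \frac{9 \cdot \mathrm{DP}}{\mathrm{Inn}}$

E/9 – Errors per nine innings

$\mathrm{E/9} = \frac{9 \cdot \mathrm{E}}{\mathrm{Inn}}$

Fld% – Fielding percentage

$\mathrm{Fld\%} = \frac{\mathrm{PO} + \mathrm{A}}{\mathrm{PO} + \mathrm{A} + \mathrm{E}}$

Inn/G – Innings per game

$\mathrm{Inn/G} = \frac{\mathrm{Inn}}{\mathrm{G}}$

PB/9 – Passed balls per nine innings

$\mathrm{PB/9} = \frac{9 \cdot \mathrm{PB}}{\mathrm{Inn}}$

RF – Range Factor

$\mathrm{RF} = \frac{\mathrm{PO} + \mathrm{A}}{\mathrm{Inn}}$

SB/9* – Stolen bases per nine innings

$\mathrm{SB/9} = \frac{9 \cdot \mathrm{SB}}{\mathrm{Inn}}$

*SB/9 data not available for the 1924 Negro League season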

Appendix 3 – R Code

I. T-test and Chi Square Heatmaps

    ### Takes log10 of two-sided p-value
    ### Winsorizes at 3
    ### Makes logp positive if Negro Leagues higher
    ### Bluer = Major Leagues higher

    library(ggplot2)
    library(reshape2)

    heatmap <- read.csv("TABLE_TITLE.csv")  ## Use log10 of p-value, signed
    names(heatmap) <- c('Year','stat_1','stat_2','stat_3',...)
    meltmap <- melt(heatmap, id="Year")      ## Long format: Year, variable, value
    meltmap$value <- as.numeric(meltmap$value)
    meltmap$value[meltmap$value > 3 & !is.na(meltmap$value)] <- 3      ### Cutoff at 3
    meltmap$value[meltmap$value < (-3) & !is.na(meltmap$value)] <- -3  ### Cutoff at -3
    p1 <- qplot(x=Year, y=variable, data=meltmap, fill=value, geom="tile", ylab="", xlab="", main="MAP_TITLE")
    p2 <- p1 + scale_fill_gradient2(high = "red", low = "blue", mid="gray95")
    p2

“TABLE_TITLE.csv” refers to the name of the file that contains the log10(p-values) data, converted to a CSV file. ‘stat_1’, ‘stat_2’, etc. refer to the names of the statistics being analyzed.
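The heatmap code expects a table of signed, capped log10 p-values as input. The sketch below shows one way such a value could be produced for a single statistic in a single season. It is an assumption-laden illustration rather than the study's method: the function name `signed_logp`, the use of a Welch two-sample t-test via t.test(), the use of -log10(p) as the magnitude, and the sample values are all hypothetical.

```r
# Hypothetical helper: signed -log10(p), positive when the Negro League mean is higher
signed_logp <- function(negro, mlb, cap = 3) {
  p <- t.test(negro, mlb)$p.value                  # two-sided Welch t-test
  s <- -log10(p) * sign(mean(negro) - mean(mlb))   # attach the direction
  max(min(s, cap), -cap)                           # winsorize at +/- cap
}

# Made-up team-level R/G values for illustration only
negro_rg <- c(5.0, 5.1, 5.2, 4.9, 5.3)
mlb_rg   <- c(3.8, 3.9, 4.0, 3.7, 4.1)

higher <- signed_logp(negro_rg, mlb_rg)   # Negro League sample higher: capped at 3
lower  <- signed_logp(mlb_rg, negro_rg)   # roles reversed: capped at -3
print(c(higher, lower))
```

Capping at 3 corresponds to a p-value of 0.001, so every difference more significant than that is drawn in the same saturated color.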

II. Linear Regression

    data <- read.csv("TABLE_TITLE.csv", header=T)
    attach(data)
    reg1 <- lm(dVar ~ iVar_1 + iVar_2 + iVar_3 + ...)            # full model
    step(reg1)                                                   # stepwise selection
    reg2 <- lm(dVar ~ bestVar_1 + bestVar_2 + bestVar_3 + ...)   # refit with selected variables
    summary(reg2)

“TABLE_TITLE.csv” refers to the name of the file that contains the dependent and independent variable data, converted to a CSV file. “dVar” refers to the dependent variable, and “iVar_1”, “iVar_2”, etc. refer to the independent variables. “bestVar_1”, “bestVar_2”, etc. refer to the variables that the step() function identified as the most reliable predictors of the dependent variable.

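III. Logistic Regression

    # Note: glm() called without a family argument fits a Gaussian (ordinary linear) model;
    # family = binomial would have to be supplied for a logistic fit.
    data <- read.csv("TABLE_TITLE.csv", header=T)
    attach(data)
    reg1 <- glm(dVar ~ iVar_1 + iVar_2 + iVar_3 + ...)            # full model
    step(reg1)                                                    # stepwise selection
    reg2 <- glm(dVar ~ bestVar_1 + bestVar_2 + bestVar_3 + ...)   # refit with selected variables
    summary(reg2)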

“TABLE_TITLE.csv” refers to the name of the file that contains the dependent and independent variable data, converted to a CSV file. “dVar” refers to the dependent variable, and “iVar_1”, “iVar_2”, etc. refer to the independent variables. “bestVar_1”, “bestVar_2”, etc. refer to the variables that the step() function identified as the most reliable predictors of the dependent variable.

Appendix 4 – Tables

I. MLB Batting Data

MLB

Season     RBI%    BB%     2B/AB   3B/AB   HR/AB   HBP/G  SH/G   SB/G   BA     OBP    SLG    TB     R/G
1904       0.8160  0.0612  0.0346  0.0140  0.0040  0.311  0.888  1.113  0.247  0.301  0.321  26515  3.72
1905       0.8237  0.0675  0.0349  0.0137  0.0041  0.331  1.012  1.191  0.248  0.307  0.323  26410  3.89
1906       0.8178  0.0688  0.0329  0.0125  0.0033  0.296  1.099  1.220  0.247  0.306  0.314  25165  3.61
1907       0.8142  0.0680  0.0308  0.0120  0.0030  0.300  1.055  1.128  0.245  0.305  0.309  24823  3.52
1908       0.8105  0.0647  0.0313  0.0124  0.0033  0.310  1.298  1.094  0.239  0.297  0.305  24605  3.38
1909       0.8099  0.0715  0.0330  0.0124  0.0032  0.308  1.255  1.231  0.244  0.306  0.311  25099  3.54
1910       0.8205  0.0799  0.0345  0.0142  0.0044  0.317  1.209  1.307  0.249  0.318  0.326  26550  3.83
1911       0.8349  0.0835  0.0397  0.0161  0.0062  0.341  1.166  1.376  0.266  0.336  0.357  29367  4.51
1912       0.8371  0.0824  0.0409  0.0165  0.0054  0.291  1.122  1.379  0.269  0.337  0.359  29428  4.53
1913       0.8383  0.0792  0.0378  0.0156  0.0058  0.284  1.030  1.317  0.259  0.325  0.345  28031  4.04
1914       0.8412  0.0801  0.0377  0.0142  0.0058  0.264  1.114  1.216  0.254  0.321  0.337  41362  3.86
1915       0.8430  0.0803  0.0372  0.0145  0.0052  0.272  1.191  1.101  0.250  0.318  0.332  40435  3.81
1916       0.8458  0.0764  0.0366  0.0139  0.0047  0.257  1.134  1.104  0.248  0.312  0.326  26711  3.56
1917       0.8445  0.0745  0.0355  0.0139  0.0041  0.232  1.235  0.968  0.249  0.311  0.324  26581  3.59
1918       0.8399  0.0756  0.0345  0.0131  0.0035  0.227  1.204  0.986  0.254  0.317  0.325  21883  3.63
1919       0.8566  0.0712  0.0391  0.0140  0.0060  0.237  1.216  0.931  0.263  0.322  0.348  25983  3.88
1920       0.8677  0.0720  0.0429  0.0150  0.0075  0.227  1.233  0.697  0.276  0.335  0.372  31304  4.36
1921       0.8966  0.0716  0.0467  0.0160  0.0110  0.224  1.215  0.605  0.291  0.348  0.403  34302  4.85
1922       0.8947  0.0754  0.0462  0.0146  0.0124  0.242  1.195  0.586  0.288  0.348  0.401  34218  4.87
1923       0.8987  0.0786  0.0460  0.0134  0.0115  0.245  1.108  0.635  0.284  0.347  0.391  33364  4.81
1924       0.9053  0.0772  0.0482  0.0139  0.0106  0.225  1.098  0.610  0.287  0.348  0.394  33390  4.76
TOTAL      0.8482  0.0746  0.0382  0.0141  0.0060  0.274  1.146  1.045  0.260  0.322  0.344  6E+05  4.02
Dead Ball  0.8317  0.0745  0.0358  0.0140  0.0046  0.286  1.139  1.169  0.252  0.315  0.329  4E+05  3.81
Live Ball  0.8930  0.0750  0.0460  0.0146  0.0106  0.233  1.170  0.627  0.285  0.345  0.392  2E+05  4.73

II. Negro League Batting Data

Negro Leagues

Season     RBI%    BB%     2B/AB   3B/AB   HR/AB   HBP/G  SH/G   SB/G   BA     OBP    SLG    TB     R/G
1904       0.5714  0.0739  0.0162  0.0077  0.0026  0.560  1.071  1.946  0.202  0.275  0.241  1476   4.64
1905       0.6021  0.0707  0.0153  0.0080  0.0022  0.406  1.383  2.089  0.208  0.275  0.246  1476   4.19
1906       0.6880  0.0785  0.0209  0.0101  0.0034  0.459  1.267  1.808  0.230  0.302  0.281  2481   4.29
1907       0.6969  0.0847  0.0186  0.0081  0.0034  0.401  1.415  1.813  0.227  0.304  0.272  2379   4.54
1908       0.7527  0.0745  0.0252  0.0111  0.0031  0.393  1.121  1.431  0.225  0.293  0.282  3351   4.14
1909       0.7310  0.0663  0.0273  0.0128  0.0053  0.304  1.107  1.206  0.233  0.291  0.301  2145   4.36
1910       0.7518  0.0801  0.0312  0.0099  0.0047  0.402  1.277  1.459  0.239  0.310  0.304  2915   4.29
1911       0.8054  0.0833  0.0314  0.0132  0.0054  0.335  1.110  1.774  0.257  0.327  0.331  3547   4.64
1912       0.7937  0.0834  0.0316  0.0123  0.0034  0.338  1.248  1.656  0.267  0.336  0.333  3468   4.92
1913       0.8069  0.0767  0.0336  0.0114  0.0038  0.435  1.187  1.616  0.263  0.329  0.330  3409   4.39
1914       0.8215  0.0825  0.0381  0.0098  0.0062  0.358  1.035  1.740  0.262  0.331  0.338  4468   4.76
1915       0.7939  0.0893  0.0317  0.0092  0.0051  0.312  1.016  1.478  0.254  0.328  0.319  5198   4.71
1916       0.8362  0.0869  0.0350  0.0114  0.0040  0.299  1.120  1.122  0.247  0.321  0.317  3925   4.34
1917       0.8467  0.0851  0.0331  0.0113  0.0025  0.256  1.247  0.859  0.241  0.314  0.304  3058   3.93
1918       0.8097  0.0842  0.0293  0.0134  0.0042  0.257  1.294  0.673  0.265  0.334  0.333  2364   4.84
1919       0.8482  0.0902  0.0345  0.0139  0.0084  0.346  1.220  1.028  0.251  0.328  0.338  2780   4.74
1920       0.8275  0.0798  0.0338  0.0114  0.0048  0.283  1.307  0.892  0.251  0.319  0.322  6495   4.38
1921       0.8379  0.0729  0.0381  0.0159  0.0094  0.378  1.324  1.115  0.270  0.333  0.368  10985  5.23
1922       0.8690  0.0770  0.0424  0.0142  0.0122  0.284  1.332  0.967  0.277  0.340  0.384  8647   5.43
1923       0.8784  0.0728  0.0404  0.0136  0.0124  0.246  1.273  0.806  0.278  0.337  0.383  12165  5.47
1924       0.8705  0.0751  0.0388  0.0141  0.0101  n/a    1.026  0.773  0.271  0.327  0.369  12145  5.26
TOTAL      0.8160  0.0786  0.0339  0.0123  0.0070  0.336  1.207  1.213  0.257  0.323  0.336  98877  4.81
Dead Ball  0.7731  0.0816  0.0294  0.0108  0.0043  0.359  1.177  1.474  0.244  0.315  0.309  48440  4.49
Live Ball  0.8596  0.0751  0.0389  0.0140  0.0100  0.299  1.241  0.905  0.271  0.332  0.368  50437  5.20

III. MLB Pitching Data

MLB

Season     IP/G  WHIP   H/9     HR/9   BB/9   SO/9   SO/BB  CG%     ER%     WP/9   BF/9    R/9    HBP/9  SHO%   SV%
1904       8.81  1.186  8.329   0.135  2.349  3.806  1.62   0.8758  0.6989  0.208  36.624  3.808  0.328  0.100  0.016
1905       8.88  1.205  8.279   0.138  2.568  3.918  1.53   0.7987  0.7136  0.222  36.423  3.952  0.340  0.091  0.018
1906       8.84  1.196  8.169   0.108  2.595  3.772  1.45   0.7783  0.7215  0.192  36.297  3.683  0.319  0.113  0.035
1907       8.85  1.19   8.107   0.101  2.603  3.586  1.38   0.7413  0.6966  0.193  36.083  3.592  0.320  0.114  0.036
1908       8.97  1.137  7.762   0.108  2.468  3.675  1.49   0.6740  0.6966  0.190  35.724  3.397  0.310  0.116  0.043
1909       8.95  1.184  7.958   0.105  2.700  3.790  1.4    0.6535  0.7087  0.229  35.894  3.570  0.311  0.110  0.046
1910       8.96  1.243  8.177   0.145  3.012  3.938  1.31   0.6209  0.7177  0.217  36.139  3.857  0.324  0.093  0.050
1911       8.91  1.352  8.939   0.210  3.232  4.036  1.25   0.5812  0.7387  0.220  37.371  4.555  0.342  0.068  0.055
1912       8.88  1.355  9.046   0.182  3.152  4.025  1.28   0.5808  0.7346  0.230  37.726  4.586  0.296  0.065  0.054
1913       8.94  1.285  8.575   0.192  2.986  3.858  1.29   0.5349  0.7539  0.210  37.056  4.064  0.282  0.079  0.067
1914       8.91  1.261  8.355   0.191  2.993  4.016  1.34   0.5500  0.7449  0.215  36.743  3.903  0.269  0.091  0.063
1915       8.94  1.248  8.226   0.171  3.004  3.812  1.27   0.5518  0.7560  0.198  36.582  3.839  0.273  0.093  0.064
1916       9.03  1.217  8.120   0.153  2.834  3.808  1.34   0.5269  0.7651  0.200  36.378  3.553  0.250  0.093  0.073
1917       9.02  1.21   8.139   0.134  2.755  3.456  1.25   0.5533  0.7509  0.166  36.436  3.572  0.231  0.099  0.065
1918       9.05  1.239  8.350   0.115  2.803  2.875  1.03   0.6299  0.7664  0.146  37.057  3.611  0.224  0.096  0.056
1919       8.99  1.272  8.772   0.200  2.677  3.075  1.15   0.5818  0.7913  0.167  37.022  3.874  0.239  0.085  0.045
1920       9.02  1.352  9.410   0.255  2.759  2.939  1.07   0.5652  0.7944  0.168  37.644  4.352  0.223  0.074  0.058
1921       8.95  1.435  10.125  0.383  2.792  2.848  1.02   0.5179  0.8268  0.147  39.104  4.876  0.230  0.049  0.072
1922       8.92  1.439  10.010  0.430  2.940  2.825  0.96   0.5008  0.8287  0.155  39.170  4.905  0.248  0.059  0.063
1923       8.96  1.439  9.858   0.399  3.095  2.855  0.92   0.4959  0.8249  0.144  39.183  4.835  0.247  0.050  0.068
1924       8.91  1.439  9.932   0.368  3.016  2.717  0.9    0.4870  0.8432  0.139  39.112  4.800  0.225  0.056  0.071
TOTAL      8.94  1.279  8.679   0.201  2.834  3.532  1.25   0.6068  0.7593  0.190  37.108  4.050  0.278  0.086  0.054
Dead Ball  8.93  1.237  8.325   0.151  2.809  3.739  1.33   0.6346  0.7355  0.201  36.593  3.841  0.290  0.094  0.050
Live Ball  8.95  1.421  9.866   0.367  2.920  2.837  0.97   0.5134  0.8241  0.151  38.840  4.753  0.234  0.057  0.066
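
Two columns of the pitching tables can be recomputed from the per-nine-inning rates, which is a quick way to check that a row has been read with the right column alignment. The sketch below assumes the standard definitions WHIP = (H + BB) / IP and SO/BB = SO / BB; the source does not spell these out, and the function names are illustrative only.

```python
def whip_from_rates(h_per_9: float, bb_per_9: float) -> float:
    """WHIP implied by H/9 and BB/9, assuming WHIP = (H + BB) / IP."""
    # Hits plus walks per nine innings, divided by nine, gives the per-inning rate.
    return (h_per_9 + bb_per_9) / 9.0


def so_bb_from_rates(so_per_9: float, bb_per_9: float) -> float:
    """Strikeout-to-walk ratio implied by SO/9 and BB/9."""
    return so_per_9 / bb_per_9


if __name__ == "__main__":
    # 1904 MLB row: H/9 = 8.329, BB/9 = 2.349, SO/9 = 3.806
    print(round(whip_from_rates(8.329, 2.349), 3))   # table lists WHIP 1.186
    print(round(so_bb_from_rates(3.806, 2.349), 2))  # table lists SO/BB 1.62
```

The same check applied to the 1904 Negro Leagues row (H/9 = 6.847, BB/9 = 2.846) reproduces the listed WHIP of 1.077.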

IV. Negro League Pitching Data

Negro Leagues

Season     IP/G  WHIP   H/9     HR/9   BB/9   SO/9   SO/BB  CG%     ER%     WP/9   BF/9    R/9    HBP/9  SHO%   SV%
1904       8.82  1.077  6.847   0.089  2.846  4.012  1.41   0.8641  0.4766  0.183  38.497  4.739  0.572  0.054  0.016
1905       8.96  1.072  6.973   0.073  2.678  3.671  1.37   0.8833  0.5113  0.145  37.923  4.212  0.407  0.089  0.006
1906       8.88  1.195  7.734   0.114  3.021  3.273  1.08   0.8346  0.6223  0.145  38.399  4.347  0.461  0.060  0.004
1907       8.69  1.201  7.553   0.114  3.259  3.335  1.02   0.8088  0.6202  0.198  38.402  4.702  0.415  0.048  0.011
1908       8.80  1.151  7.538   0.104  2.816  3.896  1.38   0.7967  0.6545  0.101  37.804  4.230  0.405  0.091  0.025
1909       8.85  1.155  7.872   0.181  2.519  4.544  1.80   0.7944  0.6198  0.081  37.867  4.426  0.309  0.079  0.019
1910       8.85  1.209  7.862   0.158  3.020  4.047  1.34   0.7879  0.6837  0.106  37.714  4.352  0.407  0.057  0.010
1911       8.84  1.305  8.570   0.181  3.175  4.751  1.50   0.6942  0.6923  0.118  38.129  4.735  0.342  0.061  0.021
1912       8.84  1.360  9.011   0.110  3.231  4.612  1.43   0.6465  0.6714  0.149  38.684  5.015  0.344  0.057  0.025
1913       8.87  1.312  8.866   0.128  2.942  4.330  1.47   0.6935  0.6839  0.095  38.386  4.441  0.442  0.100  0.032
1914       8.72  1.347  8.923   0.211  3.203  4.446  1.39   0.7100  0.7144  0.137  38.822  4.913  0.369  0.093  0.023
1915       8.66  1.330  8.539   0.169  3.431  4.415  1.29   0.7083  0.6905  0.107  38.486  4.850  0.322  0.058  0.030
1916       8.65  1.292  8.295   0.136  3.337  4.402  1.32   0.7292  0.7079  0.106  38.421  4.519  0.312  0.081  0.029
1917       8.65  1.260  8.082   0.077  3.260  4.353  1.34   0.7468  0.7439  0.047  38.295  4.086  0.267  0.106  0.029
1918       8.72  1.370  9.039   0.145  3.288  3.726  1.13   0.7009  0.7217  0.063  39.056  4.989  0.265  0.047  0.019
1919       8.93  1.328  8.452   0.283  3.496  4.206  1.20   0.6545  0.7161  0.033  38.744  4.779  0.348  0.061  0.061
1920       8.80  1.280  8.455   0.162  3.061  4.371  1.43   0.6846  0.7488  0.094  38.369  4.476  0.289  0.062  0.038
1921       8.70  1.356  9.347   0.321  2.855  4.240  1.49   0.6512  0.7560  0.079  39.183  5.358  0.384  0.054  0.033
1922       8.71  1.410  9.641   0.424  3.045  4.365  1.43   0.6272  0.7938  0.130  39.565  5.612  0.294  0.040  0.036
1923       8.65  1.405  9.762   0.434  2.881  3.779  1.31   0.5966  0.7367  0.143  39.570  5.678  0.255  0.047  0.052
1924       8.71  1.371  9.591   N/A    2.749  4.117  1.50   0.5884  0.7756  0.098  39.481  5.431  0.221  0.041  0.037
TOTAL      8.75  1.310  8.766   0.224  3.024  4.162  1.38   0.6894  0.7153  0.111  38.781  4.944  0.331  0.061  0.031
Dead Ball  8.78  1.261  8.220   0.145  3.130  4.176  1.33   0.7444  0.6698  0.112  38.358  4.596  0.368  0.072  0.023
Live Ball  8.71  1.368  9.413   0.346  2.899  4.145  1.43   0.6248  0.7616  0.109  39.281  5.356  0.287  0.048  0.040

V. MLB Fielding Data

MLB

Season     Inn/G  Fld%   RF    E/9    DP/9   PB/9   SB/9
1904       8.81   0.955  4.48  1.924  0.572  0.136  1.233
1905       8.88   0.956  4.45  1.858  0.589  0.162  1.268
1906       8.84   0.958  4.48  1.758  0.586  0.148  1.344
1907       8.85   0.958  4.49  1.755  0.634  0.136  1.155
1908       8.97   0.959  4.47  1.712  0.514  0.114  1.173
1909       8.95   0.957  4.45  1.817  0.638  0.103  1.239
1910       8.96   0.958  4.48  1.784  0.696  0.124  1.360
1911       8.91   0.956  4.45  1.854  0.674  0.139  1.473
1912       8.88   0.956  4.46  1.836  0.710  0.127  1.434
1913       8.94   0.96   4.45  1.649  0.697  0.112  1.364
1914       8.91   0.959  4.45  1.725  0.696  0.139  1.270
1915       8.94   0.962  4.45  1.587  0.709  0.118  1.150
1916       9.03   0.964  4.45  1.512  0.743  0.128  1.187
1917       9.02   0.964  4.45  1.498  0.774  0.114  0.995
1918       9.05   0.964  4.49  1.499  0.781  0.096  0.991
1919       8.99   0.966  4.47  1.429  0.713  0.093  0.994
1920       9.02   0.966  4.47  1.412  0.795  0.078  0.698
1921       8.95   0.966  4.45  1.395  0.888  0.075  0.608
1922       8.92   0.968  4.42  1.309  0.913  0.073  0.592
1923       8.96   0.967  4.42  1.359  0.932  0.092  0.691
TOTAL      8.94   0.961  4.46  1.636  0.712  0.116  1.117
Dead Ball  8.93   0.959  4.46  1.700  0.671  0.125  1.229
Live Ball  8.96   0.967  4.44  1.369  0.882  0.080  0.647

VI. Negro League Fielding Data

Negro Leagues

Season     Inn/G  Fld%   RF    E/9   DP/9  PB/9  SB/9
1904       8.37   0.882  4.41  5.33  0.83  0.19  2.02
1905       8.42   0.901  4.45  4.42  1.27  0.22  2.07
1906       8.44   0.920  4.46  3.47  1.34  0.14  1.85
1907       7.87   0.915  4.47  3.72  1.49  0.23  1.93
1908       8.26   0.931  4.40  2.92  1.28  0.15  1.46
1909       8.40   0.927  4.40  3.10  1.25  0.13  1.26
1910       8.18   0.935  4.51  2.83  1.59  0.09  1.49
1911       7.98   0.940  4.43  2.57  1.60  0.10  1.60
1912       7.92   0.938  4.43  2.66  1.63  0.14  1.65
1913       8.13   0.944  4.47  2.37  1.76  0.11  1.66
1914       8.00   0.947  4.48  2.27  1.69  0.15  1.74
1915       7.85   0.944  4.47  2.40  1.87  0.16  1.52
1916       7.96   0.949  4.49  2.16  1.77  0.13  1.08
1917       8.13   0.956  4.45  1.84  1.82  0.07  0.89
1918       8.03   0.944  4.43  2.38  1.75  0.05  0.70
1919       8.07   0.951  4.48  2.09  1.77  0.07  1.01
1920       8.03   0.952  4.45  2.03  1.42  0.05  0.92
1921       7.82   0.946  4.46  2.27  1.59  0.09  1.13
1922       7.84   0.953  4.42  1.98  1.72  0.11  0.99
1923       7.64   0.951  4.51  2.09  1.79  0.07  N/A
TOTAL      7.99   0.941  4.46  2.51  1.61  0.11  1.34
Dead Ball  8.09   0.936  4.46  2.76  1.59  0.13  1.48
Live Ball  7.83   0.950  4.46  2.11  1.63  0.08  1.03

VII. T-test Tables

a. Batting

TOTAL 1904-1924 (21 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
R/G     2       5       8       0       0       0
BA      0       1       0       1       1       6
OBP     0       1       0       1       3       3
SLG     0       0       0       2       2       4
RBI%    0       0       0       1       4       12
BB%     2       2       3       1       0       0
2B/AB   0       0       0       0       3       8
3B/AB   0       0       0       2       2       6
HR/AB   0       1       0       1       3       0
HBP/G*  0       3       4       0       0       0
SH/G    0       1       1       0       0       0
SB/G    2       3       5       0       1       0

DEAD BALL ERA 1904-1919 (16 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
R/G     1       4       8       0       0       0
BA      0       1       0       0       1       2
OBP     0       1       0       1       0       2
SLG     0       0       0       1       1       2
RBI%    0       0       0       0       4       8
BB%     2       2       3       0       0       0
2B/AB   0       0       0       0       2       5
3B/AB   0       0       0       2       2       5
HR/AB   0       1       0       0       3       0
HBP/G   0       2       4       0       0       0
SH/G    0       1       1       0       0       0
SB/G    1       2       4       0       1       0

LIVE BALL ERA 1920-1924 (5 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
R/G     1       1       0       0       0       0
BA      0       0       0       1       1       3
OBP     0       0       0       0       3       1
SLG     0       0       0       1       1       2
RBI%    0       0       0       1       0       4
BB%     0       0       0       1       0       0
2B/AB   0       0       0       0       1       3
3B/AB   0       0       0       1       1       1
HR/AB   0       0       0       1       0       0
HBP/G*  0       1       0       0       0       0
SH/G    0       0       0       0       0       0
SB/G    1       1       1       0       0       0

        TOTAL                  DEAD BALL ERA          LIVE BALL ERA
STAT    League >  p-value      League >  p-value      League >  p-value
R/G     Negro     2.67E-22     Negro     2.07E-20     Negro     0.00770877
BA      MLB       0.009896622  MLB       0.036708     MLB       2.91919E-09
OBP     MLB       0.626884541  MLB       0.461767     MLB       8.46065E-07
SLG     MLB       0.000121485  MLB       2.09E-05     MLB       4.77011E-07
RBI%    MLB       3.43376E-16  MLB       2.21E-21     MLB       6.39205E-09
BB%     Negro     0.000246485  Negro     8.20E-06     MLB       0.824232867
2B/AB   MLB       2.03546E-10  MLB       4.91E-10     MLB       7.65646E-09
3B/AB   MLB       3.12213E-10  MLB       2.91E-11     MLB       0.050477234
HR/AB   Negro     0.808180055  MLB       0.577304     MLB       0.131259418
HBP/G*  Negro     2.43E-06     Negro     1.19E-06     Negro     0.058981991
SH/G    Negro     0.025603646  Negro     0.041125     Negro     0.470365021
SB/G    Negro     1.31034E-05  Negro     4.29E-07     Negro     1.28465E-05

*No HBP data for the Negro Leagues in 1924 season
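
The per-season count tables appear to place each significant season in exactly one column, the strictest threshold it meets (FLD% in the fielding tables, for instance, reads 0, 0, 20 rather than 20, 20, 20). A minimal sketch of that bucketing follows, under that assumption; the function name is hypothetical, and the p-values are taken from the Live Ball Era and Dead Ball Era columns of the batting summary table.

```python
from typing import Optional


def significance_bucket(p_value: float) -> Optional[str]:
    """Return the strictest threshold column a p-value falls under, or None."""
    # Assumption: buckets are mutually exclusive, checked from strictest to loosest.
    if p_value < 0.01:
        return "p<0.01"
    if p_value < 0.05:
        return "p<0.05"
    if p_value < 0.1:
        return "p<0.1"
    return None


if __name__ == "__main__":
    examples = {
        "R/G (Live Ball)": 0.00770877,
        "BA (Dead Ball)": 0.036708,
        "3B/AB (Live Ball)": 0.050477234,
        "HR/AB (Live Ball)": 0.131259418,
    }
    for label, p in examples.items():
        print(label, significance_bucket(p))
```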

b. Pitching

TOTAL 1904-1924 (21 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
ERA     2       4       0       0       1       0
SHO-G   0       0       0       1       5       2
SV/G    0       0       0       1       6       10
IP/G    0       0       0       0       2       10
WHIP    0       2       0       3       0       0
H/9     1       0       0       1       0       3
HR/9*   0       0       0       0       6       1
BB/9    2       4       2       0       0       0
SO/9    2       2       9       1       1       0
SO/BB   4       0       5       1       0       2
CG%     5       1       12      0       0       0
ER%     0       0       0       3       0       17
WP/9    0       0       0       1       4       10
BF/9    0       1       13      0       0       0
R/9     4       4       7       0       0       0
HBP/9   1       4       1       0       0       0

DEAD BALL ERA 1904-1919 (16 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
ERA     2       3       0       0       1       0
SHO-G   0       0       0       0       4       2
SV/G    0       0       0       1       4       7
IP/G    0       0       0       0       2       5
WHIP    0       2       0       2       0       0
H/9     1       0       0       1       0       2
HR/9    0       0       0       0       5       0
BB/9    2       4       2       0       0       0
SO/9    2       2       4       1       1       0
SO/BB   4       0       0       1       0       2
CG%     5       1       7       0       0       0
ER%     0       0       0       3       0       12
WP/9    0       0       0       1       4       8
BF/9    0       1       13      0       0       0
R/9     3       3       5       0       0       0
HBP/9   1       4       0       0       0       0

LIVE BALL ERA 1920-1924 (5 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
ERA     0       1       0       0       0       0
SHO-G   0       0       0       1       1       0
SV/G    0       0       0       0       2       3
IP/G    0       0       0       0       0       5
WHIP    0       0       0       1       0       0
H/9     0       0       0       0       0       1
HR/9*   0       0       0       0       1       1
BB/9    0       0       0       0       0       0
SO/9    0       0       5       0       0       0
SO/BB   0       0       5       0       0       0
CG%     0       0       5       0       0       0
ER%     0       0       0       0       0       5
WP/9    0       0       0       0       0       2
BF/9    0       0       0       0       0       0
R/9     1       1       2       0       0       0
HBP/9   0       0       1       0       0       0

        TOTAL                  DEAD BALL ERA          LIVE BALL ERA
STAT    League >  p-value      League >  p-value      League >  p-value
ERA     Negro     7.09E-05     Negro     0.01005      Negro     0.044839
SHO-G   MLB       2.47E-09     MLB       5.08E-07     MLB       0.020474
SV/G    MLB       1.49E-22     MLB       2.43E-16     MLB       1.64E-10
IP/G    MLB       3.89E-30     MLB       6.5E-15      MLB       2.53E-17
WHIP    Negro     0.115023     Negro     0.099421     MLB       0.17294
H/9     MLB       0.607151     MLB       0.198661     MLB       0.093085
HR/9*   MLB       0.632106     MLB       0.167361     MLB       0.196719
BB/9    Negro     5.72E-06     Negro     2.37E-07     MLB       0.948197
SO/9    Negro     7.10E-21     Negro     7.26E-09     Negro     9.21E-33
SO/BB   Negro     3.98E-05     Negro     0.204731     Negro     2.13E-15
CG%     Negro     1.00E-19     Negro     2.57E-17     Negro     1.40E-12
ER%     MLB       2.98E-24     MLB       4.44E-27     MLB       1.01E-18
GS%     MLB       3.6E-181     MLB       1.8E-127     MLB       1.95E-67
WP/9    MLB       4.88E-23     MLB       3.49E-19     MLB       0.000186
BF/9    Negro     5.60E-30     Negro     3.91E-34     Negro     0.005118
R/9     Negro     4.19E-18     Negro     -3.86E-13    Negro     1.71E-05
HBP/9   Negro     -6.59E-07    Negro     4.79E-07     MLB       0.019067

*No HR data for the Negro Leagues in 1924 season

c. Fielding

TOTAL 1904-1923 (20 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
Inn/G   0       0       0       1       3       16
FLD%    0       0       0       0       0       20
E/9     0       1       19      0       0       0
DP/9    0       0       20      0       0       0
PB/9    0       3       0       0       0       2
SB/9*   0       3       6       1       0       0
RF      0       0       0       1       1       0

DEAD BALL ERA 1904-1919 (16 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
Inn/G   0       0       0       1       3       12
FLD%    0       0       0       0       0       16
E/9     0       1       15      0       0       0
DP/9    0       0       16      0       0       0
PB/9    0       3       0       0       0       1
SB/9    0       3       4       1       0       0
RF      0       0       0       1       1       0

LIVE BALL ERA 1920-1923 (4 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
Inn/G   0       0       0       0       0       4
FLD%    0       0       0       0       0       4
E/9     0       0       4       0       0       0
DP/9    0       0       4       0       0       0
PB/9    0       0       0       0       0       1
SB/9*   0       0       2       0       0       0
RF      0       0       0       0       0       0

        TOTAL                  DEAD BALL ERA          LIVE BALL ERA
STAT    League >  p-value      League >  p-value      League >  p-value
Inn/G   MLB       3.2E-112     MLB       6.53E-81     MLB       1.09E-39
FLD%    MLB       3.76E-62     MLB       2.95E-52     MLB       2.74E-19
E/9     Negro     1.58E-58     Negro     3.19E-49     Negro     1.26E-18
DP/9    Negro     2.62E-108    Negro     1.09E-86     Negro     5.45E-30
PB/9    Negro     0.031743     Negro     0.026444     MLB       0.921681
SB/9*   Negro     2.06E-11     Negro     1.1E-09      Negro     9.01E-07
RF      MLB       0.134545     MLB       0.113654     MLB       0.942286

*No SB data for the Negro Leagues in 1923 season

VIII. Chi Square Tables

a. Batting

TOTAL 1904-1924 (21 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
BB%     0       0       10      1       0       1
HBP%    0       3       8       0       0       0
RBI%    0       0       0       0       2       16
BA      0       2       0       0       5       9
2B/AB   0       0       0       1       5       12
3B/AB   0       0       0       1       3       9
HR/AB   0       1       1       0       4       2
OBP     3       1       2       1       2       7

DEAD BALL ERA 1904-1919 (16 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
BB%     0       0       9       1       0       0
HBP%    0       2       6       0       0       0
RBI%    0       0       0       0       2       11
BA      0       2       0       0       4       5
2B/AB   0       0       0       1       4       8
3B/AB   0       0       0       1       3       8
HR/AB   0       1       1       0       3       1
OBP     3       1       2       1       1       3

LIVE BALL ERA 1920-1924 (5 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
BB%     0       0       1       0       0       1
HBP%    0       1       2       0       0       0
RBI%    0       0       0       0       0       5
BA      0       0       0       0       1       4
2B/AB   0       0       0       0       1       4
3B/AB   0       0       0       0       0       1
HR/AB   0       0       0       0       1       1
OBP     0       0       0       0       1       4

        TOTAL                  DEAD BALL ERA          LIVE BALL ERA
STAT    League >  p-value      League >  p-value      League >  p-value
BB%     Negro     7.29E-16     Negro     4.51E-27     Negro     0.867464
HBP%    Negro     0.000112     Negro     2.21E-17     Negro     1
RBI%    MLB       1.1E-62      MLB       1.2E-99      MLB       1.07E-38
BA      MLB       0.000248     MLB       1.22E-10     MLB       3.47E-26
2B/AB   MLB       8.13E-31     MLB       2.74E-38     MLB       8.65E-29
3B/AB   MLB       5.91E-15     MLB       3.98E-24     MLB       0.105438
HR/AB   Negro     2.12E-10     MLB       0.186694     MLB       0.079
OBP     Negro     0.58841      MLB       0.943233     MLB       2.63E-21

b. Pitching

TOTAL 1904-1924 (21 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
CG%     0       5       15      0       0       0
SHO/G   0       0       0       3       3       1
SV/G    0       0       0       2       7       7
ER%     0       0       0       0       1       19
BB%     0       1       9       0       0       3
SO%     3       0       14      0       0       3
H/BF    0       0       0       2       0       14
HR/BF*  0       1       2       1       1       5
HBP/BF  9       1       7       0       0       0

DEAD BALL ERA 1904-1919 (16 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
CG%     0       5       10      0       0       0
SHO/G   0       0       0       1       3       1
SV/G    0       0       0       1       7       4
ER%     0       0       0       0       1       14
BB%     0       1       8       0       0       1
SO%     3       0       9       0       0       3
H/BF    0       0       0       1       0       10
HR/BF   0       1       1       1       1       3
HBP/BF  6       1       5       0       0       0

LIVE BALL ERA 1920-1924 (5 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
CG%     0       0       5       0       0       0
SHO/G   0       0       0       2       0       0
SV/G    0       0       0       1       0       3
ER%     0       0       0       0       0       5
BB%     0       0       1       0       0       2
SO%     0       0       5       0       0       0
H/BF    0       0       0       1       0       4
HR/BF   0       0       1       0       0       2
HBP/BF* 3       0       2       0       0       0

        TOTAL                  DEAD BALL ERA          LIVE BALL ERA
STAT    League >  p-value      League >  p-value      League >  p-value
CG%     Negro     8.26E-50     Negro     3.89E-51     Negro     4.76E-35
SHO/G   MLB       9.64E-15     MLB       9.05E-07     MLB       0.0274
SV/G    MLB       8.56E-20     MLB       2.9E-16      MLB       4.62E-10
ER%     MLB       1.62E-82     MLB       1.65E-91     MLB       1.5E-86
BB%     Negro     0.001268     Negro     3.92E-13     MLB       0.072579
SO%     Negro     1.5E-106     Negro     1.38E-18     Negro     0
H/BF    MLB       3.35E-23     MLB       1.35E-36     MLB       7.34E-30
HR/BF   MLB       0.051184     MLB       0.027199     MLB       2.21E-23
HBP/BF* Negro     1.21E-10     Negro     1.66E-13     Negro     6.08E-08

*No HBP data for the Negro Leagues in the 1924 season

c. Fielding

TOTAL 1904-1923 (20 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
FLD%    0       0       0       0       0       20

DEAD BALL ERA 1904-1919 (16 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
FLD%    0       0       0       0       0       16

LIVE BALL ERA 1920-1923 (4 Seasons)

        Negro > MLB             MLB > Negro
STAT    p<0.1   p<0.05  p<0.01  p<0.1   p<0.05  p<0.01
FLD%    0       0       0       0       0       4

        TOTAL                  DEAD BALL ERA          LIVE BALL ERA
STAT    League >  p-value      League >  p-value      League >  p-value
FLD%    MLB       0            MLB       0            MLB       3.5E-156

IX. Correlation

a. ERA

i. MLB

Year       BB/9      SO/9      H/9       HR/9
1904       0.616001  -0.441    0.916704  0.057167
1905       0.466896  -0.57701  0.940342  0.542934
1906       0.31221   -0.26778  0.850339  0.597368
1907       0.300158  -0.24595  0.831023  0.456504
1908       0.59484   -0.04847  0.730112  0.50159
1909       0.744451  -0.47852  0.954023  0.578331
1910       0.725831  -0.83004  0.966302  0.513713
1911       0.588402  -0.64469  0.775093  0.727327
1912       0.441144  -0.72161  0.852706  0.184401
1913       0.433378  -0.62436  0.794986  0.569757
1914       0.289693  0.107481  0.886527  0.346021
1915       0.733645  -0.20723  0.8771    0.475159
1916       0.842844  -0.31726  0.806477  0.195338
1917       0.607614  -0.14913  0.721562  0.214969
1918       0.484127  -0.76444  0.791237  0.361932
1919       0.722861  -0.20556  0.888185  0.586129
1920       0.900817  -0.11783  0.884163  0.726989
1921       0.814982  -0.24371  0.780254  0.561929
1922       0.532931  -0.17696  0.875254  0.288509
1923       0.504361  -0.44516  0.857213  0.599942
1924       0.641384  -0.49095  0.931439  0.332796
Total      0.522616  -0.4828   0.926114  0.75904
Dead Ball  0.610725  -0.24171  0.877529  0.559755
Live Ball  0.676123  -0.27298  0.862106  0.641661

ii. Negro Leagues

Year       BB/9      SO/9      H/9       HR/9
1904       -0.05212  -0.17011  0.749013  0.953937
1905       0.646347  0.525918  0.825521  -0.1103
1906       0.059832  0.304646  0.847395  0.268204
1907       0.637369  -0.03304  0.827974  0.675634
1908       -0.2064   -0.62236  0.919986  0.54219
1909       -0.19292  0.044471  0.5735    0.400379
1910       0.365512  -0.19744  0.632204  0.681948
1911       0.427597  -0.1348   0.809883  0.585403
1912       0.123696  -0.57612  0.870126  0.328193
1913       -0.26362  -0.03368  0.764459  0.729173
1914       0.576979  -0.2571   0.833739  0.475671
1915       0.472231  -0.53912  0.871718  -0.19471
1916       0.037373  0.088146  0.889452  0.349218
1917       0.913003  0.005145  0.954105  0.4559
1918       0.358186  -0.54785  0.431316  0.807216
1919       0.542001  -0.58998  0.757189  0.370265
1920       0.872608  -0.6255   0.900975  0.768741
1921       0.680847  -0.18368  0.84931   0.027393
1922       0.354944  -0.293    0.968663  0.534264
1923       0.758033  -0.65319  0.919323  0.703684
1924       0.513977  -0.5804   0.909733  N/A
Total      0.326813  -0.14833  0.847213  0.592989
Dead Ball  0.366991  -0.08509  0.796182  0.456298
Live Ball  0.599854  -0.43899  0.905488  0.585229

b. R/G

i. MLB

Year       BA        OBP       SLG
1904       0.910704  0.939109  0.865033
1905       0.773999  0.8657    0.829346
1906       0.858685  0.881019  0.827476
1907       0.864726  0.691878  0.737813
1908       0.8746    0.904766  0.814242
1909       0.895574  0.929482  0.902279
1910       0.912653  0.928854  0.916175
1911       0.875557  0.837273  0.925024
1912       0.876427  0.901207  0.889174
1913       0.825785  0.845047  0.84326
1914       0.892882  0.894927  0.857892
1915       0.67398   0.853922  0.698504
1916       0.67604   0.768678  0.668604
1917       0.70016   0.797935  0.719422
1918       0.687671  0.734599  0.772985
1919       0.909526  0.89438   0.922923
1920       0.869499  0.92817   0.943872
1921       0.855408  0.940631  0.910251
1922       0.964591  0.895365  0.934867
1923       0.921869  0.855288  0.894286
1924       0.854852  0.877944  0.810216
Total      0.901008  0.915096  0.904866
Dead Ball  0.855573  0.868175  0.857477
Live Ball  0.885286  0.911189  0.905993

ii. Negro Leagues

Year       BA        OBP       SLG
1904       0.706204  0.880143  0.769799
1905       0.767223  0.912139  0.592779
1906       0.821616  0.937233  0.720933
1907       0.750731  0.94123   0.697487
1908       0.839595  0.903045  0.843403
1909       0.336242  0.496012  0.258364
1910       0.767354  0.737713  0.795894
1911       0.875658  0.812977  0.914163
1912       0.615235  0.7301    0.637441
1913       0.690071  0.287542  0.835932
1914       0.708002  0.617995  0.824893
1915       0.938802  0.866137  0.953994
1916       0.870183  0.888301  0.836756
1917       0.959302  0.945193  0.878036
1918       0.49277   0.595265  0.768577
1919       0.861506  0.950995  0.782579
1920       0.869236  0.91815   0.884725
1921       0.684554  0.850579  0.855895
1922       0.66264   0.708152  0.821475
1923       0.847507  0.86085   0.81653
1924       0.855812  0.900707  0.770519
Total      0.726912  0.750772  0.730987
Dead Ball  0.696562  0.722047  0.696802
Live Ball  0.838669  0.854232  0.839058
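
The values in Section IX all lie between -1 and 1 and read as correlation coefficients. The source does not name the estimator, so the sketch below assumes the ordinary Pearson product-moment correlation. The team-level inputs are hypothetical placeholders for illustration, not figures from the thesis.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical team OBP and R/G values for a single season.
    team_obp = [0.301, 0.315, 0.322, 0.330, 0.344]
    team_runs_per_game = [3.6, 3.9, 4.1, 4.0, 4.6]
    print(round(pearson_r(team_obp, team_runs_per_game), 3))
```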

Appendix 5 – Regressions

I. R Linear Regression

a. Total

i. MLB

Call:
lm(formula = R ~ BB + HBP + X1B + X2B + X3B + HR + SB)

Residuals:
    Min      1Q  Median      3Q     Max
 -88.17  -23.88   -0.43   23.86  100.42

Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -306.82486    23.42762  -13.097  < 2e-16 ***
BB              0.31494     0.02685   11.728  < 2e-16 ***
HBP             0.54436     0.17165    3.171  0.00165 **
1B              0.41211     0.02787   14.787  < 2e-16 ***
2B              0.87055     0.07048   12.352  < 2e-16 ***
3B              1.19354     0.12719    9.384  < 2e-16 ***
HR              1.36935     0.11660   11.744  < 2e-16 ***
SB              0.29134     0.04261    6.838  3.68e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 34.37 on 344 degrees of freedom
Multiple R-squared: 0.9073, Adjusted R-squared: 0.9054
F-statistic: 481.1 on 7 and 344 DF, p-value: < 2.2e-16

ii. Negro Leagues

Call:
lm(formula = R ~ BB + X1B + X2B + X3B + HR + SB)

Residuals:
    Min      1Q  Median      3Q     Max
-56.935  -9.686  -1.079   9.680  58.053

Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    -3.47028     2.92702   -1.186  0.237179
BB              0.27752     0.05853    4.741  4.02e-06 ***
1B              0.39543     0.03151   12.551  < 2e-16 ***
2B              0.38208     0.12045    3.172  0.001751 **
3B              0.82965     0.22561    3.677  0.000302 ***
HR              1.90925     0.22031    8.666  1.48e-15 ***
SB              0.19400     0.05896    3.290  0.001182 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.76 on 201 degrees of freedom
Multiple R-squared: 0.9761, Adjusted R-squared: 0.9753
F-statistic: 1365 on 6 and 201 DF, p-value: < 2.2e-16

b. Dead Ball Era

i. MLB

Call:
lm(formula = R ~ BB + HBP + X1B + X2B + X3B + HR + SB)

Residuals:
    Min      1Q  Median      3Q     Max
-92.537 -26.736  -3.545  23.111  92.952

Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -292.50776    32.95597   -8.876  < 2e-16 ***
BB              0.26506     0.03122    8.491  1.51e-15 ***
HBP             0.62574     0.19611    3.191  0.00159 **
1B              0.40637     0.03979   10.214  < 2e-16 ***
2B              0.95570     0.08884   10.757  < 2e-16 ***
3B              1.14545     0.15212    7.530  8.06e-13 ***
HR              1.19921     0.22409    5.351  1.89e-07 ***
SB              0.30123     0.05798    5.196  4.09e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 35.42 on 264 degrees of freedom
Multiple R-squared: 0.8562, Adjusted R-squared: 0.8523
F-statistic: 224.5 on 7 and 264 DF, p-value: < 2.2e-16

ii. Negro Leagues

Call:
lm(formula = R ~ BB + X1B + X2B + X3B + HR + SB)

Residuals:
    Min      1Q  Median      3Q     Max
-54.727  -8.147  -0.408   7.644  49.507

Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    -1.81462     3.78607   -0.479  0.63253
BB              0.15868     0.07463    2.126  0.03536 *
1B              0.41501     0.04470    9.284  4.52e-16 ***
2B              0.44655     0.16834    2.653  0.00897 **
3B              1.07339     0.34022    3.155  0.00199 **
HR              1.78670     0.42964    4.159  5.75e-05 ***
SB              0.23385     0.07828    2.988  0.00336 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 17.86 on 131 degrees of freedom
Multiple R-squared: 0.9174, Adjusted R-squared: 0.9136
F-statistic: 242.3 on 6 and 131 DF, p-value: < 2.2e-16

c. Live Ball Era

i. MLB

Call:
lm(formula = R ~ BB + X1B + X2B + X3B + HR)

Residuals:
    Min      1Q  Median      3Q     Max
-65.130 -13.240   5.017  16.492  46.887

Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -457.19266    49.90983   -9.160  8.29e-14 ***
BB              0.53325     0.04372   12.196  < 2e-16 ***
1B              0.51460     0.04861   10.587  < 2e-16 ***
2B              0.65659     0.10594    6.198  2.95e-08 ***
3B              1.49185     0.17827    8.369  2.60e-12 ***
HR              1.54110     0.10390   14.833  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 24.34 on 74 degrees of freedom
Multiple R-squared: 0.9405, Adjusted R-squared: 0.9365
F-statistic: 233.9 on 5 and 74 DF, p-value: < 2.2e-16

ii. Negro Leagues

Call:
lm(formula = R ~ BB + X1B + X2B + X3B + HR + SB)

Residuals:
    Min      1Q  Median      3Q     Max
-51.476 -10.762   1.016  11.548  59.021

Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    -9.22402     7.61165   -1.212  0.230
BB              0.44613     0.10436    4.275  6.60e-05 ***
1B              0.36717     0.05266    6.972  2.24e-09 ***
2B              0.30858     0.21170    1.458  0.150
3B              0.59111     0.35084    1.685  0.097 .
HR              1.93693     0.31403    6.168  5.49e-08 ***
SB              0.20142     0.12379    1.627  0.109
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.13 on 63 degrees of freedom
Multiple R-squared: 0.9762, Adjusted R-squared: 0.9739
F-statistic: 430.2 on 6 and 63 DF, p-value: < 2.2e-16

II. R/G Linear Regression

a. Total

i. MLB

Call:
lm(formula = R.G ~ OBP + SLG)

Residuals:
     Min       1Q   Median       3Q      Max
-0.55884 -0.15399 -0.03425  0.14409  0.74399

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   -4.1224      0.2030   -20.31  <2e-16 ***
OBP           16.1512      1.1394    14.18  <2e-16 ***
SLG            8.5862      0.6889    12.46  <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2342 on 333 degrees of freedom
Multiple R-squared: 0.8839, Adjusted R-squared: 0.8832
F-statistic: 1268 on 2 and 333 DF, p-value: < 2.2e-16

ii. Negro Leagues

Call:
lm(formula = R.G ~ OBP + SLG)

Residuals:
     Min       1Q   Median       3Q      Max
-1.98142 -0.39910 -0.06325  0.27330  2.24203

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   -2.0913      0.5146   -4.064  7.09e-05 ***
OBP           14.5400      2.5439    5.716  4.23e-08 ***
SLG            6.7113      1.4594    4.599  7.79e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6804 on 188 degrees of freedom
Multiple R-squared: 0.5998, Adjusted R-squared: 0.5955
F-statistic: 140.9 on 2 and 188 DF, p-value: < 2.2e-16

b. Dead Ball Era

i. MLB

Call:
lm(formula = R.G ~ OBP + SLG)

Residuals:
     Min       1Q   Median       3Q      Max
-0.56916 -0.15551 -0.02578  0.15033  0.71873

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   -4.1802      0.2458   -17.01  <2e-16 ***
OBP           15.2067      1.3260    11.47  <2e-16 ***
SLG            9.7179      0.9548    10.18  <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2398 on 269 degrees of freedom
Multiple R-squared: 0.8222, Adjusted R-squared: 0.8209
F-statistic: 622 on 2 and 269 DF, p-value: < 2.2e-16

ii. Negro Leagues

Call:
lm(formula = R.G ~ OBP + SLG)

Residuals:
     Min       1Q   Median       3Q      Max
-2.01271 -0.44536 -0.07494  0.23849  2.20399

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   -1.9834      0.6034   -3.287  0.00129 **
OBP           14.4491      3.0905    4.675  7.03e-06 ***
SLG            6.5353      1.9761    3.307  0.00121 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7655 on 135 degrees of freedom
Multiple R-squared: 0.5572, Adjusted R-squared: 0.5507
F-statistic: 84.95 on 2 and 135 DF, p-value: < 2.2e-16

c. Live Ball Era

i. MLB

Call:
lm(formula = R.G ~ OBP + SLG)

Residuals:
     Min       1Q   Median       3Q      Max
-0.42534 -0.13015 -0.00927  0.12129  0.38723

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    -5.628       0.382  -14.733  < 2e-16 ***
OBP            17.397       1.842    9.444  1.52e-13 ***
SLG            11.151       1.164    9.581  8.93e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1663 on 61 degrees of freedom
Multiple R-squared: 0.9378, Adjusted R-squared: 0.9357
F-statistic: 459.6 on 2 and 61 DF, p-value: < 2.2e-16

ii. Negro Leagues

Call:
lm(formula = R.G ~ OBP + SLG)

Residuals:
    Min      1Q  Median      3Q     Max
-0.7199 -0.2628 -0.0692  0.3506  0.7808

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    -2.583       1.064   -2.428  0.018819 *
OBP            12.332       5.467    2.256  0.028502 *
SLG             9.934       2.536    3.917  0.000272 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3741 on 50 degrees of freedom
Multiple R-squared: 0.7864, Adjusted R-squared: 0.7778
F-statistic: 92.02 on 2 and 50 DF, p-value: < 2.2e-16
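
As a reading aid for the R/G regressions, the sketch below plugs league-level OBP and SLG into two of the fitted equations and compares the result with the R/G listed in the batting tables. The coefficients are copied from the Live Ball Era MLB fit and the Total Negro Leagues fit; the inputs are the Live Ball row of the MLB batting table (OBP 0.345, SLG 0.392, R/G 4.73) and the TOTAL row of the Negro League batting table (OBP 0.323, SLG 0.336, R/G 4.81). The models appear to have been fitted on team seasons, so applying them to league aggregates is only an illustrative sanity check, and the function name is hypothetical.

```python
def predict_runs_per_game(intercept: float, b_obp: float, b_slg: float,
                          obp: float, slg: float) -> float:
    """Fitted value of R/G from a linear model of the form R.G ~ OBP + SLG."""
    return intercept + b_obp * obp + b_slg * slg


# (intercept, OBP coefficient, SLG coefficient) copied from the R output.
MLB_LIVE_BALL = (-5.628, 17.397, 11.151)
NEGRO_LEAGUES_TOTAL = (-2.0913, 14.5400, 6.7113)

if __name__ == "__main__":
    mlb = predict_runs_per_game(*MLB_LIVE_BALL, obp=0.345, slg=0.392)
    negro = predict_runs_per_game(*NEGRO_LEAGUES_TOTAL, obp=0.323, slg=0.336)
    print(f"MLB Live Ball: predicted {mlb:.2f}, table lists 4.73")
    print(f"Negro Leagues Total: predicted {negro:.2f}, table lists 4.81")
```

Both predictions land within about a tenth of a run per game of the tabulated values.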

III. ERA Linear Regression

a. Total

i. MLB

Call:
lm(formula = ERA ~ BB.9 + H.9 + HR.9)

Residuals:
     Min       1Q   Median       3Q      Max
-0.43057 -0.10036 -0.00771  0.09140  0.52308

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -2.70452     0.11336   -23.86  <2e-16 ***
BB.9          0.33254     0.01719    19.34  <2e-16 ***
H.9           0.53146     0.01494    35.58  <2e-16 ***
HR.9          1.10913     0.10858    10.21  <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1612 on 332 degrees of freedom
Multiple R-squared: 0.9412, Adjusted R-squared: 0.9406
F-statistic: 1770 on 3 and 332 DF, p-value: < 2.2e-16

ii. Negro Leagues

Call:
lm(formula = ERA ~ BB.9 + H.9 + HR.9)

Residuals:
     Min       1Q   Median       3Q      Max
-1.53891 -0.26896 -0.00814  0.23672  1.73572

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -3.04130     0.22789   -13.35  < 2e-16 ***
BB.9          0.46853     0.04234    11.06  < 2e-16 ***
H.9           0.54083     0.02451    22.07  < 2e-16 ***
HR.9          1.51887     0.21636     7.02  3.12e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4751 on 207 degrees of freedom
Multiple R-squared: 0.8421, Adjusted R-squared: 0.8398
F-statistic: 368.1 on 3 and 207 DF, p-value: < 2.2e-16

b. Dead Ball Era

i. MLB

Call:
lm(formula = ERA ~ BB.9 + H.9 + HR.9)

Residuals:
     Min       1Q   Median       3Q      Max
-0.47556 -0.10882 -0.00151  0.10448  0.48208

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -2.73911     0.13331  -20.547  <2e-16 ***
BB.9          0.31105     0.01927   16.145  <2e-16 ***
H.9           0.53744     0.01781   30.175  <2e-16 ***
HR.9          1.43173     0.15843    9.037  <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1621 on 268 degrees of freedom
Multiple R-squared: 0.902, Adjusted R-squared: 0.9009
F-statistic: 822.4 on 3 and 268 DF, p-value: < 2.2e-16

ii. Negro Leagues

Call:
lm(formula = ERA ~ BB.9 + H.9 + HR.9)

Residuals:
    Min      1Q  Median      3Q     Max
-1.4106 -0.2613  0.0041  0.3105  1.6048

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -2.73637     0.26569  -10.299  < 2e-16 ***
BB.9          0.46637     0.04770    9.777  < 2e-16 ***
H.9           0.49908     0.02876   17.354  < 2e-16 ***
HR.9          1.44135     0.29661    4.859  2.89e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4892 on 153 degrees of freedom
Multiple R-squared: 0.788, Adjusted R-squared: 0.7839
F-statistic: 189.6 on 3 and 153 DF, p-value: < 2.2e-16

c. Live Ball Era

i. MLB

Call:
lm(formula = ERA ~ BB.9 + SO.9 + H.9 + HR.9)

Residuals:
     Min       1Q   Median       3Q      Max
-0.35508 -0.08927 -0.02091  0.08083  0.36858

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -2.48412     0.41157   -6.036  1.13e-07 ***
BB.9          0.44000     0.04157   10.584  2.93e-15 ***
SO.9         -0.09648     0.06229   -1.549  0.126753
H.9           0.51917     0.03730   13.917  < 2e-16 ***
HR.9          0.70107     0.18551    3.779  0.000369 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1425 on 59 degrees of freedom
Multiple R-squared: 0.9284, Adjusted R-squared: 0.9236
F-statistic: 191.3 on 4 and 59 DF, p-value: < 2.2e-16

ii. Negro Leagues

Call:
lm(formula = ERA ~ BB.9 + SO.9 + H.9 + HR.9)

Residuals:
     Min       1Q   Median       3Q      Max
-0.55876 -0.21414 -0.05577  0.14370  1.00617

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -4.57582     0.64301   -7.116  4.38e-09 ***
BB.9          0.62212     0.08740    7.118  4.35e-09 ***
SO.9          0.12394     0.08524    1.454  0.1523
H.9           0.64758     0.04205   15.400  < 2e-16 ***
HR.9          0.65930     0.30485    2.163  0.0355 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3448 on 49 degrees of freedom
Multiple R-squared: 0.922, Adjusted R-squared: 0.9157
F-statistic: 144.9 on 4 and 49 DF, p-value: < 2.2e-16

IV. W% Logistic Regression

a. Total

i. MLB

Call:
glm(formula = cbind(W, L) ~ OBP + SLG + H.9 + BB.9 + HR.9 + Fld., family = binomial)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-2.67321 -0.61435 -0.01403  0.62150  2.92182

Coefficients:
             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -8.05360     1.54754   -5.204  1.95e-07 ***
OBP           7.25425     0.85089    8.526  < 2e-16 ***
SLG           5.61726     0.59500    9.441  < 2e-16 ***
H.9          -0.33084     0.01624  -20.373  < 2e-16 ***
BB.9         -0.19995     0.01885  -10.607  < 2e-16 ***
HR.9         -0.48626     0.13115   -3.708  0.000209 ***
Fld.          7.61996     1.61483    4.719  2.37e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 2102.27 on 335 degrees of freedom
Residual deviance: 314.72 on 329 degrees of freedom
AIC: 2152

Number of Fisher Scoring iterations: 3

ii. Negro Leagues

Call:
glm(formula = cbind(W, L) ~ OBP + H.9 + BB.9 + SO.9 + Fld., family = binomial)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-3.2044 -0.6164 -0.0245  0.6383  2.1553

Coefficients:
             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -6.17653     1.37962   -4.477  7.57e-06 ***
OBP          11.25784     1.06864   10.535  < 2e-16 ***
H.9          -0.31096     0.02096  -14.835  < 2e-16 ***
BB.9         -0.21613     0.04026   -5.368  7.95e-08 ***
SO.9          0.03547     0.03457    1.026  0.305
Fld.          6.12958     1.56045    3.928  8.56e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 586.64 on 183 degrees of freedom
Residual deviance: 164.14 on 178 degrees of freedom
AIC: 860.17

Number of Fisher Scoring iterations: 4

b. Dead Ball Era

i. MLB

Call:
glm(formula = cbind(W, L) ~ OBP + SLG + H.9 + BB.9 + HR.9 + Fld., family = binomial)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
-2.62678 -0.60824  0.02399  0.62939  3.05749

Coefficients:
             Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  -6.71365     1.83971   -3.649  0.000263 ***
OBP           7.99558     0.94713    8.442  < 2e-16 ***
SLG           5.80683     0.69152    8.397  < 2e-16 ***
H.9          -0.34689     0.02022  -17.154  < 2e-16 ***
BB.9         -0.20790     0.02067  -10.056  < 2e-16 ***
HR.9         -0.72519     0.17101   -4.241  2.23e-05 ***
Fld.          6.11345     1.88157    3.249  0.001158 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1801.14 on 271 degrees of freedom
Residual deviance: 262.28 on 265 degrees of freedom
AIC: 1750.3

Number of Fisher Scoring iterations: 3

ii. Negro Leagues

Call:
glm(formula = cbind(W, L) ~ OBP + H.9 + BB.9 + Fld., family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.9554  -0.6706  -0.0331   0.6090   2.1173

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.55484    1.51474  -3.667 0.000245 ***
OBP         11.91815    1.24073   9.606  < 2e-16 ***
H.9         -0.31656    0.02722 -11.632  < 2e-16 ***
BB.9        -0.26848    0.04589  -5.851 4.88e-09 ***
Fld.         5.63108    1.71188   3.289 0.001004 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 419.88  on 137  degrees of freedom
Residual deviance: 128.97  on 133  degrees of freedom
AIC: 624.18

Number of Fisher Scoring iterations: 4

c. Live Ball Era

i. MLB

Call:
glm(formula = cbind(W, L) ~ OBP + SLG + H.9 + BB.9 + SO.9 + Fld., family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.80377  -0.43871  -0.03125   0.53025   1.60187

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.36272    5.13716  -2.017 0.043674 *
OBP           4.76105    1.95866   2.431 0.015067 *
SLG           4.44015    1.20922   3.672 0.000241 ***
H.9          -0.28139    0.04283  -6.570 5.02e-11 ***
BB.9         -0.19608    0.04792  -4.092 4.27e-05 ***
SO.9          0.13713    0.07322   1.873 0.061076 .
Fld.         10.27240    5.24053   1.960 0.049975 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 301.133  on 63  degrees of freedom
Residual deviance:  32.191  on 57  degrees of freedom
AIC: 395.44

Number of Fisher Scoring iterations: 3

ii. Negro Leagues

Call:
glm(formula = cbind(W, L) ~ OBP + H.9 + Fld., family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0672  -0.6690  -0.1752   0.5285   1.8329

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.66670    5.41043  -1.787   0.0740 .
OBP         11.56236    2.05997   5.613 1.99e-08 ***
H.9         -0.34200    0.03969  -8.616  < 2e-16 ***
Fld.         9.47087    5.72264   1.655   0.0979 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 164.393  on 45  degrees of freedom
Residual deviance:  32.908  on 42  degrees of freedom
AIC: 239.73

Number of Fisher Scoring iterations: 4
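As a reading aid for the binomial fits listed above, the following is a minimal sketch of how the reported coefficients translate into a predicted winning percentage and into odds multipliers. It is written in Python for illustration and is not the author's R code. The coefficients are copied from the first MLB model in this listing (the fit with residual deviance 314.72 on 329 degrees of freedom). The team statistics in `hypothetical_team`, and the helper names `win_probability` and `odds_multiplier`, are assumptions introduced only for this example.

```python
import math

# Coefficients copied from the first MLB fit in this listing
# (residual deviance 314.72 on 329 degrees of freedom).
MLB_COEFS = {
    "(Intercept)": -8.05360,
    "OBP": 7.25425,
    "SLG": 5.61726,
    "H.9": -0.33084,
    "BB.9": -0.19995,
    "HR.9": -0.48626,
    "Fld.": 7.61996,
}


def win_probability(coefs, team):
    """Apply the inverse logit to the linear predictor.

    eta = intercept + sum(coefficient * team statistic)
    p   = 1 / (1 + exp(-eta))
    """
    eta = coefs["(Intercept)"] + sum(coefs[name] * value for name, value in team.items())
    return 1.0 / (1.0 + math.exp(-eta))


def odds_multiplier(coefs, stat, delta):
    """Factor by which the odds of winning change when `stat` moves by `delta`,
    holding the other predictors fixed: exp(coefficient * delta)."""
    return math.exp(coefs[stat] * delta)


# Hypothetical team line (illustrative values only, not taken from the data set).
hypothetical_team = {
    "OBP": 0.320,   # on-base percentage
    "SLG": 0.350,   # slugging percentage
    "H.9": 8.5,     # hits allowed per nine innings
    "BB.9": 3.0,    # walks allowed per nine innings
    "HR.9": 0.2,    # home runs allowed per nine innings
    "Fld.": 0.960,  # fielding average
}

p = win_probability(MLB_COEFS, hypothetical_team)
obp_effect = odds_multiplier(MLB_COEFS, "OBP", 0.010)
print(f"Predicted winning percentage: {p:.3f}")
print(f"Odds multiplier for +0.010 OBP: {obp_effect:.3f}")
```

Because the models are fit on the logit scale, a coefficient is easiest to read as an odds multiplier over a realistic change in the statistic (for example ten points of OBP or fielding average) rather than over a full unit, which no team could ever move.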

Appendix 6 – Graphics

I. Time Plots

a. Batting

b. Pitching

c. Fielding

II. Heatmaps