THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE

DEPARTMENT OF FINANCE

MONEYBALL IN THE NFL: A FINANCIAL MANAGEMENT ANALYSIS OF THE IDEAL NFL SALARY CAP STRUCTURE

JOHN PEREGRIM

FALL 2019

A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Finance with honors in Finance

Reviewed and approved* by the following:

Robert Novack
Associate Professor of Supply Chain Management
Thesis Supervisor

Brian Davis
Professor of Finance
Honors Adviser

* Signatures are on file in the Schreyer Honors College.


ABSTRACT

This paper examines how the highly acclaimed Major League Baseball strategy can be implemented in the National Football League. The goal is to build a regression model that yields an optimal salary cap allocation for the various NFL player position groups based on each group’s contribution to the team’s average win percentage. All data is sourced from historical National Football League statistics over the ten seasons from 2008 to 2017. Players were grouped into eight main position groups, and five key statistics were selected for each position group. Each team’s win percentage was regressed on the forty player statistics to determine the contribution of each variable to the team’s win percentage. Once the regression model returned the optimal salary cap allocations per position group, these values were compared to the weighted average salary cap allocation across all teams from 2011 to 2017. The comparison indicates that quarterbacks, running backs, offensive linemen, cornerbacks and safeties, and kickers and punters are underpaid; that wide receivers and tight ends and defensive linemen are overpaid; and that linebackers cannot be reliably valued using the selected player statistics.


TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGEMENTS

Chapter 1 Introduction of Topic

    NFL: The Business
    Origins of Moneyball
    Moneyball in the NFL
    Paul DePodesta’s Influence on Sports
    Thesis Statement

Chapter 2 Literature Review

    Inconsistencies Between Moneyball in the MLB and in the NFL
    NFL Salary Cap Breakdown
    Unanswered Questions

Chapter 3 Data Methodology

    Data Collection
        Position Groups
        Quantitative Statistics
        Data Exporting and Sorting
    Position Group Regression Model
        Position Group Allocations
    Data Methodology Summary

Chapter 4 Data Analysis

    Regression Model
        Model Verification
        Regression Model 1.0
        Regression Model 2.0
        Regression Model 3.0
        Regression Model 4.0
        Regression Model 5.0
    Statistical Analysis
    Salary Cap Comparison

Chapter 5 Conclusion

    Moneyball’s Potential Impact on NFL Franchises
    Topics to Further Explore

Appendix A Annual Statistics Per Team 2008-2017

Appendix B Annual Salary Cap Allocation Per Position Group Per Team 2011-2017

BIBLIOGRAPHY


LIST OF FIGURES

Figure 1: Regression Model 1.0 Output

Figure 2: Regression Model 2.0 Output

Figure 3: Regression Model 3.0 Output

Figure 4: Regression Model 4.0 Output

Figure 5: Regression Model 5.0 Output

Figure 6: Python 3.7.2 OLS Regression Model 5.0 Output

Figure 7: Various Salary Cap Breakdowns

Figure 8: NFL Salary Cap Simple Average

Figure 9.1: NFL Salary Cap Weighted Average

Figure 9.2: NFL Salary Cap Weighted Average

Figure 10: NFL Salary Cap Super Bowl Average

Figure 11: NFL Franchises Ranked by Total Cap Difference


LIST OF TABLES

Table 1: Player Statistics Per Position Group


ACKNOWLEDGEMENTS

I would like to thank my thesis supervisor, Dr. Novack, and my honors adviser, Dr. Davis, for guiding my vision every step of the way and for their encouragement in keeping our meetings productive. I would also like to thank my good friend Trent Andraka, a computer engineering major at Washington University in St. Louis, for teaching me how to use the Python programming language. Finally, and most importantly, I would like to thank my mom for her daily support throughout this thesis and all of my other Schreyer Honors College experiences.


Chapter 1

Introduction of Topic

Many sports fans are already familiar with the Major League Baseball (MLB) payroll management technique known as Moneyball. The purpose of this thesis is to extend the theory behind Moneyball and apply it to the National Football League.

NFL: The Business

It is no secret that the National Football League (NFL) is the most popular sports league in the world in terms of both viewership and revenue. The average NFL game draws an audience of 15.472 million viewers, significantly larger than that of the second-place league, NASCAR, which averages 3.332 million viewers (Maglio, 2018). NFL total revenue currently stands at $13.68 billion, with $1.32 billion coming from sponsorships, which shows that the National Football League is a moneymaking juggernaut (Fuller, 2017).

It is also key to note that the average NFL franchise has a value of $2.57 billion (Fuller, 2017). The top five most valuable franchises are the Dallas Cowboys, New England Patriots, New York Giants, Los Angeles Rams, and Washington Redskins, with five, six, four, one, and three Super Bowl victories respectively (Rovell, 2018). In a sports league, winning drives success in viewership, stadium attendance, and overall revenue, and these franchise valuations support that general theory. These measures show that the NFL is thriving financially thanks to a large fan base; however, both fans and franchise owners heavily value wins. With league personnel and team owners constantly searching for a formula for success, this thesis is meant to address that conundrum at an opportune moment.

Origins of Moneyball

In the MLB, there is no uniform salary cap, so payrolls are not consistent among teams: older and more revenue-driven teams, like the New York Yankees, have a larger budget to acquire players than newer and less established teams. This visibly uneven playing field gave rise to the highly acclaimed technique called Moneyball. In 2002, the Oakland Athletics began using sabermetric statistics to find undervalued talent while also maximizing the efficiency of their payroll. For example, while most MLB teams looked at basic statistics like batting average and fielding percentage, the Athletics began analyzing statistics like slugging percentage and on-base percentage (Piellucci, 2017).

The goal was not just to use statistics to value players but, more importantly, to find statistics that were more attributable to wins than those in use at the time. Front office assistant Paul DePodesta spearheaded the idea and was involved in its implementation. DePodesta worked hand in hand with the Athletics’ General Manager at the time, Billy Beane, analyzing statistics, executing player trades, and managing the tight payroll. By acquiring players who offered a greater bang for the buck, the Oakland Athletics went from one of the worst teams in baseball to a squad that won the American League West division with over one hundred wins, thanks in part to their iconic twenty-game win streak. In short, the Oakland Athletics catapulted themselves into relevance with a front office that outsmarted its counterparts by using numbers and statistics to value players more accurately.


Moneyball in the NFL

NFL coaches have begun to rely on statistics more heavily within the last decade; however, the numbers predominantly play into coaching decisions. The most discussed use of quantitative data in football concerns probabilities and whether statistics encourage or discourage certain decisions. For example, coaches have become more inclined to go for it on fourth down before the game is on the line, typically when the yardage to gain is under a certain threshold. This is not because coaches are becoming less risk-averse but because going for the first down is statistically favorable.

However, when it comes to paying players, general managers are not convinced that statistics are the way to value their talent. Several high-profile NFL personnel have expressed skepticism about Moneyball tactics because of the dissimilarities between MLB and NFL statistics. While positions in baseball are not exactly the same, they are similar in the sense that all players have a fielding percentage and batting statistics, whereas in football almost every position is drastically different. For example, it is simply unfair to compare a crucial offensive player, like a running back, to an underappreciated defensive player, like a linebacker.

Another reason that general managers are turning away from Moneyball is that the performance of some NFL players is difficult to quantify. The most common question posed is how one values an offensive lineman whose sole purpose is to block for other offensive players. One hypothetical describes a play in which the offensive linemen execute their blocking assignments perfectly, yet the running back slips and falls behind the line of scrimmage, leading to a two-yard loss (Olson, 2018). Naturally, there will always be some variability when relying purely on past player statistics to value positions and determine an appropriate salary cap allocation.

Paul DePodesta’s Influence on Sports

Oakland Athletics front office assistant Paul DePodesta has been the man predominantly credited with the success of Moneyball, and rightfully so. Following his stint with the Athletics, DePodesta was hired as the Los Angeles Dodgers General Manager in 2004, the San Diego Padres Special Assistant for Baseball Operations in 2006, and the New York Mets Vice President of Player Development and Scouting in 2010. Even more intriguing, in 2016 DePodesta took his first job in the National Football League as the Chief Strategy Officer for the struggling Cleveland Browns (Axisa, 2016). Most football observers would agree that, as of 2016, the Cleveland Browns were the worst franchise in the NFL. Over the past decade, the Browns have owned one of the lowest average winning percentages in football, with a history of misallocating salary cap funds and a trend of poor draft picks, specifically at the quarterback position. In desperate need of a change, the Browns went out on a limb and hired baseball guru DePodesta. Initially, many NFL general managers and executives were highly skeptical of the move, though somewhat intrigued. The general consensus was that the world would know within a matter of years whether the hire was a success.

Thesis Statement

After regressing winning percentage on forty key NFL statistics in a multi-variate regression model, the optimal salary cap allocation will show that teams should pay more for quarterbacks, running backs, offensive linemen, and kickers and punters, but should pay less for wide receivers and tight ends, defensive linemen, and linebackers.


Chapter 2

Literature Review

Inconsistencies Between Moneyball in the MLB and in the NFL

For years, observers have wondered why, if Moneyball was so groundbreaking and instrumental in the success of an MLB team, it has not been adapted and implemented by NFL franchises. In fact, the majority of NFL teams now have a large staff consistently examining statistics and how they could affect game preparation and coaching decisions. However, while general managers, owners, and coaches agree that analytics are key in preparing to win football games, there is heavy debate regarding their effectiveness in shaping a team’s salary cap behind the scenes. The main reason is that baseball is considered a fairly binary game in which player statistics are relatively easy to measure, while NFL player value is extremely hard to quantify (Olson, 2018). For example, in the MLB every player has a batting average and a fielding percentage, whereas in the NFL there are dozens of positions, each with its own unique task. Therefore, it is very difficult to compare a running back to a quarterback or a kicker to a safety.

Another reason the NFL has not embraced the concept of Moneyball is that the amount of available data differs significantly from the MLB. In the MLB there are nine starting players and one hundred sixty-two games, compared to twenty-two NFL starting players, on offense and defense combined, and a mere sixteen games. With only sixteen regular season games, the sample size for the NFL is minute compared to that of the lengthy MLB season (Olson, 2018). This leads to a lack of uniformity in player statistics and makes it harder to value players in relation to the team’s success. It is also argued that in the MLB a player’s personal statistics are indicative of their own value, while in the NFL a player’s statistics might be biased, in the sense that teammates rely on each other more in a game of football (Smith, 2016). Theoretically, an NFL player could have a very bland game statistically and still have contributed to the team’s victory.

The structure of the salary cap also helps explain why the NFL has not adopted a Moneyball-type strategy. In the MLB, the Oakland Athletics developed this strategy because their payroll was significantly smaller than that of powerhouse teams like the New York Yankees. The NFL, on the other hand, has a uniform salary cap that is mandated for each of the thirty-two teams. To put it simply, there is no urgent need to find a better “bang for your buck” because teams like the Cleveland Browns have the same player budget as the New York Jets or Buffalo Bills (Smith, 2016). With that being said, no team in the NFL has truly taken a deep dive into the correlation between player salaries and the average franchise win percentage. However, in 2016 the Cleveland Browns hired a new chief strategy officer, Paul DePodesta, most famously known for his work with the Oakland Athletics (La Canfora, 2017). This was a huge move for a struggling Cleveland Browns organization, one that caught the attention of the NFL community and could either launch the franchise back into relevance or suggest that Moneyball in the NFL is nothing more than a farce.

NFL Salary Cap Breakdown

According to data analytics, a key metric related to the average number of wins is the share of the salary cap allocated to the ten highest-paid players. Between 2011 and 2017, the highest number of average wins, 9.2, came from teams whose top ten players accounted for fifty-six to fifty-nine percent of the team’s total salary cap (Connelly, 2018). According to the data, going over sixty percent is detrimental to a franchise, averaging a mere 5.5 wins in a season, while teams anywhere between forty-four percent and fifty-nine percent average between 8.0 and 9.2 wins. There is clearly a relationship between the payroll share of the top ten salaries and the number of wins, in which the ideal allocation hovers around the fifty percent benchmark. To be exact, during that same time period, the teams that won between thirteen and fifteen games had 52.5 percent of their salary caps allocated to the ten highest-paid players.

On the other hand, in 2017 the Super Bowl champion Philadelphia Eagles were at a mere 45.4 percent while the runner-up New England Patriots were at 46.1 percent. According to these metrics, both the Eagles and Patriots should have achieved about 8.0 wins and most likely finished in the bottom half of the league. Taking all of that into account, there appears to be no perfect correlation between the salary cap percentage allocated to top-tier players and the number of team wins, although there might be a method behind the madness.
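The top-ten metric discussed in this section is simple arithmetic, and a short Python sketch may make it concrete. The function name and the cap figures below are hypothetical and chosen only for illustration; they are not taken from Connelly (2018) or from any real roster.

```python
def top_ten_cap_share(cap_hits, total_cap):
    """Percentage of the salary cap taken up by the ten largest cap hits."""
    if total_cap <= 0:
        raise ValueError("total_cap must be positive")
    top_ten = sorted(cap_hits, reverse=True)[:10]
    return 100.0 * sum(top_ten) / total_cap


# Hypothetical roster (values in millions of dollars), not real data:
# ten highly paid players followed by forty-three players at 1.5 each.
roster = [25, 18, 15, 12, 10, 9, 8, 7, 6, 5] + [1.5] * 43
share = top_ten_cap_share(roster, total_cap=sum(roster))
print(f"Top ten share of cap: {share:.1f}%")
```

As the section notes, two teams can report the same share while spending it on entirely different positions, which this single number cannot reveal.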

Unanswered Questions

While many articles and opinions have been published, both by amateurs and sports intellectuals, there is a large amount of skepticism around the connection between statistics and NFL player valuation. Some argue that it is virtually impossible to value a position group using numbers, while others believe it is possible but would yield a result that is unrealistic or detached from the reality of the sport. With that being said, no research studies have been completed in an attempt to value position groups as a whole.

There have also been no studies attempting to value individual players; however, salary cap managers and general managers have analyzed the pay of top players in relation to the total salary cap. Many NFL franchises have followed suit in analyzing the percentage of the salary cap allocated to the ten highest-paid players on their teams. While this number has been heralded as a key metric in current salary cap management, it is misleading because it ignores the positions of those top-paid athletes. For example, one franchise’s ten highest-paid players could all be offensive players while another franchise’s could all be defensive players. If both franchises allocated the same percentage to their top ten players, it does not follow that they would have the same average win percentage. In fact, the average win percentage could be significantly different, which suggests that this practice needs to be examined further. This motivates the present thesis, whose goal is to value position groups independently based on their contribution to teams’ average win percentage.


Chapter 3

Data Methodology

Moneyball is the concept of using player statistics to value players; therefore, it was crucial to identify key NFL statistics to use in the regression model. However, unlike in baseball, football statistics vary by position. This led to the creation of position groups, in which statistics vary between position groups but remain constant among the positions within each group. Finally, annual team winning percentages were collected and used as the dependent variable, with the end goal of measuring the relationship between the various player statistics and wins.

Data Collection

Position Groups

Before any data extraction, players needed to be organized into groups, considering there is a plethora of football positions. The most efficient position groups were determined to be quarterbacks, running backs, wide receivers and tight ends, offensive linemen, defensive linemen, linebackers, cornerbacks and safeties, and punters and kickers. Not only did this categorize players by their designated task, but it also allowed the data mining process to operate more smoothly. For example, when exporting data for NFL passing defense, the values for cornerbacks, left cornerbacks, right cornerbacks, free safeties, strong safeties, and defensive backs were either summed or averaged, depending on the statistic, to determine the final value used for that single statistic. For consistency across the eight position groups, five key statistics were chosen for each group.

Quantitative Statistics

Player statistics were chosen, first and foremost, based on availability, considering all player data had to be published online and readily available to the public. This would typically be a challenge; however, almost every NFL statistic imaginable is either published on a popular website in a ready-to-go format or can be found on individual team websites. The next step was to determine how many statistics were appropriate for each position group, which was determined to be five. Selecting five of the most important statistics for each of the eight position groups meant that forty NFL statistics would be used to find the relationships to winning percentage.

For quarterbacks, the most important statistics used were passing yards, touchdown passes, comebacks, completion percentage, and passing yards per attempt. Comebacks are the number of times that a team reclaimed the lead after having been behind in the fourth quarter or overtime.

For running backs, the most important statistics used were rushing yards, receiving yards, total touchdowns, non-fumble percentage, and total touches. Non-fumble percentage is the percentage of rushing attempts and receptions that did not result in a fumble. Total touches are the sum of total rushing attempts and total receptions.

For wide receivers and tight ends, the most important statistics used were receiving yards, catch percentage, receiving touchdowns, yards per reception, and yards per game. Catch percentage is receptions as a percentage of targets.

For offensive linemen, the most important statistics used were rushes of ten or more yards, positive rushing percentage, non-quarterback hits percentage, non-quarterback sacks percentage, and power percentage. Positive rushing percentage is the percentage of rushing attempts that gained positive yards. Non-quarterback hits percentage is the percentage of pass attempts on which the quarterback was not hit by a defender, while non-quarterback sacks percentage is the same statistic using sacks instead of hits. Power percentage is the percentage of rushes on third or fourth down with under two yards to go that resulted in a first down.

For defensive linemen, the most important statistics used were forced fumbles, sack percentage, tackles, tackles for loss, and yards lost from sacks.

For linebackers, the most important statistics used were interception percentage, forced fumbles, fumble recoveries, quarterback pressures, and passes defended. Interception percentage is interceptions as a percentage of opposing passing attempts. Quarterback pressures are the sum of quarterback hits and sacks.

For cornerbacks and safeties, the most important statistics used were interceptions, forced fumbles, passes defended, tackles, and opposing incompletion percentage. Opposing incompletion percentage is the percentage of opposing pass attempts that did not result in a completion.

For kickers and punters, the most important statistics used were total field goals made, field goals made over forty yards, yards per punt, total yards punted, and punts pinned within the twenty-yard line.
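Several of the derived statistics defined in this section reduce to simple ratios. The sketch below shows one way they might be computed in Python; the function names and sample inputs are hypothetical, and the source does not show how the original calculations were carried out.

```python
def percentage(part, whole):
    """Return part / whole as a percentage; None if the denominator is zero."""
    if whole == 0:
        return None
    return 100.0 * part / whole


def total_touches(rush_attempts, receptions):
    """Running backs: rushing attempts plus receptions."""
    return rush_attempts + receptions


def non_fumble_pct(rush_attempts, receptions, fumbles):
    """Running backs: share of touches that did not end in a fumble."""
    touches = total_touches(rush_attempts, receptions)
    return percentage(touches - fumbles, touches)


def catch_pct(receptions, targets):
    """Wide receivers and tight ends: receptions relative to targets."""
    return percentage(receptions, targets)


def non_qb_hits_pct(qb_hits, pass_attempts):
    """Offensive line: share of pass attempts on which the QB was not hit."""
    return percentage(pass_attempts - qb_hits, pass_attempts)


def qb_pressures(qb_hits, sacks):
    """Linebackers: quarterback hits plus sacks."""
    return qb_hits + sacks


def opposing_incompletion_pct(completions_allowed, opposing_attempts):
    """Cornerbacks and safeties: share of opposing attempts not completed."""
    return percentage(opposing_attempts - completions_allowed, opposing_attempts)


# Hypothetical inputs, not real player data.
print(non_fumble_pct(rush_attempts=200, receptions=50, fumbles=2))
print(catch_pct(receptions=60, targets=100))
```

Expressing hits and sacks as "non" percentages means that a larger value is better for every offensive line statistic, which keeps the direction of each variable consistent.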

Table 1: Player Statistics Per Position Group

Several statistics were transformed to suit the regression model, as explained further in the data analysis section. For example, non-quarterback hits percentage is not a standard NFL statistic, since the figure is typically reported as quarterback hits; however, it was crucial to the assumptions and output of the model. It is also important to point out that some statistics are repeated across groups but are not identical. For example, several position groups include an interception statistic, but interceptions were counted by the position of the player who made them and listed under that player’s assigned position group.

Data Exporting and Sorting

Prior to pulling any data, it was determined that using the data for all thirty-two NFL teams over the past ten years, 2008 to 2017, would yield the best results. Ten years was judged to be an appropriate range because data availability diminishes for earlier seasons, and a decade’s worth of data was a convenient round number. The majority of the data was taken directly from the public website Pro-Football-Reference.com, while some harder-to-find and less common statistics were found on other websites such as FootballOutsiders.com, NFL.com, and ESPN.com.

Luckily, most position group statistics were presented in a table format, which allowed the data to be copied and pasted fairly easily into Microsoft Excel. Each position group’s data was exported separately for each year, meaning the data needed to be exported a minimum of eighty times. For example, to find the quarterback statistics for 2017, the data was exported from the 2017 NFL Passing section of Pro-Football-Reference.com. Once these numbers were formatted in Excel, they were sorted by total games played. All players who played in fewer than nine games, and therefore in less than the majority of a typical sixteen-game NFL season, were removed from the data. This was a large assumption, considering a significant amount of the available player statistics was removed from the model. The goal of this assumption was to reduce variability in player valuation by excluding statistics for players who did not contribute to the majority of their team’s games. Next, the data was sorted by team name, and all players who played on more than one team during the same season were removed. Sorting the player data by team also required changing team names, considering the Los Angeles Chargers were the San Diego Chargers prior to 2017 and the Los Angeles Rams were the St. Louis Rams prior to 2016. Since most teams are listed as abbreviations, it was important to keep them consistent for sorting purposes by changing SDG, for San Diego, to LAC, for L.A. Chargers, and STL, for St. Louis, to LAR, for L.A. Rams. This extra step made the transfer of player data between Excel tabs much smoother in the long run.

Players were then sorted by position, and all players who did not fit the position group being valued were removed. For example, when selecting the data for the wide receivers and tight ends position group, all players listed under the 2017 NFL Receiving stats from Pro-Football-Reference.com were removed if their position was anything other than wide receiver or tight end. This process was repeated for each position group, unless the data specifically pertained to that group and the positions were consistent throughout. All players who did not have a listed position were also removed from the data. This was yet another assumption, since most players without a position were considered utility players, in the sense that they played several positions throughout a single NFL season. While this is quite common in football, it would be unfair to include them in a model that attempts to value specific position groups based on their individual contributions. Finally, individual player statistics were either summed or averaged to produce each team’s total for each of the forty statistics.

For example, if the 2017 Baltimore Ravens had five running backs who played more than eight games, their rushing yards were summed to a team total, while their non-fumble percentages were averaged.
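The filtering and aggregation rules described above were applied in Microsoft Excel, but they can be sketched in Python to make the order of operations explicit. Everything in the block is an assumption for illustration: the field names, the sample rows, and in particular the labels used to mark multi-team seasons (assumed here to be strings such as "2TM").

```python
from collections import defaultdict

MIN_GAMES = 9                                  # fewer than nine games: removed
TEAM_RENAMES = {"SDG": "LAC", "STL": "LAR"}    # relocated franchises
MULTI_TEAM_LABELS = {"2TM", "3TM", "4TM"}      # assumed multi-team markers


def clean_players(rows, allowed_positions):
    """Apply the filtering rules to one season of one position group."""
    kept = []
    for row in rows:
        if row["games"] < MIN_GAMES:
            continue
        if row["team"] in MULTI_TEAM_LABELS:
            continue
        # A blank position is never in allowed_positions, so it is dropped too.
        if row["position"] not in allowed_positions:
            continue
        team = TEAM_RENAMES.get(row["team"], row["team"])
        kept.append({**row, "team": team})
    return kept


def team_totals(players, sum_field, avg_field):
    """Sum one counting statistic and average one percentage per team."""
    grouped = defaultdict(list)
    for player in players:
        grouped[player["team"]].append(player)
    return {
        team: {
            sum_field: sum(p[sum_field] for p in group),
            avg_field: sum(p[avg_field] for p in group) / len(group),
        }
        for team, group in sorted(grouped.items())
    }


# Hypothetical rows, not real player data.
rows = [
    {"team": "BAL", "position": "RB", "games": 16, "rush_yards": 800, "non_fumble_pct": 99.0},
    {"team": "BAL", "position": "RB", "games": 10, "rush_yards": 300, "non_fumble_pct": 98.0},
    {"team": "BAL", "position": "RB", "games": 4, "rush_yards": 50, "non_fumble_pct": 100.0},
    {"team": "2TM", "position": "RB", "games": 12, "rush_yards": 400, "non_fumble_pct": 97.0},
    {"team": "SDG", "position": "RB", "games": 14, "rush_yards": 600, "non_fumble_pct": 99.5},
    {"team": "BAL", "position": "", "games": 16, "rush_yards": 100, "non_fumble_pct": 99.0},
    {"team": "BAL", "position": "WR", "games": 16, "rush_yards": 20, "non_fumble_pct": 100.0},
]
cleaned = clean_players(rows, allowed_positions={"RB"})
totals = team_totals(cleaned, "rush_yards", "non_fumble_pct")
print(totals)
```

Counting statistics are summed because they accumulate across players, while percentages are averaged; note that a simple average weights every qualifying player equally regardless of workload.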

Each year’s position group was exported into an individual Excel tab until all data was properly sorted and refined. Once all the data was in the ideal layout, the values were sorted alphabetically by team abbreviation. These values were then copied from each individual tab into the master tab, which lists each team with its corresponding year. To confirm that no errors were made during the data transfer process, an IF function was used. The formula pulled the corresponding values if the abbreviation in the master tab matched the abbreviation in the individual data tab, and otherwise returned an X.

Naturally, if an X appeared in the cells, there was a mistake somewhere in the individual data. This usually meant that a team’s abbreviation was missing, meaning that not a single player in that position group played more than eight games for that team. The missing abbreviations were manually added to the spreadsheet, and the corresponding value cells were left empty to indicate that the data is nonexistent under the given assumptions. Finally, once the data checked out, all values were copied and pasted as values to prevent any future sorting errors in the Excel master tab. This process was repeated for each year, then for each position group, until all 320 samples were populated with all forty variables.
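The IF-function check can be mimicked in Python to show why a single missing abbreviation is easy to spot. This is a sketch of the row-by-row comparison as described, not the actual spreadsheet formula; the function name and the sample values are hypothetical.

```python
def pull_values(master_abbrs, tab_rows):
    """Pull a row's values only when its abbreviation matches the master tab.

    master_abbrs: team abbreviations in the master tab, sorted alphabetically.
    tab_rows: (abbreviation, values) pairs from one position-group tab,
              sorted the same way. A mismatch yields "X", as in the IF check.
    """
    pulled = []
    for index, master in enumerate(master_abbrs):
        if index < len(tab_rows):
            tab_abbr, values = tab_rows[index]
        else:
            tab_abbr, values = None, None
        pulled.append(values if tab_abbr == master else "X")
    return pulled


master = ["ARI", "ATL", "BAL"]

# ATL has no qualifying player, so its row is missing and later rows shift up.
tab_before_fix = [("ARI", [3200]), ("BAL", [2900])]
print(pull_values(master, tab_before_fix))

# After the abbreviation is added by hand, the ATL values stay empty (None).
tab_after_fix = [("ARI", [3200]), ("ATL", None), ("BAL", [2900])]
print(pull_values(master, tab_after_fix))
```

Because the comparison is positional, every row after the gap fails the check, which makes the problem visible immediately rather than silently misaligning teams and values.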

Position Group Regression Model

The regression model is built on the individual NFL player data for each of the thirty-two NFL teams over the 2008-2017 seasons. The majority of the player statistics were recorded as raw values, while a few were recorded as percentages where appropriate. Once all the data was exported and properly formatted in the master Excel tab, any individual team-season with empty data points was removed from the sample. For example, the 2017 Houston Texans were removed from the final model because the team did not have a quarterback who started more than eight games and therefore had nonexistent data. After removing all team-seasons that lacked position data under the model’s assumptions, the sample size shrank from 320 to 259.
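Removing every team-season that has at least one empty data point is a form of listwise deletion. A minimal Python sketch follows, assuming each sample is a dictionary with a hypothetical "stats" field in which empty cells are represented as None; the sample values are invented.

```python
def drop_incomplete(samples):
    """Keep only team-seasons in which every statistic has a value."""
    return [
        sample
        for sample in samples
        if all(value is not None for value in sample["stats"].values())
    ]


# Hypothetical samples, not real data.
samples = [
    {"team": "BAL", "year": 2017, "stats": {"qb_pass_yards": 3100, "rb_rush_yards": 1100}},
    {"team": "HOU", "year": 2017, "stats": {"qb_pass_yards": None, "rb_rush_yards": 1300}},
    {"team": "LAC", "year": 2017, "stats": {"qb_pass_yards": 4500, "rb_rush_yards": 1200}},
]
complete = drop_incomplete(samples)
print(len(samples), "->", len(complete))
```

The trade-off is that one missing position group discards all of that team-season's other statistics as well.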

Position Group Allocations

Once all data was formatted, a multi-variate regression was run using the programming language Python 3.7.2, considering Microsoft Excel was unable to handle a regression with more than sixteen independent variables. The team’s winning percentage was regressed on the forty independent variables, that is, the various player statistics. The overall goal was to establish the relationships between individual player statistics and the winning percentage of that same year, with worse statistics expected in seasons with fewer wins and vice versa. Once the model was run, a coefficient was estimated for each variable and then multiplied by the average value of that statistic. For example, the QB passing yards coefficient was multiplied by the average QB passing yards to get a value. While this value may seem obscure on its own, the process was repeated for each of the forty statistics. These values were then summed, along with the constant, to produce the theoretical dependent variable. The goal was for the dependent variable from the model to be extremely close to the actual average NFL winning percentage over the 2008-2017 seasons. Finally, the values for the variables within the same position group were summed to get a theoretical position value. These position values were then measured as a percentage of the total forty-variable value to determine the optimal share of the salary cap that should be designated for players within that grouping. Therefore, a stronger relationship between a position group’s statistics and the team’s winning percentage would yield a higher salary cap allocation. Hypothetically, the model would produce the optimal allocation for each position group based on its contribution to wins, meaning an NFL team would achieve its highest winning percentage by implementing this strategy.
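The allocation arithmetic described in this section can be sketched as follows. The block assumes the coefficients and the constant have already been estimated by the regression; the coefficient and mean values shown are hypothetical, cover only four of the forty statistics, and are not results from the thesis.

```python
from collections import defaultdict


def position_allocations(coefficients, means, intercept):
    """Turn coefficients and average statistic values into cap shares.

    coefficients, means: dicts keyed by (position_group, statistic).
    Returns the model-implied dependent variable and each group's share
    of the summed contributions.
    """
    # Contribution of each statistic: coefficient times its average value.
    contributions = {key: coefficients[key] * means[key] for key in coefficients}

    # Theoretical dependent variable: constant plus all contributions.
    predicted = intercept + sum(contributions.values())

    # Sum contributions within each position group.
    group_values = defaultdict(float)
    for (group, _statistic), value in contributions.items():
        group_values[group] += value

    # Express each group's value as a share of the total.
    total = sum(group_values.values())
    allocations = {group: value / total for group, value in group_values.items()}
    return predicted, allocations


# Hypothetical coefficients and averages, for illustration only.
coefficients = {
    ("QB", "passing_yards"): 0.0001,
    ("QB", "touchdown_passes"): 0.005,
    ("RB", "rushing_yards"): 0.00005,
    ("K/P", "field_goals_made"): 0.001,
}
means = {
    ("QB", "passing_yards"): 3000.0,
    ("QB", "touchdown_passes"): 20.0,
    ("RB", "rushing_yards"): 1600.0,
    ("K/P", "field_goals_made"): 20.0,
}
predicted, allocations = position_allocations(coefficients, means, intercept=0.02)
for group, share in allocations.items():
    print(f"{group}: {share:.1%}")
```

This sketch assumes the summed contributions are positive. A negative coefficient would reduce or even reverse a group's share, and how such cases are treated is not covered in this section.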

Data Methodology Summary

The analysis asks whether individual football player statistics can be used to value players’ contributions to their team’s annual winning percentage. If so, the model will determine an appropriate salary cap allocation per position group based on each statistic’s contribution to win percentage, measured as the product of its coefficient and its average value.

Chapter 4

Data Analysis

Several variations of the same regression model were run on a trial-and-error basis. Because a Moneyball approach has not previously been attempted in the NFL, the multi-variate regression could have produced illogical numbers with little to no true significance. This is the main reason that some variables were altered or replaced completely, including the dependent variable, as explained in more detail below.

Regression Model

Model Verification

It is key to note that while there are five variations of the same model, the data was not manipulated to create a desired result, but rather to correct errors. Inconsistencies and mistakes discovered while building the model are explained in the section for each corresponding version.

Regression Model 1.0

In the first regression model, the number of wins was used as the dependent variable, which predictably produced widely fluctuating salary cap allocations. As shown in the model, the allocations for quarterbacks and for cornerbacks and safeties were abnormally high, while those for running backs, wide receivers and tight ends, defensive linemen, and linebackers were extremely low (see Figure 1). The initial observation was that the dependent variable had quite a small range: the minimum number of wins was zero and the maximum was twenty, sixteen regular-season games plus a maximum of four playoff games. As discussed in the literature review, skeptics have been reluctant to apply Moneyball techniques to football because the typical season consists of only sixteen games, while the typical baseball season consists of one hundred sixty-two games. The most logical solution was to change the dependent variable from wins to win percentage, between 0.000 and 1.000, in which 0.681 would represent winning 68.1 percent of total games.

When analyzing the OLS Regression results, it was important to understand the warnings listed at the bottom of the output. The most common warning stated that there was strong multicollinearity among the variables, which was expected considering there are forty independent variables. This warning was present in every version of the model, and it directly influenced the alterations between Model 2.0 and Model 3.0. The solution was to run a simple correlation matrix showing the relationship between every pair of variables. The matrix showed that four pairs of variables were highly correlated, defined as a correlation coefficient above 0.9 or below -0.9. These pairs were QB passing yards and QB completions, WR/TE receptions and WR/TE receiving yards, WR/TE receptions and WR/TE targets, and K/P punts and K/P total yards punted. Conveniently, the QB, WR/TE, and K/P position groups had all yielded extremely irregular salary cap allocations.
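The screening rule described above, flagging any pair of variables whose correlation coefficient exceeds 0.9 in absolute value, can be sketched as follows. The column names and sample values are hypothetical, and the Pearson correlation is computed by hand so the example has no dependencies.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical values for five team-seasons.
columns = {
    "qb_pass_yards": [3200.0, 3900.0, 4400.0, 3600.0, 4700.0],
    "qb_completions": [290.0, 340.0, 385.0, 320.0, 410.0],
    "rb_fumbles": [6.0, 3.0, 7.0, 2.0, 5.0],
}

flagged = highly_correlated_pairs(columns)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.3f}")
```

With forty variables there are 780 pairs to check, so an automated scan of this kind is more practical than reading the matrix by eye.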

Figure 1: Regression Model 1.0 Output

Regression Model 2.0

In this variation of the model, the QB completions statistic was replaced with QB interceptions, the WR/TE receptions statistic was replaced with WR/TE yards per reception, the WR/TE targets statistic was replaced with WR/TE yards per game, and the K/P punts statistic was replaced with K/P yards per punt (see Figure 2). After rerunning the regression, the allocations remained nearly unchanged, calling into question the multicollinearity warning produced by the model. To test whether the high number of variables was causing the odd outputs, another correlation matrix was run across the forty independent variables. After analyzing the results, the most correlated variable was removed from each position group, narrowing the number of independent variables to thirty-two, or four statistics for each of the eight position groups. The variables removed due to high correlations were QB touchdown passes, RB touches, WR/TE yards per game, OL negative rushes, DL quarterback hits, LB tackles, CB/S passes defended, and K/P field goals made over forty yards. It is also key to mention that the QBR statistic, for the quarterback position group, is a composite of several quarterback statistics and was therefore double counting. This statistic was replaced with total comebacks led, which counts all instances where the quarterback’s team regained the lead after trailing in the fourth quarter or overtime.

Figure 2: Regression Model 2.0 Output

Regression Model 3.0

After rerunning the model with only thirty-two variables, the salary cap allocations became more extreme. The position groups with extremely high allocations were the quarterbacks, cornerbacks and safeties, and kickers and punters. The position groups with negative allocations were the running backs, wide receivers and tight ends, and offensive linemen. This suggested that the number of independent variables was not directly causing the disproportionate salary cap allocations (see Figure 3). The solution was simply to disregard Regression Model 3.0 and adjust Regression Model 2.0.

When re-analyzing Model 2.0, it was determined that allocations were most likely negative because the corresponding statistic coefficients were negative. While coefficients do not have to be positive in a multi-variate regression, it did not make logical sense to include statistics that were, on their face, negatively related to win percentage. For example, the offensive linemen had four statistics that represent negative contributions, such as quarterback sacks allowed. It is inconsistent to have only one negatively correlated statistic for the quarterback position group and four for the offensive linemen position group. With that being said, the eight statistics that needed to be changed were QB interceptions, RB fumbles, WR/TE fumbles, OL negative rushes, OL quarterback hits, OL quarterback sacks, OL penalties, and CB/S penalties.

When making the changes, the underlying statistics were kept, but they were restated so that each would be positively correlated with win percentage. QB interceptions became QB non-interception percentage, the number of pass attempts that were not intercepted over the total number of pass attempts. RB fumbles became RB non-fumble percentage, the number of touches that did not result in a fumble over the total number of rushing attempts and receptions. WR/TE fumbles became WR/TE non-fumble percentage, the number of catches not resulting in a fumble over the total number of receptions. OL negative rushes became OL positive rushes, the number of rushing attempts that resulted in a positive gain over the total number of rushing attempts. OL quarterback hits became OL non-quarterback hits, the number of times the quarterback was not hit out of the total number of passing attempts. OL quarterback sacks became OL non-quarterback sacks, the number of times the quarterback was not sacked out of the total number of passing attempts. OL penalties became OL non-penalties, the number of offensive plays that did not result in a penalty out of the total number of offensive plays. CB/S penalties was the one exception: it was replaced with opposing quarterback completion percentage, which was judged to be a better statistic for valuing that position group.
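Each of the restated statistics above follows the same pattern: the share of opportunities that did not end in the negative event. A minimal sketch of that calculation is shown below; the function name and the sample numbers are hypothetical.

```python
def non_event_pct(events, opportunities):
    """Share of opportunities that did NOT result in the negative event."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= events <= opportunities:
        raise ValueError("events must be between 0 and opportunities")
    return (opportunities - events) / opportunities


# Hypothetical season: 12 interceptions on 500 pass attempts.
qb_non_int_pct = non_event_pct(events=12, opportunities=500)
print(f"QB non-interception percentage: {qb_non_int_pct:.3f}")
```

One consequence of this restatement is that the resulting values cluster very close to 1, so they vary little from team to team.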

Figure 3: Regression Model 3.0 Output

Regression Model 4.0

Once the eight original statistics were changed, the regression was rerun, showing that some position groups were improving while others continued to yield odd allocations. On their face, the wide receivers and tight ends, offensive linemen, cornerbacks and safeties, and kickers and punters had allocations that looked fairly appropriate, while the remaining position groups were either negative, extremely high, or extremely low (see Figure 4).

A closer look at the OLS Regression results showed that the variables with the highest standard errors were QB non-interception percentage, OL non-penalties percentage, and WR/TE non-fumble percentage, respectively. Given the amount of data available, these statistics were simply replaced with other positive statistics for each group. QB non-interception percentage was replaced with QB completion percentage, OL non-penalties percentage was replaced with OL power percentage, and WR/TE non-fumble percentage was replaced with WR/TE catch percentage.

Other adjustments were made to statistics within the defensive linemen and linebackers position groups, considering their allocations were still hovering around zero. The next step was a simple correlation between each of the five variables, for both position groups, and win percentage. The analysis found that three variables were effectively uncorrelated with win percentage, meaning the correlation coefficient was between -0.1 and 0.1. These variables were DL quarterback hits, LB tackles, and LB tackles for loss. DL quarterback hits was replaced with DL yards lost from sacks, LB tackles was replaced with LB quarterback pressures, and LB tackles for loss was replaced with LB passes defended.
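The screen described above, correlating each statistic with win percentage and flagging those with a coefficient between -0.1 and 0.1, can be sketched as follows. The statistic names echo the text, but the values are hypothetical and were constructed so that one statistic is exactly uncorrelated with win percentage.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def weakly_correlated(stats, win_pct, cutoff=0.1):
    """Names of statistics whose correlation with win percentage is within +/- cutoff."""
    return [name for name, values in stats.items()
            if abs(pearson(values, win_pct)) < cutoff]


# Hypothetical data for four team-seasons.
win_pct = [0.25, 0.50, 0.75, 0.50]
stats = {
    "lb_tackles": [10.0, 12.0, 10.0, 8.0],      # moves independently of wins
    "lb_qb_pressures": [20.0, 30.0, 40.0, 30.0],  # rises and falls with wins
}

weak = weakly_correlated(stats, win_pct)
print(weak)
```

A simple pairwise correlation ignores the other regressors, so a statistic that passes this screen can still turn out to be insignificant in the full model.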

Figure 4: Regression Model 4.0 Output

Regression Model 5.0

Following the removal of those six statistics and the addition of six new statistics, the OLS Regression Model resulted in relatively optimal allocations for seven of the eight position groups. While all position groups had positive allocations, the only group with an irregular number was the linebacker position group. Naturally, a one-percent allocation is fairly illogical, but there were no further steps that could be taken without manipulating the data in an unfair way (see Figure 5). The goal of the model is to value each position group independently, regardless of whether the model’s allocations are close to the actual NFL allocations. It would have been ideal if every position group had an appropriate valuation, but how to treat this output for the linebacker position group would be up to each general manager to evaluate. It also implies that funds would need to be pulled from other position groups with much higher allocations. In this case, it would probably make the most sense for NFL general managers and salary cap managers to shift funds from either the offensive linemen or the cornerbacks and safeties into the linebacker position group, assuming the model is an accurate predictor of position contributions to win percentage.

Figure 5: Regression Model 5.0 Output

Statistical Analysis

When analyzing the results of a regression model, it is key to interpret the significance of the results using key statistical measures. One of the most important metrics is the R-squared value when running a single-variable regression, and the adjusted R-squared when running a multi-variate regression. The adjusted R-squared indicates what percentage of the variation in the dependent variable can be explained by the independent variables, adjusted for the number of variables in the model. In the model, the adjusted R-squared value was 0.758, which means that roughly 75.8 percent of the variation in win percentage can be explained by the forty player statistics. This is an encouraging result because, while the goal is to have the adjusted R-squared as high as possible, a value too close to 1 is also questionable. For example, if the adjusted R-squared were 0.950, it could simply be a result of the large number of independent variables rather than the information contained in those variables.
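For reference, the standard textbook definition of the adjusted R-squared is given below. It is not a formula quoted from the thesis. Here R^2 is the ordinary R-squared, n is the number of observations, and k is the number of independent variables; if the final model used the 259 team-seasons and forty variables described earlier, which is an assumption, then n = 259 and k = 40.

```latex
\bar{R}^{2} = 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - k - 1}
```

The fraction is greater than one whenever k is positive, so the adjusted value is always below the ordinary R-squared and the gap widens as more variables are added.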

The F-statistic is an additional determinant of whether the model as a whole is statistically significant. If the p-value corresponding to the F-statistic is below the chosen significance level, then it is logical to reject the null hypothesis, which would mean that the model fits the data better than a model with no independent variables (Frost, 2019). According to the OLS Regression Model, the F-statistic is 21.18 and the p-value is effectively zero, so the null hypothesis is rejected. Rejecting the null hypothesis supports the alternative hypothesis that the independent variables in the model improve the fit.
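As a rough consistency check, the overall F-statistic can be recovered from the reported adjusted R-squared using the standard OLS identities. The sketch below assumes the final model was fit on 259 team-seasons with forty independent variables plus a constant; those inputs come from the text, but their use here is an assumption, and the function names are hypothetical.

```python
def r2_from_adjusted(adj_r2, n, k):
    """Invert adj_R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1) to recover R2."""
    return 1.0 - (1.0 - adj_r2) * (n - k - 1) / (n - 1)


def overall_f(r2, n, k):
    """Overall F-statistic for an OLS model with an intercept and k regressors."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


n_obs, n_vars = 259, 40  # assumed sample size and number of regressors
r2 = r2_from_adjusted(0.758, n_obs, n_vars)
f_stat = overall_f(r2, n_obs, n_vars)
print(round(r2, 3), round(f_stat, 1))  # roughly 0.796 and 21.2
```

Under these assumptions the implied F-statistic of about 21.2 is in line with the reported 21.18, with the small gap attributable to rounding of the adjusted R-squared.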

Next it is important to analyze the measures that can be attributed to each individual statistic. One of the most used is the standard error, which is reported for each independent variable within the regression model. The standard error represents how far, on average, the observable value is from the regression line. Naturally, a lower standard error is better, because it means that the average independent variable value is fairly close to the regression line. All forty standard errors are listed in the OLS Regression Model output. The three statistics with the highest standard errors are linebacker interception percentage, defensive linemen sack percentage, and running back non-fumble percentage, respectively. The three statistics with the lowest standard errors are kickers and punters total yards punted, wide receivers and tight ends receiving yards, and quarterback passing yards, respectively. In order to achieve a 95 percent prediction interval, the standard errors must be less than 2.5 for each independent variable (Kenton, 2019). In this model, the highest standard error was 2.234 for linebacker interception percentage. Interpretively, this means that the average distance of the data points from the fitted line is about 2.234 percent for the linebacker interception percentage statistic. With all other statistics having a standard error well under 2.5, the model is taken to achieve a 95 percent prediction interval.

Another key statistic is the p-value, which helps explain whether each independent variable’s coefficient is statistically different from zero. If the p-value is less than 0.05, then it is logical to reject the null hypothesis in favor of the alternative hypothesis, which states that the coefficient is statistically different from zero. In the model, twenty-four of the forty independent variables had a p-value greater than 0.05, which means that a majority of the statistics are not statistically different from zero. Of the forty statistics, three in the quarterback group, one in the running back group, two in the wide receivers and tight ends group, two in the offensive linemen group, four in the defensive linemen group, all five in the linebacker group, three in the cornerbacks and safeties group, and four in the kickers and punters group had a p-value of 0.05 or higher.
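The per-group counts above can be tallied to confirm the total of twenty-four. The counts are taken directly from the paragraph; the variable names are hypothetical, and five statistics per group is assumed from the model design.

```python
STATS_PER_GROUP = 5  # five statistics per position group, per the model design

# Number of statistics per group with a p-value of 0.05 or higher, as reported.
not_significant = {
    "QB": 3, "RB": 1, "WR/TE": 2, "OL": 2,
    "DL": 4, "LB": 5, "CB/S": 3, "K/P": 4,
}

total_vars = STATS_PER_GROUP * len(not_significant)
total_not_sig = sum(not_significant.values())
significant = {g: STATS_PER_GROUP - c for g, c in not_significant.items()}

most_significant = max(significant, key=significant.get)
least_significant = min(significant, key=significant.get)

print(total_not_sig, "of", total_vars, "not significant")
print("most significant group:", most_significant)
print("least significant group:", least_significant)
```

The tally leaves sixteen statistics that are significant at the 0.05 level, four of which belong to the running back group.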

According to the p-value results, the position group with the most statistically significant variables was the running backs, with only one statistic failing to reject the null. The position group with the most statistically insignificant variables was the linebackers, with all five statistics failing to reject the null. This observation is notable considering the linebacker position group was allocated an extremely low 1.03 percent of the salary cap. It can be argued that because all five linebacker statistics are not statistically different from zero, linebackers cannot be valued using quantitative player statistics (see Figure 6).

Figure 6: Python 3.7.2 OLS Regression Model 5.0 Output

Salary Cap Comparison

Once the model had determined the optimal allocations for each position group, it was essential to compare the results to the actual salary cap allocations used in the NFL over recent seasons (see Figure 7).

Figure 7: Various Salary Cap Breakdowns

Average salary cap allocations per position were determined by averaging each NFL franchise’s average allocation between 2011 and 2017. Ideally, the years 2008 to 2017 would have been used for consistency with the model, but data was extremely difficult to find prior to the 2011 season. The simple average is fairly self-explanatory: each team’s average salary cap allocation was averaged across all NFL franchises. For example, the Arizona Cardinals’ salary cap allocation was determined for each season between 2011 and 2017. These allocations were then averaged to get the Cardinals’ average allocation. This process was repeated for every NFL team, and the team averages were then averaged to determine the simple average. The weighted average was determined using the same average allocations per team, but more weight was given to the franchises with a higher average win percentage. Therefore, the New England Patriots, with the highest average win percentage, had a larger influence on the weighted average salary cap allocation than the Cleveland Browns, with the lowest average win percentage. Finally, the Super Bowl average was determined from the individual seasons of Super Bowl winning teams. For example, the 2017 Eagles, 2016 Patriots, and all other Super Bowl winners between 2011 and 2017 were averaged based on that specific year’s salary cap allocation. The NFL salary cap allocations for the simple, weighted, and Super Bowl averages are shown in Figures 8, 9.1 and 9.2, and 10, respectively.
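The difference between the simple and weighted averages can be illustrated with a small sketch. The text does not state the exact weighting scheme, so weights proportional to each team's average win percentage are an assumption here, and the two teams and their numbers are hypothetical.

```python
def simple_average(values):
    """Unweighted mean across teams."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Mean across teams with each team weighted by its average win percentage."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical average QB allocations (percent of cap) and win percentages.
qb_allocation = [10.0, 14.0]
avg_win_pct = [0.25, 0.75]

simple = simple_average(qb_allocation)
weighted = weighted_average(qb_allocation, avg_win_pct)
print(simple, weighted)
```

In this example the weighted figure is pulled toward the more successful team's allocation, which is the intended effect of the weighting.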

Figure 8: NFL Salary Cap Simple Average

Figure 9.1: NFL Salary Cap Weighted Average

Figure 9.2: NFL Salary Cap Weighted Average

Figure 10: NFL Salary Cap Super Bowl Average

When comparing the model results to the seven-year average salary cap, it is most sensible to use the NFL weighted average salary cap allocations. This allows the average to reflect every franchise while putting more weight on the teams with a higher average win percentage. Naturally, this also gives a more meaningful average salary cap representation, considering it values winning salary cap allocations more than those that have been less successful.

According to the model, quarterbacks are slightly underpaid, running backs are accurately valued, wide receivers and tight ends are slightly overpaid, offensive linemen are severely underpaid, defensive linemen are overpaid, linebackers are severely overpaid, cornerbacks and safeties are underpaid, and kickers and punters are slightly underpaid. Keeping in mind that the linebacker position group allocation is extremely low, the model implies that linebackers are severely overpaid, since its allocation is well below the average allocation of around fourteen percent.

Next it was important to measure the difference between each team’s 2011-2017 average salary cap allocation and the model. The model allocations were subtracted from each team’s average allocation per position group. The absolute values of those differences were then summed to determine the total salary cap difference between each team and the model. The team with the closest average salary cap over those seven seasons was the New York Jets, with an average win percentage of 0.472. While this is not too promising, it was determined that six of the seven Super Bowl Champions over that time span had total salary cap differences in the top half of NFL franchises. The most accurate Super Bowl Champion was the 2017 Philadelphia Eagles with a total cap difference of 45.32 percent (see Figure 11). This means that, summed across position groups, the Eagles’ salary cap allocations differ from the model’s by 45.32 percentage points.
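The total cap difference described above is a sum of absolute differences across the eight position groups. In the sketch below the model allocations are the figures reported in the conclusion of this thesis, while the team allocation is entirely hypothetical and does not represent any actual franchise.

```python
# Model allocations (percent of cap) as reported in the thesis conclusion.
MODEL = {
    "QB": 11.61, "RB": 6.37, "WR/TE": 13.42, "OL": 31.02,
    "DL": 6.19, "LB": 1.03, "CB/S": 23.52, "K/P": 6.85,
}


def total_cap_difference(team, model=MODEL):
    """Sum of absolute differences between team and model allocations."""
    return sum(abs(team[group] - model[group]) for group in model)


# Hypothetical team allocation (percent of cap), for illustration only.
team = {
    "QB": 10.0, "RB": 5.0, "WR/TE": 18.0, "OL": 18.0,
    "DL": 16.0, "LB": 14.0, "CB/S": 16.0, "K/P": 3.0,
}

diff = total_cap_difference(team)
print(f"total cap difference: {diff:.2f} percentage points")
```

Because every percentage point moved from one group to another is counted once in each group, the total can be as high as roughly twice the share of the cap that is actually allocated differently.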

Figure 11: NFL Franchises Ranked by Total Cap Difference

Chapter 5

Conclusion

Moneyball’s Potential Impact on NFL Franchises

The National Football League is the most watched sports league in the United States, with a typical game averaging almost 15.5 million viewers and the league grossing total revenue of $13.6 billion. There is also a high correlation between the number of Super Bowl wins and total franchise value, considering the three most valuable franchises have five, six, and four Super Bowl victories, respectively. Despite being a form of entertainment, the NFL is a business with an end goal of maximizing profit, typically by maximizing the team’s average win percentage.

In an age where winning is highly profitable, NFL executives and owners are consistently searching for a quantifiable winning formula. One avenue with untapped potential is introducing a Moneyball-inspired strategy into the NFL that would maximize the efficiency of a team’s salary cap allocation by position group.

Across five variations of the same model, a regression was built that determined the optimal salary cap allocation, per position group, based on that group’s contribution to the team’s average win percentage over the last ten years. According to the model, quarterbacks should be allocated 11.61 percent, running backs should be allocated 6.37 percent, wide receivers and tight ends should be allocated 13.42 percent, offensive linemen should be allocated 31.02 percent, defensive linemen should be allocated 6.19 percent, linebackers should be allocated 1.03 percent, cornerbacks and safeties should be allocated 23.52 percent, and kickers and punters should be allocated 6.85 percent of the salary cap. The extremely low allocation for the linebacker position group can be explained by the fact that all five of the position’s statistics have p-values greater than 0.05, meaning they are not statistically different from zero.

When comparing the model allocations to the actual NFL data for the seasons between 2011 and 2017, the model defended the thesis statement in every position group except for running backs. The model indicated that quarterbacks, offensive linemen, cornerbacks and safeties, and kickers and punters are underpaid; wide receivers and tight ends, defensive linemen, and linebackers are overpaid; and running backs are appropriately paid. These results were determined by comparing the model allocations to the weighted average NFL salary cap allocations.

Moneyball was a breakthrough in payroll management for Major League Baseball franchises, which proved that it is possible to value players based on their statistical performance. This strategy has not been adapted or implemented in the National Football League because many general managers and salary cap managers believe it is virtually impossible to value players based on their contributions to their team’s average win percentage. The regression model built for this thesis was able to determine an optimal allocation for eight specified position groups based on the relationship between the corresponding statistics and the team’s win percentage. It would be valuable for NFL franchises to adopt this Moneyball-inspired salary cap management strategy to maximize their team’s future average win percentage and subsequent profit.

Topics to Further Explore

While this thesis succeeded in assigning a numerical value to each position group as a fraction of the total salary cap, there is more to be done to improve upon the model.

Is there a way to accurately value the linebacker position group more efficiently?

As shown in the regression model output, the linebacker position group was granted an extremely small salary cap allocation, which is virtually nonsensical. There are two main ways to approach a follow-up thesis aimed specifically at improving the model by accurately valuing the linebackers. The first option would be to group the linebackers with either the defensive linemen or the cornerbacks and safeties. This would involve changing the statistics within the position groups being merged. The second option would be to complete a case study in which the linebacker allocation is determined by the average NFL salary cap allocation given to linebackers over the past ten years. This would also involve extending the case study to determine which position group should forgo some salary cap allocation in order to free up cap funds for the linebackers. For example, if the case study shows that linebackers should be allocated eleven percent of the salary cap, instead of the one percent from the model, then that extra ten percent needs to be taken from other position groups represented in the model output.

What is the average win percentage of a team implementing the Moneyball-inspired salary cap strategy?

Another thesis could build on the current model by using the allocations to project a future win percentage for a team implementing the Moneyball strategy. Naturally, since this model was built from scratch, the allocations are fairly different from any current NFL salary cap allocation strategy. Therefore, it would be intriguing to see the average win percentage of a team that not only implements the optimal salary cap allocation, but also maintains that strategy over a sustained period of time.

What is the average length of time between the implementation of the optimal salary cap allocations and the surpassing of a certain win percentage margin?

This potential thesis could build on the previous question as well as the current regression model. If the optimal salary cap allocation is determined to be efficient, a benchmark average win percentage could be set, potentially around 0.650 or even as high as 0.700. Using the model, or even building a new model, it would be interesting to answer the question: how long is the implementation lag for this optimal salary cap allocation? Most strategies, whether in business or in sports, have an implementation lag, the time between when a strategy is implemented and when the desired results begin. With that being said, it would be interesting to determine roughly how many years it would take for an NFL franchise implementing this Moneyball strategy to surpass a set benchmark average win percentage.

Appendix A

Annual Statistics Per Team 2008-2017

The subsequent data is presented in reverse chronological order by year, and then alphabetically by team abbreviation. Super Bowl Champions are denoted with an asterisk immediately following the team’s abbreviation. Statistics are presented for each position group in the following order: quarterbacks, running backs, wide receivers and tight ends, offensive linemen, defensive linemen, linebackers, cornerbacks and safeties, and kickers and punters.

Appendix B

Annual Salary Cap Allocation Per Position Group Per Team 2011-2017

The subsequent data is presented in alphabetical order based on the team’s abbreviation used throughout the paper. Super Bowl Champions are denoted with an asterisk immediately following the year, listed in each individual team’s salary cap chart.

BIBLIOGRAPHY

Axisa, Mike. “Browns Fans, Here's Everything You Need to Know about Paul DePodesta.” CBSSports.com, CBS Sports, 5 Jan. 2016, www.cbssports.com/mlb/news/browns-fans-heres-everything-you-need-to-know-about-paul-depodesta/.

Connelly, Bill. “The Secret Salary Cap Formula Successful NFL Teams Rely On.” SBNation.com, SBNation.com, 15 Mar. 2018, www.sbnation.com/2018/3/15/17114596/nfl-free-agency-2018-salary-cap-formula-winning-teams.

Frost, Jim. “How to Interpret the F-Test of Overall Significance in Regression Analysis.” Statistics By Jim, 15 Mar. 2019, statisticsbyjim.com/regression/interpret-f-test-overall-significance-regression/.

Fuller, Steve. “Topic: National Football League (NFL).” Statista, Statista, 2017, www.statista.com/topics/963/national-football-league/.

Kenton, Will. “How T-Tests Work.” Investopedia, Investopedia, 22 Mar. 2019, www.investopedia.com/terms/t/t-test.asp.

La Canfora, Jason. “Browns' 'Moneyball' Approach Playing out and the Results Are, Well, to Be Determined.” CBSSports.com, CBS Sports, 10 Mar. 2017, www.cbssports.com/nfl/news/browns-moneyball-approach-playing-out-and-the-results-are-well-to-be-determined/.

Maglio, Tony. “Happy NFL Kickoff Day! 8 Pro Sports Leagues Ranked by How Much Money Their TV Viewers Earn.” TheWrap, TheWrap, 6 Sept. 2018, www.thewrap.com/pro-sports-leagues-ranked-viewer-income-nfl-nfl-nba-mlb/.

“NFL 2018 Salary Cap Tracker.” Spotrac.com, www.spotrac.com/nfl/cap/.

“NFL Team Punting Statistics - 2018.” ESPN, ESPN Internet Ventures, 2019, www.espn.com/nfl/statistics/team/_/stat/punting.

Olson, Eric. “Where Is the NFL's Version of Moneyball?” Inside The Pylon, 24 May 2018, insidethepylon.com/football-science/football-statistics/2018/05/24/nfls-version-moneyball/.

Outsiders, Football. “2018 Offensive Lines.” Football Outsiders, 2018, www.footballoutsiders.com/stats/ol.

Piellucci, Mike. “An Experiment That Changed Baseball: The Moneyball Draft 15 Years Later.” Sports, VICE, 12 June 2017, sports.vice.com/en_us/article/pay3n9/an-experiment-that-changed-baseball-the-moneyball-draft-15-years-later.

“Pro Football Statistics and History.” Pro, 2019, www.pro-football-reference.com/.

Rovell, Darren. “Forbes: Cowboys Most Valuable NFL Team for 12th Year in Row.” ESPN, ESPN Internet Ventures, 20 Sept. 2018, www.espn.com/nfl/story/_/id/24742979/forbes-magazine-dallas-cowboys-5b-again-most-valuable.

Smith, Michael David. “Brian Billick: You Can't Do Moneyball in the NFL.” ProFootballTalk, ProFootballTalk, 7 Jan. 2016, profootballtalk.nbcsports.com/2016/01/07/brian-billick-you-cant-do-moneyball-in-the-nfl/.

“Statistics.” National Football League Stats - by Team Category | NFL.com, 2019, www.nfl.com/stats/categorystats?role=TM&offensiveStatisticCategory=OFFENSIVE_LINE&tabSeq=2.