
THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE

DEPARTMENT OF INDUSTRIAL AND MANUFACTURING ENGINEERING

PREDICTIVE MODELING AND ANALYTICS FOR PROFESSIONAL BASEBALL: AN ANALYSIS OF INJURIES, PLAYER PERFORMANCE, AND MEDICAL STAFF OPTIMIZATION

PATRICK SCHERI SPRING 2020

A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Industrial Engineering with honors in Industrial Engineering

Reviewed and approved* by the following:

Guodong (Gordon) Pang Associate Professor Harold and Inge Marcus Department of Industrial and Manufacturing Engineering Thesis Supervisor

Catherine Harmonosky Associate Professor and Associate Department Head of Harold and Inge Marcus Department of Industrial and Manufacturing Engineering Honors Adviser

* Signatures are on file in the Schreyer Honors College.


ABSTRACT

The research in this paper aims to help Major League Baseball (MLB) teams find the next big competitive advantages within baseball: injury modeling and medical staff optimization. The objective of this research is to create models that predict future injury and evaluate medical staffing so that teams and players can increase their future performance.

The game of baseball is quickly shifting toward analytics, and as teams strive to find every advantage possible, they must consider evaluating their medical departments. The research in this paper utilizes a predictive model to indicate the odds of a pitcher requiring Tommy John surgery (a common baseball injury). The model used a variety of variables, ranging from basic statistics to pitch selections and velocities to pitching mechanics, to generate an equation for the likelihood that a player will require surgery. The model showed that a pitcher’s pitch selection is one of the largest indicators of surgery. Additionally, the number of appearances that a pitcher has over his career and the duration of those appearances can be predictors of surgery.

Following the creation of the predictive model, the staffing of team medical departments was evaluated. A baseline scenario was created using one team’s known combination of physicians, team consultants, athletic trainers, physical therapists, occupational therapists, chiropractors, massage therapists, strength coaches, and nutritionists. Four additional scenarios were then created to represent a low-budget team, a team that highly values preventative action, a team looking to maximize player satisfaction, and a team looking to minimize overall injury. The tradeoffs between the scenarios came from changes in budget, injury time, and risk.

This research is just the beginning of what can be done in analyzing medicine in professional sports. As analytics continue to become a larger part of sports, teams can save millions of dollars and increase performance by analyzing every aspect of their operation.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
CHAPTER 1: INTRODUCTION
1.1 History and Evolution of Baseball Analytics
1.2 Implications of Injuries in Professional Baseball
1.3 Objectives
CHAPTER 2: A BASIS FOR INJURY MODELING
2.1 Motivation
2.2 Literature Review
2.2.1 Biomechanical Evaluation of Injury
2.2.2 Statistical Evaluation of Injury
2.2.3 Evaluation of Injury Model Types
2.3 Injury Trends in Recent Years
2.4 A Basic Analysis of Common Injuries
2.5 The Pitcher
2.5.1 The Injuries of a Pitcher
2.5.2 The Physics of Pitching
2.6 Potential Predictors of the Injury Prediction Model
2.7 The Modeling Approach
CHAPTER 3: PREDICTING FUTURE INJURY
3.1 Variable Selection for the Model
3.1.1 Initial Selection of Variables
3.1.2 Variable Types
3.1.3 Variable Parameters and Elimination
3.1.4 Backwards Selection of Variables
3.2 Model Type Selection
3.2.1 The Linear Model
3.2.2 The Generalized Linear Model (GLM)
3.3 Model Analysis
CHAPTER 4: MODELING THE MEDICAL STAFFING OF A TEAM
4.1 An Overview of a Major League Baseball Team Medical Staff
4.1.1 Team Physicians
4.1.2 Team Medical Consultants
4.1.3 Athletic Trainers
4.1.4 Strength and Conditioning Coaches
4.2 Modeling the Medical Staff
4.3 Evaluating Tradeoffs within a Medical Staff
CHAPTER 5: CONCLUSION
5.1 Summary of Findings
5.2 Future Research
CITATIONS
ACADEMIC VITA


LIST OF FIGURES

Figure 1: Total Cost of Player Injuries
Figure 2: Number of Player Injuries
Figure 3: Yearly Trends of Injuries
Figure 4: Monthly Analysis of Injuries
Figure 5: Shoulder Injury Seasonal Trend
Figure 6: Elbow Injury Seasonal Trend
Figure 7: Hamstring Injury Seasonal Trend
Figure 8: Knee Injury Seasonal Trend
Figure 9: Back Injury Seasonal Trend
Figure 10: Sequential Mechanics of a Pitch
Figure 11: Comparison of Basic Pitching Statistics
Figure 12: Comparison of Pitch Selection
Figure 13: Comparison of Pitch Movement
Figure 14: Pitch Movement (X)
Figure 15: Pitch Movement (Z)
Figure 16: Variable Selection
Figure 17: Normal Q-Q Plot of Linear Model
Figure 18: Staffing Model Constraints
Figure 19: Verification of Staffing Model
Figure 20: Staffing Model Case 1
Figure 21: Staffing Model Case 2
Figure 22: Staffing Model Case 3
Figure 23: Staffing Model Case 4


LIST OF TABLES

Table 1: DL Data Yearly Percentages
Table 2: Harvard Sports Analytics Predictors
Table 3: Initial Variable Selection for Model
Table 4: Sensitivity Analysis of Variables
Table 5: Model Variable Coefficients
Table 6: Sensitivity Analysis of Variables
Table 7: Medical Staffing Variables and Cost Parameters
Table 8: Reduction of Injury Days Parameters


ACKNOWLEDGEMENTS

I want to thank my thesis supervisor, Dr. Pang, for guiding me through the research process and for motivating me to develop new technical skills during the past two years. I want to acknowledge my honors advisor, Dr. Harmonosky, for her guidance and support during my undergraduate honors industrial engineering journey. Finally, I would like to thank my parents and family for supporting me during my academic and baseball career. You have granted me the opportunity to receive an industrial engineering degree from one of the greatest universities in the world. I am forever grateful, as my degree would not be possible without you.

CHAPTER 1: INTRODUCTION

The growing usage of analytics in professional baseball, the implications of player injuries, and the primary objective of this research are discussed in this chapter.

1.1 History and Evolution of Baseball Analytics

Over the past 30 years, the game of baseball has undergone a wave of change. In the 1940s, Hall of Fame player Yogi Berra said, “Baseball is 90% mental. The other half is physical.” Today, Berra’s statement no longer holds true as baseball gurus and front office executives strive to find every competitive advantage possible within analytics and unique statistics. Statistics have always been a part of baseball; however, the statistics that are used to evaluate players and their worth have drastically changed (Knowledge at Wharton 2019).

In past generations, players were evaluated based on rudimentary statistics such as their batting average, how many singles they hit, or how many home runs they hit. Today, these statistics, which have always been a part of the game, are used for their predictive value and plugged into complex models, allowing teams to predict when players will be at peak performance.

The usage of these predictive models first became a part of baseball at the turn of the century, when Oakland Athletics general manager Billy Beane began using the concept of “Moneyball” to build his teams. Beane and the Athletics were able to make a small-market team very competitive by utilizing information that had not previously been evaluated (McKinsey & Company 2018). As described by Houston Astros general manager Jeff Luhnow, “the team began finding undervalued players that they were able to recognize through their use of information and analytics.” This information ultimately translated to success, as the Athletics frequently made the playoffs in the early 2000s (McKinsey & Company 2018).

In 2003, Michael Lewis published the book Moneyball: The Art of Winning an Unfair Game. Once this book was published, the secret was out, and many other front offices also started relying heavily on analytics to make their teams more competitive. Once other organizations started focusing on analytics, the department rapidly grew to become a key part of the game.

Luhnow described the transition by saying, “In 2003, there were maybe four to five clubs that had analytics-dedicated people on their payroll.” Today, however, “every general manager has some background or interest in analytics, and the typical size of the group in the front office is probably somewhere between 12 and 15 full-time people who all have advanced degrees, whether it’s computer science or physics or mathematics or some other discipline.” Luhnow elaborated that today’s game is all about finding any competitive advantage possible and that the best way to do that currently is through data analytics (McKinsey & Company 2018).

As technology continues to advance, the game of baseball will continue to become more and more reliant on statistics and analysis. Wharton statistics professor Abraham Wyner claims, “the analytics group has already made its mark [on baseball].” Wyner explained that current contracts are awarded based on the projections of a player’s future productivity. These projections are changing the game, and any new advantage found through analytics can make an organization more competitive. All 30 organizations view their analytics models as prized data and perpetually compete to one-up other teams’ data departments (Knowledge at Wharton 2019).

1.2 Implications of Injuries in Professional Baseball

The evaluation of injuries and of the driving factors that can predict future injury is relatively new to baseball. Since the creation of the game, players have always faced injuries from the stress that the game puts on their bodies. In particular, baseball players have always been subject to upper body injuries because of the immense stress that the throwing motion puts on their upper extremities. Today, leading medical professionals are looking to develop preseason tactics to help players avoid future injuries. These tactics are developed based on predictive models that use an array of physical and data-driven statistics to determine which players are most likely to be injured.

If medical experts can determine how to prevent or push back certain injuries, billions of dollars could be saved. In the past 15 years, Major League Baseball players on the disabled list accounted for $7 billion in lost wages (McCarthy 2018). In the past four years, there has been an increase in the number of players who have been injured each season. Consequently, there has also been an increase in the money that has been lost as a result of players being injured.

In recent years, most studies have focused on models for the upper extremities of the body. This is because a vast majority of player injuries happen to players’ elbows and shoulders. In particular, pitchers are the most likely to face upper body injuries. Several studies have been done on the physics of a pitcher’s motion and how certain angles and minor physical details can be predictors for injury. However, there is no universal model that identifies all of the predictors behind certain injuries.

The injuries of position players are much more difficult to predict because there is significantly less data behind their mechanics and a significant share of their injuries are “freak” injuries, such as collisions or other in-game events that could not be predicted. For position players, age is currently the leading predictor used to determine injuries. However, given all of the data and research that has been done on pitchers, developing an accurate model that predicts their injuries is very much a possibility. Such a model would not only save teams millions of dollars but would also allow teams to determine how to optimally staff their medical department.

1.3 Objectives

Analytics have become a staple in professional baseball over the past quarter century. Every organization is constantly looking to develop new models that give it a competitive advantage. While most models have focused on performance-related data and on players’ future productivity, this research intends to focus on player injuries. Furthermore, the research intends to create a model for professional baseball teams to (1) optimally staff their medical department, (2) predict future injury, and (3) analyze future player performance. The goal is to minimize the cost of injuries per team by staffing each team with the appropriate doctors while optimizing team performance. This model is data-driven and builds upon a comprehensive set of data regarding player injury and performance in the past two decades.

This research focuses specifically on data from pitchers and their upper extremity injuries. Currently, more data is available for pitchers and their mechanics than for any other position. This available data will allow for a model to be generated using the predictors and provide insight into potential future injuries. The predictors that are used for this research will be based on a combination of previous studies, injury databases, and pitch statistics.

Once injury predictors are determined, the model will factor in a financial analysis to aid teams in their decision-making. Based on the probability of injury for a specific player, the team will be able to determine what the player’s present value is and how much the player may be worth in the future. Teams can use these financial factors when negotiating a contract and when determining which players to draft or bring into their organization.

The model will also aid teams in determining how to staff their medical department. Currently, when players are injured, some injuries are treated internally by team doctors, while others are treated by specialized physicians. If a team is able to optimally staff its medical department, it could mitigate a significant portion of the cost that comes with an injury. Minimizing this cost would allow a team to focus its resources on other departments and ultimately achieve a competitive advantage.

Ultimately, the goal of this research is to increase a team’s performance. The information that this model will give teams access to will allow teams to be more successful in the future. Similar to all previous analytic models in baseball, this model looks to give teams a competitive advantage. Better understanding player injuries could further change the game of baseball and have a profound impact on the future of the sport.


CHAPTER 2: A BASIS FOR INJURY MODELING

This chapter details the initial research and groundwork that was used for creating the model. After conducting an analysis of previous studies, this chapter demonstrates what will be used from previous studies as well as what will be changed.

2.1 Motivation

Over the past four years there has been an increase in both the number of players getting injured per year and the total cost of injuries per year. When a player becomes injured there are two main costs that a team faces: the medical cost of the injury, and the salary cost lost to the injured player. An injured player still receives their salary but is not able to contribute to the team. This player essentially becomes a sunk cost for the team. If a team were able to predict future injuries or optimally staff its medical department for injury prevention, it would be able to optimally spend its money and generate the best result (wins).
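The two cost components described above can be illustrated with a minimal sketch. It assumes that salary is prorated evenly across the days of a season; the function names, the season length, and every dollar amount below are hypothetical and are not taken from the thesis data.

```python
def salary_lost(annual_salary: float, days_injured: int, season_days: int) -> float:
    """Salary paid while the player is unavailable, assuming even daily proration."""
    if season_days <= 0:
        raise ValueError("season_days must be positive")
    if not 0 <= days_injured <= season_days:
        raise ValueError("days_injured must be between 0 and season_days")
    return annual_salary * days_injured / season_days


def injury_cost(annual_salary: float, days_injured: int, season_days: int,
                medical_cost: float = 0.0) -> float:
    """Total cost of one injury: medical cost plus salary paid to the injured player."""
    return medical_cost + salary_lost(annual_salary, days_injured, season_days)


# Hypothetical example values chosen only for illustration.
example_cost = injury_cost(
    annual_salary=9_000_000,
    days_injured=60,
    season_days=180,
    medical_cost=50_000,
)
print(f"Estimated injury cost: ${example_cost:,.0f}")
```

Under these assumed inputs, the salary component dominates the medical component, which is consistent with the emphasis this section places on lost salary.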

In 2018 alone, the cost of injuries in Major League Baseball exceeded $700 million. As shown in Figure 1, this cost is the highest in recent years. This cost consisted of the money lost to player salaries during their time on the injured list (previously known as the disabled list). The high cost of injury can partially be explained by the increasing salaries that professional baseball players have incurred over the past decade.

Figure 1: Total Cost of Player Injuries

Since players are getting paid more money, the team also loses more money when a player gets hurt. It is for this reason that teams need to determine a way of predicting whether a player could face future injury. If a team can predict a future injury, this will significantly decrease a player’s worth and help teams prevent additional future costs.

In addition to the increase in the cost of injuries, the number of players getting injured has also increased every year in recent years. As shown in Figure 2, nearly 600 players spent time on the disabled list during 2018. This is a much more concerning statistic because it shows that more and more players are getting injured. For this reason, the medical staffs of teams need to be further examined to determine if there are any prevention methods or other trends that could help teams to minimize costs.

Figure 2: Number of Player Injuries

To combat the increasing costs and increasing number of injuries, this research intends to create a model that allows teams to optimize their medical staff based on certain injuries. Past studies have been more focused on individual players and how to properly evaluate their potential. Instead of focusing on the players, this research will use player injury data to focus on the medical staff and to decrease the injury costs that a team incurs.

2.2 Literature Review

A wide variety of studies have been conducted to predict injuries in baseball. In particular, these studies tend to focus on pitchers (the most injury-prone position) because of the violent arm motion that pitchers undergo when they are throwing. Research has been conducted on a variety of potential factors of injury, such as biomechanics and statistics. These factors are used as predictors to create predictive models for injury.

2.2.1 Biomechanical Evaluation of Injury

In 2018, Michelle McCarthy published an article titled “Researchers Team up with Major League Baseball to Predict Injuries before They Occur”. McCarthy’s research studied how to predict and prevent certain injuries. McCarthy teamed up with a University of Southern California Professor of Clinical Physical Therapy, Lori Michener, to research certain biomechanical risk factors in professional baseball. Previously, most studies regarding biomechanical risk factors had focused only on range of motion in the shoulder (McCarthy 2018). Contrary to these studies, McCarthy evaluated measurements from all parts of the body. In particular, the research focused on the correlation between muscle control, shoulder, trunk, and leg strength (McCarthy 2018). Using these factors, McCarthy plans to analyze the frequency of certain injuries and their relation to biomechanics in future research.

The biomechanical approach is a new method in professional baseball that could potentially lead to developing prevention programs. If it is determined that certain weaknesses in the body correlate with specific injuries, training regimens to strengthen these areas could be implemented and ultimately prevent future injuries. For the purpose of the research in this thesis, this article shows that a medical staff can have a direct impact on whether a player gets injured. Investments into new medical research and proper diagnostics before a season could ultimately prevent a player from getting injured. These factors must be considered during optimization as they could significantly lower the costs that a team faces.

Following a year of research, McCarthy published another article that detailed the hypothesis of her experiments. McCarthy predicted that “players who have a lower elbow force while they’re pitching and a set ball velocity compared to the pitchers who have a higher force at the same ball velocity will have better physical factors” (McCarthy 2018). This prediction demonstrates that injuries could be prevented by correcting physical factors by strengthening a player and by correcting his mechanics. If this prediction holds, the medical staff of a team becomes essential as they can do studies to correct mechanics and strengthen the pitcher – thus preventing injuries. The biomechanical research further demonstrates how an optimized medical staff can greatly improve a team’s value and lead to future success.

2.2.2 Statistical Evaluation of Injury

In contrast to the biomechanical evaluation of injuries, there have also been studies that evaluate injuries based on a player’s statistics. These studies utilize the statistics as predictors and create mathematical models to predict the odds of injury in certain players. The Harvard University Sports Analytics group published research on a model that could predict future injury for players. The research analyzed a wide variety of predictors and used a logistic regression to generate a predictive model. The model uses data from the player’s career to predict the chance of injury that a player has in the following season. The study focused on injuries to the arm, shoulder, back, and side, which are the most common injuries to players and relate to specific mechanical and muscle issues (Harvard Sports Analytics Group 2016).
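As a rough illustration of how a fitted logistic regression turns predictors into a probability of injury, the sketch below applies the standard logistic function to a weighted sum of predictors. The predictor names, coefficients, and intercept are hypothetical placeholders; they are not the values reported by the Harvard Sports Analytics Group or estimated in this thesis.

```python
import math


def injury_probability(features: dict, coefficients: dict, intercept: float) -> float:
    """Probability of injury from a logistic model: 1 / (1 + exp(-log_odds))."""
    missing = set(coefficients) - set(features)
    if missing:
        raise KeyError(f"missing predictor values: {sorted(missing)}")
    # The log-odds are a linear combination of the predictors.
    log_odds = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients and predictors, for illustration only.
example_coefficients = {"career_appearances": 0.002, "prior_arm_injuries": 0.6}
example_intercept = -2.0

low_risk = injury_probability(
    {"career_appearances": 50, "prior_arm_injuries": 0},
    example_coefficients, example_intercept,
)
high_risk = injury_probability(
    {"career_appearances": 400, "prior_arm_injuries": 2},
    example_coefficients, example_intercept,
)
print(f"low-risk pitcher:  {low_risk:.3f}")
print(f"high-risk pitcher: {high_risk:.3f}")
```

Because both hypothetical coefficients are positive, the pitcher with more appearances and more prior injuries receives the higher predicted probability.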

The model determined that in 2016 there was a 23.2% average risk for a pitcher to get injured. The research also stated that “we would expect future years to hover around this value, since there hasn’t been a secular change in pitcher usage or philosophy” (Harvard Sports Analytics Group 2016). This statement demonstrates the importance of medical staff optimization. If a medical staff could utilize the predictors determined by Harvard Sports Analytics and the biomechanical factors discussed in Michelle McCarthy’s research, the medical staff could change their pitcher usage and philosophy. This change in thinking would allow a medical staff to better utilize its players and minimize injuries. This information would give a team a competitive advantage by having all players at peak performance. In addition, the team would be able to minimize its medical costs by decreasing the number of players on the disabled list. Based on this information, the primary objective of a medical staff should be to lower the average risk of a pitcher getting injured, and the medical staff determines this risk through statistical modeling.

2.2.3 Evaluation of Injury Model Types

Kari Davis of Towards Data Science published research that created a model for predicting future injuries. Davis’s research focused specifically on evaluating models that would best fit the injury data in professional baseball. Davis analyzed a logistic regression model, a K-Nearest Neighbors model, a Linear SVM model, a Random Forest model, a Linear SVM with SMOTE model, and a Linear SVM with ADASYN model. Based on the research, it was determined that the Linear SVM with Random Oversampling scored the highest for the specific predictors tested (Davis 2019).
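Random oversampling, the resampling step named above, addresses class imbalance (far fewer injured than healthy player-seasons) by duplicating minority-class rows until the classes are the same size. The following is a minimal standard-library sketch of that idea; it is not Davis’s code, and the function name and sample data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    if not labels:
        return [], []
    rng = random.Random(seed)  # seeded so the result is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical data: four healthy pitcher-seasons (0) and one injured (1).
sample_rows = [[91.2], [93.5], [89.9], [94.1], [96.3]]
sample_labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(sample_rows, sample_labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied only to the training split, so that duplicated rows do not leak into the data used to score the model.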

Davis’s research demonstrates that there are different means of analyzing data for injury evaluation, and a team must consider which model is optimal when analyzing its data. An inaccurate model could end up costing a team more money than it saves by generating inaccurate predictions. This emphasizes how essential it is for a team and medical staff to make sure that they are considering the proper factors. If a team is able to accomplish this, it could follow Davis’s approach and create an app that gives a specific breakdown of each player. The combination of biomechanical, statistical, and modeling research could create a powerful predictive model that would lead to great future success for teams. As baseball moves more toward analytics, any model that offers a team a competitive advantage is highly coveted.

2.3 Injury Trends in Recent Years

To begin research on this thesis, an analysis of injury trends was conducted. The analysis aimed to determine which injuries were most common and to see whether the percentage of certain injuries varied over time. To conduct this analysis, data was collected from baseball prosportstransactions.com. This website contained data on every transaction in professional baseball since the beginning of the game. To limit the scope of data, only injuries in the 21st century were considered. All data was downloaded into Excel and then filtered to show the disabled list transactions. Disabled list transactions were selected because every time that a player faces an injury they are placed on the disabled list. The disabled list is the best representation of what injuries are occurring in baseball and can show specific injury trends.

Once the data had been filtered to show the disabled list data, it was then filtered further to show injuries to specific parts of the body. Using this data, a variety of pivot tables were generated showing the total number of injuries to specific body parts that occurred each year. An analysis was then conducted to show what percentage of injuries occurred to specific body parts in a given season (Table 1).

Table 1: DL Data Yearly Percentages
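The pivot-table step behind Table 1 was carried out in Excel. An equivalent calculation can be sketched in a few lines of Python, assuming each disabled list transaction has already been reduced to a (year, body part) pair; the function name and the sample records are hypothetical and do not reproduce the thesis data.

```python
from collections import Counter, defaultdict


def yearly_injury_percentages(records):
    """Return {year: {body_part: percent of that year's injuries}}."""
    counts = defaultdict(Counter)
    for year, body_part in records:
        counts[year][body_part] += 1
    percentages = {}
    for year, parts in counts.items():
        total = sum(parts.values())
        percentages[year] = {
            part: 100.0 * n / total for part, n in parts.items()
        }
    return percentages


# Hypothetical disabled list records, for illustration only.
sample_records = [
    (2017, "shoulder"), (2017, "elbow"), (2017, "knee"), (2017, "shoulder"),
    (2018, "elbow"), (2018, "back"),
]
table = yearly_injury_percentages(sample_records)
for year in sorted(table):
    print(year, table[year])
```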


As shown in Figure 3, shoulder and elbow injuries have the highest frequency of any injury in every year. In a given season, shoulder and elbow injuries account for over 30% of the total injuries in professional baseball, or roughly $210 million in lost salaries. Although there is some variation in the percentage of injuries that occur in a year it is clear that shoulder and elbow injuries are the most common injuries in baseball.

Figure 3: Yearly Trends of Injuries

The other notable injuries occurred to the knee, hamstring, and back. These injuries each hovered around 5% of total injuries for each season ($35 million in lost salaries). Based on these frequencies, it is evident that elbow and shoulder injuries are the most common and that the mitigation of these injuries has the most potential for savings. Based on the savings potential, it was determined that the predictive model would focus specifically on elbow and shoulder injuries. Although there are still significant savings to be had with other types of injuries, there is currently not enough data on the other injuries to create an active model.
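The dollar figures quoted in this section appear to follow from applying each injury share to the roughly $700 million in salary lost to injuries in 2018 (Section 2.1). The short sketch below reproduces that arithmetic; it assumes that lost salary is spread across injury types in proportion to injury counts, which is a simplification, and the function name is hypothetical.

```python
def lost_salary_for_share(total_lost_salary: int, share_percent: int) -> float:
    """Lost salary attributed to an injury type that makes up
    share_percent of all injuries, assuming cost scales with count."""
    if not 0 <= share_percent <= 100:
        raise ValueError("share_percent must be between 0 and 100")
    return total_lost_salary * share_percent / 100


TOTAL_LOST_SALARY_2018 = 700_000_000  # approximate figure cited in Section 2.1

shoulder_and_elbow = lost_salary_for_share(TOTAL_LOST_SALARY_2018, 30)
knee_hamstring_or_back = lost_salary_for_share(TOTAL_LOST_SALARY_2018, 5)
print(f"Shoulder and elbow (30%): ${shoulder_and_elbow:,.0f}")
print(f"Each of knee, hamstring, back (5%): ${knee_hamstring_or_back:,.0f}")
```

Since shoulder and elbow injuries account for more than 30% of injuries, the first figure is best read as a lower bound under this proportionality assumption.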

2.4 A Basic Analysis of Common Injuries

Once an analysis had been conducted on the frequency of specific injuries it was essential to do a further analysis of the most common injuries and determine any underlying trends. To conduct this research, the data collected from Pro Sports Transactions was manipulated in a variety of ways to see what trends were prevalent. After looking at all injuries, it became evident that injuries tend to follow a seasonal trend. As shown in Figure 4, the most injuries occur at the beginning and end of the season (March and September/October).

Figure 4: Monthly Analysis of Injuries

The professional baseball season starts in March with spring training, when players first start practicing and playing in games. The regular season concludes at the end of September for teams that do not make the playoffs, and in October for teams that make the playoffs. The data clearly shows that the highest number of injuries occur at the start and end of the season, and fewer injuries occur in the middle of the season once players are in shape. This data suggests that two main drivers of injury could be poor off-season training (March injuries) and overuse/mechanical injuries (September injuries). These two main drivers are essential for a team to understand so that it can (1) invest in proper injury prevention during the off-season, and (2) maximize the medical staff preparation during the start and end of the season so that injuries can be predicted and treated before they become damaging and cause a player to go on the disabled list.

To further analyze the monthly injury trends, specific injuries were analyzed vs. the time of year. This data showed that shoulder injuries occur more frequently whereas elbow injuries are most common at the start of the season. The seasonal trends for both elbow and shoulder injuries can be seen in Figures 5 and 6. The seasonal trends further emphasize that the team must focus on off-season prevention and medical staff optimization.
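The monthly breakdowns, both for all injuries and for a single body part, amount to counting disabled list placements by calendar month. A minimal sketch of that count is shown below, assuming each record is a (placement date, body part) pair; the function name and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def injuries_by_month(records, body_part=None):
    """Count placements per calendar month, optionally for one body part.
    Months with no placements are reported as zero."""
    counts = Counter(
        placed_on.month
        for placed_on, part in records
        if body_part is None or part == body_part
    )
    return {month: counts.get(month, 0) for month in range(1, 13)}


# Hypothetical disabled list placements, for illustration only.
sample_placements = [
    (date(2018, 3, 28), "elbow"),
    (date(2018, 3, 30), "shoulder"),
    (date(2018, 6, 12), "knee"),
    (date(2018, 9, 4), "shoulder"),
    (date(2018, 9, 20), "elbow"),
]
all_injuries = injuries_by_month(sample_placements)
elbow_only = injuries_by_month(sample_placements, body_part="elbow")
print(all_injuries)
print(elbow_only)
```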

Figure 5: Shoulder Injury Seasonal Trend

Figure 6: Elbow Injury Seasonal Trend

In addition to analyzing shoulder and elbow injuries, an analysis was also done for hamstring, knee, and back injuries (roughly 5% of total injuries each) to see if they followed similar trends to the total injury seasonal trend. As shown in Figures 7, 8, and 9, the hamstring, knee, and back injuries deviated more from the total injury trend than the elbow and shoulder injuries did. This can be explained by the fact that these injuries are typically “freak” injuries that are not as predictable. The elbow and shoulder injuries have measurable factors and follow a trend, which makes them easier to predict and model. This is another reason why the predictive model generated in this research focuses specifically on elbow and shoulder injuries.

Figure 7: Hamstring Injury Seasonal Trend

Figure 8: Knee Injury Seasonal Trend

Figure 9: Back Injury Seasonal Trend

2.5 The Pitcher

After determining that the research would focus specifically on shoulder and elbow injuries the next step was determining if there was a specific position to focus on, and where the most data was available. After analyzing each position, it became evident that the model should focus on the pitcher. Pitchers incur the most arm and shoulder injuries of any position. This is to be expected as the main function of the pitcher is throwing which puts tremendous amounts of force and stress on both the elbow and shoulder. A variety of studies have been conducted on pitchers and there is publicly available data on a variety of baseball websites that can be used to generate predictors in the model.

2.5.1 The Injuries of a Pitcher

In the 1970s, a pitcher for the Los Angeles Dodgers, Tommy John, suffered an injury to the ulnar collateral ligament (UCL) in his elbow. This injury was directly related to the stress on his elbow from pitching in games. After the injury, Tommy John became the first person to have ulnar collateral ligament reconstruction surgery, which is now known as “Tommy John surgery.”

Tommy John surgery has become so frequent in baseball that every season dozens of professional baseball players undergo the surgery. The injury happens because of a player’s mechanics and excessive throwing. The pitching motion is not a natural motion, and as a result certain body parts face a variety of forces and torques. As stated in McCarthy’s research, “elbow torque occurs during pitching and the higher that torque goes, the more pressure it will put on that elbow ligament. If there’s too much pressure on the UCL, it will tear” (McCarthy 2018). The Tommy John injury has become one of the most common in baseball, but there is potential for predicting it and reducing it. The Tommy John injury directly correlates with predictors such as past surgeries, player biomechanics, and frequency of certain pitches. The prediction and prevention of the Tommy John injury would significantly decrease elbow injuries in baseball.

In addition to the “Tommy John” injury, baseball players also face a wide variety of other shoulder and elbow injuries such as torn labrums, torn rotator cuffs, and various other ligament and soft tissue damage. All of these injuries directly relate to the forces that a pitcher’s arm incurs while throwing. This suggests that these injuries are also predictable and, given the correct set of predictors, could also be minimized.

2.5.2 The Physics of Pitching

As mentioned in the discussion of McCarthy’s biomechanical approach, there is a significant amount of physics that goes into pitching a baseball. McCarthy stated that the physical factors used to create velocity on a pitch consist of strength and control of the legs through the trunk, through the shoulder, and to the hand (McCarthy). If any part of the pitching motion deviates, the pitcher can experience a variety of forces and torques on certain parts of the elbow and shoulder. These torques can result in the ligaments and muscles in the elbow and shoulder partially tearing or wearing down over time. A pitcher can throw upwards of 100 pitches in a given game and thousands of pitches in a season. If the mechanics of a pitcher are off, this repetitive motion will wear away at the body over time and ultimately lead to a serious injury.

It is very rare that two pitchers have the exact same mechanics. Every pitcher’s motion varies slightly from the next and consists of a series of sequential lower body and upper body mechanisms (Figure 10). Each mechanism in the sequence can be measured and used as a predictor in the model. For example, pitchers can throw a pitch from a wide variety of angles. Some pitchers will throw the ball directly over the top, while others will throw at a ¾ angle, and others will throw from a sidearm angle. Each angle puts different forces and torques on the pitcher’s arm (Kagan 2009). Given the data on the pitcher’s arm angle and other mechanics, a medical staff could conduct an analysis of the arm-lifespan of a pitcher before injury.

Figure 10: Sequential Mechanics of a Pitch

Overall, the physics of pitching can be studied to help understand and predict future injuries. This study will focus on predictors that are publicly available, which limits some of the potential predictors. However, if an actual team’s medical staff were to use the model, they could use specific testing on their players to create quantifiable measurements for each mechanic in the motion. Using this data, a medical staff would be able to generate a model that is more accurate.

2.6 Potential Predictors of the Injury Prediction Model

To gain an understanding of potential predictors that could be analyzed to generate this model, I first looked back at the research that had previously been done by the Harvard Sports Analytics group and Kari Davis. Harvard Sports Analytics focused their analytics specifically on pitch data to determine the model. These predictors (seen in Table 2) consider the percentages of each type of pitch that is thrown. This is because different types of pitches put different forces on the arm, which results in certain pitches having a stronger correlation with injury.

Table 2: Harvard Sports Analytics Predictors

While the Harvard predictors focus solely on statistics and pitch analysis, it is also important to consider mechanical factors in a pitcher’s motion as well as the pitcher’s age. In addition to the Harvard predictors, predictors such as arm angle, arm lag, balance point, and hip drive should all be considered in the model. Since some of these factors are difficult to quantify, a smaller sample of pitchers will be analyzed and used to represent the whole. This difficulty is the reason that many previous studies, such as the Harvard Sports Analytics study, focused only on statistical values. However, if this research is able to generate a model that incorporates mechanics, the medical staff can take preventative action and work to minimize the total number of injuries that occur.

2.7 The Modeling Approach

As previously stated, the objective of the research is to help teams (1) optimally staff their medical department, (2) predict future injury, and (3) analyze future player performance. To meet these objectives, two models will be created: a predictive model and a staffing model. The predictive model will use some of the predictors detailed in the previous section to determine the odds of a player getting injured. Additionally, the predictive model will incorporate biomechanical variables to help determine the odds of injury. Based on the predictions for future injury, the medical staff will be able to determine the player’s future performance as well as the future worth of a player. In addition to the predictive model, a staffing model will be created to analyze how a team can utilize capital within the medical department. The staffing model will be based on known data and will compare the tradeoffs between various staffing options. This will allow the medical staff to optimize spending and increase the team’s future performance. In turn this will allow the model to meet its objectives.


CHAPTER 3: PREDICTING FUTURE INJURY

This chapter details the predictive model that was created to determine the likelihood of a player getting Tommy John surgery. The chapter details the variable selection process, as well as the types of models that were explored. The chapter then explains the findings of the model and their relevance to the thesis.

3.1 Variable Selection for the Injury Prediction Model

The variable selection for the injury prediction model consisted of several steps and validations. Initially, an array of variables was collected based on the research of past studies. These variables consisted of basic statistics, pitch selection, and velocity. In addition to these variables, a group of variables for ball movement was also selected. The ball movement variables were included to indicate the mechanics of a pitcher. The ball movement is a direct result of the arm angle of a pitcher, which means that these variables make it possible to see how mechanics factor into Tommy John Surgery. The decision to include all of these variables came from combining the findings of the Harvard Sports Analytics study and the McCarthy biomechanical study. The combination of these variables yielded a new model, which is described in this chapter.

3.1.1 Initial Selection of Variables

As stated, the predictor variables were broken into four basic groups: basic statistics, pitch selection, pitch velocity, and mechanics. One binary response variable was selected: Tommy John Surgery. The Tommy John variable (TJ) was a binary variable with 1 indicating that a player had Tommy John Surgery, and 0 indicating that a player had not had Tommy John Surgery.

All of the predictor variables were collected from the Pitch Type Statistics page on the FanGraphs Baseball website. The FanGraphs website had data for these statistics for every pitcher from the 2000 to 2019 seasons. Additionally, the Tommy John data came from the Tommy John database (Tommy John Surgery List). This database included a list of every player who has had Tommy John Surgery and their level of play when they had the surgery. The combination of these two data files was then exported into a CSV file that was used for the RStudio coding to create the model. The initial file that contained all of the variables was titled Pitcher.Data. This file was used for the initial modeling before certain variables were eliminated. Table 3 (seen below) shows all of the variables that were initially selected. The variables that are highlighted yellow in Table 3 were the first variables eliminated from the model.
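The combination of the two data sources described above can be sketched as follows. The thesis performed this step before loading the CSV file into RStudio and does not describe how the records were matched; Python is used here only for illustration, and the matching by pitcher name, the row layout, and the names `pitcher_rows`, `tommy_john_names`, and `add_tj_indicator` are all assumptions rather than the thesis's actual procedure.

```python
# Hypothetical rows standing in for the FanGraphs export (not the thesis data).
pitcher_rows = [
    {"Name": "Pitcher A", "IP": 812.0, "FBv": 93.4},
    {"Name": "Pitcher B", "IP": 45.0, "FBv": 90.1},
    {"Name": "Pitcher C", "IP": 230.0, "FBv": 95.0},
]

# Hypothetical names standing in for the Tommy John Surgery List.
tommy_john_names = {"Pitcher A", "Pitcher D"}


def add_tj_indicator(rows, surgery_names):
    """Return copies of rows with TJ = 1 if the pitcher is on the surgery list, else 0."""
    return [{**row, "TJ": 1 if row["Name"] in surgery_names else 0} for row in rows]


pitcher_data = add_tj_indicator(pitcher_rows, tommy_john_names)
for row in pitcher_data:
    print(row["Name"], row["TJ"])
```

Matching on name alone would be fragile with real data (duplicate names, spelling variants), so a real merge would likely need a more reliable player identifier.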

Table 3: Initial Variable Selection for Model

Variable Label in R  Actual Variable             Variable Label in R  Actual Variable
W                    Wins                        SF.                  Percent Pitches Sinkers
L                    Losses                      SFv                  Sinker Velocity
SV                   Saves                       KN.                  Percent Pitches Knuckleballs
G                    Games Played                KNv                  Knuckleball Velocity
GS                   Games Started               FA.X                 X.Movement: Fastball
IP                   Innings Pitched             FC.X                 X.Movement: Cutter
K/9                  Strikeouts Per 9 Innings    FS.X                 X.Movement: Splitter
BB/9                 Walks Per 9 Innings         SI.X                 X.Movement: Sinker
ERA                  Earned Run Average          CH.X                 X.Movement: Changeup
Age                  Age                         SL.X                 X.Movement: Slider
H                    Hits Allowed                CU.X                 X.Movement: Curveball
BB                   Walks Allowed               CS.X                 X.Movement: Unknown
CG                   Complete Games              KN.X                 X.Movement: Knuckleball
Pace                 Pace Between Pitches        SB.X                 X.Movement: Screwball
FB.                  Percent Pitches Fastballs   FA.Z                 Z.Movement: Fastball
FBv                  Fastball Velocity           FC.Z                 Z.Movement: Cutter
SL.                  Percent Pitches Sliders     FS.Z                 Z.Movement: Splitter
SLv                  Slider Velocity             SI.Z                 Z.Movement: Sinker
CT.                  Percent Pitches Cutters     CH.Z                 Z.Movement: Changeup
CTv                  Cutter Velocity             SL.Z                 Z.Movement: Slider
CB.                  Percent Pitches Curveballs  CU.Z                 Z.Movement: Curveball
CBv                  Curveball Velocity          CS.Z                 Z.Movement: Unknown
CH.                  Percent Pitches Changeups   KN.Z                 Z.Movement: Knuckleball
CHv                  Changeup Velocity           SB.Z                 Z.Movement: Screwball
TJ                   Tommy John Surgery Indicator

As seen in Table 3, there were initially 48 predictor variables and one response variable. Initially, all pitches that could be thrown by a pitcher were evaluated (the CS variables account for unknown pitches). For the variables indicating mechanics, the x direction indicates the movement of the pitch on the horizontal plane, the same plane as the plate. This means that the x-movement measures the movement of the pitch from side to side. In contrast, the z-movement variables indicate the vertical movement of the pitch. The combination of these two variables would allow future research to better understand the release point and the prior mechanics leading to a pitch.

The initial variable selection encompassed such a large number of variables to ensure that the model would not be missing any unforeseen predictors. As detailed in later sections, a sensitivity analysis using backwards selection was used to then determine the significance of variables and ensure that

3.1.2 Variable Types

To allow for a better understanding of the variables, comparisons of the variables within their specific subgroups were created to show the relationships between certain variables. The variables that fell into the basic statistics group were: wins, losses, saves, games played, games started, innings pitched, strikeouts per 9 innings, walks per 9 innings, earned run average, age, hits allowed, walks allowed, complete games, and pace between pitches. All of these variables come from statistics that are regularly tracked during every game. These are the statistics that one would read about in the newspaper following a game. While these statistics are useful for fans, they also help the model by showing the effectiveness of the pitcher, the total number of games that a pitcher has played in (strain on the arm), and the type of pitcher the player is (GS indicates that the pitcher is a starter who typically throws longer in games than a reliever).

Figure 11 (below) shows the basic analysis of these variables. Along the diagonal of Figure 11 are the variable labels (full names can be found in Table 3). Each chart then compares one variable to another. Within each chart, the x-axis variable is the diagonal variable in the same column as the chart, and the y-axis variable is the diagonal variable in the same row as the chart.

Figure 11: Comparison of Basic Pitching Statistics

In Figure 11, there are four particular graphs that are highlighted. These graphs compare Innings Pitched and Games Started, Innings Pitched and Age, Innings Pitched and Complete Games, and Hits and Games. The comparison between games started and innings pitched shows a strong linear relationship between the two variables. This is because starting pitchers typically throw the most innings of any pitcher and thus have the largest strain on their arm throughout a game.

The innings pitched and age chart shows that as a pitcher gets older he typically plays less and throws fewer innings. Additionally, the hits and games comparison shows the effectiveness of a pitcher. As seen in the chart, there is a split in the curves. The flatter curve indicates a typical pitcher who throws a large number of innings, and its positive correlation with hits. However, the other branch of the graph shows how a pitcher who gives up a lot of hits may not play as much because he is not effective (thus the lower innings pitched). Finally, the chart comparing innings pitched and complete games shows that pitchers who throw complete games typically see a lot of innings, which could be a potential indicator for strain and injury.

The next group of variables that was evaluated was the pitch selection and pitch velocity variables (seen in Figure 12). Initially, this selection included all possible pitches that could be thrown and compared the percentage of a certain pitch as well as the velocity of the pitch. This variable analysis was particularly helpful for determining how much data is available on each pitch and how useful a variable each pitch would be. The data also showed a comparison between the selection of certain pitches. This shows relationships such as how high fastball velocity is associated with an increased percentage of fastballs being thrown. However, the biggest finding from this variable comparison was the lack of data available about the knuckleball. This is because it is very rare to find a knuckleball pitcher. As a result, there is an insufficient amount of data available about the pitch. This can be seen in the red box on the data plot, where the knuckleball chart has minimal data. This led to the knuckleball being removed from the variable list and not being put into the initial models.

Figure 12: Comparison of Pitch Selection

Following the pitch selection variables, the pitching mechanics variables were evaluated. These variables could be broken into two different directions: the x-movement of a pitch and the z-movement of a pitch. Figure 13 shows the comparison charts of both the x- and z-movement of each pitch. As highlighted by the three red boxes, there was minimal data on the knuckleball movement, the screwball movement, and the unknown pitch movement. For this reason, all of these variables were eliminated before being put into the model: with so little data they could not be accurate predictors and could have skewed the model.

Figure 13: Comparison of Pitch Movement


Figure 14: Pitch Movement (X)

Figure 15: Pitch Movement (Z)


The lack of data is further highlighted by Figures 14 and 15, which show the scarcity of the data for the unknown pitches, knuckleballs, and screwballs. Eliminating these variables became the first step in the variable elimination. Once these variables were eliminated, a new data set (Pitcher.Data1) was created. This dataset was then input into RStudio and used for the backwards selection that led to the model creation.

3.1.3 Variable Parameters and Elimination

Once the basic relationships between the variables had been analyzed and the variables had been put into their subgroups, the next logical steps were eliminating certain variables and setting basic parameters. As mentioned in the previous section, there were certain variables that did not have enough data to be relevant in the model. These variables were all based on certain pitches: the knuckleball, the screwball, and unknown pitches. The knuckleball and screwball are both pitches that are very rarely thrown and are used by only a small handful of pitchers. Therefore, these pitches would not be useful to classify all pitchers in the model. Similarly, the unknown pitches category consisted of pitches where something abnormal happened and the pitch was not a normal one. The probability of these unknown pitches occurring in a game is very low. Therefore, this variable was also deemed not useful to classify all pitchers in the model.

As a result of these pitches not being included, eight different variables pertaining to these pitches were eliminated. These variables can be seen in Table 3 as the variables that are highlighted yellow. The variables that were eliminated due to a lack of data were: percent pitches knuckleball (KN.), knuckleball velocity (KNv), unknown pitch x-movement (CS.X), knuckleball x-movement (KN.X), screwball x-movement (SB.X), unknown pitch z-movement (CS.Z), knuckleball z-movement (KN.Z), and screwball z-movement (SB.Z).

After eliminating the above eight variables, the next logical step was to set parameters for the remaining data. The first parameter that was selected was the total number of innings pitched by a pitcher. To make the data in the model more accurate, it was determined that an observation (specific pitcher) must have at least 50 innings pitched to count towards the data. This decision eliminated 724 observations (roughly 28.8% of the data). A comparison of the data before and after the elimination showed that the proportion of pitchers who had Tommy John surgery remained the same.

Innings pitched was chosen as a parameter of the data because pitchers who did not throw at least 50 innings skewed the data and did not have enough observations of each pitch to make the data useful. This was determined after running the model once and seeing that the tendencies of pitchers who did not play much were driving some of the results. This would cause the model to become inaccurate and cause a small portion of data to represent the whole. To correct this problem and make the data more accurate, the 50 innings pitched parameter was added to the data. Once this parameter was added, the data was ready to be manipulated in RStudio. The next step in the process was running a sensitivity analysis to determine which variables were significant.
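The 50-inning cutoff and the before/after check on the Tommy John proportion can be sketched as below. This is an illustrative Python sketch, not the thesis's R code; the observations in `sample` are made up, and the helper names `apply_min_innings` and `tj_rate` are hypothetical.

```python
def apply_min_innings(rows, min_ip=50.0):
    """Keep only observations with at least min_ip innings pitched."""
    return [r for r in rows if r["IP"] >= min_ip]


def tj_rate(rows):
    """Share of observations with TJ == 1 (0.0 for an empty list)."""
    return sum(r["TJ"] for r in rows) / len(rows) if rows else 0.0


# Hypothetical observations (not the thesis data).
sample = [
    {"IP": 10.0, "TJ": 0}, {"IP": 30.0, "TJ": 1}, {"IP": 49.0, "TJ": 0},
    {"IP": 50.0, "TJ": 0}, {"IP": 75.0, "TJ": 1}, {"IP": 120.0, "TJ": 0},
    {"IP": 200.0, "TJ": 0}, {"IP": 640.0, "TJ": 1},
]

kept = apply_min_innings(sample)
removed_share = 1 - len(kept) / len(sample)

print(f"removed {removed_share:.1%} of observations")
print(f"TJ rate before: {tj_rate(sample):.3f}, after: {tj_rate(kept):.3f}")
```

Comparing the two rates is the same kind of check the text describes: if the filter removed surgery cases disproportionately, the rates would diverge noticeably.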

3.1.4 Backwards Selection of Variables

To determine the significance of the remaining 39 variables, a sensitivity analysis was run using RStudio. The default setting for stepwise selection in R is backwards selection. Backwards selection starts with all of the variables in the model and eliminates one variable at each step. The variable deleted at each step is the one whose removal lowers the AIC the most. The backward selection continues until no further removal lowers the AIC, at which point the best-performing subset of variables has been found.

To check that the model was working correctly, each variable that was eliminated was evaluated against previous studies and examined from a logical standpoint. This was done because certain variables, such as innings pitched and fastball velocity, had been shown in essentially every study to have an effect on Tommy John surgery. If either of these variables had been eliminated from the model, it would have been clear that there was a problem in the data or the code.
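The elimination loop described in this section can be sketched generically. The code below is a Python illustration of greedy backward elimination by AIC, not a reproduction of R's `step` function; the scoring function `toy_aic`, its `gains` values, and the base deviance of 2400 are invented so that the example runs without fitting a real model.

```python
def backward_eliminate(variables, aic_of):
    """Greedy backward elimination.

    variables: list of candidate predictor names.
    aic_of: callable taking a tuple of names and returning that model's AIC.
    Returns (selected, history), where history holds (dropped, aic) per step.
    """
    current = list(variables)
    best_aic = aic_of(tuple(current))
    history = [(None, best_aic)]
    while current:
        # Score every model that drops exactly one remaining variable.
        trials = [
            (aic_of(tuple(v for v in current if v != drop)), drop)
            for drop in current
        ]
        trial_aic, drop = min(trials)
        if trial_aic >= best_aic:
            break  # no single removal lowers the AIC any further
        current.remove(drop)
        best_aic = trial_aic
        history.append((drop, best_aic))
    return current, history


# Hypothetical deviance reduction contributed by each variable (made-up numbers).
gains = {"IP": 12.0, "FBv": 6.5, "Pace": 0.01, "CU.X": 0.0}


def toy_aic(names):
    """Toy AIC: deviance plus 2 per parameter (the variables and an intercept)."""
    deviance = 2400.0 - sum(gains[n] for n in names)
    return deviance + 2 * (len(names) + 1)


selected, history = backward_eliminate(list(gains), toy_aic)
print(selected)
print(history)
```

In this toy setup a variable is dropped whenever it improves the deviance by less than the penalty of 2, which is why the two uninformative variables are removed and the two informative ones survive.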

To run the backwards selection in RStudio, the code “testglm <- step(datFit6); summary(testglm)” was used. This code utilized all of the data that was collected and the variable string “W+L+SV+G+GS+IP+K.9+BB.9+ERA+Age+H+BB+CG+Pace+FB.+FBv+SL.+SLv+CT.+CTv+CB.+CBv+CH.+CHv+SF.+SFv+FA.X+FC.X+FS.X+SI.X+CH.X+SL.X+CU.X+FA.Z+FC.Z+SI.Z+CH.Z+SL.Z+CU.Z”. This resulted in the backwards selection process that can be seen below in Table 4.

Table 4: Sensitivity Analysis of Variables

Iteration Number  Number of Variables  Variable Eliminated  AIC
1                 39                   -                    2309.59
2                 38                   CU.X                 2307.59
3                 37                   FA.X                 2305.59
4                 36                   CTv                  2303.6
5                 35                   BB                   2301.62
6                 34                   Pace                 2299.63
7                 33                   CH.Z                 2297.68
8                 32                   BB.9                 2295.79
9                 31                   CB.                  2293.91
10                30                   CBv                  2292.01
11                29                   SLv                  2290.18
12                28                   FC.X                 2288.47
13                27                   L                    2286.77
14                26                   W                    2285.05
15                25                   CHv                  2283.41
16                24                   FC.Z                 2281.82
17                23                   SL.X                 2280.24
18                22                   SI.X                 2279.08
19                21                   CH.X                 2277.17
20                20                   SL.Z                 2276.2
21                19                   FA.Z                 2274.88
22                18                   Age                  2273.7
23                17                   SFv                  2273.11
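The AIC column in Table 4 can be read against the standard definition of the criterion. The formula below is the textbook definition rather than something stated in the thesis; k denotes the number of estimated parameters and \hat{L} the maximized likelihood.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad\Longrightarrow\qquad
\Delta\mathrm{AIC}_{\text{drop one variable}} = -2 - 2\,\Delta\ln\hat{L},
\quad \Delta\ln\hat{L} \le 0
```

Under this definition, removing a variable that contributes nothing to the fit lowers the AIC by exactly 2. The first steps in Table 4 (2309.59 to 2307.59 to 2305.59) show almost exactly that drop, which suggests those variables carried essentially no information, while the last step lowers the AIC by only 0.59 (2273.7 to 2273.11), which suggests the remaining variables were starting to matter.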

As seen in the above table, the model took 23 iterations to get the best performing subset of variables. The final model utilizes 17 variables and an intercept. The majority of the variables that were eliminated could be classified as the directional variables or basic statistics that were repetitive. The final model selected only three directional variables (splitter x-movement, sinker z-movement, and cutter z-movement). The model also selected eight of the variables that can be classified as basic statistics (saves, games, games started, innings pitched, strikeouts per 9 innings, earned run average, hits, and complete games). Additionally, six pitch selection variables were selected (fastball percentage, fastball velocity, slider percentage, cutter percentage, changeup percentage, and splitter percentage). The variables that were selected can be seen below in Figure 16.

Figure 16: Variable Selection

Although the model is further analyzed later in this chapter, it is noteworthy that the basic statistics mostly relate to total innings pitched. The variables games, games started, and complete games all have strong positive correlations with innings pitched. This matches the prediction that the more a pitcher throws, the more strain is put on his arm, which can lead to potential injury.

Additionally, with the exception of fastball velocity, all of the pitch selection variables focused on the percentage of a certain pitch being thrown. This demonstrates how the types of pitches that a pitcher throws put various stresses on the body and can relate to surgeries.

3.2 Model Type Selection

The model selection for this research consisted mainly of trial and error in testing various models. The two models that were tested were a linear model and a generalized linear model. The models were evaluated based on their R squared value and their adjusted R squared value. The linear model was tested first, and it was later determined that the generalized linear model was a better fit for the data since the response (Tommy John Surgery) was a binary variable.

3.2.1 The Linear Model

The first model type that was tested was the linear model. The linear model was chosen because it is one of the most common and simplest model types. The model was created in RStudio using the “lm” function, which generated a linear model based on all of the input variables previously discussed. When the linear model was first run (using all of the variables), both the R squared and adjusted R squared values were extremely low. This indicated that the model was not effective. The initial solution to this problem was hypothesized to be eliminating variables. To do this, a manual backwards selection was performed in RStudio, and the R squared and adjusted R squared values were observed after each iteration. After 15 iterations, both the R squared and adjusted R squared values had barely changed. This indicated that the issue was within the model and not the variables. To test this hypothesis, the data was plotted onto a quantile-quantile plot that compared the data to the line of a normal distribution.

A quantile-quantile plot, or Q-Q plot, is commonly used to check the assumption of what distribution a particular set of data fits. The Q-Q plot is created by making a scatterplot that compares two sets of quantiles against one another. If the data is normally distributed, the plot should form a straight line, or relatively straight line.
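The construction of a normal Q-Q plot can be sketched with the Python standard library. This is an illustration only: the thesis produced its plot in R from the fitted linear model, whereas the sketch below applies the same construction directly to a made-up 0/1 response, and the plotting position (i - 0.5)/n is one common convention chosen here as an assumption.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(values):
    """Return (theoretical quantile, standardized sample value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    m, s = mean(ordered), stdev(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, v in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # plotting position for the i-th smallest value
        points.append((std_normal.inv_cdf(p), (v - m) / s))
    return points


# Hypothetical binary response: 8 pitchers without surgery, 2 with.
binary_response = [0] * 8 + [1] * 2
qq = normal_qq_points(binary_response)

# A 0/1 variable takes only two standardized values, so the points form
# two flat steps instead of following a straight line.
distinct_levels = {round(y, 6) for _, y in qq}
for x, y in qq:
    print(f"{x:7.3f} {y:7.3f}")
```

With normally distributed data the second coordinate would rise roughly in step with the first; with a binary variable it cannot, which is the kind of departure from the reference line that the text describes.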

As shown in Figure 17, the data deviates a significant amount from the red line, which portrays the linear model. For the data to fit the linear model, the points should closely follow the red line, which they clearly do not. The Q-Q plot actually demonstrated that the data followed the generalized linear model much more closely. This is largely because the response variable is binary. Since the response is binary, the linear model is not a good fit, because the linear model produces a straight line whose predicted values are not restricted to the range of the response. In contrast, the binary response indicates that the model should be finding the odds of a player getting Tommy John surgery. Since this is the case, the model should solve for the odds between 0 and 1, and not solve for a linear model.

Figure 17: Normal Q-Q Plot of Linear Model

3.2.2 The Generalized Linear Model (GLM)

The Generalized Linear Model proved to be a much better model for the given set of data. After disproving the linear model using a variety of tests and a logical analysis shown in the Normal Q-Q Plot, it became apparent that the GLM was the best option to model the data.

Following the backwards selection that was shown in a previous section, the GLM underwent various testing to determine if the model was significant and could be used to complete the analysis.

Based on a 90% confidence level (a significance level of 0.10), the model was determined to work. The 90% level was used instead of the typical 95% level so that as many variables as possible could be analyzed. To reach the 95% level, more variables would need to be removed, including important variables such as Innings Pitched and Fastball Velocity. Based on this analysis, it was accepted that although a 90% confidence level is less strict than a 95% confidence level, the 90% level provided more insight into certain variables.

Once the Generalized Linear Model was proven to have statistically significant results, the next steps were analyzing the model and determining how each variable affects Tommy John surgery in a player. The odds ratio that was generated by the Generalized Linear Model provided insight into how much the values of certain variables would need to be increased to have a significant impact on the odds of injury. The GLM was the best model to analyze these odds and see how specific variables could lead to injury.
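For reference, the relationship between coefficients and odds can be written out explicitly. The form below assumes the GLM was fit with a binomial family and the logit link, which is the usual choice for a binary response but is not stated explicitly in the thesis; p denotes the probability that TJ = 1, x_j the predictors, and \beta_j their coefficients.

```latex
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \sum_{j} \beta_j x_j,
\qquad
\frac{\mathrm{odds}(x_j + \Delta)}{\mathrm{odds}(x_j)} = e^{\beta_j \Delta}
```

Under that assumption, increasing a predictor by \Delta while holding the others fixed multiplies the odds by e^{\beta_j \Delta}, and for small \beta_j \Delta the percentage change in odds is approximately 100 \beta_j \Delta.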


3.3 Analysis of Injury Prediction Variables

After determining that the Generalized Linear Model was the best fit for the data, the next logical step was running the data and analyzing the variables and the coefficients associated with them. The model was generated using the glm function in RStudio. The summary function then summarized the model by giving an estimate of the coefficient for each variable that was deemed significant. The values for the coefficient of each variable can be seen in Table 5.

Table 5: Model Variable Coefficients

Variable                  Value
Splitter Percentage       0.680013
Cutter Percentage         0.5994836
Fastball Percentage       0.3379757
Changeup Percentage       0.3327832
Slider Percentage         0.2537667
Saves                     0.0978979
Strikeouts Per 9 Innings  0.025574
Complete Games            0.0088316
Sinker Z-Movement         0.0053322
Games Started             0.0052599
Games                     0.0004876
Hits                      0.0004108
Innings Pitched           -0.0015073
Fastball Velocity         -0.0024622
Fastball X-Movement       -0.0108701
Cutter Z-Movement         -0.0118188
Earned Run Average        -0.0253235

Table 5 orders the variables from the highest coefficient to the lowest. As shown in Table 5, the percentage variables have the highest coefficients. This is because the percentages were entered as decimal values, so a one percent increase in splitter pitches corresponds to a value of 0.01. Since the values in the data are so small, the corresponding variable coefficients are much larger.

To better understand the values associated with each variable, a sensitivity analysis was created to see the effects that changing certain variables would have on the odds of getting Tommy John surgery. As seen in Table 6, some of the most impactful variables were the splitter percentage, complete games, saves, and games started. When comparing the variable groups of basic statistics, pitch selection, and pitch movement, pitch movement had the least impact on the odds of a player getting Tommy John surgery.

Table 6: Sensitivity Analysis of Variables

Variable                  Value       Increase  Effect on Tommy John Odds
Splitter Percentage       0.680013    10%       6.80%
Cutter Percentage         0.5994836   10%       5.99%
Fastball Percentage       0.3379757   10%       3.38%
Changeup Percentage       0.3327832   10%       3.33%
Slider Percentage         0.2537667   10%       2.54%
Saves                     0.0978979   1         9.79%
Strikeouts Per 9 Innings  0.025574    1         2.56%
Complete Games            0.0088316   10        8.83%
Sinker Z-Movement         0.0053322   1         0.53%
Games Started             0.0052599   30        15.78%
Games                     0.0004876   100       4.88%
Hits                      0.0004108   100       4.11%
Innings Pitched           -0.0015073  1         -0.15%
Fastball Velocity         -0.0024622  1         -0.25%
Fastball X-Movement       -0.0108701  1         -1.09%
Cutter Z-Movement         -0.0118188  1         -1.18%
Earned Run Average        -0.0253235  1         -2.53%

To evaluate the sensitivity of the pitch selection variables, each variable was altered by increasing the selection by 10%. Increasing the pitch selection by 10% had the greatest impact on splitters, followed by cutters, fastballs, changeups, and sliders. The model determined that if a pitcher were to increase the number of splitters thrown by 10%, the pitcher’s odds of getting Tommy John surgery would increase by 6.80%. Similarly, the odds of Tommy John surgery would increase by 5.99%, 3.38%, 3.33%, and 2.54% for cutters, fastballs, changeups, and sliders respectively. Although no one pitch drastically increases the odds of surgery, these changes in odds demonstrate how the pitch selection of a pitcher can have a substantial long-term effect on his health. Pitches that put more stress on the elbow and shoulder of the pitcher result in increased odds of future injury. Based on this model, teams and players should evaluate the pitch selection of a pitcher to project the longevity of his career.
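The effects reported in Table 6 are consistent with multiplying each coefficient by its increase, which can be checked with a short sketch. The coefficients and increases below are copied from Table 6; the function names are hypothetical, and the "exact" column assumes a logit link (so that the odds change by a factor of exp(coefficient × increase)), which the thesis does not state explicitly. Python is used here only for illustration.

```python
import math

# (variable, coefficient, increase) copied from Table 6; 0.10 stands for 10%.
rows = [
    ("Splitter Percentage", 0.680013, 0.10),
    ("Saves", 0.0978979, 1),
    ("Complete Games", 0.0088316, 10),
    ("Games Started", 0.0052599, 30),
    ("Earned Run Average", -0.0253235, 1),
]


def linear_effect(coef, increase):
    """Change in log-odds expressed as a percentage: coef * increase * 100."""
    return coef * increase * 100


def odds_ratio_effect(coef, increase):
    """Percentage change in odds under a logit link: (exp(coef * increase) - 1) * 100."""
    return (math.exp(coef * increase) - 1) * 100


for name, coef, inc in rows:
    print(
        f"{name}: linear {linear_effect(coef, inc):.2f}%, "
        f"exact {odds_ratio_effect(coef, inc):.2f}%"
    )
```

For small changes the two figures nearly coincide, but they drift apart as coefficient × increase grows: for 30 additional games started the linear figure is 15.78%, while the exponential form gives roughly 17%.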

Following the pitch selection variables, the basic statistics were evaluated using a sensitivity analysis. In particular, the variables strikeouts per 9 innings, complete games, and games started yielded interesting effects. As the strikeouts per nine innings were increased by 1, the odds of getting Tommy John surgery increased by 2.56%. This indicates that strikeout pitchers could be more likely to get hurt as a result of throwing more pitches, thus causing more strain. The next variable evaluated was complete games. As the total of complete games that a pitcher throws increases by 10, the odds of getting Tommy John surgery increase by 8.83%. Similar to the logic behind strikeouts, when a pitcher throws a complete game, he throws significantly more pitches, which can lead to additional stress on the arm. The games started variable also followed similar logic. When a pitcher starts an additional 30 games (roughly the number of games a pitcher would start in a season), the odds of getting Tommy John surgery increase by 15.78%. This demonstrates that after each season a pitcher has additional usage on his arm and becomes more likely to get injured in the following seasons.

Finally, the movement variables were evaluated to see their effect on the odds of having Tommy John Surgery. The movement variables did not have as strong an impact on the odds as the other variable groups but still led to some notable conclusions. As the downward movement of the splitter decreased by 1 inch, the odds of getting Tommy John surgery increased by 0.53%. To generate more downward movement on a splitter, a pitcher must throw from a slightly different arm angle. This slight change in odds could be used to demonstrate how changing the pitcher’s mechanics could potentially lead to surgery. The same can be said about the fastball x-movement and cutter z-movement. As a fastball tails to the left one additional inch, the odds of getting Tommy John surgery increase by 1.09%. Additionally, when a cutter has 1 inch less of z-movement, the odds of getting Tommy John surgery increase by 1.18%. Despite the low effects of pitch movement, these results still demonstrate how the mechanics of pitchers should be further evaluated to prevent future injuries.


CHAPTER 4: MODELING THE MEDICAL STAFFING OF A TEAM

This chapter details the current layout of medical staffing in Major League Baseball, the roles of medical staff members, and the tradeoffs between staffing various roles. The baseline data and overview of team medical staffs were provided in a personal interview with an anonymous MLB team medical consultant.

4.1 An Overview of a Major League Baseball Team Medical Staff

In professional baseball, the world of medicine is significantly different than anything else in the real world. Players receive almost instantaneous treatment, insurance costs are forgotten, and hospitals or medical groups pay teams to have first access to treat players. These standards are not that of a typical patient’s experience, and the exceptions occur because professional athletes are not typical medical patients.

When a player needs to seek medical help, the primary objectives of the team are to get the diagnosis correct the first time, minimize the player’s time to return, and keep the player satisfied.

On the other hand, hospitals and medical groups often utilize the treatment of professional athletes to bolster publicity and strengthen their brand within their region. When a professional athlete gets hurt and seeks treatment from a particular group, people naturally assume that the group is the best in its field. The combination of these team and medical objectives results in the unique setup of a Major League Baseball team medical staff.

All 30 teams in Major League Baseball have a medical staff, and every medical staff has a head team physician and a head athletic trainer. However, other than the head physician and head trainer, the composition of every team’s medical staff varies significantly. There are no proven studies on how a team should staff its medical department, and most of the staffing is done at the discretion of the head physician, general manager, and organization president, according to their philosophies. Much of this philosophy is simply based on what has worked well in the past or on unproven theory.

As a result, every team has a different medical staff composition that utilizes a variety of physicians, medical consultants, athletic trainers, strength and conditioning coaches, physical therapists, and nutritionists. The only real constraints that teams face are varying budgets (typically around $3 million annually), and differences in ideology. Each team strives to optimize their medical department and prevent future injury. Although treating injuries is the primary role of doctors, most members of a team’s medical staff are enlisted to help prevent future injuries and keep player medical costs low.

This section will explore the variety of roles within a team’s medical staff and discuss the function and benefits of each role.

4.1.1 Team Physicians

At the core of any medical practice are the physicians, and a professional baseball team’s medical staff is no different. One of the only commonalities between teams is that every team has a head physician. The head physician of a team is often in charge of staffing the medical department based on their philosophy and determining how the team allocates its budget. Head physicians come from one of two backgrounds: orthopedic surgery or family medicine. Currently, there is roughly a 50/50 split between these backgrounds among the head team physicians. The physicians with orthopedic backgrounds typically specialize in hand/wrist, shoulders, or knees.

The family medicine / internal medicine physicians can be grouped as sports medicine doctors or other. Roughly 50% of the family medicine doctors specialize in sports medicine, and the other half specialize in just about anything else.

The specialization of the team physician becomes important when considering what types of injuries the players will face. Like normal patients, baseball players can face a wide variety of illnesses and injuries, which often results in them seeing doctors outside of the team’s medical staff. If a player injures his shoulder and the head team physician specializes in knees, then the player will need to seek other medical help. This happens more frequently than one would expect, and in professional baseball only 9-10% of injuries are dealt with in-house by a team’s medical department. To combat this and treat as many players as possible, teams will often utilize multiple assistant physicians. The assistant physicians can come from a variety of backgrounds and help to supplement the knowledge of the team’s primary physician. Additionally, this leads to the conclusion that it is often better for the team’s head physician to have a generalized background.

A physician with a generalized background will be able to treat a much larger percentage of injuries and illness than a specialized physician. At the very least, a generalized team physician can be a strong first point of contact for a player and direct the player to the team’s preferred specialized doctors if they need additional treatment (similar to a regular patient’s primary care physician).

Team physicians deviate from real-world medicine most when it comes to how they are paid. Contrary to popular belief, a team’s head physician is not paid by the team; they are paid by the hospital or practice where they are staffed. The payment of team physicians can best be described as an exchange of services. A hospital or practice exchanges its medical services for the publicity and brand strength that the team provides. The typical practice will offer the team a package that exceeds $1 million to become the primary physicians for a team. This $1 million package is not all cash; it consists of a variety of services. Within the package that a practice offers a team, it offers a variety of therapists, psychologists, bloodwork, EKGs, health tests, MRIs, x-rays, and doctors. To fulfill a team’s need for quick treatment, a practice could offer a team 50 free MRIs for a season where the team’s insurance is not billed. Additionally, a practice could offer to have a sports psychologist provide weekly office hours for the team in its clubhouse, or have a massage therapist visit the team three times a week.

The hospital exchanges all of these services for the brand recognition that the team generates for it. For example, most people in Pittsburgh know that Allegheny Health is the official provider of the . Similarly, in the Philadelphia region, most people know that the Rothman/Jefferson group is the official healthcare provider of the . Health care providers are able to advertise that they treat the teams and create the perception that they are the best provider since professional athletes visit their facilities. When a doctor performs a successful surgery on a high-profile athlete, that doctor then generates interest among youth and college athletes in the area with similar injuries. The hospitals and medical groups have judged this exposure and publicity to be well worth the $1 million in services that they provide the team. As a result, the team receives top-of-the-line treatment and can focus its medical staff budget on roles other than the head and assistant physicians.

4.1.2 Team Medical Consultants

Starting in the early 2000s, some Major League Baseball teams began hiring medical consultants who were responsible for both collecting data for the team and being the team’s traveling doctors. Medical consultants are also physicians, but they are hired directly by the team. During the season, the primary role of the medical consultant is to be the traveling doctor for the team and treat any injuries or illnesses that arise while the team is on a road trip. During the off season, the medical consultants help the team evaluate the medical criteria of potential draft picks and trade options. The medical consultants give the players a very in-depth physical that results in the player getting a medical grade that evaluates the risk and reward of signing a player. Additionally, the team consultants help the rest of the medical staff to develop preventative care measures that players can work on during the off season to prevent future injuries. Although most teams in professional baseball have started using medical consultants, not all teams currently do.

Medical consultants were first used by the and the Pittsburgh Pirates.

The initial usage of these consultants was to prevent issues from occurring on road trips. Without medical consultants, the opposing team’s doctors become responsible for the illness and injury treatment of the visiting team. This can result in issues with information disclosure, privacy, and bias. Between 2005 and 2006 there were 3 instances in MLB where there were delays in getting a player’s physical done because the visiting team was dealing with another team’s doctors. In the real world a delay of a few days would not be a huge deal, but in professional baseball a newly signed player is needed to play almost immediately, and the delay costs the team money.

Team medical consultants primarily help to minimize risk. On a typical day, the team medical consultant does not do much. If all goes according to plan, the team medical consultant becomes an extra body at the field. However, when things start to go wrong and injuries and illness occur, the team consultant can become a vital part of the team. When determining how many medical consultants (if any) to hire, a team needs to properly evaluate its risk as well as the potential costs that could occur from not having a medical consultant. The best medical consultants are doctors who can serve multiple functions, such as creating an injury prevention program or offering the team data on models predicting injury. Every team’s philosophy on medical consultants differs, and currently there is no proven model determining how many a team should hire.

4.1.3 Athletic Trainers

Similar to the head team physician, the other commonality between all teams’ medical staffs is the head athletic trainer. In contrast to physicians, most athletic trainers have similar backgrounds. Athletic trainers have completed a master’s program and have typically worked at various levels in a sport before becoming a team’s head athletic trainer.

Although a hospital or medical practice will sometimes provide a team with an athletic trainer, for the most part teams are responsible for hiring their own athletic trainers and their staff. In addition to the head athletic trainer, teams typically have two or three assistant athletic trainers. A head athletic trainer will make upwards of $100,000, while an assistant athletic trainer will have a median salary of roughly $61,000 (Bender). These salaries are then factored into the team’s medical budget to determine how many athletic trainers to staff.

The benefits of having a top tier head athletic trainer are immense. In accordance with the current collective bargaining agreement in place for Major League Baseball, the athletic trainer is typically the first person who evaluates a player’s injury. The athletic trainer is then responsible for coordinating the player’s next move in seeking treatment. The athletic trainer can either help the player with a minor injury (not requiring the player to see a team physician), recommend the player see the team physician, or recommend the player see an outside physician. Given that teams want players to return as quickly as possible, having a good athletic trainer helps to make sure that the initial diagnosis is correct and the player follows the proper course of action afterward. This is similar to how someone will call their family doctor before rushing to the emergency room for an illness. Based on this reasoning, it is important for teams to invest in a top athletic training staff with a large capacity; otherwise the team will incur greater future risk, resulting in additional time lost and additional costs.

Within the team’s athletic training staff, the team can also look to hire its own physical therapists, occupational therapists, chiropractors, or massage therapists. These roles are dependent on the head team physician’s philosophy and the team’s budget, but can all potentially lead to greater player success, injury prevention, and satisfaction. When considering these roles, the median salary for a physical therapist and an occupational therapist is roughly $86,000, and chiropractors and massage therapists will both charge roughly $75 an hour. These costs are all factored into a team’s decision to hire individuals in these specific roles. The greatest benefit to the team comes when individuals are certified in more than one field. For example, an athletic trainer may also be certified as a physical therapist or occupational therapist. Individuals with multiple certifications are in high demand and will receive a slightly higher salary, but will ultimately save the team money on its budget.

4.1.4 Strength and Conditioning Coaches

The last potential members of a medical staff that a team can hire are the strength and conditioning coaches and nutritionists. The strength and conditioning coaches are responsible for helping to train the players and play a huge role in injury prevention. A proper training program that is tailored to players’ needs can help them become more durable and save the team money in the long run. In recent years, teams have also begun hiring nutritionists to help players with their diets and to form a link between dietary health and performance. Similar to the athletic trainers, it is most beneficial when a team can find a strength and conditioning coach who is also a certified dietitian to help minimize costs. The median salary for a strength and conditioning coach is $50,000, while the median salary for a sports nutritionist is $75,000. The challenge for a team’s front office then becomes determining what roles are necessary and how many of each role to hire.

4.2 Modeling the Medical Staff

To model the differences in staffing options, a theoretical model was created to analyze the tradeoffs between specific medical positions. Given the limited amount of public data that is released by teams regarding their medical departments, the model was created based on a theoretical scenario. The model uses the nine different staffing positions that are described in section 4.1 (physicians, team consultants, athletic trainers, physical therapists, occupational therapists, chiropractors, massage therapists, strength and conditioning coaches, and nutritionists) as the decision variables – the decision being the number of employees staffed at each position.

Once the decision variables were determined, the next step was to determine the parameters within the model. The first set of parameters that were determined were the costs of each position. The initial variable definitions and parameter definitions can be seen below in Table 7.

Table 7: Medical Staffing Variables and Cost Parameters

In addition to cost, another parameter was needed to represent the value added that each position brings. For this model, a theoretical parameter called “reduction of injury days” was created. This parameter is an estimate of the reduction in the number of days that players spend unable to play in games. Actual teams using a model similar to this will have much more input data that can generate a more accurate value representation for each team. For this model, an estimate for each position was used. The parameter values associated with each variable can be seen below in Table 8.

Table 8: Reduction of Injury Days Parameters

Parameter  Parameter Definition                                          Value
R1         Reduction of Injury Days Per Physician                        80
R2         Reduction of Injury Days Per Team Consultant                  70
R3         Reduction of Injury Days Per Athletic Trainer                 55
R4         Reduction of Injury Days Per Physical Therapist               40
R5         Reduction of Injury Days Per Occupational Therapist           35
R6         Reduction of Injury Days Per Chiropractor Session             0.1
R7         Reduction of Injury Days Per Massage Session                  0.1
R8         Reduction of Injury Days Per Strength and Conditioning Coach  45
R9         Reduction of Injury Days Per Nutritionist                     10

Parameters for the budget, the number of injury days without any medical staff, and the maximum acceptable number of injury days were also created. These parameters can also be altered from team to team but serve to add values to the model’s constraints. The parameter for budget (B) was estimated to be $2.5 million, the medical budget that a team is able to spend. The parameter for number of injury days was determined to be 2,000. This parameter was determined by evaluating the total number of days that players on each team spent injured during the 2018 season. The average number of injury days per team was 1,138 days. The maximum number of injury days was 1,800 and the minimum was 508 days. Based on these values, 2,000 days was selected for the parameter “N” (number of injury days without a medical staff). A value of 1,200 was determined for the parameter “M” (maximum number of injury days a team can have in a given season).

Using all of the parameters, two constraints were created in the model: budget and injury days. The two constraint equations can be seen below in Figure 18.

Figure 18: Staffing Model Constraints

Once the constraints had been completed, a model was created using Excel. The model was tested for accuracy based on the known injury total and known budget of a specific team (which will remain unnamed). Once the basis for the model had been verified, it was then altered using different staffing combinations to evaluate the tradeoffs between budget and injury totals. The verification version of the model can be seen below in Figure 19.
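The constraint equations of Figure 18 do not appear in this text. One formulation consistent with the parameter definitions above and with the totals reported in the staffing figures (total cost compared against B, and N minus the injury days reduced compared against M) is sketched here. It is a reconstruction under those assumptions, not necessarily the author's exact notation.

```latex
% Budget constraint: total staffing cost may not exceed the budget B
\sum_{i=1}^{9} C_i X_i \le B

% Injury-day constraint: injury days remaining after staffing may not exceed M
N - \sum_{i=1}^{9} R_i X_i \le M
```

Here X_i is the number of staff (or sessions) in role i, C_i is the cost per unit of role i, and R_i is the reduction of injury days per unit of role i.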

Variable  Value  Parameter  Value       Parameter  Value  Cost          Injury Days Reduced
X1        4      C1         250,000.00  R1         80     1,000,000.00  320
X2        3      C2         250,000.00  R2         75     750,000.00    225
X3        3      C3         100,000.00  R3         60     300,000.00    180
X4        1      C4         86,000.00   R4         40     86,000.00     40
X5        0      C5         86,000.00   R5         35     -             0
X6        50     C6         75.00       R6         0.1    3,750.00      5
X7        100    C7         75.00       R7         0.1    7,500.00      10
X8        3      C8         85,000.00   R8         45     255,000.00    135
X9        0      C9         75,000.00   R9         10     -             0
Total                                                     2,402,250.00  915

N = 2000    M = 1200    Total Injury Days = 1085
B = 2,500,000
Acceptable Budget? ACCEPTABLE
Acceptable Number of Injury Days? ACCEPTABLE

Verification Staffing Model: Created using actual data from a team’s staffing composition and the total number of injuries that the team faced in a given year.

Figure 19: Verification of Staffing Model
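The spreadsheet check in Figure 19 can be reproduced in a few lines. This is a minimal sketch, assuming the model simply sums cost and injury-day reduction over the nine roles and compares the totals against B and M. The parameter values are transcribed from Figure 19 (its R2 and R3 values, 75 and 60, differ slightly from those listed in Table 8), and the function name `evaluate` is illustrative.

```python
# Cost (C1..C9) and reduction-of-injury-days (R1..R9) values as listed in Figure 19.
COST = [250_000, 250_000, 100_000, 86_000, 86_000, 75, 75, 85_000, 75_000]
REDUCTION = [80, 75, 60, 40, 35, 0.1, 0.1, 45, 10]


def evaluate(staffing, budget, n_days=2000, max_days=1200):
    """Check one staffing vector X1..X9 against the budget and injury-day limits."""
    total_cost = sum(c * x for c, x in zip(COST, staffing))
    # Round to suppress floating-point noise from the 0.1-per-session parameters.
    days_reduced = round(sum(r * x for r, x in zip(REDUCTION, staffing)), 6)
    injury_days = n_days - days_reduced
    return {
        "cost": total_cost,
        "days_reduced": days_reduced,
        "injury_days": injury_days,
        "budget_ok": total_cost <= budget,
        "injury_days_ok": injury_days <= max_days,
    }


# Verification staffing from Figure 19 (X1..X9) with the $2.5 million budget.
baseline = evaluate([4, 3, 3, 1, 0, 50, 100, 3, 0], budget=2_500_000)
print(baseline)
```

The same function can be called with any other staffing vector and budget to test an alternative composition.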

4.3 Evaluating Tradeoffs within a Medical Staff

Once a baseline was established based on the verification model, various staffing combinations were then tested to analyze how the injury days and budget would be affected. The model does not consider the different risks that are associated with removing staff from certain roles. However, it can be assumed that any budget money that is saved can be used to pay off potential insurance claims due to injury. This is a tradeoff that was very difficult to quantify without teams’ private data. Therefore, the model only analyzes the tradeoffs between staffing, budget, and injury time.
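One way to see the staffing tradeoffs before looking at full scenarios is to compare each role's cost per injury day reduced. The sketch below derives that ratio from the cost and reduction values shown in the staffing figures. It is an illustration added here rather than an analysis from the thesis, and like the model itself it ignores risk and any diminishing returns from adding more staff in one role. The role labels are illustrative names.

```python
# (cost per unit, injury days reduced per unit) as listed in the staffing figures.
ROLES = {
    "physician": (250_000, 80),
    "team_consultant": (250_000, 75),
    "athletic_trainer": (100_000, 60),
    "physical_therapist": (86_000, 40),
    "occupational_therapist": (86_000, 35),
    "chiropractor_session": (75, 0.1),
    "massage_session": (75, 0.1),
    "strength_coach": (85_000, 45),
    "nutritionist": (75_000, 10),
}

# Dollars spent per injury day avoided, under the model's linear assumption.
cost_per_day = {role: cost / reduction for role, (cost, reduction) in ROLES.items()}

# Cheapest injury-day reduction first.
ranked = sorted(cost_per_day, key=cost_per_day.get)

for role in ranked:
    print(f"{role:<24} ${cost_per_day[role]:,.0f} per injury day reduced")
```

Under these estimated parameters, per-session services and athletic trainers are the least expensive sources of injury-day reduction and the nutritionist is the most expensive, which is consistent with the text's framing of the nutritionist as a player-satisfaction role.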

The first scenario that was analyzed can be seen below in Figure 20. This scenario demonstrates a team who is looking to minimize their medical budget and is willing to take significant risks. This model eliminates all three team consultants from the baseline model.

Eliminating the team consultants frees up $750,000 in capital for the team but also increases the total number of injury days by 225 days. Additionally, the team now faces the risk of a player getting injured on the road and not being able to have an immediate response. In place of the team consultants, this scenario adds one additional athletic trainer and one additional strength coach.

The addition of the strength coach and athletic trainer costs the team $185,000 and reduces the total number of injury days by 105 days. Overall, the team’s total injury days increase by 120 days from the baseline (to 1,205 days, which exceeds the 1,200-day maximum M) and the budget decreases by $565,000 to $1,837,250.

This scenario demonstrates a team that focuses a majority of its capital on preventative care (athletic trainers and strength coaches). This investment is beneficial in the sense that it can reduce some potential future injuries. The risk of this scenario is that there are no consultants to respond to road trip injuries, which could potentially lead to bias or a wrong initial diagnosis. It is known that there are currently teams in MLB that do not use team consultants, and this scenario helps to demonstrate the risk and reward that these teams incur.

Variable  Value  Parameter  Value       Parameter  Value  Cost          Injury Days Reduced
X1        4      C1         250,000.00  R1         80     1,000,000.00  320
X2        0      C2         250,000.00  R2         75     -             0
X3        4      C3         100,000.00  R3         60     400,000.00    240
X4        1      C4         86,000.00   R4         40     86,000.00     40
X5        0      C5         86,000.00   R5         35     -             0
X6        50     C6         75.00       R6         0.1    3,750.00      5
X7        100    C7         75.00       R7         0.1    7,500.00      10
X8        4      C8         85,000.00   R8         45     340,000.00    180
X9        0      C9         75,000.00   R9         10     -             0
Total                                                     1,837,250.00  795

N = 2000    M = 1200    Total Injury Days = 1205
B = 2,500,000
Acceptable Budget? ACCEPTABLE
Acceptable Number of Injury Days? UNACCEPTABLE

Case 1: Eliminated team consultants (X2). Added an additional athletic trainer and an additional strength and conditioning coach. Injury days increase by 120 days, budget decreases by $565K. Greater risk as a result of no team consultants.

Figure 20: Staffing Model Case 1

The second case that was evaluated exemplifies some of the possibilities when a team has a higher budget. As shown in Figure 21, this team has a budget of $2.75 million, which is $250,000 higher than the baseline budget. Since this team invests additional capital into its medical staffing, it is able to add an additional physical therapist, an occupational therapist, and a nutritionist. The additions of a physical therapist and occupational therapist help the team to better treat players who are recovering from injury and expedite their recovery time. Additionally, the nutritionist is a role that has recently become more popular in baseball. Although the impact of the nutritionist is not too significant, the usage of the nutritionist helps to increase player satisfaction and aligns with some team managers’ logic. The extra capital allows the team to spend on some of these luxuries and ultimately reduces the number of injury days by 85 from the baseline. This case demonstrates how the varying budgets of teams have an effect on the total number of injury days that the teams face.

Variable  Value  Parameter  Value       Parameter  Value  Cost          Injury Days Reduced
X1        4      C1         250,000.00  R1         80     1,000,000.00  320
X2        3      C2         250,000.00  R2         75     750,000.00    225
X3        3      C3         100,000.00  R3         60     300,000.00    180
X4        2      C4         86,000.00   R4         40     172,000.00    80
X5        1      C5         86,000.00   R5         35     86,000.00     35
X6        50     C6         75.00       R6         0.1    3,750.00      5
X7        100    C7         75.00       R7         0.1    7,500.00      10
X8        3      C8         85,000.00   R8         45     255,000.00    135
X9        1      C9         75,000.00   R9         10     75,000.00     10
Total                                                     2,649,250.00  1000

N = 2000    M = 1200    Total Injury Days = 1000
B = 2,750,000
Acceptable Budget? ACCEPTABLE
Acceptable Number of Injury Days? ACCEPTABLE

Case 2: Team has a budget that is $250K higher ($2.75 million total). Team is able to allocate resources towards training: an additional physical therapist, an occupational therapist, and a nutritionist. Reduces total number of injury days by 85.

Figure 21: Staffing Model Case 2

The third case works to increase the players’ satisfaction by adding additional services that the players can benefit from. This case doubles the chiropractor sessions that the players receive to 100 total (twice a week) and increases the massage sessions from twice a week to three times a week (100 total to 150 total). Additionally, this scenario adds a nutritionist, similar to case 2.

Although increasing player satisfaction does not have a drastic impact on injury days (it only decreases injury days by 20), this scenario has other impacts in recruiting and retaining players. When players are satisfied with how a team treats them, this impression travels throughout the league and other players hear of how well they are being treated. Although this does not directly impact the medical staffing, it does help with a team’s performance. The effects of this case on the medical staffing department can be seen in Figure 22.

Variable  Value  Parameter  Value       Parameter  Value  Cost          Injury Days Reduced
X1        4      C1         250,000.00  R1         80     1,000,000.00  320
X2        3      C2         250,000.00  R2         75     750,000.00    225
X3        3      C3         100,000.00  R3         60     300,000.00    180
X4        1      C4         86,000.00   R4         40     86,000.00     40
X5        0      C5         86,000.00   R5         35     -             0
X6        100    C6         75.00       R6         0.1    7,500.00      10
X7        150    C7         75.00       R7         0.1    11,250.00     15
X8        3      C8         85,000.00   R8         45     255,000.00    135
X9        1      C9         75,000.00   R9         10     75,000.00     10
Total                                                     2,484,750.00  935

N = 2000    M = 1200    Total Injury Days = 1065
B = 2,500,000
Acceptable Budget? ACCEPTABLE
Acceptable Number of Injury Days? ACCEPTABLE

Case 3: Allocation of additional budget towards player satisfaction. Chiropractor sessions doubled to twice a week, massage sessions moved from twice a week to three times a week. Team adds nutritionist. Only a 20-day reduction in injury days, and an increase in cost.

Figure 22: Staffing Model Case 3

The final scenario that was analyzed (case 4) gave the team a $3 million budget and attempted to minimize the total injury days as much as possible. This scenario simulates a team with a significant amount of capital looking to be cutting edge and significantly beat the league average for total injury days. In this scenario, the team added two additional athletic trainers, an occupational therapist, and three additional strength and conditioning coaches. This scenario also attempted to increase player satisfaction by increasing the number of chiropractor sessions to two a week and the number of massage sessions to three a week.

The objective of this case was to utilize aspects that worked well in the previous cases to generate a more significant result. Increasing the athletic trainers from three to five increases the team’s capacity to treat minor injuries and also increases the team’s ability to personalize preventative care. The same logic follows in the addition of three strength and conditioning coaches. A greater number of strength and conditioning coaches allows the team to personalize workouts for players and specify injury prevention work. Additionally, this model increases player satisfaction by adding the occupational therapist and increasing massage and chiropractor sessions.

This demonstrates how the team’s investment will not only result in medical improvements but can also potentially increase future performance. Overall, this case decreases the total number of injury days by 300 from the baseline. As shown below in Figure 23, this case demonstrates that if a team is willing to invest in its medical staffing, it will see results in decreased injury.

Variable  Value  Parameter  Value       Parameter  Value  Cost          Injury Days Reduced
X1        4      C1         250,000.00  R1         80     1,000,000.00  320
X2        3      C2         250,000.00  R2         75     750,000.00    225
X3        5      C3         100,000.00  R3         60     500,000.00    300
X4        1      C4         86,000.00   R4         40     86,000.00     40
X5        1      C5         86,000.00   R5         35     86,000.00     35
X6        100    C6         75.00       R6         0.1    7,500.00      10
X7        150    C7         75.00       R7         0.1    11,250.00     15
X8        6      C8         85,000.00   R8         45     510,000.00    270
X9        0      C9         75,000.00   R9         10     -             0
Total                                                     2,950,750.00  1215

N = 2000    M = 1200    Total Injury Days = 785
B = 3,000,000
Acceptable Budget? ACCEPTABLE
Acceptable Number of Injury Days? ACCEPTABLE

Case 4: This case operates with a $3 million budget and seeks to minimize total days injured. This case adds two additional athletic trainers and three additional strength and conditioning coaches. This case also doubles the chiropractor sessions and increases massage sessions from two times a week to three times a week.

Figure 23: Staffing Model Case 4

Overall, the various scenarios that are compared in this chapter demonstrate the varying tradeoffs between specific staffing models. Although this model is not perfect, it highlights how a larger investment in the medical staff ultimately leads to decreased injuries. The scenarios demonstrate how a team’s ideology can vary based on minimizing budget, maximizing preventative care, maximizing player satisfaction, or a combination of all ideologies. There are tradeoffs between each scenario, and all scenarios are practiced by various teams in the major leagues. Currently there is no true science behind which scenario is better; it comes down to what teams want the most. However, analyzing the tradeoffs between each scenario allows teams to become more educated on their decisions and potentially leads to greater success.
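The four cases can be compared side by side against the baseline. The sketch below transcribes the staffing vectors and budgets from Figures 19 through 23 and reports each case's change in cost and injury days relative to the baseline. The dictionary layout and function name are illustrative choices, and the parameters are the estimated values shown in the figures, not team data.

```python
# Cost (C1..C9) and reduction-of-injury-days (R1..R9) values as listed in the figures.
COST = [250_000, 250_000, 100_000, 86_000, 86_000, 75, 75, 85_000, 75_000]
REDUCTION = [80, 75, 60, 40, 35, 0.1, 0.1, 45, 10]
N_DAYS, MAX_DAYS = 2000, 1200

# (staffing X1..X9, budget B) transcribed from Figures 19-23.
SCENARIOS = {
    "baseline": ([4, 3, 3, 1, 0, 50, 100, 3, 0], 2_500_000),
    "case 1": ([4, 0, 4, 1, 0, 50, 100, 4, 0], 2_500_000),
    "case 2": ([4, 3, 3, 2, 1, 50, 100, 3, 1], 2_750_000),
    "case 3": ([4, 3, 3, 1, 0, 100, 150, 3, 1], 2_500_000),
    "case 4": ([4, 3, 5, 1, 1, 100, 150, 6, 0], 3_000_000),
}


def summarize(staffing, budget):
    """Return (total cost, total injury days, whether both constraints hold)."""
    cost = sum(c * x for c, x in zip(COST, staffing))
    # Round to suppress floating-point noise from the 0.1-per-session parameters.
    reduced = round(sum(r * x for r, x in zip(REDUCTION, staffing)), 6)
    injury_days = N_DAYS - reduced
    return cost, injury_days, cost <= budget and injury_days <= MAX_DAYS


base_cost, base_days, _ = summarize(*SCENARIOS["baseline"])

results = {}
for name, (staffing, budget) in SCENARIOS.items():
    cost, days, feasible = summarize(staffing, budget)
    results[name] = {
        "cost": cost,
        "injury_days": days,
        "feasible": feasible,
        "delta_cost": cost - base_cost,
        "delta_days": days - base_days,
    }
    print(f"{name:<9} cost ${cost:>9,}  injury days {days:>6.0f}  "
          f"feasible={feasible}  vs baseline: ${cost - base_cost:+,} / {days - base_days:+.0f} days")
```

Dividing the change in cost by the change in injury days gives a rough price per injury day for each case, which makes the budget-versus-injury tradeoff discussed in this chapter directly comparable across scenarios.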

CHAPTER 5: CONCLUSION

This chapter summarizes the research findings and discusses potential avenues for future research.

5.1 Summary of Findings

The game of baseball is ever changing, and every team is always trying to find “the next big thing” that will give them a competitive advantage. Since data analytics first began in baseball, the field has vastly expanded from one or two salaried employees to now entire departments (McKinsey & Company 2018). Similarly, teams have begun expanding their medical research and medical departments over recent years. Medical findings can result in teams saving millions of dollars which can then be allocated to other departments. At the forefront of medical research are predictive models. Teams have begun creating models to indicate the likelihood of players getting injured, and using these models to evaluate potential prospects and trades.

In addition to the predictive models, teams have also begun evaluating how they can optimally staff their medical departments. While there is no proven staffing model that works, there are tradeoffs between each model. As the game of baseball continues to change, the next big competitive advantages could be injury prediction and staffing optimization. To help teams find the next big advantage, the research in this paper created models to predict future injury, and evaluate medical staffing so that teams and players can increase their future performance.

The research started by analyzing a variety of trends among recent injuries. The cost of injuries, the total number of injuries, and the total time that players spent injured were all evaluated. It was shown that nearly all of these figures had increased in recent years. A deeper analysis further explored the specific types of injuries that were occurring and looked at the trends in the most common injuries: shoulder, elbow, hamstring, knee, and back injuries. At this point in the research it became apparent that the most common injuries occurred in the players’ shoulders and elbows.

Based on this information, previous studies were then evaluated to get a sense of what other models had evaluated. Research papers on biomechanical factors, statistical factors, and model selection were all analyzed. After closely analyzing past research it became apparent that the pitcher position had the most available data. The available data resulted in the research of this paper pivoting to focus on pitchers.

In baseball, the pitcher undergoes a violent motion that puts various stresses on his upper extremities. Pitchers are among the most commonly injured players in baseball for this reason (Kagan 2009). Given the data resources, the model for this research focused on analyzing the odds of injury specifically in pitchers.

The model evaluated the likelihood of a pitcher getting Tommy John surgery – one of the most common injuries in baseball. The model used over 2,500 observations and initially had 49 variables. The variables consisted of basic pitching statistics, pitch selection percentages and velocities, and pitch movements (indicative of mechanics). Some variables were eliminated based on a lack of data and others were eliminated using backwards selection in RStudio. Once all of the variables had been determined, a Generalized Linear Model was created that predicted the odds of a pitcher needing Tommy John surgery. The model showed that pitch selection factored heavily into the odds of a pitcher getting hurt. The model also showed that the number of appearances and the duration of a pitcher’s appearances can contribute to injury. In the future, the model could be used when brokering deals between teams to evaluate how likely certain pitchers are to get injured – a direct factor in the player’s worth.

Following the creation of the predictive model for injuries, the medical staffs of teams were evaluated. Currently, every team is staffed differently, and there is no exact science behind how a team chooses to staff its medical department. A team's medical department can contain any combination of physicians, team consultants, athletic trainers, physical therapists, occupational therapists, chiropractors, massage therapists, strength coaches, and nutritionists.

Using one team's known staffing combination, together with its total injuries and budget for a given season, a baseline model was created. Four different scenarios were then tested and compared to the baseline to see how teams can alter their medical staff and what tradeoffs they will face. The scenarios showed potential staffing combinations for teams with a low budget, high injury prevention measures, high player satisfaction, or minimal injury time. Together they demonstrated how teams must consider their budget allocation and the associated risks when staffing their medical department. A low medical budget allows a team to spend money elsewhere, but it also leaves the team at greater risk of injury, which negatively affects performance. The scenarios also show that teams must consider player satisfaction when staffing their medical department, and that satisfaction can have indirect effects on a team's success.
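The baseline comparison described above can be sketched as follows. This is a minimal illustration only: the `Scenario` fields, the helper `compare_to_baseline`, and every number below are hypothetical and are not taken from the team data or the scenarios evaluated in this research.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    budget: float       # annual medical staff cost, in dollars (hypothetical)
    injury_days: float  # total player days lost to injury (hypothetical)


def compare_to_baseline(baseline, scenario):
    """Return the change in budget and injury time relative to the baseline.

    Negative budget_change means the scenario is cheaper than the baseline;
    negative injury_days_change means players lose fewer days to injury.
    """
    return {
        "name": scenario.name,
        "budget_change": scenario.budget - baseline.budget,
        "injury_days_change": scenario.injury_days - baseline.injury_days,
    }


# Made-up figures purely to show the shape of the comparison.
baseline = Scenario("baseline", budget=1_000_000.0, injury_days=900.0)
scenarios = [
    Scenario("low budget", budget=700_000.0, injury_days=1_100.0),
    Scenario("minimal injury time", budget=1_400_000.0, injury_days=650.0),
]

for s in scenarios:
    print(compare_to_baseline(baseline, s))
```

Expressing each scenario as a difference from the baseline makes the tradeoff explicit: a scenario that lowers the budget typically raises injury time, and vice versa.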

Overall, this research follows the trend towards analytics in professional baseball. If teams are able to use the injury prediction model and evaluate the various staffing scenarios presented, they can save money. By saving money, teams will be able to allocate capital to other resources and have a greater chance of success. The next competitive advantage could lie within medical departments, and as soon as one team has success, every other team will soon follow.

5.2 Future Research

The research in this paper is just the tip of the iceberg of what can be done to optimize teams' medical departments and predict future injuries. Much of the analysis in this paper was limited by a lack of available data. While many basic statistics are available, much of the data that teams keep is private and unavailable to the public. A team that wanted to develop this research further could enhance both the predictive model and the staffing scenarios.

To enhance the predictive model, a team could examine more closely some of the mechanical predictors that lead to injury. The current research uses only pitch movement as an indicator of mechanics, but an actual team could use real measurements taken from each player. The measurements and biomechanics of each player would add variables to the model and could help to increase its accuracy.

The parameters of the medical staffing model, namely costs and the reduction of injury time, were largely based on estimates drawn from a small portion of one team's data. A team with a full set of data would be able to make these parameters much more accurate. This would give the team a better sense of the tradeoffs between different scenarios and would further aid its decision-making process.

Injury prevention and optimized medical staffing can save teams millions of dollars each season. In the near future, teams will have models that can estimate the likelihood of almost every injury type imaginable, and there will eventually be a clear science behind how to staff a medical department. The game of baseball will continue to shift towards analytics, and teams will be optimizing every kind of performance, not just performance on the field.

CITATIONS

“Baseball Disabled List Transactions.” Pro Sports Transactions, Pro Sports Transactions, www.prosportstransactions.com/baseball/SearchResults.php?Player=&Team=&BeginDate=2002-02001&EndDate=2019-02-13&DLChkBx=yes&submit=Search

Business Radio. “Analytics in Baseball: How More Data Is Changing the Game.” Knowledge@Wharton, 21 Feb. 2019, knowledge.wharton.upenn.edu/article/analytics-in-baseball/.

Davis, Kari. “Predicting Injuries in MLB Pitchers.” Medium, Towards Data Science, 11 Mar. 2019, towardsdatascience.com/predicting-injuries-in-mlb-pitchers-c2e133deca39.

Kagan, David. “The Anatomy of a Pitch: Doing Physics with PITCHf/x Data.” The Physics Teacher, vol. 47, Oct. 2009, pp. 412–416, http://baseball.physics.illinois.edu/KaganPitchfx.pdf.

“Major League Baseball Transactions.” Mlb.com, Major League Baseball, mlb.mlb.com/mlb/transactions/?tcid=mm_mlb_players#month=7&year=2018

“Major League Leaderboards » 2014 » Pitchers » Pitch Type Statistics: FanGraphs Baseball.” Major League Leaderboards » 2014 » Pitchers » Pitch Type Statistics | FanGraphs Baseball, FanGraphs, www.fangraphs.com/leaders.aspx?pos=all&stats=pit&lg=all&qual=10&type=4&season=2014&month=0&season1=2014&ind=0&team=0&rost=0&age=0&filter=&players=0.

McCarthy, Michelle. “Researchers Team up with Major League Baseball to Predict Injuries before They Occur.” USC News, 30 Mar. 2018, news.usc.edu/139033/predict-baseball-injuries-before-they-occur-usc-physical-therapy/.

McKinsey & Company. “A View from the Front Lines of Baseball's Data-Analytics Revolution.” McKinsey & Company, July 2018, www.mckinsey.com/business-functions/organization/our-insights/a-view-from-the-front-lines-of--data-analytics-revolution.

“MLB Stats, Scores, History, & Records.” Baseball, www.baseball-reference.com/.

“Predicting Pitcher Injuries.” The Harvard Sports Analysis Collective, 4 Jan. 2016, harvardsportsanalysis.org/2016/01/predicting-pitcher-injuries/.

“Professor Lori Michener and Her Team Strive to Reduce Elbow Injuries in Baseball.” USC Division of Biokinesiology and Physical Therapy, 1 Oct. 2019, pt.usc.edu/2019/02/12/usc-professor-lori-michener-strives-to-reduce-ucl-injuries-in-baseball/.

“Tommy John Surgery List (@MLBPlayerAnalys).” Google Sheets, Google, docs.google.com/spreadsheets/d/1gQujXQQGOVNaiuwSN680Hq-FDVsCwvN-3AazykOBON0/edit#gid=0.

“2017 Disabled List Information.” RotoGraphs Fantasy Baseball, Fan Graphs, fantasy.fangraphs.com/2017-disabled-list-information/.

ACADEMIC VITA