Runs Created vs. Runs Scored in Steroid and Contemporary Eras

April 20, 2016

Conor Bruen, Hannah Corderman, Sam Gomez, Marcus Jones

Introduction We investigated the relationship between runs created and runs scored, comparing the Steroid Era (1994-2004) with the Contemporary Era (2005-2014). For the Steroid Era, we hypothesized that runs created would differ significantly from runs scored because of the effects of steroids: players on steroids would create more offensive power opportunities, which would lead to more runs. For the Contemporary (or post-Steroid) Era, we hypothesized that the correlation would be stronger, meaning that runs created would track runs scored more accurately.

Methods We used the Teams data frame from the Lahman database, together with the dplyr and ggplot2 packages, to examine the correlation between runs created and runs scored. With dplyr we constructed the necessary variables, restricted the data to each era, and performed the calculations; with ggplot2 we visualized the results. We then fit a linear regression of runs scored on runs created for each era and compared the two fits.

Findings In the Steroid Era, we found a strong correlation between runs created and runs scored: the linear regression had an adjusted R-squared of 0.9189. In the Contemporary Era the correlation was also strong, with an adjusted R-squared of 0.9193.
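The workflow described in the Methods can be sketched roughly as follows. This is a minimal sketch, not the code behind the report: the report does not state which Runs Created formula was used, so the stolen-base version is assumed here, and the helper names `add_runs_created` and `fit_era` are hypothetical. The column names are those shown in the Teams preview.

```r
library(Lahman)
library(dplyr)

# Add singles, total bases, and runs created to a team-season data frame.
# ASSUMPTION: stolen-base version of Runs Created,
#   RC = (H + BB - CS) * (TB + 0.55 * SB) / (AB + BB).
add_runs_created <- function(teams) {
  teams %>%
    mutate(
      X1B = H - X2B - X3B - HR,
      TB  = X1B + 2 * X2B + 3 * X3B + 4 * HR,
      RC  = (H + BB - CS) * (TB + 0.55 * SB) / (AB + BB)
    )
}

# Restrict to one era and regress runs scored (R) on runs created (RC).
fit_era <- function(teams, first_year, last_year) {
  era <- teams %>%
    filter(yearID >= first_year, yearID <= last_year) %>%
    add_runs_created()
  lm(R ~ RC, data = era)
}

# Contemporary Era as defined in the Introduction.
contemporary_fit <- fit_era(Teams, 2005, 2014)
print(summary(contemporary_fit)$adj.r.squared)
```

Calling `fit_era()` once per era keeps the two regressions on identical variable definitions, so any difference between the fits comes from the data rather than from the way RC was computed.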

Discussion Based on our findings, we no longer support our original hypothesis that runs created would differ significantly from runs scored during the Steroid Era. In fact, the correlation between runs created and runs scored was strong between 1994 and 2004. It was similarly strong between 2005 and 2014, which coincides with our original assumptions. Given these results, we conclude that the runs created statistic accurately represents the number of runs a team scores.

##   yearID lgID teamID franchID divID Rank  G Ghome  W  L DivWin WCWin LgWin
## 1   1871   NA    BS1      BNA  <NA>    3 31    NA 20 10   <NA>  <NA>     N
## 2   1871   NA    CH1      CNA  <NA>    2 28    NA 19  9   <NA>  <NA>     N
## 3   1871   NA    CL1      CFC  <NA>    8 29    NA 10 19   <NA>  <NA>     N
## 4   1871   NA    FW1      KEK  <NA>    7 19    NA  7 12   <NA>  <NA>     N
## 5   1871   NA    NY2      NNA  <NA>    5 33    NA 16 17   <NA>  <NA>     N
## 6   1871   NA    PH1      PNA  <NA>    1 28    NA 21  7   <NA>  <NA>     Y
##   WSWin   R   AB   H X2B X3B HR BB SO SB CS HBP SF  RA  ER  ERA CG SHO SV
## 1  <NA> 401 1372 426  70  37  3 60 19 73 NA  NA NA 303 109 3.55 22   1  3
## 2  <NA> 302 1196 323  52  21 10 60 22 69 NA  NA NA 241  77 2.76 25   0  1
## 3  <NA> 249 1186 328  35  40  7 26 25 18 NA  NA NA 341 116 4.11 23   0  0
## 4  <NA> 137  746 178  19   8  2 33  9 16 NA  NA NA 243  97 5.17 19   1  0
## 5  <NA> 302 1404 403  43  21  1 33 15 46 NA  NA NA 313 121 3.72 32   1  0
## 6  <NA> 376 1281 410  66  27  9 46 23 56 NA  NA NA 266 137 4.95 27   0  0
##   IPouts  HA HRA BBA SOA   E DP   FP                    name
## 1    828 367   2  42  23 225 NA 0.83    Boston Red Stockings
## 2    753 308   6  28  22 218 NA 0.82 Chicago White Stockings
## 3    762 346  13  53  34 223 NA 0.81  Cleveland Forest Citys
## 4    507 261   5  21  17 163 NA 0.80    Fort Wayne Kekiongas
## 5    879 373   7  42  22 227 NA 0.83        New York Mutuals
## 6    747 329   3  53  16 194 NA 0.84  Philadelphia Athletics
##                           park attendance BPF PPF teamIDBR teamIDlahman45
## 1          South End Grounds I         NA 103  98      BOS            BS1
## 2      Union Base-Ball Grounds         NA 104 102      CHI            CH1
## 3 National Association Grounds         NA  96 100      CLE            CL1
## 4               Hamilton Field         NA 101 107      KEK            FW1
## 5     Union Grounds (Brooklyn)         NA  90  88      NYU            NY2
## 6     Jefferson Street Grounds         NA 102  98      ATH            PH1
##   teamIDretro
## 1         BS1
## 2         CH1
## 3         CL1
## 4         FW1
## 5         NY2
## 6         PH1

## Source: local data frame [6 x 8]
## Groups: yearID [1]
##
##   yearID teamID    BA    PA   X1B    TB    RC     R
##    (int) (fctr) (dbl) (int) (int) (dbl) (dbl) (int)
## 1   2000    ANA 0.280  6326   995  2659   926   864
## 2   2000    ARI 0.265  6179   961  2373   783   792
## 3   2000    ATL 0.271  6188  1011  2353   812   810
## 4   2000    BAL 0.272  6210   992  2414   814   794
## 5   2000    BOS 0.267  6331   988  2384   804   792
## 6   2000    CHA 0.286  6351  1041  2654   944   978

##
## Call:
## lm(formula = R ~ RC, data = tm.)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -67.58 -13.60  -0.29  14.97  60.62
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.72956   18.74065   0.733    0.465
## RC           0.97431    0.02371  41.096   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.36 on 148 degrees of freedom
## Multiple R-squared:  0.9194, Adjusted R-squared:  0.9189
## F-statistic:  1689 on 1 and 148 DF,  p-value: < 2.2e-16
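To make the Steroid Era coefficients concrete, the sketch below applies the fitted line to the six 2000 team rows from the data preview. The coefficients and the RC and R values are copied from the output in this report; RC is rounded as printed, so the residuals are approximate. The variable names are illustrative only.

```r
# Coefficients copied from the Steroid Era regression summary.
intercept <- 13.72956
slope <- 0.97431

# The six 2000 team rows shown in the data preview (RC rounded as printed).
shown_2000 <- data.frame(
  teamID = c("ANA", "ARI", "ATL", "BAL", "BOS", "CHA"),
  RC = c(926, 783, 812, 814, 804, 944),
  R  = c(864, 792, 810, 794, 792, 978)
)

# Predicted runs scored from the fitted line, and observed minus predicted.
shown_2000$predicted <- intercept + slope * shown_2000$RC
shown_2000$residual <- shown_2000$R - shown_2000$predicted

print(shown_2000)
```

Because the slope is close to 1 and the intercept is small relative to a season's run total, the predicted values stay near RC itself, which is what the Discussion means by runs created representing runs scored.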

## Source: local data frame [6 x 8]
## Groups: yearID [1]
##
##   yearID teamID    BA    PA   X1B    TB    RC     R
##    (int) (fctr) (dbl) (int) (int) (dbl) (dbl) (int)
## 1   2005    ARI 0.256  6256   910  2337   771   696
## 2   2005    ATL 0.265  6111   924  2387   792   769
## 3   2005    BAL 0.269  6094   980  2409   778   729
## 4   2005    BOS 0.281  6389  1020  2557   913   910
## 5   2005    CHA 0.262  6092   974  2349   739   741
## 6   2005    CHN 0.270  6090   966  2457   783   703
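The Methods mention viewing the results with ggplot, but no plot survives in this output. As a hedged illustration of what such a view might look like, the sketch below plots only the six 2005 rows shown in the preview, not the full era data. The dashed reference line marks where runs scored would equal runs created; the object names are hypothetical.

```r
library(ggplot2)

# The six 2005 team rows shown in the data preview (RC rounded as printed).
shown_2005 <- data.frame(
  teamID = c("ARI", "ATL", "BAL", "BOS", "CHA", "CHN"),
  RC = c(771, 792, 778, 913, 739, 783),
  R  = c(696, 769, 729, 910, 741, 703)
)

# Scatter plot of runs scored against runs created, labelled by team.
rc_plot <- ggplot(shown_2005, aes(x = RC, y = R, label = teamID)) +
  geom_point() +
  geom_text(vjust = -0.8) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(x = "Runs created (RC)", y = "Runs scored (R)")

print(rc_plot)
```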

##
## Call:
## lm(formula = R ~ RC, data = tm.batting)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -63.774 -14.177  -0.903  13.298  78.709
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 42.37129   11.75707   3.604 0.000367 ***
## RC           0.92516    0.01584  58.389  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.46 on 298 degrees of freedom
## Multiple R-squared:  0.9196, Adjusted R-squared:  0.9193
## F-statistic:  3409 on 1 and 298 DF,  p-value: < 2.2e-16
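As a consistency check, the adjusted R-squared values quoted in the Findings can be recovered from the multiple R-squared and the residual degrees of freedom in the two regression summaries. With a single predictor the residual degrees of freedom equal n - 2, so n = 150 for the Steroid Era fit and n = 300 for the Contemporary Era fit. The symbols n (number of team-seasons) and p (number of predictors) are notation introduced here, not taken from the report.

```latex
\begin{align*}
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}, \qquad p = 1 \\
\text{Steroid Era:}\quad R^2_{\text{adj}} &= 1 - (1 - 0.9194)\,\frac{149}{148} \approx 0.9189 \\
\text{Contemporary Era:}\quad R^2_{\text{adj}} &= 1 - (1 - 0.9196)\,\frac{299}{298} \approx 0.9193
\end{align*}
```

With only one predictor and this many observations, the adjustment is small, so the multiple and adjusted values differ only in the fourth decimal place.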