Runs Created vs. Runs Scored in Steroid and Contemporary Eras
April 20, 2016
Conor Bruen, Hannah Corderman, Sam Gomez, Marcus Jones
Introduction
We investigated the relationship between runs created and runs scored, comparing the Steroid Era (1994-2004) with the Contemporary Era (2005-2014). For the Steroid Era, we hypothesized that runs created and runs scored would differ statistically because of the effects of steroids: players on steroids would hit for more power, which would lead to more total bases. For the Contemporary (post-Steroid) Era, we hypothesized that the correlation would be stronger, meaning that runs created would predict runs scored more accurately.
Methods
Using team data from the Lahman database in R, we relied on dplyr and ggplot2 to examine the correlation between runs created and runs scored. dplyr let us construct the necessary variables, restrict the data to each era, and carry out the calculations, which we then viewed graphically with ggplot2. We fit a linear regression of runs scored on runs created for each era and compared the results between the Steroid Era and the Contemporary Era.
Findings
In the Steroid Era, we found a strong correlation between runs created and runs scored, evidenced by an adjusted R-squared of 0.9189 (slope 0.974). In the Contemporary Era, the correlation was also strong: the linear regression gave an adjusted R-squared of 0.9193 (slope 0.925). The Steroid Era fit has 148 residual degrees of freedom, i.e. 150 team-seasons, and the rows displayed in the output begin in 2000, so that sample appears to cover fewer seasons than the full 1994-2004 span. The Contemporary Era fit has 298 residual degrees of freedom, i.e. 300 team-seasons.
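The workflow described in Methods can be sketched roughly as follows. This is a minimal sketch, not the code behind the report: the report does not state which runs created formula it used, so the basic Bill James version, (H + BB) × TB / (AB + BB), is assumed here, and the values it produces may differ from the RC column in the output. The function names `runs_created`, `add_runs_created`, and `fit_era` are hypothetical, and base R is used in place of dplyr to keep the sketch free of dependencies. The input is assumed to have the column names of the `Teams` table in the Lahman package.

```r
# Basic runs created. ASSUMPTION: the report does not say which
# variant of the statistic it used; this is the simplest one.
runs_created <- function(H, BB, TB, AB) {
  (H + BB) * TB / (AB + BB)
}

# Add total bases (TB) and runs created (RC) to a team-season data
# frame with Lahman-style columns H, X2B, X3B, HR, BB, AB.
add_runs_created <- function(teams) {
  teams$TB <- teams$H + teams$X2B + 2 * teams$X3B + 3 * teams$HR
  teams$RC <- runs_created(teams$H, teams$BB, teams$TB, teams$AB)
  teams
}

# Regress runs scored (R) on runs created (RC) for the seasons
# from `from` to `to`, inclusive.
fit_era <- function(teams, from, to) {
  era <- teams[teams$yearID >= from & teams$yearID <= to, ]
  lm(R ~ RC, data = add_runs_created(era))
}

# Usage, run only if the Lahman package is installed.
if (requireNamespace("Lahman", quietly = TRUE)) {
  teams <- Lahman::Teams
  steroid <- fit_era(teams, 1994, 2004)
  contemporary <- fit_era(teams, 2005, 2014)
  print(summary(steroid)$adj.r.squared)
  print(summary(contemporary)$adj.r.squared)
}
```

Because both the formula and the season range are assumptions, the adjusted R-squared values printed by this sketch should not be expected to reproduce the figures reported in Findings exactly.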
Discussion
Based on our findings, we no longer support our original hypothesis that runs created and runs scored would differ statistically during the Steroid Era. In fact, we found quite a strong correlation between runs created and runs scored for that era. We found a similarly strong correlation between 2005 and 2014, which coincides with our original assumptions. The adjusted R-squared values for the two eras are nearly identical (0.9189 vs. 0.9193). Given these results, we feel that the runs created statistic accurately represents the number of runs scored by a team.
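Comparing two adjusted R-squared values by eye does not by itself test whether the eras differ. One way to make the comparison formal, offered here as a sketch and not as the analysis the report performed, is to pool both eras and test whether era-specific intercepts and slopes improve the fit. The data frame below contains only the twelve team-seasons printed in the report output (six from 2000, six from 2005), which is far too few for a real conclusion; the variable names `shown`, `pooled`, `by_era`, and `comparison` and the era labels are hypothetical.

```r
# The RC and R values printed by head() in the report output.
# ASSUMPTION: used purely for illustration; a real analysis would
# use every team-season in each era.
shown <- data.frame(
  era = factor(rep(c("steroid", "contemporary"), each = 6),
               levels = c("steroid", "contemporary")),
  RC  = c(926, 783, 812, 814, 804, 944,
          771, 792, 778, 913, 739, 783),
  R   = c(864, 792, 810, 794, 792, 978,
          696, 769, 729, 910, 741, 703)
)

# Pooled model: a single regression line for both eras.
pooled <- lm(R ~ RC, data = shown)

# Interaction model: each era gets its own intercept and slope.
by_era <- lm(R ~ RC * era, data = shown)

# F-test of whether the era-specific terms improve on the pooled fit.
comparison <- anova(pooled, by_era)
print(comparison)
```

A large p-value in the second row of the table would be consistent with the conclusion in the Discussion that the relationship between runs created and runs scored is similar in both eras, while a small one would suggest the lines differ.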
##   yearID lgID teamID franchID divID Rank G Ghome W L DivWin WCWin LgWin
## 1   1871   NA    BS1      BNA
## Source: local data frame [6 x 8]
## Groups: yearID [1]
##
##   yearID teamID    BA    PA   X1B    TB    RC     R
##    (int) (fctr) (dbl) (int) (int) (dbl) (dbl) (int)
## 1   2000    ANA 0.280  6326   995  2659   926   864
## 2   2000    ARI 0.265  6179   961  2373   783   792
## 3   2000    ATL 0.271  6188  1011  2353   812   810
## 4   2000    BAL 0.272  6210   992  2414   814   794
## 5   2000    BOS 0.267  6331   988  2384   804   792
## 6   2000    CHA 0.286  6351  1041  2654   944   978
##
## Call:
## lm(formula = R ~ RC, data = tm.batting)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -67.58 -13.60  -0.29  14.97  60.62
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.72956   18.74065   0.733    0.465
## RC           0.97431    0.02371  41.096   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 24.36 on 148 degrees of freedom
## Multiple R-squared:  0.9194, Adjusted R-squared:  0.9189
## F-statistic:  1689 on 1 and 148 DF,  p-value: < 2.2e-16

##   yearID lgID teamID franchID divID Rank G Ghome W L DivWin WCWin LgWin
## 1   1871   NA    BS1      BNA
## Source: local data frame [6 x 8]
## Groups: yearID [1]
##
##   yearID teamID    BA    PA   X1B    TB    RC     R
##    (int) (fctr) (dbl) (int) (int) (dbl) (dbl) (int)
## 1   2005    ARI 0.256  6256   910  2337   771   696
## 2   2005    ATL 0.265  6111   924  2387   792   769
## 3   2005    BAL 0.269  6094   980  2409   778   729
## 4   2005    BOS 0.281  6389  1020  2557   913   910
## 5   2005    CHA 0.262  6092   974  2349   739   741
## 6   2005    CHN 0.270  6090   966  2457   783   703
##
## Call:
## lm(formula = R ~ RC, data = tm.batting)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -63.774 -14.177  -0.903  13.298  78.709
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 42.37129   11.75707   3.604 0.000367 ***
## RC           0.92516    0.01584  58.389  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.46 on 298 degrees of freedom
## Multiple R-squared:  0.9196, Adjusted R-squared:  0.9193
## F-statistic:  3409 on 1 and 298 DF,  p-value: < 2.2e-16