Problem Set Six - Retrosheet
Problem 1
For the years 1987-1988, which AL and which NL umpire had the most games at home plate?
# Packages used throughout this problem set
library(retrosheet)  # getRetrosheet()
library(dplyr)       # %>%, group_by(), summarize(), arrange(), mutate()
library(lubridate)   # ymd(), year(), month()
library(ggplot2)     # ggplot()
rs1987 <- getRetrosheet("game", 1987)
rs1988 <- getRetrosheet("game", 1988)
rs87_88 <- rbind(rs1987, rs1988)
# Home plate umpire (rows) by league of the home team (columns)
table(rs87_88$UmpHNm, rs87_88$HmTmLg)
##
##                     AL NL
##   Al Clark          75  0
##   Bill Hohn          0  7
##   Bill Williams      0 20
##   Bob Davidson       0 73
##   Bob Engel          0 71
##   Bruce Froemming    0 73
##   Charlie Williams   0 72
##   Chuck Meriwether   6  0
##   Dale Ford         69  0
##   Dale Scott        56  0
##   Dan Morrison      75  0
##   Dana DeMuth        0 64
##   Dave Pallone       0 71
##   Dave Phillips     70  0
##   Derryl Cousins    74  0
##   Dick Stello        0 36
##   Don Denkinger     70  0
##   Doug Harvey        0 75
##   Drew Coble        68  0
##   Durwood Merrill   73  0
##   Dutch Rennert      0 74
##   Ed Montague        0 73
##   Eric Gregg         0 71
##   Frank Pulli        0 67
##   Fred Brocklander   0 60
##   Gary Darling       0 50
##   Gerry Davis        0 75
##   Greg Bonin         0 46
##   Greg Kosc         76  0
##   Harry Wendelstedt  0 67
##   Jerry Crawford     0 74
##   Jim Evans         76  0
##   Jim Joyce         26  0
##   Jim McKean        69  0
##   Jim Quick          0 72
##   Joe Brinkman      74  0
##   Joe West           0 72
##   John Hirschbeck   72  0
##   John Kibler        0 75
##   John McSherry      0 74
##   John Shulock      73  0
##   Ken Kaiser        66  0
##   Larry Barnett     75  0
##   Larry McCoy       75  0
##   Larry Poncino      0 29
##   Larry Young       63  0
##   Lee Weyer          0 59
##   Mark Hirschbeck    0 23
##   Mark Johnson      49  0
##   Mike Reilly       73  0
##   Nick Bremigan     65  0
##   Paul Runge         0 70
##   Randy Marsh        0 76
##   Rich Garcia       73  0
##   Rick Reed         77  0
##   Rocky Roe         75  0
##   Steve Palermo     55  0
##   Steve Rippley      0 62
##   Ted Hendry        74  0
##   Terry Cooney      60  0
##   Terry Craft       21  0
##   Terry Tata         0 69
##   Tim McClelland    73  0
##   Tim Tschida       60  0
##   Tim Welke         75  0
##   Tom Hallion        0 40
##   Vic Voltaggio     54  0
The table shows that Rick Reed led the AL with 77 games behind the plate, and Randy Marsh led the NL with 76.
Problem 2
For each team in 1999, find the highest number of hits they accumulated in an away game. Also find the lowest number of hits each team got in an away game.
rs1999 <- getRetrosheet("game", 1999)
# Group by visiting team so every row counts as an away game for that team
team.hits <- rs1999 %>%
  group_by(VisTm) %>%
  summarize(max.hits = max(VisH), min.hits = min(VisH)) %>%
  arrange(VisTm)
team.hits
## Source: local data frame [30 x 3]
##
##    VisTm max.hits min.hits
##    (chr)    (int)    (int)
## 1    ANA       20        0
## 2    ARI       20        1
## 3    ATL       24        3
## 4    BAL       25        1
## 5    BOS       19        2
## 6    CHA       18        2
## 7    CHN       16        2
## 8    CIN       28        1
## 9    CLE       20        3
## 10   COL       18        3
## ..   ...      ...      ...
Problem 3
For each team in 2002, calculate the difference between the number of home runs hit at home and the number of home runs hit on the road. Which team had the greatest difference between home and the road? How many teams hit more HRs on the road?
rs2002 <- getRetrosheet("game", 2002)
# Home runs hit by each team in its home games
home.hr <- rs2002 %>%
  group_by(HmTm) %>%
  summarize(tot.HmHr = sum(HmHR)) %>%
  arrange(HmTm)
# Home runs hit by each team in its road games
vis.hr <- rs2002 %>%
  group_by(VisTm) %>%
  summarize(tot.VisHr = sum(VisHR)) %>%
  arrange(VisTm)
# cbind() pairs rows by position; this is safe here only because both
# tables are sorted by team code and contain the same 30 teams
comb <- cbind(home.hr, vis.hr)
comb %>%
  mutate(diff = tot.HmHr - tot.VisHr) %>%
  arrange(desc(diff))
##    HmTm tot.HmHr VisTm tot.VisHr diff
## 1   CHA      132   CHA        85   47
## 2   COL       97   COL        55   42
## 3   KCA       88   KCA        52   36
## 4   TEX      132   TEX        98   34
## 5   OAK      116   OAK        89   27
## 6   BAL       92   BAL        73   19
## 7   TOR      102   TOR        85   17
## 8   HOU       88   HOU        79    9
## 9   MON       85   MON        77    8
## 10  NYN       81   NYN        79    2
## 11  ARI       83   ARI        82    1
## 12  CIN       85   CIN        84    1
## 13  SLN       88   SLN        87    1
## 14  ATL       82   ATL        82    0
## 15  CHN       99   CHN       101   -2
## 16  CLE       95   CLE        97   -2
## 17  DET       61   DET        63   -2
## 18  NYA      108   NYA       115   -7
## 19  PHI       79   PHI        86   -7
## 20  TBA       63   TBA        70   -7
## 21  LAN       73   LAN        82   -9
## 22  ANA       71   ANA        81  -10
## 23  FLO       66   FLO        80  -14
## 24  MIL       62   MIL        77  -15
## 25  SDN       59   SDN        77  -18
## 26  PIT       61   PIT        81  -20
## 27  BOS       77   BOS       100  -23
## 28  SEA       64   SEA        88  -24
## 29  MIN       68   MIN        99  -31
## 30  SFN       72   SFN       126  -54
The Chicago White Sox had the greatest home-over-road home run differential (+47), with Colorado a close second (+42). In absolute terms the largest gap belongs to San Francisco, which hit 54 more home runs on the road than at home. Sixteen teams hit more home runs on the road than at home.
Problem 4
For the years 1972-1974, calculate the average number of home team home runs per game for each month of the season.
Show your results with a ggplot graphic, faceting by league of the home team. You will need to use lubridate to create a year column and a month column. What patterns do you see?
rs1972 <- getRetrosheet("game", 1972)
rs1973 <- getRetrosheet("game", 1973)
rs1974 <- getRetrosheet("game", 1974)
rs1972_1974 <- rbind(rs1972, rs1973, rs1974)
rs1972_1974$year <- year(ymd(rs1972_1974$Date))
rs1972_1974$month <- month(ymd(rs1972_1974$Date), label = TRUE, abbr = TRUE)
# One point per year, month, and league: mean home-team HR per game
rs1972_1974 %>%
  group_by(year, month, HmTmLg) %>%
  summarize(mean.HR = mean(HmHR)) %>%
  ggplot(., aes(month, mean.HR)) +
  geom_point() +
  facet_wrap(~HmTmLg) +
  # month is a factor, so group = 1 is needed to fit one trend line per panel
  geom_smooth(aes(group = 1), method = "lm")
[Figure: mean.HR (y-axis, roughly 0.6 to 1.0) by month (x-axis, Apr through Oct), one panel per league (AL, NL).]
Problem 5
For the years from 1972 through 1974, which day of the week had the most games and which had the fewest?
table(rs1972_1974$Day)
##
##  Fri  Mon  Sat  Sun  Thu  Tue  Wed
##  892  563  930 1087  555  863  857
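Reading the table above, Sunday had the most games (1087) and Thursday the fewest (555). Because table() sorts the day labels alphabetically, the weekly pattern is easier to see once the counts are put in calendar order. The following is a minimal base R sketch of that step, using the counts copied from the output above rather than a fresh download. The names day.counts, week.order, most.games, and least.games are illustrative and do not appear in the original solution.

```r
# Counts copied from the table(rs1972_1974$Day) output shown above.
day.counts <- c(Fri = 892, Mon = 563, Sat = 930, Sun = 1087,
                Thu = 555, Tue = 863, Wed = 857)

# Reorder from alphabetical order into calendar order (assumed week start: Sunday).
week.order <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
day.counts <- day.counts[week.order]
print(day.counts)

# Pick out the days with the most and the fewest games.
most.games  <- names(day.counts)[which.max(day.counts)]
least.games <- names(day.counts)[which.min(day.counts)]
cat("Most games:  ", most.games, "\n")
cat("Fewest games:", least.games, "\n")
```

With the full data frame loaded, the same reordering could be applied to table(rs1972_1974$Day) directly, since indexing a table by name works the same way as indexing a named vector.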