America’s Pastime:
A Statistical Analysis of Baseball Batting Orders
By
Ian Orr & Arunabh Singh
MAT 444: Senior Seminar (Markov Chain Monte Carlo)
Professor George Cobb
December 21, 2007 Orr & Singh – America’s Pastime
In baseball, one of a manager’s important duties is to fill out the lineup card.
Besides determining which players are going to play in a given game, the manager must also determine the order in which the players in his lineup bat. Conventional wisdom has handed down guidelines for what is expected of each position in the batting order. For instance, the first player to bat (the “leadoff” hitter) is generally supposed to be adept at reaching base, in addition to being quick enough on the basepaths to “make things happen” by stealing bases or advancing extra bases if a following batter
gets a hit (i.e., going from first base to third base on a single). The second hitter in the
lineup is generally supposed to have good “bat control,”1 be able to take extra pitches if
the leadoff hitter wishes to steal second base, or to be able to “hit-and-run.”2 The “heart
of the lineup” bats in the third through fifth spots. Those are the power-hitters who “drive
in” the “top-of-the-order” hitters, producing runs for the team. The last four hitters in the
lineup have less well-defined characteristics, but there usually is a trend in decreasing
ability towards the bottom of the order.
While the traditional lineup makes logical sense, there is alarmingly little
empirical evidence of how this sort of lineup compares with other arrangements. Few
managers ever depart significantly from that setup, and attempts to even formulate new
ideas are often met with hostility. Thus, any attempts to investigate alternate lineups
would have to occur outside the realm of the baseball establishment. People might
attempt to apply probability theory to predict the number of runs, but they would
ultimately find that the interrelating variables would make the problem far too complex
for successful completion. The other option was the motivation behind this paper: computer simulations. We reasoned that by creating a simplified model of a batting order, we could compare its offensive effectiveness with that of other arrangements of the same lineup. Our method is to determine both the expected number of times that a given player would lead off an inning in a standard nine-inning game and the expected number of runs that the team would score if a given player did lead off an inning. Combining those results gives us a raw total of runs that a lineup would be expected to score, and that number can be compared to other arrangements of the same lineup.

1 “Bat control” is a term used to describe several skills, including not striking out, or hitting to the right side of the infield to move a runner from second base to third base.
2 A “hit-and-run” is a pre-planned play where a runner on first base breaks for second as if he were stealing, causing the defending team’s second baseman to cover second base for the catcher’s throw. The batter, however, hits a ground ball towards the spot recently vacated by the second baseman.
The Model:
When we started this project, we knew that the game of baseball would have to be simplified if we were going to be able to set up a simulation. Furthermore, we wanted to construct a model that had as few inputs as possible. Thus, we had the challenge of pruning the many available statistics, which reflect the various abilities of different players, down to the few that best capture a player’s attributes.
The statistics we settled on were batting average (AVG), on-base percentage (OBP), and slugging percentage (SLG), because together they paint the overall picture of a hitter’s basic attributes. Batting average, the most popular measure of a player’s offensive production, is the ratio of hits to at-bats (H / AB); on-base percentage is the ratio of hits, walks, and hit-by-pitches to at-bats, walks, hit-by-pitches, and sacrifice flies [(H + BB + HBP) / (AB + BB + HBP + SF)]; and slugging percentage is the ratio of total bases to at-bats (TB / AB), where total bases is the sum of bases obtained on hits (TB = 1 * 1B + 2 * 2B + 3 * 3B + 4 * HR). Since sacrifice flies are relatively rare, we decided to utilize OBP as if SF was not part of the equation.3 Our next assumption was that the
distribution of extra bases (TB - H) from doubles, triples, and home runs for each player
matched the distribution of extra bases from doubles, triples, and home runs for the
league over the past three years.4 We designate those variables px2B, px3B and pxHR.
We further assumed that walks and hit-by-pitches produce the same outcome, so in our model what would have been a hit-by-pitch becomes a walk.
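As a concrete illustration of the three rate statistics and the extra-base proportions defined above, the following R sketch computes them from a single season line. The counting totals are hypothetical (invented for this example, not taken from any real player), and the variable names are ours. Note also that the paper derives px2B, px3B, and pxHR from league totals rather than from one batter.

```r
# Hypothetical season line (illustrative values only, not a real player)
ab <- 500; bb <- 50; hbp <- 5; sf <- 5
singles <- 100; doubles <- 30; triples <- 5; hr <- 15

h  <- singles + doubles + triples + hr                 # hits
tb <- singles + 2 * doubles + 3 * triples + 4 * hr     # total bases

avg <- h / ab
obp <- (h + bb + hbp) / (ab + bb + hbp + sf)
slg <- tb / ab

# Extra bases (TB - H): a double adds 1, a triple adds 2, a home run adds 3
extra <- tb - h
px2b <- doubles / extra
px3b <- 2 * triples / extra
pxhr <- 3 * hr / extra

round(c(AVG = avg, OBP = obp, SLG = slg), 3)
round(c(px2B = px2b, px3B = px3b, pxHR = pxhr), 4)
```

By construction the three proportions sum to 1, which is a quick check on any set of league values.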
Our model has five further assumptions pertaining to how the inning is played rather than each player’s performance. We first assumed that there would be exactly nine innings in every game for a lineup to bat and that no substitutions would ever be made. Second, we assumed that if one player led off an inning, the probability of a given player leading off the following inning was proportional to the probability of his doing so given that no more than eleven players bat in the inning (see Methodology – I). Third, since it was beyond what our chosen statistics reflect, we assumed there would be no stolen bases, no outs made by a player once he is on base, and no advancing on another player’s out. Fourth, there would be no errors, wild pitches, passed balls, or any other freak event allowing a baserunner to advance further than expected.
Finally, we codified our results for every possible scenario: a runner on first advances to second on walks or singles, and scores on any larger hit; a runner on second scores on all hits and only advances to third on walks when there is also a runner on first; a runner on third scores on all hits, but only scores on walks with both second base and first base occupied. An unintended consequence of this model is that triples are logically equivalent
to doubles! In any given base situation, a double would score as many runs as a triple, and the only time a runner would score from third but not second is from a walk with the bases loaded, and in our model, a bases-loaded situation can only happen as a result of a walk. Thus, without corrupting our model, we were able to simplify it further by assuming that all hits that would normally have been triples become doubles.

3 We did not re-calculate on-base percentages, removing sacrifice flies. We used the data as it was originally calculated.
4 Three years was chosen because we hoped it would give a reflection of the current state of the league while simultaneously removing year-by-year fluctuations.

Using high school algebra, we were able to determine each batter’s probability in our model of getting, respectively, an out, a walk, a single, a double5, or a home run, using the
following set of equations:
• pOUT = 1 – OBP
• pBB = (OBP - AVG) / (1 - AVG)
• p1B = AVG / (1 - AVG) * (1 - OBP) - [1 - (OBP - AVG) / (1 - AVG)] * (SLG - AVG) * (px2B + px3B/2 + pxHR/3)
• p2B = [1 - (OBP - AVG) / (1-AVG)] * (SLG - AVG) * px2B + [1 - (OBP - AVG) / (1 - AVG)] * (SLG - AVG) * px3B/2
• pHR = [1 - (OBP - AVG) / (1 - AVG)] * (SLG - AVG) * pxHR/3.
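The five equations above can be sketched as a small R function. This is a minimal illustration: the function name outcome_probs is our own, and the inputs are the leadoff line and the px values reported for the 2007 National League in the Results section. Because 1 - pBB is the share of plate appearances that are at-bats, (1 - pBB) * (SLG - AVG) is the number of extra bases per plate appearance, which the sketch stores once and reuses.

```r
# Convert AVG/OBP/SLG and the extra-base proportions into outcome probabilities
outcome_probs <- function(avg, obp, slg, px2b, px3b, pxhr) {
  p_bb  <- (obp - avg) / (1 - avg)
  extra <- (1 - p_bb) * (slg - avg)        # extra bases per plate appearance
  p_2b  <- extra * (px2b + px3b / 2)       # doubles plus triples counted as doubles
  p_hr  <- extra * pxhr / 3
  p_1b  <- avg / (1 - avg) * (1 - obp) - extra * (px2b + px3b / 2 + pxhr / 3)
  c(out = 1 - obp, walk = p_bb, single = p_1b, double = p_2b, homer = p_hr)
}

# 2007 National League leadoff spot: AVG .276, OBP .339, SLG .413
probs <- outcome_probs(0.276, 0.339, 0.413, 0.3455, 0.0732, 0.5813)
round(probs, 4)
sum(probs)   # the five outcomes are exhaustive, so this should be 1
```

Checking that the probabilities are non-negative and sum to 1 is a useful guard before feeding any player line into the simulation.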
Methodology – I:
Our first goal was to determine the expected number of times, for a given lineup, that each player would lead off an inning in a single nine-inning game. To do this, we determine a probability matrix P for which coordinate (i , j) represents the probability that, if player i leads off an inning, player j leads off the following inning. We approach this by first letting p_i be player i’s probability of getting on base, and q_i be his probability of getting out. We know p_i for each player, because it is simply his on-base percentage. Likewise, q_i = pOUT_i = 1 - OBP_i.6 In addition, let us suppose that the probability of player j leading off an inning when player i leads off the inning before is p_ij. It follows that each (m , n) coordinate in P has the value p_mn. If we fix m at 1 to focus on the first row, we find ourselves wishing to determine the probability that a certain number of batters come to the plate before the third out is recorded. In essence, this is a generalized form of the negative binomial distribution. A single entry can thus be expressed as:

$$p_{1n} = \left( \sum_{1 \le i < j \le n-2} \frac{q_i}{p_i} \cdot \frac{q_j}{p_j} \right) \left( \prod_{k=1}^{n-2} p_k \right) q_{n-1}.$$

It is given in the model, however, that the first run-through of the lineup dominates, so we assume that 4 ≤ n ≤ 12, where p_10, p_11, and p_12 are equivalent to p_1, p_2, and p_3, respectively, and likewise for q_10, q_11, and q_12.

5 The equation for p2B is the sum of what would have been the proportion of doubles and the proportion of triples if we differentiated between doubles and triples.
6 For convenience, the remainder of this section will use p_i and q_i instead of OBP_i and pOUT_i.
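To see what the expression for p_1n says, it may help to write out the two smallest cases by hand. The expansion below is a worked illustration using only the symbols already defined (p_i, q_i); in each case the ratios q/p simply convert two of the "reached base" factors in the product into "out" factors.

```latex
\begin{align*}
% n = 4: three batters come up and all three make outs
p_{14} &= \frac{q_1}{p_1}\cdot\frac{q_2}{p_2}\cdot p_1 p_2 \cdot q_3
        = q_1 q_2 q_3 \\[4pt]
% n = 5: four batters come up; exactly one of the first three reaches base
p_{15} &= \left(\frac{q_1 q_2}{p_1 p_2} + \frac{q_1 q_3}{p_1 p_3}
          + \frac{q_2 q_3}{p_2 p_3}\right) p_1 p_2 p_3 \, q_4 \\
       &= \left(q_1 q_2\, p_3 + q_1\, p_2\, q_3 + p_1\, q_2 q_3\right) q_4
\end{align*}
```

In words: the batter in spot n - 1 makes the third out, and the sum runs over every way of choosing which two of the earlier batters made the first two outs.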
The above equation is only practical, however, for small values of n. For larger values, calculation by hand becomes cumbersome. To ease this computation, we note that $\sum_{1 \le i < j \le n-2} (q_i / p_i)(q_j / p_j)$ can be determined by looking at a matrix where the coordinate (i , j) has entry [(q_i / p_i) * (q_j / p_j)]. Therefore, this sum can be obtained by adding the entries of the matrix through row and column n − 2, subtracting the entries on the main diagonal, and then dividing the result by 2. One might (reasonably) ask why this fact is helpful at all, because it would involve more computations to determine the result through this method than before, but this fact makes programming a computer to determine the result simpler. After the results are obtained for the entire row, we scale the entries so that the row sums to 1. We can compute the remaining entries of P by shifting the values of p and q for each row, and repeating the same procedure. Thus, using the R language, the following code creates the probability matrix for a given nine-entry vector of on-base percentage values (obp):
# obp: nine-entry vector of on-base percentages, in batting order
p0 = obp
P = matrix(0,9,9)

# Rotate a vector one place to the right / to the left
shiftR = function(x){
  n = length(x)
  x = c(x[n],x[-n])
  return(x)
}
shiftL = function(x){
  x = c(x,x[1])[-1]
  return(x)
}

# Rotate a vector r places to the right / to the left
ShiftR = function(x,r){
  if (r == 0) {return(x)}
  for(i in 1:r){x = shiftR(x)}
  return(x)
}
ShiftL = function(x,r){
  if (r == 0) {return(x)}
  for(i in 1:r){x = shiftL(x)}
  return(x)
}

for (i in 1:9){
  p = ShiftL(p0,i-1)     # start the order at batter i
  p = c(p, p[1:3])       # spots 10-12 repeat spots 1-3
  q = 1-p
  A = outer(q/p,q/p)
  trans = rep(0,9)
  for (j in 1:9){
    # two outs among the first j+1 batters, third out by batter j+2
    B = A[1:(j+1),1:(j+1)]
    S = .5*(sum(B) - sum(diag(B)))
    trans[j] = S*prod(p[1:(j+1)])*q[j+2]
  }
  P[i,] = ShiftR(trans,i+2)
}

P = P/rowSums(P)         # scale each row to sum to 1
round(P,digits=4)
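The matrix shortcut used in the loop can be checked against a direct enumeration of which two earlier batters made outs. The sketch below is only a sanity check, not part of the original program: it uses the on-base percentages of lineup spots 1 through 5 from the 2007 National League table in the Results section, and the names ratio, pair_sum, shortcut, and direct are our own.

```r
# On-base percentages for lineup spots 1-5 (2007 National League table)
p <- c(0.339, 0.338, 0.370, 0.367, 0.342)
q <- 1 - p
ratio <- q / p
m <- 4   # n - 2 with n = 6: four batters precede the one who makes the third out

# Matrix shortcut: half of (all entries minus the diagonal) of the m-by-m block
A <- outer(ratio, ratio)
B <- A[1:m, 1:m]
pair_sum <- 0.5 * (sum(B) - sum(diag(B)))
shortcut <- pair_sum * prod(p[1:m]) * q[m + 1]

# Direct enumeration: choose which two of the first m batters made outs
pairs <- combn(m, 2)
direct <- 0
for (k in seq_len(ncol(pairs))) {
  outs <- pairs[, k]
  direct <- direct + prod(q[outs]) * prod(p[1:m][-outs]) * q[m + 1]
}

all.equal(shortcut, direct)
```

The two values agree up to floating-point error, which is the identity the shortcut relies on: the off-diagonal entries of a symmetric matrix count every pair twice.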
To determine the expected number of times that each lineup spot would lead-off an inning in a nine-inning game, we calculate the expected number of times each hitter leads off each individual inning. For instance, we know that player 1 leads off the first inning, so his expected number of times leading off the first inning is 1, while the other eight players have expected value 0. To determine the expected values for the second inning, we multiply a probability vector (1 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0) by P to get a new probability vector, which is an array of probabilities that each player would lead off the
second inning. To get further innings’ probabilities, we multiply the initial probability vector by successive powers of P. Finally, we add together the successive results to obtain a vector of the expected number of innings led off by each player. We used the following code in R to obtain that array:
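# Raise the square matrix x to the power r
mtrxpwr = function(x,r){
  y = diag(nrow(x))
  for (i in seq_len(r)){y = y%*%x}
  return(y)
}

leadoff = c(1,0,0,0,0,0,0,0,0)

# Expected times each spot leads off: innings 1 and 2, then 3 through 9
timesleadingoff = function(x){
  timesleadoff = leadoff + (leadoff%*%x)[1,1:9]
  for (i in 2:8){
    timesleadoff = timesleadoff + (leadoff%*%mtrxpwr(x,i))[1,1:9]
  }
  return(timesleadoff)
}

timesleadingoff(P)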
The resulting array is combined with the results of the following section to determine the
expected number of runs scored.
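A toy case makes the accumulation easy to verify by hand. In the sketch below, the transition matrix P_toy is hypothetical: it describes a lineup that always goes down in order, so the next leadoff hitter is always three spots later. The function name expected_leadoffs is also ours. Rather than forming matrix powers, the sketch carries the probability vector forward one inning at a time, which gives the same sum.

```r
# Hypothetical transition matrix: every inning is three up, three down
P_toy <- matrix(0, 9, 9)
for (i in 1:9) P_toy[i, (i + 2) %% 9 + 1] <- 1

# Sum the leadoff probability vectors for innings 1 through `innings`
expected_leadoffs <- function(P, innings = 9) {
  state <- c(1, rep(0, 8))          # player 1 leads off the first inning
  total <- state
  for (k in seq_len(innings - 1)) {
    state <- as.vector(state %*% P) # leadoff probabilities for the next inning
    total <- total + state
  }
  total
}

counts <- expected_leadoffs(P_toy)
counts        # spots 1, 4 and 7 each lead off three innings
sum(counts)   # one leadoff hitter per inning, so the total is 9
```

For any valid transition matrix the entries of the result must sum to 9, which is a cheap check on both P and the accumulation code.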
Methodology – II:
This portion of our project is mathematically more straightforward, but requires extensive computer simulation. Our goal for this section was to find the expected number of runs scored in an inning when each player in a given lineup leads off an inning using Markov Chain Monte Carlo techniques. Using the derived probabilities of each possible outcome (out, walk, single, double, or home run) and how it affects the situation on the basepaths, we use R to run 500,000 innings with each batter leading off, and then average the number of runs scored per inning.7

7 A drawback to this method is that the resulting simulation takes approximately fifteen minutes per lineup.

The following code in R generates an array of the expected number of runs scored in an inning based on leadoff hitter (avg, obp, and slg are nine-entry arrays of each player’s batting average, on-base percentage, and slugging percentage, respectively, while px2b, px3b, and pxhr are the proportions of extra bases from doubles, triples, and home runs, respectively):
# Outcome probabilities for each of the nine batters
pout=1-obp
pbb=(obp-avg)/(1-avg)
p1b=avg/(1-avg)*(1-obp)-(1-(obp-avg)/(1-avg))*(slg-avg)*(px2b+px3b/2+pxhr/3)
p2b=(1-(obp-avg)/(1-avg))*(slg-avg)*px2b + (1-(obp-avg)/(1-avg))*(slg-avg)*px3b/2
phr=(1-(obp-avg)/(1-avg))*(slg-avg)*pxhr/3
p3b=rep(0,9)   # triples are folded into p2b; kept only to fill the argument list

# Codes for the possible base situations
empty = 0
first = 1
second = 2
firstsecond = 12
loaded = 123
NReps = 500000

# Wrap the batting order around after the ninth hitter
modish=function(x){
  if (x>9) {x-9} else {x}
}

# In each function below, x is a uniform(0,1) draw and v = c(outs, base situation, runs)

# Bases empty
emptyresult = function(x,v,poutbatter,pbbbatter,p1bbatter,p2bbatter,p3bbatter,phrbatter){
  if (x < poutbatter) {v = v + c(1,0,0)}                                           # out
  else if (x < poutbatter + pbbbatter + p1bbatter) {v = v + c(0,1,0)}              # walk or single
  else if (x < poutbatter + pbbbatter + p1bbatter + p2bbatter) {v = v + c(0,2,0)}  # double
  else {v = v + c(0,0,1)}                                                          # home run
  v
}

# Runner on first
firstresult = function(x,v,poutbatter,pbbbatter,p1bbatter,p2bbatter,p3bbatter,phrbatter){
  if (x < poutbatter) {v = v + c(1,0,0)}
  else if (x < poutbatter + pbbbatter + p1bbatter) {v[2] = firstsecond}
  else if (x < poutbatter + pbbbatter + p1bbatter + p2bbatter) {v = v + c(0,0,1); v[2] = second}
  else {v = v + c(0,0,2); v[2] = empty}
  v
}

# Runner on second
secondresult = function(x,v,poutbatter,pbbbatter,p1bbatter,p2bbatter,p3bbatter,phrbatter){
  if (x < poutbatter) {v[1] = v[1] + 1}
  else if (x < poutbatter + pbbbatter) {v[2] = firstsecond}
  else if (x < poutbatter + pbbbatter + p1bbatter) {v[2] = first; v[3] = v[3] + 1}
  else if (x < poutbatter + pbbbatter + p1bbatter + p2bbatter) {v[2] = second; v[3] = v[3] + 1}
  else {v[2] = empty; v[3] = v[3] + 2}
  v
}

# Runners on first and second
firstsecondresult = function(x,v,poutbatter,pbbbatter,p1bbatter,p2bbatter,p3bbatter,phrbatter){
  if (x < poutbatter) {v[1] = v[1] + 1}
  else if (x < poutbatter + pbbbatter) {v[2] = loaded}
  else if (x < poutbatter + pbbbatter + p1bbatter) {v[2] = firstsecond; v[3] = v[3] + 1}
  else if (x < poutbatter + pbbbatter + p1bbatter + p2bbatter) {v[2] = second; v[3] = v[3] + 2}
  else {v[2] = empty; v[3] = v[3] + 3}   # a home run clears the bases
  v
}

# Bases loaded
loadedresult = function(x,v,poutbatter,pbbbatter,p1bbatter,p2bbatter,p3bbatter,phrbatter){
  if (x < poutbatter) {v[1] = v[1] + 1}
  else if (x < poutbatter + pbbbatter) {v[3] = v[3] + 1}
  else if (x < poutbatter + pbbbatter + p1bbatter) {v[2] = firstsecond; v[3] = v[3] + 2}
  else if (x < poutbatter + pbbbatter + p1bbatter + p2bbatter) {v[2] = second; v[3] = v[3] + 3}
  else {v[2] = empty; v[3] = v[3] + 4}
  v
}

# Dispatch on the current base situation
batterresult = function(x,v,poutbatter,pbbbatter,p1bbatter,p2bbatter,p3bbatter,phrbatter){
  if (v[2] == empty) {emptyresult(x,v,poutbatter,pbbbatter,p1bbatter,p2bbatter,p3bbatter,phrbatter)}
  else if (v[2] == first) {firstresult(x,v,poutbatter,pbbbatter,p1bbatter,p2bbatter,p3bbatter,phrbatter)}
  else if (v[2] == second) {secondresult(x,v,poutbatter,pbbbatter,p1bbatter,p2bbatter,p3bbatter,phrbatter)}
  else if (v[2] == firstsecond) {firstsecondresult(x,v,poutbatter,pbbbatter,p1bbatter,p2bbatter,p3bbatter,phrbatter)}
  else {loadedresult(x,v,poutbatter,pbbbatter,p1bbatter,p2bbatter,p3bbatter,phrbatter)}
}

# Play one inning with batter m leading off; returns the runs scored
inningresult = function(m,v){
  n = m
  while (v[1] < 3) {
    v = batterresult(runif(1),v,pout[n],pbb[n],p1b[n],p2b[n],p3b[n],phr[n])
    n = modish(n+1)
  }
  v[3]
}

# Average runs per inning over `runs` simulated innings with batter n leading off
avgrunsinning = function (n,runs){
  x = 0
  for (i in 1:runs){x = x + inningresult(n, c(0,0,0))}
  x/runs
}

avgrunsbyleadoffhitter = function (runs){
  sapply(1:9, function(n) avgrunsinning(n, runs))
}

avgrunsbyleadoffhitter(NReps)
We then take the dot product of this vector and the vector produced in the previous section to yield the expected number of runs scored in a standard game.
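The final combination step is a single dot product. The sketch below uses placeholder inputs that are hypothetical and chosen only so the arithmetic is easy to follow; in practice the two vectors would be the outputs of timesleadingoff(P) and avgrunsbyleadoffhitter(NReps).

```r
# Placeholder inputs (hypothetical, for illustration only)
times_leading_off <- rep(1, 9)     # each lineup spot leads off exactly one inning
runs_when_leading <- rep(0.5, 9)   # half a run per inning, whoever leads off

# Expected runs per game: weight each spot's runs per inning by its leadoff count
expected_runs <- sum(times_leading_off * runs_when_leading)
expected_runs   # 9 innings at 0.5 runs each gives 4.5
```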
Results:
Our first simulation determined the expected number of runs scored by an average National League lineup; that is, the first batter had statistics averaged from all National League batters in the leadoff position, the second batter for the second lineup position, and so on for all nine positions. For this simulation8 (and all further simulations involving the National League), we used the calculated values of px2B = .3455, px3B = .0732, and pxHR = .5813. The following statistics were used for this lineup:
2007 National League Lineup
Position  AVG    OBP    SLG
1         0.276  0.339  0.413
2         0.276  0.338  0.417
3         0.288  0.370  0.490
4         0.279  0.367  0.496
5         0.272  0.342  0.456
6         0.273  0.338  0.446
7         0.263  0.323  0.409
8         0.252  0.321  0.378
9         0.184  0.236  0.265
8 All statistics used in this section were obtained from www.baseball-reference.com.
According to our simulation, the expected number of runs scored by that lineup
would be 5.172 runs per game. We next ran that analysis based on the average 2007
American League statistics, setting px2B = .3444, px3B = .0674, and pxHR = .5882 for
this (and all further American League simulations). The table below shows the
aforementioned lineup statistics:
2007 American League Lineup
Position  AVG    OBP    SLG
1         0.282  0.348  0.413
2         0.281  0.340  0.415
3         0.281  0.356  0.468
4         0.284  0.359  0.487
5         0.276  0.345  0.459
6         0.266  0.328  0.431
7         0.261  0.320  0.414
8         0.251  0.312  0.383
9         0.251  0.305  0.372
Our simulation projected that lineup to score 5.351 runs per game.
We next decided to see how an alternate arrangement of these same batters would
affect expected runs scored. In 1964, Earnshaw Cook published a book entitled
Percentage Baseball which suggested that a team would increase the number of runs scored by rearranging its lineup in the following order (where the number represents the original lineup position): 34512789.9 We re-ordered the above lineups using Cook’s
suggestion and ran the simulations again. The new National League lineup scored an
average of 5.189 runs per game, while the new American League lineup averaged 5.361
runs per game. That showed an improvement of .017 runs per game for the National
League and .010 runs per game for the American League, which would average to 2.754
and 1.620 runs, respectively, over the course of a 162-game season. We next did a more subtle lineup re-ordering for the National League, inspired by current St. Louis Cardinals manager Tony La Russa, which consisted of placing the pitcher in the number 8 lineup position (the pitcher is almost always the weakest hitter and thus usually hits last). We simply swapped hitters 8 and 9 in the National League lineup; the new arrangement averaged 5.192 runs per game, which was a full .003 runs per game greater than the Cook lineup for an additional .486 runs per 162 games!

9 Cited in Thorn, John, and Pete Palmer. 1985. The Hidden Game of Baseball. Garden City, New York: Doubleday & Company, Inc.
The above lineups, however, are of dubious value because they do not actually deal with specific players. Real managers do not deal with league-average players, but instead deal with a group of players who have varying skill sets. All the remaining simulations deal with real teams and real players. The first of these lineups is the 2006 St. Louis Cardinals. The following is the lineup that the Cardinals used in their World Series-clinching Game 5 victory over the Detroit Tigers:
2006 Cardinals’ World Series Game 5 Lineup
Player            AVG    OBP    SLG
David Eckstein    0.292  0.350  0.344
Chris Duncan      0.293  0.363  0.589
Albert Pujols     0.331  0.431  0.671
Jim Edmonds       0.257  0.350  0.471
Scott Rolen       0.296  0.369  0.518
Ron Belliard10    0.237  0.295  0.371
Yadier Molina     0.216  0.274  0.321
So Taguchi        0.266  0.335  0.351
Pitcher11         0.173  0.224  0.222
This lineup averaged 5.402 runs per game. We then, however, decided to investigate the worth of having David Eckstein bat leadoff. Some baseball analysts consider David Eckstein to be an extremely over-rated baseball player who is a fan-favorite because he is 5’6”, 170 lb. and has a reputation of being “scrappy.” They also note that his low slugging percentage hurts his team, and argue that his on-base percentage is not high enough to justify batting him leadoff. We then ran the simulation again, swapping Eckstein with Scott Rolen, who had both a better on-base percentage and a better slugging percentage. The drawback, however, was that Eckstein’s low slugging percentage would not drive in as many of the runners in front of him. This alternate lineup ended up scoring 5.362 runs per game, for a decrease of .040 runs per game and 6.480 runs per 162 games.

10 Ron Belliard’s statistics are only from his games in the National League.
11 The pitcher’s statistics are the average statistics from all Cardinals pitchers in 2006.
Our final batch of simulations pertained to the 2007 Chicago Cubs.12 Their most-used lineup that season was the following (with statistics from the 2007 season):
2007 Cubs Most Used Lineup
Player            AVG    OBP    SLG
Alfonso Soriano   0.299  0.337  0.560
Ryan Theriot      0.266  0.326  0.346
Derrek Lee        0.317  0.400  0.513
Aramis Ramirez    0.310  0.366  0.549
Cliff Floyd       0.284  0.373  0.422
Mark DeRosa       0.293  0.371  0.420
Jacque Jones      0.285  0.335  0.400
Jason Kendall13   0.270  0.362  0.356
Pitcher14         0.155  0.167  0.207
This lineup averaged 5.288 runs per game. The second hitter, however, Ryan “The Riot” Theriot, a fan favorite, shares many qualities with David Eckstein: he weighs only 175 lb. and has a reputation for being scrappy (though he is 5’11”), but his on-base percentage is terrible. We suspected that swapping Ryan Theriot with Mark DeRosa would yield a more productive lineup. We were correct: this lineup scored 5.303 runs per game, for a .015 run per game improvement and a whopping 2.43 additional runs per 162 games.

12 The Chicago Cubs are always a source of heartbreak for one of us (Orr) and humor for the other (Singh). The latter takes pleasure in the former’s sadness and the former in the latter’s relative ignorance of baseball.
13 Kendall’s statistics are only from his games in the National League.
14 The pitcher line is an average of all Cubs pitchers for the 2007 season.
We felt, however, that the Cubs’ lineup could be completely re-worked. For
instance, we simply do not see Alfonso Soriano as a good leadoff hitter. His offensive
value comes from his .560 slugging percentage which is wasted if he bats with no one on
base. Similarly, he does not get on base very often for other hitters because of his .337
on-base percentage. Therefore, we propose an alternate lineup:
2007 Cubs Proposed Lineup
Player            AVG    OBP    SLG
Mark DeRosa       0.293  0.371  0.420
Derrek Lee        0.317  0.400  0.513
Aramis Ramirez    0.310  0.366  0.549
Alfonso Soriano   0.299  0.337  0.560
Cliff Floyd       0.284  0.373  0.422
Jason Kendall15   0.270  0.362  0.356
Jacque Jones      0.285  0.335  0.400
Ryan Theriot      0.266  0.326  0.346
Pitcher16         0.155  0.167  0.207
This lineup scored 5.367 runs per game, for an improvement of .079 runs per game and a full 12.798 runs per 162 games over the original Cubs lineup. That would actually imply one additional win during the course of the regular season.17
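The per-season figures quoted in this section all come from the same conversion: the difference in runs per game multiplied by 162. A short R sketch of that arithmetic follows, using the simulated runs-per-game values reported above; the vector labels and the helper season_gain are our own names.

```r
# Simulated runs per game reported in this section
runs_per_game <- c(
  nl_standard = 5.172, nl_cook = 5.189, nl_pitcher_8th = 5.192,
  al_standard = 5.351, al_cook = 5.361,
  cards_actual = 5.402, cards_swap = 5.362,
  cubs_actual = 5.288, cubs_swap = 5.303, cubs_proposed = 5.367
)

# Runs gained over a season by switching from lineup `old` to lineup `new`
season_gain <- function(new, old, games = 162) {
  unname((runs_per_game[new] - runs_per_game[old]) * games)
}

season_gain("nl_cook", "nl_standard")          # about 2.754
season_gain("al_cook", "al_standard")          # about 1.620
season_gain("nl_pitcher_8th", "nl_cook")       # about 0.486
season_gain("cards_swap", "cards_actual")      # about -6.480
season_gain("cubs_swap", "cubs_actual")        # about 2.430
season_gain("cubs_proposed", "cubs_actual")    # about 12.798
```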
15 Kendall’s statistics are only from his games in the National League.
16 The pitcher line is an average of all Cubs pitchers for the 2007 season.
17 Empirically, ten additional runs scored or ten fewer runs allowed has corresponded to an additional win by a team over the course of a season.

Conclusions:
We are reluctant to draw any significant conclusions from our results for three primary reasons. Firstly, we did not anticipate how long it would take to run a single round of simulations. That length forced us to decide carefully what questions we would investigate, each time unsure whether the outcome would be anything significant.
Secondly, our relative inexperience with R coupled with the inefficiency of our code
means that we do not know how much the values obtained through the simulation vary,
nor do we possess information about the standard deviation of runs scored by various
lineups, which might alter our conclusions. Lastly, our model has too many assumptions
to be able to take any of our results with anything more than a grain of salt. For instance,
baserunning is an important skill that we left out; in addition to stealing bases, some
players can advance farther on hits (or outs, for that matter) than other hitters.
Furthermore, some players are more prone to hitting into double plays than others, but
that, of course, also depends on the players in front of a hitter in the lineup. Thus, we are
hesitant to overstate our conclusions.
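The second caveat above concerns not knowing how much the simulated averages vary. One common way to gauge that, offered here only as a sketch, is the standard error of the mean, which requires keeping the per-inning run totals instead of only their running sum. The vector runs below is a placeholder drawn from a Poisson distribution purely so that the example executes; it is not output from the model, and the names are our own.

```r
set.seed(1)

# Placeholder per-inning run totals (hypothetical stand-in for simulated innings)
runs <- rpois(10000, lambda = 0.6)

mean_runs <- mean(runs)
std_error <- sd(runs) / sqrt(length(runs))     # standard error of the mean
interval  <- mean_runs + c(-1.96, 1.96) * std_error   # approximate 95% interval

c(mean = mean_runs, se = std_error)
interval
```

Because the standard error shrinks with the square root of the number of innings, comparing it with the size of the difference between two lineups indicates whether that difference could be simulation noise.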
It is also relevant to note that, with the exception of our proposed Cubs lineup and perhaps the swapped Cardinals lineup, none of the results are significant enough to have any noticeable effect on the outcome of the season. This suggests that, unless a manager does something as questionable as Lou Piniella did with the 2007 Cubs, investigating lineup improvements is possibly not worth the effort it entails, since managers take many additional pieces of information into account when they arrange a lineup. About the only things we can safely conclude from this test are that players who reach base are more valuable in front of players who hit for power, that players who hit for power are more valuable following players who reach base more often, and that a truly bad arrangement, like that of the 2007 Cubs, really does hinder the offense. Aside from these caveats, however, we do believe that we have developed a useful tool for analyzing batting orders (using a fairly large set of assumptions to simplify the game). Hopefully, our work will be expanded in the future using more efficient code to reduce, or even eliminate, some of these assumptions so that the results obtained are more realistic.