America's Pastime: a Statistical Analysis of Baseball Batting Orders


Ian Orr & Arunabh Singh

MAT 444: Senior Seminar (Markov Chain Monte Carlo)

Professor George Cobb

December 21, 2007 Orr & Singh – America’s Pastime

In baseball, one of a manager’s important duties is to fill out the lineup card.

Besides determining which players are going to play in a given game, the manager must also determine the order in which the players in his lineup bat. Conventional wisdom has handed down guidelines for what is expected of each position in the batting order. For instance, the first player to bat (the “leadoff” hitter) is generally supposed to be adept at reaching base, in addition to being quick enough on the basepaths to “make things happen” by stealing bases or advancing extra bases if a following batter

gets a hit (i.e., going from first base to third base on a single). The second hitter in the

lineup is generally supposed to have good “bat control,”1 be able to take extra pitches if

the leadoff hitter wishes to steal second base, or to be able to “hit-and-run.”2 The “heart

of the lineup” bats in the third through fifth spots. Those are the power-hitters who “drive

in” the “top-of-the-order” hitters, producing runs for the team. The last four hitters in the

lineup have less well-defined characteristics, but there usually is a trend in decreasing

ability towards the bottom of the order.

While the traditional lineup makes logical sense, there is alarmingly little

empirical evidence of how this sort of lineup compares with other arrangements. Few

managers ever depart significantly from that setup, and attempts to even formulate new

ideas are often met with hostility. Thus, any attempts to investigate alternate lineups

would have to occur outside the realm of the baseball establishment. People might

attempt to apply probability theory to predict the number of runs, but they would

ultimately find that the interrelating variables would make the problem far too complex

1 “Bat control” is a term used to describe several skills, including not striking out, or hitting to the right side of the infield to move a runner from second base to third base.

2 A “hit-and-run” is a pre-planned play where a runner on first base breaks for second as if he were stealing, causing the defending team’s second baseman to cover second base for the catcher’s throw. The batter, however, hits a ground ball towards the spot recently vacated by the second baseman.

for successful completion. The other option was the motivation behind this paper: computer simulations. We reasoned that by creating a simplified model of a batting order, we could compare its offensive effectiveness with that of other arrangements of the same lineup. Our method is to determine both the expected number of times that a given player would lead off an inning in a standard nine-inning game and the expected number of runs that the team would score if a given player did lead off an inning.

Combining those results gives us a raw total of runs that a lineup would be expected to score, and that number can be compared to other arrangements of the same lineup.

The Model:

When we started this project, we knew that the game of baseball would have to be simplified if we were going to be able to set up a simulation. Furthermore, we wanted to construct a model that had as few inputs as possible. Thus, we had the challenge of pruning the many statistics available reflecting the various abilities of the different players to the fewest statistics that carried the most weight to the attributes of a player.

The statistics we settled on were batting average (AVG), on-base percentage (OBP), and slugging percentage (SLG), because together they paint the overall picture of a hitter’s basic attributes. Batting average, the most popular measure of a player’s offensive production, is the ratio of hits to at-bats (H / AB); on-base percentage is the ratio of hits, walks, and hit-by-pitches to at-bats, walks, hit-by-pitches, and sacrifice flies [(H + BB + HBP) / (AB + BB + HBP + SF)]; and slugging percentage is the ratio of total bases to at-bats (TB / AB), where total bases is the sum of bases obtained on hits (TB = 1 * 1B + 2 * 2B + 3 * 3B + 4 * HR). Since sacrifice flies are relatively rare, we decided to utilize OBP


as if SF was not part of the equation.3 Our next assumption was that the

distribution of extra bases (TB - H) from doubles, triples, and home runs for each player

matched the distribution of extra bases from doubles, triples, and home runs for the

league over the past three years.4 We designate those variables px2B, px3B and pxHR.

We further assumed that walks and hit-by-pitches produce the same outcome, so in our model, what would have been a hit-by-pitch becomes a walk.

Our model has five further assumptions pertaining to how the inning is played

rather than each player’s performance. We first assumed that there would be exactly nine

innings in every game for a lineup to bat and that no substitutions would ever be made.

Second, we assumed that if one player led off an inning, the probability of a given player leading off the following inning was proportional to the probability of him doing so given that no more than eleven players can bat in a given inning (see Methodology – I). Third, since our chosen statistics do not reflect them, we assumed there would be no stolen bases, no outs made by a player once he is on base, and no advancing on another player’s out. Fourth, there would be no errors, wild pitches, passed balls, or any other freak event allowing a baserunner to advance further than expected.

Finally, we codified our results for every possible scenario: a runner on first advances to second on walks or singles, and scores on any larger hit; a runner on second scores on all hits and only advances to third on walks when there is also a runner on first; a runner on third scores on all hits, but only scores on walks with both second base and first base occupied. An unintended consequence of this model is that triples are logically equivalent

3 We did not re-calculate on-base percentages, removing sacrifice flies. We used the data as it was originally calculated.

4 Three years was chosen because we hoped it would give a reflection of the current state of the league while simultaneously removing year-by-year fluctuations.


to doubles! In any given base situation, a double would score as many runs as a triple,

and the only time a runner would score from third but not second is from a walk with the

bases loaded, and in our model, a bases-loaded situation can only happen as a result of a

walk. Thus, without corrupting our model, we were able to simplify it further by

assuming that all hits that would normally have been triples become doubles. Using high school algebra, we were able to determine each batter’s probability in our model of getting, respectively, an out, a walk, a single, a double5, or a home run, using the

following set of equations:

• pOUT = 1 – OBP

• pBB = (OBP - AVG) / (1 - AVG)

• p1B = AVG / (1 - AVG) * (1 - OBP) - [1 - (OBP - AVG) / (1 - AVG)] * (SLG - AVG) * (px2B + px3B/2 + pxHR/3)

• p2B = [1 - (OBP - AVG) / (1-AVG)] * (SLG - AVG) * px2B + [1 - (OBP - AVG) / (1 - AVG)] * (SLG - AVG) * px3B/2

• pHR = [1 - (OBP - AVG) / (1 - AVG)] * (SLG - AVG) * pxHR/3.
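The five equations above can be evaluated directly. The following is a minimal sketch in R, assuming a helper named outcome_probs (a hypothetical name, not part of the original code) and using the 2007 National League leadoff line and the National League extra-base proportions quoted later in the Results as inputs. It also checks that the five probabilities sum to 1.

```r
# Convert one batter's AVG/OBP/SLG into the model's five outcome probabilities.
# px2b, px3b, pxhr: league-wide shares of extra bases from doubles, triples, home runs.
# outcome_probs is a hypothetical helper name used only for this illustration.
outcome_probs <- function(avg, obp, slg, px2b, px3b, pxhr) {
  bracket <- 1 - (obp - avg) / (1 - avg)   # the bracketed factor in the equations
  extra   <- bracket * (slg - avg)         # factor shared by p1B, p2B and pHR
  p2b <- extra * px2b + extra * px3b / 2   # triples are folded into doubles
  phr <- extra * pxhr / 3
  p1b <- avg / (1 - avg) * (1 - obp) - extra * (px2b + px3b / 2 + pxhr / 3)
  c(out = 1 - obp, bb = (obp - avg) / (1 - avg), single = p1b, double = p2b, hr = phr)
}

probs <- outcome_probs(0.276, 0.339, 0.413,
                       px2b = 0.3455, px3b = 0.0732, pxhr = 0.5813)
print(round(probs, 4))
print(sum(probs))   # should be 1 up to floating-point error
```

Because the extra-base term subtracted in p1B is exactly the sum of p2B and pHR, the total does not depend on the league proportions.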

Methodology – I:

Our first goal was to determine, for a given lineup, the expected number of times that each player would lead off an inning in a single nine-inning game. To do this, we determine a probability matrix P for which coordinate (i , j) represents the probability that, if player i leads off an inning, player j leads off the following inning. We approach this by first letting pi be player i’s probability of getting on base, and qi be his probability of getting out. We know pi for each player, because it is simply his on-base percentage.6 Likewise, qi = pOUTi = 1 - OBPi. In addition, let us suppose that the probability of j leading off an inning when player i leads off the inning before is pij. It

5 The equation for p2B is the sum of what would have been the proportion of doubles and the proportion of triples if we differentiated between doubles and triples.

6 For convenience, the remainder of this section will use pi and qi instead of OBPi and pOUTi.


follows that each (m , n) coordinate in P has the value pmn. If we are to fix m at 1 to focus

on the first row, we find ourselves wishing to determine the probability that a certain

number of batters come to the plate before the third out is recorded. In essence, this is a

generalized form of the negative binomial distribution. A single entry can thus be

expressed as:

\[ p_{1n} = \Bigl( \sum_{1 \le i < j \le n-2} \frac{q_i}{p_i} \cdot \frac{q_j}{p_j} \Bigr) \Bigl( \prod_{k=1}^{n-2} p_k \Bigr) \, q_{n-1}. \]

It is given in the model, however, that the first run-through of the lineup dominates, so we will assume that 4 ≤ n ≤ 12, where p10, p11, and p12 are equivalent to p1, p2, and p3, respectively, and likewise for q10, q11, and q12.
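As a sanity check on the formula, the two smallest cases can be written out by hand. This sketch assumes the sum runs over all pairs i < j of the first n − 2 batters, which is the reading consistent with the matrix of entries (qi / pi) * (qj / pj) described in the text.

```latex
\begin{align*}
p_{14} &= \frac{q_1}{p_1}\,\frac{q_2}{p_2}\,(p_1 p_2)\,q_3 = q_1 q_2 q_3, \\
p_{15} &= \Bigl(\frac{q_1 q_2}{p_1 p_2} + \frac{q_1 q_3}{p_1 p_3} + \frac{q_2 q_3}{p_2 p_3}\Bigr)(p_1 p_2 p_3)\,q_4
        = (q_1 q_2 p_3 + q_1 p_2 q_3 + p_1 q_2 q_3)\,q_4 .
\end{align*}
```

The first case is the three-up, three-down inning. The second says that exactly one of the first three batters reaches base and the fourth batter makes the final out, so the fifth batter leads off the next inning.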

The above equation is only convenient, however, for small values of n. For larger values, calculation by hand becomes cumbersome and impractical. To ease this computation, we note that the sum of (qi / pi) * (qj / pj) over 1 ≤ i < j ≤ n−2 can be determined by looking at a matrix where the coordinate (i , j) has entry [(qi / pi) * (qj / pj)]. Therefore, this sum can be obtained by adding the entries of the matrix through row and column n−2, subtracting the entries from the main diagonal, and then dividing the result by 2. One

might (reasonably) ask why this fact is helpful at all, because it would involve more

computations to determine the result through this method than before, but this fact makes

programming a computer to determine the result simpler. After the results are obtained for the entire row, we scale the entries so that the row sums to 1. We can compute the remaining entries of P by shifting the values of p and q for each row, and repeating the same procedure. Thus, using the R language, the following code creates the probability matrix for a given nine-entry vector of on-base percentage values (obp):


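p0 = obp                 # nine on-base percentages, in batting order
P = matrix(0, 9, 9)

# Rotate a vector one place to the right / left.
shiftR = function(x) {
  n = length(x)
  x = c(x[n], x[-n])
  return(x)
}

shiftL = function(x) {
  x = c(x, x[1])[-1]
  return(x)
}

# Rotate a vector r places to the right / left.
ShiftR = function(x, r) {
  if (r == 0) {return(x)}
  for (i in 1:r) {x = shiftR(x)}
  return(x)
}

ShiftL = function(x, r) {
  if (r == 0) {return(x)}
  for (i in 1:r) {x = shiftL(x)}
  return(x)
}

for (i in 1:9) {
  p = ShiftL(p0, i - 1)        # player i now sits in position 1
  p = c(p, p[1:3])             # positions 10-12 repeat positions 1-3
  q = 1 - p
  A = outer(q / p, q / p)      # A[a, b] = (q_a / p_a) * (q_b / p_b)
  trans = rep(0, 9)
  for (j in 1:9) {
    B = A[1:(j + 1), 1:(j + 1)]
    S = .5 * (sum(B) - sum(diag(B)))     # sum over pairs a < b
    trans[j] = S * prod(p[1:(j + 1)]) * q[j + 2]
  }
  P[i, ] = ShiftR(trans, i + 2)   # line the entries up with the absolute lineup spots
}

P = P / rowSums(P)             # rescale each row to sum to 1
round(P, digits = 4)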
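The matrix shortcut used for S in the loop can be checked against a direct enumeration of pairs. The sketch below uses five made-up on-base probabilities (hypothetical values, not taken from any lineup in this paper) and compares the "sum everything, drop the diagonal, halve" computation with a brute-force sum over all pairs i < j built with combn.

```r
# Hypothetical on-base probabilities for five consecutive batters.
p <- c(0.34, 0.33, 0.37, 0.36, 0.34)
q <- 1 - p
r <- q / p

# Matrix shortcut: all entries, minus the diagonal, halved.
A <- outer(r, r)
shortcut <- 0.5 * (sum(A) - sum(diag(A)))

# Brute force: enumerate every pair i < j explicitly.
pairs <- combn(length(r), 2)
brute <- sum(r[pairs[1, ]] * r[pairs[2, ]])

print(c(shortcut = shortcut, brute = brute))
```

The two numbers agree because the matrix is symmetric, so each off-diagonal product appears exactly twice.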

To determine the expected number of times that each lineup spot would lead off an inning in a nine-inning game, we calculate the expected number of times each hitter leads off each individual inning. For instance, we know that player 1 leads off the first inning, so his expected number of times leading off the first inning is 1, while the other eight players have expected value 0. To determine the expected values for the second inning, we multiply a probability vector (1 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0) by P to get a new probability vector, which is an array of probabilities that each player would lead off the


second inning. To get further innings’ probabilities, we multiply the above probability

vector by successive powers of P. Finally, we add together the successive results to

obtain a vector of the expected number of innings led off. We used the following code in R

to obtain that array:

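# Return the r-th power of the square matrix x (r = 0 gives the identity).
# Multiplying by x once per step yields x^r; squaring at each step would yield x^(2^(r-1)).
mtrxpwr = function(x, r) {
  result = diag(nrow(x))
  for (i in seq_len(r)) {
    result = result %*% x
  }
  return(result)
}

leadoff = c(1, 0, 0, 0, 0, 0, 0, 0, 0)   # player 1 always leads off the first inning

# Add up the leadoff distributions for innings 1 through 9,
# that is, leadoff %*% x^k for k = 0, 1, ..., 8.
timesleadingoff = function(x) {
  timesleadoff = rep(0, 9)
  for (k in 0:8) {
    timesleadoff = timesleadoff + (leadoff %*% mtrxpwr(x, k))[1, 1:9]
  }
  return(timesleadoff)
}

timesleadingoff(P)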

The resulting array is combined with the results of the following section to determine the

expected number of runs scored.
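To illustrate the summation of successive inning distributions on a smaller scale, here is a sketch with a made-up three-player transition matrix. P_toy and expected_leadoffs are hypothetical names introduced only for this example. Because each inning has exactly one leadoff hitter, the expected counts must add up to the number of innings.

```r
# Hypothetical 3-player transition matrix (rows sum to 1); not from the paper's data.
P_toy <- matrix(c(0.2, 0.5, 0.3,
                  0.3, 0.2, 0.5,
                  0.5, 0.3, 0.2), nrow = 3, byrow = TRUE)

# Expected number of innings led off by each player over a game.
expected_leadoffs <- function(P, innings = 9) {
  state <- c(1, rep(0, nrow(P) - 1))   # player 1 leads off inning 1
  total <- state
  for (k in seq_len(innings - 1)) {
    state <- as.vector(state %*% P)    # leadoff distribution for the next inning
    total <- total + state
  }
  total
}

toy <- expected_leadoffs(P_toy)
print(round(toy, 3))
print(sum(toy))   # one leadoff hitter per inning, so this equals the inning count
```

Carrying the state vector forward one inning at a time avoids recomputing each matrix power from scratch.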

Methodology – II:

This portion of our project is mathematically more straightforward, but requires

extensive computer simulation. Our goal for this section was to find the expected number

of runs scored in an inning when each player in a given lineup leads off an inning using

Markov Chain Monte Carlo techniques. Using the derived probabilities of each possible outcome (out, walk, single, double, or home run) and how it affects the situation on the basepaths, we use R to run 500,000 innings with each batter leading off, and then average the number of runs scored per inning.7 The following code in R generates an array of the expected number of runs scored in an inning based on leadoff hitter (avg, obp, and slg are, respectively, nine-entry arrays of each player’s batting average, on-base percentage, and slugging percentage, while px2b, px3b, and pxhr are the proportion of extra bases from doubles, triples, and home runs, respectively):

7 A drawback to this method is that the resulting simulation takes approximately fifteen minutes per lineup.

# Outcome probabilities for each of the nine batters
pout = 1 - obp
pbb = (obp - avg) / (1 - avg)
p1b = avg / (1 - avg) * (1 - obp) - (1 - (obp - avg) / (1 - avg)) * (slg - avg) * (px2b + px3b / 2 + pxhr / 3)
p2b = (1 - (obp - avg) / (1 - avg)) * (slg - avg) * px2b + (1 - (obp - avg) / (1 - avg)) * (slg - avg) * px3b / 2
phr = (1 - (obp - avg) / (1 - avg)) * (slg - avg) * pxhr / 3
p3b = rep(0, 9)   # triples are counted as doubles; defined only so the calls below are complete

# Codes for the base situation
empty = 0
first = 1
second = 2
firstsecond = 12
loaded = 123
NReps = 500000

# Wrap the lineup index back around to 1 after the ninth batter.
modish = function(x) {
  if (x > 9) {x - 9} else {x}
}

# State vector v = (outs, base situation, runs); x is a Uniform(0, 1) draw.
# Each function below updates v for one plate appearance from a given base situation.

emptyresult = function(x, v, poutbatter, pbbbatter, p1bbatter, p2bbatter, p3bbatter, phrbatter) {
  if (x < poutbatter) {
    v = v + c(1, 0, 0)                     # out
  } else if (x < poutbatter + pbbbatter + p1bbatter) {
    v = v + c(0, 1, 0)                     # walk or single: runner on first
  } else if (x < poutbatter + pbbbatter + p1bbatter + p2bbatter) {
    v = v + c(0, 2, 0)                     # double: runner on second
  } else {
    v = v + c(0, 0, 1)                     # home run
  }
  v
}

firstresult = function(x, v, poutbatter, pbbbatter, p1bbatter, p2bbatter, p3bbatter, phrbatter) {
  if (x < poutbatter) {
    v = v + c(1, 0, 0)
  } else if (x < poutbatter + pbbbatter + p1bbatter) {
    v[2] = firstsecond                     # walk or single: runner moves to second
  } else if (x < poutbatter + pbbbatter + p1bbatter + p2bbatter) {
    v = v + c(0, 0, 1); v[2] = second      # double scores the runner
  } else {
    v = v + c(0, 0, 2); v[2] = empty       # home run
  }
  v
}

secondresult = function(x, v, poutbatter, pbbbatter, p1bbatter, p2bbatter, p3bbatter, phrbatter) {
  if (x < poutbatter) {
    v[1] = v[1] + 1
  } else if (x < poutbatter + pbbbatter) {
    v[2] = firstsecond                     # walk: runner on second holds
  } else if (x < poutbatter + pbbbatter + p1bbatter) {
    v[2] = first; v[3] = v[3] + 1          # single scores the runner
  } else if (x < poutbatter + pbbbatter + p1bbatter + p2bbatter) {
    v[2] = second; v[3] = v[3] + 1         # double scores the runner
  } else {
    v[2] = empty; v[3] = v[3] + 2          # home run
  }
  v
}

firstsecondresult = function(x, v, poutbatter, pbbbatter, p1bbatter, p2bbatter, p3bbatter, phrbatter) {
  if (x < poutbatter) {
    v[1] = v[1] + 1
  } else if (x < poutbatter + pbbbatter) {
    v[2] = loaded                          # walk loads the bases
  } else if (x < poutbatter + pbbbatter + p1bbatter) {
    v[2] = firstsecond; v[3] = v[3] + 1    # single scores the runner from second
  } else if (x < poutbatter + pbbbatter + p1bbatter + p2bbatter) {
    v[2] = second; v[3] = v[3] + 2         # double scores both runners
  } else {
    v[2] = empty; v[3] = v[3] + 3          # home run clears the bases
  }
  v
}

loadedresult = function(x, v, poutbatter, pbbbatter, p1bbatter, p2bbatter, p3bbatter, phrbatter) {
  if (x < poutbatter) {
    v[1] = v[1] + 1
  } else if (x < poutbatter + pbbbatter) {
    v[3] = v[3] + 1                        # walk forces in a run; bases stay loaded
  } else if (x < poutbatter + pbbbatter + p1bbatter) {
    v[2] = firstsecond; v[3] = v[3] + 2    # single scores two
  } else if (x < poutbatter + pbbbatter + p1bbatter + p2bbatter) {
    v[2] = second; v[3] = v[3] + 3         # double scores three
  } else {
    v[2] = empty; v[3] = v[3] + 4          # grand slam
  }
  v
}

# Dispatch on the current base situation.
batterresult = function(x, v, poutbatter, pbbbatter, p1bbatter, p2bbatter, p3bbatter, phrbatter) {
  if (v[2] == empty) {
    emptyresult(x, v, poutbatter, pbbbatter, p1bbatter, p2bbatter, p3bbatter, phrbatter)
  } else if (v[2] == first) {
    firstresult(x, v, poutbatter, pbbbatter, p1bbatter, p2bbatter, p3bbatter, phrbatter)
  } else if (v[2] == second) {
    secondresult(x, v, poutbatter, pbbbatter, p1bbatter, p2bbatter, p3bbatter, phrbatter)
  } else if (v[2] == firstsecond) {
    firstsecondresult(x, v, poutbatter, pbbbatter, p1bbatter, p2bbatter, p3bbatter, phrbatter)
  } else {
    loadedresult(x, v, poutbatter, pbbbatter, p1bbatter, p2bbatter, p3bbatter, phrbatter)
  }
}

# Play one inning with batter m leading off; return the runs scored.
inningresult = function(m, v) {
  n = m
  while (v[1] < 3) {
    v = batterresult(runif(1), v, pout[n], pbb[n], p1b[n], p2b[n], p3b[n], phr[n])
    n = modish(n + 1)
  }
  v[3]
}

# Average runs per inning over `runs` simulated innings with batter n leading off.
avgrunsinning = function(n, runs) {
  x = 0
  for (i in 1:runs) {
    x = x + inningresult(n, c(0, 0, 0))
  }
  x / runs
}

# One average for each of the nine possible leadoff hitters.
avgrunsbyleadoffhitter = function(runs) {
  sapply(1:9, avgrunsinning, runs = runs)
}

avgrunsbyleadoffhitter(NReps)

We then take the dot product of this vector and the vector produced in the previous section to yield the expected number of runs scored in a standard game.
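The combination step is a single dot product. In the sketch below, both input vectors are invented for illustration (they are not outputs of the code above); the first plays the role of the expected-leadoff vector from Methodology – I and the second the runs-per-inning vector from this section.

```r
# Hypothetical inputs for a nine-man lineup; not values from the paper.
times_leadoff   <- c(1.9, 1.0, 0.9, 1.0, 1.0, 1.0, 0.8, 0.7, 0.7)   # sums to 9 innings
runs_by_leadoff <- c(0.60, 0.62, 0.61, 0.57, 0.52, 0.50, 0.49, 0.53, 0.58)

# Expected runs per game = sum over lineup spots of
# (expected innings led off) * (expected runs in an inning led off by that spot).
runs_per_game <- sum(times_leadoff * runs_by_leadoff)
print(runs_per_game)
```

Since the weights sum to nine innings, the result always lies between nine times the smallest and nine times the largest per-inning value.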

Results:

Our first simulation determined the expected number of runs scored by an average

National League lineup—that is, the first batter had statistics averaged from all National

League batters in the leadoff position, the second batter for the second lineup position,

and so on for all nine positions. For this simulation8 (and all further simulations involving the National League), we used the calculated values of px2B = .3455, px3B = .0732, and

pxHR = .5813. The following statistics were used for this lineup:

2007 National League Lineup

Position   AVG     OBP     SLG
1          0.276   0.339   0.413
2          0.276   0.338   0.417
3          0.288   0.370   0.490
4          0.279   0.367   0.496
5          0.272   0.342   0.456
6          0.273   0.338   0.446
7          0.263   0.323   0.409
8          0.252   0.321   0.378
9          0.184   0.236   0.265

8 All statistics used in this session were obtained from www.baseball-reference.com.


According to our simulation, the expected number of runs scored by that lineup

would be 5.172 runs per game. We next ran that analysis based on the average 2007

American League statistics, setting px2B = .3444, px3B = .0674, and pxHR = .5882 for

this (and all further American League simulations). The table below shows the

aforementioned lineup statistics:

2007 American League Lineup

Position   AVG     OBP     SLG
1          0.282   0.348   0.413
2          0.281   0.340   0.415
3          0.281   0.356   0.468
4          0.284   0.359   0.487
5          0.276   0.345   0.459
6          0.266   0.328   0.431
7          0.261   0.320   0.414
8          0.251   0.312   0.383
9          0.251   0.305   0.372

Our simulation projected that lineup to score 5.351 runs per game.

We next decided to see how an alternate arrangement of these same batters would

affect expected runs scored. In 1964, Earnshaw Cook published a book entitled

Percentage Baseball which suggested that a team would increase the number of runs scored by rearranging its lineup in the following order (where the number represents the original lineup position): 34512789.9 We re-ordered the above lineups using Cook’s

suggestion and ran the simulations again. The new National League lineup scored an

average of 5.189 runs per game, while the new American League lineup averaged 5.361

runs per game. That showed an improvement of .017 runs per game for the National

League and .010 runs per game for the American League, which would average to 2.754

and 1.620 runs, respectively, over the course of a 162-game season. We next did a more

9 Cited in Thorn, John, and Pete Palmer. 1985. The Hidden Game of Baseball. Garden City, New York: Doubleday & Company, Inc.


subtle lineup re-ordering for the National League, inspired by current St. Louis Cardinals manager Tony La Russa, which consisted of placing the pitcher in the number 8 lineup position (the pitcher is almost always the weakest hitter and thus usually hits last). We simply

swapped hitters 8 and 9 in the National League lineup; the new arrangement averaged

5.192 runs per game, which was a full .003 runs per game greater than the Cook lineup

for an additional .486 runs per 162 games!
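The per-season figures quoted above are the per-game differences scaled by 162 games. A short sketch reproducing that arithmetic from the reported runs-per-game values (the vector names are illustrative only):

```r
games <- 162

# Runs per game reported above: reordered lineup minus its comparison lineup.
per_game_gain <- c(NL_Cook = 5.189 - 5.172,
                   AL_Cook = 5.361 - 5.351,
                   NL_pitcher_8th_vs_Cook = 5.192 - 5.189)

season_gain <- round(per_game_gain * games, 3)
print(season_gain)
```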

The above lineups, however, are of dubious value because they do not actually

deal with specific players. Real managers do not deal with league-average players, but

instead deal with a group of players who have varying skill sets. All the remaining simulations

deal with real teams and real players. The first of these lineups is the 2006 St. Louis

Cardinals. The following is the lineup that the Cardinals used in their World Series

clinching game 5 victory over the Detroit Tigers:

2006 Cardinals’ World Series Game 5 Lineup

Player            AVG     OBP     SLG
David Eckstein    0.292   0.350   0.344
Chris Duncan      0.293   0.363   0.589
Albert Pujols     0.331   0.431   0.671
Jim Edmonds       0.257   0.350   0.471
Scott Rolen       0.296   0.369   0.518
Ron Belliard10    0.237   0.295   0.371
Yadier Molina     0.216   0.274   0.321
So Taguchi        0.266   0.335   0.351
Pitcher11         0.173   0.224   0.222

This lineup averaged 5.402 runs per game. We then, however, decided to

investigate the worth of having David Eckstein bat leadoff. Baseball analysts consider

David Eckstein to be an extremely over-rated baseball player who is a fan-favorite

because he is 5’6”, 170 lb. and has a reputation of being “scrappy.” They also note that

10 Ron Belliard’s statistics are only from his games in the National League.

11 The pitcher’s statistics are the average statistics from all Cardinals pitchers in 2006.

his low slugging percentage hurts his team, and argue that his on-base percentage is not high enough to justify batting him leadoff. We then ran the simulation again, swapping

Eckstein with Scott Rolen, who had both a better on-base percentage and a better slugging percentage. The drawback, however, was that Eckstein’s low slugging percentage would not drive in as many people in front of him. This alternate lineup ended up scoring 5.362 runs per game, for a decrease of .040 runs per game and 6.480 runs per

162 games.

Our final batch of simulations pertained to the 2007 Chicago Cubs.12 Their most-used lineup that season was the following (with statistics from the 2007 season):

2007 Cubs Most Used Lineup

Player            AVG     OBP     SLG
Alfonso Soriano   0.299   0.337   0.560
Ryan Theriot      0.266   0.326   0.346
Derrek Lee        0.317   0.400   0.513
Aramis Ramirez    0.310   0.366   0.549
Cliff Floyd       0.284   0.373   0.422
Mark DeRosa       0.293   0.371   0.420
Jacque Jones      0.285   0.335   0.400
Jason Kendall13   0.270   0.362   0.356
Pitcher14         0.155   0.167   0.207

This lineup averaged 5.288 runs per game. The second hitter, however, Ryan

“The Riot” Theriot, a fan favorite, shares many qualities with David Eckstein: he only weighs 175 lb. and has a reputation for being scrappy (though he is 5’11”); however, his on-base percentage is terrible. We suspected that swapping Ryan Theriot with Mark

DeRosa would yield a more productive lineup. We were correct—this lineup scored

12 The Chicago Cubs are always a source of heartbreak for one of us (Orr) and humor for the other (Singh). The latter takes pleasure in the former’s sadness and the former in the latter’s relative ignorance of baseball.

13 Kendall’s statistics are only from his games in the National League.

14 The pitcher line is an average of all Cubs pitchers for the 2007 season.


5.303 runs per game, for a .015 run per game improvement and a whopping 2.43

additional runs per 162 games.

We felt, however, that the Cubs’ lineup could be completely re-worked. For

instance, we simply do not see Alfonso Soriano as a good leadoff hitter. His offensive

value comes from his .560 slugging percentage which is wasted if he bats with no one on

base. Similarly, he does not get on base very often for other hitters because of his .337

on-base percentage. Therefore, we propose an alternate lineup:

2007 Cubs Proposed Lineup

Player            AVG     OBP     SLG
Mark DeRosa       0.293   0.371   0.420
Derrek Lee        0.317   0.400   0.513
Aramis Ramirez    0.310   0.366   0.549
Alfonso Soriano   0.299   0.337   0.560
Cliff Floyd       0.284   0.373   0.422
Jason Kendall15   0.270   0.362   0.356
Jacque Jones      0.285   0.335   0.400
Ryan Theriot      0.266   0.326   0.346
Pitcher16         0.155   0.167   0.207

This lineup scored 5.367 runs per game, an improvement of .079 runs per game and a full 12.798 runs per 162 games over the original Cubs lineup. That would actually imply one additional win during the course of the regular season.17

Conclusions:

We are reluctant to draw any significant conclusions from our results for

three main reasons. Firstly, we did not anticipate how long it would take to run a

single round of simulations. That length forced us to carefully decide what questions we would investigate, each time not sure if the outcome would be anything significant.

15 Kendall’s statistics are only from his games in the National League.

16 The pitcher line is an average of all Cubs pitchers for the 2007 season.

17 Empirically, ten additional runs scored or ten fewer runs allowed has corresponded to an additional win by a team over the course of a season.


Secondly, our relative inexperience with R coupled with the inefficiency of our code

means that we do not know how much the values obtained through the simulation vary,

nor do we possess information about the standard deviation of runs scored by various

lineups, which might alter our conclusions. Lastly, our model has too many assumptions

to be able to take any of our results with anything more than a grain of salt. For instance,

baserunning is an important skill that we left out; in addition to stealing bases, some

players can advance farther on hits (or outs, for that matter) than other hitters.

Furthermore, some players are more prone to hitting into double plays than others, but

that, of course, also depends on the players in front of a hitter in the lineup. Thus, we are

hesitant to overstate our conclusions.
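One way the simulation variability mentioned above could be quantified is with the Monte Carlo standard error of the mean. The sketch below is only an illustration of that calculation: it uses Poisson draws as a stand-in for simulated runs per inning (an assumption made purely so the example is self-contained) rather than the inning simulator from Methodology – II.

```r
set.seed(1)

# Stand-in for simulated runs per inning: Poisson draws with mean 0.55.
# This is NOT the paper's inning simulator, only a placeholder on a similar scale.
runs <- rpois(500000, lambda = 0.55)

est <- mean(runs)
se  <- sd(runs) / sqrt(length(runs))    # Monte Carlo standard error of the mean
ci  <- est + c(-1.96, 1.96) * se        # approximate 95% interval

print(c(estimate = est, std_error = se))
print(ci)
```

In practice, the same three lines would be applied to the vector of per-inning run totals saved from the real simulation, so that differences between lineups could be compared against the size of the simulation noise.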

It is also relevant to note that, with the exception of our proposed Cubs lineup and

perhaps the swapped Cardinals lineup, none of the results are significant enough to have

any noticeable effect on the outcome of the season. This suggests that, unless a manager does something as questionable as Lou Piniella did with the 2007 Cubs, investigating lineup improvements is possibly not worth the effort it entails, since managers take many additional pieces of information into account when they arrange a lineup. About the only things we can safely conclude from this test are that players who reach base are more valuable in front of players who hit for power, that players who hit for power are more valuable following players who reach base more often, and that a truly bad arrangement, like that of the 2007 Cubs, really does hinder the offense. Aside from these caveats, however, we do believe that we have developed a useful tool for analyzing batting orders (using a fairly large set of assumptions to simplify the game). Hopefully, our work will be

expanded in the future using more efficient code to reduce, or even eliminate, some of these assumptions so that the results obtained are more realistic.
