<<

Beyond the *

Adjusting for performance inflation in professional sports

Alexander M. Petersen, IMT Lucca, Lucca, Italy

Sunday, August 5, 2012

Bridging the past and the present

1. Method for “deflating” achievement metrics

2. The Statistical Physics of Achievement

3. Re-ranking The All-Time Greats

1. Establishing a baseline by removing trends

Time series in social and natural phenomena are typically non-stationary: there are underlying exogenous and endogenous factors that can fluctuate significantly.

Financial Market Activity

[Figure (FIG. 7 of Liu et al.): the 1-min interval intraday pattern for absolute price changes of the S&P 500 stock index (1984-1996) and for the absolute price changes averaged over the chosen 500 companies (1994–1995). The length of the day is 390 minutes. Both curves have a similar pattern, with large values within the first 15 min after the market opens.]

[Figure (FIG. 8 of Liu et al.): (a) Semilog plot of the autocorrelation function of g(t). (b) Autocorrelation function of |g(t)| in the log plot, with sampling time interval Δt = 1 min. The autocorrelation function of g(t) decays exponentially to zero within half an hour, C(t) ~ exp(−t/τ) with τ ≈ 4.0 min. A power law correlation C(t) ~ t^{−γ} exists in |g(t)| for more than three decades. Both graphs are truncated at the first zero value of C(t). The solid line in (b) is the fit to the function 1/(1 + t^γ), from which we obtain γ = 0.30 ± 0.08. The horizontal dashed line indicates the noise level.]

Excerpt (Liu et al., "Intraday pattern removal"): It is known that there exist intraday patterns of market activity in the NYSE and the S&P 500 index [23–25,42]. A possible explanation is that information gathers during the time of closure and hence traders are active near the opening hours, and many liquidity traders are active near the closing hours [25]. We find a similar intraday pattern in the absolute price changes |G(t)| (Fig. 7). In order to quantify the correlations in absolute price changes, it is important to remove this trend, lest there might be spurious correlations. The intraday pattern A(t_day), where t_day denotes the time in a day, is defined as the average of the absolute price change at time t_day of the day for all days:

$$A(t_{\rm day}) \equiv \frac{1}{N} \sum_{j=1}^{N} \left| G^{j}(t_{\rm day}) \right| , \qquad (9)$$

where the index j runs over all the trading days N in the 13-year period (N = 3309 in our study). To avoid effects caused by this daily oscillation, we remove the intraday pattern from G(t), which we schematically write as

$$g(t) \equiv G(t_{\rm day}) / A(t_{\rm day}) \qquad (10)$$

for all days. Each data point g(t) denotes the normalized absolute price change at time t, which is computed by dividing each point G(t_day) at time t_day of the day by A(t_day) for all days.

Figure 8(a) shows the autocorrelation function of the normalized price changes g(t), which shows exponential decay with a characteristic time of the order of 4 min. However, the autocorrelation function of |g(t)| has power law decay, with long persistence up to several months, Fig. 8(b). More accurate results are obtained by the power spectrum [Fig. 9(a)], which shows that the data fit not one but rather two separate power laws: S(f) ~ f^{−β1} for f > f× and S(f) ~ f^{−β2} for f < f×, where β1 = 0.31 ± 0.02 (11), β2 = 0.90 ± 0.04 (12), and f× = 1/570 min^{−1} (13).

Y. Liu et al., The Statistical Properties of the Volatility of Price Fluctuations, Phys. Rev. E 60, 1390-1400 (1999).

El Niño and La Niña

The instrumental record goes back to 1882. Paleo evidence suggests that El Niños have occurred for millions of years.

Courtesy of William S. Kessler, NOAA / Pacific Marine Environmental Laboratory.

Culture and Language

[Figure: word-use statistics for several languages (English, French, German, Hebrew, Russian, Spanish, Chinese). Panels show the word-frequency distribution P(f), the fluctuation scale σ(t) and the vocabulary size as functions of corpus size N_u(t), and time series over 1850–2000 with WWI and WWII marked.]

A. M. Petersen, J. Tenenbaum, S. Havlin, H. E. Stanley. Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death. Scientific Reports 2, 313 (2012).

Accounting for Inflation

Just as the price of a candy bar has increased by a factor of ~ 20 over the last 100 years (roughly 3% inflation rate), the home run hitting ability of players has also increased by a significant factor over the same period.

[Figure: home run prowess versus year t, 1900–2000 (vertical axis 0 to 0.04). Annotations: end of dead-ball era, emergence of “Ruthian” power hitters; raising the mound (’62); lowering the mound (’69); PED era.]

Excerpt from The European Physical Journal B, p. 72:

Fig. 4. A comparison of traditional and detrended league averages demonstrates the utility of the detrending method. Annual per-player averages for (A) strikeouts, (B) detrended strikeouts (for pitchers), (C) home runs, and (D) detrended home runs (for batters). The detrended home run average is remarkably constant over the 90-year “modern era” period 1920–2009; however, there remains a negative trend in the detrended strikeout average. This residual trend in the strikeout average may result from the decreasing role of starters (resulting in shorter stints) and the increased role of the bullpen relievers, which affects the average number of opportunities obtained for players in a given season. This follows from the definition of the detrended average given by equation (10). A second detrending for average innings pitched per game might remove this residual trend, demonstrated in Figure 5. The sharp negative fluctuations in 1981 and 1994–1995 correspond to player strikes resulting in season stoppage and a reduced average number of opportunities ⟨y(t)⟩ for these seasons.

[...] removed (detrended) by normalizing accomplishments by the average prowess for a given season.

We first calculate the prowess P_i(t) of an individual player i as

$$P_i(t) \equiv x_i(t) / y_i(t) , \qquad (4)$$

where x_i(t) is an individual's total number of successes out of his/her total number of opportunities y_i(t) in a given year t. To compute the league-wide average prowess, we then compute the weighted average for season t over all players

$$\langle P(t) \rangle \equiv \frac{\sum_i x_i(t)}{\sum_i y_i(t)} = \sum_i w_i(t) P_i(t) , \qquad (5)$$

where

$$w_i(t) = \frac{y_i(t)}{\sum_i y_i(t)} . \qquad (6)$$

The index i runs over all players with at least a minimum number of opportunities during year t, and \sum_i y_i is the total number of opportunities of all N(t) players during year t. We use a cutoff of 100 opportunities, which eliminates statistical fluctuations that arise from players with very short seasons.

We now introduce the detrended metric for the accomplishment of player i in year t,

$$x_i^D(t) \equiv x_i(t) \, \frac{\overline{P}}{\langle P(t) \rangle} , \qquad (7)$$

where \overline{P} is the average of ⟨P(t)⟩ over the entire period,

$$\overline{P} \equiv \frac{1}{110} \sum_{t=1900}^{2009} \langle P(t) \rangle . \qquad (8)$$

The choice of normalizing with respect to \overline{P} is arbitrary, and we could just as well normalize with respect to P(2000), placing all values in terms of current “2000 US dollars”, as is typically done in economics.

In Figure 4 we compare the seasonal average ⟨x(t)⟩ to the prowess-weighted average ⟨x^D(t)⟩, for strikeouts per player and home runs per player. We define ⟨x(t)⟩ as

$$\langle x(t) \rangle \equiv \frac{1}{N(t)} \sum_i x_i(t) = \langle P(t) \rangle \, \frac{\sum_i y_i(t)}{N(t)} = \langle P(t) \rangle \langle y(t) \rangle \qquad (9)$$

and ⟨x^D(t)⟩ as

$$\langle x^D(t) \rangle \equiv \frac{1}{N(t)} \sum_i x_i^D(t) = \frac{\overline{P}}{\langle P(t) \rangle N(t)} \sum_i x_i(t) = \overline{P} \, \frac{\langle x(t) \rangle}{\langle P(t) \rangle} = \overline{P} \langle y(t) \rangle . \qquad (10)$$
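The detrending steps in equations (4) to (8) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `league_prowess` and `detrend`, the record layout, and the toy numbers are all hypothetical, and the baseline is taken as the mean over whichever seasons appear in the input rather than over 1900–2009.

```python
from collections import defaultdict

MIN_OPPORTUNITIES = 100  # cutoff quoted in the source for the league average


def league_prowess(records, min_opportunities=MIN_OPPORTUNITIES):
    """Return {year: <P(t)>}, the opportunity-weighted league success rate.

    records: iterable of (player, year, successes, opportunities).
    Summing successes and opportunities separately is equivalent to the
    weighted average sum_i w_i(t) P_i(t) with w_i = y_i / sum_i y_i.
    """
    total_x = defaultdict(float)
    total_y = defaultdict(float)
    for _player, year, x, y in records:
        if y >= min_opportunities:  # short seasons are left out of the average
            total_x[year] += x
            total_y[year] += y
    return {year: total_x[year] / total_y[year] for year in total_y}


def detrend(records, min_opportunities=MIN_OPPORTUNITIES):
    """Return [(player, year, x_D)] with x_D = x * P_bar / <P(t)>."""
    prowess = league_prowess(records, min_opportunities)
    if not prowess:
        return []
    # P_bar: mean of <P(t)> over the seasons present (an assumption of this sketch)
    baseline = sum(prowess.values()) / len(prowess)
    return [
        (player, year, x * baseline / prowess[year])
        for player, year, x, _y in records
        if prowess.get(year, 0) > 0  # skip seasons without a usable average
    ]


# Hypothetical toy data, not taken from the Lahman database.
records = [
    ("a", 1920, 10, 400),
    ("b", 1920, 2, 200),
    ("c", 1920, 1, 50),   # below the cutoff: excluded from <P(1920)>
    ("a", 2000, 40, 500),
    ("d", 2000, 20, 500),
]
prowess = league_prowess(records)
detrended = detrend(records)
```

In this toy data the league prowess triples between the two seasons, so the 10 successes of player "a" in the low-prowess season are scaled up while the 40 successes in the high-prowess season are scaled down. Players below the cutoff are left out of the league average but are still detrended, which is one possible reading of the cutoff and is a design choice of this sketch.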

Detrending method

Time-dependent economic, technological, and social factors can artificially inflate or deflate quantitative measures for season and career achievement.

x_i(t) = # of successes

y_i(t) = # of opportunities

P_i(t) = x/y = success rate

B. Quantifying Average Prowess

We define prowess as an individual player’s ability to achieve a success x (e.g. a home run, strikeout) in any given opportunity y (e.g. an AB or IPO). In Fig. 4 we plot the average annual prowess for strikeouts (pitchers) and home runs (batters) over the 133-year period 1876-2009 in order to investigate the evolution of player ability in Major League Baseball. The average prowess serves as an index for comparing accomplishments in distinct years. We conjecture that the changes in the average prowess are related to league-wide factors which can be quantitatively removed (detrended) by normalizing accomplishments by the average prowess for a given season.

FIG. 4: Success rates reflect the time-dependent factors that can inflate or deflate measures of success. The annual prowess for (A) strikeouts (K), ⟨P_K(t)⟩, and (B) home runs (HR), ⟨P_HR(t)⟩, plotted for 1880–2000, is calculated using Eq. (6). Prowess is a weighted measure of average league-wide ability, with more active players having a larger statistical weight than less active players in the calculation of the prowess value ⟨P⟩. (B) We also plot the average values \overline{P}_1 and \overline{P}_s of P_HR(t) over the 16-year periods Y_1 ≡ 1978–1993 and Y_s ≡ 1994–2009, where the latter period roughly corresponds to the “steroids” era. We calculate \overline{P}_1 = 0.025 ± 0.003 and \overline{P}_s = 0.033 ± 0.002, and find \overline{P}_1 < \overline{P}_s at the 0.005 confidence level. For the difference Δ ≡ \overline{P}_s − \overline{P}_1, we calculate the confidence interval 0.005 < Δ < 0.010 at the 0.01 confidence level.

A. M. Petersen, O. Penner, H. E. Stanley, “Methods for detrending success metrics to account for inflationary and deflationary factors.” Eur. Phys. J. B 79, 67-78 (2011).

Accounting for socio-technological factors that underlie achievement

[Figure: league averages before and after applying the detrending method. (A) League average ⟨Pts⟩ and (B) detrended league average ⟨Pts^D⟩, 1950–2010; the detrended statistical baseline is ≈ 600 pts / season. Baseball: (C) league average ⟨HR⟩ and (D) detrended league average ⟨HR^D⟩, 1920–2000; the detrended statistical baseline is ≈ 7 HRs / season.]

Quantitative measures for success are important for comparing both individual and group accomplishments, often achieved in different time periods.

However, the evolutionary nature of competition results in a non-stationary rate of success, which makes comparing accomplishments across time statistically biased.

While there is much speculation and controversy surrounding the causes for changes in player ability, we do not address these individually. In essence, we blindly account for not only the role of PED, but also changes in the physical construction of bats and balls, sizes of ballparks, talent dilution of players from expansion, etc.
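As a rough worked illustration of what the rescaling does to a single season total: the two period averages below are the home run prowess values quoted in the source figure caption for 1978–1993 (0.025) and 1994–2009 (0.033). The 50 home run season is a hypothetical input, and using the 1978–1993 average as the baseline is an arbitrary choice made only for this example.

```latex
x^{D} \;=\; x \,\frac{\overline{P}_1}{\overline{P}_s}
      \;=\; 50 \times \frac{0.025}{0.033}
      \;\approx\; 37.9
```

Under these assumptions, a 50 home run season at the 1994–2009 league prowess carries roughly the same relative weight as a 38 home run season at the 1978–1993 league prowess.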

2. The “Socio-physics” of Careers

Statistical Physics approach to understanding longevity and success in competition driven systems

[Figure: log-log plot of the probability density P(x) versus career longevity x, with x ranging from 10^0 to 10^5 and P(x) from 10^-7 to 10^-1.]


Not surprisingly, player height is governed by a standard bell-shaped distribution.

The ratio of the tallest baseball player (Jon Rauch, 6 feet 11 inches) to the shortest baseball player (Eddie Gaedel, 3 feet 7 inches) is roughly 2.

The relatively small value of this height ratio follows from the properties of the Gaussian distribution, which is well-suited for the description of height in a human population.

.... but how about the longest and shortest MLB career?

[Figure: P(Height) versus player height (feet-inches), from 5-06 to 6-06, with 95% of players falling between −2σ and +2σ.]

FIG. 3: A demonstration of a probability density function that has a characteristic scale. The pdf of Major League Baseball player height. The data are fit well by a Gaussian “bell-curve” pdf (dashed line) with an average height of 6.0 feet ± 2 inches. Data courtesy of baseball-almanac.com, accessed at: http://www.baseball-almanac.com/charts/heights/heights.shtml

Excerpt from the source paper, p. 5:

[...] sociology, meteorology, and medicine. Detrending with respect to price inflation in economics is commonly referred to as “deflation” and relies on a consumer price index (CPI). The CPI allows one to properly compare the cost of a candy bar in 1920 dollars to the cost of a candy bar in 2010 dollars. In stock market analysis, one typically detrends intraday volatility by removing the intraday trading pattern corresponding to relatively high market activity at the beginning and end of the market day, which results in a daily activity trend that is “U-shaped”. In meteorology, trends are typically cyclical, corresponding to daily, lunar, and annual patterns, and even super-annual patterns as in the case of the El Niño effect. Cyclical trends are also encountered in biological systems, as in the case of protein concentration fluctuations over cell life cycles. In baseball, the trends that we will analyze are those that are associated with player performance ability, or prowess.

It is common for paradigm shifts to change the nature of business and the patterns of success in competitive professions. Baseball has many examples of paradigm shifts, since the game has changed radically since its conception over a century ago. As a result, the relative value of accomplishments depends on the underlying time period. For example, although a home run will always be a home run, and a strikeout will always be a strikeout, the rate at which these two events occur has changed drastically over time.

A relevant historical example is the case of Babe Ruth. Before Ruth, home runs were much less frequent than they are in 2010. However, following changes in the rule set accompanied by Babe Ruth’s success in the 1920’s, many sluggers emerged that are summarily remembered for their home run prowess. The main time-dependent aspect we consider in this paper is the variation in relative player ability, a generic concept that can be easily applied to other professions. Ref. [9] finds clear evidence for non-stationarity in the seasonal home-run ability, both on the career and the seasonal level. By comparing the pdfs for career home runs for players belonging to either the 1920–1960 or the 1960–2000 periods, it is shown that the pdf for career home runs is shifted towards larger totals in the more recent 1960–2000 period. Moreover, by comparing the pdfs for seasonal home-run ability for players belonging to one of the three periods 1940–1959, 1960–1979, or 1980–2006, it is shown that at the fundamental seasonal time-scale, the home-run rates among players are also changing, where the pdf is becoming more right-skewed with time. These results show why it is important to account for the era-dependence of statistics when comparing career statistical totals.

Yet, this is not the only time-dependent factor that we consider. By detrending, we remove the net trend resulting from many underlying factors, season by season, which allows the proper (statistical) comparison of contemporary players to players of yore (of lore). A significant result of this paper is that detrending for seasonal prowess maintains the overall pdf of success while re-ordering the ranking of player achievements locally. This means that the emergence of the right-skewed pdfs for longevity and success is not due to changes in player ability, but rather results from the fundamental nature of competition.

The idea behind detrending is relatively straightforward. By calculating the average prowess of all players in a given season, we effectively renormalize all statistical accomplishments to the typical prowess of all contemporaneous competitors. Hence, detrending establishes relative significance levels, such that hitting fifty home runs was of less relative significance during the “Steroids Era” than hitting fifty home runs during the 1920’s. The objective of this work is to calculate the detrended statistics of a player’s whole career. To this end, we compare career metrics that take into account the time-dependence of league-wide player ability. While there is much speculation and controversy surrounding the causes for changes in player ability, we do not address these individually. In essence, we blindly account for not only the role of PED [22–29], but also changes in the physical construction of bats and balls, sizes of ballparks, talent dilution of players from expansion [30, 31], etc.

II. MATERIALS AND METHODS

A. Data

We analyze historical Major League Baseball (MLB) player data compiled and made publicly available by Sean Lahman [7]. The Lahman Baseball Database is updated at the end of each year, and has player data dating back to 1871. In total, this database records approximately 35,000 player seasons and approximately 17,000 individual player careers.

Career longevity distribution

[Figure: log-log plots of the pdf P(x) versus career length x measured in opportunities, with x ranging from 10^0 to 10^5 and P(x) from 10^-6 to 10^0; each panel includes a guide line labeled with the value 1. (A) Pitchers, careers ending 1920-2009, x = IPO, ⟨x⟩ ± σ = 1300 ± 2100. (B) Batters, x = PA and x = AB, ⟨x⟩ = 950.]
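Because career lengths span four to five orders of magnitude, a histogram with equal-width bins is a poor way to estimate P(x) for this kind of data; logarithmically spaced bins are a common alternative. The sketch below is a generic illustration and is not claimed to be the procedure used for the figure. The function name `log_binned_pdf`, the `bins_per_decade` parameter, and the sample values are assumptions made for this example.

```python
import math


def log_binned_pdf(values, bins_per_decade=4):
    """Estimate a pdf on logarithmically spaced bins.

    Returns a list of (bin_low, bin_high, density) tuples, where density is
    the fraction of observations in the bin divided by the bin width, so that
    sum(density * width) over all bins equals 1.
    """
    values = [v for v in values if v > 0]  # log bins need positive values
    n = len(values)
    counts = {}
    for v in values:
        k = math.floor(math.log10(v) * bins_per_decade)  # bin index
        counts[k] = counts.get(k, 0) + 1
    pdf = []
    for k in sorted(counts):
        low = 10 ** (k / bins_per_decade)
        high = 10 ** ((k + 1) / bins_per_decade)
        pdf.append((low, high, counts[k] / (n * (high - low))))
    return pdf


# Hypothetical career lengths in opportunities (illustrative values only).
sample = [1, 2, 3, 10, 50, 200, 1500, 14053]
pdf = log_binned_pdf(sample)
```

Dividing each count by the bin width is what turns the histogram into a density: wide bins at large x would otherwise look artificially tall on a log-log plot.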

Career longevity distribution

[Figure: log-log plots of the pdf P(x) versus career length x measured in opportunities. (A) Pitchers, x = IPO, ⟨x⟩ = 1300; annotated 3%, x_mp = 3. (B) Batters, x = PA and x = AB, ⟨x⟩ = 950; annotated 3%, x_mp = 1. Both distributions are extremely right-skewed!]

In order to emphasize the disparity between the long and short careers, consider the ratio of the longest career (Pete Rose, 14,053 at-bats) to the shortest career (many individuals with one at-bat), which is roughly 1 × 10^4. For comparison, the ratio of the tallest baseball player (Jon Rauch, 6 feet 11 inches) to the shortest baseball player (Eddie Gaedel, 3 feet 7 inches) is roughly 2.

The probability density function (pdf) P(x) is defined so that the probability of observing an event in the interval (x, x + δx) is P(x)δx.

REPORTS (excerpt from a journal page shown on the slide):

[...] ing systems form a huge genetic network whose vertices are proteins and genes, the chemical interactions between them representing edges (2). At a different organizational level, a large network is formed by the nervous system, whose vertices are the nerve cells, connected by axons (3). But equally complex networks occur in social science, where vertices are individuals or organizations and the edges are the social interactions between them (4), or in the World Wide Web (WWW), whose vertices are HTML documents connected by links pointing from one page to another (5, 6). Because of their large size and the complexity of their interactions, the topology of these networks is largely unknown.

Traditionally, networks of complex topology have been described with the random graph theory of Erdős and Rényi (ER) (7), but in the absence of data on large networks, the predictions of the ER theory were rarely tested in the real world. However, driven by the computerization of data acquisition, such topological information is increasingly available, raising the possibility of understanding the dynamical and topological stability of large networks.

Here we report on the existence of a high [...] these two ingredients, we show that they are responsible for the power-law scaling observed in real networks. Finally, we argue that these ingredients play an easily identifiable and important role in the formation of many complex systems, which implies that our results are relevant to a large class of networks observed in nature.

Although there are many systems that form complex networks, detailed topological data is available for only a few. The collaboration graph of movie actors represents a well-documented example of a social network. Each actor is represented by a vertex, two actors being connected if they were cast together in the same movie. The probability that an actor has k links (characterizing his or her popularity) has a power-law tail for large k, following P(k) ~ k^{−γ_actor}, where γ_actor = 2.3 ± 0.1 (Fig. 1A). A more complex network with over 800 million vertices (8) is the WWW, where a vertex is a document and the edges are the links pointing from one document to another. The topology of this graph determines the Web’s connectivity and, consequently, our effectiveness in locating information on the WWW (5). Information about P(k) can be obtained using robots (6), indicating that the probability that k documents [...] cited in a paper. Recently Redner (11) has shown that the probability that a paper is cited k times (representing the connectivity of a paper within the network) follows a power law with exponent γ_cite = 3.

The above examples (12) demonstrate that many large random networks share the common feature that the distribution of their local connectivity is free of scale, following a power law for large k with an exponent γ between 2.1 and 4, which is unexpected within the framework of the existing network models. The random graph model of ER (7) assumes that we start with N vertices and connect each pair of vertices with probability p. In the model, the probability that a vertex has k edges follows a Poisson distribution P(k) = e^{−λ} λ^k / k!, where

$$\lambda = N \binom{N-1}{k} p^{k} (1-p)^{N-1-k} .$$

In the small-world model recently introduced by Watts and Strogatz (WS) (10), N vertices form a one-dimensional lattice, each vertex being connected to its two nearest and next-nearest neighbors. With probability p, each edge is reconnected to a vertex chosen at random. The long-range [...]
However,degree driven of by self-organizationedges are the links characterizing pointing from onethe docu-pointIn to the a certainsmall-world Web model page recently follows intro- a power connections generated by this process de- the computerization of data acquisition,large-scale such propertiesment to another. of complex The topology networks. of this graphlaw,duced with by␥www Wattsϭ and2.1 StrogatzϮ 0.1 (WS) (Fig. (10 1B)), N (9). A crease the distance between the vertices, topological information is increasinglyExploring avail- severaldetermines large the databases Web’s connectivity describing and, con-networkvertices whose form topology a one-dimensional reflects the lattice, histori- leading to a small-world phenomenon (13), able, raising the possibility ofthe understanding topologysequently, of large our networks effectiveness that in locating span infor-cal patternseach vertex of urbanbeing connected and industrial to its develop- two often referred to as six degrees of separa- the dynamical and topological stabilityR EPORTS of mation on the WWW (5). Information about nearest and next-nearest neighbors. With large networks. that proposedfields to explain, inas terms of diverse sp-d bond-P(downk)as tocan about the be 5% at obtained 300 WWW K in Co/STO/ using or robots citation (6), indi-mentprobability is the electricalp,eachedgeisreconnectedtoa power grid of the west- tion (14). For p ϭ 0, the probability distri- ing, the positive polarization at the Co-ALO LSMO (4). However, other types of oxides of Here we report on the existenceinterfacepatterns (8). However, of athere high is in no general science, the-catingthe double-perovskite that we the showfamily probability (for example, that, that indepen-k documentsern Unitedvertex chosen States, at the random. 
vertices The long-range being genera- bution of the connectivities is P(k) ϭ␦(k Ϫ ory predicting the trend of the experimental Sr2FeMoO6) combine electronic properties results for Co—that is, a negative polarization similar to those of manganites with a defi- degree of self-organization characterizingwith oxides of d elements (STO, the CLO, Ta O point) nitely higher to a Curie certain temperature Web (15). Their page follows a power connections generated by this process de- dent of the2 system5 and the identity of its tors, transformers, and substations and the z), where z is the coordination number in and a positive one when there are only s and p use in magnetic tunnel junctions is promising large-scale properties of complexstates (ALO). Itnetworks. is likely that the spin polariza-law,for a newwith generation␥www of tunnelϭ junctions2.1 withϮ 0.1 (Fig. 1B) (9). A crease the distance between the vertices, tion shouldconstituents, also depend on the position of the the very probability high magnetoresistance forP room-tem-(k) that a ver- edges being to the high-voltage transmission the lattice; whereas for finite p, P(k)still Exploring several large databasesFermi level with describing respect to the electronic levelsnetworkperature applications. whose topology reflects the histori- leading to a small-world phenomenon (13), of each character above and below the gap of the topology of large networksthe insulator.tex In that addition,in span the as an evanescent networkcal patternsReferences and interacts Notes of urban withand industrialk other develop-linesoften between referred them to as (10 six). degrees Because of separa- of the rel- peaks around z,butitgetsbroader(15). A wave in an insulator is a Bloch wave with an 1. J. S. Moodera, L. R. Kinder, T. M. Wong, R. Meservey, Phys. Rev. Lett. 74, 3273 (1995). fields as diverse as the WWWimaginaryvertices wave or vector, citation one can decays expect differ-ment2. as J. 
De is Boeck, a theScience power281 electrical, 357 (1998). law, power following grid of the west-ativelytion modest (14). For sizep ϭ 0, of the the probability network, distri- contain- common feature of the ER and WS models ent decay lengths for Bloch waves of different 3. M. Sharma, S. X. Wang, J. H. Nickel, Phys. Rev. Lett. patterns in science, we showcharacter. that, This means indepen- that the finalϪ␥ polarizationern82 United, 616 (1999). States, the vertices being genera- bution of the connectivities is P(k) ϭ␦(k Ϫ P(k) k . This4. J. M. De result Teresa et al., ibid., p. indicates 4288. that large ing only 4941 vertices, the scaling region is is that the probability of finding a highly could also depend onϳ the thickness of the bar- 5. P. M. Tedrow and R. Meservey, Phys. Rev. B 7, 318 dent of the system and therier, as identity illustrated by the of calculations its of Mac-tors,(1973); transformers, R. Meservey and P. M. Tedrow, Phys. and Rep. substations and the z), where z is the coordination number in Laren et al.forFe/ZnSe/Fejunctions(14). 238, 173 (1994). networks self-organize6. M. B. Stearns, J. Magn. into Magn. Mater. a5, 187 scale-free (1997). state, less prominent but is nevertheless approxi- connected vertex (that is, a large k)decreas- constituents, the probability PThe(k influence) that of the a barrier ver- on the spinedges7. J. A. Hertz being and K. Aoi, Phys. to Rev. the B 8, 3252 high-voltage (1973). transmission the lattice; whereas for finite p, P(k)still polarization opens new ways to shape and op- 8. D. Nguyen-Mahn et al., Mat. Res. Soc. Symp. Proc. timizea the TMR. feature Interesting biasunpredicted dependencies 492, 319 (1998). by all existing random mated by a power law with an exponent es exponentially with k;thus,verticeswith tex in the network interacts with k other lines9. J. H. between Park et al., Nature 392, them 794 (1998). (10). Because of the rel- peaks around z,butitgetsbroader(15). 
A can be obtained with barriers selecting the d 10. J. Nassar, M. Hehn, A. Vaure`s, F. Petroff, A. Fert, Appl. Fig. 3. Bias dependence of the TMR ratio in (A) electrons and probing the fine structure of the Phys. Lett. 73, 698 (1998). verticesCo/STO/LSMO decays and (B) as Co/ALO/STO/LSMO a powerd-DOS, law,network as in Fig. following 3A. The DOS models. of a d-band canatively11.To J. S. Moodera modest explainet al., ibid. 81, 1953 size (1998). the of origin the network, of this contain-␥ commonӍ 4 feature (Fig. 1C). of the Finally, ER and WS a rather models large large connectivity are practically absent. In tunnel junctions. power Ϫ␥ also be easily tailored by alloying (for example, 12. K. Wang, thesis, New York University (1999). P(k) ϳ k . This result indicates that large ing13. S. only Zhang, P. M. 4941Levy, A. C. Marley, vertices, S. S. P. Parkin, Phys. the scaling region is is that the probability of finding a highly by introductionscale of virtual invariance, bound states) to pro- Rev. we Lett. 79, 3744show (1997). that existing net- complex network is formed by the citation contrast, the power-law tail characterizing level of LSMO is situated above the Fermi duce specific bias dependencies. Although here 14. J. M. MacLaren, X. G. Zhang, W. H. Butler, X. Wang, networkslevel of Co self-organize and a maximum of inverse into TMR awe scale-free concentrated on the problemstate, of the spinlessPhys. prominent Rev. B 59, 5470 (1999). but is nevertheless approxi- connected vertex (that is, a large k)decreas- 15. K. I. Kobayashi, T. Kimura, H. Sawada, K. Terakura, Y. is expected when the Fermi level of LSMO is polarizationwork of the Co models electrode and regarded failTokura, toNature incorporate395, 677 (1998). growth and patterns of the scientific publications, the ver- P(k)forthenetworksstudiedindicatesthat a featureapproximately unpredicted at the maximum of the by spin 2 allthe existing strongly spin-polarized random LSMO only as amated16. 
We thank by the group a of A.power Revcoleschi for providing law with an exponent es exponentially with k;thus,verticeswith DOS of Co. This is consistent with the max- useful spin analyzer, the large TMR ratios ob- LSMO targets, R. Lyonnet and A. Vaures for their preferential attachment,experimental help and J. two Nassar for fruitful key discus- features of tices being papers published in refereed jour- highly connected (large k)verticeshavea networkimum of models. inverse TMR observed To explain at Ϫ0.4 V tained the by origin combining Co of and LSMOthis electrodes␥powersions. SupportedӍ 4 by the (Fig. European Union 1C). and the New Finally, a rather large large connectivity are practically absent. In for Co/STO/LSMO junctions (Fig. 3A). For a (50% with a STO barrier) are also an interesting Energy and Industrial Technology Development Or- scalepositive invariance, bias, the TMR is expected we showto change thatresult.real The existing drawback networks. arising net- from the lowcomplex Usingganization of Japan. network a model is formed incorporating by the citationnalscontrast, and the the edges power-law being tail links characterizing to the articles large chance of occurring, dominating the sign and become normal above 1 V when the Curie temperature of LSMO (ϳ350 K) is the Heavy-tailedworkFermi models level of LSMO fail goes to down incorporate intodistributions the reduction of growth the TMR at room and temperature,patterns1 July 1999; of acceptedin the 2 September scientific 1999social publications, and the ver- physicalP(k)forthenetworksstudiedindicatesthat phenomena energy range of the majority spin d-band of connectivity. preferentialCo. This is also attachment, observed in Fig. 3A. 
two key features of tices being papers published in refereed jour- highly connected (large k)verticeshavea For ALO and ALO/STO barriers, a predom- There are two generic aspects of real net- real networks.inant tunneling of s-character Using electrons a (see model ar- incorporatingEmergencenals of Scaling and the edgesin being links to the articles large chance of occurring, dominating the Complexrow in Fig. 2B)networks is the usual explanation of the Geophysical and Financial Shocks positive polarization (6–8). The rapid drop Random Networks works that are not incorporated in these mod- with bias (Fig. 3B) is similar to what has been connectivity. observed in most junctions with ALO barriers, Albert-La´szlo´ Baraba´si* and Re´ka Albert and completely different from what is obtained There are two generic aspects of real net- els. First, both models assume that we start when the tunneling is predominantly by d-char- Systems as diverse asScience genetic networks 286 or the (1999) World Wide Web are best works that are not incorporated in these mod- acter electrons (Fig. 3A). The origin of this described as networks with complex topology. A common property of many with a fixed number (N) of vertices that are rapid decrease of the TMR at relatively small large networks is that the vertex connectivities follow a scale-free power-law bias has never been clearly explained. This is distribution. This feature was found to be a consequence of two generic mech- els. First, both models assume that we start roughly consistent with the energy dependence anisms: (i) networks expand continuously by the addition of new vertices, and then randomly connected (ER model), or re- of the DOS induced by sp-d bonding effects on (ii) new vertices attach preferentially to sites that are already well connected. 
with a fixed number (N) of vertices that are atomic layer of ALO in the calculation A model based on these two ingredients reproduces the observed stationary of Nguyen-Mahn et al.(8)fortheCo-ALO scale-free distributions, which indicates that the development of large networks then randomly connected (ER model), or re- connected (WS model), without modifying interface. But Zhang et al.(13)havealsoshown is governed by robust self-organizing phenomena that go beyond the particulars that a large part of the TMR drop can be of the individual systems. connected (WS model), without modifying N. In contrast, most real world networks are attributed to the excitation of spin waves. The experiments reported here and in sev- The inability of contemporary science to de- actions currently limits advances in many N. In contrast, most real world networks are eral recent publications (3, 4)demonstratethe scribe systems composed of nonidentical el- disciplines, ranging from molecular biology open and they form by the continuous addi- important role of the electronic structure of the ements that have diverse and nonlocal inter- to computer science (1). The difficulty of open and they form by the continuous addi- metal-oxide interface in determining the spin describing these systems lies partly in their tion of new vertices to the system, thus the polarization of the tunneling electrons. The neg- Department of Physics, , topology: Many of them form rather complex tion of new vertices to the system, thus the ative polarization for the Co-STO interface has Notre Dame, IN 46556, USA. networks whose vertices are the elements of been ascribed to d-d bonding effects between *To whom correspondence should be addressed. E- the system and whose edges represent the number of vertices N increases throughout Al and Ti (4). This interpretation is similar to mail: [email protected] interactions between them. 
For example, liv- number of vertices N increases throughout the lifetimeUnified of the network. scaling For law example, for earthquakes the , the lifetime of the network. For example, the www.sciencemag.org SCIENCE VOL 286 15 OCTOBER 1999 509 140 The European Physical Journal B actor networkK. Christensen grows by the et addition al., PNAS of new 99 (2002) actor network grows by the addition of new Snapshot of Internet network actors to the system, the WWW grows expo- actors to the system, the WWW grows expo- nentially over0 time by the addition of new courtesy k.c.Fig. claffy 1. The distribution function of connectivities for various large networks. (A) Actor collaboration 10 0.5 Web pages (8), and the research literaturea nentially over timeb by the addition of new graph with N ϭ 212,250 verticesFig. 1. andThe average distribution connectivity function͗k͘ϭ of28.78. connectivities (B) WWW, N forϭ various large networks. (A) Actor collaboration Levy/ (lower bound) constantly grows−2 by the publication of new 0.4 325,729, ͗k͘ϭ5.46 (6). (C) Powergraph grid with data,NN ϭ 4941,212,250͗k͘ϭvertices2.67. The anddashed average lines have connectivity 10͗k͘ϭ28.78. (B) WWW, N ϭ Web pages (8), and the research literature An keyslopes feature (A) ␥actor ofϭ 2.3, extremely (B) ␥www ϭ skewed2.1 and (C) ␥power ϭdistributions4. papers. Consequently, a common feature of • 325,729, ͗kP(x)͘ϭ5.46 (6). (C) Power grid data, N ϭ 4941, ͗k͘ϭ2.67. The dashed lines have 0.3 constantly grows by the publication of new 10−4 slopes (A) ␥ ϭα 2.3, (B) ␥ ϭ 2.1 and (C) ␥ ϭ 4. (g) papers. Consequently, a common feature of P(g) (i.e., scale-free power law actor ), is the wwwlarge power ! 510 P(x)15 ~ OCTOBER 1 / x 1999 VOL 286 SCIENCE www.sciencemag.org neg. tail 0.2 pos. tail disparity between the most probable value and the mean/ 10−6 neg. fit pos. 
fit 0.1 median value of the distribution:510 ⇒ the most probable 15 OCTOBER 1999 VOL 286 SCIENCE www.sciencemag.orgexponential 10−8 0.0 1 10 100 0.00 0.05 0.10 0.15 0.20 value xmp = Min(x), and the mean value ⟨x⟩ >> xmp. price fluctuation, g 1/g Inverse100 cubic law for the distribution0.5 This is in stark contrast to the Gaussian (Normal) distribution pdf for c / d • of stock price variations, P. Levy (lower bound) −2 0.4 which the mean value and the most probable value coincide, xmp = ⟨x⟩. Gopikrishnan10 et al., EPJB 3 (1998) 0.3 10−4 (g) P(g) • For Baseball, the approximate power-law behavior can be roughly phrased as such: Forneg. every tail Mickey ! 0.2 pos. tail Mantle (~8000 career at-bats), there are roughly 10 players with careers similar to Doc10−6 “theneg. Punk” fit Gautreaus 0.1 (~800 career at-bats); and for every Doc “the Punk” Gautreau there are roughly 10 players pos.with fit careers similar exponential 10−8 0.0 to Frank “the Jelly” Jelincich (8 career at-bats with one !). This statistical property arises1 from 10the ratio of 100 0.00 0.05 0.10 0.15 0.20 g 1/g frequencies P(x)/P(y) ~ (y / x) α = (y / x) for exponent α ≈ 1 Fig. 1. (a) Log-log plot of the cumulative probability distribution P (g) of the normalized price increments gi(t). The lines in (a) are power law fits to the data over the range 2 g(2) >...>g(N).ThecumulativedensitycanthenbewrittenasP (g(k))=k/N, and we obtain for the local slope γ(g(k))=− log(g(k+1)/g(k))/ log(P (g(k+1))/P (g(k))) ￿ k(log(g(k+1)) − log(g(k))). Each data point shown in b is an average over 1000 increments g(k), and the lines are linear regression fits to the data. Note that the −1 m (k) average m k=1 γ(g ) over all events used would be identical to the estimator for the asymptotic slope proposed by Hill [9]. (c) Same as (a) for the 1 min S&P500 increments. The regression lines yield α =2.93 and α =3.02 for the positive and negative tails￿ respectively. 
(d) Same as (b) for the 1 min S&P500 increments, except that the number of increments per data point is 100.
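The ratio of frequencies quoted for the baseball careers can be checked with a few lines of Python. This is a minimal sketch that assumes a pure power law P(x) ~ x^(-α) with no cutoff; the function name `frequency_ratio` is illustrative and does not come from the slides.

```python
def frequency_ratio(x: float, y: float, alpha: float = 1.0) -> float:
    """Return P(x) / P(y) for a pure power law P(x) ~ x**(-alpha)."""
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be positive")
    return (y / x) ** alpha


# Careers of ~800 at-bats versus careers of ~8000 at-bats, with alpha = 1:
# the shorter career length is about ten times more frequent.
print(frequency_ratio(800, 8000))  # 10.0
```

For a steeper exponent the same decade in career length corresponds to a larger frequency ratio, for example `frequency_ratio(800, 8000, alpha=2.0)` gives a factor of 100.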

[Paper excerpt; player names lost from the text are marked [...]] "... age (ERA) will be more descriptive than career strikeouts or wins. To this end, strikeouts per [...] and ERA are fundamentally averages, and would fall into a separate class of pdf than the truncated power-laws observed for the metrics considered in this paper.

In Tables 6-11 we list the traditional and detrended single season statistics for HR, H, RBI, and K. There are too many players to discuss individually, so we mention a few interesting observations. First, the rankings for detrended home runs HRD are dominated by seasons prior to 1950. Second, of contemporary note, Ichiro Suzuki's single season hits record in 2004, which broke the 83-year record held by [...], holds its place as the top single-season hitting performance of all time. Finally, we provide two top-50 tables for single season strikeouts in Tables 10 and 11. We provide two separate tables because the relative performance of pitchers from the 1800's far surpasses the relative performance of contemporary legends. Hence, in Table 10 we rank all single-season performances from 1883-2009, while in Table 11 we rank all single-season performances from the "modern" era 1920-present. We note that the dominance of Table 10 by 19th century baseball players could reflect fluctuations from small data set size and incomplete records of pitchers in the 19th century (see Fig. 7). Still, [...]'s 513 strikeouts in 1889 seems unfathomable by today's standards, and the seemingly out of place number may reflect factors which detrending can ..."

[Paper excerpt] "... marks as we now discuss. We approximate the pdf ..."

FIG. 7: Semi-log plot of data set size exhibits the growth of [...]. The size of the data sets, N, is used to compute the annual trends for pitchers and batters in Fig. 4 and Fig. 5. These data sets correspond to cutoffs x ≡ 0 and y ≡ 100 in Eqs. (6)-(11). Interestingly, we observe a spike in league size during WWI, possibly corresponding to the widespread replacement of league veterans with multiple replacements through the course of the season. We also note that one can clearly see the jumps associated with the expansion between 1960 and 1980.
[Axes: data set size N_AB(t) and N_IPO(t) versus year, 1880-2000.]

Career success distributions can be modeled well by the Gamma distribution with scaling exponent α
[Figure: career pdfs P(x) on log-log axes with α = 1 reference lines. Panels: (A) x = K and K^D, (B) x = H and H^D, (C) x = W and W^D, (D) x = HR and HR^D, (E) x = RBI^D.]

[Figure: cumulative distributions P(x > AB_i), P(x > H_i), and P(x > HR_i) of career at-bats AB_i, career hits H_i, and career home runs HR_i, on semi-log axes. Marked values: 4,580, 8,680, and 11,000 at-bats; 1,280, 2,500, and 3,280 hits; 125, 380, and 580 home runs. One callout reads: 27 seasons, 10,277 at-bats, 3,418 hits.]

• Approximately 3% of non-pitchers have a career length of 1 at-bat
• Approximately 5% of non-pitchers finish their career with only 1 hit
• Approximately 13% of non-pitchers finish their career with only 1 home run

x = career success total, typically proportional to the career length
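To make the shape of such a career distribution concrete, the sketch below evaluates a Gamma-type pdf on integer career totals. It assumes the form P(x) ∝ x^(-α) exp(-x/x_c); the slides name only the Gamma distribution and the exponent α, so the cutoff scale `x_c`, its value, and the function name `career_pdf` are hypothetical choices made for illustration.

```python
import math


def career_pdf(alpha: float, x_c: float, x_max: int) -> list:
    """Discrete pdf P(x) ~ x**(-alpha) * exp(-x / x_c) for x = 1..x_max."""
    weights = [x ** (-alpha) * math.exp(-x / x_c) for x in range(1, x_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# x_c = 3000 is a made-up cutoff scale, not a fitted value.
pdf = career_pdf(alpha=1.0, x_c=3000.0, x_max=20000)

# For x << x_c the pdf behaves like a power law: one decade in x costs
# roughly a factor of 10 in frequency when alpha = 1.
head_ratio = pdf[9] / pdf[99]        # P(10) / P(100)

# For x >> x_c the exponential cutoff takes over and frequencies fall
# much faster than the power law alone would predict.
tail_ratio = pdf[9999] / pdf[999]    # P(10000) / P(1000)

print(head_ratio, tail_ratio)
```

The two ratios show the two regimes: a near power-law bulk in which short careers dominate, and a sharply truncated tail that holds the longest careers.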

Statistical law for career longevity

[Figure: pdfs P(x) of career longevity x on log-log axes. Pro sports, x = career longevity in opportunities: (A) Baseball (fielders), AB, Korea (KPB) and USA (MLB), α = 0.77; (B) Basketball (NBA), minutes, α = 0.63; (C) Baseball (pitchers), IPO, Korea (KPB) and USA (MLB), α = 0.71; (D) Soccer (Premier League), games, α = 0.55. Academia, x = career longevity in years: (E, F) Nature, CELL, PNAS, NEJM, Science, PRL, α ≈ 0.40. Opportunities ~ time duration.]

Major League Baseball
• 130+ years of player statistics, ~15,000 careers

"One-hit wonders"
• 3% of all fielders finish their career with ONE at-bat!
• 3% of all pitchers finish their career with less than one inning pitched!

"Iron horses"
• Lou Gehrig (the Iron Horse): NY Yankees (1923-1939)
• Played in 2,130 consecutive games in 15 seasons! 8001 career at-bats!
• Career & life stunted by the fatal neuromuscular disease, amyotrophic lateral sclerosis (ALS), aka Lou Gehrig's Disease

A. M. Petersen, W.-S. Jung, J.-S. Yang, H. E. Stanley, "Quantitative and empirical demonstration of the Matthew effect in a study of career longevity." Proc. Natl. Acad. Sci. USA 108, 18-23 (2011).

3. Re-ranking the all-time greats

Single season success distributions

[Figure: PDF(x) of single-season totals, traditional versus detrended, on semi-log axes. (A) x = HR per season and (B) x = HRD per season, for 1920-2008 and the sub-periods 1920-1940, 1941-1960, 1961-1980, 1981-2008. (C) x = Pts per season and (D) x = PtsD per season, for 1967-2008 and the sub-periods 1967-1977, 1978-1987, 1988-1997, 1998-2008.]

Data collapse: after applying the detrending method, the bulk of the league has a remarkably uniform achievement distribution. But in comparison to the bulk, the extremely rare achievements appear in a new light.

Detrending amplifies relatively significant achievements, using the local ability average as a baseline.

Career Wins: not affected by detrending
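The idea of using the local ability average as a baseline can be sketched as follows. The slides in this section do not give the formula, so this example assumes one simple form: each season total is multiplied by the ratio of the all-time average league rate to the league rate in that season. The function name `detrend`, the input layout, and all numbers are hypothetical.

```python
def detrend(totals, league_rate):
    """Rescale season totals by (all-time league rate) / (that season's league rate).

    totals: list of (year, value) pairs for one player (assumed input format).
    league_rate: dict mapping year -> league-wide average success rate in that
        year, e.g. home runs per at-bat (assumed baseline, not from the slides).
    """
    baseline = sum(league_rate.values()) / len(league_rate)
    return [(year, value * baseline / league_rate[year]) for year, value in totals]


# Made-up numbers, chosen only to show the direction of the adjustment.
league_rate = {1920: 0.01, 2000: 0.03}
seasons = [(1920, 30), (2000, 30)]
adjusted = detrend(seasons, league_rate)
print(adjusted)
```

With these inputs, the same raw total counts for more in the low-rate season and for less in the high-rate season, which is the sense in which achievements that stand out from their own era are amplified.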

Table S5. Ranking of career wins (1890–2009).

Traditional Rank                                           | Detrended Rank
Rank  Name                Final Season (L)  Career Metric  | Rank*(Rank)  % Change  Name                Final Season (L)  Career Metric
1     Cy Young            1911 (22)         511            | 1(1)         0         Cy Young            1911 (22)         510
2     Walter Johnson      1927 (21)         417            | 2(2)         0         Walter Johnson      1927 (21)         420
3     Christy Mathewson   1916 (17)         373            | 3(3)         0         Christy Mathewson   1916 (17)         376
3     Pete Alexander      1930 (20)         373            | 4(3)         -33       Pete Alexander      1930 (20)         375
5     Pud Galvin          1892 (15)         364            | 5(5)         0         Pud Galvin          1892 (15)         365
6     Warren Spahn        1965 (21)         363            | 6(6)         0         Warren Spahn        1965 (21)         362
7     Kid Nichols         1906 (15)         361            | 7(7)         0         Kid Nichols         1906 (15)         359
8     Greg Maddux         2008 (23)         355            | 8(8)         0         Greg Maddux         2008 (23)         351
9     Roger Clemens       2007 (24)         354            | 9(9)         0         Roger Clemens       2007 (24)         350
10    Tim Keefe           1893 (14)         342            | 10(10)       0         Tim Keefe           1893 (14)         342
11    Steve Carlton       1988 (24)         329            | 11(11)       0         Steve Carlton       1988 (24)         329
12    John Clarkson       1894 (12)         328            | 12(13)       7         Eddie Plank         1917 (17)         328
13    Eddie Plank         1917 (17)         326            | 13(12)       -8        John Clarkson       1894 (12)         327
14    Don Sutton          1988 (23)         324            | 14(14)       0         Don Sutton          1988 (23)         324
14    Nolan Ryan          1993 (27)         324            | 14(14)       0         Nolan Ryan          1993 (27)         324
16    Phil Niekro         1987 (24)         318            | 16(16)       0         Phil Niekro         1987 (24)         318
17    Gaylord Perry       1983 (22)         314            | 17(17)       0         Gaylord Perry       1983 (22)         314
18    Tom Seaver          1986 (20)         311            | 18(18)       0         Tom Seaver          1986 (20)         311
19    Charley Radbourn    1891 (11)         309            | 19(19)       0         Charley Radbourn    1891 (11)         308
20    Mickey Welch        1892 (13)         307            | 20(20)       0         Mickey Welch        1892 (13)         307
21    Tom Glavine         2008 (22)         305            | 21(21)       0         Tom Glavine         2008 (22)         302
22    Randy Johnson       2009 (22)         303            | 22(25)       12        Bobby Mathews       1887 (15)         300
23    Early Wynn          1963 (23)         300            | 22(23)       4         Early Wynn          1963 (23)         300
23    Lefty Grove         1941 (17)         300            | 24(23)       -4        Lefty Grove         1941 (17)         299
25    Bobby Mathews       1887 (15)         297            | 24(22)       -9        Randy Johnson       2009 (22)         299
26    Tommy John          1989 (26)         288            | 26(26)       0         Tommy John          1989 (26)         288
27    Bert Blyleven       1992 (22)         287            | 27(27)       0         Bert Blyleven       1992 (22)         287
28    Robin Roberts       1966 (19)         286            | 28(28)       0         Robin Roberts       1966 (19)         285
29    Tony Mullane        1894 (13)         284            | 29(29)       0         Fergie Jenkins      1983 (19)         284
29    Fergie Jenkins      1983 (19)         284            | 30(31)       3         Jim Kaat            1983 (25)         283
31    Jim Kaat            1983 (25)         283            | 30(29)       -3        Tony Mullane        1894 (13)         283
32    Red Ruffing         1947 (22)         273            | 32(32)       0         Red Ruffing         1947 (22)         272
33    Mike Mussina        2008 (18)         270            | 33(33)       0         Burleigh Grimes     1934 (19)         270
33    Burleigh Grimes     1934 (19)         270            | 34(35)       2         Jim Palmer          1984 (19)         268
35    Jim Palmer          1984 (19)         268            | 35(33)       -6        Mike Mussina        2008 (18)         267
36    Eppa Rixey          1933 (21)         266            | 36(36)       0         Eppa Rixey          1933 (21)         266
36    Bob Feller          1956 (18)         266            | 36(38)       5         Jim McCormick       1887 (10)         266
38    Jim McCormick       1887 (10)         265            | 38(36)       -5        Bob Feller          1956 (18)         265
39    Gus Weyhing         1901 (14)         264            | 39(39)       0         Gus Weyhing         1901 (14)         263
40    Ted Lyons           1946 (21)         260            | 40(40)       0         Ted Lyons           1946 (21)         259
41    Jamie Moyer         2009 (23)         258            | 40(44)       9         Al Spalding         1877 (7)          259
42    Jack Morris         1994 (18)         254            | 42(42)       0         Red Faber           1933 (20)         255
42    Red Faber           1933 (20)         254            | 43(41)       -4        Jamie Moyer         2009 (23)         254
44    Al Spalding         1877 (7)          253            | 44(42)       -4        Jack Morris         1994 (18)         253
44    Carl Hubbell        1943 (16)         253            | 44(44)       0         Carl Hubbell        1943 (16)         253
46    Bob Gibson          1975 (17)         251            | 46(46)       0         Bob Gibson          1975 (17)         251
47    Vic Willis          1910 (13)         249            | 47(47)       0         Vic Willis          1910 (13)         250
48    Jack Quinn          1933 (23)         247            | 48(48)       0         Jack Quinn          1933 (23)         247
49    Joe McGinnity       1908 (10)         246            | 48(49)       2         Joe McGinnity       1908 (10)         247
50    [...]               1901 (10)         245            | 50(50)       0         Jack Powell         1912 (16)         245

Not surprising, since pitcher wins is largely dependent on team factors.

Career Hits: not affected by detrending
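The "% Change" column of the ranking tables is not defined in the text. The tabulated values are consistent with 100 × (Rank − Rank*) / Rank, truncated toward zero, where Rank* is the detrended rank and Rank is the traditional rank. The sketch below encodes that inferred rule; the function name `rank_change_percent` is illustrative.

```python
def rank_change_percent(detrended_rank: int, traditional_rank: int) -> int:
    """Percent change in rank, truncated toward zero (rule inferred from the tables).

    Positive values mean the entry ranks better (closer to 1) after detrending.
    """
    if detrended_rank < 1 or traditional_rank < 1:
        raise ValueError("ranks start at 1")
    return int(100 * (traditional_rank - detrended_rank) / traditional_rank)


# Rows written as Rank*(Rank) in the tables:
print(rank_change_percent(4, 3))   # 4(3)  -> -33
print(rank_change_percent(8, 5))   # 8(5)  -> -60
print(rank_change_percent(5, 6))   # 5(6)  -> 16
```

Because the change is measured relative to the traditional rank, falling from a top position produces a large negative percentage, while rising from a low position saturates just below 100.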

Table S2. Ranking of career hits (1871–2009).

Traditional Rank                                           | Detrended Rank
Rank  Name                Final Season (L)  Career Metric  | Rank*(Rank)  % Change  Name                Final Season (L)  Career Metric
1     Pete Rose           1986 (24)         4256           | 1(1)         0         Pete Rose           1986 (24)         4409
2     Ty Cobb             1928 (24)         4189           | 2(2)         0         Ty Cobb             1928 (24)         4166
3     Hank Aaron          1976 (23)         3771           | 3(3)         0         Hank Aaron          1976 (23)         3890
4     Stan Musial         1963 (22)         3630           | 4(4)         0         Stan Musial         1963 (22)         3661
5     Tris Speaker        1928 (22)         3514           | 5(6)         16        Carl Yastrzemski    1983 (23)         3537
6     Carl Yastrzemski    1983 (23)         3419           | 6(8)         25        Honus Wagner        1917 (21)         3484
7     Cap Anson           1897 (27)         3418           | 7(7)         0         Cap Anson           1897 (27)         3464
8     Honus Wagner        1917 (21)         3415           | 8(5)         -60       Tris Speaker        1928 (22)         3449
9     Paul Molitor        1998 (21)         3319           | 9(11)        18        Willie Mays         1973 (22)         3375
10    Eddie Collins       1930 (25)         3315           | 10(9)        -11       Paul Molitor        1998 (21)         3361
11    Willie Mays         1973 (22)         3283           | 11(12)       8         Eddie Murray        1997 (21)         3303
12    Eddie Murray        1997 (21)         3255           | 12(13)       7         Nap Lajoie          1916 (21)         3291
13    Nap Lajoie          1916 (21)         3242           | 13(10)       -30       Eddie Collins       1930 (25)         3266
14    Cal Ripken          2001 (21)         3184           | 14(15)       6         George Brett        1993 (21)         3222
15    George Brett        1993 (21)         3154           | 15(14)       -7        Cal Ripken          2001 (21)         3219
16    Paul Waner          1945 (20)         3152           | 16(17)       5         Robin Yount         1993 (20)         3209
17    Robin Yount         1993 (20)         3142           | 17(18)       5         Tony Gwynn          2001 (20)         3175
18    Tony Gwynn          2001 (20)         3141           | 18(19)       5         Dave Winfield       1995 (22)         3171
19    Dave Winfield       1995 (22)         3110           | 19(23)       17        Lou Brock           1979 (19)         3150
20    Craig Biggio        2007 (20)         3060           | 20(22)       9         Rod Carew           1985 (19)         3149
21    Rickey Henderson    2003 (25)         3055           | 21(27)       22        Roberto Clemente    1972 (18)         3107
22    Rod Carew           1985 (19)         3053           | 22(26)       15        Al Kaline           1974 (22)         3094
23    Lou Brock           1979 (19)         3023           | 23(21)       -9        Rickey Henderson    2003 (25)         3089
24    Rafael Palmeiro     2005 (20)         3020           | 24(20)       -20       Craig Biggio        2007 (20)         3060
25    Wade Boggs          1999 (18)         3010           | 25(25)       0         Wade Boggs          1999 (18)         3053
26    Al Kaline           1974 (22)         3007           | 26(29)       10        Sam Crawford        1917 (19)         3046
27    Roberto Clemente    1972 (18)         3000           | 27(30)       10        Frank Robinson      1976 (21)         3040
28    Sam Rice            1934 (20)         2987           | 28(24)       -16       Rafael Palmeiro     2005 (20)         3034
29    Sam Crawford        1917 (19)         2961           | 29(16)       -81       Paul Waner          1945 (20)         2968
30    Frank Robinson      1976 (21)         2943           | 30(42)       28        Brooks Robinson     1977 (23)         2955
31    Barry Bonds         2007 (22)         2935           | 31(31)       0         Barry Bonds         2007 (22)         2948
32    Willie Keeler       1910 (19)         2932           | 32(33)       3         Jake Beckley        1907 (20)         2905
33    Rogers Hornsby      1937 (23)         2930           | 33(40)       17        Harold Baines       2001 (22)         2900
33    Jake Beckley        1907 (20)         2930           | 34(32)       -6        Willie Keeler       1910 (19)         2872
35    [...]               1944 (20)         2927           | 35(47)       25        Vada Pinson         1975 (18)         2863
36    Zack Wheat          1927 (19)         2884           | 36(52)       30        Tony Perez          1986 (23)         2831
37    [...]               1937 (19)         2880           | 37(58)       36        [...]               1976 (18)         2830
38    Mel Ott             1947 (22)         2876           | 38(45)       15        Andre Dawson        1996 (21)         2823
39    Babe Ruth           1935 (22)         2873           | 39(55)       29        [...]               1985 (23)         2821
40    Harold Baines       2001 (22)         2866           | 40(50)       20        Al Oliver           1985 (18)         2813
41    Jesse Burkett       1905 (16)         2850           | 41(36)       -13       Zack Wheat          1927 (19)         2809
42    Brooks Robinson     1977 (23)         2848           | 42(28)       -50       Sam Rice            1934 (20)         2794
43    [...]               1942 (19)         2839           | 43(56)       23        [...]               1990 (22)         2779
44    George Sisler       1930 (15)         2812           | 44(63)       30        [...]               1973 (18)         2771
45    Andre Dawson        1996 (21)         2774           | 45(57)       21        [...]               1991 (19)         2770
46    Ken Griffey         2009 (21)         2763           | 46(41)       -12       Jesse Burkett       1905 (16)         2768
47    Vada Pinson         1975 (18)         2757           | 47(33)       -42       Rogers Hornsby      1937 (23)         2766
48    [...]               1950 (20)         2749           | 47(46)       -2        Ken Griffey         2009 (21)         2766
49    [...]               2009 (15)         2747           | 49(38)       -28       Mel Ott             1947 (22)         2745
50    Al Oliver           1985 (18)         2743           | 50(53)       5         [...]               2004 (17)         2738

Not so surprising, since career hits is closely related to career length, which hasn't changed significantly.

Career Strikeouts: affected by distinct eras

Table S10. Ranking of season strikeouts for the Modern Era (1920–2009).

Traditional Rank: Rank, Name, Season (Y #), Season Metric | Detrended Rank: Rank*(Rank), % Change, Name, Season (Y #), Season Metric

1 Nolan Ryan 1973 (7) 383 | 1(72) 98 Dazzy Vance 1924 (5) 443
2 Sandy Koufax 1965 (11) 382 | 2(6) 66 Bob Feller 1946 (8) 407
3 Randy Johnson 2001 (14) 372 | 3(197) 98 Dazzy Vance 1925 (6) 368
4 Nolan Ryan 1974 (8) 367 | 4(4) 0 Nolan Ryan 1974 (8) 335
5 Randy Johnson 1999 (12) 364 | 5(79) 93 Bob Feller 1941 (6) 334
6 Bob Feller 1946 (8) 348 | 6(1) -500 Nolan Ryan 1973 (7) 333
7 Randy Johnson 2000 (13) 347 | 7(75) 90 Bob Feller 1940 (5) 325
8 Nolan Ryan 1977 (11) 341 | 8(133) 93 Van Mungo 1936 (6) 323
9 Randy Johnson 2002 (15) 334 | 9(47) 80 [...] 1946 (8) 322
10 Nolan Ryan 1972 (6) 329 | 10(102) 90 Bob Feller 1939 (4) 321
10 Randy Johnson 1998 (11) 329 | 11(435) 97 Lefty Grove 1926 (2) 317
12 Nolan Ryan 1976 (10) 327 | 12(124) 90 Bob Feller 1938 (3) 316
13 Sam McDowell 1965 (5) 325 | 12(400) 97 Dazzy Vance 1923 (4) 316
14 [...] 1997 (10) 319 | 12(367) 96 Dazzy Vance 1928 (9) 316
15 Sandy Koufax 1966 (12) 317 | 15(12) -25 Nolan Ryan 1976 (10) 310
16 Curt Schilling 2002 (15) 316 | 16(8) -100 Nolan Ryan 1977 (11) 301
17 J.R. Richard 1979 (9) 313 | 17(578) 97 Dazzy Vance 1927 (8) 299
17 Pedro Martinez 1999 (8) 313 | 18(175) 89 [...] 1938 (8) 298
19 Steve Carlton 1972 (8) 310 | 18(382) 95 [...] 1933 (3) 298
20 [...] 1971 (9) 308 | 18(17) -5 J.R. Richard 1979 (9) 298
20 Randy Johnson 1993 (6) 308 | 21(2) -950 Sandy Koufax 1965 (11) 294
22 [...] 1986 (8) 306 | 21(251) 91 Hal Newhouser 1945 (7) 294
22 Sandy Koufax 1963 (9) 306 | 23(269) 91 Lefty Grove 1930 (6) 293
24 Pedro Martinez 1997 (6) 305 | 24(26) 7 J.R. Richard 1978 (8) 289
25 Sam McDowell 1970 (10) 304 | 24(600) 96 Lefty Grove 1928 (4) 289
26 J.R. Richard 1978 (8) 303 | 26(767) 96 Lefty Grove 1927 (3) 282
27 Nolan Ryan 1989 (23) 301 | 27(484) 94 Dizzy Dean 1932 (2) 274
27 [...] 1971 (3) 301 | 28(499) 94 Red Ruffing 1932 (9) 273
29 Curt Schilling 1998 (11) 300 | 29(37) 21 Steve Carlton 1980 (16) 272
30 Randy Johnson 1995 (8) 294 | 30(449) 93 [...] 1930 (3) 271
31 Curt Schilling 2001 (14) 293 | 31(10) -210 Nolan Ryan 1972 (6) 270
32 Roger Clemens 1997 (14) 292 | 31(528) 94 Lefty Grove 1932 (8) 270
33 Randy Johnson 1997 (10) 291 | 33(791) 95 Pete Alexander 1920 (10) 269
33 Roger Clemens 1988 (5) 291 | 34(874) 96 Lefty Grove 1929 (5) 268
35 Randy Johnson 2004 (17) 290 | 35(1174) 97 Walter Johnson 1924 (18) 267
36 Tom Seaver 1971 (5) 289 | 36(423) 91 Dizzy Dean 1936 (6) 265
37 Steve Carlton 1982 (18) 286 | 37(499) 92 Dizzy Dean 1935 (5) 264
37 Steve Carlton 1980 (16) 286 | 38(964) 96 [...] 1929 (2) 261
39 Pedro Martinez 2000 (9) 284 | 38(37) -2 Steve Carlton 1982 (18) 261
40 Tom Seaver 1970 (4) 283 | 38(20) -90 Mickey Lolich 1971 (9) 261
40 Sam McDowell 1968 (8) 283 | 41(346) 88 [...] 1941 (5) 260
42 Denny McLain 1968 (6) 280 | 41(1153) 96 [...] 1926 (8) 260
43 Sam McDowell 1969 (9) 279 | 43(537) 91 Hal Newhouser 1944 (6) 259
44 [...] 1965 (4) 276 | 44(5) -780 Randy Johnson 1999 (12) 258
44 [...] 1996 (9) 276 | 45(70) 35 [...] 1956 (2) 257
44 [...] 1984 (1) 276 | 46(423) 89 Dizzy Dean 1934 (4) 255
47 Hal Newhouser 1946 (8) 275 | 46(19) -142 Steve Carlton 1972 (8) 255
47 Steve Carlton 1983 (19) 275 | 46(27) -70 Vida Blue 1971 (3) 255
49 [...] 1982 (6) 274 | 49(105) 53 Herb Score 1955 (1) 253
49 Fergie Jenkins 1970 (6) 274 | 49(729) 93 [...] 1932 (3) 253

The competitive (dis)advantage associated with particular eras (raised mound 1962-69, deadball era 1900-20) is evident in this re-ranking.

Season Home Runs: case of extreme inflation

Table S6. Ranking of season home runs for the Modern Era (1920–2009).

Traditional Rank: Rank, Name, Season (Y #), Season Metric | Detrended Rank: Rank*(Rank), % Change, Name, Season (Y #), Season Metric

1 Barry Bonds 2001 (16) 73 | 1(19) 94 Babe Ruth 1920 (7) 133
2 Mark McGwire 1998 (13) 70 | 2(8) 75 Babe Ruth 1927 (14) 102
3 Sammy Sosa 1998 (10) 66 | 3(9) 66 Babe Ruth 1921 (8) 100
4 Mark McGwire 1999 (14) 65 | 4(72) 94 Babe Ruth 1926 (13) 82
5 Sammy Sosa 2001 (13) 64 | 5(94) 94 Babe Ruth 1924 (11) 80
6 Sammy Sosa 1999 (11) 63 | 5(72) 93 Lou Gehrig 1927 (5) 80
7 Roger Maris 1961 (5) 61 | 7(19) 63 Babe Ruth 1928 (15) 77
8 Babe Ruth 1927 (14) 60 | 8(61) 86 [...] 1933 (9) 70
9 Babe Ruth 1921 (8) 59 | 9(94) 90 Babe Ruth 1931 (18) 68
10 Mark McGwire 1997 (12) 58 | 9(94) 90 Lou Gehrig 1931 (9) 68
10 [...] 2006 (3) 58 | 11(10) -10 Jimmie Foxx 1932 (8) 67
10 [...] 1938 (7) 58 | 12(215) 94 [...] 1923 (12) 66
10 Jimmie Foxx 1932 (8) 58 | 12(215) 94 Babe Ruth 1923 (10) 66
14 [...] 2002 (9) 57 | 14(181) 92 Rogers Hornsby 1922 (8) 62
14 Luis Gonzalez 2001 (12) 57 | 15(10) -50 Hank Greenberg 1938 (7) 60
16 [...] 1930 (8) 56 | 16(301) 94 [...] 1922 (7) 58
16 Ken Griffey 1998 (10) 56 | 16(592) 97 [...] 1943 (8) 58
16 Ken Griffey 1997 (9) 56 | 18(42) 57 Lou Gehrig 1936 (14) 57
19 Babe Ruth 1928 (15) 54 | 18(42) 57 Lou Gehrig 1934 (12) 57
19 Babe Ruth 1920 (7) 54 | 20(16) -25 Hack Wilson 1930 (8) 56
19 Alex Rodriguez 2007 (14) 54 | 21(135) 84 Hank Greenberg 1946 (12) 55
19 [...] 2006 (10) 54 | 21(401) 94 [...] 1922 (12) 55
19 [...] 1961 (11) 54 | 23(94) 75 Babe Ruth 1929 (16) 53
19 [...] 1949 (4) 54 | 23(899) 97 [...] 1943 (5) 53
25 [...] 2002 (12) 52 | 25(301) 91 Rogers Hornsby 1925 (11) 52
25 Alex Rodriguez 2001 (8) 52 | 25(36) 30 Jimmie Foxx 1938 (14) 52
25 Mark McGwire 1996 (11) 52 | 25(519) 95 Babe Ruth 1922 (9) 52
25 Willie Mays 1965 (14) 52 | 28(135) 79 Jimmie Foxx 1934 (10) 51
25 Mickey Mantle 1956 (6) 52 | 28(1023) 97 Hack Wilson 1927 (5) 51
25 [...] 1977 (9) 52 | 28(1023) 97 Cy Williams 1927 (16) 51
31 [...] 1947 (9) 51 | 28(457) 93 [...] 1942 (4) 51
31 Willie Mays 1955 (4) 51 | 32(161) 80 [...] 1929 (2) 50
31 Ralph Kiner 1947 (2) 51 | 32(31) -3 Johnny Mize 1947 (9) 50
31 [...] 2005 (10) 51 | 32(31) -3 Ralph Kiner 1947 (2) 50
31 [...] 1990 (5) 51 | 35(94) 62 Joe DiMaggio 1937 (2) 49
36 [...] 1998 (10) 50 | 35(592) 94 Babe Ruth 1933 (20) 49
36 [...] 2000 (12) 50 | 35(42) 16 Babe Ruth 1930 (17) 49
36 Jimmie Foxx 1938 (14) 50 | 35(1134) 96 Bill Nicholson 1943 (6) 49
36 [...] 2007 (3) 50 | 35(181) 80 [...] 1936 (4) 49
36 [...] 1995 (7) 50 | 35(686) 94 Bill Nicholson 1944 (7) 49
36 [...] 1996 (9) 50 | 41(181) 77 Mel Ott 1929 (4) 48
42 [...] 1997 (9) 49 | 41(19) -115 Ralph Kiner 1949 (4) 48
42 Jim Thome 2001 (11) 49 | 41(215) 80 Jimmie Foxx 1936 (12) 48
42 Sammy Sosa 2002 (14) 49 | 41(353) 88 Ted Williams 1946 (5) 48
42 Babe Ruth 1930 (17) 49 | 45(215) 79 Babe Ruth 1932 (19) 47
42 Frank Robinson 1966 (11) 49 | 45(1422) 96 [...] 1924 (3) 47
42 [...] 2006 (6) 49 | 45(1422) 96 [...] 1924 (12) 47
42 Mark McGwire 1987 (2) 49 | 45(777) 94 [...] 1931 (3) 47
42 Willie Mays 1962 (11) 49 | 45(1134) 96 Ken Williams 1923 (8) 47
42 [...] 1954 (8) 49 | 45(3401) 98 George Sisler 1920 (6) 47

Steroids era players show a relative decrease in their achievement significance; nevertheless, their achievements are still monumental in magnitude!

the big debate... Career Home Runs

Traditional Rank: Rank, Name, Final Season (L), Career Metric | Detrended Rank: Rank*(Rank), Name, Final Season (L), Career Metric

1 Barry Bonds 2007 (22) 762 | 1(3) Babe Ruth 1935 (22) 1215
2 Hank Aaron 1976 (23) 755 | 2(23) Mel Ott 1947 (22) 637
3 Babe Ruth 1935 (22) 714 | 3(26) Lou Gehrig 1939 (17) 635
4 Willie Mays 1973 (22) 660 | 3(17) Jimmie Foxx 1945 (20) 635
5 Ken Griffey Jr. 2009 (21) 630 | 5(2) Hank Aaron 1976 (23) 582
6 Sammy Sosa 2007 (18) 609 | 6(124) Rogers Hornsby 1937 (23) 528
7 Frank Robinson 1976 (21) 586 | 7(192) Cy Williams 1930 (19) 527
8 Alex Rodriguez 2009 (16) 583 | 8(1) Barry Bonds 2007 (22) 502
8 Mark McGwire 2001 (16) 583 | 9(4) Willie Mays 1973 (22) 490
10 [...] 1975 (22) 573 | 10(18) Ted Williams 1960 (19) 482
11 Rafael Palmeiro 2005 (20) 569 | 11(13) [...] 1987 (21) 478
12 Jim Thome 2009 (19) 564 | 12(14) [...] 1989 (18) 463
13 Reggie Jackson 1987 (21) 563 | 13(7) Frank Robinson 1976 (21) 444
14 Mike Schmidt 1989 (18) 548 | 14(10) Harmon Killebrew 1975 (22) 437
15 [...] 2009 (17) 546 | 15(577) [...] 1920 (11) 433
16 Mickey Mantle 1968 (18) 536 | 16(718) Honus Wagner 1917 (21) 420
17 Jimmie Foxx 1945 (20) 534 | 17(18) Willie McCovey 1980 (22) 417
18 Ted Williams 1960 (19) 521 | 18(557) [...] 1893 (14) 413
18 Frank Thomas 2008 (19) 521 | 19(5) Ken Griffey Jr. 2009 (21) 411
18 Willie McCovey 1980 (22) 521 | 20(28) Stan Musial 1963 (22) 410

TABLE XII: Ranking of Career Home Runs (1871-2009).

...for extensive top-50 tables for Hits, HR, RBI, K, W, calculated for single seasons and also over the entire career, consult the papers downloadable at: http://physics.bu.edu/~amp17/webpage_files/publications.html

A. M. Petersen, O. Penner, H. E. Stanley. Methods for detrending success metrics to account for inflationary and deflationary factors. Eur. Phys. J. B 79, 67-78 (2011). DOI: 10.1140/epjb/e2010-10647-1

and an analogous statistical analysis of basketball career statistics:

A. M. Petersen, O. Penner. A method for the unbiased comparison of MLB and NBA career statistics across era. Presented at the MIT Sloan Sports Analytics Conference (2012).
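A side note on reading the tables: the "% Change" column appears to be the shift in rank expressed relative to the traditional rank, 100 × (Rank − Rank*) / Rank, with the fraction dropped. That formula is inferred here from the printed rows rather than quoted from the papers, and the function name below is only illustrative. A minimal Python sketch that checks it against a few rows copied from the tables:

```python
def rank_percent_change(traditional_rank: int, detrended_rank: int) -> int:
    """Percent change in rank relative to the traditional rank.

    Assumption inferred from the printed tables: the value is truncated
    toward zero (e.g. -81.25 is printed as -81), which is what int() does.
    """
    return int(100 * (traditional_rank - detrended_rank) / traditional_rank)


# (label, traditional rank, detrended rank, % change as printed in the tables)
ROWS = [
    ("Bob Feller 1946, season strikeouts", 6, 2, 66),
    ("Sandy Koufax 1965, season strikeouts", 2, 21, -950),
    ("Dazzy Vance 1924, season strikeouts", 72, 1, 98),
    ("Paul Waner, career hits", 16, 29, -81),
    ("Ralph Kiner 1949, season home runs", 19, 41, -115),
]

for label, traditional, detrended, printed in ROWS:
    computed = rank_percent_change(traditional, detrended)
    print(f"{label}: computed {computed}, printed {printed}")
```

A positive value means the achievement moves up after detrending; a large negative value (Koufax 1965 falling from 2nd to 21st) means it moves down.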

... aside from being fun... Baseball is a historical treasure

[Figure: Google Ngram Viewer (http://books.google.com/ngrams/) graph of the phrase "baseball" between 1800 and 2000, from the corpus English One Million, with smoothing of 3.]

towards the preservation of the game and its wonderful history

Closing remarks.... Relevant cultural questions:

[...] eras. In this manuscript we demonstrate the utility of our detrending method by analyzing the distribution of individual success in a competitive arena, Major League Baseball (MLB), using an exhaustive data set that contains more than 15,000 player careers over the rich 130+ year history. We provide 10 extensive tables which list the top-50 all-time achievements according to both the traditional rankings and our detrended rankings, for the statistical categories of home runs (HR), hits (H) and runs-batted-in (RBI) for batters, and strikeouts (K) and wins (W) for pitchers. We analyze the most natural measures for accomplishment, the statistics that are listed in every box-score and on every baseball card, so that the results are tangible to any historian or fan who is interested in reviewing and discussing the “all-time greats”. In particular, this study addresses two relevant cultural questions:

(i) How to quantitatively account for economic, technological, and social factors that influence the rate of success in competitive professions.

(ii) How to use career statistics in an unbiased fashion to help in both the standard and the retroactive induction of athletes into a Hall of Fame. This is particularly important given the “inflation” observed for home runs in Major League Baseball, a phenomenon that is believed to be related to the widespread use of Performance Enhancing Drugs (PED).

Relevant bar-stool debates: Who was The Greatest Slugger of All-Time ????????

In order to account for changes in relative player ability over time, we have developed a detrending method which accounts for inflationary and deflationary factors (PEDs, changes in the physical construction of bats and balls, sizes of ballparks, talent dilution of players from expansion, etc.) and allows for an objective comparison of players across time. Remarkably, we find using our detrending method that the distributions of career success are invariant after we normalize accomplishments to local averages. Another surprising observation in this comprehensive study of the entire MLB labor force is the large number of “one-hit wonders”, along with a much smaller, but statistically significant and theoretically predictable, number of stellar “iron-horse” careers. We quantify this surprising finding with a statistical law which “bridges the gap” between the large number of players with very few career accomplishments and the few players with legendary career accomplishments.

Moreover, by analyzing the distribution of success across the entire set of players, we can calculate universal benchmarks that can be used for the identification of extraordinary careers. Also, since these benchmarks are calculated using detrended metrics, the benchmarks are robust and stable with respect to the time-dependent factors. Specifically, this empirical and quantitative observation can be used by the Baseball Hall of Fame to address issue (ii), raised above, in the context of induction criteria.

Contact Information
A. M. Petersen: [email protected]
O. Penner: [email protected]
H. E. Stanley: [email protected]

Sincerely yours,

A. M. Petersen, O. Penner, H. E. Stanley

Thank You!

and also thanks to my collaborators: Woo-Sung Jung, Orion Penner, Gene Stanley, Sauro Succi, Fengzhong Wang, Jae-Sook Yang, Massimo Riccaboni and Fabio Pammolli

http://physics.bu.edu/~amp17/

I) A. M. Petersen, W.-S. Jung, J.-S. Yang, H. E. Stanley, “Quantitative and empirical demonstration of the Matthew effect in a study of career longevity.” Proc. Natl. Acad. Sci. USA 108, 18-23 (2011).

II) A. M. Petersen, W.-S. Jung, H. E. Stanley, “On the distribution of career longevity and the evolution of home run prowess in professional baseball.” Europhysics Letters 83, 50010 (2008).

III) A. M. Petersen, O. Penner, H. E. Stanley, “Methods for detrending success metrics to account for inflationary and deflationary factors.” Eur. Phys. J. B 79, 67-78 (2011).

IV) A. M. Petersen, O. Penner. “A method for the unbiased comparison of MLB and NBA career statistics across era.” Presented at the MIT Sloan Sports Analytics Conference, 2012.

V) A. M. Petersen, M. Riccaboni, H. E. Stanley, F. Pammolli, “Persistence and Uncertainty in the Academic Career.” Proc. Natl. Acad. Sci. USA 109, 5213-5218 (2012).

“Beyond the asterisk*: Adjusting for performance inflation in professional sports”

The evaluation of success depends on many factors, some time-dependent, others time-independent. In order to compare human achievements from different time periods, success metrics should be normalized to a common index so that the time-dependent factors do not bias the comparison of the statistical measures. This consideration is particularly relevant to career achievement records in Major League Baseball (MLB), which are of significant cultural importance. I will present a novel approach which removes the time-dependent factors by normalizing a player’s annual achievement by the local ability average. Using empirical career data for more than 15,000 MLB player careers, our method yields “detrended” success measures that are more appropriate for comparing and evaluating the relative merits of players from different historical eras. In particular, this study addresses two relevant cultural questions: (i) How to quantitatively account for economic, technological, and social factors that influence the rate of success in competitive professions, and (ii) How to use career statistics in an unbiased fashion to help in both the standard and the retroactive induction of athletes into a Hall of Fame. This is particularly important given the “steroids-era” inflation observed for home runs in Major League Baseball, a phenomenon that is believed to be related to the widespread use of Performance Enhancing Drugs (PED).
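To make "normalizing a player's annual achievement by the local ability average" concrete, here is a minimal Python sketch. It rests on assumptions that are not stated in this abstract: the local ability average is taken to be the league-wide success rate per opportunity in a season (for example home runs per at-bat), and the common index is the plain mean of those seasonal rates. The function names and the toy numbers are hypothetical; the cited papers give the actual definitions.

```python
def league_prowess(league_totals):
    """Map each season to the league-wide success rate (successes per opportunity)."""
    return {season: successes / opportunities
            for season, (successes, opportunities) in league_totals.items()}


def detrend_career(player_seasons, league_totals):
    """Rescale each raw season total by (baseline rate / that season's rate).

    Returns (detrended season totals, detrended career total).
    Assumption: the baseline is the unweighted mean of the seasonal rates.
    """
    prowess = league_prowess(league_totals)
    baseline = sum(prowess.values()) / len(prowess)
    detrended = {season: raw * baseline / prowess[season]
                 for season, raw in player_seasons.items()}
    return detrended, sum(detrended.values())


# Toy numbers, invented for illustration only (not real MLB totals).
TOY_LEAGUE = {
    "low_era": (100, 10_000),   # 100 home runs in 10,000 at-bats: rate 0.01
    "high_era": (300, 10_000),  # 300 home runs in 10,000 at-bats: rate 0.03
}

seasons, career = detrend_career({"low_era": 40, "high_era": 60}, TOY_LEAGUE)
print(seasons, career)
```

With these toy rates the baseline is 0.02, so 40 home runs hit in the low-scoring era count for about 80, while 60 hit in the inflated era count for about 40. The same rescaling explains why early-era seasons climb in the detrended tables.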
