<<

arXiv:0708.3961v1 [stat.ME] 29 Aug 2007 hs,Cac n Conspiracy Segal R. and Mark Chance , 07 o.2,N.1 98–108 1, DOI: No. 22, Vol. 2007, Science Statistical 44-50 S e-mail: California USA Francisco, 94143-0560, San California, of Biostatistics, University Molecular and Center Bioinformatics Director, for and Biostatistics, and Epidemiology vr hncmeiini tteutmt ee,that level, ultimate How- the levels. at highest is the competition selection at when move played ever, luck is in event, game role any the In when no players. have Go randomness from claims and/or most rise such a the although get is games, often chess all that of contend logical/rational would Many lows.

akR ea sPoesr eatetof Department Professor, is Segal R. Mark c ttsia Science article Statistical original the the of by reprint published electronic an is This ern iesfo h rgnli aiainand pagination in detail. original typographic the from differs reprint nttt fMteaia Statistics Mathematical of Institute hs n hneaeseigysrnebedfel- strange seemingly are chance and Chess 10.1214/088342306000000574 Chess o aeabdopnn o aegood 7). have age you Segal, opponent (Jacob if luck bad and a luck, have bad you have you opponent good is .INTRODUCTION 1. aeo uk fyuhv a have you If luck. of game a nttt fMteaia Statistics Mathematical of Institute , tteutmt ee,ta fteWrdCesCaposi ( Championship competit Chess are when World the However, conspiracy of levels. and that chess highest level, wh ultimate the the selection at move at played in role is apparent game no have randomness and/or Abstract. his u ttsis streaks. statistics, run chains, phrases: and words to Key brought be can bases problem. data the game on and w computers Further, chess distributions. both fini statistic how imbedded run employ evaluate p we to of chains particular, run In prob observed using methods. an performed statistical be of to consists investigation Bobb this only claim allows Champion, the moves, this That World for investigated. former basis be a ought concrete it by that advanced argues was Fischer, mo by claim move prearranged this and That fixed were partici game Kasparov) the vs between all (Karpov that WCC levied was repeated, accusations frequently accusation, of such history colorful and 07 o.2,N.1 98–108 1, No. 22, Vol. 2007, [email protected] 2007 , hs n hneaeseigysrnebdelw.Luck bedfellows. strange seemingly are chance and Chess hs,dt ae,dsrbto hoy Markov theory, distribution bases, data Chess, . This . not tag eflos hr en long a being there bedfellows, strange in 1 etoscnenn i iiihdfclis(e.g., faculties diminished per- Krauthammer, his widespread of concerning view ceptions in has per- communications) Laan, investigation der sonal van Mark an Editor, (an such questioned been warrant play- chess credentials in- Fischer’s be that ing it logic the that Cham- Now, argues World vestigated. That Fischer, former (Bobby) move. WCC a Robert by by 1985 pion, advanced the move was in claim prearranged this games and re- the fixed frequently all were accusation, that such was peated, show- One are levied below. which accusations, cased of of instances history participants, colorful between and long a is eto epoieavr re itr fteWorld the of history brief very a provide we section methods. statistical and investigation an probabilistic run such using observed pursue we an moves, of particular consists of claim Fischer’s for basis htteeaeahrnst i leain among (Spassky, former allegations Champion one World his including community, to chess-playing adherents the are there that oe’ ol hmin(.Pl´ri Polg´ar Polg´arShutzman, and in (Z. Champion World women’s chess (WCC), are Championship conspiracy Chess and World the of h ae sognzda olw.I h etsub- next the In follows. as organized is paper The 1997 .Sneteol ulse,concrete published, only the Since ). 2005 demonstrate e .Hwvr ti ot noting worth is it However, ). not nte1985 the in s blsi and abilistic at.One pants. eMarkov te published, tag eflos There bedfellows. strange articular WCC), nthe en o is ion 1999 bear ve. y n n former one and ) 2 M. R. SEGAL

Chess Championship post World War II. This pro- Fischer won convincingly, but forfeited the title to vides context to the 1985 match. Conspiracy ele- in 1975. ments surrounding the 1978 match are highlighted Karpov’s challenger in the 1978 WCC was Vic- to indicate the climate in which these contests are tor Korchnoi. Previously, in 1976, Korchnoi had de- conducted. Section 2 focuses on Fischer’s claim re- fected from the and sought political garding move runs and several approaches for evalu- asylum in the . The USSR Chess Fed- ating the significance thereof. In particular, we em- eration put pressure on FIDE to exclude Korchnoi ploy imbeddings in finite Markov chains that are es- from the candidates matches. FIDE did not buckle pecially suited for this purpose. Section 3 describes and, as fate would have it, Korchnoi defeated three further analytic possibilities for run assessment that Soviet players on his way to the title match with are afforded by use of powerful, chess-playing soft- Karpov. Shortly before the final candidates match, ware, while Section 4 does likewise with respect to Korchnoi was injured in a serious car accident. Then, game data bases. Concluding discussion is presented during the WCC itself, a remarkable series of claims in Section 5. and counterclaims was exchanged, that showcases For the most part, only a very rudimentary under- the tensions and controversies surrounding chess at standing of chess is required for this paper. Rather this level. These included (i) Karpov demanding the than give definitions of pieces, legal moves and other dismantling of Korchnoi’s chair to search for “extra- rules, we defer such to the many expository treat- neous objects or prohibited devices,” (ii) Korchnoi ments available both in print and online. wearing mirrored sunglasses to neutralize Karpov’s habit of staring at his opponent, (iii) Korchnoi ac- 1.1 : History and cusing Karpov of receiving move advice encoded by Controversy the color of the yogurts that he was given during the Prior to 1948, the World Chess Championship was game, (iv) Karpov employing a parapsychologist, arranged at the behest of the reigning World Cham- Dr. Zukhar, who sat in the audience, fixedly staring pion. This formulation allowed the Champion to se- at Korchnoi, purportedly to disturb/hypnotize—this lect substandard opponents and avoid leading ad- led to intense bickering with regard to Zukhar’s seat- versaries. However, the confluence of the death of ing position and Korchnoi placing his own hypnotist the titleholder, World War II and the emergence of and two Ananda Marga sect members in the audi- an international governing body, F´ed´eration Inter- ence. nationale des Echecs´ (FIDE), led to the abandon- Ultimately, Karpov (barely) prevailed. He deci- ment of such matches by invitation. In 1948, FIDE sively beat Korchnoi in the subsequent 1981 WCC. invited six leading players to participate in a round- Karpov’s next challenger for the 1984 WCC was robin for the title of World Champion. Gary Kasparov, a mere 21-year-old, who had domi- Subsequently, up until 1990, the WCC was orga- nated his three candidates matches en route to qual- nized to run on a three-year cycle. The cycle started ifying. Rules for the 1984 WCC were that the win- with the world’s best chess players being seeded into ner would be the first player to achieve six victo- one or more . The top fin- ries, there being no limit on the total number of ishers in the interzonal tournaments qualified for a games played. Karpov started convincingly, leading series of elimination (“candidates”) matches. The 4–0 after 9 games and 5–0 after 27. However, Kas- player who emerged victorious from the candidates parov won the 32nd game and, following another matches met the reigning World Champion in a title long string of draws, won both the 47th and 48th match. games. While Karpov still led 5–3, he had lost 10 kg This period, which notably coincided with the Cold (22 lbs), been hospitalized several times and was on War, was marked by the dominance of Soviet play- the verge of collapse, his condition compounded by ers. From 1948 up to 1972 there were nine WCCs, alleged frequent use of stimulants. Amid great con- each featuring a Soviet champion and challenger. troversy, the president of FIDE canceled the match, However, in 1972, following a stunning surge through citing the failing health of the players due to the the interzonal tournament and candidates matches record breaking length and duration (6 months) of that included an unprecedented run of 19 consec- the contest. Provisions were made for a rematch to utive wins, an American, Robert (Bobby) Fischer, be held in 6 months time, with a fixed number (24) emerged as the challenger to titleholder . of games, Karpov retaining his title in the event of a CHESS, CHANCE AND CONSPIRACY 3

5 WCCs—we will investigate this incredibility in terms of runs. Consider n i.i.d. Bernoulli trials with success prob- ability p. Let Nn,k be the number of nonoverlapping success runs of length k and let Ln be the length of the longest success run. Although our primary interest is in probabilities such as Pr(Ln > k), we will sometimes obtain these by exploiting the equiv- alence {Ln < k} ⇔ {Nn,k = 0}. The exact probabil- ity distributions for these events have been histor- ically difficult to compute. Godbole (1990a) estab- lished the formula

Pr(Nn,k = x) y + x (1) = qypn−y x ⌊(n−kx)/kX⌋≤y≤n−kx   y + 1 n − kx − jk · (−1)j . j y Fig. 1. Karpov vs Kasparov key position, prior to White’s 0≤j≤⌊(nX−kx−y)/k⌋     21st move: 21. N × e6. While (1) offers computational advantages when con- trasted with previously established combinatorially tie. Kasparov won the 1985 rematch, becoming the derived alternatives (see, e.g., Philippou and Makri, youngest ever World Champion. He successfully, al- 1986), it is nonetheless problematic for evaluating beit narrowly, defended his title in three subsequent the probability of observing a run of length k = 18 in (1986, 1987, 1990) WCC matches against Karpov. a series of length n = 5540. However, Feller (1968), It is this body of Karpov vs Kasparov games, and exploiting the theory of recurrent events, provides accusations surrounding them, that is the subject of an approximation that is highly accurate even for our further analysis. moderate n, 1 − pθ 1 2. KARPOV–KASPAROV 1985: MOVE RUNS (2) Pr(L < k) ≈ · , n k + 1 − kθ θn+1 Fischer has frequently claimed that all games of where θ solves θ =1+ qpkθk+1 and q = 1 − p. Ap- the 1985 WCC match between Karpov and Kas- plying (2) to the Karpov–Kasparov collection gives parov were rigged and prearranged move by move (Fischer, 1996; Polg´ar and Shutzman, 1997). No- Pr(L5540 ≥ 18) ≈ 0.0105. tably, at his first press conference following his re- cent release to Iceland from immigration detention Now, this p-value is not incredibly incredible, espe- in Japan, Fischer repeated this allegation (ESPN cially since it does not accommodate Fischer’s im- plicit search for other (run) patterns—presumably a Sportscenter, March 25, 2005). One element of this run of moves to dark squares would also have regis- accusation, indeed the only concrete piece of evi- tered. On the other hand, it allows runs to straddle dence proffered, concerns the fourth game and the different games and this does not reflect Fischer’s position following Black’s 20th move as depicted in likely ascertainment process. Rather, runs would be Figure 1. detected within games and, accordingly, individual Fischer (1996) asserts: games should constitute the units of analysis. Starting on move 21, White makes no less So, focusing on the key game, we are interested in than 18 consecutive moves on the light Pr(L63 ≥ 18), which can be evaluated using either squares. Incredible! (1) or (2). These give (with agreement to six decimal places) Kasparov and Karpov played a total of 144 games that totaled 5540 moves over the course of their (3) Pr(L63 ≥ 18) = 0.0000896. 4 M. R. SEGAL

If this p-value is adjusted for multiplicity (number The prescription pi = 0.5 derives from the fact of games = 144) via Bonferroni correction, we ob- that 32 of the 64 squares on the are light. tain an (adjusted) p-value (0.013) that again does Indeed, the symmetries and piece and moves not impress as being overly incredible. However, not are such that, prior to White’s first move, there are only is such Bonferroni correction conservative, but equal numbers of moves to light and dark squares. it also does not reflect the length structure of the However, that situation does not necessarily persist. game collection. At the cost of obtaining p-values For example, following Fischer’s generally preferred for the maximal run within each game, we could first move, 1. e4, we have p2 ≥ 2/3. In studying hit- seek to redress these concerns using, say, step-down ting streaks in baseball, Albright (1993) employed Bonferroni (Dudoit, Shaffer and Boldrick, 2003) or so-called situational covariates to capture variation in underlying probabilities of getting a hit for a given false discovery rates (Storey, Taylor and Siegmund, at bat. These included such features as home field 2004). But there is a far more fundamental con- and day/night game. We could attempt an analo- cern that affects the computation of these individ- gous approach here. Relevant covariates would in- ual p-values. The results given in (1) and (2) re- clude presence of opposite-colored bishops, presence quire that the underlying sequence of Bernoulli tri- of knights, nature of the and even als is i.i.d. Here the corresponding ith random vari- advent of a , the latter often inducing able can be designated as Xi ≡ I{White’s movei → repetitions. However, selecting, characterizing and light square} and identically distributed equates to quantifying such positional covariates seems prob- pi = Pr(Xi =1)= p ∀ i. All the above p-value cal- lematic, especially considering the availability of a culations have been based on the seemingly natural simple proxy. specification pi = 0.5. We next further scrutinize this The proxy is based on the set of legal moves. After specification. all, 18 consecutive light square moves in checkers

Fig. 2. Probabilities of White making a legal move to a light square. The 18 move run is indicated by the dashed vertical lines. CHESS, CHANCE AND CONSPIRACY 5 would be truly incredible, since checkers is played Nn,k (recall equivalence to Ln), Fu and Koutras es- ∗ n ∗ ∗′ exclusively on dark squares. Define tablished Pr(Nn,k =0)= π0( i=1 Λi )U , where

#{legal movesi → light square} qi pi Q0 0 . . . 0 (4) pi = . #{legal moves } qi 0 pi 0 . . . 0 i   ∗ qi 0 0 pi . . . 0 Results that follow are not affected by how possible Λi =  ......   ......  ambiguities surrounding are handled. Here, (k+1)×(k+1)     we exclude castling throughout. At the onset of the  qi 0 0 0 ... pi   0 0 0 0 . . . 1  run we have p21 = 33/43. Using this value in either     (1) or (2) gives ∗ ∗ and π0 = (1, 0,..., 0) and U = (1,..., 1, 0) are 1 × (k + 1) vectors. For the key Karpov–Kasparov game Pr(L63 ≥ 18|p = p21) = 0.096. use of this imbedding, with pi as per (4) and Fig- While again this is not remarkable, even prior to ure 2, gives multiplicity adjustment, we have imposed p21 (6) Pr(L63 ≥ 18|pi) = 0.0065. throughout the game. Inspection of Figure 2, which plots pi vs i, reveals this to be an extreme choice. The densities (smoothed histograms) for L63 under What is needed is a method for evaluating runs with pi and pi = p = 0.5 are presented in Figure 3. The varying success probabilities. shift in mass corresponding to (appropriate) use of Fu and Koutras (1994) provided just such a legal moves is evident and underscores the difference methodology. They showed that the distributions of between the p-values (3) and (6). We note that com- a variety of run statistics can be readily evaluated putation via imbedded Markov chains is exceedingly in the nonidentically distributed case by appropri- fast and easy. ate imbedding in a finite Markov chain. We briefly Since (6) is our final p-value proffered for Karpov recapitulate the relevant definitions and results. For and Kasparov’s 18-move run of light square moves, we provide some context with regard to magnitude given n, let Γn = {0, 1,...,n} be an index set and by citing Short and Wasserman’s (1989) appraisal of let Ω = {a1,...,am} be a finite state space. A non- negative, integer-valued random variable Z can be the p-value they computed in assessing the “streak- n of-streaks”—Joe DiMaggio’s 56-game hitting streak. imbedded into a finite Markov chain if: They reported one p-value of 0.0000486 and com- (a) there exists a finite Markov chain {Yi : i ∈ ment that “the paucity of this probability is not Γn = {0,...,n}} defined on Ω, overwhelming.” Accordingly, we would not charac- (b) there exists a finite partition {Cx,x = 0, 1, terize the p-value in (6) as incredible. ...,l} on Ω, and (c) for every x = 0, 1,...,l, we have Pr(Zn = x)= 3. CHESS PROGRAMS: Pr(Yn ∈ Cx). In baseball, the batters objective is to hit, the 1919

If Zn can be imbedded into a finite Markov chain Chicago Black Sox excepted. In basketball, where then, simply as a consequence of the definition and streak shooting has also been statistically evaluated Chapman–Kolmogorov equations, we have (Theo- (Larkey, Smith and Kadane, 1989; Tversky and rem 2.1, Fu and Koutras, 1994) Gilovich, 1989a, b), the objective is to make the bucket, with 1978–1979 Boston College (among oth- n ers) excepted. In chess, if the objective was to move (5) Pr(Z = x)= π Λ U ′(C ), n 0 i x to light squares, there would be a lot more sacrifice i=1 ! Y of dark square bishops. So, to infer anything about where Λi is the m × m transition probability matrix “signal” or “suspicion” in the moves that constitute of the Markov chain, π0 is the initial distribution the run singled out by Fischer, it is necessary to e e and U(Cx)= r : ar ∈Cx r for r a 1 × m unit vector judge the quality of these moves. If, conditioning on having 1 at the rth coordinate and 0 elsewhere. successive positions within the streak, the best move P The art in utilizing imbedded Markov chains for happens to be to a light square, then what is the evaluating run statistic distributions lies in eliciting message/implication? Is it incredible if the World suitable Ω and Λi. For our run statistic of interest, Champion makes 18 good moves in a row? Was the 6 M. R. SEGAL quality of moves played during the run notably dif- comparisons will be relative—{in run (R)/not in run ferent than the quality of moves chosen outside the (R¯)}—all that is required is that strong moves are run? Of course, judging quality is tricky—Fischer, consistently recognized and that the assessments are Karpov and Kasparov are all former World Cham- unbiased with respect to the move run. pions and regarded among the greatest players of all Fritz returns a quantitative score for each move time—it would be presumptuous for mere mortals to evaluated. Here, the more positive the score, the second guess their evaluations. better the move. We cannot just compare R and So, to make such judgments we turn to chess- R¯ scores directly, since differing score distributions playing software. Select programs are now capable reflect the decisiveness of the move and so, for exam- of playing competitively against the world’s leading ple, are correlated with stage of the game. Thus, for grandmasters. For example, in October 2002, World our first outcome, we standardize according to the Champion drew an eight-game best move possible and define ∆i = scorei −bestscorei, match against the Fritz program; in early 2003, Kas- where scorei is Fritz’s score for the ith move as parov drew a six-game match against the Deep Ju- played and bestscorei is the score for the best pos- nior program; and in November 2003, Kasparov drew sible move. So, ∆i ≤ 0 with less negative values cor- a four-game match against Fritz. Accordingly, we responding to better moves. Comparing the ∆s ob- use Fritz to evaluate move quality. While we contend tained during the run (nR = 18) to those not in the that these results, along with wider tournament per- run (nR¯ = 45) via a two-sample t-test shows that formances and ratings, establish Fritz’s credentials better moves tended to be played during the run, to determine good moves, it is important to recog- but the difference is nonsignificant (p = 0.27). The nize that we are not claiming infallibility or even su- same findings hold if we limit nonrun moves to the periority of Fritz’s move assessments. Rather, since nine preceding and nine succeeding the run (to exert

Fig. 3. Smoothed histograms of Pr(L63 = k). Tail areas to the right of the dashed line at k = 18 correspond to the p-value reported in (3) and (6). CHESS, CHANCE AND CONSPIRACY 7

finer control over stage of game), and/or if we work The first was based on the key position itself (1). If ∗ on a relative scale (∆i =∆i/bestscorei), and/or if this position had occurred in other high level games, we work with ranks instead of scores. then the ensuing sequence of moves would be poten- A more sophisticated approach is afforded by em- tially informative with regard to the incredulity of ploying the binary outcome that indicates whether the run. Not surprisingly, search of an online data the move played coincided with Fritz’s best move, base that comprises more than two million games Yi = I{∆i = 0}, and using logistic regression to at- (www.chesslab.com/) revealed the position to be tempt to adjust for putative covariates. This also unique. remedies a concern with t-test appropriateness in The next search is motivated by an important fea- view of the underlying mixed (discrete with mass at ture of the key position: the presence of opposite- zero, continuous) distribution of ∆. As previously, colored bishops. This is arguably the most impor- covariate specification is challenging: what we pro- tant attribute with respect to generating move runs pose here is natural but not exhaustive. Playing the on either light or dark squares. We used the com- best move when it is obvious is less incredible than mercial Chessbase (www..com/) Big 2000 when it is obscure. Accordingly, we adjust for po- data base which, although not as sizable as Chesslab sitional complexity which, for each move, we op- (1,327,059 games), possesses more powerful search erationalize as complexity(j, k) = score(jth best)/ tools. The following search parameters were prescribed— score(kth best) for differing choices of j > k = 1,..., 5. for a game to be selected it must satisfy all require- Similarly, playing the best move when the number of ments at some stage: possibilities is limited is also less compelling, so we use possibles = #{legal moves} as another covari- 1. Opposite-colored bishops ate. We fitted several models of the form logit(Y )= 2. No knights or major pieces βrun+ f(possibles, complexity(j, k)), corresponding 3. Three to four pawns to different specifications for f, j and k. All gave 4. Average Elo rating of players ≥ 2500 highly null results with respect to tests for β, the 5. Length of game ≥ 80 moves parameter of interest. In addition to limiting the number of games chosen, Therefore, when considering the quality of moves these criteria are motivated by the following con- played, there is nothing distinctive, let alone incredi- siderations. Item 2 eliminates pieces that can move ble, about the run Fischer isolates. However, in addi- on either color squares; item 3 balances complex- tion to relying on Fritz’s assessments of move qual- ity; item 4 imposes level play. Item 5, ity, these analyses could be criticized on the basis which mandates relatively long games, is motivated of being underpowered to detect distinguishing at- by run statistic asymptotics. In particular, if n, k → tributes of the run. So, as a final approach, we turn k ∞ such that nqp → λ, then Pr(Nn,k = x) tends to to repositories of chess games to provide broader the Poisson probability e−λλx/x! (Feller, 1968; God- context for the run. bole, 1990b). So if p is fixed, k = O(log(n)), thereby bestowing a premium on examining longer games. 4. SEARCHING CHESS GAME DATA BASES The resulting search of Chessbase Big 2000 yielded Efron (1971) reexamined Bode’s law “governing” 11 games that had, in chronological order, the fol- mean planetary distances to the Sun as an exem- lowing maximal run lengths: L85 = 21,L109 = 29, plar of whether an observed sequence of numbers L81 = 46,L90 = 10,L92 = 13,L85 = 18, L81 = 17, follows a simple rule. He states that differing an- L86 = 14,L98 = 20,L113 = 9,L94 = 12. Fischer’s iden- alytic approaches would have been employed had tified Karpov–Kasparov run, with L63 = 18 does not measurements on 50 solar systems been available. So stand out against this collection. Indeed, if anything far, our treatment of the significance of the run has is notable, it is the third game with L81 = 46: not been confined to the key Karpov–Kasparov game in only was this game played at the candidates level question. However, unlike the planetary system situ- (Timman vs Salov, 1988), but, in addition to the ation, there are extensive data bases of chess games cited 46-move run (played by Black) that attains that we can mine so as to appraise the distinctive- an appreciably smaller p-value than the Karpov– ness of the run identified by Fischer. To this end we Kasparov run (2.24 × 10−13), it featured a separate conducted three structured searches. 34-move run (played by White). 8 M. R. SEGAL

Fig. 5. Probabilities of White making a legal move to a light square. The 13-move run is indicated by the dashed vertical lines.

For the third and final search we elected to scru- what does this imply about the former? It is of in- tinize the 827 Chessbase Big 2000 games of Fischer terest to note how the pi fluctuations in the Fischer– himself. Search item 1 was retained, the restrictions Reshevsky game, as depicted in Figure 5, are such imposed by items 2, 3 and 4 were eliminated, and that the probability density for Pr(L57 = k) under item 5 was relaxed to game lengths ≥ 50 moves. pi from (4) is barely distinguishable from that under This yielded a total of five games. We focus on one pi = 0.5, as illustrated by the smoothed histograms of these, Fischer vs Reshevsky, 1957, in Figure 6. Championship. In particular, the position arising prior to White’s 34th move is of interest; see Fig- 5. DISCUSSION ure 4. From this position Fischer (White) played 13 consecutive moves to dark squares, this in the con- In their evaluation of coincidences, Diaconis and text of a 57-move game with move-by-move prob- Mosteller (1989) showcased the roles played by mis- abilities of moving to a light square as depicted in perception, multiplicity and the law of truly large Figure 5. numbers. These factors have arguably contributed We follow the same approach in using imbedded to Fischer’s overstating the significance of the Karpov– Markov chains to evaluate the significance of this Kasparov move run. Indeed, misperceptions surround- run, using pi in accordance with (4) and Figure 5, ing the significance of runs arising from random pro- as was employed in Section 2. This gives cesses are purportedly commonplace; see, for exam- ple, Scheaffer, Gnanadesikan, Watkins and Witmer (7) Pr(L57 ≥ 13|pi) = 0.0023. (1996) for pedagogic exercises that illustrate this The punch line is that the p-value in (7) is more ex- point. These misperceptions are compounded when treme than that achieved by the Karpov–Kasparov (implicit) search for an extreme run is performed run given in (6). Thus, if the latter is “incredible,” and multiplicity considerations are ignored; see Al- CHESS, CHANCE AND CONSPIRACY 9

Fig. 6. Smoothed histograms of Pr(L57 = k).

bert (2004) for some examples from baseball. In ad- dition to seemingly failing to account for these con- cerns when assessing run significance, Fischer, in ex- pressing incredulity at the Karpov–Kasparov move run, has also assumed a constant “success” proba- bility for each move: pi = p = 0.5. Even aside from variation in pi, it is clear from (1) or (2) that Pr(Ln ≥ k) depends heavily on p. This has again been noted in the baseball context (Casella and Berger, 1994)—a more successful player (team) is likely to have a longer hitting (winning) streak than a less successful player (team). Indeed, Lou (1996) provided an elegant extension to the imbedded Markov chain approach of Fu and Koutras (1994) that allows for evaluation of joint and con- ditional distributions of the number of successes and Ln. Not only does her framework accommodate vary- ing pi, but it also facilitates testing of between-move dependency. However, application of these methods here is problematic owing to difficulties in estimat- ing, or even prescribing, the requisite transition prob- Fig. 4. Fischer vs Reshevsky key position, prior to White’s abilities. Furthermore, the power curves displayed 34th move: 34. N d4. by Lou (1996) suggest that in our n = 63 setting we 10 M. R. SEGAL will be hard-pressed to detect between-move depen- laws of chance. They base this assertion on analyses dencies. Additionally, as noted by Rubin, McCulloch of data obtained by coding shot attempts from 48 and Shapiro (1990), there are instances where such televised NBA games plus free throw sessions of col- conditional analyses are inappropriate. These include lege players. In an effort at refutation, Larkey, Smith situations when there is no intrinsic interest in the and Kadane (1989) took issue with the analyses in total number of successes, as is the case here where large part on the basis of sample space considera- success corresponds to a move to a light square. tions. They argued that the unit of analysis should Alternative approaches to appraising the signif- reflect “cognitively manageable chunks,” capturing icance and/or asserting the existence of runs and temporal proximity, which they operationalize as streaks have been advanced. In the sports context, 20 consecutive shot attempts. Tversky and Gilovich Yang (2004) employs Bayesian binary segmentation (1989b) offer a rebuttal that (in part) further re- to assess whether transitions in success probability fines sample space considerations by accommodat- are evident. Applying his methodology to the key ing individual playing times. Here a definitive char- Karpov–Kasparov game yields highly null results. acterization of sample space seems daunting. More As highlighted by a referee, evaluation of success challenging still is capturing variations in p , despite runs can be highly dependent on (i) the sample i the widely acknowledged need to do so. This de- space employed and (ii) how the success probabil- rives from the need to accommodate the hard-to- ities, pi, are framed. Here, we have argued that (i) the appropriate unit of analysis is a game, rather quantify notion of defensive pressure, above and be- than a match or series of matches, and (ii) allow- yond facets such as position on the floor and phase of the game. Further compounding these obstacles ing variable pi, that at the least reflects the num- ber of legal moves available, is essential. The resul- is the absence of usable data bases: the analyses tant achieved significance levels were sufficiently null conducted required the individual investigators to [see (6)] that the above mentioned multiplicity cor- watch and encode film, which in and of itself led to rections [for the number of games in the Karpov– dramatic disputation (Tversky and Gilovich, 1989b). Kasparov match(es)] were not pursued. In our evaluations of the significance of the Karpov– However, in other settings, including two celebrated Kasparov move run, not only have we benefited from examples briefly described next, these considerations being able to frame the problem (relatively) pre- can be less clear-cut. In their evaluations of Joe cisely, but also from the existence of sophisticated DiMaggio’s 56-game hitting streak (baseball), Short chess-playing software and game data bases. The and Wasserstein (1989) performed various probabil- latter tool could be used to assess a run potentially ity calculations that are readily reproduced using far more remarkable than the move run, namely, either (1) or (2). What distinguishes the differing Fischer’s own aforementioned run of 19 consecu- calculations are the alternative sample spaces con- tive wins as part of the 1972 WCC qualifying cycle. sidered: season, career, history of baseball. Indeed, This run included two 6–0 shutouts in his two candi- dramatically different results ensue, with no attempt dates matches. As an indication of the incredibility to arbitrate or reconcile between them. Throughout, of these results and to ground things in the world they employ constant p based on lifetime batting of grounders, Time magazine equated the shutouts average. In this case, given the rich documentation with pitching “two straight no-hitters.” Perhaps Fis- surrounding baseball, it might be possible to im- cher’s ascent to World Champion was part of some prove on this imposition. This could make recourse conspiracy. either to day-of-game batting average or, more am- bitiously, to modeling pi based on game specific co- ACKNOWLEDGMENTS variates, akin to Albright (1993). However, as is made clear by the discussants of that article, such I thank the Executive Editor, an Editor, two ref- modeling is problematic. Furthermore, despite the erees and Chuck McCulloch for many helpful com- extensive compilation of historical baseball statis- ments and suggestions. tics, extracting the requisite game level data is, at best, a laborious undertaking. Tversky and Gilovich (1989a) contended that the REFERENCES “hot hand” phenomenon in basketball is a “cogni- Albert, J. H. (2004). Streakiness in team performance. tive illusion” that derives from misconceptions of the Chance 17(3) 37–43. MR2061937 CHESS, CHANCE AND CONSPIRACY 11

Albright, S. C. (1993). A statistical analysis of hitting Larkey, P. D., Smith, R. A. and Kadane, J. B. (1989). streaks in baseball (with discussion). J. Amer. Statist. As- It’s okay to believe in the “hot hand.” Chance 2(4) 22–30. soc. 88 1175–1196. Lou, W. Y. W. (1996). On runs and longest run tests: A Casella, G. and Berger, R. L. (1994). Estimation with method of finite Markov chain imbedding. J. Amer. Statist. selected binomial information or do you really believe that Assoc. 91 1595–1601. MR1439099 Dave Winfield is batting .471? J. Amer. Statist. Assoc. 89 Philippou, A. N. and Makri, F. S. (1986). Successes, 1080–1090. MR1294753 runs and longest runs. Statist. Probab. Lett. 4 101–105. Diaconis, P. and Mosteller, F. (1989). Methods for study- MR0829441 ing coincidences. J. Amer. Statist. Assoc. 84 853–861. Polgar,´ Z. and Shutzman, J. (1997). of the Kings MR1134485 Game. CompChess, New York. Dudoit, S., Shaffer, J. P. and Boldrick, J. C. (2003). Rubin, G., McCulloch, C. E. and Shapiro, M. A. (1990). Multiple hypothesis testing in microarray experiments. Multinomial runs tests to detect clustering in constrained Statist. Sci. 18 71–103. MR1997066 free recall. J. Amer. Statist. Assoc. 85 315–320. Efron, B. (1971). Does an observed sequence of numbers fol- Scheaffer, R. L., Gnanadesikan, M., Watkins, A. and low a simple rule? (Another look at Bode’s law). J. Amer. Witmer, J. (1996). Activity-Based Statistics. Springer, Statist. Assoc. 66 552–559. New York. Feller, W. (1968). An Introduction to Probability The- Short, T. and Wasserman, L. (1989). Should we be sur- ory and Its Applications 1, 3rd ed. Wiley, New York. prised by the streak of streaks? Chance 2(2) 13. MR0228020 Spassky, B. (1999). Excerpts from an interview. Inside Chess Fischer, R. J. (1996). Excerpts from an interview. Inside 12(4) 22–23. Chess December 23 9. Storey, J. D., Taylor, J. E. and Siegmund, D. (2004). Fu, J. C. and Koutras, M. V. (1994). Distribution theory Strong control, conservative point estimation and simulta- of runs: A Markov chain approach. J. Amer. Statist. Assoc. neous conservative consistency of false discovery rates: A 89 1050–1058. MR1294750 unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol. Godbole, A. P. (1990a). Specific formulae for some suc- 66 187–205. MR2035766 cess run distributions. Statist. Probab. Lett. 10 119–124. Tversky, A. and Gilovich, T. (1989a). The cold facts about MR1072498 the hot hand in basketball. Chance 2(1) 16–21. Godbole, A. P. (1990b). Degenerate and Poisson conver- Tversky, A. and Gilovich, T. (1989b). The “hot hand”: gence criteria for success runs. Statist. Probab. Lett. 10 Statistical reality or cognitive illusion. Chance 2(4) 31–34. 247–255. MR1072518 Yang, T. Y. (2004). Bayesian binary segmentation procedure Krauthammer, C. (2005). Did chess make him crazy? Time for detecting streakiness in sports. J. Roy. Statist. Soc. Ser. April 26 96. A 167 627–637. MR2109410