From: AAAI Technical Report WS-97-04. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.

The Historical Development of Computer Chess and its Impact on Artificial Intelligence

David Heath and Derek Allum
Faculty of Science and Computing, University of Luton, Park Square, Luton LU1 3JU, United Kingdom
[email protected] [email protected]

Abstract

In this paper we review the historical development of computer chess and discuss its impact on the concept of intelligence. With the advent of electronic computers after the Second World War, interest in computer chess was stimulated by the seminal papers of Shannon (1950) and Turing (1953). The influential paper of Shannon introduced the classification of chess playing programs into either type A (brute force) or type B (selective). Turing's paper (1953) highlighted the importance of only evaluating 'dead positions' which have no outstanding captures. The brute force search method is the most popular approach to solving the chess problem today. Search enhancements and pruning techniques developed since that era have ensured the continuing popularity of the type A method. Alpha-beta pruning remains a standard technique. Other important developments are surveyed. A popular benchmark test for determining intelligence is the Turing test. In the case of a computer program playing chess, the moves are generated algorithmically using rules that have been programmed into the machine by a human mind. A key question in the artificial intelligence debate is to what extent computer bytes aided by an arithmetic processing unit can be claimed to 'think'.

Introduction

With the advent of computers after the end of the Second World War, interest in the development of chess playing programs was stimulated by two seminal papers in this area. The paper by Shannon (1950) remains even to this day of central importance, while the paper by Turing (1953) is equally influential.

The minimax algorithm was first applied in a chess context in the landmark paper of Shannon. He also introduced the classification of chess playing programs into either type A or type B. Type A are those that search by 'brute force' alone, while type B programs try to use considerable selectivity in deciding which branches of the game tree require searching.

Alpha-beta pruning was first formulated by McCarthy at the Dartmouth Summer Research Conference on Artificial Intelligence in 1956. However, at this stage no formal specification of it was given, but it was implemented in game playing programs of the late 1950s. Papers by Knuth and Moore (1975) and Newborn (1977) have analysed the efficiency of the method, and it has been proved that the algorithm returns exactly the same move as that obtained by full minimaxing or, alternatively, a move of the same value.

The success of type A 'brute force' programs using exhaustive search, minimaxing, alpha-beta pruning, transposition tables and other search enhancements has had the unfortunate effect of minimising interest in the development of type B programs. The work of Simon and Chase (1973) established that most humans only consider a handful of plausible moves. The power and ability of the chess master resides in his ability to select the correct subset of moves to examine.

In contrast, the brute force programs search the entire spectrum of initial moves diverging from some given position referred to as the root. These initial moves fan out, generating a game tree which grows exponentially with depth. Apart from the work of Botvinnik et al. in recent years, there has been no significant progress in developing a type B strategy program which would reduce the initial span of the search tree.

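The minimax search and alpha-beta cutoff just described can be made concrete with a short sketch. The following Python fragment is our own illustration, not code from the paper: the toy game tree, its leaf values, and the function name are invented. Leaves hold static evaluations from the point of view of the side to move.

```python
# Illustrative sketch only (not from the paper): minimax in its negamax
# form with alpha-beta pruning, applied to a hand-built toy game tree.
# Leaves hold static evaluations from the point of view of the side to move.
INF = float("inf")

def negamax(node, alpha=-INF, beta=INF):
    """Return the minimax value of `node` (negamax formulation)."""
    if not isinstance(node, list):     # a leaf: a 'dead' position to evaluate
        return node
    best = -INF
    for child in node:
        best = max(best, -negamax(child, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:              # cutoff: the opponent avoids this line
            break
    return best

# Depth-2 tree: the mover picks the branch whose worst reply is best.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(negamax(tree))                   # prints 3: min(3, 12, 8) is the best worst case
```

A pure type A program is essentially this search carried to a fixed depth over all legal moves; the pruning never changes the value returned, only the number of nodes visited, which is the point proved by Knuth and Moore (1975).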
The successful development of algorithms that introduced selectivity into the search engine, so that the program followed similar lines of thought to those of a chess grandmaster, would considerably reduce the amount of searching required.

Turing's paper (1953) highlighted the importance of only evaluating 'dead positions' which have no outstanding captures. It is from this paper that the term 'Turing dead' is taken. Most chess programs search to a quiescent position (Turing dead) and then evaluate the position as a function of the material balance and other features. This evaluation encapsulates chess-specific knowledge typically relating to pawn structures and king safety.

History

The historical evolution of computer chess programming techniques and knowledge can be conveniently discussed in three broad eras. Each era is characterised by its own particular developments, some of which can be directly linked to increased processor power and the availability of new hardware devices, and others to algorithmic advances.

The boundary between each era is not always precise, as it is sometimes not easy to draw a clear dividing line across a time continuum. Any such division is artificial. However, these broad historical eras are (Donskoy and Schaeffer 1990):

(i) 1st era (1950 - c1975)
(ii) 2nd era (c1975 - c1985)
(iii) 3rd era (c1985 onwards)

The first pioneering era as stated above runs from 1950 to c1975. Here there is a definite point at which this history commences, marked by the publication of Shannon's paper and the advent of electronic computers. These computers, although originally regarded as revolutionary and powerful, had but a fraction of the computing power of the current generation of microprocessors. Indeed, hardware limitations characterise the first era of computer chess development, requiring highly selective techniques in order to produce moves in an acceptable time. The earliest programs were, therefore, Shannon type B.

The first significant chess playing program was by Bernstein (1957) and ran on an IBM 704 computer, capable of performing approximately 42,000 operations per second. This was not a 'brute force' program, as it only selected the best seven moves for consideration using heuristics based on chess lore. Compared to the sophisticated brute force programs of today, which generate the full span of moves at the root, this is a very limited range of moves.

The Bernstein program was built around the strategy of working to a plan. As it was incapable of doing a full width search due to time considerations, it selected its quota of seven moves for deeper analysis by seeking the answer to eight questions. Once the moves were selected, they were analysed to a search depth of 4 ply. The potential replies by the opponent were also selected on the basis of seeking answers to the same set of questions.

Bernstein's program played at a very elementary level, and the first program to attain any recognisable standard of play was that of Greenblatt (1968). For a number of years this remained the most proficient chess program and played at an Elo strength of approximately 1500. It had carefully chosen quiescence rules to aid tactical strength and was also the first program to use transposition tables to reduce the search space. However, the Greenblatt program also used an initial selection process to minimize the size of the game tree, as the computing hardware of that era was incapable of achieving the computing speeds of today. Again this program, because of its selectivity at the root node, falls into the first era.

The first program to achieve full width search and make 'brute force' appear a viable possibility was Chess 4.5. This program was developed for entry into the ACM 1973 Computer Chess contest by Slate and Atkin, using the experience they had gained in programming earlier selective search chess programs. The techniques used by Slate and Atkin (1977) are still in use today, although they have been refined and improved over the years. These standard techniques initiated what may be classed as the second technology era, giving rise to programs typically searching to a fixed depth as fast as possible and then resolving the horizon problem by extending checks at the cutoff depth and considering captures only in a quiescence search.

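The 'Turing dead' rule described above can be sketched in a few lines. This is our own illustration, not from the paper: `evaluate`, `captures` and `make` are hypothetical stand-ins for a real engine's static evaluator, capture generator and move executor.

```python
# Illustrative sketch only: evaluating 'Turing dead' positions. Rather than
# scoring a position that still has outstanding captures, the search follows
# capture sequences until a quiet position is reached. `evaluate`, `captures`
# and `make` are hypothetical stand-ins supplied by the caller.
INF = float("inf")

def quiesce(pos, alpha, beta, evaluate, captures, make):
    """Score `pos` for the side to move, extending the search along captures only."""
    stand_pat = evaluate(pos)          # static score if we capture nothing
    if stand_pat >= beta:
        return stand_pat               # already refutes the opponent's line
    alpha = max(alpha, stand_pat)
    for move in captures(pos):         # outstanding captures: not yet 'dead'
        score = -quiesce(make(pos, move), -beta, -alpha,
                         evaluate, captures, make)
        if score >= beta:
            return score
        alpha = max(alpha, score)
    return alpha
```

At the nominal depth limit, the main search would call `quiesce(...)` in place of a bare `evaluate(pos)`, so that only quiescent (Turing dead) positions are ever scored statically.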
The second era of computer chess also saw more emphasis placed on the development of dedicated chess hardware. Typical of such developments was Thompson's chess machine Belle, which won the ACM North American Computer Chess Championship in 1978. The special purpose hardware increased the speed of Belle, enabling it to analyse 30 million positions in 3 minutes and search exhaustively to 8 or 9 ply in middlegame positions. It also had an extensive opening book. Belle won the 3rd World Computer Chess Championship in 1983, achieving a rating of over 2200 in 1980, and was the first program to receive a Master rating. It was for Belle that Thompson devised his first end game database, solving the KQKR problem and hence partially removing the perceived weakness at that time of computers in endgame positions.

In 1975 Hyatt commenced work on Blitz, which was then entered in the ACM 1976 North American Computer Chess Championship. Initially Blitz was selective and relied on a local evaluation function to discard moves. However, the availability of the world's fastest computer, the Cray, enabled the program, appropriately renamed Cray Blitz, to use brute force, and in 1981 it was searching approximately 3000 nodes per second and consistently achieving six ply searches. This rate of analysis was improved by the use of assembly language and the availability of the Cray XMP computer with multiprocessing facilities, allowing 20,000 - 30,000 nodes to be searched per second in 1983.

The third stage, aptly named algorithmic by Donskoy and Schaeffer (1990), extends onwards from the mid 1980s to the current time. This has seen some considerable activity in the refinement of the basic tree searching algorithms used (see later). It has also seen the emergence of personal computers on a scale originally unenvisaged, and the widespread practice of incorporating vast opening books and CD-ROMs for end games. This current phase has been most fruitful and has seen a considerable increase in the playing strength of chess programs, to the point where the top commercial software programs are generally recognised as being superior to humans at speed chess and in sharp tactical positions. Under strict tournament conditions the most highly rated player of all time, Kasparov, has now lost a game, but not the match, against Deep Blue.

Move Ordering Techniques

The previous section has outlined the historical development of computer chess. The transition from Shannon type B programs to Shannon type A is not solely attributable to increased computing power. It also partly arose out of increased understanding of the alpha-beta algorithm, which was subjected to deep analysis by Knuth and Moore (1975). Other techniques for increasing the efficiency of the search and pruning mechanisms also became more prominent at the beginning of the second era.

Producing a cutoff as soon as possible with the alpha-beta algorithm considerably reduces the size of the search tree. Consequently, move ordering is an important aspect of achieving maximum speed, since if we know the best move in a certain situation, producing it early rather than late will have beneficial results. The worst case is where the moves are examined in such an order as to produce no cutoffs, generating the maximum number of nodes to be analysed. This is the maximal tree, which grows exponentially with depth d, so that the number of nodes examined, assuming a uniform branching factor b, is given by b^d. The best situation is where the moves are all well-ordered and provide immediate cutoffs, producing the minimal tree which, although it also grows exponentially with depth, is very much reduced in size by the cutoffs. In between these two extremes we have the game trees generated by chess programs, which are initially unordered but become progressively more ordered as the depth of search increases.

Algorithmic developments and various heuristics are moving the programs closer to the minimal tree. This progressive improvement in reaching closer and closer to the minimal tree is borne out by Belle, which it is estimated came within a factor of 2.2 of the minimal tree, Phoenix within a factor of 1.4, and a third program within an impressive factor of 1.2 (Plaat 1996).

Iterative deepening is a useful device for allowing moves to be re-ordered prior to the depth being increased, as well as providing a method for limiting search depth in response to time constraints. Originally introduced in 1969, it has become standard practice in brute force programs.

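The gap between the maximal and minimal trees can be demonstrated numerically. The sketch below is our own illustration (the uniform toy tree and all function names are invented): it counts the leaves alpha-beta actually evaluates on the same tree with and without good move ordering, against the Knuth and Moore (1975) minimal-tree figure. Ordering by the true subtree value is an idealisation no real program can afford; it serves here as the best case.

```python
# Illustrative sketch only (not from the paper): counting the leaves that
# alpha-beta evaluates on a uniform tree of branching factor b and depth d,
# with and without idealised best-first move ordering.
import math
import random

INF = float("inf")

def alphabeta(node, alpha, beta, counter):
    """Fail-soft negamax with alpha-beta pruning; counts evaluated leaves."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    best = -INF
    for child in node:
        best = max(best, -alphabeta(child, -beta, -alpha, counter))
        alpha = max(alpha, best)
        if alpha >= beta:              # cutoff
            break
    return best

def build(b, d, leaves):
    return next(leaves) if d == 0 else [build(b, d - 1, leaves) for _ in range(b)]

def value(node):
    return node if not isinstance(node, list) else max(-value(c) for c in node)

def order_best_first(node):
    """Idealised ordering: sort children so the best reply comes first."""
    if not isinstance(node, list):
        return node
    return sorted((order_best_first(c) for c in node), key=value)

b, d = 3, 4
vals = list(range(b ** d))             # distinct leaf evaluations
random.Random(42).shuffle(vals)        # deterministic scrambling
tree = build(b, d, iter(vals))

count = [0]; alphabeta(tree, -INF, INF, count); unordered = count[0]
count = [0]; alphabeta(order_best_first(tree), -INF, INF, count); ordered = count[0]

# Knuth and Moore (1975): the minimal tree has b^ceil(d/2) + b^floor(d/2) - 1 leaves.
minimal = b ** math.ceil(d / 2) + b ** math.floor(d / 2) - 1
print(f"maximal {b ** d}, unordered {unordered}, best-first {ordered}, minimal {minimal}")
```

With b = 3 and d = 4 the maximal tree has 81 leaves but the minimal tree only 17, which is why heuristics that improve move ordering repay their cost so handsomely.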
The killer heuristic is another move ordering technique, using information from the alpha-beta pruning algorithm. Whenever a certain move causes a cutoff in response to another move, it is likely that it is able to refute other moves in a similar position. Such a move is a 'killer move' and can be stored in a killer table so that it can be tried first in order to cause another cutoff. Captures and mate threats provide the commonest form of such 'killer moves'. In this context the null move can provide a list of the most powerful moves for the opponent and assist in the move ordering process.

The killer heuristic was described in detail by Slate and Atkin (1977), although it had been used in earlier chess programs. However, the benefits of this technique are controversial, with Hyatt claiming an 80% reduction while Gillogly observed no such significant reduction in size (Plaat 1996).

A generalisation of the killer heuristic is the history heuristic introduced by Schaeffer (1989). This extends the ideas of the killer heuristic to include all the interior moves, which are ordered on the basis of their cutoff history.

Tree Searching Algorithms

Chess programs have until now used almost exclusively one of a variety of related algorithms to search for the best move. Constant improvements in the search algorithm have been sought to reduce the number of nodes visited, and thus speed up the program.

We now briefly describe a number of variations upon the alpha-beta algorithm. These depend on having some knowledge, albeit imperfect, of what is the most likely principal variation, the series of moves down the tree consisting of the best play.

The usual call to alpha-beta at the root node has an infinite window. In aspiration search some initial estimate of a likely value is made, together with a window, or band of values this likely value could lie in. Clearly the success of this algorithm depends on the initial estimates. If a value is returned which lies outside the aspiration search window, then a re-search needs to be done with full alpha-beta width. There is thus a trade-off to be made between a greater degree of pruning using the limited window and the amount of re-search which has to be done.

Principal variation search (PVS) is a variation of alpha-beta search where the search at a particular level is ordered according to some evaluation function. The node which is found to be the most likely member of the principal variation is searched with a full width alpha-beta search, while the others are searched with a minimal window where β = α + 1. With perfect move ordering, all moves outside those estimated to be part of the principal variation will be worse than those estimated to be on it. This is proved when the minimal window search fails low. Should this search fail high, a re-search has to be done with full alpha-beta width.

A refinement on PVS is NegaScout. This is very similar to PVS and incorporates the observation that the last two plies of a tree in a fail-soft search always return an exact value.

There have been a number of further derivations of the so-called zero window alpha-beta algorithm, which use a zero (actually unity) window at all nodes, unlike PVS, which uses a zero window only away from the principal variation. One of these is described by Plaat et al. (1996a) and has been given the name MTD(f). It relies on a large memory to hold a transposition table containing records of previous visits to nodes. We recall that games like chess really take the form of graphs rather than trees, so that a particular board position might be reached by many different paths. The memory should be as large as feasible, since it serves to hold as many as possible of the results of visiting nodes. Since the memory enhanced test algorithms operate with a zero window, they of necessity need to do many re-searches. It is the use of memory to help speed up the production of results that makes these algorithms competitive in speed. Although Plaat et al. (1996a) claim their version is faster in practice than PVS, R. Hyatt, as he wrote in the rec.games.chess.computer Usenet newsgroup in April 1997, still uses PVS for his chess program. The following pseudo-code from Plaat et al. (1996a) illustrates their MTD(f) version.

function MTDF(root : node_type, f, d) -> g
    g := f;
    upperbound := +∞;
    lowerbound := -∞;
    repeat
        if g = lowerbound then
            β := g + 1;
        else
            β := g;
        g := ABWM(root, β - 1, β, d);
        if g < β then
            upperbound := g;
        else
            lowerbound := g;
    until lowerbound >= upperbound;
    return g;

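The pseudo-code above can be rendered directly in a modern language. The sketch below is our own toy translation, not code from Plaat et al.: the game tree is invented, and a much-simplified transposition table (keyed on node identity, storing lower and upper bounds) stands in for a full ABWM implementation.

```python
# Illustrative sketch only: the MTD(f) driver in runnable form, over a
# fail-soft alpha-beta "with memory". A dict keyed on node identity plays
# the role of the transposition table.
INF = float("inf")

def abwm(node, alpha, beta, table):
    """Fail-soft alpha-beta with memory: cache proven bounds per node."""
    key = id(node)
    if key in table:
        lower, upper = table[key]
        if lower == upper or lower >= beta:
            return lower               # stored bound already decides this window
        if upper <= alpha:
            return upper
        alpha, beta = max(alpha, lower), min(beta, upper)
    if not isinstance(node, list):     # leaf: exact static value
        g = node
    else:
        g, a = -INF, alpha
        for child in node:
            g = max(g, -abwm(child, -beta, -a, table))
            a = max(a, g)
            if g >= beta:
                break
    lower, upper = table.get(key, (-INF, INF))
    if g <= alpha:                     # fail low: g is an upper bound
        upper = g
    elif g >= beta:                    # fail high: g is a lower bound
        lower = g
    else:                              # exact value
        lower = upper = g
    table[key] = (lower, upper)
    return g

def mtdf(root, f, table):
    """Converge on the minimax value using only zero-window searches."""
    g, lower, upper = f, -INF, INF
    while lower < upper:
        beta = g + 1 if g == lower else g
        g = abwm(root, beta - 1, beta, table)
        if g < beta:
            upper = g                  # the search failed low
        else:
            lower = g                  # the search failed high
    return g

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(mtdf(tree, 0, {}))               # prints 3, as a full-window search would
```

Each zero-window call proves either a lower or an upper bound on the minimax value; the driver narrows the interval until the bounds meet, and the table saves most of the work of the repeated re-searches.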
The algorithm works by calling ABWM a number of times with a zero window search. Each call returns a bound on the minimax value. These bounds are stored in upperbound and lowerbound, forming an interval around the true value for that search depth. When the upper and lower bounds collide, the minimax value is found. ABWM (AlphaBetaWithMemory) is a conventional alpha-beta algorithm which has extra lines to store and retrieve information into and out of a transposition table.

Interest has also been shown in the B* algorithm, as well as in those involving conspiracy numbers. These are more complex than algorithms involving alpha-beta searches, and seem to get closer to the Shannon type B kind of algorithm, where some expert knowledge of the game is programmed in to guide the machine.

Impact on AI and Philosophy

In the pioneering era, both computer chess and AI were emerging disciplines. At that early stage computer chess was regarded as a testbed for AI search techniques. The early chess programs were primitive, but programs are now much more sophisticated and have achieved spectacular results. However, as recognised by Donskoy and Schaeffer (1990), the domination of AI by computer chess has now changed, and computer chess is currently viewed as a small part of AI, with its methods, identified primarily as brute force, regarded within the AI community as simplistic.

However, the impact of computer chess on AI can be gauged by the number of methods developed within the chess environment that are domain independent and applicable elsewhere. It is, for instance, undeniable that computer chess has made significant contributions to search techniques which find applications in other areas such as theorem proving and problem solving in general. It has also emphasised the development of special hardware to solve the chess problem, a viewpoint which is echoed in other areas of AI (Horacek 1993).

The computer scientists and AI enthusiasts that initiated the pioneering era of computer chess were partly motivated by a desire to investigate the processes of human reasoning. They argued that if it was possible for computers to solve the chess problem, it must be possible for computers to tackle other difficult problems relating to planning and applications to the economy. This preoccupation with intelligence formed the core of Turing's work, commencing with the creation of Turing machines and then the formulation of the Turing test, which still remains of central importance in the AI debate today.

Turing himself worked in the period 1940-45 at Bletchley Park as an Enigma code-breaker. An indifferent chess player himself, he was, however, in the constant company of the best British chess players of that era, thus creating the environment and ideas for the final phase of his work. The Turing test states that if the responses of a computer are indistinguishable from those of a human, it possesses intelligence. Consequently, it can be argued that a computer playing chess possesses intelligence as, in many cases, its moves are identical to those of chess experts. This is the strong AI viewpoint.

It was precisely to repudiate these claims of intelligence that Searle (1980) put forward his famous Chinese Room argument against strong AI. Chess programs produce their moves algorithmically. Suppose a human being having no prior knowledge of chess per se performs the same algorithms; does he also understand the game? The weak AI viewpoint is that computers lack any such intelligence or understanding. This viewpoint is supported by Penrose (1995), who cites an example of a position in which Deep Thought blundered because of its lack of understanding of the position, despite being capable of deceiving us into thinking that it really has some chess intelligence.

Conclusion

The strength of the top chess playing programs is continually increasing. In this paper we have reviewed the main techniques that have enabled programmers to achieve such a spectacular increase in playing strength, commencing with the earliest exploratory programs and ending with the sophisticated software of today. During this period we have seen significant increases in processing power which, despite the prevalent von Neumann architecture of today, is still continuing. Consequently, we foresee the possibility of further increases in playing strength.

Future work will probably refine the traditional Shannon type A algorithms even more. For example, new avenues of research still exist on application-independent techniques for exploiting the fact that computer chess trees are really graphs (Plaat et al. 1996b). However, there is a limit on the extent to which this can proceed. The consideration of Shannon type B algorithms is an area requiring further investigation and development.

References

Botvinnik, M.; Cherevik, D.; Vladimirov, V.; and Vygodsky, V. 1994. Solving Shannon's Problem: Ways and Means. In Advances in Computer Chess 7, van den Herik, H.J.; Herschberg, I.S.; and Uiterwijk, J.W.H.M. (eds). Maastricht: University of Limburg.

Donskoy, M.V. and Schaeffer, J. 1990. Perspectives on Falling from Grace. In Computers, Chess and Cognition, Marsland, T. and Schaeffer, J. (eds). New York: Springer Verlag.

Horacek, H. 1993. Computer Chess, its Impact on Artificial Intelligence. ICCA Journal 16(1):31-36.

Knuth, D.E. and Moore, R.W. 1975. An analysis of alpha-beta pruning. Artificial Intelligence 6(4):293-326.

Newborn, M.M. 1977. The efficiency of the alpha-beta search on trees with branch-dependent terminal node scores. Artificial Intelligence 8:137-153.

Penrose, R. 1995. Shadows of the Mind. London: Vintage.

Plaat, A. 1996. Research Re: Search and Re-search. Ph.D. diss., Dept. of Computer Science, Erasmus University.

Plaat, A.; Schaeffer, J.; Pijls, W.; and de Bruin, A. 1996a. Best-first fixed-depth minimax algorithms. Artificial Intelligence 87:255-293.

Plaat, A.; Schaeffer, J.; Pijls, W.; and de Bruin, A. 1996b. Exploiting Graph Properties of Game Trees. In Proceedings of the 13th National Conference on Artificial Intelligence, Portland, Oregon.

Schaeffer, J. 1989. The history heuristic and alpha-beta search enhancements in practice. IEEE Transactions on Pattern Analysis and Machine Intelligence 11(11):1203-1212.

Searle, J.R. 1980. Minds, brains and programs. The Behavioral and Brain Sciences 3. Cambridge: Cambridge University Press.

Shannon, C.E. 1950. Programming a Computer for Playing Chess. Philosophical Magazine 41(7):256-275.

Simon, H.A. and Chase, W.G. 1973. Skill in Chess. American Scientist 61:482-488.

Slate, D. and Atkin, L. 1977. Chess 4.5: The Northwestern University Chess Program. In Chess Skill in Man and Machine, Frey, P. (ed.), 82-118. New York: Springer Verlag.

Turing, A.M. 1953. Digital computers applied to games. In Faster than Thought, Bowden, B.V. (ed.). London: Pitman.
