1 Running head: and the Flynn effect

The Flynn Effect: Chess as a real life example

Kari S. Carstairs and Manny Rayner

Kari S. Carstairs, Carstairs Psychological Associates Limited, London, UK;

Manny Rayner, Faculty of Translation and Interpretation, Geneva University.

We confirm that we have no conflict of interest in writing this article.

Correspondence concerning this article should be addressed to Dr Kari Carstairs,

7 Mayfield Road, Bromley, Kent, BR1 2HB, UK. E-mail: [email protected]

2 Running head: Chess and the Flynn effect

Abstract

Population gains in IQ scores over time (known as the “Flynn effect”) are well documented. The practical implications of this psychometric finding have been debated.

We discuss data from the world of chess that analyses records of games spanning a long period. Powerful chess software is capable of accurate evaluation of the quality of problem solving ability in these games. This analysis demonstrates large improvements over time among the top chess players.

Keywords: Flynn effect, chess, IQ.

3 Running head: Chess and the Flynn effect

The Flynn Effect: Chess as a real life example

Introduction

The Flynn effect (Flynn, 1984) concerns the average rise in IQ scores for a population over time. It has been estimated at approximately 0.30 IQ points per year

(Neisser, 1998). Differing values have been reported by age (Kane, 2000) and an average increase of 0.44 IQ points by year across the Wechsler scales was reported more recently

(Beaujean & Sheng, 2014). However, the basic finding of population gains in IQ scores over time is robust (Trahan, Stuebing, Fletcher & Hiscock, 2014). These gains result in the need for updated norms for IQ tests, with implications for clinical (Grégoire, Daniel,

Llorente & Weiss, 2016), educational (Kanaya & Ceci, 2012) and forensic settings

(Kane, 2003) that rely on cut-off scores to make decisions about individuals.

The measurement of IQ is an empty exercise unless it can be related to real life behavioural outcomes. A substantial body of research shows that IQ correlates with variables such as longevity and social mobility (Deary, Whalley & Starr, 2009), and educational achievement (Mayes, Calhoun, Bixler & Zimmerman, 2009). However, the causes of the increases in IQ over time remain unclear, with genetic, environmental and measurement factors having been considered. The Flynn effect suggests that the children of today should be far brighter than their grandparents yet we do not find highly visible differences between the daily functioning of the different generations, so the psychometric findings present us with an apparent paradox. Some of this paradox may be attributed to differential effects on the different cognitive abilities that contribute to the

Full Scale IQ score and changing cognitive demands that stem from socio-cultural changes in the type of thought process that is valued (Flynn, 2009). 4 Running head: Chess and the Flynn effect

Evidence of changes in cognitive functioning over time that does not rely on psychological testing is difficult to find (Howard, 2001). Chess provides us with a natural “real life” experiment because the game has remained constant and there are historical records of games played between the 16th century and the present. Chess performance is related to a combination of intelligence, motivation and practice (Grabner,

Stern & Neubauer, 2007, and de Bruin, Kok, Leppink & Camp, 2014) and pattern recognition and strategic problem solving contribute to expertise in chess (Connors,

Burns & Campitelli, 2011, Ferrari, Didierjean & Marmèche, 2008 and de Groot, Gobet &

Jongman, 1996). We argue that if we can show improvements in performance at the top level in the world of chess, this is likely to reflect increases in intelligence rather than motivation and practice.

Until recently, evidence that the performance of top chess players has improved has been unconvincing. Progress made in opens up new possibilities but the key findings have only appeared in chess and computer science publications and they have not been related to the Flynn effect. We outline methodological issues with measurement of chess performance, and then summarise earlier work and what we see as its shortcomings. We then outline the newer lines of investigation.

Background: options for measuring chess performance

There are two reasonable strategies for measuring chess performance. The coarse- grained strategy is to look at competitive results: other things being equal, better play will result in better scores. The problem is what is meant by “other things being equal”; the 5 Running head: Chess and the Flynn effect

obvious approach is to exploit the system the international chess world uses to measure competitive performance. There are two systems. The older one is that of awarding titles, the highest of which is the (GM) title. The newer system, in use since 1970, is the Elo rating. This consists of a formula applied after rated games. Points are subtracted from the loser and given to the winner, the number of points transferred being dependent on the difference between the two players’ ratings at the start of the game. If the game is drawn, points move from the higher-rated to the lower-rated player. The two systems have been in practice been fused together. GM titles, which initially were given out on ad hoc grounds, are now awarded by the International Chess Federation using a defined scheme. This requires new title holders to achieve three tournament results (“GM norms”) each corresponding to an Elo rating of at least 2600, against opponents enough of whom have already acquired GM titles.

The fine-grained strategy is based on examining the quality of individual moves.

Here, the problem is how to do this in a clear, objective way. Until recently, this was not technically feasible, but advances in computer chess have opened up new possibilities.

We will return to this point shortly. First, we review early work based on the coarse- grained strategy.

Early analysis of chess data

Howard (2001) argued that chess performance gives real-world evidence of the

Flynn effect, citing how chess players are acquiring GM titles at increasingly early ages

(Howard, 2005). However, people familiar with the chess world would make a simple retort: the fact that young players are acquiring the GM title earlier may only show that it 6 Running head: Chess and the Flynn effect

has become easier to get the three “GM norms” required. This is undeniably true, if only for the reason that acquiring a GM norm requires the candidate to play in tournaments where they meet enough current GMs. Fifty years ago, there were less than a hundred players in the world who had a GM title, so there were few such opportunities. Now, there are more than a thousand GMs, so there are far more tournaments where a player can hope to get a GM norm; in particular, there are many open tournaments where GMs take part, so a promising young player does not need to wait until they get invited to strong closed tournaments. These changes contribute to a widespread perception in the chess world that the GM title has lost its earlier prestige, calling into question the meaningfulness of the trend Howard cites.

The fact that the GM title has become easier to acquire is the result of a more general phenomenon. It is well known that Elo ratings, like IQs, have trended steadily upwards. For example, when Bobby Fischer challenged for the World

Chess Championship in July 1972, Fischer was the only player in the world with a rating of over 2700, and there were 13 players with ratings of 2600 or more

(http://www.olimpbase.org/Elo/Elo197207e.html). By July 1992, there were four players over 2700 and 46 over 2600 (http://www.olimpbase.org/Elo/Elo199207e.html). The current (July 2016) World Chess Federation top ranking list

(https://ratings.fide.com/top.phtml?list=men) shows 40 players with ratings of over

2700, six of whom have ratings better than Fischer’s peak of 2785.

The fact that talented players are achieving the GM title at a younger age is thus beside the point; if we are prepared to accept the validity of arguments based on competitive results at all, the general rise in Elo ratings provides much stronger evidence 7 Running head: Chess and the Flynn effect

that the top players are getting better. Most chess players have however been reluctant to accept the obvious interpretation of the facts, and prefer the alternate explanation of

“rating inflation”. According to this theory, a rating of 2700 today is equivalent to a rating of much less than 2700 in 1972, and rating figures - and hence GM titles - lose value over time. The most common argument used to support the idea of "rating inflation" is that players enter the rating system and play games, after which some drop out and some continue. The ones who lose tend to drop out, so there is a net influx of rating points.

Although this argument has some plausibility, it is not supported by the more fine-grained analysis which we will shortly consider. For the moment, we summarize the above arguments as follows: if we wish to demonstrate that players are getting better, it is sufficient to show that rating inflation is not happening, and that Elo ratings are stable over time.

John Nunn: initial attempts at fine-grained analysis

An analysis based solely on competitive results is unconvincing; what is needed is an analysis at the level of individual moves. Until recently, it was impossible to do the necessary work, which requires detailed, objective analysis of thousands of games played by the very best players in the world. Modern chess engines, which now have ratings of well over 3000, are however equal to this task. Their evaluations of human games are not error-free, but they are reliable enough to allow solid conclusions to be reached, and a number of studies have now been carried out based on this idea. An important early example of this kind of study was performed by John Nunn. Nunn’s work has not been 8 Running head: Chess and the Flynn effect

published through normal academic channels, although it is mentioned approvingly by

Flynn (2009; p. 44). Although Nunn is not a member of the mainstream academic community, chess players will be inclined to take him very seriously. He is a former world top 10 player and a prolific writer of high-quality chess books (he has won the

British Chess Federation’s “Book of the Year” prize three times, a record surpassed only by former World Champion Garry Kasparov).

Nunn (2009) devotes a chapter to the question of whether it is possible to use modern chess software to directly compare the playing strengths of current and historical players. He examined the famous tournament played at Carlsbad in 1911, which was attended by many of the world’s then top players. He analyzed all 325 games using a chess engine set to search for clear mistakes. His comparison point was the 1993 Biel tournament, which was of a similar standard. Nunn’s description is impressionistic, but strong chess players reading it will be likely to agree with his overall conclusions. Based on the computer’s analysis, the average playing strength of the competitors at the 1911 tournament turns out to be extremely weak. Nunn estimates it at 2129 Elo; in modern terms, this would be just good enough to play on the bottom boards of a decent club team.

The game fragments shown by Nunn suggest that the players are, by modern standards, weak in every part of the game. They commit many elementary tactical oversights. More interestingly, they do not just miscalculate, they also play moves that betray a lack of understanding for what would now be regarded as obvious strategic principles. One striking example is the game between Fahrni and Burn which Burn loses because he apparently does not understand the principles of the ending rook against pawn, which any strong player of today would consider elementary. 9 Running head: Chess and the Flynn effect

Nunn’s arguments are persuasive for chess players, but others may find them inconclusive. We turn now to the work of Regan.

Ken Regan and collaborators: systematic fine-grained analysis

The most rigorously compelling, quantitative evidence comes from Ken Regan’s work. Regan is a strong player (International Master) and a professor of computer science with an excellent knowledge of statistical methods. His work has been published in computer science and chess contexts. He has focused on the problem of uncovering players who cheat by using electronic help. Regan is regarded in chess circles as the world’s foremost expert on this subject. Goldowsky (2014) discusses his work in Chess

Life and gives a good nontechnical overview. Details, which presuppose background in chess software, statistical methods and computer science, can be found in Regan and

Haworth (2011).

Regan and his colleagues developed fine-grained methods of accurately estimating chess rating from decisions made at the level of each individual move, using strong chess engines to evaluate the quality of the moves. As explained in the Goldowsky article, each move in a game can be regarded as the answer to a test question implicitly asked by the preceding position; the correctness of the answer is scored by the engine.

Regan and his colleagues have succeeded in defining a measure of this kind, the

“Intrinsic Performance Rating” (IPR), which correlates well with Elo rating over large samples of players taken from the same time period.

The crucial finding for our purposes is that IPR scores also agree when compared between different time periods, providing strong evidence that ‘rating inflation’ does not 10 Running head: Chess and the Flynn effect

in fact exist. Regan and Haworth (2011) state: “We conclude that there has been little or no ‘inflation’ in ratings over time—if anything there has been deflation.” The good fit with the more anecdotal study carried out by Nunn, where a world-class player manually checked the machine’s analysis, gives additional support to the conclusion that the top players are getting better. According to Regan’s results, Bobby Fischer, in 1972 the best player in the world by a huge margin, would today only just make the top ten. Similarly, the world-class players from 1911 that Nunn studied would now only be middle-ranked club players.

Conclusion

The chess findings sketched here are consistent with the Flynn effect; when one moves forward in time, people appear on average smarter. Chess records are sufficiently precise so we can demonstrate this quantitatively with modern chess engines, which are now stronger than the best human players.

Chess is a game that requires a certain type of intellectual functioning, so it should not be surprising that the Flynn effect can be demonstrated using the automatic analysis of solving ability developed by Regan. As Flynn (2009) discusses, one needs to place changes in performance over time into a sociological and historical context to understand what might be driving these improvements. Flynn defines intelligence as

“having a mental tool kit that allows us to address a wider range of cognitive problems”

(p. 187). In an engaging analogy, he compares intellectual tasks with marksmanship.

Improved performance in marksmanship over time may partly depend on better reaction times or eyesight, but will be dominated by the development of better rifles. Similarly, 11 Running head: Chess and the Flynn effect

Flynn argues that the improved performance noted on intellectual tasks is most likely dominated by development of better “mental weaponry”.

Improvements in chess performance agree well with this model, with Nunn in particular drawing attention to the, by today’s standards, surprisingly impoverished conceptual understanding of early 20th century Grandmasters. The causal mechanisms which have driven the improvement require more study: obvious candidates include the general trend towards more abstract thinking described by Flynn, increased leisure time to devote to non-productive activities like chess, and more efficient dissemination of chess knowledge though books, magazines, and most recently electronic media. For now, we content ourselves with noting that the world of chess appears to offer a real life example of the Flynn effect, where there is solid evidence of improved cognitive functioning over time.

12 Running head: Chess and the Flynn effect

References

Beaujean, Alexander & Sheng, Yanyan (2014). Assessing the Flynn Effect in the

Wechsler Scales. Journal of Individual Differences, 35, 63-78. doi:10.1027/1614-

0001/a000128

Connors, Michael H., Burns, Bruce D., & Campitelli, Guillermo (2011). Expertise in

complex decision making: The role of search in chess 70 years after de Groot.

Cognitive Science, 35, 1567-1579. doi: 10.1111/j.1551-6709.2011.01196.x

Deary, Ian J., Whalley, Lawrence J. & Starr, John M. (2009). A lifetime of intelligence:

Follow-up studies of the Scottish Mental Surveys of 1932 and 1947. Washington

DC: American Psychological Association. doi: 10.1037/11857-000 de Bruin, Anique B., Kok, Ellen M., Leppink, Jimmie & Camp, Gino (2014). Practice,

intelligence, and enjoyment in novice chess players: A prospective study at the

earliest stage of a chess career. Intelligence, 45, 18-25.

doi: 10.1016/j.intell.2013.07.004 de Groot, Adriaan D., Gobet, Fernand, & Jongman, Riekent W. (1996). Perception and

memory in chess: Studies in the heuristics of the professional eye.

Assen/Maastricht: Van Gorcum.

Ferrari, Vincent, Didierjean, André, & Marmèche, Evelyne (2008). Effect of expertise

acquisition on strategic perception: The example of chess. Quarterly Journal of

Experimental Psychology, 61, 1265-1280. doi: 10.1080/17470210701503344

Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932-1978.

Psychological Bulletin, 95, 29-51. doi: 10.1080/17470210701503344 13 Running head: Chess and the Flynn effect

Flynn, J. R. (2009). What is intelligence? Beyond the Flynn effect. New York, NY:

Cambridge University. doi: 10.1017/CBO9780511605253

Goldowsky, Howard (2014). “How To Catch A Chess Cheater: Ken Regan Finds Moves

Out Of Mind”. Chess Life. http://www.uschess.org/content/view/12677/763

Grabner, Roland H., Stern, Elsbeth, & Neubauer, Aljoscha C. (2007). Individual

differences in chess expertise: A psychometric investigation. Acta Psychologica,

124, 398-420. doi: 10.1016/j.actpsy.2006.07.008

Grégoire, Jacques, Daniel, Mark, Llorente, Antolin M. & Weiss, Lawrence C. (2016).

The Flynn effect and its clinical implications. In Lawrence Weiss, Donald

Saklofske, James Holdnack & Aurelia Prifitera (Eds.), WISC-V assessment and

interpretation: Scientist-practitioner perspectives. San Diego, CA: Elsevier

Academic Press (pp. 187-212). doi: 10.1016/B978-0-12-404697-9.00006-6

Howard, Robert W. (2001). Searching the real world for signs of rising population

intelligence. Personality and Individual Differences, 30, 1039-1058.

doi: 10.1016/S0191-8869(00)00095-7

Howard, Robert W. (2005). Objective evidence of rising population ability: A detailed

examination of longitudinal chess data. Personality and Individual Differences 38,

347-363. doi: 10.1016/j.paid.2004.04.013

Kanaya, Tomoe, & Ceci, Stephen (2012). The impact of the Flynn effect on LD

diagnoses in special education. Journal of Learning Disabilities, 45, 319-326.

doi: 10.1177/0022219410392044 14 Running head: Chess and the Flynn effect

Kane, H. D. (2000). A secular decline in Spearman’s g: Evidence from the WAIS,

WAIS-R and WAIS-III. Personality and Individual Differences, 29, 561-566.

doi: 10.1016/S0191-8869(99)00217-2

Kane, H. D. (2003). Straight talk about IQ and the death penalty. Ethics and Behavior,

13, 27-33. doi: 10.1207/S15327019EB1301_05

Mayes, Susan D., Calhoun, Susan L, Bixler, Edward O. & Zimmerman, Dennis N.

(2009). IQ and neuropsychological predictors of academic achievement.

Learning and Individual Differences, 19, 238-241.

doi: 10.1016/j.lindif.2008.09.001

Neisser, Ulric (Ed.) (1998). The Rising Curve: Long-term gains in IQ and related

measures. Washington DC: American Psychological Association.

doi: 10.1037/10270-000

Nunn, John. (2009). John Nunn's chess puzzle book. London: Gambit Publications.

Regan, Kenneth W. and Haworth, Guy M. (2011). Intrinsic Chess Ratings. Proceedings

of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI Press.

Trahan, Lisa H., Stuebing, Karla K., Fletcher, Jack M. & Hiscock, Merrill (2014). The

Flynn Effect: A meta-analysis. Psychological Bulletin, 140, 1332-1360.

doi: 10.1037/a0037173