measure for measure

Unbridled mental power

Artificial intelligence is set to rival the human mind, just as the engine did the horse. José Hernández-Orallo looks at how we compare cognitive performance.

There was a time when horses were a major source of physical power. When the steam engine started to rival them, manufacturers wanted to know how many horses a particular engine would replace. James Watt soon realized how important these comparisons were, and conceived a new measure: the horsepower. From discussions with millwrights, who used horses to turn their wheels, one mechanical horsepower was estimated to be 33,000 foot-pounds per minute, and the measure was a great success.

And now, as artificial intelligence (AI) emerges as an alternative source of mental power, scientists are rethinking if and how the mental capabilities of humans and machines can be measured.

For the moment, humans still claim superiority in mental power, but it's clear that AI is becoming a rival. As in Watt's approach, one valuable comparison seems to be whether a particular AI system is more powerful than a 'standard' human, what is naively referred to as 'human level' machine intelligence. Perhaps project management could lend us a measure: the 'person month'.

Stimulating analogies aside, there are many differences between physical and mental work. The early psychometricians pushed the analogy as far as they could, measuring intelligence as the capability of producing a particular kind of information-processing work. However, psychometric measurement derives from human populations. In many of its forms, it just captures a deviation from the mean, but not an actual magnitude. There is no such thing as an imperial foot for intelligence.

So back in the late eighteenth century, what Watt did was Copernican: he measured horses in terms of universal physical measures: feet, pounds and minutes. In Watt's time, the understanding of the physical world was sufficiently mature to realize that the power needed in a mill could be compared to the power needed to boil a pot of water.

In contrast, even today, there is nothing like a unit for mental power, which is independent of both the human and the task. In fact, the main problem for adapting psychometrics to AI is not only the lack of a ratio-scale measure, but the fact that it derives from human populations. For obvious reasons, the notion of machine population in AI is thorny. Still, a bevy of AI competitions, benchmarks and platforms have been recently introduced¹. Progress is measured in terms of performance on particular tasks, usually compared with some average human estimate. Cross-task comparison remains elusive though, as many AI systems are specialized for a single task.

Some would say that cognitive tasks cannot be reduced to a limited number of capabilities, less so to a single one, because different tasks can never be compared. But others would say that intelligence may have different manifestations and a complex structure, a phenomenon that is common in physics. By looking at the performance of a system on one set of tasks, we might be able to predict its performance on a different set of tasks. The range between these two extremes (the importance of bias) is immanent within […].

In this context, Ray Solomonoff's prediction theory² and Leonid Levin's universal heuristics³ see Occam's razor as a bias that emerges from algorithmic information theory, a possible foundation for computational measures of intelligence. These theories are still insufficient for nailing down such a measure, and they are superficially different from deep learning, the dominant paradigm in AI today. Still, they have more potential than any other current computational theory of intelligence when it comes to quantifying mental power. For example, Levin's universal search makes it possible to define the difficulty of any inversion task. From here, the capability of a system can be defined as an integral of performance over a range of difficulties. In this way, both difficulty and capability are measured on a ratio scale with the same unit: the logarithm of the number of computational steps⁴. This unit is ultimately commensurate to bits, under the two terms of Levin's universal search.

This conceptually appealing formulation has some technical limitations. For example, without the choice of a reference machine, the […] and the logarithm of the number of computational steps will depend on constants that need to be evaluated independently. One way to overcome those limitations might be the further linking of computation and information to physics. Indeed, there must be bounds between mental power and physical energy, and discovering them may shed light on questions such as AI progress, intelligence growth, footprints on the environment and the effect of quantum computing on AI.

By trying to identify the units of mental power and then linking them to physical units, we may well look eccentric from the perspective of a thriving (and seemingly unbridled) AI field. But like Watt two centuries ago, sometimes we have to put the cart before the horse.

José Hernández-Orallo¹,²
¹Department of Computer Science, Universitat Politècnica de València, Valencia, Spain.
²Leverhulme Centre for the Future of Intelligence, University of Cambridge, Cambridge, UK.
e-mail: [email protected]

Published online: 2 January 2019
https://doi.org/10.1038/s41567-018-0388-1

References
1. Castelvecchi, D. Nature 540, 323–324 (2016).
2. Solomonoff, R. J. Inf. Control 7, 1–22 (1964).
3. Levin, L. A. in Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence (ed. Dowe, D. L.) 53–54 (Springer, Berlin, Heidelberg, 2013).
4. Hernández-Orallo, J. The Measure of All Minds: Evaluating Natural and Artificial Intelligence (Cambridge University Press, Cambridge, 2017).
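
A rough illustration of the idea discussed in the column, that capability can be defined as an integral of performance over a range of difficulties, might look like the sketch below. It is not the author's formulation: the function name `capability`, the use of the trapezoidal rule, and the sample numbers are all assumptions made for illustration. Difficulty is taken to be expressed as the logarithm of a number of computational steps, and performance as a success rate between 0 and 1.

```python
from typing import Sequence


def capability(difficulties: Sequence[float], performance: Sequence[float]) -> float:
    """Area under a performance-versus-difficulty curve (trapezoidal rule).

    difficulties: strictly increasing difficulty levels, assumed here to be
                  in units of log2(number of computational steps).
    performance:  success rate in [0, 1] observed at each difficulty level.
    """
    if len(difficulties) != len(performance):
        raise ValueError("difficulties and performance must have equal length")
    if len(difficulties) < 2:
        raise ValueError("need at least two points to integrate")

    area = 0.0
    for i in range(1, len(difficulties)):
        width = difficulties[i] - difficulties[i - 1]
        if width <= 0:
            raise ValueError("difficulties must be strictly increasing")
        # Average performance over the interval times the interval width.
        area += width * (performance[i] + performance[i - 1]) / 2.0
    return area


# Hypothetical system: always succeeds up to difficulty 4,
# then degrades linearly to zero success at difficulty 8.
levels = [0.0, 4.0, 8.0]
success = [1.0, 1.0, 0.0]
print(capability(levels, success))  # 4*1.0 + 4*0.5 = 6.0
```

Because performance is dimensionless, the resulting area carries the same unit as the difficulty axis, which matches the column's remark that difficulty and capability share one unit on a ratio scale.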

Nature Physics | VOL 15 | JANUARY 2019 | 106 | www.nature.com/naturephysics