
Ben Goertzel and Matthew Iklé, with numerous colleagues

Introduction to Artificial General Intelligence (incomplete, very preliminary draft)

May 4, 2015

Preface

Why a Text on AGI?

AI has always included what we would now call AGI as a significant component – in fact, arguably the founders of the AI field were more interested in human-like general intelligence than in narrow application-specific functionality. Since the early aughts, though, the urge has gradually risen among a subset of researchers within the AI community to explicitly distinguish pursuits such as AGI and Human-Level AI from the application-specific or problem-specific AI that has come to dominate the AI field. A vibrant collection of intersecting research communities has arisen, dealing with various aspects of AGI theory and practice. For the student or researcher interested in getting up to speed on the ideas and developments related to AGI, however, there has not previously been anywhere really great to turn. There are several edited volumes covering AGI topics [? ], [? ], and then the AGI conference proceedings volumes; but while this is interesting and often high quality material, it is somewhat haphazard in its coverage. There seemed a need for a more systematic introduction to the issues and ideas that preoccupy AGI researchers. Thus the motivation to put this book together.

Intended Audience

While there is not that much highly technical material here, the discussion is generally pitched at the reader who is already familiar with the basics of data structures and algorithms, and with mainstream “narrow AI” as taught in a typical undergraduate AI course. Review of neuroscience and cognitive science is provided (specifically targeted toward AGI), but review of undergraduate computer science is not. Thus, a natural audience for the book would be

• Master’s students or final-year undergraduate students in computer science
• Grad students or final-year undergrads in allied disciplines like neuroscience, cognitive science, mathematics or engineering – who have some basic familiarity with computer science and programming concepts
• Professional programmers or scientists who are experienced reading advanced technical material in their own fields, have seen narrow-AI applications here and there, and would like to find out what AGI is all about at a level above popularizations, but without having to dig deeply into the research literature

Utilization in University Courses

The present book is intended to be useful in university courses; for instance, it can serve as a key part of the curriculum for a semester or year course at the grad or final-year undergrad level focused on Artificial General Intelligence specifically. Such a course would naturally be placed after a standard “narrow AI” focused AI course, in a computer science or cognitive science curriculum. This book does not constitute a complete curriculum for such a class, however, because it lacks practical exercises. To form a compelling course on AGI, one option would be to couple the present text, which gives a broad overview of the concepts related to AGI, with appropriate tutorial material related to specific AGI-related software systems. This could be done in many different ways, and this is an area where different instructors will likely want to exercise their own creativity. One specific option, for instance, would be to couple this book with:

• Hands-on work with the tutorials supplied with a few different AGI-related systems. LIDA, ACT-R and Soar have fairly advanced tutorials, and OpenCog’s tutorials are coming along.
• Guided implementation of simple AGI-oriented learning agents, for instance as done in Olivier Georgeon’s Developmental AI MOOC http://liris.cnrs.fr/ideal/mooc/

Thoroughly going through the tutorials of any of the above-mentioned systems would take students a substantial amount of time (a couple dozen hours per system, at least). So, for instance, a one-year course on AGI, with 4 hours of instruction per week, could potentially have 2 hours/week of theoretical instruction based on the present text, and 2 hours/week of lab sessions involving hands-on work.

The Origins of this Book, and the Wonderful and Frustrating Diversity of the AGI Field

This is a multi-author textbook and hence owes a lot to a lot of people. However, the original idea of putting it together was due to Ben Goertzel, the lead editor, and the rest of this Preface is written from Ben’s point of view... I got the idea to write or edit a textbook of some sort on AGI in 2012 or so. At that point, I got as far as making a table of contents, and emailing a bunch of colleagues soliciting chapters. I got some nice material back via email, but didn’t get much further, due to being overwhelmed with other projects. The project smoldered in the background though, and in spare moments here and there I put together the material that eventually became Chapters 2, 1 and ??. What stimulated me to finally bite the bullet and put together a draft of the book was the decision to accept an opportunity to help out with a new MS program in AI, to be offered at the Addis Ababa Institute of Technology, in Ethiopia. I wanted to teach a course on AGI as part of this MS program. And, while I could do an AGI course using a hodgepodge of articles as course materials, this was a definite near-term use-case for the long-germinating AGI textbook. Over the years the idea was germinating, whenever I mentioned the “AGI textbook” idea to other AGI researchers, the most common reaction was a mixture of interest with acknowledgement of what a huge challenge such an undertaking would be. The most challenging aspect commonly remarked upon was the sheer diversity of perspectives and ideas in the AGI field. As one friend and fellow researcher put it, "No two AGI researchers could possibly agree on what an AGI textbook should contain." Of course, while this isn’t quite literally true, it does contain a grain of truth. The AGI field is somewhat disparate and disorganized, as befits its early stage of development. But nevertheless, it seems important to ease the path for students and young researchers to get into the field – so they can contribute their own perspectives to the mix, as well as push existing perspectives forward. My hope, and that of my co-editor Matt Iklé and the other contributors, is that this book can be useful in this regard.

Ben Goertzel Hong Kong, 2015

Contents

Section I The Past, Present and Future of AGI

1 Overview of the AGI Field ...... 3
  1.1 Introduction ...... 3
  1.2 AGI versus Narrow AI ...... 3
  1.3 The Emergence of an AGI Community ...... 4
    1.3.1 AGI and Related Concepts ...... 5
  1.4 Perspectives on General Intelligence ...... 5
    1.4.1 The Pragmatic Approach to Characterizing General Intelligence ...... 6
    1.4.2 Psychological Characterizations of General Intelligence ...... 6
    1.4.3 A Mathematical Approach to Characterizing General Intelligence ...... 7
    1.4.4 The Adaptationist Approach to Characterizing General Intelligence ...... 7
    1.4.5 Broadly Suspected Aspects of General Intelligence ...... 7
  1.5 Current Scope of the AGI Field ...... 8
    1.5.1 Universal AI ...... 8
    1.5.2 Symbolic AGI ...... 10
    1.5.3 Emergentist AGI ...... 10
    1.5.4 Hybrid AGI ...... 12
  1.6 Future of the AGI Field ...... 13

2 A Brief History of AI and AGI ...... 15
  2.1 Introduction ...... 15
  2.2 The Prehistory of AI ...... 17
  2.3 1600s-1800s: Mechanical Calculators, and Models of Thought as Calculation ...... 18
  2.4 Turn of the 20th Century: Maturation of the View of Human and Artificial Thought as Complex Mechanical Operations ...... 21
  2.5 Mid 20th Century: The Birth of Electronic Computing and Modern AI Technology ...... 22
  2.6 Late 1950s - Early 1970s: Emergence and Flourishing of AI as a Discipline ...... 26
  2.7 Mid 1970s - early 80s: Having Failed to Progress as Fast as Hoped, AI Experiences a Funding Winter, but also a Host of New Ideas ...... 29
  2.8 Mid 1980s - early 90s: AI Funding for Expert Systems Rises and Falls; Connectionist, Probabilistic and Subsumption Approaches Surge ...... 30


  2.9 Fueled by Powerful Computers and Big Data, Narrow AI Technology Finds Broad Practical Applications ...... 33
    2.9.1 Dramatic Progress in Neuroscience, with Limited Implications for AI ...... 34
  2.10 2004-2012: While Narrow AI Tech Further Pervades Industry, a Trend back toward AGI / Human Level AI R&D Emerges ...... 35

3 Artificial General Intelligence: Concept, State of the Art and Future Prospects ...... 47
  3.1 Introduction ...... 47
    3.1.1 What is General Intelligence? ...... 48
    3.1.2 The Core AGI Hypothesis ...... 49
    3.1.3 The Scope of the AGI Field ...... 50
  3.2 Characterizing AGI and General Intelligence ...... 51
    3.2.1 AGI versus Human-Level AI ...... 52
    3.2.2 The Pragmatic Approach to Characterizing General Intelligence ...... 52
    3.2.3 Psychological Characterizations of General Intelligence ...... 53
    3.2.4 A Cognitive-Architecture Perspective on General Intelligence ...... 56
    3.2.5 A Mathematical Approach to Characterizing General Intelligence ...... 57
    3.2.6 The Adaptationist Approach to Characterizing General Intelligence ...... 58
    3.2.7 The Embodiment Focused Approach to Characterizing General Intelligence ...... 59
  3.3 Approaches to Artificial General Intelligence ...... 60
    3.3.1 Symbolic AGI Approaches ...... 60
    3.3.2 Emergentist AGI Approaches ...... 62
    3.3.3 Hybrid AGI Architectures ...... 67
    3.3.4 The Universalist Approach to AGI ...... 70
  3.4 Structures Underlying Human-Like General Intelligence ...... 71
  3.5 Metrics and Environments for Human-Level AGI ...... 78
    3.5.1 Metrics and Environments ...... 79
    3.5.2 Quantifying the Milestone of Human-Level AGI ...... 79
    3.5.3 Measuring Incremental Progress Toward Human-Level AGI ...... 80
  3.6 What Would a General Theory of General Intelligence Look Like? ...... 84
  3.7 Conclusion ...... 85

4 Mapping the Landscape of AGI ...... 87
  4.1 Introduction ...... 87
  4.2 Mapping the Landscape of AGI ...... 88
  4.3 Tasks and Environments for Early-Stage AGI Systems ...... 88
    4.3.1 Environments and Embodiments for AGI ...... 89
    4.3.2 The Breadth of Human Competencies ...... 89
    4.3.3 Supporting Diverse Concurrent Research Efforts ...... 98
  4.4 Scenarios for Assessing AGI ...... 98
    4.4.1 General Video-game Learning ...... 98
    4.4.2 Preschool Learning ...... 100
    4.4.3 Reading Comprehension ...... 101
    4.4.4 Story/Scene Comprehension ...... 102
    4.4.5 School Learning ...... 103

    4.4.6 The Wozniak Test ...... 104
    4.4.7 Remaining Whitespace ...... 105
  4.5 From Scenarios to Tasks, Metrics and Challenges ...... 105
    4.5.1 Example Tasks ...... 106
    4.5.2 Multidimensional Challenges ...... 106
    4.5.3 Challenges and Competitions ...... 106
  4.6 Roadmapping as an Ongoing Process ...... 107

Section II Background from Neural and Cognitive Science

5 A Neuroscience Primer ...... 111
  5.1 Introduction ...... 111
  5.2 A Brief History of Neuroscience ...... 112
    5.2.1 The behaviorist’s brain ...... 116
    5.2.2 The computing brain ...... 117
    5.2.3 The dynamic brain ...... 117
  5.3 The Embryology and Functional Organization of the Nervous System ...... 119
    5.3.1 The formation of the major divisions of the nervous system ...... 119
    5.3.2 The nervous system ...... 120
  5.4 Functional anatomy of the Central Nervous System ...... 121
    5.4.1 The spinal cord ...... 121
    5.4.2 The hindbrain ...... 121
    5.4.3 The midbrain ...... 121
    5.4.4 The forebrain ...... 122
  5.5 Neurons and Their Interactions ...... 125
    5.5.1 Neurons ...... 125
    5.5.2 Glia ...... 129
    5.5.3 Mirror neurons ...... 129
  5.6 Vision and Sensory Input ...... 130
    5.6.1 The eye ...... 131
    5.6.2 The brain ...... 132
    5.6.3 Contemporary research ...... 135
  5.7 Motor Control of the Body ...... 135
    5.7.1 Overview of the motor control system of the brain ...... 137
    5.7.2 The spinal pathways ...... 138
    5.7.3 The frontal lobe ...... 138
  5.8 Biological Rhythms ...... 139
    5.8.1 Circadian rhythms ...... 139
    5.8.2 Ultradian rhythms ...... 142
  5.9 Drive States and Emotion ...... 144
    5.9.1 Biological drives ...... 144
    5.9.2 Emotions ...... 148
  5.10 Learning and Memory ...... 150
    5.10.1 Where is memory? ...... 151
    5.10.2 The time course of memory formation ...... 155
  5.11 Language and Consciousness ...... 157
    5.11.1 Language centers ...... 159

    5.11.2 Consciousness and the brain ...... 161
    5.11.3 Why consciousness? ...... 163
    5.11.4 The enigma of consciousness ...... 164
  5.12 Pathology and the Brain ...... 165
    5.12.1 Depression ...... 166
    5.12.2 Alzheimer’s disease ...... 167
    5.12.3 The schizophrenias ...... 168
    5.12.4 Altered States of Consciousness (ASCs) ...... 168
  5.13 Techniques for imaging the brain ...... 169
    5.13.1 Functional Magnetic Resonance Imaging (fMRI) ...... 169
    5.13.2 Direct Electrode Recordings ...... 169
  5.14 Conclusion ...... 170

6 Essentials of Cognitive Science ...... 171

7 Mapping into Brain ...... 173

8 Essentials of Cognitive Development ...... 175

9 Dynamic Global Workspace Theory ...... 177

10 Perspectives on Human and Machine Consciousness ...... 179
  10.1 Introduction ...... 179
  10.2 Aspects of Human Consciousness ...... 181
    10.2.1 Hard and Possibly Less Hard Problems Regarding Consciousness ...... 182
    10.2.2 Degrees of Consciousness ...... 183
    10.2.3 Specific Subprocesses of Human Consciousness ...... 184
    10.2.4 Dynamic Global Workspace Theory ...... 184
    10.2.5 Consciousness as a Nonlinear-Dynamical Process ...... 187
    10.2.6 Consciousness and Attention ...... 188
    10.2.7 States of Consciousness ...... 189
    10.2.8 Tononi’s Integrated Information Measure ...... 191
    10.2.9 Self-Modeling, Reflection and Self-Awareness ...... 192
  10.3 Toward a Unified Model of Human and Human-Like Consciousness ...... 193
    10.3.1 Measuring Human-Like Consciousness Multifactorially ...... 193
    10.3.2 Measuring Consciousness in the Human Brain ...... 194
    10.3.3 Human-Like Consciousness in LIDA and OpenCog ...... 195
    10.3.4 Human-Like Consciousness in the Global Brain ...... 196

11 Motivation and Intelligence ...... 199

Section III Key Concepts for AGI

12 Natural Language , Generation and Interaction ...... 203

13 Logical and Probabilistic Inference ...... 205

14 Complex Dynamics and Self-Organization in Intelligent Systems ...... 207
  14.1 Introduction ...... 207
  14.2 Dynamical Systems Concepts: Basic Concepts and History ...... 208
    14.2.1 What Are Dynamical Systems? ...... 208
    14.2.2 Basic Definitions ...... 209
  14.3 Discrete Dynamical Systems ...... 210
    14.3.1 Graphical Tools ...... 210
  14.4 Continuous Dynamical Systems ...... 214
    14.4.1 Analysis of Dynamical Systems ...... 215
  14.5 Discrete Dynamical Systems ...... 216
  14.6 Neuron Models ...... 217
    14.6.1 Hodgkin Huxley equation ...... 217
    14.6.2 The Izhikevich neuron ...... 219
    14.6.3 formal neuron as used in CS style NN ...... 219
  14.7 Neural Networks ...... 219
    14.7.1 Hopfield Networks ...... 219
  14.8 Evolutionary Learning ...... 219
    14.8.1 Genetic Algorithms ...... 220
    14.8.2 ...... 220
    14.8.3 Estimation of Distribution Algorithms ...... 220
  14.9 Economic Attention Networks ...... 220
    14.9.1 Updating Equations ...... 221
    14.9.2 ECAN and Cognitive Synergy ...... 223

15 Deep Learning: Principles and Practices ...... 225
  15.1 Convolutional Neural Networks (CNNs) ...... 225

16 Algorithmic and General Intelligence ...... 227

Section IV Example AGI Architectures

17 The LIDA Architecture ...... 231
  17.1 Introduction ...... 231
  17.2 Background ...... 234

18 The MicroPsi Architecture ...... 235

19 The CogPrime Architecture and OpenCog System ...... 237
  19.1 Introduction ...... 237
    19.1.1 An Integrative Approach ...... 237
    19.1.2 Key Claims ...... 238
  19.2 CogPrime and OpenCog ...... 240
    19.2.1 Current and Prior Applications of OpenCog ...... 241
    19.2.2 Transitioning from Virtual Agents to a Physical Robot ...... 242
  19.3 Philosophical Background ...... 243
    19.3.1 A Mind-World Correspondence Principle ...... 248
  19.4 High-Level Architecture of CogPrime ...... 249
  19.5 Local and Global Knowledge Representation ...... 250

    19.5.1 Weighted, Labeled Hypergraphs ...... 251
    19.5.2 Associative Links ...... 258
    19.5.3 Procedure Nodes ...... 259
    19.5.4 Links for Special External Data Types ...... 260
    19.5.5 Glocal Memory ...... 261
  19.6 Memory Types and Associated Cognitive Processes in CogPrime ...... 263
    19.6.1 Cognitive Synergy in PLN ...... 264
  19.7 Goal-Oriented Dynamics in CogPrime ...... 266
    19.7.1 Analysis and Synthesis Processes in CogPrime ...... 267
  19.8 Clarifying the Key Claims ...... 268
    19.8.1 Multi-Memory Systems ...... 269
    19.8.2 Perception, Action and Environment ...... 270
    19.8.3 Developmental Pathways ...... 271
    19.8.4 Knowledge Representation ...... 272
    19.8.5 Cognitive Processes ...... 272
    19.8.6 Fulfilling the “Cognitive Equation” ...... 277
    19.8.7 Occam’s Razor ...... 277
    19.8.8 Cognitive Synergy ...... 278
    19.8.9 Emergent Structures and Dynamics ...... 280
  19.9 Measuring Incremental Progress Toward Human-Level AGI ...... 281
    19.9.1 Competencies and Tasks on the Path to Human-Level AI ...... 282
  19.10 A CogPrime Thought Experiment: Build Me Something I Haven’t Seen Before ...... 290
    19.10.1 Let the Semi-Narrative Begin ...... 290
    19.10.2 Conclusion ...... 293
  19.11 Broader Issues ...... 294
    19.11.1 Ethical AGI ...... 295
    19.11.2 Toward Superhuman General Intelligence ...... 296
    19.11.3 Conclusion ...... 296

20 The AERA Architecture ...... 305
  20.1 Introduction ...... 305
  20.2 Conclusion ...... 305

A Glossary ...... 307
  A.1 List of Specialized Acronyms ...... 307
  A.2 Glossary of Specialized Terms ...... 308
References ...... 325

Section I The Past, Present and Future of AGI

Chapter 1 Overview of the AGI Field

Ben Goertzel

1.1 Introduction

The term "Artificial General Intelligence" (often abbreviated "AGI") has no broadly accepted precise definition, but has multiple closely related meanings, e.g. • the capacity of an engineered system to – display the same rough sort of general intelligence as humans; or, – display intelligence that is not tied to a highly specific set of tasks; or – generalize what it has learned, including generalization to contexts qualitatively very different than those it has seen before; or, – take a broad view, and interpret its tasks at hand in the context of the world at large and its relation thereto • an engineered system displaying the property of artificial general intelligence, to a significant degree • the theoretical and practical study of artificial general intelligence systems and methods of creating them

1.2 AGI versus Narrow AI

The original founders of the AI field, in the 1950s and 60s, were largely concerned with the creation of hardware or software emulating human-like general intelligence. Since that time, the field has come to focus instead largely on the pursuit of discrete capabilities or specific practical tasks. This approach has yielded many interesting technologies and theoretical results, yet has proved relatively unsuccessful so far in terms of the original central goals of the field. Thus, some researchers have come to prefer the term and concept of "AGI", in order to distinguish the pursuit of general intelligence from more narrowly-focused associated pursuits (Goertzel and Pennachin, 2005). A dichotomy has sometimes been drawn between AGI and "narrow AI" (Goertzel and Pennachin, 2005). For example, Kurzweil (1999) contrasted "narrow AI" with "strong AI" – using the former to refer to the creation of systems that carry out specific "intelligent" behaviors

in specific contexts, and the latter to refer essentially to what is now called AGI. For a narrow AI system, if one changes the context or the behavior specification even a little bit, some level of human reprogramming or reconfiguration is generally necessary to enable the system to retain its level of intelligence. Qualitatively, this seems quite different from natural generally intelligent systems like humans, which have a broad capability to self-adapt to changes in their goals or circumstances, performing "transfer learning" to generalize knowledge from one goal or context to others. The precise definition or characterization of AGI is one of the subjects of study of the AGI research field. However, it is broadly accepted that, given realistic space and time resource constraints, human beings do not have indefinite generality of intelligence; and for similar reasons, no realistic AGI system is going to have indefinite generality either. Human intelligence combines a certain generality of scope with various highly specialized aspects aimed at providing efficient processing of pragmatically important problem types; and real-world AGI systems are going to mix generality and specificity in their own ways.

1.3 The Emergence of an AGI Community

The emergence of a distinct community focused on AGI has been a gradual process, which has largely coincided with an increase in the legitimacy accorded to explicitly AGI-focused research within the AI community as a whole. In 2005, Springer published an edited volume titled "Artificial General Intelligence" (Goertzel and Pennachin, 2005). In 2006, the first formal research workshop on "AGI" was held, in Bethesda, Maryland (Goertzel, 2013). In the subsequent years, a broad community of researchers united by the explicit pursuit of AGI and related concepts has emerged, as evidenced e.g. by conference series such as Artificial General Intelligence (AGI), Biologically Inspired Cognitive Architectures (BICA), and Advances in Cognitive Systems; and by numerous special tracks and symposia at major conferences such as AAAI and IEEE, focused on closely allied topics such as Human-Level Intelligence and Integrated Intelligence. There is also a Journal of AGI. The AGI community, from the start, has involved researchers following a number of different directions, including some building cognitive architectures inspired by cognitive psychology and neurobiology, and some focused on deriving mathematical results regarding formalizations of general intelligence (thus, among other things, building bridges between AGI and other formal pursuits such as theoretical computer science and statistical decision theory). Each of the subcommunities involved has brought its own history, e.g. some AGI cognitive architecture work extends ideas from classic AI cognitive architectures such as SOAR (Laird, 2012) and GPS (Newell et al, 1959), some extends work from evolutionary computing, etc. The mathematical side of contemporary AGI draws heavily on foundational work by Ray Solomonoff (1964) and other early pioneers of formal intelligence theory. While the qualitative commonality among the various research directions pursued in the AGI community is relatively clear, there have not yet been any broadly successful attempts to clarify core hypotheses or conclusions binding the various threads of the AGI field. As one effort along these lines, Goertzel (2014) has articulated a "core AGI hypothesis", namely that "the creation and study of synthetic intelligences with sufficiently broad (e.g. human-level) scope and strong generalization capability, is at bottom 'qualitatively different' from the creation and study of synthetic intelligences with significantly narrower scope and weaker generalization capability." This is intended as a statement on which nearly all researchers in the AGI community will agree, regardless of their different conceptualizations of the AGI concept and their different architectural, theoretical, technical and engineering approaches. However, much more precise propositions than this will need to attain broad agreement among researchers, for the AGI field to be considered theoretically unified.

1.3.1 AGI and Related Concepts

AGI is related to many other terms and concepts commonly used. Joscha Bach has characterized AGI in terms of the quest to create "synthetic intelligence" (Bach, 2009). One also finds communities of researchers working toward AGI-related goals under the labels "computational intelligence", "natural intelligence", "cognitive architecture", "biologically inspired cognitive architecture", and many others. AGI is related to, yet far from identical to, "human-level AI" (Cassimatis, 2006) – a term which is usually used to mean, in effect, "human-level, reasonably human-like AGI". AGI is a fairly abstract notion, which is not intrinsically tied to any particular characteristics of human beings beyond their general intelligence. On the other hand, the concept of "human-level AGI" is openly anthropomorphic, and seeks to compare synthetic intelligences to human beings along an implicit linear scale, a notion that introduces its own special complexities. If a certain AGI system is very different than humans, it may not be easy to assess in what senses it resides on the same level as humans, versus above or below. On the other hand, if one's goal is to create AGI systems that resemble humans, it could be argued that thinking about hypothetical radically different AGI systems is mainly a distraction. The narrower focus of the "human level AI" concept, as opposed to AGI, seems to have positives and negatives, which are complex to disentangle given the current state of knowledge.

1.4 Perspectives on General Intelligence

The AGI field contains a number of different, largely complementary approaches to understanding the "general intelligence" concept. While the bulk of the AGI community's effort is devoted to devising and implementing designs for AGI systems, and developing theories regarding the best way to do so, the formulation of a detailed and rigorous theory of "what AGI is" also constitutes a small but significant part of the community's ongoing research. The lack of a clear, universally accepted definition is not unique to "AGI." For instance, "AI" also has many different meanings within the AI research community, with no clear consensus on the definition. "Intelligence" is also a fairly vague concept; Legg and Hutter wrote a paper summarizing and organizing over 70 different published definitions of "intelligence", most oriented toward general intelligence, emanating from researchers in a variety of disciplines (Legg and Hutter, 2007). Four key approaches to conceptualizing the nature of GI and AGI are outlined below.

1.4.1 The Pragmatic Approach to Characterizing General Intelligence

The pragmatic approach to conceptualizing general intelligence is typified by the AI Magazine article "Human Level Artificial Intelligence? Be Serious!", written by Nils Nilsson, one of the early leaders of the AI field (Nilsson, 2005). Nilsson's view is "... that achieving real Human Level artificial intelligence would necessarily imply that most of the tasks that humans perform for pay could be automated. Rather than work toward this goal of automation by building special-purpose systems, I argue for the development of general-purpose, educable systems that can learn and be taught to perform any of the thousands of jobs that humans can perform. Joining others who have made similar proposals, I advocate beginning with a system that has minimal, although extensive, built-in capabilities. These would have to include the ability to improve through learning along with many other abilities." In this perspective, once an AI obsoletes humans in most of the practical things we do, it should be understood to possess general Human Level intelligence. The implicit assumption here is that humans are the generally intelligent system we care about, so that the best practical way to characterize general intelligence is via comparison with human capabilities. The classic Turing test for machine intelligence (Turing, 1955) – simulating human conversation well enough to fool human judges – is pragmatic in a similar sense to Nilsson's perspective. But the Turing test has a different focus, on emulating humans. Nilsson isn't interested in whether an AI system can fool people into thinking it's a human, but rather in whether an AI system can do the useful and important practical things that people can do.

1.4.2 Psychological Characterizations of General Intelligence

The psychological approach to characterizing general intelligence also focuses on human-like general intelligence; but rather than looking directly at practical capabilities, it tries to isolate deeper underlying capabilities that enable these practical capabilities. In practice it encompasses a broad variety of sub-approaches, rather than presenting a unified perspective. Viewed historically, efforts to conceptualize, define, and measure intelligence in humans reflect a distinct trend from general to specific (it is interesting to note the similarity to historical trends in AI). Thus, early work in defining and measuring intelligence was heavily influenced by Spearman, who in 1904 proposed the psychological factor g (the "g factor") for general intelligence. Spearman argued that g was biologically determined, and represented the overall intellectual skill level of an individual. In 1916, Terman introduced the notion of an intelligence quotient or IQ, which is computed by dividing the test-taker's mental age (i.e., their age-equivalent performance level) by their physical or chronological age. In subsequent years, though, psychologists began to question the concept of intelligence as a single, undifferentiated capacity. There emerged a number of alternative theories, definitions, and measurement approaches, which share the idea that intelligence is multifaceted and variable both within and across individuals. Of these approaches, a particularly well-known example is Gardner's (1983) theory of multiple intelligences, which proposes eight distinct forms or types of intelligence: (1) linguistic, (2) logical-mathematical, (3) musical, (4) bodily-kinesthetic, (5) spatial, (6) interpersonal, (7) intrapersonal, and (8) naturalist.

1.4.3 A Mathematical Approach to Characterizing General Intelligence

In contrast to approaches focused on human-like general intelligence, some researchers have sought to understand general intelligence in general. The underlying idea here is that

• Truly, absolutely general intelligence would only be achievable given infinite computational ability. For any computable system, there will be some contexts and goals for which it's not very intelligent.
• However, some finite computational systems will be more generally intelligent than others, and it's possible to quantify this extent.

This approach is typified by the recent work of Legg and Hutter (2007a), who give a formal definition of general intelligence based on the Solomonoff-Levin prior, building heavily on the foundational work of Hutter (2005). Put very roughly, they define intelligence as the average reward-achieving capability of a system, calculated by averaging over all possible reward-summable environments, where each environment is weighted in such a way that more compactly describable programs have larger weights. According to this sort of measure, humans are nowhere near the maximally generally intelligent system. However, intuitively, such a measure would seem to suggest that humans are more generally intelligent than, say, rocks or worms. While the original form of Legg and Hutter's definition of intelligence is impractical to compute, there are also more tractable approximations.
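To make this weighting scheme concrete, Legg and Hutter's "universal intelligence" measure for an agent (policy) π can be written, in a simplified form of their notation, as

\[
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
\]

where E is the class of computable, reward-summable environments, K(μ) is the Kolmogorov complexity of environment μ (so more compactly describable environments receive exponentially larger weights), and V^π_μ is the expected total reward that π achieves in μ. This rendering glosses over some technicalities of the original definition (such as the choice of reference machine), but it captures the "average reward, weighted toward simple environments" idea described above.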

1.4.4 The Adaptationist Approach to Characterizing General Intelligence

Another perspective views general intelligence as closely tied to the environment in which it exists. Pei Wang has argued carefully for a conception of general intelligence as "adaptation to the environment using limited resources" (Wang, 2006). A system may be said to have greater general intelligence if it can adapt effectively to a more general class of environments, within realistic resource constraints.

1.4.5 Broadly Suspected Aspects of General Intelligence

Variations in perspective aside, there is reasonably broad agreement in the AGI community on some key likely features of general intelligence:

• General intelligence involves the ability to achieve a variety of goals, and carry out a variety of tasks, in a variety of different contexts and environments
• A generally intelligent system should be able to handle problems and situations quite different from those anticipated by its creators
• A generally intelligent system should be good at generalizing the knowledge it's gained, so as to transfer this knowledge from one problem or context to others

• Arbitrarily general intelligence is likely not possible given realistic resource constraints
• Real-world systems may display varying degrees of limited generality, but are inevitably going to be a lot more efficient at learning some sorts of things than others; and for any given real-world system, there will be some learning tasks on which it is unacceptably slow. So real-world general intelligences are inevitably somewhat biased toward certain sorts of goals and environments.
• Humans display a higher level of general intelligence than existing AI programs do, and apparently also a higher level than other animals
• It seems quite unlikely that humans happen to manifest a maximal level of general intelligence, even relative to the goals and environment for which they have been evolutionarily adapted

There is also a common intuition in much of the AGI community that various real-world general intelligences will tend to share certain common properties, though there is less agreement on what these properties are. A 2008 workshop on Human-Level AI resulted in a paper by Laird and Wray enumerating one proposed list of such properties (Laird et al, 2008); a 2009 workshop on AGI resulted in an alternative, more extensive list, articulated in a multi-author paper published in AI Magazine (Adams et al, 2012).

1.5 Current Scope of the AGI Field

Wlodek Duch, in his survey paper (Duch, 2008), divided existing approaches to AI into three categories – symbolic, emergentist and hybrid. To this trichotomy we here add one additional category, "universal." Due to the diversity of AGI approaches, it is difficult to find truly comprehensive surveys; Samsonovich (2010) is perhaps the most thorough, but is by no means complete.

1.5.1 Universal AI

In the universal approach, one starts with AGI algorithms or agents that would yield incredibly powerful general intelligence if supplied with massively, unrealistically much computing power, and then views practically feasible AGI systems as specializations of these powerful theoretic systems. The path toward universal AI began in earnest with Solomonoff's (1964) universal predictors, which provide a rigorous and elegant solution to the problem of sequence prediction, founded in the theory of algorithmic information (also known as Kolmogorov complexity (Kolmogorov, 1965; Li and Vitanyi, 1997)). The core idea here (setting aside certain technicalities) is that the shortest program computing a sequence provides the best predictor regarding the continuation of the sequence. Hutter's (2005) work on AIXI extends this approach, applying the core idea of Solomonoff induction to the problem of controlling an agent carrying out actions in, and receiving reinforcement signals from, a computable environment. In an abstract sense, AIXI is the optimally intelligent agent in computable environments. In a bit more detail, what AIXI does is to maximize expected reward over all possible future perceptions created by all possible environments q that are consistent with past perceptions.

The expectation over environments is weighted, where the simpler an environment, the higher is its weight 2^{-l(q)}, where simplicity is measured by the length l of program q. AIXI effectively learns by eliminating Turing machines q once they become inconsistent with the progressing history. AIXI is uncomputable, but Hutter's AIXItl is a computable approximation that involves, at each step in its "cognitive cycle", a search over all programs of length less than l and runtime less than t. Conceptually, AIXItl may be understood roughly as follows:

• An AGI system is going to be controlled by some program.
• Instead of trying to figure out the right program via human wizardry, we can just write a "meta-algorithm" to search program space, and automatically find the right program for making the AGI smart, and then use that program to operate the AGI.
• We can then repeat this meta-algorithm over and over, as the AGI gains more data about the world, so it will always have the operating program that's best according to all its available data.
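For reference, the action selection of the full (uncomputable) AIXI agent described above can be written, in a simplified form of Hutter's notation, as

\[
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[ r_k + \cdots + r_m \big] \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-l(q)}
\]

where the a's are actions, the o's and r's are observations and rewards, m is the planning horizon, U is a universal (monotone) Turing machine, and l(q) is the length of program q. The inner sum is exactly the simplicity-weighted mixture over environment programs q consistent with the history; AIXItl restricts that sum to programs of length at most l and runtime at most t, which is what makes it computable (though still intractable). Some technicalities (e.g. the treatment of the horizon and the choice of reference machine) are glossed over in this rendering.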

AIXItl itself is a precisely defined "meta-algorithm" of this nature. Related systems have also been formulated, including one due to Schmidhuber (2002) that is based on the Speed Prior, which takes into account program runtime in a way that is optimal in a certain sense. The Universal AI research program also involves blueprints of universal problem solvers for arbitrary computable problems, that are time-optimal in various theoretical senses. These include Levin's (1973) asymptotically optimal Universal Search, which has constant multiplicative overhead [? ], and its incremental extension, the Optimal Ordered Problem Solver, which can greatly reduce the constant overhead by re-using previous successful programs (Schmidhuber et al, 2004); as well as Hutter's (2002) asymptotically optimal method, which will solve any well-defined problem as quickly as the unknown fastest way of solving it, save for an additive constant overhead that becomes negligible as problem size grows (this method is related to AIXItl). Self-improving universal methods have also been defined, including some that justify self-changes (including changes of the learning algorithm) in a lifelong learning context (Schmidhuber et al, 1997), and the Gödel Machine (Schmidhuber, 2006), which self-improves via proving theorems about itself, and can improve any part of its software (including the learning algorithm itself) in a way that is provably time-optimal in a sense that takes constant overheads into account and goes beyond asymptotic optimality. At each step of the way, the Gödel Machine takes the action that it can prove, according to its axiom system and its perceptual data, will be the best way to achieve its goals. Like AIXI, this is uncomputable in the most direct formulation, and computable but probably intractable in its most straightforward simplified formulations. In the perspective of universal AI, the vast majority of computationally feasible problems are "large" in the sense that they exist in the regime where asymptotic optimality is relevant; the other "small" problems are relatively few in number. However, it seems that many (perhaps all) of the problems of practical everyday interest to humans are "small" in this sense, which would imply that reduction in the overhead of the universal methods mentioned above is critical for practical application of universal AI. There has been work in this direction, dating back at least to (Schmidhuber et al, 1991), and including recent work such as (Schmidhuber et al, 2013; Veness et al, 2011).

1.5.2 Symbolic AGI

Attempts to create or work toward AGI using symbolic reasoning systems date back to the 1950s and continue to the current day, with increasing sophistication. These systems tend to be created in the spirit of the "physical symbol system hypothesis" (Newell and Simon, 1976), which states that minds exist mainly to manipulate symbols that represent aspects of the world or themselves. A physical symbol system has the ability to input, output, store and alter symbolic entities, and to execute appropriate actions in order to reach its goals. In 1956, Newell and Simon (1956) built a program, the Logic Theorist, that discovers proofs in propositional logic. This was followed up by the General Problem Solver (Newell, 1963), which attempted to extend Logic Theorist type capabilities to commonsensical problem-solving. At this early stage, it became apparent that one of the key difficulties facing symbolic AI was how to represent the knowledge needed to solve a problem. Before learning or problem solving, an agent must have an appropriate symbolic language or formalism for the learned knowledge. A variety of representations were proposed, including complex logical formalisms (McCarthy and Hayes, 1969), semantic frames as proposed by Minsky (1975), and simpler feature-based representations. Early symbolic AI work led to a number of specialized systems carrying out practical functions. Winograd's SHRDLU system (1972) could, using restricted natural language, discuss and carry out tasks in a simulated blocks world. CHAT-80 could answer geographical questions placed to it in natural language (Warren and Pereira, 1982). DENDRAL, developed from 1965 to 1983 in the field of organic chemistry, proposed plausible structures for new organic compounds (Buchanan and Feigenbaum, 1978). MYCIN, developed from 1972 to 1980, diagnosed infectious diseases of the blood, and prescribed appropriate antimicrobial therapy (Buchanan and Shortliffe, 1984). However, these systems notably lacked the ability to generalize, performing effectively only in the narrow domains for which they were engineered. Modern symbolic AI systems seek to achieve greater generality of function and more robust learning ability via sophisticated cognitive architectures. Many such cognitive architectures focus on a "working memory" that draws on long-term memory as needed, and utilize centralized control over perception, cognition and action. Although in principle such architectures could be arbitrarily capable (since symbolic systems have universal representational and computational power, in theory), in practice symbolic architectures tend to be less developed in learning, creativity, procedure learning, and episodic memory. Leading examples of symbolic cognitive architectures include ACT-R (Anderson et al, 2004), originally founded on a model of human semantic memory; Soar (Laird, 2012), which is based on the application of production systems to solve problems defined as residing in various problem spaces, and which has recently been extended to include perception, episodic memory, and a variety of other cognitive functions; and Sigma, which applies many of Soar's architectural ideas using a probabilistic network based knowledge representation (Rosenbloom, 2013).
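The production-system style of architecture mentioned above can be illustrated with a minimal, purely hypothetical sketch of forward-chaining rules over a working memory of symbolic facts; this is not code from Soar, ACT-R or any actual architecture, and the rule and fact names are invented for illustration:

```python
# Minimal forward-chaining production system sketch (illustrative only;
# real architectures like Soar add conflict resolution, chunking, etc.)

# Working memory: a set of symbolic facts, each represented as a tuple.
working_memory = {("block", "A"), ("block", "B"), ("on", "A", "B"), ("clear", "A")}

# Production rules: each rule inspects working memory and returns new facts.
def rule_clear_means_graspable(wm):
    """If a block X is clear, assert ("graspable", X)."""
    return {("graspable", f[1]) for f in wm if f[0] == "clear"}

def rule_on_means_covered(wm):
    """If X is on Y, assert ("covered", Y)."""
    return {("covered", f[2]) for f in wm if f[0] == "on"}

rules = [rule_clear_means_graspable, rule_on_means_covered]

# Recognize-act cycle: fire all rules whose conditions match, add their
# conclusions, and repeat until no new facts appear (a fixed point).
changed = True
while changed:
    changed = False
    for rule in rules:
        derived = rule(working_memory) - working_memory
        if derived:
            working_memory |= derived
            changed = True

print(sorted(working_memory))
```

Real architectures such as Soar elaborate this basic recognize-act cycle with preference-based conflict resolution, subgoaling when an impasse is reached, and learning mechanisms such as chunking.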

1.5.3 Emergentist AGI

Another species of AGI design expects abstract symbolic processing – along with every other aspect of intelligence – to emerge from lower-level "subsymbolic" dynamics, which sometimes (but not always) are designed to simulate neural networks or other aspects of human brain function. Today's emergentist architectures are sometimes very strong at recognizing patterns in high-dimensional data, and at associative memory; but no one has yet shown how to achieve high-level functions such as abstract reasoning or complex language processing using a purely subsymbolic, emergentist approach. The broad concepts of emergentist AI can be traced back to Norbert Wiener's Cybernetics (1948), and more directly to the 1943 work of McCulloch and Pitts (1943), which showed how networks of simple thresholding "formal neurons" could be the basis for a Turing-complete machine. In 1949, Donald Hebb wrote The Organization of Behavior (Hebb, 1949), pointing out that neural pathways are strengthened each time they are used, a concept now called "Hebbian learning", conceptually related to long-term potentiation in the brain and to a host of more sophisticated reinforcement learning techniques (Sutton and Barto, 1998). In the 1950s, practical learning algorithms for formal neural networks were articulated by Marvin Minsky (1952) and others. Rosenblatt (1958) designed "Perceptron" neural networks, and Widrow and Hoff (1962) presented a systematic neural net learning procedure that was later labeled "back-propagation." These early neural networks showed some capability to learn and generalize, but were not able to carry out practically impressive tasks, and interest in the approach waned during the 1970s. An alternate approach to emergentist AI that emerged in the 1970s was evolutionary computing, centered on the genetic algorithm, a computational model of evolution by natural selection. John Holland's learning classifier system combined reinforcement learning and genetic algorithms into a cognitive architecture with complex, self-organizing dynamical properties (Holland, 1975). A learning classifier system consists of a population of binary rules on which a genetic algorithm (roughly simulating an evolutionary process) alters and selects the best rules. Rule fitness is based on a reinforcement learning technique. In 1982, broad interest in neural net based AI began to resume, triggered partly by a paper by John Hopfield of Caltech (Hopfield, 1982), explaining how completely connected symmetric neural nets could be used to store associative memories. In 1986, psychologists Rumelhart and McClelland (1986) popularized the extension of the Widrow-Hoff learning rule to neural networks with multiple layers (a method that was independently discovered by multiple researchers). Currently, neural networks are an extremely popular technique with a host of practical applications. Multilayer networks of formal neurons or other conceptually similar processing units have become known by the term "deep learning" and have proved highly successful in computer vision (Bengio, 2014; Le et al, 2013; Taigman et al, 2014), speech processing (Deng et al, 2013) and other areas. Computational neuroscience is also a flourishing field, utilizing detailed computational models of biological neurons to study large-scale self-organizing behavior in neural tissue (Izhikevich and Edelman, 2008).
Some researchers in these areas believe that, by gradually increasing the adaptive capability and architectural complexity of their networks, they will be able to incrementally approach human-level AGI (Arel et al, 2010). An important subset of emergentist cognitive architectures, still at an early stage of advancement, is developmental robotics, which is focused on controlling robots without significant "hard-wiring" of knowledge or capabilities, allowing robots to learn (and learn how to learn, etc.) via their engagement with the world. A significant focus is often placed here on "intrinsic motivation," wherein the robot explores the world guided by internal goals like novelty or curiosity, forming a model of the world as it goes along, based on the modeling requirements implied by its goals. Some of the foundations of this research area were laid by Juergen Schmidhuber's work in the 1990s (Schmidhuber, 1995), but now with more powerful computers and robots the area is leading to more impressive practical demonstrations.
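As a concrete illustration of one classic emergentist idea mentioned in this section, Hopfield-style associative memory built with a Hebbian learning rule, the following is a minimal sketch (illustrative only; real models and modern deep networks are far more elaborate):

```python
import numpy as np

# Minimal Hopfield-style associative memory (illustrative sketch).
# Units are binary (+1/-1); weights are learned with a Hebbian outer-product
# rule; recall runs asynchronous threshold updates until the state settles.

patterns = np.array([
    [ 1, -1,  1, -1,  1, -1,  1, -1],   # stored pattern 1
    [ 1,  1,  1,  1, -1, -1, -1, -1],   # stored pattern 2
])
n = patterns.shape[1]

# Hebbian learning: strengthen the connection between units that are
# co-active in a stored pattern; no self-connections.
W = np.zeros((n, n))
for p in patterns:
    W += np.outer(p, p)
np.fill_diagonal(W, 0)

def recall(state, steps=50, rng=np.random.default_rng(0)):
    """Asynchronously update randomly chosen units for a fixed number of steps."""
    s = state.copy()
    for _ in range(steps):
        i = rng.integers(n)
        s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Corrupt the first stored pattern in two positions, then let the network
# settle; it should fall back into the nearest stored attractor.
noisy = patterns[0].copy()
noisy[0] *= -1
noisy[3] *= -1
print("recalled:", recall(noisy))
print("stored:  ", patterns[0])
```

Hebb's observation corresponds to the outer-product update: the weight between two units grows when they take the same value in a stored pattern, and recall is just repeated thresholding until the network state settles into a stored attractor.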

1.5.4 Hybrid AGI

In response to the complementary strengths and weaknesses of the other existing approaches, a number of researchers have turned to integrative, hybrid architectures, which combine subsystems operating according to the different paradigms. The combination may be done in many different ways, e.g. connection of a large symbolic subsystem with a large subsymbolic system, or the creation of a population of small agents each of which is both symbolic and subsymbolic in nature. One aspect of such hybridization is the integration of neural and symbolic components (Hammer and Hitzler, 2007). Hybrid systems are quite heterogeneous in nature, and here we will mention three that are relatively representative; a longer list is reviewed in (Goertzel, 2014). A classic example of a hybrid system is the CLARION (Connectionist Learning with Adaptive Rule Induction On-line) cognitive architecture created by Ron Sun (2002), whose design focuses on explicitly distinguishing implicit versus explicit processes, and capturing the interaction between these two process types. Implicit processes are modeled as neural networks, whereas explicit processes are modeled as formal symbolic rules. CLARION involves an action-centered subsystem whose job is to control both external and internal actions; its implicit layer is made up of neural networks called Action Neural Networks, while the explicit layer is made up of action rules. It also involves a non-action-centered subsystem whose job is to maintain general knowledge; its implicit layer is made up of associative neural networks, while its explicit layer is made up of associative rules. The learning dynamics of the system involve ongoing coupling between the neural and symbolic aspects. The LIDA architecture (Faghihi and Franklin, 2012), developed by Stan Franklin and his colleagues, is closely based on cognitive psychology and cognitive neuroscience, particularly on Bernard Baars' Global Workspace Theory and Baddeley's model of working memory. LIDA's dynamics are based on the principles that: 1) Much of human cognition functions by means of frequently iterated (~10 Hz) interactions, called cognitive cycles, between conscious contents, the various memory systems and action selection. 2) These cognitive cycles serve as the "atoms" of cognition of which higher-level cognitive processes are composed. LIDA contains components corresponding to different processes known to be associated with working and long-term memory (e.g. an episodic memory buffer, a sensory data processing module, etc.), and utilizes different AI algorithms within each of these components. The CogPrime architecture (Goertzel et al, 2013), implemented in the OpenCog AI software framework, represents symbolic and subsymbolic knowledge together in a single weighted, labeled hypergraph representation called the Atomspace. Elements in the Atomspace are tagged with probabilistic or fuzzy truth values, and also with short and long term oriented "attention values." Working memory is associated with the subset of Atomspace elements possessing the highest short term importance values. A number of cognitive processes, including a probabilistic logic engine, an evolutionary program learning framework and a neural net like associative and reinforcement learning system, are configured to concurrently update the Atomspace, and designed to aid each others' operation.
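To give a feel for the kind of representation described in the CogPrime paragraph above, here is a small, purely illustrative toy sketch of a weighted, labeled hypergraph whose elements carry truth values and attention values; the class and field names are invented for illustration and are not the actual OpenCog API:

```python
from dataclasses import dataclass

# Purely illustrative toy model in the general spirit of a weighted, labeled
# hypergraph knowledge store; the real Atomspace's types, truth-value
# semantics and attention dynamics are far richer than this sketch.

@dataclass
class Atom:
    label: str                  # e.g. "ConceptNode:cat" or "InheritanceLink"
    targets: tuple = ()         # outgoing set: the atoms this link connects
    strength: float = 1.0       # probabilistic/fuzzy truth-value strength
    confidence: float = 0.0     # confidence in that strength
    sti: float = 0.0            # short-term importance (attention value)
    lti: float = 0.0            # long-term importance (attention value)

class ToyAtomspace:
    def __init__(self):
        self.atoms = []

    def add(self, atom):
        self.atoms.append(atom)
        return atom

    def working_memory(self, k=2):
        """Return the k atoms with the highest short-term importance."""
        return sorted(self.atoms, key=lambda a: a.sti, reverse=True)[:k]

space = ToyAtomspace()
cat = space.add(Atom("ConceptNode:cat", sti=0.7))
animal = space.add(Atom("ConceptNode:animal", sti=0.2))
# A link is itself an atom whose targets are other atoms (a hypergraph edge).
space.add(Atom("InheritanceLink", targets=(cat, animal),
               strength=0.95, confidence=0.9, sti=0.5))

print([a.label for a in space.working_memory()])
```

In this toy model, a link is just another atom whose outgoing set points at other atoms, which is what makes the structure a hypergraph rather than an ordinary graph, and "working memory" is simply the most important slice of the store.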

1.6 Future of the AGI Field

The field of AGI is still at a relatively early stage of development, in the sense that nobody has yet demonstrated a software or hardware system that is broadly recognized as displaying a significant degree of general intelligence, or as being near general-purpose human-level AI. No one has yet even demonstrated a compelling "proto-AGI" system, such as e.g.: a robot that can do a variety of preschool-type activities in a flexible and adaptive way; or a chatbot that can hold an hour's conversation without sounding bizarre or resorting to repeating catch-phrases. Furthermore, there has not yet emerged any broadly accepted theory of general intelligence. Such a theory might be expected to include a characterization of what general intelligence is, and a theory of what sorts of architecture can be expected to work for achieving human-level AGI using realistic computational resources. However, a significant plurality of experts believes there is a possibility of dramatic, interlinked progress in AGI design, engineering, evaluation and theory in the relatively near future. For example, in a survey of researchers at the AGI-2010 conference, the majority of respondents felt that human-level AGI was likely to arise before 2050, and some were much more optimistic (Baum et al, 2011). Similarly, a 2014 poll among AI experts (Mueller and Bostrom, 2014) at various conferences showed a broad agreement that AGI systems will likely reach overall human ability (defined as "ability to carry out most human professions at least as well as a typical human") around the middle of the 21st century. The years 2013 and 2014 also saw sharply heightened commercial activity in the AI space, which is difficult to evaluate in terms of its research implications, but indicates a general increase in interest in the field. The possibility of a relatively near-term advent of advanced AGI has led some researchers and other observers to express concern about the ethics of AGI development and the possibility of "existential risks" associated with AGI. A number of recently formed research institutes place significant focus on this topic, e.g. the Machine Intelligence Research Institute (formerly the Singularity Institute for AI), Oxford University's Future of Humanity Institute, and Cambridge University's Centre for the Study of Existential Risk (CSER). The dramatic potential benefits of AGI, once it is achieved, have been explored by a variety of thinkers during the past decades. I.J. Good in 1962 famously pointed out that "the first ultraintelligent machine is the last invention that man need ever make." Hans Moravec (1986), Vernor Vinge (1993), Ray Kurzweil (1999, 2006), and many others have highlighted the potential of AGI to effect radical, perhaps sudden changes in human society. These thinkers view AGI as one of a number of emerging transformational technologies, including nanotechnology, genetic engineering, brain-computer interfacing, mind uploading and others, and focus on the potential synergies between AGI and these other technologies once further advances in various relevant directions occur.

Chapter 2 A Brief History of AI and AGI

Ben Goertzel

Abstract A nontechnical survey of the AI field is given, with an emphasis on the interplay between AGI and narrow AI throughout the field’s history. The path of the AI field away from, and more recently somewhat back toward, a focus on AGI and human-level intelligence is traced. An argument is made that narrow AI researchers have, in some cases, effectively explored various particular aspects of human-like general intelligence, thus providing valuable input to AGI researchers and architects, even when their work has not directly related to AGI.

2.1 Introduction

The pursuit of thinking machines may be humanity's greatest quest. As I.J. Good said, the first ultraintelligent machine is the last invention humanity will need to make [? ]. The concept of mechanical humans with thoughts and consciousness has excited people since antiquity. The modern AI field began in the middle of the last century as an instance of this age-old quest, and has evolved in multiple interesting directions since. The first modern AI researchers, working as digital computing technology emerged, sought to create machines with general intelligence of the same basic sort as human beings – and ultimately surpassing the human mind in both generality and acuity of thought. These goals are now referred to with terms such as Artificial General Intelligence (AGI), or Human-Level AI – terms that didn't exist in the early years of the field, because researchers assumed this was the only kind of AI there is. As time passed, the AI field accumulated various other objectives besides its original goal of human-level general intelligence, such as the creation of "Narrow AI" systems carrying out specialized functions with high intelligence. It gradually became apparent that the construction of these narrowly specialized intelligent systems was a much easier, and substantially different, project than creating mechanical humans or other advanced artificial general intelligences. In the past few decades, narrow AI systems have achieved great things – beaten the world champions at chess and Jeopardy, won billions on the financial markets, optimized the US military's logistics systems, mastered credit card fraud detection, piloted cars and copters, proved recondite theorems, and so forth. These successes have been attained mostly via methods bearing little surface resemblance to the human mind, without much of the capability for autonomy and generalization that characterizes human intelligence. These more specialized pursuits have

diverted a great amount of the AI field's attention from the quest for AGI and human-level AI. But the field's original, broadly ambitious goals have never fully faded, though they have waxed and waned. At the moment there is an increasing trend in the field back toward its original AGI ambitions. Due to the complex and multifaceted nature of the human mind, researchers have taken a wide variety of approaches to the pursuit of machines with human-like general intelligence. None has proved completely successful yet; and the human mind has repeatedly proved more complex than engineers' approaches to replicating or surpassing it. Commonly, different AI approaches that initially seem contradictory are later found to be complementary, and reflect different aspects of the mind. The history of AI to date can be interpreted in multiple ways. It can be viewed as the story of a series of failed attempts to create human-like or greater general intelligence via various algorithms and architectures. It can also be viewed as the story of a series of dramatic successes at making machines do difficult things that, when humans do them, appear to require a high level of intelligence. Or it can be viewed as the story of a scientific and mathematical pursuit that, as well as producing a host of interesting algorithms and theorems, has taught us a great deal about how the human mind works, and does not work. The history of the AI field in the latter half of the 20th century is a tale often told: the naive overenthusiasm of the brilliant, early AI zealots in the 1960s and early 1970s ... the ensuing "AI Winter" when funding sources became annoyed at the failure of the AI field to fulfill its early promises ... the gradual resurrection of the field with a new focus on statistical methods and Big Data and more narrowly-defined practical problems.... This archetypal history is accurate so far as it goes; but it is also somewhat limited as a vision of the human race's quest to create artificial minds. One may envision the history of the AI field as encompassing four different, interconnected aspects:

1. understanding of various aspects of intelligence (both human intelligence and intelligence-in-general)
2. attempts (mostly rather unsuccessful so far, though some are still at an early stage) to create ambitious general intelligences by leveraging individual aspects of intelligence
3. attempts (in some cases very successful) to create useful specialized intelligences ("narrow AI" systems) leveraging individual aspects of intelligence
4. attempts (mostly still at an early stage) to create holistic cognitive architectures combining different aspects of intelligence

Many of the "failures" of the mid-20th-century AI community were failures only when interpreted in terms of their original general-intelligence-oriented goals, and turned out to be dramatic successes when evaluated in terms of the narrow AI progress they led to. The following perspectives on the relationship between these aspects in the history of AI are all reasonably empirically supportable at this stage:

A pessimistic perspective, which views the history of AI as

• a series of attempts to create and/or understand general intelligence using approaches that proved too limited for the task
• a series of sometimes quite successful attempts to understand specialized tasks, and create systems carrying out specialized tasks, using these same limited approaches

In this perspective, one could argue that the AI field has made lots of headway on solving various practical problems with narrow AI technologies, but next to no progress toward its original goals. After all, in terms of practical system building, we don’t seem that much closer to HAL 9000 or C3PO than we were in 1970.

A more optimistic, "integrative" perspective on the same history, holding that

• at various stages in its development, the AI field has gained an understanding of the properties, strengths and weaknesses of various algorithms and structures embodying various aspects of general intelligence
• as a consequence of this understanding, AI researchers have created a variety of useful narrow AI systems based on specific aspects of general intelligence
• as more and more aspects of general intelligence are understood, the field is gradually working toward the capability to build integrated AI systems achieving powerful general intelligence by combining different aspects appropriately

This is the perspective taken by many of the researchers presently working on "integrative" AGI architectures. In this perspective, advanced AGI is somewhat like the human body – it requires a lot of complex parts all working together properly. If any of the key parts are left out, the whole thing may not work at all. Once we have learned enough about all the major parts and how they should work together, then we can connect the whole system together and it will achieve generally intelligent functionality via holistic dynamics.

A hardware-oriented perspective on this same history, also optimistic in its own way, holding that

• one of the particular approaches previously or currently explored (say, deep learning, or formal logic, or evolutionary program learning) is actually adequate to serve as the basis of a human-level AGI system all by itself
• once we have the right hardware infrastructure (the right amount of computing power, and maybe the right kind of robotic body, etc.), the true power of this approach will reveal itself

In this latter perspective, the history of AI is mostly a story of

• some overly impatient researchers with good ideas trying to make their ideas work in reality before the supporting tools are available
• other researchers with bad ideas laboring on in vain because the available hardware isn’t enough to run the experiments needed to convincingly refute their bad ideas

There are other perspectives as well, of course. Which of these views is correct is currently largely a matter of intuition or opinion. There is not sufficient hard and fast evidence to convincingly disambiguate. The story is still unfolding as research and development proceed. It’s worth keeping all these views in mind as one reflects on the historical record, and uses this record to contextualize current work.

2.2 The Prehistory of AI

The quest to create human-level AI extends back into the depths of human history. Alongside stories of statues or golems endowed with consciousness and intelligence by magical means, there are also ancient tales of actual mechanical robots. For instance, the Chinese Lie Zi text, written in the 3rd century BC, describes a life-sized humanoid automaton built by the engineer Yan Shi for King Mu of Zhou (who ruled from 1023 to 957 BC). King Mu’s reaction to the robot is described as follows:

The king stared at the figure in astonishment. It walked with rapid strides, moving its head up and down, so that anyone would have taken it for a live human being. The artificer touched its chin, and it began singing, perfectly in tune. He touched its hand, and it began posturing, keeping perfect time... As the performance was drawing to an end, the robot winked its eye and made advances to the ladies in attendance, whereupon the king became incensed and would have had Yen Shih executed on the spot had not the latter, in mortal fear, instantly taken the robot to pieces to let him see what it really was. And, indeed, it turned out to be only a construction of leather, wood, glue and lacquer, variously coloured white, black, red and blue. Examining it closely, the king found all the internal organs complete: liver, gall, heart, lungs, spleen, kidneys, stomach and intestines; and over these again, muscles, bones and limbs with their joints, skin, teeth and hair, all of them artificial... The king tried the effect of taking away the heart, and found that the mouth could no longer speak; he took away the liver and the eyes could no longer see; he took away the kidneys and the legs lost their power of locomotion. The king was delighted.

We have no current way to validate the veracity of this tale, of course. There is more convincing evidence for more recent mechanical humans, such as Al-Jazari’s orchestra of mechanical humans in the early 1200s. But at the very least, the Lie Zi story and other ancient tales show that the concept of building mechanical humans goes pretty far back. There is nothing new about the desire to create artificial people. What is new is that, during the last century, we have created tools that plausibly seem capable of leading to artificial people that not only look and sound like humans, but also think and feel like them.

Fig. 2.1: Prehistory of AI: Early Attempts to Create Mechanical Humans

2.3 1600s-1800s: Mechanical Calculators, and Models of Thought as Calculation

Serious progress toward AI began in the 1600s with the development of mechanical calculators. This task was difficult at that point in history, mostly because of mechanical constraints imposed by the crudity of the parts available. In the modern era we take for granted the availability of multiple machine parts of the same size and shape, due to the existence of mass production. Back when every part was hand made, every part was a little different, which made it tricky to create machines operating in a uniform way. Nevertheless, Blaise Pascal built what was apparently the world’s first digital calculator in 1642; and two decades later Leibniz created a digital calculator that could multiply and divide. The conceptually groundbreaking nature of these achievements is indicated by the fact that Pascal and Leibniz were not merely tinkerers, but also two of the great thinkers in history.

The construction of automated arithmetic calculators was part and parcel of an emerging understanding of the human mind as a kind of calculating system. Hobbes, in his Leviathan [? ], modeled the human mind as a kind of complex calculating or "reckoning" system. Leibniz went further, introducing the notion of binary logic and postulating that all human knowledge could be represented in this form, as logical combinations of a small number of primitive concepts. In 1747, Julien Offray de La Mettrie published a book titled "The Man-Machine", arguing that the human mind was basically just a big complicated mechanical apparatus – a view that shocks no one today but was quite revolutionary for its time.

These practical and conceptual achievements paved the way for the pioneering work of Charles Babbage and Ada Lovelace in the 1800s, as they designed and sought to build a fully programmable arithmetic calculator. Had they succeeded, it would have been the first artificial computer truly worthy of the name. Unfortunately they never quite got their "Analytical Engine" working, due to practical difficulties related to irregularly shaped parts and so forth. In hindsight the workability of their ideas seems almost obvious, but at the time their pursuit was judged insane by most contemporaries. Babbage did receive substantial funding from the British government, in spite of the controversial nature of his ideas. Some have argued that the improvements in precision manufacturing developed in the process of trying to get the Analytical Engine to work more than repaid that investment, even though they ultimately were not enough to enable Babbage to get the machine built.

Fig. 2.2: 1600s-1800s: Mechanical Calculators, and Models of Thought as Calculation

Fig. 2.3: A Modern, Working Version of Leibniz’s "Stepped Reckoner" Automated Calculator

2.4 Turn of the 20th Century: Maturation of the View of Human and Artificial Thought as Complex Mechanical Operations

As the 20th century approached and unfolded, mathematics and science gradually advanced, allowing the ideas held by previous visionaries like Leibniz and Babbage to become more concrete and more pragmatically manifested. George Boole’s landmark "The Laws of Thought" pursued the binary logic theme outlined earlier by Leibniz, but argued much more explicitly and systematically for the reduction of human cognitive operations to binary operations of the sort we now call "Boolean logic." Theorists like Samuel Butler began exploring concepts verging on the modern notion of the Singularity, realizing that once mechanical thought had been achieved, there was no reason its intelligence would be limited to the level humans happen to have achieved at this point in history.

In the early 1910s, Leonardo Torres y Quevedo built a chess-playing robot. It only played certain sorts of endgames, but it played them well, and it manipulated the pieces just like a human player. A few years earlier, Camillo Golgi and Santiago Ramon y Cajal had shared a Nobel Prize for their work on the structure of the nervous system, which revealed the brain to be made of neurons connected in a complex network. The philosophy of "mind as mechanism" was now bolstered by a concrete biological theory of the brain as a network of interconnected cells analogous to a network of interconnected electrical wires. The harmony of physics and psychology was very much in the air. Gestalt psychologists viewed the mind in terms of forces analogous to physical forces; and Sigmund Freud presented an understanding of the human mind based on analogies to equilibrium thermodynamics. Philosophically and conceptually, many of the leading thinkers of the time were primed for AI and robotics – they merely lacked the technical ideas and practical tools to make these visions a reality.

Fig. 2.4: A Modern, Working Version of Babbage’s Analytical Engine, Constructed According to the Original Design

Fig. 2.5: Torres y Quevedo’s Chess Endgame Robot

Fig. 2.6: Turn of the 20th Century: Maturation of the View of Human and Artificial Thought as Complex Mechanical Operations

2.5 Mid 20th Century: The Birth of Electronic Computing and Modern AI Technology

Before the middle of the 20th century, humanity was slowly moving in the direction of artificial intelligence and robotics. The conceptual understanding of mind as mechanism had gradually emerged and become fairly mature, and machinery was generally getting more and more sophisticated, but there was no particularly clear path to building machines doing things as complex as human thought. And then digital computers were invented, and everything was different.

In 1941, Konrad Zuse built the first program-controlled digital computer, and in the decades following that, practical and theoretical progress toward artificial intelligence sped up dramatically. The analogy between electrical circuits and neuronal circuits became more than an analogy, with McCulloch and Pitts’ classic 1943 paper "A Logical Calculus of the Ideas Immanent in Nervous Activity." They showed that one could model the activity of the brain’s neuronal network using Boolean logic. They invented the earliest version of what we’d now call a formal neural network, and showed that it made basic sense to model the brain as a kind of computer. No longer was "mind as mechanism" a merely philosophical point – we now had a concrete theory regarding what kind of machine the brain-mind was. It was, perhaps, a logical machine implemented as a neuronal network. Of course, nobody thought this simple initial neural net model was really an accurate model of the brain. But it seemed to identify the class of mechanical system that was needed to model the human mind, and that was a huge step forward.
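To make the core idea concrete, here is a minimal Python sketch of the kind of binary threshold unit McCulloch and Pitts proposed. It is purely illustrative and simplified relative to their paper (for instance, inhibition is handled here with a negative weight rather than their absolute inhibitory inputs), and the particular weights and thresholds are chosen only for the example:

# A minimal McCulloch-Pitts-style threshold neuron (illustrative sketch).
# Inputs and outputs are binary; the unit "fires" (outputs 1) when the
# weighted sum of its inputs reaches its threshold.

def mp_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of binary inputs meets the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

# Boolean AND and OR as single threshold units.
def AND(x1, x2):
    return mp_neuron([x1, x2], weights=[1, 1], threshold=2)

def OR(x1, x2):
    return mp_neuron([x1, x2], weights=[1, 1], threshold=1)

# NOT via an inhibitory (negative) weight.
def NOT(x):
    return mp_neuron([x], weights=[-1], threshold=0)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
    print("NOT 0:", NOT(0), "NOT 1:", NOT(1))

Feedforward networks of such units can compute any Boolean function, which is essentially the observation that turned "brain as logical machine" from a metaphor into a concrete hypothesis.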

Norbert Wiener’s book Cybernetics, spanning engineering, mathematics, philosophy and other disciplines, laid the foundations for a common understanding of biological and engineered intelligent systems. Among other things it outlined an approach to embodied general intelligence via what we would now call intelligent robotics. It certainly didn’t give a detailed roadmap for building a thinking machine. But it laid out a conceptual framework within which the creation of thinking machines was "merely" a matter of solving a series of difficult technical problems, rather than something vague and nebulous. A brain-mind was envisioned as a complex network of nonlinear components, sending signals amongst each other in a manner that caused the organism to achieve its goals and receive positive feedback. The question of brain-mind design came down to building the right network architecture, and having the right equations for adaptation of the elements and their interconnections. We take this way of thinking about engineering intelligence for granted now, but it’s not something that’s intrinsically obvious – it’s a perspective that was created in the middle of the last century, as part of the same intellectual revolution that led to the development of computer technology.

A host of other innovations came along at the same time. Andrew Donald Booth started designing the first machine translation systems. Claude Shannon analyzed chess as a complex search problem, paving the way for modern game-playing AI systems, as well as for the general conceptual perspective of much modern AI, in which nearly every problem can somehow be reduced to a search problem over some appropriately defined search space (a perspective illustrated in the sketch at the end of this section). Arthur Samuel implemented a series of checkers programs, including some that learned to play checkers via experience. Vannevar Bush’s essay "As We May Think" postulated future AI programs that would help intelligently resolve human queries by reference to a massive store of knowledge, similar to today’s Web. Alan Turing foresaw that, no matter how smart AI programs eventually got, some people would refuse to acknowledge their intelligence as genuine – and so he proposed what is now called the Turing Test, according to which we should consider an AI to have human-level intelligence if it can impersonate a human successfully in a textual conversation.

And remember, all this occurred before the identification of Artificial Intelligence as a specific scientific discipline. At this stage, computer science, AI, neuroscience and electrical engineering were somewhat freely mixed together, in accordance with Wiener’s vision of an interdisciplinary cybernetics. In some ways, the thinking common in this period was more sophisticated than what occurred in the mainstream of AI a couple of decades later, after AI emerged as a discipline in its own right. The emergence of AI as a distinct discipline was positive in many ways – e.g. it brought focus to some key problems that weren’t being addressed before. But it also had negative consequences, in that it stifled consideration of important cross-disciplinary synergies between software and hardware, mind and body, individual and society. Many of these synergies resurged in the 1980s and 1990s via new perspectives like cognitive science and embodied AI, which instantiated aspects of the spirit of cybernetics in more modern and sophisticated form.
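As promised above, here is a minimal Python illustration of the "reduce it to search" perspective that Shannon helped seed. The game and its rules are invented purely for the example – a pile of tokens from which players alternately remove one, two or three, with the taker of the last token winning – but the minimax recursion is the same idea Shannon proposed applying, with heuristic evaluation, to the vastly larger search space of chess:

# Minimax search over a toy game, illustrating the "reduce it to search"
# perspective (illustrative only; Shannon's own proposal concerned chess
# positions scored by an evaluation function).
#
# Game: a pile of tokens; players alternately remove 1, 2 or 3 tokens;
# whoever takes the last token wins.

from functools import lru_cache

@lru_cache(maxsize=None)
def best_value(tokens_left):
    """Return +1 if the player to move can force a win, -1 otherwise."""
    if tokens_left == 0:
        return -1  # the previous player took the last token, so the mover has lost
    # The mover wins if some move leads to a position where the opponent loses.
    return max(-best_value(tokens_left - take)
               for take in (1, 2, 3) if take <= tokens_left)

def best_move(tokens_left):
    """Return a move (1, 2 or 3) achieving the minimax value."""
    moves = [t for t in (1, 2, 3) if t <= tokens_left]
    return max(moves, key=lambda t: -best_value(tokens_left - t))

if __name__ == "__main__":
    for n in range(1, 11):
        print(f"pile={n:2d}  value={best_value(n):+d}  best move={best_move(n)}")

Running it shows that positions where the pile size is a multiple of four are losses for the player to move – a fact the search discovers without being told any game-specific strategy.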

Fig. 2.7: The Mark I computer, on which early chess and checkers programs were implemented in the early 1950s

Fig. 2.8: Mid 20th Century: The Birth of Electronic Computing and Modern AI Technology

2.6 Late 1950s - Early 1970s: Emergence and Flourishing of AI as a Discipline

In the classic narrative of the history of AI, the field began in 1956 at a meeting at Dartmouth College, organized by Marvin Minsky, John McCarthy, Claude Shannon and IBM researcher Nathaniel Rochester. Other luminaries such as Ray Solomonoff, who later articulated the first rigorous mathematical theory of general intelligence, were also in attendance. This summer workshop was where the term "artificial intelligence" was coined, and it was important as part of the process of establishing AI as an independent discipline. MIT (where Minsky would spend most of his career) and IBM were central to the meeting, and these two institutions have played a key role in the development of AI ever since.

Valuable as this foundational meeting was, however, one should not exaggerate its importance. The quest for AI was already rolling in a fairly mature form by that point, thanks to cybernetics and related ideas. In 1959, Stanford researchers Widrow and Hoff developed the ADALINE and MADALINE neural network algorithms, which owed a lot to cybernetics and nothing to the Dartmouth meeting, and which inspired a host of future developments in neural net based AI. Stanford has also continued to play a major role in AI to this day. Chomsky’s classic work Syntactic Structures appeared in 1957, presenting an incisive analysis of natural language syntax in terms of mathematical formal grammars – laying the conceptual foundations for computational linguistics as well as modern theoretical linguistics. Chomsky still works at MIT today.

While his approach ultimately failed to capture natural language even well enough for narrow AI purposes, it turned out to be surprisingly useful as a foundation for analyzing programming languages.

While a host of innovations of all sorts were emerging during this period, a common thread to most of the "mainstream" AI work at MIT, Stanford and other leading US and UK institutions was "symbolic AI." Rather than continuous-variable systems like most formal neural net models, the trend of the day was systems that modeled knowledge and thinking using manipulation of discrete symbols, as in logic systems or formal languages. The first explorations of this approach were extremely fruitful, and for a while the innovations and new achievements just kept coming. Software like Raphael’s SIR system extended Chomsky’s thinking and formally represented semantic as well as syntactic structures, paving the way for the creation of automated natural language question-answering systems. James Slagle’s SAINT system solved freshman calculus problems as early as 1962. Weizenbaum’s ELIZA system, based on a set of simple IF-THEN rules, provided a crude form of psychotherapy and convinced a number of naive humans that it was a real human therapist.

Robotics was also advancing dramatically. Industrial robotics got its start with the founding of Unimation in 1962. Stanford’s Shakey the Robot was doing simple navigation, perception and problem-solving in 1969. The University of Edinburgh’s Freddy robot was locating objects and building models with them just a few years later.

The AI field concretized itself into an international community in 1969, with the establishment of the first International Joint Conference on Artificial Intelligence (IJCAI). These conferences have been held in alternating years ever since. Buoyed by all these successes, in 1970 leading MIT AI researcher Marvin Minsky told Life Magazine that "In from three to eight years we will have a machine with the general intelligence of an average human being." This was not an isolated quote; many of his colleagues felt the same way. This was the beginning of a long tradition of AI researchers placing the achievement of human-level intelligence 5 or 10 years in the future.

In hindsight, it is clear that the AI researchers of this generation were making a very important discovery, but not the one they thought they were making. What they were discovering was, among other things, that it is possible to achieve some of the amazing feats that humans do via advanced general intelligence, using a bunch of fairly simple algorithmic tricks that effectively leverage the particular strengths of digital computers. While seeking to follow a path toward human-level general intelligence, instead they discovered the previously unanticipated existence of Narrow AI. But it took them a while to understand what they had discovered. It was hard for them to escape the intuition that, once their software had mastered chess and calculus, trivial things like conversation and upright locomotion couldn’t be far behind.

Actually, though, the conceptual framework needed to understand why their Narrow AI successes didn’t translate into human-level general intelligence was already ready at hand. Wiener’s cybernetics framework clearly explained how intelligence depends on complex feedback loops, not only within the brain-mind, but between the brain and body, and between the individual and the environment and society.
Cybernetics recognized that general intelligence is largely about adaptation to context. The narrow AI achievements of the mainstream AI crowd of the late 1950s through early 1970s, on the other hand, were all about isolating specific tasks from their contexts, to make them easier for AI programs to solve. Problems were simplified by removing the noisy, complex feedback loops that characterize most problems in the real world. This was an interesting and productive direction for creating practical Narrow AI systems, but it resulted in the construction of systems very different in nature from the complex, contextually-coupled control systems via which biological systems achieve general intelligence.

By the mid 1970s the limitations of the prevailing AI paradigms were becoming clear. The gap between ELIZA and real human conversation was obvious, no matter how easy it was to fool naive observers that ELIZA was human. The difficulty of extending AI capabilities from delimited domains like chess, checkers or blocks worlds in the AI lab to real-world systems was becoming glaringly apparent. In the UK, the 1973 Lighthill Report gave a devastatingly negative evaluation of the future promise of AI research, which resulted in the elimination of nearly all AI research funding in that nation.

Finally, lest neural network approaches be viewed as a viable alternative to the mainstream approaches based on symbolic programming, Minsky and Papert wrote a book called Perceptrons, which pointed out severe limitations of some of the neural network systems of that time, without exploring more sophisticated neural network systems that surpass these limitations. Modern "deep learning" neural net systems bypass Minsky and Papert’s arguments, but Minsky and Papert did not consider this sort of system, because the mathematics of such systems was not well understood at the time, and the available hardware was inadequate to explore them empirically.
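The technical point at the heart of Perceptrons is easy to reproduce. The Python sketch below is not taken from the book – the brute-force search over a small weight grid simply stands in for Minsky and Papert's geometric argument, and the hidden-unit wiring is chosen for illustration – but it shows that no single linear threshold unit reproduces XOR, while a hand-wired two-layer network of the same kind of units does:

# The limitation Minsky and Papert emphasized, in miniature: no single
# linear threshold unit computes XOR, but a two-layer network of such
# units does. (Illustrative sketch.)

import itertools

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def threshold_unit(x1, x2, w1, w2, bias):
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

# 1. Search for a single unit computing XOR (none will be found; this is
#    provable in general, the grid search just makes the point concretely).
grid = [i * 0.5 for i in range(-6, 7)]
single_layer_found = any(
    all(threshold_unit(x1, x2, w1, w2, b) == y for (x1, x2), y in XOR.items())
    for w1, w2, b in itertools.product(grid, repeat=3)
)
print("single threshold unit can fit XOR on this grid:", single_layer_found)

# 2. A hand-wired two-layer network: XOR(x1, x2) = OR(x1, x2) AND NOT AND(x1, x2).
def two_layer_xor(x1, x2):
    h_or = threshold_unit(x1, x2, 1, 1, -0.5)    # fires if at least one input is 1
    h_and = threshold_unit(x1, x2, 1, 1, -1.5)   # fires only if both inputs are 1
    return threshold_unit(h_or, h_and, 1, -2, -0.5)  # OR but not AND

print("two-layer network matches XOR:",
      all(two_layer_xor(x1, x2) == y for (x1, x2), y in XOR.items()))

It is exactly this kind of multi-layer construction – made trainable by later learning algorithms – that the deep learning line of work builds on.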
Perceptrons both symbolized and reinforced a split that was arising in the AI community at that time – between those working on purely symbolic approaches to AI, based on formal logic, expert rules and so forth, and those working on more brain-based approaches such as neural networks. As the field evolved over the next decades, a variety of new AI approaches such as genetic algorithms, fuzzy systems, ant systems and so forth emerged and were culturally grouped with neural networks, due to their eschewing explicit symbolic knowledge representation, and their focus on learning via experience or training. Only in the late 1990s and early 2000s did the field finally begin to overcome this dichotomy in a thoroughgoing way, with the emergence of hybrid neural-symbolic systems, the creation of neural networks giving rise to symbolic reasoning, and the realization that symbolic probabilistic logic networks and formal neural networks are really not such dramatically different entities. To some extent, it’s fair to say that the dichotomy between the symbolic and neural/subsymbolic approaches was exaggerated by the dynamic of early AI researchers seeking to stake their claims and emphasize the distinctness and superiority of their approach relative to others. However, it’s also the case that the similarities and relationships between symbolic and subsymbolic approaches are scientifically and mathematically much clearer now than in the early 1970s, due to all the research progress made in the intervening decades.

The symbolic/subsymbolic dichotomy was loosely connected with another dichotomy often drawn in the 1970s AI community: neats versus scruffies. Roughly speaking, "neats consider that solutions should be elegant, clear and provably correct. Scruffies believe that intelligence is too complicated (or computationally intractable) to be solved with the sorts of homogeneous system such neat requirements usually mandate." [? ] Most subsymbolic AI research was considered to fall into the "scruffy" category, although this is not necessarily the case – there can be elegant math about neural nets and other subsymbolic learning systems as well. Roger Schank, who originated the terminology, considered Marvin Minsky’s AI work from the 1960s to fall into the "scruffy" category, even though it was largely symbolic in nature. Minsky’s work looked scruffy to Schank because it was based on opportunistic procedure learning rather than elegant theorem-proving, and because Minsky and his students tended to hack their programs until they had made them work, rather than proceeding more rigorously based on theory. Just like the symbolic vs. subsymbolic dichotomy, the neat vs. scruffy dichotomy appears to make little sense in the light of modern AI and cognitive science – and it’s not clear how much sense either of these distinctions ever made. In hindsight, these crude distinctions appear as much political/cultural as scientific. Some current researchers view them as merely "squabbling", characteristic of a field experiencing troubled times.

By the mid 1970s, the AI research funding drought commonly known as the "First AI Winter" had come.

2.7 Mid 1970s - early 80s: Having Failed to Progress as Fast as Hoped, AI Experiences a Funding Winter, but also a Host of New Ideas

The period from the mid 1970s through the mid 1980s is frequently viewed as a sort of doldrums of AI progress. The excitement of the AI field’s early years had faded, and nothing new had yet emerged to take its place. The field was relatively devoid of research funding, and had no clear ruling ideas. This was, however, a time of interesting explorations.

John Holland published his book Adaptation in Natural and Artificial Systems, creating the field of Evolutionary Computing, in which intelligent behaviors or problem solutions are created via imitation of the process of natural selection. Neural networks expanded beyond the simplistic two-layer networks Minsky and Papert had demolished in Perceptrons, and researchers developed learning algorithms for complex multilayered networks.

Marvin Minsky crystallized earlier ideas about knowledge representation in a classic paper on "frames", which set the subfield of symbolic AI in a new direction. In essence, frames were an attempt to solve the problem of contextuality in a purely symbolic AI setting. Rather than positing rules that had to be true universally, one posited rules that were relative to a particular frame, denoting a particular situation or context. This was a more flexible sort of formalism than the ones used by previous generations of researchers, and seemed to show some promise of getting past the limitations that symbolic AI had run into in the late 60s and early 70s. So far the promise of the frame-based approach has still not been fulfilled, but it generated wide enthusiasm at the time, and led to later interesting developments like Minsky’s "Society of Mind" approach and Jackendoff’s Conceptual Semantics. Doug Lenat, meanwhile, was attempting to extend the symbolic AI framework in the direction of learning, with his pioneering AM and EURISKO systems. The latter was especially innovative, with its diverse pool of heuristics, including heuristics for learning new heuristics. It was oriented toward discovering new things, and made some interesting engineering inventions as well as playing some strategy games very well.

Autonomous humanoid robotics developed very slowly, but dramatic progress was made with wheeled vehicles. Hans Moravec developed the Stanford Cart, which was arguably the first autonomous vehicle, operating in a robot lab context. In Germany, Ernst Dickmanns created self-driving cars capable of fully autonomous high-speed driving down empty highways.

As part of its broad-based searching process, the AI field began re-establishing some of the cross-disciplinary connections that had been critical in the cybernetics period. The cross-disciplinary research field of Cognitive Science emerged, as symbolized by the founding of the Cognitive Science Society and its associated journal in 1979. These days Cognitive Science is frequently considered a subfield of psychology, a variant of cognitive psychology that draws inspiration from AI, neuroscience, linguistics and other areas. But the original ambition of the field was much broader, and is still realized today in some cognitive science departments or programs. The idea was to make a truly interdisciplinary science of cognition, drawing on knowledge about the human mind and brain as needed, and using computer algorithms to test hypotheses and generate new ideas. The goals of building thinking machines and understanding human minds were to be pursued in tandem.
The emergence of cognitive science as a discipline inspired a huge amount of interesting research and thinking, bringing back some of the cross-disciplinary creativity of the days of Wiener, Shannon, Turing, McCulloch and Pitts. At this stage in the development of the AI field, we had a number of different emerging paradigms:

• the symbolic approach, which was attempting to embrace context and complexity via frames and other novel formalisms, and via learning heuristics like those in EURISKO
• the neural net based approach, which was starting to embrace multilayer nets and deep learning
• evolutionary learning, with a focus on learning complex structures via data analysis or interactive experience (a minimal example of this paradigm is sketched at the end of this section)
• cognitive science, which could be used, for example, to design AI cognitive architectures based on analogy to the human mind

All of these approaches seemed exciting to significant swaths of researchers, and nobody was terribly clear on what the limitations of any of these approaches might be. It was clear that there was a lot to explore. From a modern integrative AI perspective, one might say that at this stage there were multiple, largely disjoint subsets of researchers, each in effect exploring a different aspect of human intelligence. The available hardware was insufficient to explore any of these aspects very thoroughly. And the available funding was depressingly small. But still, a lot was being learned.
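As promised above, here is a minimal genetic algorithm sketch in Python, in the spirit of Holland's evolutionary computing. It is purely illustrative: the bit-string target, the fitness function and all parameters are invented for the example. A population of candidate bit strings is improved over generations by selection, crossover and mutation:

# A minimal genetic algorithm in the spirit of Holland's evolutionary
# computing (illustrative sketch; target, fitness and parameters are
# invented for the example).

import random

random.seed(0)

TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
POP_SIZE, GENERATIONS, MUTATION_RATE = 40, 60, 0.02

def fitness(genome):
    """Number of bits matching the target pattern."""
    return sum(1 for g, t in zip(genome, TARGET) if g == t)

def crossover(a, b):
    """Single-point crossover of two parent genomes."""
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome):
    """Flip each bit with a small probability."""
    return [1 - g if random.random() < MUTATION_RATE else g for g in genome]

def select(population):
    """Tournament selection: pick the fitter of two random individuals."""
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

population = [[random.randint(0, 1) for _ in TARGET] for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]
    best = max(population, key=fitness)
    if fitness(best) == len(TARGET):
        print(f"target matched at generation {gen}: {best}")
        break
else:
    print("best after", GENERATIONS, "generations:", max(population, key=fitness))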

2.8 Mid 1980s - early 90s: AI Funding for Expert Systems Rises and Falls; Connectionist, Probabilistic and Subsumption Approaches Surge

What ended the First AI Winter was, in hindsight, somewhat of an illusion. Symbolic AI systems were steadily advancing throughout the Winter, leading among other things to the development of "expert systems" that made sophisticated logic-based judgments in specific domains, based on knowledge rules hand-coded by human experts. These systems gave an effective illusion of expertise, and in some cases actually made useful judgments.

The Expert System to End All Expert Systems, Cyc, was initiated in 1984. This was a massive logic-based AI system, intended originally to emulate the commonsense understanding of a 10-year-old child. It was based on a massive store of knowledge, expressed in predicate logic and encoded by a team of dozens of expert humans. The project is still ongoing, and in spite of a fairly massive expenditure of human and financial resources, has not yielded any dramatic results yet. The Cyc system has gained a natural language user interface and acquired a lot of specialist knowledge (e.g. about terrorism and network intrusion) along with childlike commonsense knowledge. But its ability at everyday understanding is still very weak. It seems that even millions of hand-coded logic rules barely scratch the surface of the amount of knowledge needed to reason like a child. Of course, a child does not have to gain its knowledge from explicitly encoded logic rules – it learns via experience. In the mid 2000s, Cyc employee Stephen Reed designed a system called CognitiveCyc, intended to interface Cyc with a robot to enable it to learn from experience as well. But the Cyc leadership opted not to pursue this direction, and Reed split off from Cyc to create his own AI effort called Texai, using the Cyc knowledge base with a different reasoning and learning infrastructure aimed at experiential learning.

SOAR, developed initially in 1983, has arguably been more successful than Cyc. It is also based on a database of hand-coded expert rules, but has generally been applied in narrow contexts where only a small number of rules is required. Unlike Cyc, SOAR is based on a "cognitive architecture" intended to model key aspects of human cognition. SOAR remains the classic case of a cognitive-architecture-centered AI. Whereas Cyc is focused on its knowledge base and reasoning engine, and neural net and evolutionary algorithms are focused on their learning prowess, SOAR is focused on its architecture – the way it’s divided into parts, and the way information moves between these parts. Heavily influenced by cognitive science, the philosophy is that if one builds a system with the right architecture, the other aspects of an AI system can be flexibly experimented with until they work right. Get the architecture right first, then worry about the learning algorithms and the knowledge base. SOAR has been used to do some interesting things, like simulate fighter pilots, and emulate human behavior in a host of laboratory psychology experiments. But even in its much expanded 2012 form, its architecture covers only a small percentage of human intelligence, leaving out key aspects like language, experiential learning, and the feedback loops between perception, action and cognition. The leader of the SOAR research group, John Laird, is well aware of these lacunae, and plans to add them in over time, methodically, based on the results of systematic experimentation.
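The basic mechanism underlying such rule-based systems can be conveyed with a toy forward-chaining inference loop. The Python sketch below is purely illustrative – the rules and facts are invented, and real systems such as Cyc and SOAR use far richer rule languages, enormous knowledge bases and much more elaborate control strategies – but it shows the core cycle of matching hand-coded IF-THEN rules against known facts and asserting their conclusions:

# A toy forward-chaining rule engine, conveying the flavor of the
# hand-coded expert systems of this era (illustrative sketch only;
# the rules and facts are invented).

RULES = [
    # (premises, conclusion)
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu", "short_of_breath"}, "refer_to_doctor"),
    ({"has_rash"}, "possible_allergy"),
]

def forward_chain(facts, rules):
    """Apply rules whose premises are satisfied until nothing new is derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

if __name__ == "__main__":
    observed = {"has_fever", "has_cough", "short_of_breath"}
    print(sorted(forward_chain(observed, RULES)))
    # -> ['has_cough', 'has_fever', 'possible_flu', 'refer_to_doctor', 'short_of_breath']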
Japan launched perhaps the most ambitious foray into the rule-based AI space, with its Fifth Generation AI initiative. Encompassing special AI hardware and software, and based heavily on the Prolog language, this massive government/industry collaboration utterly failed to achieve its goals, and turned the Japanese research funding establishment away from AI for some time after that. Fortunately the stigma attached to AI was not extended to robotics, and Japan went on to contribute dramatically in the latter area!

Around 1987, the short-lived excitement that had grown surrounding expert-system approaches faded. In a partial replay of the First AI Winter of the mid 1970s, funding for AI research somewhat dried up for a while. But the Winter didn’t last as long this time. By the early 1990s, variants of the expert system approach were actually paying off in some areas. And more excitingly, a host of new ideas was emerging, pushing the AI field in different directions.

Probably the biggest success of the rule-based expert system approach to AI was in the area of planning and scheduling. This is not such a sexy AI application, compared to chess-playing or mobile robotics. But it’s important, because improving the planning and scheduling of complex operations can save a complex organization a lot of money, by letting it do its work a lot more efficiently. And complex planning and scheduling is not the sort of thing the human mind does particularly well. The first Gulf War (Operation Desert Storm), in 1991, was the first situation in which the US military offloaded a large portion of its planning and scheduling to AI algorithms, and it saved a lot of money in this way. It’s been calculated that the cost savings due to using AI in this particular war more than compensated for all the US government’s expenditures on AI up till that point. Of course, these days AI algorithms are used far more widely for planning and scheduling in industry as well as the military. Like many other practical AI applications, real-world, domain-specialized planners are often not even considered AI. But they came directly out of academic AI research, and they carry out operations that appear to require significant intelligence when humans do them. These planning algorithms can’t do all the same kinds of planning that humans can – they’re not so good in highly dynamic, unpredictable environments. They can’t learn about one sort of environment, and then transfer this learning to other similar environments. But they serve very useful functions.

And while all this expert system work was dominating the scene, various other advances were occurring as well. Physicist John Hopfield discovered a new class of neural networks, which stored memories associatively using dynamical attractors. Danny Hillis designed a new, massively parallel computer called the Connection Machine, intended to allow implementation of AI algorithms that did a massive number of things at the same time, similar to how all the neurons in the brain are active at once. He founded a fantastic company, Thinking Machines Inc., and filled it up with AI and other advanced researchers, using his hardware to explore new algorithms and ideas, figuring that the solution to human-level general intelligence (as well as other problems) would eventually emerge in this way. Judea Pearl’s classic book on Bayesian networks came out in 1988, and gradually nudged a large percentage of the AI field toward probabilistic methods – a trend that continues to this day.
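To give a concrete feel for the kind of model Pearl's book popularized, here is a tiny Bayesian network with inference done by brute-force enumeration. This is an illustrative Python sketch only: the rain/sprinkler/wet-grass structure is standard textbook fare and the probabilities are illustrative numbers, not taken from any real system.

# A tiny Bayesian network of the sort Pearl's 1988 book popularized,
# with inference by brute-force enumeration (illustrative sketch).

from itertools import product

P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {  # P(Sprinkler | Rain)
    True:  {True: 0.01, False: 0.99},
    False: {True: 0.40, False: 0.60},
}
P_WET = {  # P(GrassWet | Sprinkler, Rain)
    (True, True):   {True: 0.99, False: 0.01},
    (True, False):  {True: 0.90, False: 0.10},
    (False, True):  {True: 0.80, False: 0.20},
    (False, False): {True: 0.00, False: 1.00},
}

def joint(rain, sprinkler, wet):
    """Joint probability from the network's factorization."""
    return P_RAIN[rain] * P_SPRINKLER[rain][sprinkler] * P_WET[(sprinkler, rain)][wet]

def prob_rain_given_wet():
    """P(Rain = true | GrassWet = true), summing out Sprinkler."""
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence

if __name__ == "__main__":
    print(f"P(rain | wet grass) = {prob_rain_given_wet():.3f}")

Real systems use far more efficient inference algorithms than enumeration, of course; making this kind of probabilistic inference reasonably scalable was a large part of Pearl's contribution.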
Much of the applied AI work on Big Data that companies like Google, LinkedIn and Amazon do is based on Bayesian methods descending from the ones Pearl summarized in his book. Incorporating Bayesian methods but going far beyond them, "machine learning" was gradually emerging as a distinct and highly successful subfield of AI, based on applying probabilistic, neural network, evolutionary and other mathematical methods to recognize patterns in datasets. Unlike most prior AI paradigms, machine learning was immediately extremely useful in a variety of domains. Essentially, almost anywhere that statistical methods were being used to analyze large bodies of data, machine learning methods could be applied to the same data and found to yield more insight. This was a limited form of artificial intelligence – identifying patterns in data – but an increasingly useful one, as computer technology enabled the collection of more and more data.

In robotics, Rodney Brooks created broad interest in the late 1980s with his proposal of "subsumption robotics" as an alternative to the classic robotics architectures. In classic robotics, one supplies a robot with an explicit world-model, and then programs it to reason about how to achieve its goals in the context of its world-model. In the subsumption approach, one eschews centralized modeling and control, and simply hooks together a bunch of parts, each of which has some capability to respond to stimuli and to adapt in reaction to the world and the other parts. This approach lets one build interesting, fairly capable robot bugs rather quickly. At the time, many researchers felt it might also fairly rapidly enable the creation of more sophisticated robot organisms. Advanced cognition, it was speculated, might emerge automatically from the synergetic activity of various inter-adapting robot parts in response to a complex environment.

Once again, different research groups were exploring different, largely disjoint aspects of human intelligence. While the Cyc team was understanding the structure of abstract human knowledge, the SOAR community was understanding how information flows between parts of the human mind during execution of simple non-learning-based tasks. Hopfield was understanding how the brain stores multiple different memories using the same set of neurons, and how this sort of distributed associative memory leads to automatic pattern completion and inductive learning. Hillis was learning how simple algorithms, massively parallelized, can lead to complex emergent behaviors and structures. Pearl was teaching the world how to make some simple but useful variants of probabilistic inference reasonably scalable. Brooks was understanding action learning, by making robot bugs that learn to adapt to their environments. And none of these research directions was powered by sufficient hardware to really explore its ideas empirically in a dramatic way.

It was also in the late 1980s that the concept of "AI-complete" or "AI-hard" problems emerged [? ] (as a rough conceptual analogue to NP-complete and NP-hard problems in complexity theory). The concept here is that if a problem is AI-complete, then it can only be solved via approaches that are sufficiently sophisticated to produce human-level general intelligence. AI-complete problems are not susceptible to specialized tricks.
Not even today, let alone in the 1980s, is there a sufficiently rigorous theory of general intelligence to allow someone to prove that a certain problem is AI-complete. Rather, this is a matter of conjecture. The history of AI is, in large part, a story of gradually proving some problems that looked AI-complete to actually be solvable using relatively simple specialized methods ... and discovering that other problems, which initially seemed likely to be susceptible to specialized methods, might actually be AI-complete. For instance, many AI researchers in the early 1960s assumed that solving calculus problems would be AI-complete, whereas bipedal locomotion across rough terrain and navigation in dynamic environments would not be. But now solving calculus problems is known not to be AI-complete (Mathematica and Maple can do it better than nearly all educated humans), whereas the jury is still out regarding flexible locomotion and navigation. Not all researchers, in the 1980s or now, accept that the notion of "AI-complete" is meaningful – some hold the philosophy that human general intelligence is mostly a bag of specialized tricks, so that implementing special tricks for one problem after the other is a viable approach to AGI. However, as we will review in the following chapter, one of the core characteristics of the contemporary AGI research community is a rejection of this perspective, and an assertion that general intelligence does indeed involve particular structures and dynamics different from those needed for any specialized task.

2.9 Fueled by Powerful Computers and Big Data, Narrow AI Technology Finds Broad Practical Applications

By the mid-1990s, we understood a lot about how to build various kinds of AI systems, and about how various aspects of intelligence work. However, the quest to make thinking machines with human-level general intelligence was at one of its low ebbs in terms of popularity. Having seen one round of overoptimism in the early founders of the AI field in the late 1960s, and another in the expert system builders of the early 1980s, the general feeling of the world, and of the AI research and funding communities, was that human-level AI was a very long way off, and researchers would do better to focus on nearer-term, more practical achievements.

The AI field was full of creative ideas and approaches, which were ever easier to explore as computer hardware got better and better. And then the Internet happened, providing massive amounts of data to feed into AI systems. More and more of the human world became electronic, rendering it susceptible to sensing and control by AI systems. These days, it’s hard to find an area of industry where AI technology doesn’t do something important. This wasn’t the case in 1990; it was the case in 2005.

As the new millennium came on, AI became increasingly pervasive, via integration into all sorts of other software systems. Only a small percentage of these systems involved presenting AI directly to the end user, though, which meant that the large role of AI in the world remained relatively obscure. A high net worth individual doesn’t really care if the hedge fund he invests in uses AI algorithms to trade his money. PayPal users appreciate the service’s effective fraud detection, unaware that it relies on machine learning technology. Soldiers appreciate getting the supplies they need, indifferent to the fact that those supplies were routed to them according to sophisticated probabilistic AI planning algorithms. The neural nets inside a car’s on-board chips help provide diagnostic signals to the car mechanic, who has no idea what a neural net is. There are a few domains, like game AI, where contemporary AI is right there in the user’s face, but these are the minority.

Given how dramatically successful narrow AI was during this period, it’s hardly surprising that the quest for advanced artificial general intelligence largely fell by the wayside. In a sense, AI was now embracing contextuality – but in a different way than is needed for the development of human-like intelligent systems. Instead of creating AI systems capable of understanding context, people were creating AI systems that were very well adapted – by their human designers – to specific contexts. In most AI application projects, loosely speaking, 90% of the work goes into the application and 10% into the AI. But still, this 10% can yield valuable improvements.

One thing that happened as a consequence of the wide deployment of AI algorithms was that we learned a lot about how to make these algorithms more robust, predictable and scalable in their behavior. We now know how to apply machine learning algorithms on Google-scale data stores, which is useful not only for large-scale text-mining, but also for analysis of the large mass of perceptual data coming through the sensors of a robot. The probabilistic robotics technology used to make self-driving cars work can also be used – with modifications and extensions – for more exploratory, generally intelligent sorts of mobile robots. And so forth.

2.9.1 Dramatic Progress in Neuroscience, with Limited Implications for AI

During this same time period, neuroscience was experiencing a dramatic explosion of successful research. As in many other areas of biology (e.g. genomics), this was driven largely by the emergence of new experimental tools. Brain imaging tools like fMRI and PET scanning enabled neuroscientists to take pictures reflecting the relative activity in different parts of the brain during different sorts of activities. The ability to measure dynamics in the brain using invasive electrodes and external EEG/MEG devices also improved dramatically. The result was a much better understanding of the large-scale structure of the brain – although not, yet, any compelling or accepted general theory of how the brain’s low-level neuronal dynamics coordinates with this large-scale structure to give rise to thought and experience. Much of modern neuroscience has focused on perception (especially vision) and movement; the neural foundations of language and abstract reasoning remain much vaguer.

All in all, in spite of the presence of formal neural network theory as a key AI paradigm, the two fields of neuroscience and AI have had much less to do with each other than most outsiders would suspect. Computational neuroscience is a discipline in itself, which has taken a very different direction from formal neural nets in AI. Biologically realistic neuron simulations, it turns out, are very rarely anywhere near the optimal way to carry out an intelligent function, even compared to other broadly neural-net-like systems. Of course, a strong theory of how the brain gives rise to intelligence would be extremely useful to AI researchers. But the current, highly fragmentary knowledge and insight possessed by the neuroscience field has been generally difficult for the AI field to assimilate. Cognitive psychology has been more valuable, as it gives specific suggestions regarding how to decompose a human-like mind into parts, what key dynamics arise in a human mind in various situations, and so forth.

One exception to the lack of influence of neuroscience on AI has been in the area of computer vision. Here neuroscience has painted a rather clear picture of the hierarchical pattern recognition processes going on in the visual cortex, and various AI architectures such as Jeff Hawkins’ HTM and Itamar Arel’s DeSTIN have sought to embody this same sort of process in architectures without detailed simulations of biological neurons. On the other hand, neuroscientists like Tomaso Poggio have created more brain-like simulations of vision processing, achieving functionality roughly similar to that of less brain-like hierarchical pattern recognition systems. In computer vision one can see interesting feedback between neuroscience and AI, of a sort that’s lacking in most other domains. Hawkins, Arel and some other researchers in this area believe that one can take vision as a sort of template for general intelligence, and that the same architectures they are honing for computer vision can then be extended to handle everything else the human mind does, without major architectural revisions. This is a minority view in the AI field, but if it is correct, then the influence of neuroscience on AGI will in hindsight be rated very significant!

2.10 2004-2012: While Narrow AI Tech Further Pervades Industry, a Trend back toward AGI / Human Level AI R&D Emerges

While the 1990s and early 2000s were a fantastic period for practical applications of narrow AI technology, they were also a time of such widespread negativity toward the original goals of the AI field that many researchers later referred to the period as an "AGI Winter", analogous to the AI Winters of old. Fortunately, sometime around 2004-2007, the AGI Winter began to thaw. The successes of narrow AI became dramatic and exciting enough, the successes of neuroscience sufficiently evocative, the presence of amazingly powerful computer hardware sufficiently inspiring, that more and more people began to muse about what else might be possible. Narrow AI continues to display greater and greater success across numerous areas of industry. But AGI is increasingly considered a serious, plausible pursuit as well, much as it was at the dawn of the formal AI field in the middle of the previous century.

Futurist pundits like Ray Kurzweil have staked out gutsy AGI-optimist positions – Kurzweil argues that human-level AI will almost surely come about before 2030, and massively superhuman AI by 2045. Mainstream tech industry figures such as Intel CTO Justin Rattner have embraced this vision, as has a small but non-trivial fraction of the AGI research community. In the years since 2004, a somewhat broad community of researchers united by the explicit pursuit of AGI has emerged, as evidenced for instance by conference series like AGI (http://agi-conf.org), BICA (http://bicasociety.org) and Advances in Cognitive Systems (http://www.cogsys.org/), and numerous special tracks and symposia on Human-Level Intelligence (e.g. http://www.aaai.org/Press/Reports/Symposia/Fall/fs-04-01.php and http://www.ntu.edu.sg/home/epnsugan/index_files/SSCI2013/CIHLI2013.htm), Integrated Intelligence (e.g. http://www.aaai.org/Conferences/AAAI/2011/aaai11iicall.php) and related themes. An "AGI community" has emerged, consisting e.g. of the attendees at the AGI-related conferences mentioned above. This community is very much a work in progress, and is a combination of a grassroots movement among researchers and an initiative driven by a small number of AGI activists, such as the author of this chapter. The AGI community is a fuzzy set containing researchers with various interpretations of, and varying levels of commitment to, the AGI concept. But it represents a major shift from the situation in 1990, 1995 or 2000, when a coherent research community focused on the grand original goals of the AI field could barely be said to exist.

What does the current quest for AGI draw from the previous history of AI research that we have summarily reviewed here? Actually, that depends on your perspective and your approach to AGI. Three fairly common perspectives in the AGI field today, corresponding to the three views of AI history mentioned earlier, may be stated roughly as follows:

• A pessimistic view of the value of historical AI work, holding that it has mainly been digressive from the goal of AGI, since work on specialized tasks takes one in very different directions algorithmically and architecturally from work on AGI
• An optimistic view based on advances in hardware. Many AGI researchers believe that one of the historical AI paradigms – say, logical theorem-proving or deep learning or probabilistic reasoning – is essentially adequate for AGI, and just needs to be made more scalable and connected to the world in the proper way
• An optimistic view based on an integrative perspective. In this view, the prior work on narrow AI has explored and refined approaches to achieving various aspects of human-like general intelligence in software and hardware. Now general intelligence research can leverage this work by integrating together, if not previously constructed narrow AI systems, then at least newly constructed components that are conceptually inspired by previously constructed narrow AI systems.

Which of these perspectives is correct? Research inspired by each of these views is actively ongoing. Time will tell!

Fig. 2.9: Stanford’s Shakey the Robot, from the late 1960s 38 2 A Brief History of AI and AGI

Fig. 2.10: Late 1950s - Early 1970s: Emergence and Flourishing of AI as a Discipline

Fig. 2.11: Mid 1970s - early 80s: Having Failed to Progress as Fast as Hoped, AI Experiences a Funding Winter, but also a Host of New Ideas

Fig. 2.12: The CM-5 massively parallel AI computer, from Danny Hillis's Thinking Machines Corporation

Fig. 2.13: Mid 1980s - early 90s: AI Funding for Expert Systems Rises and Falls; Connectionist, Probabilistic and Subsumption Approaches Surge

Fig. 2.14: One of Google’s server farms in Council Bluffs, Iowa, which provides over 115,000 square feet of space for servers running services like (AI-based) Search and YouTube

Fig. 2.15: Brain imaging technology provides fascinating insights into the localization of various mental functions, but has provided rather little useful guidance to AI so far, outside some specialized domains like computer vision

Fig. 2.16: Fueled by Powerful Computers and Big Data, Narrow AI Technology Finds Broad Practical Applications

Fig. 2.17: Honda's Asimo humanoid robot

Fig. 2.18: 2004-2012: While Narrow AI Tech Further Pervades Industry, a Trend back toward AGI / Human Level AI R&D Emerges

Chapter 3 Artificial General Intelligence: Concept, State of the Art and Future Prospects

Ben Goertzel

Abstract In recent years a broad community of researchers has emerged, focusing on the original ambitious goals of the AI field – the creation and study of software or hardware systems with general intelligence comparable to, and ultimately perhaps greater than, that of human beings. This paper surveys this diverse community and its progress. Approaches to defining the concept of Artificial General Intelligence (AGI) are reviewed, including mathematical formalisms, engineering, and biology-inspired perspectives. The spectrum of designs for AGI systems includes systems with symbolic, emergentist, hybrid and universalist characteristics. Metrics for general intelligence are evaluated, with the conclusion that, although metrics for assessing the achievement of human-level AGI may be relatively straightforward (e.g. the Turing Test, or a robot that can graduate from elementary school or university), metrics for assessing partial progress remain more controversial and problematic.

3.1 Introduction

How can we best conceptualize and approach the original problem around which the AI field was founded: the creation of thinking machines with general intelligence comparable to, or greater than, that of human beings? The standard approach of the AI discipline [? ], as it has evolved in the six decades since the field's founding, views artificial intelligence largely in terms of the pursuit of discrete capabilities or specific practical tasks. But while this approach has yielded many interesting technologies and theoretical results, it has proved relatively unsuccessful in terms of the original central goals of the field. Ray Kurzweil [? ] has used the term "narrow AI" to refer to the creation of systems that carry out specific "intelligent" behaviors in specific contexts. For a narrow AI system, if one changes the context or the behavior specification even a little bit, some level of human reprogramming or reconfiguration is generally necessary to enable the system to retain its level of intelligence. This is quite different from natural generally intelligent systems like humans, which have a broad capability to self-adapt to changes in their goals or circumstances, performing "transfer learning" [? ] to generalize knowledge from one goal or context to others. The concept of "Artificial General Intelligence" has emerged as an antonym to "narrow AI", to refer to systems with this sort of broad generalization capability. 1 2

The AGI approach takes "general intelligence" as a fundamentally distinct property from task- or problem-specific capability, and focuses directly on understanding this property and creating systems that display it. A system need not possess infinite generality, adaptability and flexibility to count as "AGI". Informally, AGI may be thought of as aimed at bridging the gap between current AI programs, which are narrow in scope, and the types of AGI systems commonly seen in fiction – robots like R2D2, C3PO, HAL 9000, Wall-E and so forth; but also general intelligences taking non-robotic form, such as the generally intelligent chat-bots depicted in numerous science fiction novels and films. And some researchers construe AGI much more broadly than even the common science fictional interpretations of AI would suggest, interpreting it to encompass the full gamut of possible synthetic minds, including hypothetical ones far beyond human comprehension, such as uncomputable minds like AIXI [Hut05]. The precise definition or characterization of AGI is one of the subjects of study of the AGI research field. In recent years, a somewhat broad community of researchers united by the explicit pursuit of AGI has emerged, as evidenced for instance by conference series like AGI 3, BICA 4 (Biologically Inspired Cognitive Architectures) and Advances in Cognitive Systems 5, and numerous special tracks and symposia on Human-Level Intelligence 6, Integrated Intelligence 7 and related themes. The "AGI community", consisting e.g. of the attendees at the AGI-related conferences mentioned above, is a fuzzy set containing researchers with various interpretations of, and varying levels of commitment to, the AGI concept. This paper surveys the key ideas and directions of the contemporary AGI community.

3.1.1 What is General Intelligence?

But what is this "general intelligence" of which we speak? A little later, I will review some of the key lines of thinking regarding the precise definition of the GI concept. Qualitatively speaking, though, there is broad agreement in the AGI community on some key features of general intelligence:
• General intelligence involves the ability to achieve a variety of goals, and carry out a variety of tasks, in a variety of different contexts and environments.

1 Kurzweil originally contrasted narrow AI with "strong AI", but the latter term already has a different established meaning in the AI and cognitive science literature [? ], making this an awkward usage.
2 The brief history of the term "Artificial General Intelligence" is as follows. In 2002, Cassio Pennachin and I were editing a book on approaches to powerful AI, with broad capabilities at the human level and beyond, and we were struggling for a title. I emailed a number of colleagues asking for suggestions. My former colleague Shane Legg came up with "Artificial General Intelligence," which Cassio and I liked, and adopted for the title of our edited book [GP05]. The term began to spread further when it was used in the context of the AGI conference series. A few years later, someone brought to my attention that a researcher named Mark Gubrud had used the term in a 1997 article on the future of technology and associated risks [? ]. If you know of earlier published uses, please let me know.
3 http://agi-conf.org
4 http://bicasociety.org
5 http://www.cogsys.org/
6 http://www.aaai.org/Press/Reports/Symposia/Fall/fs-04-01.php, http://www.ntu.edu.sg/home/epnsugan/index_files/SSCI2013/CIHLI2013.htm
7 http://www.aaai.org/Conferences/AAAI/2011/aaai11iicall.php

• A generally intelligent system should be able to handle problems and situations quite different from those anticipated by its creators.
• A generally intelligent system should be good at generalizing the knowledge it's gained, so as to transfer this knowledge from one problem or context to others.
• Arbitrarily general intelligence is not possible given realistic resource constraints.
• Real-world systems may display varying degrees of limited generality, but are inevitably going to be a lot more efficient at learning some sorts of things than others; and for any given real-world system, there will be some learning tasks on which it is unacceptably slow. So real-world general intelligences are inevitably somewhat biased toward certain sorts of goals and environments.
• Humans display a higher level of general intelligence than existing AI programs do, and apparently also a higher level than other animals. 8
• It seems quite unlikely that humans happen to manifest a maximal level of general intelligence, even relative to the goals and environment for which they have been evolutionarily adapted.

There is also a common intuition in the AGI community that various real-world general intelligences will tend to share certain common properties; though there is less agreement on what these properties are!

3.1.2 The Core AGI Hypothesis

Another point broadly shared in the AGI community is confidence in what I would venture to call the "core AGI hypothesis":
Core AGI hypothesis: the creation and study of synthetic intelligences with sufficiently broad (e.g. human-level) scope and strong generalization capability is at bottom qualitatively different from the creation and study of synthetic intelligences with significantly narrower scope and weaker generalization capability.
This "core AGI hypothesis" is explicitly articulated in English for the first time here in this review paper (it was presented previously in Japanese in [? ]). I highlight it because it is something with which nearly all researchers in the AGI community agree, regardless of their different conceptualizations of the AGI concept and their different architectural, theoretical, technical and engineering approaches. 9 If this core hypothesis is correct, then distinguishing AGI as a separate pursuit, system class and property from the "narrow AI" that has come to constitute the main stream of the AI field is a sensible and productive thing to do. Note, the core AGI hypothesis doesn't imply there is zero commonality between narrower-scope AI work and AGI work. For instance, if a researcher is engineering a self-driving car via a combination of specialized AI techniques, they might use methods from the field of transfer learning [? ] to help make each component of the car's control system (e.g. the object recognition system, the steering control system, etc.) better able to deal with the various diverse situations it might encounter. This sort of transfer learning research, having to do with generalization, might have some overlap with the work one would need to do to make a generalized "AGI driver" that could, on its own, adapt its operations flexibly from one vehicle or one environment to another. But the core AGI hypothesis proposes that, in order to make the latter sort of AGI driver, additional architectural and dynamical principles would be required, beyond those needed to aid in the human-mediated, machine-learning-aided creation of a variety of narrowly specialized AI driving systems.

8 Some researchers have suggested that cetacea might possess general intelligence comparable to that of humans, though very different in nature [? ]
9 It must be admitted that this "core hypothesis", as articulated here, is rather vague. More precise versions can be formulated, but then this seems to require making decisions that only a fraction of the AGI community will agree with. The reality is that currently the level of conceptual agreement among members of the AGI community pursuing different research approaches is mainly at the level of broad, vaguely-stated concepts, rather than precise formulations.

3.1.3 The Scope of the AGI Field

Within the scope of the core AGI hypothesis, a number of different approaches to defining and characterizing AGI are under current study, encompassing psychological, mathematical, pragmatic and cognitive architecture perspectives. This paper surveys the contemporary AGI field in a fairly inclusive way. It also discusses the question of how much evidence exists for the core AGI hypothesis – and how the task of gathering more evidence about this hypothesis should best be pursued. The goal here is not to present any grand new conclusions, but rather to summarize and systematize some of the key aspects of AGI as manifested in current science and engineering efforts. It is argued here that most contemporary approaches to designing AGI systems fall into four top-level categories: symbolic, emergentist, hybrid and universalist. Leading examples of each category are provided, and the generally perceived pros and cons of each category are summarized. Not all contemporary AGI approaches seek to create human-like general intelligence specifically. But it is argued here that, for any approach which does, there is a certain set of key cognitive processes and interactions that it must come to grips with, including familiar constructs such as working and long-term memory, deliberative and reactive processing, perception, action and reinforcement learning, metacognition and so forth. A robust theory of general intelligence, human-like or otherwise, remains elusive. Multiple approaches to defining general intelligence have been proposed, and in some cases these coincide with different approaches to designing AGI systems (so that various systems aim for general intelligence according to different definitions). The perspective presented here is that a mature theory of AGI would allow one to theoretically determine, based on a given environment, goal set and collection of resource constraints, the optimal AGI architecture for achieving the goals in the environments given the constraints. Lacking such a theory at present, researchers must conceive architectures via diverse theoretical paradigms and then evaluate them via practical metrics. Finally, in order for a community to work together toward common goals, environments and metrics for evaluation of progress are necessary. Metrics for assessing the achievement of human-level AGI are argued to be fairly straightforward, including e.g. the classic Turing test, and the test of operating a robot that can graduate from elementary school or university. On the other hand, metrics for assessing partial progress toward human-level AGI are shown to be more controversial and problematic, with different metrics suiting different AGI approaches, and with the possibility of systems whose partial versions perform poorly on commonsensical metrics, yet whose complete versions perform well. The problem of defining agreed-upon metrics for incremental progress remains largely open, and this constitutes a substantial challenge for the young field of AGI moving forward.

3.2 Characterizing AGI and General Intelligence

One interesting feature of the AGI community, alluded to above, is that it does not currently agree on any single definition of the AGI concept – though there is broad agreement on the general intuitive nature of AGI, along the lines I've summarized above, and broad agreement that some form of the core AGI hypothesis is true. There is a mature theory of general intelligence in the psychology field, and a literature in the AGI field on the formal mathematical definition of intelligence; both of these will be reviewed below. However, neither the psychological nor the mathematical conceptions of general intelligence are accepted as foundational in their details by more than a small fraction of the AGI community. Rather, the formulation of a detailed and rigorous theory of "what AGI is" is a small but significant part of the AGI community's ongoing research. The bulk of the emerging AGI community's efforts is devoted to devising and implementing designs for AGI systems, and developing theories regarding the best way to do so; but the fleshing out of the concept of "AGI" is being accomplished alongside and in synergy with these other tasks. It must be noted, however, that the term "AI" also has many different meanings within the AI research community, with no clear agreement on the definition. George Luger's popular AI textbook famously defined it as "that which AI practitioners do." The border between AI and advanced algorithmics is often considered unclear. A common joke is that, as soon as a certain functionality has been effectively achieved by computers, it's no longer considered AI. The situation with the ambiguity of "AGI" is certainly no worse than that with the ambiguity of the term "AI" itself. In terms of basic semantics, the term "AGI" has been variously used to describe
• a property of certain systems ("AGI" as the intersection of "artificial" (i.e. synthetic) and "generally intelligent")
• a system that displays this property (an "AGI" meaning "an AGI system")
• the field of endeavor pursuing the creation of AGI systems, and the study of the nature of AGI
AGI is related to many other terms and concepts. Joscha Bach [? ] has elegantly characterized it in terms of the quest to create "synthetic intelligence." One also finds communities of researchers working toward AGI-related goals under the labels "computational intelligence", "natural intelligence", "cognitive architecture", "biologically inspired cognitive architecture" (BICA), and many others. Each of these labels was introduced with a certain underlying purpose, and has a specific collection of concepts and approaches associated with it; each corresponds to a certain perspective or family of perspectives. The specific purpose underlying the concept and term "AGI" is to focus attention on the general scope and generalization capability of certain intelligent systems, such as humans, theoretical systems like AIXI [Hut05], and a subset of potential future synthetic intelligences. That is, roughly speaking, an AGI system is a synthetic intelligence that has a general scope and is good at generalization across various goals and contexts. The ambiguity of the concept of "AGI" relates closely to the underlying ambiguity of the concepts of "intelligence" and "general intelligence."
The AGI community has embraced, to varying extents, a variety of characterizations of general intelligence, finding each of them to contribute different insights to the AGI quest. Legg and Hutter [? ] wrote a paper summarizing and organizing over 70 different published definitions of “intelligence”, most oriented toward general intelligence, emanating from researchers in a variety of disciplines. In the rest of this section I will overview the main approaches to defining or characterizing general intelligence taken in the AGI field.

3.2.1 AGI versus Human-Level AI

One key distinction to be kept in mind as we review the various approaches to characterizing AGI is the distinction between AGI and the related concept of "human-level AI" (which is usually used to mean, in effect: human-level, reasonably human-like AGI). AGI is a fairly abstract notion, which is not intrinsically tied to any particular characteristics of human beings. Some properties of human general intelligence may in fact be universal among all powerful AGIs, but given our current limited understanding of general intelligence, it's not yet terribly clear what these may be. The concept of "human-level AGI", interpreted literally, is confusing and ill-defined. It's difficult to place the intelligences of all possible systems in a simple hierarchy, according to which the "intelligence level" of an arbitrary intelligence can be compared to the "intelligence level" of a human. Some researchers, as will be discussed below, have proposed universal intelligence measures that could be used in this way; but currently the details and utility of such measures are both quite contentious. To keep things simpler, here I will interpret "human-level AI" as meaning "human-level and roughly human-like AGI," a restriction that makes the concept much easier to handle. For AGI systems that are supposed to operate in similar sorts of environments to humans, according to cognitive processes vaguely similar to those used by humans, the concept of "human level" is relatively easy to understand. The concept of "AGI" appears more theoretically fundamental than "human-level AGI"; however, its very breadth can also be problematic. "Human-level AGI" is more concrete and specific, which lets one take it in certain directions more easily than can be done with general AGI. In our discussions on evaluations and metrics below, for example, we will restrict attention to human-level AGI systems, because otherwise creating metrics to compare qualitatively different AGI systems becomes a much trickier problem.

3.2.2 The Pragmatic Approach to Characterizing General Intelligence

The pragmatic approach to conceptualizing general intelligence is typified by the AI Magazine article "Human Level Artificial Intelligence? Be Serious!", written by Nils Nilsson, one of the early leaders of the AI field [? ]. Nilsson's view is

... that achieving real Human Level artificial intelligence would necessarily imply that most of the tasks that humans perform for pay could be automated. Rather than work toward this goal of automation by building special-purpose systems, I argue for the development of general-purpose, educable systems that can learn and be taught to perform any of the thousands of jobs that humans can perform. Joining others who have made similar proposals, I advocate beginning with a system that has minimal, although extensive, built-in capabilities. These would have to include the ability to improve through learning along with many other abilities.
In this perspective, once an AI obsoletes humans in most of the practical things we do, it's got general Human Level intelligence. The implicit assumption here is that humans are the generally intelligent systems we care about, so that the best practical way to characterize general intelligence is via comparison with human capabilities. The classic Turing Test for machine intelligence – simulating human conversation well enough to fool human judges [? ] – is pragmatic in a sense similar to Nilsson's, but with a different focus: emulating humans. Nilsson isn't interested in whether an AI system can fool people into thinking it's a human, but rather in whether an AI system can do the useful and important practical things that people can do.

3.2.3 Psychological Characterizations of General Intelligence

The psychological approach to characterizing general intelligence also focuses on human-like general intelligence; but rather than looking directly at practical capabilities, it tries to isolate deeper underlying capabilities that enable these practical capabilities. In practice it encompasses a broad variety of sub-approaches, rather than presenting a unified perspective.
Viewed historically, efforts to conceptualize, define, and measure intelligence in humans reflect a distinct trend from general to specific (it is interesting to note the similarity between historical trends in psychology and AI) [? ]. Thus, early work in defining and measuring intelligence was heavily influenced by Spearman, who in 1904 proposed the psychological factor g (the "g factor", for general intelligence) [? ]. Spearman argued that g was biologically determined, and represented the overall intellectual skill level of an individual. A related advance was made in 1905 by Binet and Simon, who developed a novel approach for measuring general intelligence in French schoolchildren [? ]. A unique feature of the Binet-Simon scale was that it provided comprehensive age norms, so that each child could be systematically compared with others across both age and intellectual skill level. In 1916, Terman introduced the notion of an intelligence quotient or IQ, which is computed by dividing the test-taker's mental age (i.e., their age-equivalent performance level) by their physical or chronological age [? ].
In subsequent years, psychologists began to question the concept of intelligence as a single, undifferentiated capacity. There were two primary concerns. First, while performance within an individual across knowledge domains is somewhat correlated, it is not unusual for skill levels in one domain to be considerably higher or lower than in another (i.e., intra-individual variability). Second, two individuals with comparable overall performance levels might differ significantly across specific knowledge domains (i.e., inter-individual variability). These issues helped to motivate a number of alternative theories, definitions, and measurement approaches, which share the idea that intelligence is multifaceted and variable both within and across individuals. Of these approaches, a particularly well-known example is Gardner's theory of multiple intelligences, which proposes eight distinct forms or types of intelligence: (1) linguistic, (2) logical-mathematical, (3) musical, (4) bodily-kinesthetic, (5) spatial, (6) interpersonal, (7) intrapersonal, and (8) naturalist [? ]. Gardner's theory suggests that each individual's intellectual skill is represented by an intelligence profile, that is, a unique mosaic or combination of skill levels across the eight forms of intelligence.
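For concreteness, Terman's ratio IQ mentioned above is conventionally written with a scaling factor of 100 (the scaling is the standard Stanford-Binet convention, though the prose above mentions only the division):

\[ \mathrm{IQ} \;=\; 100 \times \frac{\text{mental age}}{\text{chronological age}} \]

So, for example, a ten-year-old performing at the level of a typical twelve-year-old would score 120 on this ratio scale.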

3.2.3.1 Competencies Characterizing Human-Level General Intelligence

Another approach to understanding general intelligence based on the psychology literature is to look at the various competencies that cognitive scientists generally understand humans to display. The following list of competencies was assembled at the 2009 AGI Roadmap Workshop [? ] by a group of 12 experts, including AGI researchers and psychologists, based on a review of the AI and psychology literatures. The list comprises broad areas of capability, each one subdivided into specific sub-areas:
• Perception

– Vision: image and scene analysis and understanding
– Hearing: identifying the sounds associated with common objects; understanding which sounds come from which sources in a noisy environment
– Touch: identifying common objects and carrying out common actions using touch alone
– Crossmodal: Integrating information from various senses
– Proprioception: Sensing and understanding what its body is doing

• Actuation
– Physical skills: manipulating familiar and unfamiliar objects
– Tool use, including the flexible use of ordinary objects as tools
– Navigation, including in complex and dynamic environments

• Memory
– Implicit: Memory the content of which cannot be introspected
– Working: Short-term memory of the content of current/recent experience (awareness)
– Episodic: Memory of a first-person experience (actual or imagined) attributed to a particular instance of the agent as the subject who had the experience
– Semantic: Memory regarding facts or beliefs
– Procedural: Memory of sequential/parallel combinations of (physical or mental) actions, often habituated (implicit)
• Learning

– Imitation: Spontaneously adopt new behaviors that the agent sees others carrying out
– Reinforcement: Learn new behaviors from positive and/or negative reinforcement signals, delivered by teachers and/or the environment
– Interactive verbal instruction
– Learning from written media
– Learning via experimentation
• Reasoning

– Deduction, from uncertain premises observed in the world
– Induction, from uncertain premises observed in the world
– Abduction, from uncertain premises observed in the world
– Causal reasoning, from uncertain premises observed in the world
– Physical reasoning, based on observed "fuzzy rules" of naive physics
– Associational reasoning, based on observed spatiotemporal associations
• Planning
– Tactical
– Strategic
– Physical
– Social
• Attention
– Visual Attention within the agent's perception of its environment
– Social Attention
– Behavioral Attention
• Motivation
– Subgoal creation, based on the agent's preprogrammed goals and its reasoning and planning
– Affect-based motivation
– Control of emotions
• Emotion
– Expressing Emotion
– Perceiving / Interpreting Emotion
• Modeling Self and Other
– Self-Awareness
– Theory of Mind
– Self-Control
– Other-Awareness
– Empathy
• Social Interaction
– Appropriate Social Behavior
– Communication about and oriented toward social relationships
– Inference about social relationships
– Group interactions (e.g. play) in loosely-organized activities
• Communication

– Gestural communication to achieve goals and express emotions
– Verbal communication using natural language in its life-context
– Pictorial Communication regarding objects and scenes

– Language acquisition
– Cross-modal communication
• Quantitative

– Counting sets of objects in its environment
– Simple, grounded arithmetic with small numbers
– Comparison of observed entities regarding quantitative properties
– Measurement using simple, appropriate tools

• Building/Creation
– Physical: creative constructive play with objects
– Conceptual invention: concept formation
– Verbal invention
– Social construction (e.g. assembling new social groups, modifying existing ones)

Different researchers have different views about which of the above competency areas are most critical, and as you peruse the list, you may feel that it over- or under-emphasizes certain aspects of intelligence. But it seems clear that any software system that could flexibly and robustly display competency in all of the above areas would be broadly considered a strong contender for possessing human-level general intelligence.
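One way such a competency taxonomy gets used in practice is as an evaluation checklist. The following minimal sketch (purely illustrative, not part of the AGI Roadmap itself) encodes a small excerpt of the areas above in Python and averages per-sub-area scores; the scoring scheme is an assumption, since a real evaluation would require agreed-upon tests for each sub-area.

from typing import Dict, List

# A small excerpt of the competency areas listed above, encoded as a checklist.
COMPETENCIES: Dict[str, List[str]] = {
    "Perception": ["Vision", "Hearing", "Touch", "Crossmodal", "Proprioception"],
    "Memory": ["Implicit", "Working", "Episodic", "Semantic", "Procedural"],
    "Learning": ["Imitation", "Reinforcement", "Verbal instruction",
                 "Written media", "Experimentation"],
}

def coverage(scores: Dict[str, Dict[str, float]]) -> float:
    """Average the per-sub-area scores (each in [0, 1]) over the whole checklist."""
    values = [scores.get(area, {}).get(sub, 0.0)
              for area, subs in COMPETENCIES.items() for sub in subs]
    return sum(values) / len(values)

# Usage sketch: a hypothetical system strong in a couple of perception sub-areas only.
print(coverage({"Perception": {"Vision": 0.8, "Hearing": 0.6}}))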

3.2.4 A Cognitive-Architecture Perspective on General Intelligence

Complementing the above perspectives, Laird et al. [? ] have composed a list of "requirements for human-level intelligence" from the standpoint of designers of cognitive architectures. Their own work has mostly involved the SOAR cognitive architecture, which has been pursued from the AGI perspective, but also from the perspective of accurately simulating human cognition:
• R0. FIXED STRUCTURE FOR ALL TASKS (i.e., explicit loading of knowledge files or software modification should not be done when the AGI system is presented with a new task)
• R1. REALIZE A SYMBOL SYSTEM (i.e., the system should be able to create symbolism and utilize symbolism internally, regardless of whether this symbolism is represented explicitly or implicitly within the system's knowledge representation)
• R2. REPRESENT AND EFFECTIVELY USE MODALITY-SPECIFIC KNOWLEDGE
• R3. REPRESENT AND EFFECTIVELY USE LARGE BODIES OF DIVERSE KNOWLEDGE
• R4. REPRESENT AND EFFECTIVELY USE KNOWLEDGE WITH DIFFERENT LEVELS OF GENERALITY
• R5. REPRESENT AND EFFECTIVELY USE DIVERSE LEVELS OF KNOWLEDGE
• R6. REPRESENT AND EFFECTIVELY USE BELIEFS INDEPENDENT OF CURRENT PERCEPTION
• R7. REPRESENT AND EFFECTIVELY USE RICH, HIERARCHICAL CONTROL KNOWLEDGE
• R8. REPRESENT AND EFFECTIVELY USE META-COGNITIVE KNOWLEDGE

• R9. SUPPORT A SPECTRUM OF BOUNDED AND UNBOUNDED DELIBERATION (where "bounded" refers to computational space and time resource utilization)
• R10. SUPPORT DIVERSE, COMPREHENSIVE LEARNING
• R11. SUPPORT INCREMENTAL, ONLINE LEARNING

As Laird et al. [? ] note, there are no current AI systems that plainly fulfill all these requirements (although the precise definitions of these requirements may be open to a fairly broad spectrum of interpretations). It is worth remembering, in this context, Stan Franklin's careful articulation of the difference between a software "agent" and a mere "program" [? ]:

An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future.

Laird and Wray's requirements do not specify that the general intelligence must be an autonomous agent rather than a program. So, their requirements span both "agent AI" and "tool AI". However, if we piece together Franklin's definition with Laird and Wray's requirements, we get a reasonable stab at a characterization of a "generally intelligent agent", from the perspective of the cognitive architecture designer.
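To make the agent-versus-program distinction concrete, here is a minimal, purely illustrative Python sketch (not drawn from Franklin, Laird or Wray; the class and method names are hypothetical placeholders) of the sense-act-over-time loop that Franklin's definition describes:

import random

class ThermostatAgent:
    """A toy autonomous agent: it senses its environment and acts on it, over time,
    in pursuit of its own agenda (keeping temperature near a setpoint), so that its
    actions affect what it will sense in the future."""

    def __init__(self, setpoint: float):
        self.setpoint = setpoint  # the agent's own standing goal

    def sense(self, environment: dict) -> float:
        return environment["temperature"]

    def act(self, environment: dict, reading: float) -> None:
        # The action changes the environment, and hence future sensing.
        if reading < self.setpoint:
            environment["temperature"] += 1.0   # "heater on"
        elif reading > self.setpoint:
            environment["temperature"] -= 1.0   # "heater off / vent"

def run(agent: ThermostatAgent, steps: int = 10) -> None:
    environment = {"temperature": 15.0}
    for _ in range(steps):
        environment["temperature"] += random.uniform(-0.5, 0.5)  # external drift
        reading = agent.sense(environment)
        agent.act(environment, reading)
        print(round(environment["temperature"], 2))

run(ThermostatAgent(setpoint=21.0))

A mere "program", by contrast, would map a single input to a single output and halt, with no ongoing coupling between its actions and its future inputs; of course, this toy thermostat is an agent in Franklin's sense while being nowhere near generally intelligent.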

3.2.5 A Mathematical Approach to Characterizing General Intelligence

In contrast to approaches focused on human-like general intelligence, some researchers have sought to understand general intelligence in general. The underlying intuition here is that:
• Truly, absolutely general intelligence would only be achievable given infinite computational ability. For any computable system, there will be some contexts and goals for which it's not very intelligent.
• However, some finite computational systems will be more generally intelligent than others, and it's possible to quantify this extent.
This approach is typified by the recent work of Legg and Hutter [? ], who give a formal definition of general intelligence based on the Solomonoff-Levin prior. Put very roughly, they define intelligence as the average reward-achieving capability of a system, calculated by averaging over all possible reward-summable environments, where each environment is weighted in such a way that more compactly describable environments (those computed by shorter programs) have larger weights. According to this sort of measure, humans are nowhere near the maximally generally intelligent system. However, humans are more generally intelligent than, say, rocks or worms. 10
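For concreteness, one common rendering of Legg and Hutter's universal intelligence measure, matching the verbal description above, is:

\[ \Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi} \]

Here \(\pi\) is the agent being evaluated, \(E\) is the class of computable, reward-summable environments, \(K(\mu)\) is the Kolmogorov complexity of environment \(\mu\) (the length of the shortest program computing it), and \(V_{\mu}^{\pi}\) is the expected total reward \(\pi\) obtains in \(\mu\). The \(2^{-K(\mu)}\) weighting is the Solomonoff-Levin-style prior mentioned above: simpler environments count for more.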

10 A possible practical issue with this approach is that the quantitative general-intelligence values it yields are dependent on the choice of reference Universal Turing Machine underlying the measurement of program length. A system is judged as intelligent largely based on the extent to which it solves simple problems effectively, but the definition of "simple", in practice, depends on the assumed UTM – what is simple to one UTM may be complex to another. In the limit of infinitely large problems, this issue goes away due to the ability of any UTM to simulate any other one; but human intelligence is not currently mainly concerned with the limit of infinitely large problems. This means that in order to turn these ideas into a practical intelligence measure, one would have to make a commitment to a particular UTM; and current science and philosophy don't give strong guidance regarding which one to choose. How large a difficulty this constitutes in practice remains to be seen. Researchers working on this sort of approach tend not to consider this a real problem.

While the original form of Legg and Hutter's definition of intelligence is impractical to compute, a more tractable approximation has recently been developed [? ]. Also, Achler [? ] has proposed an interesting, pragmatic AGI intelligence measurement approach explicitly inspired by these formal approaches, in the sense that it explicitly balances the effectiveness of a system at solving problems with the compactness of its solutions. This is similar to a common strategy in evolutionary program learning, where one uses a fitness function comprising an accuracy term and an "Occam's Razor" compactness term.
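As a purely illustrative sketch of the kind of accuracy-plus-parsimony fitness function mentioned above (the weighting constant and the use of program size as the complexity measure are assumptions for illustration, not Achler's or any particular GP system's actual metric):

from typing import Callable, Sequence, Tuple

def occam_fitness(program: Callable[[float], float],
                  program_size: int,
                  data: Sequence[Tuple[float, float]],
                  parsimony_weight: float = 0.01) -> float:
    """Score a candidate program: reward accuracy, penalize lack of compactness."""
    # Accuracy term: negative mean squared error on the training data.
    mse = sum((program(x) - y) ** 2 for x, y in data) / len(data)
    # "Occam's Razor" term: larger (less compact) programs are penalized.
    return -mse - parsimony_weight * program_size

# Usage sketch: of two equally accurate candidates, the more compact one wins.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
compact = lambda x: 2.0 * x + 1.0
bloated = lambda x: 2.0 * x + 1.0      # imagine the same function as a 200-node tree
print(occam_fitness(compact, program_size=5, data=data))    # higher fitness
print(occam_fitness(bloated, program_size=200, data=data))  # lower fitness

In a real evolutionary program learning setting, program_size would typically be the number of nodes in the candidate program tree, and the parsimony weight would be tuned so that the compactness term breaks ties between similarly accurate candidates without overwhelming the accuracy term.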

3.2.6 The Adaptationist Approach to Characterizing General Intelligence

Another perspective views general intelligence as closely tied to the environment in which it exists. Pei Wang has argued carefully for a conception of general intelligence as adaptation to the environment using insufficient resources [Wan06]. A system may be said to have greater general intelligence if it can adapt effectively to a more general class of environments, within realistic resource constraints. In a 2010 paper, I sought to modify Legg and Hutter's mathematical approach in an attempt to account for the factors Wang's definition highlights [? ] (a rough formula for the first of these quantities is sketched after this list):
• The pragmatic general intelligence is defined relative to a given distribution over environments and goals, as the average goal-achieving capability of a system, calculated by weighted-averaging over all possible environments and goals, using the given distribution to determine the weights
• The generality of a system's intelligence is defined in a related way, as (roughly speaking) the entropy of the class of environments over which the system displays high pragmatic general intelligence
• The efficient pragmatic general intelligence is defined relative to a given probability distribution over environments and goals, as the average effort-normalized goal-achieving capability of a system, calculated by weighted-averaging over all possible environments and goals, using the given distribution to determine the weights. The effort-normalized goal-achieving capability of a system is defined by taking its goal-achieving capability (relative to a particular goal and environment), and dividing it by the computational effort the system must expend to achieve that capability
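Roughly, and with notation chosen here for illustration rather than taken verbatim from the cited paper, the first of these definitions has the form:

\[ \Pi_{\nu}(\pi) \;=\; \sum_{(\mu, g)} \nu(\mu, g)\, V_{\mu, g}^{\pi} \]

where \(\nu\) is the assumed distribution over environment-goal pairs \((\mu, g)\) and \(V_{\mu, g}^{\pi}\) measures how well agent \(\pi\) achieves goal \(g\) in environment \(\mu\); the efficient pragmatic general intelligence then divides each \(V_{\mu, g}^{\pi}\) by the computational effort \(\pi\) expends in that environment before averaging.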

Lurking in this vicinity are some genuine differences of perspective within the AGI community regarding the proper way to conceive general intelligence. Some theorists (e.g. Legg and Hutter) argue that intelligence is purely a matter of capability, and that the intelligence of a system is purely a matter of its behaviors, and is independent of how much effort it expends in achieving its behaviors. On the other hand, some theorists (e.g. Wang) believe that the essence of general intelligence lies in the complex systems of compromises needed to achieve a reasonable degree of generality of adaptation using limited computational resources. In the latter, adaptationist view, the sorts of approaches to goal-achievement that are possible in the theoretical case of infinite or massive computational resources have little to do with real-world general intelligence. But in the former view, real-world general intelligence can usefully be viewed as a modification of infinite-resources, infinitely-general intelligence to the case of finite resources.

3.2.7 The Embodiment Focused Approach to Characterizing General Intelligence

A close relative of the adaptationist approach, but with a very different focus that leads to some significant conceptual differences as well, is what we may call the embodiment approach to characterizing general intelligence. In brief, this perspective holds that intelligence is something that physical bodies do in physical environments. It holds that intelligence is best understood by focusing on the modulation of the body-environment interaction that an embodied system carries out as it goes about in the world. Rodney Brooks is one of the better known advocates of this perspective [? ]. Pfeifer and Bongard summarize the view of intelligence underlying this perspective adroitly as follows: "In spite of all the difficulties of coming up with a concise definition, and regardless of the enormous complexities involved in the concept of intelligence, it seems that whatever we intuitively view as intelligent is always vested with two particular characteristics: compliance and diversity. In short, intelligent agents always comply with the physical and social rules of their environment, and exploit those rules to produce diverse behavior." [? ]. For example, they note: "All animals, humans and robots have to comply with the fact that there is gravity and friction, and that locomotion requires energy... [A]dapting to these constraints and exploiting them in particular ways opens up the possibility of walking, running, drinking from a cup, putting dishes on a table, playing soccer, or riding a bicycle." Pfeifer and Bongard go so far as to assert that intelligence, on the perspective they articulate, doesn't apply to conventional AI software programs: "We ascribe intelligence only to ... real physical systems whose behavior can be observed as they interact with the environment. Software agents, and computer programs in general, are disembodied, and many of the conclusions drawn ... do not apply to them." Of course, this sort of view is quite contentious, and e.g. Pei Wang has argued against it in a paper titled "Does a Laptop Have a Body?" [? ] – the point being that any software program with any kind of user interface is interacting with the physical world via some kind of body, so the distinctions involved are not as sharp as embodiment-oriented researchers sometimes imply. Philosophical points intersect here with issues regarding research focus. Conceptually, the embodiment perspective asks whether it even makes sense to talk about human-level or human-like AGI in a system that lacks a vaguely human-like body. Focus-wise, this perspective suggests that, if one is interested in AGI, it makes sense to put resources into achieving human-like intelligence the way evolution did, i.e. in the context of controlling a body with complex sensors and actuators in a complex physical world. The overlap between the embodiment and adaptationist approaches is strong, because historically, human intelligence evolved specifically to adapt to the task of controlling a human body in certain sorts of complex environment, given limited energetic resources and subject to particular physical constraints. But the two approaches are not identical, because the embodiment approach posits that adaptation to physical body-control tasks under physical constraints is key, whereas the adaptationist approach holds that the essential point is more broadly-conceived adaptation to environments subject to resource constraints.

3.3 Approaches to Artificial General Intelligence

As appropriate for an early-stage research field, there is a wide variety of different approaches to AGI in play. Fairly comprehensive reviews have been provided by Wlodek Duch's review paper from the AGI-08 conference [? ], and by Alexei Samsonovich's BICA review paper [? ], which compares a number of (sometimes quite loosely) biologically inspired cognitive architectures in terms of a feature checklist, and was created collaboratively with the creators of the architectures. Hugo de Garis and I also wrote two review papers, one focused on biologically-inspired cognitive architectures [? ] and the other on computational neuroscience systems with AGI ambitions [? ]. Here I will not try to review the whole field in detail; I will be content with describing the main categories of approaches, and briefly citing a few illustrative examples of each one. 11 Duch's survey [? ] divides existing approaches into three paradigms – symbolic, emergentist and hybrid. Whether this trichotomy has any fundamental significance is somewhat contentious, but it is convenient given the scope of approaches currently and historically pursued, so I will use it to help structure the present brief review of AGI approaches. But I will deviate from Duch in a couple of ways: I add one additional category ("universalist"), and I split the emergentist category into multiple subcategories.

11 The choice to include approach X here and omit approach Y should not be construed as implying that I think X has more potential than Y, or even that X illustrates the category in which I've placed it better than Y would. Rather, there are a lot of AGI approaches and systems out there, and I've selected a few reasonably representative ones to give an overall picture of the field.

3.3.1 Symbolic AGI Approaches

A venerable tradition in AI focuses on the physical symbol system hypothesis [? ], which states that minds exist mainly to manipulate symbols that represent aspects of the world or themselves. A physical symbol system has the ability to input, output, store and alter symbolic entities, and to execute appropriate actions in order to reach its goals. Generally, symbolic cognitive architectures focus on "working memory" that draws on long-term memory as needed, and utilize centralized control over perception, cognition and action. Although in principle such architectures could be arbitrarily capable (since symbolic systems have universal representational and computational power, in theory), in practice symbolic architectures tend to be weak in learning, creativity, procedure learning, and episodic and associative memory. Decades of work in this tradition have not compellingly resolved these issues, which has led many researchers to explore other options. Perhaps the most impressive successes of symbolic methods on learning problems have occurred in the areas of Genetic Programming (GP) [? ], Inductive Logic Programming [? ], and probabilistic learning methods such as Markov Logic Networks (MLN) [? ]. These techniques are interesting from a variety of theoretical and practical standpoints. For instance, it is notable that GP and MLN have been usefully applied both to high-level symbolic relationships and to quantitative data resulting directly from empirical observations, depending on how one configures them and how one prepares their inputs.

Another important observation one may make about these methods is that, in each case, the ability to do data-driven learning using an underlying symbolic representation comes along with a lack of transparency in how and why the learning algorithms come up with the symbolic constructs that they do. Nontrivially large GP program trees are generally quite opaque to the human reader, even though in principle they use a comprehensible symbolic formalism. The propositions making up a Markov Logic Network are easy to understand, but the reasons that MLN weight learning ranks one propositional rule higher than another over a given set of evidence are obscure and not easily determinable from the results MLN produces. In some ways these algorithms blur the border between symbolic and subsymbolic, because they use underlying symbolic representation languages according to algorithms that produce large, often humanly inscrutable combinations of data elements, in a manner conceptually similar to many subsymbolic learning algorithms. Indeed, the complex, somewhat "emergentist" nature of "symbolic" algorithms like GP and MLN provides a worthwhile reminder that the "symbolic vs. subsymbolic" dichotomy, while heuristically valuable for describing the AI and AGI approaches existent at the current time, is not necessarily a clear, crisp, fundamentally grounded distinction. It is utilized here more for its sociological descriptive value than for its core value as a scientific, mathematical or philosophical distinction. A few illustrative symbolic cognitive architectures are:

• ACT-R [? ] is fundamentally a symbolic system, but Duch classifies it as a hybrid system because it incorporates connectionist-style activation spreading in a significant role; and there is an experimental, thoroughly connectionist implementation to complement the primary mainly-symbolic implementation. Its combination of SOAR-style "production rules" with large-scale connectionist dynamics allows it to simulate a variety of human psychological phenomena.
• Cyc [? ] is an AGI architecture based on predicate logic as a knowledge representation, and using logical reasoning techniques to answer questions and derive new knowledge from old. It has been connected to a natural language engine, and designs have been created for the connection of Cyc with Albus's 4D-RCS [? ]. Cyc's most unique aspect is the large database of commonsense knowledge that Cycorp has accumulated (millions of pieces of knowledge, entered by specially trained humans in predicate logic format); part of the philosophy underlying Cyc is that once a sufficient quantity of knowledge is accumulated in the knowledge base, the problem of creating human-level general intelligence will become much less difficult due to the ability to leverage this knowledge.
• EPIC [RCK01] is a cognitive architecture aimed at capturing human perceptual, cognitive and motor activities through several interconnected processors working in parallel. The system is controlled by production rules for the cognitive processor, and a set of perceptual (visual, auditory, tactile) and motor processors operating on symbolically coded features rather than raw sensory data. It has been connected to SOAR for problem solving, planning and learning.
• ICARUS [? ] is an integrated cognitive architecture for physical agents, with knowledge specified in the form of reactive skills, each denoting goal-relevant reactions to a class of problems. The architecture includes a number of modules: a perceptual system, a planning system, an execution system, and several memory systems.

• SNePS (Semantic Network Processing System) [? ] is a logic, frame and network-based knowledge representation, reasoning, and acting system that has undergone over three decades of development, and has been used for some interesting prototype experiments in language processing and virtual agent control.
• SOAR [? ] is a classic example of an expert rule-based cognitive architecture designed to model general intelligence. It has recently been extended to handle sensorimotor functions and reinforcement learning.
A caricature of some common attitudes for and against the symbolic approach to AGI would be:
• For: Symbolic thought is what most strongly distinguishes humans from other animals; it's the crux of human general intelligence. Symbolic thought is precisely what lets us generalize most broadly. It's possible to realize the symbolic core of human general intelligence independently of the specific neural processes that realize this core in the brain, and independently of the sensory and motor systems that serve as (very sophisticated) input and output conduits for human symbol-processing.
• Against: While these symbolic AI architectures contain many valuable ideas and have yielded some interesting results, they seem to be incapable of giving rise to the emergent structures and dynamics required to yield humanlike general intelligence using feasible computational resources. Symbol manipulation emerged evolutionarily from simpler processes of perception and motivated action; and symbol manipulation in the human brain emerges from these same sorts of processes. Divorcing symbol manipulation from the underlying substrate of perception and motivated action doesn't make sense, and will never yield generally intelligent agents, at best only useful problem-solving tools.

3.3.2 Emergentist AGI Approaches

Another species of AGI design expects abstract symbolic processing – along with every other aspect of intelligence – to emerge from lower-level "subsymbolic" dynamics, which sometimes (but not always) are designed to simulate neural networks or other aspects of human brain function. Today's emergentist architectures are sometimes very strong at recognizing patterns in high-dimensional data, reinforcement learning and associative memory; but no one has yet compellingly shown how to achieve high-level functions such as abstract reasoning or complex language processing using a purely subsymbolic, emergentist approach. There are research results on doing inference and language processing using subsymbolic architectures, some of which are reviewed in [? ]; but these mainly involve relatively simplistic problem cases. The most broadly effective reasoning and language processing systems available are those utilizing various forms of symbolic representations, though often also involving forms of probabilistic, data-driven learning, as in examples like Markov Logic Networks [? ] and statistical language processing [? ]. A few illustrative subsymbolic, emergentist cognitive architectures are:
• DeSTIN [?? ] is a hierarchical temporal pattern recognition architecture, with some similarities to HTM [? ] but featuring more complex learning mechanisms. It has been integrated into the CogPrime [? ] architecture to serve as a perceptual subsystem; but it is primarily being developed to serve as the center of its own AGI design, assisted via action and reinforcement hierarchies.

• Hierarchical Temporal Memory (HTM) [? ] is a hierarchical temporal pattern recognition architecture, presented as both an AI / AGI approach and a model of the cortex. So far it has been used exclusively for vision processing, but a conceptual framework has been outlined for extension to action and perception/action coordination.
• SAL [JL08], based on the earlier and related IBCA (Integrated Biologically-based Cognitive Architecture), is a large-scale emergent architecture that seeks to model distributed information processing in the brain, especially the posterior and frontal cortex and the hippocampus. So far the architectures in this lineage have been used to simulate various human psychological and psycholinguistic behaviors, but haven't been shown to give rise to higher-level behaviors like reasoning or subgoaling.
• NOMAD (Neurally Organized Mobile Adaptive Device) automata and their successors [? ] are based on Edelman's "Neural Darwinism" model of the brain, and feature large numbers of simulated neurons evolving by natural selection into configurations that carry out sensorimotor and categorization tasks. This work builds conceptually on prior work by Edelman and colleagues on the "Darwin" series of brain-inspired perception systems [? ].
• Ben Kuipers and his colleagues [? MK08? ] have pursued an extremely innovative research program which combines qualitative reasoning and reinforcement learning to enable an intelligent agent to learn how to act, perceive and model the world. Kuipers' notion of "bootstrap learning" involves allowing the robot to learn almost everything about its world, including for instance the structure of 3D space and other things that humans and other animals obtain via their genetic endowments.
• Tsvi Achler [? ] has demonstrated neural networks whose weights adapt according to a different methodology than the usual one, combining feedback and feedforward dynamics in a particular way, with the result that the weights in the network have a clear symbolic meaning. This provides a novel approach to bridging the symbolic-subsymbolic gap.
There has also been a great deal of work relevant to these sorts of architectures, done without explicit reference to cognitive architectures, under labels such as "deep learning" – e.g. Andrew Ng's well-known work applying deep learning to practical vision processing problems [?? ], and the work of Tomaso Poggio and his team, which achieves deep learning via simulations of visual cortex [? ]. And there is a set of emergentist architectures focused specifically on large-scale brain simulation, which we will review below in a separate subsection, as all of these share certain common characteristics.
A caricature of some common attitudes for and against the emergentist approach to AGI would be:
• For: The brain consists of a large set of simple elements, complexly self-organizing into dynamical structures in response to the body's experience. So, the natural way to approach AGI is to follow a similar approach: a large set of simple elements capable of appropriately adaptive self-organization. When a cognitive faculty is achieved via emergence from subsymbolic dynamics, then it automatically has some flexibility and adaptiveness to it (quite different from the "brittleness" seen in many symbolic AI systems).
The human brain is actually very similar to the brains of other mammals, which are mostly involved in processing high-dimensional sensory data and coordinating complex actions; this sort of processing, which constitutes the foundation of general intelligence, is most naturally achieved via subsymbolic means.

• Against: The brain happens to achieve its general intelligence via self-organizing networks of neurons, but to focus on this underlying level is misdirected. What matters is the cognitive "software" of the mind, not the lower-level hardware or wetware that's used to realize it. The brain has a complex architecture that evolution has honed specifically to support advanced symbolic reasoning and other aspects of human general intelligence; what matters for creating human-level (or greater) intelligence is having the right information-processing architecture, not the underlying mechanics via which the architecture is implemented.

3.3.2.1 Computational Neuroscience as a Route to AGI

One commonsensical approach to AGI, falling conceptually under the “emergentist” umbrella, would be to use computational neuroscience to create a model of how the brain works, and then to use this model as an AGI system. If we understood the brain more fully, this would be an extremely effective approach to creating the world’s first human-level AGI. Given the reality of our currently limited understanding of the brain and of how best to digitally simulate it, the computational neuroscience approach to AGI is no panacea, and is in fact nearly impossible to pursue at present – but it is an interesting direction nonetheless.
To understand the difficulty of taking this approach to AGI, consider some illustrative examples of contemporary large-scale computational neuroscience projects:
• Markram’s “Blue Brain Project”, which used an IBM “Blue Gene” supercomputer to simulate (at ion channel level of detail) the neural signaling of a cortical column of the rat brain. The long-term goal of the project, now continuing in the EU with a large sum of government funding under the label “Human Brain Project”, is to “be able to simulate the full cortex of the human brain” [Mar06].
• Modha’s IBM “Cognitive Computation Project”, aimed at “reverse engineering the structure, function, dynamics and behavior of the human brain, and then delivering it in a small compact form factor consuming very low power that rivals the power consumption of the human brain.” The best publicized achievement of Modha’s team has been a simulation (at a certain level of accuracy) of a neural network the size of the “cortex of a cat”, with 10^9 neurons and 10^13 synapses [? ].
• Boahen’s “Neurogrid Project” (at Stanford), involving the creation of custom integrated circuits that emulate the way neurons compute. So far his “neuromorphic engineering” research group has built a silicon retina, intended to be developed into something capable of giving the blind some degree of sight; and a self-organizing chip that emulates the way a developing brain wires itself up [? ].
• Horwitz’s “Large-Scale Brain Modeling” initiative (at the US NIH), involving simulation of the dynamic assemblage of neural subnetworks performing cognitive tasks, especially those associated with audition and language, and with an emphasis on the alteration of these networks during brain disorders. Horwitz’s simulation work is guided closely by data gathered from brain imaging using fMRI, PET, and MEG [? ].
• Izhikevich’s and Edelman’s “Large Scale Model of Thalamocortical Systems”, a simulation on a scale similar to that of the full human brain itself. By simulating the spiking and plasticity features of the neural cortex, they managed to reproduce certain special features of the brain, such as sensitivity to initial states, brain-wave propagation, and so forth. Their model was used to simulate a million multi-compartment spiking neurons, joined by half a billion synapses, with responses calibrated to reproduce known types of responses recorded in vitro in rats. In this simulation, they observed a variety of interesting phenomena, including spontaneous activity, the emergence of waves and rhythms, and functional

connectivity on different scales [IE08]. Izhikevich’s current proprietary work in his firm “The Brain Corporation” is founded on similar principles. (A minimal code sketch of this type of single-neuron model is given at the end of the present discussion, just before the summing-up below.)
• Just’s “4CAPS” (Cortical Capacity-Constrained Concurrent Activation-based Production System) cognitive architecture, a hybrid of a computational neuroscience model and a symbolic AI system, intended to explain both behavioral and neuroimaging data. The architecture includes computational features such as variable-binding and constituent-structured representations, alongside more standard neural net structures and dynamics [? ].
These are all fantastic projects; however, they embody a broad range of interpretations of the notion of “simulation” itself. Different researchers are approaching the task of large-scale brain simulation with very different objectives in mind, e.g.
1. Creating models that can actually be connected to parts of the human brain or body, and can serve the same role as the brain systems they simulate (e.g. Boahen’s artificial cochlea and retina [? ]).
2. Creating a precise functional simulation of a brain subsystem, i.e. one that simulates the subsystem’s internal dynamics and its mapping of inputs to outputs with adequate fidelity to explain exactly what the brain subsystem does to control the organism (something that so far has been done compellingly only on a small scale for very specialized brain systems; Horwitz’s work is pushing in this direction on a somewhat larger scale than typical).
3. Creating models that quantitatively simulate the generic behavior and internal dynamics of a certain subsystem of the brain, but without precisely functionally simulating that subsystem (e.g. Izhikevich and Edelman’s large-scale simulation, and Markram’s “statistically accurate” simulated cortical column).
4. Creating models that qualitatively simulate brain subsystems or whole brains at a high level, without simulating the particular details of dynamics or I/O, but with a goal of exploring some of the overall properties of the system (e.g. Just’s 4CAPS work).
5. Creating models that demonstrate the capacity of hardware to simulate large neural models based on particular classes of equations, but without any claims about the match of the models in question to empirical neuroscience data (e.g. Modha’s “cat” simulation).
All of the above are validly called “large scale brain simulations”, yet they constitute very different forms of research. Simulations in the first and second categories are adequate to serve as components of AGI systems. Simulations in the other categories are useful for guiding neuroscience or hardware development, but are less directly useful for AGI.
Now, any one of these simulations, if advanced a little further in the right direction, could become more robustly functional and hence more clearly “AGI” rather than just computational neuroscience. But at the present time, our understanding of neuroscience isn’t quite advanced enough to guide the creation of computational neuroscience systems that actually display interesting intelligent behaviors, while still displaying high neural fidelity in their internal structures and dynamics. The bottleneck here isn’t really the computational simulation side, but more the neuroscience side – we simply haven’t yet gathered the neuroscience data needed to produce the knowledge and understanding required to drive this sort of AGI approach effectively.
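For concreteness, here is a minimal sketch (in Python, purely illustrative) of the kind of single-neuron dynamics underlying simulations like Izhikevich and Edelman’s: the Izhikevich spiking-neuron model, which trades biophysical detail for computational efficiency. The parameter values below are standard “regular spiking” settings from Izhikevich’s papers, and the constant input current is arbitrary; an actual large-scale brain simulation couples millions of such units via synapses and plasticity rules.

def izhikevich_neuron(I=10.0, T=1000.0, dt=0.5, a=0.02, b=0.2, c=-65.0, d=8.0):
    """Simulate one Izhikevich neuron for T ms under constant input current I.
    Returns the membrane-potential trace and the list of spike times (in ms)."""
    v, u = -65.0, b * -65.0            # membrane potential and recovery variable
    trace, spikes = [], []
    for step in range(int(T / dt)):
        # the two Izhikevich model equations, integrated with a simple Euler scheme
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:                  # spike: record it and reset the neuron
            spikes.append(step * dt)
            v, u = c, u + d
        trace.append(v)
    return trace, spikes

trace, spike_times = izhikevich_neuron()
print(len(spike_times), "spikes in one second of simulated time")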
Summing up, a caricature of some common attitudes for and against computational neuroscience as an approach to AGI would be:
• For: The brain is the only example we have of a system with a high level of general intelligence. So, emulating the brain is obviously the most straightforward path to achieving AGI. Neuroscience is advancing rapidly, and so is computer hardware; so, putting the two together, there’s a fairly direct path toward AGI by implementing cutting-edge neuroscience models on massively powerful hardware. Once we understand how brain-based AGIs work, we will likely then gain the knowledge to build even better systems.
• Against: Neuroscience is advancing rapidly but is still at a primitive stage; our knowledge about the brain is extremely incomplete, and we lack understanding of basic issues like how the brain learns or represents abstract knowledge. The brain’s cognitive mechanisms are well-tuned to run efficiently on neural wetware, but current computer hardware has very different properties; given a certain fixed amount of digital computing hardware, one can create vastly more intelligent systems via crafting AGI algorithms appropriate to the hardware than via trying to force algorithms optimized for neural wetware onto a very different substrate.

3.3.2.2 Artificial Life as a Route to AGI

Another potential emergentist approach to AGI is to simulate a different type of biology: not the brain, but the evolving ecosystem that gave rise to the brain in the first place. That is: to seek AGI via artificial life 12. Although Alife itself is a flourishing field, the artificial organisms created so far have been quite simplistic, more like simplified bugs or microscopic organisms than like creatures typically thought of as displaying a high level of general intelligence. Further, given the state of the art, each Alife simulation tends to reach an upper limit of complexity relatively soon; no one has yet managed to emulate the open-ended nature of biological ecosystems. Bruce Damer’s Evogrid [? ] attempts to break through this logjam directly, via a massive distributed-computing-powered use of chemistry simulations, in which evolutionary algorithms are used in an effort to evolve the best possible chemical soups; but this work is still at an early stage, though initial results are promising.
The main limitation of this approach is computational-resource-related: an ecosystem obviously requires a lot more computing resources than an individual brain or body. At present it’s unclear whether we have sufficient computational resources to realize individual human-level minds at feasible cost; simulating a whole ecosystem may be out of reach until a few more Moore’s Law doublings have occurred. This isn’t a definitive objection, however, because it may be possible to craft artificial life-forms making exquisitely efficient use of digital computers, or even of quantum computers or other radical new computing fabrics. At any rate, the Alife approach is not a major force in the AGI community at present, but it may surge as readily available computational power increases.

12 See the site of the Society for Artificial Life: http://alife.org
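The core loop of most Alife and evolutionary-computing experiments is simple, even though the simulated organisms and fitness criteria can become arbitrarily elaborate. The following is a deliberately minimal sketch of that loop; the bit-string “genomes” and the toy fitness function are placeholders, not anything used by Evogrid or any other specific Alife system.

import random

def evolve(fitness, genome_len=32, pop_size=100, generations=200, mut_rate=0.02):
    """A minimal evolutionary loop: random variation plus selection acting on a
    population of bit-string 'organisms'."""
    pop = [[random.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]              # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            child = random.choice(parents)[:]
            for i in range(genome_len):                # point mutation
                if random.random() < mut_rate:
                    child[i] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: count of 1-bits. Real Alife work replaces this with survival and
# reproduction in a simulated physical or chemical environment.
best = evolve(fitness=sum)
print(sum(best), "of 32 bits set in the best evolved genome")

The open-endedness problem mentioned above is visible even in a toy like this: once the fitness criterion is fixed, evolution converges and then stops generating novelty, whereas biological ecosystems keep inventing new niches and new selection pressures.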

3.3.2.3 Developmental Robotics

Finally, one subset of emergentist cognitive architectures that I consider particularly important is the developmental robotics architectures, focused on controlling robots without significant “hard-wiring” of knowledge or capabilities, allowing robots to learn (and learn how to learn, etc.) via their engagement with the world. A significant focus is often placed here on “intrinsic motivation,” wherein the robot explores the world guided by internal goals like novelty or curiosity, forming a model of the world as it goes along, based on the modeling requirements

implied by its goals (a toy sketch of such intrinsically motivated exploration is given at the end of this subsection). Many of the foundations of this research area were laid by Juergen Schmidhuber’s work in the 1990s [Sch91b, Sch91a, Sch95? ], but now, with more powerful computers and robots, the area is leading to more impressive practical demonstrations. I mention here a handful of illustrative initiatives in this area:
• Juyang Weng’s Dav [? ] and SAIL [? ] projects involve mobile robots that explore their environments autonomously, and learn to carry out simple tasks by building up their own world-representations through both unsupervised and teacher-driven processing of high-dimensional sensorimotor data. The underlying philosophy is based on human child development [? ], the knowledge representations involved are neural network based, and a number of novel learning algorithms are involved, especially in the area of vision processing.
• FLOWERS [? ], an initiative at the French research institute INRIA, led by Pierre-Yves Oudeyer, is also based on a principle of trying to reconstruct the processes of development of the human child’s mind, spontaneously driven by intrinsic motivations. Kaplan [? ] has taken this project in a practical direction via the creation of a “robot playroom.” Experiential language learning has also been a focus of the project [? ], driven by innovations in speech understanding.
• IM-CLEVER 13, a new European project coordinated by Gianluca Baldassarre and conducted by a large team of researchers at different institutions, is focused on creating software enabling an iCub [? ] humanoid robot to explore the environment and learn to carry out human childlike behaviors based on its own intrinsic motivations.
A caricature of some common attitudes for and against the developmental robotics approach to AGI would be:
• For: Young human children learn, mostly, by unsupervised exploration of their environment – using body and mind together to adapt to the world, with progressively increasing sophistication. This is the only way that we know of for a mind to move from ignorance and incapability to knowledge and capability.
• Against: Robots, at this stage in the development of technology, are extremely crude compared to the human body, and thus don’t provide an adequate infrastructure for mind/body learning of the sort a young human child does. Due to the early stage of robotics technology, robotics projects inevitably become preoccupied with robotics particulars, and never seem to get to the stage of addressing complex cognitive issues. Furthermore, it’s unclear whether detailed sensorimotor grounding is actually necessary in order to create an AGI doing human-level reasoning and learning.
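As promised above, here is a deliberately naive toy illustration of intrinsic motivation; it is not Schmidhuber’s or Oudeyer’s actual formulation, and the agent, its two-action world and its surprise measure are all made up for illustration. The agent keeps a crude predictive model of what each action does, and preferentially chooses the actions whose outcomes it currently predicts worst; in effect, it is rewarded by its own surprise.

import random

class CuriousAgent:
    """Naive intrinsically motivated agent: it tracks counts of action -> outcome
    and prefers actions whose outcomes it predicts worst."""

    def __init__(self, actions):
        self.actions = actions
        self.model = {a: {} for a in actions}          # action -> {outcome: count}

    def surprise(self, action):
        counts = self.model[action]
        total = sum(counts.values())
        if total == 0:
            return 1.0                                 # untried actions are maximally interesting
        return 1.0 - max(counts.values()) / total      # low when one outcome dominates

    def choose_action(self):
        best = max(self.surprise(a) for a in self.actions)
        return random.choice([a for a in self.actions if self.surprise(a) == best])

    def observe(self, action, outcome):
        self.model[action][outcome] = self.model[action].get(outcome, 0) + 1

# Toy world: action "a" always yields the same outcome (boring); "b" yields varied outcomes.
world = {"a": lambda: "x", "b": lambda: random.choice("xyz")}
agent = CuriousAgent(list(world))
for _ in range(200):
    act = agent.choose_action()
    agent.observe(act, world[act]())
print({a: sum(agent.model[a].values()) for a in world})  # "b" ends up sampled far more often

One well-known weakness of naive surprise-seeking is visible here: an irreducibly random channel stays maximally “interesting” forever. Schmidhuber’s and Oudeyer’s formulations reward learning progress, i.e. improvement in prediction, rather than raw unpredictability, precisely to avoid this trap.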

3.3.3 Hybrid AGI Architectures

In response to the complementary strengths and weaknesses of the symbolic and emergentist approaches, in recent years a number of researchers have turned to integrative, hybrid architectures, which combine subsystems operating according to the two different paradigms. The combination may be done in many different ways, e.g. connection of a large symbolic subsystem with a large subsymbolic system, or the creation of a population of small agents each of which is both symbolic and subsymbolic in nature.

13 http://im-clever.noze.it/project/project-description

Nils Nilsson expressed the motivation for hybrid AGI systems very clearly in his article at the AI-50 conference (which celebrated the 50th anniversary of the AI field) [? ]. While affirming the value of the “Physical Symbol System Hypothesis” (PSSH) that underlies classical symbolic AI, he argues that the PSSH explicitly assumes that, whenever necessary, symbols will be grounded in objects in the environment through the perceptual and effector capabilities of a physical symbol system.

Thus, he continues,

I grant the need for non-symbolic processes in some intelligent systems, but I think they supplement rather than replace symbol systems. I know of no examples of reasoning, understanding language, or generating complex plans that are best understood as being performed by systems using exclusively non-symbolic processes.... AI systems that achieve human-level intelligence will involve a combination of symbolic and non-symbolic processing.

Hybrid architectures are often designed to leverage (hypothesized or empirically observed) “whole is greater than the sum of the parts” phenomena arising when multiple components are appropriately connected. This is philosophically related to the emergence phenomena at the conceptual heart of many subsymbolic architectures. In [? ] the concept of “cognitive synergy” is formulated to capture this idea; it is conjectured that human-level AGI intrinsically depends on the synergetic interaction of multiple components (for instance, as in the CogPrime design [? ], multiple memory systems each supplied with its own learning process).
A few illustrative hybrid cognitive architectures are:
• CLARION [? ] is a hybrid architecture that combines a symbolic component for reasoning on “explicit knowledge” with a connectionist component for managing “implicit knowledge.” Learning of implicit knowledge may be done via neural nets, reinforcement learning, or other methods. The integration of symbolic and subsymbolic methods is powerful, but a great deal is still missing, such as episodic knowledge and learning, and creativity. Learning in the symbolic and subsymbolic portions is carried out separately rather than being dynamically coupled.
• CogPrime [? ], an AGI approach developed by myself and my colleagues, and being implemented within the OpenCog open source AI software platform. CogPrime integrates multiple learning algorithms associated with different memory types, using a weighted labeled hypergraph knowledge representation and making heavy use of probabilistic semantics. The various algorithms are designed to display “cognitive synergy” and work together to achieve system goals. It is currently being used to control characters in virtual worlds, and a project to use it to control humanoid robots is in the planning stage.
• DUAL [NK04] is arguably the most impressive system to come out of Marvin Minsky’s “Society of Mind” paradigm. It features a population of agents, each of which combines symbolic and connectionist representation, utilizing population-wide self-organization to collectively carry out tasks such as perception, analogy and associative memory.
• LIDA [? ] is a comprehensive cognitive architecture heavily based on Bernard Baars’ “Global Workspace Theory” [? ]. It articulates a “cognitive cycle” integrating various forms of memory and intelligent processing in a single processing loop. The architecture ties in well with both neuroscience and cognitive psychology, but it deals most thoroughly with

“lower level” aspects of intelligence; the handling of more advanced aspects like language and reasoning in LIDA has not yet been worked out in detail.
• MicroPsi [? ] is an integrative architecture based on Dietrich Dörner’s Psi model of motivation, emotion and intelligence. It has been tested on some practical control applications, and also on simulating artificial agents in a simple virtual world. MicroPsi’s basis in neuroscience and psychology is extensive and carefully drawn. Similar to LIDA, MicroPsi currently focuses on the “lower level” aspects of intelligence, not yet directly handling advanced processes like language and abstract reasoning.
• Polyscheme [? ] integrates multiple methods of representation, reasoning and inference schemes for general problem solving. Each Polyscheme “specialist” models a different aspect of the world using specific representation and inference techniques, interacting with other specialists and learning from them. Polyscheme has been used to model infant reasoning, including object identity, events, causality and spatial relations.
• Shruti [? ] is a biologically-inspired model of human reflexive inference, which uses a connectionist architecture to represent relations, types, entities and causal rules using focal clusters.
• James Albus’s 4D/RCS robotics architecture shares a great deal with some of the emergentist architectures discussed above, e.g. it has the same hierarchical pattern recognition structure as DeSTIN and HTM, and the same three cross-connected hierarchies as DeSTIN, and shares with the developmental robotics architectures a focus on real-time adaptation to the structure of the world. However, 4D/RCS is not foundationally learning-based but relies on a hard-wired architecture and algorithms, intended to mimic the qualitative structure of relevant parts of the brain (and intended to be augmented by learning, which differentiates it from emergentist approaches).
The nature of integration between components varies among the hybrid architectures. Some of them are, in essence, multiple disparate algorithms carrying out separate functions, encapsulated in black boxes and communicating results with each other. For instance, Polyscheme, ACT-R and CLARION all display this “modularity” property to a significant extent. On the other hand, architectures such as CogPrime, DUAL, Shruti, LIDA and MicroPsi feature richer integration – which makes their dynamics more challenging to understand and tune.
A caricature of some common attitudes for and against the hybrid approach to AGI would be:

• For: The brain is a complex system with multiple different parts, architected according to different principles but all working closely together; so in that sense, the brain is a hybrid system. Different aspects of intelligence work best with different representational and learning mechanisms. If one designs the different parts of a hybrid system properly, one can get the different parts to work together synergetically, each contributing its strengths to help overcome the others’ weaknesses. Biological systems tend to be messy, complex and integrative; searching for a single “algorithm of general intelligence” is an inappropriate attempt to project the aesthetics of physics or theoretical computer science into a qualitatively different domain.
• Against: Gluing together a bunch of inadequate systems isn’t going to make an adequate system. The brain uses a unified infrastructure (a neural network) for good reason; when you try to tie together qualitatively different components, you get a brittle system that can’t adapt that well, because the different components can’t work together with full flexibility. Hybrid systems are inelegant, and violate the “Occam’s Razor” heuristic.

3.3.4 The Universalist Approach to AGI

A school of AGI research that doesn’t fit neatly into any of the three categories reviewed above (symbolic, emergentist, hybrid) is what I call the “universalist approach”. In this approach, one starts with AGI algorithms that would yield incredibly powerful general intelligence if supplied with a massively, unrealistically large amount of computing power; one then tries to “scale them down,” adapting them to work using feasible computational resources. Historically, the roots of this approach may be traced to Solomonoff’s pioneering work on the theory of induction [?? ].
The paradigm case of a universalist AGI approach is Marcus Hutter’s AIXI system, which is based on the following simple concepts:
• An AGI system is going to be controlled by some program
• Instead of trying to figure out the right program via human wizardry, we can just write a “meta-algorithm” to search program space, and automatically find the right program for making the AGI smart, and then use that program to operate the AGI
• We can then repeat this meta-algorithm over and over, as the AGI gains more data about the world, so it will always have the operating program that’s best according to all its available data
Marcus Hutter [Hut05] has proved that the AIXI system, which works basically as described in the above list, would be maximally generally intelligent, if general intelligence is defined appropriately in terms of maximizing computable reward functions in computable environments (a compact statement of the AIXI decision rule is sketched at the end of this subsection). The catch is that AIXI requires infinite processing power. But there’s another version, AIXItl, that requires only an infeasibly massive finite amount of computing power. Juergen Schmidhuber’s Goedel Machine [Sch06] operates differently in detail, but the concept is similar. At each step of the way, it takes the action that it can prove, according to its axiom system and its perceptual data, will be the best way to achieve its goals. Like AIXI, this is uncomputable in the most direct formulation, and computable but probably intractable in its most straightforward simplified formulations.
These theoretical approaches suggest a research program of “scaling down from infinity”, and finding practical, scalable ways of achieving AGI using similar ideas. Some promising results have been obtained, using simplified program space search to solve various specialized problems [? ]. But whether this approach can be used for human-level AGI, with feasible resource usage, remains uncertain. It’s a gutsy strategy, setting aside particularities of the human mind and brain, and focusing on what’s viewed as the mathematical essence of general intelligence.
A caricature of some common attitudes for and against the program search approach to AGI would be:
• For: The case of AGI with massive computational resources is an idealized case of AGI, similar to assumptions like the frictionless plane in physics, or the large population size in evolutionary biology. Now that we’ve solved the AGI problem in this simplified special case, we can use the understanding we’ve gained to address more realistic cases. This way of proceeding is mathematically and intellectually rigorous, unlike the more ad hoc approaches typically taken in the field. And we’ve already shown we can scale down our theoretical approaches to handle various specialized problems.
• Against: The theoretical achievement of advanced general intelligence using infinitely much, or unrealistically much, computational resource is a mathematical game which is only minimally relevant to achieving AGI using realistic amounts of resources. In the real world, the simple “trick” of exhaustively searching program space until you find the best program for your purposes won’t get you very far. Trying to “scale down” from this simple method to something realistic isn’t going to work well, because real-world general intelligence is based on various complex, overlapping architectural mechanisms that just aren’t relevant to the massive-computational-resources situation.
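For reference, the AIXI decision rule mentioned above can be stated compactly. The rendering below is simplified, roughly following Hutter [Hut05]; conditioning on the history already observed and the treatment of the horizon are glossed over. At cycle $k$, with horizon $m$, the agent chooses

\[
a_k \;=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[\, r_k + \cdots + r_m \,\big] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
\]

where the $a_i$, $o_i$ and $r_i$ are actions, observations and rewards, $U$ is a universal Turing machine, $q$ ranges over programs interpreted as candidate environments, and $\ell(q)$ is the length of $q$ in bits. The rightmost sum is a Solomonoff-style mixture over all environment programs consistent with the agent’s interaction history, which is what makes the agent both maximally general and uncomputable; AIXItl replaces it with a time- and length-bounded search.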

3.4 Structures Underlying Human-Like General Intelligence

AGI is a very broad pursuit, not tied to the creation of systems emulating human-type general intelligence. However, if one temporarily restricts attention to AGI systems intended to vaguely emulate human functionality, then one can make significantly more intellectual progress in certain interesting directions. For example, by piecing together insights from the various architectures mentioned above, one can arrive at a rough idea regarding the main aspects that need to be addressed in creating a “human-level AGI” system. 14
I will present here my rough understanding of the key aspects of human-level AGI in a series of seven figures, each adapted from a figure used to describe (all or part of) one of the AGI approaches listed above. I will call the collection of these seven figures the “integrative diagram.” When the term “architecture” is used in the context of these figures, it refers to an abstract cognitive architecture that may be realized in hardware, software, wetware or perhaps some other way. This “integrative diagram” is not intended as a grand theoretical conclusion, but rather as a didactic overview of the key elements involved in human-level general intelligence, expressed in a way that is not extremely closely tied to any one AGI architecture or theory, but represents a fair approximation of the AGI field’s overall understanding (inasmuch as such a diverse field can be said to have a coherent “overall understanding”).
First, Figure 3.1 gives a high-level breakdown of a human-like mind into components, based on Aaron Sloman’s high-level cognitive-architectural sketch [? ]. This diagram represents, roughly speaking, “modern common sense” about the architecture of a human-like mind. The separation between structures and processes, embodied in having separate boxes for Working Memory vs. Reactive Processes, and for Long Term Memory vs. Deliberative Processes, could be viewed as somewhat artificial, since in the human brain and most AGI architectures, memory and processing are closely integrated. However, the tradition in cognitive psychology is to separate out Working Memory and Long Term Memory from the cognitive processes acting thereupon, so I have adhered to that convention. The other changes from Sloman’s diagram are the explicit inclusion of language, representing the hypothesis that language processing is handled in a somewhat special way in the human brain; and the inclusion of a reinforcement component parallel to the perception and action hierarchies, as inspired by intelligent control systems theory (e.g. Albus, as mentioned above) and deep learning theory. Of course Sloman’s high-level diagram in its original form is intended to be inclusive of language and reinforcement, but I felt it made sense to give them more emphasis.
Figure 3.2, modeling working memory and reactive processing, is essentially the LIDA diagram as given in prior papers by Stan Franklin, Bernard Baars and colleagues [?? ]. 15 The boxes in the upper left corner of the LIDA diagram pertain to sensory and motor processing, which LIDA does not handle in detail, and which are modeled more carefully by deep learning theory.

14 The material in this section is adapted from a portion of the article [? ]
15 The original LIDA diagram refers to various “codelets”, a key concept in LIDA theory. I have replaced “attention codelets” here with “attention flow”, a more generic term. I suggest one can think of an attention codelet as a piece of information indicating that it is currently pertinent to pay attention to a certain collection of items together.

Fig. 3.1: High-Level Structure of a Human-Like Mind

The bottom left corner box refers to action selection, which in the integrative diagram is modeled in more detail by Psi. The top right corner box refers to Long-Term Memory, which the integrative diagram models in more detail as a synergetic multi-memory system (Figure 3.4).
Figure 3.3, modeling motivation and action selection, is a lightly modified version of the Psi diagram from Joscha Bach’s book Principles of Synthetic Intelligence [? ]. The main difference from Psi is that in the integrative diagram the Psi motivated action framework is embedded in a larger, more complex cognitive model. Psi comes with its own theory of working and long-term memory, which is related to but different from the one given in the integrative diagram – it views the multiple memory types distinguished in the integrative diagram as emergent from a common memory substrate. Psi comes with its own theory of perception and action, which seems broadly consistent with the deep learning approach incorporated in the integrative diagram.


Fig. 3.2: Architecture of Working Memory and Reactive Processing, closely modeled on the LIDA architecture diagram

Psi’s handling of working memory lacks the detailed, explicit workflow of LIDA, though it seems broadly conceptually consistent with LIDA. In Figure 3.3, the box labeled “Other parts of working memory” is labeled “Protocol and situation memory” in the original Psi diagram. The Perception, Action Execution and Action Selection boxes have fairly similar semantics to the similarly labeled boxes in the LIDA-like Figure 3.2, so that these diagrams may be viewed as overlapping. The LIDA model doesn’t explain action selection and planning in as much detail as Psi, so the Psi-like Figure 3.3 could be viewed as an elaboration of the action-selection portion of the LIDA-like Figure 3.2. In Psi, reinforcement is considered part of the learning process involved in action selection and planning; in Figure 3.3 an explicit “reinforcement box” has been added to the original Psi diagram, to emphasize this.
Figure 3.4, modeling long-term memory and deliberative processing, is derived from my own prior work studying the “cognitive synergy” between different cognitive processes associated with different types of memory, and seeking to embody this synergy in the OpenCog system. The division into types of memory is fairly standard in the cognitive science field. Declarative, procedural, episodic and sensorimotor memory are routinely distinguished; we like to distinguish attentional memory and intentional (goal) memory as well, and view these as the interface between long-term memory and the mind’s global control systems. One focus of our AGI design work has been on designing learning algorithms, corresponding to these various types of memory, that interact with each other in a synergetic way [? ], helping each other to overcome their intrinsic combinatorial explosions. There is significant evidence that these various types of long-term memory are differently implemented in the brain, but the degree of structural and dynamical commonality underlying these different implementations remains unclear [? ].

Fig. 3.3: Architecture of Motivated Action

Each of these long-term memory types has its analogue in working memory as well. In some cognitive models, the working-memory and long-term-memory versions of a memory type, and the corresponding cognitive processes, are basically the same thing. OpenCog is mostly like this – it implements working memory as a subset of long-term memory consisting of items with particularly high importance values. The distinctive nature of working memory is enforced via using slightly different dynamical equations to update the importance values of items with importance above a certain threshold. On the other hand, many cognitive models treat working and long-term memory as more distinct than this, and there is evidence for significant functional and anatomical distinctness in the brain in some cases. So for the purpose of the integrative diagram, it seemed best to leave working and long-term memory subcomponents as parallel but distinguished.

Fig. 3.4: Architecture of Long-Term Memory and Deliberative and Metacognitive Thinking

Figure 3.4 may be interpreted to encompass both workaday deliberative thinking and metacognition (“thinking about thinking”), under the hypothesis that in human beings and human-like minds, metacognitive thinking is carried out using basically the same processes as plain ordinary deliberative thinking, perhaps with various tweaks optimizing them for thinking about thinking. If it turns out that humans have, say, a special kind of reasoning faculty exclusively for metacognition, then the diagram would need to be modified. Modeling of self and others is understood to occur via a combination of metacognition and deliberative thinking, as well as via implicit adaptation based on reactive processing.
Figure 3.5 models perception, according to the concept of deep learning [?? ]. Vision and audition are modeled as deep learning hierarchies, with bottom-up and top-down dynamics. The lower layers in each hierarchy refer to more localized patterns recognized in, and abstracted from, sensory data. Output from these hierarchies to the rest of the mind is not just through the top layers, but via some sort of sampling from various layers, with a bias toward the top layers. The different hierarchies cross-connect, and are hence to an extent dynamically coupled together. It is also recognized that there are some sensory modalities that aren’t strongly hierarchical, e.g. touch and smell (the latter being better modeled as something like an asymmetric Hopfield net, prone to frequent chaotic dynamics [? ]) – these may also cross-connect with each other and with the more hierarchical perceptual subnetworks. Of course the suggested architecture could include any number of sensory modalities; the diagram is restricted to four just for simplicity.
The self-organized patterns in the upper layers of perceptual hierarchies may become quite complex and may develop advanced cognitive capabilities like episodic memory, reasoning, language learning, etc. A pure deep learning approach to intelligence argues that all the aspects of intelligence emerge from this kind of dynamics (among perceptual, action and reinforcement hierarchies). My own view is that the heterogeneity of human brain architecture argues against this perspective, and that deep learning systems are probably better as models of perception and action than of general cognition.

Fig. 3.5: Architecture for Multimodal Perception

However, the integrative diagram is not committed to my perspective on this – a deep-learning theorist could accept the integrative diagram, but argue that all the other portions besides the perceptual, action and reinforcement hierarchies should be viewed as descriptions of phenomena that emerge in these hierarchies due to their interaction.
Figure 3.6 shows an action subsystem and a reinforcement subsystem, parallel to the perception subsystem. Two action hierarchies, one for an arm and one for a leg, are shown for concreteness, but of course the architecture is intended to be extended more broadly. In the hierarchy corresponding to an arm, for example, the lowest level would contain control patterns corresponding to individual joints, the next level up to groupings of joints (like fingers), and the next level up to larger parts of the arm (hand, elbow). The different hierarchies corresponding to different body parts cross-link, enabling coordination among body parts; and they also connect at multiple levels to perception hierarchies, enabling sensorimotor coordination. Finally there is a module for motor planning, which links tightly with all the motor hierarchies, and also overlaps with the more cognitive, inferential planning activities of the mind, in a manner that is modeled in different ways by different theorists. Albus [? ] has developed this kind of hierarchy in considerable detail.
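To make the hierarchical picture of Figures 3.5 and 3.6 slightly more concrete, here is a purely schematic sketch of two cross-connected hierarchies with bottom-up and top-down flows. It is not DeSTIN, HTM or any other specific architecture; the layer sizes, random weights and blending constants are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

class Hierarchy:
    """One perceptual (or motor) hierarchy: a stack of layers connected by bottom-up weights.
    Layer activations are stored so that top-down and lateral signals can modulate them."""
    def __init__(self, layer_sizes):
        self.W = [rng.normal(scale=0.1, size=(m, n))        # bottom-up weights, layer i -> i+1
                  for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
        self.acts = [np.zeros(n) for n in layer_sizes]

    def bottom_up(self, x):
        self.acts[0] = x
        for i, W in enumerate(self.W):                      # feedforward pass, layer by layer
            self.acts[i + 1] = np.tanh(W @ self.acts[i])

    def top_down(self, strength=0.3):
        # crude top-down modulation: each layer is nudged toward a "prediction"
        # projected down from the layer above (transpose of the bottom-up weights)
        for i in reversed(range(len(self.W))):
            prediction = np.tanh(self.W[i].T @ self.acts[i + 1])
            self.acts[i] = (1 - strength) * self.acts[i] + strength * prediction

def cross_link(h1, h2, strength=0.2):
    """Lateral coupling between two hierarchies at levels where the layer sizes match,
    blending their activations (e.g. vision with audition, or perception with action)."""
    for i, (a, b) in enumerate(zip(h1.acts, h2.acts)):
        if a.shape == b.shape:
            mixed = 0.5 * (a + b)
            h1.acts[i] = (1 - strength) * a + strength * mixed
            h2.acts[i] = (1 - strength) * b + strength * mixed

# Two toy modalities processing random "sensory" input, coupled at their upper layers.
vision = Hierarchy([64, 32, 16])
audition = Hierarchy([40, 32, 16])
for _ in range(5):                        # a few settling iterations per "perceptual moment"
    vision.bottom_up(rng.normal(size=64))
    audition.bottom_up(rng.normal(size=40))
    cross_link(vision, audition)
    vision.top_down()
    audition.top_down()
print(vision.acts[-1][:4])                # top-layer activations, available to the rest of the system

Everything interesting in real systems (learning of the weights, temporal dynamics, attention, reinforcement) is omitted here; the point is only the pattern of information flow: up each hierarchy, down each hierarchy, and laterally between them.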

Fig. 3.6: Architecture for Action and Reinforcement

The reinforcement hierarchy in Figure 3.6 provides reinforcement to actions at various levels of the hierarchy, and includes dynamics for propagating information about reinforcement up and down the hierarchy.
Figure 3.7 deals with language, treating it as a special case of coupled perception and action. The traditional architecture of a computational language comprehension system is a pipeline [? ? ], which is equivalent to a hierarchy with the lowest-level linguistic features (e.g. sounds, words) at the bottom, the highest-level features (semantic abstractions) at the top, and syntactic features in the middle. Feedback connections enable semantic and cognitive modulation of lower-level linguistic processing. Similarly, language generation is commonly modeled hierarchically, with the top levels being the ideas needing verbalization, and the bottom level corresponding to the actual sentence produced.

Fig. 3.7: Architecture for Language Processing

In generation the primary flow is top-down, with bottom-up flow providing modulation of abstract concepts by linguistic surface forms.
This completes the posited, rough integrative architecture diagram for human-like general intelligence, split among seven different pictures, formed by judiciously merging together architecture diagrams produced by a number of cognitive theorists with different, overlapping foci and research paradigms. One may wonder: Is anything critical left out of the diagram? A quick perusal of the table of contents of cognitive psychology textbooks suggests that if anything major is left out, it’s also unknown to current cognitive psychology. However, one could certainly make an argument for explicit inclusion of certain other aspects of intelligence that in the integrative diagram are left as implicit emergent phenomena. For instance, creativity is obviously very important to intelligence, but there is no “creativity” box in any of these diagrams – because in our view, and the view of the cognitive theorists whose work we’ve directly drawn on here, creativity is best viewed as a process emergent from other processes that are explicitly included in the diagrams.
A high-level “cognitive architecture diagram” like this is certainly not a design for an AGI. Rather, it is more like a pointer in the direction of a requirements specification. These are, to a rough approximation, the aspects that must be taken into account by anyone who wants to create a human-level AGI; and this is how these aspects appear to interact in the human mind. Different AGI approaches may account for these aspects and their interactions in different ways – e.g. via explicitly encoding them, or creating a system from which they can emerge, etc.

3.5 Metrics and Environments for Human-Level AGI

Science hinges on measurement; so if AGI is a scientific pursuit, it must be possible to measure what it means to achieve it.

Given the variety of approaches to AGI, it is hardly surprising that there are also multiple approaches to quantifying and measuring the achievement of AGI. However, things get a little simpler if one restricts attention to the subproblem of creating “human-level” AGI. When one talks about AGI beyond the human level, or AGI that is very qualitatively different from human intelligence, then the measurement issue becomes very abstract – one basically has to choose a mathematical measure of general intelligence, and adopt it as a measure of success. This is a meaningful approach, yet also worrisome, because it’s difficult to tell, at this stage, what relation any of the existing mathematical measures of general intelligence is going to have to practical systems. When one talks about human-level AGI, however, the measurement problem gets a lot more concrete: one can use tests designed to measure human performance, or tests designed relative to human behavior. The measurement issue then decomposes into two subproblems: quantifying achievement of the goal of human-level AGI, and measuring incremental progress toward that goal. The former subproblem turns out to be considerably more straightforward.

3.5.1 Metrics and Environments

The issue of metrics is closely tied up with the issue of “environments” for AGI systems. For AGI systems that are agents interacting with some environment, any method of measuring the general intelligence of these agents will involve the particulars of the AGI systems’ environments. If an AGI is implemented to control video game characters, then its intelligence must be measured in the video game context. If an AGI is built with solely a textual user interface, then its intelligence must be measured purely via conversation, without measuring, for example, visual pattern recognition.
And the importance of environments for AGI goes beyond the value of metrics. Even if one doesn’t care about quantitatively comparing two AGI systems, it may still be instructive to qualitatively observe the different ways they face similar situations in the same environment. Using multiple AGI systems in the same environment also increases the odds of code-sharing and concept-sharing between different systems. It makes it easier to conceptually compare what different systems are doing and how they’re working.
It is often useful to think in terms of “scenarios” for AGI systems, where a “scenario” means an environment plus a set of tasks defined in that environment, plus a set of metrics to measure performance on those tasks. At this stage, it is unrealistic to expect all AGI researchers to agree to conduct their research relative to the same scenario. The early-stage manifestations of different AGI approaches tend to fit naturally with different sorts of environments and tasks. However, to whatever extent it is sensible for multiple AGI projects to share common environments or scenarios, this sort of cooperation should be avidly pursued.

3.5.2 Quantifying the Milestone of Human-Level AGI

A variety of metrics, relative to various different environments, may be used to measure achievement of the goal of “human-level AGI.” Examples include:

• the classic Turing Test, conceived as (roughly) “fooling a panel of college-educated human judges, during a one hour long interrogation, that one is a human being” [? ] (and see [HF95?? ] for discussions of some of the test’s weaknesses).
• the Virtual World Turing Test, occurring in an online virtual world, where the AGI and the human are each controlling avatars (this is inclusive of the standard Turing Test if one assumes the avatars can use language) [? ].
• Shane Legg’s AIQ measure [? ], which is a computationally practical approximation to the algorithmic information theory based formalization of general intelligence given in [? ] (see the formula sketched immediately after this list). Work by Hernandez-Orallo and Dowe pursues a similar concept with different technical details [? ].
• Text compression – the idea being that any algorithm capable of understanding text should be transformable into an algorithm for compressing text based on the patterns it recognizes therein. This is the basis of the Hutter Prize [? ], a cash prize funded by Marcus Hutter which rewards data compression improvements on a specific 100 MB English text file, consisting of the first 100,000,000 characters of a certain version of English Wikipedia (a toy illustration of compression-based scoring is given at the end of this subsection).
• the Online University Student Test, where an AGI has to obtain a college degree at an online university, carrying out the same communications with the professors and the other students as a human student would (including choosing its curriculum, etc.) [? ].
• the Robot University Student Test, where an AGI has to obtain a college degree at a physical university, carrying out the same communications with the professors and the other students as a human student would, and also moving about the campus and handling relevant physical objects in a manner sufficient to complete the coursework [? ].
• the Artificial Scientist Test, where an AGI must carry out high-quality, original scientific research, including choosing the research problem, reading the relevant literature, and writing and publishing the paper (this may be refined to a Nobel Prize Test, where the AGI has to do original scientific research that wins a Nobel Prize) [? ].
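To give a flavor of the formalization behind the AIQ measure mentioned above, the universal intelligence of an agent $\pi$ is, roughly following Legg and Hutter (with the conditions on the environment class and the discounting of rewards omitted here), a complexity-weighted sum of its expected performance over all computable environments:

\[
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi
\]

where $E$ is a class of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_\mu^\pi$ is the expected cumulative reward that $\pi$ obtains in $\mu$. The quantity is uncomputable as stated; AIQ approximates it by sampling environments as short programs for a simple reference machine and estimating the rewards empirically.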

Each of these approaches has its pluses and minuses. None of them can sensibly be considered a necessary condition for human-level intelligence, but any of them may plausibly be considered a sufficient condition. The latter three have the disadvantage that they may not be achievable by every human – so they may set the bar a little too high. The former two have the disadvantage of requiring AGI systems to imitate humans, rather than just honestly being themselves; and it may be that accurately imitating humans, when one does not have a human body or experience, requires significantly greater than human-level intelligence.
Regardless of the practical shortcomings of the above measures, though, I believe they are basically adequate as precisiations of “what it means to achieve human-level general intelligence.”
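As a toy illustration of the compression-based scoring idea referenced in the list above, the sketch below rates a text file by how tightly it can be compressed, using an off-the-shelf compressor (zlib) as a stand-in for a learned predictive model. The Hutter Prize involves far stronger, purpose-built compressors, and the file name in the usage comment is just a placeholder.

import zlib

def compression_score(path, level=9):
    """Return (original_bytes, compressed_bytes, bits per character) for a text file.
    A better predictive model of the text yields fewer bits per character."""
    with open(path, "rb") as f:
        data = f.read()
    compressed = zlib.compress(data, level)
    return len(data), len(compressed), 8.0 * len(compressed) / len(data)

# Example usage (assuming a large English text file is available locally):
# original, packed, bpc = compression_score("enwik8")
# print(original, "->", packed, "bytes,", round(bpc, 2), "bits/char")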

3.5.3 Measuring Incremental Progress Toward Human-Level AGI

While postulating criteria for assessing achievement of full human-level general intelligence seems relatively straightforward, positing good tests for intermediate progress toward the goal of human-level AGI seems much more difficult. That is: it is not clear how to effectively measure whether one is, say, 50 percent of the way to human-level AGI – or, say, 75 or 25 percent of the way.

What I have found via a long series of discussions on this topic with a variety of AGI researchers is that:
• It’s possible to pose many “practical tests” of incremental progress toward human-level AGI, with the property that if a proto-AGI system passes the test using a certain sort of architecture and/or dynamics, then this implies a certain amount of progress toward human-level AGI based on particular theoretical assumptions about AGI.
• However, in each case of such a practical test, it seems intuitively likely to a significant percentage of AGI researchers that there is some way to “game” the test via designing a system specifically oriented toward passing that test, and which doesn’t constitute dramatic progress toward AGI.
A series of practical tests of this nature was discussed and developed at a 2009 gathering at the University of Tennessee, Knoxville, called the “AGI Roadmap Workshop,” which led to an article in AI Magazine titled Mapping the Landscape of Artificial General Intelligence [? ]. Among the tests discussed there were:

• The Wozniak “coffee test” 16: go into an average American house and figure out how to make coffee, including identifying the coffee machine, figuring out what the buttons do, finding the coffee in the cabinet, etc.
• Story understanding – reading a story, or watching it on video, and then answering questions about what happened (including questions at various levels of abstraction)
• Passing the elementary school reading curriculum (which involves reading and answering questions about some picture books as well as purely textual ones)
• Learning to play an arbitrary video game based on experience only, or based on experience plus reading instructions
• Passing child psychologists’ typical evaluations aimed at judging whether a human preschool student is normally intellectually capable

16 The Wozniak coffee test, suggested by J. Storrs Hall, is so named due to a remark by Apple co-founder Steve Wozniak, to the effect that no robot will ever be able to go into a random American house and make a cup of coffee.

One thing we found at the AGI Roadmap Workshop was that each of these tests seems to some AGI researchers to encapsulate the crux of the AGI problem, and to be unsolvable by any system not far along the path to human-level AGI – yet seems to other AGI researchers, with different conceptual perspectives, to be something probably game-able by narrow-AI methods. And of course, given the current state of science, there’s no way to tell which of these practical tests really can be solved via a narrow-AI approach, except by having a lot of researchers and engineers try really hard over a long period of time.

3.5.3.1 Metrics Assessing Generality of Machine Learning Capability

Complementing the above tests, which are heavily inspired by everyday human life, there are also some more computer-science-oriented evaluation paradigms aimed at assessing AI systems beyond specific tasks. For instance, there is a literature on “multitask learning,” where the goal for an AI is to learn one task more quickly given another task solved previously [TM95? ? ]. There is a literature on “shaping,” where the idea is to build up the capability of an AI

by training it on progressively more difficult versions of the same tasks [LD03, LLL06] (a minimal illustration of the shaping idea is sketched at the end of this subsection). Also, Achler [? ] has proposed criteria measuring the “flexibility of recognition” and posited this as a key measure of progress toward AGI.
While we applaud the work done in these areas, we also note it is an open question whether exploring these sorts of processes using mathematical abstractions, or in the domain of various machine-learning or robotics test problems, is capable of adequately addressing the problem of AGI. The potential problem with this kind of approach is that generalization among tasks, or from simpler to more difficult versions of the same task, is a process whose nature may depend strongly on the overall nature of the set of tasks and task-versions involved. Real-world, humanly relevant tasks have a subtlety of interconnectedness and developmental course that is not captured in current mathematical learning frameworks or standard AI test problems. To put it a little differently, it is possible that all of the following hold:
• the universe of real-world human tasks may possess a host of “special statistical properties” that have implications regarding what sorts of AI programs will be most suitable;
• exploring and formalizing and generalizing these statistical properties is an important research area; however,
• an easier and more reliable approach to AGI testing is to create a testing environment that embodies these properties implicitly, via constituting an emulation of the most cognitively meaningful aspects of the real-world human learning environment.
Another way to think about these issues is to contrast the above-mentioned “AGI Roadmap Workshop” ideas with the “General Game Player” (GGP) AI competition, in which AIs seek to learn to play games based on formal descriptions of the rules 17. Clearly doing GGP well requires powerful AGI; and doing GGP even mediocrely probably requires robust multitask learning and shaping. But it is unclear whether GGP constitutes a good approach to testing early-stage AI programs aimed at roughly humanlike intelligence. This is because, unlike the tasks involved in, say, making coffee in an arbitrary house, or succeeding in preschool or university, the tasks involved in doing simple instances of GGP seem to have little relationship to humanlike intelligence or real-world human tasks.
So, an important open question is whether the class of statistical biases present in the set of real-world human environments and tasks has some sort of generalizable relevance to AGI beyond the scope of human-like general intelligence, or is informative only about the particularities of human-like intelligence. Currently we seem to lack any solid, broadly accepted theoretical framework for resolving this sort of question.
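As a minimal illustration of the shaping idea referenced above, the curriculum loop below trains a learner on progressively harder task versions, advancing only once performance on the current level passes a threshold. The learner and the notion of task difficulty are entirely made up for illustration, not drawn from any benchmark in the literature.

import random

class ToyLearner:
    """Stand-in learner whose skill grows with practice; success on a task of
    difficulty d becomes more likely as skill/d grows."""
    def __init__(self):
        self.skill = 0.5
    def attempt(self, difficulty):
        success = random.random() < min(1.0, self.skill / difficulty)
        self.skill += 0.01            # practice improves skill regardless of outcome
        return success

def train_with_shaping(learner, levels=5, threshold=0.8, batch=100):
    """Shaping / curriculum loop: advance to a harder task version only once the
    success rate on the current level passes the threshold."""
    level = 1
    while level <= levels:
        successes = sum(learner.attempt(difficulty=level) for _ in range(batch))
        rate = successes / batch
        print("level", level, "success rate", round(rate, 2))
        if rate >= threshold:
            level += 1
    return learner

train_with_shaping(ToyLearner())

The loop itself is trivial; the hard question, as argued above, is whether graduated task families of this kind capture anything essential about the richly interconnected tasks a human-like intelligence must face.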

3.5.3.2 Why is Measuring Incremental Progress Toward AGI So Hard?

A question raised by these various observations is whether there is some fundamental reason why it’s hard to make an objective, theory-independent measure of intermediate progress toward advanced AGI, which respects the environment- and task-biased nature of human intelligence as well as the mathematical generality of the AGI concept. Is it just that we haven’t been smart enough to figure out the right test – or is there some conceptual reason why the very notion of such a test is problematic?

17 http://games.stanford.edu/

Why might a solid, objective empirical test for intermediate progress toward humanly meaningful AGI be such a difficult project? One possible reason could be the phenomenon of “cognitive synergy” briefly noted above. Under this hypothesis, it might be, for instance, that there are 10 critical components required for a human-level AGI system. Having all 10 of them in place results in human-level AGI, but having only 8 of them in place results in a dramatically impaired system – and maybe having only 6 or 7 of them in place results in a system that can hardly do anything at all.
Of course, the reality is not as strict as the simplified example in the above paragraph suggests. No AGI theorist has really posited a list of 10 crisply-defined subsystems and claimed them necessary and sufficient for AGI. We suspect there are many different routes to AGI, involving integration of different sorts of subsystems. However, if the cognitive synergy hypothesis is correct, then human-level AGI behaves roughly as the simplistic example in the prior paragraph suggests. Perhaps instead of using the 10 components, you could achieve human-level AGI with 7 components, but having only 5 of these 7 would yield drastically impaired functionality – etc. To mathematically formalize the cognitive synergy hypothesis becomes complex, but here we’re only aiming for a qualitative argument. So for illustrative purposes, we’ll stick with the “10 components” example, just for communicative simplicity.
Next, let’s additionally suppose that for any given task, there are ways to achieve this task using a system that is much simpler than any subset of size 6 drawn from the set of 10 components needed for human-level AGI, but works much better for the task than this subset of 6 components (assuming the latter are used as a set of only 6 components, without the other 4 components). Note that this additional supposition is a good bit stronger than mere cognitive synergy. For lack of a better name, I have called this hypothesis “tricky cognitive synergy” [? ]. Tricky cognitive synergy would be the case if, for example, the following possibilities were true:
• creating components to serve as parts of a synergetic AGI is harder than creating components intended to serve as parts of simpler AI systems without synergetic dynamics
• components capable of serving as parts of a synergetic AGI are necessarily more complicated than components intended to serve as parts of simpler AI systems
These certainly seem reasonable possibilities, since to serve as a component of a synergetic AGI system, a component must have the internal flexibility to usefully handle interactions with a lot of other components as well as to solve the problems that come its way. If tricky cognitive synergy holds up as a property of human-level general intelligence, the difficulty of formulating tests for intermediate progress toward human-level AGI follows as a consequence: according to the tricky cognitive synergy hypothesis, any such test is going to be more easily solved by some simpler narrow-AI process than by a partially complete human-level AGI system.
At the current stage in the development of AGI, we don’t really know how big a role “tricky cognitive synergy” plays in general intelligence. Quite possibly, 5 or 10 years from now someone will have developed wonderfully precise and practical metrics for the evaluation of incremental progress toward human-level AGI.
At the current stage in the development of AGI, we don’t really know how big a role “tricky cognitive synergy” plays in general intelligence. Quite possibly, 5 or 10 years from now someone will have developed wonderfully precise and practical metrics for the evaluation of incremental progress toward human-level AGI. However, it’s worth carefully considering the possibility that fundamental obstacles, tied to the nature of general intelligence, stand in the way.

3.6 What Would a General Theory of General Intelligence Look Like?

While most approaches to creating AGI are theoretically motivated in one way or another, nobody would claim there currently exists a thorough and systematic theory of AGI in the same sense that there exist theories of, say, sorting algorithms, respiration, genetics, or near-equilibrium thermodynamics. Current AGI theory is a patchwork of overlapping concepts, frameworks and hypotheses, often synergetic and sometimes mutually contradictory. Current AGI system designs are usually inspired by theories, but do not have all their particulars derived from theories.

The creation of an adequate theoretical foundation for AGI is far beyond the scope of this chapter; however, it does seem worthwhile to briefly comment on what we may hope to get out of such a theory once it has been developed. In other words: what might a general theory of general intelligence look like? Some of the things AGI researchers would like to do with such a theory are:

• Given a description of a set of goals and environments (and perhaps a probability distribution over these), and a set of computational resource restrictions, determine the system architecture that will display the maximum general intelligence relative to those goals and environments, subject to the given restrictions
• Given a description of a system architecture, determine the goals and environments with respect to which it will reach a relatively high level of general intelligence
• Given an intelligent system architecture, determine what sort of subjective experience the system will likely report having, in various contexts
• Given a set of subjective experiences and associated environments, determine what sort of intelligent system will likely have those experiences in those environments
• Find a practical way to synthesize a general-intelligence test appropriate for a given class of reasonably similar intelligent systems
• Identify the implicit representations of abstract concepts arising within emergentist, hybrid, program-learning-based or other non-wholly-symbolic intelligent systems
• Given a certain intelligent system in a certain environment, predict the likely course of development of that system as it learns, experiences and grows
• Given a set of behavioral constraints (for instance, ethical constraints), estimate the odds that a given system will obey the constraints under certain assumptions about its environment
• Determine architectures that, consistent with given computational resource constraints, provide an optimal balance between general intelligence for specified goals and environments, and adherence to given behavioral constraints
• Identify the key structures and dynamics required for an AGI system to achieve human-level, human-like general intelligence within feasible computational resources
• Predict the consequences of releasing an AGI into the world, depending on its level of intelligence and the specifics of its design
• Determine methods of assessing the ethical character of an AGI system, both in its current form and in future incarnations likely to develop from its current form (for discussion of various issues regarding the ethics of advanced AGI see [???? ])

Anyone familiar with the current state of AGI research will find it hard to suppress a smile at this ambitious list of objectives.
At the moment we would seem very far from having a theoretical understanding capable of thoroughly addressing any of these points in a practically useful way. It is unclear how far the limits of mathematics and computing will allow us to progress toward theoretical goals such as these. However, the further we can get in this direction, the better off the AGI field will be.

At the moment, AGI system design is as much artistic as scientific, relying heavily on the designer’s scientific intuition. AGI implementation and testing are interwoven with (more or less) inspired tinkering, in which systems are progressively improved internally as their behaviors are observed in various situations. This sort of approach is not unworkable, and many great inventions have been created via similar processes. It is unclear how necessary or useful a more advanced AGI theory will be for the creation of practical AGI systems. But it seems likely that the further we can get toward a theory providing tools to address questions like those listed above, the more systematic and scientific the AGI design process will become, and the more capable the resulting systems.

It is possible that a thorough, rigorous theory of AGI will emerge from the mind of some genius AGI researcher in one fell swoop – or from the mind of one of the early AGI successes itself! However, it appears more probable that the emergence of such a theory will be a gradual process, in which theoretical and experimental developments progress hand in hand.

3.7 Conclusion

Given the state of the art in AGI research today, what can we say about the core AGI hypothesis? Is it actually the case that creating generally intelligent systems requires fundamentally different concepts and approaches than creating more specialized, “narrow AI” systems? Is there a deep necessity for considering “AGI” as its own distinctive pursuit?

Personally, I am confident the answer to this question is “yes.” However, setting aside intuition and looking only at the available relevant science and engineering results, I would have to say that the jury is still out. The narrow AI approach has not led to dramatic progress toward AGI goals; but at the present time, the achievements of researchers explicitly working toward AGI (myself included) have also been relatively modest. There exist a number of theoretical frameworks explaining why AGI is profoundly distinct from narrow AI; but none of these frameworks can be considered thoroughly empirically validated.

The next question, then, is: what is being done – and what should be done – to further explore the core AGI hypothesis, and move toward its verification or falsification? It seems that to move the AGI field rapidly forward, one of the two following things must happen:

• the emergence, within the AGI community, of a broadly accepted theory of general intelligence – including a characterization of what it is, and a theory of what sorts of architecture can be expected to work for achieving human-level AGI using realistic computational resources; or
• the demonstration of an AGI system that qualitatively appears, to both novice and expert observers, to display a dramatic and considerable degree of general intelligence. For instance: a robot that can do a variety of preschool-type activities in a flexible and adaptive way; or a chatbot that can hold an hour’s conversation without sounding insane or resorting to repeating catch-phrases.

Neither of these occurrences would rigorously prove the core AGI hypothesis. However, either of them would build confidence in it: in the first case because there would be a coherent and broadly accepted theory implying the core AGI hypothesis; in the second case because we would have a practical demonstration that an AGI perspective has in fact worked better for creating AGI than a narrow AI approach.

These are still early days for AGI; yet, given the reality of exponential technological advance [? ], this does not necessarily imply that dramatic success is a long way off. There is a real possibility of dramatic, interlinked progress in AGI design, engineering, evaluation and theory in the relatively near future – in the next few decades, and potentially even the next few years. No one can accurately predict the course of development of any research area; but it is interesting that in a survey of researchers at the AGI-2010 conference, the majority of respondents felt that human-level AGI was likely to arise before 2050, and some were much more optimistic [? ]. Optimism regarding the near advent of advanced AGI is controversial, but it is a position held by an increasing plurality of the AGI community, who are working hard to make their hopes and projections a reality.

Chapter 4
Mapping the Landscape of AGI

Ben Goertzel, Sam Adams, Itamar Arel, Joscha Bach, Robert Coop, Rod Furlan, J. Storrs Hall, Alexei Samsonovich, Matthias Scheutz, Matthew Schlesinger, Stuart C. Shapiro and John Sowa

Abstract This chapter presents the broad outlines of a roadmap toward human-level AGI. An initial capability landscape is presented, drawing on major themes from developmental psychology and illuminated by mathematical, physiological and information processing perspectives. The challenge of identifying appropriate tasks and environments for measuring AGI is addressed, and seven scenarios are presented as milestones suggesting a roadmap across the AGI landscape, along with directions for ongoing research and collaboration.

4.1 Introduction

This chapter is based on an “AGI Roadmap Workshop” that Itamar Arel and Ben Goertzel organized at the University of Tennessee in 2009, with the goal of arriving at some measure of common understanding, among a variety of AGI researchers, regarding the most sensible ways to make and assess incremental progress toward AGI. The content here is mainly a subset of that given in the paper “Mapping the Landscape of Human-Level Artificial General Intelligence”, published in AI Magazine in 2011 [? ]. Some of the ideas also trace back to discussions held during two workshops on “Evaluation and Metrics for Human Level AI” organized by John Laird and Pat Langley (one in Ann Arbor in late 2008, and one in Tempe in early 2009). Some of the conclusions of the Ann Arbor workshop were recorded in [LWML09]. Inspiration was also obtained from discussion at the “Future of AGI” post-conference workshop of the AGI-09 conference, triggered by Itamar Arel’s presentation on the “AGI Roadmap” theme, and from an earlier article on AGI road-mapping by Arel and Livingston [? ].

Laird and Wray [LWML09] identified two significant challenges related to AGI: “...one of the best ways to refine and extend these sets of requirements and characteristics is to develop agents using cognitive architectures that test the sufficiency and necessity of all these and other possible characteristics and requirements on a variety of real-world tasks. One challenge is to find tasks and environments where all of these characteristics are active, and thus all of the requirements must be confronted. A second challenge is that the existence of an architecture that achieves a subset of these requirements, does not guarantee that such an architecture can be extended to achieve other requirements while maintaining satisfaction of the original set of requirements.”


Our 2009 AGI Road-mapping workshop took up the first challenge of finding appropriate tasks and environments to assess AGI systems, while we felt the second challenge was more appropriately handled by individual research efforts. We also added a third challenge, that of defining the landscape of AGI, in service to the AGI road-mapping effort that has been underway for several years.

4.2 Mapping the Landscape of AGI

There was much discussion, both in preparation for and throughout our AGI Roadmap Workshop, on the process of developing a roadmap. A traditional highway roadmap shows multiple driving routes across a landscape of cities and towns, natural features like rivers and mountains, and political features like state and national borders. A technology roadmap typically shows a single progression of developmental milestones from a known starting point to a desired result.

Our first challenge in defining a roadmap for achieving AGI was that we initially had neither a well-defined starting point nor a commonly agreed upon target result. The history of both AI and AGI is replete with this problem, which is somewhat understandable given the breadth and depth of the subjects of both human intelligence and computer technology. We made progress by borrowing more metaphors from the highway roadmap, deciding to first define the landscape for AGI and then populate that landscape with milestones that may be traversed via multiple routes.

The final destination, full human-level artificial general intelligence, encompasses a system that could learn to replicate (and possibly exceed) human-level performance across the full breadth of cognitive and intellectual abilities. The starting point, however, is more problematic, since there are many current approaches to achieving AGI that assume different initial states. One broad commonality among workshop participants was interest in a developmental approach to thinking about the roadmap, following human cognitive development from birth through adulthood.

4.3 Tasks and Environments for Early-Stage AGI Systems

The first challenge we address is that of defining appropriate tasks and environments for assessing progress in AGI, where all of the key characteristics of an AGI system are active (for instance, all the requirements laid out by Laird and Wray in [LWML09], as explicated in Chapter ?? above). The usefulness of any task or environment for this purpose is critically dependent on its providing a basis for comparison between alternative approaches and architectures. With this in mind, we believe that both tasks and environments must be designed or specified with some knowledge of each other. For example, consider the DARPA Grand Challenge competition for developing an automobile capable of driving itself across a roughly specified desert course. While much of the architecture and even implementation done for this competition was useful for the later competition focused on autonomous city driving, it was also clear that the different environment required significant reconsideration of the tasks and subtasks themselves.

Tasks, therefore, require a context to be useful, and environments, including the AGI’s embodiment itself, must be considered in tandem with the task definition. Using this reasoning, we decided to define scenarios that combine both tasks and their necessary environments.

Further, since we are considering multiple scenarios as well as multiple approaches and architectures, it is also important to be able to compare and connect tasks belonging to different scenarios. With this in mind, we chose to proceed by first articulating a rough heuristic list of human intelligence competencies. As a rule of thumb, we suggest, tasks may be conceived as ways to assess competencies within environments. However, contemporary cognitive science does not give us adequate guidance to formulate anything close to a complete, rigid and refined list of competencies; so what we present here must be frankly classified as an intuitive approach for thinking about task generation, rather than a rigorous analytical methodology from which tasks can be derived.

4.3.1 Environments and Embodiments for AGI

General intelligence in humans develops within the context of a human body, complete with many thousands of input sensors and output effectors, which is itself situated within the context of a richly reactive environment, complete with many other humans, some of whom may be caregivers, teachers, collaborators or adversaries. Human perceptual input ranges from skin sensors reacting to temperature and pressure, smell and taste sensors in the nose and mouth, and sound and light sensors in the ears and eyes, to internal proprioceptive sensors that provide information on gravitational direction and body joint orientation, among others. Human output effectors include the many ways we can impact the physical states of our environment, like producing body movements via muscles, or producing sound waves via body motions or vocal-cord vibrations.

One of the distinctive challenges of achieving AGI is thus how to account for a comparably rich sensory/motor experience. As reviewed in Chapter ??, the history of AI is replete with various attempts at robotic embodiment, and virtual embodiment has also been utilized, exploiting software virtual environments that simulate bodies in environments.

There are many challenges in determining an appropriate level of body sophistication and environmental richness in designing either physical or virtual bodies and environments. Regrettably, nearly every AI or AGI project that has included situated embodiment has done so in some unique, incompatible and typically non-repeatable form. A notable exception has been the various leagues of the RoboCup competition, where a commercially available hardware robotic platform such as the Sony AIBO or Aldebaran Nao is specified, with contestants only allowed to customize the software of the robotic players. (Unfortunately, the limited nature of the RoboCup task, a modified game of soccer, makes it unsuitable as a benchmark for AGI.)

4.3.2 The Breadth of Human Competencies

Regardless of the particular environment or embodiment chosen, there are, intuitively speaking, certain competencies that an AI system must possess to be considered a full-fledged Human-Level AGI. For instance, even if an AI system could answer questions involving previously known entities and concepts vastly more intelligently than humans, if it could not create new concepts when the situation called for it, we would be reluctant to classify it as Human Level.

Fig. 4.1: Nao robots participating in a RoboCup competition

While articulation of a precise list of the competencies characterizing human-level general intelligence is beyond the scope of current science (and definitely beyond the scope of this chapter!), it is nevertheless useful to explore the space of competencies, with a goal of providing heuristic guidance for the generation of tasks for AGI systems acting in environments. The following subsection lists a broad set of competencies that we explored at the roadmapping workshop: 14 high-level competency areas, with a few key competency sub-areas corresponding to each. A thorough discussion of any one of these sub-areas would involve a lengthy review paper and dozens of references, but for present purposes an evocative list will suffice.

We consider it important to think about competencies in this manner, because otherwise it is too easy to pair a testing environment with an overly limited set of tasks biased toward the limitations of that environment. What we are advocating is not any particular competency list, but rather the approach of exploring a diverse range of competency areas, and then generating tasks that evaluate the manifestation of one or more articulated competency areas within specified environments.

Some of the competencies listed may appear intuitively simpler and “earlier-stage” than others, at least by reference to human cognitive development. Theories of human cognitive development each contain their own hypotheses regarding the order in which developing humans achieve various competencies. However, after careful consideration we have concluded that the ordering of competencies doesn’t need to be part of an AGI capability roadmap. In many cases, different existing AGI approaches pursue the same sets of competencies in radically different orders, so imposing any particular ordering on competencies would be tantamount to assigning greater validity to particular AGI approaches. For example, developmental robotics and deep-learning-oriented AGI approaches often assume that perceptual and motor competencies should be developed prior to linguistic and inferential ones. On the other hand, logic-focused AGI approaches are often more naturally pursued by beginning with linguistic and inferential competencies, and moving to perception and actuation in rich environments only afterwards. By dealing with scenarios, competencies and tasks, but not specifying an ordering of competencies or tasks, we are able to present a capability roadmap that is friendly to all these approaches and more.

4.3.2.1 Key Competencies Characterizing Human-Level General Intelligence

In this subsection we give a fairly comprehensive list of the competencies that we feel AI systems should be expected to display, in one or more of the scenarios described below, in order to be considered full-fledged “human-level AGI” systems. These competency areas have been assembled somewhat opportunistically via a review of the cognitive and developmental psychology literature as well as the scope of the current AI field. We are not claiming this as a precise or exhaustive list of the competencies characterizing human-level general intelligence, and will be happy to accept additions to the list, mergers of existing list items, and so forth. What we are advocating is not this specific list, but rather the approach of enumerating competency areas, and then generating tasks by combining competency areas with scenarios.

We also give, with each competency, an example task illustrating the competency. The tasks are expressed in the context of one particular scenario – “robot preschool” – for concreteness, but they all apply to a virtual preschool scenario as well. In the following section we will discuss several other AGI learning/testing scenarios, and the imaginative reader will be able to formulate similar tasks for each competency in the context of each of these. Of course, these tasks are only individual examples; ideally, to teach an AGI in a structured way within a particular scenario, one would like to

• associate several tasks with each competency
• present each task in a graded way, with multiple subtasks of increasing complexity
• associate a quantitative metric with each task

However, the briefer treatment given here should suffice to give a sense for how the competencies manifest themselves practically in the AGI Preschool context.

1. Perception
• Vision: image and scene analysis and understanding
  – Example task: When the teacher points to an object in the preschool, the robot should be able to identify the object and (if it’s a multi-part object) its major parts. If it can’t perform the identification initially, it can approach the object and manipulate it before making its identification.
• Hearing: identifying the sounds associated with common objects; understanding which sounds come from which sources in a noisy environment
  – Example task: When the teacher covers the robot’s eyes and then makes a noise with an object, the robot should be able to guess what the object is
• Touch: identifying common objects and carrying out common actions using touch alone
  – Example task: With its eyes and ears covered, the robot should be able to identify some object by manipulating it, and carry out some simple behaviors (say, putting a block on a table) via touch alone
• Crossmodal: integrating information from various senses
  – Example task: Identifying an object in a noisy, dim environment by combining visual and auditory information
• Proprioception: sensing and understanding what its body is doing
  – Example task: The teacher moves the robot’s body into a certain configuration. The robot is asked to restore its body to an ordinary standing position, and then repeat the configuration that the teacher moved it into.

2. Actuation
• Physical skills: manipulating familiar and unfamiliar objects
  – Example task: Manipulate blocks based on imitating the teacher: e.g. pile two blocks atop each other, lay three blocks in a row, etc.
• Tool use, including the flexible use of ordinary objects as tools
  – Example task: Use a stick to poke a ball out of a corner that the robot cannot directly reach
• Navigation, including in complex and dynamic environments
  – Example task: Find its own way to a named object or person through a crowded room with people walking in it and objects lying on the floor

3. Memory
• Declarative: noticing, observing and recalling facts about its environment and experience
  – Example task: If certain people habitually carry certain objects, the robot should remember this (allowing it to know how to find the objects when the relevant people are present, even much later)
• Behavioral: remembering how to carry out actions
  – Example task: If the robot is taught some skill (say, to fetch a ball), it should remember this much later
• Episodic: remembering significant, potentially useful incidents from its life history
  – Example task: Ask the robot about events that occurred at times when it got particularly large, or particularly small, amounts of reward for its actions; it should be able to answer simple questions about these with significantly more accuracy than about events occurring at random times

4. Learning
• Imitation: spontaneously adopt new behaviors that it sees others carrying out
  – Example task: Learn to build towers of blocks by watching people do it
• Reinforcement: learn new behaviors from positive and/or negative reinforcement signals, delivered by teachers and/or the environment
  – Example task: Learn which box the red ball tends to be kept in, by repeatedly trying to find it, noticing where it is, and getting rewarded when it finds it correctly
• Imitation/Reinforcement
  – Example task: Learn to play “fetch”, “tag” and “follow the leader” by watching people play them, and getting reinforced on correct behavior
• Interactive verbal instruction
  – Example task: Learn to build a particular structure of blocks faster based on a combination of imitation, reinforcement and verbal instruction than by imitation and reinforcement without verbal instruction
• Written media
  – Example task: Learn to build a structure of blocks by looking at a series of diagrams showing the structure in various stages of completion
• Learning via experimentation
  – Example task: Ask the robot to slide blocks down a ramp held at different angles. Then ask it to make a block slide fast, and see if it has learned how to hold the ramp to make a block slide fast.

5. Reasoning
• Deduction, from uncertain premises observed in the world
  – Example task: If Ben more often picks up red balls than blue balls, and Ben is given a choice of a red block or a blue block to pick up, which is he more likely to pick up?
• Induction, from uncertain premises observed in the world
  – Example task: If Ben comes into the lab every weekday morning, then is Ben likely to come to the lab today (a weekday) in the morning?
• Abduction, from uncertain premises observed in the world
  – Example task: If women more often give the robot food than men, and then someone of unidentified gender gives the robot food, is this person a man or a woman?
• Causal reasoning, from uncertain premises observed in the world
  – Example task: If the robot knows that knocking down Ben’s tower of blocks makes him angry, then what will it say when asked whether kicking the ball at Ben’s tower of blocks will make Ben mad?
• Physical reasoning, based on observed “fuzzy rules” of naive physics
  – Example task: Given two balls (one rigid and one compressible) and two tunnels (one significantly wider than the balls, one slightly narrower than the balls), can the robot guess which balls will fit through which tunnels?
• Associational reasoning, based on observed spatiotemporal associations
  – Example task: If Ruiting is normally seen near Shuo, then if the robot knows where Shuo is, that is where it should look when asked to find Ruiting

6. Planning
• Tactical
  – Example task: The robot is asked to bring the red ball to the teacher, but the red ball is in a corner where the robot can’t reach it without a tool like a stick. The robot knows a stick is in the cabinet, so it goes to the cabinet, opens the door, gets the stick, uses the stick to get the red ball, and then brings the red ball to the teacher.
• Strategic
  – Example task: Suppose that Matt comes to the lab infrequently, but when he does come he is very happy to see new objects he hasn’t seen before (and suppose the robot likes to see Matt happy). Then when the robot gets a new object Matt has not seen before, it should put it away in a drawer and be sure not to lose it or let anyone take it, so it can show Matt the object the next time Matt arrives.
• Physical
  – Example task: To pick up a cup with a handle that is lying on its side in a position where the handle can’t be grabbed, the robot turns the cup into the right position and then picks up the cup by the handle
• Social
  – Example task: The robot is given the job of building a tower of blocks by the end of the day, and it knows Ben is the most likely person to help it. It also knows that Ben is more likely to say “yes” to helping when he is alone, and less likely to say “yes” if he’s asked too many times, because Ben doesn’t like being nagged. So the robot waits to ask Ben until Ben is alone in the lab.

7. Attention
• Visual attention within its observations of its environment
  – Example task: The robot should be able to look at a scene (a configuration of objects in front of it in the preschool) and identify the key objects in the scene and their relationships
• Social attention
  – Example task: The robot is having a conversation with Itamar, which is giving the robot reward (for instance, by teaching the robot useful information). Conversations with other individuals in the room have not been so rewarding recently. But Itamar keeps getting distracted during the conversation, by talking to other people or playing with his cellphone. The robot needs to know to keep paying attention to Itamar even through the distractions.
• Behavioral attention
  – Example task: The robot is trying to navigate to the other side of a crowded room full of dynamic objects, and many interesting things keep happening around the room. The robot needs to largely ignore the interesting things and focus on the movements that are important for its navigation task.

8. Motivation
• Subgoal creation, based on its preprogrammed goals and its reasoning and planning
  – Example task: Given the goal of pleasing Hugo, can the robot learn that telling Hugo facts it has learned but not told Hugo before will tend to make Hugo happy?
• Affect-based motivation
  – Example task: Given the goal of gratifying its curiosity, can the robot figure out that when someone it’s never seen before has come into the preschool, it should watch them, because they are more likely to do something new?
• Control of emotions
  – Example task: When the robot is very curious about someone new, but is in the middle of learning something from its teacher (whom it wants to please), can it control its curiosity and keep paying attention to the teacher?

9. Emotion
• Expressing emotion
  – Example task: Cassio steals the robot’s toy, but Ben gives it back to the robot. The robot should appropriately display anger at Cassio, and gratitude to Ben.
• Understanding emotion
  – Example task: Cassio and the robot are both building towers of blocks. Ben points at Cassio’s tower and expresses happiness. The robot should understand that Ben is happy with Cassio’s tower.

10. Modeling Self and Other
• Self-awareness
  – Example task: When someone asks the robot to perform an act it can’t do (say, reaching an object in a very high place), it should say so. When the robot is given the chance to get an equal reward for a task it can complete only occasionally, versus a task it finds easy, it should choose the easier one.
• Theory of mind
  – Example task: While Cassio is in the room, Ben puts the red ball in the red box. Then Cassio leaves and Ben moves the red ball to the blue box. Cassio returns and Ben asks him to get the red ball. The robot is asked to go to the place Cassio is about to go.

• Self-control
  – Example task: Nasty people come into the lab, knock down the robot’s towers, and tell the robot he’s a bad boy. The robot needs to set these experiences aside, and not let them impair its self-model significantly; it needs to keep on thinking it’s a good robot, and keep building towers (that its teachers will reward it for).
• Other-awareness
  – Example task: If Ben asks Cassio to carry out a task that the robot knows Cassio cannot do or does not like to do, the robot should be aware of this, and should bet that Cassio will not do it.
• Empathy
  – Example task: If Itamar is happy because Ben likes his tower of blocks, or upset because his tower of blocks is knocked down, the robot is asked to identify and then display these same emotions

11. Social Interaction
• Appropriate social behavior
  – Example task: The robot should learn to clean up and put away its toys when it’s done playing with them
• Social communication
  – Example task: The robot should greet new human entrants into the lab, but if it knows the new entrants very well and it’s busy, it may eschew the greeting
• Social inference about simple social relationships
  – Example task: The robot should infer that Cassio and Ben are friends because they often enter the lab together, and often talk to each other while they are there
• Group play at loosely organized activities
  – Example task: The robot should be able to participate in “informally kicking a ball around” with a few people, or in informally and collaboratively building a structure with blocks

12. Communication
• Gestural communication to achieve goals and express emotions
  – Example task: If the robot is asked where the red ball is, it should be able to show by pointing its hand or finger
• Verbal communication using English in its life-context
  – Example tasks: Answering simple questions, responding to simple commands, describing its state and observations with simple statements
• Pictorial communication regarding objects and scenes it is familiar with
  – Example task: The robot should be able to draw a crude picture of a certain tower of blocks, so that e.g. the picture looks different for a very tall tower and a wide low one
• Language acquisition
  – Example task: The robot should be able to learn new words or names via people uttering the words while pointing at objects exemplifying the words or names
• Cross-modal communication
  – Example task: If told to “touch Bob’s knee” but the robot doesn’t know what a knee is, being shown a picture of a person with the knee pointed out should help it figure out how to touch Bob’s knee

13. Quantitative
• Counting sets of objects in its environment
  – Example task: The robot should be able to count small (homogeneous or heterogeneous) sets of objects
• Simple, grounded arithmetic with small numbers
  – Example task: Learning simple facts about the sums of integers under 10 via teaching, reinforcement and imitation
• Comparison of observed entities regarding quantitative properties
  – Example task: Ability to answer questions about which object or person is bigger or taller
• Measurement using simple, appropriate tools
  – Example task: Use of a yardstick to measure how long something is

14. Building/Creation
• Physical: creative constructive play with objects
  – Example task: Ability to build novel, interesting structures from blocks
• Conceptual invention: concept formation
  – Example task: Given a new category of objects introduced into the lab (e.g. hats, or pets), the robot should create a new internal concept for the new category, and be able to make judgments involving the category (e.g. if Ben particularly likes pets, it should notice this after it has identified “pets” as a category)
• Verbal invention
  – Example task: Ability to coin a new word or phrase to describe a new object (e.g. the way Alex the parrot coined “bad cherry” to refer to a tomato)
• Social
  – Example task: If the robot wants to play a certain activity (say, practicing soccer), it should be able to gather others around to play with it

4.3.3 Supporting Diverse Concurrent Research Efforts

To balance the AGI community’s need for collaborative progress with support for a wide range of independent research and development efforts, we must allow for concurrent activity in many regions of the landscape. This requires that many of the unsolved “lower-level” aspects of general intelligence, such as visual perception and fine-grained motor control, be “finessed” in some of our scenarios, replaced by an approximate high-level implementation or even a human surrogate, until we have sufficiently rich underlying systems as a common base for our research. It is sometimes argued that since human intelligence emerges and develops within a complex body richly endowed with innate capabilities, AGI may only be achieved by following the same path in a similar embodiment. But until we have the understanding and capability to construct such a system, our best hope for progress is to foster concurrent but ultimately collaborative work.

4.4 Scenarios for Assessing AGI

In order to create a roadmap toward Human-Level AGI, one must begin with one or more particular “scenarios.” By a “scenario” we mean a combination of tasks set within a specified environment, together with a set of assumptions pertaining to the way the AGI system interacts with the environment and the inanimate entities and other agents therein. Given a scenario as a basis, one can then talk about particular subtasks and their ordering and performance evaluation.

A wide variety of scenarios may be posited as relevant to Human-Level AGI; here we review seven that were identified by one or more participants in the AGI Roadmap Workshop as being particularly interesting. Each of these scenarios could be described in much more detail than we have done here; this section constitutes a high-level overview, and more detail on many of the scenarios is provided in the references.

4.4.1 General Video-game Learning

This scenario addresses many of the challenges discussed above at once: providing a standard, easily accessible situated embodiment for AGI research; a gradient of ever-increasing sensory/motor requirements to support gradual development, measurement and comparison of AGI capabilities; a wide range of differing environments to test generality of learning ability; and a compelling, easily evaluated demonstration of an AGI’s capability (or lack thereof) by non-specialists, based on widely shared human experience of playing various video games.

The goal of this scenario would not be human-level performance at any single video game, but the ability to learn and succeed at a wide range of video games, including new games unknown to the AGI developers before the competition. The contestant system would be limited to a sensory/motor interface to the game, such as video and audio output and controller input, and would be blocked from any access to the internal programming or states of the game implementation. To provide for motivation and performance feedback during the game, the normal scoring output would be mapped to a standard hedonic (pain/pleasure) interface for the contestant. Contestants would have to learn the nature of the game through experimentation and observation, by manipulating game controls and observing the results. The scores against a preprogrammed opponent or storyline would provide a standard measure of achievement, along with the time taken to learn and win each kind of game.

Fig. 4.2: Example Tasks and Task Families Corresponding to Various Scenarios, Addressing Human General Intelligence Competencies

The range of video games used for testing in this scenario could be open-ended in both simplicity and sophistication. Even early games like Pong might prove too challenging a starting point, so simpler games may be selected or developed. Since success at most video games would require some level of visual intelligence, General Video-game Learning would also provide a good test of computer vision techniques, ranging from simple 2D object identification and tracking in Pong to full 3D perception and recognition in a game like Halo or Half-Life at the highest levels of performance. Various genres of video games, from the early side-scrolling Mario Brothers games, to first-person shooters like Doom, to flight simulations like the Star Wars X-Wing series, provide rich niches where different researchers could focus and excel, while the common interface would still allow application of learned skills to other genres.

Fig. 4.3: Example Tasks and Task Families Corresponding to Various Scenarios, Addressing Human General Intelligence Competencies, continued

Among other things, effectively playing many video games demands a notable degree of strategic thinking: the ability to map situations to actions while considering not just the short-term but also the longer-term implications of choosing an action. Such capability, often associated with the notion of the credit assignment problem, is one that remains to be demonstrated in a scalable way, and General Video-game Learning provides an effective platform for such a demonstration.

This scenario has the added benefit of supporting the growth of a research community focused on the specification and development of appropriate tests for AGI. Such a community would create the video game tests, probably in collaboration with the original game vendors, and would both administer the tests and report the results via an internet forum similar to the SPEC-style system performance reporting sites.
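As a sketch of the kind of interface constraint described above (the class and function names here are our own invention, not part of any existing competition framework), the contestant would see only pixels, audio and a hedonic reward, and would act only through controller inputs, with the same interface reused for every game:

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Observation:
    """What the contestant is allowed to perceive: raw audiovisual output plus a reward signal."""
    pixels: List[List[int]]      # one screen frame, e.g. rows of grayscale values
    audio: List[float]           # one chunk of audio samples
    hedonic_reward: float        # change in game score, mapped to a pain/pleasure signal

class GameEnvironment(ABC):
    """Wrapper around an unmodified video game; the agent never sees internal game state."""
    @abstractmethod
    def reset(self) -> Observation: ...
    @abstractmethod
    def step(self, controller_input: Dict[str, float]) -> Observation: ...

class GeneralGameAgent(ABC):
    """Interface a contestant system would implement, identical across all games."""
    @abstractmethod
    def act(self, observation: Observation) -> Dict[str, float]:
        """Map the latest observation to controller input (buttons, joystick axes)."""

def run_episode(agent: GeneralGameAgent, env: GameEnvironment, max_steps: int = 10_000) -> float:
    """Play one previously unseen game; the only score reported is accumulated hedonic reward."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs = env.step(agent.act(obs))
        total += obs.hedonic_reward
    return total

Because nothing game-specific crosses this boundary, a high total across many unseen games would be evidence of learning ability rather than of built-in knowledge of particular titles.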

4.4.2 Preschool Learning

In the spirit of the popular book “All I Really Need to Know I Learned in Kindergarten” [Ful89], it is appealing to consider early childhood education such as kindergarten or preschool as inspiration for scenarios for teaching and testing Human-Level AGI systems. The details of this scenario are fleshed out in [GB09]. 4.4 Scenarios for Assessing AGI 101

This idea has two obvious variants: a physical preschool-like setting involving an AI-controlled robot, and a virtual-world preschool involving an AI-controlled virtual agent. The goal in such scenarios is not to precisely imitate human child behavior, but rather to use AI to control a robot or virtual agent that qualitatively displays cognitive behaviors similar to those of a young human child. In fact this sort of idea has a long and venerable history in the AI field: Alan Turing’s original 1950 paper on AI, where he proposed the “Turing Test”, contains the suggestion that “Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s?” [Tur50]

This “childlike cognition” based approach seems promising for many reasons, including its integrative nature: what a young child does involves a combination of perception, actuation, linguistic and pictorial communication, social interaction, conceptual problem solving and creative imagination. Human intelligence develops in response to the demands of richly interactive environments, and a preschool is specifically designed to be a richly interactive environment with the capability to stimulate diverse mental growth. The richness of the preschool environment suggests that significant value is added by the robotics-based approach; but a lot can also potentially be done by stretching the boundaries of current virtual world technology. Another advantage of focusing on childlike cognition is that child psychologists have created a variety of instruments for measuring child intelligence. So in a preschool context, one can present one’s AI system with variants of tasks typically used to measure the intelligence of young human children.

It doesn’t necessarily make sense to outfit a virtual or robot preschool as a precise imitation of a human preschool; this would be inappropriate, since a contemporary robotic or virtual body has rather different capabilities than that of a young human child. The aim in constructing an AGI preschool environment should rather be to emulate the basic diversity and educational character of a typical human preschool. To imitate the general character of a human preschool, we suggest creating several centers in a virtual or robot preschool. The precise architecture will be adapted via experience, but initial centers might be, for instance:

• a blocks center: a table with blocks of various shapes and sizes on it
• a language center: a circle of chairs, intended for people to sit around and talk with the AI
• a manipulatives center, with a variety of different objects of different shapes and sizes, intended to teach visual and motor skills
• a ball play center, where balls are kept in chests and there is space for the AI to kick or throw the balls around
• a dramatics center, where the AI can observe and enact various movements

4.4.3 Reading Comprehension

The next scenario is closely related to the previous one, but doesn’t require embodiment in a virtual world, and makes a special commitment as to the type of curriculum involved. In this scenario, an aspiring AGI should work through the grade school reading curriculum, and take and pass the assessments normally used to assess the progress of human children. This requires the obvious: understanding a natural language (NL) text and answering questions about it. However, it also requires some not-so-obvious abilities.

Very early readers are picture books that tightly integrate the pictures with the text. In some, the story is mostly conveyed through the pictures. In order to understand the story, the pictures must be understood as well as the NL text. This requires recognizing the characters and what the characters are doing. Reference resolution is required between characters and events mentioned in the text and illustrated in the pictures. The actions that the characters are performing must be recognized from “snapshot” poses, unlike the more usual action recognition from a sequence of frames taken from a video. The next stage of readers consists of “early chapter books,” which use pictures to expand on the text. Although the story is now mainly advanced through the text, reference resolution with the pictures is still important for understanding.

Instructors of reading recognize “four roles of a reader as: meaning maker, code breaker, text user, and text critic... meaning makers read to understand... comprehension questions [explore] literal comprehension, inferential comprehension, and critical thinking” [? ]. Code breakers translate written text to speech and vice versa. Text users identify whether the text is fiction or nonfiction. “If the book is factual, they focus on reading for information. If a text is fiction, they read to understand the plot, characters, and message of the story.... Text critics evaluate the author’s purpose and the author’s decisions about how the information is presented ... check for social and cultural fairness ... look for misinformation ... and think about their own response to the book and whether the book is the best it might be” [? ]. The roles of meaning maker, code breaker, and text user are mostly, though not entirely, familiar to people working in Natural Language Understanding. However, the role of text critic is new, requiring meta-level reasoning about the quality of the text, its author’s intentions, the relation of the text to the reader’s society and culture, and the reader’s own reaction to the text, rather than merely reasoning about the information in the text.

This scenario fulfills most of the Laird/Wray criteria handily, and a few less obviously. The AGI’s environment in this scenario is not dynamic in a direct sense (C2), but the AGI does have to reason about a dynamic environment to fulfill the tasks. In a sense, the tasks involved are fixed rather than novel (C5), but they are novel to the AGI as it proceeds through them. Other agents impact task performance (C4) if group exercises are involved in the curriculum (which indeed is sometimes the case). Many of the specific abilities needed for this scenario are discussed in the next scenario, on scene and story comprehension.

4.4.4 Story/Scene Comprehension

The next scenario, focused on scene and story comprehension, shares something with the preschool and reading comprehension scenarios. Like the reading curriculum scenario, it focuses on a subset of child-like performance, but a different subset, involving a broader variety of engagements with the world. “Scene comprehension” here does not mean only illustrations, but real-world scenes, which can be presented at different granularities, in different media and at different difficulties (cartoons, movies, or theatrical performances, for instance). This approach differs from the reading curriculum scenario in that it more directly provides a dynamic environment. If group exercises are included, then all the Laird/Wray criteria are fulfilled in a direct and obvious way.

The story/scene comprehension scenario focuses on the relationship between perception and natural language. Here, the system might be presented with a story, which it needs to analyze semantically and re-present in a different form (for instance, a movie sequence, or a re-telling). In the other direction, the system is shown a movie or cartoon sequence, which it needs to comprehend and re-present using language. This approach lends itself to a variety of standardizable test scenarios, which allow the direct comparison of competing AGI architectures with each other, and with child performance.

Note that these tests ignore the interaction of the system and the environment, and the particular needs of the system itself. There is a lot to the argument that general intelligence might only be attained by a motivated, interacting system, but perhaps not enough to introduce this notion as a prerequisite to the test. Rather, we might want to compare the test performance of motivated, interactive architectures with “merely” processing-oriented architectures. The performances of story and scene comprehension are probably not independent but related, since the tasks focus on modeling the relationships between perception, language and thought, which are not unidirectional. Further tasks might focus on the interplay between these areas, by combining expression and comprehension. This could be done either in a completely interactive way (i.e., by designing a system that has to respond to a teacher), or in a “self-interactive” way (two agents learn to communicate with each other while sharing an environment).

4.4.5 School Learning

The “virtual school student” scenario continues the virtual preschool scenario discussed above, but is focused on higher cognitive abilities, assuming that, if necessary, lower-level skills will be finessed. In particular, it is assumed that the entire interface with the agent is implemented at a symbolic level: the agent is not required to process a video stream, to recognize speech and gestures, to balance its body and avoid obstacles while moving in space, etc. All this can be added as parts of the challenge, but part of the concept underlying the scenario is that it can also be finessed. On the other hand, it is critical to the scenario for the agent to make academic progress at a human student level, to understand human minds, and to understand and make practical use of class-related social relations in the environment in which it is embedded.

In this scenario, the agent is embedded in a real high school classroom by means of a virtual-reality-based interface. The agent lives in a symbolic virtual world that is continuously displayed on a big screen in the classroom. The virtual world includes a virtual classroom represented at a symbolic (object) level, including the human instructor and human students represented by simplistic avatars. The agent itself is represented by an avatar in this virtual classroom. The symbolic virtual world is “synchronized” with the real physical world with the assistance of intelligent monitoring and recording equipment performing scene analysis, speech recognition, language comprehension, gesture recognition, etc. (if necessary, some or all of these functions will be performed by hidden human personnel running the test; students should not be aware of their existence). The study material, including the textbook and other curriculum materials available to each student, will be encoded electronically and made available to the agent at a symbolic level.

The second part of the challenge means that the agent will be evaluated not only on its learning and problem-solving performance, but also on its approach to problem solving and on its interactions with students and with the instructor. Here the general metrics for self-regulated learning can be used [WP00]. In addition, the social performance of the agent can be evaluated based on surveys of students and using standard psychological metrics. Another, potentially practically important measure is the effect of the agent’s presence in the classroom on student learning.
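To illustrate what “represented at a symbolic (object) level” could mean in practice, the sketch below shows a purely hypothetical event format (our own invention, not something specified at the workshop) by which classroom happenings might be streamed to the agent instead of raw audio and video:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ClassroomEvent:
    """One symbolic event extracted (by software, or by hidden human operators) from the real classroom."""
    time: float                  # seconds since the start of the lesson
    actor: str                   # e.g. "instructor", "student_07", "agent"
    action: str                  # e.g. "says", "writes_on_board", "raises_hand", "points_at"
    content: str = ""            # transcribed speech or written text, if any
    attributes: Dict[str, str] = field(default_factory=dict)  # extra detail, e.g. {"target": "agent"}

# A fragment of the event stream the agent might receive during a lesson:
lesson_stream: List[ClassroomEvent] = [
    ClassroomEvent(12.0, "instructor", "says", "Who can state the Pythagorean theorem?"),
    ClassroomEvent(15.5, "student_03", "raises_hand"),
    ClassroomEvent(21.0, "instructor", "points_at", attributes={"target": "agent"}),
    ClassroomEvent(23.0, "agent", "says", "In a right triangle, a squared plus b squared equals c squared."),
]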

4.4.6 The Wozniak Test

In an interview a few years ago [WM07], Steve Wozniak of Apple Computer fame expressed doubt that there would ever be a robot that could walk into an unfamiliar house and make a cup of coffee. We feel that the task is demanding enough to stand as a “Turing Test” equivalent for embodied AGI. Note that the Wozniak Test is a single, special case of Nils Nilsson’s general “Employment Test” for Human-Level AI [? ].

A robot is placed at the door of a typical house or apartment. It must find a doorbell or knocker, or simply knock on the door. When the door is answered, it must explain itself to the householder and enter once it has been invited in. (We will assume that the householder has agreed to allow the test in her house, but is otherwise completely unconnected with the team doing the experiment, and indeed has no special knowledge of AI or robotics at all.) The robot must enter the house, find the kitchen, locate the local coffee-making supplies and equipment, make coffee to the householder’s taste, and serve it in some other room. It is allowed, indeed required by some of the specifics, for the robot to ask questions of the householder, but it may not be physically assisted in any way.

The state of the robotics art falls short of this capability in a number of ways. The robot will need to use vision to navigate, identify objects, possibly identify gestures (“the coffee’s in that cabinet over there”), and coordinate complex manipulations. Manipulation and physical modeling in a tight feedback learning loop may be necessary, for example, to pour coffee from an unfamiliar pot into an unfamiliar cup. Speech recognition and natural language understanding and generation will be necessary. Planning must be done at a host of levels, ranging from manipulator paths to coffee brewing sequences.

But the major advance for a coffee-making robot is that all of these capabilities must be coordinated and used appropriately and coherently in aid of the overall goal. The usual set-up, task definition, and so forth are gone from standard narrow AI formulations of problems in all these areas; the robot has to find the problems as well as to solve them. That makes coffee-making in an unfamiliar house a strenuous test of a system’s adaptivity and ability to deploy common sense. Although standard shortcuts might be used, such as having a database of every manufactured coffeemaker built in, it would be prohibitive to have the actual manipulation sequences for each one pre-programmed, especially given the variability in workspace geometry, dispensers and containers of coffee grounds, and so forth. Transfer learning, generalization, reasoning by analogy, and in particular learning from example and practice are almost certain to be necessary for the system to be practical.

Coffee-making is a task that most 10-year-old humans can do reliably with a modicum of experience. A week’s worth of being shown and practicing coffee making in a variety of homes, with a variety of methods, would provide the grounding for enough generality that a 10-year-old could make coffee in the vast majority of homes in a Wozniak Test. Another advantage of this test is that it would be extremely difficult to “game” or cheat, since the only reasonably economical way to approach the task would be to build general learning skills and have a robot that is capable of learning not only to make coffee but to carry out any similar domestic chore.

4.4.7 Remaining Whitespace

In our view, the suggested scenarios represent a significant step in populating the AGI Landscape, but many alternative scenarios are conceivable, and might even cover areas that we have covered incompletely or not mentioned at all. Among the areas that might need more attention are:

• Aesthetic appreciation and performance: composition, literature, artistic portrayal, dance, etc. Examining the appreciation of aesthetics is bound to increase our understanding of the motivational system, of mental representation, and of meta-cognition. Aesthetic performance adds manipulation, creation and adaptation of physical and imagined objects, aspects of sociality, and much more.
• Structured social interaction: goal adoption, collaboration, competition and exploitation, negotiation, discourse and joint decision making, group organization, leadership and so on. Scenarios in the social domain will require representations and assessment of social setups, the mental states of individuals, and possible ramifications for the agent's own goals.
• Skills that require high cognitive function and integration, for instance complex rescue scenarios, shopping, and assistance. Each of these task areas calls for complex coordination between interaction, exploration, evaluation and planning.

Thus, we encourage and invite suggestions for additional scenarios, as long as these are firmly rooted in an AGI approach, i.e., each scenario should foster our understanding of general, human-like intelligence, and not bolster narrow engineering solutions to a limited task.

4.5 From Scenarios to Tasks, Metrics and Challenges

The articulation of scenarios and competency areas, as we have done, makes it possible to articulate specific tasks for assessing progress toward AGI in a principled manner. For each of the scenarios reviewed above (or other analogous scenarios), one may articulate a specific task-set, where each task addresses one or more of the competency areas in the context of the scenario. To constitute a reasonable approximation of an AGI test suite or overall "AGI Challenge," the total set of tasks for a scenario must cover all the competency areas. Each task must also be associated with some particular performance metric – quantitative wherever possible, but perhaps qualitative in some cases depending on the nature of the task.

The obvious risk of an AGI evaluation approach based on a long list of tasks is that it is susceptible to solution by a "big switch statement" type approach, in which separate narrowly-specialized programs corresponding to the individual tasks are combined together in a simplistic harness. Some might argue that this is a feature not a bug, because they may believe that human intelligence is well approximated by a big switch statement of this sort. However, we take a slightly different perspective (which doesn't rule out the possibility that human intelligence is a big switch statement, but also doesn't focus on this possibility).

Also, the competency areas alluded to above include many that focus on learning and generalization. So if a "big switch" statement is used to handle tasks based on these requirements, the switch will be choosing between specialized programs that, in themselves, can handle a great deal of learning.
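To make the requirement that "the total set of tasks for a scenario must cover all the competency areas" concrete, here is one trivial way such a mapping could be represented and checked, in Python. The task names and competency labels are invented placeholders, not the actual lists discussed in this chapter.

# Hypothetical illustration: each task is tagged with the competency areas it exercises.
TASKS = {
    "summarize_textbook_chapter": {"language comprehension", "memory", "abstraction"},
    "solve_word_problems":        {"reasoning", "quantitative skill", "language comprehension"},
    "group_project_negotiation":  {"social interaction", "planning", "communication"},
}

COMPETENCY_AREAS = {
    "language comprehension", "memory", "abstraction", "reasoning",
    "quantitative skill", "social interaction", "planning", "communication",
}

def uncovered_areas(tasks, areas):
    """Return the competency areas not touched by any task in the task-set."""
    covered = set().union(*tasks.values())
    return areas - covered

print(uncovered_areas(TASKS, COMPETENCY_AREAS))   # empty only if the task-set covers everything

Note that a "big switch" system could still pass such a coverage check; the check ensures breadth of the task-set, not generality of the solver, which is exactly the limitation discussed above.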
4.5.1 Example Tasks
We have already given some example tasks above, in the context of the AGI preschool scenario. For the sake of brief further illustration, Tables ?? and ?? roughly describe a handful of example tasks and task families corresponding to a few of the other scenarios described above. Much of the work of fleshing out the rough roadmap described here into a more precise one would involve systematically creating a substantial list of tasks corresponding to the chosen scenarios, using an exploratory enumeration of human general intelligence competencies as a guide. Each task would then need to be associated with quantitative performance metrics. Some of the examples in these tables are highly specific, others are broader task families. Our feeling is that it will generally be better to identify moderately broad task families rather than highly specific tasks, in order to encourage the creation of systems with more robust and flexible functionality.

4.5.2 Multidimensional Challenges

Summing up, what then is the right sort of challenge to present to a would-be AGI system? We suggest it is: to address a few dozen closely interrelated, moderately broadly-defined tasks in an environment drawn from a moderately broad class (e.g. an arbitrary preschool or textbook rather than specific ones determined in advance). If the environment is suitably rich and the tasks are drawn to reflect the spectrum of human general intelligence competencies, then this sort of challenge will motivate the development of genuine Human-Level AGI systems. As noted above, prioritization or temporal ordering of tasks need not be part of such a challenge, and perhaps should not be if the goal is to appeal to AGI researchers pursuing a broad variety of approaches. This sort of challenge is not nearly as neat and clean as, say, a chess contest, RoboCup or the DARPA Grand Challenge. But we feel the complexity and heterogeneity here is directly reflective of the complexity and heterogeneity of human general intelligence. Even if there are simple core principles underlying human-level general intelligence, as some have posited, nevertheless the real-world manifestation of these principles – which is what we can measure – is complex and involves multiple interrelated competencies defined via interaction with a rich, dynamic world.

4.5.3 Challenges and Competitions

Finally, we emphasize that our main goal here is to propose a capability roadmap for the AGI research community, not a competition or contest. Competitions can be valuable, and we would be happy to see one or more developed along the lines of the capability roadmap we have described. However, no such competition could ever constitute the "be-all and end-all" of a roadmap such as this. One reason for this is that a competition is very likely to involve concrete decisions regarding the prioritization or temporal ordering of tasks, which will inevitably result in the competition having widely differential appeal to AGI researchers depending on their chosen scientific approach.

The primary value to be derived, via wide adoption of a capability roadmap of this nature, is the concentration of multiple researchers pursuing diverse approaches on common scenarios and tasks – which allows them to share ideas and learn from each other much more than would be the case otherwise. There is much to be gained via multiple researchers pursuing the same concrete challenges in parallel, with or without the presence of an explicit judged competition.

4.6 Roadmapping as an Ongoing Process

The roadmapping exercise carried out at the 2009 AGI Roadmap workshop, and summarized here, goes only a certain distance toward the goal of a detailed roadmap toward human-level AGI. We have already explicitly noted, along the way, many of the limitations of our treatment. There is no consensus among AGI researchers on the definition of general intelligence, though we can generally agree on a pragmatic goal. The diversity of scenarios presented reflects a diversity of perspectives among AGI researchers regarding which environments and tasks best address the most critical aspects of Human Level AGI. Most likely neither the tentative list of competencies nor the Laird/Wray criteria are either necessary or sufficient. There is no obvious way to formulate a precise measure of progress toward Human Level AGI based on the competencies and scenarios provided – though one can use these to motivate potentially useful approximative measures.

But, in spite of its acknowledged limitations, a capability roadmap of the sort outlined here

• allows multiple researchers following diverse approaches to compare their work in a meaningful way
• allows researchers, and other observers, to roughly assess the degree of research progress toward the end goal of Human Level AGI
• allows work on related road-mapping aspects, such as tools roadmaps and study of social implications and potential future applications, to proceed in a more structured way

Thus we feel that even such a partial roadmapping exercise has value on its own, in terms of clarifying various issues involved. But more particular and elaborate roadmaps focused on particular scenarios would also be very worthwhile. The next step, if someone chose to undertake it, would be to put more meat on the bones: describe the scenarios in more detail, refine the list of specific competency areas, and then create tasks and metrics along the lines outlined above. Once this is done, we would have a capability roadmap clearly articulating some plausible pathways to Human Level AGI, and a more solid basis for creating holistic, complex challenges meaningfully assessing progress on the AGI problem.

Such a roadmap, even if more fully fleshed out in the context of some specific scenario, would not give a highly rigorous, objective way of assessing the percentage of progress toward the end-goal of Human Level AGI. However, it gives a much better sense of progress than one would have otherwise. For instance, if an AGI system performed well on diverse metrics corresponding to tasks assessing 50% of the competency areas listed above, in several of the above scenarios, the creators of this system would seem justified in claiming to have made very substantial progress toward Human Level AGI. If the number were 90% rather than 50%, one would seem justified in claiming to be "almost there." If the number were 25%, this would give one a reasonable claim to "interesting AGI progress." This kind of qualitative assessment of progress is not the most one could hope for, but it is better than the progress indications one could get without this sort of roadmap.
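The rough "percentage of competency areas addressed" reasoning above is easy to operationalize, at least crudely. The sketch below, in Python, only illustrates that arithmetic; the scenario names, competency labels, and the "passed in at least two scenarios" threshold are all invented for the example rather than drawn from the roadmap itself.

# Hypothetical: for each scenario, the competency areas whose associated task
# metrics the system performed acceptably on.
results = {
    "virtual_preschool":      {"perception", "memory", "communication"},
    "virtual_school_student": {"memory", "reasoning", "communication", "learning"},
    "wozniak_coffee_test":    {"perception", "planning", "actuation"},
}

ALL_AREAS = {"perception", "memory", "reasoning", "communication",
             "learning", "planning", "actuation", "social interaction"}

# Count an area as addressed only if it was passed in at least two scenarios,
# echoing the requirement of doing well "in several of the above scenarios".
addressed = {a for a in ALL_AREAS
             if sum(a in areas for areas in results.values()) >= 2}

coverage = len(addressed) / len(ALL_AREAS)
print(f"{coverage:.0%} of competency areas addressed")   # 38% for this made-up data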
Section II Background from Neural and Cognitive Science
Chapter 5 A Neuroscience Primer

Leslie Allan Combs and Stanley Krippner

Abstract Contemporary neuroscience is an increasingly important, indeed essential, component to the understanding of human nature in virtually every academic or applied context. Different AGI approaches rely on neuroscience to different extents – some very closely, some not at all, and some roughly and loosely. However, it behooves every student of AGI to understand a reasonable amount about how the brain works, if only because the human brain is our best current example of a reasonably generally intelligent system. This chapter is a self-contained neuroscience primer leading the student to a basic understanding of the nervous system and particularly the brain. It begins with the basic facts and principles of nerve cells and the functional organization of the nervous system, and proceeds to examine the role of the nervous system in such functions as emotion, cognition, and other aspects of consciousness. 1

5.1 Introduction

During the past decade, and continuing, no field in science has moved forward with more vigor and enthusiasm than the study of the brain and its relationship to human experience of all kinds. It is no longer possible to study the psychology of memory, language, emotion, or motivation, much less to inquire into the nature of consciousness itself, without some knowledge of the fundamental makeup and function of the central nervous system and particularly the brain. Furthermore, the applied fields of clinical assessment and treatment rely increasingly on neurological evaluations and sophisticated knowledge of the emotional and cognitive mechanisms of the brain. Interest in the nervous system is not historically new, but the veritable explosion of research and knowledge about it in recent decades, with its increasing concentration on high-level functions such as emotion, memory, planning and problem solving, and even the mystery of consciousness itself, has resulted in a great deal of enthusiasm among professional neuroscientists and lay people alike.

The term "neuroscience" emphasizes the fact that the study of the nervous system, including the brain, is no longer the province of any single field such as clinical neurology, experimental physiology, cell biology, and the like. Rather, it is a common interest of a very wide variety of

1 Leslie Allan Combs gave lectures covering much of the material in this chapter at the 2009 AGI Summer School at Xiamen University; videos of these lectures are available online.

disciplines. These include, as well as the above, fields as disparate as psychiatry, molecular biology, artificial intelligence (AI) studies, bioengineering, cognitive psychology, and others, all of which are vitally interested in the brain and its relationship to consciousness and behavior (Baars, Banks, and Newman, 2003). These days, a typical conference on the brain is attended and addressed by representatives of all of these fields, among others.

The aim of this primer is to provide the student with a solid knowledge of the organization and function of the human nervous system. This includes a fundamental understanding of nerve cells and how they communicate, and a basic knowledge of the brain's functional organization. The latter entails an understanding of the major sensory and motor systems, the regulation of arousal, drive states, and emotion, and the higher cognitive processes represented by language, thought, and memory. The primer begins with an examination of the basic neuronal processes that underlie virtually all important activities in the nervous system. It then proceeds to an examination of the overall structure of the brain, emphasizing functional organization. Following this it reviews the basic functional systems of the brain, such as those involved with the motor control of the body, the major sensory systems with special emphasis on vision, emotions and drives, biological rhythms, sleep and other states of consciousness, memory, and language and cognition.

This short primer does not include graphic illustrations of the brain. For this reason the reader is advised to accompany its reading with one or more texts or other materials that offer visual representations of the nervous system and its internal structures referred to here. Online resources providing visual illustrations of various regions of the brain include

• http://www.pbs.org/wnet/brain/
• XXXX insert more

5.2 A Brief History of Neuroscience

The study of the brain and its involvement with behavior and consciousness is very old. Skulls dating as far back as 10,000 years have been found with holes drilled through them, presumably to release evil spirits or otherwise aid the patient. Egyptian physicians, famous throughout the Mediterranean world, left documents describing surgical procedures and in a few instances even brain surgery. The Edwin Smith papyrus, dating to about 3,000 B.C.E., and apparently written by such a surgeon, describes a head injury that exposed the brain. The commentary implies the writer understood the importance of the brain as the seat of bodily sensations, although ancient Egyptians traditionally located the soul in one's heart.

Given the inquisitiveness of the ancient Greeks, it is not surprising that they were among the first to systematically speculate about the function of the brain. Strangely enough, the first scientifically minded Greek philosopher, Aristotle (384-322 B.C.E.), believed, as did his teacher, that the mind was not in the head, but in the heart. The brain was said to be important for cooling the body. Physicians, nonetheless, early on understood the importance of the brain for the registering of sensations and the production of behavior. Not long after Aristotle, two of the foremost physicians of the ancient world, Herophilus (335-280 B.C.E.) and Erasistratus (310-250 B.C.E.), actually conducted numerous dissections, tracing out the spinal cord and many of the cranial and spinal nerves. Herophilus even divided the latter into those that were voluntary and those that were involuntary. Erasistratus rightly concluded that the richness of the brain's convolutions is related, across different animal species, to intelligence.

Four centuries later, the Greek physician Galen of Pergamon (131-200 C.E.), a Roman citizen and personal physician to the Emperor Commodus, son of Marcus Aurelius, studied the brain again, emphasizing the importance of its internal fluid channels. Galen speculated on virtually all aspects of health and the human body. His writings, of which only a few of many survived, dominated medical theory and practice right down to the rise of the Renaissance when, for the first time in many centuries, anatomists were again able to base their observations on dissections of human bodies. Galen's ideas were actually not new even in his own day, but grew from very old traditions in Greek medicine that emphasized the importance of gases and fluids in the living body. The theory of gases, or pneuma, was most important to Galen, who believed the brain's fluid-filled channels to be suffused with an animal or "psychical" spirit. This substance flowed out through the nerves to the muscles, causing them to twitch or contract. This theory amounts to a kind of subtle hydraulics, but cannot be understood entirely in physical terms. The word pneuma means air, but is also colored by the idea of breath or inspiration and is thus associated with spirit. Galen believed the entire body to be animated by three kinds of pneuma, one centered in the liver, one in the heart, and the third in the brain. Together they controlled the functions of the body such as digestion and reproduction, and, from the brain, voluntary and involuntary movement.

The modern understanding of the brain can be dated to the rise of empirical science and the mechanistic worldview during the 17th and 18th centuries; the word "neurology" was coined by Thomas Willis in 1681.
Machines became the dominant metaphor for a cosmos which, after Newton's 1687 publication of the Principia Mathematica, seemed nothing so much as a wondrously perfect clock. It even appeared to many that the human body must surely be nothing less than a machine as well. In 1748, a Frenchman named La Mettrie published an influential work titled L'Homme Machine (The Human Machine). La Mettrie argued not only that the body is a machine, but that the mind is simply a by-product of the physical processes of the brain.

The idea that the body is a machine, however, was not new with La Mettrie. It was an idea that had already become influential a century earlier through the writings of Rene Descartes, one of the most important thinkers in the history of Western civilization. Standing in the palace gardens at Versailles, Descartes had been impressed with the unusual statues there, on which the arms and legs moved in life-like motions. The statues were animated by changes of water pressure in hidden conduits, or in other words, by hydraulics. This was suggestive of Galen's ideas, still well-known in Descartes' day, and it led Descartes to conclude that he was witnessing the actual mechanism of human body movement. It took no great conceptual leap to replace Galen's spirits with the more substantial cerebrospinal fluid in this hydraulic scheme of things.

Unlike La Mettrie, Descartes was not a materialist. He needed to put the soul (read: "mind") into this system somewhere. He did so at the pineal gland, a small organ which lies just above the apparent juncture of the fluid-filled cerebral aqueducts. The pineal gland was thought to transmit sensations to the soul, and at the soul's command cause contractions of the aqueducts, sending fluid pressure to the muscles of the body and causing them to contract. The body was seen as a kind of marionette played by the soul, but using hydraulics instead of strings. The implications of this picture for the future understanding of both the body and the soul, or mind, were enormous, as it separated them off into completely distinct categories that only interacted in this very restricted way. At the same time, the quick and widespread acceptance of Descartes' ideas, especially by the Church, freed up the physical realm, including the human body, to empirical investigation. It was not long, for example, before William Harvey demonstrated that blood flows through the vessels of the body, pumped by the heart.

In time, Descartes' ideas about hydraulics were put to scientific tests. For instance, if muscles are caused to contract by fluid pressure, then they must become engorged with the fluid when they are contracted. This implies that they should have greater volume than when relaxed. Experiments were conducted in which frog leg muscles were caused to contract in water-filled containers, but failed to produce the predicted rises in water level. Also arguing against the hydraulic notion was the fact that individual nerve cells, which were supposed to carry the fluids to the muscles, are very small. It seemed unlikely that they could transport sufficient volumes of fluid to yield the swift and effortless ease of natural muscle contraction. Some theorists suggested that the working fluid was not a liquid at all, but finer and more subtle. This led to the suggestion of some form of gas such as air, as the Greeks had long ago suspected, or perhaps a thin gas such as hydrogen.
The latter was advocated by the "balloonist" school, named after the hydrogen gas used in recreational ballooning. Efforts were made to extract such gases directly from nerves and muscles by holding them under water, cutting them open, and squeezing. This did not work. It was only with the beginnings of the study of electricity that a modern understanding of nerve action began to present itself.

Electricity was a fascination of the 18th and 19th centuries in the West. Much of the early research on it involved living organisms, which suggested an intimate connection with life itself. It was this connection that Mary Shelley capitalized on in her great 1818 romantic fantasy, Frankenstein (which came to her in a series of hypnagogic pre-sleep images). In 1780 the Italian physicist Luigi Galvani discovered that an electrical discharge to the muscle of a frog's leg causes it to contract. About this time the great German physiologist Johannes Mueller observed a "current of injury" that arises from muscle tissue that is exposed in wounds received on the battlefield. Such observations made it clear that muscles and nerves are involved with electricity in an essential way. The discovery resulted in a controversy about the velocity of nerve action. Estimates ranged from 50 meters per second up to about sixty times the speed of light, while some thought the nerve impulse was instantaneous, as electricity was believed to be.

In the mid-1800s, one of Mueller's students, a young man named Hermann von Helmholtz, who was destined to be one of the greatest scientists of the 19th century, came to suspect that chemical activity as well as electricity was involved in nerve action. If this were the case, then it must be much slower than the high estimates posited earlier. He conducted two brilliant experiments, one involving nerve action in a frog nerve-and-muscle preparation, and the other using human reaction time delays for tactile stimulation to the toe and the upper thigh. He thereby demonstrated nerve impulse velocities in the range of 50 to 100 meters per second. The demonstration of this very finite conduction velocity gave great hope to neurologists of the day that the mysteries of the nervous system would soon submit to scientific investigation.

The picture of the nervous system that emerged during the late 19th century was that of a complex electrical circuit: not one of instantaneous activity, but one in which neurological events took place in finite biological time. In the early decades of the 20th century the British experimental neurologist, Lord Adrian, would refer to the brain as an "enchanted loom" in which the intricate execution of electrical activity is like the dance of a shuttle, weaving complex patterns of thought and behavior.
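Helmholtz's reaction-time method amounts to a simple piece of arithmetic: stimulate two points at different distances from the brain, measure the difference in reaction times, and divide the extra distance by the extra time. The numbers below, in a small Python calculation, are invented for illustration (they are not Helmholtz's actual measurements), but they show how estimates in the stated 50 to 100 meters per second range arise.

# Illustrative figures only; the distances and times are made up.
distance_toe_to_brain   = 1.60    # meters
distance_thigh_to_brain = 0.90    # meters
reaction_time_toe       = 0.172   # seconds
reaction_time_thigh     = 0.160   # seconds

extra_distance = distance_toe_to_brain - distance_thigh_to_brain   # 0.70 m
extra_time     = reaction_time_toe - reaction_time_thigh           # 0.012 s

conduction_velocity = extra_distance / extra_time
print(f"Estimated conduction velocity: {conduction_velocity:.0f} m/s")   # about 58 m/s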

During the latter half of the 19th century, two important debates were carried out about how the brain, and the neurons, its most important cells, are organized. We will consider the debate about neurons first. In 1839 the prominent zoologist Theodore Schwann proposed that the nervous system is composed of individual cells, which later came to be known as neurons. The celebrated anatomist Camillo Golgi, in the latter half of the 19th century, believed that neurons, with their long thread-like extensions that carry action potentials, but which were too small to see in detail until almost a century later, were connected together at the ends like blood vessels, forming one continuous neural web throughout the entire brain and nervous system. The great Spanish histologist, Santiago Ramon y Cajal, at the same time, argued to the contrary that neurons do not fuse together at their ends at all, but only come near each other, communicating by contact rather than continuity. This idea was known as the neuron doctrine. The disagreement was still hotly contested between the two men when they shared a Nobel Prize in 1906. But over subsequent years, evidence accumulated in favor of the neuron doctrine. With the advent of electron microscopy in the 1950s the issue was finally settled in its favor. The idea of separate neurons communicating across narrow gaps, or synapses, is now understood to be an essential feature of the way the nervous system operates.

The second debate dealt with the localization of function in the nervous system, especially with regard to its higher operations. We can trace questions about the functional anatomy of the nervous system all the way back to the ancients, when Galen observed that the forebrain, or cerebrum, is soft on direct inspection and thus likely to be the plastic repository of memory, while the harder cerebellum must, on the other hand, control motor functions. In 1809, Franz Joseph Gall proposed the idea that different parts of the cortex of the cerebrum actually serve different and very specific mental functions such as memory, the carrying out of computations, the composing of rhymes, and so on. This idea came to be known as phrenology. He argued that these various areas of the cortex actually grew in size as they developed, causing visible extensions of the skull, or bumps, that could be observed and measured. Thus phrenology became a practical study of human character as well as a theory about the organization of the brain. In time it became so popular that one might be asked to bring a certified phrenologist's report along to a job interview to prove one's qualifications!

Despite phrenology's popularity, it was despised as a pseudo-science by many neurologists, and indeed it was just that. None held it in greater contempt than the great French physiologist Marie-Jean-Pierre Flourens, who argued, on the basis of his own experimental ablations (removal) of brain tissue in animals, that the action of the cortex was unified, not able to be segregated into discrete functions, as the phrenologists claimed. He carried on his debate with a vengeance throughout the mid-19th century, but eventually was proven wrong, at least in principle, by discoveries such as those of the French neurologist Paul Broca, who in 1861 discovered an area in the back of the left frontal lobe essential for the formation of normal speech.
In 1870, the German experimentalists Gustav Fritsch and Edward Hitzig located a discrete region of the dog's upper frontal lobe that, when electrically stimulated, produced contractions of various muscles in the dog's body. These as well as other findings began to accumulate, demonstrating that the brain is indeed articulated into discrete functional regions.

All this, however, did little to help the claims of the phrenologists, because the functional areas discovered by science did not match their maps in either location or function. No evidence supported even a rough correlation between brain function and the shape of the brain, much less the appearance of the skull. Ironically, many of the phrenologists themselves were very progressive people, contributing to reform in education and generally to progress in society. Their neuroscience, however, was not sophisticated.

The debate over the specificity of function was not left behind in the 19th century, however, but has been and continues to be an ongoing controversy in the neurosciences. To what degree do the functions of the nervous system, and particularly the brain, reside in specific locations, as the phrenologists believed? To what extent do they reside at large in the overall operation of the cerebrum, or more specifically the neocortex? These are ongoing questions. One version of the issue of specificity of function was replayed in the 20th century, for instance, in terms of the search for the memory trace, or engram, in the brain. During the 1940s and 1950s, the eminent neuropsychologist Karl Lashley conducted many ablation experiments with rats, teaching them first to run mazes for food, then, after they learned the courses, ablating various parts of their cerebrums. He concluded that memory is spread widely throughout the cortex and not located in any particular region. More recent work, both from the clinic and from the laboratory, has tended to disprove Lashley's findings, apparently locating specific areas in the brain that are active in memory storage, and also in recall. The question, however, is far from settled.

The issue of specificity of function is, for the study of the nervous system, very much like the issue of nature versus nurture for developmental psychology. There will be no final answers, only an increasingly sophisticated knowledge of the contributing factors on both sides of the question.

5.2.1 The behaviorist’s brain

After the First World War the entire spectrum of science and philosophy became much more practical and austere than it had been in the 19th century. Broad philosophical systems represented by figures such as France's Henri Bergson and America's John Dewey gave way to the no-nonsense linguistic analysis of logical positivism. The latter considered verbal statements to be valid only in so far as they referred to something that could be observed and measured directly, such as overt behavior. Notions of mind and consciousness dropped from the scientific vocabulary. This trend was represented in psychology by John Watson's behaviorism, which co-opted Ivan Pavlov's reflexology, replacing the latter's "psychic reflexes" with stimulus-response (S-R) associations as the groundwork for a new psychology and a new neurology.

For psychologists and neurologists alike, the brain came to be modeled after a kind of electrical relay network, the principal function of which was to connect up responses to incoming stimuli. It was popularly depicted as a telephone switchboard where incoming messages could be flexibly plugged into outgoing lines of communication. On the face of it, this was a surprisingly simplistic portrayal of an organ of such known complexity. Moreover, it implied that the brain was a passive network where learning amounted to no more than the forging of new stimulus-response connections. It also suggested that without external sensory input through hearing, vision, taste, and so on, higher brain functions would have nothing to do and would become entirely inactive. For instance, it was commonly held that if all the senses were inoperative, a person would simply fall asleep.

5.2.2 The computing brain

By the 1950s, however, this entire picture began to change. There were a number of important reasons for this. To begin with, psychological studies were becoming more complex and sophisticated. During the Second World War, for instance, applied psychologists had found behaviorist theories, developed mostly from research on rats and pigeons, inadequate for the selection and training of military personnel on complex tasks such as learning to fly aircraft. Meanwhile, important clinical studies of brain injury, such as those conducted by Alexander Luria in Russia, contributed to an already rich and growing clinical literature that made little sense from the perspective of a simple switchboard model of the brain. And in the 1950s, the British cell physiologists Alan Hodgkin and Andrew Huxley finally penetrated a living nerve cell with an electrode, opening the door to decades of investigation of the electrodynamics and biochemistry of neurons. The beginnings of an in-depth understanding of neurons, the elementary units of brain activity, were underway. Subsequent studies would disclose a brain of increasing chemical and electrical richness and complexity. All this came at a time when behaviorism and logical positivism were beginning to show fault lines. In another decade they would collapse entirely, at least in their original forms.

At the same time, the technology from which brain researchers drew their models was rapidly changing. Cybernetic theory, which dealt with the guidance of complex machines by making use of corrective feedback loops, was originally developed during the war to control anti-aircraft batteries, but quickly became a new way to think about complex systems in general. For many researchers of the 1940s, 1950s, and 1960s, it became apparent that the brain was just such a cybernetic device, one that utilized sensory feedback to regulate thought and behavior. During this same period the most compelling technological metaphor of the 20th century was also announcing its own arrival. This was the digital computer, a marvelously complex device that seemed to have an active internal intelligence of its own. From the mid-1960s on, it became increasingly common to conceptualize the brain as a computer rather than as a switchboard.
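The "corrective feedback loop" at the heart of cybernetic theory is easy to illustrate. The toy controller below, in Python, nudges a value toward a set-point in proportion to the current error; the gain value and the framing are arbitrary choices for the example, and the code is a generic illustration of the idea rather than a model of any brain circuit or of the wartime anti-aircraft controllers.

def feedback_step(target, current, gain=0.3):
    """One cycle of a corrective feedback loop: measure the error, apply a correction."""
    error = target - current
    return current + gain * error        # correction proportional to the error

value = 0.0
for _ in range(20):                      # repeated cycles drive the value toward the set-point
    value = feedback_step(10.0, value)
print(round(value, 3))                   # close to 10.0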

5.2.3 The dynamic brain

Thinking of the brain as a computer immediately suggests an active internal process much more dynamic and complex than any switching circuit could be. Thus it is not surprising that the neurosciences began, slowly at first, and then more rapidly, to see the return of interest in topics such as memory, perception, problem solving, and even emotion. This growing view of the brain as active and dynamic in its own right was also given considerable impetus in the 1960s by experiments on sensory deprivation conducted in Montreal by Donald Hebb and in the Virgin Islands by John Lilly. In these studies volunteers were placed in closed environmental chambers where virtually all external sensory input was eliminated. Contrary to the idea that the withdrawal of sensory input would virtually shut down the brain, it was found that within a few hours or even less, the sensorily isolated brain becomes active, producing a rich variety of fantasies and hallucinations.

A contemporary view of the brain, the one consistent with humanistic psychology and systems theory, is dynamic. Today it is becoming increasingly apparent that the brain is not a passive switchboard, or even a computer, in the sense of a mechanism that mindlessly processes information according to preprogrammed routines. The brain is as active as the human mind that it supports. Sensory input from the eyes, the ears, and the skin is acted upon energetically to create composite perceptions that are congruent with past knowledge and the entire context of current experience, both inside and outside of the organism. Memories are actively reconstructed from fragments of recollections, knowledge of the events in question, and our own needs at the time. Nothing above the level of reflexes is automatic. All higher processes involve rich patterns of interaction among many areas of the brain in dynamic and active cooperation. It is for this reason that human intelligence is so rich and creative, and that brain disorders are so complex and baffling. This is especially the case in human decision making, which appears to engage the most anterior part of the frontal lobes (Koechlin and Hyafil, 2007). However, rats seem to have the capacity to work through a series of possibilities to test and plan what they are going to do next (Heyman, 2007).

The "theatre" metaphor for brain activity is dynamic in nature and has been used in different ways. Descartes' metaphor, now referred to as the "Cartesian Theatre," has become outdated because Baars' "Global Workspace Theatre" is far more congruent with neuroscientific data. Whitehead (2001) has used "Theatre of Mind" in a very different way, noting that "in imagination we rehearse and explore social scenarios and actor-audience interactions" (p. 3). In this "embodied theatre," thoughts resemble experiences; evidence from dissociative identity disorder patients and from hypnosis research indicates that the human mind can accommodate "multiple minds." Whitehead argues that role-playing is the crucial adaptation that makes humans most different from other primates, and that self-awareness depends on social mirrors, empathy, and shared experiential worlds. Hubbard (1994), on the other hand, has taken the position that the "theatre" metaphor is misguided, even when applied to the study of nighttime dreaming, because it is a "leftover" from "ghost in the machine" vocabularies (p. 264).
For Hubbard, "network" metaphors are more parsimonious to the description and understanding of consciousness. Hodgson’s (2005) perspective adds an important dimension to this discussion, holding that a "non-reductionist approach to questions concerning the brain/mind will produce the most useful results. It is generally a person (or other animal) as a whole that is taken to be the subject because they have the capacity to experience and to act (except when in a coma, etc.). This resembles the "holistic" approach taken by humanistic psychologists over the years. 5.3 The Embryology and Functional Organization of the Nervous System 119 5.3 The Embryology and Functional Organization of the Nervous System

5.3.1 The formation of the major divisions of the nervous system

Valuable insights into the structure and function of the human nervous system may be obtained by studying its early development. Within the first few weeks after conception the embryo forms into three distinct layers called the endoderm, the mesoderm, and the ectoderm. The first of these will eventually become the internal organs of the body (viscera), and the second will become the body's skeleton and muscles. The third will become the nervous system. The central part of the ectoderm, a sheet termed the neural plate, folds in upon itself creating the neural tube. The entire central nervous system (CNS), comprised of the brain and spinal cord, derives from this tube. Two outlying strands are pinched off from the tube during its formation and will be the source for the peripheral nervous system (PNS), comprised of all the nerve cells (neurons) that form pathways in and out of the central nervous system, connecting the latter to the entire body and senses.

As the neural tube grows, it articulates into several recognizable sections. This is the embryonic origin of the major divisions of the central nervous system. There is a long cord section above which several enlargements define the hindbrain (rhombencephalon), the relatively small midbrain (mesencephalon), and the large forebrain (prosencephalon). The latter will further articulate during growth into the telencephalon, or the large outer covering of the forebrain, and the diencephalon, or the core structures of the forebrain. We will list some of the important structures of the forebrain here, and return to their significance below.

The telencephalon includes the outer surface of the forebrain called the cerebral cortex (from the Latin cerebrum, meaning brain, and cortex, meaning covering or "bark"). It is this structure that has grown to considerable size in the evolution of the mammalian, and particularly the human, brain. It is only a few millimeters thick, but the extent of its surface area is what counts. To solve the packaging problem of putting the large cortex into the limited space of the skull, it has become folded, or convoluted, especially in animals with large cortexes such as primates and humans. Much of the cerebral cortex is sufficiently new, in terms of evolution, to be designated neocortex, as opposed to some of the older cortical regions near the core of the forebrain, which are termed paleocortex. The neocortex itself is sometimes called gray matter, because its many cell bodies and connections give this tissue a gray appearance on direct inspection. Beneath it are much thicker regions of white matter, comprised almost entirely of long nerve fibers interconnecting various parts of the brain with each other and with the outside world through the spinal cord and the cranial nerves. These fibers have thin lipid (fatty) insulating sheaths which give this tissue its characteristic white appearance. One important finding emerging from the use of Functional Magnetic Resonance Imaging (fMRI) instrumentation is that the interconnections between different parts of the brain are dynamic rather than static. In other words, brain networks can operate jointly to solve different cognitive tasks. The cerebral cortex is divided into two large "hemispheres," called the left and right cerebral hemispheres. A very large fiber bundle called the corpus callosum ("hard body") connects them together across the midline of the brain.
The telencephalon also includes several deeper structures, including a group which, taken together, form the basal nuclei (or basal ganglia), important to the control of movement. The telencephalon also contains large lateral ventricles, reservoirs filled with cerebrospinal fluid that connect with a network of fluid aqueducts throughout the central nervous system.

The diencephalon, forming the core of the forebrain, includes the thalamus and the hypothalamus. We will have more to say about these shortly. For purposes of understanding the functional organization of the nervous system, it is helpful to start from the outside and work inward, which is what we will do in the following paragraphs.

5.3.2 The peripheral nervous system

The peripheral system can be divided into two functional parts, though they are not entirely distinct anatomically. These are the somatic nervous system and the autonomic nervous system. The somatic nervous system is comprised of bundles of nerve cells (neurons) that connect the various regions of the body with the central nervous system. These nerves "exit" between the vertebrae on either side of the spinal cord, forming pairs of spinal nerves, and from openings in the skull, forming pairs of cranial nerves. Some of the neurons in each nerve bring information, in the form of nerve action, into the central nervous system. These are called sensory neurons. Others carry information out from the central nervous system to regulate the muscles of the body. These are called motor neurons. The spinal nerves exit the cord in two branches, or roots: a dorsal (toward the back) and a ventral (toward the front or belly) root. The dorsal root is made up entirely of sensory neurons and is called a sensory nerve, while the ventral root is made up entirely of motor neurons and is called a motor nerve. Incidentally, alternative names for "sensory" and "motor" are afferent and efferent. (There is a mnemonic for remembering the arrangement of the cord roots that goes like this: DAVE, or dorsal-afferent, ventral-efferent.)

The autonomic nervous system is important in the regulation of the biological or visceral activities of the body. These include heart rate, blood pressure, oxygen level in the arteries, muscle activity in the walls of the viscera, and so on. An important distinction is made between two "branches" of the autonomic nervous system. These are called the sympathetic and parasympathetic nervous systems. The sympathetic nervous system was named the "fight or flight" system by the American physiologist Walter Cannon, because when it dominates autonomic activity the body becomes prepared to defend itself through fight or flight. The rate and depth of the heartbeat is increased, blood is moved from the viscera and the surface tissues of the body into the central nervous system and the large locomotive muscles, adrenaline is released into the blood stream, respiration is deepened, and so forth. On the other hand, domination by the parasympathetic nervous system tends to place the body in a more relaxed state of energy conservation. The blood pressure is lower, the heart pumps more slowly, respiration is more shallow and relaxed, and there is less adrenaline in the blood, among other responses. Issues concerning the relative activation of the sympathetic and parasympathetic nervous systems have been important to those psychologists and medical personnel concerned with stress management, because an essential aspect of psychological stress is the chronic activation of the sympathetic nervous system.

Because the autonomic nervous system controls many vital functions of the body, and because it is closely associated with those parts of the brain that regulate the endocrine gland system, it is considered an important link in the connection between higher brain functions and their associated mental states on the one hand, and the biology of the body itself on the other.

5.4 Functional anatomy of the Central Nervous System

This section is intended to introduce the major divisions of the central nervous system, and particularly the brain, in association with some of their most important known functions. We will begin with the spinal cord and work upward.

5.4.1 The spinal cord

The spinal cord is the home of many important functions that play vital roles in posture and locomotion. For us, its most important activity is to carry afferent or sensory messages from the body to the brain, and efferent or motor messages from the brain to the body. As the cord ascends upward it enlarges near the top, becoming the hindbrain.

5.4.2 The hindbrain

The lower (inferior) portion of the hindbrain is a bulb-like enlargement called the medulla, important in the regulation of certain autonomic nervous system functions and for carrying neural messages up from the spinal cord or down from the higher brain centers. Above the medulla is an even larger expansion of the hindbrain termed the pons. This structure also carries activity to and from the cord, but its considerable mass is due to the many nerve fibers that connect with the cerebellum (or "little brain") that lies behind, or dorsal, to it. The cerebellum is important in organizing and controlling movements that involve total body coordination such as walking, running, or jumping. These are also termed "ballistic" movements. The cerebellum is involved in rapid and especially involuntary movements, ones that have been learned so well that they require little or no attention. This structure is quite large in primates, including humans, reflecting their remarkable agility. Like the other areas described here, more is being learned about the cerebellum every year, and new or different functions may well become associated with it as research and clinical observations continue to accumulate.

5.4.3 The midbrain

Above (superior to) the hindbrain is the narrow midbrain. On its upper, dorsal, surface are found two pairs of small bumps. The upper two are the superior colliculi and the lower two are the inferior colliculi. These are very old structures in terms of evolution, once the seats of vision and audition (hearing). In the human brain they play little to no role in the experience of vision or hearing, but remain important in coordinating these senses with the motor control center represented by the cerebellum. For example, they are important for orienting to a light or sound, such as turning one's head automatically toward a movement seen in the periphery of vision, or toward the source of a sound.

Beneath the dorsal surface of the midbrain, in an area called the tegmentum, is found a region of cells that form a dense interconnecting web, or reticulum. This area actually extends downward through the core of the hindbrain and on into the spinal cord, where it forms a small tightly interconnected core. In the other direction, it extends upward to the thalamus at the core of the diencephalon, forming a thin but important sheath around it. Taken as a whole this structure is termed the reticular formation. It is perhaps the oldest part of the brain, controlling overall states of activation such as wakefulness, rapid eye movement sleep, and non-rapid eye movement sleep. Its descending influences act on the spinal cord to adjust the sensitivity of postural reflexes, and can inhibit virtually all afferent activity from the senses as well as efferent activity to the skeletal muscles of the body during dreaming.

5.4.4 The forebrain

5.4.4.1 The diencephalon

The diencephalon at the core of the forebrain is defined by two important structures, the thalamus and the hypothalamus. The thalamus is of interest for a number of reasons. First, it is low in the center of the forebrain, making it an often-used reference for the location of other structures, which are said to be dorsal, ventral, lateral (out from), anterior (in front of), or posterior (behind) to it. Traditionally the thalamus has been identified as a "sensory relay station" because all of the external senses except smell send afferent neurons to the thalamus, where they connect with other "higher order" neurons that carry activity on to higher centers of the forebrain, primarily the cortex. This designation is not entirely accurate, however, because it is becoming increasingly apparent that the thalamus, like many other areas of the brain, is not passive at all, but plays its own very active role in perception, operating on incoming sensory signals in conjunction with feedback signals from the higher cortical centers. The thalamus has many smaller divisions, or nuclei, devoted to different functions. Some of these have to do with the regulation of motor output from the forebrain, in which it also plays an important role.

The hypothalamus, as the name implies, is located below but also somewhat anterior to the thalamus. It is a small structure containing a large number of smaller nuclei. The hypothalamus has been of interest to neuroscientists for many years because of its essential involvement in emotions and drive states. It plays an important role in the regulation of emotional states, and also in biological drives such as hunger, thirst, and sexual activity. The hypothalamus is connected by a thin stalk of blood vessels and nerve cells to the bone-encased pituitary gland, the "master gland" of the endocrine glandular system of the body. Through this stalk, communication passes between these two structures as neuronal activity and also as hormonal messages carried in the vascular system of the stalk. Thus the hypothalamus is an important interface between the rapid-acting nervous system on the one hand, and the slower-acting but vitally important hormonal communication system of the blood stream on the other.

5.4.4.2 The telencephalon

Near the core of the forebrain are two sets of interconnected structures that are important to be familiar with. One is the basal nuclei (or basal ganglia), just anterior and lateral to the thalamus. The basal nuclei play an important role in the control of movement. They are particularly involved in slow voluntary movement, as opposed to rapid ballistic motions, which are controlled by the cerebellum. The other is a more loosely defined system, anatomically speaking, that forms a functional circuit shaped roughly like two large vertical loops on either side of the thalamus that meet in the hypothalamus at the bottom. This is termed the limbic system. It is important in the regulation of emotion and drive states. It includes a number of discrete structures near the core of the forebrain, including the hypothalamus, hippocampus, amygdala and septal nuclei, the cingulate gyrus, and others. Many of the functions originally attributed to the hypothalamus are in fact found widely throughout the limbic system. As we will see further on, parts of the limbic system also play a vital role in the formation of memory.

The most prominent feature of the telencephalon is the large outer covering of the brain, the cerebral cortex. The cortex is divided longitudinally into the more or less symmetrical left and right hemispheres, connected together by the corpus callosum and two other small fiber tracts. Using major cortical outcroppings (gyri) and invaginations or fissures (also called sulci) as landmarks, the cortex of each hemisphere is divided into four lobes. Since this is not a functional division, and not even a truly anatomical one, the operations served by these four lobes are not entirely discrete. Still, there is a sufficient degree of separation to make it helpful to review some major functions of each of these four lobes.

As noted earlier in this chapter, more than a century ago Flourens believed the cortex to operate as an undifferentiated unit. Since that time research has been a continuing story of discovery of particular functions in particular areas. This does not mean, however, that in time the cortex will become completely mapped into discrete specialties as the phrenologists believed. Studies of cell activity show that there is a surprising degree of participation of neurons from all over the cortex in any particular brain activity such as memory, perception, or thinking. It is likely that there is a great deal of cooperative participation between the many areas of the cortex in just about all of the higher functions of the brain. Nevertheless, certain areas are much more dedicated to certain types of activities than others. We will discuss each of the four lobes in an introductory fashion, listing their most salient involvements. The four major divisions of the neocortex are the temporal, occipital, frontal, and parietal lobes.

The temporal lobe, located under the temple, has been known for many years for several functions. The primary auditory cortex, where sensory activity from the ear first arrives from the thalamic relay station, is located in the upper, or superior, region of the temporal lobe. The middle, or medial, temporal lobe, and the underlying hippocampus (of the limbic system), is important in the formation of new memories.
The lower region, or inferior temporal lobe, is important in the recognition of visual objects. Several areas of the left hemisphere (in most people) are associated with language. Wernicke’s area is found on the left side behind, or posterior to, the primary auditory cortex and plays a vital role in understanding spoken and written language, as well as in the production of organized speech.

The occipital lobe, which occupies the entire posterior portion of the brain, is given over almost entirely to the processing of vision. The primary visual cortex lies near its posterior extreme, along the inner wall of the longitudinal fissure that separates the left from the right hemisphere. This region is also termed the striate cortex. Visual information processing becomes increasingly sophisticated as we move forward along the surface of the occipital lobe from the striate cortex. The optic, or visual, cortex, as this entire region is often called, is divided into many more or less distinct regions, each of which maps out the entire opposite visual field: the left hemisphere visual cortex responds primarily to the right half of the visual field and vice versa. Each of these regions responds to particular features of the visual field, such as the size of an object seen, its shape, color, whether and in which direction it is moving, and so on. We will have more to say about these aspects of vision in the section on perception.

The parietal lobe (or "roof") of the brain is located above the posterior half of the temporal lobe and anterior to the occipital lobe. It extends forward to near the top of the brain where it is bounded by the large central fissure, which visually divides the cortex above the temporal lobe into the frontal lobe in front and the parietal lobe behind. Just behind the central fissure lies a band of cortical tissue termed the somatosensory cortex, which is the primary sensory cortex for touch and the tactile senses of the body. Along this strip of tissue the entire surface of the body is mapped out in terms of sensory representation, with the head furthest down near the temporal lobe, and the legs and feet extending over the top and down along the inner wall of the longitudinal fissure. This strip in each hemisphere represents skin sensations from the opposite side of the whole body. The parietal lobe has also been shown to play an important role in the attentional systems of the brain, and also in the localization of objects in visual space.

Last we come to the frontal lobe. For many years the frontal lobe, along with most of the parietal lobe, was the dark continent of the brain. The large areas for which no specific function could be assigned were termed the association cortex, based on the assumption that they must be important in higher mental processes, thought to be based on the association of ideas.
During recent decades the frontal lobe, as well as the parietal lobe, has offered up an increasingly rich variety of functions, some of which we will discuss throughout the primer. Since Fritsch and Hitzig’s 1870 studies on the dog, however, the posterior extreme of the frontal lobe has been known to be important for motor control of the body. This cortical strip, called the primary motor cortex, parallels the parietal somatosensory strip just across the central fissure from it, but maps the body in terms of movement. Like the sensory strip, at each point it corresponds to some area on the opposite side of the body. Here it would be well to stop and note a general rule of the brain according to which both afferent and efferent activities correspond to the opposite side of the body. This includes the control of the muscles of the body as well as the afferent input from the various senses.

Another well-known region in the frontal lobe is Broca’s area, usually found on the left side of the brain, and always at the posterior extreme of the frontal lobe near the junction of the central fissure and the temporal lobe. We noted previously that this area was discovered in the 1860s by Paul Broca, who recognized its importance for the normal production of speech. The more anterior regions of the frontal lobe are involved in a variety of different functions, many of which are still not well understood. Overall, the frontal lobe is vitally important in planning for the immediate as well as the long-range future, and in the ability to carry out those plans. This is perhaps the hallmark of the frontal lobe in humans. However, it is also involved with the regulation of emotion, with short-term "working" memory, and with the regulation of attention.

One of the defining characteristics of clinical death is the cessation of brain activity. However, there is evidence that this marker can be reversed. The Safar Center for Resuscitation Research has developed a technique in which a dog’s veins are drained of blood and filled with an ice-cold salt solution. The dogs are considered clinically dead, as they stop breathing and have no heartbeat or brain activity, but three hours later, their blood is replaced and the animals are brought back to life with an electric shock. Their tissues and organs are perfectly preserved, and there is no brain damage (hence, they are not "zombie dogs"). During the procedure, blood is replaced with a saline solution at a few degrees above zero; the dog’s body temperature drops slightly, as opposed to hypothermia which precedes death. Applications to human beings could save countless lives on the battlefield or among victims of domestic and urban violence, such as stabbings or gunshot wounds, all of which involve massive blood loss (Buchan, 2005).

5.5 Neurons and Their Interactions

The nervous system is composed of billions of nerve cells, or neurons, along with an even larger number of supporting glial cells, or glia. The average neuron receives stimulation from several hundred other neurons, and in turn sends stimulation to as many others, resulting in over a trillion connections in the cortex alone. Thus the human brain, along with the brains of other advanced mammals, is easily the most complex structure in the known universe. In this section we review the basic facts about the structure and function of neurons. This may be the most technically demanding material in this primer, so hold on to your hat! It is important material, however, and once you have mastered it you will find that it is very useful, indeed essential, for understanding a great many aspects of the brain and its functions.

5.5.1 Neurons

It is important to understand from the beginning of the study of nerve cells that they are more like other cells than they are different. That is, like all cells with nuclei, neurons are composed of a complex substance called protoplasm, and surrounded by a cell membrane. Inside the cell is a nucleus, surrounded by its own nuclear membrane, and containing the genetic material that controls the growth and metabolism of the entire cell. The cell membrane is very complex and dynamic. It regulates the flow of waste material out of the cell and the flow of water, oxygen, and nutritional substances into the cell. It is noteworthy that the membranes of all cells, nerve cells or otherwise, regulate the inflow and outflow of charged particles, or ions, maintaining an overall electrical voltage difference between the inside and outside of the cell. Usually this difference is on the order of just less than one tenth of a volt, that is, around 80 millivolts (1000 mv is equal to one volt, so 100 mv is one tenth of a volt). Through evolution, neurons have become specialists in modulating this electrical resting potential into the form of impulses that carry information through the nervous system and the rest of the body. As far as is known at present, such impulses are the only vehicle for the long-range transmission of nerve signals.

Neurons seem exotic because of their unusual shapes, though they operate for the most part by the same principles as do other cells. They come in a wide variety of shapes and sizes, though most have specific common anatomical and physiological features. Typically a neuron has a cell body, or soma, that contains the nucleus and looks very much like any other cell. From this cell body, however, extend a varying number of long "processes" called dendrites and axons. The most common situation is for the cell body to have a large number of dendrites, but only a single axon. The dendrites branch out like roots, making connections with many other nerve cells. The axon may itself branch at some distance from the cell body, or it may continue as a solitary fiber for great distances before connecting with one or more other neurons. Single axons can be found that course the entire length of the spinal cord, or the length of a giraffe’s neck. Neurons, however, vary greatly in shape. Some, such as the amacrine cell of the retina, have no discernible axon at all, while others have just one axon and one dendrite. The latter, called bipolar cells because they have just two anatomical poles, are found almost universally in sensory systems such as the ear, the eye, and the skin.
Activity in neurons tends to be one-way, moving from the dendrites through the cell body and away along the axon. In a sense, neurons are the telephone lines of the nervous system, the dendrites and cell bodies collecting information from many other cells, while the axons transmit the results for long distances to other parts of the nervous system, or to muscles, causing them to contract. Sensory or afferent neurons carry information centrally into the CNS, while motor or efferent neurons carry activity out toward the muscles and glands. The largest class of neurons by far, however, is called interneurons. These transmit activity between other neurons, entirely within the nervous system.

There are differences in the way activity travels through the axon, on the one hand, and the dendrite and the cell body on the other. The axon activity is generally better understood because it has been studied longer, and the axon is larger and more accessible to investigation than dendrites, many of which are very small, even on a microscopic scale. Axons produce action potentials, sometimes called "impulses," or we say that the cell "discharges" (though it is just the axon that actually discharges). These all refer to an abrupt depolarization, or disappearance, of the resting potential, which lasts only about a millisecond but travels with considerable speed along the axon fiber. Any information sent over significant distances in the nervous system is sent as pulsatile activity in axons. What is more, all action potentials for a given axon are virtually the same size (since they are simply depolarizations of the resting potential) and travel at the same speed. Axons follow what is called the "all-or-none" law. That is, once an axon has been sufficiently stimulated to cause it to fire off one or more action potentials, all these action potentials are the same, at least for a particular cell. It is like firing a pistol; no matter how hard the trigger is pulled, the muzzle velocity of the bullet is the same. The only free variable for the axon is the number of impulses per second that it transmits. This is often called the rate of discharge. High levels of discharging (many impulses per second) signal greater intensity, for example a stronger taste, a brighter light, or the command for a vigorous muscle contraction.

Axons are typically surrounded by myelin sheath cells, which provide them with physical support, nutrition, and electrical insulation. The myelin cells look a bit like little pancakes wrapped side-by-side around the axon. The small spaces between them are called nodes of Ranvier. For large axons, one effect of these sheath cells is to accelerate the velocity of propagation of the action potentials. As an action potential travels down such an axon it leaps from node to node, skipping the myelinated stretches in between. This type of transmission is termed saltatory ("dancing") conduction. Myelin sheath cells are not neurons, but a form of glial cell, of which we will have more to say later.

The most interesting events in the neuron are located in and near the dendrites and cell bodies. Activity here does not follow the all-or-none law, but rather is graded. This means that the extent to which a dendrite or cell body is depolarized depends on the degree to which it is stimulated. What is more, the graded potentials that occur in these regions spread out slowly, like waves on the surface of water, interacting with each other. If we allow for the fact that some of the stimulation may be excitatory (causing graded waves of partial depolarization) while other stimulation may be inhibitory (forcing potentials back in the direction of the original resting potential or beyond), then we begin to see how amazingly complex such interactions can be. It is precisely within such interactions that the most complex and creative activities of the brain are most likely taking place, activities involving thinking, memory, the imagination, and so on.

Neurons communicate with each other across synapses, where nerve cells come into close proximity with one another and exchange specialized transmitter chemicals. Thus, nerve cells typically communicate by use of chemicals rather than by direct electrical contact.
The exception, termed a gap junction, is common in invertebrates, but rare in vertebrates, where it occurs only at connections that require great speed, or the synchronization of several neurons, as is the case with cardiac pacemaker cells. The point of communication between neurons typically involves the end of an axon coming near, but not touching, the dendrite or cell body of another cell. The membranes of these cells do not make physical contact. Rather, the axon of the pre-synaptic cell releases a small quantity of neurotransmitter substance which rapidly diffuses across the synapse to stimulate or inhibit the post-synaptic neuron. The regions of both cells near the synapse are highly specialized. Typically the end of the axon displays an enlargement called an axon terminal (or terminal bouton) that comes into close proximity to the membrane of the post-synaptic cell. This terminal contains a number of specialized structures: mitochondria, which are involved in energy metabolism, and many small packages, or vesicles, of neurotransmitter chemical. Some of the latter release their contents into the space between the cells at the time of transmission. This narrow space is termed the synaptic cleft. Diffusing across this space, the neurotransmitter locks briefly onto specialized molecular structures in the post-synaptic membrane called receptor sites. In response, these sites open the membrane to the flow of certain ions, causing it to partially depolarize in the case of an excitatory synapse. If it is an inhibitory synapse, the effect of the neurotransmitter is to allow different ions access to the cell, causing it to become more polarized. If the result of the latter is large enough to push the potential beyond the level of the ordinary resting potential, for example to 90 mv or 100 mv for a cell with a resting potential of 80 mv, then we say the cell has become hyperpolarized. Hyperpolarization is usually associated with inhibition.

There are a variety of types of chemical synapses, based upon the actual neurotransmitter chemical and its mode of operation within the post-synaptic cell. First, there are synapses where the neurotransmitter operates on the receptor sites of the post-synaptic membrane, directly opening it to depolarization or hyperpolarization as a result of the inflow of ions. Sodium ions, for example, carry a positive charge, and thus tend to depolarize the cell. On the other hand, there are synapses that involve a second messenger system, where the neurotransmitter chemical operates on the post-synaptic membrane to activate a short-lived "second messenger" chemical that is the actual agent of depolarization. In some instances there can even occur a "second messenger cascade" in which a series of chemical events connects the activation of the receptor site to the resulting depolarization of the cell.

The study of chemical synapses is an area of vigorous contemporary research, yielding frequent new discoveries. Knowledge of how synapses work is very important for several reasons. To begin with, many mechanisms of self-regulation within the nervous system itself operate at the synapse by adjusting the neurochemical action there. For instance, there is an important class of chemicals called neuromodulators that have the overall long-term effect of facilitating or inhibiting activity in entire classes of chemical synapses. The neuromodulator endorphin (meaning "endogenous morphine"), for example, is important in regulating pain sensitivity.
It is also suspected of being involved in pleasurable mood states. Beyond this, more or less permanent changes in the brain, such as the creation of new memories, are most likely to involve chemical changes at central nervous system synapses. It is here that the greatest flexibility is found. Axon transmission, for example, is very resilient, and not amenable to change with experience. Anatomical changes in the size and shape of synapses require at least a few hours, if not a few days. So short-term memory, for instance, must surely involve chemical changes in synapses in key regions of the brain. We will return to this topic later in the primer.

Finally, virtually all drugs exert their effects by modifying synaptic activities. They do so by acting through various routes. Some mimic actual neurotransmitters, latching onto post-synaptic receptor sites without activating them; by occupying these sites they block and thus inhibit synaptic action. Other drugs block the re-uptake of already used neurotransmitter chemicals by the pre-synaptic cell. Thus, neurotransmitter concentrations build up in the synaptic space, causing the synapse to become increasingly active. There are many such mechanisms by which drugs affect activity in whole classes of synapses in the nervous system.
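To make the interplay of graded potentials, thresholds, and all-or-none impulses concrete, here is a minimal Python sketch of a "leaky integrate-and-fire" neuron, a standard computational abstraction rather than anything biologically detailed; the membrane constants, threshold, and input currents used below are illustrative values of our own choosing, not measurements.

# A toy "leaky integrate-and-fire" neuron: graded potentials below threshold,
# identical all-or-none spikes above it. All parameter values are illustrative.

def firing_rate(input_current_na, duration_ms=500.0, dt_ms=0.1):
    v_rest, v_thresh, v_reset = -80.0, -55.0, -80.0   # membrane potentials in millivolts
    tau_m = 20.0                                      # membrane time constant, ms
    r_m = 10.0                                        # membrane resistance, megaohms
    v = v_rest
    spikes = 0
    for _ in range(int(duration_ms / dt_ms)):
        # Graded change: the potential leaks back toward rest while input pushes it away.
        dv = (-(v - v_rest) + r_m * input_current_na) / tau_m
        v += dv * dt_ms
        if v >= v_thresh:        # all-or-none: every spike is the same size
            spikes += 1
            v = v_reset          # the membrane resets after each impulse
    return spikes / (duration_ms / 1000.0)            # spikes per second

# Inhibition hyperpolarizes (no spikes); stronger excitation raises the discharge rate.
for current_na in (-1.0, 2.0, 3.0, 4.0):
    print(current_na, "nA ->", firing_rate(current_na), "Hz")

Running the sketch shows the qualitative behavior described above: inhibitory (negative) input hyperpolarizes the cell and produces no spikes, weak excitation produces only a subthreshold graded depolarization, and progressively stronger excitation produces identical spikes at progressively higher discharge rates.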

It may be useful to summarize the principal categories of chemicals important in the communication of nerve cells, and also in the interaction of the brain and the immune system. These include:

• Hormones are chemical messengers carried by the blood stream. Important hormones are released by the hypothalamus via the pituitary gland. These regulate many of the basic autonomic functions of the body by governing the entire system of the body’s endocrine glands.

• Immunomodulators are immune system messenger substances that are used to communicate with immune system cells such as macrophages and T-cells.

• Neurotransmitters and neuromodulators, described previously, are basic to both rapid communication and long-term influences between nerve cells. There are many types of neurotransmitters and neuromodulators, including neuropeptides, which may be important in emotion and biological drives, and may also play an important role in healing.

5.5.2 Glia

The glial cells provide nutrition and physical support for neurons in the central and peripheral nervous systems. The most numerous variety of glial cell is the astrocyte, which occupies the space between neurons. It is also the job of the astrocytes to clean up after the death of nerve cells, or parts of them such as axons. Such death, incidentally, is a normal part of the total dynamic of the nervous system, especially during early development, when over half the original neurons in some regions of the brain die as part of an overall shaping of the brain circuitry. Myelin sheath cells are also glial cells. These may be very large when associated with large axons in the peripheral somatic nervous system, where they facilitate high saltatory conduction velocities, for example in motor neurons supplying the large skeletal muscles of the body.

Glial cells have traditionally been assigned a secondary role in the operation of the nervous system. It is not clear, however, that they will always remain in this position. For many years, researchers and theorists have speculated that glial cells may play a vital role in the formation of memory, that they may be the source of the DC fluctuations we call the EEG, or that they represent an evolutionarily older network than the nervous system itself, serving an essential role in mobilizing healing in the body. All these ideas, and others, are exciting possibilities waiting to be thoroughly explored.

5.5.3 Mirror neurons

The discovery of "mirror neurons" in the 1990s explains why people tend to see others as similar to themselves, rather than different (Borg, 2007). According to research by neuroscientists at the University of Parma in Italy, mirror neurons "could help explain how and why we ’read’ other people’s minds and feel empathy for them" (Winerman, 2005, p. 48). Mirror neurons are a kind of brain cell that exhibits the same response when a person performs an action as when that person observes someone else performing the same action. These neurons appear to play an important role in mimicry, empathy, and the evolution of language. Autism may result from their malfunctioning. Mirror neuron activity can be measured by such technologies as transcranial magnetic stimulation. Well established in the brains of rhesus macaque monkeys, their localization in human brains is still in its initial phases. Mirror-touch synesthetes literally "feel the pain of others," sensing that they are being touched while watching others being touched. Continued research may shed light on how people respond to others’ actions, emotions, and intentions (Agnew, Bhakoo, and Puri, 2007).

5.6 Vision and Sensory Input

"Most people intuitively think that human vision works much like a camera. However, this is far from the case." (Tong and Pearson, 2007, p. 150)

Aristotle’s traditional senses included vision, hearing, taste (gustation), smell (olfaction), and touch. Today these are called exteroceptive senses to distinguish them from internal, or interoceptive, senses such as proprioception (sense of body position), body temperature, and the registering of nutrient and salt levels in the blood. Some of the latter senses, such as the sensing of blood nutrient levels that leads to hunger, are still not entirely understood, and some are not accessible to conscious awareness. The latter include the output from the tiny organs called muscle spindles, positioned deep in the skeletal muscles, that constantly register the degree of muscle stretch. They also include many of the autonomic sensors such as those that register blood pressure. To review all these senses would be an entire field of study in and of itself.

The degree of anatomical and functional similarity between at least the exteroceptive senses, however, is surprisingly great. Most, for example, begin with a type of neural receptor cell that transforms some form of energy from the environment, such as sound or light, into the neuroelectrical energy we are familiar with as a graded potential; in other words, into a neural response. Very near the periphery of each system are also found bipolar cells, which pick up the neural activity and send it into the central nervous system as action potentials along their axons. Typically there is also important communication laterally between the neurons that carry sensory activity toward or into the CNS. This takes place at the synapses on one or both sides of the bipolar cells, and elsewhere, often with the effect of sharpening the definition of the sensory message that is delivered centrally.

Here we will focus our discussion on the visual system. There are several reasons for doing so. First, the visual system has been studied far more thoroughly and successfully than the other sensory systems. Second, as indicated above, there is a surprising commonality among sensory systems, and with this in mind it would seem a better choice to understand one of them in some detail than to have but a nodding acquaintance with many. Third, the study of the visual system has revealed a great deal about how the brain in general functions, so that its study is instructive beyond the understanding of vision itself. And finally, more human brain tissue is devoted to vision than to any other sense by a large margin, meaning that, biologically speaking, it is our most important sense, a fact noted by Francis Crick (1994), who used the visual system to introduce readers to his model of consciousness in his book The Astonishing Hypothesis.

5.6.1 The eye

The human eye is the external organ of vision. It is a spherical structure, held out by the internal pressure of the clear fluids that fill it. The tough outer surface of the eyeball is clear in the front, forming the cornea. Behind the cornea are the iris and the crystalline lens. The cornea and the crystalline lens act together as a single functional lens to focus images of the landscape in front of the eye onto its inner surface, the retina. Light first passes through the cornea, where it is bent, or refracted, by the latter’s lens-like action, then passes through the pupil at the center of the iris. The iris varies the size of the pupil to adjust for varying light levels, making it smaller under high illumination, thus optimizing the resolution of the retinal image, or larger under low illumination, to maximize the amount of light entering the eye. Light then passes through the crystalline lens where it is further refracted. The curvature of the lens, and thus its optical power, can be adjusted for the distance of the object viewed. This is called accommodation. As people grow older, especially after the age of about 40, the lens becomes increasingly less flexible, and accommodation is dramatically reduced, commonly requiring people to wear glasses to read and to view near objects. Other distortions in the curvature of the lens can cause nearsightedness (when it is too powerful), farsightedness (when it is too weak), or astigmatism (when the shape is imperfect).

Our main interest with the eye begins with the retina. This is a thin layer of nerve cells and blood vessels that lines practically the entire inner surface of the eye except in the immediate region of the cornea, and a small region called the optic disk, where the optic nerve exits. This disk, incidentally, corresponds visually to the blind spot. The human eye has an inverted retina, that is, the receptor cells that respond to light are located on the "outer" surface of the retina where it comes into contact with the material of the eyeball itself. (Note that in the eye, outer means away from the center of the eyeball, and inner means toward the middle of the eyeball. This can be confusing at first.)

There are two types of receptor cells, termed rods and cones, named after their appearance. Both are filled with light-sensitive chemicals, or pigments. These are termed rhodopsin in rods and iodopsin in cones. Rods are primarily responsible for night vision. They are sensitive to all wavelengths of light except deep red, to which they do not respond. Many nocturnal animals have pure rod retinas. These animals do not make color discriminations, but can only detect differences in brightness. Color vision requires cones, which are also the basis of day vision. There are actually three types of cones, based on three types of iodopsin, one found in each type of cone. These are maximally sensitive to different regions of the color spectrum, one to the short wavelength blue region, one to the somewhat longer green-yellow region, and the third to the longer yellow-orange-red region. These are usually referred to simply as the short, medium, and long wavelength cones. Color vision depends on the relative degree of activation of these three types of cones. For instance, activation of only the short wavelength cones would signal blue, whereas activity in both the short and medium wavelength cones would indicate green or greenish yellow.
If all three types of cone were equally active, one would see white, since white is the result of the additive mixture of all colors.

Just inside the layer of receptor cells is a layer of bipolar cells. The dendrites of these cells receive stimulation from the rods and cones, carrying it further into the retina. The number of receptor cells served by a particular bipolar cell varies widely, depending on the location in the retina. In the fovea, the most central portion of the retina where vision has the greatest acuity, there may be as few as one cone for each bipolar cell. Far in the periphery, however, where the stress is upon sensitivity to low light levels and movement, there may be hundreds of receptor cells for each bipolar cell. At the level of the synapses between the receptor cells and the bipolar cells are found horizontal cells, which connect laterally across the retina. These come in two varieties, those that respond only to the level of illumination, and those that respond in terms of color. Evidently, brightness coding (signaling brightness) and color coding (signaling color) are taking place already at this first synaptic level in the visual system. At the other end of the bipolar cells, where they synapse on the giant ganglion cells that carry visual messages all the way into the brain along the optic nerve, is found another variety of neurons that connect laterally through the retina. These are called amacrine cells. They play an important role in the visual response to movement. In fact they do not respond to stationary visual objects at all.

Interestingly, the ganglion cells are the first nerve cells in the visual system to carry action potentials. All of the cells contained entirely within the retina respond only in terms of graded potentials. This again points to the importance of graded potentials in the processing of information. Indeed, since the retina is embryonically derived from the brain, it is considered to be brain tissue itself, meaning that all this retinal activity is, in fact, the first level of brain processing of visual stimulation.
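As a toy illustration of the trichromatic coding just described, the short Python sketch below models each cone class as a Gaussian sensitivity curve over wavelength and reports the relative activation of the three classes for a few monochromatic lights. The peak wavelengths are approximate published values, but the Gaussian shape and the bandwidth are crude assumptions made purely for illustration, not a model of real cone pigments.

# Toy trichromatic coding: the "color" signaled is the ratio of activation
# across the three cone classes. Peak wavelengths are approximate; the
# Gaussian tuning curve and its bandwidth are illustrative assumptions.
import math

CONE_PEAKS_NM = {"short": 420.0, "medium": 534.0, "long": 564.0}
BANDWIDTH_NM = 50.0     # assumed width of each sensitivity curve

def cone_responses(wavelength_nm):
    return {name: math.exp(-((wavelength_nm - peak) / BANDWIDTH_NM) ** 2)
            for name, peak in CONE_PEAKS_NM.items()}

for wavelength_nm in (450, 520, 580, 650):          # bluish, green, yellowish, red light
    responses = cone_responses(wavelength_nm)
    total = sum(responses.values())
    ratios = {name: round(r / total, 2) for name, r in responses.items()}
    print(wavelength_nm, "nm ->", ratios)

The 450 nm light drives mainly the short-wavelength cones (seen as blue), while the 650 nm light drives almost exclusively the long-wavelength cones (seen as red); intermediate wavelengths produce the mixed activation patterns described above.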

5.6.2 The brain

So far we have seen the presence of active and complex processes even in the first stages of the visual sensory system. Indeed, it would be a serious mistake to think that vision, or any other sensory system, simply sends replicas of sensory events, like images from the retina, on to the brain to be processed there. In fact, the image on the retinal surface is not seen by the higher brain centers at all, nor is anything like it. The messages the eye sends to the thalamus resemble the original image no more than telephone signals sent across wires and through the airwaves resemble the sounds of the people speaking. There is a deeper mystery here, however, because those telephone signals are reconstructed at the other end into close facsimiles of the original voices of the speakers, whereas the activity sent to the brain is never reconstructed again into an image. Indeed, it seems at present that once reaching the cortex, visual activity scatters out in multiple directions, where different kinds of processing are carried out on different aspects of vision (aspects such as color, form, and size), never to come together again. This is a deep mystery of vision, and a deep mystery of consciousness as well. Specifically, how do the various elements of experience (in the case of vision, color, form, brightness, and so on) that are processed in different locations in the brain find their way back into a single conscious experience? This is known as the binding problem.

Now let us trace out some of the aspects of the brain’s response to visual stimulation. Each ganglion cell carries a complex message to the lateral geniculate bodies, the visual nuclei of the thalamus. Most commonly this is a combination of information about location, brightness, and color. The typical ganglion cell responds to a particular region on the surface of the retina, corresponding to a particular region of the visual field. This is known as its receptive field. This field can usually be mapped into a small round central area, and a donut-shaped surrounding region. Cells differ in their response characteristics, but most often the adequate stimulus for a particular ganglion cell is the presence of light in the center, or in the surround. These two regions are usually antagonistic, so that light in one region inhibits the response to light in the other. In other words, the cell responds either to light in the center and is inhibited by light in the surround (a center-on surround-off cell), or vice versa (a center-off surround-on cell). To make matters more complex, many cells are color specific. For instance, such a cell may be stimulated by red light in the center, but inhibited by green light in the surround. Similar cells are found in the lateral geniculate bodies.

Now, just what is going on here? The traditional explanation is to view these, and other similar cell responses, as providing discrete units of information which eventually are combined by the higher visual centers of the brain into increasingly complex representations of the outside world. Let’s follow this line of thought. The lateral geniculate body itself is a complex structure, with a layered, overlapping representation from the two eyes. Certain of its 6 layers deal with color and detail, and others with the global properties of the visual field and especially with movement. They all project to the striate cortex, the primary visual area in the occipital lobe. The cortex has 6 layers.
The fanned-out pathway from the thalamus, called the optic radiations, projects to layer 4. If we look in on layer 4 of the striate cortex we find cells with center-surround receptive fields not unlike those seen in the lateral geniculate bodies, and in the ganglion cells of the retina. Above and below layer 4, however, are located cells with more complex response patterns.

These more complex cells were discovered in the 1960s by the sensory physiologists David Hubel and Torsten Wiesel, at Johns Hopkins University, causing a considerable impact on the neuroscience community, which saw them as representing an early stage of feature detection analysis in the brain. The idea was that the brain extracts increasingly sophisticated features from the visual image, leading to an eventual recognition or reconstruction of the image at a higher center. These cells prominently included simple cells, which acted much like the above center-surround cells, but required a bar of light to trigger their best response, and complex cells, which were less demanding as to the location of the stimulus, but often required the bar to be in motion. And unlike the simple cells, they would not respond to points of light at all. In both instances the requisite bar of light must be at a particular angle, or orientation, specific to the cell. Even more impressive hypercomplex cells were also found in the striate cortex, which would respond to a bar of light in either of two mutually perpendicular orientations. It seemed as if the brain were searching the visual image for lines and contours that would define a particular stimulus. A cartoonist does something like this in drawing a caricature of a well-known face. Only a few lines are needed to characterize the already familiar person.

In fact, there must be something to this idea, because the very organization of the striate cortex seems designed for its operation. One of Hubel and Wiesel’s most important discoveries (one which goes far beyond the study of vision alone) was that the cortex is organized into small and surprisingly independent computational units, or columns, that vertically penetrate the entire cortex (Hubel and Wiesel, 1965). Very little, if any, communication passes directly between these columns, even those immediately adjacent to each other. Signals that pass between them do so through the subcortical white matter. Within a single column, virtually all activity carried on is between the cells of that column, traveling up and down within it. Now, it turns out that the simple and complex cells of each column respond to only one single orientation. This could be lateral, vertical, or oblique bars of light, but in each case it must be the correct one for the column. If one then moves along the cortex (imagine, for example, flying over it in a tiny plane), then each successive column as you pass responds to a slightly different orientation, like a clock hand turning slowly from lateral to oblique, then on to horizontal, and so on. In other words, the columns are systematically laid out side by side in terms of orientation. If one moves along far enough a complete cycle will be observed, forming a functional hypercolumn. Passing beyond the completion of each hypercolumn the cycle begins again, but this time responding to a different, contiguous, region of the retina. If, on the other hand, we would turn directly to the left or the right and set out again, we would pass alternating columns representing input from the left and the right eyes.
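A convenient way to see what an orientation-selective simple cell might be doing is to model its receptive field as a Gabor filter, a standard computational abstraction in vision research (not Hubel and Wiesel's own formulation). In the hypothetical Python sketch below, the sizes, wavelength, and angles are arbitrary illustrative choices; the point is only that the modeled cell responds strongly to a bar at its preferred orientation and weakly to bars at other orientations.

# A "simple cell" sketched as a Gabor filter: an oriented, striped receptive
# field. The cell's response is the overlap between the filter and a bar
# stimulus; it is largest when the bar matches the preferred orientation.
import numpy as np

def gabor_receptive_field(size=32, theta=0.0, wavelength=8.0, sigma=5.0):
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    x_rot = x * np.cos(theta) + y * np.sin(theta)        # rotate into the preferred axis
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))   # local, roughly circular window
    carrier = np.cos(2 * np.pi * x_rot / wavelength)     # stripes set orientation and spatial frequency
    return envelope * carrier

def bar_stimulus(size=32, theta=0.0, width=3):
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    return (np.abs(x_rot) < width).astype(float)         # a bright bar at angle theta

cell = gabor_receptive_field(theta=0.0)                  # this model cell "prefers" vertical bars
for angle_deg in (0, 30, 60, 90):
    stimulus = bar_stimulus(theta=np.radians(angle_deg))
    response = abs(np.sum(cell * stimulus))              # simple linear receptive-field response
    print(angle_deg, "degrees from preferred ->", round(float(response), 2))

The response falls off as the bar is rotated away from the preferred orientation. Because the same filter is tuned both to an orientation and to a stripe spacing (its wavelength), the sketch also hints at the spatial-frequency tuning discussed below.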
With the above in mind, it is tempting to speculate in terms of visual activity moving from the primary visual cortex out through the pre-striate portions of the occipital lobe, and then on into the temporal and parietal lobes, all the way finding columns with visual neurons that respond to increasingly complex features of the visual stimulus. Neurons might respond successively to angles, degrees of curvature, complex figures like the printed letters of the alphabet, and finally to intricate patterns such as the shapes of animals and human faces. Indeed, one famous investigation found a cell in the inferior temporal cortex of a monkey that responded to the silhouette of a monkey paw as seen from the perspective of the monkey itself! Indeed, this is how many researchers still understand the basic path of perception in the visual system.

There are problems, however, with this view. For one thing, with the exception of the much-mentioned "monkey-paw cell" above, no one has found any neuron in the entire visual system that responds to anything more complex than angles and contours. Moreover, it is not always clear from a feature detection perspective just what certain cells are communicating. Consider a layer 4 striate cortical cell that responds to green light in the center, but is inhibited by red light in the surround. White light, which contains all wavelengths, has a modest effect on both the center and the surround. Of course, the brightness of the stimulating light plays a role too, activating or inhibiting the cell in proportion to its intensity. Now, technically this cell is coding for location, color, and intensity, but how is the brain to sort out which is the effective factor at any given moment? Suppose, for instance, the cell is observed to respond at a moderate level of excitation. Is this the result of a dim green light in the center, or a white light flooding the entire receptive field, or perhaps a bright green light in the center with a dim red light in the inhibitory surround? There is a great deal of ambiguity in the response of neurons that can be activated or inhibited by several variables at once. The standard answer to this problem is to say that the brain handles input in terms of groups of cells, or ensembles. Cells are not counted individually so much as they participate in complex groupings. This may well be the case, but until the actual logic of such ensembles is worked out, the simple feature detection approach to understanding vision, or any other sensory mode in the brain, will remain problematic.

What is more, it is not clear that features, at least in terms of lines and angles, are what the visual cortex is looking at anyway. It turns out, for example, that the same simple and complex cells that Hubel and Wiesel mapped respond even more definitively to patterns with particular spatial frequencies. Here, spatial frequency refers to the "rate" of change of brightness and darkness across the visual display. A pure frequency can, for instance, be represented by a black and white grating of a particular grade of fineness or roughness. The point here is that we are far from having the final word, and while the logical, linear, and AI-friendly idea of feature detection has considerable merit, it may well not be the whole story.
For example, the prominent brain scientist Karl Pribram has researched and developed an entire theory of perception, and memory as well, based on the idea that the brain in some ways works on the same principle as a hologram, treating information in terms of waveforms and frequencies rather than lines and angles. The finding that the striate cells respond so clearly in terms of spatial frequency is highly supportive of this approach.

5.6.3 Contemporary research

In recent years our entire understanding of the operation of the visual brain has been vastly improved by clinical studies with human beings and experimental research with animals. It is now apparent that vision is an enormously rich process in the brain, involving many different areas that specialize in different aspects of the visual world. For example, the striate cortex turns out to be peppered with color blobs, collections of cells that respond specifically to color. Exploring the cortical regions adjacent to the primary visual cortex, researchers have found substantial regions that specialize in coding shape, color, movement, and even "dynamic movement," the sense of movement in stationary figures such as the dancers on a Greek vase.

Visual activity spreads out in two broad streams from the occipital lobe, into the inferior temporal lobe and up into the parietal lobe. The inferior temporal lobe in primates is important in the visual recognition of objects. This area has close connections with the hippocampus and limbic system, and also with the medial temporal lobe, playing an important role in visual memory. It is fascinating to study the disorders of perception and memory that result from damage to the inferior and medial temporal lobe. The parietal lobe is important for localizing objects in space, in other words identifying their direction and distance. This area also seems to play an important role in directing visual attention to one part of the visual field or another. Extensive damage to the parietal lobe on either side can result in the "hemi-neglect syndrome," in which the individual seems entirely unable to notice objects on the opposite side of the body. This neglect can extend to washing and dressing only one side of the body, and even to eating food from only one side of the plate.

5.7 Motor Control of the Body

Galen placed motor control in the cerebellum of the hindbrain, and was not entirely wrong in doing so. But there are other centers of motor control as well. As we have seen, in 1870 the German physiologists Gustav Fritsch and Eduard Hitzig located an area in the frontal lobe of the dog which produced muscle contractions when electrically stimulated. By about the turn of the century other investigators who worked with animals, such as the prominent British experimentalist Charles Sherrington, had identified the precentral cortex of the frontal lobe (Area 4) as vitally important in the production of movement. In the 1950s Wilder Penfield, a Canadian neurosurgeon, finally demonstrated beyond doubt the importance of this area by showing that mild stimulation to it produces muscle contractions in various parts of the body. This region of the frontal cortex became known as the primary motor cortex. It was assumed that its prominent location in the neocortex meant that it was the point of highest origination of voluntary movement, an inference that did not pan out.

The detailed study of how the body regulates motor activity was developed to a surprising degree of sophistication at the turn of the century by Sherrington’s classic investigations of the role of spinally mediated reflexes. During the first half of the century researchers continued to probe the fine-grained details of muscle regulation in locomotion, such as walking, running, and swimming, and in the ordinary maintenance of posture. By the late 1960s the particulars were coming together to form a complex picture in which the rich synaptic core of the spinal cord was seen to regulate a variety of muscle activities leading to smooth movement and stable posture. The complexity of this activity, proceeding entirely out of consciousness at the level of the cord and the motor and sensory neurons that connect it to the muscles, is not often appreciated. A dog or a cat that has suffered a transection (cut or division) of the spinal cord just below the hindbrain, for instance, can still produce stiff but correctly sequenced walking movements, mediated solely by reflex action from the cord, when placed in an upright position.

A reflex is an automatic response involving a small number of synapses between some sensory input and the subsequent motor output. The myotatic reflex, for example, is a single-synapse reflex (one that is mono-synaptic) activated when a muscle is passively stretched, causing it to contract and offer resistance to the stretch. It is this reflex that keeps your arm from dropping when a heavy book is placed in your hand, and keeps horses from falling down when riders jump upon them. It is also this reflex that causes your knee to jerk when the patellar tendon is tapped with the physician’s rubber mallet, thus stretching the leg muscle above it.

Within the skeletal muscles are found sensory organs that respond to the degree of stretch of the muscles. These stretch receptors are called muscle spindles, and each contains its own tiny muscle within, which is controlled by the central nervous system independently of the primary muscle in which the organ is located. By actively calibrating the tension on this muscle within a muscle, the CNS is able to adjust the "set" of the large muscle, allowing reflex activity to hold the degree of stretch of the primary muscle. In this way the set of the muscle spindle determines the angle of the joint that the muscle controls, thus maintaining a desired posture.
This, in fact, is a simple feedback loop in which the muscle-spindle organ operates for the joint position something like a thermostat operates for the temperature of a house. It has a set point outside of which it sets in motion a corrective process in the heating system, or in the case of the muscle spindle, the primary muscle. Such reflexive feedback loops play an important role in ordinary posture such as standing or just sitting quietly. It has been shown, however, that they are also very active during locomotion, such as walking, during which reflexive activity constantly compares the ongoing position of the leg and foot joints to an internal template representing where they are supposed to be. If there is a mismatch, complex processes are set in motion to correct the situation. These usually involve simple adjustments of the necessary skeletal muscles, but in extreme cases such as losing one’s balance, may require conscious intervention.

We mention these basic facts about the regulation of muscle activity, even though it is not necessarily a topic of great interest to psychologists, because it demonstrates a basic principle about the nervous system, namely, that there are corrective feedback loops at all levels of its activity. Whether one looks at postural reflexes at the level of the cord, as we did above, or at the highest levels of perceptual processing, corrective feedback seems virtually universal. It has recently been discovered, for example, that a substantial portion of the axon fibers in the optic and auditory nerves are actually carrying information out of the CNS into the eye and ear. This seemed a paradox at first, but now it is becoming increasingly recognized that these efferent neurons are supplying feedback from the brain to "tune" the eyes and ears according to what is needed in the way of visual and auditory input. Likewise, all connections in the visual brain, from the thalamus to the striate cortex, from the striate cortex to the various specialized patches of parastriate cortex, and so on, are found to have massive return feedback for no apparent reason other than to supply a constant means of active calibration and adjustment of the flow of visual information as it makes its way through the brain. Thus, the importance of dynamic feedback is seen from the highest levels of the perceptual systems to the lowest levels of the motor system.

5.7.1 Overview of the motor control system of the brain

The easiest way to understand the motor control system of the brain is to begin with an overview of the entire system, then proceed to examine the importance of the individual structures. The main structures involved in the control of movement are the cerebellum, the basal nuclei (or basal ganglia), and the primary motor cortex (M1) of the frontal lobe, as well as the thalamus as a relay center. Other regions of importance are the premotor area (PMA), located just in front of the primary motor cortex, and the supplementary motor area (SMA), lying in front of that. There is also important input from the somatosensory cortex, which you may recall is located just posterior to the primary motor cortex, on the other side of the central fissure.

It is difficult to say just where a behavior begins to solidify in the brain. A wide range of areas is probably involved. But one of the most important is without doubt the frontal lobe, especially its anterior portions. In the laboratory, a wave of electrical activity called a readiness potential is seen in the region of the supplementary motor cortex as early as a second before the actual appearance of a voluntary movement such as a finger flick. This, plus other evidence, suggests that the SMA is involved in the early stages of organizing a behavior, well before it is sent as a motor command to the muscles. For example, PET scans, which show overall levels of cortical activity, register high levels of activity in the SMA even when people are asked to imagine that they are doing something like reading aloud, or tapping their finger.

It is evidently not the case, however, that activity is sent directly from the SMA and the nearby PMA to the primary motor cortex without involving other areas of the brain as well. Many behaviors, such as rising from a chair or taking a step, seem to require a kind of "start" signal from the basal nuclei. Dysfunctions of the basal nuclei can cause people to "freeze up," finding it difficult to initiate such simple activities. The advanced stages of Parkinson’s disease can cause this problem, and other difficulties as well. This condition is the result of the deficiency of a particular neurotransmitter, dopamine, in the basal nuclei. It occurs in about 1% of the population over 50. Other symptoms include a general paucity and slowness of movement, a tremor of the extremities when at rest, increased muscle tone, and difficulty organizing movements, especially ones that are slow and voluntary. On the other hand, a person with Parkinson’s disease might be able to effortlessly carry out well-learned and rapid behaviors such as walking and running, once these behaviors are initiated.

Damage to the cerebellum seems to result in just the opposite effects. There is a spastic quality to movement, so that, for example, reaching for a cup may throw the arm into a series of oscillations. But there is no resting tremor, as in Parkinson’s disease. There is considerable difficulty, however, with behaviors that require the organized coordination of muscles over a range of the body. Walking, running, and throwing a ball would all be difficult for a person with a damaged cerebellum. It would seem that, while the basal nuclei are important in organizing consciously executed fine motor behaviors such as dialing a telephone, the cerebellum is vital to coordinating many muscles in rapid, well-learned sequences that are ordinarily carried out in a more or less automatic fashion.
Recent research indicates that the cerebellum constantly monitors the results of such motor behaviors, making corrections as learning takes place. Acquiring the ability to ride a bicycle or throw a basketball is largely an accomplishment of the cerebellum.

Now let us return briefly to the primary motor cortex. Recordings of activity in various parts of the brain prior to the launching of movements make it clear, paradoxically, that despite the fact that the motor cortex is part of the evolutionarily new neocortex, it is not the first stage in the launching of behaviors, even voluntary ones. Depending on the nature of the behavior, either the basal nuclei or the cerebellum become active before the primary motor cortex. As indicated above, these structures are involved in the production of either slow, voluntary movement, in the case of the basal nuclei, or rapid, well-learned movements that require the coordination of many muscle groups, in the case of the cerebellum. Messages from the basal nuclei and the cerebellum are communicated to the primary motor cortex through the ventral lateral nuclei of the thalamus. The primary motor cortex thus becomes the last forebrain structure involved, evidently patterning the behavior in a form to be sent directly to the lower brain centers and the muscles themselves.
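The corrective-feedback principle running through this section (the thermostat-like muscle-spindle loop, and the cerebellum's ongoing error correction) can be caricatured in a few lines of Python. The sketch below is only a toy proportional controller with assumed gains and loads, not a model of any actual neural circuit; it simply shows a loop that measures the mismatch between a desired and an actual joint angle and issues a correction on every cycle.

# A toy corrective feedback loop: compare the current joint angle to a set
# point and issue a proportional correction, thermostat-style. The gain,
# number of steps, and disturbance are arbitrary illustrative values.

def posture_loop(set_point_deg, start_deg, gain=0.3, steps=15, load=0.0):
    angle = start_deg
    history = []
    for _ in range(steps):
        error = set_point_deg - angle      # mismatch between desired and actual posture
        angle += gain * error + load       # corrective "muscle command" plus any steady load
        history.append(round(angle, 1))
    return history

print(posture_loop(90.0, 60.0))            # converges toward the 90-degree set point
print(posture_loop(90.0, 60.0, load=-2.0)) # with a steady load it settles slightly short

Even this crude loop shows the characteristic behavior of purely proportional correction: it converges quickly, but a constant load leaves a small residual error.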

5.7.2 The spinal pathways

Signals from the primary motor cortex find their way to the spinal pathways by two general routes. The first is a direct route, comprised of large fast-acting axons that carry action potentials directly from the giant pyramidal cells of the cortex (so named because they look a bit like huge upside-down pyramids) to the level of the spinal cord, where they synapse onto the final motor axons. This pathway has traditionally been called the pyramidal tract (a tract is a pathway in the CNS, whereas a nerve is a similar pathway in the peripheral nervous system). The pyramidal pathway provides swift and articulate control of the skeletal muscles. The extra-pyramidal pathway has many synapses along its route, is slow, and spreads into the superior colliculi and the reticular formation, where it activates descending influences down through the spinal cord that heighten or inhibit muscle tone and reflexes. Damage to the upper cord can disconnect these influences, resulting in exaggerated or depressed reflexes as well as the loss of voluntary motor control.

5.7.3 The frontal lobe

Though we have mentioned the frontal lobe previously, we finish this section by considering it again. The relatively recently discovered supplementary motor cortex seems increasingly important in understanding the earliest stages of the organization of behavior. It is as if the behavior were first assembled there in the abstract, waiting for the right moment to be launched off to the basal nuclei, the cerebellum, or in some instances perhaps directly to the premotor or even the primary motor cortex.

The premotor cortex has been investigated even more recently, though results are far from exhaustive at this point. It seems, however, that it holds dispositions to act, for example a readiness to make a particular response, until the right moment comes to release the behavior. An example would be waiting for a light on a game board to flash on before making a preselected response.

In all of this we see a gradation across the frontal lobe from the very specific and motor-oriented activity of the primary motor cortex, to the preparedness of the premotor area, to the abstract construction of the response in the supplementary motor cortex, a construction that may never actually be emitted as real behavior. Moving even further forward into the prefrontal lobe we find an increasing concern with thinking and planning for the future. Thus, this highly developed lobe, the hallmark of human evolution, seems as much engaged in planning for and acting into the future, whether it be the next few seconds or the rest of one’s life, as the occipital lobe is involved in vision.

5.8 Biological Rhythms

As biological organisms we are subject to three classes of biological rhythms. These are daily, or circadian, rhythms, which repeat once a day; ultradian rhythms, which repeat more than once a day; and infradian rhythms, which repeat less than once a day. The word circadian derives from circa, which means "approximately," and dies, which means "day." Alternatively, the prefixes ultra and infra mean "above" or "below" this value. We are all familiar with the circadian rhythm of sleep and waking. Examples of ultradian rhythms include breathing, heart action, and the electrical fluctuations of the brain that we record as EEG. Another example is the nightly rhythm of non-dream and dream sleep. Infradian rhythms include the menstrual cycle.

The study of biological rhythms, or chronobiology, is both a fascinating and complex field, with a great deal still to be learned about it. Even so, as individuals and as a culture we have a great deal to assimilate from what is already known in this important area. The fact, for example, that we require teenagers to appear in school at 7:30 in the morning, ready to perform at their best, that we take the same drugs in the same doses no matter the time of day or night, and that we expect people to work for stretches of 4 to 6 hours at a time without real rest breaks, all speak to a lagging ability to put to use the basic facts of chronobiology.
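To make the three classes concrete, the following minimal Python sketch (our own illustration, not part of the original discussion) classifies a rhythm by its period using the definitions just given; the function name, the two-hour tolerance around 24 hours, and the example periods are arbitrary choices.

    def classify_rhythm(period_hours):
        """Classify a biological rhythm by its period:
        circadian - repeats about once a day,
        ultradian - repeats more than once a day (period shorter than a day),
        infradian - repeats less than once a day (period longer than a day)."""
        # Treat periods within a couple of hours of 24 h as "approximately a day".
        if abs(period_hours - 24.0) <= 2.0:
            return "circadian"
        elif period_hours < 24.0:
            return "ultradian"
        else:
            return "infradian"

    # Illustrative examples drawn from the text:
    print(classify_rhythm(24.0))       # sleep/wake cycle          -> circadian
    print(classify_rhythm(1.5))        # dream/non-dream cycle     -> ultradian
    print(classify_rhythm(28 * 24.0))  # menstrual cycle (roughly) -> infradian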

5.8.1 Circadian rhythms

The most obvious circadian rhythm is the cycle of sleeping and waking. As human beings we tend to be active in the daytime and to sleep at night. We tend to be locked into this rhythm by the daily cycle of light and darkness, the most important environmental zeitgeber, or "time giver." Nocturnal animals such as owls respond to the day-night cycle in an opposite fashion to humans. Most of us sleep between 6 and 8 hours at night and are awake for about 16 or 17 hours during the day, though the waking period is not without periods of inactivity or even short naps, and sleep is often accompanied by brief episodes of waking.

Over the past half century or so several experiments have been carried out in which volunteers have lived for periods of weeks or even months in environments such as caves that are isolated from the ordinary environmental cues (zeitgebers) for waking and sleeping. These individuals set their own daily schedules with no knowledge of the time in the outside world. In this situation people tend to shift rather quickly into a 25-hour daily rhythm in which they are active for 17 hours, then sleep for 8 hours. Thus they cycle forward in relationship to the ordinary day at a rate of about 1 hour per day.

This tendency to shift forward is an important fact of human chronobiology, one that can be utilized in environments where people are required to work at varying times of the day and night. Examples of people subject to these kinds of demands include fire fighters, police, airline pilots, truckers, and so on. It is difficult to adjust to changing patterns of waking and sleeping, but if a person can move forward, as is the natural human tendency, then it is easier than if their sleep periods were shuffled about randomly, or even moved backward. For instance, it is easier for a nurse to move from an early, to a late, to a night or "graveyard" shift, in this order, than the other way around. Each night she can stay up a little later, and when she moves to the next shift, it will come relatively easily. It is easier for most people to stay up in this fashion a little later each night than to try to get to sleep earlier. Studies show that performance and job satisfaction are both facilitated by moving forward around the clock rather than by moving backward.

A similar situation occurs when people fly from one time zone to another. It is easier for most people to fly west than east. If you fly from New York to San Francisco you will find yourself getting tired three hours earlier than the local people there. This is not a serious problem, however, since you can stay up with them and then you will sleep very well, waking up bright and early and ready to go the next day, maybe even a bit too early! Alternatively, however, on the return trip you find yourself in New York wide awake at midnight, unable to sleep until the wee hours of the morning, and then unable to get up at a reasonable time in the morning. It usually takes several days to adjust to this situation.

If certain individuals stay for long enough in an environment with no external schedule, for instance weeks or months, they may show an odd shift in their free-running daily rhythm so that they stay awake for as much as 20 hours at a time, then sleep for as much as 12 hours. They are seemingly unaware of this shift.
Along with it, however, some of the many biological rhythms ordinarily locked together, or entrained, by the light-dark cycle of the sun become desynchronized. The low point in the body’s temperature cycle, which usually comes not long before waking, may drift forward through the sleep period and even exhibit itself during waking. At the same time the highest temperature, which ordinarily comes in the early afternoon, may drift into the sleeping period. The result, not surprisingly, is poor sleep, and a poor quality of wakefulness. This is exactly what happens in jet lag. After making a long trip, and even after adjusting to the local sleep rhythm, one may still experience several days, or a week or more, of fatigue, vague feelings of discomfort or anxiety, and poorer than normal performance. This is also what happens to people forced to work on "swing shifts," in which their work schedule is regularly shifted around dramatically.

There is an age difference in the sleep cycle as well. Young people in their teens and twenties tend to shift forward toward staying awake late into the night and sleeping late in the morning. This is natural for them, and efforts to force them to perform in an alert fashion early in the morning are often unpleasant and not very productive for anyone concerned. On the other hand, with midlife and increasing age there is a tendency for the cycle to move backward. Older people experience earlier sleepiness and tend to wake up earlier in the morning, ready to get up and start the day. Subjective reports of quality of experience also indicate that most older people experience a greater sense of well-being in the morning than in the evening. Of course there are wide individual differences, with "owls" enjoying late night activity and sleeping late into the morning, and "larks" going to bed early and looking forward to getting up early the next morning. It is important for "owls" and "larks" to negotiate differences between these proclivities in close relations such as marriages.

There are also age differences in the amount of sleep people need, and there are large individual differences as well. In fact, studies differ to a surprising degree on these matters. The gist of it, however, is that most adults need around 7 or 8 hours of sleep per night. Children and teenagers need closer to 10 hours. Infants may need more. On the average, newborn infants sleep about 12 hours a day, but there are very wide individual differences. Some studies indicate that older people, past midlife, require less sleep, getting along on as little as 5 or 6 hours per night if they are generally healthy. Certainly it is true that many older people expect to sleep as much or more than they did when they were younger, and especially after retirement may go to bed early with the expectation of a long night’s rest. Then they may find that they have difficulty sleeping. This would seem to be part of the problem with the insomnia commonly reported by older people. It is true, however, that in general, as we age, we do not sleep as soundly as we did when we were young. In the meantime, people who do not get sufficient sleep often suffer cognitive impairments.

There are many theories on why we need to sleep. Generally they are either restorative or survival theories. The former focus on the restoration of the body, or the need to take the brain off-line, as it were, for a period of rest each day.
The first of these emphasizes the need to eliminate toxins or waste products during sleep, or the reduction of some sleep-inducing hormone. Such theories appear with surprising regularity, but at the time of this writing none has entirely won the field. It is well to keep in mind in evaluating these ideas that there is no clear biological evidence for a need to become entirely inactive for long periods of time each day in terms of the restoration of muscle tissues or other functions.

A second type of restorative theory emphasizes the idea that, for one reason or another, the brain, and especially its higher functions, cannot be operated continuously without allowing it down time. Charles Sherrington believed, wrongly, that the cortex is inactive during sleep, and that it needed sleep for recuperation. Ivan Pavlov likewise thought the cortex to be inactive in sleep, and that sleep was necessary to balance excitatory and inhibitory processes at play in the cortex. These notions seem outmoded today, but modern efforts to model the cortex with neural-network computer simulations have suggested the possibility that sleep, and dreaming sleep in particular, may be important for sorting out important from unimportant information in memory consolidation. Such computer circuits accumulate too much information during their active learning runs. They seem to benefit from chaotic "dream" episodes in which they are disconnected from outside input. During these episodes they dissolve weak, spurious learning connections that build up over time and interfere with more important memories (a toy illustration of this kind of "unlearning" is sketched at the end of this subsection). Actually this computer simulation work is very tenuous, but nevertheless there is a substantial and growing body of evidence suggesting that dreaming may play a facilitatory role in memory consolidation in humans as well as other species.

We should also note before leaving the restorative theories that it is only warm-blooded animals that produce deep sleep of the kind we are familiar with as human beings. During such sleep the body temperature drops below its daytime level, and along with behavioral strategies such as curling up in a warm, protected location, and sleeping with others, this conserves a considerable amount of heat energy. Considering that the evolutionary shift from cold-blooded to warm-blooded animals was paid for by a multiple increase in the total energy necessary to stay alive, this conservation of energy during sleeping may be an important part of how sleep evolved in the first place.

The survival (SYA, or "save your ass!") theories of sleep tend to be behavioral, emphasizing that most animals tend to be active and busy most of the time when they are awake. This "busyness" has great survival benefit, but being out in the jungle, or climbing glaciers, when the day’s supply of provisions has already been gathered, can result in a person, or an animal, being eaten by some of the "busy" predators out there. Sleep, by this view, is a neurological mechanism that evolved to keep the organism out of mischief when no more activity is required for the day. Most of the evidence for this theory comes from looking at animals comparatively.

Wild African dogs, for instance, hunt very efficiently and need only 2 to 4 hours a day to gather food. The remainder of the time they laze about and sleep. Lions do the same thing. Ungulates, such as zebra and gazelle, that live on grass and must eat 20 to 22 hours a day to get enough food, typically sleep only a few hours a day, standing up. If that is not enough, there are studies of rare human beings who seem to need no more than a few hours of sleep each week, or even less. These individuals seem to be healthy, and their presence argues against any notion that it is biologically necessary for the brain to become inactive for long periods of time daily. However, they are exceptions to the rule.
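As mentioned above, the "dream unlearning" idea from neural-network simulations can be illustrated with a toy Hopfield-style network. The sketch below is our own minimal Python illustration, in the spirit of the "reverse learning" proposals associated with such simulations, and not a model taken from the literature cited in this chapter: patterns are stored with Hebbian updates, the network then "dreams" by settling from random starting states, and the states it falls into are weakly unlearned with an anti-Hebbian update, which tends to erode weak, spurious attractors more quickly than the strongly stored memories.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 64                                       # binary (+1/-1) units
    patterns = rng.choice([-1, 1], size=(5, N))  # "memories" to store

    # Hebbian storage: strengthen connections between co-active units.
    W = np.zeros((N, N))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    W /= N

    def settle(state, steps=20):
        """Repeated asynchronous updates, letting the network relax into an attractor."""
        s = state.copy()
        for _ in range(steps):
            for i in rng.permutation(N):
                s[i] = 1 if W[i] @ s >= 0 else -1
        return s

    def dream(n_dreams=50, eta=0.01):
        """'Dream' phase: start from random states, find whatever attractor the
        network falls into, and weakly unlearn it (anti-Hebbian update)."""
        global W
        for _ in range(n_dreams):
            s = settle(rng.choice([-1, 1], size=N))
            W -= eta * np.outer(s, s) / N
            np.fill_diagonal(W, 0)

    def recall_overlap():
        """Average overlap between each stored pattern and what the network
        recalls from a cue with one eighth of its bits flipped."""
        overlaps = []
        for p in patterns:
            cue = p.copy()
            flipped = rng.choice(N, size=N // 8, replace=False)
            cue[flipped] *= -1
            overlaps.append(abs(settle(cue) @ p) / N)
        return float(np.mean(overlaps))

    print("mean recall overlap before dreaming:", recall_overlap())
    dream()
    print("mean recall overlap after dreaming: ", recall_overlap())

With only a handful of stored patterns the memories survive this mild unlearning essentially intact; in simulations of this kind, unlearning has been reported to matter most near the network's storage capacity, where spurious blend states are common.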

5.8.2 Ultradian rhythms

There seems to be a cycling between high levels of activity and periods of passive rest that we all experience every few hours throughout the day. There are large individual differences, but the length of the cycle for most people seems to be roughly 45 minutes to 3 hours. The psychologist Ernest Rossi (1996) has found that the ability to become aware of this cycle, and give oneself brief periods of rest during the passive moments, is important to stress management and mental health. A similar cycle between active and passive states seems to characterize sleep as well. This is the vacillation between dream and non-dream sleep.

Over a century ago, the British physiologist Richard Caton discovered minute electrical fluctuations from the surface of the brain of a rabbit. These electrical oscillations are today known as the electroencephalogram (or EEG). They were again recorded from the surface of the human head in 1928 by Hans Berger (see Gregory, 1987; and Hobson, 1988), who shortly thereafter discovered that the wave pattern was more rapid with the eyes opened than closed, and that the rhythm became much slower during sleep. His findings finally became widely accepted in 1933, when the prominent British brain researchers Lord Edgar Adrian and Bryan Matthews replicated these measurements using more refined instrumentation. The final step in the history of the early research on brain activity and sleep came in 1953 when a brilliant graduate student, Eugene Aserinsky, and the prominent sleep researcher, Nathaniel Kleitman, at the University of Chicago, discovered that periods of sizable rapid eye movements during sleep (REM) were associated with dreaming. Since such movements, signaling dreaming, are easy to identify using nonintrusive surface electrodes placed on the skin near the eyes, this discovery opened the door to modern investigations of the neuropsychology of sleep and dreaming.

Research since that time makes it clear that sleep can be divided into non-REM sleep and intermittent periods of REM sleep. REM sleep is usually associated with dreaming, and is sometimes termed active sleep, while non-REM sleep is not associated with dreaming and is sometimes called slow-wave sleep, referring to the long, slow EEG waves that characterize it. This division, however, is not always exclusive, as episodes of dreaming have been reported from non-REM sleep as well. This is clearly an area that needs more clarification.

EEG analysis reveals that the shift from wakefulness into sleep, and then into deep non-REM sleep, is associated with an increase in the amplitude (size) of the EEG rhythm, and a slowing of its oscillations. Wakefulness is marked by a shallow, rapid, and relatively high frequency EEG pattern termed beta, above about 12 Hz (Hertz, or cycles per second). Passively relaxing with the eyes closed produces a larger, slower rhythm termed alpha that ranges from 8 to 12 Hz. It is normal for a person to pass through alpha while falling asleep. In an ordinary night’s sleep a person will then pass through the series of four stages of non-REM sleep. The first of these, or Stage 1, is characterized by theta rhythms, ranging from 4 to 8 Hz. Stage 2 sleep is also characterized by theta, but when one is falling into deep sleep rather than, for example, taking a nap, the record also displays occasional bursts of rapid high frequency activity that may last from half a second or less up to two or three seconds.
These are called sleep spindles because on displays they look like spindles of thread. Along with these are seen occasional abrupt and brief bursts of irregular electrical activity termed K complexes. Stage 3 is characterized by an even slower and larger pattern called delta rhythm. It ranges from 3 Hz down to 1 Hz, or in other words, one oscillation per second. Stage 3 is characterized by the upper end of delta, while Stage 4, the deepest stage of non-REM sleep, is associated with the lowest end of delta. (The band figures used here are gathered into a short illustrative sketch at the end of this subsection.)

Let us take a look at some of the physiological and experiential characteristics of each of these stages. Stage 1 sleep is a relaxed transitional condition in which one experiences a sense of drifting off to sleep. Somewhere in Stage 1 there is a slow vertical eye roll that evidently corresponds to the inward turning of consciousness and the loss of sensory input from the outside world. Stage 2 often is associated with deep relaxation and a sense of floating or reverie. Images, usually static in nature, may be seen but rarely remembered upon waking. These are termed hypnagogic images. This stage of sleep may also be accompanied by a large whole-body spasm, which if it occurs, will probably wake a person up, and anyone else in bed as well! This is a normal phenomenon. Stage 3 is a deeply relaxed state, but one in which body activity in the form of rolling or thrashing about in bed is not unusual. Stage 4 is the deepest and most restful stage of non-REM sleep. It is the only form of non-REM sleep to demonstrate rebound, or compensation in length when a person has lost sleep or is excessively fatigued. Ironically, it is Stage 4 sleep that suffers the greatest loss as a consequence of taking many sleep medications, which results in more time in the superficial stages of sleep, and poor rest. Some other medications cut into REM sleep.

In an ordinary night’s sleep, within less than the first hour and sometimes within the first few minutes, one slips through the four stages from Stage 1 to Stage 4, then continues through the night to oscillate between these stages, sliding back and forth from shallow to deep sleep in a roughly rhythmic fashion with a period of very roughly 90 minutes. The episodes of Stage 4 sleep, however, tend to be longest at the beginning of the night’s sleep cycle, and become very short or entirely absent during the last few hours of sleep.

Opposite to Stage 4 non-REM sleep, REM periods, which occur with roughly 90-minute regularity, are very short during the beginning of the sleep cycle, and become longer throughout the night, becoming as long as 30 to 45 minutes or more if one sleeps late into the morning. They are characterized by an EEG pattern that looks superficially like Stage 1 sleep or even wakefulness. During dreaming, sensory input to the higher brain centers is actively inhibited by the reticular formation. There is also a prominent shift of the brain’s neurochemical balance from the aminergic dominance characteristic of waking to cholinergic dominance. Activity in important centers of the brain-stem reticular formation is redistributed, while large electrical (PGO) waves originate in the pontine reticular formation and travel through the lateral geniculate bodies of the thalamus to the occipital cortex.
These changes are sufficiently impressive to persuade J. Allan Hobson (1988), one of the most important contemporary researchers of dreaming, that the dream experience can be entirely accounted for in terms of physiology. He has proposed an activation-synthesis hypothesis that explains dream imagery as the visual cortex’s misinterpretation of PGO stimulation as sensory input. This is a view that many writers find overly simplistic, but it represents an important contribution to current thinking about the brain.

The still unfolding story of sleeping and the brain is rich. We have only scratched the surface, and encourage the reader to pursue this topic more deeply in terms of his or her own interest, as well as the many fascinating stories coming from the study of chronobiology.
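As noted earlier, the EEG bands used in this subsection can be collected into a small lookup. The Python sketch below is our own illustration; the thresholds simply restate the approximate figures given in the text, and the function name and example frequencies are arbitrary.

    def eeg_band(freq_hz):
        """Map an EEG frequency to the approximate bands described in the text:
        delta roughly 1-3 Hz, theta 4-8 Hz, alpha 8-12 Hz, beta above about 12 Hz."""
        if freq_hz > 12:
            return "beta (alert wakefulness)"
        elif freq_hz >= 8:
            return "alpha (relaxed, eyes closed)"
        elif freq_hz >= 4:
            return "theta (Stages 1 and 2 of non-REM sleep)"
        else:
            return "delta (Stages 3 and 4, slow-wave sleep)"

    for f in (20, 10, 6, 2):
        print(f, "Hz ->", eeg_band(f))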

5.9 Drive States and Emotion

"Emotion refers to a relatively brief episode of coordinated brain, autonomic, and behavioral changes that facilitate a response to an external or internal event of significance for the organism. Feelings are the subjective representation of emotions.... Mood typically refers to a diffuse affective state that is of lower intensity but considerably longer in duration." (Davidson, Scherer, and Goldsmith, 2003, p. xiii)

In the brain, biological drive states such as hunger, thirst, and sexual arousal are closely related to emotional states. This is because both involve the limbic system. Beyond this, drives and emotions both have strong physiological as well as psychological aspects to their expression. Sexual arousal, for example, is both a state of mind and a state of the body. It is also, arguably, both a drive state and an emotional condition. With this in mind, it is not surprising that the history of the study of biological drives and the study of emotions are very much intertwined. On the following pages we will emphasize the history and the basic facts that have come out of it, feeling that this is the best way to obtain a clear sense of the nature of the challenges these fields present.

5.9.1 Biological drives

The history of the study of biological drives is also the history of the study of how the body maintains a constant internal environment, or milieu interne as it was originally termed by the prominent French physiologist Claude Bernard. In the 1920s, the American physiologist Walter Cannon (1929) stressed the active role of the organism in maintaining this environment. He emphasized the importance of the autonomic nervous system (ANS) in the regulation of internal physiological states. You may recall that it was he who designated the sympathetic nervous system as the "fight or flight" system, stressing the latter’s role in arousal states such as those displayed during fighting and fleeing. He also emphasized the role of the parasympathetic nervous system in relaxation and energy conservation. At this point the reader may wish to take a little time to review the basic anatomy and physiology of the ANS from one of the recommended texts.

In 1934, Walter Cannon proposed an essentially behaviorist theory of drive motivation, suggesting that hunger, thirst, and the like, are the result of local "drive stimulation" in the body. The prominent behavioral psychologist Clark L. Hull (1943) proposed a similar notion in the form of a variable Sd, for "drive stimulus," in one of his behavioral equations. These local theories were based on the premise that drive states amount to no more than stimulation from somewhere in the body. Hunger, for example, is no more and no less than feelings of emptiness in the stomach, and thirst is a dryness in the mouth and throat.

Such ideas motivated a good deal of research. For instance, the dorsal root ganglia of the spinal cord were cut in laboratory animals to see if the loss of afferent input would eliminate hunger. Hungry medical students swallowed balloons that could be inflated through a tube that extended down their throats, to see if the subsequent feeling of fullness in the stomach would eliminate their appetite. Hungry dogs were fed food that immediately passed back out of their bodies through surgically implanted tubes (fistulas) in the throat, to see if the sensation of food in the mouth along with the act of eating was sufficient to satiate their hunger. All of this unsavory research led to a single conclusion: biological drives cannot be reduced to local sources of stimulation in the body.

In 1938, Karl Lashley, perhaps the greatest physiological psychologist of the first half of the 20th century, concluded that drive states must surely rely on important complex processes in the central nervous system itself. In 1943, the first of several neurological theories of drive was published by a physiological psychologist named Clifford Morgan (see Morgan and Stellar, 1950), emphasizing the idea of a central motive state. The latter referred to the hypothesized presence of an active set of neurons in the brain that corresponds to the drive state of the organism. The idea was inspired in part by the fact that motivation usually leads to increased levels of activity. A hungry animal becomes active, searching about for food. The condition of hunger, then, not only directs the animal’s behavior but puts it into an increased state of physical arousal. The latter was thought to be the result of elevated activity in particular neurons.
Actually, what Morgan had in mind was some sort of "reverberating circuit," in other words, neurons that formed a closed loop which folds back onto itself, so that once set into activity it continues to discharge cyclically until an inhibitory influence acts upon it. That inhibitory influence was assumed to be a more or less direct result of the arrival of the stimulus that satisfies the drive. The idea of reverberating circuits was a good one, and very useful in several branches of neurophysiology. But by the late 1960s it was finally abandoned for a lack of supporting evidence. No researcher had actually located such circuits anywhere in the nervous system.

Originally, Morgan did not have the research data with which to tie down the locations of his central motive states. During the decade following the publication of his theory, however, a flurry of research pointed to the importance of the hypothalamus in drive states and emotions as well. In the mid-1950s Morgan (1965) developed a more anatomically detailed theory of bio-motivation that placed the location of these drive states in the hypothalamus. During this decade, and on into the 1960s, it was commonly thought that hunger, thirst, sex, and powerful emotions such as fear and rage, were associated with specific activation or "start" centers in the hypothalamus. There was considerable evidence that particular regions, or nuclei, of the hypothalamus played this role, and that other "stop" centers served the function of turning drives off in the presence of appropriate stimuli. The evidence came from a wealth of studies, some apparently demonstrating that drives, as well as fear and rage, can effectively be switched on or off by gentle electrical stimulation to appropriate areas in the hypothalamus. Other investigations demonstrated the effectiveness of tissue ablations (removals) in these same regions, apparently having the reverse effects.

The accumulated evidence for these excitation and inhibition centers suggested that for each drive state there is an on-switch, as it were, and a mutually antagonistic off-switch, that oppose each other in dynamic balance. Most readers will recall pictures of enormously fat laboratory rats that could not seem to stop eating no matter how much they had consumed, or monkeys with wires attached to their heads that would fly into a rage at the tap of a button. It was a very complete picture of the drive mechanisms of the brain, and researchers were proceeding to explore not only the locations of these centers, but their neurochemical features and anatomical connections. Unfortunately, as time passed and more research was reported the story began to fray, then unravel, and finally it came all but completely undone.

The story of the deterioration of this clean picture of drive mechanisms in the hypothalamus is complex and involves many kinds of evidence. Essentially, it became increasingly unclear that the identified regions of the hypothalamus were what they appeared to be. For example, one turned out to be directly in the pathway of the neuronal tract that carries taste sensations. When it was ablated, the animal lost its sense of taste and its appetite. At the same time, other important centers of motivation were being found throughout the limbic system. If that were not enough, the experimentally induced motivational states came under question.
The enormously overweight rat, for instance, was discovered to be very fussy about what it would ingest, and in fact would not eat anything that was not especially tasty. Nor would it work for food, like a normal hungry rat. Clearly it was not as famished as originally thought, but something more complex and not at all understood was going on.

Morgan had also, and somewhat more successfully, emphasized the importance of the reticular formation in producing the increased activation that accompanies a biological drive state. This was a significant anatomical addition to the earlier views that tended to center almost entirely on the hypothalamus. From a modern perspective it was a step in the right direction, but did not go far enough. During the 1950s and 1960s an increasing body of research was implicating the importance of other parts of the limbic system, beyond the hypothalamus, in drive states. Eventually it became apparent that, though the hypothalamus is important in drives, and emotions as well, it is but part of the larger limbic system, and ultimately must be viewed in that context. What had happened during the earlier years is that the significance of the hypothalamus had been overemphasized, leaving the importance of many other areas of the brain to be discovered only later.

The limbic system actually began its scientific career in 1878 when Paul Broca, the French neurologist who discovered the language area in the left posterior frontal lobe that now bears his name, identified a series of structures that formed a border, or limbus, around the core of the forebrain. They included the hippocampus at the bottom, and the cingulate gyrus above. Together he called them the limbic lobe. (Here it would be useful for the reader to review a text diagram of the limbic system.) In 1937, a rather obscure neurologist named James Papez published a paper in which he enlarged the original limbic lobe into an entire circuit, which he argued was important in the production and expression of emotion. This expanded Papez circuit included the original hippocampus and cingulate gyrus, but added the anterior thalamus, the fornix, and certain other structures in what he believed to be an entire circuit. The latter included the amygdala and septal nuclei. In his view, the hypothalamus played a determining role in the production of emotion, but it was the cingulate gyrus, with its intimate connections with the neocortex, that led to the experience of emotion. Since that time it has become clear that the limbic system is as important in biological drive states as it is for emotion.
Papez’s circuit was comprised primarily of the evolutionarily old mammalian cortex, often called paleocortex, as distinguished from the newer neocortex which covers the entire outer surface of the forebrain in modern mammals. Curiously, his work went all but unnoticed by the neurological community for over 10 years, finally coming to light in the 1950s as the limbic system. It is now recognized that drive states involve activity scattered throughout this system, and engage the reticular formation as well.

In recent years much of the work on biological drives has centered on the complex physiological mechanisms in the body that cause hunger and other drives, and how these also lead to satisfaction. For instance, the nervous system uses sophisticated feedback loops to regulate body weight. For each person, the brain has a "set point" for the total amount of body fat. The actual amount of fat is evidently signaled by the insulin level in the cerebrospinal fluid. If the amount of body fat as communicated by this signal falls below that set point, the brain causes us to become hungry, so we eat more food and store more fat. Since the central set point changes only very slowly, a crash diet is typically followed by a crash weight gain. Satisfaction is signaled by two mechanisms, a short-term satiety mechanism and a long-term one. The short-term one involves hormonal responses of the gut, produced during digestion, that signal the brain of the presence of food, temporarily shutting off the hunger drive when enough has been eaten. The longer term mechanism involves the insulin level in the cerebrospinal fluid, and if it is not satisfied, for example, if the person has had a meal of junk food, it leads to the reappearance of hunger. Thus, for example, poor nutritional habits lead to overeating. (A toy illustration of this kind of set-point feedback loop is sketched at the end of this subsection.)

A principal connection between the limbic system and the biological condition of the rest of the body is through the hypothalamus. The latter contains cells directly sensitive to a number of hormones, or blood-borne messengers. The hypothalamus is also intimately associated with the pituitary gland, to which it is connected by a long thin stalk. The pituitary gland is sometimes termed the master gland of the body because it controls the entire system of endocrine glands, vital to the growth and maintenance of the body as well as to its day-to-day regulation. This gland can be divided into a posterior lobe, which is controlled by the hypothalamus through direct neuronal connections, and an anterior lobe that communicates with the hypothalamus by a specialized set of blood vessels termed the portal system. One of the functions of the posterior lobe is the release of vasopressin (also called antidiuretic hormone, or ADH), which regulates blood volume and salt concentration. The anterior lobe of the pituitary gland releases a number of hormones that control, for example, the basic metabolism (by the release of thyrotropin), and the mobilization of the body under stress (by the release of adrenocorticotropin, or ACTH).
Thus the hypothalamus, representing the activity of the limbic system, in conjunction with the pituitary gland, which is directly interfaced with the blood-borne hormonal messenger system of the body, presents an important axis of interaction between the brain on the one hand and the biological functions of the body on the other.
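As flagged above, the set-point feedback loop described for body weight can be illustrated with a toy simulation. The Python sketch below is our own illustration with made-up numbers, not a physiological model: a single variable stands in for the long-term body-fat signal, hunger switches on whenever that signal falls below the set point, and intake then restores it, so the system hovers around the set point.

    def simulate_weight_regulation(days=30, set_point=20.0, fat=18.0):
        """Toy negative-feedback loop for body-fat regulation.
        'fat' stands in for the long-term signal (the text describes the insulin
        level in the cerebrospinal fluid as reporting body fat to the brain).
        Hunger switches on when the signal is below the set point and off again
        once intake has restored it. All of the numbers here are arbitrary."""
        for day in range(days):
            hungry = fat < set_point       # compare long-term signal to set point
            meal = 1.0 if hungry else 0.4  # eat more when "hungry"
            fat += 0.6 * meal              # a fraction of intake is stored
            fat -= 0.5                     # daily energy expenditure
            print(f"day {day:2d}: fat signal = {fat:5.2f}, hungry = {hungry}")

    simulate_weight_regulation()

Running the sketch shows the fat signal climbing toward the set point and then oscillating just around it, which is the hallmark of negative feedback.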

5.9.2 Emotions

The modern history of psychological theories of emotion begins with William James, who in 1884 proposed the counterintuitive notion that emotions are the by-product of physiological reactions to emotion-causing situations. His theory is often introduced by asking a question taken from his own discussion. If you come across a bear in the forest, "Do you run because you’re scared, or are you scared because you run?" James’ answer was that you’re scared because you run. He did not mean this quite literally, though. Rather, he meant that the substance of emotion is the sensory experience of the bodily fear response: the feeling of the racing heart, the tightening of the throat, and the like. The Danish physician Carl Lange had developed a similar view at about the same time, so this idea became known as the James-Lange theory of emotion.

Most modern textbooks are quick to point out the inadequacy of the James-Lange theory in light of the vast research literature indicating the importance of central brain processes in the experience of emotion. The present writers, however, feel that this is a bit unfair. James’ psychology was about consciousness more than it was about behavior, or even the brain, though he was very much interested in the role played by the brain. His description of emotion is, in fact, quite valid if one looks at the experiential components of emotions. They are felt events in the body. Love, for example, is a warm tenderness in the heart and throat, whereas anxiety is a hollowness of the chest. Or, to put the shoe on the other foot, no one feels emotions anywhere else but in the body! So as phenomenology, James’ description was entirely correct. As a physiological theory, however, it was not.

In 1929 Walter Cannon proposed that the hypothalamus plays an important role in emotional responses. He theorized that the appearance of an emotion-causing stimulus, such as a bear, triggers a hypothalamic reaction which signals the physiological reactions of the body, but at the same time signals a parallel response in the neocortex. It was the latter that provided the experience of emotion. This theory was not a bad start, but the entire limbic system tends in many ways to work as a unit, and by the 1950s it was becoming increasingly apparent that, as with biological drive states, emotional responses involve activity scattered throughout the entire limbic system and beyond.

Because of the considerable anatomical and neurochemical complexity of the limbic system, the research in the field of emotion is difficult to sum up in a fashion as succinct as, for example, that in vision or motor control of the body. We leave it to the interested readers to investigate this field as their interest dictates. We note, however, that much recent work in the area of emotion has dealt with its regulation by other parts of the brain. In particular the frontal lobe, in fact its most anterior portion, the prefrontal lobe, has been the center of much recent discussion. Interest in the role of the frontal lobe in emotion dates back at least to the now famous disaster of Phineas Gage, a 25-year-old Vermont rail worker who in 1848 was struck by a flying iron rod that passed through his left cheek and out the top of his head, destroying much of the anterior pole of his frontal lobe.
Miraculously, he retained consciousness, talking all the way back to a nearby hotel and walking up a long flight of stairs there! After a difficult bout with infection, he recovered almost completely, if outer appearance is any measure of recovery. But soon it became apparent that he was not the same person. Previously disciplined and well-mannered, Phineas became sloppy, quick tempered, vulgar, and ill-mannered. He soon lost his job and began to travel, for a time presenting himself and his iron rod as a circus attraction. He never recovered his previous temperament.

If nothing else, Gage’s case suggests that the frontal lobe is important in impulse control, including impulses that have their roots in the limbic system. More recent work on clinical disorders as seemingly disparate as excessive anxiety and attention deficit disorder suggests that the frontal lobe, indeed, plays an active role in selecting and inhibiting outputs of the limbic system as well as other areas of the brain.

Paradoxically, the loss of the most anterior aspect of the frontal lobe can result in a state of reduced anxiety and lower activity. This observation in chimpanzees, along with a modicum of theory, led to the invention of the frontal (or prefrontal) lobotomy operations of the 1940s and 1950s, in which the frontal pole of the frontal lobe was cut away from the rest of the brain. This was done with thousands of patients, often in a doctor’s office, with apparently excellent results. Anxiety cases mellowed, and irritable people relaxed. IQ scores sometimes even went up, and several studies made it clear that psychiatrists could not tell in an interview whether patients had been lobotomized or not. Only the families of these patients were not happy, often claiming that their lobotomized loved ones had lost their souls. It is amazing today to think that so recently psychiatrists imagined that cutting away a sizable chunk of the most recently evolved cortical tissue would have no deleterious effect on the victim. In fact, we now know that prefrontal patients suffer a number of disorders, including an inability and lack of will to plan for the future, difficulties with concentration and decision making, and a loss of interest in other people. These were all qualities originally seen in Phineas Gage, though his more extensively damaged frontal lobe resulted in a loss of impulse control as well.

Recent work, especially by Antonio Damasio (1994), has shown that people who have suffered damage to the prefrontal cortex have difficulty making connections with their emotions. Thus they are forced to make decisions for purely cognitive reasons, without the counsel of feeling. The result seems to be that, while such individuals may appear as capable as normal persons on intelligence and informational tests, they do not make productive practical decisions.
The continuing history of research on the frontal lobe makes fascinating reading; we recommend it as an excellent topic for a research paper. Another topic worth following up, though here it would take us away from the discussion of the brain itself, is the relationship of emotion to cognition. The results of a number of investigations have suggested that emotional experiences are influenced by their situational contexts. The sense of fear, for example, is quite different on a roller coaster than in a car that has gone out of control, assuming, of course, that you have chosen to ride the roller coaster. Several investigations have found that injecting adrenaline into the blood, thus increasing heart rate and activating the body, leads to a sense of excitement with no specific content. The most famous of these, published in 1962 by Stanley Schachter and Jerome Singer, found that subjects with artificially elevated adrenaline levels were more susceptible to situational influences than other subjects, reporting feelings of anger or humor, depending on how other ("confederate") subjects behaved. This investigation has been very influential and is often cited in the literature on emotion. We would note, however, that Schachter and Singer’s findings were less robust than is usually appreciated. Many of the treatment conditions in their very complex study did not yield statistical significance, and in general their results have been difficult to replicate. It would appear that the extent to which emotional experiences are influenced by cognitive interpretations is still an open question.

5.10 Learning and Memory

"Individual neurons in the medial temporal lobe are able to recognize specific people and objects.... For example, a single neuron in the left posterior hippocampus of one subject responded to all pictures of Jennifer Aniston.... In another patient, pictures of actress Halle Berry activated a neuron in the right anterior hippocampus." (Quiroga et al., 2005)

William James’ (1890/1981) treatment of memory in his Principles of Psychology was both poetic and powerfully descriptive. Yet he had no idea of how the brain made memories. In James’ day, and for many decades later, the relationship of memory to the brain remained a fascinating and profound mystery. It still remains so, yet recent research has been enormously exciting and illuminating. Today it is one of the most vigorous and fast-paced fields in the study of the brain and the mind.

Part of the reason for the current success of this field is the presence of a critical mass of carefully gathered clinical data on memory loss and brain trauma. Part of it is due to increasingly refined techniques of observing the chemical and electrical events at critical neural synapses during learning. And part of it is the result of the new and powerful techniques for imaging the normal brain in operation. The latter include positron emission tomography (PET scans), pictures of the metabolic activity of the brain produced by the emission of positrons from radioactive glucose previously injected into the blood stream, as well as functional magnetic resonance imaging, produced by radio waves emitted by hydrogen atoms in active tissues of the brain. A recently developed technique reads the minute magnetic fluctuations of neural cells in a fashion comparable to the EEG. It is termed the magnetoencephalogram (MEG), and unlike the EEG, which registers only from the surface tissues of the brain, the MEG allows in-depth instantaneous images of the living brain in action.

Psychologists today make several distinctions regarding types of memory. Most important for the study of the brain would seem to be the division between declarative and procedural memories. The former concern information that can be reported out, such as one’s name and address, or what one had for breakfast. An important type of declarative memory concerns particular events, or recollections as they are commonly called. An example would be remembering the events of last summer’s vacation. This is termed an episodic memory, since it concerns a particular episode in one’s life, and not an abstract piece of information such as the names of the capitals of the states. Much of the contemporary neuropsychology of memory deals with such episodic memories. Procedural memories, on the other hand, are learned sequences of behaviors, such as riding a bicycle or skiing. Interestingly, declarative memories are often easily learned and easily forgotten, while procedural memories frequently require significant practice, but take a long time to forget. Who, for example, has lost the ability to ride a bicycle? Procedural memories are very much like the habits referred to by the behaviorists.

5.10.1 Where is memory?

One of the first questions that neuroscientists asked about the brain’s role in memory is whether the neurological record of past events (the engram or memory trace) is located in a particular place, like a file on the hard disk of a desktop computer, or whether it is scattered into different locations throughout the brain. This is one version of the broader specificity of function question that has been debated since the time of Flourens and the phrenologists. It turns out that both sides are correct to a limited degree, as we will see.

The classic studies on the specificity of function in memory were carried out by Karl Lashley in the 1920s. He conducted many experiments with rats using a maze with a start box at one end and a goal box with food at the other. In some instances he allowed rats to first learn the course of the maze, then he ablated (removed) various portions of their neocortex. Not surprisingly, the animals performed more poorly after the removal of cortical tissue, suggesting a loss of memory. In other instances he removed portions of the neocortex before training the rats in the maze. In these cases he found that the rats learned less quickly than their normal counterparts. After collecting the data from many such investigations, Lashley concluded that it was not the specific cortical area that was important, but the size of the lesion. He theorized that memory is dispersed widely throughout the cortex (Lashley, 1929).

Since Lashley’s day, however, a large number of studies have accumulated suggesting that some areas of the cortex are more important for memory, particularly in the formation of memories, than others. In the 1950s the prominent Canadian neurosurgeon Wilder Penfield made the remarkable observation that when, during brain surgery (which normally uses only local anesthesia), a mild stimulating electric current is applied to the right temporal lobe of certain patients, they report what seem to be vivid flashbacks of earlier memories, replete with sounds, smells, and emotional atmospheres. He concluded that these were reactivated memories. It now turns out, however, that in fact few if any of these unusual experiences were reactivated memories. Just what they were is actually still unknown, but continuing investigations verify the importance of the temporal lobes in memory. Of specific interest are the medial and inferior sections of the temporal lobe, and, moving around the enfolded lower extreme of the temporal lobe, the perirhinal, rhinal, and entorhinal cortices, and the hippocampus, all names for regions of the enfolded underbelly of the inferior temporal lobe. (It would be helpful to look at a text diagram at this point.)

The importance of the hippocampus in the consolidation of experiences into long-term memories and the formation of declarative memories has been suspected for many years, both as a result of ablation studies with animals, and from human clinical data.
Recently, however, the importance of the temporal lobe itself, and the areas in and around the rhinal cortex, has come to light as vitally important in the formation of memories. One of the most dramatic clinical cases that demonstrates the importance of these areas concerns a man known in the literature as "H.M.," who in 1953 at the age of 27 was operated on in a last-ditch effort to control increasingly severe epileptic seizures. The operation removed major portions of the entire medial and lower regions of the temporal lobe bilaterally (on both sides), right down through the hippocampus, taking even the associated limbic area of the amygdala. The operation was successful as far as the epilepsy goes, but left H.M. with virtually no ability to form new declarative memories. He simply could not remember anything for more than about 5 minutes. He could not follow a TV show because he did not remember what happened just a few minutes previously. He could not recall new acquaintances, including the neuropsychologist Brenda Milner, who had worked with him for decades. The fact that H.M. recalled most of the events prior to the surgery, though he had some loss for the years immediately preceding it, suggests that the areas removed are much more important to the formation of memories than to their permanent storage, which evidently is located elsewhere, perhaps scattered widely throughout the cortex as Lashley believed. (This form of memory loss, beginning at the time of the operation, is called anterograde amnesia, as distinguished from H.M.’s lesser degree of retrograde amnesia for events that occurred prior to the surgery.)

Semantic memory is a major division of declarative memory that includes knowledge of the meaning of objects and words. Data from functional neuroimaging of the human brain indicates that information about salient properties of an object is stored in the sensory and motor systems active when that information was acquired. As a result, object concepts belonging to different categories like animals or tools are represented in partially distinct sensory and motor property-based neural networks (Martin, 2006).

During the 1980s Mortimer Mishkin (Mishkin and Appenzeller, 1987), at the National Institute of Mental Health, performed a remarkable series of experiments on anterograde amnesia with rhesus macaques, in which he mapped a pathway that seems essential to the formation of visual memories. Mishkin wanted to devise an experimental task that would assure him that he was working with cognitive recall of the type that would be called declarative in human subjects. He settled on a novelty task, one in which the monkey was required on each trial to select the single novel object from an assortment of items that it had previously seen. Normal monkeys are very good at this task, but ones with lesions to the areas discussed above show dramatic deficits. They apparently cannot recall what they have previously seen and what they have not. Compiling the results of many experimental investigations, both of his own and those of others, with clinical data such as the case of H.M., Mishkin proposed a circuit for the formation of visual memories. This circuit begins with the primary visual cortex and continues out through the various cortical regions adjacent to it, on down into the medial temporal lobe, an area we have already identified as important in memory formation.
From there activity is transmitted into the amygdala and hippocampus (though at the time of this writing it appears that the adjacent rhinal cortical regions are vitally involved in memory formation as well). Then activity is transmitted forward to the prefrontal cortex, an area important in conscious recall. Damage to the medial temporal lobe, the amygdala, hippocampus, or rhinal region can cause significant anterograde amnesia in experimental animals. These areas seem to be involved in many cases of human amnesia as well, as we saw with H.M.

It is important to note that the medial temporal lobe and the adjacent limbic system structures, the amygdala and the hippocampus, and probably the rhinal cortex, are not solely involved in the formation of visual memories. As seen in the case of H.M., these areas also are vital to the formation of other memories. Nor are they the only areas that have been identified as important for the formation of declarative memories. An interesting case of a young airman, known in the literature as "N.A.," exhibits a very similar anterograde memory loss, though perhaps not as complete as with H.M., as the result of an accidental fencing foil stab through the right nostril that damaged the left dorsomedial nucleus of his thalamus. This area, as well as the small mammillary bodies at the base of the hypothalamus, seems to be part of a common complex circuit that is also involved in the formation of memory. As seen in the case of N.A., damage to these diencephalic structures produces significant anterograde memory loss. One clinical condition associated with damage to the dorsomedial thalamus and the mammillary bodies is Korsakoff’s syndrome. This condition, brought on by a thiamin deficiency sometimes associated with chronic alcoholism, involves confusion, confabulation, and a severe memory loss for new information as well as earlier memories.

Many instances of traumatic anterograde amnesia, as with H.M. and N.A., seem to be much more destructive to the formation of new memories than to the continuation of older ones. It is indeed rare for persons to lose well-established memories that have existed for years prior to the onset of the memory disorder. This fact argues for two conclusions. First, that the structures discussed above are more important to the formation of new memories, a process researchers call consolidation, than to the long-term storage of them. Second, that well-established memories do not seem to be confined to particular regions of the cortex, where they would be vulnerable to loss in the case of local tissue damage, for example as the result of a stroke.

If permanent memories are not stored in specific locations in the brain, just how are they stored? One of the oldest and still most productive ideas about this was suggested in 1949 by Lashley’s student, the Canadian researcher Donald Hebb. His suggestion was both simple and powerful. It is simply that the neural elements that are activated by any particular experience tend to form common bonds, so that repeated activation creates a stable pattern or network which is easily triggered again by the original stimulus (Hebb, 1949). For instance, consider one of your grandmothers. Her face has a unique shape that activates a particular set of neurons in your visual system. Let us say she also has graying hair that hangs in a special way that suits her, and she wears a perfume which always seems to accompany her.
All of these sensory events produce their own unique patterns of brain activity, and produce them whenever you are with your grandmother. According to Hebb, they begin to bond together to form a common network, which he termed a cell assembly, so that they are easily activated together. In other words, the scent of the perfume alone is enough to trigger the entire assembly, bringing up the recollection of her countenance and the special way her hair hangs. Maybe you smell it in a store, or come across someone else wearing it, and find yourself savoring images of the dear lady and times you have spent with her.

Now, the wonderful thing about the cell assembly notion is that it does not require anything special. Any set of neurons involved in a sensory experience, or a motor activity for that matter, can produce the desired effect. The important question posed by Hebb's theory is how the bonding takes place between nerve cells. His own answer was that they form larger and perhaps more effective synaptic connections, while other, irrelevant connections may dwindle and be lost. It is a molding process not entirely unlike the molding processes that take place in the original embryonic development of the brain. This idea, especially in recent years, has found considerable research support. (A minimal computational sketch of Hebb's idea appears at the end of this subsection.) As we will see, however, the formation of new and stronger synaptic connections is not the whole story, because it would take too long to wholly account for ordinary remembering.

Another suggestion concerning how the brain forms memories was proposed by Karl Pribram in the 1960s, and continues to be a source of creative research and theory. It is that the brain, or small regions of it, operates like a hologram, storing complex patterns of information in waveforms, much as holograms store light images in the interference patterns captured by the holographic plate (Pribram, 1991). This view has not been widely accepted, perhaps in part because it is extremely demanding on the researcher, who must be an expert neurological investigator and a first-rate mathematician as well. Pribram's idea, however, is fully compatible with the notion of cell assemblies, and over the years his laboratory and a few others have continued to creatively pursue this line of thought. It may well be that the time will come when understanding the brain will simply require the requisite mathematical knowledge, which Pribram himself returned to graduate school in mid-career to acquire, or that researchers will work in teams with mathematicians and engineers, as Pribram himself typically did. In the meantime the authoritative text Handbook of Affective Sciences states, "modern research illustrates how there is no single region of the brain dedicated to emotion but, rather, different aspects of emotion processing are distributed in different brain circuits" (Davidson, Scherer, and Goldsmith, 2003, p. 5), a conclusion in accord with Pribram's proposal.

Let us now return briefly to Mortimer Mishkin. Beyond what we mentioned about him above, he makes an interesting case for the idea that there is a second circuit involved in memory formation, one that does not pass through the medial temporal lobe and on into the limbic system from this point. This suggested circuit is important for the creation of procedural rather than declarative memories, which means that it may be the basis for operant (instrumental) conditioning, if not classical conditioning as well.
This route begins in the primary visual area, as before, but does not continue all the way along the pathway through the posterior temporal lobe to the medial temporal lobe. Instead it enters only partially into the temporal lobe and then exits into the basal nuclei, from where it continues on up to the supplementary motor cortex. There, presumably, it connects with the primary motor cortex in the production of a response. One interesting thing about this pathway is that, unlike the previous circuit, which emerges in the lower prefrontal lobe, it connects with the neocortex only in the supplementary motor area, a region that has been shown, paradoxically, to be devoid of access to consciousness. In other words, this learning circuit, unlike the other one, is outside the province of consciousness. This is consistent with many observations that conscious participation is not needed for effective conditioning. Moreover, the point at which this pathway exits the temporal lobe is still upstream, as it were, of the areas in the inferior temporal lobe that have been associated with very high order pattern recognition. One is reminded of the many learning studies from the heyday of behaviorism that emphasized the importance of elementary visual elements, such as color, height in the visual field, size, or orientation, as principal cues for learning. Perhaps these studies were correct, but only for conditioning, which may well engage this second learning circuit of Mishkin's.

Despite the general trend of evidence and theory in favor of widely scattered permanent or long-term memory, there is a trend in current research toward the discovery of particular areas in the brain that seem to be committed to certain types of memories. Some contemporary research, for example, focuses on the activation of different parts of speech by different areas of the cortex. One part of the brain that has been studied in detail in this regard is the hippocampus, which seems to play a unique role in spatial memory. This includes "cognitive maps" and memories for where one is in space. In the 1970s John O'Keefe, at University College London, found cells in the hippocampi of rats that seemed to respond when the animal was in a particular location in a maze. It is as if these cells had a receptive field, like those of visual cells, for that unique place (O'Keefe and Nadel, 1978). Work done since then at a number of laboratories has established the importance of the hippocampus in memory for locations.

The term mental imagery refers to the experience of a perception in the absence of a corresponding physical stimulus. In everyday life, mental imagery represents a crucial element of various cognitive abilities, such as object recognition, reasoning, language comprehension, and memory. Because of its importance, the exact processes associated with imagery have long occupied cognitive psychologists. Mental imagery is accompanied by the activation of fronto-parietal networks, but the exact brain areas engaged in imagery depend on the specific features of the imagery task. When spatial comparisons between imagined objects are required, most functional imaging studies show bilateral activation in the parietal lobes of both hemispheres. However, neuropsychological studies of patients with focal brain lesions generally support a dominant role of the left hemisphere (in right-handed individuals).
In addition, each parietal lobe has a distinct functional role at different moments in time; the sequential parietal activations appear to represent a transition from an earlier, more distributed processing stage of image generation to a later, right-lateralized stage of spatial analysis of the images. In one experiment, research participants were asked to imagine two analog clock faces based on acoustically presented times (e.g., 2 o'clock and 5 o'clock) and to judge at which of the two times the clock hands formed the greater angle. Using transcranial magnetic stimulation to determine which distinct aspect of mental imagery was carried out by which lobe, the researchers found that the left parietal lobe was predominant in generating mental images, whereas the right parietal lobe specialized in the spatial comparison of the imagined content (Sack, Camprodon, Pascual-Leone, and Goebel, 2005).
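The following is the minimal computational sketch of Hebb's proposal promised above. It is an illustration rather than a model of any particular brain region: the unit labels, the learning rate, the number of repetitions, and the recall threshold are all arbitrary assumptions. The only feature carried over from the text is the Hebbian rule itself, namely that units which are repeatedly active together strengthen their mutual connections, so that a partial cue (the "perfume" alone) later reactivates the whole assembly.

import numpy as np

# A toy "grandmother" assembly: face, hair, perfume, voice (illustrative labels).
n_units = 4
learning_rate = 0.1     # assumed value, for illustration only
weights = np.zeros((n_units, n_units))

# Hebbian outer-product rule: units that fire together strengthen their connections.
experience = np.array([1.0, 1.0, 1.0, 1.0])   # all four features present at once
for _ in range(50):                           # repeated encounters with grandmother
    weights += learning_rate * np.outer(experience, experience)
np.fill_diagonal(weights, 0.0)                # no self-connections

# A partial cue: the perfume alone (third unit active).
cue = np.array([0.0, 0.0, 1.0, 0.0])
recalled = np.maximum(cue, (weights @ cue > 1.0).astype(float))
print(recalled)   # [1. 1. 1. 1.] -- the whole assembly reactivates from one fragment

In Hebb's terms, the weight matrix plays the role of the cell assembly: once it has been molded by repeated co-activation, any sufficiently strong fragment of the original experience re-excites the rest of the pattern.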

5.10.2 The time course of memory formation

In the 1960s, a common method for studying memory in the laboratory was to expose a rat to a one-trial learning situation, then give it an electroconvulsive shock (ECS) at varying intervals afterward. The one-trial learning situation usually involved nothing fancier than placing the rat in a well-lighted chamber with a small dark chamber off to the side. The rat prefers the dark and immediately moves into the dark chamber. There, however, it receives an uncomfortable shock to its feet, causing it to rush back out into the light. Ordinarily it needs to experience this situation only once to stay out of the small room on future trials. If, however, the rat is quickly given a brief ECS to the brain, which can be administered through external electrodes placed near the eyes, it later returns to the dark room as if nothing had happened. If the investigator waits long enough before administering the ECS, however, the rat retains the learning. "Long enough" usually means several hours.

The fact that immediate ECS destroys the memory, while delayed ECS does not, argues that the formation of the memory must pass through at least two distinct stages, one sensitive to electrical shock and one not. Actually, there is nothing unique in this regard about electrical shock. A blow to the rat's head would probably be as effective in producing retrograde amnesia for the learning experience, as often is the case with boxers or people struck in the head in traffic accidents, who then cannot recall the events leading up to that moment. Indeed, it seems that anything that disrupts ongoing brain processes will do the trick.

Now, if ECS is administered during the first 10 seconds or so after the one-trial learning experience, the learning is irrevocably lost. This suggests that for at least that long the memory is being held in active patterns of discharge of neural cells. If the ECS is administered after the first few seconds, the animal later shows some partial degree of retention. If it is administered 12 or more hours afterward, there is no loss at all.

The above suggests that there is an intermediate stage of memory formation during which memories are not entirely vulnerable to disruption, but are still not completely formed as permanent memory. In other words, it would seem that there is a period of consolidation into long-term memory that begins a few seconds after the learning experience itself and continues on until the permanent memory is completely established several hours later. Moreover, from what we have learned so far, it would seem that the region of the rhinal cortex would be one likely place to look for this process. In fact, two Europeans, Tim Bliss and Terje Lomo (1973), have reported just such a process in the hippocampus of the rabbit. The process itself is termed long-term potentiation. It refers to the fact that certain patterns of stimulation at synapses between hippocampal neurons increase the sensitivity of these synapses for several hours afterward. In other words, potentiation effectively creates a facilitated synapse. This comes about through chemical changes there, which could well be the intermediary phase of memory between the sensitive first few seconds and the resilient permanent stage.

Putting all this together, it would seem that memory formation involves three stages. The first is an active one in which information is retained by actual patterns of discharge in critical neurons. This stage may very well be like repeating a phone number until it is stored.
The second stage is more secure, but still vulnerable. It may involve the potentiation of critical circuits in the hippocampus, and probably elsewhere. During this stage the actual long-term memories are being formed as cell assemblies in different locations in the cortex, depending on the type of memory. Once this latter memory is formed, its wide distribution and anatomical foundation make it resistant to all but the most violent traumatic assaults. (A toy numerical sketch of this staged consolidation picture appears at the end of this section.)

The potential for human neuroimaging to discern the detailed contents of a person's memories has yet to be fully explored. For example, fMRI has identified the neural systems involved in an active forgetting process that keeps unwanted memories out of awareness. Does this contradict Freud's notion that "repression" excludes them from awareness? Or is this "active forgetting process" an updated term that covers ground similar to the term "repression"? In any event, the process is associated with increased dorsolateral prefrontal activation, reduced hippocampal facilitation, and impaired retention of the "unwanted" memories; both prefrontal cortical and right hippocampal activation predicted the magnitude of forgetting.

Post-Freudian theories of forgetting assume that it is a "cue-overload" phenomenon, an assumption supported by paired-associate laboratory procedures. An alternative interference theory holds that recently formed memories that have not yet had a chance to consolidate are vulnerable to the interfering force of mental activity and memory formation, even if the interfering activity does not involve material similar to what was previously learned (Wixted, 2005). This account helps to explain why sleep, alcohol, and benzodiazepines all forestall the forgetting of a recently learned list of words, and is consistent with recent work on the variables that affect the induction and maintenance of long-term potentiation in the hippocampus.

You might ask what practical results have emerged from neuroscientific research. Perhaps the most significant collection of data for students leads to the conclusion that growing old does not mean that you have to succumb to poor memory, slower cognitive reactions, and fuzzy thinking. "Cognitive enhancement drugs" are in production and some (e.g., Modafinil) are already available. But in the meantime, you can eat a hearty breakfast; ingest foods filled with antioxidants (e.g., salads, berries); take advantage of the "Mozart effect" by listening to (or playing) classical music (its effect on the brain is still unclear, but it will improve your mood and, if you play it yourself, your motor coordination); engage in such mental tasks as crossword puzzles and "brain teasers"; use mnemonic devices to improve your memory (e.g., remembering the names of new acquaintances by connecting their name to one of their physical characteristics: "Adrian" might look as if he needs "aid"; "Mercedes" might dress so stylishly that she might be able to afford a Mercedes automobile); put important items (your driver's license, car keys, etc.) in the same place when you are not using them; get a good night's rest and retire at about the same time each evening; engage in half an hour of yoga or other physical exercise at least five times each week; set aside time for meditation or prayer daily; and find a social support group with whom you can converse and interact. The cerebellum is the foundation for learning new skills and its work can be enhanced by these common sense practices.
There are several neurological anomalies that run in families. One of these is prosopagnosia, or face blindness. Individuals with this condition cannot easily tell faces apart, even when they belong to people they know. The condition can also result from strokes, and has been linked to defects in the fusiform face area of the brain.

Here we end our brief discussion of memory and the brain. We wish to point out, however, that there is much more to this topic for the interested student. There is an entire history of research on the neurochemistry of memory, as well as a large clinical literature only touched upon here. It is rich material for investigation and speculation.
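Before leaving the topic, here is the toy numerical sketch of staged consolidation promised above. It is our own illustrative construction, not an established quantitative theory: the decay constant, the consolidation time constant, and the assumption that disruption simply freezes consolidation are all arbitrary choices made only to reproduce the qualitative pattern of the ECS experiments, in which early disruption abolishes the memory while late disruption leaves it intact.

import math

# Arbitrary time constants, chosen only to make the time course visible.
ACTIVE_DECAY_S = 10.0            # the active trace fades over roughly ten seconds
CONSOLIDATION_S = 3.0 * 3600     # the long-term trace builds over a few hours

def memory_strength(t_seconds, disruption_at=None):
    """Strength of a one-trial memory at time t, optionally disrupted (ECS) earlier."""
    active = math.exp(-t_seconds / ACTIVE_DECAY_S)                 # stage 1: active discharge
    consolidated = 1.0 - math.exp(-t_seconds / CONSOLIDATION_S)    # stages 2-3: consolidation
    if disruption_at is not None and disruption_at <= t_seconds:
        # The disruption erases the active trace; only what had already
        # consolidated by that moment survives in this toy model.
        active = 0.0
        consolidated = 1.0 - math.exp(-disruption_at / CONSOLIDATION_S)
    return active + consolidated

day = 24 * 3600
print(round(memory_strength(day), 2))                           # no ECS: ~1.0, retained
print(round(memory_strength(day, disruption_at=5), 2))          # ECS at 5 s: ~0.0, lost
print(round(memory_strength(day, disruption_at=3600), 2))       # ECS at 1 h: ~0.28, partial
print(round(memory_strength(day, disruption_at=12 * 3600), 2))  # ECS at 12 h: ~0.98, intact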

5.11 Language and Consciousness

The intellect is an organ composed of several groups of functions, divisible into two important classes, the functions and faculties of the right hand, and the functions and faculties of the left hand. The faculties of the right hand are comprehensive, creative and synthetic; the faculties of the left hand are critical and analytic. To the right hand belong judgment, imagination, memory, observation; to the left hand comparison and reasoning. The critical faculties distinguish, compare, classify, generalize, deduce, infer, conclude; they are the component parts of logical reason. The right-hand faculties comprehend, command, judge in their own right, grasp, hold, manipulate. The right-hand mind is the master of the knowledge; the left-hand mind is the servant. The left hand touches only the body of knowledge; the right hand penetrates its soul. The left hand limits itself to ascertained truth, the right hand grasps that which is still elusive or unascertained. Both are essential to the completeness of the human reason. (Aurobindo, 1924/1986, p. 16)

Combining the topics of language and consciousness may seem curious to the modern reader, but in the history of the study of the brain the two have often been very much intertwined. For example, as recently as the early 1980s, the great Australian neurophysiologist Sir John Eccles (1989) believed only the left hemisphere of the brain to be connected to the conscious self, while the mute right hemisphere, with little or no language, was thought to be without conscious experience.

Discussion about the right and left hemispheres and their roles in thought, language, and consciousness goes back many years. For example, the 19th-century physicist turned experimental psychologist Gustav Fechner believed that if the corpus callosum, the large fiber tract connecting the two cerebral hemispheres, were to be severed, two distinct conscious minds would result, each with its own thoughts and experience. In 1911 the American psychologist William McDougall asserted the opposite view, that consciousness must of necessity be unitary even if the corpus callosum were severed. He went so far as to volunteer himself for such an operation if he should ever contract a terminal illness. Fortunately for him, but not for us, he acquired no such illness, and the question was not addressed again for many decades.

As we have seen, the neurologist Paul Broca discovered a language production center in the posterior aspect of the frontal lobe, now termed Broca's area. During the 1860s and 1870s he observed that in the majority of his patients, though not all of them, it was located on the left side of the brain. About this time, the German neurologist Carl Wernicke discovered an area in the upper posterior border of the temporal lobe that was vital to the ability to understand spoken and written language as well as to the production of grammatically sensible sentences. This region, now termed Wernicke's area, was also found most often on the left side. Thus by the turn of the 20th century, it was well known that the left hemisphere was essential for language, and by implication, for all the higher functions associated with it. These include writing, logic, mathematics, and reason in general. It was not surprising that the renowned British neurologist Hughlings Jackson came to refer to the left as the "leading hemisphere." The only real question concerned the right cerebral hemisphere. What did it do?
More recent investigations have continued to support the idea that the left hemisphere, for most people anyway, is the "dominant" one, with language and thus intelligence at its disposal. Clinical brain trauma cases often support this view. A stroke (the blockage of a blood vessel that temporarily or permanently damages brain tissue) in the left hemisphere, for example, can be totally debilitating, leaving an individual without the ability to speak, write, or even understand complex spoken or written language. A similar stroke in the right hemisphere may produce almost no apparent loss of function.

Wilder Penfield, mentioned in the previous chapter, observed a number of cortical locations at which a light electric current causes a person to have difficulty speaking or articulating sentences. These were typically in the left hemisphere, in or near the areas discovered by Broca and Wernicke. In the 1960s, an even clearer picture became available with the Wada technique, by which a mild anesthetic (often sodium amytal) is injected into the carotid artery, which affects the brain in such a fashion as to anesthetize one hemisphere while leaving the other groggy but awake. For most individuals, this procedure leads to an abrupt cessation of speech if the anesthetic is supplied to the left hemisphere, while providing it to the right hemisphere leaves the patient sluggish but communicative. For a few people this situation was found to be reversed. The Wada procedure was, and still is, important to neurosurgeons, allowing them to know prior to surgery which hemisphere contains the language functions, the one they typically want to avoid if at all possible. The outcome of many observations using the Wada technique, taken together with the information about the brain given above, was that for most people a single hemisphere is by far the most important for language skills, and that for the vast majority this hemisphere is the left one.

All of this still failed to disclose the importance of the right hemisphere. Hughlings Jackson had suspected that the right hemisphere was particularly important for sensory functions; in this he was not entirely incorrect. During the 1960s, another Canadian researcher, the neuropsychologist Brenda Milner, made an extensive examination of the effects of both left and right hemisphere damage in many clinical cases. What she discovered was that damage to the right hemisphere resulted in various types of deficits that typically had to do with perception. Individuals with right brain damage had great difficulty putting simple puzzles together, or drawing simple two- or three-dimensional figures so that they looked like the originals. They also seemed unable to picture figures as they might appear if rotated in one direction or another. These and other results appearing in the late 1960s and 1970s led to the general feeling that the right hemisphere was vastly superior to the left in perceptual tasks, and tended to operate in a holistic or synthetic fashion, taking in all the elements of an image at once, as opposed to the more linguistic and analytic approach of the left hemisphere. More recent research supports this view in general, but tempers it with the finding that both hemispheres are capable of some degree of perception, and the holistic aspect of right brain function was probably over-stressed, especially in the popular media of the day.

5.11.1 Language centers

During the 1970s, Norman Geschwind (1979), at Boston University, developed a clinically based theory of language processing in the left hemisphere of the brain that was derived in part from the writings of Carl Wernicke. His theory, called the Wernicke-Geschwind model, was directed toward explaining various prominent types of aphasia, or loss of language function, resulting from damage to the brain. Among the types of aphasia of particular interest in the construction of this model were Broca's aphasia, also termed expressive aphasia, and Wernicke's aphasia, also termed receptive aphasia, as well as conduction aphasia. The first of these is associated with damage to Broca's area in the posterior frontal lobe near the junction of the frontal, temporal, and parietal lobes. (Reviewing a text diagram would be useful here.) The second results from damage to Wernicke's area at the posterior and superior extreme of the temporal lobe. Conduction aphasia results from damage to the arcuate fasciculus, a neural pathway that connects Wernicke's and Broca's areas.

Victims of Broca's aphasia have difficulty forming clear sentences. They often speak single, separated content words in a slow "telegraph" style, giving the impression of difficulty in making the muscles of the mouth, tongue, and throat work properly. This difficulty with the motor side of speech led early researchers to suspect that Broca's aphasia involved a dysfunction in the primary motor cortex adjacent to Broca's area. This is not the case, however, since these same muscles can be used in other activities without difficulty. People with this disorder also have difficulty repeating sentences spoken to them, sometimes substituting incorrect words, or non-words that sound roughly like the original. These are called paraphasic errors.

Persons with Wernicke's aphasia, on the other hand, will speak rapidly, effortlessly and fluently, but make little sense. Unlike Broca's aphasics, however, they do not seem to notice the unusual quality of their own speech, and they often fail to understand simple verbal instructions that a Broca's aphasic would have no trouble following. Indeed, it appears that their inability to produce grammatically sensible strings of words is somehow related to their inability to process what they hear. The deficit, however, is limited to language, and does not involve any other kind of hearing dysfunction. Conduction aphasics have good comprehension of spoken language, and fluent grammatical speech, but have difficulty repeating sentences spoken to them. They also have word retrieval problems, often stopping in the middle of a sentence to search for the right word.

Now let us use the Wernicke-Geschwind model to explain these aphasias. Damage to Broca's area causes expressive difficulties because this region of the left hemisphere is responsible for the initial structuring, or phrasing, of spoken sentences. It communicates with the motor areas that organize the final patterning of muscle activity. Wernicke's area, on the other hand, translates spoken words into meaningful units. It receives input from the nearby auditory cortex and extracts its meaning. Damage here causes speech sounds to lose their meaning. For this reason, receptive aphasia involves no loss of auditory discrimination except for the meaning of words. It is evidently because of the absence of the ability to understand speech that such persons do not notice the strange run-on quality of their own conversation.
The angular gyrus is at least partially responsible for the human ability to understand metaphor. Research data from the University of California, San Diego, indicated that patients with damage to this part of the brain showed gross deficits in comprehending such sayings as "the grass is always greener on the other side of the fence." The metaphorical meaning escaped them, and they provided the examiners with literal interpretations.

According to the Wernicke-Geschwind model, the angular gyrus, located adjacent to Wernicke's area at the junction of the occipital, temporal, and parietal lobes, receives input from the adjacent visual cortex, responding to written words by essentially translating them into a facsimile of spoken words and then passing them on to Wernicke's area, which acts on them in much the same fashion as it does on spoken words. No matter which source the words come from, Wernicke's area acts on them to give them meaning. It then communicates with Broca's area, sending something like a copy of the now meaningful speech. Thus, Broca's area in the normal brain can produce a repetition of sentences heard by the person. If this communication, carried out over the arcuate fasciculus, is interrupted by damage to the latter, the result is an inability to repeat sentences that have just been heard, even though the person has a perfectly good understanding of them and can also produce normal spontaneous speech.

The Wernicke-Geschwind model is powerful and appealing. It represents a major first step in understanding language processing in the left hemisphere. It is, however, not perfect. For example, it does not explain why most aphasics actually exhibit some degree of difficulty both with understanding speech and with producing it. Moreover, it is now understood that words which are read can communicate with Broca's area without passing through Wernicke's area at all. It is simply not the case that all reading involves something like internal speech. Modern imaging techniques are rapidly expanding the boundaries of the classical speech areas of the left hemisphere. For instance, there appears to be an area in the inferior frontal lobe important for the making of semantic associations. This whole field is moving forward very rapidly at the time of this writing, and makes a fascinating area for further investigation. (A minimal computational sketch of the classical Wernicke-Geschwind pathway, and of how lesions to it map onto the aphasias just described, appears at the end of this subsection.)

Another point before leaving this topic concerns the incidence of left hemisphere language representation in the brain. Estimates differ from study to study, but generally show that about 99% of right-handed individuals have language in the left hemisphere. Roughly 66% of left-handers also have left hemisphere language (as do speakers of Japanese and other languages that incorporate several "nature-like" sounds). Of the remaining left-handers, roughly half have their language function in the right hemisphere and the other half have scattered language ability in both hemispheres. It is fairly clear that right-handedness is usually inherited, and with it left brain language potential. In the rare instances of right hemisphere language in a right-handed person, it is likely that some degree of trauma was experienced by the left hemisphere early in development. The situation is not so clear in the case of left-handers. Some are beyond doubt the product of early trauma to the left hemisphere, while others are probably not. The origins of left-handedness are not entirely understood, and a variety of theories remain to be compared and researched.

There is also a gender difference in the laterality of brain function, with women exhibiting some language ability in both hemispheres, and men tending to specialize in just one or the other, though usually the left. Women experience less language loss from left hemisphere strokes, and recover more quickly than men. They also have a larger corpus callosum, suggesting that they have more informational traffic between the hemispheres.
These observations have attracted surprisingly little attention, but seem to the present writers to be rich with possible implications. There are also data indicating that some languages (e.g., Japanese, Hawaiian) tend to defy strict localization in one hemisphere because of their inclusion of many words that resemble sounds from nature as well as emotional expressions (Tsunoda, 1985). There is also an association between various types of creativity and right hemisphere activity (e.g., Weinstein and Graves, 2002).

Finally, a team of researchers at the University of Wisconsin-Madison received permission from the Dalai Lama to test some of his long-term meditators (Lutz, Greischar, Rawlings, Ricard, and Davidson, 2004). In comparison with a control group, they found the meditators able to self-induce high-amplitude gamma brainwave synchrony during meditation. Additional data from the monks suggested that such emotions as compassion and joy are teachable, and that meditation can alter the brain in long-lasting ways (Davidson et al., 2003). Similar results have been reported from studies of Carmelite and Franciscan nuns (Beauregard and O'Leary, 2007, pp. 255-285). These studies are examples of a field often referred to as "neurotheology," one that is not confined to monks and nuns. Time Magazine (December 3, 2007) ran a cover story, "What Makes Us Good/Evil?", that emphasized neuroscientific data, including the notion of a "moral grammar," the propensity for which is built into the brain. And Young (2002), in another popular article, reported that part of the brain is devoted to detecting cheating.
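As promised above, here is a minimal sketch of the classical Wernicke-Geschwind pathway rendered as a small directed graph, together with a crude lesion simulation. The graph edges follow the model as described in this section (auditory cortex to Wernicke's area, the arcuate fasciculus to Broca's area, then motor cortex, with the angular gyrus feeding written words into Wernicke's area); the dictionary representation, the function names, and the idea of reporting which task survives a given lesion are our own illustrative assumptions, not part of the model itself, and the sketch inherits the model's known limitations discussed above.

# Toy rendering of the classical Wernicke-Geschwind pathway (illustrative only).
PATHWAY = {
    "auditory cortex":    ["Wernicke's area"],
    "visual cortex":      ["angular gyrus"],
    "angular gyrus":      ["Wernicke's area"],
    "Wernicke's area":    ["arcuate fasciculus"],
    "arcuate fasciculus": ["Broca's area"],
    "Broca's area":       ["motor cortex"],
    "motor cortex":       [],
}

def route_exists(start, goal, lesioned=frozenset()):
    """Depth-first check that a path survives after removing lesioned regions."""
    if start in lesioned:
        return False
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(n for n in PATHWAY.get(node, []) if n not in lesioned)
    return False

def describe(lesion):
    repeat_heard = route_exists("auditory cortex", "motor cortex", lesion)
    read_aloud = route_exists("visual cortex", "motor cortex", lesion)
    comprehend = route_exists("auditory cortex", "Wernicke's area", lesion)
    print(f"lesion={sorted(lesion) or 'none'}: "
          f"repeat speech={repeat_heard}, read aloud={read_aloud}, comprehend={comprehend}")

describe(frozenset())                          # intact pathway: everything works
describe(frozenset({"Broca's area"}))          # expressive (Broca's) aphasia
describe(frozenset({"Wernicke's area"}))       # receptive (Wernicke's) aphasia
describe(frozenset({"arcuate fasciculus"}))    # conduction aphasia: repetition fails, comprehension intact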

5.11.2 Consciousness and the brain

One line of research that has been of particular importance to the understanding of consciousness and the brain has dealt with "split-brain" individuals, those who have had their corpus callosum surgically cut, thus effectively separating the brain into two independent halves above the level of the diencephalon. This operation has been used in certain instances of intractable epilepsy as a last-ditch effort to contain the seizures within a single hemisphere. Interestingly, the seizures usually disappeared completely following the operation, for reasons that are still unknown. But the fascinating question that remained was whether or not these split-brain people experience two independent streams of consciousness.

As noted above, speculations about the results of such an operation date back as far as Gustav Fechner and William McDougall, but the first actual series of operations was done in the 1940s. Subsequent studies of the patients themselves failed to find anything unusual about them at all. One commentator with a sense of humor observed that the sole function of the corpus callosum was evidently to keep the two cerebral hemispheres from falling apart! Not until the 1960s, when the neurosurgeon Joseph Bogen teamed up with the neurobiologist and Nobel laureate Roger Sperry (1974) on a similar series of operations, did the reality of divided consciousness in these patients come to light.

Sperry devised an ingenious series of experiments in which stimuli were presented to either the left or right hemisphere, but not both. For example, a familiar object such as a comb was placed out of sight in one hand or the other. If it was the right hand, controlled by the left hemisphere, the person could name the object. If it was placed in the left hand, controlled by the right hemisphere, however, they could not. In either instance, the opposite hand could not identify the object. In more elaborate experiments, visual stimuli such as pictures of faces or simple words were flashed on a screen to the right or left of the person's point of fixation. Since the visual system is arranged in such a way as to communicate the left half-retina to the left half of the brain, and vice versa for the right half-retina, what was seen to the right of the fixation point was transmitted to the left side of the brain, and what was seen to the left to the right side of the brain. This opened many possibilities. For instance, a picture shown to the left hemisphere resulted in an accurate verbal report, though when the same picture was shown to the right hemisphere the person reported that they saw nothing. Further testing, however, demonstrated that the left hand could identify the picture from a group of similar ones.

All of this seemed to leave little doubt that some kind of responsiveness existed in each of the hemispheres, but so far it was not clear to what extent the right hemisphere exhibited anything that could be called consciousness, or even mental activity in the usual sense. Sperry (1977) maintained that each isolated hemisphere had a mental life of its own, each proceeding in parallel with the other. Michael Gazzaniga, his colleague and one-time student, believed the left hemisphere to contain pretty much the entire brain's stock of intelligence because of its language ability (Gazzaniga and LeDoux, 1978). Also, it was often observed in split-brain people that the left "dominant" hemisphere seems to control behavior most of the time.
Gazzaniga had observed many Wada procedures used in preparation for brain surgery, and had seen for himself that the isolated right hemisphere in this situation shows few signs of what ordinarily would be called intelligence. He was skeptical. As indicated above, Sir John Eccles was even more skeptical, believing that the right brain was simply not conscious.

In the 1970s, Sperry and his colleagues (1979) performed an experiment that settled the debate in his own favor. Using the procedure mentioned above, he presented photographs selectively to the right brains of split-brain individuals, photographs that he had secretly obtained from their own family albums. He also included certain other photographs, some of historical figures like Abraham Lincoln and John F. Kennedy. These were pictures chosen to elicit emotional responses. Now, since the limbic system is not divided in split-brain persons, the reactions of the right hemisphere, the one seeing the pictures, can also be read, as it were, by the left hemisphere, which can report them out. The result of this procedure was striking. The participants produced appropriate and emotionally intelligent responses to old family pictures of embarrassing situations, long-departed relatives, and historical figures, often guessing the context with surprising accuracy. For instance, photographs of relatives were successfully identified from the emotional tone alone. One participant actually identified a photograph of Adolf Hitler from its emotional quality. These results left little doubt that the right hemisphere is capable of conscious intelligent responding, though it does not control the language faculty.

A broad reading of the research on split-brain individuals leaves the impression that they exhibit a dual consciousness. The left hemisphere supports the primary personality, which retains ordinary language abilities, exhibits a strong sense of self, and exerts control over most of the person's behavior. The right hemisphere, on the other hand, retains the lion's share of the ability for holistic visual thinking, for art and music, and possibly for intuition and emotional expression. Both seem to have a normal degree of social awareness. Apparently, when the corpus callosum is cut, the conscious mind itself is indeed bisected, creating two separate streams of experience. Within a few weeks after the operation, however, a tacit division of labor seems to come about between the hemispheres, in which various daily tasks are divvied up between them. For instance, the left hemisphere does the talking, reading, and writing, while the right hemisphere seems to slip into a very passive state, as indicated by a slowing of the EEG pattern on the right side. If the person is doing mechanical work, drawing, or mixing food, however, the left brain falls into a more or less quiescent state and the right brain becomes more active. Thus, there is rarely competition between the two "skull mates," as Gazzaniga calls them, over an individual's behavior.

5.11.3 Why consciousness?

Most neuroscientists today, like most psychologists, are functionalists of one sort or another. In other words, they believe that if a biological structure or function has survived the long haul of evolution, then it must serve some useful purpose. What, then, is the function of consciousness?

One contemporary theorist who has spent a great deal of time thinking about such matters is the cognitive psychologist Bernard Baars (2001). He theorizes that consciousness plays an essential role in distributing vital information broadly throughout the many operational systems of the brain, systems such as perception, memory, emotion, and the like. In his own words, "conscious experience involves a global workspace, a central information exchange that allows many different specialized processors [read: systems] to interact. 'Global' in this context simply refers to information that is usable across many different subsystems" (Baars, 1988, p. 43; also see Baars, 1983). Information of sufficiently broad interest, here, would include anything of real significance to a person, but particularly novelty. In this view, whatever captures our conscious attention becomes available to many if not all the subsystems of the brain. (A minimal sketch of such a broadcast architecture appears at the end of this subsection.)

Michael Gazzaniga's (1988) views on consciousness are similar to Baars', but run in the opposite direction. Gazzaniga sees consciousness as a single readout of information from the many separate subsystems of the brain. Such a readout would presumably be available, in return, to the other subsystems themselves, so that in reality consciousness ends up performing precisely the same information-distributing function suggested by Baars. For Gazzaniga, however, what is most important about consciousness is its ability to report one's internal states to others and to oneself. It does this through its close association with a particular brain module which he believes to be located in the left hemisphere. He terms it the interpreter. It is the responsibility of the interpreter to report to consciousness the output of other brain processes such as memory, emotion, and perception. In doing this, it makes a big picture of what is going on in the brain as a whole, a single interpretation of the information it gets from throughout the brain.

This notion has practical implications for understanding human behavior. It seems that the interpreter must perform its function no matter how sparse or flawed the input it receives might be. As a consequence it is the hardest thing in the world for people to simply have no opinion about something. We always have beliefs and opinions, whether we have information or not! Moreover, the interpreter is tenacious when it comes to making sense out of what goes on in our own nervous systems and will brook no criticism, even when faulty information from malfunctioning regions of the brain provides it with defective readouts. For example, Gazzaniga reports the case of a woman hospitalized at the Memorial Sloan-Kettering Hospital in New York City because of damage to her right parietal lobe. She seemed intelligent and aware except for one thing: she insisted that she was still at her home in Freeport, Maine. Nothing could convince her otherwise. Gazzaniga finally pointed to some big hospital elevators that could be seen just outside her room. He asked, "And what are those things over there?" She answered, "Those are elevators. Do you have any idea what it cost to have them put in here?"
Evidently this curious behavior was the result of the malfunction of some brain subsystem responsible for keeping track of one's physical location. Even though its output was dramatically distorted, the interpreter continued to function as usual, oblivious to this gaping error.

A somewhat different view of consciousness in relation to the brain is the idea that it emerged as a "higher order" process when the evolution of neural complexity reached some critical value. This idea, with its many variations, is common among artificial intelligence theorists, especially those who believe that sufficiently complex computers may also acquire consciousness. Perhaps the most elegant advocate of this notion was Roger Sperry. He believed consciousness to be a dynamic and emergent property of the uppermost levels of brain organization. He writes, “The causal power attributed to the [conscious mind] is nothing mystical. It is seen to reside in the hierarchical organization of the nervous system combined with the universal power of any whole over its parts.... The whole has properties as a system that are not reducible to the properties of the parts.” (Sperry, 1987, p. 101) For Sperry, consciousness exhibits its own unique properties while at the same time exerting "downward" control over the neural subsystems of the brain. In a sense this view of consciousness is potentially inclusive of those we have already seen, because it allows consciousness to fulfill both Baars' and Gazzaniga's informational functions, while allowing it the active role typical of our individual experience of it.
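Because global workspace theory has directly inspired AGI architectures such as LIDA, a minimal sketch of the broadcast idea may be useful here. It is our own toy illustration, not LIDA or any other existing system: the competition rule, the names of the specialist processors, and the salience scores are all assumptions made for the example. The only feature taken from Baars is the central loop in which one winning content at a time is broadcast to every specialized subsystem.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Processor:
    """A specialized, unconscious processor that competes for the workspace."""
    name: str
    propose: Callable[[], Tuple[float, str]]   # returns (salience, content)
    inbox: List[str] = field(default_factory=list)

def workspace_cycle(processors):
    """One cycle: specialists compete; the winner's content is broadcast to all."""
    proposals = [(p.propose(), p.name) for p in processors]
    (salience, content), source = max(proposals)   # naive winner-take-all competition
    for p in processors:
        p.inbox.append(content)                    # the global broadcast
    return source, content

# Toy specialists with fixed salience scores (illustrative assumptions).
processors = [
    Processor("vision",  lambda: (0.9, "sudden movement on the left")),
    Processor("memory",  lambda: (0.4, "this place resembles grandmother's kitchen")),
    Processor("emotion", lambda: (0.6, "mild unease")),
]

source, content = workspace_cycle(processors)
print(f"broadcast from {source}: {content}")
print(processors[1].inbox)   # every subsystem, including memory, now holds the conscious content

In Baars' terms, the print statements show the defining property of the workspace: whatever wins the competition for conscious attention becomes available to all of the otherwise encapsulated subsystems.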

5.11.4 The enigma of consciousness

One of the great enigmas in the history of philosophy is often called the "mind-brain problem," namely, how something material like the brain can be so intimately associated with something as immaterial and subjective as consciousness. Today this problem has come onto the main stage of the neurosciences. To explore it in depth would warrant a whole course in itself. Here we will point out that recent conferences and publications on the brain have focused on the variety of possibilities that might account for this enigma. Many of these are more philosophical than physiological. The philosopher Daniel Dennett (1991), for instance, has written a much discussed book, Consciousness Explained, in which he essentially argues against the idea of consciousness as ordinarily thought of, and emphasizes the notion that our common ideas about consciousness are largely illusory. Another philosopher, David Chalmers (1996), argues that consciousness as subjective experience is, in fact, a valid idea. He further speculates that consciousness is the product of information, of which the brain has a great deal.

In contrast to the above views, which are essentially philosophical, the anesthesiologist Stuart Hameroff and the mathematician Roger Penrose (see Penrose, 1994) have developed a theory of consciousness which views it as a property of the tiny microtubules found within nerve cells. The theory is based on certain assumptions about gravitationally induced collapse of the quantum wave function at a critical threshold. (Penrose is a celebrated mathematical physicist and so can produce such theories with a certain grace! In any case his books are excellent reading.) Indeed, there is a whole class of recent theories that equate consciousness with quantum phenomena, and more than one Nobel laureate is involved with them. At the same time, Sir John Eccles continued to promote a dualist theory of an independent interacting mind and body, a view which he originally developed working with the prominent European philosopher Karl Popper. He utilized quantum theory as well, but in his case to explain how the conscious self communicates with the material brain.

Most contemporary neuroscientists reject dualism and argue that conscious states arise from physical events in the brain, notably the electrical impulses in neurons of the cerebral cortex. Christof Koch (2004) has provided a lucid argument for this description, even though he does not really explain how patterns of electrical activity can produce consciousness. Many aspects of the relationship of consciousness and the brain are currently under exploration. It is an exhilarating time, rich with possibilities for further investigation.

5.12 Pathology and the Brain

"The realization... that the mammalian brain produces new neurons well into adulthood was a revolution in neurobiology.... Now, researchers may have a powerful new... non-invasive method for detecting neural stem cells in live mammals – including humans." (Miller, 2007, p. 899)

There is a great deal of excellent research being done today on behalf of better understanding the role of the brain in a variety of behavioral dysfunctions. Progress in this general field led to the full recognition by the American Psychological Association, in the summer of 1996, of a distinct field of psychology, neuropsychology, dedicated specifically to the diagnosis and treatment of such disorders. Biological approaches to affective disorders have shown significant progress due, in part, to advances in neuroscience (Depue and Iacono, 1989).

Brain-related pathologies range from Alzheimer's disease to the schizophrenias. Some of these, such as strokes and closed head injuries, are the result of tissue damage. Others, such as chronic depression and excessive anxiety, are associated with elevated or depressed levels of transmitter chemicals and neuromodulators. Other degenerative neurological disorders, such as Parkinson's disease and multiple sclerosis, affect motor function and movement, among other symptoms. Whereas multiple sclerosis is known to involve the breakdown of myelin, the sheath that covers nerve fibers in the brain and spinal cord, the cause of Parkinson's disease remains unknown, although depletion of the neurotransmitter dopamine is involved. In other brain dysfunctions, such as dyslexia and attention deficit disorders, the root causes are still a matter of conjecture. In virtually all cases except those produced by explicit physical damage, it is not clear to what extent environmental factors contribute to, or diminish, the degree of the dysfunction. With these realities in mind, it is not surprising that the study of brain-based pathologies does not lend itself to many broad generalizations. Rather, the interested student must carefully examine each in its own right.

Neuroscience has been criticized for its lack of applicability, especially for those suffering from neuropsychological disorders. However, there are now several attempts to apply neuroscience technology to the treatment of brain pathology. Researchers at the New York State Psychiatric Institute at Columbia University have used a technique, Repetitive Transcranial Magnetic Stimulation (rTMS), which aims a powerful magnet at a spot on the brain to "reset" the neural circuits that trigger episodes of profound depression. Treatment typically lasts one hour, five times per week, for six weeks. There is some evidence suggesting that rTMS may also be useful in treating anxiety disorders, the aftereffects of strokes, some types of schizophrenia, and perhaps epilepsy. Sophisticated electromagnets, similar to those used in fMRI brain scanners, are utilized in rTMS; it is not the magnetic pulses that affect the brain but the electrical currents that the pulses induce. Treatment does not trigger seizures, as does electroconvulsive "electroshock" therapy, so there is no need for muscle relaxants or anesthesia.
Because the brain is both an electrical and a chemical organ, electroconvulsive therapy, despite its troubling side effects, is still an effective treatment for many people who cannot be helped by such medications as Prozac and Zoloft, which address only chemical imbalances. The mechanisms of rTMS are not clearly understood but appear to be related to the supposition that it corrects "imbalances" in the brain; applying periodic bursts of electrical current to the cortex may "reset" the neural networks in a process akin to rebooting a computer (Gorman, 2005).

5.12.1 Depression

Much of what is known about the biochemistry of psychological disorders is due to the history of which agents have been effective in treating them. Depression is such a disorder. Let us note, however, that there are many types of depression, or conditions that pass for depression. These include perfectly normal periods of grief as well as seasonal depression, and bipolar disorders in which depressive episodes are interspersed with manic intervals. Here we will focus on chronic depression that lasts for significant periods of time with no obvious cause. It is worth noting that such depression is not always identified by feelings of sadness, but may be heralded by a loss of appetite, spontaneous episodes of crying, or difficulty with sleep. Such depression has a significant genetic component, as shown by studies of siblings and identical twins. It is twice as common among women as among men. About 20% of women in the U.S. experience at least one major bout with it during a lifetime. Nevertheless, there is impressive evidence that this gender difference may be more cultural than biological.

The story of the chemical theory of depression began with the introduction of a traditional Ayurvedic medication, reserpine, for the treatment of schizophrenia. It was soon found to elicit depression-like symptoms. An examination of its biochemical effects in the brain suggested that it causes the depletion of a class of molecules called biogenic amines. A second, related line of evidence came from the observation that depression is relieved by certain chemicals (monoamine oxidase inhibitors) that inhibit the oxidation of monoamines (amine molecules with one amino group, such as the neurotransmitters dopamine, norepinephrine, and serotonin). This led to the monoamine hypothesis of depression, which states that depression may be the result of a lowered level of these neurotransmitters. A third line of evidence involved the effectiveness against depression of another type of drug, the tricyclic and heterocyclic antidepressants, which block the reuptake of these three neurotransmitters at the synapse. This causes their concentration to build up at synapses, increasing their effectiveness.

Recently a new class of antidepressants has been developed that specifically targets the reuptake of serotonin. This class, which includes Prozac, Paxil, and Zoloft, seems very effective and does not have some of the undesirable side effects of the tricyclic and heterocyclic agents. Exactly why the specific targeting of serotonin is so effective is not known. One suggestion is that normal levels of this neurochemical tend to stabilize the neuronal systems involved in depression, but low levels allow them to be driven freely by the norepinephrine level. In addition, a body of research demonstrates that psychotherapeutic interventions for depression may yield equally impressive long-term results.
In reality, this view of depression is understood to be oversimplified. There are questions, for example, about whether antidepressants are chemically doing what they are said to be doing. Beyond this, there are other theories of depression, for example those that conceptualize it as a dysfunction of the normal circadian cycle. Sleep seems to involve at least two distinct biological clocks, or pacemakers, one that times REM sleep and one that times non-REM sleep. There is evidence that in at least some cases of depression these clocks get out of phase. One way to reset them is to wake up each morning at the same time and expose oneself to a strong wide-spectrum light like the sun. Of course, sunlight is not readily available on winter mornings, so a good artificial wide-spectrum light can substitute for it.

Interestingly, depression seems to occur at a surprisingly high incidence among certain highly creative individuals, especially writers. Clearly there is much more to learn about it. In the meantime, the available data demonstrate the arbitrary nature of the separation of "physical" and "mental" disorders.

5.12.2 Alzheimer’s disease

Almost two million Americans presently suffer from Alzheimer's disease. With the progressive aging of the population, this number could potentially grow to enormous proportions. Alzheimer's disease is a form of dementia, in other words, a severe and relatively rapid deterioration of mental faculties, often accompanied in the early stages by comparatively few other losses. It begins with a deterioration of memory for recent events. Memory impairment becomes increasingly profound until the person can no longer keep track of where they are or what they are doing. Eventually they cannot communicate effectively.

Anatomically, the progression of the disease is associated with severe deterioration of the cortex, especially the frontal, temporal, and parietal areas. These changes are preceded by decreases in brain activity in these regions, as seen in PET scans. Eventually subcortical areas also become involved. Microscopic studies of the brain cells of Alzheimer's patients reveal some dramatic changes. Some cells show neurofibrillary tangles, arrays of jumbled filaments. Eventually, patches of degenerating axon terminals and dendrites, termed senile plaques, are found which contain a characteristic substance called beta-amyloid, produced by the cleavage of large protein molecules. Some researchers have suggested that this substance is toxic, while others disagree. Interestingly, these cellular features are also observed in the brains of individuals with Down syndrome. There is evidence that Alzheimer's disease involves the loss of function in neurons that utilize the neurotransmitter acetylcholine, especially in the subcortical region of the basal forebrain called Meynert's nucleus, which shows dramatic deterioration in Alzheimer's patients. Axons of cells in this area normally extend widely throughout the cortex and contain acetylcholine.

The causes of Alzheimer's disease are not known. Heredity plays an important role in some cases, especially those that involve an early onset. In other cases heredity seems much less of an influence, though there is some tendency for the disease to run in families. On the other hand, some researchers have suggested that most people would eventually get Alzheimer's disease if they lived long enough. One theory is that the disease is transmitted by a virus-like infection. Another is that it results from the effect of toxins such as excessive aluminum. Indeed, the brains of Alzheimer's patients show a higher than average amount of aluminum. Another suggestion is that it is an autoimmune disorder in which the body creates antibodies that selectively attack acetylcholine-containing neurons. One other possibility is that the substance NGF (nerve growth factor), essential for the survival of acetylcholine-containing neurons, may become deficient in this disorder. Animal studies have reported positive results from treatments based on this theory.

What can one do to minimize the likelihood of acquiring this disorder? As with many disorders of aging, the best advice is to exercise regularly and keep healthy.
Beyond this, there is increasing evidence that an active mental life is a good, if partial, inoculation against Alzheimer's disease. Learning new skills and keeping the mind supple seem to stave off onset, and to slow its progress as well. Indeed, this advice applies quite generally, and the authors of this primer would encourage readers to take stock of their own capacities, taking steps to ensure the healthy functioning of their own body and mind.

5.12.3 The schizophrenias

The spectrum of consciousness, especially cognition, among patients with diagnoses of schizophrenia is wide, but a meta-analysis has discerned a number of consistent differences in comparison with nonclinical individuals (Heinrichs, 2005). Over 80 quantified differences were identified, ranging from neuropsychological differences to cognitive differences to gender and social differences. For example, schizophrenics differed from nonclinical individuals on regional brain volume, blood flow, metabolism, receptor occupancy, and genetic background; on reactions to stress and distress, verbal recall deficits, delusions, apathy, hallucinations, and global cognitive impairments; as well as on gender differences (although the rates were equal, the symptoms were more severe among men) and socioeconomic differences (a higher incidence was found among disadvantaged groups). Care must be taken to ensure that language does not bias comparisons of different cultural groups when the various types of schizophrenia are studied. Researchers in the area of psychopathology are beginning to realize that cognitive performance depends on the integrity of brain systems that mediate information processing; when this integrity is compromised through disease, trauma, or faulty development, performance problems are likely to ensue.

5.12.4 Altered States of Consciousness (ASCs)

While Altered States of Consciousness (ASCs) are no longer considered pathological, they are novel modalities of consciousness that deserve attention in this primer. ASCs can occur spontaneously, can be evoked by physical and physiological stimulation (e.g., Ehrsson, 2007), can be induced by psychological means, and can be caused by diseases (both "mental" and "physical"). Vaitl and his colleagues (2005) conducted an analysis of the phenomenological literature concerning ASCs produced by all these modes, finding four dimensions that could be used for categorization purposes: activation, span of awareness, degree of self-awareness, and sensory dynamics. They also reviewed the work of neuroscientists, finding that these researchers had concluded that ASCs were brought about by compromised brain structure, transient changes in brain dynamics, and neurochemical and metabolic processes. Psychological perspectives added environmental stimuli, mental practices, and various techniques of self-regulation, each of which can temporarily alter brain functioning and conscious experience. Like the blind men and the elephant, one's view of ASCs depends on what one's initial orientation may be.

5.13 Techniques for imaging the brain

XXX this section needs expansion XXX

One of the largest challenges neuroscientists face is measurement of the fine-level structures and dynamics of the brain. Current neuroscience knowledge has been creatively assembled from glimpses into aspects of the brain provided by a variety of different technologies over a long period of time. A systematic way of imaging the structure and dynamics of the brain at the cellular or (better yet) molecular level would provide a deluge of data, whose analysis would require advances in statistical and AI techniques, but would also provide a key to tremendously improving our overall neuroscience knowledge. Such a method has not yet emerged, but existing brain measurement technologies are steadily improving.

5.13.1 Functional Magnetic Resonance Imaging (fMRI)

One of the many uses of functional magnetic resonance imaging (fMRI) is to study the interconnections between different parts of the brain. Researchers at Northwestern University have discovered that these connections are dynamic, not static, and change according to the task involved. They have compared brain networks to a system of highways connecting different parts of a city. The highway is static; no matter how heavy the traffic load, it always has the same number of lanes. In the brain, there is a dynamic change that allows certain pathways to preferentially facilitate the demands of a given cognitive task. The brain highway "adds lanes" to accommodate the requirements of a particular task. This research method, Dynamic Causal Modeling, examines the influences between brain regions, and suggests that specific regions serve as convergence zones that integrate information from other parts of the brain. For example, brain imaging technology has revealed that lying and truth-telling activate different areas of the brain; lying generates more overall activity, firing up regions associated with emotions as well as those involved in the inhibition of responses (Holden, 2004).

Stereotactic radiosurgery uses functional magnetic resonance imaging, along with radiation beams, to locate and treat tumors and other problems deep within the brain without opening the skull. The most widely used form of radiosurgery is gamma knife surgery. The first such devices were developed in Sweden in the 1950s, and this work led to the construction of the gamma knife in the 1960s. This "knife" produces some 180 focused beams of gamma radiation, generated from the decay of radioactive cobalt pellets. Hundreds of thousands of patients have undergone stereotactic radiosurgery not only for cancerous tumors but for nerve injuries, "leaky" blood vessels in the brain, and various types of neuralgias.

5.13.2 Direct Electrode Recordings

An alternate method of monitoring brainwave activity has provided support for a holistic theory of how the brain works. Extending technology previously used to enable monkeys to move a robot arm with the brain, researchers at Duke University Medical Center inserted thin microelectrodes into areas of living rat brains involved in sensory processing, motor function, and memory formation. After recording electrical signals through the rats' sleep-wakefulness cycles over a period of several days, the researchers discovered distinct patterns that marked the animals' brain activity through waking, deep sleep, and rapid eye movement sleep, including the consolidation of memory. The researchers also were able to distinguish changes in the brain that marked transitions between different sleep states. This approach surpasses fMRI and PET brain scans, which only give quick glimpses into brain activity. The Duke University procedure provides an instant-to-instant electrical map, measuring global activity in the brain. It appears that there is no such thing as a single neural "code," because the "code" is continually changing according to the internal state of the brain and according to the strategy the animal selects to search the environment. This implies that perception may not be simply the analysis of incoming information, but may depend on the internal state of the brain at any given time. The brain may not be as passive as it was once thought, but may instead be an ever-adapting organ. Additionally, the brain may not be simply a chest of compartments, each responsible for its own function; the entire brain appears to contribute to all of its actions, as Karl Pribram suggested in his 1991 book, Brain and Perception. Besides providing insight into brain function, the Duke University brain-monitoring technique allows researchers to monitor attention, expectation, and the nature of neurological disorders (Schwartz and Begley, 2002).

5.14 Conclusion

Neuroscience has come a long way in recent decades and continues to advance extremely rapidly. But there is still a long way to go. The neurosciences (affective, behavioral, cognitive, social) place the emphasis on the brain and central nervous system in understanding behavior. But it is not the brain that has thoughts and experiences, it is the whole human animal. From a philosophical standpoint (e.g. Bennett, Dennett, Hacker, and Searle, 2007) there are unsolved questions that the neurosciences pose, guaranteeing that the future of this field will be an exciting one.

From an AGI perspective, the future of the relationship between AGI and neuroscience remains unclear. At the very least, neuroscience provides the AGI researcher with a bounteous source of inspirations. Step by step, neuroscience is revealing aspects of human general intelligence with potential implications for engineering general intelligence. Which of these aspects will prove fruitful for AGI designers and engineers to emulate is a story that will unfold over the coming decades. As an example, in recent years we have seen significant practical successes in vision and speech processing from narrow-AI deep learning systems, with structures loosely but not highly specifically modeled on the human visual and auditory cortex. Some AGI researchers are pursuing this sort of "broadly inspired by the brain" approach beyond the domain of machine perception, others are drawing on more cognitive or mathematical inspirations and paying less mind to the flow of discoveries from neuroscience, and others are working with more literal brain simulation. Which approaches will ultimately prove most effective remains to be seen.

Chapter 6 Essentials of Cognitive Science

Let’s put a chapter here !!!


Chapter 7 Mapping Mind into Brain

Let’s put a chapter here !!!


Chapter 8 Essentials of Cognitive Development

Let’s put a chapter here !!!


Chapter 9 Dynamic Global Workspace Theory

Let’s put a chapter here !!!


Chapter 10 Perspectives on Human and Machine Consciousness

Ben Goertzel

Abstract Consciousness studies is now a highly active research field, with a wide variety of different approaches in play. In this chapter, concepts and findings from a number of perspec- tives are presented, and partially synthesized into a unified understanding. Six key properties of consciousness are emphasized: Dynamical representation of the focus of consciousness, Focusing of energetic resources and focusing of informational resources on a subset of system knowledge, Global Workspace dynamics as outlined by Bernard Baars in his cognitive theory of consciousness, Integrated Information as emphasized by Tononi, and correlation of attentional focus with self-modeling. It is proposed that the extent, and relative importance, of these properties may vary in different states of consciousness; and that any AI system displaying closely human-like intelligence will need to manifest these properties in its consciousness as well. The “hard problem” of consciousness is sidestepped throughout, via focusing on structures and dynamics posited to serve as neural or cognitive correlates of subjective conscious experience.

10.1 Introduction

These days, unlike a few decades ago, consciousness is a significant topic of research in psychology, neuroscience, philosophy and other fields. However, there remains no scientific consensus on how to define or conceptualize consciousness, let alone on how to quantitatively measure it, or formally model its structure or dynamics. Among other open questions, there is no broadly accepted way to measure the degree of consciousness displayed or experienced by a system (be it a human or other animal brain, or a robot or other AI) during a certain interval of time.

I will not aim, here, to address the foundational question of "what consciousness fundamentally is." Instead, the question I will focus on is: What are the important properties specifically characterizing human, or human-like, consciousness? 1

I believe that this question admits scientific answers – but probably does not admit a simple, elegant, unified answer. Rather, I suggest that human consciousness is distinguished from "consciousness in general" by a mix of different properties, which have evolutionarily co-adapted to

1 Just as physics has told us many interesting and useful things about the movement of objects without resolving all the core philosophical issues regarding the nature of space and time, I believe we can come to many valuable conclusions about consciousness without first needing to resolve all related philosophical perplexities.

work together in a coherent way. At a high level, this same sort of "messy coherence" can be observed in many other examples in the domain of evolved systems – very often, in a biological or ecological context, we find that heterogeneous aspects of a system work together coherently, in a manner that works pragmatically, yet is somewhat "ad hoc" and specific to the functioning of some particular sort of system.

It is possible that there is some elegant, crisp, beautiful theory of human consciousness lurking around the corner, which we have not yet found because we've been approaching the topic with the wrong theoretical or empirical toolkit. However, I suspect that this is not the case – and that the quest for such a theory may be understood as yet another case of "physics envy", i.e. the fallacy of expecting or hoping that complex and specific systems like ecosystems or human organisms will display regularities expressible as simple, mathematically aesthetic equations like the ones found in theoretical physics. It is with this in mind that I advocate here an expressly "multifactorial" approach to human consciousness.

The messy coherence of human consciousness is analogous to – and also closely related to – the messy coherence of human intelligence, which is a special case of "intelligence in general," with a host of special properties that evolved in reaction to the specific evolutionary needs of early humans and their predecessors. In particular, intelligence testing has something to tell us about the prospects for rigorous consciousness measurement. The IQ test aims to provide a single number summarizing human general intelligence, but succeeds only very narrowly. Cultural biases in IQ testing are well substantiated [? ]. Further, psychologists have proposed various "multiple intelligence" theories aimed at measuring various aspects of intelligence individually, arguing that the standard IQ test somewhat arbitrarily squashes multiple, largely distinct capabilities into a single numerical score [Gar99]. Given the difficulty of establishing IQ testing as a measure among human beings, it seems clear that current IQ tests cannot meaningfully be applied beyond the human realm, e.g. to intelligent animals with fundamentally different natures such as cetacea, or to AI systems with cognitive architectures differing significantly from the human mind/brain. On the other hand, mathematical measures of general intelligence have been proposed [LH07, Goe10b], but these are very abstract, and it is not clear how to apply them in the context of everyday human intelligence.

On a very qualitative level, one can summarize the nature of human intelligence concisely, with phrases like "the ability to learn and generalize", or "the ability to achieve complex goals in complex environments", etc. But when one tries to formalize ideas like these, one hits numerous thorny dilemmas, mostly revolving around the dichotomy and boundary between extremely broad theoretical problem-solving capability (which quickly gets beyond the human level, when one considers it in the abstract), and highly specialized problem-solving capability (at which humans are good in certain domains, and terrible in others). Human intelligence is indeed somewhat specialized, but also has an element of generality interwoven with the specialization – which makes it complex and complicated, in the typical manner of biological phenomena.
And similarly, human consciousness has multiple aspects, which seem difficult to summarize in a single number. The experience of human consciousness, while it often seems simple to humans experientially, may actually be a complex amalgam of different phenomena. The human mind/brain seems to contain many specialized forms of consciousness, which then weave together into an overall consciousness dynamic. Consciousness seems not to be a simple, unidimensional physical concept like energy or mass; but rather a complex, multidimensional psychological concept like intelligence or happiness – and as such its measurement becomes a complex, context-dependent matter of balancing multiple factors. The idea that some sort of "raw consciousness", with an elemental simplicity to it, may be immanent in the physical universe, doesn't really help with the pragmatic measurement of human or human-like consciousness – any more than intelligence testing is aided by the observation that some sort of "raw intelligence" is immanent in the universe due to the way basic physical dynamics implicitly optimizes complex objective functions in complex situations.

In this paper, a number of contemporary analyses of human consciousness are examined from this multifactorial perspective: Baars' Global Workspace Theory [Baa97? ] and the LIDA software system that partially embodies it [? ]; Tononi's Integrated Information theory [? ]; Goerner and Combs' analysis of consciousness in terms of nonlinear dynamics and energy minimization [? ]; Tart's theory of states of consciousness [? ]; and the analysis of consciousness in terms of reflective self-modeling [Met04?? ]. It is argued that these theories, diverse on the surface, are actually elucidating different aspects of the same complex underlying human consciousness process. Finally, the possibility of measuring the degree of consciousness displayed by a human or human-like system, using multiple factors derived from these multiple theoretical perspectives, is discussed. Six key factors relevant to measuring human-like consciousness are summarized as:

1. Dynamical representation of the focus of consciousness
2. Focusing of energetic resources on a subset of system knowledge
3. Focusing of informational resources on a subset of system knowledge
4. Global Workspace dynamics
5. Integrated Information, perhaps as quantified by Tononi
6. Correlation of attentional focus with self-modeling

The optimal ways to quantify all these phenomena are not yet clear; this is a topic needing further study. What is argued here is that, if one wishes to quantify the degree of consciousness of real-world systems, this is the right way to proceed – i.e., by identifying and then quantifying multiple aspects of the multifarious, multifactorial dynamical process that is human-like consciousness. That is, the goal here is not to propose a precise quantitative measure of human-like consciousness, but rather to lay out a clear conceptual framework, integrating relevant bodies of knowledge and theory, within which human-like consciousness can be qualitatively analyzed in a variety of systems (including AI systems), and within which comprehensive, precise quantitative measures of human-like consciousness can be pursued.

10.2 Aspects of Human Consciousness

Consciousness has been addressed from a variety of different vantages, far more than could be surveyed in a brief paper. In this section – which constitutes the bulk of the paper – I review a subset of the many important ideas from the literature, which combine together to form the overall perspective on consciousness outlined in the following section.

10.2.1 Hard and Possibly Less Hard Problems Regarding Consciousness

David Chalmers [Cha97] famously distinguished the "hard problem of consciousness" from other issues regarding consciousness – where what he meant by the "hard problem" was, in essence, the problem of connecting subjective experience (the "raw feel" of consciousness, sometimes referred to using the term "qualia") with empirically observable factors. According to my own understanding, in the current ontology of intellectual disciplines, this "hard problem" is a philosophical rather than scientific problem. My reasoning is that science, as currently understood, is focused on prediction and explanation of measurements that are observable by an arbitrary observer within a community; whereas subjective experience, by its nature, is not observable by an arbitrary observer within a community. There is some wiggle room here, in that a community of meditators or psychedelic adventurers may consensually agree that they can sense one another's subjective experience, thus arguably bringing subjective experience within the domain of the interpersonally observable, at least with respect to that particular community. However, this sort of observability is different from measurement as generally pursued in science, meaning that handling the "hard problem" in a scientific way would involve substantial extension or modification of the scientific method as commonly understood.

In reaction to the slipperiness of the "hard problem", many consciousness researchers have focused their attention on the perhaps easier, though still challenging, problem of finding "neural correlates of consciousness" [? ], or else cognitive correlates of consciousness. The research question then becomes: What are the patterns in the brain or mind that tend to correlate with reports of subjective experience? A related question is what Scott Aaronson has called the "pretty hard problem of consciousness" [? ] – determining which kinds of systems are capable of having conscious experience. Which kinds of systems can have physical correlates of consciousness at all?

The topic addressed in this paper is the cognitive and neural correlates of human and human-like consciousness. I believe this topic can be explored quite thoroughly without making any commitments regarding the "hard problem." However, before moving on, it would be dishonest of me not to clarify that I do have a personal and intellectual position on the "hard problem." Some have argued that the "hard problem" is nonsense and qualia do not exist in any meaningful sense [Den93]; I am not one of these. Rather, I personally tend to agree with Chalmers that some sort of panpsychism is probably the right answer – i.e. I tend to view consciousness as a property that everything in existence possesses to some degree and in some form. Analytic philosopher Galen Strawson [? ] has argued strenuously and rigorously that any other perspective is logically ill-founded; there is no consistent, sensible way to view consciousness and the physical world as separate but interacting entities. But if one accepts this, there remain difficult questions regarding why particular physical entities are associated with particular sorts of conscious experience. The philosophical and scientific aspects of panpsychism have been explored in detail by many others [??? ] and I will not repeat those discussions here.
The central ideas in this paper are not predicated on the panpsychist perspective, so I mention my orientation toward panpsychism here mainly to point out that there is at least one simple, conceptually coherent answer to the question of the basic nature of consciousness, which appears to be fully consistent with the concepts discussed 2. The reader is invited to explore panpsychism further, or else to harmonize the cognitive and empirical ideas presented here with their own different views on the fundamental nature of consciousness.

10.2.2 Degrees of Consciousness

Consciousness, from the perspective of a subjective human experiencer, seems not to be a Boolean, either/or phenomenon. Rather, there seems to be a subjectively clear notion of the degree of consciousness. For instance, there is a sense in which

• After I'm fully awake, I am more conscious than I am a half-second after I wake up in the morning
• I am more conscious of something at the center of my attention, than something at the fringe of my attention
• A fully alert person has more consciousness than a fully alert chimp, worm, bug or rock

One important research question is: To what extent are these three different kinds of more-ness related? Is there a single axis of "quantity or degree of consciousness" to which they all refer? Or do they refer to different ways of quantifying/comparing instances of consciousness? One possibility is that a single measure, or a certain set of measures, could be shown to lie at the center of all three of the kinds of more-ness I've mentioned.

A simple and compelling way to think about "degrees of consciousness" is as degrees of conscious access [? ]. As Bernard Baars notes [? ],

Zero Conscious Access (Zero CA) is a perfectly acceptable label for brain events — a world of them — that never come to consciousness, but which need to be understood if we are to understand the dimension of CA. The dimension CA could be operationalized by specific measurable variables ... [e.g.] in the case of sensory events or recalled memories. Degree of drowsiness can also be assessed as an empirical CA measure, by counting the number of slow waves in the occipital EEG per second. Or in the case of surgical anesthesia, it can be assessed by the amount of inhaled anesthetic per second. Behaviorally patients may be asked to answer questions, or count to ten, etc.

Intuitively, the notion of CA has potential to encompass the three types of "degree of consciousness" mentioned above. A fully awake state of mind, compared to a half-awake state of mind, involves more entities having more conscious access; entities in the focus of attention are more accessible to conscious processes than those on the fringe; and a worm or a bug, very likely, has fewer and simpler items possessing conscious access during any given interval compared to a human (as these simpler animals, to the extent that they are viewed as conscious, likely have consciousnesses dominated by immediate perceptions and actions, whereas human consciousness gives access not only to these but also to various other memories, plans, ideas, etc.).

2 The main argument posed against panpsychism seems to be that people find it counterintuitive. However, science is frequently counterintuitive; and further, the view that panpsychism is counterintuitive is much more prevalent in the West than in the East or Africa.

10.2.3 Specific Subprocesses of Human Consciousness

Cognitive psychology has traditionally said a lot more about "working memory" (or before that, "short-term memory") than about consciousness per se. However, there is a great deal of overlap between these topics. So it is highly relevant to any discussion of human consciousness to note some of the specific substructures and subprocesses that cognitive psychologists have identified within human consciousness, e.g. the classic three components identified by Baddeley [? ]:

• A phonological loop, which deals with sound or phonological information, and is subdivided into a short-term phonological store with auditory memory traces that are subject to rapid decay and an articulatory loop that can revive the memory traces.
• A visuospatial sketchpad, which handles the temporary storage and manipulation of spatial and visual information, and also assists with tasks which involve planning of spatial movements. It is thought to handle visual, spatial and kinesthetic information in slightly different, though overlapping, ways.
• An episodic buffer, which is concerned with linking information across domains to form integrated "episodic" units of visual, spatial, and verbal information, such as the memory of a story or a movie scene.

It would be hard to argue for the necessity of such particular structures and processes within any conscious system. However, such phenomena clearly play a key role in the human experience of being conscious, and the empirical correlates of this experience. Human consciousness is not just a generic phenomenon of attention-focusing or what-not; it clearly involves multiple important characteristics common to the broader phenomenon of consciousness, but it is a specific process involving a specific architecture that evolved for specific reasons. Numerous subtleties arise here, such as the re-use of these specialized structures for more general purposes. E.g. it seems that the phonological loop can be used for handling abstract mathematical knowledge as well as ordinary speech 3; and that the visuospatial sketchpad can be used for abstract visual or partly-visual representation of abstract conceptual relationships [? ]. As commonly occurs in biological systems, mechanisms that evolved for one purpose may then be adaptively deployed for others.

Baddeley's model of working memory also posits a "central executive" that coordinates the operation of the phonological, visuospatial and episodic components. This is somewhat controversial as it has some appearance of being a "homunculus" type mechanism. However, the broad notion of a central executive function can be fulfilled in many ways, and not necessarily by a physically localized or operationally isolated subsystem. The Global Workspace Theory and LIDA cognitive model provide an example of how Baddeley's central executive function can be modeled as a distributed, system-wide dynamical process rather than an isolated, homuncular module.
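To make the division of labor tangible, here is a minimal, purely illustrative Python sketch of Baddeley-style working memory components. The class names, the decay constant and the rehearsal mechanism are simplifying assumptions introduced here for exposition; they are not drawn from Baddeley's own formalization or from any particular AGI system.

import time

class PhonologicalLoop:
    """Toy phonological store: traces fade unless rehearsed (illustrative only)."""
    def __init__(self, decay_seconds=2.0):
        self.decay_seconds = decay_seconds   # hypothetical decay constant
        self.traces = {}                     # item -> time of last rehearsal

    def store(self, item):
        self.traces[item] = time.time()

    def rehearse(self, item):
        # the articulatory loop refreshes a fading trace
        if item in self.traces:
            self.traces[item] = time.time()

    def recall(self):
        now = time.time()
        return [i for i, t in self.traces.items() if now - t < self.decay_seconds]

class VisuospatialSketchpad:
    """Toy spatial store: items tagged with (x, y) locations."""
    def __init__(self):
        self.items = {}

    def place(self, item, location):
        self.items[item] = location

class EpisodicBuffer:
    """Binds verbal and spatial content into integrated episodes."""
    def __init__(self):
        self.episodes = []

    def bind(self, verbal, spatial):
        self.episodes.append({"verbal": verbal, "spatial": spatial})

class CentralExecutive:
    """Coordinates the subsystems; here just a trivial scheduler, not a homunculus."""
    def __init__(self):
        self.loop = PhonologicalLoop()
        self.sketchpad = VisuospatialSketchpad()
        self.buffer = EpisodicBuffer()

    def attend(self, verbal=None, spatial_item=None, location=None):
        if verbal is not None:
            self.loop.store(verbal)
        if spatial_item is not None:
            self.sketchpad.place(spatial_item, location)
        self.buffer.bind(self.loop.recall(), dict(self.sketchpad.items))

wm = CentralExecutive()
wm.attend(verbal="call the dentist", spatial_item="phone", location=(2, 5))
wm.loop.rehearse("call the dentist")
print(wm.buffer.episodes[-1])

The point of the sketch is only the separation into short-lived specialized stores plus a coordinating function, which, as noted above, need not be realized as a localized module.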

10.2.4 Dynamic Global Workspace Theory

Perhaps the most comprehensive model of the cognitive correlates of consciousness in the human mind is the Global Workspace Theory (GWT) developed by Bernard Baars and his colleagues [Baa97? ].

3 Hadamard reports that some mathematicians, e.g. the great George Polya, say they think about mathematical concepts in terms of grunts and groans [? ]

The LIDA cognitive architecture, developed by Stan Franklin and his colleagues, is a cognitive model and AI architecture and system that directly incorporates the key aspects of GWT, along with other AI and cognitive science ideas.

A global workspace (GW) is broadly defined as "a functional hub of binding and propagation in a population of loosely coupled signaling elements." Intuitively and experientially speaking, the GW is the "inner domain in which we can rehearse telephone numbers to ourselves or in which we carry on the narrative of our lives. It is usually thought to include inner speech and visual imagery." [Baa97]. Conscious experience in humans and other similar animals is viewed as associated with GW functions.

There are reasonably substantiated hypotheses regarding the neural underpinnings of the GWT in humans and similar animals. The cortico-thalamic (C-T) core is believed to underlie conscious experience and associated cognitive functions. However, the GW is not to be identified with any specific anatomical hub within the C-T core. Rather, the GW is to be thought of as spanning multiple anatomical hubs, and constituting "dynamic capacity for binding and propagation of neural signals over multiple task-related networks, a kind of neuronal cloud computing" [? ]. The hypothesis is that

[C]onscious contents can arise in any region of the C-T core when multiple input streams settle on a winner-take-all equilibrium. The resulting conscious gestalt may ignite an any-to-many broadcast, lasting 100–200 ms, and trigger widespread adaptation in previously established networks. To account for the great range of conscious contents over time, the theory suggests an open repertoire of binding coalitions that can broadcast via theta/gamma or alpha/gamma phase coupling, like radio channels competing for a narrow frequency band. Conscious moments are thought to hold only 1–4 unrelated items; this small focal capacity may be the biological price to pay for global access.

To phrase the core dynamic of GWT in neural network terms, one may describe the global broadcast of the contents of consciousness as follows: an active cell assembly (which would correspond to a coalition in the LIDA model; see below) "wins out over the competition and ignites, gathering momentum, and spreads out to include the whole of the Edelman and Tononi thalamocortical core." [? ]

The GWT is a cognitive and cognitive-neuroscience rather than computational theory, but it has served as the inspiration for various AI system designs to various degrees. The closest relationship is with LIDA (Learning Intelligent Distribution Agent), an ambitious computational cognitive architecture created by Stan Franklin and his colleagues, inspired by direct collaboration with Baars, which attempts to provide a working model of a broad spectrum of cognition in humans and other animals, from low-level perception/action to high-level reasoning. Inspired by GWT, LIDA is founded on two core hypotheses:

• Much of human cognition functions by means of frequently iterated (roughly 10 Hz) interactions, called cognitive cycles, between conscious contents, the various memory systems and action selection.
• These cognitive cycles serve as the "atoms" of cognition of which higher-level cognitive processes are composed.

Spanning both sides of the symbolic/subsymbolic dichotomy, LIDA is a hybrid architecture in that it employs a variety of computational mechanisms, chosen for their psychological plausibility and practical feasibility. The focus on action selection is carefully reasoned based on Stan Franklin's associated AI theories [? ]. Along with the GWT, LIDA incorporates a broad spectrum of ideas from the cognitive science literature, including models of the particular architecture of human working memory as roughly indicated above. It incorporates specific components corresponding to particular substructures within the working memory, e.g. Transient Episodic Memory, Sensory Memory, Perceptual Associative Memory, etc.

More specifically 4, the LIDA cognitive cycle can be subdivided into three phases: the understanding phase, the attention (consciousness) phase, and the action selection and learning phase. Beginning the understanding phase, incoming stimuli activate low-level feature detectors in Sensory Memory. The output engages Perceptual Associative Memory, where higher-level feature detectors feed into more abstract entities such as objects, categories, actions, events, etc. The resulting percept moves to the Workspace, where it cues both Transient Episodic Memory and Declarative Memory, producing local associations. These local associations are combined with the percept to generate a current situational model: the agent's understanding of what is going on right now. The attention phase begins with the forming of coalitions of the most salient portions of the current situational model, which then compete for attention, that is, a place in the current conscious contents. These conscious contents are then broadcast globally (following the core dynamic proposed by Global Workspace Theory), initiating the learning and action selection phase. New entities and associations, and the reinforcement of old ones, occur as the conscious broadcast reaches the various forms of memory: perceptual, episodic and procedural.
In parallel with all this learning, and using the conscious contents, possible action schemes are instantiated from Procedural Memory and sent to Action Selection, where they compete to be the behavior selected for this cognitive cycle. The selected behavior triggers Sensory-Motor Memory to produce a suitable algorithm for its execution, which completes the cognitive cycle.

In GWT terms, LIDA may be understood to incorporate (among other things) a detailed theory of how the focusing of attention works, and hence of how the process of consciousness works. The input to the consciousness process is understood to include stimuli from both the external and internal environments. Along with various sorts of memory, the internal environment is viewed as including a "preconscious workspace" in which external stimuli and internal constructs are understood with the help of recall from various memories. Input to the consciousness process is viewed as coming from this preconscious workspace; the consciousness process processes these inputs, producing dynamic "conscious contents." The contents of consciousness are then broadcast according to the core GWT dynamic, causing a global impact on the mind network, thus impacting the preconscious workspace and completing the loop between preconscious-workspace and conscious processes.

The consciousness process as depicted by GWT and LIDA is clearly a complex nonlinear dynamic, susceptible to various subtle emergent self-organizational phenomena, that are not well characterized at present. In dynamical systems terms, what Franklin and the LIDA group would tend to view as a "pattern of conscious contents", I would describe in more detail as a "probabilistically approximately invariant subspace of the set of possible states of the dynamical system comprised of the consciousness process and the contents of consciousness". Crudely, one could speak of an "attractor" instead of a "probabilistically approximately invariant subspace" – but in actuality, if one did real mathematical modeling of these systems, one would likely not find something so deterministic as an attractor according to the standard definitions of dynamical systems theory [? ]. What I am here calling a "probabilistically approximately invariant subspace" has sometimes been called a "persistent transient." Biological, social and psychological systems are full of these sorts of phenomena, even though mathematical dynamical systems theory deals much more with simpler cases like attractors and invariant measures.

4 This paragraph is paraphrased from [? ]
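The three-phase cycle described above can be made concrete with a toy Python sketch. The memory stores are stubbed out, salience is a placeholder heuristic, and all class and function names are hypothetical, so this is a structural illustration of the published description of the cycle, not code from the actual LIDA system.

import random

class Memory:
    """Stub memory store; real LIDA memories are far richer (illustrative only)."""
    def __init__(self, name):
        self.name = name
        self.reinforced = []            # records what the conscious broadcast reached

    def cue(self, percept):
        return [f"{self.name}-association({percept})"]

    def receive_broadcast(self, contents):
        self.reinforced.append(contents)   # stands in for learning from the broadcast

def cognitive_cycle(stimulus, memories):
    """One pass of a toy LIDA-style cycle: understand, attend, select and learn."""
    # Understanding phase: stimulus -> percept -> current situational model.
    percept = f"percept({stimulus})"
    associations = [a for m in memories.values() for a in m.cue(percept)]
    situation = [percept] + associations

    # Attention (consciousness) phase: coalitions of salient content compete;
    # the winner becomes the conscious contents and is broadcast globally.
    coalitions = [situation[:2], situation[1:]]         # toy coalitions
    conscious_contents = max(coalitions, key=len)       # toy salience measure
    for m in memories.values():
        m.receive_broadcast(conscious_contents)

    # Action selection and learning phase: instantiate schemes, pick a behavior.
    candidate_behaviors = [f"act-on({item})" for item in conscious_contents]
    return random.choice(candidate_behaviors)

memories = {name: Memory(name) for name in ("episodic", "declarative", "procedural")}
print(cognitive_cycle("red ball approaching", memories))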

10.2.5 Consciousness as a Nonlinear-Dynamical Process

The perspective of consciousness as a complex, nonlinear-dynamical process bears further elaboration. Sally Goerner and Leslie Allan Combs, in a concise, elegant article from 1998 [? ], argued in favor of a process perspective on consciousness, from a more experiential view:

Consciousness always has an object. In other words, it is always about something. We are not just conscious, we are conscious of the taste of food, the smell of the sea, a tooth ache. We are conscious of joy, of boredom, of the meaning of words on the page in front of us, of the sound of music playing in the next room, of our own thoughts, of memories. The point is that virtually all experience is experience of something. ... Let us go one step further and note that events which lead to increased complexity in conscious experience also must in their own way lead to increased complexity in brain processes. To look at a tree in bloom presents the mind with a picture of pleasing complexity. Likewise, we cannot doubt that the brain is treated to a similar upgrade in complexity, and that electrochemical changes there support our experience of pleasure as well. ... In the above example it is apparent that looking at a tree in bloom in-forms both the brain and the mind, or conscious experience, in a way that increases their complexity. Their information level has been enlarged. Here we see the interchangeability of experience and information. Consciousness would seem to be intimately involved with the informing of the brain and mind by objects of attention. Moreover, on the brain side we see that the complexification associated with a conscious experience also involves an increase in energy, though this may only be of a small amount. Here again the connection with neg-entropy comes into play as a decrease in disorganization and an increase in order.

There is a clear connection between these ideas and the LIDA model, in which a large role is played by the contents of consciousness, the object of attention at the moment. The LIDA model lays out a hypothetical, but plausible, mechanism for the self-organization to which Combs refers, based on the idea that non-linear dynamics can provide a bridge between high-level, conceptual models of mind like LIDA and underlying neural mechanisms. Combs goes further and states that

Bringing the above ideas together, we suggest that each state of consciousness, mood, or frame of mind, represents a unique and coherent – minimal energy – fit for the information streams represented by the many psychological processes which comprise it, producing a stable pattern or gestalt. Further, the stability of the pattern arises from its autopoietic tendency to self-organize.

This relates to the notion that consciousness has to do with some subnetwork of the brain settling into, if not an attractor in the strict sense of dynamical systems theory, then at least a persistent transient associated with some particular basin in state space. It also connects with neuroscientist Walter Freeman's perspective on neurodynamics as dominated by "strange attractors with wings" [Fre95? ].

10.2.6 Consciousness and Attention

The relationship between consciousness and attention is universally recognized as a close one, but has been articulated in a great variety of different ways. For instance, Baars [Baa97] summarizes attention as:

Attention. In GW theory, the control of access to consciousness by reference to long-term or recent goals. Attention may be voluntary or automatic. See also Prioritizing Function.

Thus, he defines attention in terms of consciousness in a particular way. While this seems perfectly sensible, I suggest that it may be useful to define attention separately from consciousness, so as to be able to more clearly explore the relationship between the two. A careful analysis allows us to decompose the notion of attention into at least two subconcepts:

• Resource Attention – attention as regarding the allocation of resources. One can define the attention a system pays to some entity E, relative to an observer (which may be the entity itself), as the percentage of the system's resources that are devoted to E or other entities related to E (where "relatedness to E" is judged by the given observer). Of course one also has to specify whether one is concerned with space or time resources. Generally in the case of the brain, one is thinking about processing-time as a resource, and also about short-term memory buffers as a resource (but not about long-term memory as a resource; when we say a person is focusing their attention on a certain entity, we don't assume that entity is dominating their long-term memory).
• Information Attention – one can define the attention a system pays to some entity E as the percentage of the "information content" observable in the system (over a certain interval of time) that concerns E or other entities related to E. In this context one would need to carefully choose the right definition of "information content", so as to exclude information that is largely dormant in LTM. It seems one wishes to look at information that is detectable from the internal dynamics of the system during the given interval of time (which is similar to what is done in Tononi's information integration measure).

Conceptualizing things as such, "attention" can be separated from "consciousness", so that the alignment of consciousness with attention becomes an observation about certain kinds of cognitive architectures, rather than a definition.

Attention, obviously, is a very broadly applicable concept. I would hypothesize that the emergence of some sort of attentional focusing mechanism is almost inevitable in any intelligence for which (to use a schematic equation to convey a qualitative notion) the ratio R/U is sufficiently small, where

• R = the system's available compute resources
• U = the system's urgent need for real-time action selection, where each action depends to differing degrees on differing items of data stored in system memory

Intuitively, the combination of these two factors means that the system will need to focus attention on some memory items more than others in order to survive, which is going to cause the emergence of something like a focus of attention. Rather than defining attention in terms of consciousness, one can say: It is a fact about the human cognitive architecture (but not necessarily about all possible cognitive architectures) that attentional focus and "reported as conscious" events tend to be aligned. In other words,

• When resource attention, information attention and consciousness are aligned, then one has a case where a substantial portion of a system's elements are mutually entrained in the process of focusing a substantial portion of the system's energy (resources) and information-processing on some entity E.
• This kind of alignment is important to the qualia of ordinary human conscious experience. Without this kind of alignment, one would have a different kind of phenomenon, that would subjectively feel quite different.
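As a rough illustration of how the two notions of attention defined above might be operationalized, here is a minimal Python sketch. The use of processing-time shares and of item-wise "active information" counts, along with the toy relatedness predicate and the example numbers, are assumptions introduced here purely for illustration; a real system would need much more careful definitions of both resources and information content.

def resource_attention(cpu_time_by_item, related):
    """Fraction of processing time spent on items related to the target entity."""
    total = sum(cpu_time_by_item.values())
    focused = sum(t for item, t in cpu_time_by_item.items() if related(item))
    return focused / total if total else 0.0

def information_attention(active_bits_by_item, related):
    """Fraction of currently active information content concerning the target entity
    (dormant long-term memory is deliberately excluded from the denominator)."""
    total = sum(active_bits_by_item.values())
    focused = sum(b for item, b in active_bits_by_item.items() if related(item))
    return focused / total if total else 0.0

def related_to_cat(item):
    # toy relatedness judgment, standing in for the observer's judgment of relatedness
    return "cat" in item

cpu = {"cat-image": 40.0, "cat-plan": 25.0, "tax-form": 5.0, "background-hum": 30.0}
info = {"cat-image": 1200, "cat-plan": 300, "tax-form": 800, "background-hum": 100}
print(resource_attention(cpu, related_to_cat))      # 0.65
print(information_attention(info, related_to_cat))  # 0.625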

10.2.7 States of Consciousness

Another well-documented aspect of human consciousness, important for studying consciousness in humans, animals and engineered systems, is that it comes in different "states." 5 The foundational work here is Charles Tart's book States of Consciousness [? ], but the concept goes back much further; e.g. Tart quotes William James [? ], who said

Our ordinary waking consciousness... is but one special type of consciousness, whilst all about it, parted from it by the filmiest of screens, there lie potential forms of consciousness entirely different. We may go through life without suspecting their existence; but apply the requisite stimulus, and at a touch they are all there in all their completeness, definite types of mentality which probably somewhere have their field of application and adaptation. No account of the universe in its totality can be final which leaves these other forms of consciousness quite disregarded. How to regard them is the question – for they are so discontinuous with ordinary consciousness.

Tart defines a Discrete State of Consciousness or d-SoC as follows:

We can define a d-SoC for a given individual as a unique configuration or system of psychological structures or subsystems. The structures vary in the way they process information, or cope, or affect experiences within varying environments. The structures operative within a d-SoC make up a system where the operation of the parts, the psychological structures, interact with each other and stabilize each other's functioning by means of feedback control, so that the system, the d-SoC, maintains its overall pattern of functioning in spite of changes in the environment. Thus, the individual parts of the system may vary, but the overall, general configuration of the system remains recognizably the same.

5 As Leslie Allan Combs pointed out to me, philosophers of mind anyway tend to use the term "state" to refer to nearly any mental condition, such as anger, sleepiness, being bewildered, and the like; whereas, psychologists and scholars of consciousness tend to use the term "state" to refer to broader experiential landscapes, such as waking, dreaming, being "stoned", being hypnotized, "losing one's head" in a complete uncontrollable rage, tripping on LSD, etc. Here I will use the term in the latter sense.

What are the neural or cognitive correlates of the "state of consciousness" phenomenon? One approach to conceptualizing the issue is as follows. If we think of consciousness as a process, it may make sense to think of it as a parametrized process. One can then talk about two levels of dynamics:

• dynamics involving changes in the contents of consciousness
• dynamics involving changes in the values of the parameters of the consciousness process

I would propose that different states of consciousness (e.g. stoned, tripping, dreaming, enraged, ordinary-waking) may correspond to different regions of the parameter-value-vector space of the consciousness process. The parameters of any complex cognitive system are going to have subtle interdependencies, in terms of the influence they have on system behavior; so that not every possible collection of parameter settings will lead to coherent, meaningful system behavior. But neither will there be one unique set of narrow ranges for each parameter that corresponds to successful system functioning. Rather, there are multiple collections of narrow ranges for each parameter. In the case of parameters directly related to the consciousness process, such collections may correspond to different states of consciousness.

Of course, different parameter vectors for the consciousness process will tend to lead to different patterns in consciousness contents ... e.g. one is unlikely to do one's taxes while tripping on LSD, etc. Thus, as well as a set of parameter values, each state of consciousness will correspond to a certain subspace of the space of possible states of the consciousness process. States of consciousness tend to have a certain momentum to them, meaning that they correspond to "probabilistically almost invariant subspaces" of state space.

Note that in dynamical systems theory "state" is generally used to refer to an instantaneous condition of a system; whereas "states of consciousness" are not that, they are classes of instantaneous states that stand in a certain relationship to each other relative to the underlying process. The use of "state" in "state of consciousness" is more analogous to the term "state of matter" as used to refer to solid, gas, liquid, plasma, etc. In the case of states of matter, the underlying physical processes are the same as a substance moves between different states (based on changes in underlying parameters such as temperature), but of course the dynamical properties of the system may change based on external and internal conditions, in spite of there being a consistent underlying process...

As an example, recent research suggests that psychedelic states are higher-entropy states than ordinary waking consciousness [? ]. In an AI system this sort of higher-entropy state could potentially be induced by tweaking parameters of the attention allocation subsystem so that attention is spread more diffusively across the system's knowledge base, rather than being tightly concentrated among the top fraction of most "important" knowledge items at a given point in time. The result of this parameter tweaking would be the system settling into states involving cognitive processes not relying on tight attentional focus on any one topic, but relying more on lateral thinking, perceptual metaphor, and other cognitive correlates of diffused attention.
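The parameter-tweaking idea can be illustrated with a toy attention-allocation model: a softmax over item "importance" values whose temperature parameter controls how concentrated or diffuse attention is, with the Shannon entropy of the resulting distribution serving as a crude stand-in for the entropy of the state of consciousness. The importance values and the softmax parametrization are assumptions made purely for illustration, not a claim about how any particular AGI system or brain allocates attention.

import numpy as np

def attention_distribution(importance, temperature):
    """Softmax over importance values; higher temperature spreads attention more diffusely."""
    logits = np.asarray(importance, dtype=float) / temperature
    logits -= logits.max()                 # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def entropy_bits(p):
    """Shannon entropy of a probability vector, in bits."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

importance = [9.0, 4.0, 3.0, 1.0, 0.5, 0.2]   # hypothetical knowledge-item importances

focused = attention_distribution(importance, temperature=0.5)   # tightly focused regime
diffuse = attention_distribution(importance, temperature=5.0)   # high-entropy regime
print(entropy_bits(focused), entropy_bits(diffuse))   # the second value is larger

On this toy model, moving from the low-temperature to the high-temperature regime corresponds qualitatively to shifting from a tightly focused state toward the diffuse, higher-entropy kind of state discussed above.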

10.2.8 Tononi's Integrated Information Measure

Giulio Tononi [? ] has outlined a theory of consciousness founded on the following two conceptual principles:

• Every conscious state or moment contains a massive amount of information.
• All the information that an agent gleans from conscious states is highly, and innately, integrated into the agent's mind.

Based on these ideas and subsequent analyses, Tononi has proposed a quantitative measure of consciousness called the "Integrated Information" or Φ. Roughly speaking, Φ attempts to measure the degree to which there is a lot of information generated among the parts of a system as opposed to within them. 6

Tononi is to be congratulated for making a specific formal hypothesis regarding the nature and measurement of consciousness; and unsurprisingly, his hypothesis has attracted significant critical attention alongside significant enthusiasm. Computer scientist Scott Aaronson presented a detailed argument showing that, according to Tononi's mathematical measure, certain relatively simple mathematical constructs would be assessed as having a very high degree of consciousness [? ]. A similar point was made more simply by Eric Schwitzgebel, who argued that according to Tononi's Φ measure, the United States would likely be assessed as far more conscious than any human [? ].

Tononi's counter-argument to Aaronson is basically that the Φ measure was never intended to be applied to arbitrary mathematical constructs, but rather to be applied in the context of organisms engaging with the world 7. This is conceptually reasonable, but dramatically reduces the value of Φ as a rigorous measure of consciousness. If Φ should only be applied to certain kinds of systems, and the class of applicable systems is defined only informally and qualitatively, then do we really have a rigorous quantitative measure of consciousness? It would seem that, to have a rigorous measure, one would then need a formal way to measure the applicability of the Φ measure.

This might seem a nit-picky, pedantic point, but I believe it is more than that. Alternate approaches to understanding consciousness, like the ideas of Baars, Franklin, Combs and Tart mentioned above, are focused largely on understanding what it means for an organism to intelligently, cognitively engage with the world. As I will elaborate below, applying Tononi's ideas in the context of a theory like GWT yields a more complex and subtle understanding of conscious information processing, which does not attribute consciousness to simple mathematical constructs – but might perhaps attribute some degree of consciousness to the United States. This sort of nuanced, systems-oriented view of consciousness lacks the mathematical elegance and unidimensional clarity of Tononi's theory as originally outlined, but seems to align reasonably well with Tononi's overall intentions.
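For concreteness, here is a toy proxy for information integration, deliberately not Tononi's actual Φ: the minimum, over bipartitions of a small set of discrete variables, of the estimated mutual information between the two halves, computed from sampled system states. The sampling setup and the use of plain Shannon mutual information are simplifying assumptions, closer in spirit to the alternative quantifications mentioned in the footnote than to Tononi's formal definition.

import itertools
import numpy as np

def mutual_information(x, y):
    """Estimate mutual information (in bits) between two discrete-valued sample
    sequences by plugging empirical frequencies into the standard MI formula."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for xi, yi in zip(x, y):
        joint[(xi, yi)] = joint.get((xi, yi), 0) + 1
        px[xi] = px.get(xi, 0) + 1
        py[yi] = py.get(yi, 0) + 1
    mi = 0.0
    for (xi, yi), c in joint.items():
        p_xy = c / n
        mi += p_xy * np.log2(p_xy / ((px[xi] / n) * (py[yi] / n)))
    return mi

def integration_proxy(samples):
    """Toy 'integrated information' proxy: minimum over bipartitions of the
    variables of the mutual information between the two halves.
    samples: (n_samples, n_vars) array of discrete states."""
    n_vars = samples.shape[1]
    indices = range(n_vars)
    best = float("inf")
    for k in range(1, n_vars // 2 + 1):          # nontrivial bipartitions
        for part in itertools.combinations(indices, k):
            rest = [i for i in indices if i not in part]
            x = [tuple(row[list(part)]) for row in samples]
            y = [tuple(row[rest]) for row in samples]
            best = min(best, mutual_information(x, y))
    return best

rng = np.random.default_rng(0)
independent = rng.integers(0, 2, size=(2000, 4))        # parts share no information
shared = rng.integers(0, 2, size=(2000, 1))
noise = (rng.random((2000, 4)) < 0.05).astype(int)
coupled = (np.hstack([shared] * 4) + noise) % 2          # parts echo one common bit
print(integration_proxy(independent))   # close to zero
print(integration_proxy(coupled))       # substantially larger

The qualitative behavior, near-zero scores for mutually independent parts and high scores for strongly coupled parts, is the property any information-integration measure is meant to capture; the specific numbers produced by this toy proxy have no deeper significance.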

6 While Tononi's Φ is a reasonable measure of information integration, it is worth noting that there are many other ways to quantify the concept of "integrated information"; my own work in this area from two decades ago outlined similar definitions using algorithmic information and related ideas rather than Shannon information [Goe06, Goe93]. Algorithmic information is not practical to compute exactly based on real-world data; but neither is Tononi's Φ for any complex system.

7 See [? ] for Tononi's counter-argument and Aaronson's detailed response to it

10.2.9 Self-Modeling, Reflection and Self-Awareness

One of the key aspects of human consciousness is its reflective, recursive self-awareness. This is not always present – in a meditative state the human mind can in a sense transcend self-awareness [Aus99]; and in a "flow" state, the human mind can be so completely immersed in its task that it "forgets itself" [? ]. But much of the time, explicit self-awareness is a prominent aspect of human consciousness.

Thomas Metzinger [Met04] has outlined a detailed, cross-disciplinary "self-model theory of subjectivity," centered on the concept of a "phenomenal self-model" (PSM). A "self-model" is understood as a dynamic, ongoing process by which a portion of an organism's cognitive system comes to reflect and predict the organism itself; and the PSM is, basically, conceived as the "conscious" portion of an organism's self-model. What is meant by "conscious" here is a set of properties, such as availability for introspective attention and for selective, flexible motor control, integration into the organism's internal representation of time, and ongoing dynamic integration into an overall model of the organism and its world.

Metzinger [? ] distinguishes several levels of embodiment in cognitive systems:

• first-order: cognitive properties emerging within perceiving, acting bodies as they interact with their environment
• second-order: when a cognitive system represents its own embodiment internally, and uses this representation to help choose and guide actions
• third-order: when a cognitive system's representation of its own embodiment becomes part of the system's "conscious contents"

It seems intuitively clear that ordinary waking human consciousness involves what he calls third-order embodiment; this is a key part of the ordinary human self-model.

In [? ] I dig deeper into the possible structure of the PSM, and propose to model the reflective aspect of human consciousness in terms of hypersets, mathematical objects that extend ordinary sets via their capability to recursively contain themselves as elements. There, the following recursive definitions are given:

• "S is reflectively conscious of X" is defined as: The declarative content that "S is reflectively conscious of X" correlates with "X is a pattern in S"
• "S wills X" is defined as: The declarative content that "S wills X" causally implies "S does X"
• "X is part of S's self" is defined as: The declarative content that "X is a part of S's self" correlates with "X is a persistent pattern in S over time"

These are posited as ideal forms that are approximated by the recursive forms in actual human mind/brains. These definitions imply an interesting symmetry to the relationship between self and awareness, namely: Self is to long-term memory as reflective awareness is to short-term memory. These recursive patterns, it is hypothesized, occupy a significant amount of energetic and informational attention in human minds. They often occupy significant attention within the Global Workspace; and it seems intuitively clear that the brain regions embodying these recursions would display significant integrated information.

10.3 Toward a Unified Model of Human and Human-Like Consciousness

10.3 Toward a Unified Model of Human and Human-Like Consciousness

What, then, are the critical factors characterizing the consciousness of human beings, and likely to characterize the consciousness of AI systems with roughly human-like cognitive architectures? Based on the literature and concepts reviewed above, an integrative understanding emerges fairly clearly. When a human-like system has the experience of being conscious of some entity X, then the system should manifest:

1. Dynamical representation: the entity X should correspond to a distributed, dynamic pattern of activity spanning a portion of the system (a "probabilistically invariant subspace of the system's state space"). Note that X may also correspond to a localized representation, e.g. a concept neuron in the human brain [? ]
2. Focusing of energetic resources: the entity X should be the subject of a high degree of energetic attentional focusing
3. Focusing of informational resources: X should also be the subject of a high degree of informational attentional focusing
4. Global Workspace dynamics: X should be the subject of GWT-style broadcasting throughout the various portions of the system's active knowledge store, including those portions with medium or low degrees of current activity. The GW "functional hub" doing the broadcasting is the focus of energetic and informational attention
5. Integrated Information: the information observable in the system, and associated with X, should display a high level of information integration
6. Correlation of attentional focus with self-modeling: X should be associated with the system's "self-model", via associations that may have a high or medium level of conscious access, but not generally a low level

These I will call the six key factors of human-like consciousness. I do not claim that they are the only important aspects; but I do posit that they are among the most important. The first five factors, I suggest, are relevant regardless of the state of consciousness – but may have different levels of importance in different states of consciousness. On the other hand, the sixth factor may play a minimal role in some states of consciousness, e.g. "non-symbolic" states as experienced by meditators, advanced spiritual practitioners and others [? ]. Relative to the ordinary waking state of consciousness, psychedelic states [? ] and flow states [? ] would (qualitatively speaking) seem to involve less of a role for the self-model, as well as less concentrated attentional focusing.

10.3.1 Measuring Human-Like Consciousness Multifactorially

How, then, can one measure the degree of consciousness possessed by a system at a certain point in time, or the degree of conscious access that a system is giving to a certain entity during a certain interval of time? One reasonably tractable way to phrase this question, I suggest, is: how can one measure the degree of human-like conscious access that a system gives to a certain entity during a certain interval of time?

To formalize the degree to which a system S gives human-like conscious access to an entity X, as a first approximation one could quantify the six factors listed above: dynamical representation, energetic attentional focusing, informational attentional focusing, GW broadcasting, information integration, and association with self. One would then quantify conscious access as a weighted combination of these factors, with the weighting being dependent on the state of consciousness. The formulation of precise mathematical measures of each of these six factors would not be extremely difficult, but would require detailed analysis and would increase the length of this chapter by a small integer multiple; so these particularities will be left for sequel papers.

Next, given a definition of human-like conscious access, one can conceive of:
• the degree of human-like consciousness of a system as the sum, over all entities X in the system, of the degree to which the system gives X conscious access
• the ratio of human-like consciousness of a system as the average, over all entities X in the system, of the degree to which the system gives X conscious access

This characterization of human-like consciousness is admittedly messy, in more than one way. These six factors are all important, but it is quite possible that a handful of further factors could usefully be added to the list. Furthermore, each of these factors could be quantified in multiple ways – as in the example of Tononi's Information Integration measure, which is only one among a large number of sensible-looking mathematical formulas for capturing the conceptual notion of information integration. This messiness, however, strikes me as inevitable – i.e. it is simply part of the territory, which any reasonable map must reflect. Consciousness-in-general may be elementally simple in some sense, but human consciousness is a specific cognitive construct that evolved to serve the needs of specific sorts of organisms. AI systems may in principle display quite different varieties of consciousness; but if an AI system is going to display closely human-like intelligence, it will almost surely need to manifest closely human-like consciousness as well. The processing and memory dynamics that produce human-like consciousness are integral to the production of human-like intelligence.
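To make the idea of a state-dependent weighted combination concrete, here is a minimal illustrative sketch. The factor names, weight values, and scoring scheme are hypothetical placeholders invented for this example, not measures proposed in the text:

# Hypothetical sketch of a multifactorial "conscious access" score.
# Each factor score is assumed to be supplied, normalized to [0, 1];
# the per-state weights are made-up illustrative values.
FACTORS = ["dynamical_representation", "energetic_focusing", "informational_focusing",
           "gw_broadcasting", "integrated_information", "self_model_association"]

STATE_WEIGHTS = {
    "ordinary_waking": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "meditative":      [1.0, 1.0, 1.0, 1.0, 1.0, 0.2],  # self-model plays a smaller role
}

def conscious_access(factor_scores, state="ordinary_waking"):
    """Weighted combination of the six factor scores for one entity X."""
    w = STATE_WEIGHTS[state]
    return sum(wi * factor_scores[f] for wi, f in zip(w, FACTORS)) / sum(w)

def degree_of_consciousness(entities, state="ordinary_waking"):
    """Sum of conscious access over all entities in the system."""
    return sum(conscious_access(scores, state) for scores in entities.values())

example = {"X1": dict(zip(FACTORS, [0.9, 0.8, 0.7, 0.8, 0.6, 0.5])),
           "X2": dict(zip(FACTORS, [0.2, 0.1, 0.1, 0.0, 0.3, 0.1]))}
print(degree_of_consciousness(example))

The real work, of course, lies in defining the six factor scores themselves, which is exactly the detailed analysis deferred above.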

10.3.2 Measuring Consciousness in the Human Brain

It is an appealing idea to use neurophysiological measurements to gauge the degree of consciousness of a human brain, as it passes through various states and experiences. Given an appropriate measure, the degree of consciousness of different parts of the human brain could also be gauged, providing a new perspective on the investigation of the neural correlates of conscious experience. Research has been done regarding the computation of certain (mathematically crude but perhaps pragmatically valuable) estimates of the Integrated Information of the brain [? ]. In a similar vein, one could measure the informational attention focusing of the brain during a certain period of time. Energetic attention focusing should be more straightforward to measure, as standard tools such as fMRI already give a view into the brain's energy expenditure. Measurement of the degree to which the brain's focus of attention is represented as a dynamical pattern, or the prevalence of GW dynamics in the brain, on the other hand, would seem to require neuroimaging with simultaneous spatial and temporal resolution going beyond what current technology provides. One would need to be able to measure the broadcasting happening within a single "conscious moment" between different regions of the brain – say, on the time scale of milliseconds, and the spatial scale of a cortical column. Such neuroimaging tools are likely coming in the future and will have many exciting applications beyond the measurement of consciousness. Perhaps analysis of the data provided by such tools will enable modeling of the way the human brain builds its self-model, which will allow measurement of the association between entities in the GW and the self-model as well.

10.3.3 Human-Like Consciousness in LIDA and OpenCog

As compared to measuring consciousness in the human brain, the measurement of human-like consciousness in AI systems is a relatively straightforward matter. Issues of instrumentation are reasonably rapidly resolvable, so one is left only with the problem of formalizing the relevant aspects of consciousness in a computationally tractable way. This problem is far from trivial, since mining patterns from the dynamics of a rapidly changing large-scale software system is highly resource intensive. For instance, accurately computing the integrated information according to Tononi's definition seems likely to be an NP-hard problem [? ].

We have seen above one example of an AGI (Artificial General Intelligence) system engineered to manifest human-like consciousness: the LIDA system is built centrally around the Global Workspace theory, so that a properly functioning LIDA system automatically incorporates some of the six aspects highlighted here. As well as having a GW, the dynamics of LIDA are designed to focus energetic and informational attention on the contents of the GW. The feedback between the central workspace and the rest of LIDA is intended to give rise to nonlinear dynamics that will form persistent dynamical patterns occupying the focus of attention. The different components of the LIDA system are intended to work together in a tightly coupled nonlinear way, which should in most cases lead to a high degree of integrated information among the active knowledge in the various components. LIDA theory does not focus on the emergence of self-modeling; however, in a LIDA system placed in situations where self-modeling is the simplest clearly effective strategy for goal achievement, it could be expected that a reasonably thorough self-model would emerge and would often occupy a significant fraction of the workspace.

The OpenCog AGI architecture [?? ] also manifests the six aspects mentioned above, in its own way. OpenCog's main memory store consists of a weighted labeled hypergraph whose nodes and links are called Atoms; and each Atom is labeled with ShortTermImportance (STI) and LongTermImportance (LTI) numbers, the former governing the Atom's frequency of occurrence in cognitive processes, the latter governing the Atom's retention in RAM. The set of Atoms with STI above a certain boundary level is called the AttentionalFocus (AF). A current development initiative focuses on adding specialized structures corresponding to the phonological loop, visuospatial sketchpad and episodic memory buffer to the system, to work closely with the AttentionalFocus. Roughly speaking, in OpenCog, the AttentionalFocus corresponds to the Global Workspace. The dynamics of STI spreading through the Atomspace can be viewed as an implementation of the GW theory notion of GW broadcasting. Energetic and informational focusing on the AF occurs because the system's various cognitive processes focus their attention preferentially on discovering new things about the Atoms in the AF, and building new Atoms via combining the ones in the AF. While some entities are represented by specific Atoms in the style of a traditional semantic network, the nonlinear dynamics of STI spreading means that entities are also represented by distributed patterns of activity (this dual representation has been referred to as "glocal memory" [GPI+10]).
As in the case of LIDA, a self-model is not built into the system, but is intended to emerge naturally as a result of the system's behavior in the context of environments and goals that benefit substantially from self-modeling. Information integration, finally, is closely related to the "cognitive synergy" principle that lies at the heart of OpenCog theory. The key notion here is that the various cognitive processes acting on the Atomspace should interoperate at a deep level, helping each other to overcome the combinatorial explosions they confront. Conceptually, this seems to imply that the interim data produced by the different cognitive processes should display a high degree of integrated information.

Qualitatively, we thus see that both LIDA and OpenCog are designed in a way that is in principle amenable to displaying the six key aspects of human-like consciousness we have highlighted here. The same would certainly be true of a number of other cognitive architectures aimed at human-like AGI (see e.g. the reviews [DOP08, Sam10]). The extent to which human-like consciousness is actually manifested by running instances of these systems depends on the degree to which these instances actually implement the cognitive architectures in question, and the extent to which these architectures operate as the underlying theories predict. Currently, LIDA, OpenCog and other architectures aimed at human-like cognition are still in relatively early research phases.

10.3.4 Human-Like Consciousness in the Global Brain

One may also apply these ideas to the notion of an emergent "Global Brain" – an intelligence arising from collective dynamics in the global network of humans, computers and communication devices [Goe01, Hey07]. Many observers have argued that the Internet and related networks already display some form of intelligence; and some have speculated that as related technologies progress, the global communication/computing/social network will achieve more and more of the aspects of an autonomous, individual mind. This line of thinking naturally gives rise to the question of whether, or in what sense, a Global Brain could be conscious. More particularly, from the perspective pursued here, one well-posed question is whether, and to what degree, a Global Brain (GB) – today's or a future descendant – might have human-like consciousness.

The clearest candidate for the attentional focus of today's GB would be the distributed, active data stores of major Internet companies. These occupy a decent fraction of the compute resources available, and get preferential treatment in the infrastructure of Internet service providers, so that their information is served fastest. Both information and energy are substantially focused on these data stores – which, like the GW in the human brain, are functional hubs rather than single physical hubs (as they generally span server farms in multiple physical locations). The data stores of Google, Facebook, Microsoft, LinkedIn, Twitter and the like broadcast information widely throughout the world's population of humans and computers, and then receive feedback which guides their next information broadcasts. A "self" in the precise sense of human psychology is lacking, but several of these companies (e.g. the major search engines) do maintain internal models of large portions of the Internet, which they analyze in various ways. Information integration, finally, is the goal of the major analytics efforts undertaken by so many large Internet companies recently; it is the goal that underlies the recent rise of applied machine learning in the Internet and social network context (e.g., in early 2014, Google's acquisition of DeepMind, Facebook's work on face recognition, and their founder's investment in Vicarious Systems). The goal of these machine learning projects is precisely to learn abstract patterns that are implicit when one puts a huge amount of data together in one (distributed) place, but that are not so readily observable in smaller troves of data.

At the present time, the GW dynamics of the Internet are fairly different from those of a human brain. One major difference is that the rate at which the GW impacts the periphery is much slower than in the human brain (measured relative to the internal dynamics of the GW or the periphery). The current GB's GW does sophisticated modeling of the whole GB, but uses the results of this modeling only relatively slowly and weakly; whereas the human brain's GW does a type of broadcasting that much more heavy-handedly drives the overall dynamics of the brain. This difference in dynamics affects self-modeling as well; it means that the self-models maintained by the major Internet companies tend to focus on static relationships rather than dynamics. The Internet, however, is rapidly evolving; and there are developments underway that seem likely to bring GB dynamics closer to those of the human brain.
Once AI technology advances to the point that the descendants of current personal assistants like Google Now and its competitors – as well as personal assistants in other forms – can interact with a modicum of general intelligence, then the GB will have a periphery capable of sensitively and frequently exchanging high-information feedback with its GW. This will require the GW to maintain a more complex self-model focusing more on dynamics, and will, at a high level, make the GB more human-brain-like. It will also, no doubt, introduce various subtleties without parallels in the human brain. Quantifying the six factors of consciousness mentioned above in an Internet context would give a way of measuring the degree of human-like consciousness of the global brain, and of tracking the various features of this consciousness as it emerges.

Acknowledgements

This chapter owes a huge amount to conversations with a number of people, including most notably (alphabetically) Bernard Baars, Leslie Allan Combs, Stan Franklin, Zar Goertzel, Cosmo Harrigan, Jim Rutt and Gino Yu.

Chapter 11 Motivation and Intelligence

Let’s put a chapter here !!!


Section III Key Concepts for AGI

Chapter 12 Natural Language Understanding, Generation and Interaction

Let’s put a chapter here !!!


Chapter 13 Logical and Probabilistic Inference

Let’s put a chapter here !!!


Chapter 14 Complex Dynamics and Self-Organization in Intelligent Systems

Matthew Ikle

Abstract Emergent approaches to AGI frequently rely upon a broad class of deep mathematical methods used in the study of differential equations and iterated mappings. The methods are often described by such terms as "dynamical systems theory", "non-linear science", "complex adaptive systems", "chaos theory", or some amalgam thereof, and consist of a wide array of analytic, geometric, topological and, more recently, computational methods used to study and classify qualitative behaviors of systems. By qualitative behaviors we mean systemic global behaviors, including stability or instability; local behaviors, such as basins of attraction or repulsion; as well as the strange behavior of the "strange chaotic attractor" that is globally stable yet locally unstable.

14.1 Introduction

Alife, Robotics, Mind Uploading, Development robotics

DYNAMICS CHAPTER
– general intro to nonlinear dynamics concepts
– neuron equations:
—- Hodgkin-Huxley equation (briefly)
—- Izhikevich neuron
—- formal neuron as used in CS-style NN
– neural nets:
—- Hopfield net
—- symmetric Hopfield stores memories associatively
—- asymmetric Hopfield yields complex dynamics incl. strange attractors
—- feedforward vs. recurrent NN (the latter can have complex nonlinear dynamics)
– brief mention of Alife perhaps? one example of cellular automata?
– evolutionary learning:
—- very basics of GA
—- GP as an approach to automated program learning
—- basic idea of an EDA

DEEP LEARNING CHAPTER
– I have a lot of relevant material in the form of a Word doc
– What's missing are a few details on, say:
—- CNN
—- stacked autoencoders
– Details on DeSTIN we have and can paste in from elsewhere
– BS on the potential grand extensions of deep learning I can insert at the end...

LOGIC CHAPTER
– who will write it?
– basics of predicate logic
– term logic vs. predicate logic distinction
– basic idea of fuzzy logic
– MLN
– PLN
– causal inference, causal networks

COGNITIVE MODELING CHAPTER OR SECTION
– not critical to have a whole chapter on this; if need be I can just add a section on this in one of the earlier intro chapters, pointing students into the literature...
– the general direction of the overview in this paper seems good:
http://psych.stanford.edu/~jlm/papers/McClelland09PlaceOfModelingtopiCS.pdf
... but for a textbook one would need to give more explicit examples of cognitive models...

In this chapter we provide a survey of relevant methods from the diverse and interesting field of dynamical systems theory. After a brief history, we provide in section 1 an overview of the most important basic concepts in general dynamical systems theory. We then provide rationales and motivations for more detailed analyses for AGI research. In section 2, we continue to explore the reasons complex dynamical systems are important by introducing the Hodgkin-Huxley neural model. We investigate the role of Hebbian learning in demonstrating associative memory within Hopfield networks, and examine the role and importance of complex attractors in asymmetric Hopfield networks. We investigate Freeman's work modeling olfaction using strange attractors in neural networks in section 3. In section 4 we introduce Izhikevich's neuron equation and compare it to more simplified formal neurons, and in section 5 we report on Izhikevich and Edelman's large-scale brain simulation results. We end the chapter by describing and analyzing OpenCog's attentional allocation system, the "Economic Attention Network".

14.2 Dynamical Systems Concepts: Basic Concepts and History

14.2.1 What Are Dynamical Systems?

14.2.1.1 A Brief History

While dynamical systems theory grew out of the study of problems in classical mechanics (specifically celestial mechanics), many of the methods and techniques developed in this work turned out to be applicable to problems from many other fields as well, including, but certainly not limited to, areas of biology, economics, meteorology, sociology and even anthropology.

The study of dynamical systems began with ancient astronomical observations of the motions of the planets. These studies continued through the 17th century with the works of Kepler, Galileo, and Newton, and on through the eighteenth and nineteenth centuries with the works of Euler, Laplace, Lagrange, and Hamilton in the form of analytical dynamics [Ito93]. Yet the foundation of the modern theory of dynamical systems is usually attributed to Henri Poincaré in the late 19th century. Poincaré was the first to abstract the commonalities in all these dynamic systems and provide a solid mathematical framework for modern dynamical systems theory as we conceive of it today. As Aubin and Dahan Dalmedico state [AD02]:

Due to the novelty, the variety of tools, concepts, and methods deployed by Henri Poincaré, there can be no doubt whatsoever that his œuvre is the point of origin of the domain under consideration here – dynamical systems and chaos – and the cornerstone on which it was built. Whatever conflicts have arisen between historiographical viewpoints, the recognition of Poincaré as the true founder and major theoretician of the domain has been unanimous. In his scientific lifework, he indeed articulated four especially important themes for our concern: (1) the qualitative theory of differential equations; (2) the study of global stability of sets of trajectories (instead of focusing on single solutions); (3) the notion of bifurcation together with the study of families of dynamical systems depending on a parameter; (4) the introduction of probabilistic concepts into dynamics, with respect to ergodic theory and the exclusion of exceptional cases. In the course of the following century, each of these four broad themes was mobilized either jointly or separately. Associated with fundamental concepts and methods, they would set the outline of the domain.

Perhaps the most ambitious realization of the goals of dynamical systems theory was the foundation, in 1984, of the Santa Fe Institute for the multidisciplinary study of complex adaptive systems in general and in all their guises. What all of these areas have in common is the desire to understand and explain how systems change over time. While the specific laws and rules governing these systems may be different, the models describing their time evolution can be unified using the concept of a dynamical system – literally, a system that changes over time.

14.2.2 Basic Definitions

The fundamental goal of dynamical systems theory is the creation and study of mathematical models for describing the time evolution of systems. Mathematical formalization leads to a deep underlying theory of dynamical systems. In this chapter we will strive to explain the importance of these ideas for AGI in as simple a manner as possible. To reflect its broad generality, we begin with the following simple definition of a dynamical system: Definition 1 A dynamical system is a space coupled with a rule describing the time evolution of the space.

Building upon this simple description, we more formally state:

Definition 2 A dynamical system is a set X, a set T of time values, and a function $f : T \times X \to X$.

The space X is usually called the state space or phase space; in our context, the set T may be discrete or continuous; and the function f is usually termed the evolution function or evolution rule. The system's time-dependence $t \in T$ is often treated as a parameter rather than as an independent variable, so that we write $f_t(x)$. It should also be pointed out that the function $f_t(x)$ should be commutative in the variable (or parameter) t. That is,

$$f_s(f_t(x)) = f_{s+t}(x) = f_t(f_s(x)).$$

The idea of the state space X is that it is the space (described by some defining set of variables) needed to completely define the state of a system at a given point in time t – in essence, the system's set of all possible world-states. From some given initial state $x_i$ at some initial time $t_0$, the evolution function f then uniquely and deterministically traces out the system's future evolution. The deterministic behavior of the system is codified through the requirement that the function f be commutative in time – it should not matter whether from some initial state at time $t_0$ the system traces its path first for some length s of time followed by a length of time t, or whether the times are reversed: we should end at the same state at time $s + t$. Dynamical systems theory comes equipped with a number of definitions and a developed theory for describing and characterizing different behaviors. Depending upon the characteristics of the set T of times, we distinguish between discrete dynamical systems and continuous dynamical systems.
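As a simple check of this composition property, consider an elementary standard example (not drawn from the text): the flow generated by the one-dimensional linear equation $\dot{x} = \lambda x$ is

$$f_t(x) = e^{\lambda t} x, \qquad \text{so that} \qquad f_s(f_t(x)) = e^{\lambda s} e^{\lambda t} x = e^{\lambda (s+t)} x = f_{s+t}(x),$$

exactly as the definition requires.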

There is much overlap between these two types of systems. Since most AGI work is in the digital domain, we will begin by developing the theory for discrete systems. We will, however, also have an interest in developing and understanding the theory for continuous systems. Ultimately the two types of systems are tied together via the Poincaré map or Poincaré section, though we will not cover this connection.

14.3 Discrete Dynamical Systems

To explore and understand the sorts of qualitative behaviors that can arise in dynamical systems, we begin by examining the simple family of discrete maps known as logistic maps, given by the functions

$$x \mapsto f_a(x) = a\,x(1 - x),$$

where a is a parameter. In discrete dynamical systems we are interested in what happens to such maps as we iterate them repeatedly. We define $f_a^2(x) = f_a(f_a(x))$, $f_a^3(x) = f_a(f_a(f_a(x)))$, and in general $f_a^n(x) = f_a(f_a(\cdots f_a(x)\cdots))$, with $n$ applications of $f_a$. Regardless of the value of the parameter a, we note that $p = 0$ and $p = 1 - 1/a$ both satisfy $f_a(p) = p$. Points p displaying this behavior – $f_a(p) = p$ – are called fixed points: once the system is at a fixed point it will remain there. We are also interested in knowing what happens to points arbitrarily close to the fixed points. As we shall see, for this example that depends upon the value of a.
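A few lines of code suffice to experiment with these iterations. The sketch below is purely illustrative (any numerical environment would do):

def f(a, x):
    """The logistic map f_a(x) = a*x*(1-x)."""
    return a * x * (1 - x)

def iterate(a, x0, n):
    """Return the orbit x0, f_a(x0), f_a^2(x0), ..., f_a^n(x0)."""
    orbit = [x0]
    for _ in range(n):
        orbit.append(f(a, orbit[-1]))
    return orbit

# The point p = 1 - 1/a is indeed fixed: for a = 2.5, p = 0.6 and f(2.5, 0.6) == 0.6.
print(f(2.5, 0.6))
# For a = 3.2 the orbit settles into a 2-cycle rather than a fixed point.
print(iterate(3.2, 0.3, 50)[-4:])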

14.3.1 Graphical Tools

We will use a number of graphical tools to aid us in understanding how the dynamics change as we vary the parameter a from a = 1 to a = 4. The first tool we will use is called a cobweb plot (also called a Verhulst diagram); it will help us visualize the orbits for various initial values. To create a cobweb plot, graph both the logistic map (or other function being analyzed) and the line y = x on the same plot. The cobweb plot for an initial value $x_0$ begins at the point $(x_0, 0)$ and proceeds as follows:

• Draw a vertical line to the point $(x_0, f(x_0))$ on the logistic map;
• Now draw a horizontal line until it meets the line y = x;
• Next draw another vertical line until it hits the function curve;
• Repeat steps 2 and 3 as long as desired (a short code sketch of this construction is given below).

The graphs below show characteristic behaviors of the logistic maps for a wide variety of values of the parameter a. Looking closely at these plots we might notice some rather interesting behaviors as we vary a. For some values of a, there appears to be exactly one fixed point. For others, one obtains what are called 2-cycles. As we continue to increase a we find 4-cycles, 8-cycles, 16-cycles, each period doubling happening faster and faster.
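The following illustrative sketch implements the cobweb construction just described; it assumes matplotlib is available, and the particular parameter values are arbitrary:

import numpy as np
import matplotlib.pyplot as plt

def cobweb(a, x0, steps=50):
    """Cobweb plot of the logistic map f_a(x) = a*x*(1-x) starting from x0."""
    f = lambda x: a * x * (1 - x)
    xs = np.linspace(0, 1, 400)
    plt.plot(xs, f(xs), label="f_a(x)")
    plt.plot(xs, xs, label="y = x")
    x = x0
    px, py = [x0], [0.0]
    for _ in range(steps):
        y = f(x)
        px += [x, y]   # vertical segment up to the curve, then horizontal to y = x
        py += [y, y]
        x = y
    plt.plot(px, py, linewidth=0.8)
    plt.xlabel("x_n"); plt.ylabel("x_{n+1}"); plt.legend()
    plt.show()

cobweb(a=3.55, x0=0.3)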

(Figures 14.1–14.5: cobweb plots of the logistic map; horizontal axis $x_n$, vertical axis $x_{n+1}$.)

Fig. 14.1: a = 1, fixed point at x = 0
Fig. 14.2: a = 2, fixed point at x = 1/2
Fig. 14.3: a = 3, period 2-cycle
Fig. 14.4: a = 3.55, period 4-cycle
Fig. 14.5: a = 3.57, period 8-cycle

So what is going on here? More importantly, how does this quirky behavior relate to our main topic, Artificial General Intelligence? To answer these questions we need to analyze the behaviors in more depth and to do this we need some more definitions.

Definition 3 An orbit or trajectory of a given initial state a of a dynamical system is the set of states in the state space traced out by the evolution function F beginning at a. Definition 4 A fixed point a is said to be attracting if the orbits of points near a converge to a as t −→ ∞. An attracting fixed point is also called a sink.

Definition 5 A fixed point a is said to be repelling if all points in an arbitrarily small neighborhood of a end up at some nonzero distance away from a as t −→ ∞. A repelling fixed point is also called a source. We also generalize the notion of attracting fixed points to the more general notion of attracting sets or attractors:

Definition 6 A set S of the state space is said to be an attractor if for all s ∈ S, F(s, t) is also in S for all t > 0. As we saw from the graphs above, some attractors end up periodically cycling through some set of points.

Definition 7 A point p is said to be periodic with period n if $f^n(p) = p$ and $f^k(p) \neq p$ for $0 < k < n$; the orbit of such a point is called an n-cycle.

We next define the concept of a basin of attraction. This idea will prove especially useful for understanding desired behaviors of neural networks, deep learning systems, attention control mechanisms, and other dynamics-based systems used in AGI research.

Definition 8 A basin of attraction for an attractor S simply consists of all points in the state space that eventually end up in S.
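Although the definitions above are purely qualitative, for a differentiable map there is a classical quantitative test (a standard fact, stated here for convenience): a fixed point $p$ of $f$ is attracting if $|f'(p)| < 1$ and repelling if $|f'(p)| > 1$. For the logistic map this explains the behavior seen in the cobweb plots:

$$f_a'(x) = a(1 - 2x), \qquad f_a'\!\left(1 - \tfrac{1}{a}\right) = 2 - a,$$

so the nonzero fixed point is attracting precisely for $1 < a < 3$; at $a = 3$ it loses stability and the 2-cycle appears, initiating the period-doubling cascade.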

14.4 Continuous Dynamical Systems

Most continuous dynamical systems take the form of systems of differential equations, which can be ordinary, partial, or integro-differential equations. Moreover, these equations can be either deterministic or stochastic. In this book we shall restrict our attention to systems of deterministic ordinary differential equations: they are the simplest to analyze; the theory is very well developed; the qualitative dynamics are quite rich; and, most importantly, they are sufficient to display the sorts of behaviors that seem most useful in AGI research. As we shall see, the most interesting dynamics occur when these systems are nonlinear. Regardless, one of the primary tools for understanding the behavior of dynamical systems is linearization, in this case near fixed points.

We begin with a few fundamental notations and definitions. Let

$$X(t) = (x_1(t), x_2(t), x_3(t), \cdots, x_n(t)) \quad \text{and} \quad F(X, t) = \begin{pmatrix} f_1(X, t) \\ f_2(X, t) \\ \vdots \\ f_n(X, t) \end{pmatrix}.$$

Definition 9 We then write a system of first-order ordinary differential equations (ODEs) in the form

$$\dot{X}(t) = F(X, t).$$

We remind the reader that any ordinary differential equation can be written as a system of first-order ODEs via a simple substitution.

Definition 10 The system of ODEs is said to be autonomous or time-invariant if the vector field is independent of time:

$$F(X, t) = F(X).$$
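As an illustration of the reduction to first order mentioned above (a standard textbook example, not from the original draft): the second-order equation

$$\ddot{x} + \gamma \dot{x} + \omega^2 x = 0$$

becomes, under the substitution $x_1 = x$, $x_2 = \dot{x}$, the first-order system

$$\dot{x}_1 = x_2, \qquad \dot{x}_2 = -\omega^2 x_1 - \gamma x_2,$$

which is of the form $\dot{X} = F(X)$ with $X = (x_1, x_2)$.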

Definition 11 A fixed point of a system of ODEs is a point a for which $F(a, t) = 0$ for all t. Equivalently, a is a fixed point if the solution starting at a remains at a for all time.

Definition 12 An orbit or trajectory of a given initial state a of a dynamical system is the set of states in the state space traced out by the evolution function F beginning at a.

Definition 13 A fixed point a is said to be attracting if the orbits of points near a converge to a as t −→ ∞. An attracting fixed point is also called a sink.

Definition 14 A fixed point a is said to be repelling if all points in an arbitrarily small neighborhood of a end up at some nonzero distance away from a as t −→ ∞. A repelling fixed point is also called a source.

We also generalize the notion of attracting fixed points to the more general notion of attracting sets or attractors:

Definition 15 A set S of the state space is said to be an attractor if for all s ∈ S, F(s, t) is also in S for all t > 0.

We next define the concept of a basin of attraction. This idea will prove especially useful for understanding desired behaviors of neural networks, deep learning systems, attention control mechanisms, and other dynamics-based systems used in AGI research.

Definition 16 A basin of attraction for an attractor S simply consists of all points in the state space that eventually end up in S.

14.4.1 Analysis of Dynamical Systems

As we begin our analysis of the behavior of continuous dynamical systems, it helps to better understand our goals:
• to describe regions of stability and instability;
• to classify different types of attractors;
• to understand the shape (topology) of basins of attraction;
• to characterize behaviors near fixed points;
• to define and understand map bifurcations;
• to define and understand the notions of order and chaos;
• to define and characterize periodic points;
• to define and characterize limit cycles;
• to define and describe boundary points;
• to understand and characterize types of orbits.

Note the qualitative nature of the above list of system properties. One way to understand the behavior of systems is, quite obviously, to analytically solve the system. Yet when it is not feasible to find an exact analytic solution, we can often still describe general qualitative properties of the system. Moreover, even when exact solutions are possible, we may still have only inexact information concerning the initial state of the system. In cases such as these, we need to understand how sensitive a system is to any uncertainty in this initial information. If a system is governed by "deterministic chaos" then the slightest changes in these initial conditions can lead to large-scale changes in the solution: the so-called "butterfly effect," named for Edward Lorenz's seminal talk at the 1972 annual meeting of the American Association for the Advancement of Science, "Predictability: Does the flap of a butterfly's wings in Brazil set off a tornado in Texas?"

The behavior near the fixed points will be determined by the topology of the region. For example, the bottom of a cereal bowl is a fixed point. If we place a marble precisely at the bottom point, it will remain stationary. What happens next depends upon whether or not the system is dissipative – that is, in this example, whether or not there is any friction. If we perturb the marble slightly from this initial fixed point, the marble will either oscillate forever in the absence of friction, or oscillate a few times, eventually return to the bottom, and remain there if there is friction.

Contrast the behavior of that particular example with the behavior near the fixed point at the top of the same cereal bowl when we turn the bowl upside down. If we balance the marble perfectly at the top of the bowl, it once again remains at its initial position. Yet the slightest change from this initial position finds the marble rapidly descending down the side of the bowl, away from the initial fixed point. It may then come to rest at another fixed point, or settle into another oscillatory state, or it may even race off towards infinity. Note the different end states of the two examples. In the first case the marble ended up at or near the same fixed point from whence it started; but in the second case, no matter how small the change in the initial position, the marble moved away towards some completely different location. Of course in the original example, we could have moved the marble enough (provided enough energy to it) so that it goes over the top edge. The difference, though, is that in the second example, any small change will necessarily lead to a state change.

The two examples just analyzed provide a good glimpse of what we mean by the qualitative nature of dynamical systems. In the first example, the marble is in a valley bottom and we say that the fixed point is stable with respect to small changes in initial conditions. In the second example, the system is said to be unstable. Motivated by the examples above, we now classify four levels of stability based upon the behavior of points arbitrarily close to the equilibria, but not exactly at the equilibria. A state of equilibrium a is defined to be:

• stable or Lyapunov stable if given any $\epsilon > 0$, there exists some $\delta > 0$ such that $\|X(t) - a\| < \epsilon$ for all $t > t_0$ whenever $\|X(t_0) - a\| < \delta$;
• asymptotically stable if a is stable and $X(t) \to a$ as $t \to \infty$;
• unstable if there is some $\epsilon > 0$ such that for all $\delta > 0$ there exists an initial value with $\|X(t_0) - a\| < \delta$ so that for some $t > t_0$ we have $\|X(t) - a\| > \epsilon$;
• completely unstable if there is some $\epsilon > 0$ such that for all $\delta > 0$ and for all initial values with $\|X(t_0) - a\| < \delta$ there is some $t > t_0$ for which $\|X(t) - a\| > \epsilon$.

The question next becomes: how does one characterize and determine regions of stability? The answer depends upon the assumptions we can make about the evolution function F. If we know that F is at least differentiable, then we can say a lot.

14.4.1.1 Linearizations about Equilibria

Much of the qualitative analysis of dynamical systems occurs via linearization about equilibria (fixed points) of the evolution function.

Definition 17 The linearization of the autonomous system $\dot{X} = F(X)$ about a fixed point a is the linear system $\dot{Y} = DF(a)\,Y$, where $Y = X - a$ and $DF(a)$ is the Jacobian matrix of partial derivatives $\partial f_i / \partial x_j$ evaluated at a.
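A one-dimensional example (standard material, included only to make the definition concrete): for the system

$$\dot{x} = x(1 - x),$$

the fixed points are $x = 0$ and $x = 1$. Since $F'(x) = 1 - 2x$, the linearization about $0$ is $\dot{y} = y$ (so $0$ is repelling), while the linearization about $1$ is $\dot{y} = -y$ (so $1$ is asymptotically stable).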

14.4.1.2 Manifolds and Subspaces

14.5 Discrete Dynamical Systems

Definition 18 We write a system of differential equations in the form
Definition 19 A fixed point

Definition 20 A fixed point is said to be attracting if Definition 21 A fixed point is said to be repelling if

14.6 Neuron Models

14.6.1 Hodgkin Huxley equation

In the 1940s, Alan Lloyd Hodgkin and Andrew Huxley conducted experiments on the squid giant axon in their efforts to understand the chemical processes underlying nerve impulses. The two researchers chose the squid giant axon precisely because the axon is, well, giant (relatively speaking: about 0.5 mm in diameter). In 1952, the two developed a model for the initiation and propagation of these impulses in the form of a system of nonlinear differential equations. Hodgkin, Huxley, and John Eccles jointly shared the 1963 Nobel prize in medicine "for their discoveries concerning the ionic mechanisms involved in excitation and inhibition in the peripheral and central portions of the nerve cell membrane". The Hodgkin-Huxley model itself is certainly one of the most sophisticated models of complex biological processes ever developed. The model can be understood with the help of a schematic diagram representing the electrochemical channels in the axon.

What is remarkable about the model is that Hodgkin and Huxley modeled the chemical processes of the sodium and potassium ion channels as time- and voltage-dependent electrical conductances. Through a series of voltage-clamp experiments they were then able to deduce that the model was governed by a system of four nonlinear ordinary differential equations. For the purposes of AGI, what is important is the qualitative behavior of this system of ODEs.
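For reference, a standard modern statement of the four equations is the following (the detailed rate functions $\alpha$ and $\beta$ and the parameter values, which Hodgkin and Huxley fitted to their voltage-clamp data, are omitted here):

$$C_m \frac{dV}{dt} = I_{\text{ext}} - \bar{g}_{Na}\, m^3 h\, (V - E_{Na}) - \bar{g}_K\, n^4 (V - E_K) - \bar{g}_L\, (V - E_L),$$

$$\frac{dm}{dt} = \alpha_m(V)(1 - m) - \beta_m(V)\, m, \quad
\frac{dh}{dt} = \alpha_h(V)(1 - h) - \beta_h(V)\, h, \quad
\frac{dn}{dt} = \alpha_n(V)(1 - n) - \beta_n(V)\, n.$$

Here $V$ is the membrane potential, $m$, $h$ and $n$ are gating variables for the sodium and potassium channels, and the $\bar{g}$ and $E$ constants are maximal conductances and reversal potentials.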

14.6.2 The Izhikevich neuron

The computational neuroscientist Eugene Izhikevich pioneered a different approach to neuron modeling, using ideas from dynamical systems theory (in particular, bifurcation analysis) to reduce detailed conductance-based models to a much simpler system that still reproduces a wide range of biologically observed spiking behaviors at low computational cost.
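In Izhikevich's widely used two-variable formulation (a standard statement of the model, with v the membrane potential in mV and u a recovery variable), the dynamics are

$$\dot{v} = 0.04\,v^2 + 5v + 140 - u + I, \qquad \dot{u} = a(bv - u),$$

together with the after-spike reset rule: if $v \geq 30$ mV, then $v \leftarrow c$ and $u \leftarrow u + d$. Different choices of the four parameters $a$, $b$, $c$, $d$ select among regular spiking, bursting, chattering and other firing patterns.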

14.6.3 The formal neuron as used in CS-style NNs

14.7 Neural Networks

14.7.1 Hopfield Networks

Hopfield neural networks are a particular class of artificial neural network with a number of desirable properties for AGI purposes:
• the network is recurrent and so can display complex nonlinear dynamic behavior;
• it can serve as a content-addressable (associative) memory;
• the updating equations have the effect of forcing a certain scalar function of the network state, referred to as the energy function, to converge to a local minimum;
• this local-minimum attractor state is stable;
• it can be trained to store a chosen set of patterns (for instance via a simple Hebbian rule);
• it is simple to analyze.
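The following minimal sketch illustrates these properties for a symmetric Hopfield network used as an associative memory; the code and its details (pattern sizes, number of updates) are our own illustrative choices, not taken from the text:

# Store +/-1 patterns via Hebbian outer products, then recall from a noisy cue.
import numpy as np

def train_hopfield(patterns):
    """Hebbian (outer-product) storage; zero diagonal keeps the energy argument valid."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0.0)
    return W / patterns.shape[0]

def energy(W, s):
    """The scalar energy function that asynchronous updates never increase."""
    return -0.5 * s @ W @ s

def recall(W, state, steps=20):
    """Asynchronous sign updates; the state settles into a local energy minimum."""
    s = state.copy()
    for _ in range(steps):
        for i in np.random.permutation(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(2, 64))
W = train_hopfield(patterns)
cue = patterns[0].copy()
cue[:10] *= -1                                        # corrupt 10 of 64 bits
print(np.array_equal(recall(W, cue), patterns[0]))    # typically True: memory recovered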

14.7.1.1 Symmetric Hopfield Networks

A symmetric Hopfield network – one whose weight matrix satisfies $w_{ij} = w_{ji}$ – stores memories associatively: each stored pattern becomes a stable attractor of the update dynamics.

14.7.1.2 Asymmetric Hopfield Networks

An asymmetric Hopfield network, in which the weights need not satisfy $w_{ij} = w_{ji}$, can yield complex dynamics, including strange attractors.

14.7.1.3 Strange Attractors in Asymmetric Hopfield Networks

More generally, one may contrast feedforward with recurrent neural networks; only the latter, which contain feedback connections, can exhibit complex nonlinear dynamics of this kind.

14.8 Evolutionary Learning

Inspired by biological evolution, evolutionary methods are a robust research field within AGI. In this section we will investigate three important sub-fields: Genetic Algorithms, Genetic Programming, and Estimation of Distribution Algorithms.

14.8.1 Genetic Algorithms

Though the idea of evolutionary algorithms dates back to the 1950s, genetic algorithms (GAs) became popular through the work of John Holland at the University of Michigan in the 1960s and 1970s. Genetic algorithms are a method of function optimization mimicking the ideas of Darwinian natural selection to evolve candidate optimizers. While there are now many GA variations, the most basic GA is quite simple in design, consisting of three operations: selection, crossover, and mutation. The basic idea is to use these three biologically motivated operators to evolve populations of candidate solutions.

14.8.1.1 Selection

Starting with some initial population, the selection operator chooses two individuals from the population with probability proportional to their "fitness," calculated from a given fitness function. In terms of optimization, the fitness function is simply the function one desires to optimize.

14.8.1.2 Crossover

14.8.1.3 Mutation

14.8.1.4 A Simple Example

To make the GA process concrete we will walk through a very simple (though artificial) example. Suppose we wish to maximize the fitness function $f(x) = -x^2 + 12x + 48$ using an initial random population of size n = 10, where we are interested only in integer solutions on the interval [0, 15]. While one can easily solve this particular problem using simple algebra, the point here is to demonstrate the process by which GAs work. For this problem, an individual can have any one of the 16 values 0, 1, 2, ..., 15. The following table gives the value of f at each of these values:

x    | 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
f(x) | 48 59 68 75 80 83 84 83 80 75 68 59 48 35 20 3
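A complete, if toy, GA for this example fits in a few lines. The sketch below is our own illustrative code; the particular encoding (4-bit integers), mutation rate, and generation count are arbitrary assumptions:

import random

def fitness(x):
    return -x**2 + 12*x + 48

def select(pop):
    """Fitness-proportionate (roulette-wheel) selection of one parent."""
    return random.choices(pop, weights=[fitness(x) for x in pop], k=1)[0]

def crossover(a, b):
    """Single-point crossover on the 4-bit encodings of integers in [0, 15]."""
    point = random.randint(1, 3)
    mask = (1 << point) - 1
    return (a & ~mask) | (b & mask)

def mutate(x, rate=0.05):
    """Flip each of the 4 bits independently with a small probability."""
    for bit in range(4):
        if random.random() < rate:
            x ^= 1 << bit
    return x

pop = [random.randint(0, 15) for _ in range(10)]
for generation in range(20):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(10)]
print(max(pop, key=fitness))   # usually 6, the maximizer of f

Even this minimal version exhibits the essential GA loop: evaluate fitness, select parents in proportion to fitness, recombine, mutate, and repeat.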

14.8.2 Genetic Programming

14.8.3 Estimation of Distribution Algorithms

14.9 Economic Attention Networks

An Economic Attention Network (ECAN) is a particular dual-purpose neural network used in the OpenCog AGI open-source framework. Its two broad purposes are to serve
• as OpenCog's primary system for allocating attention and resources;

• and as an associative memory used to drive concept creation.

Unlike traditional neural networks, ECAN is motivated by an economic metaphor (hence its name) and can be viewed as a conservative network (it conserves currencies). An ECAN is a hypergraph, consisting of un-typed nodes and links weighted with two numbers called STI (short-term importance) and LTI (long-term importance). ECAN also contains links that may be typed as HebbianLinks, which are weighted with a probability. The term Atom will be used to refer to both nodes and links.

The equations of an ECAN explain how the STI, LTI and Hebbian probability values get updated over time. The metaphor underlying these equations is the interpretation of STI and LTI values as (separate) artificial currencies. The fact that STI (for instance) is a currency means that the total amount of STI in the system is conserved (except in unusual instances where the ECAN controller decides to introduce inflation or deflation and explicitly manipulate the amount of currency in circulation), a fact that makes the dynamics of an ECAN dramatically different from those of, say, an attractor neural network (in which there is no law of conservation of activation).

Conceptually, the STI value of an Atom indicates the immediate urgency of the Atom to the ECAN at a certain point in time, whereas the LTI value of an Atom indicates the amount of value the ECAN perceives in the retention of the Atom in memory (RAM). An ECAN will often be coupled with a "Forgetting" process that removes low-LTI Atoms from memory according to certain heuristics. STI and LTI values will generally vary continuously, but the ECAN equations we introduce below contain the notion of an AttentionalFocus (AF), consisting of those Atoms in the ECAN with the highest STI values. The AF is given its meaning by the existence of equations that treat Atoms with STI above a certain threshold differently. The probability value of a HebbianLink from A to B is the probability that if A is in the AF, then so is B. A critical aspect of the ECAN equations is that Atoms periodically spread their STI and LTI to other Atoms that connect to them via HebbianLinks; this is the ECAN analogue of activation spreading in neural networks.

14.9.1 Updating Equations

There are four basic steps involved in the ECAN updating equations:

1. Wages and Rent: as Atoms perform goal-driven work through external processes, they are rewarded with stimulus, which is converted into STI and LTI wages in amounts proportional to the work an Atom does; those Atoms deemed of current interest (in the AttentionalFocus) must also pay rent to remain in the focus.
2. HebbianLink Conjunction Equations: the "conjunction" $c_{i,j}$ between Atoms $x_i$ and $x_j$ linked via a HebbianLink represents the probability that if the Atom $x_i$ is in the attentional focus at some instant of time, then so is the Atom $x_j$.
3. Diffusion: the process of Hebbian probability updating is carried out via a diffusion process in which some nodes "trade" STI and LTI.
4. Homeostatic Bounds Maintenance: to maintain overall system funds within homeostatic bounds, a mid-cycle tax and rent adjustment can be triggered if necessary.

We are currently investigating various particular equations for performing each of the four governing steps itemized above.

1. Stimulus is converted to wages in amounts proportional to the work an Atom does.

$$s_i = \text{RecentStimulus}_{x_i} \cdot \langle \text{STIMULUS\_STI\_MULTIPLIER} \rangle, \qquad
\text{TotalRecentSTI} = \sum_i s_i, \qquad
\text{wages}_i = \frac{s_i}{\text{TotalRecentSTI}},$$

and Rent can take a variety of forms, the simplest ones being flat rent,

$$\text{rent}_i = \begin{cases} \langle \text{RENT} \rangle, & \text{if } s_i \geq s_{AF} \\ 0, & \text{otherwise,} \end{cases}$$

and linear rent,

$$\text{rent}_i = \begin{cases} \langle \text{RENT} \rangle \cdot \dfrac{s_i - s_{AF}}{\text{recentMaxSTI} - s_{AF}}, & \text{if } s_i \geq s_{AF} \\ 0, & \text{otherwise.} \end{cases}$$

Here $s_{AF}$ is the boundary of the AttentionalFocus.
2. Our current conjunction equations are

$$\text{norm}_i = \begin{cases} \dfrac{s_i}{\text{recentMaxSTI}}, & \text{if } s_i \geq s_{AF} \\[4pt] \dfrac{s_i}{\text{recentMinSTI}}, & \text{if } s_i < s_{AF}, \end{cases}$$

$$\text{conj}_{i,j} = \text{Conjunction}(s_i, s_j) = \max(\text{norm}_i \times \text{norm}_j, 1).$$

3. The decision about which nodes diffuse in each diffusion cycle is carried out via a decision function. We currently are working with two types of decision functions: a standard threshold function, by which nodes diffuse if and only if the nodes are in the AF; and a stochastic decision function in which nodes diffuse with probability

$$\frac{\tanh(\text{shape} \cdot (s_i - s_{AF})) + 1}{2},$$

where shape and $s_{AF}$ (the focus boundary) are parameters. The amount of STI an Atom diffuses is given by $\text{maxSpread} \cdot s_i$, where maxSpread is given by the product of the strength and confidence of the Atom diffusing the STI/LTI.
4. Finally, the set of homeostatic bounds equations we are currently investigating is

$$\text{RecentAFSTI} = \sum_{s_i \in AF} s_i, \qquad
x = \frac{\text{RecentAFSTI} \times \text{RentFactor}}{\text{Recent size of AF}}, \qquad
\text{tax} = \frac{x}{\text{Recent size of AF}},$$

$$\forall i: \; s_i = \min(s_i - \text{tax},\; \text{minimumSTI}).$$

All quantities enclosed in angled brackets are system parameters, and LTI updating is accomplished using a completely analogous set of equations. A key property of these equations is that both wages paid to, and rent paid by, each node are positively correlated to their STI values. That is, the more important nodes are paid more for their services, but they also pay more in rent. A fixed percentage of the links with the lowest LTI values is then forgotten (which corresponds equationally to setting the LTI to 0).
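The following toy sketch illustrates one wage-and-rent update step of the kind described above. It is our own illustrative simplification: the class name, parameter values, and the restriction to STI-only, flat-rent updating are assumptions, not the actual OpenCog implementation:

# One simplified ECAN-style STI update cycle.
from dataclasses import dataclass

STIMULUS_STI_MULTIPLIER = 1.0   # system parameter (an angle-bracket quantity)
RENT = 0.5                      # flat rent parameter
S_AF = 10.0                     # AttentionalFocus boundary

@dataclass
class Atom:
    name: str
    sti: float
    recent_stimulus: float = 0.0

def update_sti(atoms):
    """Pay wages proportional to recent stimulus; charge flat rent inside the AF."""
    raw = [a.recent_stimulus * STIMULUS_STI_MULTIPLIER for a in atoms]
    total = sum(raw) or 1.0
    for a, s in zip(atoms, raw):
        a.sti += s / total           # wage
        if a.sti >= S_AF:
            a.sti -= RENT            # rent for remaining in the AttentionalFocus

def attentional_focus(atoms):
    return [a for a in atoms if a.sti >= S_AF]

atoms = [Atom("cat", 12.0, 3.0), Atom("dog", 9.5, 1.0), Atom("rock", 2.0, 0.0)]
update_sti(atoms)
print([a.name for a in attentional_focus(atoms)])

Because wages are funded from a fixed stimulus pool and rent is paid back to the system, the total STI "currency" stays (approximately) conserved, which is the economic intuition behind the full set of equations.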

14.9.2 ECAN and Cognitive Synergy

Being designed as a conservative neural network with "Hebbian-like" learning rules means that ECAN is able to perform two major roles in an AGI. ECAN can serve as a system for resource and attention allocation; the equations described above were largely designed for this reason. Yet ECAN's roots provide the network with the ability to also serve as an associative memory. In this context, ECAN can create new Nodes that represent prominent attractor patterns and in this manner drive concept creation. It seems intuitively clear that these attractor-convergence properties will in turn reinforce ECAN's attention allocation capabilities. If a collection of Atoms is often collectively useful for some cognitive process, then the associative-memory-type behavior of ECANs means that once a handful of the Atoms in the collection are found useful in a certain inference process, the other Atoms in the collection will get their STI significantly boosted, and will be likely to get chosen in subsequent portions of that same inference process. This allocation role also means that ECAN can be used to guide forward and backward reasoning processes. This interaction between attention allocation and other cognitive processes is exactly the sort of dynamics one would like to see occur, and is an idea we refer to as "cognitive synergy."

Chapter 15 Deep Learning: Principles and Practices

DEEP LEARNING CHAPTER (planning notes)
– I have a lot of relevant material in the form of a Word doc
– What's missing are a few details on, say:
—- CNN
—- stacked autoencoders
– Details on DeSTIN we have and can paste in from elsewhere
– BS on the potential grand extensions of deep learning I can insert at the end...

15.1 Convolutional Neural Networks (CNNs)


Chapter 16 Algorithmic Information Theory and General Intelligence

Let’s put a chapter here !!!


Section IV Example AGI Architectures

Chapter 17 The LIDA Architecture

http://ccrg.cs.memphis.edu/assets/papers/2013/franklin-ieee-tamd11.pdf
http://ccrg.cs.memphis.edu/assets/papers/2010/Event_Representation_Final.pdf

Abstract Intelligent software agents (agents) adhering to the action selection paradigm have only one primary task that they need to accomplish at any given time: to choose their next action. Consequently, modeling the current situation effectively is a critical task for any agent. With an accurate model of the current situation, actions can be better selected. We propose an event-based representational framework designed to provide grounded perceptual representations of events for agents. We describe how they are produced and detail their role in a comprehensive cognitive architecture designed to explain, integrate, and model human cognition. Event-based representations draw inspiration from research on thematic roles, and integrate research on event perception. Events are represented as parameterized actions, that is, nodes with thematic role links that can bind to Agent, Object, and other node types.

17.1 Introduction

Agents adhering to the action selection paradigm (Franklin, 1995) have only one primary task they need to accomplish at any given time: selecting their next action. In order to choose actions well, it is critical for the agent to effectively represent its current situation. In this chapter we present basic, primitive representations with which agents can represent their current situation. We also detail the processes necessary to produce these representations and the role they play in a comprehensive cognitive architecture, using the LIDA model as an example (Franklin & Patterson 2006, Franklin et al. 2007).

The LIDA model is a comprehensive, conceptual and computational architecture designed to explain, integrate, and model a large portion of human cognition. Based primarily on Global Workspace Theory (Baars 1988), the model implements and fleshes out a number of psychological and neuropsychological theories including situated cognition (Varela et al. 1991), perceptual symbol systems (Barsalou 1999, 2008), working memory (Baddeley and Hitch 1974), memory by affordances (Glenberg 1997), long-term working memory (Ericsson and Kintsch 1995), Sloman's H-CogAff architecture (1999), and transient episodic memory (Conway 2001).

The LIDA computational architecture, derived from the LIDA cognitive model, employs several modules that are designed using computational mechanisms drawn from the "new AI."


These include variants of the Copycat Architecture (Hofstadter and Mitchell 1995, Marshall 2002), Sparse Distributed Memory (Kanerva 1988, Rao and Fuentes 1998), the Schema Mechanism (Drescher 1991, Chaput et al. 2003), the Behavior Net (Maes 1989, Tyrrell 1994), and the Subsumption Architecture (Brooks 1991). An initial version of a software framework for LIDA-based agents has recently been completed.

The LIDA model and its ensuing architecture are grounded in the LIDA cognitive cycle. Every autonomous agent (Franklin and Graesser 1997), be it human, animal, or artificial, must frequently sample (sense) its environment and select an appropriate response (action). More sophisticated agents, such as humans, process (make sense of) the input from such sampling in order to facilitate their action selection. The agent's "life" can be viewed as consisting of a continual sequence of these cognitive cycles, as they are called in the LIDA model. Each cycle is composed of phases of sensing, attending and acting. A cognitive cycle can be thought of as a moment of cognition – a cognitive "moment." Higher-level cognitive processes are composed of many of these cognitive cycles, each a cognitive "atom." Just as atoms are composed of protons, neutrons and electrons, and some of these are composed of quarks, gluons, etc., these cognitive "atoms" have a rich inner structure. What the LIDA model hypothesizes as the rich inner structure of the LIDA cognitive cycle will be described briefly here; more detailed descriptions are available elsewhere (Baars & Franklin 2003, Franklin et al. 2007).

During each cognitive cycle the LIDA agent first makes sense of its current situation as best as it can by updating its representation of its world, both external and internal. By a competitive process, as specified by Global Workspace Theory, it then decides what portion of the represented situation is most in need of attention. Broadcasting this portion, the current contents of consciousness, helps the agent to finally choose an appropriate action and execute it. Thus, the LIDA cognitive cycle can be subdivided into three phases: the understanding phase, the attention phase, and the action selection phase. Figure 1 should help the reader follow the description; it proceeds clockwise from the upper left.

Beginning the understanding phase, incoming stimuli activate low-level feature detectors in Sensory Memory. This preprocessed output is sent to Perceptual Associative Memory, where higher-level feature detectors feed in to more abstract entities such as objects, categories, actions, events, feelings, etc. The resulting percept is sent to the Workspace, where it cues both Transient Episodic Memory and Declarative Memory, producing local associations. These local associations are combined with the percept to generate a current situational model: the agent's understanding of what's going on right now.

Attention Codelets (the term "codelet" refers generally to any small, special-purpose processor or running piece of computer code) begin the attention phase by forming coalitions of selected portions of the current situational model and moving them to the Global Workspace. A competition in the Global Workspace then selects the most salient, the most relevant, the most important, the most urgent coalition, whose contents become the content of consciousness that is broadcast globally.

The action selection phase of LIDA's cognitive cycle is also a learning phase in which several processes operate in parallel.
New entities and associations, and the reinforcement of old ones, occur as the conscious broadcast reaches Perceptual Associative Memory. Events from the conscious broadcast are encoded as new memories in Transient Episodic Memory. Possible action schemes, together with their contexts and expected results, are learned into Procedural Memory from the conscious broadcast. Older schemes are reinforced. In parallel with all this learning, and using the conscious contents, possible action schemes are recruited from Procedural Memory. A copy of each such scheme is instantiated with its variables bound, and sent to Action Selection, where it competes to be the behavior selected for this cognitive cycle. The selected behavior triggers Sensory-Motor Memory to produce a suitable algorithm for the execution of the behavior. Its execution completes the cognitive cycle.

While its developers hesitate to claim that LIDA is more general or more powerful than other comprehensive cognitive architectures such as SOAR (Laird et al., 1987), ACT-R (Anderson, 1990), Clarion (Sun, 2007), etc., they do believe that LIDA will prove to be a more detailed and faithful model of human cognition, including several forms of learning, that incorporates the processes and mechanisms required for sophisticated decision making.
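To make the three-phase structure of the cycle concrete, here is a minimal, illustrative Python sketch of one pass through a LIDA-style cognitive cycle. It is not the LIDA Framework's actual API; all function names (sense, perceive, form_coalitions, etc.) are hypothetical placeholders, and each module is reduced to a stub so that only the control flow of understanding, attention, and action selection is visible.

# Illustrative sketch of one LIDA-style cognitive cycle (hypothetical names,
# not the actual LIDA Framework API). Each "module" is a stub; only the
# understanding -> attention -> action-selection control flow is shown.
import random

def sense(environment):
    # Sensory Memory: low-level feature detection (stubbed).
    return {"features": environment}

def perceive(sensory_output):
    # Perceptual Associative Memory: build a percept from features.
    return {"percept": sensory_output["features"]}

def cue_memories(percept):
    # Transient Episodic + Declarative Memory return local associations.
    return {"associations": ["seen-before:" + str(percept["percept"])]}

def build_situational_model(percept, associations):
    # Workspace: combine percept and local associations.
    return {"model": (percept, associations)}

def form_coalitions(model):
    # Attention codelets propose coalitions with salience values.
    return [{"content": model, "salience": random.random()} for _ in range(3)]

def global_workspace_competition(coalitions):
    # The most salient coalition wins and is broadcast ("consciousness").
    return max(coalitions, key=lambda c: c["salience"])

def select_action(broadcast, procedural_memory):
    # Action Selection: recruit schemes and pick one. In this stub the
    # broadcast (which would normally bias recruitment) is ignored.
    return random.choice(list(procedural_memory))

def cognitive_cycle(environment, procedural_memory):
    # Understanding phase
    percept = perceive(sense(environment))
    model = build_situational_model(percept, cue_memories(percept))
    # Attention phase
    broadcast = global_workspace_competition(form_coalitions(model))
    # Action selection phase (the parallel learning processes are omitted)
    return select_action(broadcast, procedural_memory)

if __name__ == "__main__":
    action = cognitive_cycle("red ball ahead", ["approach", "avoid", "ignore"])
    print("selected behavior:", action)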

Figure 1. The LIDA Cognitive Cycle

LIDA has a number of features that separate it from other cognitive architectures. There is an explicit attention mechanism (functional consciousness) to focus on a salient portion of its current situation. Feelings and emotions are used for motivation and to bias learning. LIDA incorporates the "cognitive cycle" hypothesis: that the action-perception cycle (Neisser, 1976; Freeman, 2002) can be thought of as a cognitive atom, and that all higher-level cognitive processes are composed of multiple cognitive cycles implemented using behavior streams. LIDA's Workspace provides a detailed inner structure for preconscious working memory. It includes a model of the agent's current situation (Current Situational Model).
The Current Situational Model contains a perceptual scene with windows for both real and virtual (imaginary) conceptual representation (McCall, Snaider, Franklin, 2010). The Current Situational Model also contains complex structures for an even higher-level, "global" representation. The Workspace also contains an ordered queue of the recent contents of consciousness (Conscious Contents Queue) and an episodic buffer of recent local associations (Franklin et al., 2005).

A quick glance at the LIDA model, particularly its cognitive cycle diagram (Figure 1), makes the model appear modular. This interpretation is misleading. Within a single cycle, individual modules such as Perceptual Associative Memory and Action Selection are internally quite interactive. Structure-building codelets operate interactively on the Workspace. The cognitive cycle as a whole operates quite interactively, in that internal procedures happen asynchronously. For example, nodes and links in Perceptual Associative Memory instantiate grounded copies of themselves in the Workspace whenever they become sufficiently activated, without waiting for a single percept to be moved. Thus, the cognitive cycle is more interactive than would be expected of a system based on information passing between modules. The only straight information passing in the entire cycle is the conscious broadcast of a single coalition's contents and the selection of the single action to be performed. All other processes within a cycle are interactive. All higher-level, multi-cyclic processes in LIDA are quite interactive, since their cognitive cycles, occurring at a rate of five to ten per second, continually interact with one another. Though LIDA superficially appears modular, it is, in its operation, much more aligned with the interactive approach.
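The paragraph above describes the Workspace as containing a Current Situational Model, a Conscious Contents Queue, and an episodic buffer of local associations. The following Python sketch shows one plausible way such a Workspace could be laid out as a data structure; the field names and the fixed queue lengths are illustrative assumptions, not part of the LIDA specification.

# A plausible (not canonical) data-structure sketch of a LIDA-style Workspace.
from collections import deque
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class CurrentSituationalModel:
    real_scene: List[Any] = field(default_factory=list)         # perceived entities
    virtual_scene: List[Any] = field(default_factory=list)      # imagined entities
    global_structures: List[Any] = field(default_factory=list)  # higher-level representation

@dataclass
class Workspace:
    csm: CurrentSituationalModel = field(default_factory=CurrentSituationalModel)
    # Ordered queue of recent contents of consciousness (length is an assumption).
    conscious_contents_queue: deque = field(default_factory=lambda: deque(maxlen=20))
    # Episodic buffer of recent local associations (length is an assumption).
    local_associations: deque = field(default_factory=lambda: deque(maxlen=50))

    def record_broadcast(self, coalition_contents: Any) -> None:
        # Called when a coalition wins the Global Workspace competition.
        self.conscious_contents_queue.append(coalition_contents)

ws = Workspace()
ws.csm.real_scene.append("red ball")
ws.record_broadcast({"salient": "red ball"})
print(list(ws.conscious_contents_queue))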

17.2 Background

Chapter 18 The MicroPsi Architecture

Let’s put a chapter here !!!


Chapter 19 The CogPrime Architecture and OpenCog System

Ben Goertzel

Abstract The CogPrime architecture for embodied AGI is overviewed, covering the core architecture and algorithms, the underlying conceptual motivations, and the emergent structures, dynamics and functionalities expected to arise in a completely implemented CogPrime system once it has undergone appropriate experience and education.

19.1 Introduction

This chapter overviews CogPrime, a conceptual and technical design for an AGI system, which is intended, when complete, to be capable of the same qualitative sort of general intelligence as human beings. CogPrime is described in more detail in the two-volume book Engineering General Intelligence [? ][? ], which exceeds 1000 pages including appendices; this chapter outlines some of the key points in a more compact format. CogPrime is not a model of human neural or cognitive structure or activity. It draws heavily on knowledge about human intelligence, especially cognitive psychology; but it also deviates from the known nature of human intelligence in many ways, with a goal of providing maximal humanly-meaningful general intelligence using available computer hardware.

19.1.1 An Integrative Approach

There is no consensus on why all the related technological and scientific progress mentioned above has not yet yielded AI software systems with human-like general intelligence. Underlying the CogPrime design, however, is the hypothesis that the core reason boils down to three points:

• Intelligence depends on the emergence of certain high-level structures and dynamics across a system's whole knowledge base;
• We have not discovered any one algorithm or approach capable of yielding the emergence of these structures;
• Achieving the emergence of these structures within a system formed by integrating a number of different AI algorithms and structures is tricky. It requires careful attention to the manner in which these algorithms and structures are integrated; and so far the integration has not been done in the correct way.

The human brain appears to be an integration of an assemblage of diverse structures and dynamics, built using common components and arranged according to a sensible cognitive architecture. However, its algorithms and structures have been honed by evolution to work closely together – they are very tightly inter-adapted, in somewhat the same way that the different organs of the body are adapted to work together. Due to their close interoperation they give rise to the overall systemic behaviors that characterize human-like general intelligence. We believe that the main missing ingredient in AI so far is cognitive synergy: the fitting-together of different intelligent components into an appropriate cognitive architecture, in such a way that the components richly and dynamically support and assist each other, interrelating very closely in a similar manner to the components of the brain or body and thus giving rise to appropriate emergent structures and dynamics. This leads us to one of the central hypotheses underlying the CogPrime approach to AGI: that the cognitive synergy ensuing from integrating multiple symbolic and subsymbolic learning and memory components in an appropriate cognitive architecture and environment can yield robust intelligence at the human level and ultimately beyond.

The reason this sort of intimate integration has not yet been explored much is that it's difficult on multiple levels, requiring the design of an architecture and its component algorithms with a view toward the structures and dynamics that will arise in the system once it is coupled with an appropriate environment. Typically, the AI algorithms and structures corresponding to different cognitive functions have been developed based on divergent theoretical principles, by disparate communities of researchers, and have been tuned for effective performance on different tasks in different environments. Making such diverse components work together in a truly synergetic and cooperative way is a tall order, yet we believe that this – rather than some particular algorithm, structure or architectural principle – is the "secret sauce" needed to create human-level AGI based on technologies available today.

19.1.2 Key Claims

One way to approach CogPrime is to ask: Apart from the general notion of cognitive synergy, what specific hypotheses about cognition is the CogPrime design based on? What follows is a list of claims such that, if the reader accepts these claims, they should probably accept that the CogPrime approach to AGI is a viable one. On the other hand, if the reader rejects one or more of these claims, they may find one or more aspects of CogPrime unacceptable for a related reason. Some of the claims, as stated, use some CogPrime-specific terminology that will be explained only later in the chapter. So the list should be viewed as a guide to reading the rest of the chapter, to be reviewed again once the chapter is done.

1. General intelligence (at the human level and ultimately beyond) can be achieved via creating a computational system that uses much of its resources seeking to achieve its goals, via using perception and memory to predict which actions will achieve its goals in the contexts in which it finds itself.
2. To achieve general intelligence in the context of human-intelligence-friendly environments and goals using feasible computational resources, it's important that an AGI system can handle different kinds of memory (declarative, procedural, episodic, sensory, intentional, attentional) in customized but interoperable ways.
3. Cognitive synergy: It's important that the cognitive processes associated with different kinds of memory can appeal to each other for assistance in overcoming bottlenecks in a manner that enables each cognitive process to act in a manner that is sensitive to the particularities of each others' internal representations, and that doesn't impose unreasonable delays on the overall cognitive dynamics.
4. As a general principle, neither purely localized nor purely global memory is sufficient for general intelligence under feasible computational resources; "glocal" memory will be required.
5. To achieve human-like general intelligence, it's important for an intelligent agent to have sensory data and motoric affordances that roughly emulate those available to humans. We don't know exactly how close this emulation needs to be, which means that our AGI systems and platforms need to support fairly flexible experimentation with virtual-world and/or robotic infrastructures.
6. To work toward adult human-level, roughly human-like general intelligence, one fairly easily comprehensible path is to use environments and goals reminiscent of human childhood, and seek to advance one's AGI system along a path roughly comparable to that followed by human children.
7. It is most effective to teach an AGI system aimed at roughly human-like general intelligence via a mix of spontaneous learning and explicit instruction, and to instruct it via a combination of imitation, reinforcement and correction, and a combination of linguistic and nonlinguistic instruction.
8. One effective approach to teaching an AGI system human language is to supply it with some in-built linguistic facility, in the form of rule-based and statistical-linguistics-based NLP systems, and then allow it to improve and revise this facility based on experience.
9. An AGI system with adequate mechanisms for handling the key types of knowledge mentioned (in item 2) above, and the capability to explicitly recognize large-scale patterns in itself, should, upon sustained interaction with an appropriate environment in pursuit of appropriate goals, give rise to the emergence of a variety of complex structures in its internal knowledge network, including, but not limited to:
• a hierarchical network, representing both a spatiotemporal hierarchy and an approximate "default inheritance" hierarchy, cross-linked;
• a heterarchical network of associativity, roughly aligned with the hierarchical network;
• a self network which is an approximate micro image of the whole network;
• inter-reflecting networks modeling self and others, reflecting a "mirrorhouse" design pattern [? ].
10. Given the strengths and weaknesses of current and near-future digital computers,
a. a (loosely) neural-symbolic network is a good representation for directly storing many kinds of memory, and interfacing between those that it doesn't store directly;
b. Uncertain logic is a good way to handle declarative knowledge. To deal with the problems facing a human-level AGI, an uncertain logic must integrate imprecise probability and fuzziness with a broad scope of logical constructs. PLN is one good realization.
c. Programs are a good way to represent procedures (both cognitive and physical-action, but perhaps not including low-level motor-control procedures).

d. Evolutionary program learning is a good way to handle difficult program learning problems. Probabilistic learning on normalized programs is one effective approach to evolutionary program learning. MOSES is one good realization of this approach.
e. Multistart hill-climbing, with a strong Occam prior, is a good way to handle relatively straightforward program learning problems.
f. Activation spreading and Hebbian learning comprise a reasonable way to handle attentional knowledge (though other approaches, with greater overhead cost, may provide better accuracy and may be appropriate in some situations).
• Artificial economics is an effective approach to activation spreading and Hebbian learning in the context of neural-symbolic networks (see the sketch after this list);
• ECAN is one good realization of artificial economics;
• A good trade-off between comprehensiveness and efficiency is to focus on two kinds of attention: processor attention (represented in CogPrime by ShortTermImportance) and memory attention (represented in CogPrime by LongTermImportance).
g. Simulation is a good way to handle episodic knowledge (remembered and imagined). Running an internal world simulation engine is an effective way to handle simulation.
h. Hybridization of one's integrative neural-symbolic system with a spatiotemporally hierarchical deep learning system is an effective way to handle representation and learning of low-level sensorimotor knowledge. DeSTIN is one example of a deep learning system of this nature that can be effective in this context.
i. One effective way to handle goals is to represent them declaratively, and allocate attention among them economically. CogPrime's PLN/ECAN based framework for handling intentional knowledge is one good realization.
11. It is important for an intelligent system to have some way of recognizing large-scale patterns in itself, and then embodying these patterns as new, localized knowledge items in its memory (a dynamic called the "cognitive equation" in [Goe94]). Given the use of a neural-symbolic network for knowledge representation, a graph-mining based "map formation" heuristic is one good way to do this.
12. Occam's Razor: Intelligence is closely tied to the creation of procedures that achieve goals in environments in the simplest possible way. Each of an AGI system's cognitive algorithms should embody a simplicity bias in some explicit or implicit form.
13. An AGI system, if supplied with a commonsensically ethical goal system and an intentional component based on rigorous uncertain inference, should be able to reliably achieve a much higher level of commonsensically ethical behavior than any human being.
14. Once sufficiently advanced, an AGI system with a logic-based declarative knowledge approach and a program-learning-based procedural knowledge approach should be able to radically self-improve via a variety of methods, including supercompilation and automated theorem-proving.
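To make the artificial-economics idea in item 10f more concrete, here is a minimal Python sketch of an economic attention dynamic over a set of Atoms, in the general spirit of ECAN but not a faithful reproduction of it: each Atom holds a short-term importance (STI) value, pays "rent" each cycle, earns "wages" when used by a cognitive process, and spreads a fraction of its STI to its neighbors. All parameter values, data and function names are illustrative assumptions.

# Illustrative economic attention dynamic (ECAN-inspired sketch, not the real ECAN).
from collections import defaultdict

RENT = 1.0             # STI each atom pays per cycle (assumed value)
WAGE = 5.0             # STI paid to atoms used by a cognitive process (assumed value)
SPREAD_FRACTION = 0.1  # fraction of STI spread along links per cycle (assumed value)

sti = defaultdict(float)   # short-term importance per atom
links = {                  # undirected "Hebbian-style" adjacency (toy data)
    "cat": ["animal", "mouse"],
    "mouse": ["cat", "cheese"],
    "animal": ["cat"],
    "cheese": ["mouse"],
}

def reward_used_atoms(used):
    # Atoms that participated in useful cognition earn wages.
    for atom in used:
        sti[atom] += WAGE

def attention_cycle():
    # 1. Every atom pays rent (STI floored at zero in this toy version).
    for atom in links:
        sti[atom] = max(0.0, sti[atom] - RENT)
    # 2. Each atom spreads a fraction of its STI to its neighbors.
    outflow = {a: sti[a] * SPREAD_FRACTION for a in links}
    for atom, neighbors in links.items():
        if not neighbors:
            continue
        share = outflow[atom] / len(neighbors)
        sti[atom] -= outflow[atom]
        for n in neighbors:
            sti[n] += share

reward_used_atoms(["cat", "mouse"])   # e.g. these atoms were just used in an inference
for _ in range(3):
    attention_cycle()
print(sorted(sti.items(), key=lambda kv: -kv[1]))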

19.2 CogPrime and OpenCog

CogPrime is closely allied with the OpenCog open-source AI software framework. But the two are not synonymous. OpenCog is a more general framework, suitable for implementation of a variety of specialized AI applications as well as, potentially, alternate AGI designs. And CogPrime could potentially be implemented other than within the OpenCog framework. The particular implementation of CogPrime in OpenCog is called OpenCogPrime. OpenCog was designed with the purpose, alongside others, of enabling efficient, scalable implementation of the full CogPrime design.

19.2.1 Current and Prior Applications of OpenCog

To give greater understanding regarding the practical platform for current work aimed at realizing CogPrime, we now briefly discuss some of the practicalities of work done with the OpenCog system that currently implements parts of the CogPrime architecture.

OpenCog, the open-source software framework underlying the "OpenCogPrime" (currently partial) implementation of the CogPrime architecture, has been used for commercial applications in the area of natural language processing and data mining. For instance, see [GPPG06] where OpenCogPrime's PLN reasoning and RelEx language processing are combined to do automated biological hypothesis generation based on information gathered from PubMed abstracts. [Loo06] describes the use of OpenCog's MOSES component for biological data analysis; this use has been extended considerably in a variety of unpublished commercial applications since that point, in domains such as financial prediction, genetics, marketing data analysis and natural language processing. Most relevantly to the present work, OpenCog has also been used to control virtual agents in virtual worlds [GEA08].

Prototype work done during 2007-2008 involved using an OpenCog variant called the OpenPetBrain to control virtual dogs in a virtual world (see Figure 19.1 for a screenshot of an OpenPetBrain-controlled virtual dog). While these OpenCog virtual dogs did not display intelligence closely comparable to that of real dogs (or human children), they did demonstrate a variety of interesting and relevant functionalities including

• learning new behaviors based on imitation and reinforcement
• responding to natural language commands and questions, with appropriate actions and natural language replies
• spontaneous exploration of their world, remembering their experiences and using them to bias future learning and linguistic interaction

Fig. 19.1: Screenshot of an OpenCog-controlled virtual dog

One current OpenCog initiative [GPC+11] involves extending the virtual dog work via using OpenCog to control virtual agents in a game world inspired by the game Minecraft. These agents are initially specifically concerned with achieving goals in a game world via constructing structures with blocks and carrying out simple English communications. Representative example tasks would be:

• Learning to build steps or ladders to get desired objects that are high up
• Learning to build a shelter to protect itself from aggressors
• Learning to build structures resembling structures that it's shown (even if the available materials are a bit different)
• Learning how to build bridges to cross chasms

Of course, the AI significance of learning tasks like this all depends on what kind of feedback the system is given, and how complex its environment is. It would be relatively simple to make an AI system do things like this in a highly specialized way, but that is not the intent of the project – the goal is to have the system learn to carry out tasks like this using general learning mechanisms and a general cognitive architecture, based on embodied experience and only scant feedback from human teachers. If successful, this will provide an outstanding platform for ongoing AGI development, as well as a visually appealing and immediately meaningful demo for OpenCog. A few of the specific tasks that are the focus of this project team's current work at time of writing include:

• Watch another character build steps to reach a high-up object
• Figure out via imitation of this that, in a different context, building steps to reach a high-up object may be a good idea
• Also figure out that, if it wants a certain high-up object but there are no materials for building steps available, finding some other way to get elevated will be a good idea that may help it get the object (including e.g. building a ladder, or asking someone tall to pick it up, etc.)
• Figure out that, if the character wants to hide its valued object from a creature much larger than it, it should build a container with a small hole that the character can get through, but the creature cannot

19.2.2 Transitioning from Virtual Agents to a Physical Robot

In 2009-2010, preliminary experiments were conducted using OpenCog to control a Nao robot [GdG08]. These involved hybridizing OpenCog with a separate subsystem handling low-level perception and action. This hybridization was accomplished in an extremely simplistic way, however. How to do this right is a topic treated in detail in [? ] and [Goe04] and only briefly touched on here.

We suspect that a reasonable level of capability will be achievable by simply interposing DeSTIN [ARC09] (or some other reasonably capable "hierarchical temporal memory" type sensorimotor system) as a perception/action "black box" between OpenCog and a robot. However, we also suspect that to achieve robustly intelligent robotics we must go beyond this approach, and connect robot perception and actuation software with OpenCogPrime in a "white box" manner that allows intimate dynamic feedback between perceptual, motoric, cognitive and linguistic functions. We suspect this may be achievable, for example, via the creation and real-time utilization of links between the nodes in CogPrime's and DeSTIN's internal networks.

19.3 Philosophical Background

The creation of advanced AGI systems is an engineering endeavor, whose achievement will require significant input from science and mathematics; and also, we believe, guidance from philosophy. Having an appropriate philosophy of mind certainly is no guarantee of creating an advanced AGI system; philosophy only goes so far. However, having a badly inappropriate philosophy of mind may be a huge barrier in the creation of AGI systems. For instance, we believe that philosophical views holding that

• the contents of a mind are best understood purely as a set of logical propositions, terms or predicates; or that
• brains and other intelligence-substrate systems are necessarily so complex and emergence-dependent that it's hopeless to try to understand how they represent any particular thing, or carry out any particular action

are particularly poorly suited to guide AGI development, and are likely to directly push adherents in the wrong directions AGI-design-wise.

The development of the CogPrime design has been substantially guided by a philosophy of mind called "patternism" [Goe06]. This guidance should not be overstated; CogPrime is an integrative design formed via the combination of a number of different philosophical, scientific and engineering ideas, and the success or failure of the design doesn't depend on any particular philosophical understanding of intelligence. In that sense, the more abstract notions summarized in this section should be considered "optional" rather than critical in a CogPrime context. However, due to the core role patternism has played in the development of CogPrime, understanding a few things about general patternist philosophy will be helpful for understanding CogPrime, even for those readers who are not philosophically inclined. Those readers who are philosophically inclined, on the other hand, are urged to read The Hidden Pattern [Goe06] and then interpret the particulars of CogPrime in this light.

The patternist philosophy of mind is a general approach to thinking about intelligent systems. It is based on the very simple premise that mind is made of pattern – and that a mind is a system for recognizing patterns in itself and the world, critically including patterns regarding which procedures are likely to lead to the achievement of which goals in which contexts. Pattern as the basis of mind is not in itself a very novel idea; it is present, for instance, in the 19th-century philosophy of Charles Peirce [Pei34], in the writings of contemporary philosophers Daniel Dennett [Den91] and Douglas Hofstadter [Hof79, Hof96], in Benjamin Whorf's [Who64] linguistic philosophy and Gregory Bateson's [Bat79] systems theory of mind and nature. Bateson spoke of the Metapattern: "that it is pattern which connects." In our prior writings on philosophy of mind, an effort has been made to pursue this theme more thoroughly than has been done before, and to articulate in detail how various aspects of human mind and mind in general can be well-understood by explicitly adopting a patternist perspective.1

In the patternist perspective, "pattern" is generally defined as "representation as something simpler." Thus, for example, if one measures simplicity in terms of bit-count, then a program compressing an image would be a pattern in that image. But if one uses a simplicity measure incorporating run-time as well as bit-count, then the compressed version may or may not be a pattern in the image, depending on how one's simplicity measure weights the two factors. This definition encompasses simple repeated patterns, but also much more complex ones. While pattern theory has typically been elaborated in the context of computational theory, it is not intrinsically tied to computation; rather, it can be developed in any context where there is a notion of "representation" or "production" and a way of measuring simplicity.
One just needs to be able to assess the extent to which P represents or produces X, and then to compare the simplicity of P and X; and then one can assess whether P is a pattern in X. A formalization of this notion of pattern is given in [Goe06]; the crux is simply

Definition 19.1. Given a metric space (M, d), and two functions c : M → [0, ∞] (the "simplicity measure") and F : M → M (the "production relationship"), we say that P ∈ M is a pattern in X ∈ M to the degree

\[
\iota^X_P = \left( \left( 1 - \frac{d(F(P),X)}{c(X)} \right) \cdot \frac{c(X) - c(P)}{c(X)} \right)^{+}
\]

This degree is called the pattern intensity of P in X. For example, if c(X) = 100 bits, c(P) = 60 bits, and F(P) reproduces X exactly (so d(F(P), X) = 0), then the pattern intensity is 0.4.

Next, in patternism the mind of an intelligent system is conceived as the (fuzzy) set of patterns in that system, and the set of patterns emergent between that system and other systems with which it interacts. The latter clause means that the patternist perspective is inclusive of notions of distributed intelligence [Hut96]. Basically, the mind of a system is the fuzzy set of different simplifying representations of that system that may be adopted. In the patternist perspective, intelligence is conceived as roughly indicated above: as the ability to achieve complex goals in complex environments; where complexity itself may be defined as the possession of a rich variety of patterns. A mind is thus a collection of patterns that is associated with a persistent dynamical process that achieves highly-patterned goals in highly-patterned environments.

An additional hypothesis made within the patternist philosophy of mind is that reflection is critical to intelligence. This lets us conceive an intelligent system as a dynamical system that recognizes patterns in its environment and itself, as part of its quest to achieve complex goals. While this approach is quite general, it is not vacuous; it gives a particular structure to the tasks of analyzing and synthesizing intelligent systems. About any would-be intelligent system, we are led to ask questions such as:

• How are patterns represented in the system? That is, how does the underlying infrastructure of the system give rise to the displaying of a particular pattern in the system's behavior?

1 In some prior writings the term "psynet model of mind" has been used to refer to the application of patternist philosophy to cognitive theory, but this term has been "deprecated" in recent publications as it seemed to introduce more confusion than clarification.

• What kinds of patterns are most compactly represented within the system?
• What kinds of patterns are most simply learned?
• What learning processes are utilized for recognizing patterns?
• What mechanisms are used to give the system the ability to introspect (so that it can recognize patterns in itself)?

Now, these same sorts of questions could be asked if one substituted the word "pattern" with other words like "knowledge" or "information". However, we have found that asking these questions in the context of pattern leads to more productive answers, avoiding unproductive byways and also tying in very nicely with the details of various existing formalisms and algorithms for knowledge representation and learning.

Among the many kinds of patterns in intelligent systems, semiotic patterns are particularly interesting ones. Peirce decomposed these into three categories:

• iconic patterns, which are patterns of contextually important internal similarity between two entities (e.g. an iconic pattern binds a picture of a person to that person)
• indexical patterns, which are patterns of spatiotemporal co-occurrence (e.g. an indexical pattern binds a wedding dress and a wedding)
• symbolic patterns, which are patterns indicating that two entities are often involved in the same relationships (e.g. a symbolic pattern between the number "5" (the symbol) and various sets of 5 objects (the entities that the symbol is taken to represent))

Of course, some patterns may span more than one of these semiotic categories; and there are also some patterns that don't fall neatly into any of these categories. But the semiotic patterns are particularly important ones; and symbolic patterns have played an especially large role in the history of AI, because of the radically different approaches different researchers have taken to handling them in their AI systems. Mathematical logic and related formalisms provide sophisticated mechanisms for combining and relating symbolic patterns ("symbols"), and some AI approaches have focused heavily on these, sometimes more so than on the identification of symbolic patterns in experience or the use of them to achieve practical goals.

Pursuing the patternist philosophy in detail leads to a variety of particular hypotheses and conclusions about the nature of mind. Following from the view of intelligence in terms of achieving complex goals in complex environments comes a view in which the dynamics of a cognitive system are understood to be governed by two main forces:

• self-organization, via which system dynamics cause existing system patterns to give rise to new ones
• goal-oriented behavior, which has been defined more rigorously in [Goe10b], but basically amounts to a system interacting with its environment in a way that appears like an attempt to maximize some reasonably simple function

Self-organized and goal-oriented behavior must be understood as cooperative aspects. For instance – to introduce an example that will be elaborated in more detail below – if an agent is asked to build a surprising structure out of blocks and does so, this is goal-oriented. But the agent's ability to carry out this goal-oriented task will be greater if it has previously played around with blocks a lot in an unstructured, spontaneous way. And the "nudge toward creativity" given to it by asking it to build a surprising blocks structure may cause it to explore some novel patterns, which then feed into its future unstructured blocks play.

Based on these concepts, as argued in detail in [Goe06], several primary dynamical principles may be posited, including the following. For consistency of explanation, we will illustrate these principles with examples from the "playing with blocks" domain, which has the advantage of simplicity, and also of relating closely to our current work with OpenCog-controlled video game agents. However, the reader should not get the impression that CogPrime has somehow been specialized for this sort of domain; it has not been. The principles:

• Evolution, conceived as a general process via which patterns within a large population thereof are differentially selected and used as the basis for formation of new patterns, based on some "fitness function" that is generally tied to the goals of the agent
– Example: If trying to build a blocks structure that will surprise Bob, an agent may simulate several procedures for building blocks structures in its "mind's eye", assessing for each one the expected degree to which it might surprise Bob. The search through procedure space could be conducted as a form of evolution, via an evolutionary learning algorithm such as CogPrime's MOSES (to be discussed below).
• Autopoiesis: the process by which a system of interrelated patterns maintains its integrity, via a dynamic in which whenever one of the patterns in the system begins to decrease in intensity, some of the other patterns increase their intensity in a manner that causes the troubled pattern to increase in intensity again
– Example: An agent's set of strategies for building the base of a tower, and its set of strategies for building the middle part of a tower, are likely to relate autopoietically. If the system partially forgets how to build the base of a tower, then it may regenerate this missing knowledge via using its knowledge about how to build the middle part (i.e., it knows it needs to build the base in a way that will support good middle parts). Similarly, if it partially forgets how to build the middle part, then it may regenerate this missing knowledge via using its knowledge about how to build the base (i.e. it knows a good middle part should fit in well with the sorts of base it knows are good).
– This same sort of interdependence occurs between pattern-sets containing more than two elements.
– Sometimes (as in the above example) autopoietic interdependence in the mind is tied to interdependencies in the physical world, sometimes not.
• Association. Patterns, when given attention, spread some of this attention to other patterns that they have previously been associated with in some way. Furthermore, there is Peirce's law of mind [Pei34], which could be paraphrased in modern terms as stating that the mind is an associative memory network, whose dynamics dictate that every idea in the memory is an active agent, continually acting on those ideas with which the memory associates it.
– Example: Building a blocks structure that resembles a tower spreads attention to memories of prior towers the agent has seen, and also to memories of people whom the agent knows have seen towers, and to structures it has built at the same time as towers, structures that resemble towers in various respects, etc.
• Differential attention allocation / credit assignment. Patterns that have been valuable for goal-achievement are given more attention, and are encouraged to participate in giving rise to new patterns.

– Example: Perhaps in a prior instance of the task "build me a surprising structure out of blocks," searching through memory for non-blocks structures that the agent has played with has proved a useful cognitive strategy. In that case, when the task is posed to the agent again, it should tend to allocate disproportionate resources to this strategy.
• Pattern creation. Patterns that have been valuable for goal-achievement are mutated and combined with each other to yield new patterns.
– Example: Building towers has been useful in a certain context, but so has building structures with a large number of triangles. Why not build a tower out of triangles? Or maybe a vaguely tower-like structure that uses more triangles than a tower easily could?
– Example: Building an elongated block structure resembling a table was successful in the past, as was building a structure resembling a very flat version of a chair. Generalizing, maybe building distorted versions of furniture is good. Or maybe it is building distorted versions of any previously perceived objects that is good. Or maybe both, to different degrees....

Next, for a variety of reasons outlined in [Goe06] it becomes appealing to hypothesize that the network of patterns in an intelligent system must give rise to the following large-scale emergent structures:

• Hierarchical network. Patterns are habitually in relations of control over other patterns that represent more specialized aspects of themselves.
– Example: The pattern associated with "tall building" has some control over the pattern associated with "tower", as the former represents a more general concept ... and "tower" has some control over "Eiffel tower", etc.
• Heterarchical network. The system retains a memory of which patterns have previously been associated with each other in any way.
– Example: "Tower" and "snake" are distant in the natural pattern hierarchy, but may be associatively/heterarchically linked due to having a common elongated structure. This heterarchical linkage may be used for many things, e.g. it might inspire the creative construction of a tower with a snake's head.
• Dual network. Hierarchical and heterarchical structures are combined, with the dynamics of the two structures working together harmoniously. Among many possible ways to hierarchically organize a set of patterns, the one used should be one that causes hierarchically nearby patterns to have many meaningful heterarchical connections; and of course, there should be a tendency to search for heterarchical connections among hierarchically nearby patterns.
– Example: While the set of patterns hierarchically nearby "tower" and the set of patterns heterarchically nearby "tower" will be quite different, they should still have more overlap than random pattern-sets of similar sizes. So, if looking for something else heterarchically near "tower", using the hierarchical information about "tower" should be of some use, and vice versa.
– In PLN, hierarchical relationships correspond to Atoms A and B such that Inheritance A B and Inheritance B A have highly dissimilar strength; and heterarchical relationships correspond to IntensionalSimilarity relationships. The dual network structure then arises when intensional and extensional inheritance approximately correlate with each other, so that inference about either kind of inheritance assists with figuring out about the other kind.
• Self structure. A portion of the network of patterns forms into an approximate image of the overall network of patterns.
– Example: Each time the agent builds a certain structure, it observes itself building the structure, and its role as "builder of a tall tower" (or whatever the structure is) becomes part of its self-model. Then when it is asked to build something new, it may consult its self-model to see if it believes itself capable of building that sort of thing (for instance, if it is asked to build something very large, its self-model may tell it that it lacks persistence for such projects, so it may reply "I can try, but I may wind up not finishing it").

If the patternist theory of mind presented in [Goe06] is indeed appropriate as a guide for AGI work, then the success of CogPrime as a design will depend largely on whether these high-level structures and dynamics can be made to emerge from the synergetic interaction of CogPrime's representation and algorithms, when they are utilized to control an appropriate agent in an appropriate environment. The extended treatment of CogPrime given in [? ] takes care to specifically elaborate how each of these abstract concepts arises concretely from CogPrime's structures and algorithms. In the more concise treatment given here, we will touch on this aspect only lightly.

19.3.1 A Mind-World Correspondence Principle

Beyond patternist philosophy per se, an additional philosophical principle has guided the CogPrime design; this is the "mind-world correspondence principle", which enlarges on the notion of "intelligence as adaptation to environments" mentioned above. Real-world minds are always adapted to certain classes of environments and goals. As we have noted above, even a system of vast general intelligence, subject to real-world space and time constraints, will necessarily be more efficient at some kinds of learning than others. Thus, one approach to analyzing general intelligence is to look at the relationship between minds and worlds – where a "world" is conceived as an environment and a set of goals defined in terms of that environment. An informal version of the "mind-world correspondence principle" given in [? ] is as follows:

For intelligence to occur, there has to be a natural correspondence between the transition-sequences of world-states and the corresponding transition-sequences of mind-states, at least in the cases of transition-sequences leading to relevant goals.

A slightly more rigorous version is:

MIND-WORLD CORRESPONDENCE PRINCIPLE: For a mind to work intelligently toward certain goals in a certain world, there should be a nice mapping from goal-directed sequences of world-states into sequences of mind-states, where "nice" means that a world-state-sequence W composed of two parts W1 and W2 gets mapped into a mind-state-sequence M composed of two corresponding parts M1 and M2.

What's nice about this principle is that it relates the decomposition of the world into parts to the decomposition of the mind into parts. (A more fully formalized statement of the principle involves mathematical concepts that won't be introduced here for risk of digression.2) Each component of CogPrime has been carefully thought through in terms of this conceptual principle; i.e., it has been designed so that its internal dynamics are decomposable in a way that maps into everyday human goals and environments, matching the natural decomposition of the dynamics of these goals and environments. This aspect of the CogPrime design will not be highlighted here due to space considerations, but it is a persistent theme in the longer treatment in [? ][? ].
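The following small Python sketch is one way to make the "nice mapping" condition concrete: it checks, for a toy world, whether a world-to-mind mapping respects concatenation of state sequences, i.e. whether mapping W1 followed by W2 gives the same result as mapping W1 and W2 separately and concatenating. The world, the mapping and the states are all invented for illustration; the principle itself is stated only informally in the text above.

# Toy check of the mind-world correspondence idea: a mapping from world-state
# sequences to mind-state sequences that respects concatenation of sequences.
# Everything here (states, the mapping) is an illustrative assumption.
from typing import List

WorldState = str
MindState = str

def mind_map(world_seq: List[WorldState]) -> List[MindState]:
    # A simple "perception" mapping: each world state is encoded as a mind state.
    # Because it acts element-wise, it automatically respects concatenation.
    return ["percept(" + w + ")" for w in world_seq]

def respects_concatenation(w1: List[WorldState], w2: List[WorldState]) -> bool:
    # The "niceness" condition from the informal principle:
    # map(W1 + W2) should equal map(W1) + map(W2).
    return mind_map(w1 + w2) == mind_map(w1) + mind_map(w2)

w1 = ["block-on-table", "hand-over-block"]
w2 = ["hand-grasps-block", "block-lifted"]
print(respects_concatenation(w1, w2))  # True for this element-wise mapping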

19.4 High-Level Architecture of CogPrime

The above philosophical principles would be consistent with a very wide variety of concrete AGI designs. CogPrime has not been directly derived from these philosophical principles; rather, it has been created via beginning with a combination of human cognitive psychology and computer science algorithms and structures, and then shaping this combination so as to yield a system that appears likely to be conformant with these philosophical principles, as well as being computationally feasible on current hardware and containing cognitive structures and dynamics roughly homologous to the key human ones.

Figures 19.2, 19.3, 19.5 and 19.6 depict the high-level architecture of CogPrime. A key underlying principle is: the use of multiple cognitive processes associated with multiple types of memory to enable an intelligent agent to execute the procedures that it believes have the best probability of working toward its goals in its current context. In a robot preschool context, for example, the top-level goals would be everyday things such as pleasing the teacher, learning new information and skills, and protecting the robot's body. Figure 19.4 shows part of the architecture via which cognitive processes interact with each other, via commonly acting on the AtomSpace knowledge repository.

It is interesting to compare these diagrams to the integrative human cognitive architecture diagram given in [Goe12], which is intended to compactly overview the structure of human cognition as currently understood. The main difference is that the CogPrime diagrams commit to specific structures (e.g. knowledge representations) and processes, whereas the generic integrative architecture diagram refers merely to types of structures and processes. For instance, the integrative diagram refers generally to declarative knowledge and learning, whereas the CogPrime diagram refers to PLN, a specific system for reasoning and learning about declarative knowledge. In [? ] a table is provided articulating the key connections between the components of the CogPrime diagram and the well-known human cognitive structures/processes represented in the integrative diagram, thus indicating the general cognitive functions instantiated by each of the CogPrime components.

2 For the curious reader, a more rigorous statement of the principle looks like: For an organism with a reasonably high level of intelligence in a certain world, relative to a certain set of goals, the mind-world path transfer function is a goal-weighted approximate functor.

Fig. 19.2: High-Level Architecture of CogPrime. This is a conceptual depiction, not a detailed flowchart (which would be too complex for a single image). Figures ??, ?? and ?? highlight specific aspects of this diagram.

19.5 Local and Global Knowledge Representation

One of the biggest decisions to make in designing an AGI system is how the system should represent knowledge. Naturally any advanced AGI system is going to synthesize a lot of its own knowledge representations for handling particular sorts of knowledge – but still, an AGI design typically makes at least some sort of commitment about the category of knowledge representation mechanisms toward which the AGI system will be biased.

OpenCog’s knowledge representation mechanisms are all based fundamentally on networks. This is because we feel that, on a philosophical level, one of the most powerful known metaphors for understanding minds is to view them as networks of interrelated, interconnected elements. The view of mind as network is implicit in the patternist philosophy, because every pattern can be viewed as a pattern in something, or a pattern of arrangement of something – thus a pattern is always viewable as a relation between two or more things. A collection of patterns is thus a pattern-network. Knowledge of all kinds may be given network representations; and cognitive processes may be represented as networks also, for instance via representing them as programs, which may be represented as trees or graphs in various standard ways. The emergent patterns arising in an intelligence as it develops may be viewed as a pattern network in themselves; and the relations between an embodied mind and its physical and social environment may be viewed in terms of ecological and social networks. The two major supercategories of knowledge representation systems are local (also called explicit) and global (also called implicit) systems, with a hybrid category we refer to as glocal that combines both of these. In a local system, each piece of knowledge is stored using a small percentage of cognitive system elements; in a global system, each piece of knowledge is stored using a particular pattern of arrangement, activation, etc. of a large percentage of cognitive system elements; in a glocal system, the two approaches are used together. All three of these knowledge representation types may be realized using networks. In CogPrime, all three are realized using the same (Atomspace) network. In the first part of this section we discuss the symbolic, semantic-network aspects of knowl- edge representation in CogPrime. Then, at the end of the section we turn to distributed, neural-net-like knowledge representation, focusing largely on CogPrime’s “glocal” knowledge representation mechanisms.

19.5.1 Weighted, Labeled Hypergraphs

There are many different mechanisms for representing knowledge in AI systems in an explicit, localized way, most of them descending from various variants of formal logic. Here we briefly describe how it is done in CogPrime. On the surface, CogPrime's explicit representation scheme is not that different from a number of prior approaches. However, the particularities of CogPrime's explicit knowledge representation are carefully tuned to match CogPrime's cognitive processes, which are more distinctive in nature than the corresponding representational mechanisms.

One useful way to think about CogPrime's explicit, localized knowledge representation is in terms of hypergraphs. A hypergraph is an abstract mathematical structure [Bol98], which consists of objects called Nodes and objects called Links which connect the Nodes. In computer science, a graph traditionally means a bunch of dots connected with lines (i.e. Nodes connected by Links, or nodes connected by links). A hypergraph, on the other hand, can have Links that connect more than two Nodes. In CogPrime it is most useful to consider "generalized hypergraphs" that extend ordinary hypergraphs by containing two additional features:

• Links that point to Links instead of Nodes
• Nodes that, when you zoom in on them, contain embedded hypergraphs.

Properly, such "hypergraphs" should always be referred to as generalized hypergraphs, but this is cumbersome, so we will persist in calling them merely hypergraphs. In a hypergraph of this sort, Links and Nodes are not as distinct as they are within an ordinary mathematical graph (for instance, they can both have Links connecting them), and so it is useful to have a generic term encompassing both Links and Nodes; for this purpose, we use the term Atom.

A weighted, labeled hypergraph is a hypergraph whose Links and Nodes come along with labels, and with one or more numbers that are generically called weights. A label associated with a Link or Node may sometimes be interpreted as telling you what type of entity it is, or alternatively as telling you what sort of data is associated with a Node. On the other hand, an example of a weight that may be attached to a Link or Node is a number representing a probability, or a number representing how important the Node or Link is.

Obviously, hypergraphs may come along with various sorts of dynamics. Minimally, one may think about:

• Dynamics that modify the properties of Nodes or Links in a hypergraph (such as the labels or weights attached to them).
• Dynamics that add new Nodes or Links to a hypergraph, or remove existing ones.

Both types of dynamics are very important in CogPrime.
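To make the notion of a weighted, labeled generalized hypergraph concrete, here is a minimal Python sketch of Atoms, Nodes and Links carrying type labels and weights. It is an illustration of the data structure described above, not the actual OpenCog AtomSpace implementation or API; all class names and fields are simplified assumptions.

# Minimal sketch of a weighted, labeled generalized hypergraph (not the real AtomSpace).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Atom:
    atom_type: str                  # label, e.g. "ConceptNode" or "InheritanceLink"
    strength: float = 1.0           # e.g. a probability-like truth value
    importance: float = 0.0         # e.g. an attention value

@dataclass
class Node(Atom):
    name: str = ""                  # Nodes carry a name ("cat", "animal", ...)

@dataclass
class Link(Atom):
    # A Link's targets may be Nodes or other Links (generalized hypergraph),
    # and may number more than two (hypergraph rather than ordinary graph).
    targets: List[Atom] = field(default_factory=list)

cat = Node("ConceptNode", name="cat")
animal = Node("ConceptNode", name="animal")
inh = Link("InheritanceLink", strength=0.95, targets=[cat, animal])

# A link pointing to another link, e.g. contextualizing the inheritance:
golf = Node("ConceptNode", name="golf")
ctx = Link("ContextLink", targets=[golf, inh])

print(inh.atom_type, [t.name for t in inh.targets], inh.strength)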

19.5.1.1 Atoms: Their Types and Weights

This section reviews a variety of CogPrime Atom types and gives simple examples of each of them. The Atom types considered are drawn from those currently in use in the OpenCog system. This does not represent a complete list of Atom types required for optimally efficient implementation of a complete CogPrime system, nor a complete list of those used in OpenCog currently (though it does cover a substantial majority of those used in OpenCog currently, omitting only some with specialized importance or intended only for temporary use). The partial nature of the list given here reflects a more general point: the specific collection of Atom types in an OpenCog system is bound to change as the system is developed and experimented with. CogPrime specifies a certain collection of representational approaches and cognitive algorithms for acting on them; any of these approaches and algorithms may be implemented with a variety of sets of Atom types. The specific set of Atom types in the OpenCog system currently does not necessarily have a profound and lasting significance – the list might look a bit different five years from time of writing, based on various detailed changes. The treatment here is informal and intended to get across the general idea of what each Atom type does. A longer and more formal treatment of the Atom types is given in the online OpenCog wikibook [Goe10a] and in [? ].

19.5.1.2 Some Basic Atom Types

We begin with ConceptNode – and note that a ConceptNode does not necessarily refer to a whole concept, but may refer to part of a concept; it is essentially a "basic semantic node" whose meaning comes from its links to other Atoms. It would be more accurately, but less tersely, named "concept or concept fragment or element node." A simple example would be a ConceptNode grouping nodes that are somehow related, e.g.

ConceptNode: C
InheritanceLink (ObjectNode: BW) C
InheritanceLink (ObjectNode: BP) C
InheritanceLink (ObjectNode: BN) C
ReferenceLink BW (PhraseNode "Ben's watch")
ReferenceLink BP (PhraseNode "Ben's passport")
ReferenceLink BN (PhraseNode "Ben's necklace")

indicates the simple and uninteresting ConceptNode grouping three objects owned by Ben (note that the above-given Atoms don't indicate the ownership relationship, they just link the three objects with textual descriptions). In this example, the ConceptNode links transparently to physical objects and English descriptions, but in general this won't be the case – most ConceptNodes will look to the human eye like groupings of links of various types, that link to other nodes consisting of groupings of links of various types, etc.

There are Atoms referring to basic, useful mathematical objects, e.g. NumberNodes like

NumberNode #4
NumberNode #3.44

The numerical value of a NumberNode is explicitly referenced within the Atom.

A core distinction is made between ordered links and unordered links; these are handled differently in the Atomspace software. A basic unordered link is the SetLink, which groups its arguments into a set. For instance, the ConceptNode C defined by

ConceptNode C
MemberLink A C
MemberLink B C

is equivalent to

SetLink A B

On the other hand, ListLinks are like SetLinks but ordered, and they play a fundamental role due to their relationship to predicates. Most predicates are assumed to take ordered arguments, so we may say e.g.

EvaluationLink
    PredicateNode eat
    ListLink
        ConceptNode cat
        ConceptNode mouse

to indicate that cats eat mice. Note that by an expression like

ConceptNode cat

is meant

ConceptNode C
ReferenceLink W C
WordNode W #cat

since it's WordNodes rather than ConceptNodes that refer to words. (And note that the strength of the ReferenceLink would not be 1 in this case, because the word "cat" has multiple senses.) However, there is no harm nor formal incorrectness in the "ConceptNode cat" usage, since "cat" is just as valid a name for a ConceptNode as, say, "C."

We've already introduced above the MemberLink, which is a link joining a member to the set that contains it. It is notable that the truth value of a MemberLink is fuzzy rather than probabilistic, and that PLN (CogPrime's logical reasoning component [GIGH08]) is able to inter-operate between fuzzy and probabilistic values. SubsetLinks also exist, with the obvious meaning, e.g.

ConceptNode cat
ConceptNode animal
SubsetLink cat animal

Note that SubsetLink refers to a purely extensional subset relationship, and that InheritanceLink should be used for the generic "intensional + extensional" analogue of this – more on this below. SubsetLink could more consistently (with other link types) be named ExtensionalInheritanceLink, but SubsetLink is used because it's shorter and more intuitive.

There are links representing the Boolean operations AND, OR and NOT. For instance, we may say

ImplicationLink
    ANDLink
        ConceptNode young
        ConceptNode beautiful
    ConceptNode attractive

or, using links and VariableNodes instead of ConceptNodes,

AverageLink $X
    ImplicationLink
        ANDLink
            EvaluationLink young $X
            EvaluationLink beautiful $X
        EvaluationLink attractive $X

NOTLink is a unary link, so e.g. we might say

AverageLink $X
    ImplicationLink
        ANDLink
            EvaluationLink young $X
            EvaluationLink beautiful $X
            NOTLink
                EvaluationLink poor $X
        EvaluationLink attractive $X

ContextLink allows explicit contextualization of knowledge, which is used in PLN, e.g.

ContextLink
    ConceptNode golf
    InheritanceLink
        ObjectNode BenGoertzel
        ConceptNode incompetent

says that Ben Goertzel is incompetent in the context of golf.

19.5.1.3 Variable Atoms

We have already introduced VariableNodes above; it's also possible to specify the type of a VariableNode via linking it to a VariableTypeNode via a TypedVariableLink, e.g.

TypedVariableLink
    VariableNode $X
    VariableTypeNode ConceptNode

which specifies that the variable $X should be filled with a ConceptNode.

Variables are handled via quantifiers, with the default quantifier being the AverageLink, so that the default interpretation of

ImplicationLink
    InheritanceLink $X animal
    EvaluationLink
        PredicateNode: eat
        ListLink
            $X
            ConceptNode: food

is

AverageLink $X
    ImplicationLink
        InheritanceLink $X animal
        EvaluationLink
            PredicateNode: eat
            ListLink
                $X
                ConceptNode: food

The AverageLink invokes an estimation of the average TruthValue of the embedded expression (in this case an ImplicationLink) over all possible values of the variable $X. If there are type restrictions regarding the variable $X, these are taken into account in conducting the averaging. ForAllLink and ExistsLink may be used in the same places as AverageLink, with uncertain truth value semantics defined in PLN theory using third-order probabilities. There is also a SkolemLink used to indicate variable dependencies for existentially quantified variables, used in cases of multiply nested existential quantifiers.

EvaluationLink and MemberLink have overlapping semantics, allowing expression of the same conceptual/logical relationships in terms of predicates or sets, i.e.

EvaluationLink
    PredicateNode: eat
    ListLink
        $X
        ConceptNode: food

has the same semantics as

MemberLink
    ListLink
        $X
        ConceptNode: food
    ConceptNode: EatingEvents

The relation between the predicate "eat" and the concept "EatingEvents" is formally given by

ExtensionalEquivalenceLink
    ConceptNode: EatingEvents
    SatisfyingSetLink
        PredicateNode: eat

In other words, we say that "EatingEvents" is the SatisfyingSet of the predicate "eat": it is the set of entities that satisfy the predicate "eat". Note that the truth values of MemberLink and EvaluationLink are fuzzy rather than probabilistic.
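As a toy illustration of the SatisfyingSet idea – a minimal Python sketch, not OpenCog code, with the predicate, its membership degrees and the candidate argument lists all invented for the example – the fuzzy membership of an argument list in "EatingEvents" is simply the degree to which the predicate "eat" evaluates to true on it:

def eat(subject, obj):
    """Hypothetical fuzzy predicate: degree to which `subject` eats `obj`."""
    degrees = {("cat", "mouse"): 0.9, ("cat", "grass"): 0.2, ("rock", "mouse"): 0.0}
    return degrees.get((subject, obj), 0.0)

candidates = [("cat", "mouse"), ("cat", "grass"), ("rock", "mouse")]

# EatingEvents as the SatisfyingSet of "eat": fuzzy membership = predicate value
eating_events = {args: eat(*args) for args in candidates}
print(eating_events[("cat", "mouse")])   # 0.9: strongly a member of EatingEvents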

19.5.1.4 Logical Links

There is a host of link types embodying logical relationships as defined in the PLN logic system [GIGH08], e.g.

• InheritanceLink
• SubsetLink (aka ExtensionalInheritanceLink)
• IntensionalInheritanceLink

which embody different sorts of inheritance, e.g.

SubsetLink salmon fish
IntensionalInheritanceLink whale fish
InheritanceLink fish animal

and then

• SimilarityLink
• ExtensionalSimilarityLink
• IntensionalSimilarityLink

which are symmetrical versions, e.g.

SimilarityLink shark barracuda
IntensionalSimilarityLink shark dolphin
ExtensionalSimilarityLink American obese_person

There are also higher-order versions of these links, both asymmetric

• ImplicationLink
• ExtensionalImplicationLink
• IntensionalImplicationLink

and symmetric

• EquivalenceLink
• ExtensionalEquivalenceLink
• IntensionalEquivalenceLink

These are used between predicates and links, e.g.

ImplicationLink
    EvaluationLink eat
        ListLink
            $X
            dirt
    EvaluationLink feel
        ListLink
            $X
            sick

or

ImplicationLink
    EvaluationLink eat
        ListLink
            $X
            dirt
    InheritanceLink $X sick

or

ForAllLink $X, $Y, $Z
    ExtensionalEquivalenceLink
        EquivalenceLink
            $Z
            EvaluationLink +
                ListLink $X $Y
        EquivalenceLink
            $Z
            EvaluationLink +
                ListLink $Y $X

Note, the latter is given as an extensional equivalence because it's a purely mathematical equivalence. This is not the only case of pure extensional equivalence, but it's an important one.

19.5.1.5 Temporal Links

There are also temporal versions of these links, such as

• PredictiveImplicationLink
• PredictiveAttractionLink
• SequentialANDLink
• SimultaneousANDLink

which combine a logical relationship between their arguments with a temporal relationship between them. For instance, we might say

PredictiveImplicationLink
    PredicateNode: JumpOffCliff
    PredicateNode: Dead

or, including arguments,

PredictiveImplicationLink
    EvaluationLink JumpOffCliff $X
    EvaluationLink Dead $X

The former version, without variable arguments given, shows the possibility of using higher-order logical links to join predicates without any explicit variables. Via using this format exclusively, one could avoid VariableAtoms entirely, using only higher-order functions in the manner of pure functional programming formalisms like combinatory logic. However, this purely functional style has not proved convenient, so the Atomspace in practice combines functional-style representation with variable-based representation.

Temporal links often come with specific temporal quantification, e.g.

PredictiveImplicationLink <5 seconds>
    EvaluationLink JumpOffCliff $X
    EvaluationLink Dead $X

indicating that the conclusion will generally follow the premise within 5 seconds. There is a system for managing fuzzy time intervals and their interrelationships, based on a fuzzy version of Allen Interval Algebra.

SequentialANDLink is similar to PredictiveImplicationLink but its truth value is calculated differently. The truth value of

SequentialANDLink <5 seconds>
    EvaluationLink JumpOffCliff $X
    EvaluationLink Dead $X

indicates the likelihood of the sequence of events occurring in that order, with the gap between them lying within the specified time interval. The truth value of the PredictiveImplicationLink version indicates the likelihood of the second event, conditional on the occurrence of the first event (within the given time interval restriction).

There are also links representing basic temporal relationships, such as BeforeLink and AfterLink. These are used to refer to specific events, e.g. if X refers to the event of Ben waking up on July 15 2012, and Y refers to the event of Ben getting out of bed on July 15 2012, then one might have

AfterLink X Y

And there are TimeNodes (representing time-stamps such as temporal moments or intervals) and AtTimeLinks, so we may e.g. say

AtTimeLink
    X
    TimeNode: 8:24AM Eastern Standard Time, July 15 2012 AD

19.5.2 Associative Links

There are links representing associative, attentional relationships:

• HebbianLink
• AsymmetricHebbianLink
• InverseHebbianLink
• SymmetricInverseHebbianLink

These connote associations between their arguments, i.e. they connote that the entities represented by the two arguments occurred in the same situation or context, for instance

HebbianLink happy smiling
AsymmetricHebbianLink dead rotten
InverseHebbianLink dead breathing

The asymmetric HebbianLink indicates that when the first argument is present in a situation, the second is also often present. The symmetric (default) version indicates that this relationship holds in both directions. The inverse versions indicate the negative relationship: e.g. when one argument is present in a situation, the other argument is often not present.
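As a toy illustration (plain Python, not the ECAN/OpenCog machinery that actually maintains HebbianLink truth values; the situations and concepts are invented), one simple way such association strengths could be estimated is from co-occurrence statistics over observed situations:

from collections import Counter
from itertools import permutations

# Toy estimate of Hebbian-style association strengths from how often
# concepts co-occur in "situations".
situations = [
    {"happy", "smiling", "laughing"},
    {"happy", "smiling"},
    {"dead", "rotten"},
    {"dead", "rotten", "cold"},
    {"breathing", "happy"},
]

occur = Counter()       # how often each concept appears
co_occur = Counter()    # ordered pair counts, for asymmetric links

for s in situations:
    occur.update(s)
    co_occur.update(permutations(s, 2))

def asymmetric_hebbian(a, b):
    """P(b present | a present) -- strength of AsymmetricHebbianLink a b."""
    return co_occur[(a, b)] / occur[a] if occur[a] else 0.0

def symmetric_hebbian(a, b):
    """Symmetric (default) HebbianLink strength: average of both directions."""
    return 0.5 * (asymmetric_hebbian(a, b) + asymmetric_hebbian(b, a))

def inverse_hebbian(a, b):
    """InverseHebbianLink strength: how often b is absent when a is present."""
    return 1.0 - asymmetric_hebbian(a, b)

print(symmetric_hebbian("happy", "smiling"))   # high association
print(inverse_hebbian("dead", "breathing"))    # high inverse association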

19.5.3 Procedure Nodes

There are nodes representing various sorts of procedures; these are kinds of ProcedureNode, e.g.

• SchemaNode, indicating any procedure
• GroundedSchemaNode, indicating any procedure associated in the system with a Combo program or C++ function allowing the procedure to be executed
• PredicateNode, indicating any predicate that associates a list of arguments with an output truth value
• GroundedPredicateNode, indicating a predicate associated in the system with a Combo program or C++ function allowing the predicate's truth value to be evaluated on a given specific list of arguments

ExecutionLinks and EvaluationLinks record the activity of SchemaNodes and PredicateNodes. We have seen many examples of EvaluationLinks in the above. Example ExecutionLinks would be:

ExecutionLink step_forward

ExecutionLink step_forward 5

ExecutionLink +
    ListLink
        NumberNode: 2
        NumberNode: 3

The first example indicates that the schema "step forward" has been executed. The second example indicates that it has been executed with an argument of "5" (meaning, perhaps, that 5 steps forward have been attempted). The last example indicates that the "+" schema has been executed on the argument list (2,3), presumably resulting in an output of 5. The output of a schema execution may be indicated using an ExecutionOutputLink, e.g.

ExecutionOutputLink +
    ListLink
        NumberNode: 2
        NumberNode: 3

refers to the value "5" (as a NumberNode).

19.5.4 Links for Special External Data Types

Finally, there are also Atom types referring to specific types of data important to using OpenCog in specific contexts. For instance, there are Atom types referring to general natural language data types, such as

• WordNode
• SentenceNode
• WordInstanceNode
• DocumentNode

plus more specific ones referring to relationships that are part of link-grammar parses of sentences

• FeatureNode
• FeatureLink
• LinkGrammarRelationshipNode
• LinkGrammarDisjunctNode

or RelEx semantic interpretations of sentences

• DefinedLinguisticConceptNode
• DefinedLinguisticRelationshipNode
• PrepositionalRelationshipNode

There are also Atom types corresponding to entities important for embodying OpenCog in a virtual world, e.g.

• ObjectNode
• AvatarNode
• HumanoidNode
• UnknownObjectNode
• AccessoryNode

19.5.4.1 Truth Values and Attention Values

CogPrime Atoms (Nodes and Links) of various types are quantified with truth values that, in their simplest form, have two components, one representing probability (strength) and the other representing weight of evidence; and also with attention values that have two components, short-term and long-term importance, representing the estimated value of the Atom on immediate and long-term time-scales. In practice many Atoms are labeled with CompositeTruthValues rather than elementary ones. A composite truth value contains many component truth values, representing truth values of the Atom in different contexts and according to different estimators.

It is important to note that the CogPrime declarative knowledge representation is neither a neural net nor a semantic net, though it does have some commonalities with each of these traditional representations. It is not a neural net because it has no activation values, and involves no attempts at low-level brain modeling. However, attention values are very loosely analogous to time-averages of neural net activations. On the other hand, it is not a semantic net because of the broad scope of the Atoms in the network: for example, Atoms may represent percepts, procedures, or parts of concepts. Most CogPrime Atoms have no corresponding English label. However, most CogPrime Atoms do have probabilistic truth values, allowing logical semantics.
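To make this bookkeeping concrete, the following minimal Python sketch (an illustrative toy under simplifying assumptions, not the actual OpenCog data structures or API) shows one way an Atom could carry a simple (strength, confidence) truth value together with an (STI, LTI) attention value:

from dataclasses import dataclass, field

# Illustrative toy versions of CogPrime's valuations (not the OpenCog classes).

@dataclass
class SimpleTruthValue:
    strength: float      # estimated probability
    confidence: float    # derived from weight of evidence, in [0, 1]

@dataclass
class AttentionValue:
    sti: float = 0.0     # ShortTermImportance: value on immediate time-scales
    lti: float = 0.0     # LongTermImportance: value on long time-scales

@dataclass
class Atom:
    atom_type: str                                  # e.g. "ConceptNode", "InheritanceLink"
    name: str = ""
    outgoing: list = field(default_factory=list)    # target Atoms, for links
    tv: SimpleTruthValue = field(default_factory=lambda: SimpleTruthValue(1.0, 0.0))
    av: AttentionValue = field(default_factory=AttentionValue)

# "cat inherits from animal, with strength 0.9 and confidence 0.8"
cat = Atom("ConceptNode", "cat")
animal = Atom("ConceptNode", "animal")
inh = Atom("InheritanceLink", outgoing=[cat, animal],
           tv=SimpleTruthValue(0.9, 0.8))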

19.5.5 Glocal Memory

Now we move on from localized memory to the "glocal" coordination of local and global memory, which plays a large role in the OpenCog architecture. A glocal memory is one that transcends the global/local dichotomy and incorporates both aspects in a tightly interconnected way. The notion of glocal memory has implicitly occurred in a number of neural theories (without use of the neologism "glocal"), e.g. [Cal96] and [Goe01], but was never extensively developed prior to the advent of CogPrime theory. In [HG08, GPI+10] we describe glocality in attractor neural nets, a close analogue to CogPrime's attention allocation subsystem.

Glocal memory overcomes the dichotomy between localized memory (in which each memory item is stored in a single location within an overall memory structure) and global, distributed memory (in which a memory item is stored as an aspect of a multi-component memory system, in such a way that the same set of multiple components stores a large number of memories). In a glocal memory system, most memory items are stored both locally and globally, with the property that eliciting either one of the two records of an item tends to also elicit the other one. Glocal memory applies to multiple forms of memory; however we will focus largely on perceptual and declarative memory in our detailed analyses here, so as to conserve space and maintain simplicity of discussion.

The central idea of glocal memory is that (perceptual, declarative, episodic, procedural, etc.) items may be stored in memory in the form of paired structures that are called (key, map) pairs. Of course the idea of a "pair" is abstract, and such pairs may manifest themselves quite differently in different sorts of memory systems (e.g. brains versus non-neuromorphic AI systems). The key is a localized version of the item, and records some significant aspects of the item in a simple and crisp way. The map is a dispersed, distributed version of the item, which represents the item as a (to some extent, dynamically shifting) combination of fragments of other items. The map includes the key as a subset; activation of the key generally (but not necessarily always) causes activation of the map; and changes in the memory item will generally involve complexly coordinated changes on the key and map level both.

Memory is one area where animal brain architecture differs radically from the von Neumann architecture underlying nearly all contemporary general-purpose computers. Von Neumann computers separate memory from processing, whereas in the human brain there is no such distinction. Human memories are generally constructed in the course of remembering [Ros88], which gives human memory a strong capability for "filling in gaps" of remembered experience and knowledge; and also causes problems with inaccurate remembering in many contexts, e.g. [BF71, RM95]. We believe the constructive aspect of memory is largely associated with its glocality.

19.5.5.1 Neural-Symbolic Glocality in CogPrime

In CogPrime, we have explicitly sought to span the symbolic/emergentist pseudo-dichotomy, via creating an integrative knowledge representation that combines logic-based aspects with neural-net-like aspects. As reviewed above, these function not in the manner of a multimodular system, but rather via using (probabilistic logical) truth values and (attractor neural net like) attention values as weights on nodes and links of the same (hyper)graph. The nodes and links in this hypergraph are typed, like a standard semantic network approach for knowledge representation, so they're able to handle all sorts of knowledge, from the most concrete perception and actuation related knowledge to the most abstract relationships. But they're also weighted with values similar to neural net weights, and pass around quantities (importance values) similar to neural net activations, allowing emergent attractor/assembly based knowledge representation similar to attractor neural nets.

The concept of glocality lies at the heart of this combination, in a way that spans the pseudo-dichotomy:

• Local knowledge is represented in abstract logical relationships stored in explicit logical form, and also in Hebbian-type associations between nodes and links.
• Global knowledge is represented in large-scale patterns of node and link weights, which lead to large-scale patterns of network activity, which often take the form of attractors qualitatively similar to Hopfield net attractors. These attractors are called maps.

The result of all this is that a concept like "cat" might be represented as a combination of:

• A small number of logical relationships and strong associations, that constitute the "key" subnetwork for the "cat" concept.
• A large network of weak associations, binding together various nodes and links of various types and various levels of abstraction, representing the "cat map".

The activation of the key will generally cause the activation of the map, and the activation of a significant percentage of the map will cause the activation of the rest of the map, including the key. Furthermore, if the key were for some reason forgotten, then after a significant amount of effort, the system would likely be able to reconstitute it (perhaps with various small changes) from the information in the map. We conjecture that this particular kind of glocal memory will turn out to be very powerful for AGI, due to its ability to combine the strengths of formal logical inference with those of self-organizing attractor neural networks.

As a simple example, consider the representation of a "tower", in the context of an artificial agent that has built towers of blocks, and seen pictures of many other kinds of towers, and seen some tall buildings that it knows are somewhat like towers but perhaps not exactly towers. If this agent is reasonably conceptually advanced (say, at the Piagetan concrete operational level) then its mind will contain some declarative relationships partially characterizing the concept of "tower," as well as its sensory and episodic examples, and its procedural knowledge about how to build towers.
The key of the "tower" concept in the agent's mind may consist of internal images and episodes regarding the towers it knows best, the essential operations it knows are useful for building towers (piling blocks atop blocks atop blocks...), and the core declarative relations summarizing "towerness" – and the whole "tower" map then consists of a much larger number of images, episodes, procedures and declarative relationships connected to "tower" and other related entities. If any portion of the map is removed – even if the key is removed – then the rest of the map can be approximately reconstituted, after some work. Some cognitive operations are best done on the localized representation – e.g. logical reasoning. Other operations, such as attention allocation and guidance of inference control, are best done using the globalized map representation.
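A minimal Python sketch of this key/map behavior (purely illustrative – the items and the activation threshold are invented, and real map activation in CogPrime is driven by ECAN dynamics rather than a simple overlap test) might look like:

# Toy sketch of glocal retrieval: a concept is stored as a compact "key"
# plus a diffuse "map" of weakly associated items; activating the key, or
# enough of the map, tends to activate the rest of the map.

tower_key = {"pile blocks", "tall narrow shape", "tower image #1"}
tower_map = tower_key | {"castle turret", "skyscraper", "block-stacking episode",
                         "lighthouse", "minaret", "falling blocks episode"}

def spread_activation(active, map_items, threshold=0.3):
    """If a large enough fraction of the map is active, activate all of it."""
    overlap = len(active & map_items) / len(map_items)
    return set(map_items) if overlap >= threshold else set(active)

# Activating only the key suffices to recall the whole "tower" map:
recalled = spread_activation(tower_key, tower_map)
print(len(recalled) == len(tower_map))   # True

# Even with the key removed, a large enough chunk of the map reconstitutes it:
partial = {"skyscraper", "lighthouse", "minaret", "castle turret"}
print(tower_key <= spread_activation(partial, tower_map))   # True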

19.6 Memory Types and Associated Cognitive Processes in CogPrime

Now we dig deeper into the internals of the CogPrime approach, turning to aspects of the relationship between structure and dynamics. Architecture diagrams are all very well, but ultimately it is dynamics that makes an architecture come alive. Intelligence is all about learning, which is by definition about change, about dynamical response to the environment and internal self-organizing dynamics.

CogPrime relies on multiple memory types and, as discussed above, is founded on the premise that the right course in architecting a pragmatic, roughly human-like AGI system is to handle different types of memory differently in terms of both structure and dynamics. CogPrime's memory types are the declarative, procedural, sensory, and episodic memory types that are widely discussed in cognitive neuroscience [TC05], plus attentional memory for allocating system resources generically, and intentional memory for allocating system resources in a goal-directed way. Table 19.1 overviews these memory types, giving key references and indicating the corresponding cognitive processes, and also indicating which of the generic patternist cognitive dynamics each cognitive process corresponds to (pattern creation, association, etc.). Figure 19.7 illustrates the relationships between several of the key memory types in the context of a simple situation involving an OpenCogPrime-controlled agent in a virtual world.

In terms of patternist cognitive theory, the multiple types of memory in CogPrime should be considered as specialized ways of storing particular types of pattern, optimized for spacetime efficiency. The cognitive processes associated with a certain type of memory deal with creating and recognizing patterns of the type for which the memory is specialized. While in principle all the different sorts of pattern could be handled in a unified memory and processing architecture, the sort of specialization used in CogPrime is necessary in order to achieve acceptably efficient general intelligence using currently available computational resources. And as we have argued in detail in [Goe10b], efficiency is not a side-issue but rather the essence of real-world AGI (since as Hutter has shown, if one casts efficiency aside, arbitrary levels of general intelligence can be achieved via a trivially simple program).

The essence of the CogPrime design lies in the way the structures and processes associated with each type of memory are designed to work together in a closely coupled way, yielding cooperative intelligence going beyond what could be achieved by an architecture merely containing the same structures and processes in separate "black boxes." The inter-cognitive-process interactions in OpenCog are designed so that

• conversion between different types of memory is possible, though sometimes computationally costly (e.g. an item of declarative knowledge may with some effort be interpreted procedurally or episodically, etc.)
• when a learning process concerned centrally with one type of memory encounters a situation where it learns very slowly, it can often resolve the issue by converting some of the relevant knowledge into a different type of memory: i.e. cognitive synergy

Memory Type | Specific Cognitive Processes | General Cognitive Functions
Declarative | Probabilistic Logic Networks (PLN) [GMIH08]; conceptual blending [FT02] | pattern creation
Procedural | MOSES (a novel probabilistic evolutionary program learning algorithm) [Loo06] | pattern creation
Episodic | internal simulation engine [GEA08] | association, pattern creation
Attentional | Economic Attention Networks (ECAN) [GPI+10] | association, credit assignment
Intentional | probabilistic goal hierarchy refined by PLN and ECAN, structured according to MicroPsi [Bac09] | credit assignment, pattern creation
Sensory | In CogBot, this will be supplied by the DeSTIN component | association, attention allocation, pattern creation, credit assignment

Table 19.1: Memory Types and Cognitive Processes in CogPrime. The third column indicates the general cognitive function that each specific cognitive process carries out, according to the patternist theory of cognition.

19.6.1 Cognitive Synergy in PLN

To put a little meat on the bones of the "cognitive synergy" idea mentioned above, we now elaborate a little on the role it plays in the interaction between procedural and declarative learning. While MOSES handles much of CogPrime's procedural learning, and CogPrime's internal simulation engine handles most episodic knowledge, CogPrime's primary tool for handling declarative knowledge is an uncertain inference framework called Probabilistic Logic Networks (PLN). The complexities of PLN are the topic of a lengthy technical monograph [GMIH08]; here we will eschew most details and focus mainly on pointing out how PLN seeks to achieve efficient inference control via integration with other cognitive processes.

As a logic, PLN is broadly integrative: it combines certain term logic rules with more standard predicate logic rules, and utilizes both fuzzy truth values and a variant of imprecise probabilities called indefinite probabilities. PLN mathematics tells how these uncertain truth values propagate through its logic rules, so that uncertain premises give rise to conclusions with reasonably accurately estimated uncertainty values. This careful management of uncertainty is critical for the application of logical inference in the robotics context, where most knowledge is abstracted from experience and is hence highly uncertain.

PLN can be used in either forward or backward chaining mode; and in the language introduced above, it can be used for either analysis or synthesis. As an example, we will consider backward chaining analysis, exemplified by the problem of a robot preschool-student trying to determine whether a new playmate "Bob" is likely to be a regular visitor to its preschool or not (evaluating the truth value of the implication Bob → regular_visitor). The basic backward chaining process for PLN analysis looks like:

1. Given an implication L ≡ A → B whose truth value must be estimated (for instance L ≡ Context ∧ Procedure → Goal as discussed above), create a list (A1, ..., An) of (inference rule, stored knowledge) pairs that might be used to produce L
2. Using analogical reasoning based on prior inferences, assign each Ai a probability of success

• If some of the Ai are estimated to have reasonable probability of success at generating reasonably confident estimates of L's truth value, then invoke Step 1 with Ai in place of L (at this point the inference process becomes recursive)
• If none of the Ai looks sufficiently likely to succeed, then inference has "gotten stuck" and another cognitive process should be invoked, e.g.
  – Concept creation may be used to infer new concepts related to A and B, and then Step 1 may be revisited, in the hope of finding a new, more promising Ai involving one of the new concepts
  – MOSES may be invoked with one of several special goals, e.g. the goal of finding a procedure P so that P(X) predicts whether X → B. If MOSES finds such a procedure P then this can be converted to declarative knowledge understandable by PLN and Step 1 may be revisited....
  – Simulations may be run in CogPrime's internal simulation engine, so as to observe the truth value of A → B in the simulations; and then Step 1 may be revisited....

The combinatorial explosion of inference control is combatted by the capability to defer to other cognitive processes when the inference control procedure is unable to make a sufficiently confident choice of which inference steps to take next. Note that just as MOSES may rely on PLN to model its evolving populations of procedures, PLN may rely on MOSES to create complex knowledge about the terms in its logical implications. This is just one example of the multiple ways in which the different cognitive processes in CogPrime interact synergetically; a more thorough treatment of these interactions is given in [Goe09a].

In the "new playmate" example, the interesting case is where the robot initially seems not to know enough about Bob to make a solid inferential judgment (so that none of the Ai seem particularly promising). For instance, it might carry out a number of possible inferences and not come to any reasonably confident conclusion, so that the reason none of the Ai seem promising is that all the decent-looking ones have been tried already. So it might then have recourse to MOSES, simulation or concept creation. For instance, the PLN controller could make a list of everyone who has been a regular visitor, and everyone who has not been, and pose MOSES the task of figuring out a procedure for distinguishing these two categories. This procedure could then be used directly to make the needed assessment, or else be translated into logical rules to be used within PLN inference. For example, perhaps MOSES would discover that older males wearing ties tend not to become regular visitors. If the new playmate is an older male wearing a tie, this is directly applicable. But if the current playmate is wearing a tuxedo, then PLN may be helpful via reasoning that even though a tuxedo is not a tie, it's a similar form of fancy dress – so PLN may extend the MOSES-learned rule to the present case and infer that the new playmate is not likely to be a regular visitor.
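The following toy Python sketch conveys the flavor of this backward-chaining control flow, including the point at which inference gets "stuck" and other cognitive processes would be invoked. It is purely illustrative: the knowledge base entries are invented, and the naive product-based deduction stands in for PLN's actual inference rules and truth value formulas.

# Toy backward chainer (illustrative only, not the actual PLN implementation).
# Knowledge base: implication links "A -> B" with probabilistic strengths.
KB = {
    ("wears_tie", "not_regular_visitor"): 0.8,
    ("older_male", "wears_tie"): 0.6,
    ("Bob", "older_male"): 0.9,
}

def backward_chain(source, target, depth=0, max_depth=4):
    """Estimate the strength of `source -> target` by chaining deductions.
    Returns None if no inference trail is found (i.e. inference is 'stuck')."""
    if (source, target) in KB:
        return KB[(source, target)]
    if depth >= max_depth:
        return None
    # Try every stored premise source -> mid, then recurse on mid -> target,
    # keeping the strongest trail (a crude stand-in for PLN's rule ranking).
    best = None
    for (s, mid), strength in list(KB.items()):
        if s != source:
            continue
        rest = backward_chain(mid, target, depth + 1, max_depth)
        if rest is not None:
            est = strength * rest          # naive deduction formula
            best = est if best is None else max(best, est)
    return best

estimate = backward_chain("Bob", "not_regular_visitor")
if estimate is None:
    # In CogPrime, this is where other processes (MOSES, simulation,
    # concept creation) would be invoked to supply new knowledge.
    print("stuck -- defer to other cognitive processes")
else:
    print(f"Bob -> not_regular_visitor estimated at {estimate:.2f}")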

19.7 Goal-Oriented Dynamics in CogPrime

CogPrime's dynamics has both goal-oriented and "spontaneous" aspects; here for simplicity's sake we will focus more heavily on the goal-oriented ones, although both are important. The basic goal-oriented dynamic of the CogPrime system, within which the various types of memory are utilized, is driven by implications known as "cognitive schematics", which take the form

Context ∧ Procedure → Goal <p>

(summarized C ∧ P → G). Semi-formally, this implication may be interpreted to mean: "If the context C appears to hold currently, then if I enact the procedure P, I can expect to achieve the goal G with certainty represented by truth value object p." Cognitive synergy means that the learning processes corresponding to the different types of memory actively cooperate in figuring out what procedures will achieve the system's goals in the relevant contexts within its environment.

CogPrime's cognitive schematic is significantly similar to production rules in classical architectures like SOAR and ACT-R; however, there are significant differences which are important to CogPrime's functionality. Unlike in classical production rule systems, uncertainty is core to CogPrime's knowledge representation, and each CogPrime cognitive schematic is labeled with an uncertain truth value, which is critical to its utilization by CogPrime's cognitive processes. Also, in CogPrime, cognitive schematics may be incomplete, missing one or two of the terms, which may then be filled in by various cognitive processes (generally in an uncertain way). A stronger similarity is to MicroPsi's triplets; the differences in this case are more low-level and technical and are reviewed in [? ].

Finally, the biggest difference between CogPrime's cognitive schematics and production rules or other similar constructs is that in CogPrime this level of knowledge representation is not the only important one. CLARION [SZ04], as reviewed above, is an example of a cognitive architecture that uses production rules for explicit knowledge representation and then uses a totally separate subsymbolic knowledge store for implicit knowledge. In CogPrime, both explicit and implicit knowledge are stored in the same graph of nodes and links, with

• explicit knowledge stored in probabilistic logic based nodes and links such as cognitive schematics (see Figure 19.8 for a depiction of some explicit linguistic knowledge)
• implicit knowledge stored in patterns of activity among these same nodes and links, defined via the activity of the "importance" values (see Figure 19.9 for an illustrative example thereof) associated with nodes and links and propagated by the ECAN attention allocation process

The meaning of a cognitive schematic in CogPrime is hence not entirely encapsulated in its explicit logical form, but resides largely in the activity patterns that ECAN causes its activation or exploration to give rise to. And this fact is important because the synergetic interactions of system components are in large part modulated by ECAN activity. Without the real-time combination of explicit and implicit knowledge in the system's memory, the synergetic interaction of different cognitive processes would not work so smoothly, and the emergence of effective high-level hierarchical, heterarchical and self structures would be less likely.
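As a toy illustration of how cognitive schematics might be used at the explicit level – a Python sketch with invented schematics and a simplistic confidence-weighted score, not the actual OpenCog action-selection code – a system holding several C ∧ P → G implications can pick the procedure whose schematic best matches the current context and goal:

# Toy selection among cognitive schematics Context AND Procedure -> Goal <p>.
schematics = [
    # (context, procedure, goal, strength, confidence)
    ("owner_says_fetch", "fetch_ball", "please_owner", 0.8, 0.7),
    ("owner_says_fetch", "bark",       "please_owner", 0.2, 0.9),
    ("friendly_person",  "beg",        "get_food",     0.7, 0.6),
]

def choose_procedure(current_context, goal):
    """Pick the procedure with the highest confidence-weighted strength."""
    matching = [s for s in schematics if s[0] == current_context and s[2] == goal]
    if not matching:
        return None     # schematic incomplete: other processes must fill it in
    return max(matching, key=lambda s: s[3] * s[4])[1]

print(choose_procedure("owner_says_fetch", "please_owner"))   # -> fetch_ball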

19.7.1 Analysis and Synthesis Processes in CogPrime

The cognitive schematic Context ∧ Procedure → Goal leads to a conceptualization of the internal action of an intelligent system as involving two key categories of learning:

• Analysis: Estimating the probability p of a posited C ∧ P → G relationship
• Synthesis: Filling in one or two of the variables in the cognitive schematic, given assumptions regarding the remaining variables, and directed by the goal of maximizing the probability of the cognitive schematic

To further flesh these ideas out, we will now use examples from the "virtual dog" application to motivate the discussion. For example, where synthesis is concerned,

• The MOSES probabilistic evolutionary program learning algorithm is applied to find P, given fixed C and G. Internal simulation is also used, for the purpose of creating a simulation embodying C and seeing which P lead to the simulated achievement of G.
  – Example: A virtual dog learns a procedure P to please its owner (the goal G) in the context C where there is a ball or stick present and the owner is saying "fetch".
• PLN inference, acting on declarative knowledge, is used for choosing C, given fixed P and G (also incorporating sensory and episodic knowledge as appropriate). Simulation may also be used for this purpose.
  – Example: A virtual dog wants to achieve the goal G of getting food, and it knows that the procedure P of begging has been successful at this before, so it seeks a context C where begging can be expected to get it food. Probably this will be a context involving a friendly person.

• PLN-based goal refinement is used to create new subgoals G to sit on the right hand side of instances of the cognitive schematic.
  – Example: Given that a virtual dog has a goal of finding food, it may learn a subgoal of following other dogs, due to observing that other dogs are often heading toward their food.
• Concept formation heuristics are used for choosing G and for fueling goal refinement, but especially for choosing C (via providing new candidates for C). They are also used for choosing P, via a process called "predicate schematization" that turns logical predicates (declarative knowledge) into procedures.

  – Example: At first a virtual dog may have a hard time predicting which other dogs are going to be mean to it. But it may eventually observe common features among a number of mean dogs, and thus form its own concept of "pit bull," without anyone ever teaching it this concept explicitly.

Where analysis is concerned:

• PLN inference, acting on declarative knowledge, is used for estimating the probability of the implication in the cognitive schematic, given fixed C, P and G. Episodic knowledge is also used in this regard, via enabling estimation of the probability via simple similarity matching against past experience. Simulation is also used: multiple simulations may be run, and statistics may be captured therefrom.
  – Example: To estimate the degree to which asking Bob for food (the procedure P is "asking for food", the context C is "being with Bob") will achieve the goal G of getting food, the virtual dog may study its memory to see what happened on previous occasions where it or other dogs asked Bob for food or other things, and then integrate the evidence from these occasions.
• Procedural knowledge, mapped into declarative knowledge and then acted on by PLN inference, can be useful for estimating the probability of the implication C ∧ P → G, in cases where the probability of C ∧ P1 → G is known for some P1 related to P.
  – Example: knowledge of the internal similarity between the procedure of asking for food and the procedure of asking for toys allows the virtual dog to reason that if asking Bob for toys has been successful, maybe asking Bob for food will be successful too.
• Inference, acting on declarative or sensory knowledge, can be useful for estimating the probability of the implication C ∧ P → G, in cases where the probability of C1 ∧ P → G is known for some C1 related to C.
  – Example: if Bob and Jim have a lot of features in common, and Bob often responds positively when asked for food, then maybe Jim will too.
• Inference can be used similarly for estimating the probability of the implication C ∧ P → G, in cases where the probability of C ∧ P → G1 is known for some G1 related to G. Concept creation can be useful indirectly in calculating these probability estimates, via providing new concepts that can be used to make useful inference trails more compact and hence easier to construct.
  – Example: The dog may reason that because Jack likes to play, and Jack and Jill are both children, maybe Jill likes to play too. It can carry out this reasoning only if its concept creation process has invented the concept of "child" via analysis of observed data.

In these examples we have focused on cases where two terms in the cognitive schematic are fixed and the third must be filled in; but just as often, the situation is that only one of the terms is fixed. For instance, if we fix G, sometimes the best approach will be to collectively learn C and P. This requires either a procedure learning method that works interactively with a declarative-knowledge-focused concept learning or reasoning method; or a declarative learning method that works interactively with a procedure learning method. That is, it requires the sort of cognitive synergy built into the CogPrime design.

19.8 Clarifying the Key Claims

Having overviewed some of the basics of the CogPrime design, we now return to the "key claims" that were listed at the end of the Introduction. As noted there, this is a list of claims such that

– roughly speaking – if the reader accepts these claims, they should accept that the CogPrime approach to AGI is a viable one. On the other hand, if the reader rejects one or more of these claims, they may well find one or more aspects of CogPrime unacceptable for some related reason. Above, we merely listed these claims; here we briefly discuss each one in the context of CogPrime concepts presented in the intervening sections.

As we clarified in the Introduction, we don't fancy that we have provided an ironclad argument that the CogPrime approach to AGI is guaranteed to work as hoped, once it's fully engineered, tuned and taught. Mathematics isn't yet adequate to analyze the real-world behavior of complex systems like these 3; and we have not yet implemented, tested and taught enough of CogPrime to provide convincing empirical validation. So, most of the claims listed here have not been rigorously demonstrated, but only heuristically argued for. That is the reality of AGI work right now: one assembles a design based on the best combination of rigorous and heuristic arguments one can, then proceeds to create and teach a system according to the design, adjusting the details of the design based on experimental results as one goes along.

3 Although an Appendix of [? ] gives a list of formal propositions echoing many of the ideas in the chapter – propositions such that, if they are true, then the success of CogPrime as an architecture for general intelligence is likely.

19.8.1 Multi-Memory Systems

The first of our key claims is that to achieve general intelligence in the context of human-intelligence-friendly environments and goals using feasible computational resources, it's important that an AGI system can handle different kinds of memory (declarative, procedural, episodic, sensory, intentional, attentional) in customized but interoperable ways. The basic idea is that these different kinds of knowledge have very different characteristics, so that trying to handle them all within a single approach, while surely possible, is likely to be unacceptably inefficient.

The tricky issue in formalizing this claim is that "single approach" is an ambiguous notion: for instance, if one has a wholly logic-based system that represents all forms of knowledge using predicate logic, then one may still have specialized inference control heuristics corresponding to the different kinds of knowledge mentioned in the claim. In this case one has "customized but interoperable ways" of handling the different kinds of memory, and one doesn't really have a "single approach" even though one is using logic for everything. To bypass such conceptual difficulties, one may formalize cognitive synergy using a geometric framework as discussed in [GI11], in which different types of knowledge are represented as metrized categories, and cognitive synergy becomes a statement about paths to goals being shorter in metric spaces combining multiple knowledge types than in those corresponding to individual knowledge types.

In CogPrime we use a complex combination of representations, including the Atomspace for declarative, attentional and intentional knowledge and some episodic and sensorimotor knowledge, Combo programs for procedural knowledge, simulations for episodic knowledge, and hierarchical neural nets for some sensorimotor knowledge (and related episodic, attentional and intentional knowledge). In cases where the same representational mechanism is used for different types of knowledge, different cognitive processes are used, and often different aspects of the representation (e.g. attentional knowledge is dealt with largely by ECAN acting on AttentionValues and HebbianLinks in the Atomspace; whereas declarative knowledge is dealt with largely by PLN acting on TruthValues and logical links, also in the AtomSpace). So one has

a mix of the "different representations for different memory types" approach and the "different control processes on a common representation for different memory types" approach.

It's unclear how closely dependent the need for a multi-memory approach is on the particulars of "human-friendly environments." We have argued in [Goe09b] that one factor militating in favor of a multi-memory approach is the need for multimodal communication: declarative knowledge relates to linguistic communication; procedural knowledge relates to demonstrative communication; attentional knowledge relates to indicative communication; and so forth. But in fact the multi-memory approach may have a broader importance, even to intelligences without multimodal communication. This is an interesting issue but not particularly critical to the development of human-like, human-level AGI, since in the latter case we are specifically concerned with creating intelligences that can handle multimodal communication. So if for no other reason, the multi-memory approach is worthwhile for handling multimodal communication.

Pragmatically, it is also quite clear that the human brain takes a multi-memory approach, e.g. with the cerebellum and closely linked cortical regions containing special structures for handling procedural knowledge, with special structures for handling motivational (intentional) factors, etc. And (though this point is certainly not definitive, it's meaningful in the light of the above theoretical discussion) decades of computer science and narrow-AI practice strongly suggest that the "one memory structure fits all" approach is not capable of leading to effective real-world approaches.

19.8.2 Perception, Action and Environment

The more we understand of human intelligence, the clearer it becomes how closely it has evolved to match the particular goals and environments for which the human organism evolved. This is true in a broad sense, as illustrated by the above issues regarding multi-memory systems, and is also true in many particulars, as illustrated e.g. by Changizi's [Cha09] evolutionary analysis of the human visual system. While it might be possible to create a human-like, human-level AGI by abstracting the relevant biases from human biology and behavior and explicitly encoding them in one's AGI architecture, it seems this would be an inordinately difficult approach in practice, leading to the claim that to achieve human-like general intelligence, it's important for an intelligent agent to have sensory data and motoric affordances that roughly emulate those available to humans. We don't claim this is a necessity – just a dramatic convenience. And if one accepts this point, it has major implications for what sorts of paths toward AGI it makes most sense to follow.

Unfortunately, though, the idea of a "human-like" set of goals and environments is fairly vague; and when you come right down to it, we don't know exactly how close the emulation needs to be to form a natural scenario for the maturation of human-like, human-level AGI systems. One could attempt to resolve this issue via a priori theory, but given the current level of scientific knowledge it's hard to see how that would be possible in any definitive sense ... which leads to the conclusion that our AGI systems and platforms need to support fairly flexible experimentation with virtual-world and/or robotic infrastructures.

Our own intuition is that neither current virtual world platforms nor current robotic platforms are quite adequate for the development of human-level, human-like AGI. Virtual worlds would need to become a lot more like robot simulators, allowing more flexible interaction with the environment, and more detailed control of the agent. Robots would need to become more robust at moving and grabbing – e.g. with Big Dog's movement ability but the grasping capability of the best current grabber arms. We do feel that development of adequate virtual world or robotics platforms is quite possible using current technology, and could be done at fairly low cost if someone were to prioritize this. Even without AGI-focused prioritization, it seems that the needed technological improvements are likely to happen during the next decade for other reasons. So at this point we feel it makes sense for AGI researchers to focus on AGI and exploit embodiment-platform improvements as they come along – at least, this makes sense in the case of AGI approaches (like CogPrime) that can be primarily developed in an embodiment-platform-independent manner.

19.8.3 Developmental Pathways

But if an AGI system is going to live in human-friendly environments, what should it do there? No doubt very many pathways leading from incompetence to adult-human-level general intelligence exist, but one of them is much better understood than any of the others, and that's the one normal human children take. Of course, given their somewhat different embodiment, it doesn't make sense to try to force AGI systems to take exactly the same path as human children, but having AGI systems follow a fairly close approximation to the human developmental path seems the smoothest developmental course ... a point summarized by the claim that: To work toward adult human-level, roughly human-like general intelligence, one fairly easily comprehensible path is to use environments and goals reminiscent of human childhood, and seek to advance one's AGI system along a path roughly comparable to that followed by human children.

Human children learn via a rich variety of mechanisms; but broadly speaking one conclusion one may draw from studying human child learning is that it may make sense to teach an AGI system aimed at roughly human-like general intelligence via a mix of spontaneous learning and explicit instruction, and to instruct it via a combination of imitation, reinforcement and correction, and a combination of linguistic and nonlinguistic instruction. We have explored exactly what this means in [GEA08] and other prior experiments, via looking at examples of these types of learning in the context of virtual pets in virtual worlds, and exploring how specific CogPrime learning mechanisms can be used to achieve simple examples of these types of learning.

One important case of learning that human children are particularly good at is language learning; and in [Goe08] we have argued that this is a case where it may pay for AGI systems to take a route somewhat different from the one taken by human children. Humans seem to be born with a complex system of biases enabling effective language learning, and it's not yet clear exactly what these biases are nor how they're incorporated into the learning process. It is very tempting to give AGI systems a "short cut" to language proficiency via making use of existing rule-based and statistical-corpus-analysis-based NLP systems; and we have fleshed out this approach sufficiently to have convinced ourselves it makes practical as well as conceptual sense, in the context of the specific learning mechanisms and NLP tools built into OpenCog. Thus we have provided a number of detailed arguments and suggestions in support of our claim that one effective approach to teaching an AGI system human language is to supply it with some in-built linguistic facility, in the form of rule-based and statistical-linguistics-based NLP systems, and then allow it to improve and revise this facility based on experience.

19.8.4 Knowledge Representation

Many knowledge representation approaches have been explored in the AI literature, and ultimately many of these could be workable for human-level AGI if coupled with the right cognitive processes. The key goal for a knowledge representation for AGI should be naturalness with respect to the AGI's cognitive processes – i.e. the cognitive processes shouldn't need to undergo complex transformative gymnastics to get information in and out of the knowledge representation in order to do their cognitive work. Toward this end we have come to a similar conclusion to some other researchers (e.g. Joscha Bach and Stan Franklin), and concluded that given the strengths and weaknesses of current and near-future digital computers, a (loosely) neural-symbolic network is a good representation for directly storing many kinds of memory, and interfacing between those that it doesn't store directly. CogPrime's AtomSpace is a neural-symbolic network designed to work nicely with PLN, MOSES, ECAN and the other key CogPrime cognitive processes; it supplies them with what they need without causing them undue complexities. It provides a platform that these cognitive processes can use to adaptively, automatically construct specialized knowledge representations for particular sorts of knowledge that they encounter.

19.8.5 Cognitive Processes

The crux of intelligence is dynamics, learning, adaptation; and so the crux of an AGI design is the set of cognitive processes that the design provides. These processes must collectively allow the AGI system to achieve its goals in its environments using the resources at hand. Given CogPrime's multi-memory design, it's natural to consider CogPrime's cognitive processes in terms of which memory subsystems they focus on (although this is not a perfect mode of analysis, since some of the cognitive processes span multiple memory types).

19.8.5.1 Uncertain Logic for Declarative Knowledge

One major decision made in the creation of CogPrime was that given the strengths and weaknesses of current and near-future digital computers, uncertain logic is a good way to handle declarative knowledge. Of course this is not obvious nor is it the only possible route. Declarative knowledge can potentially be handled in other ways; e.g. in a hierarchical network architecture, one can make declarative knowledge emerge automatically from procedural and sensorimotor knowledge, as is the goal in the HTM and DeSTIN designs, for example. It seems clear that the human brain doesn't contain anything closely parallel to formal logic – even though one can ground logic operations in neural-net dynamics as explored in [? ], this sort of grounding leads to "uncertain logic enmeshed with a host of other cognitive dynamics" rather than "uncertain logic as a cleanly separable cognitive process." But contemporary digital computers are not brains – they lack the human brain's capacity for cheap massive parallelism, but have a capability for single-operation speed and precision far exceeding the brain's. In this way computers and formal logic are a natural match (a fact that's not surprising given that Boolean logic lies at the foundation of digital computer operations). Using uncertain logic is a sort of compromise between brainlike messiness and fuzziness, and computerlike precision. An alternative to using uncertain logic is using crisp logic and incorporating uncertainty as content within the knowledge base – this is what SOAR [Lai12] does, for example, and it's not a wholly unworkable approach. But given that the vast mass of knowledge needed for confronting everyday human reality is highly uncertain, and that this knowledge often needs to be manipulated efficiently in real-time, it seems to us there is a strong argument for embedding uncertainty in the logic.

Many approaches to uncertain logic exist in the literature, including probabilistic and fuzzy approaches, and one conclusion we reached in formulating CogPrime is that none of them was adequate on its own – leading us, for example, to the conclusion that to deal with the problems facing a human-level AGI, an uncertain logic must integrate imprecise probability and fuzziness with a broad scope of logical constructs. The arguments that both fuzziness and probability are needed seem hard to counter – these two notions of uncertainty are qualitatively different yet both appear cognitively necessary. The argument for using probability in an AGI system is assailed by some AGI researchers such as Pei Wang, but we are swayed by the theoretical arguments in favor of probability theory's mathematically fundamental nature, as well as the massive demonstrated success of probability theory in various areas of narrow AI and applied science. However, we are also swayed by the arguments of Pei Wang, Peter Walley and others that using single-number probabilities to represent truth values leads to untoward complexities related to the tabulation and manipulation of amounts of evidence. This has led us to an imprecise probability based approach; and then technical arguments regarding the limitations of standard imprecise probability formalisms have led us to develop our own "indefinite probabilities" formalism. The PLN logic framework is one way of integrating imprecise probability and fuzziness in a logical formalism that encompasses a broad scope of logical constructs.
It integrates term logic and predicate logic – a feature that we consider not necessary, but very convenient, for AGI. Either predicate or term logic on its own would suffice, but each is awkward in certain cases, and integrating them as done in PLN seems to result in more elegant handling of real-world inference scenarios. Finally, PLN also integrates intensional inference in an elegant manner that demonstrates integrative intelligence – it defines intension using pattern theory, which binds inference to pattern recognition and hence to other cognitive processes in a conceptually appropriate way. Clearly PLN is not the only possible logical formalism capable of serving a human-level AGI system; however, we know of no other existing, fleshed-out formalism capable of fitting the bill. In part this is because PLN has been developed as part of an integrative AGI project whereas other logical formalisms have mainly been developed for other purposes, or purely theoretically. Via using PLN to control virtual agents, and integrating PLN with other cognitive processes, we have tweaked and expanded the PLN formalism to serve all the roles required of the “declarative cognition” component of an AGI system with reasonable elegance and effectiveness.

19.8.5.2 Program Learning for Procedural Knowledge

Even more so than declarative knowledge, procedural knowledge is represented in many different ways in the AI literature. The human brain also apparently uses multiple mechanisms to embody different kinds of procedures. So the choice of how to represent procedures in an AGI system is not particularly obvious. However, there is one particular representation of procedures that is particularly well-suited for current computer systems, and particularly well-tested in this context: programs. In designing CogPrime, we have acted based on the understanding that programs are a good way to represent procedures – including both cognitive and physical-action procedures, but perhaps not including low-level motor-control procedures.

Of course, this begs the question of programs in what language, and in this context we have made a fairly traditional choice, using a special language called Combo that is essentially a minor variant of LISP, and supplying Combo with a set of customized primitives intended to reduce the length of the typical programs CogPrime needs to learn and use. What differentiates this use of LISP from many traditional uses of LISP in AI is that we are only using the LISP-ish representational style for procedural knowledge, rather than trying to use it for everything.

One test of whether the use of Combo programs to represent procedural knowledge makes sense is whether the procedures useful for a CogPrime system in everyday human environments have short Combo representations. We have worked with Combo enough to validate that they generally do in the virtual world environment – and also in the physical-world environment if lower-level motor procedures are supplied as primitives. That is, we are not convinced that Combo is a good representation for the procedure a robot needs in order to move its fingers to pick up a cup, coordinating its movements with its visual perceptions. It's certainly possible to represent this sort of thing in Combo, but Combo may be an awkward tool. However, if one represents low-level procedures like this using another method, e.g. learned cell assemblies in a hierarchical network like DeSTIN, then it's very feasible to make Combo programs that invoke these low-level procedures, and encode higher-level actions like "pick up the cup in front of you slowly and quietly, then hand it to Jim who is standing next to you."

Having committed to use programs to represent many procedures, the next question is how to learn programs. One key conclusion we have come to via our empirical work in this area is that some form of powerful program normalization is essential. Without normalization, it's too hard for existing learning algorithms to generalize from known, tested programs and draw useful uncertain conclusions about untested ones. We have worked extensively with a generalization of Holman's "Elegant Normal Form" in this regard. For learning normalized programs, we have come to the following conclusions:

• for relatively straightforward procedure learning problems, hillclimbing with random restart and a strong Occam bias is an effective method (see the toy sketch at the end of this subsection)
• for more difficult problems that elude hillclimbing, probabilistic evolutionary program learning is an effective method

The probabilistic evolutionary program learning method we have worked with most in OpenCog is MOSES, and significant evidence has been gathered showing it to be dramatically more effective than genetic programming on relevant classes of problems.
However, more work needs to be done to evaluate its progress on complex and difficult procedure learning problems. Alternate, related probabilistic evolutionary program learning algorithms such as PLEASURE have also been considered and may be implemented and tested as well.
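To make this concrete, here is a minimal, purely illustrative sketch (in Python rather than Combo, and not drawn from the actual MOSES or OpenCog code) of hillclimbing with random restart over a toy space of boolean program trees, using an explicit Occam bias that penalizes program size. The program representation, primitives and scoring weights are assumptions invented for the example.

import random

# Illustrative sketch only: a toy "program" is a nested tuple over boolean
# primitives, standing in for a Combo expression tree. The representation,
# primitives and scoring below are assumptions for illustration, not the
# actual Combo/MOSES code.

OPS = ["AND", "OR", "NOT"]

def random_program(depth=2):
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", "y", "z"])
    op = random.choice(OPS)
    if op == "NOT":
        return (op, random_program(depth - 1))
    return (op, random_program(depth - 1), random_program(depth - 1))

def evaluate(prog, env):
    if isinstance(prog, str):
        return env[prog]
    op = prog[0]
    if op == "NOT":
        return not evaluate(prog[1], env)
    a, b = evaluate(prog[1], env), evaluate(prog[2], env)
    return (a and b) if op == "AND" else (a or b)

def size(prog):
    return 1 if isinstance(prog, str) else 1 + sum(size(p) for p in prog[1:])

def fitness(prog, cases, occam_weight=0.05):
    # Accuracy on the training cases, minus an explicit Occam penalty on size.
    acc = sum(evaluate(prog, env) == target for env, target in cases) / len(cases)
    return acc - occam_weight * size(prog)

def mutate(prog):
    # Replace a random subtree with a fresh random one.
    if isinstance(prog, str) or random.random() < 0.3:
        return random_program()
    i = random.randrange(1, len(prog))
    return prog[:i] + (mutate(prog[i]),) + prog[i + 1:]

def hillclimb(cases, restarts=10, steps=200):
    best, best_f = None, float("-inf")
    for _ in range(restarts):                      # random restart
        cur = random_program()
        cur_f = fitness(cur, cases)
        for _ in range(steps):                     # greedy hillclimbing
            cand = mutate(cur)
            cand_f = fitness(cand, cases)
            if cand_f > cur_f:
                cur, cur_f = cand, cand_f
        if cur_f > best_f:
            best, best_f = cur, cur_f
    return best, best_f

# Target concept: x AND (NOT y); z is irrelevant noise.
cases = [({"x": x, "y": y, "z": z}, x and not y)
         for x in (0, 1) for y in (0, 1) for z in (0, 1)]
print(hillclimb(cases))

The same overall structure – candidate generation, scoring with a size penalty, greedy acceptance, restarts – is what the discussion above assumes; MOSES replaces the naive mutation step with probabilistic model-building over a normalized program space.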

19.8.5.3 Attention Allocation

There is significant evidence that the brain uses some sort of “activation spreading” type method to allocate attention, and many algorithms in this spirit have been implemented and utilized in the AI literature. So, we find ourselves in agreement with many others that activation spreading is a reasonable way to handle attentional knowledge (though other approaches, with greater overhead cost, may provide better accuracy and may be appropriate in some situations). We also agree with many others who have chosen Hebbian learning as one route to learning associative relationships, with more sophisticated methods such as information-geometric ones potentially also playing a role.

Where CogPrime differs from standard practice is in the use of an economic metaphor to regulate activation spreading. In this matter CogPrime is broadly in agreement with Eric Baum’s arguments about the value of economic methods in AI, although our specific use of economic methods is very different from his. Baum’s work (e.g. Hayek) embodies more complex and computationally expensive uses of artificial economics, whereas we believe that in the context of a neural-symbolic network, artificial economics is an effective approach to activation spreading; CogPrime’s ECAN framework seeks to embody this idea. ECAN can also make use of more sophisticated and expensive forms of artificial currency when large amounts of system resources are involved in a single choice, rendering the cost appropriate.

One major choice made in the CogPrime design is to focus on two kinds of attention: processor attention (represented by ShortTermImportance) and memory attention (represented by LongTermImportance). This is a direct reflection of one of the key differences between the von Neumann architecture and the human brain: in the former but not the latter, there is a strict separation between memory and processing in the underlying compute fabric. We carefully considered the possibility of using a larger variety of attention values, but for reasons of simplicity and computational efficiency we are currently using only STI and LTI in our OpenCogPrime implementation, with the possibility of extending further if experimentation proves it necessary.
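As a rough illustration of the economic metaphor (not the actual ECAN code – the class names, rent and wage values, and spreading rule below are assumptions made for the example), the following Python sketch shows a fixed pool of STI currency being paid out as wages to useful Atoms, collected back as rent, and spread along Hebbian links, with the attentional focus defined as the highest-STI Atoms:

import random

# Illustrative sketch only: a toy STI (ShortTermImportance) economy loosely in
# the spirit of ECAN. The rent/wage amounts, spreading rule and data structures
# are assumptions made for the example, not OpenCog's actual ECAN code.

class Atom:
    def __init__(self, name, sti=0.0, lti=0.0):
        self.name, self.sti, self.lti = name, sti, lti

class AttentionBank:
    def __init__(self, atoms, funds=100.0, rent=0.5, wage=5.0, focus_size=3):
        self.atoms, self.funds = atoms, funds
        self.rent, self.wage, self.focus_size = rent, wage, focus_size
        self.hebbian = {}                      # (src, dst) -> weight in [0, 1]

    def stimulate(self, atom):
        # An atom that proves useful to a cognitive process earns a wage,
        # paid out of the bank's fixed pool of currency.
        paid = min(self.wage, self.funds)
        atom.sti += paid
        self.funds -= paid

    def step(self):
        # 1. Every atom pays rent back to the bank.
        for a in self.atoms:
            paid = min(self.rent, max(a.sti, 0.0))
            a.sti -= paid
            self.funds += paid
        # 2. STI spreads along Hebbian links from each source to linked atoms.
        for (src, dst), w in self.hebbian.items():
            delta = 0.1 * w * src.sti
            src.sti -= delta
            dst.sti += delta

    def attentional_focus(self):
        return sorted(self.atoms, key=lambda a: a.sti, reverse=True)[:self.focus_size]

atoms = [Atom(n) for n in ["cup", "table", "Jim", "blocks", "ball"]]
bank = AttentionBank(atoms)
bank.hebbian[(atoms[0], atoms[2])] = 0.8       # "cup" associatively linked to "Jim"
for _ in range(10):
    bank.stimulate(atoms[0])                   # the system keeps using "cup"
    bank.step()
print([a.name for a in bank.attentional_focus()])

LTI would be handled analogously, governing which Atoms are retained in memory rather than which receive processor attention.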

19.8.5.4 Internal Simulation and Episodic Knowledge

For episodic knowledge, as with declarative and procedural knowledge, CogPrime has opted for a solution motivated by the particular strengths of contemporary digital computers. When the human brain runs through a “mental movie” of past experiences, it doesn’t do any kind of accurate physical simulation of these experiences. But that’s not because the brain wouldn’t benefit from such a simulation – it’s because the brain doesn’t know how to do that sort of thing! On the other hand, any modern laptop can run a reasonable Newtonian physics simulation of everyday events, and more fundamentally can recall and manage the relative positions and movements of items in an internal 3D landscape paralleling remembered or imagined real-world events. With this in mind, we believe that in an AGI context, simulation is a good way to handle episodic knowledge; and running an internal “world simulation engine” is an effective way to handle simulation.

CogPrime can work with many different simulation engines; and since simulation technology is continually advancing independently of AGI technology, this is an area where AGI can gain some ongoing advancement for free as time goes on. The subtle issues here regard interfacing between the simulation engine and the rest of the mind: mining meaningful information out of simulations using pattern mining algorithms; and, more subtly, figuring out what simulations to run at what times in order to answer the questions most relevant to the AGI system in the context of achieving its goals. We believe we have architected these interactions in a viable way in the CogPrime design, but we have tested our ideas in this regard only in some fairly simple contexts involving virtual pets in a virtual world, and much more remains to be done here.

19.8.5.5 Low-Level Perception and Action

The centrality or otherwise of low-level perception and action in human intelligence is a matter of ongoing debate in the AI community. Some feel that the essence of intelligence lies in cognition and/or language, with perception and action having the status of “peripheral devices.” Others feel that modeling the physical world and one’s actions in it is the essence of intelligence, with cognition and language emerging as side-effects of these more fundamental capabilities. The CogPrime architecture doesn’t need to take sides in this debate. Currently we are experimenting both in virtual worlds and with real-world robot control. The value added by robotic versus virtual embodiment can thus be explored via experiment rather than theory, and may reveal nuances that no one currently foresees.

As noted above, we are not confident that CogPrime’s generic procedure learning or pattern recognition algorithms can handle large amounts of raw sensorimotor data in real time, and so for robotic applications we advocate hybridizing CogPrime with a separate (but closely cross-linked) system better customized for this sort of data, in line with our general hypothesis that hybridization of one’s integrative neural-symbolic system with a spatiotemporally hierarchical deep learning system is an effective way to handle representation and learning of low-level sensorimotor knowledge. While this general principle doesn’t depend on any particular approach, DeSTIN is one example of a deep learning system of this nature that can be effective in this context. We have not yet done any sophisticated experiments in this regard – our prior experiments using OpenCog to control robots have involved cruder integration of OpenCog with perceptual and motor subsystems, rather than the tight hybridization described in [Goe04] and envisioned for our future work. Creating such a hybrid system is “just” a matter of software engineering, but testing such a system may lead to many surprises!

19.8.5.6 Goals

Given that we have characterized general intelligence as “the ability to achieve complex goals in complex environments,” it should be plain that goals play a central role in our work. However, we have chosen not to create a separate subsystem for intentional knowledge, and instead have concluded that one effective way to handle goals is to represent them declaratively, and allocate attention among them economically. An advantage of this approach is that it automatically provides integration between the goal system and the declarative and attentional knowledge systems. Goals and subgoals are related using logical links as interpreted and manipulated by PLN, and attention is allocated among goals using the STI dynamics of ECAN, plus a specialized variant described in [? ] based on RFS’s (requests for service). Thus the mechanics of goal management is handled using uncertain inference and artificial economics, whereas the figuring-out of how to achieve goals is done integratively, relying heavily on procedural and episodic knowledge as well as PLN and ECAN.

The combination of ECAN and PLN seems to overcome the well-known shortcomings found with purely neural-net or purely inferential approaches to goals. Neural net approaches generally have trouble with abstraction, whereas logical approaches are generally poor at real-time responsiveness and at tuning their details quantitatively based on experience. At least in principle, our hybrid approach overcomes all these shortcomings; though at present, it has been tested only in fairly simple cases in the virtual world.
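The following sketch illustrates the basic idea of declaratively represented goals with economically allocated attention. It is an assumption-laden toy in Python, not the actual RFS or PLN machinery: the Goal class, link strengths and allocation rule are invented for the example.

# Illustrative sketch only: goals represented as declarative nodes linked to
# subgoals, with attention (STI-like currency) flowing from supergoals to the
# subgoals judged most likely to achieve them. The link strengths and the
# allocation rule are assumptions for the example.

class Goal:
    def __init__(self, name, sti=0.0):
        self.name, self.sti = name, sti
        self.subgoals = []          # list of (subgoal, strength) pairs, where
                                    # strength plays the role of an uncertain
                                    # "subgoal implies supergoal" link

    def add_subgoal(self, subgoal, strength):
        self.subgoals.append((subgoal, strength))

def allocate(goal, fraction=0.5):
    """Pass a fraction of a goal's currency down to its subgoals,
    proportionally to how strongly each is believed to serve it."""
    if not goal.subgoals:
        return
    budget = fraction * goal.sti
    goal.sti -= budget
    total = sum(s for _, s in goal.subgoals)
    for sub, s in goal.subgoals:
        sub.sti += budget * s / total
        allocate(sub, fraction)

please_teacher = Goal("please the teacher", sti=100.0)
build_novel = Goal("build a novel blocks structure")
chat = Goal("make small talk")
please_teacher.add_subgoal(build_novel, strength=0.8)
please_teacher.add_subgoal(chat, strength=0.2)
allocate(please_teacher)
print(build_novel.sti, chat.sti)    # most currency flows to the stronger subgoal

In the actual design, the link strengths would be uncertain truth values maintained by PLN, and the currency passed down would take the form of RFS’s rather than raw STI.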

19.8.6 Fulfilling the “Cognitive Equation”

A key claim based on the notion of the “Cognitive Equation” posited in Chaotic Logic [Goe94] is that it is important for an intelligent system to have some way of recognizing large-scale patterns in itself, and then embodying these patterns as new, localized knowledge items in its memory. This introduces a feedback dynamic between emergent pattern and substrate, which is hypothesized to be critical to general intelligence under feasible computational resources. It also ties in nicely with the notion of “glocal memory” – essentially positing a localization of some global memories, which naturally will result in the formation of some glocal memories. One of the key ideas underlying the CogPrime design is that, given the use of a neural-symbolic network for knowledge representation, a graph-mining based “map formation” heuristic is one good way to do this.

Map formation seeks to fulfill the Cognitive Equation quite directly, probably more directly than happens in the brain. Rather than relying on other cognitive processes to implicitly recognize overall system patterns and embody them in the system as localized memories (though this implicit recognition may also happen), the MapFormation MindAgent explicitly carries out this process. Mostly this is done using fairly crude greedy pattern mining heuristics, though if really subtle and important patterns seem to be there, more sophisticated methods like evolutionary pattern mining may also be invoked. It seems possible that this sort of explicit approach could be less efficient than purely implicit approaches; but there is no evidence for this, and it may also provide increased efficiency. And in the context of the overall CogPrime design, the explicit MapFormation approach seems most natural.
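The sketch below conveys the flavor of a crude, greedy map formation pass: scan a log of which Atoms were simultaneously in the attentional focus, find groups that co-occur frequently, and encapsulate each group as a new localized node. It is an illustrative assumption in Python – the threshold, the log format and the naming scheme are invented for the example, not taken from the MapFormation MindAgent.

from collections import Counter
from itertools import combinations

# Illustrative sketch only: a crude greedy "map formation" pass that scans a log
# of which atoms were simultaneously in the attentional focus, finds frequently
# co-occurring groups, and encapsulates each group as a new localized node.
# The threshold and data structures are assumptions for the example.

focus_log = [
    {"wheel", "chassis", "car"},
    {"wheel", "chassis", "car", "red"},
    {"torso", "arm", "head", "man"},
    {"torso", "arm", "man"},
    {"wheel", "car"},
]

def mine_maps(log, min_support=2, max_size=3):
    counts = Counter()
    for focus in log:
        for size in range(2, max_size + 1):
            for combo in combinations(sorted(focus), size):
                counts[combo] += 1
    return [set(c) for c, n in counts.items() if n >= min_support]

def encapsulate(maps):
    # Each frequently co-active group becomes a new named concept node,
    # which other cognitive processes can then reason about directly.
    return {"MAP_" + "_".join(sorted(m)): m for m in maps}

print(encapsulate(mine_maps(focus_log)))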

19.8.7 Occam’s Razor

The key role of “Occam’s Razor” or the urge for simplicity in intelligence has been observed by many before (going back at least to Occam himself, and probably earlier!), and is fully embraced in the CogPrime design. Our favored theoretical analysis of intelligence portrays intelligence as closely tied to the creation of procedures that achieve goals in environments in the simplest possible way. And this quest for simplicity is present in many places throughout the CogPrime design, for instance:

• In MOSES and hillclimbing, where program compactness is an explicit component of program tree fitness
• In PLN, where the backward and forward chainers explicitly favor shorter proof chains, and intensional inference explicitly characterizes entities in terms of their patterns (where patterns are defined as compact characterizations)
• In pattern mining heuristics, which search for compact characterizations of data
• In the forgetting mechanism, which seeks the smallest set of Atoms that will allow the regeneration of a larger set of useful Atoms via modestly-expensive application of cognitive processes
• Via the encapsulation of procedural and declarative knowledge in simulations, which in many cases provide a vastly compacted form of storing real-world experiences

Like cognitive synergy and emergent networks, Occam’s Razor is not something that is implemented in a single place in the CogPrime design, but rather an overall design principle that underlies nearly every part of the system.

19.8.8 Cognitive Synergy

To understand more specifically how cognitive synergy works in CogPrime, in the following subsections we will review some synergies related to the key components of CogPrime as discussed above. These synergies are absolutely critical to the proposed functionality of the CogPrime system. Without them, the cognitive mechanisms are not going to work adequately well, but are rather going to succumb to combinatorial explosions. The other aspects of CogPrime – the cognitive architecture, the knowledge representation, the embodiment framework and associated developmental teaching methodology – are all critical as well, but none of these will yield the critical emergence of intelligence without cognitive mechanisms that effectively scale. And, in the absence of cognitive mechanisms that effectively scale on their own, we must rely on cognitive mechanisms that effectively help each other to scale. The reasons why we believe these synergies will exist are essentially qualitative: we have not proved theorems regarding these synergies, and we have observed them in practice only in simple cases so far. However, we do have some ideas regarding how to potentially prove theorems related to these synergies, and some of these are described in [? ].

19.8.8.1 Synergies that Help Inference

The combinatorial explosion in PLN is obvious: forward and backward chaining inference are both fundamentally explosive processes, reined in only by pruning heuristics. This means that for nontrivial complex inferences to occur, one needs really, really clever pruning heuristics. The CogPrime design combines simple heuristics with pattern mining, MOSES and economic attention allocation as pruning heuristics.

Economic attention allocation assigns importance levels to Atoms, which helps guide pruning. Greedy pattern mining is used to search for patterns in the stored corpus of inference trees, to see if there are any that can be used as analogies for the current inference. And MOSES comes in when there is not enough information (from importance levels or prior inference history) to make a choice, yet exploring a wide variety of available options is unrealistic. In this case, MOSES tasks may be launched, pertaining to the leaves at the fringe of the inference tree that are under consideration for expansion. For instance, suppose there is an Atom A at the fringe of the inference tree, and its importance hasn’t been assessed with high confidence, but a number of items B are known so that:

MemberLink A B

Then, MOSES may be used to learn various relationships characterizing A, based on recognizing patterns across the set of B that are suspected to be members of A. These relationships may then be used to assess the importance of A more confidently, or perhaps to enable the inference tree to match one of the patterns identified by pattern mining on the inference tree corpus. For example, if MOSES figures out that:

SimilarityLink G A

then it may happen that substituting G in place of A in the inference tree results in something that pattern mining can identify as being a good (or poor) direction for inference.
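A toy sketch of the kind of importance-guided pruning being described – expanding only the most promising fringe nodes of an inference tree, scored by a blend of STI and a bonus for matching previously mined proof patterns – might look as follows. This is an illustrative assumption in Python; the scoring weights, beam width and data structures are invented for the example and do not reflect PLN’s actual inference control code.

import heapq

# Illustrative sketch only: backward-chaining expansion of an inference tree
# where the fringe is pruned by an importance score combining STI (from
# attention allocation) with a bonus for matching previously mined successful
# proof patterns. The scoring weights and structures are assumptions for the
# example, not PLN's actual inference control.

class FringeNode:
    def __init__(self, atom, sti, pattern_bonus):
        self.atom = atom
        self.score = 0.7 * sti + 0.3 * pattern_bonus

    def __lt__(self, other):
        return self.score > other.score      # max-heap behavior via heapq

def expand(fringe, expand_fn, beam=3, steps=5):
    """Repeatedly expand only the `beam` most promising fringe nodes."""
    heap = list(fringe)
    heapq.heapify(heap)
    for _ in range(steps):
        keep = [heapq.heappop(heap) for _ in range(min(beam, len(heap)))]
        heap = []                            # prune everything below the beam
        for node in keep:
            for child in expand_fn(node):
                heapq.heappush(heap, child)
    return heap

# Hypothetical expansion function: each premise spawns two cheaper sub-premises.
def toy_expand(node):
    return [FringeNode(node.atom + "'", sti=0.5 * node.score, pattern_bonus=0.1)
            for _ in range(2)]

start = [FringeNode("A", sti=0.9, pattern_bonus=0.4),
         FringeNode("B", sti=0.2, pattern_bonus=0.0)]
print([n.atom for n in expand(start, toy_expand)])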

19.8.8.2 Synergies that Help MOSES

MOSES’s combinatorial explosion is obvious: the number of possible programs of size N increases very rapidly with N. The only way to get around this is to utilize prior knowledge, and as much of it as possible. When solving a particular problem, the search for new solutions must make use of prior candidate solutions evaluated for that problem, and also prior candidate solutions (including successful and unsuccessful ones) evaluated for other related problems. But extrapolation of this kind is, in essence, a contextual analogical inference problem. In some cases it can be solved via fairly straightforward pattern mining; but in subtler cases it will require inference of the type provided by PLN. Also, attention allocation plays a role in figuring out, for a given problem A, which problems B are likely to have the property that candidate solutions for B are useful information when looking for better solutions for A.

19.8.8.3 Synergies that Help Attention Allocation

Economic attention allocation, without help from other cognitive processes, is just a very simple process analogous to “activation spreading” and “Hebbian learning” in a neural network. The other cognitive processes are the things that allow it to more sensitively understand the attentional relationships between different knowledge items (e.g. which sorts of items are often usefully thought about in the same context, and in which order).
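For reference, the “simple process” in question is essentially the following kind of update (an illustrative Python sketch; the learning rate, decay and data structures are assumptions for the example): whenever two Atoms are simultaneously in the attentional focus, the associative link between them is strengthened, and all links slowly decay.

# Illustrative sketch only: the basic Hebbian update that ECAN-style attention
# allocation performs on its own -- strengthening a link between two atoms
# whenever both are in the attentional focus at the same time. The learning
# rate and decay are assumptions for the example.

def hebbian_update(weights, focus, rate=0.1, decay=0.01):
    """weights: dict mapping (atom_a, atom_b) -> associative strength."""
    focus = sorted(focus)
    for i, a in enumerate(focus):
        for b in focus[i + 1:]:
            key = (a, b)
            weights[key] = weights.get(key, 0.0) + rate * (1.0 - weights.get(key, 0.0))
    for key in weights:
        weights[key] *= (1.0 - decay)        # everything slowly decays
    return weights

w = {}
for focus in [{"cup", "Jim"}, {"cup", "Jim", "table"}, {"ball", "blocks"}]:
    w = hebbian_update(w, focus)
print(w)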

19.8.8.4 Further Synergies Related to Pattern Mining

Statistical, greedy pattern mining is a simple process, but it nevertheless can be biased in various ways by other, more subtle processes. For instance, if one has learned a population of programs via MOSES, addressing some particular fitness function, then one can study which items tend to be utilized in the same programs in this population. One may then direct pattern mining to find patterns combining the items found to co-occur in the MOSES population. And conversely, relationships denoted by pattern mining may be used to probabilistically bias the models used within MOSES.

Statistical pattern mining may also help PLN by supplying it with information to work on. For instance, conjunctive pattern mining finds conjunctions of items, which may then be combined with each other using PLN, leading to the formation of more complex predicates. These conjunctions may also be fed to MOSES as part of an initial population for solving a relevant problem.

Finally, the main interaction between pattern mining and MOSES/PLN is that the former may recognize patterns in links created by the latter. These patterns may then be fed back into MOSES and PLN as data. This virtuous cycle allows pattern mining and the other, more expensive cognitive processes to guide each other. Attention allocation also gets into the game, by guiding statistical pattern mining and telling it which terms (and which combinations) to spend more time on.
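As a small illustration of the conjunctive pattern mining mentioned above (an assumption-laden Python sketch, not OpenCog’s actual pattern miner – the observation format, thresholds and truth-value form are invented for the example), the following mines frequent conjunctions of boolean predicates and attaches simple count-based truth-value estimates that a PLN-like process could then build on:

from itertools import combinations

# Illustrative sketch only: conjunctive pattern mining over boolean observation
# rows, producing candidate conjunctions together with simple (strength, count)
# truth-value estimates that a PLN-like reasoner could then combine further.
# The thresholds and the truth-value form are assumptions for the example.

observations = [
    {"red": 1, "round": 1, "bounces": 1, "heavy": 0},
    {"red": 1, "round": 1, "bounces": 1, "heavy": 0},
    {"red": 0, "round": 1, "bounces": 0, "heavy": 1},
    {"red": 1, "round": 0, "bounces": 0, "heavy": 1},
]

def mine_conjunctions(rows, size=2, min_strength=0.4):
    """Return conjunctions of predicates with their empirical strength."""
    names = sorted(rows[0])
    found = {}
    for combo in combinations(names, size):
        hits = sum(all(row[n] for n in combo) for row in rows)
        strength = hits / len(rows)
        if strength >= min_strength:
            found[" AND ".join(combo)] = (strength, len(rows))
    return found

# Each mined conjunction could be handed to PLN as a new predicate with an
# uncertain truth value, or seeded into a MOSES population as a building block.
print(mine_conjunctions(observations))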

19.8.8.5 Synergies Related to Map Formation

The essential synergy regarding map formation is obvious: maps are formed based on the HebbianLinks created via PLN and simpler attentional dynamics, which are based on which Atoms are usefully used together, which is based on the dynamics of the cognitive processes doing the “using.” On the other hand, once maps are formed and encapsulated, they feed into these other cognitive processes. This synergy in particular is critical to the emergence of self and attention.

What has to happen, for map formation to work well, is that the cognitive processes must utilize encapsulated maps in a way that gives rise overall to relatively clear clusters in the network of HebbianLinks. This will happen if the encapsulated maps are not too complex for the system’s other learning operations to understand. So, there must be useful coordinated attentional patterns whose corresponding encapsulated-map Atoms are not too complicated. This has to do with the system’s overall parameter settings, but largely with the settings of the attention allocation component. For instance, it is closely tied in with the limited size of “attentional focus” (the famous 7 +/- 2 number associated with humans’ and other mammals’ short term memory capacity). If only a small number of Atoms are typically very important at a given point in time, then the maps formed by grouping together all simultaneously highly important things will be relatively small predicates, which will be easily reasoned about – thus keeping the “virtuous cycle” of map formation and comprehension going effectively.

19.8.9 Emergent Structures and Dynamics

We have spent much more time in this book on the engineering of cognitive processes and structures than on the cognitive processes and structures that must emerge in an intelligent system for it to display human-level AGI. However, this focus should not be taken to represent a lack of appreciation for the importance of emergence. Rather, it represents a practical focus: engineering is what we must do to create a software system potentially capable of AGI, and emergence is then what happens inside the engineered AGI to allow it to achieve intelligence. Emergence must however be taken carefully into account when deciding what to engineer!

One of the guiding ideas underlying the CogPrime design is that an AGI system with adequate mechanisms for handling the key types of knowledge mentioned above, and the capability to explicitly recognize large-scale patterns in itself, should, upon sustained interaction with an appropriate environment in pursuit of appropriate goals, give rise to the emergence of a variety of complex structures in its internal knowledge network, including (but not limited to): a hierarchical network, representing both a spatiotemporal hierarchy and an approximate “default inheritance” hierarchy, cross-linked; a heterarchical network of associativity, roughly aligned with the hierarchical network; a self network which is an approximate micro image of the whole network; and inter-reflecting networks modeling self and others, reflecting a “mirrorhouse” design pattern.

The dependence of these posited emergences on the environment and goals of the AGI system should not be underestimated. For instance, PLN and pattern mining don’t have to lead to a hierarchically structured Atomspace. But if the AGI system is placed in an environment which it hierarchically structures via its own efforts, then PLN and pattern mining very likely will lead to a hierarchically structured Atomspace. And if this environment consists of hierarchically structured language and culture, then what one has is a system of minds with hierarchical networks, each reinforcing the hierarchality of each others’ networks. Similarly, integrated cognition doesn’t have to lead to mirrorhouse structures, but integrated cognition about situations involving other minds studying and predicting and judging each other is very likely to do so. What is needed for appropriate emergent structures to arise in a mind is mainly that the knowledge representation is sufficiently flexible to allow these structures, and the cognitive processes are sufficiently intelligent to observe these structures in the environment and then mirror them internally. Of course, it also doesn’t hurt if the internal structures and processes are at least slightly biased toward the origination of the particular high-level emergent structures that are characteristic of the system’s environment/goals; and this is indeed the case with CogPrime ... biases toward hierarchical, heterarchical, dual and mirrorhouse networks are woven throughout the system design, in a thoroughgoing though not extremely systematic way.

19.9 Measuring Incremental Progress Toward Human-Level AGI

We discussed, toward the start of this chapter, various proposed tests for gauging achievement of human-level general intelligence. Perhaps more critical, however, is to think about how to measure incremental progress toward this goal. How do you tell when you’re 25% or 50% of the way to having an AGI that can pass the Turing Test, or get an online university degree? Fooling 50% of the Turing Test judges is not a good measure of being 50% of the way to passing the Turing Test (that’s too easy); and passing 50% of university classes is not a good measure of being 50% of the way to getting an online university degree (that’s too hard – if one had an AGI capable of doing that, one would almost surely be very close to achieving the end goal). Measuring incremental progress toward human-level AGI is a subtle thing, and we argue that the best way to do it is to focus on particular scenarios and the achievement of specific competencies therein.

As we have argued in [? ], there are some theoretical reasons to doubt the possibility of creating a rigorous objective test for partial progress toward AGI – a test that would be convincing to skeptics, and impossible to "game" via engineering a system specialized to the test. Fortunately, though, we don’t need a test of this nature for the purposes of assessing our own incremental progress toward advanced AGI, based on our knowledge about our own approach. Based on the nature of the grand goals articulated above, there seems to be a very natural approach to creating a set of incremental capabilities building toward AGI: to draw on our copious knowledge about human cognitive development. This is by no means the only possible path; one can envision alternatives that have nothing to do with human development (and those might also be better suited to non-human AGIs). However, so much detailed knowledge about human development is available – as well as solid knowledge that the human developmental trajectory does lead to human-level intelligence – that the motivation to draw on human cognitive development is quite strong.

The main problem with the human-development-inspired approach is that cognitive developmental psychology is not as systematic as it would need to be in order to translate directly into architectural principles and requirements for AGI. While early thinkers like Piaget and Vygotsky outlined systematic theories of child cognitive development, these are no longer considered fully accurate, and one currently faces a mass of detailed theories of various aspects of cognitive development, but without a unified understanding. Nevertheless we believe it is viable to work from the human-development data and understanding currently available, and craft a workable AGI roadmap therefrom.

In this vein, Sam Adams and his team at IBM have outlined a so-called “Toddler Turing Test,” in which one seeks to use AI to control a robot qualitatively displaying similar cognitive behaviors to a young human child (say, a 3 year old) [AABL02]. In fact this sort of idea has a long and venerable history in the AI field – Alan Turing’s original 1950 paper on AI [Tur50], where he proposed the Turing Test, contains the suggestion that

"Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s?" We find this childlike cognition based approach promising for many reasons, including its in- tegrative nature: what a young child does involves a combination of perception, actuation, lin- guistic and pictorial communication, social interaction, conceptual problem solving and creative imagination. Specifically, inspired by these ideas, in [GB09] we have suggested the approach of teaching and testing early-stage AGI systems in environments that emulate the preschools used for teaching human children. Human intelligence evolved in response to the demands of richly interactive environments, and a preschool is specifically designed to be a richly interactive environment with the capability to stimulate diverse mental growth. So, we are currently exploring the use of CogPrime to control virtual agents in preschool-like virtual world environments, as well as commercial humanoid robot platforms such as the Nao or Robokind in physical preschool-like robot labs. Another advantage of focusing on childlike cognition is that child psychologists have created a variety of instruments for measuring child intelligence. In [? ], we present some details of an approach to evaluating the general intelligence of human childlike AGI systems via combining tests typically used to measure the intelligence of young human children, with additional tests crafted based on cognitive science and the standard preschool curriculum. According to this approach, we don’t suggest to place a CogPrime system in an environment that is an exact imitation of a human preschool – this would be inappropriate since current robotic or virtual bodies are very differently abled than the body of a young human child. But we aim to place CogPrime in an environment emulating the basic diversity and educational character of a typical human preschool.

19.9.1 Competencies and Tasks on the Path to Human-Level AI

With this preschool focus in mind, we give in this subsection a fairly comprehensive list of the competencies that we feel AI systems should be expected to display in one or more of these scenarios in order to be considered as full-fledged "human level AGI" systems. These competency areas have been assembled somewhat opportunistically via a review of the cognitive and developmental psychology literature as well as the scope of the current AI field. We are not claiming this as a precise or exhaustive list of the competencies characterizing human-level general intelligence, and will be happy to accept additions to the list, or mergers of existing list items, etc. What we are advocating is not this specific list, but rather the approach of enumerating competency areas, and then generating tasks by combining competency areas with scenarios.

We also give, with each competency, an example task illustrating the competency. The tasks are expressed in the robot preschool context for concreteness, but they all apply to the virtual preschool as well. Of course, these are only examples, and ideally to teach an AGI in a structured way one would like to
• associate several tasks with each competency
• present each task in a graded way, with multiple subtasks of increasing complexity
• associate a quantitative metric with each task

However, the briefer treatment given here should suffice to give a sense for how the competencies manifest themselves practically in the AGI Preschool context.

1. Perception
• Vision: image and scene analysis and understanding
– Example task: When the teacher points to an object in the preschool, the robot should be able to identify the object and (if it’s a multi-part object) its major parts. If it can’t perform the identification initially, it can approach the object and manipulate it before making its identification.
• Hearing: identifying the sounds associated with common objects; understanding which sounds come from which sources in a noisy environment
– Example task: When the teacher covers the robot’s eyes and then makes a noise with an object, the robot should be able to guess what the object is
• Touch: identifying common objects and carrying out common actions using touch alone
– Example task: With its eyes and ears covered, the robot should be able to identify some object by manipulating it; and carry out some simple behaviors (say, putting a block on a table) via touch alone
• Crossmodal: Integrating information from various senses
– Example task: Identifying an object in a noisy, dim environment via combining visual and auditory information
• Proprioception: Sensing and understanding what its body is doing
– Example task: The teacher moves the robot’s body into a certain configuration. The robot is asked to restore its body to an ordinary standing position, and then repeat the configuration that the teacher moved it into.

2. Actuation
• Physical skills: manipulating familiar and unfamiliar objects

– Example task: Manipulate blocks based on imitating the teacher: e.g. pile two blocks atop each other, lay three blocks in a row, etc.
• Tool use, including the flexible use of ordinary objects as tools
– Example task: Use a stick to poke a ball out of a corner, where the robot cannot directly reach
• Navigation, including in complex and dynamic environments
– Example task: Find its own way to a named object or person through a crowded room with people walking in it and objects laying on the floor.

3. Memory
• Declarative: noticing, observing and recalling facts about its environment and experience
– Example task: If certain people habitually carry certain objects, the robot should remember this (allowing it to know how to find the objects when the relevant people are present, even much later)
• Behavioral: remembering how to carry out actions
– Example task: If the robot is taught some skill (say, to fetch a ball), it should remember this much later
• Episodic: remembering significant, potentially useful incidents from life history
– Example task: Ask the robot about events that occurred at times when it got particularly much, or particularly little, reward for its actions; it should be able to answer simple questions about these, with significantly more accuracy than about events occurring at random times

4. Learning

• Imitation: Spontaneously adopt new behaviors that it sees others carrying out
– Example task: Learn to build towers of blocks by watching people do it
• Reinforcement: Learn new behaviors from positive and/or negative reinforcement signals, delivered by teachers and/or the environment
– Example task: Learn which box the red ball tends to be kept in, by repeatedly trying to find it and noticing where it is, and getting rewarded when it finds it correctly
• Imitation/Reinforcement
– Example task: Learn to play “fetch”, “tag” and “follow the leader” by watching people play it, and getting reinforced on correct behavior
• Interactive Verbal Instruction
– Example task: Learn to build a particular structure of blocks faster based on a combination of imitation, reinforcement and verbal instruction, than by imitation and reinforcement without verbal instruction

• Written Media
– Example task: Learn to build a structure of blocks by looking at a series of diagrams showing the structure in various stages of completion
• Learning via Experimentation
– Example task: Ask the robot to slide blocks down a ramp held at different angles. Then ask it to make a block slide fast, and see if it has learned how to hold the ramp to make a block slide fast.

5. Reasoning
• Deduction, from uncertain premises observed in the world
– Example task: If Ben more often picks up red balls than blue balls, and Ben is given a choice of a red block or blue block to pick up, which is he more likely to pick up?
• Induction, from uncertain premises observed in the world
– Example task: If Ben comes into the lab every weekday morning, then is Ben likely to come to the lab today (a weekday) in the morning?
• Abduction, from uncertain premises observed in the world
– Example task: If women more often give the robot food than men, and then someone of unidentified gender gives the robot food, is this person a man or a woman?
• Causal reasoning, from uncertain premises observed in the world
– Example task: If the robot knows that knocking down Ben’s tower of blocks makes him mad, then what will it say when asked if kicking the ball at Ben’s tower of blocks will make Ben mad?
• Physical reasoning, based on observed “fuzzy rules” of naive physics
– Example task: Given two balls (one rigid and one compressible) and two tunnels (one significantly wider than the balls, one slightly narrower than the balls), can the robot guess which balls will fit through which tunnels?
• Associational reasoning, based on observed spatiotemporal associations
– Example task: If Ruiting is normally seen near Shuo, then if the robot knows where Shuo is, that is where it should look when asked to find Ruiting

6. Planning

• Tactical
– Example task: The robot is asked to bring the red ball to the teacher, but the red ball is in the corner where the robot can’t reach it without a tool like a stick. The robot knows a stick is in the cabinet, so it goes to the cabinet and opens the door and gets the stick, and then uses the stick to get the red ball, and then brings the red ball to the teacher.
• Strategic

– Example task: Suppose that Matt comes to the lab infrequently, but when he does come he is very happy to see new objects he hasn’t seen before (and suppose the robot likes to see Matt happy). Then when the robot gets a new object Matt has not seen before, it should put it away in a drawer and be sure not to lose it or let anyone take it, so it can show Matt the object the next time Matt arrives.
• Physical
– Example task: To pick up a cup with a handle which is lying on its side in a position where the handle can’t be grabbed, the robot turns the cup into the right position and then picks up the cup by the handle
• Social
– Example task: The robot is given the job of building a tower of blocks by the end of the day, and it knows Ben is the most likely person to help it, and it knows that Ben is more likely to say "yes" to helping when Ben is alone. It also knows that Ben is less likely to say yes if he’s asked too many times, because Ben doesn’t like being nagged. So it waits to ask Ben till Ben is alone in the lab.

7. Attention
• Visual Attention within its observations of its environment
– Example task: The robot should be able to look at a scene (a configuration of objects in front of it in the preschool) and identify the key objects in the scene and their relationships.
• Social Attention
– Example task: The robot is having a conversation with Itamar, which is giving the robot reward (for instance, by teaching the robot useful information). Conversations with other individuals in the room have not been so rewarding recently. But Itamar keeps getting distracted during the conversation, by talking to other people, or playing with his cellphone. The robot needs to know to keep paying attention to Itamar even through the distractions.
• Behavioral Attention
– Example task: The robot is trying to navigate to the other side of a crowded room full of dynamic objects, and many interesting things keep happening around the room. The robot needs to largely ignore the interesting things and focus on the movements that are important for its navigation task.

8. Motivation
• Subgoal creation, based on its preprogrammed goals and its reasoning and planning
– Example task: Given the goal of pleasing Hugo, can the robot learn that telling Hugo facts it has learned but not told Hugo before, will tend to make Hugo happy?
• Affect-based motivation

– Example task: Given the goal of gratifying its curiosity, can the robot figure out that when someone it’s never seen before has come into the preschool, it should watch them because they are more likely to do something new?
• Control of emotions
– Example task: When the robot is very curious about someone new, but is in the middle of learning something from its teacher (who it wants to please), can it control its curiosity and keep paying attention to the teacher?

9. Emotion

• Expressing Emotion
– Example task: Cassio steals the robot’s toy, but Ben gives it back to the robot. The robot should appropriately display anger at Cassio, and gratitude to Ben.
• Understanding Emotion
– Example task: Cassio and the robot are both building towers of blocks. Ben points at Cassio’s tower and expresses happiness. The robot should understand that Ben is happy with Cassio’s tower.

10. Modeling Self and Other
• Self-Awareness
– Example task: When someone asks the robot to perform an act it can’t do (say, reaching an object in a very high place), it should say so. When the robot is given the chance to get an equal reward for a task it can complete only occasionally, versus a task it finds easy, it should choose the easier one.
• Theory of Mind
– Example task: While Cassio is in the room, Ben puts the red ball in the red box. Then Cassio leaves and Ben moves the red ball to the blue box. Cassio returns and Ben asks him to get the red ball. The robot is asked to go to the place Cassio is about to go.
• Self-Control
– Example task: Nasty people come into the lab and knock down the robot’s towers, and tell the robot he’s a bad boy. The robot needs to set these experiences aside, and not let them impair its self-model significantly; it needs to keep on thinking it’s a good robot, and keep building towers (that its teachers will reward it for).
• Other-Awareness
– Example task: If Ben asks Cassio to carry out a task that the robot knows Cassio cannot do or does not like to do, the robot should be aware of this, and should bet that Cassio will not do it.
• Empathy

– Example task: If Itamar is happy because Ben likes his tower of blocks, or upset because his tower of blocks is knocked down, the robot should express and display these same emotions

11. Social Interaction
• Appropriate Social Behavior
– Example task: The robot should learn to clean up and put away its toys when it’s done playing with them.
• Social Communication
– Example task: The robot should greet new human entrants into the lab, but if it knows the new entrants very well and it’s busy, it may eschew the greeting
• Social Inference about simple social relationships
– Example task: The robot should infer that Cassio and Ben are friends because they often enter the lab together, and often talk to each other while they are there
• Group Play at loosely-organized activities
– Example task: The robot should be able to participate in “informally kicking a ball around” with a few people, or in informally collaboratively building a structure with blocks

12. Communication

• Gestural communication to achieve goals and express emotions
– Example task: If the robot is asked where the red ball is, it should be able to show by pointing its hand or finger
• Verbal communication using English in its life-context
– Example tasks: Answering simple questions, responding to simple commands, describing its state and observations with simple statements
• Pictorial Communication regarding objects and scenes it is familiar with
– Example task: The robot should be able to draw a crude picture of a certain tower of blocks, so that e.g. the picture looks different for a very tall tower and a wide low one
• Language acquisition
– Example task: The robot should be able to learn new words or names via people uttering the words while pointing at objects exemplifying the words or names
• Cross-modal communication
– Example task: If told to "touch Bob’s knee" but the robot doesn’t know what a knee is, being shown a picture of a person with the knee pointed out in the picture should help it figure out how to touch Bob’s knee

13. Quantitative

• Counting sets of objects in its environment
– Example task: The robot should be able to count small (homogeneous or heterogeneous) sets of objects
• Simple, grounded arithmetic with small numbers
– Example task: Learning simple facts about the sum of integers under 10 via teaching, reinforcement and imitation
• Comparison of observed entities regarding quantitative properties
– Example task: Ability to answer questions about which object or person is bigger or taller
• Measurement using simple, appropriate tools
– Example task: Use of a yardstick to measure how long something is

14. Building/Creation
• Physical: creative constructive play with objects
– Example task: Ability to construct novel, interesting structures from blocks
• Conceptual invention: concept formation
– Example task: Given a new category of objects introduced into the lab (e.g. hats, or pets), the robot should create a new internal concept for the new category, and be able to make judgments about these categories (e.g. if Ben particularly likes pets, it should notice this after it has identified "pets" as a category)
• Verbal invention
– Example task: Ability to coin a new word or phrase to describe a new object (e.g. the way Alex the parrot coined "bad cherry" to refer to a tomato)
• Social
– Example task: If the robot wants to play a certain activity (say, practicing soccer), it should be able to gather others around to play with it

Based on these competencies and associated tasks, it is not hard to articulate a specific roadmap for progress toward human-level AGI, inspired by human child development. This sort of roadmap does not give a highly rigorous, objective way of assessing the percentage of progress toward the end-goal of human-level AGI. However, it gives a much better sense of progress than one would have otherwise. For instance, if an AGI system performed well on diverse metrics corresponding to 50% of the competency areas listed above, one would seem justified in claiming to have made very substantial progress toward human-level AGI. If an AGI system performed well on diverse metrics corresponding to 90% of these competency areas, one would seem justified in claiming to be "almost there." Achieving, say, 25% of the metrics would give one a reasonable claim to "interesting AGI progress." This kind of qualitative assessment of progress is not the most one could hope for, but again, it is better than the progress indications one could get without this sort of roadmap.

19.10 A CogPrime Thought Experiment: Build Me Something I Haven’t Seen Before

AGI design necessarily leads one into some rather abstract spaces – but being a human-like intelligence in the everyday world is a pretty concrete thing. If the CogPrime research program is successful, it will result not just in abstract ideas and equations, but rather in real AGI robots carrying out tasks in the everyday human world, and AGI agents in virtual worlds and online digital spaces conducting important business, doing science, entertaining and being entertained by us, and so forth. With this in mind, in this final chapter we will bring the discussion closer to the concrete and everyday, and pursue a thought experiment of the form “How would a completed CogPrime system carry out this specific task?”

The task we will use for this thought experiment is one we have sometimes used as a running example in our internal technical discussions within the OpenCog team, and have briefly mentioned above. We consider the case of a robotically or virtually embodied CogPrime system, operating in a preschool-type environment, interacting with a human whom it already knows and given the task of “Build me something with blocks that I haven’t seen before.” This target task is fairly simple, but it is complex enough to involve essentially every one of CogPrime’s processes, interacting in a unified way. We consider the case of a simple interaction based on the above task where:

1. The human teacher tells the CogPrime agent “Build me something with blocks that I haven’t seen before.”
2. After a few false starts, the agent builds something it thinks is appropriate and says “Do you like it?”
3. The human teacher says “It’s beautiful. What is it?”
4. The agent says “It’s a car man” [and indeed, the construct has 4 wheels and a chassis vaguely like a car, but also a torso, arms and head vaguely like a person]

Of course, a complex system like CogPrime could carry out an interaction like this internally in many different ways, and what is roughly described here is just one among many possibilities. In [? ], this example is discussed more thoroughly, via enumerating a number of CogPrime processes and explaining some ways that each one may help CogPrime carry out the target task. Here, instead, we give a more evocative “semi-narrative” conveying the dynamics that would occur in CogPrime while carrying out the target task, and mentioning each of the enumerated cognitive processes as it arises in the narrative. The reason we call this a semi-narrative rather than a narrative is that there is no particular linear order to the processes occurring in each phase of the situation described here. CogPrime’s internal cognitive processes do not occur in a linear narrative; rather, what we have is a complex network of interlocking events. But still, describing some of these events concretely, in a manner correlated with the different stages of a simple interaction, may have some expository value.

19.10.1 Let the Semi-Narrative Begin...

The human teacher tells the CogPrime agent “Build me something with blocks that I haven’t seen before.”

Upon hearing this, the agent’s cognitive cycles are dominated by language processing and retrieval from episodic and sensory memory. The agent may decide to revive from disk the mind-states it went through when building human-pleasing structures from blocks before, so as to provide it with guidance. It will likely experience the emotion of happiness, because it anticipates the pleasure of getting rewarded for the task in the future. The ubergoal (pre-programmed, top-level goal) of pleasing the teacher becomes active (gets funded significantly with STI currency), as it becomes apparent that there are fairly clear ways of fulfilling that goal (via the subgoal S of building blocks structures that will get positive response from the teacher). Other ubergoals like gaining knowledge are not funded as much with STI currency just now, as they are not immediately relevant.

Action selection, based on ImplicationLinks derived via PLN (between various possible activities and the subgoal S), causes it to start experimentally building some blocks structures. Past experience with building (turned into ImplicationLinks via mining the SystemActivityTable) tells it that it may want to build a little bit in its internal simulation world before building in the external world, causing STI currency to flow to the simulation MindAgent. The Atom corresponding to the context blocks-building gets high STI and is pushed into the AttentionalFocus (the set of Atoms with the highest ShortTermImportance values), making it likely that many future inferences will occur in this context. Other Atoms related to this one also get high STI (the ones in the blocks-building map, and others that are especially related to blocks-building in this particular context).

After a few false starts, the agent builds something it thinks is appropriate and says “Do you like it?”

Now that the agent has decided what to do to fulfill its well-funded goal, its cognitive cycles are dominated by action, perception and related memory access and concept creation. An obvious subgoal is spawned: build a new structure now, and make this particular structure under construction appealing and novel to the teacher. This subgoal has a shorter time scale than the high-level goal. The subgoal gets some currency from its supergoal using the mechanism of RFS spreading. Action selection must tell it when to continue building the same structure and when to try a new one, as well as make more micro-level choices. Atoms related to the currently pursued blocks structure get high STI. After a failed structure (a “false start”) is disassembled, the corresponding Atoms lose STI dramatically (leaving the AttentionalFocus) but may still have significant LTI, so they can be recalled later as appropriate. They may also have VLTI, so they will be saved to disk later on if other things push them out of RAM due to getting higher LTI.

Meanwhile, everything that’s experienced from the external world goes into the ExperienceDB. Atoms representing different parts or aspects of the same blocks structure will get HebbianLinks between them, which will guide future reasoning and importance spreading. Importance spreading helps the system go from an idea for something to build (say, a rock or a car) to the specific plans and ideas about how to build it, via increasing the STI of the Atoms that will be involved in these plans and ideas.

If something apparently good is done in building a blocks structure, then other processes and actions that helped lead to or support that good thing get passed some STI from the Atoms representing the good thing, and may also get linked to the Goal Atom representing “good” in this context. This leads to reinforcement learning.

The agent may play with building structures and then seeing what they most look like, thus exercising abstract object recognition (which uses procedures learned by MOSES or hillclimbing, or uncertain relations learned by inference, to guess what object category a given observed collection of percepts most likely falls into). Since the agent has been asked to come up with something surprising, it knows it should probably try to formulate some new concepts, because it has learned in the past, via SystemActivityTable mining, that newly formed concepts are often surprising to others. So, more STI currency is given to concept formation MindAgents, such as the ConceptualBlending MindAgent (which, along with a lot of stuff that gets thrown out or stored for later use, comes up with “car-man”). When the notion of “car” is brought to mind, the distributed map of nodes corresponding to “car” gets high STI. When car-man is formed, it is reasoned about (producing new Atoms), but it also serves as a nexus of importance-spreading, causing the creation of a distributed car-man map.

If the goal of making an arm for a car-man occurs, then goal-driven schema learning may be done to learn a procedure for arm-making (where the actual learning is done by MOSES or hill-climbing). If the agent is building a car-man, it may have man-building and car-building schema in its ActiveSchemaPool at the same time, and SchemaActivation may spread back and forth between the different modules of these two schema. If the agent wants to build a horse, but has never seen a horse made of blocks (only various pictures and movies of horses), it may use MOSES or hillclimbing internally to solve the problem of creating a horse-recognizer or a horse-generator which embodies appropriate abstract properties of horses. Here, as in all cases of procedure learning, a complexity penalty rewards simpler programs, from among all programs that approximately fulfill the goals of the learning process. If a procedure being executed has some abstract parts, then these may be executed by inferential procedure evaluation (which makes the abstract parts concrete on the fly in the course of execution). To guess the fitness of a procedure for doing something (say, building an arm or recognizing a horse), inference or simulation may be used, as well as direct evaluation in the world.

Deductive, inductive and abductive PLN inference may be used in figuring out what a blocks structure will look or act like before building it (it’s tall and thin so it may fall down; it won’t be bilaterally symmetric so it won’t look much like a person; etc.). Backward-chaining inference control will help figure out how to assemble something matching a certain specification – e.g. how to build a chassis based on knowledge of what a chassis looks like. Forward-chaining inference (critically including intensional relationships) will be used to estimate the properties that the teacher will perceive a given specific structure to have. Spatial and temporal algebra will be used extensively in this reasoning, within the PLN framework. Coordinating different parts of the body – say an arm and a hand – will involve importance spreading (both up and down) within the hierarchical action network, and from this network to the hierarchical perception network and the heterarchical cognitive network. In looking up Atoms in the AtomSpace, some have truth values whose confidences have decayed significantly (e.g. those regarding the teacher’s tastes), whereas others have confidences that have hardly decayed at all (e.g. those regarding general physical properties of blocks).

Finding previous blocks structures similar to the current one (useful for guiding building by analogy to past experience) may be done rapidly by searching the system’s internal dimensional-embedding space. As the building process occurs, patterns mined via past experience (tall things often fall down) are used within various cognitive processes (reasoning, procedure learning, concept creation, etc.); and new pattern mining also occurs based on the new observations made as different structures are built, experimented with and destroyed. Simulation of teacher reactions, based on inference from prior examples, helps with the evaluation of possible structures, and also of procedures for creating structures. As the agent does all this, it experiences the emotion of curiosity (likely among other emotions), because as it builds each new structure it has questions about what it will look like and how the teacher would react to it.

The human teacher says “It’s beautiful. What is it?” The agent says “It’s a car man”

Now that the building is done and the teacher says something, the agent’s cognitive cycles are dominated by language understanding and generation. The Atom representing the context of talking to the teacher gets high STI, and is used as the context for many ensuing inferences. Comprehension of “it” uses anaphor resolution, based on a combination of ECAN and PLN inference applied to a combination of previously interpreted language and observation of the external world situation. The agent experiences the emotion of happiness because the teacher has called its creation beautiful, which it recognizes as a positive evaluation – so the agent knows one of its ubergoals (“please the teacher”) has been significantly fulfilled.

The goal of pleasing the teacher causes the system to want to answer the question. So the QuestionAnswering DialogueController schema gets paid a lot and gets put into the ActiveSchemaPool. In reaction to the question asked, this DialogueController chooses a semantic graph to speak, then invokes NL generation to say it. NL generation chooses the most compact expression that seems to adequately convey the intended meaning, so it decides on “car man” as the best simple verbalization to match the newly created conceptual blend that it thinks effectively describes the newly created blocks structure. The positive feedback from the user leads to reinforcement of the Atoms and processes that led to the construction of the blocks structure that has been judged beautiful (via importance spreading and SystemActivityTable mining).

19.10.2 Conclusion

The simple situation considered in this section is complex enough to involve nearly all the different cognitive processes in the CogPrime system – and many interactions between these processes. This fact illustrates one of the main difficulties of designing, building and testing an artificial mind like CogPrime – until nearly all of the system is built and made to operate in an integrated way, it's hard to do any meaningful test of the system. Testing PLN or MOSES or conceptual blending in isolation may be interesting computer science, but it doesn't tell you much about CogPrime as a design for a thinking machine.

According to the CogPrime approach, getting a simple child-like interaction like "build me something with blocks that I haven't seen before" to work properly requires a holistic, integrated cognitive system. Once one has built a system capable of this sort of simple interaction then, according to the theory underlying CogPrime, one is not that far from a system with adult human-level intelligence. Of course there will be a lot of work to do to get from a child-level system to an adult-level system – it won't necessarily unfold as "automatically" as seems to happen with a human child, because CogPrime lacks the suite of developmental processes and mechanisms that the young human brain has. But still, a child CogPrime mind capable of doing the things outlined in this chapter will have all the basic components and interactions in place – all the ones that are needed for a much more advanced artificial mind.
Of course, one could concoct a narrow-AI system carrying out the specific activities described in this section much more simply than one could build a CogPrime system capable of doing these activities. But that's not the point – the point of this section has been not to explain how to achieve some particular narrow set of activities "by any means necessary", but rather to explain how these activities might be achieved within the CogPrime framework, which has been designed with much more generality in mind.
It would be worthwhile to elaborate a number of other situations similar to the one described in this chapter, and to work through the various cognitive processes and structures in CogPrime carefully in the context of each of these situations. In fact this sort of exercise has frequently been carried out informally in the course of developing CogPrime. But this chapter is already long enough, so we will end here and leave the rest for future works – emphasizing that it is via intimate interplay between concrete considerations like the ones presented in this section, and general algorithmic and conceptual considerations, that we have the greatest hope of creating advanced AGI.
The value of this sort of interplay actually follows from the theory of real-world general intelligence indicated above. Thoroughly general intelligence is only possible given unrealistic computational resources, so real-world general intelligence is about achieving high generality given limited resources, relative to the specific classes of environments relevant to a given agent. Specific situations like building surprising things with blocks are particularly important insofar as they embody broader information about the classes of environments relevant to broadly human-like general intelligence.
No doubt, once a CogPrime system is completed, the specifics of its handling of the situation described here will differ somewhat from the treatment presented in this chapter. Furthermore, the final CogPrime system may differ algorithmically and structurally in some respects from the specifics outlined here – it would be surprising if the process of building, testing and interacting with CogPrime didn't teach us some new things about various of the topics covered. But our conjecture is that, if sufficient effort is deployed appropriately, then a system much like the CogPrime system described here will be able to handle the situation described in this section in roughly the manner outlined in this chapter – and that this will serve as a natural precursor to much more dramatic AGI achievements.

19.11 Broader Issues

Current practical AGI development is at quite an early stage – we are still struggling to get our AGI systems to deal with basic situations like creative blocks play, which human children and even apes handle easily. This is not surprising: when one takes a design like CogPrime and gradually spells it out into a specific software system, one obtains a large and complex system, which requires at least dozens of expert human-years to refine, implement and test. Even much simpler kinds of software, such as word processors, games or operating systems, often require dozens to hundreds of expert human-years for their realization. Even at this early stage, however, it is worthwhile to pay attention to the broader issues related to AGI development, such as the potential for escalation of AGI beyond the human level, and the ethical implications of human-level or more advanced AGI. Given the reality of exponential technological acceleration [Kur06], it is quite possible for an area of technology to progress from the early to the advanced stages within a few years.

19.11.1 Ethical AGI

Creating an AGI whose ethical behavior can be guaranteed seems an infeasible task; but of course, no human is guaranteed to be ethical either, and in fact it seems almost guaranteed that in any moderately large group of humans there will be some with strong propensities for extremely unethical behaviors, according to any of the standard human ethical codes. One of our motivations in developing CogPrime has been the belief that an AGI system, if supplied with a commonsensically ethical goal system and an intentional component based largely on rigorous uncertain inference, should be able to reliably achieve a much higher level of commonsensically ethical behavior than any human being. Our explorations in the detailed design of CogPrime's goal system have done nothing to weaken this belief. While we have not yet developed any CogPrime system to the point where experimenting with its ethics is meaningful, based on our understanding of the current design it seems to us that

• a typical CogPrime system will display a much more consistent, less conflicted and less confused motivational system than any human being, due to its explicit orientation toward carrying out actions that (based on its knowledge) rationally seem most likely to lead to achievement of its goals
• if a CogPrime system is given goals that are consistent with commonsensical human ethics (say, articulated in natural language), and is then educated in an ethics-friendly environment such as a virtual or physical school, then it is reasonable to expect the CogPrime system will ultimately develop an advanced (human adult level or beyond) form of commonsensical human ethics

Human ethics is itself wracked with inconsistencies, so one cannot expect a rationality-based AGI system to precisely mirror the ethics of any particular human individual or cultural system. But given the degree to which general intelligence represents adaptation to its environment, and the degree to which interpretation of natural language depends on life history and context, it seems very likely to us that a CogPrime system, if supplied with a human-commonsense-ethics based goal system and then raised by compassionate and intelligent humans in a school-type environment, would arrive at its own variant of human-commonsense-ethics. The AGI system's ethics would then interact with human ethical systems in complex ways, leading to ongoing evolution of both systems and the development of new cultural and ethical patterns. Predicting the future is difficult even in the absence of radical advanced technologies, but our intuition is that this path has the potential to lead to beneficial outcomes for both human and machine intelligence.

19.11.2 Toward Superhuman General Intelligence

Human-level AGI is a difficult goal, relative to the current state of scientific understanding and engineering capability, and most of our work on CogPrime has been focused on our ideas about how to achieve it. However, we also intuitively suspect that the CogPrime architecture has the ultimate potential to push beyond the human level in many ways. In line with this suspicion, we advance the claim that once sufficiently advanced, a CogPrime system should be able to radically self-improve via a variety of methods, including supercompilation and automated theorem-proving.
Supercompilation allows procedures to be automatically replaced with equivalent but massively more time-efficient procedures. This is particularly valuable in that it allows AI algorithms to learn new procedures without much heed to their efficiency, since supercompilation can always improve the efficiency afterwards. So it is a real boon to automated program learning. Theorem-proving is difficult for current narrow-AI systems, but for an AGI system with a deep understanding of the context in which each theorem exists, it should be much easier than it is for human mathematicians. So we envision that ultimately an AGI system will be able to design itself new algorithms and data structures via proving theorems about which ones will best help it achieve its goals in which situations, based on mathematical models of itself and its environment. Once this stage is achieved, it seems that machine intelligence may begin to vastly outdo human intelligence, leading in directions we cannot now envision.
While such projections may seem science-fictional, we note that the CogPrime architecture explicitly supports such steps. If human-level AGI is achieved within the CogPrime framework, it seems quite feasible that profoundly self-modifying behavior could be achieved fairly shortly thereafter. For instance, one could take a human-level CogPrime system and teach it computer science and mathematics, so that it fully understood the reasoning underlying its own design, and the whole mathematics curriculum leading up to the algorithms underpinning its cognitive processes.
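To give a flavor of what supercompilation-style optimization means, here is a toy illustrative sketch in Python; it is emphatically not an actual supercompiler (real supercompilers operate on program trees and are far more sophisticated), and the function names are invented for the example. It simply specializes a general exponentiation routine for a known exponent, producing an equivalent procedure with the loop removed – the same "replace a procedure with an equivalent but faster one" pattern, in miniature.

def make_power(n):
    """Return a function equivalent to (lambda x: x ** n), with the loop unrolled."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    code = "def power_{0}(x):\n    return {1}\n".format(n, body)
    namespace = {}
    exec(code, namespace)   # generate the specialized procedure at "compile time"
    return namespace["power_{0}".format(n)]

power_5 = make_power(5)     # equivalent to the general routine, but loop-free
print(power_5(2))           # prints 32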

19.11.3 Conclusion

What we have sought to do in this overview chapter is, mainly,

• to articulate a theoretical perspective on general intelligence, according to which the creation of a human-level AGI doesn't require anything that extraordinary, but "merely" an appropriate combination of closely interoperating algorithms operating on an appropriate multi-type memory system, utilized to enable a system in an appropriate body and environment to figure out how to achieve its given goals
• to describe a design (CogPrime) that, according to this somewhat mundane but theoretically quite well grounded vision of general intelligence, appears likely (according to a combination of rigorous and heuristic arguments) to be able to lead to human-level AGI using feasible computational resources
• to describe some of the preliminary lessons we've learned via implementing and experimenting with aspects of the CogPrime design, in the OpenCog system

We wish to stress that not all of our arguments and ideas need to be 100% correct in order for the project to succeed. The quest to create AGI is a mix of theory, engineering, and scientific and unscientific experimentation. If the current CogPrime design turns out to have significant shortcomings, yet still brings us a significant percentage of the way toward human-level AGI, the results obtained along the path will very likely give us clues about how to tweak the design to more effectively get the rest of the way there. And the OpenCog platform is extremely flexible and extensible, rather than being tied to the particular details of the CogPrime design. While we do have faith that the CogPrime design as described here has human-level AGI potential, we are also pleased to have a development strategy and implementation platform that will allow us to modify and improve the design in accordance with whatever suggestions are made by our ongoing experimentation.
Many great achievements in history have seemed more magical before their first achievement than afterwards. Powered flight and spaceflight are the most obvious examples, but there are many others, such as mobile telephony, prosthetic limbs, electronically deliverable books, robotic factory workers, and so on. We now even have wireless transmission of power (e.g. wireless recharging of cellphones), though not yet as ambitiously as Tesla envisioned. We very strongly suspect that human-level AGI is in the same category as these various examples: an exciting and amazing achievement, which however is achievable via systematic and careful application of fairly mundane principles. We believe computationally feasible human-level intelligence is both complicated (involving many interoperating parts, each sophisticated in its own right) and complex (in the sense of involving many emergent dynamics and structures whose details are not easily predictable based on the parts of the system) ... but that neither the complication nor the complexity is an obstacle to engineering human-level AGI. In our view, what is needed to create human-level AGI is not a new scientific breakthrough, nor a miracle, but "merely" a sustained effort over a number of years by a moderate-sized team of appropriately-trained professionals, completing the implementation of an adequate design – such as the one described in [? ][? ] and roughly sketched here – and then parenting and educating the resulting implemented system.

Fig. 19.3: Key Explicitly Implemented Processes of CogPrime. The large box at the center is the Atomspace, the system's central store of various forms of (long-term and working) memory, which contains a weighted labeled hypergraph whose nodes and links are "Atoms" of various sorts. The hexagonal boxes at the bottom denote various hierarchies devoted to recognition and generation of patterns: perception, action and linguistic. Intervening between these recognition/generation hierarchies and the Atomspace, we have a pattern mining/imprinting component (that recognizes patterns in the hierarchies and passes them to the Atomspace, and imprints patterns from the Atomspace on the hierarchies); and also OpenPsi, a special dynamical framework for choosing actions based on motivations. Above the Atomspace we have a host of cognitive processes, which act on the Atomspace, some continually and some only as context dictates, carrying out various sorts of learning and reasoning (pertinent to various sorts of memory) that help the system fulfill its goals and motivations.

Fig. 19.4: MindAgents and AtomSpace in OpenCog. This is a conceptual depiction of one way cognitive processes may interact in OpenCog – they may be wrapped in MindAgent objects, which interact via cooperatively acting on the AtomSpace.
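To make the arrangement depicted in Fig. 19.4 more concrete, the following is a purely illustrative Python sketch; the real OpenCog CogServer and MindAgent classes are implemented in C++ and have richer interfaces, and the agent classes and decay factor shown here are invented for the example. A scheduler repeatedly gives each MindAgent a short cooperative turn, during which the agent reads and writes a shared Atomspace.

from abc import ABC, abstractmethod

class Atomspace:
    """Minimal stand-in: a dict from atom name to a (strength, sti) pair."""
    def __init__(self):
        self.atoms = {}

class MindAgent(ABC):
    @abstractmethod
    def run(self, atomspace):
        """One cooperative step; must return quickly so other agents get a turn."""

class ImportanceDecayAgent(MindAgent):
    def run(self, atomspace):
        for name, (strength, sti) in atomspace.atoms.items():
            atomspace.atoms[name] = (strength, sti * 0.95)   # decay STI each turn

class HebbianAgent(MindAgent):
    def run(self, atomspace):
        pass   # placeholder for Hebbian-link updating between co-important Atoms

def cogserver_loop(agents, atomspace, cycles=3):
    for _ in range(cycles):            # each pass ~ one scheduler cycle
        for agent in agents:
            agent.run(atomspace)

space = Atomspace()
space.atoms["ConceptNode:car-man"] = (0.8, 100.0)
cogserver_loop([ImportanceDecayAgent(), HebbianAgent()], space)
print(space.atoms["ConceptNode:car-man"])   # STI reduced after three cycles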

Fig. 19.5: Links Between Cognitive Processes and the Atomspace. The cognitive processes depicted all act on the Atomspace, in the sense that they operate by observing certain Atoms in the Atomspace and then modifying (or in rare cases deleting) them, and potentially adding new Atoms as well. Atoms represent all forms of knowledge, but some forms of knowledge are additionally represented by external data stores connected to the Atomspace, such as the Procedure Repository; these are also shown as linked to the Atomspace.

Fig. 19.6: Invocation of Atom Operations By Cognitive Processes. This diagram depicts some of the Atom modification, creation and deletion operations carried out by the abstract cognitive processes in the CogPrime architecture.

Fig. 19.7: Relationship Between Multiple Memory Types. The bottom left corner shows a program tree, constituting procedural knowledge. The upper left shows declarative nodes and links in the Atomspace. The upper right corner shows a relevant system goal. The lower right corner contains an image symbolizing relevant episodic and sensory knowledge. All the various types of knowledge link to each other and can be approximately converted to each other.

Fig. 19.8: Example of Explicit Knowledge in the Atomspace. One simple example of explicitly represented knowledge in the Atomspace is linguistic knowledge, such as words and the concepts directly linked to them. Not all of a CogPrime system's concepts correlate to words, but some do.

Fig. 19.9: Example of Implicit Knowledge in the Atomspace. A simple example of implicit knowledge in the Atomspace. The "chicken" and "food" concepts are represented by "maps" of ConceptNodes interconnected by HebbianLinks, where the latter tend to form between ConceptNodes that are often simultaneously important. The bundle of links between nodes in the chicken map and nodes in the food map represents an "implicit, emergent link" between the two concept maps. This diagram also illustrates "glocal" knowledge representation, in that the chicken and food concepts are each represented by individual nodes, but also by distributed maps. The "chicken" ConceptNode, when important, will tend to make the rest of the map important – and vice versa. Part of the overall chicken concept possessed by the system is expressed by the explicit links coming out of the chicken ConceptNode, and part is represented only by the distributed chicken map as a whole.

Chapter 20
The Aera Architecture

Hercules Madonna

Abstract This is a very abstract abstract

20.1 Introduction

In the beginning ...

20.2 Conclusion

And so it went


Appendix A
Glossary


A.1 List of Specialized Acronyms

This includes acronyms that are commonly used in discussing CogPrime, OpenCog and related ideas, plus some that occur here and there in the text for relatively ephemeral reasons.
• AA: Attention Allocation
• ADF: Automatically Defined Function (in the context of Genetic Programming)
• AF: Attentional Focus
• AGI: Artificial General Intelligence
• AV: Attention Value
• BD: Behavior Description
• C-space: Configuration Space
• CBV: Coherent Blended Volition
• CEV: Coherent Extrapolated Volition
• CGGP: Contextually Guided Greedy Parsing
• CSDLN: Compositional Spatiotemporal Deep Learning Network
• CT: Combo Tree
• ECAN: Economic Attention Network
• ECP: Embodied Communication Prior
• EPW: Experiential Possible Worlds (semantics)
• FCA: Formal Concept Analysis
• FI: Fisher Information
• FIM: Frequent Itemset Mining
• FOI: First Order Inference
• FOPL: First Order Predicate Logic
• FOPLN: First Order PLN
• FS-MOSES: Feature Selection MOSES (i.e. MOSES with feature selection integrated a la LIFES)
• GA: Genetic Algorithms


• GB: Global Brain
• GEOP: Goal Evaluator Operating Procedure (in a GOLEM context)
• GIS: Geospatial Information System
• GOLEM: Goal-Oriented LEarning Meta-architecture
• GP: Genetic Programming
• HOI: Higher-Order Inference
• HOPLN: Higher-Order PLN
• HR: Historical Repository (in a GOLEM context)
• HTM: Hierarchical Temporal Memory
• IA: (Allen) Interval Algebra (an algebra of temporal intervals)
• IRC: Imitation / Reinforcement / Correction (Learning)
• LIFES: Learning-Integrated Feature Selection
• LTI: Long Term Importance
• MA: MindAgent
• MOSES: Meta-Optimizing Semantic Evolutionary Search
• MSH: Mirror System Hypothesis
• NARS: Non-Axiomatic Reasoning System
• NLGen: A specific software component within OpenCog, which provides one way of dealing with Natural Language Generation
• OCP: OpenCogPrime
• OP: Operating Program (in a GOLEM context)
• PEPL: Probabilistic Evolutionary Procedure Learning (e.g. MOSES)
• PLN: Probabilistic Logic Networks
• RCC: Region Connection Calculus
• RelEx: A specific software component within OpenCog, which provides one way of dealing with natural language Relationship Extraction
• SAT: Boolean SATisfaction, as a mathematical / computational problem
• SMEPH: Self-Modifying Evolving Probabilistic Hypergraph
• SRAM: Simple Realistic Agents Model
• STI: Short Term Importance
• STV: Simple Truth Value
• TV: Truth Value
• VLTI: Very Long Term Importance
• WSPS: Whole-Sentence Purely-Syntactic Parsing

A.2 Glossary of Specialized Terms

• Abduction: A general form of inference that goes from data describing something to a hypothesis that accounts for the data. Often in an OpenCog context, this refers to the PLN abduction rule, a specific First-Order PLN rule (If A implies C, and B implies C, then maybe A is B), which embodies a simple form of abductive inference. But OpenCog may also carry out abduction, as a general process, in other ways.
• Action Selection: The process via which the OpenCog system chooses which Schema to enact, based on its current goals and context.
• Active Schema Pool: The set of Schema currently in the midst of Schema Execution.

• Adaptive Inference Control: Algorithms or heuristics for guiding PLN inference, that cause inference to be guided differently based on the context in which the inference is taking place, or based on aspects of the inference that are noted as it proceeds.
• AGI Preschool: A virtual world or robotic scenario roughly similar to the environment within a typical human preschool, intended for AGIs to learn in via interacting with the environment and with other intelligent agents.
• Atom: The basic entity used in OpenCog as an element for building representations. Some Atoms directly represent patterns in the world or mind; others are components of representations. There are two kinds of Atoms: Nodes and Links.
• Atom, Frozen: See Atom, Saved
• Atom, Realized: An Atom that exists in RAM at a certain point in time.
• Atom, Saved: An Atom that has been saved to disk or other similar media, and is not actively being processed.
• Atom, Serialized: An Atom that is serialized for transmission from one software process to another, or for saving to disk, etc.
• Atom2Link: A part of OpenCogPrime's language generation system, that transforms appropriate Atoms into words connected via link parser link types.
• Atomspace: A collection of Atoms, comprising the central part of the memory of an OpenCog instance.
• Attention: The aspect of an intelligent system's dynamics focused on guiding which aspects of an OpenCog system's memory & functionality get more computational resources at a certain point in time.
• Attention Allocation: The cognitive process concerned with managing the parameters and relationships guiding what the system pays attention to, at what points in time. This is a term inclusive of Importance Updating and Hebbian Learning.
• Attentional Currency: Short Term Importance and Long Term Importance values are implemented in terms of two different types of artificial money, STICurrency and LTICurrency. Theoretically these may be converted to one another.
• Attentional Focus: The Atoms in an OpenCog Atomspace whose ShortTermImportance values lie above a critical threshold (the AttentionalFocus Boundary). The Attention Allocation subsystem treats these Atoms differently. Qualitatively, these Atoms constitute the system's main focus of attention during a certain interval of time, i.e. it's a moving bubble of attention.
• Attentional Memory: A system's memory of what it's useful to pay attention to, in what contexts. In CogPrime this is managed by the attention allocation subsystem.
• Backward Chainer: A piece of software, wrapped in a MindAgent, that carries out backward chaining inference using PLN.
• CIM-Dynamic: Concretely-Implemented Mind Dynamic, a term for a cognitive process that is implemented explicitly in OpenCog (as opposed to allowed to emerge implicitly from other dynamics). Sometimes a CIM-Dynamic will be implemented via a single MindAgent, sometimes via a set of multiple interrelated MindAgents, occasionally by other means.
• Cognition: In an OpenCog context, this is an imprecise term. Sometimes this term means any process closely related to intelligence; but more often it's used specifically to refer to more abstract reasoning/learning/etc., as distinct from lower-level perception and action.
• Cognitive Architecture: This refers to the logical division of an AI system like OpenCog into interacting parts and processes representing different conceptual aspects of intelligence. It's different from the software architecture, though of course certain cognitive architectures and certain software architectures fit more naturally together.
• Cognitive Cycle: The basic "loop" of operations that an OpenCog system, used to control an agent interacting with a world, goes through rapidly each "subjective moment." Typically a cognitive cycle should be completed in a second or less. It minimally involves perceiving data from the world, storing data in memory, and deciding what if any new actions need to be taken based on the data perceived. It may also involve other processes like deliberative thinking or metacognition. Not all OpenCog processing needs to take place within a cognitive cycle.
• Cognitive Schematic: An implication of the form "Context AND Procedure IMPLIES goal". Learning and utilization of these is key to CogPrime's cognitive process.
• Cognitive Synergy: The phenomenon by which different cognitive processes, controlling a single agent, work together in such a way as to help each other be more intelligent. Typically, if one has cognitive processes that are individually susceptible to combinatorial explosions, cognitive synergy involves coupling them together in such a way that they can help one another overcome each other's internal combinatorial explosions. The CogPrime design is reliant on the hypothesis that its key learning algorithms will display dramatic cognitive synergy when utilized for agent control in appropriate environments.
• CogPrime: The name for the AGI design presented in this book, which is designed specifically for implementation within the OpenCog software framework (and this implementation is OpenCogPrime).
• CogServer: A piece of software, within OpenCog, that wraps up an Atomspace and a number of MindAgents, along with other mechanisms like a Scheduler for controlling the activity of the MindAgents, and code for importing and exporting data from the Atomspace.
• Cognitive Equation: The principle, identified in Ben Goertzel's 1994 book "Chaotic Logic", that minds are collections of pattern-recognition elements, that work by iteratively recognizing patterns in each other and then embodying these patterns as new system elements. This is seen as distinguishing mind from "self-organization" in general, as the latter is not so focused on continual pattern recognition. Colloquially this means that "a mind is a system continually creating itself via recognizing patterns in itself."
• Combo: The programming language used internally by MOSES to represent the programs it evolves. SchemaNodes may refer to Combo programs, whether the latter are learned via MOSES or via some other means. The textual realization of Combo resembles LISP with less syntactic sugar. Internally a Combo program is represented as a program tree.
• Composer: In the PLN design, a rule is denoted a composer if it needs premises for generating its consequent. See generator.
• CogBuntu: An Ubuntu Linux remix that contains all required packages and tools to test and develop OpenCog.
• Concept Creation: A general term for cognitive processes that create new ConceptNodes, PredicateNodes or concept maps representing new concepts.
• Conceptual Blending: A process of creating new concepts via judiciously combining pieces of old concepts. This may occur in OpenCog in many ways, among them the explicit use of a ConceptBlending MindAgent, that blends two or more ConceptNodes into a new one.
• Confidence: A component of an OpenCog/PLN TruthValue, which is a scaling into the interval [0,1] of the weight of evidence associated with a truth value. In the simplest case (of a probabilistic Simple Truth Value), one uses confidence c = n / (n+k), where n is the weight of evidence and k is a parameter. In the case of an Indefinite Truth Value, the confidence is associated with the width of the probability interval.
• Confidence Decay: The process by which the confidence of an Atom decreases over time, as the observations on which the Atom's truth value is based become increasingly obsolete. This may be carried out by a special MindAgent. The rate of confidence decay is subtle and contextually determined, and must be estimated via inference rather than simply assumed a priori.
• Consciousness: CogPrime is not predicated on any particular conceptual theory of consciousness. Informally, the AttentionalFocus is sometimes referred to as the "conscious" mind of a CogPrime system, with the rest of the Atomspace as "unconscious", but this is just an informal usage, not intended to tie the CogPrime design to any particular theory of consciousness. The primary originator of the CogPrime design (Ben Goertzel) tends toward panpsychism, as it happens.
• Context: In addition to its general common-sensical meaning, in CogPrime the term Context also refers to an Atom that is used as the first argument of a ContextLink. The second argument of the ContextLink then contains Links or Nodes, with TruthValues calculated restricted to the context defined by the first argument. For instance, (ContextLink USA (InheritanceLink person obese)).
• Core: The MindOS portion of OpenCog, comprising the Atomspace, the CogServer, and other associated "infrastructural" code.
• Corrective Learning: When an agent learns how to do something, by having another agent explicitly guide it in doing the thing. For instance, teaching a dog to sit by pushing its butt to the ground.
• CSDLN (Compositional Spatiotemporal Deep Learning Network): A hierarchical pattern recognition network, in which each layer corresponds to a certain spatiotemporal granularity, the nodes on a given layer correspond to spatiotemporal regions of a given size, and the children of a node correspond to sub-regions of the region the parent corresponds to. Jeff Hawkins's HTM is one example of a CSDLN, and Itamar Arel's DeSTIN (currently used in OpenCog) is another.
• Declarative Knowledge: Semantic knowledge as would be expressed in propositional or predicate logic facts or beliefs.
• Deduction: In general, this refers to the derivation of conclusions from premises using logical rules. In PLN in particular, this often refers to the exercise of a specific inference rule, the PLN Deduction rule (A → B, B → C, therefore A → C).
• Deep Learning: Learning in a network of elements with multiple layers, involving feedforward and feedback dynamics, and adaptation of the links between the elements. An example deep learning algorithm is DeSTIN, which is being integrated with OpenCog for perception processing.
• Defrosting: Restoring, into the RAM portion of an Atomspace, an Atom (or set thereof) previously saved to disk.
• Demand: In CogPrime's OpenPsi subsystem, this term is used in a manner inherited from the Psi model of motivated action. A Demand in this context is a quantity whose value the system is motivated to adjust. Typically the system wants to keep the Demand between certain minimum and maximum values. An Urge develops when a Demand deviates from its target range.
• Deme: In MOSES, an "island" of candidate programs, closely clustered together in program space, being evolved in an attempt to optimize a certain fitness function. The idea is that within a deme, programs are generally similar enough that reasonable syntax-semantics correlation obtains.
• Derived Hypergraph: The SMEPH hypergraph obtained via modeling a system in terms of a hypergraph representing its internal states and their relationships. For instance, a SMEPH vertex represents a collection of internal states that habitually occur in relation to similar external situations. A SMEPH edge represents a relationship between two SMEPH vertices (e.g. a similarity or inheritance relationship). The terminology "edge/vertex" is used in this context, to distinguish from the "link/node" terminology used in the context of the Atomspace.
• DeSTIN – Deep SpatioTemporal Inference Network: A specific CSDLN created by Itamar Arel, tested on visual perception, and appropriate for integration within CogPrime.
• Dialogue: Linguistic interaction between two or more parties. In a CogPrime context, this may be in English or another natural language, or it may be in Lojban or Psynese.
• Dialogue Control: The process of determining what to say at each juncture in a dialogue. This is distinguished from the linguistic aspects of dialogue, language comprehension and language generation. Dialogue control applies to Psynese or Lojban, as well as to human natural language.
• Dimensional Embedding: The process of embedding entities from some non-dimensional space (e.g. the Atomspace) into an n-dimensional Euclidean space. This can be useful in an AI context because some sorts of queries (e.g. "find everything similar to X", "find a path between X and Y") are much faster to carry out among points in a Euclidean space, than among entities in a space with less geometric structure.
• Distributed Atomspace: An implementation of an Atomspace that spans multiple computational processes; generally this is done to enable spreading an Atomspace across multiple machines.
• Dual Network: A network of mental or informational entities with both a hierarchical structure and a heterarchical structure, and an alignment among the two structures so that each one helps with the maintenance of the other. This is hypothesized to be a critical emergent structure, that must emerge in a mind (e.g. in an Atomspace) in order for it to achieve a reasonable level of human-like general intelligence (and possibly to achieve a high level of pragmatic general intelligence in any physical environment).
• Efficient Pragmatic General Intelligence: A formal, mathematical definition of general intelligence (extending the pragmatic general intelligence), that ultimately boils down to: the ability to achieve complex goals in complex environments using limited computational resources (where there is a specifically given weighting function determining which goals and environments have highest priority). More specifically, the definition weighted-sums the system's normalized goal-achieving ability over (goal, environment) pairs, where the weights are given by some assumed measure over (goal, environment) pairs, and where the normalization is done via dividing by the (space and time) computational resources used for achieving the goal.
• Elegant Normal Form (ENF): Used in MOSES, this is a way of putting programs in a normal form while retaining their hierarchical structure. This is critical if one wishes to probabilistically model the structure of a collection of programs, which is a meaningful operation if the collection of programs is operating within a region of program space where syntax-semantics correlation holds to a reasonable degree. The Reduct library is used to place programs into ENF.

• Embodied Communication Prior: The class of prior distributions over (goal, environment) pairs that are imposed by placing an intelligent system in an environment where most of its tasks involve controlling a spatially localized body in a complex world, and interacting with other intelligent spatially localized bodies. It is hypothesized that many key aspects of human-like intelligence (e.g. the use of different subsystems for different memory types, and cognitive synergy between the dynamics associated with these subsystems) are consequences of this prior assumption. This is related to the Mind-World Correspondence Principle.
• Embodiment: Colloquially, in an OpenCog context, this usually means the use of an AI software system to control a spatially localized body in a complex (usually 3D) world. There are also possible "borderline cases" of embodiment, such as a search agent on the Internet. In a sense any AI is embodied, because it occupies some physical system (e.g. computer hardware) and has some way of interfacing with the outside world.
• Emergence: A property or pattern in a system is emergent if it arises via the combination of other system components or aspects, in such a way that its details would be very difficult (not necessarily impossible in principle) to predict from these other system components or aspects.
• Emotion: Emotions are system-wide responses to the system's current and predicted state. Dorner's Psi theory of emotion contains explanations of many human emotions in terms of underlying dynamics and motivations, and most of these explanations make sense in a CogPrime context, due to CogPrime's use of OpenPsi (modeled on Psi) for motivation and action selection.
• Episodic Knowledge: Knowledge about episodes in an agent's life-history, or the life-history of other agents. CogPrime includes a special dimensional embedding space only for episodic knowledge, easing organization and recall.
• Evolutionary Learning: Learning that proceeds via the rough process of iterated differential reproduction based on fitness, incorporating variations of reproduced entities. MOSES is an explicitly evolutionary-learning-based portion of CogPrime; but CogPrime's dynamics as a whole may also be conceived as evolutionary.
• Exemplar (in the context of imitation learning): When the owner wants to teach an OpenCog-controlled agent a behavior by imitation, he/she gives the pet an exemplar. To teach a virtual pet "fetch" for instance, the owner is going to throw a stick, run to it, grab it with his/her mouth and come back to its initial position.
• Exemplar (in the context of MOSES): Candidate chosen as the core of a new deme, or as the central program within a deme, to be varied by representation building for ongoing exploration of program space.
• Explicit Knowledge Representation: Knowledge representation in which individual, easily humanly identifiable pieces of knowledge correspond to individual elements in a knowledge store (elements that are explicitly there in the software and accessible via very rapid, deterministic operations).
• Extension: In PLN, the extension of a node refers to the instances of the category that the node represents. In contrast is the intension.
• Fishgram (Frequent and Interesting Sub-hypergraph Mining): A pattern mining algorithm for identifying frequent and/or interesting sub-hypergraphs in the Atomspace.
• First-Order Inference (FOI): The subset of PLN that handles Logical Links not involving VariableAtoms or higher-order functions. The other aspect of PLN, Higher-Order Inference, uses Truth Value formulas derived from First-Order Inference.

• Forgetting: The process of removing Atoms from the in-RAM portion of the Atomspace, when RAM gets short and they are judged not as valuable to retain in RAM as other Atoms. This is commonly done using the LTI values of the Atoms (removing the lowest-LTI Atoms, or using more complex strategies involving the LTI of groups of interconnected Atoms). May be done by a dedicated Forgetting MindAgent. VLTI may be used to determine the fate of forgotten Atoms.
• Forward Chainer: A control mechanism (MindAgent) for PLN inference, that works by taking existing Atoms and deriving conclusions from them using PLN rules, and then iterating this process. The goal is to derive new Atoms that are interesting according to some given criterion.
• Frame2Atom: A simple system of hand-coded rules for translating the output of RelEx2Frame (logical representation of semantic relationships using FrameNet relationships) into Atoms.
• Freezing: Saving Atoms from the in-RAM Atomspace to disk.
• General Intelligence: Often used in an informal, commonsensical sense, to mean the ability to learn and generalize beyond specific problems or contexts. Has been formalized in various ways as well, including formalizations of the notion of "achieving complex goals in complex environments" and "achieving complex goals in complex environments using limited resources." Usually interpreted as a fuzzy concept, according to which absolutely general intelligence is physically unachievable, and humans have a significant level of general intelligence, but far from the maximally physically achievable degree.
• Generalized Hypergraph: A hypergraph with some additional features, such as links that point to links, and nodes that are seen as "containing" whole sub-hypergraphs. This is the most natural and direct way to mathematically/visually model the Atomspace.
• Generator: In the PLN design, a rule is denoted a generator if it can produce its consequent without needing premises (e.g. LookupRule, which just looks the consequent up in the AtomSpace). See composer.
• Global, Distributed Memory: Memory that stores items as implicit knowledge, with each memory item spread across multiple components, stored as a pattern of organization or activity among them.
• Glocal Memory: The storage of items in memory in a way that involves both localized and global, distributed aspects.
• Goal: An Atom representing a function that a system (like OpenCog) is supposed to spend a certain non-trivial percentage of its attention optimizing. The goal, informally speaking, is to maximize the Atom's truth value.
• Goal, Implicit: A goal that an intelligent system, in practice, strives to achieve; but that is not explicitly represented as a goal in the system's knowledge base.
• Goal, Explicit: A goal that an intelligent system explicitly represents in its knowledge base, and expends some resources trying to achieve. Goal Nodes (which may be Nodes or, e.g. ImplicationLinks) are used for this purpose in OpenCog.
• Goal-Driven Learning: Learning that is driven by the cognitive schematic, i.e. by the quest of figuring out which procedures can be expected to achieve a certain goal in a certain sort of context.
• Grounded SchemaNode: See SchemaNode, Grounded.
• Hebbian Learning: An aspect of Attention Allocation, centered on creating and updating HebbianLinks, which represent the simultaneous importance of the Atoms joined by the HebbianLink.

• Hebbian Links: Links recording information about the associative relationship (co-occurrence) between Atoms. These include symmetric and asymmetric HebbianLinks.
• Heterarchical Network: A network of linked elements in which the semantic relationships associated with the links are generally symmetrical (e.g. they may be similarity links, or symmetrical associative links). This is one important sort of subnetwork of an intelligent system; see Dual Network.
• Hierarchical Network: A network of linked elements in which the semantic relationships associated with the links are generally asymmetrical, and the parent nodes of a node have a more general scope and some measure of control over their children (though there may be important feedback dynamics too). This is one important sort of subnetwork of an intelligent system; see Dual Network.
• Higher-Order Inference (HOI): PLN inference involving variables or higher-order functions. In contrast to First-Order Inference (FOI).
• Hillclimbing: A general term for greedy, local optimization techniques, including some relatively sophisticated ones that involve "mildly nonlocal" jumps.
• Human-Level Intelligence: General intelligence that's "as smart as" human general intelligence, even if in some respects quite unlike human intelligence. An informal concept, which generally doesn't come up much in CogPrime work, but is used frequently by some other AI theorists.
• Human-Like Intelligence: General intelligence with properties and capabilities broadly resembling those of humans, but not necessarily precisely imitating human beings.
• Hypergraph: A conventional hypergraph is a collection of nodes and links, where each link may span any number of nodes. OpenCog makes use of generalized hypergraphs (the Atomspace is one of these).
• Imitation Learning: Learning via copying what some other agent is observed to do.
• Implication: Often refers to an ImplicationLink between two PredicateNodes, indicating an (extensional, intensional or mixed) logical implication.
• Implicit Knowledge Representation: Representation of knowledge via having easily humanly identifiable pieces of knowledge correspond to the pattern of organization and/or dynamics of elements, rather than via having individual elements correspond to easily humanly identifiable pieces of knowledge.
• Importance: A generic term for the Attention Values associated with Atoms. Most commonly these are STI (short term importance) and LTI (long term importance) values. Other importance values corresponding to various different time scales are also possible. In general an importance value reflects an estimate of the likelihood an Atom will be useful to the system over some particular future time-horizon. STI is generally relevant to processor time allocation, whereas LTI is generally relevant to memory allocation.
• Importance Decay: The process of Atom importance values (e.g. STI and LTI) decreasing over time, if the Atoms are not utilized. Importance decay rates may in general be context-dependent.
• Importance Spreading: A synonym for Importance Updating, intended to highlight the similarity with "activation spreading" in neural and semantic networks.
• Importance Updating: The CIM-Dynamic that periodically (frequently) updates the STI and LTI values of Atoms based on their recent activity and their relationships.
• Imprecise Truth Value: Peter Walley's imprecise probabilities are intervals [L,U], interpreted as lower and upper bounds of the means of probability distributions in an envelope of distributions. In general, the term may be used to refer to any truth value involving intervals or related constructs, such as indefinite probabilities.
• Indefinite Probability: An extension of a standard imprecise probability, comprising a credible interval for the means of probability distributions governed by a given second-order distribution.
• Indefinite Truth Value: An OpenCog TruthValue object wrapping up an indefinite probability.
• Induction: In PLN, a specific inference rule (A → B, A → C, therefore B → C). In general, the process of heuristically inferring that what has been seen in multiple examples will be seen again in new examples. Induction in the broad sense may be carried out in OpenCog by methods other than PLN induction. When emphasis needs to be laid on the particular PLN inference rule, the phrase "PLN Induction" is used.
• Inference: Generally speaking, the process of deriving conclusions from assumptions. In an OpenCog context, this often refers to the PLN inference system. Inference in the broad sense is distinguished from general learning via some specific characteristics, such as the intrinsically incremental nature of inference: it proceeds step by step.
• Inference Control: A cognitive process that determines what logical inference rule (e.g. what PLN rule) is applied to what data, at each point in the dynamic operation of an inference process.
• Integrative AGI: An AGI architecture, like CogPrime, that relies on a number of different powerful, reasonably general algorithms all cooperating together. This is different from an AGI architecture that is centered on a single algorithm, and also different from an AGI architecture that expects intelligent behavior to emerge from the collective interoperation of a number of simple elements (without any sophisticated algorithms coordinating their overall behavior).
• Integrative Cognitive Architecture: A cognitive architecture intended to support integrative AGI.
• Intelligence: An informal, natural language concept. "General intelligence" is one slightly more precise specification of a related concept; "Universal intelligence" is a fully precise specification of a related concept. Other specifications of related concepts made in the particular context of CogPrime research are the pragmatic general intelligence and the efficient pragmatic general intelligence.
• Intension: In PLN, the intension of a node consists of Atoms representing properties of the entity the node represents.
• Intentional Memory: A system's knowledge of its goals and their subgoals, and associations between these goals and procedures and contexts (e.g. cognitive schematics).
• Internal Simulation World: A simulation engine used to simulate an external environment (which may be physical or virtual), used by an AGI system as its "mind's eye" in order to experiment with various action sequences and envision their consequences, or observe the consequences of various hypothetical situations. Particularly important for dealing with episodic knowledge.
• Interval Algebra: Allen Interval Algebra, a mathematical theory of the relationships between time intervals. CogPrime utilizes a fuzzified version of classic Interval Algebra.
• IRC Learning (Imitation, Reinforcement, Correction): Learning via interaction with a teacher, involving a combination of imitating the teacher, getting explicit reinforcement signals from the teacher, and having one's incorrect or suboptimal behaviors guided toward betterness by the teacher in real-time. This is a large part of how young humans learn.

• Knowledge Base: A shorthand for the totality of knowledge possessed by an intelligent system during a certain interval of time (whether or not this knowledge is explicitly represented). Put differently: this is an intelligence's total memory contents (inclusive of all types of memory) during an interval of time.
• Language Comprehension: The process of mapping natural language speech or text into a more "cognitive", largely language-independent representation. In OpenCog this has been done by various pipelines consisting of dedicated natural language processing tools, e.g. a pipeline text → Link Parser → RelEx → RelEx2Frame → Frame2Atom → Atomspace; and alternatively a pipeline Link Parser → Link2Atom → Atomspace. It would also be possible to do language comprehension purely via PLN and other generic OpenCog processes, without using specialized language processing tools.
• Language Generation: The process of mapping (largely language-independent) cognitive content into speech or text. In OpenCog this has been done by various pipelines consisting of dedicated natural language processing tools, e.g. a pipeline Atomspace → NLGen → text; or more recently Atomspace → Atom2Link → surface realization → text. It would also be possible to do language generation purely via PLN and other generic OpenCog processes, without using specialized language processing tools.
• Language Processing: Processing of human language is decomposed, in CogPrime, into Language Comprehension, Language Generation, and Dialogue Control.
• Learning: In general, the process of a system adapting based on experience, in a way that increases its intelligence (its ability to achieve its goals). The theory underlying CogPrime doesn't distinguish learning from reasoning, associating, or other aspects of intelligence.
• Learning Server: In some OpenCog configurations, this refers to a software server that performs "offline" learning tasks (e.g. using MOSES or hillclimbing), and is in communication with an Operational Agent Controller software server that performs real-time agent control and dispatches learning tasks to and receives results from the Learning Server.
• Linguistic Links: A catch-all term for Atoms explicitly representing linguistic content, e.g. WordNode, SentenceNode, CharacterNode.
• Link: A type of Atom, representing a relationship among one or more Atoms. Links and Nodes are the two basic kinds of Atoms.
• Link Parser: A natural language syntax parser, created by Sleator and Temperley at Carnegie-Mellon University, and currently used as part of OpenCogPrime's natural language comprehension and natural language generation system.
• Link2Atom: A system for translating link parser links into Atoms. It attempts to resolve precisely as much ambiguity as needed in order to translate a given assemblage of link parser links into a unique Atom structure.
• Lobe: A term sometimes used to refer to a portion of a distributed Atomspace that lives in a single computational process. Often different lobes will live on different machines.
• Localized Memory: Memory that stores each item using a small number of closely-connected elements.
• Logic: In an OpenCog context, this usually refers to a set of formal rules for translating certain combinations of Atoms into "conclusion" Atoms. The paradigm case at present is the PLN probabilistic logic system, but OpenCog can also be used together with other logics.
• Logical Links: Any Atoms whose truth values are primarily determined or adjusted via logical rules, e.g. PLN's InheritanceLink, SimilarityLink, ImplicationLink, etc. The term isn't usually applied to other links like HebbianLinks whose semantics isn't primarily logic-based, even though these other links can be processed via logical inference (e.g. PLN) by interpreting them logically.

based, even though these other links can be processed via (e.g. PLN) logical inference via interpreting them logically. • Lojban: A constructed human language, with a completely formalized syntax and a highly formalized semantics, and a small but active community of speakers. In principle this seems an extremely good method for communication between humans and early-stage AGI sys- tems. • Lojban++: A variant of Lojban that incorporates English words, enabling more flexible expression without the need for frequent invention of new Lojban words. • Long Term Importance (LTI): A value associated with each Atom, indicating roughly the expected utility to the system of keeping that Atom in RAM rather than saving it to disk or deleting it. It’s possible to have multiple LTI values pertaining to different time scales, but so far practical implementation and most theory has centered on the option of a single LTI value. • LTI: Long Term Importance • Map: A collection of Atoms that are interconnected in such a way that they tend to be commonly active (i.e. to have high STI, e.g. enough to be in the AttentionalFocus, at the same time). • Map Encapsulation: The process of automatically identifying maps in the Atomspace, and creating Atoms that ”encapsulate” them; the Atom encapsulation a map would link to all the Atoms in the map. This is a way of making global memory into local memory, thus making the system’s memory glocal and explicitly manifesting the ”cognitive equation.” This may be carried out via a dedicated MapEncapsulation MindAgent. • Map Formation: The process via which maps form in the Atomspace. This need not be explicit; maps may form implicitly via the action of Hebbian Learning. It will commonly occur that Atoms frequently co-occurring in the AttentionalFocus, will come to be joined together in a map. • Memory Types: In CogPrime this generally refers to the different types of memory that are embodied in different data structures or processes in the CogPrime architecture, e.g. declarative (semantic), procedural, attentional, intentional, episodic, sen- sorimotor. • Mind-World Correspondence Principle: The principle that, for a mind to display efficient pragmatic general intelligence relative to a world, it should display many of the same key structural properties as that world. This can be formalized by modeling the world and mind as probabilistic state transition graphs, and saying that the categories implicit in the state transition graphs of the mind and world should be inter-mappable via a high- probability morphism. • Mind OS: A synonym for the OpenCog Core. • MindAgent: An OpenCog software object, residing in the CogServer, that carries out some processes in interaction with the Atomspace. A given conceptual cognitive process (e.g. PLN inference, Attention allocation, etc.) may be carried out by a number of different MindAgents designed to work together. • Mindspace: A model of the set of states of an intelligent system as a geometrical space, imposed by assuming some metric on the set of mind-states. This may be used as a tool for formulating general principles about the dynamics of generally intelligent systems. • Modulators: Parameters in the Psi model of motivated, emotional cognition, that modu- late the way a system perceives, reasons about and interacts with the world. A.2 Glossary of Specialized Terms 319

• MOSES (Meta-Optimizing Semantic Evolutionary Search): An algorithm for procedure learning, which in the current implementation learns programs in the Combo language. MOSES is an evolutionary learning system that differs from typical genetic programming systems in multiple respects, including: a subtler framework for managing multiple “demes” or “islands” of candidate programs; a library called Reduct for placing programs in Elegant Normal Form; and the use of probabilistic modeling in place of, or in addition to, mutation and crossover as a means of determining which new candidate programs to try. (A simplified sketch of the deme-based loop appears after this glossary.)
• Motoric: Pertaining to the control of physical actuators, e.g. those connected to a robot. May sometimes be used to refer to the control of movements of a virtual character as well.
• Moving Bubble of Attention: The Attentional Focus of a CogPrime system.
• Natural Language Comprehension: See Language Comprehension
• Natural Language Generation: See Language Generation
• Natural Language Processing (NLP): See Language Processing
• NLGen: Software for carrying out the surface realization phase of natural language generation, via translating collections of RelEx output relationships into English sentences. It was made functional for simple sentences and some complex sentences; it is not currently under active development, as work has shifted to the related Atom2Link approach to language generation.
• Node: A type of Atom. Links and Nodes are the two basic kinds of Atoms. Nodes, mathematically, can be thought of as “0-ary” links. Some types of Nodes refer to external or mathematical entities (e.g. WordNode, NumberNode); others are purely abstract, e.g. a ConceptNode is characterized purely by the Links relating it to other Atoms. GroundedPredicateNodes and GroundedSchemaNodes connect to explicitly represented procedures (sometimes in the Combo language); ungrounded PredicateNodes and SchemaNodes are abstract and, like ConceptNodes, characterized purely by their relationships.
• Node Probability: Many PLN inference rules rely on probabilities associated with Nodes. Node probabilities are often easiest to interpret in a specific context; e.g. the probability P(cat) makes obvious sense in the context of a typical American house, or in the context of the center of the sun. Without any contextual specification, P(A) is taken to mean the probability that a randomly chosen occasion of the system’s experience includes some instance of A.
• Novamente Cognition Engine (NCE): A proprietary proto-AGI software system, the predecessor to OpenCog. Many parts of the NCE were open-sourced to form portions of OpenCog, but some NCE code was not included in OpenCog; and OpenCog now includes multiple aspects, and plenty of code, that were not in the NCE.
• OpenCog: A software framework intended for the development of AGI systems, and also for narrow-AI applications built using tools that have AGI applications. Co-designed with the CogPrime cognitive architecture, but not exclusively bound to it.
• OpenCog Prime (OCP): The implementation of the CogPrime cognitive architecture within the OpenCog software framework.
• OpenPsi: CogPrime’s architecture for motivation-driven action selection, based on adapting Dorner’s Psi model for use in the OpenCog framework.
• Operational Agent Controller (OAC): In some OpenCog configurations, a software server containing a CogServer devoted to real-time control of an agent (e.g. a virtual world agent, or a robot). Background, offline learning tasks may then be dispatched to other software processes, e.g. to a Learning Server.

• Pattern: In a CogPrime context, the term “pattern” is generally used to refer to a process that produces some entity and is judged simpler than that entity.
• Pattern Mining: The process of extracting an (often large) number of patterns from some body of information, subject to some criterion regarding which patterns are of interest. Often (but not exclusively) it refers to algorithms that are rapid or “greedy,” finding a large number of simple patterns relatively inexpensively.
• Pattern Recognition: The process of identifying and representing a pattern in some substrate (e.g. some collection of Atoms, or some raw perceptual data).
• Patternism: The philosophical principle holding that, from the perspective of engineering intelligent systems, it is sufficient and useful to think about mental processes in terms of (static and dynamical) patterns.
• Perception: The process of understanding data from sensors. When natural language is ingested in textual format, this is generally not considered perceptual. Perception may be taken to encompass pre-processing that prepares sensory data for ingestion into the Atomspace, processing via specialized perception-processing systems like DeSTIN that are connected to the Atomspace, and more cognitive-level processing within the Atomspace oriented toward understanding what has been sensed.
• Piagetan Stages: A series of stages of cognitive development hypothesized by developmental psychologist Jean Piaget, which are easy to interpret in the context of developing CogPrime systems. The basic stages are: Infantile, Pre-operational, Concrete Operational and Formal. Post-formal stages have been discussed by theorists since Piaget and seem relevant to AGI, especially advanced AGI systems capable of strong self-modification.
• PLN: Short for Probabilistic Logic Networks
• PLN, First-Order: See First-Order Inference
• PLN, Higher-Order: See Higher-Order Inference
• PLN Rules: A PLN Rule takes as input one or more Atoms (the “premises,” usually Links) and outputs an Atom that is a “logical conclusion” of those Atoms. The TruthValue of the conclusion is determined by a PLN Formula associated with the Rule.
• PLN Formulas: A PLN Formula, corresponding to a PLN Rule, takes the TruthValues of the premises and produces the TruthValue of the conclusion. A single Rule may correspond to multiple Formulas, where each Formula deals with a different sort of TruthValue. (A worked deduction example appears after this glossary.)
• Pragmatic General Intelligence: A formalization of the concept of general intelligence, based on the idea that general intelligence is the capability to achieve goals in environments, calculated as a weighted average over some fuzzy set of goals and environments.
• Predicate Evaluation: The process of determining the TruthValue of a predicate, embodied in a PredicateNode. This may be recursive, as the predicate referenced internally by a GroundedPredicateNode (and represented via a Combo program tree) may itself internally reference other PredicateNodes.
• Probabilistic Logic Networks (PLN): A mathematical and conceptual framework for reasoning under uncertainty, integrating aspects of predicate and term logic with extensions of imprecise probability theory. OpenCogPrime’s central tool for symbolic reasoning.
• Procedural Knowledge: Knowledge regarding which series of actions (or action-combinations) are useful for an agent to undertake in which circumstances. In CogPrime this knowledge may be learned in a number of ways, e.g. via PLN, via Hebbian learning of Schema Maps, or via explicit learning of Combo programs via MOSES or hillclimbing. Procedures are represented as SchemaNodes or Schema Maps.

• Procedure Evaluation/Execution: A general term encompassing both Schema Execution and Predicate Evaluation, both of which are similar computational processes involving manipulation of Combo trees associated with ProcedureNodes.
• Procedure Learning: Learning of procedural knowledge, via any method, e.g. evolutionary learning (e.g. MOSES), inference (e.g. PLN), or reinforcement learning (e.g. Hebbian learning).
• Procedure Node: A SchemaNode or PredicateNode.
• Psi: A model of motivated action and emotion, originated by Dietrich Dorner and further developed by Joscha Bach, who incorporated it in his proto-AGI system MicroPsi. OpenCogPrime’s motivated-action component, OpenPsi, is roughly based on the Psi model.
• Psynese: A system enabling different OpenCog instances to communicate without using natural language, by directly exchanging Atom subgraphs, using a special system to map references in the speaker’s mind into matching references in the listener’s mind.
• Psynet Model: An early version of the theory of mind underlying CogPrime, referred to in some early writings on the Webmind AI Engine and Novamente Cognition Engine. The concepts underlying the psynet model are still part of the theory underlying CogPrime, but the name has been deprecated as it never really caught on.
• Reasoning: See Inference
• Reduct: A code library, used within MOSES, applying a collection of hand-coded rewrite rules that transform Combo programs into Elegant Normal Form.
• Region Connection Calculus: A mathematical formalism describing a system of basic operations among spatial regions. Used in CogPrime as part of spatial inference, to provide relations and rules to be referenced via PLN and potentially other subsystems.
• Reinforcement Learning: Learning procedures via experience, in a manner explicitly guided toward learning procedures that will maximize the system’s expected future reward. CogPrime does this implicitly whenever it tries to learn procedures that will maximize some Goal whose TruthValue is estimated via an expected-reward calculation (where “reward” may mean simply the TruthValue of some Atom defined as “reward”). Goal-driven learning is more general than reinforcement learning as thus defined; and the learning that CogPrime does, which is only partially goal-driven, is more general still.
• RelEx: A software system used in OpenCog as part of natural language comprehension, to map the output of the link parser into more abstract semantic relationships. These more abstract relationships may then be entered directly into the Atomspace, or they may be further abstracted before being entered into the Atomspace, e.g. by RelEx2Frame rules.
• RelEx2Frame: A system of rules for translating RelEx output into Atoms, based on the FrameNet ontology. The output of the RelEx2Frame rules makes use of the FrameNet library of semantic relationships. The current (2012) RelEx2Frame rule-base is problematic, and the RelEx2Frame system has been deprecated as a result, in favor of Link2Atom. However, the ideas embodied in these rules may be useful; if cleaned up, the rules might profitably be ported into the Atomspace as ImplicationLinks.
• Representation Building: A stage within MOSES, wherein a candidate Combo program tree (within a deme) is modified by replacing one or more tree nodes with alternative tree nodes, thus obtaining a new, different candidate program within that deme. This process currently relies on hand-coded knowledge regarding which types of tree nodes a given tree node may sensibly be experimentally replaced with (e.g. an AND node might sensibly be replaced with an OR node, but not with a node representing a “kick” action).

• Request for Services (RFS): In CogPrime’s Goal-driven action system, an RFS is a package sent from a Goal Atom to another Atom, offering it a certain amount of STI currency if it can deliver what the goal wants (an increase in the goal’s TruthValue). RFSs may be passed on, e.g. from goals to subgoals to sub-subgoals, but eventually an RFS reaches a GroundedSchemaNode, and when the corresponding Schema is executed, the payment implicit in the RFS is made. (A toy illustration of this attentional economy appears after this glossary.)
• Robot Preschool: An AGI Preschool in our physical world, intended for robotically embodied AGIs.
• Robotic Embodiment: Using an AGI to control a robot. The AGI may be running on hardware physically contained in the robot, or may run elsewhere and control the robot via networking methods such as wifi.
• Scheduler: Part of the CogServer that controls which processes (e.g. which MindAgents) get processor time, at which point in time.
• Schema: A “script” describing a process to be carried out. This may be explicit, as in the case of a GroundedSchemaNode, or implicit, as is the case with Schema Maps or ungrounded SchemaNodes.
• Schema Encapsulation: The process of automatically recognizing a Schema Map in an Atomspace, creating a Combo (or other) program embodying the process carried out by this Schema Map, and then storing this program in the Procedure Repository and associating it with a particular SchemaNode. This translates distributed, global procedural memory into localized procedural memory. It is a special case of Map Encapsulation.
• Schema Execution: The process of “running” a Grounded Schema, similar to running a computer program. Or, phrased alternately: the process of executing the Schema referenced by a GroundedSchemaNode. This may be recursive, as the schema referenced internally by a GroundedSchemaNode (and represented via a Combo program tree) may itself internally reference other GroundedSchemaNodes.
• Schema, Grounded: A Schema that is associated with a specific executable program (either a Combo program or, say, C++ code).
• Schema Map: A collection of Atoms, including SchemaNodes, that tend to be enacted in a certain order (or set of orders), thus habitually enacting the same process. This is a distributed, globalized way of storing and enacting procedures.
• Schema, Ungrounded: A Schema that represents an abstract procedure, not associated with any particular executable program.
• Schematic Implication: A general, conceptual name for implications of the form ((Context AND Procedure) IMPLIES Goal).
• SegSim: A name for the main algorithm underlying the NLGen language generation software. The algorithm is based on segmenting a collection of Atoms into small parts, and matching each part against memory to find, for each part, cases where similar Atom-collections already have known linguistic expression.
• Self-Modification: A term generally used for AI systems that can purposefully modify their core algorithms and representations. Formally and crisply distinguishing this sort of “strong self-modification” from “mere” learning is a tricky matter.
• Sensorimotor: Pertaining to sensory data, motoric actions, and their combination and intersection.
• Sensory: Pertaining to data received by the AGI system from the outside world. In a CogPrime system that perceives language directly as text, the textual input will generally not be considered “sensory” (on the other hand, speech audio data would be considered “sensory”).
• Short Term Importance (STI): A value associated with each Atom, indicating roughly the expected immediate utility to the system of devoting attention (e.g. processor time) to that Atom; Atoms with sufficiently high STI enter the AttentionalFocus. It is possible to have multiple STI values pertaining to different time scales, but so far practical implementation and most theory have centered on a single STI value.
• Similarity: A link type indicating the probabilistic similarity between two different Atoms. Generically this is a combination of Intensional Similarity (similarity of properties) and Extensional Similarity (similarity of members).
• Simple Truth Value: A TruthValue involving a pair (s, d) indicating strength s (e.g. probability or fuzzy set membership) and confidence d. The confidence d may be replaced by other options such as a count n or a weight of evidence w.
• Simulation World: See Internal Simulation World
• SMEPH (Self-Modifying Evolving Probabilistic Hypergraphs): A style of modeling systems, in which each system is associated with a derived hypergraph.
• SMEPH Edge: A link in a SMEPH derived hypergraph, indicating an empirically observed relationship (e.g. inheritance or similarity) between two SMEPH Vertices.
• SMEPH Vertex: A node in a SMEPH derived hypergraph representing a system, indicating a collection of system states empirically observed to arise in conjunction with the same external stimuli.
• Spatial Inference: PLN reasoning including Atoms that explicitly reference spatial relationships.
• Spatiotemporal Inference: PLN reasoning including Atoms that explicitly reference spatial and temporal relationships.
• STI: Shorthand for Short Term Importance
• Strength: The main component of a TruthValue object, lying in the interval [0,1], referring either to a probability (in cases like InheritanceLink, SimilarityLink, EquivalenceLink, ImplicationLink, etc.) or a fuzzy value (as in MemberLink or EvaluationLink).
• Strong Self-Modification: Generally used as synonymous with Self-Modification, in a CogPrime context.
• Subsymbolic: Involving the processing of data using elements that have no correspondence to natural language terms or abstract concepts, and that are not naturally interpreted as symbolically “standing for” other things. Often used to refer to processes such as perception processing or motor control, which are concerned with entities like pixels or commands like “rotate servomotor 15 by 10 degrees theta and 55 degrees phi.” The distinction between “symbolic” and “subsymbolic” is conventional in the history of AI, but seems difficult to formalize rigorously; logic-based AI systems are typically considered “symbolic,” yet the boundary between the two categories remains blurry in practice.
• Supercompilation: A technique for program optimization, which globally rewrites a program into a usually very different looking program that does the same thing. A prototype supercompiler was applied to Combo programs with successful results.
• Surface Realization: The process of taking a collection of Atoms and transforming them into a series of words in a (usually natural) language. A stage in the overall process of language generation.
• Symbol Grounding: The mapping of a symbolic term into perceptual or motoric entities that help define the meaning of the symbolic term. For instance, the concept “Cat” may be grounded by images of cats, experiences of interactions with cats, imaginations of being a cat, etc.
• Symbolic: Pertaining to the formation or manipulation of symbols, i.e. mental entities that are explicitly constructed to represent other entities. Often contrasted with subsymbolic.
• Syntax-Semantics Correlation: In the context of MOSES and program learning more broadly, this refers to the property whereby distance in syntactic space (distance between the syntactic structures of programs, e.g. when they are represented as program trees) and distance in semantic space (distance between the behaviors of programs, e.g. when they are represented as sets of input/output pairs) are reasonably well correlated. This can often happen among sets of programs that are not too widely dispersed in program space. The Reduct library is used to place Combo programs in Elegant Normal Form, which increases the level of syntax-semantics correlation between them. The programs in a single MOSES deme are often clustered closely enough together that they have reasonably high syntax-semantics correlation.
• System Activity Table: An OpenCog component that records information regarding what the system did in the past.
• Temporal Inference: Reasoning that heavily involves Atoms representing temporal information, e.g. information about the duration of events, or their temporal relationships (before, after, during, beginning, ending). As implemented in CogPrime, it makes use of an uncertain version of Allen Interval Algebra.
• Truth Value: A package of information associated with an Atom, indicating its degree of truth. SimpleTruthValue and IndefiniteTruthValue are two common, particular kinds. Multiple truth values associated with the same Atom from different perspectives may be grouped into CompositeTruthValue objects.
• Universal Intelligence: A technical term introduced by Shane Legg and Marcus Hutter, describing (roughly speaking) the average capability of a system to carry out computable goals in computable environments, where goal/environment pairs are weighted via the length of the shortest program for computing them. (See the formula following this glossary.)
• Urge: In OpenPsi, an Urge develops when a Demand deviates from its target range.
• Very Long Term Importance (VLTI): A bit associated with Atoms, which determines whether, when an Atom is forgotten (removed from RAM), it is saved to disk (frozen) or simply deleted.
• Virtual AGI Preschool: A virtual world intended for AGI teaching/training/learning, bearing broad resemblance to the preschool environments used for young humans.
• Virtual Embodiment: Using an AGI to control an agent living in a virtual world or game world, typically (but not necessarily) a 3D world with broad similarity to the everyday human world.
• Webmind AI Engine: A predecessor to the Novamente Cognition Engine and OpenCog, developed 1997-2001, with many similar concepts (and also some different ones) but quite different algorithms and software architecture.
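
To complement the attention-related entries above (Short Term Importance, Long Term Importance, Very Long Term Importance), here is a minimal illustrative sketch in Python. All names here (AttentionValue, Atom, forget, frozen_store) are hypothetical and chosen for clarity; this is not the OpenCog API, and the real economic attention-allocation dynamics are considerably richer.

from dataclasses import dataclass, field

@dataclass
class AttentionValue:
    sti: float = 0.0   # Short Term Importance: utility of attending to the Atom now
    lti: float = 0.0   # Long Term Importance: utility of keeping the Atom in RAM
    vlti: bool = False # Very Long Term Importance: if forgotten, freeze to disk rather than delete

@dataclass
class Atom:
    name: str
    av: AttentionValue = field(default_factory=AttentionValue)

def forget(atoms, lti_threshold, frozen_store):
    """Illustrative forgetting pass: Atoms whose LTI falls below the
    threshold are removed from RAM; VLTI decides freeze vs. delete."""
    kept = []
    for atom in atoms:
        if atom.av.lti >= lti_threshold:
            kept.append(atom)
        elif atom.av.vlti:
            frozen_store[atom.name] = atom   # "freeze": save to a disk-like store
        # else: simply dropped (deleted)
    return kept

if __name__ == "__main__":
    atoms = [Atom("cat", AttentionValue(sti=0.9, lti=0.8)),
             Atom("obscure_fact", AttentionValue(sti=0.1, lti=0.05, vlti=True)),
             Atom("noise", AttentionValue(sti=0.0, lti=0.01))]
    frozen = {}
    in_ram = forget(atoms, lti_threshold=0.1, frozen_store=frozen)
    print([a.name for a in in_ram], list(frozen))   # ['cat'] ['obscure_fact']

The design point illustrated is that LTI governs residence in RAM, while VLTI only matters at the moment of forgetting, deciding between freezing and deletion.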
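To illustrate the division of labor between PLN Rules and PLN Formulas, the following sketch pairs the deduction rule (Inheritance A B, Inheritance B C yields Inheritance A C) with the independence-based deduction strength formula reported in the PLN literature. The class and function names are illustrative rather than actual OpenCog types, and the confidence update shown is a deliberately crude placeholder; real PLN confidence handling is subtler.

from dataclasses import dataclass

@dataclass
class SimpleTruthValue:
    strength: float    # s in [0,1]: probability or fuzzy membership
    confidence: float  # d in [0,1]: weight of evidence

def deduction_formula(tv_ab, tv_bc, s_b, s_c):
    """Independence-assumption deduction strength:
    s(A->C) = s(A->B)*s(B->C) + (1 - s(A->B)) * (s(C) - s(B)*s(B->C)) / (1 - s(B))."""
    s_ab, s_bc = tv_ab.strength, tv_bc.strength
    s_ac = s_ab * s_bc + (1.0 - s_ab) * (s_c - s_b * s_bc) / (1.0 - s_b)
    s_ac = min(1.0, max(0.0, s_ac))
    # Placeholder confidence: never more confident than the weaker premise.
    conf = min(tv_ab.confidence, tv_bc.confidence) * 0.9
    return SimpleTruthValue(s_ac, conf)

# Deduction rule: Inheritance A B, Inheritance B C  |-  Inheritance A C
tv_ab = SimpleTruthValue(0.8, 0.9)   # e.g. Inheritance cat mammal
tv_bc = SimpleTruthValue(0.9, 0.8)   # e.g. Inheritance mammal animal
tv_ac = deduction_formula(tv_ab, tv_bc, s_b=0.2, s_c=0.3)  # node probabilities of B and C
print(tv_ac)                         # strength ~0.75, confidence ~0.72

Note how the formula also consumes the node probabilities of B and C, which is one reason Node Probabilities (see the entry above) matter for first-order inference.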
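The Request-for-Services mechanism can be pictured as a tiny artificial economy in which goals pay out STI currency in exchange for increases in their TruthValues. The sketch below is a toy illustration under that reading only; the names Goal, GroundedSchema and issue_rfs are hypothetical, and a real RFS may pass through chains of subgoals and involve far more elaborate bookkeeping.

class GroundedSchema:
    def __init__(self, name, effect):
        self.name, self.effect, self.sti = name, effect, 0.0
    def execute(self, goal, offered_sti):
        before = goal.truth_value
        goal.truth_value = self.effect(goal.truth_value)  # run the procedure
        if goal.truth_value > before:                     # the goal got what it wanted
            self.sti += offered_sti                       # the RFS payment is made
            goal.sti -= offered_sti

class Goal:
    def __init__(self, name, truth_value, sti):
        self.name, self.truth_value, self.sti = name, truth_value, sti
    def issue_rfs(self, schema, offered_sti):
        # In reality an RFS may pass through subgoals and sub-subgoals before
        # reaching a GroundedSchemaNode; here it goes there directly.
        schema.execute(self, offered_sti)

goal = Goal("be_fed", truth_value=0.25, sti=10.0)
eat = GroundedSchema("eat_food", effect=lambda tv: min(1.0, tv + 0.5))
goal.issue_rfs(eat, offered_sti=3.0)
print(goal.truth_value, goal.sti, eat.sti)   # 0.75 7.0 3.0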
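To give a rough feel for MOSES's deme-based search loop, here is a heavily simplified sketch that keeps only the exemplar/deme structure, a representation-building-style neighborhood, and a stand-in for Reduct normalization, while omitting MOSES's probabilistic modeling entirely. The toy "programs" are bit tuples rather than Combo trees, and every name is illustrative rather than taken from the actual MOSES code.

import random

def normalize(prog):
    """Stand-in for Reduct's Elegant Normal Form: a canonical form that
    collapses syntactically different but semantically equal programs."""
    return tuple(sorted(prog))

def neighborhood(exemplar):
    """Stand-in for representation building: generate candidate variations
    of the exemplar by replacing one element with an allowed alternative."""
    for i in range(len(exemplar)):
        for bit in (0, 1):
            if bit != exemplar[i]:
                yield exemplar[:i] + (bit,) + exemplar[i + 1:]

def moses_like_search(fitness, length=8, n_demes=3, generations=20, seed=0):
    random.seed(seed)
    demes = [normalize(tuple(random.randint(0, 1) for _ in range(length)))
             for _ in range(n_demes)]            # each deme keeps its own exemplar
    for _ in range(generations):
        new_demes = []
        for exemplar in demes:
            candidates = [normalize(c) for c in neighborhood(exemplar)]
            best = max(candidates + [exemplar], key=fitness)
            new_demes.append(best)               # exemplar may be replaced by a better candidate
        demes = new_demes
    return max(demes, key=fitness)

print(moses_like_search(fitness=sum))            # converges to the all-ones program

With fitness = sum, the toy search converges on the all-ones program; in real MOSES the fitness would score a Combo program's behavior, and candidate generation would be guided by probabilistic modeling over the representation rather than exhaustive one-step variation.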
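For the Universal Intelligence and Pragmatic General Intelligence entries, Legg and Hutter's measure can be written roughly as a complexity-weighted sum of the expected value an agent $\pi$ achieves across computable environments:

\[
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
\]

Here $E$ is the space of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected total reward that $\pi$ obtains in $\mu$. Pragmatic general intelligence keeps the same weighted-average shape but replaces the $2^{-K(\mu)}$ prior with a weighting over a fuzzy set of goals and environments, e.g. one derived from the agent's embodiment and everyday experience.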
