Arxiv:1607.08289V4 [Cs.AI] 21 Jan 2019 Keywords: USA CA [email protected] Francisco, San FPC, Vicarious Hay J

1 Mammalian Value Systems Gopal P. Sarma School of Medicine, Emory University, Atlanta, GA USA [email protected] Nick J. Hay Vicarious FPC, San Francisco, CA USA [email protected] Keywords: Friendly AI, value alignment, human values, biologically inspired AI, human-mimetic AI Characterizing human values is a topic deeply interwoven with the sciences, humanities, political philosophy, art, and many other human endeavors. In recent years, a number of thinkers have argued that accelerating trends in computer science, cognitive science, and related disciplines foreshadow the creation of intelligent machines which meet and ultimately surpass the cognitive abilities of human beings, thereby entangling an understanding of human values with future technological development. Contemporary research accomplishments suggest increasingly sophisticated AI systems becoming widespread and responsible for managing many aspects of the modern world, from preemptively planning users’ travel schedules and logistics, to fully autonomous vehicles, to domestic robots assisting in daily living. The extrapolation of these trends has been most forcefully described in the context of a hypothetical “intelligence explosion,” in which the capabilities of an intelligent software agent would rapidly increase due to the presence of feedback loops unavailable to biological organisms. The possibility of superintelligent agents, or simply the widespread deployment of sophisticated, autonomous AI systems, highlights an important theoretical problem: the need to separate the cognitive and rational capacities of an agent from the fundamental goal structure, or value system, which constrains and guides the agent’s actions. The “value alignment problem” is to specify a goal structure for autonomous agents compatible with human values. In this brief article, we suggest that ideas from affective neuroscience and related disciplines aimed at characterizing neurological and behavioral universals in arXiv:1607.08289v4 [cs.AI] 21 Jan 2019 the mammalian class provide important conceptual foundations relevant to describing human values. We argue that the notion of “mammalian value systems” points to a potential avenue for fundamental research in AI safety and AI ethics. 1 Introduction saying that artificial intelligence has become a critical part of the infrastructure supporting Artificial intelligence, a term coined in the 1950’s modern human society. Indeed, prominent ven- at the now famous Dartmouth Conference, has ture capitalist Mark Andresseen famously wrote come to have a widespread impact on the modern that “software is eating the world,” in reference world [1, 2]. If we broaden the phrase to include to the ubiquitous deployment of software systems all software, and in particular, software respon- across all industries and organizations, and the sible for the control and operation of physical corresponding growth of the financial investment machinery, planning and operations manage- into software companies [3]. ment, or other tasks requiring sophisticated information processing, then it goes without Nonetheless, there is a fundamental gap 2 Sarma and Hay between the abilities of the most sophisticated would such an agent choose to do? What are the software-based control systems today and the ca- constraints that would guide its actions and to pacities of a human child or even many animals. what degree can these actions be shaped by the Our AI systems have yet to display the capacity designers? If a superintelligence can reason about for learning, creativity, independent thought and influence the world to a substantially greater and discovery that define human intelligence. degree than human beings themselves, how can It is a near-consensus position, however, that we design a system to be compatible with human at some point in the future, we will be able to values? Is it even possible to formalize the notion create software-based agents whose cognitive of human values? Are human values a monolithic, capacities rival those of human beings. While internally consistent entity, or are there intrinsic there is substantial variability in researchers’ conflicts and contradictions between the values forecasts about the time-horizons of the critical of individuals and between the value systems of breakthroughs and the consequences of achiev- different cultures? [9,12–16]. ing human-level artificial intelligence, there it is little disagreement that it is an attainable It is our belief that the value alignment 1 milestone [4,5]. problem is of fundamental importance both for its relevance to near-term developments likely Some have argued that the creation of human- to be realized by the computer and robotics level artificial intelligence would be followed by industries and for longer- term possibilities an “intelligence explosion,” whereby the intelli- of more sophisticated AI systems leading to gence of the software-based system would rapidly superintelligence. Furthermore, the broader increase due to its ability to analyze, model, and set of problems posed by the realization of improve its cognition by re-writing its codebase, intelligent, autonomous, software-based agents in a feat of self-improvement impossible for may provide an important unifying framework biological organisms. The net result would be that brings together disparate areas of inquiry a “superintelligence,” that is, an agent whose spanning computer science, cognitive science, fundamental cognitive abilities vastly exceed our philosophy of mind, behavioral neuroscience, and own [9–12]. anthropology, to name just a few. To be more explicit, let us consider a superin- In this article, we set aside the question of how, telligence to be any agent which can surpass the when, and if AI systems will be developed that are sum total of human cognitive and emotional abil- of sufficient sophistication to require a solution to ities. These abilities might include intellectual the value alignment problem. This is a substan- tasks such as mathematical or scientific research, tial topic in its own right which has been analyzed artistic invention in musical composition or elsewhere. We assume the feasibility of these sys- poetry, political philosophy and the crafting of tems as a starting point for further analysis of the public policy, or social skills and the ability to goal structures of autonomous agents and propose recognize and respond to human emotions. Many the notion of “mammalian value systems” as pro- commentators in recent years and decades have viding a framework for further research. predicted that convergent advances in computer science, robotics, and related disciplines will give rise to the development of superintelligent 2 Goal Structures for machines during the 21st century [4]. Autonomous Agents If it is possible to create a superintelligence, 2.1 The Orthogonality Thesis then a number of natural questions arise: What The starting point for discussing AI goal struc- 1 tures is the observation that the cognitive capac- There have been a number of prominent thinkers ities of an intelligent agent are independent of who have expressed strongly conservative viewpoints about AI timelines. See, for example, commentaries by David the goal structure that constrains or guides the Deutsch, Rodney Brooks, and Douglas Hofstadter [6–8]. agents’ actions, what Bostrom calls the “orthog- Mammalian Value Systems 3 onality thesis:” As the intelligent capabilities of an agent grows, the consequences for slight deviations from hu- We have seen that a superintelligence could man values will become greatly magnified. The have a great ability to shape the future ac- reason is that such an agent possesses increasing cording to its goals. But what will its goals capacity to achieve its goals, however arbitrary be? What is the relation between intelli- those goals might be. It is for this reason that gence and motivation in an artificial agent? researchers concerned with the value alignment Here we develop two theses. The orthogo- problem have distanced themselves from the fic- nality thesis holds (with some caveats) that titious and absurd scenarios portrayed in Holly- intelligence and final goals are independent wood thrillers. These movies often depict outright variables: any level of intelligence could be malevolent agents whose explicit aim is to destroy combined with any final goal. The instru- or enslave humanity. What is implicit in these mental convergence thesis holds that super- stories is a goal structure that has been explic- intelligent agents having any of a wide range itly defined to be in opposition to human values. of final goals will nevertheless pursue similar But as the simple example of the domestic robot intermediary goals because they have com- illustrates, this is hardly the risk we face with so- mon instrumental reasons to do so. Taken phisticated AI systems. The true risk is that if together, these theses help us to think about we incorrectly or inadequately specify the goals what a superintelligent agent would do. [9] of a sufficiently capable agent, then it will devote The orthogonality thesis allows us to illus- its cognitive capacities to a task that is at odds trate the importance of autonomous agents with our values in ways that may be subtle or being guided by human-compatible goal struc- even bizarre. In the example given above, there tures, whether they are truly superintelligent was no malevolence or ulterior motive behind the as Bostrom envisions, or even more modestly robot making a nutritious meal out

Arxiv:1607.08289V4 [Cs.AI] 21 Jan 2019 Keywords: USA CA [email protected] Francisco, San FPC, Vicarious Hay J

Intelligent Agents

Intelligent Agents - Catholijn M

Multi-Agent Reinforcement Learning: an Overview∗

An Exploration in Stigmergy-Based Navigation Algorithms

Expert Assessment of Stigmergy: a Report for the Department of National Defence

Intelligent Agent

A Systematic Approach to Artificial Agents

Adversarial Attack and Defense in Reinforcement Learning-From AI Security View Tong Chen , Jiqiang Liu, Yingxiao Xiang, Wenjia Niu*, Endong Tong and Zhen Han

Extending Universal Intelligence Models with Formal Notion of Representation

Intelligent Agents Web-Mining Agents

Using Stories to Teach Human Values to Artificial Agents

A Deep Reinforcement Learning-Based Approach to Intelligent Powertrain Control for Automated Vehicles *