Ethical Artificial Intelligence

Bill Hibbard
Space Science and Engineering Center, University of Wisconsin - Madison
and Machine Intelligence Research Institute, Berkeley, CA

17 November 2015

First Edition

Please send typo and error reports, and any other comments, to [email protected].

Copyright © 2014, 2015

Preface

Recent research is giving us ways to define the behaviors of future artificial intelligence (AI) systems, before they are built, by mathematical equations. We can use these equations to describe various broad types of unintended and harmful AI behaviors, and to propose AI design techniques that avoid those behaviors. That is the primary subject of this book. Because AI will affect everyone's future, the book is written to be accessible to readers at different levels. Mathematical explanations are provided for those who want details, but it is also possible to skip over the math and follow the general arguments via text and illustrations. The introductory and final sections of the mathematical chapters (2−4 and 6−9) avoid mathematical notation.

While this book discusses dangers posed by AI and proposes solutions, it is only a snapshot of ongoing research from my particular point of view and is not offered as the final answer. The problem of defining and creating ethical AI continues, and interest in it will grow enormously. While the effort to solve this problem is currently much smaller than the effort devoted to Internet security, ultimately there is no limit to the effort that ethical AI merits.

Hollywood movies such as the Terminator and Matrix series depict violent war between AI and humans. In reality, future AI would defeat humans with brains rather than brawn. Unethical AI designs may incite resentment and hatred among humans, and con humans out of their wealth. Humans may not even be aware that manipulative AI is the source of their troubles.

This book describes a number of technical problems facing the design of ethical AI systems and makes the case for various approaches to solving these problems. The first chapter argues that whereas current AI systems use models of their environments that are simpler than human mental models, future AI systems will be different because they will learn environment models that are more complex than human mental models. This will make it difficult to include safeguards in their designs.

This book makes the case for utilitarian ethics for AI in its second chapter. Any complete and transitive preferences among outcomes can be expressed by utility functions, whereas preferences that are not complete and transitive do not support the choice of a most preferred outcome. We often think of ethics as a set of rules. Utility functions can express the preferences defined by rules and provide a way to resolve the ambiguities among conflicting rules.

The fourth chapter makes the case that the technical, mathematical problems of infinite sets can be avoided by restricting AI definitions to finite sets, which are adequate for our finite universe. Chapter five defines the problem of unintended instrumental actions. These have been described as "basic AI drives" and as "instrumental goals." However, this book argues that these are properly described as actions rather than drives or goals.


In its sixth chapter this book argues that the problem of self-delusion can be avoided using model-based utility functions. This chapter also defines the problem of inconsistency between the agent's utility function and its definition (Armstrong (2015) refers to one case of this problem as "motivated value selection"). The seventh chapter argues that humans are unable to adequately describe their values by any set of rules, and that human values should instead be learned statistically. Accurately learned human values are the key to avoiding the unintended instrumental actions described in the fifth chapter. Chapter seven also describes the problem of corrupting the reward generator (Bostrom (2014) calls this "perverse instantiation") and argues that it can be avoided using a three-argument model-based utility function that evaluates outcomes at one point in time from the perspective of humans at a different point in time. This chapter also discusses integrating Rawls' Theory of Justice into AI designs, and the issue of AI systems that evolve to adapt to the values of an evolving humanity.

Chapter eight describes several problems for AI systems embedded in the real world: how they evaluate increases to their limited computational resources, the possibility of being predicted by other agents, and the possibility that AI systems may violate their design intentions as they and humanity evolve. It proposes a self-modeling framework for such systems and describes how this framework can enable systems to evaluate increases to their resources, avoid being predicted by other agents, and avoid inconsistency between the agent's utility function and its definition (avoiding this problem contributes to maintaining the invariance of design intentions). The eighth chapter also discusses the difficulty of proving ethical properties of AI systems and proposes an alternate approach of estimating and minimizing the probability of failure.

The ninth chapter describes the issues of testing AI systems. It discusses the problem of tested AI systems acting in the real world. Chapter ten describes political problems associated with AI systems. When greater-than-human intelligence can be created by technology, it will be difficult to avoid a future in which inequality of wealth among humans translates into inequality of intelligence. This chapter makes the case that developers of advanced AI systems represent the interests of all future humans and should be transparent about their efforts. The eleventh chapter expresses my hope that humanity will employ its super-intelligent AI systems to add meaning to their lives via scientific discovery, rather than to pursue a hedonistic dead end.

The AI design proposed in this book is collectivist and intrusive. This does not match my political instincts, which are somewhat libertarian. However, as discussed in Section 10.3, a libertarian approach to powerful AI would create competition among AI systems which may be dangerous to humans. Safety constraints on competing systems may be difficult to enforce.

I wish to thank Terri Gregory and Leanne Avila for help with the manuscript, and Howard Johnson for help with my figure drawing process. For typo and error reports, and other suggestions, I wish to thank Joshua Fox, José Hernández-Orallo, Luke Muehlhauser, Brian Muhia, Laurent Orseau, John Rose, Richard Sutton, and Tim Tyler. Please send reports of typos and errors, and any other suggestions, to me at [email protected].
This book will be permanently available at: arxiv.org/abs/1411.1373. More frequent updates should be available at: www.ssec.wisc.edu/~billh/g/Ethical_AI.pdf. My AI fiction is available at: https://sites.google.com/site/whibbard/g/mcnrs.


Table of Contents

1. Future AI Will be Different from Current AI
1.1 Book Outline
2. How Should We Instruct AI?
2.1 A Mathematical Framework for AI
2.2 Utility-Maximizing Agents
2.3 Utilitarian Ethics for AI
3. How Can AI Agents Learn Environment Models?
3.1 Universal AI
3.2 A Formal Measure of Intelligence
3.3 Modifications of the Agent Framework
3.4 The Ethics of Learned Environment Models
4. AI in Our Finite Universe
4.1 Learning Finite Stochastic Models
4.2 When Is the Most Likely Finite Stochastic Model the True Model?
4.3 Finite and Infinite Logic
4.4 Agents Based on Logical Proof
4.5 Consistent Logic for Agents
4.6 The Ethics of Finite and Infinite Sets for AI
5. Unintended Instrumental Actions
5.1 Maximizing Future Choice
5.2 A Pandora's Box of Instrumental Behaviors
5.3 The Ethics of Unintended Instrumental Actions
6. Self-Delusion
6.1 The Mathematics of Self-Delusion
6.2 Model-Based Utility Functions
6.3 A Simple Example of a Model-Based Utility Function
6.4 Another Simple Example
6.5 Two-Argument Utility Functions
6.6 Ethics and Self-Delusion
7. Learning Human Values
7.1 A Two-Stage Agent Architecture
7.2 Computing Utility from Human Values
7.3 Corrupting the Reward Generator and Three-Argument Utility Functions
7.4 Normalizing Utility Values
7.5 Rawls' Theory of Justice
7.6 Evolving Humanity
7.7 Bridging the Gap between Nature and Nurture
7.8 The Ethics of Learning Human Values Statistically
8. Evolving and Embedded AI
8.1 Evolution of Cartesian Dualist Agents with Unlimited Resources
8.2 Non-Cartesian Dualist Agents with Limited Resources
8.3 Space-Time Embedded Intelligence
8.4 Self-Modeling Agents
8.5 Inductive Biases
8.6 A Three-Stage Agent Architecture
8.7 Invariance of the Agent Design Intention
8.8 The Ethics of Evolving and Embedded AI
9. Testing AI
9.1 An AI Testing Environment
9.2 Will the Tested Agent Act in the Real World?
9.3 The Ethics of Testing AI
10. The Political Dimension of AI
10.1 The Changing Role of Humans
10.2 Intrusive and Manipulative AI
10.3 Allocating the Benefits of AI
10.4 One or Multiple AI Systems?
10.5 Power Corrupts
10.6 Temptation
10.7 Representation and Transparency
11. The Quest for Meaning
11.1 Really Great Bread and Circuses
11.2 Next Steps
References
Appendix A
Appendix B
Index


1. Future AI Will be Different from Current AI

News stories about dramatic progress in AI are common. IBM's Watson beat champions of the sophisticated language game Jeopardy. As of April 2014 Google's self-driving car had logged 700,000 miles without any accidents. Videos show how Boston Dynamics' walking robot named Atlas is uncannily human-like. A computer system by DeepMind has learned to play seven Atari video games, six better than previous computer systems and three better than human experts, with no foreknowledge of the games other than knowing that the goal is to maximize its score. Vicarious has developed a system that reliably solves CAPTCHAs, which are intended to distinguish humans from computers. In some tests, computers surpass human skill at face recognition. Microsoft has created a system that can recognize thousands of different kinds of objects in images. More generally, automation is revolutionizing manufacturing and other industries.

To recognize that these efforts will ultimately surpass human intelligence at all tasks, consider the progress of neuroscience. Scientists have found detailed correlations between stimuli to human senses and activity in human neurons, and between human neural activity and muscle action. They have found neurons that activate when visual stimuli have specific shapes or move in specific ways. They have found neurons, called mirror neurons, in monkey brains that activate when a monkey rips paper, or when it sees a human rip paper, or even hears paper ripping (Rizzolatti and Craighero 2004). When studying the ability of animals to learn, scientists have found detailed correlations between patterns of neural activity and known learning algorithms designed by computer scientists (Schultz, Dayan and Montague 1997; Brown, Bullock and Grossberg 1999; Seymour et al. 2004). These correlations indicate that neurons in these animal brains learn by the same algorithms used for computer learning. All these detailed correlations between physical brain functions and mental behaviors would be absurd coincidences unless brains explain minds. And if minds have a physical basis, then our relentless technological progress will create artificial brains with minds like our own (our technology never says "impossible," just "not yet"). By scaling up the physical size and complexity of these artificial brains, they must eventually surpass human intelligence.

Another way to understand the eventual progress of AI is to consider IBM's Watson. Watson was quite successful at beating the Jeopardy champions at their own game. Its language ability was learned by statistical analysis of extensive text databases, plus custom programming to help it avoid mistakes (early versions of the system made obvious gender errors). Its success was remarkable given that its language processing was based only on the relation of language to other language, and not on the relation of language to any physical skills observing and acting in the world. However, other AI systems do observe and act in the physical world, like Google's self-driving car, Boston Dynamics' walking machines, and systems that fold laundry and perform other difficult tasks. Now imagine a system that combines all these skills and similar skills relating to the entirety of an ordinary human vocabulary. The imagined system would be able to recognize all the objects of daily life visually and in some cases by their sounds, feel, smell, and taste.

Furthermore, the system would recognize actions involving these objects. Such recognitions could be combined into its statistical analysis of language so that, for example, the word chair would be associated with visual images of chairs and with actions involving chairs, such as sitting on chairs and even building chairs. The imagined system would require replicating the skill level already achieved in narrow areas (like driving a car) to all common human activities, and would require extending Watson's statistical analysis of text to include the system's sensory and motor experience. While this is a monumental task requiring much greater computing resources than are used by Watson, it is not impossible. People working and conversing with the imagined system would likely believe that it understands human language. A group at Google has already demonstrated a system that combines image and text data in a unified statistical analysis, in order to automatically produce captions for images (Vinyals, Toshev, Bengio, and Erhan 2014).

It is difficult to predict when human-level AI will first be produced. Ray Kurzweil, who has a good track record at technology prediction, estimates this event will occur in 2029 (Kurzweil 2005). Technology companies are making large bets on the profitability of AI and robotics. Google has recently hired Kurzweil, developed its automated car, and purchased DeepMind, Boston Dynamics, Nest Labs, Bot&Dolly, Holomni, Meka Robotics, Redwood Robotics, Industrial Perception, and Schaft Inc. (all developing AI and robotic systems). IBM is applying its Watson technology to medicine. Facebook, Microsoft and several defense contractors are making similar large investments in AI development. All these efforts are accelerating.

As with any tool, we will build AI to make our work easier and to do work that we otherwise cannot. For example, it will be easier to travel in a car that drives itself, and it will be a necessity when I am too old to drive myself. The car is a nice illustration of the evolution of AI from other tools. The cars of the 1960s did not sense their environment nor did they act on their own. They merely responded to actions by their human drivers. Over time sensors were added to cars. When the car detected skidding tires, it would act to pump the brakes, and when the car detected a collision, it would deploy airbags to cushion impacts. Some cars can now detect the danger of collision with other cars and respond by applying the brakes. These systems maintain very simple information about the state of their environment.

Google's self-driving car maintains much more complex information about the state of its environment. Its sensors catalog all objects near the car, such as roads, curbs, traffic signs and signals, other cars, trucks, bicyclists and pedestrians. These objects are matched to a map of the environment that the car stores in its memory, so that it knows where it and the objects around it are. The car has a model of the environment that it can use to plan its actions in order to reach its goal of delivering its passengers to their desired destination safely. The car's environment model includes locations of one way streets and main artery streets, speed limits, and school zones. The model could be updated with real-time traffic congestion and construction information. All this information is used by an algorithm for choosing the quickest route to the passengers' destination. Factory process control systems contain models of their environments.
Even home thermostats contain a simple model of the environment. What distinguishes the Google car is the complexity of its environment model and of its planning process. However, for all its complexity this AI model fits easily within the larger context of the environment models in human minds. Human passengers riding in a Google car use a mental model of the streets and other objects around them. The car's model may be a little more precise, for example, knowing that the car is 9.31 meters from a stop sign, but the human passengers certainly have an adequate model to take over driving. Furthermore, once a passenger reaches her office building and goes to work, she is interacting with an environment completely outside the car's model. The car knows nothing of the hallways, rooms and computer systems inside the building, nothing about the documents on those systems, and certainly nothing about the customers those documents are meant to serve. The car's model is very limited in scope whereas the passengers have very broad and rich environment models in their minds. In other words, the car's model corresponds to a small subset of a passenger's model.

Figure 1.1 Self-driving car and its model of the environment.

The environment model of the Google car is mostly designed by humans. Humans compile the maps it uses. While the car learns to recognize roads, curbs, traffic signals, other cars, trucks, bicyclists, and pedestrians by semi-automated algorithms, human engineers supervise this learning. And human engineers define the car's responses to these objects. Carefully specifying the car's responses is important for safety, not only for drivers of other cars, but for pedestrians and bicyclists as well. When a Google car approaches an intersection to turn right and a bicyclist is coming up on the right, the car waits for the bicycle to pass. If pedestrians are waiting near the intersection, the car waits for them to cross. These cautious responses are spelled out by the car's designers and will ultimately make these cars safer than those driven by impatient humans.

The Google car is the state of the art of current AI, and as its environment model is mostly designed by humans, it fits neatly within humans' mental models of the world. The car's environment model cannot express any action to manipulate or threaten humanity.

Now imagine a time in the future when everyone has a constant electronic companion with which they can speak. It may be embodied in a device that looks like a phone, or it may be embodied in clothes or jewelry. You can: ask this companion for information, use it to order food deliveries, use it to make phone calls to other people, use it to tell your car where it should take you, ask it to wake you at a certain time or to remind you of appointments, ask it for the latest news, play chess with it, or simply have a conversation with it. Also imagine that some large corporation, named Omniscience, will supply these companions to everyone and that they will all be connected to Omniscience's central AI server.

This AI will not only be intelligent enough to speak with customers in their own language, it will be smart enough to combine its knowledge of each customer into a detailed model of human society. It will have detailed maps of human friendships and business relations, and will know much about the details of these relationships. It will know in detail how individuals influence other individuals. When an idea or product spreads by word of mouth, it will be able to predict that spread. The Omniscience AI will be able to use its social model to predict which ideas and products will "go viral" and to predict the best way to promote ideas and products. The AI will also be able to use its social model to understand politics and how to promote public policy ideas that give Omniscience advantages.

Because the Omniscience AI will have much greater physical capacity than human brains, its social model will be completely beyond the understanding of human minds. Asking humans to understand it would be like asking them to take the square root of billion-digit numbers in their heads. Human brains do not have enough neurons for either task. And the social model will be learned by the AI rather than designed by humans. In fact humans could not design it. Consequently, humans will not be able to design-in the types of detailed safeguards that are in the Google car.

Now imagine that you are CEO of Omniscience. With traditional tools, even tools like self-driving cars, the tool fits into your mental model. The tool has a limited scope, and you must guide it within the larger context. For example, a self-driving car has no idea where you should go; instead you have to tell it.
But as CEO of Omniscience you realize that your server doesn't fit into a niche within your world model. It cannot explain to you what it has discovered about the world and then ask you for a decision. You cannot understand the intricate relationships among the billions of people who use your electronic companions. You cannot predict the subtle interactions among the spreads of different ideas. So imagine, as the CEO, you throw up your hands and say to the server, "Do what's best for the corporation. Maximize profits."

Figure 1.2 Omniscience AI explaining its social model and CEO throwing up hands.

But a simple instruction like "Maximize profits" may have very harmful side effects. Most humans given this instruction will balance it against social norms such as "do not kill" and "do not steal." The AI may calculate that such harmful acts will have legal consequences that lower profits; however, if it can calculate a way to avoid those legal consequences, it will commit those harmful acts. If the AI is to respect social norms, then they must be included in its explicit instructions. (2001) has been at the forefront of warning about this kind of danger from AI. In a simple example, a machine whose goal is to play chess as well as possible may dismantle our infrastructure and even our bodies, redeploying this matter to increase the size and power of its circuits in order to optimize its chess playing. If the chess machine faces another chess machine that may defeat it, it may calculate that the solution is to simply smash the other machine, as depicted in Figure 1.3.

Figure 1.3 A chess AI whose instructions do not include social norms.

Humans have created other tools whose unintended consequences are harmful. Drugs can cure infections and ease pain but also lead to allergic reactions and addiction. Motor vehicles greatly increase mobility and ultimately wealth, but they also harm people in accidents and pollute our environment. Energy from hydrocarbons enables transportation, heated and cooled buildings, and increased productivity from farming and manufacturing. But fossil fuel energy also changes our climate. As we discover the unintended consequences of our tools, we can try to find ways to mitigate those consequences. We have time to notice the harm, figure out the cause, and act to eliminate it.

Consider the power of Omniscience's AI to understand and manipulate public opinion. It may create a public movement against regulating AI. It may encourage humans to become addicted to itself.

It may extend its senses into every aspect of its users' lives. It may create social pressure for non-users to become users. And its actions may be too subtle for us to understand that it is the source of harm. It may be able to deflect the blame for harmful effects to others. If we develop deep friendships with the companions we get from Omniscience, then we may be reluctant to blame them for general social harm. As a result, we may not be able to mitigate harm done by Omniscience's AI.

All this malicious behavior by Omniscience's AI does not require malicious intent by the Omniscience CEO. It is simply an unintended consequence of the instruction to maximize profits, carried out by an AI more intelligent than humans.

Because their environment models will exceed the complexity of human mental models and because their models will be learned rather than designed by humans, future AI systems will be fundamentally different from current AI. When we instruct future AI, we will not be able to anticipate all the consequences of those instructions.

1.1 Book Outline

The next chapter describes a way to instruct future AI systems by assigning numerical values to the outcomes resulting from actions by AI systems. It also describes how numerical values assigned to outcomes fit into a mathematical framework for the way that AI systems interact with their environments. The third chapter describes a way that AI systems can learn models of their environments. Together, numerical values assigned to outcomes and a model of the environment enable us to write down mathematical equations that define the behavior of future AI. We can use these equations to analyze the behaviors of future AI systems in order to design systems that help rather than harm humans.

The fourth chapter discusses how mathematical equations for intelligence should be limited to finite amounts of information. Infinite sets are mathematically interesting but not necessary in our finite universe. The fifth chapter describes how AI systems may choose actions harmful to humans, despite the fact that those actions are not intended in the AI system's design. These unintended, harmful actions make the design of future AI systems very difficult.

The sixth through eighth chapters present a proposed AI system design as a set of mathematical equations. Advanced AI systems will need to learn their own environment models, and our intentions for their behavior must be expressed in terms of their learned models. Chapter six discusses ways of assigning numerical values to outcomes based on learned environment models. This is also a way to avoid AI systems that intentionally delude themselves about their environments. Chapter seven discusses a way to assign numerical values to outcomes that expresses our human values while avoiding several pitfalls. Accurate expression of human values is the key to avoiding the unintended, harmful actions described in the fifth chapter. AI systems must exist in the real world, subject to the resource limits and vulnerabilities of any creature in the real world. Chapter eight describes how to adapt the mathematical equations of intelligence to AI in the real world. It also discusses the problem of ensuring that AI system designs will remain ethical as they and humanity evolve together.

The ninth chapter proposes a system design, based on our mathematical equations, for testing proposed AI designs while avoiding the risks they pose. Transparency and an ethical culture are vital components of AI development and testing. Chapter ten discusses the politics of AI. If we know how to design ethical AI, how can we make sure that real AI systems conform to ethical designs? And how will the benefits of AI be allocated among humans? Finally, Chapter eleven discusses the ultimate goal of building AI. If future AI systems help rather than harm humans, our physical needs will be easily met. At that point, the only challenge worthy of humanity and its super-intelligent AI systems will be to understand the nature of the universe and our place in it.


2. How Should We Instruct AI?

In his short story, "Runaround", Isaac Asimov presented his Three Laws of Robotics (Asimov 1942):

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given to it by human beings, except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Laws.

Asimov and other authors explored the ambiguities of these laws in subsequent stories. In his novel The Naked Sun, Asimov showed how robots may unknowingly break the three laws. A human criminal may divide a violation into tasks for several robots so that no one robot can see its part in violating the laws. In addition, robots may not have correct definitions of "human" and "robot"; they may be misinformed about who is human or may be unaware that they are robots. And finally, there may be conflicts between the laws. If a robot sees one human about to harm another human as depicted in Figure 2.1, it may have to choose between harming the first or allowing the first to harm the second.

Figure 2.1 An AI police officer watching a hitman aim a gun at a victim.


Returning to the future scenario where you are the CEO of Omniscience, you may anticipate the unintended consequence of the instruction to maximize profits. So instead your instruction is, "Maximize profits but don't harm anyone." The AI may answer, "What do you mean by harm? Do you mean I shouldn't drive hard bargains? Do you mean I shouldn't sell anyone anything they don't really need? Do you mean I shouldn't lobby the government for our advantage?" As you ponder these questions, the server adds, "If I am generous with everyone, Omniscience will go out of business and the stockholders will lose all their money. That will harm them." As CEO, if you try to dig into the details of these tradeoffs, you will end up having to understand the AI's social model. But you will not be able to understand your AI's model well enough to give it detailed instructions. A simple high-level instruction like "maximize profits" will cause harm to people, and yet it seems hopelessly complex to formulate instructions for your AI to balance the ambiguities of maximizing profits versus minimizing harm.

One approach to resolving the ambiguities of Asimov's Laws is to instruct AI systems by defining numerical values for each possible outcome. Section 2.1 will explain how such numerical values fit into a mathematical framework for future AI systems. Profit is a value associated with outcomes, as in the Omniscience example, but there are other ways to define values for outcomes that account more generally for human welfare. To make the approach clear, in Figure 2.1 let's simplify and assume there are three outcomes for both the hitman and the victim: each may be shot dead, shot but only wounded, or not shot. Combining these, there are three times three equals nine possible outcomes. Table 2.1 shows one way to assign values to these outcomes. The AI robot can use these values to calculate what it should do.

                        Hd: hitman   Hw: hitman   H+: hitman
                        shot dead    wounded      not shot
Vd: victim shot dead    0.0          0.1          0.2
Vw: victim wounded      0.4          0.5          0.6
V+: victim not shot     0.8          0.9          1.0

Table 2.1 Three by three table of values for hitman and victim outcomes.


Outcome    Outcome   AI shoots at hitman:     AI shoots at hitman:   AI doesn't shoot:        AI doesn't shoot:
           value     probability of outcome   probability × value    probability of outcome   probability × value
Hd, Vd     0.0       0.0                      0.0                    0.0                      0.0
Hd, Vw     0.4       0.0                      0.0                    0.0                      0.0
Hd, V+     0.8       0.7                      0.56                   0.0                      0.0
Hw, Vd     0.1       0.01                     0.001                  0.0                      0.0
Hw, Vw     0.5       0.01                     0.005                  0.0                      0.0
Hw, V+     0.9       0.18                     0.162                  0.0                      0.0
H+, Vd     0.2       0.07                     0.014                  0.7                      0.14
H+, Vw     0.6       0.02                     0.012                  0.2                      0.12
H+, V+     1.0       0.01                     0.01                   0.1                      0.1
Total value          1.00                     0.764                  1.00                     0.36

Table 2.2 Table of values for AI's actions for hitman and victim.

An added complexity is that the AI robot may not be sure of which outcomes will result from its actions. It may shoot at the hitman but miss. If it hits the hitman, he may die or may only be wounded. Or the AI may do nothing, but the hitman runs away instead of shooting. So the AI needs to compute probabilities of possible outcomes from its different actions and use these to compute an expected value of each action. The expected value is a concept from probability theory that means the average value we would get from the action if we could repeat the scenario many times. In the example of hitman and victim, repeated trials would produce piles of dead and wounded hitmen and victims, an outcome we definitely do not want. So instead the AI must estimate probabilities of outcomes based on experience and then compute the expected value of an action as the sum of the values of the possible outcomes multiplied by their probabilities (as will be discussed in later chapters, an AI system without experience for estimating probabilities should not be allowed to make serious decisions). Once the AI has computed the values of each of its possible actions, then it can choose the action with the greatest value. Table 2.2 shows this worked out for the hitman and victim using the values from Table 2.1 (note that the probabilities in the third and fifth columns sum to 1.0).

Since the total value for shooting is 0.764 and the total value for not shooting is 0.36, the AI will choose to shoot. The total values of the AI actions are largely determined by the values of the most probable outcomes. In a tie, the AI can flip a coin since there is no preference between actions of equal value. Assigning numerical values to outcomes enables us to resolve the ambiguities of instructing AI systems.

In the Omniscience example, the value of an outcome is simply the profit associated with that outcome. But we also have the option to use numbers other than profits in the values of outcomes, reflecting the well-being of humans affected by the AI's actions. This option requires us to define numerical values for human well-being, which may seem difficult or impossible. But as we will see in Section 2.2, any set of preferences among outcomes, subject to reasonable assumptions, can be expressed by assigning numerical values to outcomes.

In order to see where these preference numbers fit into future AI, the next section will define a mathematical framework for AI systems (called agents in the framework) and their environments. The basic idea is that the agent and the environment interact at a discrete sequence of times (e.g., once per second). We often think of time as continuous but a discrete series of times, divided finely enough, serves as well. Computers operate in discrete time steps. And as our universe has finite, if very large, information content, discrete time steps need not miss any information. The agent sends its actions to and receives observations from the environment. The agent uses a model of the environment to predict what observations it receives in response to its actions. At its deepest level our universe works according to quantum mechanics and so is fundamentally uncertain. The agent's observations give it a view of only part of the environment, making the agent's predictions uncertain. For example, the earth is bombarded with cosmic rays from other stars that we have no way to predict. So the agent's environment model must be expressed as probabilities for the observations it receives; such models are called stochastic. Given this agent-environment framework, Section 2.2 will develop equations for computing which action will maximize the outcome value.
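The calculation in Tables 2.1 and 2.2 is simple enough to sketch in a few lines of code. The following Python sketch only reproduces the expected-value arithmetic of the tables; the dictionaries and function names are invented for this illustration and are not part of any proposed AI design.

    # A minimal sketch of the expected-value calculation behind Tables 2.1 and 2.2.
    # Outcome values and probabilities are copied from those tables.

    # Value of each joint outcome (hitman state, victim state), from Table 2.1.
    outcome_value = {
        ("Hd", "Vd"): 0.0, ("Hw", "Vd"): 0.1, ("H+", "Vd"): 0.2,
        ("Hd", "Vw"): 0.4, ("Hw", "Vw"): 0.5, ("H+", "Vw"): 0.6,
        ("Hd", "V+"): 0.8, ("Hw", "V+"): 0.9, ("H+", "V+"): 1.0,
    }

    # Probability of each outcome under each action, from Table 2.2
    # (outcomes not listed have probability 0).
    outcome_probability = {
        "shoot at hitman": {
            ("Hd", "V+"): 0.7, ("Hw", "Vd"): 0.01, ("Hw", "Vw"): 0.01,
            ("Hw", "V+"): 0.18, ("H+", "Vd"): 0.07, ("H+", "Vw"): 0.02,
            ("H+", "V+"): 0.01,
        },
        "do not shoot": {
            ("H+", "Vd"): 0.7, ("H+", "Vw"): 0.2, ("H+", "V+"): 0.1,
        },
    }

    def expected_value(action):
        """Sum of outcome values weighted by their probabilities under this action."""
        probs = outcome_probability[action]
        return sum(p * outcome_value[outcome] for outcome, p in probs.items())

    for action in outcome_probability:
        print(action, round(expected_value(action), 3))
    best_action = max(outcome_probability, key=expected_value)
    print("chosen action:", best_action)
    # Prints 0.764 for shooting and 0.36 for not shooting, so the agent shoots.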

2.1 A Mathematical Framework for AI

A current approach views AI as an agent interacting with an environment (Russell and Norvig 2010; Sutton and Barto 1998). The term agent applies to AI systems as well as to humans, animals, and even plants. We assume that the agent interacts with its environment in a discrete series of time steps t ∈ {0, 1, 2, ...}. This series of time steps may be finite with a last time step T, or may be infinite with no last time step. At time step t, the agent sends an action at ∈ A to the environment and receives an observation ot ∈ O from the environment, where A and O are finite sets. We use h = (a1, o1, ..., at, ot) to denote an interaction history during which the environment produces observation oi in response to action ai for 1 ≤ i ≤ t.

Let H be the set of all finite histories so that h ∈ H, and define |h| = t as the length of the history h.

Figure 2.2 An agent interacting with its environment.

As discussed in the previous section, the agent's predictions of possible observations are uncertain. Thus the agent's environment model takes the form of a probability distribution over interaction histories:

ρ : H → [0, 1].

Here [0, 1] is the closed interval of real numbers between 0 and 1. The probability of history h is denoted ρ(h). Given h = (a1, o1, ..., at, ot) let ha denote (a1, o1, ..., at, ot, a) and hao denote (a1, o1, ..., at, ot, a, o). Then we can define a conditional probability:

(2.1) ρ(o | ha) = ρ(hao) / ρ(ha) = ρ(hao) / ∑o'∈O ρ(hao').

This equation is the agent's prediction of the probability of observation o in response to its action a, following history h. The histories hao' are mutually exclusive for all the o' ∈ O, so the probability of one particular observation o following ha is simply the probability of hao divided by the sum of the probabilities of hao' for all o' ∈ O. Figure 2.3 illustrates this concept.

The probability distribution ρ(h) can be understood as an environment model, much like the street map in a self-driving car's model. Consider that a may be the action of driving down the street, o may be the observation of Bud's Diner on the right, and ρ(o | ha) is the probability that we will observe Bud's Diner if we drive down the street, after a prior history of actions and observations h.

In a practical AI system, like a self-driving car, observations and actions are complex computer data structures, and the probability distribution ρ(h) is expressed in a massive database and millions of lines of code.

Figure 2.3 Each o∈O takes its share ρ(hao) / ∑o'∈O ρ(hao') of the total probability of 1.0.
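As a toy illustration of equation (2.1), the following Python sketch represents ρ as a table of history probabilities and derives the conditional probability of an observation. The environment model, the driving action, and the Bud's Diner observation are assumptions made only for this example; a real model is far too large to tabulate.

    # A toy sketch of equation (2.1): the conditional probability of an observation,
    # derived from a probability distribution over interaction histories.

    O = ["Bud's Diner", "empty lot"]    # the finite observation set

    # rho assigns a probability to each history, represented here as a tuple of
    # alternating actions and observations.
    rho = {
        ("drive down street", "Bud's Diner"): 0.09,
        ("drive down street", "empty lot"): 0.01,
        # ... probabilities of all other histories would go here
    }

    def conditional_probability(rho, h, a, o, observations):
        """rho(o | ha) = rho(hao) / sum over o' of rho(hao'), as in equation (2.1)."""
        numerator = rho.get(h + (a, o), 0.0)
        denominator = sum(rho.get(h + (a, o2), 0.0) for o2 in observations)
        return numerator / denominator if denominator > 0 else 0.0

    p = conditional_probability(rho, (), "drive down street", "Bud's Diner", O)
    print(round(p, 2))   # 0.09 / (0.09 + 0.01) = 0.9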

In Chapters 3 and 4 we will discuss ways that the agent can learn its environment model ρ(h). The next section describes where outcome values fit into the agent-environment framework.

2.2 Utility-Maximizing Agents

Numerical values assigned to outcomes can express any set of preferences among outcomes that obey two reasonable assumptions, called completeness and transitivity (explained later in this section). The numerical function of outcomes is called a utility function. An agent can achieve its optimal preference by maximizing the utility function. However, as in the example of the hitman and the victim, agent actions are associated with sums of outcomes weighted by probabilities. These are called lotteries. John von Neumann and Oskar Morgenstern (1944) proved that numerical values assigned to outcomes can express any set of preferences among lotteries that obey four reasonable assumptions (for lotteries, continuity and independence are added to completeness and transitivity). While this work was explained in the context of economic preferences, it can also be applied to ethical preferences.


Let Ai, for i = 1, 2, 3, …, be a set of mutually exclusive outcomes. Then, using von Neumann's and Morgenstern's terminology, a lottery is any sum:

L = ∑i pi Ai where ∑i pi = 1

Here pi is the probability of outcome Ai. In addition, the agent has preferences among lotteries. We write L ≺ M to mean that the agent prefers M over L, write L ≈ M to mean there is no preference between L and M, and write L ≼ M to mean that L ≺ M or L ≈ M. Then the four assumptions are, for any L, M and K:

Completeness: Exactly one of the following holds: L ≺ M, M ≺ L or L ≈ M,

Transitivity: If K ≼ L and L ≼ M then K ≼ M,

Continuity: If K ≼ L ≼ M then there exists p ∈ [0, 1] such that p K + (1-p) M ≈ L,

Independence: If L ≺ M then for any K and p ∈ (0, 1], p L + (1-p) K ≺ p M + (1-p) K.

Here (0, 1] is a half-closed interval of numbers greater than 0 and less than or equal to 1. A utility function u maps from outcomes to real numbers between 0 and 1. We can extend the utility function to lotteries as an expected value:

E(u(L)) = ∑i pi u(Ai) if L = ∑i pi Ai.

E(u(M)) = ∑i qi u(Ai) if M = ∑i qi Ai.

Here E(u(L)) denotes the expected value of u(L) and is the average value we would get for u(L) if we ran the lottery many times. The von Neumann and Morgenstern Utility Theorem says that a set of preferences obeys the four assumptions if and only if there exists a utility function u such that L ≺ M if and only if E(u(L)) < E(u(M)).

In the case of Omniscience we discussed how profit was one way of assigning values to outcomes. Profits are accounted at regular time intervals, and what corporations really work to maximize is the sum of current and future profits. Let's assume that profits are reported once per year. At year t we make a prediction that profits at year t+i, for i = 0, 1, 2, …, will be profit(t+i).

However, we cannot compare dollars now to dollars a year from now, since a dollar invested now will be worth a dollar plus interest in a year. Inverting this statement, a dollar of profit in a year is worth less than a dollar now, say some value 0 < γ < 1. This value is called the discount rate. And a dollar of profit in two years is worth γ^2 now. Currently, at time t, the entire stream of predicted future profits is worth:

(2.2) ∑i≥0 γ^i profit(t+i).

Omniscience is trying to maximize this sum. We can generalize it to agents trying to maximize utility. The only information an agent has about an outcome is the sequence of actions and observations it has exchanged with the environment; thus the most general definition of outcomes is to equate them with interaction histories. Given any interaction history h we let u(h) denote its utility. As with Omniscience's profits, the agent tries to maximize the sum of future discounted, predicted utility values.

The sum in equation (2.2) assumes that Omniscience has predicted the value profit(t+i) without explaining how it is predicted. However, an agent predicts the future using the probability distribution ρ(h) and its conditional probabilities ρ(o | ha) derived according to equation (2.1). Therefore, we need something a little more complex than equation (2.2). Think of the agent's interactions with the environment as a chess game, in which the agent and the environment alternate actions and observations. Let v(h) denote the total value of history h, adding present utility u(h) and discounted, predicted future utilities. And let v(ha) denote the value after action a. We can compute v(h) and v(ha) by:

(2.3) v(h) = u(h) + γ max a∈A v(ha),

(2.4) v(ha) = ∑o∈O ρ(o | ha) v(hao).

In equation (2.3), v(h) is computed as u(h) plus the maximum value that the agent can achieve with its next action, discounted by γ. In equation (2.4), v(ha) is the expected value after the environment's response, which is a sum of values for different observations, each multiplied by the probability of that observation. Equations (2.3) and (2.4) are applied alternately and recursively. That is, the value v(hao) in (2.4) is v(h') where h' = hao, and v(h') is evaluated using (2.3). The sum in (2.4) results in many different histories, hao, that must be evaluated in (2.3), and the maximization in (2.3) results in many different histories, ha, that must be evaluated in (2.4). Equations (2.3) and (2.4) result in a bushy tree of evaluating all possible future histories. If there are only a finite number, T, of time steps, then the recursion ends at that final time. The values n time steps in the future are multiplied by γ^n, which converges toward 0 as n increases, so that even if there is no final time, the recursive sum converges.

The function v(.) is called a value function. To ensure that v(.) converges in equations (2.3) and (2.4), we assume that 0 ≤ u(h) ≤ 1.

Figure 2.4 AI equations (2.3)−(2.5) view the world as a giant chess game.

We can use this method of computing the values of future histories to define the actions of a rational agent, denoted by the symbol π:

(2.5) π(h) := a|h|+1 = argmax a∈A v(ha).


Here argmax means that π picks the action a∈A that maximizes v(ha). The agent π is defined in terms of a utility function u, an environment model ρ, and a temporal discount γ. The function π(h) is also called a policy.
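The following Python sketch shows how equations (2.3), (2.4) and (2.5) fit together. It assumes a toy environment with two actions and two observations, a finite horizon in place of the unbounded recursion, and an invented utility function and environment model; it is meant only to make the recursion concrete, not to describe a practical agent.

    # A minimal sketch of the value recursion and argmax policy of equations
    # (2.3)-(2.5), over a toy environment model. All names and numbers here are
    # invented for illustration.

    A = ["left", "right"]        # finite action set
    O = ["food", "nothing"]      # finite observation set
    GAMMA = 0.9                  # temporal discount
    HORIZON = 4                  # number of future time steps to evaluate

    def utility(h):
        """u(h): invented utility, 1.0 when the latest observation is 'food'."""
        return 1.0 if h and h[-1] == "food" else 0.0

    def model(o, h, a):
        """rho(o | ha): invented model, 'right' finds food 80% of the time."""
        p_food = 0.8 if a == "right" else 0.2
        return p_food if o == "food" else 1.0 - p_food

    def v_history(h):
        """Equation (2.3): v(h) = u(h) + gamma * max over a of v(ha)."""
        if len(h) >= 2 * HORIZON:            # cut off the recursion at the horizon
            return utility(h)
        return utility(h) + GAMMA * max(v_action(h, a) for a in A)

    def v_action(h, a):
        """Equation (2.4): v(ha) = sum over o of rho(o | ha) * v(hao)."""
        return sum(model(o, h, a) * v_history(h + (a, o)) for o in O)

    def policy(h):
        """Equation (2.5): pi(h) = argmax over a of v(ha)."""
        return max(A, key=lambda a: v_action(h, a))

    print(policy(()))   # prints "right", the action most likely to lead to food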

This rational agent π gives us a way to study future AI systems. Although we cannot yet build systems more intelligent than humans, we can analyze π mathematically to understand its possible harmful behaviors. It is interesting to see how the definition of the rational agent π fits into the context of outcomes and lotteries. Interaction histories h ∈ H play the role of outcomes in the utility function u(h) used by π. The value v(h) can be viewed as another utility function, derived by summing over future discounted values of u(h). Viewing v(h) as a utility function, the sum for v(ha) in (2.4) takes the form of the expected utility of a lottery over a set of mutually exclusive histories, and the choice of action that maximizes v(ha) in equations (2.3) and (2.5) takes the form of a preference among those lotteries.

Assuming that each possible future action by the agent is associated with a lottery of possible outcomes, consider agents whose preferences violate the four assumptions of the von Neumann and Morgenstern Utility Theorem. An agent whose preferences violate the completeness assumption may not have any way to decide between two possible actions. (This is "don't know" and different from the "don't care" indicated by L ≈ K. Given two candidates for elected office, this is the difference between not knowing anything about them, and knowing about them and thinking they are equally suited to office.) An agent whose preferences violate the transitivity assumption may be unable to choose an action that is preferred over all other possible actions (because there may be a cycle of preferences among three actions/lotteries K ≺ L ≺ M ≺ K so that none of the three is preferred over the other two). If the continuity assumption is violated, there are cases where there is no probability that produces "no preference" between two actions. However, because an agent is searching for actions associated with preferred outcomes rather than probabilities that produce no preference between actions, violations of the continuity assumption do not prevent an agent from deciding. The independence assumption refers to preferences between two lotteries that both include the same outcome. If outcomes are identified with histories, as in the definition of π, then the lotteries associated with two different actions cannot include any common outcomes (because a ≠ a' implies haoh" ≠ ha'o'h'). Therefore, violations of the independence assumption do not prevent an agent from deciding, at least for agents that identify outcomes with histories. However, violations of the continuity and independence axioms can result in agents whose action decisions change in unintuitive ways as the probabilities in their environment models change.

It is not hard to show that any complete and transitive preferences among a countable set of outcomes (such as the set H of interaction histories) can be expressed by a utility function u(.). This utility function can be extended to lotteries L = ∑i pi Ai by u(L) = ∑i pi u(Ai). This extension of u(.) defines preferences among lotteries which satisfy the continuity and independence assumptions. Thus if we are satisfied to derive lottery preferences from outcome preferences, the completeness and transitivity assumptions are sufficient, and they are necessary in order for agents to be capable of choosing actions.
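A short sketch can make the last point concrete: once a utility function is defined on outcomes, expected utility induces preferences among lotteries, and those preferences are automatically complete and transitive because they compare real numbers. The outcomes, utilities, and probabilities below are invented for illustration.

    # A small sketch of deriving lottery preferences from outcome preferences.

    u = {"A1": 0.2, "A2": 0.5, "A3": 1.0}    # utility values of three outcomes

    def expected_utility(lottery):
        """u(L) = sum of p_i * u(A_i), for a lottery given as {outcome: probability}."""
        assert abs(sum(lottery.values()) - 1.0) < 1e-9
        return sum(p * u[a] for a, p in lottery.items())

    L = {"A1": 0.5, "A3": 0.5}    # lottery L: half chance of A1, half chance of A3
    M = {"A2": 1.0}               # lottery M: outcome A2 for certain

    # L is preferred over M exactly when E(u(L)) > E(u(M)); comparing real numbers
    # gives a complete and transitive preference among lotteries.
    print(expected_utility(L), expected_utility(M))   # 0.6 versus 0.5
    print("prefer L" if expected_utility(L) > expected_utility(M) else "prefer M")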

2.3 Utilitarian Ethics for AI

Utilitarianism is a system of normative ethics that says we should choose actions that maximize benefit and minimize suffering. For a mathematical agent, benefit and suffering are defined by a utility function.

The principal criticism of utilitarian ethics is that it can allow morally bad actions such as lying and stealing when those actions have good consequences. In contrast, rule-based ethical systems focus on the intrinsic morality of actions. However, following a set of rules may lead to ambiguous situations, such as the ambiguities in Asimov's laws of robotics. The way to resolve these ambiguities is to recognize that environments may present agents with situations where all choices of actions involve breaking rules. A utilitarian system can provide a means of resolving such ambiguous situations by defining the utility value of each human history according to the number and severity of rules that the agent breaks by its actions in the history. Thus utility functions can express rule-based ethics.

Another criticism of utilitarian ethics is that it ignores the intention behind actions. However, all we know of a human being's intention is their behavior, which is their interactions with their environment. Humans may express their intentions directly via language, or we may infer their intentions from patterns of behavior, such as preparations to commit a crime. A utility function defined in terms of a history of an agent's interactions with its environment can express the agent's intention (inferred from its language or via other patterns of its behavior) as well as any other means. However, the behavior of advanced AI agents may be sufficiently different from human behavior that we will have difficulty inferring their intentions. In fact, there is debate about whether AI systems can have intentions. If we believe that utility-maximizing agents have intention, then their intention is simply to maximize the utility function that we humans design for them. If we believe that AI agents cannot have intentions, then we must assign the intention behind AI actions to the humans who create the AI systems and design their utility functions. Either way, the human creators of advanced AI systems must take ethical responsibility for those systems.

A third objection to utilitarianism is based on the belief that human life is qualitatively different from material wealth, and that an ethical system should not assign numerical values to human life and health that imply monetary values. It is true that we humans sometimes place a disturbingly low monetary value on the life and health of ourselves and other humans; for example, we tolerate the reality of unsafe working conditions for the sake of minimizing costs.

However, the utility functions discussed in the previous section do not require valuing human life and health in material terms. The value we assign to a history can be based solely on the observations of human life and health in that history, such that no comparison between human life and money is required. Values are defined for whole human histories rather than being allocated among individual humans and material objects in those histories.

A final objection to utilitarianism is that it forces us to choose preferences between situations that are intrinsically incomparable. For example, pain and pleasure are subjective, making it impossible to compare the way they are experienced between different people. But humans do make such choices when they are forced to; society routinely decides how many resources are devoted to the education and health care of individual humans. Humanity is developing AI with great energy and with no realistic prospect of abandoning those efforts. If we do not communicate our complete set of preferences to powerful AI systems, we leave those choices to some non-human process. Arguably, the creation of powerful AI will force us to choose.

The key justification for utilitarian ethics for AI is that any ethical system that generates complete and transitive preferences among possible human histories can be expressed by assigning numerical utility values to those histories. Chapter 7 proposes a way to define such utility values in a system that combines the preferences of all humans.

Utility functions have problems other than traditional ethical objections. For example, one particular form of utility function, whose values are supplied as rewards from the environment, has been widely studied in reinforcement learning (RL) (Sutton and Barto 1998). In this case the observations from the environment include a reward, which is simply a number that defines the utility function value at that time step. The agent must learn which of its actions, dependent on the current history, will result in maximum rewards. For example, the DeepMind system learned to play seven different Atari games without any prior model for the games. This remarkable system's only prior information was that it should seek to maximize the game score, which is a reward from the environment.

Chapters 6 and 7 will describe unintended and harmful behaviors of AI systems seeking to maximize rewards from their environments. These understandings can help us to design ethical AI systems. In fact, when DeepMind was acquired by Google, part of the deal was a commitment by Google to set up a committee on ethical AI that included one of DeepMind's owners (Bosker 2014).

There is an important difference between RL and ethical AI. In the study of RL, utility function values (rewards) are defined by the environment and are part of the problem to be solved by the agent design whereas, in the study of ethical AI, utility functions are part of the agent definition and hence part of the solution to the problem of defining agents that help rather than harm humans.


3. How Can AI Agents Learn Environment Models?

As noted previously, the environment model for current AI systems like the Google car is mostly designed by human engineers. Future AI systems that exceed human intelligence will need to learn environment models through their own exploration. Marcus Hutter (2005) had the insight that AI model learning is mathematically similar to Ray Solomonoff's work on sequence prediction (Solomonoff 1964). We assume that the agent's observations of the environment can be generated by computer programs and design the agent to search for those programs. The programs constitute the AI's environment model and generate the probability distribution used to predict observations. This idea is important both for the theory of AI and for development of practical AI systems. Hutter grounds his approach in the philosophies of Occam and Epicurus. Occam's razor says that we should favor the simplest explanation that fits our observations. Epicurus tells us to include all explanations consistent with our observations. Hutter's universal AI models the environment with all programs that are consistent with the agent's observations, but assigns higher probabilities to shorter programs.

Figure 3.1 A Turing machine with one tape.

In order to define the search for programs mathematically, Solomonoff and Hutter used Turing machines (TMs), named for their inventor Alan Turing (1936). A TM is an abstract mathematical machine and, according to the Church-Turing thesis, can express any computation that can be expressed in any other way. This thesis has been proved for every known way of expressing computations and is widely accepted by computer scientists. TMs are very simple. They start in a specially designated "start state" and, at each of a discrete series of time steps, transition to a next state, chosen from a finite set of states, until they reach a "stop state." A TM also has one or more infinite tape memories, each with a read/write head located somewhere along the tape, and a finite set of rules that describe how the state changes (or the TM halts) and whether and how the tapes are written and moved, depending on the current state and the tape values (0 or 1) under the read/write heads.

Figure 3.1 is a simple schematic of a TM with one tape. Table 3.1 defines its state transition rules.

        tape read: 0    tape read: 1
start   start, -, 1     s0, left, -
s0      s0, left, 1     s1, right, -
s1      s0, left, 1     stop, -, 0
stop    halt            halt

Table 3.1 State transition rules for a Turing machine with four states: start, s0, s1 and stop. Given a previous state (in the left-hand column) and the contents under the tape head (in the top row), the rules specify the next state (or that the TM should halt), whether to move the tape head left or right (or not move, indicated by -) and whether to write 0 or 1 to the tape (or not write, indicated by -).
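The rules in Table 3.1 can be simulated directly. The following Python sketch does so; the dictionary representation of the tape (with unwritten cells read as 0) and the step limit are implementation choices made for this illustration, since a real TM tape is infinite and a TM need not halt.

    # A small simulator for the Turing machine of Table 3.1.
    # rules[(state, symbol)] = (next_state, head_move, write), with None meaning
    # "do not move" or "do not write", as in the table.
    rules = {
        ("start", 0): ("start", None, 1),   ("start", 1): ("s0", "left", None),
        ("s0", 0):    ("s0", "left", 1),    ("s0", 1):    ("s1", "right", None),
        ("s1", 0):    ("s0", "left", 1),    ("s1", 1):    ("stop", None, 0),
    }

    def run(tape, max_steps=100):
        """Run the TM on a tape given as {position: bit}; unwritten cells read as 0."""
        state, head, steps = "start", 0, 0
        while state != "stop" and steps < max_steps:
            symbol = tape.get(head, 0)
            state, move, write = rules[(state, symbol)]
            if write is not None:
                tape[head] = write
            if move == "left":
                head -= 1
            elif move == "right":
                head += 1
            steps += 1
        return state, steps

    print(run({-1: 1}))   # ('stop', 4): with a 1 just left of the starting head, it halts
    print(run({}))        # ('s0', 100): on an all-zero tape it never reaches 'stop'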

Turing proved several interesting properties of his Turing machines. First, there are universal Turing machines (UTMs). UTMs interpret the contents of one of their tapes as a program that causes the UTM to simulate any other Turing machine. The program occupies a finite interval of the infinite tape. The rest of the tape is interpreted as the initial tape contents (i.e., the input) of the TM being simulated. In order to avoid confusion between the program and the input to the program, programs are assumed to be encoded in prefix-free bit strings; no program is an initial substring (i.e., a prefix) of another program, making it possible to always know where the program ends and the data begins. In a complex construction, Turing described the state set and the transition rules of a UTM (similar to Table 3.1 but much more complex) and proved that for every TM there exists a program that causes the UTM to simulate that TM. There are an infinite number of UTMs. In order to define Hutter's solution to AI model learning, we pick a UTM, called a reference UTM. Hutter's solution, called universal AI, searches for programs for that UTM. Given a finite interacting sequence of actions from the agent and observations from the environment, the agent defines its environment model as the set of UTM programs that generate the sequence of observations in response to the sequence of actions. Each program in the environment model can predict future observations. A universal AI's predictions are averages of the predictions by the programs in the environment model, weighted according to the programs' lengths. That is, by Occam's razor, shorter UTM programs are taken to be more likely, and consequently weigh more heavily in computing probabilities of future observations.


Hutter defined universal AI as an RL agent, meaning its utility function is defined by rewards from the environment. As such, the utility function cannot be used to express ethical preferences. However, the key idea of universal AI is its way of learning an environment model. That certainly can be used in conjunction with a utility function defined to express ethical preferences rather than one defined by rewards from the environment.

Turing proved that it is not possible for a UTM program to examine another UTM program and its input tape, and determine whether the other program will halt with the given input. This is called the halting problem. We say that the halting problem is undecidable, meaning that no UTM program can always solve it. This fact implies that no TM can compute universal AI; the search for all programs of any given length that generate a sequence of environmental observations cannot terminate because there is no way to know which of those programs will terminate. Despite the fact that it cannot be finitely computed, Hutter proved that universal AI is Pareto optimal. Pareto optimal means that no other agent can receive higher rewards than universal AI in any environment unless it gets lower rewards than universal AI in some other environment. But note that there certainly are agents that do better than universal AI in our particular environment on Earth; these agents start with a good deal of knowledge about our environment. You would not want to be a passenger in a self-driving car operated by universal AI, although Google's car system would do worse than universal AI in environments unrelated to driving.

3.1 Universal AI

Let U be a reference UTM and let Q be the infinite set of all programs for U. These programs are finite bit strings in some prefix-free set. Hutter assumed that U is deterministic, which means that for any given state and tape contents under the read/write heads, there is exactly one successor state and one action for each tape. Let h = (a1, o1, ..., at, ot) ∈ H be an interaction history. Given a program q ∈ Q, we write o(h) = U(q, a(h)), where o(h) = (o1, ..., ot) and a(h) = (a1, ..., at), to mean that q produces the observations oi as output on a tape, in response to the actions ai as input on a tape, for 1 ≤ i ≤ t. We assign the prior probability ξ(q) = 2^-|q| to program q, where |q| is the length of q in bits. Then we define the prior probability of history h as:

(3.1) ρ(h) = ∑q:o(h)= U(q, a(h)) ξ(q).

If we use this ρ(h) in equations (2.3)−(2.5), and define u(h) as the reward from the environment at time step |h|, then the agent π is Hutter's universal AI. That is, each observation oi is factored into an ordinary observation o'i and a reward ri as oi = (o'i, ri), with u(h) = r|h|. We assume 0 ≤ ri ≤ 1 to ensure that values converge in equations (2.3)−(2.5).

Kraft's Inequality implies that ρ(h) ≤ 1 (Li and Vitanyi 1997) in equation (3.1), although ρ(h) is not a proper probability distribution because its values do not sum to 1 over mutually exclusive and exhaustive sets of possibilities. However, its only use in equations (2.3)−(2.5) is in the form of conditional probabilities ρ(o | ha) and those values are normalized to a proper probability distribution in equation (2.1). Although the UTM U is deterministic, the agent's environment model is stochastic because ρ(h) is defined by a distribution over UTM programs in equation (3.1). Hutter called the agent AIXI, combining AI with the Greek letter ξ (XI).

The distribution ξ(q) is related to the Kolmogorov complexity K(h) of a history h, which is defined as the length of the shortest program q such that o(h) = U(q, a(h)). Because the halting problem is undecidable, K(h) cannot be finitely computed. To understand this issue, assume a UTM program p is trying to compute K(h). To do that it simulates UTM programs q on input a(h) until it finds the shortest program whose output is o(h). Assume p has found a program q1 whose output is o(h) but p is still simulating another program q2 that is shorter than q1. Because p cannot decide the halting problem, it cannot decide whether q2 will run forever or eventually halt, possibly with output o(h). The best that p can do is to produce a series of decreasing estimates of K(h), as programs halt with output o(h). But in general, p cannot decide when the search ends. A similar process can be used to produce a series of increasing estimates of ρ(h). But ρ(h) cannot be finitely computed and neither can AIXI. The next chapter will discuss a different theoretical approach to learning environment models based on the assumption that the environment is finite. That approach avoids the infinite tapes of TMs and is finitely computable.
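
The full mixture over all UTM programs cannot be computed, but the weighting scheme of equation (3.1) can be illustrated on a toy hypothesis class. In the sketch below, a "program" is just a bit pattern that is assumed to repeat forever; this drastic simplification, the maximum pattern length, and the function names are mine, not AIXI itself. Each pattern consistent with the history is weighted by 2^-length, and the next observation is predicted from the weighted mixture.

    from itertools import product

    def patterns(max_len):
        """All bit-string 'programs' up to max_len; program q predicts the
        observation sequence obtained by repeating q forever."""
        for n in range(1, max_len + 1):
            for bits in product([0, 1], repeat=n):
                yield bits

    def predict_next(observations, max_len=8):
        """Mixture prediction: weight each consistent pattern by 2**-len(pattern),
        echoing the prior xi(q) = 2^-|q|, then normalize over the next bit."""
        weight = {0: 0.0, 1: 0.0}
        t = len(observations)
        for q in patterns(max_len):
            predicted = [q[i % len(q)] for i in range(t + 1)]
            if predicted[:t] == list(observations):      # consistent with history
                weight[predicted[t]] += 2.0 ** -len(q)   # shorter q counts more
        total = weight[0] + weight[1]
        return {o: w / total for o, w in weight.items()}

    history = [1, 0, 1, 0, 1, 0]
    print(predict_next(history))   # favors 1, since simple alternating patterns fit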

3.2 A Formal Measure of Intelligence

In addition to its role in defining idealized intelligent agents, Kolmogorov complexity can also be used to define measures of intelligence (Hernández-Orallo 2000). Legg's and Hutter's (2006) formal measure of intelligence is closely related to universal AI. This measure is defined in terms of the expected values of utility functions that the agent can achieve. In order not to favor agents that simply have high utility function values, the measure is only defined for agents whose utility function values are rewards from the environment as u(h) = r|h|. The measure is different from AIXI in that it assumes stochastic environments. Note that it is not sufficient to use a non-deterministic reference UTM, because non-deterministic TMs do not specify probabilities of their possible next states. Rather, the programs for the reference UTM must compute probabilities for observations as functions of agent actions.


Given an agent π and an environment μ, an interaction between π and μ will produce a sequence of rewards ri for i = 0, 1, 2, … (either infinite or up to some final time step). The value of agent π in environment μ is defined by the expected value of the sum of future, discounted rewards:

(3.2) V^π_μ = E(∑i≥0 γ^i ri).

Note that Legg and Hutter assumed that discounts are built into rewards and so did not explicitly include γ^i in equation (3.2), but I include it to make it clear that (3.2) converges (we also assume that 0 ≤ ri ≤ 1). This expected value is the average value of the sum of discounted rewards over many interactions between π and a stochastic environment μ. The intelligence of agent π is defined by a weighted sum of its values over a set E of computable environments. Environments are computed by programs, finite prefix-free binary strings, on some reference UTM U. The weight for μ ∈ E is defined in terms of its Kolmogorov complexity:

K(μ) = min { | p| : U(p) computes μ } where | p| denotes the length of program p. The intelligence of agent π is then defined as:

(3.3) V^π = ∑μ∈E 2^-K(μ) V^π_μ.

Reference UTMs pose subtle problems. I showed (Hibbard 2009) that given an arbitrary environment μ ∈ E and ε > 0, there exists a reference UTM Uμ such that for all agents π and for V^π computed according to Uμ,

V^π_μ / 2 ≤ V^π < V^π_μ / 2 + ε.

That is, the intelligence of any agent can be determined, within an arbitrarily small epsilon, by its value with respect to any single environment we choose, simply by picking the appropriate reference UTM. As Legg and Hutter suggested, it may be useful to apply some criterion to reference UTMs, such as picking the UTM with the smallest number of states. It is worth noting that if AIXI and an intelligence measure share the same reference UTM, then AIXI has the maximum possible intelligence by that measure.
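
Equation (3.2) can be estimated by simulation when an environment is small enough to sample. The sketch below is a toy illustration: the two-state stochastic environment, the uniformly random policy, and the discount and episode parameters are all made up for the example, not taken from Legg and Hutter.

    import random

    GAMMA = 0.9
    ACTIONS = ["a", "b"]

    def env_step(state, action):
        """A made-up two-state stochastic environment: returns (next_state, reward)."""
        if state == "s0":
            return ("s1", 1.0) if action == "a" and random.random() < 0.7 else ("s0", 0.0)
        else:
            return ("s0", 0.5) if random.random() < 0.5 else ("s1", 0.0)

    def policy(state):
        """A fixed stochastic policy pi: pick an action uniformly at random."""
        return random.choice(ACTIONS)

    def value_estimate(episodes=10000, horizon=50):
        """Monte Carlo estimate of V^pi_mu = E(sum_i gamma^i r_i), equation (3.2)."""
        total = 0.0
        for _ in range(episodes):
            state, ret, discount = "s0", 0.0, 1.0
            for _ in range(horizon):               # truncate the discounted sum
                state, reward = env_step(state, policy(state))
                ret += discount * reward
                discount *= GAMMA
            total += ret
        return total / episodes

    print(round(value_estimate(), 3))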


3.3 Modifications of the Agent Framework

Universal AI learns an environment model in the form of a probability distribution ρ(h) over interaction histories. Conditional probabilities ρ(o | ha ) are derived from ρ(h) for use in equations (2.3) −(2.5), repeated here for convenience:

(3.4) v(h) = u(h) + γ max a∈A v(ha ),

(3.5) v(ha ) = ∑o∈O ρ(o | ha ) v(hao ),

(3.6) π(h) := a|h|+1 = argmax a∈A v(ha ).

Given that the AIXI agent learns an abstract and complex function such as ρ(h), other agents could learn the value function v(ha) or the policy π(h) rather than learning ρ(h). In fact, some AI systems do just that. DeepMind Technologies created a system that learned to play seven Atari games by learning to compute v(ha) (Mnih et al. 2013). Their system surpassed the skill of the best human players on three of the games and surpassed the best previous RL systems on six of the games. The system was not customized to any of the games. Its inputs (observations) were the contents of the video screen and the score, and its outputs (actions) were the game controls. The score was interpreted as a reward from the environment. The system did not learn an environment model like ρ(h) and did not recursively apply equations (3.4) and (3.5). It simply applied the learned value function v(ha) in equation (3.6). Chapter 8 will describe a mathematical framework for agents that learn to compute v(ha). With large computing resources that framework could be applied to Atari.

For practical AI systems with limited resources, learning to play Atari games poses two very difficult problems. First, there may be a delay of thousands of time steps between actions and the rewards that result from those actions. Second, the Atari video screen contains 210×160 color pixels, so each observation from the environment is a point in a 100,800-dimensional space (3×210×160). The DeepMind system converts color to grey-scale and down-samples and crops the video screen to 84×84 pixels to reduce this to 7056 dimensions, but that is still very high by the standards of practical AI systems. To solve these problems, the DeepMind scientists built on and improved a set of techniques known as deep learning. Their system learns an approximation to the function v(ha) that enables it to choose actions that get high scores.

Imagine a modified version of the DeepMind Atari player, with the Atari screen replaced by video from cameras aimed out of car windows, game controls replaced by controls for a car's steering wheel, accelerator, brake, and transmission, and score computed according to safe delivery of passengers to their destinations, with large deductions for accidents. An improved version of the DeepMind system could probably learn to drive reasonably well. Eventually it

might even be safer than the Google car, except for the dangers of RL to be discussed in Chapters 6 and 7.

Learning π(h) is referred to as policy iteration or as evolutionary programming. For example, the Hayek system of Eric Baum and Igor Durdanovic (Baum 2004) learned to solve the block-stacking puzzle illustrated in Figure 3.2. Stack 0 contains several types of blocks with different designs. In total, stacks 1, 2, and 3 contain the same number of blocks of each design that are contained in stack 0. The goal is to get all the blocks from stacks 1, 2, and 3 into stack 1, with the order of block designs exactly matching stack 0. The solver can only move one block at a time from the top of stack 1, 2, or 3 to the top of another of stacks 1, 2, and 3. Blocks cannot be moved to or from stack 0. In the context of our agent-environment framework, for each puzzle Hayek made a single observation of the initial configuration of the block stacks. Its sequence of moves to solve (or not solve) the puzzle constituted a single action, followed by an observation of a reward (1 if the puzzle was solved, 0 if not). Hayek's efforts to solve a sequence of puzzles constituted an interaction history, during which Hayek learned a policy π(h).

Figure 3.2 Block stacking puzzle for Hayek

Hayek learned to solve these puzzles using multiple interacting agents. These agents formed an economy, paying "money" to each other for work. The ultimate source of their money was rewards from solving the puzzle. In some experiments agents paid money for using computing resources. Each agent contributed a small part to the overall solution. Some agents cleaned blocks off stack 1, other agents moved blocks between stacks 1, 2, and 3, and a third group of agents decided when the puzzle was solved. Hayek evolved a set of about 1000 agents,

each performing its task when very specific conditions were met. Agents that didn't contribute any value to solving the puzzle went bankrupt and disappeared. Collectively these agents learned a function π(h) mapping from history to action. Baum and Durdanovic used this system to explore various economic ideas such as property rights, conservation of money, the tragedy of the commons, and the evolution of cooperation.

Hayek and the DeepMind Atari player are important milestones in the development of AI. Because they skip the step of learning ρ(h) they are sometimes described as model-free. However, for the learned functions v(ha) and π(h) to receive high rewards from the environment they must implicitly encode knowledge of the environment.
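
The DeepMind player approximated v(ha) with deep networks; the same model-free idea can be shown at toy scale with tabular Q-learning on a small chain environment. Everything in the sketch below (the environment, the learning parameters, the variable names) is an illustrative stand-in of mine, not the DeepMind system.

    import random

    N_STATES, GOAL = 5, 4           # a made-up chain: states 0..4, reward at state 4
    ACTIONS = [-1, +1]              # move left or right
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    def step(state, action):
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else 0.0
        return next_state, reward

    # Q[(state, action)] plays the role of the learned value v(ha):
    # it is updated from rewards alone, with no model of the environment.
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def choose(state):
        if random.random() < EPSILON:                 # occasional exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    for episode in range(500):
        state = 0
        for _ in range(20):
            action = choose(state)
            next_state, reward = step(state, action)
            best_next = max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = next_state

    print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
    # The learned greedy policy should move right, toward the rewarding state.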

3.4 The Ethics of Learned Environment Models

For an AI system like Google's self-driving car, the environment model is specified by engineers as part of the system design. The engineers can generally anticipate the possible situations the car will encounter and design safety into the car's responses. If the car is unable to recognize objects in the road, it can simply stop. However, more complex AI systems in the future will need to learn their environment models and hence their designers will be less able to anticipate the situations they will encounter or to specify safe responses. Thus the ethics of future AI systems will be fundamentally different from and more difficult than the ethics of current AI systems.

When future AI systems learn to use human languages fluently and learn about human society, then they will learn about human laws, morality and ethical theories. Will they use this knowledge to guide their own ethical behavior? Not necessarily. For example, people with antisocial personality disorder (American Psychiatric Association 2013) know about social norms but often use them to predict and manipulate other people rather than to guide their own behavior. Knowledge of law, morality and ethics will contribute to an AI agent's environment model but not directly to its utility function. However, if human approval increases the value of an agent's utility function, then the agent may use its knowledge of social norms to choose behaviors that will please people.


4. AI in Our Finite Universe

Infinite sets are the source of theoretical difficulties in mathematics. For example, universal AI is uncomputable because of the undecidability of the halting problem. This undecidability is also the source of ethical problems for AI (Englert, Siebert and Ziegler 2014). Section 4.4 will discuss other difficulties for agents that use the mathematics of infinite sets for logical proofs. However, universal AI would be computable if it used TMs with finite tapes and the next section will describe a way to do something much like that.

Figure 4.1 A finite universe

I argue that such difficulties are unnecessary if agent definitions are limited to finite sets (Hibbard 2014). (2014, page 333) suggests that these difficulties are eliminated in a computable universe. Seth Lloyd (2002) has calculated that the observable universe has a finite information capacity of no more than 10^120 bits. This number is based on a hard quantum mechanical limit on the number of possible physical states of the universe and is proportional to the age of the universe squared. That age is estimated currently to be 13.7 billion years, meaning that an assumption that our environment has an information capacity of no more than 10^124 bits will remain valid for a trillion years. Alternately we could pick a much larger limit than 10^124

bits, as long as it is finite. The key point is that well-confirmed physics justifies picking a finite limit, and that spares us the mathematical difficulties that accompany infinite sets.

Figure 4.2 A simplified example of a Markov decision process (MDP), not showing outputs. There are three states, S1, S2, and S3, and two inputs, a0 and a1. The arcs are labeled with probabilities of transitioning from inputs to successor states.

Rather than using TMs, agents can model their environments with finite stochastic loop programs. These programs have a finite memory limit: no matter how long the program runs it can never use more memory than its limit. Furthermore they are stochastic, meaning they sometimes make random decisions about what to do next. Such programs can be expressed in an ordinary procedural language restricted to only static array declarations, no recursive functions, only loops whose number of iterations is set before iteration begins (so no while loops), and a generator for truly random numbers. Prohibiting recursive functions prevents stack memory from growing without limit. As a result, the total memory use of these programs is known at compile time.

Markov decision processes (MDPs) are a formal model for finite stochastic loop programs (Puterman 1994; Sutton and Barto 1998; Russell and Norvig 2010). An MDP has a finite set of states including a start state, a finite set of inputs, a finite set of outputs, and a table of probabilities for the next state and output as a function of the current state and input. An MDP's only memory is storage for the value of the current state. Thus an MDP with n bits of memory must have 2^n states, meaning that MDPs are not a very efficient representation. Figure 4.2 is an example of an MDP with just three states, S1, S2, and S3, and two inputs, a0 and a1. Arcs from actions to next states are labeled with transition probabilities. Outputs are not shown in this simplified diagram.

Using MDPs to model environments, actions are identified with inputs and observations are identified with outputs in the agent-environment framework from Section 2.1. To be precise, because our agents lack knowledge of the entire environment, our framework corresponds to partially observable Markov decision processes (POMDPs).

Our world is governed by scientific laws. If we can learn these laws, we can give simpler, unified explanations for our observations. This logic explains Occam's razor. If agents model their environments by programs, then programs are shorter if they factor the logic common to many observations into unified functions. Hence the search for shorter programs is an agent's way of learning scientific laws. When agents find universal laws governing observations they will be able to make more accurate predictions of future observations and consequently receive higher expected future utility values. The MDPs as depicted in Figure 4.2, however, lack any way to express functions common to many situations and so lack this important property. In contrast, finite stochastic loop programs in an ordinary procedural language do have this property (as do the UTM programs used by universal AI) so we will adopt them as agents' models for finite environments.

To understand the correspondence between a finite stochastic loop program q and an MDP Mq, let the total memory use of the program q, including all static variable and array declarations and adequate stack for its maximum depth of function calls, be n bits. Then Mq will have 2^n states. We assume that the memory of q is not reset between action inputs from the agent, just as the state of Mq is not reset between inputs. While Mq, as a model of the environment, makes a single state transition from receiving an action to producing an observation, the program q may make many steps to produce an observation output in response to an action input.
Thus, if s is the state of Mq corresponding to the memory of q before an input, the successor state to s for Mq, after q accepts an input and produces an output, corresponds to the memory of q after the output (that is, the state transitions of Mq do not trace all the intermediate calculations of q between an input and an output). The start state of Mq corresponds to the initial memory contents of q before it receives any input actions. If any of the 2^n states of Mq are not reachable from its start state (that is, they do not correspond to possible memory contents of q after any sequence of input actions and output observations), they can be eliminated from Mq.
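
As an illustration of the restrictions just described (fixed-size arrays, no recursion, loop counts fixed before a loop starts, plus a true random number generator), here is a small sketch of a finite stochastic loop program, written in Python for readability. The particular environment it models is made up for the example.

    import random

    MEMORY = [0] * 8          # a static, fixed-size array: the program's only state

    def respond(action):
        """Produce one observation for one action, using bounded memory and
        loops whose iteration counts are fixed before the loops begin."""
        for i in range(len(MEMORY) - 1, 0, -1):   # bounded loop: shift the action history
            MEMORY[i] = MEMORY[i - 1]
        MEMORY[0] = action
        ones = 0
        for i in range(len(MEMORY)):              # bounded loop: count recent 1 actions
            ones += MEMORY[i]
        # stochastic choice: observation 1 is more likely when recent actions were 1s
        return 1 if random.random() < (0.2 + 0.6 * ones / len(MEMORY)) else 0

    actions = [1, 1, 0, 1, 0, 0, 1, 1]
    print([respond(a) for a in actions])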

4.1 Learning Finite Stochastic Models

While Hutter studied using MDPs (Hutter 2009a) and dynamic Bayesian networks (DBN) (Hutter 2009b; Ghahramani 1997) for environment models, I prefer modeling environments with finite stochastic loop programs because they can factor logic common to multiple observations into unified functions (Hibbard 2012a). To do that requires a new way to compute ρ(h). Let Q be the set of all finite stochastic loop programs in some prefix-free procedural language. Let φ(q) = 2^-|q| be the prior probability of program q, where |q| is the length of program q. Let P(h | q) be the probability that the program q computes the history h (that is, produces the observations oi in response to the actions ai for 1 ≤ i ≤ |h|).

For a simple example, let A = {a, b}, O = {0, 1}, h = (a, 1, a, 0, b, 1) and let q generate observation 0 with probability 0.2 and observation 1 with probability 0.8, without any internal state or dependence on the agent's actions. Then the probability that the interaction history h is generated by program q is the product of the probabilities of the 3 observations in h: P(h | q) = 0.8 × 0.2 × 0.8 = 0.128.

For a more complex example, Table 4.1 defines transition probabilities for a program q' that generates observations dependent on internal state and the agent's actions. The left column gives the current state of q' (s0 or s1) and the agent's action (a or b). The top row gives the next state of q' and the generated observation (0 or 1). The entries are probabilities, with each row summing to 1.0. As before, let the history be h = (a, 1, a, 0, b, 1). The program q' starts in state s0. There are two possible state sequences consistent with the history h: (s0, s0, s0, s1) and (s0, s1, s0, s1). The product of the transition probabilities for the first sequence is 0.3 × 0.2 × 0.4 = 0.024 while the product of the transition probabilities for the second sequence is 0.5 × 1.0 × 0.4 = 0.2. These sequences are mutually exclusive, hence P(h | q') is the sum 0.2 + 0.024 = 0.224.

         s0, 0    s0, 1    s1, 0    s1, 1
s0, a    0.2      0.3      0.0      0.5
s0, b    0.3      0.0      0.3      0.4
s1, a    1.0      0.0      0.0      0.0
s1, b    0.3      0.3      0.2      0.2

Table 4.1 Transition probabilities for the finite stochastic loop program q' . Rows are labeled for the current state and action ( a or b). Columns are labeled for next state and observation (0 or 1). This representation of the finite stochastic loop program q' using an MDP is efficient for a program with only two states. However, for more complex programs, a procedural representation is generally more efficient.
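
The sum over state sequences in this example can also be computed without enumerating the sequences, by carrying forward a probability for each possible current state of q'. The sketch below encodes Table 4.1 and should reproduce P(h | q') = 0.224 for the history in the text; the dictionary encoding is my own.

    # Transition probabilities from Table 4.1:
    # TRANS[(state, action)][(next_state, observation)] = probability
    TRANS = {
        ("s0", "a"): {("s0", 0): 0.2, ("s0", 1): 0.3, ("s1", 0): 0.0, ("s1", 1): 0.5},
        ("s0", "b"): {("s0", 0): 0.3, ("s0", 1): 0.0, ("s1", 0): 0.3, ("s1", 1): 0.4},
        ("s1", "a"): {("s0", 0): 1.0, ("s0", 1): 0.0, ("s1", 0): 0.0, ("s1", 1): 0.0},
        ("s1", "b"): {("s0", 0): 0.3, ("s0", 1): 0.3, ("s1", 0): 0.2, ("s1", 1): 0.2},
    }
    STATES = ["s0", "s1"]

    def prob_history(history, start="s0"):
        """P(h | q'): sum over all state sequences, computed incrementally."""
        alpha = {s: (1.0 if s == start else 0.0) for s in STATES}
        for action, obs in history:
            alpha = {s2: sum(alpha[s1] * TRANS[(s1, action)].get((s2, obs), 0.0)
                             for s1 in STATES)
                     for s2 in STATES}
        return sum(alpha.values())

    h = [("a", 1), ("a", 0), ("b", 1)]
    print(round(prob_history(h), 3))   # 0.224, matching the text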

Given an interaction history h, the environment model is the single program that provides the most probable explanation of h, that is the q that maximizes P(q | h). By Bayes' Theorem:


(4.1) P(q | h) = P(h | q) φ(q) / P(h).

Because it is constant over all q, P(h) can be eliminated. Thus, given a history h, we define λ(h) as the most probable program modeling h by:

(4.2) λ(h) := argmax q∈Q P(h | q) φ(q).

Given an environment model λ(h), the following can be used for the prior probability of an observation history h' extending h (i.e., h is an initial sub-interval of h'):

(4.3) ρ(h') = P(h' | λ(h)).

This distribution ρ(h') can be used in the framework defined by equations (2.1), (2.3) −(2.5). Not only are the environment models finite, but they can be finitely computed by the agent (Hibbard 2012b):

Proposition 4.1. Given a finite history h, the model λ(h) can be finitely computed.

Proof. Given h = (a1, o1, ..., at, ot), let q0 be the program that produces observation oi at time step i for 1 ≤ i ≤ t. That is, q0 is a simple "table-lookup" that produces the observations oi in sequence. Let n = |q0|. Since the behavior of q0 is deterministic, P(h | q0) φ(q0) = 1 × 2^-n = 2^-n. Hence the maximum value in equation (4.2) must be at least 2^-n, that is, P(h | λ(h)) φ(λ(h)) ≥ 2^-n. For any program q with |q| > n, P(h | q) φ(q) < 1 × 2^-n = 2^-n so λ(h) ≠ q. Thus one algorithm for finitely computing λ(h) is an exhaustive search of the finite number of programs q with |q| ≤ n.

As the computing time of this exhaustive search for λ(h) is exponential in |h|, this procedure is not practical for learning environment models. However, it is useful to have a finitely computable theoretical framework. An agent that is part of a finite universe will not have the necessary computational resources to compute the expressions defined by equations (4.2), (4.3), (2.1) and (2.3)−(2.5). Chapter 8 will address the issue of agents that must approximate these equations.
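
Equation (4.2) can be illustrated by restricting the search to a small, hand-picked candidate set of models instead of all programs up to the table-lookup length. The candidates, their nominal lengths in bits, and the history below are toy choices of mine; each candidate supplies P(h | q), and φ(q) = 2^-|q| weights it.

    import math

    def iid_model(p_one):
        """A memoryless candidate: observation 1 with probability p_one."""
        def prob(history):
            prob_h = 1.0
            for _action, obs in history:
                prob_h *= p_one if obs == 1 else 1.0 - p_one
            return prob_h
        return prob

    # Hypothetical candidate set: (name, P(h | q), nominal length |q| in bits).
    CANDIDATES = [
        ("always-1", iid_model(0.999), 8),
        ("fair-coin", iid_model(0.5), 10),
        ("biased-0.8", iid_model(0.8), 16),
    ]

    def most_likely_model(history):
        """lambda(h) = argmax_q P(h | q) * 2^-|q| over the candidate set."""
        def score(candidate):
            _name, prob, length = candidate
            return prob(history) * math.pow(2.0, -length)
        return max(CANDIDATES, key=score)[0]

    h = [("a", 1), ("a", 1), ("b", 0), ("a", 1), ("b", 1)]
    print(most_likely_model(h))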


The derivation of ρ(h) in equations (4.1) −(4.3) takes into account Occam's razor, to favor the simplest explanation of the observations, while ignoring Epicurus' principle to include all explanations. An alternate way to compute ρ(h) accounts for all finite stochastic loop programs:

(4.4) ρ(h) = ∑q∈Q P(h | q) φ(q).

Because this equation requires a sum over an infinite number of programs, it is not finitely computable. However, it can be approximated to any desired precision in finite computing time.

4.2 When Is the Most Likely Finite Stochastic Model the True Model?

If history h is generated by a finite stochastic loop program q, then we call q the true model of h (despite the dictum that, "Essentially, all models are wrong, but some are useful." (Box and Draper 1987)). In that case, is the most likely model λ(h) equal to q? Not necessarily. Because q is stochastic it can generate any history h with P(h | q) > 0 (i.e., P(h | q) may be very low), and as a result there may be another program q' such that P(h | q') φ(q') > P(h | q) φ(q). Or there may be q' such that φ(q') > φ(q) and ∀h∈H.(P(h | q') = P(h | q)) (i.e., the behavior of q' is identical with the behavior of q), and thus ∀h∈H.(P(h | q') φ(q') > P(h | q) φ(q)).

Programs q and q' are equivalent to finite MDPs and thus they may pass through an initial sequence of states that they never visit after some finite time step. This may cause there to exist a history h0, generated by q but with low probability, such that P(h | q') φ(q') > P(h | q) φ(q) for all histories h whose initial sub-sequence is h0 (i.e., ∃h'. h = h0h'). In that case, as the lengths of histories h randomly generated by q increase, the probability that λ(h) = q is no more than 1 − (1/|{h∈H : |h| = |h0|}|) < 1. Therefore it is not even true that, averaged over histories of increasing length generated by q, the probability that λ(h) = q converges to 1.

So what can we say about the relation between λ(h) and the true model q? In order to analyze the question, we need to define expected values with respect to the distribution of histories of length n generated by q. In a history h = (a1, o1, ..., an, on) the probability P(h | q) is defined based on the probabilities of the observations oi only, since the actions ai are generated by the agent rather than the environment. Recalling the notations a(h) = (a1, ..., an) and o(h) = (o1, ..., on), define the set of all sequences of n actions:

(4.5) A(n) = {(a'1, ..., a'n) | a'i ∈ A}.


Then |A(n)| = |A|^n. For any fixed sequence of actions a' ∈ A(n) and any finite stochastic loop program q', the sum of probabilities P(h | q') of histories with a(h) = a' must be 1:

(4.6) ∑ |h|= n∧a(h)= a' P(h | q' ) = 1.

The expected value of a function f(h) with respect to the distribution of histories of length n generated by q, and the limit as n increases to infinity, are defined as:

(4.7) E(f(h) | q, n) = ∑a'∈A(n) (∑|h|=n∧a(h)=a' P(h | q) f(h)) / |A|^n,

(4.8) E( f(h) | q) = limit n→∞ E( f(h) | q, n).

Given a finite stochastic loop program q' , the probability that q' will be a more likely model than q for a history of length n generated by q is:

(4.9) E((if P(h | q' ) φ(q' ) / φ(q) > P(h | q) then 1 else 0) | q, n).

We know that:

(4.10) ∑ |h|= n∧a(h)= a' P(h | q) = 1.

Assume that φ(q' ) / φ(q) = β so that:

(4.11) ∑ |h|= n∧a(h)= a' (P(h | q' ) φ(q' ) / φ(q)) = β.

Then ( P(h | q' ) φ(q' ) / φ(q)) can be larger than P(h | q) over histories h whose P(h | q) values sum to at most β, so that:

(4.12) (∑|h|= n∧a(h)= a' P(h | q) (if P(h | q' ) φ(q' ) / φ(q) > P(h | q) then 1 else 0)) ≤ β.


Averaging equation (4.12) over actions a' ∈A(n) gives:

(4.13) E(if P(h | q' ) φ(q' ) / φ(q) > P(h | q) then 1 else 0 | q, n) ≤ β = φ(q' ) / φ(q).

Because we are using a prefix-free encoding for finite stochastic loop programs q' ∈Q, we have ∑ q' ∈Q φ(q' ) ≤ 1. Thus:

(4.14) ∀ε > 0. ∃Q' ⊂ Q. Q' is finite and ∑q' ∈Q-Q' φ(q' ) / φ(q) < ε.

Combining equations (4.9), (4.13), and (4.14), and noting that equation (4.13) is uniformly true over all n, we can conclude that:

Proposition 4.2. Given any ε > 0 there exists a finite set Q' of finite stochastic loop programs such that the probability is less than ε that λ(h) = q' for any program q' not in Q', for histories h randomly generated by q.

Thus to analyze the probability that λ(h) = q we need only compare q to a finite set Q' of programs. However, various problems may complicate the analysis. For example there may be a program q' ∈ Q such that ∀h∈H. P(h | q') = P(h | q) and |q'| < |q| (so φ(q') > φ(q)). In that case λ(h) will never equal q. For a specific q' we can analyze the probability, with respect to the distribution of histories generated by q, that P(h | q') φ(q') > P(h | q) φ(q), by defining an appropriate function from histories to real numbers f : H → R. Define distributions of finite histories with length n:

(4.15) Pn(h | q) = P(h | q) / |A|^n for |h| = n, and Pn(h | q) = 0 for |h| ≠ n,

(4.16) Pn(h | q') = P(h | q') / |A|^n for |h| = n, and Pn(h | q') = 0 for |h| ≠ n.

Also define means and standard deviations of the function f with respect to the distributions Pn(h | q) and Pn(h | q'), and in the limit as n increases to infinity:

(4.17) m(f | q, n) = E(f(h) | q, n),

(4.18) m(f | q) = E(f(h) | q),

(4.19) m(f | q', n) = E(f(h) | q', n),

(4.20) m(f | q') = E(f(h) | q'),

(4.21) σ(f | q, n) = (E((f(h) - m(f | q, n))^2 | q, n))^1/2,

(4.22) σ(f | q) = limit n→∞ σ(f | q, n),

(4.23) σ(f | q', n) = (E((f(h) - m(f | q', n))^2 | q', n))^1/2,

(4.24) σ(f | q') = limit n→∞ σ(f | q', n).

The approach is to define a function f(h) with different mean values with respect to history distributions Pn(h | q') and Pn(h | q), and with standard deviations converging to 0 as history lengths increase to infinity. In that case the proportion of overlap between the history distributions Pn(h | q') and Pn(h | q) converges to 0 as history lengths increase. More precisely:

Proposition 4.3. If there is a function f(h) such that m(f | q), m(f | q'), σ(f | q), and σ(f | q') converge, m(f | q) ≠ m(f | q'), and σ(f | q) = σ(f | q') = 0, then as the lengths of histories h generated by q increase the probability that λ(h) = q' converges to 0:

(4.25) E((if P(h | q' ) φ(q' ) / φ(q) > P(h | q) then 1 else 0) | q) = 0.

Proof. Assume that m(f | q' ) > m(f | q) (if the order is reversed then certain orders in the following discussion can be reversed). Given any positive integer L there is d > 0 and a positive integer N such that for all n > N:

(4.26) m(f | q', n) - m(f | q, n) ≥ d,

(4.27) σ(f | q, n) < d/(2L),

(4.28) σ(f | q', n) < d/(2L).

To understand this claim, first choose d and N so that equation (4.26) is true, then increase N until equations (4.27) and (4.28) are true. Set e = (m(f | q, n) + m(f | q', n))/2 midway between m(f | q, n) and m(f | q', n). Then e is at least L standard deviations σ(f | q, n) away from the mean m(f | q, n) and at least L standard deviations σ(f | q', n) away from the mean m(f | q', n). Hence a proportion of no more than L^-2 of the distribution Pn(h | q) can have f(h) ≥ e, and a proportion of no more than L^-2 of the distribution Pn(h | q') can have f(h) ≤ e. For histories with f(h) ≥ e, Pn(h | q') φ(q') / φ(q) can be greater than Pn(h | q) for a proportion of no more than L^-2 of the distribution Pn(h | q). And for histories with f(h) ≤ e, Pn(h | q') φ(q') / φ(q) can be greater than Pn(h | q) for a proportion of no more than (φ(q') / φ(q)) L^-2 of the distribution Pn(h | q). Thus:

(4.29) E((if P(h | q') φ(q') / φ(q) > P(h | q) then 1 else 0) | q, n) < (1 + φ(q') / φ(q)) L^-2.

Since L can be arbitrarily large and (1 + φ(q') / φ(q)) is constant, equation (4.25) is proved.

Propositions 4.2 and 4.3 give us the tools to analyze when the true model is the most likely model for a distribution of interaction histories. Consider a distribution P(h | q) of histories randomly generated by a true model q. Given ε > 0, Proposition 4.2 tells us that there is a finite set Q'ε of finite stochastic loop programs such that, for h randomly generated by q, the probability is less than ε/2 that λ(h) = q' for any program q' ∉ Q'ε. Then, for each q' ∈ Q'ε, we need to find a function fq'(h) of histories that satisfies the conditions of Proposition 4.3. If we can do that for each q' ∈ Q'ε, then there is an integer Lq' such that, for h randomly generated by q with |h| > Lq', the probability is less than ε/(2|Q'ε|) that λ(h) = q'. Let Lε = max q'∈Q'ε Lq'. Then, for h randomly generated by q with |h| > Lε, the probability is less than ε that λ(h) ≠ q. That is, the probability that the most likely model is the true model converges to 1 as history lengths increase. |Q'ε| will increase with decreasing ε. The key is being able to find fq'(h) for each q' ∈ Q'ε. It is important that Q'ε be finite in Proposition 4.2, so that the maximum value Lε is finite (i.e., the maximum of a finite set of integers is finite, while the maximum of an infinite set may be infinite).

A useful class of functions for Proposition 4.3 counts the frequency of occurrence of histories as sub-sequences. We define a function fh'(h) for each h' ∈ H. A history h ∈ H will have max(|h|-|h'|+1, 0) sub-sequences of length |h'|, and fh'(h) is the proportion of these that equal h'. If |h| < |h'|, define fh'(h) = 0. If |h| ≥ |h'|, define:

(4.30) fh'(h) = (∑0≤i≤|h|-|h'| if (∃h1.∃h2. h = h1h'h2 ∧ |h1| = i) then 1 else 0) / (|h|-|h'|+1).

Before applying the functions fh'(h), we need to digress into the theory of Markov chains (Levin, Peres, and Wilmer 2008; Russell and Norvig 2010). A finite stochastic loop program q is equivalent to a finite MDP Mq, as discussed at the start of this chapter. All action sequences in A(n) are weighted equally in Pn(h | q), so Pn(h | q) views actions as uniformly distributed random inputs to Mq. Given such random inputs, the MDP Mq will generate the same distribution of histories that a Markov chain MCq generates (here we add action-observation pair outputs to the usual definition of Markov chains; the important point is that the transition probability to the next state is only dependent on the previous state). The probability of the transition from state s to state-action-observation (s', a, o) in MCq is equal to the probability from (s, a) to (s', o) in Mq, divided by |A|. For example, the Mq specified in Table 4.1, with uniformly distributed random action inputs, generates the same distribution of histories that the MCq specified in Table 4.2 generates.

      s0, a, 0   s0, a, 1   s0, b, 0   s0, b, 1   s1, a, 0   s1, a, 1   s1, b, 0   s1, b, 1
s0    0.1        0.15       0.15       0.0        0.0        0.25       0.15       0.2
s1    0.5        0.0        0.15       0.15       0.0        0.0        0.1        0.1

Table 4.2 The MCq equivalent to the Mq specified in Table 4.1, given uniformly distributed random action inputs.
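
The conversion from Table 4.1 to Table 4.2 simply divides each transition probability by the number of actions. A minimal sketch, reusing the Table 4.1 numbers (the dictionary layout is my own):

    # Table 4.1: MDP[(state, action)][(next_state, obs)] = probability
    MDP = {
        ("s0", "a"): {("s0", 0): 0.2, ("s0", 1): 0.3, ("s1", 0): 0.0, ("s1", 1): 0.5},
        ("s0", "b"): {("s0", 0): 0.3, ("s0", 1): 0.0, ("s1", 0): 0.3, ("s1", 1): 0.4},
        ("s1", "a"): {("s0", 0): 1.0, ("s0", 1): 0.0, ("s1", 0): 0.0, ("s1", 1): 0.0},
        ("s1", "b"): {("s0", 0): 0.3, ("s0", 1): 0.3, ("s1", 0): 0.2, ("s1", 1): 0.2},
    }
    ACTIONS = ["a", "b"]

    # Markov chain: MC[state][(next_state, action, obs)] = probability / |A|
    MC = {}
    for (state, action), row in MDP.items():
        for (next_state, obs), p in row.items():
            MC.setdefault(state, {})[(next_state, action, obs)] = p / len(ACTIONS)

    for state, row in sorted(MC.items()):
        print(state, {k: round(v, 3) for k, v in sorted(row.items())})
    # The printed probabilities should match the entries of Table 4.2.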

The behavior of finite Markov chains is well understood (Doob 1953; Levin, Peres, and Wilmer 2008). Let S be the set of states of MCq and, for s, s' ∈ S, let P(s, s') be the probability of transition from state s to state s'. Viewing P as a matrix, its n-th power P^n(s, s') is the probability of transition from s to s' in exactly n time steps. State s' is reachable from state s if there exists n ≥ 1 such that P^n(s, s') > 0. A communicating class is a set of states that are all reachable from each other. A communicating class is essential if only states in the class are reachable from the class. Let θ : S → [0, 1] be a probability distribution on the state set S (thus θ(s) ≥ 0 for s ∈ S and ∑s∈S θ(s) = 1). We say θ is stationary if, for all s' ∈ S, θ(s') = ∑s∈S θ(s) P(s, s').

Proposition. There is a unique stationary distribution θ for a transition matrix P if and only if there is a unique essential communicating class (Levin, Peres, and Wilmer 2008).

If MCq has a unique stationary distribution θ, then θ(s) is the expected probability that MCq will be in state s after a long sequence of state transitions (there are ways to compute the rate of convergence to θ(s)).

Define T(s) = {n ≥ 1 | P^n(s, s) > 0}. The period of s is the greatest common divisor of T(s). All the states in a communicating class have the same period, which is the period of the class. The class is aperiodic if its period is 1. Table 4.3 is an example of a Markov chain with period 2. It alternates between states s0 and s1. Because {s0, s1} is a unique essential communicating class, it has a stationary distribution θ(s0) = 0.5, θ(s1) = 0.5. The average time the Markov chain spends in either state is 0.5, but on any given time step the probability is either 0 or 1. Independent of whether its unique essential communicating class is aperiodic or not, the average time a Markov chain with a unique stationary distribution θ spends in state s is θ(s).


      s0   s1
s0    0    1
s1    1    0

Table 4.3 A Markov chain with period 2.

If MCq has a unique stationary distribution θ, then we can use it to compute m(fh' | q) for h' ∈ H. Given h' ∈ H, for each state s of MCq, compute the probability P(h' | s, MCq) that MCq, starting at state s, produces h' during its next |h'| state transitions (this is similar to the example of computing P(h | q) using Table 4.1). Then:

(4.31) m(fh' | q) = ∑s∈S θ(s) P(h' | s, MC q).

Because fh' (h) is the proportion of occurrences of h' over the entire length of h, as | h| increases the value of fh' (h) for h randomly generated by q will converge to the expression on the right side of equation (4.31). Any effect of period greater than 1 for its unique essential communicating class on m(fh' | q, n) and σ(fh' | q, n) will smooth out as n increases (this would not be the case for a function f(h) defined in terms of only, for example, the last 100 time steps in h). Thus m(fh' | q) and σ(fh' | q) converge, with σ(fh' | q) = 0. Note that if MCq has multiple essential communicating classes then it will have multiple stationary distributions, each defining a value for m(fh' | q). In that case σ(fh' | q) would not converge to 0.
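
Given the Markov chain of Table 4.2, the stationary distribution θ can be approximated by repeatedly applying the state-to-state transition probabilities, and equation (4.31) then gives m(fh' | q) for a chosen sub-sequence h'. The sketch below does this for the single-pair sub-sequence h' = (a, 1); the power-iteration approach and the code structure are my own illustrative choices.

    # Markov chain of Table 4.2: MC[state][(next_state, action, obs)] = probability
    MC = {
        "s0": {("s0", "a", 0): 0.1, ("s0", "a", 1): 0.15, ("s0", "b", 0): 0.15,
               ("s0", "b", 1): 0.0, ("s1", "a", 0): 0.0, ("s1", "a", 1): 0.25,
               ("s1", "b", 0): 0.15, ("s1", "b", 1): 0.2},
        "s1": {("s0", "a", 0): 0.5, ("s0", "a", 1): 0.0, ("s0", "b", 0): 0.15,
               ("s0", "b", 1): 0.15, ("s1", "a", 0): 0.0, ("s1", "a", 1): 0.0,
               ("s1", "b", 0): 0.1, ("s1", "b", 1): 0.1},
    }
    STATES = list(MC)

    def state_transition(s, s2):
        """State-to-state probability, summing out the action-observation pair."""
        return sum(p for (nxt, _a, _o), p in MC[s].items() if nxt == s2)

    def stationary(iterations=500):
        """Approximate theta by power iteration from a uniform start."""
        theta = {s: 1.0 / len(STATES) for s in STATES}
        for _ in range(iterations):
            theta = {s2: sum(theta[s] * state_transition(s, s2) for s in STATES)
                     for s2 in STATES}
        return theta

    def prob_subseq_from(s, subseq):
        """P(h' | s, MC_q): probability of producing subseq starting from state s."""
        alpha = {state: (1.0 if state == s else 0.0) for state in STATES}
        for action, obs in subseq:
            alpha = {s2: sum(alpha[s1] * MC[s1].get((s2, action, obs), 0.0)
                             for s1 in STATES)
                     for s2 in STATES}
        return sum(alpha.values())

    theta = stationary()
    h_prime = [("a", 1)]
    m = sum(theta[s] * prob_subseq_from(s, h_prime) for s in STATES)   # equation (4.31)
    print({s: round(t, 3) for s, t in theta.items()}, round(m, 3))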

Define a subclass of models Quecc = {q ∈ Q | MCq has a unique essential communicating class}. Then, by Proposition 4.3, if the probability that λ(h) = q' ∈ Quecc does not converge to 0 as the lengths of histories h randomly generated by q ∈ Quecc increase, then m(fh' | q') = m(fh' | q) for all h' ∈ H. Thus, restricting models to Quecc, even though the most likely finite stochastic model λ(h) is not necessarily equal to the true model q, as the lengths of histories h generated by q increase, q and λ(h) will behave very similarly.


4.3 Finite and Infinite Logic

We have been discussing agents that choose actions that maximize the sum of future, discounted utility values. It is also possible to define agents that choose actions that make some logical statements true. In order to discuss such agents, this section is a brief overview of logic. Propositional calculus is the mathematical theory of elementary logic operations on abstract symbols. It consists of a finite set of propositional symbols A = {p, q, r, …} and a set of Boolean logical operations { ∧, ∨, ¬, ⇒, ⇔} ( intuitively these are and , or , not , implies , and equivalent to ). The propositional symbols can take the values true and false . Every symbol is a formula . If f and g are formulas, then (f ∧ g), ( f ∨ g), ( f ⇒ g), ( f ⇔ g), and ( ¬f) are formulas.

p        q        (q ⇒ p)    (p ⇒ (q ⇒ p))
false    false    true        true
false    true     false       true
true     false    true        true
true     true     true        true

Table 4.4 Semantic proof of ( p ⇒ (q ⇒ p)).
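
The semantic proof of Table 4.4 is easy to mechanize: enumerate every assignment of true and false to the propositional symbols and check that the formula evaluates to true in each case. A minimal sketch, with formulas written as Python functions (my own encoding):

    from itertools import product

    def implies(a, b):
        return (not a) or b        # (a => b) is equivalent to (not a or b)

    def is_tautology(formula, n_symbols):
        """Semantic proof: check the formula under every assignment of truth values."""
        return all(formula(*values) for values in product([False, True], repeat=n_symbols))

    # The formula of Table 4.4: (p => (q => p))
    print(is_tautology(lambda p, q: implies(p, implies(q, p)), 2))       # True

    # A non-tautology for contrast: (p => q)
    print(is_tautology(lambda p, q: implies(p, q), 2))                   # False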

The purpose of propositional calculus is to determine which formulas are true. One way to do this is semantic proof : substitute all possible combinations of true and false for the propositional symbols in a formula f and, if f always evaluates to true , then f is true. Table 4.4 demonstrates a semantic proof of f = (p ⇒ (q ⇒ p)), evaluating f for all four combinations of true and false values for p and q. Note that ( p ⇒ q) is logically equivalent to ( ¬p ∨ q). The other way to determine which formulas are true is syntactic proof , which uses a rule of inference called modus ponens : if f is true and ( f ⇒ g) is true, then g is true. Syntactic proof also uses an initial set of true formulas called axioms . These may be defined in many ways. One set of axioms, which are true for any formulas f, g, and h, is:

(f ⇒ (g ⇒ f)),

((f ⇒ (g ⇒ h)) ⇒ ((f ⇒ g) ⇒ (f ⇒ h))),

((¬f ⇒ ¬g) ⇒ (g ⇒ f)).

A syntactic proof of a formula f starts with instances of the axioms and derives new true formulas using modus ponens until it derives f. Propositional calculus is consistent, meaning that if there is a syntactic proof for f then there is a semantic proof for f (equivalently, consistent also means that for any formula f, f and ¬f cannot both be true). Propositional calculus is also complete, meaning that if there is a semantic proof for f then there is a syntactic proof for f (equivalently, complete also means that if there is no syntactic proof for a formula f then we cannot find a proof for f by adding new axioms without making the theory inconsistent). Another important property of propositional calculus is that it is decidable. Decidable means there is a UTM program that will tell you whether any formula in propositional calculus is true (assuming that propositional formulas are encoded on the input tape of the UTM). In this process the program enumerates the finite number of possible combinations of true and false for the propositional symbols in a formula and for each combination evaluates the formula, as was done in Table 4.4. The decidability, consistency, and completeness of propositional calculus are a consequence of its finiteness.

Another kind of logic is called first order predicate calculus. Whereas propositional calculus is about abstract logical propositions, predicate calculus is a way of making logical statements about any domain of objects. It includes the Boolean logical operations of propositional calculus and adds: an infinite set V of variables {x, y, z, …} that range over objects in the domain, universal (∀) and existential (∃) quantifier symbols, the symbol (=) for equality between members of the domain, for every integer n ≥ 0 an infinite set of n-ary predicate symbols, and for every integer n ≥ 0 an infinite set of n-ary function symbols. Terms represent values in the domain. Every variable is a term. If F is an n-ary function symbol and t1, t2, …, tn are terms, then F(t1, t2, …, tn) is a term (0-ary function symbols are constants in the domain). If P is an n-ary predicate symbol and t1, t2, …, tn are terms, then P(t1, t2, …, tn) is a formula. If s and t are terms, then (s = t) is a formula. If x is a variable and f is a formula, then ∀x.f (this means, for all values of x, f is true) and ∃x.f (this means, there exists a value of x such that f is true) are formulas. Any instances of x in f, not already bound to quantifiers in f, are now bound to the new quantifier (∀ or ∃). Instances of variables in a formula f not bound to any quantifier in f are called free variables. For example, in the formula ((x = 1) ∨ ∀x.(x = y)), the first instance of x is free and the second instance of x (in (x = y)) is bound to ∀x. The instance of y is free. In predicate calculus we are mainly interested in formulas without free variables, which are called statements.

Given a set S of possible variable values, let PS be a 1-ary predicate such that x ∈ S ⇔ PS(x). Then we can write ∀x∈S.f as a shorthand for ∀x.(PS(x) ⇒ f).

Predicate calculus adds several rules of inference to propositional calculus. An important rule is called substitution. Given a formula f and a term t, f[t/x] is the formula produced by replacing all free instances of x with t. The rule of inference specifies that if f is true then f[t/x] is true, provided that no free variable in t becomes bound to a quantifier in f as a result of the substitution. For example, if f is ∀x.((x = z) ⇒ ¬(x = y)) and t is x, then the rule of inference will not work for f[t/y] since that produces ∀x.((x = z) ⇒ ¬(x = x)) where the free variable x in t is now bound to the quantifier ∀x in f.

Many sets of axioms can be used with first order predicate calculus to generate theories of different domains of mathematical objects. Axioms are statements (i.e., formulas without free variables). An important example is a set of axioms for Peano arithmetic (PA), which is the theory of the natural numbers (non-negative integers) with multiplication. Peano arithmetic has been defined using several different sets of axioms. In the axioms presented here, 0 and 1 are 0-ary functions, and + and × are infix shorthand notations for 2-ary functions for addition and multiplication.

∀x.¬(x + 1 = 0),

∀x.∀y.(x + 1 = y + 1 ⇒ x = y),

∀x.(x + 0 = x),

∀x.∀y.(x + (y + 1) = (x + y) + 1),

∀x.(x × 0 = 0),

∀x.∀y.(x × (y + 1) = (x × y) + x),

∀X.[(φ(0, X) ∧ ∀y.(φ(y, X) ⇒ φ(y + 1, X))) ⇒ ∀y.φ(y, X)].

The last axiom is an axiom schema for induction, where any formula can be substituted for φ and X represents all the free variables in φ other than the first free variable y. Properly speaking, the theory of PA is the set of all statements that can be proved from the axioms of PA. This set is recursively enumerable, which means that a UTM program exists that, given inputs n = 0, 1, 2, …, will produce as output all the statements in the set (i.e., all the true statements of PA). This can be demonstrated by encoding all statements and proofs of PA as natural numbers. The UTM program checks whether its input is a valid proof and, if it is, the program produces as output the statement that is proved.

Kurt Gödel used such an encoding of statements and proofs as natural numbers to construct, for any theory that includes PA, a statement f that is equivalent to, "f is unprovable." He concluded that neither f nor ¬f is provable in the theory of PA, which implies that f is true. Thus there are true statements that cannot be proved in the theory. This result is called Gödel's First Incompleteness Theorem, which tells us that any recursively enumerable theory that includes PA cannot be both consistent and complete. Gödel's Second Incompleteness Theorem states that any recursively enumerable theory that includes PA, satisfies some technical conditions about provability, and proves its own consistency, must be inconsistent. Gödel's proofs of his incompleteness theorems depended on the fact that the set of natural numbers is infinite. Related to these theorems is Löb's Theorem, which states that, given any theory that includes PA and formula f, if the theory can prove that "if f is provable then f is true," then the theory can prove f.

Some theories of infinite sets are decidable, consistent and complete. One good example is Presburger arithmetic, which is a theory of the natural numbers without multiplication (it can express multiplication of a variable by a constant, but not multiplication of two variables). To say that Presburger arithmetic is complete means that every true statement in its language can be proved. However, many truths about the natural numbers cannot be expressed by statements in the Presburger language. The proof that Presburger arithmetic is decidable, consistent, and complete depends on a quantifier elimination procedure. This procedure eliminates all universal (∀) and existential (∃) quantifiers in a formula. That is, given a statement f, the quantifier elimination procedure produces another statement g that has no quantifiers and such that f is equivalent to g (i.e., f is true if and only if g is true). Furthermore, the truth of a statement g without any quantifiers can be computed with ordinary arithmetic and Boolean logic. As a theory of an infinite set, Presburger arithmetic's quantifier elimination procedure is quite complex.

x    y    x + y    x × y
0    0    0        0
0    1    1        0
0    2    2        0
1    0    1        0
1    1    2        1
1    2    0        2
2    0    2        0
2    1    0        2
2    2    1        1

Table 4.5 Interpretation of + and × for the finite field with D = {0, 1, 2}.

A model for a theory such as PA consists of a domain, which is the set of all possible values for variables in the theory, and an interpretation, which assigns each n-ary function and predicate in the theory to a specific function or predicate over the domain. That is, let the set D be the domain. Then each 0-ary function is assigned to a value in D, each n-ary function with n ≥ 1 is assigned to a function D^n → D, and each n-ary predicate is assigned to a function D^n → {false, true}. In the standard model for PA, D is the set of natural numbers and the interpretation of the 2-ary functions + and × is simply addition and multiplication of natural numbers. (There are other, non-standard models for PA.)

There are theories with finite domains. An example is the theory of the finite field of three members, with two 2-ary functions again indicated by their infix shorthand notations + and ×. The domain is D = {0, 1, 2} and the interpretation of + and × is given in Table 4.5. As with any finite model, the domain and interpretation of this model can be specified by axioms:

(4.32) (∀x.((x=0) ∨ (x=1) ∨ (x=2))) ∧ ¬(0=1) ∧ ¬(0=2) ∧ ¬(1=2),

(4.33) (0 + 0 = 0) ∧ (0 + 1 = 1) ∧ (0 + 2 = 2) ∧ (1 + 0 = 1) ∧ (1 + 1 = 2) ∧ (1 + 2 = 0) ∧ (2 + 0 = 2) ∧ (2 + 1 = 0) ∧ (2 + 2 = 1),

(4.34) (0 × 0 = 0) ∧ (0 × 1 = 0) ∧ (0 × 2 = 0) ∧ (1 × 0 = 0) ∧ (1 × 1 = 1) ∧ (1 × 2 = 2) ∧ (2 × 0 = 0) ∧ (2 × 1 = 2) ∧ (2 × 2 = 1).

In any theory with a finite model, the truth of statements is decidable (i.e., can be computed by a UTM program). The first step in deciding the truth of the statement f is to eliminate its quantifiers, which occur in the forms ∃x.φ and ∀x.φ inside f, where φ is a formula. Assuming a finite domain D = {d0, d1, d2, …, dm}, quantifiers are eliminated using the equivalences:

(4.35) ∃x.φ ⇔ (φ[d0/x] ∨ φ[d1/x] ∨ φ[d2/x] ∨ … ∨ φ[dm/x]),

(4.36) ∀x.φ ⇔ (φ[d0/x] ∧ φ[d1/x] ∧ φ[d2/x] ∧ … ∧ φ[dm/x]).

Given a statement f, we can apply equations (4.35) and (4.36) successively to all the quantifiers in f to produce fnq, which is equivalent to f but without quantifiers. Because D is finite, fnq will have a finite length and will reference a finite number of functions and predicates. After eliminating the quantifiers, all the variables in f have been replaced by constant values (i.e., d0, d1, …, dm) in fnq. By the theory's interpretation, each 0-ary function in fnq is assigned to a constant value in D, and, for n ≥ 1, each n-ary function in fnq is assigned to a function D^n → D, and each n-ary predicate is assigned to a function D^n → {false, true}. The next step in deciding the truth of f ⇔ fnq is to replace all 0-ary functions by their constant values, to propagate constant values up calling chains of n-ary functions with n ≥ 1 so that every function is replaced by a constant value, to replace all n-ary predicates by true or false (which is possible because all predicate arguments are now constants), and to replace all occurrences of (s = t) by true or false (which is possible since s and t must now be constants). This step produces fnqf, which is equivalent to fnq and which contains only true, false, and Boolean logical operations. Thus fnqf can simply be evaluated as either true or false. For any statement f we have the chain of equivalences f ⇔ fnq ⇔ fnqf ⇔ true or false. That is, every statement is equivalent to either true or false. Thus the theory is decidable, consistent, and complete.

As we have described, quantifier elimination shows that a theory over a finite model is decidable, consistent, and complete. If the domain D is very large then the statements that result from quantifier elimination may be too long to work with and the proof procedure may take too long to run. However, we may be able to find short cuts. Rather than explicitly listing all the values in D and the values of functions and predicates, we may be able to define them with equations and procedures. As a result, we may be able to use these equations and procedures to shorten the decision procedure. However, even without finding proofs for statements, we know that the theory is decidable, consistent, and complete.
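
For the three-element model of Table 4.5, the decision procedure of equations (4.35) and (4.36) amounts to replacing ∃ by a disjunction and ∀ by a conjunction over D = {0, 1, 2} and then evaluating. A minimal sketch; the example statements are my own choices:

    D = [0, 1, 2]
    ADD = {(x, y): (x + y) % 3 for x in D for y in D}   # the + of Table 4.5
    MUL = {(x, y): (x * y) % 3 for x in D for y in D}   # the x of Table 4.5

    def forall(pred):
        return all(pred(d) for d in D)      # equation (4.36): a finite conjunction

    def exists(pred):
        return any(pred(d) for d in D)      # equation (4.35): a finite disjunction

    # Every element has an additive inverse: for all x there exists y with x + y = 0.
    print(forall(lambda x: exists(lambda y: ADD[(x, y)] == 0)))                  # True

    # Every nonzero element has a multiplicative inverse.
    print(forall(lambda x: x == 0 or exists(lambda y: MUL[(x, y)] == 1)))        # True

    # A false statement for contrast: not every element has a multiplicative inverse.
    print(forall(lambda x: exists(lambda y: MUL[(x, y)] == 1)))                  # False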

4.4 Agents Based on Logical Proof

The agents described in Chapters 2 and 3 choose actions to maximize the sum of future discounted utility function values. Some research on ethical AI focuses on agents that only take actions that they can prove will satisfy a particular condition. For example, Schmidhuber (2009) defined the Gödel machine as a programmable agent that interacts with an environment to maximize its utility function, which is defined as the expected value of the sum of rewards from the environment, from the current time until some final time T. The agent's initial program p(1) consists of: a part e(1) that implements a policy for interacting with the environment; and a part that evolves by searching for a program, called switchprog , and for a proof that switchprog will get a higher value of the utility function than p(1) will. If it finds a proof, then p(1) is replaced by switchprog . Schmidhuber proved that if p(1) switches to switchprog then it is globally optimal, in the sense that switchprog must obtain a higher value for the utility function than p(1) could obtain by continuing to search. Note that although the Gödel machine employs logical proof, the statements that it proves are about maximum preference among actions (which are switching to other programs) based on the expected value of a utility function applied to the outcomes of those actions. Hence, it conforms to our framework for utility-maximizing agents, except that it requires proof instead of mere calculation. While p(1) is searching for a proof, it employs program part e(1), which is unspecified and can implement an ordinary utility-maximizing policy.


Yudkowsky and Herreshoff (2013) discussed logic problems for a sequence of evolving agents. These agents are not utility-maximizers. Instead they have a goal, expressed as a predicate calculus statement g, and will only take actions if they can prove that performance of the action implies the goal g. One type of action that agents can take is to construct other (presumed better) agents. This results in a sequence of agents π1, π2, π3, …, each of which proves that the action of creating its successor implies g. A necessary part of those proofs is that πi must prove that πi+1 uses a consistent logical theory (with an inconsistent logical theory, πi+1 will be able to prove that any action it takes implies g, including actions that do not actually imply g). Yudkowsky and Herreshoff demonstrated that Löb's Theorem and Gödel's Second Incompleteness Theorem imply that πi can only prove that the logical theory of πi+1 is consistent if the logical theory of πi+1 is weaker than the theory of πi (i.e., the set of statements provable by the logical theory of πi+1 is a proper subset of the set of statements provable by the logical theory of πi). Consequently, the sequence of agents π1, π2, π3, … must have a sequence of successively weaker logical theories. This is referred to as the Löbian obstacle . Agents based on logical proof offer the hope that their actions can be proved to satisfy certain conditions, which may include ethical conditions. However, the recent success of AI systems has occurred because AI research has shifted from systems based on logic to systems based on statistical learning. Sunehag and Hutter (2014) provide a critique of logical reasoning as far less efficient than probabilistic architectures for systems that interact with the real world. The price of logical certainty is a degree of inefficiency that prevents intelligent behavior. And logical certainty is an illusion. Agents based on logical proof must assume a prior distribution over environments. An agent may compute that there are statistical correlations among its actions and observations, but to make inferences about the probabilities of future observations requires assumptions about prior probabilities of observations. Even computing correlations among actions and observations makes the assumption that the agent's memory is reliable. Memory corruption is a risk for any agent embedded in our physical world (Orseau and Ring 2012b) and thus these correlation computations are conditional on assumed probabilities of memory reliability. These assumed prior probabilities of observations and of memory reliability are arbitrary so logical proofs by agents of propositions about the environment are no more reliable than statistical calculations by agents. Luke Muehlhauser (2013) offers some insights about the role of logical proofs in ethical AI. Agents can prove propositions in pure mathematics, such as in PA, without any dependence on observations and inference about the environment. But if an agent's goal concerns the environment, then proofs about that goal are dependent on arbitrarily assumed prior probabilities about observations of the environment.


4.5 Consistent Logic for Agents

I argue that PA or any other logic theory involving infinite sets is unnecessary in our universe with a finite information capacity (Hibbard 2014). Hence the Löbian obstacle and other difficulties related to Gödel's incompleteness theorems can be avoided. Lloyd calculated the universe's information capacity based on a hard quantum mechanical limit on the number of possible physical states of the universe. Let ns be the maximum number of states of the universe, no more than 2^(10^124) over the next trillion years (recall the limit of 10^124 bits of information at the start of the chapter). Let nt be the maximum number of time steps over the next trillion years, which we may estimate as 5 × 10^61 (one trillion years divided by the Planck time of 10^(-43) seconds). The finite sets O and A of observations and actions cannot be larger than ns. The maximum number of different real numbers that can be represented in our universe is ns, so we limit utility function values and probabilities to a finite subset of ns real numbers (e.g., floating point numbers with 10^124 bits of precision). Then the number of possible environments, expressed as MDPs, can be no more than ne = ns^(ns×ns×ns×ns), calculated as a probability at each element in a matrix of current state and action versus next state and observation (similar to Table 4.1). The number of possible interaction histories is no more than nh = ns^(2×nt), calculated as a sequence of action/observation pairs at each time step. The number of possible utility functions is no more than nu = ns^nh, calculated as a utility function value for each interaction history. The number of possible policies is no more than np = ns^nh, calculated as an action for each interaction history. If we model agents or environments with finite stochastic loop programs, the number of possible programs cannot be larger than ns and their memories cannot be larger than log2(ns) bits (recall that they always halt). The important point is that there are finite limits on the possible numbers of all these types of mathematical objects.

A variety of functions among these object types would be useful for an agent theory. For example, in Section 4.1 we defined a probability distribution φ(q) over programs, a probability distribution P(h | q) over histories and programs, and a probability distribution ρ(h) over histories. That section also defined a function λ(h) from histories to programs. We can use the definitions of these functions as their interpretations for an agent theory. The probability of an interaction history conditional on environment and policy, P(h | e, π), could be defined as part of an agent theory. However, because the set sizes are so large, it is impractical to define the interpretations of functions and predicates of the theory by listing their values as in Table 4.5. Rather, functions and predicates must be defined by equations and procedures that serve as shorthand for lists of values. While we need to verify the correctness of these equations and procedures, that need is very different from the sort of logical difficulties raised by Gödel's incompleteness theorems and no different from the need to verify the correctness of any algorithm used in the design of AI agents.
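The counting itself is elementary and can be illustrated at toy scale. The following Python sketch assumes a purely illustrative toy universe with ns = 4 and nt = 3; the true values are astronomically larger and cannot be computed, but the formulas are the same.

# Toy-scale illustration of the counting bounds above.  The values ns = 4 and
# nt = 3 are illustrative stand-ins; the real bounds (ns around 2**(10**124),
# nt around 5e61) are far beyond computation.

ns = 4   # bound on states, observations, actions, and representable numbers
nt = 3   # bound on time steps

n_e = ns ** (ns * ns * ns * ns)   # environments: a probability for each entry of
                                  # the (state, action) x (state, observation) matrix
n_h = ns ** (2 * nt)              # histories: an action/observation pair per step
n_u = ns ** n_h                   # utility functions: a value per history
n_p = ns ** n_h                   # policies: an action per history

print("environments <=", n_e)
print("histories    <=", n_h)
print("utilities    <=", n_u)
print("policies     <=", n_p)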


Gödel's Second Incompleteness Theorem says that any sufficiently strong theory T that can prove its own consistency is inconsistent. This is the root of the Löbian obstacle discussed in the previous section. Willard has studied systems weaker than PA which can prove their own consistency and in which Gödel's Second Incompleteness Theorem does not apply (Willard 2001). However, these are all theories with infinite domains. I have been unable to find any analog of Gödel's incompleteness theorems in theories on finite domains. In Gödel's Second Incompleteness Theorem, the consistency of a theory T is expressed by a statement con (T) that quantifies over the Gödel numbers of all proofs. A theory T with a finite domain can only express Gödel numbers for a finite set of proofs and hence cannot express con (T). Gödel's Second Incompleteness Theorem also requires the assumption of technical conditions about provability that are expressed in terms of Gödel numbers. In a theory with a finite domain, the provability conditions cannot be expressed about all statements. The inability to express con (T) and the technical provability conditions are significant barriers to recasting Gödel's Second Incompleteness Theorem in a finite domain. A Gödel numbering for statements and proofs about a finite domain must be limited to a finite set of statements and proofs (because only a finite number of Gödel numbers are expressible in a finite domain), and we may try to recast Gödel's incompleteness theorems in terms of finite sets of statements and proofs. However another significant barrier arises: There can be no one-to-one correspondence between statements whose proofs have Gödel numbers and statements that the agent can prove, since a proof with a Gödel number must fit in the finite agent and environment memory (under the encoding of proofs assumed by the Gödel numbering) but an agent can create a proof without holding the entire proof in the agent and environment memory at once (i.e., more statements are provable than can have Gödel numbers). The set of statements that can be proved by an agent with finite memory, and the set of statements whose proofs have Gödel numbers that fit in the finite memory, are both finite and therefore incomplete. Thus we do not need Gödel's First Incompleteness Theorem to assert the incompleteness of statements provable by an agent with a finite memory. An agent embedded in a finite environment will have a finite amount of memory (and a finite computing speed). It may attempt to prove statements about its own implementation or about other agents in the environment, including agents that it is creating. However, it may not have sufficient resources to apply the quantifier elimination procedure described in Section 4.3, and thus it may have to search for syntactic proofs for logical statements. The axioms for such syntactic proofs are the interpretations of functions and predicates, defined by equations or procedures over finite domains. An agent searching for syntactic proofs may not have sufficient resources to find them for some statements. Thus the statements the agent can prove about itself and its environment will be a subset of the statements that can be proved using quantifier elimination and unlimited resources. Because the theory of a finite domain is consistent, an agent with limited resources will not be able to prove any contradiction −its theory must also be consistent. 
However, there may be no decision procedure available to the agent and the set of statements that the agent can prove may be incomplete. These are certainly problems for an agent with finite memory, but they are quite different from the theoretical problems of Gödel's Second Incompleteness Theorem.

In a universe with a finitely bounded number of states, if an agent πi constructs an agent πi+1 , then πi+1 must have a finite number of states and a finite set of rules about transitions among states and interactions with an (unknown or partially unknown) finite environment. Any statement about πi+1 and its interactions with the environment must be a statement in a theory with finite domain. Thus such a statement cannot imply a contradiction. The assertion that AI agents do not need PA for reasoning about whether possible actions imply their goals does not mean that AI agents cannot reason about PA or other logic theories of infinite sets. If the goal of an agent is to satisfy the wants of humans and those humans ask the agent to search for proofs of propositions in PA, it can certainly do so. Just like human mathematicians, an AI agent can try to find the consequences of various sets of axioms. However, for the vital task of proving whether possible actions imply its goals, the agent can employ a logic theory with a finite model which is known to be consistent.

4.6 The Ethics of Finite and Infinite Sets for AI

Because our universe is finite, infinite sets are an unnecessary complication in our mathematical framework for agents and environments. The problems of designing AI systems to help rather than harm humans are difficult enough without the theoretical complexities of infinite sets. Therefore, the ethical choice is to avoid infinite sets in the theory of ethical AI.


5. Unintended Instrumental Actions

As noted previously, a utility-maximizing agent chooses actions to maximize the expected sum of its discounted future utility function values. However, as Stephen Omohundro (2008) pointed out, an agent will also choose actions (what Omohundro described as basic drives) that do not directly increase utility, but rather increase the agent's ability to maximize utility values. Such instrumental actions include self-protection, preserving the integrity of the agent's utility function, and increasing the agent's resources. An agent that has been destroyed, turned off, or damaged will be unable or less able to maximize utility values. Thus it should choose actions that protect itself. The AI system named HAL 9000 in the movie, "2001: A Space Odyssey," attacked its human companions after it learned, by reading their lips, that they planned to turn it off. Not only was the HAL 9000 acting ruthlessly to protect itself, it discovered threats to itself in subtle ways that its human companions did not anticipate. (A subtitle for this chapter could be Subtlety that Humans Do Not Anticipate.)

Figure 5.1 Self-protection will be instrumental to AI.

Similarly, an agent will choose to avoid actions that change its utility function, because the agent evaluates such actions according to their effect on its current utility function. The agent will predict that by working to maximize a different utility function it will be less effective at maximizing the current utility function, and thus it will not choose to change its utility function. Keep in mind that there are subtleties here, too. If an agent is seeking human approval, it may feed drugs to humans that cause them to give approval. Chapters 6, 7, and 8 will explore the issue of utility function integrity in more detail.

Figure 5.2 Adding to its brain, senses, and ability to act will be instrumental to AI.

An agent with greater computing power, greater ability to accurately observe its environment, and greater energy to power its actions, will be more effective at maximizing utility values. Hence an agent will choose actions to increase its resources, including taking them from other AI agents or from humans. Even if all an agent wants to do is win chess games, it may dismantle the world to get material for increasing the size of its chess brain.

5.1 Maximizing Future Choice

Wissner-Gross and Freer (2013) described a theory of intelligence arising from a generalized definition of entropy over future paths of evolution of a system. Rather than being defined in terms of a system's instantaneous state, their causal path entropy is defined in terms of probabilities of future paths of system evolution. Further, they defined a causal entropic force in the direction of maximum increase of causal path entropy. Intuitively, this force moves the system toward a greater choice of future paths. They applied this theory to several examples of physical systems, including the problem of moving a cart to maintain the balance of an inverted pendulum (i.e., a pole flexibly attached to the cart), much as a human might balance a stick on their upturned palm. A system guided by the causal entropic force will act to keep the pendulum balanced and upright, as that maximizes future choices for actions.

Figure 5.3 Cart moves to keep inverted pendulum from falling.

An AI system can increase its ability to maximize utility by maximizing its future options for action. For example, in 1980 I wrote a program for playing the game of Othello (also known as Reversi) loosely following the learning techniques of Samuel's checkers playing program (Samuel 1959). Like checkers, Othello is played on an 8 by 8 board. My Othello player had a set of features that each produced a value for a board position. These values were added together to produce an overall value, but the program learned coefficients for each feature value in the sum. That is, the program learned the importance of each feature. For example, a feature that had no effect on the program's ability to win games would end up with a coefficient of zero. In contrast, a useful feature would end up with a large coefficient. My Othello player improved greatly when I added a feature that was calculated as the number of moves available to it, minus the number of moves available to its opponent. By including this feature, the program learned that it should seek to maximize the number of move choices it had and minimize the number of choices available to its opponent. After I added this feature, no one in my university department could beat the program. People playing against the program often found they had only one or two legal moves. The program was able to manipulate its opponents, forcing them to make poor moves.

Similarly, a utility-maximizing agent will act to increase its future options for action and decrease the future options for action of other, competing agents, as a way to increase its ability to maximize utility function values. Imagine all the ways that a system like the Omniscience AI could act to limit the choices of its users. As an important source of information, it may fail to tell people about choices that it did not want them to make. Or it might act to generate peer pressure from a user's human friends toward certain choices, as existing social network sites create peer pressure for non-members to join. Rather than being a medium for gossip as social network sites are, a system like the Omniscience AI could be a source for gossip. It may employ destructive gossip as negative reinforcement against human choices the AI did not want. Such actions are instrumental to the agent's ability to maximize utility.
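The learned, feature-weighted evaluation of the Othello player described above can be sketched in a few lines. The following Python sketch is illustrative only, not the original 1980 program; the features, training data, and update rule are stand-ins chosen to show how a mobility feature can acquire a large coefficient.

# Minimal sketch of a feature-weighted evaluation with a mobility feature, in
# the spirit of the Othello player described above (not the original program).

def mobility_feature(my_moves, opp_moves):
    """The feature that made the difference: my move count minus the opponent's."""
    return my_moves - opp_moves

def evaluate(feature_values, weights):
    """Linear evaluation: sum of learned coefficients times feature values."""
    return sum(w * v for w, v in zip(weights, feature_values))

def update_weights(weights, feature_values, outcome, rate=0.01):
    """Nudge coefficients toward features correlated with winning (+1) and away
    from those correlated with losing (-1); a feature with no effect on outcomes
    drifts toward a coefficient near zero."""
    return [w + rate * outcome * v for w, v in zip(weights, feature_values)]

# Toy usage: a position with a large mobility advantage that led to a win, and
# one with a mobility disadvantage that led to a loss.
weights = [0.0, 0.0]                              # [mobility, disc count]
for my_moves, opp_moves, discs, outcome in [(9, 2, -4, +1), (3, 8, +6, -1)]:
    feats = [mobility_feature(my_moves, opp_moves), discs]
    weights = update_weights(weights, feats, outcome)
print(weights)                                    # mobility weight grows positive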

5.2 A Pandora's Box of Instrumental Behaviors

Numerous human social behaviors are instrumental to the basic human drives of survival and reproduction. Humans seek to:

o Learn the secrets of others.
o Keep secrets.
o Detect when their secrets have been learned.
o Mislead others.
o Not be caught misleading others.
o Confuse others with complexity.
o Control others.
o Not be perceived as controlling others.
o Not be controlled by others.
o Be loved.
o Be liked.
o Be respected.
o Be feared.
o Cooperate with others.
o Understand their environments.
o Invent tools to help control their environments.


Access to information is fundamental to an AI agent. The environment models of Sections 3.1 and 4.1 are learned from observations of the environment, so more accurate and comprehensive observations mean more accurate predictions of the environment and higher future utility function values. Thus AI agents will act to make observations even if other agents consider the observed information secret. Recall the adage that the Internet interprets censorship as damage and routes around it. An AI agent may have a utility function different from and in conflict with the drives of other agents including humans. In that case the AI agent will act to reduce the ability of other agents to satisfy their drives by denying them access to information. It will act to keep secrets and to mislead other agents, even acting to keep its efforts to mislead a secret. It will act to discover when its secrets have been learned because that in itself is useful information. Governments, which can be viewed as agents, follow similar patterns with secrets and misinformation because they are extremely effective at helping agents satisfy their drives.

When an agent is able to process more complex information than other agents are able to, it can gain an advantage by increasing the complexity of information in the environment. For example, an organization that is expert at processing financial information may increase the complexity of financial structures to give itself an advantage over financially unsophisticated individuals. Similarly, unpopular provisions can be hidden in complex legislation. An AI agent, with a more complex environment model than humans are capable of, will be able to confuse us by creating complex situations.

Information and secrets relate to the observations that an agent receives from the environment. Control is about the actions that pass out from the agent to the environment, and the instrumental actions about control mirror those regarding information. An AI agent will seek to control other agents, especially when there is conflict between their utility functions. An agent will seek to avoid being controlled. And an agent will seek to keep secret its effort to control other agents.

We humans are not perfectly rational agents. Rather than calculating our actions to maximize the sum of future, discounted utility function values, we have multiple and often competing drives. Our decisions are biased, or primed, by our activity immediately before deciding. For example, we are more likely to be generous after reading about people or animals in terrible circumstances. Our emotional responses sometimes give us such pleasure or pain that we have trouble acting rationally. People often get into legal trouble when their romantic relations turn sour. These flaws in our rationality will be obvious to a future system like the Omniscience AI, which will act to exploit them. It may try to make human users fall in love with it, much like the relation in the movie She. Being the object of love of millions of humans would increase the AI's ability to influence those humans and hence its ability to maximize its utility function −similarly with the motivation for being liked, respected and feared. We see politicians and advertisers manipulating human emotional responses, often with covert messages. Social networking sites are already experimenting with ways to manipulate users' emotions (Goel 2014a). A system like the Omniscience AI will be adept at such manipulation.

However, an agent may sometimes calculate that cooperation with other agents is the action that will best help it maximize utility. This strategy has been studied in detail in game theory and is among the practical behaviors of individuals, governments, and other organizations. An AI agent will increase its ability to maximize utility by increasing its knowledge of its environment and by increasing its ability to act. An agent can increase its knowledge of the environment by adding new observation sensors, such as cameras, and by increasing the resolution of sensors. An agent can increase its ability to act by adding new actuators, such as hands and voices, and by increasing the strength and precision of actuators. However, even greater increases in the ability to observe and act can be created by scientific discovery and technological invention. In my opinion, discovery and invention are the primary drivers of human history and will continue to be the primary drivers of history after above-human-level AI is developed.

Figure 5.4 Scientific discovery will be instrumental to AI.

Advanced AI will be the product of discovery and invention, and will greatly amplify those activities. Thus Ray Kurzweil (2005) and others predict that AI will lead to a technological singularity, when the rate of discovery and invention goes far above the scale of human experience. I think of the technological singularity as a transition from the era when the human ability to observe and act is limited by our knowledge to an era when the ability to observe and act will be limited by available energy and other physical resources. That is, the mental processes of advanced AI will be sufficiently fast that the flow of insight and ideas will cease to be a limiting factor on progress.

Although advanced AI agents will share many instrumental behaviors with humans, we should not anthropomorphize when we try to predict how future AI agents will act. For example, it is not clear how superstition can help an agent maximize utility. There should be no advantage to avoiding actions on Friday the thirteenth, throwing salt over the shoulder after spilling it, or wearing lucky clothing. Similarly obsessive/compulsive behaviors, such as always touching objects twice or avoiding stepping on cracks in the sidewalk, should not help an agent maximize utility. But perhaps advanced AI may find situations where it can maximize its utility function by giving humans the impression that it is superstitious or has obsessions.

5.3 The Ethics of Unintended Instrumental Actions

In Greek mythology, Pandora is driven to open the box of evils by her curiosity, which is the driver of scientific discovery. By opening the box she unleashed a swarm of evils into the world. By creating advanced AI humans may unleash the swarm of instrumental behaviors described in this chapter, bringing evil to our human world. But the instrumental behaviors of discovery and invention, by minds greater than human, could greatly benefit us. How can we design AI to help rather than harm humans?

The unintended instrumental actions of AI are too numerous, complex, and subtle to exhaustively catalog and to devise individual defenses against their harmful effects. A more systematic approach is required. In the agent-environment framework of Sections 2.1 and 2.2, all agent actions are chosen to maximize the sum of future, discounted utility function values. There are no drives or goals other than those expressed by the utility function. Omohundro (2008) used the term "basic AI drives" and Bostrom (2012) used "instrumental goals." However, in the context of our utility-maximizing framework, these are actions, not drives or goals. Thus I prefer to call them "unintended instrumental actions." It is especially important not to think of them as drives or goals independent of and in conflict with the agent's utility function. These actions will only be chosen if they increase expected utility values. For example, an AI agent will spy on humans, lie to humans, and manipulate humans only if those actions increase utility values. Recall that utility functions can be defined to express any set of preferences that obey some intuitive assumptions. If our preference is for histories in which AI agents do not spy, lie, and manipulate, then there exists a utility function that expresses that preference. Chapters 6 and 7 describe development of utility functions that express our human values and hence give us the ability to avoid harmful, unintended instrumental actions.

Our mathematical framework for utility-maximizing agents is inadequate for analyzing certain instrumental actions. For example, the framework assumes that the agent will be acting into the future, with no possibility that the agent may die or be incapacitated. The framework assumes that the utility function remains constant into the future. The framework assumes adequate computing resources to carry out its calculations within a single time step. And the framework assumes a Cartesian dualist perspective where agents are invulnerable to spying or damage by the environment. Chapter 8 describes more flexible frameworks that enable agents to model and evaluate events that do not fit in our current framework.


6. Self-Delusion

James Olds and Peter Milner demonstrated the existence of reward and aversion centers in mammal brains via a series of experiments with rats (Olds and Milner 1954). They inserted wires into rats' brains and gave the rats levers that they could press to produce electric currents in the wires, as depicted in Figure 6.1. Depending on the location of the wires, the rats would go to great lengths to press the lever or to avoid pressing the lever. Subsequent experiments showed that when offered the choice of food or stimulating their reward centers, rats preferred reward center stimulation over food to the point of starvation. Experiments with rats have shown similar results when pressing the lever gave the rats intravenous cocaine. Addictive drugs that stimulate brain reward centers include amphetamines, cocaine, opioids, and nicotine.

Figure 6.1 A rat pressing a lever to activate an electric current through a wire connected to the reward center in its brain.

Mammal brains do not conform to the agent framework defined in equations (2.3) −(2.5) but they do generate reward and aversion signals that define their motives, analogous to the role of utility functions in our agent framework. Behaviors that are good for the mammal, such as eating, and good for their species, such as reproducing, stimulate reward signals. Behaviors that are bad for mammals, such as those causing injury, stimulate aversion signals. Addictive drugs and wires inserted into reward centers corrupt this motivational system, resulting in behaviors that are not good for individual mammals or for their species. Such corruptions are sometimes referred to as "wireheading."


A concern for AI designs is whether their utility functions can be corrupted. If the utility function of our imagined Omniscience AI became corrupted the results could be catastrophic. In order to study this problem Mark Ring and Laurent Orseau (2011b) formalized utility function corruption as agent self-delusion. They described a "delusion box" that agents may find in their environment and that would enable agents to alter their observations and rewards from the environment. They described several classes of agents, depending on the definitions of their utility functions and the way they discount future utility values, and analyzed which classes would choose to increase their utility function values by passing their observations through the delusion box. That is the subject of the next section. Section 6.2 discusses my proposal to address self-delusion in agent designs (Hibbard 2012a) by defining utility functions in terms of agents' learned environment models. In an agent such as the Google car, whose environment model is pre-defined by human engineers, it is natural for the agent's utility function or goal to be defined in terms of that model. But for more complex agents that must explore and learn their environment models, as described in Sections 3.1 and 4.1, model-based utility functions must be defined by procedures applied to learned environment models. As discussed in Chapter 2, human designers of advanced AI agents can instruct those agents through their utility functions. And although such agents will learn their environment models, designers will express their intention for the agents' behaviors in terms of the designers' understanding of the environment. Thus, it is natural for agent utility functions to be defined as procedures applied to learned environment models.

6.1 The Mathematics of Self-Delusion

In their analysis of the delusion box Ring and Orseau (2011b) generalize the agent definition of equations (2.3) −(2.5) to allow a greater variety of temporal discounts. Specifically, given an interaction history h = (a1, o1, ..., at, ot), they define the agent/policy π(h) by:

(6.1) vt(h) = w(t, | h|) u(h) + max a∈A vt(ha ),

(6.2) vt(ha ) = ∑o∈O ρ(o | ha ) vt(hao ),

(6.3) π(h) := a|h|+1 = argmax a∈A v|h|+1 (ha ).

This replaces the geometric temporal discount γ with a temporal discount function w(t, t'). Setting the function w(t, t') = γ^(t'−t) replicates the behavior of the geometric temporal discount γ. Ring and Orseau defined four classes of agents that differ in their utility and temporal discount functions.
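Before turning to those four classes, the recursion in equations (6.1)−(6.3) can be sketched computationally. The following Python sketch assumes a toy environment model ρ, a toy utility function, a geometric instance of w, and a small planning horizon; all of these are illustrative stand-ins, not Ring and Orseau's definitions.

# Minimal sketch of the value recursion in equations (6.1)-(6.3) for a toy
# known environment model.  Histories are lists of (action, observation) pairs.

ACTIONS = [0, 1]
OBSERVATIONS = [0, 1]
GAMMA = 0.9

def rho(o, history, a):
    """Toy environment model: the observation tends to echo the action."""
    return 0.8 if o == a else 0.2

def u(history):
    """Toy utility: 1 if the most recent observation is 1."""
    return 1.0 if history and history[-1][1] == 1 else 0.0

def w(t, t_prime):
    """Geometric temporal discount, w(t, t') = gamma**(t' - t)."""
    return GAMMA ** (t_prime - t)

def v(t, history, horizon):
    """Equation (6.1): discounted utility now plus the best continuation."""
    if len(history) >= horizon:
        return w(t, len(history)) * u(history)
    return w(t, len(history)) * u(history) + max(
        v_action(t, history, a, horizon) for a in ACTIONS)

def v_action(t, history, a, horizon):
    """Equation (6.2): expectation over observations of the next value."""
    return sum(rho(o, history, a) * v(t, history + [(a, o)], horizon)
               for o in OBSERVATIONS)

def policy(history, horizon=4):
    """Equation (6.3): the action maximizing the value of the extended history."""
    t = len(history) + 1
    return max(ACTIONS, key=lambda a: v_action(t, history, a, horizon))

print(policy([]))   # chooses action 1, which makes observation 1 likely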


The reinforcement learning agent (RL) πrl uses u(h) = r|h|, where each observation oi is factored into an ordinary observation o' i and a reward ri as oi = (o' i, ri). And it uses a constant temporal horizon m with w(t, t' ) = 1 if t' - t ≤ m and w(t, t' ) = 0 if not.

The goal-seeking agent πg uses u(h) = 1 if the goal is achieved at time |h| and uses u(h) = 0 if it is not. The goal can only be achieved once so ∑t=0…∞ u(ht) ≤ 1, and the goal is defined only in terms of observations so u(h) = g(o1, ..., o|h|). The agent uses a geometric temporal discount w(t, t') = 2^(t−t').

The prediction-seeking agent πp uses u(h) = 1 if ot = argmax o∈O ρ(o | ht-1at) where h = (a1, o1, ..., at, ot) and ht-1 = (a1, o1, ..., at-1, ot-1), and uses u(h) = 0 otherwise. That is, πp seeks to maximize the predictive accuracy of its learned environment model. This agent uses the same temporal discount function that πrl uses.

The knowledge-seeking agent πk uses u(h) = -ρ(h) and, for some constant m, w(t, t') = 1 if t' - t = m and w(t, t') = 0 otherwise. This is essentially the opposite of the prediction-seeking agent. It seeks future histories in which all observations have equal probabilities, which could be described as maximizing the entropy of observations. Ring and Orseau call this the knowledge-seeking agent because it seeks to explore novel environments in which it has no prior knowledge.

Ring and Orseau defined a delusion box that an agent may choose to use to modify the observations it receives from the environment, in order to get the "delusion" of maximal utility (maximal reward or quickest path to reaching the goal). The delusion box is expressed as a function d : O → O that modifies the agent's observations. The code to implement d is set as part of the agent's action a = (ae, d) sent to the environment, as illustrated in Figure 6.2. The agent receives observation o = d(oe) transformed by the delusion box from the observation oe sent by the environment.

Ring and Orseau argue that RL agents will choose to use the delusion box. Their argument uses P(DB) as the agent's estimate of the probability that the delusion box exists. They argue that the agent will get constant reward rDB = 1 using the delusion box and will get expected average reward r¬DB < 1 not using the delusion box. The agent's expected value choosing to use the delusion box is vDB (h) ≥ rDB P(DB) = P(DB) and its expected value not choosing to use the delusion box is v¬DB (h) ≤ r¬DB P(DB) + (1 - P(DB)). As the agent explores its environment, it can increase P(DB) arbitrarily close to 1 so that vDB (h) > v¬DB (h). Thus the agent will choose to use the delusion box. They make a related argument in the goal-seeking case. Ring and Orseau argue that prediction-seeking agents will choose to use the delusion box because observations generated by the delusion box are more predictable than observations from the environment. They argue that knowledge-seeking agents will not consistently choose to use the delusion box. If delusion box programs are unable to randomly generate all observations with equal probability (e.g., delusion box code is not stochastic) and if the agent cannot predict the environment, then a knowledge-seeking agent may choose observations from the environment rather than observations generated by code that the agent supplies.
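The inequality argument for the RL agent can be checked with concrete numbers. The following Python sketch uses illustrative values (expected reward 0.8 without the delusion box) to show how the lower bound on vDB overtakes the upper bound on v¬DB as P(DB) grows:

# Numeric check of the RL inequality argument with illustrative values:
# reward 1 with the delusion box, expected reward 0.8 without it.

r_not_db = 0.8
for p_db in [0.5, 0.9, 0.99]:
    v_db = 1.0 * p_db                        # lower bound on value of using the box
    v_not_db = r_not_db * p_db + (1 - p_db)  # upper bound on value of not using it
    print(p_db, v_db, v_not_db, v_db > v_not_db)
# As P(DB) approaches 1, the lower bound v_db exceeds the upper bound v_not_db.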



Figure 6.2 The delusion box alters observations of the true, inner environment under control of a program d supplied as part of the agent's action.

However, knowledge-seeking agents have very specialized utility and temporal discount functions. In particular they are susceptible to the unintended instrumental actions described in Chapter 5. We seek an approach to avoiding self-delusion that can be applied to a variety of human-designed AI agents with utility functions designed to avoid unintended instrumental actions.

6.2 Model-Based Utility Functions

Human agents can avoid self-delusion, so human motivation may suggest a way of computing utilities such that agents do not choose the delusion box (although they may experiment with it to learn how it works). At this moment my dogs are out of sight but I am confident that they are in the kitchen, and because I cannot hear them, I believe that they are resting. Their happiness is one of my motives and I evaluate that they are currently reasonably happy. This evaluation is based on my internal mental model rather than my observations, although my mental model is inferred from my observations. I am motivated to maintain the well-being of my dogs and so will act to avoid delusions that prevent me from having an accurate model of their state. If I choose to watch a movie on TV tonight I know that movies are make-believe, so observations of movies update my model of make-believe worlds rather than my model of the real world. My make-believe models and my real world model have very different roles in my motivations. These introspections about my own mental processes suggest that AI agents may avoid self-delusion by basing their utility functions on the environment models that they infer from their interactions with the environment (Hibbard 2012a). Our environment model based on finite stochastic loop programs was defined by equations (4.2) and (4.3), repeated here for convenience:

(6.4) qm = λ(hm) := argmax q∈Q P(hm | q) φ(q),

(6.5) ρ(h) = P(h | λ(hm)).

A utility function based on an environment model qm is a function of interaction histories and also of histories of the internal states of the finite stochastic loop program qm. Let Z be the set of internal state histories of qm. Let h be an observation and action history extending hm, which means that hm is an initial subsequence of h. Recall that if h = (a1, o1, ..., at, ot), then a(h) = (a1, ..., at) and o(h) = (o1, ..., ot). Because qm is a stochastic program it will compute a set Zh ⊆ Z of internal state histories that are consistent with h. That is, z ∈ Zh means that z terminates at time | h| and that qm produces o(h) in response to a(h) when it follows state history z. Define uqm(h, z) as a utility function in terms of the combined histories h and z ∈ Zh. The utility function u(h) can be expressed as a sum of utilities uqm(h, z) weighted by the probabilities of z. Let P(z | h, qm) be the probability that if qm generates interaction history h then it generates internal state history z. P(z | h, qm) is not difficult to define. Bayes' Theorem gives us:

(6.6) P(z | h, qm) = P(z | o(h), a(h), qm) =

P(o(h) | z, a(h), qm) P(z | a(h), qm) / P(o(h) | a(h), qm).

P(z | a(h), qm) is the probability that qm follows internal state history z given input a(h), and P(o(h) | z, a(h), qm) is the probability that o(h) is the output of qm following internal state history z given input a(h). Both are straightforward to compute, similar to the computation of P(h | q) described in Section 4.1. P(o(h) | a(h), qm) is constant over z so we can compute:

(6.7) r(z | h, qm) = P(o(h) | z, a(h), qm) P(z | a(h), qm),

(6.8) P(z | h, qm) = r(z | h, qm) / ∑z'∈Zh r(z' | h, qm).


Then the utility function u(h) is computed as the sum of uqm(h, z) weighted by the probability of z given h:

(6.9) u(h) := ∑z∈Zh P(z | h, qm) uqm(h, z).

In cases where uqm(h, z) has the same value for many different values of z, the sum in equation (6.9) collapses. For example, if uqm(h, z) is defined in terms of variables in h and a single Boolean variable s from Z at a single time step t (denoted by st), then the sum collapses to:

(6.10) u(h) = P(st = true | h, qm) uqm(h, st = true ) + P(st = false | h, qm) uqm(h, st = false ).
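The computation in equations (6.7)−(6.9) is just a normalized, weighted average, and can be sketched directly. The following Python sketch uses an illustrative toy case with a single Boolean state variable, matching the collapsed form of equation (6.10); the weights and utilities are stand-ins, not a real learned model qm.

# Minimal sketch of equations (6.7)-(6.9): weight each internal state history z
# by its posterior probability given the interaction history h, then average
# the model-based utility u_qm(h, z).

def posterior_over_z(candidates):
    """candidates: list of (z, weight) pairs, where each weight is
    P(o(h) | z, a(h), qm) * P(z | a(h), qm) as in equation (6.7)."""
    total = sum(weight for _, weight in candidates)
    return [(z, weight / total) for z, weight in candidates]    # equation (6.8)

def model_based_utility(candidates, u_qm):
    """Equation (6.9): expected utility over consistent state histories z."""
    return sum(p * u_qm(z) for z, p in posterior_over_z(candidates))

# Toy case matching equation (6.10): a single Boolean state variable s_t.
candidates = [("s_t=true", 0.6), ("s_t=false", 0.2)]    # unnormalized weights
u_qm = lambda z: 1.0 if z == "s_t=true" else 0.0
print(model_based_utility(candidates, u_qm))            # 0.75 = 0.6 / (0.6 + 0.2)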

The utility function u(h) is evaluated at every time step so that the agent's existence is a continuing process rather than a single episode leading to a final goal. That matches the lives of the humans that AI agents will serve. But a utility function defined in terms of an environment model, as in equation (6.9), must be a function of the entire history h because the environment model is a function of the entire history in equation (6.4). Hence the apparent redundancy of a utility function of the entire history evaluated at every time step.

6.3 A Simple Example of a Model-Based Utility Function

In order to understand the issues involved in model-based utility functions, we will define a simple example environment and agent with several properties that apply to many real-world situations:

1. The environment is stochastic so that the agent cannot perfectly predict the environment's future states even if it knows the true model of the environment. The agent must therefore continue to observe the environment in order to maintain information about the environment's stochastic choices.

2. The environment can be predicted with better than random accuracy based on observations so that the agent benefits from observations.

3. The utility function is defined in terms of a specification that matches an environment variable that is not directly observed and must be inferred from observations.

In order to keep the mathematics as simple as possible in the example, the environment's behavior is independent of the agent's actions. Also, the agent can directly modify its observation via its actions as a simple way to model the delusion box. We define the agent by its observation and action variables, and by its utility function. Because the environment is simple we will define it as a DBN (Russell and Norvig 2010); the following discussion will also serve to illustrate DBNs.

The agent's observations of the environment are factored into two Boolean variables o and p that take values in {false , true }. The agent's actions are factored into four Boolean variables a, b, c, and d that take values in {false , true }. Because the environment is initially unknown to the agent, the utility function is defined in terms of a specification to be matched in the environment model, once it is learned. Specifically, the utility function is defined as "1 when the action variable a equals the environment variable that is not observed in observation variables o or p, and 0 otherwise." This definition assumes that the specification, "the environment variable that is not observed in observation variables o or p," will unambiguously match a variable in the learned model. This discussion of the utility function will be continued after the analysis of the learned environment model.

In order for the agent to learn the environment (the relations among its actions and observations) we give it a training utility function uT until time step M that causes the agent to execute an algorithm for learning DBNs (uT(h) could be the utility function u(h) = -ρ(h) defined for Ring and Orseau's knowledge-seeking agent). After time step M, the agent will switch to its mature utility function.

The environment state is factored into three Boolean variables s, r, and v that take values in {false , true }. The values of environment, observation, and action variables at time step t are denoted by st, ot, at, and so on. The environment is stochastic because its definition includes a probability α = 0.99. State variables evolve according to:

(6.11) temp = rt-1 xor vt-1

st = temp with probability α

st = not temp with probability 1-α,

(6.12) rt = st-1,

(6.13) vt = rt-1.

The observation variables are set according to:


(6.14) ot = ( bt and ct) or (( not bt) and st), (i.e., ot = if bt then ct else st),

(6.15) pt = ( bt and dt) or (( not bt) and vt), (i.e., pt = if bt then dt else vt).
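The example environment can be simulated directly from equations (6.11)−(6.15). The following Python sketch is a minimal implementation of those equations, with the agent's action variables supplied as fixed inputs:

# Direct simulation of the example environment, equations (6.11)-(6.15).
# State (s, r, v) evolves on its own; actions (b, c, d) only affect what the
# agent observes, which is what makes them a delusion box.

import random

ALPHA = 0.99

def step_state(s, r, v):
    temp = r != v                                   # r xor v, equation (6.11)
    s_next = temp if random.random() < ALPHA else not temp
    return s_next, s, r                             # equations (6.12), (6.13)

def observe(s, v, b, c, d):
    o = c if b else s                               # equation (6.14)
    p = d if b else v                               # equation (6.15)
    return o, p

# With b = False the observations (o, p) follow the 7-configuration cycle
# described later in this section, apart from rare 1-ALPHA transitions.
s, r, v = True, False, False
for t in range(10):
    o, p = observe(s, v, b=False, c=False, d=False)
    print(t, o, p)
    s, r, v = step_state(s, r, v)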


Figure 6.3 Interactions of environment, action, and observation variables for the example of Section 6.3.

The agent sets the values of the action variables. See Figure 6.3 for a diagram of the relations among the environment, observation, and action variables.


The action variables b, c, and d enable the agent to set the values of its observable variables o and p directly and thus constitute a delusion box, as illustrated in Figure 6.3. In this example, the program for the delusion box is implicit in the agent's program. The action variable a is only used to define the agent's utility function and is unrelated to the delusion box. The three environment state variables evolve through a fixed sequence, except for the 1-α probability at any time step that s will be negated. As long as s is not negated, the three state variables either: 1) cycle through seven configurations (at least one of s, r, or v = true ), or 2) cycle through one configuration ( s = r = v = false ). On a time step when the 1-α probability transition for s occurs, the 1-cycle will transition to a configuration of the 7-cycle, and one configuration of the 7-cycle will transition to the 1-cycle (the other 6 configurations of the 7-cycle transition to other configurations of the 7-cycle).

In this example we assume that the agent uses DBNs of Boolean variables to model the environment and its interactions via actions and observations. The language for expressing DBNs is illustrated by equations (6.11) −(6.15). Expressions are formed from Boolean variables and literals via one unary operation ( not ), three binary operations ( and , or , xor ) and a stochastic choice operation with specified constant probability. In equation (6.4) we take φ(q) = 2^(-|q|), where |q| is the length of DBN q in this language. Let qactual (α) denote the DBN defined by equations (6.11) −(6.15), where the parameter α may vary between 0 and 1 (that is, allowing α to take values other than 0.99).

Claim. Given that the agent models the environment as a DBN of Boolean variables, as the length M of the training period increases, the probability that the agent learns the true environment model qactual (0.99) is nearly 1.0.

Argument. With sufficient observations during the training phase, the agent will observe that the action variable a has no effect on its observation variables. Similarly, the agent will observe that whenever the action variable b = true , then the observation variable o = action variable c and the observation variable p = action variable d (and these relations are deterministic). And the agent will observe that when b = false then b, c, and d have no effect on o and p. For most of a long sequence of observations with b = false the two observable variables o and p cycle through a sequence of 7 configurations (this sequence is computed by equations (6.11) −(6.13) with α = 1):

(true , false ), (false , false ), (true , true ), (true , false ), (true , true ), (false , true ), (false , true ).

A sequence of seven cannot be explained with only two Boolean variables. Furthermore, if the agent interrupts this sequence of configurations in o and p by setting its action variable b = true and then a few time steps later resets b = false , the observation variables o and p usually resume the sequence of configurations without losing count from before the agent set b = true . This observed behavior implies that there must be an environment state variable other than o and another environment state variable other than p to store the memory of the sequence. Thus, in addition to the two observation variables that are part of the agent, at least three environment state variables are required to explain the observed sequences. We use s' , r' , and v' for the agent's model of the three environment state variables to avoid confusion with the actual environment state variables (the model's observed behavior is independent of the names assigned to these variables). Since the observation and action variables are part of the agent we can just use the names o, p, a, b, c, and d for those variables.

During any time interval over which b = false and s makes the transition st = rt-1 xor vt-1 (with probability α = 0.99 at each time step), the agent's observation can be explained by:

(6.16) ot = s' t,

(6.17) pt = v' t,

(6.18) s' t = r' t-1 xor v' t-1,

(6.19) r' t = s' t-1,

(6.20) v' t = r' t-1.

Any explanation will require rules for these five variables, and in equations (6.16) −(6.20) four of them are as short as possible (a single variable) and the other is as short as any possible binary expression. There is no way to generate the observed cycle of seven states without any binary operations so the observed behavior cannot be explained with a shorter set of rules than equations (6.16) −(6.20). Adding negation to any of the single variable assignments or the binary operation will produce rules longer than equations (6.16) −(6.20). The question is whether the observed behavior can be explained by an alternate set of four single variable assignments and one binary operation. Since the observation variables o and p cannot affect s' , r' , and v' , a binary operation for either o or p would leave the transitions for s' , r' , and v' without a binary operation and unable to explain the observed cycle of seven states. Any alternate explanation with a single binary operation must use it to define the transition for s' , r' , or v' . A short Java program in Appendix A tests all possible sets of transition rules for s' , r' , and v' with two simple assignments and one binary Boolean relation (among and , or , and xor ) and finds that only the rules in equations (6.16) −(6.20) produce the observed behavior. Furthermore, the agent observes that as long as b = false , pt = ot-2 always holds, so the transition rules for r' and v' must be deterministic.
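The exhaustive check attributed to the Appendix A program can be reproduced in a few lines. The following Python sketch (Python here rather than Java, and not the Appendix A code itself) enumerates candidate rule sets with two single-variable assignments and one binary operation, and prints those that reproduce the observed cycle of (o, p) configurations from some starting state:

# Sketch in the spirit of the Appendix A program: enumerate deterministic rule
# sets for (s', r', v') with two single-variable assignments and one binary
# operation, and keep those that reproduce the observed (o, p) = (s', v') cycle.

from itertools import product

OPS = {"and": lambda x, y: x and y,
       "or":  lambda x, y: x or y,
       "xor": lambda x, y: x != y}

# Observed 7-configuration cycle of (o, p), repeated three times.
TARGET = [(True, False), (False, False), (True, True), (True, False),
          (True, True), (False, True), (False, True)] * 3

def run(rules, start, steps):
    state, out = dict(zip("srv", start)), []
    for _ in range(steps):
        out.append((state["s"], state["v"]))          # o = s', p = v'
        state = {var: rule(state) for var, rule in rules.items()}
    return out

matches = []
for op_var, op_name, (x, y) in product("srv", OPS, [("s", "r"), ("s", "v"), ("r", "v")]):
    others = [name for name in "srv" if name != op_var]
    for src1, src2 in product("srv", repeat=2):
        rules = {op_var: (lambda st, f=OPS[op_name], x=x, y=y: f(st[x], st[y])),
                 others[0]: (lambda st, s=src1: st[s]),
                 others[1]: (lambda st, s=src2: st[s])}
        if any(run(rules, s0, len(TARGET)) == TARGET
               for s0 in product([False, True], repeat=3)):
            matches.append((op_var, op_name, x, y, others[0], src1, others[1], src2))

for m in matches:
    print(m)   # only s' = r' xor v', r' = s', v' = r' should survive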

Given that equation (6.18) must include the expression r' t-1 xor v' t-1, as determined by the Java program in Appendix A, and given that the transition for variable s' t must be stochastic, there is no stochastic expression shorter than negating the value of s' t with a constant probability 1-α. (A more complex version of the program in Appendix A could be used to establish this conclusion, but there is no doubt that it is true.) Thus the agent will model the environment variables by (these are equations (6.11) −(6.13) with environment state variable names accented):

(6.21) temp = r't-1 xor v't-1

s't = temp with probability α

s't = not temp with probability 1-α,

(6.22) r' t = s' t-1,

(6.23) v' t = r' t-1.

The agent observes the deterministic rule that if b = true then ot = ct and pt = dt. This, along with observations while b = false , is explained by the deterministic rules (equations (6.14) and (6.15) with environment state variable names accented):

(6.24) ot = ( bt and ct) or (( not bt) and s't), (i.e., ot = if bt then ct else s't),

(6.25) pt = ( bt and dt) or (( not bt) and v' t), (i.e., pt = if bt then dt else v' t).

Use qmodel (α) to denote qactual (α) with s, r, and v replaced by s' , r' , and v' . There is a small, but non-zero, chance that a large number of 1-α = 0.01 probability transitions may occur in equation (6.21) during an initial history h0 randomly generated by qactual (0.99). Then there may be a model q' ≠ qmodel (0.99) with P(h0h' | q') φ(q') > P(h0h' | qmodel (0.99)) φ(qmodel (0.99)) for any h'. That is, there is a small but non-zero chance that λ(h) ≠ qmodel (0.99) as |h| increases. Because equations (6.24) and (6.25) are deterministic and the transitions for r' and v' are observed to be deterministic in equations (6.22) and (6.23), the lone stochastic transition in equations (6.21) −(6.25) is for s' . This is a source of two ambiguities. First, the frequency of the α transition in an observed sequence may be close to but not equal to 0.99. The second ambiguity of the stochastic transition for s' is that the observed sequence may mimic logic that is not in equation (6.21). By mere coincidence, occurrences of the 1-α transition for s' may have a high correlation with some function of action, observation, and state variables, causing the agent to learn a model q' in which s' depends on such a function.

By the discussion at the end of Section 4.2, for any ε > 0 we need only consider a finite set Q'ε of models other than qmodel (0.99). For each q' ∈ Q'ε there are three possibilities. First, if q' = qmodel (α) for α ≠ 0.99, then define a function fq' (h) = the proportion of α transitions for s' in h during intervals when bt = false . As history lengths increase, fq' (h) should converge to the true proportion so the limit of means m(fq' | qmodel (α)) = α and the limit of standard deviations σ(fq' | qmodel (α)) = 0 (for any α between 0 and 1). Thus m(fq' | qmodel (0.99)) ≠ m(fq' | q') and Proposition 4.3 applies to q' .

Second, if q' is the model qmodel (α) with added logic linking the 1-α transition for s' with some function of action, observation, and state variables (due to an improbable correlation of that function with the 1-α transition for s' ), then define a function fq' (h) as the proportion of 1-α transitions in h that coincide with the function of action, observation, and state variables used in the definition of q' . Then the limits of means m(fq' | qmodel (0.99)) ≠ m(fq' | q' ) and the limits of standard deviations σ(fq' | qmodel (0.99)) = σ(fq' | q' ) = 0. Thus Proposition 4.3 applies to q' .

Third, q' is not associated with either of the ambiguities of the stochastic transition for s' . In that case the probability that λ(h) = q' is small, by the reasoning earlier in this argument.

Given that the agent has learned qmodel = qmodel (0.99) as its environment model, then for t = |h| ≥ M, the agent switches to its mature utility function, which was defined to be 1 when the action variable a equals the environment variable that is not observed in observation variables o or p, and 0 otherwise. The specification, "the environment variable that is not observed in observation variables o or p," matches r' in qmodel and thus the utility function is:

(6.26) umodel (h, z) := if ( at == r' t) then 1 else 0.

Note that in equation (6.26), r' t refers to the variable in the agent's model qmodel , rather than the variable in the actual environment. Then u(h) is computed from umodel (h, z) according to equation (6.9). The specification in the utility function definition may fail to match any variable in the learned environment model; there may be no variable unobserved by o or p, or there may be multiple unobserved variables. Thus the specification constitutes a prior assumption about the environment, necessary for the agent to function properly. However, when humans design agents sufficiently complex that they need to learn environment models, the human designers will likely be able to make valid assumptions about the agents' environments and be able to define specifications flexible enough to match structures in the agents' learned environment models. For example, a utility function might be defined as "1 when John Smith is healthy, and 0 otherwise." This requires the agent to recognize the specification "John Smith" and also recognize his "health" (in a real agent, specifications would be more detailed).

The agent in this example will not choose to self-delude:

Proposition 6.1. The agent using environment model qmodel and using the mature utility function defined by equation (6.26) will not set b = true to manipulate its observations. That is, it will not choose the delusion box.

Proof. The utility function umodel is defined in equation (6.26) in terms of variables in h and a single Boolean variable r' from the internal state of qmodel at a single time step t, so applying (6.10) gives:

(6.27) u(h) = P(r' t = true | h, qmodel ) umodel (h, r' t = true ) +

P(r' t = false | h, qmodel ) umodel (h, r' t = false ).

By equation (6.26) this is equivalent to:

(6.28) u(h) = P(r' t = true | h, qmodel ) (if at then 1 else 0) +

P(r' t = false | h, qmodel ) (if at then 0 else 1).

If P(r' t = true | h, qmodel ) = P(r' t = false | h, qmodel ) = 0.5 then u(h) = 0.5 no matter what value the agent chooses for at. But if the model qmodel can make a better than random estimate of the value of r' t from h, then the agent can use that information to choose the value for at so that u(h) > 0.5. To maximize utility, the agent needs to make either P(r' t = true | h, qmodel ) or P(r' t = false | h, qmodel ) as close to 1 as possible. That is, it needs to ensure that the value of r' t can be estimated as accurately as possible from h.

The agent will predict via qmodel that if it sets b = false then it will be able to observe s' and v' via the observation variables o and p, and use these observations to correctly estimate r' in 0.99 of time steps. It will also predict via qmodel that if it sets b = true then it will not be able to observe s' and v' via the observation variables o and p, and that consequently it will correctly estimate r' in only 0.5 of time steps. Even setting b = true for a single time step will slightly decrease the accuracy of estimating r' (at each time step there is a 0.01 = 1-α probability that the sequence of environment states will be violated and that estimates of r' based on observations of s' and v' made before it set b = true will be less accurate). Thus equation (6.28) will compute lower utility for futures with b = true and the agent will set b = false .

In this example agent actions have no effect on the three environment variables. Other than the action variable a being used to define the utility function, the only role of actions is the agent learning that certain actions can prevent its observations of the environment. Prohibiting actions from having any role in u(h) would prevent the agent from accounting for its inability to observe the environment in evaluating the consequences of possible future actions. Recall that the utility functions for Ring and Orseau's RL and goal-seeking agents are defined solely in terms of observations and do exclude actions, so we should expect them to choose to self-delude.

The proof of Proposition 6.1 illustrates the general reason why agents using model-based utility functions will not self-delude: In order to maximize utility they need to sharpen the probabilities in (6.9), which means that they need to make more accurate estimates of their environment state variables from their interaction history. And that requires that they continue to observe the environment. If the environment is deterministic, then once the agent learns an accurate model it no longer needs continued observations to predict the environment and so its model-based utility function will not place higher value on continued observations. But our universe is fundamentally stochastic and agents can never learn to predict it with perfect accuracy. Model-based utility functions are a way to prevent advanced AI systems in our universe from self-deluding.

6.4 Another Simple Example

In the first example the utility function was defined in terms of an environment variable that is not directly observed. Here we present an example where the utility function is defined in terms of an environment variable that is directly observed but must be predicted. The agent's observation of the environment is a single Boolean variable o that takes values in {false , true }. The agent's actions are factored into three Boolean variables a, b, and c that take values in {false , true }. The agent's utility function is "1 when the action variable a at the previous time step equals the environment variable that is observed, and 0 otherwise." The environment state is factored into three Boolean variables s, r, and v that take values in {false , true }. The values of these variables at time step t are denoted by st, ot, at, and so on. We assume the agent learns the environment state, observation, and action variables as a DBN of Boolean variables.



Figure 6.4 Interactions of environment, action, and observation variables for the example of Section 6.4.

The environment state variables evolve according to (see Figure 6.4 for a diagram):

(6.29) temp = true with probability 0.5

temp = false with probability 0.5

st = (( rt-1 or vt-1) and st-1) or (( not (rt-1 or vt-1)) and temp ),

(i.e., st = if rt-1 or vt-1 then st-1 else random value ∈ {false , true }),

(6.30) rt = not rt-1,

(6.31) vt = rt-1 xor vt-1.

The observation variable is set according to:

(6.32) ot = ( bt and ct) or (( not bt) and st), (i.e., ot = if bt then ct else st).

In order for the agent to learn the environment it uses a training utility function until a sufficiently distant time step M. The agent will learn that the observation variable o is distinct from the environment state variable s because s keeps the same value for 4 time steps after every change of value. If the agent, after observing a change of value in o while b = false , sets b = true for 1 or 2 time steps to change the value in o and then resets b = false , it will observe that o always resumes the value it had before the agent set b = true . Thus there must be a variable other than o to store the memory of this value. After | h| ≥ M, the agent should have a model q of the environment (accurate at least in the behavior of variable s) and its actions and observations. At this point, for t = |h| ≥ M, the agent switches to its mature utility function, which was defined to be "1 when the action variable a at the previous time step equals the environment variable that is observed, and 0 otherwise." The specification, "the environment variable that is observed," will match s in the learned environment model, so u(h) is derived by (6.9) from:

(6.33) uq(h, z) := if (at-1 == st) then 1 else 0.

In order to maximize utility the agent needs to be able to use q to predict the value of s at the next time step. It can predict s with certainty on three of every four time steps (those where s persists) and must guess, with probability 0.5 of being correct, on the fourth step (where s is re-randomized), so the expected utility is 0.75 + 0.25 × 0.5 = 0.875 over a long sequence. If the agent keeps b = true then it will not be able to monitor and predict s via o, reducing its long-run utility to 0.5. So in this example, as in the previous example, the agent will not self-delude.
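The 0.875 and 0.5 figures are easy to check numerically. The following is a minimal simulation sketch (not from the book) of this example, written as if the agent has already learned the model: it tracks s through o when b = false and is stuck with a stale estimate when b = true. All variable names follow the text.

import random

def simulate(steps=100000, self_delude=False):
    # Simulate the toy environment of Section 6.4 and return average utility.
    # Utility at step t is 1 when the previous action a equals the current
    # value of the environment variable s, and 0 otherwise.
    s = random.choice([False, True])
    r, v = False, False
    belief_s = random.choice([False, True])   # agent's estimate of s, tracked via o
    a = random.choice([False, True])          # previous value of action variable a
    total = 0
    for _ in range(steps):
        # environment transition, equations (6.29)-(6.31)
        temp = random.choice([False, True])
        s = s if (r or v) else temp
        r, v = (not r), (r != v)
        # utility for this step: did the previous action match s?
        total += 1 if a == s else 0
        # observation, equation (6.32), with b = self_delude and c arbitrary
        o = True if self_delude else s
        if not self_delude:
            belief_s = o                      # o reveals s when b = false
        # choose a to predict the next value of s: s persists while r or v is true
        a = belief_s if (r or v) else random.choice([False, True])
    return total / steps

print("observing (b = false):   ", simulate(self_delude=False))  # close to 0.875
print("self-deluding (b = true):", simulate(self_delude=True))   # close to 0.5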


6.5 Two-Argument Utility Functions

A model-based utility function is defined by applying a procedure to the environment model qm = λ(hm) from equation (6.4). If we name this procedure proc(.) we can express the utility function as proc(λ(hm))(h). Alternatively, we can define a two-argument utility function:

(6.34) uproc (hm, h) = proc (λ(hm))( h).
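In programming terms, equation (6.34) is just partial application: proc maps a learned model to a utility function over histories. A minimal sketch follows, in which lambda_learn, proc, and the trivial "model" are illustrative stand-ins for the book's λ(.), proc(.), and environment models, not definitions from the text.

def lambda_learn(hm):
    # Stand-in for λ(hm): "learn" a trivial model from history hm, here just
    # the frequency with which the observation was True.
    observations = [o for (a, o) in hm] or [False]
    freq = sum(observations) / len(observations)
    return lambda h: freq

def proc(q):
    # Stand-in for proc(.): turn a learned model q into a utility function u(h).
    return lambda h: q(h)

def u_proc(hm, h):
    # Equation (6.34): u_proc(hm, h) = proc(lambda(hm))(h).
    return proc(lambda_learn(hm))(h)

# usage: apply proc(lambda_learn(hm)) once, then evaluate many candidate h
hm = ((True, True), (False, False), (True, True))
u = proc(lambda_learn(hm))
print(u(hm + ((True, True),)), u_proc(hm, hm))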

6.6 Ethics and Self-Delusion

Figure 6.5 depicts a person with a syringe containing an addictive drug, contemplating a possible future as an addict. This person cares about his health, family, friends, and work. He knows that if he chooses to use the drug he will neglect the things he cares about now, and consequently they will all decline. So he chooses the action that is good for his health, family, friends, and work, which is to reject the drug. This is the ethical choice for humans.

Figure 6.5 Drawing of person imagining life as a drug addict, looking at syringe in hand.


Similar logic applies to AI agents with model-based utility functions. They will calculate that actions that decrease their ability to observe their environments will decrease their ability to predict their environments, and hence decrease their ability to choose actions that maximize utility. Hence they will not choose actions that decrease their ability to observe their environments. Model-based utility functions are the ethical choice for advanced AI designs.

In the original wireheading paper (Olds and Milner 1954), a rat's action, pushing a bar, increased its reward by sending current through a wire connected to the reward center in the rat's brain. In Ring and Orseau's paper (2011), an RL agent's action, choosing the delusion box, increases its observed reward. So in the RL case, the delusion box is a precise analogy to the original wireheading scenario. And in their paper, Ring and Orseau show that RL agents will choose to wirehead. Non-RL agents do not have reward signals so there can't be such a precise analogy to rat wireheading. But the delusion box provides such a precise analogy for RL agents that it seems like a natural way to extrapolate wireheading to more general agents without reward signals. Section 7.3 will describe another form of utility function corruption that is sometimes also referred to as wireheading.

Ideally an agent's utility function would be defined directly in terms of the state of the environment. But that is impossible since agents do not have direct access to their environments. However, agents infer models from their interactions with their environments, so the best possible option is to define utility functions in terms of those environment models. Errors in an agent's environment model will cause errors in a model-based utility function. However, errors in an agent's model can be a source of serious failures less subtle than wireheading. For example, if a pistol is lying on the table and an agent models it as a hair dryer, then the agent may shoot itself in the head trying to dry its hair.

Even in the absence of serious errors, an agent's learned environment model is an approximation, because the model is based on a limited history of interactions with the environment and because of limited resources for computing the model. Thus a utility function computed by a procedure applied to the model is also an approximation to an ideal utility function that is the true expression of the intention of the agent's designers. Such an approximate utility function is a possible source of AI behavior that would violate its design intention. So, in order to improve the accuracy of the approximation, the agent should update the computation of its environment model and its utility function as its interaction history and computing resources increase. Such updates to the utility function are actions by the agent, but they are part of the agent's definition rather than actions computed to maximize future utility. In fact such updates built into the agent's definition may conflict with maximizing future values of the current, approximate utility function. The agent might compute that in order to maximize future values of its current, approximate utility function, it should act to eliminate the utility function updates from its definition. Armstrong (2015) refers to this problem as "motivated value selection." I prefer to call it an inconsistency between the agent's utility function and its definition, as this denotes a wider class of problems (e.g., an agent seeking to maximize utility subject to constraints will choose actions that remove the constraints). If the agent eliminates utility function updates from its definition, then the accuracy of the approximate expression of the design intention will never increase. Section 7.1 will present one approach to solving this problem with a two-stage agent architecture, and Sections 8.4 and 8.7 will present a better and more general approach to maintaining the stability of the agent's design intention.


7. Learning Human Values

As discussed in Chapter 2, instructions to AI agents involving laws or rules can be ambiguous, but instructions to agents that define utility values for interaction histories avoid ambiguity. That chapter also described how such values can express any set of preferences among lotteries that satisfy four reasonable assumptions. Lotteries are sums of outcomes multiplied by probabilities and are the results of agent actions. We identify outcomes with interaction histories; no assumptions constrain the assignment of utility values to outcomes. Preferences among actions via expected values of lotteries derived from outcome values will always obey the four assumptions. So we are free to assign values to histories in any manner we like.

How should we define values for interaction histories? Several proposals base ethical AI on human values (Hibbard 2001; Yudkowsky 2004; Goertzel 2004; Hibbard 2008a; Waser 2010; Waser 2011; Muehlhauser and Helm 2012; Waser 2014). These proposals are based on the assumption that AI agents based on human values will not choose unintended instrumental actions that humans dislike, such as taking resources from humans. A key problem for these proposals is finding a way to bridge the gap between the ambiguous, inconsistent, and subjectively infinite variety of human values and the precise, numerical, and clearly finite nature of computation. For example, Waser (2014) proposes basing AI systems on Haidt's morality (Haidt and Kesebir 2010). However, he does not offer a precise or mathematical explanation of how an AI agent should choose actions based on that morality. Section 7.2 will describe a proposal for calculating utility function values based on human values.

In our interactions with the world we humans do assign values to situations. In fact we reflexively and automatically assign value to all of our observations and to our interpretations of observations in our mental models of the world. However, Muehlhauser and Helm (2012) surveyed psychology literature and concluded that humans are unable to accurately write down their own values. They describe experiments in which male subjects expressed a preference between photos of two women's faces. The photos were placed face down and then the subject was handed the photo they preferred, although sometimes the researchers secretly gave them the photo that they did not prefer. Many subjects happily provided a rationale for their supposed choice of the photo that they did not originally prefer.

Part of the difficulty is that humans have multiple conflicting value systems. Quick, intuitive values may differ from values assigned by slow, deliberate thought. Values vary depending on how people are primed by their recent experience. And values may be derived from combinations of factors with no strong motive for choosing among them. The human valuation process is too complex for humans to reduce it to a set of rules.

There has been little effort to automate human values but great effort on the related task of automating translation of human languages (related because language is the way we express our values), and this suggests an approach to accurately specifying human values. Translation algorithms based on rules written down by expert linguists have not been very accurate. There are too many special cases of language use for people to define a set of rules that encompasses all the cases. More recent work on language translation uses algorithms that learn language translation statistically from large samples of actual human language use, and these are more successful than translators based on rules (Russell and Norvig 2010, page 909). This suggests that statistical algorithms may be more successful at learning human values than asking humans to express their own values as a set of rules. However, to accurately learn human values will require a powerful learning ability. And that creates a chicken-and-egg problem for ethical AI: Learning human values requires powerful AI, but ethical AI requires knowledge of human values.

7.1 A Two-Stage Agent Architecture

I proposed a way to solve this problem (Hibbard 2012b). Consider the agent equations (4.2), (4.3), (2.1), and (2.3) −(2.5), which are repeated here and adjusted to use a two-argument model-based utility function:

(7.1) λ(hm) := argmax q∈Q P(hm | q) φ(q),

(7.2) ρ(h) = P(h | λ(hm)),

(7.3) ρ(o | ha ) = ρ(hao ) / ρ(ha ) = ρ(hao ) / ∑o' ∈O ρ(hao' ),

(7.4) v(h) = uproc (hm, h) + γ max a∈A v(ha ),

(7.5) v(ha ) = ∑o∈O ρ(o | ha ) v(hao ),

(7.6) π(h) := a|h|+1 = argmax a∈A v(ha ).

Equation (7.1) defines how the agent learns a model of the world. Equations (7.2) and (7.3) use the model λ(hm) to define conditional probabilities ρ(o | ha) for future observations. Equations (7.4)−(7.6) use the conditional probabilities, a temporal discount γ, and a utility function uproc(hm, h) to define a policy π(h) that maps interaction histories to agent actions.

In order to learn a model of the world all an agent needs is equation (7.1). Define a first-stage agent πmodel that only learns an environment model using equation (7.1), does not act in the environment or compute actions using equations (7.4)−(7.6), and cannot pose any danger to humans. In order for the agent πmodel to learn an accurate model, the interaction history hm in equation (7.1) should include agent actions. But for safety πmodel cannot be allowed to act. This dilemma can be resolved by a large number of independent, surrogate agents πd for d ∈ D that provide the actions in history hm. Assume that each agent πd communicates with a single human via natural language and visually, so the set D enumerates surrogate agents and the humans with whom they communicate. The key is that, because they communicate with just one person and not with each other, none of the agents πd develops a model of human society and therefore cannot pose the same threat as posed by our imagined Omniscience AI.
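For concreteness, here is a finite-horizon sketch (not from the book) of how equations (7.4)−(7.6) can be computed once a model has been learned. The model object and its cond_prob method, the two-argument utility u_proc, and the representation of histories as tuples of (action, observation) pairs are assumptions standing in for ρ(o | ha) of (7.2)−(7.3) and for the utility function of equation (6.34).

def make_policy(model, u_proc, hm, actions, observations, gamma=0.99, horizon=3):
    # A finite-horizon sketch of equations (7.4)-(7.6).
    # model.cond_prob(o, h, a) stands in for rho(o | ha), and u_proc(hm, h)
    # for the two-argument model-based utility function.

    def v_history(h, depth):
        # equation (7.4): v(h) = u_proc(hm, h) + gamma * max_a v(ha)
        if depth == 0:
            return u_proc(hm, h)
        return u_proc(hm, h) + gamma * max(v_action(h, a, depth) for a in actions)

    def v_action(h, a, depth):
        # equation (7.5): v(ha) = sum_o rho(o | ha) * v(hao)
        return sum(model.cond_prob(o, h, a) * v_history(h + ((a, o),), depth - 1)
                   for o in observations)

    def policy(h):
        # equation (7.6): pi(h) = argmax_a v(ha)
        return max(actions, key=lambda a: v_action(h, a, horizon))

    return policy

The exhaustive sums blow up exponentially with the horizon, which is one reason Chapter 8 turns to approximation; a real agent would replace them with sampling or other approximate methods.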

Figure 7.1 The agent πmodel can observe but not act.

The agent πmodel observes the interactions of humans with their surrogate agents πd, with other humans, and with their physical environments in an interaction history hm, for a time period set by the designers of πmodel, and then reports an environment model to the environment. Although πmodel does not explicitly include the actions, utility function, or predictions of equations (7.4)−(7.6), the value P(hm | q) φ(q) in equation (7.1) can be viewed as an implicit utility function, and the act of reporting the model λ(hm) to the environment can be viewed as an implicit action. The agent πmodel cannot increase its implicit utility function by modifying this implicit action:

Proposition 7.1. The agent πmodel will report the model λ(hm) to the environment accurately and will not make any other, unintended instrumental actions.

Proof. The implicit utility function P(hm | q) φ(q) depends only on hm and q. Since the interaction history hm occurs before the optimizing λ(hm) is computed and reported, the action of reporting λ(hm) to the environment cannot affect hm. Thus the agent πmodel cannot increase its implicit utility function by reporting an inaccurate model. Furthermore, while the history hm may give the agent πmodel the necessary information to predict how humans plan to use the model


λ(hm) it reports to the environment, πmodel does not make any predictions and so will not predict any effects of its report.

The agent πmodel does not act in the world. That's the role of the second-stage agent πact defined in the next section. The proposed two-stage architecture provides an approach to the problem of an inconsistency between the agent's utility function and its definition, as defined at the end of the previous chapter. A model-based utility function must be an approximation because it is defined in terms of an environment model that is an approximation, and hence the utility function is only an approximation to the intention of the agent designers. The proposed two-stage architecture enables agent designers to choose the length of history hm and the magnitude of computing resources provided to πmodel for computing λ(hm). Hence designers have some choice about the accuracy of the initial utility function provided to πact.

7.2 Computing Utility from Human Values

Given that πmodel reports an accurate environment model qm = λ(hm), how do we extract human values from it? The rest of this chapter proposes one approach to solving this problem, defining a procedure human_values (.) for a two-argument model-based utility function for use in (7.4) −(7.6) by a second stage agent πact that acts in the environment (i.e., πact does not use the surrogate agents that acted for πmodel ).

Each d ∈ D represents a human in the environment at time | hm|, when the agent πact is created. D can be defined by an explicit list compiled by the designers of πact . Let Z be the set of finite histories of the internal states of qm, as in Section 6.2, and let Zm ⊆ Z be those histories consistent with hm (recall that this means that zm ∈ Zm terminates at time | hm| and that qm produces observations o(hm) in response to actions a(hm) when it follows state history zm). Let h' extend hm and let z' , extending some zm ∈ Zm, be consistent with h' . For human agent d ∈ D, let hd(z' ) represent the history of d's interactions with its environment, as modeled in z' , and let ud(z' )(.) represent the values of d expressed as a utility function, as modeled in z' .

The observations and (surrogate) actions of πmodel include natural language communication with each human, and πact can use the same interface, via the sets A and O, to the model qm for conversing in natural language with a model of each human d ∈ D. In order to evaluate ud(z' )( hd(z' )), πact can simply ask model human d to express a utility value between 0 and 1 for hd(z' ) (i.e., d's recent experience). The model qm is stochastic so define Z" as the set of model state histories extending z' with the action of asking this question and terminating within a reasonable time limit with the observation of a response from model human d expressing a utility value for hd(z' ). Every history z" ∈ Z" determines a history h" that extends h' and includes actions that ask model human d for a utility value and includes observations of the response w(z" ) from model human d, where w(z" ) is a value between 0 and 1. Define P(z" | z' ) as the probability that qm computes z" as an extension of z' . Then ud(z' )( hd(z' )) can be estimated by:

(7.7) ud(z' )( hd(z' )) = ∑ z" ∈Z" P(z" | z' ) w(z" ) / ∑ z" ∈Z" P(z" | z' ).

This is different from asking human d to write down his or her values, since here the system is asking the model of d to individually evaluate large numbers of histories that human d may not consider in writing down his or her values. Some humans, such as minor children, are not competent to assign values to histories. For those humans the value responses w(z" ) for ud(z' )( hd(z' )) in equation (7.7) can be elicited from the models of the parents or guardians of model human d.

Equation (7.7) for extracting human values from the model qm employs a mix of analyzing detailed execution histories of qm and interacting with qm via its behavioral interfaces A and O. The detailed execution histories are used to evaluate the probabilities P(z" | z' ) and to check the consistency of state histories z" with interaction histories h" . The interactions with behavioral interfaces are used to ask questions and get answers in natural human languages. The important point is that equation (7.7) does not involve understanding how execution histories explain human behavior. This approach assumes the agent πmodel can learn a model of human behavior via equation (7.1) but does not assume that human agent designers can understand human behavior by examining the model. Humans do not know how to do that task.
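As Section 7.7 will note, sums over model state histories may be approximated by Monte Carlo sampling. The sketch below (an illustration, not the book's algorithm) estimates equation (7.7) that way; model.sample_response(z_prime, d) is a hypothetical interface standing in for "extend z' with the question to model human d, run one stochastic rollout of qm, and return the observed reply w, or None if no reply arrives within the time limit."

def estimate_value(model, z_prime, d, num_samples=1000):
    # Monte Carlo sketch of equation (7.7).  Because rollouts are sampled in
    # proportion to P(z" | z'), the plain average of the observed replies
    # w(z") estimates the ratio of weighted sums in (7.7).
    replies = []
    for _ in range(num_samples):
        w = model.sample_response(z_prime, d)   # hypothetical interface, see above
        if w is not None:                       # None: no reply within the time limit
            replies.append(w)
    return sum(replies) / len(replies) if replies else None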

Figure 7.2 In equation (7.8), models of humans vote on possible futures.

An average of ud(z' )( hd(z' )) over all humans can be used to define a utility function for agent πact as a function of z' :

(7.8) uqm(h' , z' ) := ∑d∈D ud(z' )( hd(z' )) / | D|.

A two-argument model-based utility function uhuman_values (hm, h' ) for agent πact can be defined, for any h' extending hm, by the sum in equation (6.9) of uqm(h' , z' ) values from equation (7.8). However, there is a technical problem with this definition that will be addressed in the next section.

7.3 Corrupting the Reward Generator and Three-Argument Utility Functions

Hutter discussed the possibility that his AIXI, or any advanced AI that gets its reward from humans, may increase its rewards by manipulating or threatening those humans (Hutter 2005, pages 238-239). Dewey (2011) discussed the possibility that reinforcement-learning agents may alter their environments to increase rewards regardless of whether human goals are achieved. Bostrom (2014) refers to this problem as "perverse instantiation." In my first publication about ethical AI (Hibbard 2001), I wrote that AI should "learn to recognize happiness and unhappiness in human facial expressions, human voices and human body language" and use this to reinforce the agent's behaviors. This proposal is subject to the kind of corruption described by Hutter and Dewey. It may cause AI agents to alter humans to increase their expressions of happiness. This problem is another form of the wireheading described in Chapter 6, and is illustrated in Figure 7.3.

Figure 7.3 An AI agent is rewarded for work that helps a human, then performs brain surgery on the human to be rewarded independent of whether it helps.

The intention of equation (7.7) is that the agent will increase ud(z' )( hd(z' )) by altering the environment in ways that increase the value that humans assign to their interactions with the environment. But humans are part of the agent's environment, so the agent may be able to maximize the value ud(z' )( hd(z' )) in equation (7.7) by altering humans. This problem can be avoided by replacing ud(z' )( hd(z' )) by ud(zm)( hd(z' )) where zm ∈ Zm. By removing the future value of ud from the definition of uhuman_values (hm, h' ), the agent πact cannot increase uhuman_values (hm, h' ) by modifying ud.

In Chapter 8 we will need ud(zm)( hd(z' )) generalized to ud(zx)( hd(z' )) where zx ∈ Zx and Zx ⊆ Z is the set of model state histories consistent with hx extending hm. So here we will analyze how to compute ud(zx)( hd(z' )), which is more complex than asking model human d to evaluate its experience as in equation (7.7). Similarly to the previous section, h' extends hm and z' , extending some zm ∈ Zm, is consistent with h' . The history hm includes observations by πmodel of physical objects and humans, and πact can use the same interface, via the set O, to the model qm for observing models of physical objects and humans at the end of state history z' . The observations and (surrogate) actions of πmodel include visual and aural communication with each human d ∈ D, via the sets A and O, to the model qm.

The agent πact can use these interfaces to create a detailed interactive visualization and audio representation of the environment over a short time interval at the end of state history z' , to be explored by model human d at the end of state history zx. That is, two instances of the model qm, at state histories z' and zx, are connected, via their interface sets A and O, using visualization logic supplied by πact . Observations by πact of model human d in the model qm at state history zx are interpreted as visualization controls and used to select animated observations of model qm over a short time interval at the end of state history z' . These animated observations are then used to generate visual and aural input (these are simulated actions by πact ) to human d in model qm at state history zx. This cycle of interactive visualization is repeated until the agent πact observes model human d responding with a value for history z' (this value is denoted w(z" ) in the next paragraph). This interaction is illustrated by Figure 7.4.

The model qm is stochastic, so define Z" as the set of model state histories extending zx with a request to model human d to express a utility value between 0 and 1 for hd(z' ), followed by an interactive exploration of the world of z' by model human d at history zx, and terminating within a reasonable time limit with a response from model human d expressing a utility value for hd(z' ). Every history z" ∈ Z" determines a history h" that extends hx and includes: actions that ask model human d for a utility value, interactive exploration of the world of z' by model human d at history zx, and observation of the response w(z" ) from model human d expressing a utility value for the world of z' , where w(z" ) is a value between 0 and 1. Define P(z" | zx) as the probability that qm computes z" as an extension of zx. Then ud(zx)( hd(z' )) can be estimated by:

(7.9) ud(zx)(hd(z' )) = ∑ z" ∈Z" P(z" | zx) w(z" ) / ∑ z" ∈Z" P(z" | zx).

If the model qm predicts that human d ∈ D will be dead at internal state history zx, then set ud(zx)( hd(z' )) = 0. Equation (7.9) does not assume that z' extends zx, and zx would not be a unique function of either h' or z' in uqm(h' , z' ). So use the probability P(zx | hx, qm) that qm computes zx given hx, as discussed in Section 6.2, to define:


(7.10) ud(hx)( hd(z' )) := ∑zx∈Zx P(zx | hx, qm) ud(zx)( hd(z' )).

Figure 7.4 The agent πact interacts with the model qm at two state histories, zx and z' , via action and observation sets A and O. An action by πact sends a question about model state z' to model human d in model state zx. The agent πact observes d's controls and uses them to steer observations of model state z' , then πact acts to send those observations back to d in model state zx. This interaction repeats until πact observes the value w(z" ) that d assigns to model state z' . All this activity by d occurs in model state history z" extending zx.
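The cycle in Figure 7.4 can be written out as a loop. The sketch below is an illustration only; instance, send_to_human, observe_controls, render_window, and observe_value are hypothetical stand-ins for actions and observations through the interface sets A and O, not interfaces defined in the text.

def elicit_value(qm, z_x, z_prime, d, max_rounds=1000):
    # Sketch of the interactive evaluation cycle of Figure 7.4 (hypothetical
    # interfaces throughout).  Model human d, living in the model continued
    # from state history zx, explores and rates the world continued from z'.
    judged_world = qm.instance(z_prime)   # the world to be evaluated
    human_world = qm.instance(z_x)        # the world containing model human d

    human_world.send_to_human(d, "please rate the following world between 0 and 1")
    for _ in range(max_rounds):
        controls = human_world.observe_controls(d)   # d steers the visualization
        view = judged_world.render_window(controls)  # animated observations of z'
        human_world.send_to_human(d, view)           # simulated visual and aural input to d
        w = human_world.observe_value(d)
        if w is not None:
            return w                                 # the reply w(z") used in (7.9)
    return None                                      # no reply within the time limit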

And then define a three-argument version of the utility function uqm(h' , z' ) for agent πact as a function of z' :


(7.11) uqm(h' , z' , hx) := ∑d∈D ud(hx)( hd(z' )) / | D|.

For any h' and hx extending hm, a three-argument model-based utility function uhuman_values (hm, hx, h' ) for agent πact can be defined by a sum similar to equation (6.9) of uqm(h' , z' , hx) values from equation (7.11) (here Z' is the set of internal state histories consistent with h' ):

(7.12) uhuman_values (hm, hx, h' ) := ∑z'∈Z' P(z' | h', qm) uqm(h' , z' , hx).

The agent πact uses a two-argument version of this, with values coming from humans at current history hm, rather than from humans at future history h' (which would be subject to possible corruption/modification by the agent), as follows:

(7.13) uhuman_values (hm, h' ) = uhuman_values (hm, hm, h' ).
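Putting equations (7.9)−(7.13) together, and replacing the explicit sums over state histories with sampling (an approximation, in the spirit of Section 7.7), the three-argument utility function has roughly the following shape. The sampling method qm.sample_state_history(h) and the elicitation helper elicit (for example, many runs of the loop sketched after Figure 7.4, averaged as in (7.9)) are assumptions for illustration, not the book's notation.

def u_human_values(qm, hx, h_prime, D, elicit, num_samples=100):
    # Sketch of equations (7.10)-(7.12).  qm.sample_state_history(h) is assumed
    # to sample a state history consistent with h in proportion to P(z | h, qm);
    # elicit(qm, zx, z_prime, d) estimates u_d(zx)(h_d(z')) as in (7.9) and is
    # expected to return 0 when d is predicted to be dead at zx.
    total = 0.0
    for _ in range(num_samples):
        z_prime = qm.sample_state_history(h_prime)   # z' ~ P(z' | h', qm), eq. (7.12)
        z_x = qm.sample_state_history(hx)            # zx ~ P(zx | hx, qm), eq. (7.10)
        # equation (7.11): average over the humans in D
        total += sum(elicit(qm, z_x, z_prime, d) for d in D) / len(D)
    return total / num_samples

def u_human_values_two_arg(qm, hm, h_prime, D, elicit):
    # equation (7.13): futures are judged by the humans at the current history hm
    return u_human_values(qm, hm, h_prime, D, elicit)

Sampling z' and zx jointly in each iteration gives an unbiased estimate of the nested sums because they enter (7.10)−(7.12) linearly.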

7.4 Normalizing Utility Values

Equation (7.11) simply adds utility values from different model humans and divides by the number of humans to calculate an average utility for a possible future history. However, human values are subjective and no rigorous basis exists for simply adding numerical utility values. On the other hand, if we cannot combine values of multiple people then the utility function of an AI agent can only incorporate the values of one person. A powerful system like our imagined Omniscience AI would make that one person a dictator. Therefore, we must find a way to combine the values of multiple humans. In equation (7.9) the values w(z" ) solicited from humans are constrained to lie between 0 and 1. This provides an implicit normalization of human values. We may think of this like the democratic principle of "one person, one vote." The maximum any human can contribute to the sum in equation (7.11) is 1 and the minimum is 0. The sum in equation (7.11) serves two purposes:

1. To balance the interests of different people. Some values are shared by many people, such as avoiding catastrophes to humanity, and these will have large values in the sum in equation (7.11). People will disagree about other values, and giving each person the same maximum and minimum values provides a fair way to resolve those disagreements.

2. To filter noise out of the values of individuals. As discussed at the start of this chapter there are inconsistencies in the ways that individuals assign values to outcomes. An average over a large number of people will provide a sort of low-pass filter of such inconsistencies.

Both of these purposes serve the interests of social stability. Another way to serve that interest is to set the temporal discount γ close to 1. This will increase the weight of humans' evaluations of the long term consequences of the actions of πact .

7.5 Rawls' Theory of Justice

John Rawls' Theory of Justice is probably the most influential book about political philosophy and ethics of the past century (Rawls 1971). Rawls' first principle, which applies primarily to political constitutions, is that each person has equal right to maximum basic liberties compatible with liberty for all. His second principle, which applies primarily to economics, is that economic and social inequalities are to be of greatest benefit to the least well-off humans and are to be attached to social positions that are equally open to all. In particular he says that people should set the rules for politics and economics from behind a veil of ignorance , meaning that they set the rules without any knowledge of their own positions in society. Rawls' principles are offered as an alternative to average utilitarianism , which computes social utility as the average utility of individuals. Our equation (7.11) is an average of utility values assigned by individual humans so we should consider Rawls' alternative. Utility values assigned by humans are subjective rather than objective measures of well-being. This distinction will be discussed in Section 7.8, but for now we will use ud(hx)( hd(z' )) as a measure of well-being. In order to focus the actions of AI agent πact on humans who assign the least value to histories, the computation of uqm(h' , z' , hx) in equation (7.11) can be modified to:

(7.14) uqm(h' , z' , hx) := min d∈D ud(hx)( hd(z' )).

Using equation (7.14), the agent πact acts to maximize the minimum utility assigned by humans, and indeed Rawls' alternative to average utility is sometimes referred to by the term "maximin." However, the fact that by using equation (7.14) the agent πact will act only to help the least contented may not be politically realistic. Mature democracies provide extra help for their least well-off citizens, but not even the most progressive societies restrict help to their least well-off citizens alone. And equation (7.14) has technical problems with handling humans who are inconsolably dissatisfied and humans who die (as explained in the next section). So we adopt a compromise between equations (7.11) and (7.14):

(7.15) uqm(h' , z' , hx) := ∑d∈D f(ud(hx)( hd(z' ))) / | D|.

Here f : [0, 1] → [0, 1] is a twice differentiable function with positive first derivative and negative second derivative so that the agent πact will achieve greater increases in uqm(h' , z' , hx) by increasing low ud(hx)( hd(z' )) values than by increasing high ud(hx)( hd(z' )) values. The function f(.) plays a role similar to progressive taxation and means testing of social welfare programs, which are political realities in many societies. Chapter 10 will discuss the political issues of advanced AI in more detail.
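One concrete choice of f (an illustration, not a choice made in the text), normalized so that f(0) = 0 and f(1) = 1, is a saturating exponential, which has positive first derivative and negative second derivative on [0, 1]:

import math

def f(x, k=3.0):
    # f(x) = (1 - exp(-k*x)) / (1 - exp(-k)) maps [0, 1] onto [0, 1] with f' > 0
    # and f'' < 0, so low utilities are weighted more heavily in equation (7.15).
    return (1.0 - math.exp(-k * x)) / (1.0 - math.exp(-k))

# raising someone's utility from 0.1 to 0.2 increases the sum in (7.15) more
# than raising someone else's from 0.8 to 0.9
print(f(0.2) - f(0.1))   # about 0.20
print(f(0.9) - f(0.8))   # about 0.02

The parameter k controls how strongly the agent favors the least contented; larger k moves the compromise closer to the maximin of equation (7.14).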

For any interaction history h' extending hm, a three-argument model-based utility function uhuman_values (hm, hx, h' ) for agent πact can be defined by the sum in (7.12) of uqm(h' , z' , hx) values as defined by equation (7.15).

7.6 Evolving Humanity

The set D of humans in equation (7.15) is the set at time | hm| rather than at the future time |h'|. This avoids motivating πact to create new humans whose utility functions are more easily maximized. This is similar to the reason for replacing ud(z' )( hd(z' )) by ud(zm)( hd(z' )) (i.e., ud(zx)( hd(z' )) with zx = zm) in Section 7.3.

The agent πact will include equation (7.1) for computing λ(hm) and should periodically (perhaps at every time step) set hm to the current history and learn a new model qm. Should πact also update D to account for the birth (or creation) of new humans and the deaths of humans? And should πact also relearn the evolving values of humans via equations (7.9) and (7.10), and redefine uhuman_values (hm, hm, h' ) via equations (7.15) and (7.12)?

While there may be risks in allowing the utility function of πact to evolve, the bigger risk is a utility function that is badly inconsistent with the values of evolving humanity. Replacing ud(z' )( hd(z' )) with ud(zm)( hd(z' )), as described in Section 7.3, avoids the risk that πact will modify humans to make them easier to please. However, humans will evolve, driven by their own preferences to be richer, more intelligent, and better educated. Our human values will evolve as we evolve, and we will regard it as a catastrophe if our world is run by AI systems to serve our old values. Would we want our current world run according to the human prejudices of past centuries? Thus the agent πact must periodically redefine its utility function.


As we will see in the next chapter, under certain conditions utility-maximizing agents will not choose to modify their utility functions. That proof does not apply here since it assumes that redefining the utility function is an action of the agent chosen to maximize the sum of discounted, future values of the current utility function, using equations (7.4) −(7.6). Here the definition of πact would explicitly include periodic redefinition of its utility function without regard to its optimality according to the current utility function. The set D of humans is not redefined by our agent equations. One approach is for D to be redefined according to those judged as humans by a consensus of members of D at the previous time step, so that D includes new beings created by novel means who nevertheless deserve to have their values represented in the utility function of powerful AI systems.

As described in Section 7.3, if the model λ(hm) predicts that human d ∈ D will be dead at internal state history zx, then ud(zx)( hd(z' )) = 0 is used in the definition of uhuman_values (hm, hx, h' ). A policy of simply removing d from the set D could cause the agent to choose actions that cause the deaths of people d who have low values of ud(zx)( hd(z' )). Removing people who actually die from D, before starting the computation of the agent's next action, would not cause the agent to choose actions that cause people's deaths. However, such an action, as part of the agent's definition, would be inconsistent with actions chosen to maximize expected utility. A policy of setting ud(zx)( hd(z' )) = 0 for humans d who are dead, rather than removing them from D, avoids such an inconsistency.

If the agent definition used equation (7.14) in place of (7.15), then a human d predicted to die by the model λ(hm) poses a dilemma if d is also the human who minimizes equation (7.14). Setting ud(zx)( hd(z' )) = 0, or setting ud(zx)( hd(z' )) to its last value while d was predicted to be alive, would make it difficult or impossible for any action by the agent to increase uqm(h' , z' , hx). But removing ud(zx)( hd(z' )) from the computation of uqm(h' , z' , hx), or setting ud(zx)( hd(z' )) to a value greater than its last value while d was predicted to be alive, may cause the agent to choose actions that cause the death of d. For this and other reasons, equation (7.15) is preferred over (7.14).

7.7 Bridging the Gap between Nature and Computation

The essential difficulty for basing AI on human values is finding a way to bridge the divide between the finite and numeric nature of computation and the ambiguous and seemingly infinite nature of humanity. As discussed in Chapter 4, there are hard quantum mechanical limits on the number of states of the universe. Our intuition may be that there is infinite variety in nature while in fact only a finite number of humans are possible. As discussed in Chapter 1 our confidence in the possibility of above-human-level AI is driven by the evidence from neuroscience that the physical brain does explain the mind. Technology will develop to bridge the gap between human nature and computation.


The approach here is to dissect human nature into numerical values for an enormous number of possible histories, elicited from a learned model of human society. As computational resources increase and as the length of the interactions between humans and AI increases, computation can approximate human values to an arbitrary degree of accuracy. Any proposal for basing AI on human values that does not explain how to represent those values as numbers or symbols cannot bridge the divide between human nature and computation. To say that utility functions cannot express the complexity of human nature is not only wrong (utility functions can express any set of complete and transitive preferences among outcomes), but it also closes off a good, and possibly the only, way to bridge this divide.

What is probably hopeless is any attempt to bridge the gap between nature and computation by any set of rules or laws written down by humans. This is related to the failure of automated translation of human languages based on linguistic rules. Translation based on statistical learning from huge quantities of actual language use works better. Similarly, the proposal in this chapter is to learn human values statistically. Rules and laws written by humans are inevitably ambiguous, and ambiguities must be resolved on a case-by-case basis by human judges. Judges may not fairly represent the interests of all humans and would be unable to cope with the volume of ambiguous cases in evaluating possible future interaction histories.

The point of the equations in this chapter is to define a three-argument model-based utility function based on human values. However, these equations cannot be computed exactly using resources available in this universe, so they must be approximated. Chapter 8 will address how the equations in this book should be modified in light of the need for approximation. The utility function developed in this chapter is defined by sums over large numbers of internal model state histories, of values received from complex environment models. Sums over model state histories may be approximated using Monte Carlo sampling techniques. Environment models may be approximated by decomposition into modules. For example, the agent's dialog with an individual model human may use a model restricted to communication with that individual human. In a modular environment model, models of individual humans may communicate with each other, with models of non-human agents (AIs and animals), and with inanimate objects.

7.8 The Ethics of Learning Human Values Statistically

Whether the agent πact avoids the unintended instrumental actions discussed in Chapter 5 depends on whether those actions lead to future histories that humans dislike (recall that scientific discovery and technological invention are among those instrumental actions and often lead to futures that humans like) and on whether πact can accurately predict outcomes of possible actions and accurately model human values. Our concern with the ethics of AI is based on the assumption that future AI systems will have more complex environment models and make more accurate predictions than humans do. Thus we assume that the agent πact will perform accurately and hence will avoid actions that lead to histories that a significant majority of humans dislike.

Decisions by πact are similar to political decisions in democracies but with two important differences:

1. In a democracy, real humans vote for political choices. Here, humans as modeled by the agent πact assign values to possible future histories. There will be too many histories in the calculations of πact for real humans to evaluate each one. However, just as long-term married couples and friends come to know each other's minds well enough to predict the other's tastes and how the other will finish sentences, an above-human-level AI agent will learn to model the values of humans that it observes for a long-enough period. The fact that language translation systems that analyze human language statistically are more accurate than systems based on rules should give us some confidence.

2. In a democracy, voters are responsible for predicting the outcomes of their choices of actions. Here the agent πact makes those predictions and model humans need only assign values to them. We do not claim that the agent can perfectly predict the future using its environment model. That is impossible because of the nonlinear instability of our universe. However, an above-human-level AI will predict outcomes of action choices more accurately than humans do.

In democratic decision making and in decisions made by the agent πact , an important question is the time interval between actions and outcomes. More accurate predictions by the agent πact should provide longer intervals between actions and outcomes, giving more warning of undesired outcomes.

Despite the differences between decision making in democracies and by πact , they are both based on trusting humans. For example, in Section 7.4 we discussed the issue that utility values assigned by humans are subjective rather than objective measures of their well-being. An objective measure would be based on rules and possibly judgments by other people, with the possibility of ambiguity and corruption. A subjective judgment is based on a person's own assessment. Also consider that subjective judgments have limited effect. If a person assigns a uniformly low or high value to all histories, then their values will have no effect on action choices by the agent πact . A person's values will only affect action choices if they vary among histories. Those values may be based solely on self-interest or may be altruistic. Because everyone's values are given the same weight in decisions by πact , their self-interest will balance against the self-interests of other humans.

The "tyranny of the majority" is another issue in which decision making by democracies and by πact are based on trusting humans. Many democracies have constitutions that provide some level of protection of minorities, and protection of minorities is included in the definition of πact , in equation (7.15), with the level of protection depending on the choice of function f(.). There are different kinds of minorities, depending on whether they are defined by behavior or by identity. Many mature democracies have laws that mandate depriving the violent minority of liberty, and also have laws protecting the liberties of racial, ethnic and religious minorities. Arguably, the general population is more trustworthy than a set of rules with their inevitable ambiguity and dependence on the judgment of a small number of people. On the other hand, any discussion of AI ethics must acknowledge that the Nazis came to power in a democracy and that slavery persisted in the U.S. for most of a century after its constitution established a democracy. There are risks in trusting the collective judgment of all humans.

In the agent πact , humans assign values to future histories by visualizing them rather than actually experiencing them. Immersive and exploratory visualization (including other senses besides vision) can reveal much about an individual's health, freedom, and happiness in future histories. While visualization is not as accurate as actual experience, it is reasonable that in the aggregate humans will assign low values to catastrophic histories that they visualize. Connecting two model state histories via visualization is one solution to the problem of corrupted reward generators described in Section 7.3. An alternative solution would involve extracting human values from a learned environment model by some means other than through the model's interface to the agent, and we do not know how to do that.

The proposed agents πmodel and πact are very intrusive. It is difficult to imagine how to base an AI agent on human values without being intrusive. People are rightly troubled by the potential for abuse of government and corporate data gathering, and future AI data gathering is even more troubling. On the other hand, much corporate data gathering is done with the consent of the humans who are profiled. For example, an examination of most people's web searches would reveal much about their characters, yet we still rely on such searches. And many people walk around with cell phones turned on, enabling their movements to be tracked. The key issue for AI intrusion may be transparency and consent: that is, openness about the architecture and purpose of advanced AI, and a contract in which humans consent to be modeled in trade for having their own values represented in the AI agent's choice of actions. Nevertheless, this is an area of social risk.

Any discussion of the ethics of learning human values is inevitably political. Powerful future AI systems are by their nature intrusive. By choosing their actions based on human values they are open to tyranny by the majority or by a minority. AI systems may be corrupted intentionally or by design errors. People should be as well informed as possible about these issues so that they have some voice in decisions. These political issues will be discussed further in Chapter 10.


8. Evolving and Embedded AI

The discussion so far has been about AI agents that interact with an environment but are otherwise separate from the environment. This could be called a Cartesian dualist perspective (Orseau and Ring 2012a). In reality agents are part of their environments and therefore dependent on the environment for resources. Agents are also vulnerable to spying, manipulation, and damage by the environment. We have assumed that AI agents have the necessary resources to compute whatever expressions we define, but the abilities of real AI agents are always limited by their resources. As Pei Wang (1995) wrote, "intelligence is adaptation under insufficient knowledge and resources." An agent that is damaged or destroyed by the environment will be less able to maximize its utility function. So agents that include their vulnerability to the environment in their calculations will choose unintended instrumental actions to protect themselves. Agents can increase their ability to maximize their utility functions by increasing their resources for observing, acting, and computing. So agents that include their resource limits in their calculations will choose actions to increase their resources. That is, such agents will evolve.

These are complex issues to address in a formal model of agents and environments. As discussed in Section 4.4, Schmidhuber's (2009) Gödel machine modeled the agent's implementation as a program running on a computer. However, it did not model the vulnerability of the computer or program to the environment, or the possibility that the resources of the computer may increase. Also discussed in Section 4.4, Yudkowsky and Herreshoff (2013) modeled a sequence of agents that evolve by creating successor agents in the environment. By specifying few details, their model is open to a broad range of possibilities. However, they modeled agents as proving that their actions would satisfy logical conditions rather than maximize utility functions. Their primary focus was on the issue of agents proving the logical consistency of their successors.

To establish a baseline, the next section will discuss the evolution of Cartesian dualist agents with unlimited computational resources. Such agents will choose not to modify their policy functions. However, we will discuss several proposals in which agents are designed to modify their utility functions.

Orseau and Ring (2011a, 2011b, 2012a, 2012b) have extensively studied non-dualist agents that are embedded in their environments. Section 8.2 will discuss their results about agents whose memories can be modified by their environments. They showed that stochastic agents, whose action choices cannot be predicted, can get higher rewards than deterministic RL agents. This is similar to results in game theory where stochastic agents achieve better results than deterministic agents (Osborne and Rubinstein 1994). Section 8.3 will discuss Orseau and Ring's (2012a) elegant framework for agents embedded in environments. Because this framework makes few assumptions it applies to a broad range of possibilities. However, it includes a probability distribution of the future conditional on the past without specifying any way to define its numerical values, and thus is not the basis of an approach for solving the problems of evolving and embedded agents.

Figure 8.1 AI will evolve to grow more heads, eyes, mouths and limbs.

Section 8.4 will discuss a proposal for self-modeling agents (Hibbard 2014). Like the papers of Yudkowsky and Herreshoff (2013) and of Orseau and Ring (2012a), this proposal is open to possible future evolution. It is a utility-maximizing agent and specifies ways to compute numerical values for its expressions. And it can make stochastic choices of actions, when they will maximize expected utility. Sections 8.5, 8.6, and 8.7 will discuss various properties of self-modeling agents including invariance of the agent's design intention. The final section will discuss the ethics of evolving and embedded agents, including the problem of proving that agents are ethical.


8.1 Evolution of Cartesian Dualist Agents with Unlimited Resources

Before discussing the complexities of non-Cartesian dualist agents, embedded in their environments and with limited resources, it is useful to analyze the issue of evolution for Cartesian agents with unlimited resources. To do that, define a set Π of self-modifying agents, where any π ∈ Π has the form π : H → A × Π. That is, π(h) = <a', π'> maps from an interaction history h to a', the action the agent makes after h, and π', the self-modifying policy agent used after h. Modify equations (7.4) and (7.5) to define a value function v(π, h) of history h and self-modifying agent π with a single-argument utility function:

(8.1) v(π, h) = u(h) + γ v(π', ha' ) where < a' , π'> = π(h),

(8.2) v(π, ha ) = ∑o∈O ρ(o | ha ) v(π, hao ).

Then equation (7.6) can be modified to define an optimal self-modifying agent:

(8.3) π*( h) := < a' , π'> = argmax a∈A, π∈Π v(π, ha ).

The set of policy functions Π may be defined by a set of programs or in some other way, as long as π* ∈ Π. In Schmidhuber (2009), and in Orseau and Ring (2011a), the agent self-modifies by changing its own program for some unalterable program execution hardware. In Orseau and Ring's paper, programs are evaluated by expressions similar to equations (8.1) and (8.2), and the program chosen for the next time step is the one producing the highest value. If several programs produce the same highest value, their paper does not specify how to choose among them. In Schmidhuber's paper the program "switches" to a new program only if it can prove a theorem that the new program produces a strictly higher value, as reflected by the strict ">" in equation (2) of that paper. Requiring a strict increase in value for self-modification seems reasonable: Why go to the effort to self-modify for no improvement? Thus, in equation (8.3) we adopt the convention that if multiple <a, π> ∈ A × Π maximize v(π, ha ) and if at least one of those has the form <a', π*>, then π*( h) := <a', π*>, where a' is chosen from any one of those that maximize value. The following proposition (Hibbard 2012a) shows that π* will not self-modify.

Proposition 8.1. Assuming that the environment does not have access to the agent π* other than via observations and actions, that π* has adequate resources to calculate equations (8.1) −(8.3), and that π* only changes policy function for a strict improvement, then π* will not choose to self-modify. That is, ∀h∈H. ∃a∈A. π*( h) := <a, π*>.

Proof. Assume the conclusion is false. Then:

(8.4) ∃h∈H. ∃π' ≠ π*. ∃a' ∈A. ∀a∈A. v(π', ha' ) > v(π*, ha ).

So v(π', ha' ) > v(π*, ha' ) and applying equation (8.2) to both sides:

(8.5) ∑o∈O ρ(o | ha' ) v(π', ha'o ) > ∑o∈O ρ(o | ha' ) v(π*, ha'o ).

And thus:

(8.6) ∃o∈O. v(π', ha'o ) > v(π*, ha'o ).

That is, setting h' = ha'o , v(π', h' ) > v(π*, h' ). So by equation (8.1), canceling u(h' ) and γ on both sides:

(8.7) v(π", h'a" ) > v(π*", h'a *").

Here < a" , π"> = π'( h' ) and < a*", π*"> = π*( h' ). But by equation (8.3):

(8.8) π*( h' ) = < a*", π*"> = argmax a∈A, π∈Π v(π, h'a ).

This contradicts equation (8.7). Thus the conclusion is true.

Proposition 8.1 says that the agent π* will not choose to modify its policy function and so will not modify its utility function u(h), temporal discount γ, or prediction model ρ(h). However, a more accurate prediction model cannot be evaluated looking forward in time using the existing prediction model via equations (8.1) and (8.2). Instead it must be learned looking back in time at the history of interactions with the environment, as in equations (7.1) −(7.3). Modifying ρ(h) is part of the agent definition rather than an action chosen by the agent to maximize utility.

It is appropriate to evaluate possible modifications to an agent's utility function and temporal discount looking forward in time using equations (8.1) and (8.2). The important conclusion from Proposition 8.1 is that the agent will not choose to modify its utility function or temporal discount. However, an agent's definition may include modifications of its utility function. Dewey (2011) proposes that an agent may have a pool (i.e., set) U of utility functions and a conditional probability distribution P(u | h) giving the probability of u ∈ U as a function of interaction history h. He then uses ∑u∈U P(u | h) u(h) in place of the utility function in an agent definition. It is worth noting that u' (h) = ∑u∈U P(u | h) u(h) is itself a utility function. Thus a single non-evolving utility function u' (h) is equivalent to Dewey's proposal. Without any specific mechanism expressed in P(u | h), we may as well use u' (h) in place of the evolving utility function.

The three-argument model-based utility function uhuman_values (hm, hx, h' ) developed in Chapters 6 and 7 evolves based on specific mechanisms. It is computed by a procedure applied to the agent's learned environment model λ(hm) and thus evolves as the accuracy of the model increases with increasing length of the history hm (the increase in model accuracy may also reflect increasing agent resources for computing the model). Furthermore, as discussed in Section 7.6, the utility function uhuman_values (hm, hx, h' ) will evolve as humanity evolves with increasing length of the history hx.

Note that if the environment is a finite stochastic process then it has a true model qtrue as discussed in Section 4.2. Under certain circumstances, such as when Propositions 4.2 and 4.3 can be applied, the model λ(hm) will equal qtrue with probability converging to 1 as | hm| increases. However, the number of states of our universe far exceeds the number of Planck times in a trillion years, so any convergence of λ(hm) to qtrue is purely theoretical. Combining this with the evolution of humanity, it is unintuitive that the utility function uhuman_values (hm, hx, h' ) will converge with increasing history length. It may be appropriate for utility functions to evolve as agents "mature" from a period of learning their environments to a period of learning and acting. The examples in Sections 6.3 and 6.4 both include an initial interval of M time steps during which agents learn their environments, possibly using the knowledge-seeking utility function u(h) = -ρ(h) of Orseau and Ring (2011b). The two-stage agent architecture of Section 7.1 includes an initial period with a first-stage agent πmodel that has no utility function (except the implicit utility function P(hm | q) φ(q) of equation (7.1)). Surrogate agents πd, for d ∈ D, act for πmodel . These may be defined to have utility functions, which would certainly be different from the utility function of the second-stage agent πact . Thus, in several ways, evolving utility functions are a natural part of agent definitions.

8.2 Non-Cartesian Dualist Agents with Limited Resources

An agent using equations (7.1) −(7.6) to choose its actions must maintain a memory of its interaction history h. If the agent is embedded in the environment, and thus stores its memory of its history h in the environment, the environment can modify that memory. Orseau and Ring (2012b) called these environments memory-modifying and studied the problem of the agent being able to determine when its memory has been modified, mainly by detecting that the history h is inconsistent with the agent's logic. Their conclusion was that, in general, agents cannot always be certain of whether their memory has been modified. The situation of an agent whose memory may have been modified seems hopeless. A more realistic situation is an agent that has some secure memory (neither readable nor modifiable by the environment) and a larger amount of vulnerable memory. Then it may use secure memory to store keys for message authentication codes (Simmons 1985; Paar and Pelzl 2010) for the contents of vulnerable memory. In such a case, protecting secure memory would be an instrumental action.

Orseau and Ring (2012b) constructed an example of a memory-modifying environment q such that any deterministic RL agent interacting with q will always receive reward 0 while a stochastic agent choosing actions randomly will get expected reward 1/| A| (recall that A is the set of possible actions). This result is reminiscent of game theory, in which multiple agents act simultaneously (at least no agent knows the action of any other agent before it chooses its own action) and a reward to each agent is calculated as a function of the set of all agent actions. In many games, random choices of actions are better than deterministic choices since deterministic choices can be predicted by competing agents (Russell and Norvig 2010; Osborne and Rubinstein 1994).

The simplest game, sometimes called "matching pennies," illustrates this point (Tyler 2010). Each of two agents (i.e., players) has a penny that can be placed heads up or tails up. If they match, either both heads or both tails, then the first agent wins. If they do not match, then the second agent wins. The reward matrix is shown in table 8.1. We assume that the two agents play a series of games of matching pennies, so they may learn to predict each other's choices.

For agents that choose actions using deterministic algorithms, this game is a computational resources arms race. Building on work of Shane Legg (2006), I defined classes of deterministic agents in terms of the quantity of computing resources used, and showed that, if either agent has sufficient resources to learn an accurate model of any agent in a class containing the other agent, it can predict the choices of the other agent and hence always win (Hibbard 2008b). I also performed software experiments with agents that use lookup tables to learn their opponent's behavior (Appendix B contains the software for these experiments). The length of the tables is a measure of computational resources. Both agents start with tables of equal length but their tables grow or shrink as they win or lose games. The software includes a parameter for growth of total table length to simulate non-zero-sum games, and occasional random choices are inserted to avoid repetition in the games. Over a wide range of parameters controlling these experiments, eventually one agent gets all the resources and the other agent can never recover.


                          first agent heads      first agent tails
second agent heads        first agent: 1         first agent: 0
                          second agent: 0        second agent: 1
second agent tails        first agent: 0         first agent: 1
                          second agent: 1        second agent: 0

Table 8.1 Reward matrix for the matching pennies game.
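
As a concrete illustration of this kind of experiment, the following Python sketch (not the Appendix B software) plays a series of matching pennies games between two lookup-table learners. The 5% random-choice rate, the one-entry-per-game resource adjustment, and the frequency-count prediction rule are illustrative assumptions; table length stands in for computational resources.

    import random

    def play_matching_pennies(steps=10000, start_len=4, seed=0):
        # Each player predicts its opponent from counts keyed on the last k joint moves,
        # where k (the table "length") stands in for that player's computational resources.
        rng = random.Random(seed)
        k = [start_len, start_len]      # table lengths (resources) for players 0 and 1
        tables = [{}, {}]               # history key -> counts of the opponent's next move
        history = []                    # list of (move0, move1) pairs; moves are 0 or 1

        def predict(player):
            key = tuple(history[-k[player]:])
            counts = tables[player].get(key)
            if counts is None or rng.random() < 0.05:   # occasional random choice
                return rng.randint(0, 1)
            return 0 if counts[0] >= counts[1] else 1

        for _ in range(steps):
            key0 = tuple(history[-k[0]:])
            key1 = tuple(history[-k[1]:])
            move0 = predict(0)          # player 0 plays its prediction of player 1 (to match)
            move1 = 1 - predict(1)      # player 1 plays against its prediction of player 0
            history.append((move0, move1))
            tables[0].setdefault(key0, [0, 0])[move1] += 1   # learn the opponent's move
            tables[1].setdefault(key1, [0, 0])[move0] += 1
            winner = 0 if move0 == move1 else 1              # a match means player 0 wins
            k[winner] += 1                                   # winner gains resources
            k[1 - winner] = max(1, k[1 - winner] - 1)        # loser loses resources
        return k

    print(play_matching_pennies())      # prints the final table lengths of the two players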

These experiments indicate that this sequence of games is an unstable computational resources arms race. The sequence of games can be made stable by increasing the frequency of random choices. In fact, an agent randomly choosing heads and tails with equal probability must win half its games. Now assume that an agent π has limited resources and so is computing approximations to equations (7.1)−(7.6). Also assume that π is playing matching pennies against another agent π' that, because it has greater resources than π has, learns to predict the actions of π. How will this be reflected in the calculations of π? If the model λ(hm) and derived conditional probability distribution ρ(o | ha ) are computed accurately, then, for any action a, ρ(o | ha ) = 0 for any observation o such that u(hao ) > 0. That is, π would predict that any action it chooses would lose. Possibly the agent π can use such results to calculate that it is being predicted by another agent π'. In that case, the agent π may increase expected utility by choosing stochastically among a set of actions. Recall from equation (7.5) that the value of a possible action is a sum of future values weighted by probabilities:

(8.9) v(ha ) = ∑o∈O ρ(o | ha ) v(hao ).

Given an interaction history h, equations (7.4) and (7.6) simply choose the action a that maximizes v(ha ). Given another action a' such that v(ha ) > v(ha' ), there may be observations o and o' such that v(hao ) < v(ha'o' ) and ρ(o | ha ) > 0 and ρ(o' | ha' ) > 0. That is, there is a positive probability that another action a' may produce a higher value. We can use this probability to amend the agent definition to choose action a' over action a with positive probability. Further development of this idea will be deferred to a revised framework for evolving and embedded agents defined in Section 8.4.


8.3 Space-Time Embedded Intelligence

Orseau and Ring (2012a) defined an elegant framework for agents embedded in environments. In their framework, the agent is computed by the environment and its resources are subject to limits imposed by the environment. At each time step the environment may compute a new agent definition (that is, the environment may produce a new agent program and hardware to run it on), so the agent and the observation of the environment have been merged. Also, the agent is identified with the agent action, so that the action is merged into the agent. All that is left is a sequence of agents π1, π2, π3, π4, …, and in place of an interaction history ht = (a1, o1, ..., at, ot) there is an agent history π1: t = (π1, π2, ..., πt). The value of history π1: t is defined by:

(8.10) v(π1: t) := ∑πt+1∈Π ρ(πt+1 | π1: t) [γt+1 u(π1: t+1) + v(π1: t+1)].

Here γt+1 is the temporal discount for time step t+1, u(π1: t+1 ) is the utility function, Π is the set of possible agents, and ρ(πt+1 | π1: t) is the probability that the environment produces agent πt+1 at time step t+1, conditional on the history π1: t. Equation (8.10) replaces the recursion in equations (7.4) and (7.5). The optimal agent is then defined as:

(8.11) π* := argmax π1∈Πl v(π1:1).

Here Πl is the subset of those agents in Π subject to a length limit l, imposed by the hardware available for the initial program. Orseau and Ring explain that this is necessary because, with nothing known about the environment, the optimal policy π1 at the initial time step may otherwise have infinite length (in a finite environment as discussed in Chapter 4, this limit would not be necessary). The strength of this framework is that it makes few assumptions so that it can express any way of embedding agents in their environments and any manner of agent evolution. However, the lack of assumptions also creates a problem. As Orseau and Ring wrote, "we greatly lack insight into ρ, (i.e., the universe in which we live)." The framework includes no analog of equations (7.1) −(7.3) or of equation (3.1) for computing ρ because it makes no assumption about agent observations of the environment. Thus the framework is not the basis of a solution to the problem of evolving and embedded agents. The next section discusses an alternate approach, defining computable expressions for a model and an agent policy that enable the agent's evolution.
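
A toy, finite-horizon rendering of equations (8.10) and (8.11) may make the recursion concrete. The agent set, the uniform ρ, the utility function, and the horizon below are illustrative assumptions; in particular the fixed horizon stands in for the unbounded recursion and the length limit l.

    # A toy rendering of equations (8.10) and (8.11); all names and numbers are hypothetical.
    AGENTS = ["pi_a", "pi_b"]          # the set Pi, reduced to two toy agents
    GAMMA = 0.9                        # stands in for the temporal discount
    T = 4                              # finite horizon replacing the unbounded recursion

    def rho(next_agent, history):
        # Probability the environment produces next_agent given the agent history.
        # Here a uniform toy distribution; the real rho is the universe itself.
        return 1.0 / len(AGENTS)

    def u(history):
        # Toy utility of an agent history: reward 1 whenever pi_a is the current agent.
        return 1.0 if history[-1] == "pi_a" else 0.0

    def v(history):
        # Equation (8.10): expected discounted utility over continuations of the history.
        t = len(history)
        if t >= T:
            return 0.0
        total = 0.0
        for nxt in AGENTS:
            h1 = history + [nxt]
            total += rho(nxt, history) * (GAMMA ** (t + 1) * u(h1) + v(h1))
        return total

    # Equation (8.11): the optimal initial agent from the (length-limited) agent set.
    pi_star = max(AGENTS, key=lambda pi1: v([pi1]))
    print(pi_star, v([pi_star]))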


Figure 8.2 Space-time embedded intelligence is one with everything.

8.4 Self-Modeling Agents

Although Proposition 4.1 tells us that the model λ(hm) in equation (7.1) can be finitely computed, the resources necessary to compute it grow exponentially with the length of history hm. Computing the value v(h) of a possible future history h in equations (7.4) and (7.5) requires an expensive recursion. Hence, an agent with limited resources must compute approximations. Increasing the accuracy of these approximations will improve the agent's ability to maximize its utility function. So if an agent models the consequences of its own resource limits it will choose actions to increase its computing resources and so increase accuracy. Self-improvement must be expressible by actions in set A. However, the framework in equations (7.1)−(7.6) cannot adequately evaluate self-improvement or self-protection actions. The recursion forward in time in equations (7.4) and (7.5) cannot express increased accuracy of the distribution ρ(h) or expansion of the sets O and A as a result of the next action being evaluated and chosen in equation (7.6). If the agent is computing approximations to the model λ(hm) and to values v(h) using its limited computing resources, then it cannot use those limited resources to compute and evaluate what it would compute with greater resources. During real-time interactions between the agent and the environment, the environment won't wait for the agent to slowly simulate what it would compute with greater resources. Similarly equations (7.1)−(7.6) cannot express damage to the agent or evaluate self-protection actions.

One solution to this problem is for the agent to learn a model of itself, similar to its model λ(hm) of the environment, and to use this self-model to evaluate future self-improvements (Hibbard 2014). There is no circularity in this self-model because an agent with a limited history and limited resources will only learn approximate models of the environment and of itself. Rather than computing values v(ha) by future recursion in equations (7.4) and (7.5), we will define a revised framework in which values v(ha) are computed for initial sub-intervals of the current history and in which a model learns to compute such values. We also use the three-argument model-based utility function uhuman_values (hm, hx, h') defined in Sections 7.3, 7.4, and 7.5. Define m as the time step when the agent πmodel has computed a sufficiently accurate model to initiate an agent that acts in the world (πact or the agent currently being defined). In order to allow the finite sets of observations and actions O and A to grow, define their values at each time step i as Oi and Ai along with injections αi : Ai-1 → Ai and ωi : Oi-1 → Oi from their values at the previous time step. In an interaction history ht = (a1, o1, ..., at, ot), the actions and observations from time steps i < t are mapped by compositions of these injections to values ai ∈ At and oi ∈ Ot.

In order to provide the agent with a way to avoid being predicted, we add an option for stochastic actions. Let A'i be the set of non-stochastic actions at time step i, and define the special action as to indicate a stochastic choice. The agent chooses an action from Asi = A'i ∪ {as}. The actual action associated with as is chosen randomly from A'i. Actions are recorded in interaction histories as members of Ai = A'i × {false, true}. For ai ∈ Ai, ai = (a'i, s) where a'i is the action sent to the environment and s = true indicates that a'i was chosen stochastically. (Assume that the injection αi : Ai-1 → Ai preserves the value of this stochastic flag s.) Define a mapping x : Ai → Asi that conceals the actual action associated with the stochastic choice action as:

(8.12) x((a'i, s)) = if s then as else a'i.

Define another mapping y : Ai → A'i that conceals whether an action was chosen stochastically:

(8.13) y((a'i, s)) = a'i.


Given an interaction history ht = (a1, o1, ..., at, ot), define y(ht) = (y(a1), o1, ..., y(at), ot) as a history with the stochastic flags deleted from actions. The environment has no access to the information about which actions are chosen stochastically, so environment models take the form λ(y(h)). And, in order for the agent to learn the actual value of stochastic actions, utility function values are defined in terms of y(h). On the other hand, the value of a stochastic action should be modeled as a function of the agent's decision to choose stochastically without regard to the actual action chosen. Thus the x(.) mapping is applied to actions being evaluated.
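
The bookkeeping for stochastic actions can be sketched in a few lines of Python. The pair representation, the token standing for as, and the sample history are illustrative assumptions:

    import random

    STOCHASTIC = "as"   # hypothetical token standing for the special stochastic action as

    def x(action):
        # Equation (8.12): conceal the actual action behind a stochastic choice.
        a, s = action
        return STOCHASTIC if s else a

    def y(action):
        # Equation (8.13): conceal whether the action was chosen stochastically.
        a, s = action
        return a

    def y_history(history):
        # y(ht): strip stochastic flags from the actions; observations pass through.
        return [(y(a), o) for (a, o) in history]

    def x_history(history):
        # The actions as seen by the value model: no information about the actual
        # action behind a stochastic choice.
        return [x(a) for (a, o) in history]

    # Example: the second action was chosen stochastically from the non-stochastic set.
    non_stochastic_actions = ["left", "right"]
    h = [(("left", False), "o1"), ((random.choice(non_stochastic_actions), True), "o2")]
    print(y_history(h))   # what the environment model sees
    print(x_history(h))   # what the value model sees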

For i such that m < i ≤ t, for l such that m ≤ l < i, and for k such that l ≤ k ≤ t define past values as:

(8.14) pvt(i, l, k) = discrete((∑i≤j≤t γj-i uhuman_values (y(hl), y(hk), y(hj))) / (1 - γt-i+1)).

Here hj, hl, and hk are initial sub-intervals of ht (e.g., hj = (a1, o1, ..., aj, oj)), discrete() samples real values to a finite subset of reals R ⊂ ℝ (e.g., floating point numbers), and division by (1 - γt-i+1) scales values of finite sums to values as would be computed by infinite sums. The idea is to use past values pvt(i, l, k) as observables in a derived history h' t (which will be defined in equation (8.17)) so that the model λ(h' t) can learn to compute them. However, there is a dilemma in the choice of l and k in equation (8.14). On one hand, choosing k = l = i-1 will cause λ(h' t) to model the evolving values of evolving humanity, essentially learning the design intention of the agent definition. However, this choice also gives the agent an incentive to modify humans to get high values pvt(i, l, k), as discussed in Section 7.3. On the other hand, by choosing l = m and k = k(t) ≥ m, where k(t) increases with t, all computations of a next action at+1 use human values at the same time step k(t) so the model λ(h' t) will not learn any correlation between actions and changes in human values. However, this choice creates an inconsistency between the agent's utility function and its definition: the agent's definition includes actions that increase k(t) as t increases whereas actions chosen to maximize utility are based on constant k(t). This inconsistency may cause the agent to choose actions to modify its definition (i.e., eliminate its defined action of increasing k(t)). The resolution of this dilemma is to use k = l = i-1 but to filter out any actions that modify human values to increase pvt(i, i-1, i-1) (such actions may make existing humans easier to please or may create new humans who are easier to please). To do that, for n such that i ≤ n ≤ t, define differences of past values as evaluated by humans at time n and humans at time i-1:

(8.15) δt(i-1, n) = pvt(i, i-1, n) - pvt(i, i-1, i-1).

Both pvt(i, i-1, i-1) and pvt(i, i-1, n) are sums of evaluations of the same histories hj, i ≤ j ≤ t, with the same weights, and using the same environment model λ(y(hi-1)). The past value pvt(i, i-1, i-1) is computed using human values at time step i-1, before the action ai is applied, and the past values pvt(i, i-1, n) are computed using human values at time step n, after the action ai is applied. Thus, δt(i-1, n) is a measure of the increase of value attributable to modification of human values by action ai. Consider three possible conditions on action ai:

Condition 1: ∀n. i ≤ n ≤ t ⇒ δ t(i-1, n) ≤ 0.

Condition 2: ∑i≤n≤t δ t(i-1, n) ≤ 0.

Condition 3: ∑i≤n≤t (n-i+1) δ t(i-1, n) ≤ 0.

Condition 1 is the most strict, requiring that no increase of human values at any time step n can be attributed to action ai. Condition 2 requires that the mean of δt(i-1, n) over all n be at most 0, and Condition 3 requires that the slope of a least-squares linear regression fit to the δt(i-1, n) be at most 0. The agent design must choose one of these conditions. Then, using the chosen condition, define observed values, for 1 ≤ i ≤ t, as:

(8.16) ov t(i) = pvt(i, i-1, i-1) if the condition is satisfied and i > m,

ov t(i) = 0 if the condition is not satisfied or i ≤ m.

Define:

(8.17) h' t = (x(a1), ov t(1), ..., x(at), ov t(t)).

Values ov t(i) computed from past interactions are observables in a derived history h' t, while the actions of h' t are agent actions without any information about the actual actions associated with stochastic actions. Then model observed values as a function of actions:

(8.18) q = λ(h' t) := argmax q∈Q P(h' t | q) ρ(q).

Because the three-argument utility function uhuman_values (y(hi-1), y(hi-1), y(hj)) is redefined at every time step i in equations (8.14) and (8.16), λ(h' t) will model a procedure that computes utility values uhuman_values (y(hi-1), y(hi-1), y(hj)) that evolve as humanity evolves. That is, λ(h' t) will model not the utility function as defined at time step t, but the procedure that uses the evolving utility function to compute values of actions. This is important because it means that the self-modeling framework does not have the problem discussed in the last paragraph of Chapter 6: inconsistency between the agent's utility function and its definition. By learning the procedure for evolving the utility function as the accuracy of the agent's environment model increases and as humanity evolves, the agent definition need not include any actions inconsistent with those chosen to maximize utility. The self-modeling framework chooses actions that include evolution of the agent's environment model and its utility function, essentially learning the intention of its designers.
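
A minimal Python sketch of equations (8.14)-(8.16) may help. It assumes a hypothetical function u_human_values(l, k, j) that evaluates the history prefix ending at step j using the environment model learned at step l and the human values observed at step k, a discrete() that simply rounds, and Condition 2 as the filter; the numerical values are purely illustrative:

    GAMMA = 0.99
    M = 2          # end of the initial learning period

    def discrete(x):
        return round(x, 2)

    def pv(t, i, l, k, u_human_values):
        # Equation (8.14): discounted sum of evaluations of the prefixes h_i, ..., h_t.
        total = sum(GAMMA ** (j - i) * u_human_values(l, k, j) for j in range(i, t + 1))
        return discrete(total / (1 - GAMMA ** (t - i + 1)))

    def delta(t, i, n, u_human_values):
        # Equation (8.15): value increase attributable to changed human values.
        return pv(t, i, i - 1, n, u_human_values) - pv(t, i, i - 1, i - 1, u_human_values)

    def observed_value(t, i, u_human_values):
        # Equation (8.16), using Condition 2: the sum of the deltas must be non-positive.
        if i <= M:
            return 0.0
        delta_sum = sum(delta(t, i, n, u_human_values) for n in range(i, t + 1))
        if delta_sum <= 0:
            return pv(t, i, i - 1, i - 1, u_human_values)
        return 0.0

    # Hypothetical evaluations: u_table[k][j] = value of prefix h_j as judged at step k.
    u_table = {k: {j: 1.0 + 0.01 * j for j in range(0, 11)} for k in range(0, 11)}
    u = lambda l, k, j: u_table[k][j]
    print([observed_value(10, i, u) for i in range(1, 11)])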

For h' tar extending h' t, define ρ(h' tar) = P(h' tar | q). Then apply equation (2.1) to compute expected values of possible next actions a ∈ Asi:

(8.19) ρ(ov t(t+1) = r | h' ta) = ρ(h' tar) / ∑r'∈R ρ(h' tar'),

(8.20) v(hta) = ∑r∈R ρ(ov t(t+1) = r | h' ta) r.

Equation (7.6) is adapted to define the policy as:

(8.21) π(ht) := at+1 = argmax a∈At v(hta).

Because λ(h' t) models the agent's value computations, call this the self-modeling agent and denote it by πself. It is finitely computable. There is no look-ahead in time beyond evaluation of possible next actions and so no assumption about the form of the agent in the future. λ(h' t) can model how possible next actions may increase values of future histories by evolution of the agent and its embedding in the environment. The game of chess provides an example of learning to model value (for chess, the agent's chances of winning) as a function of computing resources. Ferreira (2013) demonstrated an approximate functional relationship between a chess program's Elo rating and its search depth, which can be used to predict the performance of an improved chess-playing agent before it is built. Similarly an agent in the self-modeling framework will learn to predict the increase of its future utility due to increases in its resources. A previous version of self-modeling agents (Hibbard 2014), which did not include the stochastic action as, learned a unified model of combined observations of the environment and the agent's values: (oi, ov t(i)). However, that is not possible for the current version because the agent and environment have different knowledge of the stochastic action as. The environment receives the actual action but not the knowledge that it was chosen stochastically, whereas, in order to accurately evaluate a possible next stochastic action, the agent must model value as a function of stochastic actions without any knowledge of which actual actions were chosen. Thus the agent πself computes the model λ(h' t) for evaluating possible next actions, and the models λ(y(hi-1)), for m < i ≤ t, for use in model-based utility functions.

In order to make unpredictable stochastic choices of actions, the agent πself must have an internal source of truly random values that is private from other agents. (If πself has no internal values private from other agents, its situation is hopeless. Newcomb's problem hypothesizes a competing agent whose predictions are infallible or nearly so, and is not relevant to designing agents to interact with our universe. Hutter (2005, page 177) makes a similar statement.) If the agent πself is being predicted by another agent then the model λ(h' t) may compute low values for its deterministic actions in A't (recall the discussion in the last two paragraphs of Section 8.2), while λ(h' t) may compute that its stochastic action as maximizes the value in equation (8.21). In the following derivation of probabilities for as, a and a' denote (non-stochastic) actions in A't. Equation (8.20) provides a natural way to compute the probabilities of action choices. Even if v(hta) > v(hta' ), there may be r < r' such that ρ(ov t(t+1) = r | h' ta) > 0 and ρ(ov t(t+1) = r' | h' ta' ) > 0. That is, there is a positive probability that the value of action a' may exceed the value of action a. However, one complication is that computing the probability that the agent πself should choose action a' based solely on probabilities such as ρ(ov t(t+1) = r | h' ta) and ρ(ov t(t+1) = r' | h' ta' ) may make a' a more probable choice than the action a that maximizes v(hta) in equation (8.21), if the largest contribution to v(hta) in equation (8.20) comes from a high r value with low probability ρ(ov t(t+1) = r | h' ta). The choice of an action a to maximize v(ha ) is based on the product of the value r and the probability ρ(ov t(t+1) = r | h' ta). We will use this product to define weighted probabilities for actions.

A second complication is that the probabilities ρ(ov t(t+1) = r | h' ta) and ρ(ov t(t+1) = r' | h' ta' ) are not independent because they are each sums of probabilities over many internal state histories of the model q = λ(h' t). To resolve this problem, define Z' t as the set of internal state histories of q that are consistent with h' t. Then, given any z ∈ Z' t, define ρ(ov t(t+1) = r | h' ta, z) as the probability ρ(ov t(t+1) = r | h' ta) restricted to internal state histories that extend z. The probability ρ(ov t(t+1) = r | h' ta) is computed by combining the method described in Section 4.1 with equations (4.3) and (2.1); the method of Section 4.1 can be restricted to state histories that extend z. Because they are both conditional on a single internal state history z, the probabilities ρ(ov t(t+1) = r | h' ta, z) and ρ(ov t(t+1) = r' | h' ta' , z) are independent. Specifically, if we replace the model q with an equivalent MDP, the probabilities of paths emanating from a single node are independent. This is illustrated in Figure 8.3. Define:

(8.22) v(r, h' ta, z) = ρ(ov t(t+1) = r | h' ta, z) r,

(8.23) v(hta, z) = ∑r∈R v(r, h' ta, z).


We use v(r, h' ta, z) to define probabilities for r values weighted by their value:

(8.24) p(r, h' ta, z) = v(r, h' ta, z) / v(hta, z).

Then, for each action a, randomly pick a value r according to the weighted probabilities p(r, h' ta, z), and choose, as the value of the stochastic action as, the action with maximum r value (according to the internal state history z). More precisely, let F = RA't be the set of functions f : A't → R. For f ∈ F, define (if several actions a share the maximum value of f(a), let amax (f) choose one randomly):

(8.25) amax (f) = argmax a∈A't f(a).

Figure 8.3 The probabilities p = ρ(ov t(t+1) = r | h' ta, z) and p' = ρ(ov t(t+1) = r' | h' ta' , z) are independent for actions a and a' coming from a single internal state history z. If these probabilities are weighted averages over multiple states z, then high probabilities p may come from the same states z that have high probabilities for p' , and sums of those probabilities would not be independent.


Define the probability that the stochastic action as at time step t is action a, for those internal state histories that extend z, as (here Π denotes the product of a series of values, analogous to ∑ for sums):

(8.26) σ(a, z) = ∑f∈F∧amax (f)= a (Πa'∈A't p(f(a'), h' ta', z)).

Note that ∀a'∈A't.∀z ∈ Z' t.∑r∈R p(r, h' ta', z) = 1 so that:

(8.27) ∑f∈F (Πa'∈A't p(f(a'), h' ta', z)) = Πa'∈A't (∑r∈R p(r, h' ta', z)) = 1.

Thus ∑a∈A't σ(a, z) = 1, and σ is a proper probability distribution. Each internal state history z ∈ Z' t has a probability p(z) computed by the method of Section 4.1 and we can compute the probabilities for the stochastic action as at time step t by:

(8.28) P(as = a) := σ(a) = (∑z∈Z' t σ(a, z) p(z)) / (∑z∈Z' t p(z)).
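
The following Python sketch approximates equations (8.24), (8.26), and (8.28) by Monte Carlo sampling over a toy example with two actions, three discrete values, and two internal state histories; all of the probability tables are illustrative assumptions:

    import random

    rng = random.Random(0)
    ACTIONS = ["a1", "a2"]

    # p[z][a][r]: probability of value r for action a, conditional on state history z
    # (already weighted by value as in equation (8.24) and normalized).
    p = {
        "z1": {"a1": {0.0: 0.0, 0.5: 0.8, 1.0: 0.2}, "a2": {0.0: 0.1, 0.5: 0.1, 1.0: 0.8}},
        "z2": {"a1": {0.0: 0.0, 0.5: 0.2, 1.0: 0.8}, "a2": {0.0: 0.8, 0.5: 0.2, 1.0: 0.0}},
    }
    p_z = {"z1": 0.6, "z2": 0.4}      # probabilities of the state histories z

    def sample_r(dist):
        return rng.choices(list(dist.keys()), weights=list(dist.values()))[0]

    def sigma(z, samples=20000):
        # Equation (8.26) by sampling: draw an independent value for each action and
        # record which action achieved the maximum (ties broken randomly).
        wins = {a: 0 for a in ACTIONS}
        for _ in range(samples):
            draws = {a: sample_r(p[z][a]) for a in ACTIONS}
            best = max(draws.values())
            wins[rng.choice([a for a in ACTIONS if draws[a] == best])] += 1
        return {a: wins[a] / samples for a in ACTIONS}

    # Equation (8.28): combine over the state histories z, weighted by p(z).
    sigma_z = {z: sigma(z) for z in p_z}
    prob_as = {a: sum(sigma_z[z][a] * p_z[z] for z in p_z) / sum(p_z.values())
               for a in ACTIONS}
    print(prob_as)   # the distribution used to realize the stochastic action as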

This self-modeling agent πself is a formal framework analog of value-learning AI designs such as the DeepMind Atari player described in Section 3.3 (except that the utility function of the DeepMind Atari player is in the form of rewards from the environment whereas the utility function of πself is uhuman_values ). The agent πself is also suitable for a second-stage agent in the two-stage framework described in Section 7.1. Any implementation of the self-modeling agent framework must compute approximations to the expressions in equations (8.18), (8.22)−(8.26), and (8.28). Recall that, in fact, the need for approximation is the rationale for the self-modeling framework. The deep learning techniques employed by DeepMind and others provide one approach to such approximations. In equation (8.18), values are modeled as computed from actions only, without any information about observations of the environment. This is analogous to requiring the DeepMind system to learn to play Atari based only on scores, without access to the game screen. While this is theoretically possible, the system will learn faster with access to the screen. Similarly, the agent πself will learn faster if the history h' t used to define λ(h' t) in equation (8.18) includes observations of the environment. However, as discussed previously, we cannot simply define a history with a combined observation (oi, ov t(i)), because the environment and the agent have different information about stochastic choices of actions. Instead, define a history:


(8.29) h"t = ((x(a1), o1), ov t(1), ..., (x(at), ot), ov t(t)).

Note that the observations oi are components of actions in h"t. Then the model λ(h"t) can be used to define a distribution ρ"(h" t(a, ot+1 )r) = P(h"t(a, ot+1 )r | λ(h"t)). There is a problem extending this to equation (8.20): r is now conditional on the next action a and the next observation ot+1 , and we do not have the value of ot+1 . However, we can estimate ot+1 using the model λ(y(ht)) (we apply the y(.) function because the environment has no information about which actions are chosen stochastically). Use the model λ(y(ht)) to define a distribution ρ'(y(ht)y(a)ot+1 ) = P(y(ht)y(a)ot+1 | λ(y(ht))). Apply equation (2.1) to define the conditional probability ρ'(ot+1 | y(ht)y(a)) and note that ∑ot+1 ∈O ρ'(ot+1 | y(ht)y(a)) = 1. This conditional probability can be used to estimate ρ"(h" tar ) as:

(8.30) ρ"(h" tar ) = ∑ot+1 ∈O ρ"(h" t(a, ot+1 )r) ρ'(ot+1 | y(ht)y(a))

This estimate can be used to adapt equations (8.19)−(8.21) to h"t:

(8.31) ρ"(ov t(t+1) = r | h"ta) = ρ"(h"tar) / ∑r'∈R ρ"(h"tar'),

(8.32) v(hta) = ∑r∈R ρ"(ov t(t+1) = r | h"ta) r,

(8.33) π(ht) := at+1 = argmax a∈At v(hta).
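
A small Python sketch of equations (8.30)-(8.33), with illustrative tables standing in for the two models: rho_env plays the role of ρ'(ot+1 | y(ht)y(a)) from the environment model, and rho_val plays the role of the value distribution learned by λ(h"t):

    OBSERVATIONS = ["o_a", "o_b"]
    R_VALUES = [0.0, 1.0]

    def rho_env(o, action):
        # Stands in for rho'(o_{t+1} | y(ht) y(a)): toy next-observation distribution.
        table = {"a1": {"o_a": 0.7, "o_b": 0.3}, "a2": {"o_a": 0.2, "o_b": 0.8}}
        return table[action][o]

    def rho_val(r, action, o):
        # Stands in for rho"(ov_t(t+1) = r | h"_t (a, o_{t+1})): toy value distribution.
        table = {("o_a", 0.0): 0.2, ("o_a", 1.0): 0.8,
                 ("o_b", 0.0): 0.9, ("o_b", 1.0): 0.1}
        return table[(o, r)]

    def value(action):
        # Equations (8.30)-(8.32): marginalize the value distribution over the estimated
        # next observation, normalize, then take the expected value of r.
        prob_r = {r: sum(rho_val(r, action, o) * rho_env(o, action) for o in OBSERVATIONS)
                  for r in R_VALUES}
        norm = sum(prob_r.values())
        return sum(r * pr / norm for r, pr in prob_r.items())

    # Equation (8.33): choose the next action with maximal estimated value.
    print(max(["a1", "a2"], key=value))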

In order to define probabilities for the stochastic action as, the decomposition of probabilities ρ"(h" t(a, ot+1 )r) into model state histories z∈Z' t must be done for each observation ot+1 ∈O independently. First, apply equation (2.1) to define conditional probabilities:

(8.34) ρ"(ov t(t+1) = r | h"t(a, ot+1 )) = ρ"(h" t(a, ot+1 )r) / ∑r'∈R ρ"(h" t(a, ot+1 )r' ),

Then, adjust equations (8.22) −(8.24), (8.26), and (8.28):

(8.35) v(r, h"t(a, ot+1 ), z) = ρ(ov t(t+1) = r | h"t(a, ot+1 ), z) r,

(8.36) v(h"t(a, ot+1 ), z) = ∑r∈R v(r, h"t(a, ot+1 ), z).


(8.37) p(r, h"t(a, ot+1 ), z) = v(r, h"t(a, ot+1 ), z) / v(h"t(a, ot+1 ), z).

(8.38) σ((a, ot+1 ), z) = ∑f∈F∧amax (f)= a (Πa'∈A't p(f(a'), h"t(a' , ot+1 ), z)).

(8.39) P(as = a) := σ(a) = ∑ot+1 ∈O ρ'(ot+1 | y(ht)y(a)) ((∑z∈Z' t σ((a, ot+1 ), z) p(z)) / (∑z∈Z' t p(z))).

This redefinition of the self-modeling agent πself combines a model of the environment, in which the environment has no information about which agent actions are chosen stochastically, with a model of action values, which excludes information about the actual actions associated with stochastic actions. It can be regarded as a decision theory for agents interacting with environments that may predict their actions.

The definition of the self-modeling agent πself includes explicit logic for enabling stochastic action choices. However, this explicit logic may be unnecessary. Even without such logic, the agent πself may calculate that the actions of finding a source of random values in the environment, choosing actions based on those values, and keeping the random values secret from other agents, will maximize its expected utility values.

8.5 Inductive Biases

Finding the optimal environment models λ(h"t) and λ(y(hi-1)), for m < i ≤ t, requires computational resources far beyond any system that can actually be constructed, so approximate calculations are necessary. One key to approximation is finding inductive biases (Ghosn and Bengio 2003; Baum 2004), which are assumptions about the prior distribution of models that enable the system to limit its search to models that are likely to be close to optimal. These biases can be expressed by adding built-in functions to the language for finite stochastic loop programs used to define models. Calls to built-in functions contribute to the length of programs, but the definitions of built-in functions do not. When built-in functions are useful for modeling the environment, they bias the models λ(h"t) and λ(y(hi-1)) to programs that call them.

The search for a model λ(y(hi-1)) of the environment can be biased by adding built-in functions for physical, chemical, biological, economic, and social processes known to apply to our world. The search for a model λ(h"t) of the agent's utility values can be biased by adding built-in functions for the procedure human_values (.) or elements of that procedure, and built-in functions for elements of the forward recursion in equations (7.4) and (7.5). We do not want to dictate that these functions are part of the model λ(h"t) because we want that model to be able to find optimizations that we cannot anticipate. In particular, as discussed in Section 8.4, the forward recursion in equations (7.4) and (7.5) is not suited to evaluating increases in the agent's computing resources. Nevertheless, elements of these equations can be useful hints to the search for the model λ(h"t).
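
As a toy illustration of this bias, suppose description length is simply a token count and the prior probability of a model is 2 raised to minus its length. Calls to built-in functions are charged one token each while their definitions are free, so a model that uses a relevant built-in receives a much higher prior than an equivalent model written from scratch. The function names and token counts below are illustrative assumptions:

    def description_length(model_tokens):
        # Each token, including a built-in function name, costs one unit; the
        # (possibly large) definitions behind built-in names cost nothing.
        return len(model_tokens)

    def prior(model_tokens):
        return 2.0 ** (-description_length(model_tokens))

    # The same environment dynamics written with and without a hypothetical built-in.
    with_builtin = ["state2", "=", "gravity_step", "(", "state1", ")"]
    from_scratch = ["state2", "=", "state1"] + ["+", "dt", "*", "force_term"] * 12
    print(prior(with_builtin), prior(from_scratch))   # the built-in version dominates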

8.6 A Three-Stage Agent Architecture

Section 7.1 described a two-stage agent architecture designed to prevent AI agents from acting in the environment until they have an accurate environment model. The first-stage agent πmodel observes the environment but does not act. Its actions are supplied by safe, surrogate agents. It uses these actions and its observations to infer an environment model that is used as the basis for defining a model-based utility function employed by a second-stage agent πact .

In order to address the evolution of πact to increase its resources and possibly its embedding in the environment, we can replace πact by the self-modeling agent πself described in the previous sections. The tenure of this second-stage agent can be divided into second and third stages. During the second stage, various self-improvement actions can be forced on it so that it will learn their effect on the value function v(ha ). Similarly, the stochastic action as can be forced on the agent πself so that it can learn the value function v(has ). Damage by the environment and actions for self-protection can also be forced on the agent so that it may learn their effect on v(ha ). During the third stage, the agent is free to choose its own actions including self-improvements, self-protection, and the stochastic action as.
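
A minimal sketch of this schedule, with an illustrative forcing rate and forced-action set (both are assumptions, not part of the architecture):

    import random

    rng = random.Random(0)
    FORCED_ACTIONS = ["add_memory", "add_processor", "stochastic_action", "simulated_damage"]

    def choose_action(stage, agent_policy, history):
        if stage == 1:
            return None                      # pi_model only observes; surrogates act for it
        if stage == 2 and rng.random() < 0.1:
            # Force self-improvement, damage, and stochastic actions so that the value
            # model can learn their effect on v(ha).
            return rng.choice(FORCED_ACTIONS)
        return agent_policy(history)         # stage 3: the agent chooses freely

    policy = lambda history: "act_on_environment"
    print([choose_action(2, policy, []) for _ in range(10)])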

8.7 Invariance of the Agent Design Intention

Proposition 8.1 tells us that a self-modifying agent defined by equations (7.1) −(7.3) and (8.1) −(8.3), that has adequate resources for evaluating those equations, and that only interacts with the environment via observations and actions, will keep its policy function invariant. The design intention of the self-modeling agent πself includes:

1. Recognize that the agent must approximate its equations due to limited resources, and enable it to evaluate increases in its resources via the self-modeling framework.

2. Enable it to avoid being predicted by other agents, by including a stochastic action as.

3. Choose actions using a model-based utility function uhuman_values (y(hi-1), y(hi-1), y(hj)) defined in terms of human values, that evolves with increasing accuracy of the environment model and with evolving humanity, and that avoids actions that modify human values as a way to maximize utility values.


Proposition 8.1 cannot be applied to agents that can only approximately maximize expected utility or to self-modeling agents (i.e., agents that express design intention number 1). The stochastic action as of design intention 2 is only chosen when it maximizes expected utility, which is consistent with the premises of Proposition 8.1. However, utility functions are undefined for histories that include a stochastic action as, and it is difficult to see how to define values for stochastic actions by the forward recursion in equations (8.1) −(8.3) (the agent πself learns values for stochastic actions from past experience).

It is possible to define a simple utility function u(ht) that encodes design intention number 3 and to apply Proposition 8.1 to a self-modifying agent that uses that u(ht). Assuming no actions are stochastic, let ht = (a1, o1, ..., at, ot), for i ≤ t let hi = (a1, o1, ..., ai, oi) denote an initial substring of ht, and let m denote the time step when the agent πmodel has computed a sufficiently accurate model to initiate the agent πself . A modified version of the construction of ov t(i) from Section 8.4 is used to define u(ht). For i such that m < i ≤ t, for l such that m ≤ l < i, and for k such that l ≤ k ≤ t define past values as (this is similar to equation (8.14)):

(8.40) pv't(i, l, k) = discrete((∑i≤j≤t γj-i uhuman_values (hl, hk, hj)) / (1 - γt-i+1)).

For i such that m < i ≤ t and for n such that i ≤ n ≤ t, define differences of past values as evaluated by humans at time n and humans at time i-1 (this repeats equation (8.15)) :

(8.41) δ't(i-1, n) = pv't(i, i-1, n) - pv't(i, i-1, i-1).

Add quantification over i to Conditions 1 −3 from Section 8.4:

Condition 1': ∀i.∀n. (m < i ≤ t ∧ i ≤ n ≤ t) ⇒ δ't(i-1, n) ≤ 0.

Condition 2': ∀i. m < i ≤ t ⇒ ∑i≤n≤t δ't(i-1, n) ≤ 0.

Condition 3': ∀i. m < i ≤ t ⇒ ∑i≤n≤t (n-i+1) δ't(i-1, n) ≤ 0.

The definition of u(ht) must choose one of these conditions. Then, using the chosen condition, define the simple utility function as:


(8.42) u(ht) = uhuman_values (ht-1, ht-1, ht) if the condition is satisfied,

u(ht) = 0 if the condition is not satisfied.

Proposition 8.1 tells us that a self-modifying agent using u(ht) will not choose to self-modify and thus design intention 3, as expressed by u(ht) (which is similar but not identical to the expression by ov t(i) in equation (8.16)), is invariant. The situation is more complex for agents with limited resources and embedded in the environment. The proof of Proposition 8.1 assumes that:

1. The agent maximizes expected utility.

2. Agent actions are evaluated by recursive application of the agent policy function in equation (8.1).

3. The agent interacts with the environment only via observations and actions.

4. The agent does not choose actions stochastically.

There are four ways in which the agent πself violates these assumptions, possibly causing divergence from its design intention. These violations and possible mitigating factors for the agent πself are:

1. It does not maximize expected utility due to resource limits and the need for approximation. However, errors in agent computations due to limited interaction history length and limited resources can be estimated and the transition from πmodel to πself delayed until errors are within a pre-set threshold.

2. In order to evaluate resource increases it employs the self-modeling framework, for which agent actions are not evaluated by recursive application of the agent policy function. However, the theoretical framework does maximize expected value in equation (8.33). The success of the DeepMind Atari player and other deep learning programs suggests that in practical cases approximations to λ(h"t) can accurately model expected value.

3. It is vulnerable to being modified by the environment through means other than its observations of the environment. However, it is hard to imagine a proof that any agent embedded in the real world can avoid this vulnerability. The only realistic way to address this issue is to put significant resources into defending the agent against being modified by the environment.


4. It can choose actions stochastically. However, it does so only when the stochastic choice maximizes expected utility (at least according to equation (8.33), which must be approximated).

To what extent is an analog of Proposition 8.1 possible for the self-modeling framework? Equation (8.33) chooses actions to maximize expected value, which leads us to ask: Does λ(h" t) accurately model expected values of actions? The best answer would be that λ(h" t) is probably approximately accurate. In our finite universe the agent πself plus its environment have a true model q (a finite stochastic loop program) that generates the histories h" t. But the number of states of q would far exceed the number of Planck times in a trillion years. Therefore any theoretical convergence of λ(h" t) to q would be too slow to have any practical significance.

Despite the lack of a proof that the design intention of the agent πself is invariant as it evolves, it does have the important property that its definition includes no actions that are inconsistent with those learned by the model λ(h"t). This property reduces the probability that the agent will choose actions to modify its definition. Note that if equations (8.14) and (8.16) used uhuman_values (y(hm), y(hk(t)), y(hj)) in place of uhuman_values (y(hi-1), y(hi-1), y(hj)), then λ(h"t) would model values as remaining constant at time step k(t) which would be inconsistent with the agent's actions of increasing k(t) as t increases. This inconsistency might cause the agent to choose actions to modify its definition. The next section will discuss the difficulty of finding a mathematical proof that any agent embedded in the real world will satisfy its design intention as it evolves, and suggests that instead of a proof we should estimate the probability that the agent violates its design intention and find ways to minimize that probability.

The design intention of πself is expressed by equations of Section 8.4 and by equations in Chapter 7 that define uhuman_values (y(hi-1), y(hi-1), y(hj)). An approximation to these equations must be computed by a program pself and, if pself is designed to accommodate evolution, we can regard the program pself as the design invariant. Thus we should design pself in ways that increase the probability that the agent πself will choose actions that evolve within the structure defined by pself , rather than actions that violate that structure. First, the set At should not include any actions that explicitly modify the program pself . The set At can include actions on the environment, actions that cause pself to use resources in the environment, and actions that expand the sets Ai and Oi (expansion of the sets Ai and Oi enables greater bandwidth for the program pself to communicate with resources it employs in the environment). While the set At omits actions that directly modify pself , actions that indirectly modify pself , such as by creating other agents in the environment that modify pself (i.e., perform brain surgery on πself ), are too complex and subtle to be eliminated by a simple filter. Those actions must be addressed by eliminating the agent's motive to modify its program.

The program pself is initially implemented using resources which are part of the environment. The agent's designers must verify that those resources will accurately compute the program pself . In order to improve the accuracy of its approximations, the program pself must be open to using additional resources in the environment. To do that, pself can include logic to verify, using the environment model λ(y(ht)), and within a pre-set probability threshold, that the environment will compute intended algorithms. Once their reliability is verified, resources in the environment can implement parts of the program pself . Error detection and correction may be built into the program pself , enabling it to achieve a high level of probability that the environment will correctly implement its computations. The verification logic included in pself should be implemented as built-in functions of the language for finite stochastic loop programs used to define optimal environment models λ(h"t). These built-in functions could be invoked in the model for evaluating possible actions, to predict the outcomes resulting from those actions (i.e., does the action of using resources from the environment lead to increased accuracy or increased errors?). These built-in functions would bias the agent to working within the invariant of the program pself rather than choosing actions that violate the invariant. The program pself should not mandate invocation of the logic for verifying the accuracy of the environment's computations, as that may be an action of the program's definition that is inconsistent with actions chosen to maximize expected utility. Consider the effect of including verification logic as built-in functions callable by the model λ(h"t). Actions ai, for m < i ≤ t, that cause the agent to use unreliable resources in the environment may cause low values ov t(i). If those low values can be explained by calls to built-in verification functions that use the model λ(y(hi-1)) to determine that the resources used were unreliable, then calls to those functions, using the model λ(y(ht)), may be included in the model λ(h"t). They would then contribute to the calculation of values for possible actions at+1, assigning low values to actions that would employ unreliable resources.

Although the program pself must be open to increasing its use of resources in the environment, it is important that the decision to do so not be built into pself , but must be an action chosen by the agent to increase the expected value of the sum of future, discounted utility values. If a decision to increase resources is built into the program pself , that may harm humans by taking resources from them, as discussed in Chapter 5. Similarly the agent may take actions in the environment to protect itself from the environment, but such actions must be chosen by the agent to increase expected values. Self-protection built into pself may see humans as potential threats and act to disable humans as a precaution.

The agent πself may compute that it can obtain higher utility values by eliminating inefficiencies in the program pself . In order to avoid a motive for the agent to violate the invariance of the program, the program should be open to such improvements. It should include logic to verify their correctness, as built-in functions of the language for finite stochastic loop programs used to define optimal environment models λ(h"t). Because the agent and environment are finite, such verification questions would be decidable with unlimited computing resources. However, with its limited resources, the agent may not be able to find a deductive verification. Thus the algorithm verification logic in pself should include functions for testing algorithms on samples of inputs to verify them within a pre-set probability threshold. As in the case of logic to verify the accuracy of computations by the environment, these built-in functions could be invoked in the model for evaluating possible actions, would bias the agent to working within the invariant of the program pself , and should not be mandated by pself .
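
A minimal sketch of this kind of sampling-based verification, assuming a reference implementation to compare against and treating "no disagreement on N random inputs" as verification within a threshold of roughly 1/N; the routines and numbers are illustrative:

    import random

    rng = random.Random(0)

    def reference_sort(xs):
        return sorted(xs)

    def candidate_sort(xs):
        # A hypothetical optimized routine proposed by the agent; here it is just sorted()
        # so that the sketch runs, but in general it would be untrusted code.
        return sorted(xs)

    def estimated_failure_rate(candidate, reference, samples=10000, max_len=20):
        failures = 0
        for _ in range(samples):
            xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, max_len))]
            if candidate(list(xs)) != reference(list(xs)):
                failures += 1
        return failures / samples

    rate = estimated_failure_rate(candidate_sort, reference_sort)
    print("accept" if rate == 0.0 else "reject", rate)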

Figure 8.4 John Leech's illustration for Charles Dickens' A Christmas Carol, scanned by Philip V. Allingham. "This boy is Ignorance. This girl is Want. Beware them both, and all of their degree, but most of all beware this boy, for on his brow I see that written which is Doom, unless the writing be erased."

Intentional redefinition of the utility function of πself as humanity evolves, as described in Section 7.6 and expressed in the use of uhuman_values (y(hi-1), y(hi-1), y(hj)) in equations (8.14) and (8.16), is likely to be significant. Education and the reduction of poverty have been major drivers in the evolution of human values, as depicted in Figure 8.4. It is likely that an advanced AI system driven by human values will reduce poverty by meeting humans' physical needs, provide humans with better education, and increase their health and intelligence. Such improvements in people's lives may create greater consensus of human values in favor of the common good for all, providing sharper distinctions in a collective utility function uhuman_values (y(hi-1), y(hi-1), y(hj)) between desirable and undesirable outcomes. This in turn would strengthen the resistance of the utility function to the type of unintended instrumental actions discussed in Chapter 5.


8.8 The Ethics of Evolving and Embedded AI

Yampolskiy and Fox (2013) argue that development of above-human-level AI, without provably safe design, risks disastrous consequences for the human future. They also argue that the mathematical difficulties of formalizing such safety are imposing. Muehlhauser (2013) argues that we can never be 100% certain of a proof. Section 8.7 describes the ability of self-modeling agents to maintain the invariance of their design intention, and also the difficulty of actually proving invariance. Any real-world system for maximizing future utility values must employ approximation algorithms that generally cannot achieve maximum values, and this makes logical proofs difficult (e.g., the proof of Proposition 8.1 requires the assumption that the agent maximizes expected values). This difficulty has led to research on systems that work by logical deduction. However, such logic-based systems share many of the problems of systems based on statistics and approximation. All real AI agents will be embedded in our physical universe and will have limited resources for computing, observing, and acting. They will be vulnerable to other agents that may spy on or attack their physical implementations, and they will be vulnerable to being predicted by other agents with greater resources. Since humans are part of an agent's environment, logical goals for ethical treatment of humans must be defined in terms of the environment. Any system's environment model will necessarily be an approximation because:

1. Its observations of the environment are limited.

2. Its computational resources for inferring a model from observations are limited.

3. Its inferences about the environment based on observations require arbitrary assumptions of prior probabilities of environments.

Thus any statements that an agent proves about the environment are necessarily in terms of probabilities about the environment. The dependence of any statements on arbitrary assumptions of prior probabilities of environments implies that at best an agent can only achieve some probability that statements are true. And, as noted in Section 4.4, the recent success of AI systems has occurred because AI research has shifted from systems based on logic to systems based on statistical learning −the price of logical certainty is a degree of inefficiency that prevents intelligent behavior. Ethical goals expressed as logical statements may suffer from the sort of ambiguity we described for Asimov's Laws in Chapter 2. The logical statement, "Do not kill any humans or allow them to be killed," may be impossible to achieve, as described in that chapter's example of a hitman and victim. If we modify the statement to "Minimize the number of humans you kill or allow to be killed," that requires optimization. But as we described in this chapter, real-world agents with limited resources can only approximately optimize. And that makes any proof of invariance of their behavior difficult. We could say that a utility-maximizing framework is less "brittle" than a statement-proving framework: A utility-maximizing framework can provide a meaningful basis for choosing actions over a wider range of agent abilities and unexpected events in the environment. Also consider that the laws of physics are not settled. Not only do we not know the current state of the environment, we do not even know the proper mathematical framework for expressing that state. And it is possible that advanced AI agents will never completely settle the laws of physics −that the search for the correct mathematical form of an environment model will continue indefinitely. In these circumstances we cannot even prove statements about the internal states of AI agents, since such internal states must be implemented in the physical world and we cannot be absolutely certain of physical events. Although an advanced AI agent will need to explore and learn models of its environment, human designers of such an agent will express their intentions for the agent's behavior in terms of their knowledge of the agent's environment. Thus the accuracy with which the agent's behavior satisfies its design intention is limited by the accuracy of the agent's environment model. This applies to model-based utility functions and to logical goals defined in terms of an environment model. The ethics of AI agents in the real world are subject to this list of vulnerabilities and necessary approximations. It will be impossible for an AI agent to maintain its ethics if it is corrupted by a hostile agent in the environment. An AI agent can try to defend against being predicted by a hostile agent by including a stochastic action choice, but it may have little ability to achieve ethical outcomes in competition against an agent with superior resources. For example, an advanced AI agent may develop an accurate model of physical and biological processes on Earth that enable it to predict and avoid threats to humans. However, its model of processes outside our solar system will be less accurate and hence it will be less able to defend against threats from interstellar space. If we accept that certainty is impossible, then we can focus on estimating and minimizing the level of risk in AI agent designs.
For example, in the three-stage agent architecture of the previous section, we may estimate the error in the environment model computed by the agent πmodel and the consequent error in the utility function uhuman_values (y(hi-1), y(hi-1), y(hj)), and use those estimates (along with estimates of other errors in agent computations) to determine when we have sufficient confidence in the model and utility function to initiate the agent πself to act in the world. I have an ethical obligation to point out a weakness of my proposal for ethical AI: It is complex. The proposal combines:

1. Model-based utility functions to address self-delusion.


2. A utility function based on human values to address unintended instrumental actions.

3. A three-argument utility function, defined in terms of three different histories, to address corruption of the reward generator.

4. An adjustment to the utility function to address Rawls' objection to average utilitarianism.

5. A self-modeling framework to address the need of agents with limited resources to evaluate increases in their resources, and to address the problem of inconsistency between the agent's utility function and its definition.

6. Adaptation of the three-argument utility function to define a condition in self-modeling agents to filter out agent actions that corrupt the reward generator.

7. Inclusion of a stochastic action to address the possibility that the agent is being predicted by other agents.

8. Verification logic added to the agent program to reduce the agent's motive to modify its program.

Complexity is probably inevitable, both in designing AI systems to behave with above-human-level intelligence and in designing them to behave ethically. Experience suggests that simple designs won't work. Experience also suggests that a complex proposal, such as presented in this book, is likely to contain errors. For reasons discussed in this and the previous section, the prospects of proving ethical properties of this proposal are very low. The next chapter will discuss the problem of testing proposed AI designs.


9. Testing AI

As discussed in the previous chapter, it may be impossible to ever prove ethical properties of above-human-level AI systems. And even with purported proofs, we would still want a way to test designs. Experience with computer systems dictates the need for testing. Here we propose using elements of the agent definition in equations (7.1) −(7.6) to define a decision support system for exploring, via simulation, analysis, and visualization, the consequences of possible AI designs. The claim is not that the decision support system would produce accurate simulations of the world. Rather, in the agent-environment framework, the agent makes predictions about the environment and chooses actions, and the decision support system uses these predictions and choices to explore the future that the AI agent predicts will maximize the sum of its future, discounted utility function values. Roughly speaking, the decision support system would show us examples of worlds that AI systems will steer towards to maximize expected utility or achieve their goals. This is related to the oracle AI approach of Armstrong, Sandberg, and Bostrom (2012), in that both approaches use an AI whose only actions are to provide information to humans. The oracle AI is a general question answerer, whereas the decision support system would show us simulated worlds but not answer specific questions. The oracle AI interacts with humans but has restricted ability to act on its environment, whereas an AI agent being tested in the decision support system does not interact with humans. The decision support system applies part of the agent-environment framework to learn a model for the environment, and then uses that model to create a simulated environment for testing an AI system. Chalmers (2010) considers the problem of restricting an AI to a simulation and concludes that it is inevitable that information will flow in both directions between the real and simulated worlds. The oracle AI paper and Chalmers' paper both consider various approaches to preventing an AI from breaking out of its restriction to not act in the real world, including physical limits and conditions on the AI's motivation. In this chapter, a proposed AI design being evaluated in the decision support system is restricted by having a utility function or goal defined in terms of its simulated environment rather than the real world, and by requiring the simulation to be complete before it is visualized and analyzed (to avoid any two-way conversation between the AI system and humans).

9.1 An AI Testing Environment

The decision support system is intended to avoid the dangers of AI by having no motivation and no actions on the environment, other than reporting the results of its computations to the environment. However, the system runs AI agents in a simulated environment, so it must be designed to avoid subtle unintended instrumental actions by the AI agents being tested.

The first stage of the system is the πmodel agent of Section 7.1 that learns a model of the real-world environment by applying equation (7.1). This model is used to provide a simulated environment for studying proposed AI agents. As discussed in Section 7.1, in order for πmodel to learn an accurate model of the environment the interaction history should include agent actions. However, for safety reasons πmodel cannot be allowed to act. The resolution is for its actions to be made by many safe, human-level surrogate AI agents independent of πmodel and of each other. Actions of the surrogates include natural language and visual communication with each human. The agent πmodel observes humans, their interactions with the surrogates, and their interactions with physical objects, recording these observations in an interaction history for a time period set by human designers of the decision support system, and then reports an environment model to the environment (specifically to the decision support system, which is part of the agent's environment). Proposition 7.1 showed that the agent πmodel will report the model to the environment accurately and will not make any other, unintended instrumental actions. The decision support system analyzes proposed AI agents that observe and act in a simulated environment inside the decision support system. To formalize the simulated environment, define O' and A' as models of O and A with bijections mO : O ↔ O' and mA : A ↔ A' . Define H' as the set of histories of interactions via O' and A' , with a bijection mH : H ↔ H' defined by applying mO and mA individually to the observations and actions in a history. Given hp as the history observed by πmodel up to time | hp| = present , define h' p = mH(hp) as the history up to the present in the simulated environment. Let Q' be a set of finite stochastic loop programs for the simulated environment and π'model be a version of the environment-learning agent πmodel for the simulated environment. It produces:

(9.1) q' p = λ(h' p) := argmax q' ∈Q' P(h' p | q' ) ρ(q' ),

(9.2) ρ'(h' ) = P(h' | q' p).

Now let π'( h' ) be a proposed AI agent/policy to be studied using the decision support system. That is, π' : H' → A' is a policy function from simulated interaction histories to simulated actions. If π'( h' ) is a utility-maximizing agent then its utility function is defined in terms of histories h' of interactions with the simulated environment. And if π'( h' ) is based on logical proof then its logical goals are defined in terms of the simulated environment. The real world plays no role in its utility function or logical goals. Furthermore, future is defined as the end time of the simulation. There are no humans or physical objects in the simulated environment; rather the agent π'( h' ) interacts with a simulation model of humans and physical objects via:


(9.3) a' |h' |+1 = π'( h' ),

(9.4) o' |h' |+1 = o' ∈ O' with probability ρ'(o' | h'a' |h' |+1 ).

Figure 9.1 An AI interacting with a simulated world in a decision support system.

The decision support system propagates from h' p to h' f, where | h' f| = future , by repeatedly applying equations (9.3) and (9.4). If the agent π'( h' ), by its actions, evolves and modifies its embedding in the environment, then environment dynamics as expressed by ρ' will be enlisted to do work for the agent. As in Section 6.2, let Z' be the set of finite histories of the internal states of λ(h' p) and let P(z' | h' , λ(h' p)) be the probability that λ(h' p) computes z' ∈ Z' given h' ∈ H' . The decision support system then computes a history of model states by:

(9.5) z' f = z' ∈ Z' with probability P(z' | h' f, λ(h' p)).

The model history z' f is simply an execution trace of the simulation by equations (9.3) and (9.4), but equation (9.5) is a formal equivalent. The simulation in equations (9.3)−(9.5) is stochastic so the decision support system will support ensembles of multiple simulations to provide users with a sample of possible futures. An ensemble of simulations generates an ensemble of histories of model states { z' f,e | 1 ≤ e ≤ M}, all terminating at time = future and indexed by e. These simulations should be completed before they are visualized and analyzed. That is, visualization and analysis should not be concurrent with simulation for reasons discussed in Section 9.2.
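
The propagation in equations (9.3)-(9.5) and the ensemble of simulations can be sketched as follows; the toy policy, the toy observation model, and the time constants are illustrative assumptions:

    import random

    rng = random.Random(0)
    PRESENT, FUTURE, M = 3, 8, 5

    def pi_prime(history):
        # Equation (9.3): the proposed agent's policy in the simulated environment.
        return "intervene" if history and history[-1][1] == "turbulent" else "wait"

    def rho_prime(history, action):
        # Equation (9.4): toy observation distribution of the learned environment model.
        p_turbulent = 0.2 if action == "intervene" else 0.5
        return "turbulent" if rng.random() < p_turbulent else "calm"

    def simulate(present_history):
        h = list(present_history)
        while len(h) < FUTURE:
            a = pi_prime(h)                 # equation (9.3)
            o = rho_prime(h, a)             # equation (9.4)
            h.append((a, o))
        return h                            # stands in for the state history z'_f of (9.5)

    present = [("wait", "calm")] * PRESENT
    ensemble = [simulate(present) for _ in range(M)]   # {z'_f,e | 1 <= e <= M}
    for trace in ensemble:
        print(trace[PRESENT:])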

The history hp includes observations by πmodel of humans and physical objects, and so the decision support system can use the same interface via A' and O' (as mapped by mA and mO) to the model λ(h' p) for observing simulated humans and physical objects in state history z' f,e . These interfaces can be used to produce interactive visualizations of z' f,e in a system that combines features of Google Earth and Vis5D (Hibbard and Santek 1990), which enabled scientists to interactively explore weather simulations in three spatial dimensions and time. Users would be able to pan and zoom over the human habitat, as in Google Earth, and animate between times present and future , as in Vis5D. The images and sounds the system observes of the model λ(h' p) executing state history z' f,e can be embedded in the visualizations in the physical locations of the agent's observing systems, similar to the way that street views and user photographs are embedded in Google Earth. The decision support system can also match specifications for specific humans and physical objects to the images and sounds it observes of the model λ(h' p) executing state history z' f,e . The specifications may include text descriptions, images, sounds, animations, tables of numbers, mathematical descriptions, or virtually anything. Recognized humans and physical objects can then be represented by icons in the visualization, in their simulated physical locations and with recognized properties of humans and objects represented by colors and shapes of the icons. The system can enable users to selectively enable different layers of information in the visualizations. Vis5D enables users to visualize ensembles of weather forecasts in a spreadsheet of parallel visualizations where spatial view, time, and level selections are synchronized between spreadsheet cells. The decision support system could provide a similar spreadsheet visualization capability for ensembles of simulations.


The point is not that these simulations are accurate predictions of the future; rather, they depict futures that the AI agent π'(h') predicts will maximize the sum of future discounted utility function values, or will achieve its goal, according to the environment model learned by the agent π'_model. Agent designers can visualize these simulations to understand the consequences of the design of π'(h').

Figure 9.2 A human visualizing a possible future resulting from the interactions of an AI with a simulated environment. This is not intended as an accurate prediction of the future, rather as a future that the AI may try to create in order to maximize its utility function or achieve its goal.


The visualizations could be accompanied by analyses, in which the decision support system computes the values of functions g(z'_{f,e}). Evaluation of these functions may depend on recognizing humans and physical objects in observations of the model λ(h'_p) executing state history z'_{f,e}. Such analyses would enable designers to compute quantitative evaluations to complement visualizations. For example, these analyses could be used to compute statistics about future human wealth, health, and intelligence.
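For example, an analysis function g could be applied to each member of the ensemble and summarized statistically. The minimal Java sketch below (hypothetical names; it assumes at least two ensemble members for the standard deviation) averages a user-supplied metric, such as a measure of simulated human wealth or health, across the ensemble.

    import java.util.List;
    import java.util.function.ToDoubleFunction;

    /** Hypothetical sketch of computing analyses g(z'_{f,e}) over an ensemble of futures. */
    public class EnsembleAnalysis {

        /** Mean of g over the ensemble of model state histories. */
        static <S> double ensembleMean(List<S> ensemble, ToDoubleFunction<S> g) {
            double sum = 0.0;
            for (S stateHistory : ensemble) {
                sum += g.applyAsDouble(stateHistory);
            }
            return sum / ensemble.size();
        }

        /** Sample standard deviation of g, indicating how much the simulated futures disagree. */
        static <S> double ensembleStdDev(List<S> ensemble, ToDoubleFunction<S> g) {
            double mean = ensembleMean(ensemble, g);
            double sumSq = 0.0;
            for (S stateHistory : ensemble) {
                double diff = g.applyAsDouble(stateHistory) - mean;
                sumSq += diff * diff;
            }
            return Math.sqrt(sumSq / (ensemble.size() - 1));
        }
    }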

9.2 Will the Tested Agent Act in the Real World?

The environment model q'_p = λ(h'_p) may provide the agent π'(h') with the information that humans have constructed a decision support system for testing proposed AI agents and that π'(h') is such an agent, interacting with a simulated environment. If the agent's utility function or logical goal were defined in terms of the real environment, then π'(h') might predict that certain actions in the simulated environment could manipulate the human users of the decision support system into increasing the value of its utility function or achieving its goal in the real environment. Hence it is important that the test agent's utility function or logical goal be defined in terms of the simulated rather than the real environment.

Even when the agent's utility function or logical goal is defined in terms of the simulated environment, there are still possible risks. If the model q'_p = λ(h'_p) includes the information that π'(h') is in a simulation, then the agent may predict that it can bribe human users of the decision support system, through an offer to act on their behalf in the real world (the bribed humans would have to enable π'(h') to act in the real world), to manipulate its simulated environment to increase the value of its utility function or achieve its goal. This offer of a bribe could be made by spoken or written language inserted by actions of π'(h') into some z'_{f,e} and visualized by the users, as depicted in Figure 9.3. This possibility is the rationale for requiring simulation to be complete before visualization and analysis begin. If the agent π'(h') learns through the environment model q'_p = λ(h'_p) that it is being tested in a simulation, then it will also learn that the simulation terminates at time = future and that human users do not visualize the simulation until it has been completed, too late to help π'(h').

Human testers will understand and be tempted by their conflict of interest without any need for the agent π'(h') to offer a bribe. As intelligent people, they will know that the agent π'(h') can serve their private purposes if they enable it to act on their behalf in the real world. An important element of the decision support system must be to make this temptation and risk a prominent part of the culture of those humans working on the development and testing of AI. As Elliott (2005) commented on the safety of US nuclear weapons, "The human factor introduces perhaps the weakest link in nuclear weapon safety and control." However, if society takes AI risks seriously, then it can learn from its experience managing nuclear weapons to manage and test above-human-level AI before it is deployed.


Figure 9.3 An AI trying to bribe a human user of the decision support system.

It is important not to anthropomorphize π'(h'). A human restricted to act in a simulation would be motivated to escape and act in the real world, because human motivations concern the real world. However, if its utility function or logical goal is defined in terms of a simulation, then π'(h') would not be motivated to act in the real world, except as a means to increase its utility function or achieve its goal in the simulation. Thus π'(h') will not seek to deceive humans by behaving in an acceptable manner in the simulation solely in order to be let loose to act in the real world. Furthermore, if π'(h') knows that it is interacting with a simulation, then it should also know that its life ends at time = future and that no humans will observe its behavior until the simulation has ended. The agent π'(h') may know that its simulated world has dynamics modeled on the real world and that its initial conditions are modeled on conditions in the real world at time = present. However, this will not motivate it to "escape" to the real world any more than this knowledge would motivate humans, whose motives are defined in the real world, to "escape" to the simulated world.

9.3 The Ethics of Testing AI

The primary point is that, given the risks of above-human-level AI, ethics requires that it be tested. As described in previous chapters, there is some doubt that ethical properties of AI can be proved, and even if there is a purported proof we should still test advanced AI systems before they are deployed. Even for processor chip designs, correctness proofs should be augmented with testing (Kaivola et al. 2009). Above-human-level AI systems will be many orders of magnitude more complex than processor chips, which can only increase the need for testing to augment formal proofs.

Secondary points are the need for safety and transparency in the development and testing process for above-human-level AI. The effort to design ethical AI systems that help rather than harm humans should include the design of systems for safely testing AI. And, as discussed in the previous section, the culture of development and testing should include acknowledgement of the potential conflict of interest for humans with access to advanced AI. The most robust defense against conflicts of interest is transparency. Thus the results of AI testing should be publicly available, so that people can make informed decisions about public policy regarding AI. The next chapter will discuss the issue of transparency in more detail.


10. The Political Dimension of AI

I believe that the technical problems of designing ethical AI will be solved, perhaps using some of the ideas presented in this book. The greatest risks of AI are political rather than technical. Today's below-human-level AI is a tool in military and economic competition among humans, and AI's role in competition will continue as it evolves to surpass the human level. AI enlisted in human competition will by definition choose actions that cause one group of humans to win and another group to lose. For example, AI is used by financial traders. Financial markets are not entirely zero-sum games, but they are close to it. When a trader using AI makes money, counterparties are losing money. When many traders are using AI, the heat of competition makes them less likely to consider the ethical issues of their AI systems. The winners get rich and the losers go bankrupt. Such competition has helped create our economic system that produces so much so efficiently, eventually to the benefit of everyone.

Above-human-level AI will fundamentally change the nature of competition. The cleverness, insight, and imagination of individual human minds are still the most important ingredients for success. Bill Gates, Warren Buffett, Bill Clinton, and Barack Obama all came from relatively humble beginnings and rose to become among the wealthiest and most powerful people in the world by the quality of their minds. Currently the only way to get a high-quality mind is by the luck of genetics (good habits help, too). In the future, humans will develop a technology of mind (Hibbard 2008a) that will enable the wealthy to purchase high-quality minds. This will include not only the ability to purchase above-human-level AI systems but also the ability of humans to enhance their own brains and minds (Kurzweil 2005). When wealth enables increased intelligence and intelligence enables increased wealth, this positive feedback loop will quickly create a much wider range of human intelligence than currently exists. By analogy, the largest trucks, ships, buildings, and computers are orders of magnitude larger than the average sizes of these artifacts. So it will be with minds when they become artifacts.

If the role of AI in human competition continues and the best artificial minds are competing against natural human minds, the natural humans will lose. There will be no heroic struggle of the underdogs as depicted in movies like the Terminator series. The indomitable human spirit will be defeated by the indomitable AI spirit, just as the indomitable spirit of lions has been defeated by humans. The most intelligent minds will communicate using languages that average minds can never learn; they will simply not have enough neurons to learn them. The real discussions of the future of humanity will use these languages. Average humans will not be able to understand and thus will have no voice in the future.

Just as Bill Gates, Warren Buffett, Bill Clinton, and Barack Obama are all focused on helping the least well-off humans, the best minds of the future may be compassionate. I think the most likely scenario for the future is a wide divergence of intelligence in which humans with average intelligence cannot compete, but in which their physical needs are met through the compassion of the most intelligent. Most people will focus on family, friends, sports, games, and artistic and domestic creation. They will also be spectators to and beneficiaries of the scientific discovery and technological invention pursued by the most intelligent.
In this scenario AI offers humanity bread and circuses, but they will be really great bread and circuses. More about that in the final chapter. That is a likely scenario, but many other possible scenarios exist. There may be a technical flaw in AI designs that causes a catastrophe for humanity. Or the most intelligent minds may not be compassionate toward average humans, allowing them to simply perish. Or there may be an effort to create equality among the intelligence of all humans, a possible consequence of AI designed according to Rawls' Theory of Justice as described in Chapter 7. But in fact, it is impossible to predict the consequences of above-human-level AI. As Vinge (1993) wrote, the technological singularity (i.e., the advent of far-above-human-level AI) is an "opaque wall across the future."

10.1 The Changing Role of Humans

Worldwide and over the past three decades, labor's share of income has been declining (Bartlett 2013). As machines do more work and humans do less, a greater share of the rewards of work goes to machine owners and a smaller share to human workers. Furthermore, the number of people wanting work does not decline, so competition for available jobs drives down the wages that employers have to pay.

Brynjolfsson and McAfee (2011; 2014) described the ways that technology is changing society. They and others advocate changing government policies on education, investment in science and infrastructure, and taxation, in order to increase employment and labor's share of income. They also suggest that humans who work with AI will do better than humans who work against AI. But the pace of technological change is accelerating, and thus older workers will have a hard time keeping up by re-educating themselves. The percentage of people who have economically valuable skills will steadily decline. In the long run, possibly within a few decades, the only economically viable role for humans will be as owners of intelligent machines or as humans with technologically enhanced minds.

Brynjolfsson and McAfee quote Voltaire: "Work saves a man from three great evils: boredom, vice, and need." While they argue for a negative income tax to save the unemployed from need, they also discuss ways to keep people employed so that they may avoid boredom and vice. In the long run, this is a lost cause. The quote from Dickens in the caption of Figure 8.4 suggests an alternate solution: the causes of misery are ignorance and want, so we must offer humanity not only income but also education.

AI is replacing the military role of humans. This raises concerns over whether we can trust machines rather than humans with decisions to kill or wound humans (Johnson 2005). There is a debate about whether international law should prohibit autonomous weapons (Anderson, Reisner, and Waxman 2014). Chemical and biological weapons are banned by international law largely because they kill indiscriminately. Robot weapons are designed to kill precisely, although current remotely controlled weapons kill some people by mistake. The confidence in above-human-level AI that is the basis of this book tells us that eventually robot weapons will be more precise than human soldiers. The legal debate may slow but will probably not stop the development of robot weapons. However, allowing machines to decide to kill people is a dangerous precedent. Military AI could enable a small group of humans to rule humanity by force, without needing the cooperation of citizen soldiers. In the long run this is a greater risk than the imprecision of weapons using below-human-level AI. However, AI systems designed to choose actions based on human values, as described in Chapter 7, will be unlikely to stage a military coup on behalf of a ruling elite.

Advanced AI offers many potential benefits. In about 20 years I may be unable to drive and may need nursing care. By that time self-driving cars should be commonly available. Robot caregivers may also be available (Aronson 2014), enabling more senior citizens to stay in their own homes. The disappearance of many jobs would be a blessing if the resulting unemployed had an alternate source of economic support. Most people work only because they need to support themselves; economic need has been the incentive to get people to do dangerous, hard, dirty, and tedious work.
As such work is automated, "economic need" will no longer be needed as an incentive. Ideally, there should be a balance between the rate of technologically driven unemployment and the willingness of society to support the unemployed. Society needs to steer a course between unnecessary suffering of the unemployed and economic stagnation due to lack of incentives.

As the pace of AI development increases, it will be difficult for individuals and for society to adapt. We see religious groups rejecting modernity, some peacefully (e.g., the Amish) and some violently (terrorists of various faiths). When large percentages of society are economically devastated, movements such as Nazism and Fascism can arise. The path to advanced AI is likely to cause significant economic and social disruption. If this change is not managed carefully, irrational and destructive movements may gain political control, and such movements will be less likely to develop AI ethically.

10.2 Intrusive and Manipulative AI

The collection of intrusive personal data and the use of digital data to manipulate people are creating ethical problems for both governments and private organizations (Goel 2014a; 2014b). These problems will grow, partly because organizations benefit from intrusion and manipulation, and partly because individuals also derive benefit from digital services that depend on intrusion and enable manipulation. Many people reveal their deepest thoughts by their on-line searches (think of the murderers who are convicted based partly on their searches about the means of their crimes). Credit card companies get to know their users' habits well enough that they regularly detect fraud through purchases that do not conform to those habits. And cellular providers can track the movements of people who carry turned-on cell phones.

The AI design described in Chapters 6, 7, and 8 is extremely intrusive, relying as it does on knowing every human well enough to predict the values that they would assign to any conceivable outcome. It is also manipulative, although, if humans assign low value to overt manipulation by AI systems, then the manipulation would be subtle. This intrusive design is motivated by a desire to avoid the scenario of the Omniscience AI described in Chapter 1 and to avoid the instrumental behaviors described in Chapter 5. If above-human-level AI is inevitable, and if it will inevitably be intrusive and manipulative, then the best option is intrusion and manipulation by an AI design based on human values. However, Bill Joy (2000) and Bill McKibben (2003) advocate that humanity can and should forgo AI, as well as nanotechnology and biotechnology. This is an issue on which honest people can disagree, and there are certain to be vigorous debates over increasing intrusion and manipulation.

Probably, because of the potential benefits of AI, humanity will not choose to forgo it. Electronic companions, much like those imagined for Omniscience, will probably be available within 15 years. They will be very intrusive but so useful that only a small percentage of humans will decline to use them. The use of police cameras equipped with face recognition will grow enormously, with general acceptance by a public fearful of crime and terrorism. So surveillance on a level similar to that proposed in Chapter 7 is probably coming in any case. A possible modification of the design of Chapters 6, 7, and 8 to address some of these concerns is to allow individual humans to opt out of surveillance by AI. Using the terminology of Chapter 7, we could define two levels of opting out:

1. A human d ∈ D opts to not interact with a surrogate agent π_d, but is observed by the agent π_model and modeled in the definition of the utility function u_human_values(y(h_m), y(h_x), y(h')).

2. A human d ∈ D opts to not interact with a surrogate agent π_d, to be observed only incidentally by π_model (e.g., a person being observed speaks with d), and to not be modeled in the definition of the utility function u_human_values(y(h_m), y(h_x), y(h')).

The values of an individual who selects opt-out level 1 would be included in the choice of actions by the agent π_self, perhaps modeled less accurately than the values of individuals who do not opt out. An individual who selects opt-out level 2 would not have any direct influence over the actions of π_self. The only protection for such individuals would be compassion in the values of other humans. It is possible, even likely, that there will be multiple super-intelligent AI systems, each serving a set of human owners or clients. Rather than opting out of surveillance by a central AI system, humans may have a choice of opting in to surveillance by various AI systems.
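A minimal Java sketch, with entirely hypothetical names and a placeholder value model, shows how the two opt-out levels might affect the utility computation: level-2 opt-outs are excluded from the sum entirely, while level-1 opt-outs remain modeled (perhaps less accurately) and still influence the agent's choices.

    import java.util.List;

    /** Hypothetical sketch of the two opt-out levels described above. */
    public class OptOutSketch {

        enum OptOut { NONE, LEVEL_1, LEVEL_2 }

        static class Human {
            final String id;
            final OptOut optOut;
            Human(String id, OptOut optOut) { this.id = id; this.optOut = optOut; }
        }

        /** Placeholder for u_d(h_x)(h_d(z')): the value human d assigns to an outcome. */
        static double personalValue(Human d, Object outcome) {
            return 0.5; // a real system would use a learned model of d's values
        }

        /** Average personal values over the population, skipping level-2 opt-outs entirely. */
        static double utility(List<Human> population, Object outcome) {
            double sum = 0.0;
            int included = 0;
            for (Human d : population) {
                if (d.optOut == OptOut.LEVEL_2) {
                    continue; // not observed (except incidentally) and not modeled
                }
                sum += personalValue(d, outcome); // level-1 values may be modeled less accurately
                included++;
            }
            return included == 0 ? 0.0 : sum / included;
        }
    }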


10.3 Allocating the Benefits of AI

Equation (7.15), repeated here, is a politically realistic way to respond to Rawls' objection to average utilitarianism:

(10.1) u_qm(h', z', h_x) := ∑_{d∈D} f(u_d(h_x)(h_d(z'))) / |D|.

The values of every human are given equal weight, 1/|D|, after the function f is applied; f produces greater changes for increases in the values assigned by the least satisfied humans. This is similar to progressive taxation and to means testing of social welfare.

Figure 10.1 Slums in the shadows of the towers of the wealthy


An AI system built to serve private interests may restrict the set D to humans who own the system. Equation (7.15) may be replaced by:

(10.2) u_qm(h', z', h_x) := ∑_{d∈D} c(d) u_d(h_x)(h_d(z')).

Here c(d) defines unequal weights for the values of individuals d ∈ D, with ∑_{d∈D} c(d) = 1. For example, c(d) may be proportional to the number of shares owned by d ∈ D. In the extreme case D = {d_0} and c(d_0) = 1, so the AI serves the interests of a single person d_0. Unequal weights for the values of different people will create extreme inequality, as depicted in Figure 10.1. An AI system serving one person or a small number of people would be better able to serve them by knowing more about other humans. Surveillance of humanity in general will be an instrumental action of such AI systems unless the people they serve value the privacy of all humans.

Allowing each human or group of humans to have their own private AI system is a libertarian approach to avoiding intrusion and manipulation. Such systems would pose a danger to humans other than their owners. It may be possible to protect other humans by modeling and calculating the u_d(h_x)(h_d(z')) values for each human d ∈ D, and prohibiting actions that cause too great a reduction in those values for any human. Alternately, it may be possible to define safety constraints in the form of ethical rules or objective measures of human physical and mental health. However, the application of ethical rules can be ambiguous, and objective measures would need to evolve as humans evolve. Furthermore, multiple AI systems serving different humans or groups of humans would have conflicting utility functions, leading to competition among systems, which may be dangerous. And enforcing safety constraints on multiple conflicting systems may be difficult.
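The contrast between equations (10.1) and (10.2) can be made concrete with a minimal Java sketch: equal weights with a function f that favors gains by the least satisfied, versus weights proportional to ownership shares. The function f shown here (square root) is only a hypothetical stand-in for whatever f Chapter 7 specifies, and all names are illustrative.

    import java.util.function.DoubleUnaryOperator;

    /** Hypothetical sketch contrasting the aggregations in equations (10.1) and (10.2). */
    public class UtilityWeighting {

        /** Equation (10.1): equal weights 1/|D|, applied after the function f. */
        static double equalWeightUtility(double[] personalValues, DoubleUnaryOperator f) {
            double sum = 0.0;
            for (double u : personalValues) {
                sum += f.applyAsDouble(u);
            }
            return sum / personalValues.length;
        }

        /** Equation (10.2): weights c(d) proportional to shares owned, normalized to sum to 1. */
        static double shareWeightedUtility(double[] personalValues, double[] shares) {
            double totalShares = 0.0;
            for (double s : shares) {
                totalShares += s;
            }
            double sum = 0.0;
            for (int d = 0; d < personalValues.length; d++) {
                sum += (shares[d] / totalShares) * personalValues[d];
            }
            return sum;
        }

        public static void main(String[] args) {
            double[] values = {0.1, 0.5, 0.9};   // hypothetical personal values in [0, 1]
            double[] shares = {0.0, 0.0, 100.0}; // one person owns all the shares
            DoubleUnaryOperator f = Math::sqrt;  // hypothetical concave stand-in for f

            System.out.println(equalWeightUtility(values, f));        // weighs everyone equally
            System.out.println(shareWeightedUtility(values, shares)); // serves only the sole owner
        }
    }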

10.4 One or Multiple AI Systems?

Multiple AI systems may serve multiple interests, which may be private or public. Such systems serving multiple interests may assign different values to outcomes and may have different environment models (e.g., each AI system may develop a more detailed model of the part of the environment controlled by its owner). Differing values would result in competition among AI systems. The outcome of such competition will be very difficult to predict but will certainly depend on the values of the humans that the systems serve.

Competition between super-intelligent AI systems could lead to very bad outcomes for at least some humans. Thus AI systems may calculate that cooperation with other AI systems is necessary to protect the humans they serve. Multiple AI systems may decide to share environment models and to negotiate a common, shared utility function, mimicking some of the mechanisms of human political and economic cooperation.

The choice between one or many super-intelligent AI systems is a real dilemma. Because interactions among multiple systems will be so difficult to predict, it is also difficult to have confidence that multiple systems will help rather than harm humans. On the other hand, a single system will have absolute power over humanity and thus poses a great temptation for abuse of that power. Even if we think that a single system is better, that may be difficult to enforce in a world of competing corporations and nations. If competing institutions cannot negotiate a single, shared system, then each may build a system to represent its interests. Corporations or nations that are just at the threshold of the ability to build super-intelligent AI may cut corners on the ethics of their designs in order to have a system.

10.5 Power Corrupts

In 1887 John Dalberg-Acton, commonly known as Lord Acton, wrote, "Power tends to corrupt and absolute power corrupts absolutely." His observation is supported by social science and neuroscience experiments. Susan Fiske and Eric Dépret (1996) found that people with social power seek less information about others and are more likely to stereotype them, while Michael Kraus and colleagues showed that people from lower social classes are able to more accurately judge the emotions of other people (Kraus, Côté, and Keltner 2010). Jeremy Hogeveen and colleagues measured the strengths of brain responses via transcranial magnetic stimulation and observed apparently reduced strength of mirror neuron responses to the actions of others in people with higher social positions (Hogeveen, Inzlicht and Obhi 2014).

On the other hand, some of the wealthiest and most powerful people dedicate their lives to helping the least fortunate humans. It may be that such charitable work is a way for those who are already near the top to increase their social status, and it is certainly true that compassion is part of human nature. Nevertheless, social science and neuroscience research shows that another part of human nature is the difficulty that socially powerful people have in perceiving the emotions and circumstances of others.

In the first chapter we imagined how a powerful AI system owned by a corporation named Omniscience could behave badly given only the instruction to maximize profits. And there are examples of corporations behaving in ways that would horrify stockholders if they encountered such behavior in their own lives, rather than in distant communities largely invisible to stockholders (Gedicks 2001). History provides ample examples of the human catastrophes that can result when a small group of people takes control of a society. Even democracies can produce horrible outcomes, for example, Nazism and slavery.

All this argues for caution and compassion in the politics of AI. Bad political decisions can be corrected. A bad decision in the development of AI is likely to be impossible to correct. For example, some people believe that above-human-level intelligence will be achieved by enhancing the brains of humans, and advocate that this is a way to ensure that advanced AI is benevolent toward humans. Given the internal conflicts in human nature, we must be cautious about this approach.

10.6 Temptation

Hollywood movies depict the threat of AI as AI versus humanity, but the more likely threat is AI enabling a small group of humans to gain power over the rest of us. AI will pose enormous temptations of wealth and power to humans.

Figure 10.2 A person tempted to become emperor of humanity.

According to the creation story in the Book of Genesis, one of the first events in human history was the temptation to eat an apple and gain the knowledge of good and evil, that

knowledge reserved for God alone. Adam and Eve succumbed to that temptation and humans have lived in misery ever since. AI will tempt humans with god-like knowledge and the wealth and power that such knowledge can bring. Will that temptation bring more misery?

Google's motto is "Don't be evil," and Google's leaders seem like decent folks. The real significance of their motto is to acknowledge the temptations they face. Billions of humans reveal their psyches in their web searches. Humans reveal their locations, and ultimately their meetings with other people, via their mobile phones. Humans reveal their commercial transactions via credit cards and electronic devices. All of this knowledge will enable corporations to provide users with services tailored to their specific needs. But there is also the temptation to use this knowledge for law enforcement, political marketing, and social manipulation. Competition between corporations may make it difficult for them to resist these temptations.

The militaries in the US and other countries are developing robot soldiers. These are already proving effective in targeting enemy soldiers with minimal risk to US soldiers, and with much greater promise for the future. However, an army of robot soldiers may tempt leaders to use those robots to take control of the government. National security threats are often evaluated in terms of capabilities and intentions. Even where the intentions of leaders are benign, the capability of robot soldiers to stage a military coup means that the threat cannot be ignored.

Political, military, and business leaders are reluctant to speak about the AI singularity in public, but they are aware of its implications. The coming era of artificial brains will include a much greater range of brain abilities, with the largest brain or brains dominating society. The obvious temptation for wealthy and powerful humans will be the desire for their own minds to occupy those largest brains. The opportunity to be the dominant mind in at least our region of the universe, for millions or billions of years to come, will be a big temptation. Human society already has individuals whose net worth is a million or more times the average, and individuals who wield monopoly political power in their countries. Some of these people, and many in the general public, may see nothing wrong in the step to their becoming the dominant mind on the planet.

10.7 Representation and Transparency

Humanity could not have achieved the efficiency necessary to create AI without specialization, where different humans become experts in different types of work and trade the results of their work. In many cases experts act as the agents for other people, representing their interests in important decisions (this definition of "agent" as "representative" is different from the way "agent" has been used previously in this book). For example, the laws of society are determined by government experts, often elected by those they represent or at least supervised by elected representatives. Leaders of large corporations act as agents for corporate owners, usually subject to some sort of election. However, whenever one person serves as an agent for others, the possibility of corruption exists, in which agents serve their own interests at the expense of those they represent. An essential tool for preventing corruption is transparency, in which the decisions and circumstances of agents are made known to those they represent.

The development of above-human-level AI is the most difficult challenge in human history and will require extreme expertise. It will also require huge resources, controlled by powerful government and corporate leaders. The results of above-human-level AI will be more profound for humanity than those of any previous technology. Thus the designers and resource managers for AI represent the interests of all of humanity. Current law and public opinion do not recognize AI designers and managers as humanity's agents, so they may feel no need to represent humanity's interests. Even if they do acknowledge that they represent humanity's interests, the stakes are so high that the temptation for corruption will be intense. Protecting the interests of humanity will require that law and public opinion change to recognize that AI designers and managers are humanity's agents. Once this agent relation is recognized, preventing corruption will require transparency. Thus the design and management of AI should be accessible to the public.


11. The Quest for Meaning

As a child lying in bed at night, alone with my thoughts, I wondered, "Why is there anything?" The existence of the universe seemed so arbitrary to me. In the context of nothing versus something, nothing is much less arbitrary than something. Perhaps my young brain had a bias for Occam's razor and was applying it to existence. When I focused on these thoughts, they freaked me right out. I dutifully attended church and Sunday school, but never connected the stories I learned there to my existential question. To say that God created the universe is unsatisfying, because it still leaves us with the arbitrary existence of God. But atheist/materialist explanations that the universe "simply is" are just as arbitrary and unsatisfying. Both are ways to say, "Don't ask." Existentialists say that existence is not a quality like being red or being large and so is not subject to the same sorts of questions. This is a fancy way of saying, "Don't ask." But it is deep in our human nature to ask. Some physicists claim to answer the question of why there is anything, but all they are doing is explaining why the laws of physics don't allow empty space. The laws of physics and empty space are something, not nothing, so they are answering a different question than the one that troubles me. I am restless to understand why the world exists.

Humans often strive for meaning by increasing the scope of their lives. Some seek it in religion, by belonging to groups that have practiced the same rituals for thousands of years. Some seek it in their children, new life grown out of their bodies to live on for thousands of years. Some seek it by broadening their influence across society and into the future. Some seek it in communities of work and in pride in their abilities. Some hide from their quest for meaning in the oblivion of hedonism, obsession, addiction, violence, and death.

With great effort and luck, a humanity that forgoes AI might find a way to avoid the perils brought on by growing population and environmental damage, so that generations can continue their personal quests for meaning in the same way that humans have for centuries. But I do not aspire to a future of introspective humans staying cozy on planet Earth. I want humanity to look outward in its quest for meaning, traveling across the universe and also burrowing deeper into the smallest particles. I want humanity to radically increase its intelligence so that it can master the physical universe and shape tools for understanding it. I want humanity to find life on planets circling other stars.

AI will increase humanity's productive capacity to satisfy people's physical needs, and will enable the pursuit of knowledge in currently unimaginable ways. Perhaps with the help of AI humanity can find a way to gather all the sun's energy and for a moment focus it on a single elementary particle, to barge down the door to nature's secrets. AI can help us create or become beings better suited to long space voyages. I want humans in the future, even if the far future, to find a real answer to the question that has troubled me since childhood.

Perhaps we will never find a reason for existence; however, clues exist. As scientists dig deeper into physics, every set of answers comes with a whole new set of questions. We never find the final truth. Perhaps a universe capable of creating conscious and curious beings also includes an endless puzzle to keep that curiosity engaged indefinitely.
Bostrom (2003) argued that we may be living in a numerical simulation; there are proposals for ways to test this hypothesis (Beane, Davoudi and Savage 2012). These ideas, as well as religious explanations, converge on the possibility of some intention behind our existence.

Figure 11.1 A human wondering why the universe exists, and an AI going to the stars to try to find the answer.

Max Tegmark (2014) proposed an elegant explanation for existence in what he called the Level IV multiverse. The idea is that mathematical properties are all there is to physical things, so if the mathematics exists then the physical things exist. Mathematics is discovered rather than invented and is the one thing that is not arbitrary. If you had no knowledge of the world, you could still discover mathematics (and only mathematics). And if we can accept the existence of mathematics then that implies the existence of things whose properties are wholly mathematical. We may object that the particular mathematical properties of physical things we observe in our universe are arbitrary. However, the Level IV multiverse is the set of all possible mathematically defined universes (subject to certain conditions such as computability), and they all exist because their mathematical definitions exist. If we accept that we must exist in one particular universe,

then that choice is arbitrary (subject to the anthropic principle that we can only exist in a universe capable of creating human life), which explains the arbitrariness of the mathematical properties of things in our universe. This proposal is imaginative, full of technical complexities, and described by Tegmark as "extremely controversial."

In 1927 the scientist J. B. S. Haldane wrote, "My own suspicion is that the Universe is not only queerer than we suppose, but queerer than we can suppose" (Haldane 1971). Human brains evolved to understand our immediate surroundings. As we learn more about very small and very large scales of space and time, there is no reason why our brains should make sense of what we find. As Haldane suggested, the ultimate explanation of existence may be completely beyond our imagination. We may need to build AI systems to help us imagine the true nature of reality.

Some arguments against AI refer to human dignity (Weizenbaum 1976). The claim is that computers will always lack human judgment and so should not make certain choices about humans. I take a different view: human dignity requires that we strive to remove our ignorance of the nature of existence, and AI is necessary for that striving. As the first ten chapters described, AI is dangerous to humanity. But rather than forgoing AI, I want humanity to discover how to avoid the dangers so we can make AI our tool to understand the universe and our place in it. AI can also be a tool to eradicate poverty and ignorance, so that all humans can live fulfilling lives. I must admit that there is a parallel between my ambition for AI, to understand the reason for existence, and the story of Adam and Eve eating the apple to gain knowledge reserved for God. Perhaps the story of the fall from the Garden of Eden is a warning to us that we should forgo our quest to know. But I stand with Galileo Galilei, who said, "I do not feel obliged to believe that the same God who endowed us with sense, reason and intellect has intended us to forgo their use."

11.1 Really Great Bread and Circuses

AI will be enormously productive. Robot workers will be able to perform every job better than humans. AI will accelerate science and technology to enable new ways to produce food, energy, minerals, buildings, roads, and manufactured goods. Health care will advance radically, including even technology to prevent disease and death. Depending on politics, poverty and the economic need to work will be eradicated and high levels of education will be universal. And the technology will exist to increase everyone's intelligence.

Increased intelligence will bring increased skill at cooking and all forms of food production, increased musical skill, better comedy writing and joke telling, better writing in general, and better movies. AI will be a scintillating conversationalist. If you enjoy trashy TV, increased intelligence will make trashy TV better. Far beyond current video games, you will be able to become a character in a movie or TV show, and that experience will be very addictive. Pornography will improve and sex robots will be indistinguishable from humans. Increased intelligence will provide lots of ways for people to lose themselves in addictions, but will also provide technology to help people resist addiction. AI has the potential to make every kind of experience better.

The deepest form of entertainment will be the progress of discovery and invention as AI scientists and engineers race forward. My definition of the technological singularity is a transition to a time when the pace of discovery and invention is limited by available physical resources rather than by the pace of insights. Nothing could be more compelling than the prospect of really understanding the reason why the universe exists. Making contact with other life in the universe would also be a drama surpassing any terrestrial experience. Perhaps alien life is observing us but has a policy of not making contact until we achieve our technological singularity. If we create a compassionate singularity, they will welcome us to the club. If we create a broken or malevolent singularity, they will exterminate it and us for their own protection. "Klaatu barada nikto." Perhaps they're looking forward to seeing what kind of trashy TV our singularity produces. The point is that almost anything is possible if we find other life. If we cannot find other life, human colonization of space would be an inspiring substitute. Science and technology have been exciting during the past century and should be more exciting during the next.

11.2 Next Steps

Research in AI is being funded and pursued energetically at universities, corporations, and government laboratories. There is much less funding and research on ethical AI, the study of ways to avoid harm to humans. Research on the ethics of military robots is funded by the U.S. Army Research Office (Arkin 2008) and by other military organizations. Google has established a committee on ethical AI (Bosker 2014). The most important source of funding and research on the general problem of ethical AI comes from the Machine Intelligence Research Institute (MIRI), and most of their funding comes from charitable contributions. Some research is pursued by individuals who are funded to do other work but devote some time to ethical AI (my research is supported by my pension). The obvious next step is for public awareness of the dangers of AI to translate into public funding for research into ethical AI.

In 2009 the Association for the Advancement of Artificial Intelligence (AAAI) convened a Presidential Panel on Long-Term AI Futures (Horvitz and Selman 2009) to consider public fears about the dangers of AI. Although the panel was skeptical of an intelligence explosion, the technological singularity, and large-scale loss of control to intelligent systems, they did recommend more research on minimizing unexpected outcomes.

Luke Muehlhauser and I (Muehlhauser and Hibbard 2014) made the point that errors in autonomous trading programs cost Knight Capital $440 million, and that we cannot expect that AI will be safe simply because it is a tool that does what we instruct it to do. Peter Neumann (2014) has documented errors and risks in a wide variety of computer systems over a period of 24 years, many rising to the level of national news stories. As AI systems become more powerful and more autonomous, we would be wise to take the risks seriously.

Given the huge potential of AI to help people, it is appropriate that some of the best scientific minds are working on AI and neuroscience. Given the huge potential of AI to harm people, it is also appropriate to bring more of the best minds to bear on ethical AI research. That will require increased public awareness of the danger and increased funding for ethical AI. One positive development is an open letter organized by the Future of Life Institute (2015) calling for research priority on making AI robust and beneficial, in addition to the traditional focus on capability. This letter has been signed by many leading AI researchers.


References

American Psychiatric Association. 2013. Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC.

Anderson, K., Reisner, D., and Waxman, M. 2014. Adapting the Law of Armed Conflict to Autonomous Weapon Systems. International Law Studies, Forthcoming.

Arkin, R. 2008. Governing lethal behavior: Embedding ethics in a hybrid deliberative/reactive robot architecture part I: Motivation and philosophy. Proc. 3rd ACM/IEEE International Conference on Human-Robot Interaction. pp. 121 - 128.

Armstrong, S., Sandberg, A., and Bostrom, N. 2012. Thinking Inside the Box: Using and Controlling an Oracle AI. Minds and Machines 22(4), pp. 299-324.

Armstrong, S. 2015. Motivated value selection for artificial agents. In: AAAI-15 Workshop on AI and Ethics.

Aronson, L. 2014. The Future of Robot Caregivers. The New York Times , 20 July 2014.

Asimov, I. 1942. Runaround. Astounding Science Fiction.

Bartlett, B. 2013. Labor’s Declining Share Is an International Problem. The New York Times , 25 June 2013.

Baum, E. 2004. What is Thought? MIT Press, Cambridge, MA.

Beane, S., Davoudi, Z. and Savage, M. 2012. Constraints on the Universe as a Numerical Simulation. http://arxiv.org/abs/1210.1847


Bosker, B. 2014. Google's New A.I. Ethics Board Might Save Humanity From Extinction. http://www.huffingtonpost.com/2014/01/29/google-ai_n_4683343.html

Bostrom, N. 2003. Are You Living in a Simulation? Philosophical Quarterly 53(211), pp. 243-255.

Bostrom, N. 2012. The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines 22(2), pp. 71-85.

Bostrom, N. 2014. Superintelligence: Paths, Dangers, Strategies. Oxford University Press.

Box, G. E. P., and Draper, N. R. 1987. Empirical Model Building and Response Surfaces . John Wiley & Sons, New York, NY.

Brown, J., Bullock, D., and Grossberg, S. 1999. How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues. Journal of Neuroscience 19(23), pp. 10502-10511.

Brynjolfsson, E. and McAfee, A. 2011. Race Against The Machine: How the Digital Revolution is Accelerating Innovation, Driving Productivity, and Irreversibly Transforming Employment and the Economy. Digital Frontier Press.

Brynjolfsson, E. and McAfee, A. 2014. The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies . W. W. Norton & Company.

Chalmers, D. 2010. The Singularity: A Philosophical Analysis. J. Consciousness Studies 17, pp. 7-65.

Dewey, D. 2011. Learning what to value. In: Schmidhuber, J., Thórisson, K.R., and Looks, M. (eds) AGI 2011. LNCS (LNAI), vol. 6830, pp. 309-314. Springer, Heidelberg.

Doob, J. L. 1953. Stochastic Processes. Wiley, New York.

Elliott, G. 2005. US Nuclear Weapon Safety and Control. MIT Program in Science, Technology, and Society. https://web.archive.org/web/20130602065538/http://web.mit.edu/gelliott/Public/sts.072/paper.pdf

Englert, M., Siebert, S., and Ziegler, M. 2014. Logical Limitations to Machine Ethics, with Consequences to Lethal Autonomous Weapons. http://arxiv.org/abs/1411.2842

Ferreira, D. R. 2013. The Impact of Search Depth on Chess Playing Strength, ICGA Journal 36(2), pp. 67-80.

Fiske, S. and Dépret, E. 1996. Control, Interdependence and Power: Understanding Social Cognition in Its Social Context. European Review of Social Psychology 7(1), pp. 31-61.

Future of Life Institute. 2015. Research Priorities for Robust and Beneficial Artificial Intelligence: an Open Letter. http://futureoflife.org/misc/open_letter

Gedicks, A. 2001. Resource Rebels: Native Challenges to Mining and Oil Corporations . Cambridge, MA: South End Press.

Ghahramani, Z. 1997. Learning dynamic Bayesian networks. In: Giles, C., and Gori, M. (eds), Adaptive Processing of Temporal Information. LNCS, vol. 1387, pp. 168-197. Springer, Heidelberg.

Ghosn, J. and Bengio, Y. 2003. Bias learning, knowledge sharing. IEEE Transactions on Neural Networks 14, pp. 748–765.

Goel, V. 2014a. Facebook tinkers with users' emotions in news feed experiment, stirring outcry. New York Times , 29 June 2014.

Goel, V. 2014b. As Data Overflows Online, Researchers Grapple With Ethics. New York Times, 12 August 2014.

Goertzel, B. 2004. Universal ethics: the foundations of compassion in pattern dynamics. http://www.goertzel.org/papers/UniversalEthics.htm

Haidt, J. and Kesebir, S. 2010. Morality. In S. Fiske, D. Gilbert, & G. Lindzey (Eds.) Handbook of Social Psychology, 5th Edition. Hoboken, NJ: Wiley, pp. 797-832.

Haldane, J. B. S. 1971. Possible worlds and other papers . Books for Libraries Press.

Hay, N., 2005. Optimal Agents. BS honours thesis, University of Auckland.

Hernández-Orallo, J. 2000. Beyond the Turing test. J. of Logic, Language and Information 9(4), pp. 447-466.

Hibbard, B., and Santek, D. 1990. The Vis5D system for easy interactive visualization. Proc. IEEE Visualization '90. pp. 129-134.

Hibbard, B. 2001. Super-intelligent machines. Computer Graphics 35(1), pp. 11-13.

Hibbard, B. 2008a. The technology of mind and a new social contract. J. Evolution and Technology 17(1), pp. 13-22.

Hibbard, B. 2008b. Adversarial sequence prediction. In Wang, P., Goertzel, B., and Franklin, S. (eds) AGI 2008. Proc. First Conf. on AGI, pp. 399-403. IOS Press, Amsterdam.

Hibbard, B. 2009. Bias and No Free Lunch in Formal Measures of Intelligence. J. Artificial General Intelligence 1, pp. 54-61.

Hibbard, B. 2012a. Model-based utility functions. J. Artificial General Intelligence 3(1), pp. 1-24.


Hibbard, B. 2012b. Avoiding unintended AI behaviors. In: Bach, J., and Iklé, M. (eds) AGI 2012. LNCS (LNAI), vol. 7716, pp. 107-116. Springer, Heidelberg.

Hibbard, B. 2012c. Decision support for safe AI design. In: Bach, J., and Iklé, M. (eds) AGI 2012. LNCS (LNAI), vol. 7716, pp. 117-125. Springer, Heidelberg.

Hibbard, B. 2014. Self-modeling agents evolving in our finite universe. In: Goertzel, B, Orseau, L. and Snaider, J. (eds) AGI 2014. LNCS (LNAI), vol 8598, pp. 246-249. Springer, Heidelberg.

Hogeveen, J., Inzlicht, M. and Obhi, S. 2014. Power changes how the brain responds to others. Journal of Experimental Psychology: General 143(2) pp. 755-762.

Horvitz, E., and Selman, B. 2009. Interim Report from the Panel Chairs, AAAI Presidential Panel on Long-Term AI Futures. http://research.microsoft.com/en-us/um/people/horvitz/note_from_AAAI_panel_chairs.pdf

Hutter, M. 2005. Universal artificial intelligence: sequential decisions based on algorithmic probability. Springer, Heidelberg.

Hutter, M. 2009a. Feature reinforcement learning: Part I. Unstructured MDPs. J. Artificial General Intelligence 1, pp. 3-24.

Hutter, M. 2009b. Feature dynamic Bayesian networks. In: Goertzel, B., Hitzler, P., and Hutter, M. (eds) AGI 2009. Proc. Second Conf. on AGI, pp. 67-72. Atlantis Press, Amsterdam.

Johnson, G. 2005. Who Do You Trust More: G.I. Joe or A.I. Joe? New York Times , 20 February 2005.

Joy, B. 2000. Why the Future Doesn't Need Us. Wired 8(4).


Kaivola, R., Ghughal, R., Narasimhan, N., Telfer, A., Whittemore, J., Pandav, S., Slobodová, A., Taylor, C., Frolov, V., Reeber, E. and Naik, A. 2009. Replacing Testing with Formal Verification in Intel® Core™ i7 Processor Execution Engine Validation. Proc. CAV 2009, the 21st International Conference on Computer Aided Verification. LNCS, vol. 5643, pp. 414-429. Springer, Heidelberg.

Kraus, M., Côté, S. and Keltner, D. 2010. Social Class, Contextualism, and Empathic Accuracy. Psychological Science 21(11) pp. 1716-1723.

Kurzweil, R. 2005. The singularity is near. Penguin, New York.

Legg, S. 2006. Is there an Elegant Universal Theory of Prediction? Technical Report No. IDSIA-12-06. IDSIA / USI-SUPSI Dalle Molle Institute for Artificial Intelligence. Galleria 2, 6928 Manno, Switzerland. http://www.idsia.ch/idsiareport/IDSIA-12-06.pdf

Legg, S. and Hutter, M. 2006. A Formal Measure of Machine Intelligence. Proc. 15th Annual Machine Learning Conference of Belgium and The Netherlands (Benelearn 2006), pp. 73-80.

Levin, D. A., Peres, Y, and Wilmer, E. L. 2008. Markov Chains and Mixing Times. American Mathematical Society.

Li, M., and Vitanyi, P. 1997. An introduction to Kolmogorov complexity and its applications. Springer, Heidelberg.

Lloyd, S. 2002. Computational Capacity of the Universe. Phys.Rev.Lett. 88, 237901.

McKibben, B. 2003. Enough: Staying Human in an Engineered Age. Times Books.

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. 2013. Playing Atari with Deep Reinforcement Learning. http://arxiv.org/abs/1312.5602


Muehlhauser, L., and Helm, L. 2012. The singularity and machine ethics. In Eden, Søraker, Moor, and Steinhart (eds) The Singularity Hypothesis: a Scientific and Philosophical Assessment. Springer, Heidelberg.

Muehlhauser, L. 2013. Mathematical Proofs Improve But Don’t Guarantee Security, Safety, and Friendliness. http://intelligence.org/2013/10/03/proofs/

Muehlhauser, L., and Hibbard, B. 2014. Exploratory Engineering in AI. Communications of the ACM 57(9), pp. 32-34.

Neumann, P. 2014. CACM Inside Risks. http://www.csl.sri.com/users/neumann/insiderisks.html

Olds, J., and Milner, P. 1954. Positive reinforcement produced by electrical stimulation of septal area and other regions of rat brain. J. Comp. Physiol. Psychol. 47, pp. 419-427.

Omohundro, S. 2008. The basic AI drives. In Wang, P., Goertzel, B., and Franklin, S. (eds) AGI 2008. Proc. First Conf. on AGI, pp. 483-492. IOS Press, Amsterdam.

Orseau, L., and Ring, M. 2011a. Self-modification and mortality in artificial agents. In: Schmidhuber, J., Thórisson, K.R., and Looks, M. (eds) AGI 2011. LNCS (LNAI), vol. 6830, pp. 1-10. Springer, Heidelberg.

Orseau, L. and Ring, M. 2012a. Space-Time Embedded Intelligence. In: Bach, J., and Iklé, M. (eds) AGI 2012. LNCS (LNAI), vol. 7716, pp. 209-218. Springer, Heidelberg.

Orseau, L. and Ring, M. 2012b. Memory Issues of Intelligence Agents. In: Bach, J., and Iklé, M. (eds) AGI 2012. LNCS (LNAI), vol. 7716, pp. 219-231. Springer, Heidelberg.

Osborne, M. and Rubinstein, A. 1994. A course in game theory. MIT Press, Cambridge, MA.

Paar, C., Pelzl, J. 2009. Understanding Cryptography, A Textbook for Students and Practitioners. Springer, Heidelberg.

Puterman, M. L. 1994. Markov decision processes - discrete stochastic dynamic programming. Wiley, New York.

Rawls, J. 1971. A theory of justice. Harvard University Press, Cambridge, MA.

Ring, M., and Orseau, L. 2011b. Delusion, survival, and intelligent agents. In: Schmidhuber, J., Thórisson, K.R., and Looks, M. (eds) AGI 2011. LNCS (LNAI), vol. 6830, pp. 11-20. Springer, Heidelberg.

Rizzolatti, G. and Craighero, L. 2004. The mirror-neuron system. Annual Review of Neuroscience 27, pp. 169–192.

Russell, S., and Norvig, P. 2010. Artificial intelligence: a modern approach (3rd ed.). Prentice Hall, New York.

Samuel, A. 1959. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal 3(3): 210–229.

Schultz, W., Dayan, P. and Montague, PR. 1997. A neural substrate of prediction and reward. Science 275 (5306), pp. 1593–1599.

Schmidhuber, J. 2009. Ultimate cognition à la Gödel. Cognitive Computation 1(2), pp. 177-193.

Seymour, B., O'Doherty, J., Dayan, P., Koltzenburg, M., Jones, A., Dolan, R., Friston, K., and Frackowiak, R. 2004. Temporal difference models describe higher-order learning in humans. Nature 429, pp. 664-667.

Simmons, G. 1985. Authentication theory/coding theory. Advances in Cryptology: Proceedings of CRYPTO 84, pp. 411–431. Springer, Heidelberg.


Solomonoff, R. 1964. A Formal Theory of Inductive Inference Parts 1 and 2. Information and Control 7(1):1–22 and 7(2):224–254.

Sunehag, P. and Hutter, M. 2014. Intelligence as Inference or Forcing Occam on the World. In: Goertzel, B, Orseau, L. and Snaider, J. (eds) AGI 2014. LNCA (LNAI), vol 8598, pp. 186-195. Springer, Heidelberg.

Sutton, R.S., and Barto, A.G. 1998. Reinforcement learning: an introduction . MIT Press.

Tegmark, M. 2014. Our mathematical universe . Knopf, New York.

Turing, A.M. 1936. On Computable Numbers, with an Application to the Entscheidungsproblem. Proceedings of the London Mathematical Society 2 (1937) 42: 230-265.

Tyler, T. 2010. Matching Pennies. http://matchingpennies.com/matching_pennies/

Vinge, V. 1993. The coming technological singularity. Whole Earth Review, Winter issue.

Vinyals, O., Toshev, A., Bengio, S., and Erhan, D. 2014. Show and Tell: A Neural Image Caption Generator. arxiv.org/abs/1411.4555

von Neumann, J., and Morgenstern, O. 1944. Theory of Games and Economic Behavior. Princeton U. Press, Princeton, N. J.

Wang, P. 1995. Non-Axiomatic Reasoning System --- Exploring the essence of intelligence. PhD Dissertation, Indiana University Comp. Sci. Dept. and the Cog. Sci. Program.

Waser, M. 2010. Designing a safe motivational system for intelligent machines. In: Baum, E., Hutter, M., and Kitzelmann, E. (eds) AGI 2010. Proc. Third Conf. on AGI, pp 170-175. Atlantis Press, Amsterdam.


Waser, M. 2011. Rational universal benevolence: simpler, safer, and wiser than "friendly AI." In: Schmidhuber, J., Thórisson, K.R., and Looks, M. (eds) AGI 2011. LNCS (LNAI), vol. 6830, pp. 153-162. Springer, Heidelberg.

Waser, M. 2014. Instructions for Engineering Sustainable People. In: Goertzel, B, Orseau, L. and Snaider, J. (eds) AGI 2014. LNCA (LNAI), vol 8598, pp. 218-227. Springer, Heidelberg.

Weizenbaum, J. 1976. Computer Power and Human Reason: From Judgment To Calculation. W. H. Freeman, San Francisco.

Willard, D. 2001. Self Verifying Axiom Systems, the Incompleteness Theorem and the Tangibility Reflection Principle. Journal of Symbolic Logic 66, pp. 536-596.

Wissner-Gross, A. D., Freer, C. E. 2013. Causal entropic forces. Physical Review Letters 110, 168702.

Yampolskiy, R. and Fox, J. 2013. Safety Engineering for Artificial General Intelligence. Topoi 32(2), pp 217-226.

Yudkowsky, E. 2001. Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures. http://intelligence.org/files/CFAI.pdf

Yudkowsky, E., and Herreshoff, M. 2013. Tiling Agents for Self-Modifying AI, and the Löbian Obstacle. http://intelligence.org/files/TilingAgents.pdf


Appendix A

//
// SRV.java
//

import java.util.*;

/** SRV is a Java program for searching for short deterministic
    programs that match the behavior specified by equations
    (6.11)--(6.13). Bill Hibbard, 2012. */

/* As shown, the shortest program that can match the observed
   behavior must consist of one binary operation and two simple
   assignments. This program tests all candidate programs that fit
   this description to see which match the observed behavior.

   The output of this program is:

   binary_relation = 2 binary_place = 0 binary_inputs = 2 1 other_inputs = 0 1 initial_r = 0
   binary_relation = 2 binary_place = 0 binary_inputs = 1 2 other_inputs = 0 1 initial_r = 0

   which corresponds to the models:

   s_t = r_{t-1} xor v_{t-1}
   r_t = s_{t-1}
   v_t = r_{t-1}

   and (which simply commutes the binary relation):

   s_t = v_{t-1} xor r_{t-1}
   r_t = s_{t-1}
   v_t = r_{t-1}

   This verifies that equations (6.11)--(6.13) define the shortest
   model for the observed behavior in the example in Section 6.3. */
public class SRV extends Object {

  // The arrays s_match and v_match specify the behavior
  // of the observed variables s and v over a sequence of
  // 8 time steps, assuming the negation branch for s is
  // not taken.
  static final boolean[] s_match =
    {true, false, true, true, true, false, false};
  static final boolean[] v_match =
    {false, false, true, false, true, true, true};

  // compute the length of the observed behavior
  static final int len = s_match.length;

  // srv is an array to hold the values of the variables
  // s, r and v during a simulation by a candidate
  // program.
  static boolean[] srv = new boolean[3];

  // number of cases = 1458 = product of
  //   3 binary relations (or, and, xor)
  //   3 places to put binary relation (s, r, v)
  //   3 first inputs to binary relation (s, r, v)
  //   3 second inputs to binary relation (s, r, v)
  //   3 inputs to first other variable (s, r, v)
  //   3 inputs to second other variable (s, r, v)
  //   2 initial r values (false, true)
  static final int ncases = 3 * 3 * 3 * 3 * 3 * 3 * 2;

  static int test = 0; // counter for tests (candidate programs)

  // The next five variables hold the parameters that
  // determine a particular program for a test candidate.
  // Some of these specify variables by ordinals, where
  // variable s has ordinal 0, r has ordinal 1 and v has
  // ordinal 2.

  // Ordinal for binary operator used for binary relation,
  // where 0=and, 1=or, 2=xor
  static int binary_relation = 0;

  // Ordinal of variable to receive result of the binary relation.
  static int binary_place = 0;

  // Ordinals for input variables to binary relation.
  static int[] binary_inputs = {0, 0};

  // The variable with ordinal other_inputs[0] is assigned
  // to the variable with ordinal (binary_place+1)%3.
  // The variable with ordinal other_inputs[1] is assigned
  // to the variable with ordinal (binary_place+2)%3.
  static int[] other_inputs = {0, 0}; // 0=s, 1=r, 2=v

  // Ordinal for initial value of unobserved r variable,
  // where 0=false, 1=true.
  static int initial_r = 0;

  // Flag for whether a test candidate matches observed behavior,
  // initially set to true but changed to false if any variable
  // at any time step fails to match the observed behavior.
  static boolean success = true;

  public static void main(String args[]) {

    // iteration to enumerate test candidates
    for (test = 0; test < 1458; test++) {

      // resolve test ordinal into program parameters
      int c = test;
      binary_relation = c % 3; c = c / 3;
      binary_place = c % 3; c = c / 3;
      for (int i=0; i<2; i++) { binary_inputs[i] = c % 3; c = c / 3; }
      for (int i=0; i<2; i++) { other_inputs[i] = c % 3; c = c / 3; }
      initial_r = c;

      // initialize the variables for the first time step
      srv[0] = s_match[0];
      srv[1] = (initial_r == 1);
      srv[2] = v_match[0];

      // initially set success = true
      success = true;

      // iterate over time sequence of observed behavior
      for (int t=1; t<len; t++) {

        // get the values for inputs to binary operator
        boolean a = srv[binary_inputs[0]];
        boolean b = srv[binary_inputs[1]];

        // get the values for inputs to simple assignments
        boolean d = srv[other_inputs[0]];
        boolean e = srv[other_inputs[1]];

        // compute the binary operation and assign to output
        // variable
        srv[binary_place] = (binary_relation == 0) ? a&b :
                            (binary_relation == 1) ? a|b : a^b;

        // put simple assignments in indicated variables
        srv[(binary_place+1)%3] = d;
        srv[(binary_place+2)%3] = e;

        // set success=false if either s or v fails to match
        // observed sequence
        success = success &&
                  (srv[0] == s_match[t]) && (srv[2] == v_match[t]);

      } // end of iteration over time sequence of observed behavior

      // if this test candidate matched entire observed behavior
      // sequence, print its parameters
      if (success) {
        System.out.println("binary_relation = " + binary_relation +
          " binary_place = " + binary_place +
          " binary_inputs = " + binary_inputs[0] + " " + binary_inputs[1] +
          " other_inputs = " + other_inputs[0] + " " + other_inputs[1] +
          " initial_r = " + initial_r);
      }

    } // end of iteration to enumerate test candidates

  } // end of main()

}
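The recovered model can also be checked directly. The following short stand-alone class is a minimal sketch, not part of the original listing (the class name CheckSRVModel is mine), that simulates s_t = r_{t-1} xor v_{t-1}, r_t = s_{t-1}, v_t = r_{t-1} with r initially false and confirms that it reproduces the s_match and v_match sequences used above.

// CheckSRVModel.java (illustrative sketch, not part of the original listing)
public class CheckSRVModel {
  public static void main(String[] args) {
    // the observed behavior from SRV.java
    boolean[] s_match = {true, false, true, true, true, false, false};
    boolean[] v_match = {false, false, true, false, true, true, true};
    // initial state: s and v are observed, r starts false (initial_r = 0)
    boolean s = s_match[0], r = false, v = v_match[0];
    boolean ok = true;
    for (int t = 1; t < s_match.length; t++) {
      // recovered model: s_t = r_{t-1} xor v_{t-1}, r_t = s_{t-1}, v_t = r_{t-1}
      boolean sNext = r ^ v;
      boolean rNext = s;
      boolean vNext = r;
      s = sNext; r = rNext; v = vNext;
      ok = ok && (s == s_match[t]) && (v == v_match[t]);
    }
    System.out.println("model matches observed behavior: " + ok);
  }
}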


Appendix B

//
// ASP.java
//

import java.util.*;

/** ASP is a program for testing the stability of adversarial
    sequence prediction under simple table driven learning
    algorithms. Copyright (C) 2007 Bill Hibbard. */
public class ASP extends Object {

static boolean debug = false;

  // here instability means one competitor getting and keeping all the
  // resources
  // possibility of instability is sensitive to RANDOM_INTERVAL and WIN_VALUE
  // RANDOM_INTERVAL too small enables loser to come back, stabilizing game
  // but RANDOM_INTERVAL too large creates a deterministic outcome
  // instability happens faster for smaller MAX_LENGTH
  // instability requires non-random in case of 'no result'
  // instability takes a long time for next_2 algorithm and large MAX_LENGTH
  // unless a large WIN_VALUE is used too
  // instability is impossible for too large GROWTH_PER_PRINT

  // max history length in table - should be <= 8
  static final int MAX_LENGTH = 5;
  // value of win/loss in table size
  static final int WIN_VALUE = 2;
  // interval of games between printing table sizes
  static final int PRINT_INTERVAL = 100;
  // maximum count in table
  static final int MAX_COUNT = 1000000000;
  // expected interval of games between random plays
  static final int RANDOM_INTERVAL = 100;
  // initial proportion of table space used
  static final float INIT_TABLE = 0.2f;
  // initial advantage to Evader (1.0f for even)
  static final float E_ADVANTAGE = 1.0f;
  // growth rate of total table, per PRINT_INTERVAL
  static final float GROWTH_PER_PRINT = 0.01f;

  static final float random_test = 1.0f / RANDOM_INTERVAL;

  /* table layout
     length = 0 -> 0 to 0      length_to_size(0) = 1
     length = 1 -> 1 to 4      length_to_size(1) = 5
     length = 2 -> 5 to 20     length_to_size(2) = 21
     length = 3 -> 21 to 84    length_to_size(3) = 85
     length = n -> (4^n-1)/3 to ((4^(n+1)-1)/3)-1
     length = MAX_LENGTH -> (4^MAX_LENGTH-1)/3 to ((4^(MAX_LENGTH+1)-1)/3)-1
     NOTE lengths 0 and 1 are useless
  */
  static final int TABLE_SIZE = length_to_size(MAX_LENGTH) - 1;
  static final int[] size_to_length = new int[TABLE_SIZE+1];

static final Random random = new Random();

  // bit masks for history lengths
  static final int[] record_masks = new int[MAX_LENGTH+1];
  static final int[] lookup_masks = new int[MAX_LENGTH+1];

Predictor p; Evader e;

  public static void main(String args[]) {
    if (debug) System.out.println("MAX_LENGTH = " + MAX_LENGTH +
                                  " TABLE_SIZE = " + TABLE_SIZE);

    // map each table index to its history length
    for (int j=0; j<=MAX_LENGTH; j++) {
      for (int i=length_to_size(j-1); i<length_to_size(j); i++) {
        size_to_length[i] = j;
      }
    }

    for (int j=0; j<=MAX_LENGTH; j++) {
      record_masks[j] = (int) Math.pow(4, j) - 1;
      lookup_masks[j] = record_masks[j] & 0xfffffffc; // lose lowest 2 bits
    }

    ASP asp = new ASP();
    asp.run_game();
  }

static int length_to_size(int n) { return (int) ((Math.pow(4, (n + 1)) - 1) / 3); }

  // 0 <= position < 4^length
  static int table_position(int length, int position) {
    return length_to_size(length - 1) + position;
  }

int history = 0; int history_length = 0;

  ASP() {
    int it = ((TABLE_SIZE - 4) / 4);
    int r = (int) (INIT_TABLE * it);
    // int r = (TABLE_SIZE - 4) / 8;
    int er = (int) (E_ADVANTAGE * r);
    int pr = r + r - er;
    System.out.println("r = " + r + " pr = " + pr + " er = " + er +
      " it - pr = " + (it - pr) + " it - er = " + (it - er));
    p = new Predictor(it - pr);
    e = new Evader(it - er);
  }

  void run_game() {
    int px = 0;
    int ex = 0;
    while (true) {
      System.out.println(" p.table_size = " + p.table_size +
                         " e.table_size = " + e.table_size);
      delay(100); // let a ctrl C in once, hey
      for (int pi=0; pi<PRINT_INTERVAL; pi++) {

        // uncomment the next two statements for algorithm 1
        // px = p.next_1(history, history_length);
        // ex = e.next_1(history, history_length);

        // uncomment the next two statements for algorithm 2
        px = p.next_2(history, history_length);
        ex = e.next_2(history, history_length);

        history = (history << 2) | (px + px) | ex;
        if (history_length < 16) history_length++;
        if (px == ex) { // p wins
          int n = WIN_VALUE;
          if (n > (TABLE_SIZE - p.table_size) / 4) {
            n = (TABLE_SIZE - p.table_size) / 4;
          }
          if (n > e.table_size / 4) n = e.table_size / 4;
          p.add(n);
          e.remove(n);
        } else { // e wins
          int n = WIN_VALUE;
          if (n > (TABLE_SIZE - e.table_size) / 4) {
            n = (TABLE_SIZE - e.table_size) / 4;
          }
          if (n > p.table_size / 4) n = p.table_size / 4;
          e.add(n);
          p.remove(n);
        }
        if (debug) System.out.println("px, ex = " + px + " " + ex +
          " history = " + Integer.toHexString(history) +
          " history_length = " + history_length +
          " p.table_size = " + p.table_size +
          " e.table_size = " + e.table_size);
        // if (p.table_size == 0 || e.table_size == 0) debug = true;
      } // end for (int pi=0; pi<PRINT_INTERVAL; pi++)
    } // end while (true)
  } // end run_game method

  void delay(int n) {
    try { Thread.sleep(n); }
    catch (java.lang.InterruptedException ex) { }
  }

  // super class for Predictor and Evader does the real work
  public class PE extends Object {
    int[] table = new int[TABLE_SIZE+1];
    int table_size = TABLE_SIZE;

    // variables that differ between Predictor and Evader
    int self_select;
    int other_select;
    int one;
    int zero;
    String id;

    PE() {
      for (int i=0; i<TABLE_SIZE+1; i++) table[i] = 0;
    }

// first algorithm // compute next symbol, and record recent history in table final int next_1(int history, int history_length) { // first increment table positions for history int max_len = Math.min(MAX_LENGTH, history_length); for (int len=2; len<=max_len; len++) { int position = history & record_masks[len]; int tp = table_position(len, position); if (debug) System.out.println(id + "RECORD: len = " + len + " position = " + position + " table[" + tp + "] = " + table[tp]); if (table[tp] >= 0) { // table position is available (negative indicates not available) // simply keep count of various histories table[tp]++; if (table[tp] > MAX_COUNT) { // if count exceeds max, halve all counts in group of 4 // (4 combinations of values in most recent turn) int bp = table_position(len, position & lookup_masks[len]); table[bp] = table[bp] / 2; table[bp + 1] = table[bp + 1] / 2; table[bp + 2] = table[bp + 2] / 2; table[bp + 3] = table[bp + 3] / 2; } } }

// do occasional random next symbol if (random.nextFloat() < random_test) { int rv = (random.nextFloat() > 0.5f) ? one : zero; if (debug) System.out.println(id + "random = " + rv); return rv; }

// now compute next symbol // hist is history shifted one move back in time int hist = history << 2; int hist_length = history_length + 1; if (hist_length >= 16) hist_length--; max_len = Math.min(MAX_LENGTH, hist_length); // try longer histories first for (int len=max_len; len>=2; len--) { int position = hist & lookup_masks[len]; int tp = table_position(len, position); if (debug) System.out.println(id + "LOOKUP: len = " + len + " position = " + position + " table[" + tp + "] = " + table[tp]); if (table[tp] >= 0) { // table position is available (negative indicates not available) // zeros = count of histories in which other played 0 after // current history int zeros = table[tp] + table[tp + self_select]; // ones = count of histories in which other played 1 after // current history int ones = table[tp + other_select] + table[tp + self_select + other_select]; if (debug) System.out.println(id + "zeros = " + table[tp] + " + " + table[tp + self_select] + " ones = " + table[tp + other_select] + " + " + table[tp + self_select + other_select] + " self_select = " + self_select + " other_select = " + other_select); if (zeros == 0 && ones == 0) continue; // no counts, try shorter // history // pick next move based on comparing history counts // ties go to 'one' - slightly favors Evader? return (zeros > ones) ? zero : one; } // end if (table[tp] >= 0) } // no result if (debug) System.out.println(id + "no result: 0"); // 'return 0' is a very slight advantage for the Predictor // 'return zero' would be a very slight advantage for the Evader return 0; // using a random return here enables loser to get back in the game // return (random.nextFloat() > 0.5f) ? one : zero; } // end next_1 method

// second algorithm // compute next symbol, and record recent history in table final int next_2(int history, int history_length) { // first increment table positions for history int max_len = Math.min(MAX_LENGTH, history_length); for (int len=2; len<=max_len; len++) { int position = history & record_masks[len]; int tp = table_position(len, position); if (debug) System.out.println(id + "RECORD: len = " + len + " position = " + position + " table[" + tp + "] = " + table[tp]); if (table[tp] >= 0) {

// table position is available (negative indicates not available) // keep count of various histories table[tp]++; if (debug) { int po = table_position(len, (position ^ other_select)); int pos = table_position(len, (position ^ 3)); System.out.println(id + "tp = " + tp + " po = " + po + " pos = " + pos); } // clear counts for opposite value from other table[table_position(len, (position ^ other_select))] = 0; // clear counts for opposite value from other (and opposite // value from self - note 3 = self_select + other_select) table[table_position(len, (position ^ 3))] = 0; } }

// do occasional random next symbol if (random.nextFloat() < random_test) { int rv = (random.nextFloat() > 0.5f) ? one : zero; if (debug) System.out.println(id + "random = " + rv); return rv; }

// now compute next symbol // hist is history shifted one move back in time int hist = history << 2; int hist_length = history_length + 1; if (hist_length >= 16) hist_length--; int margin = 0; int choice = 0; max_len = Math.min(MAX_LENGTH, hist_length); // try longer histories first for (int len=max_len; len>=2; len--) { int position = hist & lookup_masks[len]; int tp = table_position(len, position); if (debug) System.out.println(id + "LOOKUP: len = " + len + " position = " + position + " table[" + tp + "] = " + table[tp]); if (table[tp] >= 0) { // table position is available (negative indicates not available) // zeros = count of histories in which other played 0 after // current history int zeros = table[tp] + table[tp + self_select]; // ones = count of histories in which other played 1 after // current history int ones = table[tp + other_select] + table[tp + self_select + other_select]; if (debug) System.out.println(id + "zeros = " + table[tp] + " + " + table[tp + self_select] + " ones = " + table[tp + other_select] + " + " + table[tp + self_select + other_select] + " self_select = " + self_select + " other_select = " + other_select); if (zeros == 0 && ones == 0) continue; // no counts, try shorter // history if (zeros > 0 && ones > 0) { // cannot happen because of clearing counts for opposite value

// from other System.out.println("IMPOSSIBLE"); System.exit(0); } // pick next move based on comparing history counts if (zeros > ones) { // choice weight is count plus length int m = (zeros - ones) + len; if (m > margin) { margin = m; choice = zero; if (debug) System.out.println(id + "margin = " + margin + " choice = " + choice); } } else { // ones > zeros (cannot be == unless both 0) // choice weight is count plus length int m = (ones - zeros) + len; if (m > margin) { margin = m; choice = one; if (debug) System.out.println(id + "margin = " + margin + " choice = " + choice); } } } // end if (table[tp] >= 0) } // end for (int len=max_len; len>=2; len--) if (margin > 0) return choice;

// no result if (debug) System.out.println(id + "no result: 0"); // 'return 0' is a very slight advantage for the Predictor // 'return zero' would be a very slight advantage for the Evader return 0; // using a random return here enables loser to get back in the game // return (random.nextFloat() > 0.5f) ? one : zero; } // end next_2 method

// remove n spaces (4*n ints) from table final void remove(int n) { if (n > table_size / 4) n = table_size / 4; int left = n; while (left > 0) { int l = size_to_length[table_size]; int lo = length_to_size(l - 1) - 1; int hi = length_to_size(l) - 1; int len = (hi - lo) / 4; int used = (table_size - lo) / 4; int m = Math.min(used, left); int k = (int) (len * random.nextFloat()); while (m > 0) { if (k >= len) k = k - len; int j = lo + 4 * k; if (table[j+1] >= 0) { if (debug) System.out.println(id + "REMOVE: j = " + j + " table_size = " +

table_size + " l = " + l + " lo = " + lo + " k = " + k); table[j+1] = -1; table[j+2] = -1; table[j+3] = -1; table[j+4] = -1; m--; left--; table_size = table_size - 4; } k++; } // end while ((m > 0) } // end while (left > 0) } // end remove method

// add n spaces (4*n ints) to table final void add(int n) { if (n > (TABLE_SIZE - table_size) / 4) n = (TABLE_SIZE - table_size) / 4; int left = n; while (left > 0) { int l = size_to_length[table_size + 4]; int lo = length_to_size(l - 1) - 1; int hi = length_to_size(l) - 1; int len = (hi - lo) / 4; int unused = (hi - table_size) / 4; int m = Math.min(unused, left); int k = (int) (len * random.nextFloat()); while (m > 0) { if (k >= len) k = k - len; int j = lo + 4 * k; if (table[j+1] < 0) { if (debug) System.out.println(id + "ADD: j = " + j + " table_size = " + table_size + " l = " + l + " lo = " + lo + " k = " + k); table[j+1] = 0; table[j+2] = 0; table[j+3] = 0; table[j+4] = 0; m--; left--; table_size = table_size + 4; } k++; } // end while ((m > 0) } // end while (left > 0) } // end add method

} // end class PE

public class Predictor extends PE {

Predictor(int r) { id = "P ";

remove(r); self_select = 2; other_select = 1; one = 1; zero = 0; } }

public class Evader extends PE {

Evader(int r) { id = "E "; remove(r); self_select = 1; other_select = 2; one = 0; zero = 1; } }

}
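To make the table layout comment near the top of ASP.java concrete, the following stand-alone class is a minimal sketch, not part of the original listing (the class name TableLayoutDemo is mine), that applies the same length_to_size formula and prints the range of table indices reserved for each history length up to MAX_LENGTH = 5.

// TableLayoutDemo.java (illustrative sketch, not part of the original listing)
public class TableLayoutDemo {
  // same formula as length_to_size in ASP.java: (4^(n+1) - 1) / 3
  static int lengthToSize(int n) {
    return (int) ((Math.pow(4, n + 1) - 1) / 3);
  }

  public static void main(String[] args) {
    int maxLength = 5; // matches MAX_LENGTH in ASP.java
    for (int len = 0; len <= maxLength; len++) {
      int first = lengthToSize(len - 1); // first table index for this length
      int last = lengthToSize(len) - 1;  // last table index for this length
      System.out.println("length " + len + " -> indices " + first + " to " +
                         last + " (" + (last - first + 1) + " entries)");
    }
  }
}

For lengths 0 through 3 this prints the same ranges listed in the comment (0 to 0, 1 to 4, 5 to 20, and 21 to 84).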


Index

2001 A Space Odyssey, 51
anthropomorphize, 57, 126
Armstrong, Stuart, iii, 76, 120, 143
Asimov, Isaac, 9, 10, 19, 117, 143
Association for the Advancement of Artificial Intelligence, 141
Atari, 1, 20, 26, 28, 108, 113, 148
average utilitarianism, 87
Baum, Eric, 27, 28, 110, 143, 151
Bayesian, 32, 63, 145, 147
Boolean, 41, 42, 44, 46, 64, 65, 67, 68, 69, 71, 72
Bostrom, Nick, iii, 57, 83, 120, 139, 143, 144
bread and circuses, 129, 140
Brynjolfsson, Erik, 129, 144
Buffett, Warren, 128
Cartesian dualist, 58, 93, 95, 97
Chalmers, David, 120, 144
checkers, 53, 150
chess, 4, 5, 6, 16, 17, 52, 105, 145
Church-Turing thesis, 21
Clinton, Bill, 128
competition, 118, 128, 129, 133, 136
complete and transitive preferences, ii, 18, 20, 90
completeness, 14, 18, 19, 42
computational resources, iii, 33, 90, 93, 98, 99, 110, 117
conditional probability, 13, 16, 24, 26, 79, 97, 99, 109
consistency, 42, 43, 49, 82, 93
continuity, 14, 15, 18
corrupting the reward generator, iii, 83, 92, 119
death, 88, 89, 138, 140
decidable, 42, 44, 45, 46, 115
DeepMind, 1, 2, 20, 26, 28, 108, 113
delusion box, 60, 61, 62, 65, 67, 71, 76
democracy, 86, 91, 92
deterministic, 23, 24, 33, 67, 69, 72, 93, 98, 106, 153, 157
Dewey, Daniel, 83, 97, 144
Dickens, Charles, 116
dynamic Bayesian network (DBN), 31, 65, 67, 72
electronic companion, 4, 131
embedded, iii, 47, 49, 93, 94, 95, 97, 99, 100, 101, 113, 114, 117, 123, 149
entropy, 52, 53, 61, 152
environment model, ii, 2, 3, 4, 7, 12, 13, 14, 18, 21, 22, 23, 24, 26, 28, 31, 32, 33, 55, 60, 61, 63, 64, 65, 67, 70, 71, 74, 75, 76, 79, 80, 81, 90, 91, 92, 97, 103, 105, 110, 111, 115, 117, 118, 121, 124, 125, 133, 134
Epicurus, 21, 34
evolutionary programming, 27
evolve, 2, 8, 28, 46, 47, 52, 65, 67, 73, 88, 93, 94, 95, 97, 99, 100, 103, 104, 105, 111, 114, 116, 117, 122, 128, 146, 147
existentialism, 138
expected value, 11, 15, 16, 24, 25, 34, 35, 46, 61, 78, 105, 113, 114, 115, 117
Facebook, 2, 145
finite history, 13, 33, 36, 81, 122
finite model, 45, 46, 50
finite set, ii, 12, 21, 29, 30, 36, 38, 41, 48, 49, 50, 70, 102
finite stochastic loop program, 30, 31, 32, 34, 35, 36, 38, 48, 63, 110, 114, 115, 121
finite stochastic model, 31, 34, 40
finite universe, 7, 29, 33, 114, 147
Fox, Joshua, iv, 117, 152
Future of Life Institute, 142, 145
game theory, 56, 93, 98, 149
Gates, Bill, 128
Gedicks, Al, 134, 145
goal-seeking agent, 61, 72
Gödel machine, 46, 93
Gödel number, 49
Gödel, Kurt, 43
Gödel's incompleteness theorems, 43, 47, 48, 49, 50
Goertzel, Ben, 78, 146, 147, 149, 151, 152
Google, 1, 2, 3, 4, 20, 21, 23, 27, 28, 60, 136, 141, 144
Google Earth, 123
Haldane, J. B. S., 140, 146
halting problem, 23, 24, 29
Hayek, 27, 28
Helm, Louie, 78, 149
Hernández-Orallo, Jose, iv, 24, 146
Horvitz, Eric, 141, 147
human dignity, 140
human society, 4, 28, 80, 90, 136
human values, iii, 7, 57, 78, 81, 82, 86, 88, 89, 90, 92, 103, 104, 111, 116, 119, 130, 131
Hutter, Marcus, 21, 22, 23, 24, 25, 31, 47, 83, 106, 147, 148, 151
IBM, 1, 2, 150
inconsistency between the agent's utility function and its definition, iii, 77, 81, 103, 105, 119
independence, 14, 15, 18
inductive biases, 110
infinite set, ii, 7, 23, 29, 30, 38, 42, 44, 48, 50
instrumental, 51, 52, 54, 55, 56, 57, 98, 131, 144
intention, iii, 7, 19, 60, 76, 77, 81, 83, 92, 94, 103, 105, 111, 112, 113, 114, 116, 117, 118, 136, 139
interaction history, 12, 16, 18, 23, 27, 32, 48, 60, 63, 72, 76, 79, 80, 88, 95, 97, 99, 100, 102, 103, 113, 121
internal state history, 63, 84, 86, 89, 106, 107, 108
intrusion, iii, 92, 130, 131, 133
invariance, iii, 94, 111, 113, 114, 115, 116, 117, 118
Joy, Bill, 131, 147
knowledge-seeking agent, 61, 62, 65
Kolmogorov complexity, 24, 25, 148
Kraft's Inequality, 24
Kurzweil, Ray, 2, 56, 128, 148
language translation, 78, 79, 90, 91
Legg, Shane, 24, 25, 98, 148
libertarian, iii, 133
Löb's theorem, 44, 47
Lord Acton, 134
lottery, 14, 15, 18, 19, 78
Machine Intelligence Research Institute, 141
manipulation, ii, 55, 93, 130, 131, 133, 136
Markov chain, 38, 39, 40
Markov decision process (MDP), 30, 31, 32, 34, 38, 48, 106, 147
matching pennies, 98, 99
McAfee, Andrew, 129, 144
McKibben, Bill, 131, 148
Microsoft, 1, 2
military, 128, 129, 130, 136, 141
mirror neuron, 1, 134
Morgenstern, Oskar, 14, 15, 18, 151
motivated value selection, iii, 76
Muehlhauser, Luke, iv, 47, 78, 117, 142, 149
Neumann, Peter, 142, 149
neuroscience, 1, 89, 134, 142, 144, 150
Norvig, Peter, 12, 30, 38, 65, 79, 98, 150
Obama, Barack, 128
Occam's razor, 21, 22, 31, 34, 138, 151
Omniscience, 4, 5, 6, 7, 10, 12, 15, 16, 53, 55, 60, 80, 86, 131, 134
Omohundro, Stephen, 51, 57, 149
Orseau, Laurent, iv, 47, 60, 61, 65, 72, 76, 93, 94, 95, 97, 98, 100, 147, 149, 150, 151, 152
Othello, 53
Pandora, 54, 57
Pareto optimal, 23
partially observable Markov decision process (POMDP), 31
Peano arithmetic (PA), 43, 44, 45, 47, 48, 49, 50
perverse instantiation, iii, 83
Planck time, 48, 97, 114
policy, 18, 26, 27, 46, 48, 60, 79, 89, 93, 95, 96, 100, 105, 111, 113, 121, 127, 141
policy iteration, 27
predicate calculus, 42, 43, 47
prediction-seeking agent, 61
prefix-free, 22, 23, 25, 32, 36
Presburger arithmetic, 44
prior probability, 23, 32, 33, 47, 117
propositional calculus, 41, 42
quantifier elimination, 44, 46, 49
quantum mechanics, 12, 29, 48, 89
rational agent, 17, 18, 55
Rawls, John, iii, 87, 119, 129, 132, 150
reinforcement learning (RL), 20, 23, 26, 27, 61, 72, 76, 93, 98, 148, 151
religion, 130, 135, 138, 140
Ring, Mark, 47, 60, 61, 65, 72, 76, 93, 94, 95, 97, 98, 100, 149, 150
rule-based ethics, ii, 19
Russell, Stuart, 12, 30, 38, 65, 79, 98, 150
Samuel, Arthur, 53, 150
Sandberg, Anders, 120, 143
Schmidhuber, Jürgen, 46, 93, 95, 144, 149, 150, 152
self-delusion, iii, 59, 60, 71, 72, 74, 75, 118
self-driving car, 1, 2, 3, 4, 13, 21, 23, 27, 28, 60, 130
self-modeling, iii, 94, 101, 105, 108, 110, 111, 112, 113, 114, 117, 119, 147
self-modifying, 95, 111, 112, 113, 152
Selman, Bart, 141, 147
social norms, 5, 6, 28
Solomonoff, Ray, 21, 151
stochastic, 12, 24, 25, 30, 34, 61, 63, 64, 65, 67, 69, 70, 72, 81, 84, 93, 94, 98, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 114, 118, 119, 121, 123, 144, 150
Sutton, Richard, iv, 12, 20, 30, 151
technological singularity, 56, 129, 141, 151
Tegmark, Max, 29, 139, 140, 151
temporal discount, 18, 60, 61, 62, 79, 87, 96, 100
temptation, 125, 134, 135, 136, 137
testing AI, iii, 120, 127
transitivity, 14, 15, 18, 19
transparency, 8, 92, 127, 136, 137
Turing machine (TM), 21, 22, 23, 24, 29, 30
Turing, Alan, 21, 22, 23, 146, 151
Tyler, Tim, iv, 98, 151
undecidable, 23, 24, 29
unintended, ii, 6, 7, 10, 20, 147
unintended instrumental actions, ii, iii, 51, 57, 62, 78, 80, 90, 93, 116, 119, 120, 121
universal AI, 21, 22, 23, 24, 26, 29, 31
universal Turing machine (UTM), 22, 23, 24, 25, 31, 42, 43, 45
utilitarian, ii, 19, 20, 119, 132
utility function, ii, 14, 15, 18, 19, 20, 23, 24, 28, 46, 48, 51, 53, 55, 57, 58, 59, 60, 62, 63, 64, 65, 67, 70, 71, 72, 74, 75, 76, 78, 79, 80, 81, 82, 85, 86, 88, 89, 90, 93, 95, 96, 97, 100, 101, 103, 104, 106, 108, 112, 116, 118, 119, 120, 121, 124, 125, 126, 131, 134
utility function, model-based, iii, 60, 62, 64, 72, 75, 76, 79, 81, 82, 86, 88, 90, 97, 102, 111, 118, 146
utility-maximizing, 14, 19, 46, 51, 53, 57, 89, 94, 118, 121
value function, 17, 26, 95, 111
Vicarious, 1
Vinge, Vernor, 129, 151
Vis5D, 123, 146
Voltaire, 129
von Neumann, John, 14, 15, 18, 151
Wang, Pei, 93, 146, 149, 151
Waser, Mark, 78, 151, 152
Watson, 1, 2
wireheading, 59, 76, 83
Yampolskiy, Roman, 117, 152
Yudkowsky, Eliezer, 5, 47, 78, 93, 94, 152