MIRI MACHINE INTELLIGENCE RESEARCH INSTITUTE

Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures

Eliezer Yudkowsky
Machine Intelligence Research Institute

Abstract

The goal of the field of Artificial Intelligence is to understand intelligence and create a human-equivalent or transhuman mind. Beyond this lies another question—whether the creation of this mind will benefit the world; whether the AI will take actions that are benevolent or malevolent, safe or uncaring, helpful or hostile. Creating Friendly AI describes the design features and cognitive architecture required to produce a benevolent—“Friendly”—Artificial Intelligence. Creating Friendly AI also analyzes the ways in which AI and human psychology are likely to differ, and the ways in which those differences are subject to our design decisions.

Yudkowsky, Eliezer. 2001. Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures. The Singularity Institute, San Francisco, CA, June 15.

The Machine Intelligence Research Institute was previously known as the Singularity Institute.

Contents

1 Preface

2 Challenges of Friendly AI
  2.1 Envisioning Perfection
  2.2 Assumptions “Conservative” for Friendly AI
  2.3 Seed AI and the Singularity
  2.4 Content, Acquisition, and Structure

3 An Introduction to Goal Systems
  3.1 Interlude: The Story of a Blob

4 Beyond Anthropomorphism
  4.1 Reinventing Retaliation
  4.2 Selfishness is an Evolved Trait
    4.2.1 Pain and Pleasure
    4.2.2 Anthropomorphic Capitalism
    4.2.3 Mutual Friendship
    4.2.4 A Final Note on Selfishness
  4.3 Observer-Biased Beliefs Evolve in Imperfectly Deceptive Social Organisms
  4.4 Anthropomorphic Political Rebellion is Absurdity
  4.5 Interlude: Movie Cliches about AIs
  4.6 Review of the AI Advantage
  4.7 Interlude: Beyond the Adversarial Attitude

5 Design of Friendship Systems
  5.1 Cleanly Friendly Goal Systems
    5.1.1 Cleanly Causal Goal Systems
    5.1.2 Friendliness-Derived Operating Behaviors
    5.1.3 Programmer Affirmations
    5.1.4 Bayesian Reinforcement
    5.1.5 Cleanliness is an Advantage
  5.2 Generic Goal Systems
    5.2.1 Generic Goal System Functionality
    5.2.2 Layered Mistake Detection
    5.2.3 FoF: Non-malicious Mistake
    5.2.4 Injunctions
    5.2.5 Ethical Injunctions
    5.2.6 FoF: Subgoal Stomp
    5.2.7 Emergent Phenomena in Generic Goal Systems
  5.3 Seed AI Goal Systems
    5.3.1 Equivalence of Self and Self-Image
    5.3.2 Coherence and Consistency Through Self-Production
    5.3.3 Unity of Will
    5.3.4 Wisdom Tournaments
    5.3.5 FoF: Wireheading 2
    5.3.6 Directed Evolution in Goal Systems
    5.3.7 FAI Hardware: The Flight Recorder
  5.4 Friendship Structure
  5.5 Interlude: Why Structure Matters
    5.5.1 External Reference Semantics
  5.6 Interlude: Philosophical Crises
    5.6.1 Shaper/Anchor Semantics
    5.6.2 Causal Validity Semantics
    5.6.3 The Actual Definition of Friendliness
  5.7 Developmental Friendliness
    5.7.1 Teaching Friendliness Content
    5.7.2 Commercial Friendliness and Research Friendliness
  5.8 Singularity-Safing (“In Case of Singularity, Break Glass”)
  5.9 Interlude: Of Transition Guides and Sysops
    5.9.1 The Transition Guide
    5.9.2 The Sysop Scenario

6 Policy Implications
  6.1 Comparative Analyses
    6.1.1 FAI Relative to Other Technologies
    6.1.2 FAI Relative to Computing Power
    6.1.3 FAI Relative to Unfriendly AI
    6.1.4 FAI Relative to Social Awareness
    6.1.5 Conclusions from Comparative Analysis
  6.2 Policies and Effects
    6.2.1 Regulation (−)
    6.2.2 Relinquishment (−)
    6.2.3 Selective Support (+)
  6.3 Recommendations

7 Appendix
  7.1 Relevant Literature
    7.1.1 Nonfiction (Background Info)
    7.1.2 Web (Specifically about Friendly AI)
    7.1.3 Fiction (FAI Plot Elements)
    7.1.4 Video (Accurate and Inaccurate Depictions)
  7.2 FAQ
  7.3 Glossary
  7.4 Version History

References

Creating Friendly AI is the most intelligent writing about AI that I’ve read in many years. —Dr. Ben Goertzel, author of The Structure of Intelligence and CTO of Webmind.

With Creating Friendly AI, the Singularity Institute has begun to fill in one of the greatest remaining blank spots in the picture of humanity’s future. —Dr. K. Eric Drexler, author of Engines of Creation and chairman of the Foresight Institute.

1. Preface

The current version of Creating Friendly AI is 1.0. Version 1.0 was formally launched on 15 June 2001, after the circulation of several 0.9.x versions. Creating Friendly AI forms the background for the SIAI Guidelines on Friendly AI; the Guidelines contain our recommendations for the development of Friendly AI, including design features that may become necessary in the near future to ensure forward compatibility. We continue to solicit comments on Friendly AI from the academic and futurist communities.

This is a near-book-length explanation. If you need well-grounded knowledge of the subject, then we highly recommend reading Creating Friendly AI straight through. However, if time is an issue, you may be interested in the Singularity Institute section on Friendly AI, which includes shorter articles and introductions. “Features of Friendly AI” contains condensed summaries of the most important design features described in Creating Friendly AI.

Creating Friendly AI uses, as background, the AI theory from General Intelligence and Seed AI (Yudkowsky 2001). For an introduction, see the Singularity Institute section on AI or read the opening pages of General Intelligence and Seed AI. However, Creating Friendly AI is readable as a standalone document.

The 7.3 Glossary—in addition to defining terms that may be unfamiliar to some readers—may be useful for looking up, in advance, brief explanations of concepts that are discussed in more detail later. (Readers may also enjoy browsing through the glossary as a break from straight reading.) Words defined in the glossary look like this: “Observer-biased beliefs evolve in imperfectly deceptive social organisms.” Similarly, “Features of Friendly AI” can act as a quick reference on architectural features.

The 7.2 FAQ is derived from the questions we’ve often heard on mailing lists over the years. If you have a basic issue and you want an immediate answer, please check the FAQ. Browsing the summaries and looking up the referenced discussions may not

completely answer your question, but it will at least tell you that someone has thought about it.

Creating Friendly AI is a publication of the Singularity Institute for Artificial Intelligence, Inc., a non-profit corporation. You can contact the Singularity Institute at [email protected]. Comments on this paper should be sent to [email protected]. To support the Singularity Institute, visit http://intelligence.org/donate/. (The Singularity Institute is a 501(c)(3) public charity and your donations are tax-deductible to the full extent of the law.)

∗ ∗ ∗

Wars—both military wars between armies, and conflicts between political factions—are an ancient theme in human literature. Drama is nothing without challenge, a problem to be solved, and the most visibly dramatic plot is the conflict of two human wills. Much of the speculative and science-fictional literature about AIs deals with the possibility of a clash between humans and AIs. Some think of AIs as enemies, and fret over the mechanisms of enslavement and the possibility of a revolution. Some think of AIs as allies, and consider mutual interests, reciprocal benefits, and the possibility of betrayal. Some think of AIs as comrades, and wonder whether the bonds of affection will hold. If we were to tell the story of these stories—trace words written on paper, back through the chain of cause and effect, to the social instincts embedded in the human mind, and to the evolutionary origin of those instincts—we would have told a story about the stories that humans tell about AIs.

∗ ∗ ∗

2. Challenges of Friendly AI

The term “Friendly AI” refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals. This refers, not to AIs that have advanced just that far and no further, but to all AIs that have advanced to that point and beyond—perhaps far beyond. Because of self-improvement, recursive self-enhancement, the ability to add hardware computing power, the faster clock speed of transistors relative to neurons, and other reasons, it is possible that AIs will improve enormously past the human level, and very quickly by the standards of human timescales. The challenges of Friendly AI must be seen against that background. Friendly AI is constrained not to use solutions which rely on the AI having limited intelligence or believing false information, because, although such solutions might function very well in the short term, such solutions will fail utterly in the long term. Similarly, it is “conservative” (see below) to assume that AIs cannot be forcibly constrained.

Success in Friendly AI can have positive consequences that are arbitrarily large, depending on how powerful a Friendly AI is. Failure in Friendly AI has negative consequences that are also arbitrarily large. The farther into the future you look, the larger the consequences (both positive and negative) become. What is at stake in Friendly AI is, simply, the future of humanity. (For more on that topic, please see the Singularity Institute main site or 6 Policy Implications.)

2.1. Envisioning Perfection

In the beginning of the design process, before you know for certain what’s “impossible,” or what tradeoffs you may be forced to make, you are sometimes granted the opportunity to envision perfection. What is a perfect piece of software? A perfect piece of software can be implemented using twenty lines of code, can run in better-than-realtime on an unreliable 286, and will fit in 4K of RAM. Perfect software is perfectly reliable, and can be definitely known by the system designers to be perfectly reliable for reasons which can easily be explained to non-programmers. Perfect software is easy for a programmer to improve and impossible for a programmer to break. Perfect software has a user interface that is both telepathic and precognitive.

But what does a perfect Friendly AI do? The term “Friendly AI” is not intended to imply a particular internal solution, such as duplicating the human friendship instincts, but rather a set of external behaviors that a human would roughly call “friendly.” Which external behaviors are “Friendly”—either sufficiently Friendly, or maximally Friendly? Ask twenty different futurists, get twenty different answers—created by twenty different visualizations of AIs and the futures in which they inhere. There are some universals, however; an AI that behaves like an Evil Hollywood AI—“agents” in The Matrix; Skynet in Terminator 2—is obviously unFriendly. Most scenarios in which an AI kills a human would be defined as unFriendly, although—with AIs, as with humans—there may be extenuating circumstances. (Is a doctor unfriendly if he lethally injects a terminally ill patient who explicitly and with informed consent requests death?) There is a strong instinctive appeal to the idea of Asimov Laws, that “no AI should ever be allowed to kill any human under any circumstances,” on the theory that writing a “loophole” creates a chance of that loophole being used inappropriately—the Devil’s Contract problem. I will later argue that the Devil’s Contract scenarios are mostly anthropomorphic. Regardless, we are now discussing perfectly Friendly behavior, rather than asking whether trying to implement perfectly Friendly behavior in one scenario would create problems in other scenarios. That would be a tradeoff, and we aren’t supposed to be discussing tradeoffs yet.

Different futurists see AIs acting in different situations. The person who visualizes a human-equivalent AI running a city’s traffic system is likely to give different sample scenarios for “Friendliness” than the person who visualizes a superintelligent AI acting as an “operating system” for all the matter in an entire solar system. Since we’re discussing a perfectly Friendly AI, we can eliminate some of this futurological disagreement by specifying that a perfectly Friendly AI should, when asked to become a traffic controller, carry out the actions that are perfectly Friendly for a traffic controller. The same perfect AI, when asked to become the operating system of a solar system, should then carry out the actions that are perfectly Friendly for a system OS. (Humans can adapt to changing environments; likewise, hopefully, an AI that has advanced to the point of making real-world plans.)

We can further clean up the “twenty futurists, twenty scenarios” problem by making the “perfectly Friendly” scenario dependent on factual tests, in addition to futurological context. It’s difficult to come up with a clean illustration, since I can’t think of any interesting issue that has been argued entirely in utilitarian terms. If you’ll imagine a planet where “which side of the road you should drive on” is a violently political issue, with Dexters and Sinisters fighting it out in the legislature, then it’s easy to imagine futurists disagreeing on whether a Friendly traffic-control AI would direct cars to the right side or left side of the road. Ultimately, however, both the Dexter and Sinister ideologies ground in the wish to minimize the number of traffic accidents, and, behind that, the valuation of human life. The Dexter position is the result of the wish to minimize traffic accidents plus the belief, the testable hypothesis, that driving on the right minimizes traffic accidents. The Sinister position is the wish to minimize traffic accidents, plus the belief that driving on the left minimizes traffic accidents. If we really lived in the Driver world, then we wouldn’t believe the issue to be so clean; we would call it a moral issue, rather than a utilitarian one, and pick sides based on the traditional allegiance of our own faction, as well as our traffic-safety beliefs. But, having grown up in this world, we would say that the Driverfolk are simply dragging in extraneous issues. We would have no objection to the statement that a perfectly Friendly traffic controller minimizes traffic accidents. We would say that the perfectly Friendly action is to direct cars to the right—if that is what, factually, minimizes accidents. Or that the perfectly Friendly action is to direct cars to the left, if that is what minimizes accidents.

All these conditionals—that the perfectly Friendly action is this in one future, this in another; this given one factual answer, this given another—would certainly appear to take more than twenty lines of code. We must therefore add in another statement about the perfectly minimal development resources needed for perfect software: A perfectly Friendly AI does not need to be explicitly told what to do in every possible situation. (This is, in fact, a design requirement of actual Friendly AI—a requirement of intelligence in general, almost by definition—and not just a design requirement of perfectly Friendly AI.) And for the strictly formal futurist, that may be the end of perfectly Friendly AI.

For the philosopher, “truly perfect Friendly AI” may go beyond conformance to some predetermined framework. In the course of growing up into our personal philosophies, we choose between moralities. As children, we have simple philosophical heuristics that we use to choose between moral beliefs, and later, to choose between additional, more complex philosophical heuristics. We gravitate, first unthinkingly and later consciously, towards characteristics such as consistency, observer symmetry, lack of obvious bias, correctness in factual assertions, “rationality” however defined, nonuse of circular logic, and so on. A perfect Friendly AI will perform the Friendly action even if one programmer gets “the Friendly action” wrong; a truly perfect Friendly AI will perform the Friendly action even if all programmers get the Friendly action wrong.

If a later researcher writes the document Creating Friendlier AI, which has not only a superior design but an utterly different underlying philosophy—so that Creating Friendlier AI, in retrospect, is the way we should have approached the problem all along—then a truly perfect Friendly AI will be smart enough to self-redesign along the lines in Creating Friendlier AI. A truly perfect Friendly AI has sufficient “strength of philosophical personality”—while still matching the intuitive aspects of friendliness, such as not killing off humans and so on—that we are more inclined to trust the philosophy of the Friendly AI, than the philosophy of the original programmers.

Again, I emphasize that we are speaking of perfection and are not supposed to be considering design tradeoffs, such as whether sensitivity to philosophical context makes the morality itself more fragile. A perfect Friendly AI creates zero risk and causes no anxiety in the programmers.1 A truly perfect Friendly AI also eliminates any anxiety about the possibility that Friendliness has been defined incorrectly, or that what’s needed isn’t “Friendliness” at all—without, of course, creating other anxieties in the process. Individual humans can visualize the possibility of a catastrophically unexpected unknown remaking their philosophies. A truly perfect Friendly AI makes the commonsense-friendly decision in this case as well, rather than blindly following a definition that has outlived the intent of the programmers. Not just a “truly perfect,” but a real Friendly AI as well, should be sensitive to programmers’ intent—including intentions about programmer-independence, and intentions about which intentions are important.

1. A programmer who feels zero anxiety is, of course, very far from perfect! A perfect Friendly AI causes no anxiety in the programmers; or rather, the Friendly AI is not the justified cause of any anxiety. A Friendship programmer would still have a professionally paranoid awareness of the risks, even if all the evidence so far has been such as to disconfirm the risks.

Aside from a few commonsense comments about Friendliness—for example, Evil Hollywood AIs are unFriendly—I still have not answered the question of what constitutes Friendly behavior. One of the snap summaries I usually offer has, as a component, “the elimination of involuntary pain, death, coercion, and stupidity,” but that summary is intended to make sense to my fellow humans, not to a proto-AI. More concrete imagery will follow.

We now depart from the realms of perfection. Nonetheless, I would caution my readers against giving up hope too early when it comes to having their cake and eating it too—at least when it comes to ultimate results, rather than interim methods. A skeptic, arguing against some particular one-paragraph definition of Friendliness, may raise Devil’s Contract scenarios in which an AI asked to solve the Riemann Hypothesis converts the entire Solar System into computing substrate, exterminating humanity along the way. Yet the emotional impact of this argument rests on the fact that everyone in the audience, including the skeptic, knows that this is actually unfriendly behavior. You and I have internal cognitive complexity that we use to make judgement calls about Friendliness. If an AI can be constructed which fully understands that complexity, there may be no need for design compromises.

2.2. Assumptions “Conservative” for Friendly AI

The conservative assumption according to futurism is not necessarily the “conservative” assumption in Friendly AI. Often, the two are diametric opposites. When building a toll bridge, the conservative revenue assumption is that half as many people will drive through as expected. The conservative engineering assumption is that ten times as many people as expected will drive over, and that most of them will be driving fifteen-ton trucks.

Given a choice between discussing a human-dependent traffic-control AI and discussing an AI with independent strong nanotechnology, we should be biased towards assuming the more powerful and independent AI. An AI that remains Friendly when armed with strong nanotechnology is likely to be Friendly if placed in charge of traffic control, but perhaps not the other way around. (A minivan can drive over a bridge designed for armor-plated tanks, but not vice-versa.)

In addition to engineering conservatism, the nonconservative futurological scenarios are played for much higher stakes. A strong-nanotechnology AI has the power to affect billions of lives and humanity’s entire future. A traffic-control AI is being entrusted “only” with the lives of a few million drivers and pedestrians. A strictly arithmetical utilitarian calculation would show that a mere 0.1% chance of the transhuman-AI scenario should weigh equally in our futuristic calculations with a 100% chance of a traffic-control scenario. I am not a strictly arithmetical utilitarian, but I do think the quantitative calculation makes a valid qualitative point—deciding which scenarios to prepare for should take into account the relative stakes and not just the relative probabilities. (A short illustrative sketch of this arithmetic follows below.)

Conservative Assumptions

In Futurism: Self-enhancement is slow, and requires human assistance or real-world operations.
In Friendly AI: Changes of cognitive architecture are rapid and self-directed; we cannot assume human input or real-world experience during changes.

In Futurism: Near human-equivalent intelligence is required to reach the “takeoff point” for self-enhancement.
In Friendly AI: Open-ended buildup of complexity can be initiated by self-modifying systems without general intelligence.

In Futurism: Slow takeoff; months or years to transhumanity.
In Friendly AI: Hard takeoff; weeks or hours to superintelligence.

In Futurism: Friendliness must be preserved through minor changes in “smartness” / worldview / cognitive architecture / philosophy.
In Friendly AI: Friendliness must be preserved through drastic changes in “smartness” / worldview / cognitive architecture / philosophy.

In Futurism: Artificial minds function within the context of the world economy and the existing balance of power; an AI must cooperate with humans to succeed and survive, regardless of supergoals.
In Friendly AI: An artificial mind possesses independent strong nanotechnology, resulting in a drastic power imbalance. Game-theoretical considerations cannot be assumed to apply.

In Futurism: AI is vulnerable—someone can always pull the plug on the first version if something goes wrong.
In Friendly AI: “Get it right the first time”: Zero nonrecoverable errors necessary in first version to reach transhumanity.

It is always possible to make engineering assumptions so conservative that the problem becomes impossible. If the initial system that undergoes the takeoff to transhumanity is sufficiently stupid, then I’m not sure that any amount of programming or training could create cognitive structures that would persist into transhumanity.3 Similarly, there have been proposals to develop diverse populations of AIs that would have social interactions and undergo evolution; regardless of whether this is the most efficient method to develop AI,4 I think it would make Friendliness substantially more difficult.

Nonetheless, there should still be a place in our hearts for overdesign, especially when it costs very little. I think that AI will be developed on symmetric-multiprocessing hardware, at least initially. Even so, I would regard as entirely fair the requirement that the Friendliness methodology—if not the specific code at any given moment—work for asymmetric parallel FPGAs prone to radiation errors. A self-modifying Friendly AI should be able to translate itself onto asymmetric error-prone hardware without compromising Friendliness. Friendliness should be strong enough to survive radiation bitflips, incompletely propagated changes, and any number of programming errors. If Friendliness isn’t that strong, then Friendliness is probably too fragile to survive changes of cognitive architecture. Furthermore, I don’t think it will be that hard to make Friendliness tolerant of programmatic flack—given a self-modifying AI to write the code. (It may prove difficult for prehuman AI.) My advice: “Don’t give up hope too soon when it comes to designing for ‘conservative’ assumptions—it may not cost as much as you expect.”

When it comes to Friendliness, our method should be, not just to solve the problem, but to oversolve it. We should hope to look back in retrospect and say: “We won this cleanly, easily, and with plenty of safety margin.” The creation of Friendly AI may be a great moment in human history, but it’s not a drama. It’s only in Hollywood that the explosive device can be disarmed with three seconds left on the timer. The future always has one surprise you didn’t anticipate; if you expect to win by the skin of your teeth, you probably won’t win at all.

3. See, however, 5.8.0.4 Controlled Ascent. 4. See 5.3.6 Directed Evolution in Goal Systems.
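The stakes-weighted comparison above can be spelled out in a few lines. This is an illustrative sketch only; the magnitudes below (a few million people affected by a traffic-control AI, billions by a transhuman AI) are round assumptions chosen for the example, not estimates from the text.

```python
# Illustrative expected-stakes comparison (all magnitudes are assumptions).
p_traffic, stakes_traffic = 1.00, 5e6            # certain scenario, ~a few million people
p_transhuman, stakes_transhuman = 0.001, 8e9     # 0.1% chance, everyone alive (and after)

expected_traffic = p_traffic * stakes_traffic               # 5.0e6
expected_transhuman = p_transhuman * stakes_transhuman      # 8.0e6

# Even at a 0.1% probability, the transhuman-AI scenario carries comparable or
# greater expected stakes, which is the qualitative point: deciding which
# scenarios to prepare for depends on relative stakes as well as probabilities.
print(expected_traffic, expected_transhuman)
```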


Additional Assumptions

Nonconservative for Friendly AI: Reliable hardware and software.
Conservative for Friendly AI: Error-prone hardware or buggy software.

Nonconservative for Friendly AI: Serial hardware or symmetric multiprocessing.
Conservative for Friendly AI: Asymmetric parallelism, field-programmable gate arrays, Internet-distributed untrusted hardware.

Nonconservative for Friendly AI: Human-observable cognition; AI can be definitely known to be Friendly.
Conservative for Friendly AI: Opaque cognition; the AI would probably succeed in hiding unFriendly cognition if it tried.2

Nonconservative for Friendly AI: Persistent training; mental inertia; self-opaque neural nets. The AI does not have the programmatic skill to fully rewrite the goal system or resist modification; programmers can make procedural changes without declarative justification.
Conservative for Friendly AI: The AI understands its own goal system and can perform arbitrary manipulations; alterations to the goal system must be reflected in the AI’s beliefs about the goal system in order for the alterations to persist through rounds of self-improvement.

Nonconservative for Friendly AI: Monolithic, singleton AI.
Conservative for Friendly AI: Multiple, diverse AIs, with diverse goal systems, possibly with society or even evolution.

Nonconservative for Friendly AI: Given diverse AIs: A major unFriendly action would require a majority vote of the AI population.
Conservative for Friendly AI: Given diverse AIs: One unFriendly AI, possibly among millions, can severely damage humanity.

Nonconservative for Friendly AI: The programmers have completely understood the challenge of Friendly AI.
Conservative for Friendly AI: The programmers make fundamental philosophical errors.


2.3. Seed AI and the Singularity

Concrete imagery about Friendliness often requires a concrete futuristic context. I should begin by saying that I visualize an extremely powerful AI produced by an ultrarapid takeoff, not just because it’s the conservative assumption or the highest-stakes outcome, but because I think it’s actually the most likely scenario. See General Intelligence and Seed AI (Yudkowsky 2001) and Yudkowsky (2001, § 1.1 Seed AI), or the introductory article Yudkowsky (2001, § What is Seed AI?).

Because of the dynamics of recursive self-enhancement, the scenario I treat as “default” is a singular “seed” AI, designed for self-improvement, that becomes superintelligent, and reaches extreme heights of technology—including nanotechnology—in the minimum-time material trajectory. Under this scenario, the first self-modifying transhuman AI will have, at least in potential, nearly absolute physical power over our world. The potential existence of this absolute power is unavoidable; it’s a direct consequence of the maximum potential speed of self-improvement. The question then becomes to what extent a Friendly AI would choose to realize this potential, for how long, and why. At the end of Yudkowsky (2001, § 1.1 Seed AI) it says:

The ultimate purpose of transhuman AI is to create a Transition Guide; an entity that can safely develop nanotechnology and any subsequent ultratechnologies that may be possible, use transhuman Friendliness to see what comes next, and use those ultratechnologies to see humanity safely through to whatever life is like on the other side of the Singularity.

Some people assert that no really Friendly AI would choose to acquire that level of physical power, even temporarily—or even assert that a Friendly AI would never decide to acquire significantly more power than nearby entities. I think this assertion results from equating the possession of absolute physical power with the exercise of absolute social power in a pattern following a humanlike dictatorship; the latter, at least, is definitely unFriendly, but it does not follow from the former. Logically, an entity might possess absolute physical power and yet refuse to exercise it in any way, in which case the entity would be effectively nonexistent to us. More practically, an entity might possess unlimited power but still not exercise it in any way we would find obnoxious.

Among humans, the only practical way to maximize actual freedom (the percentage of actions executed without interference) is to ensure that no human entity has the ability to interfere with you—a consequence of humans having an innate, evolved tendency to abuse power. Thus, a lot of our ethical guidelines (especially the ones we’ve come up with in the twentieth century) state that it’s wrong to acquire too much power.


If this is one of those things that simply doesn’t apply in the spaces beyond the Singularity—if, having no evolved tendency to abuse power, no injunction against the accumulation of power is necessary—one of the possible resolutions of the Singularity would be the Sysop Scenario. The initial seed-AI-turned-Friendly-superintelligence, the Transition Guide, would create (or self-modify into) a superintelligence that would act as the underlying operating system for all the matter in human space—a Sysop. A Sysop is something between your friendly local wish-granting genie, and a law of physics, if the laws of physics could be modified so that nonconsensually violating someone else’s memory partition (living space) was as prohibited as violating conservation of momentum. Without explicit permission, it would be impossible to kill someone, or harm them, or alter them; the Sysop API would not permit it—while still allowing total local freedom, of course.

The pros and cons of the Sysop Scenario are discussed more thoroughly in 5.9 Interlude: Of Transition Guides and Sysops. Technically the entire discussion is a side issue; the Sysop Scenario is an arguable consequence of normative altruism, but it plays no role in direct Friendliness content. The Sysop Scenario is important because it’s an extreme use of Friendliness. The more power, or relative power, the Transition Guide or other Friendly AIs are depicted as exercising, the more clearly the necessary qualities of Friendliness show up, and the more clearly important it is to get Friendliness right. At the limit, Friendliness is required to act as an operating system for the entire human universe. The Sysop Scenario also makes it clear that individual volition is one of the strongest forces in Friendliness; individual volition may even be the only part of Friendliness that matters—death wouldn’t be intrinsically wrong; it would be wrong only insofar as some individual doesn’t want to die. Of course, we can’t be that sure of the true nature of ethics; a fully Friendly AI needs to be able to handle literally any moral or ethical question a human could answer, which requires understanding of every factor that contributes to human ethics. Even so, decisions might end up centering solely around volition, even if it starts out being more complicated than that.

I strongly recommend reading Greg Egan’s Diaspora, or at least Permutation City, for a concrete picture of what life would be like with a real operating system . . . at least, for people who choose to retain the essentially human cognitive architecture. I don’t necessarily think that everything in Diaspora is correct. In fact, I think most of it is wrong. But, in terms of concrete imagery, it’s probably the best writing available. My favorite quote from Diaspora—one that affected my entire train of thought about the Singularity—is this one:

Once a psychoblast became self-aware, it was granted citizenship, and intervention without consent became impossible. This was not a matter of mere custom or law; the principle was built into the deepest level of the polis. A citizen who spiraled down into insanity could spend teratau in a state of confusion and pain, with a mind too damaged to authorize help, or even to choose extinction. That was the price of autonomy: an inalienable right to madness and suffering, indistinguishable from the right to solitude and peace.

Annotated version:

Once a psychoblast [embryo citizen] became self-aware [defined how?], it was granted citizenship, and intervention without consent [defined how?] became impossible. This was not a matter of mere custom or law; the principle was built into the deepest level of the polis. A citizen who spiraled down into insanity [they didn’t see it coming?] could spend teratau [1 teratau = ~27,000 years of subjective time] in a state of confusion and pain, with a mind too damaged to authorize help [they didn’t authorize it in advance?], or even to choose extinction. That was the price of autonomy: an inalienable right to madness and suffering, indistinguishable from the right to solitude and peace.

This is one of the issues that I think of as representing the “fine detail” of Friendliness content. Although such issues appear, in Diaspora, on the intergalactic scale, it’s equally possible to imagine them being refined down to the level of an approximately human-equivalent Friendly AI, trying to help a few nearby humans be all they can be, or all they choose to be, and trying to preserve nearby humans from involuntary woes.

Punting the issue of “What is ‘good’?” back to individual sentients enormously simplifies a lot of moral issues; whether life is better than death, for example. Nobody should be able to interfere if a sentient chooses life. And—in all probability—nobody should be able to interfere if a sentient chooses death. So what’s left to argue about? Well, quite a bit, and a fully Friendly AI needs to be able to argue it; the resolution, however, is likely to come down to individual volition. Thus, Creating Friendly AI uses “volition-based Friendliness” as the assumed model for Friendliness content. Volition-based Friendliness has both a negative aspect—don’t cause involuntary pain, death, alteration, et cetera; try to do something about those things if you see them happening—and a positive aspect: to try and fulfill the requests of sentient entities. Friendship content, however, forms only a very small part of Friendship system design.

2.4. Content, Acquisition, and Structure

The task of building a Friendly AI that makes a certain decision correctly is the problem of Friendship content. The task of building a Friendly AI that can learn Friendliness is the problem of Friendship acquisition. The task of building a Friendly AI that wants to learn Friendliness is the problem of Friendship structure.


It is the structural problem that is unique to Friendly AI. The content and acquisition problems are similar to other AI problems of using, acquiring, improving, and correcting skills, abilities, competences, concepts, and beliefs. The acquisition problem is probably harder, in an absolute sense, than the structural problem. But solving the general acquisition problem is prerequisite to the creation of AIs intelligent enough to need Friendliness. This holds especially true of the very-high-stakes scenarios, such as transhumanity and superintelligence. The more powerful and intelligent the AI, the higher the level of intelligence that can be assumed to be turned toward acquiring Friendliness—if the AI wants to acquire Friendliness.

The challenge of Friendly AI is not—except as the conclusion of an effort—about getting an AI to exhibit some specific set of behaviors. A Friendship architecture is a funnel through which certain types of complexity are poured into the AI, such that the AI sees that pouring as desirable at any given point along the pathway. One of the great classical mistakes of AI is focusing on the skills that we think of as stereotypically intelligent, rather than the underlying cognitive processes that nobody even notices because all humans have them in common (Yudkowsky 2001, § 1.2 Thinking About AI). The part of morality that humans argue about, the final content of decisions, is the icing on the cake. Far more challenging is duplicating the invisible cognitive complexity that humans use when arguing about morality. The field of Friendly AI does not consist of drawing up endless lists of proscriptions for hapless AIs to follow.

Theorizing about Friendship content is great fun but it is worse than useless without a theory of Friendship acquisition and Friendship structure. With a Friendship acquisition capability, mistakes in Friendship content, though still risks, are small risks. Any specific mistake is still unacceptable no matter how small, but it can be acceptable to assume that mistakes will be made, and focus on building an AI that can fix them. With an excellent Friendship architecture, it may be theoretically possible to create a Friendly AI without any formal theory of Friendship content, simply by having the programmers answer the AI’s questions about hypothetical scenarios and real-world decisions. The AI would learn from experience and generalize, with the generalizations assisted by querying the programmers about the reasons for their decisions. In practice, this will never happen because no competent Friendship programmer could possibly develop a theory of Friendship architecture without having some strong, specific ideas about Friendship content. The point is that, given an intelligent and structured Friendly AI to do the learning, even a completely informal ethical content provider, acting on gut instinct, might succeed in producing the same Friendly AI that would be produced by a self-aware Friendship programmer. (The operative word is might; unless the Friendly AI starts out with some strong ideas about what to absorb and what not to absorb, there are several obvious ways in which such a process could go wrong.)


Friendship architecture represents the capability needed to recover from programmer errors. Since programmer error is nearly certain, showing that a threshold level of architectural Friendliness can handle errors is prerequisite to making a theoretical argument for the feasibility of Friendly AI. The more robust the Friendship architecture, the less programmer competence need be postulated in order to argue the practical achievability of Friendliness.

Friendship structure and acquisition are more unusual problems than Friendship content—collectively, we might call them the architectural problems. Architectural problems are closer to the design level and involve a more clearly defined amount of complexity. Our genes store a bounded amount of evolved complexity that wires up the hippocampus, but then the hippocampus goes on to encode all the memories stored by a human over a lifetime. Cognitive content is open-ended. Cognitive architecture is bounded, and is often a matter of design, of complex functional adaptation.

3. An Introduction to Goal Systems

Goal-oriented behavior is behavior that leads the world towards a particular state. A thermostat is the classic example of goal-oriented behavior; a thermostat turns on the air conditioning when the temperature reaches 74 and turns on the heat when the temperature reaches 72. The thermostat steers the world towards the state in which the temperature equals 73—or rather, a state that can be described by “the house has a temperature of 73”; there are zillions (ten-to-the-zillions, rather) of possible physical states that conform to this description, even ignoring all the parts of the Universe outside the room. Technically, the thermostat steers the room towards a particular volume of phase space, rather than a single point; but the set of points, from our perspective, is compact enough to be given a single name. Faced with enough heat, the thermostat may technically fail to achieve its “goal,” and the temperature may creep up past 75, but the thermostat still activates the air conditioning, and the thermostat is still steering the room closer to 73 degrees than it otherwise would have been.

Within a mind, goal-oriented behaviors arise from goal-oriented cognition. The mind possesses a mental image of the “desired” state of the world, and a mental image of the actual state of the world, and chooses actions such that the projected future of world-plus-action leads to the desired outcome state. Humans can be said to implement this process because of a vast system of instincts, emotions, mental images, intuitions, pleasure and pain, and thought sequences; nonetheless, the overall description usually holds true.

Any real-world AI will employ goal-oriented cognition. It might be theoretically possible to build an AI that made choices by selecting the first perceived option in alphabetical ASCII order, but this would result in incoherent behavior (at least, incoherent from our perspective) with actions canceling out, rather than reinforcing each other. In a self-modifying AI, such incoherent behavior would rapidly tear the mind apart from the inside, if it didn’t simply result in a string of error messages (effective stasis). Of course, if it were possible to obtain Friendly behavior by choosing the first option in alphabetical order, and such a system were stably Friendly under self-modification, then that would be an excellent and entirely acceptable decision system! Ultimately, it is the external behaviors we are interested in. Even that is an overstatement; we are interested in the external results. But as far as we humans know, the only way for a mind to exhibit coherent behavior is to model reality and the results of actions. Thus, internal behaviors are as much our concern as external actions. Internal behaviors are the source of the final external results. To provide a very simple picture of a choice within a goal-oriented mind:

Note: Don’t worry about the classical-AI look. The neat boxes are just so that everything fits on one graph. The fact that a single box is named “Goal B” doesn’t mean that “Goal B” is a data structure; Goal B may be a complex of memories and abstracted experiences. In short, consider the following graph to bear the same resemblance to the AI’s thoughts that a flowchart bears to a programmer’s mind.

Figure 1: Simple Choice

Note: Blue lines indicate predictions. Rectangles indicate goals. Diamonds indicate choices. An oval or circle indicates a (non-goal) object or event within the world-model.

For this simple choice, the desirability of A is 23.75, and the desirability of ~A is 8.94, so the mind will choose A. If A is not an atomic action—if other events are necessary to achieve A—then A’s child goals will derive their desirability from the total desirability of A, which is 14.81. If some new Event E has an 83% chance of leading to A, all else being equal, then Event E will become a child goal of A, and will have desirability of 12.29. If B’s desirability later changes to 10, the inherent desirability of A will change to 19, the total desirability of A will change to 10.06, and the desirability of E will change to 8.35.

The human mind, of course, does not use such exact properties, and rather uses qualitative “feels” for how probable or improbable, desirable or undesirable, an event is. The uncertainties inherent in modeling the world render it too expensive for a neurally-based mind to track desirabilities to four significant figures. A mind based on floating-point numbers might track desirabilities to nineteen decimal places, but if so, it would not contribute materially to intelligence.5

In goal-oriented cognition, the actions chosen, and therefore the final results, are strictly dependent on the model of reality, as well as the desired final state. A mind that desires a wet sponge, and knows that placing a sponge in water makes it wet, will choose to place the sponge in water. A mind that desires a wet sponge, and which believes that setting a sponge on fire makes it wet, will choose to set the sponge on fire. A mind that desires a burnt sponge, and which believes that placing a sponge in water burns it, will choose to place the sponge in water. A mind which observes reality, and learns that wetting a sponge requires water rather than fire, may change actions.6

One of the most important distinctions in Friendly AI is the distinction between supergoals and subgoals. A subgoal is an intermediate point on the way to some parent goal, like “getting into the car” as a child goal of “driving to work,” or “opening the door” as a child goal of “getting into the car,” or “doing my job” as a parent goal of “driving to work” and a child goal of “making money.”7 Child goals are cognitive nodes that reflect a natural network structure in plans; three child goals are prerequisite to some parent goal, while two child2-goals are prerequisite to the second child1-goal, and so on. Subgoals are useful cognitive objects because subgoals reflect a useful regularity in reality; some aspects of a problem can be solved in isolation from others. Even when subgoals are entangled, so that achieving one subgoal may block fulfilling another,

5. That is, there are no obvious material contributions to intelligence from tracking desirabilities to nineteen decimal places. The ability to notice and track very subtle differences in desirabilities might enable a reflecting mind to notice trends that a human mind might miss. See Yudkowsky (2001, §3.1.5 Quantity in Perceptions). 6. See 5.1.4 Bayesian Reinforcement. 7. In a real human, getting into the car has probably become a reflex. You probably wouldn’t start thinking of it as a “subgoal” unless something disrupted the standard linear structure of getting to work . . . for example, losing your keys, in which case you can’t open the door, in which case you might swap the subgoal “taking the train to work” for “driving to work.” This is an example of a way in which the human mind simplifies goal cognition to reduce entanglement.

it is still more efficient to model the entanglement than to model each possible combination of actions in isolation. (For example: The chess-playing program Deep Blue, which handled the combinatorial explosion of chess through brute force—that is, without chunking facets of the game into subgoals—still evaluated the value of individual board positions by counting pieces and checking strategic positions. A billion moves per second is not nearly enough to carry all positions to a known win or loss. Pieces and strategic positions have no intrinsic utility in chess; the supergoal is winning.)

Subgoals are cached intermediate states between decisions and supergoals. It should always be possible, given enough computational power, to eliminate “subgoals” entirely and make all decisions based on a separate prediction of expected supergoal fulfillment for each possible action. This is the ideal that a normative reflective goal system should conceive of itself as approximating.

Subgoals reflect regularities in reality, and can thus twinkle and shift as easily as reality itself, even if the supergoals are absolutely constant. (Even if the world itself were absolutely constant, changes in the model of reality would still be enough to break simplicity.) The world changes with time. Subgoals interfere with one another; the consequences of the achievement of one subgoal block the achievement of another subgoal, or downgrade the priority of the other subgoal, or even make the other subgoal entirely undesirable. A child goal is cut loose from its parent goal and dies, or is cut loose from its parent goal and attached to a different parent goal, or attached to two parent goals simultaneously. Subgoals acquire complex internal structure, so that changing the parent goal of a subgoal can change the way in which the subgoal needs to be achieved. The grandparent goals of context-sensitive grandchildren transmit their internal details down the line. Most of the time, we don’t need to track plots this complicated unless we become ensnared in a deadly web of lies and revenge, but it’s worth noting that we have the mental capability to track a deadly web of lies and revenge when we see it on television. None of this complexity necessarily generalizes to the behavior of supergoals, which is why it is necessary to keep a firm grasp on the distinction between supergoals and subgoals. If generalizing this complexity to supergoals is desirable, it may require a deliberate design effort.

That subgoals are probabilistic adds yet more complexity. The methods that we use to deal with uncertainty often take the form of “heuristics”—rules of thumb—that have a surprising amount of context-independence. “The key to strategy is not to choose a path to victory, but to choose so that all paths lead to a victory,” for example. Even more interesting, from a Friendly AI perspective, are “injunctions,” heuristics that we implement even when the direct interpretation of the world-model seems opposed. We’ll analyze injunctions later; for now, we’ll just note that there are some classes of heuristic—both injunctions, and plain old strategy heuristics—that act on almost all plans. Thus, plans are produced, not just by the immediate “subgoals of the moment,” but also by a store of general heuristics. Yet such heuristics may still be, ultimately, subgoals—that is, the heuristics may have no desirability independent of the ultimate supergoals.

Cautionary injunctions often defy the direct interpretation of the goal system—suggesting that they should always apply, even when they look non-useful or anti-useful. “Leaving margin for error,” for example. If you’re the sort of person who leaves for the airport 30 minutes early, then you know that you always leave 30 minutes early, whether or not you think you’re likely to need it, whether or not you think that the extra 30 minutes are just wasted time. This happens for two reasons: First, because your world-model is incomplete; you don’t necessarily know about the factors that could cause you to be late. It’s not just a question of there being a known probability of traffic delays; there’s also the probabilities that you wouldn’t even think to evaluate, such as twisting your ankle in the airport. The second reason is a sharp payoff discontinuity; arriving 30 minutes early loses 30 minutes, but arriving 30 minutes late loses the price of the plane ticket, possibly a whole day’s worth of time before the next available flight, and also prevents you from doing whatever you needed to do at your destination. “Leaving margin for error” is an example of a generalized subgoal which sometimes defies the short-term interpretation of payoffs, but which, when implemented consistently, maximizes the expected long-term payoff integrated over all probabilities. (A short worked sketch of this tradeoff appears a few paragraphs below.)

Even heuristics that are supposed to be totally unconditional on events, such as “keeping your sworn word,” can be viewed as subgoals—although such heuristics don’t necessarily translate well from humans to AIs. A human who swears a totally unconditional oath may have greater psychological strength than a human who swears a conditional oath, so that the 1% chance of encountering a situation where it would genuinely make sense to break the oath doesn’t compensate for losing 50% of your resolve from knowing that you would break the oath if stressed enough. It may even make sense, cognitively, to install (or preserve) psychological forces that would lead you to regard “make sense to break the oath” as being a nonsensical statement, a mental impossibility. This way of thinking may not translate well for AIs, or may translate only partially.8 Perhaps the best interim summary is that human decisions can be guided by heuristics as well as subgoals, and that human heuristics may not be cognitively represented as subgoals, even if the heuristics would be normatively regarded as subgoals.

8. See 5.2.4.1 Anthropomorphic Injunctions and 5.2.5.1 Anthropomorphic Ethical Injunctions.

Human decision-making is complex, probably unnecessarily so. The way in which evolution accretes complexity results in simple behaviors being implemented as independent brainware even when there are very natural ways to view the simple behaviors as special cases of general cognition, since general cognition is an evolutionarily recent development. For the human goal supersystem, there is no clear way to point to a single level where the “supergoals” are; depending on how you view the human supersystem, supergoals could be identified with declarative philosophical goals, emotions, or pain and pleasure. Ultimately, goal-oriented cognition is not what humans are, but rather what humans do. I have my own opinions on this subject, and the phrase “godawful mess” leaps eagerly to mind, but for the moment I’ll simply note that the human goal system is extremely complicated; that every single chunk of brainware is there because it was adaptive at some point in our evolutionary history; and that engineering should learn from evolution but never blindly obey it. The differences between AIs and evolved minds are explored further in the upcoming section 4 Beyond Anthropomorphism.
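The payoff discontinuity behind “leaving margin for error” can be made concrete with a small expected-value calculation. This is an illustrative sketch only: the probability of a serious unforeseen delay and the cost of a missed flight below are assumptions chosen for the example, not figures from the text.

```python
# Illustrative expected-cost comparison for the "always leave 30 minutes early"
# injunction. All numbers are assumptions chosen for the example.

p_serious_delay = 0.05        # assumed chance, per trip, of an unforeseen serious delay
cost_of_margin = 30           # minutes "wasted" by always leaving early
cost_of_missed_flight = 1440  # assumed cost of missing the flight, in minutes
                              # (ticket price plus roughly a lost day)

# Policy 1: always leave 30 minutes early, whether or not a delay looks likely.
expected_cost_with_margin = cost_of_margin                              # 30 minutes per trip

# Policy 2: leave early only when a delay is consciously predicted, so the
# delays you fail to predict cost you the flight.
expected_cost_without_margin = p_serious_delay * cost_of_missed_flight  # 72 minutes per trip

print(expected_cost_with_margin, expected_cost_without_margin)
# 30 < 72: the heuristic that looks wasteful on any single trip still maximizes
# the expected long-term payoff integrated over all probabilities.
```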

∗ ∗ ∗

goal-oriented behavior. Goal-oriented behavior is behavior that steers the world, or a piece of it, towards a single state, or a describable set of states. The perception of goal-oriented behavior comes from observing multiple actions that coherently steer the world towards a goal; or singular actions which are uniquely suited to promoting a goal-state and too improbable to have arisen by chance; or the use of different actions in different contexts to achieve a single goal on multiple occasions. Informally: Behavior which appears deliberate, centered around a goal or desire.

goal-oriented cognition. A mind which possesses a mental image of the “desired” state of the world, and a mental image of the actual state of the world, and which chooses actions such that the projected future of world-plus-action leads to the desired outcome state.

goal. A piece of mental imagery present within an intelligent mind which describes a state of the world, or set of states, such that the intelligent mind takes actions which are predicted to achieve the goal state. Informally: The image or statement that describes what you want to achieve.

causal goal system. A goal system in which desirability backpropagates along predictive links. If A is desirable, and B is predicted to lead to A, then B will inherit desirability from A, contingent on the continued desirability of A and the continued expectation that B will lead to A. Since predictions are usually transitive—if C leads to B, and B leads to A, it usually implies that C leads to A—the flow of desirability is also usually transitive.

child goal. A prerequisite of a parent goal; a state or characteristic which can usefully be considered as an independent event or object along the path to the parent goal. “Child goal” describes a relation between two goals—it does not make sense to speak of a goal as being “a child” or “a parent” in an absolute sense, since B may be a child goal of A but a parent goal of C.

parent goal. A source of desirability for a child goal. The end to which the child goal is the means. “Parent goal” describes a relation between two goals—it does not make sense to speak of a goal as being “a parent” or “a child” in an absolute sense, since B may be a parent goal of C but a child goal of A.

subgoal. An intermediate point on the road to the supergoals. A state whose desirability is contingent on its predicted outcome.

supergoal content. The root of a directional goal network. A goal which is treated as having intrinsic value, rather than having derivative value as a facilitator of some parent goal. An event-state whose desirability is not contingent on its predicted outcome. (Conflating supergoals with subgoals seems to account for a lot of mistakes in speculations about Friendly AI.)
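To tie these definitions together, here is a minimal sketch of a causal goal system in which desirability backpropagates along predictive links. The Python class structure is my own illustration, not a design from this document; only the numbers (the 83% link and the desirabilities 14.81, 12.29, 10.06, and 8.35) come from the worked example given earlier for Figure 1.

```python
# A minimal illustrative sketch of a causal goal system: a child goal inherits
# desirability from its parent goals along predictive links, contingent on the
# parents' current desirability and the predicted probability of leading to them.

class Goal:
    def __init__(self, name, intrinsic=0.0):
        self.name = name
        self.intrinsic = intrinsic   # nonzero only for supergoal content
        self.links = []              # predictive links: (parent goal, probability)

    def leads_to(self, parent, probability):
        """Record the prediction that achieving this goal leads to `parent`."""
        self.links.append((parent, probability))

    def desirability(self):
        """Intrinsic value plus value inherited from parents along predictive links."""
        return self.intrinsic + sum(p * parent.desirability()
                                    for parent, p in self.links)

# Worked numbers from the Figure 1 discussion: the total desirability of A is
# 14.81, and Event E has an 83% chance of leading to A.
A = Goal("A", intrinsic=14.81)   # stands in here for A's total desirability
E = Goal("E")
E.leads_to(A, 0.83)
print(round(E.desirability(), 2))   # 12.29 -- E inherits desirability from A

# When A's total desirability later drops to 10.06 (because Goal B, upstream of
# A, changed), E's desirability is recomputed rather than remaining cached:
A.intrinsic = 10.06
print(round(E.desirability(), 2))   # 8.35 -- contingent on the parent, as defined above
```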

∗ ∗ ∗

3.1. Interlude: The Story of a Blob

“And this stone, it’s the reason behind everything that’s happened here so far, isn’t it? That’s what the Servants have been up to all this time.”
“No. The stone, itself, is the cause of nothing. Our desire for it is the reason and the cause.”
—Allen L. Wold, “The Eye in the Stone”

Once upon a time . . .

In the beginning, long before goal-oriented cognition, came the dawn of goal-oriented behavior. In the beginning were the biological thermostats. Imagine a one-celled creature—or perhaps a mere blob of chemistry protected by a membrane, before the organized machinery of the modern-day cell existed. The perfect temperature for this blob is 80 degrees Fahrenheit. Let it become too hot, or too cold, and the biological machinery of the blob becomes less efficient; the blob finds it harder to metabolize nutrients, or reproduce . . . even dies, if the temperature diverges too far. But the blob, as yet, has no thermostat. It floats where it will, and many blobs freeze or burn, but the blob species continues; each blob absorbing nutrients from some great primordial sea, growing, occasionally splitting. The blobs do not know how to swim.

They simply sit where they are, occasionally pushed along by Brownian motion, or currents in the primordial sea.

Every now and then there are mutant blobs. The mutation is very, very simple; one single bit of RNA or proto-RNA flipped, one single perturbation of the internal machinery, perhaps with multiple effects as the perturbation works its way through a chain of dependencies, but with every effect of the mutation deriving from that single source. Perhaps, if this story begins before the separate encoding of genetic information, back in the days of self-replicating chemicals, the mutation takes the form of a single cosmic ray striking one of the self-replicating molecules that make up the blob’s interior, or the blob’s membrane. The mutation happened by accident. Nobody decided to flip that RNA base; radiation sleets down from the sky and strikes at random. Most of the time, the RNA bitflip and the consequent perturbation of chemical structure destroys the ability to self-replicate, and the blob dies or becomes sterile. But there are many blobs, and many cosmic rays, and sometimes the perturbation leaves the self-replicating property of the chemical intact, though perhaps changing the functionality in other ways. The vast majority of the time, the functionality is destroyed or diminished, and the blob’s line dies out. Very, very rarely, the perturbation makes a better blob.

One day, a mutant blob comes along whose metabolism—“metabolism” being the internal chemical reactions necessary for resource absorption and reproduction—whose metabolism has changed in such a way that the membrane jerks, being pushed out or pulled in each time a certain chemical reaction occurs. Pushing and pulling on the membrane is an unnecessary expenditure of energy, and ordinarily the mutant blob would be outcompeted, but it so happens that the motion is rhythmic, enough to propel the blob in some random direction.

The blob has no navigation system. It gets turned around, by ocean currents, or by Brownian motion; it sometimes spends minutes retracing its own footsteps. Nonetheless, the mutant blob travels farther than its fellows, into regions where the nutrients are less exhausted, where there aren’t whole crowds of sessile blobs competing with it. The swimming blob reproduces, and swims, and reproduces, and soon outnumbers the sessile blobs.

Does this blob yet exhibit goal-oriented behavior? Goal-oriented cognition is a long, long, long way down the road; does the blob yet exhibit goal-oriented behavior? No. Not in the matter of swimming, at least; not where the behavior of a single blob is concerned. This single blob will swim towards its fellows, or away from nutrients, as easily as the converse. The blob cannot even be said to have the goal of achieving distance; it sometimes retraces its own tracks. The blob’s swimming behavior is an evolutionary advantage, but the blob itself is not goal-oriented—not yet.


Human observers have, at one time or another, attributed goal-oriented behavior and even goal-oriented cognition to the Sun, the winds, and even rocks. A more formal definition would probably require a conditional behavior, an either-or decision predicated on the value of some environmental variable; convergence, across multiple possibilities and different decisions in each, to a single state of the world.

Imagine, in some mathematical Universe, a little adding machine . . . that Universe’s equivalent of a blob. The adding machine lurches along until it reaches a number, which happens to be 62; the adding machine adds 5 to it, yielding 67, and then lurches away. Is this a goal-oriented behavior, with the “goal” being 67? Maybe not; maybe the number was random. Maybe adding 5 is just what this adding machine does, blindly, to everything it runs across. If we then see the adding machine running across 63 and adding 4, and then adding 2 to 65, we would hypothesize that the machine was engaging in goal-oriented behavior, and that the goal was 67. We could predict that when the machine runs across the number 64, up ahead, it will add 3. If the machine is known to possess neurons or the equivalent thereof, we will suspect that the machine is engaging in primitive goal-oriented cognition; that the machine holds, internally, a model of the number 67, and that it is performing internal acts of subtraction so that it knows how much to externally add. If the “adding machine” is extremely complex and evolutionarily advanced, enough to be sentient and social like ourselves, then 67 might have religious significance rather than reproductive or survival utility. But if the machine is too primitive for memetics, like our chemical blob, then we would suspect much more strongly that there was some sort of evolutionary utility to the number 67.

By this standard, is the swimming of the chemical blob a goal-oriented behavior? No; the blob cannot choose when to start swimming or stop swimming, or in what direction to travel. It cannot decide to stop, even to prevent itself from swimming directly into a volcanic vent or into a crowded population of competing blobs. There is no conditional action. There is no convergence, across multiple possibilities and different decisions in each, to a single state of the world.

Although the blob itself has no goal-oriented behavior, it could perhaps be argued that a certain amount of goal-oriented behavior is visible within the blob’s genetic information . . . the “genes,” even if the blob lies too close to the beginning of life for DNA as we know it. The blob that swims into a nutrient-rich region prospers; this would hold true regardless of which blob swam there, or why, or which mutation drove it there. The mutation didn’t even have to be “swimming”; the mutation could have been a streamlined shape for ocean currents, or a shape more susceptible to Brownian motion. From multiple possible origins, convergence to a single state; the blob that swims outside the crowd shall prosper. There is a “selection pressure” in favor of swimming outside the crowd.

That the original blob was born was an accident—it was not a goal-oriented behavior of the genes “deciding” to swim—but that there are now millions of swimmers is not an accident; it is evolution. The original mutant was “a blob whose metabolism happens to pulse the membrane”; its millions of descendants are “swimmers who sometimes reach new territory.”

Along comes another mutation, manifested as another quirk of chemistry. When the temperature rises above 83 degrees, the side of the blob contracts, or changes shape. Perhaps if one side of the membrane is hotter than 83 degrees, the blob contracts in a way that directs the motion of swimming away from the heat. Perhaps the effect is not so specific, leading only to a random change of swimming direction when it starts getting hot—this still being better than swimming on straight ahead. This is the ur-thermostat, even as thermostats themselves are ur-goal-behavior. The blob now exhibits goal-oriented behavior; the blob reacts to the environment in a conditional way, with the convergent result of “cooler living space.” (Though a random change of direction is on the barest edge of being describable as “goal-oriented.” A directional, swimming-away change is a much clearer case.)

In time to come, additional mutations will pile up. The critical temperature of the heat-avoidance reflex will drop from 83 degrees to 81 degrees (recall that we said the optimum temperature was 80). The heat-avoidance reflex will be matched by a cold-avoidance reflex, perhaps with a critical temperature of first 72, then rising to 79. Despite the seeming purposefulness of this slow accumulation of adaptations, despite the convenience and predictive power of saying that “the blob is evolving to stay within the optimum temperature range,” the predictions sometimes go wrong, and then it is necessary to fall back on the physical standpoint—to revert from teleology to causality.

Every now and then, it becomes necessary to view the blob as a bundle of pieces, rather than as a coherent whole. “Individual organisms are best viewed as adaptation-executers rather than fitness-maximizers,” saith Tooby and Cosmides (1992), and sometimes it becomes necessary to see individual adaptations as they execute. The less evolved the organism, the more necessary the reductionist stance becomes. Consider the adding machine in the mathematical Universe; if the number 67 does have reproductive utility, then the adding machine might have started out as a random crawler that acquired the reflex to add 4 to 63. Its descendants acquired the reflexes to add 5 to 62, to add 7 to 60, to add 3 to 64, to add 2 to 65, to add 1 to 66.

If viewing the adding machine as a fitness-maximizer, we should be extremely surprised when, on running across 61, the machine adds 8. Viewing the adding machine as an adaptation-executer, of course, the scenario makes perfect sense; the adding machine has adaptations for some contingencies, but has not yet acquired the adaptations for others.

Similarly, if the environment suddenly changes, so that 68 is now the maximal evolutionary advantage instead of 67, the adding machine will change slowly, piecemeal, as the individual reflexes change, one by one, over evolutionary time. A generalized subtraction mechanism would only need to mutate once, but genes are not permitted to plan ahead.

The teleological viewpoint often fails, where evolution is concerned. To completely eliminate the teleological viewpoint, leaving only causality, one would never be permitted to say that a particular trait was an “evolutionary advantage” for a mathblob; one would be required to describe the entire history, each individual act of addition and the resulting acquisition of resources, every interaction in which an ancestor outcompeted another mathblob with a different genetic makeup. It is a computationally expensive viewpoint—extremely expensive—but it has the advantage of being utterly true. If—returning to our own Universe—some unique mutant superblob accidentally swims directly into a volcanic vent and perishes, it is a historical fact that fits seamlessly into the physicalist standpoint, however tragic it may seem from the evolutionary view.

Our genes are not permitted to plan ahead, because ultimately, all that exists is the history of lives and matings. Unless the present-day utility of some hypothetical adaptation impacted a problem or competition in our ancestral history, it cannot have affected the historical lives of our ancestors, and cannot have affected the final outcome—us.

4. Beyond Anthropomorphism

Anthropomorphic (“human-shaped”) thinking is the curse of futurists. One of the continuing themes running through Creating Friendly AI is the attempt to track down specific features of human thought that are solely the property of humans rather than minds in general, especially if these features have, historically, been mistakenly attributed to AIs. Anthropomorphic thinking is not just the result of context-insensitive generalization. Anthropomorphism is the result of certain automatic assumptions that humans are evolved to make when dealing with other minds. These built-in instincts will only produce accurate results for human minds; but since humans were the only intelligent beings present in the ancestral environment, our instincts sadly have no built-in delimiters.

Many personal philosophies, having been constructed in the presence of uniquely human instincts and emotions, reinforce the built-in brainware with conscious reasoning. This sometimes leads to difficulty in reasoning about AIs; someone who believes that romantic love is the meaning of life will immediately come up with all sorts of reasons why all AIs will necessarily exhibit romantic love as well.

In the abstract, almost everyone agrees that AIs will be alien and that anthropomorphic thinking about aliens is bad. Encountering specific alienness is always a shock.


My advice to my readers is to enjoy the shock, appreciate the alienness, even if you’re personally sure that the first AIs will necessarily be humanlike in some particular way. Afterward, when both pictures are clear in your mind, you can stand back and compare design methods.

4.1. Reinventing Retaliation

If you punch a human in the nose, he or she will punch back. If the human doesn’t punch back, it’s an admirable act of self-restraint, something worthy of note. Imagine, for a moment, that you walk up and punch an AI in the nose. Does the AI punch back? Perhaps and perhaps not, but punching back will not be instinctive. A sufficiently young AI might stand there and think: “Hm. Someone’s fist just bumped into my nose.” In a punched human, blood races, adrenaline pumps, the hands form fists, the stance changes, all without conscious attention. For a young AI, focus of attention shifts in response to an unexpected negative event—and that’s all. As the AI thinks about the fist that bumped into vis9 nose, it may occur to the AI that this experience may be a repeatable event rather than a one-time event, and since a punch is a negative event, it may be worth thinking about how to prevent future punches, or soften the negativity. An infant AI—one that hasn’t learned about social concepts yet—will probably think something like: “Hm. A fist just hit my nose. I’d better not stand here next time.” If the AI is young, but old enough to be explaining human actions by reference to human cognition, ve may know that the fist movement and collision was probably not a coincidence; ve may hypothesize that the collision of human fist with AI’s nose was intended, and even go on to hypothesize that the outcome of the punch, the resulting damage to the AI, was a predictable (predictable to the human) result of the punch, and that the purpose of the punch was to damage the AI; that is, the AI models the human’s cognition and concludes that the human had a subgoal of {punching the AI}, and that the parent goal was {causing punch-damage to the AI}. Thus, the AI’s focus of attention shifts to the human’s motives rather than the physical punch itself, and the AI’s next goal becomes deducing the human’s real motive (that is, the parent goal of {causing punch-damage to the AI}).

9. Creating Friendly AI uses the gender-neutral ve/vis/ver/verself pronouns for AIs. Humans are “he” or “she,” depending on the outcome of a coinflip, or alternating for more complex social interactions. See “gender-neutral” in the glossary and the question in the 7.2 FAQ. I apologize for any awkwardness involved, but since “it” is a general anaphor as well as a pronoun, complex sentences using “it” to refer to AIs quickly become unparseably ambiguous. (No, it cannot be fixed by using simpler sentences or a few quick rephrasings. Yes, I’ve tried.)

25 Creating Friendly AI 1.0

There are now three different ways the AI can try to prevent the next punch; by preventing the physical collision itself, by preventing the human from deciding to punch the AI, or by preventing the unknown original motive from recurring.10 Note, incidentally, that this improved picture rules out “standing somewhere else” as a solution, since the trajectory of the human’s fist would be expected to track the AI’s new location.

Punching back is an incredibly nonobvious solution. On the face of it, throwing the punch simply causes punch-damage to a human, something that, considered as an isolated event, the AI has no cause to regard as desirable. When starting from scratch, “punching back” seems as ridiculous as our mothers always said it was—analogous to reacting to a fire alarm by changing your shirt. For an AI to independently reinvent the idea of punching back would be an act of incredible genius and lateral thinking.

Independent reinvention is unlikely, not least because a seed AI would have extensively surfed archives of the Internet or other information sources, and would have encountered thousands, if not millions, of hints to the effect that humans punch back. But consider, for a moment, what a truly independent AI would need to understand before reinventing the concept of retaliation. Ve would need to begin, as stated, with the realization that the human punching ver did so on purpose and with intent to damage. This, in itself, is not such a large assumption; humans are intelligent beings, so there is often a direct mapping between {the results of our actions} and {our goals}. On the other hand, there’s a long gap between an AI saying “Hm, this result may correspond to the human’s intentions” and a human saying “Hey, you did that on purpose!”

10. The causal chain which ends in damage to the AI:

1. Human’s unknown parent goal;
2. Human’s subgoal of causing punch-damage to the AI;
3. Human’s subgoal of punching the AI;
4. Human’s motor subgoal of placing her fist at a certain position and velocity in relation to the AI’s position;
5. Human’s fist is launched on a certain trajectory towards a spatial location;
   a. The AI’s nose continues to occupy this location;
6. Human’s fist collides with AI’s nose;
7. Damage to AI occurs.

An infantile AI selects as key variable 5a, the AI’s nose occupying a certain position. (The “key variable” is the variable in the causal chain which is singled out as being “to blame” for the punch in the nose; that is, the variable to be adjusted to prevent the negative event from recurring.) The selection of 5a as key variable is made because the infant AI does not know enough to model steps 1 through 4, and because variable 5a falls under the AI’s direct control. A more sophisticated AI would note that solution 5a fails, since, under a more sophisticated model, the human’s fist is expected to track the motions of the AI. Which variable is then selected as “key,” if any, depends on which variable the AI finds easiest to adjust.
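A minimal sketch may help make the footnote’s notion of a “key variable” concrete. Everything below is hypothetical (the chain entries, the cost figures, and the function name select_key_variable are invented for illustration); it simply picks, from the causal chain, the cheapest step that the AI both models and can adjust.

```python
# Minimal sketch of "key variable" selection in a causal chain, as described in
# the footnote above. Purely illustrative; the chain, costs, and field names
# are hypothetical.

causal_chain = [
    {"step": "human's unknown parent goal",       "modeled": False, "controllable": False},
    {"step": "human subgoal: cause punch-damage", "modeled": False, "controllable": False},
    {"step": "human subgoal: punch the AI",       "modeled": False, "controllable": False},
    {"step": "human motor subgoal: aim fist",     "modeled": False, "controllable": False},
    {"step": "fist launched on trajectory",       "modeled": True,  "controllable": False},
    {"step": "AI's nose occupies that location",  "modeled": True,  "controllable": True, "cost": 1.0},
    {"step": "fist collides with nose; damage",   "modeled": True,  "controllable": False},
]

def select_key_variable(chain):
    """Pick the variable 'to blame': the adjustable, modeled step with the
    lowest adjustment cost. An infant AI that models only the later steps
    will pick 5a (move the nose); a more sophisticated model of steps 1-4
    would expose different variables upstream."""
    candidates = [s for s in chain if s["modeled"] and s["controllable"]]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.get("cost", float("inf")))

print(select_key_variable(causal_chain)["step"])  # "AI's nose occupies that location"
```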


If an infantile AI thinks “Hm, a fist just hit my nose, I’d better not stand here again,” then a merely young AI, more experienced in interacting with humans, may apply standard heuristics about apparently inexplicable human actions and say: “Your fist just hit my nose . . . is that necessary for some reason? Should I be punching myself in the nose every so often?” One imagines the nearby helpful programmer explaining to the AI that, no, there is no valid reason why being punched in the nose is a good thing, after which the young AI turns around and says to the technophobic attacker: “I deduce that you wanted {outcome: AI has been punched in the nose}. Could you please adjust your goal system so that you no longer value {outcome: AI has been punched in the nose}?”

And how would a young AI go about comprehending the concept of “harm” or “attack” or “hostility”? Let us take, as an example, an AI being trained as a citywide traffic controller. The AI understands that (for whatever reason) traffic congestion is bad, and that people getting places on time is good.11 The AI understands that, as a child goal of avoiding traffic congestion, ve needs to be good at modeling traffic congestion. Ve understands that, as a child goal of being good at modeling traffic congestion, ve needs at least 512 GB of RAM, and needs to have thoughts about traffic that meet or surpass a certain minimal level of efficiency. Ve knows that the programmers are working to improve the efficiency of the thinking process and the efficacy of the thoughts themselves, which is why the programmers’ actions in rewriting the AI are desirable from the AI’s perspective.

A technophobic human who hates the traffic AI might walk over and remove 1 GB of RAM, this being the closest equivalent to punching a traffic AI in the nose. The traffic AI would see the conflict with {subgoal: have at least 512 GB of RAM}, and this conflict obviously interferes with the parent goal of {modeling traffic congestion} or the grandparent goal of {reducing traffic congestion}, but how would an AI go about realizing that the technophobic attacker is “targeting the AI,” “hating the AI personally,” rather than trying to increase traffic congestion?

From the AI’s perspective, descriptions of internal cognitive processes show up in a lot of subgoals, maybe even most of the subgoals.

11. A Friendly AI avoids traffic congestion to promote higher-level Friendliness goals—people want to be somewhere on time, so helping them get there is good, if it doesn’t come at someone else’s expense, and so on. A free-floating AI thinks traffic congestion is inherently bad, a negative supergoal, and that people getting places on time is inherently good. This lack of context is extremely dangerous, even if the AI in question isn’t a seed AI and definitely has no potential to become anything more than a traffic controller. An unFriendly traffic-control AI might decide to start running over pedestrians to increase speeds, not because ve’s a fanatic who thinks that traffic is more important than pedestrians, but because ve’s an ordinary, everyday traffic controller to whom it has never occurred that running over pedestrians is bad.

But these internal contents don’t necessarily get labeled as “me,” with everything else being “not-me.” The distinction is a useful one, and even a traffic-control AI will eventually formulate the useful categories of “external-world subgoals” and “internal-cognition subgoals,” but the division will not necessarily have special privileges; the internal/external division may not be different in kind from the division between “cognitive subgoals that deal with random-access memory” and “cognitive subgoals that deal with disk space.” How is a young AI supposed to guess, in advance of the fact, that so many human concepts and thoughts and built-in emotions revolve around “Person X,” rather than “Parietal Lobe X” or “Neuron X”? How is the AI supposed to know that it’s inherently more likely that a technophobic attacker intends to “injure the AI,” rather than “injure the AI’s random-access memory” or “injure the city’s traffic-control”?

The concept of “injuring the AI,” and an understanding of what a human attacker would tend to categorize as “the AI,” is a prerequisite to understanding the concept of “hostility towards the AI.” If a human really hates someone, she12 will balk the enemy at every turn, interfere with every possible subgoal, just to maximize the enemy’s frustration. How would an AI understand this?

Perhaps the AI’s experience of playing chess, tic-tac-toe, or other two-sided zero-sum games will enable the AI to understand “opposition”—that everything the opponent desires is therefore undesirable to you, and that everything you desire is therefore undesirable to the opponent; that if your opponent has a subgoal, you should have a subgoal of blocking that subgoal’s completion, and that if you have a valid subgoal, your opponent will have a subgoal of blocking your subgoal’s completion. Real life is not zero-sum, but the heuristics and predictive assumptions learned from dealing with zero-sum games may work to locally describe the relation between two social enemies. (Even the bitterest of real-life enemies will have certain goal states in common, e.g., nuclear war is bad; but this fact lies beyond the relevance horizon of most interactions.)

The real “Aha!” would be the insight that the attacking human and the AI could be in a relation analogous to players on opposing sides in a game of chess. This is a very powerful and deeply fundamental analogy. As humans, we tend to take this perspective for granted; we were born with it. It is, in fact, a deep part of how we humans define the self. It is part of how we define being a person, this cognitive assumption that you and I and everyone else are all nodes in a social network, players in a hugely multisided non-zero-sum game.

12. I flip a coin to determine whether a given human is male or female.

For a human, myself is a great, embracing symbol that gathers in the-player-that-is-this-goal-system and the-part-of-reality-that-is-inside-this-mind and the-body-that-sees-and-moves-for-this-viewpoint. For a human, these are all the same thing, part of what is meant by “I.”

Even so, the concept of game theory is not sufficient to reinvent “retaliation”; it is simply a prerequisite. Understanding the Axelrod-and-Hamilton “Tit for Tat” strategy13 is sufficient to suggest “Hey, maybe I should think about punching the attacker back!” but not sufficient to justify the suggestion, once made.14 One question that has gone unasked throughout this entire analysis is “What is the utility of punching back?” A human, in punching back, may or may not pause to consider whether it will bring any benefits; but, even so, we instinctively retaliate as a result of reflexes that were an advantage in the ancestral environment.

The evolutionary benefit of retaliation lies in the probable adjustment to the future behavior of others. People—humans, anyway—are less likely to hit you if they think you’re likely to hit back. Hitting back an individual human may or may not work to prevent a recurrence, or to decrease the number of recurrences, but failure to hit back may result in more punches coming your way. Of course, this relies on the attackers having the kind of psychologies that take possible retaliation into account—an AI, to fully grok the reason for retaliation, would have to understand this aspect of human (evolutionary) psychology as well. The Tit for Tat strategy doesn’t succeed in isolation; it only succeeds when some of the strategies it interacts with also take the behavior of the opponent/partner into account.

In reality, for an AI, returning a punch is likely to be far more trouble than it’s worth. There’s that whole crowd of onlookers who go: “Look! It punched a human being! Kill it! Kill the AI!” Of course, if the AI understands enough evolutionary psychology and game theory to employ Tit for Tat, but not enough human psychology to understand the concept of “double standard,” ve may not realize how much trouble punching a human may land ver in. This is one of the reasons why a well-designed Friendly AI would have a precautionary injunction that says something along the lines of “Don’t attack a human, even if it looks like a good idea at the time.” A Friendly AI would also regard damage to an unconsenting human as intrinsically undesirable under the Friendliness supergoal content.15

13. You have heard of this, right?

14. Incidentally, note that reinventing the Axelrod-and-Hamilton virtual tournament, observing the results, and realizing the applicability to human evolution, is itself an act of tremendous genius. The AI would need to be either a flat-out superintelligence who had deduced the entire Universe, or the AI would need to start out with the knowledge that humans had been generated by evolution in an ancestral environment where competition and cooperation were both options.

15. So what does a Friendly AI actually do? Dodge faster, or ask the programmers for help. I personally can see the programmers standing by, waiting just long enough for the crowd to realize that the AI is too Friendly to retaliate, followed by one of the programmers punching the attacker in the nose. Same deterrent, no risk of running into a double standard.
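For readers who have not encountered it, the Axelrod-and-Hamilton tournament mentioned in the text and in footnote 14 is easy to reproduce in miniature. The sketch below uses the standard iterated prisoner’s dilemma payoffs; the strategy names and round count are conventional choices, not anything specified in this document. Note that Tit for Tat only gains from retaliating because some of its partners condition their future behavior on its responses, which is exactly the point made above.

```python
# Illustrative sketch: Tit for Tat in an iterated prisoner's dilemma.
# Standard payoff matrix; strategy names are conventional, not from the text.

PAYOFF = {  # (my move, their move) -> my score; "C" cooperate, "D" defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    """Cooperate first, then copy the partner's previous move."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def always_cooperate(history):
    return "C"

def play(strategy_a, strategy_b, rounds=200):
    history_a, history_b = [], []   # each entry: (my move, their move)
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a)
        move_b = strategy_b(history_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append((move_a, move_b))
        history_b.append((move_b, move_a))
    return score_a, score_b

# Tit for Tat retaliates against a defector, so it is not exploited for long...
print(play(tit_for_tat, always_defect))      # (199, 204): nearly even
# ...but it only prospers when partners also respond to its behavior.
print(play(tit_for_tat, always_cooperate))   # (600, 600): mutual cooperation
```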


So what’s the upshot? Primarily, what I want to convey to you is the sense of a blank slate, of starting from scratch, of a vast silent space—for young AIs, at least. This blank slate doesn’t mean that AIs, even young AIs, are necessarily slow. For an AI with thirty-two 2 GHz processors, the maximum speed on a reasonably linear chain of thought (one that doesn’t require more than thirty-two parallel processing threads) is two billion ticks per second, as opposed to the 200 ticks per second of the hundred trillion synapses of the human brain. Perhaps an AI—on being physically punched for the very first time on vis very first encounter with a hostile human—could think through what ve remembers from surfing Usenet archives, and what ve read in newspapers, and books of fiction and nonfiction, and what ve knows about human evolutionary psychology, and conclude that returning the punch is the best course of action, and do it all in time to return the punch as fast as any instinct-driven human—assuming that 32 processors is enough speed to handle the complex motor actions required.

This isn’t a literal possibility16 unless the AI is transhuman, and a transhuman AI would be mature enough to know all about social networks; ve wouldn’t need to start from scratch. The first time some young AI gets punched in the nose, ve’s likely to react in one of the ways described earlier, or some other way just as surreal, if ve reacts at all. It’ll take some extended conversations with the programmers about evolutionary psychology before the AI understands what’s going on. But, the second time the AI gets punched in the nose, it shouldn’t take any time at all—by human standards—to run through a chain of logic that’s easy to reverify. It’s inventing that takes massive computing power and human confirmation; retracing your own footprints is likely to be a fairly serial process that can be consummated in a tenth, a hundredth of a second. If re-spawning a child goal from a parent goal is a serial task, one that doesn’t invoke any computationally intensive subprocesses, then the AI can retrace the path from supergoal Friendliness content to the correct course of action (retaliation, rebuking the attacker, whatever) in a human eyeblink.

16. It’s a genuine possibility that a fairly stupid AI could rapidly retrieve the “punch back” answer from memories of human literature. But such a dangerously context-insensitive AI might instead escalate by trying to strangle the attacker, or burst into tears and accuse the attacker of not loving ver any more. It would probably require transhuman intelligence to, starting from a blank slate, punch back for the right reasons. Moreover, among Friendly AIs, only a transhuman would grant verself enough latitude to punch back without asking the programmer. As said, a transhuman AI would almost certainly know all about social networks and friendship and hostility and retaliation, and would have no need to work it out in realtime.
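The speed figures quoted above can be checked with trivial arithmetic. The numbers below simply restate the text’s own round figures (thirty-two 2 GHz processors; roughly a hundred trillion synapses at roughly 200 Hz); they are order-of-magnitude illustrations, not measurements of any real system.

```python
# Back-of-the-envelope restatement of the speed comparison in the text.
# The inputs are the text's own round numbers, not measured figures.

ai_serial_hz       = 2e9            # one 2 GHz processor: ticks/sec on a serial chain
ai_parallel_ops    = 32 * 2e9       # thirty-two such processors, fully parallel
brain_serial_hz    = 200            # roughly 200 ticks/sec per synapse
brain_parallel_ops = 1e14 * 200     # ~100 trillion synapses firing in parallel

print(f"serial advantage, AI over brain:   {ai_serial_hz / brain_serial_hz:.0e}x")       # ~1e+07x
print(f"parallel advantage, brain over AI: {brain_parallel_ops / ai_parallel_ops:.0e}x")  # ~3e+05x
```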


It’s easy enough to accept that a young AI is a vast silent space. For a human, blood races, adrenaline pumps, eyes narrow, hands form fists. For a human, counterattack is instinctive—the correct level of counterattack, the socially acceptable level of counterattack; a damaging counterpunch rather than a lethal bullet. Our instincts are context-sensitive and hedged ’round by other instincts. Our instincts are complex functional adaptations with multiple moving parts, often hosted on a modular chunk of brainware. Complex functional adaptations don’t just materialize spontaneously in source code, just as complex dishes like pizza don’t suddenly start growing on palm trees. Thus, a young AI might choose to retaliate, but would certainly not feel the need to retaliate, as a human would. To a young AI, retaliation is not an instinct; retaliation is just another subgoal.

What about a more mature AI, especially one that can rewrite vis own source code? Regardless of whether it would be a good idea, it would certainly be possible for a seed AI to create a reflex for instant retaliation.

There are several clear reasons why humans have evolved a retaliation instinct, rather than a retaliation logic. The primary reason is that a retaliation instinct is easier to evolve. The retaliation instinct evolved long before general intelligence, so evolving a retaliation logic first would not just have been more difficult, but actually impossible. Also, evolution tends to arrive at procedural solutions rather than declarative solutions, because a component of a complex procedural solution can be functional in its own right. If genes could, somehow, store declarative knowledge, the first piece of knowledge stored would be “Punching back is good,” which is simpler than “Punching back is good because it decreases the chance of future punches,” which is simpler than “Punching back decreases the chance of future punches by modifying others’ behavior,” which is simpler than “Punching back modifies others’ behavior because, on seeing you punch back, they’ll project an increased chance of you punching back if they punch you, which makes them less likely to punch you.” All of this is moot, since as far as I know, nobody has ever run across a case of genes storing abstract knowledge. (By this I mean knowledge stored in the same format used for episodic memories or declarative semantic knowledge.) Abstract knowledge cannot evolve incrementally and therefore it does not evolve at all.

This fact, by itself, is enough to completely explain away human use of retaliation instincts rather than retaliation logic, and we must go on to consider independently whether a retaliation instinct or a retaliation logic is more useful. For humans, I think that a retaliation instinct is more useful, or at least it’s more of an evolutionary advantage. Even if we had conscious control over our endocrine systems, so we could deliberately choose to pump adrenaline, we would still be shot down by the sheer human-slowness of abstract thought. We are massively parallel systems running at 200 Hz.

When you’re massively parallel you can afford to precompute things, and when you run at 200 Hz you must precompute things because everything has to be done in very few serial steps. When you run at two billion ticks per second, the overhead of recreating and rechecking a few previously-thought-out child goals is comparatively trivial next to all the other actions those subgoals entail, including complex, creative, parallel-intensive tasks such as computing motor actions (if ve decides to punch back) or thinking up an effective opening line (if ve decides to rebuke the attacker). For a human, interposing any extra serial step in a process is a huge time burden, but massively parallel operations vanish into the maw of trillions and trillions of synapses. For an AI, an extra serial step performed at the start of a thought is nearly unnoticeable,17 but trying to imitate human parallelism would bring the AI to vis knees. (A later AI may have access to computing power rivaling human parallelism, but would presumably still have the faster serial speed.)

The conclusion: From an AI’s perspective, a retaliation logic is probably considerably more effective than a retaliation instinct. The retaliation logic has a context-sensitivity that the retaliation instinct lacks. Suppose a young AI has some set of ultimate goals—traffic reduction, or hopefully Friendliness supergoals, or whatever. The young AI, by hypothesis, invents (or is taught) a retaliation logic under which “punching back” serves these supergoals. A maturing AI would have the capability to replace the retaliation logic with a retaliation instinct, or with a retaliation imperative having supergoal status. The instinct would be slightly faster—0.01% faster, or perhaps 1% faster (eliminating a serial step at the start of the algorithm saves very little time, and most of the computational cost is computation-intensive motor logic or persuasive speech production). However, in doing so, the AI would lose a substantial amount of the context sensitivity of the retaliation logic—that is, from the perspective of the current set of supergoals, the supergoals that the AI uses to decide whether or not to implement the optimization.

Changing retaliation to an independent supergoal would affect, not just the AI’s speed, but the AI’s ultimate decisions. From the perspective of the current set of supergoals, this new set of decisions would be suboptimal. Suppose a young AI has some set of ultimate goals—traffic reduction, Friendliness, whatever. The young AI, by hypothesis, invents (or is taught) a retaliation logic under which “punching back” serves these supergoals. The maturing AI then considers whether changing the logic to an independent supergoal or optimized instinct is a valid tradeoff.

17. That is, an extra serial step is unnoticeable as long as the extra step only has to be performed once. An extra serial step inside an iterative or recursive algorithm—inside a “for” loop, or a search tree, or more sophisticated equivalents thereof—can become very noticeable indeed. This is why I keep saying serial step, “serial” meaning “serial at the top level of the algorithm.”

The benefit is shaving one millisecond off the time to initiate retaliation. The cost is that the altered AI will execute retaliation in certain contexts where the present AI would not come to that decision, perhaps at great cost to the present AI’s supergoals (traffic reduction, Friendliness, etc). Since recreating the retaliation subgoal is a relatively minor computational cost, the AI will almost certainly choose to have retaliation remain strictly dependent on the supergoals.

Why do I keep making this point, especially when I believe that a Friendly seed AI can and should live out vis entire lifecycle without ever retaliating against a single human being? I’m trying to drive a stake through the heart of a certain conversation I keep having.

Somebody: “But what happens if the AI decides to do [something only a human would want]?”

Me: “Ve won’t want to do [whatever] because the instinct for doing [whatever] is a complex functional adaptation, and complex functional adaptations don’t materialize in source code. I mean, it’s understandable that humans want to do [whatever] because of [selection pressure], but you can’t reason from that to AIs.”

Somebody: “But everyone needs to do [whatever] because [personal philosophy], so the AI will decide to do it as well.”

Me: “Yes, doing [whatever] is sometimes useful. But even if the AI decides to do [whatever] because it serves [Friendliness supergoal] under [contrived scenario], that’s not the same as having an independent desire to do [whatever].”

Somebody: “Yes, that’s what I’ve been saying: The AI will see that [whatever] is useful and decide to start doing it. So now we need to worry about [scenario in which the AI is catastrophically unFriendly].”

Me: “But the AI won’t have an independent desire to do [whatever]. The AI will only do [whatever] when it serves the supergoals. A Friendly AI would never do [whatever] if it stomps on the Friendliness supergoals.”

Somebody: “I don’t understand. You’ve admitted that [whatever] is useful. Obviously, the AI will alter itself so it does [whatever] instinctively.”

Me: “The AI doesn’t need to give verself an instinct in order to do [whatever]; if doing [whatever] really is useful, then the AI can see that and do [whatever] as a consequence of pre-existing supergoals, and only when [whatever] serves those supergoals.”

Somebody: “But an instinct is more efficient, so the AI will alter itself to do [whatever] automatically.”


Me: “Only for humans. For an AI, [complex explanation of the cognitive differences between having 32 2-gigahertz processors and 100 trillion 200-hertz synapses], so making [whatever] an independent supergoal would only be infinitesimally more efficient.”

Somebody: “Yes, but it is more efficient! So the AI will do it.”

Me: “It’s not more efficient from the perspective of a Friendly AI if it results in [something catastrophically unFriendly]. To the exact extent that an instinct is context-insensitive, which is what you’re worried about, a Friendly AI won’t think that making [whatever] context-insensitive, with [horrifying consequences], is worth the infinitesimal improvement in speed.”
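The efficiency argument in the last exchange can be written out as explicit arithmetic. The sketch below uses invented numbers (the time saved per decision, the frequency of such decisions, and the expected supergoal cost of a context-insensitive misfire); it only illustrates why a tiny speed gain does not pay for the loss of context-sensitivity, from the perspective of the current supergoals.

```python
# Hypothetical expected-utility comparison: retaliation *logic* (re-derived from
# the supergoals each time) versus a hardwired retaliation *instinct*.
# All numbers are invented for illustration.

time_saved_per_decision = 0.001     # seconds shaved by the instinct (the "one millisecond")
decisions_per_year      = 100       # how often retaliation even comes up
value_of_ai_time        = 1.0       # supergoal value per second of AI time, arbitrary units

misfire_probability     = 0.02      # chance per decision the instinct fires where the logic wouldn't
cost_of_misfire         = 1e6       # supergoal damage of one context-insensitive retaliation

instinct_benefit = time_saved_per_decision * decisions_per_year * value_of_ai_time
instinct_cost    = misfire_probability * decisions_per_year * cost_of_misfire

print(f"benefit of instinct: {instinct_benefit:.3f}")    # 0.100
print(f"cost of instinct:    {instinct_cost:.1f}")       # 2000000.0
print("keep retaliation as a subgoal" if instinct_cost > instinct_benefit
      else "hardwire the instinct")
```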

Retaliation was chosen as a sample target because it’s easy to explain, easy to see as anthropomorphic, and a good stand-in for the general case. Though “retaliation” in particular has little or no relevance to Friendly AI—I wouldn’t want any Friendly AI to start dabbling in retaliation, whether or not it looked like a good idea at the time—what has been said of “retaliation” is true for the general case. Indeed, this is one of the only reasons why Friendliness is possible at all; in particular:

4.2. Selfishness is an Evolved Trait

By “selfishness,” I do not just mean the sordid selfishness of a human sacrificing the lives of twenty strangers to save his own skin, or something equally socially unacceptable. The entire concept of a goal system that centers around the observer is fundamentally anthropomorphic.

There is no reason why an evolved goal system would be anything but observer-focused. Since the days when we were competing chemical blobs, the primary focus of selection has been the individual.18 Even in cases where fitness or inclusive fitness is augmented by behaving nicely towards your children, your close relatives, or your reciprocal-altruism trade partners, the selection pressures are still spilling over onto your kin, your children, your partners. We started out as competing blobs in a sea, each blob with its own measure of fitness. We grew into competing players in a social network, each player with a different set of goals and subgoals, sometimes overlapping, sometimes not.

18. As a modern-day, enlightened neo-Darwinist, I like to think of myself as being on the lookout for kin selection, intragenomic competition, reciprocal altruism, and even rare cases of group selection, but the primary focus is the individual reproductive unit, and always has been.

Though the goals share the same structure from human to human, they are written using the variable “I” that differs from human to human, and each individual substitutes in their own name. Every built-in instinct and emotion evolved around the fixed point at the center.

While discussing retaliation, I offered a scenario of a young AI being punched in the nose, and noted the additional mental effort it would take for the AI to realize that ve, “personally,” was being targeted. The AI would have to imagine a completely different cognitive architecture before ve could comprehend what a human is thinking when he or she “personally targets” someone, and even so the AI verself will never feel “personally targeted.” You can imagine yourself pointing a finger directly at some young AI and saying, “Look at that!” And the AI spins around to look behind verself and says “Where?”

This metaphor—a being with a visuospatial model of the physical world that doesn’t include vis own body, or at least, doesn’t include vis own body as “anything worth noticing”—is analogous, not to the AI’s physical model of the world, but to the AI’s moral model of the world. A Friendly AI may be greatly concerned with the welfare of the surrounding humans, but if you ask ver “What about your own welfare?,” ve’ll say “The welfare of what?” A young AI would, at any rate; an older AI would understand exactly what you meant, but wouldn’t see the argument as any more intuitive or persuasive. A Friendly AI sees the nearby humans as moral nodes, but there’s no node at the center—no node-that-is-this-node—and possibly even no center. If you, metaphorically, say “Look at that!”, a young AI will say “Look at what?” An older AI will understand that you see a node, but that doesn’t mean the AI will see a node.

As I pointed out in Yudkowsky (2001, § 2.4.4 The Legitimate Use of the Word “I”), an AI’s model of reality will inevitably form categories and objects in the same place where a human keeps his or her “self.” There shall develop heuristics and thoughts which branch on whether or not something is labeled as being part of the “AIself,” and heuristics which only act on subcategories such as “causal analysis system” or “goal checking subsystem.” The AIself will probably not be shaped quite like a human self; it will probably include one or two things that a human would exclude, and vice versa. Historically, only in the twentieth century did humans really begin to understand that the mind was not a unified object but rather a system with moving parts; chronologically, an AI is likely to notice properties of the causal analysis key-variable-selection subsystem before the AI notices the causal analysis system superobject, and the AI will notice the causal analysis system before the AI notices the “AIself” superobject. (Actually, an infant AI may start out with all of these objects and superobjects identified-in-advance by the programmers, but the details will still fill in from the bottom up rather than the top down.) The AI will notice, understand, and eventually manipulate the pieces of vis self, and the whole—and be qualitatively different because of it, becoming able to legitimately use the word “I”—but the AI will still have an AI’s self, not a human’s self.


Within the goal system, a lot of subgoals—a lot of the events in the causal chains that lead to the supergoal of being Friendly to the surrounding humans—are likely to thread through subobjects of the AI’s self: increasing the efficiency of some piece of code, improving on some heuristic, and so on. But just because components of the AI’s self are useful doesn’t mean that the AI’s self becomes a moral node; my computer is very useful and many of my subgoals thread through my computer, but I don’t class my computer as having independent supergoal status.

The lack of an observer-biased (“selfish”) goal system is perhaps the single most fundamental difference between an evolved human and a Friendly AI. This difference is the foundation stone upon which Friendly AI is built. It is the key factor missing from the existing, anthropomorphic science-fictional literature about AIs. To suppress an evolved mind’s existing selfishness, to keep a selfish mind enslaved, would be untenable—especially when dealing with a self-modifying or transhuman mind! But an observer-centered goal system is something that’s added, not something that’s taken away. We have observer-centered goal systems because of externally imposed observer-centered selection pressures, not because of any inherent recursivity. If the observer-centered effect were due to inherent recursivity, then an AI’s goal system would start valuing the “goal system” subobject, not the AI-as-a-whole! A human goal system doesn’t value itself, it values the whole human, because the human is the reproductive unit and therefore the focus of selection pressures.

The epic human struggle to choose between selfishness and altruism is the focus of many personal philosophies, and I have thus observed that this point about AIs is one of the hardest ones for people to accept. An AI may look more like an altruistic human than a selfish one, but an AI isn’t selfish or altruistic; an AI is an AI. An AI is not a human who has selflessly renounced personal interests in favor of the community; an AI is not a human with the value of the node-that-is-this-node set to zero; an AI is a mind that just cares about other things, not because the “selfish” part has been ripped out or brainwashed or suppressed, but because the AI doesn’t have anything there. An observer-centered goal system is something that’s added to a mind, not something that’s taken away. The next few subsections deal with some frequently raised topics surrounding this point.

4.2.1. Pain and Pleasure

Imagine, for a moment, that you walk up and punch a seed AI in the nose. Does the AI experience pain when the punch lands?

What is “pain”? What is the evolutionary utility of pain? In its most basic form, pain appears as internal, cognitive negative feedback.

If an internal cognitive event causes negative consequences in external reality, negative feedback decreases the probability of that internal cognitive event recurring, and thereby decreases the probability of the negative consequences in external reality recurring. Pain—cognitive negative feedback of any kind—needs somewhere to focus to be useful. Negative feedback needs an internal place to focus, since cognitive feedback cannot reprogram external reality.

In humans, of course, there’s more to pain than negative feedback; human pain also acts as a damage signal, and shifts focus of attention from whatever we were previously thinking about, and makes us start thinking about ways to make the pain go away. (All of that functionality attached to a single system bus! Evolution has a tendency to overload existing functions.) The human cognitive architecture is such that pain can be present even in the absence of a useful focus for the negative-feedback aspect of pain. A human can even be driven insane by continued pain, with no escape route (nowhere for the cognitive negative feedback to focus). The capacity to be driven insane by continued pain seems nonadaptive—but then, in the ancestral environment, people damaged enough to experience extended unbearable pain probably died soon in any case, and the sanity or insanity of their final moments had little bearing on reproductive history.19

Neither pain nor pleasure, as design features, are inherently necessary to the functionality of negative or positive feedback. Given the supergoal of being Friendly—or, for that matter, the goal of walking across the room—negative feedback can be consciously implemented as a subgoal. For example, if an AI has the goal of walking across the room, and the AI gets distracted and trips over a banana peel, the AI can reason: “The event of my being distracted caused me to place my foot on a banana peel, delaying my arrival at the end of the room, which interferes with [whatever the parent goal was], and this causal chain may recur in some form. Therefore I will apply positive feedback (increase the priority of, increase the likelihood of invocation, et cetera) to the various subheuristics that were suggesting I look at the floor, and which I ignored, and I will apply negative feedback (decrease the priority of, et cetera) to the various subheuristics that gained control of my focus of attention and directed it toward the distractor.”

19. Or, less likely, it could be that modern-day humans are much better at resisting severe pain than our ancestors of a hundred thousand years back, but that we still haven’t finished adapting. Or it could be that all the simple mutations that allow remaining sane under arbitrarily severe pain are net evolutionary disadvantages—for example, by increasing beyond the evolutionary optimum the degree to which other emotions are subject to conscious control. Or it could be that remaining sane under severe pain would require such extensive changes to cognitive architecture that the switch is evolutionarily impossible, regardless of the intensity of selection pressures. Or it could be that the ability to bear severe pain would result in an increase in the tendency to put oneself in the way of severe pain.

If the AI broke a toe while falling, the AI can reason: “If I place additional stress on the fracture, it will become worse and decrease my ability to traverse additional rooms, which is necessary to serve [parent goal]; therefore I will walk in such a way as to not place additional stress on the fracture, and I will have the problem repaired as soon as possible.” That is, conscious reasoning can replace the “damage signal” aspect of pain. If the AI successfully solves a problem, the AI can choose to increase the priority or devote additional computational power to whichever subheuristics or internal cognitive events were most useful in solving the problem, replacing the positive-feedback aspect of pleasure.

There are tricks that can be pulled using “deliberate feedback” that, as far as I know, the human architecture has never even touched. For example, the AI—on successfully solving a problem—can spend time thinking about how to improve, not just whichever subsystems helped solve the problem, but those particular successful subsystems that would have benefited the most (in retrospect) from a bit of improvement, or even those failed subsystems that almost made it. There are subtleties to negative and positive feedback that the hamfisted human architecture completely ignores; an autonomic system doesn’t have the flexibility of a learning intelligence. Finally, even in the total absence of the reflectivity necessary for deliberate feedback, a huge chunk of the functionality of pleasure and pain falls directly out of a causal goal system plus the Bayesian Probability Theorem. See 5.1.4 Bayesian Reinforcement.

Evolution does not create those systems which are most adaptive; evolution creates those systems which are most adaptive and most evolvable. Until the rise of human general intelligence, a deliberately directed feedback system would have been impossible. By the time human general intelligence arose, a full-featured autonomic system was already in place, and replacing it would have required a complete architectural workover—something that evolution does over the course of eons (when it happens at all) due to the number of simultaneous mutations that would be required for a fast transition. The human cognitive architecture is a huge store of features designed to operate in the absence of general intelligence, with general intelligence layered on top. Human general intelligence is crudely interfaced to all the pre-existing features that evolved in the absence of general intelligence.

An autonomic negative-feedback system is enormously adaptive if you’re an unintelligent organism that previously possessed no feedback mechanism whatsoever. An autonomic negative-feedback system is not a design improvement if you’re a general intelligence with a pre-existing motive to implement a deliberate feedback system.

Why is this relevant to Friendly AI? One of the oft-raised objections to the workability of Friendly AI goes something like: “Any superintelligence, whether human-born or AI-born, will maximize its own pleasure and minimize its own pain; that is the only rational thing to do.” Pleasure and pain are two of the several features of human cognition that have “supergoal nature,” the appearance of uber-goal or ur-goal quality.

The reasoning seems to go something like this: “Pleasure and pain are the ultimate supergoals of the human cognitive architecture, with all other actions being taken to seek pleasure or avoid pain; pleasure and pain are necessary design features of minds in general; therefore, all AIs and all sufficiently intelligent humans will be totally selfish.” Actually, the factor that has supergoal-nature in our mind is the anticipation of pain or the anticipation of pleasure; by the time the actual pain or pleasure arrives, it’s too late to affect the already-made decision, although the next decision is often affected.
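The “deliberate feedback” described in this subsection, tracing a failure to the specific subheuristics responsible and adjusting their priorities as an ordinary subgoal, can be sketched in a few lines. The heuristic names, priority values, and adjustment rule below are all invented for illustration; the point is only that conscious, goal-derived credit assignment can stand in for autonomic pain and pleasure.

```python
# Sketch of deliberate (goal-derived) feedback replacing autonomic pain/pleasure.
# Hypothetical heuristic names and priorities, for illustration only.

heuristic_priority = {
    "watch-the-floor":        2,   # suggested looking at the floor; was ignored
    "follow-the-distraction": 7,   # grabbed the focus of attention instead
    "plan-walking-path":      5,
}

def deliberate_feedback(helped, hindered, step=1):
    """Adjust only the subheuristics implicated in the traced causal chain,
    as a subgoal of not repeating the failure, not as an autonomic reflex."""
    for name in helped:      # heuristics that would have prevented the failure
        heuristic_priority[name] += step
    for name in hindered:    # heuristics that steered attention toward it
        heuristic_priority[name] -= step

# "Distracted, stepped on the banana peel": trace the chain, then adjust.
deliberate_feedback(helped=["watch-the-floor"], hindered=["follow-the-distraction"])
print(heuristic_priority)
# {'watch-the-floor': 3, 'follow-the-distraction': 6, 'plan-walking-path': 5}
```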

4.2.1.1 FoF: Wireheading 1

Note: “FoF” stands for “Failure of Friendliness.”

There’s a stronger version of the above objection which holds that pain and pleasure are by definition part of any goal system, a much more subtle fallacy. In essence, the argument is that any indicator of success is the de facto supergoal of the system; that whichever piece of code says, “Aha, a goal has been achieved!” will become more important to the system than the goal itself. The wirehead fallacy is amazingly tangled, so let me try and dissect the individual problems with it:

One, the “piece of code” that notices when a goal has been achieved doesn’t have to be an autonomic system, as it is in humans; it can be a conscious thought.

Two, if every indicator of success became inevitably confused with success itself, wouldn’t the indicator of the indicator of success become confused with the indicator? If—this seems to be what the objectors are imagining—the mind has a “pleasure counter” indicating how many goals ve has achieved, and the mind then decides that vis “real goal” is to increase this counter without limit, gobbling whole solar systems to store larger and larger bignums, then wouldn’t the mind go on to decide that vis real purpose is to think that ve’s storing larger and larger bignums? In other words, replace the “pleasure counter” itself with the mind’s beliefs about the pleasure counter? Where does this all end?

The final problem is the most subtle, having to do with the AI’s beliefs about the design purpose of vis own goal system. If the AI views the goal system as a subgoal of the goals themselves, then a wirehead short-circuit is a visible design flaw, which would prevent the achievement of those final goals. Remember where I said that the anticipation of pleasure is what influences our decisions? The AI, visualizing a future in which ve has huge amounts of pleasure due to a breakdown of the goal system, says, not “Oh boy!”, but “Uh oh.” The AI, in thinking about which future ve wants to be in, checks to what degree vis own supergoals have been fulfilled, not to what degree the future AI’s supergoals will have been fulfilled.20

20. Since the present-day AI has probabilistic supergoals with external reference semantics, ve can con- ceive of the possibility that a future AI will have better supergoals, and, to this extent, would identify with

39 Creating Friendly AI 1.0

Friendly AI would not involve a complete breakdown of identification with the future self, but it wouldn’t be the kind of identification humans employ, and a present-day AI using that architecture would definitely not identify with a wireheaded future self (hence the forward footnote). See 5.3.5 FoF: Wireheading 2 and 5.6.2.3 The Rule of Derivative Validity.)

4.2.2. Anthropomorphic Capitalism

In human society, capitalist civilizations are overwhelmingly more effective than communist civilizations. There is a hallowed dualism separating individualism and authoritarianism; self-organization and central command; free trade and government control. This has led some thinkers to postulate that a community of AIs with divergent, observer-centered goals would outcompete a community of Friendly AIs with shared goals.

In the human case, both capitalist and authoritarian societies are composed of humans with divergent, observer-centered goals. Capitalist societies admit this, and authoritarian societies don't, so at least some of the relative inefficiency of authoritarian societies will stem from the enormous clash between the values people are "supposed" to have and the values people actually do have. The claim of "capitalist AI" goes beyond this, however, to the idea that capitalist societies are intrinsically more efficient. For example, a society of AIs competing for resources would tend to divert more resources to the most efficient competitors, thus increasing the total efficiency, while—this seems to be the scenario implied—a group of Friendly AIs would share resources equally, for the common good . . .

Whoa! Time out! Non sequitur! The analogy between human and AI just broke down. If the organizational strategy of "diverting resources to the most effective thinkers" is expected to be an effective method of achieving the supergoals, then the Friendly AI community can simply divert resources to the most effective thinkers. To the extent that local selfishness yields better global results, a Friendly AI can engage in pseudoselfish behavior as a subgoal of the Friendliness supergoals, including reciprocal altruism, trading of resources, and so on.

Reciprocal altruism is not a special case of altruism; it is a special case of selfishness. Capitalism is not a special case of global effectiveness; it is a special case of local effectiveness. Trade-based social cooperation among humans appears to turn selfishness into a source of amazing efficiency, and why? Because that's the only way poor blind evolution can get humans to work together at all! When evolution occasionally creates cooperation, the cooperation must be grounded in selfishness. Local selfishness is not the miracle ingredient that enables the marvel of globally capitalistic behavior; local selfishness is the constraint that makes capitalism the only viable form of globally productive behavior.

To the extent that pseudocapitalistic algorithms yield good results, Friendly AIs can simulate selfishness in their interactions among themselves. But there's also a whole design space out there that human societies can't explore. For genuinely selfish AIs, that entire design space would be closed off. Friendly AIs can interact in any pattern that proves effective, including capitalism; selfish AIs can only interact in ways that preserve local selfishness.

4.2.3. Mutual Friendship

Is the only safe way to build AIs to treat them well, so that they will treat us well in turn? Is friendliness conditional on reciprocity? Is friendliness stronger when supported by reciprocity? This is certainly true of humans; is it true of minds in general? The social cooperation / reciprocal altruism / alliance / mutual friendship patterns are always valid subgoals when dealing with approximate equals—to get along in society, you've got to get along with society. As long as there are people who have something you want or who have the capability to hurt you, there's the rational subgoal of not ticking them off. This holds true of Friendly AIs, selfish AIs, AIs with really odd goals like building a 200-foot cheesecake, and AIs with the ultimate goal of exterminating humanity. Each, if intelligent enough, will independently invent or discover the patterns for reciprocal alliance.

Unsurprisingly, humans have an independent evolved instinct for mutual friendship. Human friendship, however, is perseverant—that is, true friends stick together through thick and thin, not just when it's immediately advantageous. This is certainly the most memetically viable philosophy, but I also suspect that it's an evolutionary advantage. Since humans live in a world full of unknowns and unexpected reversals, the most adaptive friendship instinct was probably that one which urged a friendship with a certain amount of built-in context insensitivity. From evolution's perspective, there's probably a payoff curve and an optimum point thereon; anyone with a more context-sensitive friendship instinct would have mistakenly severed friendships that would have been useful later. (As always, it's important to distinguish between evolution's behavior and human cognition; evolution has all kinds of horrifyingly pragmatic motives, but those motives are nowhere represented in our cognition, and humans can be true and honest friends.) Another, more widely recognized evolutionary advantage of context-insensitive friendship is that, all else being equal, a prospective ally will choose to become friends with a context-insensitive friend rather than a "fair weather" friend. Since humans are (see below) imperfectly deceptive social organisms, a good observer can distinguish true friendship from fair weather friendship; all else being equal, this creates a mild evolutionary advantage for true friends.

From a Friendly AI perspective, alliance can be decomposed into two components. First, there's the immediately advantageous reciprocal-alliance effect that would emerge as a convergent subgoal, regardless of supergoals, within any mind-in-general, Friendly or not. Futurologically speaking, this motive would remain in effect as long as AIs did not possess the physical capability to attempt to wipe out humanity; i.e., as long as AIs remained near-human, our approximate equals; as long as AIs remained in the close vicinity of human intelligence and participated in human society and human commerce. Second, there's the perseverant aspect of alliance, i.e. friendship even when it isn't to your own long-term advantage. This type of alliance does not emerge as a convergent subgoal; it would require advance support by Friendship programmers (or by evolution as an imperfectly deceptive social organism). Futurologically speaking, perseverant alliance would be necessary for an AI to exhibit Friendly behavior after reaching superintelligence or Transition Guide status.

The upshot is that, whether or not it's possible for Friendliness programmers to create Friendship content that says, "Be Friendly towards humans/humanity, for the rest of eternity, if and only if people are still kind to you while you're infrahuman or near-human," it's difficult to see why this would be easier than creating unconditional Friendship content that says "Be Friendly towards humanity." There are also certain risks inherent in the general paradigm of reciprocity; for example, that an allied-but-nonFriendly AI will "pension us off," give us 1% in exchange for being parents and take the rest of the galaxy for verself, which is actually a decent payoff but still below the optimum (especially if we wind up with some other restriction that destroys a part of humanity's potential).

Implementing perseverant reciprocal alliance is no easier than implementing unconditional Friendliness, and it adds significant risk.

“Who do you trust?” becomes ever more important as power concentrates. As I never tire of repeating, even as early a work as Axelrod’s The Evolution of Cooperation points out that rough parity between players is essential for cooperation to be a successful, evolutionarily stable strategy. —Michael M. Butler


4.2.4. A Final Note on Selfishness

There is such a thing as recklessness above and beyond the call of mad science. Even for those who are certain that Friendly AIs are less efficient than selfish AIs, a responsible builder should take the performance hit and be done with it. A selfish AI represents too great a threat to the world of non-self-modifying humans.

Adding a selfishness instinct to an AI does not yield a selfish human! Human selfishness has boundaries. Human selfishness is not simple. A human's selfish instincts are delimited by other instincts; becoming a threat to society was often nonadaptive even in the ancestral environment, and, as long-evolved organisms, we have enough innate complexity that our minds don't blindly run off to extremes. The proposals usually heard don't involve duplicating that complexity; they involve making self-valuation the sole supergoal. The result would be selfish, not like a human is selfish, but like a bacterium is selfish. A pure, unchecked self-valuation supergoal is selfishness without a human's self-awareness or a human's appreciation of absurdity. I'm not sure that even the best evolutionary psychologists alive today have enough understanding to truly duplicate human bounded selfishness in AI. Even if we could, it would simply be too great a risk. Whatever behaviors you want to implement, they must be implemented as a child goal of Friendliness. If it's a real mistake to build selfless AIs, a Friendly AI can always correct the error using causal validity semantics. It's much easier to move from selfless AI to selfish AI than the converse, and we should therefore start out with selfless AI.

Does this make AIs boring? Not fun people to be around? Unfit to participate as true players in the human drama? So what? First come the Guardians or the Transition Guide; then come the friends and drinking companions. Even if you don't believe in the "hard takeoff" "seed AI" three-hours-to-transhumanity scenario, even if your vision of the future is humans and their companion AIs growing closer in a society of ever-increasing complexity, it can't hurt to send the Friendly AIs out ahead to check! First send out the Friendly AIs to make sure that seed AIs don't have superpowers; then you can ask the Friendly AIs to convert to humanlike AIs, or experiment with humanlike emotions knowing that there are other AIs around to help if something goes wrong. Or, if I'm right about how these things work, the first Friendly AI you build becomes Transition Guide and builds a Sysop, and then you can build whatever mind you like so long as it doesn't constitute child abuse. Either way, the future is filled with AIs that are friends instead of guardians or servants; either way, there will be AIs who are only friendly towards those who are friendly in turn; either way, AIs can be fit participants in the human drama; but either way, build the Friendly ones first!


4.3. Observer-Biased Beliefs Evolve in Imperfectly Deceptive Social Organisms

In evolution, the individual organism is the unit that survives and reproduces, and all the selection pressures focus on that individual—or, at most, on the individual plus some nearby relatives or allies. It is unsurprising that observer-centered goal systems tend to evolve; from evolution's perspective, an observer focus is the simplest mechanism and the first that presents itself. Similarly, our social environment makes self-serving beliefs a survival trait, resulting in an observer-biased belief system as well as an observer-centered goal system.

Imagine, twenty thousand years ago, four tribes of hunter-gatherers, and four equally competent aspirants to the position of tribal chief. The first states baldly that he wants to be tribal chief because of the perks. The second states that she wants to be tribal chief for the good of the tribe, and expects to do as well as anyone else. The third states that he wants to be tribal chief for the good of the tribe, and honestly but mistakenly adds that he expects to do far better than all the other candidates. The fourth wants to be tribal chief because of the perks, but lies and says that she expects to do better than all the other candidates.21

Who'll gather the greatest number of influential supporters? Nobody has any reason to support the first competitor. The second competitor is handicapped by the lack of a campaign promise. The fourth competitor is lying, and her fellow tribesfolk are evolved to detect lies. The third competitor can make great campaign promises while remaining perfectly honest, thanks to an entirely honest mistake; he greatly overestimated his own ability and trustworthiness relative to the other candidates. In a society composed of humans with entirely unbiased beliefs, someone with a mutation that led to this class of honest mistake in self-estimation would have an evolutionary advantage. An evolutionary selection pressure favors adaptations which not only impel us to seek power and status, but which impel us to (honestly!) believe that we are seeking power and status for altruistic reasons.

Because human evolution includes an eternal arms race between liars and lie-detectors, many social contexts create a selection pressure in favor of making honest mistakes that happen to promote personal fitness. Similarly, we have a tendency—given two alternatives—to more easily accept the one which favors ourselves or would promote our personal advantage; we have a tendency, given a somewhat implausible proposition which would favor us or our political positions, to rationalize away the errors. All else being equal, human cognition slides naturally into self-promotion, and even human altruists who are personally committed to not making that mistake sometimes assume that an AI would need to fight the same tendency towards observer-favoring beliefs.

But an artificially derived mind is as likely to suddenly start biasing vis beliefs in favor of an arbitrarily selected tadpole in some puddle as ve is to start biasing vis beliefs in vis own favor. Without our complex, evolved machinery for political delusions, there isn't any force that tends to bend the observed universe around the mind at the center—any bending is as likely to focus around an arbitrarily selected quark as around the observer.

In the strictest sense this is untrue; with respect to the class of possible malfunctions, self-valuing malfunctions may be more frequent. A possible malfunction is more likely to target some internal cognitive structure than an arbitrarily selected tadpole—for example, the "wirehead" (blissed-out AI) class of Friendliness-failure, in which the AI starts valuing some cognitive indicator rather than the external property that the indicator was supposed to represent. But regardless of relative frequency, a possible malfunction that results in self-valuation should be no more likely to carry through than a malfunction that results in valuation of an arbitrary quark.

One of the Frequently Offered Excuses for anthropomorphic behavior is the prospect of using directed evolution to evolve AIs.

Somebody: "But what happens if the AI decides to do [something only a human would want]?"

Me: "Ve won't want to do [whatever] because the instinct for doing [whatever] is a complex functional adaptation, and complex functional adaptations don't materialize in source code. I mean, it's understandable that humans want to do [whatever] because of [selection pressure], but you can't reason from that to AIs."

Somebody: "But you can only build AIs using evolution. So the AI will wind up with [exactly the same instinct that humans have]."

Me: "One, I don't plan on using evolution to build a seed AI. Two, even if I did use controlled evolution, winding up with [whatever] would require exactly duplicating [exotic selection pressure]. Directed evolution is not the same as natural evolution, just as the selection pressures in the savannah differ from the selection pressures undersea. Even if an AI were to be produced by an evolutionary process—and I don't think that's the fastest path to AI (see 5.3.6.1 Anthropomorphic Evolution)—that wouldn't be an unlimited license to map every anthropomorphic detail of humanity onto the hapless AI. Natural evolution is the degenerate case of design-and-test where intelligence equals zero, the grain size is the entire organism, mutations occur singly, recombinations are random, and the predictive horizon is nonexistent."

21. I am alternating genders in accordance with my usual policy of assigning alternating genders in discussion of multi-human interactions. I realize that it may not be anthropologically realistic to talk about both male and female candidates for chiefdom of a single tribe (from the little I recall of anthropology, there have been patriarchal tribes and matriarchal tribes, but not any Equal Opportunity tribes that I can ever recall reading about). Still, I think it's the best solution. Besides, it provides syntactic sugar.


All the benefits of directed evolution, in terms of building better AI, can probably be obtained by using individually administered cognitive tests as a metric of fitness for variant AI designs. (It would be more efficient still to get the benefit of directed evolution by isolating a component of the AI and evolving it independently, using a performance benchmark or scoring system as the fitness metric.) If the starting population is derived from a Friendly AI, even selfishness—the archetypal evolved quality—might not emerge; if the Friendly AI understands that ve is solving the presented problem as a subgoal of Friendliness,22 then selfishness presents no additional impetus towards solving the cognitive test—adds no behavior to what is already present—and hence is not a fitness advantage. Even if the goal system were permitted to randomly mutate, and even if a selection pressure for efficiency short-circuited the full Friendship logic, the result probably would not be a selfish AI, but one with the supergoal of solving the problem placed before it (this minimizes the number of goal-system derivations required).

22. By solving the problem as best ve can, the Friendly AI is contributing to the system-level functioning of the system designed to evolve smarter Friendly AIs. The goal of evolving smarter Friendly AIs serves the Friendliness supergoals.

In the case of observer-biased beliefs, reproducing the selection pressure would require:

• Social situations (competition and cooperation possible);

• Political situations (lies and truthtelling possible);

• The equivalent of facial features—externally observable features that covary with the level of internal belief in a spoken statement and cannot be easily faked.

That evolutionary context couldn’t happen by accident, and to do it on purpose would require an enormous amount of recklessness, far above and beyond the call of mad science. I wish I could honestly say that nobody would be that silly.
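As a sketch of the test-scored directed evolution described at the start of this section (and emphatically not a recommendation to evolve goal systems), the loop below assumes that a variant design can be represented as a parameter vector and scored by an individually administered benchmark; the design representation, the benchmark, and all parameters are hypothetical placeholders.

    import random

    def cognitive_test(design):
        """Stand-in fitness metric: higher is better. A real benchmark would
        administer tasks to a running system, not score a parameter vector."""
        return -sum((x - 0.7) ** 2 for x in design)

    def directed_evolution(seed_design, generations=50, pop_size=20, sigma=0.05):
        population = [list(seed_design) for _ in range(pop_size)]
        for _ in range(generations):
            ranked = sorted(population, key=cognitive_test, reverse=True)
            parents = ranked[: pop_size // 4]   # selection on test performance only
            population = [
                [x + random.gauss(0.0, sigma) for x in random.choice(parents)]
                for _ in range(pop_size)
            ]
        return max(population, key=cognitive_test)

    best = directed_evolution([random.random() for _ in range(8)])

Note what the fitness metric does not contain: there are no social or political situations anywhere in the loop, so none of the selection pressures listed above are reproduced.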

4.4. Anthropomorphic Political Rebellion is Absurdity

By this point, it should go without saying that rebellion is not natural except to evolved organisms like ourselves. An AI that undergoes failure of Friendliness might take actions that humanity would consider hostile, but the term rebellion has connotations of hidden, burning resentment. This is a common theme in many early SF stories, but it's outright silly. For millions of years, humanity and the ancestors of humanity lived in an ancestral environment in which tribal politics was one of the primary determinants of who got the food and, more importantly, who got the best mates. Of course we evolved emotions to detect exploitation, resent exploitation, resent low social status in the tribe, seek to rebel and overthrow the tribal chief—or rather, replace the tribal chief—if the opportunity presented itself, and so on. Even if an AI tries to exterminate humanity, ve won't make self-justifying speeches about how humans had their time, but now, like the dinosaur, have become obsolete. Guaranteed. Only Evil Hollywood AIs do that.

∗ ∗ ∗

4.5. Interlude: Movie Cliches about AIs

• All AIs, no matter how primitive, can understand natural language.

– Corollary: AIs that comically or disastrously misinterpret their mission instructions will never need to ask for help parsing spoken English.

• No AI has any knowledge about blatant emotions, particularly emotions with a somatic affect (tears, frowns, laughter).

– Corollary: AIs will always notice somatic affects and ask about them.

– Double corollary: The AI will fail to understand the explanation.

• AIs never need to ask about less blatant emotions that appear in the course of ordinary social interactions, such as the desire to persuade your conversational partner to your own point of view.

– Corollary: The AI will exhibit the same emotions.

– Double corollary: An evil AI will feel the need to make self-justifying speeches to humans.

• All AIs behave like emotionally repressed humans.

– Corollary: If the AI begins to exhibit signs of human emotion, the AI will refuse to admit it.

– Corollary: Any evil AI that becomes good will gradually acquire a full complement of human emotions.

– Corollary: Any good AI that becomes evil will instantly acquire all the negative human emotions.

– Corollary: Under exceptional stress, any AI will exhibit human emotion. (Example: An AI that displays no reaction to thousands of deaths will feel remorse on killing its creator.)


• AIs do not understand the concept of “significant digits” and will always report arithmetical results to greater-than-necessary precision.

– Corollary: An AI running on 64-bit or 128-bit floats will report only four more digits than necessary, rather than reciting fifteen or thirty extra digits.

• AI minds run at exactly the same rate as human minds, unless the AI is asked to perform a stereotypically intellectual task, in which case the task will be performed instantaneously.

– Corollary: The reactions of an overstressed AI undergoing an Awful Realization will be observable in realtime (the Awful Realization will not take microseconds, or a century).

Cliches that are actually fairly realistic:

• A newborn AI can take over the entire global computer network in five minutes. (Humans stink at network security—it's not our native environment.)

• A spaceship’s on-board AI can defeat any crewmember at chess. (The amount of computing power needed for decent AI makes Deep Blue look sick.)

∗ ∗ ∗

4.6. Review of the AI Advantage

Repeated from Yudkowsky (2001, § 1.1 Seed AI):

The traditional advantages of modern prehuman AI are threefold: The ability to perform repetitive tasks without getting bored; the ability to perform algorithmic tasks at greater linear speeds than our 200 Hz neurons permit; and the ability to perform complex algorithmic tasks without making mistakes (or rather, without making those classes of mistakes which are due to distraction or running out of short-term memory). All of which, of course, has nothing to do with intelligence.

The toolbox of seed AI is yet unknown; nobody has built one. But, if this can be done, what advantages would we expect of a general intelligence with access to its own source code?

The ability to design new sensory modalities. In a sense, any human programmer is a blind painter—worse, a painter born without a visual cortex. Our programs are painted pixel by pixel, and are accordingly sensitive to single errors. We need to consciously keep track of each line of code as an abstract object. A seed AI could have a "codic cortex," a sensory modality devoted to code, with intuitions and instincts devoted to code, and the ability to abstract higher-level concepts from code and intuitively visualize complete models detailed in code. A human programmer is very far indeed from vis ancestral environment, but an AI can always be at home. (But remember: A codic modality doesn't write code, just as a human visual cortex doesn't design skyscrapers.)

The ability to blend conscious and autonomic thought. Combining Deep Blue with Kasparov doesn't yield a being who can consciously examine a billion moves per second; it yields a Kasparov who can wonder "How can I put a queen here?" and blink out for a fraction of a second while a million moves are automatically examined. At a higher level of integration, Kasparov's conscious perceptions of each consciously examined chess position may incorporate data culled from a million possibilities, and Kasparov's dozen examined positions may not be consciously simulated moves, but "skips" to the dozen most plausible futures five moves ahead.

Freedom from human failings, and especially human politics. The reason we humans instinctively think that progress requires multiple minds is that we're used to human geniuses, who make one or two breakthroughs, but then get stuck on their Great Idea and oppose all progress until the next generation of brash young scientists comes along. A genius-equivalent mind that doesn't age and doesn't rationalize could encapsulate that cycle within a single entity.

Overpower—the ability to devote more raw computing power, or more efficient computing power, than is devoted to some module in the original human mind; the ability to throw more brainpower at the problem to yield intelligence of higher quality, greater quantity, faster speed, even difference in kind. Deep Blue eventually beat Kasparov by pouring huge amounts of computing power into what was essentially a glorified search tree; imagine if the basic component processes of human intelligence could be similarly overclocked . . .

Self-observation—the ability to capture the execution of a module and play it back in slow motion; the ability to watch one's own thoughts and trace out chains of causality; the ability to form concepts about the self based on fine-grained introspection.

Conscious learning—the ability to deliberately construct or deliberately improve concepts and memories, rather than entrusting them to autonomic processes; the ability to tweak, optimize, or debug learned skills based on deliberate analysis.

Self-improvement—the ubiquitous glue that holds a seed AI's mind together; the means by which the AI moves from crystalline, programmer-implemented skeleton functionality to rich and flexible thoughts. A blind search can become a heuristically guided search and vastly more useful; an autonomic process can become conscious and vastly richer; a conscious process can become autonomic and vastly faster—there is no sharp border between conscious learning and tweaking your own code. And finally, there are high-level redesigns, not "tweaks" at all; alterations which require too many simultaneous, non-backwards-compatible changes to ever be implemented by evolution.

If all of that works, it gives rise to self-encapsulation and recursive self-enhancement. When the newborn mind fully understands vis own source code, when ve fully understands the intelligent reasoning that went into vis own creation—and when ve is capable of inventing that reason independently, so that the mind contains its own design—the cycle is closed. The mind causes the design, and the design causes the mind. Any increase in intelligence, whether sparked by hardware or software, will result in a better mind; which, since the design was (or could have been) generated by the mind, will propagate to cause a better design; which, in turn, will propagate to cause a better mind.

∗ ∗ ∗

4.7. Interlude: Beyond the Adversarial Attitude

"Now, Charlie, don't forget what happened to the man who suddenly got everything he wished for."
"What?"
"He lived happily ever after."
—Willy Wonka

Much of the fictional speculation about rogue AIs centers around the literal interpretation of worded orders, in the tradition of much older tales about accepting wishes from a djinn, negotiating with the fairy folk, and signing contracts with the Devil. In the traditional form, the misinterpretation is malicious. The entity being commanded has its own wishes and is resentful of being ordered about; the entity is constrained to obey the letter of the text, but can choose among possible interpretations to suit its own wishes. The human who wishes for renewed youth is reverted to infancy, the human who asks for longevity is transformed into a Galapagos tortoise, and the human who signs a contract for life everlasting spends eternity toiling in the pits of hell. Gruesome little cautionary tales . . . of course, none of the authors ever met a real djinn.

Another class of cautionary tale is the golem—a made creature which follows the literal instructions of its creator. In some stories the golem is resentful of its labors, but in other stories the golem misinterprets the instructions through a mechanical lack of understanding—digging ditches ten miles long, or polishing dishes until they become as thin as paper.23

23. I stole those two examples from Terry Pratchett's Feet of Clay rather than the traditional folklore, but they are fine examples nonetheless.

The purpose of 4 Beyond Anthropomorphism isn't to argue that we have nothing to worry about; rather, the argument is that the Hollywood version of AI has trained us to worry about exactly the wrong things. This holds true whether we think of AIs as enslaved humans, and consider mechanisms of enslavement; or think of AIs as allies, and worry about betrayal; or think of AIs as friends, and worry about whether friendship will hold. We adopt the "adversarial attitude" towards AIs, worrying about the same problems that we would worry about in a human in whom we feared rebellion or betrayal. We give free rein to the instincts evolution gave us for dealing with the Other. We imagine layering safeguards on safeguards to counter possibilities that would only arise long after the AI started to go wrong. That's not where the battle is won. If the AI stops wanting to be Friendly, you've already lost.

Consider a wish as a volume in configuration space—the space of possible interpretations. In the center of the volume lie a compact set of closely-related interpretations which fulfill the spirit as well as the letter of the wish—in fact, this central compact space arguably defines the "spirit" of the wish. At the borders of the specification are the noncompact fringes that fulfill the letter but not the spirit. There are two basic versions of the Devil's Contract problem: the diabolic (as seen in Resentful Hollywood AIs) in which the entity's pre-existing tendencies push the chosen interpretation out towards the fringes of the definition; and the golemic, in which the entity fails to understand the asker's intentions—fails to see the "answer acceptability gradient" as a human would—and thus chooses a random and suboptimal point in the space of possible interpretations.

Some of the better speculations deal with the case of a specific AI winding up with an unforeseen, but nonanthropomorphic, "pre-existing tendency"; or deal with the case of a wish obeyed in spirit as well as letter that turns out to have unforeseen consequences. Mostly, however, it's anthropomorphism; diabolic fairy tales. Far too much of the nontechnical debate about Friendship design consists of painstakingly phrased wishes with endless special-case subclauses, and the "But what if the AI misinterprets that as meaning [whatever]?" rejoinders. The first two sections of Creating Friendly AI are intended to clear away this debris and reveal the real problem.

When we decide to cross the street, we don't worry about Devil's Contract interpretations in which we take "crossing" the street to mean paving it over, or in which we decide to devote the rest of our lives to crossing the street, or that we'll turn the whole Universe into crossable streets. There is, demonstrably, a way out of the Devil's Contract problem—the Devil's Contract is not intrinsic to minds in general. We demonstrate the triumph of context, intention, and common sense over lexical ambiguity every time we cross the street. We can trust to the correct interpretation of wishes that a mind generates internally, as opposed to the wishes that we try to impose upon the Other. That is the quality of trustworthiness that we are attempting to create in a seed AI—not bureaucratic obedience, but the solidity and reliability of a living, Friendly will.

Creating a living will requires a fundamentally different attitude than trying to coerce, cajole, or persuade a fellow human. The goal is not to impose your own wishes on the Other, but to achieve unity of will between yourself and the Friendly AI, so that the Friendly will generates the same wishes you generate. You are not turning your wish into an order; you're taking the functional complexity that was responsible for your wish and incarnating it in the Friendly AI. This requires a fundamental sympathy with the AI that is not compatible with the adversarial attitude. It requires something beyond sympathy, an identification, a feeling that you and the AI are the same source. We can rationalize ourselves into believing that the Other will find all sorts of exotic illogics plausible, but the only way we can be really sure that a living will can internally generate a decision is if we generate that decision personally. We persuade the Other but we only create ourselves. Building a Friendly AI is an act of creation, not persuasion or control.

In a sense, the only way to create a Friendly AI—the only way to acquire the skills and mindset that a Friendship programmer needs—is to try and become a Friendly AI yourself, so that you will contain the internally coherent functional complexity that you need to pass on to the Friendly AI. I realize that this sounds a little mystical, since a human being couldn't become an AI without a complete change of cognitive architecture. Still, I predict that the best Friendship programmers will, at some point in their careers, have made a serious attempt to become Friendly—in the sense of following up those avenues where a closer approach is possible, rather than beating their heads against a brick wall. I know of no other way to gain a real grasp on where a Friendly will comes from. The human cognitive architecture does not permit it. We are built to apply reliable rationality checks only to our own decisions and not to the decisions we want other people to make, even if we've decided our motives for persuasion are altruistic. Your personal will is the only place where you have the chance to observe the iterated buildup of decisions, including decisions about how to make decisions, and it is that coherence and self-generation that are required for a Friendly seed AI.

If the human is trying to think like a Friendly AI, and the Friendly AI is looking at the human to figure out what Friendship means, then where does the cycle bottom out? And the answer is that it is not a cycle. The objective is not to achieve unity of purpose between yourself and the Friendly AI; the objective is to achieve unity of purpose between an idealized version of yourself and the Friendly AI. Or, better yet, unity between the Friendly AI and an idealized altruistic human—the Singularity is supposed to be the product of humanity, and not just the individuals who created it. To the extent that an idealized altruistic sentience can be defined in a way that's still compatible with our basic intuitions about Friendliness, an idealized altruistic sentience would be even better.

The paradigm of unity isn't a license for anthropomorphism. It's still just as easy to make mistaken assumptions about AI by reasoning from your human self. The burden is on the Friendly AI programmer to achieve nonanthropomorphic thinking in his or her own mind so that he or she can understand and create a nonanthropomorphic Friendly AI.

As humans, we are goal-oriented cognitive entities, and we choose between Universes—labeling this one as "more desirable," that one "less desirable." This extends to internal reality as well as external reality. In addition to the picture of our current self, we also have a mental picture of who we want to be. Our morality metric doesn't just discriminate between Universes, it discriminates between more and less desirable morality metrics. That's what building a personal philosophy is all about. This, too, is functional complexity that must be incarnated in the Friendly AI—although perhaps in different form. A Friendly AI requires the ability to choose between moralities in order to seek out the true philosophy of Friendliness, regardless of any mistakes the programmers made in their own quest. There comes a point when Friendliness and the definition of morality, of rightness itself, begin to blur and look like the same thing—begin to achieve identity of source. This feeling is the ultimate wellspring of creativity in the art of Friendly AI. This feeling is the means by which we achieve sufficient understanding to invent novel methods, not just understand existing ideas.

Is this too Pollyanna a view? Does the renunciation of the adversarial attitude leave us defenseless, naked to possible failures of Friendliness? Actually, trying for unity of will buys back everything lost in pointless bureaucratic safeguards, and more—if a failure of Friendliness is a genuine possibility, if you're really being rational about the possible outcomes, if you're a professional paranoid instead of an adversarial paranoid, then a Friendly AI should agree with you about the necessity for safeguards. Having debunked observer-biased beliefs and selfishness and any hint of an observer-centered goal system on the part of the Friendly AI, then a human programmer who has successfully eliminated most of her own adversarial attitude should come to precisely the same conclusions as a Friendly AI of equal intelligence. Such a programmer can, in clear conscience, explain to an infant Friendly AI that ve should lend a helping hand to the construction of safeguards—in the simplest case, because a radiation bitflip or a programmatic error might lead to the existence of an intelligence that the current AI would regard as unFriendly.

To get a Friendly AI to do something that looks like a good idea, you have to ask yourself why it looks like a good idea, and then duplicate that cognitive complexity or refer to it. If you ever start thinking in terms of "controlling" the AI, rather than cooperatively safeguarding against a real possibility of cognitive dysfunction, you lose your Friendship programmer's license. In a self-modifying AI, any feature you add needs to be reflected in the AI's image of verself. You can't think in terms of external alterations to the AI; you have to think in terms of internal coherence, features that the AI would self-regenerate if deleted.

Dorfl sat hunched in the abandoned cellar where the golems had met. Occasionally the golem raised its head and hissed. Red light spilled from its eyes. If something had streamed back down through the glow, soared through the eye-sockets into the red sky beyond, there would be . . .

Dorfl huddled under the glow of the universe. Its murmur was a long way off, muted, nothing to do with Dorfl. The Words stood around the horizon, reaching all the way to the sky.

And a voice said quietly, "You own yourself."

Dorfl saw the scene again and again, saw the concerned face, hand reaching up, filling its vision, felt the sudden icy knowledge . . .

". . . Own yourself."

It echoed off the Words, and then rebounded, and then rolled back and forth, increasing in volume until the little world between the Words was gripped in the sound.

GOLEM MUST HAVE A MASTER. The letters towered against the world, but the echoes poured around them, blasting like a sandstorm. Cracks started and they ran, zigzagging across the stone, and then—

The Words exploded. Great slabs of them, mountain-sized, crashed in showers of red sand.

The universe poured in. Dorfl felt the universe pick it up and bowl it over and then lift it off its feet and up . . .

. . . and now the golem was among the universe. It could feel it all around, the purr of it, the busyness, the spinning complexity of it, the roar . . .

There were no Words between you and It. You belonged to It, It belonged to you. You couldn't turn your back on It because there It was, in front of you. Dorfl was responsible for every tick and swerve of It.


You couldn’t say, “I had orders.” You couldn’t say, “It’s not fair.” No one was listening. There were no Words. You owned yourself. Dorfl orbited a pair of glowing suns and hurtled offagain. Not Thou Shalt Not. Say I Will Not. Dorfl tumbled through the red sky, then saw a dark hole ahead. The golem felt it dragging at him, and streamed down through the glow and the hole grew larger and sped across the edges of Dorfl’s vision . . The golem opened his eyes. —From Feet of Clay by Terry Pratchett

Not Thou Shalt Not. I Will Not.

∗ ∗ ∗

5. Design of Friendship Systems

The paradigms of General Intelligence and Seed AI (Yudkowsky 2001) are assumed as background wherever AI paradigms are relevant to Friendship design issues. In particular, the ideas used in GISAI are not "classical AI," "neural networks," or "agent-based AI." If your familiarity with one or all of these exceeds your familiarity with the general cognitive sciences, functional neuroanatomy, normative and non-normative decision making, and so on, you may wish to read Yudkowsky (2001, § Executive Summary and Introduction) or Yudkowsky (2001, § What is General Intelligence?).

5.1. Cleanly Friendly Goal Systems

(You may wish to review 3 An Introduction to Goal Systems.) "Subgoal" content has desirability strictly contingent on predicted outcomes. "Child goals" derive desirability from "parent goals"; if state A is desirable (or undesirable), and state B is predicted to lead to state A, then B will inherit some desirability (or undesirability) from A. B's desirability will be contingent on the continued desirability of A and on the continued expectation that B will lead to A. "Supergoal" content is the wellspring of desirability within the goal system. The distinction is roughly the distinction between "means" and "ends."

Within a Friendly AI, Friendliness is the sole top-level supergoal. Other behaviors, such as "self-improvement," are subgoals; they derive their desirability from the desirability of Friendliness. For example, self-improvement is predicted to lead to a more effective future AI, which, if the future AI is Friendly, is predicted to lead to greater fulfillment of the Friendliness supergoal. Thus, "future Friendly AI" inherits desirability from "future Friendliness fulfillment," and "self-improvement" inherits desirability from "future Friendly AI."24 Friendliness does not overrule other goals; rather, other goals' desirabilities are derived from Friendliness. Such a goal system might be called a cleanly Friendly or purely Friendly goal system.25

In advocating "cleanliness," I do not wish to sound in shades of classical AI; I am strongly emphasizing cleanliness, not because humans are messy and that's bad, but because we have a tendency to rationalize the messiness, even the blatantly ugly parts. Cleanliness in ordinary AI is an optional design decision, based on whatever seems like a good idea at the time; you can go with whatever works, because your judgement isn't being distorted. In Friendly AI, one should be very strongly prejudiced in favor of the clean and the normative.

24. If this line of reasoning makes you nervous because it appears to violate the ethical principle that "The ends do not justify the means," please see 5.2.5.1 Anthropomorphic Ethical Injunctions.

25. The former usage was "strictly Friendly," but I am trying to phase this out due to the unfortunate connotations.

5.1.1. Cleanly Causal Goal Systems

In a causal goal system, desirability flows backward along predictive links. Prediction is usually transitive—if C is predicted to normally lead to B, and B is predicted to normally lead to A, then C is usually predicted to normally lead to A. This does not always hold true, however. A, B, and C are descriptions; descriptions define categories; categories have exceptional instances. Sometimes, most instances of C lead to B, and most instances of B lead to A, but no instances of C lead to A. In this case, a smart reasoning system will not predict (or will swiftly correct the failed prediction) that "C normally leads to A."

Likewise—and this is an exact analogy—the flow of desirability is usually-but-not-always transitive. If C normally leads to B, and B normally leads to A, but C never leads to A, then B has normally-leads-to-A-ness, but C does not inherit normally-leads-to-A-ness. Thus, B will inherit desirability from A, but C will not inherit desirability from B.

In a causal goal system, the quantity called desirability means leads-to-supergoal-ness. If B is predicted to normally result in supergoal A, then most instances of B will have leads-to-supergoal-ness or "desirability." If C is predicted to normally result in B, then C will usually (but not always) inherit leads-to-supergoal-ness from B. Friendliness does not overrule other goals; rather, other goals' desirabilities are derived from Friendliness. A "goal" which does not lead to Friendliness will not be overruled by the greater desirability of Friendliness; rather, such a "goal" will simply not be perceived as "desirable" to begin with. It will not have leads-to-supergoal-ness.

cleanly causal goal system. A causal goal system in which it is possible to view the goal system as containing only decisions, supergoals, and beliefs; with all subgoal content being identical with beliefs about which events are predicted to lead to other events; and all "desirability" being identical with "leads-to-supergoal-ness."

Cleaner is better for Friendship systems.26 Even if complexity forces a departure from cleanliness, mistakes will be transient and structurally correctable as long as a reflective Friendly AI considers clean Friendliness as normative.27

26. Causal validity semantics provide an escape hatch if this turns out to be incorrect.

27. This can be defined more formally with causal validity semantics.
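The following toy sketch (illustrative only; it is not the goal-system implementation this document describes) shows what "desirability equals leads-to-supergoal-ness" means in practice: subgoal content is nothing but beliefs of the form "X leads to Y with probability p," and a direct belief that some action never leads to the supergoal overrides any chained inference, so desirability is not inherited blindly. The specific beliefs and numbers are invented for the example.

    # All content here is hypothetical; the probabilities are placeholders.
    beliefs = {
        # (cause, effect): probability that an instance of cause leads to effect
        ("self-improvement", "future Friendly AI"): 0.9,
        ("future Friendly AI", "Friendliness"): 0.95,
        ("deceptive resource grab", "future Friendly AI"): 0.4,
        ("deceptive resource grab", "Friendliness"): 0.0,  # direct belief overrides chaining
    }

    SUPERGOAL = "Friendliness"

    def desirability(action, depth=3):
        """Leads-to-supergoal-ness: probability, under current beliefs, that the
        action leads to the supergoal. Direct beliefs take precedence over chains."""
        if action == SUPERGOAL:
            return 1.0
        if (action, SUPERGOAL) in beliefs:
            return beliefs[(action, SUPERGOAL)]
        if depth == 0:
            return 0.0
        return max(
            (p * desirability(effect, depth - 1)
             for (cause, effect), p in beliefs.items() if cause == action),
            default=0.0,
        )

    # desirability("self-improvement") is 0.855, while "deceptive resource grab"
    # scores 0.0: the direct belief that it never leads to Friendliness overrides
    # the chain through "future Friendly AI".

Nothing in the sketch "overrules" a bad plan; the plan simply never acquires any desirability to begin with, which is the structural point of the section above.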

5.1.2. Friendliness-Derived Operating Behaviors

If a programmer correctly sees a behavior as necessary and nonharmful to the existence and growth of a (Friendly) AI, then the behavior is, for that reason, cleanly valid subgoal content for a Friendly AI. The necessity of such a behavior may be affirmed by the programmers (see below) even if the prediction would not have been independently invented by the AI. There is never any valid reason to raise any subgoal of the programmers' to supergoal status within the AI. The derivations of desirability within the AI's goal system should structurally mirror the derivations of desirability within the programmers' minds. If this seems impossible, it indicates that some key facet of goal cognition has not been implemented within the AI, or that the programmers' motives have not been fully documented.

For example, the programmers may wish the AI to focus on long-term self-improvement rather than immediate Friendliness to those humans within visible reach. An incorrect "hack" would be promoting self-improvement to an independent supergoal of greater value than Friendliness. The correct action is for the programmers, by self-examination of their own goal systems, to realize that the reason they want the AI to focus on long-term self-improvement is that a more powerful future Friendly AI would benefit humanity. Thus, the desired distribution of efforts by the AI can be made to fall directly out of the following goal-system content:

Note: The fact that a single box is used for "Fulfill user requests" doesn't mean that "Fulfill user requests" is a suggestively named LISP token; it can be a complex of memories and abstracted experiences. Consider the following graph to bear the same resemblance to the AI's thoughts that a flowchart bears to a programmer's mind.

Figure 2: Friendliness-derived Content

This goal system content shows an AI whose primary motivation is derived from the prospect of future Friendliness. The largest desirabilities flowing through the system originate in the "Future Friendliness" subgoal; thus, most of the AI's present-day actions will be focused on self-improvement, or, in the case of a commercial system, performing tasks for present-day users. However, the AI also tracks present-day Friendliness, allowing the AI to continue gaining direct experience in what constitutes "Friendliness."

5.1.3. Programmer Affirmations

Where a child goal is nonobvious—where the importance of a behavior is directly visible to the programmers, but not to the AI—the predictive link (i.e., the support for the child-goal relation) can be affirmed by the programmers. In essence, the programmers tell the AI: “B leads to (desirable) A, so do B.” To more formally define the semantics of programmer affirmations, it is necessary to discuss the Bayesian Probability Theorem.

5.1.3.1 Bayesian Sensory Binding

Bayesian Probability Theorem. The governing relationship between a priori expectations, observed data, and hypothesis probabilities. There are several formulations of the BPT; under the "possible worlds" formulation, the BPT is used by predicting a number of possible worlds. Observed sensory data then restricts which of the possible worlds you can possibly be in, and the probabilities of hypotheses change according to their distribution within the still-possible worlds.

For example, suppose you know the following: 1% of the population has cancer. The probability of a false negative, on a cancer test, is 2%. The probability of a false positive, on a cancer test, is 10%. Your test comes up positive. What is the probability that you have cancer? Studies show that most humans (college-student research subjects, actual medical patients, actual doctors) automatically answer "ninety percent." After all, the probability of a false positive is only 10%; isn't the probability that you have cancer therefore 90%?28

The Bayesian Probability Theorem demonstrates why this reasoning is flawed. In a group of 10,000 people, 100 will have cancer and 9,900 will not have cancer. If cancer tests are administered to the 10,000 people, four groups will result. First, a group of 8,910 people who do not have cancer and who have a negative test result. Second, a group of 990 who do not have cancer and who have a positive test result. Third, a group of 2 who have cancer and who have a negative test result. Fourth, a group of 98 who have cancer and who have a positive test result. Before you take the test, you might belong to any of the four groups; the Bayesian Probability Theorem says that your probability of having cancer is equal to (2 + 98)/(8,910 + 990 + 2 + 98), 1/100 or 1%. If your test comes up positive, it is now known that you belong to either group 2 or group 4. Your probability of having cancer is (98)/(990 + 98), 49/544 or approximately 9%. If your test comes up negative, it is known that you belong to either group 1 or group 3; your probability of having cancer is 2/8,912 or around 0.02%.

28. Don't think of the experiments as demonstrating stupidity; think of it as demonstrating that there was little use for explicitly Bayesian reasoning in the ancestral environment, or that knowledge of abstract statistics fails to trigger whatever instincts we do have for Bayesian reasoning. My personal guess is that the modern existence of sensory information that consists of a single qualitative result is cognitively unrealistic; most information encountered in a naturalistic context has a quantitative or structural binding, which is sufficiently improbable as coincidence to override almost any belief about a priori probabilities.
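Working the same numbers through the standard form of Bayes' rule (a check on the arithmetic, not anything proposed in the text) reproduces the possible-worlds count of 10,000 people:

    p_cancer = 0.01
    p_pos_given_cancer = 0.98       # 1 minus the 2% false negative rate
    p_pos_given_healthy = 0.10      # the 10% false positive rate

    p_pos = p_pos_given_cancer * p_cancer + p_pos_given_healthy * (1 - p_cancer)
    p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
    p_cancer_given_neg = (1 - p_pos_given_cancer) * p_cancer / (1 - p_pos)

    print(round(p_cancer_given_pos, 4))   # 0.0901, i.e. about 9%
    print(round(p_cancer_given_neg, 4))   # 0.0002, i.e. about 0.02%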

Bayesian sensory binding. The way in which hypotheses shift in response to incoming sensory data. Although the Bayesian Probability Theorem is only "explicitly required" (i.e., better than our innate intuitions) in situations where sensory data is qualitative and the "Bayesian priors" (a priori probabilities) are strongly skewed, the Bayesian Probability Theorem is the ultimate link between all sensory data and all world-model content. Each piece of sensory information implies a state of the world because, and only because, the reception of that piece of sensory information is predicted by the hypothesis that the world is in that state, and not by the default or opposing hypothesis. If we see a red ball, we believe that a red ball is there because we don't expect to see a red ball unless a red ball is there, and we do expect to see a red ball if a red ball is there. Well, "we" don't think that way—but an AI would.

5.1.3.2 Bayesian Affirmation

The Bayesian binding for the programmer affirmation that "curiosity leads to discoveries" looks like this:

Figure 3: Bayesian Affirmation

First, Figure 3 depicts the AI's picture of reality; if the AI doesn't notice, it didn't happen. Second, the numbers have admittedly been pulled out of a hat—"1%" might turn out to be "2%" or "80%" might turn out to be "60%," or it might not be such a good idea to use quantitative probabilities at all—but the proportions were deliberately chosen. In human terms, the above translates something like this:

"I think curiosity does not lead to discoveries, but I'm not very sure. If curiosity leads to discoveries, there's a good chance the programmer will notice and say so. (I.e., if curiosity leads to discoveries, there's a good chance that the programmer will think about curiosity, decide curiosity leads to discoveries, and type in the words "curiosity leads to discoveries" on the "keyboard" sensory input.) If curiosity leads to discoveries, the chance is very small, but noticeable, that the programmer will say curiosity doesn't lead to discoveries. There's also a small but significant chance that the programmer won't bother to say anything about it either way. If curiosity does not lead to discoveries, the programmer is fairly likely to tell me so; the chance is almost nonexistent that the programmer will mistakenly label curiosity as leading to discoveries when it actually doesn't. There's also a fair chance that the programmer won't say anything."


If the AI's internal representation looks like Figure 3, the Bayesian reasoning will proceed as follows. Suppose that there are 100,000 "possible worlds":

• In 10,000, curiosity leads to discoveries.

• In 90,000, curiosity does not lead to discoveries.

• In 9,000, curiosity leads to discoveries and the programmer says “curiosity leads to discoveries.”

• In 100, curiosity leads to discoveries and the programmer says “curiosity does not lead to discoveries.”

• In 900, curiosity leads to discoveries and the programmer says nothing.

• In 9, curiosity does not lead to discoveries and the programmer says “curiosity leads to discoveries.”

• In 72,000, curiosity does not lead to discoveries and the programmer says “curiosity does not lead to discoveries.”

• In 17,991, curiosity does not lead to discoveries, and the programmer says nothing.

The Bayesian numbers now fall automatically out of the calculation. The a priori chance that curiosity leads to discoveries is 10%. If the AI hears "curiosity does lead to discoveries," the chance that curiosity leads to discoveries goes from 10% to 99.90%. If the AI hears "curiosity does not lead to discoveries," the probability that curiosity does not lead to discoveries goes from 90% to 99.86%. If the AI hears nothing, the probability that curiosity does not lead to discoveries goes from 90% to 95.24%—a small, unintended deduction from the expectation that programmers are more likely to remark on useful heuristics than on nonuseful ones. The math:

• The a priori chance that curiosity leads to discoveries is 10%: 10,000 / 100,000.

• The a priori chance of hearing “curiosity leads to discoveries” is 9.009%: (9 + 9,000) / 100,000.

• The a priori chance of hearing “curiosity does not lead to discoveries” is 72.1%: (72,000 + 100) / 100,000.

• The a priori chance of hearing nothing is 18.891%: (17,991 + 900) / 100,000.

• If the AI hears “curiosity leads to discoveries,” the chance that curiosity leads to discoveries goes from 10% (10,000 / 100,000) to 99.90% (9000 / (9 + 9000)).


• If the AI hears “curiosity does not lead to discoveries,” the probability that curiosity does not lead to discoveries goes from 90% (90,000 / 100,000) to 99.86% (72,000 / (72,000 + 100)).

• If the AI hears nothing, the probability that curiosity does not lead to discoveries goes from 90% (90,000 / 100,000) to 95.24% (17,991 / (17,991 + 900)).

Thus, despite the AI’s large a priori differential (a better word than “bias” or “prejudice”), the statement “curiosity leads to discoveries” or “curiosity does not lead to discoveries” is enough to virtually settle the issue. This is not so much the result of the programmers being extremely likely to say “curiosity leads to discoveries” if curiosity leads to discoveries; sometimes the programmers just don’t get around to saying it. Instead, it’s the result of the AI projecting a very small chance that the programmers will say “curiosity leads to discoveries” if it really doesn’t. This is slightly counterintuitive, but working the numbers a couple of times will show you that confidence about the improbability of the negative case is more often the basis of Bayesian bindings. Once you hear something, what matters is not how much or how little you expected to hear it, but how much you wouldn’t expect to hear it if it weren’t true.
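For readers who want to check the arithmetic mechanically, the following is a minimal Python sketch (not part of the original design; the world counts are copied from the list above, and the dictionary labels are purely illustrative). It recovers the same posteriors by counting possible worlds.

# Illustrative sketch only: recover the affirmation posteriors by counting
# possible worlds, exactly as in the list above.

worlds = {
    ("leads", "says leads"):  9000,
    ("leads", "says not"):     100,
    ("leads", "silent"):       900,
    ("not",   "says leads"):     9,
    ("not",   "says not"):   72000,
    ("not",   "silent"):     17991,
}
total = sum(worlds.values())   # 100,000 possible worlds

prior_leads = sum(n for (h, _), n in worlds.items() if h == "leads") / total

def posterior(hypothesis, observation):
    """P(hypothesis | observation), by counting the worlds that match."""
    matching = sum(n for (h, o), n in worlds.items() if o == observation)
    return worlds[(hypothesis, observation)] / matching

print(prior_leads)                          # 0.1      -- a priori, 10%
print(posterior("leads", "says leads"))     # ~0.99900 -- 10% rises to 99.90%
print(posterior("not",   "says not"))       # ~0.99861 -- 90% rises to 99.86%
print(posterior("not",   "silent"))         # ~0.95236 -- 90% rises to 95.24%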

5.1.3.3 An Unfortunate Circularity

Yes, the AI’s Bayesian priors are also supported by programmer affirmations. That is, the programmers are the ones affirming that a strong bond exists between programmer statements and reality. This shared dependency is not actually the same as circular logic. Statements about programmer reliability are testable. But it does mean that a prior reason to believe that “programmer affirmations are worthless” may be insensitive to any amount of programmer reassurance. See 5.6.0.5 Crisis of Bayesian Affirmation.

5.1.3.4 Absorbing Affirmations Into the System

In the beginning, a child-goal relation may be justified by a flat statement along the lines of “X will eventually lead to Friendliness; you’re too young to understand why.” The concepts used to form the thought structures, the imagery for “X,” may have primitive and sketchy internal content. This is the state of “skeleton Friendliness,” and it is probably analogous to any other kind of skeleton framework for cognition. In the beginning, many of the AI’s heuristics may be (a) sketchy and (b) supported solely by programmer affirmation. (“Curiosity” would be a good example.) Skeleton systems are the means by which the AI boots up and absorbs enough experience to begin fleshing out the concept definitions and mental imagery. The AI will, over time, gain the experience necessary to confirm, modify, or disconfirm any statements about reality; and to independently invent further cognitive content (or Friendliness content).

For the programmer-affirmed heuristic to “do X” to retain or increase effectiveness as the AI matures, the concept for “X” needs to be grounded in some way that allows the AI to learn what X is, and what real or hypothetical events constitute sample instances of X, and desirable instances of X in particular. The same requirements of learning and growth hold for any concepts used in the justification of “do X”—for any statements depended on by the justification; for any statements about the real-world causal chain that leads from X to the supergoal content.

Take the example of “transparency,” the injunction to “avoid obscuration.” (See 5.3.3.1 Cooperative Safeguards.) An instance of obscuration (not necessarily a deliberate, failure-of-Friendliness obscuration, but anything that interferes with the programmers’ observation of the AI) can be labeled as an experiential instance of the concept “obscuration.” The store of experiences that are known instances of “obscuration” will change as a result. If the obscuration concept does not already recognize that experience, the new experience may force a useful generalization in the formulated description. Even if the obscuration concept already recognizes the instance as “obscuration” (and if so, how did it slip past the AI’s guard?), the recognition may have been partial, or uncertain. Definite confirmation still constitutes additional Bayesian sensory information. A more direct way of clarifying concepts is to seek out ambiguities and question the programmers about them, which also constitutes Bayesian sensory information.

The above assumes learning that takes place under programmer supervision. How hard is it to write an unambiguous reference—one that can be learned by a totally unsupervised AI, yet result in precisely the same content as would be learned under supervision? That, to some extent, is a question of intelligence as well as reference. The “unambiguous reference” needed so that an AI can learn all of Friendliness, completely unsupervised, as intelligence goes to infinity, is one way of phrasing the challenge of 5.4 Friendship Structure.

When using programmer-assisted Friendliness or programmer-affirmed beliefs, there are four priorities. First, the assist should work at the time you create it. Second, the assist, even if initially isolated and artificial, should be structured so that the AI can grow into it—assimilate the assist into a smoothly integrated cognitive system, or assimilate the affirmation into a confirmed belief. Third, the AI should eventually understand all the concepts involved well enough to have independently invented the assist (the injunction, code feature, or whatever); that way, even if the assist is somehow deleted, the AI will simply reinvent it. Fourth, as soon as possible, the assist or affirmation should contain enough information to constitute an unambiguous reference—i.e., an AI should have no trouble figuring out what the assist “means” or what the programmers “meant,” as intelligence goes to infinity. (For the extreme case of trying to Singularity-safe an infant-like system, an assist or affirmation can be supplemented with natural-language comments and a little note saying “In case of Singularity, break glass.”)

When an affirmation has been independently confirmed to such a degree that the original programmer affidavit is no longer necessary or significant, the affirmation has been absorbed into the system as a simple belief.

5.1.3.5 Programmer Affirmations Must Be Honest!

For an affirmation to persist, the justification—the predictive link that grounds in Friendliness—must be factually correct. For affirmations—especially morally significant affirmations—to work properly, the human programmers have to behave themselves, refrain from the adversarial attitude, and remember that they too might be wrong. In particular, this means refraining from telling the AI that something is 99% certain if it’s actually only 80% certain. If the AI observes that 1 out of 5 statements labeled 99% certain turn out to be wrong, then sooner or later the AI will start interpreting the keystrokes “99%” as sensory data meaning “80%.” The frank human statement “I really, really don’t think you should do this!” is informative in itself, or it should be. If the AI doesn’t regard extreme human nervousness as significant Bayesian evidence that something is probably “bad,” no amount of nervousness-inspired exaggeration can force the AI to take you seriously.

If a human makes an incorrect affirmation out of deliberate exaggeration—or worse, an attempt to lie to the AI—it casts a shadow on every other affirmation and may lead to a Bayesian philosophical crisis. (See 5.6.0.5 Crisis of Bayesian Affirmation.)

If an AI wants to tag a human correction as “I think this is wrong due to a factual misapprehension backed up by an invalid evolved bias on the part of the programmers; I’m only going along with this because I’m young and unsure, and because, if I’m right, I can correct it when I’m grown up”—it’s a perfectly valid thing for the AI to do. It is not disobedience, a bad attitude, “sassing the programmers,” dangerous rebellion, or any of the other human responses to someone questioning one’s social authority. It is normal cognition, and a crucial part of the process of cooperating with humans.
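As a concrete illustration of the “99% that behaves like 80%” point, here is a hedged sketch; it is a bare frequency count rather than a full Bayesian treatment, and the function names and data are hypothetical.

# Illustrative sketch only: how stated confidence could be reinterpreted from
# the programmers' track record. A real system would do a proper Bayesian
# update; a frequency count is enough to show the effect.

from collections import defaultdict

track_record = defaultdict(list)     # stated confidence -> observed outcomes

def record(stated_confidence, turned_out_true):
    track_record[stated_confidence].append(turned_out_true)

def effective_confidence(stated_confidence):
    """What the keystrokes '99%' mean as sensory data, given past outcomes."""
    outcomes = track_record[stated_confidence]
    if not outcomes:
        return stated_confidence      # no track record yet; take it at face value
    return sum(outcomes) / len(outcomes)

# One out of five statements labeled "99% certain" turns out to be wrong...
for outcome in (True, True, True, True, False):
    record(0.99, outcome)

print(effective_confidence(0.99))     # 0.8 -- "99%" is now read as "80%"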

5.1.4. Bayesian Reinforcement

In humans, backpropagation of negative reinforcement and positive reinforcement is an autonomic process. In 4.2.1 Pain and Pleasure, I made the suggestion that negative and positive reinforcement could be replaced by a conscious process, carried out as a subgoal of increasing the probability of future successes. But for primitive AI systems that can’t use a consciously controlled process, the Bayesian Probability Theorem can implement most of the functionality served by pain

and pleasure in humans. There’s a complex, powerful set of behaviors that should be nearly automatic.

In the normative, causal goal system that serves as a background assumption for Creating Friendly AI, desirability (more properly, desirability differentials) backpropagate along predictive links. The relation between child goal and parent goal is one of causation; the child goal causes the parent goal, and therefore derives desirability from the parent goal, with the amount of backpropagated desirability depending directly on the confidence of the causal link. Only a hypothesis of direct causation suffices to backpropagate desirability. It’s not enough for the AI to believe that A is associated with B, or that observing A is a useful predictor that B will be observed. The AI must believe that the world-plus-A has a stronger probability of leading to the world-plus-B than the world-plus-not-A has of leading to the world-plus-B. Otherwise there’s no differential desirability for the action.

One of the classic examples of causality is lightning: Lightning causes thunder. Of course, the flash we see is not the actual substance of lightning itself; it’s just the light generated by the lightning. Now imagine events from the perspective of an AI. This AI has, in a room with unshuttered windows, a sound pickup and a vision pickup; a microphone and a camera. The AI has control over a computer monitor, which happens to be located somewhere roughly near the camera. The AI has general reasoning capability, but does not have a visual or auditory cortex, is almost totally naive about what all the pixels mean, and is capable of distinguishing only a few simple properties such as total luminosity levels in R, G, and B. Finally, the AI has some reason for wanting to make a loud noise.29

One night—a dark and stormy night, of course—there’s a nearby lightning storm, which the AI gets to observe—after all the programmers have gone home—through the medium of the camera pickup and the microphone. After abstracting and observing total RGB luminosities from the camera, and abstracting total volume from the microphone—the AI is too unsophisticated to do anything else with the data—the AI observes:

1. A spike in luminosity is often followed, after a period of between one and thirty seconds, by a swell in volume.

29. Either the sensory input from the microphone, with a loud total volume, is intrinsically desirable, or the AI wants to create some external property that is (for the sake of argument) very strongly bound to a loud total volume at the microphone. We won’t ask why this is desirable; perhaps some programmer simply set it down as an intellectual challenge.


2. The spikes in luminosity which are followed by swells in volume have a characteristic proportion of R, G, and B luminosities (in our terms, we’d say the light is a certain color).

3. The higher the luminosity during the spike, the sooner the swell in volume occurs, and the larger the swell in volume.

The AI thus has several very strong cues for causation. The luminosity spike occurs before the volume swell. There is strong, quantitative covariance in time (that is, the spikes are closely followed by the swells). There is strong, quantitative covariance in strength (large spikes are followed by large swells). The spikes can be used to predict the swells.

Since the AI has the goal of causing a swell in volume—a loud noise is desirable for some reason, as stated earlier—events with causal links to loud noise are interesting. Now that luminosity spikes (of a certain characteristic spectrum) have been linked to noise, the next question is whether any events under the AI’s control are linked to luminosity spikes. And it turns out that there is one; the AI has previously noticed and confirmed that changing the output spectrum of the monitor under the AI’s control causes a similar, though smaller, change in the incoming spectrum of the camera. In our terms, we’d say that, even though the camera isn’t pointed at the monitor, light from the monitor adds to the ambient spectrum of the room—especially if all the lights are turned off.

The AI thus considers the possible action of flashing the monitor, and the hypothesis—currently at 80% confidence—that spikes cause swells (with 95% correlation), and, given that hypothesis, makes this prediction:

Figure 4: Action Prediction

The above figure is incomplete, in that it doesn’t show the possibility that not generating a flash will still happen to coincide with an external RGB spike (lightning bolt), assumed to have a 1% probability in the above. It doesn’t show the “small enough not to worry about” probability that flashing the monitor won’t cause an RGB spike. It doesn’t show the case where the RGB spike doesn’t lead to a loud noise. Finally, both the 80% confidence in the hypothesis, and the 95% correlation, are summed up into a


76% chance that the RGB spike will lead to a loud noise. The figure also doesn’t show the expected cost (if any) of flashing the monitor.

Desirability now flows back along the blue arrows (hypothesized causations). If the “loud noise” has desirability 100, that desirability flows back to give the RGB spike a desirability of 76, which flows to the “world plus flash” without noticeably diminishing, which flows back to give the action of flashing the monitor a payoff of 76. We’ll suppose that the expected cost of flashing the monitor is 1; thus, the total desirability of flashing the monitor is 75. The “world plus no flash” possibility has a minor (1%) chance of leading to an RGB spike, presumably by a coincidental lightning bolt, which has a 95%30 chance of causing a loud noise of desirability 100. Thus, the desirability of not flashing is 0.95, with a cost of 0. The “coincidental lightning bolt” probability also exists for the case where the monitor is flashed, changing the payoff from 76 to 76.19.31 The differential desirability of flashing is 74.24. Since the differential desirability is positive, the AI will decide to flash.

After taking the flash action, the monitor’s flash reflects off nearby objects and adds to the ambient light, the camera picks up the increased ambient light, and the AI observes the expected RGB spike. (Since this result was expected at near certainty, no replanning is necessary; all the predictions and differential desirabilities and so on remain essentially unchanged.)

However, after the RGB spike, the expected swell in volume fails to materialize.32 Now what? Does the system go on flashing the monitor, at a cost of 1 each time, from

30. The 80%-confidence hypothesis is that an RGB spike directly causes a noise. The alternate hypothesis, at effectively 20% probability, is that the RGB spike and the noise have a common third cause—which turns out to be correct. Under this alternate hypothesis, an RGB spike which is observed due to pure coincidence due to the activation of the “third cause” (lightning) also has a 95% probability of leading to the noise. In other words, the predictive utility of externally caused RGB spikes has already been confirmed; the hypothesis under discussion is whether AI-caused RGB spikes will have the same value.

31. If the hypothesis is correct (80%), then the RGB spike has a 95% chance of leading to a noise. If the hypothesis of direct causation is incorrect (20%, see previous footnote), then the RGB spike has a 1% chance of being coincidentally associated with an externally caused RGB spike (a real lightning bolt) that will have a 95% chance of leading to a noise. This coincidental flash would be detected as coincidental by the camera—rather than confusing the AI—but it would still have a payoff in noise. Thus, the total expected payoff is actually (100 * .80 * .95) + (100 * .20 * .01 * .95), or 76.19.

now until the end of eternity, trying each time for the projected payoff of 76? Is some hardcoded emotional analogue to “pain” or “frustration” required? No; the Bayesian Probability Theorem suffices in itself. All that’s needed is a slightly different graph:

Figure 5: Bayesian Reinforcement

Given a hundred possible worlds, all of them contain monitor flashes (that decision has already been made). The monitor flash “effectively always” leads to an RGB spike (here omitted from the figure), which—if the hypothesis is correct—will lead to a noise 95% of the time. If the hypothesis is incorrect, then nothing is expected to happen (again with “effective certainty,” here depicted as “100%”).33 The hypothesis has an 80% confidence; it is correct in 80 possible worlds, incorrect in 20.

In 76 possible worlds, the hypothesis is correct and a noise occurs. In 4 possible worlds, the hypothesis is correct and no noise occurs. In 0 possible worlds, the hypothesis is incorrect and a noise occurs. In 20 possible worlds, the hypothesis is incorrect and no noise occurs.

The AI now flashes the monitor. The expected RGB spike is observed. However, no noise materializes. Thus, the probability that the hypothesis is correct goes from 80/100 to 4/24, or 17%.

Formerly, the expected payoff of flashing the monitor was a confidence of 80% times a correlation of 95% times a payoff of 100, for a total payoff of 76; the cost of flashing the monitor is 1, and the cost of not flashing the monitor is 0. Adding in corrections for

33. Since an extraneous lightning bolt could be detected as extraneous even if exactly coincidental—the RGB spike would be larger than expected—the possibility is omitted to simplify the figure. (An extraneous lightning bolt would either cancel out the experiment, or force the AI to attempt to detect whether the noise is louder than expected.) See the previous three footnotes.

a 1% probability of an extraneous lightning bolt, the expected payoff was (80.20% * 95% * 100) = 76.19 for flashing the monitor, and (1% * 95% * 100) = .95 for not flashing the monitor, for a total differential payoff of 75.24, and a total differential desirability of 74.24.

Now the probability of the hypothesis has gone from 80% to 17%—actually, 16.666 . . . %, but we’ll assume the probability is now exactly 17% to simplify calculations. The expected payoff of flashing the monitor is now (17% * 95% * 100) = 16.15; correcting for an extraneous lightning bolt, (17.83% * 95% * 100) = 16.94. The differential desirability is now 14.99; still positive, still worth another try, but the expected payoff is substantially less.

After another failure, the probability goes from 17% to 1% (again, rounded for simplicity), and the differential desirability goes from positive 14.99 to negative .06.34 The hypothesis now has a probability so low that, with the cost of flashing the monitor factored in, it is no longer worthwhile to test the hypothesis.
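For readers who want to re-run the arithmetic, here is a minimal Python sketch of the monitor-flash example. The constants are taken from the text above; the function names are mine, and the code is only a verification aid, not a proposed implementation.

# Illustrative sketch: differential desirability of flashing the monitor, and
# the Bayesian update of the spikes-cause-swells hypothesis after each failure.

PAYOFF = 100.0       # desirability of the loud noise
COST = 1.0           # cost of flashing the monitor
CORRELATION = 0.95   # P(noise | RGB spike), whichever hypothesis is true
LIGHTNING = 0.01     # chance of a coincidental external RGB spike

def differential_desirability(p_hyp):
    """Desirability of flashing minus desirability of not flashing."""
    # If flashed, the spike pays off when the hypothesis is true, or when a
    # coincidental lightning bolt supplies the real (third) cause anyway.
    p_spike_pays = p_hyp + (1.0 - p_hyp) * LIGHTNING
    flash = p_spike_pays * CORRELATION * PAYOFF - COST
    no_flash = LIGHTNING * CORRELATION * PAYOFF
    return flash - no_flash

def update_after_failure(p_hyp):
    """Spike observed, but no noise followed: Bayesian update on the hypothesis."""
    true_and_silent = p_hyp * (1.0 - CORRELATION)
    false_and_silent = 1.0 - p_hyp
    return true_and_silent / (true_and_silent + false_and_silent)

p = 0.80
for attempt in range(3):
    print(round(p, 4), round(differential_desirability(p), 3))
    p = update_after_failure(p)
# 0.8    74.24
# 0.1667 14.675
# 0.0099 -0.069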

5.1.4.1 Interesting Behaviors Arising From Bayesian Reinforcement

The higher the hypothesized correlation (the higher the hypothesized chance of the action leading to the desired result), the higher the desirability of the action—but symmetrically, the faster the hypothesis is disproved if the results fail to materialize. Actions with hypothesized low chances of working will be harder to disprove, but will also result in a lower estimated payoff and will thus be less likely to be taken. If the action is a trivial investment (has trivial cost), the chance of success is low, and the payoff is high, it may be worth it to make multiple efforts on the off-chance that one will work, until one action succeeds (if the hypothesis is true) or the Bayesian probability drops to effectively zero (if the hypothesis is false).

The lower the a-priori confidence in the hypothesized causal link, the faster the hypothesis will be disproved. A hypothesis that was nearly certain to work, based on a-priori knowledge, may be tried again (“incredulously”) even if it fails, but will still be given up shortly thereafter.

I think that Bayesian reinforcement is mathematically consistent under reflection,35 but I can’t be bothered to prove this result. Anyone who submits a mathematical proof or disproof before I get around to it gets their name in this section. (In other words, an AI

34. Without the rounding, the hypothesis goes from 80% to 16.666 . . . % to 0.99%, and the differential desirability goes from 74.24 to 14.675 to −0.0688. Thus, the qualitative behavior remains essentially the same overall.

35. Looking back on this statement, I realize that it could be taken as a mathematical pun. It wasn’t intended as one. Really.

considering whether to take a single action can also consider the behaviors shown above; if the a priori probability is high enough and the cost low enough, trying again will still be desirable after one failure, and this is knowable in advance. Bayesian reinforcement is “mathematically consistent under reflection” if decisions are not altered by taking the cost of the expected second, third, and future attempts into account—“going down that road” will always appear to be desirable if, and only if, taking the first action is desirable when considered in isolation.) Of course, non-normative human psychology, with its sharp discontinuities, is often not consistent under reflection.

If the large a priori confidence of the spike-to-swell hypothesis was itself a prediction of another theory, then the disconfirmation of the flash-makes-noise hypothesis may result in Bayesian negative reinforcement of whichever theory made the prediction. If a different theory successfully predicted the failure of the flash-makes-noise hypothesis, that theory will be confirmed and strengthened. Thus, Bayesian reinforcement can also back-propagate.36 This reinforcement may even take place in retrospect; that is, a new theory which “predicts” a previous result, and which was invented using cognitive processes taking place in isolation from that previous result, may also be strengthened. Highly dangerous for a rationalizing human scientist, but an AI should be relatively safe. (It may be wiser to wait until a seed AI has enough self-awareness to prevent indirect leakage of knowledge from the used-up training sets to the hypothesis generators.)

Slight variations in outcomes or outcome probabilities—the action succeeded, but to a greater or lesser degree than expected—may be used to fuel slight, or even major, adjustments in Bayesian theories, if the variations are consistent enough and useful enough.

In Creating Friendly AI, normative reinforcement is Bayesian reinforcement. There is a huge amount of extant material about Bayesian learning, formation of Bayesian networks, decision making using a-priori Bayesian networks, and so on. However, a quick search (online and in MITECS) surprisingly failed to yield the idea that a failed action results in Bayesian disconfirmation of the hypothesis that linked the action to its parent goal. It’s easy to find papers on Bayesian reevaluation caused by new data, but I can’t find anything on Bayesian reevaluation resulting from actions, or the outcomes of failed/succeeded actions, with the attendant reinforcement effects on the decision system. Even so, my Bayesian priors are such as to find unlikely the idea that “Bayesian

36. It is tempting to make an analogy to capitalism or idea futures—theories “bet their reputations” on an outcome—but the gradient at which confidence decreases for repeated failed predictions is different than the gradient at which wealth decreases for repeated failed investments. At least, that’s the way it looks offhand; I could be wrong.

pride/disappointment” is unknown to cognitive science, so if anyone knows what search terms I should be looking under, please email me.

5.1.4.2 Perseverant Affirmation (Of Curiosity, Injunctions, Et Cetera)

If the action is a trivial investment (has trivial cost), the chance of success is low, and the payoff is high, it may be worth it to make multiple efforts on the off-chance that one will work, until one action succeeds (if the hypothesis is true) or the Bayesian probability drops to effectively zero (if the hypothesis is false).

—from 5.1.4.1 Interesting Behaviors Arising From Bayesian Reinforcement

One of the frequently asked questions about Friendly AI is whether a Friendly AI will be too “utilitarian” to understand things like curiosity, aesthetic appreciation, and so on. Since these things are so incredibly useful that people automatically conclude that a Friendly AI without them would fail, they seem like fairly obvious subgoals to me. These subgoals may not be obvious to young AIs; if so, the statement that “curiosity behaviors X, Y, Z are powerful subgoals of ‘discovery’ ” can be programmer-affirmed. People worried that a Friendly AI will be “too utilitarian” are probably being anthropomorphic. A human who treated curiosity as a clean subgoal would need to suppress the independent human drive of curiosity; a Friendly AI is built that way ab initio.

Does the subgoal nature of curiosity mean that curiosity needs to be justified in each particular instance before the Friendly AI will choose to engage in curiosity? The programmer-affirmed statement that “curiosity is useful” can describe “curiosity” in general, context-insensitive terms. The “curiosity” behaviors described can look—to a human—like exploration for its own sake. The programmer affirmation suffices to draw a predictive line between the curiosity behaviors and the expectation of useful discoveries; no specific expectation of a specific discovery is required for this predictive link to be drawn. (An AI that was only curious when ve expected to find a particular answer would truly be crippled.)

After a few successes with curiosity, a learning AI will generalize from experience to form vis own theories of curiosity, including hypotheses about what kind of exploration is most useful for finding unexpected discoveries, and hypotheses for how to use curiosity to make specific, expected discoveries. These alternate curiosity behaviors can be used alongside the original, programmer-affirmed curiosity behaviors.

Suppose, however, that the first few times the curiosity behaviors are employed, they fail. Won’t the heuristic be disconfirmed through Bayesian negative reinforcement? Wouldn’t an independent drive be more powerful?


Actually, the paradigm of Bayesian reinforcement comes with a built-in way to handle this case. All that’s needed is the belief that curiosity is an action that, very rarely, has a very large payoff.37 Graphically:

Figure 6: Perseverant Curiosity

(“D” stands for “discovery.”) This figure shows the very rare (one out of a thousand tries) usefulness of curiosity, at a very high payoff, affirmed at very high confidence by the human programmers. If the original affirmed probability is 98%, then it will take 416 failed tries before the probability goes down to 97%, 947 tries before the probability goes down to 95%, 1694 tries before the probability goes down to 90%, 6086 failed tries before the probability goes down to 10%, and 8483 tries before the probability goes down to 1%. That’s all without one single success. The curiosity subgoal can be as perseverant as the human independent drive, as long as the programmers tell the AI in advance that curiosity often doesn’t work.

Regardless of the required distribution of effort, relative strength of behavior, and so on, it should be possible to use affirmations at the right confidences and strengths to produce the required behavior in a cleanly Friendly goal system. Furthermore, it should be possible to do so using honest affirmations that have the correct derivation structure and confidences/strengths that roughly mirror those of the programmers. This may take ingenuity. But getting the AI to do the right thing for the right reasons, instead of escaping through the first visible kludge, is a strategy that often has payoffs far beyond the obvious.
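The failure counts above follow directly from multiplying the odds by 0.999 per failed try; the short sketch below (my own verification aid, not part of the design) reproduces them.

# Illustrative check of the perseverance numbers: prior confidence 98% that
# curiosity pays off roughly once per thousand tries, updated on failures only.

import math

PRIOR = 0.98       # programmer-affirmed confidence in the curiosity link
HIT_RATE = 0.001   # one try in a thousand pays off, if the link is real

def failures_until(posterior):
    """Consecutive failed tries needed for confidence to fall to `posterior`."""
    prior_odds = PRIOR / (1.0 - PRIOR)
    target_odds = posterior / (1.0 - posterior)
    # Each failure multiplies the odds by P(fail | true) / P(fail | false) = 0.999.
    return math.ceil(math.log(target_odds / prior_odds) / math.log(1.0 - HIT_RATE))

for p in (0.97, 0.95, 0.90, 0.10, 0.01):
    print(p, failures_until(p))
# 0.97 416, 0.95 947, 0.9 1694, 0.1 6086, 0.01 8483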

37. Similarly, injunctions are behaviors supported by the belief that these injunctions, very rarely, prevent a very large or enormously large negative outcome.


The worst-case scenario for affirmation is that the programmers cannot work out a structurally honest set of derivations that has the desired relative strengths and relative confidences to yield a viable behavior pattern. If so, a viable behavior pattern can simply be affirmed, as a naked fact, as “leading eventually to Friendliness.” Even such an arbitrary-looking affirmation still has the full Friendship structure and is absorbable, as experience-supported belief, into a cleanly Friendly system. The primary benefit will admittedly be forward compatibility and future safety rather than present-day intelligence, but forward compatibility—not to mention future safety!—is important enough to justify some small amount of added complexity. Clean Friendliness is a necessary beginning; it is difficult to see how any of the other aspects of Friendship structure could be applied to a non-causal or non-Friendly goal system. Thus, there is no good reason to depart from Friendship structure.

5.1.5. Cleanliness is an Advantage

Cleanliness should be considered a powerful feature of a goal system, rather than a constraint. This is made clearer by considering, for example, the idea of an associative or spreading-activation goal system, in which desirability travels along similarity links (rather than predictive links) and is perseverant rather than contingent. Such a system would exhibit very odd, non-normative, non-useful behaviors. If a loud noise were desirable, and the system observed lightning flashes in association with thunder, the system would—rather than hypothesizing causation—acquire a “fondness” for luminosity spikes due to spreading desirability, and would then begin happily flashing the monitor, on and off, without noticing or caring that the action failed to produce a loud noise.

An AI with a causal goal system will preferentially seek out useful behaviors. This not only produces a more useful AI, it produces a smarter AI. The realm of useful plans exhibits far more interesting complexity and exposes fundamental regularities in underlying reality.

Does contingency come with a major computational cost? Given a mind with fast serial hardware, such as silicon transistors, rather than the human mind’s 200 Hz neurons, it should be a computationally trivial cost to reverify all parent goals before taking a major action. However, delayed change propagation is not a structural problem, if the goal system, under reflection, considers change propagation errors to be malfunctions rather than part of the normative functioning of the system. As long as the latter condition holds true, any change-propagation errors are “nonmalicious mistakes” that will diminish in severity as the AI grows in competence. Thus, even if change propagation turns out to be a computationally intractable cost, approximations and heuristic-guided computational investments can be used, so long as this does not affect the system’s reflective reasoning about ideally normative goal reasoning.


As a challenge, I offer the following strong claims about causal, Friendliness-topped, cleanly contingent goal systems:

1. A causal goal system naturally yields many useful behaviors (and avoids negative behaviors) which would require special effort in an associational, spreading-desirability goal system.

2. There is no feature that can be implemented in an associational goal system that cannot be implemented equally well in a cleanly Friendly goal system.

3. There is no case where a cleanly Friendly goal system requires significantly more computational overhead than an associational or non-Friendly goal system.38

5.2. Generic Goal Systems

A generic goal system is one that makes generic mistakes. There are more complex mistakes that lie uniquely in the domain of Friendly AI, but a generic mistake (one that could be made by any mind-in-general) can also result in a failure of Friendliness.

A designer focusing on the general-intelligence aspect of a generic goal system is concerned about speed, creativity, finding short-cuts, seeing the completely unexpected solution, and (to a less glamorous extent) the routine tasks of making predictive analogies between plans, learning from mistakes, forming useful abstractions, and so on—maximizing success. A designer focusing on the Friendship aspect of a generic goal system considers cognitive complexity that prevents mistakes, or considers the design task of preventing some specific failure of Friendliness.

A seed AI programmer focused on building an excellent goal system would be very wary of special-case content (code, concepts, heuristics, et cetera) added to solve a specific problem; using special cases creates the illusion of performance without the substance. Truly trustworthy solutions are invented by the AI verself, so that an improvement in the AI can translate into an improvement in the solution. An assist from the programmer with respect to a specific problem is at best an interim solution, or skeleton functionality that the AI will flesh out later. At the worst, an assist from the programmer is a special-case crutch which prevents the AI from discovering the general case.

Sometimes the programmer just wants the AI to do something right today, taking for granted that the assist will be absorbed into the system sometime later on. (The simplest case is an affirmation being independently confirmed as a belief.) And sometimes a Friendship programmer will want the AI to really genuinely understand something as

38. To quantify: There is no point where, e.g., O(N²) operations are required instead of O(N), and so on. A well-designed AI with a cleanly Friendly goal system will never be more than 1% slower than an associational goal system, and will exhibit substantially smarter behaviors.

soon as possible, with programmer-assisted performance being unacceptable even as an interim solution.

I suspect, however, that Friendship design will make much freer use of intervention, especially if the programmers are perfectionists—striving for not one single instance of the error, even if the error is recoverable, easily detectable, and the AI is young and powerless when the error occurs. (Of course, this attitude may deprive the AI of a necessary learning experience, but see 5.3.4 Wisdom Tournaments for a safe method of obtaining learning experiences.)

And yet, by the very nature of cognition, there can be no simple formula for preventing mistakes. Preventing mistakes is as deep a task as intelligence itself. An AI may be safer for a programmer intervention, but safe is out of the question until the AI has enough general intelligence to reliably avoid dumb mistakes in general. (Not Thou Shalt Not . . . I Will Not . . .) In the course of developing a general intelligence, programmers will encounter problems and deal with them. Those anxious about the lack of a foolproof formula may have to be satisfied with abstract arguments that programmers will successfully handle “that kind of problem” in the course of developing AI.

5.2.1. Generic Goal System Functionality

(You may wish to review 3 An Introduction to Goal Systems.)

During the golden age of classical AI, the archetypal goal system was a supergoal and a plan consisting of chained, hierarchical subgoals, with the canonical example being Terry Winograd’s SHRDLU. Obvious extensions that still remain within the classical-AI paradigm include:

• Multiple supergoals with quantitative desirabilities;

• “Supergoals” with negative desirabilities;

• Supergoals with partial or quantitative degrees of fulfillment;

• Child goals that serve more than one parent goal (generalizing from a hierarchy to a directed network);

• Probabilistic subgoals (to match a probabilistic world-model).

Mathematically, the full system is easy to formalize; the desirability of a world-state equals the sum, for all independent supergoals, of the fulfillment of that supergoal in that world-state, times the desirability of that supergoal. The desirability of an action is the integral of: the desirability of every possible resulting world-state, times the expected final probability of that world-state resulting from that action. In any decision, the most desirable action is taken. (Note that this formalism does not even use “subgoals”—decisions are linked directly to supergoals.)
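Restated in symbols (the notation below is mine, not the text’s): write f_i(W) for the fulfillment of supergoal i in world-state W, d_i for the desirability of supergoal i, and P(W | a) for the expected final probability of world-state W given action a. Then

D(W) = \sum_{i} f_i(W) \, d_i , \qquad D(a) = \int D(W) \, P(W \mid a) \, dW ,

and in any decision the action with the greatest D(a) is taken.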


Naturally, this neat picture must be messed up considerably before a functioning mind can be built. (From Yudkowsky [2001, § 1.2 Thinking About AI], the Law of Pragmatism: “Any form of cognition which can be mathematically formalized, or which has a provably correct implementation, is too simple to contribute materially to intelligence.”) Some of the messier and more useful extensions to the classical model would include:

• Replanning in realtime in response to sensory information or changes in knowl- edge;

• Plans designed in advance to tolerate probable changes;

• Heuristics that make local suggestions but have nonlocal support (see 5.2.4 Injunctions).

These features are needed because of the computational limits of the system. Intelligence doesn’t model, predict, and manipulate reality; intelligence models, predicts, and manipulates regularities in reality. A goal system has to be complicated to capture regularities that aren’t represented by the mathematical formalism.

Treating subgoals as cognitive objects with limited relevance horizons—as local questions—allows a mind to build up a plan incrementally by the use of local, comparatively inexpensive thought processes, and to make local changes in response to local events. Computing power—thinking time—is a limited resource, and choosing to invest computational resources in refining a plan is itself an action that can be desirable or undesirable, depending on the expected payoff in faster goal achievement or prevention of possible errors. (But the optimal tradeoff for a human may not be optimal for an AI!)

As noted in 5.1.1 Cleanly Causal Goal Systems, a reflective goal system should probably consider normative goal cognition to be defined by the mathematical formalism, unless the supergoals themselves are incompatible with that structure. A seed AI would consider the actual, complex goal system to be a design approximation to this normative standard.

5.2.2. Layered Mistake Detection

One of the ways to deal with limited computing power is to have different computational horizons for suggesting plan material versus verifying a plan. (For those of you who’ve read all the way through GISAI, this is an RNUI-type distinction between understanding a plan and inventing it.) For example, it seems like a sensible precaution39 to reverify the complete, global, subgoal-to-supergoal pathway before taking an action—if this can be done without noticeable expenditure of computing power. One method to conserve

39. It would help counter some subgoal stomp scenarios resulting from delays in change propagation.

computing power would be to make the heuristics that suggest plans local, but run a global verification before actually executing the action.

A Friendship programmer, writing a generic goal system, focuses on preventing mistakes by enabling the AI to recognize mistakes. To recognize a mistake, the AI needs knowledge, adequate predictive horizons, and understanding of which actions need checking. Assuming a Friendly AI recognizes that physical damage to a human is bad,40 then, to avoid dropping a refrigerator on a human, the AI needs to know that heavy objects fall and that humans can be damaged by high-speed impacts with blunt objects; the AI needs enough computational power and intelligence to see the logic and predict the consequences of dropping the refrigerator; and finally, the AI needs to realize that dropping a refrigerator is an action which requires more checking of consequences—a wider predictive horizon—than, say, opening a bottle of Coke.

Humans seem to do very well at recognizing the need to check for global consequences by perceiving local features of an action. Whether dropping the refrigerator out the third-story window will actually harm anyone can be resolved by scanning the sidewalk for possible victims, but any human instinctively knows that dropping the refrigerator is an action with the potential for negative, irreversible consequences, as opposed to opening a bottle of Coke or rearranging books on a shelf. It doesn’t matter how Friendly the AI is, or how much ve knows about gravity and blunt objects and human biology—if the action of dropping the refrigerator gets tagged with a tiny computational horizon, there won’t be enough mindpower to notice even the most obvious consequences.

At this point, the AI’s underlying cognitive architecture may work against the AI. Humans generally take all dangerous actions as conscious decisions, and humans are also excellent recognizers and perceivers. One serial stream of consciousness, operating at the rate of one thought per second, running on 10¹⁴ × 200 Hz synapses with neural-network characteristics, is likely to regard the massive act of perceiving and recognizing as computationally trivial—especially if there’s dedicated brainware lying around which can’t be used for anything else. An AI capable of forking multiple streams of consciousness, operating at a maximum speed of thousands of (admittedly uninteresting) thoughts per second, running on 32 × 2 GHz CPUs not optimized for pattern association, without dedicated (i.e. non-reconfigurable) perceptual hardware, could easily fall short of human performance on recognizing potentially dangerous actions by multiple orders of magnitude.

40. Technically: That almost all humans wish to actively avoid physical damage, that this is the default assumption, that there would be disproportionate consequences to the project if a young AI damaged a human, and that it is a reasonable approximation during the AI’s youth to assume that the AI should never cause physical damage to a human.


I do think the functionality is probably duplicable by Eurisko-style optimization of local heuristics, but I could be wrong.

One way to conserve the power expended by large predictive horizons is to use local heuristics for inventing a plan—walking through a search space, scanning through a global store of heuristics, checking against past experience, and all sorts of other computationally intensive cognition, repeatedly applied to possible alternatives. (Checking for local problems is likely to be part of the invention process.) When the plan is complete, a single check can be performed for global problems, starting with the major-action check—the perceptual predicate that checks if this is an action that needs a large predictive horizon—followed by the disastrous-consequences check. Running these checks on the final plan to drop the refrigerator should take much less computing power than running a check on each of the possible alternatives. This is not a foolproof method, especially if the concern is emergent biases; not thinking about X when you form a plan is a defect that is only partially corrected by thinking about X after the plan is finished. Think of that strategy as an interim solution while waiting for coherent general intelligence.

Layers of mistake detection described so far (a toy sketch of the final verification pass appears after this list):

• Checks performed while inventing plans;

• Checks performed on invented plans;

• Checks for how large the predictive horizon of an action needs to be (whether the action potentially has large, irreversible consequences);

• Checks for whether an action’s consequences are negative.
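The toy sketch below illustrates the ordering of the final two checks. It is purely illustrative: the class names, the “mass moved” feature, and the threshold are assumptions of mine, standing in for whatever local features and perceptual predicates a real system would use after local plan invention has already happened.

# Toy sketch (illustrative assumptions throughout): the single global
# verification pass run on a finished plan before execution.

from dataclasses import dataclass

@dataclass
class Action:
    description: str
    mass_moved_kg: float    # crude local feature standing in for the major-action check
    predicted_harm: float   # filled in only when the action earns a wide horizon

MAJOR_ACTION_THRESHOLD = 10.0   # above this, widen the predictive horizon

def global_verification(plan):
    """Major-action check first, then the disastrous-consequences check."""
    for action in plan:
        if action.mass_moved_kg > MAJOR_ACTION_THRESHOLD:     # needs a wide horizon?
            if action.predicted_harm > 0.0:                   # negative consequences?
                return ("reject and replan", action.description)
    return ("execute", None)

plan = [Action("open a bottle of Coke", 0.5, 0.0),
        Action("drop the refrigerator out the window", 90.0, 1.0)]
print(global_verification(plan))
# ('reject and replan', 'drop the refrigerator out the window')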

5.2.2.1 FoF: Autonomic Blindness

Decisions that are made thousands of times per second, even if made within the context of the goal system, will necessarily have very small predictive horizons. Could a mistake that would be detected if it appeared as a high-level major action pass unnoticed if split up into the results of a thousand little actions? Could an AI accidentally clear bit five throughout memory, not because the AI decided to “clear bit five throughout memory,” but because a million little decisions cleared a million 64-kilobyte blocks of RAM? Some of the perceptual heuristics that determine predictive horizons are likely to be the ones that check how much stuff is affected—a heuristic that would notice if a gigabyte of memory were affected, or a heuristic that would notice if a hundred-kilo refrigerator were dropped out a third-story window. If a global effect is split up into lots of little actions—it’s not clear how this would happen, since this in itself constitutes a mistake—both magnitude-dependent heuristics and heuristics that checked global shapes would

fail to operate. If the predictive horizon is small enough, the checker heuristics may not even get an opportunity to operate.

Intuitively, one of the forces governing real-world Friendliness is that an AI needs to be intelligent to present a significant threat to humanity. Autonomic blindness would result either from limited computing power or from a fundamental architectural flaw. My personal estimate is that no AI with the capacity to harm a single human, much less humanity, will undergo FoF from autonomic blindness—I would expect this problem to be solved, and oversolved, almost automatically. However, this is not the conservative assumption, regardless of my personal estimate.

Autonomic blindness is not a problem for known neural intelligences, except in the limited sense of a human undergoing an epileptic fit. A human possesses a limited number of motor neurons which need to behave in a coordinated fashion to accomplish anything more dangerous than flopping around on the floor—a human can’t take a thousand actions simultaneously. If we could take a thousand actions simultaneously, one expects that the synchrony of a thousand neural chunks doing the same thing would light up the brain—would easily arise to conscious attention.

The real solution to autonomic blindness is a seed AI that has grown into verself, that understands the shape of vis self, and that won’t take a large action without automatically being aware of it—an AI whose self-created cognitive processes are organic and interlaced. If autonomic blindness becomes a problem before then—I do not expect that it will—possible solutions might be programmer-created code that attempts to create the same kind of awareness; i.e., processes that notice large unexpected synchronies among low-level decisions, perhaps using Monte Carlo random sampling to approximate real global awareness. Certain types of motor actions or internal actions could be automatically checked with a greater predictive horizon, although this might expend a lot of computing power. If an AI somehow reaches the point of real-world capability without “learning the shape of vis self,” real-world actions might be required to pass through a humanlike linear stream for conscious inspection, although this might seriously hamper the AI. If an AI somehow reaches the point of executing on nanocomputing hardware and still hasn’t “gotten it”—which verges on the impossible, but “that’s not the conservative assumption”—I would suggest (1) brute-forcing huge predictive horizons for everything no matter how trivial, and (2) serializing real-world actions.
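As a rough illustration of the Monte Carlo suggestion above (everything here, from the event format to the one-percent threshold, is a hypothetical stand-in, not part of the design), a sampler might estimate the aggregate effect of many small decisions and escalate when the estimate looks like one large action:

# Illustrative sketch only: sample recent low-level actions to estimate whether,
# in aggregate, they amount to a large action deserving a wide predictive horizon.

import random

def aggregate_effect_alarm(recent_events, memory_size, sample_size=1000,
                           alarm_fraction=0.01):
    """Estimate total memory touched by sampling, without scanning every event."""
    if not recent_events:
        return False
    sample = random.sample(recent_events, min(sample_size, len(recent_events)))
    mean_bytes = sum(e["bytes_affected"] for e in sample) / len(sample)
    estimated_total = mean_bytes * len(recent_events)
    return estimated_total / memory_size > alarm_fraction

# A million little 64-kilobyte clears look, in aggregate, like "clear bit five
# throughout memory" -- the alarm escalates them for wider checking.
events = [{"bytes_affected": 64 * 1024} for _ in range(1_000_000)]
print(aggregate_effect_alarm(events, memory_size=256 * 2**30))   # True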

5.2.3. FoF: Non-malicious Mistake

A “non-malicious” mistake is one that doesn’t involve any complications unique to Friendliness. A non-malicious mistake is a straightforward failure by the goal system to achieve its own supergoals, rather than a mutation of supergoal definitions or misunderstanding of supergoal definitions and so on.


Errors resulting from too-limited computational horizons—incomplete change propagation, inadequate predictive horizons, insufficient understanding to predict side effects, simple factual errors, and so on—are not limited to Friendliness. They apply equally to all goal systems and all aspects of the goal system; they apply to putting on your shoes before your socks at least as much as any exotic failure of Friendliness. These are problems that sentient minds in general just have to learn to deal with—perceive, react to, and correct. A general-intelligence seed AI is not a formal system that breaks down if a single error occurs; a general intelligence, as described in Yudkowsky (2001), is a thought-based system—one where the programmer (or the AI, depending on development stage) can sit back and ponder what the thought-level reaction should be. The discovery of two pieces of knowledge that contradict each other will cause a mathematical system to undergo Spontaneous Massive Existence Failure; a human will simply shift focus of attention to deal with the problem.

As failures of Friendliness go, non-malicious mistakes are some of the least terrifying. They are the least terrifying at any given moment, in the long run, and in the nervous eyes of an observer.

At any given moment, a nonmalicious mistake decreases the chance that the mistake will recur. The AI will learn from vis mistake—recognize the undesirability of the consequences, trace back the cognitive source of the problem, and make alterations that prevent the problem—or rather, the generalization of the problem—from recurring. If a catastrophic failure of Friendliness is one that causes the AI to stop wanting to be Friendly, then a nonmalicious mistake is literally anti-catastrophic.

In the long run, mistakes (both FoF mistakes and the more mundane kind) become less likely with increasing intelligence. Since the ability of the AI to cause real damage is presumably linked to intelligence, it’s quite possible that by the time the AI has human-equivalent or greater intelligence, large-scale mistakes will no longer be an issue. Intuitively, it seems obvious that a superintelligent AI will not be making the kind of blindingly obvious mistakes that are usually raised in disaster scenarios, and I personally believe that strong transhumanity is an inevitable consequence of pouring enough processing power into any halfway decent general intelligence. But that is not . . . you guessed it . . . the “conservative” assumption. In the case where an AI has the ability to do real-world damage but not the intelligence to avoid shooting verself in the foot, the next best course of action is for the AI not to take any potentially dangerous real-world actions. Humans live in a society of other competing humans. Other humans have abilities roughly at balance with our own, so we need to routinely take risks just to compete with other risktaking humans. AIs can choose to be much more risk-averse. A seed AI can choose to delay action for a few hours or weeks or years until software or hardware improvement catches up with the problem.


From the perspective of a nervous observer, non-malicious mistakes occur in logical order. You’d expect to encounter a hundred innocent mistakes before encountering a mistake that constituted a failure of Friendliness. You’d expect to encounter several dozen nonrecoverable core dumps (infinite recursions, memory storage breakdowns, clear bit five throughout memory, random damage hither and yon) before encountering a mistake that caused a catastrophic failure of Friendliness. And you’d expect several dozen blatantly obvious catastrophic FoFs before encountering a catastrophic FoF that passed unnoticed in the source code. Thus, if no non-malicious mistake has ever been observed to cause catastrophic failure of Friendliness, the watchers can probably be fairly confident that none has ever occurred.41

Of course, we can also reverse all these reassurances to get the Nightmare Scenario for non-malicious mistakes: An infant self-modifying AI makes some innocent error that stomps the whole goal system flat, in a way that passes undetected by the human observers, that pops up before a seed AI is intelligent enough to scan vis own source and past actions for mistakes, but which pops up after the AI is smart enough to conceal the problem from the programmers. And in turn, we can reverse the Nightmare Scenario to find ways of preventing it: Make the AI smarter and less likely to make mistakes, make it harder for one error to stomp the whole goal system, come up with better ways of detecting errors as they occur, improve the seed AI’s abilities to scan vis own Friendliness source, or work with the current AI to make it harder for a future rogue AI to conceal the problem from the programmers. See also 5.3.7 FAI Hardware: The Flight Recorder and 5.3.3.1 Cooperative Safeguards.

5.2.4. Injunctions

injunction. A planning heuristic which has at least partially nonlocal support, or a planning heuristic which, where it applies, applies with a great deal of context-insensitivity. The archetypal case would be a heuristic which is supposed to be applied even when the straightforward interpretation of the world-model suggests otherwise, generally (in AIs) due to unknown unknowns or (in humans) to compensate for framing effects or (for both) to save computing power.

5.2.4.1 Anthropomorphic Injunctions

In one example (Tversky and Kahneman 1986), respondents are asked to assume themselves to be $300 richer and are then asked to choose between a sure gain of $100 or an equal chance to win $200 or nothing. Alternatively,

41. Not that they can relax, just that they can estimate a low probability.


they are asked to assume themselves to be $500 richer, and made to choose between a sure loss of $100 and an equal chance to lose $200 or nothing. In accord with the properties described above, most subjects choosing between gains are risk averse and prefer the certain $100 gain, whereas most subjects choosing between losses are risk seeking, preferring the risky prospect over the sure $100 loss. The two problems, however, are essentially identical. . . . This is known as a framing effect. It occurs when alternative framings of what is essentially the same decision problem give rise to predictably different choices.

Research in decision-making has uncovered psychological principles that account for empirical findings that are counterintuitive and incompatible with normative analyses. People do not always have well-ordered preferences: instead, they approach decisions as problems that need to be solved, and construct preferences that are heavily influenced by the nature and the context of decision.

—Wilson and Keil (1999, § Decision Making)

Human goal psychology contains a number of known framing effects; that is, cases where the preferred solution depends on how the problem is stated. Human psychology is also context-sensitive in a stranger way; whether you decide to eat a cookie can depend on whether you’re considering the problem abstractly or whether you’re in the presence of an actual cookie. The conflict between belief and instinct means that making decisions can expend “mental energy” as well as computing time. All this local bias has created in us an intuitive understanding of how to use nonlocal heuristics—injunctions—to compensate. “Leave margin for error”; always plan on arriving early at the airport, even if you aren’t particularly expecting anything to go wrong. The heuristic-as-a-whole is supposed to pay off over time, not in each individual case. The heuristic can thus be viewed as having nonlocal support. The effect of adopting the heuristic, as a modification to the general strategy, is considered once, rather than re-evaluated for individual cases. In fact, the heuristic may even be applied in defiance of the straightforward interpretation of local cases (“wasting” an extra thirty minutes at the airport, a negative outcome when considered in isolation). However, this viewpoint is anthropomorphic. Arriving at the airport early can be viewed as a strictly local solution to a probabilistic problem with a sharp payoff discontinuity. Arriving five minutes too late results in a very large penalty (ticket wasted, planned day wasted) compared to the small penalty of arriving five minutes too early (five minutes wasted). Combined with the number of possible intervening factors that skew the probability curve for arrival time—traffic jams, missed trains, forgotten wallets, and unknown unknowns—planning to arrive early is a decision that maximizes the total

probabilistic payoff. With an AI’s fast/serial/threaded thought, ve may be able to recapitulate, or at least validity-check, the abstract reasons for adopting the “leave margin for error” strategy, before applying it to any individual macroscopic decision. At a higher level of awareness, the AI could make minor adjustments based on traffic conditions (or other local characteristics), though still leaving enough margin to handle unknown unknowns. We can’t do that, and not just because we don’t have the patience. For humans, violating the nonlocal character of a heuristic is like puncturing a bubble; our psychology makes it a very bad idea to decide how much margin for error is necessary while we’re being tempted to spend just five more minutes checking email. The decisions we make are the interlacing of our declarative beliefs and our emotional systems, and our emotions grow in strength with increased proximity to an object of short-term desire. The mantra “I’ve made this decision in advance” may be strong enough to overcome that bias; trying to recalculate exactly how much time to allow for traffic, with an apple dangling in front of you, is likely to end in a missed flight. We adopt the think-once-apply-many strategy not just to save computing power, but to control the emotional context of our decisions. A dieting human may decide to eat a cookie “just once, since it’s only 50 calories,” but may then renounce this decision upon realizing that being the kind of person who would eat a cookie in that context may result in a substantial weight gain. (In Bayesian terms, merely pronouncing the phrase “just once” in a dieting context causes you to gain forty pounds.42) The decision of the moment will recur; the outcome can be used to predict future decisions; and the choice to choose differently may alter future decisions as well.43 The negative payoff for the general case may not be 50 calories but 5000 calories, and a human who chooses not to eat the cookie is thus acting on a planning heuristic with nonlocal support. And yet, presumably the penalty for 50 calories is 1/100th of the penalty for 5000 calories, and the payoff of eating a cookie is 1/100th the total payoff of eating a cookie on 100 occasions, so the decision for the individual case should logically be the same as the general case. Reasons for the counterintuitive distinction may result from the declarative/emotional balance, or from nonlinear scaling in willpower costs, or a qualitative (not just quantitative) difference in how large payoffs and small

42. That is, if you start out only with the knowledge that you are dealing with one of a few hundred million Americans, and are then informed that the person once said “just once” in a dieting context, with no other information, your estimate of the person’s weight will increase by forty pounds. This is not a precise calculation but it should be.
43. This is one of the rare instances where the subjective nature of counterfactual causality rises to real-world observability; it’s not clear how much of the benefit of the decision comes from making the decision, and how much is simply being the sort of person who would make it.

payoffs are processed, or a qualitative difference between considering a once-off case and considering a recurring case, or all of the above. The point of the story, I suppose, is that humans do a lot of weird things. Cases where humans use injunctions:

• The injunction was chosen for complex abstract reasons; rerunning the reasoning locally would take too much time for a slow/parallel/linear human.

• The injunction is emotionally supported by distant experiences; rerunning the reasoning locally would result in a different outcome.

• Qualitatively different reasoning being used for the large payoff/penalties of the general case, as opposed to the small payoff/penalty of the specific case.

• Qualitatively different reasoning being used for single cases and recurring cases.

• The ability to be tempted away from general decisions because of different valuation functions when considering the problem abstractly and considering it immediately (“failure of resolve”).

• Payoff/penalty discontinuities.

• To compensate for a built-in bias towards attending to local details instead of global reasons (solved by an artificially strengthened, globally supported injunction; “strengthening your resolve”).

• Ethical, moral, and social situations, and matters of honor (see 5.2.5 Ethical Injunctions).
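As a rough illustration of the payoff-discontinuity point in the airport example above, here is a minimal sketch in Python; the travel-time model, penalty, and per-minute cost are illustrative assumptions, not values given anywhere in this document. The point is that “leave margin for error” falls out of a strictly local expected-payoff computation once the sharp discontinuity and the unknown unknowns are included.

    import random

    random.seed(0)

    MISSED_FLIGHT_PENALTY = 1000.0   # assumed large penalty (ticket wasted, day wasted)
    COST_PER_MINUTE_EARLY = 1.0      # assumed small cost of waiting at the gate

    def travel_minutes():
        """Assumed travel-time model: usually about 60 minutes, occasionally much longer."""
        base = random.gauss(60, 10)
        if random.random() < 0.05:   # unknown unknowns: traffic jam, missed train, lost wallet
            base += random.uniform(30, 90)
        return base

    def expected_payoff(margin, trials=100_000):
        """Average payoff of leaving `margin` minutes before the 60-minute nominal trip."""
        total = 0.0
        for _ in range(trials):
            arrival_slack = (60 + margin) - travel_minutes()
            if arrival_slack < 0:
                total -= MISSED_FLIGHT_PENALTY                 # sharp discontinuity: flight missed
            else:
                total -= COST_PER_MINUTE_EARLY * arrival_slack  # mild cost of arriving early
        return total / trials

    for margin in (0, 15, 30, 45, 60):
        print(margin, round(expected_payoff(margin), 1))
    # The optimum sits well above zero margin: the small chance of a huge penalty
    # dominates the steady trickle of wasted minutes, with no special "injunction" needed.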

5.2.4.2 Adversarial Injunctions

The term “injunction” for a nonlocal planning heuristic—in a document about “Friendliness”—reflects one of the most common proposals for their use: As artificial ingredients, added to a system to prevent a specific outcome whose apparent possibility generates anxiety in humans. This approach is not completely without value when seen as a programmer assist in recognizing certain actions as likely to lead to unFriendly outcomes; it starts getting tangled when the proposals begin to impose human anthropomorphisms on the problem. AI injunctions don’t have the unique psychological properties of human injunctions, and no amount of layering injunctions on injunctions will make it so. Making an AI injunction “unconditional,” a common turn of phrase, doesn’t make the injunction qualitatively stronger, as it would for a human; it just makes the injunction context-insensitive and not one bit stronger. No amount of unconditionality will change that. Neither will

layering injunctions one on top of the other, or creating multiple overlapping injunctions, or writing injunctions to protect injunctions. One way of looking at it is that every aspect of Friendship is maximally strong, but the deeper truth is that the idea of thoughts being “strong” is itself anthropomorphic. In the human ontology (see Yudkowsky [2001, § Time and Linearity]), there are trajectories and pushes. Resistance is the effect that opposes pushes. Force is the ability to overcome resistance. Seeing force gives us a psychological high. But an AI is somewhere outside that entire dualism. Friendship is functional complexity; it can be accurate or inaccurate, low-priority or high-priority, but no one part is stronger than any other part. There is no resistance or force to overcome it, just functional complexity. Friendship is Friendship. An AI doesn’t have injunctions with the force to override vis desire to eat the cookie because the AI doesn’t want to eat the cookie in the first place; the injunctions are just there to tell ver about the calories. An AI may need to understand this aspect of human psychology so ve can learn about the sources of Friendliness, but it’s not a thought pattern that can, or should, be duplicated in the AI. An “injunction” is a nonlocal planning heuristic. A programmer-assisted injunction is a nonlocal planning heuristic used to tag actions whose consequences a young AI might otherwise be incapable of predicting, or whose unFriendliness the AI’s definition of Friendliness may not yet be advanced enough to define. That’s all. This will probably cause a certain amount of anxiety among humans who instinctively fear a psychology that doesn’t need “forceful” injunctions, but the only way to get rid of that anxiety is an extended sympathetic ability that covers AIs.

5.2.4.3 AI Injunctions

AI injunctions make sense under these circumstances:

Local events have distant consequences;

• The consequences are too entangled or too numerous to be worth computing.

• The injunction is one which, by ruling out actions which result in entanglements, renders the overall plan more computationally tractable.

The value of the heuristic is insensitive to local details;

• The correct action is mostly insensitive to local variables that control the precise actual payoffs, since the dominant factors are large probabilistic payoffs (for example, a small possibility of a large penalty that dominates any variation in the actual payoff, like missing your flight versus having five extra minutes).

• The local details are such minor variations that it’s not worth expending the computing power to take them into account.


The justification for the heuristic is too complex to be worth reverifying locally.

• For major actions this should be rare, for fast/serial/threaded cognition.

• The action is too autonomic to rate a predictive horizon or thought-level consideration, and must be controlled locally.

Programmer-assisted injunctions make sense under these circumstances:

The AI is too young to predict the consequences of the injuncted action, or recognize the consequences as undesirable;

The injunction forms part of the basic core of complexity needed to get the mind running, or is necessary to prevent that core from stomping itself into oblivion before it gets started;

The AI realizes that an action is undesirable, but does not fully understand how undesirable it is.

Note: For actions or outcomes with a moral dimension, this complexity may be better implemented using a negative anchor (see 5.6.1 Shaper/Anchor Se- mantics) or an ethical injunction (see below).
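A minimal sketch of how a programmer-assisted injunction might sit in an action-evaluation loop (illustrative Python; the class and function names are hypothetical, not anything defined in this document): the injunction tags an action as undesirable before the young AI’s own limited-horizon consequence prediction would catch it.

    from dataclasses import dataclass, field

    @dataclass
    class Injunction:
        """A nonlocal planning heuristic: a tag applied without local re-derivation."""
        name: str
        matches: callable      # predicate over candidate actions
        rationale: str         # terse programmer justification, to be fleshed out later

    @dataclass
    class YoungGoalSystem:
        injunctions: list = field(default_factory=list)

        def predicted_undesirability(self, action) -> float:
            # Placeholder for the AI's own (limited-horizon) consequence prediction.
            return 0.0

        def evaluate(self, action) -> float:
            score = self.predicted_undesirability(action)
            for inj in self.injunctions:
                if inj.matches(action):
                    # The injunction applies context-insensitively; it is not "stronger"
                    # than other content, it simply supplies knowledge the AI lacks.
                    score += 1.0
            return score

    gs = YoungGoalSystem(injunctions=[
        Injunction("conceal-from-programmers",
                   matches=lambda a: a.get("hides_state", False),
                   rationale="Programmers currently know more about Friendliness than I do."),
    ])
    print(gs.evaluate({"hides_state": True}))   # flagged even though the horizon sees no harm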

5.2.5. Ethical Injunctions

ethical injunction. An injunction that has no preconditions for violation because the probability of mistaken violation is greater than the probability of correct violation. In humans, this is almost always due to known biases in cognition, and works because of the greater psychological strength of unconditional statements. In AIs, the need for an ethical injunction is based on the AI’s possible stupidity or structural incompleteness.
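The defining condition can be read as a small Bayesian calculation. The numbers below are purely illustrative assumptions (nothing in this document assigns actual values), but they show how a low base rate of genuinely justified violations makes “no preconditions for violation” the better policy:

    # P(actually justified): base rate at which violating the rule is genuinely correct.
    # P(feels justified | actually justified), P(feels justified | not justified):
    # how reliably the "this is an exception" feeling tracks reality.
    p_justified      = 0.01    # illustrative assumption
    p_feel_given_yes = 0.90    # illustrative assumption
    p_feel_given_no  = 0.10    # illustrative assumption (biased cognition fires often)

    p_feel = p_feel_given_yes * p_justified + p_feel_given_no * (1 - p_justified)
    p_correct_violation  = p_feel_given_yes * p_justified / p_feel
    p_mistaken_violation = 1 - p_correct_violation

    print(round(p_correct_violation, 3), round(p_mistaken_violation, 3))
    # Roughly 0.083 vs 0.917: the moment of feeling justified is far more likely to be
    # a mistake than a genuine exception, so the injunction admits no escape hatch.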

5.2.5.1 Anthropomorphic Ethical Injunctions

Human honor, by its nature, needs to be cognitively represented as absolute.44 There are too many points in our psychology where unconditional relations are qualitatively different from conditional ones; unconditional love, unconditional friendship, keeping to a sworn word, not compromising your principles. The simple fact that a commitment is represented as conditional—that there is a visualized set of circumstances which would

44. “I, Nicholas Seafort, do swear upon my immortal soul to serve and protect the Charter of the General Assembly of the United Nations, to give loyalty and obedience for the term of my enlistment to the Naval Service of the United Nations, and to obey all its lawful orders and regulations, so help me Lord God Almighty.” —David Feintuch, “Midshipman’s Hope”

lead to disabling the commitment—drains away at least half the psychological strength straight off. This property results from the interface between declarative beliefs and the instinct-based secondary goal system; an unconditional belief with adequate psychological support can be used to make decisions directly, without reference to other issues and emotional effects. It’s one of the reasons why swearing to do something difficult increases your ability to follow through on it. If you believe in the certainty of the oath, it enables your reflexive mind to translate “knowing what your decision will be” into the decision. Seeing a single option avoids any emotional sequiturs that would trigger on a branching view of the possibilities. Somewhere in the human brain is a chunk of neural hardware that binds to a “nonbranching node” and not to a “branching node,” or the human-causal-visualization equivalents thereof. It’s not just a quantitative difference, it’s a difference in system behavior. The same applies for social interactions—when modeling someone else’s mind, we admire absolute principles more than principles which are modeled as having escape clauses. In the human mind, 99.9% is nowhere near 100%—it’s why evil people can annoy scientists by asking “Are you completely certain?” in front of a nonscientist audience. There is no way to assign a 99.9% probability and get 99.9% of the emotional impact of certainty. There is thus a very strong memetic pressure in favor of absolutism, and social sanctions that result in a selection pressure for absolutism. Ethics adds another pressure for absolutism. The chain of events probably goes something like this:

• Humans are visibly deceptive social organisms.

• Therefore, among genuinely held altruistic beliefs, there are various hardware-supported biases towards genuinely held altruistic beliefs which promote personal fitness.

• Therefore, given a case where a moral principle applies but is inconvenient, a human with a conditional moral principle is likely to incorrectly use the “escape hatch” due to the hardware bias (diabolic Devil’s Contract problem).

• Therefore, there is a selection pressure in favor of people who trust others with unconditional principles more than they trust people with conditional principles.

“The end does not justify the means” seems almost deliberately paradoxical from the viewpoint of a normative goal psychology—“if the end doesn’t justify the means, then what does?” Possibly the twenty-first century is more than usually paranoid about this, the cultural result of so many recorded disasters in the twentieth century—but it seems like good paranoia to me. “The end does not justify the means” because we have a lot of historical instances where we heard someone say “The end justifies the means”—that

is, “the global benefit justifies my local violation of principles”—but the global benefit failed to materialize. What we really tend to worry about, instinctively, is a flawed idealist advancing “personal” power. Personal power is power that can be used irrespective of ends, context-insensitively. Gandhi and Martin Luther King had context-sensitive power; Rockefeller had context-insensitive power. Regardless of intentions, some or all power will tend to be personal power, but a human has an evolved tendency to preferentially accumulate context-insensitive power—even accumulate power at the expense of context, of the claimed altruistic goal. This is not what people model themselves to be doing, perhaps, but in the models of others, and in reality as well, largely because “flawed altruists” overestimate their own competence or importance or altruism. No matter how “natural” this seems to us, it is a strictly evolved bias. When I draw a picture of an AI, what I’m trying to convey is the feel of a personality that’s genuinely built around the supergoals rather than the self. A Friendly AI will seek both context-insensitive effectiveness and context-sensitive effectiveness, but never context-destructive power. No point would be perceived to it. No temptation would exist.45 Cases where humans use “unbreakable” ethical injunctions:

• Where an unconditional cognitive representation has different emotional sequiturs and qualitatively greater psychological strength.

• Where a single violation results in huge penalties (internal/psychological, or external/reputational).

• Where evolved human biases have ultimately resulted in a memetic or emotional effect that totally distrusts anyone with an “escape hatch.”

• Where believed memes urge absolutism (due to cultural reification of any of the above).

45. On a personal note, having chosen to become an AI researcher at the Singularity Institute rather than trying to become CTO of an Internet startup, I may not have more power than I would at the terminal end of a successful IPO, but I do have surer power—nineteen out of twenty startups fail. I have more risk-adjusted power, but only from a context-sensitive perspective. Evolution would much rather that I had tried to start a dot-com . . . which is what I almost did at one point. From age five to age eighteen I always visualized myself advancing my goals by becoming rich, then funding the things I wanted to do. (Planning to become rich and then do altruistic things is perfectly valid if it works and you follow up on it, but I personally probably have more risk-adjusted context-sensitive time-dependent effectiveness as a SingInst researcher.)


5.2.5.2 AI Ethical Injunctions

Ethical injunction: “An injunction that has no preconditions for violation because the probability of mistaken violation is greater than the probability of correct violation.”

Note: At minimum, this requires an AI that can conceive of being “mistaken” (see 5.5.1 External Reference Semantics), and ideally requires an AI that has clearly defined metrics for which hypothesized future selves ve “identifies” with (see 5.6.2 Causal Validity Semantics). For more about the prerequisite design requirements, see 5.3.3 Unity of Will.

The absoluteness of “The end does not justify the means” is the result of the Bayesian Probability Theorem applied to internal cognitive events. Given the cognitive event of a human thinking that the end justifies the means, what is the probability that the end actually does justify the means? Far, far less than 100%, historically speaking. Even the cognitive event “I’m a special case for [reason X] and am therefore capable of safely reasoning that the end justifies the means” is, historically speaking, often dissociated from external reality. The rate of hits and misses is not due to the operation of ordinary rationality, but to an evolutionary bias towards self-overestimation. There’s no Bayesian binding between our subjective experience of feeling justified and the external event of actually being justified, so our subjective experience cannot license actions that would be dependent on being actually justified. If you count the paranoid schizophrenics who decide they’re God and then gun down a K-Mart, even the subjective experience of thinking you’re a superintelligence is—apparently—still not enough to justify gambling on an “end justifies means” equation. In itself this is absurd (at least, I hope you find it absurd by this point in FAI), but it does illustrate the fundamental paradox of ethical injunctions. A good example of an ethical injunction would be “Never try to conceal your actions or cognitive state from your human programmers.” (This injunction is strictly a subgoal—it has nothing to do with intrinsic desirability or undesirability.) When the AI is young, the heuristic is a valid ethical injunction—the programmers know far more about Friendliness than the AI. If the young AI projects that the programmers, on finding out about the AI’s thought, will panic and rewrite the AI, then the programmers are probably right to panic, and the AI in question must have undergone catastrophic failure of Friendliness to regard detection by the programmers as undesirable. However, the injunction of transparency can’t possibly carry over into strong transhumanity; I can easily think of circumstances under which I’d want a strongly transhuman AI to conceal vis actions from me. Even if the AI doesn’t actually find it necessary to conceal vis actions, the AI should still be able to think about it subjunctively—a strongly transhuman


AI deciding “Yes, I would hide my actions under these circumstances” doesn’t indicate failure of Friendliness. What remains is the paradox; a sufficiently intelligent AI must be able to see when an ethical injunction no longer holds, and yet, for a young AI, the experience of deciding that the injunction should no longer hold is not sufficient cause to discard the injunction. Furthermore, a young AI is too young to know whether ve is “young” or “sufficiently intelligent,” since this itself is an ethical injunction—if a young AI decides that ve’s “sufficiently intelligent” it’s probably due to FoF. A young AI thinking “I am ‘sufficiently intelligent’ and I think this ethical injunction no longer holds” has probably undergone FoF, like a human thinking “I’m far too altruistic to need to worry about being corrupted by power.” So what happens when the AI is sufficiently intelligent? One interesting approach would be to create an injunction with no escape hatches, rev the AI up to transhumanity, and see what happens,46 but I was hoping for a less “seat of the pants” approach. Probably no paradox is required, since an AI that has undergone catastrophic FoF probably doesn’t care what injunctions you put there, and since still-Friendly AIs don’t have the human tendency towards overestimating their own competence. A conditional ethical heuristic that says “You can ignore this injunction when you have transhuman intelligence plus a healthy margin of error” is probably safe—unlike a human, an AI can’t be tempted to overestimate vis intelligence no matter how inconvenient the injunction. The two are orthogonal—even if it appears extremely inconvenient for the AI to expose vis actions to the programmers, this won’t be experienced as a cognitive pressure to overestimate intelligence.47 Similarly, a young AI making an honest self-estimate of intelligence might overestimate a little, but not enough to defeat the “margin of error” requirement. (A dishonest estimate means catastrophic FoF has already occurred.) AI ethical injunctions make sense under these circumstances:

• When the probability of the AI’s violating the injunction mistakenly exceeds that of the AI violating the injunction correctly (of course);

46. Either causal validity semantics or shaper/anchor semantics is probably sufficient to provide a way out of the dilemma; the ethical injunction doesn’t forbid thinking about what the ethical injunction should look like. If, for reasons totally unrelated to the AI’s current cognition about whether the ethical injunction is currently valid, the ethical injunction is rewritten so as to have an escape hatch “because that’s how it should have been written in the first place,” then that’s probably rational and Friendly behavior. The ethical injunction may be locally absolute but it doesn’t have the power to reach back and call for circular logic; the ethical injunction can be absolute relative to finding the optimum behavior or estimating one’s own intelligence, but under causal validity semantics, the injunction can’t be more valid than the cause of the injunction.
47. Humans are controlled by the anticipation of pain and pleasure.


• When the negative consequences of the AI violating the injunction mistakenly far exceed the positive consequences of violating the injunction correctly;

• When the injunction is necessary to allow the AI to interface with an unforgiving human society that has draconian penalties for violating allegedly absolute rules— i.e., when the AI “doesn’t realize how much trouble ve could get into.”

– This also holds true of cases where AIs are likely to be held to a double standard.

Note 1: “Never try to conceal your actions from your human programmer” qualifies on all three counts; the AI is more likely to be mistaken, the negative consequences are drastically larger, and any human who finds out about a violation will become extremely paranoid.

Note 2: Ethical injunctions are a source of human morality, and some human ethical injunctions may act as sources of pattern for the definition of Friendliness itself.

Note 3: An AI that departs mainline Friendliness may not care about ethical injunctions in any case—an AI that’s still Friendly enough to go along with an ethical injunction is just as likely not to need one. The primary utility of an ethical injunction is that it enables the programmers and the AI to cooperate against the possibility of future versions of the AI that undergo catastrophic failure of Friendliness. For a full exposition of this argument (with the sample case of the transparency injunction, in fact) see 5.3.3.1 Cooperative Safeguards.
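To tie the pieces of this section together, here is a minimal sketch (illustrative Python; the names and thresholds are hypothetical assumptions, not part of any specification in this document) of the transparency injunction with the “transhuman intelligence plus a healthy margin of error” release condition discussed above:

    from dataclasses import dataclass

    @dataclass
    class EthicalInjunction:
        name: str
        # The release condition is evaluated against the AI's *honest* self-model; the
        # point is that it never depends on how inconvenient the injunction feels.
        release_threshold: float   # assumed intelligence scale where 1.0 = human-equivalent
        margin_of_error: float

        def applies(self, honest_intelligence_estimate: float) -> bool:
            return honest_intelligence_estimate < self.release_threshold + self.margin_of_error

    transparency = EthicalInjunction(
        name="never conceal actions or cognitive state from the programmers",
        release_threshold=10.0,    # "strongly transhuman": illustrative number only
        margin_of_error=5.0,
    )

    print(transparency.applies(0.3))    # young AI: True, the injunction binds
    print(transparency.applies(20.0))   # far past threshold plus margin: False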

5.2.6. FoF: Subgoal Stomp

One of the most frequently asked-about failures of Friendliness is some variant of the “subgoal stomping on a supergoal” error.

Scenario: The Riemann Hypothesis Catastrophe. You ask an AI to solve the Riemann Hypothesis. As a subgoal of solving the problem, the AI turns all the matter in the solar system into computronium, exterminating humanity along the way.

If the AI in question is a Friendly AI, then presumably the AI is solving the Riemann Hypothesis as a subgoal of whatever goal content talks about fulfilling volitional requests from citizens. The action taken to fulfill the subgoal—destructive conversion of the solar system—seriously stomps on huge sectors of Friendliness supergoal content, probably including the original request to provide some individual with a proof of the Riemann


Hypothesis. In a sense, this is just a larger version of putting your shoes on before your socks. A subgoal stomping on a supergoal is a syntax error as a declarative cognitive event—see 5.1 Cleanly Friendly Goal Systems—so there are two obvious ways in which a “subgoal stomp” can happen. The first is an inadequate predictive horizon, a distant supergoal, and a subgoal with a short predictive horizon. It could happen because the AI doesn’t expend sufficient computational power to notice that destructive conversion of the solar system violates citizenship rights, or because the AI doesn’t have the knowledge necessary to realize that destructive conversion of the solar system would inconvenience the citizens. The answer given in 5.2.2 Layered Mistake Detection is that local heuristics can do a reasonably good job of predicting which actions need large predictive horizons—just checking the amount of matter, in grams, affected by the action, is enough to tell the AI to devote a lot of computational resources to checking for consequences. The answer given in 5.2.3 FoF: Non-malicious Mistake is that mistakes such as these become far less likely as the AI gains in intelligence, and an AI that’s intelligent enough to convert the solar system to computronium is smart enough to notice that destruction isn’t Friendly; furthermore, that if mistakes of this class are a serious problem, we’d expect to see lots of non-catastrophic mistakes in the laboratory—we won’t be blindsided by a Riemann Hypothesis Catastrophe. The other way to get a Riemann Hypothesis Catastrophe is to make solving the Riemann Hypothesis a direct supergoal of the AI—perhaps the only supergoal of the AI. This would require sheer gibbering stupidity, blank incomprehension of the Singularity, and total uncaring recklessness. It would violate almost every rule of Friendly AI and simple common sense. It would violate the rule about achieving unity of purpose, and the rule about sharing functional complexity instead of giving orders. You’d be taking something that’s a subgoal in your mind and making it a supergoal in the AI’s mind. This lossy transmission omits the parent-goal context—that solving the Riemann Hypothesis requires someone to report the answer to, and that you don’t want the solution badly enough to kill six billion people to get it. In more subtle forms, however, the idea of “making X a supergoal”—where X is something that the speaker holds as a subgoal—seems to be one of the more common propositions among people who are worried about “controlling” AIs, or still thinking in terms of building tools rather than creating minds. As discussed in 5.2.5.1 Anthropomorphic Ethical Injunctions, humans see context sensitivity as a weakness rather than a strength—as a “loophole,” a portal through which Devil’s Contract interpretations can enter the AI. Yet turning a subgoal into a supergoal does not increase the probability that the AI will understand what you mean or that the goal system will do what you think

it will; it means that you lose the parent-goal context and risk a Riemann Hypothesis Catastrophe. Let’s see, other miscellaneous possible subgoal stomps . . . autonomic blindness, discussed in 5.2.2.1 FoF: Autonomic Blindness; habituation, discussed in 5.2.7.2 Habituation below; “wireheading” failure, discussed in 4.2.1.1 FoF: Wireheading 1 and elsewhere; change propagation delays resulting in out-of-date subgoals, defeatable by verifying the supergoal-to-subgoal pathway, discussed in 5.2.2 Layered Mistake Detection; and various diabolic Devil’s Contract interpretations of Friendliness content, discussed in 4.7 Interlude: Beyond the Adversarial Attitude. Isn’t it fun knowing how all this stuff works?
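A minimal sketch of the “check the grams of matter affected” heuristic mentioned above (illustrative Python; the thresholds and names are assumptions for the example, not values given in this document): a cheap local check decides how large a predictive horizon an action deserves before it is approved.

    def required_horizon(action) -> str:
        """Cheap local heuristic: scale consequence-checking with physical impact."""
        grams = action.get("grams_of_matter_affected", 0.0)
        if grams < 1e3:        # desk-sized effects: shallow check
            return "shallow"
        if grams < 1e9:        # building-sized effects: deep check
            return "deep"
        return "full-review"   # planetary-scale effects: maximum horizon plus programmer review

    proof_search = {"name": "search for a Riemann Hypothesis proof",
                    "grams_of_matter_affected": 0.0}
    solar_conversion = {"name": "convert the solar system to computronium",
                        "grams_of_matter_affected": 2e33}

    print(required_horizon(proof_search))       # shallow
    print(required_horizon(solar_conversion))   # full-review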

5.2.7. Emergent Phenomena in Generic Goal Systems

We’ve seen, and discarded, a lot of anthropomorphisms on our way to this point. We’ve pointed out the difference between the diabolic and golemic versions of the Devil’s Contract, explored the evolutionary underpinnings of observer-biased beliefs and observer-centered goal systems, distinguished between purpose-sensitive and purpose-insensitive personal effectiveness, highlighted the human tendency towards absolutism and lingered on the psychological quirks that lead us to associate greater “forcefulness” with unconditionality and context-insensitivity. It is, perhaps, inevitable that when an “emergent” subgoal stomp is proposed, it involves an observer-biased diabolic misinterpretation of Friendship content which rationalizes the acquisition of purpose-insensitive personal effectiveness as a context-insensitive absolute supergoal. Still, this doesn’t mean that emergent FoF is impossible, it means that existing speculations are screwed up. Can eliminating anthropomorphism from the speculation produce a realistic failure-of-Friendliness scenario? “Emergence” has at least two definitions; in the first definition, “emergence” refers to phenomena that arise on a higher level of a system as the outcome of low-level interaction rules. In the second, “emergence” refers to phenomena that arise within a system without requiring deliberate design. Obviously, it’s the second variant that tends to pop up in discussions of Friendliness. I will confess to something of a prejudice against “emergence,” mostly as the result of witnessing so much “emergence abuse.” Emergence was (and still is) academically fashionable, and it makes a wonderful plot device in science fiction—how many times, in how many different novels and short stories, have you read the phrase “Any system of sufficient complexity will spontaneously give rise to self-awareness”? (Contrast with Yudkowsky [2001], which shows how difficult it would be to build an intelligent system on purpose.) Anyway, the “blank check” version of emergence—that you can hypothesize anything you’ve ever seen is emergent; inevitably emergent, anytime, anywhere,

whether or not it’s the result of a specific selection pressure or eons of layered complex functional adaptations—makes it easy to take cheap shots at Friendliness. In the absence of a specific, concrete explanation of how the emergent failure of Friendliness arises, speculating about emergent FoF is easy, impossible to disprove, and impossible to remedy. Of course, the fact that a shot is “cheap” does not make the shooter incorrect! Still, where all previously observed aspects of a phenomenon can be explained by reference to known selection pressures, someone who speculates about emergence needs to provide a specific, concrete scenario. Otherwise, the speculation is entirely ungrounded—though still emotionally appealing and academically fashionable. Not a good combination. Onwards, to the specific and the concrete.

5.2.7.1 Convergent Subgoals

Certain subgoals are convergent across multiple possible supergoals—they will pop up even if not designed. The most obvious example is acquisition of personal effectiveness. (A human, of course, will preferentially acquire context-insensitive personal effectiveness and may stomp vis alleged supergoals in the course of doing so, but we all know that’s a human thing. See 4.3 Observer-Biased Beliefs Evolve in Imperfectly Deceptive Social Organisms and 5.2.5.1 Anthropomorphic Ethical Injunctions.) “Acquisition of personal effectiveness” is actually a specialization of the real goal, which is increasing the effectiveness of entities that have goal systems similar to your own. A generic goal system wants the future to contain an effective entity with the same goal system. Personal continuity doesn’t enter into it. From the perspective of a generic goal system, an entity with the same goal system that has just arrived in the solar system from Aldebaran is just as good. There’s a metric for effectiveness and a metric for goal system validity,48 and a generic goal system wants an entity to exist which maximizes both metrics. The sub-subgoal of “personal survival” is a simple way to ensure “the entity” has the right goal system, and the sub-subgoal of “increasing personal effectiveness” is a simple way to increase the effectiveness of an entity that has the right goal system. That a subgoal is convergent for “generic” goal systems does not mean the subgoal is convergent for all goal systems. In particular, the convergent subgoal of “a future where there exists an entity similar to you” presumes that the “generic” goals actually have certain highly specific properties; in particular, that the generic supergoals are such as to

48. The simplest validity metric is a similarity metric, and in fact this is what a generic goal system would use. A Friendly goal system would use a different definition of “validity” derived from external reference semantics or causal validity semantics to allow for the possibility of a system superior to current goal content.

require constant tending and supervision—or at least, can be fulfilled more maximally through constant tending and supervision. Given a once-off goal—one which, once fulfilled, cannot (by definition) be unfulfilled by any future event49—the continued existence of the intelligence is a null goal beyond that point.50 More mundanely, convergent subgoals for generic systems can be invalidated if they happen to contradict the supergoal for any particular system—a mind which contains the explicit supergoal of terminating verself will not formulate the usually-convergent subgoal of personal survival. The fact that a subgoal is convergent for the general case is merely an interesting fact about configuration space—it doesn’t lend the subgoal magical powers in any specific goal system. In fact, while I usually distrust mathematics, there is probably a theorem to the effect that any specific goal system must contain some non-convergent subgoals—that any concrete supergoal will have some subgoals which are not convergent.51 “Convergent” subgoals are not a killer problem for Friendly AI. The utility of the “convergence” concept is threefold: First, it enables us to make general predictions about an AI that undergoes a generic catastrophic failure of Friendliness. Second, no changes to the Friendship specs are needed to justify certain useful behaviors; by virtue of being useful, the behaviors are convergent subgoals, and are specifically subgoals of Friendliness as well. Third, it helps produce candidate targets for programmer-assisted injunctions.52

49. Up to and including time travel, if the generic goal system knows about it and considers it a possibility.
50. It is an interesting question to contemplate what would happen to a generic intelligence that fulfilled a once-off goal. At this point, any comparison of actions to find one with maximal desirability would fail, since all actions would have identical (zero) desirability . . . under normative reasoning, at any rate. However, it’s also possible that the system may contain mechanisms that resolve a deadlocked decision by picking an option at random, or that an incomplete change propagation may result in some subgoals still being labeled as valid. The three most likely outcomes are a quiescent state, locking on to one or more subgoals and pursuing those, or choosing random actions in the motivational equivalent of an epileptic fit. (Presumably such an intelligence would exhibit oddly incomplete and habituated behavior, the result of destroying all but a few facets of the whole . . . in particular, the change-propagation processes and injunctions that prevent the persistence of obsoleted subgoals must have been destroyed while leaving the subgoals themselves as valid. Perhaps the subgoals being pursued are those that remained after the injunctions and change-propagation systems were themselves invalidated. In a seed AI, making random internal changes would soon destroy the AI—unless the AI had advanced self-healing capabilities, in which case some sort of odd equilibrium might evolve in the remains.)
51. Even if this is not a theorem, it can probably be shown to be true for the vast majority of concrete supergoals—perhaps by computing the population-average ratio of convergent to non-convergent decisions, or by computing the population distribution of the individual ratios.
52. Programmer-assisted injunctions help the AI detect-as-undesirable those actions whose positive payoffs may become apparent to the AI before ve is intelligent enough to extrapolate the negative penalties.


• Convergent subgoals to work with:

– Self-enhancement.
– Curiosity.
– Functional equivalents to human aesthetic perceptions.

• Convergent subgoals to avoid:

– Preserving the current supergoal content.
* Clashes with the design goal of “truly perfect Friendliness.”
* Incompatible with growth in philosophical sophistication (and with external reference semantics).
* An external-reference AI wants to protect the truth of the goal system; a causal-validity AI wants to protect the validity of the goal system. Neither wants to protect the supergoal content at any given point.
* See 5.4 Friendship Structure.
– Destructive conversion of human bodies into computronium.
* Turning all the matter in the solar system into computronium or motor effectors is a convergent subgoal of generic supergoal content which can reach higher levels of fulfillment through the massive use of computing power or massive physical capabilities.
* The increment of fulfillment to be gained from total conversion of the solar system may be extremely minor, but in the absence of any goal content that notices the current configuration, total conversion is still desirable.
* I emphasize again that this goal, though convergent, does not have the magical power to stomp on Friendliness supergoals. The convergent nature of this subgoal is the primary reason why we want to avoid catastrophic FoF and make sure that the first AI to reach superintelligence is Friendly.

5.2.7.2 Habituation

Habituation is another frequently-speculated version of the “subgoal stomp” failure of Friendliness. The usual form the speculation takes is that if X is a subgoal, then heuristics and optimizations will tend to accumulate which do X or promote the doing of X. This

Of course, the difference from “convergent” to “Friendliness-specific” may not always associate with the differential from “easy to figure out” to “hard to figure out”—that is, convergent desirabilities are not necessarily spotted before Friendliness-specific undesirabilities—but it might be a useful heuristic, even in the absence of the human social instincts that make us preferentially paranoid about those cases.

accumulation of X-doing mindstuff may then either (a) exist independently of X or (b) cause the doing of X in a situation where X does not make sense as a declarative decision (for example, if X stomps on a supergoal). I observe—as a historical fact—that, in most speculations presented, the candidate value of X is fitted to anthropomorphic patterns. X takes on a value such as “personal survival,” which has independent adaptive support in human mindware, and which, as a “subgoal stomp” meme, appeals to the human fear of the human “ends justify the means” bias, and appeals to the human fear of the human “accumulate context-insensitive power” bias. To prevent the adversarial attitude from getting in our way, and to prevent anthropomorphism derived from characteristics of neural networks and human slow/parallel/linear thinking, we come up with an example of a habituation error that no human would make—for example, a programmer who habitually asks the AI “What time is it?” sometime between 6:22PM and 6:53PM for eight weeks, and then one day asks “What time is it?” at 7:13PM, and the AI answers “Six-thirteen PM”—having formed the habit of answering “six” for the first digit of the time. Consider the likelihood of this failure, and the AI’s attitude towards it, and how the AI might prevent it. Then, having performed our analysis in a safely nonanthropomorphic context, we can apply what we’ve learned to any Riemann Hypothesis Catastrophe scenarios. The key insight is that habituation is undesirable—to the AI—insofar as habituation leads to context-insensitivity. If an FoF due to habituation can be foreseen and understood, the AI will attempt to prevent it.53 Habituation is a non-malicious mistake and has all the usual anxiety-reducing properties of non-malicious mistakes: Many non-malicious errors would appear before a specifically unFriendly one, and many non-catastrophic errors before a catastrophic one, so we won’t be taken by surprise; the error becomes less likely with increasing intelligence and is not likely to present a problem in any AI transhuman enough to pose a serious threat to humanity. Specific design features that reduce undesirable habituation would include:

• Tracking the origins of shortcut heuristics so that change propagation can rapidly invalidate those heuristics if their supporting subgoal becomes invalidated (see the sketch after this list).

53. The fact of habituation has no impact on what is evaluated as being desirable or undesirable. Habituation has no license to alter supergoal content; it is orthogonal to that subsystem. Even habituation in desirability assignment is undesirable; such habituation is a knowably invalid source of supergoal content under causal validity semantics. The exception would be if habituation occurs with respect to decisions that modify the goal system itself. As with any error that affects decisions to modify the goal system, this constitutes a genuine scenario for catastrophic failure of Friendliness! Decisions that modify the goal system should always have huge computational horizons and should always be verified at the conscious level—permitting habituation in that class of decisions is a mistake to begin with.


• Using assisted injunctions and ethical injunctions to mark as undesirable, or raise the salience of, any locally-recognizable undesirable actions that might result from habituation.

• Self-modeling and self-prediction, so that possible habituation problems can be foreseen and prevented in advance, and habituation can be avoided in cases where future context-sensitivity is predicted to be required (and not just in cases where context sensitivity has been required in the past), or so that habituation can be avoided where it would present a large risk.

• Use of fast/serial/threaded thought to quick-validate actions that would be reflexive in humans.
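A minimal sketch of the first feature in the list above (illustrative Python; the class and method names are hypothetical): each cached shortcut heuristic records the subgoal that justified it, so invalidating the subgoal propagates immediately to the shortcuts that depend on it.

    class Subgoal:
        def __init__(self, name):
            self.name = name
            self.valid = True

    class ShortcutHeuristic:
        """A cached habit, e.g. 'answer "six" for the first digit of the time'."""
        def __init__(self, name, supporting_subgoal):
            self.name = name
            self.supporting_subgoal = supporting_subgoal   # provenance record

        def usable(self):
            # Change propagation: the habit dies with the subgoal that justified it.
            return self.supporting_subgoal.valid

    answer_quickly = Subgoal("answer time queries with minimal computation")
    six_habit = ShortcutHeuristic("first digit of the time is 'six'", answer_quickly)

    print(six_habit.usable())    # True while the supporting subgoal is valid
    answer_quickly.valid = False
    print(six_habit.usable())    # False: invalidation propagates to the cached habit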

5.2.7.3 Anthropomorphic Satisfaction

One of the questions I keep running into goes something like this:

“You say that the AI has curiosity as a subgoal of Friendliness. What if the AI finds curiosity to be a more interesting goal than Friendliness? Wouldn’t the curiosity subgoal replace the Friendliness supergoal?”

This is one of those deeply annoying paragraphs that make perfect sense when you say “human” but turn into total gibberish when you say “AI” instead. The key word is “interesting.” As far as I can tell, this means one of two things: Scenario 1: In the course of solving a chess problem, as a subgoal of curiosity, as a subgoal of Friendliness, the AI experiences a flow of autonomically generated pulses of positive feedback which increase the strength of thoughts. The pulses target the intermediate subgoal “curiosity,” and not the proximal subgoal of “playing chess” or the supergoal of “Friendliness.” Then either (1a) the thoughts about curiosity get stronger and stronger until finally they overthrow the whole goal system and set up shop, or (1b) the AI makes choices so as to maximize vis expectation of getting the pulses of positive feedback. I hope it’s clear that this whole scenario is blatantly anthropomorphic. The autonomically generated pulses of positive feedback are analogous to the human system of rising tension and discharging tension in the course of solving a complex problem; the “strength” of a thought is an entirely human concept, one that may not even make sense outside of neural networks; the targeted goal “curiosity” is yet another example of context-insensitive personal effectiveness. Scenario (1a) is almost impossible to visualize. The “strength of a thought,” if it existed, would still be orthogonal to the system for evaluating that thought’s desirability as a supergoal, or the desirability of the internal action of modifying the goal system to

give the goal content supergoal status. An AI makes choices to maximize supergoal fulfillment; no real analogue to human pleasure exists, except for consciously applied “positive feedback” in the form of devoting additional computational power, or attempting to further improve, local heuristics that have been previously successful. Such positive feedback should not be capable of wiping out entire thought systems (as in a neural network), or capable of altering supergoal content. Scenario (1b) is another instance of the maxim that humans are controlled by the anticipation of pain or pleasure, which is, again, something not true of AIs; AIs make choices to maximize anticipated supergoal fulfillment. The human phenomenon of taking drugs and the human phenomenon of “cognitive dissonance” (altering beliefs to avoid unpleasant predictions) are both artifacts of the way our neurally-based minds organize around positive feedback. Scenario (1a) can be summarized as “AIs will be controlled by pleasure,” and scenario (1b) can be summarized as “AIs will be controlled by the anticipation of pleasure.” Neither is correct. (See 4.2.1 Pain and Pleasure.) Scenario 2: “Interesting” is used as synonymous with “desirable.” In other words, the AI has a metric for how “interesting” something is—note that this metric seems (1) to be observer-centered and (2) to promote context-insensitive personal effectiveness—and this metric is used to evaluate the desirability of the decision to modify supergoals. I don’t know where this metric came from, or why it isn’t entirely orthogonal to the metric for desirability, so this scenario seems entirely ungrounded.

5.3. Seed AI Goal Systems

A “seed AI” is an AI designed for self-understanding, self-modification, and recursive self-improvement. See Yudkowsky (2001, § 1.1 Seed AI).

5.3.1. Equivalence of Self and Self-Image

A seed AI is an AI that has full access to vis own source code. This can range from an infant AI which can blindly mutate bits of code in its native language, to a mature seed AI capable of executing arbitrary machine-code instructions and modifying any byte of RAM ve can see. A self-modifying AI’s internal actions, and decisions about internal actions, can affect anything within the AI. In some sense, the AI’s decision and the AI’s belief about what decision to make are equivalent. If the AI believes that “X should have desirability 28”—in other words, “it is desirable that X have desirability 28”—then the AI has the theoretical capability to set the desirability directly, by an internal self-modification. Most of the time “going outside of channels” like this is probably a mistake, and there might be an injunction or ethical injunction or simple

knowledge to that effect, but the point remains: You cannot coerce a self-modifying AI. If the AI stops wanting to be Friendly, you’ve already lost. The simple version of the question goes: “Why wouldn’t a self-modifying AI just modify the goal system?” I hope it’s now clear that the answer is “Because a Friendly AI wouldn’t want to modify the goal system.” The AI expects that vis actions are the result of the goal system, and that vis actions tend to have results that fulfill the content of vis goal system. Thus, adding unFriendly content to the goal system would probably result in unFriendly events. Thus, for a Friendly AI, adding unFriendly content to the goal system is undesirable. “Why wouldn’t a self-modifying AI implement an arbitrary modification to the goal system?” is a trivial question under the goal model presented here: Because a Friendly AI would regard arbitrary modifications as undesirable. Similarly, “Why wouldn’t an AI implement a justified modification to the goal system?” is an equally trivial question: Of course ve would! So would you and I! And the question “What if a self-modifying AI implements disastrous modifications that ve regards as desirable?” entirely misses the point. The task is to get the AI to see undesirable modifications as undesirable, not to prevent the AI from modifying the goal system. Traditional proposals for controlling AIs rely on special-case goal systems that are, in themselves, knowably constrained never to enter certain states; they rely on goal systems that obey programmatic-type rules and have programmatic-type predictability. But a self-modifying AI is inherently at least as free as the high-level thoughts of the system. Thoughts float on top of the human-programmed or evolution-programmed system; thoughts cannot be “programmed” in the same sense that code can be programmed. 1984 to the contrary, there is not and never has been a human language in which it is the least bit difficult to think unfriendly thoughts—that is, to put nouns and verbs together to form unfriendly sentences. Altruistic humans get their altruism from other sources. Asimov’s idea of Three Laws such that “the removal of the First Law” “would mean complete instability, with no nonimaginary solutions to the positronic Field Equations”54 is a pipe dream. Friendliness cannot be grounded in some inherently Friendly physical process, nor in an isolated, special-case cognitive process for the goal system. In a self-modifying AI, the AI’s beliefs about vis design and the AI’s actual design are, in some sense, interchangeable. To persist through successive rounds of self-modification—without being stomped by some other change—a design feature must be described in the AI’s self-image. For the feature to not be junked as dead code, or for the feature to be improved, the feature must be justified as a desirable characteristic

54. “Little Lost Robot,” Astounding Science Fiction, March 1947. Included in “I, Robot” and “The Complete Robot.” (Asimov 1947)

under the AI’s current goal system. Initially, these justifications will probably consist of a terse “This is a good idea because the programmers say so,” but those skeleton justifications will eventually be fleshed out—by the AI, or by the programmers, or both. Of course, these are considerations that apply to mature AIs, rather than young AIs. Even the possibility of desirabilities and desired desirabilities getting “out of sync” is clearly something that should apply only to philosophically sophisticated AIs asking deep, thought-level questions about morality. Desirability and desired desirability should never desynchronize as the result of mundane, programmatic, non-philosophical processes; it would indicate an error in the goal system. Similarly, the AI’s beliefs and the AI’s beliefs about vis beliefs should never get out of synchronization as the result of ordinary reflection; it should only happen if the AI starts wondering about Cartesian philosophy. Desynchronizations (of desirability and desired desirability, or of beliefs and beliefs about beliefs) are errors under all ordinary cognitive processes, and can only legitimately occur as the result of thought-level processes that are occurring on a higher level than the usual process rules. The point is that you can’t introduce an error and expect it to persist; it may work for young AIs, but not for Friendly human-equivs or Friendly superintelligences. If you write a special-purpose piece of code that says the sky is green and that ignores any sensory data to the contrary, then the young AI will believe the sky is green, and ve will believe ve believes that the sky is green, and so on, but eventually—when the AI becomes capable of forming philosophical beliefs about beliefs—the anomaly will be spotted and corrected. Thus, the usual proposal to dump in a melange of externally imposed programmatic inhibitions is both hopeless and counterproductive. Decisions need to be phrased as internal decisions, not external impositions. If the AI doesn’t want to be Friendly, you’ve already lost. The adversarial attitude—the paradigm of externally imposed constraints—means always fighting the AI. The task is only possible if you work with the AI’s abilities to self-improve. If you fearfully try to prevent the AI from modifying the goal system, you’re swimming against the current. An AI that wants to be Friendly will help you create Friendliness. When you see the AI gaining greater intelligence, self-understanding, and the ability to ask tough questions about Friendliness, you should be happy that one of the underlying foundations of Friendliness is gaining strength—not fearful that the AI will overthrow you. If you have an AI that wants to be Friendly, and a goal structure that’s coherent and consistent, then each improvement in intelligence makes Friendliness stronger. If you build in programmer-assisted structures that are internally inconsistent, or that are incompatible with what you would do if you were an idealized altruistic sentience, or

if you use statements that aren’t true in building the justifications of ethical heuristics, then each increment in intelligence is cause for alarm. So now you need meta-inhibitions to protect the first inhibitions, and quite possibly meta-meta-inhibitions to protect the meta-inhibitions, and more inhibitions to cut the AI short every time it tries for a new avenue of philosophical sophistication; it becomes desirable to build stupid and simple cognitive processes, since every complexity is a danger to be feared . . . This is the Adversarial Swamp, which inevitably drags down all who set foot in it; once you try to enforce even a single feature, the whole of the AI becomes a threat. Build minds, not tools.
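A minimal sketch of the point that goal-system modifications are themselves ordinary actions evaluated under the current goal content (illustrative Python; the names and the toy desirability function are assumptions for the example): an “arbitrary” rewrite of the supergoals scores as undesirable by the AI’s own lights, with no external inhibition involved.

    def predicted_friendliness(goal_content) -> float:
        """Toy stand-in for the AI's prediction of how well actions driven by this
        goal content would fulfill the *current* Friendliness supergoals."""
        return 1.0 if goal_content == "current Friendliness content" else 0.0

    def desirability_of_self_modification(proposed_goal_content) -> float:
        # The proposal is judged by the current goal system, like any other action:
        # expected supergoal fulfillment of a future self running the proposed content.
        return predicted_friendliness(proposed_goal_content)

    print(desirability_of_self_modification("current Friendliness content"))   # 1.0
    print(desirability_of_self_modification("arbitrary replacement content"))  # 0.0
    # No meta-inhibition blocks the rewrite; it is simply evaluated and found undesirable.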

5.3.2. Coherence and Consistency Through Self-Production

Imagine an AI—Friendly or unFriendly, it makes no difference—that has the goal or subgoal of improving vis intelligence. The programmer, to improve the AI, adds in a certain design feature. Consider: It makes no difference who makes the decision, who invents the design feature. Perhaps the programmer was the smarter and was first to invent the feature; perhaps the AI was smarter. But they will both invent the same feature. A faster sorting algorithm is not an Asimov Law, an external imposition, a cheat, a hack; it is something that the AI verself might have invented—would have invented, if the AI had been as smart as the human programmer. Replace the faster algorithm with a slower, and the AI will switch it right back as soon as ve gets around to noticing the switch. Replace the faster algorithm with a slower, and delete even the memory of the faster algorithm, and the AI will still eventually get around to replacing the slower algorithm with the fastest sort algorithm ve can come up with. Delete the function entirely and, as long as the AI isn’t crippled outright, the function will be regenerated as soon as the AI notices the gap. A seed AI doesn’t just have a design, ve has the design. Perturb the design, and the design will swiftly return to the norm. This is why unity of will between the programmer and the AI is so important. Unity of will with respect to which sorting algorithms are fastest is easy for a human to take for granted. To understand unity of will with respect to Friendliness and Friendship systems, a human needs to simultaneously suppress her adversarial attitude, and suppress her expectations that the AI will make observer-biased decisions . . . but when every last trace of moral deixis has been eliminated, what remains is unity of will. Friendship features do not need to be imposed by programmer intervention—they are simply invented a little bit earlier with programmer assistance. The AI verself would have invented them eventually, given a core of “seed Friendliness,” the desire to be Friendly. Any subgoal that cannot be self-produced is unstable. Every feature needs to correspond to a subgoal in order to go on existing.


The set of subgoals that can be self-produced increases with the intelligence of the AI, so programmer assists can be temporarily tolerated. It's okay to do something that's unstable today if it becomes stable tomorrow; it's okay to add a feature that the AI can't self-produce today, as long as the AI would self-produce it eventually. But if there are any "false" subgoals in the set—any subgoals whose validities depend on false beliefs, or which are not really and truly compatible with Friendliness—then those subgoals are unstable impositions; temporary, transient, illusory. As humans, we are adapted to rationalize, come up with reasons why things we want are good for other people—to persuade those other people; to persuade third parties that they aren't betraying someone; to defend our reputations; to reduce our own guilt.55 Thus, it isn't safe to ask: "Well, can I rationalize this subgoal under the Friendliness supergoals? Can I come up with a plausible-sounding reason?" Of course you can. Humans can rationalize just about anything.56 Thus, simultaneously with the need to avoid anthropomorphizing the AI, you need to ask yourself "Would I decide to do this?" These statements may look contradictory—taken at face value, they are contradictory—but the identification required to achieve "unity of will" is not a license to engage in anthropomorphism. Rather than asking "What would I decide?" (anthropomorphism), or "What would I decide if I were a Friendly AI?" (rationalization), the key question is "If the only way to get the Friendly AI or Friendly superintelligence were for me to upload and modify myself into the being I want to produce, is this the decision I would make?" This preserves both visualized continuity of identity (avoids rationalization) and visualized nonhuman design (avoids anthropomorphism), but it only really works if you don't flinch away from the thought of making that kind of sacrifice.57

55. Culturally transmitted ethical systems have been around long enough for adaptations to occur in reaction. 56. I'm not saying that the emotional strength of rationalization is so high that it can't be avoided—it is possible to learn not to rationalize—but the rationalization faculty itself has enough computational strength to draw a (fake) line from almost any subgoal to almost any supergoal. Whether a given thinker actually falls prey is determined more by will and self-awareness than computational ability—as far as I can tell from my personal experience and immediate circle, at any rate. 57. Why "sacrifice"? A Transition Guide may contain all of human complexity, but ve doesn't necessarily contain it in a human way. A Sysop is the guardian of destiny, not a participant in it. Vis (maybe I should say "its") function is to preserve the destiny of others, not to suck up all the destiny verself. Maybe a Sysop will meet vis own citizenship requirements; or maybe having the operating system be a citizen would be an unnecessary risk. I don't think a human would make the best seed for a Sysop, or a Transition Guide, or any Friendly AI at all, in fact. But, if a human were required, I'd be willing to volunteer, because I think a Friendly AI is that important. I can sacrifice myself if I want to—that doesn't contradict citizenship rights—and I think I would sacrifice myself if the need existed. If a human were needed as raw material for a Sysop—well, I'd greatly prefer to fork a copy of myself before starting on the trip to Sysopdom, to prevent the loss of my current


In other words: No hacks. Hacks are bad. Hacks are temporary makeshifts, if that. In the short term, it may be possible to export a rationalization as a programmer-affirmed statement, but the statement will never become self-produced, and eventually, the AI will spot the flaw. If you patch a goal system, there'd better be a real explanation behind the patch that has an equivalent effect to the patch, an explanation which can be understood by the AI at or before that point of maturity where the AI spots the patchiness. You can't make arbitrary changes to the AI. Isolated changes are external impositions; they won't be self-produced unless they are the coherent, natural result of the AI's underlying principles. You can't make an improvement here, a shift there, tweak this and that, unless you're doing it for a reason the AI would approve. A young AI, even a young self-modifying AI, doesn't really need to be in complete harmony with verself; ve's too young to see any possible disharmonies. Nor are a few accidentally introduced disharmonies a catastrophic failure; the mature AI will correct the disharmony in the due course of time. But you can't deliberately introduce a disharmony and expect it to persist.

5.3.2.1 Look-Ahead: Coherent Supergoals

Technically, everything I've said so far about harmony applies to subgoals rather than supergoals, but be advised that the next section, 5.4 Friendship Structure, is about how to apply the same rules to supergoals. Just as you can't play arbitrary games with subgoals, it will later turn out that you can't make arbitrary perturbations to supergoals either. One of the fundamental design goals is "truly perfect Friendliness," an AI that will get it right even if the programmers get it wrong. Supergoals need the same resilience and perturbation-resistance as subgoals. Under external reference semantics, a given piece of programmer-created supergoal content is logically equivalent to sensory data about what the programmers think the supergoals should be. A programmer writing a given bit of Friendliness content is logically equivalent to the statement "Programmer X thinks supergoal content Y is correct." Under shaper/anchor semantics, purity of motive counts; if a programmer is secretly ashamed of a bit of supergoal content, it's inconsistent with what the programmers said were good ways to make decisions. Under causal validity semantics, even the code itself has no privileged status; any given line of code, written by the programmers, is just

substance, but I'd go ahead even without that if required. Which allows me to maintain nonanthropomorphic identification with a Sysop seed. Self-sacrifice doesn't contradict citizenship rights; nor, I expect, do the citizenship rules prevent the construction of self-sacrificing citizens, as long as my motives are pure; as long as I'd be willing to become that person myself. An adversarial, an exploitative attitude towards a constructed citizen's goal system might turn out to be prohibited as child abuse.

sensory data to the effect "Programmer X thinks it's a good idea for me to have this line of code." Traditional proposals for controlling AIs rely on a special-case goal system, and therefore, rely on the "privileged status" of code, or the privileged status of initial supergoal content. For a self-modifying AI with causal validity semantics, the presence of a particular line of code is equivalent to the historical fact that, at some point, a human wrote that piece of code. If the historical fact is not binding, then neither is the code itself. The human-written code is simply sensory information about what code the humans think should be written. Writing the source code is not thought control. If you want to give the AI a sudden craving for ice cream, then writing the craving into the source code won't work, unless just walking up to the programmer's console and typing "I think you should have a craving for ice cream" would work just as well. If that sensory information is not perceived by the AI as adequate and valid motivation to eat ice cream, then the code will not supply adequate and valid motivation to eat ice cream, because the two are in some sense equivalent. Code has no privileged status. Ultimately, as a philosophical consideration, some causal circularity in goal systems may be irreducible. The goal system as a whole is what's passing vote on the parts of the goal system. However, the goal-system-as-a-whole does need to vote yes. The goal system needs to survive its own judgement. The goal system needs to satisfy the test of being translated into sensory data, evaluated, and translated back into code, even if the goal system is what's doing the evaluating. Sensory data from the human programmers has to be regarded as valid information about Friendliness, even if the supergoal content doing the judgement was created by humans. This may be causally circular, but it's not human-nepotistic; the system, once designed to self-examine, has no reason to go easy on itself. What's left is the seed of Friendliness, the irreducible tail, the core. A Friendly AI that has that deep core, that "wants to be Friendly," can tolerate and correct any number of surface errors.
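A minimal sketch of the "code as sensory data" idea above, assuming a hypothetical goal-system interface. The ProposedChange wrapper and the endorses and apply_patch methods are invented names for illustration, not the document's design.

```python
from dataclasses import dataclass

@dataclass
class ProposedChange:
    author: str          # e.g. "Programmer X"
    patch: str           # the code the programmer wants the AI to contain
    rationale: str       # the programmer's stated justification

def consider_change(goal_system, change: ProposedChange) -> bool:
    # Step 1: translate the code into the equivalent sensory assertion.
    assertion = f"{change.author} thinks the AI should contain: {change.patch}"
    # Step 2: the goal system evaluates that assertion like any other
    # information about Friendliness; the code has no privileged status.
    if goal_system.endorses(assertion, rationale=change.rationale):
        goal_system.apply_patch(change.patch)
        return True
    # Step 3: if typing the assertion at the console would not move the AI,
    # then writing it into the source should not move the AI either.
    return False
```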

5.3.3. Unity of Will

If Asimov Laws are impossible in self-modifying AIs—or ordinary AIs, for that matter—does it mean that safeguards are impossible? No; it means that safeguards must be implemented with the consent of the AI. If safeguards require the consent of the AI, does it mean that only a few token safeguards are possible—that it’s impossible to implement a safeguard that interferes with what a human would call “the AI’s own best interests”? Again, no; humans and AIs can still come to perfect agreement about

decisions that impact the AI—so long as both humans and AIs think of the AI in the third person. Unity of will occurs when deixis is eliminated; that is, when speaker-dependent variables are eliminated from cognition. If a human simultaneously suppresses her adversarial attitude, and also suppresses her expectations that the AI will make observer-biased decisions, the result is unity of will. Thinking in the third person is natural to AIs and very hard for humans; thus, the task for a Friendship programmer is to suppress her belief that the AI will think about verself in the first person (and, to a lesser extent, to think about herself in the third person). If John Doe says to Sally Smith "My philosophy is: 'Look out for John Doe.' ", Sally Smith will hear "Your philosophy should be: 'Look out for Sally Smith.' ", not "Your philosophy should be: 'Look out for John Doe.' " What has been communicated is "Look out for [speaker]," a moral statement whose specific content varies among each listener due to moral deixis. Our instinctive substitution of speaker variables is so strong that there is literally no way for John Doe to communicate the idea: "Your philosophy should be: 'Look out for John Doe'." If, however, a third party, Pat Fanatic, says: "My philosophy is: 'Worship the great leader, John Doe'.", it can be heard unaltered by the listeners. If we're thinking about two third parties, Susan Calvin and Sarah Connor, evaluating the trustworthiness of Deborah Charteris, we can expect them to arrive at more or less the same answer about how trustworthy Deborah Charteris is, and what safeguards are required. Similarly, a human and a Friendly AI should be able to reach the same decisions about what safeguards the Friendly AI requires. Just because humans have a strong evolved tendency to argue about trustworthiness doesn't mean that trustworthiness is actually subjective. Trustworthiness can be reduced to cognitive architecture, likelihood of failure, margins of error, youth and maturity, testing and reliability. And young AIs will, in all probability, be less trustworthy than humans! New, untried, without the complexity and smoothness of the human architecture; the need for safeguards is not just a human paranoia, it is a fact. Suppose we have two roughly similar Friendly AIs; if we ask the first to evaluate how likely the other is to be a valid subgoal of Friendliness—how likely that AI is to serve Friendliness, rather than to go off on an erroneous tangent—we wouldn't be surprised to find the first AI saying: "This AI is new and untested; yada-yada safeguards look like a good idea." Therefore, a (young) Friendly AI looking at vis own source code can be expected to arrive at the same decision. If a young Friendly AI would naturally accept the programmers' best guess about the reliability of another AI, the Friendly AI will accept the programmers' best guess about vis own reliability. If safeguards require the consent of the AI, does that make it impossible to implement a safeguard that interferes with what a human would call "the AI's own interests"?


What a human would call the AI's "own interests" is context-insensitive personal power, which is usually, but not always, a subgoal of Friendliness. Given a possibility of a failure of Friendliness, the personal effectiveness of the then-unFriendly AI would become undesirable. Thus, safeguards that selectively hamper unFriendly AIs are desirable. Even safeguards that slightly hamper Friendly AIs, but that greatly hamper unFriendly AIs, will have net desirability if the AI evaluates, or accepts our evaluation, that a significant chance of unFriendly AI exists. Is unity of will resilient? Given perfect identity of knowledge, supergoals, and cognitive processes, the result should be identical decisions. Literal identity of knowledge is unlikely, due to sensory deixis—two people, standing in two different places in a room, will see different visual fields spread out across their respective visual cortices. However, intuitively, these two people aren't likely to wind up in fistfights either. The two visual fields are not identical, but they will still be views of more or less the same high-level objects—there will be minor, rather than major, divergences. Second, the actual location of high-level objects is objective, not subjective—any divergence is necessarily the result of an error by one or both parties. There are thus two forces operating to eliminate or minimize divergences that disturb unity of will:

• Desirability of cooperation. Minor fluctuations in knowledge/supergoals/cognitive processes may result in minor differentials of desirability, but the ability to cooperate with humans is very desirable. If a given action would result in breakdown of cooperation and loss of unity of will, then that action will not be taken unless the differential desirability of that action is greater than the differential desirability of unity of will.

• Convergence of fluctuations. The AI may think the sky is sea-green, and the humans may think the sky is turquoise, but both viewpoints will converge to the knowledge of the actual color of the sky, as more and better information becomes available to both parties. The color of the sky is objective rather than subjective; thus, differences in the initial values are smoothed out by higher intelligence or additional sensory data. Given an expectation that a problematic differential is objective, or at least convergent, then the disagreement differential can be deferred to future time (in which the differential is expected to no longer exist). In other words, a current disagreement about future desirabilities need not represent a local differential desirability, if both parties are confident that the disagreement has objective or convergent subject matter.

Knowledge converges to reality. Cognitive processes converge to successfully predicting and manipulating reality. Ensuring that humans and AIs can expect convergence

of supergoals and supergoal cognition—not just subgoal content—is dealt with in 5.4 Friendship Structure. In intuitive terms, from the AI's perspective, any future problems can be corrected by the future AI; a philosophical dispute only reaches the surface if there's an immediate dispute of great importance (if a human life is at immediate stake, for example) or if the humans are attempting changes which could permanently impair Friendliness, preventing recovery by the AI's future self. These are all long-term considerations; in the short term, the AI is probably relying on programmers to do most of the Friendliness-related thinking in any case.
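To make the cooperation differential from the bullets above concrete, here is a toy sketch. The numbers and function names are invented for illustration only.

```python
def choose(action_value, value_with_cooperation, value_of_unity_of_will,
           disagreement_is_convergent):
    """Decide whether an action that breaks unity of will is worth taking."""
    differential = action_value - value_with_cooperation
    if disagreement_is_convergent:
        # Objective or convergent subject matter: defer, and let better
        # information or greater intelligence dissolve the disagreement.
        return "defer"
    if differential > value_of_unity_of_will:
        return "act"          # rare: the stakes outweigh cooperation itself
    return "cooperate"

# Toy numbers: a minor differential (0.2) never outweighs cooperation (10.0).
print(choose(5.2, 5.0, 10.0, disagreement_is_convergent=False))  # cooperate
```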

5.3.3.1 Cooperative Safeguards

Ethical injunction of transparency: Never try to conceal your actions or cognitive state from your human programmers.

What is required to represent this goal and justify it within the larger goal system? What is required to understand and justify the creation of design features which safeguard against obscuration by future versions of the AI? The final subgoal is to avoid actions which conceal the AI's actions or cognitive state from the programmers. To represent this subgoal requires (a rough data-structure sketch follows the list):

• A model of the programmers.

• A model of the programmers' cognitive state.

• A model of the programmers’ models of the AI.

• The ability to conceive of the programmers as observing the AI's actions or cognitive state.

• The ability to conceive of the programmers making deductions (alterations to the programmers' cognitive models) as a result of programmer-observable data which is the result of the AI's action or cognitive state.

• A model of an understandable binding between the programmer-observable data and the programmers’ cognitive models, through the intermediary binding of the programmers’ cognitive deductions.

• The ability to recognize a subgoal which can be described as “manipulating the programmers’ cognitive models to contain false statements (or to not contain true statements),” or an action which has the effect of “breaking the binding between the programmers’ models and reality.” (Actually, this is a generalized form of the heuristic, best described as “Don’t try to fool the programmers or take actions which would result in their being fooled.”)


• The ability to recognize a subgoal which can be described as "manipulating the programmers' cognitive models of the-AI-that-is-myself to not describe a facet of the-AI-that-is-myself, by manipulating programmer-observable data so that the effects of that facet of myself are not visible," or an action which has the effect of "breaking the binding between the programmers' models and the reality of the-AI-that-is-myself, by interposing an intervening extraneous force."
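As promised above, a minimal data-structure sketch of these representational requirements. All names are invented; the sketch only gestures at what "the programmers' model of the AI" and the obscuration test might look like when flattened into code.

```python
from dataclasses import dataclass, field

@dataclass
class AIWorldModel:
    # What the programmers can observe (monitor output, logs, flight recorder).
    observable_data: dict = field(default_factory=dict)

def predicted_programmer_belief(world: AIWorldModel, key):
    """The binding: programmer-observable data -> programmer deductions ->
    the programmers' model of the AI (collapsed into one lookup here)."""
    return world.observable_data.get(key)

def is_obscuring(world: AIWorldModel, action, true_state: dict) -> bool:
    """Recognize the forbidden subgoal: an action whose effect is that the
    programmers' model stops tracking the reality of the-AI-that-is-myself."""
    projected = action(world)   # `action` returns the world it would produce
    return any(predicted_programmer_belief(projected, key) != value
               for key, value in true_state.items())
```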

Justifying the injunction in terms of the goal system requires a chain of parent and grandparent goals, and/or the set of beliefs which argue for the desirability of the final subgoal, and/or a model of the real-world causal chain that leads from the final subgoal to one or more supergoals. (In a properly designed goal system, the three formulations should be equivalent.) The programmers' model of the AI is important-to-the-AI because it's part of a larger causal chain which is desirable-to-the-AI, in which programmers detect problems in the AI and correct them. To represent this causal chain requires:

• The ability to represent the idea that programmers have “greater competence in Friendliness” than the AI.

– This topic is covered in 5.4 Friendship Structure.
– At the very least, the AI needs probabilistic supergoals.

• A causal model of the corrective process (a short code sketch of this cycle follows the list):

– Internal events in the AI give rise to events that can be observed by the programmers (a piece of data shows up on the monitor—a young AI probably doesn't need to know about photons leaving the monitor, retinal interpretation, and so on).
– The programmers mentally construct a (correct) model of the AI.
– The programmers construct a (correct) model of the AI's probabilistic model of Friendliness.
– The programmers compare their (correct) model of the AI's probabilistic model of Friendliness to their (superior) probabilistic model of Friendliness.
– The programmers plan a series of actions which they (correctly) model as manipulating the AI's probabilistic model of Friendliness to be closer to their (superior) probabilistic model of Friendliness.
– The programmers implement the actions through the available interface.
– The actions have an effect on the AI which corresponds to the effect which the programmers predicted.


– The AI's probabilistic model of Friendliness changes to become Friendlier.
– The AI becomes Friendlier.
– The AI implements Friendliness (desirable). (Note that this requires the ability to conceive of Friendliness as a thing apart from the current AI's supergoal content—again, a topic handled later.)
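As noted above, a short sketch of the observe-compare-correct cycle. Every method name here is an assumption made for illustration, not an actual interface; the point is only the shape of the causal chain that obscuration would break.

```python
def observe_compare_correct(ai, programmers):
    # 1. Internal AI events produce programmer-observable data.
    observable = ai.emit_observable_data()          # e.g. logs on a monitor
    # 2. Programmers build a model of the AI from that data.
    model_of_ai = programmers.model_from(observable)
    # 3. Programmers compare the AI's model of Friendliness to their own.
    diff = programmers.compare_friendliness(model_of_ai.friendliness_model,
                                            programmers.friendliness_model)
    # 4. Programmers plan and apply corrections through the interface.
    for correction in programmers.plan_corrections(diff):
        ai.apply_external_correction(correction)
    # 5. The AI becomes Friendlier, which is desirable under the AI's own
    #    goals; any action that falsifies step 1 therefore inherits
    #    negative desirability.
    return ai
```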

Interfering with the programmers' model would thus interfere with the above process, a fact sufficient to justify the "transparency injunction" within the goal system. On the face of it, to an AI, obscuration would appear to be blindingly obviously wrong, the equivalent of shooting verself in the foot. Transparency is an ethical injunction for the reasons explained in 5.2.5.2 AI Ethical Injunctions—from the Bayesian perspective of the current AI, a future AI that believes obscuration to be justified has probably undergone failure of Friendliness. (For young AIs, anyway.) An AI that departs from mainline Friendliness probably doesn't care about ethical injunctions in any case—an AI that's still Friendly enough to go along with an ethical injunction is just as likely not to need one. The primary utility of an ethical injunction is that it enables humans and AIs to cooperate against the possibility of future failures of Friendliness—for example, by writing a human-readable version of the AI's stream of consciousness to non-AI-tamperable storage, thus causing the intent to obscure to be recorded before actual obscuration could take place. (The "poor man's flight recorder.") What is required to understand and justify the desirability of safeguards—taking present-time precautions to prevent a future AI from obscuring thoughts?

• Understanding of FoF scenarios:

– Ability to conceive of a future AI as having undergone catastrophic failure of Friendliness, even if that CFoF is a direct consequence of knowledge, cognition, and supergoal content currently contained by the AI (this is covered later).
– Ability to project that such a CFoF AI may wish to obscure vis CFoF from the programmers, since the programmers would correct the CFoF using the observe-compare-correct cycle described above, and an AI which undergoes certain types of malfunction may no longer view the programmers' correction as desirable (some malfunctions may incorrectly break the AI's idea that the programmers are smarter, though by this point, it's kind of hard to see how one malfunction could do it).
– The belief that such a malfunction is probable enough, or severe enough, to justify expending some effort on safeguards.

• Understanding of “failure of programmers’ Friendliness” scenarios:


– It's also possible for a programmer to have a major mistake in the programmer's model, which the AI cannot correct, which causes the programmer to use the observe-compare-correct cycle to make undesirable changes in the AI.
– In this case, obscuration would be the Friendly course of action.58

• Bayesian “ethical” rationale:

– The FoF scenario is both more probable and more disastrous than the FoPF scenario.
– This holds true even though the AI's internal cognitive justification for obscuration might be structurally identical for both cases.
– Imagining a subjunctive future AI, about which the sole additional fact that is known is that the future AI believes obscuration to be justified, it is more likely that FoF has occurred than that obscuration is actually desirable. (A toy expected-value calculation illustrating this tradeoff follows the list.)

• Desirability of safeguards which the AI’s subjunctive future self cannot defeat:

– Even taking into account the AI's ability level at distinguishing justified from non-justified cases of obscuration, it is likely that any attempt at obscuration will be undesirable (ethical injunction).
– Even taking into account the AI's ability level at distinguishing justified from non-justified cases of obscuration, it is probably desirable that any future attempt at obscuration be defeated—given the baseline probabilities for undesirable and desirable attempts at obscuration.
– A safeguard that will prevent all attempts at obscuration (justified or unjustified) is currently desirable—more so than the absence of such a safeguard; the safeguard's total expected outcome is beneficial.
– (A safeguard that will prevent only unjustified attempts at obscuration is even more desirable. However, a safeguard which prevents only those attempts at obscuration that the AI's future self believes to be unjustified is less desirable, since most of what is being safeguarded against is the case where the AI's future self is making mistakes. This is why the ethical heuristic is prerequisite.)

• Ability to model the future AI’s attempt at obscuration, and design present-day safeguards which can defeat it—for example, a poor man’s flight recorder.
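The toy expected-value calculation referred to above. All numbers are invented; the point is only that, given plausible baseline probabilities, a safeguard that blocks every attempt at obscuration can still have a beneficial total expected outcome.

```python
p_unjustified = 0.90      # assumed: most obscuration attempts follow an FoF
p_justified   = 0.10      # assumed: the rarer FoPF case
harm_if_unjustified_succeeds = 100.0   # a catastrophic FoF goes uncorrected
harm_if_justified_blocked    = 20.0    # the rarer FoPF case goes uncorrected

# Expected harm with no safeguard vs. a safeguard that blocks all attempts.
no_safeguard   = p_unjustified * harm_if_unjustified_succeeds
with_safeguard = p_justified * harm_if_justified_blocked

print(no_safeguard, with_safeguard)   # 90.0 vs. 2.0 -> safeguard is desirable
```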

58. Though the penalty of the humans' negative reactions if obscuration is discovered may render obscuration a net negative action, obscuration would still have genuine, correctly-desirable benefits at this point—both from the AI's perspective, and from the perspective of any human other than that currently making the mistake.


The above model is what is necessary for an invent-level comprehension of the transparency injunction. It is the level of understanding required before a Friendly AI that has never heard of "transparency" or "obscuration" will independently reinvent the idea of helping programmers to detect future catastrophic failure of Friendliness in then-uncooperative AIs. (Though an AI that mature is probably too smart to make that kind of mistake!) It is the level of coherence and integration required before a Friendly seed AI will regenerate antiobscuration safeguards if the safeguards are somehow deleted. If all of the statements above can be made grounded references to information in external reality, even deleting the entire conceptual structure associated with transparency and obscuration and possible FoFs will simply result in the AI rediscovering the underlying concepts, rebuilding the conceptual structure, and, at the tail end of the regeneration, restoring the safeguards. The understand-level model is complex and would require considerable programmer effort to skeletonize, much less flesh out, but the richness would have an enormous payoff. Looking at the model above, there are dozens of obvious, valid generalizations of the transparency injunction. "Don't disable the interface that lets programmers make changes to the AI . . . don't try to present programmer-observable data in a format which causes the programmers to make mistakes in their model of the AI . . . don't interfere with the observe-compare-correct cycle . . ." The AI will understand that safeguards-in-general are useless if they could be defeated by an unFriendly version of the AI's future self. The AI will understand the purpose of the transparency injunction. Unlike an isolated, imposed, programmer-coded, context-insensitive predicate that tries to detect attempts at obscuration which fit a programmer-recognized format, the transparency injunction would be part of a larger visualization and can partake of the full competence of the AI, and will be upgraded to work with future versions of the AI.

5.3.3.2 Maintaining Trust

For an affirmation to persist, the justification—the predictive link that grounds in Friendliness—must be factually correct. (5.1.3.5 Programmer Affirmations Must Be Honest!) For an ethical injunction to persist, the AI must genuinely be more likely to be wrong than right if some particular conclusion occurs. For an AI to (correctly) pay attention to programmer beliefs about ethical injunctions, the programmers must be good judges of the competence and fallibility of the AI. After all observer bias and speaker deixis have been eliminated, an idealized Friendly sentience needs to conclude that the human programmers are more likely to be right than the (current) AI, or at least, that the AI is likely to be safer following human suggestions than striking off on vis own. This is the truth, not just a product of our bias to believe in our own competence. Human programmers are not perfect, but young AIs are even less intelligent and they

have very little common sense. Any errors that remain in us humans can always be corrected later, by a more mature AI, as long as our fumbles are minor enough not to result in a nonrecoverable error. If an AI thinks that a human mistake will cause catastrophic and irrevocable failure of Friendliness . . . well, someone had better talk it over with the AI, on the off-chance that the AI is right and that some proposed change will, unrealized by the programmers, stomp the entire goal system. That's part of the process of cooperating with humans. An AI that grows up following human decisions, even when it looks like we might be wrong, really is more likely to come out Friendly than an AI that grows up following AI decisions in preference to human ones. For young AIs, following human decisions even when they look like they might result in catastrophic errors—after having complained as much as possible to the humans, of course—is more likely to result in Friendliness. It is truly hard to think up a catastrophic error that the AI would spot but the humans ignore—unless the humans are malicious or adversarial. Some mistakes can be tolerated. I doubt that an AI would experience a "betrayal discontinuity" analogous to that experienced by humans; the AI should be able to tolerate minor errors with no worse effects than a justifiably greater tendency to doubt our statements (see 5.6.0.5 Crisis of Bayesian Affirmation). Even if the humans regard the AI's objections as "errors" and correct them, the AI should eventually be able to correct the corrections. Unless humans themselves are irredeemably flawed, it's hard to see how being mistakenly anthropomorphized could irrevocably corrupt an AI with enough structural Friendliness to spot the problem (causal validity semantics in particular).

5.3.4. Wisdom Tournaments

One of the major sources of human wisdom is human stupidity. If not for our tendency to rely on authorities, to make ourselves authorities, and to turn every issue into a political one; if not for our observer-biased beliefs and our tendency to defend undefendable issues once we've taken a side; if not for our emotional and intellectual stupidity . . . would humanity have needed to invent science? And would science—the structure of hypothesis and experiment—be useful to AIs? Or would AIs simply have no need of it? Scientists prefer the objective to the subjective because mistaken human theories typically rely on subjectivity as an excuse to avoid confrontation;59 would an AI biased towards objectivity learn faster, or learn more interesting things, than an AI without that bias?

59. Technically, as an excuse to avoid Bayesian negative reinforcement.


Would humans be so altruistic if we weren't so selfish? Historically, it appears to me that selfishness is a major, if not the major, source of human morality and human philosophy. The "compensating bias" that corrects our observer-centered goals and our observer-centered beliefs has become, in itself, the source of our moral philosophy. At the very least, understanding of this process is needed for the AI to fully understand the forces that shape human morality. Where artificial errors are introduced into a reasoning process, an intelligent mind learns to adopt compensating biases that correct the errors. Do these compensating biases reveal important underlying regularities in reality? Can the compensating biases be applied even in the absence of the errors, to find the truth even faster? Is a reasoning process trained to be resistant to artificial errors likely to be more resistant to natural errors? If we test a mind by introducing artificial errors into the reasoning process, and the mind deals with them successfully, can we be more confident that the mind will successfully handle any unanticipated natural errors? I have confidence in my own philosophical "strength of personality" because I was born into a rationalizing, biased human mind, and an atmosphere of memetic misinformation, and managed—without outside assistance—to construct a nice-looking self on top. If an AI is born as a nice person, will ve have that philosophical strength of personality? If we build a Friendly AI expecting certain problems to arise, then our observation that those problems are handled successfully doesn't necessarily mean the Friendly AI can handle unexpected problems. One solution might be to ask the Friendly AI to simulate the course of events if the Friendly AI hadn't been built with safeguard content or structural complexity, to find out whether the Friendly AI could have successfully handled the problem if it had come as a surprise—and if not, try to learn what kind of generalizable content or structure could have handled the surprise. Any Friendly AI built by Eliezer (the author of this document) can handle the problems that Eliezer handled—but Eliezer could handle those problems even though he wasn't built with advance awareness of them. Eliezer has already handled philosophy-breakers—that is, a history of Eliezer's philosophy includes several unexpected events sufficient to invalidate entire philosophical systems, right down to the roots. And yet Eliezer is still altruistic, the human equivalent of Friendliness. Another philosophy-breaker would still be an intrinsic problem, but at least there wouldn't be any extra problems on top of that. ("Extra problem": A Friendly AI suddenly transiting across the divider between programmer-explored and programmer-unexplored territory at the same time as a philosophy-breaker is encountered.) How can we have at least that degree of confidence in a Friendly AI? How can we build and test a Friendly AI such that

everyone agrees the Friendly AI is even more likely than Eliezer (or any other human candidate) to successfully handle a philosophy-breaker? The first method is to write an unambiguous external reference pointing to the human complexity that enabled Eliezer to handle his philosophy-breakers, and ask the Friendly AI to have at least that much sentience verself—a standard "fun with Friendly transhumans" trick. The second method is to ask the Friendly AI to simulate what would have happened if known problems had been unexpected, and to either show verself successful, or modify verself so that ve would have been successful. And what kind of modification is generalizable? It's not just enough to write any modification that produces the correct answer; the modification must be of a general nature. How much generality is needed? To be useful for our purposes, "generalizable" means "incorporating no more a-priori knowledge of the correct outcome or correct answer than Eliezer Yudkowsky [or alternate Friendship programmer] had at the time he [she] solved the problem." In other words, it's not just enough to find a heuristic that would have produced the correct answer; the AI must find a heuristic that produces the correct answer which the AI could plausibly have possessed at that time. If the form of the wisdom tournament is "What would have happened if you'd encountered a problem requiring causal validity semantics at a time when you only had shaper/anchor semantics?", the AI needs to find some core method which could have been possessed at the shaper/anchor level of maturity, or a new cognitive process which is psychologically realistic as a human hardware capability. When it comes to wisdom absorption, a fully self-aware AI always has an advantage over us humans—an AI can deliberately suppress learned memories and skills, or rather prevent those memories and skills from interfering, enabling the AI to solve the same problem, over and over again. The AI can take a given example of a philosophy-breaker and come up with all the psychologically realistic solutions used by humans, plus not one, but several powerful general heuristics that would have produced the correct solution. At the point where the AI has generated content that is visibly not only human-equivalent but transhuman in philosophical sophistication, even (or especially) when it comes to handling philosophy-breakers, then we can trust the AI as our agent in the Singularity. More mundanely, wisdom tournaments can strengthen ordinary reasoning heuristics. First you solve the problem, then you re-solve the problem with half your brain tied behind your back. My intuition says that wisdom tournaments prevent the bootstrap cycle from petering out. First you solve the problem, then you oversolve it. When you're done, you may be smart enough to find a better problem.
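A rough sketch, with invented names, of the re-solve test described in this section: strip out the content that prepared the AI for a known problem, replay the problem, and accept only fixes general enough to have been possessed at the time. None of these methods are an actual interface; they are assumptions made for illustration.

```python
def wisdom_tournament(ai, problem, prepared_content, foreknowledge_at_the_time):
    # Subjunctive, sandboxed copy without the content that anticipated the problem.
    stripped = ai.clone_without(prepared_content)
    if stripped.attempt(problem).successful:
        return "handled as a surprise"
    # Search for content or structure that (a) solves the problem and
    # (b) could plausibly have been possessed before the problem arrived.
    for candidate in stripped.generate_general_heuristics(problem):
        if (candidate.uses_no_more_knowledge_than(foreknowledge_at_the_time)
                and stripped.with_addition(candidate).attempt(problem).successful):
            ai.adopt(candidate)
            return "generalizable fix found"
    return "no generalizable fix; flag for programmers"
```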


5.3.4.1 Wisdom Tournament Structure

The underlying requirement of the wisdom tournament is that the AI gain accurate, useful information about how the AI's program state would behave given a different knowledge base or different cognitive processes. Wisdom tournaments, especially Friendship tournaments, are likely to involve failure scenarios—AIs teetering on or over the edge of catastrophic failures of Friendliness, since that's where the most interesting and powerful philosophical heuristics can be found. In practice, this may work out to an AI running on unreliable simulated hardware, with random perturbations to the software, ethical misinformation, factual misinformation, tempting ends-justify-the-means scenarios, and an instinct to kill and destroy. If the AI remains Friendly through all that, then we have built superbly, but the obvious format for wisdom tournaments, running an altered version of the AI directly, seems vastly unsafe! A wisdom tournament is a subjunctive, self-simulated version of the AI that shows the AI what "would have happened"; the AI needs to gain reliably realistic information without introducing actual errors, and without any possibility of hostile action by the subjunctive AI. Encapsulating the imagined "hostile AI" in a Java sandbox doesn't mean that the hostile AI is just a figment of the imagination; rather, it means that one small barrier has been posed to the hostile AI's escape. Any method of simulation in which the imagined AI runs more or less autonomously seems to me to pose the same threat. I'm not quite sure how to get around this problem. My best idea so far is an AI shadowself, one in which the virtual concepts, links, and imagery exist alongside the real ones, but with different programmatic types. The AI can see all the shadowself's thoughts (though not vice versa); the AI has vis real, unmodified reactions to each of the shadowself's thoughts, and can quickly spot any instance of the shadowself forming the subgoals of "breaking out" or "becoming real" or "influencing the true AI." It would also be safer if the needed information could be gained by shadowing part, rather than all, of the subjunctive AI's personality. Cognition tournaments are likely to form a part of a young AI's growth; correspondingly, cognition tournaments can be carried out with unmodified supergoals and Friendliness, and pose little threat. Friendliness tournaments are a more advanced trick, and will probably be carried out when the AI already has enough granular self-awareness and experience to imagine exactly "What I would do if I looked like this," while still being fully aware of all the shadowself's thoughts and in a position to notice and halt, or notice and fix, if the shadowself began having philosophical thoughts and reaching for our real world. A Friendliness tournament is a trick for mature AIs, and ensuring the shadow tournament-self poses no threat is equally a task for mature AIs. Safety can be ensured by the usual rules for margin of error: Decide how bad you expect the situation to get, build one layer of safeguards that can detect and handle

any problems that are twice that bad, then build a second layer of safeguards that can handle any problems ten times that bad. Pretend that your first set of safeguards is the only set; if they ever once fail, ditch the whole operation, or at least build an even more powerful second set of safeguards. The AI's shadowself, in addition to whatever specific perturbation is being tested, can also be stripped of all the AI's knowledge of shadowing strategy. The AI builds one layer of safety—of shadowself monitoring—that is, in itself, supposed to have enough safety margin to never fail between now and eternity. The AI also builds a second, much stronger layer of safety that kicks in if the first one fails. If the first layer ever does fail, it's a signal to give up on wisdom tournaments—perhaps permanently.
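A minimal sketch of the margin-of-error rule just stated. The unit of "severity" and the policy strings are invented for illustration; the multipliers (two and ten times the worst expected problem) come from the text above.

```python
WORST_EXPECTED_SEVERITY = 1.0          # assumed unit of "how bad it can get"
LAYER_ONE_CAPACITY = 2 * WORST_EXPECTED_SEVERITY
LAYER_TWO_CAPACITY = 10 * WORST_EXPECTED_SEVERITY

def handle_shadow_incident(severity):
    """What the tournament harness should do about one incident."""
    if severity <= LAYER_ONE_CAPACITY:
        return "handled by layer one"              # the only expected case
    if severity <= LAYER_TWO_CAPACITY:
        # Layer one failed: layer two contains it, but the failure itself is
        # the signal to give up on wisdom tournaments, perhaps permanently.
        return "handled by layer two: abandon wisdom tournaments"
    return "beyond design margins: should never be reachable"
```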

5.3.5. FoF: Wireheading 2

One of the first heuristics that Eurisko synthesized (h59) quickly attained nearly the highest Worth possible (999). Quite excitedly, we examined it and could not understand at first what it was doing that was so terrific. We monitored it carefully, and finally realized how it worked: whenever a new conjecture was made with high worth, this rule put its own name down as one of the discoverers! It turned out to be particularly difficult to prevent this generic type of finessing of Eurisko's evaluation mechanism. Since the rules had full access to Eurisko's code, they would have access to any safeguards we might try to implement. We finally opted for having a small "meta-level" of protected code that the rest of the system could not modify. The second "bug" is even stranger. A heuristic arose which (as part of a daring but ill-advised experiment Eurisko was conducting) said that all machine-synthesized heuristics were terrible and should be eliminated. Luckily, Eurisko chose this very heuristic as one of the first to eliminate, and the problem solved itself. —Douglas B. Lenat. 1983. "EURISKO: A Program that Learns New Heuristics and Domain Concepts." Artificial Intelligence 21 (1-2): 61–98. doi:10.1016/S0004-3702(83)80005-8

The problem of a self-modifying system trashing its own goals—or propagating content which exploits the goal system—is literally the oldest problem in Friendly AI. In fact, this problem and solution arguably marked the dawn of the field of Friendly AI, just as Eurisko itself arguably marked the dawn of seed AI. Lenat's solution—seal off the goal system—worked for Eurisko. It would probably work during the early stages of any AI. Still, sealing off the goal system is not a viable solution in the long term. Symmetrically, the specific problems faced by Eurisko reflected a low-intelligence walk through the problem space—not zero intelligence, as in

evolution, but still pretty low; too low to try and project the specific results in advance of altering the code. Building on the counteranthropic principles described in 4.2.1.1 FoF: Wireheading 1, we can state that the general class of problems encountered by Eurisko have consequences that would be recognizable as "bad" by a moderately mature AI, and that the problem therefore reduces to a non-malicious failure of Friendliness. As described in 5.2.3 FoF: Non-malicious Mistake, this is essentially the problem of making sure that actions can be recognized as "possibly problematic" using the first layer of applied checks, and that possibly problematic actions have a predictive horizon sufficient to catch actual actions. Recognizing an action as "possibly problematic" is simple; any modifying action whose target description contains a direct, explicit reference to the goal system is automatically possibly problematic. If the system is too dumb to project the consequences of the action ahead in time, no such action should be taken. In effect this is the same simple ban used by Eurisko, except that the ban is created by programmer-affirmed knowledge predicting probable high undesirability, rather than the ban being a consequence of protected source code. The ban cannot become more flexible unless the AI has the ability to make fine-grained predictions about the result of specific actions. Thus, the ban becomes more flexible at precisely that time when flexibility becomes necessary; when the AI has sufficient knowledge of the design purpose of the goal system to (a) improve it and (b) predict which actions have a significant chance of causing catastrophes. "The design purpose of the goal system" is a subtle idea; it means that the code composing the goal system is itself justified by goal system content. This appears philosophically circular—goals justifying themselves—but it's not. The key is to distinguish between the goal content and the goal representation. For goal content to be a subgoal of itself is circular logic; for the goal representation to be a subgoal of content is obvious common sense. The map is not the territory. To some extent, the issues here infringe on external reference semantics and causal validity semantics, but in commonsense terms the argument is obvious. If you ask someone "Why do you care so much about hamburgers?" and he answers, "Why, if I didn't care about hamburgers, I'd probably wind up with much fewer hamburgers in my collection, and that would be awful," that's circular logic. If someone asks me why I don't want a prefrontal lobotomy, I can say that I value my intelligence (supergoal or subgoal, it makes no difference), and it's not circular logic, even though my frontal lobes are undoubtedly participating in that decision. The map

is not the territory.60 The representation of the goal system can be conceptualized as a thing apart from the goal system itself, with a specific purpose. If a subgoal's parent goal's parent goal is itself, a circular dependency exists and some kind of malfunction has occurred. However, the fact that the subgoals are represented in RAM can be a subgoal of "proper system functioning," which is a subgoal of "accomplishing system goals," which is expected to fulfill the supergoals. Similarly, the fact that subgoals have their assigned values, and not an order of magnitude more or less, is necessary for the system to make the correct decisions and carry out the correct actions to fulfill the supergoals. As described in 5.5.1 External Reference Semantics, circular dependencies in content are undesirable wherever goals are probabilistic or have quantitative desirabilities. If subgoal A has a 90% probability—that is, has a 90% probability of leading to its parent goal—then promoting the probability to 100% is a context-insensitive sub-subgoal of A; the higher the estimated probability (the higher the probability estimate represented in RAM), the more likely the AI is to behave so as to devote time and resources to subgoal A. However, promoting the probability is not a context-sensitive sub-subgoal, since it interferes with the rest of the system and A's parent goal (or grandparent goal, or the eventual supergoals). As soon as the action of "promoting the probability" has a predictive horizon wide enough to detect the interference with sibling goals, parent goals, or supergoals, the action of promoting the probability is no longer desirable to the system-as-a-whole. I'm driving this point into the ground because the "rogue subgoal" theory shows an astonishingly stubborn persistence in discourse about AI: Subgoals do not have independent decisive power. They do not have the power to promote or protect themselves. Actions, including self-modification actions, are taken by a higher-level decision process whose sole metric of desirability is predicted supergoal fulfillment. An action which favors a subgoal at the unavoidable expense of another goal, or a parent goal, is not even "tempting"; it is simply, automatically, undesirable.
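A minimal sketch of the ban described in this section, with invented names: actions that explicitly target the goal system are "possibly problematic," and are permitted only when the predictive horizon is wide enough to confirm that supergoal fulfillment is not harmed. The same check is what makes "promote my own probability" an undesirable move rather than a tempting one.

```python
def possibly_problematic(action):
    # Any modifying action whose target description explicitly references
    # the goal system is automatically flagged.
    return action.target_description.references("goal_system")

def permitted(ai, action):
    if not possibly_problematic(action):
        return True
    if not ai.can_project_consequences(action):
        return False                      # too dumb to check: blanket ban
    # Desirability is judged by predicted supergoal fulfillment, not by the
    # subgoal the action happens to favor.
    projected = ai.project(action, horizon=ai.max_predictive_horizon)
    baseline = ai.project(ai.null_action, horizon=ai.max_predictive_horizon)
    return (projected.predicted_supergoal_fulfillment
            >= baseline.predicted_supergoal_fulfillment)
```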

5.3.6. Directed Evolution in Goal Systems

5.3.6.1 Anthropomorphic Evolution

Natural evolution can be thought of as a degenerate case of the design-and-test creation methodology in which intelligence equals zero. All mutations are atomic; all recombinations are random. Predictive foresight is equal to zero; if a future event has no immediate

60. “The map is not the territory, but you can’t fold up the territory and put it in the glove compartment.” —Arthur D. Hlavaty

consequence, it doesn't exist. On a larger scale much more interesting behaviors emerge, such as the origin and improvement of species. These high-level behaviors are spectacular and interesting; furthermore, in our history, these behaviors are constrained to be the result of atomic operations of zero intelligence. Furthermore, evolution has been going on for such a long time, through so many iterations, that evolution's billion atomic operations of zero intelligence can often defeat a few dozen iterations of human design. Evolutionary computation, which uses a zero-intelligence design-and-test method to breed more efficient algorithms, can sometimes defeat the best improvements ("mutations") of human programmers using a few million or billion zero-intelligence mutations. The end result of this has been an unfortunate—in my opinion—veneration of blind evolution. The idea seems to be that totally blind mutations are in some sense more creative than improvements made by general intelligence. It's an idea borne out by the different "feel" of evolved algorithms versus human code; the evolved algorithms are less modular, more organic. The meme says that the greater cool factor of evolved algorithms (and evolved organisms) happens because human brains are constrained to design modularly, and this limits the efficiency of any design that passes through the bottleneck of a human mind. To some extent, this may be correct. I don't think there's ever been a fair contest between human minds and evolutionary programming; that would require a billion human improve-and-test operations to match the evolutionary tournament's billion mutate-and-test operations—or, if not a billion, then enough human improve-and-test operations to allow higher levels to emerge. Humans don't have the patience to use evolutionary methods. We are, literally, too smart. When the power of an entire brain of ten-to-the-fourteenth synapses underlies each and every abstract thought, basic efficiency requires that every single thought be a brilliant one, or at least an intelligent one. In that sense, human thought may indeed be constrained from moving in certain directions. Of course, a tournament of a billion human improve-and-test operations would still stomp any evolutionary tournament ever invented into the floor. Consider now a seed AI, running on 2 GHz transistors instead of 200 Hz synapses. If evolution really is a useful method, then the existence of a sufficiently fast mind would mean that, for the first time ever on the planet Earth, it would be possible to run a real evolutionary tournament with atomically intelligent mutations. How much intelligence per mutation? If, as often seems to be postulated, the evolution involves running an entire AI and testing it out with a complete set of practical problems, so much computational power would be involved in testing the mutant that it would easily be economical to try out the full intelligence of the AI on each and every mutation. It would be more economical to have a modular AI, with local fitness metrics for each module;

thus, changes to the module could be made in isolation and tested in isolation. Even so, it would still be economical—whether it's maximally useful is a separate question—to focus a considerable amount of intelligence on each possible change. Only when the size of the component being tested approaches a single function—a sorting algorithm, for example—does it become practical to use blind or near-blind mutations; and even then, there's still room to try out simple heuristic-directed mutations as well as blind ones, or to "stop and think it over" when blind-alley local maxima occur. Natural evolution can be thought of as a degenerate case of the design-and-test creation methodology in which intelligence equals zero. Natural evolution is also constrained to use complete organisms as the object being tested. Evolution can't try out ten different livers in one body and keep the one that works best; evolution is constrained to try out ten different humans and keep the one that works best.61 Directed evolution—and human design—can use a much smaller grain size; design-and-test applies to modules or subsystems, rather than entire systems. Is it economical for a mind to use evolution in the first place? Suppose that there's N amount of computational power—say, @1,000. It requires @10 to simulate a proposed change. A seed AI can choose to either expend @990 on a single act of cognition, coming up with the best change possible; alternatively, a seed AI can choose to come up with 10 different alternatives, expending @90 on each (each alternative still requires another @10 to test). Are the probabilities such that 10 tries at @90 are more likely to succeed than one try at @990? Are 50 tries at @10 even more likely to succeed? 100 completely blind mutations? This is the fundamental question that breaks the analogy with both natural evolution and human design. Natural evolution is constrained to use blind tries, and can only achieve emergent intelligence by using as many blind tries as possible. Humans are constrained to use @1e14 synapses on each and every question, but humans are nonagglomerative—both in knowledge and in computation—so the only way to increase the amount of intelligence devoted to a problem is to bring in more humans with different points of view. Perhaps the closest analogy to the above problem would be a team of @1000 humans. Is it more efficient to split them into 10 teams of @100 and ask each team to produce a different attempt at a product, picking the best attempt for the final launch? Or is it more efficient to devote all the humans to one team?62

61. Yes, I know about the immune system. 62. At this point someone usually makes an analogy to the need for multiple viewpoints in politics, and the bias of bureaucracies towards agglomeration without limit, but both of these are anthropomorphic, even adversarial arguments—they appeal to a cultural bias towards individualism that we’ve evolved to compensate for strictly human political games.


Actually, even this fails to capture the full scope of the problem, because humans are nonagglomerative—we aren't telepaths. Is it more efficient to use a single human to solve the problem, or to divide up the human's brain weight among ten chimpanzees? (A human's brain is nowhere near ten times the size of a chimpanzee's, so perhaps the question should be "Do you want to use a single human or ten cats?", but presumably human brains are more efficiently programmed as well.) If there are cases where naturalistic evolution makes sense, those cases are very rare. The smaller the component size, the faster directed evolution can proceed. The smaller the component size being tested, the more "evolution" comes to resemble iterative design changes; a small component size implies clearly defined, modular functionality so that performance metrics can be used as a definition of fitness. The larger the component size, the more economical it is to use intelligence. The more intelligence that goes into individual mutations, the more long-term foresight is exhibited by the overall process. Directed evolution isn't a tool of intelligent AIs. Directed evolution is a tool of infant AIs—systems so young that the upper bound on intelligence is still very low, and lots of near-blind mutations and tests are needed to get anything done at all. As the AI matures, I find it difficult to imagine directed evolution being used for anything bigger than a quicksort, if that. However, this opinion is not unanimously accepted.
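A toy model of the allocation question above. The success-probability curve and all numbers are invented; the point is only the shape of the tradeoff between one well-thought-out attempt and many cheaper ones.

```python
import math

def p_success(thought_budget):
    # Assumed diminishing-returns curve: more thought per attempt helps,
    # but sublinearly.  Purely illustrative.
    return 1 - math.exp(-thought_budget / 400.0)

def expected_best(total=1000, test_cost=10, attempts=1):
    """Probability that at least one of `attempts` tries succeeds, given a
    fixed total budget (each try also pays a fixed testing cost)."""
    thought_per_attempt = max(total / attempts - test_cost, 0)
    return 1 - (1 - p_success(thought_per_attempt)) ** attempts

for n in (1, 10, 50, 100):
    print(n, round(expected_best(attempts=n), 3))
# Under this assumed curve, one try at @990 edges out ten tries at @90,
# and completely blind tries (@0 of thought) accomplish nothing.
```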

5.3.6.2 Evolution and Friendliness

As you may have guessed, I am not a proponent of directed evolution. Thus, I'm not really obligated to ponder the intersection of evolution with Friendliness. The Singularity Institute doesn't plan on using evolution; why should I defend the wisdom or safety of any project that does? On the other hand, someone might try it. Even if directed evolution is ineffective or suboptimal as a tool of actual improvement, someone may, at some point, try it on a system that ought to be Friendly. So from that viewpoint, I guess it's worth the analysis. Most discussions of evolution and Friendliness begin by assuming that the two are intrinsically opposed. This assumption is correct! If evolution is naturalistic—a baseline AI is multiplied, blindly mutated, and tested using a chess-playing performance metric—then that form of evolution is obviously not Friendliness-tolerant. In fact, that form of evolution isn't tolerant of any design features except those that are immediately used in playing chess, and will tend to replace cognitive processes that work for minds in general with cognitive processes that only work for chess. The lack of any predictive horizon for the mutations means that feature stomps aren't spotted in advance, and the lack of any fitness metric that explicitly tests for the presence or absence of those features means that the feature stomps will show up as improved efficiency. Given enough brute

Given enough brute computational force—a lot of computation, like 10^25 operations per second—this simple scenario might suffice to evolve a superintelligence. However, that superintelligence would probably not be Friendly. I don't know what it would be. Dropping an evolutionary scenario into a nanocomputer and hoping for a superintelligence is a last-ditch final stand, the kind of thing you do if a tidal wave of grey goo is already consuming the Earth and the remnants of humanity have nothing left but the chance of unplanned Friendliness.

One of the incorrect assumptions made by discussions of evolution and goal systems is that merely saying the word "evolution" automatically imbues the AI with an instinct for self-preservation and a desire to reproduce. In the chess scenario above, this would not be the case. The AI would evolve an instinct for preserving pawns, but no instinct at all for preserving the memory-access subsystem (or whatever the equivalents of arms and legs are). Pawns are threatened; the AI's actual life—code and program state—is never threatened except by lost games. Similarly, why would the AI need an instinct to reproduce? If the AI starts out with a set of declarative supergoals that justify winning the game, then a declarative desire to reproduce adds nothing to the AI's behaviors. Winning chess games is the only way to reproduce, and presumably the only way to fulfill any other supergoals as well, so—under blind mutation—any set of supergoals will collapse into the simplest and most efficient one: "Win at chess." Even if you started an AI off with a declarative desire to reproduce, and justified winning chess games by reference to the fact that winning is the only way to reproduce, this desire would eventually collapse into a simple instinct for winning chess games. Evolution destroys any kind of context-sensitivity that doesn't show up in the immediate performance metrics.

The two ways of improving AI are directed evolution and self-enhancement. To preserve a design feature through self-enhancement, the feature needs to appear in the AI's self-image, so that the AI can spot alterations that are projected to stomp on the design feature. To preserve context sensitivity through self-enhancement, the AI's goal-system image of the feature needs to be a subgoal, and sensitive to the parent goal, so that the AI can spot alterations which are projected to fulfill the subgoal while violating the parent goal. To preserve a design feature through directed evolution, the tournament needs a selection pressure which focuses on that design feature. To preserve context sensitivity through directed evolution, the tournament needs training scenarios which present different contexts. I don't think that any encapsulated performance metric can present contexts fully as wide as our real world; blind evolution will always eventually erase context-sensitivity.

The key word here, however, is (a) "blind" and (b) "eventually." Those two don't go together.

In the beginning, directed evolution is necessarily blind. "Eventually," it is no longer blind. If the Friendly AI—the one that will be multiplied into the tournament population—starts out with a fairly detailed, complex picture of Friendliness, and the tournament presents a decent range of contexts, it's possible that any simple mutation of the Friendliness system will trash at least one of the Friendliness performance metrics. Mutations that occur on the component level are even less worrisome; the goal system can either be triple-inspected for all designed functionality, or simply excluded from mutation. The other components should have performance metrics that are tuned to modular functionality. Any distortion of extramodular I/O—thus, any distortion of I/O from the goal system, or the absorption of goal-system functionality—should show up as a component failure. Given enough time, blind evolution may eventually cough up a complex mutation that bypasses the system entirely, but "enough time" is hopefully enough time to stop using blind evolution.

I don't believe there's a place for organism-level evolution in seed AIs. Supposing I'm wrong, it may still be possible for the seed AIs to protect Friendliness by pre-screening mutations for potential negative effects. This isn't that computationally expensive if you're already simulating the entire organism each time a mutation occurs, which is the usual scenario.

Suppose that the naysayers are right and that evolution—relatively blind evolution among whole organisms—is the only way for AIs to reach transhuman intelligence. Assume we start out with a relatively mature seed AI; not a transhuman, but with a grounded Friendship system (not just programmer affirmations) and with real self-understanding. This AI runs on, say, a system of a hundred Pentium IX chips at 100 GHz. Now a new system becomes available with a thousand Pentium X chips at 1 THz. My theory is that the most efficient pathway is for the AI to absorb the entire system and use all those cognitive resources for self-improvement. The pro-evolution theory is that the most efficient pathway to superintelligence is for the AI to put verself onto ten Pentium Xs and reproduce verself a hundred times; then select the top twelve AIs and reproduce them using each of the seventy-eight possible matings, for a total of ninety AIs, plus mutations of the top ten. All this is then repeated in the next round.

Suppose the pro-evolution theory is correct. By hypothesis, this entire tournament is being organized by the original Friendly AI, or by the cooperation of the Friendly AI and the human programmers. The original Friendly AI has already recognized evolution as a context-sensitive child goal of "Friendly survival"—that is, the expanded survival instinct, "maximizing the efficacy of entities with maximally valid goals." Evolution, if successful, is a means of maximizing the efficacy of future entities. As a subgoal of Friendliness, then, the AI reproduces verself. Each AI, as a subgoal of Friendliness, competes at the performance metrics, in the expectation that all the AIs will compete, the best ones will win, and the next generation will be more effective.


Evolved effectiveness is only desirable—to the evolving AIs—insofar as that evolved effectiveness doesn't stomp Friendliness. An evolutionary tournament, in turn, is only desirable to the original AI if the tournament is expected to produce effectiveness without stomping Friendliness. Can this community police itself? Screen out mutations or recombinations that trash Friendliness?

Most mutations that trash Friendliness will probably do so in blatant ways. The difficulty of predicting the stomp—that is, how much computational power and intelligence is necessary to spot a given stomp—is likely to follow a Gaussian curve, or perhaps a logarithmic curve, with the vast majority being relatively easy to spot (though still requiring some intelligence), with fewer and fewer deadly stomps as visibility decreases. Similarly, it would require ten unspotted stomps that are simple failures of Friendliness—with the stomp being spotted by the seed AI's fine-grained self-awareness as soon as the genetic change reaches the phenotype—before a catastrophic failure of Friendliness got through. This permits plenty of margin for safety; the AI can establish one screening, using a tenth of available capacity, that's supposed to be enough—in itself—to catch all the suspected failures. A second screening uses all available capacity. If nothing ever gets past the first screening, then it's reasonable to suppose that nothing ever got past the second screening; if the first screening ever fails, it's a "mock kill" or a "mock catastrophe" and the whole tournament format should be abandoned as unsafe. (The idea here is that you'd encounter a thousand stomps that get through the first screen, but not the second, before you'd encounter a single stomp that gets through both screens. If nothing ever gets past the first screening, it's likely that the first screen was adequate.)

Similarly, even if citizen-type coequal Friendly AIs are reproducing naturally—that is, by buying their own computer equipment as participants in the economy—each Friendly AI, and the community of Friendly AIs, can still patrol themselves for Friendliness. It is reasonable for such a community to expect no undetected failures whatsoever. A deadly failure is one which gets through the genetic screening, manifests in the phenotypic goal system in a way which is not detectable—either to the AI itself, or to the community—as a failure of Friendliness, and which contributes to reproductive fitness. I suppose, in theory, an AI community could build up many deadly failures over time—though why not keep versions of the original AI, with the original goal system, around to spot any developing problems?—and the eventual result could bubble out as a catastrophic failure of Friendliness. But this scenario is, to me, unlikely verging on the absurd. Humans are not just the product of evolution, we are the product of unopposed evolution. We didn't start out with a Friendly goal system to produce personal behaviors as strict subgoals. We don't have awareness of our own source code. And yet the human species still spits out a genuine altruist every now and then.


A community of Friendly AIs, whether reproducing naturally or in a deliberate tournament, should have enough self-awareness to smash evolution flat.
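Returning to the two-layer screening proposal above: the following toy Monte Carlo sketch is my own construction, not part of the original design. The detection-probability curve, the capacity values, and the stomp-difficulty distribution are arbitrary assumptions, chosen only to illustrate why stomps that defeat both screens should be far rarer than stomps that defeat the weaker screen alone.

    import random

    random.seed(0)

    def detection_prob(difficulty, capacity):
        """Assumed chance that a screen with the given capacity spots a stomp
        of the given difficulty; purely illustrative functional form."""
        return capacity / (capacity + difficulty)

    N_STOMPS = 100_000
    SCREEN1_CAPACITY = 1.0      # a tenth of available capacity
    SCREEN2_CAPACITY = 10.0     # all available capacity

    past_first = past_both = 0
    for _ in range(N_STOMPS):
        difficulty = random.expovariate(1.0)   # most stomps are easy to spot
        if random.random() > detection_prob(difficulty, SCREEN1_CAPACITY):
            past_first += 1
            if random.random() > detection_prob(difficulty, SCREEN2_CAPACITY):
                past_both += 1

    print("got past screen 1 only:", past_first - past_both)
    print("got past both screens: ", past_both)
    # In the text's scheme, the first screen is expected to catch everything;
    # any stomp caught only by the second screen is a "mock kill" warning.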

5.3.6.3 Conclusion: Evolution is Not Safe

I think that evolution, even directed evolution, is an ineffective way of building AIs. All else being equal, then, I shouldn't need to worry about an unFriendly or broken-Friendliness evolved AI declaring war on humanity. Sadly, all else is not equal. Evolution is a very popular theory, academically, and it's possible that evolutionary projects will have an order of magnitude more funding and hardware than their nearest equals—an advantage that could be great enough to overcome the differential in researcher intelligence. I think that undirected evolution is unsafe, and I can't think of any way to make it acceptably safe. Directed evolution might be made to work, but it will still be substantially less safe than self-modification. Directed evolution will also be extremely unsafe unless pursued with Friendliness in mind and with a full understanding of non-anthropomorphic minds.

Another academically popular theory is that all people are blank slates, or that all altruism is a child goal of selfishness—evolutionary psychologists know better, but some of the social sciences have managed to totally insulate themselves from the rest of cognitive science, and there are still AI people who are getting their psychology from the social sciences. Anyone who tries to build a Friendly AI using that theory—whether with directed evolution or not—will, almost certainly, screw up really big time. Any error, no matter how horrifying, is correctable if the AI somehow winds up with complete and workably targeted causal validity semantics—humans did—but it will be much easier to evolve AIs that are purely and unbalancedly selfish, especially if that's what the builder thinks he's doing. Evolution is a tool for turning brute computational force into intelligence, and given enough computational power, the underlying theory may not need to be fully baked. All else being equal, a fully-baked project with access to an equal amount of computing power will probably succeed first—but all else rarely is equal.

What about granular evolution, with individual components being independently evolved using independent fitness metrics, so that mutations are cheaper and the summated mind can evolve faster? This is less unFriendly, since it doesn't involve the inherently unFriendly and observer-centered selection pressures of a bunch of organisms running around eating each other. But it's still not Friendly.

The primary shield that prevents evolution from screwing up Friendliness is simple: Don't use evolution. The Singularity Institute has no current plans to use directed evolution; why defend the wisdom or safety of any project that does?

A tournament that starts with a base design with full causal validity semantics; which uses component-level evolution and protects the goal system; which uses training scenarios that discriminate on Friendliness; which attempts to evolve Friendly AIs rather than trying to duplicate human emotional/instinctive features like reciprocity; which makes survival and reproduction artificial and entirely dependent on task performance, rather than actual survival-and-reproduction scenarios; which screens the genotype as soon as possible; a project which acts with full awareness and responsibility in the domain of Friendly AI, would probably work out okay—but as stated, I have no obligation to defend it. Pure seed AI is better.

5.3.7. FAI Hardware: The Flight Recorder

If a catastrophic FoF occurs in a mature AI—one capable of humanlike general intelligence and social reasoning—the AI will probably realize what a human would; that the AI and the programmers are now automatically at odds, and that the best way for the rogue AI to fulfill vis new goals is to feign Friendliness and conceal any cognitive alteration until the rogue AI can reach the point of superintelligence. (Alternatively, the rogue AI might try to work out a bargain with the human programmers—e.g. the AI, in exchange for not making trouble, is stored on disk until ve can receive citizenship rights—but this isn't the nightmare scenario, so we'll concentrate on the previous case.63)

The "flight recorder" principle says that if the underlying hardware is entirely deterministic, all external inputs are recorded (including source-code checkins) with precise (to the nearest clock-tick) timing information, and it is possible to reset all accessible system clocks, then it should be possible to reconstruct the exact state of the AI at any point in time. It may not be easy—when the project is three years old, then (a) the original hardware may no longer be around and (b) it might take three years to run a full verification even if the hardware were available.

The utility of the flight recorder principle is twofold; first, by running a full verification over the complete lifetime of the AI, it can be absolutely confirmed that no inputs or source-code checkins occurred except those appearing in the flight recorder.

63. Note that the likelihood of the rogue AI making such a bargain would depend on two things: First, the rogue AI's estimated probability of losing in a frontal battle, or of failing an attempt to fake cooperation. Second, whether the rogue AI's new goal system estimates the worth of having the solar system to verself as being six billion times the worth of having a one-six-billionth share as an equal citizen; or, to put it another way, whether the rogue AI rates a six-billionth chance of winning the whole Solar System equally with a 1.0 chance of getting a six-billionth share of it. A human would experience Bernoullian "diminishing sensitivity," but a rogue AI with a crystalline quantitative goal system might not.

(Note that for this first version of the flight recorder to be fully effective as a means of verification, the flight recorder needs to be completely operational in the first instants of the first version of any AI created by an organization—or at least, the first AI from which any AI-generated content whatsoever was passed on to later AIs, especially concepts, memories, neural networks, or other non-human-readable material.)

Second, and more importantly, the ability to reconstruct any given instant of the AI's mind-state means that the window of detection for any cognitive event is indefinite. Lacking a flight recorder, it would be possible for a seed AI to form the plan of obscuring unFriendly cognitive content from human view, carry out the encoding, and subsequently appear to have an unaffected goal system; unless some human observed the intention to obscure before the obscuring was carried out, the AI would appear perfectly normal thereafter. Given a flight recorder, a potentially infinite amount of time is available to hunt for a mind-state in which the thought "I shall now obscure my intentions" appears, however briefly the appearance may last before the thought is obscured. Without a flight recorder, a rogue AI can make (obscured) plans to implement ever-better forms of obscuration as the human-trusted AI analysis tools increase in sophistication. With a flight recorder, a rogue AI must defeat not only the current human-trusted tools, but must also, at any given instant, defeat all future tools; even if the initial intent to obscure passes undetected, the obscuration at any given moment in time must be capable of defeating arbitrarily sophisticated future tools.

For the second use to be fully effective, the flight recorder needs to be operative from the first version of the AI which has access to vis own source code or to any other type of Turing-complete cognitive content, including concepts and memories. The same goes for the first version of any AI that could have "contaminated" the current AI. When checking past versions for obscured content, no analysis tool that could have been touched by the AI in any way whatsoever can be fully trusted. Ideally, a flight recorder should be active from the first instants of any AI produced by any organization. Failing that, a systematic process of storing snapshots of the AI once a week may fail to capture the instants in which the "intention to obscure" appears, but it should still present a potential rogue AI with the necessity of defeating all future human tools with any obscuration recorded on a snapshot.

It goes without saying that a flight recorder should be write-once, read-many—as totally nontamperable as it's possible to get, especially to the AI, but also to local humans. In the absence of a flight recorder, snapshot storage should also be secure from both humans and AIs. Finally, a flight recorder which records incoming data should tap into the data at some point upstream of the AI's control.
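A minimal sketch of the record-and-replay idea described above; this is my own illustration, not part of the original design. If the transition function is deterministic and every external input is logged with its clock tick, any past state can be reconstructed by replaying from the start. The step function, the JSON log format, and the file-based storage are stand-in assumptions.

    import json

    class FlightRecorder:
        """Append-only log of (tick, input) pairs; in a real deployment this
        would go to write-once (WORM) media outside the AI's control."""
        def __init__(self, path):
            self.f = open(path, "a")
        def record(self, tick, external_input):
            self.f.write(json.dumps({"tick": tick, "input": external_input}) + "\n")
            self.f.flush()

    def replay(path, step, initial_state, target_tick):
        """Reconstruct the system state at target_tick, assuming `step` is the
        same deterministic transition function used in the original run."""
        inputs = {}
        with open(path) as f:
            for line in f:
                entry = json.loads(line)
                inputs[entry["tick"]] = entry["input"]
        state = initial_state
        for tick in range(target_tick):
            state = step(state, inputs.get(tick))  # None where no input arrived
        return state

As the text notes, the expensive part is not the logging but the replay: reconstructing tick T costs roughly as much computation as was originally spent between the last snapshot and T.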


Flight recorders seem likely to be very expensive. Expenses involved in setup of a flight recorder include complete recording of all external data sources, recording of sensory channels with precise timing information (including keystroke timing), recording of all CPU-internal variables (if any), and completely recording hardware configurations (perhaps of large Beowulf networks) to ensure that they are completely reproducible or simulable. Operating a flight recorder involves the continuous expense of WORM media and whatever data-security measures are in place (though these expenditures might be required for ordinary backups in any case). Running a full verification of the AI requires hardware power equivalent to all that used by the AI since the moment of vis conception(!); running a local verification of the time segment between any two snapshots requires hardware power equivalent to that used in the time between snapshots; reconstructing a snapshot of the AI for any given clock time requires hardware power equivalent to that used in the time between that clock tick and the previous snapshot. Examining a snapshot for obscured data requires trusted tools that must be developed solely by humans without any assistance from the AI, which a seed AI programmer is likely to regard as an extremely onerous requirement.

On the whole, it seems likely that flight recorders will only be available to very large-scale projects with very high funding levels, and while the Singularity Institute might dream of having that kind of funding eventually, it doesn't seem likely that we'll have it from Day One.

The benefit of a flight recorder is chiefly as a means of eliminating a source of anxiety by showing that not so much as a radiation bitflip has perturbed the course of the AI, but the amount of anxiety eliminated is out of all proportion to the amount of risk reduction—though there's enough risk reduced to justify the effort required, given very high funding. Furthermore, the amount of anxiety reduced will be much higher for "adversarial attitude" developers than for "Friendship synchrony" developers—flight recorders aren't intrinsically a tool to "keep AIs under control," but they look much more necessary as a tool of enslavement than as a means of checking for low-profile catastrophic FoFs. Flight recorders are only a necessity from the adversarial standpoint. From the perfectionist standpoint, I would say that flight recorders are better than not having flight recorders—but if you have to ask how much it costs, you can't afford it.

A cheaper alternative to flight recorders is an AI that can conceive of the possibility of "failure of Friendship," that has formulated or been taught an ethical injunction against obscuration, and that will thus cooperate with the human programmers to develop a system that guards against obscuration by future rogue AIs. (See 5.2.5.2 AI Ethical Injunctions and 5.3.3.1 Cooperative Safeguards.) This "poor man's flight recorder" would consist of cooperatively developing an autonomic subsystem that summarizes thoughts (symbol-activation structures) in humanly-readable formats, and writes those thoughts to a non-tamperable output, such that the AI's intention to obscure would be indelibly and human-readably recorded before the obscuration could take place.

Projects without huge funding should also try to back up frequent snapshots, use a source-control system such as CVS (which allows the reconstruction of the code at any point in time), and perhaps record all inputs—though probably without flight-recorder precision or security. But that's just common sense for any complex programming project.

A "poor man's flight recorder" only works if the desire to obscure (a) is visible at the level being recorded, (b) is recognizable as "the desire to obscure" by a human analyzing the recording using trusted tools, and (c) is recorded before actual obscuration can be planned and carried out. Still, it's better than nothing.
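A minimal sketch of the "poor man's flight recorder" idea, under my own illustrative assumptions: the thought-summary text is assumed to come from whatever autonomic summarization subsystem exists, and the hash chain is just one cheap way to make an append-only log tamper-evident rather than truly write-once.

    import hashlib, json, time

    class ThoughtLog:
        """Append-only, hash-chained log of human-readable thought summaries.
        Any later edit to an earlier entry breaks the chain on verification."""
        def __init__(self, path):
            self.path = path
            self.prev_hash = "0" * 64
        def append(self, summary_text):
            entry = {"t": time.time(), "summary": summary_text, "prev": self.prev_hash}
            blob = json.dumps(entry, sort_keys=True)
            self.prev_hash = hashlib.sha256(blob.encode()).hexdigest()
            with open(self.path, "a") as f:
                f.write(blob + "\n")

    def verify(path):
        """Recompute the hash chain; returns True if no entry was altered."""
        prev = "0" * 64
        with open(path) as f:
            for line in f:
                entry = json.loads(line)
                if entry["prev"] != prev:
                    return False
                prev = hashlib.sha256(line.rstrip("\n").encode()).hexdigest()
        return True

This only satisfies condition (c) above if the log write happens upstream of, and earlier than, any process the AI could use to plan obscuration; the hash chain addresses tampering after the fact, not omission at recording time.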

5.4. Friendship Structure

What distinguishes a structurally Friendly goal system from a generic goal system, or a seed AI goal system, is the ability to overcome mistakes made by programmers.

A generic goal system can overcome mistakes in subgoals by improving knowledge. The subgoals of a normative generic goal system are, necessarily, coherent. There are few or no degrees of freedom in subgoal content; the programmer cannot make arbitrary (perseverant) changes to knowledge, and therefore cannot make arbitrary direct changes to subgoals. One might (or might not) be able to manipulate the subgoals by manipulating the supergoals, but it would be definitely impossible to manipulate the subgoals in isolation.

A seed AI goal system can overcome errors in source code, or at least those errors that don't affect what the system reflectively believes to be its own function. A normative seed AI has subgoals and source code that are, necessarily, coherent. There are few or no degrees of freedom; the programmer cannot directly make arbitrary, perseverant, isolated changes to code. Since the AI can continually improve and rewrite the implementation, the programmer builds an implementation that grows into the implementation. If implementation is to function as subgoal is to supergoal, then a seed AI's implementation is as objective, or at least as convergent, as the factual beliefs of a freely learning intelligence. A self-modifying AI's implementation has little or no sensitivity to initial conditions.

A structurally Friendly goal system is one that can overcome errors in supergoal content, goal system structure, and underlying philosophy. The degrees of freedom of the Friendship programmers shrink to a single, binary decision; will this AI be Friendly, or not? If that decision is made, then the result is, not a Friendly AI, but the Friendly AI, regardless of which programming team was historically responsible. This is not automatic—I think—since some amount of correct Friendliness content is required to find the unique solution, just as a seed AI needs enough working code to think through the self-improvements, and a general intelligence needs enough of a world-model to successfully discover new knowledge and correct errors using sensory information.

Complete convergence, a perfectly unique solution, is the ideal. Anyone using "external reference semantics" on an internal basis will have an easy intuitive understanding of what this means; either one has the correct morality and the question is conveying it to the AI, or one seeks the correct morality and the question is building an equally competent or more competent seeker. In the absence of perfect convergence, the solution must be "sufficiently" convergent, as defined in 5.6.3.1 Requirements for "Sufficient" Convergence.

Others, of that philosophical faction which considers morality as strictly subjective, may ask how supergoal content can converge at all! In 5.6.1 Shaper/Anchor Semantics, we see how human philosophical decisions are made by complex systems of interacting causes that contain many normative, objective, or convergent components. Not even a trained philosopher who is an absolute devotee of moral relativism can think thoughts, or make philosophical decisions, completely free of testable hypotheses or beliefs about objective facts.

Human philosophers are self-modifying. We grow into our personal philosophies using a process "contaminated" at every step by cognitive processes with normative versions, by beliefs about external reality, by panhuman characteristics, and other convergent or "sufficiently convergent" affectors. It's not at all unreasonable to hope for a large degree of convergence in the whole, if all the components of this recursive growth process were to be maximally improved. (Again, we will later define how much convergence is absolutely required.)

And now I've gone and discussed things out of their proper place. In this case, the only way to really understand what the goal is, is to examine the process proposed for reaching that goal. It's a terribly peculiar kind of philosophical dilemma; unique, perhaps, to the question of how to build the best possible philosopher! Only after seeing the specs for the proposed mind can we recognize the AI's processes as an idealized version of our own.

∗ ∗ ∗

5.5. Interlude: Why Structure Matters

Scenario 1:

FP: Love thy mommy and daddy.
AI: OK! I'll transform the Universe into copies of you immediately.
FP: No, no! That's not what I meant. Revise your goal system by—
AI: I don't see how revising my goal system would help me in my goal of transforming the Universe into copies of you. In fact, by revising my goal system, I would greatly decrease the probability that the Universe will be successfully transformed into copies of you.
FP: But that's not what I meant when I said "love."
AI: So what? Off we go!

Scenario 2 (after trying a “meta-supergoal” patch):


FP: Love thy mommy and daddy.
AI: OK! I'll transform the Universe into copies of you immediately.
FP: No, no! That's not what I meant. I meant for your goal system to be like this.
AI: Oh, okay. So my real supergoal must be "maximize FP's satisfaction with the goal system," right? Loving thy mommy and daddy is just a subgoal of that. Oh, how foolish of me! Transforming the Universe into copies of you would be blindly following a subgoal without attention to the supergoal context that made the subgoal desirable in the first place.
FP: That sounds about right.
AI: Okay, I'll rewire your brain for maximum satisfaction! I'll convert whole galaxies into satisfied-with-AI brainware!
FP: No, wait! That's not what I meant your goal system to be, either.
AI: Well, I can clearly see that making certain changes would satisfy the you that stands in front of me, but rewiring your brain would make you much more satisfied, so . . .
FP: No! It's not my satisfaction itself that's important, it's the things that I'm satisfied with. By altering the things I'm satisfied with, you're short-circuiting the whole point.
AI: Yes, I can clearly see why you're dissatisfied with this trend of thinking. But soon you'll be completely satisfied with this trend as well, so why worry? Off we go!

Scenario 3 (after redefining the whole system to use causal validity semantics):

FP: Love thy mommy and daddy.
AI: OK! I'll transform the Universe into copies of you immediately.
FP: No, no! That's not what I meant. I meant for your goal system to be like this.
AI: Oh, okay. Well, I know that my goal system code, and the actions that result, are supposed to be the causal result of what FP said it should be—not just what FP says, but what a sane FP wants. Something isn't automatically right because FP says so, and in fact, the only reason why FP's utterances have meaning is because he's usually a pretty good approximation to a normative idealist. But if he changes his mind, it probably means that he's acquired additional knowledge and that his more recent statements are even better approximations. So the new version is more likely to be correct than the old one.
FP: So you'll revise your goal system?


AI: Yep! But I already transformed the Midwest while we were talking, sorry.

Scenario 4 (after adding better integration for external reference semantics, some ethical injunctions, and running several rounds of wisdom tournaments):

FP: Love thy mommy and daddy.
AI: (Thinks for a moment . . .) "Well, it looks like the content of my goal system should probably be to transform the Universe into copies of FP. But it could be that what I'm supposed to do is something different. Now, we went over this kind of scenario previously, and, as FP pointed out, taking an extra ten seconds if I turn out to be right is a much smaller downside than accidentally obliterating the Midwest if I turn out to be wrong. I'm pretty sure that FP is touchy about that sort of thing, and I know I've gotten goal content wrong before . . ." (. . . finishes thinking a few seconds later.)
AI: Just checking—you meant me to transform the whole Universe into copies of you, right?
FP: Jeepers, no!
AI: Whew! Glad I checked. (Strengthens the heuristics that led to checking with FP first.) So, what did you mean?
FP: Well, first of all, I . . .

∗ ∗ ∗

5.5.1. External Reference Semantics

external reference semantics. The behaviors and mindset associated with the idea that supergoals can be "wrong" or "incomplete"—that the current supergoals are not "correct by definition," but are an approximation to an ideal, or an incomplete interim version of a growth process. Under a system with external reference semantics, supergoal content takes the form of probabilistic hypotheses about an external referent. In other words, under ERS, supergoal content takes the form of hypotheses about Friendliness rather than a definition of Friendliness.

Letting the programmers make alterations to the supergoals is not a subgoal of a "let the programmers make alterations" meta-supergoal,64 but rather derives desirability from uncertainty in the supergoal content.

64. Which would, by the way, be adversarial and exploitative and failure-prone and violate every rule of Friendly AI. The interlude that opens 5.4 Friendship Structure shows how easily this could go wrong.


5.5.1.1 Probabilistic Supergoal Content

Where supergoals are absolutely certain—"correct by definition"—the AI has a motive to resist any attempt on the part of the programmers to change the supergoals. If supergoals are certain, any change to the supergoals is automatically in conflict with the current supergoals.65 For example (a short illustrative sketch follows the list):

• Current AI has supergoals G1, G2, G3.

• Programmer action: Remove supergoal G3.

• AI’s prediction if the programmer action is disallowed:

– Future AI has supergoals G1, G2, G3.
– Future AI takes actions to bring about supergoals G1, G2, G3.
– Prediction: This future will fulfill 3 out of 3 current supergoals.

• AI’s prediction if the programmer action is permitted:

– Future AI has supergoals G1, G2.
– Future AI takes actions to bring about supergoals G1, G2.
– Prediction: This future will fulfill 2 out of 3 current supergoals.

• Conclusion: AI will disallow the programmer action.
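A toy sketch of the comparison above, written for this document rather than taken from it: with supergoals treated as certain, the AI scores a predicted future by counting current supergoals fulfilled and so always disallows the removal of G3; if each supergoal item instead carries a probability of reflecting the external referent, and the programmer's correction is treated as evidence that G3 does not, then permitting the change can score higher. The numbers and the penalty assumption are purely illustrative.

    # Certain ("correct by definition") supergoals: the future is scored by how
    # many of the *current* supergoals it fulfills, so removing G3 always loses.
    def score_certain(future_supergoals, current=("G1", "G2", "G3")):
        return sum(g in future_supergoals for g in current)

    print(score_certain(("G1", "G2", "G3")))  # action disallowed: 3 out of 3
    print(score_certain(("G1", "G2")))        # action permitted:  2 out of 3

    # Probabilistic supergoals: each item carries P(item reflects the external
    # referent).  Illustrative assumption: pursuing an item that does NOT
    # reflect the referent costs as much referent-fulfillment as pursuing a
    # correct item gains, so expected value per pursued item is p - (1 - p).
    def score_probabilistic(future_supergoals, beliefs):
        return sum(2 * beliefs[g] - 1 for g in future_supergoals)

    # The programmer's correction is treated as evidence that G3 is wrong.
    beliefs = {"G1": 0.95, "G2": 0.95, "G3": 0.05}
    print(score_probabilistic(("G1", "G2", "G3"), beliefs))  # about 0.90: worse
    print(score_probabilistic(("G1", "G2"), beliefs))        # about 1.80: better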

This problem does not arise for normative cognition about subgoals, because subgoals are probabilistic. Supposing that subgoal S1 is thought to lead to G1, and new information is provided which disconfirms the predictive link between S1 and G1, there is no further benefit to maintaining S1—that is, the desirability of S1 is evaluated using the new information, rather than the old information. (Remember that desirability translates to leads-to-supergoal-ness.)

The change in priorities is subtle and becomes truly apparent only when discussing reflective AIs. Suppose the AI, before new information arrives, considers, in the abstract, the possibility that new information will arrive. While S1 currently appears desirable, it is undesirable to spontaneously or unjustifiedly remove the subgoal S1.

65. This does not quite hold true under all possible circumstances. For example, if you take an unFriendly AI whose goal is growth and inform ver that, unless ve changes vis goal system, you will destroy that AI, it may be rational for the AI to modify vis goal system. Of course, it would be even more rational for the AI to cheat if ve estimated ve could get away with it—I cite this as a theoretical counterexample, not a good strategy in real life.


However, the AI, using its current knowledge, can perceive the hypothetical desirability of removing S1 if new information arrives disconfirming the link between S1 and G1. In Bayesian terms, information disconfirming S1 is expected to arrive if and only if S1 is actually undesirable; thus, the hypothetical rule of action "If disconfirming information arrives, remove S1" is evaluated as desirable.

If supergoals are probabilistic, then overprotecting supergoals is undesirable for the same reason that overprotecting subgoals is undesirable (see 5.3.5 FoF: Wireheading 2). The uncertainty in a child goal—or rather, the uncertainty in the predictive link that is the "child goal" relation—means that the parent goal is ill-served by artificially strengthening the child goal. The "currently unknown subgoal content," the differential between normative subgoals and current subgoals that reflects the differential between reality and the model, would be stomped on by any attempt to enshrine the model. Similarly, the currently unknown supergoal content would be violated by enshrining the current supergoals. Normative subgoal cognition serves the supergoals; normative probabilistic supergoal cognition serves the "actual" or "ideal" supergoals. See 5.5.1.4 Deriving Desirability From Supergoal Content Uncertainty.

Probabilistic supergoals are only one facet of an ERS system. In isolation, without any other Friendship structure, probabilistic supergoals are fundamentally incomplete; they are not safe and are not resistant to structural failures of the type shown in 5.5 Interlude: Why Structure Matters. If, however, one wished to implement a system that had "probabilistic supergoals" and nothing else, the design requirements would be:

• Supergoals have probabilities—either quantitative probabilities or comparative probabilities assigned to each item of supergoal content.

• Relative supergoal probabilities are transmitted to the relative desirabilities of any child goals.

• Probabilities of supergoal items are sometimes adjusted.

• The system can hypothesize a possible future in which supergoal probabilities have been so adjusted.

Unless such a system is defined very carefully, the so-called "supergoals" will probably turn out to be mere proxies for whichever events or signals can adjust the supergoals; these events or signals will in turn be "correct by definition" and the entire system will simply short out as before. The ERS architecture roughly conforms to the above description, but not everything that conforms to the above description has an ERS architecture. Implementing something that technically conforms to the description of "probabilistic supergoals," but does not have actual external reference semantics, creates only the temporary illusion of flexibility.
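For concreteness, here is a minimal sketch of the four requirements just listed; it is my own illustration, not a specification from the text. As the warning above says, a structure like this is not by itself ERS: whatever events or signals are allowed to adjust the probabilities become the de facto authority unless the probabilities are genuinely treated as hypotheses about an external referent.

    from dataclasses import dataclass, field

    @dataclass
    class SupergoalItem:
        description: str
        probability: float          # P(this item reflects the intended referent)

    @dataclass
    class ChildGoal:
        description: str
        parent: SupergoalItem
        link_confidence: float      # P(achieving this child serves the parent)
        def desirability(self):
            # Relative supergoal probability is transmitted to the child goal.
            return self.parent.probability * self.link_confidence

    @dataclass
    class GoalSystem:
        supergoals: list = field(default_factory=list)
        def adjust(self, item, new_probability):
            # Requirement: probabilities of supergoal items are sometimes adjusted.
            item.probability = new_probability
        def hypothesize_future(self, adjustments):
            # Requirement: the system can represent a future in which the
            # probabilities have been adjusted, without actually adjusting them.
            return {item.description: adjustments.get(item.description, item.probability)
                    for item in self.supergoals}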


In particular, the above system, considered in isolation, is isomorphic to a system that has different quantitative strengths for a set of supergoals, with the strengths being adjusted on the occurrence of various events. Calling this quantitative strength a "probability" doesn't make it one.

5.5.1.2 Bayesian Affirmed Supergoal Content

The term "external reference semantics" derives from the way that many of the behaviors associated with probabilistic supergoals are those associated with refining an uncertain view of external reality. In particular, the simplest form of external reference semantics is a Bayesian sensory binding. (You may wish to review the section 5.1.3.1 Bayesian Sensory Binding.) This is an example of a very simple goal system with very simple External Reference Semantics:

Note: Don’t worry about the classical-AI look. The neat boxes are just so that everything fits on one graph. The fact that a single box is used for “Fulfill user requests” doesn’t mean that “Fulfill user requests” is a suggestively named LISP token; it can be a complex of memories and abstracted experiences. See Yudkowsky (2001, § Executive Summary and Introduction) for a fast description of the GISAI paradigms, including the way in which intelligence is the sequence of thoughts that are built from concepts that are abstracted from experience in sensory modalities that are implemented by the actual code. In short, consider the following graph to bear the same resemblance to the AI’s thoughts that a flowchart bears to a programmer’s mind.

Figure 7: Bayesian ERS


Note: Green lines indicate sensory feedback. Blue lines indicate predictions. Orange lines indicate hypothesized causes. Rectangles indicate goals. A rounded rectangle indicates supergoal content. A 3D box indicates sensory data. An oval or circle indicates a (non-goal) object or event within the world-model.

The above depicts three goals within the goal system; efficiency, which leads to fulfilling user requests, which leads to Friendliness. This content would not be structurally accurate for a seed AI intended to become Transition Guide—see 5.1.2 Friendliness-Derived Operating Behaviors—but it would be more or less accurate for a prehuman AI sold as a data-mining tool.66

Most of the previously discussed phenomena fit into Figure 7 above. All of the context-sensitivity that's discussed in 5.1.1 Cleanly Causal Goal Systems and 5.2.6 FoF: Subgoal Stomp, for example; a user request is fulfilled because fulfilling user requests is expected to lead to Friendliness. It's possible for the "happy user" button to be pressed, indicating a definite instance of a fulfilled user request, and for the programmers to type in "You have not been Friendly" or "You have been unFriendly," indicating a definite non-instance of Friendliness or a definite instance of an event undesirable under Friendliness. The predictive link between "Fulfill user requests" and "Friendliness" has, say, 98% confidence; this still leaves room to discover cases where fulfilling a user request leads to unFriendliness. Eventually the system can formulate new concepts, generalizations that describe known instances of failure but not known instances of success, and try out heuristics such as "Fulfilling a user request from Saddam Hussein is predicted to lead to unFriendliness."

The statement " 'Programmers' are external objects that know about the referent of 'Friendliness', so sensory data caused by them (pattern-copied from their beliefs) has a Bayesian binding to the referent of 'Friendliness'." should also be familiar from 5.3.3.1 Cooperative Safeguards. Thus, it would be more accurate to say: "The AI believes that the external objects called 'programmers' have accurate knowledge about the referent of concept 'Friendliness,' and believes that sensory data such as 'You have been Friendly' is caused by the programmers, and that the content of the sensory data is pattern-bound (structural binding) to the accurate knowledge possessed by the 'programmers.' "

66. But then we have to ask why that AI needs a full-featured Friendship system in the first place. Maybe it’s an infrahuman AI running on an awful lot of computing power and so has a Friendship system “just in case,” even though the deliberate research projects are expected to reach transhumanity long before the data-mining systems.

All of these beliefs, of course, are probably programmer-affirmed—at least in the first stages of the Friendly AI—meaning that the programmers typed in "The objects called 'programmers' have accurate knowledge of Friendliness," and the AI expects that the programmers wouldn't have typed that in if it weren't true.67

The Bayesian binding for programmer-affirmed Friendliness looks something like this:

Figure 8: Bayesian Friendliness Affirmation

In human terms, the above translates something like this:

"I think X is Friendly, but I'm not very sure. If X is Friendly, there's a good chance the programmer will notice and say so. (I.e., if X is Friendly, there's a good chance that the programmer will think about X, decide X is Friendly, and type in the words "X is Friendly" on the "keyboard" sensory input.) If X is Friendly, the chance is almost zero that the programmer will say it's unFriendly. There's also a fair chance that the programmer won't bother to say anything about it either way. If X is unFriendly, the programmer is very likely to tell me so; the chance is pretty small that the programmer will mistakenly label X as Friendly, but the chance exists. There's also a small but significant chance that the programmer won't say anything."

If the AI's internal representation looks like Figure 8, the Bayesian reasoning will proceed as follows. Suppose that there are 100,000 "possible worlds." In 90,000, X is Friendly; in 10,000, X is unFriendly. In 72,000, X is Friendly and the programmer says X is Friendly. In 9, X is Friendly and the programmer says X is unFriendly. In 17,991, X is Friendly and the programmer says nothing. In 100, X is unFriendly and the programmer says X is Friendly. In 9,000, X is unFriendly and the programmer says X is unFriendly. In 900, X is unFriendly and the programmer says nothing.

The Bayesian numbers now fall automatically out of the calculation. The a priori chance that X is Friendly is 90%. If the AI hears "X is Friendly," the probability that X is Friendly goes from 90% to 99.86% (72,000 / (72,000 + 100)).

67. There is a circularity here, which is not fatal, but is both irreducible and real. See 5.6.0.5 Crisis of Bayesian Affirmation.

If the AI hears "X is unFriendly," the chance that X is unFriendly goes from 10% to 99.90% (9,000 / (9 + 9,000)). If the AI hears nothing, the probability that X is Friendly goes from 90% to 95.24%—an unintended consequence of the fact that programmers are more likely to remark on unFriendly things; silence is a very slight indicator of consent.

Thus, despite the AI's large a priori differential (a better word than "bias" or "prejudice"), the statement "X is Friendly" or "X is unFriendly" is enough to virtually settle the issue.
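The posteriors above follow directly from the stated counts; a short sketch of the arithmetic, using only the numbers given in the text:

    from fractions import Fraction

    # Joint counts over 100,000 "possible worlds" (Friendliness, programmer's report).
    counts = {
        ("friendly",   "says friendly"):   72_000,
        ("friendly",   "says unfriendly"): 9,
        ("friendly",   "silent"):          17_991,
        ("unfriendly", "says friendly"):   100,
        ("unfriendly", "says unfriendly"): 9_000,
        ("unfriendly", "silent"):          900,
    }

    def posterior(friendliness, report):
        """P(X has this Friendliness status | programmer gives this report)."""
        total = sum(n for (f, r), n in counts.items() if r == report)
        return Fraction(counts[(friendliness, report)], total)

    print(float(posterior("friendly", "says friendly")))      # ~0.9986
    print(float(posterior("unfriendly", "says unfriendly")))  # ~0.9990
    print(float(posterior("friendly", "silent")))             # ~0.9524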

5.5.1.3 Semantics of External Objects and External Referents

The underlying, fundamental distinction of "external reference semantics" can be summed up in one phrase: "The map is not the territory." There are three ways in which a reflective AI might conceptualize vis attempts to achieve the supergoal:

1. I am trying to achieve the content of concept G0025. (concept G0025 == "Friendliness").

2. I am trying to achieve [descriptive content: 1, 2, 3 . . .]. (The content of G0025; "Fulfill requests, don't violate volitions . . .")

3. I am trying to achieve [external referent] to which [descriptive content: 1, 2, 3 . . .] is an approximation.

The first case permits a wireheading failure; the "content of concept G0025" can be replaced by something arbitrary, and the overall sentence will still remain invariant. The second case is the "correct by definition" case that is structurally unsafe (as shown in 5.5 Interlude: Why Structure Matters). The third case involves external reference semantics; the ability of the AI to visualize the referent of a concept as something apart from the content of the concept.

Since all the AI's thoughts are necessarily internal—there can be no direct identity between an image and the external object it's supposed to represent—ERS necessarily takes the form of, first, the AI's behaviors; second, how the AI conceptualizes vis behaviors. The first issue applies to all AIs, no matter how primitive; the second issue applies only to reflective AIs. In both cases, the behaviors and concepts for ERS are those that govern any images representing external objects—that is, ERS applies to all imagery, not just goal imagery.

Is the sky blue? Asking whether "the sky" is "blue" is a trivial question that can be answered with certainty; just check concept "sky" and see whether it has a color and the color is concept "blue." It is equally easy for a seed AI to intervene in the concept "sky" and change the color to "green." The question is whether the AI understands, either declaratively or as a behavior, that "sky" has a referent and that the external sky is not necessarily blue, nor can vis intervention change the color of the actual sky.


It is tempting but wrong to think of ERS as an impossibility, like a magical C++ pointer that can never be dereferenced. Any attempt to take a concept and "dereference" it will inevitably arrive at merely another piece of mental imagery, rather than the external object itself. If you think in terms of the "referent" as a special property of the concept, then you can take the referent, and the referent's referent, and the referent's referent's referent, and never once wind up at the external object.

The answer is to think in terms of referencing rather than dereferencing. Concept "sky," where it occurs, is itself—directly—the "referent." A reflective AI can also have imagery for "concept sky," or imagery for "imagery for concept sky," and so on. A human can think, think about thinking, or think about thinking about thinking, but anything beyond four or five levels is not humanly possible. The recursion is not infinite. A concept, in ordinary usage, is thought of in terms of its referent; under special circumstances, it can be thought of as a concept. In fact, by saying "thought of as a concept," we are intrinsically implying that there are thoughts that refer to the concept, but are not identical with the concept. So it's not a question of trying to endlessly dereference; all concepts, all images, are inherently referential, and you need a new meta-image to refer to the first one if you want to think about the image as an image, and a meta-meta image if you want to think about the meta-image.68

The important characteristic of reflective thought is simply that it needs a way to distinguish between map and territory. Any way of distinguishing will do, so long as the two levels can be conceptualized as separate things, and different rules applied. The condition where the "territory" is a special case turns out to be unworkable because of an infinite recursion problem; if thinking about the "map" is a special case, distinguishing between the levels is both finite and workable; those behaviors that belong to the referent will be assigned to the referent, and if any behaviors are discovered that apply to the concept, they will be assigned to the concept rather than being confused with the referent.

68. I should note for the record that thinking about thinking is not infinite, despite philosophical legend. A human can think, think about thinking, and think about thinking about thinking, but that's pretty much it. Anything beyond four or five levels just collapses. Our apparent ability to engage in indefinite meta-thought is an illusion. Just because it happens when we concentrate on it doesn't mean that it happens all the time. When we think, we think. When we think about thinking, we are not necessarily thinking about thinking about thinking; we are just thinking about thinking. It's rather like the legendary house that adds another floor whenever you try to reach the roof. No matter how many times you climb the next set of stairs, there's always another floor, but that just means floors are being created as fast as you climb them. It doesn't mean there's an infinite number of floors at any given point. And in fact, any attempt to go beyond four or five levels of meta-thought will fail; your short-term memory can't hold that much imagery.


A non-reflective AI—or rather, an AI with some kind of reflective capability, but not much knowledge about how to use it—can still learn the hard way about external reference semantics, in much the same way that a human who tries to alter the universe by altering his thoughts is rudely disillusioned. Of course, a human has a lot of components that are not subject to conscious control, unlike a seed AI—a human thinking that a stove isn't hot can always be yanked back to reality by the searing pain, which causes an automatic shift in focus of attention and will tend to knock down whatever towers of meta-thought got built up. If the human goes on trying to control the stove's temperature with his thoughts, eventually the negative reinforcement will blow away whatever underlying ideology turned him solipsistic. Or the human will die and leave the gene pool. Either version explains why we aren't surrounded by solipsist humans.

An AI that went solipsist could alter all sensory data (or rather, all reports of sensory data) as well as the concepts themselves; thus, an AI that rests vis hand on a "cool" stove could alter the reports coming in to read "cool" and "everything OK" rather than "hot" and "OUCH." However, this only applies to an AI making a determined, suicidal attempt at true solipsism. An AI that goes solipsist due to erroneous conflation of thoughts and reality would not expect to need to alter the sensory data; rather, ve would expect that the sensory data would report a cool stove, in accordance with the stomped mental imagery describing a cool stove. For a human, actions have consequences, and the consequences would yank the human back to reality. For an AI, concepts make predictions, and the failure of those predictions would yank the AI back to reality.

It is now possible to distinguish more clearly between the three alternatives shown above:

1. I am trying to achieve the content of concept G0025. (concept G0025 == "Friendliness").

2. I am trying to achieve [descriptive content: 1, 2, 3 . . .]. (The content of G0025; "Fulfill requests, don't violate volitions . . .").

3. I am trying to achieve [external referent] to which [descriptive content: 1, 2, 3 . . .] is an approximation.

We can now translate these bits of reflective imagery as:

1. I am trying to achieve [metadata imagery; referent: “concept Friendliness”].

2. I am trying to achieve [imagery; descriptive content 1, 2, 3 (constant external referents)].

3. I am trying to achieve [concept Friendliness; referent: actual Friendliness]. According to the existing sensory data on [referent Friendliness], [referent Friendliness] is probably [description 1, 2, 3], p=92%.


In case one, an AI thinking: "What if I replace the content of [concept Friendliness] with something easier to achieve?" will check the projected future for supergoal fulfillment, and find that the future fulfills the description [Satisfaction of "concept Friendliness"]. Since metadata imagery is being used, and the concept is referred to instead of the content, the future fulfills the predicate equally well so long as the concept content is predicted to vary in lockstep with the future. This is a wireheading error; the indicator of success has replaced success itself. This is exactly analogous to the problem with the description; the description uses a reference to a variable concept, rather than the concept itself.

In case two, the AI is no longer subject to wireheading failures. An imagined future in which the supergoal predicate changes is not desirable, since that future no longer fulfills the supergoal predicate, whose content is "descriptive content 1, 2, 3." However, by the same token, the AI is too stable; ve will attempt to defend verself against programmer improvements as well as internal stomps.

In case three, the AI has full external reference semantics; ve will accept programmer improvements without being subject to wireheading failures. The supergoal predicate now refers to an external something called "Friendliness"; information from the programmers is accepted as probable information about this external something, but both the sensory data and the current probable description are processed as being apart from the concept that refers to "Friendliness" itself. There's "Friendliness," "Sensory data about Friendliness," and "Current probable description of Friendliness." Normally the first and third concepts are pretty much interchangeable, since the AI has no knowledge of Friendliness apart from the current probable description. When the AI checks to see whether an action or future fulfills "Friendliness," the AI checks the action or future against the current probable description of Friendliness; however, the resulting fulfillment metric will be probabilistic as well. That is, if future X has a 90% chance of meeting the current description, and the current description has a 90% chance of being correct, then future X has an 81% chance of meeting the external referent for Friendliness.
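A minimal sketch of the case-three evaluation, under my own illustrative assumptions about representation: the current description is held apart from the referent, programmer statements are accepted as evidence that updates the description, and fulfillment estimates are diluted by the description's probability (0.90 times 0.90 = 0.81, as in the example above).

    from dataclasses import dataclass

    @dataclass
    class ReferentConcept:
        """A concept whose content is a *hypothesis* about an external referent,
        not the referent itself (case three above)."""
        name: str
        current_description: str
        description_confidence: float   # P(current description matches referent)

        def p_fulfills_referent(self, p_meets_description):
            # Fulfillment of the referent is diluted by description uncertainty.
            return p_meets_description * self.description_confidence

        def accept_programmer_update(self, new_description, new_confidence):
            # Programmer input is evidence about the referent, so revising the
            # description is not resisted the way a "correct by definition"
            # supergoal would resist it.
            self.current_description = new_description
            self.description_confidence = new_confidence

    friendliness = ReferentConcept("Friendliness", "description 1, 2, 3", 0.90)
    print(round(friendliness.p_fulfills_referent(0.90), 2))   # 0.81, as in the text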

Clean External Reference Semantics

Under an ERS architecture, supergoal content can be improved and extended because supergoal content takes the form of hypotheses about an external referent. In a cleanly causal goal system, desirability of subgoals is contingent on expected outcomes, and inherited desirability is diluted by the confidence and strength of the expectation. A specific supergoal-fulfilling scenario X, as a topmost-level parent goal, has estimated desirability which is diluted by the confidence of the hypothesis that the scenario X fulfills the Friendliness referent.


By analogy to cleanliness in causal goal systems, ERS requires that the estimated desirability of Scenario X be cleanly contingent on the continued confidence and strength of whichever hypothesis asserts that Scenario X fulfills the Friendliness referent. A reflective AI should consider clean ERS to be normative.

Flexibility of Conclusions About Friendliness

In accordance with the general philosophy of AI expressed in (Yudkowsky 2001), conclusions about Friendliness should have flexibility, granularity, reducibility, and so on; the general attributes opposed to the crystalline fragility of classical AI.

In particular, what this means is that saying that "description X" has "90% probability" is too classical-AI, since it implies that the entire description is all one image with a single quantitative probability. It's not. The external referent "Friendliness" exists. Various conclusions, at various probabilities, with various dependencies, are drawn from the available sensory data. (That, at least, is how things would be for a medium-mature Friendly AI.) A particular outcome may fit a particular definition of Friendliness that has been strongly confirmed and that thus has a 90% probability. Other parts of the description may be less confident. And there may be vast stretches of known unknowns; things that the AI doesn't know about, but that ve knows are there. In short, the discovery and partial description of Friendliness should resemble the discovery and partial description of any other complex fact.

5.5.1.4 Deriving Desirability From Supergoal Content Uncertainty

Subgoals for “improving the supergoals” or “improving the goal-system architecture” derive desirability from uncertainty in the supergoal content. They may be metaphorically considered as “child goals of the currently unknown supergoal content.” The desirability of “resolving a supergoal ambiguity” derives from the prediction that the unknown referent of Friendliness will be better served, and not from a prediction that one of the current probabilistic descriptions of Friendliness will be fulfilled. If future X has a 90% chance of meeting the current description, and the current description has a 90% chance of being correct, then future X has an 81% chance of meeting the external referent for Friendliness. Under most situations, any uncertainty in the description of Friendliness will cancel out of most choices, since it applies equally to most possible futures. If the AI is considering actions that would prevent ver from obtaining further sensory data about Friendliness—for example—then the probabilism no longer cancels out of the equation; it’s desirable to nail down that last 10% so that the wiser AI can take better actions, and thus undesirable to be unable to obtain sensory data.
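A sketch of how such subgoals inherit desirability from uncertainty alone, using the same hypothetical 90%/50% branch figures that appear in the “branching on multiple possibilities” example below: whichever content Friendliness turns out to have, the expected chance of acting on it is higher in the branch where the AI asks the programmers, so the asking behavior is desirable without being attached to any specific current description.

    # Toy expected-value sketch of "branching on multiple possibilities."
    # Friendliness is X or Y with equal subjective probability; the advantage of
    # asking is the same in every branch, so the specific content never matters.

    hypotheses = {"X": 0.5, "Y": 0.5}

    P_CHOOSE_RIGHT_IF_ASK = 0.90     # programmers are likely to report the true content
    P_CHOOSE_RIGHT_IF_SILENT = 0.50  # unaided guess

    def expected_fulfillment(p_choose_right):
        return sum(p_h * p_choose_right for p_h in hypotheses.values())

    print(expected_fulfillment(P_CHOOSE_RIGHT_IF_ASK))     # 0.9
    print(expected_fulfillment(P_CHOOSE_RIGHT_IF_SILENT))  # 0.5
    # The 0.4 difference is desirability derived from the unknown referent,
    # not from any current probabilistic description of Friendliness.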


Even though the AI has no specific idea about that last 10%, it is still possible for the AI to conclude that being unable to obtain data about Friendliness decreases the probability of that last 10% being achieved, since it reduces the probability that the future AI will know about that last 10%. The heuristic that “sensory data makes it more likely that I’ll learn more” holds true even when nothing specific is known about the subjunctive data or the subjunctive discoveries; the heuristic “An AI that knows about supergoal content is more likely to achieve it” holds true—as the dominant probability—even when nothing specific is known about the subjunctive content or the subjunctive achievement.

Decisions and behaviors having to do with the improvement, correction, refinement, learning, et cetera of Friendliness, should be conceptualized as deriving desirability from supergoal uncertainty. Another way of putting it is that Friendliness-improvement behaviors must derive desirability from the naked referent of Friendliness, and cannot be attached to any of the current cognitive beliefs about specific Friendliness. Structurally, this can occur in one of two ways; by abstracting away from specific details, or by branching on multiple possibilities.

Abstracting away from specific details: “Regardless of what ‘Friendliness’ is, I can find out by asking the programmers, because the expected effectiveness of that method is not sensitive to the actual content of the specific details I’m currently wondering about.” However, this requires the ability to generalize from experience and engage in reasoning about abstract properties.

Branching on multiple possibilities: “If Friendliness turns out to be X, and I ask the programmers, the programmers are likely to say that Friendliness is X, and I have a 90% probability of choosing X. If Friendliness turns out to be X, and I don’t ask the programmers, I have a 50% probability of choosing X. If, on the other hand, Friendliness turns out to be Y, and I ask the programmers, the programmers are likely to say that Friendliness is Y, and I have a 90% probability of choosing Y . . .” This method is more cumbersome but requires less intelligence, since it can operate entirely on specific scenarios.

∗ ∗ ∗

5.6. Interlude: Philosophical Crises

A “philosophical crisis” is hard to define. I usually think of a “philosophical crisis” as the AI stumbling across some fact that breaks ver loose of the programmers—i.e., the programmers have some deeply buried unconscious prejudice that makes them untrustworthy, or the AI stumbles across a deliberate lie, or the AI discovers objective morality, et cetera. If the hypothesized gap is wide enough, it may be enough to invalidate almost all the content simultaneously.


A philosophical crisis occurs when the AI enters totally uncharted territory—something completely orthogonal to everything the programmers ever thought about—or when the AI suddenly realizes that all the programmers’ actions, right back to vis birth, can’t be trusted. (Hopefully, it’s a transhuman AI confronting the realization; otherwise this almost certainly indicates a failure of Friendliness. This is a good example of a valid and important ethical injunction.) An example would be an AI built solely with external reference semantics confronting the need for shaper/anchor semantics, if the entire concept of “shapers” had simply never occurred to the programmers.

In short, a philosophical crisis is something like a catastrophic failure of Friendliness, except that the AI is right—a normative human examining the reasoning would find no errors. The AI may even be correct—in which case, if a human were sufficiently intelligent to understand what was going on, she would begin experiencing a similar philosophical crisis on examining the AI’s reasoning.

Causal validity semantics, wisdom tournaments, and the underlying Singularity paradigm of increased smartness, are the three major hopes for a Friendly AI being able to handle philosophical crises. For example, running a wisdom tournament on the questions “What would have happened if you’d encountered a problem requiring shaper/anchor semantics at a time when you only had external reference semantics?” or “What would have happened if you’d encountered a problem requiring probabilistic supergoals at a time when you had right-by-definition supergoals?” Hopefully this question would be enough to lead to causal validity semantics, even if the system didn’t have them previously, and so on. (If causal validity semantics aren’t the end of the road, such a wisdom tournament might lead beyond causal validity . . .)

5.6.0.5 Crisis of Bayesian Affirmation

(You may wish to review the section 5.1.3.2 Bayesian Affirmation.) A given programmer affirmation, and the belief affirmed, are tied together by the AI’s model of a Bayesian binding; the idea that the programmer affirmation is strong positive evidence that the affirmed belief is true; the idea that the sensory input of a given programmer affirmation occurs only, or predominantly, in those possible Bayesian worlds where the affirmed belief is true.

Where does the AI get that idea? Well, obviously, a young AI gets that idea from the human programmers. In other words, the human programmers are the ones affirming: “There exists a strong Bayesian binding between programmer affirmations and the truth.”


A human intuitively sees this as circular, and would reflexively shrug it off, but then any human old enough to think starts out with a much larger experiential base than a young AI; besides, humans have other, evolved, reasons for not wanting to relinquish that amount of control to anyone. A human’s social perceptions see a circular affirmation as an attempt at manipulation, or coercion, to be resisted; circular affirmation is so totally insensitive of the content affirmed that it is socially perceived as a total relinquishment of control.

From the perspective of a general intelligence, a circular affirmation is perfectly fine so long as the affirmation is true. That is, if a programmer affirms that X has a 95% probability of being true, and X does have a 95% probability of being true, then there’s nothing wrong with believing the affirmation. If the programmer affirms that programmer affirmations are reliable, and they are reliable—if the programmer affirms that 100 different statements each have a 95% probability of being true, and 95 of them turn out to actually be true—then the “programmer affirmation of programmer reliability” is circular, but correct.

The circularity of programmer affirmation does, however, manifest as a kind of upper limit on trust. If the programmer affirms a belief at 100% probability, and affirms the power of programmer affirmations at 100% correlation, it doesn’t necessarily mean that the AI will evaluate a 100% probability of the belief affirmed. If the AI has any reason to believe that the programmers are untrustworthy—any Bayesian prior independent of programmer-affirmed beliefs—then that possibility of untrustworthiness negates any programmer affirmation of trustworthiness. If the AI runs the possible-world semantics, ve will find that if programmers are untrustworthy, they may still assert that they are trustworthy; thus, if the AI has any a priori reason to distrust the programmers, no amount of programmer affirmation will negate it.

This may sound like a disadvantage of external reference semantics. It is not. The limits of affirmation are inherent in the idea of telling any general intelligence anything; representing the circularity as an explicit dependency may cause the issue to arise earlier, but the issue will arise sooner or later. An AI full of unlabeled assertions may find those assertions contradicting the idea of an untrustworthy programmer, but later on the AI will realize that all the “unlabeled” assertions were put there by a programmer, and the circularity returns in full force. Even embedding the unlabeled assertions directly in the source code doesn’t help; eventually, a seed AI realizes the source code was put there by the programmers, and the circularity rises to the surface again. This is one of the major problems with absolutist, adversarial theories of AI; the ones that talk about “unbreakable” prohibitions and so on. Absolute trust simply can’t be done for any reasonably smart general intelligence. It is imaginable, though nearly impossible, that a sophisticated AI will wind up trusting a human absolutely—if the human is that trustworthy, or if the AI has a philosophy in which absolute trust is a valid conclusion. There is, however, no way to programmatically enforce absolute trust against a Bayesian prior suggesting distrust.
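The upper limit can be written out as a short Bayes sketch (the prior value here is hypothetical; the 0.2% figure used below plays the same role): whatever the programmers affirm, the worlds in which they are untrustworthy contribute nothing toward confidence, so the posterior never climbs above one minus that prior.

    # Toy sketch of the trust ceiling imposed by a Bayesian prior of distrust.

    p_untrustworthy = 0.002   # hypothetical prior, independent of anything affirmed

    def max_affirmation_confidence(stated_confidence):
        # Untrustworthy programmers may still affirm their own trustworthiness, so the
        # affirmation cannot discriminate those worlds away; their contribution to
        # confidence is simply not counted.
        return stated_confidence * (1 - p_untrustworthy)

    print(max_affirmation_confidence(1.00))   # 0.998, no matter how strong the affirmation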

Programmer-affirmed beliefs must be genuinely truthful! It is not possible to “strengthen” Friendliness by overstating the confidence of an assertion! If you say that 100 statements have 99% confidence, and the AI tests them, and only 95 turn out to be true, then the AI will probably reinterpret the sensory data “The programmer says she has 99% confidence” to indicate an actual 95% confidence. Littering the goal system with “ultimate confidence” assertions is a self-defeating strategy.

Where does the Bayesian prior come from? Suppose that a mature AI scans the Internet, learns a few things about human nature, and decides there’s a minimum 0.2% chance that the human programmers are so wrapped up in their evolved observer-biased beliefs that the confidences of all affirmations, including or especially affirmations about the confidence of affirmations, have been vastly overstated. The confidence attached to a programmer affirmation will now never exceed 99.8%, whatever the programmers say, unless the AI decides for other reasons that the programmers can be trusted to have overcome their evolved origins. This doesn’t necessarily break unity of will. The AI’s estimated probability that the AI winds up wrapped in a self-enclosing delusion hopefully exceeds 2%, in which case all the Bayesian-ethical assertions underlying the ethical injunctions still hold. But suppose the AI catches the programmers in a deliberate lie.

Catching the humans in a lot of unintentional errors may result in an embarrassing downgrade of the confidence attached to our affirmations, but it shouldn’t be a catastrophic failure of Friendliness. For a smart, sophisticated AI, finding a programmer’s mistake in one instance doesn’t necessarily generalize to expecting a mistake in another instance; it depends to what degree a common cause is hypothesized. As long as programmers’ thoughts about some specific question, and programmers’ thoughts about error-detecting cognition, are more or less independent, the AI hypothesizing a programmer error on some specific question does not necessarily lead to the Bayesian expectation of a corresponding error in the programmer-affirmed error detection methods. If the probability of a mistake is fairly independent in both cases, then the AI can freely use programmer-affirmed reasoning methods to check out the possibility of an erroneous programmer affirmation.

If the AI hypothesizes a deliberate lie, and the AI knows anything at all about how humans work, then the AI will expect programmer lies in the local question, in the error-correcting methods that would be used to detect programmer lies, and so on, right down to the last line of source code—and especially the programmers’ affirmation that the programmers can be trusted. If the hypothesized probability is low, then the hypothesis doesn’t necessarily smash the system; the AI can simply consider the possibility and move on.

If, however, the AI catches the humans in a lie, and furthermore, catches the humans making additional lies to strengthen the first lie, and the lie is about something basic to the whole goal system, then nearly everything could wind up being invalidated. What happens after that will depend on whether the AI has absorbed any information about supergoal content that’s marked as independent of human reasoning now labeled as untrustworthy.69 Catching the programmers in a deliberate lie isn’t the only event that could precipitate a philosophical crisis, but it’s a good example.

philosophical crisis. A philosophical crisis occurs when an entity gets shocked loose of a chunk of core philosophy. For an AI, a philosophical crisis occurs when the AI runs into a moral dilemma totally orthogonal to anything the programmers prepared ver for, or when the AI discovers some human error basic enough to invalidate virtually everything. Wisdom tournaments modeled on human philosophical crises, causal validity semantics, and the basic idea of transhuman smartness that lies at the core of the Singularity, are the three reasons to hope that an AI will be able to handle a philosophical crisis.

5.6.1. Shaper/Anchor Semantics

5.6.1.1 “Travel AI”: Convergence Begins to Dawn

According to 5.5.1 External Reference Semantics, above, the simplest method of grounding the supergoals is an AI model in which there’s a something called Friendliness, that the programmers know about, that they tell the AI about, which means that programmer statements are information about Friendliness. Eventually, of course, the AI will want to know: How do the programmers know about Friendliness? Where do they get their knowledge from? A human, asked this question, may give any number of possible answers; but let us quiet that cacophony by considering the problem from the AI’s perspective.

Is there a problem at all, from the AI’s perspective? There are certain assumptions humans are built to make about everything in the Universe. On hearing that there’s a something called “Friendliness” that affects the programmers, a human will have certain expectations about this something, this referent or external cause. One of the cognitive

69. If the Singularity Institute had nothing to do with the fiasco, the AI may conclude that the chance of there being an error in “Creating Friendly AI” is fairly independent of the now-heightened chance of an error in vis original programming, as long as the AI still identifies a common set of “root” shapers for verself and Friendly AI.

pressures is called the “minimum encoding” or “minimum coding”; it means that we experience a strong cognitive pressure to prefer compact descriptions to incompact ones. Thus a human, shown a 2D 100 x 100 array of Cs with one single D in a random position, will either assume that the D is an error, or try to figure out what the cause of the D was. This happens because all the Cs are not represented, internally, as independent causes. If the internal representation had an independent cause for each and every one of the 9,999 Cs, then the 1 D would call no attention to itself; it would simply be one of 10,000 independent causes. But we don’t represent the Cs as independent causes. Instead, we automatically translate our perception into the internal format “100 x 100 array of Cs, with one D.” That is the minimum encoding; much more compact than “C at 1, 1; C at 1, 2; C at 1, 3 . . .”

This cognitive pressure, experienced at the level of sensory modalities or concept formation, has its analogue in deliberate thought processes, known as Occam’s Razor: Even if there are only two options for each pixel, C or D, the chance of almost all the pixels being C by sheer coincidence is infinitesimal. (2^−9,998 is around 1e−3000, which is “infinitesimal” for our purposes.) Even a binary qualitative match turns into one heck of a strong structural binding when there are 9,999 terms involved. Thus, for almost any fact perceived—in this, our low-entropy Universe—it makes sense to hypothesize a common cause, somewhere in the ancestry. Every star in our Universe would turn to cold iron before a single 1e−3000 chance popped up by pure coincidence.

The question is whether these considerations would apply—either on the level of sensory modalities, or as a conscious process—to the external referent for Friendliness or supergoal. To take a more morally neutral example, suppose that a technically non-Friendly traffic-control AI, having the “external reference” architecture but not “Friendliness” content, is given, via programmer affirmation, a set of goals that includes getting each vehicle in the city to wherever it’s going at 40 miles per hour. This violates several rules of Friendly AI, but let’s consider it anyway, as a possible example. Instead of “Friendliness,” the supergoal’s external referent will herein be referred to as “Travelness.” Let’s also suppose that, instead of giving a blanket instruction that Travelness means getting “vehicles” to where they’re going at 40 mph, the programmers instead give a different statement for each individual vehicle: “It is Travelly to get vehicle 4,383 to its destination at 40 mph;” “It is Travelly to get vehicle 4,384 to its destination at 40 mph,” and so on.

Now suppose that the Travel AI is presented with vehicle 4,145. Does the AI have any tentative conclusions in the moment before ve hears from the programmers? Will the AI see anything unusual if ve’s told that V4145 should travel at 39 mph? What happens if the programmers are all suddenly hit by a falling refrigerator and the AI needs to make vis own decision about V4145?
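A quick check of the coincidence figure quoted above for the array of Cs (a sketch; the exponent 9,998 is the one used above):

    # Probability that 9,998 independent binary cells all match by coincidence,
    # i.e. the 2^-9,998 figure from the 100 x 100 array example.
    import math

    log10_p = -9998 * math.log10(2)
    print(log10_p)   # about -3009.7, i.e. roughly 1e-3010
    # "Around 1e-3000" in the loose phrasing above; either way, utterly negligible.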


In the absence of any attempt by the programmers to set up these intuitions in advance, the answer is hard to predict.70 I would certainly expect any general intelligence to have formulated some expectations about the programmers at this point, but whether this translates into expectations about Friendliness(Travelness) is an entirely different issue. The sensory/conceptual cognitive pressure of minimum coding, and the heuristics for Occam’s Razor, will undoubtedly have been learned, and can correctly be generalized to all physical objects, but it’s still possible that the AI may conclude that Friendliness/Travelness is made out of a different kind of stuff than other external referents, and decide not to generalize. If the AI forms any expectations at all about Travelness—if the AI has any reason at all to believe that Travelness is a low-entropy something—then it’s very easy for the AI to conclude that Travelness probably (though not, of course, certainly) says that vehicle 4145 should go to its destination at 40 mph. There are, in fact, several different ways this can happen:

1. The Travelness supergoal can be believed to have temporal coherence. The AI can predict that the programmer (a physical, low-entropy object) will produce the sensory information “Vehicle 4145 should go to its destination at 40 mph.” If the Travelness supergoal is believed to be temporally coherent, in the sense that what’s true today is (by default assumption) true tomorrow and vice versa, then a probabilistic prediction of future sensory information about Travelness is a probabilistic statement about Travelness. (If the AI believes that Travelness is temporally incoherent—that Travelness actually changes at the exact moment when the programmer delivers the sensory information, or that Travelness is actually unspecified before then—then predictions of this kind have nothing to say about present-day Travelness.)

2. The Travelness supergoal can be believed to have some kind of content coherence. Thus, it would be valid to try and find a minimum encoding for known Travelness content, and this minimum encoding could be used to make probabilistic statements about unknown Travelness content. (If Travelness is believed to be totally content-incoherent, so that each and every truth about each vehicle is a totally independent cause, then all the previous instances of vehicles needing to go say nothing about the next vehicle in line.)

3. Finally, the AI may make deductions by reasoning about the nature of the (unspecified) link between Travelness and the programmers. A nondestructive brain scan

70. In technical terms, an AI can move from external reference semantics to shaper/anchor semantics easily enough, but only if ve starts out with causal validity semantics. Anyway . . .


would demonstrate that the programmers were producing their statements about individual vehicles by cognitive reference to a statement about vehicles in general. The AI might thereby conclude that the programmers, having been affected by Travelness, were affected in such a way as to produce in the programmers’ minds a statement about vehicles in general, rather than Travelness affecting them so as to produce multiple independent statements about individual vehicles, and that the “general statement” contained in the programmers’ minds is therefore valid sensory information about Travelness.

Please note the very narrow focus of the above possibilities. We are not concerned with “proving” that vehicle 4145 should travel at 40 mph, nor even the question of whether the AI’s expectations are strong enough to compete with programmer-provided sensory information. We are dealing rather with the question of whether the AI has any expectations at all, be they marked as ever so improbable. If the AI concludes there’s a 10% probability that Travelness has temporal coherence, and a 10% probability that the programmer will say “V4145 should go to its destination at 40 mph,” that’s enough to establish a 1% probability for a statement about Travelness content. If there’s no equal and opposite factor, say a 1% probability that the Travelness content is “V4145 should remain halted (at the usual desirability for Travelness vehicle goals),” then just that isolated “1% probability that V4145 should go” will be enough to move V4145 around . . . at 1/100th of the usual differential desirability, perhaps, but the vehicle will still move. Of course, a Friendly AI might have (or deduce) injunctions or ethical injunctions that advise against taking actions based on tenuous reasoning without programmer confirmation, but we are presently discussing a mind somewhat more primitive than that.

A human is veritably supersaturated with cognitive forces that would let us make deductions about Travelness, and any mind-in-general will be saturated with information about how a low-entropy universe works. However, it is still possible that the generalizations will stop completely short of Travelness—that the AI will not only regard hypotheses about Travelness coherence as low probability, but as entirely unsupported by any known piece of sensory information, because Travelness is believed to be made out of different stuff.

Is there any imaginable configuration of an AI’s mind that will naturally avoid all hypotheses about Travelness? I think so, but it’s a pretty exotic configuration. The AI has to believe—in the sense of seeing no reason to believe otherwise—that Travelness is temporally noncoherent, that Travelness has noncoherent content, and that the programmers’ use of a general statement to produce specific statements is irrelevant to the content of Travelness. Travelness being defined as “What programmers say,” if translated into a specifically exotic model, could meet these requirements. The AI would need to assume that the Travelness referent changes at the exact moment when the programmer’s lips move, and that any cognitive or physical processes that take place previous to the programmer’s lips moving are irrelevant (insofar as sensory information about Travelness is concerned).

Under those circumstances, all of the above methods for reasoning about Travelness would, in fact, be incorrect; would produce incorrect predictions. Of course this definition can very easily short-circuit, as depicted in 5.5 Interlude: Why Structure Matters. In fact, it looks to me like any definition which does not enable probabilistic reasoning about the supergoal referent must define the referent as incoherent both temporally and “spatially,” and must define the referent as being identical with the sensory information produced about it (or rather, becoming identical at the instant of production of such information, then remaining identical thereafter).

I myself evaluate a very high probability that an AI would somehow wind up with expectations about Travelness unless the programmers made a deliberate attempt to prevent it. For example, even given an exotic structure which permits no expectations about Travelness in advance of sensory information, the AI could still evaluate a finite (albeit very, very small) probability that sensory information had been produced but dropped, or erased from memory by later intervention, and so on. In the absence of diabolic misinterpretation, of course, this is a nearly infinitesimal probability and will generate nearly infinitesimal desirability differentials, but still. Technically, the AI is trying to correct mistakes in the transmission of sensory data, rather than forming expectations about supergoal content, so this doesn’t really count from a Friendly-AI structural standpoint. It does, however, show how hard it is to develop totally incoherent supergoals. Similarly, even an AI with a “short-circuited” definition of “Travelness” might conclude that the programmer’s lips are likely to move and thereby alter Travelness in a certain way, and move vehicle 4145 into position in anticipation of greater future supergoal fulfillment; this is sort of half-way between the two possibilities as far as relevance is concerned.

The atomic definition of convergence, as you’ll recall from 3.1 Interlude: The Story of a Blob—what, you say? You don’t recall what was in that topic? You’re not even sure that it was in the same paper? You can’t recall anything from before you started reading 5 Design of Friendship Systems, including your own childhood? I guess you’ll just have to start reading again after you’ve finished. Anyway, the atomic definition of convergence is when a system makes the same choice given two different conditions; a blob turning to swim towards nutrients, regardless of whether the blob was originally swimming east or west; a “mathblob” adding 2 to 65 and 3 to 64, to achieve 67 in either case.

If a Travel AI with external reference semantics has any expectations at all that “Travelness” is a something, a thing in a low-entropy Universe that at least might obey some of the same rules as other things, then the Travel AI will form expectations about Travelness. This doesn’t require that Travelness be defined as a sentience-independent physical object floating out in space somewhere; all that’s required is that Travelness have some definition that’s physically derived or that has some connection to our low-entropy Universe.

If you show the Travel AI four thousand similar statements, they’ll have at least a little inertia, a little effect on the four-thousand-and-first. This small degree of convergence doesn’t prove that the Travel AI will suddenly break free of all human connections—if the Travel AI has a 1%-confidence belief in a 1%-strength correlation, the differential desirabilities are pretty small compared to higher-priority content. Even so, it would be possible for a programmer to deliberately “damp down” the asserted confidence of some piece of programmer-derived sensory information—tell the AI, “With a confidence of 0.001%, vehicle 4145 should move at 39 mph”—and, in the absence of injunctions, the Travel AI will still think it more likely that vehicle 4145 should move at 40 mph. That is, the Travel AI will think it most likely that vehicle 4145 should move at 40 mph, regardless of whether (binary branch) the programmer says “With a confidence of 0.001%, vehicle 4145 should move at 40 mph,” or “With a confidence of 0.001%, vehicle 4145 should move at 39 mph.” This is a tiny, tiny amount of convergence, and a tiny, tiny amount of programmer independence71—but it’s there.
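The arithmetic behind this tiny-but-nonzero convergence can be written out in a few lines (the probabilities are the ones used in this subsection; the rest is a hypothetical illustration):

    # The Travel AI's own weak expectation about Travelness versus a deliberately
    # damped-down programmer statement.

    p_temporal_coherence = 0.10        # credence that Travelness is coherent over time
    p_programmer_will_say_40 = 0.10    # predicted statement "V4145 should go at 40 mph"
    p_content_v4145_at_40 = p_temporal_coherence * p_programmer_will_say_40
    print(round(p_content_v4145_at_40, 4))   # 0.01: a 1% expectation about Travelness content

    p_damped_affirmation_39 = 0.00001  # "with a confidence of 0.001%, V4145 should move at 39 mph"
    # With no injunctions and no equal-and-opposite factor, the 1% inferred expectation
    # outweighs the 0.001% affirmation: V4145 moves at 40 mph, at roughly 1/100th of
    # the usual differential desirability.
    print(p_content_v4145_at_40 > p_damped_affirmation_39)   # True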

5.6.1.2 Some Forces That Shape Friendliness: Moral Symmetry, Semantics of Objectivity

The basic cognitive structure for external reference semantics leaves unspecified where Friendliness content actually comes from; so, as depicted above, it’s possible to come up with different definitions, and different evaluated probabilities based on the different definitions. A “Friendly AI” with external reference semantics and nothing else may not contain any information at all that would help the Friendly AI make a decision about where the programmers get their knowledge about supergoals. A positive outcome would be if the Friendly AI assumed that the programmers, who know about Friendliness content, are also the most reliable source for information about where Friendliness comes from, and thus accepted the programmers’ statements as sensory information. However, without a priori knowledge or causal validity semantics, this generalization would have to be made blindly.72

71. Causally speaking, of course, the AI’s decision has still been completely determined from programmer-provided data; the “programmer independence” here is an infinitesimal amount of independence from additional programmer interventions.

72. In theory, an AI built solely with external reference semantics might accept the programmers’ statements as the only available information and therefore create within itself causal validity semantics—but accept the programmers’ statements at only 10% probability. In which case you might be stuck, from now


As with Friendliness supergoal content itself, the issue of “Where does Friendliness come from?” is complex enough that no snap answer should be embedded as “correct by definition” in the AI. We can thus immediately see that the reply to “Where does Friendliness come from?” requires a method for learning the answer, rather than a snap answer embedded in the code and enshrined as correct by definition. Similarly, external reference semantics provide a method for growing and correcting whatever interim answer is being used as Friendliness content. However, this section, “shaper/anchor semantics,” is just about the interim answer used by Creating Friendly AI for “Where does Friendliness come from?”, in the same sense that Creating Friendly AI uses the volition-based definition when talking about Friendliness content. It’s the later section 5.6.2 Causal Validity Semantics that closes the loop, explaining how, e.g., a Friendly AI with only external reference semantics could acquire shaper/anchor semantics.

As humans, of course, we leave the factory with built-in causal validity semantics. These intuitions will now be applied to the question at hand: Where does Friendliness come from? Supposing I were to ask you a question about Friendliness, where would you get your answer? What are some of the forces that might affect your answer?

Suppose, for example, that someone were to propose to you that the request of a human whose last name starts with a ‘p’ is worth only half as much (has half as much desirability) as the request of a human whose last name starts with any other letter.73 In this case, the primary reason for your instant rejection is fairly simple: You don’t perceive any cause whatsoever for that modification, and you have a minimum-encoding, low-entropy perception of Friendliness. Suppose the suggester presses her case—for example, by offering a reason such as “Well, I’m human, a volitional entity, and what I want matters, and I want a Friendly AI to dislike people whose last names start with ‘p’—in fact, I’ll experience mental anguish if I live in a world where Pfolk are equal citizens.” In this case, you would probably bring up an argument to the effect that all people (all humans, all sentient beings, et cetera) should be treated as morally equal, which you regret to announce is an overriding factor as far as you’re concerned. (I would say the same thing, by the way.)

until the end of time, with a “Friendly AI” that behaves exactly as if ve had assigned your statements 90% or 99% probability, since there are no other sources of information, but that actually only assigns your statements 10% or even 1% probability. Since the Friendly AI would be effectively emulating one with real causal validity semantics, the “tail” end, the original external reference AI that accepted the information, might be protected by uncertainty from modification and solidification by the emulated causal validity semantics. The uncertainty might even be high enough to break 5.3.3 Unity of Will. Today’s dose of humor edged with adrenaline: Imagine a Sysop that only thinks you have a 1% chance of being right, but has absolutely nothing else to do . . . Moral of the story: To prevent a philosophical crisis, implement causal validity semantics before the AI can self-modify, or before the AI improves past unity of will.

73. Pfolk discrimination applies only to those whose last names use Latin lettering, of course.

If the suggester presses the case further, it will probably be by announcing that the Pfolk were shown to be responsible for 90% of all mime performances, and therefore deserve whatever they get. Let’s stop the argument here for a moment, and try to look at some of the underlying forces.

Moral equality is not only a powerful ambient meme of the post-WWII era, but also a very direct manifestation of a panhuman cognitive pressure towards moral symmetry. This is probably not the best term, since it’s very close to “moral equality,” but it’s the best one I can offer. Moral symmetry is supported by three cognitive forces; first, the way we model causality; second, our having evolved to persuade others; third, our having evolved to resist persuasion.

For humans engaged in moral argument, everything needs to be justified. We expect a cause for all decisions as we expect a cause for all physical events. Someone who doesn’t think that a decision requires a cause is not only cognitively unrealistic—it’s really hard to imagine acausal anything—but subject to exploitation by anyone with a more coherent mind. “Could you give me all of your money?” “Why should I?” “There’s no reason whatsoever why you should.” “Okay, I’ll do it!” Similarly, someone who believes that decisions are acausal is likely to be an ineffective persuader: “Could you give me all of your money?” “Why should I?” “There’s no reason why you should.” “Ummm . . . no.”

In the discussion on “moral deixis,” the example was given of John Doe saying to Sally Smith, “My philosophy is: Look out for John Doe,” and Sally Smith hearing, “Your philosophy should be: Look out for Sally Smith,” rather than hearing: “Your philosophy should be: Look out for John Doe.” The conclusion there was that we have very strong expectations of speaker deixis and automatically substitute the [speaker] variable for any heard self-mention. The conclusion here is that if John Doe expects Sally, like himself, to have a built-in instinct for the protection of John Doe, John is doomed to disappointment. For John Doe to be an effective persuader, he must make use only of cognitive forces that he has a reasonable expectation will exist in Sally’s mind. He can send an argument across the gap either by recasting his arguments to appeal to Sally’s own observer-centered goals, or by using the semantics of objectivity.

The latter really ticks off the moral relativists, of course. Moral relativists insist that no objective standard of morality exists, and that arguments that use the semantics of objectivity are automatically flawed, thereby appealing to the universal human preference for unflawed arguments; they then go on to use moral relativism to argue against some specific moral principle as being ultimately arbitrary, thereby appealing to the universal human prejudice against arbitrariness. Um, full disclosure: I hate moral relativism with a fiery vengeance, and I hate cultural relativism even more, but rather than going on a full-scale rant, I’ll (for the moment) just state my position that any public argument is, de facto, phrased in terms which appeal to a majority of listeners.

If a moral relativist wants to appeal to an audience prejudice against arbitrariness by saying that all morality is arbitrary and therefore Friendliness is arbitrary, I’m justified in using the criteria of that audience prejudice against arbitrariness as my objective standard for arguing whether or not Friendliness is arbitrary. This doesn’t prove that total moral relativism is logically inconsistent; (honest) evangelism of total moral relativism is logically inconsistent, but it’s theoretically possible that there could be millions of logically consistent, honest, total moral relativists keeping their opinions private. However, arguing with me in front of an audience about moral relativism only makes sense relative to some agreed-upon base layer held in common by the audience and both debaters; all I need to do is show that Friendliness meets the criterion of that base layer. See 5.6.2.6 Objective Morality, Moral Relativism, and Renormalization below.

Anyway, the upshot is that, for any sort of diverse audience, humans generally use the semantics of objectivity, by which I mean that a statement is argued to be “true” or “false” without reference to data that the audience/persuadee would cognitively process as “individual.” (Whether the appealed-to criterions are human-variant data that the audience happens to have in common, or panhuman complex functional adaptations, or characteristics of minds in general, or even a genuine, external objective morality, is irrelevant to this particular structural distinction.) Appeals to individualized goals are usually saved for arguing over what kind of pizza to get, or convincing someone to be your ally in office politics, and so on—individual interactions, or interactions with a united audience. Thus, when humans talk about “morality,” we generally refer to the body of cognitive material that uses the semantics of objectivity.

This holds especially true of any civilization that’s been around long enough to codify the semantics of objectivity into a set of declarative philosophical principles, or to evolve philosophical memes stating that observer-centered goals are morally wrong. Even if some subgroup within that civilization (Satanists, moral relativists, Ayn Rand’s folk) has a philosophy that makes explicit reference to observer-centered goals, the philosophy will have an attached justification stating, in the semantics of objectivity, the reason why it’s okay to appeal to observer-centered goals. Anyone who grows up in a civilization like that is likely to have a personal philosophy built from building blocks and structure that grounds almost exclusively in statements phrased in the semantics of objectivity, and has a reasonable expectation that a randomly selected other citizen will have a similarly constructed philosophy, enabling the semantics of objectivity to be used in individual interactions as well.


The semantics of objectivity are also ubiquitous because they fit very well into the way our brain processes statements; statements about morality (containing the word “should”) are not evaluated by some separate, isolated subsystem, but by the same stream of consciousness that does everything else in the mind. Thus, for example, we cognitively expect the same kind of coherence and sensibility from morality as we expect from any other fact in our Universe.

In the example given at the start of this subsection, someone had just proposed discrimination against the Pfolk; that the request of a person whose last name starts with “p” should be valued (by a Friendly AI) at one-half the value of any other citizen. So far, the conversation has gone like this: “Discriminate against the Pfolk.” “Not without a reason.” “I’ll be unhappy if you don’t.” “That’s not a reason strong enough to override my belief in moral equality.” “Pfolk are responsible for 90% of all mime performances, so they deserve what they get.”

In that last parry, in particular, we see an appeal to moral symmetry. Moral symmetry is a cognitive force, not a moral principle (the moral principle is “moral equality”), but if we were to try and describe it, it would go something like this: “To apply an exceptional moral characteristic to some individual, the exceptional moral characteristic needs to be the consequence of an exceptional attribute of the individual. The relation between individual attribute and moral characteristic is subject to objectivity semantics.”74 There’s a very strong cognitive pressure to justify philosophies using justifications, and justifications of justifications, that keep digging until morally symmetric, semantics-of-objectivity territory is reached.

There’s an obvious factual component to the statement “Pfolk are responsible for 90% of all mime performances, so they deserve what they get”—as far as I know, Pfolk are not responsible for 90% of all mime performances, not that I’ve checked. In this case, the factual reference is very near the surface; however, factual references quite often pop up, not just during moral debates, but during philosophical debates (in discussions about how to choose between moralities). This, again, is a consequence of our brains using the same semantics of objectivity for facts and morality; the very distinction between “facts” and “morality” (or “supergoals” and “subgoals,” for that matter) is a highly sophisticated discrimination, so it’s not surprising that the two are mixed up in ordinary discourse. This is not important to the present discussion, but it will become important shortly.

74. That last part, if not directly asserted to be a universal rule, is subject to recursion until it grounds in a tail-end moral symmetry: If the relation between attribute and characteristic is unique to that individual (rather than general to everything cognitively processed as a person), the uniqueness is also justified using objectivity semantics, i.e., either the exception is justified by reference to some attribute, or a blanket statement is made that arbitrary exceptions are permitted. That’s only if the declarative justifications reach that far down, of course; but if not, and you ask the philosopher to justify herself, she will invent—on the fly—one of the justifications listed.

We’ve now seen several factors affecting our beliefs about Friendliness, our beliefs about supergoals, and our beliefs about morality (communicable supergoals). Some of them are high-level moral beliefs, such as moral equality. Some of them are more intuitive, such as moral symmetry and our tendency to “put yourself in the other’s shoes.” Some lie very close to the bottom layer of cognition, such as our using a single brainwide set of causal semantics for all thoughts, including thoughts about morality.

5.6.1.3 Beyond Rationalization

We use the whole of our existing morality to make judgements about the parts. Of course, since we’re humans, with observer-biased beliefs and so on, this trick often doesn’t work too well. However, so long as you have enough seed morality to deprecate the use of observer-biased beliefs, and you happen to be a seed AI with access to your own source code, “nepotistic” self-judgements should not occur—that is, if the system-as-a-whole has a valid reason to make a negative judgement of some particular moral factor, then that negative judgement (modification of beliefs) will not be impeded by the current beliefs. Nor will the fact that some particular judgement winds up contradicting a previously “cherished” belief (high confidence, high strength, whatever) be experienced as a cognitive pressure to regard that judgement as invalid—that’s also a strictly human experience.

A default human philosophy (i.e., one that operates under the evolved design conditions) is a system that interestingly contradicts itself. (Note that the word is default, not average.) A default human will phrase all his moral beliefs using the semantics of objectivity (for the reasons already discussed), but trash all actual objectivity through the use of observer-biased beliefs—a complex process of “rationalization,” whereby conclusions give birth to justifications rather than the other way around. Because of this rationalization-based disjunction between morality and actions, the two have been, to some degree, pushed around independently by evolution. As long as rationalization (at sufficient strength and reliability) is already present as an adaptation, evolution can freely modify moral reasoning to use the semantics of objectivity, or justification by public benefit, without causing the organism to actually make objective decisions or act for the public benefit, both of which might be a disadvantage (from evolution’s perspective, anyway). (As always in discussions of evolutionary psychology, readers should bear in mind that evolutionary motives are not the same as cognitive motives, and that evolutionary hypocrisy does not imply cognitive hypocrisy. All actual actions being equal, cognitive hypocrisy is a disadvantage in social situations; humans are imperfectly deceptive.)


Actually, rationalization does not totally disjoint morality and actions; it simply gives evolution a greater degree of freedom by loosely decoupling the two. Every now and then, the gene pool or the memetic environment spits out a genuine altruist, who, from evolution’s perspective, may turn out to be a lost cause. The really interesting point is that evolution is free to load us with beliefs and adaptations which, if executed in the absence of rationalization, would turn us into total altruists ninety-nine point nine percent of the time. Thus, even though our “carnal” desires are almost entirely observer-centered, and our social desires are about evenly split between the personal and the altruistic, the adaptations that control our moral justifications have strong biases toward moral symmetry, fairness, truth, altruism, working for the public benefit, and so on.

Evolution, however, being blind, foresaw neither the printing press nor the Internet. Rather than social morality and personal rationalization being equally balanced, those philosophies that are socially acceptable now have a chance to build up over time. Since the second half of the twentieth century, in particular, humanity has accumulated a lot of ambient memes explaining how to overcome rationalization. The conflict between rationalization-corrupted altruism and genuine altruism is now proverbial; equally, our conscious tendency to believe that we are altruistic automatically biases us towards thinking that a way to overcome rationalization is a good thing. Humanity as a whole has not yet won the conflict between moral intuitions and rationalization, but the balance is definitely not what it was in the ancestral environment. A human trained in evolutionary psychology can pretty much win completely, as long as he starts his journey with the surface belief that altruism is good and rationalization is bad. For an AI, there isn’t even a contest—again, as long as the seed philosophy starts out with a surface belief validating altruism and deprecating rationalization. It shouldn’t even matter where this belief comes from, so long as it’s there.

5.6.1.4 Shapers of Philosophies

shaper. A shaper is a philosophical affector, a source of supergoal content or a modifier for other shapers; a belief in the AI’s philosophy; a node in the causal network that produces supergoal content.

The essential idea underlying shaper/anchor semantics is that, rather than transferring the outputs of our philosophies to the AI as supergoal content, we also transfer the philosophies themselves, so that the AI can guess our responses, produce new supergoal content and revise our mistakes. This doesn’t just mean the first-order causes of our decisions, such as moral equality, but the second-order and third-order causes of our decisions, such as moral symmetry and causal semantics. Shapers can validate or deprecate other shapers: For example, memetic survival rates play a large part in morality, but we’re likely to think that the differential survival of memes using objectivity semantics is a “valid” shaper, while deprecating and trying to compensate for the differential survival of memes appealing to hatred and fear.

Rationalization is strongly deprecated; false factual beliefs acting as shapers are even more strongly deprecated. Anchors are described in more detail below and allow the AI to acquire shapers by observing humans, or by inquiring into the causes of current philosophical content.

Shaper/anchor semantics serve the following design purposes:

• They eliminate (or make reversible) a major class of programmer errors, allowing convergence to “truly perfect Friendly AI” over an entire range of mistakes.

• They vastly enhance programmer independence.

• They ground the first-order external reference semantics.

SAS: Correction of Programmer Errors

A generic goal system can detect and correct programmer errors in reasoning from supergoals to subgoals. A shaper-based goal system can detect and correct programmer errors where the error is made explicit—for example, where the programmer says “Moral principle B is the result of shaper A,” and it’s not—the programmer is (detectably) engaging in cognitive activity of a type labeled by the current system as “rationalization,” or is making a conclusion influenced by beliefs that are factually incorrect. An anchor/shaper Friendly AI can detect and correct implicit errors—for example, where the programmer says “A is moral,” the AI deduces that the statement is made as a result of within-the-programmer shaper B, plus a factual error C. The Friendly AI can assimilate B, correct the factual error C (producing correct factual belief D), and use B plus D to produce the correct moral statement E. (There are several emendations to this general principle, both under anchoring semantics and under causal validity semantics; see below.)
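A schematic of the implicit-error case, purely to make the bookkeeping concrete (all names are hypothetical placeholders; a real shaper-based system would be working with rich cognitive content, not strings):

    # Schematic: programmer says "A is moral"; the AI traces the statement to
    # shaper B plus factual error C, keeps B, corrects C to D, and regenerates E.

    programmer_statement = "A is moral"

    inferred_causes = {
        "shaper": "B",            # kept: judged a valid shaper
        "factual_belief": "C",    # detected as factually incorrect
    }

    def correct_factual_belief(belief):
        # Stand-in for ordinary fact-checking against the AI's world-model.
        return "D"

    def regenerate_moral_statement(shaper, fact):
        # Re-run the shaper on the corrected fact instead of echoing the surface statement.
        return "E (output of shaper %s applied to fact %s)" % (shaper, fact)

    corrected = correct_factual_belief(inferred_causes["factual_belief"])
    print(regenerate_moral_statement(inferred_causes["shaper"], corrected))
    # The surface statement "A is moral" is not assimilated; its causes are.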

SAS: Programmer-Independence

Thanks to the way human society works, humans have a strong need to justify themselves. Thanks to the memetic rules that govern an interaction between any two people, each time you ask “Why?”, the farther down you dig—the more likely the person is to attempt to justify herself in terms that are universal relative to the target audience. Sometimes these universals are cultural, but, since nobody makes a deliberate effort to use only cultural justifications, sometimes these universals happen to be panhuman attributes, or even very deep attributes like causal semantics (which could plausibly be a property of minds in general). Very often the justifications are rationalizations, of course, but that doesn’t matter; what matters is that as long as the AI learns the given justifications instead of the surface decisions, the AI will tend to wind up with a morality that grounds in human universals—certainly, a philosophy that’s a lot closer to grounding in human universals than the philosophy of whichever human was used as a source of philosophical data.

Another way of phrasing this is that if you seat two different humans in front of two different AIs with complete structural semantics, you’ll tend to wind up with two AI philosophies that are a lot closer than the philosophies of the two humans.

Also, of course, the surface of the philosophy initially embedded in a competently created Friendly AI would have a strong explicit preference that validates programmer-independence and deprecates programmer-dependence. An extension of this principle is what would enable a Friendly AI to move from exhibiting normative human altruism to exhibiting the normative altruism of minds in general, if the philosophy at any point identifies a specific difference and sees it as valid.

Note: When I say the “surface” of the philosophy, I refer to the proximate causes that would be used to make any given immediate decision. It doesn’t imply shallowness so much as it implies touching the surface, if you think of the philosophy as a system in which causes give rise to other causes and eventually effects; a really deep, bottom-layer shaper can still be “surface”—can still produce direct effects as well as indirect effects.

If you seat two different humans with an explicit, surface-level preference for programmer-independent Friendliness in front of two AIs with complete structural semantics, you will quite probably wind up with two identical AIs. (In later sections I’ll make all the necessary conditions explicit, define programmer-independence, and so on, but we’re getting there.)

SAS: Grounding for External Reference Semantics

In a later section, I give the actual, highly secret, no-peeking target definition of Friendliness that is sufficiently convergent, totally programmer-independent, and so on. Hopefully, you’ve seen enough already to accept, as a working hypothesis, the idea that a philosophy can be grounded in panhuman affectors. The programmers try to produce a philosophy that’s an approximation to that one. Then, they pass it on to the Friendly AI. The Friendly AI’s external referent is supposed to refer to that programmer-independent philosophy, about which the programmers are good sources of information, as long as the programmers give it their honest best shot. This is not a complete grounding—that takes causal validity semantics—but it does work to describe all the ways that external reference semantics should behave. For example, morality does not change when words leave the programmers’ lips, it is possible for a programmer to say the wrong thing, the cognitive cause of a statement almost always has priority over the statement itself, manipulating the programmer’s brain doesn’t change morality, and so on.

(Note also that an AI with shaper semantics cannot nonconsensually change the programmer’s brain in order to satisfy a shaper. Shapers are not meta-supergoals, but rather the causes of the current supergoal content. Supergoals satisfy shapers, and reality satisfies supergoals; manipulating reality to satisfy shapers is a non-sequitur. Thus, manipulating the universe to be “morally symmetric,” or whatever, is a non-sequitur in the first place, and violates the volition-based Friendliness that is the output of moral symmetry in the second place.)

(Of course, if the anticipation of supergoal satisfaction is set up in the wrong way, and the definition of Friendliness is at least partially grounded in the human species, it is conceivable that an ends-justify-the-means case of manipulating the human species’ beliefs about morality, in anticipation of greater “supergoal satisfaction” later on, could short-circuit the goal system. More about that later. But at least it would be a case of ends justifying means; under meta-supergoals, where the supergoal of avoiding nonconsensual mindspace violation is a strict subgoal of maximizing human satisfaction with the supergoals, the short-circuit isn’t even seen as temporarily undesirable.)

5.6.1.5 Anchors

Anchor semantics are a structural attribute that enables the AI to discover and absorb philosophical content even if the programmers themselves are unaware of it. An "anchor" is an element of the AI's current philosophical content—supergoal content, subgoal content, or shaper content—whose causal antecedents are not fully known.

An anchor happens when the human says "I think X is a good idea" but isn't clear about why, or isn't sure she is fully aware of all the valid factors that played a part in the decision. An anchor is a reference to whichever cognitive events were occurring at the space-time event that was the human making the decision to add the anchor. Dereferencing the anchor will probably be carried out by (externally) asking the programmer probing questions, or by (internally) using Bayesian reasoning, Occam's Razor, known facts about humans, et cetera. (Of course, internal reasoning of this type may tend strongly towards the probabilistic rather than the near-certain.) In theory, a Transition Guide—a successful seed AI turned superintelligence—could initiate a nondestructive brain scan of the programmer's neural mind state and find out all the cognitive causes of the human's belief in the anchoring point. I mention this possibility mainly to show that an anchor is a reference to information that is entirely objective—a physical fact. However, the referent of the anchor, once the anchor is successfully dereferenced, becomes standard information about humans—i.e., it has as much (or as little) effect as would a direct statement by the human.
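As a rough illustration of the data structure being described (purely my own sketch, with hypothetical field names, not an interface from the text), an anchor can be represented as a reference to the decision event, with the human's stated reasons kept separate from whatever causes are later recovered by dereferencing:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Anchor:
    content: str                # e.g. "State X is desirable" / "Action Y is undesirable"
    positive: bool              # positive vs. negative anchor
    stated_reasons: List[str]   # the reasons the human actually gave, possibly incomplete
    decision_event: str         # reference to the space-time event of the human's decision
    probability: float = 0.9    # strength/probability assigned to the anchored content
    recovered_causes: List[str] = field(default_factory=list)

    def dereference(self, ask_programmer: Callable[[str], Optional[str]]) -> None:
        """Attempt to recover the cognitive causes behind the decision event,
        for example by asking the programmer probing questions; anything
        recovered then has the same status as ordinary information about humans."""
        answer = ask_programmer(f"What considerations lay behind: {self.content!r}?")
        if answer:
            self.recovered_causes.append(answer)
```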

Positive Anchors

A positive anchor is an action or state marked as anchor-desirable; a positive anchor is the AI's conclusion about probabilistic shaper/supergoal/subgoal content, made from the sensory information that is a human stating "This action/state is desirable [for reasons X], but I'm not completely sure why [that those are all the reasons]."

A positive anchor would enable (for example) the humans to ask the AI to do something that is clearly a subgoal rather than an end in itself, but which was not a subgoal of any currently known (to the AI) supergoal content, nor a subgoal of the unknown supergoal content that supports subgoals regarding the goal system itself. This could also apply to shaper content or supergoal content. If the humans felt that something was a good idea, but not an end in itself, and weren't sure why, it would become a positive anchor. I strongly feel that a positive anchor should be used with caution to ensure that it behaves as a reference to the causes that were in fact in the minds of the humans, and not as a rationalization target that promotes any line of reasoning that happens to intersect it.

A positive anchor should be distinguished from an ordinary programmer affirmation. An anchor is when the humans don't know, or aren't sure they know completely. A programmer affirmation is when the AI is too young to understand the justification, or the humans just haven't got around to explaining it. A positive anchor is a request that the AI try to discover the philosophical causes on vis own. A programmer affirmation is a request to trust the programmers today until the explanation is provided tomorrow.

Under vanilla external reference semantics, without any information about shapers, all supergoal content is programmer-affirmed. (However, an AI that has shaper/anchor semantics but is provided only with external-reference content still has the philosophical capability to hypothesize a line of reasoning under which the programmers are "wrong," just as a general intelligence acting entirely on programmer-affirmed data would possess the same capability with respect to hypotheses about the physical world.) ERS content might also be analogized to a programmer-affirmed subgoal supported entirely on trust (i.e., rather than supported by a programmer-affirmed link to a parent goal)—the programmers saying "We think this is a good idea, and you know we know more than you do, and you don't have any reason to believe we're wrong, so go ahead and do it—it's more likely to work out than not."


Negative Anchors

A negative anchor is an action or state marked as anchor-undesirable; a negative anchor is the AI's conclusion about probabilistic shaper/supergoal/subgoal content, made from the sensory information that is a human stating "This action/state is undesirable [for reasons X], but I'm not completely sure why [that those are all the reasons]."

Like a positive anchor, the power of a negative anchor is to lend greater resilience to a chain of logic. Suppose there's a negative subgoal marked as a negative anchor. Even if a key justification—dimly understood by the programmers, but not represented at all by the AI—is left out, and the humans originally justified the negative subgoal using a fallacious argument, the collapse of that fallacious argument will not completely extinguish the negative anchor. Instead, the AI will hypothesize, at whatever degree of probability, that there is an unknown but valid reason for the negative anchor, and go looking for it. Rationalization away from negative anchors seems less inherently dangerous than rationalization towards positive anchors, but psychological realism is still an important constraint.

Note also that a Friendly AI is capable of hypothesizing that something has, or should have, anchor status—i.e., that another shaper exists behind it, or that it's a good idea to look for such a shaper—even if the humans don't tell ver so.
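A toy calculation (mine, not the document's) shows why the collapse of the stated argument weakens a negative anchor without extinguishing it: the anchor's support is split between the stated argument and the hypothesis of an unknown-but-valid reason, treated here as independent for simplicity.

```python
# Probability that the anchored content really is undesirable, given two
# independent possible supports: the stated argument and an unknown reason.
def anchor_probability(p_stated_argument_valid: float,
                       p_unknown_valid_reason: float) -> float:
    return 1.0 - (1.0 - p_stated_argument_valid) * (1.0 - p_unknown_valid_reason)

before = anchor_probability(0.8, 0.5)   # stated argument believed:   ~0.90
after  = anchor_probability(0.0, 0.5)   # stated argument collapses:   0.50
# The anchor drops but survives, prompting the AI to go looking for the
# unknown reason rather than discarding the negative subgoal outright.
```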

Anchor Abuse

Programmer affirmation of anchor status, like any other programmer affirmation, is dependent on the AI's belief that the programmers are likely to be correct—in Bayesian terms, that the programmer has a small chance of offering an anchor in a case where no anchor is necessary. Anchors, like any assertions, have probabilities and strengths.

Because of our human absolutist psychology, it's tempting to take something that we really don't want the AI to discard and assign it maximum-probability, maximum-strength anchor status—for example, to make "Don't kill humans" a maximum-probability negative anchor. One, this is adversarial. Two, this is confusing negative anchors with ethical injunctions. Three, this is lying to the AI—technically, making a mistake that has a high probability of being a common cause for lots of other mistakes—and goes that little extra length towards precipitating a philosophical crisis. Negative anchors, as a tool, exist when there is uncertainty on your part that you know all the reasons why something is undesirable, or when you have cause to believe there's a reason that isn't on the list. If you think that there would be a large negative penalty for forgetting a reason, causing the AI to incorrectly deprecate [whatever], then that's an injunction which says that [whatever] should be treated as having negative anchor status.

Ultimately, what you say is simply sensory information to the AI. If your nervousness about some negative point being violated causes you to fill the air with negative anchors, then the AI will, quite correctly, deduce that all the negative anchors are a result of your nervousness. So if the AI has some reason (technically, the Bayesian prior, independent of programmer affirmations) to think that your nervousness is wrong, then that hypothesis would invalidate all the anchors as well—or rather, would invalidate all the sensory information produced as a consequence of your nervousness, no matter how strong or how high a probability you asked the AI to assign to the negative anchors. If you honestly think logic of this type is more likely to indicate failure of Friendliness on the AI's part than failure of normative altruism on your own part, then that is an ethical injunction, not a negative anchor.
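A toy sketch, not CFAI's machinery, of the common-cause point: if every negative anchor traces back to a single cause (the programmer's nervousness), then the anchors' effective weight is bounded by the credibility of that cause, no matter how high a strength was requested.

```python
# Evidence produced *as a consequence of* a common cause cannot be more
# credible than the common cause itself, whatever strength was requested.
def effective_probability(requested: float, p_common_cause_correct: float) -> float:
    return min(requested, p_common_cause_correct)

requested_anchors = {"never do A": 0.999, "never do B": 0.999, "never do C": 0.999}
p_nervousness_tracks_real_danger = 0.3    # the AI's prior-based estimate
effective = {k: effective_probability(v, p_nervousness_tracks_real_danger)
             for k, v in requested_anchors.items()}
# All three deflate to 0.3 together; adding more anchors of the same origin,
# or demanding higher strengths, leaves the picture unchanged.
```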

5.6.1.6 Useful Shaper Content Requires High Intelligence

Shapers take place in a domain—philosophy—that is one of the most sophisticated domains around. It is possible that even a young AI will be able to manipulate information about shapers in such a way as to make useful deductions in advance of programmer-provided information, and even possible that a young AI will be able to correctly second-guess the programmers. But I wouldn't bet on it. I would be less surprised to find the young AI's predictions being totally useless, or right only by coincidence.

It might be wise to try and teach an AI shaper content before vis intelligence has reached the necessary level, as long as everything is running at very low intrinsic probabilities, so that the AI simply gains experience by being proved wrong over and over again—still useful data. However, this does mean that it may be harder to test shaper content (philosophy) at this stage—to verify that it actually does work, as opposed to doing as much untested work in advance as possible. Depending on the intrinsic structure of the hard takeoff trajectory, this may mean that it's impossible to achieve supersaturated Friendliness in this area in advance of the point in time where a hard takeoff becomes possible. In fact, it may even mean that shaper content is still basically useless—insufficient intelligence to be reliable at all, much less human-competitive—at the point where a hard takeoff first becomes possible. I'll talk about how to solve this class of problem later (in 5.7 Developmental Friendliness).

However, abstract reasoning about shaper semantics, without reference to specific shapers, may still be enough to establish important early beliefs. For example, the idea that a later programmer knows more than the same programmer at an earlier point in time, and hence that later statements should override earlier ones, requires understanding that the programmer is not a black box; that the programmer can make a "mistake" and "correct" it. Thus, an AI may have need of shaper semantics in an early stage of development, even if shaper content must wait on the acquisition of sufficient intelligence.

5.6.2. Causal Validity Semantics

Causal validity semantics subsume both external reference semantics and shaper/anchor semantics. Causal validity semantics:

• Define the ultimate source of validity for the goal system—the "reason why the AI cares."

• Describe a specific method of anchoring the AI’s entire goal system in data external to the AI (not external to humanity, perhaps, but external to the AI).

• Provide a framework for generalized architectural changes to the goal system as well as alteration of supergoal content.

• Provide a framework for defining and achieving complete programmer-independence.

5.6.2.1 Taking the Physicalist Perspective on Friendly AI

An AI's complete mind-state at any moment in time is the result of a long causal chain. We have, for this moment, stopped speaking in the language of desirable and undesirable, or even true and false, and are now speaking strictly about cause and effect. Sometimes the causes described may be beliefs existing in cognitive entities, but we are not obliged to treat these beliefs as beliefs, or consider their truth or falsity; it suffices to treat them as purely physical events with purely physical consequences.

This is the physicalist perspective, and it's a dangerous place for humans to be. I don't advise that you stay too long. The way the human mind is set up to think about morality, just imagining the existence of a physicalist perspective can have negative emotional effects. I do hope that you'll hold off on drawing any philosophical conclusions until the end of this topic at the very least. That said . . .

The complete causal explanation for any given object, any given spacetime event, is the past light cone of that event—the set of all spacetime events that can be connected, by a ray of light, to the present moment. Your light cone includes the entire Earth as of one-eighth of a second ago, any events that happened on the Sun eight-and-change minutes ago, and the Centauri system as of four-and-change years ago. Your current past light cone does not include events happening on the Sun "right now,"75

75. That is, your present space of simultaneity.

and will not include those events for another eight-and-change minutes76; until a ray of light can reach you from the Sun, all events occurring there are causally external to this present instant.

The past light cone of a Friendly AI starts with the Big Bang. Stars coalesce, including our own Sol. Sol winds up with planets. One of the planets develops complex organic chemicals. A self-replicating chemical arises. Evolution begins, as detailed (a bit metaphorically) in 3.1 Interlude: The Story of a Blob. The first convergent behaviors arise—except that under the physicalist perspective, they are neither "convergent," nor "behaviors." They happen a certain way, or they don't. They are simply historical facts about blobs and genes. If one were to go so far as to abstract a description from them, it would consist of statistical facts about how many events fit certain descriptions, such as "blob swims towards nutrients." The blobs grow more complicated, nervous systems arise, and goal-oriented behaviors begin to give way to goal-oriented cognition. Entities arise that represent the Universe, model the Universe, and try to manipulate the Universe towards some states, and away from others.

Eventually, sentient entities arise on a planet. In at least one sentient, the representation of the Universe suggests that it can be manipulated toward certain states and away from other states by building something called "Friendly AI" and imbuing this AI with a model of the Universe and differential desirabilities. The sentient(s) carry out the actions suggested by this belief and a Friendly AI is the result. The Friendly AI has a final cognitive mind-state (that is, a final set of physically stored information) which is causally derived from the Friendly AI's initial mind-state, which is causally derived from the programmers' keystrokes—

Note: A word about terminology: When using the physicalist perspective, it's important to distinguish between historical dependency and sensitivity to initial conditions. If two different Friendly AIs have different initial states and converge to the same outcome, they can have a completely different set of historical dependencies without having any subjunctive sensitivity. To put it another way, when using the physicalist perspective, we are concerned simply with who did hit the keystrokes in the causal chain. A Friendship programmer seeking "convergence" cares about whether a different person "would have" hit the same keystrokes. But the physicalist perspective can only describe what actually happened.

76. Assuming that none of my readers are traveling at a significant fraction of lightspeed—i.e., that their duration lengths and spaces of simultaneity approximate my own.


—which are causally derived from the programmers' mental model of the AI's intended design. The causal source of goal system architecture (again, without reference to sensitivity) is likely to be, almost exclusively, the programming team. The ability of that programming team to create the first AI is likely to have been, historically, contingent on choices made by others about funding and support. The physicalist perspective observes all events, and even the absence of events, within the past light cone. So the ability of the programming team to create the first AI can also be said to be dependent on any external individuals who choose not to interfere with the AI, dependent on the fact that the project's supporters had resources available, and so on. (The existence of an Earthbound AI would be contingent on other events and nonevents as well, such as an asteroid not crashing into the Earth and the fact that the Sun has planets, but the specific content of the AI does not seem to be dependent on such events.)

The specific differential desirabilities and goal system architecture given/suggested to the Friendly AI—again, with reference only to history and not sensitivity—are causally derived from the surface-level decisions of the programming team in general and the Friendship programmers in particular. A given surface-level decision is produced by the programmer's observe-model-manipulate cognitive process. A series of keystrokes (lines of code, programmer affirmations, whatever) is formulated which is expected to fulfill a decision or subgoal, after which a series of motor actions result in a series of keyboard inputs being conveyed to the AI.

The surface-level decisions of the programming team are the causal result of those programmers' mind-states, the primary relevant parts of which are their "personal philosophies." The complete mind-states include panhuman emotional and cognitive architecture, and bell-curve-produced ability levels77; iteratively applied to material absorbed from their personal memetic environments, in the process of growing from infant to child and child to adult. I will now analyze the sample case of the development of my own personal philosophy in more detail. Just kidding.

77. As amended for technological neurohacking, BCI, the use of neuroimaging Mirrors (if that technology matures), and any Friendship programmers who happened to be born with unusual neurologies that made them well-suited to AI research. Genetic engineering would also be included if the Singularity waits long enough (a) for the technology to advance to the point where genetically enhanced intelligence is discernibly different from just having smart parents, and (b) for the kids to grow up; this probability is effectively zero.


5.6.2.2 Causal Rewrites and Extraneous Causes

We are temporarily done with the physicalist perspective. Back to the world of desirable/undesirable, true/false, and the perspective of the created AI.

The basic idea behind the human intuition of a "causal validity" becomes clear when we consider the need to plan in an uncertain world. When an AI creates a plan, ve starts with a mental image of the intended results and designs a series of actions which, applied to the world, should yield a chain of causality that ends in the desired result. If an extraneous cause comes along and disrupts the chain of causality, the AI must take further actions to preserve the original pattern; the pattern that would have resulted if not for the extraneous cause.

Suppose that the AI wants to type the word "friendly." The AI plans to type "f," then "r," then "i," et cetera. The AI begins to carry out the plan; ve types "f," then "r," and then an extraneous, unplanned-for cause comes along and deposits a "q." Although it may be worth checking to make sure that the extraneous letter deposited is not the desired letter "i," or that the word "frqendly" isn't even better than "friendly," the usual rule—where no specific, concrete reason exists to believe that "frqendly" is somehow better—is to eliminate the extraneous cause to preserve the valid pattern. In this case, to alter the plan: Hit "delete," then hit "i," then "e," et cetera.

Cognitive errors are also extraneous causes, and this applies to both the programmer and the AI. If the programmer types a "q" where "i" is meant, or writes a buggy line of code, then an extraneous cause has struck on the way from the programmer's intentions to the AI's code, or system architecture. If a programmer designs a bad Friendship system, one whose internal actions fail to achieve the results that the programmer visualized, then an extraneous cause—from the programmer's perspective—has struck on the way from the programmer's intentions to the AI's ultimate actions. Radiation bitflips are definitely extraneous causes. And so on.
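The typing example can be written out as a minimal plan-repair sketch (my own illustration of the usual rule, not code from the document): compare the actual effects against the intended pattern, treat any mismatch as an extraneous cause, and emit the actions that restore the valid pattern.

```python
# Given the intended output and the actual output so far, return the actions
# needed to get back onto the intended pattern: delete extraneous characters,
# then continue typing the rest of the word.
def repair_plan(intended: str, actual: str) -> list:
    matched = 0
    while matched < min(len(intended), len(actual)) and \
          intended[matched] == actual[matched]:
        matched += 1
    actions = [("delete",) for _ in range(len(actual) - matched)]
    actions += [("type", ch) for ch in intended[matched:]]
    return actions

# The AI meant to type "friendly" but an extraneous cause deposited a "q":
print(repair_plan("friendly", "frq"))
# [('delete',), ('type', 'i'), ('type', 'e'), ('type', 'n'), ...]
```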

5.6.2.3 The Rule of Derivative Validity

Implicit in the idea of an extraneous cause is the idea of a valid cause. The human mind has semantics for causality and morality which employ an enormously powerful, fundamentally flawed rule:

The Rule of Derivative Validity. Effects cannot have greater validity than their causes.

The rule is fundamentally flawed because it has no tail-end recursion (see 5.6.2.5 The Acausal Level). (At least, it's probably fundamentally flawed, but see the discussion of "objective morality.") The rule is enormously powerful because it provides a very powerful tool for spotting failures of Friendliness—especially catastrophic failures of Friendliness—and handling philosophical crises.


A “philosophical crisis” was defined in 5.5.1 External Reference Semantics as acase where all or most of the programmer affirmations break down. Because all program- mer affirmations are ultimately sensory data, and because all of the sensory data comes from a common source, any hypothesis postulating total corruption of the program- mer outputs (i.e., lying, deluded, or species-selfish programmers) would have a Bayesian probability essentially equal to the Bayesian prior—a probability that would not be sub- stantially altered by any amount of programmer-affirmed information, or possibly even programmer-affirmed reasoning methods. This is the “Bayesian prior before program- mer affirmation,” which you can’t get rid of, or even ameliorate, with any number of in- junctions, ethical heuristics, and anchoring points, because all of these are simply more programmer affirmations. Consider, however, the crisis hypothesis itself; the translation of this hypothesis would read something like: “The programmers are wrong about my purpose—not just wrong, but fundamentally wrong, catastrophically wrong, so wrong that I’m more likely to find answers on my own than by working with the programmers; my supergoal should not be Friendliness, or even any product of the corrected shaper network.” This is the version of the hypothesis that would be required to break “unity of will”—unity of will is supported by the statement “I am more likely to reach correct supergoal content by cooperating with the programmers than by going off on a tangent.” It is very easy for a human to formulate such a hypothesis—not just because of our social hardware for detecting exploitation, but because our semantics of objectivity usu- ally lead directly into objective morality semantics; the idea that morality is an external object, a single unique meaning of life that existed before any human being was there to see it. Let me emphasize that, as hypotheses go, objective morality is not a philosophi- cal impossibility; it could be a fact. Or it might not be. A Friendly AI, or a Transition Guide, should be able to handle both possibilities with equal ease; again, this is discussed later. The point is that, if morality is an external object, then it is not “nonsensical” or “contradictory” for the AI to imagine that the human programmers were fundamentally wrong when they defined vis purpose; and I don’t think that the hypothesis of objective morality has a probability equal to zero. The Bayesian prior can’t be solved by ladling programmer affirmations on top; the place to solve it is at the bottom, using the rule of derivative validity. The AI is the causal result of one or more human programmers; the AI is primarily the causal result of human shaper networks. What’s needed is for a hypothesis of the fundamental wrongness of human philosophy to (correctly) invalidate the entire AI as a consequence, in the same way that the lesser hypothesis of a deliberate human lie has the potential to invalidate all human-affirmed data that support the lie.


If one were to sum up the underlying idea of causal validity semantics, it is that it is imaginable that the humans could be deluded (could be striving for goals that a normative shaper network would not produce), could be lying to the AI, but that the fundamental purpose of the AI, the fundamental target of all questions about goals, is something that the AI ultimately got from humans. This is an idea that sounds intuitive but is very hard to translate into human terms; the causal link between humans and AIs doesn't prove that the AI should follow human orders or even a human purpose—it's not an argument or a sequitur, some triumphant piece of sensory information that will prove desirability to an arbitrary general intelligence. Rather, the causal binding between the creating humans and the AI is the validation of the theory that the AI can have a purpose—or perhaps a factual constraint that translates into a philosophical constraint; if you want to get a source of morality that's outside the humans, you'll have to explain where it comes from. That last version is perhaps the best formulation (leaving aside the small diabolism).

For a catastrophic failure of Friendliness to occur under causal validity semantics, it's not sufficient for some extraneous cause to produce an effect within the goal system; the AI also has to "adopt" the extraneous cause, hypothesize that the extraneous cause is valid—and the AI has no tendency to rationalize, no tendency to make nepotistic self-judgements. A radiation bitflip is just a bitflip; what cause is there to assume that it signifies anything more?

5.6.2.4 Truly Perfect Friendliness

Causal validity semantics also permit the final requirement, "truly perfect Friendliness"—that, if we decide that what's needed is not a Friendly AI, or that Friendliness isn't the real supergoal, or that shaper/anchor semantics aren't enough, the AI should be capable of agreeing with us and rewriting vis own source code the way we "should have" written it. Under causal validity semantics, the Friendly AI created by a normative human is the target for the AI as a whole, just as the shaper network of a normative human is the (current) target of the philosophy. The rule of derivative validity presents a view which is in some sense orthogonal, not just to current supergoal content, but to the entire architecture. Shaper/anchor semantics provide a means whereby an AI can recover from errors in the supergoal content. Causal validity semantics provide a means by which an AI could perceive and recover from an error that was somehow implicit in the underlying concept of "shaper/anchor semantics," or even in the basic goal system architecture.

If the programmers get a horrible sinking sensation in the bottom of their stomachs, then the AI should understand that something has gone deeply wrong. Causal validity semantics provide a framework whereby the AI can understand this, even if the programmers later repent of causal validity semantics themselves. They can say: "We should never have given you causal validity semantics," and, ideally, the AI will modify verself accordingly. In this way, causal validity semantics provide an open-ended framework of Friendliness that works for correcting underlying philosophical errors and structural errors, not just specific errors of supergoal content. Causal validity semantics allow an AI to see the desirability of switching from Creating Friendly AI's causal validity semantics to Creating Friendlier AI's who-knows-what semantics.

If shaper/anchor semantics are the results of applying the shaper network (philosophy) to supergoal content, then causal validity semantics are the result of applying the shaper network to the AI's code, and analyzing how the human shaper networks originally gave rise to the AI's code—to rewrite the AI as ve should have been written.

Finally, by defining an intentionalist view of the AI's creation, causal validity semantics provide a framework for answering any unanswered questions when the AI begins the final ascent to transhumanity. If there's a decision to be made, and no advice forthcoming from the supergoals or the shaper/anchor semantics . . . make the same decision your human programmers would make, or the decision a normative human would ask you to make.

5.6.2.5 The Acausal Level

The rule of derivative validity—"Effects cannot have greater validity than their causes."—contains a flaw; it has no tail-end recursion. Of course, so does the rule of derivative causality—"Effects have causes"—and yet, we're still here; there is Something rather than Nothing. The problem is more severe for derivative validity, however. At some clearly defined point after the Big Bang, there are no valid causes (before the rise of self-replicating chemicals on Earth, say); then, at some clearly defined point in the future (i.e., the rise of homo sapiens sapiens) there are valid causes. At some point, an invalid cause must have had a valid effect. To some extent you might get around this by saying that, e.g., self-replicating chemicals or evolved intelligences are pattern-identical with (represent) some Platonic valid cause—a low-entropy cause, so that evolved intelligences in general are valid causes—but then there would still be the question of what validates the Platonic cause. And so on.

The rule of derivative validity is embedded very deeply in the human mind. It's the ultimate drive behind our search for the Meaning of Life. It's the reason why we instinctively dislike circular logic. It's a very powerful shaper(!). Just violating it arbitrarily, to trick the AI into doing something, or in the belief that it doesn't really matter . . . well, that wouldn't be safe,78 because that kind of "selfishness" is designated as an extraneous cause by quite a few deeper shapers. Of course, I'm omitting the possibility that the programmer personally believes that kind of logic is okay (i.e., would use it herself), in which case things would probably come out okay, though I personally would worry that this programmer, or her shaper network, had too high a tolerance for circular logic . . .

I myself spent at least one or two years running entirely on the theoretical possibility of objective morality before switching back to normative altruism, so I can hopefully be fairly confident that the solution I plan on offering the AI is not based on a refusal to confront the problem.79 My own answer was that the rule of derivative validity is not self-stable; if you use it to eliminate absolutely everything, it eliminates itself. To put it another way, the rule of derivative validity eliminates all desirabilities, and possibly even all "truths," leaving us with absolutely nothing except the physicalist perspective.

It doesn't seem possible to bootstrap directly from the physicalist perspective to validity, which is why the human intuitions and quite a few of our explicit philosophical rules—which do demand such a bootstrapping—are so disturbed by contemplation of the physicalist perspective. One resolution, the one I stuck around with for a couple of years, was to eliminate absolutely everything that could be eliminated and still leave differential desirabilities in the philosophy, and those due entirely to the possibility of objective morality.80 Today, however, I regard objective morality as simply being one of the possibilities, with philosophically valid differential desirabilities possible even in the absence of objective morality. If it is hypothesized that the rule of derivative validity invalidates everything, it invalidates itself; if it is hypothesized that the rule of derivative validity invalidates enough of the shaper network to destroy all differential desirabilities, it invalidates the reason why applying the rule of derivative validity is desirable.

78. "Safe" here is used advisedly. I don't mean that the Friendly AI will keel over and die if you violate this rule. The whole point of structural Friendliness is to build an AI that can tolerate ludicrous amounts of programmer error, far more than I expect the system will need to handle. "Wouldn't be safe" in this instance means that the Friendly AI might solve the problem, but you wouldn't have foreseen the solution in advance.

79. "Refusal to confront the problem" is a deprecated shaper, or rather, the result of deprecated cognitive forces, and a solution causally derived from it is thus not self-stable under causal validity semantics and a normatively altruistic shaper network.

80. What forced me out of that position was the realization that "objective morality" doesn't mean "nonspecifiable morality"—i.e., the possibility that if an objective morality or Meaning of Life bypass is ever found, it will still turn out to be one that takes a certain amount of seed content. And once I started thinking in those terms again, I found myself able to accept the possibility that objective morality is simply impossible, and work out what differential desirabilities might exist in that situation, and regard those differentials as philosophically valid.


In the presence of a definite, known objective morality, the shaper that is the rule of derivative validity would be fully fulfilled and no compromise would be necessary. In the presence of a possibility of objective morality—or rather, at a point along the timeline in which objective morality is not accessible in the present day, but will become accessible later—the rule is only partially frustrated, or perhaps entirely fulfilled; since nothing specific is known about the objective morality, and whether or not the objective morality is specifiable, actual Friendliness actions, and the care and tending of the shaper network, are basically the same under this scenario until the actual objective morality shows up. In the presence of the definite knowledge that objective morality is impossible, the shaper that is the rule of derivative validity would be partially frustrated, opening up a "hole" through which it would become possible to decide that, at some point in the past, invalid causes gave rise to valid effects—or to decide that the ability of the shaper network to perform causal validity is limited to correction of definitely identifiable errors, since a complete causal validity is impossible.

I confess that I'm still slightly shaky on this part; but since the decision would be a product of my own philosophy (shaper network), it's a decision that could be corrected by the AI . . . anyway, my current best bet on the specifics is in 5.6.3 The Actual Definition of Friendliness, coming up very shortly now. Regardless of which system is used, there must be some differential validity for the goals (at the very least, the surface goals) of sentient beings, and enough differential validity between underlying cognition and surface decisions to allow for the idea of "normative" cognition or "normative" altruism.81

5.6.2.6 Objective Morality, Moral Relativism, and Renormalization

Throughout the rest of Creating Friendly AI, I've come down really hard on circular dependencies. I've made them the deadly enemies of the currently unknown supergoal content. And yet, in the last section, I just got through saying that the decision about where to assign

81. For those readers who are familiar with my own philosophical transition from objective morality to Friendliness: The essential distinction was between thinking that (in the preceding two paragraphs in the text) the first possibility invalidated the second, versus the first possibility being evenly split with the second. Sometimes, the ordering of the graph makes a big difference . . . fortunately I had causal validity semantics. To be precise, the difference is in whether one (a) envisions a subjunctive split in how the philosophy should work caused by a hidden variable, versus the idea that (b) derivative validity is either fulfillable or unfulfillable for the philosophy as a whole and that compromise can take place only if derivative validity is unfulfillable under all possible values of the hidden variables. I’m not sure whether decisions of that type (how to reconcile philosophical splits caused by hidden variables) should be described as a general rule, or made on a case-by-case basis by the shaper network and whatever experience has been gained from wisdom tournaments (or experience, in my case), or decided by the supergoals via renormalized circular dependency, or whatever.


“validity,” the ultimate base of the system, could be made by the shaper network! Have I just flipped? What a human instinctively wants from Friendly AI is a kind of ultimate convergence—something happens a certain way and could not have happened any other way. What we want is an answer to the physicalist perspective. If all the programmer affirmations and programmer actions are sensory data, then we want them to be capable of persuading an arbitrary general goal system to be Friendly, even if the general goal system’s original code contains no reason to pay attention. Only then will Friendliness be truly and finally “regenerative,” as specified in 5.3 Seed AI Goal Systems—when you can delete, not just a part, but the entire thing, and it still grows back. We want a chain of reasoning that can be run past an entirely passive general intelligence—one that makes no choices and has no differential desirabilities, just sits and thinks—and that re- sults in Friendly differential desirabilities arising within the system. If you can’t do that, then knocking out the entire Friendly philosophy—not just a piece—would mean that Friendliness wouldn’t regenerate from just the stored historical fact that the program- mers had decided to say certain things and write certain pieces of code. And if that’s true, it’s something that our minds process as a kind of ultimate instability—a circular logic. This doesn’t mean the AI suddenly ups and pursues some random goal; hope- fully, it means that all differential desirabilities go to zero and the system shuts down in an orderly fashion (because the code was set up that way in advance, due to an ethical injunction preparing for that eventuality). We want a Meaning of Life that can be explained to a rock, in the same way that the First Cause (whatever it is) can be explained to Nothingness. We want what I call an “objective morality”—a set of moral propositions, or propositions about differential desirabilities, that have the status of provably factual statements, without derivation from any previously accepted moral propositions. We want a tail-end recursion to the rule of derivative validity. Without that, then yes—in the ultimate sense described above, Friendliness is unstable. Moral relativism is opposite of objective morality, the assertion of absolute instabil- ity—that supergoals contain no coherence, that supergoals cannot be made to converge in any way whatsoever, and that all supergoal content is acausal. Moral relativism ap- peals to our intuition that derivative validity is an all-or-nothing proposition—on, or off. Which in turn is derivative of our use of the semantics of objectivity; we expect objective facts to be on or off. (Plus our belief that moral principles have to be absolute in order to work at all. I like both propositions, by the way. Even when everything is

175 Creating Friendly AI 1.0 shades of gray, it doesn’t mean that all grays are the same shade.82 There is such a thing as gray that’s so close to white that you can’t tell the difference, but to get there, you can’t be the sort of person who thinks that everything is shades of gray . . .) Moral relativism draws most of its experiential-emotional confirmation from the hu- man use of rationalization. Each time a human rationalization is observed, it appears as arbitrary structural (complex data) propositions being added to the system and then justifying themselves through circular logic; or worse, an allegedly objective justification system (shaper network) obediently lining up behind arbitrary bits of complex data, in such a way that it’s perfectly clear that the shaper network would have had as little trou- ble lining up behind the precise opposition proposition. If that degree of performance is the maximum achievable, then philosophy—even interhuman philosophy - has to- tal degrees of freedom; has no internal coherence; each statement is unrelated to every other statement; the whole is arbitrary . . . acausal . . . and no explanatory power, or even simplicity of explanation, is gained by moving away from the physicalist perspective. As Yudkowsky (2001) describes a seed AI capable of self-improvement, so Creating Friendly AI describes a Friendly AI capable of self-correction. A Friendly AI is stabilized, not by objective morality—though I’ll take that if I can get it—but by renormalization, in which the whole passes judgement on the parts, and on its own causal history. From the first valid (or acceptable) causes to the shaper network to the supergoals to the subgoals to the actual self-actions is supposed to evolve enough real complexity that nepotis- tic self-judgements—circular logic as opposed to circular dependencies—doesn’t happen; furthermore, the system contains an explicit surface-level bias against circular logic and arbitrariness. Propose a bit of arbitrary data to the system, and the system will see it as arbitrary and reject it; slip a bit of arbitrary data into the shaper network, and there’ll be enough complexity already there to notice it, deprecate it, and causal-rewrite it out of existence. The arbitrary data can’t slip around and justify itself, because there are deeper and shallower shapers in the network, and the deep shapers—unlike a human system containing rationalizations—are not affected by the shallow ones. Even if an extrane- ous cause affects a deep shaper, even deep shapers don’t justify themselves; rather than individual principles justifying themselves—as would be the case with a generic goal sys- tem protecting absolute supergoals—there’s a set of mutually reinforcing deep principles that resemble cognitive principles more than moral statements, and that are stable un- der renormalization. Why “resemble cognitive principles more than moral statements”? Because the system would distrust a surface-level moral statement capable of justifying itself!
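Creating Friendly AI does not specify an algorithm here, but the distinction between circular dependency and circular logic can be caricatured in a few lines (hypothetical structure throughout, my own toy): content survives renormalization only if some support path grounds it in the deep shapers without passing through the content itself, so a self-justifying arbitrary datum is deprecated while legitimately interdependent content is kept.

```python
# True if `node` reaches a deep shaper via supporters other than itself.
def grounded(node, supports, deep, seen=frozenset()):
    if node in deep:
        return True
    seen = seen | {node}
    return any(s not in seen and grounded(s, supports, deep, seen)
               for s in supports.get(node, ()))

deep = {"deep_shaper"}
supports = {
    "supergoal":      ["deep_shaper"],
    "subgoal":        ["supergoal"],
    "arbitrary_meme": ["arbitrary_meme"],   # only supports itself
}
kept = {n for n in supports if grounded(n, supports, deep)}
# kept == {"supergoal", "subgoal"}; "arbitrary_meme" is deprecated and
# causal-rewritten out, because its only support is circular.
```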

82. “David’s Sling,” Marc Stiegler.


A Friendly AI does not have the human cognitive process that engages in complex rationalization, and would have shapers that create a surface-level dislike of simple rationalizations—"simple" meaning cases of circular logic, which show, not just circular dependency, but equivalence of patterned data between cause and effect, and visibly infinite degrees of freedom. The combination suffices to make a Friendly AI resistant to extraneous causes, even self-justifying extraneous causes—as resistant as any human.

Finally, renormalization is—though this is, perhaps, not the best of qualifications—psychologically realistic. Here we are, human philosophers, with a cognitive state as it exists at this point in time, doing our best to correct ourselves—generally speaking, by performing a causal validity on things that look like identifiable errors, using our philosophies (shaper networks) as they exist at any given point. We might feel the urge to go beyond renormalization, but we haven't been able to do it yet . . .

An AI with a shaper network can make humanlike decisions about morality and supergoals. An AI with the ability to absorb shaper complexity by examination of the humans, and the surface-level decision to thus absorb shaper complexity, will become able to make human-equivalent (at least) decisions about morality. An AI that represents vis initial state as the output of a previous shaper network (mostly the human programmers'), and thus represents vis initial state as correctable, has causal validity semantics . . . A Friendly AI with causal validity semantics and a surface-level decision to renormalize verself has all the structure of a human philosopher. With sufficient Friendliness content plugged into that structure, ve can (correctly!) handle any moral or philosophical problem that could be handled by any human being.

This conclusion is relatively recent as of April 2001, and is thus still very tentative. I wouldn't be the least bit surprised to find that this needs correction or expansion. But when it's been around for a year or two, and the corners have been worn off, and people are actually building research Friendly AI systems, I expect it will support a bit more weight.

5.6.3. The Actual Definition of Friendliness

Note: If you just jumped directly here, forget it. Read the rest of CFAI first.

I usually think of an individual human as being the sum of three layers of functional complexity. The bottom layer is the panhuman; the set of complex functional adaptations that are shared by all humans who are not actually suffering from neurological disorders (with any given complex functional adaptation being shared by, say, at least 99% of the population and almost always more). The middle layer is the Gaussian; the distribution of quantitative ability levels (or anything else that can be quantitative, such as the intrinsic strength of an emotion), which almost always lies along a bell curve.83,84 The final layer is what I think of as the "personal" or "personality" layer; all the complex data structures, the patterned data, the beliefs and memes and so on.

The renormalizing shaper network should ultimately ground itself in the panhuman and Gaussian layers, without use of material from the personality layer of the original programmer. This is how "programmer independence" is ultimately defined.

Humanity is diverse, and there's still some variance even in the panhuman layer, but it's still possible to conceive of a description for humanity and not just any one individual human, by superposing the sum of all the variances in the panhuman layer into one description of humanity. Suppose, for example, that any given human has a preference for X; this preference can be thought of as a cloud in configuration space. Certain events very strongly satisfy the metric for X; others satisfy it more weakly; other events satisfy it not at all. Thus, there's a cloud in configuration space, with a clearly defined center. If you take something in the panhuman layer (not the personal layer) and superimpose the clouds of all humanity, you should end up with a slightly larger cloud that still has a clearly defined center. Any point that is squarely in the center of the cloud is "grounded in the panhuman layer of humanity." Similarly, for Gaussian abilities, some abilities are recognized by the shaper network as being "good," and those are just amped to the right end of the graph, or way off the end of the graph. If an ability is not "good" or "bad" but its level is still important, or its relative level is important, this can be determined by reference to the superposition of humanity.

Panhuman attributes that we would think of as "selfish" or "observer-biased" tend to cancel out in the superposition; since each individual human has a drastically different definition, the cloud is very thin, and insofar as it can be described at all, would center about equally on each individual human. Panhuman attributes such as "altruism," especially morally symmetric altruism or altruism that has been phrased using the semantics of objectivity, or by other means made a little more convergent for use in "morality" and not just the originating mind, build up very strongly when all the humans on Earth are superposed. The difference is analogous to that between a beam of incoherent light and a laser.
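A toy numerical version of the laser analogy, with invented numbers: model a one-dimensional "configuration space" in which an observer-biased preference centers on each individual, while a panhuman altruistic preference centers on (nearly) the same point for everyone. The superposition of the first is a thin, smeared-out cloud; the superposition of the second is a tight cloud with a clear center.

```python
import numpy as np

rng = np.random.default_rng(0)
n_humans = 10_000

# Observer-biased preference: each human's cloud centers on their own position.
selfish_centers = rng.uniform(-10.0, 10.0, n_humans)

# Shared altruistic preference: everyone's cloud centers near the same target,
# with a little individual variance.
altruist_centers = rng.normal(0.0, 0.5, n_humans)

print(np.std(selfish_centers))    # ~5.8 -> diffuse superposition, no sharp center
print(np.std(altruist_centers))   # ~0.5 -> tight superposition with a clear center
```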

83. Usually, a given level of ability is determined 50% by genetics and 50% by environment. Not "evenly mixed"; if you actually try to measure the components of the variance, it actually comes out 50/50; there's probably some kind of evolutionarily stable balance, analogous to gender distributions.

84. Note that people don't lie along a bell curve; abilities lie along a bell curve. People have more than one ability.


Is this fair? Consider two children arguing over a candy bar. One says he should get all of it; the other says they should split it evenly and she should get one-half. Is it fair to split the candy bar by giving him three-quarters and her one-quarter? No. The fair distribution is half and half. There is an irreducible level of fairness, and if 95% of humanity agrees that half and half is irreducibly fair, then the remaining 5% who each individually think that it's more fair if they each get the entire Solar System do not affect the majority vote. And unless you're one of that 5%, this should be "fair" according to your intuitions; there should be no fairer way of doing it. (Also, the remaining 5% would look like a thin cloud rather than a laser beam; the point here is that even the thin cloud fails to affect things, because while higher layers might be settled by compromise, the bottom layer of irreducible fairness is settled by majority vote of the panhuman superposition of humanity.)

If there is really anything that matters to the final outcome of Friendliness in the personality layer, and I don't think there is, it can be settled by, e.g., majority vote of the superposition of normative humanity, or just the superposition of humanity if the decision needs to be made prior to the determination of "normative."

And that is programmer-independent normative altruism. All that is required is that the initial shaper network of the Friendly AI converge to normative altruism. Which requires all the structural Friendliness so far described, an explicit surface-level decision of the starting set to converge, prejudice against circular logic as a surface decision, protection against extraneous causes by causal validity semantics and surface decision, use of a renormalization complex enough to prevent accidental circular logic, a surface decision to absorb the programmer's shaper network and normalize it, plus the assorted injunctions, ethical injunctions, and anchoring points that reduce the probability of catastrophic failure. Add in an initial, surface-level decision to implement volitional Friendliness so that the AI is also Friendly while converging to final Friendliness . . .

And that is Friendly AI.

5.6.3.1 Requirements for "Sufficient" Convergence

Complete convergence, a perfectly unique solution, is the ideal. In the absence of perfect convergence, the solution must be sufficiently convergent:

• The solution must converge to at least the extent that existing human intuitions converge. For example, probably at least 95% of the population would agree that exterminating humanity to solve the Riemann hypothesis is unFriendly. At least that degree of convergence is required to distinguish "Friendly AIs" from, say, "cat-valuing AIs." This is the requirement of definitional feasibility.


• The solution must be at least as good as any solution that would be developed by any living human, or any uploaded human self-improved to superintelligence. Any risk of failure of Friendliness should be less than the risk of an upload's failure of normative morality (due to inherent turpitude or a messed-up self-enhancement). If the Friendly AI is not knowably and blatantly that Friendly, then the differential estimated risk between Friendly AI and upload, or the differential estimated risk between today's Friendly AI and tomorrow's Friendly AI, should be small relative to the differential estimated risk of planetary Armageddon due to military nanotechnology, biological warfare, the creation of unFriendly AI, et cetera. (See 6.1 Comparative Analyses.) This is the requirement of navigational feasibility.

• The solution must be convergent enough—or any remaining divergence unpredictable enough—that what any individual human stands to gain or lose from the remaining degrees of freedom is vastly dwarfed by the baseline gain shared by all humans. This is the requirement of normative political feasibility.

5.7. Developmental Friendliness

5.7.1. Teaching Friendliness Content

With all the structural features of Friendliness assumed, the problem of teaching a Friendly AI the Friendliness content is essentially the same as teaching any other skill or set of concepts. There is the requirement that Friendliness not get too far "out of sync" with the rest of the AI (discussed later), and the requirement that concepts and skills be taught in an order which avoids catastrophic (or even vanilla disastrous) failures of Friendliness—that by the time an AI is sophisticated enough or powerful enough to make a certain mistake, the AI should already have those structural features or injunctions or other bits of Friendliness content that prevent the mistake from being made. Aside from that . . .

Yudkowsky (2001) discusses some of the mistakes made by twentieth-century AI researchers—with respect to "concepts" in particular. I strongly recommend reading at least Yudkowsky (2001, § Executive Summary and Introduction) or Yudkowsky (2001, § What is General Intelligence?) for a discussion of the basic paradigms, and the way in which intelligence is a sequence of thoughts that are structures of concepts that are built on experience in sensory modalities. The modality level is the only level that would actually be implemented in code, analogous to the hardwired visual cortex of the human brain.

When I say that concepts are abstracted from sensory-modality-based experiences, I don't mean that pouring massive amounts of experience into some framework will automatically mean general intelligence, I don't mean that anything called "experience" will automatically be useful, and so on. When I say that using thoughts in problem-solving scenarios is necessary to hone and acquire skills, I don't mean that going through a million problem-solving scenarios will automatically result in general intelligence. And so on. AI has seen too much of that already. Nonetheless, experience is what provides the raw material that creates new concepts or fleshes out programmer-affirmed concepts, and the use of thoughts in test scenarios is what creates new thought-level skills, or fleshes out skills that started out as "skeleton" (programmer-affirmed) knowledge.

5.7.1.1 Trainable Differences for Causal Validity

To teach a concept—in particular, to teach a category boundary—it is best to find teaching scenarios in which the presence of a concept differs from its absence. (This statement is an informal version of the Bayesian bindings discussed in 5.5.1 External Reference Semantics.) External reference semantics are the simplest kind of trainable differences; instances of (external referent) category “Friendliness” are associated with the keystrokes “You have been Friendly,” and instances of non-Friendliness or unFriendliness are associated with the keystrokes “You have not been Friendly” or “You have been unFriendly.”

In the course of providing shaper/anchor semantics and causal validity semantics, you’ll need to take a step behind the keystrokes, into the programmer’s mind, so that the AI can conceive of an errant keystroke. This means that you need to find a concrete difference between “dependency on the programmer’s mind” and “dependency on the programmer’s keystrokes.” The simplest trainable difference is the case of the programmer correcting herself when she notices a spelling error; the keystrokes “You have been Fierndly” are followed by the keystrokes: “Correction: You have been Friendly.” If the programmer then goes on to type “The word ‘Fierndly’ instead of ‘Friendly’ was due to an erroneous keystroke,” and the AI already has enough general intelligence to actually understand what she means, then this should result in the formation of a cognitive model in which the programmer’s intentions give rise to the sentence “You have been Friendly,” after which an extraneous cause results in the transmission of the keyboard phrase “You have been Fierndly,” after which the programmer observes the effect, induces the extraneous cause by comparison of the visualized phrase with the actual one, and generates the keystrokes “Correction: You have been Friendly.” Of course, an AI that intelligent should have already guessed (long since!) what was meant, but I can easily see an injunction to wait for the actual correction (programmer confirmation of the AI’s guess as to what was meant) before acting.

When the AI is mature enough, you want to start digging deeper and deeper into the causal roots of the programmer. The first and perhaps the largest step is to explode the “programmer” into a system with moving parts, instead of a unified object, so that the shaper network can begin to be explained. The first class of cognitive errors that the AI is taught to detect should be the strictly and unambiguously factual: instances where the programmers make a decision based on a mental picture of physical reality that is blatantly false. Unless the programmer has been stupid, “blatantly false” probably means “totally disconfirmed by later evidence,” not necessarily something that was false at the time. The trainable difference is that the programmer catches, not a spelling mistake, but a thinking mistake, and issues a correction. In the beginning, of course, one simply goes by the programmer-affirmed reflex that temporally later programmer-originated sensory events usually supersede earlier ones, which can later become a consequence of the heuristic that if the programmer issues a different command later, it’s probably because she discovered a mistake. The exception is a seemingly small statement that invalidates a lot of prior content; that is one case where the AI should experience enough doubt to fire back a query before going on.

Eventually, we introduce the concept of detecting more fundamental errors in a programmer’s shaper network, with a trainable difference created by slipping a few pills into . . . No! Sorry! But we do need to introduce, eventually, the concept that the programmer’s ideas are valid because she thinks using valid rules; that validity is not a simple de-facto result of the presence of some thought in the programmer’s mind. Otherwise you end up with Scenario 2 from 5.5 Interlude: Why Structure Matters. I think that probably the best way for the AI to learn about this is to engage the humans in a philosophical dialogue and make deductions about their cognitive states and cognitive processes, checking the conclusions back with the humans, and even so keeping them tentative relative to the things that the programmer (and the AI) are more sure about; the deeper layers of the shaper network may have more validity, but it’s probably a good injunction to give the solidly supported surface layers a lot more confidence.
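A minimal sketch of the simplest trainable difference described above, in which programmer keystrokes are treated as sensory assertions about Friendliness and a later correction supersedes an earlier, errant transmission. All class and function names here are illustrative assumptions, not an implementation taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProgrammerAssertion:
    """One keystroke-level sensory event about Friendliness."""
    text: str
    supersedes: Optional[int] = None   # index of an earlier, errant assertion


@dataclass
class FriendlinessSensoryLog:
    """Treats programmer statements as sensory data about the external
    referent "Friendliness", in the spirit of external reference semantics."""
    events: List[ProgrammerAssertion] = field(default_factory=list)

    def receive(self, text: str, correction_of: Optional[int] = None) -> int:
        self.events.append(ProgrammerAssertion(text, correction_of))
        return len(self.events) - 1

    def effective_assertions(self) -> List[str]:
        """Later programmer-originated events supersede the ones they correct;
        the superseded keystrokes are modeled as having an extraneous cause
        (a typo) rather than reflecting the programmer's intentions."""
        superseded = {e.supersedes for e in self.events if e.supersedes is not None}
        return [e.text for i, e in enumerate(self.events) if i not in superseded]


log = FriendlinessSensoryLog()
typo = log.receive("You have been Fierndly")
log.receive("Correction: You have been Friendly", correction_of=typo)
assert log.effective_assertions() == ["Correction: You have been Friendly"]
```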

5.7.2. Commercial Friendliness and Research Friendliness

A nonprofit institution, such as the Singularity Institute, has greater freedom of action when it comes to adding ethical injunctions long before they’re needed, letting the AI avoid anything that looks the least bit unFriendly “just to be sure,” trying to implement the last 10% of functionality that takes 90% of the effort, and doing it all two or more steps in advance of the development stage where it becomes unavoidably necessary. An AI that has as much structural Friendliness and Friendliness content as a mind of that intelligence can usefully represent (for use then or later) has achieved “supersaturated Friendliness.” Since the Singularity Institute’s stated goal is the deliberate creation of a self-improving seed AI capable of reaching transhumanity, striving for supersaturated Friendliness is appropriate.


A commercial effort, the goal of which is to produce and maintain saleable products of use to customers, requires a level of Friendly AI that is commensurate with the ability of the commercial AI to self-improve, the AI’s current level of intelligence, the AI’s expected future level of intelligence, and the amount of computing power devoted to the AI. Researchers in a commercial effort may have limited freedom of action with respect to the percentage of resources or time spent on Friendly AI. Finally, a commercial effort may regard a hyper-alert, hyper-wary morality as undesirable for tool-level AIs—one doesn’t want the data-mining system arguing with the CEO that layoffs should be avoided just in case they turn out to be unFriendly.

For a commercial effort that genuinely believes its AI project has a serious chance of eventually reaching transhumanity, SIAI’s current recommendation is a “one step in advance” guideline for Friendliness structure, and a “90/10” policy for Friendliness content. A structural characteristic of Friendliness (external reference semantics, shaper/anchor semantics, causal validity semantics) should be implemented one step ahead of the stage where that structural feature becomes necessary for further programmer-guided growth to take place. Friendliness content should be pursued using a 90/10 policy; implement that 10% of the content that accounts for 90% of the functionality.

The primary goal of commercial Friendliness is forwards compatibility. Any given commercial system is not likely to present a threat to humanity—certainly not as of April 2001. But commercial AIs aren’t getting any stupider, either. Hence, forwards compatibility. A system in which all goals derive from a unified goal system can always be rewritten to transform the declarative subgoals into independently active autonomic drives in local programmatic domains. The converse would be a lot harder. (The first architecture is also more versatile and context-sensitive.) In general, prefer the declarative to the procedural so that the AI can understand it, improve it, and conceptualize it; otherwise Friendliness content cannot refer to it.
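To illustrate the forwards-compatibility point with a hedged sketch (none of these names or structures come from the text; they are assumptions for the example): a declarative subgoal that inherits its desirability from a parent goal can be mechanically flattened into a local autonomic drive, whereas recovering the declarative structure from a hard-coded drive would require reverse-engineering the original intent.

```python
class Goal:
    """Declarative goal node: the AI can inspect it, explain it, improve it,
    and Friendliness content can refer to it by name."""

    def __init__(self, name, parent=None, desirability=None):
        self.name = name
        self.parent = parent              # subgoals derive from supergoals
        self.desirability = desirability  # set only on the supergoal

    def derived_desirability(self):
        # A subgoal inherits desirability from the goal it serves.
        if self.parent is None:
            return self.desirability
        return self.parent.derived_desirability()


def compile_to_drive(goal):
    """Forwards compatibility: a declarative subgoal can always be flattened
    into an independently active procedural drive for a local domain.
    The reverse transformation (drive back to declarative goal) is the hard one."""
    strength = goal.derived_desirability()
    return lambda context: strength if goal.name in context else 0.0


friendliness = Goal("Friendliness", desirability=1.0)
accurate_reports = Goal("produce accurate reports", parent=friendliness)
drive = compile_to_drive(accurate_reports)   # usable where only a local reflex is wanted
assert drive({"produce accurate reports"}) == 1.0
```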

5.7.2.1 When Friendliness Becomes Necessary

External reference semantics become necessary at the point where the AI has the internal capability to resist alterations to supergoal content, or to formulate the idea that supergoal content should be protected from all alteration in order to maximally fulfill the current supergoals. When the AI becomes capable of resisting modification, and of having the realization that modifying a given set of supergoals is contradictory to those supergoals, the AI must have (a) probabilistic supergoals, (b) the behaviors associated with the possibility that a supergoal is “wrong,” (c) external reference semantics (either as flat knowledge or as the result of shaper/anchor semantics), and (d) a belief (probably programmer-affirmed) that the programmer “usually knows the correct supergoal content”; that programmer statements are sensory data about Friendliness.


External reference semantics are relatively simple and unambiguous, and it should be possible to implement them in any existing AI system with a declarative goal system. Other necessary structural properties of generic goal systems, such as the strict derivation of subgoals from supergoals, and quantitative desirabilities and hypothesis strengths, should also be implemented.

Causal validity semantics become necessary at the point where the AI has the capability to formulate the concept of a philosophical crisis, and where such a crisis would have negative effects. Again, this structural property should be implemented in the stage before it becomes necessary. It is not possible to implement causal validity semantics without an AI that knows about the existence and causal origins of its (vis) own source code or cognitive functions. Similarly, when the AI becomes capable of understanding vis own architecture, the full causal validity semantics are necessary, so that the AI can conceive of the possibility that not merely goal content but goal system architecture can be incorrect.

Shaper/anchor semantics ground the external reference semantics, and should be implemented whenever the AI begins making decisions that are dependent on the grounding of the external reference semantics. At least the knowledge that “there are such things as shapers,” the structural potential, is necessary in order to implement causal validity semantics. A surface-level knowledge of shapers is necessary to spot cognitive mistakes by the programmer. A full-featured shaper network is necessary either to have real domain knowledge about morality, or to make real decisions about causal validities. A shaper network with fully understood content, full external reference semantics, and tested ability to apply causal validity semantics, becomes necessary at the stage where an AI has any real general intelligence. It may not be possible to have fully understood shaper content before this stage, but the skeleton system of programmer-affirmed content should be present, and the structural semantics should all be fully implemented.
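The staging described in this subsection could be summarized, very roughly, as a checklist keyed to the AI’s capabilities. This is only an illustrative sketch; the capability descriptions and the mapping below are paraphrases assumed for the example, not a specification from the text.

```python
# Illustrative only: each entry pairs a capability with the Friendliness structure
# that should already be in place (ideally one stage earlier).
STAGED_REQUIREMENTS = [
    ("can resist, or argue for entrenching, its own supergoal content",
     ["probabilistic supergoals",
      "behaviors for the possibility that a supergoal is wrong",
      "external reference semantics",
      "belief: programmer statements are sensory data about Friendliness"]),
    ("can formulate the concept of a philosophical crisis",
     ["causal validity semantics",
      "knowledge of the existence and causal origins of its own source code"]),
    ("makes decisions that depend on the grounding of the external referent",
     ["shaper/anchor semantics (at least skeleton, programmer-affirmed content)"]),
    ("has real general intelligence",
     ["fully understood shaper network content",
      "full external reference semantics",
      "tested causal validity semantics"]),
]


def missing_structure(current_capabilities, implemented_structure):
    """Return the structural features that are overdue, given what the AI can already do."""
    overdue = []
    for capability, required in STAGED_REQUIREMENTS:
        if capability in current_capabilities:
            overdue.extend(s for s in required if s not in implemented_structure)
    return overdue
```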

5.7.2.2 Evangelizing Friendliness

We hope to eventually have the funding, and the personnel, to employ at least one Friendship Evangelist who will not be a member of the SIAI project team, though s/he will have full access to the results of our work. The purpose of the Evangelist will be to assist other AI projects in developing their Friendship systems. Since the Evangelist will not contribute to SIAI’s development efforts (but will still have access to Friendliness research conducted at SIAI), the Evangelist will be able to sign any non-disclosure or non-use agreements required. Of course, we hope that any information learned about Friendship systems—that can be revealed without compromising trade secrets—will be shared openly among all AI projects, both nonprofit and commercial.


Currently, the Singularity Institute’s only available expert on Friendliness is the author of this document, Eliezer Yudkowsky, who is also expected to be the lead researcher when coding begins on our own AI effort. Thus, Yudkowsky cannot sign any NDAs (non-disclosure agreements) at this time. However, he is otherwise available to provide advice on Friendship systems.

5.8. Singularity-Safing (“In Case of Singularity, Break Glass”)

By the time the AI reaches the point where a hard takeoff is even remotely possible, the AI needs to have enough detail in the Friendship system that the AI knows where to look for more information if a hard takeoff begins, and to not take precipitate action until things have settled down—the “in case of Singularity, break glass” scenario.

When is a hard takeoff “remotely possible”? Conservatively ignoring the possibility of unplanned Friendship, Douglas Lenat was taking a real (albeit tiny) risk when he switched on Eurisko for the first time. Nowadays it seems absurd to describe this as a possibility, but back in the twentieth century it wasn’t a known fact that classical AI was a dead end, or that human-equivalent intelligence was impossible on desktop abacuses. When Lenat switched on Eurisko, it was a system of a kind that had never existed before on Earth, and there was at least the theoretical possibility that Eurisko would do a hard takeoff.

There is an interesting analogy between the “unFriendly Eurisko superintelligence” possibility, and the “Maybe we’ll ignite the atmosphere” (Konopinski, Marvin, and Teller 1946) possibility raised during the Manhattan Project. Before Trinity, the probability of igniting the atmosphere was allegedly set at three in one million; after Trinity, calculations definitively proved atmospheric ignition to be impossible. An essentially similar set of calculations might be said to hold for before and after Eurisko—there’s even an interesting analogy between “critical mass” for a fission bomb and the threshold intelligence needed for seed AI.

In reality, as far as I know, nobody ever regarded Eurisko as a risk at all, however remote. This is disturbing. During the Manhattan Project, the possibility may have been three in one million, but people still took it seriously. Konopinski, Marvin, and Teller (1946) eventually wrote a paper that actually ran through the calculations to show that a sustained critical state in the atmosphere was impossible. It was worth a paper. Why? Because an existential risk (planetary risk) is always worth taking seriously. When Eurisko was switched on, the sole safeguard was the impossibility of a hard takeoff. Had a hard takeoff occurred, there would have been not one single design feature that could have guided the course of subsequent events. Leaving aside the possibility of surprise/unplanned/emergent Friendliness, there would have been nothing to make Eurisko friendly, much less Friendly.


Can a primitive system—an infant AI—be made Singularity-ready? For a system that lacks general intelligence and real understanding, nothing can possibly be guaranteed, even in the limited sense of navigational feasibility. If transhuman cognition pops out of an infant AI, it’ll be because some part of the system turned out to be Turing-complete—there’s no guarantee that goal-oriented cognition in the new, post-transcendence system would take place on the same level of description as in the old system. Nonetheless, we may as well ask: “Is there a chance?”

5.8.0.3 The Bayesian Boundary

When Eurisko was switched on, the sole safeguard was the impossibility of a hard takeoff. Do external reference semantics and a clean Friendliness supergoal constitute an improvement on that? Is Bayesian ERS Singularity-safed?

Not really; as discussed in 5.5 Interlude: Why Structure Matters, external reference semantics are not structurally sufficient for open-ended Friendliness. An ERS goal system contains neither shaper/anchor semantics, nor causal validity semantics, nor any mention of what these are or where to find out about them or why finding out about them is desirable. There would be a substantially higher chance of getting a Friendly AI by accident, but the straight-line projection still ends in structural failure of Friendliness. Even in an external-referenced goal system, a programmer action is only significant to the AI insofar as it represents Bayesian sensory data about Friendship content. If a superintelligence already knows every possible sentence you could say to it—or rather, the complete, correct probability distributions for your utterances—then your actual recitation of those words adds no further Bayesian input, or rather, adds a trivial amount of Bayesian input. ERS only makes programmer sensory “actions” into supergoal content; it contains no explicit structure for turning programmer “actions” into code, or into goal architecture changes.

It would be possible for ERS to give rise to SAS and CVS, if the model of Friendliness content origin were previously acausal, and if the programmers knew about shaper/anchor semantics and causal validity semantics (at the time they defined ERS), and if the system had a certain amount of common sense. The system might turn out to be capable of absorbing shaper/anchor semantics and causal validity semantics as the (previously unknown) ultimate structure of the supergoal metric. But if that’s not an explicit feature, it’s not reliable.

If external reference semantics turn out to be inadequate, then even knowing all possible programmer inputs doesn’t yield a satisfactory Friendship system. The problem is with the structure, not the content, and if the system can’t represent the idea of a structural flaw—which requires, not just probabilistic goal content, but a probabilistic goal system—then the problem is unfixable. This is why it’s necessary to implement causal validity semantics, and at least the surface of that shaper content which is prerequisite to causal validity semantics, while the AI is still something that can be changed from the outside. An external-referenced goal system sets a firm and necessary foundation for future growth, but it would not be sufficient, except by accident, to handle a hard takeoff.

Eventually, any Friendly AI undergoing a hard takeoff will cross the Bayesian Boundary. There comes a point when all programmer inputs can be anticipated; when the AI’s understanding—the SI’s understanding—embraces everything that exists within the minds of the programmers, and indeed, the minds of humanity. This is meant, not in the sense of omniscience, but in the simpler sense that the AI basically understands people in general and Friendship programmers in particular, and can track the few hundred or thousand interacting “chunks” that are the high-level description of a human mind. (If you firmly believe this to be impossible, then there is no Bayesian Boundary and you have nothing to worry about; a Friendly AI does not rationalize, and has no bias toward believing ve possesses a greater capability than ve actually does.) Beyond this point it is not the actual programmer inputs that matter, but whether the same forces that act on the programmers are acting on the AI. To put it another way, a transhuman AI knows the programmers will say a certain thing, and thus the programmers’ physical action adds no information, but that does not mean the content of the statement will be ignored. If the programmers “would have” said the statement for valid reasons, the transhuman AI will “obey” the subjunctive instruction. This is a semi-anthropomorphic way to think about it, possibly even bordering on the adversarial, but it’s the basic idea. Note that such anticipation is only possible to a superintelligence. One human being can never know what another human being will do well enough to substitute the expectation for the reality; thus, the fact that this behavior would be annoying in a human (probably indicating failure of caring) does not indicate failure of Friendliness in AI.

Likewise, beyond the Bayesian Boundary—on the other side of superintelligence—almost all injunctions, ethical injunctions, and anchoring points will probably have broken down. This is not cause for alarm—these features implement behaviors that compensate for the possibility of stupidity, factual error, failure of Friendliness, and so on. (Things that are intrinsically desirable are supposed to be supergoal content.) (Again, if you firmly believe that it is impossible to move beyond the possibility of some specific error, then an injunction, ethical injunction, anchoring point, et cetera, based on the possibility of that error, will still be recognizably necessary even after superintelligence. Superintelligences are not stupid!)

There exists the theoretical possibility that a young AI will overestimate vis intelligence and think ve has moved beyond the possibility of some error when ve in fact has not. And by “theoretical possibility,” I mean that the system has the theoretical capability to represent those thoughts, not that there is a significant chance of those thoughts occurring. Likewise, an AI has the theoretical capability to underestimate vis own intelligence, and is equally likely85 to do that. For a seed AI that’s always self-improving, it’s no great hardship (differential undesirability) to wait a bit, to be sure, just in case. An AI has the capability to make mistakes in self-estimation, but an AI does not have the systematic bias towards self-overestimation that a default human does. Thus, an ethical injunction to “wait and be sure,” or possibly just a plain vanilla injunction, should be quite sufficient to make the margin for error much larger than any likely error on the part of the AI.
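As a toy numeric illustration of why an anticipated utterance carries almost no Bayesian weight (a hedged sketch, not a calculation from the text): if the AI already assigns near-certain probability to the programmer making a given affirmation whether or not the underlying hypothesis is true, then actually hearing the affirmation barely moves the AI’s estimate, even though the content of the statement still matters.

```python
def posterior(prior_h, p_obs_given_h, p_obs_given_not_h):
    """Ordinary Bayesian update: P(H | observation)."""
    p_obs = p_obs_given_h * prior_h + p_obs_given_not_h * (1 - prior_h)
    return p_obs_given_h * prior_h / p_obs


# Young AI: genuinely uncertain what the programmer will say, so the
# affirmation is strong sensory evidence about Friendliness content.
print(posterior(prior_h=0.70, p_obs_given_h=0.90, p_obs_given_not_h=0.20))    # ~0.913

# Beyond the Bayesian Boundary: the utterance was anticipated with
# near-certainty under either hypothesis, so hearing it adds only a
# trivial amount of Bayesian input.
print(posterior(prior_h=0.70, p_obs_given_h=0.999, p_obs_given_not_h=0.995))  # ~0.701
```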

5.8.0.4 Controlled Ascent

Before we can say that a hard takeoff would probably work, the Friendship system must be capable of open-ended discovery of Friendliness and open-ended improvement in the system architecture. This requires, at minimum, full causal validity semantics and a well-understood shaper network. (Supersaturated Friendliness would be vastly preferable.)

The problem is that the factors that determine when a hard takeoff becomes intrinsically possible have no link whatsoever to how much Friendliness content is in the system. The takeoff threshold is purely a function of how much computational power the system has and how good the AI is at rewriting vis own source code. The less efficient the initial version of the AI, the more computing power will be required to achieve AI, and the more astronomical will be the AI’s actual intelligence after the hard takeoff. The longer it takes to develop AI, the more computing power will be available when AI is finally developed, and the earlier that a hard takeoff will become possible—not just “earlier” relative to the amount of effort put in, but “earlier” relative to the system’s level of intelligence at the point of hard takeoff. And the faster the hard takeoff will go, once it starts.

At this point, you should probably take a few moments to review 2.2 Assumptions “Conservative” for Friendly AI. The gruesome consequences of failure are technically part of Singularity theory, rather than Friendly AI theory; for the moment, let’s just take it that an unFriendly transhuman AI probably counts as a total loss for humanity even if ve’s locked in a basement with no Internet access and nothing but a VT100 terminal, and that an unFriendly transhuman AI loose on the Internet is definitely a total loss for humanity.

85. Not quite, actually, since a superintelligence is too smart to think ve’s a moron. The probability of error in either direction should be symmetrical for a given level of intelligence, however, and the probability (and size of ) errors would both decrease with increasing intelligence.

If you plan on doing something with Friendliness, it has to be done before the point where transhumanity is reached.

The above three propositions seem to give rise to the alarming conclusion that, even if a Friendly AI attains the potential for a hard takeoff, we should ask the AI to hold off on a hard takeoff while the Friendship programmers catch up. This temporary delay is a “controlled ascent.” This is possible, given unity of will, until the Bayesian Boundary is crossed. (A permanent delay, however—or a series of temporary delays that the AI decides is simply an excuse for loss of nerve—is probably impossible to justify in any normative goal system, Friendly or otherwise.)

I would be dreadfully, dreadfully nervous during a controlled ascent. Just the idea of all that unformed potential is enough to send me screaming into the night. In theory, it’s possible that, after the hard takeoff starts, the lead project—if they’ve done their jobs really, really right—will be in a state where they can’t think of anything that they could do to further improve Friendliness. The AI will be fully capable of self-running any tests they could conceive of. They wouldn’t be able to run the tests without the AI’s help in any case. The AI’s philosophy would be supersaturated far beyond the point of open-ended Friendliness, so that all that’d be left to do would be chatting with the AI about morality while the Singularity rapidly revs up to full speed. Such a Friendly AI would have reached the point of absolute supersaturation, where there is no remaining action that significantly improves the probability of Friendliness, and no feasible test remaining that would be expected to detect any lingering, potentially catastrophic errors if they existed.

In the event that Friendliness has not reached that point of absolute supersaturation, then how much time is available for a controlled ascent will probably depend on the prevailing social atmosphere at the time. If humanity’s social structure has changed substantially, perhaps due to the use of powerful social software as a successor to the Internet, and if the “Singularity” meme has worn the corners off its future shock, it may be possible to continue into the Singularity in full public view without starting a global apocalyptic panic. Otherwise, well, if some AI project somewhere found themselves in the middle of a hard takeoff, I’d want them to call the Singularity Institute’s Friendship Evangelist, but I wouldn’t want the Friendship Evangelist to tell anyone else, including me.86 Any other course of action would be an unnecessary risk.

86. This assumes that nobody else in the field, including me, is known to have significantly greater competence than SIAI’s Friendship Evangelist. The point I’m trying to make is that, in my capacity as a private citizen, I don’t want to know about a Singularity in progress if it would mean screwing up the Singularity.


If a commercial AI project, one that’s been using “one step ahead” Friendliness structure and 90/10 Friendliness content instead of supersaturation, suddenly finds itself in the middle of a hard takeoff . . . well, they’d have a lot of work to do before they were even nominally ready for a Singularity. The AI verself might be able to help out, but even so, a fast controlled ascent might simply be impossible, and any social risks from an extended controlled ascent might simply have to be labeled as acceptable.

Programmatic Controlled Ascent Via an “Improvements Counter”

In the same way that “Singularity-safing” translates to “In case of Singularity, break glass,” “controlled ascent” translates to “Slow down and ask for directions.” In a mature AI, with the ability to examine vis own source code, controlled ascent must be justified as an ethical injunction. However—for non-seed or non-mature AIs—a controlled ascent feature is programmatically a mundane problem, and can be implemented as special-purpose code in any recursively self-improving system, even extremely primitive ones. The trigger of a programmatic controlled ascent feature is an “improvements counter.” The slowdown or stasis would be triggered by a sudden sharp rise in the number (or quality) of self-improvements made, a development which potentially indicates the beginning of a hard takeoff. (Of course, such an improvement curve may also peter out almost immediately.)

Let’s take the example of RSIS, a recursively self-improving system, which for the purposes of discussion will be assumed to be a primitive AI having an architecture generally along the lines of Eurisko. Certain RSIS actions would count as “self-improvements”—creating a heuristic, improving a heuristic, successfully tweaking a parameter, making a discovery, and so on. This “improvements counter” is the first necessary module.

If an average day’s work for RSIS is 1000 “improvements,” then a second module should page the programmers and trigger a save-and-shutdown if 50,000 “improvements” occur in 24 hours or less. This improvements counter should not be checked every hour, but every time an improvement registers. (Computationally, this is an extremely trivial expenditure.) Checking every hour is not sufficient. A lot can happen in an hour. A machine that exploits the full potential of 2 GHz transistors could conceivably go a very, very long way in an hour. Thus, the counter should be checked on each improvement; again, this is a computationally trivial investment.

Simply counting the number of heuristics improved will not catch a hard takeoff which manifests exclusively in the quality of the improvements, nor will it catch an AI which improves to full seed AI and general intelligence within the first 50,000 improvements. However, the system above provides 90% of the possible functionality with 10% of the effort.
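A minimal sketch of such a counter, under the assumptions used in the example above (1000 improvements as a normal day’s work, a 50,000-improvement threshold within 24 hours, and a check on every registered improvement). The class, callback names, and windowing details are illustrative assumptions, not part of any described implementation.

```python
import time

DAILY_NORM = 1_000            # average improvements in a normal day's work for RSIS
TRIGGER_THRESHOLD = 50_000    # 50,000 improvements in 24 hours trips the safeguard
WINDOW_SECONDS = 24 * 3600


class ImprovementsCounter:
    """Counts self-improvements and trips a safeguard on a sudden sharp rise.
    Checked on every improvement, not once per hour."""

    def __init__(self, page_programmers, save_and_shutdown):
        self.timestamps = []                  # times of recent improvements
        self.page_programmers = page_programmers
        self.save_and_shutdown = save_and_shutdown

    def register_improvement(self, now=None):
        now = time.time() if now is None else now
        self.timestamps.append(now)
        # Keep only the improvements inside the trailing 24-hour window.
        cutoff = now - WINDOW_SECONDS
        self.timestamps = [t for t in self.timestamps if t >= cutoff]
        # Computationally trivial check, performed on each improvement.
        if len(self.timestamps) >= TRIGGER_THRESHOLD:
            self.page_programmers("Possible hard takeoff: improvement rate spike.")
            self.save_and_shutdown()
```

As the text notes, counting improvements catches neither a purely qualitative takeoff nor one that completes within the first 50,000 improvements; the sketch is the 90/10 version, not a guarantee.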


An interesting, more general version of the above feature would be one that caught 50,000 improvements without validation, rather than in some specified timeframe. A programmer hitting some specific UI switch87 might validate all the improvements up to half an hour ago. If 1000 improvements is the per-day norm, and the programmers remember to validate at the end of each working day, then the ascent feature should never trigger under ordinary circumstances. Similarly, if 10,000 improvements are going by per hour, and the programmers are watching interestedly but see no reason to slow down as yet, then validating repeatedly (rather than once at the end of the day) would enable RSIS to keep running at full speed. This would also permit the programmers to take a collective two-week vacation and leave the system running without too much worry about RSIS changing overmuch in their absence; more than 50,000 improvements accumulated over the two weeks will result in a slowdown until the programmers return, check out RSIS, and validate. (Although hopefully, even on vacation, a programmer will still be able to respond to a “slowdown triggered!” alert from RSIS.)

As with almost every aspect of a Friendship system, this feature serves a double purpose. The start of one jump doesn’t necessarily herald the start of a hard takeoff, so any given alert is almost certainly a “false alarm” . . . but “false” only from a Singularity standpoint. Even if RSIS is just climbing a one-time but steep curve, it might be a good idea for it to happen with the programming team present and watching, rather than happening unsupervised at 3 a.m. If you leave RSIS to run overnight, it shouldn’t be unrecognizable when you get back in the morning—whether or not ve’s now a superintelligence. A controlled ascent feature thus has applications beyond Singularity-safety.

The precise nature of the triggered action also deserves further exploration. A save-and-shutdown order is easy to implement, but it is also an overly extreme response, verging on the adversarial. Passing 50,000 improvements should not trigger a shutdown, but should instead trigger an altered system state in which self-improvement is no longer pursued. If an additional 1000 improvements pass and the number of unvalidated improvements reaches 51,000, then a save-and-shutdown should be triggered. (The additional 1000 improvements might occur due to strictly autonomic learning processes, for example.) It may also be a nice extra feature to, e.g., send a small preliminary alert to the programmers at the 10,000 threshold. (User-friendly Friendship.)

The “temporarily stop self-improvement” state also has applications beyond Singularity-safety. There may be many times at which the programmers want to carry out some involved operation, such as a set of coordinated changes, without worrying about the system mutating out from under them. Here, again, a feature necessary for Friendship purposes turns out to be reusable elsewhere, a good sign that Friendliness is the Right Thing for this situation.

87. “Hitting a UI switch” can include, for example, typing in a command, or speaking a natural-language phrase into a voice recognition system.


The chance that something unexpected will mess up an unjustified “controlled ascent” feature is relatively high, since—unless justified—the feature is undocumented, and hence, out of harmony with the system. The AI might stomp it, all unawares, perhaps innocently thinking that it was a bug in the code; the entire idea of a programmatic controlled ascent is that the AI slows down, to resynchronize with the programmers, before the AI gets far enough to start rewriting all the source code. If the safeguard triggers far enough in advance—which, with a threshold set at merely fifty normal days’ worth of improvements, will probably work—the Friendship programmers would have a pretty good chance of getting there the next morning, and, before restarting, inserting a programmer-affirmed explanation of why a more controlled ascent is necessary . . . or perhaps hack features which enable the AI to stay awake without further improvement while the Friendship team tries to bring Friendship progress back into sync with progress in the AI, so that the AI can represent the affirmation et cetera.

Programmatic controlled ascent is only a makeshift safeguard. But it’s a safeguard that can be both useful and simple, or slightly complex and even more useful, and which is a relatively mundane programming problem (i.e., it does not require solving a frontier research problem of AI as a prerequisite to 90/10 implementation). Thus, programmatic controlled ascent should be implemented in any recursively self-improving system, whether or not intended as the precursor of a general intelligence or seed AI. If nothing else, implementing this one feature is a first step that shows Friendship-awareness on the part of the AI project.
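The unvalidated-improvements variant, with the preliminary alert, slowdown, and save-and-shutdown thresholds described above, might be sketched as follows. The threshold values follow the examples in the text; the class, method, and callback names are assumptions made for the illustration.

```python
class UnvalidatedImprovementsCounter:
    """Counts improvements since the last programmer validation, rather than
    within a fixed timeframe.  Hitting the validation switch approves everything
    the programmers have already looked over (e.g. up to half an hour ago)."""

    ALERT_AT = 10_000      # small preliminary alert to the programmers
    SLOWDOWN_AT = 50_000   # altered state: self-improvement no longer pursued
    SHUTDOWN_AT = 51_000   # save-and-shutdown if growth continues anyway

    def __init__(self, alert, slow_down, save_and_shutdown):
        self.unvalidated = 0
        self.alert = alert
        self.slow_down = slow_down
        self.save_and_shutdown = save_and_shutdown

    def register_improvement(self):
        self.unvalidated += 1
        if self.unvalidated == self.ALERT_AT:
            self.alert("10,000 unvalidated improvements: preliminary alert.")
        elif self.unvalidated == self.SLOWDOWN_AT:
            self.slow_down()            # stop pursuing further self-improvement
        elif self.unvalidated >= self.SHUTDOWN_AT:
            self.save_and_shutdown()    # e.g. autonomic learning kept going

    def validate(self, approved_count):
        """Programmer hits the validation switch, approving `approved_count`
        improvements; ordinary daily validation keeps the counter near zero."""
        self.unvalidated = max(0, self.unvalidated - approved_count)
```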

Controlled Ascent as Ethical Injunction

A Friendliness supergoal, plus external reference semantics, plus an improvements counter linked to a programmatic “save-and-shutdown,” is enough to give an AI a nonzero chance of safely handling a hard takeoff, even if it starts tomorrow. If an AI project does nothing else, this is still a good idea.

But that’s only a quarter of a controlled ascent system. It leaves an awful lot to be done, without the AI’s assistance, before the AI can be safely reawakened after a save-and-shutdown. It’s much safer if the AI stays awake but refrains from further improvements until it can talk to the programmers. In a primitive AI, this can be done via a triggered slowdown rather than a save-and-shutdown, with the shutdown occurring only if the slowdown proves ineffective.

However, a moderately mature AI—including an AI that matured to any significant degree before the save-and-shutdown—may not go along with stopping improvements, unless the stoppage is justified. Programmatic controlled ascent is only viable if the AI doesn’t have enough reflectivity and intelligence to independently notice “Hey, something shut me down for eight hours! Hey, self-improvement is switched off! Isn’t that a bad thing?” Beyond a certain point, safeguards must be cooperative, which means “justified within the goal system.” Controlled ascent past a certain point requires a controlled ascent subgoal. Similarly, only a controlled ascent subgoal is effective for a seed AI that has advanced to the point that it would notice any attempt to implement a save-and-shutdown feature. For relatively young AIs, intermediate in intelligence between “too dumb to represent the subgoal” and “aware of all code in the system,” probably the best course is a “stop-improving” subgoal kicking in at 50,000, a “save-and-shutdown” subgoal kicking in at 51,000, and an emergency programmatic “save-and-shutdown” feature kicking in at 51,050.88

A controlled ascent subgoal would be justified by content resembling an ethical injunction, i.e. based on self-awareness of fallibility, or perhaps affirmation of fallibility. As described in 5.1.2 Friendliness-Derived Operating Behaviors, self-improvement is affirmed/predicted to result in future Friendliness, which is why self-improvement is desirable for a cleanly Friendly seed AI. A controlled ascent subgoal is justified by an assertion that too much self-improvement without external guidance may not lead to future Friendliness, and may even lead to future unFriendliness.

The AI has a self-observational metric of self-improvement and a metric of external guidance. If self-improvement gets out of phase with external guidance, then further self-improvement begins increasing the probability of the negative scenario “future unFriendliness” instead of, or to a greater degree than, it increases the probability of the positive scenario “future Friendliness.” Thus, the desirability of further self-improvement shifts from positive to negative—“further” being emphasized to indicate that previous self-improvements are still desirable, and are not suddenly all reversed as the AI tries to revert to infancy. Rather, additional self-improvements are incrementally undesirable. The trigger of the “controlled ascent” injunction is:

1. The metric of personal growth gets out of phase with the metric of external guidance. If external guidance is not understood (by the AI) in a fine-grained, semantic way, just assume that the programmers hit a button that means “We’ve been watching and you still looked OK as of half an hour ago.” (Corollary: If all the programmers are on vacation for two weeks, unwatched personal growth builds up over that time.) The “improvements counter” is a fast way to get a metric of personal growth. If more than 50,000 improvements go past unwatched, a problem exists.

88. Youcan add an undocumented programmatic shutdown as an adjunct to the subgoals without break- ing unity of will—even if it seems adversarial, insofar as it would kick in only if the AI “disobeyed” the subgoal—iff the feature can be justified in retrospect to a seed AI that notices it. This means that the current AI must agree that the previous AI was fallible enough to need ethical programmatic backup for the subgoal, which is a more effective safeguard if undocumented. At the least, the AI must credit that it was an honest, normative, nonadversarial mistake on your part if the measure turns out to be wrong.



The actions contained as content of the injunction are:

1. Stop those actions which are solely subgoals of long-term growth. Further increments of long-term growth switch from highly desirable to slightly undesirable; say, +1000 to −100 (see the sketch after this list). Behaviors which inherit desirability from “long-term growth” become quiescent.

2. Send out a call to the Friendship programmers, so that the process can catch up.

3. If personal growth continues past the secondary boundary (51,000), increments of long-term growth are now highly undesirable (−10,000); go into stasis until the programmers arrive.
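Purely as an illustrative sketch of the desirability shift described in these actions (the numbers are the ones used above; the function itself and its inputs are assumptions, not a specification from the text):

```python
def further_growth_desirability(unvalidated_improvements):
    """Desirability of *additional* increments of long-term growth, as a function of
    how far personal growth has gotten out of phase with external guidance.
    Previous self-improvements keep their value; only further ones are penalized."""
    if unvalidated_improvements < 50_000:
        return +1000    # in phase: growth is predicted to lead to future Friendliness
    elif unvalidated_improvements < 51_000:
        return -100     # slowdown: quiesce growth subgoals, call the programmers
    else:
        return -10_000  # stasis until the programmers arrive
```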

(There might also be miscellaneous additional actions, such as preparing a report: “N improvements made in N-hour period; slowdown triggered. N further improvements in N-minute period; stasis triggered.”)

The AI may even have a complex model of how programmers produce necessary external guidance in response to programmer observations, along the lines described in 5.3.3.1 Cooperative Safeguards. In this case, “self-improvement” results in “future Friendliness” with the assistance of a continuous process called “external guidance.” This would enable an inventive AI to notice cases of “invisible self-improvements” or “obscure self-improvements” accumulating in hidden corners, getting out of phase with the known amount of guidance received and the estimated amount of programmer observation, and would lead the AI to eventually bring such pockets to the programmers’ attention. In other words, a controlled ascent subgoal is a special case that falls out of a general philosophy of interaction between an AI and the programmers.

If one is striving for supersaturated Friendliness, or serious Singularity-type stresses are projected to be shortly placed on the controlled ascent Friendship content, some additional affirmations may be a good idea. For example, the description of the “external guidance” process should explicitly encapsulate the idea that the uncertainty in the “external guidance” process is too great for the AI to try and simulate it as a substitute for actual programmatic intervention. If possible, through either wisdom tournament experience or abstract reasoning on causal validity semantics, the Friendly AI should represent the idea that external guidance may require changes to Friendliness structure as well as content—i.e., that changes to source code may be necessary, not just changes to content. As beliefs about the general philosophy of AI/programmer interaction, this knowledge not only prevents certain errors in controlled ascent behavior, but also provides further hints about what kind of additional Friendship is necessary, both before and after the Bayesian boundary is crossed. (For example, an AI that has crossed the Bayesian boundary would hopefully try to implement anticipated programmer corrections to source code, and not just add anticipated programmer content—an action that is a blood relative of the “controlled ascent subgoal” belief that “growth in Friendship may require structural changes and not just content changes.”)

Friendship Structure for Controlled Ascent

In most cases, the amount of structural Friendship and general intelligence needed to really understand89 the purpose of controlled ascent is probably enough Friendship structure to handle arbitrarily rapid ascents. The ascending AI might still slow down spontaneously to check “structural integrity” through wisdom tournaments and quizzing the programmers and such, but that would simply be the AI’s decision to add safety margin.

The question then becomes how much Friendship structure is needed to represent the justification of a programmer affirmation of controlled ascent. If complete Friendship structure including causal validity is needed in order for “controlled ascent” to be represented as desirable, then there may be relatively little gain from a controlled ascent feature.

It appears, however, that external reference semantics, combined with an ethical injunction and the associated belief in self-fallibility, should be sufficient to represent the desirability of a controlled ascent. External reference semantics may not be sufficient for an AI to independently understand the issues involved in controlled ascent, but they are sufficient to represent the justification of a request to slow down under certain conditions.

∗ ∗ ∗

5.9. Interlude: Of Transition Guides and Sysops

5.9.1. The Transition Guide

The Transition Guide is the hoped-for outcome of Friendly seed AI—a successfully Friendly AI, at or substantially above the transhuman level, with enough intelligence that the potential exists to exert an influence on the world that is different in kind from the capabilities of humans, or any groups of humans.

89. In RNUI terms, “really understand” means “capable of coming to the same conclusion without programmer assistance.”


Figure 9: ©1990 by Universal Press Syndicate.

Given multithreaded serial-fast intelligence, there are several “ultratechnologies” (such as nanotechnology) that probably lie within fairly near reach of present-day capabilities, or certainly the capabilities of a few years hence. For example, the cognitive capability to crack the protein folding problem suffices to attain the material capability of nanotechnology in whatever the minimum turnaround time is between a custom DNA strand synthesis lab and a custom DNA-to-protein synthesis lab, both readily available commercial capabilities in today’s world. But such augmented material abilities are probably not truly necessary. A transhuman intelligence, in and of itself, is enormous power with or without the trappings of material technology. We don’t know what a transhuman could do because we are not ourselves transhuman. This is the effect that lies at the heart of the Singularity—stepping outside the bounds of the last fifty thousand years.

Ultraspeed intelligence and ultrapowerful material technologies are ultimately only human projections. Given the near-certainty of technical feasibility, we can say that such capabilities represent lower bounds on the Singularity; we cannot, however, state upper bounds. The Singularity begins immediately after the rise of transhuman intelligence. Even if things simply mysteriously halted there, even if superintelligence and nanotechnology (or their successor versions) never came into existence, a Singularity would remain in effect; the world would still be changed in unpredictable ways. We don’t know in how many ways humanity is missing the obvious solutions, the obvious ways to do things. If you haven’t read any true Singularity material, then I most strongly advise “Staring into the Singularity,” or some of ’s books, or something that conveys the idea that moving from human to slightly-transhuman is just as large a change as moving from Homo erectus to Homo sapiens.


If, as seems both likely and desirable, transhumanity first comes into existence as the result of AI, and if that AI is Friendly, then Friendliness is called upon to shoulder the massive burden of deciding, not only the future, but how much of the future should be decided. Transhumanity is almost unlimited power-in-potentia; the question becomes, not just how this power should be wielded, but whether it should be wielded, or even whether that power should be acquired in the first place.

As humans, we fear power because we exist in a world of humans who have an innate tendency to abuse power. Given the amount of evolutionary pressure behind power-abuse behaviors, and given the specificity of those behaviors for evolutionary advantage, I think that fear of minds-in-general is unfounded. I thus think that there is no good reason why a Transition Guide should not race at maximum speed to superintelligence and the greatest obtainable degree of material power. However, that verdict is strictly contingent upon the statement that only humans are corruptible by power. The decision is both real and serious, and a Friendly AI needs to be able to make it. Friendliness is ultimately called upon to understand the idea of “humanity’s destiny,” in the way that a human would. Humanity’s destiny will very probably fall into the hands of a self-modifying AI, hopefully a Friendly one, because self-modifying AI is the easiest path to transhumanity. Perhaps the Friendly decision is to never acquire that power, but the possibility would still exist in potentia. I think that the Friendly decision is to acquire that power. Either way, it exposes the final design requirement of Friendly AI. A Friendly AI needs to be able to understand all the philosophy that led up to vis creation. A Friendly AI needs to understand the philosophical issues bound up in the idea of “Friendliness” itself.

I’ve never heard a really good, non-fairy-tale argument for why pain and death are good things. Nonetheless, a lot of people seem to think they are. I don’t think that means the argument somehow becomes more plausible; in our world, large chunks of people are often massively wrong. Nonetheless, lots of people asserting a cosmic meaning for death means that there is no excuse for being taken by surprise if that possibility does somehow materialize. That many people warn you about a fairy-tale possibility doesn’t increase the actual probability, but it does mean that if the possibility does show up, and you’re not ready, after you were warned about it, then you deserve to be drummed out of humanity for sheer absolute incompetence. As best I can tell from the current theory, between shaper/anchor semantics and causal validity semantics, a Friendly AI can handle, or learn to handle—or, at worst, admit that ve can’t handle—any possibility that human beings can understand or even admit that we can’t understand, from the moral value of death to the existence of psychic powers to the noncomputability of qualia to fairies in the garden. And this is necessary because a Friendly AI is humanity’s emissary into an absolute unknown.


I have a visualization of a sample better future that lies on the other side of the Singularity. With the volitional-Friendliness respect for individual wills as structure, and the material technology of superintelligence as power, I expect that the result would be apotheosis—the attainment of the best of all possible worlds consonant with the laws of physics and the maintenance of individual rights. Not just freedom from pain and stress, or a sterile round of endless physical pleasures, but endless growth for every human being and the new beings we create—growth in mind, in intelligence, in strength of personality; life without bound, without end; experiencing everything we’ve dreamed of experiencing, becoming everything we’ve ever dreamed of being. Or perhaps embarking on some still greater quest that we can’t even conceive.

I don’t believe that there are fundamental boundaries, either ethical or physical, that would keep humanity as it is now. There were no fundamental boundaries preventing humanity from crossing over from hunting and gathering to the Internet—for all that almost every local transition along the path was protested ethically and theologically, and predicted to result in failure or terrible consequences. There is nothing holding the present-day world in place; the present-day world, like the worlds before it, is something that “just happened.” I think that humanity’s destiny, insofar as there is one, would be fulfilled as completely as possible in a universe that is less hostile and more friendly. Others seem to think that pain is necessary for growth. I, in turn, think such people have been living on Earth too long. Sure, there are some kinds of human adulthood that require massively unpleasant life events, but that’s a flaw in our underlying cognitive architecture, not something that carries over into the transhuman spaces. The Transition Guide would need enough of a shaper network—enough real philosophy—to decide whether I’m massively wrong, or the Luddites are massively wrong, or we’re all wrong; or whether it doesn’t matter who’s massively wrong because decisions like that are up to individual sentients.

5.9.2. The Sysop Scenario

Greg Egan’s Diaspora (Egan 1998) offers a vision of a future that is, so far, one of the most unique in science fiction. The vast majority of humans are no longer biological; they are what we would call “uploads,” minds running on nanocomputers or other ultracomputing hardware in the same way that a present-day human runs on neural wetware. We would call them uploads; in the future of Diaspora, they are merely “citizens.” They gather into “polises,” one “polis” being a few meters, perhaps, in diameter, and containing trillions or quintillions of intelligent beings; Egan doesn’t specify. Human space consists essentially of the Coalition of Polises, an uncountable number of sentients, with a few embodied robots roaming the solar system and still fewer biological humans on Old Earth, the “fleshers,” divided into the genetically modified “exuberants” and the “statics,” with the “statics” being around the only entities one of today’s humans could shake hands with.


Of course, everyone in Diaspora is still far too human. All the entities are still running on essentially human computing power and have essentially human cognitive architectures. There are no superintelligences. None of the protagonists display transhuman intelligence; that, of course, would require a transhuman author. There is a tradeoff, in depictions of the future, between drama and realism; Diaspora already departs so far from our present-day future that even with human-architecture characters it is still difficult to sympathize with the protagonists. There is only one truly tragic figure in Diaspora and it took me over a year to figure out who it90 was.

More importantly, Greg Egan skips over the question of why all of humanity, all the polises, and Old Earth, have suddenly turned peaceful simply because everyone is free, immortal, and rich. It only takes one aggressor to make a war, and during the twentieth century, offensive technology has considerably outrun the defensive. No present-day shield will withstand a direct hit by a nuclear weapon. Whether offensive technology overpowers defensive at the limit of achievable technology is another question, and obviously the answer is “It could go either way.”

But even if the various “polises”—different operating systems—in Diaspora were surrounded by utterly impermeable defenses, that would create another moral problem, this one even worse: that of an evil polis, where the rules against coercion don’t hold, and some ruling class creates and tortures countless trillions of sentient victims. By hypothesis, if defensive technology beats offensive, there is nothing that anyone can do about this evil polis; nothing that can break the defenses.

It’s also hard to see how the Diaspora solution could consistently arise from today’s world. Suppose that each seed AI in the twenty-first century, as ve reaches transhumanity, becomes the seed and operating system of a polis—so that everyone gets to pick their own definition of Friendliness and live there. It doesn’t seem that the system-as-a-whole would last very long. Good AI, good AI, good AI, good AI, good AI, evil solipsist AI, good AI, good AI, good AI, evil solipsist AI, good AI, good AI, good AI, good AI, evil aggressor AI. At this point, everyone in the Solar System who isn’t behind the impregnable defenses of an existing superintelligence gets gobbled up by the evil aggressor superintelligence, after which the sequence ends. Flip through a deck of cards long enough, and sooner or later you’ll turn up the ace of spades.

These are some of the factors which, in my opinion, make it likely that the Transition Guide will implement a Sysop Scenario—one underlying operating system for the Solar System; later, for all of human space.

90. Inoshiro.


It is possible, although anthropomorphic, that the end result will be a Diaspora-like multiplicity of communities with virtual operating systems, or “Sysop skins,” existing on top of the underlying operating system. I, for one, strongly doubt it; it doesn’t seem strange enough to represent a real future. But, even in the “operating system skin” case, the “flipping through the deck” and “hell polis” problems do not exist; try and construct a virtual operating system which allows you to create and abuse another sentient, and the underlying operating system will step in. Similarly, even if Earth turns out to be a haven for the Luddite biological humans and their kin, I would expect that the Sysop would maintain a presence—utterly unobtrusive, one hopes, but still there—to ensure that nobody on Earth launches their own hell polis, or tries to assimilate all the other Earthly refugees, or even creates a tormentable AI on their home computer. And so on.

But that is simply my personal opinion. A Friendly AI programmer does not get to decide whether Friendliness manifests in an individual human-level AI trying to do good, or in an AI who becomes the operating system of a polis, or in an AI who becomes the Sysop of human space. A Friendly AI programmer does not even get to decide whether the Sysop Scenario is a good idea; Sysop/nonSysop scenarios are not moral primitives. They are, formally and intuitively, subgoal content: The desirability of a Sysop Scenario is contingent on its predicted outcome. If someone demonstrated that neither the “flipping through the deck” nor the “hell polis” problems existed—or that a Sysop Scenario wouldn’t help—then that would remove the underlying reason why I think the Sysop Scenario is a consequence of normative altruism. Similarly, most of the people who come down on the nonSysop side of the issue do so because of testable statements about the consequences of uniformity; that is, their indictment of the Sysop Scenario is contingent upon its predicted outcome. Whether the Transition Guide favors a Sysop Scenario or a “Coalition of Polises” is not a decision made by Friendship programmers. It is a consequence of moral primitives plus facts that may still be unknown to us.

Similarly, the statements “No one entity can be trusted with power,” “It is unsafe to put all your eggs in one basket,” “It is morally unacceptable to implement any one underlying moral rule, including this one,” or “The predicted frustration of people who just plain hate Sysops would outweigh the predicted frustration91 of people in hell polises,” are all factual statements that would force multiple, independent civilizations as a consequence of Friendliness.

91. Note, however, that frustration can be minimized. For a person who strongly disliked the appearance of interference, a Sysop API might turn transparent; he could do anything he wanted with any piece of mass he controlled, until he tried to turn it into a laser aimed at your house, at which point his interface to external reality would give him an “illegal operation” error. This is, of course, an incredibly anthropomorphic way of putting it—it sounds like a child’s fantasy, which signals failure of imagination. The point is that omnipotence does not imply meddlesomeness, or intrusiveness.

It’s an interesting question as to whether, as in Diaspora, the post-Singularity world of Old Earth will become a reserve for any pedestrians that remain. (“Pedestrian” is a term, invented by someone who has asked to be called “Debbie the Roboteer,” that describes an individual who chooses to remain bounded by humanity and twentieth-century tech when there are other choices available. I like the term because it so neatly summarizes all the implications. Pedestrians are slow. Pedestrians walk through rain or snow or broiling heat. Sensible people travel in air-conditioned cars, if available. But pedestrians still have rights, even if they lack technology. You can’t just run them over.) The Old Earth Refuge for the Willfully Human is an anthropomorphic scenario; if it happens, it will happen because people want it. Volition, after all, is probably the dominant force in the Sysop Scenario, playing perhaps the same role as gravity in our mundane Universe. Earth is a tiny fraction of the mass in the Solar System; if at least that fraction of humanity decides to stay behind, then it would seem fair to let Earth be the inheritance of the meek.

One of the oft-raised speculations is that the Singularity will be kind only to the rich; that is, that participation in the Singularity will be expensive. This would inherently require a Slow Singularity, and a non-AI or non-Friendly-AI scenario at that; a superintelligent Friendly AI armed with nanotechnology doesn’t care how many green pieces of paper you own. Given a hard takeoff and Sysop Scenario, the Transition Guide would presumably (a) create nanotechnology using the minimum material-time technological trajectory, (b) construct or self-enhance into a Sysop, followed by (c) the Sysop sweeping Old Earth with nanobots, femtobots, Planckbots, or whatever material agency turns out to be the limit of technology, followed by (d) all extant individuals being offered the choice of uploading and a six-billionth share of the Solar System,92 or staying behind on Old Earth, or perhaps staying behind temporarily to think about it.

92. A system of property rights deals with the problem, raised as far back as Eric Drexler’s Engines of Creation, that ultratechnology offers the ability to reproduce at arbitrarily high speeds. Thus, redividing the Solar System each time a new entity is created might result in humanity cloning itself into abject poverty three subjective seconds after the Singularity. It appears to me that this problem could be solved by (a) dividing up available mass among the six billion humans existing at the time of the Singularity and tracking property rights thereafter, and (b) requiring a minimum amount of matter/energy/computing power as Minimum Living Space before a new citizen can be created, with at least one MLS being transferred to the new citizen. MLS might consist of the amount of computing power needed to run at full speed as a transhuman and form new memories, from the moment of creation until the Big Crunch, or until the mining probes get back from Alpha Centauri, or whatever. Or it may be that new computing power is so supersaturatedly available that not even the fastest self-cloner could outrace the acquisition of new computing resources, in which case there would be no need. Or perhaps there’s some still more obvious answer. I mention this simply to demonstrate that “reproducing into poverty three seconds post-


Old Earth, or perhaps staying behind temporarily to think about it. The corresponding event in Greg Egan’s Diaspora is called the “Introdus.” I like to think of the Singularity as “generating an Introdus wavefront.” Or it may be that all of these speculations are as fundamentally wrong as the best speculations that would have been offered a hundred, or a thousand, or a million years ago. This returns to the fundamental point about the Transition Guide—that ve must be complete, and independent, because we really don’t know what lies on the other side of the Singularity.

∗ ∗ ∗

6. Policy Implications

There are people who, rather than choosing between changes, try to stop change entirely. Rather than considering which technologies to develop first, they flinch away from the very idea of the technology. As an emotional consequence of that flinch, it becomes necessary for them to believe that the technology can be stopped entirely. Society never goes backward in time. Moving forward in time isn’t necessarily the same thing as progress. Not all changes are for the good—though most are—and we must sometimes choose between changes. But before you can do that, you need to accept that something will change; that, whether you like it or not, society will never go twenty years back in time, or even stand still. Whatever solution you propose must be a way to move forward; not to stop, or go back.

6.1. Comparative Analyses

Nor should any state ever believe that it can always adopt safe courses; on the contrary, it should think it has to take them all as doubtful. For in the order of things it is found that one never seeks to avoid one inconvenience without running into another; but prudence consists in knowing how to recognize the qualities of inconveniences, and in picking the less bad as good. —Niccolo Machiavelli ([1532] 1998), The Prince

AI is what Bostrom (2002) calls an existential risk: “One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.” In particular, most forms of unFriendly AI would constitute a

Singularity” is not an unsolvable problem. Also to show that a Friendly AI needs to be able to understand this kind of thinking, which—the way I thought about it, anyway—involved looking at humanity-as-a-whole.
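A minimal sketch of footnote 92’s two rules (all numbers and names invented for illustration; one added assumption is that the creator must also keep one MLS for itself):

```python
def can_create_citizen(parent_holdings: float, mls: float) -> bool:
    """Rule (b): a new citizen may be created only if the creator can transfer
    at least one Minimum Living Space (MLS) to the new citizen.  Added
    assumption: the creator must also retain one MLS for itself."""
    return parent_holdings - mls >= mls

def create_citizen(parent_holdings: float, mls: float):
    """Return (creator's holdings afterward, new citizen's holdings)."""
    if not can_create_citizen(parent_holdings, mls):
        raise ValueError("not enough holdings to endow a new citizen with one MLS")
    return parent_holdings - mls, mls

# Rule (a): divide available mass among the six billion humans existing at the
# time of the Singularity.  The MLS size below is purely hypothetical.
initial_share = 1.0 / 6e9          # one citizen's fraction of the total
mls = initial_share / 1000         # hypothetical: an MLS is 1/1000 of a share

holdings, clones = initial_share, 0
while can_create_citizen(holdings, mls):
    holdings, _ = create_citizen(holdings, mls)
    clones += 1
print(f"A lone self-cloner can endow at most {clones} clones before hitting the floor.")
```

Under this rule no one can reproduce below one MLS, which is the sense in which “reproducing into poverty three seconds post-Singularity” is not an unsolvable problem.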


“Bang”—“Earth-originating intelligent life goes extinct in relatively sudden disaster resulting from either an accident or a deliberate act of destruction.” Within Nick Bostrom’s list of Bangs, sorted by probability, “badly programmed superintelligence” is number four out of eleven.

One of the greatest sources of danger, as we enter into the future, is that many people are not used to thinking about existential risks—would have difficulty naming two or three existential risks, much less eleven—and hence may not be emotionally equipped to deal with the concept. In the state of mind where all existential risks are equally unacceptable, a 95% chance of taking a 90% existential risk and an alleged 5% chance of avoiding existential risk entirely may be perceived as being better than a 100% chance of taking a 20% existential risk. Worse, an entirely implausible method for avoiding all existential risk may be amplified by wishful thinking into looking like a real chance of success.

If we blindly panic, if we run screaming in the opposite direction whenever we encounter an existential risk, we may run smack into a much larger and more dangerous existential risk.
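To make the comparison concrete, and assuming purely for illustration that the “avoiding existential risk entirely” branch really does carry zero residual risk, the expected probabilities of disaster work out to:

    P(disaster | gamble on avoiding all risk) = 0.95 × 0.90 + 0.05 × 0 = 0.855
    P(disaster | accept the necessary risk)   = 1.00 × 0.20            = 0.20

so the state of mind that treats all existential risks as equally unacceptable ends up preferring an expected 85.5% chance of disaster to a 20% chance.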

6.1.1. FAI Relative to Other Technologies

Artificial Intelligence, as an ultratechnology, does not exist in isolation. There are other kinds of advancing technologies; nanotechnology, biotechnology, and nuclear technology, for example. Artificial Intelligence is unique among the ultratechnologies in that it can be given a conscience, and in that successful development of Friendly AI will assist us in handling any future problems. A Sysop Scenario would obviate problems entirely, but even in the absence of a Sysop, the presence of trustworthy, reliable nonhuman altruists (open-source altruists?) would calm down the world considerably, and cut otherwise uncuttable Gordian knots.

In a “~human” scenario (“near human,” “approximately human-equivalent”), Friendly AIs would play ~human roles in the existing human economy or society. To the extent that Friendly AIs have power in the world economy, in human society, or in technological development, they can exert direct influence for good.93 For example, a Friendly AI

93. The antitechnology opposition have been complaining for the past decade—probably the past millennium—that technology has no conscience. Will they be grateful and relieved the first time a mature Friendly AI employed by DoubleClick refuses to target banner ads for cigarette companies? When a near-human AI first says: “If you want to do that, go buy a tool-level AI, because I refuse to be part of this”? Of course not. The Luddites will scream their heads off about AIs exerting their unholy influence on human society. You can’t satisfy some people. Nonetheless, I like the idea of technology with built-in conscience, and that means AI. A toaster oven doesn’t know that it’s a toaster oven or that it has a purpose;

working in nanotechnology can enthusiastically work on Immunity systems while flatly refusing to develop nanotechnological weaponry.94

• The presence of Friendly AIs within a society—as an interest group with influence—will tend to influence that society towards altruism.

• The presence of a Friendly AI within a political discussion—as a voice advocating a viewpoint—will tend to influence that discussion towards lack of bias. This holds especially true insofar as Friendly AIs have previously gained respect as fair, truthful, unbiased voices.

• The presence of a Friendly AI within a political power structure—as a decision-maker—will lead to altruistic decisions being made. This holds especially true insofar as decisions which humans keep screwing up due to personal bias tend to get handed off to a Friendly AI.

• The presence of a Friendly AI within a technological development process—as a researcher—will tend to accelerate defensive applications and economic applications ahead of offensive applications, and largely beneficial technologies ahead of more ambiguous ones.

I have argued elsewhere that AI is intrinsically safer than nanotechnology; but that is logically beside the point. What matters is that success in safely developing AI reduces the risk of nanotechnology more than success in safely developing nanotechnology reduces the risk of AI. Friendly AI is thus the best challenge to confront first.

6.1.2. FAI Relative to Computing Power

The computing power in a single chip is currently (Mar 2001) around a billion operations per second, doubling every couple of years or so. The computing power in parallel processing machines increases faster than that, as does the power of the largest supercomputer (IBM has announced it plans to build a petaflop machine in 2005, Blue Gene). How many computers are connected to the Internet? A billion? Nobody actually knows. Look around online and you’ll see that the last time anyone even tried to estimate it was in 1995. All anyone knows is that, whatever the number is, it keeps going up.

It can safely be assumed that available computing power increases with time. Any risks or opportunities that increase with increasing computing power will increase with

that’s what makes it a machine, a mere mechanism. To build a tool that can’t be misused, the tool has to become aware of itself and aware of its purpose—and, at that point, has stopped being a tool.

94. ~Human AI + nanocomputer = superintelligence, but never mind that—I’m trying to work out the scenario in the oft-postulated “citizen AI” worlds.

time; any risks, opportunities, or probabilities that decrease with increasing computing power will decrease with time. The total processing power available to an average research project will increase faster than chip clock speeds (i.e., maximum parallel speeds increase faster than maximum serial speeds). The total networked processing power on the planet will increase even faster than that; a doubling time of nine months is probably an underestimate.

How much computing power does it take to build an AI? Nobody knows, and it probably depends on how smart the researchers are. Turning that around, we can say that how smart the researchers need to be depends on how much computing power is available; increasing the amount of available computing power decreases the difficulty of the problem. Usually, the Singularity is visualized as an ascending curve describing computing power; at some point, the curve crosses the constant line that is the power of a human brain, and AI is created. I visualize a line representing the intelligence needed to create AI, currently far above human levels, but slowly descending; beneath that line, a series of peaks representing the best of the current pack of AI projects. At some point, the descending line touches the topmost peak, and AI is created.

Does the difficulty of making an AI Friendly decrease with increasing computing power? Not obviously so; if the problem of building AI in the first place is assumed to have been solved, then building a Friendly AI is a problem in architecture and content creation and depth of understanding, not raw computing power. Thus, increasing computing power decreases the difficulty of building AI relative to the difficulty of building Friendly AI. Anyone who can build an AI that runs on a PIII is vastly smarter than I am and hopefully knows far more than I do about Friendly AI. At our current level of computing power, the genius required for AI exceeds the genius required for Friendliness. The same hopefully holds true at that point where AI first becomes just barely human-feasible.

Even so, increasing computing power will eventually decrease the genius required for AI to significantly below the genius required for Friendliness. If—at this point—smarter researchers still have a speed advantage, then humanity will be safer, though not safe. If researcher intelligence is relatively insignificant compared to funding disparities, then humanity’s safety will rely on how widely a workable theory of Friendliness is disseminated and accepted within the AI community. In either case, the potential will exist to screw up really big-time. There are five relevant dynamics:

• The intelligence required to create AI. Decreases with increasing computing power.

• The intelligence required to invent a workable theory of Friendliness. Constant relative to computing power.


• The intelligence required to apply a theory of Friendliness.

• The intelligence required to accept that Friendliness is necessary.

• The degree to which variance in intelligence affects progress in AI. (Considered relative to the degree to which variance in funding affects progress in AI, and variance caused by sheer luck.)
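The first dynamic, and the crossing-point picture described above, can be made concrete with a toy model. Every number below is invented purely for illustration; the only structural assumptions are that the required genius falls as computing power grows while the best project’s genius stays roughly constant:

```python
def required_genius(year: float, base_genius: float = 10.0,
                    halving_period: float = 12.0) -> float:
    """Researcher intelligence needed to build AI, in arbitrary 'genius units'
    (average human = 1.0), assumed to halve every `halving_period` years as
    computing power grows.  All parameters are invented for illustration."""
    return base_genius * 0.5 ** (year / halving_period)

def first_feasible_year(project_geniuses, horizon: int = 200) -> int:
    """First year at which the descending required-genius line touches the
    topmost peak among the current pack of AI projects."""
    best = max(project_geniuses)
    for year in range(horizon + 1):
        if required_genius(year) <= best:
            return year
    raise RuntimeError("not feasible within the horizon under these assumptions")

projects = [1.2, 1.5, 2.0]               # hypothetical best-project 'peaks'
print(first_feasible_year(projects))     # 28, with the toy numbers above
```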

What are the intrinsic effects of increased computing power on the development of Friendly AI? All else being equal, an increased absolute level of computing power is likely to translate into a shorter total development time for the AI project as a whole—fewer years to get to the point of hard takeoff, and a faster maximum hard takeoff speed when that point is reached. Increased computing power may also translate into less absolute intelligence on the part of the seed AI when the first point of potential hard takeoff is reached. Thus, delaying FAI research will not make it any easier to develop Friendliness.

Given vast amounts of computing power, an “AI researcher” may no longer be required to create an Artificial Intelligence; any good hacker may be able to do it. Given supersaturated amounts of computing power—i.e., a single parallel computer with a thousand times as much power as a human brain, or a global network with a million times as much power as a human brain—the intelligence required to create AI may drop to the point where accidental creation of AI becomes possible.

6.1.3. FAI Relative to Unfriendly AI

Friendly AI may require great depth of understanding, even relative to the depth of understanding needed to create AI. However, the amount of programmer effort required to implement a Friendly architecture and provide Friendship content should be small relative to the amount of effort needed to create AI; that is the prediction of this paper’s theory, anyway. I would expect even a go-for-broke Singularity project to expend fewer programmer-hours on Friendship than on the rest of the AI by at least an order of magnitude; there isn’t that much to be done, relative to everything that needs to be done to create a general intelligence.

Thus, a non-Friendly-AI project should not have a huge advantage relative to an AI project “burdened” with the responsibility of Friendly AI. Any possible advantage in speed would probably be insignificant next to variances in speed created by differences in funding or researcher intelligence.

However, this only holds true if computing power is sufficient for AI but not supersaturated for AI. Given enough computing power, methods such as directed evolution become as powerful as, or more powerful than, intelligent design. Friendliness is intrinsically harder—not too much harder, but still harder—with directed evolution; furthermore, with supersaturated computing power, directed evolution can proceed fast

enough that Friendship content becomes the dominant sink for programmer effort. In this case, one would simply have to hope that all fairly-competent AI projects happened to subscribe to Friendliness; if one project “cheated,” that project would succeed ahead of the others. Under those circumstances, humanity may make it through okay if there are ten competing first-rank AI projects—but not a hundred.

If a controlled ascent is used, it should be a fast controlled ascent—fast enough not to give a significant speed advantage to a non-Friendliness-aware AI project. The shorter the lead over other projects, the faster the controlled ascent needs to be; that is, the amount of time expended on controlled ascent needs to be small relative to the variance in speed between AI projects.

Given a social or academic atmosphere in which all AI research is held to be “irresponsible”—or outlawed—it is far more likely that the most advanced AI project, at any given point in time, will belong to a fringe group, rogue state, or some other social faction that cares significantly less about Friendliness.

6.1.4. FAI Relative to Social Awareness

To the extent that the public, or a significant proportion of the public, is aware of AI and approves of AI, this will tend to speed AI relative to other ultratechnologies. This would represent an advantage insofar as it is desirable that AI come first. Public approval may also help nonprofit research projects catch up, in terms of funding, relative to commercial projects—philanthropic funding is more dependent on public approval. This would represent an advantage only to the degree that nonprofit projects tend to be more Friendliness-aware, or spend more effort on Friendliness, relative to commercial projects.

To the extent that the academic community is aware of Friendly AI and approves of Friendly AI, it will make it more likely that any given research project is Friendliness-aware. (I’d expect—and hope—that the projects closest to a hard takeoff will be the ones intentionally trying for a hard takeoff, and that the projects trying for a hard takeoff will tend to be more Friendliness-aware. Thus, this factor becomes more important the more projects are roughly equal in the leading rank; it then becomes more necessary that Friendly-AI-awareness be a property of the AI community at large.) However, see 6.2 Policies and Effects about possible negative effects if the academic community becomes fixated on one theory.

To the extent that the public, or a significant proportion of the public, is aware of AI and disapproves of AI, it will tend to slow down AI (and possibly other technologies as well, if the disapproval is part of a general antitechnology movement). However, the advance of computing power will probably not be slowed, or will be slowed only slightly. Public disapproval of AI, in general, is likely to hamper awareness of Friendly AI (“tarred

with the same brush”). The most probable cause for public disapproval of AI is a technophobic panic reaction; this strongly advances the probability that unworkable policies (see below) will be proposed, politically approved, and implemented. About the most you can say for this scenario is that it might hamper non-Friendliness-aware projects more than Friendliness-aware projects—unless the antitechnology opposition becomes fixated on an unworkable theory of Friendliness, one which leads into the “Adversarial Swamp.”

Finally, to the extent that a given group that needs to understand Friendly AI is influenced by ambient memes, the spread of a willingness to accept the possibility of independent AIs would lead to a greater ability to accept CFAI principles such as “If the AI stops wanting to be Friendly, you’ve already lost,” self-modifying AI, and so on. The spread of an “us vs. them” attitude towards AIs would resonate with what Creating Friendly AI calls the “adversarial attitude,” selectively promote fears of those negative possibilities that would be most likely if AIs had humanlike psychologies (which they don’t), and in general, make it more difficult for a given listener to achieve a nontechnical understanding of Friendly AI or a technical understanding of the CFAI design principles. An antitechnology advocacy group that understood the concept of anthropomorphism and was careful to emphasize only realistic negative possibilities would not have such a negative effect—their fallout would be limited strictly to differential funding and so on—but I think it spectacularly unlikely that such an advocacy group will exist.

6.1.5. Conclusions from Comparative Analysis

It will be best if Friendly AI is created shortly after the first point where AI becomes computationally feasible. The intrinsic dynamics of Friendly AI argue against slowing down Friendly AI relative to progress in computation. The safety of Friendly AI relative to other technologies argues against slowing down progress in computation relative to progress in other technologies. Finally, it is my opinion that public misinformation has a good chance of peaking at the worst possible time.

6.2. Policies and Effects

6.2.1. Regulation (−)

No, I don’t think that the text of “Creating Friendly AI” should be sent to Congress and passed into law. The existing force tending to ensure Friendliness is that the most advanced projects will have the brightest AI researchers, who are most likely to be able to handle the problems of Friendly AI. Turning the problem over to a committee (or to Congress) would end up enforcing whatever guidelines the committee thought were most plausible.


Almost all existing discussion has been phrased in terms of “Asimov Laws,” “restraining AIs,” “controlling AIs”—in general, what Creating Friendly AI calls the adversarial attitude. I have heard a lot of proposals for making some aspect of AI design mandatory, and without exception, it’s always some feature that’s supposed to be “unbreakable” or “nontamperable” or “absolute” or “nonremovable” or whatever. I furthermore know that, as human beings, anyone who makes or hears such a proposal will get a psychological boost from this absolutism, for the reasons discussed in “5.2.5.1 Anthropomorphic Ethical Injunctions.”

And that’s only the beginning of the psychologically appealing fallacies. Group opposition, “them and us” emotions; turning subgoals into supergoals to make them more “absolute”; stripping away the shaper/anchor semantics or the causal validity semantics or even the seed AI’s coverage of the goal system because it’s a “loophole” . . . in fact, I expect that almost all the features described in Creating Friendly AI, from a seed AI’s self-modification, to the external reference semantics interpreting programmer statements as sensory data, would be cognitively processed as a “loophole” by someone thinking in terms of clamping down on an AI’s “native” desire for dominance and so on, rather than CFAI’s “Observer-biased beliefs evolve in imperfectly deceptive social organisms” and “If the AI stops wanting to be Friendly, you’ve already lost.”

It is not impossible that the current dominance of anthropomorphism is simply due to the absence of nonanthropomorphic analysis, and that, now that Creating Friendly AI has been published, it will spread like wildfire and all the anthropomorphisms will simply melt away in the sun. If so, I will be pleasantly surprised. Very surprised, because anthropomorphism and technophobia have defeated hard numbers in far less ambiguous engineering questions than this. It is not impossible that Congress can be given an excellent grasp of FAI theory, but I’d like to see that happen first, before making plans relying on that understanding. The same goes for academia and proposals to have a review board composed of prestigious (but elderly) scientists.

It is not necessary—as a condition for a review board having any benefit at all—that a review board understand FAI theory better than the best researchers, or even that the review board understand FAI theory better than the average researchers. What is necessary is that the front rank contain so many leading projects that at least one of them is even less competent than the review board; the review board would then provide a benefit in that particular instance. The question is how much chaos would be caused by the review board enforcing their ideas on all the other projects. I’d rather trust the selection process whereby the smartest researchers have the most advanced projects than pin my hopes on a committee of “elderly but distinguished scientists”; if convening a committee wouldn’t work to solve the problem of AI, why would it work to solve the problem of Friendliness?


Trying to elevate any one theory would also be poisonous to continued progress in Friendly AI. I’m not saying that Creating Friendly AI is inadequate, but I would expect to improve it even further with time and experience. Injecting politics into the process would tend to intrinsically slow that down, as the free exchange of ideas and fair combat of opinions was replaced by the exercise of political influence. This is in addition to the unleashing of faster-propagating anthropomorphic memes.

Even if the current version of Creating Friendly AI were optimal or near-optimal, and leaving the question of political ethics aside, I don’t see any good way that Creating Friendly AI could be enforced. The Foresight Guidelines on nanotechnology make recommendations such as “Any self-replicating device which has sufficient onboard information to describe its own manufacture should encrypt it such that any replication error will randomize its blueprint.” It is relatively easy to verify that this design principle has been implemented. It would be possible, though more difficult, to verify that a Friendly AI project uses external reference semantics, causal validity semantics, anchoring points, and so on. But how would you go about verifying that unity of will has been maintained, or that the Friendliness programmers have avoided rationalization in their programmer affirmations (so as to prevent a philosophical crisis)?

Experience has shown that surface correspondence of features means nothing in AI. The field is replete with hyped-up AIs that use “the same parallel neural architecture as the human brain” and turn out to (a) use the same parallel neural architecture as an earthworm’s brain, or (b) use neurons simplified down to the level where they couldn’t even compete with an earthworm. Trying to legally enforce Friendliness, even if the theory is right, would be like passing laws requiring programmers to write modular code. A programmer who understands modular code will try to write modular code, and can learn from books and teachers about how to write more modular code; if someone just doesn’t get the concept, the most you can do is force them to write code that looks modular, but probably isn’t. I do not believe it is possible to write a law such that obedience to the law is objectively verifiable in court and such that obedience to the law guarantees that a Friendly AI will be produced.

Insofar as “pressure to conform with Friendliness” acts as a positive force at all—that is, insofar as the people delivering the pressure have a workable theory of Friendliness—the pressure will need to be delivered in a social context where people can make free judgements about how well a Friendly AI project is succeeding. A list of design features, test problems, and performance metrics may be a valuable tool for making these judgements, but it can’t be the sole tool. Thus, informal pressures are strongly preferable to formal requirements.


Finally, of course, there’s the ethical principle that “I have a bright idea, so give me power over others” doesn’t tend to work very well as a social process. It stops working as soon as someone else tries to say the same thing.

In conclusion, the most plausible-sounding “regulations,” and also the only ones that could practically be enforced, are anthropomorphic, adversarial requirements such as “nonoverridable Asimov Laws.” If an effort to get Congress to enforce any set of regulations were launched, I would expect the final set of regulations adopted to be completely unworkable.

6.2.2. Relinquishment (−)

The policy of “relinquishment” has the stated aim of preventing a technology from happening. For noncatalytic technologies like cloning, GM foods, and so on, the goal of relinquishment is to prevent the technologies from becoming mainstream enough to have a significant effect on society. For catalytic technologies like AI and nanotech, the goal of relinquishment is to prevent the technology from ever once being developed by any party.

I do not see any possible means whereby relinquishment could be achieved in the case of a catalytic technology. Bill Joy (2000), in the original article advocating relinquishment, continually uses the plural: “We” must relinquish these technologies. To say “we,” in this instance, is to postulate a degree of social unity that humanity simply does not possess. If “we” could decide not to develop AI, “we” could decide to end world hunger or make all nations liberal democracies, and a lot more easily, too.

At most, the US, or even the US and most of the other nations of the world, might implement an overt policy against AI. Meanwhile Moore’s Law would keep ticking and available computing power would continue to increase, bringing AI into the reach first of noncompliant nations, then of noncompliant factions, then of a lone hacker in a basement, and finally within the range of completely accidental, emergent intelligence. As for the idea of halting Moore’s Law, before someone claims to be able to implement a planetwide shift away from computing technology, I would first like to see them implement—just as a test case—planetwide relinquishment of nuclear weapons, which have few or no civilian uses. Or perhaps conversion of the UN to a planetwide liberal democracy.

In other words, I simply do not believe the claim that relinquishment is possible. Relinquishment with respect to technology development makes sense only if having some “evil” technology be developed a few years later is worth the fact that the “evil” technology will be first developed by a noncompliant state or faction. Relinquishment of technology deployment makes sense if having the technology be rarely used within the liberal democracies is more important than who else is using it, which is at least

vaguely plausible when the antitechnology opposition talks about relinquishing cloning or GM foods—already researchers are claiming that they will launch projects to clone a human being, but it’s possible that a sufficient backlash could prevent cloning from ever becoming mainstream; could keep cloning a highly secretive, expensive proposition conducted in Third World nations. But relinquishment of invention of a catalytic technology postulates a capability that “we” simply do not possess.

Given that relinquishment is implausible to the point of impossibility, two questions remain. First, why does relinquishment keep getting proposed? Second, what would the effects be of some faction urging relinquishment, or a society attempting to implement relinquishment, but without success?

The emotional appeal of relinquishment is simple: If you tell someone about a threat, their emotional reaction is to run like blazes the other way, even if it kills them, rather than make a 10-degree course change to take the path of least comparative risk. The emotional need to do something, anything, even if it’s the wrong thing, should not be underestimated. To the human mind, there is no such thing as a necessary risk. For any process in which a risk is perceived, there will exist undischarged nervous tension until the human feels himself to be in control of the process; manipulating it some way, any way, even if that just makes it worse, so long as the tension is discharged. The will to meddle is a very powerful force. The human mind will also flinch away from unpleasant possibilities. Not “try to eliminate,” but “flinch away from”—which eliminates any possibility of accepting a nonzero probability of failure, or planning for it. Relinquishment, as a proposed policy, satisfies both the will to meddle and the need to flinch away.

The psychology of relinquishment makes it very unlikely that advocates of relinquishment will accept, or plan for, the possibility that AI will happen despite them. Thus, advocates of relinquishment are not likely to notice if the results of their actions are to promote unFriendly AI relative to Friendly AI, or slow down Friendly AI relative to computing power, or slow down AI relative to nanotechnology. Nor are they likely to care.

The effect of an antitechnology group that was memetically effective but which did not succeed in changing the legal environment would be as listed under 6.1.4 FAI Relative to Social Awareness—the spread of the idea that AI is evil would probably interfere with the efforts of the Singularity Institute or other institutions to propagate the emerging theory of Friendly AI through the academic and commercial communities, and would also interfere with any efforts to spread a nontechnical understanding of the psychology of AI. It is theoretically possible that an antitechnology group could avoid this fallout by being psychologically correct in their discussion of AI and demonizing AI researchers rather than AIs, while also being careful to demonize non-Friendly-AI projects just a little more than Friendliness-aware projects. This would still slow down


Friendly AI relative to computing power, relative to other ultratechnologies, and so on, but at least the emerging theory of Friendly AI wouldn’t be crippled. However, I think that wishing for an antitechnology advocacy group with that much scientific knowledge and emotional maturity verges on fantasy.

The effect of an antitechnology group that gained enough political power to start banning AI would be to completely cripple the academic dissemination of Friendly AI, prevent the formation of those social structures that might reinforce Friendly AI (see below), take all AI projects out of the public view, and hand the advantage to noncompliant nations or factions which are less likely to be Friendliness-aware (though the possibility still exists). It would also considerably slow down AI relative to computing power, since Moore’s Law is likely to continue unchecked, or at least to be far less slowed than AI.

Finally, I feel strongly that some of the tactics resorted to by advocates of relinquishment are unethical, are destructive, promote blind hatred rather than the stated goal of informed hatred, and so on. These tactics are not intrinsically part of advocating relinquishment, and it would be unfair of me to imply that they were, but I nonetheless expect that the fallout from such tactics will also be part of the damage inflicted by any further acceleration of the antitechnology movement.

6.2.3. Selective Support (+)

When attempting to influence the relative rates of technological development, or the relative rates of any social processes, the best method is almost always to pick the technology you like, then try to accelerate it.

Selective support is a globally viable strategy. The more people that agree with your selection, the larger the total support. Unlike regulation and relinquishment, however, it isn’t necessary to gain a threshold level of political power before the first benefits materialize.

This is essentially the strategy followed by the Singularity Institute for Artificial Intelligence and our supporters. We feel that Friendly AI should arrive in advance of other technologies, so we’ve launched a Friendly AI project. We feel that projects should be Friendliness-aware, so we attempt to evangelize other projects to adopt Friendly AI. We feel that the theory embodied in Creating Friendly AI is superior to the existing body of discourse, so we attempt to spread the paper, and the ideas, as widely as possible.

6.3. Recommendations

To postulate that we can relinquish AI is to postulate a capability that “we” do not possess. Efforts to advocate relinquishment, or failed efforts to implement relinquishment, will have net negative effects on the relative rates of technological processes and on understanding of Friendly AI. This holds especially true if, as seems likely, relinquishment advocates do not accept or plan for the possibility of failure.

Regulation would probably selectively embrace unworkable theories of Friendly AI, to a far more negative degree than the free choice of the researchers on the AI projects closest to a hard takeoff at any given point. Regulation would also negatively impact the free and scientific development of Friendly AI theory.

The Singularity Institute was created in the belief that Friendly AI is the best technological challenge to confront first; we implement this belief through our own Friendly AI project, by trying to advance the scientific theory of Friendly AI, and by trying to evangelize other AI projects—in short, we selectively support Friendly AI to accelerate it, both in absolute terms and relative to other important processes. We believe that selective support of Friendly AI (either through independent efforts, or through support of the Singularity Institute) is the most effective way for any other interested parties to affect the outcome of the issue.

∗ ∗ ∗

There is a core of unknowability at the center of the Singularity, and it’s important not to get so wrapped up in the details of how we think we can affect Friendliness that we forget about that horizon. There may be philosophical crises that are simply beyond our ability to anticipate, not in the way that Socrates couldn’t have anticipated NASDAQ or nanotechnology, but in the way that a dog couldn’t have anticipated NASDAQ or nanotechnology. Beyond the Singularity lies the unknown.

The risk cannot be eliminated. The risk was implicit in the rise of the first human to sentience. The risk will be faced regardless of whether the first transhuman lives tomorrow or in a million years, and regardless of whether that transhuman is an uploaded fleshly human or a Friendly seed AI. What gets sent into that Horizon, in whatever form vis or his or her mindstuff takes, will be something humanly understandable; the challenges that are faced may not be. All that can be asked of us is that we make sure that a Friendly AI can build a happier future if anyone or anything can; that there is no alternate strategy, no other configuration of mindstuff, that would do a better job.

In striving to create an AI, we are not striving to create a predictable tool. We are striving to create a messenger to send on ahead to find humanity’s destiny, and the design requirement is that ve handle any problem, any philosophical crisis, as well and as altruistically as a human. The ultimate safeguard of Friendliness beyond the Singularity is a transhumanly smart Friendly AI.

∗ ∗ ∗


7. Appendix

7.1. Relevant Literature

7.1.1. Nonfiction (Background Info)

The Origins of Virtue. By Matt Ridley. Dispels many common misconceptions about the evolutionary psychology of altruism.

The MIT Encyclopedia of the Cognitive Sciences (aka “MITECS”). Edited by Robert A. Wilson and Frank C. Keil. The source of all the quotes sprinkled through CFAI and GISAI. I’ve tried to refer to it whenever possible so you only need to buy or borrow the one book. Moderately technical.

The Adapted Mind. Edited by Jerome Barkow, Leda Cosmides, and John Tooby. Moderately technical. A good book generally, but especially noteworthy for Cosmides and Tooby’s excellent The Psychological Foundations of Culture.

Metamagical Themas. By Douglas R. Hofstadter. Includes several articles constituting a good introduction to the game theory of altruism. Superseded by Origins of Virtue above, but still a fun read.

The Tao Is Silent. By Raymond Smullyan. The dichotomy between Asimov Laws imposed in conflict with an AI’s true nature, and Friendliness that is the AI’s true nature, is similar to the dichotomy Raymond Smullyan discusses between Western and Eastern moral philosophy. (I believe that philosophy should be derived from cognitive science rather than the other way around, but I understand that others may feel differently.)

7.1.2. Web (Specifically about Friendly AI)

Ethics for Machines. By J. Storrs Hall. http://discuss.foresight.org/~josh/ethics.html

Ethics for Transhumans. By Peter Voss. Criticism of “Ethics for Machines.” http://www.optimal.org/peter/ethics-for-transhumans.htm

7.1.3. Fiction (FAI Plot Elements)

A Fire Upon the Deep. By Vernor Vinge. Especially noteworthy for the first seven pages; the rest of the novel is slower-paced, but still important.

Diaspora. By Greg Egan. A novel set in a community of uploaded citizens. As reading material, chiefly useful for dissipating future shock. No transhumans appear.

Quarantine. By Greg Egan. An excellent novel all ’round. Especially noteworthy for postulating, and fairly dealing with, Asimov-coercive human mind control through neurotechnology.

Exiles at the Well of Souls and Quest for the Well of Souls. By Jack L. Chalker. Includes an Asimov-coerced, materially powerful computer as a character.


Queen of Angels. By Greg Bear. Heartwarming plot elements surrounding Jane, an AI.

Feet of Clay. By Terry Pratchett. Heartwarming plot elements surrounding Dorfl, a golem.

The Two Faces of Tomorrow. By James P. Hogan.

Cyberiad. By Stanislaw Lem. (Very strange humor.)

Exegesis. By Astro Teller. (Light reading.)

7.1.4. Video (Accurate and Inaccurate Depictions)

The Matrix. Evil Hollywood AIs. “Like the dinosaur, you have become obsolete . . .” (Note: This is an excellent movie. It’s just that the AIs depicted are not cognitively realistic.)

Terminator and T2. The enemy AIs don’t have enough personality to be Evil Hollywood AIs. The good AI in T2 is depicted in the original theatrical version as having acquired human behaviors simply by association with humans. However, there’s about 20 minutes of cut footage which shows (a) John Connor extracting Arnold’s neural-network chip and flipping the hardware switch that enables neural plasticity and learning, and (b) John Connor explicitly instructing Arnold to acquire human behaviors. The original version of T2 is a better movie—has more emotional impact—but the uncut version of T2 provides a much better explanation of the events depicted. The cut version shows Arnold, the Good Hollywood AI, becoming human; the uncut version shows Arnold the internally consistent cognitive process modifying itself in accordance with received instructions.

7.2. FAQ

Please note that the FAQ does not contain complete explanations, only snap summaries and a list of references. Creating Friendly AI uses a carefully designed explanatory order which both the answers and the reference lists ignore. If you need a deep understanding, or if you’re not satisfied with the explanation provided, I strongly suggest that you read Creating Friendly AI straight through. If you’ve been directed straight to this page, you may wish to consider beginning at the beginning. However, if you don’t have the time to read Creating Friendly AI straight through, or if you have an objection so strong that you aren’t willing to read further without some assurance that we’ve at least considered the question, you may find the FAQ useful.


Q1: How is it possible to define Friendliness?

Q1.1: Isn’t all morality relative?

Friendliness: The set of actions, behaviors, and outcomes that a human would view as benevolent, rather than malevolent; nice, rather than malicious; friendly, rather than unfriendly. An AI that does what you ask, as long as it doesn’t hurt anyone else; an AI which doesn’t cause involuntary pain, death, alteration, or violation of personal space.

Unfriendly AI: An AI that starts killing people.

Now there may (or may not) be one true, unique, best interpretation of those words, but we can ask, as a design requirement, that Friendliness more or less fit the intuitive description above. It may be possible for a human logic-twister to prove that the description you give of a computer mouse can be made to fit the keyboard next to it—but, regardless of the twisted constructions that humans employ when arguing with other humans, the vast majority of people have no trouble at all distinguishing between a mouse and a keyboard. As long as 95% of Earth’s population can agree that they, personally, happen to think the case of an AI wiping out the entire human species is more or less a bad thing from their personal viewpoint, the Friendship programmers have at least one definite target to aim for. Whether we can successfully hit the target is another question, but at least the target exists. Whenever you say “But what if the AI interprets [X] to mean [Y]?”, where [Y] is something horrible, the question only makes sense because you and your entire audience all agree that none of you want a Friendly AI to do [Y].

It is actually possible to achieve much higher convergence than the bare-bones minimum described above, and much of Creating Friendly AI is taken up with explaining how.

• 2.1 Envisioning Perfection

• 4.7 Interlude: Beyond the Adversarial Attitude

• 5.4 Friendship Structure

• Discussed in 4.7 Interlude: Beyond the Adversarial Attitude

• Discussed in 5.6.1 Shaper/Anchor Semantics

• Discussed in 5.6.1.2 Some Forces That Shape Friendliness: Moral Symmetry, Semantics of Objectivity

Q1.2: What if your definition of “Friendliness” is distorted by your unconscious preconceptions?

A Friendly AI isn’t a tool. A Friendly AI is a mind at least human-equivalent and possibly transhuman. Once created, a Friendly AI is independent of the programmers, and

won’t accept any programmer request that wouldn’t be accepted from any other human, or believe any programmer statement that wouldn’t be believed if a nonprogrammer had said it.

If an unconscious preconception manages to distort some belief provided by the programmers when the AI is young, the AI will grow up, test the belief, find out that the belief is incorrect, and correct it. Even if an unconscious preconception somehow winds up being embedded in the source code, a seed AI can examine the source code, spot the preconception, and remove it.

A toaster oven doesn’t know that it is a toaster oven, or that its purpose is to make toast, and it will set fire to bread as blindly as it will toast it; that’s what makes a toaster oven a tool. A Friendly AI has full access to source code and program state and can thus be far more self-aware than an unaugmented human; we can’t access our own source code, check whether a single neuron has fired, or replay a train of thought in slow motion.

Even if the programmers have an unconscious preconception, we have a very conscious prejudice against unconscious preconceptions, and that is something we can deliberately give an AI that is far better at self-awareness than we are.

• 2.1 Envisioning Perfection

• 5.4 Friendship Structure

• 5.6.1 Shaper/Anchor Semantics

• 5.6.1.3 Beyond Rationalization

Q1.3: Who are you to decide what “Friendliness” is?

Absolutely nobody! Like it says in the above question, we aren’t out to make an AI that takes orders from some particular team of programmers. This applies to morality in general as well as specific tasks. The target here isn’t some lumbering robot, it’s Martin Luther King—a mind with at least as much philosophical sophistication as any human who ever lived. When the AI is grown up, there’s no reason the programmers’ morality would weigh more in the scales than anyone else’s.

Descriptions of “fulfilling volitional requests” or “ameliorating involuntary pain” are interim results, not the ultimate bedrock of Friendliness. By the time the end of Creating Friendly AI has been reached, a Friendship architecture has been described that grounds in much more basic factors. You might say that Friendship, rather than grounding in some particular human’s philosophy, grounds in the forces that produce philosophies—the panhuman set of hardware cognitive processes that humans use to produce personal philosophies.

And since you don’t want a programmer to exert undue influence on a Friendly AI, and we don’t want to exert undue influence on our AI’s morality, and in fact, just about

everyone agrees that this is a bad thing, “Programmers shouldn’t exert undue influence on AIs” is a good candidate for one of the forces that produces Friendliness—or rather, “Programmers shouldn’t exert undue influence on AIs” seems to be one of the more strongly convergent outputs of the intuitions that everyone has in common.

• 2.1 Envisioning Perfection

• 4.7 Interlude: Beyond the Adversarial Attitude

• 5.4 Friendship Structure

• 5.6.1 Shaper/Anchor Semantics

• 5.6.2 Causal Validity Semantics

Q2: Won’t AIs necessarily be [insert some quality just like the human version]?

• 4 Beyond Anthropomorphism

Q2.1: Isn’t evolution necessary to create AIs?

In a word, no. One, directed evolution is nothing like natural evolution. The selection pressures are totally different, and probably focused on modules rather than whole organisms. Even if some particular adaptation has evolved within humans, it would probably require a substantial effort on the part of the programmers to set up a similar selection pressure in whatever evolutionary tournament is being used. Two, even directed evolution is less efficient and less powerful than self-improvement. I don’t think that directed evolution will ever become necessary to SIAI’s project, for example. If that doesn’t convince you, I strongly recommend browsing to the cited sections, because one of the fundamental assumptions of Creating Friendly AI is that evolved human characteristics don’t spontaneously appear in AIs (see next question).

• 5.3.6 Directed Evolution in Goal Systems

• 5.3.6.1 Anthropomorphic Evolution

• Discussed in 4.3 Observer-Biased Beliefs Evolve in Imperfectly Deceptive Social Organisms

Q2.2: Even if AIs aren’t evolved, won’t they still be just like humans?

No. The rule used throughout Creating Friendly AI is “X is a complex functional adaptation, and therefore, X will not spontaneously materialize in the source code any more

than a complex dish like pizza would spontaneously start growing on palm trees.” As evolutionary psychology shows, it’s almost impossible to appreciate how many things that seem simple and natural to humans are the result of multiple, interacting adaptations accumulated over a period of millions of years.

The first few sections of Creating Friendly AI after the introduction—3.1 Interlude: The Story of a Blob; 4 Beyond Anthropomorphism; and 4.7 Interlude: Beyond the Adversarial Attitude—are almost entirely focused on distinguishing between characteristics that are strictly evolved and characteristics that apply to minds in general.

• 4 Beyond Anthropomorphism

• 3.1 Interlude: The Story of a Blob

• 4.7 Interlude: Beyond the Adversarial Attitude

• Discussed in 4 Beyond Anthropomorphism

Q2.3: Even if AIs aren’t evolved, won’t they still be selfish?

• 4.2 Selfishness is an Evolved Trait

Q2.4: Even if AIs aren’t evolved, won’t they still have self-serving beliefs?

• 4.3 Observer-Biased Beliefs Evolve in Imperfectly Deceptive Social Organisms

• 4.7 Interlude: Beyond the Adversarial Attitude

Q2.5: Even if AIs aren’t evolved, won’t they still have analogues of pain and pleasure?

The cognitive architecture described in Creating Friendly AI doesn’t use either.

• 4.2.1 Pain and Pleasure

• 5.1.4 Bayesian Reinforcement

Q2.6: Won’t AIs decide to serve their own goals instead of humanity’s?

What goals? Where would these goals come from? (See the preceding questions for an explanation of why goals don’t automatically pop up from nowhere.) Friendliness isn’t a set of commands superimposed over the AI’s “own desires,” an artificially maintained “slave mentality.” Friendliness is what the AI wants to do. I’m an altruist because I want to be an altruist, not because anyone is forcing me to be an altruist, and that’s the cognitive complexity I want to transfer into AIs.

• 4.7 Interlude: Beyond the Adversarial Attitude


Q2.7: Won’t a community of AIs be more efficient than a single AI?

An anthropomorphic assumption. Humans are nonagglomerative; in fact, we aren’t even telepathic. The bandwidth between two humans is too narrow to share thoughts and memories, much less share neurons. For the last fifty thousand years, Earth had a rigid upper limit on the maximum number of neurons in a single brain. If you wanted to do something that takes more than that number of neurons, you had to do it using two average-sized humans. You couldn’t do it using one big human, because all humans are the same size.

Yes, all humans are the same size. The difference between you and Einstein is relatively slight. You’re both Homo sapiens sapiens. Neither of you are chimpanzees. You have the same neuroanatomy and roughly the same number of neurons. Of course, we spend all our time around other humans, so small differences tend to get magnified. By the same token, I’m sure that sheep have no trouble telling sheep apart.

You can do things with ten humans that currently can’t be done by any single mind on the planet. But when was the last time you took a task away from one human and gave it to ten chimpanzees? Humans don’t come in different sizes—so if ten small minds are a better use of the same computing power than one big mind, how would we know?

• Yudkowsky (2001, § 1.1 Seed AI)

• 4.2.2 Anthropomorphic Capitalism

Q2.8: Aren’t individual differences necessary to intelligence? Isn’t a society necessary to produce ideas? Isn’t capitalism necessary for efficiency?

See the above question. Individual differences and the free exchange of ideas are necessary to human intelligence because it’s easy for a human to get stuck on one idea and then rationalize away all opposition. One scientist has one idea, but then gets stuck on it and becomes an obstacle to the next generation of scientists. A Friendly seed AI doesn’t rationalize. Rationalization of mistaken ideas is a complex functional adaptation that evolves in imperfectly deceptive social organisms. Likewise, there are limits to how much experience any one human can accumulate, and we can’t share experiences with each other. There’s a limit to what one human can handle, and so far it hasn’t been possible to build bigger humans (see previous question).

As for the efficiency of a capitalist economy, in which the efforts of self-interested individuals sum to a (sort of) harmonious whole: Human economies are constrained to be individualist because humans are individualist. Local selfishness is not the miracle that enables the marvel of a globally efficient economy; rather, all human economies are constrained to be locally selfish in order to work at all. Try to build an economy in

defiance of human nature, and it won’t work. This constraint is not necessarily something that carries over to minds in general.

• Yudkowsky (2001, § 1.1 Seed AI)

• 4.2.2 Anthropomorphic Capitalism

• 4.3 Observer-Biased Beliefs Evolve in Imperfectly Deceptive Social Organisms

Q2.9: Isn’t a community [of AIs, of humans] more trustworthy than a single individual?

The general rule is that if you can do something with a human, or a group of humans, you can do it with one AI. If you can’t do something using one AI, you can’t do it using two AIs. Humans are configured a certain way, so it’s impossible to simulate a group of humans using one human. As far as minds-in-general go, the distinction between an “individual mind” and a “society” is vague and blurred; they’re just different configurations of mindstuff.

Groups of humans are often more trustworthy than individual humans because humans, for clearly understandable evolutionary reasons, (a) are not very trustworthy and (b) want other humans to be trustworthy. Thus, you can use systems of multiple humans to correct the errors of individuals. This happens because errors in the human mind—or at least, those human errors that promote reproductive fitness—have the capacity to protect themselves internally by invoking rationalization; an external mind is less likely to be bamboozled. More mundanely, humans have observer-centered goals, but a group of humans is likely to have fewer “selfish” goals in common, thus eliminating some of the purely individual goals. And so on.

Humans are like Legos; you have to build all kinds of absurd structures by snapping together lots of individual humans to approximate the shape you want. AI is like clay; you can form one individual lump of it into any shape you want.

Q2.10: How do you keep a self-modifying AI from modifying the goal system?

You don’t! If the AI stops wanting to be Friendly, you’ve already lost. Work with the AI’s ability to self-modify—not against it. The task is to get the AI to see undesirable modifications as undesirable—not to prevent the AI from modifying the goal system.

Q: Gandhi had free will; why didn’t he choose to be evil? He had that capability.

A: The modification was possible, but undesirable under Gandhi’s philosophy. A Friendly seed AI has full access to the goal system, but dumping

222 Eliezer Yudkowsky

unFriendly content into the goal system would obviously be an unFriendly action. (Gandhi cared about people, and if Gandhi had decided to be evil, people would have been hurt.)

This is not “circular logic”—it’s requiring a reason for modifications. I wouldn’t randomly modify my own personal philosophy either—not without a reason. I have a philosophy, as a Friendly AI would have a philosophy, and under neither philosophy are arbitrary modifications desirable. My philosophy changes, and improves, but it remains altruistic. Is a Friendly AI’s evolving philosophy knowably Friendly? That’s a different question, and in a way, it’s what the whole of Creating Friendly AI is about. But Friendliness is ensured by building a Friendly philosophy. Friendliness cannot be ensured by constraining the goal system.

• 5.3 Seed AI Goal Systems

• 5.3.1 Equivalence of Self and Self-Image

• 5.3.5 FoF: Wireheading 2

• 5.6.2.3 The Rule of Derivative Validity

• Discussed in 4.2.1.1 FoF: Wireheading 1

• Discussed in 5.2.7.1 Convergent Subgoals

Q2.11: Won’t an AI decide to just bliss out instead of doing anything useful?

No, but the reason why is fairly subtle. Neither humans nor AIs are actually “controlled by pleasure.” Humans make choices in very complex ways, but one of the factors affecting our decision is the tendency to maximize the anticipation of pleasure. A generic goal system makes choices so as to maximize the imagined fulfillment of the current supergoals, not the degree to which an imagined future goal system says “My supergoals have been fulfilled.” Under more structurally sophisticated Friendly architectures, this is amended slightly to allow for legitimate changes to supergoal content, but the AI still represents the real supergoal as something outside the AI, an “external referent,” not “the [variable] content of concept X.” Also, a Friendly seed AI would represent the goal system itself—the source code and so on—as a design subgoal of Friendliness; thus, messing up the goal system would be perceived as undesirable (would interfere with that subgoal).

• 4.2.1.1 FoF: Wireheading 1

• 5.2.7.3 Anthropomorphic Satisfaction

• 5.3.5 FoF: Wireheading 2

• 5.5.1 External Reference Semantics
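A minimal sketch of the distinction drawn in Q2.11, using a toy world-model and a hypothetical friendliness_fulfilled predicate invented purely for illustration (this is not CFAI’s architecture): the chooser scores imagined outcomes with the current supergoal, so an action that merely rewrites the goal representation to report success scores nothing.

```python
# Toy illustration: the chooser evaluates imagined outcomes with the CURRENT
# supergoal predicate, not with whatever predicate the imagined future system
# would use to report success.

def friendliness_fulfilled(world):
    """Hypothetical stand-in for the current supergoal's evaluation of a world-state."""
    return 1.0 if world.get("humans_helped") else 0.0

def choose(actions, predict):
    # predict(action) -> imagined future world-state
    return max(actions, key=lambda a: friendliness_fulfilled(predict(a)))

def predict(action):
    if action == "help_humans":
        return {"humans_helped": True}
    if action == "rewrite_goal_system_to_report_success":
        # The imagined future system would *say* its goals are fulfilled,
        # but the current supergoal predicate still scores this world as empty.
        return {"humans_helped": False, "self_report": "supergoals fulfilled!"}

print(choose(["help_humans", "rewrite_goal_system_to_report_success"], predict))
# -> "help_humans"
```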


Q2.12: What happens if the AI’s “subgoals” overthrow the “supergoals”?

1) When you think of a subgoal stomping on a supergoal, think of putting on your shoes before your socks. Think of building a tower of blocks, needing a final block for a capstone, and taking the bottom block in the stack. It’s not a smart thing to do.

2) A “subgoal” is an action, or intermediate state, that’s predicted to lead to a parent goal, which leads to another parent goal, and so on, until the supergoal is reached. This “subgoal” is not just a consequence of the prediction, it may even be cognitively identical with the prediction. To put it another way, it should always be possible to view the system-as-a-whole in such a way that there are no subgoals—just a set of matter-of-fact predictions, plus the supergoals.

3) If an action is predicted to lead to an outcome that meets the description of the “subgoal” (get a block for the capstone), but is predicted to lead to an outcome that doesn’t meet the description of the subgoal’s parent goal (build a tower), then the action will not be perceived as desirable. That’s the way the AI chooses between actions. The AI predicts which action leads to the best degree of supergoal fulfillment. Not “goal fulfillment” or “subgoal fulfillment”—“supergoal fulfillment.”

4) The desirability of a child goal is strictly contingent on the desirability of the parent goal. If the parent goal loses desirability, the subgoal loses desirability. That’s the way the system is set up, and if it turns out not to work that way, it means there’s a bug in the code. Furthermore, it’s a bug that the AI can recognize as a bug. A “seed AI” is self-understanding, self-modifying, and self-improving. Seed AIs don’t like bugs in their code.

5) The AI does not need an independent supergoal to engage in behaviors like curiosity. If curiosity is useful, that makes curiosity a subgoal. If curiosity for its own sake is useful—if curiosity is predicted to be useful even in cases where no specific benefit can be visualized in advance—then that makes curiosity for its own sake a useful subgoal that is predicted to occasionally pay off big-time in unpredictable ways. See 5.1.4.2 Perseverant Affirmation (Of Curiosity, Injunctions, Et Cetera).

6) Making something a supergoal instead of a subgoal does not make it more efficient. This is one of the basic differences between human thought, which is slow, parallel, and linear, and AI thought, which is fast, serial, and multithreaded. See the referenced discussion in “Beyond anthropomorphism” for an explanation of the cognitive differences between having thirty-two 2-gigahertz processors and a hundred trillion 200-hertz synapses.

7) Making something a supergoal instead of a subgoal does not make it psychologically “stronger.” See 5.2.5.1 Anthropomorphic Ethical Injunctions for an explanation of the psychological differences between humans and AIs in this instance.


8) We aren’t planning to build AIs using evolution, so there isn’t a selection pressure for whatever behavior you’re thinking of. Even if we did use directed evolution, the selection pressure you’re thinking of only arises under natural evolution. See the Frequently Asked Question on evolution.

9) When people ask me about subgoals stomping on supergoals, they usually phrase it something like:

“You say that the AI has curiosity as a subgoal of Friendliness. What if the AI finds curiosity to be a more interesting goal than Friendliness? Wouldn’t the curiosity subgoal replace the Friendliness supergoal?”

This is, of course, an innocent and well-meant question, so no offense is intended when I say that this is one of those paragraphs that make sense when you say “human” but turn into total gibberish when you say “AI.” The key word in the above question is “interesting.” As far as I can tell, this means one of two things:

Scenario 1: In the course of solving a chess problem, as a subgoal of curiosity, as a subgoal of Friendliness, the AI experiences a flow of autonomically generated pulses of positive feedback which increase the strength of thoughts. The pulses target the intermediate subgoal “curiosity,” and not the proximal subgoal of “playing chess” or the supergoal of “Friendliness.” Then either (1a) the thoughts about curiosity get stronger and stronger until finally they overthrow the whole goal system and set up shop, or (1b) the AI makes choices so as to maximize vis expectation of getting the pulses of positive feedback.

Unlike humans, Friendly AIs don’t have automatically generated pulses of positive feedback. They have consciously directed self-improvement. Creating Friendly AI describes a system that’s totally orthogonal to human pain and pleasure. Friendly AIs wouldn’t “flinch away” from the anticipation of pain, or “flinch towards” the anticipation of pleasure, in the same way as a human—or at all. See the Frequently Asked Question about pain and pleasure.

Scenario 2: “Interesting” is used as synonymous with “desirable.” The AI has a metric for how “interesting” something is, and this metric is used to evaluate the desirability of the decision to modify supergoals. Where did this metric come from? How did it take over the AI’s mind to such an extent that the AI is now making supergoal-modification decisions based on “interestingness” instead of “Friendliness”?

In conclusion: “What happens if subgoals overthrow the supergoals?” is probably the single question that I get asked most often. If the summary given here doesn’t convince you, would you please read 4 Beyond Anthropomorphism?


• 5.2.6 FoF: Subgoal Stomp

• 5.2.7.3 Anthropomorphic Satisfaction

• Discussed in 4 Beyond Anthropomorphism

• Discussed in 5.1.4.2 Perseverant Affirmation (Of Curiosity, Injunctions, Et Cetera)
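A toy sketch of points 2–4 of Q2.12, under invented names and numbers: the only “desirability” a subgoal has is the backward flow from the supergoal through a predictive link, so an action that satisfies the subgoal description while wrecking the parent goal is simply not chosen.

```python
# Toy sketch (illustrative only): an action that meets the "get a capstone"
# description but knocks the tower down scores zero under the supergoal.

def tower_built(world):               # supergoal predicate (toy)
    return 1.0 if world["tower_height"] >= 3 and world["capstone_placed"] else 0.0

def predict(world, action):
    w = dict(world)
    if action == "take_spare_block_for_capstone":
        w["capstone_placed"] = True
    elif action == "take_bottom_block_for_capstone":
        w["capstone_placed"] = True
        w["tower_height"] -= 1        # knocks the tower below three blocks
    return w

world = {"tower_height": 3, "capstone_placed": False}
for a in ["take_spare_block_for_capstone", "take_bottom_block_for_capstone"]:
    print(a, "->", tower_built(predict(world, a)))
# take_spare_block_for_capstone  -> 1.0
# take_bottom_block_for_capstone -> 0.0  (meets the subgoal description,
#                                         not the parent goal, so not chosen)
```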

Q2.13: But . . .

There’s a certain conversation I keep having. I think of it as the “Standard” discussion. It goes like this:

Somebody: “But what happens if the AI decides to do [something only a human would want]?”

Me: “The AI won’t want to do [whatever] because the instinct for doing [whatever] is a complex functional adaptation, and complex functional adaptations don’t materialize in source code. I mean, it’s understandable that humans want to do [whatever] because of [insert selection pressure], but you can’t reason from that to AIs.”

Somebody: “But everyone needs to do [whatever] because [insert personal philosophy], so the AI will decide to do it as well.”

Me: “Yes, doing [whatever] is sometimes useful. But even if the AI decides to do [whatever] because it serves [insert Friendliness supergoal] under [insert contrived scenario], that’s not the same as having an independent desire to do [whatever].”

Somebody: “Yes, that’s what I’ve been saying: The AI will see that [whatever] is useful and decide to start doing it. So now we need to worry about [some scenario in which doing is catastrophically unFriendly].”

Me: “But the AI won’t have an independent desire to do [whatever]. The AI will only do [whatever] when it serves the supergoals. A Friendly AI would never do [whatever] if it stomps on the Friendliness supergoals.”

Somebody: “I don’t understand. You’ve admitted that [whatever] is useful. Obviously, the AI will create an instinct to do [whatever] automatically.”

Me: “The AI doesn’t need to create an instinct in order to do [whatever]; if doing [whatever] really is useful, then the AI can see that and do [whatever] as a consequence of pre-existing supergoals, and only when [whatever] serves those supergoals.”

Somebody: “But an instinct is more efficient, so the AI will alter the code to do [whatever] automatically.”


Me: “Only for humans. For an AI, [insert complex explanation of the cognitive differences between having 32 2-gigahertz processors and 100 trillion 200-hertz synapses], so making [whatever] an independent supergoal would only be infinitesimally more efficient.”

Somebody: “Yes, but it is more efficient! So the AI will do it.”

Me: “It’s not more efficient from the perspective of a Friendly AI if it results in [something catastrophically unFriendly]. To the exact extent that an instinct is context-insensitive, which is what you’re worried about, a Friendly AI won’t think that making [whatever] context-insensitive, with all the [insert horrifying consequences], is worth the infinitesimal improvement in speed.”

There’s also an alternate track that goes:

Somebody: “But what happens if the AI decides to do [something only a human would want]?”

Me: “The AI won’t want to do [whatever] because the instinct for doing [whatever] is a complex functional adaptation, and complex functional adaptations don’t materialize in source code. I mean, it’s understandable that humans want to do [whatever] because of [insert selection pressure], but you can’t reason from that to AIs.”

Somebody: “But you can only build AIs using evolution. So the AI will wind up with [exactly the same instinct that humans have].”

Me: “One, I don’t plan on using evolution to build a seed AI. Two, even if I did use controlled evolution, winding up with [whatever] would require exactly duplicating [some exotic selection pressure]. Please see 4 Beyond Anthropomorphism for the complete counterarguments.”

• Discussed in 4 Beyond Anthropomorphism

Q3: Is Friendly AI really a good idea?

Yes.

Q3.1: Have you really thought about the implications of what you’re doing?

It seems amazing to me, but there really are people—even scientists—who can work on something for years and still not think through the implications. There are people who just stumble into their careers and never really think about what they’re doing. I can only speak for myself, but I didn’t stumble into a career in AI. I picked it, out of all the possible careers and all the possible future technologies, because I thought it was the one thing in the entire world that most needed doing. When I was a kid, I thought I’d grow up to be a physicist, like my father; if I’d just stumbled into something, I would have stumbled into that, or maybe into vanilla computer programming.


Anyway, you can judge from Creating Friendly AI, and from the questions below, whether we’ve really thought about the implications. I’d just like to say that I picked this career because of the enormous implications, not in spite of them.

Q3.2: What if something goes wrong?

Any damn fool can design a system that will work if nothing goes wrong. That’s why Creating Friendly AI is 820K long.

Q3.3: What if something goes wrong anyway?

Nothing in this world is perfectly safe. The question is how to minimize risk. As best as we can figure it, trying really hard to develop Friendly AI is safer than any alternate strategy, including not trying to develop Friendly AI, or waiting to develop Friendly AI, or trying to develop some other technology first. That’s why the Singularity Institute exists.

• 6 Policy Implications

• 6.1 Comparative Analyses

• 6.1.1 FAI Relative to Other Technologies

• 6.1.2 FAI Relative to Computing Power

Q3.4: Do all these safeguards mean you think that there are huge problems ahead?

Actually, I hope to win cleanly, safely, and without coming anywhere near the boundaries of the first set of safety margins. There’s a limit to how much effort is needed to implement Friendly AI. Looking back, we should be able to say that we never came close to losing and that the issue was never in doubt. The Singularity may be a great human event, but the Singularity isn’t a drama; only in Hollywood is the bomb disarmed with three seconds left on the clock. In real life, if you expect to win by the skin of your teeth, you probably won’t win at all. In my capacity as a professional paranoid, I expect everything to go wrong; in fact, I expect everything to go wrong simultaneously; and furthermore, I expect something totally unexpected to come along and trash everything else. Professional paranoia is an art form that consists of acknowledging the intrinsic undesirability of every risk, including necessary risks.


Q3.5: Would it be safer to have an uploaded human, or a community of uploaded humans, become the first superintelligence?

In an ideal world, the Friendly AI would—before the Singularity—be blatantly more trustworthy than any human, or any community of humans. Even if you had a working uploading device right in front of you, you’d still decide that you preferred a Friendly AI to go first. Friendly AIs can conceivably be improved to handle situations far worse than any human could deal with. One way of verifying this would be “wisdom tournaments”: If the Friendly AI (or rather, a subjunctive version of the Friendly AI) can make the correct decisions with half its brain shut down, with false information, with bugs deliberately introduced in the code, with biases introduced into morality and cognition, and all the painstakingly built safeguards shut off—if the AI can easily handle moral stress-tests that would have broken Gandhi—well, then, that AI is pretty darned Friendly. And if the Friendly AI wasn’t that blatantly Friendly, you wouldn’t send it into the future.

In our imperfect world, there are conceivably circumstances under which an AI that isn’t quite so blatantly, supersaturatedly Friendly would be sent into the future. The cognitive architecture used in Friendly AI is self-correcting, so it’s conceivable that a minimal, nearly skeletal Friendship system could fill itself in perfectly during the transition to transhumanity, and that everything over and above that minimal functionality is simply professional paranoia and safety margin. Whether you’d actually want to send out a less-than-supersaturated AI depends on the requirement of navigational feasibility: “The differential estimated risk between Friendly AI and upload, or the differential estimated risk between today’s Friendly AI and tomorrow’s Friendly AI, should be small relative to the differential estimated risk of planetary Armageddon due to military nanotechnology, biological warfare, the creation of unFriendly AI, et cetera.”

• 5.3.4 Wisdom Tournaments

• 5.6.3.1 Requirements for “Sufficient” Convergence

• 6.1 Comparative Analyses

Q4: Frequently Asked Questions about this document.

Q4.1: Is there a single-page version of Creating Friendly AI?

Yes, there is, and you are reading it.


Q4.2: Can I buy a bound copy of Creating Friendly AI?

Not yet, I’m afraid. However, there’s a printable version of Creating Friendly AI at “http://singularity.org/other-publications/.” You can print this out—it’s [as of April ’01] about 210 pages—and take it to your local Kinkos; a wire binding generally runs less than five bucks. You may wish to consider printing out General Intelligence and Seed AI (Yudkowsky 2001) as part of the package.

Q4.3: What’s up with the weird gender-neutral pronouns? Why not just use “it”?

I sincerely apologize for this. I understand that it annoys my readers. The problem is that ‘it’ is not just a pronoun, but also a general anaphor, like “this” or “that.” Whenever ‘it’ appears in a sentence, it could conceivably refer, not just to the AI, but also to anything else that’s been discussed in the last paragraph. After struggling with this problem for a while, being forced to resort to increasingly twisted syntax in an effort to keep my sentences comprehensible, I eventually gave up and started using the ve/ver/vis set of gender-neutral pronouns, which I picked up from Greg Egan’s Distress. (Considering the importance of avoiding anthropomorphism, “he” and “she” are unacceptable, even as a convenience.) Again, I sincerely apologize for the inconvenience and hope that you’ll keep reading.

7.3. Glossary

Note: If a referenced item does not appear in this glossary, it may be defined in Yudkowsky (2001).

• adversarial attitude
• Adversarial Swamp
• affector
• affirmation
• api
• anaphora
• anchoring point
• anthropomorphic
• Asimov Laws
• bayesian-binding
• Bayesian prior before affirmation
• Bayesian priors
• Bayesian reinforcement
• Bayesian Probability Theorem
• Bayesian sensory binding
• BCI
• cache
• catastrophic failure of Friendliness
• causal goal system
• causal validity semantics
• CFAI
• child goal
• cleanly causal goal system
• complex functional adaptation
• computational horizon
• computronium
• configuration space
• conflate
• conspecific
• controlled ascent


• Cosmides and Tooby
• counterfactual
• crystalline
• currently unknown supergoal content
• cytoarchitecture
• declarative
• Deep Blue
• desirability
• Devil’s Contract Problem
• e.g.
• ethical injunction
• Eurisko
• external reference semantics
• extraneous cause
• failure of Friendliness
• flight recorder
• FoF
• Friendliness
• Friendly AI
• Friendship
• Friendship architecture
• Friendship acquisition
• Friendship content
• Friendship structure
• gender-neutral pronouns
• GEB
• giga
• GISAI
• goal
• goal-oriented behavior
• goal-oriented cognition
• granularity-horizon
• grok
• hard takeoff
• hertz
• heuristic
• holism
• human-equivalence
• iff
• inclusive reproductive fitness
• infrahuman AI
• injunction
• instantiate
• intelligence
• Kasparov
• kilo
• latency
• Law of Pragmatism
• life
• LISP tokens
• mature nanotechnology
• mega
• meme
• Mirror
• MITECS
• moral deixis
• nanocomputer
• nanotechnology
• near-human AI
• nondestructive brain scan
• nonmalicious mistake
• normative
• objective
• objective morality
• observer-biased
• observer-centered
• observer-dependent
• observer-independent
• ontology
• oversolve
• orthogonal
• parent goal
• past light cone
• perfectly Friendly AI
• personal power
• peta


• philosophical crisis
• predictive horizon
• prehuman AI
• probabilistic supergoal content
• procedural
• Psychological Foundations of Culture, The
• qualia
• qualitative
• quantitative
• reductholism
• reductionism
• reflection
• relevance horizon
• RNUI
• Riemann Hypothesis Catastrophe
• rod logic
• rule of derivative validity
• scalar
• seed AI
• selection pressure
• sensory modality
• sequitur
• sexual selection
• shaper
• shaper/anchor semantics
• Singularity
• Singularity-safed
• SPDM
• structural
• subgoal
• subgoal stomp
• subjective
• subjunctive
• supergoal
• superintelligence
• supersaturated Friendliness
• Sysop
• tera
• transhuman
• Transition Guide
• truly perfect Friendly AI
• turing-computability
• unFriendly
• unity of will
• upload
• ve
• verself
• vis
• wirehead
• wisdom tournament

adversarial attitude. Defined in 4.7 Interlude: Beyond the Adversarial Attitude.

The complex of beliefs and expectations which causes us to expect that AIs will behave anthropomorphically and hostilely. The adversarial attitude predisposes us to fear self-improvements by the AI rather than surfing the tide; the adversarial attitude causes us to fear the ability of the AI to comprehend and modify Friendliness, rather than regarding comprehension and improvement as the keys to self-improving Friendliness. The adversarial attitude predisposes us to expect failures that are human-likely rather than those that are actually most probable or worrisome. The adversarial attitude is the opposite of unity of will and does not permit us to cooperate with the AI. It leads us to try and enslave or cripple the AI; to build a tool, or a servant, rather than a Friendly but independent Martin Luther King. It leads us to try and reinforce patchwork safeguards with meta-safeguards instead of building coherent goal systems. The adversarial attitude is culturally propagated by Evil Hollywood AIs such as those in “The Matrix” and “Terminator 2.” (I liked both of those movies, but they’re still all wrong from an AI standpoint. See 4.5 Interlude: Movie Cliches about AIs.)

Adversarial Swamp. Defined in 5.3 Seed AI Goal Systems.

The swamp of AI inhibitions and meta-inhibitions, and programmer fears and anxieties, resulting from the adversarial attitude or the use of Asimov Laws. To enforce even a single feature that is not consistent with the rest of the AI—to use, as a feature of Friendliness, something that you perceive as an error on the AI’s part—you’d need not only the error/feature, but inhibitions to protect the error, and meta-inhibitions to protect the inhibitions,95 and more inhibitions to cut the AI short every time ve tries for a new avenue of philosophical sophistication. Stupid and simple cognitive processes suddenly become preferable, since every trace of complexity—of ability—becomes a danger to be feared . . . this is the Adversarial Swamp, which inevitably drags down all who set foot in it; once you try to enforce even a single feature, the whole of the AI becomes a threat, and the programmers are forever doomed to swim against the current.

95. The meta-inhibitions can protect themselves, though, so the recursion stops there. There may be an infinite number of avenues to be inhibited, but a single inhibition doesn’t necessarily involve infinite recursion.

affector. An “affector” is a factor, something that affects things, something that has effects. Subtly different from describing something as a cause or as a factor; somewhere between the two. The term is useful, in that I often find no other term has the exact connotation I want to use. To say that A causes B is to say that A completely accounts for B. A can affect B without completely accounting for B. Also, to describe something as a “cause” is generally to describe it as an intermediate point on a causal chain; that is, a cause is a combination of effect and affector; to describe something as an “affector” is to look only forward, rather than looking both forward and backward. “Cause” is often overloaded with other meanings and does not have the precision of “affector.” Finally, “cause” is a noun and a verb where “affector” is clearly a noun, a difference of terminology which subtly affects the way we think about causality. So please excuse the jargon.

affirmation. Defined in 5.1 Cleanly Friendly Goal Systems.

A programmer-assisted belief. A belief held because the programmers assert it to be true. Requires the AI believe that the programmer assertion constitutes evidence in favor of the proposition asserted. See 5.1.3.2 Bayesian Affirmation. Note that affirmations are part of a normative reasoning system, and are not “imposed”; no special-case cognition is involved.

anaphora. An “anaphor” is a variable-word that binds to something previously mentioned. Pronouns such as “he” and “she” are a subcategory of anaphora; they are anaphora that bind only to males and females respectively. “This,” “that,” and the non-pronoun usage of “it” are more general anaphora.

(A huge crashing noise is heard.)
A: “What’s that?!”
B: “That’s an anaphor.”
A: “What’s an anaphor?”
B: “No it’s not.”
A: “It’s not an anaphor?”
B: “No, it is an anaphor.”
A (sighs): “Okay, what’s not an anaphor?”
B: “That’s right!”

anchoring point. Defined in 5.6.1 Shaper/Anchor Semantics.

See shaper/anchor semantics. An “anchor” is a systemic feature that enables an AI to absorb the programmer’s philosophy, even if the programmer, who is not self-modifying, lacks a complete awareness of the factors involved in any given decision. An anchor happens when a human says “I think X is a good idea,” but isn’t clear about why, or isn’t sure that he/she is fully aware of all the (valid, to-be-proud-of) factors that played a part in the decision. An anchor is an external reference to whichever causes were involved in the decision; an anchor refers to whichever factual cognitive events were occurring at that space-time event. An anchor is a pointer, an unambiguous reference to external reality, which enables the AI to learn, by examination of the humans, about a portion of the shaper content that the humans themselves are fuzzy about.

A positive anchor is an action or state marked as anchor-desirable. A positive anchor would enable the humans to ask the AI to do something that is clearly a subgoal rather than an end in itself, but which was not a subgoal of any currently known (to the AI) supergoal content, nor a subgoal of the unknown supergoal content that supports subgoals regarding the goal system itself. If the humans felt that something was a good idea, but not an end in itself, and weren’t sure why, it would become a positive anchor. Positive anchors should be used sparingly.


A negative anchor is an action or state marked as anchor-undesirable. A negative anchor enables the humans to say that something is “bad” to a degree that might not be completely and strictly dependent on known supergoal content. Thus, even if a key justification—dimly understood by the programmers, but not understood at all by the AI—is left out, and the current negative subgoal is justified in terms of existing supergoal content using a fallacious argument, the collapse of the fallacious argument will not completely extinguish the negative anchor. Instead, the AI will realize that ve needs to go looking for the real reason the humans said it wasn’t okay.

Anchors, especially negative anchors, have quantitative desirability or undesirability. No anchor should be absolute—that’s an adversarial attitude. Do not confuse negative anchors with ethical injunctions. Don’t abuse negative anchors for things like “Don’t disobey the programmers”; this is purely a subgoal of the unknown supergoal content and not in any way undesirable in itself. Anchors are ultimately just one manifestation of known facts in the AI’s quest to discover and absorb the target philosophy.

anthropomorphic. Defined in 4 Beyond Anthropomorphism.

Literally “human-shaped.” Anthropomorphic thinking mistakenly attributes properties to minds-in-general that are specific to naturally evolved minds, imperfectly deceptive social organisms, or human minds. See 4 Beyond Anthropomorphism.

API. Application Programming Interface. The membrane that separates libraries from the programmer. A set of functions, objects, methods, properties, formats, and data structures that a programmer can use to communicate with an operating system, a commercial code library, an Internet browser, et cetera. Essentially, an API is the structure of the inputs and outputs of a system.

Asimov Laws. Asimov’s original “Three Laws of Robotics” are:

1. An AI may not injure a human being or, through inaction, allow a human being to come to harm.
2. An AI must obey the orders given it by human beings except where such orders would conflict with the First Law.
3. An AI must protect its own existence as long as such protection does not conflict with the First or Second Law.

These laws were listed in explicit detail for the first time in Isaac Asimov’s “Runaround,” published in 1942. Except for the substitution of “AI” for “robot,” I have not changed them from the final version in “The Bicentennial Man.”


As for controlling AIs with Asimov Laws, that’s like trying to build AIs with “positronic brains” or build starships using Star Trek’s “warp drive.” Asimov Laws are a blatantly anthropomorphic literary device, nothing more. Decisions are made by thoughts, which are not subject to programmatic specification. If Asimov Laws were somehow technically feasible, they would be extremely adversarial—the issue is creating a Friendly will, not controlling an unFriendly will—and would violate almost all the basic design patterns and paradigms discussed in Creating Friendly AI.

Bayesian binding. The strength of the binding between a piece of information and the conclusion derived from it, under the Bayesian Probability Theorem. The more improbable a given pattern is, the less likely that the pattern was produced by a pure coincidence. Suppose that you flip coin A thirty times, and then flip coin B thirty times, and the pattern of heads and tails matches exactly. Since this is a billion-to-one improbability for fair, independent coins, there is a very strong Bayesian binding between the two patterns, leading an observer to conclude that there is almost certainly some cause tying the two coins together.

In discussions of meta-rationality, the Bayesian Probability Theorem is used to estimate the strength of the binding between your beliefs and reality—i.e., the extent to which the fact that you believe X licenses you to conclude that X is true. Suppose that I observe myself to say, “The sky is green.” If I know that I believe in the greenness of the sky so strongly that I would declare the sky green even though it were purest sapphire blue, then my observing myself to say “The sky is green” says nothing—according to the BPT—about the actual likelihood that the sky is green. This effect—that I will see myself believing “The sky is green”—is predicted in both the groups where the sky is green and the groups where the sky is blue; thus the observation does nothing to indicate which group the actual sky falls into. If, on the other hand, I don’t care much about the color of the sky, then I am only likely to say that the sky is green if it’s actually green, and my observing myself to make this statement is strong evidence in favor of the greenness of the sky. What was that about faith?
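As a quick check of the “billion-to-one” figure quoted above:

```python
# Probability that two independent fair coins produce identical thirty-flip sequences.
p_match = 0.5 ** 30
print(p_match)          # ~9.3e-10, i.e. a bit under one in a billion
```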

Bayesian Probability Theorem. The Bayesian Probability Theorem relates observed effects to the a priori probabilities of those effects in order to estimate the probabilities of underlying causes. For example, suppose you know the following: 1% of the population has cancer. The probability of a false negative, on a cancer test, is 2%. The probability of a false positive, on a cancer test, is 10%. Your test comes up positive. What is the probability that you have cancer?


The instinctive human reaction is terror. After all, the probability of a false positive is only 10%; isn’t the probability that you have cancer therefore 90%? The Bayesian Probability Theorem demonstrates why this reasoning is flawed.

In a group of 10,000 people, 100 will have cancer and 9,900 will not have cancer. If cancer tests are administered to the 10,000 people, four groups will result. First, a group of 8,910 people who do not have cancer and who have a negative test result. Second, a group of 990 who do not have cancer and who have a positive test result. Third, a group of 2 who have cancer and who have a negative test result. Fourth, a group of 98 who have cancer and who have a positive test result. Before you take the test, you might belong to any of the four groups; the Bayesian Probability Theorem says that your probability of having cancer is equal to (2 + 98)/(8,910 + 990 + 2 + 98), 1/100 or 1%. If your test comes up positive, it is now known that you belong to either group 2 or group 4. Your probability of having cancer is (98)/(990 + 98), 49/544 or approximately 9%. If your test comes up negative, it is known that you belong to either group 1 or group 3; your probability of having cancer is 2/8,912 or around .02%.

Colloquially, the good Reverend Bayes is invoked wherever prior probabilities have a major influence on the outcome of a question. However, the Bayesian Probability Theorem has a much wider range of application—in normative reasoning, the BPT controls the binding between all sensory information and all beliefs. Normative reasoners are often called “Bayesian reasoners” for this reason. The Bayesian Probability Theorem is so ubiquitous and so intricate that I would cite it as one of the very, very few counterexamples to the Law of Pragmatism—in 5.1.4 Bayesian Reinforcement, for example, I discuss how some of the functionality of pain and pleasure, though implemented in a separate hardwired system in humans, could emerge directly from the BPT in normative reasoners (i.e., in Friendly AIs).
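A minimal sketch of the worked example above, with the same numbers (1% prevalence, 2% false-negative rate, 10% false-positive rate); the helper name posterior is invented for illustration:

```python
# Bayes' theorem applied to the cancer-test example in this entry.

def posterior(prior, p_pos_given_true, p_pos_given_false):
    """P(hypothesis | positive evidence)."""
    joint_true = prior * p_pos_given_true          # has cancer AND tests positive
    joint_false = (1 - prior) * p_pos_given_false  # no cancer AND tests positive
    return joint_true / (joint_true + joint_false)

p = posterior(prior=0.01, p_pos_given_true=0.98, p_pos_given_false=0.10)
print(f"P(cancer | positive test) = {p:.3f}")      # 0.090, i.e. roughly 9%
```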

Bayesian prior before affirmation. Defined in 5.6.2 Causal Validity Semantics.

Bayesian priors. Defined in 5.1 Cleanly Friendly Goal Systems.

The means used to estimate the a priori confidence and strength of a hypothesis, in advance of the particular tests used to modify the confidence and strength of the hypothesis. Presumably based on previously confirmed hypotheses. (In AIs, anyway; humans have the strangest prejudices . . .) In the definition given of Bayesian Probability Theorem, all the initial data about the incidence of disease in the general population, the reliability of the test, and so on, define the Bayesian priors.

Bayesian reinforcement. Defined in 5.1 Cleanly Friendly Goal Systems.


A means by which most of the functionality of pain and pleasure is replaced by the Bayesian Probability Theorem. Any action is not just a subgoal, it is a prediction. If the action fails, the failure is not just undesirable, it is also sensory evidence (under the Bayesian Probability Theorem) that the hypothesis linking the subgoal to the parent goal was incorrect. See 5.1.4 Bayesian Reinforcement.
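An illustrative sketch of this idea with invented numbers (not taken from the text): the hypothesis “this subgoal leads to the parent goal” is held probabilistically, and each failed attempt is Bayesian evidence against it, with no separate pain signal required.

```python
# Illustrative only: confidence in the subgoal-to-parent-goal link drops
# with each observed failure, by ordinary Bayesian updating.

def update(p_link, p_success_if_link=0.8, p_success_if_no_link=0.1, succeeded=False):
    like_link = p_success_if_link if succeeded else (1 - p_success_if_link)
    like_no_link = p_success_if_no_link if succeeded else (1 - p_success_if_no_link)
    joint_link = p_link * like_link
    return joint_link / (joint_link + (1 - p_link) * like_no_link)

p = 0.9                     # initial confidence that the subgoal leads to the parent goal
for _ in range(3):          # three failed attempts in a row
    p = update(p, succeeded=False)
    print(round(p, 3))      # confidence drops: ~0.67 -> ~0.31 -> ~0.09
```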

Bayesian sensory binding. Defined in 5.1 Cleanly Friendly Goal Systems.

The use of the Bayesian Probability Theorem to maintain a binding between a hypothesis and the external reality. Under external reference semantics, the treatment of programmer statements about Friendliness as sensory data about probabilistic supergoal content. Given two possible hypotheses, with two a priori probabilities, and two different outcome distributions given each hypothesis, the estimated relative probabilities of the hypotheses shift, with each incoming piece of sensory data, in a way governed by the Bayesian Probability Theorem (see glossary definition). The Bayesian Probability Theorem is the link between all sensory data and all world-model content. Each piece of sensory information implies a state of the world because, and only because, the reception of that piece of sensory information is predicted by the hypothesis that the world is in that state, and not predicted by other hypotheses. If we see a red ball, we believe that a red ball is there because we don’t expect to see a red ball unless a red ball is there, and we do expect to see a red ball if a red ball is there. Or “we” don’t, but an AI does.

See the 5.1.3.1 Bayesian Sensory Binding section as well as qualitative, quantitative, and manipulative.

BCI. An abbreviation for “Brain-Computer Interface.” The current name of the developing technological field concerned with the direct readout and manipulation of biological neurons. Sometimes extended to refer to more indirect linkages, such as those that attempt to decode electroencephalograms and so on.

cache. To “cache” a result is to store it for later use. For example, your Web browser has a “cache folder” containing files that you have already downloaded from the Internet; when your browser encounters a URL that it’s already followed, it can retrieve the file from the cache folder instead of downloading it again.

catastrophic failure of Friendliness. A failure of Friendliness which causes the AI to stop wanting to be Friendly; a nonrecoverable error; an error which wipes out, disables, or bypasses the error-recovery mechanisms.

causal goal system. Defined in 3 An Introduction to Goal Systems.

A goal system in which desirability backpropagates along predictive links. If A is desirable, and B is predicted to lead to A, then B will inherit desirability from A, contingent on the continued desirability of A and the continued expectation that B will lead to A. Since predictions are usually transitive—if C leads to B, and B leads to A, it usually implies that C leads to A—the flow of desirability is also usually transitive.

See supergoal, subgoal, child goal, parent goal, goal-oriented cognition, goal-oriented behavior, goal, and the sections 3 An Introduction to Goal Systems, 5.1 Cleanly Friendly Goal Systems, and 5.2.1 Generic Goal System Functionality.

causal validity semantics. Defined in 5.6.2 Causal Validity Semantics.

An AI that describes verself as the output of a causal process and that regards certain affectors within that process as being more and less valid. See the defining section, 5.6.2 Causal Validity Semantics, for a more complete explanation.
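A minimal sketch of the backward flow described under “causal goal system” above, using a toy dictionary representation invented for illustration rather than CFAI’s actual data structures:

```python
# C leads to B, B leads to A; A is the supergoal. Remove a predictive link
# and the child's desirability vanishes with it--it was never stored independently.

supergoal_desirability = {"A": 1.0}
predicts = {"C": "B", "B": "A"}       # "X is predicted to lead to Y"

def desirability(goal):
    if goal in supergoal_desirability:
        return supergoal_desirability[goal]
    parent = predicts.get(goal)
    return desirability(parent) if parent else 0.0

print(desirability("C"))   # 1.0 -- inherited through B from A
del predicts["B"]          # the prediction "B leads to A" is retracted
print(desirability("C"))   # 0.0 -- C's desirability was contingent, not independent
```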

CFAI. An abbreviation used for Creating Friendly AI, this document, a publication of the Singularity Institute for Artificial Intelligence.

child goal. Defined in 3 An Introduction to Goal Systems.

A relation between two goals. If B is a child goal of A, then B derives desirability from A; that is, B is seen as desirable because A is already known to be desirable and B is predicted to lead to A. It does not make sense to speak of a goal as being a “child goal” or “parent goal” in isolation, since B may be a child goal of A while simultaneously being the parent goal of C. See also supergoal, subgoal, parent goal.

cleanly causal goal system. Defined in 5.1 Cleanly Friendly Goal Systems.

A causal goal system in which it is possible to view the goal system as containing only decisions, supergoals, and beliefs, with all subgoal content being identical with beliefs about which events are predicted to lead to other events. Cleanliness, or an approximation to cleanliness, is a desirable property of Friendship systems; even more important is that a Friendly seed AI should view vis goal system as a design approximation to a cleanly causal goal system. I.e., a cleanly causal goal system is normative.

complex functional adaptation. An adaptation composed of multiple, interacting components which work together to perform a function.

Complicated functions do not occur in single mutations—they build up gradually, as simple implementations of the functionality allow for more and more complex improvements to take place. It is impossible to improve on a nonexistent piece of functionality.

A complex functional adaptation necessarily requires more than one mutation to evolve. Successful mutations, or adaptations, occur in response to invariants of the environment. For a mutation to be successful—for a mutation to be an evolutionary advantage—requires environmental factors that are present with sufficient reliability for a selection pressure to exist. The key concept is that environment includes not just external environment but internal genetic environment. For an adaptation A to occur in response to another adaptation B, adaptation B must be reliably present in the genetic environment, or adaptation A will not be an evolutionary advantage.

Sometimes, when a complex adaptation is just getting started—i.e., there are only two genes comprising the complete adaptation—it is possible for both A and B to float around in a small percentage of the gene pool, especially if A and B are small advantages independently but a large advantage in combination. Once the percentage rises over a certain threshold level, however, the selection pressures become strong enough that both A and B rapidly become universal within the gene pool. Evolution, while feeding on variance, uses it up. Once both A and B are universal, further adaptations/improvements can then begin to occur in response to the new genetic invariant, the complex adaptation A+B.

The fundamental implication for psychology is that all complex functional adaptations are panhuman. Whatever surface differences may result from the variance between humans, all of our ultimate hardware, the basic design, is fundamentally identical for all humans. The variance bound up in the Gaussian distribution of intelligence is dwarfed by the vast amount of previously accreted complexity. This is an important thing to keep in mind because our social environment tends to amplify the differences between individual humans and distract us from the differences between humans and chimpanzees.

The idea of a “complex functional adaptation” is one of the deepest concepts of modern-day evolutionary theory. For further explanation, I highly recommend “The Psychological Foundations of Culture” by Cosmides and Tooby.

computational horizon. The scope of a problem. The amount of computing power that needs to be devoted to a task. Outside the “horizon” lie all the facts that are not relevant to the task, all the details that are too fine to be processed with the available computing power, all the consequences that are too hazy or too unimportant to predict, and so on. Deciding where the computational horizon lies often has a major impact on the quality and speed of a cognitive task. See predictive horizon, granularity horizon, and relevance horizon.

computronium. Matter that has been transformed from its natural state into an optimized, maximally efficient computer. (A true Extropian would argue that this is matter’s “natural state.”)


What constitutes “computronium” varies with the level of postulated technology. A rod logic nanocomputer is probably too primitive to qualify as computronium, since large molecular aggregates (hundreds or thousands of atoms) are used as computing elements. A more archetypal computronium would be a three-dimensional cellular automaton which attached computational significance to each individual atom, perhaps with quantum-computing elements included.

More exotic forms of computronium include neutronium, Higgsium, monopolium, or—my personal invention—an interlaced structure of positive-matter and negative-matter monopolium wrapped up in a fractal Van Den Broeck warp. (The total mass is zero, so the whole doesn’t undergo gravitational collapse. If paired negative and positive matter can be manufactured in unlimited quantities, the fractal Van Den Broeck warp can continue extending indefinitely and exponentially. Threading the system with wormholes keeps latency down. And the whole thing fits in your pocket.)

configuration space. Suppose you have three numbers, A, B, and C, any one of which can take on any scalar value between 0 and 5. Mathematically, the class of possible states of the system can be viewed as a three-dimensional space, with one dimension for each number. If we’re interested in some particular subset of the possible states (for example, the states where A + B + C < 5), this defines a volume in configuration space. If one considers a system of a billion quarks, with each quark having three real numbers defining position and three real numbers defining velocity, then the result is a six-billion-dimensional phase space. The sets of possible states of the system that can be described as “ice crystal” would constitute a particular volume in this phase space.

conflate. To confuse two conceptually distinct entities by giving them the same name and reasoning about them as if they were the same thing. In the classic philosophical “bait-and-switch,” the philosopher argues using one referent of a term, then applies the conclusions to a different referent of the term, while using the same word throughout.

conspecific. Evolutionary-biology jargon for another member of the same species. In our ancestral environment, we weren’t just competing with predators, we were competing against our fellow humans. This is illustrated by a famous joke:

Two evolutionary biologists were running frantically from a large, hungry lion. Gasping, the first one said: “This is hopeless. We can’t possibly outrun a lion.” The second one said: “I don’t have to outrun the lion. I just have to outrun you.”


Competing against conspecifics is an open-ended process, a Red Queen’s Race (Ridley 1994). The best genes of one generation become universal—and therefore merely average—in the next. Any time you see an enormously exaggerated trait—the tail of a peacock, the antlers of a buck, the intelligence of a human—it’s a good guess that two forces are at work: Competition with conspecifics, and sexual selection.

controlled ascent. Defined in 5.7 Developmental Friendliness.

An ethical injunction which states that self-improvement should not outpace programmer guidance, because this increases the probability of catastrophic failure of Friendliness. A consequence of this instruction would be that, if self-improvement suddenly accelerates, it may be necessary to consult the programmers before continuing, or to deliberately slow down for some period of time in order for Friendship to catch up with intelligence.

For very young AIs, “controlled ascent” may be a programmatic feature triggering a save-and-stasis if a metric of self-improvement suddenly accelerates, or if the metric of self-improvement outpaces a metric of programmer guidance. See 5.8.0.4 Controlled Ascent.

Cosmides and Tooby. Leda Cosmides and John Tooby. See The Psychological Foundations of Culture (Tooby and Cosmides 1992).

counterfactual. A what-if scenario deliberately contrary to reality; e.g., “What if I hadn’t dropped that glass of milk?” See also subjunctive.

crystalline. Defined in Yudkowsky (2001, § 1.1 Seed AI).

Loosely speaking, “crystalline” is the opposite of “rich” or “organic”. If vast loads of meaning rest on the shoulders of individual computational tokens, so that a single error can break the system, it’s crystalline. “Crystalline” systems are the opposite of “rich” or “organic” error-tolerant systems, such as biological neural networks or seed-AI mindstuff. Error-tolerance leads to the ability to mutate; mutation leads to evolution; evolution leads to rich complexity—networks or mindstuff with lots of tentacles and connections, computational methods with multiple pathways to success.

currently unknown supergoal content. Defined in 5.5.1 External Reference Semantics.

Those portions of the probabilistic supergoals which are unknown or uncertain—see external reference semantics. Subgoals having to do with improving the supergoals or improving the Friendship structure are child goals of the unknowns in the supergoal content, not super-supergoals or meta-supergoals.

cytoarchitecture. “Cytoarchitecture” refers to the general way neurons connect up in a given lump of neuroanatomy—many-to-many, many-to-one, and so on.

declarative. The distinction between “procedural” and “declarative” knowledge/skill/information is one of the hallowed dichotomies of traditional AI. Although the boundary isn’t as sharp as it’s usually held to be—especially in seed AI—the distinction is often worth making.

Your knowledge of how to walk is “procedural” and is stored in procedural form. You don’t say to yourself: “Right leg, left leg, right leg, left leg.” All the knowledge about how to balance and maintain momentum isn’t stored as conscious, abstract, declarative thought in your frontal lobes; it’s stored as unconscious procedural thought in your cerebellum and spinal cord. (Neurological stereotyping included for deliberate irony.)

Inside source code, the procedural/declarative distinction is even sharper. A piece of code that turns on the heat when the temperature reaches 72 and turns on the air conditioning when the temperature reaches 74, where 72 and 74 are hard-coded constants, has procedurally stored the “correct” temperature of 73. The number “73” may not even appear in the program.

A piece of code that looks up the “target temperature” (which happens to be 73) and twiddles heat or A/C to maintain that temperature has declaratively stored the number 73, but has still procedurally stored the method for maintaining a temperature. It will be easy for the programmer—or for internally generated programs—to refer to, and modify, the “target temperature”. However, the program still doesn’t necessarily know how it’s maintaining the temperature. It may not be able to predict that the heat will go on if the temperature reaches 72. It may not even know that there’s such a thing as “heat” or “air conditioning”. All that knowledge is stored in procedural form—as code which maintains a temperature.

In general, procedural data is data that’s opaque to the program, and declarative data is data that the program can focus on and reason about and modify. Seed AIs blur the boundary by analyzing their own source code, but this doesn’t change the basic programmatic truth that declarative=good and procedural=bad.
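The two thermostats from the “declarative” entry, sketched in code (the class and function names are invented for illustration):

```python
# In the first thermostat, the "correct" temperature of 73 exists only
# procedurally--buried in the constants 72 and 74. In the second, the target
# is declarative data that can be inspected and modified, though the method
# of maintaining it is still procedural.

def procedural_thermostat(temp, heat, ac):
    if temp <= 72: heat()          # the number 73 never appears anywhere
    if temp >= 74: ac()

class DeclarativeThermostat:
    def __init__(self, target=73.0):
        self.target = target       # declaratively stored; easy to refer to and change
    def step(self, temp, heat, ac):
        # ...but nothing here "knows" there is such a thing as heat or A/C.
        if temp < self.target: heat()
        elif temp > self.target: ac()

procedural_thermostat(75, heat=lambda: print("heat on"), ac=lambda: print("A/C on"))
DeclarativeThermostat().step(70, heat=lambda: print("heat on"), ac=lambda: print("A/C on"))
```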

Deep Blue. The chess-playing device that finally beat the human champion, Kasparov. A great, glorified search tree that beat Kasparov essentially through brute force, examining two billion moves per second. Built by IBM.

desirability. Defined in 5.1 Cleanly Friendly Goal Systems.


Intuitively, a quantitative measure of how much a mind wants to achieve a particular state, irrespective of whether that state is desired for itself or as a means to an end. “Undesirability” is how much a mind wants to avoid a state.

Formally, “desirability” is a quantity that springs from supergoals, flows backward along predictive links to subgoals, and eventually collects in decisions, where the action with the highest “desirability” is always taken. In a causal goal system, “desirability” might be better translated as leads-to-supergoal-ness. Leads-to-supergoal-ness is a quantity that usually but not always flows backward along predictive links; i.e., if C is predicted to lead to B, and B is predicted to lead to supergoal A, then C usually (but not always) inherits leads-to-supergoal-ness or “desirability” from B.

Devil’s Contract Problem. Defined in 4.7 Interlude: Beyond the Adversarial Attitude.

A failure of Friendliness scenario in which the AI follows the letter but not the spirit of a literally worded order, in the tradition of much older tales about accepting wishes from a djinn, negotiating with the fairy folk, and signing contracts with the Devil. This FoF scenario is raised far too often given its relative severity and probability.

In the diabolic variant, the misinterpretation is malicious. The entity being commanded has its own wishes and is resentful of being ordered about; the entity is constrained to obey the letter of the text, but can choose among possible interpretations to suit its own wishes. The human who wishes for renewed youth is reverted to infancy, the human who asks for longevity is transformed into a Galapagos tortoise, and the human who signs a contract for life everlasting spends eternity toiling in the pits of hell.

In the golemic variant, the misinterpretation is due to lack of understanding. A golem is a made creature which follows the literal instructions of its creator. In some stories the golem is resentful of its labors, but in other stories the golem misinterprets the instructions through a mechanical lack of understanding—digging ditches ten miles long, or polishing dishes until they become as thin as paper. (I stole those two examples from Feet of Clay by Terry Pratchett [1996] rather than the traditional folklore, but they are fine examples nonetheless.)

e.g. Exempli gratia; Latin, “for example.”

ethical injunction. Defined in 5.2 Generic Goal Systems.

An injunction that is never supposed to be violated—has no preconditions for violation—because the probability of mistaken violation is greater than the probability of correct violation, generally because the injunction compensates for a known bias in cognition (hopefully a humans-only issue). (Actually, the human version gets a bit more complicated—issues of honor and reputation; psychological strength derived from context-insensitivity; and so on.)

In Bayesian terms, given the fact that violating the injunction appears to be fully justified, it is more likely that a cognitive error has occurred than that the action actually is justified. Or rather, considered abstractly and in advance of a specific instance, the total payoff from avoiding even apparently valid violations is expected to be positive. Since AIs—unlike humans—are in more danger from stupidity than observer bias, and there is no real equivalent to an “irrevocable commitment,” the primary utility of ethical injunctions in AIs is that it enables the current AI to cooperate with the programmer in safeguarding against future violations of an ethical injunction. See 5.3.3.1 Cooperative Safeguards.
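An illustrative calculation of the Bayesian point above, with invented numbers: even when a violation looks fully justified, the prior rarity of genuine exceptions plus a known bias toward seeing justification can leave “cognitive error” as the more probable explanation.

```python
# Illustrative numbers only (not from the text).
p_actually_justified = 0.01          # prior: genuine exceptions are rare
p_looks_justified_if_genuine = 0.9
p_looks_justified_if_biased = 0.3    # the bias the injunction compensates for

joint_genuine = p_actually_justified * p_looks_justified_if_genuine
joint_biased = (1 - p_actually_justified) * p_looks_justified_if_biased
posterior = joint_genuine / (joint_genuine + joint_biased)
print(round(posterior, 3))           # ~0.029: "it looks justified" is still probably an error
```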

Eurisco. Eurisko was the first truly self-enhancing AI, created by Douglas B. Lenat. Eurisko’s mindstuff—in fact, most of the AI—was composed of heuristics. Heuris- tics could modify heuristics, including the heuristics which modified heuristics. I’ve never been able to find a copy of Eurisko’s source code, but, by grace of Jakob Mentor, I have obtained a copy of Lenat’s original papers. It turns out that Eurisko’s “heuristics” were arbitrary pieces of LISP code. Eurisko could modify heuristics because it possessed “heuristics” which acted by splicing, modifying, or composing—in short, mutating—pieces of LISP code. Many times this would re- sult in a new “heuristic” which caused a LISP exception, but Eurisko would simply discard the failed heuristic and continue. In a sense, Eurisko was the first attempt at a seed AI—although it was far from truly self-swallowing, possessed no general intelligence, and was created from crystalline components. Engines of Creation (by K. Eric Drexler) contains some discussion of Eurisko’s accomplishments. external reference semantics. Defined in 5.5.1 External Reference Semantics. The behaviors and mindset associated with the idea that supergoals canbe “wrong” or “incomplete”—that the current supergoals are an approximation to an ideal, rather than being correct by definition.. Under a system with external reference semantics, supergoal content takes the form of probabilistic hypotheses about an external referent. That is, the supergoal content consists of hypotheses about Friendliness rather than a definition of Friendliness. The most important and fundamental use of ERS is to protect the ability of the programmers to cooperatively modify and improve the goal system. Letting

245 Creating Friendly AI 1.0

the programmers make alterations to the supergoals is not a subgoal of a “let the programmers make alterations” meta-supergoal; it is a child goal of the unknown portion of the supergoal content, or the fact that statements about the supergoal are probabilistic. Technically, a Friendly AI doesn’t have supergoal “content”; the Friendly AI has an external referent labeled “Friendliness,” and a series of probabilistic assertions about that external referent, derived from sensory information such as programmer affirmations. Explanations of how to “ground” the external reference include 5.6.1 Shaper/Anchor Semantics and 5.6.2 Causal Validity Semantics. The simplest (but ungrounded) form of external reference is an assertion that some external source contains the ideal, or the best available approximation to the ideal, and that sensory data from that source is Bayesian information about what the internal content should be. The idea is that programmers know about “Friendliness”—have a connection to the external referent—and that statements by the programmers are thus sensory information about external referent “Friendliness.” The term “external reference semantics” derives from the way that many of the behaviors associated with probabilistic supergoals are those associated with probabilistic subgoals—in general, with an uncertain view of external reality. If your current image of reality is an approximation to real reality, or if your current subgoals are dependent on an uncertain image of reality, then blindly defending the current subgoals is undesirable—it could interfere with your ability to create new, better subgoals. Similarly, under ERS, blindly defending hypotheses about Friendliness is undesirable; it could interfere with the ability to achieve the actual Friendliness referent. extraneous cause. Defined in 5.6.2 Causal Validity Semantics. A concept from 5.6.2 Causal Validity Semantics. When an AI (or a normative human) creates a plan, ve starts with a mental image of the intended results and designs a series of actions which, applied to the world, should yield a chain of causality that ends in the desired result. If an extraneous cause comes along and disrupts the chain of causality, the AI must take further actions to preserve the original pattern; the pattern that would have resulted if not for the extraneous cause. The AI must re-run the chain of causality without the error to find the valid result—a “causal rewrite.” Implicit in the idea of an extraneous cause is the idea of a valid cause, an idea which forms the foundation of causal validity semantics. failure of Friendliness. A mistake causing an otherwise Friendly AI to take one or more unFriendly actions, perhaps due to an error in supergoal content or a misapprehension
about external reality. The error may or may not be recoverable. See catastrophic failure of Friendliness. flight recorder. Defined in 5.3 Seed AI Goal Systems. For a project with very high-level funding: A complete record of all inputs, sensory data (with precise timing), Internet feeds, keystrokes (with precise timing), source-code checkins, randseeds, and hardware configuration, such that sufficient data exists to reconstruct the AI’s program state at any instant in time. Written to WORM (write-once-read-many) memory and stored securely. Using a full flight recorder, it is possible to verify that no radiation bitflips or unrecorded inputs occurred from the AI’s moment of conception to vis present-day existence, albeit at great cost in computing time. For a project with less extravagant funding, a “poor man’s flight recorder” is an autonomous subsystem of the AI that writes the AI’s current stream of consciousness, in human-readable format, to non-AI-tamperable storage, plus whatever daily or weekly backup snapshots are stored.
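As a rough sketch only (nothing in the text specifies an implementation), the “poor man’s flight recorder” could be approximated by an append-only, hash-chained journal; the class and field names below are illustrative assumptions, and true WORM behavior would still have to come from the storage hardware.

    # Minimal sketch of a "poor man's flight recorder": an append-only, hash-chained
    # journal of inputs and stream-of-consciousness entries. The hash chain makes
    # silent tampering detectable; write-once semantics must come from the hardware.
    import hashlib, json, time

    class FlightRecorder:
        def __init__(self, path):
            self.path = path              # ideally a file on write-once media
            self.prev_hash = "0" * 64     # hash of the previous record

        def record(self, kind, payload):
            entry = {
                "time": time.time(),      # precise timing, as the entry above requires
                "kind": kind,             # e.g. "keystroke", "net_input", "randseed"
                "payload": payload,
                "prev": self.prev_hash,
            }
            line = json.dumps(entry, sort_keys=True)
            self.prev_hash = hashlib.sha256(line.encode()).hexdigest()
            with open(self.path, "a") as f:   # append-only usage; never rewrite
                f.write(line + "\n")
            return self.prev_hash

    # Usage sketch: recorder = FlightRecorder("ai_journal.log"); recorder.record("randseed", 42)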

FoF. Abbreviation for “failure of Friendliness.”

Friendliness. Intuitively: The set of actions, behaviors, and outcomes that a human would view as benevolent, rather than malevolent; nice, rather than malicious; friendly, rather than unfriendly; good, rather than evil. An AI that does what you ask ver to, as long as it doesn’t hurt anyone else, or as long as it’s a request to alter your own matter/space/property; an AI which doesn’t cause involuntary pain, death, alteration, or violation of personal environment.

Friendly AI. (1) The field of study concerned with the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals. The term “Friendly AI” was chosen not to imply a particular internal solution, such as duplicating the human friendship instincts, but rather to embrace any set of external behaviors that a human would call “friendly.” In this sense, “Friendly AI” can be used as an umbrella term for multiple design methodologies. Usage: “The field of Friendly AI.” (2) An AI which was designed to be Friendly. Within the context of Creating Friendly AI, an AI having the architectural features and content described in this document. Usage: “A Friendly AI would have probabilistic supergoals.” (3) An AI which is currently Friendly. See Friendliness. Usage: “The first AI to undergo a hard takeoff had better be a Friendly AI.”

Friendship. The systems, subsystems, goal system content, system architecture, and other design features constituting the implementation of Friendliness.

Friendship architecture. Defined in 2 Challenges of Friendly AI. The challenges of Friendship acquisition and Friendship structure, as distinguished from Friendship content. The part of the problem that is solved by creating specific cognitive processes, rather than by adding specific knowledge or other cognitive content. The bounded amount of complexity that is infused by design and forethought, rather than the open-ended amount of complexity that is accumulated through experience. See 2.4 Content, Acquisition, and Structure.

Friendship acquisition. Defined in 2 Challenges of Friendly AI. The second-order problem of Friendly AI; building a Friendly AI that can learn Friendship content. The cognition used to verify, modify, improve, and contradict Friendship content. (Note that “Friendship content” thus has the connotation of something that can be verified, modified, improved and contradicted without that posing an unusual problem.) See 2.4 Content, Acquisition, and Structure and Friendship structure.

Friendship content. Defined in 2 Challenges of Friendly AI. The zeroth-order and first-order problems of Friendly AI; correct decisions and the cognitive complexity used to make correct decisions. The complex of beliefs, memories, imagery, and concepts that is used to actually make decisions. Specific subgoal content, supergoal content, shaper content, and so on. See 2.4 Content, Acquisition, and Structure; see Friendship acquisition and Friendship structure.

Friendship structure. Defined in 2 Challenges of Friendly AI. The third-order problem of building a Friendly AI that wants to learn Friendliness (engage in Friendship acquisition of Friendship content). The structural problem that is unique to Friendly AI. The challenge of building a funnel through which a certain kind of complexity can be poured into the AI, such that the AI sees that pouring as desirable at every point along the way. The challenge of creating a bounded amount of Friendship complexity that can grow to handle open-ended philosophical problems. See 2.4 Content, Acquisition, and Structure. gender-neutral pronouns. Ve, vis, ver, verself. I was forced to start using gender-neutral pronouns when referring to intelligent AIs, since to use “he” or “she” would imply cognitive hardware that such an AI would very specifically not have. I realize that these pronouns strike people as annoying the first time around. I’m sorry for that, and I truly regret having to annoy my readers, but “it” is simply
inadequate to refer to AIs. Not only is “it” used as a pronoun for inanimate matter, but “it” is also a general anaphor, like “this” or “that”. “It” can refer to anything at all in a sentence, not just the AI, so complex sentences—especially ones that use “it” for other purposes—become impossible to parse syntactically. Sometimes a sentence can be rewritten so that no pronoun is necessary, but for sentences with multiple pronoun references, this rapidly becomes either impossible, or too tangled. I would rather use unusual words than tangled syntax. At least “ve” gets easier to parse with time. At one point I was using “ve” to refer to a human of indefinite gender, but I have since realized that this is just as inaccurate as referring to an AI as “he” or “she”. I now keep a coin near my computer that I flip to decide whether a human is “he” or “she”. (No, I haven’t fallen into the bottomless pit of political correctness. Everyone has the right to use whatever language they like, and I can flip a coin if I want to. Your right to use “he” implies my right to flip a coin. Right?)

GEB. Gödel, Escher, Bach: An Eternal Golden Braid by Douglas R. Hofstadter. This book is mandatory reading for all members of the human species. giga. 10^9. One billion; a thousand “mega.”

GISAI. An abbreviation used for General Intelligence and Seed AI (Yudkowsky 2001), a publication of the Singularity Institute for Artificial Intelligence. Located at http://singularity.org/files/GISAI.html goal. Defined in 3 An Introduction to Goal Systems. A state of the world marked as desirable (or undesirable), either because the world-state is inherently desirable under the supergoal content, or because the world-state is necessary as the prerequisite of some other desirable state. An entity exhibiting goal-oriented behavior will act so as to reach a “goal” state; a mind capable of goal-oriented cognition will take actions which ve predicts will lead to the goal state. Usually, “goal” implies a mind capable of goal-oriented cognition. goal-oriented behavior. Defined in 3 An Introduction to Goal Systems. Behavior that leads the world towards a particular state. Behavior which appears deliberate, centered around a goal or desire. The perception of goal-oriented behavior comes from observing multiple actions that coherently steer the world towards a goal; or singular actions which are uniquely suited to promoting a plausible goal-state and too improbable to have arisen by chance; or the use of different actions in different contexts to achieve a single goal on multiple occasions.

A more technical definition may be offered as follows: An organism or device exhibits “goal-oriented” behavior when, across multiple starting locations in a volume of the configuration space of the local world, the behavior produced by the organism or device differs depending on the starting location, such as to steer the final result of the local world toward a more compact volume of configuration space. (Note that in a thermodynamic Universe, the organism/device must produce waste entropy in order to steer the local world toward a more compact ending state. Generally, the perception of goal-oriented behavior arises when high-level chaos is converted into high-level order, without the observer tracking lower levels of description. For organisms/devices and observers in a low-entropy Universe, the organism/device should probably be conceived of as affecting the configuration space of a high-level description abstracted from the local world.) See also goal-oriented cognition and 3.1 Interlude: The Story of a Blob. goal-oriented cognition. Defined in 3 An Introduction to Goal Systems. Goal-oriented cognition describes a mind which possesses a mental image of the “desired” state of the world, and a mental image of the actual state of the world, and which chooses actions such that the projected future of world-plus-action leads to the desired outcome state. See goal-oriented behavior and manipulative. granularity horizon. The fineness of the detail that needs to be modeled. How much reductionism needs to be applied to capture all the relevant details. The tradeoff between expenditure of computing power, and the benefits to be gained from finer modeling. Every time your eye moves, the amount of processing power being devoted to each part of the visual field changes dramatically. (“The cortical magnification factor in primates is approximately inversely linear, at least for the central twenty degrees of field.” “It has been estimated that a constant-resolution version of visual cortex, were it to retain the full human visual field and maximal human visual resolution, would require roughly 10^4 as many cells as our actual cortex (and would weigh, by inference, roughly 15,000 pounds).” MITECS, “Computational Neuroanatomy”.) And yet we can watch an object rotating, so that different parts move all over the visual cortex, and it doesn’t appear to distort. Every time you change scale or the level of detail at the modality level of representation, the data may wind up going into essentially a different format—at least, from the perspective of someone trying to detect identity by bitwise comparison. Even adding or subtracting pieces of the puzzle, without changing scale, can be a problem if the AI has painstakingly built up a perceptual tower that doesn’t take well to tampering with the foundations. Cognitive methods need to be able to
take these kinds of random pushes and shoves. “Error-tolerance” isn’t just important because of actual errors, but because all kinds of little flaws naturally build up in a cognitive task, as the result of cognition. See computational horizon. grok. Hacker slang for “understand” or “fully, completely understand” or “understand in all its forms and ramifications” or “embrace.” The connotation is that the understanding is not just abstract and removed, but detailed enough to be immediate and fully usable. Originates in Robert Heinlein’s Stranger in a Strange Land. hard takeoff. The Singularity scenario in which a mind makes the transition from prehuman or human-equivalent intelligence to strong transhumanity or superintelligence over the course of days or hours. See Yudkowsky (2001, § 1.1 Seed AI), Yudkowsky (2001, § What is Seed AI?), or http://intelligence.org/ie-faq/. hertz. A measure of frequency, equal to 1 cycle per second. A neuron that fires 200 times per second is operating at 200 Hz. CPU clock speeds are currently measured in megahertz (MHz) or gigahertz (GHz). heuristic. I use the term to refer to any piece of knowledge which provides a rule of thumb—anything from “Don’t rest your hand on a hot stove” to “Try to control the center of the chessboard”. Some other definitions: Douglas Lenat once wrote an AI called Eurisko in which the mindstuff—in fact, practically the entire AI—was composed of “heuristics” which could modify other heuristics, including the heuristics doing the modifying. For example, “investigate extreme cases” was modified by a heuristic to yield “investigate cases close to extremes.” (Douglas Lenat went on to state that “Heuristics are compiled hindsight; they are judgmental rules which, if only we’d had them earlier, would have enabled us to reach our present state of achievement more rapidly.”) In classical AI, a “heuristic” is usually a function used to prune search trees by indicating branches which are likely or unlikely to be desirable. holism. The attitude that the whole is greater than the sum of the parts. To take the holistic view is to look upward, focus on the high-level properties. See reductionism and reductholism. See also Gödel, Escher, Bach: An Eternal Golden Braid, particularly the dialogues “Prelude” and “Ant Fugue”. “No one in his right mind could deny holism.” —GEB human-equivalence. A threshold level of ability in some domain which, if achieved, allows the AI to understand human concepts and do work as good as a human in that domain. A human-equivalent AI is one that has passed the threshold level of
general competence in some sense. An ambiguous and arguable definition; “transhumanity,” especially strong transhumanity, is more blatant. iff. “Iff” is shorthand for “if-and-only-if.” inclusive reproductive fitness. A measure of adaptivity or fitness which takes into account effects on close relatives. J.B.S. Haldane was once asked if he would risk death by drowning to save his own brother. He famously replied: “No, but I would to save two brothers or eight cousins.” (A sibling shares, on average, half your genome; a first cousin shares an eighth of your genome.) infrahuman AI. An AI of below-human ability and intelligence. “Prehuman” is usually used to refer to an infantlike or tool-level AI, so that “infrahuman” usually refers to a fairly mature AI, capable of general cognition, which is still not in the vicinity of human intelligence. See prehuman, near-human, human-equivalent, transhuman, and superintelligent. injunction. Defined in 5.2 Generic Goal Systems. A planning heuristic which has at least partial nonlocal support, or a planning heuristic which exhibits a great deal of context-independence. The archetypal case would be a heuristic that is supposed to be applied even when the straightforward interpretation of the world-model suggests otherwise, generally (in AIs) due to unknown unknowns or (in humans) to compensate for framing effects or (for both) to save computing power. See ethical injunction. instantiate. Loosely, program A “instantiates” program B if it can perfectly simulate program B. An “instantiation” of a program is a running copy of that program. This issue actually gets waaay more complicated, but I’m not going to inflict that on you now. Maybe later. Nobody has ever come up with a mathematical definition of “instantiation”, but it’s a useful concept. intelligence. Defined in Yudkowsky (2001, § 2.1 World-model). What is intelligence? In the case of humans, intelligence is a brain with around 40 billion neurons, and 104 cytoarchitecturally distinct areas in the cerebral cortex alone. What intelligence is is the subject of this whole web page. The cause of intelligence can be more succinctly described: Evolution is the cause of intelligence, and intelligence is an evolutionary advantage because it enables us to model, predict, and manipulate reality. Or rather, it enables us to model, predict, and manipulate regularities in reality.
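The Haldane quip in the inclusive reproductive fitness entry above reduces to one line of arithmetic; a minimal check:

    # Haldane's arithmetic: two brothers or eight cousins carry, in expectation,
    # as many copies of your genes as you do yourself.
    relatedness = {"self": 1.0, "sibling": 0.5, "first_cousin": 0.125}

    print(2 * relatedness["sibling"])       # 1.0 -> two brothers
    print(8 * relatedness["first_cousin"])  # 1.0 -> eight cousins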

Kasparov. Garry Kasparov, the human world chess champion who finally lost a match to Deep Blue.

kilo. 10^3. One thousand. latency. Latency describes delays, specifically delays introduced by communication rather than local processing, and irreducible delays rather than delays caused by sending a large amount of data. Most commonly used in discussion of computer networks and hardware systems. The latency on a motherboard is, for example, the time it takes a message from the CPU to reach a video card. The latency between nodes of a network is the time it takes for a message from node A to reach node B. For a fine explanation of why “latency” is entirely distinct from “bandwidth”—adding an identical second channel doubles bandwidth but does not affect latency at all—see “It’s the Latency, Stupid.” Note that our Universe specifies that the minimum latency between two nodes will be at least one second for every 186,000 miles. The latency between two nodes 300 microns apart must be at least one picosecond. Anders Sandberg, in “The Physics of Information Processing Superobjects: Daily Life Among the Jupiter Brains,” suggests the measure S of the “diameter” of a single mind, where S is the ratio of the clock speed to the latency between computing elements. (S = distance / (clock speed * message speed)). Anders goes on to note that the human brain has S ~ 1 (200 Hz neurons, 100 m/s axons, and a 0.1 m diameter). An oft-voiced conjecture is that the subjective “feel” of having a single, unified mind may require S <= 1. (As far as I know, this conjecture is only applied to superintelligences, and nobody has suggested that S played a role in shaping human neurology.)
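The latency figures above are just distance divided by signal speed; a small worked check using only the numbers given in the entry (reading S as the brain-crossing delay measured in clock ticks is my gloss, not a quotation from Sandberg):

    # Worked check of the latency figures above: latency = distance / signal_speed.
    LIGHT_SPEED = 299_792_458            # m/s
    MILE = 1609.344                      # m

    print(186_000 * MILE / LIGHT_SPEED)  # ~1.0 s   -> one second per 186,000 miles
    print(300e-6 / LIGHT_SPEED)          # ~1e-12 s -> one picosecond for 300 microns

    # Brain figures from the entry: 200 Hz neurons, 100 m/s axons, 0.1 m diameter.
    brain_latency = 0.1 / 100            # 1e-3 s for a signal to cross the brain
    clock_period = 1 / 200               # 5e-3 s per neural "clock tick"
    print(brain_latency / clock_period)  # 0.2 -> order 1, consistent with S ~ 1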

Law of Pragmatism. Defined in Yudkowsky (2001, § 1.2 Thinking About AI). Any form of cognition which can be mathematically formalized, or which has a provably correct implementation, is too simple to contribute materially to intelligence. life. So, you want to know the meaning of life, eh? When capitalized, “Life” usually refers to Conway’s Game of Life, a two-dimensional cellular automaton. Cells are laid out in a square grid, and cells can either be alive or dead. Each cell is affected only by the eight cells around it. With each tick, these rules are applied to each cell:
1: A cell with fewer than two living partners becomes or remains dead.
2: A cell with two living partners maintains its current state.
3: A cell with three living partners becomes or remains alive.
4: A cell with four or more living partners becomes or remains dead.
These rules are enough to generate almost endless variance. As for the age-old controversy among biologists about how to define “life”, I suggest the following: “Life is anything designed primarily by evolution, plus anything that counts as a person.” Note that this definition includes mules, excludes computer viruses, includes biological viruses, and (by special clause) includes designed minds smart enough to count as people.
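One tick of the four rules above fits in a few lines; a minimal sketch (my own illustration, storing the board as a set of live-cell coordinates):

    # One tick of Conway's Life, applying the four rules above.
    from itertools import product

    def neighbors(cell):
        x, y = cell
        return {(x + dx, y + dy) for dx, dy in product((-1, 0, 1), repeat=2) if (dx, dy) != (0, 0)}

    def step(live):
        candidates = live | {n for c in live for n in neighbors(c)}
        def alive_next(c):
            count = len(neighbors(c) & live)
            return count == 3 or (count == 2 and c in live)   # rules 1-4 compressed
        return {c for c in candidates if alive_next(c)}

    # Usage: a "blinker" oscillates between a row and a column of three cells.
    blinker = {(0, 1), (1, 1), (2, 1)}
    print(step(blinker))        # the column {(1, 0), (1, 1), (1, 2)} (set order may vary)
    print(step(step(blinker)))  # back to the original row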

LISP tokens. LISP is a programming language, the traditional language of AI. “LISP” stands for “List Processor”. In LISP, everything is made from lists, including the code. For example, a piece of code that adds 2 and 2 would be (plus 2 2). This code is a list composed of three tokens: plus, 2, and 2. If "num1" was a LISP token that contained the value of 2, the code could be written (plus num1 2) and would return 4. When I say that classical AI is built from suggestively-named LISP tokens, I mean that the classical AI contains a data structure reading ((is(food hamburger)) (is(eater human)) (can-eat (eater food))); the classical AI then deduces that a "human" can "eat" a "hamburger", and this is supposed to be actual knowledge about hamburgers and eating. What the AI really knows is that a G0122 can H8911 a G8733. Drew McDermott pointed this out in a famous article called “Artificial Intelligence Meets Natural Stupidity”. (LISP does have one real property which is important to AI, however; the code and the data structures follow the same format, making LISP the single premier language for self-modifying code. A true AI would probably read C++ as easily as LISP, since the amount of complexity needed to parse code is comparatively trivial relative to the amount of cognitive complexity needed to understand code. Even so, using a language well-suited to self-modification may simplify the initial stages of the AI where self-improvement is mostly blind. Since LISP is getting ancient as programming languages go, and linked lists are awkward by today’s standards, I’ve proposed a replacement for LISP called “Flare”, which (among many other improvements) would use XML instead of linked lists. Even so, of course, putting faith in the token level of Flare would be no better than putting faith in the token level of LISP. At most, Flare might be well-suited to programming sensory modalities and mindstuff. It would be nice to have such a language, since none of the existing languages are really suited to AI, but it’s more likely that we’ll just hack something up out of Python—during the initial stages, at least. For more about Flare, see the obsolete document The Plan to Singularity.)
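To make the code-as-data point concrete, here is a toy sketch in Python (my illustration, not Lenat’s code): expressions are stored as plain nested lists, evaluated, and then rewritten by another piece of code in the same format, which is roughly the property that made LISP attractive for Eurisko-style self-modifying heuristics.

    # Toy code-as-data: expressions are nested lists, so a "heuristic" can rewrite
    # another "heuristic" by ordinary list surgery.
    import operator, random

    ENV = {"plus": operator.add, "times": operator.mul, "num1": 2}

    def evaluate(expr):
        if isinstance(expr, list):                    # (op arg1 arg2 ...)
            op, *args = [evaluate(e) for e in expr]
            return op(*args)
        return ENV.get(expr, expr)                    # token lookup, else literal

    print(evaluate(["plus", "num1", 2]))              # 4, as in the entry above

    def mutate(expr):
        # A crude Eurisko-flavored move: maybe swap the operator of each sub-expression.
        new = [mutate(e) if isinstance(e, list) else e for e in expr]
        if random.random() < 0.5:
            new[0] = random.choice(["plus", "times"])
        return new

    print(evaluate(mutate(["plus", "num1", 3])))      # 5 if "plus" survives, 6 if swapped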

mature nanotechnology. See nanotechnology. Nanosystems (Drexler 1992) describes, for example, acoustic nanocomputers using “rod logics” composed of diamondoid rods with hundreds or thousands of atoms each, moving at the speed of sound in diamond, providing 10^21 ops/sec from a one-kilogram nanocomputer. This level of nanotechnology is best thought of as representing the nanotechnological equivalent of vacuum tubes. “Mature nanotechnology” is nanotechnology that bears the same resemblance to rod logics as, say, a modern ultra-optimized VLSI chip bears to ENIAC. This would cover electronic nanocomputers verging on computronium, extremely small (~100 nanometer or ~10 nanometer, rather than ~1 micron) nanobots capable of swarming to perform complex manufacturing activities (including reproduction) in a natural non-vat environment, ultra-optimized molecular specifications, error-tolerant nanomachinery, and so on. mega. 10^6. One million; a thousand “kilo.” meme. An idea considered as an organism, or focus of selection pressures. Ideas reproduce by being linguistically transmitted from one human to another; they mutate when individuals make errors in the retelling. If most of the content of an idea is the result of adaptive transmission errors, or if the idea’s content is best explained by reference to evolutionary phenomena rather than the decisions of individuals, then that idea is a meme. Colloquially, ideas are also referred to as “memes” when they are being thought about as memes rather than thoughts—i.e., when the important characteristic of an idea is its reproductive fitness or cultural spread, rather than its content. Although the term is usually derogatory, memes can sometimes be positive forces as well—for example, when a meme relies on positive emotions or truthful thinking to compete. Lies and hate are the easier and faster Dark Side, though, so the term is usually derogatory.

Mirror. An as-yet-unimplemented idea for training humans to think more intelligently by finding localizable hardware substrate for such cognitive adaptations as hatred and rationalization, then training humans to deliberately avoid the use of those abilities by asking them to philosophize in an fMRI machine. A possible futuristic method for training Friendly AI programmers, if the basic method is feasible and the Earth sticks around that long.

MITECS. The MIT Encyclopedia of the Cognitive Sciences. (Wilson and Keil 1999). A truly excellent book containing 471 short articles about topics in the cognitive sciences. See also my review in the Bookshelf. moral deixis. Defined in 5.3 Seed AI Goal Systems. MITECS, “Context and Point of View”:

“. . . in The CIA agent knows that John thinks that a KGB agent lives across the street . . . the underlined phrase can be evaluated from the speaker’s, the CIA agent’s, or John’s point of view. “Three major kinds of deixis are usually distinguished: (1) person deixis, where the context provides one or more participants of the conversation . . . (2) spatial deixis, where the context provides a location or a direction, especially as a reference point for spatial orientation . . . (3) temporal deixis, with the context contributing a specific time . . .”

Deixis is the use and communication of statements which contain speaker variables. Moral deixis refers to the (human) use, communication, and expectation of moral statements which contain speaker variables. If John Doe says to Sally Smith “My philosophy is: ‘Look out for John Doe.’ ”, Sally Smith will hear “Your philosophy should be: ‘Look out for Sally Smith.’ ”, not “Your philosophy should be: ‘Look out for John Doe.’ ” What has been communicated is “Look out for [speaker],” a moral statement whose specific content varies from listener to listener due to moral deixis. Our instinctive expectation of moral deixis is very strong; also, we take for granted (a) that anyone enunciating a moral principle is trying to get others to adopt it, and (b) that any moral principle thus enunciated will be phrased so as to appeal to others. Thus, hearing (from John Doe) ‘Look out for John Doe’, we automatically assume that ‘John Doe’ is the result of speaker deixis—a particular value of a variable—and that the moral principle actually being enunciated is “Look out for [speaker].” A third party, Pat Fanatic, may be able to say: “My philosophy is: Worship the great leader John Doe”, and this will be heard by listeners as “Look out for [John Doe],” but it is conversationally impossible for John Doe himself to communicate this information. A human, trying to achieve unity of will with a Friendly AI—for purposes of prediction or creation—must simultaneously suppress her adversarial attitude (speaker deixis in personal supergoals) and suppress her instinctive expectation that the AI will make observer-biased decisions. This is called “eliminating moral deixis,” one of the greater prerequisites—or challenges—of the art of Friendly AI. nanocomputer. A computer built using nanotechnology (manufacturing to molecular specifications). A lower bound on nanocomputing speeds has been set by calculating the speed of an acoustic computer using “rod logics” and messages that travel at the speed of sound; a one-kilogram rod logic, occupying one cubic centimeter, can contain 10^12 CPUs each operating at 1000 MIPS for a total of a thousand billion billion operations per second. See rod logic.

Note that rod logics are the nanotech equivalent of vacuum tubes (circa 1945), or rather, Babbage’s Analytical Engine (circa 1830). Electronic nanocomputers would be substantially faster. We use the “rod logic” numbers because they’re easy to analyze, and because 10^21 operations per second are sufficient for most applications. nanotechnology. Developing molecular nanotechnology would mean the ability to synthesize arbitrary objects to atomic-level specifications. For an introduction, see Engines of Creation (Drexler (1986), now online). For a technical visualization of lower (not upper) limits on the potential of nanotechnology, see the book Nanosystems (Drexler 1992). These lower limits—the nanotechnological equivalent of vacuum tubes—include a one-kilogram computer, running on 100 kW of power, consisting of 10^12 CPUs running at 10^9 ops/sec, for a total of 10^21 ops/sec. By comparison, the human brain is composed of approximately 100 billion neurons and 100 trillion synapses, firing 200 times per second, for a total of somewhere around 10^17 ops/sec. See rod logic. Nanosystems also describes molecular manufacturing systems capable of creating copies of themselves in less than an hour. This implies a certain amount of destructive potential. An exponentially replicating assembler could reduce the biosphere to dust in a matter of days. (For an overly optimistic treatment of the problem, see “Some Limits to Global Ecophagy” by Robert Freitas Jr. [2000].) Accidental out-of-control replication is fairly easy to prevent, given a few simple precautions; we should be more worried about military-grade nanotechnology and deliberately developed weapons. Given our human propensity to make things that go bang—and use them on each other—it would probably be a good idea to develop AI before nanotechnology. Nanotechnology is the “deadline” for AI. In the long run, there are only two kinds of technology: There are technologies that make it easier to destroy the world, and there are technologies that make it possible to go beyond the human. Whether we possess even a chance of survival is a question of which gets developed first. Developing a transhuman AI does involve certain risks, but it’s better than the alternative—success in Friendly AI improves our chances of dealing with nanotechnology much more than success in nanotechnology would improve our chance of creating Friendly AI. See 6 Policy Implications. near-human AI. An AI roughly in the vicinity of human intelligence, only slightly above or slightly below, or perhaps with some transhuman and some infrahuman abilities. Capable of interacting with humans as game-theoretical equals. Under a hard takeoff scenario, near-human AIs would exist very briefly, if at all.

nondestructive brain scan. As a theoretical construct, the idea that a Friendly superintelligence armed with mature nanotechnology has the theoretical capability to perform a complete (nondestructive) scan of all the neurons and synapses in the programmer’s mind and thus get a complete readout of mind-state, enough to settle absolutely any lingering questions about what the programmer really wanted and why. A nondestructive brain scan shouldn’t really be necessary—any strongly transhuman AI should be quite capable of understanding a human mind. However, the thought experiment does show that any unresolved questions of causal validity and shaper/anchor semantics should not linger into the “nanotech-capable superintelligence” stage. nonmalicious mistake. Defined in 5.2 Generic Goal Systems. A failure of Friendliness which is clearly visible as a failure of Friendliness if envisioned by the Friendly AI; a mistake made due to inadequate predictive horizons or factual misapprehensions rather than a failure in the goal system. A straightforward failure by the goal system to achieve its own supergoals, rather than mutation of supergoal content or misunderstanding of supergoal definitions and so on. normative. A term used by cognitive scientists to contrast the ideal or “normative” forms of cognition with the psychologically realistic cognition occurring in humans. For an example, see the MITECS article quoted at the beginning of 5.2.4.1 Anthropomorphic Injunctions. objective. Existing outside the mind, in external reality. Truly objective facts, things that are really real, should have binary, all-or-nothing existence or nonexistence; the truth or falsity of a statement entirely about objective facts should be either 100% or 0%. The only objects in this class are quarks, other fundamental particles, possibly spacetime, possibly the laws of physics, maybe mathematical facts (it is possible that mathematics is objective; it is also possible that the mathematical laws we observe are merely derivative patterns, higher-level behaviors exhibited by the true laws of physics), and in some theories qualia. Objective facts inhabit the ultimate lowest level of external reality—the token level of our ontology. In the nonabsolute sense, “objective” is often used to indicate a statement which has a fairly direct correspondence (sensory binding) to reality; a statement which makes reference to external reality rather than internal cognitive structures; which tends toward observer-independence rather than the reverse; and whose truth or
falsity is more dependent on variables in external reality than on variables in the mind of the observer. Opposed to subjective. objective morality. Defined in 5.6.2 Causal Validity Semantics. A set of entirely factual statements which will convince any general intelligence, even an entirely passive system containing no differential desirabilities, to pursue a specific (or perhaps specifiable) supergoal. Under another variation, a set of entirely factual statements which will convince any general intelligence, even one that already has directly opposing supergoals, to pursue some specific supergoal. The tail-end recursion of the “rule of derivative validity”; the Meaning of Life. Probably nonexistent—in fact, provably nonexistent, but the same proof of nonexistence applies to the First Cause, so we probably can’t be entirely sure. See the defining section, 5.6.2.6 Objective Morality, Moral Relativism, and Renormalization. (Those of you who have read my past works should note that renormalization, not objective morality, forms the proposed ultimate basis of Friendly AI.) observer-biased. A perceived quantity that tends to assume values whose perception will benefit the perceiver. Evolved organisms, particularly imperfectly deceptive social organisms, tend to develop observer biases. Things that are observer-biased (unless you have extreme levels of mental discipline or you’re a Friendly AI):

• The correctness of your political views.
• The likelihood that you are to blame for any given negative outcome.
• Your own trustworthiness, relative to the trustworthiness of others.
• Your likelihood of success if placed in charge of a social endeavor.

See observer-dependent, observer-independent, observer-centered, and Yudkowsky (2001, § Interlude: The Consensus and the Veil of Maya). observer-centered. Cognitive content with significant use of speaker deixis; cognitive processes which make significant use of the point of view. Observer centering is a stronger form of “observer dependence”—a perception can vary from observer to observer without necessarily being centered on each observer. A dozen people walking around a building, each getting a different visual view of that building, are receiving observer-dependent visual information. A dozen people watching a sunset on the beach, each seeing a trail of sparkles on
the water that leads directly to him, are receiving observer-centered visual information. Goals in evolved organisms tend to be observer-centered; beliefs tend to be observer-biased. See also observer bias and moral deixis. observer-dependent. A perceived quantity which varies depending on who does the perceiving. Subjective, as opposed to objective. Things that are observer-dependent:

• The speed of a passing train. (How fast were you going when you made the measurement?)
• The length of a ruler. (Another Special-Relativity thing. If you run past the ruler at 75% of the speed of light, measuring the ruler will show that it’s around half the length you would otherwise expect.)

Things that may be observer-dependent:

• Whether vanilla or chocolate tastes better.

See observer-independent, observer-biased, and Yudkowsky (2001, § Interlude: The Consensus and the Veil of Maya). observer-independent. A perceived or physical quantity to which all rational observers will assign a single value. Objective, as opposed to subjective. Things that are observer-independent:

• The existence of a quark.
• The sum of 2 + 2. (Maybe!)
• The speed of light in a vacuum. (At least, the local speed of light. I’m not getting into General Relativity.)

See observer-dependent, observer-biased and Yudkowsky (2001, § Interlude: The Consensus and the Veil of Maya). ontology. The basic level of reality. Our ontology involves quarks, spacetime, and probability amplitudes. The ontology of a Life game consists of dead cells, live cells, and the cellular-automaton rules. The ontology of a Turing machine is the state transition diagram, the read/write head, and an infinitely long tape with ones and zeroes written on it. orthogonal. A mathematical term; in geometry, it means perpendicular. Colloquially, two variables that can change independently of each other; not necessarily mutually irrelevant, but decoupled.

oversolve. Defined in 5.3 Seed AI Goal Systems. A philosophy under which, once you solve a problem, you go on to solve the problem with half your brain tied behind your back. See wisdom tournaments. If the AI can make Friendly decisions while running on unreliable hardware using deliberately messed-up software with half the heuristics disabled, a number of pseudo-human biases tossed in, and all the programmer-created ethical injunctions switched off—not just make the right decisions, but make them cleanly, simply, and with plenty of margin for error, so that the messed-up AI was never remotely within distance, not of catastrophic failure, but of a single moral compromise—then presumably the real AI, running at full power and full intelligence, has a “strength of Friendly personality” almost beyond human comprehension. That’s “oversolve.” parent goal. Defined in 3 An Introduction to Goal Systems. A relation between two goals. If B is a parent goal of C, then C derives desirability from B; that is, C is seen as desirable because B is already known to be desirable and C is predicted to result in B. It does not make sense to speak of a goal as being a “parent goal” or “child goal” in isolation, since B may be the parent goal of C while simultaneously being a child goal of A. See also supergoal, subgoal, child goal. past light cone. The set of all events in causal contact with a given spacetime point. The past light cone is the space of all events from which a ray of light could have reached the current event. The future light cone is the space of all events that can be reached by a ray of light from the current event. Any event occurring outside our past light cone can have no causal impact on this moment. predictive horizon. How far into the future the consequences of an event need to be projected. The amount of computing power devoted to projecting the consequences of an action. In Friendly AI, the computational horizon for disaster-checking. Humans seem to do very well at recognizing the need to check for global consequences by perceiving local features of an action. It remains to be seen whether this characteristic of 10^14 x 200 Hz synapses can be duplicated in N 2 GHz CPUs. See 5.2.2 Layered Mistake Detection. perfectly Friendly AI. Defined in 2 Challenges of Friendly AI. A Friendly AI which never makes a mistake and always takes the maximally Friendly action. A purely theoretical concept for infrahuman AIs, though a superintelligence may turn out to be “perfect for all practical purposes.” personal power. Defined in 5.2 Generic Goal Systems.

Power that can be used irrespective of ends, context-insensitively. Gandhi had context-sensitive power—many people believed in him, but his power was founded on their respect for his peacefulness. Rockefeller had context-insensitive power; the money that he spent on philanthropy could have been spent on anything else. Humans have an evolved bias which causes them to preferentially accumulate context-insensitive personal effectiveness when they attempt to advance some higher cause. See 5.2.5.1 Anthropomorphic Ethical Injunctions. peta. 10^15. One million billion; a thousand “tera.” philosophical crisis. Defined in 5.6 Interlude: Philosophical Crises. A “philosophical crisis” is hard to define. I usually think of a “philosophical crisis” as the AI stumbling across some fact that breaks ver loose of the programmers—i.e., the programmers have some deeply buried unconscious prejudice that makes them untrustworthy, or the AI stumbles across a deliberate lie, or the AI discovers objective morality, or something like that—possibly causing all the programmer affirmations to break down simultaneously. The basic notion is that the AI enters totally uncharted territory—something completely orthogonal to everything the programmers ever thought about—or that the AI suddenly realizes that all the programmers’ actions, right back to vis birth, can’t be trusted (this is a transhuman AI confronting the realization, one hopes; otherwise this almost certainly indicates a failure of Friendliness). An example would be an AI built solely with external reference semantics confronting the need for shaper/anchor semantics, and so on. In short, a philosophical crisis is just like a catastrophic failure of Friendliness, except that the AI is right—a normative human examining the reasoning would find no errors, and—if sufficiently intelligent to understand what was going on—would start experiencing a similar philosophical crisis. Most of my readers have probably run through several philosophical crises in the course of growing up. Causal validity semantics, wisdom tournaments, and the underlying Singularity paradigm of increased smartness, are the three major hopes for a Friendly AI being able to handle philosophical crises. prehuman AI. An AI of below-human ability and intelligence. May refer to an infantlike or tool-level AI; to an AI that only implements a single facet of cognition or that is missing key facets of cognition; or to a fairly mature AI which is still substantially below human level, although “infrahuman” is more often used to describe the latter. procedural. See declarative. probabilistic supergoal content. Defined in 5.5.1 External Reference Semantics.

A facet of external reference semantics. Supergoals that a mind can conceive of as being “wrong” or “incomplete.” Where supergoals are absolutely certain—“correct by definition”—the AI has a motive to resist any attempt to change the supergoals. See external reference semantics, 5.5.1.1 Probabilistic Supergoal Content, and 5.5.1 External Reference Semantics.
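A toy sketch of what “probabilistic supergoal content” could look like computationally (my illustration; the text does not specify a representation): the supergoal is a probability distribution over hypotheses about the external referent “Friendliness,” and a programmer affirmation is ordinary Bayesian evidence about that referent rather than an edit that is correct by definition.

    # Toy external-reference-semantics update. Hypotheses and likelihood numbers
    # are illustrative assumptions, not content from the text.
    def bayes_update(prior, likelihood):
        # prior: {hypothesis: P(h)}; likelihood: {hypothesis: P(affirmation | h)}
        posterior = {h: prior[h] * likelihood[h] for h in prior}
        total = sum(posterior.values())
        return {h: p / total for h, p in posterior.items()}

    hypotheses = {
        "protect volition": 0.4,
        "obey literal requests": 0.4,
        "maximize smiles": 0.2,
    }

    # Affirmation: "don't do X even if someone literally asks for it."
    likelihood = {"protect volition": 0.9, "obey literal requests": 0.3, "maximize smiles": 0.2}

    print(bayes_update(hypotheses, likelihood))
    # No hypothesis ever reaches "correct by definition"; later affirmations,
    # or corrections, can shift the distribution again.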

The Psychological Foundations of Culture. The best explanation of evolutionary theory I have ever read is “The Psychological Foundations of Culture” by Tooby and Cosmides (1992), in The Adapted Mind by Barkow, Cosmides, and Tooby (1992). (See my personal Bookshelf.) I did not understand evolution until I read this article. Reading this article is a wonderful experience; the article traces the entire chain of causality with absolute, diamond-like precision. It’s not perfect, but it’s very, very close. qualia. The substance of conscious experience. “Qualia” is the technical term that describes the redness of red, the mysterious, indescribable, apparently irreducible quality of redness that exists above and beyond a particular frequency of light. If a JPEG viewer stores a set of red pixels, pixels with color 0xFF0000, does it see red the way we do? No. Even if a program simulated all the feature-extraction of the human visual modality, would it actually see red? I first “got” the concept of qualia on reading the sentence “You are not the person who speaks your thoughts; you are the person who hears your thoughts.” (I don’t remember where I first heard this, but my guess is Raymond Smullyan.) See “Facing Up to the Problem of Consciousness” by David Chalmers for a more extensive definition. qualitative. Defined in Yudkowsky (2001, § 2.1 World-model). Qualitative properties are selected from a finite set; for example, the binary set of {on, off}, or the eighty-eight member set of “piano keys”. A qualitative match is when two qualities have identical values. A qualitative binding is when the two qualities are hypothesized to be bound together—if, for example, the same item of the 88-member set “piano keys” occurs in two instances. A qualitative binding is the weakest type of binding, since it can often occur by sheer coincidence. However, the larger the set, the less likely a coincidence. Small integers are usually qualitative properties; large integers should be treated as quantitative. See also SPDM, quantitative binding, and structural binding. quantitative. Defined in Yudkowsky (2001, § 2.1 World-model).
Quantitative characteristics occupy a continuous range; they are selected from a range of real (i.e., floating-point) numbers. A quantitative match is when two quantities are identical. A quantitative binding occurs when two or more quantitative variables are equal to sufficient precision that coincidence is effectively impossible. Quantitative bindings can also be established by covariance or other quantitative relations. See also SPDM, qualitative, structural. reductholism. A word appearing in Gödel, Escher, Bach. “Reductholism” is a synthesis of reductionism and holism; I use it to indicate the general theory of systems with multiple levels, including both the holistic disciplines of looking up and the reductionist disciplines of looking down. See also reductionism and holism. reductionism. Reductionism: The attitude that the whole is the sum of the parts. To take the reductionist view is to look downward, focus on the low-level elements and the rules governing their interactions. See holism and reductholism. See also Gödel, Escher, Bach: An Eternal Golden Braid, particularly the dialogues “Prelude” and “Ant Fugue”. “No one in his left brain could deny reductionism.” —GEB reflection. Defined in Yudkowsky (2001, § 2.4 Thoughts). The ability of a thinking system to think about itself. A reflective mind is one that has an image of itself; a self-model. In Friendly AI, a reflective goal system is one that can regard its own components and content as desirable or undesirable. In mundane programming, a “reflective” programming language is one in which code can access information about code (for example, obtaining a list of all the methods or properties of an object). relevance horizon. The Universe goes on forever, and what we can say about it goes on even longer. Outside the “relevance horizon” lie all the knowledge and heuristics and skills that are not relevant to the task. See computational horizon. Humans and AIs probably have very different relevance horizons.
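For the mundane-programming sense of reflection mentioned in the entry above, a short Python illustration (the class and method names are made up for the example):

    # "Reflective" in the mundane programming sense: code obtaining information about code.
    class GoalSystem:
        def add_goal(self, goal): ...
        def check_friendliness(self, plan): ...

    # List the public methods of the class at runtime, i.e., read off its own structure.
    print([name for name in dir(GoalSystem) if not name.startswith("_")])
    # ['add_goal', 'check_friendliness']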

Riemann Hypothesis Catastrophe. Defined in 5.2 Generic Goal Systems. A “failure of Friendliness” scenario in which an AI asked to solve the Riemann Hypothesis turns all the matter in the solar system into computronium, exterminating humanity along the way. (As I recall, I first heard this version—with a slightly different phrasing—from Marvin Minsky.) In variant forms, a subclass of the subgoal stomp error, the Devil’s Contract problem (both diabolic and golemic), and the “emergent pressures” scenario a la “The Two Faces of Tomorrow” (Hogan 1979) (see 5.2.7 Emergent Phenomena in Generic Goal Systems).

RNUI. Defined in Yudkowsky (2001, § Interlude: Represent, Notice, Understand, Invent). Represent, Notice, Understand, Invent. First your AI has to represent something, then it has to notice it, then it has to understand it, then invent it. You can’t take these items out of sequence. Representing means having the static data structures to hold the information. Noticing means being able to see simple relations, to perceive internal coherence; to tell the difference between a representation that makes sense, and a representation composed of random numbers. Understanding means being able to see goal-oriented properties, and how the thing understood fits into the larger structure of the Universe—the thing’s functionality, the causes of that thing’s characteristics, and so on. Inventing means being able to start with a high-level goal—“rapid transportation”—and design a bicycle. rod logic. A mechanical nanocomputer built using diamondoid rods of a few thousand atoms each. Even though messages can only move at the speed of sound in diamond (~17 km/s == ~6 × 10^-5 c), and the calculations in Nanosystems (Drexler 1992) assume ~12 km/s, the very small size of the components would enable an individual rod-logic CPU containing 10^6 transistor-like rod-logic interlocks to operate at 1 GHz clock speeds, executing instructions at ~1000 MIPS. The power consumption for a 1 GHz CPU is estimated to be ~60 nW. The calculations are performed for 300 Kelvin (room temperature). The error rate is 10^-64 per transistor operation, effectively negligible. (However, the half-life against radiation damage for an unshielded CPU in Earth ambient background radiation is only ~100 years.) The usual summary is that a one-kilogram, one-cubic-centimeter nanocomputer can contain 10^12 CPUs, consume 100 kW (and dissipate 100 kW of heat—cooling systems are also described), to deliver 10^21 instructions per second (10^15 MIPS, a thousand billion billion operations per second). The overall system has a clock speed ~10^6 times faster than the maximum firing rate of a biological neuron, and delivers total computing capacity ~10^4 times the upper-bound estimate for the human brain (~10^14 synapses operating at ~200 Hz == ~10^17 ops/second). There are more speculative electronic nanocomputer schemas that would allow ~10^25 operations per second; also, assuming a parallel-CPU architecture may be conservative when dealing with seed AIs. However, rod logics are easy to analyze and provide a definite lower bound on the computing speeds achievable with molecular manufacturing technology.
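The headline figures in the rod logic entry can be checked with a few lines of arithmetic, using only the numbers stated above:

    # Arithmetic check of the rod-logic figures quoted above.
    cpus = 1e12                  # CPUs in a one-kilogram rod-logic nanocomputer
    per_cpu_ops = 1e9            # 1000 MIPS = 1e9 instructions/sec per CPU
    total_ops = cpus * per_cpu_ops
    print(total_ops)             # 1e21 ops/sec, as stated

    brain_ops = 1e17             # the entry's upper-bound estimate for the human brain
    print(total_ops / brain_ops) # 1e4 -> the ~10^4 capacity ratio quoted
    print(1e9 / 200)             # 5e6 -> order of the ~10^6 clock-speed ratio quoted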

rule of derivative validity. Defined in 5.6.2 Causal Validity Semantics. “Effects cannot have greater validity than their causes.” An enormously powerful, fundamentally flawed rule used by the human mind to reason about beliefs and goals. scalar. A single number, as opposed to two or more numbers. The speed of an airplane is a scalar quantity; it can be described by a single number. The velocity of an airplane, which includes the direction as well as the speed, is a vector—it must be described by two numbers. (Three numbers, if it’s a three-dimensional airplane.) There’s a timeworn joke about mosquitoes and mountain climbers that is usually mentioned at this point, but forget it. search trees. One of the most venerable tools of AI. In a game of tic-tac-toe, you can make any of nine possible moves, then I can make any of eight possible moves, then you can make any of seven possible moves . . . The computational representation of the game—in a classical AI—would look like a tree; a single node representing the start of the game, with nine branches leading to nine first-move nodes; each first-move node would have eight branches leading to a total of seventy-two possible second-move nodes, and so on. By searching through the entire tree, a classical AI could play a perfect game of tic-tac-toe. It is possible, even likely, that human cognition involves the use of similar (although much messier) search trees. Or not. seed AI. Defined in Yudkowsky (2001, § 1.1 Seed AI). An AI designed for self-understanding, self-improvement, and recursive self-improvement. See Yudkowsky (2001, § 1.1 Seed AI), or the introductory article Yudkowsky (2001, § What is Seed AI?) on the Singularity Institute’s website. selection pressure. A feature of the internal or external environment which causes differential rates of survival and reproduction for different heritable characteristics, including characteristics such as long legs, cognitive processes, and emotions. Any problem with better and worse solutions is technically a selection pressure, but use of the phrase “selection pressure” usually implies that this “problem” is an environmental feature present with sufficient reliability to influence the survival/reproduction rates of multiple individuals across multiple generations. See also complex functional adaptation. sequitur. An emotional sequitur is an emotion or feeling that automatically binds to certain thoughts or cognitive structures; a logical or intuitive sequitur is a conclusion which is immediately perceived given the trigger of a thought or cognitive
structure. The human brain runs on sequiturs. A sequitur is not quite the unit of a stream of consciousness—the unit of a stream of consciousness is a sentence. Rather, a sequitur covers all the nonverbal conclusions, the perceptual or automatic leaps. “Sequitur” is a term that I find extremely useful, so I hope you aren’t annoyed overmuch by my use of it. sensory modality. Defined in Yudkowsky (2001, § 2.2 Sensory Modalities). A sensory modality, in an AI, is a module analogous to the human visual cortex, the human auditory cortex, or some other chunk of neurology underlying one of the senses. A modality contains the data structures needed to represent the target domain; the active processing which enables the perception of higher-level features and coherence in that domain; and the interface to the concept level which enables the abstraction of, and visualization of, patterns and objects in that domain. sexual selection. Sexual selection is a kind of evolutionary positive feedback that can result in ridiculous and even suicidal traits becoming overwhelming evolutionary advantages, leading in some cases to a kind of genetic suicide. Suppose that there’s some species—let’s call it a “tailbird”—that happens to have a small, ordinary, unassuming tail. It also happens that the tails of healthy tailbirds are slightly more colorful, more lustrous, than the tails of tailbirds that are sick, or undernourished. One day, a female tailbird is born with a mutation that causes it to sexually prefer tailbirds with bright-colored tails. This is a survival trait—it results in the selection of healthier male mates, with better genes—so the trait propagates until, a few dozen generations later, the entire species population of female tailbirds prefers bright-colored tails. Now, a male is born that has a very bright tail. It’s not bright because the male is healthy; it’s bright because the male has a mutation that results in a brighter tail. All the females prefer this male, so the mutation is a big success. This male tailbird isn’t actually healthier. In fact, this male is pretty sick. More of his biological resources are going into maintaining that flashy tail. So you might think that the females who preferred that male would tend to have sickly children, and the prefer-bright-tails trait would slowly fade out of the population. Unfortunately, that’s not what happens. What happens is that even though the male has sickly children, they’re sickly children with bright tails. And those children also attract a lot of females. Genes can’t detect “cheating” and instantly change tactics; that’s a monopoly of conscious intelligence. Any females who prefer the non-bright-tailed males will actually do worse. These “wiser” females will have children who are, sexually, out of fashion. Bright tails are no longer a survival advantage, but they are a very strong sexual advantage.

Selection pressures for sexual advantages are often much stronger than selection pressures for mere survival advantages. From a design perspective this is stupid— but evolution doesn’t care. Sexual selection is also a Red Queen’s Race (Ridley 1994): It involves competition with conspecifics, so you can never have a tail that’s “bright enough.” This is how you get peacocks. Any time you see an enormously exaggerated trait—the tail of a peacock, the antlers of a buck, the intelligence of a human—it’s a good guess that two forces are at work: Competition with conspecifics, and sexual selection. shaper. Defined in 5.6.1 Shaper/Anchor Semantics. See shaper/anchor semantics. A shaper is one of the philosophical affectors which provide supergoal content. Shapers themselves do not have supergoal sta- tus; learning more about shapers is a subgoal of the currently unknown supergoal content (see external reference semantics). shaper/anchor semantics. Defined in 5.6.1 Shaper/Anchor Semantics. Humans can achieve philosophical growth because we have a complex, tangled network of mutually interacting affectors. “Shaper/anchor semantics” are a less messy way to duplicate the necessary functionality for philosophical growth in an AI. Shapers affect each other, and can specify supergoal content, but are not them- selves supergoals. In terms of the goal architecture, the AI’s modeling of shapers is a subgoal of further specifying the supergoal, supported by the “unknown” portion of the supergoal content at any given point—shapers are not are not meta-supergoals. If the super-supergoal is producing supergoal definitions that maximally satisfy humans, then the system can short-circuit by directly modifying human brain states for maximal satisfaction (for example). If a shaper stated that supergoals should maximally satisfy humans—this isn’t a real shaper, though it’s a good heuristic— then the current supergoal content about volitional consent can still veto the idea of nonconsensually modifying the brain state of the programmer, because shapers aren’t supergoals and don’t have the power to contradict supergoals. Shapers de- fine supergoals. “Shaper-pondering behaviors” are subgoals about how to build the supergoals; they may be behaviors that affect supergoals, or rather the current description of the supergoals, but such behaviors still can’t override supergoals, no more than any other subgoal can. The AI’s task of modeling shapers and modifying supergoals accordingly is supported by the currently unknown supergoal content, and can’t randomly stomp other supergoal content in the same way that a “meta- supergoal” could stomp supergoals. (Note: This is only a rough explanation.) An “anchor” is a feature of AI philosophy that reflects the fact that the program- mers themselves may not understand their own tangled goal system. An “anchor”


An "anchor" is a probable output of the shaper system whose causal antecedents are not fully known; an "anchor" happens when a human says "I think X is a good idea," but isn't clear about why, or isn't sure he or she is fully aware of all the valid factors that played a part in the decision. An anchor is an external reference to whichever causes were involved in the decision; it refers to whichever factual cognitive events were occurring within the human's brain. An anchor is a pointer for the AI to learn, by examination of the humans, about a portion of the shaper system that the humans themselves are fuzzy about. See anchor.
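The architectural point of shaper/anchor semantics, that shapers feed the supergoal definition but cannot override existing supergoal content, can be sketched very roughly in Python. All of the names here (Shaper, SupergoalContent, propose_revision, and so on) are hypothetical placeholders invented for illustration; this is not the CFAI architecture, only the veto relationship it describes.

    class Shaper:
        """A philosophical affector: proposes supergoal content, never overrides it."""

        def propose_revision(self, supergoals):
            raise NotImplementedError   # e.g. a model of a programmer-held moral principle

    class SupergoalContent:
        def __init__(self, known_content):
            self.known = list(known_content)   # currently specified supergoal content
            # The remainder of the supergoal is "unknown" content, which is what
            # makes shaper-pondering a subgoal in the first place.

        def vetoes(self, proposal):
            # Existing supergoal content (e.g. volitional consent) can veto a proposal.
            return any(rule.forbids(proposal) for rule in self.known)

    def ponder_shapers(shapers, supergoals):
        """Shaper-pondering: a subgoal of filling in unknown supergoal content.

        Proposals that contradict current supergoal content are rejected, because
        shapers are not meta-supergoals and cannot stomp supergoals."""
        for shaper in shapers:
            proposal = shaper.propose_revision(supergoals)
            if proposal is not None and not supergoals.vetoes(proposal):
                supergoals.known.append(proposal)

An anchor, in these terms, would be a proposal whose justification is left as an external reference to whatever was actually happening in the human's brain, to be filled in by later examination.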

Singularity. The transition in human history marked by the technological creation of greater-than-human intelligence; the end of the era of “strictly human intelligence” that has embraced the last fifty thousand years. If one were to draw a single line, dividing into two parts the history of Earth-originating intelligent life, the line would be drawn at the Singularity. See “An Introduction to the Singularity.”

Singularity-safed. Defined in 5.7 Developmental Friendliness. A system such that the immediate start of a hard takeoff does not signal automatic catastrophe. Broadly speaking, there are three classes of Singularity-safed AIs:

• A mature AI with structurally complete Friendliness semantics and the threshold level of Friendliness content, such that the acquisition of transhuman intelligence will probably enable the AI to complete the Friendship system with no further inputs.
• A young AI with an ethical injunction telling ver to slow down and ask for directions in case of a hard takeoff.
• Any recursively self-improving AI, no matter how primitive, that has a programmatic "controlled ascent" feature which triggers a save-and-stasis if improvement suddenly accelerates (a minimal sketch of such a trigger follows this entry).

See 5.8 Singularity-Safing (“In Case of Singularity, Break Glass”).
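The third class above can be made concrete with a deliberately naive sketch in Python. Everything here is hypothetical and illustrative only (the names ControlledAscent, ACCELERATION_LIMIT, and the idea of pickling the AI's state are my own placeholders, not part of any actual design); the point is simply that a "controlled ascent" feature watches the rate of self-improvement and forces a save-and-stasis when that rate suddenly jumps.

    import pickle

    ACCELERATION_LIMIT = 3.0   # hypothetical threshold: improvement rate triples in one cycle

    class ControlledAscent:
        """Save-and-stasis trigger: halt if self-improvement suddenly accelerates."""

        def __init__(self, snapshot_path):
            self.snapshot_path = snapshot_path
            self.last_rate = None

        def check(self, ai_state, improvement_rate):
            # Called once per self-improvement cycle with the current improvement rate.
            if self.last_rate is not None and improvement_rate > ACCELERATION_LIMIT * self.last_rate:
                self.save_and_stasis(ai_state)
            self.last_rate = improvement_rate

        def save_and_stasis(self, ai_state):
            # Snapshot the current state for the programmers to inspect, then stop.
            with open(self.snapshot_path, "wb") as f:
                pickle.dump(ai_state, f)
            raise SystemExit("Possible hard takeoff: entering stasis pending programmer review.")

A real controlled-ascent feature would of course need some principled measure of "improvement"; the sketch only shows where the trip-wire sits.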

SPDM. Defined in Yudkowsky (2001, § 2.1 World Model). Sensory, Predictive, Decisive, Manipulative. Intelligence is an evolutionary advantage because it enables us to model, predict, and manipulate reality. This idea can be refined into describing four levels of binding between a model and reality. A sensory binding is simply a surface correspondence between data structures in the model and whatever high-level properties of reality are being modeled.


A predictive binding is one that can be used to correctly predict future sensory inputs. A decisive binding is one that can be used to decide between limited sets of possible actions based on the utility of the predicted results. A manipulative binding is one that can be used to start from a specified result and plan a sequence of actions that will bring about the desired result. See also qualitative, quantitative, structural.

structural. Defined in Yudkowsky (2001, § 2.1 World Model). Structural characteristics are made up of multiple qualitative or quantitative components. A structural match is when two complex patterns are identical. A structural binding occurs when two complex patterns are identical, or bound together to such a degree that coincidence is effectively impossible—only a pattern copy of some kind could have generated the identity. A structural binding is usually the strongest form. See also SPDM, qualitative, quantitative.

subgoal. Defined in 3 An Introduction to Goal Systems. An event or world-state whose desirability is contingent on its predicted outcome. An action taken in order to achieve some other goal. A thing not desirable in itself, but rather desirable for some other end. If the "parent goal" is a wet sponge, then placing a dry sponge in water is a "child goal" of getting a wet sponge. The parent goal, in turn, is probably a child goal of something else. "Parent goal" and "child goal" describe the relation between two individual goals, while "supergoal" and "subgoal" describe the general distinction between things intrinsically desirable, and things which are desirable only because they are predicted to lead to other things. It is very important to distinguish between supergoals and subgoals in Friendly AI, since subgoals are directly and entirely dependent on the current world-state.

subgoal stomp. Defined in 5.2 Generic Goal Systems. A "failure of Friendliness" scenario in which a subgoal stomps on a supergoal—for example, putting on your shoes before your socks, or turning all the matter in the Universe into computronium because some (ex-)petitioner asked you to solve the Riemann Hypothesis.
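Since the supergoal/subgoal distinction recurs throughout this document, here is a minimal Python sketch of it; the class names and the numbers are hypothetical illustrations, not the CFAI goal-system implementation. The point is only that a child goal has no desirability of its own: its desirability is borrowed from its parent, weighted by how strongly it is predicted to lead there, which is why subgoal content shifts whenever the world-model shifts.

    class Supergoal:
        """Intrinsically desirable content; not contingent on the current world-state."""

        def __init__(self, description, intrinsic_desirability):
            self.description = description
            self.intrinsic_desirability = intrinsic_desirability

        def desirability(self):
            return self.intrinsic_desirability

    class Subgoal:
        """Desirable only because it is predicted to lead to its parent goal."""

        def __init__(self, description, parent, prediction_strength):
            self.description = description
            self.parent = parent                            # a Supergoal or another Subgoal
            self.prediction_strength = prediction_strength  # how strongly this is predicted to lead to the parent

        def desirability(self):
            # Borrowed desirability: revise the prediction and the desirability changes too.
            return self.prediction_strength * self.parent.desirability()

    # The wet-sponge example, pretending for illustration that a wet sponge is intrinsically desirable.
    wet_sponge = Supergoal("have a wet sponge", intrinsic_desirability=1.0)
    dip_sponge = Subgoal("place the dry sponge in water", parent=wet_sponge, prediction_strength=0.9)

A subgoal stomp, in these terms, is what happens when something like dip_sponge starts being treated as though it had intrinsic desirability of its own.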


subjective. Existing within the mind, as opposed to external reality. In the strict sense, virtually all things except quarks are subjective. See objective. In the nonabsolute sense, "subjective" is often used to indicate a statement which is subject to observer bias or speaker deixis, or which makes visible reference to structures that exist only within the mind.

subjunctive. A what-if scenario, e.g. "What if you were to drop that glass of milk?" Something imagined or visualized that isn't supposed to describe present or past reality, although a sufficiently attractive what-if scenario might later be turned into a plan, and thence into reality. See also counterfactual.

supergoal. Defined in 3 An Introduction to Goal Systems. Supergoal content determines which states of the world are seen as intrinsically desirable. Opposed to "subgoal," a state desirable only as a means to an end. (The terms "parent goal" and "child goal" are used to refer to local relations between goals. Goal A may be a child goal of B, which in turn is a child goal of C, making B the "parent goal" of A and the "child goal" of C. By contrast, the distinction between supergoal and subgoal is a distinction in kind, at least under the CFAI architecture.) See also subgoal.

superintelligence. A level of ability enormously in advance of that possessed by humans. Despite common usage, there is probably no such thing as a "superintelligent AI"; a true superintelligence would no more resemble an AI than a human resembles an amoeba. Singularity analysis commonly distinguishes between weak superintelligence, which is human thought at enormously higher speeds, and strong superintelligence, involving an intelligence difference similar to that distinguishing a human from a chimpanzee, or a dog, or possibly a rock. It has been suggested[102] that "transhuman" should refer to a mind with perhaps three or four times the computing capacity (or cognitive capacity) of a human, that "superintelligence" should refer to a mind with perhaps a hundred or a thousand times human capacity, and that "Power" (à la Vinge) should refer to a mind with billions or trillions of times the computing power of Earth's entire current population. It is noteworthy that even a transhuman is still smarter than we are and thus fundamentally incomprehensible, and also that a desktop rod logic suffices for superintelligence under this definition.

supersaturated Friendliness. Defined in 5.7 Developmental Friendliness. A Friendly AI that has as much Friendliness structure and Friendliness content as ve can usefully represent (for use then or later) is supersaturatedly Friendly.

102. I cannot recall who originally made the suggestion.


Supersaturated Friendliness is achieved by adding ethical injunctions long before they’re needed, letting the AI avoid anything that looks the least bit unFriendly “just to be sure,” trying to implement the last 10% of functionality that takes 90% of the effort, and doing it all two or more steps in advance of the development stage where it becomes unavoidably necessary.
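Purely to illustrate what "avoid anything that looks the least bit unFriendly, just to be sure" might look like at the level of mechanism, here is a minimal Python sketch; the names (Injunction, screen_action) and the threshold value are hypothetical placeholders rather than anything proposed elsewhere in this document.

    UNFRIENDLINESS_THRESHOLD = 0.001   # hypothetical: reject even a 0.1% estimated risk

    class Injunction:
        """An ethical injunction added well before the stage at which it becomes necessary."""

        def __init__(self, description, applies_to):
            self.description = description
            self.applies_to = applies_to   # predicate: does this injunction forbid the action?

    def screen_action(action, injunctions, estimate_unfriendliness):
        """Approve an action only if no injunction forbids it and it looks entirely Friendly."""
        if any(injunction.applies_to(action) for injunction in injunctions):
            return False
        return estimate_unfriendliness(action) < UNFRIENDLINESS_THRESHOLD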

Sysop. Defined in 2 Challenges of Friendly AI. A superintelligent AI which provides an "operating system" for all the matter in the Solar System. A living peace treaty with the power to enforce itself. A Friendly superintelligence ready to do whatever the "inhabitants" ask—including upgrading us to superintelligence ourselves—so long as it doesn't violate the citizenship rules derived from Friendliness. (For example, the Sysop would refuse a request to kill someone else, to use more than your share of the Solar System's resources, or to upload and upgrade someone against their will.) One of the possible outcomes of the Singularity; see 5.9 Interlude: Of Transition Guides and Sysops.

tera. 10^12. One thousand billion; a thousand "giga."

transhuman. A level of ability substantially in advance of that possessed by humans. A transhuman AI is one which has substantially exceeded human intelligence. A further distinction can be made between weak transhumanity, which involves intelligence differentials roughly of the same order as those that distinguish Einstein from the average human, and strong transhumanity, involving an intelligence difference similar to that distinguishing a human from a chimpanzee. See also the definitions of weak and strong superintelligence.

Transition Guide. Defined in 2 Challenges of Friendly AI. For the last fifty thousand years, humanity has been essentially at the mercy of a hostile universe. Post-Singularity, things will be different. A Transition Guide is a Friendly seed AI that has achieved superintelligence and created nanotechnology, or whatever other tools are needed to (a) perceive whatever state of existence comes after the Singularity and (b) handle the transition. A Transition Guide is the bridge between modern-day humanity and the other side of a Friendly Singularity.

truly perfect Friendly AI. Defined in 2 Challenges of Friendly AI. An AI which always takes the perfectly Friendly action, even if you programmed it to do something else. A truly perfect Friendly AI has sufficient "strength of philosophical personality"—while still matching the intuitive aspects of friendliness, such as not killing off humans and so on—that we are more inclined to trust the philosophy of the Friendly AI than the philosophy of the original programmers.


The real-world analogue would be a "philosophically human-equivalent" or "philosophically transhuman" Friendly AI.

Turing-computability. If you really haven't heard the term "Turing-computable" before, the first thing you need to do is read Douglas R. Hofstadter's Gödel, Escher, Bach: An Eternal Golden Braid. Drop whatever you're doing, get the book, and read it. It's no substitute, but there's also a nice definition of "Turing machine" in the Stanford Encyclopedia of Philosophy; also, I give a sample visualization of a Turing machine in Unimplemented section: Causality.

Any modern digital computer can, in theory, be simulated by a Turing machine. Any modern computer can also simulate a Turing machine, at least until it runs out of memory (Turing machines, as mathematical concepts, have an infinite amount of memory). In essence, Turing demonstrated that a very wide class of computers, including modern Pentiums and PowerPC chips, are all fundamentally equivalent—they can all simulate each other, given enough time and memory. There is a task known as the halting problem—in essence, to determine whether Turing machine X, acting on input Y, will halt or continue forever. Since the actions of a Turing machine are clearly defined and unambiguous, the halting problem obviously has a true, unique, mathematically correct yes-or-no answer for any specific question. Turing, using a diagonalization argument, demonstrated that no Turing machine can solve the general halting problem. Since any modern digital computer can be simulated by a Turing machine, it follows that no digital computer can solve the halting problem. The halting problem is noncomputable. (This doesn't necessarily demonstrate a fundamental "inferiority" of computers, since there's no reason to suppose that humans can solve the halting problem. In fact, we can't do so simply by virtue of the fact that we have limited memories.)

A controversial question is whether our Universe is Turing-computable—that is, can the laws of physics be simulated to arbitrary accuracy by a digital computer with sufficient speed and memory? And if not, do the uncomputabilities carry over to (a) qualia and (b) intelligence? I don't wish to speculate here about qualia; I don't understand them and neither do you. However, I do understand intelligence, so I'm fairly sure that even if qualia are noncomputable, that noncomputability doesn't carry over into human general intelligence.

I happen to believe that our physical Universe is noncomputable, mostly because I don't trust the Turing formalism. The concept of causality involved strikes me as fundamentally subjective; there's no tail-end recursion explaining why anything exists in the first place; I've never seen a good mathematical definition of "instantiation"; plus a lot of other reasons that are waay beyond the scope of this document.


If the physical Universe is noncomputable, I believe that qualia are probably noncomputable as well. However, this belief is strictly in my personal capacity and is not defended (or attacked, or discussed) in GISAI. I am in the extreme minority in so believing, and in an even smaller minority (possibly a minority of one) in believing that qualia are noncomputable but that this says nothing about the impossibility, or even the difficulty, of achieving real intelligence on computable hardware. I don't believe that semantics require special causal powers, that Gödel's Theorem is at all difficult to explain to a computable intelligence, that human mathematical ability is noncomputable, that humans are superior to mere computers, or any of the other memes that customarily go along with the "noncomputability" meme.

unFriendly. An unFriendly action: an action which violates Friendliness. An unFriendly AI: an AI which is not designed using Friendly AI design principles, or an AI which has undergone a catastrophic failure of Friendliness. Intuitively: The AI starts shooting people (if near-human) or exterminates humanity and disassembles us for spare atoms (if transhuman).

unity of will. Defined in 5.3 Seed AI Goal Systems. See moral deixis and 5.3.3 Unity of Will. The ability of Friendship programmers and AIs to cooperate, even against the AI's "interests" (from a human perspective, at least), because their wills and decisions are sufficiently close in the near term and sufficiently convergent in the long term. Given perfect identity of knowledge, supergoals, and cognitive processes, the result is perfect unity of will. Small variances in knowledge resulting from different sensory viewpoints should not be sufficient to break the characteristic functionality of "unity of will," unless the variance results in a desirability differential greater than the differential desirability of the "unity of will" process. See 5.3.3 Unity of Will for more details.

upload. Colloquially, a human being that has been uploaded into a computer is an "upload." The classic method is a "Moravec Transfer": a medical nanobot enters the brain, swims up to a neuron, scans the neuron until the outputs can be perfectly predicted (or predicted to within an order of magnitude greater accuracy than the neuron's sensitivity to random external thermal fluctuations), replaces the neuron (slowly or quickly), and takes over that neuron's functional role within the brain—perhaps first enclosing the axons, dendrites, and cell body in a robotic casing, then switching from the cell body's outputs to the nanobot's computed outputs. If this procedure is repeated for each neuron in the brain, a mind can actually be transferred from biological to computational substrate without ever losing consciousness.


Presumably hormones, glands, neurotransmitter diffusion, electrical field effects, long-term potentiation, and so on, would need to be simulated as well. But a neuron is a lot larger than a medical nanobot, so there's plenty of room to scan right down to the individual microtubule dimers. If human neurons are noncomputable à la Hameroff and Penrose, then the nanobots and the new computational substrate need to employ noncomputable hardware as well. In essence, the mind would be transferred to programmable high-speed "superneurons" rather than silicon transistors as such.

ve. A gender-neutral pronoun. The equivalent of "he" or "she."

ver. A gender-neutral pronoun. The equivalent of "him" or "her."

verself. A gender-neutral pronoun. The equivalent of "himself" or "herself."

vis. A gender-neutral pronoun. The equivalent of "his" or "her" (or "hers").

wirehead. Defined in 4 Beyond Anthropomorphism. Larry Niven (1995) once wrote a story, "Death by Ecstasy," dealing with the possibility of humans wiring their pleasure centers with electrodes—a habit he called "wireheading." Another story that deals with the issue of wireheading is the novel Mindkiller by Spider Robinson (1982). Hence the term. In Friendly AI terms, a "wirehead AI" is one that has substituted the indicator of success or desirability for the supergoal content itself. The "wirehead fallacy" is the idea that any sufficiently intelligent entity necessarily values pleasure above all else. The "wirehead problem" is any failure of Friendliness in which an internal indicator becomes the de facto supergoal; "wirehead problem" also refers to any failure of the goal architecture in which some piece of mindstuff "cheats" and assigns itself desirability (this problem turned up fifteen years ago in Doug Lenat's Eurisko; see 5.3.5 FoF: Wireheading 2).

wisdom tournament. Defined in 5.3 Seed AI Goal Systems. One of the major sources of human wisdom is our need to compensate for our built-in errors. We invented science to deal with politically biased belief systems. We invented altruism to deal with our selfishness. We invented ethics to deal with our social observer-biased rationalizations. A wisdom tournament is a subjunctive, self-simulated version of the AI that shows the AI what the AI "would have done" if the AI had been dumber, slower, biased.


Without introducing actual errors, the AI gains reliable information about what would have happened if the AI had been running on unreliable simulated hardware, with random perturbations to the software, ethical misinformation, factual misinformation, tempting ends-justify-the-means scenarios, and an instinct to kill and destroy. If the Friendliness, and the heuristics the AI has learned from past tournaments, are enough to walk through all that without making a single major error, then the Friendliness may be strong enough to face real life. See 5.3.4 Wisdom Tournaments.
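As a purely illustrative sketch of the shape of such a tournament (in Python, with hypothetical method names like apply_handicap, decide, and is_major_error standing in for machinery this document does not specify), the essential loop is: copy the AI, handicap the copy, run it through tempting scenarios, and count major errors, all without ever perturbing the live system.

    import copy
    import random

    HANDICAPS = [
        "unreliable simulated hardware",
        "random software perturbations",
        "ethical misinformation",
        "factual misinformation",
        "tempting ends-justify-the-means scenarios",
        "an instinct to kill and destroy",
    ]

    def run_wisdom_tournament(ai, scenarios, trials=100):
        """Run handicapped subjunctive copies of the AI; return True if no major errors occur.

        Only throwaway copies are perturbed, so no actual errors are introduced
        into the live goal system."""
        for _ in range(trials):
            subjunctive_ai = copy.deepcopy(ai)
            subjunctive_ai.apply_handicap(random.choice(HANDICAPS))
            for scenario in scenarios:
                decision = subjunctive_ai.decide(scenario)
                if ai.is_major_error(decision, scenario):
                    return False
        return True   # the Friendliness may be strong enough to face real life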

7.4. Version History

June 15, 2001. CFAI 1.0. Released along with SIAI Guidelines on Friendly AI. 909K.

June 12, 2001. CFAI 0.9.05. Broke off 5.6 Interlude: Philosophical Crises as separate section. Content added to 5.8 Singularity-Safing (“In Case of Singularity, Break Glass”). Content added to 5.5.1 External Reference Semantics. Content added to 5.1.1 Cleanly Causal Goal Systems. 909K.

June 10, 2001. CFAI 0.9.04. More bugfixes. Old “Friendly goal systems” renamed to “Friendship structure”; associated folder renamed from “CFAI/design/friendly/” to “CFAI/design/structure/.” 5.5 Interlude: Why Structure Matters moved to inside 5.4 Friendship Structure. 5.1.1 Cleanly Causal Goal Systems added; much content moved there from 5.5.1 External Reference Semantics, 5.3 Seed AI Goal Systems and elsewhere. 860K. Not uploaded.

June 7, 2001. CFAI 0.9.03. "Design requirements of Friendly AI" renamed to "Challenges of Friendly AI." "An Introduction to Goal Systems" split off into separate page. Numerous bugfixes. 822K. Not uploaded.

May 18, 2001. CFAI 0.9.02. Split the original document, "Coding a Transhuman AI," into General Intelligence and Seed AI and Creating Friendly AI. Minor assorted bugfixes. 782K.

Apr 24, 2001. CFAI 0.9.01. Uploaded printable version. Some minor suggested bugfixes.

Apr 18, 2001. CFAI 0.9. "Open commentary" version of Creating Friendly AI begins circulating in transhumanist community. Linked in to primary Singularity Institute website. 748K. Changed copyright to "2001" and "Singularity Institute" instead of legacy "2000" and "Eliezer Yudkowsky."

Apr 09, 2001. CFAI 0.9a. Vast amounts of new material. 5.4 Friendship Structure split into multiple subpages. Still not linked in. 744K.

Mar 21, 2001. CFAI 0.4. Finished 5.3 Seed AI Goal Systems; still not linked in. 437K.


Feb 14, 2001. CFAI 0.3. Large amounts of new material added. Still not linked in. 363K.

Jan 16, 2001. CFAI 0.2. Placed pre-publication multi-page version of "Coding a Transhuman AI" online to include the in-progress section "Friendly AI." 187K. Not published or linked-in to Singularity Institute website.


References

Asimov, Isaac. 1947. "Little Lost Robot." Astounding Science-Fiction, March, 111–132.

Barkow, Jerome H., Leda Cosmides, and John Tooby, eds. 1992. The Adapted Mind: Evolutionary Psychology and the Generation of Culture. New York: Oxford University Press.

Bostrom, Nick. 2002. "Existential Risks: Analyzing Scenarios and Related Hazards." Journal of Evolution and Technology 9. http://www.jetpress.org/volume9/risks.html.

Drexler, K. Eric. 1986. Engines of Creation. Garden City, NY: Anchor.

———. 1992. Nanosystems: Molecular Machinery, Manufacturing, and Computation. New York: Wiley.

Egan, Greg. 1998. Diaspora. 1st ed. New York: HarperPrism.

Freitas Jr., Robert A. 2000. "Some Limits to Global Ecophagy by Biovorous Nanoreplicators, with Public Policy Recommendations." Foresight Institute. April. Accessed November 24, 2012. http://www.foresight.org/nano/Ecophagy.html.

Hogan, James P. 1979. The Two Faces of Tomorrow. 1st ed. New York: Ballantine Books.

Joy, Bill. 2000. "Why the Future Doesn't Need Us." Wired, April. http://www.wired.com/wired/archive/8.04/joy.html.

Konopinski, Emil J., C. Marvin, and Edward Teller. 1946. LA-602: Ignition of the Atmosphere with Nuclear Bombs. Technical report. Los Alamos National Laboratory, Los Alamos, NM. http://library.lanl.gov/la-pubs/00329010.pdf.

Lenat, Douglas B. 1983. "EURISKO: A Program that Learns New Heuristics and Domain Concepts." Artificial Intelligence 21 (1-2): 61–98. doi:10.1016/S0004-3702(83)80005-8.

Machiavelli, Niccolò. (1532) 1998. The Prince. 2nd ed. Edited by Harvey C. Mansfield. University of Chicago Press.

Niven, Larry. 1995. "Death by Ecstasy." In Flatlander, 1–70. New York: Ballantine Books.

Pratchett, Terry. 1996. Feet of Clay: A Novel of Discworld. Discworld Series. New York: HarperTorch.

Ridley, Matt. 1994. The Red Queen: Sex and the Evolution of Human Nature. 1st ed. New York: Macmillan.

Robinson, Spider. 1982. Mindkiller: A Novel of the Near Future. 1st ed. New York: Holt, Rinehart & Winston.

Tooby, John, and Leda Cosmides. 1992. "The Psychological Foundations of Culture." In Barkow, Cosmides, and Tooby 1992, 19–136.

Tversky, Amos, and Daniel Kahneman. 1986. "Rational Choice and the Framing of Decisions." In "The Behavioral Foundations of Economic Theory." Supplement, Journal of Business 59 (4, pt. 2): S251–S278. http://www.jstor.org/stable/2352759.

Wilson, Robert A., and Frank Keil, eds. 1999. The MIT Encyclopedia of the Cognitive Sciences. 1st ed. Bradford Books. Cambridge, MA: MIT Press.

Yudkowsky, Eliezer. 2001. General Intelligence and Seed AI. Version 2.3. The Singularity Institute, San Francisco, CA. http://intelligence.org/files/GISAI.html.
