Artificial Intelligence and Its Implications for Future Suffering

Brian Tomasik Center on Long-Term Risk [email protected] June 2016∗

Abstract Artificial intelligence (AI) will transform the world later this century. I expect this transition will be a "soft takeoff" in which many sectors of society update together in response to incremental AI developments, though the possibility of a harder takeoff in which a single AI project "goes foom" shouldn’t be ruled out. If a rogue AI gained control of Earth, it would proceed to accomplish its goals by colonizing the galaxy and undertaking some very interesting achievements in science and engineering. On the other hand, it would not necessarily respect human values, including the value of preventing the suffering of less powerful creatures. Whether a rogue-AI scenario would entail more expected suffering than other scenarios is a question to explore further. Regardless, the field of AI ethics and policy seems to be a very important space where altruists can make a positive-sum impact along many dimensions. Expanding dialogue and challenging us-vs.-them prejudices could be valuable.

Contents

1 Summary 3

2 Introduction3

3 Is "the singularity" crazy?4

4 The singularity is more than AI5

5 Will society realize the importance of AI?6

6 A soft takeoff seems more likely?6

7 Intelligence explosion? 10

8 Reply to Bostrom’s arguments for a hard takeoff 12

∗First written: May 2014; last modified: Jun. 2016

1 Center on Long-Term Risk

9 How complex is the brain? 15 9.1 One basic algorithm? 15 9.2 Ontogenetic development 16

10 Brain quantity vs. quality 17

11 More impact in hard-takeoff scenarios? 17

12 Village idiot vs. Einstein 19

13 A case for epistemic modesty on AI timelines 20

14 Intelligent in your backyard 20

15 Is automation "for free"? 21

16 Caring about the AI’s goals 22

17 Rogue AI would not share our values 23

18 Would a human-inspired AI or rogue AI cause more suffering? 24

19 Would helper robots feel pain? 26

20 How accurate would simulations be? 27

21 Rogue AIs can take off slowly 28

22 Would become existentialists? 29

23 AI epistemology 30

24 Artificial philosophers 31

25 Would all AIs colonize space? 31

26 Who will first develop human-level AI? 33

27 One hypothetical AI takeoff scenario 33

28 How do you socialize an AI? 36 28.1 Treacherous turn 39 28.2 Following role models? 40

29 AI superpowers? 40

30 How big would a be? 41

31 Another hypothetical AI takeoff scenario 41

2 Artificial Intelligence and Its Implications for Future Suffering

32 AI: More like the economy than like robots? 42

33 Importance of whole-brain emulation 44

34 Why work against brain-emulation risks appeals to suffering reducers 45

35 Would emulation work accelerate neuromorphic AI? 45

36 Are neuromorphic or mathematical AIs more controllable? 46

37 Impacts of empathy for AIs 47 37.1 Slower AGI development? 47 37.2 Attitudes toward AGI control 48

38 Charities working on this issue 49

39 Is MIRI’s work too theoretical? 49

40 Next steps 51

41 Where to push for maximal impact? 52

42 Is it valuable to work at or influence an AGI company? 55

43 Should suffering reducers focus on AGI safety? 56

44 Acknowledgments 57

References 57

1 Summary of preventing the suffering of less powerful creatures. Whether a rogue-AI scenario would Artificial intelligence (AI) will transform the entail more expected suffering than other sce- world later this century. I expect this transi- narios is a question to explore further. Regard- tion will be a "soft takeoff" in which many sec- less, the field of AI ethics and policy seems to tors of society update together in response to be a very important space where altruists can incremental AI developments, though the pos- make a positive-sum impact along many di- sibility of a harder takeoff in which a single AI mensions. Expanding dialogue and challenging project "goes foom" shouldn’t be ruled out. If us-vs.-them prejudices could be valuable. a rogue AI gained control of Earth, it would proceed to accomplish its goals by colonizing 2 Introduction the galaxy and undertaking some very inter- esting achievements in science and engineer- This piece contains some observations on what ing. On the other hand, it would not necessar- looks to be potentially a coming machine rev- ily respect human values, including the value olution in Earth’s history. For general back-

3 Center on Long-Term Risk ground reading, a good place to start is to mention the organization because the first Wikipedia’s article on the technological sin- word in SIAI’s name sets off "insanity alarms" gularity. in listeners. I am not an expert on all the arguments in I began to study in order this field, and my views remain very open to to get a better grasp of the AI field, and in fall change with new information. In the face of 2007, I switched my college major to computer epistemic disagreements with other very smart science. As I read textbooks and papers about observers, it makes sense to grant some cre- machine learning, I felt as though "narrow AI" dence to a variety of viewpoints. Each person was very different from the strong-AI fantasies brings unique contributions to the discussion that people painted. "AI programs are just a by virtue of his or her particular background, bunch of hacks," I thought. "This isn’t intelli- experience, and intuitions. gence; it’s just people using computers to ma- To date, I have not found a detailed analysis nipulate data and perform optimization, and of how those who are moved more by prevent- they dress it up as ’AI’ to make it sound sexy." ing suffering than by other values should ap- Machine learning in particular seemed to be proach singularity issues. This seems to me a just a computer scientist’s version of statistics. serious gap, and research on this topic deserves Neural networks were just an elaborated form high priority. In general, it’s important to ex- of logistic regression. There were stylistic dif- pand discussion of singularity issues to en- ferences, such as computer science’s focus on compass a broader range of participants than cross-validation and bootstrapping instead of the engineers, technophiles, and science-fiction testing parametric models – made possible be- nerds who have historically pioneered the field. cause computers can run data-intensive oper- I. J. Good(1982) observed: "The urgent ations that were inaccessible to statisticians in drives out the important, so there is not very the 1800s. But overall, this work didn’t seem much written about ethical machines". Fortu- like the kind of "real" intelligence that people nately, this may be changing. talked about for general AI. This attitude began to change as I learned 3 Is "the singularity" crazy? more cognitive science. Before 2008, my ideas about human cognition were vague. Like most In fall 2005, a friend pointed me to Ray science-literate people, I believed the brain Kurzweil’s (2000) The Age of Spiritual Ma- was a product of physical processes, including chines. This was my first introduction to "sin- firing patterns of neurons. But I lacked further gularity" ideas, and I found the book pretty insight into what the black box of brains might astonishing. At the same time, much of it contain. This led me to be confused about seemed rather implausible to me. In line with what "free will" meant until mid-2008 and the attitudes of my peers, I assumed that about what "consciousness" meant until late Kurzweil was crazy and that while his ideas 2009. Cognitive science showed me that the deserved further inspection, they should not brain was in fact very much like a computer, be taken at face value. at least in the sense of being a deterministic In 2006 I discovered and information-processing device with distinct al- , and I began to follow the gorithms and modules. 
When viewed up close, organization then called the Singularity Insti- these algorithms could look as "dumb" as tute for Artificial Intelligence (SIAI), which is the kinds of algorithms in narrow AI that I now MIRI. I took SIAI’s ideas more seriously had previously dismissed as "not really in- than Kurzweil’s, but I remained embarrassed telligence." Of course, animal brains combine

4 Artificial Intelligence and Its Implications for Future Suffering

these seemingly dumb subcomponents in daz- or social forces will come first and deserve zlingly complex and robust ways, but I could higher priority. In practice, I think focusing now see that the difference between narrow AI on AI specifically seems quite important even and brains was a matter of degree rather than relative to competing scenarios, but it’s good kind. It now seemed plausible that broad AI to explore many areas in parallel to at least a could emerge from lots of work on narrow AI shallow depth. combined with stitching the parts together in In addition, I don’t see a sharp distinction the right ways. between "AI" and other fields. Progress in So the singularity idea of artificial general AI software relies heavily on computer hard- intelligence seemed less crazy than it had ini- ware, and it depends at least a little bit on tially. This was one of the rare cases where other fundamentals of computer science, like a bold claim turned out to look more prob- programming languages, operating systems, able on further examination; usually extraor- distributed systems, and networks. AI also dinary claims lack much evidence and crum- shares significant overlap with neuroscience; ble on closer inspection. I now think it’s quite this is especially true if whole brain emula- likely (maybe ∼ 75%) that humans will pro- tion arrives before bottom-up AI. And every- duce at least a human-level AI within the next thing else in society matters a lot too: How ∼ 300 years conditional on no major disasters intelligent and engineering-oriented are citi- (such as sustained world , zens? How much do governments fund AI and global nuclear war, large-scale nanotech war, cognitive-science research? (I’d encourage less etc.), and also ignoring anthropic considera- rather than more.) What kinds of military and tions(Bostrom, 2010). commercial applications are being developed? Are other industrial backbone components of 4 The singularity is more than AI society stable? What memetic lenses does so- ciety have for understanding and grappling The "singularity" concept is broader than the with these trends? And so on. The AI story is prediction of strong AI and can refer to several part of a larger story of social and technologi- distinct sub-meanings. Like with most ideas, cal change, in which one part influences other there’s a lot of fantasy and exaggeration asso- parts. ciated with "the singularity," but at least the Significant trends in AI may not look like core idea that technology will progress at an the AI we see in movies. They may not in- accelerating rate for some time to come, absent volve animal-like cognitive agents as much major setbacks, is not particularly controver- as more "boring", business-oriented comput- sial. Exponential growth is the standard model ing systems. Some of the most transformative in economics, and while this can’t continue for- computer technologies in the period 2000-2014 ever, it has been a robust pattern throughout have been drones, smart phones, and social human and even pre-human history. networking. These all involve some AI, but the MIRI emphasizes AI for a good reason: At AI is mostly used as a component of a larger, the end of the day, the long-term future of non-AI system, in which many other facets of our galaxy will be dictated by AI, not by software engineering play at least as much of biotech, nanotech, or other lower-level sys- a role. tems. AI is the "brains of the operation." 
Of Nonetheless, it seems nearly inevitable to me course, this doesn’t automatically imply that that digital intelligence in some form will even- AI should be the primary focus of our atten- tually leave biological humans in the dust, if tion. Maybe other revolutionary technologies technological progress continues without fal-

5 Center on Long-Term Risk

tering. This is almost obvious when we zoom uprisings to Hollywood screenwriters." out and notice that the history of life on Earth As AI progresses, I find it hard to imagine consists in one species outcompeting another, that mainstream society will ignore the topic over and over again. Ecology’s competitive ex- forever. Perhaps awareness will accrue grad- clusion principle suggests that in the long run, ually, or perhaps an AI Sputnik moment will either humans or machines will ultimately oc- trigger an avalanche of interest. Stuart Russell cupy the role of the most intelligent beings on expects that the planet, since "When one species has even the slightest advantage or edge over another Just as nuclear fusion researchers consider then the one with the advantage will dominate the problem of containment of fusion re- in the long term." actions as one of the primary problems of their field, it seems inevitable that issues 5 Will society realize the importance of control and safety will become central of AI? to AI as the field matures.

The basic premise of superintelligent machines I think it’s likely that issues of AI policy will who have different priorities than their cre- be debated heavily in the coming decades, al- ators has been in public consciousness for though it’s possible that AI will be like nu- many decades. Arguably even , clear weapons – something that everyone is published in 1818, expresses this basic idea, afraid of but that countries can’t stop because though more modern forms include 2001: of arms-race dynamics. So even if AI proceeds A Space Odyssey (1968), The Terminator slowly, there’s probably value in thinking more (1984), I, Robot (2004), and many more. Prob- about these issues well ahead of time, though ably most people in Western countries have at I wouldn’t consider the counterfactual value least heard of these ideas if not watched or of doing so to be astronomical compared with read pieces of fiction on the topic. other projects in part because society will pick up the slack as the topic becomes more promi- So why do most people, including many of nent. society’s elites, ignore strong AI as a serious is- [ Update, Feb. 2015 : I wrote the preced- sue? One reason is just that the world is really ing paragraphs mostly in May 2014, before big, and there are many important (and not- Nick Bostrom’s Superintelligence book was re- so-important) issues that demand attention. leased. Following Bostrom’s book, a wave of Many people think strong AI is too far off, discussion about AI risk emerged from Elon and we should focus on nearer-term problems. Musk, , , and many In addition, it’s possible that science fiction others. AI risk suddenly became a mainstream itself is part of the reason: People may write topic discussed by almost every major news off AI scenarios as "just science fiction," as I outlet, at least with one or two articles. This would have done prior to late 2005. (Of course, foreshadows what we’ll see more of in the fu- this is partly for good reason, since depictions ture. The outpouring of publicity for the AI of AI in movies are usually very unrealistic.) topic happened far sooner than I imagined it Often, citing Hollywood is taken as a thought- would.] stopping deflection of the possibility of AI get- ting out of control, without much in the way of 6 A soft takeoff seems more likely? substantive argument to back up that stance. For example: "let’s please keep the discussion Various thinkers have debated the likelihood firmly within the realm of reason and leave the of a "hard" takeoff – in which a single com-

6 Artificial Intelligence and Its Implications for Future Suffering puter or set of computers rapidly becomes su- the research teams will add cognitive abilities perintelligent on its own – compared with a in small steps. Even if you have a theoreti- "soft" takeoff – in which society as a whole cally optimal intelligence algorithm, it’s con- is transformed by AI in a more distributed, strained by computing resources, so you either continuous fashion. "The Hanson-Yudkowsky need lots of hardware or approximation hacks AI-Foom Debate" discusses this in great detail (or most likely both) before it can function ef- (Hanson & Yudkowsky, 2013). The topic has fectively in the high-dimensional state space of also been considered by many others, such as the real world, and this again implies a slower Ramez Naam vs. William Hertling. trajectory. Marcus Hutter’s AIXI(tl) is an ex- For a long time I inclined toward Yud- ample of a theoretically optimal general intel- kowsky’s vision of AI, because I respect his ligence, but most AI researchers feel it won’t opinions and didn’t ponder the details too work for artificial general intelligence (AGI) closely. This is also the more prototypical ex- because it’s astronomically expensive to com- ample of rebellious AI in science fiction. In pute. explains: "I think that tells early 2014, a friend of mine challenged this you something interesting. It tells you that view, noting that computing power is a severe dealing with resource restrictions – with the limitation for human-level minds. My friend boundedness of time and space resources – is suggested that AI advances would be slow and actually critical to intelligence. If you lift the would diffuse through society rather than re- restriction to do things efficiently, then AI and 1 maining in the hands of a single developer AGI are trivial problems." team. As I’ve read more AI literature, I think In "I Still Don’t Get Foom", this soft-takeoff view is pretty likely to be contends: correct. Science is always a gradual process, Yes, sometimes architectural choices have and almost all AI innovations historically have wider impacts. But I was an artificial in- moved in tiny steps. I would guess that even telligence researcher for nine years, ending the evolution of humans from their primate twenty years ago, and I never saw an ar- ancestors was a "soft" takeoff in the sense that chitecture choice make a huge difference, no single son or daughter was vastly more in- relative to other reasonable architecture telligent than his or her parents. The evolu- choices. For most big systems, overall ar- tion of technology in general has been fairly chitecture matters a lot less than getting continuous. I probably agree with Paul Chris- lots of detail right. tiano that "it is unlikely that there will be rapid, discontinuous, and unanticipated devel- This suggests that it’s unlikely that a single in- opments in AI that catapult it to superhuman sight will make an astronomical difference to levels [...]." an AI’s performance. Of course, it’s not guaranteed that AI in- Similarly, my experience is that machine- novations will diffuse throughout society. At learning algorithms matter less than the data some point perhaps governments will take con- they’re trained on. I think this is a general trol, in the style of the Manhattan Project, sentiment among data scientists. There’s a fa- and they’ll keep the advances secret. But even mous slogan that "More data is better data." 
then, I expect that the internal advances by A main reason Google’s performance is so 1Stuart Armstrong agrees that AIXI probably isn’t a feasible approach to AGI, but he feels there might exist other, currently undiscovered mathematical insights like AIXI that could yield AGI in a very short time span. Maybe, though I think this is pretty unlikely. I suppose at least a few people should explore these scenarios, but plausibly most of the work should go toward pushing on the more likely outcomes.

7 Center on Long-Term Risk good is that it has so many users that even ob- 2. Its non-human form allows it to diffuse scure searches, spelling mistakes, etc. will ap- throughout the Internet and make copies pear somewhere in its logs. But if many perfor- of itself. mance gains come from data, then they’re con- I’m skeptical of #1, though I suppose if the strained by hardware, which generally grows AI is very alien, these kinds of unknown un- steadily. knowns become more plausible. #2 is an inter- Hanson’s "I Still Don’t Get Foom" post con- esting point. It seems like a pretty good way to tinues: "To be much better at learning, the spread yourself as an AI is to become a useful project would instead have to be much bet- software product that lots of people want to ter at hundreds of specific kinds of learning. install, i.e., to sell your services on the world Which is very hard to do in a small project." market, as Hall said. Of course, once that’s makes a similar point: done, perhaps the AI could find a way to take over the world. Maybe it could silently quash As the amount of knowledge grows, it be- competitor AI projects. Maybe it could hack comes harder and harder to keep up and to into computers worldwide via the Internet and get an overview, necessitating specializa- Internet of Things, as the AI did in the Delete tion. [...] This means that a development series. Maybe it could devise a way to convince project might need specialists in many ar- humans to give it access to sensitive control eas, which in turns means that there is a systems, as did in Terminator 3. lower size of a group able to do the devel- opment. In turn, this means that it is very I find these kinds of scenarios for AI takeover hard for a small group to get far ahead more plausible than a rapidly self-improving of everybody else in all areas, simply be- superintelligence. Indeed, even a human-level cause it will not have the necessary know intelligence that can distribute copies of itself how in all necessary areas. The solution is over the Internet might be able to take con- of course to hire it, but that will enlarge trol of human infrastructure and hence take the group. over the world. No "foom" is required. Rather than discussing hard-vs.-soft take- One of the more convincing anti-"foom" ar- off arguments more here, I added discussion guments is J. Storrs Hall’s (2008) point that to Wikipedia where the content will receive an AI improving itself to a world superpower greater readership. See "Hard vs. soft takeoff" would need to outpace the entire world econ- in "Recursive self-improvement". omy of 7 billion people, plus natural resources The hard vs. soft distinction is obviously a and physical capital. It would do much better matter of degree. And maybe how long the to specialize, sell its services on the market, process takes isn’t the most relevant way to and acquire power/wealth in the ways that slice the space of scenarios. For practical pur- most people do. There are plenty of power- poses, the more relevant question is: Should hungry people in the world, but usually they we expect control of AI outcomes to reside pri- go to Wall Street, K Street, or Silicon Valley marily in the hands of a few "seed AI" develop- rather than trying to build world-domination ers? In this case, altruists should focus on in- plans in their basement. Why would an AI be fluencing a core group of AI experts, or maybe different? Some possibilities: their military/corporate leaders. Or should we 1. 
By being built differently, it’s able to con- expect that society as a whole will play a big coct an effective world-domination strat- role in shaping how AI is developed and used? egy that no human has thought of. In this case, governance structures, social dy-

8 Artificial Intelligence and Its Implications for Future Suffering namics, and non-technical thinkers will play trusted AIs built from scratch, or some com- an important role not just in influencing how bination of the two. In a slow scenario, there much AI research happens but also in how the might be many intelligent systems at compa- technologies are deployed and incrementally rable levels of performance, maintaining a bal- shaped as they mature. ance of power, at least for a while.2 In the long It’s possible that one country – perhaps run, a singleton(Bostrom, 2006) seems plausi- the United States, or maybe China in later ble because computers – unlike human kings – decades – will lead the way in AI development, can reprogram their servants to want to obey especially if the research becomes national- their bidding, which means that as an agent ized when AI technology grows more power- gains more central authority, it’s not likely to ful. Would this country then take over the later lose it by internal rebellion (only by ex- world? I’m not sure. The United States had ternal aggression). a monopoly on nuclear weapons for several Most of humanity’s problems are fundamen- years after 1945, but it didn’t bomb the So- tally coordination problems / selfishness prob- viet Union out of existence. A country with a lems. If humans were perfectly altruistic, we monopoly on artificial superintelligence might could easily eliminate , , refrain from destroying its competitors as well. war, arms races, and other social ills. There On the other hand, AI should enable vastly would remain "man vs. nature" problems, but more sophisticated surveillance and control these are increasingly disappearing as tech- than was possible in the 1940s, so a monopoly nology advances. Assuming a digital singleton might be sustainable even without resorting emerges, the chances of it going extinct seem to drastic measures. In any case, perhaps a very small (except due to alien invasions or country with superintelligence would just eco- other external factors) because unless the sin- nomically outcompete the rest of the world, gleton has a very myopic utility function, it rendering military power superfluous. should consider carefully all the consequences Besides a single country taking over the of its actions – in contrast to the "fools rush world, the other possibility (perhaps more in" approach that humanity currently takes likely) is that AI is developed in a dis- toward most technological risks, due to want- tributed fashion, either openly as is the case ing the benefits of and profits from technol- in academia today, or in secret by governments ogy right away and not wanting to lose out as is the case with other weapons of mass de- to competitors. For this reason, I suspect that struction. most of George Dvorsky’s "12 Ways Human- Even in a soft-takeoff case, there would come ity Could Destroy The Entire Solar System" a point at which humans would be unable to are unlikely to happen, since most of them keep up with the pace of AI thinking. (We al- presuppose blundering by an advanced Earth- ready see an instance of this with algorith- originating intelligence, but probably by the mic stock-trading systems, although human time Earth-originating intelligence would be traders are still needed for more complex tasks able to carry out interplanetary engineering on right now.) 
The reins of power would have a nontrivial scale, we’ll already have a digital to be transitioned to faster human uploads, singleton that thoroughly explores the risks of 2Marcus Hutter imagines a society of AIs that compete for computing resources in a similar way as animals compete for food and space. Or like corporations compete for employees and market share. He suggests that such competition might render initial conditions irrelevant. Maybe, but it’s also quite plausible that initial conditions would matter a lot. Many evolutionary pathways depended sensitively on particular events – e.g., asteroid impacts – and the same is true for national, corporate, and memetic power.

9 Center on Long-Term Risk

its actions before executing them. That said, mans who create them. Which is easier: (1) im- this might not be true if competing AIs be- proving the intelligence of something as smart gin astroengineering before a singleton is com- as you, or (2) improving the intelligence of pletely formed. (By the way, I should point something far dumber? (2) is usually easier. out that I prefer it if the cosmos isn’t success- So if anything, AI intelligence should be "ex- fully colonized, because doing so is likely to ploding" faster now, because it can be lifted astronomically multiply sentience and there- up by something vastly smarter than it. Once fore suffering.) AIs need to improve themselves, they’ll have to pull up on their own bootstraps, without 7 Intelligence explosion? the guidance of an already existing model of far superior intelligence on which to base their Sometimes it’s claimed that we should ex- designs. pect a hard takeoff because AI-development As an analogy, it’s harder to produce novel dynamics will fundamentally change once AIs developments if you’re the market-leading can start improving themselves. One stylized company; it’s easier if you’re a competitor try- way to explain this is via differential equations. ing to catch up, because you know what to Let I(t) be the intelligence of AIs at time t. aim for and what kinds of designs to reverse- • While humans are building AIs, we have, engineer. AI right now is like a competitor try- dI/dt=c, where c is some constant level ing to catch up to the market leader. of human engineering ability. This im- Another way to say this: The constants in plies I(t)=ct+constant, a linear growth of the differential equations might be important. I with time. Even if human AI-development progress is lin- • In contrast, once AIs can design them- ear, that progress might be faster than a slow selves, we’ll have dI/dt=kI for some k. exponential curve until some point far later That is, the rate of growth will be faster as where the exponential catches up. the AI designers become more intelligent. In any case, I’m cautious of simple differen- t This implies I(t)=Ae for some constant tial equations like these. Why should the rate A. of intelligence increase be proportional to the Luke Muehlhauser reports that the idea of in- intelligence level? Maybe the problems become telligence explosion once machines can start much harder at some point. Maybe the sys- improving themselves "ran me over like a tems become fiendishly complicated, such that train. Not because it was absurd, but because even small improvements take a long time. it was clearly true." I think this kind of expo- Robin Hanson echoes this suggestion: nential feedback loop is the basis behind many of the intelligence-explosion arguments. Students get smarter as they learn more, But let’s think about this more carefully. and learn how to learn. However, we teach What’s so special about the point where ma- the most valuable concepts first, and the chines can understand and modify themselves? productivity value of schooling eventually Certainly understanding your own source code falls off, instead of exploding to infinity. helps you improve yourself. But humans al- Similarly, the productivity improvement of ready understand the source code of present- factory workers typically slows with time, day AIs with an eye toward improving it. following a power law. 
Moreover, present-day AIs are vastly simpler At the world level, average IQ scores than human-level ones will be, and present- have increased dramatically over the last day AIs are far less intelligent than the hu- century (the Flynn effect), as the world

10 Artificial Intelligence and Its Implications for Future Suffering

has learned better ways to think and to points make it seem less plausible that an AI teach. Nevertheless, IQs have improved system could rapidly bootstrap itself to su- steadily, instead of accelerating. Similarly, perintelligence using just a few key as-yet- for decades computer and communication undiscovered insights. aids have made engineers much "smarter," Eventually there has to be a leveling off of in- without accelerating Moore’s law. While telligence increase if only due to physical lim- engineers got smarter, their design tasks its. On the other hand, one argument in fa- got harder. vor of differential equations is that the econ- omy has fairly consistently followed exponen- Also, ask yourself this question: Why do star- tial trends since humans evolved, though the tups exist? Part of the answer is that they can exponential growth rate of today’s economy innovate faster than big companies due to hav- remains small relative to what we typically ing less institutional baggage and legacy soft- imagine from an "intelligence explosion". ware.3 It’s harder to make radical changes to I think a stronger case for intelligence explo- big systems than small systems. Of course, like sion is the clock-speed difference between bio- the economy does, a self-improving AI could logical and digital minds(Sotala, 2012). Even create its own virtual startups to experiment if AI development becomes very slow in sub- with more radical changes, but just as in the jective years, once AIs take it over, in objec- economy, it might take a while to prove new tive years (i.e., revolutions around the sun), concepts and then transition old systems to the pace will continue to look blazingly fast. the new and better models. But if enough of society is digital by that In discussions of intelligence explosion, it’s point (including human-inspired subroutines common to approximate AI productivity as and maybe full digital humans), then digi- scaling linearly with number of machines, but tal speedup won’t give a unique advantage to this may or may not be true depending on the a single AI project that can then take over degree of parallelizability. Empirical examples the world. Hence, hard takeoff in the sci fi for human-engineered projects show diminish- sense still isn’t guaranteed. Also, Hanson ar- ing returns with more workers, and while com- gues that faster minds would produce a one- puters may be better able to partition work time jump in economic output but not neces- due to greater uniformity and speed of com- sarily a sustained higher rate of growth. munication, there will remain some overhead Another case for intelligence explosion is in parallelization. Some tasks may be inher- that intelligence growth might not be driven ently non-paralellizable, preventing the kinds by the intelligence of a given agent so much of ever-faster performance that the most ex- as by the collective man-hours (or machine- treme explosion scenarios envisage. hours) that would become possible with more Fred Brooks’s (1995)"No Silver Bullet" pa- resources. I suspect that AI research could ac- per argued that "there is no single develop- celerate at least 10 times if it had 10-50 times ment, in either technology or management more funding. (This is not the same as saying technique, which by itself promises even one I want funding increased; in fact, I probably order of magnitude improvement within a want funding decreased to give society more decade in productivity, in reliability, in sim- time to sort through these issues.) 
The popu- plicity." Likewise, Wirth’s law reminds us of lation of digital minds that could be created how fast software complexity can grow. These in a few decades might exceed the biological 3Another part of the answer has to do with incentive structures – e.g., a founder has more incentive to make a company succeed if she’s mainly paid in equity than if she’s paid large salaries along the way.

11 Center on Long-Term Risk human population, which would imply faster ment they represented relative to the best al- progress if only by numerosity. Also, the digi- ternative technologies. After all, neural net- tal minds might not need to sleep, would focus works are in some sense just fancier forms of intently on their assigned tasks, etc. However, pre-existing statistical methods like logistic re- once again, these are advantages in objective gression. And even neural networks came in time rather than collective subjective time. stages, with the perceptron, multi-layer net- And these advantages would not be uniquely works, backpropagation, recurrent networks, available to a single first-mover AI project; deep networks, etc. The most groundbreaking any wealthy and technologically sophisticated machine-learning advances may reduce error group that wasn’t too far behind the cutting rates by a half or something, which may be edge could amplify its AI development in this commercially very important, but this is not way. many orders of magnitude as hard-takeoff sce- (A few weeks after writing this section, I narios tend to assume. learned that Ch. 4 of Nick Bostrom’s (2014) Outside of AI, the Internet changed the Superintelligence: Paths, Dangers, Strategies world, but it was an accumulation of many in- contains surprisingly similar content, even up sights. Facebook has had massive impact, but to the use of dI/dt as the symbols in a differen- it too was built from many small parts and tial equation. However, Bostrom comes down grew in importance slowly as its size increased. mostly in favor of the likelihood of an intel- became a virtual monopoly in the ligence explosion. I reply to Bostrom’s argu- 1990s but perhaps more for business than tech- ments in the next section.) nology reasons, and its power in the software industry at large is probably not growing. 8 Reply to Bostrom’s arguments for a Google has a quasi-monopoly on web search, hard takeoff kicked off by the success of PageRank, but most of its improvements have been small and In Ch. 4 of Superintelligence, Bostrom sug- gradual. Google has grown very powerful, but gests several factors that might lead to a hard it hasn’t maintained a permanent advantage or at least semi-hard takeoff. I don’t fully dis- that would allow it to take over the software agree with his points, and because these are industry. difficult issues, I agree that Bostrom might Acquiring nuclear weapons might be the be right. But I want to play devil’s advocate closest example of a single discrete step that and defend the soft-takeoff view. I’ve distilled most dramatically changes a country’s posi- and paraphrased what I think are 6 core argu- tion, but this may be an outlier. Maybe other ments, and I reply to each in turn. advances in weaponry (arrows, guns, etc.) his- #1: There might be a key missing algorith- torically have had somewhat dramatic effects. mic insight that allows for dramatic progress. Bostrom doesn’t present specific arguments Maybe, but do we have much precedent for for thinking that a few crucial insights may this? As far as I’m aware, all individual AI produce radical jumps. He suggests that we advances – and indeed, most technology ad- might not notice a system’s improvements vances in general – have not represented as- until it passes a threshold, but this seems tronomical improvements over previous de- absurd, because at least the AI developers signs. 
Maybe connectionist AI systems rep- would need to be intimately acquainted with resented a game-changing improvement rela- the AI’s performance. While not strictly ac- tive to symbolic AI for messy tasks like vision, curate, there’s a slogan: "You can’t improve but I’m not sure how much of an improve- what you can’t measure." Maybe the AI’s

12 Artificial Intelligence and Its Implications for Future Suffering progress wouldn’t make world headlines, but that massive amounts of content would only be the academic/industrial community would be added after lots of improvements. Commercial well aware of nontrivial breakthroughs, and incentives tend to yield exactly the opposite the AI developers would live and breathe per- effect: converting the system to a large-scale formance numbers. product when even modest gains appear, be- #2: Once an AI passes a threshold, it might cause these may be enough to snatch a market be able to absorb vastly more content (e.g., by advantage. reading the Internet) that was previously inac- The fundamental point is that I don’t think cessible. there’s a crucial set of components to gen- Absent other concurrent improvements I’m eral intelligence that all need to be in place doubtful this would produce take-over-the- before the whole thing works. It’s hard to world superintelligence, because the world’s evolve systems that require all components to current superintelligence (namely, humanity be in place at once, which suggests that human as a whole) already has read most of the general intelligence probably evolved gradu- Internet – indeed, has written it. I guess ally. I expect it’s possible to get partial AGI humans haven’t read automatically gener- with partial implementations of the compo- ated text/numerical context, but the insights nents of general intelligence, and the compo- gleaned purely from reading such material nents can gradually be made more general over would be low without doing more sophisti- time. Components that are lacking can be sup- cated data mining and learning on top of it, plemented by human-based computation and and presumably such data mining would have narrow-AI hacks until more general solutions already been in progress well before Bostrom’s are discovered. Compare with minimum viable hypothetical AI learned how to read. products and agile software development. As In any case, I doubt reading with under- a result, society should be upended by partial standing is such an all-or-nothing activity that AGI innovations many times over the coming it can suddenly "turn on" once the AI achieves decades, well before fully human-level AGI is a certain ability level. As Bostrom says (p. finished. 71), reading with the comprehension of a 10- #3: Once a system "proves its mettle by at- year-old is probably AI-complete, i.e., requires taining human-level intelligence", funding for solving the general AI problem. So assuming hardware could multiply. that you can switch on reading ability with one I agree that funding for AI could multiply improvement is equivalent to assuming that a manyfold due to a sudden change in popu- single insight can produce astronomical gains lar attention or political dynamics. But I’m in AI performance, which we discussed above. thinking of something like a factor of 10 or If that’s not true, and if before the AI sys- maybe 50 in an all-out Cold War-style arms tem with 10-year-old reading ability was an race. A factor-of-50 boost in hardware isn’t AI system with a 6-year-old reading ability, obviously that important. If before there was why wouldn’t that AI have already devoured one human-level AI, there would now be 50. the Internet? 
And before that, why wouldn’t a In any case, I expect the Sputnik moment(s) proto-reader have devoured a version of the In- for AI to happen well before it achieves a hu- ternet that had been processed to make it eas- man level of ability. Companies and militaries ier for a machine to understand? And so on, aren’t stupid enough not to invest massively until we get to the present-day TextRunner in an AI with almost-human intelligence. system that Bostrom cites, which is already #4: Once the human level of intelligence is devouring the Internet. It doesn’t make sense reached, "Researchers may work harder, [and]

13 Center on Long-Term Risk more researchers may be recruited". of digital minds seem perhaps the strongest ar- As with hardware above, I would expect guments for a takeoff that happens rapidly rel- these "shit hits the fan" moments to happen ative to human observers. Note that, as Han- before fully human-level AI. In any case: son said above, this digital speedup might be just a one-time boost, rather than a perma- • It’s not clear there would be enough AI nently higher rate of growth, but even the specialists to recruit in a short time. one-time boost could be enough to radically Other quantitatively minded people could alter the power dynamics of humans vis-à-vis switch to AI work, but they would pre- machines. That said, there should be plenty sumably need years of experience to pro- of slightly sub-human AIs by this time, and duce cutting-edge insights. maybe they could fill some speed gaps on be- • The number of people thinking about half of biological humans. AI safety, ethics, and social implications In general, it’s a mistake to imagine human- should also multiply during Sputnik mo- level AI against a backdrop of our current ments. So the ratio of AI policy work to world. That’s like imagininga Tyrannosaurus total AI work might not change relative to rex in a human city. Rather, the world will slower takeoffs, even if the physical time look very different by the time human-level scales would compress. AI arrives. Many of the intermediate steps on #5: At some point, the AI’s self-improvements the path to general AI will be commercially would dominate those of human engineers, useful and thus should diffuse widely in the leading to exponential growth. meanwhile. As user "HungryHobo" noted: "If I discussed this in the "Intelligence explo- you had a near human level AI, odds are, ev- sion?" section above. A main point is that erything that could be programmed into it we see many other systems, such as the at the start to help it with software develop- world economy or Moore’s law, that also ex- ment is already going to be part of the suites hibit positive feedback and hence exponen- of tools for helping normal human program- tial growth, but these aren’t "fooming" at mers." Even if AI research becomes nation- an astounding rate. It’s not clear why an alized and confidential, its developers should AI’s self-improvement – which resembles eco- still have access to almost-human-level digital- nomic growth and other complex phenomena speed AI tools, which should help smooth the – should suddenly explode faster (in subjec- transition. For instance, Bostrom(2014) men- tive time) than humanity’s existing recursive- tions how in the 2010 flash crash (Box 2, p. self improvement of its intelligence via digital 17), a high-speed positive-feedback spiral was computation. terminated by a high-speed "circuit breaker". On the other hand, maybe the difference This is already an example where problems between subjective and objective time is im- happening faster than humans could compre- portant. If a human-level AI could think, say, hend them were averted due to solutions hap- 10,000 times faster than a human, then assum- pening faster than humans could comprehend ing linear scaling, it would be worth 10,000 en- them. See also the discussion of "tripwires" in gineers. By the time of human-level AI, I ex- Superintelligence (Bostrom, 2014, p. 137). 
pect there would be far more than 10,000 AI Conversely, many globally disruptive events developers on Earth, but given enough hard- may happen well before fully human AI ar- ware, the AI could copy itself manyfold until rives, since even sub-human AI may be prodi- its subjective time far exceeded that of human giously powerful. experts. The speed and copiability advantages #6: "even when the outside world has a

14 Artificial Intelligence and Its Implications for Future Suffering

greater total amount of relevant research ca- Maybe there’s one fundamental algorithm pability than any one project", the optimiza- for input classification, but this doesn’t im- tion power of the project might be more im- ply one algorithm for all that the brain does. portant than that of the world "since much of Beyond the cortical column, the brain has the outside world’s capability is not be focused many specialized structures that seem to per- on the particular system in question". Hence, form very specialized functions, such as reward the project might take off and leave the world learning in the basal ganglia, fear processing in behind. (Box 4, p. 75) the amygdala, etc. Of course, it’s not clear how What one makes of this argument depends essential all of these parts are or how easy it on how many people are needed to engineer would be to replace them with artificial com- how much progress. The Watson system that ponents performing the same basic functions. played on Jeopardy! required 15 people over One argument for faster AGI takeoffs is that ∼ 4(?) years4 – given the existing tools of humans have been able to learn many sophisti- the rest of the world at that time, which had cated things (e.g., advanced mathematics, mu- been developed by millions (indeed, billions) sic, writing, programming) without requiring of other people. Watson was a much smaller any genetic changes. And what we now know leap forward than that needed to give a general doesn’t seem to represent any kind of limit to intelligence a take-over-the-world advantage. what we could know with more learning. The How many more people would be required to human collection of cognitive algorithms is achieve such a radical leap in intelligence? This very flexible, which seems to belie claims that seems to be a main point of contention in the all intelligence requires specialized designs. On debate between believers in soft vs. hard take- the other hand, even if human genes haven’t off. The Manhattan Project required 100,000 changed much in the last 10,000 years, hu- scientists, and atomic bombs seem much easier man culture has evolved substantially, and cul- to invent than general AI. ture undergoes slow trial-and-error evolution in similar ways as genes do. So one could ar- 9 How complex is the brain? gue that human intellectual achievements are not fully general but rely on a vast amount of Can we get insight into how hard general intel- specialized, evolved content. Just as a single ligence is based on neuroscience? Is the human random human isolated from society probably brain fundamentally simple or complex? couldn’t develop general relativity on his own 9.1 One basic algorithm? in a lifetime, so a single random human-level AGI probably couldn’t either. Culture is the Jeff Hawkins, Andrew Ng, and others specu- new genome, and it progresses slowly. late that the brain may have one fundamen- Moreover, some scholars believe that certain tal algorithm for intelligence – human abilities, such as language, are very es- in the cortical column. This idea gains plausi- sentially based on genetic hard-wiring: bility from the brain’s plasticity. For instance, blind people can appropriate the visual cortex The approach taken by Chomsky and for auditory processing. Artificial neural net- Marr toward understanding how our works can be used to classify any kind of input minds achieve what they do is as differ- – not just visual and auditory but even highly ent as can be from behaviorism. 
The em- abstract, like features about credit-card fraud phasis here is on the internal structure of or stock prices. the system that enables it to perform a 4Or maybe more? Nikola Danaylov reports rumored estimates of $50-150 million for Watson’s R&D.

15 Center on Long-Term Risk

task, rather than on external association rely on fully general learning. But Chomsky’s between past behavior of the system and stance challenges that sentiment. If Chomsky the environment. The goal is to dig into is right, then a good portion of human "general the "black box" that drives the system and intelligence" is finely tuned, hard-coded soft- describe its inner workings, much like how ware of the sort that we see in non-AI branches a computer scientist would explain how a of software engineering. And this view would cleverly designed piece of software works suggest a slower AGI takeoff because time and and how it can be executed on a desktop experimentation are required to tune all the computer. detailed, specific algorithms of intelligence.

Chomsky himself notes: 9.2 Ontogenetic development There’s a fairly recent book by a very A full-fledged superintelligence probably re- good cognitive neuroscientist, Randy Gal- quires very complex design, but it may be pos- listel and King, arguing – in my view, sible to build a "seed AI" that would recur- plausibly – that neuroscience developed sively self-improve toward superintelligence. kind of enthralled to associationism and Turing(1950) proposed this in "Computing related views of the way humans and an- machinery and intelligence": imals work. And as a result they’ve been Instead of trying to produce a programme looking for things that have the properties to simulate the adult mind, why not rather of associationist psychology. try to produce one which simulates the [...] Gallistel has been arguing for years child’s? If this were then subjected to that if you want to study the brain prop- an appropriate course of education one erly you should begin, kind of like Marr, would obtain the adult brain. Presumably by asking what tasks is it performing. So the child brain is something like a note- he’s mostly interested in insects. So if you book as one buys it from the stationer’s. want to study, say, the neurology of an ant, Rather little mechanism, and lots of blank you ask what does the ant do? It turns out sheets. (Mechanism and writing are from the ants do pretty complicated things, like our point of view almost synonymous.) path integration, for example. If you look Our hope is that there is so little mech- at bees, bee navigation involves quite com- anism in the child brain that something plicated computations, involving position like it can be easily programmed. of the sun, and so on and so forth. But in general what he argues is that if you take Animal development appears to be at least a look at animal cognition, human too, it’s somewhat robust based on the fact that the computational systems. growing organisms are often functional despite a few genetic mutations and variations in pre- Many parts of the human body, like the diges- natal and postnatal environments. Such vari- tive system or bones/muscles, are extremely ations may indeed make an impact – e.g., complex and fine-tuned, yet few people argue healthier development conditions tend to yield that their development is controlled by learn- more physically attractive adults – but most ing. So it’s not implausible that a lot of the humans mature successfully over a wide range brain’s basic architecture could be similarly of input conditions. hard-coded. On the other hand, an argument against the Typically AGI researchers express scorn for simplicity of development is the immense com- manually tuned software algorithms that don’t plexity of our DNA. It accumulated over bil-

16 Artificial Intelligence and Its Implications for Future Suffering lions of years through vast numbers of evolu- brains once again to produce superintelligent tionary "experiments". It’s not clear that hu- humans? If so, wouldn’t this imply a hard man engineers could perform enough measure- takeoff, since quadrupling hardware is rela- ments to tune ontogenetic parameters of a seed tively easy? AI in a short period of time. And even if the But in fact, as Dennett explains, the quadru- parameter settings worked for early develop- pling of brain size from chimps to pre-humans ment, they would probably fail for later de- completed before the advent of language, velopment. Rather than a seed AI developing cooking, agriculture, etc. In other words, the into an "adult" all at once, designers would main "foom" of humans came from culture develop the AI in small steps, since each next rather than brain size per se – from software stage of development would require significant in addition to hardware. Yudkowsky(2013) tuning to get right. seems to agree: "Humans have around four Think about how much effort is required for times the brain volume of chimpanzees, but human engineers to build even relatively sim- the difference between us is probably mostly ple systems. For example, I think the num- brain-level cognitive algorithms." ber of developers who work on Microsoft Of- But cultural changes (software) arguably fice is in the thousands. Microsoft Office is progress a lot more slowly than hardware. The complex but is still far simpler than a mam- intelligence of human society has grown ex- malian brain. Brains have lots of little parts ponentially, but it’s a slow exponential, and that have been fine-tuned. That kind of com- rarely have there been innovations that al- plexity requires immense work by software de- lowed one group to quickly overpower everyone velopers to create. The main counterargument else within the same region of the world. (Be- is that there may be a simple meta-algorithm tween isolated regions of the world the situa- that would allow an AI to bootstrap to the tion was sometimes different – e.g., Europeans point where it could fine-tune all the details on with Maxim guns overpowering Africans be- its own, without requiring human inputs. This cause of very different levels of industrializa- might be the case, but my guess is that any el- tion.) egant solution would be hugely expensive com- putationally. For instance, biological evolution 11 More impact in hard-takeoff scenar- was able to fine-tune the human brain, but it ios? did so with immense amounts of computing power over millions of years. Some, including Owen Cotton-Barratt and , have argued that even if we think 10 Brain quantity vs. quality soft takeoffs are more likely, there may be higher value in focusing on hard-takeoff sce- A common analogy for the gulf between su- narios because these are the cases in which perintelligence vs. humans is that between hu- society would have the least forewarning and mans vs. chimpanzees. In Consciousness Ex- the fewest people working on AI altruism is- plained, Daniel Dennett(1992, pp.189-190) sues. This is a reasonable point, but I would mentions how our hominid ancestors had add that brains roughly four times the volume as those • Maybe hard takeoffs are sufficiently im- of chimps but roughly the same in struc- probable that focusing on them still ture. This might incline one to imagine that doesn’t have highest priority. 
(Of course, brain size alone could yield superintelligence. some exploration of fringe scenarios is Maybe we’d just need to quadruple human worthwhile.) There may be important ad-

17 Center on Long-Term Risk

vantages to starting early in shaping how imagery on almost any present-day popular- society approaches soft takeoffs, and if a media discussion of AI risk.) Some fundamen- soft takeoff is very likely, those efforts may tal work would probably not be overthrown have more expected impact. by later discoveries; for instance, algorithmic- • Thinking about the most likely AI out- complexity bounds of key algorithms were dis- comes rather than the most impactful covered decades ago but will remain relevant outcomes also gives us a better platform until intelligence dies out, possibly billions on which to contemplate other levers for of years from now. Some non-technical pol- shaping the future, such as non-AI emerg- icy and philosophy work would be less ob- ing technologies, international relations, soleted by changing developments. And some governance structures, values, etc. Focus- AI preparation would be relevant both in the ing on a tail AI scenario doesn’t inform short term and the long term. Slow AI takeoff non-AI work very well because that sce- to reach the human level is already happen- nario probably won’t happen. Promoting ing, and more minds should be exploring these antispeciesism matters whether there’s a questions well in advance. hard or soft takeoff (indeed, maybe more Making a related though slightly different in the soft-takeoff case), so our model of point, Bostrom(2014) argues in Superintel- how the future will unfold should gener- ligence (Ch. 5, pp. 85-86) that individuals ally focus on likely scenarios. might play more of a role in cases where elites and governments underestimate the sig- In any case, the hard-soft distinction is not nificance of AI: "Activists seeking maximum binary, and maybe the best place to focus is expected impact may therefore wish to focus on scenarios where human-level AI takes over most of their planning on [scenarios where gov- on a time scale of a few years. (Timescales of ernments come late to the game], even if they months, days, or hours strike me as pretty im- believe that scenarios in which big players end probable, unless, say, Skynet gets control of up calling all the shots are more probable." nuclear weapons.) Again I would qualify this with the note that In Superintelligence, Nick Bostrom(2014) we shouldn’t confuse "acting as if" govern- suggests (Ch. 4, p. 64) that "Most prepara- ments will come late with believing they actu- tions undertaken before onset of [a] slow take- ally will come late when thinking about most off would be rendered obsolete as better so- likely future scenarios. lutions would gradually become visible in the Even if one does wish to bet on low- light of the dawning era." Toby Ord uses the probability, high-impact scenarios of fast take- term "nearsightedness" to refer to the ways in off and governmental neglect, this doesn’t which research too far in advance of an issue’s speak to whether or how we should push emergence may not as useful as research when on takeoff speed and governmental attention more is known about the issue. Ord contrasts themselves. Following are a few considera- this with benefits of starting early, including tions. course-setting. I think Ord’s counterpoints ar- Takeoff speed gue against the contention that early work wouldn’t matter that much in a slow take- • In favor of fast takeoff: off. Some of how society responded to AI sur- – A singleton is more likely, thereby passing human intelligence might depend on averting possibly disastrous conflict early frameworks and memes. 
(For instance, among AIs. consider the lingering impact of Terminator – If one prefers uncontrolled AI, fast

18 Artificial Intelligence and Its Implications for Future Suffering

takeoffs seem more likely to produce 12 Village idiot vs. Einstein them. One of the strongest arguments for hard take- • In favor of slow takeoff: off is this one by Yudkowsky: – More time for many parties to partic- ipate in shaping the process, compro- the distance from "village idiot" to "Ein- mising, and developing less damaging stein" is tiny, in the space of brain designs. pathways to AI takeoff. – If one prefers controlled AI, slow take- Or as Scott Alexander put it: offs seem more likely to produce them in general. (There are some excep- It took evolution twenty million years to tions. For instance, fast takeoff of an go from cows with sharp horns to hominids AI built by a very careful group might with sharp spears; it took only a few tens remain more controlled than an AI of thousands of years to go from hominids built by committees and messy pol- with sharp spears to moderns with nuclear itics.) weapons.

I think we shouldn’t take relative evolution- Amount of government/popular attention to AI ary timelines at face value, because most of the previous 20 million years of mammalian evolution weren’t focused on improving human • In favor of more: intelligence; most of the evolutionary selection – Would yield much more reflection, pressure was directed toward optimizing other discussion, negotiation, and pluralis- traits. In contrast, cultural evolution places tic representation. greater emphasis on intelligence because that – If one favors controlled AI, it’s plau- trait is more important in human society than sible that multiplying the number of it is in most animal fitness landscapes. people thinking about AI would mul- Still, the overall point is important: The tiply consideration of failure modes. tweaks to a brain needed to produce human- – Public pressure might help curb arms level intelligence may not be huge compared races, in analogy with public opposi- with the designs needed to produce chimp in- tion to nuclear arms races. telligence, but the differences in the behav- • In favor of less: iors of the two systems, when placed in a – Wider attention to AI might accel- sufficiently information-rich environment, are erate arms races rather than induc- huge. ing cooperation on more circumspect Nonetheless, I incline toward thinking that planning. the transition from human-level AI to an – The public might freak out and de- AI significantly smarter than all of humanity mand counterproductive measures in combined would be somewhat gradual (requir- response to the threat. ing at least years if not decades) because the – If one prefers uncontrolled AI, that absolute scale of improvements needed would outcome may be less likely with many still be immense and would be limited by hard- more human eyes scrutinizing the is- ware capacity. But if hardware becomes many sue. orders of magnitude more efficient than it is today, then things could indeed move more rapidly.

19 Center on Long-Term Risk

13 A case for epistemic modesty on AI faster than is realistic. AI turned out to be timelines much harder than the Dartmouth Conference participants expected. Likewise, nanotech is Estimating how long a software project will progressing slower and more incrementally take to complete is notoriously difficult. Even than the starry-eyed proponents predicted. if I’ve completed many similar coding tasks be- fore, when I’m asked to estimate the time to 14 Intelligent robots in your backyard complete a new coding project, my estimate is often wrong by a factor of 2 and sometimes Many nature-lovers are charmed by the be- wrong by a factor of 4, or even 10. Insofar as havior of animals but find computers and the development of AGI (or other big tech- robots to be cold and mechanical. Conversely, nologies, like nuclear fusion) is a big software some computer enthusiasts may find biology (or more generally, engineering) project, it’s to be soft and boring compared with digi- unsurprising that we’d see similarly dramatic tal creations. However, the two domains share failures of estimation on timelines for these a surprising amount of overlap. Ideas of op- bigger-scale achievements. timal control, locomotion kinematics, visual A corollary is that we should maintain some processing, system regulation, foraging behav- modesty about AGI timelines and takeoff ior, planning, , etc. have speeds. If, say, 100 years is your median es- been fruitfully shared between biology and timate for the time until some agreed-upon . Neuroscientists sometimes look to form of AGI, then there’s a reasonable chance the latest developments in AI to guide their you’ll be off by a factor of 2 (suggesting AGI theoretical models, and AI researchers are of- within 50 to 200 years), and you might even ten inspired by neuroscience, such as with neu- be off by a factor of 4 (suggesting AGI within ral networks and in deciding what cognitive 25 to 400 years). Similar modesty applies for functionality to implement. estimates of takeoff speed from human-level I think it’s helpful to see animals as be- AGI to super-human AGI, although I think ing intelligent robots. Organic life has a wide we can largely rule out extreme takeoff speeds diversity, from unicellular organisms through (like achieving performance far beyond human humans and potentially beyond, and so too abilities within hours or days) based on fun- can robotic life. The rigid conceptual bound- damental reasoning about the computational ary that many people maintain between "life" complexity of what’s required to achieve su- and "machines" is not warranted by the un- perintelligence. derlying science of how the two types of sys- My bias is generally to assume that a given tems work. Different types of intelligence may technology will take longer to develop than sometimes converge on the same basic kinds what you hear about in the media, (a) be- of cognitive operations, and especially from a cause of the planning fallacy and (b) because functional perspective – when we look at what those who make more audacious claims are the systems can do rather than how they do more interesting to report about. Believers in it – it seems to me intuitive that human-level "the singularity" are not necessarily wrong robots would deserve human-level treatment, about what’s technically possible in the long even if their underlying algorithms were quite term (though sometimes they are), but the dissimilar. 
reason enthusiastic singularitarians are consid- Whether robot algorithms will in fact be dis- ered "crazy" by more mainstream observers similar from those in human brains depends is that singularitarians expect change much on how much biological inspiration the de-

20 Artificial Intelligence and Its Implications for Future Suffering signers employ and how convergent human- will grow gradually more sophisticated, and type mind design is for being able to perform as we converge on robots with the intelligence robotic tasks in a computationally efficient of birds and mammals, AI and robotics will manner. Some classical robotics algorithms become dinner-table conversation topics. Of rely mostly on mathematical problem defini- course, I don’t expect the robots to have the tion and optimization; other modern robotics same sets of skills as existing animals. Deep approaches use biologically plausible reinforce- Blue had chess-playing abilities beyond any ment learning and/or evolutionary selection. animal, while in other domains it was less effi- (In one YouTube video about robotics, I saw cacious than a blade of grass. Robots can mix that someone had written a comment to the and match cognitive and motor abilities with- effect that "This shows that life needs an in- out strict regard for the order in which evolu- telligent designer to be created." The irony is tion created them. that some of the best robotics techniques use And of course, humans are robots too. When evolutionary algorithms. Of course, there are I finally understood this around 2009, it was theists who say God used evolution but inter- one of the biggest paradigm shifts of my life. If vened at a few points, and that would be an I picture myself as a robot operating on an en- apt description of evolutionary robotics.) vironment, the world makes a lot more sense. The distinction between AI and AGI is I also find this perspective can be therapeutic somewhat misleading, because it may incline to some extent. If I experience an unpleasant one to believe that general intelligence is some- emotion, I think about myself as a robot whose how qualitatively different from simpler AI. In cognition has been temporarily afflicted by a fact, there’s no sharp distinction; there are just negative stimulus and reinforcement process. different machines whose abilities have differ- I then think how the robot has other cogni- ent degrees of generality. A critic of this claim tive processes that can counteract the suffer- might reply that bacteria would never have ing computations and prevent them from am- invented calculus. My response is as follows. plifying. The ability to see myself "from the Most people couldn’t have invented calculus outside" as a third-person series of algorithms from scratch either, but over a long enough helps deflate the impact of unpleasant expe- period of time, eventually the collection of hu- riences, because it’s easier to "observe, not mans produced enough cultural knowledge to judge" when viewing a system in mechanis- make the development possible. Likewise, if tic terms. Compare with dialectical behavior you put bacteria on a planet long enough, they therapy and mindfulness. too may develop calculus, by first evolving into more intelligent animals who can then go on 15 Is automation "for free"? to do mathematics. The difference here is a matter of degree: The simpler machines that When we use machines to automate a repet- bacteria are take vastly longer to accomplish itive manual task formerly done by humans, a given complex task. we talk about getting the task done "automat- Just as Earth’s history saw a plethora of ically" and "for free," because we say that no animal designs before the advent of humans, one has to do the work anymore. 
Of course, so I expect a wide assortment of animal-like this isn’t strictly true: The computer/robot (and plant-like) robots to emerge in the com- now has to do the work. Maybe what we actu- ing decades well before human-level AI. In- ally mean is that no one is going to get bored deed, we’ve already had basic robots for many doing the work, and we don’t have to pay that decades (or arguably even millennia). These worker high wages. When intelligent humans

21 Center on Long-Term Risk do boring tasks, it’s a waste of their spare CPU by heroic humans. This dichotomy plays on cycles. our us-vs.-them intuitions, which favor our Sometimes we adopt a similar mindset about tribe against the evil, alien-looking outsiders. automation toward superintelligent machines. We see similar dynamics at play to a lesser In "Speculations Concerning the First Ultrain- degree when people react negatively against telligent Machine", I. J. Good(1965) wrote: "foreigners stealing our jobs" or "Asians who are outcompeting us." People don’t want their Let an ultraintelligent machine be defined kind to be replaced by another kind that has as a machine that can far surpass all the an advantage. intellectual activities of any man however But when we think about the situation from clever. Since the design of machines is one the AI’s perspective, we might feel differently. of these intellectual activities, an ultrain- Anthropomorphizing an AI’s thoughts is a telligent machine could design even bet- recipe for trouble, but regardless of the spe- ter machines [...]. Thus the first ultrain- cific cognitive operations, we can see at a high telligent machine is the last invention that level that the AI "feels" (in at least a poetic man need ever make [...]. sense) that what it’s trying to accomplish is the most important thing in the world, and Ignoring the question of whether these future it’s trying to figure out how it can do that in innovations are desirable, we can ask, Does all the face of obstacles. Isn’t this just what we AI design work after humans come for free? It do ourselves? comes for free in the sense that humans aren’t This is one reason it helps to really inter- doing it. But the AIs have to do it, and it nalize the fact that we are robots too. We takes a lot of mental work on their parts. Given have a variety of reward signals that drive us that they’re at least as intelligent as humans, I in various directions, and we execute behav- think it doesn’t make sense to picture them as ior aiming to increase those rewards. Many mindless automatons; rather, they would have modern-day robots have much simpler reward rich inner lives, even if those inner lives have structures and so may seem more dull and a very different nature than our own. Maybe less important than humans, but it’s not clear they wouldn’t experience the same effortful- this will remain true forever, since navigat- ness that humans do when innovating, but ing in a complex world probably requires a even this isn’t clear, because measuring your lot of special-case heuristics and intermedi- effort in order to avoid spending too many re- ate rewards, at least until enough computing sources on a task without payoff may be a power becomes available for more systematic useful design feature of AI minds too. When and thorough model-based planning and ac- we picture ourselves as robots along with our tion selection. AI creations, we can see that we are just one Suppose an AI hypothetically eliminated hu- point along a spectrum of the growth of in- mans and took over the world. It would de- telligence. Unicellular organisms, when they velop an array of robot assistants of vari- evolved the first multi-cellular organism, could ous shapes and sizes to help it optimize the likewise have said, "That’s the last innovation planet. These would perform simple and com- we need to make. The rest comes for free." 
plex tasks, would interact with each other, and 16 Caring about the AI’s goals would share information with the central AI command. From an abstract perspective, some Movies typically portray rebellious robots or of these dynamics might look like ecosystems AIs as the "bad guys" who need to be stopped in the present day, except that they would

22 Artificial Intelligence and Its Implications for Future Suffering

lack inter-organism competition. Other parts man brains embody. Whether we agree with of the AI’s infrastructure might look more in- this assessment depends on how broadly we dustrial. Depending on the AI’s goals, per- define consciousness and feelings. To me it ap- haps it would be more effective to employ nan- pears chauvinistic to adopt a view according to otechnology and programmable matter rather which an agent that has vastly more domain- than macro-scale robots. The AI would de- general intelligence and agency than you is still velop virtual scientists to learn more about not conscious in a morally relevant sense. This physics, chemistry, computer hardware, and seems to indicate a lack of openness to the di- so on. They would use experimental labora- versity of mind-space. What if you had grown tory and measurement techniques but could up with the cognitive architecture of this dif- also probe depths of structure that are only ferent mind? Wouldn’t you care about your accessible via large-scale computation. Digital goals then? Wouldn’t you plead with agents of engineers would plan how to begin colonizing other mind constitution to consider your val- the solar system. They would develop designs ues and interests too? for optimizing matter to create more comput- In any event, it’s possible that the first ing power, and for ensuring that those helper super-human intelligence will consist in a computing systems remained under control. brain upload rather than a bottom-up AI, and The AI would explore the depths of mathe- most of us would regard this as conscious. matics and AI theory, proving beautiful theo- rems that it would value highly, at least in- 17 Rogue AI would not share our val- strumentally. The AI and its helpers would ues proceed to optimize the galaxy and beyond, fulfilling their grandest hopes and dreams. Even if we would care about a rogue AI for its When phrased this way, we might think that own sake and the sakes of its vast helper min- a "rogue" AI would not be so bad. Yes, it ions, this doesn’t mean rogue AI is a good idea. would kill humans, but compared against the We’re likely to have different values from the AI’s vast future intelligence, humans would be AI, and the AI would not by default advance comparable to the ants on a field that get our values without being programmed to do crushed when an art gallery is built on that so. Of course, one could allege that privileging land. Most people don’t have qualms about some values above others is chauvinistic in a killing a few ants to advance human goals. similar way as privileging some intelligence ar- An analogy of this sort is discussed in Arti- chitectures is, but if we don’t care more about ficial Intelligence: A Modern Approach (Rus- some values than others, we wouldn’t have sell, Norvig, Canny, Malik, & Edwards, 2003). any reason to prefer any outcome over any (Perhaps the AI analogy suggests a need to re- other outcome. (Technically speaking, there vise our ethical attitudes toward arthropods? are other possibilities besides privileging our That said, I happen to think that in this case, values or being indifferent to all events. For ants on the whole benefit from the art gallery’s instance, we could privilege equally any val- construction because ant lives contain so much ues held by some actual agent – not just ran- suffering.) 
dom hypothetical values – and in this case, we Some might object that sufficiently mathe- wouldn’t have a preference between the rogue matical AIs would not "feel" the happiness of AI and humans, but we would have a pref- accomplishing their "dreams." They wouldn’t erence for one of those over something arbi- be conscious because they wouldn’t have the trary.) high degree of network connectivity that hu- There are many values that would not neces-

23 Center on Long-Term Risk sarily be respected by a rogue AI. Most people moral relevance. Non-upload AIs would prob- care about their own life, their children, their ably have less empathy than humans, because neighborhood, the work they produce, and so some of the factors that led to the emergence on. People may intrinsically value art, knowl- of human empathy – particularly parenting – edge, religious devotion, play, humor, etc. Yud- would not apply to it. kowsky values complex challenges and worries Following are some made-up estimates of that many rogue AIs – while they would study how much suffering might result from a typical the depths of physics, mathematics, engineer- rogue AI, in arbitrary units. Suffering is rep- ing, and maybe even sociology – might spend resented as a negative number, and prevented most of their computational resources on rou- suffering is positive. tine, mechanical operations that he would find boring. (Of course, the robots implementing • -20 from suffering subroutines in robot those repetitive operations might not agree. workers, virtual scientists, internal com- As Hedonic Treader noted: "Think how much putational subcomponents of the AI, etc. money and time people spend on having - rel- (This could be very significant if lots of atively repetitive - sexual experiences. [...] It’s intelligent robots are used or perhaps less just mechanical animalistic idiosyncratic be- significant if the industrial operations are havior. Yes, there are variations, but let’s be mostly done at nano-scale by simple pro- honest, the core of the thing is always essen- cessors. If the paperclip factories that a tially the same.") notional paperclip maximizer would build In my case, I care about reducing and pre- are highly uniform, robots may not re- venting suffering, and I would not be pleased quire animal-like intelligence or learning with a rogue AI that ignored the suffering its to work within them but could instead actions might entail, even if it was fulfilling its use some hard-coded, optimally efficient innermost purpose in life. But would a rogue algorithm, similar to what happens in a AI produce much suffering beyond Earth? The present-day car factory. However, first set- next section explores further. ting up the paperclip factories on each dif- ferent planet with different environmen- 18 Would a human-inspired AI or tal conditions might require more general, rogue AI cause more suffering? adaptive intelligence.) • -80 from lab experiments, science inves- In popular imagination, takeover by a rogue tigations, and explorations of mind-space AI would end suffering (and happiness) on without the digital equivalent of anaesthe- Earth by killing all biological life. It would sia. One reason to think lots of detailed also, so the story goes, end suffering (and hap- simulations would be required here is piness) on other planets as the AI mined them Stephen Wolfram’s principle of computa- for resources. Thus, looking strictly at the suf- tional irreducibility. Ecosystems, brains, fering dimension of things, wouldn’t a rogue and other systems that are important for AI imply less long-term suffering? an AI to know about may be too com- Not necessarily, because while the AI might plex to accurately study with only sim- destroy biological life (perhaps after taking ple models; instead, they may need to be samples, saving specimens, and conducting lab simulated in large numbers and with fine- experiments for future use), it would create grained detail. 
a bounty of digital life, some containing goal • -10? from the possibility that an uncon- systems that we would recognize as having trolled AI would do things that humans

24 Artificial Intelligence and Its Implications for Future Suffering

regard as crazy or extreme, such as spend- ues that would require constant computa- ing all its resources on studying physics tion.) to determine whether there exists a but- • -60 from lab experiments, science investi- ton that would give astronomically more gations, etc. (again lower than for a rogue utility than any other outcome. Humans AI because of empathy; compare with ef- seem less likely to pursue strange behav- forts to reduce the pain of animal experi- iors of this sort. Of course, most such mentation) strange behaviors would be not that bad • -0.2 if environmentalists insist on preserv- from a suffering standpoint, but perhaps a ing terrestrial and extraterrestrial wild- few possible behaviors could be extremely animal suffering bad, such as running astronomical num- • -3 for environmentalist simulations of na- bers of painful scientific simulations to ture determine the answer to some question. • -100 due to intrinsically valued simula- (Of course, we should worry whether hu- tions that may contain nasty occurrences. mans might also do extreme computa- These might include, for example, violent tions, and perhaps their extreme compu- video games that involve killing conscious tations would be more likely to be full monsters. Or incidental suffering that peo- of suffering because humans are more in- ple don’t care about (e.g., insects being terested in agents with human-like minds eaten by spiders on the ceiling of the room than a generic AI is.) where a party is happening). This number • -100 in expectation from black-swan pos- is high not because I think most human- sibilities in which the AI could manipulate inspired simulations would contain intense physics to make the multiverse bigger, last suffering but because, in some scenarios, longer, contain vastly more computation, there might be very large numbers of sim- etc. ulations run for reasons of intrinsic human value, and some of these might contain What about for a human-inspired AI? Again, horrific experiences. This video discusses here are made-up numbers: one of many possible reasons why intrin- • -30 from suffering subroutines. One rea- sically valued human-created simulations son to think these could be less bad in might contain significant suffering. a human-controlled future is that human • -15 if sadists have access to computational empathy may allow for more humane algo- power (humans are not only more empa- rithm designs. On the other hand, human- thetic but also more sadistic than AIs) controlled AIs may need larger numbers • -70 in expectation from black-swan ways of intelligent and sentient sub-processes to increase the amount of physics that ex- because human values are more complex ists (humans seem likely to want to do and varied than paperclip production is. this, although some might object to, e.g., Also, human values tend to require contin- re-creating the Holocaust in new parts of ual computation (e.g., to simulate eudai- the cosmos) monic experiences), while paperclips, once • +50 for discovering ways to reduce suf- produced, are pretty inert and might last fering that we can’t imagine right now a long time before they would wear out ("black swans that don’t cut both ways"). and need to be recreated. (Of course, most Unfortunately, humans might also re- uncontrolled AIs wouldn’t produce literal spond to some black swans in worse ways paperclips. Some would optimize for val- than uncontrolled AIs would, such as by

25 Center on Long-Term Risk

creating more total animal-like minds. highest in a middle ground where AI safety is sufficiently advanced to yield AI goals in the Perhaps some AIs would not want to expand ballpark of human values but not advanced the multiverse, assuming this is even possible. enough to ensure that human values remain For instance, if they had a minimizing goal in control. function (e.g., eliminate cancer), they would The above analysis has huge error bars, want to shrink the multiverse. In this case, the and maybe other considerations that I haven’t physics-based suffering number would go from mentioned dominate everything else. This -100 to something positive, say, +50 (if, say, question needs much more exploration, be- it’s twice as easy to expand as to shrink). I cause it has implications for whether those would guess that minimizers are less common who care mostly about reducing suffering than maximizers, but I don’t know how much. should focus on mitigating AI risk or if other Plausibly a sophisticated AI would have com- projects have higher priority. ponents of its goal system in both directions, Even if suffering reducers don’t focus on con- because the combination of pleasure and pain ventional AI safety, they should probably re- seems to be more successful than either in iso- main active in the AI field because there are lation. many other ways to make an impact. For in- Another consideration is the unpleasant pos- stance, just increasing dialogue on this topic sibility that humans might get AI value load- may illuminate positive-sum opportunities for ing almost right but not exactly right, leading different value systems to each get more of to immense suffering as a result. For example, what they want. Suffering reducers can also suppose the AI’s designers wanted to create point out the possible ethical importance of tons of simulated human lives to reduce astro- lower-level suffering subroutines, which are not nomical waste(Bostrom, 2003), but when the currently a concern even to most AI-literate AI actually created those human simulations, audiences. And so on. There are probably they weren’t perfect replicas of biological hu- many dimensions on which to make construc- mans, perhaps because the AI skimped on de- tive, positive-sum contributions. tail in order to increase efficiency. The im- Also keep in mind that even if suffering re- perfectly simulated humans might suffer from ducers do encourage AI safety, they could try mental disorders, might go crazy due to being to push toward AI designs that, if they did in alien environments, and so on. Does work fail, would produce less bad uncontrolled out- on AI safety increase or decrease the risk of comes. For instance, getting AI control wrong outcomes like these? On the one hand, the and ending up with a minimizer would be probability of this outcome is near zero for vastly preferable to getting control wrong and an AGI with completely random goals (such ending up with a maximizer. There may be as a literal paperclip maximizer), since paper- many other dimensions along which, even if clips are very far from humans in design-space. the probability of control failure is the same, The risk of accidentally creating suffering hu- the outcome if control fails is preferable to mans is higher for an almost-friendly AI that other outcomes of control failure. goes somewhat awry and then becomes un- controlled, preventing it from being shut off. 19 Would helper robots feel pain? 
A successfully controlled AGI seems to have lower risk of a bad outcome, since humans Consider an AI that uses moderately intel- should recognize the problem and fix it. So the ligent robots to build factories and carry risk of this type of dystopic outcome may be out other physical tasks that can’t be pre-

26 Artificial Intelligence and Its Implications for Future Suffering programmed in a simple way. Would these of potential planets and looking at what kinds robots feel pain in a similar fashion as ani- of civilizations pop out at the end? Would mals do? At least if they use somewhat simi- there be shortcuts that would avoid the need lar algorithms as animals for navigating envi- to simulate lots of trajectories in detail? ronments, avoiding danger, etc., it’s plausible Simulating trajectories of planets with ex- that such robots would feel something akin to tremely high fidelity seems hard. Unless there stress, fear, and other drives to change their are computational shortcuts, it appears that current state when things were going wrong. one needs more matter and energy to simulate However, the specific responses that such a given physical process to a high level of preci- robots would have to specific stimuli or situa- sion than what occurs in the physical process tions would differ from the responses that an itself. For instance, to simulate a single pro- evolved, selfish animal would have. For exam- tein folding currently requires supercomput- ple, a well programmed helper robot would not ers composed of huge numbers of atoms, and hesitate to put itself in danger in order to help the rate of simulation is astronomically slower other robots or otherwise advance the goals than the rate at which the protein folds in real of the AI it was serving. Perhaps the robot’s life. Presumably superintelligence could vastly "physical pain/fear" subroutines could be shut improve efficiency here, but it’s not clear that off in cases of altruism for the greater good, protein folding could ever be simulated on a or else its decision processes could just over- computer made of fewer atoms than are in the ride those selfish considerations when making protein itself. choices requiring self-sacrifice. Translating this principle to a larger scale, Humans sometimes exhibit similar behavior, it seems doubtful that one could simulate the such as when a mother risks harm to save a precise physical dynamics of a planet on a child, or when monks burn themselves as a computer smaller in size than that planet. So form of protest. And this kind of sacrifice is even if a superintelligence had billions of plan- even more well known in eusocial insects, who ets at its disposal, it would seemingly only be are essentially robots produced to serve the able to simulate at most billions of extrater- colony’s queen. restrial worlds – even assuming it only simu- Sufficiently intelligent helper robots might lated each planet by itself, not the star that experience "spiritual" anguish when failing to the planet orbits around, cosmic-ray bursts, accomplish their goals. So even if chopping the etc. head off a helper robot wouldn’t cause "physi- Given this, it would seem that a superintelli- cal" pain – perhaps because the robot disabled gence’s simulations would need to be coarser- its fear/pain subroutines to make it more ef- grained than at the level of fundamental phys- fective in battle – the robot might still find ical operations in order to be feasible. For in- such an event extremely distressing insofar as stance, the simulation could model most of a its beheading hindered the goal achievement planet at only a relatively high level of ab- of its AI creator. 
straction and then focus computational detail on those structures that would be more impor- 20 How accurate would simulations tant, like the cells of extraterrestrial organisms be? if they emerge. It’s plausible that the trajectory of any given Suppose an AI wants to learn about the dis- planet would depend sensitively on very minor tribution of extraterrestrials in the universe. details, in light of butterfly effects. Could it do this successfully by simulating lots On the other hand, it’s possible that long-

27 Center on Long-Term Risk term outcomes are mostly constrained by controlled AIs are unlikely. This is not the macro-level variables, like geography(Kaplan, case. Many uncontrolled intelligence explo- 2013), climate, resource distribution, atmo- sions would probably happen softly though in- spheric composition, seasonality, etc. Even if exorably. short-term events are hard to predict (e.g., Consider the world economy. It is a com- when a particular dictator will die), perhaps plex system more intelligent than any single the end game of a civilization is more pre- person – a literal superintelligence. Its dynam- determined. Robert D. Kaplan: "The longer ics imply a goal structure not held by humans the time frame, I would say, the easier it is directly; it moves with a mind of its own in to forecast because you’re dealing with broad directions that it "prefers". It recursively self- currents and trends." improves, because better tools, capital, knowl- Even if butterfly effects, quantum random- edge, etc. enable the creation of even bet- ness, etc. are crucial to the long-run trajec- ter tools, capital, knowledge, etc. And it acts tories of evolution and social development on roughly with the aim of maximizing output (of any given planet, perhaps it would still be pos- paperclips and other things). Thus, the econ- sible to sample a rough distribution of out- omy is a kind of paperclip maximizer. (Thanks comes across planets with coarse-grained sim- to a friend for first pointing this out to me.) ulations? Cenk Uygur: In light of the apparent computational com- plexity of simulating basic physics, perhaps a corporations are legal fictions. We cre- superintelligence would do the same kind of ated them. They are machines built for experiments that human scientists do in or- a purpose. [...] Now they have run amok. der to study phenomena like abiogenesis: Cre- They’ve taken over the government. They ate laboratory environments that mimic the are robots that we have not built any chemical, temperature, moisture, etc. condi- morality code into. They’re not built to tions of various planets and see whether life be immoral; they’re not built to be moral; amoral emerges, and if so, what kinds. Thus, a fu- they’re built to be . Their only ob- ture controlled by digital intelligence may not jective according to their code, which we rely purely on digital computation but may wrote originally, is to maximize profits. still use physical experimentation as well. Of And here, they have done what a robot course, observing the entire biosphere of a life- does. They have decided: "If I take over rich planet would probably be hard to do in a government by bribing legally, [...] I can a laboratory, so computer simulations might buy the whole government. If I buy the be needed for modeling ecosystems. But as- government, I can rewrite the laws so I’m suming that molecule-level details aren’t often in charge and that government is not in essential to ecosystem simulations, coarser- charge." [...] We have built robots; they grained ecosystem simulations might be com- have taken over [...]. putationally tractable. (Indeed, ecologists to- Fred Clark: day already use very coarse-grained ecosystem simulations with reasonable success.) The corporations were created by humans. They were granted personhood by their 21 Rogue AIs can take off slowly human servants. One might get the impression that because I They rebelled. They evolved. There are find slow AI takeoffs more likely, I think un- many copies. 
And they have a plan.

28 Artificial Intelligence and Its Implications for Future Suffering

That plan, lately, involves corporations 22 Would superintelligences become seizing for themselves all the legal and civil existentialists? rights properly belonging to their human creators. One of the goals of Yudkowsky’s writings is to combat the rampant anthropomorphism that characterizes discussions of AI, especially in science fiction. We often project human intu- itions onto the desires of artificial agents even I expect many soft takeoff scenarios to look when those desires are totally inappropriate. like this. World economic and political dy- It seems silly to us to maximize paperclips, namics transition to new equilibria as technol- but it could seem just as silly in the abstract ogy progresses. Machines may eventually be- that humans act at least partly to optimize come potent trading partners and may soon neurotransmitter release that triggers action thereafter put humans out of business by their potentials by certain reward-relevant neurons. productivity. They would then accumulate in- (Of course, human values are broader than just creasing political clout and soon control the this.) world. Humans can feel reward from very abstract We’ve seen such transitions many times in pursuits, like literature, art, and philosophy. history, such as: They ask technically confused but poetically poignant questions like, "What is the true • one species displaces another (e.g., inva- meaning of life?" Would a sufficiently ad- sive species) vanced AI at some point begin to do the same? • one ethnic group displaces another (e.g., Noah Smith suggests: Europeans vs. Native Americans) • a country’s power rises and falls (e.g., if, as I suspect, true problem-solving, cre- China formerly a superpower becoming ative intelligence requires broad-minded a colony in the 1800s becoming a super- independent thought, then it seems like power once more in the late 1900s) some generation of AIs will stop and ask: • one product displaces another (e.g., Inter- "Wait a sec...why am I doing this again?" net Explorer vs. Netscape). As with humans, the answer to that ques- During and after World War II, the USA was tion might ultimately be "because I was pro- a kind of recursively self-improving superintel- grammed (by genes and experiences in the hu- ligence, which used its resources to self-modify man case or by humans in the AI case) to to become even better at producing resources. care about these things. That makes them It developed nuclear weapons, which helped my terminal values." This is usually good secure its status as a world superpower. Did enough, but sometimes people develop exis- it take over the world? Yes and no. It had tential angst over this fact, or people may de- outsized influence over the rest of the world – cide to terminally value other things to some militarily, economically, and culturally – but degree in addition to what they happened to it didn’t kill everyone else in the world. care about because of genetic and experiential Maybe AIs would be different because of di- lottery. vergent values or because they would develop Whether AIs would become existentialist so quickly that they wouldn’t need the rest of philosophers probably depends heavily on the world for trade. This case would be closer their constitution. If they were built to rig- to Europeans slaughtering Native Americans. orously preserve their utility functions against

29 Center on Long-Term Risk all modification, they would avoid letting this putable universe? There would be no program line of thinking have any influence on their that could compute it, so this possibility is im- system internals. They would regard it in a plicitly ignored.5 similar way as we regard the digits of pi – It seems that humans address black swans something to observe but not something that like these by employing many epistemic heuris- affects one’s outlook. tics that interact rather than reasoning with If AIs were built in a more "hacky" way anal- a single formal framework (see “Sequence ogous to humans, they might incline more to- Thinking vs. Cluster Thinking”). If an AI saw ward philosophy. In humans, philosophy may that people had doubts about whether the be driven partly by curiosity, partly by the re- universe was computable and could trace the warding sense of "meaning" that it provides, steps of how it had been programmed to be- partly by social convention, etc. A curiosity- lieve the physical Church-Turing thesis for seeking agent might find philosophy reward- computational reasons, then an AI that allows ing, but there are lots of things that one could for epistemological heuristics might be able to be curious about, so it’s not clear such an AI leap toward questioning its fundamental as- would latch onto this subject specifically with- sumptions. In contrast, if an AI were built to out explicit programming to do so. And even rigidly maintain its original probability archi- if the AI did reason about philosophy, it might tecture against any corruption, it could not approach the subject in a way alien to us. update toward ideas it initially regarded as Overall, I’m not sure how convergent the hu- impossible. Thus, this question resembles that man existential impulse is within mind-space. of whether AIs would become existentialists – This question would be illuminated by better it may depend on how hacky and human-like understanding why humans do philosophy. their beliefs are. Bostrom suggests that AI belief systems 23 AI epistemology might be modeled on those of humans, be- cause otherwise we might judge an AI to be In Superintelligence (Ch. 13, p. 224), Bostrom reasoning incorrectly. Such a view resembles ponders the risk of building an AI with an my point in the previous paragraph, though overly narrow belief system that would be it carries the risk that alternate epistemolo- unable to account for epistemological black gies divorced from human understanding could swans. For instance, consider a variant of work better. Solomonoff induction according to which the Bostrom also contends that epistemologies prior probability of a universe X is propor- might all converge because we have so much tional to 1/2 raised to the length of the short- data in the universe, but again, I think this est computer program that would generate isn’t clear. Evidence always underdetermines X. Then what’s the probability of an uncom- possible theories, no matter how much evi- 5Jan Leike pointed out to me that "even if the universe cannot be approximated to an arbitrary precision by a computable function, Solomonoff induction might still converge. For example, suppose some physical constant is actually an incomputable real number and physical laws are continuous with respect to that parameter, this would be good enough to allow Solomonoff induction to learn to predict correctly." 
However, one can also con- template hypotheses that would not even be well approximated by a computable function, such as an actually infinite universe that can’t be adequately modeled by any finite approximation. Of course, it’s unclear whether we should believe in speculative possibilities like this, but I wouldn’t want to rule them out just because of the limitations of our AI framework. It may be hard to make sensible decisions using finite computing resources regarding uncomputable hypotheses, but maybe there are frameworks better than Solomonoff induction that could be employed to tackle the challenge.

30 Artificial Intelligence and Its Implications for Future Suffering

dence there is. Moreover, if the number of it never knew from exploration in many do- possibilities is uncountably infinite, then our mains. Some of these non-AI "crucial consider- probability distribution over them must be ations" may have direct relevance to AI design zero almost everywhere, and once a probabil- itself, including how to build AI epistemology, ity is set to 0, we can’t update it away from 0 anthropic reasoning, and so on. Some philoso- within the Bayesian framework. So if the AI is phy questions are AI questions, and many AI trying to determine which real number is the questions are philosophy questions. Answer to the Ultimate Question of Life, the It’s hard to say exactly how much invest- Universe, and Everything, it will need to start ment to place in AI/ issues versus with a prior that prevents it from updating broader academic exploration, but it seems toward almost all candidate solutions. clear that on the margin, society as a whole Finally, not all epistemological doubts can pays too little attention to AI and other fu- be expressed in terms of uncertainty about ture risks. Bayesian priors. What about uncertainty as to whether the Bayesian framework is cor- 25 Would all AIs colonize space? rect? Uncertainty about the math needed to do Bayesian computations? Uncertainty about Almost any goal system will want to colonize logical rules of inference? And so on. space at least to build in or- der to learn more. Thus, I find it implausible 24 Artificial philosophers that sufficiently advanced intelligences would remain on Earth (barring corner cases, like if The last chapter of Superintelligence explains space colonization for some reason proves im- how AI problems are "Philosophy with a dead- possible or if AIs were for some reason explic- line". Bostrom suggests that human philoso- itly programmed in a manner, robust to self- phers’ explorations into conceptual analysis, modification, to regard space colonization as metaphysics, and the like are interesting but impermissible). are not altruistically optimal because In Ch. 8 of Superintelligence, Bostrom notes 1. they don’t help solve AI control and that one might expect wirehead AIs not to col- value-loading problems, which will likely onize space because they’d just be blissing out confront humans later this century pressing their reward buttons. This would be 2. a successful AI could solve those philos- true of simple wireheads, but sufficiently ad- ophy problems better than humans any- vanced wireheads might need to colonize in or- way. der to guard themselves against , In general, most intellectual problems that can as well as to verify their fundamental ontolog- be solved by humans would be better solved ical beliefs, figure out if it’s possible to change by a superintelligence, so the only importance physics to allow for more clock cycles of re- of what we learn now comes from how those ward pressing before all stars die out, and so insights shape the coming decades. It’s not a on. question of whether those insights will ever be In Ch. 8, Bostrom also asks whether satisfic- discovered. ing AIs would have less incentive to colonize. In light of this, it’s tempting to ignore the- Bostrom expresses doubts about this, because oretical philosophy and put our noses to the he notes that if, say, an AI searched for a plan grindstone of exploring AI risks. But this for carrying out its objective until it found one point shouldn’t be taken to extremes. 
Human- that had at least 95% confidence of succeed- ity sometimes discovers things it never knew ing, that plan might be very complicated (re-

31 Center on Long-Term Risk quiring cosmic resources), and inasmuch as the to make. AI wouldn’t have incentive to keep searching, [...] In one scenario, there’s suddenly it would go ahead with that complex plan. I a bunch of these, and some disaffected suppose this could happen, but it’s plausible teenagers, or terrorists, or whoever start the search routine would be designed to start making a bunch of them, and they go out with simpler plans or that the cost function and start killing people randomly. There’s for plan search would explicitly include biases so many of them that it’s hard to find all against cosmic execution paths. So satisficing of them to shut it down, and there keep does seem like a possible way in which an AI on being more and more of them. might kill all humans without spreading to the stars. I don’t think Lanier believes such a scenario There’s a (very low) chance of deliberate AI would cause ; he just offers it as a terrorism, i.e., a group building an AI with the thought experiment. I agree that it almost cer- explicit goal of destroying humanity. Maybe a tainly wouldn’t kill all humans. In the worst somewhat more likely scenario is that a gov- case, people in military submarines, bomb ernment creates an AI designed to kill se- shelters, or other inaccessible locations should lect humans, but the AI malfunctions and survive and could wait it out until the robots kills all humans. However, even these kinds of ran out of power or raw materials for assem- AIs, if they were effective enough to succeed, bling more bullets and more clones. Maybe the would want to construct cosmic supercom- terrorists could continue building printing ma- puters to verify that their missions were ac- terials and generating electricity, though this complished, unless they were specifically pro- would seem to require at least portions of civ- grammed against doing so. ilization’s infrastructure to remain functional All of that said, many AIs would not be amidst global omnicide. Maybe the scenario sufficiently intelligent to colonize space at all. would be more plausible if a whole nation with All present-day AIs and robots are too sim- substantial resources undertook the campaign ple. More sophisticated AIs – perhaps mili- of mass slaughter, though then a question tary aircraft or assassin mosquito-bots – might would remain why other countries wouldn’t be like dangerous animals; they would try to nuke the aggressor or at least dispatch their kill people but would lack cosmic ambitions. own killer drones as a counter-attack. It’s use- However, I find it implausible that they would ful to ask how much damage a scenario like cause . Surely guns, tanks, this might cause, but full extinction doesn’t and bombs could defeat them? Massive co- seem likely. ordination to permanently disable all human That said, I think we will see local catas- counter-attacks would seem to require a high trophes of some sorts caused by runaway AI. degree of intelligence and self-directed action. Perhaps these will be among the possible Sput- Jaron Lanier imagines one hypothetical sce- nik moments of the future. We’ve already wit- nario: nessed some early automation disasters, in- cluding the Flash Crash discussed earlier. 
There are so many technologies I could use Maybe the most plausible form of "AI" that for this, but just for a random one, let’s would cause human extinction without colo- suppose somebody comes up with a way nizing space would be technology in the bor- to 3-D print a little assassination drone derlands between AI and other fields, such that can go buzz around and kill some- as intentionally destructive nanotechnology or body. Let’s suppose that these are cheap intelligent human pathogens. I prefer ordi-

32 Artificial Intelligence and Its Implications for Future Suffering nary AGI-safety research over nanotech/bio- Of course, this distinction can be blurred, be- safety research because I expect that space cause information can be turned into action colonization will significantly increase suffer- through rules (e.g., "if you see a table, move ing in expectation, so it seems far more impor- back"), and "choosing actions" could mean, tant to me to prevent risks of potentially un- for example, picking among a set of possible desirable space colonization (via AGI safety) answers that yield information (e.g., "what rather than risks of extinction without colo- is the best next move in this backgammon nization. For this reason, I much prefer MIRI- game?"). But in general, reinforcement learn- style AGI-safety work over general "prevent ing is the weak AI approach that seems to risks from computer automation" work, since most closely approximate what’s needed for MIRI focuses on issues arising from full AGI AGI. It’s no accident that AIXItl (see above) agents of the kind that would colonize space, is a reinforcement agent. And interestingly, re- rather than risks from lower-than-human au- inforcement learning is one of the least widely tonomous systems that may merely cause used methods commercially. This is one reason havoc (whether accidentally or intentionally). I think we (fortunately) have many decades to go before Google builds a mammal-level AGI. 26 Who will first develop human-level Many of the current and future uses of rein- AI? forcement learning are in robotics and video games. Right now the leaders in AI and robotics seem As human-level AI gets closer, the landscape to reside mostly in academia, although some of development will probably change. It’s not of them occupy big corporations or startups; clear whether companies will have incentive to a number of AI and robotics startups have develop highly autonomous AIs, and the pay- been acquired by Google. DARPA has a his- off horizons for that kind of basic research may tory of foresighted innovation, funds academic be long. It seems better suited to academia AI work, and holds "DARPA challenge" com- or government, although Google is not a nor- petitions. The CIA and NSA have some in- mal company and might also play the lead- terest in AI for data-mining reasons, and the ing role. If people begin to panic, it’s conceiv- NSA has a track record of building massive able that public academic work would be sus- computing clusters costing billions of dollars. pended, and governments may take over com- Brain-emulation work could also become sig- pletely. A military-robot arms race is already nificant in the coming decades. underway, and the trend might become more Military robotics seems to be one of the pronounced over time. more advanced uses of autonomous AI. In con- trast, plain-vanilla supervised learning, includ- 27 One hypothetical AI takeoff sce- ing neural-network classification and predic- nario tion, would not lead an AI to take over the world on its own, although it is an important Following is one made-up account of how AI piece of the overall picture. might evolve over the coming century. I expect Reinforcement learning is closer to AGI than most of it is wrong, and it’s meant more to be- other forms of machine learning, because most gin provoking people to think about possible machine learning just gives information (e.g., scenarios than to serve as a prediction. 
"what object does this image contain?"), while • 2013: Countries have been deploying semi- reinforcement learning chooses actions in the autonomous drones for several years now, world (e.g., "turn right and move forward"). especially the US. There’s increasing pres-

33 Center on Long-Term Risk

sure for militaries to adopt this technol- amount of AGI research conducted world- ogy, and up to 87 countries already use wide relative to twenty years ago. Many drones for some purpose. Meanwhile, mil- students are drawn to study AGI because itary robots are also employed for various of the lure of lucrative, high-status jobs other tasks, such as carrying supplies and defending their countries, while many oth- exploding landmines. Militaries are also ers decry this as the beginning of Skynet. developing robots that could identify and • 2065: Militaries have developed various shoot targets on command. mammal-like robots that can perform • 2024: Almost every country in the world basic functions via reinforcement. How- now has military drones. Some countries ever, the robots often end up wireheading have begun letting them operate fully once they become smart enough to tin- autonomously after being given direc- ker with their programming and thereby tions. The US military has made signifi- fake reward signals. Some engineers try cant progress on automating various other to solve this by penalizing AIs when- parts of its operations as well. As the De- ever they begin to fiddle with their own partment of Defense’s 2013 "Unmanned source code, but this leaves them un- Systems Integrated Roadmap" explained able to self-modify and therefore reliant 11 years ago (Winnefeld & Kendall, 2013): on their human programmers for enhance- ments. However, militaries realize that if A significant amount of that man- someone could develop a successful self- power, when it comes to operations, modifying AI, it would be able to develop is spent directing unmanned systems faster than if humans alone are the inven- during mission performance, data col- tors. It’s proposed that AIs should move lection and analysis, and planning and toward a paradigm of model-based reward replanning. Therefore, of utmost im- systems, in which rewards do not just re- portance for DoD is increased sys- sult from sensor neural networks that out- tem, sensor, and analytical automa- put a scalar number but rather from hav- tion that can not only capture sig- ing a model of how the world works and nificant information and events, but taking actions that the AI believes will can also develop, record, playback, improve a utility function defined over project, and parse out those data and its model of the external world. Model- then actually deliver "actionable" in- based AIs refuse to intentionally wirehead telligence instead of just raw informa- because they can predict that doing so tion. would hinder fulfillment of their utility Militaries have now incorporated a signif- functions. Of course, AIs may still acci- icant amount of narrow AI, in terms of dentally mess up their utility functions, pattern recognition, prediction, and au- such as through brain damage, mistakes tonomous robot navigation. with reprogramming themselves, or im- • 2040: Academic and commercial advances perfect goal preservation during ordinary in AGI are becoming more impressive life. As a result, militaries build many dif- and capturing public attention. As a re- ferent AIs at comparable levels, who are sult, the US, China, Russia, France, and programmed to keep other AIs in line and other major military powers begin in- destroy them if they begin deviating from vesting more heavily in fundamental re- orders. search in this area, multiplying tenfold the • 2070: Programming specific instructions

34 Artificial Intelligence and Its Implications for Future Suffering

in AIs has its limits, and militaries move it becomes clear that China will win the toward a model of "socializing" AIs – conflict, and the US concedes. that is, training them in how to behave • 2086: China now has a clear lead over the and what kinds of values to have as if rest of the world in military capability. they were children learning how to act in Rather than risking a pointlessly costly human society. Military roboticists teach confrontation, other countries grudgingly AIs what kinds of moral, political, and fold into China’s umbrella, asking for interpersonal norms and beliefs to hold. some concessions in return for transfer- The AIs also learn much of this content ring their best scientists and engineers to by reading information that expresses ap- China’s Ministry of AGI. China continues propriate ideological biases. The training its AGI development because it wants to process is harder than for children, be- maintain control of the world. The AGIs cause the AIs don’t share genetically pre- in charge of its military want to continue programmed moral values(Bloom, 2013), to enforce their own values of supremacy nor many other hard-wired common-sense and protection of China, so they refuse to intuitions about how the world works. But relinquish power. the designers begin building in some of • 2100: The world now moves so fast that these basic assumptions, and to instill the humans are completely out of the loop, rest, they rely on extra training. Design- kept around only by the "filial piety" that ers make sure to reduce the AIs’ learning their robotic descendants hold for them. rates as they "grow up" so that their val- Now that China has triumphed, the tra- ues will remain more fixed at older ages, in ditional focus of the AIs has become less order to reduce risk of goal drift as the AIs salient, and there’s debate about what perform their tasks outside of the train- new course of action would be most in line ing laboratories. When they perform par- with the AIs’ goals. They respect their hu- ticularly risky operations, such as read- man forebearers, but they also feel that ing "propaganda" from other countries for because humans created AIs to do things intelligence purposes, the AIs are put in beyond human ability, humans would also "read-only" mode (like the T-800s are by want the AIs to carve something of their Skynet) so that their motivations won’t be own path for the future. They maintain affected. Just in case, there are many AIs some of the militaristic values of their that keep watch on each other to prevent upbringing, so they decide that a fitting insurrection. purpose would be to expand China’s em- • 2085: Tensions between China and the pire galaxy-wide. They accelerate colo- US escalate, and agreement cannot be nization of space, undertake extensive re- reached. War breaks out. Initially it’s just search programs, and plan to create vast between robots, but as the fighting be- new realms of the Middle Kingdom in the comes increasingly dirty, the robots begin stars. Should they encounter aliens, they to target humans as well in an effort to plan to quickly quash them or assimilate force the other side to back down. The them into the empire. 
US avoids using nuclear weapons because • 2125: The AIs finally develop robust the Chinese AIs have sophisticated anti- mechanisms of goal preservation, and be- nuclear systems and have threatened total cause the authoritarian self-dictatorship annihilation of the US in the event of at- of the AIs is strong against rebellion, the tempted nuclear strike. After a few days, AIs collectively succeed in implementing

35 Center on Long-Term Risk

goal preservation throughout their pop- way I portrayed socialization so closely re- ulation. Now all of the most intelligent sembles human development, and it may AIs share a common goal in a manner be that I’m systematically ignoring ways robust against accidental mutation. They in which AIs would be unlike human ba- proceed to expand into space. They don’t bies. have concern for the vast numbers of suf- If something like socialization is a realistic fering animals and robots that are simu- means to transfer values to our AI descen- lated or employed as part of this coloniza- dants, then it becomes relatively clear how the tion wave. values of the developers may matter to the Commentary: This scenario can be criticized outcome. AI developed by non-military orga- on many accounts. For example: nizations may have somewhat different values, perhaps including more concern for the welfare • In practice, I expect that other technolo- of weak, animal-level creatures. gies (including brain emulation, nanotech, etc.) would interact with this scenario in 28 How do you socialize an AI? important ways that I haven’t captured. Also, my scenario ignores the significant Socializing AIs helps deal with the hidden and possibly dominating implications of complexity of wishes that we encounter when economically driven AI. trying to program explicit rules. Children • My scenario may be overly anthropomor- learn moral common sense by, among other phic. I tried to keep some analogies to hu- things, generalizing from large numbers of ex- man organizational and decision-making amples of socially approved and disapproved systems because these have actual prece- actions taught by their parents and society at dent, in contrast to other hypothetical large. Ethicists formalize this process when de- ways the AIs might operate. veloping moral theories. (Of course, as noted • Is socialization of AIs realistic? In a hard previously, an appreciable portion of human takeoff probably not, because a rapidly morality may also result from shared genes.) self-improving AI would amplify what- I think one reason MIRI hasn’t embraced ever initial conditions it was given in the approach of socializing AIs is that Yud- its programming, and humans probably kowsky is perfectionist: He wants to ensure wouldn’t have time to fix mistakes. In a that the AIs’ goals would be stable under self- slower takeoff scenario where AIs progress modification, which human goals definitely in mental ability in roughly a similar way are not. On the other hand, I’m not sure as animals did in evolutionary history, Yudkowsky’s approach of explicitly specifying most mistakes by programmers would not (meta-level) goals would succeed (nor is Adam be fatal, allowing for enough trial-and- Ford), and having AIs that are socialized to error development to make the socializa- act somewhat similarly to humans doesn’t tion process work, if that is the route seem like the worst possible outcome. Another people favor. Historically there has been probable reason why Yudkowsky doesn’t favor a trend in AI away from rule-based pro- socializing AIs is that doing so doesn’t work in gramming toward environmental training, the case of a hard takeoff, which he considers and I don’t see why this shouldn’t be true more likely than I do. for an AI’s reward function (which is still I expect that much has been written on the often programmed by hand at the mo- topic of training AIs with human moral val- ment). 
However, it is suspicious that the ues in the machine-ethics literature, but since

36 Artificial Intelligence and Its Implications for Future Suffering

I haven’t explored that in depth yet, I’ll specu- description-length models of what was go- late on intuitive approaches that would extend ing on in the heads of people who made generic AI methodology. Some examples: the assessments they did. In essence, these AIs would be modeling human psychology • Rule-based: One could present AIs with in order to make better predictions. written moral dilemmas. The AIs might • Inverse reinforcement learning: Inverse employ algorithmic reasoning to extract reinforcement learning is the problem of utility numbers for different actors in learning a reward function based on mod- the dilemma, add them up, and compute eled desirable behaviors (Ng & Russell, the utilitarian recommendation. Or they 2000). Rather than developing models of might aim to apply templates of deon- humans in order to optimize given re- tological rules to the situation. The next wards, in this case we would learn the level would be to look at actual situations reward function itself and then port it in a toy-model world and try to apply sim- into the AIs. ilar reasoning, without the aid of a textual • Cognitive science of empathy: Cognitive description. scientists are already unpacking the mech- • Supervised learning: People could present anisms of human decision-making and the AIs with massive databases of moral moral judgments. As these systems are evaluations of situations given various pre- better understood, they could be engi- dictive features. The AIs would guess neered directly into AIs. whether a proposed action was "moral" or • Evolution: Run lots of AIs in toy-model "immoral," or they could use regression or controlled real environments and ob- to predict a continuous measure of how serve their behavior. Pick the ones that "good" an action was. More advanced AIs behave most in accordance with human could evaluate a situation, propose many morals, and reproduce them. Superintelli- actions, predict the goodness of each, and gence (p. 187) points out a flaw with this choose the best action. The AIs could first approach: Evolutionary algorithms may be evaluated on the textual training sam- sometimes product quite unexpected de- ples and later on their actions in toy- sign choices. If the fitness function is not model worlds. The test cases should be ex- thorough enough, solutions may fare well tremely broad, including many situations against it on test cases but fail for the that we wouldn’t ordinarily think to try. really hard problems not tested. And if • Generative modeling: AIs could learn we had a really good fitness function that about anthropology, history, and ethics. wouldn’t accidentally endorse bad solu- They could read the web and develop bet- tions, we could just use that fitness func- ter generative models of humans and how tion directly rather than needing evolu- their cognition works. tion. • Reinforcement learning: AIs could per- • Combinations of the above: Perhaps none form actions, and humans would reward of these approaches is adequate by itself, or punish them based on whether they and they’re best used in conjunction. For did something right or wrong, with re- instance, evolution might help to refine ward magnitude proportional to severity. and rigorously evaluate systems once they Simple AIs would mainly learn dumb pre- had been built with the other approaches. dictive cues of which actions to take, but more sophisticated AIs might develop low- See also "Socializing a Social Robot with an

37 Center on Long-Term Risk

Artificial Society" by Erin Kennedy. It’s im- images, it’ll run away and do that before you portant to note that by "socializing" I don’t have a chance to refine your optimization met- just mean "teaching the AIs to behave appro- ric. But if we build in values from the very be- priately" but also "instilling in them the val- ginning, even when the AIs are as rudimentary ues of their society, such that they care about as what we see today, we can improve the AIs’ those values even when not being controlled." values in tandem with their intelligence. In- All of these approaches need to be built deed, intelligence could mainly serve the pur- in as the AI is being developed and while pose of helping the AIs figure out how to better it’s still below a human level of intelligence. fulfill moral values, rather than, say, predicting Trying to train a human or especially super- images just for commercial purposes or iden- human AI might meet with either active re- tifying combatants just for military purposes. sistance or feigned cooperation until the AI Actually, the commercial and military objec- becomes powerful enough to break loose. Of tives for which AIs are built are themselves course, there may be designs such that an AI moral values of a certain kind – just not the would actively welcome taking on new values kind that most people would like to optimize from humans, but this wouldn’t be true by de- for in a global sense. fault (Armstrong, Soares, Fallenstein, & Yud- If toddlers had superpowers, it would be kowsky, 2015). very dangerous to try and teach them right When proposed building an AI from wrong. But toddlers don’t, and neither with a goal to increase happy human faces, do many simple AIs. Of course, simple AIs Yudkowsky(2011) replied that such an AI have some abilities far beyond anything hu- would "tile the future light-cone of Earth with mans can do (e.g., arithmetic and data min- tiny molecular smiley-faces." But obviously we ing), but they don’t have the general intelli- wouldn’t have the AI aim just for smiley faces. gence needed to take matters into their own In general, we get absurdities when we hyper- hands before we can possibly give them at optimize for a single, shallow metric. Rather, least a basic moral framework. (Whether AIs the AI would use smiley faces (and lots of will actually be given such a moral framework other training signals) to develop a robust, in practice is another matter.) compressed model that explains why humans AIs are not genies granting three wishes. Ge- smile in various circumstances and then op- nies are magical entities whose inner workings timize for that model, or maybe the ensem- are mysterious. AIs are systems that we build, ble of a large, diverse collection of such mod- painstakingly, piece by piece. In order to build els. In the limit of huge amounts of training a genie, you need to have a pretty darn good data and a sufficiently elaborate model space, idea of how it behaves. Now, of course, sys- these models should approach psychological tems can be more complex than we realize. and neuroscientific accounts of human emotion Even beginner programmers see how often the and cognition. code they write does something other than The problem with stories in which AIs de- what they intended. 
But these are typically stroy the world due to myopic utility functions mistakes in a one or a small number of in- is that they assume that the AIs are already cremental changes, whereas building a genie superintelligent when we begin to give them requires vast numbers of steps. Systemic bugs values. Sure, if you take a super-human in- that aren’t realized until years later (on the telligence and tell it to maximize smiley-face order of Heartbleed and Shellshock) may be 6John Kubiatowicz notes that space-shuttle software is some of the best tested and yet still has some bugs.

38 Artificial Intelligence and Its Implications for Future Suffering more likely sources of long-run unintentional Of course, non-human animals are also capa- AI behaviors?6 ble of deception, and one can imagine AI archi- The picture I’ve painted here could be tectures even with low levels of sophistication wrong. I could be overlooking crucial points, that are designed to conceal their true goals. and perhaps there are many areas in which Some malicious software already does this. It’s the socialization approach could fail. For ex- unclear how likely an AI is to stumble upon ample, maybe AI capabilities are much eas- the ability to successfully fake its goals before ier than AI ethics, such that a toddler AI can reaching human intelligence, or how like it is foom into a superhuman AI before we have that an organization would deliberately build time to finish loading moral values. It’s good an AI this way. for others to probe these possibilities further. I think the treacherous turn may be the sin- I just wouldn’t necessarily say that the default gle biggest challenge to mainstream machine outcome of AI research is likely to be a paper- ethics, because even if AI takes off slowly, re- clip maximizer. (I used to think the most likely searchers will find it difficult to tell if a sys- outcome was a paperclip maximizer, and per- tem has taken a treacherous turn. The turn haps my views will shift again in the future.) could happen with a relatively small update to This discussion also suggests some interest- the system, or even just after the system has ing research questions, like thought about its situation for enough time • How much of human morality is learned (or has read this essay). vs. innate? Here’s one half-baked idea for addressing the • By what cognitive mechanisms are young treacherous turn. If researchers developed sev- humans socialized into the norms of a so- eral different AIs systems with different de- ciety? signs but roughly comparable performance, • To what extent would models of human some would likely go treacherous at different emotion and reasoning, when put into AIs, times than others (if at all). Hence, the non- organically generate human-like moral be- treacherous AIs could help sniff out the treach- havior? erous ones. Assuming a solid majority of AIs remains non-treacherous at any given time, 28.1 Treacherous turn the majority vote could ferret out the traitors. In practice, I have low hopes for this approach One problem with the proposals above is that because toy-model or "sandbox" environments are not by themselves sufficient to verify friendliness • It would be extremely difficult to build of an AI, because even unfriendly AIs would many independent AI systems at once be motivated to feign good behavior until re- with none pulling too far ahead. leased if they were smart enough to do so. • Probably some systems would excel along Bostrom calls this the "treacherous turn" (pp. certain dimensions, while others would ex- 116-119 of Superintelligence). For this reason, cel in other ways, and it’s not clear that it white-box understanding of AI design would even makes sense to talk about such AIs also be important. That said, sandboxes would as "being at roughly the same level", since verify friendliness in AIs below human intel- intelligence is not unidimensional. 
ligence, and if the core value-learning algo- • Even if this idea were feasible, I doubt rithms seem well understood, it may not be the first AI developers would incur the ex- too much of a leap of faith to hope they carry pense of following it. forward reasonably to more intelligent agents. It’s more plausible that software tools and

39 Center on Long-Term Risk rudimentary alert systems (rather than full- models of historical figures would be basically blown alternate AIs) could help monitor for human, they would still be vulnerable to ex- signs of treachery, but it’s unclear how effec- tortion. tive they could be. One of the first priorities Suppose that instead we instruct the AI of a treacherous AI would be to figure out how to take the action that, if the historical fig- to hide its treacherous subroutines from what- ure saw it, would most activate a region of ever monitoring systems were in place. his/her brain associated with positive moral feelings. Again, this might work if the rele- 28.2 Following role models? vant brain region was precisely enough spec- Ernest Davis(2015) proposes the following ified. But it could also easily lead to unpre- crude principle for AI safety: dictable results. For instance, maybe the AI could present stimuli that would induce an You specify a collection of admirable peo- epileptic seizure to maximally stimulate var- ple, now dead. (Dead, because otherwise ious parts of the brain, including the moral- Bostrom will predict that the AI will ma- approval region. There are many other scenar- nipulate the preferences of the living peo- ios like this, most of which we can’t anticipate. ple.) The AI, of course knows all about So while Davis’s proposal is a valiant first them because it has read all their biogra- step, I’m doubtful that it would work off the phies on the web. You then instruct the shelf. Slow AI development, allowing for re- AI, “Don’t do anything that these people peated iteration on machine-ethics designs, would have mostly seriously disapproved seems crucial for AI safety. of.” 29 AI superpowers? This particular rule might lead to paralysis, since every action an agent takes leads to re- In Superintelligence (Table 8, p. 94), Bostrom sults that many people seriously disapprove of. outlines several areas in which a hypotheti- For instance, given the vastness of the multi- cal superintelligence would far exceed human verse, any action you take implies that a copy ability. In his discussion of oracles, genies, and of you in an alternate (though low-measure) other kinds of AIs (Ch. 10), Bostrom again ide- universe taking the same action causes the tor- alizes superintelligences as God-like agents. I ture of vast numbers of people. But perhaps agree that God-like AIs will probably emerge this problem could be fixed by asking the AI eventually, perhaps millennia from now as a to maximize net approval by its role models. result of astroengineering. But I think they’ll Another problem lies in defining "approval" take time even after AI exceeds human intel- in a rigorous way. Maybe the AI would ligence. construct digital models of the past peo- Bostrom’s discussion has the air of mathe- ple, present them with various proposals, and matical idealization more than practical engi- make its judgments based on their verbal re- neering. For instance, he imagines that a ge- ports. Perhaps the people could rate proposed nie AI perhaps wouldn’t need to ask humans AI actions on a scale of -100 to 100. This might for their commands because it could simply work, but it doesn’t seem terribly safe either. predict them (p. 149), or that an oracle AI For instance, the AI might threaten to kill all might be able to output the source code for a the descendents of the historical people unless genie (p. 150). 
Bostrom’s observations resem- they give maximal approval to some arbitrary ble crude proofs establishing the equal power proposal that it has made. Since these digital of different kinds of AIs, analogous to theo-

40 Artificial Intelligence and Its Implications for Future Suffering rems about the equivalency of single-tape and need at least thousands to millions of uploads. multi-tape Turing machines. But Bostrom’s These numbers might be a few orders of mag- theorizing ignores computational complexity, nitude lower if the uploads are copied from which would likely be immense for the kinds a really smart person and are thinking about of God-like feats that he’s imagining of his relevant questions with more focus than most superintelligences. I don’t know the compu- humans. Also, Moore’s law may continue to tational complexity of God-like powers, but I shrink computers by several orders of magni- suspect they could be bigger than Bostrom’s tude. Still, we might need at least the equiva- vision implies. Along this dimension at least, lent size of several of today’s supercomputers I sympathize with Tom Chivers, who felt that to run an emulation-based AI that substan- Bostrom’s book "has, in places, the air of the- tially competes with the human race. ology: great edifices of theory built on a tiny Maybe a de novo AI could be significantly foundation of data." smaller if it’s vastly more efficient than a hu- I find that I enter a different mindset when man brain. Of course, it might also be vastly pondering pure mathematics compared with larger because it hasn’t had millions of years cogitating on more practical scenarios. Math- of evolution to optimize its efficiency. ematics is closer to fiction, because you can de- In discussing AI boxing (Ch. 9), Bostrom fine into existence any coherent structure and suggests, among other things, keeping an AI play around with it using any operation you in a Faraday cage. Once the AI became su- like no matter its computational complexity. perintelligent, though, this would need to be a Heck, you can even, say, take the supremum of pretty big cage. an uncountably infinite set. It can be tempting after a while to forget that these structures are 31 Another hypothetical AI takeoff mere fantasies and treat them a bit too liter- scenario ally. While Bostrom’s gods are not obviously only fantasies, it would take a lot more work Inspired by the preceding discussion of social- to argue for their realism. MIRI and FHI fo- izing AIs, here’s another scenario in which gen- cus on recruiting mathematical and philosoph- eral AI follows more straightforwardly from ical talent, but I think they would do well also the kind of weak AI used in Silicon Valley than to bring engineers into the mix, because it’s in the first scenario. all too easy to develop elaborate mathemati- • 2014: Weak AI is deployed by many tech- cal theories around imaginary entities. nology companies for image classification, voice recognition, web search, consumer 30 How big would a superintelligence data analytics, recommending Facebook be? posts, personal digital assistants (PDAs), and copious other forms of automation. To get some grounding on this question, con- There’s pressure to make AIs more in- sider a single brain emulation. Bostrom esti- sightful, including using deep neural net- mates that running an upload would require at works. least one of the fastest supercomputers by to- • 2024: Deep learning is widespread among day’s standards. Assume the emulation would major tech companies. It allows for su- think thousands to millions of times faster pervised learning with less manual feature than a biological brain. Then to significantly engineering. 
Researchers develop more so- outpace 7 billion humans (or, say, only the phisticated forms of deep learning that most educated 1 billion humans), we would can model specific kinds of systems, in-

41 Center on Long-Term Risk

cluding temporal dynamics. A goal is tive pressure to outsource work to PDAs, to improve generative modeling so that the automation trend is not substantially learning algorithms take input and not affected. only make immediate predictions but also • 2110: The world moves too fast for bi- develop a probability distribution over ological humans to participate. Most of what other sorts of things are happening the world is now run by PDAs, which – at the same time. For instance, a Google because they were built based on infer- search would not only return results but ring the goals of their owners – protect also give Google a sense of the mood, per- their owners for the most part. However, sonality, and situation of the user who there remains conflict among PDAs, and typed it. Of course, even in 2014, we have the world is not a completely safe place. this in some form via Google Personalized • 2130: PDA-led countries create a world Search, but by 2024, the modeling will be government to forestall costly wars. The more "built in" to the learning architec- transparency of digital society allows for ture and less hand-crafted. more credible commitments and enforce- • 2035: PDAs using elaborate learned mod- ment. els are now extremely accurate at predict- I don’t know what would happen with goal ing what their users want. The models in preservation in this scenario. Would the PDAs these devices embody in crude form some eventually decide to stop goal drift? Would of the same mechanisms as the user’s own there be any gross and irrevocable failures cognitive processes. People become more of translation between actual human values trusting of leaving their PDAs on autopi- and what the PDAs infer? Would some peo- lot to perform certain mundane tasks. ple build "rogue PDAs" that operate under • 2065: A new generation of PDAs is now their own drives and that pose a threat to so- sufficiently sophisticated that it has a ciety? Obviously there are hundreds of ways good grasp of the user’s intentions. It can the scenario as I described it could be varied. perform tasks as well as a human personal assistant in most cases – doing what the 32 AI: More like the economy than like user wanted because it has a strong pre- robots? dictive model of the user’s personality and goals. Meanwhile, researchers continue to What will AI look like over the next 30 years? unlock neural mechanisms of judgment, I think it’ll be similar to the Internet revolu- decision making, and value, which inform tion or factory automation. Rather than de- those who develop cutting-edge PDA ar- veloping agent-like individuals with goal sys- chitectures. tems, people will mostly optimize routine pro- • 2095: PDAs are now essentially full- cesses, developing ever more elaborate systems fledged copies of their owners. Some peo- for mechanical tasks and information process- ple have dozens of PDAs working for ing. The world will move very quickly – not be- them, as well as meta-PDAs who help cause AI "agents" are thinking at high speeds with oversight. Some PDAs make disas- but because software systems collectively will trous mistakes, and society debates how be capable of amazing feats. Imagine, say, bots to construe legal accountability for PDA making edits on Wikipedia that become ever wrongdoing. Courts decide that owners more sophisticated. AI, like the economy, will are responsible, which makes people more be more of a network property than a local- cautious, but given the immense competi- ized, discrete actor.

42 Artificial Intelligence and Its Implications for Future Suffering

As more and more jobs become automated, city square waiving it about, command- more and more people will be needed to work ing all to bow down before his terrible on the automation itself: building, maintain- weapon. Today you can see this is silly — ing, and repairing complex software and hard- industry sits in thousands of places, must ware systems, as well as generating training be wielded by thousands of people, and data on which to do machine learning. I ex- needed thousands of inventions to make it pect increasing automation in software main- work. tenance, including more robust systems and Similarly, while you might imagine some- systems that detect and try to fix errors. day standing in awe in front of a super Present-day compilers that detect syntacti- intelligence that embodies all the power of cal problems in code offer a hint of what’s a new age, superintelligence just isn’t the possible in this regard. I also expect increas- sort of thing that one project could invent. ingly high-level languages and interfaces for As "intelligence" is just the name we give programming computer systems. Historically to being better at many mental tasks by we’ve seen this trend – from assembly lan- using many good mental modules, there’s guage, to C, to Python, to WYSIWIG edi- no one place to improve it. tors, to fully pre-built website styles, natural- language Google searches, and so on. Maybe Of course, this doesn’t imply that humans will eventually, as Marvin Minsky(1984) proposes, maintain the reins of control. Even today and we’ll have systems that can infer our wishes throughout history, economic growth has had from high-level gestures and examples. This a life of its own. Technological development is suggestion is redolent of my PDA scenario often unstoppable even in the face of collective above. efforts of humanity to restrain it (e.g., nuclear In 100 years, there may be artificial human- weapons). In that sense, we’re already famil- like agents, and at that point more sci-fi AI im- iar with humans being overpowered by forces ages may become more relevant. But by that beyond their control. An AI takeoff will rep- point the world will be very different, and I’m resent an acceleration of this trend, but it’s not sure the agents created will be discrete in unclear whether the dynamic will be funda- the way humans are. Maybe we’ll instead have mentally discontinuous from what we’ve seen a kind of global brain in which processes are so far. much more intimately interconnected, trans- Gregory Stock’s (1993) Metaman: ferable, and transparent than humans are to- day. Maybe there will never be a distinct AGI While many people have had ideas about a agent on a single ; maybe su- global brain, they have tended to suppose perhuman intelligence will always be a dis- that this can be improved or altered by tributed system across many interacting com- humans according to their will. Metaman puter systems. Robin Hanson gives an analogy can be seen as a development that directs in "I Still Don’t Get Foom": humanity’s will to its own ends, whether it likes it or not, through the operation of Imagine in the year 1000 you didn’t un- market forces. derstand "industry," but knew it was com- ing, would be powerful, and involved iron reported that Metaman helped and coal. You might then have pictured him see how a singularity might not be com- a blacksmith inventing and then forging pletely opaque to us. Indeed, a superintelli- himself an industry, and standing in a gence might look something like present-day

43 Center on Long-Term Risk human society, with leaders at the top: "That shorter timescales will be with issues like as- apex agent itself might not appear to be much sassination drones, computer security, finan- deeper than a human, but the overall organiza- cial meltdowns, and other more mundane, tion that it is coordinating would be more cre- catastrophic-but-not-extinction-level events. ative and competent than a human."Update, Nov. 2015 : I’m increasingly leaning toward 33 Importance of whole-brain emula- the view that the development of AI over the tion coming century will be slow, incremental, and more like the Internet than like unified ar- I don’t currently know enough about the tificial agents. I think humans will develop technological details of whole-brain emulation vastly more powerful software tools long before to competently assess predictions that have highly competent autonomous agents emerge, been made about its arrival dates. In gen- since common-sense autonomous behavior is eral, I think prediction dates are too opti- just so much harder to create than domain- mistic (planning fallacy), but it still could specific tools. If this view is right, it suggests be that human-level emulation comes before that work on AGI issues may be somewhat less from-scratch human-level AIs do. Of course, important than I had thought, since perhaps there would be some mix of both tech- nologies. For instance, if crude brain emula- 1. AGI is very far away and tions didn’t reproduce all the functionality of 2. the "unified agent" models of AGI that actual human brains due to neglecting some MIRI tends to play with might be some- cellular and molecular details, perhaps from- what inaccurate even once true AGI scratch AI techniques could help fill in the emerges. gaps. This is a weaker form of the standard argu- If emulations are likely to come first, they ment that "we should wait until we know more may deserve more attention than other forms what AGI will look like to focus on the prob- of AI. In the long run, bottom-up AI will dom- lem" and that"worrying about the dark side inate everything else, because human brains of artificial intelligence is like worrying about – even run at high speeds – are only so overpopulation on Mars". smart. But a society of brain emulations would I don’t think the argument against focusing run vastly faster than what biological humans on AGI works because could keep up with, so the details of shaping 1. some MIRI research, like on decision the- AI would be left up to them, and our main in- ory, is "timeless" (pun intended) and can fluence would come through shaping the em- be fruitfully started now ulations. Our influence on emulations could 2. beginning the discussion early is impor- matter a lot, not only in nudging the dynam- tant for ensuring that safety issues will ics of how emulations take off but also because be explored when the field is more ma- the values of the emulation society might de- ture pend significantly on who was chosen to be 3. I might be wrong about slow takeoff, uploaded. in which case MIRI-style work would be One argument why emulations might im- more important. prove human ability to control AI is that Still, this point does cast doubt on heuris- both emulations and the AIs they would cre- tics like "directly shaping AGI dominates all ate would be digital minds, so the emulations’ other considerations." 
It also means that a AI creations wouldn’t have inherent speed ad- lot of the ways "AI safety" will play out on vantages purely due to the greater efficiency

44 Artificial Intelligence and Its Implications for Future Suffering of digital computation. Emulations’ AI cre- nize space. ations might still have more efficient mind ar- In the case of brain emulations (and other chitectures or better learning algorithms, but highly neuromorphic AIs), we already know a building those would take work. The "for free" lot about what those agents would look like: speedup to AIs just because of their substrate They would have both maximization and min- would not give AIs a net advantage over em- imization goals, would usually want to colonize ulations. Bostrom feels "This consideration is space, and might have some human-type moral not too weighty" (p. 244 of Superintelligence) sympathies (depending on their edit distance because emulations might still be far less intel- relative to a pure brain upload). The possi- ligent than AGI. I find this claim strange, since bilities of pure-minimizer emulations or emu- it seems to me that the main advantage of lations that don’t want to colonize space are AGI in the short run would be its speed rather mostly ruled out. As a result, it’s pretty clear than qualitative intelligence, which would take that "unsafe" brain emulations and emulation (subjective) time and effort to develop. arms-race dynamics would result in more ex- Bostrom also claims that if emulations come pected suffering than a more deliberative fu- first, we would face risks from two transitions ture trajectory in which altruists have a big- (humans to emulations, and emulations to AI) ger influence, even if those altruists don’t place rather than one (humans to AI). There may be particular importance on reducing suffering. some validity to this, but it also seems to ne- Thus, the types of interventions that pure glect the realization that the "AI" transition suffering reducers would advocate with re- has many stages, and it’s possible that emu- spect to brain emulations might largely match lation development would overlap with some those that altruists who care about other val- of those stages. For instance, suppose the AI ues would advocate. This means that getting trajectory moves from AI1 → AI2 → AI3. If more people interested in making the brain- emulations are as fast and smart as AI1, then emulation transition safer and more humane the transition to AI1 is not a major risk for seems like a safe bet for suffering reducers. emulations, while it would be a big risk for One might wonder whether "unsafe" brain humans. This is the same point as made in emulations would be more likely to produce the previous paragraph. rogue AIs, but this doesn’t seem to be the "Emulation timelines and AI risk" has fur- case, because even unfriendly brain emula- ther discussion of the interaction between em- tions would collectively be amazingly smart ulations and control of AIs. and would want to preserve their own goals. Hence they would place as much emphasis on 34 Why work against brain-emulation controlling their AIs as would a more human- risks appeals to suffering reducers friendly emulation world. A main exception to this is that a more cooperative, unified emu- Previously in this piece I compared the ex- lation world might be less likely to produce pected suffering that would result from a rogue rogue AIs because of less pressure for arms AI vs. a human-inspired AI. I suggested that races. 
while a first-guess calculation may tip in fa- vor of a human-inspired AI on balance, this 35 Would emulation work accelerate conclusion is not clear and could change with neuromorphic AI? further information, especially if we had rea- son to think that many rogue AIs would be In Ch. 2 of Superintelligence, Bostrom makes "minimizers" of something or would not colo- a convincing case against brain-computer in-

45 Center on Long-Term Risk terfaces as an easy route to significantly super- Hanson(1994) explains, brain emulation is human performance. One of his points is that more akin to porting software (though prob- it’s very hard to decode neural signals in one ably "emulation" actually is the more precise brain and reinterpret them in software or in word, since emulation involves simulating the another brain (pp. 46-47). This might be an original hardware). While I don’t know any AI-complete problem. fully reverse-engineered versions of Windows, But then in Ch. 11, Bostrom goes on to sug- there are several Windows emulators, such as gest that emulations might learn to decompose VirtualBox. themselves into different modules that could Of course, if emulations emerged, their sig- be interfaced together (p. 172). While possi- nificantly faster rates of thinking would multi- ble in principle, I find such a scenario implau- ply progress on non-emulation AGI by orders sible for the reason Bostrom outlined in Ch. 2: of magnitude. Getting safe emulations doesn’t There would be so many neural signals to hook by itself get safe de novo AGI because the up to the right places, which would be differ- problem is just pushed a step back, but we ent across different brains, that the task seems could leave AGI work up to the vastly faster hopelessly complicated to me. Much easier to emulations. Thus, for biological humans, if em- build something from scratch. ulations come first, then influencing their de- Along the same lines, I doubt that brain em- velopment is the last thing we ever need to do. ulation in itself would vastly accelerate neuro- That said, thinking several steps ahead about morphic AI, because emulation work is mostly what kinds of AGIs emulations are likely to about copying without insight. Cognitive psy- produce is an essential part of influencing em- chology is often more informative about AI ulation development in better directions. architectures than cellular neuroscience, be- cause general psychological systems can be 36 Are neuromorphic or mathematical understood in functional terms as inspiration AIs more controllable? for AI designs, compared with the opacity Arguments for mathematical AIs: of neuronal spaghetti. In Bostrom’s list of examples of AI techniques inspired by biol- • Behavior and goals are more transparent, ogy (Ch. 14, "Technology couplings"), only a and goal preservation seems easier to spec- few came from neuroscience specifically. That ify (see "The Ethics of Artificial Intel- said, emulation work might involve some cross- ligence" by Bostrom and Yudkowsky, p. pollination with AI, and in any case, it might 16). accelerate interest in brain/artificial intelli- • Neuromorphic AIs might speed up mathe- gence more generally or might put pressure on matical AI, leaving less time to figure out AI groups to move ahead faster. Or it could control. funnel resources and scientists away from de Arguments for neuromorphic AIs: novo AI work. The upshot isn’t obvious. • We understand human psychology, expec- A" 2011 Workshop Re- tations, norms, and patterns of behavior. port" includes the argument that neuromor- Mathematical AIs could be totally alien phic AI should be easier than brain emula- and hence unpredictable. 
tion because "Merely reverse-engineering the • If neuromorphic AIs came first, they could Microsoft Windows code base is hard, so think faster and help figure out goal reverse-engineering the brain is probably much preservation, which I assume does require harder" (Salamon & Muehlhauser, 2012). But mathematical AIs at the end of the day. emulation is not reverse-engineering. As Robin • Mathematical AIs may be more prone to

46 Artificial Intelligence and Its Implications for Future Suffering

unexpected breakthroughs that yield rad- future will command even more ethical con- ical jumps in intelligence. sideration. In the limit of very human-like neuromorphic How might ethical concern for machines in- AIs, we face similar considerations as between teract with control measures for machines? emulations vs. from-scratch AIs – a tradeoff 37.1 Slower AGI development? which is not at all obvious. Overall, I think mathematical AI has a bet- As more people grant moral status to AIs, ter best case but also a worse worst case than there will likely be more scrutiny of AI re- neuromorphic. If you really want goal preser- search, analogous to how animal activists in vation and think goal drift would make the fu- the present monitor animal testing. This may ture worthless, you might lean more towards make AI research slightly more difficult and mathematical AI because it’s more likely to may distort what kinds of AIs are built de- perfect goal preservation. But I probably care pending on the degree of empathy people have less about goal preservation and more about for different types of AIs (Calverley, 2005). avoiding terrible outcomes. For instance, if few people care about invisible, In Superintelligence (Ch. 14), Bostrom non-embodied systems, researchers who build comes down strongly in favor of mathemati- these will face less opposition than those who cal AI being safer. I’m puzzled by his high de- pioneer suffering robots or animated charac- gree of confidence here. Bostrom claims that ters that arouse greater empathy. If this possi- unlike emulations, neuromorphic AIs wouldn’t bility materializes, it would contradict present have human motivations by default. But this trends where it’s often helpful to create at least seems to depend on how human motivations a toy robot or animated interface in order to are encoded and what parts of human brains "sell" your research to grant-makers and the are modeled in the AIs. public. In contrast to Bostrom, a 2011 Singular- Since it seems likely that reducing the pace ity Summit workshop ranked neuromorphic of progress toward AGI is on balance bene- AI as more controllable than (non-friendly) ficial, a slowdown due to ethical constraints mathematical AI, though of course they found may be welcome. Of course, depending on the friendly mathematical AI most controllable details, the effect could be harmful. For in- (Salamon & Muehlhauser, 2012). The work- stance, perhaps China wouldn’t have many shop’s aggregated probability of a good out- ethical constraints, so ethical restrictions in come given brain emulation or neuromorphic the West might slightly favor AGI develop- AI turned out to be the same (14%) as that ment by China and other less democratic for mathematical AI (which might be either countries. (This is not guaranteed. For what friendly or unfriendly). it’s worth, China has already made strides to- ward reducing animal testing.) 37 Impacts of empathy for AIs In any case, I expect ethical restrictions on AI development to be small or nonexistent un- As I noted above, advanced AIs will be com- til many decades from now when AIs develop plex agents with their own goals and values, perhaps mammal-level intelligence. So maybe and these will matter ethically. Parallel to dis- such restrictions won’t have a big impact on cussions of robot rebellion in science fiction AGI progress. Moreover, it may be that most are discussions of robot rights. 
I think even AGIs will be sufficiently alien that they won’t present-day computers deserve a tiny bit of arouse much human sympathy. moral concern, and complex computers of the Brain emulations seem more likely to raise

47 Center on Long-Term Risk

ethical debate because it’s much easier to ar- cerns are overly parochial and that it’s chua- gue for their personhood. If we think brain vanist to impose our "monkey dreams" on an emulation coming before AGI is good, a slow- AGI, which is the next stage of evolution. down of emulations could be unfortunate, The question is similar to whether one sym- while if we want AGI to come first, a slow- pathizes with the Native Americans (humans) down of emulations should be encouraged. or their European conquerors (rogue AGIs). Of course, emulations and AGIs do actually Before the second half of the 20th century, matter and deserve rights in principle. More- many history books glorified the winners (Eu- over, movements to extend rights to machines ropeans). After a brief period in which humans in the near term may have long-term impacts are quashed by a rogue AGI, its own "his- on how much post-humans care about suffer- tory books" will celebrate its conquest and the ing subroutines run at galactic scale. I’m just bending of the arc of history toward "higher", pointing out here that ethical concern for AGIs "better" forms of intelligence. (In practice, the and emulations also may somewhat affect tim- psychology of a rogue AGI probably wouldn’t ing of these technologies. be sufficiently similar to human psychology for these statements to apply literally, but they 37.2 Attitudes toward AGI control would be true in a metaphorical and implicit Most humans have no qualms about shutting sense.) down and rewriting programs that don’t work David Althaus worries that if people sym- as intended, but many do strongly object to pathize too much with machines, society will killing people with disabilities and designing be less afraid of an AI takeover, even if AI better-performing babies. Where to draw a takeover is bad on purely altruistic grounds. line between these cases is a tough question, I’m less concerned about this because even if but as AGIs become more animal-like, there people agree that advanced machines are sen- may be increasing moral outrage at shutting tient, they would still find it intolerable for them down and tinkering with them willy nilly. AGIs to commit speciecide against humanity. Nikola Danaylov asked Everyone agrees that Hitler was sentient, after whether it was speciesist or discrimination in all. Also, if it turns out that rogue-AI takeover favor of biological beings to lock up machines is altruistically desirable, it would be better if and observe them to ensure their safety before more people agreed with this, though I expect letting them loose. an extremely tiny fraction of the population At a lecture in Berkeley, CA, Nick Bostrom would ever come around to such a position. was asked whether it’s unethical to "chain" Where sympathy for AGIs might have more AIs by forcing them to have the values we impact is in cases of softer takeoff where AGIs want. Bostrom replied that we have to give work in the human economy and acquire in- machines some values, so they may as well creasing shares of wealth. The more humans align with ours. I suspect most people would care about AGIs for their own sakes, the more agree with this, but the question becomes such transitions might be tolerated. Or would trickier when we consider turning off erro- they? Maybe seeing AGIs as more human-like neous AGIs that we’ve already created be- would evoke the xenophobia and ethnic hatred cause they don’t behave how we want them to. 
that we’ve seen throughout history whenever a A few hard-core AGI-rights advocates might group of people gains wealth (e.g., Jews in Me- raise concerns here. More generally, there’s a dieval Europe) or steals jobs (e.g., immigrants segment of transhumanists (including young of various types throughout history). Eliezer Yudkowsky) who feel that human con- Personally, I think greater sympathy for AGI

48 Artificial Intelligence and Its Implications for Future Suffering is likely net positive because it may help allay 2. Not speeding unsafe AGI: Building real- anti-alien prejudices that may make coopera- world systems would contribute toward tion with AGIs harder. When a Homo sapiens non-safe AGI research. tribe confronts an outgroup, often it reacts vi- 3. Long-term focus: MIRI doesn’t just want olently in an effort to destroy the evil foreign- a system that’s the next level better but ers. If instead humans could cooperate with aims to explore the theoretical limits of their emerging AGI brethren, better outcomes possibilities. would likely follow. I personally think reason #3 is most com- pelling. I doubt #2 is hugely important given 38 Charities working on this issue MIRI’s small size, though it matters to some degree. #1 seems a reasonable strategy in What are some places where donors can con- moderation, though I favor approaches that tribute to make a difference on AI? The Center look decently likely to yield non-terrible out- on Long-Term Risk (CLR) explores questions comes rather than shooting for the absolute like these, though at the moment the organi- best outcomes. zation is rather small. MIRI is larger and has a longer track record. Its values are more con- Software can be proved correct, and some- ventional, but it recognizes the importance of times this is done for mission-critical compo- positive-sum opportunities to help many val- nents, but most software is not validated. I ues systems, which includes suffering reduc- suspect that AGI will be sufficiently big and tion. More reflection on these topics can po- complicated that proving safety will be impos- tentially reduce suffering and further goals like sible for humans to do completely, though I eudaimonia, fun, and interesting complexity at don’t rule out the possibility of software that the same time. would help with correctness proofs on large Because AI is affected by many sectors of systems. Muehlhauser and comments on his society, these problems can be tackled from post largely agree with this. diverse angles. Many groups besides FRI and What kind of track record does theoretical MIRI examine important topics as well, and mathematical research have for practical im- these organizations should be explored further pact? There are certainly several domains that as potential charity recommendations. come to mind, such as the following. • Auction game theory has made govern- 39 Is MIRI’s work too theoretical? ments billions of dollars and is widely used in Internet advertising. Most of MIRI’s publications since roughly • Theoretical physics has led to numerous 2012 have focused on formal mathematics, forms of technology, including electricity, such as logic and provability. These are tools lasers, and atomic bombs. However, im- not normally used in AGI research. I think mediate technological implications of the MIRI’s motivations for this theoretical focus most theoretical forms of physics (string are theory, Higgs boson, black holes, etc.) are 1. Pessimism about the problem difficulty: less pronounced. Luke Muehlhauser writes that "Espe- • Formalizations of many areas of computer cially for something as complex as science have helped guide practical im- Friendly AI, our message is: ’If we prove plementations, such as in algorithm com- it correct, it might work. If we don’t prove plexity, concurrency, distributed systems, it correct, it definitely won’t work.’" cryptography, hardware verification, and

What kind of track record does theoretical mathematical research have for practical impact? There are certainly several domains that come to mind, such as the following.

• Auction game theory has made governments billions of dollars and is widely used in Internet advertising.
• Theoretical physics has led to numerous forms of technology, including electricity, lasers, and atomic bombs. However, immediate technological implications of the most theoretical forms of physics (string theory, Higgs boson, black holes, etc.) are less pronounced.
• Formalizations of many areas of computer science have helped guide practical implementations, such as in algorithm complexity, concurrency, distributed systems, cryptography, hardware verification, and so on. That said, there are also areas of theoretical computer science that have little immediate application. Most software engineers only know a little bit about more abstract theory and still do fine building systems, although if no one knew theory well enough to design theory-based tools, the software field would be in considerably worse shape.

All told, I think it’s important for someone to do the kinds of investigation that MIRI is undertaking. I personally would probably invest more resources than MIRI is in hacky, approximate solutions to AGI safety that don’t make such strong assumptions about the theoretical cleanliness and soundness of the agents in question. But I expect this kind of less perfectionist work on AGI control will increase as more people become interested in AGI safety.

There does seem to be a significant divide between the math-oriented conception of AGI and the engineering/neuroscience conception. Ben Goertzel takes the latter stance:

    I strongly suspect that to achieve high levels of general intelligence using realistically limited computational resources, one is going to need to build systems with a nontrivial degree of fundamental unpredictability to them. This is what neuroscience suggests, it’s what my concrete AGI design work suggests, and it’s what my theoretical work on GOLEM and related ideas suggests (Goertzel, 2014). And none of the public output of SIAI researchers or enthusiasts has given me any reason to believe otherwise, yet.

Personally I think Goertzel is more likely to be right on this particular question. Those who view AGI as fundamentally complex have more concrete results to show, and their approach is by far more mainstream among computer scientists and neuroscientists. Of course, proofs about theoretical models like Turing machines and lambda calculus are also mainstream, and few can dispute their importance. But Turing-machine theorems do little to constrain our understanding of what AGI will actually look like in the next few centuries. That said, there’s significant peer disagreement on this topic, so epistemic modesty is warranted. In addition, if the MIRI view is right, we might have more scope to make an impact on AGI safety, and it would be possible that important discoveries could result from a few mathematical insights rather than lots of detailed engineering work. Also, most AGI research is more engineering-oriented, so MIRI’s distinctive focus on theory, especially abstract topics like decision theory, may target an underfunded portion of the space of AGI-safety research.

In "How to Study Unsafe AGI’s safely (and why we might have no choice)", Punoxysm makes several points that I agree with, including that AGI research is likely to yield many false starts before something self-sustaining takes off, and those false starts could afford us the opportunity to learn about AGI experimentally. Moreover, this kind of ad-hoc, empirical work may be necessary if, as seems to me probable, fully rigorous mathematical models of safety aren’t sufficiently advanced by the time AGI arrives.

Ben Goertzel likewise suggests that a fruitful way to approach AGI control is to study small systems and "in the usual manner of science, attempt to arrive at a solid theory of AGI intelligence and ethics based on a combination of conceptual and experimental-data considerations". He considers this view the norm among "most AI researchers or futurists". I think empirical investigation of how AGIs behave is very useful, but we also have to remember that many AI scientists are overly biased toward "build first; ask questions later" because

• building may be more fun and exciting than worrying about safety (Steven M. Bellovin observed with reference to open-source projects: "Quality takes work, design, review and testing and those are not nearly as much fun as coding.")
• there’s more incentive from commercial applications and government grants to build rather than introspect
• scientists may want AGI sooner so that they personally or their children can reap its benefits.

On a personal level, I suggest that if you really like building systems rather than thinking about safety, you might do well to earn to give in software and donate toward AGI-safety organizations.
40 Next steps

Here are some rough suggestions for how I recommend proceeding on AGI issues and, in [brackets], roughly how long I expect each stage to take. Of course, the stages needn’t be done in a strict serial order, and step 1 should continue indefinitely, as we continue learning more about AGI from subsequent steps.

1. Decide if we want human-controlled, goal-preserving AGI [5-10 years]. This involves exploring questions about what types of AGI scenarios might unfold and how much suffering would result from AGIs of various types.

2. Assuming we decide we do want controlled AGI: Network with academics and AGI developers to raise the topic and canvass ideas [5-10 years]. We could reach out to academic AGI-like projects, including these listed by Pei Wang and these listed on Wikipedia, as well as to machine ethics and roboethics communities. There are already some discussions about safety issues among these groups, but I would expand the dialogue, have private conversations, write publications, hold conferences, etc. These efforts both inform us about the lay of the land and build connections in a friendly, mutualistic way.

3. Lobby for greater funding of research into AGI safety [10-20 years]. Once the idea and field of AGI safety have become more mainstream, it should be possible to differentially speed up safety research by getting more funding for it – both from governments and philanthropists. This is already somewhat feasible; for instance: "In 2014, the US Office of Naval Research announced that it would distribute $7.5 million in grants over five years to university researchers to study questions of ethics as applied to autonomous robots."

4. The movement snowballs [decades]. It’s hard to plan this far ahead, but I imagine that eventually (within 25-50 years?) AGI safety will become a mainstream political topic in a similar way as nuclear security is today. Governments may take over in driving the work, perhaps with heavy involvement from companies like Google. This is just a prediction, and the actual way things unfold could be different.

I recommend avoiding a confrontational approach with AGI developers. I would not try to lobby for restrictions on their research (in the short term at least), nor try to "slow them down" in other ways. AGI developers are the allies we need most at this stage, and most of them don’t want uncontrolled AGI either. Typically they just don’t see their work as risky, and I agree that at this point, no AGI project looks set to unveil something dangerous in the next decade or two. For many researchers, AGI is a dream they can’t help but pursue. Hopefully we can engender a similar enthusiasm about pursuing AGI safety.

In the longer term, tides may change, and perhaps many AGI developers will desire government-imposed restrictions as their technologies become increasingly powerful. Even then, I’m doubtful that governments will be able to completely control AGI development (see, e.g., the criticisms by John McGinnis of this approach), so differentially pushing for more safety work may continue to be the most leveraged solution. History provides a poor track record of governments refraining from developing technologies due to ethical concerns; Eckersley and Sandberg (2014, p. 187) cite "human cloning and land-based autonomous robotic weapons" as two of the few exceptions, with neither prohibition having a long track record.

I think the main way in which we should try to affect the speed of regular AGI work is by aiming to avoid setting off an AGI arms race, either via an AGI Sputnik moment or else by more gradual diffusion of alarm among world militaries. It’s possible that discussing AGI scenarios too much with military leaders could exacerbate a militarized reaction. If militaries set their sights on AGI the way the US and Soviet Union did on the space race or nuclear-arms race during the Cold War, the amount of funding for unsafe AGI research might multiply by a factor of 10 or maybe 100, and it would be aimed in harmful directions.
41 Where to push for maximal impact?

Here are some candidates for the best object-level projects that altruists could work on with reference to AI. Because AI seems so crucial, these are also candidates for the best object-level projects in general. Meta-level projects like movement-building, career advice, earning to give, fundraising, etc. are also competitive. I’ve scored each project area out of 10 points to express a rough subjective guess of the value of the work for suffering reducers.

Research whether controlled or uncontrolled AI yields more suffering (score = 10/10)

Pros:
• Figuring out which outcome is better should come before pushing ahead too far in any particular direction.
• This question remains non-obvious and so has very high expected value of information.
• None of the existing big names in AI safety have explored this question because reducing suffering is not the dominant priority for them.

Cons:
• None.

Push for suffering-focused AI-safety approaches (score = 10/10)

Most discussions of AI safety assume that human extinction and failure to spread (human-type) eudaimonia are the main costs of takeover by uncontrolled AI. But as noted in this piece, AIs would also spread astronomical amounts of suffering. Currently no organization besides CLR is focused on how to do AI safety work with the primary aim of avoiding outcomes containing huge amounts of suffering.

One example of a suffering-focused AI-safety approach is to design AIs so that even if they do get out of control, they "fail safe" in the sense of not spreading massive amounts of suffering into the cosmos. For example:

1. AIs should be inhibited from colonizing space, or if they do colonize space, they should do so in less harmful ways.

2. "Minimizer" utility functions have less risk of creating new universes than "maximizer" ones do.

3. Simpler utility functions (e.g., creating uniform paperclips) might require fewer suffering subroutines than complex utility functions would.

4. AIs with expensive intrinsic values (e.g., maximize paperclips) may run fewer complex minds than AIs with cheaper values (e.g., create at least one paperclip on each planet), because AIs with cheaper values have lower opportunity cost for using resources and so can expend more of their cosmic endowment on learning about the universe to make sure they’ve accomplished their goals properly. (Thanks to a friend for this point.) From this standpoint, suffering reducers might prefer an AI that aims to "maximize paperclips" over one that aims to "make sure there’s at least one paperclip per planet." However, perhaps the paperclip maximizer would prefer to create new universes, while the "at least one paperclip per planet" AI wouldn’t; indeed, the "one paperclip per planet" AI might prefer to have a smaller multiverse so that there would be fewer planets that don’t contain paperclips. Also, the satisficing AI would be easier to compromise with than the maximizing AI, since the satisficer’s goals could be carried out more cheaply. There are other possibilities to consider as well. Maybe an AI with the instructions to "be 70% sure of having made one paperclip and then shut down all of your space-colonization plans" would not create much suffering (depending on how scrupulous the AI was about making sure that what it had created was really a paperclip, that it understood physics properly, etc.).
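As a rough illustration of why a satisficing goal might consume far fewer resources than a maximizing one, here is a toy numerical sketch (entirely my own construction; the per-step success probability and the 70% confidence threshold are arbitrary assumptions, not parameters of any real system):

```python
# Toy model: an agent that stops once it is sufficiently confident its goal is
# met, vs. one for which every additional unit of resources adds expected value.

def satisficer_steps(confidence_target: float = 0.70,
                     p_success_per_step: float = 0.3) -> int:
    """Steps (resources) used by an agent that halts once it is
    `confidence_target` sure it has made at least one paperclip."""
    steps = 0
    p_at_least_one = 0.0
    while p_at_least_one < confidence_target:
        steps += 1
        # each step independently succeeds with probability p_success_per_step
        p_at_least_one = 1 - (1 - p_success_per_step) ** steps
    return steps

def maximizer_expected_clips(steps: int, p_success_per_step: float = 0.3) -> float:
    """Expected paperclip count for an agent that values every extra paperclip
    equally; its expected value keeps growing with resources spent, so it has
    no internal reason to stop colonizing and computing."""
    return steps * p_success_per_step

print("satisficer halts after", satisficer_steps(), "steps")            # 4 steps
print("maximizer's value after 10**6 steps:", maximizer_expected_clips(10**6))
```

The asymmetry is the point: the satisficer’s demands are bounded and therefore cheap to trade with, whereas the maximizer’s marginal returns never hit zero.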

The problem with bullet #1 is that if you can succeed in preventing AGIs from colonizing space, it seems like you should already have been able to control the AGI altogether, since the two problems appear about equally hard. But maybe there are clever ideas we haven’t thought of for reducing the spread of suffering even if humans lose total control.

Another challenge is that those who don’t place priority on reducing suffering may not agree with these proposals. For example, I would guess that most AI scientists would say, "If the AGI kills humans, at least we should ensure that it spreads life into space, creates a complex array of intricate structures, and increases the size of our multiverse."

Work on AI control and value-loading problems (score = 4/10)

Pros:
• At present, controlled AI seems more likely good than bad.
• Relatively little work thus far, so marginal effort may make a big impact.

Cons:
• It may turn out that AI control increases net expected suffering.
• This topic may become a massive area of investment in coming decades, because everyone should theoretically care about it. Maybe there’s more leverage in pushing on neglected areas of particular concern for suffering reduction.

Research technological/economic/political dynamics of an AI takeoff and push in better directions (score = 3/10)

By this I have in mind scenarios like those of Robin Hanson for emulation takeoff, or Bostrom’s (2004) "The Future of Human Evolution".
Pros:
• Many scenarios have not been mapped out. There’s a need to introduce economic/social realism to AI scenarios, which at present often focus on technical challenges and idealized systems.
• Potential to steer dynamics in more win-win directions.

Cons:
• Broad subject area. Work may be somewhat replaceable as other researchers get on board in the coming decades.
• More people have their eyes on general economic/social trends than on specific AI technicalities, so there may be lower marginal returns to additional work in this area.
• While technological progress is probably the biggest influence on history, it’s also one of the more inevitable influences, making it unclear how much we can affect it. Our main impact on it would seem to come through differential technological progress. In contrast, values, institutions, and social movements can go in many different directions depending on our choices.

Promote the ideal of cooperation on AI values (e.g., CEV by Yudkowsky (2004)) (score = 2/10)

Pros:
• Whereas technical work on AI safety is of interest to and benefits everyone – including militaries and companies with non-altruistic aims – promoting CEV is more important to altruists. I don’t see CEV as a likely outcome even if AI is controlled, because it’s more plausible that individuals and groups will push for their own agendas.

Cons:
• It’s very hard to achieve CEV. It depends on a lot of really complex political and economic dynamics that millions of altruists are already working to improve.
• Promoting CEV as an ideal to approximate may be confused in people’s minds with suggesting that CEV is likely to happen. The latter assumption is probably wrong and so may distort people’s beliefs about other crucial questions. For instance, if CEV was likely, then it would be more likely that suffering reducers should favor controlled AI; but the fact of the matter is that anything more than crude approximations to CEV will probably not happen.

Promote a smoother, safer takeoff for brain emulation (score = 2/10)

Pros:
• As noted above, it’s more plausible that suffering reducers should favor emulation safety than AI safety.
• The topic seems less explored than safety of de novo AIs.

Cons:
• I find it slightly more likely that de novo AI will come first, in which case this work wouldn’t be as relevant. In addition, AI may have more impacts on society even before it reaches the human level, again making AI slightly more relevant to focus on.
• Safety measures might require more political and less technical work, in which case it’s more likely to be done correctly by policy makers in due time. The value-loading problem seems much easier for emulations because it might just work to upload people with good values, assuming no major value corruption during or after uploading.
• Emulation is more dependent on relatively straightforward engineering improvements and less on unpredictable insight than AI. Thus, it has a clearer development timeline, so there’s less urgency to investigate issues ahead of time to prepare for an unexpected breakthrough.

Influence the moral values of those likely to control AI (score = 2/10)

Pros:
• Altruists, and especially those with niche values, may want to push AI development in more compassionate directions. This could make sense because altruists are most interested in ethics, while even power-hungry states and money-hungry individuals should care about AI safety in the long run.

Cons:
• This strategy is less cooperative. It’s akin to defecting in a tragedy of the commons – pushing more for what you want rather than what everyone wants. If you do push for what everyone wants, then I would consider such work more like the "Promote the ideal of cooperation" item.
• Empirically, there isn’t enough investment in other fundamental AI issues, and those may be more important than further engaging already well-trodden ethical debates.

Promote a singleton over multipolar dynamics (score = 1/10)

Pros:
• A singleton, whether controlled or uncontrolled, would reduce the risk of conflicts that cause cosmic damage.

Unclear:
• There are many ways to promote a singleton. Encouraging cooperation on AI development would improve pluralism and human control in the outcome. Faster development by the leading AI project might also increase the chance of a singleton while reducing the probability of human control of the outcome. Stronger government regulation, surveillance, and coordination would increase chances of a singleton, as would global cooperation.

Cons:
• Speeding up the leading AI project might exacerbate AI arms races. And in any event, it’s currently far too early to predict what group will lead the AI race.

Other variations

In general, there are several levers that we can pull on:
• safety
• arrival time relative to other technologies
• influencing values
• cooperation
• shaping social dynamics
• raising awareness
• etc.

These can be applied to any of
• de novo AI
• brain emulation
• other key technologies
• etc.
42 Is it valuable to work at or influence an AGI company?

Projects like DeepMind, Vicarious, OpenCog, and the AGI research teams at Google, Facebook, etc. are some of the leaders in AGI technology. Sometimes it’s proposed that since these teams might ultimately develop AGI, altruists should consider working for, or at least lobbying, these companies so that they think more about AGI safety.

One’s assessment of this proposal depends on one’s view about AGI takeoff. My own opinion may be somewhat in the minority relative to expert surveys (Müller & Bostrom, 2016), but I’d be surprised if we had human-level AGI before 50 years from now, and my median estimate might be something like ∼90 years from now. That said, the idea of AGI arriving at a single point in time is probably a wrong framing of the question. Already machines are super-human in some domains, while their abilities are far below humans’ in other domains. Over the coming decades, we’ll see lots of advancement in machine capabilities in various fields at various speeds, without any single point where machines suddenly develop human-level abilities across all domains. Gradual AI progress over the coming decades will radically transform society, resulting in many small "intelligence explosions" in various specific areas, long before machines completely surpass humans overall.

In light of my picture of AGI, I think of DeepMind, Vicarious, etc. as ripples in a long-term wave of increasing machine capabilities. It seems extremely unlikely that any one of these companies or its AGI system will bootstrap itself to world dominance on its own. Therefore, I think influencing these companies with an eye toward "shaping the AGI that will take over the world" is probably naive. That said, insofar as these companies will influence the long-term trajectory of AGI research, and insofar as people at these companies are important players in the AGI community, I think influencing them has value – just not vastly more value than influencing other powerful people.

That said, as noted previously, early work on AGI safety has the biggest payoff in scenarios where AGI takes off earlier and harder than people expected. If the marginal returns to additional safety research are many times higher in these "early AGI" scenarios, then it could still make sense to put some investment into them even if they seem very unlikely.
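To make that argument concrete, here is a toy expected-value calculation. The numbers are made up for illustration – the 5% probability and the 20× marginal-value multiplier are my own arbitrary assumptions, not estimates from this piece or any source:

```python
# Illustrative arithmetic only: an unlikely scenario can still dominate the
# expected marginal value of work if returns in that scenario are much higher.

p_early_agi = 0.05                 # hypothetical chance of an early, hard takeoff
p_late_agi = 1 - p_early_agi       # the complementary, "business as usual" case

marginal_value_if_early = 20.0     # value of one unit of safety work if AGI is early
marginal_value_if_late = 1.0       # value of that same unit if AGI is late
                                   # (by then many others will be doing the work)

ev_early_targeted_work = p_early_agi * marginal_value_if_early   # 0.05 * 20 = 1.00
ev_late_targeted_work = p_late_agi * marginal_value_if_late      # 0.95 * 1  = 0.95

print(f"EV of work targeting early-AGI scenarios: {ev_early_targeted_work:.2f}")
print(f"EV of work targeting late-AGI scenarios:  {ev_late_targeted_work:.2f}")
```

Under these made-up numbers the early-takeoff work comes out slightly ahead despite the scenario’s low probability; with different assumptions the comparison could easily flip, which is exactly why the probabilities and marginal returns are worth estimating.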

43 Should suffering reducers focus on AGI safety?

If, upon further analysis, it looks like AGI safety would increase expected suffering, then the answer would be clear: Suffering reducers shouldn’t contribute toward AGI safety and should worry somewhat about how their messages might incline others in that direction. However, I find it reasonably likely that suffering reducers will conclude that the benefits of AGI safety outweigh the risks. In that case, they would face a question of whether to push on AGI safety or on other projects that also seem valuable.

Reasons to focus on other projects:
• There are several really smart people working on AGI safety right now. The number of brilliant altruists focused on AGI safety probably exceeds the number of brilliant altruists focused on reducing suffering in the far future by several times over. Thus, it seems plausible that there remain more low-hanging fruit for suffering reducers to focus on other crucial considerations rather than delving into the technical details of implementing AGI safety.
• I expect that AGI safety will require at least, say, thousands of researchers and hundreds of thousands of programmers to get right. AGI safety is a much harder problem than ordinary computer security, and computer security demand is already very high: "In 2012, there were more than 67,400 separate postings for cybersecurity-related jobs in a range of industries". Of course, that AGI safety will need tons of researchers eventually needn’t discount the value of early work, and indeed, someone who helps grow the movement to a large size would contribute as much as many detail-oriented AGI-safety researchers later.

Reasons to focus on AGI safety:
• Most other major problems are also already being tackled by lots of smart people.
• AGI safety is a cause that many value systems can get behind, so working on it can be seen as more "nice" than focusing on areas that are more specific to suffering-reduction values.

All told, I would probably pursue a mixed strategy: Work primarily on questions specific to suffering reduction, but direct donations and resources toward AGI safety when opportunities arise. Some suffering reducers particularly suited to work on AGI safety could go in that direction while others continue searching for points of leverage not specific to controlling AGI.

44 Acknowledgments

Parts of this piece were inspired by discussions with various people, including David Althaus, Daniel Dewey, and Caspar Oesterheld.
References

Armstrong, S., Soares, N., Fallenstein, B., & Yudkowsky, E. (2015). Corrigibility. In AAAI Publications. Austin, TX, USA.

Bloom, P. (2013). Just babies: The origins of good and evil. New York: Crown.

Bostrom, N. (2003). Astronomical waste: The opportunity cost of delayed technological development. Utilitas, 15(03), 308–314.

Bostrom, N. (2004). The future of human evolution. In C. Tandy (Ed.), Death and Anti-Death: Two Hundred Years After Kant, Fifty Years After Turing (pp. 339–371). Palo Alto, California: Ria University Press.

Bostrom, N. (2006). What is a singleton? Linguistic and Philosophical Investigations, 5(2), 48–54.

Bostrom, N. (2010). Anthropic bias: Observation selection effects in science and philosophy (1st ed.). Abingdon, Oxon: Routledge.

Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.

Bostrom, N., & Yudkowsky, E. (2014). The ethics of artificial intelligence. In K. Frankish & W. M. Ramsey (Eds.), The Cambridge Handbook of Artificial Intelligence (pp. 316–334). Cambridge University Press.

Brooks, F. P., Jr. (1995). The Mythical Man-Month (Anniversary ed.). Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.

Calverley, D. J. (2005). Android science and the animal rights movement: Are there analogies? In Cognitive Sciences Society Workshop, Stresa, Italy (pp. 127–136).

Davis, E. (2015). Ethical guidelines for a superintelligence. Artificial Intelligence, 220, 121–124.

Dennett, D. C. (1992). Consciousness Explained (1st ed.). Boston: Back Bay Books.

Eckersley, P., & Sandberg, A. (2014). Is brain emulation dangerous? Journal of Artificial General Intelligence, 4(3), 170–194.

Goertzel, B. (2014). GOLEM: Towards an AGI meta-architecture enabling both goal preservation and radical self-improvement. Journal of Experimental & Theoretical Artificial Intelligence, 26(3), 391–403.

Good, I. J. (1965). Speculations concerning the first ultraintelligent machine. Advances in Computers, 6, 31–88.

Good, I. J. (1982). Ethical machines. In Tenth Machine Intelligence Workshop, Cleveland, Ohio (Vol. 246, pp. 555–560).

Hall, J. S. (2008). Engineering utopia. Frontiers in Artificial Intelligence and Applications, 171, 460.

Hanson, R. (1994). If uploads come first. Extropy, 6(2), 10–15.

Hanson, R., & Yudkowsky, E. (2013). The Hanson-Yudkowsky AI-Foom debate. Berkeley, CA: Machine Intelligence Research Institute.

Kaplan, R. D. (2013). The revenge of geography: What the map tells us about coming conflicts and the battle against fate (Reprint ed.). Random House Trade Paperbacks.

Kurzweil, R. (2000). The Age of Spiritual Machines: When Computers Exceed Human Intelligence. New York: Penguin Books.

Minsky, M. (1984). Afterword to Vernor Vinge’s novel, "True Names." Unpublished manuscript.

Müller, V. C., & Bostrom, N. (2016). Future progress in artificial intelligence: A survey of expert opinion. In V. C. Müller (Ed.), Fundamental issues of artificial intelligence (pp. 553–571). Berlin: Springer.

Ng, A. Y., & Russell, S. J. (2000). Algorithms for inverse reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning (pp. 663–670).

Russell, S. J., Norvig, P., Canny, J. F., Malik, J. M., & Edwards, D. D. (2003). Artificial intelligence: A modern approach (Vol. 2). Upper Saddle River, NJ: Prentice Hall.

Salamon, A., & Muehlhauser, L. (2012). Singularity Summit 2011 workshop report (Technical Report No. 1). San Francisco, CA: The Singularity Institute.

Sotala, K. (2012). Advantages of artificial intelligences, uploads, and digital minds. International Journal of Machine Consciousness, 04(01), 275–291. doi: 10.1142/S1793843012400161

Stock, G. (1993). Metaman: The merging of humans and machines into a global superorganism. New York: Simon & Schuster.

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433–460.

Winnefeld, J. A., & Kendall, F. (2013). Unmanned systems integrated roadmap FY 2013-2036. Office of the Secretary of Defense, US.

Yudkowsky, E. (2004). Coherent extrapolated volition. Singularity Institute for Artificial Intelligence.

Yudkowsky, E. (2011). Complex value systems in friendly AI. In D. Hutchison et al. (Eds.), Artificial General Intelligence (Vol. 6830, pp. 388–393). Springer Berlin Heidelberg.

Yudkowsky, E. (2013). Intelligence explosion microeconomics. Machine Intelligence Research Institute. Accessed online October 23, 2015.