Superintelligence As a Cause Or Cure for Risks of Astronomical Suffering
Informatica 41 (2017) 389–400

Kaj Sotala and Lukas Gloor
Foundational Research Institute, Berlin, Germany
E-mail: [email protected], [email protected]
foundational-research.org

Keywords: existential risk, suffering risk, superintelligence, AI alignment problem, suffering-focused ethics, population ethics

Received: August 31, 2017

Discussions about the possible consequences of creating superintelligence have included the possibility of existential risk, often understood mainly as the risk of human extinction. We argue that suffering risks (s-risks), where an adverse outcome would bring about severe suffering on an astronomical scale, are risks of a comparable severity and probability as risks of extinction. Preventing them is the common interest of many different value systems. Furthermore, we argue that in the same way as superintelligent AI both contributes to existential risk but can also help prevent it, superintelligent AI can be both the cause of suffering risks and a way to prevent them from being realized. Some types of work aimed at making superintelligent AI safe will also help prevent suffering risks, and there may also be a class of safeguards for AI that helps specifically against s-risks.

Povzetek (Slovenian abstract): The paper analyzes the advantages and dangers of superintelligence.

1 Introduction

Work discussing the possible consequences of creating superintelligent AI (Yudkowsky 2008, Bostrom 2014, Sotala & Yampolskiy 2015) has discussed superintelligence as a possible existential risk: a risk "where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential" (Bostrom 2002, 2013). The previous work has mostly[1] considered the worst-case outcome to be the possibility of human extinction by an AI that is indifferent to humanity's survival and values. However, it is often thought that for an individual, there exist "fates worse than death"; analogously, for civilizations there may exist fates worse than extinction, such as survival in conditions in which most people will experience enormous suffering for most of their lives.

Even if such extreme outcomes were avoided, the known universe may eventually be populated by vast numbers of minds: published estimates include the possibility of 10^25 minds supported by a single star (Bostrom 2003a), with humanity having the potential to eventually colonize tens of millions of galaxies (Armstrong & Sandberg 2013). While this could enable an enormous number of meaningful lives to be lived, if even a small fraction of these lives were to exist in hellish circumstances, the amount of suffering would be vastly greater than that produced by all the atrocities, abuses, and natural causes in Earth's history so far.
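To give a sense of the orders of magnitude involved, here is a rough, purely illustrative calculation. Apart from Bostrom's (2003a) figure of 10^25 minds per star, every number in it (the counts of galaxies and stars, the fraction f of lives in hellish conditions, and the estimate of roughly 10^11 humans having ever lived) is an assumption chosen only for the sake of the example:

% Illustrative scale comparison; all round numbers are assumptions for this example,
% except the 10^25 minds-per-star figure cited from Bostrom (2003a).
\begin{align*}
N_{\text{minds}} &\sim \underbrace{10^{7}}_{\text{galaxies}} \times \underbrace{10^{11}}_{\text{stars per galaxy}} \times \underbrace{10^{25}}_{\text{minds per star}} = 10^{43},\\
N_{\text{hellish}} &\sim f \cdot N_{\text{minds}} = 10^{-9} \cdot 10^{43} = 10^{34} \;\gg\; 10^{11} \approx \text{humans who have ever lived}.
\end{align*}

Even with far more conservative assumptions, a tiny fraction of a future population of this size dwarfs all the lives, and hence all the suffering, in Earth's history to date.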
We term the possibility of such outcomes a suffering risk:

Suffering risk (s-risk): One where an adverse outcome would bring about severe suffering on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far.

In order for potential risks - including s-risks - to merit work on them, three conditions must be met. First, the outcome of the risk must be sufficiently severe to merit attention. Second, the risk must have some reasonable probability of being realized. Third, there must be some way for risk-avoidance work to reduce either the probability or severity of an adverse outcome.

In this paper, we will argue that suffering risks meet all three criteria, and that s-risk avoidance work is thus of a comparable magnitude in importance as work on risks from extinction. Section 2 seeks to establish the severity of s-risks. There, we will argue that there are classes of suffering-related adverse outcomes that many value systems would consider to be equally or even more severe than extinction. Additionally, we will define a class of less severe suffering outcomes which many value systems would consider important to avoid, albeit not as important as avoiding extinction. Section 3 looks at suffering risks from the view of several different value systems, and discusses how much they would prioritize avoiding different suffering outcomes. Next, we will argue that there is a reasonable probability for a number of different suffering risks to be realized. Our discussion is organized according to the relationship that superintelligent AIs have to suffering risks: section 4 covers risks that may be prevented by a superintelligence, and section 5 covers risks that may be realized by one[2]. Section 6 discusses how it might be possible to work on suffering risks.

[1] Bostrom (2014) is mainly focused on the risk of extinction, but does also devote some discussion to alternative negative outcomes such as "mindcrime". We discuss mindcrime in section 5.
[2] Superintelligent AIs being in a special position where they might either enable or prevent suffering risks is similar to the way in which they are in a special position to make risks of extinction either more or less likely (Yudkowsky 2008).

2 Suffering risks as risks of extreme severity

As already noted, the main focus in discussion of risks from superintelligent AI has been either literal extinction, with the AI killing humans as a side-effect of pursuing some other goal (Yudkowsky 2008), or a value extinction. In value extinction, some form of humanity may survive, but the future is controlled by an AI operating according to values which all current-day humans would consider worthless (Yudkowsky 2011). In either scenario, it is thought that the resulting future would have no value.

In this section, we will argue that besides futures that have no value, according to many different value systems it is possible to have futures with negative value. These would count as the worst category of existential risks. In addition, there are adverse outcomes of a lesser severity, which depending on one's value systems may not necessarily count as worse than extinction. Regardless, making these outcomes less likely is a high priority and a common interest of many different value systems.

Bostrom (2002) frames his definition of extinction risks with a discussion which characterizes a single person's death as being a risk of terminal intensity and personal scope, with existential risks being risks of terminal intensity and global scope - one person's death versus the death of all humans. However, it is commonly thought that there are "fates worse than death": at one extreme, being tortured for an extended time (with no chance of rescue), and then killed.

As less extreme examples, various negative health conditions are often considered worse than death (Rubin, Buehler & Halpern 2016; Sayah et al. 2015; Ditto et al. 1996): for example, among hospitalized patients with severe illness, a majority of respondents considered bowel and bladder incontinence, relying on a feeding tube to live, and being unable to get up from bed to be conditions that were worse than death (Rubin, Buehler & Halpern 2016). While these are prospective evaluations rather than what people have actually experienced, several countries have laws allowing for voluntary euthanasia, which people with various adverse conditions have chosen rather than go on living. This may be considered an empirical confirmation of some states of life being worse than death, at least as judged by the people who choose to die.

The notion of fates worse than death suggests the existence of a "hellish" severity that is one step worse than "terminal", and which might affect civilizations as well as individuals. Bostrom (2013) seems to acknowledge this by including "hellish" as a possible severity in the corresponding chart, but does not place any concrete outcomes under the hellish severity, implying that risks of extinction are still the worst outcomes. Yet there seem to be plausible paths to civilization-wide hell outcomes as well (Figure 1), which we will discuss in sections 4 and 5.

             Endurable                      Terminal            Hellish
  Global     Thinning of the ozone layer    Extinction risks    Global hellscape
  Personal   Car is stolen                  Death               Extended torture followed by death

Figure 1: The worst suffering risks are ones that affect everyone and subject people to hellish conditions.

In order to qualify as equally bad or worse than extinction, suffering risks do not necessarily need to affect every single member of humanity. For example, consider a simplified ethical calculus where someone may have a predominantly happy life (+1), never exist (0), or have a predominantly unhappy life (-1). As long as the people having predominantly unhappy lives outnumber the people having predominantly happy lives, under this calculus such an outcome would be considered worse than nobody existing in the first place. We will call this scenario a net suffering outcome[3].
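Stated slightly more formally (the notation below is ours, introduced only to make the simplified calculus explicit; it does not appear in the cited works):

% Each existing person i has value v_i = +1 (predominantly happy life) or
% v_i = -1 (predominantly unhappy life); a world where nobody exists has total value 0.
\[
  W \;=\; \sum_{i=1}^{N} v_i, \qquad v_i \in \{+1, -1\}.
\]
% A world state is a net suffering outcome exactly when its total value falls below
% the nobody-exists baseline, i.e. when unhappy lives outnumber happy ones:
\[
  W < 0 \;\Longleftrightarrow\; \#\{\, i : v_i = -1 \,\} \;>\; \#\{\, i : v_i = +1 \,\}.
\]

The pan-generational variant introduced below applies the same sum over all lives that will ever exist, rather than over the population at a given time.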
This outcome might be considered justifiable if we assumed that, given enough time, the people living happy lives will eventually outnumber the people living unhappy lives. Most value systems would then still consider a net suffering outcome worth avoiding, but they might consider it an acceptable cost for an even larger amount of future happy lives.

On the other hand, it is also possible that the world could become locked into conditions in which the balance would remain negative even when considering all the lives that will ever live: things would never get better. We will call this a pan-generational net suffering outcome.

In addition to net and pan-generational net suffering outcomes, we will consider a third category. In these outcomes, serious suffering may be limited to only a fraction of the population, but the overall population at some given time[4] is still large enough that even this small fraction accounts for many times more suffering than has existed in Earth's history so far.

[3] "Net" should be considered equivalent to Bostrom's "global", but we have chosen a different name to avoid giving the impression that the outcome would necessarily be limited to only one planet.
[4] One could also consider the category of pan-generational astronomical suffering outcomes, but restricting ourselves to just three categories is sufficient.