
Superintelligence alignment as the world’s top research priority

Andrew Critch (1,2), [email protected]
(1) Machine Intelligence Research Institute, http://intelligence.org/
(2) UC Berkeley, Center for Human-Compatible AI, http://humancompatible.ai/

My two hats:

Specialist hat: bounded rationality, open-source game theory, algorithms that reason about algorithms, …

Generalist hat: aligning AI with human interests, reducing existential risk, making the future super awesome, …

Q: Why are humans the dominant species on the planet?

Are we
• the strongest?
• the toughest?
• the fiercest?
• the smartest?


A 30-year thought experiment:

Imagine if Earth received a credible message from 30 light-years away that said:

“Dear humans, we are considerably smarter than you in every way. We are planning a travel experiment that is 50% likely to bring us to Earth within 30 years.”

What percentage of the world’s greatest minds would be worth allocating to prepare for that scenario? 1%? 10%? 90%?

We haven’t gotten an alien message, but we have gotten a Terrestrial one. From http://aiimpacts.org/:

Brain model | CPU demand (FLOPS) | $1MM availability via supercomputer / commodity computer | Memory (TB) | $1MM availability
analog network population model | 10^14 | 2008 / 2023 | 10^2 | present
spiking neural network | 10^18 | 2019 / 2042 | 10^4 | present
electrophysiology | 10^22 | 2033 / 2068 | 10^4 | 2019
states of protein complexes | 10^27 | 2052 / 2100 | 10^8 | 2038
stochastic behavior of single molecules | 10^43 | 2111 / 2201 | 10^14 | 2069

Sandberg, A. & Bostrom, N. (2008): Whole Brain Emulation: A Roadmap. Technical Report #2008-3, Future of Humanity Institute, Oxford University. URL: www.fhi.ox.ac.uk/reports/2008-3.pdf

hardware performance ≠ software performance
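A side note on reading the table above: each pair of a FLOPS estimate and a $1MM-supercomputer date implies a growth rate for hardware price-performance. Below is a minimal sketch of that arithmetic (my own illustration, not from the slides or the report), recovering the doubling time of FLOPS-per-$1MM implied by consecutive rows of the supercomputer column:

    # Minimal sketch (illustration only): doubling time of FLOPS available per
    # $1MM supercomputer, as implied by consecutive rows of the table above.
    import math

    # (brain model, FLOPS required, year a $1MM supercomputer is estimated to reach it)
    rows = [
        ("analog network population model",         1e14, 2008),
        ("spiking neural network",                  1e18, 2019),
        ("electrophysiology",                       1e22, 2033),
        ("states of protein complexes",             1e27, 2052),
        ("stochastic behavior of single molecules", 1e43, 2111),
    ]

    for (_, flops_a, year_a), (_, flops_b, year_b) in zip(rows, rows[1:]):
        doublings = math.log2(flops_b / flops_a)       # FLOPS growth between rows, in doublings
        doubling_time = (year_b - year_a) / doublings  # years per doubling implied by the dates
        print(f"{year_a}-{year_b}: ~{doubling_time:.2f} years per doubling")

With the dates in the table, this works out to roughly 0.8 to 1.1 years per doubling, i.e. the estimates do not assume a single constant hardware growth rate.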

but our source code isn’t so long…

< 750 MB (the human genome: roughly 3 billion base pairs at about 2 bits each)

HLAI (human-level AI) → SHAI (super-human AI)

Whatever we care about that might happen over the next few decades or beyond…

… is likely to depend on the alignment of super-human AI with human interests:

Our intelligence has brought us:

• Medicine, education, diplomacy, scientific discovery, art, social progress, rights, space travel…

If the development of human-level AI goes well, we could have much more of all of these things.

If it goes poorly, not only would we miss out on these benefits, but we would likely be facing an existential risk.

A precarious situation:

Even absent any malice from an AI, there are basic sub-goals (like acquiring matter, energy, self-defense, and increased intelligence) that are useful for almost any objective a superintelligence might hold, and would be disastrous to humans if pursued fully by a superintelligent optimizer.

A 30-year thought experiment:

Experts in AI generally hold that within 30 years, there's around a 50% chance that AI systems will be able to "carry out most human professions at least as well as a typical human.”

What percentage of the world’s greatest minds would be worth allocating to prepare for that scenario? 1%? 10%? 90%?

There are numerous open problem areas: http://humancompatible.ai/bibliography

… a few research agendas addressing AI safety over varying time horizons and scenarios:

Impact* time scale | Focus on: | Example research agenda
5 years | Foreseeable safety issues with currently developing applications, e.g. self-driving cars, personal assistants | Concrete Problems in AI Safety, D. Amodei et al.
10 years | Ensuring safety of highly intelligent systems built with current ML paradigms, e.g. reinforcement learning, active learning | Value Alignment for Advanced Systems, J. Taylor et al.
15 years | New theoretical perspectives extending probability theory, game theory, decision theory, bounded rationality, etc. | Agent Foundations for Aligning Machine Intelligence…, Soares & Fallenstein

* Impact time scale estimates are my own judgments based on proximity and difficulty of applications.

… and a number of research hotspots attempting to address the issues:

Location | Research group | Strengths
San Francisco area | Machine Intelligence Research Institute (Berkeley) | Expanding theoretical foundations (probability theory, game theory, …)
San Francisco area | Center for Human-Compatible AI (UC Berkeley) | Expanding theoretical foundations (cooperative inverse reinforcement, …)
San Francisco area | OpenAI (San Francisco) | Working closely with engineers and current state-of-the-art
London area | Google DeepMind (London) | Working closely with engineers and current state-of-the-art
London area | Future of Humanity Institute (Oxford) | Broad view of AI impacts, considering law, policy, and governance
London area | Leverhulme Center for the Future of Intelligence (Oxford/Cambridge) | Broad view of AI impacts, considering law, policy, and governance
London area | Center for the Study of Existential Risk (Cambridge) | Broad view of existential risks in general

But on net, very little alignment research is happening… because the world’s leading AI teams need to focus on beating intelligence benchmarks to compete with each other for top talent.

AI alignment (help needed): how to align a superintelligence with human interests

AI capabilities (competition incentives): how to build a superintelligence

Worldwide spending on AI control / alignment:

Spending, in millions of USD:

Year | Technical research | Related efforts | Combined
2014 | 1.67 | 0.95 | 2.62
2015 | 4.05 | 2.81 | 6.86
2016 | 5.25 | 3.61 | 8.86

Technical research: distributed across 24 projects with a median annual budget of $100k.
Related efforts: distributed across 20 projects with a median annual budget of $57k.

* Thanks to Sebastian Farquhar at the Global Priorities Project for the data

This makes sense: sacrificing researcher-hours to work on AI alignment means giving up some of your competitive edge over other teams.

Even if you care about the long-term future, you want your team to be ahead of the pack because you trust yourself with the responsible development of human-level AI more than you trust your competitors.

(Cartoon: axes labeled “intelligence” and “alignment”; efforts labeled “safety” cluster along the intelligence axis, while a voice from the alignment direction calls out, “hey guys… safety is this way…”)

Summary:

1. Human-level AI (HLAI) will probably exist within the next several decades. It will likely be simpler than a human brain, and be followed shortly thereafter by super-human AI (SHAI).
2. SHAI has the potential to dominate humans in shaping the future if/when it exists (cue: ).
3. You probably need SHAI to be aligned with (your) human interests if you wish to affect the future more than a few decades from now (cue: children, retirement, the environment, space travel).
4. Alignment research is currently highly neglected, because today an AI research team’s best strategy for attracting top talent is to focus on beating intelligence benchmarks instead of alignment.
5. Interventions needed:
   1. More technical work specifically on superintelligence alignment, and
   2. Incentives and institutions for cooperative development of HLAI, so that race conditions do not crowd out safety measures.

Thanks!

http://intelligence.org/

http://humancompatible.ai/