Impossibility and Uncertainty Theorems in AI Value Alignment
(or why your AGI should not have a utility function)

Peter Eckersley
Partnership on AI & EFF
[email protected]

Abstract

Utility functions or their equivalents (value functions, objective functions, loss functions, reward functions, preference orderings) are a central tool in most current machine learning systems. These mechanisms for defining goals and guiding optimization run into practical and conceptual difficulty when there are independent, multi-dimensional objectives that need to be pursued simultaneously and cannot be reduced to each other. Ethicists have proved several impossibility theorems that stem from this origin; those results appear to show that there is no way of formally specifying what it means for an outcome to be good for a population without violating strong human ethical intuitions (in such cases, the objective

been able to avoid that problem in the first place are a tiny portion of possibility space, and one that has arguably received a disproportionate amount of attention.¹

There are, however, other kinds of AI systems for which ethical discussions about the value of lives, about who should exist or keep existing, are more genuinely relevant. The clearest category is systems that are designed to make decisions affecting the welfare or existence of people in the future, rather than systems that only have to make such decisions in highly atypical scenarios. Examples of such systems include:

• autonomous or semi-autonomous weapons systems (which