
DEMYSTIFYING “VALUE ALIGNMENT”: FORMALLY LINKING TO ETHICAL PRINCIPLES IN A DEONTIC COGNITIVE CALCULUS

S. BRINGSJORD∗ and N.S. GOVINDARAJULU and A. SEN
Rensselaer AI & Reasoning (RAIR) Lab
Department of Cognitive Science; Department of Computer Science
Rensselaer Polytechnic Institute (RPI)
Troy NY 12180 USA
E-mail: [email protected]∗ and [email protected] and [email protected]
www.rpi.edu

The Internet is now filled with endless discussion of the “value alignment problem” (VAP) in AI, roughly the challenge of ensuring that AIs — so the idea goes — behave in accordance with human values. Fascinating is the two-part fact that discussion of VAP appears to be entirely divorced from the enormous Occidental ethics literature, which spans at least 2.5 millennia of systematic thought about morality; and that those who built this literature never analyzed VAP. It might be said that since the thinkers who produced this literature were unaware of AI, or came before AI, this two-part fact is wholly unsurprising. But this makes little sense, since morally recalcitrant human agents could have been seen as presenting VAP. Moreover, aside from the fact that e.g. Leibniz had a sophisticated ethical system, can be credited with founding modern (at least logicist) AI, and had a stunningly firm grasp of computation and computational cognition, the problem with allowing (let alone encouraging) a chasm between VAP and longstanding systematic ethics is that there is a perfectly straightforward way to formalize VAP in keeping with Western ethics. This way is formal/axiomatic; we explain it here, and concretize it a bit via some simple demonstrations.

1. Introduction

A lot of people in and around AI who are concerned about immoral and/or destructive and/or unsafe AIs are calling for “value alignment” in such machines.∗ Such a call is for instance issued by Russell, Dewey & Tegmark (2015).† Our sense is that most of the time such calls are vapid, since those issuing the calls don’t really know what they’re calling for, and since the phrase is generally nowhere to be found in ethics itself. The absence of the phrase in over three millennia of systematic thinking about ethics is a pregnant fact. Herein we carry out formal work that puts some rigorous flesh on the calls in question. This work, quite limited in scope for now, forges a formal and computational link between axiology (the theory of value) and ethical principles (that express obligations, prohibitions, etc.).

DCC denotes the space of deontic cognitive calculi. Hitherto, in the particular calculi in this space that have been specified, and in some cases implemented, there is an absence of principled axiology. A case in point is DCEC∗; this calculus, presented and used e.g. in (Govindarajulu & Bringsjord 2017), specifies and implements what may so far be the most expressive, nuanced ethical principle to be AI-ified‡ — yet there is no principled axiology in this calculus, and hence none reported in the paper in question. Instead, an intuitive notion of positive and negative value (based on a simple correspondence between states-of-affairs and elementary arithmetic), and consistent with at least naïve forms of consequentialist ethical theories, is given and employed. In what follows, we encapsulate one way to formally forge a link between Chisholm’s (1975) intrinsic-value axiology,§ and ethical principles, in order to introduce the road to doing this in a rich and rigorous way, subsequently. The sequel will unfold as follows. We begin with some general remarks about the axiomatic approach that we follow (§2).

∗E.g. see https://futureoflife.org/2017/02/03/align-artificial-intelligence-with-human-values/.
†Arnold, Kasenberg & Scheutz (2017) offer a penetrating critique of this approach.
‡Namely: The Doctrine of Double-Effect. An excellent overview can be found in (McIntyre 2004/2014).
§The earlier version of which is (Chisholm & Sosa 1966).

2. The Axiomatic Approach to Axiology

It is an important feature of a theory axiomatically formulated that the primitive and defined notions which are implicitly or explicitly defined within the theory have a meaning only as part of the theory. The axiomatic approach can therefore eliminate for value theory as it has for other theories the need to depend on the inadequate and inflexible resources of ordinary language, and can help overcome the frustration that results from an attempt to explicate the basic value concepts in isolation from a coherent theory. (Davidson et al. 1955, 142)

3. Chisholm’s Intrinsic-Value Axiology, Absorbed

Chisholm (1975) gives a theory of intrinsic value expressed in a quantified propositional logic¶ that takes as primitive a binary relation Intrinsically Preferable; this allows him to e.g. write P(p, q), where here P is the primitive relation, and p and q are of course propositional variables (= zero-arity relations). To absorb Chisholm’s system into a deontic cognitive calculus, we begin by immediately recasting Chisholm’s theory in quantified multi-modal logic, in which all his (third-order) relations are modal operators, and his propositional variables are variables ranging over unrestricted formulae (which may themselves have modal operators and quantifiers within them). E.g., P becomes the binary modal operator Pref.‖ We cast Chisholm’s six definitions as axioms, and translate his five axioms as described immediately above. This yields 11 axioms, specified as follows:

A1  ∀φ, ψ [Same(φ, ψ) ↔ ¬Pref(φ, ψ) ∧ ¬Pref(ψ, φ)]
A2  ∀φ [Indiff(φ) ↔ ¬Pref(φ, ¬φ) ∧ ¬Pref(¬φ, φ)]
A3  ∀φ [Neutral(φ) ↔ ∃ψ (Indiff(ψ) ∧ Same(φ, ψ))]
A4  ∀φ [Good(φ) ↔ ∃ψ (Indiff(ψ) ∧ Pref(φ, ψ))]
A5  ∀φ [Bad(φ) ↔ ∃ψ (Indiff(ψ) ∧ Pref(ψ, φ))]
A6  ∀φ, ψ [ALAG(φ, ψ) ↔ ¬Pref(ψ, φ)]
A7  ∀φ, ψ [Pref(φ, ψ) → ¬Pref(ψ, φ)]
A8  ∀φ, ψ, γ [(ALAG(ψ, φ) ∧ ALAG(γ, ψ)) → ALAG(γ, φ)]
A9  ∀φ, ψ [(Indiff(φ) ∧ Indiff(ψ)) → Same(φ, ψ)]
A10 ∀φ [(Good(φ) ∧ Bad(¬φ)) → Pref(φ, ¬φ)]
A11 ∀φ, ψ [¬(Pref(φ, φ ∨ ψ) ∧ Pref(ψ, φ ∨ ψ)) ∧ ¬(Pref(φ ∨ ψ, φ) ∧ Pref(φ ∨ ψ, ψ))]
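To fix intuitions, the following minimal Python sketch checks the definitional axioms A1–A5 against an explicitly given finite preference relation. It is purely illustrative and forms no part of Chisholm’s system or of any implemented deontic cognitive calculus; the particular states-of-affairs and Pref facts (including the assumption that tooth-brushing is value-indifferent) are our own hypothetical choices, anticipating the examples of §7.

# A minimal finite-model sketch of A1–A5 (illustrative only; the states-of-affairs
# and the Pref facts below are hypothetical assumptions, not part of Chisholm's theory).

def Neg(p):
    # ¬p, with double negation collapsed
    return p[1] if isinstance(p, tuple) and p[0] == "not" else ("not", p)

ATOMS = ["read_ibsen", "brush_teeth", "pay_taxes"]
DOMAIN = ATOMS + [Neg(p) for p in ATOMS]

# Assumed primitive facts: (x, y) in PREF means x is intrinsically preferable to y.
PREF = {("read_ibsen", "brush_teeth"), ("brush_teeth", "pay_taxes")}

def Pref(x, y):
    return (x, y) in PREF

def Same(x, y):      # A1: neither state is preferable to the other
    return not Pref(x, y) and not Pref(y, x)

def Indiff(x):       # A2: x is not preferable to ¬x, nor ¬x to x
    return not Pref(x, Neg(x)) and not Pref(Neg(x), x)

def Neutral(x):      # A3: x is the same, value-wise, as some indifferent state
    return any(Indiff(y) and Same(x, y) for y in DOMAIN)

def Good(x):         # A4: x is preferable to some indifferent state
    return any(Indiff(y) and Pref(x, y) for y in DOMAIN)

def Bad(x):          # A5: some indifferent state is preferable to x
    return any(Indiff(y) and Pref(y, x) for y in DOMAIN)

print(Good("read_ibsen"))   # True: preferred to the (assumed-indifferent) brush_teeth
print(Bad("pay_taxes"))     # True: the (assumed-indifferent) brush_teeth is preferred to it

Under these assumed facts the tiny model also respects the asymmetry axiom A7, though nothing in the sketch enforces A6–A11.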

¶There are various ways of viewing this kind of logic. We view it as a proper fragment of third-order logic in which quantification is restricted to relation variables having zero arity. On this view, the properties that states-of-affairs can have are viewed as the relations of third-order logic, which, accordingly, can’t be quantified over.
‖Later, it will become necessary to move beyond Chisholm by allowing this operator to take into account agents α, and thus it will become ternary: Pref(φ, ψ, α). The other operators in the modalized axiology of Chisholm would of course need to include a placeholder for agents.

4. Proof Theory Used

In order to obtain some object-level theorems from A1–A11, we of course must have a proof theory to anchor matters; and in order for these theorems to be automatically obtained, we shall of course need this theory to be implemented. We use a simple proof theory, RA, for now. Our first ingredients are schemata needed for quantificational reasoning over the 11 axioms. We thus allow the standard natural-deduction schemata allowing introduction and elimination of ∃ and ∀ in A1–A11; and of course we allow the remaining natural-deduction schemata for first-order logic: modus ponens, indirect proof, and so on, where the wffs allowed in these schemata may contain operators. We in addition invoke the inference schemata (resp., Sg and Sb):

Sg:  from  ⊢RA φ,  infer  Good(φ)

Sb:  from  ⊢RA ¬φ,  infer  Bad(φ)
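As a simple illustration of RA at work over the axioms (our own example, not drawn from Chisholm), here is a derivation of Good(p) from the assumed premises Indiff(q) and Pref(p, q):

(1) ∀φ [Good(φ) ↔ ∃ψ (Indiff(ψ) ∧ Pref(φ, ψ))]   [A4]
(2) Indiff(q)   [premise]
(3) Pref(p, q)   [premise]
(4) Good(p) ↔ ∃ψ (Indiff(ψ) ∧ Pref(p, ψ))   [∀-elim, (1)]
(5) Indiff(q) ∧ Pref(p, q)   [∧-intro, (2), (3)]
(6) ∃ψ (Indiff(ψ) ∧ Pref(p, ψ))   [∃-intro, (5)]
(7) Good(p)   [↔-elim, (4), (6)]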

5. Bridging to Ethical Principles

So far there is no connection between value and traditional ethical categories, such as the obligatory and the forbidden. In the full paper, we introduce “bridging” principles that take us from value not only to these two categories, but to the complete spectrum of ethical categories in (Bringsjord 2015). Here, put informally, are two of the principles we formalize and explore:

P1/P2  Where φ is any good (bad) state-of-affairs, it ought to be (is forbidden) that φ.

In the full paper, we explore the automated proving of the theorems in §5 by way of ShadowProver.
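As a candidate formalization — a sketch only, not necessarily the form the full paper adopts — P1 and P2 can be rendered in the notation of §3, with Ob and Forb as deontic operators for obligation and prohibition:

B1  ∀φ [Good(φ) → Ob(φ)]
B2  ∀φ [Bad(φ) → Forb(φ)]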

6. On Deriving an ‘Ought’ From an ‘Is’

Hume famously maintained in his Treatise of Human Nature that an ought cannot be derived from an is. Yet it appears that, one, those calling for “value alignment” are calling for what Hume declared impossible, and that, two, we have nonetheless accomplished the very thing, for we have:

Theorem: It ought to be that φ → φ.

Proof: Trivial: φ → φ is a theorem, hence by Sg it is Good, and thus by P1 it is obligatory. QED
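In terms of the candidate formalization sketched in §5, the derivation runs in three steps: (1) ⊢RA φ → φ, a tautology; (2) Good(φ → φ), by Sg; (3) Ob(φ → φ), i.e. it ought to be that φ → φ, by B1.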

7. Some “Value-Alignment” Theorems, Machine-Discovered/Verified

As mentioned above, ShadowProver, used previously to automate a quantified modal logic, can be used to implement the inference system for the calculus presented in Section 3.∗∗ We show three theorems that ShadowProver can prove. Below we present the theorems and figures showing output from the prover and the time taken to prove the corresponding theorem.

Theorem 1: Since we prefer reading Ibsen to brushing one’s teeth, the former activity is good. See Figure 1 below for ShadowProver’s output for this theorem.

Fig. 1. Proving Theorem 1

Theorem 2: Since we prefer brushing one’s teeth to paying taxes, the latter activity is bad. See Figure 2 below for ShadowProver’s output for this theorem.

∗∗An open-source implementation is available at https://github.com/naveensundarg/prover. For help using the implementation, please contact the second author.

Fig. 2. Proving Theorem 2

Theorem 3: Assuming the wherewithal to do so, one ought to read Ibsen. Similarly, see Figure 3 below for ShadowProver’s output for this theorem.

Fig. 3. Proving Theorem 3
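For concreteness, the three theorems can be reconstructed in the notation of §3 and §5 as follows; the premises are our own illustrative reconstruction (in particular the assumption that tooth-brushing is value-indifferent), and the encodings actually given to ShadowProver in Figures 1–3 may differ in detail:

Theorem 1: {Indiff(BrushTeeth), Pref(ReadIbsen, BrushTeeth)} ⊢RA Good(ReadIbsen), via A4.
Theorem 2: {Indiff(BrushTeeth), Pref(BrushTeeth, PayTaxes)} ⊢RA Bad(PayTaxes), via A5.
Theorem 3: Adding the bridging principle P1 of §5 to the premises of Theorem 1 yields that it ought to be that ReadIbsen.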

8. Conclusion and Next Steps

We have sought to explicitly and formally forge a connection between axiology and deontic concepts and propositions, in order to rationalize calls for “value alignment” in AI. Have we succeeded? At this point, confessedly, the most that could be said in our favor is that we have put on the table an encapsulation of a candidate for such forging.

What additional steps are necessary? Some are rather obvious. Goodness of states-of-affairs, and badness of them as well, would seem to fall into continua; yet what we have adapted from Chisholm and Sosa allows for no gradations in value. That goodness and badness come in degrees appears to be the case even if attention is restricted to states-of-affairs that are intrinsically good (bad). For instance, knowledge (at least of “weighty” things, say the stunning truths about the physical world in relativity theory) on the part of a person would seem to have intrinsic value, but the selfless love of one person for another would seem to be something of even greater intrinsic value. Yet at this point, again, our axiology admits of no gradations in goodness and badness. We know that we need a much more fine-grained axiology. Our suspicion is that goodness and badness should be divided between what has intrinsic value/disvalue, and what has value instrumentally. Instrumental value would be parasitic on intrinsic value. More specifically, we are inclined to think that both the instrumental and the intrinsic are graded.

One final remark: Even at this early phase in the forging of a formal connection between value and ethical principles based on deontic operators, it seems patently clear that because actual and concrete values in humanity differ greatly between group, nation, culture, religion, and so on, any notion that a given AI or class of AIs can be aligned with the set of values in humanity isn’t only false, but preposterous. For many Christians, for example, the greatest intrinsic good achievable by human persons is direct and everlasting communion with the divine person: God. For many others (e.g. (Thagard 2012)), this God doesn’t exist, and the greatest goods are achieved by living, playing, and working in the present world, in which our lives are short and end forever upon earthly death. In the context formed by the brute fact that values among human persons on our planet vary greatly, and are, together, deductively inconsistent, our focus on formality is, we submit, prudent. Once the formal work has advanced sufficiently, presumably the alignment of AIs with values can be undertaken relative to a concretely instantiated axiology, from which ethical principles flow. For a given class of AIs, then, their behavior would be regulated by ethical principles that flow from a particular instantiation of the axiology.

References

Arnold, T., Kasenberg, D. & Scheutz, M. (2017), Value Alignment or Misalignment — What Will Keep Systems Accountable, in ‘Proceedings of the AAAI-17 Workshop on AI, Ethics, and Society (WS-17-02)’, pp. 81–88. URL: https://aaai.org/ocs/index.php/WS/AAAIW17/paper/view/15216/14648

Bringsjord, S. (2015), A 21st-Century Ethical Hierarchy for Humans and Robots: EH, in I. Ferreira, J. Sequeira, M. Tokhi, E. Kadar & G. Virk, eds, ‘A World With Robots: International Conference on Robot Ethics (ICRE 2015)’, Springer, Berlin, Germany, pp. 47–61. This paper was published in the compilation of ICRE 2015 papers, distributed at the location of ICRE 2015, where the paper was presented: Lisbon, Portugal. The URL given here goes to the preprint of the paper, which is shorter than the full Springer version. URL: http://kryten.mm.rpi.edu/SBringsjord_ethical_hierarchy_0909152200NY.pdf

Chisholm, R. (1975), ‘The Intrinsic Value in Disjunctive States of Affairs’, Noûs 9, 295–308.

Chisholm, R. & Sosa, E. (1966), ‘On the Logic of “Intrinsically Better”’, American Philosophical Quarterly 3, 244–249.

Davidson, D., McKinsey, J. & Suppes, P. (1955), ‘Outlines of a Formal Theory of Value, I’, Philosophy of Science 22(2), 140–160.

Govindarajulu, N. & Bringsjord, S. (2017), On Automating the Doctrine of Double Effect, in C. Sierra, ed., ‘Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)’, International Joint Conferences on Artificial Intelligence, pp. 4722–4730. URL: https://doi.org/10.24963/ijcai.2017/658

McIntyre, A. (2004/2014), The Doctrine of Double Effect, in E. Zalta, ed., ‘The Stanford Encyclopedia of Philosophy’. URL: https://plato.stanford.edu/entries/double-effect

Russell, S., Dewey, D. & Tegmark, M. (2015), ‘Research Priorities for Robust and Beneficial Artificial Intelligence’, AI Magazine, pp. 105–114.

Thagard, P. (2012), The Brain and the Meaning of Life, Princeton University Press, Princeton, NJ.