
Hard Choices and Hard Limits for Artificial Intelligence

Bryce Goodman
Department of Philosophy, University of Oxford, United Kingdom
[email protected]

ABSTRACT

Artificial intelligence (AI) is supposed to help us make better choices [1]. Some of these choices are small, like what route to take to work [27], or what music to listen to [64]. Others are big, like what treatment to administer for a disease [74] or how long to sentence someone for a crime [2]. If AI can assist with these big decisions, we might think it can also help with hard choices, cases where alternatives are neither better, worse nor equal but on a par [15]. The aim of this paper, however, is to show that this view is mistaken: the fact of parity shows that there are hard limits on AI in decision making and choices that AI cannot, and should not, resolve.

KEYWORDS

Value alignment; fairness; hard choices; AI ethics

1 Introduction

Artificial intelligence (AI) is supposed to help us make better choices [1]. Some of these choices are small, like what route to take to work [27], or what music to listen to [64]. Others are big, like what treatment to administer for a disease [74] or how long to sentence someone for a crime [2]. If AI can assist with these big decisions, we might think it can also help with hard choices. The aim of this paper, however, is to show that this view is mistaken.

The stakes are as follows. We live in a world where AI is increasingly used to make decisions with significant ethical consequences. In many cases, AI predictions already outperform human judgement, and this trend is poised to continue. In the coming years, many of the decisions currently made by people will – and should be – delegated to AI. The question is whether there are theoretical limits on AI in decision making and choices that AI cannot, and should not, resolve.

This paper is structured into the following sections:

1. The first seeks to identify what it is that makes a choice hard. In this section, I reconstruct and defend the philosopher Ruth Chang's position that hard choices occur because alternatives are "on a par": one is neither better, worse nor equal to the other, and yet the alternatives are comparable.
2. The second situates Chang's conclusions in the context of the utility model of rational choice. This model says that rational agents' choices are governed by a utility function,^1 and denies the possibility of parity. If we accept Chang's argument, the utility model fails in the case of hard choices.
3. The third discusses the implications for AI writ large. Every application of machine learning requires an objective function, which attempts to compress a complex problem into a single scalar value. Thinkers like Stuart Russell believe that we can use AI to learn an agent's reward function, and therefore guarantee alignment between AI goals and human values. But the fact of hard choices proves this approach is wrong: human decision making cannot always be represented by an objective function because human values are not always amenable to quantitative representation.
4. The last provides evidence of the limits of AI decision making through a discussion of algorithmic fairness. Fairness is an essentially contested subject: there is no obvious or objectively correct definition. Certain conceptions of fairness are not better, worse or equal to one another – they are on a par. The result is that only people, as moral agents, are capable of committing to a particular conception of fairness, which can then be instantiated in an AI system. There are indeed theoretical limits on AI in decision making.

Though my assessment of AI and hard choices is critical, its conclusion is positive: hard choices don't show that AI is unimportant or uninteresting, but that human agency is more important, and more complex, than we might have thought.^2

^1 Utility function, reward function and objective function are each used in different disciplines (e.g., machine learning, reinforcement learning) and are, for our intents and purposes, interchangeable. What matters is that they are all functions, "binary relation[s] between two sets that associate to each element of the first set exactly one element of the second set" [81].
^2 For a different, though obviously related, attempt to cash out the implications of Chang's argument for AI, see [23].

2 What makes a choice hard?

The following section provides an overview and defense of Ruth Chang's theory of hard choices. Readers already familiar with Chang's work should feel free to skim, as should those who are less philosophically inclined and/or willing to accept Chang's conclusions without argument.

Life is filled with choices. Some are hard. Should I pursue a career that brings personal fulfillment, or financial security? Should I live close to my parents, or set off on my own? Do I favor the Beatles, or the Rolling Stones?

But a decision being big is not the same as a decision being hard. Big choices are big because of their consequences: deciding where we live, who we marry, or whether we undergo a medical procedure are all big choices because the outcome of the choice will have a big impact on our lives. But big choices are not always hard and hard choices are not always big [15:27]. It is a big choice to marry the person we love, but not necessarily a hard choice. On the other hand, deciding whether to spend your holiday in Italy or Ethiopia is a choice that may be small, in terms of overall life consequences, yet hard, because it lacks a clearly correct answer.

It's also worth distinguishing between epistemically and metaphysically hard choices. Some choices clearly have a right answer, even if it is unclear what the right answer is, e.g. will a particular medical treatment improve a patient's condition? In other cases, it is not clear that there is a right answer, but one must choose anyway, e.g. what is the best way to allocate limited medical resources during a pandemic? The latter are cases of hard choices, not the former.

Chang [15:1] proposes the following gloss of situations that involve hard choices:

1. One alternative is better in some relevant respects,
2. The other alternative is better in other relevant respects, and yet
3. Neither seems to be at least as good as the other overall, that is, in all relevant respects.

If we think of rational agency as consisting in the ability to recognize and respond to reasons,^3 a hard choice occurs when "the agent's reasons for going one way as opposed to the other have, in some sense, 'run out'" [15:1]. Chang's claim is that reasons run out because of a structural feature of hard choices, namely that the alternatives under consideration are neither better, worse nor equal – they are on a par.

Chang defends her position against three alternatives. The first is an argument for ignorance or uncertainty: "what makes a choice hard is that we are ignorant or uncertain of the normative or nonnormative factors that are relevant to making the choice" [15:3]. An agent simply does not know enough about each option to decide: she may be unsure of how a choice will play out, or ignorant about what she actually values.

Deciding how to invest one's money is difficult. But it is not difficult because we don't know what we want. It is difficult because we don't know what choice is most likely to result in what we want. If we knew that investing in stock A would bring us twice the return of stock B, we would buy it. The choice is difficult just because we don't know how things are going to turn out – we are ignorant of nonnormative factors relevant to making the choice.

On the other hand, deciding whether to donate to Save the Children or Greenpeace may be a difficult choice because we are not really sure what cause we should care most about: we are genuinely unsure of what matters more, saving a child's life or protecting the world's oceans. This is also a case of ignorance, but this time we are uncertain about the normative factors relevant to making the choice.

To determine whether a choice's difficulty arises in virtue of ignorance or uncertainty, we can pose the counterfactual: "Would it still be a difficult choice if we had perfect knowledge of all the normative and nonnormative factors relevant to making the choice?" If the argument for ignorance is correct, then in the case of hard choices the answer must always be no.^4

Chang argues that ignorance cannot be the cause of hard choices because there are cases where the chooser has first person authority over the properties relevant to the choice, but the choice is hard nevertheless.^5 She offers us the example of choosing between lemon sorbet and apple pie where the only relevant aspect is tastiness to the chooser.^6 Let us suppose we are able to sample each so we know precisely how each dessert tastes. It is plausible that neither tastes better overall: both are incredibly delicious in their own right. And, Chang argues, it is also possible that they do not taste equally good. If apple pie and lemon sorbet did taste equally good, adding a small improvement to one should make the choice obvious. But, she argues, this is not so. We may find that apple pie tastes better with whipped cream than without. If we are stuck choosing between apple pie and lemon sorbet, it is not clear that adding whipped cream to the pie will resolve the issue. However, if apple pie and lemon sorbet were equal, a small improvement to one should lead us to rationally prefer that option.

Chang [15:5–6] calls this the "small improvement argument":

1. A is neither better nor worse than B with respect to V (for example, apple pie is neither better nor worse than lemon sorbet with respect to tastiness).
2. A+ is better than A with respect to V (apple pie with whipped cream is better than apple pie without whipped cream with respect to tastiness).
3. A+ is not better than B with respect to V (apple pie with whipped cream is not better than lemon sorbet with respect to tastiness).
4. A and B are not equally good with respect to V (if they were, apple pie with whipped cream would taste better than lemon sorbet, but in premise 3 we have assumed it does not).
5. Therefore, A is not better than, worse than, or equally as good as B with respect to V.
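Because the argument turns entirely on the structure of the comparative relations, it can help to set it out in symbols. The following formalization is mine, not Chang's: writing A ≻_V B for "A is better than B with respect to V" and A ∼_V B for "A and B are equally good with respect to V", premise 4 above is unpacked into the tie-breaking principle (iv) and its consequence (v).

```latex
% Notation: A \succ_V B means "A is better than B with respect to V";
%           A \sim_V B  means "A and B are equally good with respect to V".
\begin{align*}
  \text{(i)}\quad   & \neg(A \succ_V B) \wedge \neg(B \succ_V A) && \text{premise 1}\\
  \text{(ii)}\quad  & A^{+} \succ_V A && \text{premise 2}\\
  \text{(iii)}\quad & \neg(A^{+} \succ_V B) && \text{premise 3}\\
  \text{(iv)}\quad  & (A \sim_V B) \rightarrow (A^{+} \succ_V B) && \text{a small improvement to one of two equals breaks the tie}\\
  \text{(v)}\quad   & \neg(A \sim_V B) && \text{from (iii) and (iv); premise 4}\\
  \text{(vi)}\quad  & \neg(A \succ_V B) \wedge \neg(B \succ_V A) \wedge \neg(A \sim_V B) && \text{from (i) and (v); the conclusion}
\end{align*}
```

On this reading the work is done by step (iv): if equality held, the sweetened pie would have to beat the sorbet, and premise 3 says it does not.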

The dessert example illustrates the possibility of cases where "none of the three usual comparative relations, 'better than', 'worse than', and 'equally good'" holds even though the chooser has perfect knowledge of the relevant aspects of the choice [15:4]. Consequently, ignorance alone cannot explain all cases of hard choices that fit our gloss.

The second objection Chang rejects holds that hard choices occur when the alternatives are incommensurable. To say that alternatives are incommensurable is to say that they lack a common measure [56:1,15:6]. Weight and height are, in this sense, incommensurable properties: it is incoherent to compare an object's height with another object's weight. On this view, apple pie and lemon sorbet are incommensurable because tastiness is not the sort of property that can be represented by units: we cannot say that apple pie is 2.4 times as tasty as lemon sorbet. So, one might argue, hard choices arise because we lack the ability to quantitatively compare alternatives.

This argument, however, conflates incommensurability with incomparability. We may not be able to quantify the difference in taste between apple pie and lemon sorbet, but we can nevertheless compare the two. Alternatives that cannot be related cardinally may still be related ordinally: we regularly engage in rational comparisons of incommensurable alternatives by ranking one above the other.

Furthermore, incommensurability between alternatives does not necessarily mean that a choice is hard. There may be no unit for measuring how much enjoyment we get from watching television or falling in love, but this does not mean we would struggle to choose one experience over the other. As Chang [15:7] writes, "since it is compatible with alternatives being incommensurable that one of them is better than the other with respect to [the relevant property], the choice between incommensurables could be easy: choose the better alternative. Thus, incommensurability is not what makes a choice hard".

A third position considered by Chang [15:10] is that a choice is hard when the options "cannot be compared with respect to what matters in the choice". On this view, whenever there is no basis for comparing two options, there is no basis for a rational choice [56]. If incomparability is what makes choices hard, then choosing one alternative over another can never be rational.

Chang's three arguments against this position essentially boil down to the observation that we can, in fact, make rational decisions even in the case of hard choices: "incomparabilists cannot make sense of the relatively common fact that sometimes in a hard choice the agent rationally concludes that she has most reason to choose one rather than the other of the alternatives" [15:9]. Intuitively, if we think lemon sorbet and apple pie can be compared in terms of their tastiness even if their tastiness is incommensurable, we must reject the view that hard choices arise only when and because alternatives are incomparable.^7

Our discussion thus far has concerned Chang's argument that neither ignorance, incommensurability nor incomparability are the root cause of hard choices. Before proceeding to evaluate Chang's positive claims, it is worth situating her view relative to the classical assumptions of rational choice and expected utility theory, which provide the basis for what I will call the "utility model" of rational choice [51].^8

^3 Chang [16] refers to this as the "passivist view" of rational agency (in contrast to her own "activist view", discussed later in our paper) and finds its roots in the works of Susan Wolf [78], Joseph Raz [55], Thomas Scanlon [63], Derek Parfit [54], Jonathan Dancy [22], John Skorupski [68] and, more distantly, Hobbes [38:45], who wrote that "when a man reasons, he does nothing else but conceive a sum total from addition of parcels for REASON…is nothing but reckoning".
^4 To be clear, a choice can be intuitively "hard" even if we know what choice we need to make – you may know that the right thing is to end a relationship, but nevertheless struggle with the decision. This sort of difficulty, however, does not pertain to knowing what choice to make, but rather to the significance and/or consequences of making that choice.
^5 In other words, the chooser has perfect knowledge of the relevant normative and nonnormative factors, but the choice is still hard.
^6 If this example seems overly trivial, we might suppose that this will be the chooser's last meal, or swap in more somber stakes, e.g. Sophie's choice of which child to save.
^7 This is a very short and incomplete version of an argument Chang [14] offers in favor of comparativism, which says (among other things) that we can rationally choose between incomparable alternatives. This claim is not central to our argument and so is only cited for completeness.
^8 There are, of course, lively debates within the field of decision theory that I am passing over, such as whether the expected utility model is a normative or positive theory.

3 Hard Choices and the Utility Model

According to the utility model of rational choice, "any rational agent can be described as having a utility function that assigns a real number representing the desirability of being in any particular world state" [59:7–8]. In particular, expected utility theory says that a rational decision maker chooses between alternatives by comparing their weighted expected utility values, which is a function of utility value and probability [50]. When the choice takes place under certain conditions, as in the dessert example, the utility value alone is what determines the choice.^9

The utility model, and its assumptions about how people make choices, plays a prominent role in economics [30], public policy [35,39], ethics [47], psychology [25,42] and artificial intelligence [33,31,60]. As a descriptive theory, it offers a tool for understanding and predicting why people choose what they do: "it looks at people's choices (e.g. how much money they've saved, what car they bought), and tries to 'rationalize' those choices, that is, figure out whether the choices are compatible with optimization and, if so, what the choices imply about the agent's preferences" [45:7]. When we rationalize an agent's choices we are trying to create an abstract representation of her preferences that will make sense of the decisions she made and, insofar as she acts rationally, predict what decisions she is likely to make.^10

Under the utility model, an agent chooses between alternatives in accordance with a set of preferences, and her choice is rational insofar as those preferences are governed by a set of logical relations. In particular, it is assumed that an agent's preferences are both transitive and complete [73:66–67]. Transitivity means that if a rational agent prefers A to B and B to C, she will also prefer A to C. Completeness means that all of an agent's options can be related in one of three ways:

1. She prefers A to B
2. She prefers B to A
3. She is indifferent between A and B
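A minimal sketch may help make the completeness assumption vivid (the function, names and utility numbers below are mine, for illustration only): once every option is summarized by a real-valued utility, any pairwise comparison must land in exactly one of the three relations above, and there is simply no way to return a fourth verdict such as "on a par".

```python
def compare(utility_a: float, utility_b: float) -> str:
    """Relate two options under the utility model.

    Because utilities are real numbers, exactly one of the three
    relations can hold; the representation has no room for a fourth.
    """
    if utility_a > utility_b:
        return "prefers A to B"
    if utility_b > utility_a:
        return "prefers B to A"
    return "indifferent between A and B"


# Hypothetical utilities for Alice's dessert options (invented numbers).
utility = {"apple pie": 0.80, "lemon sorbet": 0.80, "apple pie + cream": 0.85}

print(compare(utility["apple pie"], utility["lemon sorbet"]))          # indifferent between A and B
print(compare(utility["apple pie + cream"], utility["lemon sorbet"]))  # prefers A to B
```

Note that the second comparison is forced: if pie and sorbet receive equal numbers, any small improvement to the pie must produce a strict preference for it over the sorbet, which is exactly the inference the small improvement argument blocks.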

This trichotomy should be familiar – it exhaustively describes the way in which any two quantities can be related. For example, an object may weigh more, less, or the same as another. There is no fourth option.

If we accept this model, we can make a large number of inferences from a small number of choices. For example, if Alice chooses apple pie over lemon sorbet and we assume that tastiness is the only factor relevant to her choice, the utility model says we can infer that she thinks apple pie is either better than or equal to lemon sorbet with respect to tastiness. Further, if we learn that she prefers apple pie with whipped cream to apple pie, we can reliably predict that, insofar as she is rational, she will always tend towards choosing apple pie with whipped cream over lemon sorbet.

If Chang's theory of hard choices is true, however, it poses a particular challenge for the utility model of rational choice. Chang rejects the trichotomy and introduces a fourth option: A and B may be on a par [12]. When alternatives are on a par they are comparable yet incommensurable: we can rationally choose one over the other even though we lack a common metric for comparison. And, she argues, it is precisely in these cases that we end up facing a hard choice.

If Chang's account is accurate, the trichotomy assumed by the utility model provides an incomplete vocabulary to describe rational choice. We must introduce a fourth relationship in which A is neither better, worse nor equal to B. Further, parity is not transitive: if A is on a par with B and B is on a par with C, it does not follow that A must be on a par with C. Alice may think apple pie with whipped cream is on a par with lemon sorbet and that lemon sorbet is on a par with apple pie but still prefer apple pie with whipped cream over plain old apple pie [15:21]. The upshot is that, if we accept parity as a relationship between alternatives, we are limited in what we can infer about a rational agent based on her choices alone: if Alice picks apple pie over lemon sorbet, we cannot conclude she thinks that apple pie is better or equally good. She may, instead, think that they are on a par.

The utility model may still be able to accurately describe and predict how agents make decisions in many situations, in particular those situations that involve a choice between alternatives that can be quantitatively compared (e.g. money). If we could reliably distinguish cases where an agent faces a hard choice, we could appropriately circumscribe the model. However, the fact that hard choices are caused by structural, rather than substantive, features of a choice means that there is no obvious way to determine, a priori, when hard choices will arise for an agent. Furthermore, because hard choices are structural, they might arise far more often than we might think – as we have already seen, a hard choice does not have to be big.

So, what are we to do when we face a hard choice? How can we make a rational decision when alternatives are on a par? Chang's answer begins by distinguishing between reasons that rest upon facts external to the actor (given reasons) and those that exist in virtue of the actor's will (will-based reasons). According to Chang [15:23], "when our given reasons are on a par, will-based reasons may step in…by putting your will behind a feature of an option—by standing for it—you can be that in virtue of which something is a will-based reason for choosing that option". In hard choices, it is an agent's will, rather than any external condition, that serves as the determinative source of reasons.

Chang's argument is at once intuitive and surprising. It is intuitive that the exercise of our will – the act of deciding to do something – may be a source of reasons. After all, when we commit to a cause or a person, that commitment grounds reasons (e.g. to spend our time with them, to protect them, etc.). But it is surprising that rational agency – the ability to decide based on reasons – may require the ability to not merely recognize but also generate reasons [16].

^9 Inverse reinforcement learning, which we will discuss later, assumes what is sometimes called "Boltzmann (ir)rationality", according to which humans are "noisily rational" and will choose an option with a probability proportional to e to the power of its utility [79,66].
^10 As a normative theory, the standard model would also tell us what decisions she should make.

4 Hard Choices and the Limits of AI

We now turn our attention to how hard choices should affect how we think about AI. Although many of the conclusions we reach will be applicable to AI more generally, our analysis will focus on machine learning, a specific sub-field of AI concerned with algorithms that "learn" from data. But first, it is worth briefly saying something about what AI is, and what it is not.

The language of AI is math: if a problem cannot be stated mathematically, it is not suitable for AI. More specifically, we cannot use AI unless we can provide a quantitative description of what it is we want to optimize. In machine learning, an algorithm learns to map a set of inputs to outputs. A map learned by a machine learning algorithm is called a model.^11 Models are produced through a process of training. In supervised learning, each training iteration involves measuring the success of the model at mapping inputs to outputs and then adjusting the model. The difference between a model's output (prediction) and the actual output (reality) is known as error. The goal of training is to reduce model error. The function used to track how well the model is doing is the objective function: this function's input is the model's results and its output is a scalar value. When the function outputs higher values for more accurate results, we call it a reward function. In the opposite case, we call it the cost, loss or error function [9:82]. If the algorithm is intended to optimize an agent's utility (e.g. evaluate and recommend the best alternative), the objective (or reward) function is equivalent to the utility function discussed in the previous section.

A well specified objective function is critical to ensure alignment between what an AI system does and what we want it to do. This is because the objective function "reduces all the various good and bad aspects of a possibly complex system down to a single number, a scalar value, which allows candidate solutions to be ranked and compared" [57:155]. As a practical matter, our ability to build an AI system that meets our goals will only ever be as good as our ability to translate those goals into a single metric.
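To make the objective function's role concrete, here is a minimal supervised learning sketch (the data and numbers are invented for illustration; this is not drawn from the paper): however rich the task, the only signal the training loop consults is the single scalar produced by the loss function.

```python
import numpy as np

def mean_squared_error(predictions: np.ndarray, targets: np.ndarray) -> float:
    """Objective (loss) function: collapses the model's performance on an
    entire dataset into one scalar value, which is all that training sees."""
    return float(np.mean((predictions - targets) ** 2))

# Toy dataset and a one-parameter model y = w * x (invented numbers).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

w = 0.0
learning_rate = 0.01
for _ in range(200):
    predictions = w * x
    # Gradient of the scalar loss with respect to the parameter w.
    gradient = np.mean(2 * (predictions - y) * x)
    w -= learning_rate * gradient

print(round(w, 2))                               # ~2.0: the fitted parameter
print(round(mean_squared_error(w * x, y), 4))    # one number standing in for "how well we did"
```

Everything the system "cares about" has to be packed into that one number before training can begin; whatever is left out of the loss is invisible to the optimization.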

The same is true for another popular field of machine learning: reinforcement learning. In the context of AI,^12 reinforcement learning is concerned with artificial agents or learners "learning what to do—how to map situations to actions—so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them" [71:2]. The "numerical reward signal" is specified by a reward function, which typically assigns different values to states brought about by an agent's actions [59:8].^13 More generally, reinforcement learning adopts what Rich Sutton calls the reward hypothesis: "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)" [70].

The upshot is that both supervised and reinforcement learning – the two dominant modes underpinning AI today – depend upon the ability to represent decisions and their results in quantitative, scalar terms.
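A toy illustration of the reward hypothesis, loosely modeled on the floor-sweeping robot of note 13 (the states, actions and reward values are invented): whatever the agent is "for" must be written down as a scalar reward, and acting well is then nothing more than picking the plan with the highest cumulative sum of that signal.

```python
from itertools import product

# Hypothetical reward function: maps (state, action) pairs to a scalar signal.
REWARD = {
    ("dirty", "clean"): 1.0,
    ("dirty", "wait"): -0.1,
    ("clean", "clean"): 0.0,
    ("clean", "wait"): 0.0,
}

def transition(state: str, action: str) -> str:
    """Deterministic toy dynamics: sweeping a dirty floor makes it clean."""
    return "clean" if action == "clean" else state

def cumulative_reward(start: str, plan: tuple, discount: float = 0.9) -> float:
    """The quantity the reward hypothesis says exhausts 'goals and purposes'."""
    total, state = 0.0, start
    for step, action in enumerate(plan):
        total += (discount ** step) * REWARD[(state, action)]
        state = transition(state, action)
    return total

# Choose the two-step plan with the highest cumulative scalar reward.
best_plan = max(product(["clean", "wait"], repeat=2),
                key=lambda plan: cumulative_reward("dirty", plan))
print(best_plan)  # ('clean', 'clean'), tied with ('clean', 'wait'); both score 1.0
```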
If Chang's theory is correct, it places clear limits on the reward hypothesis. More specifically, it says that we should reject the reward hypothesis whenever we encounter a hard choice, since in this scenario goals and purposes cannot be represented in terms of a scalar signal. Furthermore, it gives us criteria for deducing what scenarios or sorts of problems will elude AI solutions on philosophical, and not merely technical, grounds.

What does this mean for AI? It means that certain claims about the future of AI capabilities are a priori false. Demis Hassabis, co-founder and CEO of DeepMind – arguably the most advanced AI company in the world – has stated that DeepMind's goal is "to solve intelligence" and then use that intelligence "to solve everything else" [80]. But if Chang's thesis is true, solving intelligence won't solve everything else. In particular, it won't solve problems that involve hard choices.

Chang's theory also calls into question the merits of certain solutions to the so-called "alignment problem" in AI: the problem of ensuring agreement between AI objectives and human values [19].^14 One category of solutions is based upon inverse reinforcement learning, which attempts to solve the following problem [52]:

Given
1. measurements of an agent's behavior over time, in a variety of circumstances;
2. if needed, measurements of the sensory inputs to that agent; and
3. if available, a model of the environment,
Determine the reward function being optimized.

Put simply, the inverse reinforcement learning problem is the problem of inferring from an agent's behavior and circumstances the reward or utility function that is guiding her choices.^15 Various forms of inverse reinforcement learning have been offered as a solution to the alignment problem [33,32]. For example, Stuart Russell [59:7,60] proposes three principles to guide the design of "provably beneficial" AI:

1. The machine's purpose is to maximize the realization of human values.
2. The machine is initially uncertain about what those human values are.
3. Machines can learn about human values by observing the choices that we humans make.

Russell calls this "cooperative inverse reinforcement learning", and the basic idea is that AI can learn what someone values – her reward function – through repeat interactions and observations.^16
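To see what "learning what someone values" amounts to in practice, here is a deliberately simplified inverse-reinforcement-learning-style sketch of my own (it is not Russell's implementation). Observed choices are assumed to come from the Boltzmann, "noisily rational" choice model mentioned in note 9, and a scalar utility for each option is then recovered by maximum likelihood. The point to notice is that the procedure has to presuppose that such a scalar function exists before it can infer anything at all.

```python
import numpy as np

OPTIONS = ["apple pie", "lemon sorbet"]

def choice_probabilities(utilities: np.ndarray) -> np.ndarray:
    """Boltzmann choice model: P(option) is proportional to exp(utility)."""
    shifted = np.exp(utilities - utilities.max())
    return shifted / shifted.sum()

def fit_utilities(observed: list, steps: int = 2000, lr: float = 0.1) -> np.ndarray:
    """Impute a scalar utility per option by maximizing the likelihood of the
    observed choices under the Boltzmann model (simple gradient ascent)."""
    frequencies = np.array([observed.count(o) for o in OPTIONS], dtype=float)
    frequencies /= frequencies.sum()
    utilities = np.zeros(len(OPTIONS))
    for _ in range(steps):
        # Gradient of the average log-likelihood: observed minus predicted frequencies.
        utilities += lr * (frequencies - choice_probabilities(utilities))
    return utilities - utilities.mean()  # utilities are only identified up to a constant

# Hypothetical observation log: Alice picks pie six times out of ten.
observations = ["apple pie"] * 6 + ["lemon sorbet"] * 4
print(np.round(fit_utilities(observations), 2))  # roughly [ 0.2, -0.2]
```

Whatever Alice does, some pair of numbers will be imputed to her; if her pattern of choices in fact reflects parity rather than a settled preference, the procedure has no way of saying so.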

However, if Chang is correct, humans do not have anything like a reward function. Sometimes, perhaps oftentimes, goals and purposes simply cannot be represented as the maximization of the expected value of a scalar reward [75]. Russell's approach fails, not because AI cannot come to learn what we value per se, but because what we value is partly determined by internally generated reasons that are not revealed by externally observable behavior [13].^17

More generally, if we accept the possibility of parity, we allow for the possibility that a rational agent's preferences may be underdetermined up to and until she makes a choice.^18 This conclusion is supported not only by Chang's philosophical arguments, but by empirical evidence. As the neuropsychologists Benjamin Hayden and Yael Niv [36:7] put it, "value doesn't sit in the brain waiting to be used; rather, [it] is a complex and active process that takes place at the time the decision is made". The prevalence of ad hoc preference formation is also evidenced by the work of behavioral scientists Sarah Lichtenstein and Paul Slovic [46:1], who write:

One of the main themes that has emerged from behavioral decision research during the past three decades is the view that people's preferences are often constructed in the process of elicitation. This idea is derived from studies demonstrating that normatively equivalent methods of elicitation (e.g., choice and pricing) give rise to systematically different responses. These preference reversals violate the principle of procedure invariance that is fundamental to all theories of rational choice.

The only way to square this evidence with the idea that humans are rational agents is to accept that a rational agent need not have fully formed preferences prior to the act of choice and that "there is no guarantee of consistency or reliability [of preferences]" [36:7].

The upshot is that, when we 'rationalize' an agent's choices under the utility model, we may be imposing consistency where there is none.^19 Instead, we should recognize that inconsistency is not a sign of irrationality – it is a mark of agency. As Emerson [26:183] reminds us, "a foolish consistency is the hobgoblin of little minds…with consistency a great soul has simply nothing to do. He may as well concern himself with his shadow on the wall".

To recap, all applications of machine learning require an objective function that "reduces all the various good and bad aspects of a possibly complex system down to a single number, a scalar value, which allows candidate solutions to be ranked and compared" [57:155]. In particular, reinforcement learning adopts the reward hypothesis: "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)" [70]. However, hard choices are cases where alternatives cannot be represented in terms of scalar values: the standard trichotomy – greater than, lesser than and equal to – simply does not apply. Therefore, we should not expect that AI can resolve those choices.

This conclusion has important implications for the relationship between intelligence and agency. Pei Wang [29:31] defines intelligence as "the capacity of a system to adapt to its environment while operating with insufficient knowledge and resources". If we accept this definition, it seems perfectly plausible that machines can satisfy the requirements of being (at least somewhat) intelligent. But agency – the ability to engage in making rational choices – is quite a different matter. Our argument reveals that a rational agent must not only recognize reasons but, when faced with a hard choice, be capable of generating reasons.^20

All of which points to a new, and critical, question: can we rely on AI to generate the sort of reasons required in hard choices? And, perhaps more importantly, should we? We turn to this question in the following section.

^11 Deep learning is a subfield of machine learning that builds models using neural networks [9].
^12 Outside of AI, reinforcement learning "has furnished the canonical computational framework for understanding value-guided decision-making in humans and other primates, and has been widely used to explain how different brain regions participate in valuation and choice" [41:856].
^13 The issue of how to properly specify reward functions plays a prominent role in discussions about AI safety. If we want to train a robot to sweep our floors, for example, we may design a reward function that assigns positive value to the action "cleaning" or the state "clean floors". If the reward function assigns positive value every time a floor goes from dirty to clean, our robot may get stuck in an endless cycle of cleaning, dumping and cleaning again, since this would produce a greater reward than simply cleaning the floors once. This behavior is known as "reward hacking" because the AI agent has "hacked" the reward function [20].
^14 This same concern is also sometimes referred to as the "control problem": "how to ensure that systems with an arbitrarily high degree of intelligence remain strictly under human control" [59:3]. See also [10].
^15 This is, more or less, an operationalization of Paul A. Samuelson's [61] "Consumption theory in terms of revealed preference". For two canonical critiques, see [65,76].
^16 In particular, Russell [59:8] states that "to the extent that objectives can be defined concisely by specifying reward functions, behavior can be explained concisely by inferring reward functions".
^17 This conclusion is in accord with Stuart Armstrong and Sören Mindermann [5:2], who argue that "although current [inverse reinforcement learning] methods can perform well on many well-specified problems, they are fundamentally and philosophically incapable of establishing a 'reasonable' reward function for the human, no matter how powerful they become." Where we differ, however, is over why such methods fail: they argue that "normative assumptions" need to be introduced to constrain the space of potential reward functions assignable to observed behavior, whereas we have argued that an agent's preferences may be underdetermined up to and until a choice is made.
^18 A recent study by Silver et al. [67] found "that infants experienced choice-induced preference change similar to adults'…hence, choice shapes preferences—even without extensive experience making decisions and without a well-developed self-concept."
^19 One is reminded of Nassim Nicholas Taleb's Bed of Procrustes [72]: "we humans, facing limits of knowledge, and things we do not observe, the unseen and the unknown, resolve tension by squeezing life and the world into crisp commoditized ideas, reductive categories, specific vocabularies, and prepackaged narratives..."
^20 Sutton [69] notes that John McCarthy, who coined the term "artificial intelligence", believed that "intelligence is the computational part of the ability to achieve goals in the world." This is perfectly commensurate with our argument, which says that rational agency requires an additional capacity: to establish those goals.

5 Hard Choices in AI Design: The Case of Fairness

If we can rely upon AI to resolve hard choices, it must follow that we can rely upon AI to resolve the hard choices that arise in its own design. A closer examination reveals that this is not so, nor should we want it to be.

Imagine we are designing an algorithm that will be used to recommend candidates for a job, and we want the algorithm to be fair with regard to a candidate's sex. We immediately face a hard choice: how should we define fairness? For some, fairness depends on everyone being treated the same: we should all be subject to the same rules and processes. For others, what really matters is not the process, but where people end up: if applying the same rules to everyone results in one group having less than another, we shouldn't subject everyone to the same rules. The debate has no obviously correct^21 resolution because fairness is what W.B. Gallie [28:168] calls an 'essentially contested concept': "when we examine the different uses of these terms and the characteristic arguments in which they figure we soon see that there is no one clearly definable general use of any of them which can be set up as the correct or standard use".

The mere fact people cannot agree on how a term should be used does not mean it refers to an essentially contested concept: there are lively debates over whether human-induced climate change is real or child vaccinations cause autism, but this is just because certain groups are scientifically misguided. However, in the case of fairness, the fact that we lack a universally accepted conception is not the result of ignorance or obstinance. Rather, insofar as fairness is an essentially contested concept, disputes "are perfectly genuine" and "although not solvable by argument of any kind, are nevertheless sustained by perfectly respectable arguments and evidence" [28:169].

Most of the time, our world is able to accommodate such conceptual debates: art galleries will remain open even if we disagree over the essential nature of art. However, if we are to ensure that an algorithm's recommendations are fair, we must first be able to express fairness in mathematical terms. This is where the trouble begins.

Numerous methods have been proposed for making machine learning algorithms 'more fair' [77,40]. Mehrabi et al. [48] provide an overview of more than twenty-one different definitions of 'algorithmic fairness' found in the machine learning literature, such as:

• Equalized odds – Men and women should have equal rates for true positives and false positives [34,58,49,7].
• Equal opportunity – Men and women should have equal true positive rates [6,34,8].
• Demographic parity – The likelihood of a positive outcome should be the same regardless of whether the person is a man or a woman [21].
• Fairness through awareness – Men and women with similar characteristics should receive similar outcomes [24].
• Fairness through unawareness – The algorithm is fair so long as gender is not explicitly used in the decision-making process [18].
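These definitions are attractive for machine learning precisely because each one is a short computation. The sketch below runs the first three on an invented toy hiring dataset (the data, variable names and binary "recommended" output are hypothetical); nothing in the arithmetic tells us which of the resulting gaps is the one that matters.

```python
import numpy as np

# Hypothetical hiring data: group label, model recommendation, true suitability.
sex = np.array(["F", "F", "F", "F", "M", "M", "M", "M"])
recommended = np.array([1, 0, 1, 0, 1, 1, 1, 0])
qualified = np.array([1, 0, 1, 1, 1, 0, 1, 0])

def positive_rate(mask: np.ndarray) -> float:
    """Share of people in the masked subset whom the model recommends."""
    return float(recommended[mask].mean())

def demographic_parity_gap() -> float:
    """Difference in positive-outcome rates between women and men."""
    return abs(positive_rate(sex == "F") - positive_rate(sex == "M"))

def equalized_odds_gaps() -> tuple:
    """Differences in true-positive and false-positive rates between groups."""
    tpr_gap = abs(positive_rate((sex == "F") & (qualified == 1)) -
                  positive_rate((sex == "M") & (qualified == 1)))
    fpr_gap = abs(positive_rate((sex == "F") & (qualified == 0)) -
                  positive_rate((sex == "M") & (qualified == 0)))
    return tpr_gap, fpr_gap

print(demographic_parity_gap())  # 0.25: women are recommended less often than men
print(equalized_odds_gaps())     # (0.333..., 0.5): the error rates differ across groups too
```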

Each conception captures a different intuition about what might constitute fairness. One would hope that all of these intuitions are mutually compatible in practice. But they are not.

Kleinberg et al. [44] show that various conceptions of fairness in machine learning derive from an attempt to satisfy one of three distinct conditions:

• Calibration condition – the probability estimates provided by the algorithm should be well calibrated across groups: if the algorithm predicts that x% of women and y% of men are good candidates, then x% of women and y% of men should be good candidates.
• Balance for the positive class condition – the average score (probability estimate) received by people recommended by the algorithm should be the same across groups: if women who score x or greater are recommended, then men who score x or greater should also be recommended.
• Balance for the negative class condition – the average score (probability estimate) received by people rejected by the algorithm should be the same across groups: if women who score x or lower are rejected, then men who score x or lower should also be rejected.

The researchers consider a number of cases and proposed solutions before concluding that "these conditions are in general incompatible with each other…moreover, this incompatibility applies to approximate versions of the conditions as well" [44:3].
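For concreteness, the three conditions can be checked directly against a model's scores. The sketch below does so on invented numbers (the example is mine, not Kleinberg et al.'s construction, and the calibration check is a crude group-level proxy); their result is that, outside of degenerate cases, no scoring scheme can drive all of these gaps to zero at once.

```python
import numpy as np

# Hypothetical candidate scores, group labels and true outcomes.
group = np.array(["F", "F", "F", "M", "M", "M"])
score = np.array([0.9, 0.6, 0.2, 0.8, 0.7, 0.3])
good_candidate = np.array([1, 1, 0, 1, 0, 0])

def calibration_gap(g: str) -> float:
    """Group-level calibration check: mean predicted rate vs. actual rate of good candidates."""
    members = group == g
    return abs(float(score[members].mean()) - float(good_candidate[members].mean()))

def average_score(g: str, positive: bool) -> float:
    """Average score received by group g's good (positive=True) or bad candidates."""
    members = (group == g) & (good_candidate == (1 if positive else 0))
    return float(score[members].mean())

print(calibration_gap("F"), calibration_gap("M"))             # calibration condition, per group
print(average_score("F", True), average_score("M", True))     # balance for the positive class
print(average_score("F", False), average_score("M", False))   # balance for the negative class
```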
All of the metrics available for measuring or achieving fairness will, in nearly all cases, result in trade-offs: intuitions about what constitutes a fair decision will be violated by implementing one alternative or another.^22 The upshot is that deciding how to measure fairness is a hard choice: certain alternatives are neither better, worse nor equal to one another with respect to fairness. They are on a par.

So, what are we to do? Chang has already provided an answer: when faced with a hard choice, we have reached the limits of quantitative comparison. Nothing in the world will tell us the correct answer. Instead, we must commit. Our commitment can then become a new source of reasons: "when you commit to something…it is your agency, the very activity of committing, that is the source of the reasons you create" [15:24].^23

Of course, we do not have to commit. We can neglect to "exercise our normative power as rational agents" and drift into one alternative, that is, "intentionally choose it, but in a way that is noncommittal, that does not involve our putting ourselves behind [it]" [15:26]. But there is a cost to drifting. The main character in Camus' [11] The Stranger is horrifying precisely because of the way he drifts into becoming a murderer. Hannah Arendt [3:54] observed that "in the general moral denunciation of the Nazi crimes, it is almost always overlooked that the true moral issue did not arise with the behavior of the Nazis but of those who only 'coordinated' themselves and did not act out of conviction". When the ethical stakes are high, drifting is, according to Arendt, not merely lazy, but evil [4].

On the other hand, the ability to exercise our will and form our own reasons sui generis is, for thinkers such as Søren Kierkegaard [43], Friedrich Nietzsche [53:219] and Martin Heidegger [37:248], precisely what it means to live authentically. When we decide whether and how we commit, we prove that we are not "machine men with machine minds and machine hearts" [17] and become the authors of our own rationality.

The present example shows us why certain hard choices are especially hard, and why they are, and should be, outside the purview of AI. They require that we commit not only to a particular alternative, but also to a particular understanding of what matters to us. In such cases, delegating decision making to AI is not only impossible, it is unconscionable.

Some might find this conclusion dispiriting. However, as Chang notes, navigating hard choices "is the point at which we come into our own as self-governing agents" [15:27]. We should not despair that AI cannot resolve hard choices, because hard choices give us the opportunity to decide who we are.

^21 The fact there is no obvious correct solution does not mean there are not obviously incorrect solutions (e.g. give all the jobs to men) or that a solution cannot be reached. The point is that fairness, unlike accuracy, is not an objective measure.
^22 This result is also known as the "Impossibility Theorem of Machine Fairness" [62].
^23 I suspect that much of the work on algorithmic fairness has had very little impact because the organizations charged with adopting solutions are, in the absence of any explicit legislation, reluctant to commit to a definition of fairness.

ACKNOWLEDGMENTS

I would like to thank the Uehiro Centre for Practical Ethics and the Institute for Ethics in AI at the University of Oxford as well as the following for their invaluable feedback: Ruth Chang, Julian Savulescu, Brian Christian, Ritwik Gupta, Larry Sommer McGrath, Thomas Sinclair, Lofred Madzou and Nikita Aggarwal.

REFERENCES

[1] Ajay Agrawal, Joshua Gans, and Avi Goldfarb. 2018. Prediction Machines: The Simple Economics of Artificial Intelligence. Harvard Business Press, Cambridge.
[2] Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine Bias. ProPublica. Retrieved from https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
[3] Hannah Arendt. 2009. Some Questions of Moral Philosophy. In Responsibility and Judgment. Schocken, New York.
[4] Hannah Arendt and Jens Kroh. 1964. Eichmann in Jerusalem. Viking Press, New York.
[5] Stuart Armstrong and Sören Mindermann. 2019. Occam's razor is insufficient to infer the preferences of irrational agents. arXiv:1712.05812 [cs] (January 2019). Retrieved November 11, 2020 from http://arxiv.org/abs/1712.05812
[6] Richard J. Arneson. 1989. Equality and equal opportunity for welfare. Philosophical Studies 56, 1 (1989), 77–93.
[7] Pranjal Awasthi, Matthäus Kleindessner, and Jamie Morgenstern. 2020. Equalized odds postprocessing under imperfect group information. In International Conference on Artificial Intelligence and Statistics, PMLR, 1770–1780.
[8] Yahav Bechavod, Katrina Ligett, Aaron Roth, Bo Waggoner, and Steven Z. Wu. 2019. Equal opportunity in online classification with partial feedback. In Advances in Neural Information Processing Systems, 8974–8984.

[9] Yoshua Bengio, Ian Goodfellow, and Aaron Courville. 2017. Deep Learning. MIT Press, Cambridge.
[10] Nick Bostrom. 2016. Superintelligence: Paths, Dangers, Strategies (Reprint ed.). Oxford University Press, Oxford, United Kingdom; New York, NY.
[11] Albert Camus. 1989. The Stranger. Vintage, New York.
[12] Ruth Chang. 2002. The possibility of parity. Ethics 112, 4 (2002), 659–688.
[13] Ruth Chang. 2013. Commitments, Reasons and the Will. In Oxford Studies in Metaethics, Russ Shafer-Landau (ed.). Oxford University Press, Oxford.
[14] Ruth Chang. 2016. Comparativism: The Grounds of Rational Choice. Oxford University Press, Oxford. Retrieved November 16, 2020 from https://ezproxy-prd.bodleian.ox.ac.uk:2196/view/10.1093/acprof:oso/9780199315192.001.0001/acprof-9780199315192-chapter-11
[15] Ruth Chang. 2017. Hard choices. Journal of the American Philosophical Association (2017).
[16] Ruth Chang. 2021. What is it to be a rational agent? In The Routledge Handbook of Practical Reason, Ruth Chang and Kurt Sylvan (eds.). Routledge, New York.
[17] Charlie Chaplin. 1940. The Great Dictator. United Artists.
[18] Jiahao Chen, Nathan Kallus, Xiaojie Mao, Geoffry Svacha, and Madeleine Udell. 2019. Fairness under unawareness: Assessing disparity when protected class is unobserved. In Proceedings of the Conference on Fairness, Accountability, and Transparency, 339–348.
[19] Brian Christian. 2020. The Alignment Problem: Machine Learning and Human Values (1st ed.). W. W. Norton & Company, New York, NY.
[20] Jack Clark and Dario Amodei. 2016. Faulty Reward Functions in the Wild. OpenAI. Retrieved October 28, 2020 from https://openai.com/blog/faulty-reward-functions/
[21] Sam Corbett-Davies and Sharad Goel. 2018. The measure and mismeasure of fairness: A critical review of fair machine learning. arXiv preprint arXiv:1808.00023 (2018).
[22] Jonathan Dancy. 2000. Practical Reality. Clarendon Press, Oxford.
[23] Roel Dobbe, Thomas Krendl Gilbert, and Yonatan Mintz. 2019. Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments. arXiv:1911.09005 [cs, eess] (November 2019). Retrieved October 15, 2020 from http://arxiv.org/abs/1911.09005
[24] Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 214–226.
[25] Ward Edwards. 1954. The theory of decision making. Psychological Bulletin 51, 4 (1954), 380–417. DOI:https://doi.org/10.1037/h0053870
[26] Ralph Waldo Emerson. 1982. Selected Essays. Penguin Classics, New York.
[27] Jen Fitzpatrick. 2020. Charting the next 15 years of Google Maps. Google Blog. Retrieved November 2, 2020 from https://blog.google/perspectives/jen-fitzpatrick/charting-next-15-years-google-maps/
[28] W. B. Gallie. 1955. Essentially Contested Concepts. Proceedings of the Aristotelian Society 56 (1955), 167–198.
[29] Ben Goertzel and Cassio Pennachin. 2007. Artificial General Intelligence. Springer, New York.
[30] Faruk Gul and Wolfgang Pesendorfer. 2008. The case for mindless economics. In The Foundations of Positive and Normative Economics: A Handbook. Oxford University Press, New York, 3–42.
[31] Dylan Hadfield-Menell, McKane Andrus, and Gillian K. Hadfield. 2018. Legible Normativity for AI Alignment: The Value of Silly Rules. arXiv:1811.01267 [cs] (November 2018). Retrieved July 23, 2020 from http://arxiv.org/abs/1811.01267
[32] Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, and Anca Dragan. 2020. Inverse Reward Design. arXiv:1711.02827 [cs] (October 2020). Retrieved October 30, 2020 from http://arxiv.org/abs/1711.02827
[33] Dylan Hadfield-Menell, Stuart J. Russell, Pieter Abbeel, and Anca Dragan. 2016. Cooperative inverse reinforcement learning. In Advances in Neural Information Processing Systems, 3909–3917.
[34] Moritz Hardt, Eric Price, and Nati Srebro. 2016. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, 3315–3323.
[35] John C. Harsanyi. 1953. Cardinal utility in welfare economics and in the theory of risk-taking. Journal of Political Economy 61, 5 (1953), 434–435.
[36] Benjamin Hayden and Yael Niv. 2020. The case against economic values in the brain. PsyArXiv. DOI:https://doi.org/10.31234/osf.io/7hgup
[37] Martin Heidegger. 1996. Being and Time: A Translation of Sein und Zeit. SUNY Press, New York.
[38] Thomas Hobbes. 1958. Leviathan; Or, The Matter, Forme and Power of a Commonwealth. Clarendon Press.
[39] Ronald A. Howard. 1980. On making life and death decisions. In Societal Risk Assessment. Springer, 89–113.
[40] Ben Hutchinson and Margaret Mitchell. 2019. 50 Years of Test (Un)fairness: Lessons for Machine Learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19), Association for Computing Machinery, New York, NY, USA, 49–58. DOI:https://doi.org/10.1145/3287560.3287600
[41] Keno Juechems and Christopher Summerfield. 2019. Where does value come from? Trends in Cognitive Sciences 23, 10 (2019), 836–850.
[42] Joseph W. Kable and Paul W. Glimcher. 2009. The neurobiology of decision: consensus and controversy. Neuron 63, 6 (2009), 733–745.
[43] Søren Kierkegaard. 1985. Philosophical Fragments, Johannes Climacus. Princeton University Press, Princeton.
[44] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. 2016. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807 (2016).
[45] Jonathan Levin and Paul Milgrom. 2004. Introduction to Choice Theory. (2004), 25.
[46] Sarah Lichtenstein and Paul Slovic. 2006. The Construction of Preference. Cambridge University Press.
[47] William MacAskill. 2016. Doing Good Better: How Effective Altruism Can Help You Help Others, Do Work that Matters, and Make Smarter Choices about Giving Back. Penguin.
[48] Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. 2019. A Survey on Bias and Fairness in Machine Learning. arXiv:1908.09635 [cs] (September 2019). Retrieved October 28, 2020 from http://arxiv.org/abs/1908.09635
[49] Alan Mishler and Edward H. Kennedy. 2020. Fairness in Risk Assessment Instruments: Post-Processing to Achieve Counterfactual Equalized Odds. arXiv preprint arXiv:2009.02841 (2020).
[50] Philippe Mongin. 1997. Expected utility theory. In Handbook of Economic Methodology. Edward Elgar, London, 342–350.
[51] John von Neumann and Oskar Morgenstern. 1947. Theory of Games and Economic Behavior. Princeton University Press, Princeton.
[52] Andrew Y. Ng and Stuart J. Russell. 2000. Algorithms for inverse reinforcement learning. In ICML, 2.
[53] Friedrich Wilhelm Nietzsche. 1974. The Gay Science: With a Prelude in German Rhymes and an Appendix of Songs. Vintage, New York.
[54] Derek Parfit. 2011. On What Matters. Oxford University Press, Oxford.
[55] Joseph Raz. 1999. Engaging Reason: On the Theory of Value and Action. Oxford University Press, Oxford.
[56] Joseph Raz. 2002. Incommensurability and Agency. Oxford University Press, Oxford. Retrieved November 16, 2020 from https://ezproxy-prd.bodleian.ox.ac.uk:2196/view/10.1093/0199248001.001.0001/acprof-9780199248001-chapter-4
[57] Russell Reed. 1999. Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks (Illustrated ed.). Bradford Books, Cambridge.
[58] Yaniv Romano, Stephen Bates, and Emmanuel J. Candès. 2020. Achieving Equalized Odds by Resampling Sensitive Attributes. arXiv preprint arXiv:2006.04292 (2020).
[59] Stuart Russell. 2017. Provably Beneficial Artificial Intelligence. Future of Life Institute. Retrieved from https://people.eecs.berkeley.edu/~russell/papers/russell-bbvabook17-pbai.pdf
[60] Stuart Russell. 2019. Human Compatible: Artificial Intelligence and the Problem of Control. Viking.
[61] Paul A. Samuelson. 1948. Consumption theory in terms of revealed preference. Economica 15, 60 (1948), 243–253.
[62] Kailash Karthik Saravanakumar. 2020. The Impossibility Theorem of Machine Fairness – A Causal Perspective. arXiv:2007.06024 [cs, stat] (July 2020). Retrieved November 9, 2020 from http://arxiv.org/abs/2007.06024
[63] Thomas Scanlon. 2000. What We Owe to Each Other. Belknap Press, Cambridge.
[64] Markus Schedl. 2019. Deep Learning in Music Recommendation Systems. Front. Appl. Math. Stat. 5 (2019). DOI:https://doi.org/10.3389/fams.2019.00044
[65] Amartya K. Sen. 1971. Choice functions and revealed preference. The Review of Economic Studies 38, 3 (1971), 307–317.
[66] Rohin Shah, Noah Gundotra, Pieter Abbeel, and Anca D. Dragan. 2019. On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference. arXiv:1906.09624 [cs, stat] (June 2019). Retrieved November 11, 2020 from http://arxiv.org/abs/1906.09624
[67] Alex M. Silver, Aimee E. Stahl, Rita Loiotile, Alexis S. Smith-Flores, and Lisa Feigenson. 2020. When Not Choosing Leads to Not Liking: Choice-Induced Preference in Infancy. Psychological Science (October 2020). DOI:https://doi.org/10.1177/0956797620954491
[68] John Skorupski. 2012. The Domain of Reasons. Oxford University Press, Oxford.
[69] Richard Sutton. 2020. John McCarthy's Definition of Intelligence. Journal of Artificial General Intelligence 11, 2 (2020), 66–67.
[70] Richard S. Sutton. 2020. The reward hypothesis. Retrieved October 28, 2020 from http://www.incompleteideas.net/rlai.cs.ualberta.ca/RLAI/rewardhypothesis.html
[71] Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction (2nd ed.). MIT Press.
[72] Nassim Nicholas Taleb. 2016. The Bed of Procrustes: Philosophical and Practical Aphorisms. Random House Trade Paperbacks, New York.
[73] Johanna Thoma. 2019. Decision Theory. In The Open Handbook of Formal Epistemology, Richard Pettigrew and Jonathan Weisberg (eds.). Retrieved from https://jonathanweisberg.org/pdf/open-handbook-of-formal-epistemology.pdf
[74] Eric Topol. 2019. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again. Hachette UK.
[75] Amos Tversky. 1975. A Critique of Expected Utility Theory: Descriptive and Normative Considerations. Erkenntnis 9, 2 (1975), 163–173.
[76] Amos Tversky and Daniel Kahneman. 1974. Judgment under uncertainty: Heuristics and biases. Science 185, 4157 (1974), 1124–1131.
[77] Sahil Verma and Julia Rubin. 2018. Fairness definitions explained. In 2018 IEEE/ACM International Workshop on Software Fairness (FairWare), IEEE, 1–7.
[78] Susan Wolf. 1993. Freedom Within Reason. Oxford University Press, Oxford.
[79] Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maximum entropy inverse reinforcement learning. In AAAI, Chicago, 1433–1438.
[80] 2020. Demis Hassabis. Wikipedia. Retrieved October 15, 2020 from https://en.wikipedia.org/w/index.php?title=Demis_Hassabis&oldid=982443718
[81] 2021. Function (mathematics). Wikipedia. Retrieved April 30, 2021 from https://en.wikipedia.org/w/index.php?title=Function_(mathematics)&oldid=1019439784