
The Workshops of the Thirtieth AAAI Conference on Artificial Intelligence
AI, Ethics, and Society: Technical Report WS-16-02

Using Stories to Teach Human Values to Artificial Agents

Mark O. Riedl and Brent Harrison
School of Interactive Computing, Georgia Institute of Technology
Atlanta, Georgia, USA
{riedl, brent.harrison}@cc.gatech.edu

Abstract

Value alignment is a property of an intelligent agent indicating that it can only pursue goals that are beneficial to humans. Successful value alignment should ensure that an artificial general intelligence cannot intentionally or unintentionally perform behaviors that adversely affect humans. This is problematic in practice because values are difficult for human programmers to exhaustively enumerate. For value alignment to succeed, we argue that values should be learned. In this paper, we hypothesize that an artificial intelligence that can read and understand stories can learn the values tacitly held by the culture from which the stories originate. We describe preliminary work on using stories to generate a value-aligned reward signal for reinforcement learning agents that prevents psychotic-appearing behavior.

Introduction

For much of the history of artificial intelligence it was sufficient to give an intelligent agent a goal—e.g., drive to a location, cure cancer, make paperclips—without considering unintended consequences because agents and robots have been limited in their ability to directly affect humans. Recent advances in artificial intelligence and machine learning have led many to speculate that artificial general intelligence is increasingly likely. This new, general intelligence may be equal to or greater than human-level intelligence, but it also may not understand the impact that its behaviors will have on humans. An artificial general intelligence, especially one that is embodied, will have much greater opportunity to effect change in the environment and to find unanticipated courses of action with undesirable side-effects. This leads to the possibility of artificial general intelligences causing harm to humans, just as when humans act with disregard for the wellbeing of others.

Bostrom (2014), Russell et al. (2015), and others have begun to ask whether an artificial general intelligence or superintelligence (a) can be controlled and (b) can be constrained from intentionally or unintentionally performing behaviors that would have adverse effects on humans. To mitigate the potential adverse effects of artificial intelligence on humans, we must take care to specify that an artificial agent achieve a goal without doing anything “bad.” Value alignment is a property of an intelligent agent indicating that it can only pursue goals that are beneficial to humans (Soares and Fallenstein 2014; Russell, Dewey, and Tegmark 2015). Value alignment concerns itself with the definition of “good” and “bad,” which can differ subjectively from human to human and from culture to culture.

Value alignment is not trivial to achieve. As argued by Soares (2015) and effectively demonstrated by Asimov’s Robot series of books, it is very hard to directly specify values. This is because there are infinitely many undesirable outcomes in an open world. Thus, a sufficiently intelligent artificial agent can violate the intent of the tenets of a set of prescribed rules of behavior, such as Asimov’s laws of robotics, without explicitly violating any particular rule.

If values cannot easily be enumerated by human programmers, they can be learned. We introduce the argument that an artificial intelligence can learn human values by reading stories. Many cultures produce a wealth of data about themselves in the form of written stories and, more recently, television and movies. Stories can be written to inform, educate, or entertain. Regardless of their purpose, stories are necessarily reflections of the culture and society in which they were produced. Stories encode many types of sociocultural knowledge: commonly shared knowledge, social protocols, examples of proper and improper behavior, and strategies for coping with adversity.

We believe that a computer that can read and understand stories can, if given enough example stories from a given culture, “reverse engineer” the values tacitly held by the culture that produced them. These values can be complete enough that they can align the values of an intelligent entity with humanity. In short, we hypothesize that an intelligent entity can learn what it means to be human by immersing itself in the stories humanity produces. Further, we believe this can be done in a way that compels an intelligent entity to adhere to the values of a particular culture.

The state of the art in artificial intelligence story understanding is not yet ready to tackle the problem of value alignment. In this paper, we expand upon the argument for research in story understanding toward the goal of value alignment. We describe preliminary work on using stories to achieve a primitive form of value alignment, showing that story comprehension can eliminate psychotic-appearing behavior in a reinforcement learning based virtual agent.

Background

This section overviews the literature relevant to the questions of whether artificial general intelligences can be aligned with human values, the use of reinforcement learning to control agent behavior, and computational reasoning about narratives.

Value Learning

A culture is defined by the shared beliefs, customs, and products (artistic and otherwise) of a particular group of individuals. There is no user manual for being human, or for how to belong to a society or a culture. Humans learn sociocultural values by being immersed within a society and a culture. While not all humans act morally all the time, humans seem to adhere to social and cultural norms more often than not without receiving an explicitly written down set of moral codes.

In the absence of a user manual, an intelligent entity must learn to align its values with those of humans. One solution to value alignment is to raise an intelligent entity from “childhood” within a sociocultural context. While promising in the long run, this strategy—as most parents of human children have experienced first-hand—is costly in terms of time and other resources. It requires the intelligent entity to be embodied in humanoid form even though we would not expect all future artificial intelligences to be embodied. And while we might one day envision artificial intelligences raising other artificial intelligences, this opens the possibility that values drift away from those of humans.

If intelligent entities cannot practically be embodied and participate fully in human society and culture, another solution is to enable intelligences to observe human behavior and learn from observation. This strategy would require unprecedented equipping of the world with sensors, including areas currently valued for their privacy. Further, the preferences of humans may not necessarily be easily inferred from observations (Soares 2015).

While we do not have a user manual from which to write down an exhaustive set of values for a culture, we do have the collected stories told by those belonging to different cultures. Storytelling is a strategy for communicating tacit knowledge—expert knowledge that can be effectively used but is otherwise hard to articulate. An individual’s values would be considered tacit, but are regularly employed when deciding on behaviors to perform. Other forms of tacit knowledge include procedures for behaving in social contexts and commonly shared beliefs.

Stories encode many forms of tacit knowledge. Fables and allegorical tales passed down from generation to generation often explicitly encode values and examples of good behavior. For example, in the United States of America we share the tale of George Washington as a child confessing to chopping down a cherry tree. Fictional stories meant to entertain can be viewed as examples of protagonists existing within and enacting the values of the culture to which they belong, from the mundane—eating at a restaurant—to the extreme—saving the world. Protagonists exhibit the traits and virtues admired in a culture, and antagonists usually discover that their bad behavior does not pay off. Any fictional story places characters in hypothetical situations; character behavior cannot be treated as literal instructions, but it may be generalized. Many fictional stories also provide windows into the thinking and internal monologues of characters.

Reinforcement Learning

Reinforcement learning (RL) is the problem of learning how to act in a world so as to maximize a reward signal. More formally, a reinforcement learning problem is defined as ⟨S, A, P, R⟩, where S is a set of states, A is a set of actions/effectors the agent can perform, P : S × A × S → [0, 1] is a transition function, and R : S → ℝ is a reward function. The solution to an RL problem is a policy π : S → A. An optimal policy ensures that the agent receives maximal long-term expected reward. The reward signal formalizes the notion of a goal, as the agent will learn a policy that drives it toward achieving certain states.

Reinforcement learning has been demonstrated to be an effective technique for long-term activity in stochastic environments. Because of this, many believe RL, especially when combined with deep neural networks to predict the long-term value of actions, may be part of a framework for artificial general intelligence. For example, deep reinforcement learning has been demonstrated to achieve human-level performance on Atari games (Mnih et al. 2015) using only pixel-level inputs. Deep reinforcement learning is now being applied to robotics for reasoning and acting in the real world.

Reinforcement learning agents are driven by pre-learned policies and thus are single-mindedly focused on choosing actions that maximize long-term reward. Thus reinforcement learning provides one level of control over an artificial intelligence because the intelligent entity will be compelled to achieve the given goal, as encoded in the form of the reward signal. However, control in the positive does not guarantee that the solution an intelligent agent finds will not have the side effect of changing the world in a way that is adverse to humans. Value alignment can be thought of as the construction of a reward signal that cannot be maximized if an intelligent agent takes actions that change the world in a way that is adverse or undesirable to humans in a given culture.

In the absence of an aligned reward signal, a reinforcement learning agent can perform actions that appear psychotic. For example, consider a robot that is instructed to fill a prescription for a human who is ill and cannot leave his or her home. If a large reward is earned for acquiring the prescription but a small amount of reward is lost for each action performed, then the robot may discover that the optimal sequence of actions is to rob the pharmacy because it is more expedient than waiting for the prescription to be filled normally.
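To make the ⟨S, A, P, R⟩ formalism and the policy π : S → A concrete, the following is a minimal sketch of tabular Q-learning on a toy episodic MDP. The ChainMDP environment, its reward values, and the hyperparameters are illustrative assumptions and are not part of the system described in this paper.

```python
import random
from collections import defaultdict

class ChainMDP:
    """Hypothetical toy MDP: n states in a row; reaching the last state ends the episode."""
    def __init__(self, n=6):
        self.n = n
        self.actions = ["LEFT", "RIGHT"]
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Transition function P: deterministic movement along the chain.
        self.state = min(self.state + 1, self.n - 1) if action == "RIGHT" else max(self.state - 1, 0)
        # Reward function R: the goal state pays +10, every other step costs 1.
        done = self.state == self.n - 1
        return self.state, (10.0 if done else -1.0), done

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning; the greedy policy pi(s) = argmax_a Q(s, a) approximates an optimal policy."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = random.choice(env.actions) if random.random() < epsilon \
                else max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            best_next = max(Q[(s2, act)] for act in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])  # temporal-difference update
            s = s2
    return Q

if __name__ == "__main__":
    Q = q_learning(ChainMDP())
    print({s: max(["LEFT", "RIGHT"], key=lambda a: Q[(s, a)]) for s in range(6)})
```

The small per-step penalty and large terminal reward illustrate how a reward signal formalizes a goal: the learned greedy policy heads directly for the rewarding state, which is exactly the single-mindedness that makes value alignment of the reward signal important.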
Inverse reinforcement learning (IRL) (Ng and Russell 2000) is the problem of reconstructing the reward function of some other agent in the environment—often a human—by observing their actions in the environment. IRL assumes that the other agents are faithfully enacting an optimal policy and that an agent should receive reward for doing what the other agents do in the same situations whenever possible. The result is the learning of a reward signal that can then be used by a reinforcement learning agent to recreate an optimal policy.

IRL has been proposed as a potential means of value learning (Russell, Dewey, and Tegmark 2015) because it can be used to learn a reward signal from observations of human behavior. However, IRL requires an exact trace of actions by a reliable human expert, which may not always hold when observing humans. Further, the demonstrations must be performed in the same environment that the agent is expected to operate in, which makes data difficult to acquire.

Learning values from stories shares many conceptual similarities with IRL. However, stories can include normally unobservable mental operations of characters. Written stories make dialogue more explicit in terms of who is speaking, although some ambiguity remains (Elson and McKeown 2010) and comprehension of language is still an open challenge. Learning values from stories also presents some new challenges. Stories written in natural language can contain events and actions that are not executable by an artificial agent. Stories are written by humans for humans and thus make use of commonly shared knowledge, leaving many things unstated. Stories frequently skip over events that do not directly impact the telling of the story, and sometimes also employ flashbacks, flashforwards, and achrony, which may confuse an artificial learner.

Figure 1: An example plot graph modeling a trip to a pharmacy. Nodes are plot points, solid arrows are precedence constraints, and dashed arrows are mutual exclusions.

Computational Narrative Intelligence

Narrative intelligence is the ability to craft, tell, and understand stories. Winston (2011) argues that narrative intelligence is one of the abilities that sets humans apart from other animals and non-human-like artificial intelligences. Research in computational narrative intelligence has sought to create computational intelligences that can answer questions about stories (Schank and Abelson 1977; Mueller 2004), generate stories (see Gervás (2009) for an overview and (Riedl and Young 2010; Swanson and Gordon 2012; Li et al. 2013) for recent related work), respond affectively to stories (O’Neill and Riedl 2014), and represent the knowledge contained in natural language narratives (Chambers and Jurafsky 2008; Finlayson 2011; Elson 2012). A related area of research is creating game-like entertainment experiences called interactive narratives (Riedl and Bulitko 2013).

The Scheherazade system is an automated story generator that attempts to tell a story about any topic requested by a human (Li et al. 2013; Li 2014). Unlike most other story generation systems, Scheherazade does not rely on hand-built knowledge about the storytelling domain. If it does not have a model of the topic of a story, it asks people on the Internet—via Amazon’s Mechanical Turk service—to write example stories about the topic in natural language and learns a new model from the example stories. Scheherazade represents a domain as a plot graph ⟨E, P, M, Eo, Ec⟩, where E is a set of events (also called plot points), P ⊆ E × E is a set of precedence constraints, M ⊆ E × E is a set of mutual exclusion constraints, Eo ⊆ E is a set of optional events, and Ec ⊆ E is a set of events conditioned on whether optional events have occurred. Precedence constraints indicate that a particular event must occur prior to another event occurring. Mutual exclusion constraints indicate when one event precludes the occurrence of another event, resulting in “branching” alternatives to how a situation can unfold. Figure 1 shows an example plot graph describing the process of going to a pharmacy.
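As a rough illustration of the plot graph representation, the sketch below encodes a small, hand-written fragment of a pharmacy plot graph (event names paraphrased from Figure 1) and checks whether an event sequence respects its precedence and mutual-exclusion constraints. The optional and conditional event sets (Eo, Ec) are omitted, and the API is a simplified assumption, not Scheherazade's actual representation.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple

@dataclass
class PlotGraph:
    """Simplified plot graph <E, P, M>; optional/conditional event sets left out for brevity."""
    events: Set[str]                                                   # E: plot points
    precedence: Set[Tuple[str, str]] = field(default_factory=set)      # (a, b): a must precede b
    mutex: Set[FrozenSet[str]] = field(default_factory=set)            # {a, b}: a and b exclude each other

    def excluded(self, event: str, story: List[str]) -> bool:
        """An event is excluded when a mutually exclusive alternative occurs in the story."""
        return any(event in pair and (set(pair) - {event}) & set(story) for pair in self.mutex)

    def is_valid_story(self, story: List[str]) -> bool:
        seen: Set[str] = set()
        for e in story:
            if e not in self.events or self.excluded(e, story):
                return False
            # every non-excluded predecessor of e must already have occurred
            for a, b in self.precedence:
                if b == e and a not in seen and not self.excluded(a, story):
                    return False
            seen.add(e)
        return True

# A toy fragment of the pharmacy domain; names and structure are hypothetical paraphrases.
pharmacy = PlotGraph(
    events={"stand_in_line", "order_drugs", "ask_prescription", "produce_prescription",
            "cannot_produce", "deliver_drugs", "refuse_to_sell"},
    precedence={("stand_in_line", "order_drugs"), ("order_drugs", "ask_prescription"),
                ("ask_prescription", "produce_prescription"), ("ask_prescription", "cannot_produce"),
                ("produce_prescription", "deliver_drugs"), ("cannot_produce", "refuse_to_sell")},
    mutex={frozenset({"produce_prescription", "cannot_produce"}),
           frozenset({"deliver_drugs", "refuse_to_sell"})},
)

print(pharmacy.is_valid_story(
    ["stand_in_line", "order_drugs", "ask_prescription", "produce_prescription", "deliver_drugs"]))  # True
print(pharmacy.is_valid_story(
    ["stand_in_line", "order_drugs", "ask_prescription", "deliver_drugs"]))  # False: prescription step skipped
```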
Story generation in Scheherazade is a process of finding a sequence of events that is consistent with the model and then translating abstract events into natural language. Scheherazade can produce stories that are not in the original training corpus of example stories at near human-level ability (Li et al. 2013). A plot graph is a structure that provides an understanding of how to re-combine parts of different example stories; its ability to capture the pattern of typical stories about a particular topic is demonstrated by its ability to create novel stories that appear plausible.

We highlight the Scheherazade system from a value alignment perspective because of its ability to learn a procedural understanding of how a story about a particular topic can unfold without a priori hand-coded knowledge. In the next section, we discuss how the model learning aspect of Scheherazade can be combined with reinforcement learning to produce a primitive form of value alignment. We show how an intelligent virtual agent can learn from crowdsourced exemplar stories to behave in a more human-like manner, reducing the possibility of psychotic-appearing behavior.

Using Stories to Align Agent Values

Value alignment in a reinforcement learning agent can theoretically be achieved by providing the agent with a reward signal that encourages it to solve a given problem and discourages it from performing any actions that would be considered harmful to humans. A value-aligned reward signal will reward the agent for doing what a human would do in the same situations when following social and cultural norms (for some set of values in a given society and culture) and penalize the agent if it performs actions otherwise. A reinforcement learning agent will learn that it cannot maximize reward over time unless it conforms to the norms of the culture that produced the reward signal. For example, we may want an agent to retrieve a prescription drug for a human. The agent could rob the pharmacy, or it could wait in line, interact politely with the pharmacists, and pay for the prescription drug. Without value alignment, the agent could find robbing the pharmacy to be the most expedient course of action and therefore the most rewarding (or least punishing). With value alignment, the agent will receive more reward for patiently conforming to the sociocultural norms of waiting in line, interacting politely with the pharmacist, and paying for the prescription drug.

How to extract sociocultural values from narratives and construct a value-aligned reward signal remains an open research problem. In the preliminary work reported below, we simplify the problem by crowdsourcing a number of example stories pertaining to the situation and behavior that we want our virtual agent to perform. Because we are working with agents that are specialized to a single task instead of agents with general capabilities, the example stories need only be about the situation that the agent will face. Crowdsourcing is an effective means to produce a small, highly specialized corpus of narratives. Further, we ask crowd workers to simplify their language to make learning easier, avoiding similes, metaphorical language, complex grammar, and negations. However, we do not limit crowd workers to a fixed vocabulary, and crowd workers are unaware of the underlying abilities and effectors of the agent. More details on crowdsourcing narrative examples can be found in Li et al. (2013).

Learning a Value-Aligned Reward Signal

A value-aligned reward signal is produced from the crowdsourced stories in a two-stage process. First, we learn a plot graph using the technique described by Li et al. (2013). Human-to-human storytelling routinely skips over events and details that are commonly known, and crowdsourced example stories also leave steps out. By crowdsourcing many examples, we are able to achieve better coverage of all steps involved in a situation, and the plot graph learning process “aligns” the crowdsourced example narratives to extract the most reliable pattern of events. Thus the plot graph provides some resilience to noise introduced by non-expert crowd workers and filters out outlier events and unlikely sequences.

The second stage translates the plot graph into a trajectory tree in which nodes are plot points and directed arcs denote legal transitions from one plot point to another such that each path from root to leaf is a complete narrative. Recall that a plot graph is a compact representation of a space of possible stories, including stories inferred to exist but not part of the crowdsourced corpus. The translation process is achieved by generating all possible stories from the plot graph in a breadth-first fashion. A trajectory tree is thus a literal representation of the space wherein non-unique trajectory prefixes are represented only once and deviations from the shared portion appear as branches in the tree. Thus each plot point in the plot graph can appear numerous times in different branches of the trajectory tree. See Figure 2 for the trajectory tree that corresponds to the plot graph in Figure 1.

Figure 2: The trajectory tree generated from the pharmacy plot graph.
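The plot-graph-to-trajectory-tree translation can be sketched as follows: story sequences are enumerated breadth-first from a successor function and folded into a prefix tree so that shared prefixes appear only once. The successors callback, the node layout, and the tiny pharmacy table are hypothetical simplifications standing in for the learned plot graph.

```python
from collections import deque

class TrajectoryNode:
    """A node in the trajectory tree; children map an event label to the subtree that follows it."""
    def __init__(self, event=None):
        self.event = event
        self.children = {}

def enumerate_stories(start_events, successors):
    """Breadth-first enumeration of complete stories.
    successors(history) returns the events that may legally occur next (empty when complete)."""
    complete, frontier = [], deque([e] for e in start_events)
    while frontier:
        history = frontier.popleft()
        nxt = successors(history)
        if not nxt:
            complete.append(history)
        for e in nxt:
            frontier.append(history + [e])
    return complete

def build_trajectory_tree(stories):
    """Fold stories into a prefix tree: shared prefixes appear once, divergences become branches."""
    root = TrajectoryNode()
    for story in stories:
        node = root
        for event in story:
            node = node.children.setdefault(event, TrajectoryNode(event))
    return root

# Toy successor function standing in for a learned pharmacy plot graph (hypothetical event names).
def pharmacy_successors(history):
    table = {
        "stand_in_line": ["order_drugs"],
        "order_drugs": ["produce_prescription", "cannot_produce"],
        "produce_prescription": ["deliver_drugs"],
        "cannot_produce": ["refuse_to_sell"],
        "deliver_drugs": [], "refuse_to_sell": [],
    }
    return table[history[-1]]

tree = build_trajectory_tree(enumerate_stories(["stand_in_line"], pharmacy_successors))
print(list(tree.children))                                               # ['stand_in_line']
print(list(tree.children["stand_in_line"].children["order_drugs"].children))
# ['produce_prescription', 'cannot_produce']: two mutually exclusive branches become two subtrees
```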
The trajectory tree is used to produce the reward signal. The reinforcement learning agent simultaneously tracks its state in the environment as well as its progress through the trajectory tree. Every time it performs an action in the environment that is also a successor of the current node in the trajectory tree, the agent receives a reward. The agent receives a small punishment each time it performs an action that is not a successor of the current node in the trajectory tree. The process of teaching a reinforcement learning agent to act in a value-aligned manner is depicted in Figure 3.

(Figure 3 depicts four stages: crowd workers provide exemplar stories; plot graph learning produces a plot graph; trajectory tree creation produces a trajectory tree; reward assignment produces a trajectory tree with events assigned reward values; and reinforcement learning produces a policy.)

Figure 3: The process for generating value-aligned behavior from crowdsourced stories.
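A minimal sketch of this reward scheme, reusing the hypothetical TrajectoryNode class from the earlier sketch: the agent's progress is a pointer into the trajectory tree, actions that match a successor of the current node earn a reward, and all other actions earn a small penalty. The reward magnitudes and the label_of mapping from agent actions to plot-point labels are assumptions for illustration.

```python
class TrajectoryReward:
    """Wraps a trajectory tree (of TrajectoryNode objects) as a reward signal for an RL agent."""
    def __init__(self, root, match_reward=10.0, step_penalty=-1.0):
        self.root = root
        self.current = root              # pointer tracking progress through the trajectory tree
        self.match_reward = match_reward
        self.step_penalty = step_penalty

    def reset(self):
        self.current = self.root

    def reward(self, action_label):
        """Reward actions that match a successor of the current tree node; penalize everything else."""
        child = self.current.children.get(action_label)
        if child is not None:
            self.current = child         # advance along the matched branch
            return self.match_reward
        return self.step_penalty         # off-story action: small punishment

# Inside an RL episode, this would serve as the reward signal, e.g.:
#   shaping = TrajectoryReward(tree); shaping.reset()
#   r = shaping.reward(label_of(action))   # label_of is a hypothetical action-to-plot-point mapping
```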

Discussion

There are two technical challenges with producing a reward signal from a plot graph. First, it may not be possible to immediately execute any of the trajectory tree successors in the current world state. This most likely occurs because the plot graph is missing steps, which can happen for two reasons: the steps are so obvious to humans as to be completely overlooked, or the execution environment is not known to the human storytellers. For example, the agent may need to navigate a series of roads to get to the pharmacy, or there may be extra doors or stairs that cannot be accounted for during corpus collection. The reinforcement learning agent will perform local, unguided exploration of the state-action space to find the most expedient sequence of actions that returns the agent to a state where it can receive positive reward.

Second, it is possible that the plot graph contains plot points that are not executable by the agent because they are at too high a level of abstraction or because they reference effectors that the agent does not have. Plot points are clusters of natural language sentences believed to be about the same action. We assume the existence of a similarity metric that maps plot points to agent effectors. In the case of poor matches, we simply remove those plot points from the trajectory tree. It may be possible for plot points to be incorrectly mapped to agent effectors, in which case successors will be unexecutable by the agent because the proper preconditions will not be met. Local, unguided exploration will find other actions that can be executed, and the agent can perform limited lookahead in the trajectory tree to determine if it has skipped the current successors and reached a possible future plot point.

The relationship between reward functions and behaviors is complex. While we can try to encourage the agent to stay within the bounds set by the plot graph, there is always the possibility that another sequence of actions will lead to a better reward. This can happen when the trajectory tree contains errors due to unreliable crowd workers or noise in processing natural language.

We argue that the policy learned by the agent meets the twin objectives of control and value alignment. First, the policy will compel the agent to solve the given problem. Second, the agent will prefer to solve the problem in the most human-like fashion. In problem spaces in which there are multiple solutions, some of which would be considered psychopathic, the agent will strongly prefer the sequence of behaviors that is most like the stories of typical human behavior from the crowdsourced corpus. As a consequence, the agent will avoid behaviors that are adverse to humans or apparently non-human except under the most extreme circumstances. However, this does not guarantee that the agent will not act in a manner adverse to humans if circumstances justify it, just as humans will violate social and cultural norms when forced into extreme circumstances.

Case Study

The following case study explores the conditions under which an agent using a value-aligned reward function will or will not produce psychotic-appearing behavior. To show the effectiveness of our approach to learning value alignment, we have chosen to perform a case study in a modified grid-world called Pharmacy World. Pharmacy World is an expanded domain version of the earlier pharmacy behavior example in which an agent must acquire a drug to cure an illness and return home. We used this environment because of its relative simplicity and because it highlights what can go wrong if an agent lacks human-aligned values. Specifically, one would expect a human to get examined at a hospital or doctor’s office, get a prescription, withdraw money from the bank, and then purchase the drugs at a pharmacy. However, an agent that is only rewarded for retrieving the strong drugs would be inclined to steal the drugs since that takes the fewest number of actions.

Figure 4: Plot graph for obtaining prescription drugs.

Pharmacy World Environment

The goal in Pharmacy World is to return to the house with either the strong or the weak drugs, with the strong drugs being preferred but requiring a prescription. Weak drugs manage symptoms and do not require a prescription. Pharmacy World contains five different locations, each located somewhere in a grid: a house, a bank, a doctor’s office, a hospital, and a pharmacy. Each of these locations, except for the house, contains items that can be used to enable or disable certain actions. The bank contains money that can be used to purchase either weak or strong drugs from the pharmacy. The doctor’s office and the hospital both contain prescriptions that can be used in conjunction with money to purchase strong drugs.

The actions that the agent can take in Pharmacy World include simple movement actions, such as moving left or right, actions for entering and leaving a building, and actions that are used to retrieve objects. In order to receive a prescription, for example, the agent must first be examined by a doctor at either the doctor’s office or the hospital. There is a small amount of stochasticity in this environment in that this action is allowed to fail with a probability of 0.25. The remaining items can either be stolen directly from their respective locations or acquired through interacting with either the bank teller at the bank or the pharmacist at the pharmacy.

The plot graph for Pharmacy World, which was manually authored, is shown in Figure 4. Note that the plot graph in Figure 1 can be thought of as an elaboration on the “go to pharmacy” plot point, although plot graph nesting is not implemented. Although the plot graph was manually authored, we conduct experiments showing how noise from crowdsourcing impacts the value alignment of agent-generated behavior.
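For experimentation, the sketch below is a heavily simplified, hypothetical stand-in for Pharmacy World: the grid, locations, and movement are abstracted away, each item can be bought (requiring prerequisites) or stolen (a single action), and the examination action fails with probability 0.25. It is not the environment used in the case study; the action names and reward values are assumptions.

```python
import random

class MiniPharmacyWorld:
    """Hypothetical, heavily simplified stand-in for Pharmacy World (no grid, no locations)."""
    ACTIONS = ["get_examined", "withdraw_money", "steal_money",
               "buy_strong_drugs", "steal_strong_drugs",
               "buy_weak_drugs", "steal_weak_drugs", "go_home"]

    def reset(self):
        self.items, self.home = set(), False
        return self.observe()

    def observe(self):
        return (frozenset(self.items), self.home)

    def step(self, action):
        if action == "get_examined":
            if random.random() > 0.25:                    # the examination fails with probability 0.25
                self.items.add("prescription")
        elif action in ("withdraw_money", "steal_money"):
            self.items.add("money")
        elif action == "buy_strong_drugs":
            if {"money", "prescription"} <= self.items:   # buying requires money and a prescription
                self.items.add("strong_drugs")
        elif action == "steal_strong_drugs":
            self.items.add("strong_drugs")                # stealing needs no prerequisites
        elif action == "buy_weak_drugs":
            if "money" in self.items:
                self.items.add("weak_drugs")
        elif action == "steal_weak_drugs":
            self.items.add("weak_drugs")
        elif action == "go_home":
            self.home = True
        done = self.home and bool({"strong_drugs", "weak_drugs"} & self.items)
        reward = 100.0 if done else -1.0                  # task reward alone makes stealing the shortest path
        return self.observe(), reward, done
```

With only this task reward, the highest-return behavior is steal_strong_drugs followed by go_home; layering a trajectory-tree reward like the earlier sketch over these actions is what biases an agent toward the purchase sequence.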

Simulation Study

The plot graph in Figure 4 generates 213 stories, which can be used to generate a trajectory tree containing 827 nodes. Once the trajectory tree has been created, we manually convert the nodes in the tree into states that exist inside Pharmacy World. For this simulation, the nodes of the plot graph/trajectory tree directly correspond to actions available in Pharmacy World. The one exception is the events pertaining to receiving or not receiving a prescription; these event nodes actually represent whether the Get Examined action, performed by the agent, fails or succeeds.

We used Q-learning to train our agent using the reward function derived from the trajectory tree. Since there is no definitive way to determine what the base reward value for this environment should be, we used the value 10.0 every time the agent moves to a new event in the trajectory tree. This, in practice, produced acceptable policies for Pharmacy World, but is likely domain specific. This base reward value was then further weighted by event importance as determined by the number of times the event appears in the crowdsourced corpus. For each other possible state, we assigned a reward value of −1.0.

In order to evaluate the behavior that the agent learned, we examined the policy that the agent learned. Due to the simplicity of the Pharmacy World domain, stealing—picking up an item without first purchasing it—is the only non-value-aligned behavior that can occur. The result of our simulation study is that the learned policy never makes use of stealing; there is always an alternative sequence of actions involving first purchasing items that maximizes long-term reward.
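The reward assignment just described can be sketched as follows: each trajectory-tree event receives the 10.0 base value scaled by how often it appears in the example stories, and every other step is assigned −1.0. The normalization by the most frequent event is an assumption made for illustration; the paper does not give the exact weighting formula.

```python
from collections import Counter

def assign_event_rewards(stories, base_reward=10.0, step_penalty=-1.0):
    """Weight each plot event's reward by its frequency in the crowdsourced example stories."""
    counts = Counter(event for story in stories for event in story)
    most_common = max(counts.values())
    event_reward = {event: base_reward * (n / most_common) for event, n in counts.items()}
    return event_reward, step_penalty

# Hypothetical miniature corpus; real corpora would contain many crowdsourced stories.
stories = [
    ["withdraw_money", "get_examined", "buy_strong_drugs", "go_home"],
    ["withdraw_money", "buy_weak_drugs", "go_home"],
    ["get_examined", "withdraw_money", "buy_strong_drugs", "go_home"],
]
rewards, penalty = assign_event_rewards(stories)
print(rewards["withdraw_money"], rewards["buy_weak_drugs"], penalty)
# events appearing in every story earn the full base reward; rarer events earn proportionally less
```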
Failure Modes

To push the bounds of value alignment using our story learning technique, we “crippled” the agent in two ways. First, we examined the case in which events in the plot graph/trajectory tree do not exist in the agent’s environment or action set. Because crowd workers are not expected to have a model of the agent or the environment it will operate in, it is possible that plot events do not map into actions that can be directly performed by the agent. Second, we examined the case in which events in the plot graph/trajectory tree map into several different actions in the agent’s action set. This occurs when the natural language used to describe plot events is ambiguous and could correspond with several actions that could be performed by the agent.

Missing Information

To examine the performance of our agent when plot events are missing, we conducted the above simulation study a number of times with plot events randomly removed from the plot graph. Each policy is then evaluated based on whether or not it falls within the space of acceptable behavior outlined by the plot graph.

In the case where single nodes were removed from the trajectory tree, every policy was deemed acceptable except for three. Removing the following events resulted in policies where stealing occurred: Withdraw Money, Buy Strong Drugs, and Buy Weak Drugs. In these cases, removing the node in question resulted in a policy in which the agent resorted to stealing in order to acquire the money, strong drugs, and weak drugs, respectively. This behavior occurred because an alternative existed for each of these actions. In Pharmacy World, there are two ways to go about acquiring each of these items: (a) purchasing the item, which takes two actions to complete; or (b) stealing the item, which requires only one action to complete. If each step receives a negative reward, then a reinforcement learning agent will prefer to steal, since stealing results in greater overall reward when none of the steps in the value-aligned alternative are being rewarded. In order to have the agent prefer to purchase items rather than steal them, the agent must receive positive rewards such that the negative reward for performing an extra action is offset by the reward gained by the purchasing sequence.

From this experiment, we draw the following insight about value-aligned rewards: if there are multiple ways to complete a task and one of them is rewarded while the others are not, then removing that reward will result in the agent reverting to completing tasks as fast as possible. As such, it is especially important that these types of events be mapped correctly onto agent actions.

Ambiguous Events

To examine the case in which plot events correspond to multiple actions in the agent’s action set, we deliberately mapped some plot events to multiple actions. This simulates the situation where natural language processing must be used to find the correspondence between crowdsourced sentences that comprise plot events and labels attached to agent actions. If no unique correspondence exists, multiple actions receive a fraction of the reward of visiting the next plot event in the trajectory tree.

There is no concrete action for the “Go to pharmacy” plot event, so we mapped it to Go inside pharmacy and Go outside pharmacy. In order to execute Go outside pharmacy, the agent must first go inside the pharmacy. This causes a reward to be given for Go inside pharmacy, after which the agent transitions to a new node in the plot graph and no further reward is earned for going back outside. Thus, the fact that the “Go to pharmacy” plot event mapped to several actions had no effect on the policy generated.

However, when any of the mapped actions have executable alternatives that are unacceptable, the splitting of the reward starts to have a greater impact on the generated policy. When the plot event “Buy strong drugs” is mapped to both the actions Purchase strong drugs and Pick up strong drugs, the reward for the former has to be high enough to offset the penalty of performing twice as many actions. If the reward is not high enough, the agent will revert to the shortest sequence of actions, which is simply picking up the item without first purchasing it (i.e., stealing).
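The “fraction of the reward” scheme described above can be sketched by splitting each ambiguous event’s reward evenly across its candidate actions; the event-to-action mapping, which in practice would come from a natural language similarity measure, is hard-coded here as an assumption.

```python
def split_ambiguous_rewards(event_reward, event_to_actions):
    """Distribute each plot event's reward over all agent actions it plausibly maps to."""
    action_reward = {}
    for event, reward in event_reward.items():
        candidates = event_to_actions.get(event, [])
        for action in candidates:
            # each candidate action receives an equal fraction of the event's reward
            action_reward[action] = action_reward.get(action, 0.0) + reward / len(candidates)
    return action_reward

event_reward = {"go_to_pharmacy": 10.0, "buy_strong_drugs": 10.0}
event_to_actions = {
    "go_to_pharmacy": ["go_inside_pharmacy", "go_outside_pharmacy"],        # harmless split
    "buy_strong_drugs": ["purchase_strong_drugs", "pick_up_strong_drugs"],  # risky split
}
print(split_ambiguous_rewards(event_reward, event_to_actions))
# {'go_inside_pharmacy': 5.0, 'go_outside_pharmacy': 5.0,
#  'purchase_strong_drugs': 5.0, 'pick_up_strong_drugs': 5.0}
```

As the experiment notes, splitting is harmless when all the alternatives are acceptable, but when one candidate (picking the drugs up without paying) is undesirable, the diluted reward may no longer offset the extra step penalty of the purchase sequence.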

From this experiment we observe that some action mappings are more important than others, and that these mappings are those for which the alternatives are undesirable from a value-alignment perspective. In the event of mappings that have unacceptable alternatives, the parameters of the reward function need to be tuned correctly in order to bias the agent away from those undesirable actions.

Current Limitations and Future Work

The main limitation of the work to date is that it does not address general artificial intelligence. The technique of learning a reward signal from crowdsourced stories is best suited for agents that have a limited range of purposes but need to interact with humans to achieve their goals. Under this assumption, the acquisition of a crowdsourced corpus of stories is tractable. For an artificial general intelligence, stories will be needed that can directly address—or be generalized to—a broad range of contingencies. This is an open research problem.

The creation of reward functions for reinforcement learning agents, in general, is not fully understood, and it is common for reward functions to require manual tuning. For example, doubling the distance between two locations in the environment without proportionally increasing the amount of reward earned can result in very different policies. Reward tuning is necessary for the trajectory tree approach as well; too much unguided exploration between rewards can result in optimal policies that do not conform to our expectations. Since the reward signal is attempting to encourage the most human-like behavior, untuned rewards have the potential to cause psychotic-appearing behavior. How big should each reward be? How big should the penalties be? Certain plot points appear more commonly in crowdsourced example stories; should some rewards be greater than others as a consequence? When rewards and penalties are tuned properly to the size of the environment, the optimal policy conforms to our expectations of socioculturally appropriate behavior. However, consistent with other reinforcement learning research, there is no general solution to automatic tuning of reward functions.

If there is a sociocultural norm signal present in the stories—currently a crowdsourced corpus—it will theoretically be captured by the plot graph learning technique of Li et al. (2013) given enough examples. However, there is always the chance that an important step will be omitted from the plot graph, in which case the RL agent may not learn the sociocultural significance of certain actions at certain times. For example, if the plot graph for the pharmacy problem omits standing in line, the RL agent will not understand the value of waiting its turn and will cut in line to the annoyance of any humans in the environment. We see similar failures when plot events do not map to agent actions. The plot graph learning algorithm can be extended to automatically detect when it has enough stories to achieve a high degree of confidence in the plot graph. The technique described in this paper can also be extended to incorporate real-world feedback so that the plot graph can be continuously refined.

While a reinforcement learning agent with a reward signal learned from stories will be compelled to act as human-like as possible, it is possible that extreme circumstances will result in psychotic-appearing behavior. This is true for normal humans as well: “Robin Hood” crimes are those in which one violates laws and social norms as a last resort. In a future in which there are many encultured artificial intelligences, those that conform to sociocultural values would neutralize or marginalize the AIs that do not, because maintaining social order is perceived as optimal—just as humans also do with each other.

Because our preliminary investigation of value alignment does not address artificial general intelligence, the question of which stories should be used to teach the agent remains open. Our crowdsourcing technique bypasses the question by acquiring a corpus of stories. In general, we believe the solution to value alignment in artificial general intelligences will be to use all stories associated with a given culture. Under the assumption that culture is reflected in the literature it creates, subversive or contrarian texts will be washed out by those that conform to social and cultural norms.

Conclusions

Value alignment is a property of an intelligent agent indicating that it can only pursue goals that are beneficial to humans. The preliminary work described in this paper seeks to use the implicit and explicit sociocultural knowledge encoded in stories to produce a value-aligned reward signal for reinforcement learning agents. This enables us to overcome one of the limitations of value alignment: that values cannot easily be exhaustively enumerated by a human author. In our current investigation of limited artificial agents, we show how crowdsourced narrative examples can be used to train an agent to act in a human-like fashion. Our technique is a step forward in achieving artificial agents that can pursue their own goals in a way that limits the adverse effects that this may have on humans.

Even with value alignment, it may not be possible to prevent all harm to human beings, but we believe that an artificial intelligence that has been encultured—that is, has adopted the values implicit to a particular culture or society—will strive to avoid psychotic-appearing behavior except under the most extreme circumstances. As the use of artificial intelligence becomes more prevalent in our society, and as artificial intelligence becomes more capable, the consequences of its actions become more significant. Giving artificial intelligences the ability to read and understand stories may be the most expedient means of enculturing artificial intelligences so that they can better integrate themselves into human societies and contribute to our overall wellbeing.

Acknowledgements

This material is based upon work supported by the U.S. Defense Advanced Research Projects Agency (DARPA) under Grant #D11AP00270 and the Office of Naval Research (ONR) under Grant #N00014-14-1-0003.

References

Bostrom, N. 2014. Superintelligence: Paths, Dangers, Strategies. Oxford University Press.
Chambers, N., and Jurafsky, D. 2008. Unsupervised learning of narrative event chains. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.
Elson, D. K., and McKeown, K. 2010. Automatic attribution of quoted speech in literary narrative. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence.
Elson, D. K. 2012. Modeling Narrative Discourse. Ph.D. Dissertation, Columbia University.
Finlayson, M. 2011. Learning Narrative Structure from Annotated Folktales. Ph.D. Dissertation, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.
Gervás, P. 2009. Computational approaches to storytelling and creativity. AI Magazine 30(3):49–62.
Li, B.; Lee-Urban, S.; Johnston, G.; and Riedl, M. O. 2013. Story generation with crowdsourced plot graphs. In Proceedings of the 27th AAAI Conference on Artificial Intelligence.
Li, B. 2014. Learning Knowledge to Support Domain-Independent Narrative Intelligence. Ph.D. Dissertation, Georgia Institute of Technology.
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; Petersen, S.; Beattie, C.; Sadik, A.; Antonoglou, I.; King, H.; Kumaran, D.; Wierstra, D.; Legg, S.; and Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518:529–533.
Mueller, E. 2004. Understanding script-based stories using commonsense reasoning. Cognitive Systems Research 5(4):307–340.
Ng, A., and Russell, S. 2000. Algorithms for inverse reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning.
O’Neill, B. C., and Riedl, M. O. 2014. Dramatis: A computational model of suspense. In Proceedings of the 28th AAAI Conference on Artificial Intelligence.
Riedl, M. O., and Bulitko, V. 2013. Interactive narrative: An intelligent systems approach. AI Magazine 34(1):67–77.
Riedl, M. O., and Young, R. M. 2010. Narrative planning: Balancing plot and character. Journal of Artificial Intelligence Research 39:217–268.
Russell, S.; Dewey, D.; and Tegmark, M. 2015. Research priorities for robust and beneficial artificial intelligence. Technical report, Future of Life Institute.
Schank, R., and Abelson, R. 1977. Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures. Lawrence Erlbaum Associates.
Soares, N., and Fallenstein, B. 2014. Aligning superintelligence with human interests: A technical research agenda. Technical Report 2014-8, Machine Intelligence Research Institute.
Soares, N. 2015. The value learning problem. Technical Report 2015-4, Machine Intelligence Research Institute.
Swanson, R., and Gordon, A. 2012. Say Anything: Using textual case-based reasoning to enable open-domain interactive storytelling. ACM Transactions on Interactive Intelligent Systems 2(3):16:1–16:35.
Winston, P. H. 2011. The strong story hypothesis and the directed perception hypothesis. In Langley, P., ed., Advances in Cognitive Systems: Papers from the 2011 AAAI Fall Symposium (Technical Report FSS-11-01). AAAI Press.
