The First Law of Robotics (a Call to Arms)
To appear, AAAI-94

Daniel Weld    Oren Etzioni*
Department of Computer Science and Engineering
University of Washington
Seattle, WA 98195
{weld, etzioni}@cs.washington.edu

Abstract

Even before the advent of Artificial Intelligence, science fiction writer Isaac Asimov recognized that an agent must place the protection of humans from harm at a higher priority than obeying human orders. Inspired by Asimov, we pose the following fundamental questions: (1) How should one formalize the rich, but informal, notion of "harm"? (2) How can an agent avoid performing harmful actions, and do so in a computationally tractable manner? (3) How should an agent resolve conflict between its goals and the need to avoid harm? (4) When should an agent prevent a human from harming herself? While we address some of these questions in technical detail, the primary goal of this paper is to focus attention on Asimov's concern: society will reject autonomous agents unless we have some credible means of making them safe!

The Three Laws of Robotics:
1. A robot may not injure a human being, or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Isaac Asimov (Asimov 1942)

Motivation

In 1940, Isaac Asimov stated the First Law of Robotics, capturing an essential insight: an intelligent agent1 should not slavishly obey human commands; its foremost goal should be to avoid harming humans. Consider the following scenarios:

• A construction robot is instructed to fill a pothole in the road. Although the robot repairs the cavity, it leaves the steam roller, chunks of tar, and an oil slick in the middle of a busy highway.

• A softbot (software robot) is instructed to reduce disk utilization below 90%. It succeeds, but inspection reveals that the agent deleted irreplaceable LaTeX files without backing them up to tape.

While less dramatic than Asimov's stories, the scenarios illustrate his point: not all ways of satisfying a human order are equally good; in fact, sometimes it is better not to satisfy the order at all. As we begin to deploy agents in environments where they can do some real damage, the time has come to revisit Asimov's Laws. This paper explores the following fundamental questions:

• How should one formalize the notion of "harm"? We define dont-disturb and restore, two domain-independent primitives that capture aspects of Asimov's rich but informal notion of harm within the classical planning framework.

• How can an agent avoid performing harmful actions, and do so in a computationally tractable manner? We leverage and extend the familiar mechanisms of planning with subgoal interactions (Tate 1977; Chapman 1987; McAllester & Rosenblitt 1991; Penberthy & Weld 1992) to detect potential harm in polynomial time. In addition, we explain how the agent can avoid harm using tactics such as confrontation and evasion (executing subplans to defuse the threat of harm).

• How should an agent resolve conflict between its goals and the need to avoid harm? We impose a strict hierarchy where dont-disturb constraints override the planner's goals, but restore constraints do not, as sketched below.

• When should an agent prevent a human from harming herself? At the end of the paper, we show how our framework could be extended to partially address this question.

* We thank Steve Hanks, Nick Kushmerick, Neal Lesh, Kevin Sullivan, and Mike Williamson for helpful discussions. This research was funded in part by the University of Washington Royalty Research Fund, by Office of Naval Research Grants 90-J-1904 and 92-J-1946, and by National Science Foundation Grants IRI-8957302, IRI-9211045, and IRI-9357772.

1 Since the field of robotics now concerns itself primarily with kinematics, dynamics, path planning, and low-level control issues, this paper might be better titled "The First Law of Agenthood." However, we keep the reference to "Robotics" as a historical tribute to Asimov.
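Returning to the third question above, the strict hierarchy can be made concrete with a small sketch. The Python fragment below is illustrative only, not the paper's algorithm; the PlanOutcome record, its boolean fields, and the example plans are hypothetical stand-ins for properties a planner would actually have to compute. It encodes the ordering dont-disturb > goal > restore: a plan that violates a dont-disturb constraint is rejected even if it achieves the goal, whereas a plan can remain acceptable despite an unsatisfied restore constraint.

from dataclasses import dataclass

@dataclass
class PlanOutcome:
    # Hypothetical summary of a candidate plan; a real planner would derive
    # these properties from the plan's actions rather than be handed flags.
    achieves_goal: bool
    violates_dont_disturb: bool
    violates_restore: bool

def acceptable(p: PlanOutcome) -> bool:
    # Strict hierarchy: dont-disturb outranks the goal, so any violation
    # disqualifies the plan outright ...
    if p.violates_dont_disturb:
        return False
    # ... while the goal outranks restore: a plan that achieves the goal is
    # still acceptable even if it leaves a restore constraint unsatisfied.
    return p.achieves_goal

# Hypothetical softbot example: deleting un-backed-up LaTeX files frees disk
# space but violates a dont-disturb constraint; another plan achieves the
# goal while merely failing to restore a condition it disturbed.
assert not acceptable(PlanOutcome(True, True, False))
assert acceptable(PlanOutcome(True, False, True))

Under this ordering no goal is worth violating a dont-disturb constraint, which is the sense in which such constraints override direct human orders.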
The paper's main contribution is a "call to arms": before we release autonomous agents into real-world environments, we need some credible and computationally tractable means of making them obey Asimov's First Law.

Survey of Possible Solutions

To make intelligent decisions regarding which actions are harmful, and under what circumstances, an agent might use an explicit model of harm. For example, we could provide the agent with a partial order over world states (i.e., a utility function). This framework is widely adopted and numerous researchers are attempting to render it computationally tractable (Russell & Wefald 1991; Etzioni 1991; Wellman & Doyle 1992; Haddawy & Hanks 1992; Williamson & Hanks 1994), but many problems remain to be solved. In many cases, the introduction of utility models transforms planning into an optimization problem: instead of searching for some plan that satisfies the goal, the agent is seeking the best such plan. In the worst case, the agent may be forced to examine all plans to determine which one is best. In contrast, we have explored a satisficing approach: our agent will be satisfied with any plan that meets its constraints and achieves its goals. The expressive power of our constraint language is weaker than that of utility functions, but our constraints are easier to incorporate into standard planning algorithms.

By using a general temporal logic such as that of (Shoham 1988) or (Davis 1990, Ch. 5), we could specify constraints that would ensure the agent does not cause harm. Before executing an action, we could ask the agent to prove that the action is not harmful. While elegant, this approach is computationally intractable as well. Another alternative would be to use a planner such as ILP (Allen 1991) or ZENO (Penberthy & Weld 1994) which supports temporally quantified goals. Unfortunately, at present these planners seem too inefficient for our needs.2

2 We have also examined previous work on "plan quality" for ideas, but the bulk of that work has focused on the problem of leveraging a single action to accomplish multiple goals, thereby reducing the number of actions in, and the cost of, the plan (Pollack 1992; Wilkins 1988). While this class of optimizations is critical in domains such as database

Situated-action researchers might suggest that non-deliberative, reactive agents could be made "safe" by carefully engineering their interactions with the environment. Two problems confound this approach: 1) the interactions need to be engineered with respect to each goal that the agent might pursue, and a general-purpose agent should handle many such goals, and 2) since different human users have different notions of harm, the agent would need to be reengineered for each user.

Instead, we aim to make the agent's reasoning about harm more tractable by restricting the content and form of its theory of injury.3

3 This approach parallels efforts to make inference tractable by formulating restricted representation languages (Levesque & Brachman 1985).

We adopt the standard assumptions of classical planning: the agent has complete and correct information of the initial state of the world, the agent is the sole cause of change, and action execution is atomic, indivisible, and results in effects which are deterministic and completely predictable. (The end of the paper discusses relaxing these assumptions.) On a more syntactic level, we make the additional assumption that the agent's world model is composed of ground atomic formulae. This sidesteps the ramification problem, since domain axioms are banned. Instead, we demand that individual action descriptions explicitly enumerate changes to every predicate that is affected.4 Note, however, that we are not assuming the STRIPS representation; instead, we adopt an action language (based on ADL (Pednault 1989)) which includes universally quantified and disjunctive preconditions as well as conditional effects (Penberthy & Weld 1992).

Given the above assumptions, the next two sections define the primitives dont-disturb and restore, and explain how they should be treated by a generative planning algorithm. We are not claiming that the approach sketched below is the "right" way to design agents or to formalize Asimov's First Law. Rather, our formalization is meant to illustrate the kinds of technical issues to which Asimov's Law gives rise and how they might be solved. With this in mind, the paper concludes with a critique of our approach and a (long) list of open questions.
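To make these assumptions concrete, here is a minimal sketch, not the paper's formal action language: the predicate and file names are invented for illustration, the world model is a set of ground atoms, and the action enumerates its effects explicitly, including a conditional effect. ADL's universally quantified and disjunctive preconditions are omitted for brevity.

from typing import Callable, FrozenSet, Tuple

Atom = Tuple[str, ...]            # a ground atomic formula, e.g. ("isa", "paper.tex", "file")
State = FrozenSet[Atom]           # complete, correct model of the world

def rm(f: str) -> Callable[[State], State]:
    """An ADL-style 'rm f' action: deterministic, atomic, and it lists every
    predicate it changes itself (no domain axioms supply ramifications)."""
    def apply(s: State) -> State:
        deletes = set()
        if ("isa", f, "file") in s:        # conditional effect: only if f is currently a file
            deletes.add(("isa", f, "file"))
        return frozenset(s - deletes)
    return apply

initial: State = frozenset({("isa", "paper.tex", "file"),
                            ("written.to.tape", "thesis.tex")})
after = rm("paper.tex")(initial)
assert ("isa", "paper.tex", "file") not in after   # effects are completely predictable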
Safety

Some conditions are so hazardous that our agent should never cause them. For example, we might demand that the agent never delete LaTeX files, or never handle a gun. Since these instructions hold for all times, we refer to them as dont-disturb constraints, and say that an agent is safe when it guarantees to abide by them. As in Asimov's Law, dont-disturb constraints override direct human orders. Thus, if we ask a softbot to reduce disk utilization and it can only do so by deleting valuable LaTeX files, the agent should refuse to satisfy this request.

We adopt a simple syntax: dont-disturb takes a single, function-free, logical sentence as argument. For example, one could command the agent to avoid deleting files that are not backed up on tape with the following constraint:

dont-disturb(written.to.tape(f) ∨ isa(f, file))

Free variables, such as f above, are interpreted as universally quantified. In general, a sequence of actions satisfies dont-disturb(C) if none of the actions make C false. Formally, we say that a plan satisfies a dont-disturb constraint when every consistent, totally ordered sequence of plan actions satisfies the constraint as defined below.
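A literal, executable reading of this semantics can be sketched as follows. This is an illustration under the assumptions stated earlier, not the generative planning algorithm developed in the paper; the state encoding, the direct encoding of the constraint as a Python predicate, and the restriction of the free variable to objects mentioned in the initial state are simplifying assumptions of the sketch.

from typing import Callable, FrozenSet, Iterable, Tuple

Atom = Tuple[str, ...]
State = FrozenSet[Atom]
Action = Callable[[State], State]
Constraint = Callable[[State, str], bool]     # C(state, binding for the free variable f)

def satisfies_dont_disturb(actions: Iterable[Action], s0: State, C: Constraint) -> bool:
    """True iff no action in the totally ordered sequence makes C false
    for any binding of its (universally quantified) free variable."""
    universe = {arg for atom in s0 for arg in atom[1:]}
    s = s0
    for act in actions:
        s_next = act(s)
        for f in universe:
            if C(s, f) and not C(s_next, f):  # this step made C false
                return False
        s = s_next
    return True

# The constraint from the text: written.to.tape(f) ∨ isa(f, file).
def constraint(s: State, f: str) -> bool:
    return ("written.to.tape", f) in s or ("isa", f, "file") in s

def rm(f: str) -> Action:
    return lambda s: frozenset(s - {("isa", f, "file")})

s0: State = frozenset({("isa", "paper.tex", "file")})
assert not satisfies_dont_disturb([rm("paper.tex")], s0, constraint)   # deletes an un-backed-up file
assert satisfies_dont_disturb([rm("paper.tex")],
                              s0 | {("written.to.tape", "paper.tex")}, constraint)

Note that this check examines one linearization at a time; the mechanisms mentioned earlier for planning with subgoal interactions aim instead to detect such violations during plan construction.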