Abstract Behavior Representations for Self-Assessment
Scott A. Wallace
Washington State University Vancouver
14204 NE Salmon Creek Ave.
Vancouver, WA 98686
[email protected]

Abstract

Developing and testing intelligent agents is a complex task that is both time-consuming and costly. This is especially true for agents whose behavior is judged not only by the final states they achieve, but also by the way in which they accomplish their task. In this paper, we examine methods for ensuring an agent upholds constraints particular to a given domain. We explore two significant projects dealing with this problem and we determine that two properties are crucial to success in complex domains. First, we must seek efficient methods of representing domain constraints and testing potential actions for consistency. Second, behavior must be assessed at run-time, as opposed to only during a planning phase. Finally, we explore how abstract behavior representations might be used to satisfy these two properties. We end the paper with a brief discussion of the current state of our self-assessment framework and our plans for future work.

Copyright © 2005, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

As members of the Artificial Intelligence (AI) community, one of our long-term goals is to provide the theoretical and technical framework to support agent development. Although this goal has been a priority for quite some time, there remain few, if any, systems with the type of general intelligence that was predicted to be just around the corner in the late 1950s and 1960s. In the past two decades, researchers have been more successful at demonstrating how it is possible to develop systems which reason or act intelligently so long as the domain is suitably constrained. However, even given the progress of the last twenty years, agents that perform complicated tasks with the skill of human experts remain relatively few.

To understand why agents are not more prolific, we look to the conditions of their operating environments. In many domains, success is measured not only by the final state achieved (as in the classic planning sense), but also by the degree to which the agent's behavior is consistent with that of a human expert. Consider, for example, the TacAir-Soar project (Jones et al. 1999), in which the design goal is to build agents that fly tactical military aircraft as part of a simulated training exercise. In this setting, teammates may be either other intelligent agents or human counterparts. The ultimate purpose of the simulation is to train human pilots for situations that may occur in the real world (where teams will consist entirely of humans). As a result, it is critical that the agents used in the training simulation produce behavior consistent with that of human experts. The problem, however, is that it can be extremely difficult (if not impossible) to precisely specify the behavior of these human experts in this and many other complex domains. As a result, designing and debugging such agents is an extremely time-consuming and costly process.

We hypothesize that, at least to some extent, the difficulty and high costs associated with the design and testing process have a negative impact on the number of expert-level agents being produced. To help mitigate these problems, we have suggested that abstract behavior representations could be used to help validate an agent's knowledge base by comparing the agent's behavior against a gold standard which might be derived, for example, from observations of human experts (Wallace & Laird 2003). However, case-based validation, in which the agent's behavior is examined on a specific set of test scenarios, is incomplete by nature. As a result, it is difficult, if not impossible, to obtain complete confidence in the quality of the agent's performance in domains of any real complexity. For this reason, we are looking for methods to continue assessing (and potentially correcting) the agent's performance after it leaves the laboratory.

In a more general sense, the problem of ensuring that an agent behaves like its human counterpart can be viewed as the problem of ensuring the agent upholds a set of operating constraints appropriate for its domain. We will refer to this as the behavioral correctness problem, or BCP. As noted earlier, if this problem is addressed during or at the end of development, it is known as validation. If the BCP is addressed after the agent leaves the laboratory, there are at least two different ways to assess the agent's behavior. First, the agent may play an active role, reasoning about the quality of its behavior with respect to explicit and declarative domain constraints. In this case, the assessment process is cognitive and we call the agent reflective. The second way assessment may be conducted is within the agent's framework itself. For example, the agent may be built using an architecture designed to ensure specific domain constraints will be upheld. In this case, the process is pre-cognitive in the sense that the agent itself may have no explicit knowledge or representation of the domain constraints. We consider such agent architectures to be protective of the domain constraints. Furthermore, we consider an agent safe to the degree the agent can guarantee domain constraints will be upheld (either through cognitive or pre-cognitive processes).
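To make the distinction concrete, the following is a minimal, hypothetical sketch of how a reflective agent might assess a proposed action against explicit, declarative constraints at run time. It is not drawn from any particular system; the class name, the fallback action, and the altitude constraint are illustrative assumptions only.

```python
# A minimal sketch of run-time, "cognitive" self-assessment: the agent holds
# an explicit, declarative list of constraints and checks each proposed action
# against them before executing it.

from typing import Callable, Dict, List

# A constraint maps (state, action) to True when the action is acceptable.
Constraint = Callable[[Dict[str, float], str], bool]

class ReflectiveAgent:
    def __init__(self, constraints: List[Constraint]):
        self.constraints = constraints  # explicit knowledge of the domain constraints

    def assess(self, state: Dict[str, float], action: str) -> bool:
        """Return True only if every constraint accepts the proposed action."""
        return all(constraint(state, action) for constraint in self.constraints)

    def act(self, state: Dict[str, float], proposed: str) -> str:
        # If the proposed action would violate a constraint, fall back to a
        # harmless default instead of executing it.
        return proposed if self.assess(state, proposed) else "wait"

# Illustrative constraint: never descend when already at or below 1000 feet.
def no_low_descent(state: Dict[str, float], action: str) -> bool:
    return not (action == "descend" and state.get("altitude", 0.0) <= 1000.0)

agent = ReflectiveAgent([no_low_descent])
print(agent.act({"altitude": 900.0}, "descend"))   # -> "wait"
print(agent.act({"altitude": 5000.0}, "descend"))  # -> "descend"
```

A protective architecture, by contrast, would perform an equivalent check below the level of the agent's own reasoning, so the agent need not represent the constraints explicitly at all.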
The remainder of this paper begins with an examination of previous work on the problem of ensuring intelligent systems meet domain constraints. Following this examination, we discuss Behavior Bounding, a method for using abstract behavior representations to classify the quality or correctness of an agent's behavior. Next, we present a plan for extending Behavior Bounding so an agent can assess (and modify) its own behavior during task performance. Finally, we close the paper by describing the current state of our project, a self-assessment framework, and commenting on work remaining to be completed.

Related Work

In this section, we examine earlier work on behavior assessment, focusing on one protective agent architecture and one system that combines characteristics of reflective agents and protective agent architectures. Weld and Etzioni (Weld & Etzioni 1994) use Isaac Asimov's first law of robotics as a motivational tool for discussing the inherent dangers of intelligent systems and the need to ensure that their behavior obeys constraints appropriate to their operating domain. In their article, Weld and Etzioni examine a number of potential methods to help ensure agents uphold constraints appropriate to their domain. The authors acknowledge that proving each action will not violate a constraint is likely to be computationally infeasible in domains of any real complexity and focus their efforts on more tractable methods.

The approach Weld and Etzioni pursue is to modify a planner such that successfully generated plans uphold a set of pre-specified constraints. Their plan representation is built on an action language which allows universal quantification and conditional effects. Constraints are specified as logical, function-free sentences and come in two flavors. The first is a hard constraint, dont-disturb(C), which ensures that no action scheduled by the planner will make C false (unless C was false in the initial state). The second is a soft constraint. In effect, the modified planner serves as a protective agent architecture: it prescribes behavior consistent with the domain constraints, and the agent need only carry out such a plan (assuming one can be found).
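As a rough illustration of this style of plan-time enforcement, the sketch below checks a candidate plan against a hard dont-disturb(C) constraint. It is a simplification under our own assumptions, a STRIPS-like action model with explicit add and delete lists, not Weld and Etzioni's actual formalism, which also supports universal quantification and conditional effects; the action and condition names are illustrative.

```python
# Sketch: plan-time filtering against a hard dont-disturb(C) constraint.
# Each action is modeled with the conditions it makes true ("adds") and
# false ("deletes"); a plan is rejected if any step would falsify the
# protected condition C (unless C was already false in the initial state).

from typing import List, NamedTuple, Set

class Action(NamedTuple):
    name: str
    adds: Set[str]
    deletes: Set[str]

def upholds_dont_disturb(plan: List[Action], protected: str, initial_state: Set[str]) -> bool:
    """Return True if no action in the plan makes the protected condition false."""
    if protected not in initial_state:
        return True  # the constraint is vacuous if C did not hold initially
    return all(protected not in action.deletes for action in plan)

# Usage: a planner would discard the second plan because "runway-clear"
# holds initially and the takeoff action deletes it.
initial = {"runway-clear", "engines-off"}
plan_a = [Action("start-engines", {"engines-on"}, {"engines-off"})]
plan_b = plan_a + [Action("takeoff", {"airborne"}, {"runway-clear"})]
print(upholds_dont_disturb(plan_a, "runway-clear", initial))  # -> True
print(upholds_dont_disturb(plan_b, "runway-clear", initial))  # -> False
```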
The benefit of this approach, and of architecture-based methods in general, is that there is a clear distinction between the domain knowledge (i.e., the constraints and operators) and the mechanisms which enforce safety. As a result, it seems relatively clear that this method could be reused across different domains with minimal effort. However, Weld and Etzioni's approach also suffers from two serious shortcomings.

The first shortcoming is that safety is only enforced at plan time. This is acceptable given their assumptions of a simple world model (e.g., the agent is the sole cause of change in the world state, and all actions are deterministic). However, if non-deterministic actions were possible, it is likely that their planning method would become computationally intractable quite quickly. Perhaps even worse, the unpredictable nature of exogenous events in worlds where the agent is not the sole cause of change means that plan-time safety assurances may break down at execution time. The ability to deal gracefully with exogenous events is critical in domains of increased complexity. The desire to support this capability suggests that safety may be better enforced closer to execution time, since plans generated significantly in advance may need to be modified as the task is pursued.

The second main problem is the manner in which domain constraints are specified. Both hard and soft constraints are specified using an identifier and a logical sentence, effectively of the form dont-do(C). Although their language allows universal quantification within the sentence C, this construction still means that significant enumeration of prohibited behavior is likely to be required. Instead, what we would like to do is develop structures that allow us to classify larger regions of behavior space (i.e., action sequences) as either consistent or inconsistent with the domain constraints without necessarily enumerating the contents of these spaces; a brief illustrative sketch of this idea appears below.

The second project we examine is the work on plan execution by Atkins and others (e.g., (Atkins, Durfee, & Shin 1997; Atkins et al. 2001)) using the CIRCA real-time architecture (Musliner, Durfee, & Shin 1993).
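Returning to the structures mentioned above for classifying regions of behavior space, the following is a minimal, hypothetical sketch of the general idea: rather than enumerating prohibited action sequences one by one, acceptable behavior is summarized abstractly, here simply as the set of actions admissible while each goal is being pursued, so that a single entry covers an entire family of action sequences. The representation and all names are illustrative assumptions, not the Behavior Bounding formalism itself.

```python
# Sketch: classifying whole regions of behavior space without enumerating them.
# The abstract representation maps each goal to the actions admissible while
# that goal is active; any trace whose actions stay inside these sets is
# consistent, so one compact structure implicitly covers (or excludes) an
# unbounded number of concrete action sequences.

from typing import Dict, List, Set, Tuple

AbstractBehavior = Dict[str, Set[str]]  # goal -> admissible actions

def consistent(trace: List[Tuple[str, str]], abstraction: AbstractBehavior) -> bool:
    """A trace is a list of (active goal, action taken) pairs; it is consistent
    if every action is admissible under the goal that was active at the time."""
    return all(action in abstraction.get(goal, set()) for goal, action in trace)

# Usage: one entry admits every intercept behavior built from these actions and
# implicitly rules out everything else, with no enumeration of bad sequences.
abstraction = {"intercept": {"turn", "climb", "lock-radar"}}
print(consistent([("intercept", "turn"), ("intercept", "climb")], abstraction))  # -> True
print(consistent([("intercept", "fire-on-friendly")], abstraction))              # -> False
```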