Cognitive Machine Theory of Mind
Thuy Ngoc Nguyen ([email protected])
Dynamic Decision Making Laboratory, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh PA 15213 USA

Cleotilde Gonzalez ([email protected])
Dynamic Decision Making Laboratory, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh PA 15213 USA

Abstract

A major challenge for research in Artificial Intelligence (AI) is to develop systems that can infer humans' goals and beliefs from observing their behavior alone (i.e., systems that have Theory of Mind, ToM). In this research we use a theoretically-grounded, pre-existing cognitive model to demonstrate the development of ToM from observation of other agents' behavior. The cognitive model relies on Instance-Based Learning Theory (IBLT) of experiential decision making, which distinguishes it from previous models that are hand-crafted for particular settings, complex, or unable to explain a cognitive development of ToM. An IBL model was designed to observe agents' navigation in gridworld environments and was queried afterwards to predict the actions of new agents in new (not previously experienced) gridworlds. The IBL observer can infer and predict potential behaviors from just a few samples of the past behavior of random and goal-directed reinforcement learning agents. Furthermore, the IBL observer is able to infer an agent's false belief and pass a classic ToM test commonly used with humans. We discuss the advantages of using IBLT to develop models of ToM, and the potential to predict human ToM.

Keywords: cognitive model; machine theory of mind; instance-based learning theory.

Introduction

Theory of mind (ToM) refers to the ability of humans to infer and understand the beliefs, desires, and intentions of others (Premack & Woodruff, 1978). ToM is known to develop very early in life (Wimmer & Perner, 1983; Keysar, Lin, & Barr, 2003), and it is one of the most important social skills used to predict others' behavior and intentions, and to theorize about others' beliefs and desires in future situations.

Since its origins, Artificial Intelligence (AI) has attempted to "replicate" various human behaviors in computational form, aiming at passing an imitation game (i.e., the Turing Test) (Lake, Ullman, Tenenbaum, & Gershman, 2017; Turing, 1950), in which a machine's behavior would be indistinguishable from that of a human. AI work on ToM has investigated how humans "mentalize" robots (and machines more generally) and how human ToM develops when interacting with machines rather than with other humans (Banks, 2019). While this work is extremely relevant for developing machine representations of ToM, it does not address the major problem of how to build an algorithm that can develop ToM from limited observation of other agents' actions, a capability at which humans excel (Lake et al., 2017; Botvinick et al., 2017).

Recently, researchers have built computational architectures of ToM. A notable example is the work of Baker and colleagues (Baker, Saxe, & Tenenbaum, 2011; Baker, Jara-Ettinger, Saxe, & Tenenbaum, 2017), who developed a Bayesian ToM (BToM) model that is able to predict and attribute beliefs and desires of other agents, given the observation of their actions. The BToM model uses Bayesian probabilities and an assumption of utility maximization (i.e., human rationality) to determine the posterior probability of "mental states". Another recent example is the work of Rabinowitz et al. (2018), who developed a Machine ToM (MToM) architecture involving three modules: a character net, which parses agents' past trajectories of navigation in gridworlds; a mental state net, which parses agents' trajectories in recent episodes; and a prediction net, which uses the outputs of the first two and is queried regarding the future behaviors of new agents. These authors offer a set of tests of the observer's predictions regarding various types of agents, and a test of the recognition of false beliefs, the Sally-Anne test (Wimmer & Perner, 1983).

Our research builds on these efforts, making the following contributions. First, we present a Cognitive Machine Theory of Mind (CogToM) framework that relies on a general cognitive theory of decisions from experience, Instance-Based Learning Theory (IBLT) (Gonzalez, Lerch, & Lebiere, 2003). Our approach differs from the standard computational models of ToM summarized above in that it uses the IBL process and the formulations of the ACT-R architecture (Anderson & Lebiere, 2014) for memory-based inference to demonstrate how ToM develops from observation of other agents' actions. Second, we demonstrate that an IBL model of an observer (i.e., the IBL observer) is able to explain the inferences made about three types of acting agents in gridworlds: Random, Reinforcement Learning (RL), and IBL agents. Third, we find that the IBL observer predicts the beliefs and actions of IBL acting agents more accurately than it predicts the beliefs and actions of RL or Random agents.
Instance-Based Learning Theory

IBLT is a theory of decisions from experience, developed to explain human learning in dynamic decision environments (Gonzalez et al., 2003). IBLT provides a decision making algorithm and a set of cognitive mechanisms used to implement computational models. The algorithm involves the recognition and retrieval of past experiences (i.e., instances) according to their similarity to a current decision situation. An "instance" in IBLT is a memory unit that results from the potential alternatives evaluated. These are memory representations consisting of three elements: a situation (S) (the set of attributes that give context to the decision, or state s); a decision (D) (the action taken corresponding to an alternative in state s, or action a); and a utility (U) (the expected utility u or experienced outcome x of the action taken in a state).

IBLT relies on sub-symbolic mechanisms that have been discussed extensively (e.g., Gonzalez et al., 2003; Gonzalez & Dutt, 2011; Gonzalez, 2013), but we summarize them here for completeness. Each instance i in memory has a value of Activation, which represents how readily available that information is in memory (Anderson & Lebiere, 2014). A simplified version of the Activation equation captures how recently and frequently the considered instances are activated:

A_i = \ln\Big( \sum_{t' \in \{1, \dots, t-1\}} (t - t')^{-d} \Big) + \sigma \ln\Big( \frac{1 - \xi_i}{\xi_i} \Big)   (1)

where d and σ are respectively the decay and noise parameters, and t′ refers to each previous timestamp at which the outcome of instance i was observed as a result of choosing action a in state s. The rightmost term represents the Gaussian noise for capturing individual variation in activation, and ξ_i is a random number drawn from the uniform distribution U(0,1).

The Activation of an instance i is used to determine its memory retrieval probability:

p_i = \frac{e^{A_i / \tau}}{\sum_l e^{A_l / \tau}}   (2)

where τ = σ√2 represents the variability in recalling instances from memory, and l refers to the index of all stored instances, which normalizes p_i.

The expected utility of taking action a in state s is calculated through a mechanism in IBLT called Blending:

V(s, a) = \sum_{i=1}^{n} p_i x_i   (3)

Essentially, the blended value is the sum of all the outcomes weighted by their probability of retrieval, where x_i is the outcome stored in instance i.
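To make these mechanisms concrete, below is a minimal Python sketch of an IBL memory implementing Equations 1-3. It is an illustration under stated assumptions, not the authors' implementation: the class name IBLMemory, the keying of instances by (state, action, outcome) triples, and the default values of d and σ are ours.

```python
import math
import random
from collections import defaultdict

class IBLMemory:
    """Illustrative IBL memory: each instance is a (state, action, outcome)
    triple, stored with the timestamps at which it was observed, mirroring
    the (S, D, U) instance structure described above."""

    def __init__(self, decay=0.5, sigma=0.25):
        self.d = decay                      # decay parameter d in Eq. 1
        self.sigma = sigma                  # noise parameter sigma in Eq. 1
        self.tau = sigma * math.sqrt(2)     # tau = sigma * sqrt(2), Eq. 2
        self.instances = defaultdict(list)  # (s, a, x) -> [t', t'', ...]

    def store(self, state, action, outcome, t):
        self.instances[(state, action, outcome)].append(t)

    def activation(self, key, t):
        """Eq. 1: recency/frequency term plus noise with xi ~ U(0, 1)."""
        base = math.log(sum((t - tp) ** (-self.d) for tp in self.instances[key]))
        xi = min(max(random.random(), 1e-9), 1 - 1e-9)  # keep the log finite
        return base + self.sigma * math.log((1 - xi) / xi)

    def blended_value(self, state, action, t):
        """Eqs. 2 and 3: V(s, a) = sum_i p_i * x_i over matching instances."""
        keys = [k for k in self.instances if k[0] == state and k[1] == action]
        acts = {k: self.activation(k, t) for k in keys}
        norm = sum(math.exp(a / self.tau) for a in acts.values())
        return sum((math.exp(acts[k] / self.tau) / norm) * k[2] for k in keys)

# Example: two experiences of taking "up" in state (0, 0), then a query.
memory = IBLMemory()
memory.store(state=(0, 0), action="up", outcome=1.0, t=1)
memory.store(state=(0, 0), action="up", outcome=0.0, t=3)
print(memory.blended_value((0, 0), "up", t=5))
```

Because the retrieval probabilities weight recent and frequent outcomes more heavily, the blended value in this example leans toward the more recent outcome of 0.0 while still reflecting the earlier 1.0.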
Cognitive Machine Theory of Mind (CogToM) Framework

[...] gridworld. The observer should be able to accomplish ToM given full or partial observation of the agent's action traces in past gridworlds. The observer model in CogToM is built according to IBLT (Gonzalez et al., 2003).

Gridworld

A gridworld is a sequential decision making problem wherein agents move through an N × N grid to search for targets and avoid obstacles. We use gridworlds of size 11 × 11, following Rabinowitz et al. (2018) (see Fig. 1). A gridworld contains randomly-located obstacles (black bars), and the number of obstacles varies from zero (no obstacles) to six, each of size 1 × 1. In each grid there are four goals of different values, represented as four colored objects (blue, green, orange, and purple), placed at random locations that do not overlap the obstacles. Starting at a random position (x, y), the agent (black dot) makes sequential decisions about the actions to take (i.e., up, down, left, right) to reach one of the four objects. The sequence of moves from the initial location to the end location forms a trajectory (dotted red line), which is produced by the strategy (the sequence of decisions) that the agent takes.

Generally, a gridworld task can be formulated as a Partially Observable Markov Decision Process (POMDP), as in Rabinowitz et al. (2018). Each POMDP M_j has a state space S_j, and each square in the grid is a state s ∈ S_j. At each state s, an agent A_k is required to take an action a from an action space A_j. Each agent follows its policy (i.e., strategy) to decide how to move around the grid. By executing its policy π_k in the gridworld M_j, the agent A_k creates a trajectory denoted by T_{kj} = \{(s_t, a_t)\}_{t=0}^{T}. If the agent has full observation of the grid, the POMDP is referred to as an MDP.

Models of Acting Agents in the Gridworld

We consider three different types of acting agents that play in the gridworlds: Random, Reinforcement Learning (RL), and Instance-Based Learning (IBL) agents.

A random agent A_k selects an action a in state s based on the probability π_k(a|s). Precisely, the policy of A_k is drawn from a Dirichlet distribution π_k ∼ Dir(α) with concentration parameter α, so that ∑_{a ∈ A} π_k(a|s) = 1 and π_k(a|s) > 0.
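As an illustration of the Random agent, the sketch below draws an independent policy π_k(·|s) ∼ Dir(α) over the four actions for each state and rolls out one trajectory T_{kj} = {(s_t, a_t)}. The default grid size, the boundary rule (stay in place at walls), the step budget, and the omission of obstacles and goal objects are simplifying assumptions of this sketch, not the paper's setup.

```python
import random

ACTIONS = ["up", "down", "left", "right"]
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def dirichlet(alpha, k):
    """Sample a k-dimensional Dirichlet(alpha) vector via Gamma draws."""
    draws = [random.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [g / total for g in draws]

def random_agent_trajectory(n=11, alpha=1.0, max_steps=30, seed=0):
    """Roll out one trajectory {(s_t, a_t)} of a Random agent whose
    per-state policy pi_k(.|s) ~ Dir(alpha) over the four actions."""
    random.seed(seed)
    # Each Dirichlet draw sums to 1 with strictly positive entries,
    # matching sum_a pi_k(a|s) = 1 and pi_k(a|s) > 0.
    policy = {(x, y): dirichlet(alpha, len(ACTIONS))
              for x in range(n) for y in range(n)}
    s = (random.randrange(n), random.randrange(n))  # random start (x, y)
    trajectory = []
    for _ in range(max_steps):
        a = random.choices(ACTIONS, weights=policy[s])[0]
        trajectory.append((s, a))
        dx, dy = MOVES[a]
        nx, ny = s[0] + dx, s[1] + dy
        # Stay in place when a move would leave the n x n grid.
        if 0 <= nx < n and 0 <= ny < n:
            s = (nx, ny)
    return trajectory

print(random_agent_trajectory()[:5])  # first five (state, action) pairs
```

A trajectory produced this way is exactly the kind of action trace the IBL observer consumes when inferring an agent's goals and beliefs.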