Machine Theory of Mind
Neil C. Rabinowitz* (DeepMind), Frank Perbet (DeepMind), H. Francis Song (DeepMind), Chiyuan Zhang (Google Brain), S. M. Ali Eslami (DeepMind), Matthew Botvinick (DeepMind)

*Corresponding author: [email protected]

arXiv:1802.07740v2 [cs.AI], 12 Mar 2018

Abstract

Theory of mind (ToM; Premack & Woodruff, 1978) broadly refers to humans' ability to represent the mental states of others, including their desires, beliefs, and intentions. We propose to train a machine to build such models too. We design a Theory of Mind neural network – a ToMnet – which uses meta-learning to build models of the agents it encounters, from observations of their behaviour alone. Through this process, it acquires a strong prior model for agents' behaviour, as well as the ability to bootstrap to richer predictions about agents' characteristics and mental states using only a small number of behavioural observations. We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep reinforcement learning agents from varied populations, and that it passes classic ToM tasks such as the "Sally-Anne" test (Wimmer & Perner, 1983; Baron-Cohen et al., 1985) of recognising that others can hold false beliefs about the world. We argue that this system – which autonomously learns how to model other agents in its world – is an important step forward for developing multi-agent AI systems, for building intermediating technology for machine-human interaction, and for advancing the progress on interpretable AI.

1. Introduction

For all the excitement surrounding deep learning and deep reinforcement learning at present, there is a concern from some quarters that our understanding of these systems is lagging behind. Neural networks are regularly described as opaque, uninterpretable black-boxes. Even if we have a complete description of their weights, it's hard to get a handle on what patterns they're exploiting, and where they might go wrong. As artificial agents enter the human world, the demand that we be able to understand them is growing louder.

Let us stop and ask: what does it actually mean to "understand" another agent? As humans, we face this challenge every day, as we engage with other humans whose latent characteristics, latent states, and computational processes are almost entirely inaccessible. Yet we function with remarkable adeptness. We can make predictions about strangers' future behaviour, and infer what information they have about the world; we plan our interactions with others, and establish efficient and effective communication.

A salient feature of these "understandings" of other agents is that they make little to no reference to the agents' true underlying structure. We do not typically attempt to estimate the activity of others' neurons, infer the connectivity of their prefrontal cortices, or plan interactions with a detailed approximation of the dynamics of others' hippocampal maps. A prominent argument from cognitive psychology is that our social reasoning instead relies on high-level models of other agents (Gopnik & Wellman, 1992).
These models engage abstractions which do not describe the detailed physical mechanisms underlying observed behaviour; instead, we represent the mental states of others, such as their desires, beliefs, and intentions. This ability is typically described as our Theory of Mind (Premack & Woodruff, 1978). While we may also, in some cases, leverage our own minds to simulate others' (e.g. Gordon, 1986; Gallese & Goldman, 1998), our ultimate human understanding of other agents is not measured by a 1-1 correspondence between our models and the mechanistic ground truth, but instead by how much these models afford for tasks such as prediction and planning (Dennett, 1991).

In this paper, we take inspiration from human Theory of Mind, and seek to build a system which learns to model other agents. We describe this as a Machine Theory of Mind. Our goal is not to assert a generative model of agents' behaviour and an algorithm to invert it. Rather, we focus on the problem of how an observer could learn autonomously how to model other agents using limited data (Botvinick et al., 2017). This distinguishes our work from previous literature, which has relied on hand-crafted models of agents as noisy-rational planners – e.g. using inverse RL (Ng et al., 2000; Abbeel & Ng, 2004), Bayesian inference (Lucas et al., 2014; Evans et al., 2016), Bayesian Theory of Mind (Baker et al., 2011; Jara-Ettinger et al., 2016; Baker et al., 2017), or game theory (Camerer et al., 2004; Yoshida et al., 2008; Camerer, 2010; Lanctot et al., 2017). In contrast, we learn the agent models, and how to do inference on them, from scratch, via meta-learning.

Building a rich, flexible, and performant Machine Theory of Mind may well be a grand challenge for AI. We are not trying to solve all of this here. A main message of this paper is that many of the initial challenges of building a ToM can be cast as simple learning problems when they are formulated in the right way. Our work here is an exercise in figuring out these simple formulations.

There are many potential applications for this work. Learning rich models of others will improve decision-making in complex multi-agent tasks, especially where model-based planning and imagination are required (Hassabis et al., 2013; Hula et al., 2015; Oliehoek & Amato, 2016). Our work thus ties in to a rich history of opponent modelling (Brown, 1951; Albrecht & Stone, 2017); within this context, we show how meta-learning could be used to furnish an agent with the ability to build flexible and sample-efficient models of others on the fly. Such models will be important for value alignment (Hadfield-Menell et al., 2016) and flexible cooperation (Nowak, 2006; Kleiman-Weiner et al., 2016; Barrett et al., 2017; Kris Cao), and will likely be an ingredient in future machines' ethical decision making (Churchland, 1996). They will also be highly useful for communication and pedagogy (Dragan et al., 2013; Fisac et al., 2017; Milli et al., 2017), and will thus likely play a key role in human-machine interaction. Exploring the conditions under which such abilities arise can also shed light on the origin of our human abilities (Carey, 2009). Finally, such models will likely be crucial mediators of our human understanding of artificial agents.

Lastly, we are strongly motivated by the goals of making artificial agents human-interpretable. We attempt a novel approach here: rather than modifying agents architecturally to expose their internal states in a human-interpretable form, we seek to build intermediating systems which learn to reduce the dimensionality of the space of behaviour and re-present it in more digestible forms. In this respect, the pursuit of a Machine ToM is about building the missing interface between machines and human expectations (Cohen et al., 1981).

1.1. Our approach

We consider the challenge of building a Theory of Mind as essentially a meta-learning problem (Schmidhuber et al., 1996; Thrun & Pratt, 1998; Hochreiter et al., 2001; Vilalta & Drissi, 2002). At test time, we want to be able to encounter a novel agent whom we have never met before, and already have a strong and rich prior about how they are going to behave. Moreover, as we see this agent act in the world, we wish to be able to collect data (i.e. form a posterior) about their latent characteristics and mental states that will enable us to improve our predictions about their future behaviour.

To do this, we formulate a meta-learning task. We construct an observer, who in each episode gets access to a set of behavioural traces of a novel agent. The observer's goal is to make predictions of the agent's future behaviour. Over the course of training, the observer should get better at rapidly forming predictions about new agents from limited data. This "learning to learn" about new agents is what we mean by meta-learning. Through this process, the observer should also learn an effective prior over the agents' behaviour that implicitly captures the commonalities between agents within the training population.

We introduce two concepts to describe components of this observer network and their functional role. We distinguish between a general theory of mind – the learned weights of the network, which encapsulate predictions about the common behaviour of all agents in the training set – and an agent-specific theory of mind – the "agent embedding" formed from observations about a single agent at test time, which encapsulates what makes this agent's character and mental state distinct from others'. These correspond to a prior and posterior over agent behaviour.
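To make this episode structure concrete, the following is a minimal sketch of how such an observer could be trained, written in PyTorch-style Python. It is illustrative only: the names (ObserverToMnet, embed_net, predict_net, sample_agent_episode), the mean-pooling over traces, and the network sizes are assumptions made for exposition, not the paper's actual architecture. The learned weights play the role of the general theory of mind (the prior), while the per-agent embedding computed from a handful of behavioural traces plays the role of the agent-specific theory of mind (the posterior).

```python
import torch
import torch.nn as nn


class ObserverToMnet(nn.Module):
    """Observer network: the learned weights act as the general theory of
    mind (prior); the embedding computed from a new agent's behavioural
    traces acts as the agent-specific theory of mind (posterior)."""

    def __init__(self, obs_dim: int, n_actions: int, embed_dim: int = 8):
        super().__init__()
        # Maps one (state, action) pair to a per-step feature vector.
        self.embed_net = nn.Sequential(
            nn.Linear(obs_dim + n_actions, 32), nn.ReLU(),
            nn.Linear(32, embed_dim))
        # Maps (query state, agent embedding) to next-action logits.
        self.predict_net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, 32), nn.ReLU(),
            nn.Linear(32, n_actions))

    def forward(self, past_traces: torch.Tensor, query_state: torch.Tensor):
        # past_traces: [n_steps, obs_dim + n_actions] observed from one agent.
        # Pool over steps to form a single agent embedding.
        e_agent = self.embed_net(past_traces).mean(dim=0)
        return self.predict_net(torch.cat([query_state, e_agent], dim=-1))


def meta_train(observer, sample_agent_episode, n_iters=10_000, lr=1e-3):
    """Each iteration samples a *new* agent from the population, so the
    observer is optimised to predict novel agents from limited data."""
    opt = torch.optim.Adam(observer.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(n_iters):
        # sample_agent_episode() is assumed to return a few behavioural
        # traces of a freshly sampled agent, a query state, and the action
        # the agent actually took there (as a scalar LongTensor).
        past_traces, query_state, true_action = sample_agent_episode()
        logits = observer(past_traces, query_state)
        loss = loss_fn(logits.unsqueeze(0), true_action.unsqueeze(0))
        opt.zero_grad()
        loss.backward()
        opt.step()
```

In this reading, meta-learning arises because every gradient step is taken on a freshly sampled agent: the observer is rewarded for forming good predictions about new agents quickly, rather than for fitting any single agent.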
This paper is structured as a sequence of experiments of increasing complexity on this Machine Theory of Mind network, which we call a ToMnet. These experiments showcase the idea of the ToMnet, exhibit its capabilities, and demonstrate its capacity to learn rich models of other agents incorporating canonical features of humans' Theory of Mind, such as the recognition of false beliefs.

Some of the experiments in this paper are directly inspired by the seminal work of Baker and colleagues in Bayesian Theory of Mind, such as the classic food-truck experiments.

In our formalism, we associate the reward functions, discount factors, and conditional observation functions with the agents rather than with the POMDPs. For example, a POMDP could be a gridworld with a particular arrangement of walls and objects; different agents, when placed in the same POMDP, might receive different rewards for reaching these objects, and be able to see different amounts of their local surroundings.
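Since this factorisation is described only in prose, here is a minimal sketch of how it could be expressed as data structures, assuming a simple gridworld. The names (Gridworld, Agent, reward_fn, observation_fn, local_crop) are illustrative choices, not the paper's code: the environment carries only the layout and dynamics, while each agent carries its own reward function, discount factor, and partial observation function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

State = Tuple[int, int]  # an (x, y) cell in a gridworld


@dataclass
class Gridworld:
    """Environment part of the POMDP: layout and dynamics only."""
    walls: Set[State]
    objects: Dict[str, State]  # e.g. {"blue": (2, 3), "green": (7, 1)}

    def step(self, state: State, action: str) -> State:
        dx, dy = {"up": (0, -1), "down": (0, 1),
                  "left": (-1, 0), "right": (1, 0)}[action]
        nxt = (state[0] + dx, state[1] + dy)
        return state if nxt in self.walls else nxt


@dataclass
class Agent:
    """Agent-specific part: its own rewards, discount, and partial view."""
    reward_fn: Callable[[Gridworld, State], float]
    discount: float
    observation_fn: Callable[[Gridworld, State], object]


# Two agents dropped into the same Gridworld can value the same objects
# differently and see different amounts of their local surroundings, e.g.
# (local_crop is a hypothetical helper returning a window around the agent):
#   blue_seeker = Agent(
#       reward_fn=lambda world, s: 1.0 if s == world.objects["blue"] else 0.0,
#       discount=0.99,
#       observation_fn=lambda world, s: local_crop(world, s, radius=2))
```

The point of the split is that a single Gridworld instance can be paired with many Agent instances, which is what lets the same POMDP yield different rewards and different fields of view for different agents.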