
November 2020

DeepMind Lab2D

Charles Beattie, Thomas Köppe, Edgar A. Duéñez-Guzmán and Joel Z. Leibo
DeepMind

We present DeepMind Lab2D, a scalable environment simulator for artificial intelligence research that facilitates researcher-led experimentation with environment design. DeepMind Lab2D was built with the specific needs of multi-agent deep reinforcement learning researchers in mind, but it may also be useful beyond that particular subfield.

Corresponding author(s): Charles Beattie ⟨cbeattie@.com⟩, DeepMind, 5 New Street Square, EC4A 3TW

1. Introduction

Are you a product of your genes, brain, or environment? What will an artificial intelligence (AI) be? The development of AI systems is inextricably intertwined with questions about the fundamental causal factors shaping any intelligence, including natural intelligence. Even in a completely abstract scenario, prior knowledge is needed for effective learning, but prior learning is the only realistic way to generate such knowledge.

The centrality of this dynamic is illustrated by experiments in biology. In laboratory animals, manipulations of the rearing environment produce profound effects on both brain structure and behavior. For instance, laboratory rodent environments may be enriched by using larger cages which contain larger groups of other individuals—creating more opportunities for social interaction, variable toys and feeding locations, and a wheel to allow for the possibility of voluntary exercise. Rearing animals in such enriched environments improves their learning and memory, increases synaptic arborization, and increases total brain weight (Van Praag et al., 2000). It is the interaction of environmental factors that is thought to produce the enrichment effects, not any single factor in isolation.

While it is conceivable that AI research could stumble upon a perfectly general learning algorithm without needing to consider its environment, it is overwhelmingly clear that environments affect learning in ways that are not just arbitrary quirks, but real phenomena which we must strive to understand theoretically. There is structure there. For instance, the question of why some environments only generate trivial complexity, like tic-tac-toe, while others generate rich complexity, like Go, is not just a matter of the state space size. It is a real scientific question that merits serious study. We have elsewhere referred to this as the problem problem (Leibo et al., 2019).

One reason it is currently difficult to undertake work aimed at the problem problem is that it is rare for any individual person to possess expertise in all the relevant areas. For instance, researchers know how to design well-controlled experiments but struggle with the necessary skills that are more akin to computer game design and engineering. Another reason why it is difficult to pursue this hypothesis is the prevailing culture in machine learning that views any tinkering with the environment as "hand-crafting", "special-casing", or just "cheating". These attitudes are misguided. In our quest for generality, we must not forget the great diversity and particularity of the problem space.

A diverse set of customizable simulation environments for large-scale 3D environments with varying degrees of physical realism exists (Beattie et al., 2016; Juliani et al., 2018; Kempka et al., 2016; Leibo et al., 2018; Todorov et al., 2012). For 2D, excellent simulation environments also exist (Chevalier-Boisvert et al., 2018; Jiang, 2019; Lanctot et al., 2019; Platanios et al., 2020; Schaul, 2013; Stepleton, 2017; Suarez et al., 2019; Zheng et al., 2017); however, they fell short, at the time this project began, in at least one of our requirements of composability, flexibility, multi-agent capabilities, or performance.

1.1. DeepMind Lab2D

DeepMind Lab2D (or "DMLab2D" for short; https://github.com/deepmind/lab2d) is a platform for the creation of two-dimensional, layered, discrete "grid-world" environments, in which pieces (akin to pieces on a chess board) move around. This system is particularly tailored for multi-agent reinforcement learning. The computationally intensive engine is written in C++ for efficiency, while most of the level-specific logic is scripted in Lua.

The grid. The environments of DMLab2D consist of one or more layers of two-dimensional grids. A position in the environment is uniquely identified by a coordinate tuple (x, y, layer). Layers are labeled by strings, and the x- and y-coordinates are non-negative integers. An environment can have an arbitrary number of layers, and their rendering order is controlled by the user.

Pieces. The environments of DMLab2D are populated with pieces. Each piece occupies a position (x, y, layer), and each position is occupied by at most one piece. Pieces also have an orientation, which is one of the traditional cardinal directions (north, east, south, west). Pieces can move around the (x, y)-space and reorient themselves as part of the evolution of the environment, both relative to their current position/orientation and absolutely. It is also possible for a piece to have no position, in which case it is "off the board". Pieces cannot freely move among layers; instead, a piece's layer is controlled through its state (described next).

States. Each piece has an associated state. The state consists of a number of key-value attributes. Values are strings or lists of strings. The possible values are fixed by the designer as part of the environment. The state of each piece can change as part of the evolution of the environment, but the state change can only select from among the fixed available values.

The state of a piece controls the piece's appearance, layer, group membership, and behavior. Concretely, the state of a piece comprises the following attributes:

• layer (string): The label of the layer which the piece occupies.
• sprite (string): The name of the sprite used to draw this piece.
• groups (list of strings): The groups of which this piece is a member. Groups are mostly used for managing updater functions.
• contact (string): A tag name for a contact event. Whenever the piece enters (or leaves) the same (x, y)-coordinate as another piece (which is necessarily on a different layer), all involved pieces experience a contact event. The event is tagged with the value of this attribute.

An attempt to change a piece's state fails if the piece's resulting (x, y, layer)-position is already occupied.
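
To make this data model concrete, here is a minimal sketch in plain Lua (the language used for level logic). The table layout and every name in it (STATES, tryChangeState, the state and layer names) are our own illustrative choices rather than the actual DMLab2D level API; the sketch only mirrors the attributes and the occupancy rule described above.

-- Illustrative sketch only: a plain-Lua model of the piece/state data
-- described above. Names and layout are hypothetical, not the DMLab2D API.

-- A designer-fixed set of states. Each state fixes layer, sprite, group
-- membership and the contact tag.
local STATES = {
  avatar = {
    layer = 'upperPhysical',    -- layer occupied by pieces in this state
    sprite = 'Avatar',          -- sprite used to draw the piece
    groups = {'movers'},        -- groups, e.g. for updater functions
    contact = 'avatarContact',  -- tag attached to enter/leave contact events
  },
  apple = {layer = 'lowerPhysical', sprite = 'Apple', contact = 'food'},
}

-- A piece occupies at most one (x, y, layer) cell and has an orientation.
-- A nil position would mean the piece is "off the board".
local piece = {state = 'avatar', position = {x = 3, y = 5}, orientation = 'N'}

-- A state change may only select among the designer-fixed states, and it
-- fails if the destination (x, y, layer) cell is already occupied.
-- `grid` is indexed as grid[layer][y][x]; the piece is assumed on the board.
local function tryChangeState(grid, p, newState)
  local target = STATES[newState]
  assert(target ~= nil, 'unknown state: ' .. tostring(newState))
  local row = grid[target.layer] and grid[target.layer][p.position.y]
  local occupant = row and row[p.position.x]
  if occupant ~= nil and occupant ~= p then
    return false  -- destination cell occupied: the change fails
  end
  p.state = newState
  return true
end

-- With an empty grid the change succeeds, since (3, 5, 'lowerPhysical') is free.
local grid = {lowerPhysical = {}, upperPhysical = {}}
print(tryChangeState(grid, piece, 'apple'))  --> true
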
Callbacks. Most of the logic in an environment is implemented via callbacks for specific types (states) of pieces. Callbacks are functions which the engine calls when the appropriate event or interaction occurs.

Raycasts and queries. The engine provides two ways to enumerate the pieces in particular positions (and layers) on the grid: raycasts and queries. A raycast, as the name implies, finds the first piece, if any, in a ray from a given position. A query finds all pieces within a particular area in the grid, shaped like a disc, a diamond, or a rectangle.
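
The sketch below continues in the same illustrative spirit: a per-state callback table plus toy raycast and diamond-query functions over a flat list of pieces. Again, every name here is hypothetical; the real engine performs these operations in C++ over its own grid representation.

-- Illustrative sketch only (hypothetical names, not the DMLab2D API).

-- Callbacks keyed by state: functions the engine would call when the
-- corresponding event touches a piece in that state.
local callbacks = {
  apple = {
    onContact = function(piece, other)
      print(('apple at (%d, %d) touched by %s'):format(
          piece.position.x, piece.position.y, other.state))
    end,
  },
}

local function fireContact(piece, other)
  local cb = callbacks[piece.state] and callbacks[piece.state].onContact
  if cb then cb(piece, other) end
end

-- A diamond query: every piece within Manhattan distance `radius` of (x, y).
local function queryDiamond(pieces, x, y, radius)
  local found = {}
  for _, p in ipairs(pieces) do
    if p.position ~= nil then
      local d = math.abs(p.position.x - x) + math.abs(p.position.y - y)
      if d <= radius then found[#found + 1] = p end
    end
  end
  return found
end

-- A raycast: the first piece, if any, along direction (dx, dy) from (x, y).
local function raycast(pieces, x, y, dx, dy, maxSteps)
  for step = 1, maxSteps do
    local cx, cy = x + dx * step, y + dy * step
    for _, p in ipairs(pieces) do
      if p.position and p.position.x == cx and p.position.y == cy then
        return p, step
      end
    end
  end
  return nil
end

local pieces = {
  {state = 'avatar', position = {x = 0, y = 0}},
  {state = 'apple',  position = {x = 2, y = 0}},
}
fireContact(pieces[2], pieces[1])              --> apple at (2, 0) touched by avatar
print(#queryDiamond(pieces, 0, 0, 2))          --> 2
print((raycast(pieces, 0, 0, 1, 0, 5)).state)  --> apple
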

1.2. Why 2D, not 3D?

Two-dimensional environments are inherently easier to understand than three-dimensional ones, at very little, if any, loss of expressiveness. Even a game as simple as Pong, which essentially consists of three moving rectangles on a black background, can capture something fundamental about the real game of table tennis. This abstraction makes it easier to capture the essence of the problems and concepts that we aim to solve. 2D games have a long history of making challenging and interpretable benchmarks for artificially intelligent agents (Mnih et al., 2015; Samuel, 1959; Shannon, 1950).

Rich complexity along numerous dimensions can be studied in 2D just as readily as in 3D, if not more so. When studying a particular research question, it is not clear a priori whether specific aspects of 3D environments are crucial for obtaining the desired behavior in the training agents. Even when explicitly studying phenomena like navigation and exploration, where organisms depend on complex visual processing and continuous-time physical environments, researchers in reinforcement learning often need to discretize the interactions and observations so that they become tractable. Moreover, 2D worlds can often capture the relevant complexity of the problem at hand without the need for continuous-time physical environments. This pattern, in which studying phenomena in 2D worlds is a critical first step towards further advances in more complex and realistic environments, is ubiquitous in the field of artificial intelligence. 2D worlds have been successfully used to study problems as diverse as social complexity, navigation, imperfect information, abstract reasoning, exploration, and many more (Leibo et al., 2017; Lerer and Peysakhovich, 2017; Rafols et al., 2005; Ullman et al., 2009; Zheng et al., 2020).

Another advantage of 2D worlds is that they are easier to design and program than their 3D counterparts. This is particularly true when the 3D world actually exploits the space or physical dynamics beyond the capabilities of 2D ones. 2D worlds do not require complex 3D assets to be evocative, nor do they require reasoning about shaders, lighting, and projections. In most 2D worlds, the agent's egocentric view of the world is inherently compatible with the allocentric view (i.e. the third-person or world view). That is, typically the agent's view is simply a movable window on the whole environment's view.

In addition, 2D worlds are significantly less resource-intensive to run, and typically do not require any specialized hardware (like GPUs) to attain reasonable performance. This keeps specialized hardware, if any, available exclusively for the intensive work of training the agents. Using 2D environments also enables better scalability to a larger number of agents interacting with the same environment, as it costs only very little to render another agent's view. Running 2D simulations is within the capabilities of smaller labs, whereas most 3D physics-based reinforcement learning is still prohibitively expensive in many settings.

1.3. Multi-player support and benchmarking with human players

A large fraction of human skills are social skills. To probe these, simulation environments must provide robust support for multi-agent systems. Most existing environments, however, only provide poor support for multiple players.

DeepMind Lab2D supports multiple simultaneous players interacting in the same environment. These players may be either human or computer-controlled, and it is possible to mix human and computer-controlled players in the same game.

Each player can have a custom view of the world that reveals or obscures particular information, controlled by the designer. A global view, potentially hidden from the players, can be set up and can include privileged information. This can be used for imperfect information games, as well as for human behavioral experiments where the experimenter can see the global state of the environment as the episode is progressing.
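
As a minimal sketch of this view model (combining the "movable window" observation from Section 1.2 with the per-player and global views described here), the following plain-Lua function crops an egocentric window out of a global map. The map, names, and window size are invented for illustration and are not part of the DMLab2D API.

-- Illustrative sketch only: per-player egocentric views as movable windows
-- onto a global view. Map and names are hypothetical.
local WORLD = {
  'WWWWWW',
  'W.A..W',
  'W..B.W',
  'WWWWWW',
}

-- Crop a (2*radius+1)-square window centred on (row, col); off-map cells are
-- blank. A designer could also hide or reveal information per player here.
local function egocentricView(world, row, col, radius)
  local view = {}
  for r = row - radius, row + radius do
    local line = {}
    for c = col - radius, col + radius do
      local ch = ' '
      if world[r] and c >= 1 and c <= #world[r] then ch = world[r]:sub(c, c) end
      line[#line + 1] = ch
    end
    view[#view + 1] = table.concat(line)
  end
  return view
end

-- Player A (at row 2, column 3) sees a 3x3 window; an experimenter running a
-- behavioral study could instead be given the full WORLD table.
for _, line in ipairs(egocentricView(WORLD, 2, 3, 1)) do print(line) end
--> WWW
--> .A.
--> ..B
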

1.4. Exposing metrics, supporting analysis

DeepMind Lab2D provides several flexible mechanisms for exposing internal environment information. The simplest form is through observations, which allow the researcher to add specific information from the environment to the observations that are produced at each time step. The second way is through events, which, similar to observations, can be raised from within the Lua script. Unlike observations, events are not tied to time steps but instead are triggered on specific conditions. Finally, the properties API provides a way to read and write parameters of the environment, typically parameters that change rarely.
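
A rough sketch of these three channels, written as self-contained Lua rather than against the actual DMLab2D bindings (the observation names, event name, and property keys below are invented for illustration):

-- Illustrative sketch only: observations, events and properties modelled
-- with plain Lua tables. Names are hypothetical, not the DMLab2D bindings.

-- 1. Observations: values recomputed and exposed at every time step.
local function makeObservations(env)
  return {
    ['WORLD.STEP'] = env.step,
    ['AVATAR.POSITION'] = {env.avatar.x, env.avatar.y},
  }
end

-- 2. Events: raised from script code whenever a condition triggers,
--    independently of the time-step boundary.
local events = {}
local function raiseEvent(name, payload)
  events[#events + 1] = {name = name, payload = payload}
end

-- 3. Properties: rarely-changing parameters that can be read and written.
local properties = {episodeLengthFrames = 1000, mapName = 'default'}
local function readProperty(key) return properties[key] end
local function writeProperty(key, value) properties[key] = value end

-- Example use.
local env = {step = 7, avatar = {x = 3, y = 5}}
raiseEvent('zap', {zapper = 'player1', zapped = 'player2'})
writeProperty('episodeLengthFrames', 2000)
print(makeObservations(env)['WORLD.STEP'])  --> 7
print(readProperty('episodeLengthFrames'))  --> 2000
print(#events)                              --> 1
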

2. Example results in deep reinforcement learning

For an example, we consider a game called "Running With Scissors" (Fig. 1). A variant of this game with simpler graphics was first described in Vezhnevets et al. (2020). It can be seen as a spatially and temporally embedded extension to Rock-Paper-Scissors. As such, it inherits the rich game-theoretic structure of its parent (described in e.g. Weibull (1997)). The main difference is that, unlike in the matrix game, agents in Running With Scissors do not select their strategies as atomic decisions. Instead, they must learn policies to implement their strategic choices. They must decide how to "play rock" (or paper, or scissors), in addition to deciding that they should do so. Furthermore, it is possible—though not trivial—to observe the policy one's partner is starting to implement and take countermeasures. This induces a wealth of possible feinting strategies, none of which could easily be captured in the classical matrix game formulation.

Figure 1 | "Running With Scissors" screenshot.

Agents can move around the map and collect resources: rock, paper, and scissors. The environment is 16 × 24 units in size but agents view it through a 5 × 5 window.² The episode ends either when a timer runs out or when there is an interaction event, triggered by one agent zapping the other with a beam. The resolution of the interactions is driven by a traditional matrix game, where there is a payoff matrix describing the reward produced by the pure strategies available to the two players. In Running With Scissors, the zapping agent becomes the row player and the zapped agent becomes the column player. The actual strategy of the player depends on the resources it has picked up before the interaction. These resources are represented by the resource vector

    v ∈ Δ² ⊂ ℝ_rock ⊕ ℝ_paper ⊕ ℝ_scissors,

where

    Δ² := { (x_1, x_2, x_3) : 0 ≤ x_1, x_2, x_3, Σ_i x_i = 1 }

is the standard 2-simplex. The initial value of the vector is the centroid (1/3, 1/3, 1/3)ᵀ. The more resources of a given type an agent picks up, the more committed the agent becomes to the pure strategy corresponding to that resource. The rewards r_row and r_col for the (zapping) row and the (zapped) column player, respectively, are assigned via

    r_row = v_rowᵀ A v_col = −r_col,

where

    A = [  0  −1  +1
          +1   0  −1
          −1  +1   0 ].

² The partial viewing window is a square: the agent sees 3 rows in front of itself, 1 row behind, and 2 columns to either side. Elementary actions are to move forward, backward, strafe left, strafe right, turn left or turn right, and fire the interaction beam. A gameplay video may be viewed at https://youtu.be/IukN22qusl8.
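
To make the payoff concrete, the short Lua snippet below simply reproduces this arithmetic (it is a worked example of the formula above, not code taken from the environment): a paper-committed row player who zaps a mostly-rock opponent receives a positive reward, and the column player receives its negative.

-- Worked example of r_row = v_rowᵀ A v_col = −r_col (illustration only).
local A = {
  { 0, -1,  1},   -- row plays rock:     ties rock, loses to paper, beats scissors
  { 1,  0, -1},   -- row plays paper:    beats rock, ties paper, loses to scissors
  {-1,  1,  0},   -- row plays scissors: loses to rock, beats paper, ties scissors
}

local function rewards(vRow, vCol)
  local rRow = 0
  for i = 1, 3 do
    for j = 1, 3 do
      rRow = rRow + vRow[i] * A[i][j] * vCol[j]
    end
  end
  return rRow, -rRow
end

-- Pure paper (row) zaps a mostly-rock opponent (column):
local rRow, rCol = rewards({0, 1, 0}, {0.8, 0.1, 0.1})
print(rRow, rCol)  --> 0.7    -0.7
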

To obtain high rewards, an agent should correctly identify what resource its opponent is collecting (e.g. rock) and collect the resource corresponding to its counter-strategy (e.g. paper). In addition, the rules of the game, i.e. the dynamics of the environment, are not assumed known to the agents. They must explore to discover them. Thus Running With Scissors is simultaneously a game of imperfect information—each player possesses some private information not known to their adversary (as in e.g. poker (Sandholm, 2015))—and of incomplete information, lacking common knowledge of the rules (Harsanyi, 1967).

In Rock-Paper-Scissors, when faced with an opponent playing rock, a best-responding agent will come to play paper. When faced with an opponent playing paper, a best-responding agent will learn to play scissors. And, when one's opponent plays scissors, one will learn to counter with rock. Fig. 2 shows that the same incentives prevail in Running With Scissors. However, in this case, the policies that implement strategic decisions are more complex, since agents must learn to run around the map and collect the resources. Much more complex policies involving scouting and feinting can also be learned in this environment; see Vezhnevets et al. (2020) for details.

To give an indication of the simulation performance: two random agents playing against one another, receiving full RGB observations of size 80 × 80 px (16 × 16 px per tile, with 5 × 5 tiles), run at an average of 250,000 frames per second (measured over 1000 episodes of 1000 steps each) on a single core of an Intel Xeon W-2135 ("Skylake") CPU at 3.70 GHz. The training example shown in Fig. 2 took several days to complete, and the cost of running the simulation is thus entirely negligible.


Figure 2 | A new agent implementing the advantage actor-critic algorithm (Mnih et al., 2016) can be trained to best respond to frozen agents implementing "semi-pure" strategies.

3. Discussion

Artificial intelligence research based on reinforcement learning is beginning to mature as a field. The need for rigorous standards by which the correctness, scale, reproducibility, ethicality, and impact of a contribution may be assessed is now accepted (Henderson et al., 2020; Khetarpal et al., 2018; Mitchell et al., 2019; Osband et al., 2019). But in all these well-received calls for rigor in AI, the humble simulation environment gets short-changed. It would appear that many researchers consider the environment to be none of their concern. A more holistic (and realistic!) view of their work suggests otherwise. Research workflows involve significant time spent authoring game environments and intelligence tests, adding analytic methods, and so forth. But these activities are usually not as simple and easy to extend as they ought to be, though they are clearly critical to the success of the enterprise.

We think that progress toward artificial general intelligence requires robust simulation platforms to enable in silico exploration of agent learning, skill acquisition, and careful measurement. We hope that the system we introduce here, DeepMind Lab2D, can fill this role. It generalizes and extends a popular internal system at DeepMind which supported a large range of research projects. It was especially popular for multi-agent research involving workflows with significant environment-side iteration. In our own experience, we have found that DeepMind Lab2D facilitates researcher creativity in the design of learning environments and intelligence tests. We are excited to see what the research community uses it to build in the future.

4. Acknowledgements

We thank the following people for their contributions to this project: Antonio García Castañeda, Edward Hughes, Ramana Kumar, Jay Lemmon, Kevin McKee, Haroon Qureshi, Denis Teplyashin, Víctor Valdés, and Tom Ward.

References

Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright, Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik, Julian Schrittwieser, Keith Anderson, Sarah York, Max Cant, Adam Cain, Adrian Bolton, Stephen Gaffney, Helen King, Demis Hassabis, Shane Legg, and Stig Petersen. DeepMind Lab. arXiv preprint arXiv:1612.03801, 2016.

Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic gridworld environment for OpenAI Gym. https://github.com/maximecb/gym-minigrid, 2018.

John C Harsanyi. Games with incomplete information played by "Bayesian" players, Part I: The basic model. Management Science, 14(3):159–182, 1967.


Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, and Joelle Pineau. Towards the systematic reporting of the energy and carbon footprints of machine learning. arXiv preprint arXiv:2002.05651, 2020.

Shuo Jiang. Multi-agent reinforcement learning environments compilation, 2019.

Arthur Juliani, Vincent-Pierre Berges, Esh Vckay, Yuan Gao, Hunter Henry, Marwan Mattar, and Danny Lange. Unity: A general platform for intelligent agents. arXiv preprint arXiv:1809.02627, 2018.

Michał Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech Jaśkowski. ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1–8. IEEE, 2016.

Khimya Khetarpal, Zafarali Ahmed, Andre Cianflone, Riashat Islam, and Joelle Pineau. Re-evaluate: Reproducibility in evaluating reinforcement learning algorithms. OpenReview preprint ID:HJgAmITcgm, 2018.

Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, and Jonah Ryan-Davis. OpenSpiel: A framework for reinforcement learning in games. CoRR, abs/1908.09453, 2019.

Joel Z Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, and Thore Graepel. Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pages 464–473, 2017.

Joel Z Leibo, Cyprien de Masson d'Autume, Daniel Zoran, David Amos, Charles Beattie, Keith Anderson, Antonio García Castañeda, Manuel Sanchez, Simon Green, Audrunas Gruslys, et al. Psychlab: A psychology laboratory for deep reinforcement learning agents. arXiv preprint arXiv:1801.08116, 2018.

Joel Z Leibo, Edward Hughes, Marc Lanctot, and Thore Graepel. Autocurricula and the emergence of innovation from social interaction: A manifesto for multi-agent intelligence research. arXiv preprint arXiv:1903.00742, 2019.

Adam Lerer and Alexander Peysakhovich. Maintaining cooperation in complex social dilemmas using deep reinforcement learning. arXiv preprint arXiv:1707.01068, 2017.

Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 220–229, 2019.

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.

Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937, 2016.

Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, and Hado Van Hasselt. Behaviour suite for reinforcement learning. arXiv preprint arXiv:1908.03568, 2019.

Emmanouil Antonios Platanios, Abulhair Saparov, and Tom Mitchell. Jelly Bean World: A testbed for never-ending learning. arXiv preprint arXiv:2002.06306, 2020.


Eddie J Rafols, Mark B Ring, Richard S Sutton, and Brian Tanner. Using predictive representations to improve generalization in reinforcement learning. In IJCAI, pages 835–840, 2005.

Arthur L Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3):210–229, 1959.

Tuomas Sandholm. Solving imperfect-information games. Science, 347(6218):122–123, 2015.

Tom Schaul. A video game description language for model-based or interactive learning. In 2013 IEEE Conference on Computational Intelligence in Games (CIG), pages 1–8. IEEE, 2013.

Claude E Shannon. A chess-playing machine. Scientific American, 182(2):48–51, 1950.

Tom Stepleton. The pycolab game engine. https://github.com/deepmind/pycolab, 2017.

Joseph Suarez, Yilun Du, Phillip Isola, and Igor Mordatch. Neural MMO: A massively multiagent game environment for training and evaluating intelligent agents. arXiv preprint arXiv:1903.00784, 2019.

Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012.

Tomer Ullman, Chris Baker, Owen Macindoe, Owain Evans, Noah Goodman, and Joshua B Tenenbaum. Help or hinder: Bayesian models of social goal inference. In Advances in Neural Information Processing Systems, pages 1874–1882, 2009.

Henriette Van Praag, Gerd Kempermann, and Fred H Gage. Neural consequences of environmental enrichment. Nature Reviews Neuroscience, 1(3):191–198, 2000.

Alexander Sasha Vezhnevets, Yuhuai Tony Wu, Maria Eckstein, Rémi Leblond, and Joel Z Leibo. Options as responses: Grounding behavioural hierarchies in multi-agent reinforcement learning. In Proceedings of the 37th International Conference on Machine Learning, 2020.

Jörgen W Weibull. Evolutionary Game Theory. MIT Press, 1997.

Lianmin Zheng, Jiacheng Yang, Han Cai, Weinan Zhang, Jun Wang, and Yong Yu. MAgent: A many-agent reinforcement learning platform for artificial collective intelligence. CoRR, abs/1712.00600, 2017.

Stephan Zheng, Alexander Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, David C Parkes, and Richard Socher. The AI economist: Improving equality and productivity with AI-driven tax policies. arXiv preprint arXiv:2004.13332, 2020.
