Arxiv:1911.13071V2 [Cs.AI] 16 Mar 2020 for Several Decades, Procedural Content Generation and Very Little in the Way of Randomization

Increasing Generality in Machine Learning through Procedural Content Generation Sebastian Risi1;2 and Julian Togelius1;3 1modl.ai, 2IT University of Copenhagen, 3New York University fsebastian, [email protected] Procedural Content Generation (PCG) refers to the Learning (PCGML) [1] refers to the use of machine learn- practice, in videogames and other games, of generating ing to train models on existing game content, and then content such as levels, quests, or characters algorithmi- leverage these models to create novel content automat- cally. Motivated by the need to make games replayable, ically. This can be done through simply sampling from as well as to reduce authoring burden, limit storage space the learned models, or through searching the artifact requirements, and enable particular aesthetics, a large space implied by the model so as to optimize some ob- number of PCG methods have been devised by game de- jective. In this paper, the term artifact refers to ob- velopers. Additionally, researchers have explored adapt- jects/levels/maps/etc. made by an algorithm. Interest- ing methods from machine learning, optimization, and ingly, PCGML poses different and hard challenges com- constraint solving to PCG problems. Games have been pared to generating e.g. images, because the produced widely used in AI research since the inception of the content needs to function. field, and in recent years have been used to develop and At the same time as PCG researchers are starting to benchmark new machine learning algorithms. Through incorporate these advances into their systems, interest in this practice, it has become more apparent that these the machine learning community is increasing in PCG- algorithms are susceptible to overfitting. Often, an al- inspired methods to improve the robustness of ML sys- gorithm will not learn a general policy, but instead a tems. One reason for this development is the growing policy that will only work for a particular version of a evidence that, while ML methods perform well for tasks particular task with particular initial parameters. In re- or in the environments they are trained on, they do not sponse, researchers have begun exploring randomization generalize well when that environment is changed or dif- of problem parameters to counteract such overfitting and ferent from what is seen during training. Training neu- to allow trained policies to more easily transfer from one ral networks with many free parameters and over long environment to another, such as from a simulated robot training times has lead to state-of-the-art performance to a robot in the real world. Here we review the large in many domains, but these solutions are typically over- amount of existing work on PCG, which we believe has fitted to the particular training examples, achieving high an important role to play in increasing the generality accuracy on a training set but performing poorly on data of machine learning methods. The main goal here is to not used for training [2, 3]. Especially in deep reinforce- present RL/AI with new tools from the PCG toolbox, ment learning (in which an agent has to learn in interac- and its secondary goal is to explain to game developers tion with its environment), overfitting is rarely addressed and researchers a way in which their work is relevant to [4, 5, 6] but a significant problem. Take the very popular AI research. Arcade Learning Environment (ALE) as an example [7], a classic benchmark in RL based on an emulation of the 1 Introduction Atari 2600 games console. Hundreds of games were made for that console, however they all have fixed sets of levels arXiv:1911.13071v2 [cs.AI] 16 Mar 2020 For several decades, procedural content generation and very little in the way of randomization. Training an (PCG) has been a feature of many video games (Ta- agent to play a game in ALE makes it liable to overfit ble. 1). PCG refers to the algorithmic creation of game not only to that particular game, but to its levels and content | not the game engine, but things such as levels, sequence of events. quests, maps, characters or even rules | either in run- The basic idea in employing PCG to address the gen- time (as the game is being played) or at design time (as erality problem in ML systems is to artificially create the game is made). There are several reasons why PCG more training data or training situations. This way is used in games: it can increase the replayability of a ML-based systems can be biased towards learning gen- game as players are presented with a new experience ev- eral task properties instead of learning spurious elements ery time they play, it can help to reduce production costs found in the training examples. Methods include simpler and disk storage space, and it enables new types of games approaches such as data augmentation that artificially built on the unique affordances of content generation. increase the data used for training [8, 9] or methods that Interestingly, developments in PCG and machine train agents in a large number of training environments learning have started to influence each other in recip- that include randomized elements [10]. PCG methods rocal ways. Procedural Content Generation via Machine have been extended to create complete maps for Cap- 1 ture the Flag [11] or maps for 2D video games [12], and generation of content that is essential to gameplay. The in a recent impressive demonstration of the advantage of highly successful Diablo series of games (Blizzard, 1997- training in PCG-generated environments, have allowed 2013) (Fig. 1c), as well as platformers such as Spelunky a robot hand trained in a simulation to manipulate a (Mossmouth, 2008), are roguelikes. Rubik's cube in the real world [13]. While the PCG in Rogue was motivated by a need In this article, we review the history of PCG and for replayability and unpredictability, another key reason the recent trends in hybridizing PCG methods with ma- for PCG is wanting to create game worlds larger than chine learning techniques. We do this because we believe can fit in memory or on storage media. A paradigm- there are convergent research interests and complemen- setting game here was Elite (Brabensoft, 1984), a space- tary methods in the two communities. We want to sup- faring adventure game featuring thousands of planets ply machine learning researchers with a new toolbox to which seemingly miraculously fit in memory on a Com- aid their generalization work, and games researchers and modore 64, with 64 kilobytes of memory (Fig. 1b). Every developers with new perspectives from machine learn- time a star system was visited, the game would recre- ing. This Review also details promising future research ate the whole starsystem with planets, space stations, direction and under-explored research avenues enabled and spacecraft, from a given random seed. This ap- by more advanced PCG techniques. The goal of this Re- proach has later been used for games such as No Man's view is to share with the larger machine learning research Sky (Hello Games, 2015), which famously contains more community work from the exciting field of PCG that is planets than you can visit in a lifetime, all with their own just beginning to capture the interest of ML researchers ecologies (Fig. 1e). but could ultimately encourage the emergence of more The strategy games in the very popular Civilization general AI. We focus on examples where there's a no- series also rely heavily on PCG, as a new world is cre- tion of \environment" (typically through learning being ated for the players to explore and contest every time centered on an agent in simulated physical world), but a new game is created (Fig. 1f). Similarly, the open- will throughout the text mention other learning settings world sandbox game Minecraft (Mojang, 2010) creates a where relevant for comparison. completely new world at the start of each game session We will discuss work coming out of both commercial (Fig. 1d). Other games use PCG in somewhat more pe- game development, AI/ML research targeted at games, ripheral roles, such as the sidequest generation (e.g. cre- and AI/ML research targeted at other applications or ating an infinite supply of fetch quests through a guild the development of general intelligence. It is important system) in The Elder Scrolls V: Skyrim (Bethesda, 2011) to bear in mind that these approaches were developed for (along with some earlier games in the series) and the per- very different purposes, with those coming out of game vasive generation of terrain features and vegetation in a development and research typically focused on providing large number of open-world 3D games. PCG techniques entertainment. However, this should not in general make are now so commonplace and reliable that it is more com- them less suitable for other purposes. Indeed, as one of mon than not to utilize them in many game genres. the reasons games are entertaining is that they challenge Interestingly, PCG in video games is actually prefig- our minds, one could argue that algorithms designed to ured by certain pen-and-paper generators intended to increase entertainment in games may be useful for creat- be executed by humans with the help of dice or cards, ing intelligence-relevant challenges [14]. including a dungeon generator for the classic Dungeons We will discuss both work that we classify as PCG and Dragons (TSR, 1976) role playing game [31]. Some and certain examples of work that is adjacent to PCG, recent board games which include aspects of PCG are such as some forms of data augmentation in supervised 504 (2F Spiele, 2015) or Betrayal at House on the Hill learning.

Arxiv:1911.13071V2 [Cs.AI] 16 Mar 2020 for Several Decades, Procedural Content Generation and Very Little in the Way of Randomization

Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space

Long Term Memory Assistance for Evolutionary Algorithms

Automated Antenna Design with Evolutionary Algorithms

Evolutionary Algorithms

Evolutionary Algorithms in Intelligent Systems

A New Evolutionary System for Evolving Artificial Neural Networks

Neuroevolution Strategies for Episodic Reinforcement Learning ∗ Verena Heidrich-Meisner , Christian Igel

A Hybrid Neural Network and Genetic Programming Approach to the Automatic Construction of Computer Vision Systems

Generative Genetic Algorithm for Music Generation

An Evolutionary Algorithm for Multi-Objective Optimization of Combustion Processes

Evolutionary Algorithms for Reinforcement Learning

Evolution of Complexity in Real-World Domains