
Level generation and style enhancement — deep learning for game development overview∗

Piotr Migdał† (ECC Games SA), Bartłomiej Olechno (ECC Games SA), Błażej Podgórski (ECC Games SA, Kozminski University)

Abstract

We present practical approaches to using deep learning to create and enhance level maps and textures for video games – desktop, mobile, and web. We aim to present new possibilities for game developers and level artists. The task of designing levels and filling them with details is challenging. It is time-consuming and takes effort to make levels rich, complex, and natural-feeling. Fortunately, recent progress in deep learning provides new tools to accompany level designers and visual artists. Moreover, these tools offer a way to generate infinite worlds for game replayability and adjust educational games to players’ needs. We present seven approaches to create level maps, each using statistical methods, machine learning, or deep learning. In particular, we include:

• Generative Adversarial Networks for creating new images from existing examples (e.g. ProGAN).

• Super-resolution techniques for upscaling images while preserving crisp detail (e.g. ESRGAN).

• Neural style transfer for changing visual themes.

• Image translation – turning semantic maps into images (e.g. GauGAN).

• Semantic segmentation for turning images into semantic masks (e.g. U-Net).

• Unsupervised semantic segmentation for extracting semantic features (e.g. Tile2Vec).

• Texture synthesis – creating large patterns based on a smaller sample (e.g. InGAN).

∗Submitted to the 52nd International Simulation and Gaming Association (ISAGA) Conference 2021
†[email protected], https://p.migdal.pl

1. Introduction

Advances in deep learning [1] offer new ways to create and modify visual content. They can be used for entertainment, including game development. This paper presents a survey of various machine learning techniques that game developers can directly use for texture and level creation and improvement. These techniques offer a way to:

• Generate infinite content that closely resembles real data.

• Change or modify the visual style.

• Save time for level designers and artists by extending the board or filling in features.

These techniques differ from a classical approach of procedural content generation [2] using hand-written rules, which is the keystone of some genres (e.g. roguelike games). In contrast, machine learning models learn from the training dataset [3, 4].

While we focus on images, it is worth mentioning progress in text-generation neural networks (e.g. transformers such as GPT-2 [5] and GPT-3 [6]). They are powerful enough not only to supplement entertainment, e.g. with writing poetry [7], but to be the core of a text adventure game such as AI Dungeon [8, 9]. In this game, all content is AI-generated, and the player can enter any text (rather than choose from a few options). The GPT-3 network is powerful enough to generate abstract data (such as code) and may be able to create spatial maps encoded as text (e.g. as XML or JSON). One of its extensions, DALL·E [10], is a text-to-image model able to generate seemingly arbitrary images from a description, such as “an armchair in the shape of an avocado”.

Recent progress in reinforcement learning resulted in networks surpassing top human players in Go [11], reaching grandmaster level in the real-time strategy game StarCraft II [12], and playing at a decent level in the multiplayer battle arena Dota 2 [13]. These methods learn from interacting with the game environment, learning human-like strategies [14], and exploiting unintended properties of game and level maps [15]. Other notable approaches include automatic level evaluation [16] and level generation from gameplay videos [17]. For some networks, level generation is an auxiliary task to improve an AI model’s performance [18]. Level generation can be controlled by parameters to adjust game difficulty or address particular needs in educational games [19], offering the potential to increase their effectiveness [20].

We present seven approaches to create level maps, each using statistical methods, machine learning, or deep learning. In particular, we include:

• Generative Adversarial Networks for creating new images from existing examples.

• Super-resolution techniques for upscaling images while preserving crisp detail.

• Neural style transfer for changing visual themes.

• Image translation – turning semantic maps into images.

• Semantic segmentation for turning images into semantic masks.

• Unsupervised semantic segmentation for extracting semantic features.

• Texture synthesis – creating large patterns based on a smaller sample.

In this paper, we put a special emphasis on methods that are readily available for use. The most popular deep learning frameworks, including TensorFlow [21] and PyTorch [22], are open-source. Moreover, it is common that the newest models are shared as GitHub code, or even directly as Colaboratory notebooks, allowing users to run them online, without any setup. Other models can be accessed via a backend, as a service, or with TensorFlow.js [23] using GPU support in the browser.

2. Choice of methods

Numerous methods are (or can be) used for game development. We focus on those which fulfill the following set of criteria:

• Learn from data rather than use hand-written rules.

• Work on or with images (that is, any 2D spatial data).

• Can be used for generating, enhancing, or transforming levels.

• Can directly fit into the workflow of level designers and artists [24].

• Are state of the art, in the sense that there is no other method that obviously outclasses them.

We choose to restrict ourselves to a set of methods that can be related to each other in a meaningful way and focus on the new statistical methods. While we consult state-of-the-art surveys [25], we are aware that there are no straightforward, numeric benchmarks for content generation. What matters is the effect desired by game developers and the subjective experience of the players. Text generation, procedural or not, is vital for a wide range of adventure or text-based games. Comparing such methods deserves another study, both due to the sheer volume of the topic and the difficulty of relating text to images (as it is “apples to oranges”).

We acknowledge that practically any statistical, machine learning, or deep learning method can have potential for game development. However, we aim to cover ones for which the use is straightforward. Since tools used in commercial games are often proprietary secrets, we do not restrict ourselves only to methods with known applications in video games.

It is important to emphasize that these seven method classes are, in general, non-interchangeable. As depicted in Figure 1, each requires a different input, produces a different output, and requires a different setup (which usually involves a specific training dataset). They solve different problems. Hence, our motivation is not to compare the quality of the results but to provide use-cases where a particular method has an edge. Even though we use the term “image”, we are not restricted to 2D images with 3 color channels. The same methods work for more channels (e.g. RGBA, RGB + heightmap, or feature maps representing other physical properties) and 3D voxel data.

3. Description of methods

3.1. Generative Adversarial Networks

Regular Generative Adversarial Networks (GANs) generate new images similar to ones from the training set [26, 27]. The core idea is that there are two networks – one generating images

Figure 1. A conceptual summary of transformations performed by the discussed methods. We show a list of manually created input-output pairs.

(Generator) and the other distinguishing generated images from real ones (Discriminator) [28]. However, their training is notoriously difficult and requires much effort to create any satisfactory results. GANs take as an input a vector, which might be random, fixed, or generated from text [29, 30, 31]. Historically, networks were restricted to generating small images (e.g. 64×64 or 128×128 pixels) [32, 33, 34]. Only recent progress allowed generating large-scale (e.g. 1024×1024), detailed images such as human faces with ProGAN [35], as seen in Figure 2. Its training process involves a series of neural network training stages. It starts from high-level features (such as 4×4 images showing an extremely blurred face), step-by-step learning to include high detail. This progressive training is crucial, as otherwise convolutional neural networks tend to overemphasize local features (visible to humans or not) and neglect high-level features. This class offers potential for one-step creation of levels, from the high-level overview to low-level features.

These networks are extremely data- and compute-intensive. However, there is an ongoing effort to make them more data-efficient [36]. Moreover, most examples are based on fairly standardized datasets, e.g. human faces or anime characters [37] – with consistent style and color. The same approach might not work for more diverse collections of images or photos.

Figure 2. A photorealistic face generated from a set of human-interpretable parameters [38] (left), and a similarly created anime face [39, 40] (right).
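To make the generator-discriminator interplay concrete, below is a minimal, illustrative GAN training step in PyTorch. The 64×64 tile size, the architecture, and all hyperparameters are our own simplifying assumptions for exposition; models such as ProGAN add progressive growing and many stabilization tricks on top of this basic loop.

```python
# Minimal GAN training sketch (PyTorch), assuming 64x64 RGB level tiles in [-1, 1].
import torch
import torch.nn as nn

latent_dim = 128

generator = nn.Sequential(                    # latent vector -> 64x64 RGB image
    nn.ConvTranspose2d(latent_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # 4x4
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),         # 8x8
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),           # 16x16
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),            # 32x32
    nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),                                 # 64x64
)

discriminator = nn.Sequential(                # 64x64 RGB image -> real/fake logit
    nn.Conv2d(3, 32, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 1, 4, 1, 0), nn.Flatten(),
)

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))

def training_step(real_tiles):
    # real_tiles: (batch, 3, 64, 64) tensor of training tiles.
    batch = real_tiles.size(0)
    fake_tiles = generator(torch.randn(batch, latent_dim, 1, 1))

    # Discriminator: push real tiles towards 1 and generated tiles towards 0.
    d_loss = bce(discriminator(real_tiles), torch.ones(batch, 1)) + \
             bce(discriminator(fake_tiles.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator into predicting 1 for generated tiles.
    g_loss = bce(discriminator(fake_tiles), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```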

3.2. Super-resolution (image upscaling)

Video games often benefit from high visual detail, unless low resolution is a conscious design choice – as for some retro-styled games. Super-resolution [41, 42] offers a way to improve texture detail by a factor (typically 2x or 4x) without additional effort. Unlike more traditional approaches (nearest neighbor, linear, bicubic), it preserves crisp detail. It is not possible to flawlessly generate more data. However, it is possible to create plausible results – crisp details resembling authentic, high-resolution images. The method can be applied in two ways: upgrading textures or improving the game resolution in real-time.

A super-resolution network ESRGAN [43] became widely used as it was released with pre-trained weights and easy-to-use code. It can improve older games and save artists’ time for new games, as only a lower resolution would be required. Fans are already doing it – both as a proof of principle (e.g. for a single screenshot, as in Figure 3) and to create playable mods that enhance game graphics. There are online communities dedicated to upscaling games [44, 45] that have resulted in fan-made textures usable in games such as the 3D shooter Return to Castle Wolfenstein [46]. Compare this with the seemingly manual approach of remastering old games, e.g. Diablo II: Resurrected [47] and StarCraft: Remastered [48]. In addition to generic upscaling networks, there is the PULSE [49] network that offers upscaling of faces up to 64 times, as in Figure 4.

Another key application is improving resolution in real-time. Given that we have fixed computing power (usually restricted by the GPU), it improves the trade-off between resolution and frames per second. That is, we can improve resolution while keeping the same frame rate or improve the frame rate while keeping the same resolution. Methods such as DLSS 2.0 by NVIDIA [50, 51] are already used by AAA games such as Cyberpunk 2077 [52].

Figure 3. Upscaling a scene from Heroes of Might and Magic III with ESRGAN: original (left) and upscaled 4x (right).

Figure 4. Depixelizing Doomguy with PULSE [53], reproduced with permission. Note that there are many plausible faces (right) consistent with the original, low-resolution version from Doom (1993) (left).
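As an illustration of learned upscaling, the sketch below shows a small sub-pixel-convolution upscaler trained on (low-resolution, high-resolution) texture pairs. The architecture and the plain L1 loss are illustrative assumptions; ESRGAN uses a much deeper backbone and adds perceptual and adversarial losses.

```python
# Illustrative 4x super-resolution network (PyTorch); layer sizes are placeholders.
import torch
import torch.nn as nn

class TinyUpscaler(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 5, padding=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            # Predict scale*scale sub-pixels per output channel, then rearrange them.
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),     # (B, 3*s*s, H, W) -> (B, 3, H*s, W*s)
        )

    def forward(self, low_res):
        return self.body(low_res)

model = TinyUpscaler(scale=4)
l1 = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(low_res, high_res):
    # low_res: (B, 3, H, W), high_res: (B, 3, 4H, 4W), both in [0, 1].
    upscaled = model(low_res)
    loss = l1(upscaled, high_res)       # ESRGAN-style models add GAN + perceptual terms
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```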

3.3. Style transfer

Instead of increasing resolution, we may want to change the style of the image, for example, to turn foliage into dunes or daytime textures into nighttime ones. The simplest example of style transfer involves taking two images (one with content and the other with style) and generating a painting-like picture [54] or video [55]. These methods work best for creating dreamy or trippy visual styles but can go well beyond that [56, 57]. Style transfer is already accessible as a plugin for a popular game engine, Unity3D [58]. It can be used for post-processing, as depicted in Figure 5, or as part of an asset generation workflow [59].

Newer approaches (usually involving GANs) can change crucial features without sacrificing quality [60, 61]. They can modify sensitive and detailed features, such as a gradual change of faces (e.g. by age or ethnicity), while maintaining photorealistic quality [62, 63]. These techniques look promising for changing textures or adding a real-time filter. This includes turning rendered graphics into realistic ones resembling real photos, e.g. with GTA V as an example [64]. While such transformations improve the realism of some pieces (e.g. foliage), they may reduce the saturation, cleanliness, and other desirable features of game graphics.
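For readers who want to experiment, below is a minimal sketch of optimization-based style transfer in the spirit of [54], using an ImageNet-pretrained VGG-19 from torchvision. The layer choices and loss weights are illustrative assumptions; real-time plugins use faster feed-forward networks instead of per-image optimization.

```python
# Gatys-style transfer sketch (PyTorch + torchvision): optimize the pixels of a copy
# of the content image so deep features match the content and Gram matrices match the style.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

vgg = vgg19(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = {1, 6, 11, 20, 29}    # layer indices used for style statistics (illustrative)
CONTENT_LAYER = 22                   # a deeper layer used for content

def features(image):
    style_feats, content_feat, x = [], None, image
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            style_feats.append(x)
        if i == CONTENT_LAYER:
            content_feat = x
    return style_feats, content_feat

def gram(feat):                      # channel-by-channel correlations of a feature map
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def stylize(content, style, steps=300, style_weight=1e5):
    # content, style: (1, 3, H, W) tensors normalized like ImageNet inputs.
    style_grams = [gram(f) for f in features(style)[0]]
    target_content = features(content)[1]
    image = content.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([image], lr=0.02)
    for _ in range(steps):
        style_feats, content_feat = features(image)
        loss = F.mse_loss(content_feat, target_content)
        for f, g in zip(style_feats, style_grams):
            loss = loss + style_weight * F.mse_loss(gram(f), g)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
    return image.detach()
```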

Figure 5. An original screenshot from Doom (1993) and three styles: The Starry Night by van Gogh, a constellation photo by Pexels, and Angels and Devils by Escher (top for the style, bottom for the modified image). Generated online with DeepArt [65].

Figure 6. Nvidia GauGAN beta [68] for turning paintbrush-like feature drawings into realistic images. Colors corresponding to sea, sky, rock, clouds, and mountains (left) are turned into a photorealistic picture (right).

3.4. Image translation

Image translation is a broad subject of using neural networks to turn one image into another. Technically, a few other methods can be viewed as such: style transfer, super-resolution, or feature extraction. This section focuses on turning feature maps into levels, fixing feature maps, enriching maps with details (e.g. trees), or adding additional modalities (e.g. depth) from existing fields. Notable examples include pix2pix [66] and CycleGAN [67], translating between various classes of images. GauGAN [68] is a network focused on turning semantic maps into realistic images – see Figure 6. Another similar application is turning 2D graphics into a pseudo-3D perspective – even without any data obtained from a game engine [69]. Moreover, abstract game graphics can be turned into photorealistic landscapes – e.g. for Minecraft [70].
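A heavily simplified sketch of the paired setting is shown below: a small encoder-decoder maps one-hot semantic maps to RGB images using an L1 reconstruction loss. The class list and architecture are placeholder assumptions; pix2pix and GauGAN add adversarial training and specialized normalization on top of this basic idea.

```python
# Simplified image-translation sketch (PyTorch): semantic label map -> RGB image.
import torch
import torch.nn as nn

NUM_CLASSES = 8     # e.g. road, grass, water, forest, sand, rock, building, sky (assumed)

translator = nn.Sequential(
    nn.Conv2d(NUM_CLASSES, 64, 4, 2, 1), nn.ReLU(),     # H -> H/2
    nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),             # H/2 -> H/4
    nn.Conv2d(128, 128, 3, 1, 1), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),    # H/4 -> H/2
    nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),      # H/2 -> H, RGB in [-1, 1]
)

l1 = nn.L1Loss()
optimizer = torch.optim.Adam(translator.parameters(), lr=2e-4)

def training_step(label_map, photo):
    # label_map: (B, H, W) integer class ids; photo: (B, 3, H, W) in [-1, 1]; H, W divisible by 4.
    one_hot = nn.functional.one_hot(label_map, NUM_CLASSES)   # (B, H, W, C)
    one_hot = one_hot.permute(0, 3, 1, 2).float()              # (B, C, H, W)
    generated = translator(one_hot)
    loss = l1(generated, photo)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```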

3.5. Supervised image segmentation

Turning real photos into game maps usually requires manual work. Semantic segmentation is a class of models extracting semantic features from images [71, 72] – for example, to change an aerial image into areas with different properties, as depicted in Figure 7. In a race car game, it would be extracting the location of roads, gravel, grass, foliage, and water. A semantic map can be applied to change the game behavior (e.g. different friction in each zone) or to generate a map with different textures or graphics, but with the overall structure taken from a real-world example.

In supervised semantic segmentation, we need training data in the form of pairs: the input image and semantic features, with the same resolution, with one or more classes. Network architectures such as U-Net typically need a relatively small training dataset (as they treat every pixel of each image as a data point) and a simple training process.

Figure 7. Supervised image segmentation with a custom U-Net: turning satellite images (left) into road maps (right) [73], reproduced with permission.
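The supervised setup can be sketched in a few dozen lines: below is a miniature U-Net-style network with a per-pixel cross-entropy loss. Channel sizes and the class list are illustrative assumptions, not the architecture behind Figure 7.

```python
# Compact U-Net-style segmentation sketch (PyTorch): encoder, decoder, one skip connection.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, num_classes=5):   # e.g. road, gravel, grass, foliage, water (assumed)
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, num_classes, 1))   # per-pixel class logits

    def forward(self, x):
        skip = self.enc1(x)                   # (B, 32, H, W)
        deep = self.enc2(self.down(skip))     # (B, 64, H/2, W/2)
        up = self.up(deep)                    # (B, 32, H, W)
        return self.dec(torch.cat([up, skip], dim=1))

model = TinyUNet()
criterion = nn.CrossEntropyLoss()             # every pixel acts as a labeled example
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def training_step(images, masks):
    # images: (B, 3, H, W); masks: (B, H, W) integer class ids; H, W even.
    logits = model(images)
    loss = criterion(logits, masks)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```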

3.6. Unsupervised image segmentation

Manual labeling of all data with semantic maps can be labor-intensive and rigid. For a typical project, one would need to label a considerable fraction of the data, which makes it infeasible if we want to lower the workload of level designers. Unsupervised feature extraction can be approached in a few different ways. The most common approach is transfer learning [74] – that is, using visual patterns extracted by pre-trained networks [75] such as ResNet-50 [76]. However, in many cases, game graphics differ substantially from real photos in style, scale, and content – see Figure 8. Given that convolutional neural networks are very sensitive to low-level patterns, any digital artistic style may mislead them. Moreover, graphics that are abstract (e.g. Minesweeper) or of a different scale (e.g. SimCity) may be unsuitable for networks primarily trained on ImageNet [77] – a dataset of photos taken with regular cameras.

In this case, we may need to resort to self-supervised and unsupervised methods trained on our dataset. For that, we may use autoencoders [79], unsupervised clustering [80], or methods extracting content based solely on visual pattern proximity, such as Tile2Vec [81] and Patch2Vec [82]. At ECC Games SA, we use a similar technique [83] for turning aerial drone photos of race tracks into semantic maps that distinguish both robust features (e.g. road vs foliage) and details (the density and angle of tire trails). This technique can work even on a single image, with no pre-training, which offers extreme data efficiency rare in deep learning. In particular, it means that there is no extra effort on image labeling or generating large datasets. We show two diverse examples in Figure 9.
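A minimal sketch of the one-shot idea illustrated in Figure 9 is given below: per-pixel deep features are projected onto their three largest principal components and displayed as RGB. The backbone, the layer choice, and the normalization are our assumptions; self-trained features (e.g. autoencoders or Tile2Vec-style embeddings) can be substituted when ImageNet features fail.

```python
# Label-free pseudo-segmentation sketch: PCA of per-pixel deep features mapped to RGB.
import numpy as np
import torch
from sklearn.decomposition import PCA
from torchvision.models import resnet50

backbone = resnet50(pretrained=True).eval()
# Keep only the early stages so the feature map stays reasonably high-resolution.
feature_extractor = torch.nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool, backbone.layer1
)

@torch.no_grad()
def pca_segmentation(image):
    # image: (1, 3, H, W) float tensor, already normalized like ImageNet inputs.
    feats = feature_extractor(image)                       # (1, C, H/4, W/4)
    _, c, h, w = feats.shape
    flat = feats.squeeze(0).reshape(c, h * w).T.numpy()    # one feature row per pixel
    components = PCA(n_components=3).fit_transform(flat)   # (h*w, 3)
    # Rescale each component to [0, 1] so it can be viewed as an RGB channel.
    components -= components.min(axis=0)
    components /= components.max(axis=0) + 1e-8
    return components.reshape(h, w, 3)                     # pseudo-color semantic map
```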

Figure 8. An example showing that ResNet-50 trained on the ImageNet dataset does not work well with Minesweeper, using online Keras.js [78].

Figure 9. One-shot results for a SimCity map and an aerial scan of a racetrack – input images (left) and three largest Principal Components mapped to RGB channels (right).

3.7. Pattern synthesis

Pattern synthesis is a method used to expand a spatial pattern up to an arbitrary size. Conceptually, it relates to super-resolution, but it makes the image larger instead of going deeper into detail. A straightforward approach involves generating bigger textures without self-repetition to give them a natural feel [84, 85]. This technique can be applied statically or for creating patterns in real-time [86]. Another way to go is to work at the level of the general map – i.e. expanding high-level semantic maps from a smaller sample. Besides deep learning, some classical algorithms are widely used, such as WaveFunctionCollapse [87, 88, 89].

For game genres such as classic platformer games and scrolling shooters, 2D level maps can be considered a sequence of 1D stripes. We can tackle generating such 1D sequences with additional, simpler methods, e.g. the classical statistical model of hidden Markov models [90]; a simplified sketch of this column-by-column approach follows below.
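The sketch below illustrates the stripe-based idea with the simplest possible model – a plain (non-hidden) Markov chain over level columns. The tile encoding, the toy example levels, and the fallback behavior are our own assumptions for exposition.

```python
# Markov-chain sketch for stripe-based levels: each column is a short string of tile characters.
import random
from collections import Counter, defaultdict

def learn_transitions(levels):
    # levels: list of levels, each a list of column strings such as "....##".
    transitions = defaultdict(Counter)
    for level in levels:
        for current, following in zip(level, level[1:]):
            transitions[current][following] += 1
    return transitions

def sample_level(transitions, start_column, length=64, rng=random):
    # start_column is assumed to appear in the training examples.
    level = [start_column]
    for _ in range(length - 1):
        options = transitions.get(level[-1])
        if not options:                       # unseen column: restart from the seed column
            options = transitions[start_column]
        columns, counts = zip(*options.items())
        level.append(rng.choices(columns, weights=counts, k=1)[0])
    return level

# Toy usage: two tiny "levels" of 6-tile-tall columns ('.' empty, '#' ground).
examples = [["......", "....##", "...###", "....##", "......"],
            ["....##", "...###", "...###", "....##", "....##"]]
model = learn_transitions(examples)
print(sample_level(model, start_column="....##", length=10))
```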

Deep learning models for sequence generation, such as a Long Short-Term Memory network (LSTM) [91] and a Gated Recurrent Unit (GRU) [92], offer more flexibility and even better results, especially for longer sequences. The 1D structure of images can also be exploited by transformers generating images strip-by-strip [93]. For example, Super Mario levels have been generated with an LSTM [94].

Figure 10. Pattern synthesis using InGAN [85] based on a single map from SimCity. The map is treated purely as an image. Splitting it into semantic regions (e.g. tiles related to buildings) would improve the result.

4. Conclusion

We presented an array of deep learning methods for the creation and enhancement of levels and textures. They show the potential to improve the game development process, offer new experiences to the players, or generate content in real-time. As we have seen in the past, technical advancements allow not only improving existing game formulas but also creating new genres [95]. Depending on the problem we tackle or the enhancement we want to make, we need to consider different types of deep learning models.

1. We need to translate the task into an input-output problem. Both inputs and outputs can be photos or other real-world images, in-game graphics, semantic maps, or any other format of spatial encoding of low-level textures and high-level maps.

2. We need to check which classes of models provide the transformation we need. In this paper, we discussed seven different classes, which provide a good starting point.

3. We need to see if any model within this family provides results of the desired quality. This step may involve a simple plug & play setup with an existing trained model, or a much more complicated process of finding a specific model and training it on our data.

Quite a few deep learning techniques are already used in game development – by fan communities, modders, professional game developers, or as a tool provided by game engines. While progress in deep learning is fast, even current models offer broad perspectives for supporting game development – with many applications and opportunities yet to be discovered.

The abundance of publications from the last two years (and not-yet-published novel results) shows that it is a fruitful and quickly developing field. The latter forces us to resort to citing GitHub repositories and other sources to reference the newest methods.

Acknowledgments

We would like to thank the ECC Games team, with special thanks to Marcin Prusarczyk and Piotr Wątrucki for the project overview, and Bartłomiej “Valery” Walendziak for sharing his insights on the stages of level development from the artist’s perspective. We are grateful to Olle Bergman, Anna Olchowik and Jan Marucha for their comments and to Rafał Jakubanis for his extensive editorial feedback on the draft.

We focus on creating race track maps from aerial scans of real race tracks. This task involves a few levels of complexity – from the high-level general pattern of the map to low-level details and decorations such as tire tracks. This research is supported by the EU R&D grant POIR.01.02.00-00-0052/19 for project “GearShift – building the engine of the behavior of wheeled motor vehicles and map generation based on artificial intelligence algorithms implemented on the platform – GAMEINN”.

References

[1] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016, https://www.deeplearningbook.org/.
[2] N. Shaker, J. Togelius, and M. J. Nelson. Procedural Content Generation in Games: A Textbook and an Overview of Current Research. Springer Berlin Heidelberg, New York, NY, 2016, http://pcgbook.com/.
[3] A. Summerville, S. Snodgrass, M. Guzdial, C. Holmgård, A. K. Hoover, A. Isaksen, A. Nealen, and J. Togelius. Procedural Content Generation via Machine Learning (PCGML). IEEE Transactions on Games, 10(3):257–270, Sept. 2018, arXiv:1702.00539, doi:10.1109/TG.2018.2846639.
[4] J. Liu, S. Snodgrass, A. Khalifa, S. Risi, G. N. Yannakakis, and J. Togelius. Deep Learning for Procedural Content Generation. Neural Comput & Applic, 33(1):19–37, Jan. 2021, arXiv:2010.04548, doi:10.1007/s00521-020-05383-8.
[5] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language Models are Unsupervised Multitask Learners. OpenAI, page 24, Feb. 2019, https://openai.com/blog/better-language-models/.
[6] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc., July 2020, arXiv:2005.14165.

[7] G. Branwen. GPT-2 Neural Network Poetry. Mar. 2019, https://www.gwern.net/GPT-2. [8] R. Raley and M. Hua. Playing With Unicorns: AI Dungeon and Citizen NLP. Digital Humanities Quarterly, 14(4), Dec. 2020, https://escholarship.org/uc/item/83m0m08x.

[9] N. Walton. AI Dungeon, 2019, https://aidungeon.io. accessed: 2021-07-15.
[10] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever. Zero-Shot Text-to-Image Generation. arXiv:2102.12092 [cs], Feb. 2021, arXiv:2102.12092, https://openai.com/blog/dall-e/.
[11] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144, Dec. 2018, doi:10.1126/science.aar6404.
[12] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, J. Oh, D. Horgan, M. Kroiss, I. Danihelka, A. Huang, L. Sifre, T. Cai, J. P. Agapiou, M. Jaderberg, A. S. Vezhnevets, R. Leblond, T. Pohlen, V. Dalibard, D. Budden, Y. Sulsky, J. Molloy, T. L. Paine, C. Gulcehre, Z. Wang, T. Pfaff, Y. Wu, R. Ring, D. Yogatama, D. Wünsch, K. McKinney, O. Smith, T. Schaul, T. Lillicrap, K. Kavukcuoglu, D. Hassabis, C. Apps, and D. Silver. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, Nov. 2019, doi:10.1038/s41586-019-1724-z.
[13] OpenAI, C. Berner, G. Brockman, B. Chan, V. Cheung, P. Dębiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, R. Józefowicz, S. Gray, C. Olsson, J. Pachocki, M. Petrov, H. P. d. O. Pinto, J. Raiman, T. Salimans, J. Schlatter, J. Schneider, S. Sidor, I. Sutskever, J. Tang, F. Wolski, and S. Zhang. Dota 2 with Large Scale Deep Reinforcement Learning. arXiv:1912.06680 [cs, stat], Dec. 2019, arXiv:1912.06680, https://openai.com/projects/five/.
[14] M. Jaderberg, W. M. Czarnecki, I. Dunning, L. Marris, G. Lever, A. G. Castañeda, C. Beattie, N. C. Rabinowitz, A. S. Morcos, A. Ruderman, N. Sonnerat, T. Green, L. Deason, J. Z. Leibo, D. Silver, D. Hassabis, K. Kavukcuoglu, and T. Graepel. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 364(6443):859–865, May 2019, arXiv:1807.01281, doi:10.1126/science.aau6249, https://deepmind.com/blog/article/capture-the-flag-science.
[15] B. Baker, I. Kanitscheider, T. Markov, Y. Wu, G. Powell, B. McGrew, and I. Mordatch. Emergent Tool Use From Multi-Agent Autocurricula. arXiv:1909.07528 [cs, stat], Feb. 2020, arXiv:1909.07528.
[16] O. Davoodi, M. Ashtiani, and M. Rajabi. An Approach for the Evaluation and Correction of Manually Designed Video Game Levels Using Deep Neural Networks. The Computer Journal, (bxaa071), July 2020, doi:10.1093/comjnl/bxaa071.
[17] M. Guzdial and M. Riedl. Toward Game Level Generation from Gameplay Videos. arXiv:1602.07721 [cs], Feb. 2016, arXiv:1602.07721.
[18] S. Risi and J. Togelius. Increasing generality in machine learning through procedural content generation. Nat Mach Intell, 2(8):428–436, Aug. 2020, arXiv:1911.13071, doi:10.1038/s42256-020-0208-z.
[19] M. Wardaszko and B. Podgórski. Mobile Learning Game Effectiveness in Cognitive Learning by Adults. Simul. Gaming, 48(4):435–454, Aug. 2017, doi:10.1177/1046878117704350.
[20] K. Park, B. W. Mott, W. Min, K. E. Boyer, E. N. Wiebe, and J. C. Lester. Generating Educational Game Levels with Multistep Deep Convolutional Generative Adversarial Networks. In 2019 IEEE Conference on Games (CoG), pages 1–8, Aug. 2019, doi:10.1109/CIG.2019.8848085.

[21] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: A System for Large-Scale Machine Learning. pages 265–283, 2016, arXiv:1603.04467.
[22] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Jan. 2019, arXiv:1912.01703.
[23] D. Smilkov, N. Thorat, Y. Assogba, A. Yuan, N. Kreeger, P. Yu, K. Zhang, S. Cai, E. Nielsen, D. Soergel, S. Bileschi, M. Terry, C. Nicholson, S. N. Gupta, S. Sirajuddin, D. Sculley, R. Monga, G. Corrado, F. B. Viégas, and M. Wattenberg. TensorFlow.js: Machine Learning for the Web and Beyond. arXiv:1901.05350 [cs], Feb. 2019, arXiv:1901.05350.
[24] M. Guzdial, N. Liao, J. Chen, S.-Y. Chen, S. Shah, V. Shah, J. Reno, G. Smith, and M. O. Riedl. Friend, Collaborator, Student, Manager: How Design of an AI-Driven Game Level Editor Affects Creators. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pages 1–13. Association for Computing Machinery, New York, NY, USA, May 2019, doi:10.1145/3290605.3300854.
[25] R. Stojnic, R. Taylor, M. Kardas, V. Kerkez, L. Viaud, E. Saravia, and G. Cucurull. Browse the State-of-the-Art in Machine Learning, Papers with Code, https://paperswithcode.com/sota. accessed: 2021-07-15.
[26] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Networks. In NIPS 2014, June 2014, arXiv:1406.2661.
[27] I. Goodfellow. NIPS 2016 Tutorial: Generative Adversarial Networks. arXiv:1701.00160 [cs], Apr. 2017, arXiv:1701.00160.
[28] M. Kahng, N. Thorat, D. H. P. Chau, F. B. Viégas, and M. Wattenberg. GAN Lab: Understanding Complex Deep Generative Models using Interactive Visual Experimentation. IEEE Transactions on Visualization and Computer Graphics, 25(1):310–320, Jan. 2019, arXiv:1809.01587, doi:10.1109/TVCG.2018.2864500, https://poloclub.github.io/ganlab/.
[29] S. Reed, Z. Akata, S. Mohan, S. Tenka, B. Schiele, and H. Lee. Learning what and where to draw. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pages 217–225, Red Hook, NY, USA, Dec. 2016. Curran Associates Inc., arXiv:1610.02454.
[30] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 5908–5916, Oct. 2017, arXiv:1612.03242, doi:10.1109/ICCV.2017.629.
[31] A. Dash, J. C. B. Gamboa, S. Ahmed, M. Liwicki, and M. Z. Afzal. TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network. arXiv:1703.06412 [cs], Mar. 2017, arXiv:1703.06412.
[32] Y. Jin, J. Zhang, M. Li, Y. Tian, H. Zhu, and Z. Fang. Towards the Automatic Anime Characters Creation with Generative Adversarial Networks. arXiv:1708.05509 [cs], Aug. 2017, arXiv:1708.05509.

[33] M. Brundage, S. Avin, J. Clark, H. Toner, P. Eckersley, B. Garfinkel, A. Dafoe, P. Scharre, T. Zeitzoff, B. Filar, H. Anderson, H. Roff, G. C. Allen, J. Steinhardt, C. Flynn, S. Ó hÉigeartaigh, S. Beard, H. Belfield, S. Farquhar, C. Lyle, R. Crootof, O. Evans, M. Page, J. Bryson, R. Yampolskiy, and D. Amodei. The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. Report, Apr. 2018, arXiv:1802.07228, https://www.repository.cam.ac.uk/handle/1810/275332.
[34] A. Brock, J. Donahue, and K. Simonyan. Large Scale GAN Training for High Fidelity Natural Image Synthesis. Sept. 2018, arXiv:1809.11096.
[35] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive Growing of GANs for Improved Quality, Stability, and Variation. Feb. 2018, arXiv:1710.10196.
[36] T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, and T. Aila. Training Generative Adversarial Networks with Limited Data. In Proc. NeurIPS, Oct. 2020, arXiv:2006.06676.

[37] G. Branwen. Making Anime Faces With StyleGAN, Gwern.net, Feb. 2019, https://www.gwern.net/Faces. last modified: 2021-01-30, accessed: 2021-07-16.
[38] S. Guan. SummitKwan/transparent_latent_gan, Sept. 2018, https://github.com/SummitKwan/transparent_latent_gan. last modified: 2018-11-01.
[39] G. Branwen. This Waifu Does Not Exist, Gwern.net, Feb. 2019, https://www.gwern.net/TWDNE. last modified: 2020-01-20, accessed: 2021-07-16.

[40] G. Branwen. This Waifu Does Not Exist v3.5 (TWDNEv3.5) - Gwern, Feb. 2019, https://www.thiswaifudoesnotexist.net. last modified: 2020-09-06, accessed: 2021-07-15.
[41] W. Sun and Z. Chen. Learned Image Downscaling for Upscaling using Content Adaptive Resampler. IEEE Trans. on Image Process., 29:4027–4040, 2020, arXiv:1907.12904, doi:10.1109/TIP.2020.2970248.
[42] S. Anwar, S. Khan, and N. Barnes. A Deep Journey into Super-resolution: A survey. arXiv:1904.07523 [cs], Mar. 2020, arXiv:1904.07523.
[43] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In L. Leal-Taixé and S. Roth, editors, Computer Vision – ECCV 2018 Workshops, Lecture Notes in Computer Science, pages 63–79, Cham, 2019. Springer International Publishing, arXiv:1809.00219, doi:10.1007/978-3-030-11021-5_5.
[44] AI Neural Networks being used to generate HQ textures for older games (You can do it yourself!), ResetEra, Dec. 2018, https://www.resetera.com/threads/ai-neural-networks-being-used-to-generate-hq-textures-for-older-games-you-can-do-it-yourself.88272/.
[45] GameUpscale - Reddit, Dec. 2018, https://www.reddit.com/r/GameUpscale. accessed: 2021-07-15.

[46] C. Hart. Krischan74/RTCWHQ, May 2019, https://github.com/Krischan74/RTCWHQ. last modified: 2019-08-26.
[47] R. McCaffrey. Diablo 2 Resurrected Announced for PC, PS5, Xbox Series X, PS4, Xbox One, and Switch - IGN. IGN, Feb. 2021, https://www.ign.com/articles/diablo-2-resurrected-announced-for-pc-ps5-xbox-series-x-ps4-xbox-one-and-nintendo-switch.

[48] L. Hafer. StarCraft Remastered Review. IGN, Aug. 2017, https://www.ign.com/articles/2017/08/19/starcraft-remastered-review.

[49] S. Menon, A. Damian, S. Hu, N. Ravi, and C. Rudin. PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models. pages 2437–2445, 2020, arXiv:2003.03808.

[50] A. Burnes. NVIDIA DLSS 2.0: A Big Leap In AI Rendering, Mar. 2020, https://www.nvidia.com/ en-us/geforce/news/nvidia-dlss-2-0-a-big-leap-in-ai-rendering/. accessed: 2021-07-15.

[51] J. Burgess. RTX on—The NVIDIA Turing GPU. IEEE Micro, 40(2):36–44, Mar. 2020, doi: 10.1109/MM.2020.2971677.

[52] K. Castle. Here are all the confirmed ray tracing and DLSS games so far. Rock Paper Shotgun, June 2021, https://www.rockpapershotgun.com/confirmed-ray-tracing-and-dlss-games-2021.

[53] bycloud. Depixelizing Doom Guy? Mona Lisa in Real Life? The ”Upscaling” AI: PULSE, June 2020, https://www.youtube.com/watch?v=CSoHaO3YqH8. accessed: 2021-07-16.

[54] L. Gatys, A. Ecker, and M. Bethge. A Neural Algorithm of Artistic Style. Journal of Vision, 16(12):326–326, Aug. 2016, arXiv:1508.06576, doi:10.1167/16.12.326.

[55] M. Ruder, A. Dosovitskiy, and T. Brox. Artistic Style Transfer for Videos. In B. Rosenhahn and B. Andres, editors, Pattern Recognition, Lecture Notes in Computer Science, pages 26–36, Cham, 2016. Springer International Publishing, arXiv:1604.08610, doi:10.1007/978-3-319-45886-1_3.

[56] A. Semmo, T. Isenberg, and J. Döllner. Neural Style Transfer: A Paradigm Shift for Image-based Artistic Rendering? pages 5:1–5:13. ACM, July 2017, doi:10.1145/3092919.3092920.

[57] Y. Jing, Y. Yang, Z. Feng, J. Ye, Y. Yu, and M. Song. Neural Style Transfer: A Review. IEEE Transactions on Visualization and Computer Graphics, 26(11):3365–3385, Nov. 2020, arXiv:1705.04058, doi:10.1109/TVCG.2019.2921336.

[58] T. Deliot, T. Guinier, and K. Vanhoey. Real-time style transfer in Unity using deep neural networks, Unity Blog, Nov. 2020, https://blog.unity.com/technology/real-time-style-transfer-in- unity-using-deep-neural-networks. accessed: 2021-07-15.

[59] K. Allen. Neural Net Style Transfer Game Art, Nov. 2016, http://katieamazing.com/blog/2016/ 12/15/art-generation-style-transfer. accessed: 2021-07-16.

[60] F. Luan, S. Paris, E. Shechtman, and K. Bala. Deep Photo Style Transfer. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 6997–7005, July 2017, arXiv:1703.07511, doi:10.1109/CVPR.2017.740.

[61] J. An, H. Xiong, J. Huan, and J. Luo. Ultrafast Photorealistic Style Transfer via Neural Architecture Search. AAAI, 34(07):10443–10450, Apr. 2020, arXiv:1912.02398, doi:10.1609/aaai.v34i07. 6614.

[62] T. Karras, S. Laine, and T. Aila. A Style-Based Generator Architecture for Generative Adversarial Networks. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4396–4405, June 2019, arXiv:1812.04948, doi:10.1109/CVPR.2019.00453.

[63] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila. Analyzing and Improving the Image Quality of StyleGAN. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8107–8116, June 2020, arXiv:1912.04958, doi:10.1109/CVPR42600.2020.00813.

[64] S. R. Richter, H. A. AlHaija, and V. Koltun. Enhancing Photorealism Enhancement. arXiv:2105.04619 [cs], May 2021, arXiv:2105.04619, https://intel-isl.github.io/ PhotorealismEnhancement/.

[65] M. Bethge, A. Ecker, L. Gatys, Ł. Kidziński, and M. Warchoł. deepart.io - become a digital artist, Oct. 2015, https://deepart.io/. accessed: 2021-07-15.

[66] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-Image Translation with Conditional Adversarial Networks. pages 5967–5976. IEEE Computer Society, July 2017, arXiv:1611.07004, doi:10.1109/CVPR.2017.632, https://phillipi.github.io/pix2pix/.

[67] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired Image-to-Image Translation Using Cycle- Consistent Adversarial Networks. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 2242–2251, Oct. 2017, arXiv:1703.10593, doi:10.1109/ICCV.2017.244, https:// junyanz.github.io/CycleGAN/.

[68] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu. Semantic Image Synthesis With Spatially-Adaptive Normalization. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2332–2341, June 2019, arXiv:1903.07291, doi:10.1109/CVPR.2019.00244, https: //nvlabs.github.io/SPADE/.

[69] M.-L. Shih, S.-Y. Su, J. Kopf, and J.-B. Huang. 3D Photography Using Context-Aware Layered Depth Inpainting. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8025–8035, June 2020, arXiv:2004.04727, doi:10.1109/CVPR42600.2020.00805, https://shihmengli.github.io/3D-Photo-Inpainting/.

[70] Z. Hao, A. Mallya, S. Belongie, and M.-Y. Liu. GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds. arXiv:2104.07659 [cs], Apr. 2021, arXiv:2104.07659, https://nvlabs.github. io/GANcraft/.

[71] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Lecture Notes in Computer Science, pages 234–241, Cham, 2015. Springer International Publishing, arXiv:1505.04597, doi:10. 1007/978-3-319-24574-4_28, https://lmb.informatik.uni-freiburg.de/people/ronneber/ u-net/.

[72] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, Apr. 2018, arXiv:1606.00915, doi:10.1109/TPAMI.2017.2699184.

[73] A. Nowaczynski. Deep learning for satellite imagery via image segmentation, deepsense.ai, Apr. 2017, https://deepsense.ai/deep-learning-for-satellite-imagery-via-image-segmentation/. accessed: 2021-07-16.

[74] Y. Bengio. Deep learning of representations for unsupervised and transfer learning. In Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning workshop - Volume 27, UTLW’11, pages 17–37, Washington, USA, July 2011. JMLR.org, http://proceedings.mlr.press/v27/bengio12a.html.

[75] C. Olah, A. Mordvintsev, and L. Schubert. Feature Visualization. Distill, 2(11):e7, Nov. 2017, doi:10.23915/distill.00007, https://distill.pub/2017/feature-visualization.

[76] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. pages 770–778. IEEE Computer Society, June 2016, arXiv:1512.03385, doi:10.1109/CVPR.2016.90.

[77] S. Kornblith, J. Shlens, and Q. V. Le. Do Better ImageNet Models Transfer Better? In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2656–2666, June 2019, arXiv:1805.08974, doi:10.1109/CVPR.2019.00277.

[78] L. Chen. Keras.js - Run Keras models in the browser, May 2017, https://transcranial.github. io/keras-js/. accessed: 2021-07-15.

[79] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber. Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction. In T. Honkela, W. Duch, M. Girolami, and S. Kaski, editors, Artificial Neural Networks and Machine Learning – ICANN 2011, Lecture Notes in Computer Science, pages 52–59, Berlin, Heidelberg, 2011. Springer, doi:10.1007/978-3-642-21735-7_7.

[80] M. Caron, P. Bojanowski, A. Joulin, and M. Douze. Deep Clustering for Unsupervised Learning of Visual Features. In V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, editors, Computer Vision – ECCV 2018, Lecture Notes in Computer Science, pages 139–156, Cham, 2018. Springer International Publishing, arXiv:1807.05520, doi:10.1007/978-3-030-01264-9_9.

[81] N. Jean, S. Wang, A. Samar, G. Azzari, D. Lobell, and S. Ermon. Tile2Vec: Unsupervised Representation Learning for Spatially Distributed Data. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3967–3974, July 2019, arXiv:1805.02855, doi:10.1609/ aaai.v33i01.33013967. Number: 01.

[82] O. Fried, S. Avidan, and D. Cohen-Or. Patch2Vec: Globally Consistent Image Patch Representation. Computer Graphics Forum, 36(7):183–194, 2017, doi:10.1111/cgf.13284.

[83] P. Migdał and B. Olechno. Unsupervised segmentation - inspecting examples, interactively, W3C Workshop on Web and Machine Learning, Aug. 2020, https://www.w3.org/2020/Talks/mlws/ piotr_migdal/slides.html. accessed: 2021-07-15.

[84] Y. Zhou, Z. Zhu, X. Bai, D. Lischinski, D. Cohen-Or, and H. Huang. Non-stationary texture synthesis by adversarial expansion. ACM Trans. Graph., 37(4):49:1–49:13, July 2018, arXiv:1805.04487, doi: 10.1145/3197517.3201285, https://vcc.tech/research/2018/TexSyn.

[85] A. Shocher, S. Bagon, P. Isola, and M. Irani. InGAN: Capturing and Remapping the ”DNA” of a Natural Image. In The IEEE International Conference on Computer Vision (ICCV), Apr. 2019, arXiv:1812.00231, http://www.wisdom.weizmann.ac.il/~vision/ingan/.

[86] L. Liang, C. Liu, Y.-Q. Xu, B. Guo, and H.-Y. Shum. Real-time texture synthesis by patch-based sampling. ACM Trans. Graph., 20(3):127–150, July 2001, doi:10.1145/501786.501787.

[87] M. Gumin. mxgmn/WaveFunctionCollapse, Sept. 2016, https://github.com/mxgmn/WaveFunctionCollapse. last modified: 2021-07-07.

[88] H. Kim, S. Lee, H. Lee, T. Hahn, and S. Kang. Automatic Generation of Game Content using a Graph-based Wave Function Collapse Algorithm. In 2019 IEEE Conference on Games (CoG), pages 1–4, Aug. 2019, doi:10.1109/CIG.2019.8848019. ISSN: 2325-4289.

[89] I. Karth and A. M. Smith. WaveFunctionCollapse is constraint solving in the wild. In Proceedings of the 12th International Conference on the Foundations of Digital Games, FDG ’17, pages 1–10, New York, NY, USA, Aug. 2017. Association for Computing Machinery, doi:10.1145/3102071.3110566.

[90] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, Feb. 1989, doi:10.1109/5.18626.

[91] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, Nov. 1997, doi:10.1162/neco.1997.9.8.1735.

[92] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, Doha, Qatar, Oct. 2014. Association for Computational Linguistics, arXiv:1406.1078, doi:10.3115/v1/D14-1179.

[93] M. Chen, A. Radford, R. Child, J. Wu, H. Jun, D. Luan, and I. Sutskever. Generative Pretraining From Pixels. In International Conference on Machine Learning, pages 1691–1703. PMLR, Nov. 2020, http://proceedings.mlr.press/v119/chen20s.html. ISSN: 2640-3498.

[94] A. Summerville and M. Mateas. Super Mario as a String: Platformer Level Generation Via LSTMs. In DiGRA/FDG ’16 - Proceedings of the First International Joint Conference of DiGRA and FDG, volume 13 of 1, Aug. 2016, arXiv:1603.00930.

[95] D. Kushner. Masters of Doom: How Two Guys Created an Empire and Transformed Pop Culture. Random House, May 2003, https://en.wikipedia.org/wiki/Masters_of_Doom.
