Player-AI Interaction: What Neural Network Games Reveal About AI as Play

Jichen Zhu∗† Jennifer Villareale∗ Nithesh Javvaji∗ [email protected] [email protected] [email protected] Drexel University Drexel University Northeastern University Sebastian Risi Mathias Löwe Rush Weigelt [email protected] [email protected] [email protected] IT University Copenhagen IT University Copenhagen Drexel University Casper Harteveld∗ [email protected] Northeastern University

Figure 1: Selection of neural network (NN) games. (From left to right, Top: How to Train Your Snake [108], Idle Game [141], Evolution [115], EvoCommander [114], Machine Learning Arena [117], Hey Robot [120]; Middle: Quick, Draw! [125], Semantris [126], Dr. Derks Mutant Battlegrounds [140], Forza Car Racing [139], Democracy 3 [122], Darwin’s Avatars [113], AudioinSpace [105]; Bottom: NERO [128], Black & White [106], Creatures [127], MotoGP19 [131], Supreme Commander 2 [121], Galactic Arms Race [119], Petalz [118]) ABSTRACT survey of neural network games (n = 38), we identified the dom- The advent of (AI) and machine learning (ML) inant interaction metaphors and AI interaction patterns in these games. In addition, we applied existing human-AI interaction guide- arXiv:2101.06220v2 [cs.HC] 18 Jan 2021 bring human-AI interaction to the forefront of HCI research. This paper argues that games are an ideal domain for studying and exper- lines to further shed light on player-AI interaction in the context of imenting with how humans interact with AI. Through a systematic AI-infused systems. Our core finding is that AI as play can expand current notions of human-AI interaction, which are predominantly ∗These authors contributed equally to this research. productivity-based. In particular, our work suggests that game and † Currently at IT University of Copenhagen, Denmark. UX designers should consider flow to structure the learning curve of human-AI interaction, incorporate discovery-based learning to Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed play around with the AI and observe the consequences, and of- for profit or commercial advantage and that copies bear this notice and the full citation fer users an invitation to play to explore new forms of human-AI on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, interaction. to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. CHI ’21, May 8–13, 2021, Yokohama, Japan © 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-8096-6/21/05...$15.00 https://doi.org/10.1145/3411764.3445307 1 CHI ’21, May 8–13, 2021, Yokohama, Japan Zhu, Villareale and Javvaji, et al.

CCS CONCEPTS gameplay. A neural network (NN) is a computational model that • Human-centered computing → Human computer interac- includes nodes (i.e., neurons) and connections between these nodes tion (HCI); • Applied computing → Computer games; • Com- that transmit information [76]. The strengths of these connections puting methodologies → Artificial intelligence. (i.e., weights) are typically adjusted through some learning process. For our paper, this definition covers both NNs that control agents / KEYWORDS non-player characters (NPCs) in games [43, 88] and generative NN models that produce game content [33, 71]. Human-AI Interaction; Neural Networks; User Experience; Game We chose NN games for two key reasons. First, given the wide Design adoption of AI in games, we had to constrain our systematic (quali- ACM Reference Format: tative) review. Second, and more importantly, NN games provide Jichen Zhu, Jennifer Villareale, Nithesh Javvaji, Sebastian Risi, Mathias insights into some of the most pressing open problems in human-AI Löwe, Rush Weigelt, and Casper Harteveld. 2021. Player-AI Interaction: interaction. For example, NNs are notorious for UX designers to What Neural Network Games Reveal About AI as Play. In CHI Conference on work with because of NNs’ low interpretability of the underlying Human Factors in Computing Systems (CHI ’21), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 17 pages. https://doi.org/10.1145/3411764. process and the frequent unpredictability of its outcome. Studying 3445307 NN games can thus provide valuable information on how game designers have to work with these challenges. 1 INTRODUCTION We collected 38 NN games and applied a two-phased qualitative 1 analysis to examine them. In the first phase, we use close read- With the recent boom in artificial intelligence (AI) technology, peo- ing and grounded theory to identify the overarching interaction ple are interacting with a growing number of AI-infused products metaphors and patterns of how NNs are represented in the game in many aspects of everyday life. Already, these AI systems influ- user interface (UI). In the second phase, we apply current human-AI ence our decisions (e.g., recommendation systems [74]), inhabit our interaction design guidelines [6], compiled from a wide range of households (e.g., robotic appliances [80]), and accompany us in our productivity-based domains, to our dataset. From these analyses, playful experiences [54, 102, 103] and educational games [85, 99]. we derive design lessons for where games do well and identify The technological development has precipitated renewed in- open areas that can expand our current notion of human-AI inter- terest in the Human Computer Interaction (HCI) community. In action. A key design insight is that reframing AI as play offers a addition to improving the usability of individual products [58], HCI useful approach for considering human-AI interaction in games researchers synthesized design guidelines for human-AI interaction and beyond. from the past decades [6]. There is a growing recognition that, com- The core argument of this paper is that games are a rich and pared to traditional interactive systems, AI-infused products impose currently overlooked domain for advancing human-AI interaction. additional challenges (e.g., technical barrier, low interpretability) to The design space afforded by structuring AI as play, as game de- the current user experience (UX) design process [23]. Furthermore, signers have been exploring, can point out new opportunities for new interdisciplinary research areas have emerged around topics AI-infused products in general. At the same time, insights of the such as explainable AI [9, 66, 83, 101], ethics & fairness [12, 23, 38], generalized guidelines from other domains can be adapted to im- and machine learning (ML) as a design material for UX [93, 94]. prove player-AI interaction. The key contributions of this paper Among the fast-growing body of literature on human-AI inter- are as follows: action in the CHI community, one overlooked area is the context of play. With few exceptions, most recent literature focuses on • We propose the new research area of player-AI interaction. productivity-related domains such as e-commerce, navigation and Through the first systematic review on player-AI interaction autocomplete [6]. While these are important domains for human-AI in the context of NN games, we showcase how player-AI interaction, the history of AI and human-AI interaction has long interaction can expand the current productivity-based dis- been associated with play. For instance, ELIZA, one of the first AI cussions around human-AI interaction. programs designed to interact with lay users, was a playful satire • We adapted existing design guidelines for human-AI inter- of a certain school of psychotherapy [90]. Games such as Chess, action to the context of games. Currently, there are no syn- Poker, Go, and StarCraft have continued to serve as benchmarks thesized metrics to evaluate player-AI interaction. that propelled the development of AI since the beginning of the • We provide several insights from NN games (e.g., flow, ex- field. Since games naturally focus on end-user experience, gameAI ploration) to improve current challenges in human-AI inter- research has accumulated valuable knowledge related to human-AI action (e.g., learnability of AI). interaction [52, 54, 71, 86, 97, 100]. In this paper, we propose the new construct of player-AI interac- 2 RELATED WORK tion to highlight how people interact with AI in the context of play, In this section, we summarize related work in human-AI interac- especially through computer games. To provide an overview of tion and AI-based games research. We aim to bridge these two existing work in this area, we conducted the first systematic review disconnected areas. of player-AI interaction in the scope of Neural Network games — com- puter games in which players interact with an NN as part of the core 2.1 Human-AI Interaction

1Unless otherwise specified, we use the term AI broadly to include a wide rangeof The HCI community has developed a body of work on how to artificial intelligence and machine learning techniques. design user interactions to improve productivity through AI-based 2 Player-AI Interaction: What Neural Network Games Reveal About AI as Play CHI ’21, May 8–13, 2021, Yokohama, Japan applications [35, 39, 41, 78, 92]. Thanks to increasingly sophisticated co-creator, adversary, villain, or spectacle, and whether AI is vis- big data and deep neural networks (i.e., ), AI-infused ible or guided. Cook et al. [16] further examined design patterns products have started to enter the consumer market, prompting in procedural content generation (PCG)-based games and derived a new surge of interest in human-AI interaction [6, 8, 63] and its different AI design patterns. In the context of assisting thegame societal impact in topics such as explainability and transparency [9, development process, Riedl and Zook [68] proposed that AI plays 66, 83, 101]. the role of actor, designer, and producer. While the above work pro- An important research area is user-centered design for human-AI vides a critical starting point for our work, they are “meant to be a interaction. For instance, in conversational agents, a widely adopted tool for thinking about creating AI-based games, rather than serve type of AI-infused products, researchers have reported the wide gap as a comprehensive taxonomy of methods” [84]. Finally, Guzdial between user expectations and the user experience (UX) of these et al. [32] used the taxonomy of friend, collaborator, student, and systems [51] and how users develop their own strategies to work manager to describe the different interaction metaphors for how around the obstacles [58]. Recently, researchers synthesized guide- game designers interact with an AI-based game level editor. Our lines, principles, and theories into coherent design frameworks for work builds on this tradition of using human-human interaction human-AI interaction [6, 79, 89]. While user-centered (or player- as metaphors to structure the interaction between humans and AI. centered) design is key in game development [81], few works have In addition, we extend this literature by conducting an in-depth looked at games as a domain for human-AI interaction, despite the empirical analysis through grounded theory instead of relying on re- long history of games and AI. Our work is the first to do so in a searchers’ domain expertise, as is the case for the above-mentioned systematic and empirical way. More specifically, we leverage the existing work. existing meta-review of human-AI interaction design principles [6] Finally, there is a significant body of work in games research to investigate NN games. to understand player experience [2, 20, 21, 50, 60]. For example, There is a growing understanding in recent literature that design- the game engagement questionnaire (GEQ) [11] is a widely used ing AI and ML products is especially challenging for UX designers. instrument for measuring player engagement, although recently it Dove et al. [23] acknowledged that, despite the regular use of ML in has been approached with increasing criticism [47]. Other notable UX products, there has been little design innovation. Echoing the frameworks include game involvement [13], game usability [21], challenges of using ML as design material, Yang et al. [95] argued and design heuristics [50]. While these frameworks are useful to that capability uncertainty and output complexity of AI systems are improve the general player experience, they do not have sufficient the two root causes of why human-AI interaction is uniquely tricky focus on the interaction between players and AI to guide the human- to design. The work presented in this paper aims to identify player- AI interaction design of games. Thus, our work proposes the first AI interaction, which is currently separated from the mainstream set of guidelines to design and evaluate player-AI interaction. human-AI interaction literature, as a rich domain for further study and experimentation. Lessons from the game research community on how to structure human-AI interaction in the context of play can 3 DATASET: NEURAL NETWORK GAMES help to expand the current body of work in human-AI interaction. This section describes our systematic search process and the result- ing dataset of 38 NN Games. Table 1 provides an overview of this dataset, including the characteristics described in this section. 2.2 AI-based Game Design and Player Experience In AI research, there is an extended history of using games as a rich 3.1 Search Strategy and Data Collection domain to motivate algorithmic advancements. Salient examples We searched two popular web gaming portals—Steam and itch.io— include Chess in the era of “Good Old Fashioned AI” (GOFAI) [14], and a widely used game AI book, Artificial Intelligence and Games [96]. Go [73], classic Atari video games [56], or even the popular AAA We chose these three sources because they collectively cover a wide game StarCraft [62, 88] in the age of deep learning. The advances in variety of games of different production modes. Steam is the largest game AI in turn opened new design spaces of player experience in digital distribution platform for PC gaming, offering over 34,000 research [52, 54, 86, 97, 100] as well as commercially released games games with over 95 million monthly active users in 2019 [1]. itch.io and game engines (e.g., Versu [24], Left4Dead [87] and Civilization is one of the largest platforms for indie games, containing nearly VI [27]). 100,000 games with various metadata. Artificial Intelligence and However, with few exceptions, games have only recently started Games is the most cited book on game AI that includes examples to be used as a serious domain for human-AI interaction research. of games with notable AI innovations. The book complements the For instance, Gomme and Bartle [29] used strategy games to study previous two sources for its coverage on AAA commercial games players’ expectations for what they consider to be a worthy AI- and research games. controlled opponent. Along those lines, several researchers pro- Our inclusion criteria were computer games wherein players posed using games and playful experiences to help designers and can interact with an NN as part of the core gameplay (i.e., game- users learn AI [25, 59, 64]. play loop). We use the definition of core gameplay as “the setof Most existing work has focused on high-level metaphors (often actions the player iterates on the most while playing the game [and referred to as “design patterns”) of how players and designers can which] should directly influence the outcomes of the game” [31]. interact with AI. For instance, Treanor et al. [84] derived nine pat- We further excluded work with no clear win condition and no clear terns based on what players do: AI as role-model, trainee, editable, feedback on how player interaction with the AI impacts the game, 3 CHI ’21, May 8–13, 2021, Yokohama, Japan Zhu, Villareale and Javvaji, et al.

because the player’s actions (i.e., walk, jump, punch) produced no Game AI Book itch.io Steam Additional Games (n=9) (n=36) (n=67) (n=13) visible change in the NN’s behavior. As a result, it was unclear how the player was intended to interact with the NN and to what end Identification the NN impacted gameplay. Note that our goal was not to develop Games excluded after Game websites and descriptions game reviewed for a comprehensive list, but rather to capture a representative sample screened for NN and relevance NN and relevance (n=125) (n=69) of NN games to analyze current trends in player-AI interaction. Screening

Games excluded 3.2 Dataset Games assessed due to NN not part of for eligibility gameplay loop or For each game we included, two researchers collected the following (n=56) unclear feedback

Eligibility (n=18) to form our dataset: 1) screen recording of one researcher playing at least one hour of the game, 2) game developers’ description of the

Games included Games excluded game and their design intent (i.e., via the game’s website, developer for game analysis for being unplayable (n=38) (n=7) blog, and academic publications), and 3) technical features of the NN (online vs. offline learning, the types of output to the NN). Weused findings from 1) and 2) to determine 3). If it was not apparent, the Included Games included two researchers consulted our NN expert coauthors for resolution. for human-AI interaction guideline analysis This section summarizes the key characteristics of our dataset. It (n=31) should be noted that seven games did not have playable versions publicly available (marked with * in Table 1). For those games, Figure 2: Data collection process. the researchers used existing gameplay footage available on the Internet for our Phase 1 analysis. However, for Phase 2, directly playing the games is necessary. Thus, we excluded the unplayable games for our guideline analysis (see Figure 2). as these games lack the basic elements for meaningful player-AI in- teraction. Notice that sandbox games with clear feedback to player 3.2.1 Characteristics as Games. As a collection of games, our dataset interaction are included [30, 33, 106, 137, 142]. For the same reason, is diverse in multiple aspects. Indie and research games make up we excluded games where the NN did not interact with players most of the dataset (74%), while only 26% are AAA games. Us- (e.g., ML agents that can automatically play games [75]). Finally, ing Heintz and Law’s taxonomy [34], our games cover all the gen- we excluded digitized versions of traditional board/card games (e.g., res: 15 simulation games (39%), 7 puzzle games (18%), 5 strategy Chess, Go, Poker) to focus on computer games. Future research is games (13%), 4 role-playing games (11%), 3 action games (8%), 3 needed for investigating player-AI interaction in traditional games. sports games (8%) and 1 adventure game (3%). In addition, 16 games Our search process is summarized in Figure 2. Similar to other are multiplayer (42%), and the remaining 22 games are single player systematic reviews on large game repositories [5], we used pre- (58%). This suggests that our dataset has a good representation of existing game tags in these systems. On itch.io, we used its tags different types of games. “neural-network” and “machine-learning,” and “AI.” On Steam we searched with the terms “neural network”, “machine learning”, and 3.2.2 Characteristics of NN and Game AI. From the technical point also used Steam’s own tag “artificial intelligence.” We acknowledge of view, our dataset also covered a wide range of varieties. There that not all NN games identify themselves as such, and this is a is a relatively even split between NN games with online learning limitation of our study. However, it is possible that the games that (58%) and offline learning methods (42%). This technical feature is advertise their use of NNs are more likely to pay extra attention associated with different gameplay characteristics. In online learn- to player-AI interaction. In the Artificial Intelligence and Games ing games, the network is (further) trained as the player interacts book [96], we went through the chapter “A Brief History of Ar- with it. Therefore, these games can adapt to individual players’ tificial Intelligence and Games” to collect the relevant games. In actions in real-time. Offline learning games, on the other hand, are order to include as many influential examples as possible, we also shipped with fixed NNs and are not adaptive in the same way. How- asked on social media in the games and game AI communities for ever, offline learning games have the advantage of handling more additional work. The suggested games are included in the category complex user input, such as natural language [104, 120, 126, 142] of “additional games” along with the games the authors were aware and images [125, 135]. The output of the NN can be divided into of. behavior (89%) and content (11%). Behavior output consists of the After screening the 125 games resulting from the above process actions and decisions by NN-controlled characters. Content output for eligibility, we found 38 games that met the inclusion criteria typically take the form of in-game assets such as flowers118 [ ] or (Table 1). The most common reasons for games to be excluded were weapons [119]. that 1) they used content (e.g., music) generated by an external NN, A key challenge to our analysis is the black-box nature of AI and but the NN was not part of the gameplay loop, and 2) they did not NNs, especially from the players’ point of view. Similar to other have full human-AI interaction due to the lack of feedback for player AI-infused products, games often contain complex interactions actions. For example, Bird by Example [110] is a single-player RPG supported by different algorithms. It can be difficult to attribute where players navigate a forest environment and interact with their gameplay features to specific algorithms without access to source bird offspring that is controlled by an NN. This game was excluded code. The authors, including two game AI researchers, made our 4 Player-AI Interaction: What Neural Network Games Reveal About AI as Play CHI ’21, May 8–13, 2021, Yokohama, Japan best effort to determine whether there were multiple AIs in agame complete, both researchers reviewed each other’s work to ensure (e.g., Black and White [106]) and what the NN was responsible for. there were no discrepancies. If there was a discrepancy, the game We did so by using game developers’ descriptions (e.g., conference would be discussed and reviewed again by both researchers. talks and online articles) and our technical expertise to analyze We opted for this consensual qualitative approach [36] instead the gameplay. Still, many such technical details remain unknown of inter-rater reliability (IRR) because analyzing NN games and for commercial games. We acknowledge this as a limitation of our their compliance with guidelines (next section) is complex. This study. However, what makes NNs uniquely demanding in their is due to the incredibly varying contexts and nuances that need human-AI interaction design is their reduced predictability and low to be considered in this emerging but under-studied area. Con- interpretability. We argue that an NN, whether as the entire game sensus coding is generally more suited for small samples and for AI or as a component, will introduce these characteristics to the considering multiple viewpoints, which fits our study better than games they are part of. As a result, NN games can be studied as a IRR [55]. Specifically, in order to get such multiple viewpoints, the whole, regardless of their technical differences. two researchers discussed above were a game researcher and an AI engineer. A much richer and deeper understanding is thus gained. 4 PHASE 1: ANALYZING PLAYER-AI INTERACTION IN NN GAMES 4.2 Results The first broad question we attempt to answer is how existing This section presents the results of our Phase 1 analysis. All classi- games use neural networks (NN), especially in terms of human-AI fications of each game according to our player-NN framework are interaction. In particular, we focused on two subsidiary research presented in Table 1. questions: 4.2.1 Interaction Metaphors. The HCI literature shows that inter- • RQ 1.a: How do NN games structure player-AI interaction? face metaphors (e.g., “Desktop” and “Search Engine”) are “useful • RQ 1.b: How visible are NNs in the UI of the core gameplay? ways to provide familiar entities that enable people readily to un- derstand the underlying conceptual model [of a system] and know 4.1 Methods what to do at the interface” [72, p.78]. Critical AI studies revealed the The overall analysis procedure involved a close reading of the importance of metaphors to AI [3, 53, 98]. Our analysis found four gameplay data in our dataset of 38 games. We used grounded theory interaction metaphors that provide familiar structures for players to iteratively develop a framework for how players interact with the to interact with the AI: NN as Apprentice, Competitor, Designer, and NN (i.e., the interaction metaphors) and how much the existence of Teammate. This finding is consistent with recent work in thegame NN is foregrounded in the core gameplay (i.e., levels of visibility). design literature. Based on their expert knowledge and intuition, After initial observations of the games, two researchers con- game developers discuss how interaction metaphors (often referred ducted a close reading of the dataset based on the following ques- to as “design patterns”) have been used in game design [16, 84] tions: 1) What role does the NN play in the overall game system? 2) and in game production [68]. Our analysis extends the existing How does the player interact with the NN in the game? 3) Where literature by conducting the first empirical work that uses deep does the interaction with the NN occur in the gameplay experience? qualitative analysis to analyze the interaction metaphors. and 4) How, if at all, are the NNs presented in the UI? The largest portion of NN games (34%) adopted what we call Next, the two researchers conducted a preliminary open coding Neural Network as Apprentice. In these games, the player inter- to label notable characteristics of the observations. During this acts with the NN as its mentor, and the focus of the gameplay is step of the analysis, both researchers first went through each game how player changes the NN over time. The player’s mentoring of individually and noted initial labels (e.g., player input via param- the NN can be achieved by providing direct feedback to the NN’s eters, NN outputs behavior, NN is represented as a creature) into behaviors [106, 127, 129]. For example, in Creatures, the player a shared document. Then, they discussed this document to iterate provides positive feedback (petting) when the NN-controlled crea- on the labels to form concepts (e.g., player directs NN toward a ture displays desirable behavior (e.g., eat when hungry) and pun- desired goal). While constructing the concepts, they re-observed ishes it (slapping) for the opposite. A second way the player can some of the games and reviewed related literature to refine the mentor the NN is by configuring the right training setting for classifications. it [109, 120, 128, 134]. The gameplay afforded by this interaction Once a common concept list was achieved and agreed upon, metaphor focuses on getting the player to train the NN. As shown both researchers separately re-analyzed the shared document to in Figure 4, all games in this category use online learning. develop preliminary categories that fit into a framework. During Another interaction metaphor our NN games use is Neural this step of the analysis, the researchers presented each other’s Network as Competitor. The key characteristics of this group, framework to one another and then collectively iterated on the consisting of 26% of the games, is that player-AI interaction is categories to finalize the framework. The result of this phase is adversarial. For example, in Supreme Commander 2 [121], the player a set of categories that make up the player-NN framework (i.e., fights an NN through their respective army platoons. As the player the interaction metaphors discussed in Section 4.2.1 and levels of customizes their army, the NN weighs the player’s unit composition visibility discussed in Section 4.2.2). Using the framework, each against its own and makes tactical battle decisions, such as how its researcher independently coded the same 7 games (20%), one from army will respond, which enemy to target first, or when to retreat. each genre. After a complete agreement on the codes, they then The NN can exploit players that are over-reliant on a single strategy coded the rest of the games independently. When the codes were and counter the player to create an evolving challenge [107, 121, 5 CHI ’21, May 8–13, 2021, Yokohama, Japan Zhu, Villareale and Javvaji, et al. NN-Limited NN-Specific NN-Limited UI NN-Limited UI NN-Limited UI NN-Limited UI NN-Limited UI NN-Specific UI NN-Specific UI NN-Specific UI NN-Specific UI NN-Specific UI NN-Agnostic UI NN-Agnostic UI NN-Agnostic UI NN-Agnostic UI NN-Agnostic UI NN-Agnostic UI NN-Agnostic UI Designer Designer Teammate Teammate Teammate Teammate Teammate Apprentice Apprentice Apprentice Apprentice Apprentice Apprentice Apprentice Apprentice Competitor Competitor Competitor Competitor Player-NN Framework Characteristics Online Online Online Online Online Online Online Online Online Online Online Online Offline Offline Offline Offline Offline Offline Offline Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors Behaviors NN Characteristics controls creature movement controls creature movement creates creature desires controls robot movement controls the car’s driving performancecontrols enemy snake behavior controls the creature’s sensor-motor coordination creates motivations and desires of the public Behaviorscontrols tank movement and shooting behavior Offlinecontrols creature movement and sensory input controls the car’s Competitor driving performance controls creature behavior NN-Agnostic UI controls language processing controls performance of the vehicle’s movement controls robot behavior controls car movement controls chef behavior identifies sketches drawn by player the controls enemy unit flight and fight behavior No No No No No No No No No Yes Yes Yes Unknown Unknown Unknown Unknown Unknown Unknown Unknown controls opponent behavior Sports Puzzle Puzzle Puzzle Puzzle Strategy Strategy Role-play Role-play Role-play Role-play Simulation Simulation Simulation Simulation Simulation Simulation Simulation Simulation Indie Indie Indie Indie Indie Indie Indie Indie AAA AAA AAA AAA AAA AAA Research Research Research Research Research Game Characteristics Table 1: Overview of the 38 NN games and the results of Phase 1 analysis. (* Games without available playable versions and excluded for Phase 2.) Game Title [Ludography]2D Walk Evolution [130] AI Dungeon [142]AIvolution [138] AudioinSpace* [105][40] PublisherBlack & White [106][91] Blitzkrieg 3 Genre [132]BrainCrafter* [134] MultipleColin AIs? McRae Rally 2.0* [112]Competitive Snake [136] Corral Research [137]Creatures Indie [127][30] ActionDarwin’s Avatar* [113] [48]Democracy Adventure 3 [122] NN AAADr. Responsibilities Derk’s Mutant Battlegrounds [140]EvoCommander* [114] AAA [43] Yes NoEvolution [115] Sports Indieevolution for Research beginners [129] StrategyFootball Evo creates [109] Simulation weapon visuals creates andForza natural Unknown audio Car Action language Racing responses [139][82] Unknown NNGAR Output [119][33] Unknown Indie LearningGridworld [116] controls "Boris" Interaction controls battleGuess Metaphor creature behavior the movement Word [104] and Simulation [28] behavior NoHey Robot [120] How to Train Unknown your Snake [108]Idle UI Machine Learning Game Indie controls [141] creatureiNNk Behaviors movement [135] controls chicken Content movement andMachine preservation Indie Behaviors Learning skills Arena* [117] Simulation OfflineMotoGP19 Research [131] Behaviors Online Online SimulationNeat Race Indie Unknown [133] Research PuzzleNERO Behaviors [128][77] OnlineOui controls Simulation Chef!! Action creature [111][15] movement No Designer OfflinePetalz* [118][69] Apprentice DesignerQuick, Draw! [125] Yes ApprenticeRace No for the Galaxy controls [123] Yes playerRoll movement for and Content Competitor the behavior Galaxy NN-Limited [124] UI NN-Limited Semantris [126] creates NN-Agnostic natural Research language NN-Agnostic responsesSupreme controls UI Commander snake 2 movement Offline AAA [121][65] creates particleThe weapons Abbattoir Intergrade Puzzle NN-Limited [107] UI Research Sports Simulation Behaviors Behaviors Research AAA Designer Simulation No Online Online Unknown Indie No Strategy Behaviors controls the car’s No driving NN-Limited Research identifies performance UI sketches Strategy Offline drawn Unknown Apprentice Teammate by controls the robot movement and shooting Puzzle behavior Behaviors controls opponent Unknown behavior creates flowers Content Online Teammate NN-Limited Behaviors NN-Specific controls UI UI enemy unit offense behavior Online Behaviors No Onlineplayer Behaviors Offline NN-Limited Apprentice UI controls the classification words of Designer Apprentice NN-Specific Behaviors Competitor UI Offline Behaviors Online NN-Specific UI NN-Limited UI Offline NN-Agnostic UI Competitor NN-Specific UI Behaviors Competitor Content Competitor Offline NN-Agnostic UI NN-Limited UI Offline DesignerTeammate NN-Limited UI NN-Agnostic UI

6 Player-AI Interaction: What Neural Network Games Reveal About AI as Play CHI ’21, May 8–13, 2021, Yokohama, Japan

123, 124, 131, 132, 139]. In these games, the NN counters the player players build and command a variety of units to defeat the opposing during gameplay, thus encouraging them to adapt and try new NN-controlled enemy. The game’s opening screen (Right, Figure 3) strategies. A key distinction in this category is that the NN learns personifies the NN as an evil-looking person with the text “Meet player’s actions to create a more difficult challenge for the player Boris, a neural-network AI you can fight against...” to overcome. As discussed further in the next section, only one Finally, 34% of the games used NN-Agnostic UI, which does game [135] here explicitly highlights the existence of the NN in not reference the NN. By masking the NN, these games maintain their core UI. the narrative immersion of the game worlds without revealing the For 21% of the games we identified Neural Network as Team- algorithms used to build them. mate, which happens if the interaction between the player and the NN is structured as those between colleagues. In these games, the player and the NN work together toward a shared goal. For exam- ple, in Evolution [115], players and the NN create a stick-figure-like 4.3 Discussion creature together. Players assemble the creature by placing bones, 4.3.1 NN Is Driving the Experimentation of Novel Gameplay Experi- muscles, and joints in different ways. The NN takes the player’s ences. We observed a surge of NN games in recent years (Figure 5). creation and improves it through evolving it over many iterations. NNs have been adopted in a wide variety of game genres and This interaction creates a collaborative cycle between the player gameplay experiences. The games in our collection covered all the and the NN. A unique characteristic of this interaction metaphor common game genres, showing vibrant efforts in trying different is that the player and the NN have complementary skills. Both are ways of incorporating NN in games. The interaction metaphor with needed to complete the game objective. the longest history is NN as Apprentice, whereas NN as Teammate The final 19% of the games used the Neural Network as De- has only been published since 2014. Our hypothesis is that the signer metaphor. In these games, the NN acts as a creator and the metaphor of NN as Teammate has the highest technical require- player as its client. The NN generates new content [113, 142] or cus- ment for the NN because it needs to sufficiently understand and tomizes content based on the preferences of the player [118, 119], anticipate player interaction in order to accomplish a common goal. usually determined passively through players frequently interacting Thus, they are the most recent interaction metaphor. Most of the with a particular game element. For example, in Petalz [118], play- games are experimental, as the game design community has not ers arrange and nurture a balcony of flowers, which are generated formed established ways to use NNs. This is also echoed by the by an NN. The NN generates each flower (shape and color) based on fact that indie developers and researchers developed the majority the player’s selection of flowers to breed or cross-pollinate. The NN of our games (74%). extends the game’s playability by creating flowers that match the We noticed that some of the NN games are providing novel preferences of the player. Notice that compared to NN as Teammate, gameplay experiences that would not have been possible without the player here generally has less well-defined goals to accomplish NNs. For example, thanks to advancements in deep learning, AI with the NN. Dungeon [142] offered a text-based, human-AI collaborative story- telling experience with few constraints for what the player could 4.2.2 Visibility of NN in Core UI. For the second research question type into the game. This is a drastic improvement over traditional of “how visible are NNs in the UI of the core gameplay,” we found text-based adventure games, which are infamous for their intol- 3 levels in which the NN is called into the player’s attention in the erance for player input that is slightly different from what the UI: NN-Specific, NN-Limited, and NN-Agnostic. designer programmed for [57]. More significant is that no matter A significant number of games (26%) foregrounded the existence what story elements the player enters, AI Dungeon can respond in of its NN through what we call NN-Specific .UI These UIs highlight reasonable ways. the presence of the NN during core gameplay through linguistic The NN games in our dataset have improved established game- features (e.g., using the term “neural network” [108, 116, 135, 141]). play experiences that had been supported by other AI techniques. For instance, How to Train your Snake describes each NN-controlled Compared with traditional AI, NNs are more adaptive and flexible. snake as “...hooked up to a Neural Network” [108]. Some games use In NN as competitor games, we observed that these games use visual features (e.g., visualizing the underling NN [108, 109, 133, established game mechanics (e.g., racing [131, 139]), but the NN 135, 141]). In iNNk (Middle, Figure 3), the word “neural network” offers a more challenging competitive experience through more is prominently featured in the core game UI along with the NN’s capable NPCs than other AI techniques. confidence meter. More interesting, some games visualize thepa- The most unique gameplay is found in making the training of rameters of the NN training algorithm to make the training process the NN itself a playable mechanic. By representing the NN and playable. For example, in Neat Race [133] (Left, Figure 3), the game displaying features of its internal processes, this may lead to new visualizes the NN’s internal structure (bottom left of the screenshot) player experiences, such as using gameplay to inform their under- and displays its parameters as sliders (top right). standing of the system to be successful in the game. For example, The majority of our games (40%) used NN-Limited UI. They we observed that this occurs in games such as How to Train your acknowledge the presence of the NN in the game, but only through Snake [108] and Idle Machine Learning Game [141], which made non-essential UI, such as using technical terminology in tutori- the NN explicit in the UI and displayed parameters for the players als [127, 138, 142], menus outside the core gameplay loop [113, 116, to use when steering the NN’s output. The gameplay became a 128, 139], or explicitly referring to the NN only in title screens [126, puzzle about how to best configure the NN to be successful in the 142]. For instance, Blitzkrieg 3 [132] is a WWII strategy game where game. To do so requires players to have a basic understanding of 7 CHI ’21, May 8–13, 2021, Yokohama, Japan Zhu, Villareale and Javvaji, et al.

Figure 3: From left to right, we display Neat Race [133] categorized as NN-Specific, iNNk [135] categorized as NN-Specific, and Blitzkrieg 3 [132] categorized as NN-Limited.

what prior researchers proposed based on intuitions and domain knowledge [16, 32, 68, 84], all of which uses human relationships. The role of metaphors has been extensively studied in human cog- nition [45, 46] and in UI design [49, 72], as it has been more recently Agnostic Specific Limited Limited in human-AI interaction as well [22]. Specific Agnostic Online We believe this full, uncomplicated adoption of interaction metaphors Agnostic Specific Offline reflects the early stage of human-AI interaction. While we did notice Limited Competitor Limited some laudable innovations such as those mentioned in the previous Offline Online section, by and large, game designers and developers went with Specific Agnostic Apprentice familiar concepts and metaphors. This is consistent with Bolter’s Teammate Agnostic Specific notion of how new technology remediates familiar forms before Offline Online taking on its distinctive forms [10]. Compared to ML-based UX [23], Limited Designer Limited

Online an advantage of the games community is that game AI developers Specific Agnostic are often game designers themselves or work closely with the latter. Offline Agnostic Specific This cross-pollination between algorithm and design makes games Limited Limited Specific Agnostic a vibrant domain for new experimentation. It is also notable that the metaphors are connected with the algorithmic characteristics of different NNs. As shown in Figure 4, all games that adopted NN as Apprentice use online learning for player agency, whereas most games with NN as Competitor use Figure 4: Distribution of the NN games (n = 38) categorized offline learning for opponent competency. by interaction metaphor, online/offline learning, and UI vis- ibility. Each black dot represents one NN game. 4.3.3 The Struggle with Transparency and Interpretability. Similar to most NN-based UX applications (see further discussions in Sec- 8 T 7 T : Teammate T tion 5), NN games struggle with how and what to communicate to 6 C : Competitor T T the player regarding the use of the NN. In our initial coding stage, 5 D : Designer T T T C even our NN researcher coauthors could not always figure out the 4 A : Apprentice T C T C NN-specific questions by simply looking at the game. Because the 3 T D C A C C Number of Games use of NNs may not be apparent in the gameplay, and since only 2 C D D C A A A 26% of our games references NNs (including simply using the word 1 A C A A C D D A D A A A A “neural network”) in their core UI, the game requires the player to believe they are interacting with an NN. Even when the players are directly playing with the parameters Figure 5: Distribution of the NN games by publishing date. of the NN, it is not always clear what these features do. For example, unless the player has a background in evolutionary algorithms, terms such as “population” and “retraining” are not necessarily the capabilities and limitations of the system, which are discovered understandable. over time through play. Like other AI-infused systems, games also struggle with the lack 4.3.2 Metaphors We Play By. In our analysis, we saw that interac- of interpretability of NNs. We see a full spectrum from completely tion metaphors based on human relationships played a powerful blackbox NNs [106, 121, 122, 127] (often in NN as competitors) to at- role in structuring player-AI interaction. We did not notice that any tempts to visualize the underlying NN [108, 109, 128, 133, 134, 141]. games break or even complicate (e.g., a teammate that back-stabs) Most notably, NERO [128], gives insights about the NN and its train- the interaction metaphor they use. Our empirical analysis validates ing in two ways. First, by visualizing the NN training parameters, 8 Player-AI Interaction: What Neural Network Games Reveal About AI as Play CHI ’21, May 8–13, 2021, Yokohama, Japan players can steer the NN behavior by tweaking the reward structure their feedback such as preference to the NN during gameplay?” Ta- for what is preferred behavior (e.g., approach enemy, attack enemy). ble 2 shows all the original guidelines and our associated “Guiding Second, players are able to see graphs of the NN’s internal structure questions for NN games.” and fitness values across generations for all robots across various From this process, we agreed that guidelines in the “when wrong” combat stats (e.g., enemy hits). category did not apply in the context of games. Games handle failure The strongest designs for making NNs more interpretable come differently than other AI products, where failure is expected and, from simulation games where the player can tweak different train- in fact, part of the main interaction and resulting experience [7, 44]. ing parameters. In these cases, even though the names of the pa- Additionally, for AI-infused products, humans are consumers of the rameters are sometimes too technical for players without an AI AI. By contrast, players in many of our games actively control how background, the NN’s behavioral change feedback through differ- the AIs are trained. This close relationship significantly complicates ent iterations of trial-and-error gameplay helps the player develop the notion of failure in games. As a result, fully unpacking what an intuition. In other words, most games in our dataset manage to failure means in games is out of the scope of this paper. We hence reframe the difficulties of interacting with an NN as a puzzle and excluded this category and focused our analysis on the initially, thus make it more engaging. during, and overtime categories. We do offer some observations in the discussion, but further research is needed in this important area of player-AI interaction. 5 PHASE 2: ANALYZING NN GAMES WITH GENERAL HUMAN-AI INTERACTION 5.1.2 Step 2: Codes for Analyzing NN Games. Two researchers took GUIDELINES an iterative approach to arrive at a set of codes to analyze the games in Step 3. Amershi et al.’s [6] do not necessarily specify or The second broad research question is to explore what neural net- recommend how the guidelines should be evaluated, but in their user work (NN) games can tell us about designing human-AI interaction. study with UX designers (n = 49), they applied a 5-point semantic Here we focus on the following subsidiary research questions: differential scale from “clearly violated” to “clearly applied.” We combined their approach for heuristic evaluation with our guiding • RQ 2.a: To what extent do NN games comply with contem- questions to create a 3-point coding scheme for each guideline. In a porary design guidelines for human-AI interaction? nutshell, this coding scheme is used to decide whether a game (A) • RQ 2.b: Using the design guidelines for human-AI interac- clearly applies, (B) partially applies, or (C) violates the guideline. tion, how can NN games be differentiated according to their For example, for Guideline 1 “Make clear what the system can do” characteristics (see Table 1) and in comparison with other we added the guiding question “How does the game make clear AI-infused products? what the NN can do?” and the following three codes: (A) Makes the NN’s capabilities known; (B) Makes part of NN’s capabilities 5.1 Methods known; and (C) Does not make the capabilities known at all. With For inferring what NN games can tell us about human-AI interac- this coding scheme, the researchers were able to clearly label in tion, we used the human-AI design guidelines proposed by Amershi the context of a specific guideline and NN games. Additionally, et al. [6]. It is the most recent and comprehensive manner in which the 3-point scale is much more suitable for qualitative/consensus the design for human-AI interaction is documented thus far by coding. We aimed to describe rather than only rate or score the the HCI community. As discussed above (Section 2.2), currently games to extract meaningful insights, hence why we defined the no equivalent guidelines exist specifically for games. There are 3-point coding scheme for each guideline in accordance with the 18 guidelines in [6] in total. They are grouped according to when associated guiding question. Table 2 shows the resulting codes. the user is interacting with the AI: 1) initially, 2) during, 3) when 5.1.3 Step 3: Analyzing NN Games. Two researchers applied the wrong, and 4) overtime. The analysis procedure in adapting these codes from Step 2 for the analysis of the 31 playable games from guidelines to the NN games involved a three-step process: Step 1. our corpus, the results of which are presented in Section 5.2. When defining guiding questions for NN games, Step 2 establishing codes analyzing each game, both researchers reviewed the data collected for analyzing games, and Step 3 analyzing the games with Step from Phase 1 (i.e., gameplay footage and written observations) and 1 & 2. Two researchers performed all steps in close coordination played the game. After reviewing this material, they independently and checked the outcomes of each step with the other authors for assigned codes for each guideline per game. Disagreements were verification and to reach consensus [37, 67]. resolved through discussion. After the coding results were agreed upon, scores were assigned 5.1.1 Step 1: Guiding Questions for NN games. Two researchers for cross-comparison with the characteristics found in Section 4 completed a detailed reading of each guideline to understand the (see Table 1). We assigned a score of 2 for the clearly applies codes guideline in the context of the original AI application examples (A), 1 for the partially applies codes (B), and 0 for the violation codes (e.g., recommender systems, activity devices, etc.). Then, both re- (C). For the comparison of the guidelines (i.e., comparing G1 with searchers explored how the guidelines may be applied in the con- G2, etc.), we then took the sum of scores of these resulting scores text of games. This process led to the definition of a question for per guidelines and divided them by the maximum score per guide- each guideline to help orient the researchers when observing the line to calculate the % per guideline. For the cross-comparisons of games in Step 3. For example, for Guideline 15 “encourage gran- the characteristics (on the interaction metaphors, visibility, devel- ular feedback,” we defined the question, “how do players indicate oper, etc.), we normalized the sum of scores as we have a different 9 CHI ’21, May 8–13, 2021, Yokohama, Japan Zhu, Villareale and Javvaji, et al.

      While these games are doing well in most cases by communicat-

    ing the capabilities of the NN during the initial interaction, they are doing poorly regarding communicating the limitations of NNs     to the players. Guideline 2 “makes clear how well the system can     do what it can do” had 27 games (85%) not making the limitations    of the NNs known to the player at all — the second-highest number

    of violations among the guidelines. This may be justified in cases    playing against the NNs (i.e., NN as Competitor) to avoid exploiting

    the system to win the game. Or in cases where the NN is the focus of gameplay (i.e., NN as Apprentice), limitations become part of the     puzzle and understanding through play.        5.2.2 During. The during guidelines (G3–G6) refer to player in-     teraction at any given gameplay loop. Guideline 3 “time services     based on context” and 5 “match relevant social norms” had the most reported applications: 21 games’ (68%) NNs provided timely     service, and 25 games (81%) were labeled as an expected interaction    with the NN. The clear application of G3 and G5 suggests how              designers carefully considered how the NN can help assist with Figure 6: The distribution of compliance codes (A, B, and C) the flow of the player experience by suggesting interactions and providing immediate consequences that are consistent with player per guidelines in count (n = 31) and %. expectations. Games also performed well in complying with Guideline 4 “show number of games per characteristic and then took the average of contextually relevant information,” as none of the games violated the normalized sum of scores to calculate the % per characteristic. showing contextually relevant information. In the majority of these In this section, we do not report the results on the characteristics games, the focus of the gameplay centers on changing or affecting of NN input and NN output as they do not provide any meaningful the output of the NN. We observed that the games provide infor- insights. We further omitted categories with a low number of cases mation in regards to how the NN was responding to or utilizing (e.g., there is only one adventure game). Finally, for the comparison player actions. A common approach is the use of UI elements (e.g., of our outcomes with other AI-infused products reported by Amer- NN performance stats increasing) accompanied by a continuous shi et al. [6], we considered the major patterns in the aggregate animation or visual change to enable players to observe and then data for both (i.e., NN games vs. all other AI-infused products). inform their next gameplay action. For example, in iNNk [135], players can observe the exposed confidence meter that displays as 5.2 Results a percentage under the NPC character in relation to their drawing, thus building a better mental model of how to subvert the NN in Figure 6 shows the results of applying the adapted guidelines in future drawings. the context of NN games. Below, we discuss per guideline category Other games provide additional visual information during this (i.e., initially, during, and overtime) the results in more detail. Note animation, such as an overlay of the entire NN population. Pro- that the reported % here are based on the % of games that were viding this extra information allows players to assess the NN’s coded as A = clearly applied, B = partially applied, and C = violation. progress and observe both successful and failed attempts. For ex- Following this, we compare the application of the guidelines across ample, in Evolution [115], players are able to see an overlay of all the characteristics discussed in Section 4, and with other AI-infused the NN attempts at training the same creature simultaneously. This products. The reported results here are based on the % derived from additional animation shows the highest and the lowest-performing the normalized scores. creatures training at the same time, which enables players to better 5.2.1 Initially. The initially guidelines (G1–G2) refer to player in- understand the progress as a whole and determine if the creature teraction prior to gameplay. During the initial interaction of these needs to be tweaked further. games, Guideline 1 “makes clear what the system can do” has the While games are doing well to display relevant information re- most reported applications: 23 games (74%) made either full or part garding gameplay, Guideline 6 “mitigate social biases” had 17 games of the NN’s capabilities known to the player prior to gameplay. (55%) with a violation. Examination of these instances revealed that These games helped the player understand the capabilities in a it was unclear if the NN mitigates undesirable social stereotypes variety of ways, such as tutorials, intro screens, or developer notes and biases. Additionally, we reported half of such games as a vio- prior to gameplay. Some were not as direct and did not provide such lation because such biases may emerge directly from players and textual content, but still made the capabilities known by immedi- are not mitigated. For example, games that made the NN the focus ately providing the player with an output to observe. For example, of gameplay (i.e., training or evolving using an NN) provides a in How to Train your Snake [108], players start the game to find new NN to play with. Therefore, players may steer the NN with the NN already training the snakes to move and find food, thus, their own personal preferences. Stereotypes and biases can emerge showcasing an immediate result. through the player’s direction and reinforcement. 10 Player-AI Interaction: What Neural Network Games Reveal About AI as Play CHI ’21, May 8–13, 2021, Yokohama, Japan

5.2.3 Overtime. overtime    The guidelines (G12–G18) refer to player    interaction with the NN over a longer period of time. Guideline 13       “learn from user behavior” and 14 “update and adapt cautiously” are     performing well. Based on the player’s behavior, 17 games (55%) NN      personalize the experience, and 26 games (84%) do not disrupt the      gameplay experience when the NN changes its behavior. Guideline

     

  15 “encourage granular user feedback” is doing well with 14 games       (45%) that allow players to directly indicate their preferences to        the NN during gameplay. Games are performing moderately in      regards to Guideline 17 “provide global controls” with 15 games      (48%) providing full or partial global control to adjust how the NN       behaves, respectively.       

A common approach to provide players more agency is the ability       to adjust the NN through parameters or the environment it interacts     in. We observed these in setting menus, or in other cases, directly    in the core GUI. For example, in NERO, the game allows players  to edit the reward structure of the NNs during a training session. Further, players are able to edit the training environment, such as (a) Developer categories. adding barriers and placing particular enemies to directly influence ­€‚     the NNs training.      Guideline G16 “convey the consequences of user actions related      to NN” and G18 “clear notifications of changes in NN capability”      had the most violations in this category: 22 games (71%) did not pro-       vide any feedback conveying how players’ actions will impact the      NN, and 28 games (90%) did not notify of any changes or updated to 

       the NN capabilities. Further, Guideline G12 “remember recent trans-       actions” was another difficult guideline to apply in games. In these        cases, 10 games (32%) leveraged the history of the player actions        to generating content tailored to the player or a more challenging       experience but did not allow the users to access that memory. Only        2 games (6%) make this history useful to the player as other AI        products do (e.g., navigation products, search engines) and allowed       the users to reference that history.    

   5.2.4 Comparison. Consistent with the analysis in Section 4, NN- Specific games (60%) outperform NN-Limited (50%) and NN-Agnostic    (40%) as shown in Figure 7b. G1 “make clear what the system can (b) UI Visibility categories. do” is understandably what separates NN-Agnostic from the other     two UI categories and G17 “provide global controls” is the most       differentiating guideline for visibility. Consistent with the analysis     in Section 4, we find that the simulation genre complies better (63%)          compared to all others (32–58%), due to G4, G15, and G17. Also, not   unexpectedly, NN games with online learning score higher (57%)             than those with offline learning (39%), specifically in the guidelines           in the overtime category.        With the interaction metaphors (see Figure 7c), we see that NN as       Apprentice scores the highest (60%) with here too G17 as the most       differentiating guideline, followed by G13 “learn from user behavior.”        Most apparent is that NN as Competitor (35%) scores the lowest,                which is not unexpected given that there are gameplay reasons      to not abide by the guidelines. Of note is that NN as Teammate       do reasonably well on G12 “remember recent transactions,” which    makes sense given that players need to build trust with the NN to    work with them. We further find that the AAA games score lower (38%) compared (c) Interaction metaphor categories. to the indie (56%) and research (49%) games (see Figure 7a). This is mostly due that 6 out of 9 AAA games are classified as NN Figure 7: A comparison of guidelines across different char- 11 acteristics. CHI ’21, May 8–13, 2021, Yokohama, Japan Zhu, Villareale and Javvaji, et al. as Competitor games. It is interesting that research games score we removed the “when wrong” category of guidelines in our anal- high on G3 and G14, while indie games score high on G13, G15, ysis, we noticed many interesting uses of failure in the player-AI and G17. It indicates that the research games are more focused on interactions in the NN games. Overall, failures in games are used integrating the NN into the game flow, while indie games are more productively. In many NN games, failure is used to motivate the experimenting in how players can interact with the NN. player to continue improving their AI. This is consistent with the Aside from contrasting the outcomes with the game charac- use of failure in game design [44]. teristics, we compared our guideline outcomes with the reported Most notably, we noticed that failure is highlighted, instead of outcomes of AI-infused products by Amershi et al. [6] to see what minimized, from the beginning. When the player first starts the key similarities and differences exist. We find that making the limi- game, the snake they control dies in How to Train your Snake [108], tations known (G2) is poorly addressed in both NN games as other or robots run to the edges of the arena instead of approaching AI-infused products. AI-infused products do tend to perform better the enemy in NERO [128]. This is a design pattern not commonly on G12, which is not surprising given that this a feature that is observed in non-AI games. We believe that the game designers critical for many recommender types of products. NN games, how- used this device in order to re-frame players from “problem makers” ever, generally perform better on G15 and G17, suggesting that NN to the AI into “problem fixers”, making controlling the NN a less games facilitate granular user feedback through direct interaction intimidating task. We noticed that this design strategy was used and provide more controls to their users, respectively. particularly in the NN as Apprentice games. 5.3.3 Playing with Different forms of Human-AI Interaction. Our 5.3 Discussion analysis highlights the differences between various types of games, By leveraging the design guidelines for human-AI interaction by most notably in games using NN as Competitor. Compared to the Amershi et al. [6], our aim was to explore what NN games tell other interaction metaphors, they violate many more guidelines. us about designing human-AI interaction. In this process, we also Through close examination, however, many of the violations are learned more about NN games themselves and were able to verify motivated and intentionally designed to be so. As argued above, and confirm the findings reported in Section 4. In fact, after applying when players compete with the AI, common design assumptions the guidelines, we see more specifically what NN games do and how such as transparency and explainability do not apply directly and they differ from other AI-infused products. Here we describe the needs to be re-examined and adapted for the context. main takeaways from analyzing NN games with general human-AI Literature on human-AI interaction primarily focuses on the interaction guidelines, especially in terms of how the notion of AI paradigm of AI as a tool/augmentation to the user. However, an as play can be used to inform human-AI interaction. increasing number of AI-infused products fall outside this assump- tion. For example, AI in cybersecurity applications explores AI as 5.3.1 Learning AI and Its Limitations Through Play. Clearly com- an adversary, and AI products with high privacy concerns (e.g., in municating the affordances and limitations of a system is along- healthcare) require a different way to think about transparency. In established design principle [61] and confirmed in human-AI inter- these cases, we believe NN games can offer many design insights action [6, 19, 26]. While the NN games do overall relatively well in and cases for inspiration. We also need to expand current human-AI communicating what the NN does (G1), they do not in communi- interaction guidelines so that it can encompass elements essential cating what it does not (G2). Other AI-infused products performed to play, such as engagement, flow, and fun. equally poorly on G2 [6]. Through our close analysis of the games as well as developers’ notes, we noticed that NN games purposely 6 THE FUTURE OF PLAYER-AI ignore G2 because the point is to learn the limitations through play. INTERACTION The developers of [125, 126, 142] all explicitly describe their games as experiences to explore or test the limits of the NN. For Through the specific case of NN games, we have demonstrated the example, Semantris asked the players to “play around ” and “see richness of player-AI interaction as a research topic. In this section, what references the AI understands best.” In such games, the NN we intend to situate player-AI interaction in the broader context cannot directly communicate the limitations of the system upfront of related research areas. We then distill the design implications of without jeopardizing the gameplay — because discovering the sys- our study for games and for UX/HCI. tem’s limitations is the gameplay. By framing the discovery of NN 6.1 Establishing Player-AI Interaction limitations as play, many NN games are able to foster a sense of curiosity, discovery, and accomplishment in players. The study of players, game design, and game technology (here we We also saw creative ways of making AI’s limitations part of focus on AI) are three main pillars in games research (Figure 8). the rule of play. Most notably, we saw a number of projects with At the intersection of Games and AI, there is a well-established online-learning adopted by the idle game genre, which has a built- research community of game AI. In fact, game AI has often been in play-wait-play cycle [5]. This game feature makes the technical driving the world of AI research, in which the most advanced forms requirement of waiting for the NN to finish the training part of of AI algorithms are often first developed and tested in games [70]. the expected experience and uses the idle game’s reward system to Results in domains such as StarCraft 2 and Quake III now frequently incentivize players to return to the game after waiting. appear in the most prestigious journals [42, 88]. Combining the study of the player with game design, there is the research area 5.3.2 Highlighting Failure as Part of Play. As argued above, failures of player experience. However, the intersection between players in games are more complex than in other AI-infused systems. While and AI has so far been relatively under-explored. Until recently, 12 Player-AI Interaction: What Neural Network Games Reveal About AI as Play CHI ’21, May 8–13, 2021, Yokohama, Japan

challenging to design. Discovery-based learning offers players the opportunity to play around with the NN at their own pace and observe the consequences of their actions on the NN and the game  world. However, most NN games in our dataset offered very little scaffold, making it difficult for players without a technical back-    ground to succeed. We suggest that UX designers use enhanced discovery-based learning and provide feedback, worked examples, scaffolding, and elicited explanations to further assist their users.     Extend the invitation to play. Finally, for researchers and de- signers interested in exploring new forms of human-AI interaction, we believe offering users an invitation to play can unleash their imagination and empower them to explore new ways to interact with even the same technology. As we can see from Hey Robot!, the Figure 8: Research areas of games. magic circle of play turns the smart speaker user from the seeker of information to the provider. The voice assistant’s inability to understand user command/intent is transformed from failure to real players are typically only brought in at the end of game AI perform to the source of fun. research as the means to evaluate the effectiveness of the algorithms. Carving out the topic of player-AI interaction will fill in this gap. In this paper, we present the first empirical study on this gap that we call player-AI interaction. In addition to establishing games as a 7 LIMITATIONS rich domain for human-AI interaction, our analyses contribute in- We recognize that several limitations impact the scope of our work. sights into how we can classify AI-based games based on interaction First, our study considered only NN games with specific tags on metaphors (i.e., apprentice, designer, teammate, and competitor) popular game platforms and textbooks. Our goal was to find a and the visibility of the AI system as part of the core UI (i.e., spe- representative sample of salient NN games, not a comprehensive cific, limited, and agnostic). We further adapted design guidelines list. However, we acknowledge that we may have missed some for human-AI interaction to the context of games, which can be relevant games. Second, we omitted the failure-related human-AI useful for others when designing their AI-based games. However, interaction guidelines. While we offered some related observations, we encourage the community to further scrutinize these design further research is needed to study failure in games, separate from guidelines as our work indicates that AI-based games are different failure outside the context of play, and how it relates to player- from AI-infused products, for example, with regards to the role of AI interaction. Third, we used 3-point codes in our qualitative failure, and to work towards more formalized and evaluated design analysis. While it is appropriate for our purpose of the analysis, guidelines for player-human interaction. future research can adopt a more fine-grained analysis that can better distinguish the “violations” and “does not apply.” Finally, we 6.2 Encouraging Playing with AI do not have necessary (sufficient) technical information about how For game designers, UX designers, and HCI researchers interested NNs are used in many commercial games. The blackbox nature of in human-AI interaction, one of our key takeaways is that reframing game AI limited our ability to conduct in-depth analyses of specific AI as play offers a useful design approach, complementary tothe features of games (see “Multiple AI?” in Table 1). current instrument-based views on AI. As as play can offer new human-AI interaction design space where the users can tinker, explore, and experiment with AI. We propose the following design considerations: 8 CONCLUSION Use flow to structure the learning curve of human-AI in- We introduced the term player-AI interaction to study how human teraction. For many users, interaction with AI can be overwhelm- players interact with AI in the context of games. While we intend ing, especially when they encounter unexpected output from the to situate it in the broader context of human-AI interaction, we also algorithm. One important lesson from our study is that the concept highlight the unique opportunities and challenges presented by re- of flow18 [ ], widely used to balance game difficulty and player en- framing AI as play. Through a systematic search of existing neural gagement over time, can be useful to design human-AI interaction. network games, we conducted two deep qualitative analyses. In the In Section 5.3, we discussed that NN games should better support first one, we analyzed the common metaphors that structure the players for “Learning AI and Its Limitations” and make experimen- player-AI interaction and how much the NNs are foregrounded in tation with AI more acceptable by “Highlighting Failure as Part of the core UI. In the second analysis, we adapted the current human- Play.” The use of flow can be useful to structure how to gradually AI interaction guidelines to player-AI interaction and applied them expose users to different AI features (see also [17]). to identify the strengths and weaknesses of NN games. Based on Incorporate enhanced discovery-based learning. Many games our findings, we proposed that the notion of AI as play, which in our analysis, especially simulation games, offer discovery-based is an alternative to the current paradigm of performance-centric learning [4] with mixed success. Since players come with different human-AI interaction, can contribute to both game design and HCI background knowledge and needs, explicit instruction for AI is communities. 13 CHI ’21, May 8–13, 2021, Yokohama, Japan Zhu, Villareale and Javvaji, et al.

ACKNOWLEDGMENTS [21] Heather Desurvire and Charlotte Wiberg. 2009. Game usability heuristics (PLAY) for evaluating and designing better games: The next iteration. In International This work is partially supported by the National Science Foundation conference on online communities and social computing. Springer, 557–566. (NSF) under Grant Number IIS-1816470 and a DFF-Danish ERC- [22] Graham Dove and Anne-Laure Fayard. 2020. Monsters, Metaphors, and Ma- programme grant (9145-00003B). The authors would like to thank chine Learning. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–17. all past and current members of the project, especially Evan Freed [23] Graham Dove, Kim Halskov, Jodi Forlizzi, and John Zimmerman. 2017. UX and Anna Acosta for assistance in collecting initial data. We want design innovation: Challenges for working with machine learning as a design material. In Proceedings of the 2017 chi conference on human factors in computing to thank those who suggested additional games on . Finally, systems. 278–288. we thank Robert C. Gray for assistance in editing this paper. [24] Richard Evans and Emily Short. 2013. Versu—a simulationist storytelling system. IEEE Transactions on Computational Intelligence and AI in Games 6, 2 (2013), 113–130. [25] Laura Beth Fulton, Ja Young Lee, Qian Wang, Zhendong Yuan, Jessica Hammer, REFERENCES and Adam Perer. 2020. Getting Playful with Explainable AI: Games with a [1] 2020. Steam (service) — , The Free Encyclopedia. https://en.wikipedia. Purpose to Improve Human Understanding of AI. In Extended Abstracts of the org/w/index.php?title=Steam_(service)&oldid=978516977 [Online]. 2020 CHI Conference on Human Factors in Computing Systems. 1–8. [2] Vero Vanden Abeele, Katta Spiel, Lennart Nacke, Daniel Johnson, and Kathrin [26] Anushay Furqan, Chelsea Myers, and Jichen Zhu. 2017. Learnability through Gerling. 2020. Development and validation of the player experience inventory: adaptive discovery tools in voice user interfaces. In Proceedings of the 2017 A scale to measure player experiences at the level of functional and psychosocial CHI Conference Extended Abstracts on Human Factors in Computing Systems. consequences. International Journal of Human-Computer Studies 135 (2020), 1617–1623. 102370. [27] Firaxis Games. 2016. Sid Meier’s Civilization® VI. https://civilization.com/ [3] Philip E Agre. 1997. Computation and human experience. Cambridge University [28] Katy Ilonka Gero, Zahra Ashktorab, Casey Dugan, Qian Pan, James Johnson, Press. Werner Geyer, Maria Ruiz, Sarah Miller, David R Millen, Murray Campbell, et al. [4] Louis Alfieri, Patricia J Brooks, Naomi J Aldrich, and Harriet R Tenenbaum. 2020. Mental Models of AI Agents in a Cooperative Game Setting. In Proceedings 2011. Does discovery-based instruction enhance learning? Journal of educational of the 2020 CHI Conference on Human Factors in Computing Systems. 1–12. psychology 103, 1 (2011), 1. [29] Daniel Gomme and Richard Bartle. 2020. Strategy Games : The Components of [5] Sultan A Alharthi, Olaa Alsaedi, Zachary O Toups, Joshua Tanenbaum, and A Worthy Opponent. In Proceedings of 2020 Foundation of Digital Games. Jessica Hammer. 2018. Playing to wait: A taxonomy of idle games. In Proceedings [30] Stephen Grand, Dave Cliff, and Anil Malhotra. 1997. Creatures: Artificial life of the 2018 CHI Conference on Human Factors in Computing Systems. 1–15. autonomous software agents for home entertainment. In Proceedings of the first [6] Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira international conference on Autonomous agents. 22–29. Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N Bennett, Kori Inkpen, [31] Emmanuel Guardiola. 2016. The gameplay loop: a player activity model for et al. 2019. Guidelines for human-ai interaction. In Proceedings of the 2019 CHI game design and analysis. In Proceedings of the 13th International Conference on Conference on Human Factors in Computing Systems. 1–13. Advances in Computer Entertainment Technology. 1–7. [7] Craig G Anderson, Jen Dalsen, Vishesh Kumar, Matthew Berland, and Constance [32] Matthew Guzdial, Nicholas Liao, Jonathan Chen, Shao-Yu Chen, Shukan Shah, Steinkuehler. 2018. Failing up: How failure in a game environment promotes Vishwa Shah, Joshua Reno, Gillian Smith, and Mark O Riedl. 2019. Friend, learning through discourse. Thinking Skills and Creativity 30 (2018), 135–144. collaborator, student, manager: How design of an ai-driven game level editor [8] Gagan Bansal, Besmira Nushi, Ece Kamar, Walter S Lasecki, Daniel S Weld, and affects creators. In Proceedings of the 2019 CHI Conference on Human Factors in Eric Horvitz. 2019. Beyond accuracy: The role of mental models in human-AI Computing Systems. 1–13. team performance. In Proceedings of the AAAI Conference on Human Computation [33] Erin J Hastings, Ratan K Guha, and Kenneth O Stanley. 2009. Evolving content and Crowdsourcing, Vol. 7. 2–11. in the galactic arms race video game. In 2009 IEEE Symposium on Computational [9] Reuben Binns, Max Van Kleek, Michael Veale, Ulrik Lyngs, Jun Zhao, and Nigel Intelligence and Games. IEEE, 241–248. Shadbolt. 2018. ’It’s Reducing a Human Being to a Percentage’ Perceptions of [34] Stephanie Heintz and Effie Lai-Chong Law. 2015. The game genre map:a Justice in Algorithmic Decisions. In Proceedings of the 2018 Chi conference on revised game classification. In Proceedings of the 2015 Annual Symposium on human factors in computing systems. 1–14. Computer-Human Interaction in Play. 175–184. [10] Jay David Bolter. 2016. Remediation. The international encyclopedia of commu- [35] Jonathan L Herlocker, Joseph A Konstan, and John Riedl. 2000. Explaining col- nication theory and philosophy (2016), 1–11. laborative filtering recommendations. In Proceedings of the 2000 ACM conference [11] Jeanne H Brockmyer, Christine M Fox, Kathleen A Curtiss, Evan McBroom, on Computer supported cooperative work. 241–250. Kimberly M Burkhart, and Jacquelyn N Pidruzny. 2009. The development [36] Clara E Hill. 2012. Consensual qualitative research: A practical resource for of the Game Engagement Questionnaire: A measure of engagement in video investigating social science phenomena. American Psychological Association. game-playing. Journal of Experimental Social Psychology 45, 4 (2009), 624–634. [37] Clara E Hill, Sarah Knox, Barbara J Thompson, Elizabeth Nutt Williams, [12] Joanna J Bryson. 2010. Robots should be slaves. Close Engagements with Artificial Shirley A Hess, and Nicholas Ladany. 2005. Consensual qualitative research: Companions: Key social, psychological, ethical and design issues (2010), 63–74. An update. Journal of counseling psychology 52, 2 (2005), 196. [13] Gordon Calleja. 2007. Digital game involvement: A conceptual model. Games [38] Lars Erik Holmquist. 2017. Intelligence on tap: artificial intelligence as a new and culture 2, 3 (2007), 236–260. design material. interactions 24, 4 (2017), 28–33. [14] Murray Campbell, A Joseph Hoane Jr, and Feng-hsiung Hsu. 2002. Deep blue. [39] Kristina Höök. 2000. Steps to take before intelligent user interfaces become real. Artificial intelligence 134, 1-2 (2002), 57–83. Interacting with computers 12, 4 (2000), 409–426. [15] Gabriele Cimolino, Sam Lee, Quentin Petraroia, and TC Nicholas Graham. 2019. [40] Amy K Hoover, William Cachia, Antonios Liapis, and Georgios N Yannakakis. Oui, Chef!!: Supervised Learning for Novel Gameplay with Believable AI. In 2015. Audioinspace: Exploring the creative fusion of generative audio, visuals Extended Abstracts of the Annual Symposium on Computer-Human Interaction in and gameplay. In International Conference on Evolutionary and Biologically Play Companion Extended Abstracts. 241–246. Inspired Music and Art. Springer, 101–112. [16] Michael Cook, Mirjam Eladhari, Andy Nealen, Mike Treanor, Eddy Boxerman, [41] Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings Alex Jaffe, Paul Sottosanti, and Steve Swink. 2016. PCG-Based Game Design of the SIGCHI conference on Human Factors in Computing Systems. 159–166. Patterns. arXiv preprint arXiv:1610.03138 (2016). [42] Max Jaderberg, Wojciech M Czarnecki, Iain Dunning, Luke Marris, Guy Lever, [17] Christian Arzate Cruz and Jorge Adolfo Ramirez Uresti. 2017. Player-centered Antonio Garcia Castaneda, Charles Beattie, Neil C Rabinowitz, Ari S Morcos, game AI from a flow perspective: Towards a better understanding of past trends Avraham Ruderman, et al. 2019. Human-level performance in 3D multiplayer and future directions. Entertainment Computing 20 (2017), 11–24. games with population-based reinforcement learning. Science 364, 6443 (2019), [18] Mihaly Csikszentmihalyi. 1990. Flow: The psychology of optimal experience. 859–865. Vol. 1990. Harper & Row New York. [43] Daniel Jallov, Sebastian Risi, and Julian Togelius. 2016. EvoCommander: A [19] Maartje De Graaf, Somaya Ben Allouch, and Jan Van Diik. 2017. Why do novel game based on evolving and switching between artificial brains. IEEE they refuse to use my robot?: Reasons for non-use derived from a long-term Transactions on Computational Intelligence and AI in Games 9, 2 (2016), 181–191. home study. In 2017 12th ACM/IEEE International Conference on Human-Robot [44] Jesper Juul. 2013. The art of failure: An essay on the pain of playing video games. Interaction (HRI. IEEE, 224–233. MIT press. [20] Alena Denisova and Paul Cairns. 2015. The Placebo Effect in Digital Games: [45] George Lakoff and Mark Johnson. 2008. Metaphors we live by. University of Phantom Perception of Adaptive Artificial Intelligence. In Proceedings of the 2015 Chicago press. Annual Symposium on Computer-Human Interaction in Play (London, United [46] George Lakoff and Mark Turner. 2009. More than cool reason: A field guide to Kingdom) (CHI PLAY ’15). Association for Computing Machinery, New York, poetic metaphor. University of Chicago press. NY, USA, 23–33. https://doi.org/10.1145/2793107.2793109 14 Player-AI Interaction: What Neural Network Games Reveal About AI as Play CHI ’21, May 8–13, 2021, Yokohama, Japan

[47] Effie L-C Law, Florian Brühlmann, and Elisa D Mekler. 2018. Systematic review [71] Sebastian Risi and Julian Togelius. 2015. Neuroevolution in games: State of the and validation of the game experience questionnaire (geq)-implications for art and open challenges. IEEE Transactions on Computational Intelligence and AI citation and reporting practice. In Proceedings of the 2018 Annual Symposium on in Games 9, 1 (2015), 25–41. Computer-Human Interaction in Play. 257–270. [72] Helen Sharp, Yvonne Rogers, and Jennifer Preece. 2019. Interaction Design: [48] Dan Lessin and Sebastian Risi. 2015. Darwin’s Avatars: A Novel Combination beyond human-computer interaction (5th ed.). John Wiley & Sons. of Gameplay and Procedural Content Generation. In Proceedings of the 2015 [73] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Annual Conference on Genetic and Evolutionary Computation. 329–336. Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneer- [49] Todd Lubart. 2005. How can computers be partners in the creative process: shelvam, Marc Lanctot, et al. 2016. Mastering the game of Go with deep neural classification and commentary on the special issue. International Journal of networks and tree search. nature 529, 7587 (2016), 484–489. Human-Computer Studies 63, 4-5 (2005), 365–369. [74] Brent Smith and Greg Linden. 2017. Two decades of recommender systems at [50] Andrés Lucero, Jussi Holopainen, Elina Ollila, Riku Suomela, and Evangelos Amazon. com. Ieee internet computing 21, 3 (2017), 12–18. Karapanos. 2013. The playful experiences (PLEX) framework as a guide for [75] Sam Snodgrass and Santiago Ontañón. 2014. Experiments in map generation expert evaluation. In Proceedings of the 6th International Conference on Designing using Markov chains.. In FDG. Pleasurable Products and Interfaces. 221–230. [76] Kenneth O Stanley. 2007. Compositional pattern producing networks: A novel [51] Ewa Luger and Abigail Sellen. 2016. "Like Having a Really Bad PA": The Gulf abstraction of development. Genetic programming and evolvable machines 8, 2 between User Expectation and Experience of Conversational Agents. In Pro- (2007), 131–162. ceedings of the 2016 CHI Conference on Human Factors in Computing Systems [77] Kenneth O Stanley, Bobby D Bryant, and Risto Miikkulainen. 2005. Evolving (San Jose, California, USA) (CHI ’16). Association for Computing Machinery, neural network agents in the NERO video game. Proc. IEEE (2005), 182–189. New York, NY, USA, 5286–5297. https://doi.org/10.1145/2858036.2858288 [78] Aaron Steinfeld, Terrence Fong, David Kaber, Michael Lewis, Jean Scholtz, [52] Michael Mateas. 1999. An Oz-centric review of interactive drama and believable Alan Schultz, and Michael Goodrich. 2006. Common metrics for human-robot agents. In Artificial intelligence today. Springer, 297–328. interaction. In Proceedings of the 1st ACM SIGCHI/SIGART conference on Human- [53] Michael Mateas. 2003. Expressive AI: A semiotic analysis of machinic affor- robot interaction. 33–40. dances. In 3rd Conference on Computational Semiotics for Games and New Media. [79] Jennifer Sukis. 2019. Ai design & practices guidelines (a review). https: 58. //medium.com/design-ibm/ai-design-guidelines-e06f7e92d864 [54] Michael Mateas and Andrew Stern. 2003. Façade: An experiment in building a [80] Ja-Young Sung, Lan Guo, Rebecca E Grinter, and Henrik I Christensen. 2007. fully-realized interactive drama. In Game developers conference, Vol. 2. 4–8. “My Roomba is Rambo”: intimate home appliances. In International conference [55] Nora McDonald, Sarita Schoenebeck, and Andrea Forte. 2019. Reliability and on ubiquitous computing. Springer, 145–162. inter-rater reliability in qualitative research: Norms and guidelines for CSCW [81] Penelope Sweetser and Daniel Johnson. 2004. Player-centered game environ- and HCI practice. Proceedings of the ACM on Human-Computer Interaction 3, ments: Assessing player opinions, experiences, and issues. In International CSCW (2019), 1–23. Conference on Entertainment Computing. Springer, 321–332. [56] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis [82] Dean Takahashi. 2018. How Microsoft’s Turn 10 fashioned the A.I. for cars Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing atari with in Forza Motorsport 5 (interview). https://venturebeat.com/2013/11/06/how- deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013). microsofts-turn-10-fashioned-the-ai-for-cars-in-forza-motorsport-5- [57] Nick Montfort. 2005. Twisty Little Passages: an approach to . interview/ Mit Press. [83] Nava Tintarev and Judith Masthoff. 2007. A survey of explanations in recom- [58] Chelsea Myers, Anushay Furqan, Jessica Nebolsky, Karina Caro, and Jichen Zhu. mender systems. In 2007 IEEE 23rd international conference on data engineering 2018. Patterns for how users overcome obstacles in voice user interfaces. In workshop. IEEE, 801–810. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. [84] Mike Treanor, Alexander Zook, Mirjam P Eladhari, Julian Togelius, Gillian 1–7. Smith, Michael Cook, Tommy Thompson, Brian Magerko, John Levine, and [59] Chelsea M. Myers, Jiachi Xie, and Jichen Zhu. 2020. A Game-Based Approach for Adam Smith. 2015. AI-based game design patterns. (2015). Helping Designers Learn Machine Learning Concepts. arXiv:arXiv:2009.05605 [85] Josep Valls-Vargas, Santiago Ontanón, and Jichen Zhu. 2015. Exploring player [60] Lennart Nacke, Anders Drachen, Kai Kuikkaniemi, Joerg Niesenhaus, Hannu J trace segmentation for dynamic play style prediction. In Proceedings of the AAAI Korhonen, Wouter M Hoogen, Karolien Poels, Wijnand A IJsselsteijn, and Conference on Artificial Intelligence and Interactive Digital Entertainment, Vol. 11. Yvonne AW De Kort. 2009. Playability and player experience research. In Pro- [86] Josep Valls-Vargas, Jichen Zhu, and Santiago Ontañón. 2017. Graph grammar- ceedings of digra 2009: Breaking new ground: Innovation in games, play, practice based controllable generation of puzzles for a learning game about parallel and theory. DiGRA. programming. In Proceedings of the 12th International Conference on the Founda- [61] Don Norman. 2013. The design of everyday things: Revised and expanded edition. tions of Digital Games. 1–10. Basic books. [87] Valve. 2008. Left 4 Dead. https://www.l4d.com [62] Santiago Ontanón, Gabriel Synnaeve, Alberto Uriarte, Florian Richoux, David [88] Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, An- Churchill, and Mike Preuss. 2013. A survey of real-time strategy game AI drew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, research and competition in StarCraft. IEEE Transactions on Computational Petko Georgiev, et al. 2019. Grandmaster level in StarCraft II using multi-agent Intelligence and AI in games 5, 4 (2013), 293–311. reinforcement learning. Nature 575, 7782 (2019), 350–354. [63] A. Oulasvirta, N. R. Dayama, M. Shiripour, M. John, and A. Karrenbauer. 2020. [89] Danding Wang, Qian Yang, Ashraf Abdul, and Brian Y Lim. 2019. Design- Combinatorial Optimization of Graphical User Interface Designs. Proc. IEEE ing theory-driven user-centric explainable AI. In Proceedings of the 2019 CHI 108, 3 (2020), 434–464. conference on human factors in computing systems. 1–15. [64] Derrick Pemberton, Zhiguo Lai, Lotus Li, Shitong Shen, Jue Wang, and Jessica [90] Joseph Weizenbaum. 1966. ELIZA—a computer program for the study of natural Hammer. 2019. AI or Nay-I? Making Moral Complexity More Accessible. In language communication between man and machine. Commun. ACM 9, 1 (1966), Extended Abstracts of the Annual Symposium on Computer-Human Interaction in 36–45. Play Companion Extended Abstracts (Barcelona, Spain) (CHI PLAY ’19 Extended [91] James Wexler. 2002. Artificial Intelligence in Games. Rochester: University of Abstracts). Association for Computing Machinery, New York, NY, USA, 281–286. Rochester (2002). https://doi.org/10.1145/3341215.3358248 [92] Terry Winograd. 2006. Shifting viewpoints: Artificial intelligence and human– [65] Steven Rabin. 2015. Game AI pro 2: collected wisdom of game AI professionals. computer interaction. Artificial intelligence 170, 18 (2006), 1256–1258. AK Peters/CRC Press. [93] Qian Yang, Nikola Banovic, and John Zimmerman. 2018. Mapping machine [66] Emilee Rader, Kelley Cotter, and Janghee Cho. 2018. Explanations as mecha- learning advances from hci research to reveal starting places for design innova- nisms for supporting algorithmic transparency. In Proceedings of the 2018 CHI tion. In Proceedings of the 2018 CHI Conference on Human Factors in Computing conference on human factors in computing systems. 1–13. Systems. 1–11. [67] K Andrew R Richards and Michael A Hemphill. 2018. A practical guide to [94] Qian Yang, Alex Scuito, John Zimmerman, Jodi Forlizzi, and Aaron Steinfeld. collaborative qualitative data analysis. Journal of Teaching in Physical Education 2018. Investigating how experienced UX designers effectively work with ma- 37, 2 (2018), 225–231. chine learning. In Proceedings of the 2018 Designing Interactive Systems Confer- [68] Mark Owen Riedl and Alexander Zook. 2013. AI for game production. In 2013 ence. 585–596. IEEE Conference on Computational Inteligence in Games (CIG). IEEE, 1–8. [95] Qian Yang, Aaron Steinfeld, Carolyn Rosé, and John Zimmerman. 2020. Re- [69] Sebastian Risi, Joel Lehman, David B D’Ambrosio, Ryan Hall, and Kenneth O Examining Whether, Why, and How Human-AI Interaction Is Uniquely Diffi- Stanley. 2015. Petalz: Search-based procedural content generation for the casual cult to Design. In Proceedings of the 2020 CHI Conference on Human Factors in gamer. IEEE Transactions on Computational Intelligence and AI in Games 8, 3 Computing Systems (Honolulu, HI, USA) (CHI ’20). Association for Computing (2015), 244–255. Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376301 [70] Sebastian Risi and Mike Preuss. 2020. From Chess and Atari to StarCraft and [96] Georgios N Yannakakis and Julian Togelius. 2018. Artificial intelligence and Beyond: How Game AI is Driving the World of AI. KI-Künstliche Intelligenz 34, games. Vol. 2. Springer. 1 (2020), 7–17. 15 CHI ’21, May 8–13, 2021, Yokohama, Japan Zhu, Villareale and Javvaji, et al.

[97] R Michael Young, Mark O Riedl, Mark Branly, Arnav Jhala, RJ Martin, and CJ [L137] The Sherrinford. 2018. Corral. Steam. https://store.steampowered.com/app/ Saretto. 2004. An architecture for integrating plan-based behavior generation 833400/Corral/ with interactive game environments. J. Game Dev. 1, 1 (2004), 1–29. [L138] Lykke Studios. 2019. AIvolution. Steam. https://store.steampowered.com/ [98] Jichen Zhu. 2009. Intentional systems and the artificial intelligence (ai) hermeneu- app/1014990/Aivolution/ tic network: Agency and intentionality in expressive computational systems. Ph.D. [L139] Microsoft Studios. 2013. Forza Motorsport 5. Xbox One. Dissertation. Georgia Institute of Technology. [L140] Mount Rouke Studios. 2020. Dr. Derk’s Mutant Battleground. Steam. https: [99] Jichen Zhu, Katelyn Alderfer, Anushay Furqan, Jessica Nebolsky, Bruce Char, //store.steampowered.com/app/1102370/Dr_Derks_Mutant_Battlegrounds/ Brian Smith, Jennifer Villareale, and Santiago Ontañón. 2019. Programming in [L141] vashgard. 2019. Idle Machine Learning Game. itch.io. https://vashgard.itch. game space: how to represent parallel programming concepts in an educational io/idle-machine-learning game. In Proceedings of the 14th International Conference on the Foundations of [L142] Nick Walton. 2019. AI Dungeon. Website. https://aidungeon.io/ Digital Games. 1–10. [100] Jichen Zhu and D Fox Harrell. 2008. Daydreaming with Intention: Scalable Blending-Based Imagining and Agency in Generative Interactive Narrative.. In AAAI Spring Symposium: Creative Intelligent Systems, Vol. 156. [101] Jichen Zhu, Antonios Liapis, Sebastian Risi, Rafael Bidarra, and G Michael Youngblood. 2018. Explainable AI for designers: A human-centered perspec- tive on mixed-initiative co-creation. In 2018 IEEE Conference on Computational Intelligence and Games (CIG). IEEE, 1–8. [102] Jichen Zhu and Santiago Ontanón. 2010. Towards Analogy-Based Story Gener- ation.. In ICCC. 75–84. [103] Jichen Zhu and Santiago Ontañón. 2013. Shall I compare thee to another story?—An empirical study of analogy-based story generation. IEEE Transactions on Computational Intelligence and AI in Games 6, 2 (2013), 216–227.

LUDOGRAPHY [L104] IBM Research AI. 2020. Guess the Word. Website. https://word-game-ui- dev.mybluemix.net/ [L105] Amy K. Hoover, William Cachia, Antonios Liapis, Georgios N. Yannakakis. 2015. Audioinspace. Research game. [L106] Electronic Arts. 2001. Black & White. [CD-ROM]. [L107] Jared Bagley. 2018. The Abbattoir Intergrade. itch.io. https://cabrill.itch.io/ai [L108] Bewelge. 2017. How to Train Your Snake. itch.io. https://bewelge.itch.io/how- to-train-your-snake [L109] Buddhaman. 2018. Football Evo. itch.io. https://buddhaman.itch.io/football- evo [L110] Noah Burkholder. 2020. Bird By Example. itch.io. https://noahburkholder.itch. io/bird-by-example [L111] Gabriele Cimolino. 2018. Oui Chef!! itch.io. https://anna-mae.itch.io/oui-chef [L112] Codemasters. 2000. Colin McRae Rally 2.0. Playstation. [L113] Dan Lessin, Sebastian Risi. 2015. Darwin’s Avatars. Research game. [L114] Daniel Jallov, Sebastian Risi, Julian Togelius. 2016. EvoCommander. Research game. [L115] Keiwan Donyagard. 2019. Evolution. itch.io. https://keiwan.itch.io/evolution [L116] DopplerFrog. 2015. Gridworld. Steam. https://store.steampowered.com/app/ 396890/Gridworld/ [L117] Grant Ferguson. 2019. Machine Learning Arena. Research game. [L118] FinchBeak. 2015. Petalz. Facebook. [L119] Evolutionary Games. 2014. Galactic Arms Race. Steam. https://store. steampowered.com/app/249610/Galactic_Arms_Race/ [L120] Everybody House Games. 2019. Hey Robot. Website. https:// everybodyhousegames.com/heyrobot.html [L121] Gas Powered Games. 2010. Supreme Commander 2. Steam. https://store. steampowered.com/app/40100/Supreme_Commander_2/ [L122] Positech Games. 2013. Democracy 3. Steam. https://store.steampowered.com/ app/245470/Democracy_3/ [L123] Temple Gates Games. 2017. Race for the Galaxy. Steam. https://store. steampowered.com/app/893840/Roll_for_the_Galaxy/ [L124] Temple Gates Games. 2020. Roll for the Galaxy. Steam. https://store. steampowered.com/app/893840/Roll_for_the_Galaxy/ [L125] . 2016. Quick, Draw! Website. https://quickdraw.withgoogle.com/ [L126] Google. 2018. Semantris. Website. https://research.google.com/semantris/ [L127] Steve Grand. 1996. Creatures. [CD-ROM]. [L128] Kenneth O Stanley, Bobby D Bryant, Risto Miikkulainen. 2005. NERO. Website. http://nn.cs.utexas.edu/nero/, [L129] Nils Krautkremer. 2020. evolution for beginners. Steam. https://store. steampowered.com/app/1276350/evolution_for_beginners/ [L130] MiorSoft. 2020. 2D Walk Evolution. itch.io. https://miorsoft.itch.io/2d-walk- evolution [L131] MotoGP. 2019. MotoGP19. Steam. https://2019.motogpvideogame.com/ [L132] Nival. 2017. Blitzkrieg 3. Steam. https://store.steampowered.com/app/235380/ Blitzkrieg_3/ [L133] Oxygenium. 2020. Neat Race. itch.io. https://oxygenium.itch.io/neat-race [L134] Peter Greve, Jan Piskur. Sebastian Risi. 2014. Braincrafter. Research game. [L135] PXL Lab, REAL Lab. 2020. iNNk. Research game. [L136] Alper Sekerci. 2020. Competitive Snake. itch.io. https://alpersekerci.itch.io/ competitive-snake 16 Player-AI Interaction: What Neural Network Games Reveal About AI as Play CHI ’21, May 8–13, 2021, Yokohama, Japan

Table 2: Summary of the guideline analysis codes.

AI Design Guidelines & Guiding Questions Codes for NN Games

Make clear what the system can do A - Makes the NN’s capabilities known Help the user understand what the AI system is capable of doing. G1 How does the game make clear what the NN can do? B - Makes part of the NN’s capabilities known C - Does not make the capabilities known at all Initially Make clear how well the system can do what it can do A - Makes the NN’s limitations known Help the user understand how often the AI system may make mistakes. G2 How does the game make clear what the NN is good or bad at? B - Makes part of the NN’s limitations known C - Does not make the limitations known at all

Timed services based on context A - The NN provides a timely service that adds to the flow Time when to act or interrupt based on the user’s current task and environment. of player experience

G3 B - The NN provides a timely service at the request of the When does the NN act or interrupt based on players’ current gameplay task? player to continue the flow of player experience C - The NN provides a timely service that interrupts the flow of player experience Show contextually relevant information A - Shows relevant feedback: UI + behavior/content Display information relevant to the user’s current task and environment. G4 What information does the game provide to the user based on how the NN B - Shows some relevant feedback: behavior/content responds to the player? C - Does not show relevant feedback Match relevant social norms During Ensure the experience is delivered in a way that users would expect, given their A - Expected interaction G5 social and cultural context. How does the NN interact with players given their social and cultural context? B - Somewhat expected interaction C - Unexpected interaction Mitigate social biases A - NN mitigates undesirable social stereotypes and Ensure the AI system’s language and behaviors do not reinforce undesirable and biases unfair stereotypes and biases. G6 How does the NN mitigate social biases especially in regard to undesirable and B - NN is neutral to undesirable social stereotypes and unfair stereotypes and biases? biases C - Unclear if the NN mitigates undesirable social stereotypes and biases

Remember recent transactions A - NN remembers players’ relevant recent actions and Maintain short term memory and allow the user to make efficient references to allows users to reference that memory that memory. G12 Does the NN remember players’ recent relevant actions? B - NN remembers players’ relevant recent actions and How does the game allow users to reference that memory during gameplay? does not allow users to reference that memory C - Unclear if the NN remembers the action history of the player. Not applicable Learn from user behavior A - NN personalizes based on the given player’s Personalize the user’s experience by learning from their actions over time. behavior. G13 B - NN somewhat personalizes by reacting to the player How does the NN personalize by learning from the player’s actions? based on only pre-trained other players behavior C - NN does not personalize Update and adapt cautiously A - Update and adapts cautiously - the NN does not Limit disruptive changes when updating and adapting the AI system’s behaviors. disrupt the player experience when it updates its behavior B - Update and adapts systematically - the NN partially G14 How does the NN use its knowledge of the player to update and adapt gameplay? disrupts the player experience when it updates its behavior C - Update and adapts abruptly - the NN disrupts the player experience when it updates its behavior Encourage granular feedback Enable the user to provide feedback indicating their preferences during regular A - Granular User feedback - direct interaction interaction. Overtime G15 B - Somewhat Granular User feedback - indirect How do players indicate their preference to the NN? interaction C - No Granular User feedback Convey the consequences of user actions related to NN A - Immediate Feedback conveying how players actions Immediately update or convey how user actions will impact future behaviors of the will impact the NN AI system. G16 B - Partial Feedback conveying how players actions will How does the game convey that the NN has changed due to their actions? impact the NN C - No feedback conveying how players actions will impact the NN Provide Global controls Allow the user to globally customize what the AI system monitors and how it A - Game provides clear global control over the NN G17 behaves. Does the game allow the player to customize the NN directly? If so, how? B - Game provides some global control over the NN C - Game provides no global control over the NN Notify users about changes A - Clear notifications of changes in the NNs capability Inform the user when the AI system adds or updates its capabilities. G18 How does the game notify users about changes to the NN? B - Some notifications of changes in the NNs capability 17 C - No notifications of changes in the NNs capability