Automated Game Testing with ICARUS: Intelligent Completion of Adventure Riddles Via Unsupervised Solving
Spotlight CHI PLAY'17 Extended Abstracts, Oct. 15–18, 2017, Amsterdam, NL

Johannes Pfau
University of Bremen
Bibliothekstraße 1, 28359 Bremen, Germany
Daedalic Entertainment
Papenreye 51, 22453 Hamburg, Germany
[email protected]

Jan David Smeddinck
ICSI, University of California, Berkeley
1947 Center St, Berkeley, CA 94704
[email protected]

Rainer Malaka
University of Bremen
Bibliothekstraße 1, 28359 Bremen, Germany
[email protected]

© the authors, 2017. This is the authors' version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published as: Pfau, J., Smeddinck, J. D., & Malaka, R. (2017). Automated Game Testing with ICARUS: Intelligent Completion of Adventure Riddles via Unsupervised Solving. Extended Abstracts Publication of the Annual Symposium on Computer-Human Interaction in Play, 153–164. https://doi.org/10.1145/3130859.3131439
© 2017 Copyright is held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-5111-9/17/10…$15.00

Abstract
With ICARUS, we introduce a framework for autonomous video game playing, testing, and bug reporting. We report on the design rationale, the practical implementation, and its use in game development industry projects. The underlying solving mechanic is based on discrete reinforcement learning in a dualistic fashion, encompassing volatile short-term memory as well as persistent long-term memory that spans across distinct game iterations. In combination with heuristics that reduce the search space and the possibility to employ pre-defined situation-dependent action choices, the system manages to traverse complete playthrough iterations in roughly the same amount of time that a professional game tester requires for a speedrun. The ICARUS project was developed at Daedalic Entertainment. The software can be used to generically run all adventure games built with the popular Visionaire Engine [6] and is currently used for evaluating daily builds, for large-scale hardware compatibility and performance tests, as well as for semi-supervised quality assurance playthroughs. The supplementary video depicts real-time solving with active control and observation via a web control panel.

Author Keywords
Automated game testing; quality assurance; reinforcement learning; automated bug reporting; continuous performance analysis; continuous integration testing

ACM Classification Keywords
D.2.5 [Software Engineering]: Testing and Debugging; I.2.1 [Artificial Intelligence]: Applications and Expert Systems – Games; K.8 [Personal Computing]: Games

“For human beings, testing the same game for a longer period of time can be quite demanding of both their creativity and concentration. Since projects require different styles of testing at different times, such as simply playing through the game as quickly as possible or in-depth bug testing of various parts of the game, the testers often have to actively force themselves to leave the path their brains are used to and to come up with new creative ways of breaking the game. Additionally, even for a linear game, the number of possible combinations as well as the order they are made in during a play session can become extremely large.”
- Maik Hildebrandt, Head of QA at Daedalic Entertainment [12]

INTRODUCTION
Continuous and extensive quality assurance (QA) plays an important role in the video game industry. Modern games are often immensely complex software systems that offer a broad range of possible game experiences and are often immediately used by a large number of consumers. At the same time, bugs or game glitches can considerably harm immersion and fun, and endanger the overall game experience. Thus, a large portion (typically ~10-20 %) [5] of the budget for a particular video game production is spent solely on finding and reporting bugs and on testing traversability, compatibility, performance, and aesthetics. Such issues are usually broken down into three major categories of severity (A: Crashes/Freezes, B: Blocker, and C: General; see Table 1). While the severity descends from A to C, the probability of missing a bug of the respective type ascends. Furthermore, the majority of missed bugs stems from error blindness (due to habituation to the game procedures and adherence to established action choice patterns), a specific form of change blindness [20] that testers become more likely to fall victim to the more often they play-test the same game.

A: Crashes/Freezes. Shutting down the game unexpectedly or preventing the screen from rendering any further.
B: Blocker. Resulting in a game state from which no further game progress can be made.
C: General. Graphical flaws, animation issues, typos, glitches.
Table 1. Common categories of bugs in video games [1, p. 178].

In this light, the introduction of ICARUS in professional video game development aims not only at reducing labor costs for QA, but also at improving bug tracking performance and at decreasing the cognitive load for human testers, assisting in all of the bug categories named above. Following a discussion of the current state of the art in game testing, automated testing, and the application of techniques from artificial intelligence / machine learning in these contexts, we present the rationale and architecture of ICARUS in detail, together with exemplary use cases in the form of an industry case study and a discussion that reflects on the value that such systems can currently provide in game development processes, as well as an outlook on future developments in the area of intelligent automated game testing. This technical framework description and the accompanying case study provide a report on a novel system for automated game testing with adventure games. Readers from the scientific community will gain a better understanding of the extent to which the game industry is embracing applied artificial intelligence and machine learning in contexts beyond classic game AI, while readers with a background in the game industry can gain a better understanding of how similar approaches might benefit their own projects.

RELATED WORK
Several automated frameworks for testing software in general, and video games specifically, have been developed so far. Automated approaches exist, for example, for selected, discrete performance measurements, such as determining the FPS at which a game can run on a given system, or the CPU and memory load when starting or running the game using new games or saved game states [9]. While such systems can frequently detect issues in category A, blockers and especially more general flaws of a non-technical nature, like unsolvable conditions in complex quests, remain undetected and require manual involvement. Other approaches simulate playthroughs, using manually predetermined [3, 8, 11] or recorded [4, 10] action sequences. These systems can help with detecting many potential blockers and some more general issues. However, they require manual adaptation or re-recording of the action sequences whenever the procedure changes, which typically happens on a daily basis during the active development of modern games. Furthermore, most of the time video games do not strictly constrain the player regarding the order in which a sequence of actions needs to be executed. Actions are not always mandatory in order to progress in a game, and often the player is given several choices on how to proceed. These deterministic approaches thus require different manually defined (or recorded) action sequences. Even in games with just a few optional branches

As the following section will show in further detail, the ICARUS system tackles a number of shortcomings of the systems that were discussed in this section. With an active and guided machine learning approach, it narrows the playthrough down to the most relevant actions after having explored the complete game action set, highlighting potential yet less common blockers as well as general blockers that, unlike crashes or freezes, could easily have gone undetected using more traditional automated testing. As Figure 7 shows, this can notably speed up the progress of QA evaluations.

ICARUS
Left clicks: for each available target object
Right clicks: for each available target object
’Use’ with each item: each available target object
’Use’ with each

The system for intelligent completion of adventure