CHI 2018 Paper CHI 2018, April 21–26, 2018, Montréal, QC, Canada

Video Game Selection Procedures For Experimental Research

April Tyack, Peta Wyeth, Madison Klarkowski
Queensland University of Technology (QUT)
Brisbane, Australia
[a.tyack, peta.wyeth, m.klarkowski]@qut.edu.au

ABSTRACT
Videogames are complex stimuli, and selecting games that consistently induce a desired player experience (PX) in an experimental setting can be challenging. The number of relatively high-quality games being released each year continues to increase, which makes deriving a shortlist of plausible candidate games from this pool increasingly problematic. Despite this, guidance for structuring and reporting on the game selection process remains limited. This paper therefore proposes two approaches to game selection: the first leverages online videogame databases and existing PX research, and is structured with respect to widely-applicable videogame metadata. The second process applies established game design theory to serve researchers when insufficient connections between desired PX outcomes and recognisable game elements exist. Both methods are accompanied by example reports of their application. The present work aims to assist experimental researchers in selecting videogames likely to meet their needs, while encouraging more rigorous standards of reporting in the field.

Author Keywords
Video games; experiments; player experience; game selection.

ACM Classification Keywords
K.8.0 [Personal Computing]: General - Games

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CHI 2018, April 21–26, 2018, Montreal, QC, Canada
© 2018 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-5620-6/18/04...$15.00
https://doi.org/10.1145/3173574.3173760

INTRODUCTION
Experimental and quasi-experimental studies constitute a key segment of player experience (PX) research. Videogames themselves are useful experimental stimuli due to their potential in maintaining a balance between external validity and experimental control [44].

However, the sheer volume of yearly releases presents an obstacle for researchers attempting to select games most likely to produce a desired PX. To our knowledge, no guidelines exist to aid researchers in this search. Studying the wider literature for pointers is generally unhelpful: only a few publications describe this aspect of experiment design in detail, and some omit their reasons for selecting games entirely [45]. Concerns have long been raised regarding the degree to which researchers understand the videogames they study, and the corresponding validity of experimental research in the field (see [16] for a review). However, the extent of the issue is obfuscated by a tendency for unsuccessful papers to go unpublished (i.e., the "file-drawer problem" [57]). An apparent lack of justified reasoning for videogame selection is particularly jarring in the context of a literature that tends to extol the medium when introducing new research. Videogames are complicated – for many, this is indeed part of their appeal [32] – and richer descriptions are hence required for readers to understand how candidate games suit the goals of a study. Clear reporting practices more broadly are crucial for study replication, and assist in demonstrating the maturity of the field.

Indeed, conducting and reporting on a well-justified videogame selection process has a number of benefits. Applying more rigorous criteria to game selection gives researchers better chances of predictably inducing desired PX outcomes. Similarly, papers outlining this process are better equipped to defend the validity of their results, and are more likely to invite further interest towards the subject. Well-designed, executed, and reported experiments provide reliable evidence about the nature, quality, and characteristics of videogame experiences. Publications that feature sound methods are more likely to be positively received by the academic community and external funding bodies alike; they contribute to authors’ reputations for quality work, while demonstrating the merit of the broader field and its suitability for further investment. A lack of versatile game selection procedures is a considerable barrier to adopting more detailed reporting practices, although others (e.g., publication page limits) may also pose an issue.

How researchers can efficiently and accurately create a shortlist of appropriate stimulus games remains an open question. The present work proposes two alternative approaches to address this gap. The first involves applying videogame metadata to structure the game selection process. In this approach, links between individual metadata elements and PX are made using existing research, thus forming game selection criteria. Public

videogame databases also list a number of these metadata items, which allows for more efficient refinement in the selection process.

The second approach to deriving game selection criteria is based on the MDA framework [25]. This method assists researchers in translating a desired PX into more recognisable game elements. Applying game design theory in this way is valuable in the event that relationships between metadata elements and PX outcomes are not readily forthcoming, or when metadata-based criteria fail to produce a shortlist of manageable length.

A paucity of published information exists to guide PX researchers in selecting videogames likely to suit study aims. Authors rarely describe game characteristics in sufficient detail for readers to understand the reasons for their inclusion in the research. We propose two videogame selection methods based in existing user-centred frameworks and demonstrate their separate application to address these interrelated research gaps.

BACKGROUND AND RELATED WORK
Improving the quality of experimental studies involving humans remains a high priority in fields such as health, education, and psychology, particularly with respect to general study design [11, 19, 43], scale development and selection [41, 56, 59], and intervention design, selection, and use [7, 12, 35, 52]. In Human-Computer Interaction (HCI), research has reviewed experimental design for usability testing [20, 21, 28], and the framing of user experience measures [5, 36], with a view towards improving research quality. Recent work in the field has examined common issues in the reporting of statistical results in HCI publications [9, 15, 34]; for example, the limits of null hypothesis significance testing. Videogame researchers are presently concerned with improving measurement of game enjoyment [45] and PX more broadly [60].

In proposing videogame selection methods, we aim to complement this broad movement towards more robust research design. Existing material is scarce, and papers that discuss game selection methods rarely make it their primary topic. Anderson [2], for example, makes only passing mention of “best practices” for game selection in aggression research, proposing that violent and non-violent games should exhibit otherwise similar PX profiles. This stimulus matching approach sees continued use within media effects research (e.g., [4]). However, a recent re-analysis of studies using this method [22] suggests that adequately matching games is difficult, as a result of issues arising from null-hypothesis significance testing (NHST) and small sample sizes, especially in pilot studies. Increasing sample size – specifically, minimum cell size – does make matching possible, notwithstanding the costs of obtaining a larger sample. Przybylski et al. [53] illustrate an alternative approach, applying prior research within an appropriate theory as the foundation for selecting (or modifying) games and confounding factors in their own work. In Study 1, for example, they ignore factors included in the original study’s [3] pilot test (cell n=13), and instead observe a more obvious difference in control complexity: the non-violent game uses only two keys, while the violent game requires mouse and keyboard input, with up to 20 keys. Based in Self-Determination Theory, the researchers justified their choice of covariate by applying existing work linking basic psychological need thwarting and aggression, together with their more recent study findings connecting mastery of controls in videogame play to competence need satisfaction. This approach appears more successful than using the results of an underpowered pilot test: their hypotheses are supported both in the original text and the more recent Bayesian re-analysis [22, 53]. We note, however, that further evaluation of existing work linking game elements to PX would strengthen experiment design.

McMahan et al. [44] highlight that stimulus games should provide researchers with control over the experience of play. Their own experiment involved Mario Kart Wii’s [47] Time Trial mode, which lacks the series’ random power-ups and game-controlled opponents, either of which could introduce wild variation into participants’ experiences [44]. However, they also considered relatively normal parts of racing (e.g., crashing) as potential confounds, and we consider this excessive: some degree of variation across play sessions is desirable to avoid confounding the effect of videogame play with that of a constant stimulus [61].

Applying game modifications (“mods”) – external changes to a game’s assets – can be an effective way to exert control over unwanted aspects of a game’s design, or to create stimuli more directly tailored to research goals [16]. While modding can be difficult and time-consuming, it may be necessary when desired game content is unavailable or otherwise unsuitable for research. However, the relatively small number of games that feature modding tools may restrict game selection when customised content is required. Researchers should strongly consider the trade-offs between commercial and custom-made (or modded) games when constructing their experiment design, and we suggest reviewing the excellent work on this topic [16, 27] for this purpose.

Järvelä et al. [27] discuss desirable characteristics of stimulus games for event-based research, which analyses atomised in-game events or tasks (e.g., collecting a power-up). Taking events as the unit of analysis allows researchers to make claims about the effects of individual game elements (e.g., rewards) that occur across game types and genres [49]. In line with their research approach, Järvelä et al. highlight the need to consider games with temporally isolated and repeated tasks, an inherent capacity for data collection (e.g., automatically generated game logs), and the ability to create a consistent experience across participants. On the latter point, the importance of controlling for player skill (e.g., selecting games with difficulty settings) is emphasised [27].


These papers are valuable perspectives into the types of game characteristics that are generally desirable for experimental research. However, information on how to structure the game selection process is less forthcoming: even when examples are provided in the text (e.g., [26]), decisions leading to the final selection remain somewhat ambiguous.

The approach suggested by Järvelä et al. [27] – combining subjective measures of game quality with broad [17] or specific [42] game taxonomies – is likely valuable when a small number of games are under consideration as potential stimuli; indeed, the authors appear to begin from this position. Taxonomies are less helpful for creating such a shortlist: hundreds of games are released every year on PC alone [58], and categorising games by taxonomy can be immensely time-consuming without first-hand play experience. Subjective measures of game quality (e.g., aggregated review scores) can assist in delimiting a search; however, multiple search criteria are needed to manage the growing volume of commercial game releases. Creating an initial shortlist requires a technique that can accurately remove large numbers of obviously unsuitable games from consideration without individual review.

THE PROPOSED APPROACH
We therefore propose a videogame selection technique that delimits an initial search using widely-applicable game characteristics. Recent work in information science [31, 37, 38] has taken a user-centred approach to metadata derivation. Their iterative process of item selection was initially grounded in player interviews, and later refined with online survey data and interviews with a wider range of potential users (e.g., developers, curators). This user-centred approach reflects a view that classification schema should suit their likely users’ needs; we take a similar position in reinterpreting it for use in structuring game selection procedures. The Video Game Metadata Schema (VGMS), in its current incarnation [39], features both objective (e.g., platform) and subjective characteristics (e.g., theme). The latter are coded using a restricted vocabulary outlined in the schema.

The VGMS is well-suited to shortlist derivation because it features both easily searchable elements (e.g., genre, platform, release date) that can immediately cut wide swathes of games from consideration, as well as a more detailed set of game elements that are less readily searchable (e.g., Number of Players, Point of View) but remain likely to influence PX nonetheless. Not all factors will apply to every study; however, the breadth of study designs in the literature suggests that no standard subset of ‘optimal’ VGMS items exists: researchers are best positioned to identify the factors with the greatest utility in their own studies. Considering the relevance of each item reveals a profile of game attributes identified in the literature as most likely to elicit a desired PX. We recognise that appropriating the VGMS for structuring game selection involves some degree of reinterpretation, while retaining the spirit of the original meaning: for example, even when Number of Players is a relevant factor, listing every possible arrangement of a game’s multiplayer modes is unnecessary if only one will be studied.

The VGMS lacks a measure of game quality that could improve the specificity of a search. While ‘quality’ is a purely subjective measure, aggregate review scores (e.g., from ) may be interpreted as a correlate of usability [40], which is generally a desirable attribute to consider in the time-limited experimental setting. As previously noted, existing work [27] suggests that these publicly available scores are a powerful addition to initial search criteria. We therefore recommend ‘quality’ as a practical, if unofficial, addition to criteria derived from the VGMS.

A number of VGMS items (e.g., genre, mode of play) are presently mentioned, if only briefly, when describing stimulus games in published research – indeed, many are associated with existing PX research outcomes. Applying VGMS items to structure game selection represents a formalisation of current practices that are partially or informally conducted. Because our aim is to provide researchers with accessible methods, we present an example of how the VGMS could be feasibly applied and reported.

Example Application of Videogame Metadata
The following example details the process of developing criteria for game selection with VGMS categories as a guide. It is hoped that presenting such information in this paper will provide practical guidance in interpreting these factors in a plausible context. This example refers to an experimental study concerned with the effects of immersion on game enjoyment; a report of the game selection process follows.

Selection criteria for a game designed to induce immersion were developed via examination of the immersion concept definition [30] and relevant PX research, structured with VGMS items [39] most likely to strongly affect PX. For the purposes of this research, immersion “involves a lack of awareness of time, a loss of awareness of the real world, involvement and a sense of being in the task environment” [30]. VGMS items selected for further review, in the order they appear in the schema, are Gameplay Genre, Progression, Platform, Number of Players, Point of View, Difficulty Control, and Retail Release Date. A measure of Quality was appended to these items to represent usability [40].

• Gameplay Genre: The target game should belong to the Action Adventure, Action RPG, or RPG genre. Immersive experiences in these genres are more likely, as they typically offer compelling narratives, character designs, and audiovisual fidelity [32].
• Progression: A generally linear mode of progression is most likely to support continuous involvement in the


game world [33]; this consistent sense of progress is associated with immersion [29].
• Platform: As the game will be played in a university computer lab, only games released on PC may be considered for the study.
• Number of Players: Although competitive multiplayer games may strongly induce immersion [10], the target game should feature a single-player mode in order to minimise potential confounds caused by the presence of other players.
• Perspective: A first-person perspective, compared to third-person, is likely to support immersion through direct avatar embodiment [14].
• Difficulty Control: A choice of difficulty level or (ideally) Dynamic Difficulty Adjustment (DDA) will broaden the pool of suitable participants, and make immersive experiences more likely for participants across skill levels [24].
• Retail Release Date: The candidate game should be a recent release to more closely link research outcomes to the current state of the art. The novelty of a game has also been linked to immersive experiences [29].
• Quality: Usability issues (e.g., lack of adequate control [40, 50]) may frustrate players and limit immersion; the selected game should therefore minimise usability issues to allow participants to focus entirely on play. Games that received a strong critical reception are appropriate candidates for this reason.

A search was conducted on GameFAQs [18] for PC games released in 2017, categorised as ‘Action Adventure’ or ‘Role-Playing’, and sorted according to Metacritic score. Only games with 5 or more critic reviews were included in the search as a precaution against individual reviewer bias. Game quality was operationalised as having an aggregated review score of 80 or greater (on a 100-point scale) on Metacritic. This process was conducted in September, so games released in the final months of 2017 could not be considered. A total of 17 games were identified as meeting these criteria. Manual inspection revealed 7 games that were originally released prior to 2017: these were older games released on PC in 2017 (e.g., Bayonetta [51]), expansions for a previously-released title (e.g., Darkest Dungeon: The Crimson Court [55]), or remasters of older games (e.g., Planescape: Torment - Enhanced Edition [48]). These games were subsequently excluded, as the original design practices that guided their production may not accurately represent the present state of the industry. We further identified that one game (Lone Echo [54]) required a VR headset to play; lacking such hardware, we removed it from consideration. The final shortlist of 9 games is presented in Table 1. Each remaining game was assessed according to the remaining criteria; however, all games featured a single-player mode, and we therefore removed this column for legibility. The list of shortlisted games at each step of this process has been included as supplementary material.

Title                          Progression          Perspective   Difficulty Control
Night in the Woods             Linear, Branching    3rd person    None
Hollow Knight                  Linear, Open world   3rd person    None
Hellblade: Senua’s Sacrifice   Linear               3rd person    DDA, Settings
Resident Evil 7: Biohazard     Linear               1st person    Settings
Little Nightmares              Linear               3rd person    None
West of Loathing               Open world           3rd person    None
Monster Slayers                Linear               3rd person    Settings
NieR: Automata                 Linear, Open world   3rd person    Modular
Pyre                           Linear               3rd person    Settings

Table 1. Shortlisted games evaluated on progression type, camera perspective, and the presence of options to regulate difficulty.

One game, Resident Evil 7: Biohazard, met all criteria, and five others failed only in their use of 3rd person perspective. However, we highlight that the goal of using VGMS items as criteria was to derive a shortlist of plausible candidate games. The remaining criteria are likely to assist in making the final decision; regardless, using the number of satisfied criteria alone to justify game selection is highly precarious. Each shortlisted game has advantages and disadvantages for use in experimental study beyond the broad factors outlined in this process. For example, while Resident Evil 7: Biohazard may indeed be highly immersive, its strong horror elements may interfere with recruitment or cause participants to drop out during the session. A metadata-based approach should be used to supplement, rather than replace, well-considered reasoning for the final selection. We emphasise that actually playing the shortlisted games at this time is strongly recommended to qualify the final decision-making process.

This example report demonstrates that considering the range of game elements outlined in the VGMS provides a fuller picture of the final candidate game – and the desired PX it helps to create – to researchers and readers alike. One strength of this approach is its basis in existing research that relates VGMS items to PX outcomes. However, this information may not be available, especially when the desired PX is not directly tied to a well-studied concept, as it was in the previous example. We propose the application

of game design theory, developed to connect tangible game elements to desired experiences, as one approach to game selection in such an event.

USING THE MDA FRAMEWORK TO IDENTIFY RELEVANT CRITERIA
The MDA (Mechanics, Dynamics, Aesthetics) framework considers videogames with the view that game designers’ and players’ perspectives towards the medium represent opposite approaches – that is, that the act of game design positions developers in a fundamentally different way than players, who primarily consider games with consumption in mind [25]. Within the framework, ‘mechanics’ are the various forms of player action that are afforded by input devices, and instantiated in code; ‘dynamics’ are the emergent player interactions that occur through game mechanics; and ‘aesthetics’ are the emotional states or experiences that result from play. The framework is primarily used in game design (e.g., [13, 23]) as a way of linking game attributes, player behaviour, and PX; this quality also makes the framework highly valuable for game selection in the absence of readily applicable PX research.

This paper’s proposed application of the MDA framework begins by describing a set of desired aesthetics (i.e., an intended PX), which are then used to identify potential dynamics and mechanics that could support such aesthetics – and, with equal utility, those that would be decidedly inappropriate. These characteristics become the selection criteria for creating a shortlist of suitable games. This approach can be used in isolation, or in conjunction with VGMS items, to the extent that these items appear relevant to desired PX outcomes. Similarly, pre-existing research can and should be used to support this method where available. The following example demonstrates use of the method in isolation from prior research and the VGMS to exhibit the utility of its independent application. We do, however, recommend applying other methods (such as the VGMS) and relevant existing work wherever possible.

Example Application of the MDA Framework
Consider an example study in which the target game must repair participant mood following an unrelated frustrating task. The following section presents a feasible application of the MDA framework in deriving a set of characteristics that the target game should reflect.

Mood Management Theory [63] suggests the selection of a stimulus game that induces positive affect and low arousal to counteract the negatively-valenced, arousing frustration task. These desired emotions can be translated into game aesthetics that are likely to engender such affective responses. The aesthetics chosen are gentleness, progress, and accessible challenge; each of these either limits further frustration, provides opportunities for satisfaction, or lowers arousal levels.

Dynamics resulting from gentleness as an aesthetic goal could involve players having the ability to make mistakes without immediately ending the game. Players should also have non-violent experiences with the game. A key dynamic arising from progress is for players to see their contribution towards an in-game goal. Dynamics of accessible challenge involve quick adaptation to game controls, the ability to learn game functions without significant penalty, and being able to approach obstacles with appropriate levels of confidence.

Suitable mechanics then include:
• Implicit or delayed fail states
• Non-violent player actions
• Audiovisual representation of player standing relative to their goal (e.g., progress bar)
• Immediately responsive controls
• A limited number of possible inputs (actions)
• A tutorial level, or just-in-time tutorial prompts

The game dynamics and mechanics identified as supporting a particular set of aesthetics may then be applied as selection criteria for determining an appropriate candidate game. We again underscore the incomparable value of researchers playing these shortlisted games before finalising their inclusion in the experimental design.

Approaching videogame selection from a design perspective makes its justification possible in the absence of existing research. Applying the MDA framework, or other game design theories, could assist researchers interested in broadening studies of PX beyond established concepts. It may also be valuable in the event that VGMS-based exclusion criteria fail to generate a sufficiently limited shortlist. However, deriving appropriate mechanics, dynamics, and aesthetics requires a degree of familiarity and experience with videogames, as does correctly identifying games that feature them. (Indeed, researchers who are also experienced players and developers will likely make the fullest use of this approach.) We suggest that this represents an opportunity for researchers with limited design expertise to collaborate with developers, lending the process a measure of multidisciplinary integrity.

The process outlined in this section is inherently subjective, and may produce criteria broad enough to describe a variety of videogame types. A degree of care is therefore required to interpret these criteria in ways that meet study requirements. Clear reporting practices are crucial when applying the MDA framework to game selection.

DISCUSSION
Differences between the two methods proposed in this paper reflect the range of positions in which researchers may find themselves when planning an experimental study. The greatest strength of the metadata-based selection process lies in its direct application of existing work: videogame metadata derived from an iterative and user-centred process, online databases of videogame titles, and research linking search criteria to PX outcomes. The

approach relies on the existence of relevant PX literature, however, which is not always the case. Certainly, no method (including those proposed here) can claim universal applicability; other frameworks (e.g., activity theory [46], the Design Box [1]) are likely to yield practical videogame selection methods with different strengths. Regardless of approach, thorough evaluation of connections between game elements and PX is invaluable during experiment design to avoid potential confounds, and hence maximise the value of collected data.

A selection process structured using the MDA framework applies design thinking to link PX with tangible game elements that would support a desired experience. The framework’s robustness and versatility – demonstrated by its continued use [13, 23, 60], despite its publication in 2004 [25] – lend strength to its application in the present work, suggesting utility in accommodating future industry directions. The MDA selection process can be conducted without a substantial body of existing literature, although research outcomes can be easily incorporated into the process wherever relevant. Unlike many game taxonomies, the MDA framework itself is relatively straightforward; researchers are therefore able to focus on its application, rather than interpretation. However, this game selection process inherits the subjectivity that accompanies game design. Greater attention must be paid to reporting the procedure as a result.

Despite these differences, both methods begin from the perspective of studying a particular experience of interest. This reflects the focus of PX research on the player – who is obviously important – in the player-game interaction. We suggest, however, that this has occurred to the detriment of considering the videogame’s role in helping elicit desirable experiences. The approaches identified in this paper correct this imbalance, acknowledging the equal contribution of videogames in producing PX. In attending to the videogame as artefact, and (comparatively) distancing ourselves from the player, we approach PX from a perspective that considers player characteristics – which are largely beyond researcher control – less relevant to PX experiment design.

Videogame selection is always subjective. In forming links between game elements and experience, the proposed methods ensure that researchers clarify their subjective interpretations of experience, which may be otherwise indistinct (e.g., immersion, presence, and flow [8, 45]); greater retention of conceptual consistency and legibility across the literature can be attained as a result. Indeed, these methods aim to promote considered reflection on how videogames – and game elements – contribute to PX, support meaningful engagement with relevant literature, and assist in structuring a rigorous, detailed report of how these practices have influenced the subjective selection process. We reiterate that formulaic application of either method is unlikely to benefit either individual experiment design or videogame research as a field.

LIMITATIONS AND FUTURE WORK
The methods outlined in this paper generally discuss games as abstractions – as objects to be categorised, or compilations of design elements that support a particular experience. In practice, videogame play is more complex than what analytic methods can encapsulate. We therefore find it appropriate to once again recommend, as others have (e.g., [6, 62]), that researchers experience a reasonable proportion of the games they study. A degree of critical distance from the subject matter may indeed limit researcher bias; regardless, the continued importance of regular playtesting in game development highlights that essential knowledge of the medium is uniquely derived from play. In a similar vein, the present work makes every attempt to encourage more thoughtful game selection practices; however, it remains possible to apply either method uncritically, adopting a reductionist perspective that considers games only in terms of their separate elements. While we are unable to directly prevent a paint-by-numbers approach to research, we hope this paper makes such work easier to identify and critique.

This paper contributes to an ongoing trend in videogame research and HCI that seeks to improve study design, and in turn, the quality of research findings in the field. Videogame research has proceeded for over a decade with surprisingly sparse literature about how it should be done. We are enthusiastic about the prospect of more researchers turning their attention to the place of videogames in study design, given their apparent irreducibility; in particular, we eagerly anticipate the construction of new videogame selection practices with strengths that differ from those presented here. We note, for example, that successful application of the VGMS-based method (when used in isolation) depends on existing study of the subject matter; it is hence unsuited to shortlist derivation when few of its game elements appear linked to PX in the extant literature.

CONCLUSIONS
This paper has proposed two approaches to videogame selection in experimental PX research. The first, based on videogame metadata, can quickly and accurately remove large numbers of games from consideration, and is best suited to studies with strong theoretical connections to existing PX research. The second approach, based on the MDA framework, provides a means for researchers to derive a list of game elements likely to support a desired experience – and a corresponding set of appropriate games – when existing literature is less readily applicable. Example applications of both methods demonstrate that reporting the selection process clarifies the relationship between candidate games and research goals. Rigour in designing and reporting videogame research is crucial for study replication and the credibility of the field; both would make funding PX research a more viable prospect.


ACKNOWLEDGMENTS
This work was completed with the benefit of the Australian Postgraduate Award (APA) and the QUT Excellence Top-Up Scholarship.

REFERENCES
1. Roger Altizer Jr. and José P. Zagal. 2014. Designing Inside the Box or Pitching Practices in Industry and Education. In Proceedings of DiGRA 2014.
2. Craig A. Anderson. 2004. An Update on the Effects of Playing Violent Video Games. Journal of Adolescence 27, 1: 113-122.
3. Craig A. Anderson, Nicholas L. Carnagey, Mindy Flanagan, Arlin J. Benjamin Jr., Janie Eubanks, and Jeffery C. Valentine. 2004. Violent Video Games: Specific Effects of Violent Content on Aggressive Thoughts and Behavior. Advances in Experimental Social Psychology 36: 199-249.
4. Patrícia Arriaga, Joana Adrião, Filipa Madeira, Inês Cavaleiro, Alexandra Maia e Silva, Isabel Barahona, and Francisco Esteves. 2015. A "Dry Eye" for Victims of Violence: Effects of Playing a Violent Video Game on Pupillary Dilation to Victims and on Aggressive Behavior. Psychology of Violence 5, 2: 199-208.
5. J. A. Bargas-Avila and K. Hornbæk. 2011. Old Wine in New Bottles or Novel Challenges: A Critical Analysis of Empirical Studies of User Experience. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2689-2698.
6. Pippin Barr, James Noble, and Robert Biddle. 2007. Video Game Values: Human-Computer Interaction and Games. Interacting with Computers 19, 2: 180-195.
7. Debbie Bonetti, Martin Eccles, Marie Johnston, Nick Steen, Jeremy Grimshaw, Rachel Baker, Anne Walker, and Nigel Pitts. 2005. Guiding the Design and Selection of Interventions to Influence the Implementation of Evidence-Based Practice: An Experimental Simulation of a Complex Intervention Trial. Social Science & Medicine 60, 9: 2135-2147.
8. Elizabeth A. Boyle, Thomas M. Connolly, Thomas Hainey, and James M. Boyle. 2012. Engagement in Digital Entertainment Games: A Systematic Review. Computers in Human Behavior 28, 3: 771-780.
9. Paul Cairns. 2007. HCI... Not as it Should Be: Inferential Statistics in HCI Research. In Proceedings of the 21st British HCI Group Annual Conference on People and Computers: HCI...But not as we Know It - Volume 1, 195-201.
10. Paul Cairns, Anna L. Cox, Matthew Day, Hayley Martin, and Thomas Perryman. 2013. Who But Not Where: The Effect of Social Play on Immersion in Digital Games. International Journal of Human-Computer Studies 71, 11: 1069-1077.
11. Donald T. Campbell and Julian C. Stanley. 2015. Experimental and Quasi-Experimental Designs for Research. Ravenio Books.
12. Peter Craig, Paul Dieppe, Sally Macintyre, Susan Michie, Irwin Nazareth, and Mark Petticrew. 2008. Developing and Evaluating Complex Interventions: The New Medical Research Council Guidance. BMJ 337.
13. Christy Dena. 2017. Finding a Way: Techniques to Avoid Schema Tension in Narrative Design. Transactions of the Digital Games Research Association 3, 1: 27-61.
14. Alena Denisova and Paul Cairns. 2015. First Person vs. Third Person Perspective in Digital Games: Do Player Preferences Affect Immersion? In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 145-148.
15. Mark D. Dunlop and Mark Baillie. 2009. Paper Rejected (p>0.05): An Introduction to the Debate on Appropriateness of Null-Hypothesis Testing. International Journal of Mobile Human Computer Interaction 1, 3: 1-8.
16. Malte Elson and Thorsten Quandt. 2016. Digital Games in Laboratory Experiments: Controlling a Complex Stimulus Through Modding. Psychology of Popular Media Culture 5, 1: 52-65.
17. Christian Elverdam and Espen Aarseth. 2007. Game Classification and Game Design: Construction Through Critical Analysis. Games and Culture 2, 1: 3-22.
18. GameFAQs. 2017. Search GameFAQs. Retrieved 17 September, 2017 from https://www.gamefaqs.com/search_advanced
19. Russell Gersten, Scott Baker, and John Wills Lloyd. 2000. Designing High-Quality Research in Special Education: Group Experimental Design. The Journal of Special Education 34, 1: 2-18.
20. Wayne D. Gray and Marilyn C. Salzman. 1998. Damaged Merchandise? A Review of Experiments that Compare Usability Evaluation Methods. Human-Computer Interaction 13, 3: 203-261.
21. Morten Hertzum and Niels Ebbe Jacobsen. 2001. The Evaluator Effect: A Chilling Fact About Usability Evaluation Methods. International Journal of Human-Computer Interaction 13, 4: 421-443.
22. Joseph Hilgard, Christopher R. Engelhardt, Bruce D. Bartholow, and Jeffrey N. Rouder. 2017. How Much Evidence Is p > .05? Stimulus Pre-Testing and Null Primary Outcomes in Violent Video Games Research. Psychology of Popular Media Culture 6, 4: 361-380.
23. Clint Hocking. 2011. Dynamics: The State of the Art. Video.
24. Robin Hunicke and Vernell Chapman. 2004. AI for Dynamic Difficulty Adjustment in Games. In Challenges in Game Artificial Intelligence, 91-96.
25. Robin Hunicke, Marc LeBlanc, and Robert Zubek. 2004. MDA: A Formal Approach to Game Design and Game Research. In Proceedings of the AAAI Workshop on Challenges in Game AI.
26. Simo Järvelä, Inger Ekman, J. Matias Kivikangas, and Niklas Ravaja. 2014. A Practical Guide to Using Digital Games as an Experiment Stimulus. Transactions of the Digital Games Research Association 1, 2.
27. Simo Järvelä, Inger Ekman, J. Matias Kivikangas, and Niklas Ravaja. 2015. Stimulus Games. In Game Research Methods. ETC Press.
28. Monique W. M. Jaspers. 2009. A Comparison of Usability Methods for Testing Interactive Health Technologies: Methodological Aspects and Empirical Evidence. International Journal of Medical Informatics 78, 5: 340-353.
29. Charlene Jennett. 2010. Is Game Immersion Just Another Form of Selective Attention? An Empirical Investigation of Real World Dissociation in Computer Game Immersion. PhD Thesis. University College London, London, England.
30. Charlene Jennett, Anna L. Cox, Paul Cairns, Samira Dhoparee, Andrew Epps, Tim Tijs, and Alison Walton. 2008. Measuring and Defining the Experience of Immersion in Games. International Journal of Human-Computer Studies 66, 9: 641-661.
31. Jacob Jett, Simone Sacchi, Jin Ha Lee, and Rachel Ivy Clarke. 2016. A Conceptual Model for Video Games and Interactive Media. Journal of the Association for Information Science and Technology 67, 3: 505-517.
32. Daniel Johnson, Lennart Nacke, and Peta Wyeth. 2015. All about that Base: Differing Player Experiences in Video Game Genres and the Unique Case of MOBA Games. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 2265-2274. http://dx.doi.org/10.1145/2702123.2702447
33. Daniel Johnson, Peta Wyeth, Penelope Sweetser, and John Gardner. 2012. Personality, Genre and Videogame Play Experience. In Proceedings of the 4th International Conference on Fun and Games, 117-120.
34. Maurits Kaptein and Judy Robertson. 2012. Rethinking Statistical Analysis Methods for CHI. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1105-1114.
35. Robert E. Larzelere, Brett R. Kuhn, and Byron Johnson. 2004. The Intervention Selection Bias: An Underrecognized Confound in Intervention Research. Psychological Bulletin 130, 2: 289-303.
36. Effie Lai-Chong Law, Paul van Schaik, and Virpi Roto. 2014. Attitudes Towards User Experience (UX) Measurement. International Journal of Human-Computer Studies 72, 6: 526-541.
37. Jin Ha Lee, Hyerim Cho, Violet Fox, and Andrew Perti. 2013. User-Centered Approach in Creating a Metadata Schema for Video Games and Interactive Media. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, 229-238.
38. Jin Ha Lee, Rachel Ivy Clarke, and Andrew Perti. 2015. Empirical Evaluation of Metadata for Video Games and Interactive Media. Journal of the Association for Information Science and Technology 66, 12: 2609-2625.
39. Jin Ha Lee, Andrew Perti, Rachel Ivy Clarke, Travis W. Windleharth, and Marc Schmalz. 2017. Video Game Metadata Schema Version 4.0. Retrieved from http://gamer.ischool.uw.edu/official_release/
40. Ian J. Livingston, Regan L. Mandryk, and Kevin G. Stanley. 2010. Critic-Proofing: How Using Critic Review and Game Genres Can Refine Heuristic Evaluations. In Proceedings of the International Academic Conference on the Future of Game Design and Technology, 48-55.
41. Kathleen N. Lohr. 2002. Assessing Health Status and Quality-of-Life Instruments: Attributes and Review Criteria. Quality of Life Research 11, 3: 193-205.
42. Sus Lundgren and Staffan Bjork. 2003. Game Mechanics: Describing Computer-Augmented Games in Terms of Interaction. In Proceedings of the 2003 Technologies for Interactive Digital Storytelling and Entertainment Conference.
43. Frances Mair and Pamela Whitten. 2000. Systematic Review of Studies of Patient Satisfaction with Telemedicine. BMJ 320, 7248: 1517-1520.
44. Ryan P. McMahan, Eric D. Ragan, Anamary Leal, Robert J. Beaton, and Doug A. Bowman. 2011. Considerations for the Use of Commercial Video Games in Controlled Experiments. Entertainment Computing 2, 1: 3-9.
45. Elisa D. Mekler, Julia Ayumi Bopp, Alexandre N. Tuch, and Klaus Opwis. 2014. A Systematic Review of Quantitative Studies on the Enjoyment of Digital Entertainment Games. In Proceedings of the 32nd Annual ACM Conference on Human Factors in Computing Systems, 927-936.
46. Bonnie A. Nardi (ed.). 1996. Context and Consciousness: Activity Theory and Human-Computer Interaction. MIT Press, Massachusetts, USA.
47. Nintendo EAD. 2008. Mario Kart Wii. Videogame [Wii]. Nintendo, Kyoto, Japan.
48. Overhaul Games. 2017. Planescape: Torment - Enhanced Edition. Videogame [PC]. Beamdog, Edmonton, Canada.
49. Cody Phillips, Daniel Johnson, Peta Wyeth, Leanne Hides, and Madison Klarkowski. 2015. Redefining Videogame Reward Types. In Proceedings of the Annual Meeting of the Australian Special Interest Group for Computer Human Interaction, 83-91.
50. David Pinelle, Nelson Wong, and Tadeusz Stach. 2008. Heuristic Evaluation for Games: Usability Principles for Video Game Design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1453-1462.
51. PlatinumGames. 2017. Bayonetta. Videogame [PC]. SEGA, Tokyo, Japan.
52. Michael Pressley, Steve Graham, and Karen Harris. 2006. The State of Educational Intervention Research as Viewed Through the Lens of Literacy Intervention. British Journal of Educational Psychology 76, 1: 1-19.
53. Andrew K. Przybylski, Edward L. Deci, C. Scott Rigby, and Richard M. Ryan. 2014. Competence-Impeding Electronic Games and Players' Aggressive Feelings, Thoughts, and Behaviors. Journal of Personality and Social Psychology 106, 3: 441-457.
54. . 2017. Lone Echo. Videogame [PC]. Studios, California, USA.
55. Red Hook Studios. 2017. Darkest Dungeon - The Crimson Court. Videogame [PC]. Red Hook Studios, Vancouver, Canada.
56. John P. Robinson, Phillip R. Shaver, and Lawrence S. Wrightsman. 1991. Criteria for Scale Selection and Evaluation. Measures of Personality and Social Psychological Attitudes 1, 3: 1-16.
57. Robert Rosenthal. 1979. The File Drawer Problem and Tolerance for Null Results. Psychological Bulletin 86, 3: 638-641.
58. SteamSpy. 2017. Monthly Summaries. Retrieved 7 September, 2017 from http://steamspy.com/year/
59. Caroline B. Terwee, Sandra D. M. Bot, Michael R. de Boer, Daniëlle A. W. M. van der Windt, Dirk L. Knol, Joost Dekker, Lex M. Bouter, and Henrica C. W. de Vet. 2007. Quality Criteria Were Proposed for Measurement Properties of Health Status Questionnaires. Journal of Clinical Epidemiology 60, 1: 34-42.
60. Vero Vanden Abeele, Lennart E. Nacke, Elisa D. Mekler, and Daniel Johnson. 2016. Design and Preliminary Validation of The Player Experience Inventory. In Proceedings of the 2016 Annual Symposium on Computer-Human Interaction in Play Companion Extended Abstracts, 335-341.
61. Gary L. Wells and Paul D. Windschitl. 1999. Stimulus Sampling and Social Psychological Experimentation. Personality and Social Psychology Bulletin 25, 9: 1115-1125.
62. Dmitri Williams. 2005. Bridging the Methodological Divide in Game Research. Simulation & Gaming 36, 4: 447-463.
63. Dolf Zillmann. 1988. Mood Management Through Communication Choices. The American Behavioral Scientist 31, 3: 327-340.
