This may be the author’s version of a work that was submitted/accepted for publication in the following source:

Tyack, April, Wyeth, Peta, & Klarkowski, Madison (2018) Video game selection procedures for experimental research. In Cox, A & Perry, M (Eds.) Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, United States of America, pp. 1-9.

This file was downloaded from: https://eprints.qut.edu.au/117644/

© Consult author(s) regarding copyright matters

This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons License (or other specified license) then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright please provide details by email to [email protected]

Notice: Please note that this document may not be the Version of Record (i.e. published version) of the work. Author manuscript versions (as Submitted for peer review or as Accepted for publication after peer review) can be identified by an absence of publisher branding and/or typeset appearance. If there is any doubt, please refer to the published source.

https://doi.org/10.1145/3173574.3173760

Video Game Selection Procedures For Experimental Research

April Tyack, Peta Wyeth, Madison Klarkowski
Queensland University of Technology (QUT)
Brisbane, Australia
[a.tyack, peta.wyeth, m.klarkowski]@qut.edu.au

ABSTRACT
Videogames are complex stimuli, and selecting games that consistently induce a desired player experience (PX) in an experimental setting can be challenging. The number of relatively high-quality games being released each year continues to increase, which makes deriving a shortlist of plausible candidate games from this pool increasingly problematic. Despite this, guidance for structuring and reporting on the game selection process remains limited. This paper therefore proposes two approaches to game selection: the first leverages online videogame databases and existing PX research, and is structured with respect to widely-applicable videogame metadata. The second process applies established game design theory to serve researchers when insufficient connections between desired PX outcomes and recognisable game elements exist. Both methods are accompanied by example reports of their application. The present work aims to assist experimental researchers in selecting videogames likely to meet their needs, while encouraging more rigorous standards of reporting in the field.

Author Keywords
Video games; experiments; player experience; game selection.

ACM Classification Keywords
K.8.0 [Personal Computing]: General - Games

INTRODUCTION
Experimental and quasi-experimental studies constitute a key segment of player experience (PX) research. Videogames themselves are useful experimental stimuli due to their potential in maintaining a balance between external validity and experimental control [44].

However, the sheer volume of yearly releases presents an obstacle for researchers attempting to select games most likely to produce a desired PX. To our knowledge, no guidelines exist to aid researchers in this search. Studying the wider literature for pointers is generally unhelpful: only a few publications describe this aspect of experiment design in detail, and some omit their reasons for selecting games entirely [45]. Concerns have long been raised regarding the degree to which researchers understand the videogames they study, and the corresponding validity of experimental research in the field (see [16] for a review). However, the extent of the issue is obfuscated by a tendency for unsuccessful papers to go unpublished (i.e., the "file-drawer problem" [57]). An apparent lack of justified reasoning for videogame selection is particularly jarring in the context of a literature that tends to extol the medium when introducing new research. Videogames are complicated – for many, this is indeed part of their appeal [32] – and richer descriptions are hence required for readers to understand how candidate games suit the goals of a study. Clear reporting practices more broadly are crucial for study replication, and assist in demonstrating the maturity of the field.

Indeed, conducting and reporting on a well-justified videogame selection process has a number of benefits. Applying more rigorous criteria to game selection gives researchers better chances of predictably inducing desired PX outcomes. Similarly, papers outlining this process are better equipped to defend the validity of their results, and are more likely to invite further interest towards the subject. Well-designed, executed, and reported experiments provide reliable evidence about the nature, quality, and characteristics of videogame experiences. Publications that feature sound methods are more likely to be positively received by the academic community and external funding bodies alike; they contribute to authors’ reputations for quality work, while demonstrating the merit of the broader field and its suitability for further investment. A lack of versatile game selection procedures is a considerable barrier to adopting more detailed reporting practices, although others (e.g., publication page limits) may also pose an issue. How researchers can efficiently and accurately create a shortlist of appropriate stimulus games remains an open question.
The present work proposes two alternative approaches to address this gap. The first involves applying videogame metadata to structure the game selection process. In this approach, links between individual metadata elements and PX are made using existing research, thus forming game selection criteria. Public videogame databases also list a number of these metadata items, which allows for more efficient refinement in the selection process.

The second approach to deriving game selection criteria is based on the MDA framework [25]. This method assists researchers in translating a desired PX into more recognisable game elements. Applying game design theory in this way is valuable in the event that relationships between metadata elements and PX outcomes are not readily forthcoming, or when metadata-based criteria fail to produce a shortlist of manageable length.

A paucity of published information exists to guide PX researchers in selecting videogames likely to suit study aims. Authors rarely describe game characteristics in sufficient detail for readers to understand the reasons for their inclusion in the research. We propose two videogame selection methods based in existing user-centred frameworks and demonstrate their separate application to address these interrelated research gaps.

BACKGROUND AND RELATED WORK
Improving the quality of experimental studies involving humans remains a high priority in fields such as health, education, and psychology, particularly with respect to general study design [11, 19, 43], scale development and selection [41, 56, 59], and intervention design, selection, and use [7, 12, 35, 52]. In Human-Computer Interaction (HCI), research has reviewed experimental design for usability testing [20, 21, 28], and the framing of user experience measures [5, 36], with a view towards improving research quality. Recent work in the field has examined common issues in the reporting of statistical results in HCI publications [9, 15, 34]; for example, the limits of null hypothesis significance testing. Videogame researchers are presently concerned with improving measurement of game enjoyment [45] and PX more broadly [60].

In proposing videogame selection methods, we aim to complement this broad movement towards more robust research design. Existing material is scarce, and papers that discuss game selection methods rarely make it their primary topic. Anderson [2], for example, makes only passing mention of “best practices” for game selection in aggression research, proposing that violent and non-violent games should exhibit otherwise similar PX profiles. This stimulus matching approach sees continued use within media effects research (e.g., [4]). However, a recent re-analysis of studies using this method [22] suggests that adequately matching games is difficult, as a result of issues arising from null-hypothesis significance testing (NHST) and small sample sizes, especially in pilot studies. Increasing sample size – specifically, minimum cell size – does make matching possible, notwithstanding the costs of obtaining a larger sample. Przybylski et al. [53] illustrate an alternative approach, applying prior research within an appropriate theory as the foundation for selecting (or modifying) games and confounding factors in their own work. In Study 1, for example, they ignore factors included in the original study’s [3] pilot test (cell n=13), and instead observe a more obvious difference in control complexity: the non-violent game uses only two keys, while the violent game requires mouse and keyboard input, with up to 20 keys. Based in Self-Determination Theory, the researchers justified their choice of covariate by applying existing work linking basic psychological need thwarting and aggression, together with their more recent study findings connecting mastery of controls in videogame play to competence need satisfaction. This approach appears more successful than using the results of an underpowered pilot test: their hypotheses are supported both in the original text and the more recent Bayesian re-analysis [22, 53]. We note, however, that further evaluation of existing work linking game elements to PX would strengthen experiment design.

McMahan et al. [44] highlight that stimulus games should provide researchers with control over the experience of play. Their own experiment involved Mario Kart Wii’s [47] Time Trial mode, which lacks the series’ random power-ups and game-controlled opponents, either of which could introduce wild variation into participants’ experiences [44]. However, they also considered relatively normal parts of racing (e.g., crashing) as potential confounds, and we consider this excessive: some degree of variation across play sessions is desirable to avoid confounding the effect of videogame play with that of a constant stimulus [61].

Applying game modifications (“mods”) – external changes to a game’s assets – can be an effective way to exert control over unwanted aspects of a game’s design, or to create stimuli more directly tailored to research goals [16]. While modding can be difficult and time-consuming, it may be necessary when desired game content is unavailable or otherwise unsuitable for research. However, the relatively small number of games that feature modding tools may restrict game selection when customised content is required. Researchers should strongly consider the trade-offs between commercial and custom-made (or modded) games when constructing their experiment design, and we suggest reviewing the excellent work on this topic [16, 27] for this purpose.

Järvelä et al. [27] discuss desirable characteristics of stimulus games for event-based research, which analyses atomised in-game events or tasks (e.g., collecting a power-up). Taking events as the unit of analysis allows researchers to make claims about the effects of individual game elements (e.g., rewards) that occur across game types and genres [49]. In line with their research approach, Järvelä et al. highlight the need to consider games with temporally isolated and repeated tasks, an inherent capacity for data collection (e.g., automatically generated game logs), and the ability to create a consistent experience across participants. On the latter point, the importance of controlling for player skill (e.g., selecting games with difficulty settings) is emphasised [27].

These papers are valuable perspectives into the types of game characteristics that are generally desirable for experimental research. However, information on how to structure the game selection process is less forthcoming: even when examples are provided in the text (e.g., [26]), decisions leading to the final selection remain somewhat ambiguous.

The approach suggested by Järvelä et al. [27] – combining subjective measures of game quality with broad [17] or specific [42] game taxonomies – is likely valuable when a small number of games are under consideration as potential stimuli; indeed, the authors appear to begin from this position. Taxonomies are less helpful for creating such a shortlist: hundreds of games are released every year on PC alone [58], and categorising games by taxonomy can be immensely time-consuming without first-hand play experience. Subjective measures of game quality (e.g., aggregated review scores) can assist in delimiting a search; however, multiple search criteria are needed to manage the growing volume of commercial game releases. Creating an initial shortlist requires a technique that can accurately remove large numbers of obviously unsuitable games from consideration without individual review.

THE PROPOSED APPROACH
We therefore propose a videogame selection technique that delimits an initial search using widely-applicable game characteristics. Recent work in information science [31, 37, 38] has taken a user-centred approach to metadata derivation. Their iterative process of item selection was initially grounded in player interviews, and later refined with online survey data and interviews with a wider range of potential users (e.g., developers, curators). This user-centred approach reflects a view that classification schema should suit their likely users’ needs; we take a similar position in reinterpreting it for use in structuring game selection procedures. The Video Game Metadata Schema (VGMS), in its current incarnation [39], features both objective (e.g., platform) and subjective characteristics (e.g., theme). The latter are coded using a restricted vocabulary outlined in the schema.

The VGMS is well-suited to shortlist derivation because it features both easily searchable elements (e.g., genre, platform, release date) that can immediately cut wide swathes of games from consideration, as well as a more detailed set of game elements that are less readily searchable (e.g., Number of Players, Point of View) but remain likely to influence PX nonetheless. Not all factors will apply to every study – however, the breadth of study designs in the literature suggests that no standard subset of ‘optimal’ VGMS items exists: researchers are best positioned to identify the factors with the greatest utility in their own studies. Considering the relevance of each item reveals a profile of game attributes identified in the literature as most likely to elicit a desired PX. We recognise that appropriating the VGMS for structuring game selection involves some degree of reinterpretation, while retaining the spirit of the original meaning: for example, even when Number of Players is a relevant factor, listing every possible arrangement of a game’s multiplayer modes is unnecessary if only one will be studied.

The VGMS lacks a measure of game quality that could improve the specificity of a search. While ‘quality’ is a purely subjective measure, aggregate review scores (e.g., from ) may be interpreted as a correlate of usability [40], which is generally a desirable attribute to consider in the time-limited experimental setting. As previously noted, existing work [27] suggests that these publicly available scores are a powerful addition to initial search criteria. We therefore recommend ‘quality’ as a practical, if unofficial, addition to criteria derived from the VGMS.

A number of VGMS items (e.g., genre, mode of play) are presently mentioned, if only briefly, when describing stimulus games in published research – indeed, many are associated with existing PX research outcomes. Applying VGMS items to structure game selection represents a formalisation of current practices that are partially or informally conducted. Because our aim is to provide researchers with accessible methods, we present an example of how the VGMS could be feasibly applied and reported.

Example Application of Videogame Metadata
The following example details the process of developing criteria for game selection with VGMS categories as a guide. It is hoped that presenting such information in this paper will provide practical guidance in interpreting these factors in a plausible context. This example refers to an experimental study concerned with the effects of immersion on game enjoyment; a report of the game selection process follows.

Selection criteria for a game designed to induce immersion were developed via examination of the immersion concept definition [30] and relevant PX research, structured with VGMS items [39] most likely to strongly affect PX. For the purposes of this research, immersion “involves a lack of awareness of time, a loss of awareness of the real world, involvement and a sense of being in the task environment” [30]. VGMS items selected for further review, in the order they appear in the schema, are Gameplay Genre, Progression, Platform, Number of Players, Point of View, Difficulty Control, and Retail Release Date. A measure of Quality was appended to these items to represent usability [40].

• Gameplay Genre: The target game should belong to either the Action Adventure, Action RPG, or RPG genre. Immersive experiences in these genres are more likely, as they typically offer compelling narratives, character designs, and audiovisual fidelity [32].
• Progression: A generally linear mode of progression is most likely to support continuous involvement in the game world [33]; this consistent sense of progress is associated with immersion [29].
• Platform: As the game will be played in a university computer lab, only games released on PC may be considered for the study.
• Number of Players: Although competitive multiplayer games may strongly induce immersion [10], the target game should feature a single-player mode in order to minimise potential confounds caused by the presence of other players.
• Perspective: A first-person perspective, compared to third-person, is likely to support immersion through direct avatar embodiment [14].
• Difficulty Control: A choice of difficulty level or (ideally) Dynamic Difficulty Adjustment (DDA) will broaden the pool of suitable participants, and make immersive experiences more likely for participants across skill levels [24].
• Retail Release Date: The candidate game should be a recent release to more closely link research outcomes to the current state of the art. The novelty of a game has also been linked to immersive experiences [29].
• Quality: Usability issues (e.g., lack of adequate control [40, 50]) may frustrate players and limit immersion; the selected game should therefore minimise usability issues to allow participants to focus entirely on play. Games that received a strong critical reception are appropriate candidates for this reason.

A search was conducted on GameFAQs [18] for PC games released in 2017, categorised as ‘Action Adventure’ or ‘Role-Playing’, and sorted according to Metacritic score. Only games with 5 or more critic reviews were included in the search as a precaution against individual reviewer bias. Game quality was operationalised as having an aggregated review score of 80 or greater (on a 100-point scale) on Metacritic. This process was conducted in September, so games released in the final months of 2017 could not be considered. A total of 17 games were identified as meeting these criteria. Manual inspection revealed 7 games that were originally released prior to 2017: these were games released on PC in 2017 (e.g., Bayonetta [51]), expansions for a previously-released title (e.g., Darkest Dungeon: The Crimson Court [55]), or remasters of older games (e.g., Planescape: Torment - Enhanced Edition [48]). These games were subsequently excluded, as the original design practices that guided these games’ production may not accurately represent the present state of the industry. We further identified that one game (Lone Echo [54]) required a VR headset to play; lacking such hardware, we removed it from consideration. The final shortlist of 9 games is presented in Table 1. Each remaining game was assessed according to the remaining criteria; however, all games featured a single-player mode, and we therefore removed this column for legibility. The list of shortlisted games at each step of this process has been included as supplementary material.

Title                          Progression          Perspective   Difficulty Control
Night in the Woods             Linear, Branching    3rd person    None
Hollow Knight                  Linear, Open world   3rd person    None
Hellblade: Senua’s Sacrifice   Linear               3rd person    DDA, Settings
Resident Evil 7: Biohazard     Linear               1st person    Settings
Little Nightmares              Linear               3rd person    None
West of Loathing               Open world           3rd person    None
Monster Slayers                Linear               3rd person    Settings
NieR: Automata                 Linear, Open world   3rd person    Modular
Pyre                           Linear               3rd person    Settings

Table 1. Shortlisted games evaluated on progression type, camera perspective, and the presence of options to regulate difficulty.

One game, Resident Evil 7: Biohazard, met all criteria, and five others fail only in their use of 3rd person perspective. However, we highlight that the goal of using VGMS items as criteria was to derive a shortlist of plausible candidate games. The remaining criteria are likely to assist in making the final decision; regardless, using the number of satisfied criteria alone to justify game selection is highly precarious. Each shortlisted game has advantages and disadvantages for use in experimental study beyond the broad factors outlined in this process. For example, while Resident Evil 7: Biohazard may indeed be highly immersive, its strong horror elements may interfere with recruitment or cause participants to drop out during the session. A metadata-based approach should be used to supplement, rather than replace, well-considered reasoning for the final selection. We emphasise that actually playing the shortlisted games at this time is strongly recommended to qualify the final decision-making process.

This example report demonstrates that considering the range of game elements outlined in the VGMS provides a fuller picture of the final candidate game – and the desired PX it helps to create – to researchers and readers alike. One strength of this approach is its basis in existing research that relates VGMS items to PX outcomes. However, this information may not be available, especially when the desired PX is not directly tied to a well-studied concept, as it was in the previous example. We propose the application of game design theory, developed to connect tangible game elements to desired experiences, as one approach to game selection in such an event.

USING THE MDA FRAMEWORK TO IDENTIFY RELEVANT CRITERIA
The MDA (Mechanics, Dynamics, Aesthetics) framework considers videogames with the view that game designers’ and players’ perspectives towards the medium represent opposite approaches – that is, that the act of game design positions developers in a fundamentally different way than players, who primarily consider games with consumption in mind [25]. Within the framework, ‘mechanics’ are the various forms of player action that are afforded by input devices, and instantiated in code; ‘dynamics’ are the emergent player interactions that occur through game mechanics; and ‘aesthetics’ are the emotional states or experiences that result from play. The framework is primarily used in game design (e.g., [13, 23]) as a way of linking game attributes, player behaviour, and PX; this quality also makes the framework highly valuable for game selection in the absence of readily applicable PX research.

This paper’s proposed application of the MDA framework begins by describing a set of desired aesthetics (i.e., an intended PX), which are then used to identify potential dynamics and mechanics that could support such aesthetics – and, with equal utility, those that would be decidedly inappropriate. These characteristics become the selection criteria for creating a shortlist of suitable games. This approach can be used in isolation, or in conjunction with VGMS items, to the extent that these items appear relevant to desired PX outcomes. Similarly, pre-existing research can and should be used to support this method where available. The following example demonstrates use of the method in isolation from prior research and the VGMS to exhibit the utility of its independent application. We do, however, recommend applying other methods (such as the VGMS) and relevant existing work wherever possible.

Example Application of the MDA Framework
Consider an example study in which the target game must repair participant mood following an unrelated frustrating task. The following section presents a feasible application of the MDA framework in deriving a set of characteristics that the target game should reflect.

Mood Management Theory [63] suggests the selection of a stimulus game that induces positive affect and low arousal to counteract the negatively-valenced, arousing frustration task. These desired emotions can be translated into game aesthetics that are likely to engender such affective responses. The aesthetics chosen are gentleness, progress, and accessible challenge; each of these either limits further frustration, provides opportunities for satisfaction, or lowers arousal levels.

Dynamics resulting from gentleness as an aesthetic goal could involve players having the ability to make mistakes without immediately ending the game. Players should also have non-violent experiences with the game. A key dynamic arising from progress is for players to see their contribution towards an in-game goal. Dynamics of accessible challenge involve quick adaptation to game controls, the ability to learn game functions without significant penalty, and being able to approach obstacles with appropriate levels of confidence.

Suitable mechanics then include:

• Implicit or delayed fail states
• Non-violent player actions
• Audiovisual representation of player standing relative to their goal (e.g., progress bar)
• Immediately responsive controls
• A limited number of possible inputs (actions)
• A tutorial level, or just-in-time tutorial prompts

The game dynamics and mechanics identified as supporting a particular set of aesthetics may then be applied as selection criteria for determining an appropriate candidate game. We again underscore the incomparable value of researchers playing these shortlisted games before finalising their inclusion in the experimental design.

Approaching videogame selection from a design perspective makes its justification possible in the absence of existing research. Applying the MDA framework, or other game design theories, could assist researchers interested in broadening studies of PX beyond established concepts. It may also be valuable in the event that VGMS-based exclusion criteria fail to generate a sufficiently limited shortlist. However, deriving appropriate mechanics, dynamics, and aesthetics requires a degree of familiarity and experience with videogames, as does correctly identifying games that feature them. (Indeed, researchers who are also experienced players and developers will likely make the fullest use of this approach.) We suggest that this represents an opportunity for researchers with limited design expertise to collaborate with developers, lending the process a measure of multidisciplinary integrity.

The process outlined in this section is inherently subjective, and may produce criteria broad enough to describe a variety of videogame types. A degree of care is therefore required to interpret these criteria in ways that meet study requirements. Clear reporting practices are crucial when applying the MDA framework to game selection.

DISCUSSION
Differences between the two methods proposed in this paper reflect the range of positions in which researchers may find themselves when planning an experimental study. The greatest strength of the metadata-based selection process lies in its direct application of existing work: videogame metadata derived from an iterative and user-centred process, online databases of videogame titles, and research linking search criteria to PX outcomes. The approach relies on relevant PX literature, however, which is not always available. Certainly, no method (including those proposed here) can claim universal applicability; other frameworks (e.g., activity theory [46], the Design Box [1]) are likely to yield practical videogame selection methods with different strengths. Regardless of approach, thorough evaluation of connections between game elements and PX is invaluable during experiment design to avoid potential confounds, and hence maximise the value of collected data.

A selection process structured using the MDA framework applies design thinking to link PX with tangible game elements that would support a desired experience. The framework’s robustness and versatility – demonstrated by its continued use [13, 23, 60], despite its publication in 2004 [25] – lend strength to its application in the present work, suggesting utility in accommodating future industry directions. The MDA selection process can be conducted without a substantial body of existing literature, although research outcomes can be easily incorporated into the process wherever relevant. Unlike many game taxonomies, the MDA framework itself is relatively straightforward; researchers are therefore able to focus on its application, rather than interpretation. However, this game selection process inherits the subjectivity that accompanies game design. Greater attention must be paid to reporting the procedure as a result.

Despite these differences, both methods begin from the perspective of studying a particular experience of interest. This reflects the focus of PX research on the player – who is obviously important – in the player-game interaction. We suggest, however, that this has occurred to the detriment of considering the videogame’s role in helping elicit desirable experiences. The approaches identified in this paper correct this imbalance, acknowledging the equal contribution of videogames in producing PX. In attending to the videogame as artefact, and (comparatively) distancing ourselves from the player, we approach PX from a perspective that considers player characteristics – which are largely beyond researcher control – less relevant to PX experiment design.

Videogame selection is always subjective. In forming links between game elements and experience, the proposed methods ensure that researchers clarify their subjective interpretations of experience, which may be otherwise indistinct (e.g., immersion, presence, and flow [8, 45]); greater retention of conceptual consistency and legibility across the literature can be attained as a result. Indeed, these methods aim to promote considered reflection on how videogames – and game elements – contribute to PX, support meaningful engagement with relevant literature, and assist in structuring a rigorous, detailed report of how these practices have influenced the subjective selection process. We reiterate that formulaic application of either method is unlikely to benefit either individual experiment design or videogame research as a field.

LIMITATIONS AND FUTURE WORK
The methods outlined in this paper generally discuss games as abstractions – as objects to be categorised, or compilations of design elements that support a particular experience. In practice, videogame play is more complex than what analytic methods can encapsulate. We therefore find it appropriate to once again recommend, as others have (e.g., [6, 62]), that researchers experience a reasonable proportion of the games they study. A degree of critical distance from the subject matter may indeed limit researcher bias; regardless, the continued importance of regular playtesting in game development highlights that essential knowledge of the medium is uniquely derived from play. In a similar vein, the present work makes every attempt to encourage more thoughtful game selection practices; however, it remains possible to apply either method uncritically, adopting a reductionist perspective that considers games only in terms of their separate elements. While we are unable to directly prevent a paint-by-numbers approach to research, we hope this paper makes such work easier to identify and critique.

This paper contributes to an ongoing trend in videogame research and HCI that seeks to improve study design, and in turn, the quality of research findings in the field. Videogame research has proceeded for over a decade with surprisingly sparse literature about how it should be done. We are enthusiastic about the prospect of more researchers turning their attention to the place of videogames in study design, given their apparent irreducibility; in particular, we eagerly anticipate the construction of new videogame selection practices with strengths that differ from those presented here. We note, for example, that successful application of the VGMS-based method (when used in isolation) depends on existing study of the subject matter; it is hence unsuited to shortlist derivation when few of its game elements appear linked to PX in the extant literature.

CONCLUSIONS
This paper has proposed two approaches to videogame selection in experimental PX research. The first, based on videogame metadata, can quickly and accurately remove large numbers of games from consideration, and is best-suited to studies with strong theoretical connections to existing PX research. The second approach, based on the MDA framework, provides a means for researchers to derive a list of game elements likely to support a desired experience – and a corresponding set of appropriate games – when existing literature is less readily applicable. Example applications of both methods demonstrate that reporting the selection process clarifies the relationship between candidate games and research goals. Rigour in designing and reporting videogame research is crucial for study replication and the credibility of the field; both would make funding PX research a more viable prospect.

ACKNOWLEDGMENTS
This work was completed with the benefit of the Australian Postgraduate Award (APA) and the QUT Excellence Top-Up Scholarship.

REFERENCES
[1] Roger Altizer Jr. and José P. Zagal. 2014. Designing Inside the Box or Pitching Practices in Industry and Education. In Proceedings of DiGRA 2014.
[2] Craig A. Anderson. 2004. An Update on the Effects of Playing Violent Video Games. Journal of Adolescence 27, 1, 113-122.
[3] Craig A. Anderson, Nicholas L. Carnagey, Mindy Flanagan, Arlin J. Benjamin Jr., Janie Eubanks, and Jeffery C. Valentine. 2004. Violent Video Games: Specific Effects of Violent Content on Aggressive Thoughts and Behavior. Advances in Experimental Social Psychology 36, 199-249.
[4] Patrícia Arriaga, Joana Adrião, Filipa Madeira, Inês Cavaleiro, Alexandra Maia e Silva, Isabel Barahona, and Francisco Esteves. 2015. A "Dry Eye" for Victims of Violence: Effects of Playing a Violent Video Game on Pupillary Dilation to Victims and on Aggressive Behavior. Psychology of Violence 5, 2, 199-208.
[5] J. A. Bargas-Avila and K. Hornbæk. 2011. Old Wine in New Bottles or Novel Challenges: A Critical Analysis of Empirical Studies of User Experience. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2689-2698.
[6] Pippin Barr, James Noble, and Robert Biddle. 2007. Video Game Values: Human-Computer Interaction and Games. Interacting with Computers 19, 2, 180-195.
[7] Debbie Bonetti, Martin Eccles, Marie Johnston, Nick Steen, Jeremy Grimshaw, Rachel Baker, Anne Walker, and Nigel Pitts. 2005. Guiding the
Journal of Human-Computer Studies 71, 11, 1069-1077.
[11] Donald T. Campbell and Julian C. Stanley. 2015. Experimental and Quasi-Experimental Designs for Research. Ravenio Books.
[12] Peter Craig, Paul Dieppe, Sally Macintyre, Susan Michie, Irwin Nazareth, and Mark Petticrew. 2008. Developing and Evaluating Complex Interventions: The New Medical Research Council Guidance. BMJ 337.
[13] Christy Dena. 2017. Finding a Way: Techniques to Avoid Schema Tension in Narrative Design. Transactions of the Digital Games Research Association 3, 1, 27-61.
[14] Alena Denisova and Paul Cairns. 2015. First Person vs. Third Person Perspective in Digital Games: Do Player Preferences Affect Immersion? In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. 145-148.
[15] Mark D. Dunlop and Mark Baillie. 2009. Paper Rejected (p>0.05): An Introduction to the Debate on Appropriateness of Null-Hypothesis Testing. International Journal of Mobile Human Computer Interaction 1, 3, 1-8.
[16] Malte Elson and Thorsten Quandt. 2016. Digital Games in Laboratory Experiments: Controlling a Complex Stimulus Through Modding. Psychology of Popular Media Culture 5, 1, 52-65.
[17] Christian Elverdam and Espen Aarseth. 2007. Game Classification and Game Design: Construction Through Critical Analysis. Games and Culture 2, 1, 3-22.
[18] GameFAQs. 2017. Search GameFAQs. Retrieved 17 September, 2017 from https://www.gamefaqs.com/search_advanced
[19] Russell Gersten, Scott Baker, and John Wills Lloyd. 2000.
Designing High-Quality Research in Design and Selection of Interventions to Influence Special Education: Group Experimental Design. the Implementation of Evidence-Based Practice: The Journal of Special Education 34, 1, 2-18. An Experimental Simulation of a Complex [20] Wayne D. Gray and Marilyn C. Salzman. 1998. Intervention Trial. Social Science & Medicine 60, Damaged Merchandise? A Review of Experiments 9, 2135-2147. that Compare Usability Evaluation Methods. [8] Elizabeth A. Boyle, Thomas M. Connolly, Thomas Human-Computer Interaction 13, 3, 203-261. Hainey, and James M. Boyle. 2012. Engagement [21] Morten Hertzum and Niels Ebbe Jacobsen. 2001. in Digital Entertainment Games: A Systematic The Evaluator Effect: A Chilling Fact About Review. Computers in Human Behavior 28, 3, Usability Evaluation Methods. International 771-780. Journal of Human-Computer Interaction 13, 4, [9] Paul Cairns. 2007. HCI... Not as it Should Be: 421-443. Inferential Statistics in HCI Research. In [22] Joseph Hilgard, Christopher R. Engelhardt, Bruce Proceedings of the 21st British HCI Group Annual D. Bartholow, and Jeffrey N. Rouder. 2017. How Conference on People and Computers: HCI...But Much Evidence Is p > .05? Stimulus Pre-Testing not as we Know It - Volume 1. 195-201. and Null Primary Outcomes in Violent Video [10] Paul Cairns, Anna L. Cox, Matthew Day, Hayley Games Research. Psychology of Popular Media Martin, and Thomas Perryman. 2013. Who But Culture 6, 4, 361-380. Not Where: The Effect of Social Play on [23] Clint Hocking. 2011. Dynamics: The State of the Immersion in Digital Games. International Art. Video. [24] Robin Hunicke and Vernell Chapman. 2004. AI An Underrecognized Confound in Intervention for Dynamic Difficulty Adjustment in Games. In Research. Psychological Bulletin 130, 2, 289-303. Challenges in Game Artificial Intelligence. AAAI, [36] Effie Lai-Chong Law, Paul van Schaik, and Virpi 91-96. Roto. 2014. 
Attitudes Towards User Experience [25] Robin Hunicke, Marc LeBlanc, and Robert Zubek. (UX) Measurement. International Journal of 2004. MDA: A Formal Approach to Game Design Human-Computer Studies 72, 6, 526-541. and Game Research. In Proceedings of the AAAI [37] Jin Ha Lee, Hyerim Cho, Violet Fox, and Andrew Workshop on Challenges in Game AI. Perti. 2013. User-Centered Approach in Creating a [26] Simo Järvelä, Inger Ekman, J. Matias Kivikangas, Metadata Schema for Video Games and Interactive and Niklas Ravaja. 2014. A Practical Guide to Media. In Proceedings of the 13th ACM/IEEE-CS Using Digital Games as an Experiment Stimulus. Joint Conference on Digital Libraries. 229-238. Transactions of the Digital Games Research [38] Jin Ha Lee, Rachel Ivy Clarke, and Andrew Perti. Association 1, 2. 2015. Empirical Evaluation of Metadata for Video [27] Simo Järvelä, Inger Ekman, J. Matias Kivikangas, Games and Interactive Media. Journal of the and Niklas Ravaja. 2015. Stimulus Games. In Association for Information Science and Game Research Methods. ETC Press. Technology 66, 12, 2609-2625. [28] Monique W. M. Jaspers. 2009. A Comparison of [39] Jin Ha Lee, Andrew Perti, Rachel Ivy Clarke, Usability Methods for Testing Interactive Health Travis W. Windleharth, and Marc Schmalz. 2017. Technologies: Methodological Aspects and Video Game Metadata Schema Version 4.0. Empirical Evidence. International Journal of Retrieved from Medical Informatics 78, 5, 340-353. http://gamer.ischool.uw.edu/official_release/ [29] Charlene Jennett. 2010. Is Game Immersion Just [40] Ian J. Livingston, Regan L. Mandryk, and Kevin Another Form of Selective Attention? An G. Stanley. 2010. Critic-Proofing: How Using Empirical Investigation of Real World Critic Review and Game Genres Can Refine Dissociation in Computer Game Immersion. PhD Heuristic Evaluations. In Proceedings of the Thesis. University College London, London, International Academic Conference on the Future England. 
of Game Design and Technology. ACM, 48-55. [30] Charlene Jennett, Anna L. Cox, Paul Cairns, [41] Kathleen N. Lohr. 2002. Assessing Health Status Samira Dhoparee, Andrew Epps, Tim Tijs, and and Quality-of-Life Instruments: Attributes and Alison Walton. 2008. Measuring and Defining the Review Criteria. Quality of Life Research 11, 3, Experience of Immersion in Games. International 193-205. Journal of Human-Computer Studies 66, 9, 641- [42] Sus Lundgren and Staffan Bjork. 2003. Game 661. Mechanics: Describing Computer-Augmented [31] Jacob Jett, Simone Sacchi, Jin Ha Lee, and Rachel Games in Terms of Interaction. In Proceedings of Ivy Clarke. 2016. A Conceptual Model for Video the 2003 Technologies for Interactive Digital Games and Interactive Media. Journal of the Storytelling and Entertainment Conference. Association for Information Science and [43] Frances Mair and Pamela Whitten. 2000. Technology 67, 3, 505-517. Systematic Review of Studies of Patient [32] Daniel Johnson, Lennart Nacke, and Peta Wyeth. Satisfaction with Telemedicine. BMJ 320, 7248, 2015. All about that Base: Differing Player 1517-1520. Experiences in Video Game Genres and the [44] Ryan P. McMahan, Eric D. Ragan, Anamary Leal, Unique Case of MOBA Games. In Proceedings of Robert J. Beaton, and Doug A. Bowman. 2011. 33rd Annual ACM Conference on Human Factors Considerations for the Use of Commercial Video in Computing Systems. 2265-2274. Games in Controlled Experiments. Entertainment http://dx.doi.org/10.1145/2702123.2702447 Computing 2, 1, 3-9. [33] Daniel Johnson, Peta Wyeth, Penelope Sweetser, [45] Elisa D. Mekler, Julia Ayumi Bopp, Alexandre N. and John Gardner. 2012. Personality, Genre and Tuch, and Klaus Opwis. 2014. A Systematic Videogame Play Experience. In Proceedings of the Review of Quantitative Studies on the Enjoyment 4th International Conference on Fun and Games. of Digital Entertainment Games. In Proceedings of ACM, 117-120. 
the 32nd Annual ACM Conference on Human [34] Maurits Kaptein and Judy Robertson. 2012. Factors in Computing Systems. ACM, 927-936. Rethinking Statistical Analysis Methods for CHI. [46] Bonnie A. Nardi, ed. Context and Consciousness: In Proceedings of the SIGCHI Conference on Activity Theory and Human-Computer Interaction. Human Factors in Computing Systems. ACM, 1996, MIT Press: Massachusetts, USA. 1105-1114. [47] Nintendo EAD. 2008. Mario Kart Wii. Videogame [35] Robert E. Larzelere, Brett R. Kuhn, and Byron [Wii]. Nintendo, Kyoto, Japan. Johnson. 2004. The Intervention Selection Bias: [48] Overhaul Games. 2017. Planescape: Torment - Selection and Evaluation. Measures of Personality Enhanced Edition. Videogame [PC]. Beamdog, and Social Psychological Attitudes 1, 3, 1-16. Edmonton, Canada. [57] Robert Rosenthal. 1979. The File Drawer Problem [49] Cody Phillips, Daniel Johnson, Peta Wyeth, and Tolerance for Null Results. Psychological Leanne Hides, and Madison Klarkowski. 2015. Bulletin 86, 3, 638-641. Redefining Videogame Reward Types. In [58] SteamSpy. 2017. Monthly Summaries. Retrieved Proceedings of the Annual Meeting of the 7th September, 2017 from Australian Special Interest Group for Computer http://steamspy.com/year/ Human Interaction. 83-91. [59] Caroline B. Terwee, Sandra D. M. Bot, Michael R. [50] David Pinelle, Nelson Wong, and Tadeusz Stach. de Boer, Daniëlle A. W. M. van der Windt, Dirk L. 2008. Heuristic Evaluation for Games: Usability Knol, Joost Dekker, Lex M. Bouter, and Henrica Principles for Video Game Design. In Proceedings C. W. de Vet. 2007. Quality Criteria Were of the SIGCHI Conference on Human Factors in Proposed for Measurement Properties of Health Computing Systems. ACM, 1453-1462. Status Questionnaires. Journal of Clinical [51] PlatinumGames. 2017. Bayonetta. Videogame Epidemiology 60, 1, 34-42. [PC]. SEGA, Tokyo, Japan. [60] Vero Vanden Abeele, Lennart E. Nacke, Elisa D. 
[52] Michael Pressley, Steve Graham, and Karen Mekler, and Daniel Johnson. 2016. Design and Harris. 2006. The State of Educational Preliminary Validation of The Player Experience Intervention Research as Viewed Through the Inventory. In Proceedings of the 2016 Annual Lens of Literacy Intervention. British Journal of Symposium on Computer-Human Interaction in Educational Psychology 76, 1, 1-19. Play Companion Extended Abstracts. ACM, 335- [53] Andrew K. Przybylski, Edward L. Deci, C. Scott 341. Rigby, and Richard M. Ryan. 2014. Competence- [61] Gary L. Wells and Paul D. Windschitl. 1999. Impeding Electronic Games and Players' Stimulus Sampling and Social Psychological Aggressive Feelings, Thoughts, and Behaviors. Experimentation. Personality and Social Journal of Personality and Social Psychology 106, Psychology Bulletin 25, 9, 1115-1125. 3, 441-457. [62] Dmitri Williams. 2005. Bridging the [54] . 2017. Lone Echo. Videogame Methodological Divide in Game Research. [PC]. Studios, California, USA. Simulation & Gaming 36, 4, 447-463. [55] Red Hook Studios. 2017. Darkest Dungeon - The [63] Dolf Zillmann. 1988. Mood Management Through Crimson Court. Videogame [PC]. Red Hook Communication Choices. The American Studios, Vancouver, Canada. Behavioral Scientist 31, 3, 327-340. [56] John P. Robinson, Phillip R. Shaver, and

Lawrence S. Wrightsman. 1991. Criteria for Scale