A STUDY OF TECHNIQUES FOR MEASURING ENJOYMENT IN VIDEO GAMES CONTAINING PROCEDURAL GENERATION

By ELIZABETH A. MATTHEWS

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2019

© 2019 Elizabeth A. Matthews

I dedicate this to everyone who believed in me, even when I didn’t. To my friends and family: you laid the groundwork for me to start running. To my past self: your struggle got me to the end. To those in the future: know that you can do it too.

ACKNOWLEDGMENTS

First, I would like to thank my advisor, Dr. Juan E. Gilbert, for believing in me since the beginning. I would also like to thank my committee for being interested and invested in my progress and success. Thanks to my parents, Robin A. Matthews and Geoffrey B. Matthews, for their experience and guidance through a stressful time. Thanks to everyone in the Human Experience Lab at the University of Florida for their help, friendship, and support. Specifically, thank you to Rua Williams, Briana Posadas, and DeKita Moon for their thorough feedback on this dissertation. I would like to extend special thanks to Bodie Lee, for always being there for me. Finally, I would like to thank my friends outside of academia for their listening and encouragement.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
   1.1 Problem and Motivation
   1.2 Procedural Content Generation
   1.3 Game Enjoyment
   1.4 Overview of Goals, Research Questions, and Thesis Statement
   1.5 Contributions
   1.6 Organization of the Dissertation

2 RELATED WORK / LITERATURE REVIEW
   2.1 Defining Game Enjoyment
      2.1.1 GameFlow
      2.1.2 Motivational States
      2.1.3 Emotional States
      2.1.4 Needs Satisfaction
      2.1.5 Engagement
      2.1.6 Summary
   2.2 Measuring Game Enjoyment
      2.2.1 Subjective Measurements
      2.2.2 Objective Measurements
      2.2.3 List of Measurement Tools
      2.2.4 Summary
   2.3 Procedural Generation in Video Games
      2.3.1 Independent Procedural Content Generation
      2.3.2 Experience Driven Procedural Content Generation
      2.3.3 Summary

3 IMPLICATIONS
   3.1 Problem Statements and Proposed Solution
   3.2 Research Questions and Hypotheses to be Tested

4 APPROACH
   4.1 User Study Design Overview
   4.2 Participants
   4.3 Game Types
   4.4 Game Enjoyment Metric and Measurement
   4.5 Data Collection and Analysis
      4.5.1 Demographics
      4.5.2 The Game Experience Questionnaire
      4.5.3 The Fang et al. Questionnaire
      4.5.4 Physiological Data with the Empatica

5 2D+3D INFINITE RUNNER ENGINE
   5.1 Game Style Choice
   5.2 Procedural Content Generation
   5.3 Static Content Generation
   5.4 Manual Designs

6 ATLAS CHRONICLE
   6.1 Story Abstraction
   6.2 Physics Engine
   6.3 Terrain Generation
   6.4 Recursive Process
   6.5 Terrain Mapping
   6.6 Testing
      6.6.1 Storyline and Progression
      6.6.2 2D Engine
   6.7 Static Content Generation
   6.8 Manual Designs

7 DATA AND ANALYSIS
   7.1 Data
   7.2 Analysis
      7.2.1 Objective Measurements
      7.2.2 Subjective Measurements
         7.2.2.1 Analysis Procedure
         7.2.2.2 Decline of Enjoyment
         7.2.2.3 Gender and Gamer Label Differences
   7.3 Results

8 SUMMARY AND FUTURE WORK
   8.1 Summary and Research Questions Revisited
   8.2 Contributions
   8.3 Limitations and Future Work
   8.4 Conclusions

APPENDIX

A DESIGN MATERIALS
   A.1 Infinite Runner Designer
   A.2 RPG Designer

B USER STUDY MATERIALS
   B.1 Recruitment Flyer
   B.2 Screening Form
   B.3 Informed Consent Form
   B.4 Instructions
   B.5 Game Experience Questionnaire
      B.5.1 In-Game GEQ
      B.5.2 Scoring Guide
   B.6 Fang et al. Questionnaire

C ADDITIONAL STUDY DATA
   C.1 Subjective Data
      C.1.1 Before and After by Generation Type
      C.1.2 Game and Generation
      C.1.3 Gender

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

4-1  Table of participant randomized order assignment. Letters represent the following: s = static, m = manual, and p = procedural content generation; Run = infinite runner game, RPG = role playing game. Order uses a Latin square of size 3 for both game types.

7-1  A traditional numerical contingency table for GEQ03 ‘‘I felt bored.’’ The left column indicates the numerical response value. The other columns are for each segment during which the data is collected. The numbers in the segment columns represent the total for each response value in that segment.

7-2  A traditional numerical contingency table with superscripts indicating the pairwise test results for GEQ03 ‘‘I felt bored.’’ Numbers are the same as Table 7-1.

A-1  Controls for the Infinite Runner Designer program.

A-2  Table for manual color mapping.

LIST OF FIGURES

1-1  Two variations of dungeon layouts in Rogue [1].

4-1  The typical features of an infinite runner, with the PCG in green. The user controls a runner with constant velocity by pressing a button to jump. Procedural generation controls the jump distance, the next platform’s height relative to the current platform, the next platform’s length, and optional obstacles. Good PCG assures that the layouts are always possible.

4-2  The features of a possible RPG with PCG setup. A story is defined by important locations of interest and connections between them. PCG then varies the relative cardinal directions, distances, and landmass shapes around the locations. Good PCG would assure that the progression of locations in the story is not broken by impossible land features that would otherwise prevent the subject from following the intended storyline.

4-3  Mockup of the setup for the study. The participant will be seated a comfortable distance from the computer monitor, between one and three feet.

5-1  Example of the implemented 2D side-scrolling jumping game designed with the 2D+3D infinite runner engine [2]. The user controls the square by jumping between platforms.

6-1  Example of story abstraction with three LOI and three restrictions. The restrictions between A/B and B/C are traversable, and the restriction between A/C is not.

6-2  The three states in which a slide spring can exist. Top image: the distance between A and B is less than MAX and greater than MIN. Middle image: the distance between A and B is less than MIN; force is applied to push A and B away from each other. Bottom image: the distance between A and B is greater than MAX; force is applied to pull A and B towards each other.

6-3  The process from the coordinates generated by the space manager to the 2D map.

6-4  Recursive process for a continent, ending with one possible continent generated.

6-5  Recursive process for a world, ending with one possible world generated.

6-6  Example of noise added to climate mapping for more natural boundaries.

6-7  Terrain boundary map used in examples for the testing game.

6-8  Example of noise added to climate mapping for more natural boundaries.

6-9  LOI structure for the minimal RPG to be used.

6-10 Example gameplay of the 2D RPG testing engine for Atlas Chronicle.

7-1  The color key used for all physiological box plots. Red indicates that the mean of the box plot is close to the overall mean for the participant for the day, and cyan indicates that the mean of the box plot is the furthest.

7-2  Skin temperature results for Participant 14. X-axis is the time segment during which the data was collected. Y-axis is the measured skin temperature scaled to the participant’s average skin temperature for the day. Colors are from the color scale in Figure 7-1. Assigned order was RPG-p, RPG-s, RPG-m, Run-m, Run-p, Run-s. For the RPG game session, skin temperature showed a distinct pattern of increasing, plateauing, then decreasing. Variance was less for the Runner game.

7-3  Skin conductivity (EDA) results for Participant 9. X-axis is the time segment during which the data was collected. Y-axis is the measured skin conductivity scaled to the participant’s average skin conductivity for the day. Colors are from the color scale in Figure 7-1. Assigned order was RPG-s, RPG-m, RPG-p, Run-s, Run-m, Run-p. The Runner showed a pattern of increasing conductivity during the first two generation types, with a final plateau. The RPG showed no discernible pattern. The 5th segment was lost for Run-s due to software glitches.

7-4  Skin temperature results for Participant 3. X-axis is the time segment during which the data was collected. Y-axis is the measured skin temperature scaled to the participant’s average skin temperature for the day. Colors are from the color scale in Figure 7-1. Assigned order was Run-s, Run-m, Run-p, RPG-m, RPG-p, RPG-s. The notable pattern for all sessions is starting low and increasing during game play.

7-5  Skin temperature results for Participant 8. X-axis is the time segment during which the data was collected. Y-axis is the measured skin temperature scaled to the participant’s average skin temperature for the day. Colors are from the color scale in Figure 7-1. Assigned order was Run-p, Run-s, Run-m, RPG-p, RPG-s, RPG-m. The RPG shows a pattern of increase and then plateau. The Runner game shows an unusual pattern of minimal variance. The 5th segment was lost for Runner-m and Runner-s due to Empatica software glitches.

7-6  Skin temperature results for Participant 9. X-axis is the time segment during which the data was collected. Y-axis is the measured skin temperature scaled to the participant’s average skin temperature for the day. Colors are from the color scale in Figure 7-1. Assigned order was RPG-s, RPG-m, RPG-p, Run-s, Run-m, Run-p. Contrary to Figure 7-3, Participant 9’s skin temperature did not produce the same pattern as the skin conductivity.

7-7  Heart rate results for Participant 4. X-axis is the time segment during which the data was collected. Y-axis is the measured heart rate scaled to the participant’s average heart rate for the day. Colors are from the color scale in Figure 7-1. No discernible pattern was observed.

7-8  A visual table of the results for the question ‘‘I felt bored’’ pooled by time segment. Y-axis is the response value from 0 (not at all) to 4 (extremely). X-axis on the top labels the segment for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

7-9  The color key used in the subjective contingency tables. The key depends on the data shown.

7-10 Responses to the question ‘‘I feel exhausted when playing this game’’ pooled by time segment. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

7-11 Responses to the GEQ component ‘‘’’ pooled by game type. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the game type for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

7-12 Responses to the GEQ component ‘‘negative affect’’ pooled by time segment and separated by game type. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

7-13 The GEQ component ‘‘negative affect’’ pooled by time segment and generation type, limited to the first (1) and last (5) segments. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m = manual, p = procedural, and s = static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

7-14 Responses to the GEQ component ‘‘negative affect’’ pooled by time segment and generation type, limited to the first (1) and last (5) segments and separated by game type. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m = manual, p = procedural, and s = static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

7-15 GEQ components with results pooled by game and generation type. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the game and generation type for which the response was recorded. The key for generation types is m = manual, p = procedural, and s = static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

7-16 GEQ components with results pooled by gender. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the identified gender of the participant for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

7-17 Scores for the Fang et al. question ‘‘I feel worried when playing this game’’ pooled by gender. Y-axis is the response value, ranging from 0 to 4. X-axis on the top labels the identified gender of the participant for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

7-18 GEQ components with results pooled by self-ascribed gamer level for the Runner game. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the identified gamer level of the participant for which the response was recorded. Gamer levels are pooled into two categories: more experienced (Ex+Fr, Expert and Frequent) and less experienced (Ca+Ne, Casual and Newbie). Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A STUDY OF TECHNIQUES FOR MEASURING ENJOYMENT IN VIDEO GAMES CONTAINING PROCEDURAL GENERATION

By Elizabeth A. Matthews

August 2019
Chair: Juan E. Gilbert
Major: Human-Centered Computing

If asked why one plays video games, a common response would be ‘‘because it’s an enjoyable experience.’’ One of the efforts towards creating enjoyable games involves the generation of content for said games. Procedural Content Generation for Games (PCG-G) is the application of computers to generate game content and to select the enjoyable items for use in games. While many academic papers on procedural content generation exist, there is minimal research into the enjoyment factors improved by games containing procedurally generated content. When researchers advocate for procedural content generation, one of the common rationales is that PCG ‘‘enhances replay value.’’ This dissertation proposes a testing framework for procedural generation that provides evidence as to which types of enjoyment are affected by procedural generation. This research also provides an overview of current research on the measurement of enjoyment in video games and selects established methods to test via user study. Subjective and objective measurements were collected during the user study to help establish which kinds of enjoyment factors are affected by PCG. The study utilized two different types of games in user tests and three different content generation approaches for comparison. Eighteen participants (11 male, 7 female), aged between 18 and 24 years old (average age of 20), were screened based on their prior gaming experience. Results show that computer generated levels were not significantly different from manually designed levels, nor were the computer generated levels different from static levels.

Enjoyment is difficult to measure as it is a subjective experience, even though most people would say ‘‘I know it when I see it.’’ However, despite these difficulties in measurement, many academic papers claim enjoyment as their primary motivation for exploring new procedural generation approaches, even though minimal research has been conducted to back up the claim that PCG enhances ‘‘replay value.’’ This research takes the first steps toward exploring enjoyment factors as they relate to PCG, but future work still needs to be performed before these claims are confirmed.

CHAPTER 1
INTRODUCTION

1.1 Problem and Motivation

If asked why one plays video games, a common response would be ‘‘because it’s an enjoyable experience’’ [3]. Video games are a popular part of leisure activity, demonstrated by the fact that video game revenues currently outstrip movie revenues by more than 25 percent [4, 5]. While economic factors such as these imply it is lucrative to study video games, the understanding of the theory of enjoyment in video games is the primary motivator for this work. ‘‘Enjoyment’’ is naturally a subjective experience. Initial investigations in the psychological literature are still incomplete and need further research to solidify the approach [6, 7]. One of the efforts towards creating more enjoyable games involves the generation of content for said games. Traditionally, this content has been generated manually by human game designers and level designers. However, the two major problems with manual content generation for games are its expense [8] and the fact that it does not scale [9]. Procedural Content Generation for Games (PCG-G) is the application of computers to generate game content and select the enjoyable items for use in games [10]. While there is a large quantity of research in procedural content generation, there is minimal research into the enjoyment factors improved by games containing procedurally generated content.

1.2 Procedural Content Generation

Procedural Content Generation (PCG), sometimes referred to as just Procedural Generation (PG), is the generation of virtual content by computers, typically through a human-defined procedure. While it is possible to use procedural generation to create content for a wide variety of subjects, one of the most common areas of application is video games. As games get larger and more expansive, a cheaper and faster approach to generating the game content is highly desirable.

The types of applications for which procedural content is generated can range from platformer games [11] to experience-driven systems [12]. PCG has been utilized in games for a long time, starting with one of the earliest games, Rogue [1]. Rogue was a game in which the goal was to explore a cave-like level to advance until the end, where the cave level layout, enemy placement, etc. were generated procedurally. The game used ASCII characters for visualization. Despite the lack of advanced graphics, the game was so popular that it sparked a colloquialism to describe games similar in concept: ‘‘roguelike’’ [13]. Two examples of generated content from Rogue can be seen in Figure 1-1. Games with PCG can have a minimal amount of generated content, but the most prevalent and noticeable forms are repeated-play variations. Some games [14] have the same storyline each time the game is started by a gamer, but the dungeons in which the player progresses through the story are procedurally generated. As a result, each time the game is played, the exploration is different. Although this effect is possible to achieve with manual content generation, it is infeasible due to the sheer amount of time needed to create enough levels that a gamer would not notice repeating levels after repeatedly playing the game. One of the most common claims in the motivation for studying PCG is that the addition of procedural generation enhances replay value, or replayability. Replay value is the concept of continued enjoyment over repeated plays of the game. Most research does not provide a rigorous definition or examination of game enjoyment, but rather simply claims outright that enhanced game enjoyment occurs.
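To make the repeated-play idea concrete, the following minimal Python sketch (illustrative only; the generator, its parameter ranges, and all names are assumptions, not code from this dissertation's engines) shows how seeding a random generator separates a ‘‘static’’ level from a procedurally varied one:

    import random

    def generate_platforms(seed=None, count=10):
        """Generate (gap, height, length) tuples for an infinite-runner level.

        A fixed seed reproduces the same 'static' level on every play; a fresh
        seed per play yields a new layout each time, which is the repeated-play
        variation that PCG offers.
        """
        rng = random.Random(seed)
        platforms = []
        for _ in range(count):
            gap = rng.randint(1, 4)      # horizontal gap to the next platform
            height = rng.randint(-2, 2)  # height change relative to current platform
            length = rng.randint(3, 8)   # length of the next platform
            platforms.append((gap, height, length))
        return platforms

    static_level = generate_platforms(seed=42)  # identical on every play
    procedural_level = generate_platforms()     # varies on every play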

1.3 Game Enjoyment

Game enjoyment has been studied quite thoroughly, utilizing both subjective and objective tools. The concept of enjoyment has been categorized in order to provide focus for measuring a subjective experience. In order to know which tools to use, one must first decide what aspect of enjoyment one seeks to measure.

When motivating the use of procedural content generation, one of the common rationales is that PCG ‘‘enhances replay value.’’ What is meant by this claim is that, upon repeated plays, the enjoyment of playing the game each time is enhanced more when PCG is used than when the game is static. However, these claims are rarely backed with quantifiable data gathered from a Human-Computer Interaction (HCI) perspective.

1.4 Overview of Goals, Research Questions, and Thesis Statement

The problem addressed here arises from the evaluation of enjoyment in PCG systems. There are two approaches to evaluating enjoyment of computer generated content. The most common approach is to use fitness functions and visualization of content space [15]. Fitness functions and visualization are widely tested and useful for ensuring the feasibility of the content generated; however, the algorithmic approach lacks the human-centered perspective that is required for any product generated for a human user. Research recommends using fitness visualization to trim content down to ‘‘better’’ content before utilizing subjective HCI methods [15]. Further, the same research recommends employing user-derived measurements to finalize the claims [15]. User-derived measurements can take the form of subjective items, such as asking the user to rate their experience, or objective items, such as the user’s heart rate. Despite being the recommended approach, most research stops short of using user studies at all, or even begs the question by simply stating that procedural generation enhances replay value without providing any evidence to support that statement [16-34]. The goals of this research are to apply a human-centered approach to the evaluation of procedural generation and its contribution to video game enjoyment. This research tests whether the current, established tools for measurement of enjoyment can capture what procedural generation contributes to game enjoyment. The study uses established tools that have been used successfully in other research to measure enjoyment in video games, and applies them to measure replay value, the claimed contribution of PCG to games.
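As a hedged illustration of the fitness-function approach described above (the feasibility thresholds and scoring rule are assumptions for illustration, not the method of [15]), the Python sketch below scores layouts of the kind produced by the earlier generate_platforms sketch, discarding infeasible levels before any human evaluation:

    MAX_JUMP_GAP = 4    # assumed maximum horizontal gap a player can clear
    MAX_JUMP_RISE = 2   # assumed maximum height increase a player can reach

    def fitness(platforms):
        """Score a layout: 0.0 for infeasible levels, otherwise higher scores
        for more varied (presumably more interesting) layouts."""
        for gap, height, _ in platforms:
            if gap > MAX_JUMP_GAP or height > MAX_JUMP_RISE:
                return 0.0  # one impossible jump makes the whole level infeasible
        # Reward variety: count distinct (gap, height) combinations.
        variety = len({(gap, height) for gap, height, _ in platforms})
        return variety / len(platforms)

    example = [(2, 1, 5), (3, -1, 4), (5, 0, 6)]  # gap of 5 exceeds MAX_JUMP_GAP
    print(fitness(example))  # -> 0.0, so this layout would be trimmed away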

The research addresses the following questions:

• Does procedural generation of infinite runners and role playing games (RPGs) provide enjoyment enhancement upon repeated plays of these games compared to static game environments?

• Does procedural generation of infinite runners and RPGs provide repeated-play enjoyment enhancement equivalent to manual generation?

• Does procedural generation provide the same kind of repeated-play enjoyment enhancement between types of procedural generation for infinite runners and RPGs?

The thesis statement for this research is: current tools for the measurement of enjoyment in games are sufficient to measure some, but not all, of the interactions between enjoyment, game types, methods of level generation, and user characteristics. The decline of enjoyment over time is also measurable but indistinguishable between human-designed, computer-designed, and unchanging levels. Adding procedural generation to different video game types affects different aspects of enjoyment depending on the game type.

1.5 Contributions

This study relates to both the field of Human-Centered Computing and the study of new procedural generation techniques for video games. This research contributes to video game research by exploring the assumption that procedural generation contributes to the retention of enjoyment over repeated game plays. The results suggest that procedural generation is no different than a static design and also no different than manually designing multiple levels. If a game designer wishes to have multiple levels for a player to experience, then procedural generation is the ideal choice because it is much faster and cheaper than the manual design process. A potential problem is identified: the assumed benefit of PCG is not measurable with the current tools for measuring enjoyment. Further research into the measurement of the enjoyment produced by procedural generation, rather than accepting enhanced enjoyment as a given, can help researchers and game designers better understand what PCG can bring to video games, and therefore design better algorithms for the implementation of PCG.

1.6 Organization of the Dissertation

This dissertation is organized into eight chapters. Chapter 1 covers the introduction to the problem and motivation behind this research, as well as a brief overview of the contributions. Chapter 2 is a literature review of both measures of enjoyment and procedural generation. In Chapter 3 the research questions are further detailed and hypotheses are provided for testing. Chapters 4, 5, and 6 cover the design and approach of the study performed. Chapter 7 covers the data collected, its analysis, and results. This dissertation ends with Chapter 8, which contains a summary of the findings and proposals for future work.

Figure 1-1. Two variations of dungeon layouts in Rogue [1].

CHAPTER 2
RELATED WORK / LITERATURE REVIEW

This chapter summarizes the published works related to defining game enjoyment, measuring game enjoyment, and procedural generation in video games. It highlights the gaps in the current state of research and demonstrates how this research contributes to the state of knowledge in measuring enjoyment of video games.

2.1 Defining Game Enjoyment

Enjoyment of a video game is a subjective experience. There are many ways to enjoy a video game, and this enjoyment is not necessarily represented in the same way between different individuals. To measure enjoyment, it must first be defined. The following are the most common definitions of game enjoyment in the current literature. Due to the entwined nature of measurement and definitions, some measurement tools are described in this section. Full details about measurement tools are covered in Section 2.2.

2.1.1 GameFlow

GameFlow is an adaptation of the concept of Flow, first proposed by Csikszentmihalyi and Csikszentmihalyi [35]. Flow is the basic concept that people find genuine satisfaction in a state of consciousness achieved by tailor-fitting the subject matter to each individual’s skills. The goal is to create an experience neither too demanding nor too easy. GameFlow applies Flow to games, as proposed by Sweetser and Wyeth [36] and Chen [37]. For a game to be enjoyable under the GameFlow definition, the design has to balance challenge against ease. If a game is too hard, it creates anxiety; if a game is too easy, it creates boredom. Extensions of the model have been used to test pervasive games [38], educational/learning games [39], and games for mobility-impaired users [40]. This adaptability has made GameFlow one of the most commonly used definitions when measuring enjoyment of a game experience. Jegers presents a PhD thesis on GameFlow in pervasive games [38]. Pervasive games, as defined by Jegers, are games that have three aspects: ‘‘anywhere gaming,’’ integration between virtual and physical worlds, and an emphasis on social interaction.

The research is a survey covering previously published works about the Pervasive GameFlow model (PGF) [41]. The PGF modifies the original GameFlow model by following 14 criteria within the existing eight elements, for a total of 50 criteria [38]. The use of subjective measurements for measuring GameFlow was corroborated by Nacke and Lindley [42, 43]. The purpose of their research was to correlate self-reporting methods with objective measurements. The focus was on measuring flow, immersion, and boredom. The study used three Half-Life 2 game modifications with a highly atmospheric horror setting to ensure an affective experience. Development of each of the levels was an iterative process based on feedback from game players, designers, and researchers. Each of the three levels focused on one of the flow, immersion, or boredom categories and was played once. This iterative process concluded with a list of criteria for each of the three categories, which were implemented as guidelines in the design process [43]:

• Boredom / Less-Engaging Experience

-- Linear level layout
-- Weak and similar enemy types
-- Repeating textures and models
-- Damped and dull sounds
-- No real winning or ending condition
-- Limited choice of weapons and ammunition
-- High amount of health and ammo supplies
-- No surprises

• Immersion

-- Complex exploratory environment with concealed information
-- Various opponents
-- Fitting sensory effects
-- Variety of models, textures, and lighting to establish mood/scenery

21 -- New weapons/ammo/health as a reward after a fight

• Flow

-- Design challenges around mechanics of available weapons
-- Start with easy combat
-- Increase combat difficulty gradually
-- Allow for half-cover spots (not perfectly safe spots)

The physiological measurements of facial EMG, electrodermal activity, and video recording were taken continuously, and the subjective questionnaires of the Game Experience Questionnaire (GEQ) [44] and the MEC Spatial Presence Questionnaire (SPQ) [45] were collected at the end of each level. The subjects were 25 male university students, ages 19 through 38 (M = 23.48, SD = 4.76), 60% of whom usually played a video game every day [43]. Nacke and Lindley show that physiological measurements can be indicators of game players’ emotional states. Their research also discovered that the GEQ was unable to measure immersion and boredom, but was able to measure flow. Procci and Bowers examined flow and immersion within games using the Dispositional Flow State Scale (DFS-2) and the Immersive Tendencies Questionnaire (ITQ) [46]. The study examined the overlap between the two questionnaires. One flow scale for gaming, typically labeled ‘‘immersion,’’ combines the time-transformation and loss-of-self-consciousness aspects; however, Procci and Bowers found an opposite result. For their study, Procci and Bowers surveyed a sample of ‘‘gamers,’’ labeled as ‘‘avid players of entertainment games,’’ without requiring the subjects to play a game, instead asking them to recall past experiences. Procci and Bowers limited their gamer label to people who ‘‘played games at least twice a week or for more than five hours per week.’’ The final sample size was 279, with 144 female participants, 134 male participants, and 1 participant of non-specified gender. Their findings were that the two scales were not overlapping, despite similar items being measured on each questionnaire, and therefore cannot be used interchangeably. Procci and Bowers recommend strongly against using these two tools in game enjoyment measurements.

A benefit of the GameFlow model is that it can be extended to cover various subsets of playtesting, whether game or user type specific. The EGameFlow model is an adaptation of the GameFlow model to learning games, proposed by Fu, Su, and Yu [39]. They thoroughly tested and revised the questionnaire in three stages. First, the validity of the scale items was evaluated. The second stage comprised a pre-test, a reliability test, and a validity test. The final stage formally tested the scale’s reliability and validity. The EGameFlow model follows the original GameFlow categories with some modifications, such as the addition of the category of ‘‘Knowledge Improvement,’’ and converts the criteria into Likert-scalable statements. Another extension, the MIU-GameFlow model, was proposed and evaluated by Zain, Jaafar, and Razak for motor-impaired users (MIU) [40]. Based on interviews with an expert panel and a review of the literature, Zain et al. found that flexibility was important for MIU and added criteria accordingly to form the MIU-GameFlow model. Similarly, some of the criteria from the original GameFlow model and EGameFlow tools were excluded by Zain et al. Zain et al. used expert review to examine each part of the proposed model and found it to be a good guideline for future tests, but state that actual playtesting is required for confirmation. A strong benefit of the GameFlow definition of enjoyment is that the model has its roots in theories formed in other disciplines. GameFlow fits with interactive digital media through the idea that a good video game causes a player to lose track of time, is not boring, and is challenging without being too challenging. All of these features are the core of Flow and GameFlow theory. Interactive digital media is a relatively new phenomenon when considering the entirety of academic research history. To counteract the relatively sparse foundations, building a new method from established and tested theories lends strength to the method. Questionnaires can be developed to test specific flow states, but testing of GameFlow can require the use of non-invasive or unobtrusive means [47]. Requiring participants in a study to shift their attention from the game to a different mental task, such as a questionnaire, would impede the achievement of a Flow state.

Even if the user is able to remember their state of mind accurately and the questionnaires are carefully worded to reduce confusion, the process of redirecting from game to questionnaire is disruptive to GameFlow. A physiological measurement could achieve the same results, with minimal downsides directly related to the GameFlow state. Weber et al. ultimately argue that cognitive synchronization can be used as a measurement of GameFlow without disrupting the Flow state itself [47]. However, most studies use some form of questionnaire-based measurement when accounting for GameFlow, including the models presented by Fu, Su, and Yu [39] and the GEQ [44]. Because Flow and GameFlow depend on the game user achieving an ‘‘in the zone’’ state, using self-reporting measurement tools pulls the user out of one mindset and into another to answer questions.
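As a concrete reading of the challenge/skill balance at the heart of Flow and GameFlow, the short Python sketch below (illustrative; the normalized inputs and tolerance band are assumptions, not part of the GameFlow model) classifies a moment of play:

    def flow_state(challenge, skill, tolerance=0.2):
        """Classify a gameplay moment on the Flow model's challenge/skill axes.

        Flow theory predicts anxiety when challenge outstrips skill, boredom
        when skill outstrips challenge, and flow when the two stay roughly
        balanced. Inputs are normalized to [0, 1]; the tolerance band is an
        assumption for illustration.
        """
        if challenge - skill > tolerance:
            return "anxiety"   # game too hard for the player's current skill
        if skill - challenge > tolerance:
            return "boredom"   # game too easy
        return "flow"          # challenge and skill are matched

    print(flow_state(challenge=0.9, skill=0.4))  # -> anxiety
    print(flow_state(challenge=0.5, skill=0.5))  # -> flow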

2.1.2 Motivational States

Not all effects of enjoyment may be entirely determined by GameFlow states themselves [48]. Another definition of enjoyment finds its basis in examining the motivational states of the game player. Motivational states as a definition encompass the ‘‘Pre-Game’’ phase presented in the Integrated Model of Player Experience [49]. By examining what may drive a player, one gets a fuller picture of the player’s intentions, and of whether satisfying those motivational states is likely to increase player enjoyment. Kaye presented a thesis examining GameFlow, finding that the effects of enjoyment may not rely entirely on flow states themselves [48]. Instead of GameFlow alone, motivational states interact with a player’s experience. Kaye’s research found that motivational states did influence the enjoyment of a game alongside elements of flow theory. Kaye proposed a framework for modeling motivational states and their effect on game type selection and enjoyment. The first study Kaye covers showed that achievement- and immersion-oriented motivations were linked to flow states during a game. Kaye recommended that the entire gaming experience be considered for a more comprehensive understanding of the enjoyment process.

Motivational states can be categorized into two types: extrinsic and intrinsic. Extrinsic and intrinsic motivational states relate to where the motivations originate [50]. Extrinsic motivation comes from external factors, such as monetary gain upon completion of a task. Intrinsic motivation comes from the task itself and an individual’s personal goals and desires. Within these two categories, three formats are possible: pleasant experiences, ethical motivations, and goal setting and achievement. Subjective measurements have been successfully implemented using motivational states in regard to game enjoyment. A learning motivation model called ARCS [51] was further extended into the Instructional Materials Motivation Survey (IMMS) [52], a tool for researchers to test motivational states. Derbali and Frasson examined physiological measurements as indicators of motivational moments in game play [53]. Using the IMMS questionnaire, they found several correlations between objective physiological measurements and motivational states: theta waves in the frontal regions of the brain were positively correlated with motivation, high-beta waves in the left-center region were a significant predictor of a high level of motivation, and skin conductance was a significant predictor of motivation. A motivational-state model is less adaptable to unique or specific testing conditions than GameFlow. However, Cota, Ishitani, and Vieira examined the motivational properties of games for elderly users [54]. After an initial survey about the elderly users’ preferences, they developed a game tailored to the reported preferences. Subjects were required to fill out a questionnaire directly related to the developed game and the user’s motivations. Cota et al. discovered results similar to those found in GameFlow theory (Section 2.1.1), namely that games should not be too easy or too difficult. Cota, Ishitani, and Vieira found that several motivational aspects for elderly players fell under the appropriate categories of intrinsic/extrinsic and player preference/ethical motivations/goal setting and achievement. One intrinsic motivation reported was that the game was ‘‘a tool to combat cognitive diseases.’’ Motivational states as a model for enjoyment is a strong concept due to its inclusion of the game player’s motivations and pre-game status.

This definition provides a dynamic version of enjoyment that adapts to different people, as people do not all enjoy experiences in the same manner. However, subjective self-reporting measures would only reflect motivational states at the time of the questionnaire rather than the gamer’s actual motivational states throughout the game play session [55]. Ghergulescu and Muntean recommend using noninvasive EEG monitoring to measure reactionary motivational states. So, while motivational states can be measured with certain methods, the easiest and cheapest to procure (self-reporting questionnaires) may not capture the entire picture [55].

2.1.3 Emotional States

Similar to motivational states is the model of enjoyment as emotional states. This approach is based on the idea that second-order beliefs shape a player’s decision-making process and experience through a game [56]. In other words, a person’s interpersonal belief system may affect the reward-weighing process when playing a game. Both subjective and objective (physiological and behavioral) measurements can be used when measuring emotional states [56]. Emotional states themselves are quite complicated, so multiple approaches to measuring them are recommended. Cho et al. provide an early examination of emotional wording in television transcripts versus printed news, focusing on measurements by episodic composition and emotionally-loaded words [57]. They posit emotional reactions as a process developed over time rather than immediate reactions. The study measured aggression, blame, praise, satisfaction, tenacity, and motion, with a catch-all category of ‘‘others.’’ These measures were grouped into three categories of negative emotion (aggression and blame), positive emotion (praise and satisfaction), and emotional intensity (tenacity and motion). The frequency with which these categories applied to the two media forms was used to measure emotional impact differences. Cho et al. found that television reports elicited more intense emotions, and proposed that the visual medium of television is what aided in this. Video games are a highly visual medium, with the addition of interactivity, and therefore there is good reason to investigate how emotional states might interact with the enjoyment of interactive digital media.

Bartsch provides insight into why viewers of movies and television series find it rewarding to experience emotions [58]. The research covers four studies designed to examine differing kinds of gratification associated with emotional experiences, within the scope of TV and movie audiences. Despite not directly relating to game enjoyment, these studies form an important base from which research may emerge. The work Bartsch covers is similar to motivational states and needs satisfaction in that emotional experiences are satisfying on a cognitive and social level. Bartsch advocates that emotions are important to viewers of entertainment media in their own descriptions of the experience, and that research needs to reflect that. Emotional states are measured on a two-dimensional emotional scale of valence and arousal. Valence is the perceived good or bad rating of the emotion, and arousal is the intensity of the emotion. Horlings cautions against using brain activity because numerous factors can influence thoughts, and also emphasizes the need to be exhaustive when using these types of tests [59]. Horlings’ work also confirms the two-dimensional emotional scale of valence and arousal, and confirms that EEG can be used for measuring emotional levels in this manner. Measuring emotional states with bio-sensors is proposed by Haag, Goronzy, Schaich, and Williams [60]. The proposed method obtained accuracy ratings of 96.6% and 89.9% for recognition of emotional arousal and valence, respectively. Haag et al. used ProComp+ to measure EMG, skin conductivity, breathing, HR, and ECG. To validate their classification of these physiological measurements, they used over 800 photographs, classified in terms of arousal and valence by a large number of participants in a pre-study. The experimental emotional responses obtained by the bio-sensors were collected from a single person, and Haag et al. suggest expanding upon this in the future with more participants. Further, they also suggest a multiple-measure approach to verify results.
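To make the valence/arousal representation concrete, the following Python sketch (illustrative; the quadrant labels are a common convention, not Haag et al.'s trained classifier) maps a point on the two axes to a coarse emotion label:

    def label_emotion(valence, arousal):
        """Map a point on the valence/arousal plane to a coarse emotion label.

        Valence: -1 (very negative) to +1 (very positive).
        Arousal: -1 (very calm) to +1 (very excited).
        The quadrant labels are one common convention, used here for illustration.
        """
        if valence >= 0:
            return "excited/joyful" if arousal >= 0 else "content/relaxed"
        return "angry/afraid" if arousal >= 0 else "sad/bored"

    print(label_emotion(valence=0.7, arousal=0.8))    # -> excited/joyful
    print(label_emotion(valence=-0.6, arousal=-0.4))  # -> sad/bored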

Emotional states can be measured with subjective measurements, via the system described by Kivikangas, Nacke, and Ravaja [61] or Biometric Storyboards by Mirza-Babaei et al. [62]. Both methods provide a way for users to watch media and flag/tag emotional moments for later reflection. Kivikangas, Nacke, and Ravaja describe a system that enables participants to self-report experiences of game events by reviewing automatically created video clips and answering questionnaires about the events [61]. This self-reporting method was also supported by physiological measurements. Kivikangas et al. state that their system was built to fill a void in analysis software by covering the following items not found in existing tools:

• Provide a method for administering self-report measures without interrupting the activity.

• Provide the ability for the game researcher to compare the self-report measures with psychophysiological responses at the event point (i.e., basic triangulation).

Kivikangas et al. achieved their desired system, similar to Biometric Storyboards by Mirza-Babaei et al. [62], by allowing participants to create markers at relevant points during the test without breaking the cognitive and emotional state. Participants would later reflect on the video segments created by the markers. The emotional-state model is adaptable and typically used in situations where emotions or nostalgia would be strong. Studies have used emotional states to examine solitary versus multi-user games [63], the emotional responses between traditional and virtual games [64], and competitive play [65]. Deterding used an interview-style exploration into the emotional states of gamers [63]. The interview was semi-structured to focus on the situational frames of settings, objects, roles, internal organization, metacommunication, attention, emotion, rules for action and communication, and situational boundaries. For each frame, participants were asked to describe biographical incidents they considered ‘‘prototypical’’ for leisurely gameplay. Deterding found that gamers considered solitary game play the most liberating or freeing experience. Gamers were most free to experience emotions when there was no social obligation to other players. Fang, Chen, and Huang examined the differences in emotional reaction between traditional-style board games and virtual games [64]. They used the PANAS [66] to measure positive and negative affect, including 10 items for positive affect and 10 items for negative affect.

The emotional satisfaction was also collected at the visceral (visual pleasantness), behavioral (ease of use), and reflective (sense of satisfaction) levels. The study used three different interface formats (physical, desktop computer, and tablet) of the games Monopoly and Jenga. The 77 participants were distributed evenly, in groups of three to four, between the Monopoly and Jenga games. All subjects experienced all three interface formats of their assigned game, with 15 minutes for each interface. After a total of 45 minutes of play time, participants were requested to fill out the survey. The findings were that traditional board games drew out stronger positive reactions, possibly due to memories based on physical objects. Landowska and Wróbel present a study of emotional states while playing two-player competitive Tetris [65]. The study used both self-reported subjective and physiological measurements to collect data. The questionnaire was of their own design and measured both positive and negative emotions. The 32 subjects were put into 8 tournament groups of 4, with each player playing once against every other participant in the group. The questionnaire was filled out after each tournament. They found that emotional states vary greatly between individuals and recommended that any game designed around emotional states should be adaptive. This finding should also apply to studies that intend to measure differences in emotional states between testing groups. Emotional states can be measured in both positive and negative valence, and therefore allow for a more dynamic definition of enjoyment. A story within a game doesn’t have to provide only good emotions to be enjoyable, and neither should our definitions of enjoyment be restricted to measuring only good emotions. However, emotions are subjective and depend on the individual [65]. There may also be a miscommunication or reliability issue between self-reported subjective measurements and the true emotional states occurring during the game. Claes et al. used a combination of non-verbal observation and self-reported measurements to examine emotional ranges during a baseline and during gameplay of a therapeutic game (Playmancer) between a ‘‘normal’’ control group and a group with Eating Disorders (ED) [67].

Facial recognition software was used for the observational measurements of joy and anger. They found that self-reported measurements of emotional states are unreliable and appear to show discrepancies with observational non-verbal behavioral cues. Whether this is due to poorly designed questions on the subjective measurements or to an actual discrepancy is unclear.
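Several instruments in this section, including the PANAS used by Fang, Chen, and Huang, are scored by summing Likert items. A minimal Python scoring sketch, assuming the standard 20-item PANAS with ratings from 1 (very slightly or not at all) to 5 (extremely); the abbreviated item names follow the usual PANAS split but are listed here only for illustration:

    POSITIVE_ITEMS = ["interested", "excited", "strong", "enthusiastic", "proud",
                      "alert", "inspired", "determined", "attentive", "active"]
    NEGATIVE_ITEMS = ["distressed", "upset", "guilty", "scared", "hostile",
                      "irritable", "ashamed", "nervous", "jittery", "afraid"]

    def score_panas(responses):
        """Sum a 20-item PANAS sheet into positive- and negative-affect totals.

        `responses` maps item names to 1-5 ratings; each subscale therefore
        ranges from 10 to 50.
        """
        pa = sum(responses[item] for item in POSITIVE_ITEMS)
        na = sum(responses[item] for item in NEGATIVE_ITEMS)
        return pa, na

    neutral = {item: 3 for item in POSITIVE_ITEMS + NEGATIVE_ITEMS}
    print(score_panas(neutral))  # -> (30, 30)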

2.1.4 Needs Satisfaction

Needs satisfaction as a measure of enjoyment is based on Self-Determination Theory (SDT). SDT (and needs satisfaction) is based on the concept that enjoyable actions satisfy a base need. A vital modeling tool for measuring SDT is the Player Experience of Need Satisfaction (PENS) [68]. While enjoyment as a satisfaction of needs provides a promising definition of enjoyment, there is little coverage in academia. Tamborini et al. expand on the basic needs-satisfaction model of enjoyment by incorporating hedonic and nonhedonic needs into their own model [69, 70]. Hedonic needs are arousal and absorption; nonhedonic needs are competence and autonomy. The PENS measured needs satisfaction in the two categories of autonomy and competence. The remaining two categories of arousal and absorption were each measured with three-item Likert-type scales introduced by Tamborini et al. Their findings indicated that both types of needs were statistically significant in positive correlation with self-reported/subjective enjoyment. They also found that interactivity played a part in the intensity of these relationships; arousal was low in low-interactivity tests. Tamborini et al. state that other definitions of enjoyment are tautological definitions of enjoyment as a pleasurable response, measured in vague terms, while needs satisfaction is more concrete and measurable. Sometimes several subjective measurements are used in combination to measure needs satisfaction, such as the PENS [68] and the Situational Motivation Scale (SIMS) [71, 72]. Neys, Jansz, and Tan used a questionnaire that focused on the four items of need satisfaction, motivation, enjoyment, and persistence [72]. Autonomy, competence, and relatedness were measured by the PENS.

The SIMS, which is also based in SDT, measured regulation modes: Intrinsic Motivation, Identified Regulation, External Regulation, and Amotivation. Persistence and enjoyment were measured by a set of items created by Neys et al. The study was an online survey posted on IGN.com and Gamer.nl, available for a span of four weeks. A majority of the participants, more than 95%, were male. Gamer identity categories were Hardcore, Heavy, and Casual, and were self-reported by participants based on the typical number of days and hours played per week. Neys et al. found that hardcore gamers had the highest levels of enjoyment during gameplay. They also found that, of the three needs presented earlier by Tamborini et al., only relatedness did not indicate enjoyment. Mood repair has also been measured through needs satisfaction. Rieger et al. examine in-game success and needs satisfaction as effects on mood repair, following the PENS model [73]. They used the IMI [74], as adapted by Reinecke et al. [75], for measuring needs satisfaction, and the SES questionnaire for measuring emotions/mood [76]. Participants were put through a highly stressful cognitive task, the paced auditory serial-addition task (PASAT) presented by Gronwall [77], to lower their mood or frustrate them. Each participant’s mood was measured before and after a series of Mario Kart races. The four-item IMI scale for enjoyment was also recorded after the races. The study showed that video games are able to serve mood repair, helping to increase positive mood states and to decrease negative mood states. Rieger et al. showed that in-game success is important to positive moods; however, enjoyment relies more on needs satisfaction than on success. Needs satisfaction has several toolkits available. A well-established model, the PENS [68], exists and has been thoroughly tested, as seen with Tamborini et al. [69, 70] and Neys et al. [72]. Also, a couple of different established questionnaires exist for measuring needs satisfaction: the IMI, as used by Rieger et al. [73], and the SIMS, as used by Neys et al. [72]. Due to its basis in SDT, related work in other disciplines exists to provide context on the theory.
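For reference, the PASAT mood-induction task mentioned above follows a simple rule: digits are presented one at a time, and the participant must report the sum of the two most recent digits. A minimal Python sketch of that rule (the trial count and seed are arbitrary; Gronwall's timed presentation protocol is omitted):

    import random

    def pasat_pairs(n_trials=10, seed=0):
        """Generate PASAT-style trials: after each new digit, the correct
        answer is the sum of the two most recent digits (a sketch of the
        task's rule, not the full timed protocol)."""
        rng = random.Random(seed)
        digits = [rng.randint(1, 9) for _ in range(n_trials + 1)]
        return [(digits[i + 1], digits[i] + digits[i + 1]) for i in range(n_trials)]

    for digit, correct_sum in pasat_pairs(3):
        print(f"digit presented: {digit}, correct response: {correct_sum}")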

Minimal work has been recorded on physiological measurements of needs satisfaction, forcing researchers to rely on subjective measurements. Further research is needed to determine whether this definition of enjoyment is limited to subjective measurements only, or whether physiological measurements can be validated. Additionally, the satisfaction of needs is a purposefully narrow definition of enjoyment. Needs satisfaction of competence, relatedness, and autonomy may miss other ways in which a gamer can enjoy a gaming experience (such as the negative emotional experience of horror games or the aimless enjoyment of GameFlow).

2.1.5 Engagement

Engagement refers to a broad group of overlapping concepts, and researchers present widely differing approaches to defining engagement for their specific experiments [3]. Researchers often cite ‘‘engagement’’ when they refer to one of the categories covered in this chapter, such as motivational states or GameFlow. When reporting on engagement, one must keep in mind the philosophical concept of ‘‘family resemblance’’: engagement is not connected by one essential common feature, but rather by a series of overlapping similar features, where no singular feature is held by all items labeled with engagement [78]. Due to the encompassing nature of engagement, proponents of its use approach the argument for the concept from a broad scope. Leiker et al. examined the way motivation, enjoyment, and engagement interact [79]. Leiker et al. used interactivity/choice as their situational manipulation. The 60 participants played a custom-built Kinect game and were randomly assigned to two groups. The control group was allowed to select the difficulty level of each practice block, and the ‘‘yoked’’ group had their difficulty mirror one control participant’s choices. After completing the first day of game play, participants were requested to fill out a questionnaire based on the IMI [74] and a user-engagement scale edited to include only items related to interest/enjoyment, perceived competence, effort, and pressure/tension. After a week, participants were called back to conclude the second part of the test. The second part had participants play three difficulty levels (easy, medium,

hard) in randomized order. Leiker et al. found that study designers should not focus strictly on motivational approaches to learning; a multifaceted approach would better encompass the learner’s process. Several models have been presented for engagement. The models follow Continuation Desire [80], the Traces model [81-83], a three-aspect approach [84], an activity-based approach [85], flow [86], and the Revised Game Engagement Model (R-GEM) [87]. The clearest definition divides engagement into emotional engagement, behavioral engagement, and cognitive engagement [88]. Schoenau-Fog, Louchart, Lim, and Soto-Sanfiel focus on an aspect of narrative engagement, which they believe may be used to quantify any interactive storytelling experience [80]. Continuation Desire (CD) is the desire or willingness to continue an experience, and can be used as a metric to measure the quality of an interactive story experience. Schoenau-Fog et al. present a framework for how to measure engagement this way, relating emotional engagement to the affective dimension of CD. The Traces model is presented by Bouvier, Sehaba, Lavoué, and George [81-83]. The research acknowledges that engagement covers the multiple categories of attention, immersion, involvement, presence, and flow. Bouvier et al. combine a motivational concept of SDT, Activity Theory, and Trace Theory to explain game engagement. Based on an SDT-sourced definition of engagement, the following engaged behaviors can be defined [83]:

• Environment-directed, in relation to the need for autonomy. Player behavior exhibited is exploration or modding.

• Social-directed, in relation to the need for relatedness. Player behavior exhibited is expanding social network or sharing moments with others.

• Self-directed, in relation to the need for autonomy. Player behavior exhibited is character customization or creating a story around a character.

• Action-directed, in relation to the needs for competence and autonomy. Player behavior exhibited is mastering game skill or elaborating a strategy.

Trace theory refers to considering the behavior of a gamer as a sequence of actions taken, such as mouse clicks or keyboard input [82]. At the base of the framework are observed events, called obsels. Each obsel contains the type of event, a timestamp, and a set of contextual information characterizing the event. A primary trace is a set of obsels that may be connected. Bouvier et al. link the actions with motivations, and thus with Activity Theory and SDT applications. Bouvier et al. performed a user study to validate the performance of their approach in distinguishing engaged from non-engaged participants and to identify the types of engaged behaviors. The 12 participants provided 12 traces to 3 experts in the gaming field, who responded via an online questionnaire. The questionnaire directly asked the experts to decide whether a particular trace corresponded to an engaged player, or one engaged in any of the above types. Bouvier et al. had fairly accurate results: 91.67% accuracy for engagement prediction, 80% for prediction of social-engagement, and 100% for both action-engagement and environment-engagement.
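The obsel structure described above lends itself to a simple representation. The following is a minimal sketch in Python; the field names and trace layout are illustrative assumptions, not the schema defined in the Traces model papers:

```python
from dataclasses import dataclass, field

@dataclass
class Obsel:
    """One observed event in a trace (illustrative fields)."""
    event_type: str    # e.g., "mouse_click" or "key_press"
    timestamp: float   # seconds since the session started
    context: dict = field(default_factory=dict)  # contextual information

# A primary trace is an ordered set of (possibly connected) obsels.
trace = [
    Obsel("key_press", 0.42, {"key": "space"}),
    Obsel("mouse_click", 1.07, {"target": "inventory"}),
]
```

Linking such traces to motivations, as Bouvier et al. do, would then amount to classifying sequences of obsels into the engaged-behavior types listed above.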

Li, Jiang, Tan, and Wei approached engagement from three aspects, using a different measurement for each aspect [84]. The first engagement aspect is motivational standing, the second is engagement as user perception (such as enjoyment, satisfaction, and involvement), and the third is physiological. The first two aspects were measured via subjective surveys and the third by EEG. Li et al. performed two investigations, the first primarily on the validity and testability of this approach, and the second a semistructured interview to gauge participants’ opinions on game engagement. For a description of the study, see the next subsection. Marsh and Nardi suggested an activity-based approach to engagement focusing on narrative [85]. They also focused on motivations per objective in the activities, as in Activity Theory. The proposed framework considers a sphere of engagement through motive in activity; Marsh and Nardi stated that actions which share a motive are contained within a sphere of engagement. Marsh and Nardi did not provide any user studies based on the framework, but provided a flexible framework for future analysis and design of interactive digital media. An approach to engagement based on flow concepts was presented by Schiavo, Cappelletti, and Zancanaro [86]. Their study used behavioral and physiological measurements. An adapted version of the Experience Sampling Methodology was used to train a Support Vector Machine classifier that identified affective states with an accuracy of 73%. The study found that detectable behavioral cues can be used to measure engagement, removing the need for intrusive or obvious measuring equipment. Their study used webcams and input devices combined with software, such as head-tracking software, to track behavioral cues. The measurements used were facial expression, head position, and keystrokes, all easily obtained without intrusive equipment. Schiavo et al. found that the cheaper and less invasive options were a good alternative for measuring boredom, flow, and stress in games. Procci provided an examination of the Revised Game Engagement Model (R-GEM) based on forming firm definitions of immersion, involvement, presence, and flow [87]. Procci recommended considering ‘‘game engagement’’ as a generic term rather than a specific construct. A study was conducted to test the relationships proposed within the R-GEM: 84 participants played Minecraft and then filled out a questionnaire designed by Procci. The results showed that the model still needed work but generally exhibited reliable factors, so Procci suggested further research. Silpasuwanchai, Shigemasu, and Ren summarized engagement into the three categories of emotional, behavioral, and cognitive [88]. Similar to the emotional states discussed in Section 2.1.3, Silpasuwanchai et al. summarized emotional engagement as the valence, arousal, and endurance of the evoked affective state. They also presented a user study exploring how strategies affect the three proposed dimensions of engagement in the context of learning. In the study, users performed three problem-solving and three memory tasks via two interactive systems, one gamified and the other not.

The 30 participants were randomly assigned to one of the interactive systems and completed all six tasks. Afterwards, emotional engagement, behavioral engagement, cognitive engagement, and learning performance were collected via a Likert-scaled subjective questionnaire designed by Silpasuwanchai et al. They were unable to confirm a consistent relationship, and Silpasuwanchai et al. suggested that gamification may need a higher-dimensional assessment. Behavioral cues have also been used to measure engagement. Riemer and Schrader took an unobtrusive monitoring approach with behavioral cues to monitor engagement and its effects on mental model development [89]. They measured Behavioral Engagement in High Relevance phases (BEHR) and Behavioral Engagement in Low Relevance phases (BELR). High relevance phases are moments where a game user may self-reflect or exhibit self-monitoring behavior; low relevance phases are behaviors exhibited with low relevance to the educational objective. Riemer and Schrader classified BEHR as the total time spent in the decision and reflection sequences, and BELR as the number of side missions played. The study consisted of 97 participants, who were given an instruction manual and then played the game Cure Runners until the end of the first in-game chapter. During play, the game screens were recorded for non-intrusive behavioral examination. The findings were that only self-monitoring affects mental model development in serious games, and that behavioral engagement has no effect. Bardzell, Bardzell, Pace, and Karnell suggested that, since emotion and engagement are both biological and subjective constructs, a combination of physiological and self-reporting methods is required [90]. However, McMahan, Parsons, and Parberry found that off-the-shelf EEG modules can be used to measure gamers’ engagement during game play, specifically relating to player tasks that occurred within the game (death, normal play). McMahan et al. looked into EEG measures of engagement with the Emotiv [91, 92]. The study had 30 participants play while wearing the Emotiv, with ‘‘general game play’’ differentiated from ‘‘dead events’’.

Participants played the game for 15 minutes with no concluding survey. Webcam footage was recorded to help isolate events, such as facial movements, which might affect the EEG data. According to the research presented by Procci, James, and Bowers, most gamers experience low engagement levels at the start of a game, and some will progress to higher levels with experience [93]. Procci et al. subscribe to the idea that engagement lies on a scale from low to high, moving from immersion to presence, flow, and finally absorption. The research examined a ‘‘low-level’’ game engagement score, created by summing the Game Engagement Questionnaire (GEngQ) immersion and presence subscales, versus a ‘‘high-level’’ game engagement score, created by summing the GEngQ flow and absorption subscales. A total of 187 participants were asked to play a game and report their starting and ending times, with a request to play at least 15 minutes. After playing for as long as they desired, participants returned to fill out the GEngQ scales. Li et al. studied the software gaming elements of game complexity and familiarity [84]. The first investigation collected EEG game play data and self-reported evaluations of user-game engagement. A total of 44 participants had EEG data collected while they played the game for 3 minutes. At the end, the participants filled in a self-report survey to measure their game engagement. Basing engagement on theta oscillation density, Li et al. found that the high-familiarity and low-complexity games had the highest game engagement (lowest density). They further suggested that ‘‘the effect of game complexity and game familiarity on cortical activity in the left side of the DLPFC can indicate the existence of a symmetric compensation relationship,’’ but insisted on further research to confirm.
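As an illustration of Procci et al.’s scoring, the low- and high-level engagement scores are simple subscale sums. A minimal sketch follows; the item groupings are placeholders, since the published GEngQ [114] defines the actual item-to-subscale assignments:

```python
# Hypothetical item IDs per subscale; the real GEngQ assigns its 19 items.
SUBSCALES = {
    "immersion": ["q1", "q2"], "presence": ["q3", "q4"],
    "flow": ["q5"], "absorption": ["q6", "q7"],
}

def subscale_sum(responses: dict, name: str) -> int:
    """Sum a participant's responses over one subscale's items."""
    return sum(responses[item] for item in SUBSCALES[name])

def engagement_scores(responses: dict) -> tuple:
    low = subscale_sum(responses, "immersion") + subscale_sum(responses, "presence")
    high = subscale_sum(responses, "flow") + subscale_sum(responses, "absorption")
    return low, high
```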

A strength of defining enjoyment as engagement is its encompassing nature. Engagement typically contains aspects from other well-established enjoyment definitions such as immersion, involvement, presence, and flow [87]. The overarching nature of enjoyment is also reflected in engagement, as a multitude of aspects make up what is enjoyable about an experience. Using the engagement definition can allow for a broader approach to research on enjoyment. A negative aspect of defining enjoyment as engagement is related to its strengths. Due to the diverse nature of engagement, the definition does little to clarify exactly what the research is measuring. Engagement is best used in conjunction with additional specifying definitions to better focus the research. Care must be taken not to fall back on simply categorizing enjoyment as engagement, when defining engagement itself requires a multitude of considerations. 2.1.6 Summary

The literature provides numerous clear definitions of enjoyment, which gave this research an initial focus and direction. Most models of enjoyment deal with in-the-moment analysis, with a few exceptions. Due to the prolonged nature of replay value, engagement was selected as the best measure of enjoyment for the user study proposed in this dissertation. Additionally, one of the games used is an RPG (Role-Playing Game), for which engagement is an excellent model of enjoyment due to the immersive effect of role-play [94]. 2.2 Measuring Game Enjoyment

The definition chosen for enjoyment will inform the type of measurement tools used in the user study. There are two basic categories of measurement tools: subjective and objective. This section covers the most commonly used measurement tools for each category. 2.2.1 Subjective Measurements

Some of the easiest measurements to access are subjective self-reporting measures. These measures typically take the format of questionnaires and surveys and, possibly due to their low-tech nature and ease of recording, are commonly used in test runs of user studies. The drawback to these measurement types is the reliance on self-reflection from the participant and the subsequent difficulties in calibration. Subjective measurements, such as questionnaires, can be tailor-fit to a researcher’s hypothesis statement. If questioning the user’s sense of self within the game world, the question can ask this directly. The specifics of what is asked are limited only by interpretation.

The format these questions take varies from interview-style questions to formal Likert scales. Likert scales are questions in the form of statements, with a scale from Agree to Disagree, typically with 5 or 7 points, from which the subject must select the option that best represents their feelings about the statement. Interview-style questions provide context, and require thorough analysis to lead to a quantifiable result. An interview question can lead into a Likert scale, such as asking what a user’s favorite game is and then rating that game on a Likert scale. Other question formats can be similar to Likert scales, but unless the prompt is a statement with agree/disagree options it is not a formal Likert scale, though it is sometimes referred to as one [95]. When deciding which subjective question type to use, care must be taken to ensure the wording or structure of the questionnaire does not influence the results of the study [96--98]. Subjective measurements of enjoyment have successfully measured engagement [3, 79, 90, 93, 99, 100], emotional states [56, 63--65, 67, 90], GameFlow [38, 40, 42, 46, 48, 100--102], motivational states [3, 48, 53, 54, 56, 72, 79, 84], and needs satisfaction [69, 70, 72, 73, 100, 102, 103]. A benefit of subjective methods is their ease of access: no additional mechanical apparatus is required, and the investigator can recruit participants online [104]. Another benefit is the diversity of aspects that can be measured. Self-reporting measurements can be adjusted to capture any topic or feeling, as long as the question is phrased correctly and validated; physiological measurements, by contrast, can be difficult to interpret beyond the initial data. Established measurement scales, such as the GEQ [44] or the Fang et al. Questionnaire [96], already exist, with rigorous testing to guarantee the validity and reliability of their items. If studying a unique or currently unknown aspect of enjoyment, statistical analysis with Cronbach’s Alpha [105] is the place to start. Subjective measures can also work as supplemental and reaffirming measurements for other measured items, and are frequently used to calibrate physiological measurements, as discussed in the next section.
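As a concrete illustration of how such questionnaire responses are typically quantified, the sketch below encodes 5-point Likert answers as integers and reverse-scores negatively worded items before averaging; the item IDs and the reversed set are hypothetical:

```python
# Encode 5-point Likert answers and reverse-score negatively worded items.
SCALE = {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
         "Agree": 4, "Strongly agree": 5}
REVERSED = {"q2"}  # hypothetical negatively worded item

def score(item_id: str, answer: str, points: int = 5) -> int:
    value = SCALE[answer]
    return (points + 1 - value) if item_id in REVERSED else value

responses = {"q1": "Agree", "q2": "Disagree", "q3": "Neutral"}
mean_score = sum(score(i, a) for i, a in responses.items()) / len(responses)
```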

As physiological measurements become more easily available with cheaper equipment, validating the new objective measurement categories against established subjective measurements will be how the new measures gain acceptance. The nature of subjective measurements is retrospective. The retrospective process lends itself better to types of enjoyment reflecting an overall experience, such as Needs Satisfaction. Reactive measurements, or measurements of instinctual reactional enjoyment, are better left to objective measurements, specifically physiological measures. One downside to subjective methods is the additional cognitive load the question-and-answer format places on the user. Self-reporting methods require the user to stop whatever they are currently doing and shift their cognitive state to consider the subject contextually and provide an appropriate answer. If one were measuring an aspect along the lines of GameFlow [37], for example, bringing the player out of that flow could alter the very experience one is attempting to measure. Additionally, subjective measurements are personal, and subject to the reporting person’s experience; careful validation and consideration are needed to take this dependence into account. 2.2.2 Objective Measurements

Objective measurements, such as temperature or time, are measures not subject to personal bias. The most commonly used objective measurements in enjoyment research are physiological measurements. Physiological measurements are grouped into two categories based on the controlling nervous system. The Central Nervous System (CNS) controls measures like Electroencephalography (EEG), Event Related Brain Potentials (ERP), and Electrooculography (EOG). The Peripheral Nervous System controls the rest of the measures, in the somatic and autonomic subsystems. The somatic nervous system controls voluntary activation of muscles, and the autonomic nervous system (ANS) controls involuntary muscles and internal organs [106]. The unknowns of these measurements make it difficult to draw on them directly as stand-alone measurements.

However, a few studies have verified them as stand-alone measurements by validating the results with subjective self-reported measures. Another subset of objective measurements is behavioral measurements, which utilize measures such as timing, mouse clicks, and facial expressions. This subset is small compared to the physiological data available. A non-exhaustive list of physiological/behavioral measures commonly used in HCI, as defined by Dirican and Göktürk [106], is:

• Event Related Brain Potentials (ERP)

• Electroencephalography (EEG)

• Electro Dermal Activity (EDA)/Galvanic Skin Response (GSR)

• Cardiovascular Measures/Heart Rate (HR) and Heart Rate Variability (HRV)

• Blood Pressure (BP)

• Electromyogram (EMG)

• Eye Movements

• Pupil Diameter

• Respiration

Brain waves for EEG are typically defined in frequency bands [107]. Delta is 1-4 Hz, Theta is 4-8 Hz, Alpha is 8-14 Hz, Beta is 10-30 Hz, and Gamma is 30-50 Hz. Alpha brain waves are typically associated with relaxation, visual processing, and a lack of active cognitive processes. Beta brain waves are associated with alertness, attention, vigilance, and excitatory problem-solving activities. Theta brain waves are usually related to decreased alertness and lower information processing, but frontal midline theta activity is linked to mental effort, attention, and stimulus processing. Delta brain waves are most prominent during sleep, relaxation, or fatigue. The relationship between gamma activity and internal mental state is still being investigated.
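To make these band definitions concrete, the following sketch estimates per-band power from a raw EEG channel using Welch’s method; the function and variable names are illustrative, and the band edges are those given above (note that the quoted Alpha and Beta ranges overlap):

```python
import numpy as np
from scipy.signal import welch

# EEG frequency bands (Hz) as defined above.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 14),
         "beta": (10, 30), "gamma": (30, 50)}

def band_powers(samples: np.ndarray, fs: float) -> dict:
    """Estimate power in each band by integrating Welch's PSD."""
    freqs, psd = welch(samples, fs=fs, nperseg=min(len(samples), 4 * int(fs)))
    powers = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        powers[name] = float(np.trapz(psd[mask], freqs[mask]))
    return powers
```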

Physiological measurements in video game enjoyment have successfully measured emotional states [56, 59, 60, 65, 67, 90, 108, 109], engagement [3, 84, 90--92, 99, 108, 110], GameFlow [43, 102, 107], motivational states [3, 53, 55, 56, 84], and needs satisfaction [102, 103]. Behavioral measurements in video game enjoyment have successfully measured emotional states [56, 67], engagement [86, 89], GameFlow [38, 86], and motivational states [56]; none of the current literature has applied behavioral measurements to needs satisfaction. Physiological measurements are best used for reactive enjoyment states, which might be forgotten or not realized by the game user after the fact [111]. Objective measurements have the benefit of being objective: they do not rely on an individual subject’s guess at the accurate answer to a subjective question [106]. Objective measures, with a proper setup, are also less intrusive to a game user’s thought process. Physiological and behavioral measurements require no input from the game user, only mechanical recordings that are later evaluated by the researcher. The shortcomings of physiological measurements fall into the categories of Special Equipment Disadvantages, Data Acquisition and Interpretation Disadvantages, and Unnaturalness Disadvantages [106]. The special equipment disadvantage is the fact that special equipment is required in order to record the measure. Behavioral measurements are less affected by equipment availability but still require some special recording equipment, such as screen recording or key logging. Data acquisition and interpretation have particular difficulties owing to the black-box and relatively unknown nature of physiological measures; most research with physiological measurements uses subjective self-reporting questionnaires to validate the results. As research progresses, this shortcoming will become less of a concern. Unnaturalness is not limited to physiological measures, in that laboratory settings and testing environments are unnatural to a game user and may affect their responses, specifically those related to relaxation and getting into a ‘‘zone’’ state of mind.

Moreover, because physiological measurements require additional equipment, they add a further factor on top of the unnaturalness of the lab setting itself. Because human brains and thought processes are affected by many factors, researchers recommend against using physiological measurements for over-arching enjoyment types. While enjoyment types such as GameFlow lend themselves well to physiological measurements, longer-term definitions such as motivational states or needs satisfaction are better suited to subjective measures [111]. Some tools for measuring physiological signals, such as EEG, cannot record if there is major movement by the subject. Natural head and body movements occur often during game playing, and some games even require physical activity for play. A skin-like electronic device for recording electroencephalograms (EEG measurements) long-term would ameliorate some of the shortcomings of physiological measurements in long experiments [112]. 2.2.3 List of Measurement Tools

This section describes formally proposed models and tools for measuring enjoyment of interactive digital media, based on the definitions of enjoyment discussed previously.

EGameFlow. Fu, Su, and Yu propose EGameFlow for the measurement of GameFlow in e-learning games [39]. ‘‘Player skill’’ from the model of GameFlow [36] was changed to ‘‘Knowledge Improvement.’’ The questionnaire uses 7-point Likert scales and contains the following 8 categories:

• Concentration (6 items): games must provide activities that encourage the player’s concentration while minimizing stress from learning overload, which may lower the player’s concentration on the game.

• Clear Goal (4 items): tasks in the game should be clearly explained at the beginning.

• Feedback (5 items): feedback allows a player to determine the gap between the current stage of knowledge and the knowledge required for ultimate completion of the game’s task. Feedback does not need to be real-time, but must provide the user with context about the effect their actions have on the game.

• Challenge (6 items): the game should offer challenges that fit the player’s level of skills; the difficulty of these challenges should change in accordance with the increase in the player’s skill level.

• Autonomy (3 items): the learner should enjoy taking the initiative in game-playing and asserting total control over his or her choices in the game.

• Immersion (7 items): the game should lead the player into a state of immersion.

• Social Interaction (6 items): tasks in the game should become a means for players to interact socially.

• Knowledge Improvement (5 items): the game should increase the player’s level of knowledge and skills while meeting the goal of the curriculum.

Fang et al. 11 Item Questionnaire. Fang et al. proposed an approach/avoidance system for enjoyment [96--98]. Approach systems are positive systems, such as being motivated to perform an action so that a positive outcome occurs; avoidance systems are negative systems, such as performing actions to prevent negative outcomes from happening. Their proposal is based on the Tripartite Model [96, 113], a model of reactions leading to effects: three categories of reactions (affective, cognitive, behavioral) lead to the same three categories of effects. The result is the following three categories of questions, totaling 11 items. The categories are:

• Affect (5 items): questions that ask the user about their feelings directly, such as being happy or worried.

• Behavior (3 items): questions that ask the user about their behaviors which occurred during the game, such as swearing.

• Cognition (3 items): questions that ask the user about their feelings in relation to the game, such as the character’s actions being decent.

The Game Experience Questionnaire (GEQ). Ijsselsteijn et al. propose the Game Experience Questionnaire (GEQ) [44]. The GEQ is a set of 5-point scaled questions organized into categories across three modules:

• Core Module (33 items): tests seven components of the game experience: Immersion, Flow, Competence, Positive Affect, Negative Affect, Tension, and Challenge.

• Social Presence Module (17 items): questions pertaining to the behavioral model of the player when another entity is present.

• Post-Game Module (17 items): questions assessing how the player feels after they have finished playing. An in-game version of the Core Module is also available: a more concise version, with fewer items per component, intended for multiple/frequent in-game testing stops.

The Game Engagement Questionnaire. Brockmyer et al. developed this 19-item questionnaire to specifically measure engagement in a game, based on the four categories of absorption (5 items), immersion (9 items), presence (4 items), and flow (1 item) [114].

Biometric Storyboards. Mirza-Babaei et al. discuss combining self-reporting measurements with observational and biometric measurements in user testing, called Biometric Storyboards (BioSt) [62]. BioSt uses graph storyboarding to portray player experience. They find that BioSt-based user testing provides more nuance than user testing with self-reporting alone, because it combines physiological measurements with self-reported measurements. BioSt is similar to the method Bouvier et al. developed for their studies [82]: a system that uses a storyboard of key events along the timeline for users to self-report against, providing context and easier readability of the physiological measurements.

PhysSigTK. Rank and Lu demonstrate PhysSigTK, a physiological signals toolkit that makes low-cost hardware accessible in the Unity3D game development environment [110]. They state that engagement is inherently connected to the game itself, and that context-free engagement measurements are therefore less viable. PhysSigTK provides access to physiological measurements from the E4, Iom, e-Health, and MindWave hardware.

Transportation Questionnaire. Green and Brock suggest a transportation questionnaire based on transportation theory for media enjoyment [115]. Transportation theory holds that enjoyment can be based in the transportation of self into a world, to the extent that players internalize moralities or decisions based on events within the game world.

While modeled for the enjoyment of reading media, the Transportation Questionnaire could be modified for interactive digital media, similarly to how Flow was successfully converted to GameFlow.

Intrinsic Motivation Inventory. Ryan and Deci provide a seven-part scale to measure intrinsic motivational states [74]. The questionnaire covers seven motivational scale items:

• Interest/Enjoyment

• Perceived Competence

• Effort/Importance

• Pressure/Tension

• Perceived Choice

• Value/Usefulness

• Relatedness

The IMI scale is copyrighted and so cannot be reproduced here, but it is free for academic use. An abbreviated version (14 items), adapted for game responses in pre- and post-game surveys, is presented by Vos, Van Der Meijden, and Denessen, and is provided in full in the appendix [116]. Adaptations for gaming are also provided by Reinecke et al. [75].

Situational Motivation Scale. Developed by Guay, Vallerand, and Blanchard, the SIMS is a questionnaire to measure situational motivational states on four scales [71]. The four motivational scales covered are:

• Intrinsic motivation

• Identified regulation

• External regulation

• Amotivation

Instructional Material Motivation Scale. A questionnaire/scale presented by Keller to measure motivation in instructional materials [51, 52]. The questions cover four areas of motivational states:

• Attention

• Relevance

• Confidence

• Satisfaction

Positive And Negative Affect Scale. A set of scales to measure positive and negative emotional affective states, developed by Watson, Clark, and Tellegen [66]. The PANAS consists of a collection of words, each of which users rate according to how much or how little it applies to themselves. The PANAS can be modified to reflect either immediate feelings or feelings over a duration of time. Though not directly related to game experiences, the emotional words carry little context, so little to no editing is needed to measure enjoyment of interactive digital media.

Skin Conductance. Skin conductance, also called Electro Dermal Activity (EDA) or Galvanic Skin Response (GSR), is the conductivity of the skin based on sweat and chemical reactions in the body. GSR has been shown to reliably measure frustration [99], emotional states [109], and engagement as measured by the GEQ [111].

Heart Rate. Photoplethysmography (PPG) is used to measure heart rate. PPG has been shown to correlate with enjoyment in video games [117].

Cronbach’s Alpha. When developing a set of subjective items to measure a target definition of enjoyment, and no established tool fits the research, Cronbach’s Alpha can help validate the developed items. The measure was first developed by Cronbach in 1951 [118]. Cronbach’s Alpha is not limited to enjoyment of interactive digital media; it can be applied to any set of subjective items with closed-ended numbered answers. The alpha value ranges between 0 and 1, with higher values indicating higher reliability [105]. The coefficient measures the internal consistency of a test, or the inter-relatedness of items within a test [119]. Nunnally recommends a level of 0.7 or higher [120]. Most statistical analysis tools have Cronbach’s Alpha available.
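In symbols, alpha = (k / (k - 1)) * (1 - (sum of item variances) / (variance of total scores)), where k is the number of items. A minimal sketch of the computation, assuming a respondents-by-items score matrix:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's Alpha for a (respondents x items) score matrix."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # per-item sample variances
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example: four respondents answering three Likert items.
data = np.array([[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 4, 3]])
print(cronbach_alpha(data))  # values of 0.7 or higher are commonly accepted
```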

2.2.4 Summary

Both subjective and objective measurements were researched because no published studies showed how, specifically, to measure enjoyment as applied to procedural generation in video games. Following the decision made in Section 2.1.6 to select engagement as the model of enjoyment to measure, the subjective and objective measurement tools chosen were the GEQ, the Fang et al. Questionnaire, skin conductance, skin temperature, and heart rate. The GEQ and the Fang et al. Questionnaire were selected for their direct relationship to engagement, after dropping the Behavior and Cognition aspects of the Fang et al. Questionnaire. These two questionnaires were also selected for their brevity, because they would be administered multiple times during the study. Skin conductance, skin temperature, and heart rate have been shown to reliably measure engagement as well; these three objective measurements were also selected because a non-intrusive sensor, the Empatica E4, can collect all three. 2.3 Procedural Generation in Video Games

Procedural Content Generation has been utilized to create enjoyable games since the early 1980s [1]. Several academic approaches have been presented to generate better and higher-quality content. The more traditional approach generates a multitude of options independently of the user, while a more recent approach dynamically generates content based on the user’s gameplay. Procedurally generated content ranges in size and scope within the final game, and can be divided into the following categories [10]:

• Game bits: basic units that compose a game. (EX: textures or sounds)

• Game space: navigational world/space in which the player exists.

• Game systems: complex systems to increase a game’s believability.

• Game scenarios: progression/sequence of game/story events.

• Game design: structure and goals of a game.

• Derived content: side-products of the game. (EX: leaderboards and news)

The independence from human influence in PCG systems varies as well. Khaled, Nelson, and Barr provide the following design metaphors for PCG [121]:

• Tools: items meant to achieve design goals and provide the designer control.

• Materials: procedurally generated items that can be edited by designers.

• Designers: PCG algorithms that aim to complete game-design on their own.

• Domain Experts: evaluators, monitors, and analysts of gameplay data.

An extremely popular competition was developed from the idea of adapting a non-procedurally generated game, Super Mario Brothers, into one which uses PCG. The Infinite Mario Competition is frequently cited by the researchers discussed in this section and has been one of the main motivators for discovering new PCG techniques [122]. There are two important approaches in PCG techniques: Independent Procedural Content Generation (IPCG) generates content without context from the user, and Experience Driven Procedural Content Generation (EDPCG) generates content based on a user’s performance and play style in the game. This distinction is important because, while both are techniques for introducing variety, the design of tests for the user’s experience would be quite different for each. IPCG is the most popular version of PCG, despite a recent rise of interest in EDPCG research. 2.3.1 Independent Procedural Content Generation

IPCG generates a vast number of variations of content given a certain structure, and then selects a variation for each play-through. The generation does not take input from the user. IPCG has been in use for a long time, so the techniques created are vast and varied. IPCG can be used to generate quests or puzzles [9, 16, 18, 123], platformer levels as in Mario [20, 25, 122, 124--128], 3D terrain [19, 129--131], FPS maps [132], infinite runner platforming games such as Canabalt [133, 134] or other infinite-type games [24], dungeon-crawling mazes [29, 126, 135, 136], road systems [22], in-game objects or set pieces [23, 137], tracks [138], creative sandbox games like Minecraft [26, 139], puzzle games [27, 32], adventure/story-driven worlds [140], planetary systems [30], music [141], maps for real-time strategy games [142], and ornamentation [143].
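As a concrete illustration of this independent-generation idea, the sketch below seeds a random generator and emits a runner layout with no player input; the parameters, and the capping of gaps by a maximum jump distance to keep layouts passable, are illustrative rather than drawn from any particular engine:

```python
import random

def generate_platforms(seed: int, count: int, max_jump: float = 3.0) -> list:
    """Independently generate a runner layout; no player input is used.

    Each platform gets a random length, a height change, and a gap to the
    next platform. Gaps never exceed max_jump, preserving the property
    that good PCG layouts must always be possible to complete.
    """
    rng = random.Random(seed)  # a fixed seed makes a variation reproducible
    platforms, height = [], 0.0
    for _ in range(count):
        height += rng.uniform(-1.0, 1.0)
        platforms.append({
            "length": rng.uniform(4.0, 12.0),
            "height": height,
            "gap": rng.uniform(0.5, max_jump),  # always jumpable
        })
    return platforms

level = generate_platforms(seed=42, count=20)  # one selectable variation
```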

Enjoyment in IPCG has been measured in several ways, the most common being a programmed fitness function [18, 21, 27, 30, 33, 34, 124, 126, 129, 132, 133, 136]. Less commonly, player performance was used [24, 26, 132, 133, 135]. The few studies that did use user studies to verify that their generated content was entertaining did not use any of the established questionnaires for enjoyment, relying instead on simple direct questions such as ‘‘which did you enjoy more’’ [24, 26, 29, 122, 123, 135, 137, 142]. 2.3.2 Experience Driven Procedural Content Generation

Experience Driven PCG has been studied in the field of dynamic difficulty, finding that difficulty which reflects player performance is better enjoyed regardless of self-reported gamer experience [144]. Dynamic difficulty adjustment falls under GameFlow theory, as the challenge is adjusted to keep the player ‘‘in the zone’’ for an ideal player experience. EDPCG has been used to generate platformer levels [31, 127, 145, 146], top-down shooter games [147], and puzzles/quests [9]. Modeling the player’s experience helps inform good experience-driven procedural content [148]. A study of enjoyment enhancement from EDPCG would need to be designed differently, as tailoring an experience to a gamer’s play style introduces a different variable. EDPCG has been criticized as a game design strategy on many fronts. For example, it makes the game dishonest with the gamer [149]: the reality of the game world is compromised, as it is no longer the same world state for each user, in which an opponent’s skills are a known challenge to be met. Dynamic changes are also exploitable by self-aware gamers, who can recognize the experience-driven mechanics and start ‘‘gaming the system.’’ Finally, when a game noticeably becomes easier after initial failures, the change can insult the player, as when a child discovers that a parent has ‘‘let them win.’’ Realizations such as these have an immense impact on the player’s immersion and enjoyment of the game, even to the point of a lost sense of accomplishment [150].

2.3.3 Summary

Because EDPCG is new, involves more factors, and can affect enjoyment contrary to the intention of improving it, this study focuses only on IPCG, due to its prevalence and the fact that few studies have examined its claimed benefits. IPCG has had numerous academic applications, but problems arise in the majority of the published research. The research presented draws on improved user experience, often in the form of ‘‘replay value,’’ as the motivation for developing new PCG. However, most state the claim that replay value is one of the purposes of PCG without citation or user studies to confirm its efficacy [16--33]. Smith [34] cites a quote from the creator of Canabalt [134], but a quotation is hardly evidence. Some researchers cite anecdotal evidence about older PCG-driven games’ continued popularity as proof of enhanced replay value [135], but again this is not rigorous evidence. In some cases, other published research is cited, such as Compton and Mateas [125], which in turn does not establish the claim of replay value, but is nevertheless used as evidence for the claim. Some tools have been proposed to help analyze the fitness of a PCG design [33, 151, 152]. However, these tools are few and do not approach the enjoyment aspect of PCG from a human perspective, instead opting for fitness functions or visualization of procedurally generated content. It has been shown that humans can reliably pick which procedurally generated levels are more fun, and therefore user studies are a reliable means of validating PCG content [153]. Some of the published research does provide user studies (e.g., Mariño, Reis, and Lelis [154]); however, the method by which enjoyment was measured was a basic ‘‘did you enjoy this’’ question. To support the claims made by the majority of PCG research citing replay value as a motivator, a more thorough and systematic approach using user studies is necessary [154].

CHAPTER 3 IMPLICATIONS

Although there are methods to measure enjoyment, these methods have not been applied to demonstrate that procedural generation benefits video game designs through repeated game-play enjoyment. Rather than provide objective evidence for this type of statement, most researchers make the unsupported claim that procedural generation enhances replay value, without providing any justification other than common sense. For the purposes of this research, the selected definition of ‘‘replay value’’ is the retention of enjoyment factors over repeated play. The approach used by this study applied established enjoyment measurement tools to design a testing framework for measuring differences in enjoyment due to procedural generation in two types of video games. 3.1 Problem Statements and Proposed Solution

Problem Statement 1: There is little to no experimental evidence measuring how PCG affects replay value. Proposed Solution 1: Develop a standard framework for testing PCG-enhanced enjoyment for future use. This goal will be achieved by performing exploratory user studies around PCG games, starting with academically proven enjoyment measurements. The user study involved two different game types, an infinite runner and an RPG, and three different content generation approaches: static, manual, and procedural. Both subjective and objective measurements of enjoyment were recorded for analysis. 3.2 Research Questions and Hypotheses to be Tested

Research Question R1: Procedural generation of infinite runners and RPGs provides enjoyment enhancement upon repeated plays of these games compared to static game environments. This research question will be tested by comparing the measurements of enjoyment between the static and procedural generation conditions in both game types.

If the values are statistically significantly better for the procedurally generated content than for the static content, then the null hypothesis for R1 can be rejected. The hypotheses are:

H01: there is no statistical difference between enjoyment of static and procedural generation infinite runners and RPGs

HA1: there is a statistical difference between enjoyment of static and procedural generation infinite runners and RPGs

Research Question R2: Procedural generation of infinite runners and RPGs provides repeated-play enjoyment enhancement equivalent to manual generation. This research question will be tested by comparing the measurements of enjoyment between the manual and procedural generation conditions in both game types. The null hypothesis for R2 can be rejected if there is evidence that the values for PCG are better than or equal to those for manual generation. The hypotheses are:

H02: there is no statistical difference between enjoyment of manual and procedural generation infinite runners and RPGs

HA2: there is a statistical difference between enjoyment of manual and procedural generation infinite runners and RPGs

Research Question R3: Procedural generation provides the same kind of repeated-play enjoyment enhancement across game types for infinite runners and RPGs. This research question will be tested by comparing the measurements of enjoyment between the two game types while looking only at procedural generation. This test will look for equivalent values between the two games on factors of enjoyment in order to reject the null hypothesis for R3. The hypotheses are:

H03: there is no statistical difference between enjoyment of infinite runners and RPGs while using procedural generation

HA3: there is a statistical difference between enjoyment of infinite runners and RPGs while using procedural generation

CHAPTER 4 APPROACH

The following chapter describes the user study, data collection, and analysis in detail. It also outlines how each decision relates back to the research questions in Section 3.2. 4.1 User Study Design Overview

A brief overview of the flow of actions for the user study is documented in this section. The study took place over two days to avoid participant exhaustion, allocating one day per chosen game type (Sections 4.3, 5, and 6). The total time for each section/subsection was determined after a pilot study. An outline of the process is as follows:

1. Pre-Stimulus

(a) Provide informed consent form to sign, emailed before the start of the study for review (Section B.3)
(b) Request demographics collection (Section 4.5.1)
(c) Familiarize with hardware setup (Figure B-2)
(d) Take baseline measurements
(e) Assign stimuli type order based on participant ID

2. For each game type (Section 4.3)

(a) For each generation type (procedural, manual, static)

i. Play the game for a period of time
ii. Gather objective continuous data
iii. Collect subjective data five times during game play (Sections 4.4, 4.5)

3. Post-Stimulus

(a) Collect final survey
(b) Compensation (pro-rated to $10 per day)

4. Analysis (Section 7.2)

Step 1 was designed to inform and accommodate the subject before testing started. Beginning with Step 1.a, participants were provided an informed consent form (Section B.3) detailing the possible risks involved with the study and the compensation the subject would receive upon completing both days of the study, followed by asking for and answering any participant questions.

The form was provided to the participants before meeting in person, to allow enough time to read and review its contents. Participant demographics (Step 1.b, Section 4.5.1) were then collected, and the subject was familiarized with the setup and the game type (Figure B-2). The Empatica E4 (Section 4.5.4) was started before the game program was run, to collect baseline physiological measurements with participants seated in a quiet room. A controlled randomized order was assigned for the three content generation types depending on the participant’s assigned ID. Step 2 was the main data collection and game-playing step. The amount of time participants were requested to play varied by game type, based on the average completion time measured during a pilot test. For the RPG game type, participants played the game for 30 minutes at a time, with a questionnaire requesting feedback every 6 minutes. After the 30 minutes plus the time to answer the questionnaire, participants were given a break to use the restroom, get water, etc. Afterwards, the 30-minute play section began again, but with a different generation type. The total time for the RPG was around two hours, including break and questionnaire times. The Runner game had a shorter average completion time, so participants played the game in 10-minute sessions, responding to the questionnaire every two minutes; the total time for the Runner was around one hour. Each game type yielded 15 subjective questionnaire responses and three sessions of continuous objective physiological data, for a total of 30 subjective responses and six objective sessions per participant. Step 3 was completed after all three content generation types had been played on a given day. The participant was requested to repeat all steps on another day for the other game type. Compensation was provided at a prorated amount after the subject completed each day. Step 4 was the analysis of the collected data; this step is described in detail in Section 7.2.
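The questionnaire schedule described above is simple arithmetic over the session lengths; a small sketch, using the times from this section:

```python
# Five equal segments per session; the questionnaire appears at the end
# of each segment. Session lengths (minutes) are those described above.
SESSIONS = {"RPG": 30, "Runner": 10}

for game, minutes in SESSIONS.items():
    interval = minutes / 5
    prompt_times = [interval * i for i in range(1, 6)]
    print(game, prompt_times)  # RPG every 6 minutes; Runner every 2 minutes
```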

4.2 Participants

Participants were selected from applicants who expressed interest in playing video games, and were screened using a brief screening form (Section B.2). Most participants indicated that they enjoyed some form of RPG or infinite runner game, or both, which was a requirement based on the types of games included in this study (Section 4.3). Participants who labeled themselves as either a Frequent Gamer or an Expert Gamer were also selected, regardless of game preference. Attempts were made to balance gender ratios in the final results, but there was no specific restriction on age or gender demographics. The assumption was made that, by measuring enjoyment as a deviation from the subject’s baseline, the differences between individuals were less important; this is a significant assumption that could be a subject for further research. A total of 18 participants (11 male, 7 female), aged 18 to 24 (average age 20), were selected for the study, based on committee recommendations and the study design. All user studies were approved by the Institutional Review Board (IRB); all subjects were provided with an informed consent form that detailed the possible risks and other factors. Appendix B.3 contains the full informed consent document provided to the participants. 4.3 Game Types

Several factors were considered in choosing the games. First, the procedural generation must be controllable. Traditionally, games containing procedural generation do not allow players to see or control the generation behind the scenes; the ability to enable or disable the modified features of each successive game is required for comparison. Second, a single play-through must have a short average completion time. The point of the experiment was repetition, so the game stimulus must be short enough that a feasible block of time can be allocated to the experiment. Future research should look into longer games to replicate these results.

The two game types were an infinite runner, such as Canabalt [134], and an RPG, such as 6 [155]. Two different types of games were used to investigate Research Question R3. Infinite runners reduce the game to relying primarily on procedural generation to create the gaming experience. Most infinite runners have the avatar automatically running horizontally, with the only control available to the player being one button that makes the character jump. The character must jump between procedurally generated buildings of varying sizes, types, and heights, along with procedurally generated obstacles (Figure 4-1). An infinite runner was chosen because the game design is simple, with each game lasting no more than a few minutes; rapid repetition of the game was possible, and the simple nature of the game type eliminated some of the external factors that might influence the results. Canabalt itself was not used; a similar game was developed for this study, because the official version of Canabalt does not offer the capability to repeat the same level multiple times and control the state of the experiment. The developed game utilized More Mountains’ 2D+3D Infinite Runner Game Engine, which is available in the Asset Store [2] (Section 5). RPG games are more complicated than infinite runner games because RPGs interweave a story with the world map. In most cases, RPG games do not feature procedurally generated content except where the story does not dictate the progression, resulting in interchangeable dungeons and other simple maze-like locations as the source of procedural content. This study developed a procedural generation system, Atlas Chronicle, along with a simplistic game engine, to test procedural generation in variations of an RPG. The goal of the game engine and the story created for it was to explore the world and find where to go by talking to NPCs (Non-Player Characters). The player had up/down/left/right control over the character as well as ‘‘interact’’ and ‘‘inventory’’ buttons. The procedural content generated by Atlas Chronicle ensured that the general progression between important locations remained the same, while the layout of the world around those locations differed (Figure 4-2).

The game engine used was intentionally simplistic and removed most of the other mechanics of an RPG. Reducing the game to its basic structure allowed testing procedural generation with minimal interference from other enjoyment factors, such as music and artwork. The simplistic design also reduced play time to a length feasible for a user study: typical RPGs can take 20 to 40 hours to complete, while the game engine created for Atlas Chronicle takes about 10 minutes. Section 6 provides full details on the Atlas Chronicle design created for this study. The purpose of using these two games was to show whether the effects of procedural generation differ between game types. The use of two games was a preliminary step towards addressing Research Question R3; however, more game types will need to be tested to fully address R3. Ideally, future studies will apply the same methodology to other types of games containing procedural generation, to draw broader conclusions about what exactly procedural generation provides to games. 4.4 Game Enjoyment Metric and Measurement

The terms ‘‘replay value’’ and ‘‘replayability’’ are often used when describing the benefit of procedural generation in games. The first step in the study was to define the concept of replay value formally, with respect to the use of this term to describe video game enjoyment, and then to choose measurement tools that best quantify replay value as a specific measurement of enjoyment. Based on the survey of the current literature in Section 2.2, there are several good definitions of enjoyment. Because it is unknown exactly what procedural generation affects in a game’s experience, the design collected data for multiple definitions of enjoyment under the broader definition of engagement. For general engagement, physiological data were collected; for affect, the Fang et al. Questionnaire was used; and the Game Experience Questionnaire was used for its broad coverage of multiple definitions. A combination of subjective and objective measurements was used. The Game Experience Questionnaire (GEQ) has been proven reliable [44, 102, 103, 156] and covers multiple aspects of enjoyment such as Competence, Sensory and Imaginative Immersion, Flow, Tension, Challenge, Negative Affect, and Positive Affect (Section 4.5.2).

The Fang et al. Questionnaire was also used because it is a reliable way to measure Affect [96]. The Empatica E4 wristband was used to obtain objective measurements. Due to the lengthy experiment time, more cumbersome and delicate hardware could not be used without influencing the data [112]; the Empatica wristbands have been shown to be reliable for measuring physiological responses in study participants [91, 92], and so they were selected. 4.5 Data Collection and Analysis

There were two phases to the user studies. Phase 1 used the infinite runner game and phase 2 used the RPG. The specific amount of time for each phase was decided based on a pilot test, and further modified to reduce fatigue by shortening the total playing time. Due to the length of time necessary for each phase, the two phases took place on separate days, with some participants performing phase 2 before phase 1. Each phase contained three test treatments of content generation: static, manual, and procedural. Both phases used the following method:

1. Subjects were briefed on the experiment and signed a consent form

2. Subjects filled out a demographics questionnaire and were assigned a controlled randomized order (Table 4-1)

3. Subjects were given time to become familiar with the controls

4. Subjects played through the game in one of the content generation formats

(a) Measurements were collected 5 times during game play

5. Steps 2 through 4 were repeated for each of the content generation formats in their assigned random order

Step 1 was a formal subject briefing on the possible risks involved in the study; the informed consent form was then provided to the subject to sign. The participant had already been provided a digital copy via email upon notification of being selected to participate.

Step 2 was the collection of demographic data that might affect the resulting measurements of enjoyment (Section 4.5.1). Participants were assigned the three content generation formats in random order: static, manual, or procedural. For the procedural and manual formats, each successive game played was different, with the content created by the computer or by human designers, respectively. The static format replayed a single created level for all games played. The randomized order was selected using a 3x3 Latin Square, detailed in Table 4-1 (a sketch of this construction appears below); each day was also randomly assigned to a game type (Table 4-1). In Step 3, a paper with the current game’s controls was provided to the participant (Figures B-2a and B-2b). The participant was given time to read and become familiar with the controls, and was asked whether they had any questions about the controls before starting the session. The experiment setup seated the participant one to three feet from a computer monitor (Figure 4-3). Seating the participant, as well as using a controller for input, minimized the strain on the participant over the lengthy test session. A standard PlayStation 4 controller was used (Figure B-2). The main input for both game types was mapped to the bottom button on the controller: jumping for the runner and interacting with towns for the RPG. The bottom button is the button most directly under a participant’s thumb when holding the controller. Participants completed successive play-throughs of their games within the format for the time allotted to that game type (two hours for the RPG; one hour for the Runner). Due to the rapid iterative process, the subjective questionnaire measurements were recorded exactly five times rather than after each game iteration: the game time was divided by five, and at the end of each of the five timed segments the questionnaire was shown for input. For example, the RPG would play for 6 minutes before displaying the questionnaire, repeated for a total of five times.
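Returning to the ordering step above, a cyclic 3x3 Latin Square guarantees that each generation format appears once in every ordinal position across rows, balancing order effects. The sketch below shows the generic construction; the actual assignment used in the study is the one given in Table 4-1:

```python
# Cyclic 3x3 Latin Square over the three generation formats: each format
# appears exactly once per row and once per column (position).
FORMATS = ["static", "manual", "procedural"]

def latin_square_order(participant_id: int) -> list:
    row = participant_id % 3  # choose a row of the square by ID
    return [FORMATS[(row + i) % 3] for i in range(3)]

for pid in range(3):
    print(pid, latin_square_order(pid))
# 0 ['static', 'manual', 'procedural']
# 1 ['manual', 'procedural', 'static']
# 2 ['procedural', 'static', 'manual']
```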

The use of a different amount of time per game was a compromise between two options. The time was decided after a pilot test determined the average play time a single game of either game type would take. A total time was then calculated based on that average such that each segment potentially contained the same number of completed games across game types. The different amounts of time do result in a confounding factor, a shorter total play time for the Runner, but the alternative of equal hours would confound based on different numbers of completions. Physiological data were collected continuously. The data collection was contained within the computer rather than on paper or at a different digital location, to minimize the player’s removal from the game-play experience. After the fifth questionnaire was completed, the participant was instructed to find the observer and take a stretch break. The observer stopped the Empatica’s recording and instructed the participant to take a brief walk to the restrooms and back. After the break, the participant was switched to the next content generation format. The same participants were used for both phases to allow a within-subjects comparison. Some potential participants did not return for the second day and are not included in the data provided in this research. Participant compensation was pro-rated for their time commitment. 4.5.1 Demographics

The following data were collected for each participant for demographics purposes:

1. Gender

2. Age

3. Self-reported gamer level, from most active to least active in gaming

(a) Expert Gamer
(b) Frequent Gamer
(c) Casual Gamer
(d) Newbie / Non-Gamer

There appears to be no consensus on how to measure or classify gamer level. Hours per week spent playing games is often used, but free time to play games does not always reflect a person’s active interest in video games.

61 a person’s active interest in video games. A ranking mechanism that uses 15 factors has been suggested [157] but the rankings of the 15 factors were determined by the researchers’ opinions of each factor’s importance, and no evidence was provided to verify the importance of the scale. The creators of the mechanism even suggest re-weighting the 15 factors according to what a developer thinks is important. Accordingly, demographics collection utilized self-described gamer levels, which at least represented the user’s general opinion of their interest, level, and skill. 4.5.2 The Game Experience Questionnaire

The Game Experience Questionnaire [44], briefly described in Section 2.2.1, covers several aspects of enjoyment. The Social Presence Module did not apply to this research and was not included. The In-Game Module is an abbreviated version of the Core Module, recommended by Ijsselsteijn et al. for multiple/frequent in-game testing scenarios, and was therefore the best option for this design. See Appendix B.5 for the full questionnaire used.

4.5.3 The Fang et al. Questionnaire

In addition to the In-Game Module of the GEQ, the questionnaire presented to participants used the 5-item Affect section from the Fang et al. Questionnaire [96], shown in Appendix B.6. The Behavior and Cognition sections from the Fang et al. Questionnaire were not used because they were not relevant to this research.

4.5.4 Physiological Data with the Empatica

The Empatica E4 is a lightweight, unobtrusive wearable wireless sensor for capturing biological data in real time [158]. Four sensor components are available: PPG, EDA, a 3-axis accelerometer, and an infrared thermopile. The first two are primary designs of the Empatica E4 while the latter two are off-the-shelf components. Analysis examined the measurements for skin conductivity (EDA), heart rate, and skin temperature; the accelerometer was not used. Skin conductance (EDA) has been used frequently to show enjoyment in video games, since exciting moments are likely to elicit high arousal and engagement in the game [117]. Ivory and Kalyanaraman [99] found that skin conductance was a statistically significant method for measuring frustration. Photoplethysmograph (PPG) data have been shown to reliably measure enjoyment in video games [117]. The PPG measurements were converted by the Empatica software into an estimate of the participant's heart rate, which is the form used in analysis. Skin temperature (thermopile) data have been shown to have some effects in video game experience [3], so those measurements were included in the analysis. Data were recorded using the Empatica E4's internal memory and later transferred to a computer. A few sections of some participants' data were lost due to software glitches in Empatica's upload interface. The lost data represent around 3.5% of the total physiological segments. The missing segments appear in two of the patterns discussed in Chapter 7 and are noted in the figure captions.
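To make the later normalization concrete: the analysis in Section 7.2.1 scales each physiological stream by the participant's median for the session. A minimal sketch of that scaling, assuming the exported samples are available as a NumPy array (this is illustrative code, not the study's actual pipeline):

Python sketch:

import numpy as np

def median_scale(samples):
    # Scale a session's samples so the participant's median maps to 1.0.
    samples = np.asarray(samples, dtype=float)
    return samples / np.median(samples)

# Example: skin temperature samples (degrees C) for one game session.
temps = [33.1, 33.4, 33.9, 34.2, 33.8]
print(median_scale(temps))  # values near 1.0 are near the session median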

Figure 4-1. The typical features of an infinite runner with the PCG in green. The user controls a runner with constant velocity by pressing a button to jump. Procedural generation controls the jump distance, the next platform’s height relative to the current platform, the next platform’s length, and optional obstacles. Good PCG assures that the layouts are always possible.

Figure 4-2. The features of a possible RPG with PCG setup. A story is defined by important locations of interest and connections between them. PCG then varies the relative cardinal directions, distances, and landmass shapes around the locations. Good PCG would assure that the progression of locations in the story is not broken by impossible land features that would otherwise prevent the subject from following the intended storyline.

Figure 4-3. Mockup of the setup for the study. The participant is seated a comfortable distance, one to three feet, from the computer monitor.

Table 4-1. Participant randomized order assignment. Letters represent the following: s = static, m = manual, p = procedural content generation; Run = infinite runner game, RPG = role playing game. Order uses a Latin Square of size 3 for both game types.

Participant ID   Day 1 Time Slots (1st, 2nd, 3rd)   Day 2 Time Slots (1st, 2nd, 3rd)
000              Run-s, Run-m, Run-p                RPG-s, RPG-m, RPG-p
001              Run-m, Run-p, Run-s                RPG-s, RPG-m, RPG-p
002              Run-p, Run-s, Run-m                RPG-s, RPG-m, RPG-p
003              Run-s, Run-m, Run-p                RPG-m, RPG-p, RPG-s
004              Run-m, Run-p, Run-s                RPG-m, RPG-p, RPG-s
005              Run-p, Run-s, Run-m                RPG-m, RPG-p, RPG-s
006              Run-s, Run-m, Run-p                RPG-p, RPG-s, RPG-m
007              Run-m, Run-p, Run-s                RPG-p, RPG-s, RPG-m
008              Run-p, Run-s, Run-m                RPG-p, RPG-s, RPG-m
009              RPG-s, RPG-m, RPG-p                Run-s, Run-m, Run-p
010              RPG-m, RPG-p, RPG-s                Run-s, Run-m, Run-p
011              RPG-p, RPG-s, RPG-m                Run-s, Run-m, Run-p
012              RPG-s, RPG-m, RPG-p                Run-m, Run-p, Run-s
013              RPG-m, RPG-p, RPG-s                Run-m, Run-p, Run-s
014              RPG-p, RPG-s, RPG-m                Run-m, Run-p, Run-s
015              RPG-s, RPG-m, RPG-p                Run-p, Run-s, Run-m
016              RPG-m, RPG-p, RPG-s                Run-p, Run-s, Run-m
017              RPG-p, RPG-s, RPG-m                Run-p, Run-s, Run-m
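The cyclic structure of Table 4-1 is easy to reproduce. A minimal sketch (illustrative code, not the study's actual assignment script) that generates the Day 1/Day 2 orders for participants 000-008; participants 009-017 mirror the scheme with the game days swapped:

Python sketch:

GEN_TYPES = ["s", "m", "p"]  # static, manual, procedural

def latin_row(offset):
    # One row of a cyclic 3x3 Latin Square over the generation types.
    return [GEN_TYPES[(offset + i) % 3] for i in range(3)]

for pid in range(9):
    run_order = latin_row(pid % 3)    # cycles with every participant
    rpg_order = latin_row(pid // 3)   # cycles with every block of three
    print(f"{pid:03d}", ["Run-" + g for g in run_order],
          ["RPG-" + g for g in rpg_order])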

CHAPTER 5
2D+3D INFINITE RUNNER ENGINE

The infinite runner is a style of game that relies heavily on PCG in its design. The controls are typically limited to a single button, with all other interactions automated (see Figure 4-1 for the types of PCG typically involved). Games of this style create variety by varying the jump height, the distance to the next platform, and the size of the next platform. Optionally, obstacles can be added to harm or slow down the runner. The development of the Runner game used the 2D+3D infinite runner engine developed by More Mountains [2]. The engine came with tools for use within Unity and included several pre-made game types that expedited the testing process.

5.1 Game Style Choice

The 2D+3D engine provides several styles of infinite runner, in both two-dimensional and three-dimensional forms. To eliminate control factors and make as simple a game as possible, the design used the 2D side-scrolling runner, where the player controls only a limited jump between platforms (Figure 5-1). The game starts out with a flat surface for players to orient themselves (Figure 5-1a). Players can ‘‘double-jump’’ by pressing jump again while mid-air (Figure 5-1b).

5.2 Procedural Content Generation

The More Mountains toolkit came with a procedural generation algorithm built into its demos. Numbers from Unity's random number generator determined the next platform's shape and location as a position relative to the current platform. Using relative positions assured that the next platform was never too far from the current one.
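The relative-position idea can be sketched as follows (a Python illustration of the approach; the actual implementation lived in the More Mountains Unity/C# toolkit, and the numeric ranges here are invented):

Python sketch:

import random

def next_platform(x, y):
    # Offsets are relative to the current platform, so the next platform
    # is never unreachably far away.
    gap = random.uniform(2.0, 5.0)      # horizontal distance to jump
    dy = random.uniform(-1.5, 1.5)      # height change, up or down
    length = random.uniform(3.0, 8.0)   # length of the next platform
    return x + gap, y + dy, length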

5.3 Static Content Generation

Due to the PCG-reliant nature of infinite runners, special consideration was needed when designing the non-PCG content. The engine did not include a feature for manually designing levels, so the ability to create levels from a saved pattern was developed. An addition to the program achieved the static content by selecting a single fixed starting level value. The level then began the same way and had the same pattern of jumps every time the participant started over.

5.4 Manual Designs

The engine did not come with the ability to create human-designed patterns. Since humans cannot design a truly infinite level, the code added to the game created the manual level by reading a design from a save file and repeating the design ad infinitum (a minimal sketch of this repetition follows below). These designs were manually created to be longer than the average distance a player would achieve. To assist in designing levels for the infinite runner, a simple assistive toolkit program was developed (Figure A-1). The designer was provided controls on the left. The program allowed designers to control the total number of jumps in the sequence and the viewing scale, and to select a current platform to modify. To select a jump, the designer could either use the drop-down menu on the left or click on the jump with the mouse. The selected jump was indicated both as a number and by being colored red. The designer could then use the keyboard controls to modify the platform in the ways described in Table A-1.
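A minimal sketch of the repeat-from-save-file approach (the file format shown is hypothetical; the real designs came from the toolkit in Figure A-1):

Python sketch:

import itertools
import json

def load_design(path):
    # Assume the save file stores a list of platform records, e.g.
    # [{"height": 1, "length": 3, "gap": 2}, ...]
    with open(path) as f:
        return json.load(f)

def platforms(path):
    # Cycle through the finite human-made design forever.
    yield from itertools.cycle(load_design(path))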

(a) Start state of the game.

(b) Mid-jump during gameplay.

Figure 5-1. Example of the implemented 2D side-scrolling jumping game designed with 2D+3D infinite runner engine [2]. The user controls the square by jumping between platforms.

CHAPTER 6
ATLAS CHRONICLE

Story-driven RPG-like games are one of the game categories that PCG has barely touched, due to the intertwined nature of the game world's map and the game story. Until truly procedurally generated stories that make sense can be achieved, a different approach was necessary. The system, Atlas Chronicle, relies on the nature of RPG stories, in which an ordered list of locations to visit is the primary vehicle of storytelling. The relative distances between story-based locations of interest remain within story-defined constraints of minimum and maximum distance and connectivity. The variance comes from manipulating the cardinal directions, as the relative direction rarely makes a difference in the story being told. If a system can calculate a controlled randomization of locations for each location visited in the game that satisfies the minimum and maximum distances defined by the story, the world around these locations can be inferred to a full map. Atlas Chronicle finds this set of locations with a physics simulation. Atlas Chronicle was developed in Python [159] with the PyGame module [160] for visual and interactive aspects. The Atlas Chronicle system, described in detail in the following sections, totals around 2700 lines of code across 11 Python files. The external modules used were: a physics engine (Pymunk [161]), 2D vectors (Euclid [162]), Perlin noise (noise [163]), and interpolation (scipy [164]). The following sections describe an overview of the code design, data structures, and algorithms implemented.

6.1 Story Abstraction

In a story-driven RPG there are locations of interest (LOI). An LOI can be a town, cave, castle, or even a non-interactive terrain type, so that the player's journey travels through different terrains. An RPG story can be abstracted into LOI and distance restrictions between pairs of LOI. These restrictions express a story-experience constraint on distance: for example, Town A and Cave B must be between 10 and 15 kilometers of each other. A restriction represents a minimum and maximum value connecting two LOI.

These restrictions can be of two types: traversable or non-traversable. Traversable means that the restriction between LOIs A and B doubles as a rule that A must directly connect to B via traversable terrain. In the example above, Town A and Cave B being connected by a traversable restriction means that the direct path between A and B must consist of walkable terrain. The definition of ‘‘traversable’’ terrain can change (see Section 6.4 for details). A non-traversable restriction does not forbid traversable terrain along the path; it simply does not require it. This property allows restrictions to constrain the distances between LOI without affecting the connectivity of the terrain. Figure 6-1 shows a simple visualization of three LOI with three restrictions. With this abstraction, the world can be built around the story. The first step is to find some set of coordinates that satisfies all of the restrictions in the connected graph of LOI. The settling of LOI into satisfactory coordinates used a physics engine.

The physics engine applies forces to the LOI, treated as physical objects, to push and pull them into coordinates that satisfy the restrictions. Each LOI was represented as a physical object, and each restriction was made into a slide/spring combination. The connection functions as a free-movement sliding joint while within the valid distance range. If the connection distance falls below the minimum value, the physical property changes to a spring-style connection with a resting length equal to the minimum, so the applied forces push the two connected LOI apart. A similar interaction happens if the distance exceeds the maximum value, with the applied force pulling the two LOI toward each other. A visualization of this process is in Figure 6-2. With all LOI in the system connected by slide-spring restrictions (weighted based on the story parameters), each LOI was given a random x,y coordinate and allowed to exist in a physical space. The system simulated the forces until every object's velocity fell below a small threshold.
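A minimal sketch of the settling step with Pymunk (illustrative code, not the Atlas Chronicle source; pymunk.SlideJoint is used here as a stand-in for the slide/spring combination, since it permits free movement between a minimum and maximum distance and resists motion outside that band):

Python sketch:

import random
import pymunk

def settle_loi(loi_names, restrictions, threshold=0.05, max_steps=10000):
    # restrictions: list of (loi_a, loi_b, min_distance, max_distance)
    space = pymunk.Space()
    space.gravity = (0, 0)
    space.damping = 0.5  # bleed off velocity so the system settles

    bodies = {}
    for name in loi_names:
        body = pymunk.Body(mass=1.0, moment=10.0)
        body.position = (random.uniform(0, 100), random.uniform(0, 100))
        space.add(body, pymunk.Circle(body, radius=1.0))
        bodies[name] = body

    for a, b, min_d, max_d in restrictions:
        space.add(pymunk.SlideJoint(bodies[a], bodies[b],
                                    (0, 0), (0, 0), min_d, max_d))

    for _ in range(max_steps):
        space.step(1 / 60.0)
        if all(b.velocity.length < threshold for b in bodies.values()):
            break
    return {name: tuple(body.position) for name, body in bodies.items()}

coords = settle_loi(["TownA", "CaveB", "CastleC"],
                    [("TownA", "CaveB", 10, 15),
                     ("CaveB", "CastleC", 8, 20)])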

A physics engine was used instead of more deterministic algorithms to account for human error in the story abstraction step. Future applications of the system will be made available to game developers, and a non-deterministic approach allows imperfect restriction sets to still produce solutions. Once each LOI has valid x,y coordinates that satisfy all restrictions to the best of the system's ability, the terrain around the LOI is generated.

6.3 Terrain Generation

A randomized floodfill fills in the terrain around the LOI. Each LOI was given a seed terrain, with each interactable/town-based LOI creating four seeds of traversable terrain around it. The edges of the map were filled with void tiles, which are tiles that will be replaced in later iterations. Figure 6-3 shows the process visually. Each of the seed tiles and edge void tiles was placed in a list, A, to be used in the floodfill sequence (Figures 6-3a and 6-3b). The floodfill algorithm (Figure 6-3c) was as follows:

1. Select a tile t from the list of available tiles A

2. Populate list T of possible floodfill adjacent tiles from t

(a) Check all cardinal directions North, South, East, West of t
(b) Add a tile to T if it is unfilled and viable*

3. IF no tiles are in T, remove t from A and return to step 1

4. ELSE select one tile t1 randomly from T

5. Fill t1 with the same tile type as t and add t1 to A

6. Repeat steps until A is empty

*Viable means that, in the case of void tiles, filling the tile cannot cross a boundary along the line of any traversable restriction. Any two LOI connected by traversable restrictions will therefore have a map that connects them via traversable terrain. The final result was a map of traversable and non-traversable tiles (Figure 6-3d).
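The loop above can be sketched as follows (illustrative code with an assumed grid layout, not the Atlas Chronicle source):

Python sketch:

import random

CARDINALS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def randomized_floodfill(grid, seeds, viable):
    # grid maps every in-bounds (x, y) to a tile type, with None for
    # unfilled tiles; seeds are coordinates already assigned a type;
    # viable(pos, tile) encodes the restriction-crossing rule above.
    active = list(seeds)                          # the list A
    while active:
        x, y = random.choice(active)              # step 1: pick t from A
        tile = grid[(x, y)]
        options = [(x + dx, y + dy) for dx, dy in CARDINALS
                   if (x + dx, y + dy) in grid
                   and grid[(x + dx, y + dy)] is None
                   and viable((x + dx, y + dy), tile)]  # step 2
        if not options:                           # step 3: t cannot grow
            active.remove((x, y))
            continue
        chosen = random.choice(options)           # step 4
        grid[chosen] = tile                       # step 5: copy t's type
        active.append(chosen)
    return grid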

6.4 Recursive Process

The next step repeated the process from Section 6.3 recursively. The traversable map generated (Figure 6-3d) is called a field. A field is a subset of a continent, which is a subset of a world. A field is a set of LOI separated by traversable or non-traversable terrain. A continent consists of fields separated by non-traversable terrain. A world contains continents separated by water. The next iteration, creating a continent, repeated the process with fields in place of the LOI (Figure 6-4). The fields were connected by restrictions, placed randomly in the physics engine, and allowed to settle (Figure 6-4a). The floodfill follows a similar process with void tiles around the edge, but the floodfill seed tiles were mountains instead of landfill (Figures 6-4b and 6-4c), ensuring that the two or more fields were connected by mountains and thus formed a single continent (Figures 6-4d and 6-4e). Figure 6-4f shows the climate-to-terrain mapping, covered in the next section. The final recursive step repeated the process with continents in place of LOI (Figure 6-5a). Instead of floodfilling a third time, for this study's purposes the map was considered complete, and the unfilled areas were filled with ocean/water tiles (Figure 6-5b).

6.5 Terrain Mapping

After the continents were formed, the system mapped terrain types onto the landfill tiles. A climate was assigned to each LOI. A climate consists of a value between 0.0 and 1.0 for each of temperature and humidity. A climate map was made by first creating seed values from each LOI (Figure 6-6a); an average value was then added at the edges (Figure 6-6b) to enable interpolation to the edges of the map. Two interpolation maps were then generated to interpolate the climate values in the unassigned tiles between LOI, one each for temperature and humidity, visualized as blue and red respectively in Figure 6-6c. The climate was then mapped to a corresponding terrain type based on a look-up process called a Terrain Boundary Map (Figure 6-6d).

The system achieved the climate-to-terrain mapping by creating a nearest-neighbor map to the designer's specifications, called a Terrain Boundary Map (TBM). This study used the map seen in Figure 6-7, but any number of terrain values can be used. The temperature and humidity values were used as an x,y coordinate look-up in the nearest-neighbor map, which converts the two values into a single terrain type. To create more organic-looking separations between terrains after the interpolation step, a small amount of noise was added to the climate map before the TBM look-up. Figure 6-8 compares plain (6-8a and 6-8b) and noisy (6-8c and 6-8d) maps.
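A minimal sketch of the TBM look-up with noise (the anchor points below are hypothetical stand-ins, not the exact map from Figure 6-7):

Python sketch:

import math
import random

# Hypothetical (temperature, humidity) anchors -> terrain type.
TBM_POINTS = {
    (0.8, 0.2): "Desert",
    (0.5, 0.3): "Plains",
    (0.5, 0.5): "Grasslands",
    (0.5, 0.8): "Forest",
    (0.2, 0.4): "Tundra",
}

def terrain_for(temperature, humidity, noise_amount=0.05):
    # Jitter the climate values slightly so boundaries look organic.
    t = temperature + random.uniform(-noise_amount, noise_amount)
    h = humidity + random.uniform(-noise_amount, noise_amount)
    # Nearest-neighbor look-up in (temperature, humidity) space.
    return min(TBM_POINTS.items(),
               key=lambda kv: math.dist(kv[0], (t, h)))[1]

print(terrain_for(0.75, 0.25))  # likely "Desert"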

6.6 Testing Game Engine

The intended run-time of a normal RPG is 40+ hours, which was not feasible for a rapid, repeated test in a user study. Instead, testing the procedural generation in RPGs used a minimal game engine and storyline created for the study. The engine was programmed in Python [159] and used the PyGame module [160] for video-game-specific behavior. The game engine contained only the minimal elements needed to progress through the story:

• Movement

• Locked terrain type travel

• Text boxes / interactable locations of interest

6.6.1 Storyline and Progression

A basic connectivity structure of LOI can be seen in Figure 6-9. The hero starts near StarterTown, where an NPC (Non-Player Character) tells them about the three crystals necessary to save the world and defeat the final boss. The NPC also suggests looking nearby for where to start, with a cardinal-direction hint toward DesertTown. An NPC in DesertTown directs the hero cardinally toward DesertPalace. DesertPalace contains the first crystal necessary to win the game. The hero is then directed toward the mountains to get to the next open area, field1. The MountainTown NPC tells the hero to search nearby in PlainsVillage for the Mountain Pass Key. Once the hero has the key they are allowed through the Mountain Pass within MountainTown that leads to MountainCave in field1. The story continues in a similar style, leading the player to find three crystals in order to beat the game, as well as one hidden object for the ‘‘true ending.’’

6.6.2 2D Engine

The map generated by Atlas Chronicle was converted into a two-dimensional game via Python and PyGame. The engine consisted of 11 Python source files with around 1540 lines of code. The game engine featured two-dimensional pixel-based graphics and controller input. The engine used a singleton factory pattern to minimize the amount of image loading required by the game. The game engine kept track of the time elapsed in-game and paused the game automatically at timed intervals to collect the survey information. The game was only allowed to continue after the participant completed the survey. The game can be seen in Figure 6-10.
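The singleton factory can be sketched as follows (a hypothetical class, not the engine's actual source): each image file is loaded from disk once and the cached PyGame Surface is reused on every later request.

Python sketch:

import pygame

class ImageFactory:
    _instance = None

    def __new__(cls):
        # Always hand back the same factory object (singleton).
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._cache = {}
        return cls._instance

    def load(self, path):
        # Hit the disk only on the first request for this path.
        if path not in self._cache:
            self._cache[path] = pygame.image.load(path)
        return self._cache[path]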

6.7 Static Content Generation

The static content was selected as a single level generated by Atlas Chronicle and reused for all play-throughs.

6.8 Manual Designs

The manual content was designed with a toolkit created before testing. The toolkit expedited the creation of a world map without needing to place each tile individually. A map was generated by painting, similar to MS Paint, with specific colors mapped to specific tiles. Each town/LOI was given a unique color. The mapping is shown in Table A-2. A few examples from the designer are shown in Figure A-2. The designs were required to follow a human-readable English description equivalent to Figure 6-9, plus other details about the world structure, such as nearby water.

Figure 6-1. Example of story abstraction with three LOI and three restrictions. The restrictions between A/B and B/C are traversable restrictions and the restriction between A/C is not.

Figure 6-2. The three states in which a slide spring can exist. Top image: the connection between A and B is less than MAX and greater than MIN distances. Middle image: the distance between A and B is less than MIN, force is applied to push A and B away from each other. Bottom image: the distance between A and B is greater than MAX, force is applied to pull A and B towards each other.

(a) Coordinates determined with physics engine. (b) Map with seed tiles.

(c) Halfway through the floodfill process. (d) A final possible map.

Figure 6-3. The process from the coordinates generated by the space manager to a 2D map.

(a) Physics engine with two fields. (b) Mountain seed values placed.

(c) Halfway through floodfill process. (d) Final mountain shape created.

(e) Placement of fields. (f) Climate mapping applied.

Figure 6-4. Recursive process for a continent, ending with one possible continent generated.

(a) Physics engine with three continents.

(b) Final possible world creation.

Figure 6-5. Recursive process for a world, ending with one possible world generated.

(a) Seed values. (b) Edge values. (c) Interpolation. (d) Terrain mapping.

Figure 6-6. The climate mapping process: seed values, edge values, interpolation, and terrain mapping.

(Figure content: the terrain regions Desert, Plains, Grasslands, Forest, and Tundra arranged along temperature and humidity axes.)

Figure 6-7. Terrain Boundary Map used in examples for the testing game.

(a) Climate map in temperature and humidity with no noise. (b) Climate map translated to terrain map with no noise.

(c) Climate map in temperature and humidity with noise. (d) Climate map translated to terrain map with noise.

Figure 6-8. Example of noise added to climate mapping for more natural boundaries.

Figure 6-9. LOI structure for the minimal RPG used in testing.

(a) Starting area of 2D testing engine.

(b) Interaction with a town.

(c) Another area in the 2D engine.

Figure 6-10. Example gameplay of the 2D RPG testing engine for Atlas Chronicle.

CHAPTER 7
DATA AND ANALYSIS

7.1 Data

The independent variables were the two game types and the three level generation methods, resulting in six different states:

1. Game Type

(a) Infinite runner (Runner)
(b) RPG (RPG)

2. Level Generation Method

(a) Static
(b) Manual
(c) Procedural

The dependent variables were the subjective and objective measures of enjoyment selected for this study:

1. Subjective Measurements

(a) The 14 questions from the In-Game GEQ Module
(b) Affect as measured by the 5 questions in the Fang et al. Questionnaire

2. Objective Measurements

(a) Skin Conductance
(b) Heart Rate
(c) Skin Temperature

The objective measurements were scaled by the median value for that recording session.

7.2 Analysis

Researchers have proposed ways to increase engagement as a means of making a better game (Section 2.1.5). The published literature often conflates engagement with enjoyment [3, 84, 85, 93], arguing that if engagement is high, then the experience is more enjoyable. But these studies did not include an analysis of ‘‘replay value’’ in video games, so a one-to-one connection may not be applicable. The following analysis explores this connection further, treating engagement and enjoyment as separate entities.

7.2.1 Objective Measurements

All objective measurements were scaled to adjust for differences in each participant's physiological range. Each participant's measurements were divided by their median value for that game session, so the participant's median is represented as 1.0 on the y-axis and visualized with a dotted red line. Color was used to represent distance from the median, following the key in Figure 7-1, with red being the closest and cyan being the farthest. Although the heart rate data did not show any discernible patterns related to game or generation types, skin temperature and skin conductivity (EDA) revealed some common patterns. Among the participants who showed a physiological response, the most common pattern is depicted in Figure 7-2: the participant's skin temperature started low, increased, then finally decreased as time progressed. Participant 14 played the RPG game in the following order: procedural, static, manual. In Figure 7-2, Participant 14's first generation type of the day (RPG-p) showed a pattern of increasing temperatures, ranging from the lowest to the highest values for that day. The second session (RPG-s) displayed mid-range temperatures, and the final session (RPG-m) had the lowest skin temperatures of all three. The pattern indicated that the participant was initially engaged with the RPG game but grew less engaged as time progressed. For the Runner game, Participant 14's skin temperature remained close to the median throughout the session. A slight variation of the common ‘‘increase, plateau, then decrease’’ pattern can be seen in Participant 9's skin conductivity (Figure 7-3). The generation order for Participant 9's Runner games was static, manual, procedural. There was a clear increase during the Run-s and Run-m sessions, with a plateau in the last session (Run-p). Participant 3 (Figure 7-4) had increasing skin temperatures within each segment for all three generation types during the Runner game, ending with nearly all temperatures above the median during the final session (Run-p). Rather than a consistent increase in skin temperature, the RPG games followed a pattern similar to Participant 14's, where the initial generation type (RPG-m) showed the greatest change, the second generation type (RPG-p) displayed a plateau in skin temperature, and the final generation type (RPG-s) revealed decreased engagement, with lower skin temperatures. Participant 8 (Figure 7-5) showed a different pattern for the Runner game, with skin temperature remaining relatively constant throughout the session. Another pattern came from Participant 9's skin temperatures during the Runner session (Figure 7-6): the skin temperature started close to the median, then dropped, and finally increased for the final generation type. This pattern conflicted with the levels of engagement suggested by Participant 9's skin conductivity, which followed the more common pattern of increasing, plateauing, then decreasing. Heart rate measurements were collected; however, there were no discernible patterns for most of the participants. Participant 4, seen in Figure 7-7, showed the typical heart rate ranges. The absence of consistent patterns in heart rate may be because heart rate reliably measures reactive emotional states rather than an overall enjoyment experience. There was no absolute pattern that all participants followed in the objective measurements, even when controlling for individual physiological ranges. Some patterns were noticeable, and most followed one of three shapes: strict increase; increase then plateau; or increase then decrease. There was evidence that most of the participants experienced some form of increased engagement while playing the video games, which is consistent with published articles discussing video game engagement [3]. The data collected from this experiment showed evidence of increased engagement, as measured by increased skin conductivity (EDA) and skin temperature [99, 110, 165], for most participants. However, 12 out of 18 participants (67%) reported boredom by the end of the session, and 16 out of 18 (89%) commented on the repetitive nature of the games.

As per Nacke [111], physiological measurements are best suited to reactive game enjoyment. However, most participants did experience some form of increased engagement. Analysis of the subjective data below revealed that the physiological data did not correlate with the subjective measurements. The disconnect between the physiological data and the subjective measurements suggests that, for some types of video games, engagement and enjoyment are not synonymous.

7.2.2 Subjective Measurements

User responses were recorded periodically during each game/generation section. The responses were on a Likert scale from 0 to 4; details on the full survey can be found in Sections 4.5.2, 4.5.3, B.5 and B.6. To detect patterns, contingency tables were built over various pairs of factors and the results were tested for statistical significance, to determine which factors behaved as independent variables. The terminology used in the following analysis is:

• Participant ID: The identification number given to each participant to protect anonymity and blind analysis. Also abbreviated as ‘‘PID’’ in the figures.

• Response: The numerical value a participant answered for a given question. Participants were interrupted at evenly spaced times to fill out the GEQ and Fang et al. Questionnaire in response to the game experience so far.

• Game Type: The game type played for which the current response was measured. The game types are RPG and Runner. Each game was played on a separate day. Also abbreviated as ‘‘Game’’ in the figures.

• Generation Type: The level content generation used for which the current response was measured. The generation types are manual, procedural, and static. These three generation types were played in a single game session on a single day in a controlled random order (Table 4-1). Also abbreviated as ‘‘Generation’’ or ‘‘Gen’’ in the figures.

• Segment: The time segment from which the response was recorded. For each combination of game and generation type the participant was asked to record their responses five times, resulting in five segments of time for which participants provided a response. The segments are numbered in ascending order, with segment 1 being the first time segment played for the game.

• Game Session: A single day during which only one game type was played, either RPG or Runner. All three generation types were played during a single game session.

• Test Set: The set of six unique test treatments over the user study: RPG/procedural, RPG/manual, RPG/static, Runner/procedural, Runner/manual, and Runner/static.

7.2.2.1 Analysis Procedure

The first test applied to a contingency table was the Chi-Squared test. Pearson's chi-squared test checks the null hypothesis that the factors chosen for the x and y axes of the table are independent variables; if the null hypothesis is true, there is no relationship between the two factors, i.e. the joint distribution is the product of the two marginal distributions. Marginal distributions are the sums of the table across rows or columns, representing the distribution of each factor independently of the other. Under the null hypothesis, the expected count for the cell in row i and column j is (row i total × column j total) / grand total, and this expected value can be compared to the observed value in that cell. The Chi-Squared statistic is the sum of the squared differences between the observed and expected frequencies, divided by the expected frequencies:

\[
\chi^2 = \sum_i \frac{(\mathrm{observed}_i - \mathrm{expected}_i)^2}{\mathrm{expected}_i}
\]

where the summation variable i ranges over all cells of the contingency table. Table 7-1 shows a traditional numerical contingency table of the responses to GEQ03, ‘‘I felt bored,’’ from all participants. The data are separated by segment, and the average and median for each segment are provided at the bottom of the table. The first question was whether this distribution could occur by random chance. The R code syntax for the chi-squared test and its results were as follows:

R Code:

87 chisq.test(value[item=="GEQ03"], segNumber[item=="GEQ03"])

Results: X-squared = 41.176, df = 16, p-value = 0.0005229
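The same table can be cross-checked in Python with SciPy (a sketch using the counts from Table 7-1; the dissertation's analysis itself used R):

Python sketch:

from scipy.stats import chi2_contingency

# Rows: response values 0-4; columns: segments 1-5 (counts from Table 7-1).
observed = [
    [35, 25, 23, 19, 19],
    [21, 24, 21, 16, 14],
    [19, 28, 15, 17, 13],
    [19, 12, 27, 25, 21],
    [14, 19, 22, 30, 40],
]
chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, dof, p)  # should match R: X-squared = 41.176, df = 16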

The p-value is less than 0.05, which indicates that the observed responses differed significantly from the frequencies expected under independent distributions. Therefore the null hypothesis was rejected, and it was concluded that there was a relationship between segment number and response values for GEQ03. The Chi-Squared test answers whether the two factors of the contingency table are independent but, when they are not, it does not determine which particular segments differ from each other. For this, either a parametric ANOVA with a post-hoc test, or the equivalent nonparametric tests, was needed. These compare each level of a factor (such as segments) with the other levels of the same factor to determine which levels are significantly different. Due to the repeated significance testing, the chance of a Type I error (‘‘false positive,’’ the rejection of a true null hypothesis) is elevated, and hence the significance values for each test must be adjusted to reflect the true probability under the null hypothesis. The Holm-Bonferroni p-value adjustment method was applied to counteract this effect; it is also the default adjustment for R's pairwise tests. ANOVA tests are parametric tests that determine whether the variance in a factor is significantly affected when the data are split into groups based on another factor. Nonparametric tests determine whether nonparametric statistics, such as medians and ranks, are significantly affected by the same kind of grouping. The questionnaire data are discrete rather than continuous; therefore, the data do not meet the assumptions required for parametric ANOVA. Both parametric and nonparametric sets of results are provided, but ultimately the nonparametric Kruskal-Wallis and pairwise Wilcoxon rank-sum tests are used for this analysis. Both the parametric and nonparametric pairwise tests have been corrected for multiple comparisons using the Holm-Bonferroni p-value adjustment method. First, the ANOVA test:

R Code:

ANOVA.example = aov(value[item=="GEQ03"] ~ segNumber[item=="GEQ03"])
summary(ANOVA.example)

Results:

                            Df Sum Sq Mean Sq F value   Pr(>F)
segNumber[item == "GEQ03"]   1   54.1   54.07   25.66 5.61e-07
Residuals                  536 1129.5    2.11

ANOVA indicates there were statistically significant differences between the group means, as seen in the Pr(>F) column with a value less than 0.05. Therefore the parametric pairwise test was applied:

R Code:

pairwise.t.test(value[item=="GEQ03"], segNumber[item=="GEQ03"])

Results:

      1       2       3       4
2 0.76450 -       -       -
3 0.15150 0.76450 -       -
4 0.00435 0.07131 0.76450 -
5 0.00016 0.00528 0.17230 0.76450

Some segment means were not statistically different from each other, as seen between segments 1 and 2, with a p-value of 0.76. However, segment 1 was significantly different from segments 4 and 5, and segment 2 was significantly different from segment 5. The Kruskal-Wallis test indicated there were statistically significant differences in medians:

R Code:

kruskal.test(value[item=="GEQ03"], segNumber[item=="GEQ03"])

Results: Kruskal-Wallis chi-squared = 24.878, df = 4, p-value = 5.324e-05

The p-value from the Kruskal-Wallis test was less than 0.05, so analysis continued with the pairwise Wilcoxon test:

R Code:

pairwise.wilcox.test(value[item=="GEQ03"], segNumber[item=="GEQ03"])

Results:

      1       2       3       4
2 0.77378 -       -       -
3 0.14533 0.77378 -       -
4 0.00511 0.06927 0.77378 -
5 0.00031 0.00723 0.14533 0.77378

The pairwise test indicated that the median of segment 1 was significantly different from segments 4 and 5; segment 2 was significantly different from segment 5; and all other segment medians were not significantly different. These results were conceptually identical to the parametric tests.

There are several ways to convey these results visually. One option, seen in Table 7-2, is to use superscripts on the mean (average) and median rows: if two columns share a letter, the groups are not statistically different. For example, the letter A on segments 1, 2, and 3 indicates they were not different, but segments 1 and 5 share no letters and therefore were significantly different. Finally, a variant on the contingency table can be used to display the data graphically. Seen in Figure 7-8, the numerical totals are represented as a proportion of the total responses per segment; a larger block indicates a larger percentage of the total. Colors visually distinguish the different response values, with the color key labeled in Figure 7-9. The upper x-axis indicates the segment number and the lower x-axis indicates the pairwise results. The results from the chi-squared test and both the parametric and nonparametric pairwise tests indicated that boredom, as measured by GEQ03, increased significantly from the first segment (Segment 1) to the last segment (Segment 5) of the test. The remaining analyses of the subjective data yield contingency figures like Figure 7-8, with the chi-squared p-value in the upper right and the Wilcoxon results as labels on the bottom axis. Significance was selected as true when the p-value was less than 0.05. A few selected tables with a chi-squared p-value greater than 0.05 are included based on comparisons to other significant tables or for discussion of hypothetical patterns.

7.2.2.2 Decline of Enjoyment

As seen in Figure 7-8, boredom increased significantly as time progressed. The same pattern can be seen in Figure 7-10, which measured participants' responses to ‘‘I feel exhausted when playing this game.’’ The pattern also appeared when using the scoring guide for the GEQ (Figure 7-12a). However, while still showing a similar visual pattern, the statistical analysis showed that the Runner game did not significantly increase in negative ratings (Figure 7-12b). The pattern indicated that the Runner game was less boring to play over an extended period of time. This could be due to the innate nature of the Runner game, as well as player expectations, and may reflect a higher ‘‘flow’’ for runner games compared to the RPG (Figure 7-11). Runner games are typically fast-paced but not expected to be mentally challenging; by comparison, players usually expect complex story lines from RPG games. Meeting those expectations was not feasible within the study design, and the end result may have been increased boredom in response to the relative simplicity of the RPG. Another reason for the difference could be the higher cognitive load required for an RPG, which requires two-dimensional navigation, versus the Runner, which requires only a single button press. Given that there appeared to be a general increase in negative responses over time, the analysis compared the early and late responses, focusing on the survey responses from segment 1, collected after the participants had played the game for the designated amount of time (Section 4.4), and segment 5, collected after playing the generation type for the full amount of time (Figure 7-13). Generally, the pairwise analysis indicates that segment 1 for the manual designs was ‘‘better’’ than segment 5 of the static designs with regard to negative affect scores, but all other categories were too similar to differentiate. This result was also observed when evaluating the GEQ questions individually (Figures C-1 and C-2). However, the start/end boredom effect was apparent when all categories were pooled in Figure 7-8. The decreased ability to identify significant differences in Figures 7-12 and 7-13 was likely due to the smaller sample sizes resulting from the split into generation type categories. However, one can conclude that the segment 1 (start) values for all three generation types were similar, and all three segment 5 (end) values were similar. The conclusion is that procedural generation was no better and no worse than any other generation approach. The next step in the analysis separated the start/end values by game type and repeated the process. The pairwise analysis revealed significant differences between segment 1 RPG-m and segment 5 RPG-p and RPG-s (Figure 7-14a), but not for the Runner comparisons (Figure 7-14b). However, the same claim can be made for the Runner game: procedural generation is no better and no worse than any other content generation approach. All other GEQ component analyses were not significant and are included in Appendix C.1.1. No additional information could be gathered from the Fang et al. questions (see the figures in Appendix C.1.1). Figure 7-15 shows responses pooled by game and generation type. Notably, in most components of the GEQ, the RPG had significantly different results from the Runner. The only component without statistically distinct groups between the two game types was the sensory component. The positive and competence categories were statistically different between the game types, but not between the generation types. The analysis for these three components can be seen in Figure C-14 in Appendix C.1.2. The last four categories (negative, tension, flow, and challenge) showed some statistical differences between the generation types. Figure 7-15b shows the tension component, with a statistical difference between the manual and procedural generations for both games; however, procedural generation decreased tension for the RPG while increasing it for the Runner. The same type of difference can be seen in the challenge component (Figure 7-15d), with a decrease in challenge for the RPG procedural test compared to the manual test, but an increase in challenge in the same comparison for the Runner.

7.2.2.3 Gender and Gamer Label Differences

Gender demographics were collected for this study. Of the 18 total participants, 11 identified as male and 7 identified as female. Figure 7-16 contains all GEQ components for which there was a significant difference between the two gender groups. Generally, female-identifying participants expressed a more enjoyable experience. The exception is Affect02, ‘‘I feel worried when playing this game,’’ where male-identifying participants' scores had a lower median (Figure 7-17). The rest of the Fang et al. questions can be found in Figure C-17. Self-assigned gamer experience labels were also collected. The breakdown was one expert gamer, 8 frequent gamers, 5 casual gamers, and 4 newbie/non-gamers. Because splitting into upper levels (expert + frequent) and lower levels (casual + newbie) resulted in 9 participants in each grouping, a brief analysis was applied to this breakdown.

While there were some differences between the two groupings, the only notable pattern was for the Runner game. Figure 7-18 shows the results for both the positive and negative affect components of the GEQ for the Runner game. Upper-level gamers rated the game significantly more positively, with both higher positive affect and lower negative affect. This could be because gamers who play more frequently can shift into the flow state more easily than less frequent players. Future research should examine this in greater detail.

7.3 Results

Revisiting the three hypotheses based on the analysis in Section 7.2:

Research Question R1: Procedural generation of infinite runners and RPGs provides enjoyment enhancement upon repeated plays of these games compared to static game environments.

H01: there is no statistical difference between enjoyment of static and procedural generation infinite runners and RPGs

HA1: there is a statistical difference between static and procedural generation games

R1 asks whether procedural generation enhances a game beyond static game play. Static before-and-after measurements of enjoyment were indistinguishable from procedural (Figures 7-13, 7-14). The analysis failed to reject the null hypothesis; procedural generation provided no measurable enjoyment enhancement over static generation.

Research Question R2: Procedural generation of infinite runners and RPGs provides repeated-play enjoyment enhancement equivalent to manual generation.

H02: there is no statistical difference between enjoyment of manual and procedural generation infinite runners and RPGs

HA2: there is a statistical difference between manual and procedural generation infinite runners and RPGs

R2 asks whether procedural generation is equivalent to manual generation of game levels in terms of enjoyment. The before-and-after measurements for the manual designs in all games were indistinguishable from procedural (Figures 7-13, 7-14). Therefore, the analysis failed to reject the null hypothesis that manual and procedural generation do not differ in sustaining enjoyment.

Research Question R3: Procedural generation provides the same kind of repeated-play enjoyment enhancement across game types (infinite runners and RPGs).

H03: there is no statistical difference between enjoyment of infinite runners and RPGs while using procedural generation

HA3: there is a statistical difference between enjoyment of infinite runners and RPGs while using procedural generation

R3 asks whether the same enjoyment factors are enhanced by procedural generation across game types. The enjoyment factors affected by procedural generation had opposite effects in the two games (Figure 7-15). The analysis rejected the null hypothesis and accepted the alternative hypothesis HA3.

The study failed to reject null hypotheses H01 and H02, which would have identified differences between generation types. Based on the questionnaire responses, procedural generation did not provide increased enjoyment beyond what was obtained from statically generated games. That said, procedural generation performed the same as manual designs. According to my results, if a game designer wanted multiple, varied levels for their game, procedural generation would be a viable option: no enjoyment would be lost by using procedural generation, and procedural generation is likely to be faster and more cost-effective than manual design [8, 9].

Hypothesis HA3 was accepted; therefore, the choice of game type when implementing procedural generation will affect different aspects of enjoyment. To create a questionnaire that accurately measures procedural generation's contribution to enjoyment, the questions must capture all facets of enjoyment affected by procedural generation. The GEQ, while covering seven different components of enjoyment, did not cover components related to procedural generation. Until such components are found, researchers cannot claim that procedural generation enhances replay value.

Figure 7-1. The color key used for all physiological box plots. Red indicates the mean of the box plot is close to the overall mean for the participant for the day and cyan indicates the mean of the box plot is the furthest.

Figure 7-2. Skin temperature for Participant 14. X-axis is the time segment during which the data was collected. Y-axis is the measured skin temperature scaled to the participant’s average skin temperature for the day. Colors used from the color scale in Figure 7-1. Assigned order was RPG-p, RPG-s, RPG-m, Run-m, Run-p, Run-s. For the RPG game session, skin temperature showed a distinct pattern of increasing, plateauing, then decreasing. Variance was less for the Runner game.

Figure 7-3. Skin conductivity (EDA) results for Participant 9. X-axis is the time segment during which the data was collected. Y-axis is the measured skin conductivity scaled to the participant’s average skin conductivity for the day. Colors used from the color scale in Figure 7-1. Assigned order was RPG-s, RPG-m, RPG-p, Run-s, Run-m, Run-p. The Runner showed a pattern of increasing conductivity during the first two generation types, with a final plateau. The RPG showed no discernible pattern. The 5th segment was lost for Run-s due to software glitches.

Figure 7-4. Skin temperature results for Participant 3. X-axis is the time segment during which the data was collected. Y-axis is the measured skin temperature scaled to the participant’s average skin temperature for the day. Colors used from the color scale in Figure 7-1. Assigned order was Run-s, Run-m, Run-p, RPG-m, RPG-p, RPG-s. The notable pattern across all sessions is a low start followed by an increase during game play.

Figure 7-5. Skin temperature results for Participant 8. X-axis is the time segment during which the data was collected. Y-axis is the measured skin temperature scaled to the participant’s average skin temperature for the day. Colors used from the color scale in Figure 7-1. Assigned order was Run-p, Run-s, Run-m, RPG-p, RPG-s, RPG-m. The RPG shows a pattern of increasing and then plateau. The Runner game shows an unusual pattern of minimal variance. The 5th segment was lost for Runner-m and Runner-s due to Empatica software glitches.

Figure 7-6. Skin temperature results for Participant 9. X-axis is the time segment during which the data was collected. Y-axis is the measured skin temperature scaled to the participant’s average skin temperature for the day. Colors used from the color scale in Figure 7-1. Assigned order was RPG-s, RPG-m, RPG-p, Run-s, Run-m, Run-p. Contrary to the skin conductivity in Figure 7-3, Participant 9’s skin temperature did not produce the same pattern.

Figure 7-7. Heart rate results for Participant 4. X-axis is the time segment during which the data was collected. Y-axis is the measured heart rate scaled to the participant’s average heart rate for the day. Colors used from the color scale in Figure 7-1. No discernible pattern was observed.

Table 7-1. A traditional numerical contingency table for GEQ03 ‘‘I felt bored.’’ The left column indicates the numerical response value. The other columns are for each segment during which the data were collected. The numbers in the segment columns represent the total for each response value in that segment.

Response   Segment 1   Segment 2   Segment 3   Segment 4   Segment 5
0              35          25          23          19          19
1              21          24          21          16          14
2              19          28          15          17          13
3              19          12          27          25          21
4              14          19          22          30          40
Average      1.59        1.78        2.04        2.29        2.46
Median       1.00        2.00        2.00        3.00        3.00

Table 7-2. A traditional numerical contingency table with superscripts indicating the pairwise test results for GEQ03 ‘‘I felt bored.’’ Numbers are the same as Table 7-1.

Response   Segment 1   Segment 2   Segment 3   Segment 4   Segment 5
0              35          25          23          19          19
1              21          24          21          16          14
2              19          28          15          17          13
3              19          12          27          25          21
4              14          19          22          30          40
Average      1.59^A      1.78^AB     2.04^ABC    2.29^BC     2.46^C
Median       1.00^A      2.00^AB     2.00^ABC    3.00^BC     3.00^C

Figure 7-8. A visual table of the results for the question ‘‘I felt bored’’ pooled by time segment. Y-axis is the response value from 0 (not at all) to 4 (extremely). X-axis on the top labels the segment for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

(a) Colors used to represent responses for individual questions from the GEQ or questions from the Fang et al. Questionnaire. (b) Colors used to represent responses for component totals from the GEQ.

Figure 7-9. The color key used in the subjective contingency tables. Key depends on the data shown.

Figure 7-10. Responses to the question ‘‘I feel exhausted when playing this game’’ pooled by time segment. Y-axis is the response value from 0 to 4. X-axis on the top labels the segment for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

Figure 7-11. Responses to the GEQ component ‘‘flow’’ pooled by game type. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the game type for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

(a) Responses for the RPG game type (chi-squared p: 0.1015). (b) Responses for the Runner game type (chi-squared p: 0.8461).

Figure 7-12. Responses to the GEQ component ‘‘negative affect’’ pooled by time segment and separated by game type. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

Figure 7-13. The GEQ component ‘‘negative affect’’ pooled by time segment and generation type, limited to the first (1) and last (5) segments (chi-squared p: 0.0827). Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure 7-14 graphics omitted; Responses for Negative During RPG, p = 0.3203, pairwise groups: A, AB, AB, B, AB, B; Responses for Negative During Runner, p = 0.1459, pairwise groups: A, A, A, A, A, A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure 7-14. Responses to the GEQ component ‘‘negative affect’’ pooled by time segment and generation type, limited to the first (1) and last (5) segments and separated by game type. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure 7-15 graphics omitted, panels (a) and (b); Responses for Negative, p < 0.01, pairwise groups A, AB, B, D, D, D; Responses for Tension, p < 0.01, pairwise groups A, B, B, D, E, DE; both over game/generation m.RPG, p.RPG, s.RPG, m.Run, p.Run, s.Run.]

(a) Scores for negative affect. (b) Scores for tension.

[Figure 7-15 graphics omitted, panels (c) and (d); Responses for Flow, p < 0.01, pairwise groups A, B, B, AD, D, AD; Responses for Challenge, p < 0.01, pairwise groups A, B, B, D, E, DE.]

(c) Scores for flow. (d) Scores for challenge.

Figure 7-15. GEQ components with results pooled by game and generation type. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the game and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure 7-16 graphics omitted, panels (a) and (b); Responses for Flow by gender, p < 0.01, pairwise groups A, B; Responses for Tension by gender, p < 0.01, pairwise groups A, B.]

(a) Score for GEQ component ‘‘flow.’’ (b) Score for GEQ component ‘‘tension.’’

[Figure 7-16 graphics omitted, panels (c) and (d); Responses for Challenge by gender, p < 0.01, pairwise groups A, B; Responses for Positive by gender, p < 0.01, pairwise groups A, B.]

(c) Score for GEQ component ‘‘challenge.’’ (d) Score for GEQ component ‘‘positive.’’

Figure 7-16. GEQ components with results pooled by gender. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the identified gender of the participant for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure 7-17 graphic omitted; Responses for Affect02 by gender, p < 0.01, response scale 0 to 4, pairwise groups: A, B.]

Figure 7-17. Scores for the Fang et al. question ‘‘I feel worried when playing this game’’ pooled by gender. Y-axis is the response values ranging from 0 to 4. X-axis on the top labels the identified gender of the participant for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure 7-18 graphics omitted; Responses for Negative During Runner by gamer label (Ca+Ne vs. Ex+Fr), p < 0.01, pairwise groups A, B; Responses for Positive During Runner by gamer label, p < 0.01, pairwise groups A, B.]

(a) Score for GEQ component ‘‘positive affect.’’ (b) Score for GEQ component ‘‘negative affect.’’

Figure 7-18. GEQ components with results pooled by self-ascribed gamer level for the Runner game. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the identified gamer level of the participant for which the response was recorded. Gamer levels are pooled into two categories, more experienced (Ex+Fr, Expert and Frequent) and less experienced (Ca+Ne, Casual and Newbie). Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

CHAPTER 8
SUMMARY AND FUTURE WORK

8.1 Summary and Research Questions Revisited

Video game enjoyment is an important factor in one of the most lucrative entertainment industries [3--5]. Content for video games is typically created manually, but manual creation is neither cost-effective nor feasible for large projects [8, 9]. Some researchers and game developers look to Procedural Content Generation (PCG) as the solution to this problem. While there are recommendations and algorithmic approaches for validating computer-generated content for video games, researchers strongly recommend employing HCI methods as the final step [15]. This research began the work of examining how certain measurable enjoyment factors are affected by procedural generation. The research questions that guided this study were:

• R1: Does procedural generation of infinite runners and RPGs provide enjoyment enhancement upon repeated plays of these games compared to static game environments?

• R2: Does procedural generation of infinite runners and RPGs provide repeated-play enjoyment enhancement equivalent to manual generation?

• R3: Does procedural generation provide the same kind of repeated-play enjoyment enhancement between types of procedural generation for infinite runners and RPGs?

The conclusion to R1 and R2 is that there is no difference between procedurally generated content, manually designed content, and static unchanging content. The conclusion to R3 is that different games have different aspects affected differently by procedural generation. Contrary to what numerous publications claim [16--34, 125, 135], the unique designs in my study did not increase enjoyment as measured by the GEQ or the Fang et al. questionnaire.

8.2 Contributions

The contributions of this work are as follows.

1. Recommendation for procedural generation use. Procedurally generated levels are no different from manually designed levels: there is no statistical difference between procedural and manual generation as measured by the GEQ and the Fang et al. questionnaire. If a game designer wishes to offer multiple levels for a player to experience, then procedural generation is an excellent choice because it is faster and cheaper than manual generation.

2. Identified a problem. Procedural generation research cannot claim that PCG enhances replay value compared to a static level design until that claim is proven. The GEQ, the Fang et al. questionnaire, and physiological data do not support it.

3. Developed guidance for future research. Procedural generation does not affect the same factors across different types of games. If researchers are to design a questionnaire that measures which aspects of enjoyment are affected by procedural generation and replay value, the questionnaire needs to be dynamic enough to cover all kinds of procedural generation, lest a separate questionnaire be required for each individual kind.

8.3 Limitations and Future Work

The results discovered in this research are important for other researchers interested in the application of procedural generation in video games. The first step will be the publication of this research, a natural follow-up to my initial publication covering the challenges of defining enjoyment in video games [166]. This study tested only two kinds of procedural generation games, while a multitude of other PCG categories remain unstudied. Additionally, these games were simplified to reduce confounding factors; a next step would be to perform the test on commercial games with controllable procedural generation. It will also be important to develop a questionnaire specifically designed to capture the ‘‘replay value’’ aspects of enjoyment for use in future PCG-G studies. Finally, the study measured a decline in enjoyment over time during a single-day session. Future studies should look into multiple-day designs, with the game played once per day. Future work could also extend this research by adapting the definition of enjoyment to encompass the desire to replay a game.

8.4 Conclusions

This research revealed statistically supported conclusions about procedural generation and its effect on enjoyment. Most notably, it demonstrated that computer-generated levels were not significantly different from manually designed levels, nor were they different from static levels. Participants experienced the same decline in enjoyment over time regardless of the content generation used. Procedural generation is complicated in its effects on game enjoyment. In order to advance research on the effects of PCG-G on enjoyment, researchers cannot rely strictly on heart rate, skin temperature, skin conductivity, the GEQ, or the Fang et al. Questionnaire. Procedural generation may indeed improve ‘‘replay value’’ over an unchanging game, but further research is needed before this claim can be justified.

APPENDIX A
LEVEL DESIGN MATERIALS

A.1 Infinite Runner Designer

(a) The defaults for the designer.

(b) An example of a designed level using the toolkit, zoomed out with a small draw scale.

Figure A-1. Two screen captures of the designer toolkit provided to create infinite runner levels.

Table A-1. Controls for the Infinite Runner Designer Program.

Key    Control Description
       Make the current platform smaller
e      Make the current platform larger
w      Move the current platform higher
s      Move the current platform lower
a      Move the current platform closer to the previous platform
d      Move the current platform farther from the previous platform
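To illustrate how such a control scheme can be wired into the engine, here is a minimal sketch in Pygame, the library the games were built with; the dictionary and helper below are my own illustration rather than the dissertation's designer code, and the shrink key is omitted because its binding is blank in Table A-1:

    import pygame

    # Key -> (platform attribute, delta); the step size of 5 pixels
    # is an arbitrary illustrative choice.
    ADJUSTMENTS = {
        pygame.K_e: ("width", +5),  # make the current platform larger
        pygame.K_w: ("y", -5),      # move higher (screen y grows downward)
        pygame.K_s: ("y", +5),      # move lower
        pygame.K_a: ("x", -5),      # move closer to the previous platform
        pygame.K_d: ("x", +5),      # move farther from the previous platform
    }

    def handle_key(platform, key):
        """platform: dict with 'x', 'y', and 'width' fields."""
        if key in ADJUSTMENTS:
            attr, delta = ADJUSTMENTS[key]
            platform[attr] += delta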

A.2 RPG Designer

(a) ... (b) ...

(c) ... (d) ...

Figure A-2. Example interactions of the RPG Designer Toolkit.

Table A-2. Table for manual color mapping.

Tile Type       Hex Code       Tile Type       Hex Code
water           #000066        mountain        #663300
grassland       #CCFF99        forest          #006600
desert          #FFFF66        snow            #CCFFFF
StarterTown     #000033        PlainsVillage   #003300
MountainTown    #330000        DesertTown      #993300
DesertPalace    #FF6600        ForestVillage   #66FFCC
MasterSword     #CC66CC        MountainCave    #330066
PortTown        #990066        SnowyVillage    #0066CC
ThiefHideout    #009999        FishingTown     #666666
FinalPalace     #000000
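Presumably each hex code identifies a tile type when a manually designed map is stored as an image. A minimal sketch of decoding the codes and matching pixels back to tiles (my own illustration; pygame.Color can also parse such ‘‘#RRGGBB’’ strings directly):

    # Decode Table A-2 hex codes into (R, G, B) tuples; a few
    # entries shown for brevity.
    TILE_HEX = {"water": "#000066", "grassland": "#CCFF99", "desert": "#FFFF66"}

    def hex_to_rgb(code):
        code = code.lstrip("#")
        return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

    RGB_TO_TILE = {hex_to_rgb(h): t for t, h in TILE_HEX.items()}
    print(RGB_TO_TILE[(0, 0, 102)])  # -> water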

APPENDIX B
USER STUDY MATERIALS

B.1 Recruitment Flyer

Figure B-1. Recruiting poster displayed publicly.

B.2 Screening Form

Please fill out the following information in order to participate in the video game user study. Contact Elizabeth Matthews at lmatthews (at) wlu (dot) edu with any questions.

Email address:
Name:

Which of the following describes your experiences with video games?
◦ Expert Gamer ◦ Frequent Gamer ◦ Casual Gamer ◦ Newbie / Non-Gamer

What kinds of games do you enjoy? (Select as many as needed)

 Action  Adventure  RPG  Simulation  Strategy  Casual  MMOs  Puzzle  Fighting  First Person Shooter  Side-scrolling / Beat ’Em Up  Survival  Rhythm   Platformer  Infinite Runner

 Fantasy RPG  Other:

B.3 Informed Consent Form

Informed Consent
IRB#: IRB201801277 (University of Florida); IRB.201819.007 (Washington and Lee University)

Protocol Title
The Effects of Procedural Generation on Repeated Game Play Enjoyment

Please read this consent document carefully before you decide to participate in this study. You have been asked to participate in a research study reviewed and approved by the Washington and Lee University Institutional Review Board for Research with Human Subjects. The purpose of this study, in terms of your participation, as well as any expected risks and benefits, must be fully explained to you before you sign this form and give your consent to participate.

Purpose of the research study
The purpose of this study is to examine the effects of procedural content generation (PCG) on game enjoyment in repeated game play environments over extended periods of time. PCG is, among other things, a method where the computer designs the levels for you to play in the game.

What you will be asked to do in the study
The study will take place over two days, with each day pertaining to a different video game type. The order in which you are assigned game types and content generation types will be randomized. The two types of games you will be playing are an Infinite Runner style and a Role Playing Game (RPG) style. On each day, following a brief explanation and tutorial of how to play the assigned game type, you will be asked to volunteer to play a game for several hours. You will be equipped

with a physiological sensor (an Empatica E4) on your wrist to gather physical data as you play. The types of measurements that the E4 records are heart rate, EDA (skin conductivity), accelerometer data, and skin temperature. Baseline physiological measurements will be taken and then you will be allowed to play the game. While playing the game, the system will periodically stop to request data from you, but otherwise you are encouraged to relax and play the game as you would normally. You will use a controller to play the game and the mouse to fill out the survey when it appears. The levels you will play will be designed in three different ways, separated into three sections. The computer screen and a webcam capture of your face will be recorded during this session. Audio will be recorded as well. Between each section you will be provided with a break to stand up, move around, use the restroom, etc. before restarting the test. The Empatica will be removed during this break. The system will notify you to contact the observer at these points. Compensation will be provided after both days are completed.

Time required
RPG: 30 minutes * 3 + break time = 2 hours estimated total
Infinite Runner: 10 minutes * 3 + break time = 1 hour estimated total
Total: 2 hours + 1 hour = 3 hours over two days

Risks and Benefits
Expected risks are equivalent to those of extended time spent in front of a computer monitor, with brief breaks provided every 30 minutes for the RPG or 10 minutes for the Infinite Runner. We do not anticipate that you will benefit directly by participating in this experiment.

Compensation
You will be paid $20.00 compensation for participating in this research. The compensation is prorated for the two-day study. Each day you will receive $10.00 in the form of a Visa gift card.

Confidentiality

Any information derived from this research project which personally identifies you will not be voluntarily released or disclosed without your separate consent, except as specifically required by law. Your information will be assigned a code number. The list connecting your name to this number will be kept in a locked file in my faculty supervisor's office. When the study is completed and the data has been analyzed, the list will be destroyed. Your name will not be used in any report.

Voluntary participation
Your participation in this study is completely voluntary. There is no penalty for not participating. You may withdraw your participation at any point during the study by contacting the observer or emailing Elizabeth Matthews (contact information below). Also, you may skip any question you would prefer not to answer. The investigator may withdraw you from participation at his/her professional discretion. You have the right to withdraw from the study at any time without consequence.

Who to contact if you have questions about the study
Elizabeth Matthews, Assistant Professor, Computer Science Department, Parmly Hall 408, 204 W. Washington St., Lexington, VA 24450, email [email protected], phone 352-870-1822. Juan E. Gilbert, PhD, Department of Computer & Information Science & Engineering, 432 Newell Dr, Gainesville, FL 32611, phone 352-562-0784.

Who to contact about your rights as a research participant in the study
If, at any time, you have questions regarding the conduct of this research, or if you wish to discuss your rights as a research participant, you may contact the chair of the Institutional Review Board for Research with Human Subjects, Bryan Price, at [email protected] or 458-8316.

Agreement
I have read the procedure described above. I voluntarily agree to participate in the procedure and I have received a copy of this description. You will be given a copy of this consent form to keep. I confirm that I am 18 years or older and consent to participate in this study.

Participant Name:
Participant Signature:               Date:
Investigator Signature:              Date:

B.4 Instructions

Instructions were provided to participants as shown in Figure B-2.


(a) Control instructions for the Infinite Runner.

(b) Control instructions for the RPG.

Figure B-2. Game instruction pages provided to participants. Figure also shows the control setup for each game.

B.5 Game Experience Questionnaire

Please indicate how you felt while playing the game for each of the items, on the following scale: 0 = not at all, 1 = slightly, 2 = moderately, 3 = fairly, 4 = extremely.

B.5.1 In-Game GEQ

1. I was interested in the game’s story ...... 0 1 2 3 4

2. I felt successful ...... 0 1 2 3 4

3. I felt bored ...... 0 1 2 3 4

4. I found it impressive ...... 0 1 2 3 4

5. I forgot everything around me ...... 0 1 2 3 4

6. I felt frustrated ...... 0 1 2 3 4

7. I found it tiresome ...... 0 1 2 3 4

8. I felt irritable ...... 0 1 2 3 4

9. I felt skillful ...... 0 1 2 3 4

10. I felt completely absorbed ...... 0 1 2 3 4

11. I felt content ...... 0 1 2 3 4

12. I felt challenged ...... 0 1 2 3 4

13. I had to put a lot of effort into it ...... 0 1 2 3 4

14. I felt good ...... 0 1 2 3 4

B.5.2 Scoring Guide

In-Game version. The In-Game Module consists of seven components that are identical to those of the Core Module, except that only two items are used for each component. The items for each are listed below. Component scores are computed as the average value of a component's two items.
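A minimal sketch of that computation (the item mapping follows the list below; function and variable names are my own):

    # In-Game GEQ: each component is the average of its two items.
    GEQ_COMPONENTS = {
        "competence": (2, 9),
        "sensory_immersion": (1, 4),
        "flow": (5, 10),
        "tension": (6, 8),
        "challenge": (12, 13),
        "negative_affect": (3, 7),
        "positive_affect": (11, 14),
    }

    def score_geq(responses):
        """responses: dict mapping item number (1-14) -> value (0-4)."""
        return {name: (responses[a] + responses[b]) / 2
                for name, (a, b) in GEQ_COMPONENTS.items()}

    # Hypothetical participant who felt bored (item 3) and found the
    # game tiresome (item 7): negative affect = (3 + 4) / 2 = 3.5.
    example = {i: 0 for i in range(1, 15)}
    example[3], example[7] = 3, 4
    print(score_geq(example)["negative_affect"])  # -> 3.5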

126 • Competence: Items 2 and 9.

• Sensory and Imaginative Immersion: Items 1 and 4.

• Flow: Items 5 and 10.

• Tension: Items 6 and 8.

• Challenge: Items 12 and 13.

• Negative affect: Items 3 and 7.

• Positive affect: Items 11 and 14.

B.6 Fang et al. Questionnaire

Affect.

1. I feel unhappy when playing this game...... 0 1 2 3 4

2. I feel worried when playing this game...... 0 1 2 3 4

3. I feel happy when playing this game...... 0 1 2 3 4

4. I feel exhausted when playing this game...... 0 1 2 3 4

5. I feel miserable when playing this game...... 0 1 2 3 4

APPENDIX C
ADDITIONAL STUDY DATA

C.1 Subjective Data

This section contains additional subjective data collected from all participants.

C.1.1 Before and After by Generation Type

[Figure C-1 graphic omitted; Responses for GEQ03, p = 0.0842, over segments 1.m, 5.m, 1.p, 5.p, 1.s, 5.s; pairwise groups: A, B, AB, B, AB, B.]

Figure C-1. Responses to ‘‘I felt bored’’ pooled by time segment and generation type, limited to the first and last segments. Y-axis is the response values ranging from 0 to 4. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-2 graphic omitted; Responses for GEQ07, p = 0.1758; pairwise groups: A, AB, AB, AB, AB, B.]

Figure C-2. Responses to ‘‘I found it tiresome’’ pooled by time segment and generation type, limited to the first and last segments. Y-axis is the response values ranging from 0 to 4. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-3 graphics omitted; Responses for Competence During RPG, p = 0.8187; Responses for Competence During Runner, p = 0.733; all pairwise groups A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure C-3. Responses to the GEQ component ‘‘competence’’ pooled by time segment and generation type, limited to the first and last segments. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-4 graphics omitted; Responses for Sensory During RPG, p = 0.3081; Responses for Sensory During Runner, p = 0.5377; all pairwise groups A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure C-4. Responses to the GEQ component ‘‘sensory’’ pooled by time segment and generation type, limited to the first and last segments. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-5 graphics omitted; Responses for Flow During RPG, p = 0.7664; Responses for Flow During Runner, p = 0.352; all pairwise groups A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure C-5. Responses to the GEQ component ‘‘flow’’ pooled by time segment and generation type, limited to the first and last segments. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-6 graphics omitted; Responses for Tension During RPG, p = 0.7036; Responses for Tension During Runner, p = 0.2857; all pairwise groups A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure C-6. Responses to the GEQ component ‘‘tension’’ pooled by time segment and generation type, limited to the first and last segments. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-7 graphics omitted; Responses for Challenge During RPG, p = 0.7036; Responses for Challenge During Runner, p = 0.2857; all pairwise groups A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure C-7. Responses to the GEQ component ‘‘challenge’’ pooled by time segment and generation type, limited to the first and last segments. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-8 graphics omitted; Responses for Positive During RPG, p = 0.0376; Responses for Positive During Runner, p = 0.9845; all pairwise groups A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure C-8. Responses to the GEQ component ‘‘positive affect’’ pooled by time segment and generation type, limited to the first and last segments. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-9 graphics omitted; Responses for Affect01 During RPG, p = 0.4691; Responses for Affect01 During Runner, p = 0.5711; all pairwise groups A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure C-9. Results for the Fang et al. question ‘‘I feel unhappy when playing this game.’’ Y-axis is the response values ranging from 0 to 4. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-10 graphics omitted; Responses for Affect02 During RPG, p = 0.4214; Responses for Affect02 During Runner, p = 0.9245; all pairwise groups A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure C-10. Results for the Fang et al. question ‘‘I feel worried when playing this game.’’ Y-axis is the response values ranging from 0 to 4. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-11 graphics omitted; Responses for Affect03 During RPG, p = 0.1815; Responses for Affect03 During Runner, p = 0.984; all pairwise groups A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure C-11. Results for the Fang et al. question ‘‘I feel happy when playing this game.’’ Y-axis is the response values ranging from 0 to 4. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-12 graphics omitted; Responses for Affect04 During RPG, p = 0.0243, pairwise groups A, AB, AB, B, AB, B; Responses for Affect04 During Runner, p = 0.3064, all pairwise groups A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure C-12. Results for the Fang et al. question ‘‘I feel exhausted when playing this game.’’ Y-axis is the response values ranging from 0 to 4. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-13 graphics omitted; Responses for Affect05 During RPG, p = 0.7966; Responses for Affect05 During Runner, p = 0.9406; all pairwise groups A.]

(a) Responses for the RPG game type. (b) Responses for the Runner game type.

Figure C-13. Results for the Fang et al. question ‘‘I feel miserable when playing this game.’’ Y-axis is the response values ranging from 0 to 4. X-axis on the top labels the segment and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

C.1.2 Game and Generation

[Figure C-14 graphics omitted, panels (a) and (b); Responses for Competence, p < 0.01, pairwise groups A, A, A, D, D, D; Responses for Sensory, p < 0.01, pairwise groups A, A, A, A, A, A; both over game/generation m.RPG, p.RPG, s.RPG, m.Run, p.Run, s.Run.]

(a) Results for the GEQ component ‘‘competence.’’ (b) Results for the GEQ component ‘‘sensory.’’

[Figure C-14 graphics omitted, panels (c) and (d); Responses for Positive, p < 0.01, pairwise groups A, A, A, D, D, D; Responses for Affect01, p < 0.01, pairwise groups A, A, A, D, D, AD.]

(c) Results for the GEQ component ‘‘positive affect.’’ (d) Results for the Fang et al. question ‘‘I feel unhappy when playing this game.’’

Figure C-14. Results pooled by game and generation type that did not have significant differences between the generation types. Y-axis is the response values. X-axis on the top labels the game and generation type for which the response was recorded (m=manual, p=procedural, and s=static). Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-15 graphics omitted, panels (a) and (b); Responses for Affect02, p < 0.01, pairwise groups A, A, A, A, A, A; Responses for Affect03, p < 0.01, pairwise groups A, A, A, D, D, D.]

(a) Results for the Fang et al. question ‘‘I feel worried when playing this game.’’ (b) Results for the Fang et al. question ‘‘I feel happy when playing this game.’’

[Figure C-15 graphics omitted, panels (c) and (d); Responses for Affect04, p < 0.01, pairwise groups A, A, A, D, D, D; Responses for Affect05, p < 0.01, pairwise groups A, A, A, D, D, D.]

(c) Results for the Fang et al. question ‘‘I feel exhausted when playing this game.’’ (d) Results for the Fang et al. question ‘‘I feel miserable when playing this game.’’

Figure C-15. Results pooled by game and generation type that did not have significant differences between the generation types. Y-axis is the response values, ranging from 0 to 4. X-axis on the top labels the game and generation type for which the response was recorded. The key for generation types is m=manual, p=procedural, and s=static. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

C.1.3 Gender

[Figure C-16 graphics omitted; Responses for Competence by gender, p < 0.01; Responses for Negative by gender, p = 0.1301; all pairwise groups A.]

(a) Score for GEQ component ‘‘competence’’ by gender. (b) Score for GEQ component ‘‘negative affect’’ by gender.

Figure C-16. Components of the GEQ that did not have statistically different grouping when pooled by gender. Y-axis is the summation of response values, ranging from 0 to 8. X-axis on the top labels the identified gender of the participant for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

[Figure C-17 graphics omitted, panels (a) and (b); Responses for Affect01 by gender, p = 0.0112, pairwise groups A, B; Responses for Affect03 by gender, p = 0.1004, pairwise groups A, B.]

(a) Scores for ‘‘I feel unhappy when playing this game.’’ (b) Scores for ‘‘I feel happy when playing this game.’’

[Figure C-17 graphics omitted, panels (c) and (d); Responses for Affect04 by gender, p < 0.01, pairwise groups A, B; Responses for Affect05 by gender, p < 0.01, pairwise groups A, A.]

(c) Scores for ‘‘I feel exhausted when playing this game.’’ (d) Scores for ‘‘I feel miserable when playing this game.’’

Figure C-17. Fang et al. questions with results pooled by gender. Y-axis is the response values, ranging from 0 to 4. X-axis on the top labels the identified gender of the participant for which the response was recorded. Bottom X-axis labels the pairwise analysis: columns which do not share a letter are significantly different.

REFERENCES

[1] , ‘‘Rogue,’’ [PC], 1980.
[2] M. Mountains. (2017) 2d+3d infinite runner engine. [Online]. Available: https://www.assetstore.unity3d.com/en/#!/content/51328
[3] E. A. Boyle, T. M. Connolly, T. Hainey, and J. M. Boyle, ‘‘Engagement in digital entertainment games: A systematic review,’’ Computers in Human Behavior, vol. 28, no. 3, pp. 771--780, 2012.
[4] Entertainment Software Association. Essential facts about the computer and video game industry. [Online]. Available: http://essentialfacts.theesa.com/
[5] ------. Computer and video game sales in the united states from 2000 to 2015 (in billion u.s. dollars). [Online]. Available: https://www.statista.com/statistics/273258/us-computer-and-video-game-sales/
[6] P. Vorderer, C. Klimmt, and U. Ritterfeld, ‘‘Enjoyment: At the heart of media entertainment,’’ Communication Theory, vol. 14, no. 4, pp. 388--408, 2004.
[7] P. Wyeth, D. M. Johnson, and P. Sweetser, ‘‘Conceptualising, operationalising and measuring the player experience in videogames,’’ in Extended Proceedings of the Fun and Games Conference 2012. IRIT Press, 2012, pp. 90--93.
[8] Y. Takatsuki. (2007) Cost headache for game developers. [Online]. Available: news.bbc.co.uk/2/hi/business/7151961.stm
[9] A. Iosup, ‘‘Poggi: generating puzzle instances for online games on grid infrastructures,’’ Concurrency and Computation: Practice and Experience, vol. 23, no. 2, pp. 158--171, 2011.
[10] M. Hendrikx, S. Meijer, J. Van Der Velden, and A. Iosup, ‘‘Procedural content generation for games: A survey,’’ ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 9, no. 1, p. 1, 2013.
[11] G. Smith, M. Treanor, J. Whitehead, and M. Mateas, ‘‘Rhythm-based level generation for 2d platformers,’’ in Proceedings of the 4th International Conference on Foundations of Digital Games. ACM, 2009, pp. 175--182.
[12] M. Booth. (2009) The ai systems of . [Online]. Available: www.valvesoftware.com/publications/2009/ai systems of l4d mike booth.pdf
[13] Wikipedia. (2017) Roguelike wikipedia entry. [Online]. Available: https://en.wikipedia.org/wiki/Roguelike
[14] Blizzard North, ‘‘Diablo (),’’ 1996.
[15] N. Shaker, G. Smith, and G. N. Yannakakis, Evaluating content generators. Cham: Springer International Publishing, 2016, pp. 215--224.

[16] C. Ashmore and M. Nitsche, ‘‘The in a generated world,’’ in DiGRA Conference, 2007.
[17] N. Barreto, A. Cardoso, and L. Roque, ‘‘Computational creativity in procedural content generation: A state of the art survey,’’ in Proceedings of the 2014 Conference of Science and Art of Video Games, 2014.
[18] G. A. Barros, A. Liapis, and J. Togelius, ‘‘Playing with data: Procedural generation of adventures from open data,’’ in DiGRA/FDG, 2016.
[19] C. Beckham and C. Pal, ‘‘A step towards procedural terrain generation with gans,’’ arXiv preprint arXiv:1707.03383, 2017.
[20] S. Dahlskog and J. Togelius, ‘‘Patterns and procedural content generation: revisiting mario in world 1 level 1,’’ in Proceedings of the First Workshop on Design Patterns in Games. ACM, 2012, p. 1.
[21] M. Ihsan, ‘‘Application of knapsack problem in procedurally generated game levels,’’ 2013.
[22] R. Giacinto, ‘‘Procedural generation of road systems in computer games.’’
[23] E. J. Hastings, R. K. Guha, and K. O. Stanley, ‘‘Automatic content generation in the galactic arms race video game,’’ IEEE Transactions on Computational Intelligence and AI in Games, vol. 1, no. 4, pp. 245--263, 2009.
[24] M. Henschke, D. Hobbs, and B. Wilkinson, ‘‘Developing serious games for children with cerebral palsy: case study and pilot trial,’’ in Proceedings of the 24th Australian Computer-Human Interaction Conference. ACM, 2012, pp. 212--221.
[25] P. Mawhorter and M. Mateas, ‘‘Procedural level generation using occupancy-regulated extension,’’ in Computational Intelligence and Games (CIG), 2010 IEEE Symposium on. IEEE, 2010, pp. 351--358.
[26] C. Patrascu and S. Risi, ‘‘Artefacts: Minecraft meets collaborative interactive evolution,’’ in Computational Intelligence and Games (CIG), 2016 IEEE Conference on. IEEE, 2016, pp. 1--8.
[27] L. T. Pereira, C. Toledo, L. N. Ferreira, and L. H. Lelis, ‘‘Learning to speed up evolutionary content generation in physics-based puzzle games,’’ in Tools with Artificial Intelligence (ICTAI), 2016 IEEE 28th International Conference on. IEEE, 2016, pp. 901--907.
[28] D. Sampath, ‘‘Abrcon, adaptive object re-configuration: an approach to enhance, repeat playability of games and repeat watchability of movies,’’ in Proceedings of the 2004 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology. ACM, 2004, pp. 313--316.
[29] G. Scott and D. Toy, ‘‘Constrained procedural map generation for two-dimensional games.’’

[30] M. Sewell et al., ‘‘Developing an interactive space simulation game featuring procedural content generation,’’ 2007.
[31] N. Shaker, G. N. Yannakakis, and J. Togelius, ‘‘Towards automatic personalized content generation for platform games,’’ in AIIDE, 2010.
[32] D. Williams-King, J. Denzinger, J. Aycock, and B. Stephenson, ‘‘The gold standard: Automatically generating puzzle game levels,’’ in AIIDE, 2012, pp. 191--196.
[33] A. Zafar and H. Mujtaba, ‘‘Identifying catastrophic failures in offline level generation for mario,’’ in Frontiers of Information Technology (FIT), 2012 10th International Conference on. IEEE, 2012, pp. 62--67.
[34] G. M. Smith, ‘‘Expressive design tools: Procedural content generation for game designers,’’ Ph.D. dissertation, University of California, Santa Cruz, 2012.
[35] M. Csikszentmihalyi and I. S. Csikszentmihalyi, Optimal experience: Psychological studies of flow in consciousness. Cambridge University Press, 1992.
[36] P. Sweetser and P. Wyeth, ‘‘Gameflow: a model for evaluating player enjoyment in games,’’ Computers in Entertainment (CIE), vol. 3, no. 3, pp. 3--3, 2005.
[37] J. Chen, ‘‘Flow in games (and everything else),’’ Communications of the ACM, vol. 50, no. 4, pp. 31--34, 2007.
[38] K. Jegers, ‘‘Pervasive game flow: understanding player enjoyment in pervasive gaming,’’ Computers in Entertainment (CIE), vol. 5, no. 1, p. 9, 2007.
[39] F.-L. Fu, R.-C. Su, and S.-C. Yu, ‘‘Egameflow: A scale to measure learners’ enjoyment of e-learning games,’’ Computers & Education, vol. 52, no. 1, pp. 101--112, 2009.
[40] N. H. M. Zain, A. Jaafar, and F. H. A. Razak, ‘‘Enjoyable game design: Validation of motor-impaired user gameflow model,’’ International Journal of Computer Theory and Engineering, vol. 8, no. 2, p. 116, 2016.
[41] K. Jegers, ‘‘Pervasive gameflow: A validated model of player enjoyment in pervasive gaming,’’ 2007.
[42] L. Nacke and C. A. Lindley, ‘‘Flow and immersion in first-person shooters: measuring the player’s gameplay experience,’’ in Proceedings of the 2008 Conference on Future Play: Research, Play, Share. ACM, 2008, pp. 81--88.
[43] L. E. Nacke and C. A. Lindley, ‘‘Affective ludology, flow and immersion in a first-person shooter: Measurement of player experience,’’ arXiv preprint arXiv:1004.0248, 2010.
[44] W. IJsselsteijn, Y. de Kort, and K. Poels, ‘‘The game experience questionnaire,’’ Manuscript in preparation, 2008.

[45] P. Vorderer, W. Wirth, F. R. Gouveia, F. Biocca, T. Saari, L. Jäncke, S. Böcking, H. Schramm, A. Gysbers, T. Hartmann et al., ‘‘Mec spatial presence questionnaire,’’ 2004, retrieved Sept. 18, 2015.
[46] K. Procci and C. Bowers, ‘‘An examination of flow and immersion in games,’’ in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 55, no. 1. SAGE Publications, 2011, pp. 2183--2187.
[47] R. Weber, R. Tamborini, A. Westcott-Baker, and B. Kantor, ‘‘Theorizing flow and media enjoyment as cognitive synchronization of attentional and reward networks,’’ Communication Theory, vol. 19, no. 4, pp. 397--422, 2009.
[48] L. K. Kaye, ‘‘Motivations, experiences and outcomes of playing videogames,’’ Ph.D. dissertation, University of Central Lancashire, 2012.
[49] M. C. Angelides and H. Agius, ‘‘Know thy player: An integrated model of player experience for digital games research,’’ 2014.
[50] B. S. Frey and M. Osterloh, Successful management by motivation: Balancing intrinsic and extrinsic incentives. Springer Science & Business Media, 2001.
[51] J. M. Keller, Motivational design for learning and performance: The ARCS model approach. Springer Science & Business Media, 2009.
[52] ------, ‘‘Imms: Instructional materials motivation survey,’’ Florida State University, 1987.
[53] L. Derbali and C. Frasson, ‘‘Prediction of players motivational states using electrophysiological measures during play,’’ in 2010 10th IEEE International Conference on Advanced Learning Technologies. IEEE, 2010, pp. 498--502.
[54] T. T. Cota, L. Ishitani, and N. Vieira, ‘‘Mobile game design for the elderly: A study with focus on the motivation to play,’’ Computers in Human Behavior, vol. 51, pp. 96--105, 2015.
[55] I. Ghergulescu and C. H. Muntean, ‘‘A novel sensor-based methodology for learner’s motivation analysis in game-based learning,’’ Interacting with Computers, vol. 26, no. 4, pp. 305--320, 2014.
[56] F. Madeira, P. Arriaga, J. Adrião, R. Lopes, F. Esteves et al., ‘‘Emotional gaming,’’ Psychology of Gaming, vol. 1, pp. 12--29, 2013.
[57] J. Cho, M. P. Boyle, H. Keum, M. D. Shevy, D. M. McLeod, D. V. Shah, and Z. Pan, ‘‘Media, terrorism, and emotionality: Emotional differences in media content and public reactions to the september 11th terrorist attacks,’’ Journal of Broadcasting & Electronic Media, vol. 47, no. 3, pp. 309--327, 2003.
[58] A. Bartsch, ‘‘Emotional gratification in entertainment experience. why viewers of movies and television series find it rewarding to experience emotions,’’ Media Psychology, vol. 15, no. 3, pp. 267--302, 2012.

[59] R. Horlings, D. Datcu, and L. J. Rothkrantz, ‘‘Emotion recognition using brain activity,’’ in Proceedings of the 9th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing. ACM, 2008, p. 6.
[60] A. Haag, S. Goronzy, P. Schaich, and J. Williams, ‘‘Emotion recognition using bio-sensors: First steps towards an automatic system,’’ in Tutorial and Research Workshop on Affective Dialogue Systems. Springer, 2004, pp. 36--48.
[61] J. M. Kivikangas, L. Nacke, and N. Ravaja, ‘‘Developing a triangulation system for digital game events, observational video, and psychophysiological data to study emotional responses to a virtual character,’’ Entertainment Computing, vol. 2, no. 1, pp. 11--16, 2011.
[62] P. Mirza-Babaei, L. E. Nacke, J. Gregory, N. Collins, and G. Fitzpatrick, ‘‘How does it play better?: exploring user testing and biometric storyboards in games user research,’’ in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2013, pp. 1499--1508.
[63] S. Deterding, ‘‘The joys of absence: Emotion, emotion display, and interaction tension in video game play,’’ Proc. FDG, vol. 15, 2015.
[64] Y.-M. Fang, K.-M. Chen, and Y.-J. Huang, ‘‘Emotional reactions of different interface formats: Comparing digital and traditional board games,’’ Advances in Mechanical Engineering, vol. 8, no. 3, p. 1687814016641902, 2016.
[65] A. Landowska and M. R. Wróbel, ‘‘Affective reactions to playing digital games,’’ in 2015 8th International Conference on Human System Interaction (HSI). IEEE, 2015, pp. 264--270.
[66] D. Watson, L. A. Clark, and A. Tellegen, ‘‘Development and validation of brief measures of positive and negative affect: the panas scales,’’ Journal of Personality and Social Psychology, vol. 54, no. 6, p. 1063, 1988.
[67] L. Claes, S. Jiménez-Murcia, J. J. Santamaría, M. B. Moussa, I. Sánchez, L. Forcano, N. Magnenat-Thalmann, D. Konstantas, M. L. Overby, J. Nielsen et al., ‘‘The facial and subjective emotional reaction in response to a video game designed to train emotional regulation (playmancer),’’ European Eating Disorders Review, vol. 20, no. 6, pp. 484--489, 2012.
[68] S. Rigby and R. Ryan, ‘‘White paper: The player experience of need satisfaction (pens),’’ Immersyve, Inc, Tech. Rep., September 2007.
[69] R. Tamborini, N. D. Bowman, A. Eden, M. Grizzard, and A. Organ, ‘‘Defining media enjoyment as the satisfaction of intrinsic needs,’’ Journal of Communication, vol. 60, no. 4, pp. 758--777, 2010.
[70] R. Tamborini, M. Grizzard, N. David Bowman, L. Reinecke, R. J. Lewis, and A. Eden, ‘‘Media enjoyment as need satisfaction: The contribution of hedonic and nonhedonic needs,’’ Journal of Communication, vol. 61, no. 6, pp. 1025--1042, 2011.

[71] F. Guay, R. J. Vallerand, and C. Blanchard, ‘‘On the assessment of situational intrinsic and extrinsic motivation: The situational motivation scale (sims),’’ Motivation and Emotion, vol. 24, no. 3, pp. 175--213, 2000.
[72] J. L. Neys, J. Jansz, and E. S. Tan, ‘‘Exploring persistence in gaming: The role of self-determination and social identity,’’ Computers in Human Behavior, vol. 37, pp. 196--209, 2014.
[73] D. Rieger, T. Wulf, J. Kneer, L. Frischlich, and G. Bente, ‘‘The winner takes it all: The effect of in-game success and need satisfaction on mood repair and enjoyment,’’ Computers in Human Behavior, vol. 39, pp. 281--286, 2014.
[74] R. M. Ryan and E. L. Deci, ‘‘Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being,’’ American Psychologist, vol. 55, no. 1, p. 68, 2000.
[75] L. Reinecke, R. Tamborini, M. Grizzard, R. Lewis, A. Eden, and N. David Bowman, ‘‘Characterizing mood management as need satisfaction: The effects of intrinsic needs on selective exposure and mood repair,’’ Journal of Communication, vol. 62, no. 3, pp. 437--453, 2012.
[76] R. Hampel, Adjektiv-Skalen zur Einschätzung der Stimmung (SES), 1977.
[77] D. Gronwall, ‘‘Paced auditory serial-addition task: a measure of recovery from concussion,’’ Perceptual and Motor Skills, vol. 44, no. 2, pp. 367--373, 1977.
[78] L. Wittgenstein, Philosophical Investigations. John Wiley & Sons, 2010.
[79] A. M. Leiker, A. T. Bruzi, M. W. Miller, M. Nelson, R. Wegman, and K. R. Lohse, ‘‘The effects of autonomous difficulty selection on engagement, motivation, and learning in a motion-controlled video game task,’’ Human Movement Science, vol. 49, pp. 326--335, 2016.
[80] H. Schoenau-Fog, S. Louchart, T. Lim, and M. T. Soto-Sanfiel, ‘‘Narrative engagement in games: a continuation desire perspective,’’ in FDG, 2013, pp. 384--387.
[81] P. Bouvier, K. Sehaba, E. Lavoué, and S. George, ‘‘Using traces to qualify learner’s engagement in game-based learning,’’ in 2013 IEEE 13th International Conference on Advanced Learning Technologies. IEEE, 2013, pp. 432--436.
[82] P. Bouvier, K. Sehaba, and É. Lavoué, ‘‘A trace-based approach to identifying users’ engagement and qualifying their engaged-behaviours in interactive systems: application to a social game,’’ User Modeling and User-Adapted Interaction, vol. 24, no. 5, pp. 413--451, 2014.
[83] P. Bouvier, E. Lavoué, and K. Sehaba, ‘‘Defining engagement and characterizing engaged-behaviors in digital gaming,’’ Simulation & Gaming, p. 1046878114553571, 2014.

[84] M. Li, Q. Jiang, C.-H. Tan, and K.-K. Wei, ‘‘Enhancing user-game engagement through software gaming elements,’’ Journal of Management Information Systems, vol. 30, no. 4, pp. 115--150, 2014.
[85] T. Marsh and B. Nardi, ‘‘Spheres and lenses: Activity-based scenario/narrative approach for design and evaluation of entertainment through engagement,’’ in International Conference on Entertainment Computing. Springer, 2014, pp. 42--51.
[86] G. Schiavo, A. Cappelletti, and M. Zancanaro, ‘‘Engagement recognition using easily detectable behavioral cues,’’ Intelligenza Artificiale, vol. 8, no. 2, pp. 197--210, 2014.
[87] K. C. Procci, ‘‘The subjective gameplay experience: An examination of the revised game engagement model,’’ Ph.D. dissertation, University of Central Florida, Orlando, Florida, 2015.
[88] C. Silpasuwanchai, X. Ma, H. Shigemasu, and X. Ren, ‘‘Developing a comprehensive engagement framework of gamification for reflective learning,’’ in Proceedings of the 2016 ACM Conference on Designing Interactive Systems. ACM, 2016, pp. 459--472.
[89] V. Riemer and C. Schrader, ‘‘Impacts of behavioral engagement and self-monitoring on the development of mental models through serious games: Inferences from in-game measures,’’ Computers in Human Behavior, vol. 64, pp. 264--273, 2016.
[90] J. Bardzell, S. Bardzell, T. Pace, and J. Karnell, ‘‘Making user engagement visible: a multimodal strategy for interactive media experience research,’’ in CHI’08 Extended Abstracts on Human Factors in Computing Systems. ACM, 2008, pp. 3663--3668.
[91] T. McMahan, I. Parberry, and T. D. Parsons, ‘‘Evaluating electroencephalography engagement indices during video game play,’’ in Proceedings of the Foundations of Digital Games Conference, 2015.
[92] ------, ‘‘Evaluating player task engagement and arousal using electroencephalography,’’ Procedia Manufacturing, vol. 3, pp. 2303--2310, 2015.
[93] K. Procci, N. James, and C. Bowers, ‘‘The effects of gender, age, and experience on game engagement,’’ in Proceedings of the Human Factors and Ergonomics Society Annual Meeting, vol. 57, no. 1. SAGE Publications, 2013, pp. 2132--2136.
[94] A. I. A. Jabbar and P. Felicia, ‘‘Gameplay engagement and learning in game-based learning: a systematic review,’’ Review of Educational Research, vol. 85, no. 4, pp. 740--779, 2015.
[95] J. Carifio and R. J. Perla, ‘‘Ten common misunderstandings, misconceptions, persistent myths and urban legends about likert scales and likert response formats and their antidotes,’’ Journal of Social Sciences, vol. 3, no. 3, pp. 106--116, 2007.
[96] X. Fang, S. Chan, J. Brzezinski, and C. Nair, ‘‘Development of an instrument to measure enjoyment of computer game play,’’ Intl. Journal of Human--Computer Interaction, vol. 26, no. 9, pp. 868--886, 2010.

[97] X. Fang, S. Chan, and C. Nair, ‘‘An online survey system on computer game enjoyment and personality,’’ in International Conference on Human-Computer Interaction. Springer, 2009, pp. 304--314.
[98] X. Feng, S. Chan, J. Brzezinski, and C. Nair, ‘‘Measuring enjoyment of computer game play,’’ AMCIS 2008 Proceedings, p. 306, 2008.
[99] J. D. Ivory and S. Kalyanaraman, ‘‘The effects of technological advancement and violent content in video games on players’ feelings of presence, involvement, physiological arousal, and aggression,’’ Journal of Communication, vol. 57, no. 3, pp. 532--555, 2007.
[100] A. M. Limperos, M. G. Schmierbach, A. D. Kegerise, and F. E. Dardis, ‘‘Gaming across different consoles: exploring the influence of control scheme on game-player enjoyment,’’ Cyberpsychology, Behavior, and Social Networking, vol. 14, no. 6, pp. 345--350, 2011.
[101] M. Klarkowski, D. Johnson, P. Wyeth, M. McEwan, C. Phillips, and S. Smith, ‘‘Operationalising and evaluating sub-optimal and optimal play experiences through challenge-skill manipulation,’’ in Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 2016, pp. 5583--5594.
[102] E. D. Mekler, J. A. Bopp, A. N. Tuch, and K. Opwis, ‘‘A systematic review of quantitative studies on the enjoyment of digital entertainment games,’’ in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2014, pp. 927--936.
[103] D. Johnson, P. Wyeth, M. Clark, and C. Watling, ‘‘Cooperative game play with avatars and agents: Differences in brain activity and the experience of play,’’ in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems. ACM, 2015, pp. 3721--3730.
[104] W. Wirth, F. Ryffel, T. Von Pape, and V. Karnowski, ‘‘The development of video game enjoyment in a role playing game,’’ Cyberpsychology, Behavior, and Social Networking, vol. 16, no. 4, pp. 260--264, 2013.
[105] J. R. A. Santos, ‘‘Cronbach’s alpha: A tool for assessing the reliability of scales,’’ Journal of Extension, vol. 37, no. 2, pp. 1--5, 1999.
[106] A. C. Dirican and M. Göktürk, ‘‘Psychophysiological measures of human cognitive states applied in human computer interaction,’’ Procedia Computer Science, vol. 3, pp. 1361--1367, 2011.
[107] L. E. Nacke, S. Stellmach, and C. A. Lindley, ‘‘Electroencephalographic assessment of player experience: A pilot study in affective ludology,’’ Simulation & Gaming, 2010.
[108] J. Frey, C. Mühl, F. Lotte, and M. Hachet, ‘‘Review of the use of electroencephalography as an evaluation method for human-computer interaction,’’ arXiv preprint arXiv:1311.2222, 2013.

[109] J. M. Kivikangas, G. Chanel, B. Cowley, I. Ekman, M. Salminen, S. Järvelä, and N. Ravaja, ‘‘A review of the use of psychophysiological methods in game research,’’ Journal of Gaming & Virtual Worlds, vol. 3, no. 3, pp. 181--199, 2011.
[110] S. Rank and C. Lu, ‘‘Physsigtk: Enabling engagement experiments with physiological signals for game design,’’ in Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on. IEEE, 2015, pp. 968--969.
[111] L. E. Nacke, ‘‘An introduction to physiological player metrics for evaluating games,’’ in Game Analytics. Springer, 2013, pp. 585--619.
[112] Y. Lee and W.-H. Yeo, ‘‘Skin-like electronics for a persistent brain-computer interface,’’ Journal of Nature and Science, vol. 1, no. 7, p. 132, 2015.
[113] R. L. Nabi and M. Krcmar, ‘‘Conceptualizing media enjoyment as attitude: Implications for mass media effects research,’’ Communication Theory, vol. 14, no. 4, pp. 288--310, 2004.
[114] J. H. Brockmyer, C. M. Fox, K. A. Curtiss, E. McBroom, K. M. Burkhart, and J. N. Pidruzny, ‘‘The development of the game engagement questionnaire: A measure of engagement in video game-playing,’’ Journal of Experimental Social Psychology, vol. 45, no. 4, pp. 624--634, 2009.
[115] M. C. Green and T. C. Brock, ‘‘The role of transportation in the persuasiveness of public narratives,’’ Journal of Personality and Social Psychology, vol. 79, no. 5, p. 701, 2000.
[116] N. Vos, H. Van Der Meijden, and E. Denessen, ‘‘Effects of constructing versus playing an educational game on student motivation and strategy use,’’ Computers & Education, vol. 56, no. 1, pp. 127--137, 2011.
[117] A. Drachen, L. E. Nacke, G. Yannakakis, and A. L. Pedersen, ‘‘Correlation between heart rate, electrodermal activity and player experience in first-person shooter games,’’ in Proceedings of the 5th ACM SIGGRAPH Symposium on Video Games. ACM, 2010, pp. 49--54.
[118] L. J. Cronbach, ‘‘Coefficient alpha and the internal structure of tests,’’ Psychometrika, vol. 16, no. 3, pp. 297--334, 1951.
[119] M. Tavakol and R. Dennick, ‘‘Making sense of cronbach’s alpha,’’ International Journal of Medical Education, vol. 2, p. 53, 2011.
[120] J. Nunnally, ‘‘Psychometric theory, ch. 6,’’ 1978.
[121] R. Khaled, M. J. Nelson, and P. Barr, ‘‘Design metaphors for procedural content generation in games,’’ in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2013, pp. 1509--1518.
[122] N. Shaker, J. Togelius, G. N. Yannakakis, B. Weber, T. Shimizu, T. Hashiyama, N. Sorenson, P. Pasquier, P. Mawhorter, G. Takahashi et al., ‘‘The 2010 mario ai championship: Level generation track,’’ IEEE Transactions on Computational Intelligence and AI in Games, vol. 3, no. 4, pp. 332--347, 2011.
[123] I. Dart and M. J. Nelson, ‘‘Smart terrain causality chains for adventure-game puzzle generation,’’ in Computational Intelligence and Games (CIG), 2012 IEEE Conference on. IEEE, 2012, pp. 328--334.
[124] W. Baghdadi, F. S. Eddin, R. Al-Omari, Z. Alhalawani, M. Shaker, and N. Shaker, ‘‘A procedural method for automatic generation of spelunky levels,’’ in European Conference on the Applications of Evolutionary Computation. Springer, 2015, pp. 305--317.
[125] K. Compton and M. Mateas, ‘‘Procedural level design for platform games,’’ in AIIDE, 2006, pp. 109--111.
[126] N. Sorenson, P. Pasquier, and S. DiPaola, ‘‘A generic approach to challenge modeling for the procedural creation of video game levels,’’ IEEE Transactions on Computational Intelligence and AI in Games, vol. 3, no. 3, pp. 229--244, 2011.
[127] N. Shaker, M. Nicolau, G. N. Yannakakis, J. Togelius, and M. O’neill, ‘‘Evolving levels for super mario bros using grammatical evolution,’’ in Computational Intelligence and Games (CIG), 2012 IEEE Conference on. IEEE, 2012, pp. 304--311.
[128] D. Yu and A. Hull, ‘‘Spelunky (pc game),’’ 2009.
[129] M. Frade, F. F. de Vega, and C. Cotta, ‘‘Automatic evolution of programs for procedural generation of terrains for video games,’’ Soft Computing, vol. 16, no. 11, pp. 1893--1914, 2012.
[130] J. Doran and I. Parberry, ‘‘Controlled procedural terrain generation using software agents,’’ IEEE Transactions on Computational Intelligence and AI in Games, vol. 2, no. 2, pp. 111--119, 2010.
[131] M. Nitsche, C. Ashmore, W. Hankinson, R. Fitzpatrick, J. Kelly, and K. Margenau, ‘‘Designing procedural game spaces: A case study,’’ Proceedings of FuturePlay, vol. 2006, 2006.
[132] L. Cardamone, G. N. Yannakakis, J. Togelius, and P. L. Lanzi, ‘‘Evolving interesting maps for a first person shooter,’’ in European Conference on the Applications of Evolutionary Computation. Springer, 2011, pp. 63--72.
[133] L. V. Carvalho, Á. V. Moreira, V. Vicente Filho, M. Albuquerque, and G. L. Ramalho, ‘‘A generic framework for procedural generation of gameplay sessions,’’ 2013.
[134] A. Saltsman, ‘‘Canabalt (pc game),’’ Adam Atomic, 2009.
[135] H. M. Decker-Davis, ‘‘Generating challenge: Game design for procedural spaces,’’ Ph.D. dissertation, The Savannah College of Art and Design, 2012.

[136] O. K. Moore Jr., "Procedural content generation: Using AI to generate playable content," 2015.
[137] O. Korn, M. Blatz, A. Rees, J. Schaal, V. Schwind, and D. Görlich, "Procedural content generation for game props? A study on the effects on user experience," Computers in Entertainment (CIE), vol. 15, no. 2, p. 1, 2017.
[138] D. Loiacono, "Learning, evolution and adaptation in racing games," in Proceedings of the 9th Conference on Computing Frontiers. ACM, 2012, pp. 277--284.
[139] Mojang, "Minecraft (PC game)," 2011.
[140] E. A. Matthews and B. A. Malloy, "Procedural generation of story-driven maps," in Computer Games (CGAMES), 2011 16th International Conference on. IEEE, 2011, pp. 107--112.
[141] M. Scirea, Y.-G. Cheong, M. J. Nelson, and B.-C. Bae, "Evaluating musical foreshadowing of videogame narrative experiences," in Proceedings of the 9th Audio Mostly: A Conference on Interaction With Sound. ACM, 2014, p. 8.
[142] J. Togelius, M. Preuss, N. Beume, S. Wessing, J. Hagelbäck, G. N. Yannakakis, and C. Grappiolo, "Controllable procedural map generation via multiobjective evolution," Genetic Programming and Evolvable Machines, vol. 14, no. 2, pp. 245--277, 2013.
[143] J. Whitehead, "Toward procedural decorative ornamentation in games," in Proceedings of the 2010 Workshop on Procedural Content Generation in Games. ACM, 2010, p. 9.
[144] J. T. Alexander, J. Sear, and A. Oikonomou, "An investigation of the effects of game difficulty on player enjoyment," Entertainment Computing, vol. 4, no. 1, pp. 53--62, 2013.
[145] D. Wheat, M. Masek, C. P. Lam, and P. Hingston, "Dynamic difficulty adjustment in 2D platformers through agent-based procedural level generation," in Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on. IEEE, 2015, pp. 2778--2785.
[146] S. Bakkes, S. Whiteson, G. Li, G. V. Vişniuc, E. Charitos, N. Heijne, and A. Swellengrebel, "Challenge balancing for personalised game spaces," in Games Media Entertainment (GEM), 2014 IEEE. IEEE, 2014, pp. 1--8.
[147] H. Yu and T. Trawick, "Personalized procedural content generation to minimize frustration and boredom based on ranking algorithm," in AIIDE, 2011.
[148] G. N. Yannakakis, P. Spronck, D. Loiacono, and E. André, "Player modeling," in Dagstuhl Follow-Ups, vol. 6. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2013.
[149] J. Schell, The Art of Game Design: A Book of Lenses. CRC Press, 2014.
[150] S. Xue, M. Wu, J. Kolen, N. Aghdaie, and K. A. Zaman, "Dynamic difficulty adjustment for maximized engagement in digital games," in Proceedings of the 26th International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee, 2017, pp. 465--471.
[151] M. Cook, J. Gow, and S. Colton, "Towards the automatic optimisation of procedural content generators," in Computational Intelligence and Games (CIG), 2016 IEEE Conference on. IEEE, 2016, pp. 1--8.
[152] M. Cook, J. Gow, and S. Colton, "Danesh: Helping bridge the gap between procedural generators and their output," in Proc. PCG Workshop, 2016.
[153] W. M. Reis, L. H. Lelis et al., "Human computation for procedural content generation in platform games," in Computational Intelligence and Games (CIG), 2015 IEEE Conference on. IEEE, 2015, pp. 99--106.
[154] J. R. Marino, W. M. Reis, and L. H. Lelis, "An empirical evaluation of evaluation metrics of procedurally generated Mario levels," in AIIDE, 2015, pp. 44--50.
[155] Square, "Final Fantasy VI (Super Nintendo Entertainment System game)," 1994.
[156] J. Kätsyri, R. Hari, N. Ravaja, L. Nummenmaa et al., "Just watching the game ain't enough: Striatal fMRI reward responses to successes and failures in a video game during active and vicarious playing," 2013.
[157] B. Ip and E. Adams, "From casual to core: A statistical mechanism for studying gamer dedication," Gamasutra, 2002. [Online]. Available: http://www.gamasutra.com/features/20020605/ip_pfv.htm
[158] M. Garbarino, M. Lai, D. Bender, R. W. Picard, and S. Tognetti, "Empatica E3: A wearable wireless multi-sensor device for real-time computerized biofeedback and data acquisition," in Wireless Mobile Communication and Healthcare (Mobihealth), 2014 EAI 4th International Conference on. IEEE, 2014, pp. 39--42.
[159] (2019) Python 3.7. [Online]. Available: https://www.python.org/
[160] (2019) Pygame 1.9.6. [Online]. Available: https://www.pygame.org/
[161] (2019) Pymunk 5.5.0. [Online]. Available: http://www.pymunk.org/en/latest/
[162] (2019) euclid 0.01. [Online]. Available: https://pypi.org/project/euclid/
[163] (2019) noise 1.2.2. [Online]. Available: https://pypi.org/project/noise/
[164] (2019) SciPy 1.3.0. [Online]. Available: https://www.scipy.org/
[165] L. Li and J.-H. Chen, "Emotion recognition using physiological signals," in International Conference on Artificial Reality and Telexistence. Springer, 2006, pp. 437--446.
[166] E. Matthews, G. Matthews, and J. E. Gilbert, "A framework for the assessment of enjoyment in video games," in Human-Computer Interaction. Interaction Technologies, M. Kurosu, Ed. Cham: Springer International Publishing, 2018, pp. 460--476.

BIOGRAPHICAL SKETCH

Elizabeth Anne Matthews was born in June of 1986 in Bellingham, Washington, the second child of Robin Adele Matthews and Geoffrey B. Matthews. She graduated from Sehome High School in the Bellingham School District in 2004. Elizabeth attended Western Washington University in Bellingham, Washington, where she graduated in 2009 with a Bachelor of Science in Computer Science and a minor in Mathematics. After leaving Western Washington University, Elizabeth attended Clemson University in Clemson, South Carolina, where she earned a Master of Science in Computer Science in the fall of 2014. In 2019, Elizabeth received her Ph.D. in Human-Centered Computing, under the advisement of Dr. Juan E. Gilbert, from the Department of Computer and Information Science and Engineering in the University of Florida's Herbert Wertheim College of Engineering.
