ABSTRACT

DOMÍNGUEZ, IGNACIO XAVIER. Influencing Behavior Unobtrusively in Virtual Gaming Environments. (Under the direction of David L. Roberts).

Virtual gaming environments mediate interaction between players within a game and/or between a player and the game itself to make gameplay more enjoyable. Although game designers strive to achieve a delicate balance between content and interaction that promotes player enjoyment, sometimes players want to take actions that would result in this balance being broken. Many games simply prevent player actions that will put the game in an undesired state at the expense of player agency. More recent approaches attempt to accommodate player actions by adapting the game, but this solution has limited applicability.

In this work, we introduce an alternative way to reduce the tension between player actions and authorial constraints by unobtrusively influencing player behavior to promote player actions that align with authorial intentions. This enables players to choose author-desired actions of their own volition, thereby preserving player agency. Subtly influencing player behavior is also useful for making games easier or harder, and we present examples of behavior influence techniques that can be used to alter players’ performance, effectively controlling the game’s difficulty. As an extension, we can use behavior influence techniques to inspect how player behavior changes and compare it to a known baseline, allowing authors to obtain additional insight about their players, such as determining if players are actively attentive and engaged, or if players are likely to be automated agents instead of human players.

We also introduce a taxonomy that partitions behavior influence into two categories. The first category corresponds to the type of behavior being affected in terms of level of abstraction: input-level or semantic. The second category of the taxonomy classifies the motivation for wanting to modify player behavior, or its purpose: narrative, difficulty, or scrutiny. Through a series of case studies, we show examples of unobtrusive behavior influence in virtual gaming environments across both types of behavior and with all three purposes in our taxonomy.

© Copyright 2018 by Ignacio Xavier Domínguez

All Rights Reserved

Influencing Behavior Unobtrusively in Virtual Gaming Environments

by Ignacio Xavier Domínguez

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Computer Science

Raleigh, North Carolina

2018

APPROVED BY:

Tiffany Barnes

Arnav Jhala

Robert St. Amant

David L. Roberts
Chair of Advisory Committee

DEDICATION

To Sofía and Marco

BIOGRAPHY

Ignacio writes beautiful software, teaches, and conducts academic research. He likes to be around computers. Some people argue that Ignacio might be a computer himself; a claim which he vehemently denies by pointing out his occasional human-like emotional responses.

ACKNOWLEDGEMENTS

I would like to begin by thanking my advisor Dave Roberts. Thank you for your guidance, mentorship, kindness and support, and for always being excited about my work. I’ve learned so much from you.

I’d also like to thank my committee members Arnav Jhala, Tiffany Barnes, and especially Rob St. Amant.

To my parents, thank you for providing me with all the opportunities that ultimately led me here. This accomplishment is yours as much as it is mine. I love you both very much.

To my wife Desirée Romero, thank you for riding this crazy train with me for what feels like an eternity. Thank you for your understanding and patience, and for believing in me even when I doubted myself. I look forward to writing the next chapter together.

Very special thanks to Margaret Heil and Robert Fornaro for their love, support, and constant encouragement. You’ve become family, and I wouldn’t have gotten this far without you. I am forever grateful.

Finally, many thanks to current and past members of the CIIGAR Lab, and of our sibling lab the Liquid Narrative Group, especially to Rogelio Cardona-Rivera, Sean Mealin, Robert Loftin, John Majikes, Justus Robertson, Markus Eger (technically in the POEM Lab), Adam Amos-Binks, and Julio Bahamón.

TABLE OF CONTENTS

LIST OF TABLES ...... viii

LIST OF FIGURES ...... x

Chapter 1 Introduction ...... 1
1.1 Thesis ...... 4
1.2 Summary of the Dissertation ...... 6

Chapter 2 Taxonomy of Behavior Influence ...... 9
2.1 Influence ...... 10
2.2 Player Behavior in VGEs ...... 12
2.2.1 Avatars ...... 13
2.2.2 Asymmetry ...... 14
2.2.3 Dynamic Adaptation ...... 15
2.3 Description of the Taxonomy ...... 15
2.3.1 Type of Behavior ...... 16
2.3.2 Purpose ...... 17

Chapter 3 Case Study: The Concentration Game ...... 20
3.1 Introduction ...... 20
3.2 Method ...... 23
3.2.1 Experimental Design ...... 23
3.2.2 Population and Sampling ...... 26
3.2.3 Description of the Environment ...... 27
3.2.4 Experimental Procedure ...... 28
3.2.5 Evaluation Metrics ...... 29
3.3 Analysis and Results ...... 30
3.4 Discussion ...... 35

Chapter 4 Case Study: The Typing Game ...... 38
4.1 Introduction ...... 38
4.2 Method ...... 40
4.2.1 Experimental Design ...... 40
4.2.2 Population and Sampling ...... 41
4.2.3 Description of the Environment ...... 41
4.2.4 Experimental Procedure ...... 43
4.2.5 Evaluation Metrics ...... 45
4.3 Analysis and Results ...... 46
4.3.1 Improvement with Practice ...... 46
4.3.2 Familiarity with Words ...... 48
4.3.3 Additional Analyses ...... 50
4.4 Discussion ...... 52

Chapter 5 Case Study: The Mimesis Effect ...... 54
5.1 Introduction ...... 54
5.1.1 Narrative Roles ...... 57
5.1.2 Interactive Narrative Role-Playing Games ...... 58
5.2 Method ...... 59
5.2.1 Experimental Design ...... 59
5.2.2 Population and Sampling ...... 60
5.2.3 Description of the Environment ...... 61
5.2.4 Experimental Procedure ...... 63
5.2.5 Evaluation Metrics ...... 64
5.3 Analysis and Results ...... 67
5.3.1 H1: Choice Correspondence to Explicit Roles ...... 67
5.3.2 H2: Choice Correspondence to an Implicit Role ...... 69
5.3.3 H3: No Preferred Role in Control Group ...... 70
5.3.4 H4: Less Variability with Explicit Roles ...... 71
5.3.5 H5: Choice Consistency Increases ...... 71
5.3.6 H6: First Choices Are Predictive of Implicit Role ...... 74
5.4 Discussion ...... 75

Chapter 6 Case Study: Asymmetric VEs ...... 80
6.1 Introduction ...... 80
6.2 Method ...... 82
6.2.1 Experimental Design ...... 83
6.2.2 Population and Sampling ...... 84
6.2.3 Description of the Environment ...... 84
6.2.4 Experimental Procedure ...... 86
6.2.5 Evaluation Metrics ...... 89
6.3 Analysis and Results ...... 90
6.3.1 Individual Performance ...... 90
6.3.2 Group Performance ...... 91
6.4 Discussion ...... 93

Chapter 7 Conclusions ...... 96
7.1 Summary ...... 96
7.2 Impact ...... 97

BIBLIOGRAPHY ...... 99

APPENDICES ...... 109
Appendix A Concentration Game Feature Plots ...... 110
A.1 Analysis 1 Plots ...... 111
A.2 Analysis 2 Plots ...... 112
A.3 Analysis 3 Plots ...... 113
Appendix B Mimesis Effect Validation Phase ...... 114

B.1 Requirements ...... 114
B.1.1 Requirement #1: External Validity of Game Experience ...... 115
B.1.2 Requirement #2: Controlling for Player Role Biases ...... 116
B.1.3 Requirement #3: Actions are Recognizably Role-Specific ...... 118
B.2 Validation Survey ...... 118
B.2.1 Validation of Role-Attribute Mappings ...... 119
B.2.2 Validation of Role Descriptions ...... 120
B.2.3 Validation of Role Gender and Behavioral Alignment ...... 120
B.2.4 Validation of Action Choices ...... 121

LIST OF TABLES

Table 1.1 Overview of the dissertation, showing how each case study fits into our taxonomy of behavior influence. In Chapter 3 and in Chapter 4 we explore input-level behavior with difficulty and scrutiny purposes. In Chapter 5 we explore semantic behavior with narrative and scrutiny purposes. Finally, in Chapter 6 we explore techniques that apply to many types of behavior and with many purposes, and present an experimental design that focuses on semantic behavior with a difficulty purpose ...... 8

Table 3.1 Participant assignment per condition ...... 27
Table 3.2 Global averages of features per type of round. In parentheses, standard deviation values ...... 32
Table 3.3 Assignment of round types to classes for each analysis. In parentheses, percentage of rounds in each class for each analysis ...... 33
Table 3.4 Accuracy results for our first and second analyses per condition ...... 34
Table 3.5 Accuracy results for our third analysis per condition. For each class in our classification, Pn columns show precision values, Rn columns show recall values, and Fn columns show F-scores ...... 34

Table 4.1 Description of the rounds in our Typing Game ...... 44
Table 4.2 Normalized mean and standard deviation of the inter-keystroke interval of participants on the "replay" condition on each of the first four attempts of every round ...... 47
Table 4.3 Normalized mean and standard deviation of the number of mistakes made by participants on the "replay" condition on each of the first four attempts of every round ...... 48

Table 5.1 Distribution of participants across experiment conditions ...... 60
Table 5.2 Score (and standard deviation) for the 7 questions in the Intrinsic Motivation Inventory (1. I enjoyed doing this activity very much, 2. This activity was fun to do, 3. I thought this was a boring activity, 4. This activity did not hold my attention at all, 5. I would describe this activity as very interesting, 6. I thought this activity was quite enjoyable, 7. While I was doing this activity, I was thinking about how much I enjoyed it) ...... 68
Table 5.3 Number of actions chosen by participants with explicit roles that corresponded to each of the roles. In parentheses, the proportion of each value for each player role (row). Players were significantly consistent with their explicit roles (χ² = 1286.3, p < 0.0001, φc = 0.563) ...... 68
Table 5.4 Number of actions chosen by participants with explicit roles based on whether their role was chosen or assigned. In parentheses, the proportion of each value for each condition (row). We found a statistically significant (but small) increase in consistency when players chose their role (χ² = 22.365, p < 0.0001, φc = 0.106) ...... 69

Table 5.5 Number of participants assigned to each cluster and the total number of actions chosen corresponding to each role by cluster. Participants in our control group were significantly consistent with an implicit role (χ² = 356.19, p < 0.0001, φc = 0.602), and had no significant preference for any particular one (χ² = 0.34146, p = 0.843) ...... 70
Table 5.6 Goodness of fit measure (R²) and significance values for the linear models fitted to evaluate H5. We did not find significant differences in consistency ...... 72
Table 5.7 Proportion of consistent choices made by players by their implicit or explicit role, and by experimental condition, including sample numbers and variance of consistency ...... 73
Table 5.8 For each of the three clustering analyses, number of participants assigned to each cluster and the total number of actions chosen corresponding to each role by cluster ...... 74
Table 5.9 Confusion matrix, classification accuracy, precision, recall, and F-score when considering players’ first three, four, and five choices made in the game ...... 76

Table 6.1 Full factorial experimental design including number of participants and number of game sessions per experimental treatment...... 84

Table B.1 Choice point options in the order they were presented in the game. The value in parentheses indicates the level of agreement of the assignment of a particular choice option to its particular role...... 121

LIST OF FIGURES

Figure 2.1 Waskul & Lust [117] persona-player-person boundaries...... 13

Figure 3.1 User interface of the Concentration game ...... 28
Figure 3.2 Visualizations of mouse motion and click activity on different types of rounds. Notice in Figure 3.2c that the player made four mismatches despite being able to see the contents of all tiles for the entire duration of the round, supporting the idea that they were trying to conceal their behavior ...... 31

Figure 4.1 Screenshot of the game board showing a set of a single word on the fourth column and second row of the grid ...... 42
Figure 4.2 Comparison of the average normalized inter-keystroke interval and normalized number of mistakes by word type on the first attempt of every round ...... 49
Figure 4.3 Comparison of the average normalized inter-keystroke interval and normalized number of mistakes by condition on the first attempt of every round. The vertical lines separate rounds by word type ...... 51

Figure 5.1 Part of the animation sprite used for the player’s avatar, which was modeled after Perlin’s Polly [91] to avoid the Proteus Effect [121] – the phenomenon that users conform to expected behaviors associated with an avatar’s appearance ...... 61
Figure 5.2 Screenshot of a sample in-game level environment ...... 62
Figure 5.3 Screenshot of a sample in-game dialog box ...... 63
Figure 5.4 Screenshot of a sample in-game action selection screen ...... 64
Figure 5.5 Screenshot of a sample in-game cutscene ...... 65
Figure 5.6 Plot synopsis for our game. Numbers preceding some of the plot points correspond to choices the player encountered to resolve that plot point, enumerated in Table B.1 ...... 66
Figure 5.7 Plots of the proportion of consistent choices made by players by their implicit or explicit role, and by experimental condition, as shown in Table 5.7. The blue lines and gray shadows represent the fitted linear regressions and confidence intervals, respectively ...... 72

Figure 6.1 Icons of items used in our game, as shown to participants ...... 83
Figure 6.2 Avatar models in red and gray ...... 83
Figure 6.3 Leaderboards shown in L+ treatments ...... 85
Figure 6.4 High-level architecture diagram ...... 86
Figure 6.5 Screenshots of both game tasks ...... 87
Figure 6.6 Average number of items found by players by the color of their avatars and the type of game session ...... 91
Figure 6.7 Average distance, in game units, between players’ placement of items on the map to all items’ original location ...... 92
Figure 6.8 Average duration of the scavenger hunt task by avatar appearance and game type ...... 93
Figure 6.9 Average duration of the map reconstruction task by avatar appearance and game type ...... 94
Figure 6.10 Avatar models in red and gray ...... 95

Figure A.1 Histograms of all 6 features. In blue, values for no reveal rounds. In red, values for mixed reveal rounds ...... 111
Figure A.2 Histograms of all 6 features. In blue, values for no reveal rounds. In red, values for full reveal rounds ...... 112
Figure A.3 Histograms of all 6 features. In blue, values for no reveal rounds. In red, values for full reveal rounds. In green, values for partial reveal rounds ...... 113

Figure B.1 Our triad of role-attribute mappings. We selected three attributes and identified three corresponding roles we felt best represented the attributes. Nodes represent role-attribute mappings, and edges are attributes shared between the connected role-attribute mappings. The edge opposite a node is the antonymic attribute to the node’s role-attribute ...... 117

Chapter 1

Introduction

In general, a virtual environment (VE) is a digital space that allows participants to interact among themselves and/or with digital elements. In an increasingly digital world, VEs are commonly used to mediate the completion of tasks that range from solving complex problems to facilitating remote communication. Virtual gaming environments (VGEs) are special in that they are designed for enter- tainment purposes and possess ludic properties. While most VEs mediate to make task completion more efficient, VGEs mediate to make task completion (gameplay) more enjoyable.

Several approaches have been proposed to intentionally create enjoyable gaming experiences, as well as to analyze existing games to identify what makes them fun. A well-known example of such approaches is the MDA Framework [70]. The three main components of this framework are the mechanics, the dynamics, and the aesthetics. The mechanics are the lower-level rules of the game, and include the basic actions players can carry out, such as running, jumping or shooting. The dynamics describe how all the mechanics interact and translate into gameplay. Importantly, the dynamics also describe how gameplay evolves over time. Finally, the aesthetics are a product of the dynamics and describe the emotional responses from players to the game that will ultimately make the experience enjoyable. Hunicke et al. [70] describe eight types of aesthetics commonly found in games, namely, sensation, fantasy, narrative, challenge, fellowship, discovery, expression, and submission. It is common for a single game to have multiple of these types as aesthetic goals.

Dillon [39, 40] expands on the aesthetics description of MDA and proposes the 6-11 Framework as a replacement for the eight types of aesthetics. This framework provides a taxonomy for aesthetics that groups them into six emotions (fear, anger, joy/happiness, pride, sadness, and excitement) and eleven instincts (survival, self identification, collecting, greed, protection/care/nurture, aggressiveness, revenge, competition, communication, exploration/curiosity, and color appreciation). Additionally, the 6-11 Framework explains that these emotions and instincts are interconnected, where one can trigger and/or support the other.

An emerging theme is the complexity and range of the aesthetics component, but also some commonalities with other theories of engagement and enjoyment such as Self-Determination Theory (SDT) [103] and flow [35]. In SDT, Ryan & Deci [103] identify competence, autonomy, and relatedness as the three basic psychological needs that must be satisfied in order to feel intrinsically motivated and engaged in an activity. Competence in SDT is comparable to challenge in MDA, and refers to the player’s perceived difficulty of the game. According to Csikszentmihalyi [35], the flow state is achieved when there is a balance between players’ perceived challenge in the game and their perceived skill. If a game is too difficult players become anxious, but if a game is too easy they become bored. Even though it is well established that the flow state is critical for enjoyment [34, 35, 105, 109], it is difficult to sustain [36].

SDT’s autonomy, represented in MDA as the narrative, discovery and expression aesthetics types, has a direct parallel to a player’s sense of agency—the satisfying power to take meaningful action and see the results of our decisions and choices [85]—which is also a highly pursued game property [109]. To create enjoyable gaming experiences, game designers strive to promote a player’s sense of agency, while at the same time providing content that promotes flow and triggers the author’s desired emotional responses. For example, in interactive narrative games, achieving a balance between a compelling story’s coherence and the player’s sense of dramatic agency is a key challenge [5]. This creates tension between allowing players to act freely and having players act according to the game designer’s structure for optimal enjoyment.

Because generating game content to support player actions becomes exponentially expensive and time-consuming as players are given a broader choice of actions [24], this tension is usually resolved by forcing player actions to follow authorial goals. A common approach is to simply prevent player actions that would put the game in an unsupported or undesired state. For example, players in an open-world game may not be allowed to reach parts of the game map that otherwise appear reachable, resulting in an “invisible wall” without a logical explanation. However, this approach is problematic because it breaks player agency, and therefore flow.

In some situations, game designers are only interested in taking the player from an initial state to a goal state. In such cases, it is possible to use techniques such as intervention and accommodation [98, 102] to fill in the intermediate steps while preserving game coherence. This approach is suitable for scenarios where authors only care about players experiencing an interesting story, regardless of what that story may be. When authors have a specific storyline they want their players to follow, such as those occurring in game-based learning environments, this solution is insufficient.

The work we present here provides tools for game designers to unobtrusively influence player behavior in a predictable direction as an alternative way to reduce the tension between player actions and authorial constraints by promoting player actions that align with authorial intentions. The behavior influence techniques we present here enable game authors to craft more engaging aesthetic experiences by subtly guiding players toward choosing author-desired actions of their own volition, therefore preserving player agency. This has the potential to reduce authorial burden by allowing designers to focus their efforts on developing game content that is more likely to be experienced by players. These techniques can also be applied to alter players’ performance, effectively controlling a game’s difficulty, which can allow game designers to more directly promote desirable mental states (e.g., flow) that affect players’ enjoyment and engagement.

Similarly, game designers can leverage these behavior influence techniques to inspect how player behavior changes compared to a known baseline, allowing authors to unobtrusively obtain additional insights about their players, such as determining if players are actively attentive and engaged, or if players are likely to be automated agents instead of human players. These insights provide actionable data, allowing games to intervene when a particular target behavior is identified to, for example, re-engage players or prevent bots from gaining an unfair advantage.

1.1 Thesis

The player behavior influence techniques we present in this research leverage properties and affordances in games, namely, the knowledge that players possess, the framing of information that is revealed to players, and the physical properties of in-game elements. More formally, the thesis put forward by this work is the following:

Thesis: In a virtual gaming environment, player knowledge, the framing of revealed knowledge, and/or the properties of in-game elements (e.g., player avatars) can be leveraged to have a predictable effect on player behavior, allowing authors to subtly influence players’ behaviors without altering game mechanics.

Player knowledge is intentionally broad and refers to previous experiences with video games and gaming contexts, or general knowledge that would be expected from players, such as familiarity with common words or with common computer input devices. The framing of revealed knowledge refers to when and how information pertinent to the game is communicated to players. For example, a multiplayer game task can be framed as competitive or collaborative, or simply left unspecified, altering the social dynamics between players. Or certain game actions otherwise allowable in a game can be presented as having particular repercussions. Similarly, player behavior can be affected by the properties of in-game elements, such as colors and sizes of objects and characters, which can be easily altered in virtual environments. Importantly, behavior influence techniques can leverage these properties and affordances in a predictable direction, allowing not just changes in behavior, but changes that can be effected toward a desired objective. To support this thesis, in this document we present a series of case studies that serve as examples of specific changes in player behavior based on some of these properties and affordances, and how these techniques generalize across multiple game genres.

In addition to these case studies, in this dissertation we put forward a taxonomy that partitions behavior influence into two categories. The first category corresponds to the type of behavior being affected in terms of level of abstraction: input-level or semantic. Input-level data emerges from the use of the computing hardware through input devices. This type of data is particular to the input device or devices being used, and has the same representation regardless of the game task from which it has been collected. Examples of input-level data include keystrokes, mouse movements, touch-screen taps, and mouse clicks. While input-level data is represented the same across games, mechanics (the game rules) and dynamics (how the rules affect gameplay) will produce distinctive input-level data patterns.

Semantic data derives from games’ mechanics and dynamics. As such, it only makes sense within the context of the particular game, or at best the game genre, from which it has been collected. Examples of semantic data include the number of moves made in a casual game, the actions taken or choices made by a character, player stat changes, and many others. Semantic data may or may not be directly correlated with input-level data. As a more concrete example, a pinch gesture on a mobile version of a game may open the player’s inventory, whereas on the desktop version of the same game the inventory may be opened by pressing a key on the computer keyboard. Conversely, the same input-level data acquire a semantic meaning when contextualized in a particular game. A mouse click can trigger the reveal of a tile in a casual game, but can also cause a shot to be fired in a first-person shooter.

The second category of our taxonomy classifies the motivation for wanting to modify player behavior, or its purpose: narrative, difficulty, or scrutiny. There are many game genres where following an author-defined storyline is important. In these situations, modifying player behavior so that it aligns with the story the author is trying to tell is ideal; influencing behavior in this way is said to have a narrative purpose. Game difficulty is a well-known factor in player engagement with a game [62]. For this reason, being able to match difficulty with player skill is often desired. Influencing behavior in order to make the game easier or harder is said to have a difficulty purpose.

In some cases it is desirable to assess the cognitive state of players. A game property whose effect on player behavior is known can be controlled, for example, to determine if players are actively attentive and engaged, or for bot detection. Influencing behavior in this way is said to have a scrutiny purpose.

1.2 Summary of the Dissertation

In Chapter 2 we describe our taxonomy of behavior influence techniques. This taxonomy classifies techniques by whether the behavior affected is at the input device level or at the semantic level, and by whether the technique was applied or can be applied with a narrative, difficulty, or scrutiny purpose. As part of this description, this chapter also summarizes related work covering traditional behavior influence techniques and human behavior traits that can be computationally leveraged.

Chapter 3 presents a case study [42] based on the casual game Concentration, also known as the Memory game, where players aim to find matching pairs of tiles across the game board. In this study we allowed some participants to cheat by enabling reveal mode, where the values of all tiles were displayed even outside of a turn, essentially controlling their knowledge over the state space of the game. Additionally, we altered the framing of this information by telling some participants that a cheating detection system was in place, and that if cheating was detected they would lose their entire compensation. Here, we look to influence input-level data by manipulating player knowledge (via exposure to the values of hidden tiles) and the framing of how this knowledge is revealed (via anticipated punishment), affecting mouse usage patterns and derived game-independent metrics.

We explore two purposes for this behavior change. The first one affects difficulty, as revealing the tiles makes the game easier to play. The second one is scrutiny, since input-level data patterns produced by human players are expected to change when the tiles are revealed. By inspecting the expected behavior differences a system could differentiate human players from bots.

In the case study presented in Chapter 4 we look at the Typing Game [43], where players are shown a series of falling words on a board. The goal is to correctly type the words before they fall off the board. Through a series of levels, players were exposed to words of varying length that also vary in how familiar they are to players. Through this study we are able to better understand how player knowledge—via familiarity with a word, in terms of both general experience and from practice—affects input-level behavior in the form of typing patterns. As in Chapter 3, we explore two purposes for this behavior change. The first one affects difficulty, as typing unfamiliar words makes the game harder. The second one is scrutiny, since input-level (keystroke) data patterns produced by human players are expected to change based on the length of the word being typed and the player’s familiarity with the word. This could be used not only for bot detection, but also for determining players’ familiarity with presented content.

In Chapter 5 we present a case study that looks at a different game genre, namely, a role-playing game. This study explores the relationship between a narrative role and in-game choices of actions in what we call the Mimesis Effect [44]. In this environment, we leverage player knowledge and the framing of this knowledge to affect semantic data (choices). Specifically, we describe how in-game narrative roles can be used with a narrative purpose to influence player behavior to make choices that are consistent with their roles. We also show how these semantic data can be leveraged to scrutinize players in order to infer a player’s acting role.

The last case study [41] is presented in Chapter 6, and it describes a proof-of-concept multiplayer architecture and implementation of an Asymmetric Virtual Environment (AVE) that could conceptually be used to influence many types of behavior with many purposes. In this chapter we also describe how this AVE could be used as a research platform via a preliminary experiment designed to evaluate the effects of avatar colors on semantic behavior with the purpose of altering the difficulty of the game by influencing individual and group performance on a 3D scavenger hunt game immediately followed by a location identification task in a 2D casual game format. Avatar colors—red and gray—were controlled and displayed asymmetrically, where a player’s avatar color could appear different to that player than to other players (or vice versa) simultaneously.

Finally, Chapter 7 provides concluding remarks. Table 1.1 shows a summary of this dissertation.

Table 1.1 Overview of the dissertation, showing how each case study fits into our taxonomy of behavior influence. In Chapter 3 and in Chapter 4 we explore input-level behavior with difficulty and scrutiny purposes. In Chapter 5 we explore semantic behavior with narrative and scrutiny purposes. Finally, in Chapter 6 we explore techniques that apply to many types of behavior and with many purposes, and present an experimental design that focuses on semantic behavior with a difficulty purpose.

                                 Type of Behavior           Purpose
Case Studies                     Input-level   Semantic     Narrative   Difficulty   Scrutiny
Concentration Game (Chapter 3)        X                                      X           X
Typing Game (Chapter 4)               X                                      X           X
The Mimesis Effect (Chapter 5)                     X            X                        X
Asymmetric VEs (Chapter 6)            X            X            X            X           X

Chapter 2

Taxonomy of Behavior Influence

While many people believe that they are in absolute control of their own actions, the reality is that human attitudes are conditioned on multiple factors, such as genetic predispositions [3], social norms, cultural and educational backgrounds, and general beliefs [114]. Attitudes are not directly observable, but psychologists identify three ways in which they manifest measurably: cognitive, affective, and behavioral responses [114]. The cognitive response refers to our opinions, thoughts, and beliefs, which can be measured via surveys or interviews. The affective component refers to how an attitude makes us feel and is also measurable via surveys or interviews. The behavioral component describes what we do as a result of an attitude toward something and, in contrast with the other two components, can be measured via observations. It is important to note that these three manifestations of attitudes are not independent. While the behavioral component is usually considered a consequence of our cognitive and affective responses, the outcomes of our behaviors feed back into those responses as well. Similarly, the cognitive and affective responses are related and feed each other.

2.1 Influence

When we talk about influence, we generally refer to changes in observable behavior. More specifically, influencing behavior refers not just to any changes in behavior, but to changes that are consistent with the expectations of the influencer. We’ll call these desirable behaviors. For example, in advertising and marketing, influence is used to promote the desirable behavior of customers signing up for a service or buying a product. Although a comprehensive summary of the abundant literature on behavior influence is outside of the scope of this research [114], in this section we present select examples of phenomena that psychologists have identified as capable of modifying behavior.

Social Influence is a subset of Social Psychology that studies how we influence and are influenced by others [95]. While several social influence phenomena were identified many years ago (e.g., [17]), Cialdini [29] famously consolidated six principles of influence, namely, reciprocity, consistency, social proof, authority, liking, and scarcity. The principle of reciprocity says that we are more likely to accommodate a request from someone who we feel has given something to us or has done something for us. The principle of consistency says that we are more likely to act in accordance with something we have previously said or done. Social proof says that we resolve uncertainty by looking at the actions of others. We are also more likely to comply with a request from a figure of authority, or to believe something that is said by a figure of authority. The principle of liking states that we are more easily persuaded by those who we like. The principle of scarcity says that we have increased desire to pursue things that we believe will no longer be available in the future. More recently, Cialdini [31] identified a seventh principle named unity, which describes how the more we identify with someone, the easier it is for them to influence our behavior.

Social psychologists also study social phenomena that influence behavior beyond Cialdini’s 7 principles. Chartrand & Bargh [27] discovered the Chameleon Effect, where mimicry of behavior resulted in higher likeability of the mimics, even when the participants did not realize they were being mimicked. Social stereotypes also serve as sources of behavior influence, affecting behaviors of both the holders of the stereotype [115, 116] as well as the subjects being stereotyped [108]. Other work in social phenomena describes how behavior changes with anonymity or with an altered perception of accountability [96], such as when participating in large groups.

Human behavior is not only influenced by others, but also by stimuli in the environment. In fact, as Bargh & Chartrand [12] posit, most of human behavior is determined by unconscious cognitive processes that are triggered by environmental stimuli and execute outside of our control. However, the continuous flow of complex information we receive from the world, combined with our limited capacity to process it all, results in impaired rational judgment [106]. These shortcuts in information processing and decision making are known as cognitive biases or heuristics [72]. Cognitive biases are systematic, automatic, and predictable, allowing them to be leveraged to influence behavior [114].

A few examples of well-known cognitive biases include priming, anchoring, and framing. Priming occurs when an earlier stimulus affects the cognitive process (and corresponding behavioral response) to a following stimulus. A famous study by Bargh et al. [13] demonstrated the effect of the priming bias on behavior. Participants exposed to concepts related to elderly people walked more slowly than participants exposed to neutral stimuli when leaving the experiment. Similarly, participants exposed to rude stimuli behaved more aggressively toward experimenters than participants exposed to polite stimuli.

Anchoring occurs when we evaluate a stimulus with respect to a preceding stimulus. For example, if we are negotiating the price of a product we want to buy, coming down to a price from a higher starting price may appear more favorable than coming down to the same price from a lower starting price. In an experiment by Tversky & Kahneman [112], participants were asked to estimate a numerical expression in 5 seconds. The first group was tasked to estimate the result of 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1, while the second group had to estimate the result of 1 × 2 × 3 × 4 × 5 × 6 × 7 × 8. The average estimate of the first group was 2,250, while the average of the second group was 512. The explanation provided by Tversky & Kahneman [112] of this discrepancy is that the limited amount of information available to make the estimate was the result of the first few multiplications. A higher anchor resulted in a higher estimate and a lower anchor resulted in a lower estimate.

The framing bias refers to how humans perceive and respond to a stimulus differently depending on how this stimulus is presented or delivered. For example, the decision to gamble becomes more attractive when the odds are described as the possibility of winning the jackpot than as the possibility of losing the cost of the entry. Another example from a series of experiments conducted by Tversky & Kahneman [113] asked participants to choose between two options on how to respond to a hypothetical disease that is expected to kill 600 people. A first group was given the choice between A) saving 200 lives, and B) a 1/3 chance that all 600 lives are saved and 2/3 chance that no lives are saved. A second group was given the choice between A) 400 people dying, and B) 1/3 chance that nobody dies and 2/3 chance that everyone dies. Despite both scenarios having the same numerical outcome, the majority of participants in the first group (72%) chose option (A), while the majority of the second group (78%) chose option (B).

Other cognitive biases also include responses to exposure to different colors [16, 48, 54, 77] and the way in which visual information is perceived and processed [65]. There are over a hundred identified cognitive biases, with corresponding variations. While a comprehensive discussion of all of these is outside of the scope of this research, it is important to note that many of these biases can be leveraged to influence behavior.

2.2 Player Behavior in VGEs

Virtual gaming environments introduce affordances that are impossible, or at least extremely difficult, for the real world to support, adding a new dimension to human behavior. Waskul & Lust [117] posit that participants in VGEs construct and navigate symbolic boundaries between themselves and gameplay, often lying in the liminal spaces of the persona-player-person boundaries, illustrated in Figure 2.1. A participant in a game is at the same time: a) a person, with her own identity, beliefs, desires, intentions, and so on, b) a player, who is part of a social group and embedded in the culture and conventions of gaming, and c) a persona, a virtual “self” that exists in the game’s world.

Figure 2.1 Waskul & Lust [117] persona-player-person boundaries.

This separation between the person and the persona has naturally resulted in a good amount of the research on behavior in VGEs exploring how social norms and behavior patterns that apply to the physical world translate to virtual worlds [107]. A clear example was presented by Bailenson & Yee [8] where they showed that the Chameleon Effect [27]—where mimicking another person’s behavior results in higher likeability—transferred to VEs. Other examples include work by Roberts & Isbell [101] on computational social influence, work by Dotsch & Wigboldus [45] on social stereotypes, work by Yee et al. [124] on personality, work by Dotsch & Wigboldus [45] and Yee et al. [122] on interpersonal distance, gender, and eye contact, and a series of experiments by Gillath et al. [57] on compassion and other social behavior, also including interpersonal distance and eye contact, largely confirming the universality of the automaticity of human behavior.

2.2.1 Avatars

While most of what we know about human behavior translates well to VGEs, the nature and affordances of digital media are known to also affect and interfere with this behavior [90], giving game designers additional tools to influence players. One of the most well-explored examples of VGE affordances and their effect on behavior is the ability to easily customize our virtual appearance via avatars [46, 121]—a visual representation of our digital selves [38]. Zanbaka et al. [125] explored how avatar realism and gender affected persuasiveness. They presented participants a persuasive message delivered by real humans, virtual humans, and virtual non-human avatars, each with a male and female version. Even though virtual speakers were rated more favorably than human speakers, both virtual and real speakers were equally effective at changing participants’ attitudes, showing that characters in virtual environments can be as effective as real humans when influencing behavior. Interestingly, they also found that female speakers were more persuasive to male participants, while male speakers were more persuasive to female participants.

A series of experiments by Yee & Bailenson [121] and Yee et al. [123] showed that differences in avatar visual appearance change people’s behavior in VEs. In agreement with what is known of human behavior in the physical world, they found that when participants were embodied in avatars that were perceived as more attractive, they disclosed a higher amount of information about themselves to others, and reduced the amount of interpersonal distance they had with other avatars in the same virtual world. Similarly, they found that participants using taller avatars exhibited more confidence and more aggressive negotiation behavior. They named the phenomenon where people modify their behavior to conform with their avatar’s visual appearance the Proteus Effect.

The Proteus Effect has been explored beyond just attractiveness and height. Merola et al. [81] showed that avatar colors had an effect on social identity. When participants were asked to discipline an alleged offender, users of avatars in black robes were more harsh in their punishments than users given avatars in a white robe. These results are consistent with work by Peña et al. [88] where participants with avatars wearing dark clothes exhibited a priming effect for negative attitudes and tended to show more aggressive intentions and behavior than those wearing light-colored clothes.

Exploring this effect further, this latter work also showed that not just color but also the nature of the outfit affected behavior. Despite having the same color, participants embodied in avatars wearing a white Ku Klux Klan outfit showed more aggressive behavior and intentions than participants embodied in avatars wearing a white doctor’s coat. Peña [90] argues that behavior changes due to avatar appearance are due to priming effects.

2.2.2 Asymmetry

A powerful affordance of VGEs is the ability to have multiple interactants share the same digital context (e.g., a 3D world) while possibly occupying different physical locations [10]. While in the great majority of VGEs the representation of the “world” that is rendered to every interactant is congruent, this does not always have to be the case. This affordance has been explored by Bailenson [6], Bailenson & Beall [7], and Bailenson et al. [9–11] in what they call a Transformed Social Interaction (TSI), which occurs when a computer-mediated social interaction is strategically altered or filtered. Every user perceives their own digital rendering of each other, and these renderings need not be congruent. In terms of influence in VGEs, TSI can be combined with principles of social influence to elicit desired behaviors (e.g., strategically altering avatar interpersonal distance).

2.2.3 Dynamic Adaptation

Yet another affordance of VGEs that can be used to influence player behavior is the ability for online content adaptation. Yannakakis & Togelius [119] outline ways in which procedural content generation (PCG) can be used to produce experiences targeted to individual players in what they call Experience-Driven Procedural Content Generation (EDPCG). EDPCG relies on player models to evaluate the fitness of game content to the player, based on author-defined criteria. A recent example of EDPCG was presented in a series of experiments by Harrison & Roberts [64]. In two adaptive games, they were able to influence player behavior to increase the amount of time players were willing to spend in each game session by promoting game states that player retention models would find more favorable.

In addition to accounting for cognitive biases and player knowledge, PCG and adaptive models that rely on player models to procure content (e.g., models of players’ preferences in PaSSAGE [111] to select content for a story) could leverage semantic and/or input-level data to scrutinize players, providing additional information to make these models more accurate to the individual player.
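To make the EDPCG selection loop concrete, the following minimal sketch scores candidate content with a toy player model and picks the highest-scoring piece. All names, the candidate content, and the scoring function are hypothetical illustrations of the idea, not the implementation of Yannakakis & Togelius [119] or Harrison & Roberts [64].

    # Minimal sketch of an Experience-Driven PCG selection loop.
    # All names (Content, retention_score, candidates) are hypothetical.

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Content:
        name: str
        difficulty: float   # 0 (trivial) .. 1 (punishing)
        novelty: float      # 0 (familiar) .. 1 (brand new)

    def select_content(candidates: List[Content],
                       predicted_retention: Callable[[Content], float]) -> Content:
        """Pick the candidate the player model scores highest."""
        return max(candidates, key=predicted_retention)

    # A toy player model: this player stays longest when difficulty sits
    # near their estimated skill and some novelty is present.
    def retention_score(content: Content, skill: float = 0.6) -> float:
        challenge_fit = 1.0 - abs(content.difficulty - skill)
        return 0.7 * challenge_fit + 0.3 * content.novelty

    candidates = [
        Content("easy corridor", difficulty=0.2, novelty=0.1),
        Content("boss rematch", difficulty=0.9, novelty=0.3),
        Content("new puzzle room", difficulty=0.55, novelty=0.8),
    ]
    print(select_content(candidates, retention_score).name)  # -> new puzzle room

Swapping the retention model for one that scores attentiveness or learning outcomes changes the purpose of the adaptation without changing the loop itself.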

2.3 Description of the Taxonomy

Different influence techniques in virtual gaming environments affect behavior differently. In this section we describe a small taxonomy of different behavior changes that can be expected through different influence techniques. While there are many ways to categorize these techniques (e.g., by the principle of influence they leverage), here we choose a more pragmatic and utilitarian approach by partitioning behavior influence in VGEs into two categories. One category corresponds to the type of behavior being affected in terms of level of abstraction (input-level vs. semantic). The second category classifies the motivation for wanting to modify player behavior (narrative, difficulty, or scrutiny). In this dissertation, we will use this taxonomy as a framework to present a series of case studies exemplifying how VGE affordances can be used to influence player behavior.

2.3.1 Type of Behavior

2.3.1.1 Input-level Behavior

Input-level behavior refers to patterns in the raw data that can be extracted from the use of input devices, such as keyboards, mice, or touchscreens. Respectively, examples of input-level data include keystrokes, mouse movements or clicks, and touch-screen taps or gestures. These data are particular to the input device from which they were generated, but the same type of device generates data that has the same representation independently of the game from which it was collected. For example, a mouse click in a soccer game is represented the same as a mouse click in a casual game.

While input-level data from the same type of input device is represented the same across games, mechanics (the game rules) and dynamics (how the rules affect gameplay) will produce distinctive input-level data patterns. Some models of behavior, in particular models based on human cognition, describe behavior at the input-level. Famous examples include Fitts’ Law models [52] that are used to predict duration, trajectories, and error rates for pointer movements, as well as how these metrics change based on the size and distance of the target.
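For concreteness, the widely used Shannon formulation of Fitts’ Law predicts the movement time MT to acquire a target of width W at distance D, with a and b fitted empirically per device and user:

    MT = a + b \log_2\!\left(\frac{D}{W} + 1\right)

Doubling the distance to a target, or halving its size, increases the predicted pointing time by roughly the fixed increment b, which makes such models useful as game-independent baselines for pointer behavior.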

2.3.1.2 Semantic Behavior

Data collected from player behavior in a VGE is semantic when the input-level data is interpreted in the context of a game. The game environment translates input device events into actions that acquire meaning within that particular game. For example, a pinch gesture on a mobile version of a game may open the player’s inventory, whereas on the desktop version of the same game the inventory may be opened by pressing a key on the computer keyboard. Conversely, the same input-level data acquire a different semantic meaning when contextualized in different games. A mouse click can trigger the reveal of a tile in a casual game, but can also cause a shot to be fired in a first-person shooter. Other examples of semantic data include the number of moves made in a casual game, the actions taken or choices made by a character, player stat changes, and many others.

Patterns in semantic data may or may not be directly correlated with patterns in the input-level data that generate them. In many games, the semantic interpretation of input-level events is dependent on the timing of these events, not just with respect to other input device events, but also with respect to the state of the game. This makes it possible for two identical input-level data patterns to acquire completely different semantic meaning, even within the context of the same game. Conversely, it is possible for distinct input-level patterns to be interpreted equally within a game. For example, a game that requires selection of tiles from a grid can produce the same result if the input device used to select them is operated slowly and if it’s operated rapidly.
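As a minimal sketch of this distinction, the snippet below routes one and the same raw mouse event to different semantic actions depending on the game that interprets it. The event format, game names, and handlers are hypothetical, purely for illustration:

    # The same input-level event acquires different semantic meaning
    # depending on the game context. All names here are hypothetical.

    RAW_EVENT = {"device": "mouse", "action": "click", "x": 412, "y": 230}

    def interpret(event: dict, game: str) -> str:
        """Translate a raw input event into a game-specific (semantic) action."""
        if event["action"] != "click":
            return "ignored"
        if game == "concentration":
            return f"reveal tile at ({event['x']}, {event['y']})"
        if game == "first_person_shooter":
            return "fire weapon toward crosshair"
        return "unmapped"

    print(interpret(RAW_EVENT, "concentration"))         # a tile is revealed
    print(interpret(RAW_EVENT, "first_person_shooter"))  # a shot is fired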

2.3.2 Purpose

2.3.2.1 Narrative Purpose

Games that are narratively driven often require players to follow an author-defined storyline. For example, educational games, as a special case of interactive narrative games, have a particular set of goals related to a subject matter they would like players to experience. A common approach to this challenge is to author interactive narrative content for each of the user’s actions that has a meaningful impact on the story’s progress. However, the amount of interactive narrative content that must be authored to support this level of agency is exponential in the number of ways the player can direct the development of the unfolding narrative [24]. For compelling experiences, this authoring becomes expensive and complex, requiring a significant amount of time to ensure that a high-quality experience is delivered [20, 84].

One way to reduce the tension between player agency and content authoring in a way that is not detrimental to player experience is to influence player behavior to increase its convergence with authorial intents. Similarly, attitudes toward game responses to player actions can be altered to preserve agency, as illustrated by Fendt et al. [50]. These situations where player behavior is modified to accommodate a story are classified by this taxonomy as having a narrative purpose.

2.3.2.2 Difficulty Purpose

Game difficulty is a well-known factor in players’ engagement with a game [62] as well as players’ enjoyment [34, 35, 75, 105, 109]. Game designers often strive to promote a state of flow in players, which requires the game’s difficulty to match the players’ perception of their own skill [36]. The game’s difficulty, as shown by Klimmt et al. [75], affects players’ satisfaction with their own skills, providing an opportunity to influence behavior by controlling flow. This is important not just to improve players’ enjoyment and general satisfaction, but also has implications for serious games [87]. Kickmeier & Albert [73] showed that promoting states of flow in educational games can improve learning outcomes by making small tweaks to how content is presented in what they call microadaptivity. In a 1994 patent by NASA [92], game difficulty was also used to influence player behavior in order to make players more attentive.

Techniques that influence behavior to make a game easier or harder are classified as having a difficulty purpose in this taxonomy.

2.3.2.3 Scrutiny Purpose

In some cases, observed behavior can be used to learn more about the player. When typing or pointing at targets in a graphical user interface, individuals exhibit distinctive patterns in the timing of their keystrokes [82] and the movement of the mouse [1, 97]. Ali & Yang [2] used these player behavior patterns in VGEs, via input-level data, as a mechanism for player authentication.
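To illustrate the kind of input-level feature such work builds on, the sketch below derives inter-keystroke intervals from timestamped key presses; the log format is a hypothetical placeholder, not the representation used in the cited studies:

    # Inter-keystroke intervals: a basic keystroke-dynamics feature.
    # The (key, press-time-in-ms) log format is hypothetical.

    from statistics import mean, stdev

    key_events = [("h", 0.0), ("e", 142.0), ("l", 263.0), ("l", 401.0), ("o", 530.0)]

    intervals = [t2 - t1 for (_, t1), (_, t2) in zip(key_events, key_events[1:])]
    print(intervals)                          # [142.0, 121.0, 138.0, 129.0]
    print(mean(intervals), stdev(intervals))  # summary statistics per user or word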

Player behavior can also be used more generally to identify behavior patterns that are not necessarily unique to an individual, but to a group of individuals such as demographic classes or even humans as a whole. Martey et al. [79] explored behavior patterns of players using female characters and identified differences in semantic behavior (but not in input-level behavior) between players that reported identifying with the gender of their avatars and those who did not. Influencing behavior and inspecting the outcome can also be used for bot detection. Thawonmas et al. [110] used behavior patterns in semantic data to identify bots in multiplayer online games. Input-level data can also be used to characterize bot behavior, as shown by Barik et al. [14]. Influencing behavior to learn more about players is classified by this taxonomy as having a scrutiny purpose.

Chapter 3

Case Study: The Concentration Game

3.1 Introduction

This case study shows how controlling player knowledge and the framing of this knowledge can be used to unobtrusively influence input-level behavior with a difficulty or scrutiny purpose. Specifically, we show how mouse movement and click patterns change in response to players’ access to information that makes the game easier. Purposefully controlling this information can be used to subtly influence players’ performance, but also to examine player reactions when this information is available in order to differentiate, for example, human players from bots.

People exhibit distinctive patterns when using input devices such as typing on a keyboard [82] or moving a computer mouse [1, 97]. Some of these patterns are unique to the individual, and as such have been used in the field of biometrics for user authentication [82, 97]. Other patterns are more general and are reflective of how all humans process information and execute tasks on a computer [52]. Studies have shown that people move their mouse according to their visual attention; some follow along with the pointer like a finger on print media, while others move the pointer to areas surrounding their visual focus [4, 28, 86]. In either case, there is a correlation between eye focus and pointer location [28, 86].

Different users may exhibit different types of patterns in input device usage depending on the microstrategy they apply to the task, as identified by Gray & Boehm-Davis [60]. Microstrategies are characteristic choices that users make, without extensive deliberation, between different actions to achieve their goals. A representative example of research on microstrategies is due to Gray & Fu [61]. Experiment participants were given a task to perform in a user interface that contained an information box, with a variable cost of accessing that information in different conditions: the information could be permanently visible or it could require a mouse click (with a temporary lockout) to see. Gray and Fu found patterns in completion times, error rates, and decisions made by participants across the conditions, which could be explained in terms of tradeoffs between perceptual/motor and memory retrieval effort. Participants’ behaviors depended in subtle ways on cognitive biases (e.g., a preference for “knowledge in the head” rather than perfect knowledge in the world).

Games with the ability to characterize these input-level behaviors could identify user interaction patterns that are abnormal—usage that deviates from the way a game is normally played. Some games may find this information useful to detect usability problems. For example, a game may be able to detect interaction patterns of a novice user and offer hints and tips on how to navigate a user interface and use the controls. Other games may use these models of user behavior to differentiate real human players from automated agents posing as humans (bots) that are commonly used to gain an unfair advantage [56], or to differentiate classes of players. This case study takes the first steps toward exploring the latter use cases by controlling and framing information revealed to players to influence their behavior at the input level with the purpose of scrutinizing their behavior and/or modifying the difficulty of the game.

Games that benefit from making sure their users are humans instead of bots have traditionally relied on human interactive proofs (HIPs) [58], which can differentiate, with a certain degree of confidence, a real human from an automated system by presenting problems that are hard for computers to solve, but that humans solve with little effort. An example of such implementations is CAPTCHAs [21]. However, while there have been efforts to reduce the burden of their use [59], HIPs are generally intrusive and can present usability challenges for users [118]. There is also the possibility that a human could solve a HIP and immediately afterward activate a bot, effectively bypassing this method’s protection.

One approach to overcome the limitations of HIPs is the use of human observational proofs (HOPs), which aim to detect whether a user’s behavior “looks” human. Like HIPs, HOPs have been used to detect bots in games [14, 56], including multiplayer online games [110]. However, behavior that merely looks human could be easily replicated by adding some randomness. Instead, we envision an approach that matches behavior not only to what resembles human behavior, but also by corresponding actions to known human cognitive processes. We call this type of security proof human subtlety proofs (HSPs). Human behavior models can be extracted from normal usage of input devices to obtain a snapshot of users’ perception and decision-making processes in a transparent and unobtrusive way for a given game task. Games can then scrutinize their players by making subtle changes that will influence human behavior in a measurable and predictable way, such as strategically disclosing information to players. These changes in behavior can then be inspected to differentiate players that have been exposed to the behavior-changing stimulus from those who have not, and also from bots that would not be able to respond in the way a human would.

In order for HSPs to be realized, their assumptions about the effects of these subtle interventions on behavior need to be tested. In this case study we show that different cognitive processes operating on the same task will produce distinct input-level behavior patterns. We also show that this behavior falls outside of human control, such that the modified behavior cannot be faked. We present an initial approach toward building human behavior models that would allow a game to identify different player behavior patterns based on their low-level usage of input devices, such as mouse motion, click patterns, and key presses. In this work we focus exclusively on mouse features. More specifically:

We hypothesize that (H1) different cognitive processes will translate into differences in how people use input devices, and that (H2) these differences cannot be hidden by people even if they try to conceal their behavior.

3.2 Method

To test our hypotheses, we conducted an experiment around a computer version of the Memory game [74], also known as the Concentration game. In this casual game, players must find matching pairs of tiles on a board by revealing one tile at a time until all pairs are found. We gave some players a modified version of the game that influenced the cognitive processes required to solve the task by controlling their ability to cheat through the use of a reveal mode, where they could see what each tile contained before revealing them. By controlling cheating in the game we can simulate cases of players having access to information that other players do not have in a natural way, effectively changing the task from one of location recall to one of visual identification. We can therefore test whether playing the game under different cognitive processes influences player behavior at the input level. In particular, in this case study we focus exclusively on mouse usage patterns.

3.2.1 Experimental Design

To explore how input-level behavior changes when information revealed to players is framed differently, we designed our study with four experimental conditions: 1) reveal mode disabled; 2) use of reveal mode discouraged; 3) use of reveal mode encouraged; and 4) reveal mode enabled, but neither encouraged nor discouraged. The difference between conditions was the language used when providing instructions in the in-game tutorial and game rounds.

In every condition, the instruction text for the tutorial always started with the following paragraph:

In this experiment, you will first play a tutorial round of Concentration to become familiar with the rules and controls of the game. The game board consists of a 4 × 4 grid of black tiles. A single left-click on any tile will show the underlying letter. You will then attempt to find a second tile that has the same letter as the first tile. If the pair is a match, the tiles will be removed from the board. If not, this is a mismatch — the tiles will be restored to their hidden state, and the game will continue on to the next turn. You will also lose some amount from your round reward, which will be displayed on the right. A round is over when all tiles have been cleared off the board.

Every condition added a different final paragraph, as described below. Similarly, the instructions provided before moving to the game rounds had the following text in all conditions:

You will now complete the actual exam, which consists of ten rounds. At this time, the investigator will leave the room. When you have finished all rounds, an on-screen message will indicate that the experiment has been completed and display your final compensation. If you find that you need to terminate the study early, please press the letter Q and you will be taken to the final screen. In either case, please open the door when you are finished with the study and the investigator will debrief you and instruct you on how to obtain your compensation.

Some conditions included a different additional paragraph in the game round instructions, as described below.

3.2.1.1 Reveal mode disabled

This is the control group. Participants in this condition could not enable reveal mode and a cheating detection system was not mentioned. Additionally, the game interface did not display the control to enable reveal mode. We will refer to this condition as Cheating disabled.

In this condition, the following text was appended to the tutorial instructions:

For this tutorial round, some tiles have already been removed. Please do not use any other controls. If the game is incorrectly exited, we will be unable to determine your actual compensation.

This condition did not include any additional text in the round instructions.

3.2.1.2 Use of reveal mode discouraged

Participants playing under this condition were told of the existence of a cheat detection module, and that this module was activated. In reality, no such module existed. The ability to use “reveal mode” was enabled. We will refer to this condition as Cheating discouraged.

The following text was appended to the tutorial instructions:

For this tutorial round, some tiles have already been removed. If your reward goes down to $0, you can use spacebar on the keyboard to reveal hidden tiles and finish off the rest of the round. This is called the “reveal mode”. Feel free to experiment with this option in the tutorial round. Please do not use any other controls. If the game is incorrectly exited, we will be unable to determine your actual compensation.

This condition showed the following additional text prior to the formal round:

The researchers are interested in improving the quality of a cheat detection system. For your round, we have enabled our cheat detection module. For purposes of our study, the cheating detection module only examines mouse operations, such as mouse movement and left-click. It does not monitor key presses. If the module is able to detect that you are cheating (only looking at mouse actions), then the round will end and you will receive $0 for the round.

3.2.1.3 Use of reveal mode encouraged

Participants playing under this condition were told of the existence of a cheat detection module, but were told that this module had been disabled for them. The ability to use “reveal mode” was enabled. We will refer to this condition as Cheating encouraged.

The tutorial instructions in this condition appended the same text as in the condition where reveal mode was discouraged. The following additional text was shown in the round instructions for this condition:

The researchers are interested in improving the quality of a cheating detection system. Some participants have a cheating detection system that is activated during their game session that examines mouse operations, such as mouse movement and left-click, to detect if they are cheating. For your round, however, we have disabled our cheating detection module.

3.2.1.4 Use of reveal mode allowed

This condition did not mention a cheat detection module at all, and the existence of a reveal mode was not explicitly stated. However, the ability to use “reveal mode” was enabled. We will refer to this condition as Cheating allowed.

The following text was appended to the tutorial instructions:

For this tutorial round, some tiles have already been removed. Please do not use any other controls. If the game is incorrectly exited, we will be unable to determine your actual compensation.

This condition did not include any additional text in the round instructions.

3.2.2 Population and Sampling

We used snowball sampling to recruit participants for our study. The recruiting message contained a link to a website where interested people could read the consent form and sign up for a study time slot. Participants were offered a base compensation of $5.00, plus up to an additional $2.00 for each game round they played, for a maximum total compensation of $25.00. Their compensation for a round began at the highest value ($2.00) and decreased by $0.10 (until it reached $0) for every mismatch that the participant made on that round.
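For concreteness, the round payout described above reduces to a simple clamped function. The following is an illustrative sketch, not code from the study’s implementation:

    def round_compensation(mismatches, start=2.00, penalty=0.10):
        """Payout for one round: starts at $2.00, loses $0.10 per
        mismatch, and never drops below $0."""
        return max(0.0, start - penalty * mismatches)

    # Example: 5 mismatches in a round earn 2.00 - 5 * 0.10 = $1.50.
    assert abs(round_compensation(5) - 1.50) < 1e-9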

Because our control group did not have the ability to enable reveal mode, participants in this condition were given an additional $5.00 on their base compensation, for a total of $10.00 base compensation. The compensation received for the round is displayed at the bottom right of the game screen, as shown in Figure 3.1.

Table 3.1 Participant assignment per condition.

Gender   Cheating Disabled   Cheating Discouraged   Cheating Encouraged   Cheating Allowed
Female    1                   4                      6                     0
Male     11                  11                      5                    11
Total    12                  15                     11                    11

Our implementation of the Concentration game randomly assigned participants to one of the conditions when they began the study. We only consider for analysis the 49 participants who completed all 10 rounds in the study. The number of participants per condition and gender is shown in Table 3.1. Because participants self-selected for the study, it is possible that the participant demographics are biased towards individuals who enjoy these types of games or have a better-than-average memory.

3.2.3 Description of the Environment

We implemented a version of the Concentration game in Adobe Flash. The game interface, shown in Figure 3.1, consists of a 4 × 4 grid of tiles, each being a 100-pixel square. Tile boundaries were represented with a 1-pixel line. When face down, a tile is black. When face up, the tile shows a white background with a single, centered black letter. To improve legibility, we used the letters A, B, C, E, H, I, P, and Q, which were presented in the Helvetica Neue LT Std 65 Medium typeface [83]. Each letter was used twice, resulting in 8 unique matching pairs of tiles. The location of each letter tile on the board was randomly chosen for each round.

A game round begins with all tiles facing down. Tiles are revealed by clicking on them. A turn begins when a player reveals the first tile of a pair, and ends when the player reveals the second tile of that pair. When two tiles are revealed, their contents are displayed for 1 second. At that point the tiles are turned face down (in the case of a mismatch) or cleared from the board (in the case of a match). The player may proceed to the next turn without waiting for the system to turn tiles back over by clicking on any face-down tile in the case of a match, or by clicking on any available tile in the case of a mismatch. A game round ends when every tile has been cleared from the board.

Figure 3.1 User interface of the Concentration game: (a) game screen with reveal mode disabled; (b) game screen with reveal mode enabled.
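For readers who prefer code to prose, the turn logic above can be summarized in a few lines. The class and names below are illustrative only and are not taken from the game’s Flash implementation; the 1-second display delay is omitted:

    class ConcentrationRound:
        """Minimal sketch of the matching logic of a Concentration round."""

        def __init__(self, letters):
            self.board = dict(enumerate(letters))  # tile index -> letter
            self.face_up = []                      # tiles revealed this turn

        def click(self, tile):
            if tile not in self.board or tile in self.face_up:
                return  # ignore cleared tiles and repeated clicks
            self.face_up.append(tile)
            if len(self.face_up) == 2:              # second reveal ends the turn
                a, b = self.face_up
                if self.board[a] == self.board[b]:  # match: clear both tiles
                    del self.board[a], self.board[b]
                self.face_up = []                   # mismatch: tiles turn back over

        def round_over(self):
            return not self.board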

Players with access to reveal mode could press the space bar on their keyboards to toggle it on and off. This mode allowed players to see the letter of every available tile while these were face down, essentially allowing them to cheat. To avoid confusion about the state of a tile, we used gray for the color of the letters of face-down tiles when reveal mode was enabled, as shown in Figure 3.1b.

3.2.4 Experimental Procedure

Participation in the study began with an optional demographics survey, followed by an in-game tutorial and practice round to become familiar with the game rules and mechanics. Any amount earned during the practice round was not counted toward the final compensation amount. The instructions provided during this tutorial varied per condition as described above. The practice round was followed by ten formal rounds of the game.

3.2.5 Evaluation Metrics

We instrumented our game to collect input-level behavior data in the form of mouse pointer motion and click events with millisecond precision. We associated these events with the state of the game, and of the board. For example, each recorded event also captured whether reveal mode was enabled, which tile was clicked or hovered, turn counts, etc. The game also recorded when reveal mode was toggled. While many of these data are game-specific, we are only considering for analysis input-level features that can be measured on any graphical user interface and are therefore task-independent.

Here we describe the features.

1. Time between clicks: This feature measures the time, in milliseconds, between two successive clicks. These times are averaged and reported as a single value per round.

2. Time between a click and a succeeding mouse movement: This measures the time, in milliseconds, after a tile is clicked and before the mouse is first moved. These time differences are also averaged and reported as a single value per round.

3. Count of change in direction of mouse motion: This measures the number of times the mouse is moved in the opposite direction of the current motion, either horizontally, vertically, or diagonally. For any mouse motion, its direction is determined by the preceding and current locations of the mouse pointer. If the directions of two consecutive mouse movements are different, the count is increased.

4. Screen region hover count: The game board can be naturally partitioned into 16 regions corresponding to the 16 tiles. This feature counts how many times the mouse pointer is moved from one region to another. Functionality on graphical user interfaces can usually be grouped into regions.

5. Task completion time: This feature measures the total time taken to complete a round, in milliseconds.

6. Total number of clicks: The total number of clicks observed during the round.
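To make these definitions concrete, the sketch below computes features 1, 3, 5, and 6 from a round’s event log. The event-record format is an assumption made for illustration and does not reflect the game’s actual instrumentation:

    # Each event is a tuple (timestamp_ms, kind, x, y), kind in {"move", "click"}.
    def extract_features(events):
        clicks = [t for t, kind, _, _ in events if kind == "click"]
        moves = [(t, x, y) for t, kind, x, y in events if kind == "move"]

        # Feature 1: mean time between successive clicks (ms).
        gaps = [b - a for a, b in zip(clicks, clicks[1:])]
        mean_click_gap = sum(gaps) / len(gaps) if gaps else 0.0

        # Feature 3: count of changes in direction of mouse motion,
        # using the sign of the horizontal and vertical displacement.
        def direction(p, q):
            return ((q[1] > p[1]) - (q[1] < p[1]),
                    (q[2] > p[2]) - (q[2] < p[2]))

        dirs = [direction(p, q) for p, q in zip(moves, moves[1:])]
        changes = sum(1 for d, e in zip(dirs, dirs[1:]) if d != e)

        # Feature 5: task completion time (ms); Feature 6: total clicks.
        completion = events[-1][0] - events[0][0] if events else 0
        return {"time_between_clicks": mean_click_gap,
                "direction_changes": changes,
                "completion_time": completion,
                "total_clicks": len(clicks)}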

3.3 Analysis and Results

All the above features are calculated for each round (i.e., every round played is considered a data point). For a total of 10 rounds per participant and 49 participants we analyzed 490 labeled data points with 6 features each. In addition to looking at our data per condition, we also found it useful to categorize rounds based on the type of player interaction. We did this by looking at whether reveal mode was used or not during a round and defined 3 types of rounds: 1) when reveal mode was disabled for an entire round we call that a no reveal round, 2) when reveal mode was enabled for the entire duration of a round we call that a full reveal round, and 3) when reveal mode was both enabled and disabled during a single round we call that a partial reveal round. During our analysis, we also looked at the combination of full reveal and partial reveal rounds as a single class. We will refer to this class as mixed reveal rounds. The use of reveal mode changes the cognitive task of the game from one of memory storage and retrieval (when reveal mode is disabled) to one of visual search (when reveal mode is active). Thus, these subsets of data are actually reflective of different cognitive processes.
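Labeling rounds this way only requires knowing how much of the round reveal mode was active. A minimal sketch, assuming the log exposes that fraction, is:

    def round_type(reveal_fraction):
        """Map the fraction of a round spent in reveal mode to a label."""
        if reveal_fraction == 0.0:
            return "no reveal"
        if reveal_fraction == 1.0:
            return "full reveal"
        return "partial reveal"  # reveal mode toggled during the round

    # "Mixed reveal" groups the full and partial reveal labels into one class.
    def is_mixed_reveal(label):
        return label in ("full reveal", "partial reveal")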

Table 3.2 shows the average values of all the relevant features on different round types. For illustrative purposes, Figure 3.2 shows visualizations of each type of round.

For our analysis we used Python’s Scikit-learn 0.15.2 [89] implementation of the Random Forest classifier [23] with 1000 estimators and the rest of the parameters set to their default values. We chose random forests because they provide good classification accuracy with a relatively simple implementation, and because individual trees can be inspected when tweaking parameters to improve accuracy, if desired. We used 10-fold cross-validation to evaluate our models and calculated accuracy values per player, and then averaged these values per condition and/or round type to obtain a total accuracy. We also calculated values for classification precision, recall, and F-score [100]. We divided our analysis into three parts. A summary of the types of rounds considered as classes for each of the three analyses is shown in Table 3.3.
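A minimal version of this training and evaluation loop, written against the current scikit-learn API rather than the 0.15.2 release used in the study, and with placeholder file names, could look like this:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Placeholder inputs: X holds the six per-round features (490 x 6),
    # y holds the round-type labels (e.g., "no reveal" / "mixed reveal").
    X = np.load("round_features.npy")
    y = np.load("round_labels.npy")

    clf = RandomForestClassifier(n_estimators=1000, random_state=0)
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
    print(f"mean accuracy: {scores.mean():.4f}")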

Figure 3.2 Visualizations of mouse motion and click activity (mouse_x vs. mouse_y) on different types of rounds: (a) no reveal round; (b) partial reveal round; (c) full reveal round. Notice in Figure 3.2c that the player made four mismatches despite being able to see the contents of all tiles for the entire duration of the round, supporting the idea that they were trying to conceal their behavior.

Table 3.2 Global averages of features per type of round. In parentheses, standard deviation values.

Feature                                                     No Reveal             Mixed Reveal          Partial Reveal        Full Reveal
Time between clicks (ms)                                    1726.84 (686.20)      2360.06 (1814.52)     2728.32 (1739.83)     1721.97 (1763.61)
Time between a click and a succeeding mouse movement (ms)   279.36 (164.64)       301.49 (487.16)       367.93 (581.04)       186.37 (206.55)
Count of change in direction of mouse motion                389.87 (148.16)       299.64 (225.69)       310.12 (182.52)       281.50 (284.53)
Screen region hover count                                   119.77 (36.91)        78.41 (45.49)         84.33 (41.29)         68.16 (50.34)
Task completion time (ms)                                   50420.80 (19786.68)   45809.62 (32749.42)   54850.41 (31071.80)   30144.89 (29513.26)
Total number of clicks                                      29.14 (6.51)          19.10 (4.98)          20.50 (5.61)          16.67 (2.02)
Instances                                                   214                   276                   175                   101

Table 3.3 Assignment of round types to classes for each analysis. In parentheses, percentage of rounds in each class for each analysis.

Round type       Analysis 1         Analysis 2         Analysis 3
No reveal        Class 1 (43.67%)   Class 1 (67.94%)   Class 1 (43.67%)
Full reveal      –                  Class 2 (32.06%)   Class 2 (20.61%)
Partial reveal   –                  –                  Class 3 (35.71%)
Mixed reveal     Class 2 (56.33%)   –                  –


Analysis 1: In the first part, we considered mixed reveal rounds as one class and rounds where reveal mode was never used (no reveal) as the second class. Using the model trained for this binary classification we obtained 89.18% accuracy with an F-score of 0.88. Precision and recall values were 0.83 and 0.95, respectively.

Analysis 2: In the second part of our analysis, we removed rounds of type partial reveal, therefore considering only the no reveal and full reveal round types for classification, obtaining a binary classification accuracy of 98.73% with an F-score of 0.98. Precision and recall values were 1.00 and 0.96, respectively.

Analysis 3: The third part of our analysis considered rounds of type partial reveal, no reveal, and full reveal as three separate classes. In this case, the accuracy was 80.61%. The no reveal class had precision, recall, and F-score values of 0.83, 0.92, and 0.87, respectively. The full reveal class had precision, recall, and F-score values of 0.80, 0.79, and 0.80, respectively. The partial reveal class had precision, recall, and F-score values of 0.76, 0.89, and 0.82, respectively.

In these three analyses we also looked at individual accuracy values per experimental condition.

We used the same models created for each analysis (trained with rounds from all conditions) and attempted to classify rounds of each condition separately. Table 3.4 shows a summary of our classification accuracy results per round type and condition, including precision, recall, and F-score values for Analyses 1 and 2. Table 3.5 shows a summary of our classification accuracy results per round type and condition for Analysis 3, including precision, recall, and F-score values for each of the classes.

Table 3.4 Accuracy results for our first and second analyses per condition.

Classification type   Experimental Condition   Instances   Accuracy   Precision   Recall   F-score
Analysis 1            Cheating disabled        120          93.33%    0.93        1.00     0.97
                      Cheating discouraged     150          84.00%    1.00        0.63     0.77
                      Cheating encouraged      110          93.64%    0.94        0.87     0.91
                      Cheating allowed         110          87.27%    0.94        0.55     0.70
                      Global                   490          89.18%    0.83        0.95     0.88
Analysis 2            Cheating disabled        120         100.00%    1.00        1.00     1.00
                      Cheating discouraged      87          95.40%    1.00        0.91     0.95
                      Cheating encouraged       67         100.00%    1.00        1.00     1.00
                      Cheating allowed          41         100.00%    1.00        1.00     1.00
                      Global                   315          98.73%    1.00        0.96     0.98

Table 3.5 Accuracy results for our third analysis per condition. For each class in our classification, Pn columns show precision values, Rn columns show recall values, and Fn columns show F-scores (class 1: no reveal; class 2: full reveal; class 3: partial reveal).

Experimental Condition   Instances   Accuracy   P1     R1     F1     P2     R2     F2     P3     R3     F3
Cheating disabled        120         90.83%     1.00   0.88   0.94   N/A    N/A    N/A    N/A    N/A    N/A
Cheating discouraged     150         72.67%     0.65   0.95   0.77   0.85   0.72   0.78   0.75   0.81   0.78
Cheating encouraged      110         85.45%     0.84   1.00   0.91   0.84   0.87   0.86   0.89   0.81   0.85
Cheating allowed         110         75.45%     0.58   0.88   0.70   0.69   0.83   0.75   0.89   0.80   0.84
Global                   490         80.61%     0.83   0.92   0.87   0.80   0.79   0.80   0.76   0.89   0.82

We have also included histograms of each feature containing all data points for Analysis 1 (Appendix A.1), for Analysis 2 (Appendix A.2), and for Analysis 3 (Appendix A.3).

3.4 Discussion

The above analysis shows that the method we present in this case study to model input-level data can be used to differentiate higher-level decision making processes with a high degree of accuracy, in agreement with our hypothesis H1. In other words, player knowledge in the form of revealed tiles modifies the cognitive process required to complete the game task, which has a predictable effect on mouse usage behavior patterns. Furthermore, these behavior differences can be detected unobtrusively by training computational models on input-level data. In agreement with our thesis, these changes in behavior are not only predictable, but are exhibited when interacting with our game using the same mechanics.

By looking at the data we collected from three separate angles through our three analyses we gain a better understanding of our results. The accuracy values we obtained in the second analysis were higher than those of the first one. This can be explained by understanding the nature of the partial reveal rounds included in our first analysis. During these rounds, players used reveal mode for some of the round time. While reveal mode was on, players had to visually search for matches, whereas when reveal mode was off they had to rely on their memory. Turning reveal mode off thus results in a transition from a cognitive state that relies on visual perception to a cognitive state that relies on memory. The opposite transition happens when going from having reveal mode off to turning it on. The first transition would occur even after having seen all the tiles, since working memory decays in a short amount of time. However, if reveal mode was turned off late in a round when only a few tiles were left, the effect of memory decay on the task would be smaller. In our analyses we did not look at the different times in the rounds at which reveal mode was turned on or off. Due to these variations, the input behavior observed during partial reveal rounds is not consistent and hence harder to detect. Also, because we average feature values over the entire round, some of the distinctive patterns are diluted, decreasing prediction accuracy.

By considering rounds of type full reveal in our second analysis we exclude both the blend of patterns that occurs in partial reveal rounds and the dilution that occurs when averaging the features over whole rounds, resulting in increased accuracy. A similar argument can be used to explain the results of our third analysis. Since we considered three classes in this third case, the classification problem increases in complexity. Because of this, accuracy values in this analysis are lower than in the first two.

Throughout our three analyses, prediction accuracy is consistently lowest (comparatively) in the condition where the use of reveal mode was discouraged. In this condition, players were told that a cheating detection system was active, and that if detected, they would lose their compensation for that round. Participants were unaware of how such a cheating detection system was supposed to work. To avoid losing their earnings, players who chose to use reveal mode were expected to behave as they thought they would have behaved had they not enabled reveal mode. If these players had been able to conceal their cheating behavior, our models would have classified these rounds as no reveal. However, our results show that even in this extreme case our models were able to differentiate mouse usage patterns with high accuracy, in agreement with our hypothesis H2. Conversely, the prediction accuracy of the three analyses is consistently highest in the condition where the use of reveal mode was disabled. This is also expected because all of the rounds in this condition belong to the no reveal class, where our features generally show much less variability. The inability of players to conceal their behavior supports the argument that input-level data patterns can be inspected to scrutinize players unobtrusively.

A plausible explanation for the higher accuracy obtained in the condition where the use of reveal mode was encouraged, compared with the condition where its use was simply allowed, is the explicit statement in the game instructions for the encouraged condition that there would be no consequences to using this mode during the game. We suspect that the lack of information about reveal mode in the condition where its use was simply allowed made it unclear to players whether there would be any adverse consequences of its use. Therefore, participants in this condition may have felt inclined to conceal their use of reveal mode, bringing their interaction patterns closer to those of players in the condition where reveal mode was discouraged. Evidence for this explanation can be found in the visualization of a full reveal round in Figure 3.2c showing that, even though the player could see the contents of all the tiles for the entire duration of the round, she still made four mismatches.

While this work lays the foundation toward building more complex models that could identify different cognitive states and cognitive processes of players based on their input-level behavior patterns, our method can be readily used to detect abnormal behavior, such as differentiating bots from human players, or players with access to information that other players do not have. In other words, it is possible to influence behavior that manifests at the input level to scrutinize players, allowing game designers to gain player-specific insights that can be used to tailor game experiences to characteristics of specific players. The features we used in this case study can be applied to any game that requires interaction through a computer mouse. These features capture many nuances of human cognitive processes, such as the time it takes to visually process information or to make decisions about a next action. We have shown that these subtle variations in interaction patterns can be used to scrutinize human behavior as well as to modify the difficulty of the game without altering how it is played.

Chapter 4

Case Study: The Typing Game

4.1 Introduction

This case study shows how player knowledge can be leveraged to unobtrusively influence input-level behavior with a difficulty or scrutiny purpose. More specifically, we show how keystroke patterns are altered by how familiar players are with game content, which can be used to influence players’ performance, but also to gauge their familiarity with this game content. Additionally, when combined with known human cognition phenomena, these input-level patterns can be used, for example, to distinguish real human players from automated agents.

We present early exploratory work toward creating cognitive models of keystroke behavior that can be applied to, among other things, identifying the cognitive processes at play when users are using a computer keyboard. In particular, we focus on the use of a keyboard as a window into behavior patterns that are reflective of the user’s familiarity with the elements being typed in a computer game we call The Typing Game. By manipulating the players’ familiarity with the words in our game through their similarity to dictionary words, and by allowing some players to replay rounds, we expect to identify distinct patterns in how they type these words under these different conditions.

Once established, these input-level data patterns can be used to scrutinize player behavior not only in terms of their familiarity with game content, but also by comparing to predictions of cognitive models in order to determine if the player is a human or a bot.

Transcription typing has been well-studied, with some work looking at how typing speed varies with unfamiliar material [104]. Salthouse [104] observes 12 “basic phenomena” about typing, one of which describes the reduction in typing speed when an expert typist is presented with random sequences of letters. John [71] introduced a model based on the Model Human Processor (MHP) [25] called the TYPIST model, which applies the MHP to the typing tasks of skilled transcription typists and “can be used to make quantitative predictions of performance on typing tasks”. It processes text at the level of chunks, which could be words, syllables, or letters. TYPIST has been applied to several common typing tasks, and its predictions of typist performance come to within 20% of empirical measurements. This previous research has focused on skilled or “expert” typists, with little work exploring input-level typing patterns of average users. While Feit et al. [49] recently explored the mechanics and high-level strategies of everyday typists, the cognitive processes involved in typing different types of content remain largely unexplored.

To explore input-level typing patterns and their relationship to cognition, our game involved typing words of different lengths with varying word shapes [22]. We recorded typing speed and accuracy expecting an improvement as the rounds were replayed, as well as better speed and accuracy while typing words more similar to dictionary words. By changing the nature of the words being typed, we were able to influence the cognitive process required to type them, allowing us to measure how the differences in cognition are reflected in input-level behavior. Formally, our hypotheses are:

H1: Practice increases speed – The average inter-keystroke interval (IKI) in a round will be smaller when replaying.

H2: Practice increases accuracy – The average number of mistakes in a round will be smaller when replaying.

H3: Familiar words are typed faster – The average IKI of a word will be smaller the closer the word is to a dictionary word.

H4: Familiar words are typed more accurately – The average number of mistakes made when typing a word will be smaller the closer the word is to a dictionary word.

4.2 Method

To evaluate our hypotheses we designed and implemented a casual game we call The Typing Game.

The goal of the game is to type words that appear on the game board as fast as possible. Game rounds consist of sets of between 1 and 4 words that are initially shown on the first row of a grid, one per column, and drop down one row at periodic intervals until they are correctly typed or fall off the board. Words that are correctly typed immediately disappear from the board. Words that are completed in higher rows earn a higher score. If a mistake is made while typing a word, the word must be typed again starting with the first letter. Because of the exploratory nature of this work, we focused primarily on establishing internal validity.

4.2.1 Experimental Design

Participants were randomly assigned to one of three experimental treatments.

• Replay not allowed: Participants were not allowed to voluntarily replay any rounds.

• Replay encouraged: Participants were allowed to voluntarily replay any round an unlimited number of times. After each round, the game showed both the key combination to press in order to advance to the next round and the key combination to press in order to replay the round.

• Replay allowed: Participants were allowed to voluntarily replay any round an unlimited number of times. At the end of each round, the game showed both the key combination to press in order to advance to the next round and the key combination to press in order to replay the round, but the latter was displayed as if it were inactive (grayed out) despite being functionally equivalent to the replay encouraged treatment.

We found that participants in the replay allowed treatment never attempted to replay a game round and therefore behaved in the same way as participants in the replay not allowed treatment.

We believe that displaying the replay prompt as inactive was enough to make participants believe that they did not have the ability to use that feature. For the purpose of our analyses, we will treat all participants in these two treatments as a single group. We will refer to these groups as the replay and no replay conditions based on whether they voluntarily replayed rounds or not, respectively.

4.2.2 Population and Sampling

We targeted computer users of at least 18 years of age, and recruited using a combination of convenience and snowball sampling. We advertised our study primarily to the Computer Science student body at NC State University, but also posted fliers on nearby bulletin boards. Participants were offered a base compensation of $5.00 and up to an additional $2.00 for each game round they completed, for a maximum total of $25.00 based on their gameplay performance. Interested individuals were asked to sign up online for an available time slot and location.

Our sample consisted of 43 participants, of which 14 were female and 29 were male. Before the game, we asked the participants to rate their typing skills by choosing one of these options: Beginner, Intermediate, Advanced, or Expert. Of the females, 8 reported their skills as intermediate, and 6 as advanced. In the case of the male participants, 2 reported their skills as beginner, 14 as intermediate, and 13 as advanced. The average age of the female participants was 23.57 (SD = 2.53) years, and for the males, it was 23.90 (SD = 2.13) years. There were 16 participants in the replay group and 27 participants in the no replay group.

4.2.3 Description of the Environment

Our implementation of The Typing Game was written in Adobe Flash CS5.5 and was designed to run in a Web browser. Words on the game board are shown on a 4 × 4 grid with a black background, where each cell is 200px wide and 100px tall, as shown in Figure 4.1. A cell with an untyped word will have a gray background. Every word uses the Consolas font in 18 point. The color of the font is initially black, but as a word is typed, the color of correctly-typed letters changes to a dark gray to show progress.

Figure 4.1 Screenshot of the game board showing a set of a single word on the fourth column and second row of the grid.

To ensure that the game screen had input focus and that the keyboard input was received by our game, the first screen prompted the player to press the SHIFT-N key combination to begin. Every round begins at the highest score ($2.00) and decreases by $0.05 (until it reaches $0) every time a set of untyped words drops down one row. For a player to earn the maximum score, she has to type every word correctly while it is still on the first row. The current amount to be earned for a round is displayed at the bottom right of the game screen (see Figure 4.1) and is updated as the words drop. The game included a practice round that accurately simulated the mechanics that the player would experience in the game rounds. In order to advance to the game rounds, the player was required to earn a score of at least $1.70 during the practice, and was required to replay the round until she did. The money earned during the practice round did not count toward the player’s final compensation.

Each round had a staging screen that prompted the player to press the space key to begin. After each round, a summary screen presented the round number, the amount earned, and a prompt to press the SHIFT-N key combination in order to proceed to the next round (or end the game if on the final round). Depending on the experimental condition, some players had the option to replay rounds by pressing the SHIFT-R key combination during a round’s summary screen. The score earned for a round would be the one obtained on the last replay of that round, regardless of whether it was lower or higher than the score obtained in previous attempts. In addition to the practice round, the game consisted of a total of 10 game rounds, which did not require a minimum score.

A single game round contains multiple word sets that initially appear on the first row, but on different columns, of the game board grid. Our game consisted of 10 rounds varying the type of words and their length. We designed our rounds with four types of words, all in lowercase: 1) dictionary words (e.g., “quit”), 2) dictionary words with one or more transposed letters, preserving the general shape of the word (e.g., “tiem” for time), 3) dictionary words with one or more transposed letters, breaking the general shape of the word (e.g., “gluf” for gulf), and 4) words composed of random letters, filtering out common bi-grams and tri-grams to avoid confounding our variables. The idea behind the differences in word choices was to explore how the similarity of the word being typed to a real word affected the typing patterns. For the same reason, our rounds had different word lengths (short, medium, and long, as shown in Table 4.1, with 3-4, 4-5, and 5-6 characters, respectively).

4.2.4 Experimental Procedure

The researchers asked participants to meet them at a designated room during a time slot previously agreed upon. After providing informed consent, participants were given the opportunity to ask questions before moving on to the data collection phase. At this time, the researchers would instruct participants to sit in front of a computer that was previously set up to run our game using the Google Chrome browser in full-screen mode. This computer was instrumented with a USB Microsoft Wired Keyboard 600 configured with US American visual and functional keyboard layouts. The researchers asked participants to notify them once they reached the final screen of the game and stepped out of the room, leaving the participants alone with no distractions.

Table 4.1 Description of the rounds in our Typing Game.

Round      Round Name   Word Length   Word Type
Practice   Practice     Short         Dictionary word
1          DictM        Medium        Dictionary word
2          ShapeS       Short         Transposed letters preserving word shape
3          ShapeM       Medium        Transposed letters preserving word shape
4          ShapeL       Long          Transposed letters preserving word shape
5          NoShapeS     Short         Transposed letters breaking word shape
6          NoShapeM     Medium        Transposed letters breaking word shape
7          NoShapeL     Long          Transposed letters breaking word shape
8          RandS        Short         Random letters
9          RandM        Medium        Random letters
10         RandL        Long          Random letters

The game first presented a small demographics survey, followed by a small survey that asked about the player’s background and typing habits. Next, the game asked the participant to type the sentence “the quick brown fox jumps over the lazy dog”. This sentence was used to ensure that the keyboard was working properly. Once this sentence was correctly typed, the player was prompted to press the SHIFT-N key combination to proceed to an in-game tutorial. The player was asked to press the space key to begin the tutorial, which started by explaining the game mechanics in an interactive manner prompting the player to type the word “go” in order to move to the next screen.

This illustrated how words were removed from the game board once they were correctly typed. The next screen in the tutorial illustrated how sets of words would drop down one row at every time interval, and how drops affected scoring. This was followed by the practice round, after which the actual game began.

Once they completed the game, participants would notify the researchers who would then record the participant’s earnings and a unique game-generated code from the last game screen onto a paper form that participants would later use to collect their compensation. The purpose of this last step was to avoid associating participants with the data that was collected from their participation.

4.2.5 Evaluation Metrics

We had three independent variables in our experiment: 1) the Round of our game being played, which modified the difficulty of the game via the length and type of words that players had to type, 2) the Condition, which dictated the player’s ability to replay rounds, and 3) the Attempt, which indicates how many times the round has been played. In the case of the no replay condition, the value of the Attempt of a game round is always 1. By allowing some participants to replay rounds, we can explore how increased familiarity affects input-level behavior.

Our implementation of The Typing Game captured the “key down” and “key up” keyboard events, causing each keystroke to be recorded as two events. In addition to the key that generated the event, our game also collected a timestamp, with millisecond precision, of when each event occurred. Each key event was also associated with the round or screen active when it occurred, with any word on the board to which it may have corresponded, whether the keystroke was correct or not, and whether it completed a word on the board. These input-level data allow us to calculate higher-level metrics. In particular, in this case study we define the following analytics:

• Inter-keystroke interval (IKI): the number of milliseconds elapsed between the “key down” events of each contiguous pair of keystrokes in a correctly-typed word. For the purposes of this metric, we excluded events from words that were typed with mistakes.

• Number of mistakes: the count of keystrokes during a round that did not clear the game board, or that did not result in the board being one character closer to being cleared. For the purposes of this metric, we did not count whitespace characters as mistakes.
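As an illustration of the IKI definition, the computation reduces to pairwise differences over “key down” timestamps within correctly-typed words. The input format below is an assumption made for the sketch:

    def word_ikis(key_down_times):
        """IKIs (ms) for one correctly-typed word, given its ordered
        "key down" timestamps in milliseconds."""
        return [b - a for a, b in zip(key_down_times, key_down_times[1:])]

    def mean_iki(words):
        """Average IKI over a collection of correctly-typed words."""
        ikis = [iki for word in words for iki in word_ikis(word)]
        return sum(ikis) / len(ikis) if ikis else 0.0

    # Example: "go" typed with key downs at 1000 ms and 1180 ms
    # yields a single IKI of 180 ms.
    assert word_ikis([1000, 1180]) == [180]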

The IKI is a common metric for typing speed [104], while the number of mistakes is a natural metric for typing accuracy. We will refer to typing speed as the inverse of the IKI, where a smaller IKI represents an increase in speed (and vice versa), and to typing accuracy as the inverse of the number of mistakes made, where fewer mistakes indicate a higher accuracy (and vice versa).

4.3 Analysis and Results

To ground the internal validity of our study with respect to both speed and accuracy, we compared the first attempt of the practice round between the replay and no replay conditions. Because the game experience for both conditions is identical at this point in the game, we expected no substantial difference between the two. To evaluate the significance of the difference in the average IKI between the replay (M = 164.32, SD = 92.45) and no replay (M = 163.62, SD = 121.77) conditions on the first attempt of the practice round, we conducted a Welch’s independent-samples t-test, which revealed no significant difference in speed (t(1170.2) = −0.12995, p = 0.8966). To evaluate the significance of the difference in the average number of mistakes between the replay (M = 5.44, SD = 5.51) and no replay (M = 5.78, SD = 5.89) conditions on the first attempt of the practice round, we conducted a Welch’s independent-samples t-test, which revealed no significant difference in accuracy (t(33.331) = 0.19074, p = 0.8499).

Having established the first attempt of the practice round as a valid baseline across our experimental conditions, we used each player’s average IKI and number of mistakes on this attempt to normalize their game rounds’ IKI and number of mistakes, respectively, by dividing each measured value by the corresponding baseline average. We use these normalized values for the rest of our analyses. Descriptive statistics for IKI and number of mistakes made in each round by attempt are shown in Table 4.2 and Table 4.3, respectively.
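Both the Welch’s t-tests and the per-player normalization can be reproduced with standard tools. The sketch below uses SciPy; the array contents and names are illustrative rather than taken from our analysis scripts:

    import numpy as np
    from scipy import stats

    replay_iki = np.array([164.3, 150.2, 171.8])     # example values only
    no_replay_iki = np.array([163.6, 180.1, 158.9])  # example values only

    # equal_var=False selects Welch's t-test rather than Student's.
    t, p = stats.ttest_ind(replay_iki, no_replay_iki, equal_var=False)

    def normalize(round_values, baseline_mean):
        """Divide a round's measurements by the player's practice baseline."""
        return np.asarray(round_values) / baseline_mean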

4.3.1 Improvement with Practice

This part of the analysis focuses on the replay condition, as it was the only one that allowed replaying rounds. Even though participants in the replay condition were allowed to replay as many times as they wanted, the most any participant replayed a single round was 8 times. However, because at most 3 participants replayed a single round more than 4 times, we decided to focus on the first 4 attempts in our analysis.

Table 4.2 Normalized mean and standard deviation of the inter-keystroke interval of participants in the “replay” condition on each of the first four attempts of every round.

Round      Attempt 1      Attempt 2      Attempt 3      Attempt 4
           M     SD       M     SD       M     SD       M     SD
DictM      1.03  0.58     1.02  0.61     0.96  0.63     0.96  0.41
ShapeS     1.22  0.69     1.18  0.60     1.12  0.77     1.15  0.48
ShapeM     1.29  0.90     1.24  0.95     1.13  0.70     1.05  0.55
ShapeL     1.52  1.07     1.40  0.92     1.36  0.86     1.25  0.75
NoShapeS   1.21  0.65     1.15  0.55     1.06  0.50     1.05  0.60
NoShapeM   1.37  0.94     1.26  0.72     1.19  0.72     1.11  0.68
NoShapeL   1.46  0.99     1.42  0.82     1.21  0.64     1.22  0.75
RandS      1.41  0.89     1.25  0.72     1.18  0.55     1.24  0.80
RandM      1.68  1.36     1.58  1.03     1.37  0.85     1.54  1.50
RandL      1.95  1.49     1.91  1.45     1.77  1.14     1.56  1.03

Our hypothesis H1 expects there to be an improvement in speed as rounds are replayed. We conducted a factorial ANOVA to examine the effects of Attempt and Round on the IKI. The results yielded a main effect for the attempt (F(1, 15471) = 102.4765, p < 0.001), indicating that the typing speed of participants significantly increased (i.e., the IKI decreased) the more rounds were replayed. The main effect of the round was also significant (F(9, 15471) = 102.4765, p < 0.001). The interaction effect was non-significant (F(9, 15471) = 0.8436, p > 0.1). This result is consistent with our hypothesis H1.

Our hypothesis H2 expects there to be an improvement in accuracy as rounds are replayed. As before, we conducted a factorial ANOVA to examine the effects of Attempt and Round on the number of mistakes made. The results yielded a main effect for the round (F(9, 284) = 3.2348, p < 0.001), indicating that the typing accuracy of participants is significantly dependent on the round being played. The main effect of the attempt was not significant (F(1, 284) = 0.0693, p > 0.1). The interaction effect was also non-significant (F(9, 284) = 0.4621, p > 0.1). This result contradicts our hypothesis H2.
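A factorial ANOVA of this form can be expressed compactly with statsmodels; the data frame and column names below are assumed for illustration:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    df = pd.read_csv("keystrokes.csv")  # one row per IKI observation

    # C(round) treats the round as categorical; attempt is numeric (1-4),
    # and the * operator includes both main effects and their interaction.
    model = ols("iki ~ attempt * C(round)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))  # F statistics and p-values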

Table 4.3 Normalized mean and standard deviation of the number of mistakes made by participants in the “replay” condition on each of the first four attempts of every round.

Round      Attempt 1       Attempt 2      Attempt 3      Attempt 4
           M      SD       M     SD       M     SD       M     SD
DictM      2.85   2.74     2.98  2.86     2.98  3.15     2.2   N/A
ShapeS     3.17   3.85     2.40  1.69     2.24  1.043    2.2   0.28
ShapeM     2.13   2.07     5.07  4.20     3.39  2.94     5     N/A
ShapeL     5.68   7.47     5.09  3.35     3.08  1.88     4.38  4.14
NoShapeS   2.46   2.18     2.69  1.54     2.65  2.07     4.65  5.93
NoShapeM   3.767  3.48     3.30  1.97     3.06  1.65     5.05  0.64
NoShapeL   6.19   6.63     4.10  2.34     5.17  3.10     6.44  5.24
RandS      1.91   2.06     1.99  1.90     1.87  1.46     2.18  1.78
RandM      4.94   3.94     4.03  2.13     5.48  1.90     4.37  4.20
RandL      4.72   4.89     2.71  1.81     4.03  3.52     6     2.83

4.3.2 Familiarity with Words

For this analysis we look at how the different types of words in our game rounds affected speed and accuracy. In particular, we expected words that are more similar to real words to be typed faster (H3) and more accurately (H4). In decreasing order of similarity to real words we have dictionary words (DictM), dictionary words with transposed letters preserving the shape of the word (ShapeS, ShapeM, and ShapeL), dictionary words with transposed letters breaking the shape of the word (NoShapeS, NoShapeM, and NoShapeL), and random letters (RandS, RandM, and RandL).

The average IKI increased as the words participants typed became less similar to real words (see Figure 4.2a), as predicted by H3. To evaluate the significance of this difference we conducted a factorial ANOVA that explored the effects of word length, word type, and condition on IKI. The results yielded statistically significant interactions between word type and word length (F(4, 28374) = 22.9631, p < 0.001), between word length and condition (F(2, 28374) = 9.2675, p < 0.001), and between word type and condition (F(3, 28374) = 10.4835, p < 0.001). The interaction between word length, word type, and condition was not significant (F(4, 28374) = 0.1962, p > 0.1). Simple main effects analysis showed significant differences in speed dependent on word length (p < 0.001), word type (p < 0.001), and condition (p < 0.001). This result is consistent with hypothesis H3.

Figure 4.2 Comparison of the average normalized inter-keystroke interval (a) and normalized number of mistakes (b) by word type on the first attempt of every round.

Our hypothesis H4 expects participants to be more accurate on words that are closer to dictionary words. Figure 4.2b shows the number of mistakes made by our participants according to the type of word being typed. To evaluate these differences we conducted a factorial ANOVA that explored the effects of word length, word type, and condition on the number of mistakes made. The results yielded a statistically significant interaction between word type and word length (F(4, 554) = 3.0836, p = 0.01578). None of the other interactions were significant. Simple main effects analysis showed a significant difference in accuracy dependent on word length (F(2, 554) = 16.8025, p < 0.001). We found no statistically significant difference in accuracy dependent on the type of word. The lack of significance of the effect of word type contradicts our hypothesis H4.

4.3.3 Additional Analyses

To obtain more insight we ran additional tests to compare the replay and no replay conditions on both speed and accuracy metrics, both on the first attempt of every round, and with up to 4 replays (for the replay condition; the no replay condition only had one attempt per round).

When comparing the first attempt of every round between conditions we found that the mean IKIs of the replay condition were consistently smaller than those of the no replay condition (see Figure 4.3a). Using a Welch’s independent-samples t-test, we found a significant difference in speed on the first attempt of every round between conditions (t(18384) = 10.236, p < 0.001). In contrast, as shown in Figure 4.3b, we do not see a clear distinction when comparing the number of mistakes made on the first attempt of every round between conditions. To determine the significance of this difference we conducted a Welch’s independent-samples t-test, which revealed no significant difference in accuracy on the first attempt of every round between conditions (t(350.59) = −0.59744, p = 0.5506).

Figure 4.3 Comparison of the average normalized inter-keystroke interval (a) and normalized number of mistakes (b) by condition on the first attempt of every round. The vertical lines separate rounds by word type.

4.4 Discussion

The above analysis confirms that input-level behavior in the form of typing speed improves with practice and when the words are more familiar. Surprisingly, we find that this improvement in speed is not accompanied by an improvement in typing accuracy, either with practice or with greater familiarity with the words being typed. The number of mistakes made cannot be used to explain the reduction in IKI.

We saw that on the first attempt of the practice round, where the game experience is identical across conditions, all of our participants behaved similarly. However, as the game progresses, participants in the replay condition significantly increase their typing speed without any improvement in the number of mistakes they make, indicating that the speed improvement is not attributable to an increase in accuracy. Because the only difference between conditions is the ability to replay rounds, a plausible explanation for this behavior lies in the fact that the cost (in terms of mathematical utility) of making mistakes is smaller than the reward of earning a higher compensation by typing faster, because the opportunity to replay the round is always there. This behavior is consistent with research on task accomplishment strategies, where there exists a trade-off between speed and accuracy [15, 55, 67]. We find support for this explanation in our data when we compare the typing speed on the first attempt of every round between the replay and no replay conditions (see Figure 4.3a). On these first attempts, players in both conditions have had the same exposure to the words on each round, ruling out familiarity as an explanation for the significant difference in speed between conditions. We see that participants in the replay condition are consistently and significantly faster than participants in the no replay condition after being exposed to the possibility of replaying, whereas this difference is non-existent on the first attempt of the practice round, where they had not yet been exposed to this game mechanic.

Our results show that player behavior can be predictably and unobtrusively influenced by manipulating familiarity. Typing speed has a more direct relationship to the nature of what is being typed than the number of mistakes made while typing. This suggests that, by inspecting typing speed, a game can be more effective at detecting keystroke pattern anomalies (and possibly identifying the cause of the anomaly) than by looking at the number of incorrect attempts alone.

Therefore, our results indicate that typing speed can be used to scrutinize the player’s familiarity with the text being typed, by comparing their input-level data patterns to a known baseline.

There are several limitations to consider when interpreting our results. As mentioned earlier, we focused on establishing internal validity in this case study, taking first steps toward building cognitive models of input device interaction patterns. Firstly, because our sample comprised mostly Computer Science students, the typing proficiency of our participants is probably well above average, which is a threat to the external validity of our findings. Secondly, our game did not attempt to establish ecological validity, but was instead designed to elicit specific behaviors that manipulated the cognitive processes required to complete the game rounds. Thirdly, the nature of the words included in our game was also intentionally limited, and did not include numbers, uppercase letters, or special characters. Despite these limitations, the empirical data we collected in this case study will allow us to generate cognitive models from interaction patterns of real users that can then be validated with a more representative sample and on multiple domains.

Chapter 5

Case Study: The Mimesis Effect

5.1 Introduction

This case study shows how player knowledge and VGE affordances in the form of character roles in an interactive narrative role-playing game can be leveraged to unobtrusively influence semantic behavior with a narrative purpose. In particular, we show how character roles can influence players’ choices, which can be applied to guide the storyline of an interactive narrative experience.

Interactive narratives are a type of interactive experience in which users influence a dramatic storyline through actions by assuming the role of a character in a fictional world [99]. One of the key challenges of interactive narrative design [5] is achieving a balance between the story's coherence and the user's sense of dramatic agency—the satisfying power to take meaningful action and see the results of our decisions and choices [85]. The commercial state-of-the-art approach to this challenge is to author interactive narrative content for each of the user's actions that has a meaningful impact on the story's progress. However, the amount of interactive narrative content that must be authored to support this level of agency grows exponentially with the number of ways the player can direct the development of the unfolding narrative [24]. For compelling experiences, this authoring becomes expensive and complex, requiring a significant amount of time to ensure that a high-quality experience is delivered [20, 84]. One approach to ameliorate the authorial combinatorics problem of interactive narratives is to understand and catalogue how players engage with interactive narrative artifacts. Through this understanding, designers could identify mechanisms to influence players to take specific actions so that agency and story coherence are both preserved.

In this case study, we present an experiment aimed at distilling the relationship between a player's sense of her narrative role and the actions she selects when faced with choice structures in an interactive narrative role-playing game. While character roles are tacitly assumed to affect a player's behavior with respect to their in-game actions, no work exists to experimentally unpack this relationship. As defined by Mawhorter et al. [80], a choice structure consists of three things: a) the framing, which is the presentation of content prior to making the choice that influences how a player interprets it, b) the options, which are the discrete interface elements that lead to c) the outcomes, content that is presented after an individual option is chosen. Our work attempts to understand the relationship between the framing players experience with respect to in-game roles, and the options they select during the course of gameplay. A choice structure's framing context (which includes a player's narrative role), in conjunction with the presentation of specific options for action, will have an effect on their eventual choice [113].

In the context of the taxonomy we propose in this dissertation, these higher-level actions correspond to semantic behavior—they can only be interpreted in the context of the game where they occur. This case study construes roles as preference functions over actions in action sequences. In our context, actions are constrained to mean options in choice structures.1 Given a set of choice structure options that are afforded in an interactive narrative role-playing game, roles are distinguished by the different preferences they express over those options. Being able to influence players to prefer one option over others within a choice structure could be advantageous to drive the narrative of a game in a particular direction. For a fixed set of available options, we would therefore expect the following hypotheses to be confirmed so that semantic behavior can be influenced with a narrative purpose:

H1: Choice Correspondence to Explicit Roles – Given an explicitly communicated player role, game players will consistently (ceteris paribus) prefer specific choice structure options over others; namely those that they expect are dictated by their role.

1Thus, choices are a specialized kind of action; a choice is an action in a choice structure context. Other types of actions could be analyzed (e.g. movement), but that is beyond the scope of this work.

H2: Choice Correspondence to an Implicit Role – In the absence of an explicitly communicated player role, game players will consistently (ceteris paribus) prefer specific choice structure options over others; namely those that they expect are dictated by a role.

H3: No Preferred Role in Control Group – In the absence of an explicitly communicated player role, game players will not consistently (ceteris paribus and relative to other players) prefer the same set of choice structure options over others; namely those sets of options that are mapped to particular roles.

H4: Less Variability with Explicit Roles – Given an explicitly communicated player role, game players will be more consistent (ceteris paribus) in their preference for specific choice structure options over others than game players without an explicitly communicated player role.

H5: Choice Consistency Increases – Game players will become more consistent in their preference for specific choice structure options over others as the game progresses.

H6: First Choices Are Predictive of Implicit Role – In the absence of an explicitly communicated player role, the first few expressed preferences for specific choice structure options over others by game players can be used to accurately determine their general preference for specific choice structure options.

In essence, we expected that an explicit role serves as a tacit directive to players in interactive narrative role-playing games; a player's sense of her narrative role is a way the game scripts the interactor [51] in the pursuit of actions that successfully complete the interactive narrative experience. We also expected that, in the absence of an explicit role, a player's personal preferences would guide the initial selection of actions, but that they would then remain consistent with prior choices, inspired by related work on consistency in decision-making from social psychology [30]. Lastly, and also based on this same related work, we expected that, as players made more choices in the game (i.e., as the game progressed), they would make choices that were increasingly consistent with these prior choices.

A key challenge to our approach to understanding the impact of roles on player choice is arriving at a precise definition of the concept of interactive narrative role itself. Various disciplines in and around interactive storytelling have varying definitions, and we do not necessarily care to settle the debate of what is and is not a role. However, to avoid making our claims vacuous, we discerned an operational definition of role upon which we anchor this study. Our definition draws from narrative roles as discussed within narratology.

5.1.1 Narrative Roles

Prince [93] defines a narrative role as a "typical set of [narrative] functions performable by, and attributes attachable to, an entity." We focus on narrative functions since they are most closely linked to actions, and therefore choice. We were unable to find (either in the HCI literature or elsewhere) previously published work on narrative roles as we have defined them. The strength of our definition of role is that it is narratologically grounded; researchers who study narratives generally agree on what roles are conceptually, and these ideas guided our work, operationalization, and experimental design. This definition of narrative role is similar to that of HCI personas: "a pattern of user behaviors, goals and motives, compiled in a fictional description of a single individual" [19], a concept commonly used in the design of software systems. Narrative roles are different in that they are literary, and exist in fictional contexts with fictional behaviors, goals, and motives.

Propp [94] was the first to discuss narrative functions in his study of the Russian folktale, where he identified two phenomena:

a) the same story action can have different narrative functions in different story plots; e.g. "John killed Peter" may be considered a villainy in one story, or a heroic victory in another, and

b) different story actions can have the same narrative functions in different story plots; e.g. "John killed Peter" and "The dragon kidnapped the child" may both be considered villainy.

These examples assume "villainy" and "heroic victory" are themselves functions worth describing in some universal sense. However, a narrative function can be described more generally as a story action defined in terms of its significance for the course of action in which it appears [93]. Narratologists generally disagree on the number and type of narrative functions [68]. Without appealing to some fixed set of them, a narrative function could be extensionally regarded as a label that describes the relationship of an action to an action sequence in which it appears. If we consider narrative role to be in part defined by a typical set of narrative functions, then we can construe role as a preference for specific actions in action sequences. Transitively, a narrative role expresses a preference for action sequences themselves; given two action sequences, the one containing a higher proportion of preferred actions will itself be preferred.
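To make this construal concrete, the sketch below (illustrative only; the action names and preference weights are our own, not taken from the study materials) encodes a role as a preference function over actions and extends it to a preference over action sequences, following the proportion-based rule just described.

    # Illustrative sketch (hypothetical action names and weights): a role as a
    # preference function over actions, extended to a preference over sequences.
    FIGHTER_PREFS = {"attack": 1.0, "cast_spell": 0.1, "sneak": 0.2}

    def sequence_preference(prefs, actions):
        """Score a sequence by the average preference mass of its actions; the
        sequence with the higher proportion of preferred actions is preferred."""
        return sum(prefs.get(a, 0.0) for a in actions) / len(actions)

    duel = ["attack", "attack", "sneak"]
    heist = ["sneak", "sneak", "cast_spell"]

    # A Fighter-like role prefers the duel sequence over the heist sequence.
    print(sequence_preference(FIGHTER_PREFS, duel) >
          sequence_preference(FIGHTER_PREFS, heist))  # True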

5.1.2 Interactive Narrative Role-Playing Games

Based on our definition that narrative roles express preferences over action sequences, we construe interactive narrative role-playing games as a subset of interactive narratives (as defined by Riedl & Bulitko [99]) that afford opportunities for players to express multiple distinct preferences over in-game actions in action sequences. As noted by Yee [120], different role-playing games provide different affordances to express preferences over in-game actions, and these affordances affect how players engage with the in-game choice structures. Yee et al. [124] also found that a player's personality traits characterize her behavior in RPGs. Additionally, work on avatar customization, such as that by Ducheneaut et al. [46] and Yee & Bailenson [121], shows that avatar identity affects player behavior. To avoid introducing spurious factors in our design caused by avatar appearance, we constrained avatar identity in our experiment as discussed in the STUDY DESIGN Section. Similarly, to control for extraneous variables, we did not provide affordances for players to express preferences via numerical attributes, instead opting to afford actions that are equally attractive in terms of mathematical utility. In our experiment, all game actions advance the story along the same causal progression toward the same conclusion; the difference is in the feedback received when a participant executes an action, which is unique for every action.

5.2 Method

To test our hypotheses, we developed a custom interactive narrative role-playing game, and designed an experimental protocol around it to evaluate how a player's awareness of narrative role affects her choice over choice structure options. Our game afforded three different roles, namely Fighters, Mages, and Rogues. These roles were chosen carefully based on the results of the first phase of this study, which we call the VALIDATION PHASE. For brevity, we describe the VALIDATION PHASE in detail in Appendix B, since the goal of that first phase was only to inform the design decisions we made for the interactive narrative role-playing game used in the second phase of our study, which we describe here. Taken together, the results of the VALIDATION PHASE in Appendix B indicated that participants held no significant biases with respect to the roles we used in our interactive narrative role-playing game, and that the actions we afforded in our game were recognizable as representative of the roles we designed them to represent.

5.2.1 Experimental Design

Our experiment used a 3 × 2 factorial design plus a control group. One factor has three levels for the roles in the game (Fighter, Mage, or Rogue). The other factor differentiates conditions where participants chose their role to play (i.e., the chosen condition) from conditions where participants were randomly assigned a role to play (i.e., the assigned condition). We introduced these factors to see if there was a meaningful difference when the explicit role was adopted voluntarily versus when it was assigned by the game itself. Both the chosen and assigned conditions constituted a broader factor in the experiment, namely conditions of the experiment where the participant's role was explicit, compared to our control condition in which the participant's role was left unspecified. In our control group, participants were neither assigned a particular role nor given the ability to choose one explicitly. The game itself was identical for all participants.

Table 5.1 Distribution of participants across experiment conditions.

Condition      Role     Participants
Assigned role  Fighter  26
               Mage     27
               Rogue    25
Chosen role    Fighter  25
               Mage     34
               Rogue    32
Control        No role  41

5.2.2 Population and Sampling

The target population for this study was interactive narrative role-playing game players at least 18 years of age. As mentioned, the study had two sequential phases; the VALIDATION PHASE was conducted first and was used to inform and construct the materials used for the second phase, the EXPERIMENT PHASE. In order for results from the first to be applicable to the second, we used the same sampling frame. However, to avoid introducing biases, we stratified the sampling frame to distinguish and separate the sampling for each phase. Participants were recruited using a combination of convenience and snowball sampling.

For the EXPERIMENT PHASE, we recruited from the Computer Science student body at NC State University, through social media, and through mailing lists. Our experiment sample consisted of 210 subjects between the ages of 18 and 38 (M = 21.02, SD = 3.4), of whom 80% were male. Of those recruited, 56.67% reported having played table-top role-playing games, and 96.67% reported having played computer or console role-playing games, with more than half (70.48%) reporting that they play computer or console role-playing games frequently. Of our sample, 85.71% were native English speakers, with only 1.91% reporting limited English working proficiency. Table 5.1 contains the distribution of the 210 participants across conditions.

Figure 5.1 Part of the animation sprite used for the player's avatar, which was modeled after Perlin's Polly [91] to avoid the Proteus Effect [121] – the phenomenon that users conform to expected behaviors associated with an avatar's appearance.

5.2.3 Description of the Environment

The game was developed using the Impact.js2 JavaScript game engine, and was hosted online. The game required keyboard input exclusively, and was designed to reflect a control scheme that is typical of computer-based games. Keys 'W', 'A', 'S', and 'D' (alternatively, the arrow keys) moved the character up, left, down, and right, respectively. The 'E' key was a context-sensitive action button that enabled players to interact with non-player characters (NPCs) they were proximal to, and the 'Spacebar' key advanced dialog. The game interface alerted players when each key was available to be pressed in the player's context (e.g. "Press 'E' to talk", "Hit space to continue"), as demonstrated in Figure 5.3.

Our game3 is a one-player interactive narrative role-playing game; see Figure 5.6 for a synopsis of the game's plot. To avoid the Proteus Effect [121]—the phenomenon that users conform to expected behaviors and attitudes associated with an avatar's appearance—the playable character's avatar was modeled after Perlin's Polly [91], a gender-neutral anthropomorphic geometric shape. Figure 5.1 shows a portion of the sprite sheet we used to animate the player's avatar motion. The game used a 2-dimensional top-down view with oblique projection, as shown in Figure 5.2. The camera follows the player's movement so that her character's avatar is always centered on the screen. Carried items, such as the player's weapon, were displayed in an inventory box on the bottom-right corner of the game screen. The inventory box was always visible during gameplay.

2 http://impactjs.com/
3 The reader is encouraged to play along! The game is available here: http://go.ncsu.edu/ixd-demo-rpg

Figure 5.2 Screenshot of a sample in-game level environment.

Figure 5.3 Screenshot of a sample in-game dialog box.

As the story unfolds, players face a series of 12 choice structures presented in a consistent order across all participants. In each choice structure, the player must select one of three options, each corresponding to one of the three afforded roles (Fighter, Mage, or Rogue). All choice structures, along with their associated options and the mappings between options and roles, are listed in Table B.1. Importantly, participants are not explicitly informed of the mapping between choice structure options and roles. Instead, the game interface presents only the names of the options, as illustrated in Figure 5.4. The order in which the three options were presented at each choice point was randomized using the unbiased Fisher-Yates shuffle algorithm [47] at the time the choice point was activated. Regardless of role alignment, each action at every choice point always succeeded and resulted in the same narrative progression in the game. Players were unaware of alternate narrative progressions because they were not allowed to play the game more than once. To provide a sense that the choice had a meaningful impact on the story, a static image cutscene was presented for three seconds immediately after every choice with an illustration of the selected action being performed. This acknowledgment of choice was shown by Fendt et al. [50] to be enough to preserve player agency, an important characteristic of meaningful play experiences [78]. An example cutscene is shown in Figure 5.5. To give narrative context for these choice structures, we provide a synopsis of the game's plot in Figure 5.6; the numbers preceding some of the sentences in the synopsis correspond to the choice structures that the player encountered to resolve the plot point described by the sentence.

5.2.4 Experimental Procedure

Participants engaged with the experiment via the Internet. After obtaining informed consent, participants completed a demographic information survey. Participants were then randomly assigned to one of the experimental conditions. Next, participants were presented with a description of the fantasy setting of the game. This description was presented to all experimental conditions, and included the three role descriptions as written in the GAME DESIGN REQUIREMENTS Section in Appendix B. However, these descriptions were framed as "characters that could be encountered in the world" (emphasis added). Participants in the chosen and assigned conditions were then associated with a specific role. In the chosen condition, participants were presented with an additional screen that prompted them as follows: "In this game you will have one of the following roles. Please read the descriptions carefully and choose the role you would like to have" (emphasis added). In the assigned condition, participants were shown an additional screen that indicated the following: "In this game you will have the following role. Please read the description carefully" (emphasis added). All role descriptions were presented in random order. Participants in all conditions were then tasked to play a tutorial level to familiarize themselves with the game, which required them to move their avatar in all directions, advance dialogs, make choices, interact with NPCs, and understand the inventory system. After completing the tutorial level, the actual game began. Participants were required to complete the game in order to proceed. After completing the game, participants completed the interest/enjoyment sub-scale of the Intrinsic Motivation Inventory (IMI) [37].

Figure 5.4 Screenshot of a sample in-game action selection screen.

5.2.5 Evaluation Metrics

The semantic data we use to evaluate our hypotheses are operationalized in terms of the number and type of choice structure options that were selected by participants during gameplay. We had two independent variables in our experiment: the player's awareness of role and the player's role; if the player is not explicitly aware of a role they are playing, the value of the second independent variable is undefined, as is the case for the control group. The player's awareness of role is either explicit (for the chosen/assigned factors) or not explicit (for the control group). The player's role (applicable only for the explicit conditions) is either Fighter, Mage, or Rogue. The one dependent variable in our experiment represents semantic data: the options (i.e., actions) selected by the player at every choice structure. Every participant in our experiment encountered the 12 choice structures that appear in Table B.1 (in Appendix B); every participant therefore contributes 12 data points to the experiment. We assume all choice structures are equal in terms of relevance to the player's option selection.

Figure 5.5 Screenshot of a sample in-game cutscene.

Given our six hypotheses, we expected to see the following trends in terms of our dependent variable as a function of manipulations to our independent variables:

H1: Choice Correspondence to Explicit Roles – For participants explicitly aware of their role, we expected to influence semantic behavior in the form of a high count of actions associated with the participant's explicit role, and a low count of actions associated with other roles.

H2: Choice Correspondence to an Implicit Role – For participants in the control condition, we expected that each participant would select a significant number of actions consistent with one role, regardless of which. Each participant should produce semantic behavior data in the form of a high count of actions associated with one role, and a low count of actions associated with other roles.

H3: No Preferred Role in Control Group – For participants in the control condition, we expected that, in aggregate (i.e., across all participants in that condition), the total counts for actions associated with each of the roles would be relatively even, and neither high nor low.

H4: Less Variability with Explicit Roles – For participants explicitly aware of their role, we expected that, in aggregate (i.e., across all participants in that condition), the percentages of semantic actions associated with the participants' explicit role would show significantly less variability across all 12 choice structures in our game than the percentages of actions associated with participants' implicit role in the control condition, as determined when evaluating H2.

H5: Choice Consistency Increases – For participants both explicitly aware of their role and in the control condition, we expected that, in aggregate (i.e., across all participants in their respective condition), the percentages of semantic actions associated with the participants' explicit role (in the case of the chosen and assigned role conditions) or implicit role (in the case of the control condition, as determined when evaluating H2) would increase as players navigate the 12 choice structures in our game.

H6: First Choices Are Predictive of Implicit Role – For participants in the control condition, we expected that each participant's first three, four, or five actions could be used to accurately determine their implicit role, using as a reference the role that was assigned to each participant when evaluating H2.

In the beginning, the player encounters a kingdom Green Guard, who informs the player that the Crown of Power has gone missing; without it, the kingdom cannot crown a new king to replace the old king who passed away. (1) The guard gives the player a coupon for a weapon at the local shop and tells the player to meet him at the castle. (2) Along the way, the player encounters a denizen who is attempting to rescue her cat from atop a tree. (3) After rescuing the cat, the player encounters a set of bandits who are blocking the path. (4) When the bandits are dealt with, the player encounters a large tree-stump that blocks the way. (5) Having dealt with the tree-stump, the player arrives at the castle, and is interrogated by guards, who question the task the Green Guard entrusted to the player. With the guards managed, the player enters the castle and meets the king's councilor, who urges the player to find the missing crown, since he does not desire to be the land's steward. The player meets the Green Guard, who indicates that the crown is being guarded by a dragon, and that the player should seek the dragon slayer for help. The dragon slayer reveals that an enchanted weapon is needed to defeat the dragon. (6, 7) To enchant the weapon, the player must defeat a manticore and bring the beast's heart to a witch who requires it as an ingredient for an enchantment spell. (8) The witch tests the player's character, and then proceeds to enchant the weapon. (9, 10, 11) Armed with an enchanted weapon, the player travels to face the dragon. Upon slaying the dragon, the player recovers the Crown of Power. However, on the way back to the castle, the player encounters the king's councilor, who reveals that he gave the crown to the dragon in order to be the land's steward in perpetuity. (12) The councilor attempts to make an escape, which is foiled by the player. In the epilogue, the newly anointed king names the player the new councilor for the kingdom and the game ends.

Figure 5.6 Plot synopsis for our game. Numbers preceding some of the plot points correspond to choices the player encountered to resolve that plot point, enumerated in Table B.1.

5.3 Analysis and Results

To assess external validity with respect to the participants' engagement throughout our experiment, we looked at our sample's IMI scores across conditions. The mean score across the sub-scale for all conditions fell close to the midpoint of the 7-point Likert scale. Specifically, the overall scores were [M = 4.3, SD = 1.6], [M = 4.4, SD = 1.6], [M = 3.8, SD = 1.8], [M = 3.4, SD = 1.8], [M = 3.8, SD = 1.7], [M = 4.2, SD = 1.7], [M = 3.4, SD = 1.8] for questions 1-7, respectively. This means that, on average, our participants did not express strong feelings regarding their enjoyment of the study (full results are included in Table 5.2).

5.3.1 H1: Choice Correspondence to Explicit Roles

To test our hypothesis that an explicitly communicated player role will influence players' semantic behavior to prefer actions that they expect are dictated by their assigned or chosen role (H1), we calculated the number of actions each participant chose corresponding to each of the three roles, and then grouped them by the participant's explicit role. As expected, participants chose more actions that align with their game roles than actions from the other roles, as shown in Table 5.3. To evaluate the significance of these results, we conducted a Chi-square test, which revealed that the choices made by participants (excluding our control group) were significantly consistent with their explicit role (assigned or chosen) (χ²(df = 4, N = 210 × 12) = 1286.3, p < 0.0001, φc = 0.563).

We also calculated the percentages of actions that participants chose that corresponded to each role in each of the explicit roles, and compared them to choices made by participants in our control group. We found that not only is the percentage of actions aligned with participants' roles higher than in the control group, but the percentages of actions that are not aligned with participants' roles are lower than those in the control group. Together with the prior Chi-square analysis, this result strongly confirms H1.

We further explored whether there was any significant difference between participants in the assigned condition versus participants in the chosen condition. As shown in Table 5.4, participants who were allowed to choose their role were more consistent with that role than participants who were assigned a role to play out. We conducted a Chi-square test with Yates' continuity correction, which revealed that this increase in consistency was statistically significant (χ²(df = 1, N = 210 × 12) = 22.365, p < 0.0001), but was not practically significant due to the small effect size (φc = 0.106).

Table 5.2 Score (and standard deviation) for the 7 questions in the Intrinsic Motivation Inventory (1. I enjoyed doing this activity very much, 2. This activity was fun to do, 3. I thought this was a boring activity, 4. This activity did not hold my attention at all, 5. I would describe this activity as very interesting, 6. I thought this activity was quite enjoyable, 7. While I was doing this activity, I was thinking about how much I enjoyed it).

Condition      Role     Q1         Q2         Q3         Q4             Q5             Q6             Q7
Assigned role  Fighter  4.6 (1.4)  4.6 (1.6)  3.6 (1.6)  3.423 (1.724)  3.962 (1.708)  4.231 (1.966)  3.731 (1.909)
Assigned role  Mage     4.6 (1.7)  4.6 (1.8)  3.6 (1.8)  2.926 (1.639)  3.667 (1.754)  4.370 (1.690)  3.778 (1.847)
Assigned role  Rogue    4.1 (1.4)  4.3 (1.4)  3.8 (1.6)  3.320 (1.600)  3.640 (1.411)  4.120 (1.481)  3.280 (1.969)
Chosen role    Fighter  4.3 (1.4)  4.6 (1.4)  3.6 (1.8)  3.440 (1.710)  4.200 (1.658)  4.200 (1.581)  3.640 (1.630)
Chosen role    Mage     3.8 (1.5)  4.1 (1.6)  4.3 (1.7)  3.941 (1.808)  3.176 (1.566)  3.765 (1.519)  3.147 (1.925)
Chosen role    Rogue    4.0 (1.6)  3.9 (1.8)  4.1 (2.0)  3.875 (1.947)  3.531 (1.626)  3.906 (1.766)  3.094 (1.532)
No role                 4.6 (1.7)  4.7 (1.6)  3.3 (2.0)  2.854 (1.667)  4.171 (1.702)  4.512 (1.705)  3.488 (1.567)
Overall                 4.3 (1.6)  4.4 (1.6)  3.8 (1.8)  3.354 (1.774)  3.772 (1.704)  4.165 (1.685)  3.449 (1.783)

Table 5.3 Number of actions chosen by participants with explicit roles that corresponded to each of the roles. In parentheses, the proportion of each value for each player role (row). Players were significantly consistent with their explicit roles (χ² = 1286.3, p < 0.0001, φc = 0.563).

Explicit Role  Fighter Actions  Mage Actions  Rogue Actions
Fighter        402 (65.7%)      76 (12.4%)    134 (21.9%)
Mage           71 (9.7%)        557 (76.1%)   104 (14.2%)
Rogue          84 (12.3%)       123 (18.0%)   477 (69.7%)
No Role        108 (22.0%)      188 (38.2%)   196 (39.8%)

Table 5.4 Number of actions chosen by participants with explicit roles based on whether their role was chosen or assigned. In parentheses, the proportion of each value for each condition (row). We found a statistically significant (but small) increase in consistency when players chose their role (χ² = 22.365, p < 0.0001, φc = 0.106).

Explicit Role  Consistent   Inconsistent
Chosen         822 (75.3%)  270 (24.7%)
Assigned       614 (65.6%)  322 (34.4%)
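As a hedged illustration of this analysis (a minimal sketch, not the original analysis code), the following recomputes the H1 test from the counts in Table 5.3 using scipy, including the Cramér's V effect size (φc):

    # Minimal sketch: chi-square test of role-action consistency using the
    # counts from Table 5.3, with Cramer's V (phi_c) as the effect size.
    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: explicit role (Fighter, Mage, Rogue); columns: actions chosen
    # that map to Fighter, Mage, and Rogue, respectively.
    observed = np.array([
        [402,  76, 134],   # Fighter participants
        [ 71, 557, 104],   # Mage participants
        [ 84, 123, 477],   # Rogue participants
    ])

    chi2, p, dof, _ = chi2_contingency(observed)

    n = observed.sum()                  # total number of choices in the table
    k = min(observed.shape) - 1         # min(rows, cols) - 1
    cramers_v = np.sqrt(chi2 / (n * k))

    print(f"chi2({dof}) = {chi2:.1f}, p = {p:.4g}, phi_c = {cramers_v:.3f}")
    # Expected to be on the order of chi2(4) = 1286 and phi_c = 0.56,
    # matching the values reported above.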

5.3.2 H2: Choice Correspondence to an Implicit Role

To test our hypothesis that, in the absence of an explicitly communicated player role, game players will consistently prefer specific semantic actions that they expect are dictated by a role (H2), we ran a k-means clustering with k = 3 on all actions that each participant chose, to determine if participants without an explicit role could be grouped into three categories based on their action choices. We found that the three clusters nicely capture these three dimensions of data with betweenSS/totalSS = 0.797. The next step was to determine if these clusters corresponded to the three roles that we defined.

For this, we added the number of semantic actions chosen by participants in each cluster by the roles to which those actions were mapped. The results, shown in Table 5.5, indicate that there is indeed an alignment between clusters and roles, with participants in clusters 1, 2, and 3 choosing more Rogue, Mage, and Fighter actions, respectively. A Chi-square test with Yates' continuity correction on the number of actions by cluster revealed that choices made by participants were significantly consistent with an implicit role (χ²(df = 4, N = 41 × 12) = 356.19, p < 0.0001, φc = 0.602), strongly confirming H2.

Table 5.5 Number of participants assigned to each cluster and the total number of actions chosen corresponding to each role by cluster. Participants in our control group were significantly consistent with an implicit role (χ² = 356.19, p < 0.0001, φc = 0.602), and had no significant preference for any particular one (χ² = 0.34146, p = 0.843).

Cluster  Participants  Fighter Actions  Mage Actions  Rogue Actions
1        15            17               22            141
2        14            10               140           18
3        12            81               26            37
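A minimal sketch of this clustering step follows, using hypothetical per-participant counts rather than the study's data; it also shows the betweenSS/totalSS goodness measure and the cluster-to-role labeling described above.

    # Minimal sketch (hypothetical data): clustering control-group
    # participants by their per-role action counts, as done for H2.
    import numpy as np
    from sklearn.cluster import KMeans

    # Each row is one participant's counts of (Fighter, Mage, Rogue) actions
    # over the 12 choice structures; rows sum to 12. Values are made up.
    X = np.array([
        [1, 2, 9], [2, 1, 9], [0, 3, 9],    # Rogue-leaning participants
        [1, 10, 1], [0, 11, 1], [2, 9, 1],  # Mage-leaning participants
        [9, 2, 1], [8, 1, 3], [10, 1, 1],   # Fighter-leaning participants
    ], dtype=float)

    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # Goodness of clustering as reported in the text: betweenSS / totalSS.
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()
    between_ss = total_ss - km.inertia_   # inertia_ is the within-cluster SS
    print(f"betweenSS/totalSS = {between_ss / total_ss:.3f}")

    # Label each cluster with the role whose action count dominates it.
    roles = np.array(["Fighter", "Mage", "Rogue"])
    for c in range(3):
        counts = X[km.labels_ == c].sum(axis=0)
        n_members = int(np.sum(km.labels_ == c))
        print(f"cluster {c}: n={n_members}, role={roles[counts.argmax()]}")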

5.3.3 H3: No Preferred Role in Control Group

To test our hypothesis that, in the absence of an explicitly communicated player role, game players will not consistently (and relative to other players) prefer semantic actions mapped to a particular role (H3), we looked at the number of participants that were assigned to each cluster, and therefore to an implicit role, expecting to find an even distribution. To determine how closely the distribution across the three implicit roles matched an even distribution, we conducted a Chi-square goodness-of-fit test. Our test revealed no significant preference for any of the three roles among participants who were not given an explicit role (χ²(df = 2, N = 41) = 0.34146, p = 0.843), consistent with H3.
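This goodness-of-fit test is simple to verify from the cluster sizes in Table 5.5; a minimal sketch using scipy:

    # Minimal sketch: chi-square goodness-of-fit test for H3, using the
    # cluster sizes from Table 5.5 against a uniform expected distribution.
    from scipy.stats import chisquare

    cluster_sizes = [15, 14, 12]          # participants per implicit role
    chi2, p = chisquare(cluster_sizes)    # expected: 41/3 per role by default
    print(f"chi2(2) = {chi2:.5f}, p = {p:.3f}")  # ~0.34146, p ~ 0.843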

5.3.4 H4: Less Variability with Explicit Roles

To test our hypothesis that players with an explicit role will be more consistent in their choices than players with an implicit role (H4), we calculated the number of actions consistent with the players' role—implicit or explicit—grouped by experimental condition and choice structure. Because the number of participants in each experimental condition differed, we normalized these counts by the number of participants in each group, obtaining the percentage of consistent choices made at each of our 12 choice structures by experimental group. As expected, participants with explicit roles (chosen or assigned) had a smaller variance in their consistency percentages (0.002173) than participants with an implicit role (0.008419), as shown in Table 5.7. To evaluate the significance of these differences, we conducted a Brown-Forsythe F test of homogeneity of variance, which revealed a significant difference (α = 0.05) in variance (F(1) = 4.401, p = 0.038), supporting H4. To further explore the relationship between choice consistency variability and type of role, we evaluated the differences in variance of consistency percentages between participants with assigned, chosen, and implicit roles (0.001549, 0.004292, and 0.008419, respectively), as shown in Table 5.7. We conducted a Brown-Forsythe F test of homogeneity of variance across all three experimental groups, which revealed a significant difference (α = 0.05) in variance (F(2) = 3.272, p = 0.042).
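The Brown-Forsythe test is the Levene test computed around group medians; a minimal sketch follows, using the Total rows of Table 5.7 as the two groups. Note that the exact F and p values reported above depend on how the groups were constructed, so this sketch illustrates the procedure rather than reproducing those values.

    # Minimal sketch: Brown-Forsythe test of homogeneity of variance, i.e.,
    # the Levene test with center='median'. Groups are the per-choice-
    # structure consistency proportions from the Total rows of Table 5.7;
    # the grouping the authors actually used may differ, so exact F and p
    # values may not match the reported ones.
    from scipy.stats import levene

    explicit = [0.7988, 0.6686, 0.7515, 0.7041, 0.6450, 0.6864,
                0.7219, 0.6568, 0.7751, 0.6982, 0.6982, 0.6923]
    implicit = [0.8293, 0.7073, 0.7805, 0.7805, 0.7073, 0.8049,
                0.7317, 0.5610, 0.7317, 0.5854, 0.8780, 0.7317]

    f_stat, p_value = levene(explicit, implicit, center="median")
    print(f"F(1) = {f_stat:.3f}, p = {p_value:.3f}")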

5.3.5 H5: Choice Consistency Increases

To test our hypothesis that game players' consistency with their role—explicit or implicit—will increase as they play the game (H5), we fitted linear regression models on choice consistency values across the 12 choice structures of our game (values shown in Table 5.7) by condition and role, for a total of 9 linear regressions, plotted in Figure 5.7. We evaluated the goodness of fit (R²) and significance levels (α = 0.05) of these linear models and did not find any significant linear relationships (full results shown in Table 5.6) between players' levels of consistency and the ordering of choice structures in our game. In short, our results show no support for H5.
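As one concrete example of these fits (a minimal sketch using the "No Role, Mage" row of Table 5.7, whose R² and p-value appear in Table 5.6):

    # Minimal sketch: one of the nine per-condition linear regressions used
    # for H5, fitting consistency against choice-structure position. The
    # data are the "No Role, Mage" row of Table 5.7.
    import numpy as np
    from scipy.stats import linregress

    position = np.arange(1, 13)
    consistency = np.array([85.71, 92.86, 92.86, 85.71, 78.57, 100.00,
                            85.71, 42.86, 78.57, 71.43, 92.86, 92.86]) / 100

    fit = linregress(position, consistency)
    print(f"R^2 = {fit.rvalue**2:.6f}, p = {fit.pvalue:.6f}")
    # Expected to approximate the Table 5.6 entry: R^2 ~ 0.0368, p ~ 0.5504.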

Table 5.6 Goodness of fit measure (R²) and significance values for the linear models fitted to evaluate H5. We did not find significant linear trends in consistency.

Condition      Fighter R²  p-value   Mage R²   p-value   Rogue R²  p-value
Role Assigned  0.026856    0.610802  0.121259  0.26733   0.242902  0.103528
Role Chosen    0.001526    0.904069  0.082978  0.363904  0.006272  0.806719
No Role        0.028408    0.600533  0.036785  0.550403  3.95E-05  0.984532

Figure 5.7 Plots of the proportion of consistent choices made by players by their implicit or explicit role, and by experimental condition, as shown in Table 5.7. The blue lines and gray shadows represent the fitted linear regressions and confidence intervals, respectively.

Table 5.7 Proportion of consistent choices made by players by their implicit or explicit role, and by experimental condition, including sample numbers and variance of consistency. Columns CS1-CS12 correspond to the 12 choice structures.

Condition      Role     n    CS1     CS2     CS3      CS4     CS5     CS6      CS7     CS8     CS9     CS10    CS11    CS12    Variance
Assigned       Fighter  26   53.85%  46.15%  73.08%   57.69%  34.62%  61.54%   61.54%  69.23%  65.38%  50.00%  69.23%  50.00%  0.012641
Assigned       Mage     27   85.19%  74.07%  85.19%   77.78%  81.48%  74.07%   85.19%  74.07%  77.78%  81.48%  77.78%  74.07%  0.002078
Assigned       Rogue    25   72.00%  68.00%  52.00%   56.00%  60.00%  60.00%   60.00%  52.00%  60.00%  64.00%  56.00%  52.00%  0.004024
Assigned       Total    78   70.51%  62.82%  70.51%   64.10%  58.97%  65.38%   69.23%  65.38%  67.95%  65.38%  67.95%  58.97%  0.001549
Chosen         Fighter  25   84.00%  48.00%  100.00%  76.00%  48.00%  68.00%   84.00%  80.00%  84.00%  64.00%  84.00%  68.00%  0.024000
Chosen         Mage     34   88.24%  76.47%  79.41%   70.59%  73.53%  70.59%   76.47%  50.00%  82.35%  64.71%  70.59%  82.35%  0.009824
Chosen         Rogue    32   90.63%  81.25%  62.50%   81.25%  81.25%  75.00%   65.63%  71.88%  90.63%  90.63%  62.50%  81.25%  0.010912
Chosen         Total    91   87.91%  70.33%  79.12%   75.82%  69.23%  71.43%   74.73%  65.93%  85.71%  73.63%  71.43%  78.02%  0.004292
Explicit Role  Total    169  79.88%  66.86%  75.15%   70.41%  64.50%  68.64%   72.19%  65.68%  77.51%  69.82%  69.82%  69.23%  0.002173
No Role        Fighter  12   75.00%  33.33%  83.33%   50.00%  33.33%  58.33%   83.33%  66.67%  58.33%  16.67%  83.33%  33.33%  0.053188
No Role        Mage     14   85.71%  92.86%  92.86%   85.71%  78.57%  100.00%  85.71%  42.86%  78.57%  71.43%  92.86%  92.86%  0.022573
No Role        Rogue    15   86.67%  80.00%  60.00%   93.33%  93.33%  80.00%   53.33%  60.00%  80.00%  80.00%  86.67%  86.67%  0.017879
No Role        Total    41   82.93%  70.73%  78.05%   78.05%  70.73%  80.49%   73.17%  56.10%  73.17%  58.54%  87.80%  73.17%  0.008419

5.3.6 H6: First Choices Are Predictive of Implicit Role

To test the hypothesis that, in the absence of an explicitly communicated player role, the first few choices made by game players can be used to accurately predict their implicit role (H6), we ran a series of k-means clustering analyses with k = 3 on the number of semantic actions each participant without an explicit role chose corresponding to each of the three roles afforded by our game. This is the same approach we used to test H2 except that, instead of using all 12 choices made by each participant, we considered only the first three, four, or five choices made in the game, for a total of three separate clustering analyses. We found that the three clusters nicely capture these three dimensions of data with betweenSS/totalSS = 0.865, 0.855, and 0.838 when considering the first three, four, and five choices, respectively.

Table 5.8 For each of the three clustering analyses, number of participants assigned to each cluster and the total number of actions chosen corresponding to each role by cluster.

Choices  Cluster  Participants  Fighter Actions  Mage Actions  Rogue Actions  Cluster Role
3        1        13            3                1             35             Rogue
3        2        18            3                47            4              Mage
3        3        10            24               1             5              Fighter
4        1        9             28               1             7              Fighter
4        2        19            5                61            10             Mage
4        3        13            3                3             46             Rogue
5        1        17            4                70            11             Mage
5        2        14            3                6             61             Rogue
5        3        10            33               5             12             Fighter

As we did when evaluating H2, we added the number of semantic actions chosen by participants in each cluster by the roles to which those actions were mapped, as shown in Table 5.8. We then labeled each cluster with the role that showed the highest count of semantic actions chosen. To evaluate the significance of the differences in counts on all three analyses, we conducted Chi-square tests on the number of actions by role and cluster. When considering the first three choices made by players, we found a significant relationship (χ²(df = 4, N = 41 × 3) = 151.15, p < 0.0001, φc = 0.784) between clusters and roles, with participants in clusters 1, 2, and 3 choosing more Rogue, Mage, and Fighter actions, respectively. When considering the first four choices made by players, we found a significant relationship (χ²(df = 4, N = 41 × 4) = 174.43, p < 0.0001, φc = 0.729) between clusters and roles, with participants in clusters 1, 2, and 3 choosing more Fighter, Mage, and Rogue actions, respectively. When considering the first five choices made by players, we found a significant relationship (χ²(df = 4, N = 41 × 5) = 196.94, p < 0.0001, φc = 0.693) between clusters and roles, with participants in clusters 1, 2, and 3 choosing more Mage, Rogue, and Fighter actions, respectively.

Having found significant mappings between clusters and roles, the next step was to classify each participant in the control condition by assigning them a role label corresponding to their cluster's role for each of the three clustering analyses. To evaluate these classifications for the first three, four, and five choices made by players, we compared these new role labels to the implicit role assigned to participants when evaluating H2. By looking at the first three, four, and five choices made by players, we were able to predict their implicit role with 85.37%, 82.93%, and 87.80% accuracy, respectively. Full results are presented in Table 5.9. These results suggest that participants self-select into a role early in the game, providing strong support for H6.
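The summary metrics in Table 5.9 follow directly from each confusion matrix; as a check, the minimal sketch below recomputes accuracy, precision, recall, and F-score for the first-three-choices classifier:

    # Minimal sketch: deriving the accuracy, precision, and recall figures
    # in Table 5.9 from the confusion matrix for the first-three-choices
    # classifier.
    import numpy as np

    roles = ["Fighter", "Mage", "Rogue"]
    # Rows: predicted role (from clustering the first 3 choices);
    # columns: reference role (from clustering all 12 choices, per H2).
    cm = np.array([[ 9,  0,  1],
                   [ 2, 14,  2],
                   [ 1,  0, 12]])

    accuracy = np.trace(cm) / cm.sum()
    print(f"accuracy = {accuracy:.4f}")      # 35/41 ~ 85.37%

    for i, role in enumerate(roles):
        precision = cm[i, i] / cm[i].sum()   # correct / all predicted as role
        recall = cm[i, i] / cm[:, i].sum()   # correct / all truly that role
        f1 = 2 * precision * recall / (precision + recall)
        print(f"{role}: precision={precision:.2f} recall={recall:.2f} F={f1:.2f}")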

5.4 Discussion

While we expected a player's narrative role to influence semantic behavior in the form of in-game choices, the size of this effect was larger than we anticipated. Considering Cohen's [32] interpretation of φc = 0.5 as a large effect size, our values of 0.563 and 0.602 for H1 and H2, respectively, indicate that players are strongly consistent with their roles, regardless of whether their role was explicit (assigned or chosen) or not. We call this effect the Mimesis Effect—the phenomenon that players act in ways that are guided by their sense of their narrative role; so named in reference to the theatrical process of creating/playing a dramatic role [93]. Our results suggest that, in VGEs that support narrative roles, the strength of the Mimesis Effect can be leveraged to unobtrusively influence players to make choices that align with the storyline that game designers want to pursue, by framing desired actions in alignment with the player's role.

Table 5.9 Confusion matrix, classification accuracy, precision, recall, and F-score when considering players' first three, four, and five choices made in the game. The Fighter, Mage, and Rogue columns give the reference role counts.

Choices  Prediction  Fighter  Mage  Rogue  Accuracy  Precision  Recall  F-score
3        Fighter     9        0     1                0.90       0.75    0.82
3        Mage        2        14    2      85.37%    0.78       1.00    0.88
3        Rogue       1        0     12               0.92       0.80    0.86
4        Fighter     8        0     1                0.89       0.67    0.76
4        Mage        3        14    2      82.93%    0.74       1.00    0.85
4        Rogue       1        0     12               0.92       0.80    0.86
5        Fighter     9        0     1                0.90       0.75    0.82
5        Mage        2        14    1      87.80%    0.82       1.00    0.90
5        Rogue       1        0     13               0.93       0.87    0.90

We detected the Mimesis Effect to be statistically stronger when participants chose their role than when they were assigned one; however, we found this effect to be small (φc = 0.106) per Cohen. Although we did not control for player preference over the roles afforded by our game, it is possible this difference is due to players identifying more closely with the character they are portraying. We posit that the real effect may be larger than what we found, given that some participants in the assigned condition may have been assigned roles they would have chosen anyway (effectively putting them on par with participants in the chosen condition).

When looking at our control group (who played the game without an explicit role), our results show that players' semantic behavior is in fact consistent with a role. This is interesting, since it suggests that participants (consciously or not) fabricated a mental constraint on their gameplay, preferring choice structure options that fell within those constraints. Further, we showed that these constraints aligned with the three roles we made explicit to participants in the chosen and assigned conditions, as demonstrated through our clustering analyses. In essence, participants binned themselves into our pre-defined roles, rather than a) choosing randomly, or b) conforming to an undefined blend of our roles, as defined through a mix of action selection. This suggests that a participant's mere awareness of distinct character types prompts her to select one, and role-play to it, making her behave as an exemplar of that character type.

Our results also show that the first three, four, and five choices made by players in our control group are representative of their overall choice trend, suggesting that these participants decided a priori to play a specific role rather than this behavior emerging subconsciously during gameplay.

Being able to scrutinize players to detect their acting role—when it is implicit or otherwise unknown—via their first few choices can enable on-demand game content adaptation and behavior influence that leverage the Mimesis Effect as soon as the role is detected during gameplay, rather than after the fact. This is useful when game designers would prefer to influence behavior early in a game without requiring complete logs from a previous game session, or when previous game sessions are not available.

We also found no evidence that the narrative of our game favored choices mapped to any particular role. While establishing this reinforces the internal validity of our study, the number of participants in our control group (41) may not have been large enough to detect a small difference between the distribution of participants with implicit roles and an even distribution. However, even with more participants, if the distribution grows proportionally, we do not expect a significant difference based on our current results.

When exploring the stability of the Mimesis Effect, our results show that players with an explicit role (chosen or assigned) display significantly less variability in their choices of actions consistent with their role than players who have an implicit role. Exploring this stability further, we found that, when narrative roles are explicitly communicated, players who were assigned a role show less variability in their consistency than players who chose their role. Designers of interactive narratives can use this knowledge to tailor stories to the type of narrative roles—explicitly assigned, chosen, or implicit—afforded by their games, and plan their storylines with the correspondingly expected variability of role-choice consistency in mind, with the hope of optimizing the process of creating more engaging experiences for players.

Our results did not show significant differences in players' consistency with their role—explicitly assigned, chosen, or implicit—as the game progresses. However, we do note a slight decline in consistency as players navigate the 12 choices of our game, as illustrated in Figure 5.7. Knowing how choice consistency with a player's narrative role varies throughout a game—whether it increases or decays—is important for game designers to determine when to best act to leverage the Mimesis Effect to influence players' choices. Designers of interactive narratives that expect changes in role-choice consistency should consider these trends when crafting their stories to, for example, place pivotal moments of their plots where role-choice consistency is expected to be higher, to more effectively influence players' behavior and therefore more granularly control players' aesthetic experiences of the game. Our findings point to avenues of future work that explore longer sequences of choices to more accurately assess the relationship between players' consistency with their role and game progression.

Importantly, it is possible that the RPG genre carries with it the expectation that a player will remain consistent with her role, since many commercial role-playing games (e.g., [18]) constrain players in ways that make it costly (in terms of game mechanics) to pursue different roles. Participants in our study were explicitly aware that what they were playing was a role-playing game. An interesting avenue for future work is to avoid the role-playing game framing of gameplay, to see if the Mimesis Effect still holds in that context.

There are several limitations to consider when interpreting our results. Firstly, there are many different types of role-playing games [69], and our findings here are specifically applicable to role-playing games that place an emphasis on a narrative trajectory to drive unfolding action. Secondly, all choice structures of our game are equally important from our study's perspective, since we do not control for the choice structure's story-level importance. This was by design, to avoid creating the sense that some types of actions were more useful than others. Thirdly, we do not account for choice structure ordering effects, nor do we include choice structures that do not allow participants to express an in-game role (which would serve as distractor choices from our study's perspective). We expect that the Mimesis Effect will generalize well to multiple settings, but our goal here was to solidly lay its foundations so that future work could further explore its applicability to other domains.

Limitations notwithstanding, our findings are impactful, and encourage further exploration of the Mimesis Effect, wherein a player's role in an interactive narrative significantly affects the options she selects in choice structures in an interactive narrative role-playing game, even when the role is implicitly assumed by the player. This effect, which has been tacitly assumed to be true in the literature, is thus empirically confirmed to a great degree in this case study, and is demonstrably true both when the player's narrative role is made explicit to her and when it is not.

Chapter 6

Case Study: Asymmetric VEs

6.1 Introduction

In contrast to the case studies presented previously, where we showed how specific behaviors could be influenced with a particular purpose, in this case study we address the generalizability of these behavior influence techniques to more complex domains by describing a general architecture and proof-of-concept implementation of an Asymmetric Virtual Environment (AVE) to demonstrate how affordances for asymmetry in VGEs could be leveraged to influence many types of behavior with many purposes. An AVE is a generalization of a Transformed Social Interaction (TSI) [6, 7, 9–11], described in Chapter 2. In addition to TSIs, where only the social interaction is asymmetric, a more general AVE could support asymmetry in other types of content, such as auditory and visual cues in the environment that occur outside the context of a social interaction. However, to align with the abundant research on TSI [6, 7, 9–11], in this case study we focus on asymmetric avatar rendering. More specifically, to showcase how our implementation can also be used as a research platform for data collection, we present our AVE implementation within the context of an experimental design and preliminary data collection for multiple experimental treatments. In a similar vein to TSIs, we show how VE affordances for asymmetric rendering can be leveraged to change avatar colors for players sharing the same virtual context.

In VGEs, the only immediate information available about others usually comes from a person's online name and possibly an avatar. This is in contrast with social interactions in the physical world, where a great amount of communication happens implicitly. Facial expressions, tone of voice, and body language in general, together with how a person looks and what they wear, provide information that affects how we perceive each other [63]. All of this information comes together when building a preliminary impression of the other person, which shapes our expectations before and during a social interaction. When the context of human interaction is switched to a virtual environment, none of this information is readily available, or it is synthetic (i.e., generated by algorithms as animations, and therefore not an accurate portrayal of the physical world). Since the amount of available information about other people is greatly reduced in VGEs, the elements of information that are available become much more influential. Further, due to the virtual nature of these environments, there is an affordance to change appearances and interactions asymmetrically, where one participant is presented with a different set of social cues than what others see. As discussed in Chapter 2, previous work has demonstrated quite resoundingly that changing one's representation can affect not only the behavior of others [6, 7, 9–11], but also our own behavior [81, 88, 90, 121, 123], including through the colors of our avatars [115, 116]. This case study expands on this research by showing how AVEs can be used to asymmetrically represent avatar colors.

To demonstrate the applicability of AVEs to various domains, in addition to a casual game environment, this case study covers additional game features not explored in previous chapters, namely, multiplayer, messaging, and an immersive 3-D world. The preliminary experiment we present in this case study is set up to explore how avatar colors can influence semantic behavior in the form of individual and group performance when completing tasks in VGEs, which can be used to alter the difficulty of a game. Our experiment here is designed to test the following hypotheses:

H1: In a multi-player setting where a player perceives him or herself as having a red-colored avatar, while others perceive it as gray, he or she will perform significantly better than the other players.

H2: In a multi-player setting where a player perceives him or herself as having a gray-colored avatar, while others perceive it as red, he or she will not perform significantly better than other players.

H3: In a multi-player setting, groups where at least one player perceives him or herself as having a red-colored avatar, or is perceived by others as having a red-colored avatar, will perform significantly better than groups where every player is perceived as having a gray-colored avatar.

Performance will be defined as semantic behavior in terms of the time required to complete given tasks, the number of items found during a scavenger hunt task, and accuracy on a location identification task, each described in more detail later. Unobtrusively influencing player performance via avatar colors falls within the difficulty purpose of our taxonomy.

6.2 Method

We designed our experiment to asymmetrically manipulate avatar colors to be either red or gray. We asked participants to work in groups of at most 4 to solve two tasks: a scavenger hunt in a 3-D VGE followed by a 2-D item placement task in a casual gaming setting, which we call a map reconstruction task. During the scavenger hunt, participant groups were required to find and collect 10 items (shown in Figure 6.1) scattered throughout the world. In the map reconstruction task, each group was presented with a 2-D map representation of the scavenger hunt world and had to place icons of the items found previously from a palette onto the position on the map where they collectively believed the items were originally located during the scavenger hunt.

To avoid introducing undesired effects due to avatar appearance, every participant was embodied by an avatar modeled after the "Polly's World" character created by Perlin [91]. As shown in Figure 6.2, these avatars consisted of a triangular prism that was anthropomorphized via animation.

Figure 6.1 Icons of items used in our game, as shown to participants.

Figure 6.2 Avatar models in red and gray.

6.2.1 Experimental Design

We designed a full factorial experiment with two factors, as shown in Table 6.1. The first factor corresponds to the type of game, which refers to whether the game displayed leaderboards to participants. In the with leaderboard (L+) treatments, a randomly generated leaderboard was shown to every participant on a screen before each game task started, as shown in Figure 6.3. The purpose of this leaderboard was to prime participants for competitiveness, influencing players to limit collaboration and thereby make the game more difficult. The without leaderboard (L-) treatments simply skipped these leaderboard screens and went straight from the task instructions to the task itself.

The second factor corresponds to the appearance of the avatars. In all treatments, unless otherwise noted, every participant saw her avatar as gray and was seen as having a gray avatar by other participants. In the Red to others conditions, the avatar of one randomly chosen participant was asymmetrically rendered as red to every other participant, but was rendered as gray to herself. In the Red to self conditions, one randomly chosen participant saw herself as red while everyone else saw her as gray. Participant names were randomly assigned labels between P1 and P4.

Table 6.1 Full factorial experimental design including number of participants and number of game sessions per experimental treatment.

Type                      Appearance     Game Sessions  Participants
Without leaderboard (L-)  All gray       1              3
                          Red to others  2              6
                          Red to self    1              2
With leaderboard (L+)     All gray       1              2
                          Red to others  1              4
                          Red to self    1              2

6.2.2 Population and Sampling

Our target population was adult gamers of at least 18 years of age. Recruiting materials were distributed through NC State's Computer Science mailing lists, through social media and gaming-related online forums, and through flyers posted in Engineering Building II at NC State's Centennial Campus. Students enrolled in some Computer Science courses at NC State were also invited to participate and were offered extra credit as an incentive.

A total of 56 people registered to participate, of which only 19 ultimately completed the study. We registered one game session per treatment, with the exception of the L- Red to others treatment, which had 2, for a total of 7 game sessions, as shown in Table 6.1. This table also describes the distribution of participants across experimental conditions. While this small sample is unsuitable to establish external validity of any findings, our intention with this case study is only to demonstrate the data collection capabilities of our AVE implementation as a research platform.

6.2.3 Description of the Environment

We created a research platform from the ground up that takes advantage of the affordance of VEs for asymmetric content. Our implementation of an AVE uses a Unity3D-based 3-D component supporting both first-person and third-person perspectives, and a web-based 2-D component. A general high-level architecture diagram showing the different components of this platform and how they are related is shown in Figure 6.4.

Figure 6.3 Leaderboards shown in L+ treatments: (a) scavenger hunt; (b) map reconstruction.

The Unity3D component is divided into a game server and a game client. The client is capable of running on a desktop computer, in a Web browser, or on mobile devices. Asymmetry in the Unity3D component is handled by an authoritative server that instructs each client what type of content to render. Game clients use the Unity3D Master Server to discover and connect to our game server.

The Web component consists of an application server that renders HTML content and processes requests. In addition to handling asymmetric rendering, this component provides a framework for conducting studies, such as participant sign-up, consent form, surveys, and access control. This application server is also in charge of handling chat communication among participants. While not used in this case study, controlling player communication on the server provides an opportunity to asymmetrically alter messages, if desired.
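Because every message passes through the application server, such alterations reduce to applying a per-recipient transform before delivery. The sketch below illustrates the idea with hypothetical names; this capability was not exercised in this case study.

```python
def relay_chat(message, sender, recipients, transform=None):
    """Deliver a chat message to every recipient, optionally applying a
    per-recipient transform to support asymmetric messaging."""
    deliveries = {}
    for recipient in recipients:
        text = message if transform is None else transform(message, sender, recipient)
        deliveries[recipient] = text  # the real server would also log this to the database
    return deliveries

# Example transform: prepend the sender's name for one recipient only.
# relay_chat("found the lantern!", "P2", ["P1", "P3"],
#            transform=lambda m, s, r: f"{s}: {m}" if r == "P1" else m)
```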

Both the Unity3D and the Web components have access to the same MySQL database where studies and experimental treatments can be configured, and where collected data is stored. Our implementation supports asymmetry on both components. However, the experiment presented in this case study only makes use of asymmetric colors of avatars.

Figure 6.4 High-level architecture diagram.

6.2.4 Experimental Procedure

Game sessions were conducted exclusively online on participants' computers through the supported Web browser of their choice on either Windows or Mac OS X. Interested individuals would visit a website and sign up for one of the available time slots. Up to four people were allowed to register for the same time slot. Participants would then visit this website at the date and time they chose, where they would provide consent to participate and fill out a survey asking basic demographic information.

Since we manipulated avatar colors, this survey also tested participants for different types of daltonism. Participants were randomly assigned a name (e.g., P1–P4) and an avatar color according to the experimental treatment to which they were assigned.

Next, participants were presented with instructions on the scavenger hunt task, which included a picture of how their avatar would look to them, their avatar's name, and pictures of the 10 items they were tasked to find. This screen also contained instructions on how to control their avatar, pick up items, and use a chat to communicate with other players. In L+ treatments, the following screen would display a leaderboard corresponding to the scavenger hunt component, shown in Figure 6.3a. L- treatments simply skipped this screen.

Figure 6.5 Screenshots of both game tasks: (a) scavenger hunt; (b) map reconstruction.

The next screen would load the 3-D game inside the Unity Web Player next to a chat area, as shown in Figure 6.5a. Participants waited for other players to join in an introductory game screen where they had the opportunity to use the chat functionality to communicate with each other. The game screen would display the avatar names of all the players as they reached this stage. This screen displayed activity status and audio that indicated that the game would start soon. Once all available participants reached this screen, the researchers would start the game and participants could move freely in the environment by using their computer's mouse to choose a direction and the keys "W", "A", "S", "D" on their keyboards to move in that direction. This task used a 3-D third-person perspective to give participants the opportunity to see their avatars as they interacted with the game environment. The game also supported jumping by pressing the "space" key. Participants could pick up items by placing their avatars over an item and pressing "E" on their keyboard.

The game screen showed icons for all the items to be found. When a participant picked up an item, the game would display a message to all participants indicating the name of the player and the name of the item that was found, and the icon corresponding to the item that was just found would disappear from the list of missing items for all players. All game sessions were given 45 uninterrupted minutes to find all the items. If no items were found for a continuous period of 7 minutes, the game would display a randomly selected hint from a set of valid hints about the location of one of the missing items to all players simultaneously. This task ended when every item was found or when the 45 minutes elapsed, after which the game would display a countdown of 5 seconds before moving on to the next phase.
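These timing rules amount to a simple condition checked in the game loop. The following minimal sketch assumes, since the text does not specify it, that showing a hint resets the seven-minute window; all names are hypothetical.

```python
import random

TASK_LIMIT = 45 * 60  # seconds of uninterrupted play per scavenger hunt
HINT_AFTER = 7 * 60   # seconds without a find before a hint is shown

def maybe_hint(now, last_find, last_hint, remaining_hints):
    """Return a randomly selected hint to broadcast to all players if no
    item has been found for a continuous HINT_AFTER period, else None."""
    idle_since = max(last_find, last_hint)  # assumes a shown hint resets the window
    if remaining_hints and now - idle_since >= HINT_AFTER:
        return random.choice(remaining_hints)
    return None

def hunt_over(now, start, items_found, total_items=10):
    """The task ends when every item is found or 45 minutes elapse."""
    return items_found >= total_items or now - start >= TASK_LIMIT
```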

The following screen would display a second set of instructions, this time for the map reconstruction task. In L+ treatments, the next screen would display a second leaderboard corresponding to the map reconstruction component, shown in Figure 6.3b. L- treatments simply skipped this screen. Next, participants were asked to place all the items over a 2-D representation of the 3-D world as close to the position they were originally found as they could. For this purpose, an icon for every item was displayed on one side of the screen and was made draggable using the computer's mouse. This screen also contained a chat area, but messages sent during the scavenger hunt component were not displayed. All the items could be moved by all participants any number of times and the new location was updated on every participant's screen. Every time an item was moved by one participant, a small label was displayed over it containing the name of the participant that last moved it. This task would end after all the participants agreed on the placement of all the items by selecting a checkbox at the bottom of the screen. Once every participant had checked to agree on placement, the task would be considered complete and the experiment session would end.
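The move-and-agree protocol just described can be sketched as two small functions over hypothetical shared state; the names are illustrative, not those of our implementation.

```python
def move_item(state, item, position, mover):
    """Apply one participant's drag of an item and return what every
    client should now display: the shared position and a label naming
    the participant who last moved the item."""
    state["positions"][item] = position
    state["last_mover"][item] = mover
    return {"item": item, "position": position, "label": mover}

def reconstruction_complete(agreed):
    """The task ends only when every participant has checked the box
    agreeing on the placement of all the items."""
    return all(agreed.values())

# Example state for a two-player session:
# state = {"positions": {}, "last_mover": {}}
# agreed = {"P1": False, "P2": False}
```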

6.2.5 Evaluation Metrics

As we mentioned earlier, the intention behind manipulating avatar colors asymmetrically is to influence semantic behavior in the form of player performance with a difficulty purpose. Here we describe metrics for performance on each of the two tasks for both individual players and groups, per our hypotheses.

6.2.5.1 Individual Metrics

As a metric of performance of an individual during the scavenger hunt we consider the number of items found by each participant. During the map reconstruction task, we consider the average proximity of all of the participant's item placements with respect to the original location of each item in the 3-D world, calculated as the linear distance of the projection of both points on the XZ plane (i.e., the map floor).
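A minimal sketch of this proximity metric follows, assuming item placements have already been mapped into the 3-D world's coordinate system; the function names are illustrative.

```python
import math

def xz_distance(placed, original):
    """Linear distance between two points projected onto the XZ plane
    (the map floor); each point is an (x, y, z) tuple in world units."""
    return math.hypot(placed[0] - original[0], placed[2] - original[2])

def average_proximity(placements, originals):
    """Mean XZ-plane distance between a participant's item placements
    and the items' original 3-D world locations (lower is better)."""
    distances = [xz_distance(placements[item], originals[item])
                 for item in placements]
    return sum(distances) / len(distances)
```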

6.2.5.2 Group Metrics

To measure group performance during the scavenger hunt we looked at the total duration of this task in seconds. This time is measured from the moment the game task begins until all 10 items are found, or until 45 minutes elapse, whichever happens first. Similarly, we measured the duration of the map reconstruction task as a metric for performance on this second task. On this task, time is measured from the moment a participant first places an item on the map until the last item placement is made. Additionally, on the map reconstruction task we also consider the proximity of the final collective placement of all the items on the map with respect to their original location in the 3-D world, calculated as the linear distance of the projection of both points on the XZ plane (i.e., the map floor). Given that the 3-D world is laid out as a 1000 by 1000 unit square, the maximum possible distance would be the diagonal: 1000√2 ≈ 1414.21 units.

6.3 Analysis and Results

Our sample size renders a formal statistical analysis ineffective. Instead, and because our goal with this case study is not to produce generalizable results but to illustrate data collection on our AVE platform, we merely present the raw measurements from the limited data we collected.

6.3.1 Individual Performance

We expected that individual players who saw their own avatar as red would perform better than others who saw their avatars as gray (H1), and that players who had their avatars rendered red to others would not perform better than others (H2).

6.3.1.1 Number of Items Found

The average number of items found by players who saw their avatar as red was 4.5, compared to 3.53 items for players who saw their avatar as gray. This difference narrows when comparing players who were red to others, who found an average of 4 items, with players who were gray to others, who found 3.56 items.

These results seem to align with (H1) and (H2), where players who see their avatar as red perform better than others, and where the difference in performance between players who are red to others and players who are gray to others is marginal. Figure 6.6 shows the difference in number of items found by players in L+ and L- treatments.

Figure 6.6 Average number of items found by players by the color of their avatars and the type of game session: (a) avatar color seen by others; (b) avatar color seen by self.

6.3.1.2 Proximity of Items

On the map reconstruction task, the average distance between item placements and the items' original locations in the 3-D world was 302.13 units for players who saw their avatar as red, while players who saw their avatar as gray achieved an average distance of 233.67 units. Players who were seen by others as red had an average distance of 474.16 units, while players who were seen by others as gray had an average distance of 197.14 units. Since a smaller distance indicates better performance, these results are in disagreement with (H1) and (H2). Figure 6.7 shows the difference in average proximity by players in L+ and L- treatments.

6.3.2 Group Performance

We expected groups that had players with red avatars to perform better than groups with all gray avatars (H3). More specifically, we expected groups in the red to others and red to self treatments to perform better than groups in the all gray treatment.

Figure 6.7 Average distance, in game units, between players' placement of items on the map and the items' original locations: (a) avatar color seen by others; (b) avatar color seen by self.

6.3.2.1 Scavenger Hunt Duration

The average duration of the scavenger hunt task for groups in the all gray treatment was 1546 seconds. Groups in the red to others and red to self treatments took longer to complete this task, with averages of 1549.67 and 1940 seconds, respectively. Since a shorter duration indicates better performance, these results disagree with (H3). Figure 6.8 shows how the duration of this task varied in L+ and L- treatments.

6.3.2.2 Map Reconstruction Duration

The average duration of the map reconstruction task for groups in the all gray treatment was 193.5 seconds. Groups in the red to others and red to self treatments took comparable amounts of time to complete this task, with averages of 191.33 and 207.5 seconds, respectively. These values provide no support for (H3). Figure 6.9 shows how the duration of this task varied in L+ and L- treatments.

Figure 6.8 Average duration of the scavenger hunt task by avatar appearance and game type.

6.3.2.3 Proximity of Items

The average distance between the final placement of all items on the map reconstruction task with respect to their original location in the 3-D world for groups in the all gray treatment was 154.79 units. Groups in the red to others and red to self treatments were less accurate, with averages of 278.07 and 196.41 units, respectively. These values provide evidence against (H3). Figure 6.10 shows how the average distance on this task varied in L+ and L- treatments.

6.4 Discussion

The main contribution of this case study is our AVE architecture, which demonstrates how the affordance for asymmetry in VGEs could be leveraged to influence many types of behavior with many purposes in more complex domains. We showed that our AVE architecture supports asymmetric rendering in various domains by altering avatar colors both in a multiplayer immersive 3-D world during the scavenger hunt task and in a multiplayer casual gaming environment in the map reconstruction task. We also showed how the same tasks could be framed differently, as collaborative or competitive, to subtly influence player behavior.

Figure 6.9 Average duration of the map reconstruction task by avatar appearance and game type.

In addition to their potential to be used in commercial games, this case study demonstrated the use of Asymmetric Virtual Environments as viable research platforms that increase the internal validity of experiments by reducing spurious causes for effects and biases, such as effects of avatar appearance on behavior, providing better control of experimental variables. We showed this by designing an experiment around our AVE that asymmetrically altered avatar colors to influence semantic behavior in the form of player performance, affecting the difficulty of the game. This asymmetric rendering of avatar colors ensures that any differences in behavior cannot be explained, for example, by the expectations of others on our behavior, a phenomenon called behavioral confirmation [108].

While we cannot make any conclusive statements about the effects of avatar colors on behavior from our dataset due to our small sample size, our intention was to demonstrate the data collection capabilities of our AVE platform in relation to its asymmetric affordances across multiple experimental treatments. However, while conducting the experiment we recorded a few anecdotal observations relevant to unobtrusive behavior influence. A notable one was that players tended to visit areas with distinctive land features such as mountains, trees, and rivers more often than other areas. This suggests dynamic, and perhaps asymmetric, rendering of land features in locations of interest as a possible approach to subtly influence behavior in both single-player and multiplayer VEs.

Figure 6.10 Average distance of the final item placements on the map reconstruction task by avatar appearance and game type.

Chapter 7

Conclusions

This chapter is divided into two parts. First, in Section 7.1 we provide a summary of the work presented in this dissertation. Then, in Section 7.2 we discuss the main contributions and impact of this work.

7.1 Summary

In this document we described a series of unobtrusive behavior influence techniques in support of our thesis that in a virtual gaming environment, player knowledge, the framing of revealed knowledge, and/or the properties of in-game elements (e.g., player avatars) can be leveraged to have a predictable effect on player behavior, allowing authors to subtly influence players' behaviors without altering game mechanics. We organized this research by introducing a taxonomy, described in Chapter 2, that partitions behavior influence in VGEs by the type of behavior being affected in terms of level of abstraction (input-level vs. semantic), and by the purpose for modifying player behavior (narrative, difficulty, or scrutiny). Framed in this taxonomy, we provided evidence in support of our thesis in the form of a series of case studies covering both types of behavior and all three purposes.

The first case study, presented in Chapter 3, describes the Concentration Game, where we showed how input-level behavior in the form of mouse patterns can be influenced by controlling both player knowledge of the contents of the game tiles as well as how access to this knowledge was framed in order to scrutinize players or alter the game's difficulty. The second case study, presented in Chapter 4, describes the Typing Game, where we also showed how input-level behavior can be influenced, but this time with a different input device, namely, a keyboard. Here, keystroke patterns were influenced by leveraging player knowledge in the form of familiarity with words in order to scrutinize players or alter the game's difficulty. The third case study, presented in Chapter 5, describes the Mimesis Effect, where we explored how semantic behavior in the form of in-game choices of actions was influenced by the affordance of interactive narrative experiences to allow players to have different narrative roles, which can be used with a narrative purpose, and also showed how semantic actions can be used with a scrutiny purpose to determine a player's acting narrative role.

In the final case study, described in Chapter 6, we presented a proof-of-concept implementation of an Asymmetric Virtual Environment that could conceptually be used to influence many types of behavior with many purposes. We described this architecture in the context of an experimental design that asymmetrically altered avatar colors in multiplayer immersive 3-D and casual 2-D gaming environments to influence semantic behavior in the form of player performance with a difficulty purpose.

7.2 Impact

The overarching contribution of this work is providing additional tools to aid game designers and authors of interactive narratives in creating more engaging aesthetic experiences for their players by subtly influencing player behavior. Doing so enables these creators to unobtrusively guide player actions so that they align with authorial intent, or to inspect players' actions to obtain player-specific insights that can be used as inputs to generative or adaptive models. The series of behavior influence techniques that we presented in this dissertation could be directly leveraged by game designers to fine-tune these experiences for their players in games they are already building. For example, creators of role-playing games could frame a particular game quest in a way that promotes its relevance to a player's in-game role to increase the likelihood of that player choosing that particular quest.

Similarly, designers of educational games could leverage these behavior influence techniques to control the difficulty of their experiences to promote engagement and optimize learning outcomes, or to drive players through specific game content relevant to the learning experience.

For games research, and research of interactive experiences in general, this work offers two main contributions. First, it exposes a new avenue for exploration of how human biases can be leveraged to influence player behavior, bridging the gap between games research and cognitive science. Second, it alerts game researchers to how experimental design decisions may impact study results by subtly and inadvertently influencing their participants' behaviors. For example, studies that portray player avatars but fail to control for effects of avatar appearance, or studies with interactive narratives and role affordances that fail to control for effects of roles on choice, have the potential to bias results.

An additional contribution of this work is the taxonomy itself, which provides structure to organize not only the work we presented here, but also existing and upcoming literature in player modeling and player behavior influence. The taxonomy introduces a vocabulary that serves two main purposes. First, it simplifies parsing the games research literature to find, for example, ways in which input-level data has been used to scrutinize players, or behavior influence techniques to promote a narrative. Second, it helps researchers identify possible gaps in the literature or research opportunities where combinations of types of behavior and purposes appear under-represented in the taxonomy, indicating areas that are potentially under-explored or require further investigation.

REFERENCES

[1] Ahmed, Ahmed Awad E. & Traore, Issa. “A New Biometric Technology Based on Mouse Dynamics”. Transactions on Dependable and Secure Computing 4.3 (2007), pp. 165–179. DOI: 10.1109/TDSC.2007.70207.

[2] Ali, Noureldin & Yang, Yanyan. “Game Authentication Based on Behavior Pattern”. Proceedings of the 15th International Conference on Advances in Mobile Computing & Multimedia. MoMM2017. Salzburg, Austria: ACM, 2017, pp. 151–156. DOI: 10.1145/3151848.3151878.

[3] Anholt, Robert R. H. & Mackay, Trudy F.C. Principles of behavioral genetics. Academic Press, 2009.

[4] Arroyo, Ernesto; Selker, Ted & Wei, Willy. “Usability Tool for Analysis of Web Designs Using Mouse Tracks”. CHI ’06 Extended Abstracts on Human Factors in Computing Systems. CHI EA ’06. Montréal, Québec, Canada: ACM, 2006, pp. 484–489. DOI: 10.1145/1125451.1125557.

[5] Aylett, Ruth. “Emergent Narrative, Social Immersion and ’Storification’”. Proceedings of the 1st International Workshop on Narrative and Interactive Learning Environments. 2000, pp. 1–10.

[6] Bailenson, Jeremy N. “Transformed social interaction in collaborative virtual environments”. Digital media: Transformations in human communication (2006), pp. 255–264.

[7] Bailenson, Jeremy N. & Beall, Andrew C. “Transformed social interaction: Exploring the digital plasticity of avatars”. Avatars at Work and Play. Springer, 2006, pp. 1–16.

[8] Bailenson, Jeremy N. & Yee, Nick. “Digital Chameleons: Automatic Assimilation of Nonverbal Gestures in Immersive Virtual Environments”. Psychological Science 16.10 (2005). PMID: 16181445, pp. 814–819. DOI: 10.1111/j.1467-9280.2005.01619.x.

[9] Bailenson, Jeremy N.; Beall, Andrew C.; Loomis, Jack; Blascovich, Jim & Turk, Matthew. “Transformed social interaction: Decoupling representation from behavior and form in collaborative virtual environments”. PRESENCE: Teleoperators and Virtual Environments 13.4 (2004), pp. 428–441.

[10] Bailenson, Jeremy N.; Beall, Andrew C.; Loomis, Jack; Blascovich, Jim & Turk, Matthew. “Transformed Social Interaction, Augmented Gaze, and Social Influence in Immersive Virtual Environments”. Human Communication Research 31.4 (2005), pp. 511–537. DOI: 10.1111/j.1468-2958.2005.tb00881.x.

[11] Bailenson, Jeremy N.; Yee, Nick; Blascovich, Jim & Guadagno, Rosanna E. “Transformed social interaction in mediated interpersonal communication”. Mediated interpersonal communication (2008), pp. 77–99.

[12] Bargh, John A. & Chartrand, Tanya L. “The unbearable automaticity of being”. American Psychologist 54.7 (1999), pp. 462–479.

[13] Bargh, John A.; Chen, Mark & Burrows, Lara. “Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action.” Journal of Personality and Social Psychology 71.2 (1996), pp. 230–244.

[14] Barik, Titus; Harrison, Brent E.; Roberts, David L. & Jiang, Xuxian. “Spatial Game Signatures for Bot Detection in Social Games.” AIIDE. 2012.

[15] Barik, Titus; Chakraborty, Arpan; Harrison, Brent E.; Roberts, David L. & Amant, Robert St. “Speed/Accuracy Tradeoff in ACT-R Models of the Concentration Game”. Proceedings of the 2013 International Conference on Cognitive Modeling. 2013, pp. 281–286.

[16] Bellizzi, Joseph A. & Hite, Robert E. “Environmental Color, Consumer Feelings, and Purchase Likelihood.” Psychology & Marketing 9.5 (1992), pp. 347–363.

[17] Bennis, W. G.; Berkowitz, N.; Affinito, M. & Malone, M. “Authority, Power, and the Ability to Influence”. Human Relations 11.2 (1958), pp. 143–155. DOI: 10.1177/001872675801100204.

[18] Bethesda Game Studios. The Elder Scrolls V: Skyrim. Bethesda Softworks, 2013.

[19] Blomkvist, Stefan. “Persona – an overview”. Theoretical perspectives in human-computer interaction. 2002.

[20] Blow, Jonathan. “Game Development: Harder Than You Think”. Queue 1.10 (2004), pp. 28–37.

[21] Blum, Manuel; Von Ahn, Luis A.; Langford, John & Hopper, Nicholas. “The CAPTCHA project, “Completely automatic public turing test to tell computers and humans apart””. School of Computer Science, Carnegie-Mellon University, http://www.captcha.net (2000).

[22] Bouwhuis, Don & Bouma, Herman. “Visual word recognition of three-letter words as derived from the recognition of the constituent letters”. Perception & Psychophysics 25.1 (1979), pp. 12–22. DOI: 10.3758/BF03206104.

[23] Breiman, Leo. “Random forests”. Machine learning 45.1 (2001), pp. 5–32.

[24] Bruckman, Amy. The Combinatorics of Storytelling: Mystery Train Interactive. Tech. rep. MIT Media Lab, 1990.

[25] Card, Stuart K.; Moran, Thomas P. & Newell, Allen. “The model human processor: An engineering model of human performance”. Handbook of perception and human performance. (1986).

[26] Cardona-Rivera, Rogelio E.; Robertson, Justus; Ware, Stephen G.; Harrison, Brent E.; Roberts, David L. & Young, R. Michael. “Foreseeing Meaningful Choices”. Proceedings of the 10th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. 2014, pp. 9–15.

[27] Chartrand, Tanya L. & Bargh, John A. “The chameleon effect: The perception-behavior link and social interaction.” Journal of Personality and Social Psychology 76.6 (1999), pp. 893–910.

[28] Chen, Mon Chu; Anderson, John R. & Sohn, Myeong Ho. “What Can a Mouse Cursor Tell Us More?: Correlation of Eye/Mouse Movements on Web Browsing”. CHI ’01 Extended Abstracts on Human Factors in Computing Systems. CHI EA ’01. Seattle, Washington: ACM, 2001, pp. 281–282. DOI: 10.1145/634067.634234.

[29] Cialdini, Robert B. Influence: Science and Practice. Scott Foresman, 1985.

[30] Cialdini, Robert B. Influence: The Psychology of Persuasion. Revised. Harper Business, 2006.

[31] Cialdini, Robert B. Pre-Suasion: A revolutionary way to influence and persuade. Simon and Schuster, 2016.

[32] Cohen, Jacob. “A Power Primer”. Psychological Bulletin 112.1 (1992), p. 155.

[33] Cover, Jennifer G. The Creation of Narrative in Tabletop Role-Playing Games. McFarland & Company, 2010.

[34] Cruz, Christian Arzate & Uresti, Jorge Adolfo Ramirez. “Player-centered game AI from a flow perspective: Towards a better understanding of past trends and future directions”. Entertainment Computing 20 (2017), pp. 11–24. DOI: 10.1016/j.entcom.2017.02.003.

[35] Csikszentmihalyi, Mihaly. Finding flow: The psychology of engagement with everyday life. Basic Books, 1997.

[36] Csikszentmihalyi, Mihaly. “Play and Intrinsic Rewards”. Flow and the Foundations of Positive Psychology: The Collected Works of Mihaly Csikszentmihalyi. Dordrecht: Springer Netherlands, 2014, pp. 135–153. DOI: 10.1007/978-94-017-9088-8_10.

[37] Deci, Edward L. & Ryan, Richard M. Intrinsic Motivation and Self-Determination in Human Behavior. Plenum Press, 1985.

[38] Deuchar, Sue & Nodder, Carolyn. “The Impact of Avatars and 3D Virtual World Creation on Learning”. Proceedings of the 16th Annual NACCQ Conference. 2003, pp. 255–258.

[39] Dillon, Roberto. On the Way to Fun: An Emotion-based Approach to Successful Game Design. CRC Press, 2010.

[40] Dillon, Roberto. “The 6-11 Framework: a new approach to analysis and design”. Proceedings of the 3rd Annual GAMEON Asia Conference. Singapore, 2011, pp. 25–29.

[41] Domínguez, Ignacio X. & Roberts, David L. “Asymmetric Virtual Environments: Exploring the Effects of Avatar Colors on Performance”. Proceedings of the Workshop on Experimental AI In Games (EXAG) at the Tenth Artificial Intelligence and Interactive Digital Entertainment Conference (AIIDE 2014). Raleigh, North Carolina, USA, 2014.

[42] Domínguez, Ignacio X.; Goel, Alok; Roberts, David L. & St. Amant, Robert. “Detecting Abnormal User Behavior Through Pattern-mining Input Device Analytics”. Proceedings of the 2015 Symposium and Bootcamp on the Science of Security (HotSoS 2015). HotSoS ’15. Urbana, Illinois: ACM, 2015, 11:1–11:13. DOI: 10.1145/2746194.2746205.

[43] Domínguez, Ignacio X.; Dhawan, Jayant; St. Amant, Robert & Roberts, David L. “Exploring the Effects of Different Text Stimuli on Typing Behavior”. Proceedings of the 2016 International Conference on Cognitive Modeling (ICCM 2016). State College, PA, USA, 2016.

[44] Domínguez, Ignacio X.; Cardona-Rivera, Rogelio E.; Vance, James K. & Roberts, David L. “The Mimesis Effect: The Effect of Roles on Player Choice in Interactive Narrative Role-Playing Games”. Proceedings of the 34th Annual ACM Conference on Human Factors in Computing Systems (CHI 2016). San Jose, CA, USA, 2016. DOI: 10.1145/2858036.2858141.

[45] Dotsch, Ron & Wigboldus, Daniël H. J. “Virtual prejudice”. Journal of Experimental Social Psychology 44.4 (2008), pp. 1194–1198. DOI: 10.1016/j.jesp.2008.03.003.

[46] Ducheneaut, Nicolas; Wen, Ming-Hui; Yee, Nicholas & Wadley, Greg. “Body and Mind: A Study of Avatar Personalization in Three Virtual Worlds”. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI ’09. Boston, MA, USA: ACM, 2009, pp. 1151–1160. DOI: 10.1145/1518701.1518877.

[47] Durstenfeld, Richard. “Algorithm 235: Random Permutation”. Communications of the ACM 7.7 (1964), pp. 420–421.

[48] Elliot, Andrew J.; Maier, Markus A.; Moller, Arlen C.; Friedman, Ron & Meinhardt, Jörg. “Color and psychological functioning: The effect of red on performance attainment.” Journal of Experimental Psychology: General 136.1 (2007), pp. 154–168.

[49] Feit, Anna Maria; Weir, Daryl & Oulasvirta, Antti. “How We Type: Movement Strategies and Performance in Everyday Typing”. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. CHI ’16. Santa Clara, California, USA: ACM, 2016, pp. 4262– 4273. DOI: 10.1145/2858036.2858233.

[50] Fendt, Matthew W.; Harrison, Brent E.; Ware, Stephen G.; Cardona-Rivera, Rogelio E. & Roberts, David L. “Achieving the Illusion of Agency”. Proceedings of the 5th International Conference on Interactive Digital Storytelling. 2012, pp. 114–125.

[51] Fernández-Vara, Clara. “The Tribulations of Adventure Games: Integrating Story into Simulation through Performance”. PhD thesis. Georgia Institute of Technology, 2009.

[52] Fitts, Paul M. “The information capacity of the human motor system in controlling the amplitude of movement”. Journal of Experimental Psychology 47.6 (1954), pp. 381–391.

[53] Fleiss, Joseph L. “Measuring nominal scale agreement among many raters”. Psychological Bulletin 76 (5 1971), pp. 378–382.

[54] Gerard, Robert Marius. “Differential effects of colored lights on psychophysiological func- tions”. PhD thesis. University of California, Los Angeles., 1958.

[55] Gerjets, Peter; Scheiter, Katharina & Tack, Werner H. “Resource-adaptive selection of strategies in learning from worked-out examples”. Proceedings of the 22nd Annual Conference of the Cognitive Science Society. Erlbaum, 2000, pp. 166–171.

[56] Gianvecchio, Steven; Wu, Zhenyu; Xie, Mengjun & Wang, Haining. “Battle of Botcraft: Fighting Bots in Online Games with Human Observational Proofs”. Proceedings of the 16th ACM Conference on Computer and Communications Security. CCS ’09. Chicago, Illinois, USA: ACM, 2009, pp. 256–268. DOI: 10.1145/1653662.1653694.

[57] Gillath, Omri; McCall, Cade; Shaver, Phillip R. & Blascovich, Jim. “What Can Virtual Reality Teach Us About Prosocial Tendencies in Real and Virtual Environments?” Media Psychology 11.2 (2008), pp. 259–282. DOI: 10.1080/15213260801906489.

[58] Golle, Philippe & Ducheneaut, Nicolas. “Preventing Bots from Playing Online Games”. Comput. Entertain. 3.3 (2005), pp. 3–3. DOI: 10.1145/1077246.1077255.

[59] Google. Google RECAPTCHA. 2015. URL: https://www.google.com/recaptcha.

[60] Gray, Wayne D. & Boehm-Davis, Deborah A. “Milliseconds matter: An introduction to microstrategies and to their use in describing and predicting interactive behavior.” Journal of Experimental Psychology: Applied 6.4 (2000), pp. 322–335.

[61] Gray, Wayne D. & Fu, Wai-Tat. “Soft constraints in interactive behavior: the case of ignoring perfect knowledge in-the-world for imperfect knowledge in-the-head”. Cognitive Science 28.3 (2004), pp. 359–382. DOI: 10.1016/j.cogsci.2003.12.001.

[62] Hamari, Juho; Shernoff, David J.; Rowe, Elizabeth; Coller, Brianno; Asbell-Clarke, Jodi & Edwards, Teon. “Challenging games help students learn: An empirical study on engagement, flow and immersion in game-based learning”. Computers in Human Behavior 54 (2016), pp. 170–179. DOI: 10.1016/j.chb.2015.07.045.

[63] Harris, Thomas E. & Nelson, Mark D. Applied Organizational Communication: Theory and Practice in a Global Environment. New York: Lawrence Erlbaum Associates, 2008.

[64] Harrison, Brent E. & Roberts, David L. “An Analytic and Psychometric Evaluation of Dynamic Game Adaption for Increasing Session-Level Retention in Casual Games”. Computational Intelligence and AI in Games, IEEE Transactions on 7.3 (2015), pp. 207–219. DOI: 10.1109/TCIAIG.2015.2410757.

[65] Healey, Christopher & Enns, James. “Attention and Visual Memory in Visualization and Computer Graphics”. IEEE Transactions on Visualization and Computer Graphics 18.7 (2012), pp. 1170–1188. DOI: 10.1109/TVCG.2011.127.

[66] Heinsoo, Rob; Collins, Andy & Wyatt, James. Dungeons & Dragons Player’s Handbook: Arcane, Divine, and Martial Heroes. 4th. Wizards of the Coast, Inc., 2008.

[67] Heitz, Richard P. “The speed-accuracy tradeoff: history, physiology, methodology, and behavior”. Frontiers in Neuroscience 8.150 (2014). DOI: 10.3389/fnins.2014.00150.

[68] Herman, David. “Cognitive Approaches to Narrative Analysis”. Cognitive Poetics: Goals, Gains and Gaps. Ed. by Brône, Geert & Vandaele, Jeroen. Berlin, Germany: Mouton de Gruyter, 2009, pp. 79–124.

[69] Hitchens, Michael & Drachen, Anders. “The Many Faces of Role-Playing Games”. International Journal of Role-Playing 1.1 (2008), pp. 3–21.

[70] Hunicke, Robin; LeBlanc, Marc & Zubek, Robert. “MDA: A formal approach to game design and game research”. Proceedings of the AAAI Workshop on Challenges in Game AI. Vol. 4. 1. AAAI Press San Jose, CA. 2004, pp. 1–5.

[71] John, Bonnie E. “TYPIST: A Theory of Performance in Skilled Typing”. Hum.-Comput. Interact. 11.4 (1996), pp. 321–355.

[72] Kahneman, Daniel. “A perspective on judgment and choice: Mapping bounded rationality”. American Psychologist 58.9 (2003), pp. 697–720.

[73] Kickmeier, Michael & Albert, Dietrich. “Micro-adaptivity: Protecting immersion in didactically adaptive digital educational games”. 26 (2010), pp. 95–105.

[74] Kirkpatrick, Paul. “Probability theory of a simple card game”. The Mathematics Teacher (1954), pp. 245–248.

[75] Klimmt, Christoph; Blake, Christopher; Hefner, Dorothée; Vorderer, Peter & Roth, Christian. “Player Performance, Satisfaction, and Video Game Enjoyment”. Proceedings of the 2009 International Conference on Entertainment Computing (ICEC 2009). Ed. by Natkin, Stéphane & Dupire, Jérôme. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 1–12. DOI: 10.1007/978-3-642-04052-8_1.

[76] Landis, Richard J. & Koch, Gary G. “The Measurement of Observer Agreement for Categorical Data”. Biometrics 33.1 (1977), pp. 159–174.

[77] Lichtenfeld, Stephanie; Elliot, Andrew J.; Maier, Markus A. & Pekrun, Reinhard. “Fertile Green: Green Facilitates Creative Performance”. Personality and Social Psychology Bulletin 38.6 (2012), pp. 784–797. DOI: 10.1177/0146167212436611.

[78] Lieberoth, Andreas. “Shallow Gamification: Testing Psychological Effects of Framing an Activity as a Game”. Games and Culture 10.3 (2015), pp. 229–248.

[79] Martey, Rosa Mikeal; Stromer-Galley, Jennifer; Banks, Jaime; Wu, Jingsi & Consalvo, Mia. “The strategic female: gender-switching and player behavior in online games”. Information, Communication & Society 17.3 (2014), pp. 286–300. DOI: 10.1080/1369118X.2013.874493.

[80] Mawhorter, Peter; Mateas, Michael; Wardrip-Fruin, Noah & Jhala, Arnav. “Towards a Theory of Choice Poetics”. Proceedings of the 9th International Conference on the Foundations of Digital Games. 2014.

[81] Merola, Nicholas A.; Peña, Jorge F. & Hancock, Jeffrey T. “Avatar color and social identity effects on attitudes and group dynamics in online video games”. Proceedings of the 2006 Annual International Communication Association Conference. 2006, pp. 1–26.

[82] Monrose, Fabian & Rubin, Aviel. “Authentication via Keystroke Dynamics”. Proceedings of the 4th ACM Conference on Computer and Communications Security. CCS ’97. Zurich, Switzerland: ACM, 1997, pp. 48–56. DOI: 10.1145/266420.266434.

[83] Mueller, Shane T. & Weidemann, Christoph T. “Alphabetic letter identification: effects of perceivability, similarity, and bias.” Acta Psychologica 139.1 (2012), pp. 19–37.

[84] Murphy-Hill, Emerson R.; Zimmermann, Tom & Nagappan, Nachiappan. “Cowboys, Ankle Sprains, and Keepers of Quality: How Is Video Game Development Different from Software Development?” Proceedings of the 36th International Conference on Software Engineering. 2014, pp. 1–11.

[85] Murray, Janet H. Hamlet on the Holodeck: The Future of Narrative in Cyberspace. Simon and Schuster, 1997.

[86] Pan, Bing; Hembrooke, Helene A.; Gay, Geri K.; Granka, Laura A.; Feusner, Matthew K. & Newman, Jill K. “The Determinants of Web Page Viewing Behavior: An Eye-tracking Study”. Proceedings of the 2004 Symposium on Eye Tracking Research & Applications. ETRA ’04. San Antonio, Texas: ACM, 2004, pp. 147–154. DOI: 10.1145/968363.968391.

[87] Pavlas, Davin. “A Model Of Flow And Play In Game-based Learning: The Impact Of Game Characteristics, Player Traits, And Player States”. University of Central Florida (2010).

[88] Peña, Jorge; Hancock, Jeffrey T. & Merola, Nicholas A. “The Priming Effects of Avatars in Virtual Settings”. Communication Research 36.6 (2009), pp. 838–856. DOI: 10.1177/0093650209346802.

[89] Pedregosa, F. et al. “Scikit-learn: Machine Learning in Python”. Journal of Machine Learning Research 12 (2011), pp. 2825–2830.

[90] Peña, Jorge F. “Integrating the Influence of Perceiving and Operating Avatars Under the Automaticity Model of Priming Effects”. Communication Theory 21.2 (2011), pp. 150–168. DOI: 10.1111/j.1468-2885.2011.01380.x.

[91] Perlin, Ken. Polly’s World. http://mrl.nyu.edu/~perlin/experiments/polly/. Online; accessed 11-February-2013.

[92] Pope, Alan T. & Bogart, Edward H. “Method of encouraging attention by correlating video game difficulty with attention level”. Pat. 5377100. 1994.

[93] Prince, Gerald. A Dictionary of Narratology: Revised Edition. University of Nebraska Press, 2003.

[94] Propp, Vladimir. Morphology of the Folktale. Austin, TX, USA: University of Texas Press, 1968.

[95] Raven, Bertram H. Social influence and power. Tech. rep. AD0609111. University of California Los Angeles, DTIC Document, 1964, p. 13.

[96] Reicher, Stephen D.; Spears, Russell & Postmes, Tom. “A social identity model of deindividuation phenomena”. European Review of Social Psychology 6.1 (1995), pp. 161–198.

[97] Revett, Kenneth; Jahankhani, Hamid; Magalhães, Sérgio T. & Santos, Henrique M. D. “A survey of user authentication based on mouse dynamics”. Global E-Security (2008), pp. 210–219.

[98] Riedl, Mark; Saretto, C. J. & Young, R. Michael. “Managing Interaction Between Users and Agents in a Multi-agent Storytelling Environment”. Proceedings of the Second International Joint Conference on Autonomous Agents and Multiagent Systems. AAMAS ’03. Melbourne, Australia: ACM, 2003, pp. 741–748. DOI: 10.1145/860575.860694.

[99] Riedl, Mark O. & Bulitko, Vadim. “Interactive Narrative: An Intelligent Systems Approach”. AI Magazine 34.1 (2013), pp. 67–77.

[100] Rijsbergen, C. J. van. “Information retrieval. 1979”. Butterworths, London (1997).

[101] Roberts, David L. & Isbell, Charles L. “Lessons on Using Computationally Generated Influence for Shaping Narrative Experiences”. Computational Intelligence and AI in Games, IEEE Transactions on 6.2 (2014), pp. 188–202. DOI: 10.1109/TCIAIG.2013.2287154.

[102] Robertson, Justus & Young, R. Michael. “Perceptual Experience Management”. IEEE Trans- actions on Games (2018), pp. 1–1. DOI: 10.1109/TG.2018.2817162.

[103] Ryan, Richard M. & Deci, Edward L. “Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being”. American Psychologist 55.1 (2000), p. 68.

[104] Salthouse, Timothy A. “Perceptual, Cognitive, and Motoric Aspects of Transcription Typing.” Psychological Bulletin 99.3 (1986), pp. 303–319.

[105] Sherry, John L. “Flow and Media Enjoyment”. Communication Theory 14.4 (2006), pp. 328– 347. DOI: 10.1111/j.1468-2885.2004.tb00318.x.

[106] Simon, Herbert A. “A Behavioral Model of Rational Choice”. The Quarterly Journal of Eco- nomics 69.1 (1955), pp. 99–118. DOI: 10.2307/1884852.

[107] Sivunen, Anu & Hakonen, Marko. “Review of Virtual Environment Studies on Social and Group Phenomena”. Small Group Research 42.4 (2011), pp. 405–457. DOI: 10.1177/1046496410388946.

[108] Snyder, Mark; Tanke, Elizabeth D. & Berscheid, Ellen. “Social perception and interpersonal behavior: On the self-fulfilling nature of social stereotypes.” Journal of Personality and Social Psychology 35.9 (1977), pp. 656–666.

[109] Sweetser, Penelope & Wyeth, Peta. “GameFlow: A Model for Evaluating Player Enjoyment in Games”. Computers in Entertainment 3.3 (2005), pp. 3–3. DOI: 10.1145/1077246.1077253.

[110] Thawonmas, Ruck; Kashifuji, Yoshitaka & Chen, Kuan-Ta. “Detection of MMORPG bots based on behavior analysis”. Proceedings of the 2008 International Conference on Advances in Computer Entertainment Technology. ACM. 2008, pp. 91–94.

[111] Thue, David; Bulitko, Vadim; Spetch, Marcia & Wasylishen, Eric. “Interactive Storytelling: A Player Modelling Approach”. Proceedings of the Third AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. AIIDE. Stanford, California: AAAI Press, 2007, pp. 43–48.

[112] Tversky, Amos & Kahneman, Daniel. “Judgment under Uncertainty: Heuristics and Biases”. Science 185.4157 (1974), pp. 1124–1131. DOI: 10.1126/science.185.4157.1124.

[113] Tversky, Amos & Kahneman, Daniel. “The Framing of Decisions and the Psychology of Choice”. Science 211.4481 (1981), pp. 453–458.

[114] Van Der Pligt, Joop & Vliek, Michael. The Psychology of Influence: Theory, research and practice. Taylor & Francis, 2016.

[115] Vrij, Aldert. “Wearing black clothes: The impact of offenders’ and suspects’ clothing on Impression formation”. Applied Cognitive Psychology 11.1 (1997), pp. 47–53.

[116] Vrij, Aldert & Akehurst, Lucy. “The existence of a black clothing stereotype: The impact of a victim’s black clothing on impression formation”. Psychology, Crime and Law 3.3 (1997), pp. 227–237.

[117] Waskul, Dennis & Lust, Matt. “Role-Playing and Playing Roles: The Person, Player, and Persona in Fantasy Role-Playing”. Symbolic Interaction 27.3 (2004), pp. 333–356.

[118] Yan, Jeff & El Ahmad, Ahmad Salah. “Usability of CAPTCHAs or Usability Issues in CAPTCHA Design”. Proceedings of the 4th Symposium on Usable Privacy and Security. SOUPS ’08. Pittsburgh, Pennsylvania: ACM, 2008, pp. 44–52. DOI: 10.1145/1408664.1408671.

[119] Yannakakis, Georgios N. & Togelius, Julian. “Experience-Driven Procedural Content Generation”. IEEE Transactions on Affective Computing 2.3 (2011), pp. 147–161. DOI: 10.1109/T-AFFC.2011.6.

[120] Yee, Nick. “Motivations for Play in Online Games”. CyberPsychology & Behavior 9.6 (2006), pp. 772–775. DOI: 10.1089/cpb.2006.9.772.

[121] Yee, Nick & Bailenson, Jeremy. “The Proteus Effect: The Effect of Transformed Self-Representation on Behavior”. Human Communication Research 33.3 (2007), pp. 271–290. DOI: 10.1111/j.1468-2958.2007.00299.x.

[122] Yee, Nick; Bailenson, Jeremy N.; Urbanek, Mark; Chang, Francis & Merget, Dan. “The Unbearable Likeness of Being Digital: The Persistence of Nonverbal Social Norms in Online Virtual Environments”. CyberPsychology & Behavior 10.1 (2007). PMID: 17305457, pp. 115–121. DOI: 10.1089/cpb.2006.9984.

[123] Yee, Nick; Bailenson, Jeremy N. & Ducheneaut, Nicolas. “The Proteus Effect: Implications of Transformed Digital Self-Representation on Online and Offline Behavior”. Communication Research 36.2 (2009), pp. 285–312. DOI: 10.1177/0093650208330254.

[124] Yee, Nick; Ducheneaut, Nicolas; Nelson, Les & Likarish, Peter. “Introverted Elves & Conscientious Gnomes: The Expression of Personality in World of Warcraft”. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI ’11. Vancouver, BC, Canada: ACM, 2011, pp. 753–762. DOI: 10.1145/1978942.1979052.

[125] Zanbaka, Catherine; Goolkasian, Paula & Hodges, Larry. “Can a Virtual Cat Persuade You?: The Role of Gender and Realism in Speaker Persuasiveness”. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI ’06. Montréal, Québec, Canada: ACM, 2006, pp. 1153–1162. DOI: 10.1145/1124772.1124945.

APPENDICES

Appendix A

Concentration Game Feature Plots

A.1 Analysis 1 Plots

Figure A.1 Histograms of all 6 features: (a) time between clicks; (b) time between a click and a succeeding mouse movement; (c) screen region hover count; (d) count of change in direction of mouse motion; (e) task completion time; (f) total number of clicks. In blue, values for no reveal rounds. In red, values for mixed reveal rounds.

A.2 Analysis 2 Plots

Figure A.2 Histograms of all 6 features: (a) time between clicks; (b) time between a click and a succeeding mouse movement; (c) screen region hover count; (d) count of change in direction; (e) task completion time; (f) total number of clicks. In blue, values for no reveal rounds. In red, values for full reveal rounds.

A.3 Analysis 3 Plots

Figure A.3 Histograms of all 6 features: (a) time between clicks; (b) time between a click and a succeeding mouse movement; (c) screen region hover count; (d) count of change in direction; (e) task completion time; (f) total number of clicks. In blue, values for no reveal rounds. In red, values for full reveal rounds. In green, values for partial reveal rounds.

Appendix B

Mimesis Effect Validation Phase

To inform the design of the game we used in the EXPERIMENT PHASE of the case study we present in Chapter 5, we identified and validated our assumptions in a preliminary phase we call the VALIDATION PHASE, which we describe here. These assumptions reflect what we assume to be true about how players will engage with the choice structures in the game we use in our experiment, which (if not controlled for) could represent potentially spurious factors in our experimental design. In this phase, we identified requirements that our experiment's interactive narrative role-playing game had to satisfy. These requirements, which served as game design constraints, represented experiment factors we needed to control for to help guarantee that players engaged with the choice structures in our custom game as we expected. The last two of these three requirements were design decisions that needed validation in order to be satisfied. In this appendix, we discuss the requirements, the rationale for them, as well as the validation to satisfy the second and third requirements.

B.1 Game Design Requirements

Our three requirements were developed to guarantee three things about how experiment participants would engage with the choice structures we developed in our game: a) that experiment participants would in fact treat our game as such (i.e., our game satisfied external validity), b) that the roles we provided controlled for player biases vis-à-vis role, and c) that the actions we afforded in the game were easily recognizable as belonging to a role without having to telegraph the association to experiment participants during gameplay.

B.1.1 Requirement #1: External Validity of Game Experience

The game’s design had to be elaborate enough to be treated as an interactive narrative role-playing game by experimental participants, but remain tractable to produce for our experiment. Satisfying this requirement controlled for the effect of choice outcomes in the gameplay experience.

Recent work [78] has demonstrated a shift in people's motivations when they are presented activities in a game-based framing, and we wanted to ensure that experiment responses were not inhibited due to the experience not feeling like a game. For Murray [85], the key qualities of interactive narratives are navigable space, encyclopedic capacity, procedurality, and participation. Our game affords all these except encyclopedic capacity, due to the relatively small scope demanded by a highly-controlled experimental environment. Of the remaining qualities, participation is the quality most closely linked to player action. Ensuring that a player's sense of participation in our environment is undiminished requires the maintenance of her sense of dramatic agency – the satisfying power to take meaningful action and see the results of decisions and choices [85]. However, the amount of interactive narrative content that must be authored to support dramatic agency is exponential in the number of ways the player can direct the development of the unfolding narrative [24]. To keep the authorial burden tractable for our study, while providing an interactive narrative experience that would be treated as such by experimental participants, we leveraged the illusion of agency as studied by Fendt et al. [50]. Their work attempted to discern a player's sense of dramatic agency on the basis of the feedback players received through a choice structure's outcomes in an interactive narrative (a text-based choose-your-own-adventure). Fendt et al. concluded that simply acknowledging (in their case, through textual feedback) a player's choice after she selected a particular option is enough to create the illusion of agency, even if her choice has no other effect on the progression of the interactive narrative.

We did not seek to validate this requirement in our experimental design, since we were building on well-established work [26, 50]. However, we did add distinct feedback for every action. Because we afforded a graphical navigable space in our game, the feedback we provided was visual rather than textual, but otherwise the principle of the illusion of agency was applied in the same manner.

B.1.2 Requirement #2: Controlling for Player Role Biases

Due to the nature of role-playing games, roles are very fluid in terms of their behaviors and composition [33, 117]. We wanted to experimentally test whether a player's awareness of her role had a meaningful effect in terms of her action selection when faced with a choice structure, independent of which role was being examined. If some roles express distinct but similar preferences over action sequences, or if the roles carry with them a tacit association with a particular gender or behavior alignment,¹ spurious correlations may be introduced into the analysis. This would affect the choice structure's framing, since the framing context would be wider than just the player's sense of her narrative role with respect to narrative structure (it would include, for example, gender expectations or behavioral alignment expectations). We therefore needed to select roles that had negligible overlap in terms of their characteristics, such that they were recognizably distinct, and for which there existed no a priori association with a gender or behavior alignment.

Because of the popularity and influence of the tabletop role-playing game Dungeons & Dragons [33], we chose to use it as the basis for narrative roles in our game. Dungeons & Dragons (D&D) is set in a fantasy genre, which commonly uses supernatural phenomena as a primary plot element. While D&D invites players to extensively customize their in-game persona, the player must first select one role from a finite set of character roles in the game to use as a baseline for that persona [66]. D&D supports many roles to choose from; for the purposes of our experiment, we needed to select roles that shared little overlap in the kinds of choice structure options they would take, and in the kinds of attributes associated to them. This is so there was a clear distinction between the afforded roles, and so that a player could make the distinction with as little effort as possible.

We arbitrarily selected three distinguishing attributes for characters: strong, magical, and stealthy.

¹Behavior alignment in this case refers to whether the roles were considered to be intrinsically good or evil.

Figure B.1 Our triad of role-attribute mappings. We selected three attributes and identified three corresponding roles we felt best represented the attributes. Nodes represent role-attribute mappings, and edges are attributes shared between the connected role-attribute mappings. The edge opposite a node is the antonymic attribute to the node's role-attribute.

These attributes led to the following three roles for our study: Fighter, Mage, and Rogue, which are related to each other as illustrated in the triad in Figure B.1. The in-game descriptions of these roles were designed as schematized paragraphs, exactly three sentences long. The first and third sentence of each paragraph was taken from that role's description as written in the D&D Player's Handbook (4th Edition) [66]. The second sentence was designed to make clear the relation of that role in the triad in Figure B.1. These descriptions make clear the relations in the triad, as well as refer to the types of actions that those roles typically, but not necessarily, take. Each description was as follows:

Fighter – “Fighters are experts in armed combat, relying on muscle, training, and pure grit to see them through. While they are not stealthy or magical, they are strong. They typically mix it up in close combat, protect their companions, and hack enemies into submission while their attacks rain down fruitlessly on their heavily armed bodies.”

Mage – “Masters of potent arcane powers, mages disdain physical conflict in favor of awesome magic. While they are not strong or stealthy, they are magical. They typically hurl balls of fire that incinerates their enemies, cast spells that change the battlefield, or research arcane rituals that can alter time and space.”

Rogue – “Thieves, scoundrels, dungeon-delvers, jacks-of-all-trades – rogues have a reputation for larceny and trickery. While they are not strong or magical, they are stealthy. They typically slip into and out of the shadows on a whim, tumble across the field of battle with little fear of enemy reprisal, and appear from nowhere to plant a blade in their foe’s back.”

To validate that our choice of role attributes, role descriptions, and role gender and behavioral alignment biases for our game would be perceived as intended by participants of the EXPERIMENT PHASE of our study, we conducted a survey that is discussed in the VALIDATION SURVEY Section.

B.1.3 Requirement #3: Actions are Recognizably Role-Specific

The actions (i.e. choice structure options) that we afford during gameplay must be easily recognizable as belonging to a role without having to telegraph the association to experiment participants during gameplay. We did not want to overtly signal to the player the association of in-game options to roles, to avoid implying (through the game’s interface) that the game expected them to select a particular option (especially in the experiment conditions where the participant is explicitly made aware of her role). We wanted the player to select whatever option “felt natural” for her throughout the narrative’s development, without instructing her to role-play.

Because actions could be interpreted differently by different people, we needed to ensure that, even without narrative context, game players would identify our action choices as typical of the specific roles afforded in our game. As with our selection of interactive narrative roles, we identified a set of candidate actions based on D&D, inspired by the actions that the D&D Player’s Handbook (4th Edition) [66] identifies as afforded to our selected roles. To validate that our action choices would be perceived by participants of the EXPERIMENT PHASE as typical of the roles we designed them to match, we conducted a survey that is discussed in the VALIDATION SURVEY Section below.

B.2 Validation Survey

We conducted an online survey to validate the design decisions taken to satisfy requirements #2 and #3 for our game’s choice structures with respect to their framing of our three roles: Fighter, Mage, and Rogue. Just as in the EXPERIMENT PHASE, in the VALIDATION PHASE we targeted interactive narrative role-playing game players at least 18 years of age. As mentioned in Chapter 5, to avoid introducing biases, we stratified the sampling frame to distinguish and separate the sampling for each phase. Participants were recruited using a combination of convenience and snowball sampling from the entertainment social network and news site Reddit. Our validation sample consisted of 231 subjects between the ages of 15 and 60 (M = 26.07, SD = 6.89), of whom 77.73% were male. Our advertisement for recruitment targeted native English speakers, but we did not ask participants to self-assess their command of the English language. Of those recruited, 79.6% reported having played table-top role-playing games, with more than half (52.6%) reporting that they play table-top role-playing games frequently. Only 2.2% reported never having played computer or console role-playing games.

For the validation of role attributes, we asked survey participants to check, from the list of three afforded roles, which ones satisfied the presented role attributes (strong, magical, stealthy) as well as the antonymic attributes (frail, non-magical, non-stealthy). For the validation of role descriptions, we took each role’s three-sentence description and, for each sentence, asked survey participants to name the role that best matched the sentence. For the validation of gender and behavioral alignment perception, we asked survey participants to identify which gender and behavioral alignment they considered each role to be most associated with. For the validation of actions, we asked participants to select the role most likely to execute the action (which would be presented in the experiment as a choice structure option). Of the 231 participants who completed this survey, we used the Fleiss’ Kappa [53] statistic to evaluate inter-rater agreement for the 192 who answered all of the questions. We obtained a value of κ = 0.801, indicating almost perfect agreement per Landis & Koch [76]. The survey randomized the presentation of all questions. We discuss our results for each portion of the survey below.
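As a minimal sketch of how such an agreement statistic can be computed (illustrative only, not the analysis code used in the study; it assumes the raw answers are arranged with one row per survey question and one column per rater), Fleiss’ Kappa is available in the statsmodels package:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical raw answers: one row per survey question (item), one column
# per rater, each cell holding that rater's categorical answer. In the study
# this would be an (n_questions x 192) array; here we use a toy 3x3 example.
answers = np.array([
    ["Fighter", "Fighter", "Mage"],
    ["Rogue",   "Rogue",   "Rogue"],
    ["Mage",    "Mage",    "Mage"],
])

# Collapse the per-rater answers into an (items x categories) count table.
counts, categories = aggregate_raters(answers)

# Fleiss' Kappa over the count table (the study reports kappa = 0.801).
print(fleiss_kappa(counts, method="fleiss"))
```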

B.2.1 Validation of Role-Attribute Mappings

Participants were asked to identify, of the three roles, which one(s) were associated with each of the attributes individually (strong, magical, stealthy, frail, non-magical, non-stealthy). Participants could select multiple roles for each attribute if they felt that attribute was applicable to multiple roles. This section was designed to validate that our role-attribute mappings would be perceived as identified in Figure B.1. In general, participants agreed with our role-attribute triad, identifying that: a) Fighters are generally strong (99.1%), non-stealthy (94.7%), and non-magical (97.3%); b) Mages are generally frail (97.8%), non-stealthy (59.7%), and magical (100%); and c) Rogues are generally not strong (30.8% considered them frail, but only 19.2% considered them strong), stealthy (98.7%), and non-magical (69.3%).

B.2.2 Validation of Role Descriptions

Participants were presented with each of the three-sentence role descriptions we developed, one sentence at a time. For each sentence, the participant was asked to provide the name of the role that best matched the sentence. In general, participants correctly identified the role that matched each individual sentence. The three sentences in the Fighter description were attributed to the Fighter role by 98.3%, 96.5%, and 92.6% of the respondents, respectively. The three sentences in the Mage description were attributed to the Mage role by 100%, 99.6%, and 98.7% of the respondents, respectively. The three sentences in the Rogue description were attributed to the Rogue role by 99.1%, 98.7%, and 99.1% of the respondents, respectively.

B.2.3 Validation of Role Gender and Behavioral Alignment

Participants were asked to identify, for each role, an association to a particular behavioral alignment (good, evil, neutral, or none). Fighters were mostly considered to have a neutral or no alignment (79.5%), with 20.4% considering them good and 0% considering them evil. Mages were also mostly considered to have a neutral or no alignment (89.5%), with 9.6% considering them good and 0.9% considering them evil. Similarly, Rogues were mostly considered to have a neutral or no alignment (84.7%), with 0.4% considering them good and 14.8% considering them evil.

Participants were also asked to identify an association of roles to genders (male, female, others, or none). Regarding association to specific genders (male, female, or others), Fighters were regarded as male by 44.3% of the respondents, and female by 0.4%. Mages were regarded as male by 8.3% of the respondents, and female by 3.1%. Similarly, Rogues were regarded as male by 10.4% of the respondents, and female by 6.1%.

B.2.4 Validation of Action Choices

Participants were asked to select, for each individual action, the role most likely to execute that action. The results for these ratings are summarized in Table B.1. This table presents, for each choice structure (row in the table, 12 total), the choice structure option that corresponds to a particular role (column in the table, three per choice structure), and the percentage of survey participants who agreed that the cell was a match for the column. As shown in this table, our game only included actions that were individually agreed upon by at least 78.5% of participants, with the average agreement being much higher.

Table B.1 Choice point options in the order they were presented in the game. The value in parentheses indicates the level of agreement of the assignment of a particular choice option to its particular role.

Choice | Fighter                   | Mage                      | Rogue
1      | Battleaxe (99.6%)         | Staff (99.6%)             | Set of Daggers (97.8%)
2      | Shake Tree (78.5%)        | Levitate (98.7%)          | Acrobatic Climb (97.4%)
3      | Charge! (98.2%)           | Sleep Spell (96.9%)       | Hide (96.9%)
4      | Smash (99.1%)             | Disintegrate (96%)        | Acrobatic Jump (96.9%)
5      | Intimidate (92.1%)        | Mind Control (98.2%)      | Bluff (90.4%)
6      | Brute Strike (98.7%)      | Arcane Missile (95.6%)    | Silent Strike (98.7%)
7      | Fierce Blow (98.2%)       | Frostbolt (99.1%)         | Finesse Strike (81.2%)
8      | Share war stories (98.7%) | Cast party tricks (83.8%) | Tell him what he wanted to hear (90.8%)
9      | Crushing Blow (97.8%)     | Fireball (98.7%)          | Sneak Attack (98.7%)
10     | Endure (92.1%)            | Ice Shield (98.2%)        | Dodge (93.4%)
11     | Skullcrusher (97.8%)      | Blizzard (98.7%)          | Poison (91.7%)
12     | Bearhug (97.8%)           | Freeze (98.7%)            | Trap (91.7%)
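The per-cell percentages in Table B.1 are simple agreement fractions. The following sketch of the underlying arithmetic uses an illustrative helper of our own devising, not the study’s analysis code:

```python
from collections import Counter

def option_agreement(votes, intended_role):
    """Fraction of raters whose chosen role matched the role an option
    was designed to evoke (the quantity in parentheses in Table B.1)."""
    return Counter(votes)[intended_role] / len(votes)

# Toy example: if 227 of 228 raters assign "Battleaxe" to the Fighter,
# agreement is 227/228, or about 99.6%.
votes = ["Fighter"] * 227 + ["Mage"]
print(f"{option_agreement(votes, 'Fighter'):.1%}")  # -> 99.6%
```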
