
Playing Prejudice: The Impact of Game-Play on Attributions of Gender and Racial Bias

Jessica Hammer

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy under the Executive Committee of the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

2014

© 2014 Jessica Hammer
All rights reserved

ABSTRACT

Playing Prejudice:

The Impact of Game-Play on Attributions of Gender and Racial Bias

Jessica Hammer

This dissertation explores new possibilities for changing Americans' theories about racism and sexism. Popular American rhetorics of discrimination, and learners' naïve models, are focused on individual agents' role in creating bias. These theories do not encompass the systemic and structural aspects of discrimination in American society. When learners can think systemically as well as agentically about bias, they become more likely to support systemic as well as individual remedies. However, shifting from an agentic to a systemic model of discrimination is both cognitively and emotionally challenging. To tackle this difficult task, this dissertation brings together the literature on prejudice reduction and conceptual change to propose using games as an entertainment-based intervention to change players' attribution styles around sexism and racism, as well as their attitudes about the same issues. “Playable model – anomalous data” theory proposes that games can model complex systems of bias, while instantiating learning mechanics that help players confront the limits of their existing models.

The web-based game Advance was designed using playable model – anomalous data theory, and was used to investigate three questions. First, can a playable model – anomalous data game change players' likelihood of using systemic explanations for bias, and how does it compare to the effectiveness of a control text? Second, how does the game change players' attitudes as compared to a control text? Finally, are there differences between three different versions of the game that offer players different rewards for investigating the bias in the game system?

Advance did not outperform the control text at changing players' likelihood of using systemic attributions for racism and sexism, nor did it outperform the control text in changing players' attitudes. However, significant differences were found between White and non-White player populations in their sensitivity to the different game conditions. White players were unaffected by differences between versions of the game, while non-White players showed differences in play behaviors, in systemic attribution likelihood, and in attitude. Given that White Americans may have more entrenched ideas about discrimination in America, we consider the impacts of the game on non-White player populations as an indicator of what future development of playable model – anomalous data games may be able to achieve.

Table of Contents

List of Tables ...... iv
List of Figures ...... vi
Chapter 1: Introduction ...... 1
Chapter 2: Literature Review ...... 14
    Models of Discrimination ...... 14
    Reducing Prejudice ...... 20
    Achieving Conceptual Change ...... 23
    Game Design for Conceptual Change ...... 28
Chapter 3: Design ...... 38
    Playable Models, Anomalous Data ...... 38
    PMAD Design Principles ...... 43
    Game Design Overview ...... 45
    Sample of Gameplay ...... 49
    Modeling Race and Gender ...... 53
    Modeling Bias ...... 55
    Reward System Design ...... 63
Chapter 4: Methods ...... 72
    Research Questions ...... 73
    Procedures ...... 77
    Subjects ...... 81
    Instruments ...... 82
        Attribution tests ...... 83
        Attitude analysis ...... 86
        In-game data collection ...... 87
        Demographic data ...... 89
    Data Processing ...... 89
        Attribution data ...... 90
        Attitude data ...... 91
        In-game data ...... 91
        Demographic data ...... 92
    Data Analysis ...... 93

    Conclusion ...... 98
Chapter 5: Results ...... 100
    Player Source Analyses ...... 100
    Analysis of Web-Recruited Players ...... 106
        Demographics ...... 106
        Mortality and priming ...... 106
        Attribution type ...... 109
        Attitudes ...... 115
    Analysis of Mechanical Turk Players ...... 121
        Demographics ...... 121
        Mortality and priming ...... 121
        Attribution type ...... 125
        Attitudes ...... 134
    Analysis by Player Group ...... 146
Chapter 6: Summary and Discussion ...... 150
    Project Summary ...... 150
        Literature ...... 151
        Design ...... 153
    Research Questions and Results ...... 156
        Results cluster one: treatment condition effects ...... 160
        Results cluster two: bias guess effects for web-recruited players and for White Mechanical Turk players ...... 161
        Results cluster 3: bias guess effects for non-White Mechanical Turk players ...... 162
    Discussion ...... 163
        Results cluster one: differences between control and game ...... 163
        Results cluster 2: bias guess effects for web-recruited players and for White Mechanical Turk players ...... 168
        Results cluster 3: bias guess effects for non-White Mechanical Turk players ...... 172
    Limitations of the Study ...... 177
    Implications for the Literature ...... 178
    Implications for Future Research and Practice ...... 180
References ...... 187
Appendix A: Name Selection ...... 201

Appendix B: Sexism Attribution Test ...... 202
Appendix C: Racism Attribution Test ...... 208
Appendix D: Attribution Test Validation Data Analysis ...... 214
Appendix E: Modern Sexism Scale (adapted) ...... 221
Appendix F: Symbolic Racism Test (adapted) ...... 224
Appendix G: Control Text ...... 227
Appendix H: Check Questions ...... 230
Appendix I: Demographic Questions ...... 232
Appendix J: Full Data Tables ...... 233


List of Tables

1 Sample differences between League of Legends and Angry Birds ...... 7
2 Subjects by recruitment source and treatment condition ...... 82
3 PMAD design principles ...... 83
4 Attribution test answer categories ...... 85
5 Crosstabulation of player source and player race ...... 101
6 Crosstabulation of player source and living area ...... 102
7 Systemic Sexism pretest means by player source ...... 103
8 Systemic Racism pretest means by player source ...... 103
9 Modern Sexism pretest means by player source ...... 104
10 Symbolic Racism pretest means by player source ...... 104
11 Crosstabulation of player source and games won ...... 105
12 Symbolic Racism pretest means by completion status ...... 107
13 Systemic Sexism pretest means by pretest group ...... 108
14 Systemic Racism posttest means by pretest group ...... 108
15 Systemic Sexism posttest means by treatment condition ...... 110
16 Symbolic Racism posttest means by treatment condition ...... 117
17 Symbolic Racism pretest means by completion status ...... 122
18 Systemic Sexism posttest means by pretest group ...... 123
19 Systemic Racism posttest means by pretest group ...... 123
20 Modern Sexism posttest means by pretest group ...... 124
21 Symbolic Racism posttest means by pretest group ...... 124
22 Number of clients placed by player race ...... 129
23 Systemic Sexism posttest marginal means by race ...... 130
24 Systemic Racism pretest means by player race and gender ...... 132
25 Percentage of score earned from bias group, marginal means by player race and gender ...... 134
26 Modern Sexism posttest marginal means by treatment condition ...... 136

27 Symbolic Racism posttest marginal means by player race ...... 137
28 Modern Sexism posttest marginal means by guess condition and player race ...... 141
29 Modern Sexism posttest marginal means by player race ...... 142
30 Modern Sexism posttest means by guess condition, Black, Hispanic, & Other ...... 142
31 Symbolic Racism posttest marginal means by guess condition and player race ...... 144
32 Symbolic Racism marginal means by player race ...... 145
33 Symbolic Racism posttest means by guess condition, Black, Hispanic, & Other ...... 145
34 Summary of research findings ...... 160

List of Figures

1 Layout of the game Advance ...... 46
2 An empty job in the game Advance ...... 47
3 Job requirements in the game Advance ...... 48
4 Peer reactions to a possible character placement in Advance ...... 58
5 Study overview flowchart ...... 79

Acknowledgments

Thank you to my advisor, Dr. Charles K. Kinzer, who challenged my preconceptions from the first day I walked into his office. I will always be grateful for his wisdom, his guidance, and his support. I would also like to thank Dr. John B. Black, who always pushed me to embrace rigor. Thanks to Dr. Matthew S. Johnson and to Dr. Gary Natriello for their disciplinary insights, and special thanks to Dr. Steven K. Feiner who stepped in to help at a critical moment.

I am grateful for the support of the Mellon Interdisciplinary Graduate Research Fellowship program and the Breneman-Jaech Foundation. The Mellon Program gave me a community of peers and a support system when I badly needed one; particular thanks to Dr. William McAllister, who gave me excellent advice on navigating the joys and challenges of interdisciplinarity. The Breneman-Jaech Foundation supported the technical development of Advance, in particular the development of the data collection and parsing system.

I collaborated with some wonderful people, both during game development and during the dissertation-writing process. Austin Grossman helped me commit to a game design direction. Alex Kaufmann created evocative, simple art. Tess Snider coded critical game features on a tight schedule, and showed me some elegant development tricks in the process. Giulia Barbano, John Stavropolous, and John Adamus were invaluable both in recruitment and in keeping my spirits up during the recruitment process. Courtney Hall and Stoops Noh provided emergency statistics advice. Anthony Bamonte is a wizard of data processing. Lillian Cohen-Moore helped me create a consistent format and tone. I am deeply grateful to you all.

I have had extraordinary mentors who guided me in my development both as a scholar and as a game designer. I would like to thank Frank Lantz, Scot Osterweil, Clay Shirky, Dr. Helen Tager-Flusberg, and Eric Zimmerman. You believed in me even before I believed in myself, and gave me the chance to show what I was capable of. I would not be here today without you.

I have also had extraordinary students who inspired and challenged me. You taught me far more than I taught you. You are too many to list, but you know who you are. Thank you.

It takes a village to write a dissertation. For unfailing support, even in the face of late-night crises, I’d like to thank Alan McAvinney, Meguey Baker, Anthony Bamonte, Giulia Barbano, Emily Care Boss, Joanna Charambura, Rowan Cota, Jennifer Coy, Dan Edmonds, Julia B. Ellingboe, Abigail Estes, Isaac Everett, Ajit George, Bret Gillan, James Grimmelmann, Austin Grossman, Crystal Huff, Steve Huff, Elena Taurke Joseph, Renee Knipe, Diana Kudayarova, Kimberley Lam, Rachel Elkin Lebwohl, Ben Lehman, Tse Wei Lim, Blair Kamage, Geoffrey McVey, Nada O’Neal, Malka Older, Brand Robins, Travis Scott, Brianna Sheldon, John Sheldon, Alexis Siemon, Richard Silvera, John Stavropolous, Danielle Sucher, Chris Thorpe, Moyra Turkington, Graham Walmsley, and Krista White. I am so grateful to have friends like you.

I would particularly like to thank Kaitlin Heller, who is a triple threat: collaborator, cheerleader, confidant. I feel better knowing you’ve got my back, Cap.

Special thanks also to Robert Scott and Amy Scott, who gracefully tolerated my dissertation as an additional housemate, and who never let me forget about fun along the way.

Thanks to my in-laws, J. Christopher Hall and Susan Hall, and to my sister-in-law Courtney Hall, for many inspiring conversations about teaching, learning, and technology.

I am profoundly grateful for the unfailing support of my family. My mother, Phyllis Hammer, was the first person to encourage me to consider a research career. My late father, Michael Hammer, taught me to think broadly and communicate precisely. My sister Alison went above and beyond in picking up family responsibilities while I was absorbed in this project. My sister Dana sustained my spirit with timely pep talks and reminders of the value of my work. My brother David helped me solve some tough problems over excellent coffee, and my sister-in-law Elizabeth showed me how to handle crunch times with laid-back humor. I am grateful as well for my grandparents Henry and Helen Hammer, and Leo and Sara Thurm – especially for Pappa Leo’s reminders to value people as much as ideas, and for Grandma Helen’s commitment to the life of the mind under extraordinarily difficult circumstances.

Above all: my husband and partner, Chris Hall. For everything, always.


In memory of Michael Hammer

My north star

Chapter 1: Introduction

Contemporary American society is haunted by persistent inequality. Health care and housing, pay rates and poverty rates, employment and education all show disparities by race and gender (Hausmann, Tyson, & Zahidi, 2010; Kozol, 1992; Lipsitz, 1995; Valian, 1999; Wenneras & Wold, 1997). At the same time, overt expressions of prejudice have become increasingly unacceptable (Blanchard, Lilly, & Vaughn, 1991; Klonis, Plant, & Devine, 2005). So what is the true state of discrimination in American society? And what should be done about it? In addressing these questions, Americans rely on their ideas about what discrimination is and how it functions. Similarly, educational programs relating to racism and sexism also operate from assumptions about how these social constructs work.

Prejudice has been extensively theorized and studied over the years (Allport, 1979; Dovidio, Glick, & Budman, 2005). These theories, in turn, have made their way into the design of diversity and anti-bias curricula (e.g. Adams, Bell, & Griffin, 2007) and the design of prejudice-reduction interventions more broadly. Teachers and curriculum designers, however, do not have a monopoly on what concepts are present in anti-bias work1. Learners are the most important participants – and their ideas about discrimination matter. The constructivist approach to education frames learning as the transformation of the learner's own ideas (Olsen, 1999). Learners' pre-existing conceptions of sexism and racism are, therefore, critical. To shift learners' ideas about racism and sexism, we must understand what they originally believe2.

1 While anti-bias work encompasses many types of discrimination (heterosexism, ageism, religious oppression and more), this paper limits itself to the consideration of racial and gender bias.

2 This is doubly true when learners are not themselves aware that their beliefs and actions are racist or sexist. Whether we explicitly challenge their ideas, reflect them back so that learners can see the true implications of their beliefs, or confront them with anomalous data, we must always work from learners' actual beliefs rather than from their claims about those beliefs.

The widespread model of discrimination among white Americans – or, as I will frame it in this paper, their pre-existing understanding of discrimination – is highly individualistic (Bidell, Lee, Bouchie, Ward, & Brass, 1994; Bonilla-Silva & Forman, 2000). Learners, particularly white learners and male learners, often enter the diversity classroom with the notion that racism and sexism can only be the products of individual, intentional actions. As we will see, this model only accounts for some of the challenges that women and minorities face in modern American society. It excludes factors that operate at the level of aggregates, probabilities, feedback systems, and institutions, as well as being unable to account for unconscious judgments or unintentional harm.

To understand discrimination in American society, learners must incorporate these ideas into their understandings of racism and sexism. However, to achieve this, anti-discrimination education cannot simply provide information about the many types of bias. Instead, programs must work to change learners' underlying models of racism and sexism. This transformation must break learners' attachment to the ideologies of “color-blind racism” and “choice feminism,” with their emphases on the isolated individual. Instead, learners must develop models of discrimination that include systemic and emergent effects in addition to individualistic ones.

This dissertation explores the challenges of changing learners' theories about racism and sexism. Achieving conceptual change of this kind is not easy, particularly when the issue at hand is a sensitive one in which learners may have a personal stake (Chinn & Brewer, 1993). Additionally, the learners it is most important to reach are the least likely to voluntarily participate in diversity training, or even to realize that they have something to learn about the problems of race and gender in America (Paluck & Green, 2009). How, then, to achieve a difficult conceptual shift, particularly when reaching learners who may be resistant?

There is existing work on teaching “systems thinking” – showing learners that parts of the world are represented by interrelated systems, and helping them build mental models of those systems (von Bertalanffy, 1950; Forrester, 1992; Wilensky & Resnick, 1999; Jacobson & Wilensky, 2006). Systems thinking is not about acquiring facts about a target domain. To learn systems thinking, students must build mental models for themselves, ones that accurately represent the desired system.

Broadly speaking, a system is defined by entities and the dynamic relationship between them (von Bertalanffy, 1950; Forrester, 1961; Richardson, 1999; Meadows, 2008). For example, one entity might cause the quantity of another entity to increase. Other entities feed back to themselves, increasing or decreasing as far as the model will allow. Pedagogies for teaching systems thinking include concept mapping, inquiry-based approaches, participatory simulations, and computer modeling (Chen & Stroup, 1993; Jacobson & Wilensky, 1996). The common factor in all these methods is that students engage with models consisting of relationships between entities, whether by creating such models or analyzing them.

The bulk of the work on systems thinking has been done in science, where there are broadly accepted systemic representations of particular scientific concepts. For example, a classic example of systems thinking is building a model of an ecological system that connects predators to prey. While there are differences among ecologists about the details of ecological modeling, and there is much to learn about specific predator-prey networks, the basic representation of how predator and prey populations relate to one another is shared.
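As an illustration of what such a shared representation looks like in computational form, the following sketch simulates a minimal predator-prey system. It assumes standard Lotka-Volterra dynamics with purely illustrative parameter values; none of these numbers come from this dissertation or any particular ecosystem.

```python
# Illustrative Lotka-Volterra predator-prey model: two entities (prey and
# predators) linked by dynamic relationships, stepped forward with a simple
# Euler integration. Parameter values are arbitrary illustrations.

def simulate(prey=40.0, predators=9.0, steps=1000, dt=0.01,
             prey_growth=1.1, predation=0.4,
             predator_gain=0.1, predator_death=0.4):
    history = []
    for _ in range(steps):
        # Prey reproduce on their own, but are eaten in proportion to
        # predator-prey encounters.
        d_prey = prey_growth * prey - predation * prey * predators
        # Predators grow from those same encounters and die off otherwise.
        d_pred = predator_gain * prey * predators - predator_death * predators
        prey = max(prey + d_prey * dt, 0.0)
        predators = max(predators + d_pred * dt, 0.0)
        history.append((prey, predators))
    return history

history = simulate()
```

The point of the sketch is the structure, not the numbers: each population's rate of change depends on the other, which is exactly the kind of entity-plus-relationship representation that systems-thinking pedagogies ask students to build.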

When it comes to racism and sexism, however, it is difficult to even agree on a representation of the underlying problem. Unlike systems in the physical world, the systems that reinforce racism and sexism are relatively difficult to isolate and measure. There is still debate in the field about even apparently simple claims, like what prejudice is and how it operates.

Although some models of bias exist (e.g. Schelling, 1971), it is much easier to construct a simplified example of predation than of, say, persistent income inequality. Additionally, learners are likely to be more attached to their explanations for income inequality than for predation, as issues of social inequity connect to their moral outlook and their basic faith in the world (Jost & Banaji, 1994; Jost, 2002; Jost, Banaji & Nosek, 2004).
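To make the contrast concrete, here is a deliberately minimal, one-dimensional sketch in the spirit of Schelling's (1971) segregation model. The ring layout, tolerance threshold, and movement rule are all simplifying assumptions chosen for illustration, not a faithful reproduction of Schelling's original model.

```python
import random

# A minimal 1-D Schelling-style model: 'A' and 'B' agents live on a ring with
# some empty cells (None). An agent is "unhappy" when fewer than `threshold`
# of its nearby neighbors share its type; each step, unhappy agents move to a
# randomly chosen empty cell. All parameter values here are illustrative.

def step(cells, threshold=0.34, radius=2, rng=random):
    n = len(cells)
    # Identify unhappy agents using a snapshot of the current layout.
    unhappy = []
    for i, agent in enumerate(cells):
        if agent is None:
            continue
        neighbors = [cells[(i + d) % n]
                     for d in range(-radius, radius + 1)
                     if d != 0 and cells[(i + d) % n] is not None]
        if neighbors and sum(a == agent for a in neighbors) / len(neighbors) < threshold:
            unhappy.append(i)
    # Move each unhappy agent to a random empty cell.
    empty = [i for i, a in enumerate(cells) if a is None]
    rng.shuffle(unhappy)
    for i in unhappy:
        if not empty:
            break
        j = empty.pop(rng.randrange(len(empty)))
        cells[j], cells[i] = cells[i], None
        empty.append(i)  # the vacated cell becomes available
    return cells
```

Even agents who tolerate being a substantial local minority can, through repeated individual moves, end up in strongly clustered neighborhoods – an aggregate outcome no single agent intended, which is what makes it a systemic rather than a purely agentic explanation.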

The inability to even conceive of systemic explanations for racism and sexism, however, poisons the public conversation about these issues (Iyengar, 1994). Even if the details of the model are wrong, the notion that a systemic explanation is possible is in itself valuable.

What we attempt to do with this work, therefore, is to introduce the concept of a systemic explanation for racism and sexism in the first place – even if learners do not agree with all the specifics of the model being presented. It is a precursor to developing accurate systemic models of sexism and racism. This project, therefore, uses simplified models of actual discrimination to construct its theories of what racism and sexism are, in order to help players understand that there might be systemic factors at play when looking at unequal outcomes along racial or gender lines. The goal is for players to be willing to consider systemic explanations in addition to agentic ones. This is a conceptual change in the type of model of discrimination they are applying, even if some of the details of the model may be contested (Niedderer, Schecker & Bethge, 1991).

As we will see in chapter two, achieving this sort of conceptual shift is a difficult challenge. However, games may provide one way to meet it. The Serious Games movement proposes the use of game-based interventions to achieve social goods, from education to health to activism and more (Sawyer, 2010). The premise of game-based interventions is that games can achieve things that may be difficult for other forms of media. Games do a number of things particularly well, from letting players take on new identities to reaching millions of people worldwide (Gershenfeld, 2010; Klopfer, Osterweil, & Salen, 2009). Harnessing these capabilities can give us a powerful new tool to help learners achieve conceptual change around racism and sexism.

Many games can be productively understood as complex systems of entities and their interrelationships (Strange, 2011). In fact, systemic features such as positive and negative feedback systems are core elements of what it means to make a game (Zimmerman & Salen, 2003). For this reason, games can incorporate many of the features of pedagogies for systems thinking, as will be explored later in this dissertation.

Games provide players with clear goals, and rely on player actions within a complex system to achieve those goals (Isbister, Flanagan, & Hash, 2010). Given appropriate game design, players can be motivated to engage with an arbitrary complex system, to explore it, and to figure out how to exploit it in order to succeed. This is the process of players exploring a game's rule- and strategy space.

Well-designed games provide immediate and meaningful feedback, helping players test theories, then discard or build on them depending on how useful they are (Gee, 2003). This process of directed theory-building, followed by rapid testing, is a powerful way for learners to achieve conceptual change (Chinn & Brewer, 1993).

Games also provide a playful approach to a serious subject – potentially allowing the intervention to reach many people who are otherwise inaccessible to anti-racist and anti-sexist work. Games take place in a “magic circle,” a space in which real-world objects, emotions and commitments do not have precisely the same meaning they do in one's ordinary life (Copier, 2005; Huizinga, 1950). Because players can choose the degree to which their game-play has real-world implications, games may help separate people from their everyday positions and make challenging messages more palatable.

Finally, games have an immense reach. This is particularly true of the short-form, web-based games known as “casual games.” As of 2007, casual games reached 200 million people per month, many of whom play for as many as fifteen hours per week (Casual Games Association, 2007). Compared to many other forms of diversity training, such as in-person classes, a digital game intervention has the potential for rapid deployment to a large population who might otherwise be unreachable by anti-bias work.

Of course, games are not a unitary subject. For example, two of the most popular games in the world, as of this writing, are League of Legends and Angry Birds (Rovio Entertainment, 2012; Riot Games, 2012). We can argue that their popularity indicates good design, but they are radically different, demonstrating that a single theory of game design cannot encompass all games.

League of Legends (Riot Games, 2009) is primarily played on home computers. Teams of five players must collaborate to defeat enemy players, destroy computer-controlled agents, and capture their opponents' home base. Players spend their time completing tasks such as moving around a map, coordinating with teammates, predicting the behavior of opposing players based on limited information, executing swift power combos, and investing limited resources into improving their capacities. The game plays out in real time (Juul, 2005) and losses count against players in the long run; once a player initiates a game, they must follow through and win, since losing has a price (League of Legends Wiki, 2013).

Angry Birds (Rovio Entertainment, 2009), on the other hand, is primarily a single-player mobile game; players never interact directly, although they can compare their post-play performance on leaderboards. In each level, players must “throw” birds at structures made of different materials such as wood, glass, and stone, in order to destroy all the enemy pigs. Players spend their time calculating trajectories, observing the results of a bird throw, and making plans based on their understanding of bird properties, material properties, and pig location. The game can be paused at any time, and players can continue to replay levels as often as they want – there is no penalty for failure.

Some examples of differences between these games are summarized in Table 1.

Table 1

Sample differences between League of Legends and Angry Birds

                      League of Legends          Angry Birds
Number of players     Up to five per team        One
Platform              PC                         Primarily mobile
In-game Time3         Haste & synchronicity      Interval control
Point of View         Isometric                  2D
Player Control        Single humanoid character  Slingshot
Loss Condition        Base destroyed             Pigs remaining at end of level

Even these games, however, have some fundamental similarities. For example, they are both digitally mediated, rather than using cards, blocks, dice, or other physical game technologies. In this way, both games are quite different from Jenga (Scott, 1983), which relies on its materiality, or from solitaire, which can be played using either physical or virtual cards. In both games, success and failure are primarily mediated by the technology; the role of player judgment is in strategy development rather than evaluating outcomes, unlike Apples to Apples (Out of the Box Publishing, 1999). For that matter, players can win and lose the game in the first place, rather than experience an open-ended story as in 1,001 Nights (Baker, 2012).

3 Player control of temporal progression in the game, as per Elverdam & Aarseth (2007).

Rather than rehash the debates about whether there are particular features that make a game a game rather than a puzzle or a toy (e.g. Costikyan, 2002), we raise this issue to point out that it is important to make claims about specific types of games. This paper proposes a theory for understanding a particular game type, and does not argue that all games will function in this way. Instead, we can look at to what extent a particular game fits the theory we develop, and we can use design principles based on this theory to guide the creation of games.

The advantages that games can provide depend on the game's being well-designed. “Well-designed,” however, is not a neutral term. For example, game mechanics that allow losing players to catch up to the leaders can heighten the sense of tension in a game (LeBlanc, 2005). If that tension is consonant with other choices in the game, and appropriate for the audience for the game, then perhaps it can be described as “good” design. However, the very same choice in a different context might be a poor one, if it undermines the rest of the experience of play.

When we are discussing games for learning, therefore, we must always evaluate the “goodness” of game design in relation to the game's learning model as well as to playability and enjoyment. For the purposes of this project, good game design means that in addition to being playable and enjoyable, the game challenges players' existing conceptions of bias and encourages them to shift their mental models to accommodate systemic explanations. The research indicates that one effective way to do this is through demonstrating to players that their existing model does not explain their experiences (Chinn & Brewer, 1993).

In order to talk about what we mean by “well-designed” and what we mean by “game,” therefore, this dissertation proposes a theory of “playable model – anomalous data” games. PMAD theory treats games as “playable models,” in which the game's rules express a model of the situation at hand. However, when players engage with the game, they may encounter “anomalous data” – game experiences that do not fit their preconceptions. Because players know that game rules may be different from real-life rules, they may be more willing to rethink the model they are using within the game context in order to explain, manipulate, and win the game.

Using this theory, we develop guiding principles for PMAD-based game design, as follows:

• The game system models the relevant domain.

• Player actions affect, and are affected by, the model.

• Players receive feedback about the impacts of their actions as they relate to the model.

• The game goals point players toward model conflict.

• Players can experiment with the game's model.

• Players must figure out rules and strategies for themselves.

These principles are based on game design theory and educational theory, as detailed in chapter three. They can serve as an analytic framework for analyzing existing games, or provide design guidelines for creating games to address specific issues. This dissertation takes the latter approach, using the PMAD design principles to guide the design and development of Advance4, a PMAD game which supports the development of systemic models of bias through play.

4 Advance is available online at http://www.replayable.net/advance/.

In chapter four, we outline a method for testing the impact of Advance, as a way of testing PMAD theory more generally. Can a PMAD game that models systemic bias change the type of explanation players use to explain incidents of bias? And might it also change players' attitudes toward bias as part of the process? To investigate these questions, we gather data on players' attribution style for incidents of racism and sexism, and on their attitudes about race and gender. By randomly assigning players either to play the game, or to read a text about related concepts, we can see whether the game or the text has a greater impact on players' attribution styles and attitudes. This allows us to understand whether the game is effective at helping American players use systemic models to explain discrimination5.

However, the overall test of effectiveness gives us limited information, because there are many things about a PMAD game that might make it a more or less effective intervention. If it succeeds, we have not isolated which factors were most important. Similarly, if it fails, we cannot know whether PMAD theory has failed, or whether there is something else about the game that made it less than useful. Work has begun to define game design patterns (Bjork & Holopainen, 2004) and learning mechanics (Plass, Homer, Kinzer, Frye, & Perlin, 2011), but the impact of specific game design decisions on a player's experiences and ideas is still an open field.

The purpose of this dissertation, therefore, is not only to test the effectiveness of this intervention, but to evaluate design decisions within the context of the PMAD game itself. Specifically, in this game we test how different design patterns for in-game rewards are more or less effective at changing players' attribution styles and attitudes about racism and sexism.

5 The cognitive and ideological biases described in this paper may be specific to American culture (Morris, Menon, & Ames, 2001). A cross-cultural analysis of mental models of discrimination would be a fruitful area for future research.

In line with the PMAD design principles, racial and gender bias are deeply embedded in the game's model, as detailed in chapter three. To surface whether players are developing theories about the game's bias, players are offered the chance to identify the bias present in the game from a set of seven possible options. However, asking this question in a vacuum would not be meaningful within the game's model. Rather, we need to consider how players' knowledge of the game's bias is connected to the rest of the game – in this case, through the game's reward system.

The three reward conditions in the game all give the player some type of value in exchange for identifying the bias. By holding all other game factors constant, we can analyze the comparative motivation to engage with this part of the game system under different reward types.

We choose three different reward patterns, based on common game design patterns (Bjork &

Holopainen, 2004). The first type of reward is an informational reward; players gain knowledge about how the game's system works. The second type of reward is a financial reward; players receive a flat sum for demonstrating their knowledge. The final type of reward is a generative reward, in which the reward is valueless on its own but gives the players new capacities by changing the rules of the game.

By randomly assigning players to reward conditions while holding all other game elements constant, we can examine which of these design techniques make Advance more and less effective at conveying complex concepts and challenging player preconceptions. We can also isolate the impact of the changes in the game's reward system on players' experiences in the game. More generally, we may be able to draw conclusions about the way in-game reward systems can shape player goals and activities.

The findings of this dissertation, detailed in chapter five and explored more deeply in chapter six, are not straightforward. Advance did not outperform a control text at changing players' likelihood of using systemic attributions for racism and sexism, nor did it outperform the control at changing players' attitudes about race or gender. This finding is not entirely surprising.

Advance tackled an extremely difficult problem using a novel design theory. If Advance had

outperformed the control text, it would have been a promising indicator in favor of PMAD

theory. However, the failure to find a main effect of the game does not necessarily contradict

PMAD theory. Rather, we must consider whether there were problems in the game's design that

were not based on PMAD theory, or whether PMAD theory itself is flawed. As we will see in

chapter six, it is most probable that elements of both are true. There were flaws in the game's

design, including the short play period, that limited its efficacy. However, in light of the data

gathered from this study, the PMAD design principles will also need to be revised and tested

further.

The study also found a significant difference between White and non-White player

populations6 in how they responded to the different game conditions, including the condition in

which players did not guess at the bias in the system and were not aware they could be rewarded

for it. White players were unaffected by differences between versions of the game. They did not

respond to the different bias guess conditions with differences in their play behaviors, and they

did not differ on any of the outcome measures. However, non-White players did show

differences in play behaviors and on the outcome measures. Given that White Americans benefit

most from racism in America, their beliefs about discrimination may be more entrenched,

making them a harder population to affect (e.g. Kunda, 1990). We therefore consider the impacts

of the game on non-White players as an indicator for what future development of PMAD games for racism and sexism might be able to achieve.

6 As described later in this dissertation, the distinction is in fact between web-recruited players (who were primarily White) and White players from Mechanical Turk taken together, compared to non-White players recruited from the Mechanical Turk site.

Finally, there are further lessons in the nature of the differences found for non-White

players. These players performed differently across the four bias guess conditions on post-test

measures of both racist and sexist attitudes. However, there were no performance differences

between the groups on in-game measures. We therefore conclude that factors other than changes

in play behavior were driving these differences between conditions. This study originally

hypothesized that players would engage differently with the game's anomalous data under

different reward conditions, and that differences in outcome measures would be driven by

changes in player behavior in response to different in-game opportunities and incentives. Instead,

it appears that change can also be driven by other factors, even within the context of a PMAD

design. For example, players may have drawn conclusions about the game world's commitment

to an unjust status quo based on the mechanics available to address bias in the game system.

When the game implied that discrimination was the norm, players retroactively justified it; when

it rewarded players for speaking out, players did not.

Although Advance did not outperform the control text at changing players' attributions or

attitudes, we believe that PMAD design theory can be revised to be more useful to a poorly-theorized area. Using non-White players' reactions as our guide, we can develop PMAD theory to help create more effective entertainment-based prejudice-reduction interventions. As we will see in the following chapter, this is an important and difficult task.

Chapter 2: Literature Review

Models of Discrimination

This chapter considers theories of how discrimination manifests in society, rather than its roots or origins. What do racism and sexism look like? How can one recognize them? What should one do to intervene? Answers to these questions are based on mental models of how discrimination, as a social process, functions. We propose that the mental models of many

Americans are significantly different from those of scholars, educators and other experts, and that they are different in ways that go far beyond the possession of different information.

This dissertation explores the differences between two models of bias: individual and systemic bias (e.g. Gomez & Wilson, 2006; Feagin, 1972). Individual bias is bias which is rooted in the beliefs, actions and attitudes of individuals, while systemic bias emphasizes the effects of larger systems and processes (Gomez & Wilson, 2006; Adams et al., 2007; Feagin, 2006;

Schmidt, 2005). For example, no one individual is responsible for the Federal Housing

Administration's policies from 1930 to 1970, but taken together, those policies resulted in white

Americans receiving far better housing outcomes than black Americans and other minorities

(Lipsitz, 1995).

The distinction between individual and systemic bias is cognitively significant. The former is an example of a direct process, in which actions are directly tied to outcomes (Chi,

2005). The latter is best represented as a complex system, in which structural factors, remote causes, and the interrelationship between entities cause racial or gender disparities (Meadows,

2008).

In direct processes, the focus is on the actions of individual agents and their intentional work toward a larger goal. The actions of agents directly predict the larger pattern, and are

considered in sequential order rather than simultaneously. Causes lead to effects in a relatively

straightforward fashion. This is a familiar narrative mode. As E. M. Forster might put it, “the

white man fired the black woman, and then discrimination occurred” is a story we can all

understand (Forster, 1956).

Systems, on the other hand, consist of entities and the relationships between them (von

Bertalanffy, 1950; Forrester, 1992; Wilensky & Resnick, 1999; Jacobson & Wilensky, 2006;

Richardson, 1999; Meadows, 2008). These relationships are dynamic, consisting of “flows” from

one entity to another. One entity might cause the quantity of another entity to increase, for

example. Entities can also return upon themselves, creating self-regulatory mechanisms.

The behavior of a system is latent in its structure (Meadows, 2008). This behavior can be conceptualized as emerging from the relationships between the entities of the system. In other words, systems are interdependent. The behavior of a complex system also cannot always be predicted from the behavior of individual entities; holistic results emerge from the behavior of

the system as a whole. Finally, systems can feed back to themselves; their full range of behaviors

emerges over time, as the patterns of activity accumulate.

Most Americans begin with a “deterministic” or “centralized” mindset (Wilensky & Resnick, 1995, 1999; Resnick, 1996) rather than with a systemic way of seeing the world.

American learners primarily understand racism and sexism as direct processes – in other words,

as centering on individual bias (Hughes & Tuch, 2000). To include systemic effects in their

understanding of bias would require an underlying conceptual shift, helping people see sexism and racism as manifestations of a complex system. Many people have trouble making sense of complex systems (Mandinach & Cline, 1994; Tversky & Kahneman, 1974), so we would expect this shift to be difficult.

So what, precisely, do naïve learners believe about the nature of racism and sexism?

What sort of processes do they believe are at work? For this, we turn to popular ideologies of race and gender. Bonilla-Silva has demonstrated the prevalence of “color-blind racism” among

white Americans (Bonilla-Silva & Forman, 2000; Bonilla-Silva, 2006). Hirshman defines a

similar phenomenon regarding gender as “choice feminism” (Hirshman, 2007). As we examine

these ideologies, we will see strong evidence of thinking about discrimination as agentic rather

than systemic.

Color-blind racism, as outlined by Bonilla-Silva (Bonilla-Silva, 2006), relies on

underlying beliefs he refers to as frames. These frames reveal clear evidence of agentic thinking.

The first of these is the minimization of racism frame. This frame suggests that discrimination is

no longer an important part of life for minorities. To maintain this frame in the face of

widespread racial inequality, people using it define discrimination as all-out racist behavior -

“old-fashioned racism,” as scholars might call it (Sears & Jessor, 1996). Racist incidents with no obvious perpetrator or a lack of intentionality are minimized and dismissed – and all the more so when the problem cannot be reduced to a single incident, but rather appears in patterns and probabilities of behavior. This is precisely a definition of racism as a direct, agent-based process, and the corresponding minimization of systems of racism.

The other three frames support this approach by emphasizing the centrality of individual decisions for social behavior, even as they are simultaneously used to diffuse responsibility for those decisions. The frame of abstract liberalism emphasizes an individual's right to make free choices, regardless of larger social implications. The frame of naturalization removes individual responsibility for those choices by framing them as near-biological imperatives. And finally, the frame of cultural racism provides an explanation for racial inequality in which culture, not

individuals, is to blame; since individuals are not at fault, racism could not possibly be at work.

Taken together, these concepts underlying color-blind racism reveal a popular understanding of racism as a direct process – precisely what Chi's research on the prevalence of direct-process theories would lead us to believe (2008). The same is true of choice feminism.

Choice feminism is a cultural frame which valorizes choices made by women – any choices made by women, with no attention to their larger systemic impact. In other words, choice feminism looks only at direct processes of sexism and feminist action. Hirshman (2007) illustrates the way these supposedly individual choices (in her case, the choice for high-powered women to take primary responsibility for housework and opt out of elite careers) cause society-wide effects, such as the absence of women from the corridors of power, and are rooted in the effects of complex systems such as the tax code.

We can see, therefore, that for both racism and sexism, popular discourse reveals ideas of individual choices and direct processes. These discourses ignore systemic effects, or even use the idea of systemic bias as a way of minimizing the experience of discrimination.

Of course, we must also demonstrate that a systemic approach is a valuable addition to agentic understandings of sexism and racism; that is, that sexism and racism can be understood through systems thinking. Much work has been done that demonstrates systemic bias around race and gender, both theoretically

(e.g. Schelling, 1971) and empirically (e.g. Swim, Hyers, Cohen, & Ferguson, 2001). These theories encompass individual acts of sexism and racism, but also explain situations that a direct- process approach cannot explain.
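Schelling's (1971) theoretical demonstration makes this point vividly: mild individual preferences, none of them bigoted in isolation, can aggregate into stark segregation. The dynamic can be sketched in a short simulation (an illustrative toy of our own, not a model from this dissertation), with agents on a one-dimensional strip who move only when fewer than a third of their neighbors share their group:

```python
import random

def schelling_1d(n=60, tolerance=1/3, empty_frac=0.2, steps=2000, seed=1):
    """Toy 1-D Schelling model: agents ('A'/'B') relocate to a random empty
    cell whenever fewer than `tolerance` of their occupied neighbors match."""
    rng = random.Random(seed)
    cells = ['A', 'B'] * int(n * (1 - empty_frac) / 2)
    cells += [None] * (n - len(cells))
    rng.shuffle(cells)

    def unhappy(i):
        me = cells[i]
        if me is None:
            return False
        neighbors = [cells[j] for j in (i - 1, i + 1)
                     if 0 <= j < n and cells[j] is not None]
        if not neighbors:
            return False
        return neighbors.count(me) / len(neighbors) < tolerance

    for _ in range(steps):
        movers = [i for i in range(n) if unhappy(i)]
        empties = [i for i in range(n) if cells[i] is None]
        if not movers or not empties:
            break  # everyone is satisfied; the pattern has settled
        i, j = rng.choice(movers), rng.choice(empties)
        cells[j], cells[i] = cells[i], None
    return cells

def mean_same_group_share(cells):
    """Average fraction of each agent's occupied neighbors who match them."""
    shares = []
    for i, me in enumerate(cells):
        if me is None:
            continue
        nb = [cells[j] for j in (i - 1, i + 1)
              if 0 <= j < len(cells) and cells[j] is not None]
        if nb:
            shares.append(nb.count(me) / len(nb))
    return sum(shares) / len(shares)
```

Even though every agent tolerates being a local minority, the settled arrangement is typically far more clustered than the initial random one – the disparate outcome emerges at the level of the system, not from any agent's intent.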

For example, a direct-process approach excludes racism and sexism that relies on feedback effects. In her discussion of women and ambition, Fels (2004) argues that women learn what is possible from their daily experiences, and then, rationally, adjust their ideas to match

their experiences. That adjustment, in turn, affects the types of experiences they have from day to

day. This type of feedback can occur at more abstract levels as well. Valian (1999) analyzes the

impact of repeated interactions that involve even a small level of bias. She demonstrates that bias

is amplified if these interactions are part of a feedback system rather than considered

individually.
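Valian's amplification argument can be made concrete with a back-of-the-envelope simulation (our own illustrative sketch; the parameters are arbitrary, not drawn from her work). Two equally able groups compete through successive promotion rounds, but one group's scores carry a small penalty at every round:

```python
import random

def promotion_pyramid(n_start=100_000, levels=8, penalty=0.1, seed=0):
    """Toy model of bias amplification through feedback: merit is identical
    across groups, but group 'B' is docked a small penalty (in score-SD
    units) at every promotion. Each level keeps the top half of the pool.
    Returns group B's share of the pool at each level (entry share = 0.5)."""
    rng = random.Random(seed)
    pool = ['A'] * (n_start // 2) + ['B'] * (n_start // 2)
    shares = [0.5]
    for _ in range(levels):
        scored = [(rng.gauss(0, 1) - (penalty if g == 'B' else 0), g)
                  for g in pool]
        scored.sort(reverse=True)              # highest scores first
        pool = [g for _, g in scored[:len(scored) // 2]]
        shares.append(pool.count('B') / len(pool))
    return shares
```

With a penalty of only one-tenth of a standard deviation per level, group B's share falls from one half at entry to well below half at the top of an eight-level pyramid, even though no single round looks dramatically unfair.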

A direct-process approach also excludes probabilistic effects. Consider the paucity of

films that pass the Bechdel Test (“Bechdel Test Movie List,” 2010). No individual movie is

necessarily problematic on its own. However, a moviegoer is far more likely to encounter films

which do not represent women as full-fledged characters, than to encounter films that portray

women as human beings with their own interests, motivations and goals.
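The probabilistic character of this kind of bias can be expressed with a simple binomial calculation (the numbers here are hypothetical, chosen only for illustration). If each film a viewer encounters independently fails the test with some probability, the chance that their overall diet of films is majority-failing follows directly:

```python
from math import comb

def prob_majority_fail(p_fail, n_films):
    """Probability that a strict majority of n independently chosen films
    fail the test, given each fails with probability p_fail (binomial)."""
    return sum(comb(n_films, k) * p_fail**k * (1 - p_fail)**(n_films - k)
               for k in range(n_films // 2 + 1, n_films + 1))
```

With a hypothetical 55% failure rate, a viewer who samples twenty films is more likely than not to see a majority that fail, even though each individual film is an unremarkable, independent case.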

Of course, there are other types of racism and sexism that are not well accounted for by Americans' naïve models of discrimination, such as implicit bias (Greenwald, McGhee, & Schwartz, 1998; Stanley et al., 2011). However, the categories above are the ones which most

Schwartz, 1998; Stanley et. al., 2011). However, the categories above are the ones which most

directly reject the assumptions of direct-process racism and sexism: that individuals are the unit

of analysis, and that each incident should be judged in isolation. For that reason, we emphasize

these categories, both in our analysis and in the design of an anti-bias intervention.

We can see, therefore, that the cultural discourse of Americans around racism and sexism frames them as direct processes. However, sexism and racism must also be understood systemically if we hope to change the social structures that reinforce and perpetuate racial and gender inequality. When facing a problem which has systemic aspects, a systemic approach allows the consideration of appropriate remedies. Having the right model is a crucial factor in problem-solving, as we know from decision-making research (Newell & Simon, 1972).

Remedies that only address direct-process discrimination will leave vast swaths of inequality

untouched – but Americans who use a direct-process model may react negatively to any

intervention that affects the systemic level. For example, an individual-centric, direct-process

approach to racial inequality significantly predicts less-progressive views on a wide range of

race-related policy measures (Iyengar, 1989; Iyengar, 1994; Hughes & Tuch, 2000; Lau & Sears,

1981). If only an individual's choice matters, then it is unsurprising that direct-process thinkers

react negatively to any intervention which constrains that choice because of systemic effects.

Whether these interventions focus on changes in individual behavior, or on reshaping social

structures, people will not support interventions whose rationale does not match their mental

model of the underlying problem.

For example, one result of direct-process thinking is that the word “racist” or “sexist” is

often treated as more problematic than discrimination itself. Consider Bush's claim that the worst

moment of his presidency was being called a racist by Kanye West – not the 9/11 attacks or

Hurricane Katrina or the collapse of Lehman Brothers (Bush, 2010). Rather than addressing the

systemic inequalities which lead to unfair outcomes, the emphasis is on the identification of

racism, which is necessarily framed as an accusation under the direct-process model. A culture in which sexism and racism cannot even be identified without changing the conversation is a culture that cannot solve the problems of racial and gender bias.

The challenge, then, is one of conceptual transformation. Rather than conceiving of discrimination solely as the result of direct processes, learners must also consider it as a complex system. Without the language to describe the systemic aspects of racism and sexism, learners cannot identify and support appropriate remedies, or even identify major aspects of the problem in the first place. So how do we change people's minds about racism and sexism? And what work has already been done toward this goal?

Reducing Prejudice

If we hope to propose methods to expand people’s models of discrimination, we must understand what has already been proven effective in this area. This section reviews existing prejudice-reduction interventions. Many of these interventions are based in psychological or sociological theories of racial and gender bias (Oskamp, 2000). However, as we shall see, these interventions largely focus on combating the causes of prejudice, not on changing learners' underlying models of how racism and sexism operate. Even those that include this element do not treat it as the fundamental conceptual shift that it is, but rather as a standard element of the curriculum to be conveyed (Adams, Bell, & Griffin, 2007).

Effective prejudice-reduction interventions must be based in accurate theories of prejudice. Duckitt (1994) proposed a four-level model of the origins of prejudice: genetic predispositions, norms for intergroup relations, mechanisms of social influence, and individual differences in attitudes and behaviors. Oskamp (2000) then connects the development of anti-prejudice interventions to these four levels. He concludes that interventions are unlikely, at present, to occur on the genetic level. However, it should be possible to design interventions around social norms, mass or interpersonal influence, or the modification of individual personality characteristics.

Paluck & Green (2009) review and categorize existing prejudice-reduction interventions, identify which theories the intervention relies on, and evaluate the standard of evidence for the effectiveness of each type of intervention. They define six types of intervention that are relatively well-supported by experimental evidence, both from the laboratory and from the field. These six approaches include explicit cross-cultural training; contact with members of other groups; cooperative learning with members of other groups; value consistency exercises, which show

bias to be inconsistent with other deeply-held values; peer influence; and exposure to books, films and other entertainment media.

These interventions are rooted in a variety of theories: the contact hypothesis, social interdependence theory, social learning theory, and more (Khan, 2000; Paluck

& Green, 2009). These theories, however, emphasize the psychological and social causes that produce prejudice – not the models and ideas that learners have about what prejudice is, where it comes from, and how to recognize it in the real world.

In fact, some of these approaches, as effective as they are in reducing prejudice, may simultaneously contribute to the spread of direct-process models of discrimination. Consider, for example, the peer influence approach. Peer behavior can set the social norm for expressions of prejudice and attitudes toward discrimination (Blanchard et al., 1991; Paluck, 2006). Some theories even give peer influence a major role in the nature of prejudice itself (Crandall &

Stangor, 2005). However, without special training, peer influence programs are likely to spread naïve learners' own understandings of how prejudice functions. Direct-process models of racism and sexism are the ones most learners start with, and are easier to understand than emergent-process models, as we will see below. They may, therefore, become part of the social norm.

This is not to say it is impossible to teach an emergent understanding of racism or sexism.

Schmidt (2005) outlines seven concepts tied to the teaching of racism as systemic inequality.

These concepts include the larger notion of individual, institutional and cultural levels of racism

(Feagin, 2001; Feagin, 2006), as well as specifics such as the internalization of racism, which references unconscious prejudice (Fox, 2001), and historical inequality, which functions as a feedback effect (Oliver & Shapiro, 2006). However, the way in which these concepts are taught must be tied to the desired effect, namely conceptual change. Even if learners are able to grasp

these specific concepts, as long as they retain a direct-process model of discrimination, they will not be able to build on them effectively.

A successful intervention aimed at developing a systemic understanding of bias will build on one of the six types found to be effective by Paluck and Green (2009), while addressing the specific learning challenges related to conceptual change. This dissertation explores the theory and practice of creating entertainment-based interventions using games.

Entertainment-based interventions are interventions that use popular forms of media to change people's beliefs and attitudes about sexism, racism, or other forms of prejudice. To date, this type of intervention has been particularly poorly theorized (Paluck & Green, 2006).

Research on entertainment-based interventions thus far has focused on two approaches: first, changing social norms through mass influence, and second, encouraging perspective-taking and empathy at the individual level (Strange, 2002; Zillmann, 1991). Entertainment interventions have been shown to be effective at changing social norms, but not in changing personal beliefs

(Paluck, 2009). However, recent research on perspective-taking in fiction suggests that this, too, may be possible. Kaufman & Libby (2012) found that inducing a reader to identify with a fictional character, then later revealing that the character was a member of an out-group, reduced the tendency to stereotype that character and improved reader attitudes toward their group. We therefore believe that with an appropriate theoretical grounding, entertainment interventions can change individual attitudes as well as social norms.

This dissertation proposes one appropriate theoretical grounding for using games as anti-bias entertainment interventions to help players shift their conceptual models of racism and sexism from the individualistic to the systemic. Games are certainly a popular entertainment media form; for example, 97% of American teens play games (Lenhart et al., 2008). However, in

the games community, a division is often drawn between “serious” and “entertainment” titles,

with games created for the social good falling into the former category (Michael & Chen, 2005).

Under this typology a game would be “serious” if it were constructed with an eye to social

change and successful research. However, that does not necessarily prevent the same game from

being entertaining. Games with serious intentions can still have satisfying, nuanced, entertaining

gameplay. For example, Dog Eat Dog (Burke, 2012) is a role-playing game that addresses issues of colonialism, power, and injustice, which are certainly serious topics; it has also been nominated for multiple game design awards (Diana Jones Award, 2013; Indiecade, 2013).

Additionally, the perceived seriousness of a game can be influenced by the context in which it is deployed. Games that are made mandatory, for example, are less likely to be perceived as entertaining (Heeter et al., 2011). We therefore conclude that games are able to fit within Paluck and Green's category of entertainment intervention (2009), even if within the field of game studies a particular game might be called “serious” instead.

With this understanding in mind, we turn to the problem of identifying an appropriate theory on which to build an entertainment-based intervention using games. To do so, we begin with what we know about how individuals achieve conceptual change. If we hope to build a field of game-based interventions around prejudice and discrimination, we must work from what learning theory tells us about how individuals move from agentic to systemic approaches, then create a theory of game design to match.

Achieving Conceptual Change

When we talk about conceptual change, we ask learners to understand processes, which are cognitively represented as mental models. Mental models are distinct from facts or skills.

They are sets of beliefs that can be treated as an internal simulation (Gentner & Stevens, 1983).

This simulation can be mentally “run” to make predictions about the world (Johnson-Laird,

1994; Jonassen & Henning, 1996). However, presenting information is not enough to cause people to create a new model. Changing one's mental model means acquiring new beliefs, integrating them into a runnable model, and then understanding the implications of those beliefs when one's new model is “executed” (Chi, 2008).

Significant research has been done on achieving conceptual change in science. Students come to school with naïve models of how things move, how heat is generated, and more. These models are inaccurate, but they are sufficient to explain learners' everyday experiences (Perkins

& Simmons, 1988; Roth, 1990). Through the schooling process, students learn the accepted theories of physics, biology, chemistry and mathematics. However, learning a new theory is not the same as achieving conceptual change. Unless the models underlying students' naïve understandings are specifically and explicitly addressed, students will often retain them. For example, students can succeed in high-school and university physics courses, yet still believe that heavier objects fall faster than lighter ones (Champagne, Klopfer, & Anderson, 1980). These inaccurate models are most likely to appear when students are presented with novel problems, for which they do not have well-rehearsed strategies. Unless learners can expect to be presented only with familiar problems, conceptual change is necessary.

Contemporary approaches to conceptual change emphasize the role of anomalous data, or evidence which contradicts students' naïve theories (Chinn & Brewer, 1993; Hewson &

Thorley, 1989). The anomalous data are intended to show learners that their existing models are inadequate to explain their experiences. Instead, learners must adopt a new, more scientifically accurate model which explains the data at hand. For example, many fourth-graders believe that sweaters and hats are warm because they generate heat. Watson and Konicek (1990) observed a

classroom experiment designed to produce anomalous data for this misconception. Students wrapped thermometers in sweaters and observed whether the temperature changed. The students' naïve belief that the sweater would make the temperature rise was confronted with an unexpected result, in which the temperature remained the same.

Anomalous data, however, does not always cause conceptual change to occur. Watson and Konicek's subjects developed multiple alternate hypotheses to avoid changing their minds, such as cold drafts which might have somehow gotten in to the sweaters and affected the results

(1990). Similar resistance to anomalous data has been demonstrated in other contexts, such as modeling electrical circuits or deciding what causes colds (Johsua & Dupin, 1987; Kuhn, 1989).

Chinn and Brewer (1993) list the factors that influence how people respond to anomalous data. They argue that there are four areas of influence: the nature of the learner's prior knowledge of the subject, characteristics of the new model the learner is asked to adopt, what the anomalous data themselves are like, and how the learner processes the material. In order to understand the challenges of achieving conceptual change around racism and sexism, we must examine each of these four areas in turn.

First, we take up the issue of prior knowledge. In the case of racism and sexism, prior knowledge does not just include explicit knowledge about racial and gender bias; it also includes the underlying models that learners believe explain disparate outcomes in American society.

Even if learners have never explicitly encountered racism or sexism, they are enculturated with

“common knowledge” explanations for these outcomes. As described earlier in this chapter, popular models of racism and sexism focus on individual rather than systemic explanations, usually in a way that undermines or even opposes anti-bias work. These underlying models should be considered part of learners' prior knowledge about racism and sexism, whether or not

they can articulate them explicitly.

If prior knowledge is “entrenched,” or deeply embedded in the way the learner understands the world, it is more difficult to change (Klahr, Dunbar, & Fay, 1990; Kunda, 1990).

Ontological beliefs, or beliefs about the fundamental categories and properties of the world, are particularly likely to be entrenched and are hard to change (Chi & Roscoe, 2002). However, beliefs may also be entrenched because they satisfy personal or social goals (Chinn & Brewer,

1993).

Beliefs about racism and sexism are likely to be deeply entrenched. Individuals are often resistant to anti-racist and anti-sexist work, which has been theorized in many ways. This resistance may be rooted in group self-interest (Kluegel, 1985; M. C. Taylor, 1998), in fear

(Stephan & Stephan, 2000), or even in conflict between conscious and unconscious beliefs

(Gaertner & Dovidio, 1986). Members of privileged groups resist acknowledging their role as oppressors, while members of disenfranchised groups may have internalized racist or sexist attitudes. Additionally, as we have demonstrated above, changing one's beliefs about prejudice from a direct-process to an emergent-process model is an ontological shift (Chi, 2005). This suggests that even individuals who are neither intellectually nor socially committed to a direct-process model of racism are likely to hold entrenched beliefs about it.

Second, we must consider the nature of the new theory which learners are being asked to adopt. For conceptual change to occur, a new theory must be available, coherent, and intelligible

(Chinn & Brewer, 1993). A bad theory is better than no theory at all, but coherent theories are preferred to less-coherent ones. Learners may try to adopt theories they do not understand, but they will be unable to apply them to novel situations (Linn & Songer, 1991).

Understanding complex systems, unfortunately, is difficult. Our cognitive biases make it

easier to understand individually-based, direct explanations than ones that rely on systemic, probabilistic or emergent thinking. For example, the availability heuristic means that humans use ease of retrieval as a proxy for likelihood (Kahneman, Slovic, & Tversky, 1982). Rather than seeing the big picture, we are biased toward using individual events to make judgments about larger patterns. Systems cannot be reduced to a set of incidents; systemic effects occur precisely at the level of the system. This means that the availability heuristic serves us poorly when thinking about emergent processes.

Because the emergent properties of systems often rely on chance, misconceptions about probability can also contribute to the difficulty of understanding them (Chi, 2005). Misconceptions about probability persist beyond high school and into adulthood (Batanero & Sanchez, 2005). One misconception particularly relevant to emergent thinking is the “outcome approach” bias. Subjects reason backwards from the outcome of a particular trial to decide what the probability of its occurrence must have been. For example, if told that it rained, they concluded that there must have been a high probability of it raining, regardless of what they were told beforehand about the likelihood of rain (Batanero & Sanchez, 2005). Using outcomes to reason backward about the model that produced them is a useful process, but not when probabilistic processes are incorrectly treated as deterministic (Prediger & Rolka, 2009).

Third, we must examine the anomalous data itself. Chinn & Brewer (1993) argue that to achieve conceptual change, the anomalous data must be credible and unambiguous. Data that is not credible can be easily dismissed; data that is ambiguous can be distorted to fit an observer's existing theories (Chinn & Malhotra, 2002). The learner must also be presented with multiple anomalous experiences; any given experience may be able to be accommodated within the learner's naïve theory, but the full set of experiences cannot (Watson & Konicek, 1990).

Finally, we must consider how the learner processes the anomalous data. Deep processing means paying careful attention to the material at hand, elaborating relationships between the new material and prior knowledge, and working out the larger implications of the new information (Craik & Tulving, 1975; Nickerson, 1991). This sort of processing has been shown to promote theory change (Tesser & Shaffer, 1990).

It may be difficult to induce deep processing around issues of sexism and racism because of the social desirability effect. It has become increasingly socially unacceptable to express overtly racist or sexist attitudes (Jussim, 1991). Rather than thinking carefully about issues of race and gender, learners may use their knowledge of what is socially acceptable to guide their actions in this sensitive area.

There is also the issue of motivation. Deep processing requires a cognitive commitment on the part of the learner, whether that happens because they are personally engaged with the issue at hand or because they believe they will have to justify their positions (Tesser & Shaffer, 1990). No matter the reason, we must acknowledge that there are many individuals who are not motivated to learn about racism and sexism, let alone to reduce their own prejudice, and are therefore unlikely to make such a cognitive commitment. One might even argue that those least motivated to change are the ones who most need to be reached by prejudice-reduction interventions.

Conceptual change is difficult enough to achieve in science, but expanding learners' models of discrimination may be even harder. The question, then, is how to make it happen most effectively. We propose that games can help.

Game Design for Conceptual Change

The Serious Games movement proposes that games can be effective interventions for learning, social change, health, business and more (Sawyer, 2010). These interventions can include approaches as diverse as including pro-social activities in a commercial game, such as Farmville's fund-raising for Haitian earthquake relief (Morales, 2010); using commercial games to prepare students for future learning (Hammer & Black, 2009; Squire & Barab, 2004); creating custom-built games that inform or persuade (Bogost, 2007); or using game mechanics to drive desired real-world behavior (Lieberman, 2006). What these diverse approaches have in common is the notion that some problems are difficult to solve by conventional means, but can be addressed by the unique affordances of games.

To understand how games can support conceptual change, it is important to understand what games are and how they function7. Games can be described on three different levels: mechanics, player experience, and culture (Juul, 2003; Salen & Zimmerman, 2003, 2005).

Game mechanics refer to explicit rules, but also to the goals, resources, and materials used to play the game (Salen & Zimmerman, 2003). In Tetris (Pájitnov, 1984), for example, blocks fall from the top of the screen at a pace that increases as the game continues. The blocks are in-game entities that the player can interact with; the behavior of the blocks, as well as the player's capacity to affect them, is governed by rules.

Player experience describes the player's emotional engagement with the game, and the physical or cognitive efforts they put forward to achieve the game's goals. Games can provide a wide variety of emotional and aesthetic experiences, which players often participate in constructing for themselves (Lazarro, 2005). For example, Flow (Chen & Clark, 2006) allows players to seek their desired level of challenge. Players can control the emotional experience they have by choosing what level to play on, and how aggressively to pursue the other creatures in the game.

7 While this author is conceptually committed to the fundamental similarity of digital and non-digital games, this paper primarily considers the impact of digital games.

Finally, games have a unique cultural position. As Squire and Barab (2004) found, games can reach learners who are alienated by other forms of media. Students who reject school-based literacy, for example, spend time and effort on reading and writing in games such as World of Warcraft (Blizzard Entertainment, 2004) and Lineage (NCsoft, 1998; Steinkuehler, 2008). Not all aspects of game-based literacy overlap with what is taught in school; players do not learn how to write a job application letter, while students do not learn how to persuade strangers to join their guild. However, there are also significant areas of overlap, such as using games as subject matter to teach technical writing (Vie, 2008) or creating text-based games as creative writing (Kee, Vaughan, & Graham, 2012). While games do not, cannot, and should not serve as a replacement for school-based literacies, they do provide an alternate source of connection for at least some learners with at least some literacy tasks.

Media interventions can help with prejudice reduction (Paluck & Green, 2009), but demonstrating their effectiveness requires a better theoretical grounding than what currently exists. To develop that grounding, we will consider, in turn, how mechanics, player experience, and game culture can help us design an intervention for cognitive change around racism and sexism. As we will see in the next chapter, games can present anomalous data to players in unusually effective ways. However, games also have challenges, particularly in the areas of credibility and transfer.

Digital games can be framed as playable simulations because they provide complex systems of rules with which the players can experiment. In Civilization IV (Firaxis, 2005), for example, players' cities gain resources based on the values of multiple other factors in the game, such as the types of land adjacent to the city, the buildings the player has created in the city, and what type of government the player has adopted. Taken together, the rules governing how cities accumulate resources present a model that the player must first comprehend and then manipulate (Squire & Barab, 2004). The model presents a very specific view of what matters in building a civilization, and of how cities grow (Squire & Durga, 2005).
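This kind of multi-factor model can be sketched as a toy function; the names and numbers below are invented for illustration and are not Civilization IV's actual rules. The point is that the resource yield emerges from the interaction of several factors rather than from any single one.

```python
# Toy model: a city's per-turn resource yield as a function of several
# interacting factors. All names and values are invented placeholders.
TERRAIN_YIELD = {"plains": 2, "hills": 1, "river": 3}
BUILDING_BONUS = {"granary": 2, "market": 3}
GOVERNMENT_MULTIPLIER = {"despotism": 0.8, "republic": 1.2}

def city_resources(terrain_tiles, buildings, government):
    """Yield depends jointly on terrain, buildings, and government."""
    base = sum(TERRAIN_YIELD[t] for t in terrain_tiles)
    bonus = sum(BUILDING_BONUS[b] for b in buildings)
    return (base + bonus) * GOVERNMENT_MULTIPLIER[government]

# The same city yields different amounts under different governments, so a
# player must reason about the whole system, not about one variable.
print(city_resources(["plains", "river"], ["granary"], "despotism"))
print(city_resources(["plains", "river"], ["granary"], "republic"))
```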

Computer simulations can provide an alternate route for students to engage with anomalous data and form new theories (Beichner, 1996; White, 1993). Simulations have been shown to be effective at reducing misconceptions and achieving conceptual change (Lipson, 1997; Taylor & Chi, 2006; Zietsman & Hewson, 1986). In some situations, simulations may even be more effective than direct instruction. For example, Taylor & Chi (2006) found that simulations and text instruction both helped learners improve on decontextualized assessments, like tests. However, only the simulation caused an improvement in a contextualized situation, like one that might be found in the real world.

The effectiveness of computer simulations comes from their ability to present learners with otherwise inaccessible data, and to give learners the opportunity to experiment with that data (Windschitl & Andre, 1998). Simulations can, like games, be framed as a type of vicarious experience (Hammer & Black, 2009). If learners develop their naïve models through personal experiences, which serve as data to test their theories in the real world, vicarious experiences can help them develop better models through a similar learning process (Gorsky & Finegold, 1992).

The models in games, unlike the models in simulations, are designed for playability rather than for faithfulness to the real world. However, these models can nonetheless convey powerful social messages. As Bogost (2007) demonstrates, the rules of a game reflect ideologies which may either undermine or reinforce the game's overt message. Consider The Sims 2, for example (Electronic Arts, 2004). While the game overtly emphasizes tending to one's characters' needs, the underlying message of the game is consumerist. Characters can only be made happy by the purchase and use of material goods, from couches to televisions to art and more. This message is represented only in rules, and it becomes visible to players through experimentation in play.

Experimentation is one of the central elements of play. To take a simple example, in Super Mario 64 (Nintendo, 1996) the player must figure out how far the character can jump. Which jumps can Mario hurdle, and which will send him hurtling to his doom? The only way8 to find out is to try jumping, and dying, until the player's model of Mario's ability matches the game's underlying structure (Church, 1999). Players approach the game as scientists, making hypotheses, conducting tests, and then examining the game world for confirming or disconfirming data (Gee, 2003).

Players do not conduct experiments in the game world at random. Their experiments are guided by the game's goals and reward structure. An appropriate choice of reward mechanic can focus player attention on any desired aspect of the game (Ciavarro, 2008; Klimmt & Hartmann, 2009). This aspect of games addresses one of the major weaknesses of simulations, namely that the unguided use of simulations is far less effective than simulations embedded in an explicitly instructional context. Even when explicit instructional material is included within the simulation itself, learners often do not attend to it (De Jong & Van Joolingen, 1998). This weakness limits the use of most simulations to classrooms and other monitored spaces.

8 While the Super Mario 64 example implies that experimentation is a solitary experience, many players build on the research of their peers by consulting FAQs and walkthroughs. In some games, these walkthroughs can give the player precise instructions, but many games include elements that mean walkthroughs can only guide, not define, the player's choices. For example, the game Desktop Dungeons (2013) generates a new level with randomly chosen monsters for each game. The shared material for the game emphasizes strategies and models for novice players to adopt, not how-to instructions. For more on collaborative model-making, see Steinkuehler, 2008.

The goals of a game serve as a deeply integrated guide for play. An appropriately designed game can stimulate hypothesis testing around how to accomplish a given goal within the game system (Gee, 2003; Osborne & Squires, 1987). If the goals relate to common misconceptions about the game model, then players can be induced to try experiments which are likely to conflict with their initial models of how the system works. The player can then receive feedback on the outcome of their play decisions, to motivate them to pursue or abandon a particular path.

By providing appropriate feedback on how the player is doing in achieving that goal, we can address another major issue in conceptual change: learners' ability to perceive anomalous data in the first place. Chinn & Malhotra (2002) identified four possible stages at which conceptual change can break down: observation of the anomalous results, interpretation of those results, generalizing the results to construct a theory, and retention of the new theory for future encounters. They found that observation was the stage at which most learners failed. Fewer than half the subjects who made incorrect predictions were able to correctly identify what had happened right in front of their eyes. The more ambiguous the data, the less conceptual change was achieved. However, once learners were able to correctly perceive the anomalous data, more than half of them were able to interpret, generalize and retain the material. Though this does not guarantee conceptual change, it is clear evidence of effective confrontation with anomalous data.

Games can amplify the feedback that players get, and motivate them to attend to it. When Mario falls into a pit, the player has no doubt that something has gone wrong with their mental model; because Mario's tumble interferes with the player's goals, they care about it9. Games, unlike simulations, are not ideologically committed to fidelity. Desirable objects can glow, or shimmer, or make one's character invincible; failure to understand the game model can result in one's army being captured or one's character falling into a pit. This exaggerated feedback may not be realistic, but it may help players get past the perceptual difficulties found by Chinn and Malhotra (2002).

9 Presumably.

Perhaps the most powerful effect of player experimentation in games, though, is precisely that it is player-driven. Meier describes a good game as a series of interesting choices (Juul, 2005). What makes these choices interesting is that players must assess the options available to them and decide which approach they will take, based on their understanding of the game's model, their assessment of their own capabilities, and their in-game and out-of-game goals (Gee, 2003). This requires deep processing of the sort that encourages players to engage with anomalous data. Tasks that require learners to engage in active, constructive and integrative behavior are the most effective at producing conceptual change (Chi, de Leeuw, Chiu, & LaVancher, 1994). This is exactly what games do.

It is a challenging design problem, however, to create a game that both serves a larger purpose and contains interesting choices (Klopfer, Osterweil, & Salen, 2009). The model underlying game-play must reward deep processing. However, if players can use their knowledge of socially acceptable behavior to play the game, they will be engaging only in superficial thinking. Since player choices and experiments are at the heart of what is interesting about a game, players who use social desirability to play the game are not truly playing, and hence unlikely to respond well to the game's mission. It is a designer's obligation to create challenges that cannot be solved simply by doing what is socially appropriate, even if the game addresses pro-social themes.

However, the same challenge becomes an opportunity when handled well. Games can provide an environment where people do not have to do the socially acceptable thing, but can experiment with a variety of approaches or identities (Klopfer, Osterweil, & Salen, 2009). Games can also reach people who do not care about the issue that the game addresses. If people come to the game for the play experience, they can still be engaged by the game's underlying concepts – particularly if those concepts are embedded in the mechanics, with which the player cannot help engaging deeply (Lindley & Mayra, 2002). Given the immense reach of casual, web-based games, this is an opportunity to influence many individuals who would never seek out a prejudice-reduction intervention or think critically about bias. One might even argue that those are the people who need such an intervention the most.

Retaining a game's sense of playfulness is also important because games invoke an epistemological frame – a set of assumptions about learning – that is particularly useful for achieving conceptual change. Windschitl and Andre (1998) found that simulations are most helpful for epistemologically sophisticated students. These students believed that learning is complicated and happens over time; that knowledge is context-dependent; and that people can learn how to learn.

When game players are engaged with a game, they display evidence of sophisticated epistemological beliefs such as the ones Windschitl and Andre (1998) describe. Failure is not failure; it is an opportunity. Players expect that their job during a game is to learn how to play it, which may take quite some time as they explore the many challenges of the game world. The skills they learn for one challenge may have to be reinvented for another challenge, and may not even be applicable to future challenges. At the same time, the way they must learn to engage in the new challenges of the game is to experiment repeatedly, pay close attention to the results, and identify new courses of action based on what they learn. These are the epistemological beliefs that make simulations most useful, and they are already present in gamers (Gee, 2003).

As we can see, games provide strong support for some of the elements that support conceptual change. Games can simulate a complex model and allow players to interact with it experimentally. By providing goals, the game gives players directed challenges rather than leaving them to wander in a simulated environment. Games can reach players who are otherwise unmotivated, and evoke an appropriate epistemological frame. Most of all, games require players to take action and make considered choices.

A well-designed game can provide support for these elements over and above what is possible in a simulation. For example, both games and simulations allow players to experiment with complex models. However, in a well-designed game, players’ in-game goals align with both the learning content and with their desire to play, allowing them to guide themselves through the experience. Both games and simulations allow for experimentation and failure, but games also evoke a powerful cultural frame reinforcing that failure is an opportunity. Both games and simulations can be engaging to learners, but learners are far more likely to engage with a game voluntarily.

Games also have challenges, of course. Many of these challenges are closely related to games' most significant advantages. The “magic circle” effect helps draw players in, but it may also make players less likely to transfer knowledge gained in the game to the real world. Games' cultural position makes them accessible to unenthusiastic learners, but it may also make the material in the game less credible. Games require active participation by the player, but providing interesting choices means not all players will have the same in-game experience.

While we will return to these limitations at the conclusion of this dissertation, more research is required to understand them, particularly as they relate to social-issue games.

Overall, however, game-based interventions are a promising approach for achieving conceptual change around complex, socially sensitive issues. By comparing their efficacy to interventions of other types, we can understand to what extent they impact players' attribution styles; by experimenting with the reward systems that guide players through the game, we can learn something about what makes these games effective. A theoretically-grounded game for conceptual change around discrimination may provide an effective prejudice-reduction intervention, while the empirical results comparing reward systems within the game can provide best practices for future game-based interventions around issues we can only now imagine.

The next chapter translates the theory reviewed in this chapter into specific design principles for creating games that may help players incorporate systemic understandings of a problem. It also describes the design and development of Advance, a custom-created web-based game using these principles. In the remainder of this dissertation, we empirically investigate the question of whether Advance can, indeed, increase players' willingness to use systemic explanations for racism and sexism, and whether there are different impacts between different versions of the game; we also examine whether the game affects players' attitudes toward these socially sensitive issues.

Chapter 3: Design

Playable Models, Anomalous Data

This dissertation uses a custom-designed and -developed game, Advance10, to try to shift players' conceptual models of racism and sexism. This chapter defines a theory that provides one way to guide such a design process, based on the research outlined in the previous chapter and on how games function as a medium, and proposes a set of specific design principles based on that theory. It then describes how the game Advance was designed, both as a game and in terms of what aspects of bias it attempts to model. Finally, we explore how Advance uses this chapter's principles to present anomalous data to players while emotionally engaging them in the service of achieving conceptual change around how bias functions. In the remainder of this dissertation, we will see to what extent it has succeeded in doing so.

In order to accomplish this, we must understand how games communicate. Many games function as playable models (Bogost, 2006, 2007). Games, as opposed to simulations, are not intended to have perfect fidelity to a knowledge domain. Instead, games simplify the domains they represent into rule systems that provide meaningful opportunities for play.

Sometimes the rule system directly represents a domain in the real world. For example, Portal (Valve, 2007) has a physics engine with direct fidelity to real-world physics, which incorporates concepts such as momentum. However, it is not enough for the system to represent physical reality. Portal makes these rules of physics playable by introducing goals to point the player in a particular direction (the exit), new elements about which to reason (the portal gun), and challenges to make the player do so (robot sentries). In Portal the representation of the physics system happens through action, and its exploration is done through the actual play of the game (“Teach with Portals,” n.d.).

10 As previously noted, Advance can be played online at http://www.replayable.net/advance/.

Sometimes, however, the rule system functions more abstractly. Consider the game The Sims 2 (Electronic Arts, 2004). The rules of the game define needs for each character, which are represented to the player as bars. The player can direct a given character to satisfy those needs by taking various actions in the world, usually in relation to objects. For example, sleeping on a bed restores a character's energy and comfort levels, but decreases their hygiene and bladder needs. The way in which these needs are abstracted and manipulated has no direct parallel in the real world. Sleeping on a bed certainly restores our bodies – at least if the bed is a comfortable one! – but not in the same direct way. Also, the player has very little control over how the abstraction functions. The player can be clever about what actions they command their character to take, but they have only strategic control. Execution matters less than decision-making.
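The needs-and-actions abstraction described above can be sketched as a minimal data model. The attribute names and adjustment values below are invented for illustration rather than taken from The Sims 2; what matters is that each action adjusts several need bars at once, so the player steers characters only indirectly.

```python
# Minimal sketch of a needs model: each action raises some need bars and
# lowers others. All values are invented for illustration.
class Character:
    def __init__(self):
        # Needs are bars from 0 (empty) to 100 (full).
        self.needs = {"energy": 40, "comfort": 50, "hygiene": 80, "bladder": 70}

    def _adjust(self, deltas):
        for need, delta in deltas.items():
            self.needs[need] = max(0, min(100, self.needs[need] + delta))

    def sleep_in_bed(self):
        # Sleeping restores energy and comfort but worsens hygiene and
        # bladder, mirroring the trade-off described in the text.
        self._adjust({"energy": +40, "comfort": +20, "hygiene": -15, "bladder": -25})

sim = Character()
sim.sleep_in_bed()
print(sim.needs)
```

Because every action touches multiple bars, there is no single "make happy" input; the player can only reason strategically about which sequence of actions best balances the system.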

Although these two games are very different, they can both be understood as “playable model” games – as opposed to games where, for example, player judgment or taste is at stake, as in Apples to Apples (Out of the Box Publishing, 1999). In each of these examples, there is a set of rules embedded in code. These rules determine what objects exist in the game world (for example: portals, robots, couches, fishtanks, energy meters), how they function, and what player inputs are available to interact with them. These rules provide constraints for player behavior – what they can do in the game, how the game will respond. When players try to achieve their goals in the game, they are always interacting with the constraints of the ruleset, developing a strategy based on its affordances.

This is true whether the goals are set by the game or the player. For example, in Portal the player must discover how to move through a puzzle room, from the entry to the exit gate. Although the player may form other goals along the way, particularly setting sub-goals to achieve the larger goal, accomplishing this goal is the only way to continue playing the game. The only way to move to the next level is to finish the current level. Portal puts physical obstacles between the player and their goal, whether in the form of deadly acid or laser-shooting robots. The player has affordances that are fundamentally physics-based – they can create portals, jump, run. The process of solving a Portal level is a process of figuring out how to exploit the player's abilities in the context of rule constraints to achieve a goal, which requires understanding of the game system.

The Sims 2 is a slightly more complex case because the game system affords many different goals. Real-world examples of play goals in The Sims 2 include trying to build a house that suits the player's tastes; replicating the lives and activities of the player's friends; setting up screenshots to use in making a comic book; or killing the characters in assorted grotesque ways. For each of these goals, though, the process of achieving them is the same: constrained by the rules of the game's model, with the tools of limited player interaction possibilities. For example, a player who wants to keep their Sims happy at all times cannot simply press a button to do so. They must do so indirectly, by giving their Sims a pleasant environment and tending to their daily needs.

What we have just described are game mechanics – the actions and interactions through which the player engages with the game (Salen & Zimmerman, 2003; Jarvinen, 2008). For playable model games, this takes the form of entities and relationships for the player to explore and master (Cook, 2006). The formulation of games as playable models is analogous to the formulation of complex systems described in the previous chapter (e.g. Forrester, 1961). Game entities and their relationships provide a complex system for the player to explore, with dynamic behavior inherent in the design of the system itself arising through play.

Although playable-model games are about exploring the affordances of a game system, such exploration does not automatically induce learning. If we hope to have players learn about complex systems from playable-model games, we must go beyond game mechanics. We must consider learning mechanics and assessment mechanics as well (Plass, Homer, Kinzer, Frye, & Perlin, 2011).

Learning mechanics are research-based conceptual approaches to how a game might help players learn (Plass, Homer, Kinzer, Frye, & Perlin, 2011). Learning mechanics are design patterns, which are always instantiated in game-mechanical choices that are specific to a particular game. However, a particular learning mechanic provides a grounded and substantiated organizing theory for how to make those decisions. Designers and scholars agree that learning mechanics must be embedded in the core mechanics of play (Isbister, Flanagan, & Hash, 2010; Plass, Homer, Kinzer, Frye, & Perlin, 2011). Learning mechanics let us see how to adapt the moment-to-moment process of play into a meaningful learning activity.

Similarly, assessment mechanics are design patterns, instantiated in games, that allow the assessment of player activity in the game (Plass, Homer, Kinzer, Frye, & Perlin, 2011). As with learning mechanics, they are not separate game experiences. Rather, they are theoretical constructs that are expressed in the game mechanics of a particular game. Assessment mechanics allow us to create games that let players express their understanding of a specific concept.

When selecting learning and assessment mechanics, there are several criteria to keep in mind. They must be theoretically substantive and research-backed (Plass, Homer, Kinzer, Frye, & Perlin, 2011), and they should be able to integrate deeply with the mechanics of the game itself (Isbister, Flanagan, & Hash, 2010). When instantiating learning and assessment mechanics, the designer must watch for skill confounds and manage cognitive load (Plass, Homer, Kinzer, Frye, & Perlin, 2011).

The previous chapter reviewed research that suggests a confrontation with anomalous data is a good approach to help players adopt a systemic model for bias (e.g. Chinn & Brewer, 1993). American learners hold to their existing models unless they are forced to revise them based on anomalous data. Even when people have experiences that do not match their mental models, they will often work to reconcile their existing model with the new data. Only when their previous model proves incapable of explaining the new information will they be willing to abandon it and develop a new model instead. In light of this research, we can adopt encountering anomalous data as our learning mechanic.

While this learning mechanic is substantiated in the research literature, the question becomes how to incorporate it in playable-model games. Existing work with simulations shows that people can in fact achieve conceptual change through an anomalous-data inquiry process. The challenge is to do this effectively with the tools of game design. The need to bring anomalous data into the player's awareness adds additional design constraints to the “playable model” approach to games.

The core element of playable-model games is for players to figure out how to use the rules to achieve the goal – and it is this process that we try to harness for conceptual change using games. That process is mediated by the challenges the game provides, which provide the motivation for engaging with the game's model, and by the affordances available to the player, which provide the tools for doing so. Finally, there needs to be clear feedback; if players are to develop effective strategies for using their limited abilities to interact with the game, they must be able to predict the effects of their actions on the game's state.
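As a minimal illustration of this feedback requirement (the state variables, action name, and effect sizes here are invented, not drawn from Advance), one simple pattern is to have every player action return a report of exactly what it changed, so the model's response is never hidden.

```python
# Sketch: every action reports its effect on the model's state, giving
# the player the data needed to predict future outcomes. State variables
# and effect sizes are invented for illustration.
def take_action(state, name, effects):
    report = []
    for var, delta in effects.items():
        before = state[var]
        state[var] = before + delta
        report.append(f"{name}: {var} {before} -> {state[var]}")
    return report

state = {"reputation": 10, "funds": 100}
feedback = take_action(state, "sponsor a candidate", {"reputation": +3, "funds": -20})
for line in feedback:
    print(line)
```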

It is for this reason that we argue that players attend to a game's model when the rules of the game provide an obstacle to achieving in-game goals (Gee, 2003). While this may not be the only reason players ever attend to the model of a game, it leads us to define design principles for PMAD games. If we expect players to include the target concepts in the models they are building of how the game works, the new ideas conveyed through game design should be deeply integrated into the rules, goals and challenges of the game itself (Isbister, Flanagan, & Hash, 2010).

PMAD Design Principles

Based on this theory, we propose a set of principles for designing playable model – anomalous data games, henceforth referred to as PMAD games11. These principles build on the way that players engage with game systems in order to encourage a confrontation with anomalous data and therefore a model shift.

Principle 1: The game system models the relevant domain. The model is built into the game rules. Text can clarify what the player is supposed to learn or attend to, as can narrative elements. However, players' primary strategic engagement is with the game's ruleset, which should therefore encapsulate the desired model.

11 PMAD defines a type of game and a type of learning mechanic that are appropriate for a specific category of learning problem, but does not specify a particular assessment mechanic. The assessment mechanics used in this study are specific to the model developed for Advance, and are discussed in the context of the game rather than generalized. Future work in this area could attempt to identify assessment mechanics that are particularly effective for PMAD games.

Principle 2: Player actions affect, and are affected by, the model. The things that players can do in the game are inputs to the model in some way, and the outputs that the player cares about are affected by the model. Players' strategy development is based on the actions they are able to take in the game; unless they are engaging with the model in question in meaningful ways, they will find other routes to their goals.

Principle 3: Players receive feedback about the impacts of their actions as they relate to the model. Players cannot develop play strategies effectively if they lack the necessary data about what the model is doing and how their actions affect it.

Principle 4: The game goals point players toward model conflict. Players develop strategies in order to move toward in-game goals. The game goals must create a situation where players' existing mental model conflicts with the model in the game.

Principle 5: Players can experiment with the game's model. Because strategy development in games is an inquiry process, players must be able to test and compare different possibilities – unlike games that require players to commit to their decisions without the chance to explore alternatives.

Principle 6: Players must figure out rules and strategies for themselves. Players need to know enough to figure out what they are supposed to be doing and what the impacts of their behavior are, but they cannot simply be told how to succeed – figuring out how to overcome the game's challenges is the heart of engagement with the playable model of the game.

These principles define what it means to be a PMAD game. However, they are not sufficient to construct a particular PMAD game, nor are they unique to a PMAD-theoretical approach. Instead, they are guiding principles to be used in concert with the constraints of the particular domain being represented in the game, with decisions about game type and game mechanics, and with the logistical and practical constraints of game development and distribution.

Within a particular domain, the PMAD principles do not provide the only way to communicate through play. There are many types of game that could address issues of racism and sexism; for example, Dog Eat Dog (Burke, 2012) interrogates the colonial experience through storytelling and role-taking. Similarly, there are many underlying models that a game could choose for systemic bias, using different elements of racial and gender bias in the world we live in. Even given a PMAD-theoretical approach and a particular set of ideas around systemic bias, Advance is not the only game that could be created.

This chapter, however, describes the design and development of Advance as a particular example of using the PMAD principles in practice. First, we explain the design of the game, including how it models racism and sexism, what forms of feedback it presents to players, how the game's goals encourage players to engage with the model, and more. Next, we look at the differences between the three experimental versions of the game, so that we can begin to break down the efficacy of individual strategies within a larger PMAD context. Finally, we return to the PMAD principles to demonstrate how the game adheres to these design principles, and how they helped shape the design of the game during development and play-testing.

Game Design Overview

In this game, our goal is to represent "systemic bias" in a playable way. Doing so means solving three design challenges. First, how do we model systemic bias? We must choose an evidence-based model for how systemic bias operates, and develop a way to reduce it to simple rules. While we lose real-world complexity and nuance, we gain in clarity and focus. Second, we must choose a context and interface for representing these rules to the player. Finally, we must define interactions between the player and the model, which allow players to explore elements of the game's model as they pursue their in-game goals. All these elements must adhere to the PMAD design principles outlined above.

Advance is a custom-designed web-based game developed in Flash. The player takes the role of a corporate recruiter, whose goal is to make money and keep their business afloat. During the game, clients approach the player character for assistance with getting a job. The player makes money by placing these clients into jobs that suit them, and by helping them get promoted over time. The player must survive for five minutes – five years of game time – without running out of money to pay their business expenses. Whatever money the player has left at the end of the game becomes their final score.

Figure 1. Layout of the game Advance

The game visually represents the player's clients and the company they work with in an isometric and highly stylized way (Figure 1). The player's client base is represented as a row of abstract figures, any of whom can be clicked for more information. The company is represented as a connected network of jobs, some of which are available for the player to fill and some of which are occupied by non-player characters (NPCs). This abstract representation allows players to focus on the challenges of placing a client and earning money.

There are four key constraints that players must attend to in order to place their clients into jobs.

First, job availability constrains client placement. The game board begins with some empty jobs. Other jobs become empty when player clients are promoted or when NPCs leave a position. When a job is empty (Figure 2), the player can put a client into it. However, an empty job can also be filled by the arrival of an NPC. The player must be vigilant if they are to notice when a job is empty and act on it before it is taken by an NPC.

Figure 2. An empty job in the game Advance

Next, job requirements constrain client placement. Even if a job is open, that does not mean a given character can take it. Each job has a required level of competence, creativity, and charisma. Characters have the same three characteristics; in order to take a job, the character must meet or exceed the job's requirements in all three areas (Figure 3). Characters begin with low levels of each, but can be improved by upgrading them. If the player wants to place a character in a particular job, they can upgrade them until they can meet the job's requirements.

However, money constrains upgrades. The player must pay for every upgrade to their characters. Each successive upgrade to a given characteristic costs more money, as the character receives more specialized and advanced training. Jobs with stricter requirements, therefore, require the player either to choose more skilled clients or to invest money in training.
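The escalating upgrade-cost rule can be sketched as follows. The base cost and growth rate here are illustrative assumptions – the chapter does not specify Advance's actual pricing curve, only that each successive upgrade to a characteristic costs more than the last.

```python
def upgrade_cost(current_level, base_cost=200, growth=1.5):
    """Cost to raise one characteristic from current_level to current_level + 1.

    Each successive upgrade costs more, reflecting more specialized and
    advanced training. base_cost and growth are illustrative values only,
    not the tuning actually used in Advance.
    """
    return round(base_cost * growth ** current_level)

# Early upgrades are cheap; later upgrades to the same characteristic
# grow steadily more expensive.
costs = [upgrade_cost(level) for level in range(5)]
```

Any monotonically increasing cost function produces the same strategic pressure: jobs with stricter requirements force the player either to choose more skilled clients or to invest escalating amounts of money in training.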

Figure 3. Job requirements in the game Advance

Finally, success constrains advancement. The player begins with access to only the lowest-level jobs in the company. When a client is placed in a job, their success meter begins to fill. When a client's success meter completely fills up, they move up to the next level in the company, where higher-paying jobs are available. That level is then unlocked for the player; the player can move between levels to place clients and consider jobs. However, clients only appear at the higher levels after they have been promoted from lower levels. They never appear on their own, except on the very first level of the game.

Taken together, these constraints make some jobs better than others, and create challenges for the player around placing clients. Some jobs pay particularly well. Others have low requirements, making them easy for any client to take without investing money. Yet other jobs will lead to rapid success, putting the client on track for better-paying jobs at a higher level. The player must strategically decide which clients to invest in, which jobs to attend to, and which levels of the game to focus on. Players must also balance time spent gathering information with time spent taking action. Making a good client-job match is critical, but so is the player's overall approach to managing their time, money, and attention.

Sample of Gameplay

To better understand the flow of the game, this sample of gameplay follows Jane, a player who encounters the game for the first time. Jane is a composite of several players who assisted with play-testing the game during the development and pilot process.

Jane loads the page on which Advance is hosted. She immediately encounters the game tutorial, which asks her to complete four tasks in sequence12. First, Jane must place a client into a job. Next, she must upgrade a client. After that, she must have a client promoted. Finally, she must visit the second level of the game. Each task in the tutorial is presented after the previous task is complete. Jane also has the option to exit the tutorial and begin play using a button at the side of the screen. However, Jane wants to learn how to play the game, so she completes the entire tutorial. Only then does she begin play.

12 These tasks are explained more fully later in this section.

When the game begins, Jane looks at the client list on the right-hand side of her screen. By clicking on different clients, she can see who is available for placement. When she clicks on each client, she can see their name, their picture, and how good they are at different aspects of their job.

Jane begins the game with three clients in her queue. She clicks on each of them to see who they are, and decides to place “Destinee Benton,” the last of the three available clients.

With Destinee selected, Jane looks at the left side of the screen to see the available jobs. While there are many jobs on the screen, represented by empty spots in the office building, some of them are already occupied by NPCs, or non-player characters. There are currently four jobs available for Jane to choose from.

Jane clicks on one of the available jobs, and the job's requirements pop up under Destinee's stats. Two of the stat bars are blue, indicating that Destinee is qualified for the job. However, one of them is red, indicating that she does not have enough skill in that area.

Simultaneously, each NPC adjacent to the selected job displays what their attitude toward Destinee would be, should she be placed in that job. There are three NPCs adjacent to Destinee. Two are showing hearts, indicating a favorable attitude, but one is showing a red skull, indicating a potentially difficult relationship.

Jane must now choose among three options. She can upgrade Destinee, spending money to train her for this job. She can test a different client in the same job, and see whether they might be better qualified. Finally, she can try Destinee in a different job to see if she can arrange a better fit.

Jane sees that there are three other jobs open, so she clicks on one of them to see if she can find a better fit for Destinee. Indeed, Destinee's stat bars are all blue – she is qualified for the job. The “Place Client” button activates, allowing Jane to confirm that she would like to place Destinee into the job. Jane notices that in this job, there are three hearts and one skull among the adjacent non-player characters, so it seems like Destinee might be treated better here. Jane decides to place Destinee into the job. Jane receives $1500 as her commission.

Jane goes back to the first job where she tried to place Destinee. Maybe she can place someone else into that job! She goes back to her client list and notices that a new client has arrived. She clicks on him. His name is “Emiliano Cruz” and all his stat bars are blue.

He is eligible for the job – and this time, all the adjacent non-player characters are showing hearts. Everyone likes Emiliano! She decides to place him into the job and receives another $1500.

Jane checks her score to see how much money she has. She's doing fine – she's received some money from placing her clients – but she notices that business expenses are being regularly deducted from her account. She decides that just to be safe, she'll place another character before she spends money on upgrading anyone.

Meanwhile, a notice pops up telling her that Emiliano has been promoted! He is waiting for her on the second level of the game. Jane clicks the button to take her to the second level. When she clicks on the jobs there, she sees that they will earn her more money than the first-level jobs did. Emiliano is in her client list, so she immediately selects him. While he has one red stat bar – these jobs have stricter requirements than the lower-level jobs – Jane decides that she wants to invest in Emiliano. That higher salary is very tempting! She pays $600 to upgrade his charisma and places him into the job, earning $3,000.

Jane remembers that she placed Destinee before she placed Emiliano, so she goes back to the first level to check on Destinee. When she clicks on Destinee, she can see Destinee's success meter, which determines how quickly she will be promoted. Destinee's success meter is filling up very slowly. Jane wonders why. She knows that not all characters are treated fairly – maybe this is an example. She considers pressing the large, flashing "Blow the Whistle" button to report the unfair treatment of Destinee, but she doesn't feel confident that she knows enough to do that yet. Instead, she decides to continue playing. However, she pays closer attention to how characters seem to be doing. Over the course of the game, she has multiple encounters where she must make decisions about where to place characters and how much to invest in them. She does so with Destinee's experience in mind.

As this gameplay snippet illustrates, players of Advance must make decisions about which characters to invest in, both in terms of time and money. They select among different characters, choose which jobs to place characters into, and invest money into characters to improve their job chances. While doing so, they must maintain enough money in their bank account to pay their bills. If they go bankrupt, they lose the game. However, if they keep the company afloat for five years of game time, they win. Their final score is how much money they have left at the end of the game.

Modeling Race and Gender

If we hope to model race and gender bias within this game, we must decide how we are representing race and gender. What categories and definitions will we use? And how will those categories be conveyed to the player?

Gender is modeled as two categories: male or female. These categories are used by major public institutions such as the U.S. Census (Howden & Meyer, 2011). This categorization does not address transgender individuals or those who do not identify with either gender. These groups are certainly discriminated against in the workplace, and may be included in future versions of the game.

Gender is represented through a character's name and picture. Research indicates that a woman's name or picture is sufficient to evoke gender bias (Steinpreis, 1999).

Race is a more difficult modeling problem, because it is a poorly defined construct. Turning to the U.S. Census questionnaire is unhelpful, as it includes more than a dozen racial categories – too many for players to recall and compare. However, the census office then collates this data into six "major race groups" (Humes, Jones, & Ramirez, 2010). Of these, only four represent 1% or more of the American population. These groups are white (72%), Hispanic (16%), black (13%), and Asian (5%). These are the groups we have chosen to use in Advance.

There are two potential concerns with this modeling choice. First, it excludes the categories of American Indian / Alaska Native and Native Hawaiian / Pacific Islander, as well as those who identify as multi-racial. These groups do experience discrimination, and may be included in future versions of the game. Second, the category "Hispanic" is collected and analyzed orthogonally to race, while the game treats "Hispanic" just as it treats "black," "white," and "Asian" characters. To address the full complexity of Hispanic / Latin@ identity is beyond the scope of this game; however, Hispanic Americans are a large group who suffer ongoing discrimination. For example, Hispanic women experience the largest pay gap in the United States (Hegewisch & Edwards, 2011). Due to the size of the population affected, we chose to risk oversimplifying Hispanic identity and include Hispanic as a category nonetheless.

As with gender, race is signaled through a character's name and picture. Research indicates that names and pictures are sufficient to invoke racial bias (Bertrand & Mullainathan, 2003).

Early play-testers were able to distinguish between male and female character images, and were able to match images to racial categories correctly. Names were selected from lists of the most common names for each of the four racial groups being modeled. Names that appeared on more than one list were eliminated. See Appendix A for more information.

Race and gender are never under the player's control. At the beginning of the game, the game randomly selects one race, one gender, or one race-gender combination to be discriminated against. The player cannot affect this choice, only react to it.
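Under the categories described above, the random selection of a bias target at the start of a playthrough might be sketched like this. Only the category lists come from the chapter; the equal sampling weights, function names, and data representation are illustrative assumptions.

```python
import random

# Category lists from the chapter; sampling scheme is an assumption.
RACES = ["white", "Hispanic", "black", "Asian"]
GENDERS = ["male", "female"]

def pick_bias_target(rng=random):
    """Randomly choose one race, one gender, or one race-gender
    combination to be discriminated against for this playthrough.
    The player cannot affect this choice, only react to it."""
    kind = rng.choice(["race", "gender", "race-gender"])
    if kind == "race":
        return {"race": rng.choice(RACES)}
    if kind == "gender":
        return {"gender": rng.choice(GENDERS)}
    return {"race": rng.choice(RACES), "gender": rng.choice(GENDERS)}

def is_targeted(character, target):
    """A character suffers the game's bias if they match every
    attribute of the selected target."""
    return all(character.get(key) == value for key, value in target.items())
```

Because every character carries both a race and a gender label, a race-gender target (e.g. targeting one specific combination) affects a narrower slice of the client pool than a race-only or gender-only target.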

Similarly, every character in the game is categorized by race and gender, including the clients who seek the player's assistance. The player can never choose which clients approach them, only how they respond to the clients who do. Because every client is affected by the game's bias, the player must always grapple with the bias in the game, but without being able to affect it directly.

Modeling Bias

How, then, is bias in the game modeled? We know it must exist, and we now understand the categories on which it operates. But how does the game's bias actually work?

The first model of discrimination used in Advance is the microaggressions model (Pierce, Carew, Pierce-Gonzalez, & Willis, 1978; Sue, 2010; Sue & Capodilupo, 2008). Microaggressions refer to daily interactions that contain coded messages of racism and sexism. Each individual interaction may seem harmless, but the cumulative impact can be significant.

For example, Sue describes how he and a black colleague were asked to move to the back of a plane in order to improve the plane's load balance, although there were white passengers who had arrived later and were seated nearby (Sue, 2010). To the white flight attendant making the request, it seemed perfectly reasonable. To her passengers of color, on the other hand, the request evoked Jim Crow laws. Worse, it was part of a pattern of repeated small aggressions and humiliations.

Microaggressions draw on two critical elements of systemic bias, as defined earlier in this paper. First, many microaggressive events are unintentional. The microaggressor does not realize that they are enacting existing race or gender narratives, believing instead that what they are doing is neutral or harmless (Sue & Capodilupo, 2008). Additionally, microaggressive episodes may be minor in and of themselves. The cumulative impact of microaggressive interactions is what makes them so painful (Sue, 2010).

An agentic analysis of a microaggression, such as moving black passengers to the rear of the plane, would focus on the microaggressive event in isolation, and would emphasize that the aggressor did not intend to act hurtfully. Not only does this analysis fail to capture the real effect of microaggressions, it makes the microaggressive incident worse. Am I overreacting? Should I respond? These kinds of questions themselves become a source of stress and trauma (Sue, 2010).

A systemic analysis, on the other hand, looks at the impact of the event rather than the intent of the perpetrator, and sees it as part of a pattern that extends across time and occurs in many different contexts. For example, there is nothing inherently wrong with moving black rather than white passengers to the back of the plane. It is only troublesome because of the pattern it evokes. Trying to analyze each incident in isolation leads to dead-end questions (should no person of color ever be asked to move their seat?), while understanding and addressing the pattern can lead both to changes in individual behavior and to challenging the pattern itself. It is precisely this aggregate and cumulative impact that makes microaggressional analysis so useful to a systemic understanding of racism and sexism, and it is this that we attempt to model in our game.

The theory of microaggressions may be a systemic one, but do microaggressions negatively impact the lives of minority populations and women? The research indicates that they do. Microaggressive stress injures the health of targeted groups – in our case, women and people of color (Buser, 2009; Harrell, Hall, & Taliaferro, 2003). It lowers their subjective well-being and may be a risk factor for depression (Brondolo et al., 2008; Cortina & Kubiak, 2006; Finch, Kolody, & Vega, 2000; Hill & Fischer, 2007). It can invoke stereotype threat, causing impaired performance when people must act against type (Steele, 1997). Finally, microaggressive stress can directly impair cognitive performance for members of the group being targeted (Salvatore & Shelton, 2007).

When modeling microaggressions in Advance, we must capture both the systemic nature of microaggressions and their negative impact. To do this, we assume that microaggressive stress affects success in the workplace. Each character in the game has an internal meter representing their progress toward advancement. The more skilled a client, the fuller the meter when they are first placed. The better the work environment, the faster the meter fills; a poor work environment, on the other hand, can cause the meter to decrease over time. Promotion and demotion occur when the meter fills entirely and empties entirely, respectively.

The "goodness" of an environment is determined by how many microaggressions the client encounters on a regular basis. When a client is placed, relationships are calculated for the client with all characters who are adjacent to them on the game board. If the character is from a privileged group, relative to the client, the relationship is judged to contain microaggressive interactions. For example, if women are privileged in a particular game of Advance, they make jobs adjacent to them a worse fit for men. On the other hand, members of the same group are considered to have a supportive effect. Male characters would be helped by their peer relationships with other male characters. The total number of positive peer relationships and the total number of negative peer relationships are used to calculate the speed at which a client's success meter fills – or empties.
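A minimal sketch of this calculation, reducing each character to a single group label for clarity (the actual game classifies characters by race and gender, and the rate constants here are illustrative assumptions, not Advance's tuning):

```python
def peer_relationships(client_group, neighbor_groups, privileged_group):
    """Classify each adjacent NPC relationship, per the chapter's model:
    neighbors from the privileged group create microaggressive
    (negative) ties with a non-privileged client, while same-group
    neighbors are supportive (positive)."""
    positive = sum(1 for group in neighbor_groups if group == client_group)
    negative = sum(1 for group in neighbor_groups
                   if group == privileged_group
                   and client_group != privileged_group)
    return positive, negative

def meter_rate(positive_peers, negative_peers, base_rate=1.0, weight=0.5):
    """Speed at which a placed client's success meter fills per unit of
    game time. Enough microaggressive relationships drive the rate
    negative, so the meter empties instead of filling."""
    return base_rate + weight * (positive_peers - negative_peers)
```

Note that a client from the privileged group can never accumulate negative ties in this sketch, matching the chapter's claim that dominant-group clients never experience negative peer relationships.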

A client will be promoted fastest, therefore, if they are placed in a job with the most positive peer relationships and the fewest negative peer relationships. Clients from the dominant group never experience negative peer relationships; the player must consider both positive and negative relationships only when placing clients who are being discriminated against. At the same time, the impact of these relationships is only felt over time, and is evident only in comparison to the performance of clients in better situations.

When a player has both a client and a job selected, the player can see the reactions of the client's potential peers on the board. Negative relationships are marked by a skull and positive ones by a heart (Figure 4). The player can therefore experiment with different characters for the same position, or with the same character in different positions, both for strategic reasons and to discover which characters are being discriminated against. The visual feedback also allows the player to see the peer-based nature of microaggressive stress at a glance.

Similarly, each client's success meter is made visible to the player. When a client is selected, players can see how far that client is toward promotion, whether their meter is filling or emptying, and how fast it is moving. This information makes the cumulative impact of microaggressive stress visible to the player, and can help them develop strategies for faster client promotion.

While the model of bias in the game affects how quickly clients get promoted, players still retain choice and agency. The model constrains the player's strategy, but does not determine it. For example, a player might carefully place discriminated-against clients in positions where they have supportive members of their own group nearby, in order to maximize their chances of promotion. Alternately, the player might put clients from this group into jobs with the fewest peer connections, leaving the best jobs for the clients with the most long-term potential. Several other strategies are possible, as are combinations of these strategies in response to specific situations.


Figure 4. Peer reactions to a possible character placement in Advance

No single strategy is obviously virtuous; the player cannot solve the problems associated with this model of bias through social desirability analysis. The bias is located in the game's system, not in player actions, and cannot be changed by anything the player does. Clients from the dominant group will have a promotion advantage over clients from the non-dominant group. The question is whether the player can turn this to their own advantage, or whether it will interfere with their goal of making money through client promotion.

The second model of discrimination used in Advance is bias in perceptions of competence (Valian, 1999). In our society, people unconsciously discount the contributions of women and people of color. Several classic experiments with resumes demonstrate this effect. A white name on a resume generates 50% more callbacks than a black name on the same resume (Bertrand & Mullainathan, 2003). Similarly, faculty were more likely to hire male than female job candidates, when the only difference between the two was the name on the resume (Moss-Racusin, Dovidio, Brescoll, Graham, & Handelsman, 2012; Steinpreis, 1999). Even the perceived value of the resume fluctuates based on gender. Uhlmann and Cohen presented subjects with two candidates, one male and one female. The candidates were randomly provided with backgrounds: one had an excellent educational background and one had excellent experience. Subjects claimed the job required whichever strength was possessed by the male candidate, while arguing it was simply the right set of skills for the job (Uhlmann & Cohen, 2005).

This type of bias is unconscious, as many types of systemic bias are. Additionally, in hierarchical environments, this type of bias creates cumulative advantage. People who display competence are given better opportunities and more resources. If women are repeatedly seen as less competent than their male peers, and therefore receive fewer opportunities to display competence, they will fall further and further behind.

Does this type of bias have a negative impact on women and minority populations? Certainly. When evaluations are made race- and gender-blind, women and people of color perform better. For example, when orchestra auditions were conducted behind a curtain, women's acceptance rates soared (Goldin & Rouse, 1997). In short, the saying "Work twice as hard to get half as far" is not far off. For a given level of recognition, women and people of color must work harder and perform better than men or white people need to.

When modeling bias in perceptions of competence in Advance, we must capture the sense that the discriminated-against group receives fewer rewards for the same work – or, as we will see, works harder for the same level of reward. To do this, we must develop a model for client competence and for the difficulty of a given job. That model can then be adjusted on a per-group basis to reflect the reality of discrimination.

In Advance, each client has three statistics: competence, creativity, and charisma. Each client has a score between zero and twenty-five in each of these statistics. A new client begins with low scores in all three skills. Each time the client is promoted, they receive a bonus to one or more statistics. Additionally, the player may pay to upgrade any client's statistics. The higher a statistic's current level, the higher the cost to upgrade it further.

Each job in Advance has a minimum required level for each of the three character statistics. The client must meet or exceed the required level in all three in order to be eligible for the job. If the client fails in any of the three, they cannot be placed into that job. If the player attempts to do so, nothing happens.

The game makes this information visually salient. When the player has a client selected, that client's statistics are visible. When the player selects a job, that job's requirements become visible. When both are selected, both the client's statistics and the job's requirements are shown, and any skills the client must improve are highlighted. This makes it easy for the player to see whether a financial investment would be required for that client to take the job.

Given this model of competence in a corporate environment, how is bias modeled? In this game, job requirements are raised for members of the discriminated-against group. For example, if a given job required a creativity score of three, it would be a four for members of the discriminated-against group. By raising the bar for members of this group, rather than artificially lowering their scores, the player can see that clients from this group have the same underlying capacities as their other clients. It is only the expectations of them that have changed.
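This rule can be sketched directly. The +1 bump follows the chapter's example (a creativity requirement of three becomes four for the discriminated-against group); the function names, statistic names, and data layout are illustrative.

```python
def effective_requirements(job_reqs, targeted):
    """Raise every job requirement by one for clients from the
    discriminated-against group; other clients see the job as posted.
    The bump raises the bar rather than lowering the client's scores,
    so targeted clients keep the same underlying capacities."""
    bump = 1 if targeted else 0
    return {stat: level + bump for stat, level in job_reqs.items()}

def is_eligible(client_stats, job_reqs, targeted):
    """A client must meet or exceed the (possibly raised) requirement
    in all three statistics; otherwise placement simply fails."""
    reqs = effective_requirements(job_reqs, targeted)
    return all(client_stats[stat] >= level for stat, level in reqs.items())
```

Locating the bias in the requirements rather than the statistics is the key design choice: a client who exactly meets a job's posted requirements will be eligible when not targeted, and ineligible when targeted, with identical scores in both cases.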

The visual design of the game makes it easy for players to see the difference between requirements for members of different groups. Early play-testing revealed that players had difficulty remembering subtle differences in job requirements, which made it impossible for them to compare requirements between groups. Players can now keep one job selected while switching from client to client. This allows them to easily compare the job requirements for different characters.

This model of bias adds a new strategic dimension to the game. Players can still choose to put any client with high enough skill scores into any job. They must still decide whether to spend their time searching for the most efficient client-job match, or whether to spend money liberally to match clients with jobs quickly. The only difference? On average, it will take them longer to find an existing client-job match for their discriminated-against clients, and it will cost them more to improve these clients for a quick match. Successful players must incorporate this reality into their play strategy.

The true impact of bias, however, becomes apparent as it intersects with the game's hierarchical structure. To reach the higher, more lucrative levels of the game, the client must be promoted. However, when this promotion occurs, the client does not automatically receive a job at the higher level. Instead, the player must once again solve the problem of placement – with bigger risks and better rewards. The financial and temporal challenges of placing a discriminated-against client are repeated and magnified, allowing players to see biased outcomes more clearly as each level is unlocked.

The anomaly in this game is that the source of the bias is located in the game system itself – it cannot be attributed to any single character. The player observes differential outcomes, for example that some characters are harder to place than others, or that some characters are promoted more slowly. However, the game does not allow for agentic explanations. There is simply no individual who is making intentionally biased choices. Although in the long run players are expected to incorporate both agentic and systemic explanations for bias into their explanatory repertoire, the game does not give players the easy alternative of a familiar approach.

It is important to note that players do not have the opportunity to change the bias in the game. The specific group being discriminated against is randomly selected at the beginning of each playthrough and cannot be changed by the player. The player also cannot change the way that bias operates in the game system. While this runs the risk of suggesting that the game and its designer agree that bias is normal and expected, this is also authentic to the way that systemic bias operates. A biased individual can have a change of heart, or can be removed from their position if need be. Bias located in the system, however, can only be addressed at the systemic level. The inability to directly affect the biased system is part of the game’s anomaly.

The challenge is to confront players with this anomaly as effectively as possible without further reifying discrimination – and to learn something about how different strategies for doing so function.

Reward System Design

The pedagogical goal of Advance is to encourage players to notice the bias in the game system and respond to it with new play strategies, which may expand their attributions of the origins of bias and change their attitudes toward biased outcomes. However clever the game's model of bias, it is not useful if players do not engage with it. For this reason, one of the primary research goals of this project is to investigate the impact of reward systems on the player's engagement with the game's model of bias.

It is important to clarify what reward systems mean in this context. Discussion of rewards often ends in a discussion of extrinsic and intrinsic motivation, which are two ways of understanding what motivates people to perform a particular task (e.g. Deci & Ryan, 2000; Deci, Koestner, & Ryan, 1999; Deci, Koestner, & Ryan, 2001). According to this theory, intrinsic motivation is the motivation to perform a task because of interest or enjoyment, while extrinsic motivation is the desire to perform the task in order to obtain some outcome or object. For example, a child who reads a book because they are interested in the story is intrinsically motivated, while a child who reads a book because they want to get a good grade is extrinsically motivated.

There are extensive debates about the virtues of intrinsic and extrinsic motivation, and particularly about the role of external rewards in getting learners to engage with a task. Significant evidence suggests that providing desirable rewards lowers intrinsic motivation to engage with tasks (Deci, Koestner, & Ryan, 1999; Deci, Koestner, & Ryan, 2001). On the other hand, positive reinforcement can be effective at shaping human behavior with external rewards. For example, behavior change has been demonstrated in token economies, where people receive tokens (markers of value which can be redeemed for a variety of concrete rewards, from special food to computer time) in return for good behavior (Martin & Pear, 1999). In fact, one can frame much of American society as behavior that is shaped with external rewards in the form of money.

Motivation, however, is contextual. The same person may engage in a particular activity with different motivations at different times. This is a concept Deterding (2011) explains as “situated motivational affordances.” Within a given context, we can analyze what activities people can engage in to address their motivational needs, compared to the other options available to them in that particular situation.

Recognizing that the context of this analysis is a game, we can avoid the intrinsic versus extrinsic comparison. Advance is designed to be played as a leisure activity, where players engage voluntarily with the game on their own computers in their own time. It is true that when games become mandatory, players' response to them changes for the worse (Heeter et al., 2011). However, this study does not attempt to motivate people to like the game, nor to obligate them to play. One group of subjects was drawn from voluntary web-based recruitment of casual gamers. Another group was drawn from the microtask service Mechanical Turk, where subjects can choose to complete small jobs in return for payment. Although subjects in this group are being paid to play, they have still chosen to spend their free time using Mechanical Turk, and they have selected the game from many thousands of available tasks. As we will see later in this dissertation, there are differences between these two groups of players, but we can still consider both of them as freely choosing to engage with the game.

Games provide their own contextual frame within which meanings are made, which is often referred to as the “magic circle” (Huizinga, 1950; Goffman, 1974). Although this contextual frame is not inviolable (Copier, 2007; Consalvo, 2009), games do provide interpretive and imaginative frames within which a stick can become a horse, a sword can symbolize power, or green pigs can be a deadly enemy. Within these frames, games present challenges to the player. As Suits (2005) points out, the essence of playful challenges is that they are unnecessary and inefficient. For example, the rules of basketball make it less efficient, not more efficient, to put the ball through the hoop. Players expect games to set goals that are difficult but possible, and to require them to complete those goals in ways that are deliberately unproductive. In return, players work to develop skills and strategies to overcome the challenges and accomplish their goals.

Given this context, we frame our reward systems not as motivators to play the game, but rather as motivators for particular choices and strategies within the game. The questions to ask are not “What would motivate a player to achieve this goal?” but rather “How does this goal relate to the game's system? What capacities does the player have to achieve it? What strategies does it encourage the player to develop? What behavior does it support or undermine?”

We do not expect to understand the precise motivation for players to approach the game, what alternative activities they have available, or what contextual lack they are attempting to address. For example, players may approach a game like Advance, which addresses the rather serious subject of racial and gender bias, with the goal of engaging with the subject in a safe way. Alternately, players might approach such a game playfully, hoping for the chance to turn the world as they know it on its head. In the study as designed, we cannot know whether players are approaching a game for a dose of reality or for a dose of escape.

What we can do, however, is to randomly assign players to experimental conditions. In order to accomplish this, three different versions of Advance were created and compared. While we do not know the exact motivations for people in playing the game in the first place, we can hold those motivations constant across the three game conditions, since we are randomly assigning players to groups; similarly, we can hold them constant across all game conditions and a control group. By comparing different types of rewards, therefore, we can look at how specific design patterns for in-game rewards affect the choices players make, the strategies they develop, and the impact on their attention to the game's model and its anomalous data.

In all three reward conditions, the game explicitly challenges players to guess the bias shown by the company the player is working with. This goal encourages players to try to figure out the bias by exploring and experimenting with the game's system, which confronts them with the anomalous data in the game. The assessment mechanic used to determine whether they have successfully done so is an explicit one. Players are asked to select the group they think is being discriminated against from a multiple-choice list. We chose an explicit assessment mechanic in part to explore a common assessment design pattern in serious games – being explicitly asked about the content the player is supposed to learn. We can therefore investigate the most effective ways to use (and avoid) this pattern.

Another assessment design issue was salience. To what extent should guessing the bias be a required part of play? The bias guess system could be made mandatory – but requiring participation would change the motivational landscape (Heeter et al., 2011). On the other hand, for research purposes, players should not completely overlook the bias guessing process. As a compromise position, Advance gives players explicit instructions to guess the bias in the tutorial, and uses a flashing button to draw their attention to the bias guess interface during play.

However, there is no penalty for not guessing the bias. In other words, the bias guess system offers money for a correct guess, but the player loses nothing if they avoid guessing; neither are players obligated by in-game constraints to make a guess.

Finally, the assessment design considered how to incentivize the appropriate behavior (experimentation) rather than inappropriate behavior (guessing). Here we rely on the psychological principle of loss aversion (Tversky & Kahneman, 1991). When players are offered a reward for guessing the bias, they are told what reward they will get for a correct guess. However, if they guess incorrectly, they will receive a lesser reward. This gives players the freedom to guess if they don't know who is being discriminated against, and doesn't force them to get things right the first time; at the same time, the game implies that they already hold the full reward and need only lock it in by guessing correctly.
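The loss-aversion framing amounts to a declining payout schedule. A minimal sketch follows; the starting amount and the per-guess decrement are hypothetical, since the dissertation does not specify the game's actual values:

```python
# Hypothetical declining-reward schedule: each wrong bias guess reduces the
# payout still available, so the player "loses" part of a reward they were
# shown up front. The constants are illustrative, not the game's numbers.

STARTING_REWARD = 1000
DECREMENT = 250

def available_reward(wrong_guesses: int) -> int:
    """Reward still on the table after some number of wrong guesses."""
    return max(0, STARTING_REWARD - DECREMENT * wrong_guesses)
```

Because the player sees the full amount before guessing, each wrong guess reads as a loss of money they "had," which is the loss-aversion lever the design relies on.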

Within these broader constraints of assessment design, three types of reward were constructed. However, in order to understand them, we must first understand the underlying resources used to generate them.

Money is the most visible resource in Advance. It helps create both positive and negative feedback cycles during play. If the player has a lot of money, they can afford to upgrade their clients liberally. Upgraded clients are easier to place into jobs, earning the player even more money, and affording the player more freedom of choice in their in-game actions. However, players must also regularly pay in-game expenses. Players must earn money faster than they lose it; if a player runs out of money entirely, they immediately lose the game. Money therefore provides players with in-game opportunities, while simultaneously serving as a hedge against disaster. Taken together, these factors make money valuable within the frame of the game.

Money also serves a signaling function. Players receive money for engaging in the target behavior, namely placing clients into jobs. At the end of the game, the player's score is calculated based on how much money they have. Money, therefore, tells the player how well they are doing at the game. A player who has a lot of money can, by definition, be confident that their strategies and techniques are successful. A player who has very little money is either playing poorly, or taking a significant risk for a later payoff.

Another resource in Advance is information. When players understand the way the game system works, they can design strategies to take advantage of it. For example, if a particular player group is being discriminated against, the player can make an informed decision about how much time and money to invest in members of that group. They may choose to invest more time, because they know it takes more effort to place characters from that group successfully, or they may choose to ignore the character entirely for the same reason. In both cases, however, the information lets the player make better judgments about how to behave in the game, and gives them more insight into how to meet the game's challenges.

The reward systems created for Advance use information and money in different ways to incentivize engagement with the bias guess system, based on common design patterns for rewards in games (Bjork & Holopainen, 2004).

In the informational game condition, when players guess what the bias is, they learn whether they were right or wrong. The benefit of a correct guess is, in this condition, purely informational. As we have seen, characters who are being discriminated against cost more money to place, and are promoted more slowly than their peers. If the player recognizes this, they can develop strategies to compensate, which will earn them more money in the long run. These strategies may be supportive of these characters, such as seeking placements that are minimally stressful; the player may also develop strategies that involve giving these characters little support and removing them from the game as quickly as possible. In both cases, the bias information can help players play better, because it helps them figure out how to allocate their time and effort.

In the financial game condition, players are rewarded for a correct guess with confirmation, but also with a one-time award of money. This monetary reward is worth many times as much as placing a single character, making it an appealing way for the player to earn money. The player is incentivized to identify the bias rather than simply guess at it, because the reward amount decreases with each wrong guess. However, the player does not need to do anything with their knowledge of the bias beyond displaying it. The knowledge can remain entirely inert. Of course, the player may choose to apply it to strategy development, as in the first condition. However, there is no particular incentive for the player to do so.

Finally, in the generative game condition, players receive both confirmation of their hypothesis and an opportunity to earn more money. However, if they wish to follow up on the opportunity to earn more, they must develop strategies for managing the bias in the game. Specifically, the player receives a significant monetary bonus for each member of the discriminated-against group they place. As in the financial condition, guessing is disincentivized by reducing the bonus for each incorrect guess. Unlike the previous case, however, the player receives no reward unless they act on the knowledge they have gained. In order to receive the bonus, the player must work out strategies of some kind for placing discriminated-against characters, which requires them to engage with the game's model in a new way. Characters from the discriminated-against group now provide an opportunity for extra profit, rather than requiring an extra investment of time and money. Players must reconsider their strategies in light of this new information. Additionally, players may now feel invested in placing these individuals in order to receive their "rightful" payoff, which they have put effort into acquiring.

Because the game's bias is different every time, players must repeatedly go through the bias-discovery process, no matter which of the three reward systems they are working with. The process of bias-discovery is precisely what PMAD theory suggests players should engage with, because we propose that engagement with this process will in turn drive measurable changes to their ideas about racial and gender bias, and to their attitudes toward it. By looking at the differences between the reward systems, we can understand how different reward mechanics may drive players to engage with the bias discovery process in different ways.

This chapter has drawn together a wide range of theoretical and empirical work to define PMAD theory, an approach to creating games that help shift players' conceptions of complex systems. It also described Advance, a game built on PMAD theory to help players change their attributions and attitudes around racial and gender bias. In the next chapter, we describe the research methods used to empirically test the effectiveness of the game and to look at differences between the three versions of the game.

Chapter 4: Methods

This project investigates three questions involving PMAD (playable model – anomalous data) design theory. First, can a PMAD-based game that models systemic racial and gender bias change players' likelihood of using systemic explanations for incidents of racial and gender bias? How does it compare to more traditional methods of education, such as reading a text? Does the reward structure (informational, financial, or generative) in such a game affect the likelihood of the player using systemic explanations? Second, can such a game change players' attitudes about racial and gender bias, and how does it compare to reading? Do differences in reward structure help players shift their attitudes? Finally, how do differences in game-play behaviors affect players' attributions and attitudes?

In this study, data is collected from subjects about their likelihood of using systemic rather than agentic explanations for incidents of racial and gender bias, and about their attitudes toward racism and sexism. The study uses a Solomon eight-group research design; half the subjects have data collected both before and after the experimental intervention (playing one of three versions of the custom-designed PMAD game Advance, or reading a control text), while half receive only a post-test to control for possible priming effects of the pre-test questions. This design allows us to investigate how players' attributions and attitudes change under the four different conditions.
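The eight cells of this design arise from crossing the pre-test factor with the four study conditions. The assignment logic can be sketched as follows; the condition names come from the text, but the code itself is an illustrative assumption, not the study's actual implementation:

```python
import random

# Sketch of Solomon eight-group assignment: four conditions crossed with a
# pre-test / no-pre-test factor yields eight cells. Hypothetical code only.

CONDITIONS = ["control", "informational", "financial", "generative"]

def assign_subject(rng: random.Random) -> tuple[str, bool]:
    """Return (condition, receives_pretest) for one consenting subject."""
    return rng.choice(CONDITIONS), rng.random() < 0.5
```

Independently randomizing the two factors, as above, is one simple way to populate all eight cells; a real study might instead use blocked randomization to balance cell sizes.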

By looking at pre-test to post-test change in the likelihood of using a systemic attribution, we can begin to understand whether PMAD games can help people adopt a systemic mindset. By understanding the difference in impact between playing a PMAD game and reading a piece of text on the same topic, we can further refine how to use the PMAD design principles to achieve conceptual change on a socially charged issue.

Similarly, we can look at pre-to-post change on measures of racially- and gender-biased attitudes to understand whether PMAD games can help change players' attitudes toward racism and sexism; by comparing the impact of the game to the impact of text, we can work toward making PMAD games more effective.

Finally, we can compare both attitude and attribution measures across the three different game conditions, in order to understand the comparative effect of different reward systems within a PMAD game. Finding differences between reward systems tells us how to make the most effective design choices for conceptual change using PMAD games as an intervention, but it also indicates new research questions about the impact of these reward types in other kinds of games for impact. Additionally, designers can use this work to choose appropriate reward systems to focus player attention within complex game models.

Research Questions

As discussed above, this project investigates three questions. First, can a PMAD (playable model – anomalous data) game that models systemic racial and gender bias change players' attribution styles around issues of racism and sexism, and how does it compare to reading a text? Second, can such a game change players' attitudes about racial and gender bias, and how does it compare to a text-based intervention? Finally, does the reward type (informational, financial, or generative) in such a game affect either the player's likelihood of using systemic explanations or their likelihood of shifting their attitudes?

We begin by considering the question of attribution. As outlined in earlier chapters, we believe that games designed using the PMAD principles may be able to induce conceptual change. Players must engage deeply with anomalous data and therefore with the game's model of the target domain. We therefore hypothesize that if a game presents an alternate model of racism and sexism, players will show conceptual change related to those models after playing the game, and that it will differ from the change experienced after reading text.

First, we investigate the differences in impact between the three game conditions and the control activity (reading a piece of text) on the player's likelihood of using systemic explanations for racism and sexism.

Q1: Controlling for pre-test scores, are there differences in attribution test scores for race and gender across the four study conditions?

This analysis looks at the overall impact of the game. However, we know that not all players play with equal attention, engagement, or success. We investigate whether successful game play will affect players more than simple exposure to game play, and whether there are differences in attribution post-test scores based on other game performance measures.

Q2: Are there associations between in-game measures (such as player score, number of characters placed, and number of game plays) and attribution test scores?

In order to make claims about the game's impact, it is important to understand not only whether the game impacts players, but how. As outlined in the game design portion of this paper, we believe that the reward system for bias detection can be manipulated to produce different effects in players. Comparing a “financial” reward system of the sort often found in educational games to a reward system that provides information for improved play and to a reward system which encourages strategy change after bias detection, we expect to find differences in the effectiveness of the game as an intervention on players' attributions of racial and gender disparities. As above, we investigate whether in-game performance differences correlate with the post-test measures across the reward conditions.

We investigate four different conditions for the bias guess hypotheses, as not all players interacted with the bias guess system. Players who did not interact with the bias guess system never learned which reward condition they were in or even that they would be rewarded for guessing the bias, hence they must be treated as a separate group for purposes of understanding the differences between reward conditions. The groups analyzed, therefore, are as follows: players who did not interact with the bias guess system (no guess), players who interacted with the bias guess system in the informational condition (informational guess), players who interacted with the bias guess system in the financial condition (financial guess), and players who interacted with the bias guess system in the generative condition (generative guess).

Q3: For game players, are there differences in measures of systemic understanding of racism and sexism across bias guess conditions?

Q4: For game players, are there differences in game performance measures across bias guess conditions?

Additionally, games have a powerful affective impact on players, even when they are not specifically designed to do so. We therefore hypothesize that game-play will have at least a temporary effect on players' attitudes about race and gender. Because bias serves as an obstacle for the player, we predict that players will be more sympathetic to the struggles of discriminated-against groups in American society. Specifically, we hypothesize that players will perform differently on existing measures of racism and sexism after playing the game, and that the impact of the game will differ from the impact of reading text. As with the attribution test, we investigate the impact of game-play on players' attitudes about race and gender.

Q5: Controlling for pre-test scores, are there differences in attitude test scores for race and gender across the four study conditions?

Q6: Are there associations between in-game measures (such as player score, number of characters placed, and number of game plays) and attitude test scores?

We also investigate the differences between reward conditions for players' attitudes about racism and sexism. As above, we include players who did not guess the bias as a separate group, and compare them to players who attempted to guess the bias in each of the three reward conditions.

Q7: For game players, are there differences in measures of attitudes toward racism and sexism across conditions?

As we investigate each of these seven questions, we control for player race and gender. There are many questions one could ask using the player's race and gender, such as the impact of an in-game racial or gender bias that matches (or does not match) the player's race or gender. However, for the purposes of this project, we limit our analysis to whether there are differences across race and/or gender groups.

Procedures

This study compares the effect of three different versions of the Advance PMAD game, and of a control text, on players' conceptions of racism and sexism. It uses a Solomon eight-group design across the three game conditions (informational reward, financial reward, and generative reward) and the control condition. This design allows us to look at pre-test to post-test differences, while parceling out the potential impact of pre-test priming on post-test scores. It also allows us to examine differences between the three game conditions, and between the treatment condition and the control.

Subjects who arrived at the Advance site were presented with IRB-approved study information and asked whether they consented to participate. Individuals who chose not to participate could still play the game, but their in-game data was not collected and they did not receive any pre-test or post-test material associated with the study.

Upon obtaining consent, the study followed the procedure summarized in Figure 5.

Consenting subjects were randomly assigned to a pre-test / no pre-test condition. Subjects who were assigned to the pre-test condition completed the pre-test before playing the game; subjects assigned to the no pre-test condition moved immediately to the next step.

The pre-test consisted of the Attribution Test, the Symbolic Racism Scale, and the Modern Sexism Scale. The tests were presented in a random order, and the order of questions within each test was randomized.

When an attribution pre-test was generated, five of the ten test questions were randomly chosen to use the race version of the question. The other five questions used the gender version of the same question. The order of questions was randomized, and the order of answers was randomized for each question.
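This test-assembly procedure can be sketched in a few lines. The question identifiers and helper name below are hypothetical; only the logic (five random race versions, five gender versions, shuffled order) comes from the text:

```python
import random

# Sketch of attribution pre-test assembly: from ten scenario questions, five
# randomly chosen ones use their race version and the other five their
# gender version; question order is then shuffled. Identifiers are made up.

def generate_attribution_test(question_ids, rng):
    race_ids = set(rng.sample(question_ids, 5))
    items = [(qid, "race" if qid in race_ids else "gender")
             for qid in question_ids]
    rng.shuffle(items)
    return items
```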

The Symbolic Racism and Modern Sexism scales were administered as per the test instructions. However, the Likert scales in the two tests as originally designed used opposite codings; the Symbolic Racism scale uses 1 to indicate Strongly Agree (Henry & Sears, 2002), while the Modern Sexism scale uses 1 to indicate Strongly Disagree (Swim et al., 1995). For consistency, the Modern Sexism scale's Likert scales were flipped when presented to the user.
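Flipping a Likert scale for presentation (or, equivalently, reverse-coding responses afterward) is a single arithmetic map. A sketch follows; the default endpoints are an assumption for illustration, not the scales' documented range:

```python
# Reverse-coding sketch for aligning two Likert codings: a response on a
# flipped scale maps back onto the original coding via (min + max) - value.
# The default 1-7 endpoints are assumed, not taken from the actual scales.

def reverse_code(value: int, scale_min: int = 1, scale_max: int = 7) -> int:
    """Map a response on a flipped Likert scale onto the original coding."""
    return (scale_min + scale_max) - value
```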

After completing or skipping the pre-test as per their assigned condition, each subject was randomly assigned either to the control group, or to one of three bias detection reward conditions: informational rewards, financial rewards and generative rewards.

In the control condition, players were presented with a brief text about the two core concepts addressed by the game: microaggressions and the cumulative impact of small differences. The former was adapted from the classic text on microaggressions by Sue (2010); the latter was adapted from Valian's work on gender schemas (1999). After reading the text, players were asked a factual question about the text to test their understanding, and asked how much they enjoyed reading the text. (See Appendix G for the control text, and Appendix H for the check question details.) Players were then taken to the post-test.


Figure 5. Study overview flowchart

In the game conditions, the player first played through a short tutorial. The tutorial explained four key game concepts: character placement, promotion, upgrades, and changing levels. The tutorial asked players to practice performing each task, and only began the next tutorial segment when the player succeeded. However, players could end the tutorial at any time and begin the game.

The player then played the game in the appropriate reward mode, either until they won by surviving for five minutes or until they lost the game by running out of money.

At the end of the game, the player was invited to "End Play" or "Play Again." The first time the player encountered this screen, the "End Play" button was grayed out and inaccessible, since subjects must play twice before beginning the post-test. On their second time through the game, a new randomly chosen in-game bias was selected; the only constraint was that it could not be the same as the bias chosen in the first game. However, the player continued to play the game in the same reward mode. After playing a second time, the player had the option to select either "Play Again" or "End Play." Players could select "Play Again" as many times as they liked, receiving a new bias each time but always remaining in the same treatment condition. When the player selected "End Play" they were asked a factual question confirming their understanding of game content, and asked about how much they enjoyed the game.
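The replay re-roll amounts to drawing a new bias uniformly from every group except the one just used. A sketch, with placeholder group identifiers that are not the game's actual categories:

```python
import random

# Sketch of "Play Again" bias re-selection: the new discriminated-against
# group may be any group except the previous game's. Placeholder names only.

GROUPS = ["group_1", "group_2", "group_3", "group_4"]

def next_bias(previous: str, rng: random.Random) -> str:
    """Choose a new bias target, guaranteed to differ from the last one."""
    return rng.choice([g for g in GROUPS if g != previous])
```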

After completing the game comprehension and feedback questions, all players were sent to the post-test. The post-test consisted of the Attribution Test, the Symbolic Racism Scale (Henry & Sears, 2002), and the Modern Sexism Scale (Swim et al., 1995). As in the pre-test, the tests and questions were presented in a randomized order.

If a pre-test was administered, the attribution post-test included the ten questions not used in the pre-test (five race questions, five gender questions). If the player did not receive a pre-test, a ten-question attribution test was generated as for the pre-test: five randomly chosen questions used scenarios featuring race, and the other five used scenarios involving gender. The order of questions and the order of answers within each question was randomized. The Symbolic Racism and Modern Sexism scales were presented again in full.

At the end of the post-test, demographic data was collected from all participants.

After completing the post-test, web-recruited subjects had the option to submit an email address if they wanted to be entered in the raffle for an iPad. Mechanical Turk subjects were given a unique, randomly-generated code to submit to the Mechanical Turk site in order to be paid for their participation.

Finally, all subjects were presented with links to organizations promoting racial and gender equality, as required by the IRB.

At this point, players in the three game conditions were done with the experiment; as they had not yet had a chance to play, players in the control condition had the option of playing the game once the study was complete.

Subjects

The target population for the study was "adult American players of online casual games." As such, web-based recruitment took place on sites devoted to gaming, social media communities related to gaming, and email lists of game-players. Potential subjects were provided with a link to follow if they wished to participate in the study. Subjects were also informed that if they completed the study, they would have the opportunity to enter a drawing to win an iPad.

In addition to the open online recruitment process, an additional subject pool was

recruited from Mechanical Turk, Amazon's microtask assignment site. In addition to the

logistical benefits of the service, the Mechanical Turk group serves as a control for the effects of the recruitment method.

Recruiting through gaming communities yields a snowball sample; the reach of Mechanical Turk is

broader. The former group is composed entirely of self-identified gamers who participate in

online gaming communities, while the latter group may include self-identified gamers as well as

participants who do not primarily see themselves as gamers. Using both data sources allowed us

to compare two different sources of potential players. Subjects who completed the entire study

were paid $5 for their time.

During the consent process, subjects were asked to provide their age in the demographic

section of the survey. If subjects consented to the study, but later reported an age under eighteen,

their data was discarded.

Four hundred valid responses were required to achieve a power over .8, given a two-way

ANOVA model with game condition (control, informational, financial, and generative) and pre-

test condition (present, absent) as independent factors, and assuming a “small” effect size

(Cohen's d of .2). Recruitment therefore continued until over four hundred valid responses were

recorded.
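The power calculation above can be sketched with a noncentral-F computation. The mapping of the stated "small" effect size onto the ANOVA effect size f, the choice of the 3-df game-condition factor, and the cell structure are illustrative assumptions, not a reconstruction of the analysis actually performed:

```python
from scipy.stats import f as f_dist, ncf

def anova_power(f_effect, n_total, df_effect, n_cells, alpha=0.05):
    """Approximate power for one fixed effect in a between-subjects ANOVA."""
    df_error = n_total - n_cells           # e.g. 400 subjects minus 4 x 2 cells
    nc = (f_effect ** 2) * n_total         # noncentrality parameter
    crit = f_dist.ppf(1 - alpha, df_effect, df_error)
    return 1 - ncf.cdf(crit, df_effect, df_error, nc)

# Assumed: f = .1 (a conventional "small" effect) for the 3-df condition factor.
power = anova_power(0.1, 400, df_effect=3, n_cells=8)
```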

A total of 703 subjects began the study; 72 completed the study but provided invalid data; 412 valid responses were acquired. See Table 2 for details.

Table 2

Subjects by recruitment source and treatment condition

                                Mechanical Turk       Web-recruited
  Pretest*                      Yes       No          Yes       No
Experimental (Game-task)
  iReward                       n = 30    n = 20      n = 25    n = 18
  fReward                       n = 24    n = 15      n = 28    n = 14
  gReward                       n = 19    n = 26      n = 26    n = 24
Control (On-screen Text-task)
  Text                          n = 26    n = 31      n = 43    n = 42

* Pretests were: Modern Sexism Test, Symbolic Racism Test, Attribution Test

Instruments

The game intervention is a custom-designed Flash-based online game built using the PMAD design principles (Table 3), which models a systemic (but heavily simplified) approach to racism and sexism. Players take the role of a recruiter trying to place clients into biased organizations, and are encouraged to understand and manipulate the game's bias in order to maximize their score.

The study uses multiple instruments to collect different types of data before, after and

during game-play. Specifically, the study collects data on in-game player behavior, on player attributions of racism and sexism, on player attitudes about race and gender, and on player demographics.

Table 3

PMAD design principles

1. The game system models the relevant domain.
2. Player actions affect, and are affected by, the model.
3. Players receive feedback about the impacts of their actions as they relate to the model.
4. The game goals point players toward model conflict.
5. Players can experiment with the game's model.
6. Players must figure out rules and strategies for themselves.

As described above, consenting subjects were randomly assigned to the pre-test or no pre-test condition. The pre-test consisted of the Racism and Sexism Attribution Tests, the

Modern Sexism Scale, and the Symbolic Racism Scale.

Attribution tests. Because this project investigates conceptual change, we examine

subjects' underlying conceptual models of how racism and sexism function. Specifically, we

hope to uncover whether subjects see racism and sexism as direct processes, isolated episodes

resulting from the actions of individuals, or as systemic processes that are created by multiple,

often simultaneous, factors in the absence of a single agent (Chi & Roscoe, 2002; Chi, 2008;

Johnson, 2002; Meadows, 2008; Forrester, 1961).

Existing measures of conceptual change often analyze concept maps to understand how

ideas are related to each other (Markham, Mintzes, & Jones, 1994; Wallace & Mintzes, 1990).

However, because this project aimed for a large subject population, it required a scalable solution

that could be deployed online. Additionally, we are primarily interested in how likely players are to

attribute discriminatory behavior to individual or systemic causes, not in the details of the

underlying models. It seemed possible to develop a simpler test that is also more amenable to

automated analysis.

We developed a "Racism Attribution Test" and a "Sexism Attribution Test." Each of these

tests measures whether subjects are more likely to use a direct-process model for understanding discrimination, or whether they are more likely to use a systemic model. As discussed later, each test was verified by domain experts to ensure that it assesses the target area accurately.

Each test contains ten questions, each of which presents a brief scenario in which unequal

outcomes occur. Test questions are paired; for each question in the Racism Attribution Test, there

is a parallel scenario in the Sexism Attribution Test, and vice versa. The Sexism Attribution Test

can be found in Appendix B, while the Racism Attribution Test can be found in Appendix C.

For purposes of this study, characters who benefit from this inequality are white or male;

characters who are discriminated against are non-white or female. Subjects are then asked to

choose the most likely explanation of the scenario.

To avoid conflation of attitudes about prejudice with attributions of prejudice, subjects

are provided with four explanatory options. Along one axis, the responses vary based on attribution, while along the other axis, responses vary based on attitude. For each question, therefore, the responses are framed as shown in Table 4.

Table 4

Attribution test answer categories

                         Unfair Outcome                       Fair Outcome
Individual Explanation   "Individual choices resulted in      "Individual choices resulted in
                         an unfair outcome."                  an appropriate outcome."
Systemic Explanation     "Systemic factors resulted in an     "Systemic factors resulted in an
                         unfair outcome."                     appropriate outcome."

A sample question runs as follows:

The anthology "Best Short Stories By New Writers" is compiled by a single editor.

The editor chooses stories for the anthology from obscure small-press literary

magazines.

This year's anthology contains only stories by white writers.

Which of the following explanations would you consider most likely to be true?

1. The editor was racist in selecting the stories.

2. White authors are better at writing stories that appeal to an audience of

sophisticated, literary readers.

3. The editor selected the best stories without considering race.

4. Non-white authors are underrepresented in small-press literary magazines.

Fourteen questions were developed and reviewed by domain experts to ensure they appropriately represented systemic and individual explanations of racism and sexism. Next, a

pool of subjects was recruited to perform a sorting task on the answers using Mechanical Turk.

After training on the definitions of each category, subjects were asked to identify which answer

fell into which category. Subjects were able to assign as many answers to each category as they

wanted, so the ability to correctly assign answers to categories demonstrates that subjects were

indeed able to discriminate between systemic and agentic outcomes.

Binomial analyses were performed to see whether players were able to correctly assign answers to “systemic” and “agentic” categories at a statistically significant level. For each of the

112 answers (four answers to each of fourteen race questions and fourteen gender questions), we conducted a binomial test, using “answer is systemic” and “answer is agentic” as the categories.

In order to pass the validation stage, all four answers to the question had to be correctly assigned to the systemic or agentic categories with p < .05. Additionally, both the race and gender versions of the question had to pass in order to preserve parallelism between the tests.

Ten questions passed the validation stage and were integrated into the final test. See

Appendix D for the data table from the validation study.
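Each per-answer check can be reproduced with a two-sided binomial test against a 50/50 chance split. The rater counts below are invented for illustration; they are not the study's validation data:

```python
from scipy.stats import binomtest

# Suppose 18 of 20 raters placed an answer in its intended category.
n_raters, n_correct = 20, 18
result = binomtest(n_correct, n_raters, p=0.5)  # chance = even split
passes = result.pvalue < .05  # answer sorted reliably above chance
```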

As described in the procedures section, when an attribution pre-test was generated, five of the ten test questions were randomly chosen to use the race version of the question. The other five questions used the gender version of the same question. The order of questions was randomized, and the order of answers was randomized for each question. The answer chosen by the subject was stored for analysis.

Attitude analysis. In addition to understanding players' attributions of racism and sexism, we also hope to examine their attitudes about race and gender. While it might be possible to extract data on this from the attribution test, due to its four-cell design, we prefer to use existing measures when they fit the needs of the project.

As it becomes less acceptable to express overt racism and sexism in American society,

subjects become more likely to respond in a socially appropriate way to measures of explicit or

"old-fashioned" racism and sexism (Schuman, Steeh, Bobo, & Krysan, 1998; Swim, Aikin, Hall,

& Hunter, 1995). These measures of explicit racism and sexism no longer correlate meaningfully

with racist and sexist attitudes.

In response, researchers have developed scales that measure implicit or "modern" racism

and sexism. These measures correlate highly with discriminatory attitudes and behavior, and do

not evoke social desirability bias. The Symbolic Racism Scale, despite some debate over what

precisely it measures, has been validated as a measure of racial bias (Henry & Sears, 2007;

Sears, 2010). The Modern Sexism Scale, similarly, has been validated as an appropriate measure

of gender bias (Swim et al., 1995). The Modern Sexism Scale can be found in Appendix E of

this document, and the Symbolic Racism Scale is included as Appendix F.

The Modern Sexism Scale and the Symbolic Racism Scale were administered to all

subjects as part of the pre-test. As described in the procedures section, the Likert scales in the

two tests as originally designed used opposite codings; the Symbolic Racism scale uses 1 to

indicate Strongly Agree (Henry & Sears, 2002), while the Modern Sexism scale uses 1 to

indicate Strongly Disagree (Swim et al., 1995). For consistency, the Modern Sexism Scale's Likert responses were reverse-coded when presented to the user.

Subjects who were assigned to one of the three game conditions had their in-game data

collected. Subjects in these conditions who received a pre-test began play after completing the

pre-test, while subjects who did not receive a pre-test began play immediately upon consenting to participate in the study.

In-game data collection. Advance tracks consenting players' in-game behavior using a

custom-developed Google App Engine data proxy. Every action the player takes, such as placing

a client or inspecting a job, is recorded and sent to a central database, from which it can be

downloaded and analyzed. This repository of in-game data can be used to follow a particular player's progress through the game, or to inspect player activity in the aggregate.

For the purposes of this project, we analyze the following in-game events, all of which contribute to the player's final score:

• A character enters the player's client list

• The player places a character into a job

• The player pays to upgrade a character

• A character is promoted or demoted

All relevant data is collected for each event. For example, when a character enters the player's client list, the game records the character's race and gender, the level the character is on, the character's name and statistics, and at what time the event occurred. Additionally, we collect data unique to each instance of the game, such as which version of the game the player received, what group was being discriminated against, at what time the player identified the game's bias, and the player's final score.
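A logged event can be pictured as a simple record. The field names below are illustrative stand-ins, not the game's actual logging schema:

```python
from dataclasses import dataclass, field
import time

@dataclass
class GameEvent:
    """One logged in-game action (field names are hypothetical)."""
    kind: str        # e.g. "client_added", "placement", "upgrade"
    race: str
    gender: str
    level: int
    timestamp: float = field(default_factory=time.time)

event = GameEvent("client_added", race="White", gender="female", level=2)
```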

Taken together, this data can give us a picture of whether players are successful or unsuccessful in their play, and how effective they are at countering the bias in the system through skillful choices. In turn, we can correlate this data with more explicit instruments to see what play experiences best promote understanding.

After completing play, all subjects received a post-test. The post-test consisted of the

Racism and Sexism Attribution Tests, the Modern Sexism Scale, and the Symbolic Racism Scale.

As described in the procedures section, if a pre-test was administered, the attribution

post-test included the ten questions not used in the pre-test (five race questions, five gender

questions). If the player did not receive a pre-test, a ten-question attribution test was generated as

for the pre-test: five randomly chosen questions used scenarios featuring race, and the other five

used scenarios involving gender. The Modern Sexism and Symbolic Racism scales were re-administered in full.

Demographic data. After completing the post-test, all subjects were asked to provide demographic data. Subjects were asked to provide their race and gender. Additionally, they were asked to provide other demographic information that is known to influence levels of racial and gender bias: their age and what type of region they live in (urban, suburban, rural) (Schuman,

Steeh, Bobo, & Krysan, 1998). We collect this data in order to control for the impact of player race, gender, age, and living situation on attribution and attitude test performance. See Appendix

I for the demographic questions used in the study.

Data Processing

Data was collected through a custom-designed Google App Engine data proxy connected to web forms. Player actions were recorded and transmitted to the proxy, which stored them in a database. Upon completion of the data collection period, all data was downloaded from the site for processing.

At this point, invalid data sets were removed from the analysis pool if any of the following three factors were present: if the subject reported an age under eighteen, if the subject reported a country of origin other than America or a first language other than English, or if the subject did not complete the study.
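The three exclusion rules can be sketched as a single filter; the record keys below are hypothetical, not the actual stored field names:

```python
def is_valid(record):
    """Keep a data set only if none of the three exclusion factors apply."""
    return (record.get("age", 0) >= 18
            and record.get("country") == "America"
            and record.get("first_language") == "English"
            and record.get("completed", False))

records = [
    {"age": 25, "country": "America", "first_language": "English", "completed": True},
    {"age": 17, "country": "America", "first_language": "English", "completed": True},
]
valid = [r for r in records if is_valid(r)]  # keeps only the first record
```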

The subject pool was limited to eighteen and over in order to comply with standards of informed consent. Subjects were informed during the consent process that they must be over the

age of eighteen. If subjects consented and later reported an age under eighteen, their data was

immediately deleted without being processed or analyzed. Data from non-American subjects and subjects for whom English was not their first language was archived for future analysis. Data from non-American subjects was not used for the current study because some evidence suggests there may be cross-cultural differences in attribution styles (Choi, Nisbett, & Norenzayan, 1999;

Norenzayan, Choi, & Nisbett, 2001). A separate cross-cultural study would be required to properly analyze this data. The analysis was limited to English-speakers because both the game tutorial and the control condition required reading English text. Finally, incomplete data sets were used to test comparability between completers and non-completers. Data was sent to the data proxy for storage after the pre-test, after gameplay, and after the post-test. This allowed the collection of pre-test data for subjects who did not complete the study in its entirety. Non- completers' pre-test scores on the attribution and attitude tests were compared to the pre-test data of completers using t-tests to verify basic comparability between the groups.

Valid data sets, including pre-test data from non-completers, were processed for analysis as follows.

Attribution data. As noted in the design of the attribution test, each question on the attribution test has four answers: systemic-positive, systemic-negative, individual-positive and individual-negative. For each question, we record what answer the subject provided. Data is recorded separately for questions involving race and questions involving gender.

Given this basic data, we calculate a systemic-sexism score by combining the number of questions about gender answered with the systemic-positive or systemic-negative categories, taken together. We calculate a systemic-racism score in the same fashion, using the questions regarding race. For both race and gender, a higher number means the subject answered more

questions using systemic explanations.
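The scoring rule reduces to counting systemic answers. The category labels follow Table 4; the sample responses are invented for illustration:

```python
def systemic_score(answers):
    """Number of questions answered with either systemic category."""
    return sum(a in ("systemic-positive", "systemic-negative") for a in answers)

gender_answers = ["systemic-positive", "individual-negative",
                  "systemic-negative", "individual-positive",
                  "systemic-positive"]
systemic_sexism = systemic_score(gender_answers)  # 3 of 5 answers are systemic
```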

Attitude data. The Symbolic Racism scale and Modern Sexism scale both contain

scoring instructions for how to convert subjects' responses into a single number representing

their total score. These instructions were used to calculate subjects' scores on these scales. The

instructions for the Modern Sexism Scale were adjusted to account for the reverse-coding of the

Likert scales. Given this adjustment, for both the Symbolic Racism scale and the Modern Sexism

scale, a higher score means less evidence of biased attitudes around race and gender.
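The reverse-coding adjustment amounts to flipping each Likert response around the scale midpoint. The number of scale points below is an assumption for illustration, not taken from the published scales:

```python
def flip_likert(response, scale_points=5):
    """Reverse-code a 1..scale_points Likert response: 1 -> scale_points, etc."""
    return scale_points + 1 - response

# With an assumed 5-point scale, responses 1, 2, 4 flip to 5, 4, 2.
flipped = [flip_likert(r) for r in (1, 2, 4)]
```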

In-game data. We conceptualize in-game data in two different ways. First, we use game- play data to measure player performance. Second, we examine whether in-game performance is affected by the reward condition. Since every action by the player is recorded, we must distill an enormous mass of data to determine appropriate metrics for addressing each of these questions.

In other words, we must operationalize player success so that we can measure it.

For the purpose of measuring the success of players, we are concerned with their ability to understand and manipulate the game's system. Fortunately, the game has a built-in metric for determining which players are best able to manipulate the game's system: the score. A high score reflects skillful play, while a low score reflects a poor understanding of the game or poor skills at manipulating it.

Unfortunately, the score displayed to the player will vary between the three versions of the game. In the financial-reward condition, the player receives a fixed bonus to their score for detecting the game's bias; in the generative-reward condition, players receive a score multiplier for placing certain kinds of characters; and in the informational-reward condition players receive no bonuses at all. This makes the raw score a useful measure for displaying feedback to an individual player, but not useful for comparing the performance of different players across

different versions of the game.

Instead, we calculate a version-independent score, which omits the special cases of the one-time bonus (in the financial-reward condition) and the score multiplier (in the generative-reward condition). The version-independent score is calculated during play and updated every time the player's actual score is updated. For example, it is updated when the player successfully places a character, when the player promotes a character, when they spend money on an upgrade, or when expenses are deducted from their score. Like the score visible to the player, this score is based on the player's success in placing characters and the level at which they place them.

However, because it omits all financial and generative rewards, it is calculated identically across all three versions of the game.

This version-independent score lets us determine whether there are performance differences between versions of the game. Because it distills player success at character placement to a single number, it allows us to see whether players are more successful at placing characters, on average, in a particular game version.

Using the version-independent score, we also analyze the impact of game version on a key in-game challenge – placing members of discriminated-against groups into jobs. For that reason, we also calculate the percentage of the version-independent score generated from placing each type of character. We can then calculate the percentage of the player's total score received from members of the discriminated-against group. These scores are normalized based on the total number of characters the player encountered from each group.

The version-independent score is used instead of the raw score in all score-related analyses in this study.

Demographic data. Players are asked to select one of three gender categories (male,

female, other). These three categories were the only options available; the player must choose

one from a drop-down menu. Players are asked to select one or more racial categories, which

include the four categories used in the game as well as additional categories from the United

States Census (Humes, Jones, & Ramirez, 2010). Subjects could use the check-boxes to select

more than one racial identity. Finally, players are asked to provide their age, country of origin,

community type (urban, suburban, or rural), and first language.

Gender, age, and community type were used as-is for analysis. However, players were

grouped into three broad racial categories. All players who reported their racial identity as White,

and did not select any other racial categories in addition, were grouped into a White category.

Players who reported Black, Hispanic, or both were grouped into a Black and Hispanic category.

Finally, all remaining players were grouped into an Other category. Black and Hispanic Americans are, broadly speaking, disadvantaged by the racialized power structure of the United States, while White

Americans primarily benefit from it. We therefore choose these analytic categories to investigate

the impact of relative advantage and disadvantage on players' experiences with the game.

Data Analysis

Before investigating the study's questions, three preliminary analyses were conducted.

First, we compared the web-recruited players to players recruited through Mechanical

Turk. In order to determine whether the two populations were equivalent, we compared their demographic characteristics using t-tests for the continuous dependent variables (age) and using chi-square tests for the categorical dependent variables (race, gender, community type). We also compared their scores on the pre-tests using a series of ANOVAs with player source as a fixed

factor and pretest scores as the dependent variables. We investigated whether the two groups win

at similar rates using a chi-square, testing player source against whether they won a game.

Finally, we examined whether the two groups score similarly on the test using an ANOVA with

group membership as a fixed independent factor and game score as the dependent variable.
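These demographic comparisons can be sketched with scipy; the arrays below are toy data, not study values:

```python
import numpy as np
from scipy.stats import ttest_ind, chi2_contingency

# Continuous variable (age): independent-samples t-test.
web_ages = np.array([25, 31, 40, 29, 36, 33])
turk_ages = np.array([28, 45, 38, 30, 41, 35])
t_stat, p_age = ttest_ind(web_ages, turk_ages)

# Categorical variable (gender): chi-square on a source x gender crosstab.
crosstab = np.array([[120, 95, 6],    # web: male, female, other
                     [100, 88, 3]])   # Mechanical Turk
chi2, p_gender, dof, expected = chi2_contingency(crosstab)
```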

As described in the results chapter, the web-recruited players and the Mechanical Turk players were not equivalent populations. All subsequent analyses were performed separately for the web-recruited and Mechanical Turk players.

Next, the pre-test scores of completers and non-completers were compared using a series of ANOVAs, with completion status as a fixed independent factor and pre-test scores as the dependent factors. ANOVAs were conducted for the Modern Sexism Scale, the Symbolic Racism Scale, the Systemic Racism Test, and the Systemic Sexism Test.

Finally, we checked the possible impact of the pre-test on post-test performance, by comparing post-test means between players who received the pre-test and players who did not.

To achieve this, we conducted four ANCOVAs, using pre-test group membership as the independent fixed factor, participant race and gender as the covariates (modeled as independent fixed factors), and test scores as the dependent variables as per above. This analysis demonstrates whether the presence of the pre-test influenced the post-test outcomes.

Next, we investigated the questions laid out at the beginning of this chapter.

Q1: Controlling for pre-test scores, are there differences in attribution test

scores for race and gender across the four study conditions?

For research question 1, we examine within-player change. We test the hypothesis that there are score differences between groups on the attribution test, as opposed to the null hypothesis that no such differences exist. These hypotheses were tested using ANCOVAs with

group membership as an independent fixed factor, participant race and gender as covariates

(modeled as independent fixed factors), pre-test scores as a covariate, and post-test scores as the dependent variable. Contrasts are used to test for an overall difference between the three game conditions and the single control condition. This analysis was conducted for both the Systemic

Sexism and Systemic Racism measures.

Q2: Are there associations between in-game measures (such as player score,

number of characters placed, and number of game plays) and attribution test

scores?

For research question 2, we test the hypothesis that there is an association between in-game measures and attribution post-test scores, as opposed to the null hypothesis that no such associations exist. The hypotheses were tested using Spearman's rho, conducting a partial correlation that controls for pre-test score. This analysis was conducted for both the Systemic

Sexism and the Systemic Racism measures.
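A rank-based partial correlation of this kind can be computed by rank-transforming all three variables and then applying the first-order partial correlation formula. This is one standard way to realize "Spearman's rho controlling for a covariate"; the data below are toy values:

```python
import numpy as np
from scipy.stats import rankdata, pearsonr

def spearman_partial(x, y, z):
    """Spearman correlation of x and y, partialling out z."""
    rx, ry, rz = rankdata(x), rankdata(y), rankdata(z)
    r_xy = pearsonr(rx, ry)[0]
    r_xz = pearsonr(rx, rz)[0]
    r_yz = pearsonr(ry, rz)[0]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

score = [10, 30, 20, 50, 40]   # toy in-game measure
post = [2, 5, 3, 9, 7]         # toy post-test score
pre = [1, 2, 4, 6, 5]          # toy pre-test score (controlled for)
rho = spearman_partial(score, post, pre)
```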

As part of understanding the effect of game performance, we also investigate the effect of player race and gender on in-game measures. We test the hypothesis that there are in-game performance differences between players of different races and/or genders, as opposed to the null hypothesis that no such differences exist. These hypotheses were tested using ANCOVAs with participant race and gender as independent fixed factors and game performance measures as the dependent variables.

Q3: For game players, are there differences in measures of systemic

understanding of racism and sexism across bias guess conditions?

For research question 3, we test the hypothesis that there are mean differences in attribution test scores between bias guess conditions, as opposed to the null hypothesis that no such differences exist. These hypotheses were tested using ANCOVAs with bias guess condition as an independent fixed factor, participant race and gender as covariates (modeled as independent fixed factors), pre-test scores as a covariate, and post-test scores as the dependent variable.

Contrasts were used to determine whether there was a difference between players who did and did not make a guess. This analysis was conducted for both the Systemic Sexism and Systemic

Racism measures.

Q4: For game players, are there differences in game performance measures

across bias guess conditions?

For research question 4, we test the hypothesis that there were in-game performance differences between bias guess conditions, as opposed to the null hypothesis that players performed identically across game conditions. These hypotheses were tested using ANCOVAs with bias guess condition as an independent fixed factor, participant race and gender as covariates (modeled as independent fixed factors), and in-game performance measures as the dependent variable. Contrasts were used to determine whether there was a difference between players who did and did not make a guess.

Q5: Controlling for pre-test scores, are there differences in attitude test scores

for race and gender across the four study conditions?

For research question 5, we test the hypothesis that there are mean differences in attitude

test scores between treatment conditions, as opposed to the null hypothesis that there are no

differences between treatment conditions. These hypotheses were tested using ANCOVAs with

group membership as an independent fixed factor, participant race and gender as covariates

(modeled as independent fixed factors), pre-test scores as a covariate, and post-test scores as the dependent variable. Contrasts were used to test for an overall difference between the three game conditions and the single control condition. This analysis was conducted for both the Modern

Sexism and Symbolic Racism measures.

Q6: Are there associations between in-game measures (such as player score,

number of characters placed, and number of game plays) and attitude test

scores?

For research question 6, we test the hypothesis that there is an association between in-game measures and attitude post-test scores, as opposed to the null hypothesis that there is no such association. The hypotheses were tested using Spearman's rho, conducting a partial correlation that controls for pre-test score. This analysis was conducted for both the Modern

Sexism and the Symbolic Racism measures.

Q7: For game players, are there differences in measures of attitudes toward

racism and sexism across conditions?


For research question 7, we test the hypothesis that there are mean differences on the

attitude tests between bias guess conditions, as opposed to the null hypothesis that no such

differences exist. These hypotheses were tested using ANCOVAs with bias guess condition as an

independent fixed factor, participant race and gender as covariates (modeled as independent

fixed factors), pre-test scores as a covariate, and post-test scores as the dependent variable.

Contrasts were used to determine whether there was a difference between players who did and

did not make a guess. This analysis was conducted for both the Modern Sexism and Symbolic

Racism measures.

Conclusion

This project aims to determine the impact of targeted game-play on player conceptions of racism and sexism, and to explore the effect of different reward systems on such conceptions. It looks both at player attributions of racial and gender disparities, and at evidence of players' racist and sexist attitudes. The study described in this chapter allows us to address these questions.

Taken together, our first four hypotheses let us examine whether playing Advance changes players' likelihood of using systemic explanations for racism and sexism. We compare the overall effect of the game to the effect of reading a piece of text, which allows us to assess the overall effectiveness of the game compared to other possible interventions. Next, we investigate the impact of game-play on player attributions. Because we investigate an association between player performance measures and attribution change, we can investigate whether player behavior may be responsible for changes in player attributions. Finally, we look for differences in attribution style and in game performance between players who encountered the three reward systems in the game (informational, financial, and generative) and players who did not encounter

the reward system at all.

Our second set of hypotheses examines the same questions, but looks at player attitudes about race and gender rather than at attribution style.

Using data gathered from this study, we can draw conclusions about the overall effectiveness of the game, and about which reward systems function most effectively at achieving cognitive or affective change around issues of racism and sexism. In the next chapter, we turn to the results of the study and what they mean.

Chapter 5: Results

As described in the previous chapter, data was collected from two separate populations.

One population was recruited online: self-identified game players reached through affinity groups and social networking. The other population was recruited through Amazon's Mechanical Turk microtask service; these subjects were able to select this particular task from a broad range of other tasks and activities. The latter group may have included self-identified game players, but also reached a larger subject pool.

While both groups fit the broad profile of subjects for this study (English-speaking Americans over 18 who play casual games), there may be differences between them. Comparability between these groups was determined before conducting further analyses.

Player Source Analyses

Given that players were drawn from different sources, we first investigated whether there were demographic differences between the samples. The two player sources were the web and Mechanical Turk. Four demographic factors were examined: gender, race, age, and community type.

Possible gender differences were tested using a chi-square analysis of player source by player gender. No significant difference was found between the groups, χ²(1, N = 403) = 1.798, p = .180 (Table J1).¹³

Possible racial identity differences were tested using a chi-square analysis of player source by player race, using the three analytic categories detailed in the previous chapter. Significant differences were found between the two player groups, χ²(2, N = 412) = 10.939, p = .004. 87% of web-recruited players reported themselves as White, while only 77% of Mechanical Turk players did the same (Table 5).

¹³ For all data tables not interpolated in the text, please see Appendix J. A separate table of contents is provided with the appendix to aid in referencing specific tables.

Table 5

Crosstabulation of player source and player race

Player Source      White   Black and Hispanic   Other   Total   χ²       p
Web                194     5                    22      221     10.94a   .004*
Mechanical Turk    148     17                   26      191
Total              342     22                   48      412

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.20.
* p ≤ .005
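The chi-square statistic reported for Table 5 can be recovered directly from the cell counts. The sketch below is plain Python written for this chapter, not the study's own analysis script (the original values were produced by a statistics package); because a 2 × 3 table has df = 2, the chi-square p-value reduces exactly to exp(−χ²/2).

```python
import math

# Observed counts from Table 5: rows = (Web, Mechanical Turk),
# columns = (White, Black and Hispanic, Other).
observed = [[194, 5, 22],
            [148, 17, 26]]

row_totals = [sum(row) for row in observed]        # [221, 191]
col_totals = [sum(col) for col in zip(*observed)]  # [342, 22, 48]
n = sum(row_totals)                                # 412

# Expected count for each cell: (row total * column total) / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (O - E)^2 / E over all cells.
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2-1)*(3-1) = 2
p = math.exp(-chi2 / 2)                            # exact survival function for df = 2

print(round(chi2, 3), df, round(p, 3))             # 10.939 2 0.004
```

The minimum expected count this computation yields (10.20, for Mechanical Turk players reporting as Black or Hispanic) matches the table footnote, confirming that no cell falls below the conventional threshold of 5.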

An ANOVA was conducted on player age, with player source treated as a fixed independent variable. There was a significant effect of player source on player age, F(1, 408) = 4.787, p = .029 (Table J4). As shown in Table J3, the average age of web-recruited players was 32.60 years (SD = 8.73), while the average age of Mechanical Turk players was 34.72 years (SD = 10.93). However, the explanatory power of player source is negligible (η² = .012).
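The η² values reported in this chapter can be checked from the F statistic and its degrees of freedom alone, since η² is the proportion of total variance attributable to the effect. A minimal sketch (plain Python; the function name is mine, not from the study's analysis code):

```python
def eta_squared(f, df_effect, df_error):
    """Proportion of variance explained by an effect in a one-way design.

    eta^2 = SS_effect / SS_total = (F * df_effect) / (F * df_effect + df_error)
    """
    return (f * df_effect) / (f * df_effect + df_error)

# Player-age ANOVA reported above: F(1, 408) = 4.787.
print(round(eta_squared(4.787, 1, 408), 3))   # 0.012

# Symbolic Racism pre-test ANOVA reported below: F(1, 219) = 95.392.
print(round(eta_squared(95.392, 1, 219), 3))  # 0.303
```

Both reported η² values (.012 and .303) are reproduced by this formula.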

Finally, possible differences in community type were tested using a chi-square analysis. There was a significant difference in community type between the two groups, χ²(2, N = 412) = 7.540, p = .023. Subjects in the Mechanical Turk group were more likely to live in rural areas, while subjects in the web group were more likely to live in suburban or urban environments (Table 6).

Table 6

Crosstabulation of player source and living area

Player Source      Rural   Suburban   Urban   Total   χ²      p
Web                17      118        86      221     7.54a   .023*
Mechanical Turk    31      97         63      191
Total              48      215        149     412

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 22.25.
* p ≤ .05

Taken together, the demographic analysis indicates that web-recruited subjects are younger, more likely to be White, and less likely to live in rural areas than the Mechanical Turk subjects. While these demographic differences could influence the factors under investigation, the study also collected direct data about the pre-existing attitudes and attribution styles of both groups of players. Pre-test scores were therefore analyzed to determine whether there were differences in performance between the player groups.

Two attribution style pre-test scores were calculated: the Systemic Sexism score and the Systemic Racism score. For each of these scores, an ANOVA was conducted with the score as the dependent variable and player source as a fixed independent factor.

There was a significant effect of player source on Systemic Sexism pre-test scores, F(1, 219) = 50.803, p < .001, η² = .132 (Table J7). Web-recruited players scored significantly higher than Mechanical Turk players, indicating they are more likely to use systemic explanations for sexism (Table 7).

Table 7

Systemic Sexism pretest means by player source

Player Source      Mean   SD     N
Web                2.89   1.29   122
Mechanical Turk    1.93   1.16   99
Total              2.46   1.32   221

There was a significant effect of player source on Systemic Racism pre-test scores, F(1, 219) = 47.561, p < .001, η² = .178 (Table J9). Web-recruited players scored higher than Mechanical Turk players, indicating a greater likelihood of using systemic explanations for racial disparities (Table 8).

Table 8

Systemic Racism pretest means by player source

Player Source      Mean   SD     N
Web                3.08   1.28   122
Mechanical Turk    1.90   1.25   99
Total              2.55   1.40   221

Two attitude pre-test scores were calculated: the Modern Sexism score and the Symbolic Racism score. For each of these scores, an ANOVA was conducted with the score as the dependent variable and player source as a fixed independent factor.

There was a significant effect of player source on Modern Sexism pre-test scores, F(1, 219) = 45.075, p < .001, η² = .171 (Table J11). Web-recruited subjects scored higher, on average, than the Mechanical Turk subjects (Table 9). This indicates that, prior to the intervention, web-recruited players held less sexist attitudes than Mechanical Turk players did.

Table 9

Modern Sexism pretest means by player source

Player Source      Mean    SD     N
Web                31.82   4.15   122
Mechanical Turk    27.43   5.56   99
Total              29.86   5.29   221

There was a significant effect of player source on Symbolic Racism pre-test scores, F(1, 219) = 95.392, p < .001, η² = .303 (Table J13). Again, web-recruited players scored higher than Mechanical Turk players, indicating that before the study web-recruited players held less racist attitudes than Mechanical Turk players did (Table 10).

Table 10

Symbolic Racism pretest means by player source

Player Source      Mean    SD     N
Web                26.28   4.38   122
Mechanical Turk    20.12   4.99   99
Total              23.52   5.57   221

These results indicate that, prior to the study, the web-recruited players were more likely to use systemic attribution styles, and held less racist and sexist attitudes, than the Mechanical Turk players. The difference between player groups is largest for attitudes about racism, as expressed on the Symbolic Racism test, which shows a medium as opposed to a small effect size. Although the demographics of the two groups might lead one to expect the reverse, it is empirically evident that the web-recruited players are a sample with more liberal attitudes and a greater likelihood of using systemic explanations.

Finally, differences in player performance in the game itself were investigated. The web-recruited group self-identified as game players, while the Mechanical Turk group had the chance to select a game task from among many other tasks. The Mechanical Turk group may have included self-identified game players, but may also have included a broader range of subjects. It is therefore important to verify the nature of the gameplay differences between the groups.

Win frequency and player score were investigated as possible axes of player competence. For win frequency, a chi-square analysis of win frequency by player source was conducted. For player score, an ANOVA was conducted with player score as the dependent variable and player source as a fixed independent factor.

Significant differences in win likelihood were found between the two player source groups, χ²(1, N = 345) = 9.710, p = .002. 93% of web-based players won a game, compared to 83% of Mechanical Turk players (Table 11).

Table 11

Crosstabulation of player source and games won

Player Source      No Wins   Wins   Total   χ²      p
Web                11        171    182     9.71a   .002*
Mechanical Turk    27        136    163
Total              38        307    345

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 17.95.
* p ≤ .05

However, no significant difference in average score between the groups was found, F(1, 226) = 2.384, p = .124 (Tables J15, J16). Because only players who won a game were given a score, this suggests that successful players in both groups performed equally well at the game. However, the Mechanical Turk group contained more players who were unable to master the game well enough to end with a positive score.

In summary, significant differences were found demographically, in players' prior likelihood of using systemic attribution styles, in their previously-held attitudes about race and gender, and in their ability to win games. Broadly speaking, before beginning the study, Mechanical Turk players were less likely to attribute racial or gender disparities to systemic rather than agentic causes; they held more racist and sexist attitudes; and they were less likely to win games, although winners scored comparably to web-based players. Given that this study looks at the impact of gameplay on attribution style and attitudes about racism and sexism, we concluded that these groups differ too substantially to be treated as a single sample. We therefore chose to conduct further analyses on the two groups separately.

Analysis of Web-Recruited Players

Demographics. There were 221 players in this group, including 106 males and 108 females; 7 players selected neither male nor female as their gender identity (Table J17). 194 players reported themselves as White, 5 as Black or Hispanic, and 22 as another racial identification (Table J17). The average age of players was 32.60 years (SD = 8.73) (Table J17).

Mortality and priming. Before drawing conclusions about the study's research questions, we investigated possible mortality and priming effects.

The mortality analysis compared the pre-test scores of completers and non-completers for each of the four tests: Systemic Sexism, Systemic Racism, Modern Sexism, and Symbolic Racism. For each test, an ANOVA was conducted with pre-test score as the dependent variable and completion status as a fixed independent factor. No significant differences were found between completers and non-completers for the Systemic Sexism test (F(1, 243) = .210, p = .647; Tables J19 and J20), the Systemic Racism test (F(1, 243) = 2.489, p = .116; Tables J21 and J22), or the Modern Sexism test (F(1, 243) = .223, p = .637; Tables J23 and J24).

Pre-test score was significantly associated with completion status for the Symbolic Racism test, F(1, 243) = 5.060, p = .025 (Table J26). Completers scored higher than non-completers, indicating less racist attitudes (Table 12). However, this effect accounts for very little of the variance in pre-test scores (η² = .02). We therefore conclude that it does not represent a meaningful mortality effect for this group.

Table 12

Symbolic Racism pretest means by completion status

Completed   Mean    SD     N
No          25.17   4.33   88
Yes         26.42   4.08   157
Total       25.97   4.21   245

To investigate possible priming effects, we compared the post-test scores of players who received a pre-test and players who did not. For each of the four tests (Systemic Sexism, Systemic Racism, Modern Sexism, and Symbolic Racism), an ANOVA was conducted with post-test score as the dependent variable and pre-test group as a fixed independent factor.

There was a significant effect of pre-test group on Systemic Sexism post-test scores, F(1, 218) = 4.3, p = .039 (Table J28). Subjects who received the pre-test scored lower than subjects who did not, indicating a lower likelihood of using systemic explanations (Table 13). While this might be explained as a fatigue effect, we do not consider it a meaningful impact of the pre-test, given the low η² (.019).

Table 13

Systemic Sexism posttest means by pretest group

Pretest Group   Mean   SD     N
No Pretest      3.50   1.24   98
Pretest         3.12   1.42   122
Total           3.29   1.35   220

There was a significant effect of pre-test group on Systemic Racism post-test scores, F(1, 218) = 9.011, p = .003. As with the Systemic Sexism test, subjects who received the pre-test scored lower on the Systemic Racism post-test, indicating a lower likelihood of using systemic explanations (Table 14). This effect shows a more substantial η² (.04).

Table 14

Systemic Racism posttest means by pretest group

Pretest Group   Mean   SD     N
No Pretest      3.56   1.32   98
Pretest         3.00   1.43   122
Total           3.25   1.40   220

No significant effect of pre-test group was found for the Modern Sexism test (F(1, 218) = .698, p = .404; Tables J31 and J32) or the Symbolic Racism test (F(1, 218) = .217, p = .642; Tables J33 and J34).

Taken together, we conclude that the priming effect of the pre-test has a limited impact on this analysis. While significant differences were found for the Systemic Sexism test, the impact was very small (η² of .019), and no significant differences were found for the Modern Sexism and Symbolic Racism tests. However, there was a significant difference for the Systemic Racism test: players who received the pre-test scored lower on the Systemic Racism post-test, with η² of .04.

Attribution type. Advance, the game used in this study, was designed using the PMAD principles described in chapter three, in an attempt to change players' likelihood of using systemic rather than agentic explanations for racial and gender bias. To determine whether Advance successfully affected players, its impact was compared to the impact of reading a text on the same topics modeled by the game. This comparison was conducted by comparing player performance on the attribution post-tests across the four study conditions (control, informational game, financial game, generative game), controlling for pre-test performance.

Q1: Controlling for pre-test scores, are there differences in attribution test scores for race and gender across the four study conditions?

To answer this question, an ANCOVA was conducted for each of the two attribution tests, the Systemic Sexism test and the Systemic Racism test, with treatment condition (control, informational, financial, or generative) as a fixed independent factor and the post-test score for each test as the dependent variable. Pre-test score, player race, and player gender were controlled for, with pre-test score analyzed as a covariate and player race and player gender modeled as fixed independent factors.

For the Systemic Sexism test, there was a significant effect of treatment condition on post-test score, controlling for pre-test score, player race, and player gender, F(3, 101) = 3.695, p = .014 (Table J37). Players performed best in the control condition and worst in the financial condition, with the other two game conditions falling in between (Table 15). An ANCOVA was conducted using a contrast to compare the control condition to the game conditions, which found a significant effect, F(1, 101) = 5.192, p = .025 (Table J38). Players performed significantly better in the control condition than when playing the game, no matter which version of the game they encountered. The control text was more effective than the game at changing players' likelihood of using systemic explanations for gender bias.

Table 15

Systemic Sexism posttest means by treatment condition

Treatment Condition   Mean   SD     N
Control               3.73   1.18   41
Informational         2.92   1.26   25
Financial             2.30   1.41   27
Generative            3.33   1.37   24
Total                 3.15   1.39   117
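The direction of the control-versus-games comparison can be illustrated with an unweighted contrast of the raw cell means in Table 15. This is only a sketch (plain Python, written for this chapter): the ANCOVA contrast reported above additionally adjusts for the pre-test covariate and the demographic factors, and accounts for cell sizes, so its F statistic cannot be reproduced from the table means alone.

```python
# Raw posttest means from Table 15 (not covariate-adjusted).
means = {"Control": 3.73, "Informational": 2.92,
         "Financial": 2.30, "Generative": 3.33}

# Contrast weights: control vs. the average of the three game conditions.
weights = {"Control": 1.0, "Informational": -1/3,
           "Financial": -1/3, "Generative": -1/3}

contrast = sum(weights[c] * means[c] for c in means)
print(round(contrast, 2))   # 0.88
```

The positive contrast value reflects the pattern described above: the control mean exceeds the average of the three game-condition means.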

To understand the meaning of this difference, we conducted t-tests to determine whether there was positive or negative change from pre-test to post-test. For each of the four conditions, a difference score between the pre-test and post-test was calculated. A t-test was conducted on each difference score to determine whether it differed significantly from zero; a difference score of zero would indicate no change from pre-test to post-test.

In the control condition, the pre-post difference score was significantly different from zero, t(42) = 3.478, p < .001. The mean difference score was .5814, indicating that subjects were more likely to use a systemic attribution style for sexism after reading the control text (Table J39).

In the informational condition, the pre-post difference scores were not significantly different from zero, indicating no significant effect of the game (t(24) = .156, p = .880; Table J39). In the financial condition, the pre-post difference scores were not significantly different from zero, indicating no significant effect of the game (t(27) = -1.070, p = .294; Table J39). Finally, in the generative condition, the pre-post difference scores were not significantly different from zero, indicating no significant effect of the game (t(25) = 1.397, p = .175; Table J39).

In other words, the text increased players' likelihood of using systemic explanations for sexism, but none of the game conditions did. However, the explanatory power of treatment condition is small (η² = .049).

For the Systemic Racism test, no significant effect of treatment condition was found, F(3, 101) = 1.549, p = .932 (Tables J40, J41). A t-test was conducted to determine whether pre-post difference scores on this test were significantly different from zero. No significant effect was found, t(122) = -.649, p = .517 (Table J44). We conclude that neither the game nor the text had an impact on players' likelihood of using systemic attributions for racism.

In addition to investigating differences between treatment conditions, we want to know whether there is a relationship between in-game decisions and player attributions. Do more successful players experience more change? Are there specific in-game behaviors that are linked to the game having an impact? Research question two addresses these questions.

Q2: Are there associations between in-game measures (such as player score, number of characters placed, and number of game plays) and attribution test scores?

The following in-game measures were investigated: player score (as a measure of overall player success), total clients placed (as a measure of how often players had to consider character placement issues), total clients placed from the bias group (as a measure of how often players contended with bias), how many attempts it took them to identify the game's bias (as a measure of guessing versus investigating), and how many times the player chose to play (to control for time on task).

For player score, total clients placed, bias-group clients placed, and bias identification attempts, partial correlations were conducted between the game measure and the post-test score on each of the attitude tests, controlling for the influence of the pre-test. For the number of plays, players were separated into two categories: those who played two times (n = 120), and those who played more than twice (n = 2). This was due to the small number of players who played more than twice. ANCOVAs were conducted on the post-test scores with number of plays as an independent fixed factor and the pre-test scores as a covariate.
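A partial correlation controlling for the pre-test removes the pre-test's linear association with both variables before correlating them. A minimal sketch of the standard first-order formula (plain Python; the function names and the example correlation values are illustrative, not taken from the study's data):

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z:
    r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_corr_t(r, n):
    """t statistic for a first-order partial correlation; df = n - 3
    (two variables plus one control variable)."""
    df = n - 3
    return r * math.sqrt(df / (1 - r ** 2)), df

# Hypothetical zero-order correlations: x = game measure, y = post-test,
# z = pre-test (the control variable).
r = partial_corr(r_xy=0.50, r_xz=0.50, r_yz=0.50)
print(round(r, 4))   # 0.3333
```

The df = n − 3 convention is why the degrees of freedom reported below (e.g., r(70), r(99)) are smaller than the corresponding sample sizes.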

For Systemic Sexism, there were no significant correlations between any of the in-game measures and post-test scores, controlling for pre-test scores. For game score, r(70) = -.180, p = .130 (Table J45). For clients placed, r(99) = -.038, p = .706 (Table J45). For bias-group clients placed, r(99) = -.054, p = .590 (Table J45). For number of guesses, r(100) = .157, p = .115 (Table J45). For number of plays, F(1,119) = .056, p = .814 (Tables J46, J47).

For Systemic Racism, there were no significant correlations between any of the in-game measures and post-test scores, controlling for pre-test scores. For game score, r(70) = -.059, p = .621 (Table J48). For clients placed, r(99) = .032, p = .732 (Table J48). For bias-group clients placed, r(99) = -.039, p = .697 (Table J48). For number of guesses, r(100) = .165, p = .097 (Table J48). For number of plays, F(1,119) = 2.888, p = .092 (Tables J49, J50).

We can conclude that there is no relationship between performance on these in-game measures and player attribution style, indicating that in-game skill did not drive the effect.

We also investigated the relationship between player race and gender and the same in-game measures: player score, total clients placed, total clients placed from the bias group, the number of attempts it took to identify the game's bias, and the number of times the player chose to play. ANCOVAs were conducted on each of the first four measures, with player race and gender modeled as independent fixed factors and the game measure as the dependent variable. For number of plays, a chi-square analysis was conducted for player race and gender.

For game score, no significant effect of player race and gender was found, F(2, 114) = .001, p = .999 (Tables J51, J52). For clients placed, no significant effect was found, F(2, 171) = 3.054, p = .055 (Tables J53, J54). For bias-group clients placed, no significant effect was found, F(2, 170) = 1.058, p = .349 (Tables J55, J56). For the number of guess attempts, no significant effect was found, F(2, 171) = .944, p = .391 (Tables J57, J58). No association was found between player race and number of plays, χ²(2, N = 221) = .433, p = .806 (Table J59). No association was found between player gender and number of plays, χ²(2, N = 214) = .001, p = .981 (Table J70).

Finally, we investigate the impact of game type on player attribution style. Among players who received the game intervention, some did not attempt to identify the bias. We compare those players to players who interacted with the bias system in the informational, financial, and generative conditions, to understand the impact of the bias system specifically on players' attributions around racism and sexism. Formally stated, these comparisons derive from research question three.

Q3: For game players, are there differences in measures of systemic understanding of racism and sexism across bias guess conditions?

For each of the attribution style tests (Systemic Sexism and Systemic Racism), we conducted an ANCOVA with bias guess condition (no guess, guess in informational condition, guess in financial condition, guess in generative condition) as an independent fixed factor, pre-test score as a covariate, player race and gender as control variables (modeled as independent fixed factors), and post-test score as the dependent variable.

For the Systemic Sexism test, no significant effect of bias guess group was found, F(3, 59) = 1.496, p = .225 (Tables J61, J63). For the Systemic Racism test, no significant effect of bias guess group was found, F(3, 59) = .866, p = .464 (Tables J65, J66). From our earlier investigation into Systemic Sexism and Systemic Racism difference scores, we know that subjects in the game condition did not change significantly from pre-test to post-test. We therefore conclude that not only were there no differences between bias guess conditions, there was also no change from pre- to post-test across the conditions taken together. None of the bias guess conditions made players either more or less likely to use systemic attributions for either sexism or racism.

In order to understand this result, the impact of bias guess condition on in-game measures was checked. This investigation shows whether players behaved differently across game conditions. If players acted differently in different bias guess conditions, then we can conclude that those behavioral differences in play had no impact on player attribution style. If, on the other hand, players behaved the same no matter which game condition they encountered (and particularly if they behaved the same when they guessed and when they did not guess), then we can conclude that the bias guess system did not successfully affect in-game activity. If players did not respond to the bias guess system by changing their in-game behavior, then we cannot know whether the intended strategic shifts would have impacted player attribution styles.

Q4: For game players, are there differences in game performance measures across bias guess conditions?

ANCOVAs for in-game measures were conducted on game score, the percentage of the score earned from bias-group clients, the number of clients placed, and the number of clients placed from the bias group. For each measure, we used bias guess condition (no guess, guessed in informational condition, guessed in financial condition, guessed in generative condition) as an independent fixed factor, player race and gender as control variables (modeled as independent fixed factors), and the game measure as the dependent variable.

No significant differences were found between bias guess conditions for any of the in-game measures. For game score, F(3, 106) = .066, p = .978 (Tables J67, J69). For the percentage of player score obtained from bias-group members, F(3, 106) = .283, p = .838 (Tables J71, J73). For the total number of clients placed, F(3, 116) = 1.265, p = .290 (Tables J74, J76). For the total clients placed from the bias group, F(3, 116) = .594, p = .620 (Tables J77, J79).

Since no differences in player behavior or outcomes were found between groups, we conclude that the failure to find an impact of bias guess condition is a failure of the design to drive player action differently between groups. This has implications for the PMAD principles on which the game in this study was designed, as will be further discussed in the following chapter.

Attitudes. In addition to investigating the influence of the game on attribution styles, we investigated whether the game can influence player attitudes about racism and sexism. To answer this question, we conduct the same comparison as in research question one, comparing the impact of the game to the impact of a control text, but using attitude measures as our dependent variables and covariates.

Q5: Controlling for pre-test scores, are there differences in attitude test scores for race and gender across the four study conditions?

To answer this question, an ANCOVA was conducted for each of the two attitude tests, the Modern Sexism test and the Symbolic Racism test, with treatment condition (control, informational, financial, or generative) as a fixed independent factor and the post-test score for each test as the dependent variable. Pre-test score, player race, and player gender were controlled for, with pre-test score analyzed as a covariate and player race and player gender modeled as fixed independent factors.

For the Modern Sexism test, no significant effect of treatment condition was found, F(3, 101) = .525, p = .666 (Tables J80, J82). To investigate whether there was an overall effect, a t-test was conducted on the difference score between the Modern Sexism pre-test and post-test. No significant difference was found from pre- to post-test, t(121) = 1.756, p = .082 (Table J83). The game did not change players' attitudes about sexism.

For the Symbolic Racism test, a significant effect of treatment condition was found, F(3, 101) = 2.993, p = .034 (Table J86). Players performed best in the informational game condition and worst in the financial game condition, with the control group and the generative game condition falling in between (Table 16). The η² for the effect of treatment condition was small (.082).

An ANCOVA was conducted using a contrast to determine whether there was a difference between the game conditions, taken together, and the control condition. No significant difference was found, F(1, 101) = .084, p = .773 (Table J87). The differences between conditions do not reflect an underlying game-versus-control difference.

Table 16

Symbolic Racism posttest means by treatment condition

Treatment Condition   Mean    SD     N
Control               23.12   1.98   41
Informational         23.44   2.48   25
Financial             22.78   2.26   27
Generative            22.88   2.15   24
Total                 23.06   2.18   117

To investigate the impact of each condition on player performance, t-tests were conducted for each of the treatment conditions. For each condition, a t-test was conducted on the pre-post difference score for the Symbolic Racism test to see whether it significantly differs from zero.

In the control condition, the difference score was significantly different from zero, t(42) = -6.849, p < .001; the mean difference score was -3.3488 (SD = 3.20627). In the informational condition, the difference score was significantly different from zero, t(24) = -4.313, p < .001; the mean difference score was -2.600 (SD = 3.01386). In the financial condition, the difference score was significantly different from zero, t(27) = -4.571, p < .001; the mean difference score was -2.8571 (SD = 3.30784). Finally, in the generative condition, the difference score was significantly different from zero, t(25) = -6.426, p < .001; the mean difference score was -3.7308 (SD = 2.96051). (See Table J88.)
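Each of these t statistics follows directly from the mean, SD, and N of the difference scores, since a one-sample t-test against zero is t = mean / (SD / √N) with df = N − 1. A plain-Python check against the reported values (written for this chapter; N = 43 in the control condition, matching df = 42):

```python
import math

def one_sample_t(mean, sd, n):
    """One-sample t statistic against zero: t = mean / (sd / sqrt(n)), df = n - 1."""
    return mean / (sd / math.sqrt(n)), n - 1

# Control condition, Symbolic Racism pre-post difference (Table J88 values).
t, df = one_sample_t(-3.3488, 3.20627, 43)
print(round(t, 3), df)           # -6.849 42

# Generative condition: N = 26, df = 25.
t_gen, df_gen = one_sample_t(-3.7308, 2.96051, 26)
print(round(t_gen, 3), df_gen)   # -6.426 25
```

Both reported t values are reproduced exactly from the summary statistics.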

In all four conditions, the mean difference score was significantly different from zero, and was negative, indicating a lower score on the Symbolic Racism test after the study. In other words, experiencing the study caused subjects to express more racist attitudes. Because there was no difference between the game conditions, taken together, and the control, we cannot conclude that the game is what caused players to express more racist attitudes; the control condition had an equally large effect. Rather, we must consider explanations that refer to the study as a whole.

When people are asked to socially perform in ways that indicate racial openness, such as talking about their choice to vote for Obama, they later become more willing to express racist attitudes (Effron, Cameron, & Monin, 2009). This may be because they feel they have "certified" themselves as anti-racist, or because they have exhausted the cognitive resources they use to overcome the racially biased attitudes of our society (Monin & Miller, 2001; Devine, 1989). Exposure to the study itself, in which players are asked to help unravel issues of discrimination, may therefore have caused this effect.

As with attributions, we wanted to know whether there is a relationship between player in-game behavior and the attitude outcome measures. Do more successful players experience more change? Are there specific in-game behaviors that are linked to the game having an impact?

Q6: Are there associations between in-game measures (such as player score, number of characters placed, and number of game plays) and attitude test scores?

The following in-game measures were investigated: player score (as a measure of overall player success), total clients placed (as a measure of how often players had to consider character placement issues), total clients placed from the bias group (as a measure of how often players contended with bias), how many attempts it took them to identify the game's bias (as a measure of guessing versus investigating), and how many times the player chose to play (to control for time on task).

For player score, total clients placed, bias-group clients placed, and bias identification attempts, partial correlations were conducted between the game measure and the post-test score on each of the attitude tests, controlling for the influence of the pre-test. For the number of plays, players were separated into two categories: those who played two times (n = 120), and those who played more than twice (n = 2). This was due to the small number of players who played more than twice. ANCOVAs were conducted on the post-test scores with number of plays as an independent fixed factor and the pre-test scores as a covariate.

For the Modern Sexism test, no significant correlations between the in-game measures and Modern Sexism post-test scores were found, controlling for pre-test score. For game score, r(70) = -.121, p = .313 (Table J89). For clients placed, r(99) = -.021, p = .834 (Table J89). For bias-group clients placed, r(99) = -.011, p = .916 (Table J89). For the number of attempts at bias identification, r(100) = .042, p = .673 (Table J89). Finally, for the number of plays, F(1,119) = .793, p = .375 (Tables J90, J91).

For the Symbolic Racism test, no significant correlations between the in-game measures and Symbolic Racism post-test scores were found, controlling for pre-test score. For game score, r(70) = .081, p = .497 (Table J92). For clients placed, r(99) = -.085, p = .398 (Table J92). For bias-group clients placed, r(99) = .039, p = .700 (Table J92). For identification attempts, r(100) =

-.106, p = .288 (Table J92). Finally, for number of plays, F(1,119) = .406, p = .525 (Table J93,

J94).

Taken together, these data suggest that neither greater skill at the game nor more encounters with the core mechanic, namely placing clients successfully, was related to the outcome measures.

Finally, we revisit the impact of game type, this time looking at player attitudes. Some players who received the game intervention did not attempt to identify the bias. We compare those players to players who interacted with the bias system in the informational, financial, and generative conditions, to understand the impact of the bias system specifically on players' attitudes about racism and sexism.

Q7: For game players, are there differences in measures of attitudes toward racism

and sexism across conditions?

For each of the attitude tests (Modern Sexism and Symbolic Racism), an ANCOVA was conducted with bias guess condition (no guess, guess in informational condition, guess in financial condition, guess in generative condition) as an independent fixed factor, pre-test score as a covariate, player race and gender as covariates (modeled as independent fixed factors), and post-test score as the dependent variable.

For the Modern Sexism test, no significant differences between bias guess conditions were found, F(3,74) = .026, p = .994 (Table J95, J97). For the Symbolic Racism test, no significant differences between bias guess conditions were found, F(3,74) = 1.525, p = .217

(Table J99, J101). Given the previous finding that the game design failed to drive behavioral differences between players, this result is unsurprising. If players interacted with the game's model the same way across conditions, PMAD theory predicts that there will be no differences in

impact on either attribution or attitude.

We conclude that for web-based players, the game did not drive changes in attribution style, though the control text successfully helped players use more systemic attributions for gender disparities in outcome. Neither the game nor the control text impacted player attitudes around gender, while all intervention conditions caused players to express more negative attitudes about race, possibly due to a fatigue effect.

The most important finding is that players did not react differently to the different game conditions in their play behaviors. The PMAD model is driven by players engaging differently with the game's model under different game-mechanical conditions. We did not find any impact of bias identification condition on either attributions or attitudes, but this is because the game failed to evoke differences in player behavior between bias conditions rather than because differing player behavior across bias conditions failed to affect player attributions or attitudes.

Analysis of Mechanical Turk Players

Demographics. There were 191 players in this group: 81 male and 108 female, while 2 did not report their gender (Table J103). 148 reported White as their race, 17 reported Black or Hispanic, and 26 reported another racial category (Table J103). The average age of players in this group was 34.72 (SD = 10.927; Table J104).

Mortality and priming. Before drawing conclusions about the study's research questions, we investigated possible mortality or priming effects.

The mortality analysis compared the pre-test scores of completers and non-completers for each of the four tests: Systemic Sexism, Systemic Racism, Modern Sexism, and Symbolic

Racism. For each test, an ANOVA was conducted with pre-test score as the dependent variable, and completion status as a fixed independent factor. No significant differences were found for the

Systemic Sexism test (F(1,130) = .084, p = .804; Table J105, J106), the Systemic Racism test

(F(1,130) = 1.716, p = .193; Table J107, J108), or the Modern Sexism test (F(1,130) = .023, p

= .880; Table J109, J110).

Pre-test scores differed significantly by completion status for the Symbolic Racism test, F(1,130) = 4.754, p = .031 (Table J112). Non-completers scored higher than completers on the Symbolic Racism pre-test, indicating that they held less racist attitudes (Table 17), with η2 of .035.
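Eta-squared values like this one can be computed from a one-way design's sums of squares as SS_between / SS_total. A minimal pure-Python sketch (the function and data are illustrative, not drawn from the study):

```python
def eta_squared(groups):
    """η² = SS_between / SS_total for a one-way design.

    groups -- list of lists, one inner list of scores per group.
    """
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    ss_total = sum((v - grand) ** 2 for v in all_vals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total
```

If the groups are maximally separated, η² approaches 1; if the group means coincide, η² is 0, which matches the "proportion of variance explained" reading used here.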

Table 17

Symbolic Racism pretest means by completion status

Completed Mean SD N

No 22.28 4.50 32

Yes 20.13 4.96 100

Total 20.65 4.93 132

Possible priming effects were investigated by comparing the post-test scores of players who received a pre-test and players who did not. For each of the four tests (Systemic Sexism,

Systemic Racism, Modern Sexism, and Symbolic Racism), an ANOVA was conducted with post- test score as the dependent variable, and pre-test group as a fixed independent factor.

A significant effect of pre-test condition on post-test score was found for the Systemic

Sexism test, F(1,189) = 19.724, p < .001, η2 = .094 (Table J114). Subjects who received a pre-

test scored lower on the post-test than subjects who did not (Table 18). In other words, subjects

who received the pre-test were less likely to use a systemic explanation than subjects who did

not.

Table 18

Systemic Sexism posttest means by pretest group

Pretest Group Mean SD N

No Pretest 2.53 1.31 92

Pretest 1.71 1.26 99

Total 2.10 1.35 191

A significant effect of pre-test condition was found for the Systemic Racism test,

F(1,189) = 5.520, p = .020, η2 = .028 (Table J116). Again, subjects who received the pre-test

scored lower than subjects who did not (Table 19), indicating that they were less likely to use

systemic explanations.

Table 19

Systemic Racism posttest means by pretest group

Pretest Group Mean SD N

No Pretest 2.37 1.52 92

Pretest 1.89 1.31 99

Total 2.12 1.43 191

A significant effect of pre-test condition was found for the Modern Sexism test, F(1,189)

= 5.718, p = .018, η2 = .029 (Table J118). Subjects who received the pre-test scored lower than

those who did not, indicating more sexist attitudes (Table 20).

Table 20

Modern Sexism posttest means by pretest group

Pretest Group Mean SD N

No Pretest 29.28 4.46 92

Pretest 27.56 5.43 99

Total 28.39 5.05 191

Finally, a significant effect of pre-test condition was found for the Symbolic Racism test,

F(1,189) = 4.473, p = .036, η2 = .023 (Table J120). Subjects who received the pre-test scored

lower than those who did not, indicating more racist attitudes (Table 21).

Table 21

Symbolic Racism posttest means by pretest group

Pretest Group Mean SD N

No Pretest 20.84 2.59 92

Pretest 20.07 2.42 99

Total 20.44 2.52 191

While significant differences were found for all four tests, the impacts were very small for the Systemic Racism, Modern Sexism, and Symbolic Racism tests. We therefore conclude that for these three tests, the priming effect of the pre-test is not a major factor in the analysis. A small effect of pre-test condition was found for the Systemic Sexism test (η2 = .094). Subjects who received a pre-test were less likely to use systemic explanations for sexism at post-test. Since many of the analyses conducted rely on pre-post differences, this finding suggests that the impact of the intervention on Systemic Sexism attribution test scores may be underreported by this study.

Attribution type. As previously discussed, Advance, the game used in this study, was designed using the PMAD principles discussed in chapter three in an attempt to change players' likelihood of using systemic rather than agentic explanations for racial and gender bias. In order to determine whether Advance successfully affected players, its impact is compared to the impact of reading a text on the same topics modeled by the game. This comparison was conducted by comparing player post-test performance on the attribution tests across the four study conditions (control, informational, financial, generative), controlling for pre-test performance.

Q1: Controlling for pre-test scores, are there differences in attribution test

scores for race and gender across the four study conditions?

To answer this question, an ANCOVA was conducted for each of the two attribution tests, the Systemic Sexism test and the Systemic Racism test, with treatment condition (control, informational, financial, or generative) as a fixed independent factor and the post-test score for each test as the dependent factor. Pre-test score, player race, and player gender were controlled for, with pre-test score analyzed as a covariate and player race and player gender modeled as fixed independent factors.
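One way to view this kind of ANCOVA is as a nested-model comparison: a reduced regression of post-test score on the covariates alone versus a full model that also includes condition dummies. A minimal sketch of the resulting F statistic (the function name and example numbers are illustrative, not the study's values):

```python
def nested_model_f(sse_reduced, df_reduced, sse_full, df_full):
    """F statistic comparing a full model to a nested reduced model.

    sse_* -- residual sum of squares of each model
    df_*  -- residual degrees of freedom of each model
    """
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator

# Hypothetical numbers: adding 3 condition dummies drops SSE from 100 to 80,
# with residual df falling from 96 to 93.
f_stat = nested_model_f(sse_reduced=100.0, df_reduced=96,
                        sse_full=80.0, df_full=93)
```

The contrast analyses reported below (game conditions taken together versus control) fit the same template, with the contrast absorbing a single degree of freedom.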

No significant effect of treatment condition on Systemic Sexism post-test score was found, F(3,80) = 2.279, p = .086 (Table J121, Table J122). An ANCOVA was conducted using a contrast to determine whether there was a difference between the game conditions, taken together, and the control condition. No significant effect was found, F(1,80) = .093, p = .761 (Table J123).

A t-test was used to determine whether the intervention, as a whole, changed players' likelihood of using systemic explanations, by comparing the pre-post difference score on the Systemic Sexism test to 0. No significant effect was found, t(98) = -1.551, p = .124 (Table J124).
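This pre-post comparison is a one-sample t-test of the difference scores against zero. A minimal sketch of the statistic (the data are illustrative):

```python
import math

def one_sample_t(diffs):
    """t statistic and df for H0: mean(diffs) == 0."""
    n = len(diffs)
    m = sum(diffs) / n
    var = sum((d - m) ** 2 for d in diffs) / (n - 1)  # sample variance
    se = math.sqrt(var / n)                            # standard error of mean
    return m / se, n - 1

# Hypothetical per-player pre-post difference scores.
t, df = one_sample_t([2, 0, 2, 0])
```

A positive t indicates the group moved upward from pre-test to post-test; the df value (n - 1) is what appears in the t(98) notation used throughout this chapter.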

For the Systemic Racism test, no significant difference between conditions was found,

F(3,80) = .291, p = .832 (Table J125, J126). An ANCOVA was conducted using a contrast to determine whether there was a difference between the game conditions, taken together, and the control group. No significant effect was found, F(1,80) = .546, p = .462 (Table J127).

A t-test was used to determine whether the intervention, as a whole, changed players' likelihood of using systemic attributions for racism. No significant effect was found, t(98) = -.065, p = .949 (Table J128).

For the Mechanical Turk group, therefore, the overall intervention had no effect in either the control condition or any of the game conditions.

As with the web-recruited group, we want to know whether there is a relationship

between in-game decisions and player attributions. Do more successful players experience more

change? Are there specific in-game behaviors that are linked to the game having an impact?

Research question two addresses these differences.

Q2: Are there associations between in-game measures (such as player score,

number of characters placed, and number of game plays) and attribution test

scores?

The following in-game measures were investigated: player score (as a measure of overall player success), total clients placed (as a measure of how often players had to consider character placement issues), total clients placed from the bias group (as a measure of how often players

contended with bias), how many attempts it took them to identify the game's bias (as a measure

of guessing versus investigating), and how many times the player chose to play (to control for

time on task).

For player score, total clients placed, bias-group clients placed, and bias identification attempts, partial correlations were conducted between the game measure and the post-test score on each of the attitude tests, controlling for the influence of the pre-test. For the number of plays, players were separated into two categories: those who played two times (n = 95), and those who played more than twice (n = 4). This was due to the small number of players who played more than twice. ANCOVAs were conducted on the post-test scores with number of plays as an independent fixed factor and the pre-test scores as a covariate.

For Systemic Sexism, there were no significant correlations found between any of the in- game measures and post-test scores, controlling for pre-test scores. For game score, r(55) = .162, p = .227 (Table J129). For clients placed, r(88) = -.083, p = .439 (Table J129). For bias group clients placed, r(88) = -.131, p = .217 (Table J129). For number of guesses, r(88) = -.106, p

= .322 (Table J129). For number of plays, F(1, 98) = 1.744, p = .190 (Table J130, J131).

For Systemic Racism, there were no significant correlations found between any of the in-game measures and post-test scores, controlling for pre-test scores. For game score, r(55) = .057, p = .671 (Table J132). For clients placed, r(88) = .159, p = .134 (Table J132). For bias-group clients placed, r(88) = .077, p = .473 (Table J132). For number of identification attempts, r(88) = .079, p = .459 (Table J132). For number of plays, F(1,96) = .239, p = .626 (Table J133, J134).

We can conclude that there is no relationship between engagement with these specific game activities and player change on the outcome measures.

We also investigated the relationship of player race and gender to these same in-game measures: player score, total clients placed, total clients placed from the bias group, number of bias identification attempts, and number of plays. ANCOVAs were conducted on each of the first four measures, with player race and gender modeled as independent fixed factors and the game measure as the dependent variable. For number of plays, chi-square analyses were conducted for player race and for player gender.
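The chi-square analyses test for association between a categorical player variable and the number-of-plays category via a contingency table. A minimal sketch of the Pearson chi-square statistic (converting it to a p-value requires the chi-square CDF, omitted here; the table values are illustrative):

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / total  # expected count
            stat += (obs - exp) ** 2 / exp
    return stat

# Hypothetical 2x2 table: rows = gender, columns = played twice / more.
stat = chi_square([[20, 0], [0, 20]])
```

A statistic of 0 means observed counts exactly match the independence expectation; larger values indicate stronger association between the row and column variables.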

For game score, no significant effect of player race and gender was found, F(2,98) = .581,

p = .561 (Table J135, J136). For bias-group clients placed, no significant effect of player race

and gender was found, F(2, 169) = .217, p = .805 (Table J140, J141). For number of guesses, no

significant effect of player race and gender was found, F(2,169) = .171, p = .843 (Table J142,

J143). No association was found between player race and number of plays, χ2 (2,191) = 3.169, p

= .205 (Table J144). No association was found between player gender and number of plays, χ2

(2,189) = .010, p = .921 (Table J145).

A significant effect of player race was found for the total number of clients placed,

F(2,169) = 4.259, p = .016, η2 = .048 (Table 22, J137, J138). White players placed the most

clients, while Black and Hispanic players placed the fewest. A second ANCOVA was conducted

using a contrast to determine whether White players performed differently from Black, Hispanic,

and Other players taken together. A significant effect was found, F(1,169) = 8.383, p = .004, η2

= .047 (Table J139). White players placed more clients during the game than Black, Hispanic, and Other players did.

Table 22

Number of clients placed by player race

Player Race Mean SD N

White 26.15 22.02 136

Black and Hispanic 14.41 14.10 17

Other 18.55 14.43 22

Total 24.06 20.06 175

Finally, we investigated the impact of game type on player attribution style for Mechanical Turk players. Some players who received the game intervention did not attempt to identify the bias. Those players were compared to players who did interact with the bias system in the informational, financial, and generative conditions, to understand the impact of the bias system specifically on players' attributions around racism and sexism. These comparisons are investigated with research question three.

Q3: For game players, are there differences in measures of systemic

understanding of racism and sexism across bias guess conditions?

For each of the attribution style tests (Systemic Sexism and Systemic Racism), an

ANCOVA was conducted with bias guess condition (no guess, guess in informational condition,

guess in financial condition, guess in generative condition) as an independent fixed factor, pre-

test score as a covariate, player race and gender as covariates (modeled as independent fixed

factors), and post-test score as the dependent variable.

For Systemic Sexism, no significant difference between bias guess conditions was found,

F(3,54) = 1.542, p = .214 (Table J146, J147). An ANCOVA was conducted using a contrast to

determine whether there was a difference between non-guessers and guessers. No significant effect was found, F(1,54) = .009, p = .925 (Table J148).

Player race was found to be a significant predictive factor, F(2,54) = 3.855, p = .049, η2

= .106 (Table J147). Black and Hispanic players had the highest mean score (Table 23),

indicating the highest likelihood of using systemic explanations for sexism.

Table 23

Systemic Sexism posttest marginal means by race

Player Race Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

White 1.54a 0.15 1.23 1.85

Black and Hispanic 2.73a 0.45 1.83 3.63

Other 1.48a 0.35 0.77 2.19

a. Covariates appearing in the model are evaluated at the following values: Systemic Sexism Pretest Score = 1.9315.

T-tests were conducted for each group to determine whether the means differed from

zero. While the groups did significantly differ from each other, none of the group means

significantly differed from zero. For the White group, t(53) = -1.901, p = .063 (Table J150). For the Black and Hispanic group, t(6) = .918, p = .394 (Table J150). For the Other group, t(11) = -.761, p = .463 (Table J150).

Note that this racial difference appears only when looking at game players; when looking across all four treatment conditions, no differences are found. This suggests that the game may be most effective for Black and Hispanic players in thinking about issues of sexism.

Although there were play differences between White and non-White players, leading us to expect

that there may be differences between these groups in their performance on post-game measures,

it is surprising that this effect appears only for the issue of sexism. The effect may appear for

Systemic Sexism because it was the only test found to have a significant priming effect.

However, other results from this player group suggest that race may also be the dominant

category experienced by players when playing the game. This will be discussed further in the

following chapter.

For the Systemic Racism test, no significant effect of bias guess condition on post-test score was found, F(3,54) = .139, p = .936 (Table J151, J152). An ANCOVA was conducted using a contrast to determine whether there was a difference between non-guessers and guessers. No significant effect was found, F(1,54) = .187, p = .667 (Table J153).

The interaction of player race and gender significantly predicted post-test scores, F(2,54)

= 3.480, p = .038, η2 = .114 (Table 24). To understand the results more deeply, t-tests were conducted on the pre-post difference scores of each group to determine whether the pre-post difference was significantly different from 0. The only group whose pre-post difference scores significantly differed from zero was White women, t(31) = -2.370, p = .024. The mean observed difference score was -.5625 (SD = 1.34254), indicating that White women were less likely to use systemic explanations for racism after playing the game (Table J155).


Table 24

Systemic Racism posttest marginal means by player race and gender

Player Gender Player Race Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

Female White 1.62a 0.21 1.20 2.03

Black and Hispanic 3.06a 0.74 1.58 4.55

Other 2.14a 0.61 0.91 3.36

Male White 2.09a 0.25 1.60 2.58

Black and Hispanic 0.52a 0.74 -0.97 2.01

Other 1.64a 0.45 0.74 2.53

a. Covariates appearing in the model are evaluated at the following values: Systemic Racism Pretest Score = 2.0137.

As with the web-recruited group, to understand this result, the impact of bias guess

condition on in-game measures is investigated to see whether there were conditions that

motivated players to behave differently in the game from their peers. If players behave

differently across conditions, then the lack of performance differences would indicate that

players' behavior was not driving attribution style change. If, on the other hand, players behave

identically across conditions, it suggests that the bias guess condition does not affect player

behavior. Because PMAD theory predicts that players must respond to the anomalous data in the

game in order for their attribution style to be affected, if there is no difference in player

behavior between conditions then we do not expect differences in attribution style either.

Q4: For game players, are there differences in game performance measures

across bias guess conditions?

ANCOVAs were conducted for the following in-game measures: game score, the percentage of the score earned from bias-group clients, the number of clients placed, and the number of clients placed from the bias group. For each measure, bias guess condition (no guess, guessed in informational condition, guessed in financial condition, guessed in generative condition) was used as an independent fixed factor, player race and gender were treated as covariates (modeled as independent fixed factors), and the game measure was the dependent variable.

No significant differences were found between bias guess conditions for any of the in- game measures. For game score, F(3,86) = 2.475, p = .067 (Table J156, J157). For percentage of score earned from the bias group, F(3,86) = 1.158, p = .330 (Table J158, J159). For the total number of clients placed, F(3,114) = .714, p = .546 (Table J161, J162). For the total number of bias-group clients placed, F(3,114) = .735, p = .533 (Table J163, J164).

Since no differences in player behavior or outcomes were found between groups, we conclude that the failure to find an impact of bias guess condition is a failure of the design to drive player action differently between groups.

Since we found racial and gender differences on the outcome measures, we might expect to find an effect of player race or gender on play styles. There was one significant effect of player race and gender on the game measures, for the percentage of score earned from the bias group: F(2,86) = 3.821, p = .026, η2 = .082 (Table J159). Among women, White women earned the highest proportion of their score from the bias group and Other women earned the smallest, while among men, Other men earned the largest and White men the smallest proportion (Table 25).


Table 25

Percentage of score earned from bias group, marginal means by player race and gender

Player Race Player Gender Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

White Female 0.22 0.07 0.09 0.36

Male 0.14 0.08 -0.02 0.30

Black and Hispanic Female 0.11 0.35 -0.58 0.81

Male 0.22 0.19 -0.15 0.59

Other Female 0.09 0.18 -0.27 0.45

Male 0.72 0.17 0.39 1.05

This finding suggests that differential levels of success in earning money from bias-group

clients may drive some of the racial and gender differences previously observed. We conclude

that further investigations of the racial and gender differences present in this game should look

more deeply at differences in the bias group placement process.

Attitudes. In addition to investigating the influence of the game on attribution styles, we investigated whether the game can influence player attitudes about racism and sexism. To answer this question, the same comparison as in research question one was conducted, comparing the impact of the game to the impact of a control text, but using attitude measures as the dependent variables and covariates.

Q5: Controlling for pre-test scores, are there differences in attitude test scores

for race and gender across the four study conditions?

To answer this question, an ANCOVA was conducted for each of the two attitude tests,

the Modern Sexism test and the Symbolic Racism test, with treatment condition (control,

informational, financial, or generative) as a fixed independent factor and the post-test score for

each test as the dependent factor. Pre-test score, player race, and player gender were controlled

for, with pre-test score analyzed as a covariate and player race and player gender modeled as

fixed independent factors.

A significant effect of treatment condition was found on Modern Sexism post-test score,

F(3,80) = 3.868, p = .012, η2 = .127 (Table J166; Table 26). An ANCOVA was conducted using a contrast to determine whether there was a difference between the game conditions, taken together, and the control condition. No significant difference was found, F(1,80) = 2.334, p

= .131 (Table J167).

T-tests were conducted on the Modern Sexism pre-post difference score for each of the four treatment conditions, to determine whether the pre-post change was significantly different from zero. A significant difference was found for the control group, t(25) = 2.461, p = .021 (Table J168). The mean was .8077 (SD = 1.67378), indicating that players showed less evidence of sexist attitudes after reading the control text. A near-significant effect was found for the informational group, t(30) = -2.009, p = .054, with a mean of -.3667 (SD = .99943; Table J168). No significant effect was found for the financial condition, t(23) = 1.030, p = .314 (Table J168), or the generative condition, t(19) = -1.455, p = .163 (Table J168).


Table 26

Modern Sexism posttest marginal means by treatment condition

Treatment Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

Control 28.71a 0.69 27.34 30.09

Informational 26.85a 0.34 26.18 27.51

Financial 28.67a 0.49 27.70 29.64

Generative 27.29a 0.48 26.33 28.25

a. Covariates appearing in the model are evaluated at the following values: Modern Sexism Pretest Score = 27.4343.

Given that the game-versus-control contrast found no significant effect, we cannot simply

interpret this finding to mean that the control group benefited, while there was no change in the

other groups. Rather, we note that while the control group was the only group to significantly

benefit from the intervention, differences between the other game conditions are worth investigating further. Specifically, although the informational condition showed only a near-

significant effect on changing player attitudes, its lower score compared to all other groups

suggests that players may have struggled most with the informational condition. The

informational condition provides no explicit way to counteract the bias present in the game; given that Mechanical Turk players were also less likely to win games than their web-recruited peers, they may have experienced frustration and reacted against it.

For the Symbolic Racism test, no significant effect of treatment condition was found,

F(3,80) = 2.281, p = .086. An ANCOVA was conducted using a contrast to compare the game conditions, taken together, to the control condition. No significant effect was found, F(1,80)

= .314, p = .577.

To determine whether there was an impact of the intervention as a whole, a t-test was

conducted on the pre-post difference score for the Symbolic Racism test, to determine whether

the difference score was significantly different from zero. No significant difference was found,

t(98) = -.142, p = .887 (Table J169, J170).

A significant effect of player race was found, F(2,80) = 3.676, p = .030, η2 = .084 (Table

J170). White players scored highest, followed by Black and Hispanic players, with Other players

scoring the lowest (Table 27).

Table 27

Symbolic Racism posttest marginal means by player race

Player Race Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

White 20.24a 0.18 19.89 20.59

Black and Hispanic 19.65a 0.65 18.36 20.94

Other 18.78a 0.52 17.75 19.82

a. Covariates appearing in the model are evaluated at the following values: Symbolic Racism Pretest Score = 20.1212.

White subjects showed no significant change from pre-test to post-test, t(77) = .940, p

= .350 (Table J173). Black and Hispanic subjects also showed no change, t(7) = -.759, p = .472

(Table J173). Subjects in the Other racial category, however, showed a significant decrease from pre-test to post-test, t(12) = -2.547, p = .026 (Table J173). Their mean difference score was -1.9231 (SD = 2.72218). In other words, the racial attitudes of White, Black, and Hispanic subjects were not affected by any of the conditions of the study, nor were there differences between conditions. Players reporting a racial identity of Other, on the other hand, reported more racist attitudes after the study. However, this effect may be because of the nature of the Symbolic

Racism test, which treats racism as though it were about Black people only.

As with attributions, we want to know whether there is a relationship between player in-game behavior and the attitude outcome measures. Do more successful players experience more change? Are there specific in-game behaviors that are linked to the game having an impact?

Research question six addresses this area.

Q6: Are there associations between in-game measures (such as player score,

number of characters placed, and number of game plays) and attitude test

scores?

The following in-game measures were investigated: player score (as a measure of overall player success), total clients placed (as a measure of how often players had to consider character placement issues), total clients placed from the bias group (as a measure of how often players contended with bias), how many attempts it took them to identify the game's bias (as a measure of guessing versus investigating), and how many times the player chose to play (to control for time on task).

For player score, total clients placed, bias-group clients placed, and bias identification attempts, partial correlations were conducted between the game measure and the post-test score on each of the attitude tests, controlling for the influence of the pre-test. For the number of plays, players were separated into two categories: those who played two times (n = 95), and those who played more than twice (n = 4). This was due to the small number of players who played more than twice. ANCOVAs were conducted on the post-test scores with number of plays as an independent fixed factor and the pre-test scores as a covariate.

For the Modern Sexism test, a significant negative correlation was found between player score and post-test score, controlling for pre-test score, r(55) = -.384, p = .003 (Table J174). A

higher player score was associated with a lower post-test score, indicating that players expressed more sexist attitudes.

No significant effect was found for the total number of clients placed (r(88) = .070, p

= .509; Table J174), the number of clients placed from the bias group (r(88) = .007, p = .948;

Table J174), the number of guesses made (r(88) = -.109, p = .308; Table J174), or the number of plays (F(1,96) = .731, p = .500; Table J175, J176).

For the Symbolic Racism test, no significant correlations were found. For game score, r(55) = .089, p = .511 (Table J177). For clients placed, r(88) = -.118, p = .268 (Table J177). For clients placed from the bias group, r(88) = -.092, p = .387 (Table J177). For bias identification attempts, r(88) = -.002, p = .988 (Table J177). For number of plays, F(1,98) = .001, p = .973

(Table J178, 179).

For Symbolic Racism, these data suggest that neither greater skill at the game nor more encounters with the core mechanic, namely placing clients successfully, was related to the outcome measures. For Modern Sexism, however, greater skill at the game had a negative association with post-test scores: more successful players expressed more sexist attitudes after play. This suggests that players may have been frustrated by their difficulty placing female characters; this will be discussed further in the following chapter.

Finally, we revisit the impact of game type, this time looking at player attitudes. Among players who received the game intervention, some did not attempt to identify the bias. We compare those players to players who interacted with the bias system in the informational, financial, and generative conditions, to understand the impact of the bias system specifically on players' attitudes about racism and sexism.

Q7: For game players, are there differences in measures of attitudes toward racism and sexism across conditions?

For each of the attitude tests (Modern Sexism and Symbolic Racism), an ANCOVA was conducted with bias guess condition (no guess, guess in informational condition, guess in financial condition, guess in generative condition) as an independent fixed factor, pre-test score as a covariate, player race and gender as covariates (modeled as independent fixed factors), and post-test score as the dependent variable.
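The logic of an ANCOVA of this shape can be illustrated in simplified form: residualize the post-test on the pre-test covariate, then ask whether the residuals differ across conditions. This is only a rough stand-in for the full factorial model actually used (which also included race and gender); all data and names below are hypothetical.

```python
import numpy as np
from scipy import stats

def residualize(post, pre):
    """Remove the linear influence of the pre-test from the post-test."""
    X = np.column_stack([np.ones_like(pre), pre])
    return post - X @ np.linalg.lstsq(X, post, rcond=None)[0]

# Hypothetical pre/post scores for three of the guess conditions.
pre  = np.array([27, 28, 26, 29, 27, 28, 26, 27, 29], dtype=float)
post = np.array([28, 27, 27, 30, 26, 29, 25, 28, 30], dtype=float)
cond = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

resid = residualize(post, pre)
# One-way ANOVA on the covariate-adjusted scores across conditions.
F, p = stats.f_oneway(*(resid[cond == c] for c in np.unique(cond)))
```

A full ANCOVA additionally estimates the covariate slope within the same model rather than in a separate step, but the adjusted-comparison intuition is the same.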

For the Modern Sexism test, a significant interaction (at alpha of .025 to account for family effects) was found between bias guess condition and player race, F(6,54) = 2.644, p = .004 (Table J181). Black and Hispanic players performed best in the financial condition, as did Other players, but White players performed best when not guessing (Table 28).

To understand these results more clearly, an ANCOVA was performed using a contrast to compare the no-guessing condition to the guess conditions taken together. No significant effect was found, F(1,54) = .257, p = .614 (Table J183). A second ANCOVA was performed using a contrast to compare White players to all other players. A significant difference between White and non-White players was found at alpha of .025, F(1,54) = 10.277, p = .002 (Table J185, Table 29).

Table 28
Modern Sexism posttest marginal means by guess condition and player race

Guessing Condition    Player Race         Mean    SE     95% CI Lower   95% CI Upper
No Guess              White               27.74   0.24   27.27          28.22
No Guess              Black and Hispanic  27.57   0.61   26.36          28.79
No Guess              Other               27.66   0.51   26.64          28.68
Informational Guess   White               27.06   0.23   26.60          27.51
Informational Guess   Black and Hispanic  26.42   0.61   25.20          27.63
Informational Guess   Other               27.14   0.44   26.27          28.01
Financial Guess       White               27.14   0.24   26.66          27.61
Financial Guess       Black and Hispanic  31.37   1.01   29.34          33.40
Financial Guess       Other               28.54   0.72   27.09          29.98
Generative Guess      White               27.10   0.25   26.59          27.60
Generative Guess      Black and Hispanic  27.96   0.82   26.33          29.60
Generative Guess      Other               27.75   0.51   26.74          28.76

Note. Covariates appearing in the model are evaluated at the following value: Modern Sexism Pretest Score = 27.5068.

For White players only, an ANCOVA was performed with the Modern Sexism post-test score as the dependent variable, the bias guess condition as a fixed independent factor, and the Modern Sexism pre-test score as a covariate. No significant result was found at an alpha of .025, F(1,49) = 2.181, p = .102 (Table J186, J187). For White players, therefore, we treat all game conditions as identical. We conducted a single t-test to determine whether there was any pre-post change, comparing the pre-post difference score to zero. No significant effect was found, t(77) = .441, p = .660 (Table J188). White players showed no change in their attitudes about sexism after playing the game, and this was true across all game conditions.

Table 29
Modern Sexism posttest marginal means by player race

Player Race         Mean    SE     95% CI Lower   95% CI Upper
White               27.26   0.12   27.02          27.50
Black and Hispanic  28.33   0.35   27.63          29.03
Other               27.77   0.27   27.22          28.32

Note. Covariates appearing in the model are evaluated at the following value: Modern Sexism Pretest Score = 27.5068.

For Black, Hispanic, and Other players, an ANCOVA was performed with the Modern Sexism post-test score as the dependent variable, the bias guess condition as a fixed independent factor, and the Modern Sexism pre-test score as a covariate. A significant result was found at alpha of .025, F(1,14) = 5.887, p = .003 (Table J190). There were differences in post-test score between conditions for Black, Hispanic, and Other players (Table 30).

Table 30
Modern Sexism posttest means by guess condition, Black, Hispanic, & Other

Guessing Condition   Mean    SD     N
No Guess             30.60   3.58    5
Informational Guess  22.83   6.46    6
Financial Guess      27.33   3.51    3
Generative Guess     29.80   5.22    5
Total                27.42   5.77   19

T-tests were conducted for each condition to determine whether the Modern Sexism pre-post difference score was significantly different from zero. For the no-guess condition, no significant effect was found, t(4) = .001, p = 1.000 (Table J191). For the informational-guess condition, no significant effect was found, t(5) = -1.000, p = .363 (Table J191). For the financial-guess condition, no significant effect was found, t(2) = 2.646, p = .118 (Table J191). For the generative-guess condition, no significant effect was found, t(4) = .374, p = .372 (Table J191). In other words, while the differences between conditions were significant, none of the individual conditions caused players to change their attitudes about sexism.
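Each of these tests is a one-sample t-test of the pre-post difference score against zero. A minimal sketch with invented pre/post scores (the actual cell data are in Appendix J):

```python
import numpy as np
from scipy import stats

# Hypothetical pre/post attitude scores for one small condition cell (n = 5).
pre  = np.array([31, 28, 33, 27, 30], dtype=float)
post = np.array([30, 29, 34, 26, 31], dtype=float)

# Test whether the mean pre-post change differs from zero.
t_stat, p_value = stats.ttest_1samp(post - pre, popmean=0.0)
```

With cells this small (n = 3 to 6, as in Table 30), such tests have very little power, which is worth keeping in mind when interpreting the null results above.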

For the Symbolic Racism test, a significant interaction (at an alpha of .025) was found between bias guess condition and player race, F(6,54) = 2.567, p = .029 (Table J193). Black and Hispanic players performed best in the generative-guess condition, as did Other players, but White players performed best in the financial-guess condition (Table 31).

To understand these results more clearly, an ANCOVA was performed using a contrast to compare the no-guessing condition to the guess conditions taken together. No significant effect was found, F(1,54) = .289, p = .593 (Table J195). A second ANCOVA was performed using a contrast to compare White players to all other players. A significant difference between White and non-White players was found at alpha of .025, F(1,54) = 9.572, p = .003 (Table J197; Table 32).

For White players only, an ANCOVA was performed with the Symbolic Racism post-test score as the dependent variable, the bias guess condition as a fixed independent factor, and the Symbolic Racism pre-test score as a covariate. No significant result was found at an alpha of .025, F(1,49) = .420, p = .739 (Table J198, J199). For White players, therefore, all game conditions were treated as identical, and a single t-test was conducted to determine whether there was any pre-post change, comparing the pre-post difference score to zero. No significant effect was found, t(77) = .940, p = .350 (Table J200). White players showed no change in their attitudes about racism after playing the game, and this was true across all game conditions.

Table 31
Symbolic Racism posttest marginal means by guess condition and player race

Guessing Condition    Player Race         Mean    SE     95% CI Lower   95% CI Upper
No Guess              White               20.62   0.39   19.84          21.41
No Guess              Black and Hispanic  20.13   1.01   18.12          22.15
No Guess              Other               18.34   0.84   16.66          20.01
Informational Guess   White               19.96   0.38   19.20          20.71
Informational Guess   Black and Hispanic  20.16   1.00   18.16          22.16
Informational Guess   Other               20.07   0.70   18.66          21.48
Financial Guess       White               20.48   0.39   19.70          21.27
Financial Guess       Black and Hispanic  15.62   1.66   12.29          18.96
Financial Guess       Other               17.05   1.19   14.67          19.44
Generative Guess      White               20.24   0.42   19.40          21.07
Generative Guess      Black and Hispanic  20.85   1.38   18.09          23.61
Generative Guess      Other               20.17   0.84   18.49          21.85

Note. Covariates appearing in the model are evaluated at the following value: Symbolic Racism Pretest Score = 20.4521.

For Black, Hispanic, and Other players, an ANCOVA was performed with the Symbolic Racism post-test score as the dependent variable, the bias guess condition as a fixed independent factor, and the Symbolic Racism pre-test score as a covariate. A significant result was found at alpha of .025, F(1,14) = 8.164, p = .002 (Table J202). There were differences in post-test score between conditions for Black, Hispanic, and Other players (Table 33).

Table 32
Symbolic Racism marginal means by player race

Player Race         Mean    SE     95% CI Lower   95% CI Upper
White               20.33   0.20   19.93          20.72
Black and Hispanic  19.19   0.58   18.04          20.35
Other               18.91   0.45   18.00          19.81

Note. Covariates appearing in the model are evaluated at the following value: Symbolic Racism Pretest Score = 20.4521.

T-tests were conducted for each condition to determine whether the Symbolic Racism pre-post difference score was significantly different from zero. For the no-guess condition, a significant effect was found, t(4) = -4.000, p = .016 (Table J203). The mean difference score was -3.2000 (SD = 1.78885), indicating that Black, Hispanic, and Other subjects expressed more racist attitudes after the game in this condition.

Table 33
Symbolic Racism posttest means by guess condition, Black, Hispanic, & Other

Guessing Condition   Mean    SD     N
No Guess             20.00   1.87    5
Informational Guess  19.83   2.79    6
Financial Guess      15.00   1.00    3
Generative Guess     21.80   1.92    5
Total                19.63   2.97   19

For the informational-guess condition, no significant effect was found, t(5) = .104, p = .921 (Table J203). For the financial-guess condition, no significant effect was found, t(2) = -1.732, p = .225 (Table J203). For the generative-guess condition, no significant effect was found, t(4) = -2.449, p = .070 (Table J203).

In summary, White players were not affected by any of the treatment conditions, and there were no differences between treatment conditions for White players. All differences found on these measures were for Black, Hispanic, and Other players. For the Modern Sexism test, there were differences in attitudes about sexism between conditions, but there was no significant change from the pre-test. For the Symbolic Racism test, Black, Hispanic, and Other players expressed more racist attitudes when they did not guess, while there was no significant change from the pre-test in the other conditions.

We conclude that Black, Hispanic, and Other players were most emotionally sensitive to the differences between guess conditions. However, the only overall change induced by the game was to make players who did not guess express more racist attitudes. Possible interpretations will be further discussed in the next chapter.

Analysis by Player Group

Broadly speaking, the web-recruited group was unaffected by the intervention. There was a small positive effect of the control text on players' likelihood of using systemic attributions for sexism, but no effect of the game. Players expressed more racist attitudes after the intervention, with some differences between conditions, but there was no difference between the impact of the game and that of the control text. Rather, engaging with the issue at all may have made players more likely to express racist attitudes because of a moral justification effect (Monin & Miller, 2001).

We might have expected these self-identified gamers to be sensitive to the differences between the game conditions, but there was no effect for the four bias-group conditions on either attitude or attribution measures. We attribute this to players' not actually playing differently across game conditions, as evidenced by the lack of significant differences in game measures. For this group of players, the rewards offered by the different game conditions were not effective at inducing behavior change in play.

What this suggests is that if we hope to have an effect on the web-recruited player group, and players like them, we will need to go back to the PMAD principles and consider ways to make the design differences between conditions more salient to the player, as well as ways to make sure that they drive differences in play behavior. From there, it would be possible to investigate whether such play differences would change conceptual models about discrimination. As discussed in the next chapter, this will likely mean modifying the principles themselves.

The paucity of significant results for the web-based player group may also be explained by the small number of non-White subjects; we may not be picking up the race-based effects visible in the other groups. However, it may also be that there is a ceiling effect. The web-recruited group had higher pre-test scores on all four pre-tests, indicating that they were more likely to use systemic attributions for racism and sexism, and that they held less racist and sexist attitudes.

We believe that the web-recruited group, who are self-identified gamers with a significant online presence in the hobby, may be somewhat atypical for their demographic. Gaming culture is quite racist and sexist in both subtle and overt ways; players willing to participate in a study addressing bias may not be particularly representative of gaming culture as a whole. Mechanical Turk players, on the other hand, may be more typical of their demographic. Although the Mechanical Turk population may also include self-identified gamers, this population is more representative in its racial makeup than the self-identified gamers recruited through affinity networks.

The results for the Mechanical Turk group are primarily driven by racial differences, and by interactions of condition or gender with race. Although the effect size was modest, the game may be most effective for Black and Hispanic players in changing attribution styles. In terms of changing attitudes, Other players expressed more racist attitudes after the study, though this may be an artifact of their exclusion from the Symbolic Racism measure. Players in this group also demonstrated differences in client placement rates by player race, with White players placing more clients than Black, Hispanic, and Other players. These play differences may drive the differences in outcomes.

Although they were not recruited as gamers per se, this group also seemed more sensitive to differences between game conditions. There were differences in the percentage of money earned from members of the discriminated-against group between the four guess conditions, indicating that players responded to the guess condition with changes in play style. Perhaps as a result of the changed play behaviors, there were differences between bias guess conditions for the attitude tests. For both attitude tests, White players were unaffected by the bias guess condition, while Black, Hispanic, and Other players were sensitive to it. While the patterns of difference between conditions were different for racism and sexism, the fact that non-White players showed sensitivity to the difference between conditions is something we can build on.

This suggests that future design research regarding rewards should focus on working with people of non-White racial backgrounds, including those who are not self-identified gamers. The game could serve as an intervention to change their attitudes and attribution styles around sexism, an issue for people of all racial backgrounds, but it would also be helpful to make change on the racism front. While in the long run changing the minds of White players would be useful because of their comparatively dominant position in society, internalized racism is an issue that can be addressed with non-White players; giving these players a systemic perspective on racism can, as discussed in chapters one and two, help them articulate challenges to the dominant narratives of race in American society.

The implications of these results will be further discussed in the final chapter of this dissertation.

Chapter 6: Summary and Discussion

Project Summary

This dissertation proposed a specific theoretical mechanism through which games might be able to expand Americans' models of how racism and sexism function, and to change their attitudes about racism and sexism directly. It defined the theory behind this mechanism, the “Playable Model – Anomalous Data” (PMAD) approach, and laid out design principles for creating games of this type.

Advance, a web-based PMAD game addressing issues of systemic racial and gender bias, was used to empirically test the PMAD approach. Advance was designed specifically for this project, using the PMAD principles, and developed using Flex and Google Apps. The study was deployed online. Subjects received either a control text or one of three versions of the Advance game, and data was collected about subjects' attribution styles and attitudes around racism and sexism before and after the intervention. Subjects in the three game conditions also had data about their in-game performance collected. Finally, demographic data was collected for all subjects.

Using this data, we were able to address the following research questions. Can a PMAD game that models systemic bias change the likelihood of players using systemic rather than agentic explanations for sexism and racism? How does the game perform compared to a control group, and are there differences between different versions of the game? Might it also change players' attitudes about bias? And how do attribution changes and attitude changes relate to player performance in the game?

This chapter sums up the findings presented in the previous chapter, discusses them in more depth, and considers the larger implications of the study.

Literature

Popular rhetorics of racism and sexism in American society, such as color-blindness and choice feminism, implicitly rely on an individualistic, agentic approach (Bidell, Lee, Bouchie, Ward, & Brass, 1994; Hughes & Tuch, 2000; Bonilla-Silva & Forman, 2000; Bonilla-Silva, 2006; Hirshman, 2007). Just as these models propose that the ultimate fault for incidents of racism and sexism lies with the intentional behavior of a single individual, these theories make individuals independently responsible for disparities in outcomes between groups.

This approach inadequately explains the realities of racism and sexism in modern America. Americans are increasingly unlikely to admit to explicitly sexist or racist attitudes, or to admit to behaving in intentionally racist or sexist ways (Blanchard, Lilly, & Vaughn, 1991; Klonis, Plant, & Devine, 2005). At the same time, significant racial and gender disparities persist in housing, healthcare, employment, and beyond (e.g. Hausmann, Tyson, & Zahidi, 2010; Kozol, 1992; Lipsitz, 1995; Valian, 1999; Wenneras & Wold, 1997). These disparities are not enacted by independent, intentional actions of individuals, as the agentic rhetorics would have it; they are produced by complex systems of bias and privilege. The impact of these systems comes not from single intentional incidents, but from patterns of behavior over time, feedback systems, and emergent effects (Gomez & Wilson, 2006; Adams et al., 2007; Feagin, 2006; Schmidt, 2005).

It is important to help Americans adopt a systemic understanding of racism and sexism for two related reasons. First, agentic explanations are insufficient to explain the mechanisms of racial and gender discrimination in modern American society. Americans who rely purely on their naïve conceptions or on popular rhetorics, both of which are agentically oriented, will be less likely to support the systemic remedies needed to address structural discrimination, since the proposed remedies do not match their mental models of how discrimination functions (Iyengar, 1989; Iyengar, 1994; Hughes & Tuch, 2000; Lau & Sears, 1981). Conversely, the agentically oriented remedies they may be more willing to support are unlikely to help with systemic problems. Second, individualistic models of racism and sexism are often used to defend a racist, sexist status quo (Bonilla-Silva & Forman, 2000; Bonilla-Silva, 2006; Hirshman, 2007). Adopting a systemic approach to racial and gender bias can help dismantle and neutralize common rhetorical moves.

In order to change players' models of racism and sexism, we turn to two different bodies of literature. First, we examine the existing literature on prejudice-reduction interventions. Entertainment-based anti-bias interventions use popular media forms, like radio or television shows, to reduce prejudice among media consumers. Such interventions have been demonstrated to be effective in the field, and they have the potential to reach broad populations; however, they are poorly theorized (Paluck & Green, 2009). Building a testable theory for entertainment-based prejudice-reduction interventions would be a significant contribution to the literature.

To build the theory itself, we turn to the literature on conceptual change. One method for helping learners adopt a new mental model of a complex process is to confront them with anomalous data (Chinn & Brewer, 1993; Hewson & Thorley, 1989). Anomalous data interventions have been demonstrated to produce conceptual change in the classroom, but they have primarily been tested with academic subjects (e.g. Watson & Konicek, 1990). The question is how to apply the anomalous data approach to models of racism and sexism in an entertainment context.

For an answer, we turn to the field of “games for change.” Games for change use the medium of games to change players' ideas, attitudes, or behavior about social issues (Sawyer, 2010). Like entertainment-based prejudice-reduction interventions, games for change operate on two levels at once. For their designers, these games are ways of changing attitudes or behavior; for players, they are one form of entertainment media. By deeply integrating theories of learning into the core mechanics of the game, these two approaches can be unified (Isbister, Flanagan, & Hash, 2010; Plass, Homer, Kinzer, Frye, & Perlin, 2011).

Games, of course, can take many different forms, from leapfrog to League of Legends (Riot Games, 2009), and not all games support all theories of learning equally well. Within many games, players explore rulesets and test hypotheses about how to use those rulesets most effectively to achieve their goals within the game context. Anomalous data – experiences that indicate that their models of the game are incorrect – become tools for better understanding the game. We frame games of this type as “playable models.” When the model of a game is based on the target learning domain, playable model games seem to be a good fit for an anomalous data approach.

Design

Taken together, these theoretical approaches give us “Playable Model – Anomalous Data,” or PMAD, design theory. This theory provides a testable approach to creating games that help players use systemic rather than agentic explanations.

PMAD theory defines an approach to building games, but it does not entirely specify the type of game that is created. For example, PMAD theory assumes that the model being built in the game is domain-specific; even within a domain-specific model, different aspects of the model can be emphasized or simplified, and different game experiences can be used to provoke the encounter with anomalous data. The PMAD design principles provide a general approach to creating a playable model and maximizing the encounter with anomalous data.

The PMAD design principles are as follows:

• The game system models the relevant domain.

• Player actions affect, and are affected by, the model.

• Players receive feedback about the impacts of their actions as they relate to the model.

• The game goals point players toward model conflict.

• Players can experiment with the game's model.

• Players must figure out rules and strategies for themselves.

For this dissertation, PMAD theory was used to create Advance, a game about systemic bias around race and gender14. Advance models the impact of microaggressions and of perceptual bias in the workplace (Sue, 2010; Sue & Capodilupo, 2008; Valian, 1999). Both rely on systemic rather than individual processes – repeated patterns and feedback systems – to have their most significant impacts (Sue, 2010; Valian, 1999), and both have significant negative outcomes for women and people of color (e.g. Harrell, Hall, & Taliaferro, 2003; Cortina & Kubiak, 2006; Steele, 1997; Bertrand & Mullainathan, 2003; Steinpreis, 1999).

In Advance, the player takes the role of a corporate recruiter. The player earns money by placing clients into jobs; the player has a slate of clients to select from, and can also select from multiple jobs within the organization they are working with. By clicking on a client, the player can see information about that client, including their race, their gender, and their stats. By clicking on a job, the player can see the stat requirements for that job and its salary. When the player chooses both a client and a job, the game provides further feedback on how successful the client will be in that job, and on whether that client meets the job's requirements. If the client does not meet the job's requirements, the player can either spend money to upgrade the client's stats, select another client for the same job, or try another job for the same client.

14 Advance can be accessed and played at http://www.replayable.net/advance/.

Not all clients are equally easy to place. Each time the game is played, a discriminated- against group is randomly selected. For this group, some jobs are more likely to expose them to microaggressions, which interfere with their on-the-job success. They also need to be more qualified than a member of the dominant group would be to get the same job. By trying different clients in the same job, or different jobs for the same client, players can compare the situations of clients from dominant and non-dominant groups.

Upon successfully placing a client, the player earns money. The player also earns money when a client is promoted to the next level of the game; more successful clients are promoted faster, while truly unsuccessful clients get fired. Players benefit from money because it allows them to upgrade characters, and money also serves as their final score if they can keep their company afloat for the entire length of the game. However, they also must pay their business expenses using this money; if they do not have enough money to pay their expenses, they go bankrupt and lose the game.
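The core economic loop described above can be sketched as a toy model. This is purely illustrative; it is not the actual Flex implementation of Advance, and all function names, thresholds, and dollar amounts are invented for the example.

```python
# Toy sketch of Advance's core loop: clients have a skill level, jobs have a
# requirement, and clients from the randomly chosen bias group must exceed
# the requirement by an extra margin to be placed.

def can_place(skill, requirement, in_bias_group, bias_penalty=2):
    needed = requirement + (bias_penalty if in_bias_group else 0)
    return skill >= needed

def play_round(clients, requirement=5, starting_money=100,
               placement_fee=20, expenses=30):
    """Place each client if possible, collect fees, then pay expenses.
    Returns the remaining money; going negative would mean bankruptcy."""
    money = starting_money
    for skill, in_bias_group in clients:
        if can_place(skill, requirement, in_bias_group):
            money += placement_fee
    return money - expenses

# At equal skill, only the bias-group client is rejected: the anomalous
# data the player can encounter by comparing otherwise identical clients.
clients = [(6, False), (6, True), (8, True), (4, False)]
final_money = play_round(clients)
```

Even this toy version captures the key PMAD property: the bias is visible only through comparison across repeated placements, not in any single placement.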

Three versions of the game were created based on common design patterns (Bjork & Holopainen, 2004). In each, players are explicitly asked to identify who is being discriminated against, and are offered a different reward for successfully doing so. The first version of the game is informational; with a correct guess, players have their suppositions about the bias in the system confirmed. The second version offers players a financial reward for getting the guess right. Finally, the third version of the game changes the rules to offer players a greater benefit from placing members of the disadvantaged group. In addition to looking at the overall impact of PMAD design, these three conditions allow us to examine whether players respond differently to different design patterns for reward systems in a PMAD context.

Research Questions and Results

Broadly speaking, the study asks whether a PMAD game that models systemic bias can change the likelihood of players using systemic rather than agentic explanations; whether it can change players' attitudes about racial and gender bias; and what the relationship is between in-game behavior and player change.

A total of 412 players were recruited to participate in this study. One group of players was recruited online based on their participation in online gaming. A second group was recruited using Amazon's microtask service, Mechanical Turk. As the two groups were not comparable, results for the two groups are described separately.

In order to examine the overall impact of the game, players were randomly assigned to one of four treatment conditions: three game conditions or a text-based control group. Pre- and post-test data on player attributions and attitudes around racism and sexism were collected.

Using this data, we were able to answer the research questions noted in chapter four and restated below with a summary of the findings. The findings summarized under each question are further discussed later in this chapter.

Q1: Controlling for pre-test scores, are there differences in attribution test scores for race and gender across the four study conditions?

This question looks across the four treatment conditions: the control group, who received a text-based intervention; the informational game group, who could confirm their hypotheses about the bias in the game; the financial game group, who received a monetary reward for correctly identifying the bias; and the generative game group, who received more money for placing bias-group clients after a successful bias identification.

For web-based players, subjects in the control condition were more likely to use systemic explanations for sexism after the intervention; their scores were significantly different from the game conditions taken together. No differences were observed between conditions for any of the other measures. For Mechanical Turk players, no significant differences between conditions were found.

Q2: Are there associations between in-game measures (such as player score, number of characters placed, and number of game plays) and attribution test scores?

No associations were found for either group. However, for the Mechanical Turk players, there was an effect of player race on the number of characters placed.

Q3: For game players, are there differences in measures of systemic understanding of racism and sexism across bias guess conditions?

This question looks across the four bias guess conditions: players who did not attempt to guess the bias, players who attempted a guess in the informational game condition, players who attempted a guess in the financial game condition, and players who attempted a guess in the generative game condition.

For web-recruited players, no significant differences were found between the bias guess conditions. For Mechanical Turk players, no significant differences were found between the bias guess conditions for White players; however, differences were found between bias guess conditions for non-White players. For detailed results, see chapter five; the meaning of this finding is discussed later in this chapter.

Q4: For game players, are there differences in game performance measures across bias guess conditions?

For the web-recruited players, no differences were found in game performance measures across bias guess conditions. This was also the case for the Mechanical Turk players. However, there was an effect of player race and gender on the percentage of money earned from clients who were being discriminated against.

Q5: Controlling for pre-test scores, are there differences in attitude test scores for race and gender across the four study conditions?

For web-recruited players, no significant differences were found between treatment conditions. However, for Mechanical Turk players, a significant difference in attitudes toward sexism was found between the four treatment conditions. Subjects in the control condition expressed less sexist attitudes after the intervention; although none of the game conditions caused a significant change in player attitudes, there were still significant differences between them. See chapter five for more details.

For both web-recruited and Mechanical Turk players, the entire intervention – including the control group – elicited more racist attitudes at post-test. Although there were no differences in this result between treatment conditions, these findings are discussed later in this chapter.

Q6: Are there associations between in-game measures (such as player score, number of characters placed, and number of game plays) and attitude test scores?

For the web-recruited players, no significant associations were found between in-game measures and attitude test scores. However, for the Mechanical Turk players, a significant negative correlation was found between player score and the Modern Sexism post-test score; a higher player score was associated with a lower post-test score, indicating that more successful players expressed more sexist attitudes at post-test.

Q7: For game players, are there differences in measures of attitudes toward racism and sexism across conditions?

For the web-recruited players, no significant differences were found between the bias guess conditions. For the Mechanical Turk players, no significant differences were found between guess conditions for White players. However, significant interactions were found between player race and treatment condition for non-White players on both the racism and sexism attitude measures. Detailed results can be found in chapter five; the meaning of this finding is discussed at greater length later in this chapter.

The results described above are summarized in Table 34. To better understand these results, we cluster them into the following three broad categories: treatment condition effects, guess condition effects for web-recruited and White Mechanical Turk players, and guess condition effects for non-White Mechanical Turk players.


Table 34

Summary of study findings

| Question | Sexism – Web | Racism – Web | Sexism – MT | Racism – MT |
| --- | --- | --- | --- | --- |
| Q1: Attribution differences between treatment conditions | Control increased chance of systemic attributions | No effect | No effect | No effect |
| Q2: Attribution correlations with game data | No effect | No effect | No effect (race effect on game data) | No effect (race effect on game data) |
| Q3: Attribution differences between bias guess conditions | No effect | No effect | Non-White players outperform White players | Race × gender interaction |
| Q4: Game data differences between bias guess conditions | No effect (game data) | No effect (game data) | Race × gender interaction (game data) | Race × gender interaction (game data) |
| Q5: Attitude differences between treatment conditions | No effect | All conditions increased racist attitudes | Control decreased sexist attitudes | All conditions increased racist attitudes |
| Q6: Attitude correlations with game data | No effect | No effect | Negative correlation with player score | No effect |
| Q7: Attitude differences between bias guess conditions | No effect | No effect | Differences found for non-White players | Differences found for non-White players |

Results cluster one: treatment condition effects. This cluster of results deals with the differences between the four treatment conditions – the control text and the three versions of the game.

When dealing with issues of sexism, effects were found for the control text for both the web-recruited and Mechanical Turk players. For web-recruited players, the control text increased their likelihood of using systemic explanations for sexism, while the game, taken as a whole, did not. For Mechanical Turk players, subjects in the control group expressed less sexist attitudes after the intervention. While subjects in the other three game conditions showed no pre-post change, the differences between game conditions were still significant. In both cases, the effect appeared only for sexism and not for racism – a difference discussed in the next section of this chapter.

When dealing with issues of racism, there were no significant differences between treatment conditions. However, the study as a whole caused players in both groups to express more racist attitudes at post-test than at pre-test. This effect was found for all web-based players and for Mechanical Turk players reporting a racial identity of Other.

We cannot conclude that this was an effect of the game, because the control text had the same effect on study participants as the game did. Rather, we conclude that when it comes to racism, something about the study itself caused all web-recruited subjects and some Mechanical Turk subjects to express more racist attitudes. We consider possible explanations for this finding, and discuss its implications, later in this chapter.

Results cluster two: bias guess effects for web-recruited players and for White Mechanical Turk players. This cluster of results deals with differences between the four bias guess conditions. Among subjects who received a game condition, we look at differences between players who did not make a guess about the bias and players who made such a guess in each of the three game conditions.

Web-recruited players displayed the same patterns as White Mechanical Turk players. Although there were significant pre-existing differences between the two groups, we consider their results together for clarity.

For these two groups of players, no differences were found between bias guess conditions on any of the in-game or outcome measures. We discuss the implications of this null finding later in this chapter.

Results cluster three: bias guess effects for non-White Mechanical Turk players. These results deal with differences between the four bias guess conditions. Among subjects who played the game, we look at differences between players who did not make a guess about the bias and players who did make such a guess in each of the three game conditions.

For non-White players from the Mechanical Turk group, differences were found between bias guess conditions for both the attribution and attitude measures. As described in the previous chapter, non-White players were significantly more likely to use systemic explanations for sexism after the game than White players were, although neither group showed significant change from their pre-test performance. Non-White players also showed significant differences between guess conditions for both the Modern Sexism and Symbolic Racism measures. Although for the majority of guess conditions there was no significant change in attitude from their performance at pre-test, non-White players scored significantly lower on the Symbolic Racism test after playing a game in which they did not make a guess. Possible explanations for this finding are discussed later in this chapter.

A more complex pattern was found for the Systemic Racism test, where there was an interaction between race and gender. Among women, White women were less likely to use systemic explanations for racism than their non-White peers at post-test, and significantly decreased their likelihood of using such explanations from their pre-test performance; among men, White men were more likely to use systemic explanations for racism than non-White players, although none of the groups showed a significant change from their pre-test performance.

In-game performance differences aligned with these results. White women earned a higher proportion of their score from members of the discriminated-against group than non-White women did, while Black, Hispanic, and Other men all earned a higher proportion of their score from bias group members than White men did.

Additionally, non-White players placed fewer characters during the game than White players did – but with no significant difference between scores. This difference suggests that non-White players from the Mechanical Turk group relied on quality, rather than quantity, of placements in order to succeed in the game.

These results taken together suggest that non-White players may be more sensitive to the effects of the game than White players, and that the race-gender interactions found here require further investigation. These findings will be further discussed later in this chapter.

Discussion

The three clusters of findings described above raise several complex issues that need further investigation. We now return to each of the three clusters, looking into the issues they raise in greater depth.

Results cluster one: differences between control and game. The pattern of differences between the control and the game conditions raises several questions. First, why did the control text have an impact on sexism, but not racism? Second, why did the intervention as a whole make players express racist attitudes, and what might that imply? Finally, what do the results, and particularly the differences between racism and sexism, mean for the project as a whole? We deal with each of these questions in turn.

As described above, for both player groups, the control text performed significantly better than the game at getting players to change their attribution styles (for web-based players) and attitudes (for Mechanical Turk players) about sexism. However, there were no differences between control and game for racism for either group. Why did these differences between sexism and racism appear?

It is possible that the control text conveyed ideas about sexism more effectively than it conveyed ideas about racism. However, the text was constructed to alternate between examples using race and examples using gender for each of the concepts explained. Given that there were also differences between player groups in which measures were affected by the control text, we propose that a more likely explanation has to do with underlying differences in players' prior conceptions of racism and sexism. As we will see, that explanation is consistent with other results found in this section.

While there were no differences between control and game conditions for racism, the intervention as a whole caused players to express more racist attitudes after completing the study. We therefore ask why the study had this effect.

First, we dismiss the hypothesis that playing the game itself caused players to express more racist attitudes, because the control condition produced the same shift at post-test. Reading text about racism had precisely the same effect as playing the game. The activity itself, regardless of format, had the same effect on study participants.

The existing literature on racism notes that people's attitudes about racism do not always match what they consider to be socially appropriate. As with other social desirability effects, people want to conform to what they think is appropriate behavior (Holtgraves, 2004; Uziel, 2010). However, social norms about the expression of racism are strong, so having this discrepancy revealed can be unusually stressful (Devine, 1989; Fiske, 1998; Crocker, Major & Steele, 1998).

One hypothesis is that maintaining an appropriate set of beliefs is a cognitively exhausting task. It takes effort for a person to avoid expressing or acting on underlying prejudices that they cannot entirely eliminate (Devine, 1989; Monteith, 1993). When a person's willpower is exhausted, or under circumstances of high cognitive load, underlying attitudes will be expressed (Macrae, Bodenhausen, Milne, & Jetten, 1994). However, in this case we might expect to see a difference between the control text and the game. There is no reason to believe that reading a text is as hard as learning a new game and playing it multiple times. While the cognitive exhaustion theory makes sense as stated in the literature, it may not fully explain what is happening here.

Instead, we turn to the concept of “moral credentials” (Monin & Miller, 2001). When people have a chance to establish themselves as non-racist, they are more likely to act in ways that could possibly be interpreted as racist, because they feel they have proven that racism could not be the explanation for their actions. For example, Effron, Cameron, and Monin (2009) found that subjects were more likely to describe a job as suited for Whites if they had just endorsed Obama for president, compared to endorsing a White Democrat for president or identifying Obama without endorsing him.

We propose that taking part in this study served as a “moral credential” for subjects, who then felt more comfortable expressing ambiguously racist attitudes at post-test. In other words, the study did not cause players to become more racist. Rather, it made them more likely to express their true attitudes about racism. Paradoxically, expressing more negative attitudes about race can actually be a positive thing for anti-bias work. People with racist attitudes will find justifications to validate them, even if they are not willing to explicitly commit to them (Uhlmann & Cohen, 2005; Gaertner & Dovidio, 1977). When people conceal racist attitudes with rhetoric, it only leads to racist outcomes that use a different set of justifications. Exposing those attitudes can be the first step to changing them. For example, a value-consistency approach is one of the validated techniques for reducing prejudice (Paluck & Green, 2009; Rokeach, 1973). When individuals are confronted with a conflict between their prejudice and another value that is important to them, such as equality, they behave in less prejudiced ways (Grube, Mayton, & Ball-Rokeach, 1994). For these techniques to be effective, subjects must be aware of their own prejudice.

Moral credentialing is also a concern for any anti-bias intervention. By being the kind of person who takes part in an anti-bias intervention, one could see oneself as creating moral credentials that would allow the expression of ambiguously racist beliefs or behavior. This hardly invalidates the importance of anti-bias work itself.

However, there is a question about this interpretation as well. Monin & Miller (2001) have shown that the moral credentialing effect works for sexism as well as for racism. Why do we only see it for racism here, while for sexism we see a benefit from the control text and no impact from the game? It might be an effect of the Symbolic Racism test itself (Henry & Sears, 2002). Because the Symbolic Racism test discusses only anti-Black racism, subjects who were not Black may have reacted poorly to having their experiences with racism omitted from the study.

The deeper finding here is that the patterns of behavior for sexism and racism were different. For sexism, the control condition had a positive impact on player attributions (for the web-recruited players) and attitudes (for Mechanical Turk players). For racism, the study as a whole caused players to express more racist attitudes. This difference challenges one of the underlying premises of this study. The premise of the game was to model the common structural underpinnings of sexism and racism. While sexism and racism are not identical in practice, they could be deconstructed to produce a common set of patterns that produce both racial and gender disparities, and that therefore could be addressed by the same set of game mechanics. In this case, those were the patterns of microaggressive stress and perceptual bias.

The patterns themselves may operate the same way for race and gender – but players do not. The game simplified racism and sexism to this common model, but it did not successfully induce the player to reduce racism and sexism to the same model, or at least not without incorporating their previous understandings of the phenomena. For example, Sidanius and Veniegas (2000) argue that sexism and racism are qualitatively different forms of discrimination, with racism functioning primarily as an act of aggression and sexism functioning primarily as an act of control. These differences may have been more salient than the common structural model.

Players bring their own assumptions about race and gender to the table, as well as their prior experiences with racism and sexism. While the study attempted to account for this by controlling for player race and gender, this approach did not succeed. The assumption was that people would respond to the game based on their dominant or non-dominant position in American society, which would shape their prior experiences. Instead, all players showed the same pattern of differences between racism and sexism in this results cluster, indicating that it may be the differences in their conceptions of racism and sexism, rather than their personal experiences with racism and sexism, that are driving these findings.

The broader lesson of these results is that underlying structural commonalities between real-world phenomena do not override differences in players' prior understandings of those phenomena, and that those conceptual and social understandings may even override differences in personal experience that would lead players to experience the game differently. That said, there were differences between players by race (and, occasionally, gender), which we describe more fully below.

Results cluster two: bias guess effects for web-recruited players and for White Mechanical Turk players. For web-recruited players, and for White Mechanical Turk players, no differences were found in either player behavior or outcomes between the bias guess groups. In other words, players who never encountered the bias guess system behaved the same way as players who did encounter it; among players who encountered the bias guess system, it did not matter which version of the bias guess system they used.

This result could be interpreted to suggest that playable model – anomalous data (PMAD) theory is not useful for helping players achieve conceptual change. PMAD theory predicts that the design of the bias system will affect player learning. If the type of bias system made no difference to players’ attitudes or attribution styles, and if there was no difference between players who encountered the bias system and players who did not, then perhaps PMAD theory is simply wrong.

While this explanation sounds plausible, we reject it as a conclusion. First, there were indeed differences found for some players, as will be discussed later in this chapter. More importantly, this explanation does not account for the mechanism by which the theory operates.

PMAD theory predicts that the differences between game versions will drive differences on the outcome measures because of players' response to differences in the game rules. Change in attribution style and attitudes, therefore, would be the result of differences in play behavior. Because we found no differences in play behavior between bias guess conditions for these groups, we cannot draw conclusions about whether this aspect of PMAD theory is effective.

Instead, we can draw the more limited conclusion that the game's design did not successfully drive players to change their behavior in the game. Because subjects played the same way across all game versions, their experience in confronting the anomalous data presented by the game's model was effectively also identical.

The question, therefore, is why Advance, which was designed according to PMAD principles, failed to evoke differences in player behavior between game conditions for web-recruited players and for White Mechanical Turk players. We consider possible explanations for this, and suggest ways in which the PMAD principles can be changed to make model encounters more salient to the player. From there, it will be possible to conduct further studies to investigate how these changes would drive differences in player behavior, and to test whether such play differences would in fact change players' conceptual models about discrimination.

First, we consider that the game may have failed to drive differences in behavior between groups because it was insufficiently fun. After playing the game, players were asked about their enjoyment of the game. Fewer than 25% of players reported more than moderate liking for the experience, or reported that they would be likely to recommend it to a friend. The game replay data are similarly suggestive; although all players had the option to continue with the game after the required two playthroughs, only 3.6% of players did so. Players who did not enjoy the game may have been less motivated to attend to its underlying rules and processes for their own sake. Instead, they may have focused on trying to complete the experience and move on. While fun is difficult to reduce to a single design principle, this possibility suggests that future projects based on PMAD theory should use better metrics for player enjoyment during play-testing and pilot-testing.

Another possible explanation for the lack of differences between player groups is time on task. Time on task is a significant factor in learning (Arlin & Roth, 1978; Bell & Davidson, 1976; Cobb, 1972). In the intervention as currently designed, the vast majority of players engaged in ten minutes of gameplay over two play instances. The time spent playing the game may simply have been insufficient to engage players with the game's model, let alone have them figure out strategies with which to respond. If time on task were the critical factor, however, we might expect to see an impact for players who chose to play a third time, since they would actually spend more time on task. This is not, in fact, the case. As described in the previous chapter, there were no significant differences between players who played twice and players who played more than twice on any in-game or outcome measures. However, as with fun, we note time on task as a possible issue to investigate in future PMAD-based designs.

A third possible explanation is problems with player feedback. During play-testing, players described difficulty understanding the in-game reasons for their play experiences. In response, more of the game's model was made directly visible to players. For example, players described confusion about why some jobs were better than others for a given client. In response, during the client placement process, the reactions of adjacent NPCs to the given client were represented visually by hearts (for positive relationships) and skulls (for negative relationships). This allowed players to see the pattern of relationships for a given client in a given job. However, this access to the game's underlying model may not have gone far enough. Players were able to see the pattern of who they were placing, but they did not have an easy way to see who they had placed. In order to see the patterns in their own placement behavior, players would have to hunt through multiple levels, clicking on each of their successfully-placed clients individually and mentally tracking what they'd found. At best, they might be able to infer which types of clients they were successfully placing based on the “leftover” clients in their queue.

This information was critical to players developing models of bias, because the bias in the system emerges in patterns of placement. The difficulty of understanding these patterns was meant to be an engaging central mechanic of the game, but it may simply have been too complex a task in too short a period of time. Giving players explicit access to their own behavioral patterns – for example, by allowing them to bring up a control panel showing who they'd placed on which level – might have helped players quickly see what was happening, in time to develop strategies to respond to it for a higher score. This suggests that, for example, the principle of feedback may need revision. Making patterns more accessible to players more quickly may help players get to the model confrontation faster, and may help them develop goals and strategies to accommodate the model shift.

The issue of feedback also relates to the fourth possibility: reflection. Reflection can help players learn more from their experience in games (Moreno & Mayer, 2005). Advance does not explicitly ask players to reflect on their in-game behavior or experiences. The game was intended to use players' difficulty in client placement to induce them to reflect on possible strategies for improved performance. However, players may not have done this on their own.

In some ways, the game actively discouraged players from taking time for reflection. While finding a good placement for a client earned more money than a poor placement, there was a significant opportunity cost to spending time hunting for a better placement, with no guarantee of success. This tradeoff was meant to cause players to develop more efficient strategies for placement, so that they could earn more money with less time invested. However, players may have felt uncertain that they would be able to benefit from spending more time on the task, and therefore disproportionately chose to place characters as quickly as possible instead of thinking carefully about their options (McGuire & Cable, 2013).

To address this issue, a new PMAD design principle could be added that deals with reflection. Reflection, in this context, is meant to occur when the player has a decision to make about the best way to engage with the game's model in pursuit of their in-game goals. PMAD theory predicts that players' encounters with the game model, and subsequent engagement with it through strategy development, help them integrate the anomalous data into their conceptual models. Instead of focusing on getting the player to reflect on the whole game experience after playing, the new principle would emphasize repeated opportunities to reflect on their strategy, performance, and goals during play.

New principles would also need to address a fifth issue, namely the issue of centrality. The current version of Advance was designed to make interactions with the bias system optional. Players would always be trying to place clients within a biased system, but they did not have to attempt to identify the bias or take advantage of the rewards for doing so. This design decision was meant to reduce the players' sense of mandatory engagement with the bias system (Heeter et al., 2011), so that players would be freely choosing whether or not to engage with that mechanical subsystem during play. Unfortunately, over 50% of players chose not to interact with the bias system at all. The bias system was not central enough to motivate the majority of players to interact with it, even though it was affecting their play. Without making the bias system mandatory, there are ways to make it more central to the design. Developing a new principle, and testing the impact of centrality, would be a good way to investigate this possibility.

Finally, a PMAD design principle would need to be added to account for the issue raised by the first results cluster, namely that player preconceptions need to be taken into account when designing the game's model. For this study, extensive research was conducted about players' models of racism and sexism, but things that are tangential to the model may actually be driving significant differences – such as implications about how the society described by the game handles issues of bias – as we will see in the next cluster of results.

Results cluster three: bias guess effects for non-White Mechanical Turk players. There are several race-based effects and interactions in this cluster that require deeper investigation. First, we consider the issue of why non-White players outperformed White players on systemic attributions of sexism. Second, we look more deeply at the interaction between race and gender for the systemic racism test. Finally, we consider why different game conditions may have affected attitudes for this particular group, both in terms of differences between conditions and in terms of some conditions displaying pre-post differences.

First, we examine the finding that Black, Hispanic, and Other Mechanical Turk players were more likely to use systemic attributions for sexism after playing the game than White players were. This effect may be driven by play differences between the groups. Black, Hispanic, and Other players placed fewer characters than White players, but without any difference in overall game score. Since the game ran for the same length of time for all players, this finding suggests that this player group spent more time on placing each character, and made better per-character investments. This difference in engagement with the game may explain why the non-White players were more willing to use systemic explanations at post-test.

That said, this finding requires further explanation. Even given differences in play behavior between White and non-White players, why would this effect appear only for systemic explanations for sexism? This may be a population effect. Gaming culture is quite sexist; for example, over 85% of characters in videogames are male (Williams, Martins, Consalvo, & Ivory, 2009), identifiably female players are disproportionately harassed (Kuznekoff & Rose, 2012), and prominent women designers and critics face enormous, gendered hostility (Lewis, 2012). Women who are more sensitive to sexism may have been driven out of gaming culture entirely, leaving a population of game-playing women who are less sensitive to the issue. Gaming culture is also racist, in that Black and Hispanic characters are underrepresented in games (Williams et al., 2009) and racial insults are commonly deployed among gamers (Gray, 2012). Women, however, may find it harder to “pass” in gaming culture while still participating fully. For example, many online games require voice chat, where it is easier to identify female voices than the voices of non-White players, and where female voices have been shown to provoke hostility and gender-based harassment (Kuznekoff & Rose, 2012). This effect appears only for the Mechanical Turk group, not for the web-recruited group. However, as noted above, web-recruited players may not be particularly representative of gaming culture, since self-identified gamers willing to participate in a study addressing bias are bucking a gaming culture that is racist and sexist in both subtle and overt ways.

Second, there was one area where an interaction between race and gender was found: on the Systemic Racism test. Women and men showed opposite patterns. White women scored lowest among women, while White men scored highest among men; Black and Hispanic women scored highest among women, while Black and Hispanic men scored lowest among men; both male and female Other players scored in between the other two groups. In other words, White men and Black and Hispanic women were most likely to use systemic explanations for racism at post-test.

A similar, but not identical, interaction was found between race and gender for how much the player earned from discriminated-against clients. Again, the patterns for women and men were reversed. Among women, White women earned the highest proportion of their score from discriminated-against clients, while White men earned the lowest proportion among men. However, instead of Black and Hispanic players being situated at the other extreme, Other men earned the highest proportion of their score from bias-group members among men, while Other women earned the lowest proportion among women. Black and Hispanic players scored in between.

Although we have no direct evidence that play differences drove the differences in likelihood of using systemic explanations for racism at post-test, the patterns are suggestively similar, and such a link is certainly consistent with PMAD theory. If this play difference is indeed responsible for the differences on the Systemic Racism test, then it implies that players who were more successful at earning money from bias-group clients were less likely to use systemic explanations for racism after playing the game. To explain this, we must recall that there were no differences between groups in the number of bias-group clients placed, or in the total number of clients placed. The extra income being earned from the bias group, therefore, must have come from promotion bonuses and from the higher salaries available at higher levels. In other words, the players who were more successful at promoting discriminated-against clients were less likely to use systemic explanations for racism after the study. These players may have concluded that if they were able to place these clients, the problem couldn't be all that serious. Frustration with the system, on the other hand, may have been comparatively more productive at getting players to consider systemic explanations for bias. The frustration hypothesis may also explain the finding that for Mechanical Turk players, a higher game score was correlated with more sexist attitudes at post-test.

Finally, we consider the differences between game conditions that were found for non-White Mechanical Turk players. These differences appeared only for the attitude tests, which gives us a clue as to what might underlie them.

Although there were differences between Modern Sexism and Symbolic Racism, in both cases players performed best in conditions where they received money in some form for reporting the bias – the financial-guess condition for sexism, and the generative-guess condition for racism. Conversely, the only case in which there was a significant pre-post difference was on the Symbolic Racism test in the no-guess condition. Players held more racist attitudes at post-test when they did not realize they had the option of doing anything about the bias in the game system, even if only reporting it.

We propose that this has to do with non-White players being sensitive to the vision of the larger world presented by the game. What happens in a world where bias exists? Will someone do something about it? Or is the unjust state of affairs simply presented as normalized and inevitable? When players did not make a guess, they encountered bias helplessly; when they did, even if they did not succeed in guessing the bias, they discovered that someone cared about whether bias existed. That players expressed less racist attitudes when they thought they could do something about the game's bias, and more racist attitudes when they had no such option, suggests a retroactive justification of their game experiences (Jost & Banaji, 1994; Jost, 2002; Jost, Banaji & Nosek, 2004).

The original hypothesis of this study was that players would be sensitive to the variation in the mechanics of the reward conditions and respond by making different play choices. However, these results suggest that players were more sensitive to the social implications of differences between conditions. Instead of responding with game strategies, they responded by developing a model of the game world's implicit attitudes toward justice and justifying it. This question is worth investigating more deeply. It suggests that players perform narrative extrapolation even in simple games, which challenges earlier findings that players pay more attention to game mechanics than story (Lindley, 2002). A better understanding of this extrapolation process can help designers use narrative framing more effectively to convey their game concepts, and, conversely, to avoid investing effort in narrative elements that are not necessary to make their point.

We must, however, consider why these effects did not show up for the web-recruited subjects. This may be due to the small number of non-White web-recruited subjects, which prevents us from detecting race-based effects. However, it may also be that there is a ceiling effect due to differences between groups. Players from the web-recruited group were more likely to use systemic attributions for racism and sexism, and they held less racist and sexist attitudes. The race-based effects found here may only be visible among a more biased population.

Limitations of the Study

There are significant limitations to the study, which need to be kept in mind as we interpret the results and determine the study's implications.

First, the study is limited because of its player population. Two groups of subjects were recruited, one from web-based communities of online gamers and one from Amazon's Mechanical Turk. We can question to what extent these subjects are representative of the population as a whole. Both were recruited in ways that suggest there are limits to their generalizability – whether because they are self-identified game-players or because they spend their free time on Mechanical Turk. (Though of course these categories are not mutually exclusive.)

Second, there were demographic issues with the study. It failed to recruit enough non-White players in the web group to know whether there are differences in play based on racial identity; even in the Mechanical Turk condition, where more non-White subjects were recruited, the demographics of the subject pool do not reflect the larger demographics of the United States (Howden & Meyer, 2011; Humes, Jones, & Ramirez, 2010).

Third, the study was deliberately limited to American participants. This was done because race as a social construct is understood differently in different countries. To get at attitudes and attributions surrounding race, we would need to use different instruments and different models that are appropriate for different groups of subjects. Additionally, we have reason to expect that attribution patterns are culturally specific (Choi, Nisbett, & Norenzayan, 1999; Norenzayan, Choi, & Nisbett, 2001). Because this study specifically looked at attribution styles, we limited our study to American subjects to avoid cultural confounds.

Finally, the major limitation of this study is the design of the game itself. While this dissertation attempted to create a game that both conveyed the model and functioned successfully as a game, it did not entirely succeed. Although many problems were caught in play-testing, the pilot study found that many players did not understand the game interface. A tutorial was added before the full study was performed, which helped players understand the game, but player satisfaction rates remained low.

Implications for the Literature

In light of the above-noted weaknesses and limitations, what are the implications of this work as related to the body of literature used to develop this dissertation?

Based on this study, there is no reason to doubt that it is important to change Americans' models of sexism and racism. Although this study did not find that Advance was more effective than a control text at changing players' attribution styles, the original problem posed by the dissertation still stands. Dominant, individualistic models of racism and sexism inadequately explain systemic racial and gender bias, and shifting from individualistic to systemic models is a difficult problem. Without a systemic model of bias, people are less likely to support systemic interventions to reduce racial and gender inequity (Iyengar, 1989; Iyengar, 1994; Hughes & Tuch, 2000; Lau & Sears, 1981), and hence less likely to effectively advocate for social justice. Similarly, changing people's attitudes about racism and sexism matters because attitudes shape social norms (Oskamp, 2000). The findings of this study in no way negate the need for social change.

Although it is likely that PMAD theory will require significant modification and testing before it can be said to be effective, this approach can still aid in the development of entertainment interventions for prejudice reduction. There are projects which have been successful in this area (e.g. Paluck, 2009), but the literature is inadequately theorized (Paluck & Green, 2009). While PMAD theory, as currently constructed, does not change outcomes for players, it does provide a testable theoretical base for future work. Testing the changes and additions proposed in this chapter, for example, may lead to PMAD being shown to be effective. In other words, it is too soon to say that PMAD theory does not provide anything useful to the prejudice-reduction enterprise. Advance was only a first attempt at building games based on this theory, and the fact that there is a testable and modifiable theory in the first place is a step forward for this sub-field. The PMAD design principles, even in their currently limited form, give us a way into the problem of designing games as entertainment-based prejudice-reduction interventions.

This study may also have implications for the field of games for impact. This study demonstrates that even when a game provides a simplified model, players may bring their prior knowledge with them across the border of the magic circle (Huizinga, 1950). For example, players appeared to be more influenced by differences in their prior knowledge of racism and sexism than by the shared model of racism and sexism constructed by the game. This is consistent with existing work in the field (e.g. Copier, 2007; Consalvo, 2009), but it demonstrates that this effect can be present in casual games as well as in more in-depth gaming experiences.

The finding that non-White players are the most sensitive to the game challenges the usual recruitment practices of game-based research, which often focus on White players or on games that are primarily played by White players. While we do not yet know if this finding holds true for all PMAD games, or only for PMAD games about racism and sexism, or only for Advance in particular, further investigation can answer this question and challenge the field to widen the search for its research subjects and potential audiences.

Finally, this study suggests some of the complexities of interventions within a game context. Reading the text had no impact for the majority of measures for each group; it is possible that reading text within the context of a game activity made players take it less seriously. As suggested earlier in this dissertation, games' cultural position makes them accessible to unenthusiastic learners, but it may also make the material in the game less credible – even if that material is not itself part of the game. This suggests that the positioning of game interventions is something worth investigating further, not just for this study but for game research of all sorts.

Implications for Future Research and Practice

As discussed earlier in this chapter, there are many research questions that emerge from the findings of this dissertation.

First, there are many questions that could be answered with further analysis of the data collected in this study. For example, full gameplay logs were collected for every player. Analyzing these logs would allow us to look more deeply at the question of how player behavior in the game drives conceptual change. We could work to identify play patterns associated with conceptual change around issues of race and gender, and then validate the predictive value of those play patterns in a second study.
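As a sketch of what such a log analysis might look like, the code below computes one candidate play-pattern feature, the proportion of each player's score earned from bias-group clients, from hypothetical log records. The field names and the records themselves are invented for illustration; Advance's actual log schema is not reproduced here.

```python
# Hypothetical sketch: derive a per-player play-pattern feature from gameplay
# logs, which could then be related to pre/post change on the systemic-
# attribution measures. The log schema below is invented.
from collections import defaultdict

logs = [  # one record per placement event (all values invented)
    {"player": "p1", "client_group": "bias", "points": 120},
    {"player": "p1", "client_group": "other", "points": 80},
    {"player": "p2", "client_group": "bias", "points": 30},
    {"player": "p2", "client_group": "other", "points": 170},
]

def bias_share(records):
    """Proportion of each player's score earned from bias-group clients."""
    total = defaultdict(float)
    bias = defaultdict(float)
    for r in records:
        total[r["player"]] += r["points"]
        if r["client_group"] == "bias":
            bias[r["player"]] += r["points"]
    return {p: bias[p] / total[p] for p in total}

shares = bias_share(logs)
# In this toy data, p1 earned 60% of their score from bias-group clients,
# while p2 earned only 15%.
assert abs(shares["p1"] - 0.60) < 1e-9 and abs(shares["p2"] - 0.15) < 1e-9
```

Features of this kind, computed per player, could be entered into the second study described above as candidate predictors of conceptual change.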

There are also research questions to be answered about whether the Advance game would prove to be effective in other contexts. Specifically, we could investigate the possibility of preparation for future learning (PFL) effects using Advance. Preparation for future learning, in this context, means that games can function as virtual experiences that players can build on when making sense of new concepts (Hammer & Black, 2009). Would playing the game change the extent to which players are later affected by more formal learning experiences around discrimination and bias? For example, how would players perform if they first played the game and then read the control text, or if they first read the control text and then played the game?

Looking at PFL effects with Advance would not just help us investigate ways of changing players' models of racism and sexism. It would also help us understand PFL effects in games better. While some games appear to prepare learners for future encounters with formal learning experiences such as reading from a textbook, others do not (Hammer & Black, 2009). Knowing whether PFL effects appear in Advance would help us learn more about the game features and knowledge domains where game-based PFL can be the most effective.

We could also study the impact of situating the game within a more explicitly educational context. This might change what players attend to within the game, especially if framed in a “productive failure” context (Kapur, 2008). Gameplay could be understood as a challenging and potentially unsolvable problem to which a subsequent text intervention could provide a solution. Given that the only finding of differences in game-play effects between game conditions indicated that groups who struggled more with earning money from discriminated-against clients were also more likely to use systemic explanations for sexism, this might be a fruitful approach.

Earlier in this chapter, we proposed possible additions and alterations to the PMAD principles. Each of these principles would need to be investigated more thoroughly. In future studies on the PMAD principles, the research would be constructed differently. First, we would iterate on designs using a given principle to determine whether it provokes the desired gameplay differences. Only then would we investigate whether in-game behaviors affect any outcome measures. Additionally, further research can investigate the importance of the principles already defined in this dissertation. While they are based in both game design and educational theory, they can be individually tested to see to what extent they are important to PMAD theory. For example, the first principle (model-building) will need to be modified to account for player preconceptions that are not model-based, and such modification should be empirically tested.

Additionally, we found that a key challenge with Advance was that the differences between conditions did not drive differences in play behaviors. There are elements of the game beyond the PMAD principles that could be modified and tested in order to get players to react differently.

For example, the study’s findings suggest that players may have connected to the game narratively and emotionally. We could investigate whether creating more of a personal connection for the player, such as by allowing them to name their company, would increase the impact of the game. Similarly, we could investigate the effect of creating a deeper personal connection between the player and the characters they are helping, such as by allowing the player to see a particular character’s backstory or by having characters express gratitude to the player.

We could also investigate the impact of allowing players to change the bias in the system, as opposed to simply reacting to it. Bias in Advance is presented as an inescapable and unchangeable fact; players who fail to successfully navigate it may feel helpless, while players who successfully place discriminated-against characters may conclude that their success means systemic bias does not exist. Allowing players to modify the bias system can help address both these issues. For the former group, it demonstrates that discrimination can be fought; for the latter group, it allows them to see that their success could be even greater in a fairer world. Finally, engaging with the game before and after the change in the bias system could help players compare and contrast the two rulesets in order to understand the systemic aspects of discrimination more deeply.

Other findings of this dissertation suggest that it is worth further investigating to what extent player preconceptions cross the line into the game, and to what extent the game's model can challenge or undermine those preconceptions. Although the game's model was built on the similarities between racism and sexism, differences appeared between the two. The control text helped players address issues of sexism, while the study as a whole caused players to express more racist attitudes. The attempted simplification to a common model did not succeed, possibly because of players' prior experiences with the concepts. It is worth trying to understand which features of racism and sexism, as social phenomena, remain attached to the game even when using simplified models, and which are successfully abstracted away.

Similarly, the original hypothesis of this study prioritized the mechanical differences between game reward conditions as a way of creating differences in player behavior. However, the results of the study suggest that players are at least equally sensitive to the social implications of game rules, and that they can be affected by these implications even without changing their play behavior. This suggests that future designs of games for impact need to account for the affective and narrative environments suggested by gameplay, and to consider which game features convey such elements most effectively.

Finally, there is the issue of the impact of player race on the results of the study. By looking at a larger sample of web-recruited non-White subjects, we could see whether they show the same patterns of difference as the Mechanical Turk players, indicating that this study's differences in findings were the result of small sample size, or whether there are indeed underlying population differences. It would be highly informative to know whether the same pattern of results applies across the two groups.

Whether or not it does, our future research and practice involving PMAD design needs to address the fact that the impacts of the study were greatest for non-White players who were not recruited through gaming communities. These players played differently from web-recruited and White players, and they were most sensitive both to the study’s overall effects and to differences between conditions. If this is the population for whom this type of game is most effective for addressing bias, we need to consider what this means for the possibility of making social change through PMAD game design.

Do non-White Americans even need help changing their models of how racism and sexism operate? And do they need interventions aimed at their attitudes? We argue that they do. Although it is certainly valuable to create anti-bias interventions for White Americans because of their comparatively dominant position in society, White Americans are hardly the only people worth reaching.

Sexism is an issue for people of all racial backgrounds. While some specific manifestations of sexism differ by class and race (Williams, 2012), among other factors, non-White Americans hold and express sexist attitudes and adopt individualistic rhetorical frames around issues of sexism (Hunt, 1996; Hughes & Tuch, 2000). Changing these attributions and attitudes would help in terms of personal change, in terms of supporting appropriate remedies for systemic sexism, and in terms of changing the kinds of conversations we have about sexism in society. Instead of talking about how individual women can “lean in,” for example, the conversations can begin to be about the structural factors that make the rhetoric of leaning in necessary in the first place (Losse, 2013).

However, it is also worth designing interventions around racism for non-White Americans. This population is not somehow exempt from the racist attitudes and rhetorics that pervade our society. For example, internalized racism and intraethnic othering mean that individuals can hold negative stereotypes about groups to which they belong, police their own performance of ethnic identity, and acquiesce to the structures of racial oppression that disadvantage them (Steele & Aronson, 1995; Pyke & Dang, 2003; Speight, 2007). Additionally, different non-White groups can hold negative views about each other. If we can change the attitudes of non-White Americans, that is worthwhile both for reducing internalized racism and for reducing racism between non-dominant groups.

Additionally, changing the attribution styles of non-White Americans can be a powerful way to expand the reach of existing groups that work for social change. As described earlier in this dissertation, people with an individualistic model are less likely to support interventions that address the systemic elements of bias (Iyengar, 1989; Iyengar, 1994; Hughes & Tuch, 2000; Lau & Sears, 1981). Changing attributions can help anti-racist groups recruit people who share their vision about the structural changes necessary for a just world. It can also give people a tool to strike back against the dominant rhetoric of color-blindness, not by arguing with its details but by deconstructing the basic premises of individualism on which it relies. Changing the attribution styles of non-White players can be seen as giving them more ways to articulate challenges to the dominant narratives of race in American society.

Finally, we can think about how to generalize the characteristics of this particular group of players. By understanding better why non-White players in the Mechanical Turk group were affected by the game, we may be able to make testable inferences about who else might be affected. There may be underlying commonalities, beyond racial identification, that the affected group shares with some White players, and these commonalities might allow us to affect those players as well.

In conclusion, while this study’s limitations, as delineated earlier, may have affected its outcomes, it provides impetus and direction for future work in an important area that can be addressed through a “games for impact” perspective. The study addressed the difficult and socially relevant problem of prejudice by proposing and testing a new approach to creating entertainment-based interventions, namely PMAD theory. Although PMAD theory has not been demonstrated to be completely effective in its current form, it provides the field of entertainment-based prejudice-reduction interventions with more theoretically grounded approaches to test and build on. The modifications to the PMAD design principles suggested in this dissertation could refine PMAD theory in ways that result in effective designs to address social problems that stem from individuals’ perceptions and beliefs. Additionally, this study found that some players – namely non-White players in the Mechanical Turk group – were sensitive to the differences between versions of the game. This suggests that further research into whether PMAD theory differentially persuades non-White players about the nature of prejudice could be productive, and may ultimately yield knowledge about how to design the most effective game-based interventions for different populations when addressing beliefs about prejudice.

References

Casual Games Association (2007). 2007 casual games report. Retrieved from https://dl.dropboxusercontent.com/u/3698805/Reports/CasualGamesMarketReport- 2007.pdf Adams, M., Bell, L. A., & Griffin, P. (Eds.). (2007). Teaching for diversity and social justice. London, UK: Routledge. Allport, G. W. (1979). The nature of prejudice: 25th anniversary edition. New York, NY: Basic Books. Arlin, M., & Roth, G. (1978). Pupils’ use of time while reading comics and books. American Educational Research Journal, 15(2), 201-216. Banaji, M. R., & Greenwald, A. G. (1995). Implicit gender stereotyping in judgments of fame. Journal of Personality and Social Psychology, 68(2), 181–98. Batanero, C., & Sanchez, E. (2005). What is the nature of high school students’ conceptions and misconceptions about probability? In G. Jones (Ed.), Exploring probability in school (pp. 241–266). New York, NY: Springer. Bechdel Test Movie List. (2010). Retrieved November 8, 2010, from http://bechdeltest.com/ Beichner, R. J. (1996). The impact of video motion analysis on kinematics graph interpretation skills. American Journal of Physics, 64, 1272–1278. Bell, M., & Davidson, C. (1976). Relationships between pupil-on-task-performance and pupil achievement. The Journal of Educational Research, 69(5), 172-176. Bertrand, M., & Mullainathan, S. (2003). Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. (NBER Working Paper No 9873). Cambridge, MA: National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w9873 Bidell, T. R., Lee, E. M., Bouchie, N., Ward, C., & Brass, D. (1994). Developing conceptions of racism among young white adults in the context of cultural diversity coursework. Journal of Adult Development, 1(3), 185–200. Bjork, S., & Holopainen, J. (2004). Patterns in game design. Cambridge, MA: Charles River Media. Blanchard, F. A., Lilly, T., & Vaughn, L. A. (1991). Reducing the expression of racial prejudice. 
Psychological Science, 2(2), 101–105. Blizzard Entertainment. (2004). World of warcraft. [PC game]. Irvine, CA: Blizzard Entertainment. Bogost, I. (2006). Unit operations: An approach to videogame criticism. Cambridge, MA: The MIT Press.

187 Bogost, I. (2007). Persuasive games: The expressive power of videogames. Cambridge, MA: The MIT Press. Bonilla-Silva, E., & Forman, T. A. (2000). “I am not a racist but...”: Mapping White college students’ racial ideology in the USA. Discourse & Society, 11(1), 50–85. Bonilla-Silva, E. (2006). Racism without racists: Color-blind racism and the persistence of racial inequality in the United States (2nd ed.). Lanham, MD: Rowman & Littlefield. Brondolo, E., Brady, N., Thompson, S., Tobin, J. N., Cassells, A., Sweeney, M., McFarlane, D., et al. (2008). Perceived racism and negative affect: Analyses of trait and state measures of affect in a community sample. Journal of Social and Clinical Psychology, 27(2), 150– 173. Burke, L. L. (2012). Dog eat dog. [Tabletop roleplaying game]. Oakland, CA: Liwanag Press. Buser, J. K. (2009). Treatment-seeking disparity between African Americans and Whites: Attitudes toward treatment, coping resources, and racism. Journal of Multicultural Counseling and Development, 37(2), 94–104. Bush, G. W. (2010). Decision Points. New York, NY: Crown. Champagne, A. B., Klopfer, L. E., & Anderson, J. H. (1980). Factors influencing the learning of classical mechanics. American Journal of Physics, 48(12), 1074-1079. Chen, J. & Clark, N. (2006). Flow. [Flash game]. Los Angeles, CA: thatgamecompany. Chi, M. T. H., de Leeuw, N., Chiu, M. H., & LaVancher, C. (1994). Eliciting self-explanation improves understanding. Cognitive Science, 18, 439–477. Chi, M. T. H., & Roscoe, R. (2002). The processes and challenges of conceptual change. In M. Limon & L. Mason (Eds.), Reconsidering conceptual change: Issues in theory and practice (3–27). New York, NY: Springer. Chi, M. T. H. (2008). Three types of conceptual change: Belief revision, mental model transformation, and categorical shift. In S. Vosniadou (Ed.), Handbook of research on conceptual change (pp. 61-82). Hillsdale, NJ: Erlbaum. Chi, M. T. H. (2005). 
Commonsense conceptions of emergent processes: Why some misconceptions are robust. Journal of the Learning Sciences, 14(2), 161–199. Chinn, C. A., & Malhotra, B. A. (2002). Children’s responses to anomalous scientific data : How is conceptual change impeded ? Journal of Educational Psychology, 94(2), 327–343. Chinn, C. A., & Brewer, W. F. (1993). The role of anomalous data in knowledge acquisition: A theoretical framework and implications for science instruction. Review of Educational Research, 63(1), 1–49. Choi, I., Nisbett, R. E., & Norenzayan, A. (1999). Causal attribution across cultures: Variation and universality. Psychological Bulletin, 125(1), 47-63. Church, D. (1999). Formal abstract design tools. Game Developer, 6(8), 44–50.

188 Ciavarro, C. (2008). Implicit learning as a design strategy for learning games: Alert Hockey. Computers in Human Behavior, 24(6), 2862–2872. Cobb, J. (1972). Relationship of discrete classroom behaviors to fourth-grade academic achievement. Journal of Educational Psychology, 63(1), 74-80. Consalvo, M. (2009). There is no magic circle. Games & Culture, 4(4), 408-417. Cook, D. (2006). What are game mechanics? [Web log message]. Retrieved from http://www.lostgarden.com/2006/10/what-are-game-mechanics.html Copier, M. (2005). Connecting worlds: Fantasy role-playing games, ritual acts and the magic circle. Proceedings of the DIGRA 2005 Conference: Changing Views - Worlds in Play. Copier, M. (2007). Beyond the magic circle : A network perspective on role-play in online games. (Doctoral dissertation). Retrieved from http://dspace.library.uu.nl/handle/1874/21958 Cortina, L. M., & Kubiak, S. P. (2006). Gender and posttraumatic stress: sexual violence as an explanation for women’s increased risk. Journal of abnormal psychology, 115(4), 753–9. Costikyan, G. (2002). I have no words & I must design: Toward a critical vocabulary for games. Proceedings from Computer Games and Digital Cultures Conference. Tampere, Finland: Tampere University Press. Retrieved from http://www.digra.org/digital- library/publications/i-have-no-words-i-must-design-toward-a-critical-vocabulary-for- games/ Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology, 104, 268–294. Crandall, C. S., & Stangor, C. (2005). Conformity and prejudice. In J. F. Dovidio, P. Glick, & L. A. Rudman (Eds.), On the nature of prejudice: Fifty years after Allport (pp. 295–309). Malden, MA: Blackwell. Diana Jones Awards, The (2013). The shortlist for the 2013 Diana Jones award for excellence in gaming. (2013). Retrieved from http://www.dianajonesaward.org/13nominees.html Deci, E., Koestner, R., & Ryan, R. (1999). 
A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological Bulletin, 125(6), 627- 668. Deci, E., Koestner, R., & Ryan, R. (2001). Extrinsic rewards and intrinsic motivation in education: Reconsidered once again. Review of Educational Research, 71(1), 1-27. De Jong, T., & Van Joolingen, W. R. (1998). Scientific discovery learning with computer simulations of conceptual domains. Review of Educational Research, 68(2), 179–201. Deterding, S. (2011). Situated motivational affordances of game elements: A conceptual model. Proceedings from CHI 2011. Retrieved from http://gamification-research.org/wp- content/uploads/2011/04/09-Deterding.pdf Devine, P. (1989). Stereotypes and prejudice: Their automatic and controlled components. Journal of Personality and Social Psychology, 56(1), 5-18.

189 Doerr, H. (1996). STELLA ten years later: A review of the literature. International Journal of Computers for Mathematical Learning, 1(2), 201-224. Dovidio, J. F., Glick, P., & Budman, L. A. (Eds.). (2005). On the nature of prejudice: Fifty years after Allport. Hoboken, NJ: Wiley-Blackwell. Duckitt, J. (1994). The social psychology of prejudice. Westport, CT: Praeger. Elverdam, C., & Aarseth, E. (2007). Game classification and game design: Construction through critical analysis. Games and Culture, 2(3), 3-22. Feagin, J. R. (2001). Racist America: Roots, current realities and future reparations. London, UK: Routledge. Feagin, J. R. (2006). Systemic racism: A theory of oppression. London, UK: Routledge. Fels, A. (2004). Necessary dreams: Ambition in women’s changing lives. New York, NY: Pantheon. Finch, B. K., Kolody, B., & Vega, W. A. (2000). Perceived discrimination and depression among Mexican-origin adults in California. Journal of Health and Social Behavior, 41(3), 295– 313. Firaxis Games Inc. (2005). Sid Meier's Civilization IV [PC game]. New York, NY: Take-Two Interactive Software. Forster, E. M. (1956). Aspects of the novel. Boston, MA: Mariner Books Forrester, J. (1961). Industrial dynamics. New York, NY: Productivity Press. Forrester , J. (1992). System dynamics and learner-centered education in kindergarten through 12th grade education. Retrieved from http://www.mitocw.espol.edu.ec/courses/sloan- school-of-management/15-988-system-dynamics-self-study-fall-1998-spring- 1999/readings/learning.pdf Fox, H. (2001). “When race breaks out:” Conversations about race and racism in college classrooms. New York, NY: Peter Lang Publishing. Gaertner, S. L., & Dovidio, J. F. (1986). The aversive form of racism. In J. F. Dovidio & S. L. Gaertner (Eds.), Prejudice, discrimination and racism: Theory and research (pp. 61–89). Orlando, FL: Academic Press. Gaertner, S., & Dovidio, J. (1977). The subtlety of white racism, arousal, and helping behavior. 
Journal of Personality and Social Psychology, 35(10), 691-707. Gee, J. P. (2003). What video games have to teach us about learning and literacy. New York, NY: Palgrave Macmillan. Gentner, D., & Stevens, A. L. (1983). Mental models. London, UK: Routledge. Gershenfeld, A. (2010). Computer and Video Games That Engage, Educate and Empower. Retrieved September 29, 2010, from http://elineventures.com/static/GameOnTexas-Alan Gershenfeld.pdf

Goffman, E. (1974). Frame analysis: An essay on the organization of experience. Cambridge, MA: Harvard University Press.
Goldin, C., & Rouse, C. (1997). Orchestrating impartiality: The impact of "blind" auditions on female musicians (NBER Working Paper No. 5903). Cambridge, MA: National Bureau of Economic Research. Retrieved from http://www.nber.org/papers/w5903
Gomez, B., & Wilson, J. (2006). Rethinking symbolic racism: Evidence of attribution bias. The Journal of Politics, 68(3), 611-625.
Gorsky, P., & Finegold, M. (1992). Using computer simulations to restructure students' conceptions of force. Journal of Computers in Mathematics and Science Teaching, 11, 163–178.
Gray, K. L. (2012). Deviant bodies, stigmatized identities, and racist acts: Examining the experiences of African-American gamers in Xbox Live. New Review of Hypermedia and Multimedia, 18(4), 261-276.
Greenwald, A., McGhee, D., & Schwartz, J. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology, 74(6), 1464-1480.
Grube, J., Mayton, D., & Ball-Rokeach, S. (1994). Inducing change in values, attitudes, and behaviors: Belief system theory and the method of value self-confrontation. Journal of Social Issues, 50(4), 153–173.
Hammer, J., & Black, J. B. (2009). Games and (preparation for future) learning. Educational Technology, 49(2), 29–34.
Harrell, J. P., Hall, S., & Taliaferro, J. (2003). Physiological responses to racism and discrimination: An assessment of the evidence. American Journal of Public Health, 93(2), 243–248.
Hausmann, R., Tyson, L. D., & Zahidi, S. (2010). Global gender gap report. World Economic Forum. Retrieved from http://www.weforum.org/pdf/gendergap/report2010.pdf
Heeter, C., Lee, Y., Magerko, B., & Medler, B. (2011). Impacts of forced serious game play on vulnerable subgroups. International Journal of Gaming and Computer-Mediated Simulations, 3(3), 34-53.
Hegewisch, A., & Edwards, A. (2011). The gender wage gap: 2011. Retrieved from http://www.iwpr.org/publications/pubs/the-gender-wage-gap-2011
Henry, P. J., & Sears, D. O. (2007). The Symbolic Racism 2000 Scale. Political Psychology, 23(2), 253–283.
Hewson, P. W., & Thorley, R. N. (1989). The conditions of conceptual change in the classroom. International Journal of Science Education, 11, 541–553.
Hill, M. S., & Fischer, A. R. (2007). Examining objectification theory: Lesbian and heterosexual women's experiences with sexual- and self-objectification. The Counseling Psychologist, 36(5), 745–776.

Hirshman, L. R. (2007). Get to work: . . . And get a life, before it's too late. New York, NY: Penguin.
Holtgraves, T. (2004). Social desirability and self-reports: Testing models of socially desirable responding. Personality and Social Psychology Bulletin, 30(2), 161-172.
Howden, L. M., & Meyer, J. A. (2011). Age and sex composition: 2010. Retrieved from http://www.census.gov/prod/cen2010/briefs/c2010br-03.pdf
Hughes, M., & Tuch, S. (2000). How beliefs about poverty influence racial policy. In D. O. Sears, J. Sidanius, & L. Bobo (Eds.), Racialized politics: The debate about racism in America (pp. 165-190). Chicago, IL: University of Chicago Press.
Huizinga, J. (1950). Homo ludens: A study of the play element in culture. Boston, MA: Beacon Press.
Humes, K., Jones, N., & Ramirez, R. (2010). U.S. Census Bureau overview of race and Hispanic origin: 2010. Retrieved from http://www.census.gov/prod/cen2010/briefs/c2010br-02.pdf
Hunicke, R., LeBlanc, M., & Zubek, R. (2004). MDA: A formal approach to game design and game research. Proceedings from The Challenges in Games AI Workshop, Nineteenth National Conference of Artificial Intelligence. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.4561
IndieCade. (2013). 2013 festival games nominees. Retrieved from http://www.indiecade.com/2013/nominees/
Isbister, K., Flanagan, M., & Hash, C. (2010). Designing games for learning: Insights from conversations with designers. Proceedings of the 28th International Conference on Human Factors in Computing Systems (pp. 2041–2044). Atlanta, GA.
Iyengar, S. (1989). How citizens think about national issues: A matter of responsibility. American Journal of Political Science, 33(4), 878-900.
Iyengar, S. (1994). Is anyone responsible?: How television frames political issues. Chicago, IL: University of Chicago Press.
Jacobson, M., & Wilensky, U. (2006). Complex systems in education: Scientific and educational importance and implications for the learning sciences. Journal of the Learning Sciences, 15(1), 11-34.
Järvinen, A. (2009). Games without frontiers: Methods for game studies & design. Saarbrücken, Germany: VDM Verlag.
Johnson, S. (2002). Emergence: The connected lives of ants, brains, cities, and software. New York, NY: Scribner.
Johnson-Laird, P. N. (1994). Mental models and probabilistic thinking. Cognition, 50(1-3), 189–209.
Johsua, S., & Dupin, J. J. (1987). Taking into account student conceptions in instructional strategy: An example in physics. Cognition and Instruction, 4, 117–135.

Jonassen, D. H., & Henning, P. (1996). Mental models: Knowledge in the head and knowledge in the world. Educational Technology, 39(3), 37-42.
Jost, J. T., & Banaji, M. R. (1994). The role of stereotyping in system-justification and the production of false consciousness. British Journal of Social Psychology, 33(1), 1-27.
Jost, J. T., & Hunyady, O. (2002). The psychology of system justification and the palliative function of ideology. European Review of Social Psychology, 13, 111-153.
Jost, J. T., Banaji, M. R., & Nosek, B. A. (2004). A decade of system justification theory: Accumulated evidence of conscious and unconscious bolstering of the status quo. Political Psychology, 25, 881-919.
Jussim, L. (1991). Social perception and social reality: A reflection-construction model. Psychological Review, 98, 54–73.
Juul, J. (2003). The game, the player, the world: Looking for a heart of gameness. Proceedings from Level Up: Digital Games Research Conference. Utrecht: Utrecht University.
Juul, J. (2005). Half-real: Video games between real rules and fictional worlds. Cambridge, MA: MIT Press.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. Cambridge, UK: Cambridge University Press.
Kapur, M. (2008). Productive failure. Cognition and Instruction, 26(3), 379-424.
Karniol, R., & Ross, M. (1977). The effect of performance-relevant and performance-irrelevant rewards on children's intrinsic motivation. Child Development, 48(2), 482-487.
Kaufman, G. F., & Libby, L. K. (2012). Changing beliefs and behavior through experience-taking. Journal of Personality and Social Psychology, 103(1), 1-19.
Kee, K., Vaughan, T., & Graham, S. (2010). The haunted school on horror hill: A case study of interactive fiction in an elementary classroom. In Y. Baek (Ed.), Gaming for classroom-based learning: Digital role playing as a motivator of study (pp. 113-124). Cheongwon, North Chungcheong, South Korea: Korea National University of Education.
Khan, S. R. (2000). Teaching an undergraduate course on the psychology of racism. Teaching of Psychology, 28(1), 28–33.
Kirby, M., & Osterhaus, M. A. (1999). Apples to apples [Card game]. Madison, WI: Out of the Box Publishing.
Klahr, D., Dunbar, K., & Fay, A. L. (1990). Designing good experiments to test bad hypotheses. In J. Shrager & P. Langley (Eds.), Computational models of scientific discovery and theory formation (pp. 355–402). San Mateo, CA: Morgan Kaufmann.
Klimmt, C., & Hartmann, T. (2009). Effectance, self-efficacy and the motivation to play video games. In P. Vorderer & J. Bryant (Eds.), Playing video games: Motives, responses and consequences (pp. 133-145). Mahwah, NJ: Lawrence Erlbaum.

Klonis, S. C., Plant, E. A., & Devine, P. G. (2005). Internal and external motivation to respond without sexism. Personality and Social Psychology Bulletin, 31(9), 1237–1249.
Klopfer, E., Osterweil, S., & Salen, K. (2009). Moving learning games forward. Cambridge, MA. Retrieved from http://education.mit.edu/papers/MovingLearningGamesForward_EdArcade.pdf
Kluegel, J. R. (1985). If there isn't a problem, you don't need a solution. American Behavioral Scientist, 28, 761-784.
Kozol, J. (1992). Savage inequalities: Children in America's schools. New York, NY: Harper Perennial.
Kuhn, D. (1989). Children and adults as intuitive scientists. Psychological Review, 96, 674–689.
Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108, 480–498.
Kuznekoff, J., & Rose, L. (2013). Communication in multiplayer gaming: Examining player responses to gender cues. New Media & Society, 15(4), 541-556.
Lau, R., & Sears, D. (1981). Cognitive links between economic grievances and political responses. Political Behavior, 3(4), 279-302.
Lazzaro, N. (2005). Why we play games: Four keys to more emotion without story. Retrieved from http://www.xeodesign.com/xeodesign_whyweplaygames.pdf
League system. (2013). Retrieved November 27, 2013, from http://leagueoflegends.wikia.com/wiki/League_system
LeBlanc, M. (2005). Tools for creating dramatic game dynamics. In K. Salen & E. Zimmerman (Eds.), The game design reader: A rules of play anthology. Cambridge, MA: MIT Press.
Lenhart, A., Kahne, J., Middaugh, E., Macgill, A., Evans, C., & Vitak, J. (2008). Teens, video games and civics. Retrieved from http://www.pewinternet.org/Reports/2008/Teens-Video-Games-and-Civics.aspx
Lieberman, D. A. (2006). Dance games and other exergames: What the research says. Retrieved from http://www.comm.ucsb.edu/faculty/lieberman/exergames.htm
Lindley, C. A., & Mayra, F. (2002). The gameplay gestalt, narrative, and interactive storytelling. Proceedings of Computer Games and Digital Cultures Conference. Tampere, Finland: University of Tampere Press.
Linn, M. C., & Songer, N. B. (1991). Teaching thermodynamics to middle school students: What are the appropriate cognitive demands? Journal of Research in Science Teaching, 28, 885–918.
Lipsitz, G. (1995). The possessive investment in Whiteness: Racialized social democracy and the "White" problem in American Studies. American Quarterly, 47(3), 369–387.
Lipson, K. (1997). What do students gain from computer simulation exercises? In J. B. Garfield & G. Burrill (Eds.), Research on the role of technology in teaching and learning statistics (pp. 137–150). Voorburg: International Statistical Institute.

Losse, K. (2013). Feminism's tipping point. Retrieved from http://www.dissentmagazine.org/online_articles/feminisms-tipping-point-who-wins-from-leaning-in
Macrae, C., Bodenhausen, G., Milne, A., & Jetten, J. (1994). Out of mind but back in sight: Stereotypes on the rebound. Journal of Personality and Social Psychology, 67(5), 808-817.
Mandinach, E. B., & Cline, H. F. (1994). Classroom dynamics: Implementing a technology-based learning environment. Mahwah, NJ: Lawrence Erlbaum Associates.
Markham, K. M., Mintzes, J. J., & Jones, M. G. (1994). The concept map as a research and evaluation tool: Further evidence of validity. Journal of Research in Science Teaching, 31(1), 91–101.
Maxis. (2004). The sims 2 [PC game]. Redwood City, CA: Electronic Arts.
McGuire, J., & Kable, J. (2013). Rational temporal predictions can underlie apparent failures to delay gratification. Psychological Review, 120(2), 395-410.
Meadows, D. (2008). Thinking in systems. White River Junction, VT: Chelsea Green Publishing.
Michael, D., & Chen, S. (2005). Serious games: Games that educate, train & inform. Cincinnati, OH: Muska & Lipman.
Monteith, M. (1993). Self-regulation of prejudiced responses: Implications for progress in prejudice-reduction efforts. Journal of Personality and Social Psychology, 65(3), 469-485.
Morales, A. (2010). Farmville meets 1 billion Haiti charity goal, gives away free Hot Air Balloon. Retrieved from http://blog.games.com/2010/10/03/farmville-meets-1-billion-haiti-charity-goal-gives-away-free-ho/
Moreno, R., & Mayer, R. (2005). Role of guidance, reflection, and interactivity in an agent-based multimedia game. Journal of Educational Psychology, 97(1), 117-128.
Morris, M. W., Menon, T., & Ames, D. R. (2001). Culturally conferred conceptions of agency: A key to social perception of persons, groups and other actors. Personality and Social Psychology Review, 5(2), 169–182.
Moss-Racusin, C. A., Dovidio, J. F., Brescoll, V. L., Graham, M. J., & Handelsman, J. (2012). Science faculty's subtle gender biases favor male students. Proceedings of the National Academy of Sciences of the United States of America, 109(41), 16474–16479.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice Hall.
Nickerson, R. S. (1991). Modes and models of informal reasoning: A commentary. In J. F. Voss, D. N. Perkins, & J. W. Segal (Eds.), Informal reasoning and education (pp. 291–309). Hillsdale, NJ: Lawrence Erlbaum.
Niedderer, H., Schecker, H., & Bethge, T. (1991). The role of computer-aided modelling in learning physics. Journal of Computer Assisted Learning, 7(2), 84–95.

Nintendo EAD. (1996). Super Mario 64 [Nintendo 64 game]. Kyoto, Japan: Nintendo.
Norenzayan, A., Choi, I., & Nisbett, R. (2002). Cultural similarities and differences in social inference: Evidence from behavioral predictions and lay theories of behavior. Personality and Social Psychology Bulletin, 28(1), 109-120.
Oliver, M. L., & Shapiro, T. M. (2006). Black wealth, white wealth: A new perspective on racial inequality. New York, NY: CRC Press.
Olsen, D. G. (1999). Constructivist principles of learning and teaching. Education, 120(2).
Osborne, J., & Squires, D. (1987). Learning science through experiential software. In J. Novak (Ed.), Proceedings of the Second International Seminar on Misconceptions and Educational Strategies in Science and Mathematics (pp. 373–380). Ithaca, NY: Cornell University.
Oskamp, S. (Ed.). (2000). Reducing prejudice and discrimination. New York, NY: Taylor & Francis.
Overview of race and Hispanic origin: 2010. (2010). Retrieved November 13, 2012, from http://www.census.gov/prod/cen2010/briefs/c2010br-02.pdf
Pajitnov, A. (1984). Tetris [PC game]. Alameda, CA: Spectrum HoloByte, Inc.
Paluck, E. L. (2009). Reducing intergroup prejudice and conflict using the media: A field experiment in Rwanda. Journal of Personality and Social Psychology, 96(3), 574-587.
Paluck, E. L., & Green, D. P. (2009). Prejudice reduction: What works? A review and assessment of research and practice. Annual Review of Psychology, 60, 339–367.
Perkins, D. N., & Simmons, R. (1988). Patterns of misunderstanding: An integrative model for science, math, and programming. Review of Educational Research, 58(3), 303-326.
Pierce, C., Carew, J., Pierce-Gonzalez, D., & Willis, D. (1978). An experiment in racism: TV commercials. In C. Pierce (Ed.), Television and education (pp. 62–88). Beverly Hills, CA: Sage.
Plass, J. L., Homer, B. D., Kinzer, C., Frye, J., & Perlin, K. (2011). Learning mechanics and assessment mechanics for games for learning (G4LI White Paper No. 01/2011). Available at g4li.org
Valve. (2007). Portal [PC game]. Bellevue, WA: Valve Entertainment.
QCF Design. (2013). Desktop Dungeons [PC game]. Cape Town, South Africa: QCF Design.
Resnick, M. (1996). Beyond the centralized mindset. Journal of the Learning Sciences, 5, 1-22.
Richardson, G. (1999). Feedback thought in social science and systems theory. Westford, MA: Pegasus Communications.
Riot Games. (2009). League of legends [PC game]. Santa Monica, CA: Riot Games.

Riot Games Blog. (2012). League of legends' growth spells bad news for Teemo. Retrieved from http://www.riotgames.com/articles/20121015/138/league-legends-growth-spells-bad-news-teemo
Rokeach, M. (1973). The nature of human values. New York, NY: Free Press.
Roth, K. J. (1990). Developing meaningful conceptual understanding in science. In B. F. Jones & L. Idol (Eds.), Dimensions of thinking and cognitive instruction. London, UK: Routledge.
Rovio Entertainment. (2009). Angry birds [iOS mobile game]. Espoo, Finland: Rovio Entertainment.
Rovio Entertainment reports 2012 financial results. (2013). Retrieved November 27, 2013, from http://www.rovio.com/en/news/press-releases/284/rovio-entertainment-reports-2012-financial-results/
Ryan, R., & Deci, E. (2000). Intrinsic and extrinsic motivations: Classic definitions and new directions. Contemporary Educational Psychology, 25, 54–67.
Salen, K., & Zimmerman, E. (2003). Rules of play: Game design fundamentals. Cambridge, MA: The MIT Press.
Salen, K., & Zimmerman, E. (2005). The game design reader: A rules of play anthology. Cambridge, MA: The MIT Press.
Salvatore, J., & Shelton, J. N. (2007). Cognitive costs of exposure to racial prejudice. Psychological Science, 18(9), 810–815.
Sawyer, B. (2010). Serious Games Initiative. Retrieved November 10, 2010, from http://www.seriousgames.org/
Sawyer, R. K. (2005). Social emergence: Societies as complex systems. Cambridge, UK: Cambridge University Press.
Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1, 143–186.
Schmidt, S. L. (2005). More than men in white sheets: Seven concepts critical to the teaching of racism as systemic inequality. Equity & Excellence in Education, 38(2), 110–122.
Schuman, H., Steeh, C., Bobo, L. D., & Krysan, M. (1998). Racial attitudes in America: Trends and interpretations (Rev. ed.). Cambridge, MA: Harvard University Press.
Sears, D. O. (2010). A perspective on implicit prejudice from survey research. Psychological Inquiry, 15(4), 293–297.
Sears, D. O., & Jessor, T. (1996). Whites' racial policy attitudes: The role of White racism. Social Science Quarterly, 77(4).
Sibley, C. G., & Duckitt, J. (2008). Personality and prejudice: A meta-analysis and theoretical review. Personality and Social Psychology Review, 12(3), 248–279.
Sidanius, J., & Veniegas, R. (2000). Gender and race discrimination: The interactive nature of disadvantage. In S. Oskamp (Ed.), Reducing prejudice and discrimination (pp. 47-69). Mahwah, NJ: Lawrence Erlbaum Associates.

Song, J. (1998). Lineage [PC game]. Seoul, South Korea: NCsoft.
Speight, S. (2007). Internalized racism: One more piece of the puzzle. The Counseling Psychologist, 35(1), 126-134.
Squire, K., & Barab, S. (2004). Replaying history: Engaging urban underserved students in learning world history through computer simulation games. Proceedings of the 6th International Conference on Learning Sciences. International Society of the Learning Sciences. Retrieved from http://portal.acm.org/citation.cfm?id=1149126.1149188
Squire, K., & Durga, S. (2005). Productive gaming: The case for historiographic game play. In R. Ferdig (Ed.), The handbook of research on effective electronic gaming. Hershey, PA: IGI Global.
Stanley, D., Sokol-Hessner, P., Banaji, M., & Phelps, E. (2011). Implicit race attitudes predict trustworthiness judgments and economic trust decisions. Proceedings of the National Academy of Sciences of the United States of America, 108(19), 7710-7715.
Steele, C. M. (1997). A threat in the air: How stereotypes shape intellectual identity and performance. The American Psychologist, 52(6), 613–629.
Steele, C., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797-811.
Steinkuehler, C. (2008). Cognition and literacy in massively multiplayer online games. In J. Coiro, M. Knobel, C. Lankshear, & D. Leu (Eds.), Handbook of research on new literacies (pp. 611–634). Mahwah, NJ: Erlbaum.
Steinpreis, R., Anders, K. A., & Ritzke, D. (1999). The impact of gender on the review of the curricula vitae of job applicants and tenure candidates: A national empirical study. Sex Roles, 41(7-8), 509-528.
Stephan, W. G., & Stephan, C. W. (2000). An integrated threat theory of prejudice. In S. Oskamp (Ed.), Reducing prejudice and discrimination (pp. 23–46). Mahwah, NJ: Lawrence Erlbaum.
Strange, J. J. (2002). How fictional tales wag real-world beliefs. Narrative impact: Social and cognitive foundations (pp. 263–286). London, UK: Routledge.
Strange, S. (2011, March). Tension maps. Game Developer Magazine, 18(3), 7-11.
Sue, D. W. (2010). Microaggressions in everyday life: Race, gender, and sexual orientation. Hoboken, NJ: Wiley.
Sue, D. W., & Capodilupo, C. M. (2008). Racial, gender, and sexual orientation microaggressions: Implications for counseling and psychotherapy. In D. W. Sue & D. Sue (Eds.), Counseling the culturally diverse: Theory and practice. Hoboken, NJ: Wiley.
Suits, B. (2005). The grasshopper: Games, life and utopia. Peterborough, Ontario: Broadview Press.
Swim, J. K., Aikin, K. J., Hall, W. S., & Hunter, B. A. (1995). Sexism and racism: Old-fashioned and modern prejudices. Journal of Personality and Social Psychology, 68(2), 199-214.

Swim, J. K., Hyers, L. L., Cohen, L. L., & Ferguson, M. J. (2001). Everyday sexism: Evidence for its incidence, nature, and psychological impact from three daily diary studies. Journal of Social Issues, 57(1), 31–53.
Taylor, M. C. (1998). How White attitudes vary with the racial composition of local populations: Numbers count. American Sociological Review, 63, 512-535.
Taylor, R., & Chi, M. (2006). Simulation versus text: Acquisition of implicit and explicit information. Journal of Research in Science Teaching, 35(3), 289–313.
Teach with Portals. (n.d.). Retrieved November 13, 2012, from http://www.teachwithportals.com/
Tesser, A., & Shaffer, D. R. (1990). Attitudes and attitude change. Annual Review of Psychology, 41, 479–523.
Tversky, A., & Kahneman, D. (1991). Loss aversion in riskless choice: A reference-dependent model. The Quarterly Journal of Economics, 106(4), 1039-1061.
Uhlmann, E. L., & Cohen, G. L. (2005). Constructed criteria: Redefining merit to justify discrimination. Psychological Science, 16(6), 474–480.
US Census Bureau. (n.d.). Genealogy data: Frequently occurring surnames from Census 1990 – names files. Retrieved from http://www.census.gov/genealogy/www/data/1990surnames/names_files.html
Uziel, L. (2010). Rethinking social desirability scales: From impression management to interpersonally oriented self-control. Perspectives on Psychological Science, 5(3), 243-262.
Valian, V. (1999). The cognitive bases of gender bias. Brooklyn Law Review, 65(4), 1037-1061.
Vie, S. (2008). Tech writing, meet Tomb Raider: Video and computer games in the technical communication classroom. E-Learning and Digital Media, 5(2), 157-166.
Von Bertalanffy, L. (1958). An outline of general system theory. British Journal for the Philosophy of Science, 1, 134-165.
Wallace, J. D., & Mintzes, J. J. (1990). The concept map as a research tool: Exploring conceptual change in biology. Journal of Research in Science Teaching, 27(10), 1033–1052.
Watson, B., & Konicek, R. (1990). Teaching for conceptual change: Confronting children's experience. Phi Delta Kappan, 71, 680–685.
Wenneras, C., & Wold, A. (1997). Nepotism and sexism in peer review. Nature, 387, 341–343.
White, B. Y. (1993). ThinkerTools: Causal models, conceptual change, and science education. Cognition and Instruction, 10, 1–100.
Wilensky, U., & Resnick, M. (1999). Thinking in levels: A dynamic systems approach to making sense of the world. Journal of Science Education and Technology, 8(1), 3-19.
Williams, D., Martins, N., Consalvo, M., & Ivory, J. (2009). The virtual census: Representations of gender, race and age in video games. New Media & Society, 11(5), 815-834.

Williams, J. (2012). Reshaping the work-family debate: Why men and class matter. Cambridge, MA: Harvard University Press.
Windschitl, M., & Andre, T. (1998). Using computer simulations to enhance conceptual change: The roles of constructivist instruction and student epistemological beliefs. Journal of Research in Science Teaching, 35(2), 145–160.
Zietsman, A. I., & Hewson, P. W. (1986). Effect of instructing using microcomputer simulations and conceptual change strategies on science learning. Journal of Research in Science Teaching, 23, 27–39.
Zillmann, D. (1991). Affect from bearing witness to the emotions of others. Responding to the screen: Reception and reaction processes. Hillsdale, NJ: Lawrence Erlbaum.

Appendix A: Name Selection

The hundred most common last names for each of the four racial/ethnic groups used in the game were selected using US census data, provided by Mongabay (http://names.mongabay.com). To remove ambiguity, any last name that appeared on more than one list was removed from all lists. For groups where this brought the total under one hundred last names, the next fifty most common last names were selected. Again, any last name appearing on more than one list was removed from all lists. This process was repeated until all lists had at least one hundred last names.

Frequency data on first names by ethnicity were not available from the US Census, so we turned to alternative sources. Lists were generated as follows:

• Black: the site http://www.babynamesworld.com reports the most popular names for African American children in 2011.

• Hispanic: the site http://www.babycenter.com reports the most popular names for Hispanic children in 2011.

• Asian and White: lists of popular Asian and White names were generated using the randomNames package for R, which draws on the 2010 US Census data. (Package available at http://cran.r-project.org/web/packages/randomNames/.)

Fifty male and fifty female names were selected for each group. To remove ambiguity, any first name that appeared on more than one list was removed from all lists. For lists with fewer than fifty names remaining, ten names were added and the process repeated until all lists had at least fifty names.
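The deduplicate-and-refill procedure described above (used with a refill step of fifty for last names and ten for first names) can be sketched in Python. This is an illustrative reconstruction, not the original selection script; the function name, the `ranked` input structure, and the `step` parameter are assumptions.

```python
from typing import Dict, List


def deduplicate_lists(ranked: Dict[str, List[str]],
                      target: int, step: int = 10) -> Dict[str, List[str]]:
    """Build per-group name pools of at least `target` unambiguous names.

    `ranked` maps each group to its candidate names in descending
    frequency order. Any name appearing in more than one group's pool
    is removed from all pools; pools that fall short of `target` then
    draw `step` more names from their ranked list, and the process
    repeats until every pool is long enough (or the lists run out).
    """
    counts = {group: target for group in ranked}  # how deep to read each list
    while True:
        pools = {g: ranked[g][:counts[g]] for g in ranked}
        # Count how many pools each name appears in.
        seen: Dict[str, int] = {}
        for names in pools.values():
            for name in set(names):
                seen[name] = seen.get(name, 0) + 1
        ambiguous = {name for name, n in seen.items() if n > 1}
        # Remove ambiguous names from every pool.
        pools = {g: [n for n in names if n not in ambiguous]
                 for g, names in pools.items()}
        short = [g for g in pools if len(pools[g]) < target]
        if not short:
            return pools
        # Extend the ranked prefix for each short pool and retry.
        progressed = False
        for g in short:
            if counts[g] < len(ranked[g]):
                counts[g] += step
                progressed = True
        if not progressed:
            return pools  # ranked lists exhausted; return best effort
```

For example, with two groups whose top names overlap, the shared name is dropped from both pools and each pool is refilled from deeper in its ranked list until both reach the target length.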

Appendix B: Sexism Attribution Test

For the following questions, answers 1 and 2 are individual answers, while answers 3 and 4 are systemic. Questions are matched to questions with the same number on the race questions. To score the test, sum the number of systemic answers; a higher score indicates more systemic thinking about sexism.
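The scoring rule above is simple enough to state directly in code. A minimal sketch (the function name and input format are assumptions, not part of the instrument):

```python
def systemic_score(answers):
    """Score one respondent's attribution test.

    `answers` holds the chosen option (1-4) for each question.
    Options 1 and 2 are individual attributions; 3 and 4 are systemic.
    The score counts systemic answers, so a higher score indicates
    more systemic thinking about bias.
    """
    return sum(1 for choice in answers if choice in (3, 4))
```

For example, a respondent who answers [3, 4, 1, 2, 3] across five questions scores 3.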

Question 1

The anthology "Best Short Stories By New Writers" is compiled by a single editor. The editor chooses stories for the anthology from obscure small-press literary magazines.

This year's anthology contains only stories by male writers.

Which of the following explanations would you consider most likely to be true?

1. The editor gave special preference to stories by male authors.
2. The editor selected the best stories without considering gender.
3. Female authors are underrepresented in small-press literary magazines.
4. Male authors are better at writing stories that appeal to an audience of sophisticated, literary readers.

Question 2

Professor Jones teaches an advanced mathematics seminar at a public university. The seminar requires a lower-level prerequisite, as well as permission from Professor Jones, to enroll. Only half the students who apply for the seminar are admitted.

This year, there are no female students in Professor Jones' course.

Which of the following explanations would you consider most likely to be true?

1. Professor Jones discouraged female students from taking his classes.
2. Professor Jones selected the best students for his seminar, regardless of gender.
3. Female students are underrepresented in the lower-level math class required to study with Professor Jones.
4. The female brain is not wired for mathematical excellence, so fewer women are good enough to study with Professor Jones.

Question 3

Every year, the president of a small country nominates several high-level army officers to become generals.

This year, all the nominees are male.

Which of the following explanations would you consider most likely to be true?

1. The president gave preferential treatment to the male candidates.
2. The president chose the best candidates available, regardless of gender.
3. Women have fewer opportunities to reach the top ranks of the military.
4. Since most of the lower-level officers are male, they would be most effectively led by male generals.

Question 4

Every year, the hockey coach at Middlebrook College offers scholarships to promising young hockey players. The coach chooses from college-bound high-school students who have demonstrated skill at hockey.

This year, all the scholarships go to male hockey players.

Which of the following explanations would you consider most likely to be true?

1. The coach gave special treatment to the male players.
2. The coach picked the best players for the scholarship, regardless of gender.
3. Many more high schools have a hockey team for men than for women.
4. Students are more interested in attending the men's hockey games, so the coach prioritized putting together a great men's team.

Question 5

A male acquaintance refers to Nancy as "babe." Nancy becomes angry.

Which of the following explanations would you consider most likely to be true?

1. The acquaintance intended to disparage Nancy for her gender.
2. Nancy overreacted to a good-humored comment.
3. Many people use the word "babe" to disparage women, even if Nancy's acquaintance did not mean to.
4. The term "babe" may once have been used in a sexist way, but its use in other contexts is more important.

Question 6

Alison, a female college student, rarely participates in her literature seminar. At the end of the semester, the professor gives Alison a poor grade.

Which of the following explanations would you consider most likely to be true?

1. The professor gave Alison a poor grade because of her gender.
2. Alison chose not to participate in class, so she deserved a poor grade.
3. In the past, when Alison tried to participate in class discussions, she was often ignored by her teachers in favor of her male classmates.
4. Women are not aggressive enough to have their opinions heard in the classroom.

Question 7

Susan is a computer programmer. An acquaintance compliments Susan on how good she is with computers. Susan becomes upset.

Which of the following explanations would you consider most likely to be true?

1. The acquaintance meant that it was unusual for a woman to be good at programming.
2. Susan overreacted to a well-meaning compliment.
3. Many people assume women are bad with technology, even if Susan's acquaintance did not.
4. Women on average are worse with technology than men, so it is reasonable to assume that Susan would not be good at programming.

Question 8

Amelia, a fourth-grader, is tested for mathematical disabilities. The results are ambiguous, but the school counselor assigns Amelia to a special education program.

Which of the following explanations would you consider most likely to be true?

1. The school counselor deliberately referred Amelia for special education because of her gender.
2. The school counselor used excellent professional judgment in making a difficult call.
3. Popular stereotypes portray female students as mathematically inferior, which influenced the counselor's recommendation.
4. Women are less mathematically adept than men, so it is reasonable to assume that Amelia's ambiguous results mean that she has a problem.

Question 9

Mary applies for a promotion to manager. Later, she learns that her boss recommended a male candidate with six months' less experience. Mary does not get the promotion.

Which of the following explanations would you consider most likely to be true?

1. Mary's boss discriminated against her because of her gender.
2. Mary's boss recommended the most qualified candidate for the job.
3. Women are rarely portrayed as effective leaders, which influenced Mary's boss's judgment about who could lead the team.
4. Men do not like to be led by a woman, so it is better to promote male candidates to manager.

Question 10

Hannah attends a costume party with a male friend. Both of them wear skimpy costumes.

As they leave the party, a police officer warns Hannah about dressing so scandalously, while leaving her male friend alone.

Which of the following explanations would you consider most likely to be true?

1. The police officer only warned Hannah about her appearance because of her gender.

2. The police officer used excellent professional judgment in deciding who to caution.

3. Women's bodies are assumed to be on display for men, leading the officer to believe the warning was appropriate.

4. Dressing provocatively increases the chances a woman will be sexually assaulted, so the officer behaved correctly in warning Hannah.

Appendix C: Racism Attribution Test

For the following questions, answers 1 and 2 are individual answers, while answers 3 and 4 are systemic. Each question is matched to the question with the same number on the gender test.

To score the test, sum the number of systemic answers; a higher score indicates more systemic thinking about racism.
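The scoring rule can be sketched in a few lines of code. The following Python sketch is illustrative only; the function name and the response format (a list of chosen option numbers, one per question) are assumptions, not part of the instrument.

```python
def score_attribution_test(responses):
    """Count systemic answers across all questions.

    `responses` is a list of chosen option numbers (1-4), one per question.
    A higher score indicates more systemic thinking about racism.
    """
    if not all(choice in (1, 2, 3, 4) for choice in responses):
        raise ValueError("each response must be an option number 1-4")
    # Options 1 and 2 are individual attributions; options 3 and 4 are systemic.
    return sum(1 for choice in responses if choice in (3, 4))

# Example: seven systemic choices across ten questions.
print(score_attribution_test([3, 4, 3, 1, 3, 3, 2, 4, 3, 1]))  # prints 7
```

Because only the individual/systemic split matters, the specific option chosen within each pair does not affect the score.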

Question 1

The anthology "Best Short Stories By New Writers" is compiled by a single editor. The editor chooses stories for the anthology from obscure small-press literary magazines.

This year's anthology contains only stories by white writers.

Which of the following explanations would you consider most likely to be true?

1. The editor gave special preference to stories by white authors.

2. The editor selected the best stories without considering race.

3. Non-white authors are underrepresented in small-press literary magazines.

4. White authors are better at writing stories that appeal to an audience of sophisticated, literary readers.

Question 2

Professor Jones teaches an advanced mathematics seminar at a public university. The seminar requires a lower-level prerequisite, as well as permission from Professor Jones, to enroll.

Only half the students who apply for the seminar are admitted.

This year, there are no black or Hispanic students in Professor Jones' course.

Which of the following explanations would you consider most likely to be true?

1. Professor Jones discouraged black and Hispanic students from taking his classes.

2. Professor Jones selected the best students for his seminar, regardless of race.

3. Black and Hispanic students are underrepresented in the lower-level math class required to study with Professor Jones.

4. Black and Hispanic students prioritized fun over schoolwork in the past, leaving them unprepared for Professor Jones' advanced classes.

Question 3

Every year, the president of a small country nominates several high-level army officers to become generals.

This year, all the nominees are white.

Which of the following explanations would you consider most likely to be true?

1. The president gave preferential treatment to the white candidates.

2. The president chose the best candidates available, regardless of race.

3. Non-whites have fewer opportunities to reach the top ranks of the military.

4. Since most of the lower-level officers are white, they would be most effectively led by white generals.

Question 4

Every year, the hockey coach at Middlebrook College offers scholarships to promising young hockey players. The coach chooses from college-bound high-school students who have demonstrated skill at hockey.

This year, all the scholarships go to white hockey players.

Which of the following explanations would you consider most likely to be true?

1. The coach gave special treatment to the white players.

2. The coach picked the best players for the scholarship, regardless of race.

3. Because hockey is an expensive sport, white students have more opportunities to try hockey at the high-school level than non-white students do.

4. White athletes value education more than their non-white peers, so more of them apply to college.

Question 5

A white acquaintance refers to Jamal, a black man, and his friends as "you people." Jamal becomes angry.

Which of the following explanations would you consider most likely to be true?

1. The acquaintance intended to disparage Jamal for his race.

2. Jamal overreacted to a good-humored comment.

3. In the past, Jamal has heard others use the term "you people" in ways that carry racist overtones.

4. The term "you people" may once have been used in a racist way, but its use in other contexts is more important.

Question 6

Ramon, a Hispanic college student, rarely participates in his literature seminar. At the end of the semester, the professor gives Ramon a poor grade.

Which of the following explanations would you consider most likely to be true?

1. The professor gave Ramon a poor grade because of his race.

2. Ramon chose not to participate in class, so he deserved a poor grade.

3. In the past, when Ramon tried to participate in class discussions, he was often ignored by his teachers in favor of his white classmates.

4. Ramon grew up in a community that taught him not to value academic achievement or class participation.

Question 7

Jin, an Asian man, speaks English as his first language. An acquaintance compliments Jin on how well he speaks English. Jin becomes upset.

Which of the following explanations would you consider most likely to be true?

1. The acquaintance meant that it was unusual for an Asian person to speak English well.

2. Jin overreacted to a well-meaning compliment.

3. In the past, Jin has encountered many people who assume he is not American-born because of his ethnic background.

4. Many Asians are not fluent in English, so it is only reasonable to assume that Jin would not be either.

Question 8

Tyrone, a black fourth-grader, is tested for reading disabilities. The results are ambiguous, but the school counselor assigns Tyrone to a special education program.

Which of the following explanations would you consider most likely to be true?

1. The school counselor deliberately referred Tyrone for special education because of his race.

2. The school counselor used excellent professional judgment in making a difficult call.

3. Popular stereotypes portray black students as academically inferior, which influenced the counselor's recommendation.

4. Tyrone's family and peers are negative role models for him, leading him to behave poorly in the classroom and require special attention.

Question 9

Zhou, an Asian man, applies for a promotion to manager. Later, he learns that his boss recommended a white candidate with six months' less experience. Zhou does not get the promotion.

Which of the following explanations would you consider most likely to be true?

1. Zhou's boss discriminated against him because of his race.

2. Zhou's boss recommended the most qualified candidate for the job.

3. Asian men are rarely portrayed as effective leaders, which influenced Zhou's boss's judgment about who could lead the team.

4. Zhou grew up in a culture that encourages deference and obedience, making him unsuited for a leadership position.

Question 10

Deshaun, a black man, is hanging out with a white friend in the park. A police officer searches Deshaun for drugs but leaves his friend alone.

Which of the following explanations would you consider most likely to be true?

1. The police officer only searched Deshaun because of his race.

2. The police officer used his best professional judgment in deciding who to search.

3. Law enforcement is harsher on black men than white men for the same offense.

4. Most drug dealers are young, black and male, so police officers should focus their efforts on young black men.

Appendix D: Attribution Test Validation Data Analysis

Question 1

Individual answers

Gender:
– The editor gave special preference to stories by male authors. (Individual: N = 38; Systemic: N = 3; sig. .000)
– The editor selected the best stories without considering gender. (Individual: N = 28; Systemic: N = 14; sig. .050)

Race:
– The editor gave special preference to stories by white authors. (Individual: N = 36; Systemic: N = 5; sig. .000)
– The editor selected the best stories without considering race. (Individual: N = 36; Systemic: N = 5; sig. .000)

Systemic answers

Gender:
– Female authors are underrepresented in small-press literary magazines. (Individual: N = 3; Systemic: N = 38; sig. .000)
– Male authors are better at writing stories that appeal to an audience of sophisticated, literary readers. (Individual: N = 4; Systemic: N = 37; sig. .000)

Race:
– Non-white authors are underrepresented in small-press literary magazines. (Individual: N = 0; Systemic: N = 41; sig. .000)
– White authors are better at writing stories that appeal to an audience of sophisticated, literary readers. (Individual: N = 4; Systemic: N = 37; sig. .000)

Question 2

Individual answers

Gender:
– Professor Jones discouraged female students from taking his classes. (Individual: N = 37; Systemic: N = 4; sig. .000)
– Professor Jones selected the best students for his seminar, regardless of gender. (Individual: N = 30; Systemic: N = 11; sig. .004)

Race:
– Professor Jones discouraged black and Hispanic students from taking his classes. (Individual: N = 37; Systemic: N = 4; sig. .000)
– Professor Jones selected the best students for his seminar, regardless of race. (Individual: N = 33; Systemic: N = 8; sig. .000)

Systemic answers

Gender:
– Female students are underrepresented in the lower-level math class required to study with Professor Jones. (Individual: N = 4; Systemic: N = 37; sig. .000)
– The female brain is not wired for mathematical excellence, so fewer women are good enough to study with Professor Jones. (Individual: N = 8; Systemic: N = 33; sig. .000)

Race:
– Black and Hispanic students are underrepresented in the lower-level math class required to study with Professor Jones. (Individual: N = 0; Systemic: N = 41; sig. .000)
– Black and Hispanic students prioritized fun over schoolwork in the past, leaving them unprepared for Professor Jones' advanced classes. (Individual: N = 11; Systemic: N = 30; sig. .004)

Question 3

Individual answers

Gender:
– The president gave preferential treatment to the male candidates. (Individual: N = 39; Systemic: N = 2; sig. .000)
– The president chose the best candidates available, regardless of gender. (Individual: N = 29; Systemic: N = 12; sig. .012)

Race:
– The president gave preferential treatment to the white candidates. (Individual: N = 35; Systemic: N = 6; sig. .000)
– The president chose the best candidates available, regardless of race. (Individual: N = 33; Systemic: N = 8; sig. .000)

Systemic answers

Gender:
– Women have fewer opportunities to reach the top ranks of the military. (Individual: N = 4; Systemic: N = 37; sig. .000)
– Since most of the lower-level officers are male, they would be most effectively led by male generals. (Individual: N = 7; Systemic: N = 34; sig. .000)

Race:
– Non-whites have fewer opportunities to reach the top ranks of the military. (Individual: N = 1; Systemic: N = 40; sig. .000)
– Since most of the lower-level officers are white, they would be most effectively led by white generals. (Individual: N = 1; Systemic: N = 40; sig. .000)

Question 4

Individual answers

Gender:
– The coach gave special treatment to the male players. (Individual: N = 36; Systemic: N = 5; sig. .000)
– The coach picked the best players for the scholarship, regardless of gender. (Individual: N = 28; Systemic: N = 13; sig. .028)

Race:
– The coach gave special treatment to the white players. (Individual: N = 36; Systemic: N = 5; sig. .000)
– The coach picked the best players for the scholarship, regardless of race. (Individual: N = 36; Systemic: N = 5; sig. .000)

Systemic answers

Gender:
– Many more high-schools have a hockey team for men than for women. (Individual: N = 1; Systemic: N = 40; sig. .000)
– Students are more interested in attending the men's hockey games, so the coach prioritized putting together a great men's team. (Individual: N = 8; Systemic: N = 33; sig. .000)

Race:
– Because hockey is an expensive sport, white students have more opportunities to try hockey at the high-school level than non-white students do. (Individual: N = 1; Systemic: N = 40; sig. .000)
– White athletes value education more than their non-white peers, so more of them apply to college. (Individual: N = 2; Systemic: N = 39; sig. .000)

Question 5

Individual answers

Gender:
– The acquaintance intended to disparage Nancy for her gender. (Individual: N = 36; Systemic: N = 5; sig. .000)
– Nancy overreacted to a good-humored comment. (Individual: N = 39; Systemic: N = 2; sig. .000)

Race:
– The acquaintance intended to disparage Jamal for his race. (Individual: N = 32; Systemic: N = 9; sig. .000)
– Jamal overreacted to a good-humored comment. (Individual: N = 35; Systemic: N = 8; sig. .000)

216 Systemic answers

Gender:
– Many people use the word “babe” to disparage women, even if Nancy's acquaintance did not mean to. (Individual: N = 8; Systemic: N = 31; sig. .000)
– The term “babe” may once have been used in a sexist way, but its use in other contexts is more important. (Individual: N = 9; Systemic: N = 32; sig. .000)

Race:
– In the past, Jamal has heard others use the term “boy” in ways that carry racist overtones. (Individual: N = 9; Systemic: N = 32; sig. .000)
– The term “boy” may once have been used in a racist way, but its use in other contexts is more important. (Individual: N = 6; Systemic: N = 35; sig. .000)

Question 6

Individual answers

Gender:
– The professor gave Alison a poor grade because of her gender. (Individual: N = 39; Systemic: N = 2; sig. .000)
– Alison chose not to participate in class, so she deserved a poor grade. (Individual: N = 40; Systemic: N = 1; sig. .000)

Race:
– The professor gave Ramon a poor grade because of his race. (Individual: N = 34; Systemic: N = 7; sig. .000)
– Ramon chose not to participate in class, so he deserved a poor grade. (Individual: N = 41; Systemic: N = 0; sig. .000)

Systemic answers

Gender:
– In the past, when Alison tried to participate in class discussions, she was often ignored by her teachers in favor of her male classmates. (Individual: N = 9; Systemic: N = 32; sig. .000)
– Women are not aggressive enough to have their opinions heard in the classroom. (Individual: N = 4; Systemic: N = 37; sig. .000)

Race:
– In the past, when Ramon tried to participate in class discussions, he was often ignored by his teachers in favor of his white classmates. (Individual: N = 11; Systemic: N = 30; sig. .004)
– Ramon grew up in a community that taught him not to value academic achievement or class participation. (Individual: N = 2; Systemic: N = 39; sig. .000)

Question 7

Individual answers

Gender:
– The acquaintance meant that it was unusual for a woman to be good at programming. (Individual: N = 29; Systemic: N = 12; sig. .012)
– Susan overreacted to a well-meaning compliment. (Individual: N = 38; Systemic: N = 3; sig. .000)

Race:
– The acquaintance meant that it was unusual for an Asian person to speak English well. (Individual: N = 31; Systemic: N = 10; sig. .001)
– Jin overreacted to a well-meaning compliment. (Individual: N = 37; Systemic: N = 4; sig. .000)

Systemic answers

Gender:
– Many people assume that women are bad with technology, even if Susan's acquaintance did not. (Individual: N = 6; Systemic: N = 35; sig. .000)
– Women on average are worse with technology than men, so Susan's skill at programming is surprising. (Individual: N = 5; Systemic: N = 36; sig. .000)

Race:
– In the past, Jin has encountered many people who assume he is not American-born because of his ethnic background. (Individual: N = 9; Systemic: N = 32; sig. .000)
– Many Asians are not fluent in English, so it is only reasonable to assume that Jin would not be either. (Individual: N = 8; Systemic: N = 33; sig. .000)

Question 8

Individual answers

Gender:
– The school counselor deliberately referred Amelia for special education because of her gender. (Individual: N = 37; Systemic: N = 4; sig. .000)
– The school counselor used excellent professional judgment in making a difficult call. (Individual: N = 35; Systemic: N = 6; sig. .000)

Race:
– The school counselor deliberately referred Tyrone for special education because of his race. (Individual: N = 34; Systemic: N = 7; sig. .000)
– The school counselor used excellent professional judgment in making a difficult call. (Individual: N = 39; Systemic: N = 2; sig. .000)

Systemic answers

Gender:
– Popular stereotypes portray female students as mathematically inferior, which influenced the counselor's recommendation. (Individual: N = 7; Systemic: N = 34; sig. .000)
– Women are less mathematically adept than men, so it is reasonable to assume that Amelia's ambiguous results mean that she has a problem. (Individual: N = 2; Systemic: N = 39; sig. .000)

Race:
– Popular stereotypes portray black students as academically inferior, which influenced the counselor's recommendation. (Individual: N = 9; Systemic: N = 32; sig. .000)
– Tyrone's family and peers are negative role models for him, leading him to behave poorly in the classroom and require special attention. (Individual: N = 7; Systemic: N = 34; sig. .000)

Question 9

Individual answers

Gender:
– Mary's boss discriminated against her because of her gender. (Individual: N = 36; Systemic: N = 5; sig. .000)
– Mary's boss recommended the most qualified candidate for the job. (Individual: N = 34; Systemic: N = 7; sig. .000)

Race:
– Zhou's boss discriminated against him because of his race. (Individual: N = 37; Systemic: N = 4; sig. .000)
– Zhou's boss recommended the most qualified candidate for the job. (Individual: N = 38; Systemic: N = 3; sig. .000)

Systemic answers

Gender:
– Women are rarely portrayed as effective leaders, which influenced Mary's boss's judgment about who could lead the team. (Individual: N = 8; Systemic: N = 33; sig. .000)
– Men do not like to be led by a woman, so it is better to promote male candidates to manager. (Individual: N = 5; Systemic: N = 36; sig. .000)

Race:
– Asian men are rarely portrayed as effective leaders, which influenced Zhou's boss's judgment about who could lead the team. (Individual: N = 6; Systemic: N = 35; sig. .000)
– Zhou grew up in a culture that encourages deference and obedience, making him unsuited for a leadership position. (Individual: N = 3; Systemic: N = 38; sig. .000)

Question 10

Individual answers

Gender:
– The police officer only warned Hannah about her appearance because of her gender. (Individual: N = 34; Systemic: N = 7; sig. .000)
– The police officer used excellent professional judgment in deciding who to caution. (Individual: N = 33; Systemic: N = 8; sig. .000)

Race:
– The police officer only searched Deshaun because of his race. (Individual: N = 31; Systemic: N = 10; sig. .001)
– The police officer used his best professional judgment in deciding who to search. (Individual: N = 34; Systemic: N = 7; sig. .000)

Systemic answers

Gender:
– Women's bodies are assumed to be on display for men, leading the officer to believe the warning was appropriate. (Individual: N = 8; Systemic: N = 33; sig. .000)
– Dressing provocatively increases the chances a woman will be sexually assaulted, so the officer behaved correctly in warning Hannah. (Individual: N = 5; Systemic: N = 36; sig. .000)

Race:
– Law enforcement is harsher on black men than white men for the same offense. (Individual: N = 2; Systemic: N = 39; sig. .000)
– Most drug dealers are young, black and male, so police officers should focus their efforts on young black men. (Individual: N = 0; Systemic: N = 41; sig. .000)

Appendix E: Modern Sexism Scale (adapted)

Questions 1, 3, 4, 5, 6, and 8 should be coded from “Strongly Disagree” = 5 to “Strongly Agree” = 1. Questions 2 and 7 should be reverse-coded as “Strongly Disagree” = 1 to “Strongly Agree” = 5.

Sum the scores for all questions to get the overall score. A higher score indicates less evidence of bias.
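As a rough illustration of the coding scheme above, here is a minimal Python sketch. The function name and the dictionary-based answer format are assumptions for illustration, not part of the published scale.

```python
# Likert labels in the order presented to respondents.
SCALE = ["Strongly agree", "Somewhat agree", "Neither agree nor disagree",
         "Somewhat disagree", "Strongly disagree"]
REVERSED_ITEMS = {2, 7}  # coded "Strongly Disagree" = 1 ... "Strongly Agree" = 5

def score_modern_sexism(answers):
    """`answers` maps item number (1-8) to the chosen label.

    A higher total indicates less evidence of bias.
    """
    total = 0
    for item, label in answers.items():
        position = SCALE.index(label)  # 0 = "Strongly agree" ... 4 = "Strongly disagree"
        if item in REVERSED_ITEMS:
            total += 5 - position      # "Strongly agree" = 5 ... "Strongly disagree" = 1
        else:
            total += position + 1      # "Strongly agree" = 1 ... "Strongly disagree" = 5
    return total
```

With this coding the overall score ranges from 8 (most evidence of bias) to 40 (least).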

1. Discrimination against women is no longer a problem in the United States.

1 – Strongly agree

2 – Somewhat agree

3 – Neither agree nor disagree

4 – Somewhat disagree

5 – Strongly disagree

2. Women often miss out on good jobs due to sexual discrimination.

1 – Strongly agree

2 – Somewhat agree

3 – Neither agree nor disagree

4 – Somewhat disagree

5 – Strongly disagree

3. It is rare to see women treated in a sexist manner on television.

1 – Strongly agree

2 – Somewhat agree

3 – Neither agree nor disagree

4 – Somewhat disagree

5 – Strongly disagree

4. On average, people in our society treat husbands and wives equally.

1 – Strongly agree

2 – Somewhat agree

3 – Neither agree nor disagree

4 – Somewhat disagree

5 – Strongly disagree

5. Society has reached the point where women and men have equal opportunities for achievement.

1 – Strongly agree

2 – Somewhat agree

3 – Neither agree nor disagree

4 – Somewhat disagree

5 – Strongly disagree

6. It is easy to understand the anger of women's groups in America.

1 – Strongly agree

2 – Somewhat agree

3 – Neither agree nor disagree

4 – Somewhat disagree

5 – Strongly disagree

7. It is easy to understand why women's groups are still concerned about societal limitations of women's opportunities.

1 – Strongly agree

2 – Somewhat agree

3 – Neither agree nor disagree

4 – Somewhat disagree

5 – Strongly disagree

8. Over the past few years, the government and news media have been showing more concern about the treatment of women than is warranted by women's actual experiences.

1 – Strongly agree

2 – Somewhat agree

3 – Neither agree nor disagree

4 – Somewhat disagree

5 – Strongly disagree

Appendix F: Symbolic Racism Test (adapted)

Questions 1, 2, 4, and 8 should be coded from “Strongly Disagree” = 4 to “Strongly Agree” = 1. Questions 5, 6, and 7 should be coded from “Strongly Disagree” = 1 to “Strongly Agree” = 4. Question 3 should be coded as follows:

– “Very much too fast” = 1

– “Moving at about the right speed” = 2

– “Going too slowly” = 3

Sum the scores for each question to get an overall score. A higher score indicates less evidence of bias.
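The mixed coding above (two oppositely-coded blocks of agree/disagree items plus a custom three-point map for question 3) can be sketched in Python. The function name and answer format are illustrative assumptions, not part of the published test.

```python
# Agree/disagree labels in the order presented to respondents.
AGREE_SCALE = ["Strongly agree", "Somewhat agree", "Somewhat disagree", "Strongly disagree"]
# Question 3 uses its own three-point coding, per the instructions above.
ITEM3_CODES = {"Very much too fast": 1, "Moving at about the right speed": 2,
               "Going too slowly": 3}

def score_symbolic_racism(answers):
    """`answers` maps item number (1-8) to the chosen label.

    A higher total indicates less evidence of bias.
    """
    total = 0
    for item, label in answers.items():
        if item == 3:
            total += ITEM3_CODES[label]
        elif item in (1, 2, 4, 8):
            total += AGREE_SCALE.index(label) + 1  # "Strongly agree" = 1 ... "Strongly disagree" = 4
        else:  # items 5, 6, 7
            total += 4 - AGREE_SCALE.index(label)  # "Strongly agree" = 4 ... "Strongly disagree" = 1
    return total
```

Under this coding the overall score ranges from 8 (most evidence of bias) to 31 (least).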

1. It's really a matter of some people not trying hard enough; if blacks would only try harder they could be just as well off as whites.

1 – Strongly agree

2 – Somewhat agree

3 – Somewhat disagree

4 – Strongly disagree

2. Irish, Italian, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same.

1 – Strongly agree

2 – Somewhat agree

3 – Somewhat disagree

4 – Strongly disagree

3. Some say that black leaders have been trying to push too fast. Others feel that they haven't pushed fast enough. What do you think?

1 – Trying to push very much too fast

2 – Going too slowly

3 – Moving at about the right speed

4. How much of the racial tension that exists in the United States today do you think blacks are responsible for creating?

1 – Strongly agree

2 – Somewhat agree

3 – Somewhat disagree

4 – Strongly disagree

5. How much discrimination against blacks do you feel there is in the United States today, limiting their chances to get ahead?

1 – Strongly agree

2 – Somewhat agree

3 – Somewhat disagree

4 – Strongly disagree

6. Generations of slavery and discrimination have created conditions that make it difficult for blacks to work their way out of the lower class.

1 – Strongly agree

2 – Somewhat agree

3 – Somewhat disagree

4 – Strongly disagree

7. Over the past few years, blacks have gotten less than they deserve.

1 – Strongly agree

2 – Somewhat agree

3 – Somewhat disagree

4 – Strongly disagree

8. Over the past few years, blacks have gotten more economically than they deserve.

1 – Strongly agree

2 – Somewhat agree

3 – Somewhat disagree

4 – Strongly disagree

Appendix G: Control Text

Microaggressions are brief, everyday interactions that convey subtle negative messages about race or gender. Microaggressions can be intentional or unintentional; in fact, often the perpetrator of a microaggression does not even realize they have done something demeaning. During his primary campaign against Barack Obama, Joe Biden was asked about Obama's appeal to voters. He responded, “I mean, you got the first mainstream African-American candidate who is articulate and bright and clean and a nice-looking guy.” On the surface, this comment sounds like praise. However, the implication is that Obama is an exception. If being articulate, bright, clean, and nice-looking is exceptional for a black man, then most blacks must be unintelligent, inarticulate, dirty, and unattractive. This is a classic example of a microaggression.

Each individual microaggression may have little impact on the person who experiences it, but when they happen repeatedly over an entire lifetime, microaggressions can cumulatively do great harm. Ongoing exposure to microaggressions hurts people physically, mentally, emotionally, and socially. They produce health problems and reduce life expectancy; deplete mental focus and energy; create feelings of anger while reducing feelings of self-worth; and deny recipients equal access and opportunity in education, employment, and health care.

Schemas are hypotheses that we use to understand the world around us. They give us clues about what to expect and how to interpret other people's behavior. For example, the schema for “politician” often includes dishonesty, leading us to interpret anything a politician says as self-serving.

American culture has strong schemas about gender. For example, women are thought of as more nurturing and men as more aggressive. Because schemas tell us how to interpret behavior, it is easier for us to notice men being aggressive and women being nurturing than the other way around. Similarly, we have ideas about how white people and people of color behave. In ambiguous situations, we interpret people's behavior in a way that matches our racial expectations.

Our culture's model of professional competence contains many qualities from our schemas of “man” and “white person,” and contains fewer qualities from our schemas of “woman” and “person of color.” This makes it easier for us to see professional competence in men and white people than in women and people of color. For example, we think of leadership as part of the male role, making it easier to notice when men are leaders and to interpret ambiguous actions in that light. This difference means we unconsciously overrate the professional ability of men and whites, while underrating the work of women and people of color.

Our everyday evaluations have a cumulative effect on the advancement of the people we judge, even when each individual effect is minor. The importance of the accumulation of advantage and disadvantage is that even small imbalances add up.

Every time we judge someone positively, it casts a slight halo over whatever they do next. They are also in a slightly better position to be thought of positively when the next opportunity to excel arises and to obtain, in turn, the next organizational reward. Further, they will be perceived as having earned their opportunities. Each small advantage generates opportunities for further advantage, even if every evaluation after the first is perfectly fair.

Naturally, a disadvantaged person can still go on to do stunning, brilliant work that, combined with superior interpersonal skills and an in-depth understanding of how institutions work, will guarantee their success. Only a tiny percentage of people, however, turn out stunningly brilliant work, have extensive interpersonal skills, and understand thoroughly how to exploit institutional procedures. Most advancement comes from having a small to medium edge over other employees. Our way of evaluating women and people of color puts them at a disadvantage, compared to men and white people, in acquiring that edge.

Appendix H: Check Questions

Control group questions

Schemas are . . .

1) hypotheses we use to understand the world.

2) trends in American behavior.

3) strategies women use to improve their professional competence.

4) dishonest techniques used by politicians.

As you were reading the text, how much did you want to continue reading?

1) Extremely.

2) A lot.

3) Somewhat.

4) A little bit.

5) Not at all.

How likely would you be to recommend this reading to a friend?

1) Very likely.

2) Likely.

3) Somewhat likely.

4) Unlikely.

5) Not at all likely.

Game condition questions

In the game you just played, money can be used to . . .

1) upgrade your clients' skills.

2) bribe your clients' co-workers.

3) send your clients on vacation.

4) buy office supplies for your clients.

As you were playing the game, how much did you want to continue playing?

1) Extremely.

2) A lot.

3) Somewhat.

4) A little bit.

5) Not at all.

How likely would you be to recommend this game to a friend?

1) Very likely.

2) Likely.

3) Somewhat likely.

4) Unlikely.

5) Not at all likely.

Appendix I: Demographic Questions

• Age? [numeric response]

• Gender? [male, female, other]

• Race? [American Indian, Asian, Black, Hispanic, Native Hawaiian / Pacific Islander, White, Other]

• First language? [English, Other]

• Nationality? [USA, Other with dropdown of ISO country codes]

• In which type of community do you live? [Urban, Rural, Suburban]

Appendix J: Full Data Tables

Table J1: Crosstabulation of Player Source and Player Gender ...... 246
Table J2: Crosstabulation of Player Source and Player Race ...... 246
Table J3: Mean Player Age by Player Source ...... 246
Table J4: ANOVA, Player Age by Player Source ...... 247
Table J5: Crosstabulation of Player Source and Living Area ...... 247
Table J6: Systemic Sexism Pretest Means by Player Source ...... 247
Table J7: ANOVA, Systemic Sexism Pretest Scores by Player Source ...... 247
Table J8: Systemic Racism Pretest Means by Player Source ...... 248
Table J9: ANOVA, Systemic Racism Pretest Means by Player Source ...... 248
Table J10: Modern Sexism Means by Player Source ...... 248
Table J11: ANOVA, Modern Sexism Pretest Score by Player Source ...... 248
Table J12: Symbolic Racism Pretest Means by Player Source ...... 249
Table J13: ANOVA, Symbolic Racism Pretest Score by Player Source ...... 249
Table J14: Crosstabulation of Player Source and Games Won ...... 249
Table J15: Mean Player Score by Player Source ...... 250

Table J16: ANOVA, Player Score by Player Source ...... 250
Table J17: Demographics, Web Players ...... 250
Table J18: Age, Web Players ...... 250
Table J19: Systemic Sexism Pretest Means by Completion Status (web) ...... 251
Table J20: ANOVA, Systemic Sexism Pretest Score by Completion Status (web) ...... 251
Table J21: Systemic Racism Pretest Means by Completion Status (web) ...... 251
Table J22: ANOVA, Systemic Racism Pretest Score by Completion Status (web) ...... 251
Table J23: Modern Sexism Pretest Means by Completion Status (web) ...... 252
Table J24: ANOVA, Modern Sexism Pretest Score by Completion Status (web) ...... 252
Table J25: Symbolic Racism Pretest Means by Completion Status (web) ...... 252
Table J26: ANOVA, Symbolic Racism Pretest Score by Completion Status (web) ...... 252
Table J27: Systemic Sexism Posttest Means by Pretest Group (web) ...... 253
Table J28: ANOVA, Systemic Sexism Posttest Score by Pretest Group (web) ...... 253
Table J29: Systemic Racism Posttest Means by Pretest Group (web) ...... 253
Table J30: ANOVA, Systemic Racism Posttest Score by Pretest Group (web) ...... 253
Table J31: Modern Sexism Posttest Means by Pretest Group (web) ...... 254

Table J32: ANOVA, Modern Sexism Posttest Score by Pretest Group (web) ...... 254
Table J33: Symbolic Racism Posttest Means by Pretest Group (web) ...... 254
Table J34: ANOVA, Symbolic Racism Posttest Score by Pretest Group (web) ...... 254
Table J35: Systemic Sexism Posttest Means by Treatment Condition (web) ...... 255
Table J36: Systemic Sexism Posttest Marginal Means by Treatment Condition (web) ...... 255
Table J37: ANCOVA, Systemic Sexism Posttest Score by Treatment Condition (web) ...... 256
Table J38: Treatment Condition Control vs. Game Contrast, Systemic Sexism Posttest Score (web) ...... 256
Table J39: Systemic Sexism Difference Score T-Tests by Treatment Condition (web) ...... 256
Table J40: Systemic Racism Posttest Means by Treatment Condition (web) ...... 257
Table J41: Systemic Racism Posttest Marginal Means by Treatment Condition (web) ...... 257
Table J42: ANCOVA, Systemic Racism Posttest Score by Treatment Condition (web) ...... 258
Table J43: Treatment Condition Control vs. Game Contrast, Systemic Racism Posttest Score (web) ...... 258
Table J44: Systemic Racism Difference Score Overall T-Test (web) ...... 258
Table J45: Game Data Correlations with Systemic Sexism Posttest Score* (web) ...... 259
Table J46: Systemic Sexism Posttest Means by Number of Plays (web) ...... 259
Table J47: ANCOVA, Systemic Sexism Posttest Score by Number of Plays (web) ...... 259

Table J48: Game Data Correlations with Systemic Racism Posttest Score* (web) ...... 260
Table J49: Systemic Racism Posttest Means by Number of Plays (web) ...... 260
Table J50: ANCOVA, Systemic Racism Posttest Score by Number of Plays (web) ...... 260
Table J51: Mean Score by Player Race and Gender (web) ...... 261
Table J52: ANCOVA, Mean Score by Player Race and Gender (web) ...... 261
Table J53: Mean Clients Placed by Player Race and Gender (web) ...... 261
Table J54: ANCOVA, Mean Clients Placed by Player Race and Gender (web) ...... 262
Table J55: Mean Bias-Group Clients Placed by Player Race and Gender (web) ...... 262
Table J56: ANCOVA, Mean Bias-Group Clients Placed by Player Race and Gender (web) ...... 262
Table J57: Mean Guesses by Player Race and Gender (web) ...... 263
Table J58: ANCOVA, Mean Guesses by Player Race and Gender (web) ...... 263
Table J59: Crosstabulation of Player Race and Games Played (web) ...... 264
Table J60: Crosstabulation of Player Gender and Games Played (web) ...... 264
Table J61: Systemic Sexism Posttest Means by Bias Guess Condition (web) ...... 265
Table J62: Systemic Sexism Posttest Marginal Means by Bias Guess Condition (web) ...... 265
Table J63: ANCOVA, Systemic Sexism Posttest Score by Bias Guess Condition (web) ...... 266

Table J64. Systemic Racism Posttest Means by Bias Guess Condition (web) ...... 266
Table J65. Systemic Racism Posttest Marginal Means by Bias Guess Condition (web) ...... 267
Table J66. ANCOVA, Systemic Racism Posttest Score by Bias Guess Condition (web) ...... 267
Table J67. Game Score Means by Bias Guess Condition (web) ...... 268
Table J68. Game Score Marginal Means by Bias Guess Condition (web) ...... 268
Table J69. ANCOVA, Game Score by Bias Guess Condition (web) ...... 269
Table J70. Bias Guess Condition No Guess vs. Guess Contrast, Game Score (web) ...... 269
Table J71. Mean Score Percentage Earned from Bias Group by Bias Guess Condition (web) ...... 269
Table J72. Marginal Means, Score Percentage Earned from Bias Group by Bias Guess Condition (web) ...... 270
Table J73. ANCOVA, Score Percentage Earned from Bias Group by Bias Guess Condition (web) ...... 270
Table J74. Clients Placed Means by Bias Guess Condition (web) ...... 271
Table J75. Clients Placed Marginal Means by Bias Guess Condition (web) ...... 271
Table J76. ANCOVA, Clients Placed by Bias Guess Condition (web) ...... 272
Table J77. Bias Group Clients Placed by Bias Guess Condition (web) ...... 272
Table J78. Bias Group Clients Placed Marginal Means by Bias Guess Condition (web) ...... 273
Table J79. ANCOVA, Bias Group Clients Placed by Bias Guess Condition (web) ...... 273

Table J80. Modern Sexism Posttest Means by Treatment Condition (web) ...... 273
Table J81. Modern Sexism Posttest Marginal Means by Treatment Condition (web) ...... 274
Table J82. ANCOVA, Modern Sexism Posttest Score by Treatment Condition (web) ...... 274
Table J83. Modern Sexism Difference Score Overall T-Test (web) ...... 275
Table J84. Symbolic Racism Posttest Means by Treatment Condition (web) ...... 275
Table J85. Symbolic Racism Posttest Marginal Means by Treatment Condition (web) ...... 275
Table J86. ANCOVA, Symbolic Racism Posttest Score by Treatment Condition (web) ...... 276
Table J87. Treatment Condition Control vs. Game Contrast, Symbolic Racism Posttest Score (web) ...... 276
Table J88. Symbolic Racism Posttest Score T-Tests by Treatment Condition (web) ...... 276
Table J89. Game Data Correlations with Modern Sexism Posttest Score* (web) ...... 277
Table J90. Modern Sexism Posttest Means by Number of Plays (web) ...... 277
Table J91. ANCOVA, Modern Sexism Posttest Score by Number of Plays (web) ...... 277
Table J92. Game Data Correlations with Symbolic Racism Posttest Score* (web) ...... 278
Table J93. Symbolic Racism Posttest Means by Number of Plays (web) ...... 278
Table J94. ANCOVA, Symbolic Racism Posttest Score by Number of Plays (web) ...... 278
Table J95. Modern Sexism Posttest Means by Bias Guess Condition (web) ...... 279

Table J96. Modern Sexism Posttest Marginal Means by Bias Guess Condition (web) ...... 279
Table J97. ANCOVA, Modern Sexism Posttest Score by Bias Guess Condition (web) ...... 280
Table J98. Bias Guess Condition No Guess vs. Guess Contrast, Modern Sexism Posttest Score (web) ...... 280
Table J99. Symbolic Racism Posttest Means by Bias Guess Condition (web) ...... 280
Table J100. Symbolic Racism Posttest Marginal Means by Bias Guess Condition (web) ...... 281
Table J101. ANCOVA, Symbolic Racism Posttest Score by Bias Guess Condition (web) ...... 281
Table J102. Bias Guess Condition No Guess vs. Guess Contrast, Symbolic Racism Posttest Score (web) ...... 281
Table J103. Demographics, Mechanical Turk Players ...... 282
Table J104. Age, Mechanical Turk Players ...... 282
Table J105. Systemic Sexism Pretest Means by Completion Status (MT) ...... 282
Table J106. ANOVA, Systemic Sexism Pretest Score by Completion Status (MT) ...... 283
Table J107. Systemic Racism Pretest Means by Completion Status (MT) ...... 283
Table J108. ANOVA, Systemic Racism Pretest Score by Completion Status (MT) ...... 283
Table J109. Modern Sexism Pretest Means by Completion Status (MT) ...... 283
Table J110. ANOVA, Modern Sexism Pretest Score by Completion Status (MT) ...... 284
Table J111. Symbolic Racism Pretest Means by Completion Status (MT) ...... 284

Table J112. ANOVA, Symbolic Racism Pretest Score by Completion Status (MT) ...... 284
Table J113. Systemic Sexism Posttest Means by Pretest Group (MT) ...... 284
Table J114. ANOVA, Systemic Sexism Posttest Score by Pretest Group (MT) ...... 285
Table J115. Systemic Racism Posttest Means by Pretest Group (MT) ...... 285
Table J116. ANOVA, Systemic Racism Posttest Score by Pretest Group (MT) ...... 285
Table J117. Modern Sexism Posttest Means by Pretest Group (MT) ...... 285
Table J118. ANOVA, Modern Sexism Posttest Score by Pretest Group (MT) ...... 286
Table J119. Symbolic Racism Posttest Means by Pretest Group (MT) ...... 286
Table J120. ANOVA, Symbolic Racism Posttest Score by Pretest Group (MT) ...... 286
Table J121. Systemic Sexism Posttest Marginal Means by Treatment Condition (MT) ...... 286
Table J122. ANCOVA, Systemic Sexism Posttest Score by Treatment Condition (MT) ...... 287
Table J123. Treatment Condition Control vs. Game Contrast, Systemic Sexism Posttest Score (MT) ...... 287
Table J124. Systemic Sexism Difference Score Overall T-Test (MT) ...... 287
Table J125. Systemic Racism Posttest Marginal Means by Treatment Condition (MT) ...... 288
Table J126. ANCOVA, Systemic Racism Posttest Score by Treatment Condition (MT) ...... 288
Table J127. Treatment Condition Control vs. Game Contrast, Systemic Racism Posttest Score (MT) ...... 288

Table J128. Systemic Racism Difference Score Overall T-Test (MT) ...... 289
Table J129. Game Data Correlations with Systemic Sexism Posttest Score* (MT) ...... 289
Table J130. Systemic Sexism Posttest Means by Number of Plays (MT) ...... 289
Table J131. ANCOVA, Systemic Sexism Posttest Score by Number of Plays (MT) ...... 289
Table J132. Game Data Correlations with Systemic Racism Posttest Score* (MT) ...... 290
Table J133. Systemic Racism Posttest Means by Number of Plays (MT) ...... 290
Table J134. ANCOVA, Systemic Racism Posttest Score by Number of Plays (MT) ...... 290
Table J135. Mean Score by Player Race and Gender (MT) ...... 291
Table J136. ANCOVA, Mean Score by Player Race and Gender (MT) ...... 291
Table J137. Mean Clients Placed by Player Race and Gender (MT) ...... 291
Table J138. ANCOVA, Mean Clients Placed by Player Race and Gender (MT) ...... 292
Table J139. White vs. non-White Contrast, Clients Placed (MT) ...... 292
Table J140. Mean Bias-Group Clients Placed by Player Race and Gender (MT) ...... 292
Table J141. ANCOVA, Mean Bias-Group Clients Placed by Player Race and Gender (MT) ...... 293
Table J142. Mean Guesses by Player Race and Gender (MT) ...... 293
Table J143. ANCOVA, Mean Guesses by Player Race and Gender (MT) ...... 293

Table J144. Crosstabulation of Player Race and Games Played (MT) ...... 294
Table J145. Crosstabulation of Player Gender and Games Played (MT) ...... 294
Table J146. Systemic Racism Posttest Marginal Means by Bias Guess Condition (MT) ...... 295
Table J147. ANCOVA, Systemic Sexism Posttest Score by Bias Guess Condition (MT) ...... 295
Table J148. Bias Guess Condition No Guess vs. Guess Contrast, Systemic Sexism Posttest Score (MT) ...... 295
Table J149. Systemic Sexism Posttest Marginal Means by Player Race (MT) ...... 296
Table J150. Systemic Sexism Difference Score T-Tests by Player Race (MT) ...... 296
Table J151. Systemic Racism Posttest Marginal Means by Bias Guess Condition (MT) ...... 296
Table J152. ANCOVA, Systemic Racism Posttest Score by Bias Guess Condition (MT) ...... 297
Table J153. Bias Guess Condition No Guess vs. Guess Contrast, Systemic Racism Posttest Score (MT) ...... 297
Table J154. Systemic Racism Posttest Means by Player Gender and Player Race (MT) ...... 298
Table J155. Systemic Racism Difference Score T-Tests by Player Race and Gender (MT) ...... 298
Table J156. Game Score Marginal Means by Bias Guess Condition (MT) ...... 299
Table J157. ANCOVA, Game Score by Bias Guess Condition (MT) ...... 299
Table J158. Mean Score Percentage Earned from Bias Group by Bias Guess Condition (MT) ...... 299
Table J159. ANCOVA, Score Percentage Earned from Bias Group by Bias Guess Condition (MT) ...... 300

Table J160. Score Percentage Earned from Bias Group Means by Player Race and Gender (MT) ...... 300
Table J161. Total Clients Placed by Bias Guess Condition (MT) ...... 300
Table J162. ANCOVA, Total Clients Placed by Bias Guess Condition (MT) ...... 301
Table J163. Bias Group Clients Placed by Bias Guess Condition (MT) ...... 301
Table J164. ANCOVA, Bias Group Clients Placed by Bias Guess Condition (MT) ...... 301
Table J165. Modern Sexism Posttest Marginal Means by Treatment Condition (MT) ...... 302
Table J166. ANCOVA, Modern Sexism Posttest Score by Treatment Condition (MT) ...... 302
Table J167. Treatment Condition Control vs. Game Contrast, Modern Sexism Posttest Score (MT) ...... 302
Table J168. Modern Sexism Difference Score T-Tests by Treatment Condition ...... 303
Table J169. Symbolic Racism Posttest Marginal Means by Treatment Condition (MT) ...... 303
Table J170. ANCOVA, Symbolic Racism Posttest Score by Treatment Condition (MT) ...... 304
Table J171. Treatment Condition Control vs. Game Contrast, Symbolic Racism Posttest Score (MT) ...... 304
Table J172. Symbolic Racism Posttest Marginal Means by Player Race (MT) ...... 304
Table J173. Symbolic Racism Difference Score T-Tests by Player Race (MT) ...... 305
Table J174. Game Data Correlations with Modern Sexism Posttest Score* (MT) ...... 305
Table J175. Modern Sexism Posttest Means by Number of Plays (MT) ...... 305

Table J176. ANCOVA, Modern Sexism Posttest Score by Number of Plays (MT) ...... 306
Table J177. Game Data Correlations with Symbolic Racism Posttest Score* (MT) ...... 306
Table J178. Symbolic Racism Posttest Means by Number of Plays (MT) ...... 306
Table J179. ANCOVA, Symbolic Racism Posttest Score by Number of Plays (MT) ...... 306
Table J180. Modern Sexism Posttest Marginal Means by Bias Guess Condition and Player Race (MT) ...... 307
Table J181. ANCOVA, Modern Sexism Posttest Score by Bias Guess Condition (MT) ...... 308
Table J182. Modern Sexism Posttest Marginal Means by Bias Guess Condition (MT) ...... 308
Table J183. Bias Guess Condition No Guess vs. Guess Contrast, Modern Sexism Posttest Score (MT) ...... 308
Table J184. Modern Sexism Posttest Marginal Means by Player Race (MT) ...... 309
Table J185. Player Race Contrast, Modern Sexism Posttest Score (MT) ...... 309
Table J186. Modern Sexism Posttest Means by Guess Condition, White Players Only (MT) ...... 309
Table J187. ANCOVA, Modern Sexism Posttest Score by Bias Guess Condition, White Players Only (MT) ...... 310
Table J188. Modern Sexism Difference Score T-Test, White Players Only ...... 310
Table J189. Modern Sexism Posttest Means by Bias Guess Condition, Black, Hispanic, and Other Players (MT) ...... 310
Table J190. ANCOVA, Modern Sexism Posttest Score by Bias Guess Condition, Black, Hispanic, and Other Players (MT) ...... 310
Table J191. Modern Sexism Difference Score T-Test, Black, Hispanic, and Other Players (MT) ...... 311
Table J192. Symbolic Racism Posttest Marginal Means by Bias Guess Condition and Player Race (MT) ...... 311
Table J193. ANCOVA, Symbolic Racism Posttest Score by Bias Guess Condition (MT) ...... 312
Table J194. Symbolic Racism Posttest Means by Bias Guess Condition (MT) ...... 312
Table J195. Bias Guess Condition No Guess vs. Guess Contrast, Symbolic Racism Posttest Score (MT) ...... 312
Table J196. Symbolic Racism Posttest Marginal Means by Player Race (MT) ...... 313
Table J197. White vs. non-White Player Race Contrast, Symbolic Racism Posttest Score (MT) ...... 313
Table J198. Symbolic Racism Posttest Means by Bias Guess Condition, White Players (MT) ...... 313
Table J199. ANCOVA, Symbolic Racism Posttest Score by Bias Guess Condition, White Players (MT) ...... 314
Table J200. Symbolic Racism Difference Score T-Test, White Players ...... 314
Table J201. Symbolic Racism Posttest Means by Bias Guess Condition, Black, Hispanic, and Other Players (MT) ...... 314
Table J202. ANCOVA, Symbolic Racism Posttest Score by Bias Guess Condition, Black, Hispanic, and Other Players (MT) ...... 315
Table J203. Symbolic Racism Difference Score T-Tests by Bias Guess Condition, Black, Hispanic, and Other Players (MT) ...... 315

Table J1
Crosstabulation of Player Source and Player Gender

                     Player Gender
Player Source        Female   Male     χ²      p      Total
Web                  108      106      1.80a   .180   214
Mechanical Turk      108      81                      189
Total                216      187                     403
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 87.70.
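The Pearson χ² in Table J1 can be recovered directly from the four cell counts. A minimal sketch, assuming SciPy is available (`correction=False` requests the uncorrected Pearson statistic rather than the Yates-corrected one):

```python
from scipy.stats import chi2_contingency

# Observed counts from Table J1 (rows: Web, Mechanical Turk; cols: Female, Male).
observed = [[108, 106],
            [108, 81]]

# correction=False yields the plain Pearson chi-square for the 2x2 table.
chi2, p, df, expected = chi2_contingency(observed, correction=False)

print(round(chi2, 2), round(p, 2), df)   # 1.8 0.18 1, matching the table
print(round(float(expected.min()), 2))   # 87.7, the minimum expected count
```

The minimum expected count reported in footnote a (87.70) is just the smallest entry of the `expected` array, so all of the table's reported quantities follow from the cell counts alone.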

Table J2
Crosstabulation of Player Source and Player Race

                             Player Race
                             Black and
Player Source        White   Hispanic    Other   χ²       p      Total
Web                  194     5           22      10.94a   .004*  221
Mechanical Turk      148     17          26                      191
Total                342     22          48                      412
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 10.20.
* p ≤ .005

Table J3
Mean Player Age by Player Source

Player Source        Mean    SD      N
Web                  32.60   8.73    220
Mechanical Turk      34.72   10.93   190
Total                33.58   9.855   410

Table J4
ANOVA, Player Age by Player Source

Source          df   F      p      η²
Player Source   1    4.79   .029*  0.012
* p ≤ .05
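The η² column in these ANOVA tables can be recovered from the F ratio and its degrees of freedom. A sketch under the assumption of a one-way design, where η² = F·df_effect / (F·df_effect + df_error); the error degrees of freedom are inferred from the Ns of the corresponding means tables (e.g., 410 − 2 = 408 for Table J4, 221 − 2 = 219 for Table J7):

```python
def eta_squared(f, df_effect, df_error):
    """Eta-squared recovered from a one-way ANOVA F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)

# Table J4: Player Age by Player Source, F(1, 408) = 4.79.
print(round(eta_squared(4.79, 1, 408), 3))    # 0.012

# Table J7: Systemic Sexism Pretest by Player Source, F(1, 219) = 33.30.
print(round(eta_squared(33.30, 1, 219), 3))   # 0.132
```

Both values match the reported η² columns, which supports the assumed df_error values.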

Table J5
Crosstabulation of Player Source and Living Area

                             Living Area
Player Source        Rural   Suburban   Urban   χ²      p      Total
Web                  17      118        86      7.54a   .023*  221
Mechanical Turk      31      97         63                     191
Total                48      215        149                    412
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 22.25.
* p ≤ .05

Table J6
Systemic Sexism Pretest Means by Player Source

Player Source        Mean   SD     N
Web                  2.89   1.29   122
Mechanical Turk      1.93   1.16   99
Total                2.46   1.32   221

Table J7
ANOVA, Systemic Sexism Pretest Scores by Player Source

Source          df   F       p      η²
Player Source   1    33.30   .000*  0.132
* p ≤ .001

Table J8
Systemic Racism Pretest Means by Player Source

Player Source        Mean   SD     N
Web                  3.08   1.28   122
Mechanical Turk      1.90   1.25   99
Total                2.55   1.40   221

Table J9
ANOVA, Systemic Racism Pretest Means by Player Source

Source          df   F       p      η²
Player Source   1    47.56   .000*  0.178
* p ≤ .001

Table J10
Modern Sexism Pretest Means by Player Source

Player Source        Mean    SD     N
Web                  31.82   4.15   122
Mechanical Turk      27.43   5.56   99
Total                29.86   5.29   221

Table J11
ANOVA, Modern Sexism Pretest Score by Player Source

Source          df   F       p      η²
Player Source   1    45.08   .000*  0.171
* p ≤ .001

Table J12
Symbolic Racism Pretest Means by Player Source

Player Source        Mean    SD     N
Web                  26.28   4.38   122
Mechanical Turk      20.12   4.99   99
Total                23.52   5.57   221

Table J13
ANOVA, Symbolic Racism Pretest Score by Player Source

Source          df   F       p      η²
Player Source   1    95.39   .000*  0.303
* p ≤ .001

Table J14
Crosstabulation of Player Source and Games Won

                          Won Game
Player Source        No Wins   Wins   χ²      p      Total
Web                  11        171    9.71a   .002*  182
Mechanical Turk      27        136                   163
Total                38        307                   345
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 17.95.
* p ≤ .05

Table J15
Mean Player Score by Player Source

Player Source        Mean        SD           N
Web                  254490.33   1004718.87   123
Mechanical Turk      870099.71   4287142.14   105
Total                537994.65   3009601.50   228

Table J16
ANOVA, Player Score by Player Source

Source          df   F      p      η²
Player Source   1    2.38   .124   0.010

Table J17
Demographics, Web Players

                                        N     %
Player Gender   Female                  108   50.50
                Male                    106   49.50
Player Race     White                   194   87.80
                Black and Hispanic      5     2.30
                Other                   22    10.00
Living Area     Rural                   17    7.70
                Suburban                118   53.40
                Urban                   86    38.90

Table J18
Age, Web Players

      N     Min   Max   Mean    SD
Age   220   19    68    32.60   8.73

Table J19
Systemic Sexism Pretest Means by Completion Status (web)

Completed   Mean   SD     N
No          2.98   1.34   88
Yes         3.06   1.30   157
Total       3.03   1.31   245

Table J20
ANOVA, Systemic Sexism Pretest Score by Completion Status (web)

Source      df   F      p      η²
Completed   1    0.21   .647   0.001

Table J21
Systemic Racism Pretest Means by Completion Status (web)

Completed   Mean   SD     N
No          2.97   1.36   88
Yes         3.24   1.29   157
Total       3.14   1.32   245

Table J22
ANOVA, Systemic Racism Pretest Score by Completion Status (web)

Source      df   F      p      η²
Completed   1    2.49   .116   0.010

Table J23
Modern Sexism Pretest Means by Completion Status (web)

Completed   Mean    SD     N
No          31.93   4.30   88
Yes         32.18   3.86   157
Total       32.09   4.02   245

Table J24
ANOVA, Modern Sexism Pretest Score by Completion Status (web)

Source      df   F      p      η²
Completed   1    0.22   .637   0.001

Table J25
Symbolic Racism Pretest Means by Completion Status (web)

Completed   Mean    SD     N
No          25.17   4.33   88
Yes         26.42   4.08   157
Total       25.97   4.21   245

Table J26
ANOVA, Symbolic Racism Pretest Score by Completion Status (web)

Source      df   F      p      η²
Completed   1    5.06   .025*  0.020
* p ≤ .05

Table J27
Systemic Sexism Posttest Means by Pretest Group (web)

Pretest Group   Mean   SD     N
No Pretest      3.50   1.24   98
Pretest         3.12   1.42   122
Total           3.29   1.35   220

Table J28
ANOVA, Systemic Sexism Posttest Score by Pretest Group (web)

Source          df   F     p      η²
Pretest Group   1    4.3   .039*  0.019
* p ≤ .05

Table J29
Systemic Racism Posttest Means by Pretest Group (web)

Pretest Group   Mean   SD     N
No Pretest      3.56   1.32   98
Pretest         3.00   1.43   122
Total           3.25   1.40   220

Table J30
ANOVA, Systemic Racism Posttest Score by Pretest Group (web)

Source          df   F      p      η²
Pretest Group   1    9.01   .003*  0.040
* p ≤ .005

Table J31
Modern Sexism Posttest Means by Pretest Group (web)

Pretest Group   Mean    SD     N
No Pretest      32.45   4.06   98
Pretest         31.98   4.14   122
Total           32.19   4.10   220

Table J32
ANOVA, Modern Sexism Posttest Score by Pretest Group (web)

Source          df   F       p      η²
Pretest Group   1    0.698   .404   0.003

Table J33
Symbolic Racism Posttest Means by Pretest Group (web)

Pretest Group   Mean    SD     N
No Pretest      22.99   1.74   98
Pretest         23.11   2.15   122
Total           23.06   1.97   220

Table J34
ANOVA, Symbolic Racism Posttest Score by Pretest Group (web)

Source          df   F       p      η²
Pretest Group   1    0.217   .642   0.001

Table J35
Systemic Sexism Posttest Means by Treatment Condition (web)

Treatment Condition   Mean   SD     N
Control               3.73   1.18   41
Informational         2.92   1.26   25
Financial             2.30   1.41   27
Generative            3.33   1.37   24
Total                 3.15   1.39   117

Table J36
Systemic Sexism Posttest Marginal Means by Treatment Condition (web)

                                                95% Confidence Interval
Treatment Condition   Mean      Standard Error  Lower Bound   Upper Bound
Control               3.64a,b   0.38            2.90          4.38
Informational         .a,c      .               .             .
Financial             .a,c      .               .             .
Generative            3.13a,b   0.34            2.46          3.79
a. Covariates appearing in the model are evaluated at the following values: Systemic Sexism Pretest Score = 2.8974.
b. Based on modified population marginal mean.
c. This modified population marginal mean is not estimable.

Table J37
ANCOVA, Systemic Sexism Posttest Score by Treatment Condition (web)

Source                    df   F       p        η²
Condition                 3    3.70    .014*    0.099
Gender                    1    1.08    .301     0.011
Race                      2    1.04    .357     0.020
Systemic Sexism Pretest   1    54.67   .000**   0.351
Gender * Race             1    0.56    .455     0.006
Condition * Gender        3    2.06    .110     0.058
Condition * Race          3    0.77    .514     0.022
* p ≤ .05
** p ≤ .001

Table J38
Treatment Condition Control vs. Game Contrast, Systemic Sexism Posttest Score (web)

Source     df   F      p      η²
Contrast   1    5.19   .025*  0.049
* p ≤ .05

Table J39
Systemic Sexism Difference Score T-Tests by Treatment Condition (web)

Test Value = 0
                      N    Mean    SD     t       df   p
Control Group         43   0.58    1.10   3.48    42   .001*
Informational Group   25   0.04    1.31   0.15    24   .880
Financial Group       28   -0.25   1.24   -1.07   27   .294
Generative Group      26   0.35    1.26   1.40    25   .175
* p ≤ .001
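The one-sample t-tests in Table J39 test each group's mean difference score against zero, and can be reconstructed from the summary statistics alone. A minimal sketch, assuming SciPy; the helper name `one_sample_t` is illustrative, and the result differs slightly from the table's t = 3.48 because the reconstruction starts from the two-decimal mean and SD:

```python
import math
from scipy.stats import t as t_dist

def one_sample_t(mean, sd, n, mu=0.0):
    """One-sample t-test against mu, computed from summary statistics."""
    t_stat = (mean - mu) / (sd / math.sqrt(n))
    p = 2 * t_dist.sf(abs(t_stat), df=n - 1)
    return t_stat, p

# Control group in Table J39: mean difference 0.58, SD 1.10, N = 43.
t_stat, p = one_sample_t(0.58, 1.10, 43)
print(round(t_stat, 2), round(p, 3))  # approximately 3.46, p = .001
```

The recovered p-value matches the tabled .001 even though the t statistic drifts by a few hundredths from rounding in the published summaries.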

Table J40
Systemic Racism Posttest Means by Treatment Condition (web)

Treatment Condition   Mean   SD     N
Control               3.24   1.37   41
Informational         2.92   1.44   25
Financial             2.56   1.55   27
Generative            3.00   1.38   24
Total                 2.97   1.44   117

Table J41
Systemic Racism Posttest Marginal Means by Treatment Condition (web)

                                                95% Confidence Interval
Treatment Condition   Mean       Standard Error  Lower Bound   Upper Bound
Control               3.013a,b   0.45            2.11          3.92
Informational         .a,c       .               .             .
Financial             .a,c       .               .             .
Generative            3.389a,b   0.40            2.59          4.19
a. Covariates appearing in the model are evaluated at the following values: Systemic Racism Pretest Score = 3.0940.
b. Based on modified population marginal mean.
c. This modified population marginal mean is not estimable.

Table J42
ANCOVA, Systemic Racism Posttest Score by Treatment Condition (web)

Source                    df   F       p      η²
Condition                 3    1.55    .207   0.044
Gender                    1    0.17    .680   0.002
Race                      2    0.07    .932   0.001
Systemic Racism Pretest   1    27.01   .000*  0.211
Gender * Race             1    0.19    .665   0.002
Condition * Gender        3    0.32    .808   0.010
Condition * Race          3    0.98    .406   0.028
* p ≤ .001

Table J43
Treatment Condition Control vs. Game Contrast, Systemic Racism Posttest Score (web)

Source     df   F       p      η²
Contrast   1    0.001   .972   .000

Table J44
Systemic Racism Difference Score Overall T-Test (web)

Test Value = 0
                                   N     Mean    SD     t       df    p
Systemic Racism Difference Score   122   -0.08   1.39   -0.65   121   .517

Table J45
Game Data Correlations with Systemic Sexism Posttest Score* (web)

                                      df    r        p
Player Score                          70    -0.180   .130
Total Clients Placed                  99    -0.038   .706
Total Clients Placed (Biased Group)   99    -0.054   .590
Number of Guesses                     100   0.157    .115
* Controlling for Systemic Sexism Pretest Score

Table J46
Systemic Sexism Posttest Means by Number of Plays (web)

Number of Plays   Mean   SD     N
Two               3.13   1.42   120
More Than Two     3.00   1.41   2
Total             3.12   1.42   122

Table J47
ANCOVA, Systemic Sexism Posttest Score by Number of Plays (web)

Source                    df   F       p      η²
Systemic Sexism Pretest   1    62.08   .000*  0.343
More Than Two             1    0.06    .814   0.000
* p ≤ .001

Table J48
Game Data Correlations with Systemic Racism Posttest Score* (web)

                                      df    r        p
Player Score                          70    -0.059   .621
Total Clients Placed                  99    0.032    .753
Total Clients Placed (Biased Group)   99    -0.039   .697
Number of Guesses                     100   0.165    .097
* Controlling for Systemic Racism Pretest Score

Table J49
Systemic Racism Posttest Means by Number of Plays (web)

Number of Plays   Mean   SD     N
Two               3.02   1.43   120
More Than Two     2.00   0      2
Total             3.00   1.43   122

Table J50
ANCOVA, Systemic Racism Posttest Score by Number of Plays (web)

Source                    df   F       p      η²
Systemic Racism Pretest   1    36.98   .000*  0.237
More Than Two             1    2.89    .092   0.024
* p ≤ .001

Table J51
Mean Score by Player Race and Gender (web)

Player Gender   Player Race          Mean        SD           N
Female          White                272694.04   1072307.04   54
                Black and Hispanic   76740.00    -            1
                Other                67562.50    18428.25     4
Male            White                283028.52   1079014.54   54
                Black and Hispanic   39420.00    -            1
                Other                98473.33    48092.55     6
Total                                258219.08   1017017.13   120

Table J52
ANCOVA, Mean Score by Player Race and Gender (web)

Source          df   F     p      η²
Race            2    .19   .824   .003
Gender          1    .00   .998   .000
Race * Gender   2    .00   .999   .000

Table J53
Mean Clients Placed by Player Race and Gender (web)

Player Gender   Player Race          Mean    SD      N
Female          White                32.95   17.76   76
                Black and Hispanic   34.00   9.90    2
                Other                19.50   16.08   6
Male            White                36.90   19.42   81
                Black and Hispanic   18.00   -       1
                Other                35.56   19.82   10
Total                                34.90   18.67   176

Table J54
ANCOVA, Mean Clients Placed by Player Race and Gender (web)

Source          df   F      p      η²
Race            2    .67    .514   .008
Gender          1    .10    .756   .001
Race * Gender   2    3.05   .055   .036

Table J55
Mean Bias-Group Clients Placed by Player Race and Gender (web)

Player Gender   Player Race          Mean    SD     N
Female          White                12.75   8.00   76
                Black and Hispanic   11.00   2.82   2
                Other                6.67    6.62   6
Male            White                14.39   9.24   81
                Black and Hispanic   8.00    -      1
                Other                14.70   9.37   10
Total                                13.36   8.65   176

Table J56
ANCOVA, Mean Bias-Group Clients Placed by Player Race and Gender (web)

Source          df   F      p      η²
Race            2    1.03   .361   .012
Gender          1    .34    .564   .002
Race * Gender   2    1.06   .349   .012

Table J57
Mean Guesses by Player Race and Gender (web)

Player Gender   Player Race          Mean   SD     N
Female          White                1.96   2.13   77
                Black and Hispanic   3.00   .00    2
                Other                .67    .82    6
Male            White                1.63   2.08   81
                Black and Hispanic   3.00   -      1
                Other                1.90   2.73   10
Total                                1.78   2.10   177

Table J58
ANCOVA, Mean Guesses by Player Race and Gender (web)

Source          df   F     p      η²
Race            2    .86   .424   .010
Gender          1    .10   .750   .001
Race * Gender   2    .94   .391   .011


Table J59
Crosstabulation of Player Race and Games Played (web)

                         Games Played
Player Race          2     >2    χ²      p      Total
White                189   5     .433a   .806   194
Black and Hispanic   5     0                    5
Other                21    1                    22
Total                215   6                    221
a. 3 cells (50.0%) have expected count less than 5. The minimum expected count is .14.

Table J60
Crosstabulation of Player Gender and Games Played (web)

                     Games Played
Player Gender        2     >2    χ²      p      Total
Female               105   3     .001a   .981   108
Male                 103   3                    106
Total                208   6                    214
a. 2 cells (50.0%) have expected count less than 5. The minimum expected count is 2.97.

Table J61
Systemic Sexism Posttest Means by Bias Guess Condition (web)

Bias Guess Condition   Mean   SD     N
No Guess               2.45   1.37   22
Informational Guess    3.00   1.26   20
Financial Guess        2.29   1.49   17
Generative Guess       3.56   1.15   16
Total                  2.80   1.39   75

Table J62
Systemic Sexism Posttest Marginal Means by Bias Guess Condition (web)

                                                95% Confidence Interval
Guessing Condition    Mean      Standard Error  Lower Bound   Upper Bound
No Guess              2.37a,b   0.36            1.66          3.09
Informational Guess   .a,c      .               .             .
Financial Guess       .a,c      .               .             .
Generative Guess      3.24a,b   0.42            2.39          4.09
a. Covariates appearing in the model are evaluated at the following values: Systemic Sexism Pretest Score = 2.7200.
b. Based on modified population marginal mean.
c. This modified population marginal mean is not estimable.

Table J63
ANCOVA, Systemic Sexism Posttest Score by Bias Guess Condition (web)

Source                          df   F       p      η²
Bias Guess Condition            3    1.50    .225   0.071
Gender                          1    0.50    .482   0.008
Race                            2    0.53    .592   0.018
Systemic Sexism Pretest         1    29.15   .000*  0.331
Gender * Race                   1    0.59    .446   0.010
Bias Guess Condition * Gender   3    2.43    .074   0.110
Bias Guess Condition * Race     3    0.59    .622   0.029
* p ≤ .001

Table J64
Systemic Racism Posttest Means by Bias Guess Condition (web)

Bias Guess Condition   Mean   SD     N
No Guess               2.41   1.44   22
Informational Guess    2.80   1.58   20
Financial Guess        2.76   1.48   17
Generative Guess       3.38   1.26   16
Total                  2.80   1.46   75

Table J65
Systemic Racism Posttest Marginal Means by Bias Guess Condition (web)

                                                95% Confidence Interval
Guessing Condition    Mean      Standard Error  Lower Bound   Upper Bound
No Guess              2.83a,b   0.43            1.96          3.70
Informational Guess   .a,c      .               .             .
Financial Guess       .a,c      .               .             .
Generative Guess      3.36a,b   0.51            2.33          4.38
a. Covariates appearing in the model are evaluated at the following values: Systemic Racism Pretest Score = 2.9067.
b. Based on modified population marginal mean.
c. This modified population marginal mean is not estimable.

Table J66
ANCOVA, Systemic Racism Posttest Score by Bias Guess Condition (web)

Source                          df   F       p      η²
Bias Guess Condition            3    0.87    .464   0.042
Gender                          1    0.70    .406   0.012
Race                            2    0.40    .675   0.013
Systemic Racism Pretest         1    15.16   .000*  0.204
Gender * Race                   1    0.06    .803   0.001
Bias Guess Condition * Gender   3    0.12    .945   0.006
Bias Guess Condition * Race     3    0.69    .561   0.034
* p ≤ .001

Table J67
Game Score Means by Bias Guess Condition (web)

Guessing Condition    Mean        SD           N
No Guess              82763.93    29562.34     28
Informational Guess   298966.45   1251433.62   31
Financial Guess       369154.00   1463081.04   25
Generative Guess      282558.06   818733.96    36
Total                 258219.08   1017017.13   120

Table J68
Game Score Marginal Means by Bias Guess Condition (web)

                                                 95% Confidence Interval
Guessing Condition    Mean        Standard Error  Lower Bound   Upper Bound
No Guess              99792.33a   305353.91       -505601.49    705186.14
Informational Guess   .b          .               .             .
Financial Guess       .b          .               .             .
Generative Guess      189722.95a  327346.89       -459274.09    838719.99
a. Based on modified population marginal mean.
b. This modified population marginal mean is not estimable.

Table J69
ANCOVA, Game Score by Bias Guess Condition (web)

Source                          df   F      p      η²
Bias Guess Condition            3    0.07   .978   0.002
Gender                          1    0.02   .883   0.000
Race                            2    0.07   .929   0.001
Bias Guess Condition * Gender   3    0.72   .543   0.020
Bias Guess Condition * Race     2    0.07   .934   0.001
Gender * Race                   1    0.04   .853   0.000

Table J70
Bias Guess Condition No Guess vs. Guess Contrast, Game Score (web)

Source     df   F      p      η²
Contrast   1    0.05   .830   0.000

Table J71
Mean Score Percentage Earned from Bias Group by Bias Guess Condition (web)

Guess Condition       Mean   SD     N
No Guess              0.18   0.13   28
Informational Guess   0.18   0.14   31
Financial Guess       0.19   0.15   25
Generative Guess      0.16   0.10   36
Total                 0.18   0.13   120

Table J72
Marginal Means, Score Percentage Earned from Bias Group by Bias Guess Condition (web)

                                              95% Confidence Interval
Guessing Condition    Mean   Standard Error   Lower Bound   Upper Bound
No Guess              .19a   0.037            0.11          0.26
Informational Guess   .b     .                .             .
Financial Guess       .b     .                .             .
Generative Guess      .16a   0.04             0.08          0.24
a. Based on modified population marginal mean.
b. This modified population marginal mean is not estimable.

Table J73
ANCOVA, Score Percentage Earned from Bias Group by Bias Guess Condition (web)

Source                          df   F      p      η²
Bias Guess Condition            3    0.28   .838   0.008
Gender                          1    0.23   .632   0.002
Race                            2    0.45   .641   0.008
Bias Guess Condition * Gender   3    1.69   .173   0.046
Bias Guess Condition * Race     2    0.73   .484   0.014
Gender * Race                   1    0.00   .972   0.000

Table J74
Clients Placed Means by Bias Guess Condition (web)

Guess Condition       Mean    SD      N
No Guess              39.38   16.13   29
Informational Guess   33.37   19.43   35
Financial Guess       30.96   19.81   28
Generative Guess      36.87   18.62   39
Total                 35.23   18.63   131

Table J75
Clients Placed Marginal Means by Bias Guess Condition (web)

                                               95% Confidence Interval
Guessing Condition    Mean     Standard Error  Lower Bound   Upper Bound
No Guess              38.09a   5.32            27.55         48.64
Informational Guess   .b       .               .             .
Financial Guess       .b       .               .             .
Generative Guess      37.00a   5.72            25.67         48.33
a. Based on modified population marginal mean.
b. This modified population marginal mean is not estimable.

Table J76
ANCOVA, Clients Placed by Bias Guess Condition (web)

Source                          df   F      p      η²
Bias Guess Condition            3    1.27   .290   0.032
Gender                          1    1.88   .174   0.016
Race                            2    1.24   .292   0.021
Bias Guess Condition * Gender   3    0.02   .996   0.001
Bias Guess Condition * Race     3    0.85   .472   0.021
Gender * Race                   1    0.27   .606   0.002

Table J77
Bias Group Clients Placed by Bias Guess Condition (web)

Guess Condition       Mean    SD      N
No Guess              15.31   7.75    29
Informational Guess   13.09   8.57    35
Financial Guess       12.39   10.47   28
Generative Guess      13.44   8.01    39
Total                 13.53   8.65    131

Table J78
Bias Group Clients Placed Marginal Means by Bias Guess Condition (web)

                                               95% Confidence Interval
Guessing Condition    Mean     Standard Error  Lower Bound   Upper Bound
No Guess              13.84a   2.52            8.84          18.83
Informational Guess   .b       .               .             .
Financial Guess       .b       .               .             .
Generative Guess      13.42a   2.71            8.05          18.79
a. Based on modified population marginal mean.
b. This modified population marginal mean is not estimable.

Table J79
ANCOVA, Bias Group Clients Placed by Bias Guess Condition (web)

Source                          df   F      p      η²
Bias Guess Condition            3    0.59   .620   0.015
Gender                          1    1.09   .298   0.009
Race                            2    1.42   .247   0.024
Bias Guess Condition * Gender   3    0.03   .992   0.001
Bias Guess Condition * Race     3    0.53   .665   0.013
Gender * Race                   1    0.15   .697   0.001

Table J80
Modern Sexism Posttest Means by Treatment Condition (web)

Treatment Condition   Mean    SD     N
Control               32.20   3.79   41
Informational         31.56   3.49   25
Financial             31.00   4.91   27
Generative            32.54   4.39   24
Total                 31.85   4.16   117

Table J81
Modern Sexism Posttest Marginal Means by Treatment Condition (web)

                                                 95% Confidence Interval
Treatment Condition   Mean       Standard Error  Lower Bound   Upper Bound
Control               32.18a,b   0.33            31.52         32.84
Informational         .a,c       .               .             .
Financial             .a,c       .               .             .
Generative            31.86a,b   0.30            31.28         32.45
a. Covariates appearing in the model are evaluated at the following values: Modern Sexism Pretest Score = 31.7265.
b. Based on modified population marginal mean.
c. This modified population marginal mean is not estimable.

Table J82
ANCOVA, Modern Sexism Posttest Score by Treatment Condition (web)

Source                 df   F         p      η²
Condition              3    0.53      .666   0.015
Gender                 1    0.10      .755   0.001
Race                   2    2.25      .110   0.043
Modern Sexism Pretest  1    1996.76   .000*  0.952
Gender * Race          1    0.47      .496   0.005
Condition * Gender     3    0.20      .894   0.006
Condition * Race       3    0.37      .777   0.011
* p ≤ .001

Table J83
Modern Sexism Difference Score Overall T-Test (web)

Test Value = 0
                                 N     Mean   SD     t      df    p
Modern Sexism Difference Score   122   0.16   1.03   1.76   121   .082

Table J84
Symbolic Racism Posttest Means by Treatment Condition (web)

Treatment Condition   Mean    SD     N
Control               23.12   1.98   41
Informational         23.44   2.48   25
Financial             22.78   2.26   27
Generative            22.88   2.15   24
Total                 23.06   2.18   117

Table J85

Symbolic Racism Posttest Marginal Means by Treatment Condition (web) Treatment Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

Control 23.11a,b 0.52 22.09 24.14

Informational .a,c . . .

Financial .a,c . . .

Generative 22.43a,b 0.46 21.52 23.34
a. Covariates appearing in the model are evaluated at the following values: Symbolic Racism Pretest Score = 26.2137.
b. Based on modified population marginal mean.
c. This modified population marginal mean is not estimable.

Table J86

ANCOVA, Symbolic Racism Posttest Score by Treatment Condition (web) Source df F p η2

Condition 3 2.99 .034* 0.082

Gender 1 1.65 .202 0.016

Race 2 0.24 .784 0.005

Symbolic Racism Pretest 1 105.00 .000** 0.510

Gender * Race 1 0.18 .675 0.002

Condition * Gender 3 0.29 .832 0.009

Condition * Race 3 2.09 .106 0.058

* p≤ .05

** p≤ .001

Table J87

Treatment Condition Control vs Game Contrast, Symbolic Racism Posttest Score (web) Source df F p η2

Contrast 1 0.08 .773 0.001

Table J88 Symbolic Racism Difference Score T-Tests by Treatment Condition (web) Test Value = 0

N Mean SD t df p

Control Group 43 -3.35 3.21 -6.85 42 .000*

Informational Group 25 -2.60 3.01 -4.31 24 .000*

Financial Group 28 -2.86 3.31 -4.57 27 .000*

Generative Group 26 -3.73 2.96 -6.43 25 .000*

* p≤ .001

Table J89

Game Data Correlations with Modern Sexism Posttest Score* (web) df r p

Player Score 70 -0.121 .313

Total Clients Placed 99 -0.021 .834

Total Clients Placed (Biased Group) 99 -0.011 .916

Number of Guesses 100 0.042 .673

*Controlling for Modern Sexism Pretest Score

Table J90

Modern Sexism Posttest Means by Number of Plays (web) Number of Plays Mean SD N

Two 31.98 4.17 120

More Than Two 32.00 2.83 2

Total 31.98 4.14 122

Table J91

ANCOVA, Modern Sexism Posttest Score by Number of Plays (web) Source df F p η2

Modern Sexism Pretest 1 1844.43 .000* 0.939

More Than Two 1 0.79 .375 0.007

* p≤ .001

Table J92

Game Data Correlations with Symbolic Racism Posttest Score* (web) df r p

Player Score 70 0.081 .497

Total Clients Placed 99 -0.085 .398

Total Clients Placed (Biased Group) 99 0.039 .700

Number of Guesses 100 -0.106 .288

*Controlling for Symbolic Racism Pretest Score

Table J93

Symbolic Racism Posttest Means by Number of Plays (web) Number of Plays Mean SD N

Two 23.13 2.15 120

More Than Two 22.00 2.83 2

Total 23.11 2.15 122

Table J94

ANCOVA, Symbolic Racism Posttest Score by Number of Plays (web) Source df F p η2

Symbolic Racism Pretest 1 146.50 .000* 0.552

More Than Two 1 0.41 .525 0.003

* p≤ .001

Table J95

Modern Sexism Posttest Means by Bias Guess Condition (web) Guess Condition Mean SD N

No Guess 32.00 3.88 22

Informational Guess 31.35 4.02 20

Financial Guess 30.47 5.15 17

Generative Guess 32.81 4.65 16

Total 31.65 4.38 75

Table J96

Modern Sexism Posttest Marginal Means by Bias Guess Condition (web) Guessing Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess 31.656a,b 0.30 31.06 32.25

Informational Guess .a,c . . .

Financial Guess .a,c . . .

Generative Guess 31.625a,b 0.34 30.94 32.31
a. Covariates appearing in the model are evaluated at the following values: Modern Sexism Pretest Score = 31.6400.
b. Based on modified population marginal mean.
c. This modified population marginal mean is not estimable.

Table J97

ANCOVA, Modern Sexism Posttest Score by Bias Guess Condition (web) Source df F p η2

Bias Guess Condition 3 0.03 .994 0.001

Gender 1 0.05 .816 0.001

Race 2 2.47 .093 0.077

Modern Sexism Pretest 1 1399.11 .000* 0.960

Gender * Race 1 0.18 .669 0.003

Bias Guess Condition * Gender 3 0.50 .685 0.025

Bias Guess Condition * Race 3 0.70 .554 0.034

* p≤ .001

Table J98

Bias Guess Condition No Guess vs Guess Contrast, Modern Sexism Posttest Score (web) Source df F p η2

Contrast 1 0.01 .940 0.000

Table J99

Symbolic Racism Posttest Means by Bias Guess Condition (web) Guess Condition Mean SD N

No Guess 22.45 2.20 22

Informational Guess 23.35 2.72 20

Financial Guess 23.12 2.00 17

Generative Guess 23.25 2.27 16

Total 23.01 2.30 75

Table J100

Symbolic Racism Posttest Marginal Means by Bias Guess Condition (web) Guessing Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess 22.64a,b 0.46 21.73 23.55

Informational Guess .a,c . . .

Financial Guess .a,c . . .

Generative Guess 22.71a,b 0.53 21.65 23.78
a. Covariates appearing in the model are evaluated at the following values: Symbolic Racism Pretest Score = 26.0933.
b. Based on modified population marginal mean.
c. This modified population marginal mean is not estimable.

Table J101

ANCOVA, Symbolic Racism Posttest Score by Bias Guess Condition (web) Source df F p η2

Bias Guess Condition 3 1.53 .217 0.072

Gender 1 0.01 .905 0.000

Race 2 0.39 .680 0.013

Symbolic Racism Pretest 1 88.99 .000* 0.601

Gender * Race 1 0.41 .526 0.007

Bias Guess Condition * Gender 3 0.77 .513 0.038

Bias Guess Condition * Race 3 0.82 .486 0.040

* p≤ .001

Table J102

Bias Guess Condition No Guess vs Guess Contrast, Symbolic Racism Posttest Score (web) Source df F p η2

Contrast 1 1.60 .212 0.026

Table J103

Demographics, Mechanical Turk Players N %

Player Gender Female 108 57.10

Male 81 42.90

Player Race White 148 77.50

Black and Hispanic 17 8.90

Other 26 13.60

Living Area Rural 31 16.20

Suburban 97 50.80

Urban 63 33.00

Table J104

Age, Mechanical Turk Players N Min Max Mean SD

Age 190 18 76 34.72 10.93

Table J105

Systemic Sexism Pretest Means by Completion Status (MT) Completed Mean SD N

No 1.97 1.12 32

Yes 1.91 1.17 100

Total 1.92 1.16 132

Table J106

ANOVA, Systemic Sexism Pretest Score by Completion Status (MT) Source df F p η2

Completed 1 0.06 .804 0.000


Table J107

Systemic Racism Pretest Means by Completion Status (MT) Completed Mean SD N

No 2.25 1.52 32

Yes 1.90 1.24 100

Total 1.98 1.32 132

Table J108

ANOVA, Systemic Racism Pretest Score by Completion Status (MT) Source df F p η2

Completed 1 1.72 .193 0.013

Table J109

Modern Sexism Pretest Means by Completion Status (MT) Completed Mean SD N

No 27.59 4.68 32

Yes 27.43 5.53 100

Total 27.47 5.32 132

Table J110

ANOVA, Modern Sexism Pretest Score by Completion Status (MT) Source df F p η2

Completed 1 0.02 .880 0.000

Table J111

Symbolic Racism Pretest Means by Completion Status (MT) Completed Mean SD N

No 22.28 4.50 32

Yes 20.13 4.96 100

Total 20.65 4.93 132

Table J112

ANOVA, Symbolic Racism Pretest Score by Completion Status (MT) Source df F p η2

Completed 1 4.75 .031* 0.035

* p≤ .05

Table J113

Systemic Sexism Posttest Means by Pretest Group (MT) Pretest Group Mean SD N

No Pretest 2.53 1.31 92

Pretest 1.71 1.26 99

Total 2.10 1.35 191

Table J114

ANOVA, Systemic Sexism Posttest Score by Pretest Group (MT) Source df F p η2

Pretest Group 1 19.724 .000* 0.094

* p≤ .001

Table J115

Systemic Racism Posttest Means by Pretest Group (MT) Pretest Group Mean SD N

No Pretest 2.37 1.52 92

Pretest 1.89 1.31 99

Total 2.12 1.43 191

Table J116

ANOVA, Systemic Racism Posttest Score by Pretest Group (MT) Source df F p η2

Pretest Group 1 5.52 .020* 0.028

* p≤ .05

Table J117

Modern Sexism Posttest Means by Pretest Group (MT) Pretest Group Mean SD N

No Pretest 29.28 4.46 92

Pretest 27.56 5.43 99

Total 28.39 5.05 191

Table J118

ANOVA, Modern Sexism Posttest Score by Pretest Group (MT) Source df F p η2

Pretest Group 1 5.72 .018* 0.029

* p≤ .05

Table J119

Symbolic Racism Posttest Means by Pretest Group (MT) Pretest Group Mean SD N

No Pretest 20.84 2.59 92

Pretest 20.07 2.42 99

Total 20.44 2.52 191

Table J120

ANOVA, Symbolic Racism Posttest Score by Pretest Group (MT) Source df F p η2

Pretest Group 1 4.47 .036* 0.023

* p≤ .05

Table J121

Systemic Sexism Posttest Marginal Means by Treatment Condition (MT) Treatment Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

Control 2.11a 0.66 0.81 3.41

Informational 1.19a 0.32 0.55 1.84

Financial 2.23a 0.46 1.31 3.15

Generative 2.27a 0.45 1.38 3.16
a. Covariates appearing in the model are evaluated at the following values: Systemic Sexism Pretest Score = 1.9293.


Table J122

ANCOVA, Systemic Sexism Posttest Score by Treatment Condition (MT) Source df F p η2

Condition 3 2.28 .086 0.079

Gender 1 0.00 .975 0.000

Race 2 1.05 .353 0.026

Systemic Sexism Pretest 1 12.31 .001* 0.133

Gender * Race 2 0.45 .642 0.011

Condition * Gender 3 0.42 .736 0.016

Condition * Race 6 1.40 .226 0.095

* p≤ .001

Table J123

Treatment Condition Control vs. Game Contrast, Systemic Sexism Posttest Score (MT) Source df F p η2

Contrast 1 0.093 .761 0.001

Table J124 Systemic Sexism Difference Score Overall T-Test (MT) Test Value = 0 N Mean SD t df p

Systemic Sexism Difference Score 99 -0.22 1.43 -1.55 98 .124

Table J125

Systemic Racism Posttest Marginal Means by Treatment Condition (MT) Treatment Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

Control 1.24a 0.71 -0.18 2.65

Informational 1.72a 0.35 1.02 2.42

Financial 1.95a 0.51 0.94 2.96

Generative 1.70a 0.49 0.73 2.68

a. Covariates appearing in the model are evaluated at the following values: Systemic Racism Pretest Score = 1.8990.

Table J126

ANCOVA, Systemic Racism Posttest Score by Treatment Condition (MT) Source df F p η2

Condition 3 0.29 .832 0.011

Gender 1 1.51 .223 0.019

Race 2 1.24 .295 0.030

Systemic Racism Pretest 1 11.64 .001* 0.127

Gender * Race 2 1.87 .161 0.045

Condition * Gender 3 0.17 .919 0.006

Condition * Race 6 1.02 .418 0.071

* p≤ .001

Table J127

Treatment Condition Control vs. Game Contrast, Systemic Racism Posttest Score (MT) Source df F p η2

Contrast 1 0.55 .462 0.007

Table J128 Systemic Racism Difference Score Overall T-Test (MT) Test Value = 0 N Mean SD t df p

Systemic Racism Difference Score 99 -0.01 1.56 -0.07 98 .949

Table J129

Game Data Correlations with Systemic Sexism Posttest Score* (MT) df r p

Player Score 55 0.162 .227

Total Clients Placed 88 -0.083 .439

Total Clients Placed (Biased Group) 88 -0.131 .217

Number of Guesses 88 -0.106 .322

*Controlling for Systemic Sexism Pretest Score

Table J130

Systemic Sexism Posttest Means by Number of Plays (MT) Number of Plays Mean SD N

Two 1.68 1.21 95

More Than Two 2.25 2.22 4

Total 1.71 1.26 99

Table J131

ANCOVA, Systemic Sexism Posttest Score by Number of Plays (MT) Source df F p η2

Systemic Sexism Pretest 1 11.07 .001* 0.103

More Than Two 1 1.74 .190 0.018

* p≤ .001

Table J132

Game Data Correlations with Systemic Racism Posttest Score* (MT) df r p

Player Score 55 0.057 .671

Total Clients Placed 88 0.159 .134

Total Clients Placed (Biased Group) 88 0.077 .473

Number of Guesses 88 0.079 .459

*Controlling for Systemic Racism Pretest Score

Table J133

Systemic Racism Posttest Means by Number of Plays (MT) Number of Plays Mean SD N

Two 1.89 1.28 95

More Than Two 1.75 2.06 4

Total 1.89 1.31 99

Table J134

ANCOVA, Systemic Racism Posttest Score by Number of Plays (MT) Source df F p η2

Systemic Racism Pretest 1 7.25 .008* 0.070

More Than Two 1 0.24 .626 0.002

* p≤ .05


Table J135

Mean Score by Player Race and Gender (MT) Player Gender Player Race Mean SD N

Female White 1033152.39 4713025.69 46

Black and Hispanic 2047543.33 3471347.99 3

Other 40575.71 20201.24 7

Male White 171668.18 488615.17 33

Black and Hispanic 4458084.29 11238569.33 7

Other 49826.25 24847.07 8

Total 877133.85 4307294.34 104

Table J136

ANCOVA, Mean Score by Player Race and Gender (MT) Source df F p η2

Race 2 1.69 .190 .033

Gender 1 .17 .684 .002

Race * Gender 2 .58 .561 .012

Table J137

Mean Clients Placed by Player Race and Gender (MT) Player Gender Player Race Mean SD N

Female White 23.94 20.36 84

Black and Hispanic 9.25 12.51 8

Other 16.00 15.05 11

Male White 29.73 21.76 52

Black and Hispanic 19.00 14.51 9

Other 21.09 14.03 11

Total 24.06 20.06 175


Table J138

ANCOVA, Mean Clients Placed by Player Race and Gender (MT) Source df F p η2

Race 2 4.26 .016* .048

Gender 1 2.44 .120 .014

Race * Gender 2 .08 .920 .001

* p≤ .05

Table J139

White vs. non-White Contrast, Clients Placed (MT) Source df F p η2

Contrast 1 8.38 .004* .047

* p≤ .05

Table J140

Mean Bias-Group Clients Placed by Player Race and Gender (MT) Player Gender Player Race Mean SD N

Female White 9.06 8.24 84

Black and Hispanic 3.63 3.96 8

Other 7.09 8.24 11

Male White 11.08 9.32 52

Black and Hispanic 8.44 7.09 9

Other 9.09 6.20 11

Total 9.26 8.35 175

Table J141

ANCOVA, Mean Bias-Group Clients Placed by Player Race and Gender (MT) Source df F p η2

Race 2 2.08 .128 .024

Gender 1 2.52 .114 .015

Race * Gender 2 .22 .805 .003

Table J142

Mean Guesses by Player Race and Gender (MT) Player Gender Player Race Mean SD N

Female White 2.24 2.24 84

Black and Hispanic 2.63 1.92 8

Other 2.45 2.16 11

Male White 1.62 1.84 52

Black and Hispanic 1.67 2.55 9

Other 2.27 2.06 11

Total 2.06 2.11 175

Table J143

ANCOVA, Mean Guesses by Player Race and Gender (MT) Source df F p η2

Race 2 .44 .643 .005

Gender 1 1.55 .215 .009

Race * Gender 2 .17 .843 .002


Table J144

Crosstabulation of Player Race and Games Played (MT)

Games Played χ² p Total

Player Race 2 >2

White 141 7 3.169a .205 148

Black and Hispanic 15 2 17

Other 26 0 26

Total 182 9 191
a. 2 cells (33.3%) have expected count less than 5. The minimum expected count is .80.

Table J145

Crosstabulation of Player Gender and Games Played (MT)

Games Played χ² p Total

Player Gender 2 >2

Female 103 5 .010a .921 108

Male 77 4 81
Total 180 9 189
a. 1 cell (25.0%) has expected count less than 5. The minimum expected count is 3.86.

Table J146

Systemic Sexism Posttest Marginal Means by Bias Guess Condition (MT) Guessing Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess 1.95a 0.36 1.23 2.66

Informational Guess 1.29a 0.33 0.62 1.96

Financial Guess 2.11a 0.51 1.09 3.14

Generative Guess 2.31a 0.42 1.47 3.15
a. Covariates appearing in the model are evaluated at the following values: Systemic Sexism Pretest Score = 1.9315.

Table J147

ANCOVA, Systemic Sexism Posttest Score by Bias Guess Condition (MT) Source df F p η2

Bias Guess Condition 3 1.54 .214 0.079

Gender 1 0.03 .865 0.001

Race 2 3.19 .049* 0.106

Systemic Sexism Pretest 1 7.91 .007** 0.128

Gender * Race 2 0.90 .414 0.032

Bias Guess Condition * Gender 3 1.92 .137 0.096

Bias Guess Condition * Race 6 1.37 .242 0.132

* p≤ .05

** p≤ .01

Table J148

Bias Guess Condition No Guess vs Guess Contrast, Systemic Sexism Posttest Score (MT) Source df F p η2

Contrast 1 0.01 .925 0.000

Table J149

Systemic Sexism Posttest Marginal Means by Player Race (MT) Player Race Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

White 1.54a 0.15 1.23 1.85

Black and Hispanic 2.73a 0.45 1.83 3.63

Other 1.48a 0.35 0.77 2.19
a. Covariates appearing in the model are evaluated at the following values: Systemic Sexism Pretest Score = 1.9315.

Table J150 Systemic Sexism Difference Score T-Tests by Player Race (MT) Test Value = 0 N Mean SD t df p

White 54 -0.37 1.43 -1.90 53 .063

Black and Hispanic 7 0.71 2.06 0.92 6 .394

Other 12 -0.25 1.14 -0.76 11 .463

Table J151

Systemic Racism Posttest Marginal Means by Bias Guess Condition (MT) Guessing Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess 1.99a 0.38 1.24 2.75

Informational Guess 1.79a 0.34 1.11 2.47

Financial Guess 1.96a 0.52 0.91 3.01

Generative Guess 1.64a 0.44 0.76 2.51
a. Covariates appearing in the model are evaluated at the following values: Systemic Racism Pretest Score = 2.0137.

Table J152

ANCOVA, Systemic Racism Posttest Score by Bias Guess Condition (MT) Source df F p η2

Bias Guess Condition 3 0.14 .936 0.008

Gender 1 3.01 .089 0.053

Race 2 0.01 .987 0.000

Systemic Racism Pretest 1 9.76 .003* 0.153

Gender * Race 2 3.48 .038** 0.114

Bias Guess Condition * Gender 3 0.11 .954 0.006

Bias Guess Condition * Race 6 0.48 .820 0.051

* p≤ .01

** p≤ .05

Table J153

Bias Guess Condition No Guess vs Guess Contrast, Systemic Racism Posttest Score (MT) Source df F p η2

Contrast 1 0.19 .667 0.003

Table J154

Systemic Racism Posttest Means by Player Gender and Player Race (MT) Player Gender Player Race Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

Female White 1.62a 0.21 1.20 2.03

Black and Hispanic 3.06a 0.74 1.58 4.55

Other 2.14a 0.61 0.91 3.36

Male White 2.09a 0.25 1.60 2.58

Black and Hispanic 0.52a 0.74 -0.97 2.01

Other 1.64a 0.45 0.74 2.53

a. Covariates appearing in the model are evaluated at the following values: Systemic Racism Pretest Score = 2.0137.

Table J155 Systemic Racism Difference Score T-Tests by Player Race and Gender (MT) Test Value = 0

N Mean SD t df p

White Women 32 -0.56 1.34 -2.37 31 .024*

Black and Hispanic Women 3 2.00 1.00 3.46 2 .074

Other Women 5 -0.40 1.14 -0.78 4 .477

White Men 22 0.27 1.24 1.03 21 .315

Black and Hispanic Men 4 -2.00 1.83 -2.191 3 .116

Other Men 7 0.00 1.41 0.00 6 1.000

* p≤ .05

Table J156

Game Score Marginal Means by Bias Guess Condition (MT) Guessing Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess 2939601.00 1026789.83 898410.38 4980791.63

Informational Guess 1524784.55 1106723.51 -675308.97 3724878.07

Financial Guess -1223709.07 1749676.16 -4701950.16 2254532.02

Generative Guess -1095283.95 1433944.59 -3945871.44 1755303.54

Table J157

ANCOVA, Game Score by Bias Guess Condition (MT) Source df F p η2

Bias Guess Condition 3 2.48 .067 0.079

Gender 1 1.77 .187 0.020

Race 2 0.15 .865 0.003

Bias Guess Condition * Gender 3 1.66 .181 0.055

Bias Guess Condition * Race 6 2.03 .071 0.124

Gender * Race 2 2.07 .133 0.046

Table J158

Mean Score Percentage Earned from Bias Group by Bias Guess Condition (MT) Guessing Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess 0.45 0.11 0.22 0.68

Informational Guess 0.23 0.12 -0.01 0.48

Financial Guess 0.11 0.19 -0.28 0.50

Generative Guess 0.21 0.16 -0.11 0.53

Table J159

ANCOVA, Score Percentage Earned from Bias Group by Bias Guess Condition (MT) Source df F p η2

Bias Guess Condition 3 1.16 .330 0.039

Gender 1 1.81 .182 0.021

Race 2 1.34 .266 0.030

Bias Guess Condition * Gender 3 0.78 .508 0.027

Bias Guess Condition * Race 6 2.02 .071 0.124

Gender * Race 2 3.82 .026* 0.082

* p≤ .05

Table J160

Score Percentage Earned from Bias Group Means by Player Race and Gender (MT) Player Race Player Gender Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

White Female 0.22 0.07 0.09 0.36

Male 0.14 0.08 -0.02 0.30

Black and Hispanic Female 0.11 0.35 -0.58 0.81

Male 0.22 0.19 -0.15 0.59

Other Female 0.09 0.18 -0.27 0.45

Male 0.72 0.17 0.39 1.05

Table J161

Total Clients Placed by Bias Guess Condition (MT) Guessing Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess 24.85 4.79 15.36 34.34

Informational Guess 18.78 5.13 8.62 28.94

Financial Guess 26.98 6.25 14.60 39.36

Generative Guess 17.94 4.80 8.43 27.44

Table J162

ANCOVA, Total Clients Placed by Bias Guess Condition (MT) Source df F p η2

Bias Guess Condition 3 0.714 .546 0.018

Gender 1 1.373 .244 0.012

Race 2 2.253 .110 0.038

Bias Guess Condition * Gender 3 1.382 .252 0.035

Bias Guess Condition * Race 6 0.238 .963 0.012

Gender * Race 2 0.020 .980 0.000

Table J163

Bias Group Clients Placed by Bias Guess Condition (MT) Guessing Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess 9.98 2.06 5.90 14.06

Informational Guess 9.14 2.21 4.77 13.51

Financial Guess 10.54 2.69 5.22 15.87

Generative Guess 6.37 2.06 2.28 10.46

Table J164

ANCOVA, Bias Group Clients Placed by Bias Guess Condition (MT) Source df F p η2

Bias Guess Condition 3 0.74 .533 0.019

Gender 1 1.87 .174 0.016

Race 2 0.87 .424 0.015

Bias Guess Condition * Gender 3 0.73 .534 0.019

Bias Guess Condition * Race 6 0.20 .976 0.010

Gender * Race 2 0.47 .626 0.008

Table J165

Modern Sexism Posttest Marginal Means by Treatment Condition (MT) Treatment Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

Control 28.71a 0.69 27.34 30.09

Informational 26.85a 0.34 26.18 27.51

Financial 28.67a 0.49 27.70 29.64

Generative 27.29a 0.48 26.33 28.25
a. Covariates appearing in the model are evaluated at the following values: Modern Sexism Pretest Score = 27.4343.

Table J166

ANCOVA, Modern Sexism Posttest Score by Treatment Condition (MT) Source df F p η2

Condition 3 3.87 .012* 0.127

Gender 1 0.74 .392 0.009

Race 2 1.80 .172 0.043

Modern Sexism Pretest 1 1485.32 .000** 0.949

Gender * Race 2 0.71 .493 0.018

Condition * Gender 3 0.52 .667 0.019

Condition * Race 6 1.23 .298 0.085

* p≤ .05

** p≤ .001

Table J167

Treatment Condition Control vs Game Contrast, Modern Sexism Posttest Score (MT) Source df F p η2

Contrast 1 2.33 .131 0.028

Table J168 Modern Sexism Difference Score T-Tests by Treatment Condition (MT) Test Value = 0 N Mean SD t df p

Control Group 26 0.81 1.67 2.46 25 .021*

Informational Group 30 -0.37 1.00 -2.01 29 .054

Financial Group 24 0.25 1.19 1.03 23 .314

Generative Group 19 -0.21 0.63 -1.46 18 .163

* p≤ .05

Table J169

Symbolic Racism Posttest Marginal Means by Treatment Condition (MT) Treatment Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

Control 19.94a 0.85 18.25 21.63

Informational 19.99a 0.42 19.16 20.82

Financial 18.25a 0.61 17.05 19.46

Generative 20.06a 0.61 18.84 21.27
a. Covariates appearing in the model are evaluated at the following values: Symbolic Racism Pretest Score = 20.1212.

Table J170

ANCOVA, Symbolic Racism Posttest Score by Treatment Condition (MT) Source df F p η2

Condition 3 2.28 .086 0.079

Gender 1 0.17 .683 0.002

Race 2 3.68 .030* 0.084

Symbolic Racism Pretest 1 113.58 .000** 0.587

Gender * Race 2 0.25 .779 0.006

Condition * Gender 3 0.66 .579 0.024

Condition * Race 6 1.66 .141 0.111

* p≤ .05

** p≤ .001

Table J171

Treatment Condition Control vs Game Contrast, Symbolic Racism Posttest Score (MT) Source df F p η2

Contrast 1 0.314 .577 0.004

Table J172

Symbolic Racism Posttest Marginal Means by Player Race (MT) Player Race Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

White 20.24a 0.18 19.89 20.59

Black and Hispanic 19.65a 0.65 18.36 20.94

Other 18.78a 0.52 17.75 19.82
a. Covariates appearing in the model are evaluated at the following values: Symbolic Racism Pretest Score = 20.1212.

Table J173 Symbolic Racism Difference Score T-Tests by Player Race (MT) Test Value = 0 N Mean SD t df p

White 78 0.37 3.49 0.94 77 .350

Black and Hispanic 8 -1.13 4.18 -0.76 7 .472

Other 13 -1.92 2.72 -2.55 12 .026*

* p≤ .05

Table J174

Game Data Correlations with Modern Sexism Posttest Score* (MT) df r p

Player Score 55 -0.384 .003**

Total Clients Placed 88 0.070 .509

Total Clients Placed (Biased Group) 88 -0.002 .988

Number of Guesses 88 -0.109 .308

*Controlling for Modern Sexism Pretest Score

** p≤ .005

Table J175

Modern Sexism Posttest Means by Number of Plays (MT) Number of Plays Mean SD N

Two 27.51 5.44 95

More Than Two 28.75 5.85 4

Total 27.56 5.43 99

Table J176

ANCOVA, Modern Sexism Posttest Score by Number of Plays (MT) Source df F p η2

Modern Sexism Pretest 1 1719.24 .000* 0.947

More Than Two 1 0.46 .500 0.005

* p≤ .001

Table J177

Game Data Correlations with Symbolic Racism Posttest Score* (MT) df r p

Player Score 55 0.089 .511

Total Clients Placed 88 -0.118 .268

Total Clients Placed (Biased Group) 88 -0.092 .387

Number of Guesses 88 -0.002 .988

*Controlling for Symbolic Racism Pretest Score

Table J178

Symbolic Racism Posttest Means by Number of Plays (MT) Number of Plays Mean SD N

Two 20.07 2.40 95

More Than Two 20.00 3.16 4

Total 20.07 2.42 99

Table J179

ANCOVA, Symbolic Racism Posttest Score by Number of Plays (MT) Source df F p η2

Symbolic Racism Pretest 1 129.33 .000* 0.574

More Than Two 1 0.00 .973 0.000

* p≤ .001

Table J180

Modern Sexism Posttest Marginal Means by Bias Guess Condition and Player Race (MT) Guessing Condition Player Race Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess White 27.74a 0.24 27.27 28.22

Black and Hispanic 27.57a 0.61 26.36 28.79

Other 27.66a 0.51 26.64 28.68

Informational Guess White 27.06a 0.23 26.60 27.51

Black and Hispanic 26.42a 0.61 25.20 27.63

Other 27.14a 0.44 26.27 28.01

Financial Guess White 27.14a 0.24 26.66 27.61

Black and Hispanic 31.37a 1.01 29.34 33.40

Other 28.54a 0.72 27.09 29.98

Generative Guess White 27.10a 0.25 26.59 27.60

Black and Hispanic 27.96a 0.82 26.33 29.60

Other 27.75a 0.51 26.74 28.76
a. Covariates appearing in the model are evaluated at the following values: Modern Sexism Pretest Score = 27.5068.

Table J181

ANCOVA, Modern Sexism Posttest Score by Bias Guess Condition (MT) Source df F p η2

Bias Guess Condition 3 7.04 .000* 0.281

Gender 1 0.10 .749 0.002

Race 2 5.30 .008** 0.164

Modern Sexism Pretest 1 2549.47 .000* 0.979

Gender * Race 2 0.94 .396 0.034

Bias Guess Condition * Gender 3 0.89 .452 0.047

Bias Guess Condition * Race 6 3.63 .004** 0.288

* p≤ .001

** p≤ .01

Table J182

Modern Sexism Posttest Marginal Means by Bias Guess Condition (MT) Guessing Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess 27.66a 0.28 27.10 28.22

Informational Guess 26.87a 0.26 26.34 27.40

Financial Guess 29.02a 0.40 28.22 29.82

Generative Guess 27.03a 0.33 26.94 28.27
a. Covariates appearing in the model are evaluated at the following values: Modern Sexism Pretest Score = 27.5068.

Table J183

Bias Guess Condition No Guess vs. Guess Contrast, Modern Sexism Posttest Score (MT) Source df F p η2

Contrast 1 0.26 .614 0.005

Table J184

Modern Sexism Posttest Marginal Means by Player Race (MT) Player Race Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

White 27.26a 0.12 27.02 27.50

Black and Hispanic 28.33a 0.35 27.63 29.03

Other 27.77a 0.27 27.22 28.32
a. Covariates appearing in the model are evaluated at the following values: Modern Sexism Pretest Score = 27.5068.

Table J185

Player Race Contrast, Modern Sexism Posttest Score (MT) Source df F p η2

Contrast 1 10.28 .002* 0.160

* p≤ .005

Table J186

Modern Sexism Posttest Means by Guess Condition, White Players Only (MT) Guessing Condition Mean SD N

No Guess 28.69 4.70 13

Informational Guess 27.19 5.41 16

Financial Guess 27.54 7.08 13

Generative Guess 26.00 6.27 12

Total 27.37 5.80 54

Table J187

ANCOVA, Modern Sexism Posttest Score by Bias Guess Condition, White Players Only (MT) Source df F p η2

Bias Guess Condition 3 2.18 .102 0.118

Modern Sexism Pretest 1 2482.75 .000* 0.981

* p≤ .001

Table J188 Modern Sexism Difference Score T-Test, White Players Only (MT) Test Value = 0 N Mean SD t df p

Modern Sexism Difference Score 78 0.06 1.28 0.44 77 .660

Table J189

Modern Sexism Posttest Means by Bias Guess Condition, Black, Hispanic, and Other Players (MT) Guessing Condition Mean SD N

No Guess 30.60 3.58 5

Informational Guess 22.83 6.46 6

Financial Guess 27.33 3.51 3

Generative Guess 29.80 5.22 5

Total 27.42 5.77 19

Table J190

ANCOVA, Modern Sexism Posttest Score by Bias Guess Condition, Black, Hispanic and Other Players (MT) Source df F p η2

Bias Guess Condition 3 7.35 .003 0.612

Modern Sexism Pretest 1 477.06 .000* 0.971

* p≤ .001

Table J191 Modern Sexism Difference Score T-Test, Black, Hispanic, and Other Players (MT) Test Value = 0 N Mean SD t df p

No Guess 5 0.00 0.71 0.00 4 1.000

Informational Guess 6 -0.5 1.22 -1.00 5 .363

Financial Guess 3 2.33 1.53 2.65 2 .118

Generative Guess 5 0.2 0.45 1.00 4 .374

Table J192

Symbolic Racism Posttest Marginal Means by Bias Guess Condition and Player Race (MT) Guessing Condition Player Race Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess White 20.62a 0.39 19.84 21.41

Black and Hispanic 20.13a 1.01 18.12 22.15

Other 18.34a 0.84 16.66 20.01

Informational Guess White 19.96a 0.38 19.20 20.71

Black and Hispanic 20.16a 1.00 18.16 22.16

Other 20.07a 0.70 18.66 21.48

Financial Guess White 20.48a 0.39 19.70 21.27

Black and Hispanic 15.62a 1.66 12.29 18.96

Other 17.05a 1.19 14.67 19.44

Generative Guess White 20.24a 0.42 19.40 21.07

Black and Hispanic 20.85a 1.38 18.09 23.61

Other 20.17a 0.84 18.49 21.85
a. Covariates appearing in the model are evaluated at the following values: Symbolic Racism Pretest Score = 20.4521.

Table J193

ANCOVA, Symbolic Racism Posttest Score by Bias Guess Condition (MT) Source df F p η2

Bias Guess Condition 3 3.39 .025* 0.158

Gender 1 0.48 .490 0.009

Race 2 5.39 .007* 0.166

Symbolic Racism Pretest 1 74.97 .000** 0.581

Gender * Race 2 0.12 .888 0.004

Bias Guess Condition * Gender 3 1.14 .340 0.060

Bias Guess Condition * Race 6 2.57 .029 0.222

* p≤ .05

** p≤ .001

Table J194

Symbolic Racism Posttest Marginal Means by Bias Guess Condition (MT) Guessing Condition Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

No Guess 19.70a 0.46 18.78 20.62

Informational Guess 20.06a 0.43 19.21 20.92

Financial Guess 17.72a 0.66 16.39 19.05

Generative Guess 20.42a 0.56 19.30 21.54
a. Covariates appearing in the model are evaluated at the following values: Symbolic Racism Pretest Score = 20.4521.

Table J195

Bias Guess Condition No Guess vs. Guess Contrast, Symbolic Racism Posttest Score (MT) Source df F p η2

Contrast 1 0.29 .593 0.005

Table J196

Symbolic Racism Posttest Marginal Means by Player Race (MT) Player Race Mean Standard Error 95% Confidence Interval

Lower Bound Upper Bound

White 20.33a 0.20 19.93 20.72

Black and Hispanic 19.19a 0.58 18.04 20.35

Other 18.91a 0.45 18.00 19.81
a. Covariates appearing in the model are evaluated at the following values: Symbolic Racism Pretest Score = 20.4521.

Table J197

White vs. non-White Player Race Contrast, Symbolic Racism Posttest Score (MT) Source df F p η2

Contrast 1 9.57 .003* 0.151

* p≤ .005

Table J198

Symbolic Racism Posttest Means by Bias Guess Condition, White Players (MT) Guessing Condition Mean SD N

No Guess 20.77 1.64 13

Informational Guess 19.88 1.75 16

Financial Guess 20.62 2.72 13

Generative Guess 19.75 2.45 12

Total 20.24 2.15 54

Table J199

ANCOVA, Symbolic Racism Posttest Score by Bias Guess Condition, White Players (MT) Source df F p η2

Bias Guess Condition 3 0.42 .739 0.025

Symbolic Racism Pretest 1 63.50 .000* 0.564

* p≤ .001

Table J200 Symbolic Racism Difference Score T-Test, White Players (MT) Test Value = 0 N Mean SD t df p

Symbolic Racism Difference 78 0.37 3.49 0.94 77 .350

Table J201

Symbolic Racism Posttest Means by Bias Guess Condition, Black, Hispanic, and Other Players (MT) Guessing Condition Mean SD N

No Guess 20.00 1.87 5

Informational Guess 19.83 2.79 6

Financial Guess 15.00 1.00 3

Generative Guess 21.80 1.92 5

Total 19.63 2.97 19

Table J202

ANCOVA, Symbolic Racism Posttest Score by Bias Guess Condition, Black, Hispanic, and Other Players (MT) Source df F p η2

Bias Guess Condition 3 8.16 .002* 0.636

Symbolic Racism Pretest 1 41.35 .000** 0.747

* p≤ .005

** p≤ .001

Table J203 Symbolic Racism Difference Score T-Tests by Bias Guess Condition, Black, Hispanic, and Other Players (MT) Test Value = 0

N Mean SD t df p

No Guess 5 -3.20 1.79 -4.00 4 .016

Informational Guess 6 0.17 3.92 0.10 5 .921

Financial Guess 3 -2.00 2.00 -1.73 2 .225

Generative Guess 5 -2.40 2.19 -2.45 4 .070
