Reducing Stereotype Threat in Classrooms: a Review of Social-Psychological Intervention Studies on Improving the Achievement of Black Students

ISSUES& ANSWERS REL 2009–No. 076

Reducing stereotype threat At SERVE Center UNC, Greensboro in classrooms: a review of social- psychological intervention studies on improving the achievement of Black students

U.S. Department of Education ISSUES&ANSWERS REL 2009–No. 076

At SERVE Center UNC, Greensboro

Reducing stereotype threat in classrooms: a review of social-psychological intervention studies on improving the achievement of Black students

July 2009

Prepared by With contributions from Joshua Aronson Bianca Montrosse Department of Applied Psychology SERVE Center at New York University University of North Carolina at Greensboro Geoffrey Cohen Karla Lewis Department of Psychology SERVE Center at University of Colorado at Boulder University of North Carolina at Greensboro Wendy McColskey Kathleen Mooney SERVE Center at SERVE Center at University of North Carolina at Greensboro University of North Carolina at Greensboro

U.S. Department of Education WA ME MT ND VT MN OR NH ID SD WI NY MI WY IA PA NE NV OH IL IN

UT WV CA CO VA KS MO KY NC TN AZ OK NM AR SC

GA MS AL LA TX AK FL At SERVE Center UNC, Greensboro

Issues & Answers is an ongoing series of reports from short-term Fast Response Projects conducted by the regional educa tional laboratories on current education issues of importance at local, state, and regional levels. Fast Response Project topics change to reflect new issues, as identified through lab outreach and requests for assistance from policymakers and educa tors at state and local levels and from communities, businesses, parents, families, and youth. All Issues & Answers reports meet Institute of Education Sciences standards for scientifically valid research.

July 2009

This report was prepared for the Institute of Education Sciences (IES) under Contract ED-06-CO-0028 by Regional Edu cational Laboratory Southeast administered by the SERVE Center at the University of North Carolina at Greensboro. The content of the publication does not necessarily reflect the views or policies of IES or the U.S. Department of Education nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.

This report is in the public domain. While permission to reprint this publication is not necessary, it should be cited as:

Aronson, J., Cohen, G., McColskey, W., Montrosse, B., Lewis, K., and Mooney, K. (2009). Reducing stereotype threat in classrooms: a review of social-psychological intervention studies on improving the achievement of Black students (Issues & Answers Report, REL 2009–No. 076). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Southeast. Retrieved from http://ies.ed.gov/ncee/edlabs.

This report is available on the regional educational laboratory web site at http://ies.ed.gov/ncee/edlabs. Summary REL 2009–No. 076 Reducing stereotype threat in classrooms: a review of social-psychological intervention studies on improving the achievement of Black students

Stereotype threat arises from a fear peers). This second source of stress is specific among members of a group of reinforc to negatively stereotyped groups. It arises from ing negative stereotypes about the intel a fear of reinforcing negative stereotypes about lectual ability of the group. The report the intellectual ability of their racial group. identifies three randomized controlled Because Black students must contend with two trial studies that use classroom-based sources of stress rather than one, their perfor strategies to reduce stereotype threat mance may be suppressed relative to that of and improve the academic performance their nonminority peers. of Black students, narrowing their achievement gap with White students. A systematic search was conducted for em pirical studies of classroom-based social- This review located and summarized the psychological interventions designed to reduce findings of randomized controlled trial stud stereotype threat and thus improve the aca ies on classroom-based social-psychological demic performance of Black students. Search interventions aimed at reducing the experi term combinations, such as “stereotype threat” ence of stereotype threat that might otherwise and “intervention,” and “achievement gap” and lead some Black students to underperform on “intervention,” were used to search a number difficult academic tasks or tests. Reducing the of bibliographic databases. In addition, a web achievement gap between Black and White site on this topic with an extensive reference list students is a critical goal for states, districts, was also reviewed. This initial search identified and schools. Experimental research on both 289 references. After applying relevant inclu inducing and reducing stereotype threat can sion criteria for topical and sample relevance, inform discussions of strategies. three experimental studies were identified. The three studies found positive impacts on the Some students may perform below their academic performance of Black students for the potential because of the stress of being under following social-psychological strategies: constant evaluation in the classroom. Black students, however, may experience another • Reinforce for students the idea that intel source of stress in addition to this general ligence is expandable and, like a muscle, one (which they share with their nonminority grows stronger when worked. ii Summary

• Teach students that their difficulties in performance of Black students. Rather, they school are often part of a normal learning present scientific evidence suggesting that curve or adjustment process, rather than such strategies might reduce the level of something unique to them or their racial social-psychological threat that some Black group. students might otherwise feel in academic performance situations. It is important to • Help students reflect on other values in note that while the strategies use established their lives beyond school that are sources procedures that can be emulated by teach of self-worth for them. ers and administrators, they also require thought and care on the part of schools and These three experiments are not an exhaus teachers in applying them in their particular tive list of the interventions to consider in situations. reducing the racial achievement gap, nor are they silver bullets for improving the academic July 2009 Table of conTenTS iii

TaBle of conTenTs Why this study? 1 Need for the study 1 What is stereotype threat and how has it been studied? 2 Findings of three experimental studies of interventions to reduce stereotype threat in grade 7 classroom settings 4 Study 1: Blackwell, Trzesniewski, and Dweck 2007, “Implicit theories of intelligence predict achievement across an adolescent transition: a longitudinal study and an intervention” 4 Study 2: Good, Aronson, and Inzlicht 2003, “Improving adolescents’ standardized test performance: an intervention to reduce the effects of stereotype threat” 7 Study 3: Cohen, Garcia, Apfel, and Master 2006, “Reducing the racial achievement gap: a social-psychological intervention” 9 Concluding thoughts on turning research into practice 12 Notes 16 Appendix A Research on the relationship between stereotype threat and Black students’ academic performance 17 Appendix B Methodology 20 Appendix C Article screening and study quality review protocols 31 References 35 Boxes 1 Study methods 3 2 Key terms 6 Figures 1 Estimated mean math scores by experimental condition 7 2 Percentage of Black and White students receiving a grade of D or lower in targeted course in same semester as the intervention, by experimental condition 12 B1 First- and second-level screening and assessment of the quality of studies 21 Tables 1 Reported intervention impacts on the spring grade 7 state reading test (Texas Assessment of Academic Skills) 9 2 Covariate-adjusted mean grade point average (averaged over both studies) for intervention and control groups, by level of preintervention performance 11 3 Summary of effects reported by the three studies 15 B1 Search results 20 B2 Disposition of references 22 B3 Quality of final studies included in report 26 B4 Methodological summary of the three experimental studies reviewed 28 Why ThiS STudy? 1 stereotype threat Why This sTudy? At every level of family income and school prepa arises from a fear ration, Black students1 on average earn relatively lower grade point averages (GPAs) and scores on among members standardized tests (Bowen and Bok 1998; Hacker 1995; Jencks and Phillips 1998; Steele 1997). In of a group of a society where economic opportunity depends heavily on scholastic success, even a partial nar reinforcing negative rowing of the achievement gap would lead to a positive change in the lives of many academically stereotypes about at-risk children. the intellectual Need for the study ability of the Regional Educational Laboratory Southeast serves group. The report six southeastern states for which reducing the achievement gap between Black students and identifies three White students continues to be a major concern. The data indicate an education crisis in the South randomized east Region, especially for Black male students (KewalRamani et al. 2007; Wald and Losen 2005). controlled trial A report by the Southern Regional Education Board (SREB) on SAT and ACT scores concludes studies that use that between 1998 and 2002 none of the 16 SREB states narrowed the achievement gap between classroom-based Black and White students (Southern Regional Education Board 2003). The achievement gap even strategies to reduce widened for Black male students. Among the SREB states, which include the six states covered by the stereotype threat Regional Educational Laboratory Southeast, only 45 percent of Black male students graduated from and improve high school in 2003 compared with 61 percent of Black female students, 65 percent of White male the academic students, and 67 percent of White female students. performance of Thus, Regional Educational Laboratory South east frequently receives requests from Southeast Black students, Region educators for information on new ideas on interventions, programs, and policies that could narrowing their close the achievement gap between Black and White students. Several Southeast Region states achievement have regularly hosted conferences on this topic gap with White and published reports based on reviews. students. Many potential contributing factors in the achievement gap have been explored, some 2 reducing STereoType ThreaT in claSSroomS

Because of an awareness individual (for example, fam- A for a summary of the research on stereotype of negative stereotypes ily socioeconomic background, threat). Such worries, in turn, can hinder their test presupposing academic self-efficacy, and student aspira performance, motivation, and learning. inferiority, Black and tions) and some school related (for other minority students example, class size, distribution of Research on stereotype threat began with labora may worry that they could resources across schools, quality tory studies exploring why Black college students confirm the intellectual and diversity of teachers, and lack seemed to be performing below their potential. inferiority alleged by of explicit and high performance Although a test-taking situation may seem objec such stereotypes. such standards). Other factors external tively the same for all students, some students, worries can hinder to school are also suggested as because of their social identity, may experience it their test performance, having an impact on racial gaps in in a very different way. Steele and Aronson (1995) motivation, and learning academic performance, such as a conducted a seminal experiment to explore the lack of high-quality early child negative impact of administering a test under hood education and of economic potentially stereotype-threat-inducing conditions opportunities to pursue postsecondary education, by randomly assigning study participants to two an important incentive to do well in school. (For different test-taking conditions. In one test-taking reviews of research on the achievement gap, see condition, a standardized test (composed of verbal Bowen and Bok 1998; Jencks and Phillips 1998; Graduate Record Exam items) was presented Rothstein 2002.) This report recognizes that there to one group of college students as “diagnostic is a complex set of influential factors and that of intellectual ability.” It was hypothesized that many of them are beyond a teacher’s influence; Black students in this condition would worry that these are not addressed here. Rather, to respond to performing poorly could confirm a stereotype the ongoing need for new information in this area, about their racial group’s intellectual ability. this review located and summarized findings from Black students performed worse in this condition experimental studies on classroom-based social- than when the same test was given in a second psychological interventions to reduce stereotype condition that introduced the test as one that was threat in schools and classrooms that might lead “not diagnostic of your ability.” The two ways of some Black students to underperform on difficult introducing the test had no effect on the perfor academic tasks or tests.2 mance of White students. Black students in the study sample answered roughly 8 of 30 test items What is stereotype threat and how has it been studied? correctly in the “threat” condition and roughly 12 of 30 correctly in the “no threat” condition. What is stereotype threat? Social psychologists hypothesize that racial stigma could help explain Since the original experimental studies on the why, on average, Black and White students of effects of inducing stereotype threat (Steele and similar socioeconomic backgrounds perform dif Aronson 1995; Steele 1997), there has been an ferently in college and on key standardized tests explosion of research documenting the negative (Steele and Aronson 1995; see also Steele 1997). effect of this phenomenon on performance of vari As students progress through school, classroom ous types (for reviews see Ryan and Ryan 2005; learning environments may become increasingly Shapiro and Neuberg 2007; Steele, Spencer, and competitive, evaluative in nature, and stressful Aronson 2002; Walton and Cohen 2003; Wheeler for some minority students. The logic behind and Petty 2001). Shapiro and Neuberg (2007, stereotype threat is that because of an awareness p. 125), in reviewing this literature, suggest that of negative stereotypes presupposing academic inferiority, Black and other minority students may The intellectual excitement surrounding the worry that they could confirm the intellectual in stereotype threat concept and research pro feriority alleged by such stereotypes (see appendix gram stems in large part from the possibility findingS of Three experimenTal STudieS of inTervenTionS To reduce STereoType ThreaT 3

box 1 • Included Black students in the participants to experimental con Study methods sample. ditions (intervention or control) to • Included K–12 students as the focus. ensure that the groups are as simi Search and screening. This study lar as possible on all characteristics began with a thorough search, The second round of screening so that the outcomes measured screening, and quality review to iden excluded 72 studies. Most studies (65) reflect the influence of the inter tify empirical studies of classroom- were excluded for failing to meet the vention only. Only one study had based social-psychological interven first criterion—they explored vari a limitation in this area (Good, tions designed to reduce stereotype ous aspects of the negative impact Aronson, and Inzlicht 2003). threat and thus improve the academic of stereotype threat on performance performance of Black students. In ad rather than studying interventions to • Attrition of participants. Loss of dition to literature searches using key reduce the intensity of the experience participants can create differ terms, a web site on this topic with of stereotype threat. ences in measured outcomes by an extensive reference list of peer-re changing the composition of the viewed journal articles was examined A second, broader verification search intervention or control groups. (www.reducingstereotypethreat.org). (using the broadest search term Both overall attrition and differen The literature search yielded 158 cita “stereotype threat” without the word tial attrition (differences between tions, and the web site reference list “intervention”) was conducted to intervention and control groups) yielded an additional 131 citations. ensure that relevant studies had not are of concern. All three studies been missed. No additional studies were acceptable in this area. The 289 references were then appropriate for inclusion were found screened for inclusion using a set of among the 741 references identified. • Intervention contamination. Inter six questions (see appendix C for the vention contamination can happen article screening protocol). A total of Assessing the quality of identified inter when unintended events occur after 214 studies were excluded based on vention studies. The three remaining intervention begins that could affect the initial screening, applying the studies were subject to a final quality group outcomes and therefore the first three criteria (see table B2 and review to describe any methodologi conclusions of the experiment. One figure B1 in appendix B for disposi cal limitations, using a study coding study was noted as having a pos tion details). Studies were excluded protocol (see appendix C) based on sible limitation in this area (Good, as off-topic or irrelevant (87); because the five criteria below from the What Aronson, and Inzlicht 2003). they were literature reviews, book Works Clearinghouse Procedures and chapters, or summary articles Standards Handbook (U.S. Depart • Confounding factor. It is impor rather than empirical studies (20); ment of Education 2008) for assessing tant to examine factors beyond or because they focused on gender- the internal validity of studies exam the intervention that might affect based stereotype threat (107). The ining the effects of interventions: differences between groups, such remaining 75 references were subject as the effects of teachers or of to a second round of screening to • Outcome measures. The measures the intervention provider more see whether they met the following used to assess impact must be generally. No studies were noted as criteria: shown to actually measure what having problems in this area. they are intended to measure. The • Studied the effect of a social- three studies reported on here The completed study quality review p sychological intervention (rel used appropriate school measures protocols were used in developing the evant to reducing the intensity of student achievement. final list of limitations reported for of the psychological experience each of the three studies. of stereotype threat) on im • Random assignment process. In provements to student academic experimental studies researchers For further details on the methodol performance. use random assignment to assign ogy see appendixes B and C. 4 reducing STereoType ThreaT in claSSroomS

that the real-world costs of stereotype threat study methods). After relevant inclusion criteria are substantial. . . . If one could design effec were applied, three experimental studies were tive interventions for reducing the experi identified for description here. Those three studies ence of stereotype threat, then one would are described in the following section. have a powerful tool for influencing an important set of societal problems. findings of ThRee expeRimenTal sTudies The experimental manipulations used to study the of inTeRvenTions To Reduce sTeReoType effect of stereotype threat on academic test perfor ThReaT in gRade 7 classRoom seTTings mance are of two kinds, direct and indirect. The direct way of inducing stereotype threat in experi All three studies reported on here found statisti ments has been to tell the test-taking group that cally significant positive effects of the tested inter the test they will take has been sensitive to group ventions on achievement measures. The following differences in the past (for example, “this test intervention strategies were tested in the studies shows racial differences”), thus raising the poten described in detail below: tial relevance of the stereotype as an explanation for poor performance. An indirect way of studying • Reinforce for students the idea that intelli the negative effects of stereotype threat has been gence is expandable and, like a muscle, grows to inform students that a test is “diagnostic of your stronger when worked. ability” (as in Steele and Aronson 1995), convey ing that the test is designed to evaluate students’ • Teach students that their difficulties in school performance along a stereotype-relevant trait are often part of a normal “learning curve” or (intellectual ability) and consequently bringing to adjustment process, rather than something the fore concerns about confirming the stereotype. unique to them or their racial group.

To the extent that stereotype threat might be a • Help students reflect on other values in their factor in some Black students experiencing extra lives beyond school that are sources of self- stress when doing challenging academic work in worth for them. school, what can be done to alleviate this stress and possibly improve their performance? Relatively few Table 3 at the end of the main report summarizes experimental studies have been conducted in class the outcome measures, analytic techniques, and room settings on interventions to explicitly reduce the findings across the three studies. (Table B4 in the experience of stereotype threat and thus im appendix B summarizes the methodologies.) prove the academic performance intervention strategies of Black students. However, some Study 1: Blackwell, Trzesniewski, and Dweck in the studies described recent classroom-based experi 2007, “Implicit theories of intelligence predict reinforced the idea mental studies were identified that achievement across an adolescent transition: that intelligence is have relevance for educators. a longitudinal study and an intervention” expandable, taught students that difficulties This study’s search for empirical Intervention idea in school are often part studies of classroom-based social- of a normal “learning psychological interventions de- • Reinforce for students the idea that intelli curve,” or helped signed to reduce stereotype threat gence is expandable and, like a muscle, grows students reflect on other and thus improve the academic stronger when worked. values in their lives as performance of Black students sources of self-worth initially identified 289 references There is much research in psychology explor (see box 1 and appendix B on ing the idea that some students can be trained to findingS of Three experimenTal STudieS of inTervenTionS To reduce STereoType ThreaT 5

think more productively about how they approach groups of students can re- The Blackwell, performance challenges. One belief that seems to ceive more individual at- Trzesniewski, and dweck affect how students approach such challenges is tention from a teacher— study reports on the that intelligence is not fixed but malleable, that were randomly assigned effects of an intervention it can be developed through focus and effort and to an intervention or to teach students to thus that intelligence can be taught (Dweck 1999; control curriculum to see intelligence as Whimbey 1975). Indeed, Aronson, Fried, and test the effectiveness of incrementally developed Good (2002) posit that some Black students might teaching students about rather than fixed have developed a stereotype-consistent belief that the theory of incremental their intellectual ability is “fixed,” causing them intelligence. Both groups to feel more negative about academic performance received eight weekly 25-minute sessions begin situations than they would if they believed that ning in the spring of grade 7 during their regular their ability could grow with greater focus, ef advisory class period (to which they had been fort, and creativity in problem-solving strategies. assigned at random by the school). Alternatively, such students may feel that others see their ability as fixed and thus worry about Both intervention and control groups received four negative inferences being drawn about them based 25-minute sessions on the brain, the pitfalls of on their performance. Thus, reinforcing the idea stereotyping, and study skills. In four additional that intellectual ability is malleable and incremen sessions the intervention group received informa tally developed and that others view it in this way tion that focused on “growing your intelligence” indirectly reduces students’ sense of psychological and involved reading age-appropriate descriptions threat under challenging academic performance of neuroscience experiments documenting brain situations. growth in response to learning new skills and class discussions on how learning makes students The first study reports on the effects of an inter smarter. The intervention was based on previ vention to teach students to see intelligence as ous experimental materials used in studies with incrementally developed rather than fixed. college students (Aronson, Fried, and Good 2002; Chiu, Hong, and Dweck 1997). For these four ses Research question. Does teaching students to sions the control group received content unrelated see intelligence as malleable or incrementally to the malleability of intelligence and focused developed lead to higher motivation and perfor instead on topics about the brain and memory mance relative to not being taught this theory of that were unrelated to the incremental theory of intelligence? intelligence.

Study sample. The study sample included 91 grade The sessions were delivered by 16 trained under 7 students in an urban public school with low- graduate assistants, with two undergraduates as achieving students (52 percent Black, 45 percent signed to each class. To ensure consistent delivery Hispanic, and 3 percent White and Asian; 79 per of the intervention materials, session leaders cent eligible for free or reduced-price lunch). There received reading material and met weekly with the were 48 students in the intervention group and 43 research team to review the material and pre in the control group. The two groups did not differ pare to present it to their assigned advisory class. significantly in their prior academic achievement Intervention and control workshop leaders met (fall term math grades) or on any baseline mea separately to train to prepare for the four sessions sures of motivation. with different content.

What was the intervention? Students in advisory Results. The researchers first provided results to classes—periods in the schedule when small show that their intervention had been successful 6 reducing STereoType ThreaT in claSSroomS

The intervention in teaching the intervention group scores for the entire sample (spring grade 6, 2.86; group in the Blackwell, students about the incremental fall grade 7, 2.33; spring grade 7, 2.11). Analysis Trzesniewski, and dweck theory of intelligence. The results revealed a significant decline in scores for the total study improved from from a theory of intelligence ques sample between the spring of grade 6 and fall of pre- to postintervention, tionnaire given to students before grade 7 (b = –.34, t = –4.29, p < .05) and between whereas the control and after the intervention showed fall of grade 7 and spring of grade 7 (b = –.20, group showed a that participants in the interven t = –2.61, p < .05). continued downward tion group changed their opinions trajectory in performance toward a more incremental view of The researchers further reported that the interven intelligence after the intervention. tion group improved from pre- to postintervention The researchers reported that a (fall of grade 7 to spring of grade 7), whereas the paired sample t-test (see box 2 for definition of key control group showed a continued downward terms) was significant t( = 3.57, p < .05, Cohen’s trajectory in performance (figure 1). That is, d = .66), indicating that the intervention group en the intervention had a significant positive effect dorsed the incremental theory more strongly after (b = .53, t = 2.93, p < .05) on math scores from the the intervention (mean score of 4.95 on the ques fall of grade 7 to the spring of grade 7. tionnaire) than before (4.36). The control group mean score on the questionnaire did not change The researchers also collected comments from (4.62 preintervention and 4.68 postintervention; math teachers about students who had shown t = 0.32 and not significant, Cohen’s d = .07). changes in motivational behavior after the advisory class sessions. (The teachers did not know to which The important question, then, was whether condition their students had been assigned.) The achievement was higher in the intervention group study reported that 27 percent of the intervention as a result of the intervention. The researchers group students received positive comments from assessed the effect of the intervention on academic math teachers about motivational change after the achievement by examining the growth curves intervention, compared with 9 percent of the con of participants’ math scores across three points trol group, a statistically significant difference. in time: spring of grade 6 to fall of grade 7 (both prior to the intervention) and spring of grade Methodological review. No reservations were iden 7 (postintervention). The researchers noted an tified concerning the methodological quality of the overall downward trajectory in the mean math study based on the study quality review protocol

box 2 a sample be found if the real popula F-statistic. Represents the ratio of the Key terms tion difference were zero. between-group variation divided by the within-group variation. A statisti t-statistic. For a given sample size, Degrees of freedom. The number of cally significant F-statistic indicates the t-statistic indicates how often independent observations used in a that the mean is not the same for all differences in means as large as or given statistical calculation and typi groups (conditions). larger than those reported would be cally calculated by subtracting 1 from found when there is no true popula the number of independent observa Effect size. The impact of an effect ex tion difference in means (the “null tions (sample size). pressed in standard deviation units. hypothesis”). For example, a reported t-statistic that is statistically signifi b-statistic. Represents the slope of a Cohen’s d. A type of effect size that cant with a p-value of .05 indicates regression line based on predictors represents the standardized mean that in only 5 of 100 instances would measured in their naturally occur difference between the treatment and this difference between the means in ring units. control groups. findingS of Three experimenTal STudieS of inTervenTionS To reduce STereoType ThreaT 7

figure 1 assistants, not teachers. Thus, it is also unknown estimated mean math scores by experimental to what extent the intervention effect would hold condition up if delivered by teachers rather than trained 3.00 undergraduates who, in this case, were closer in age to the students.

2.75 Intervention group Study 2: Good, Aronson, and Inzlicht 2003, “Improving adolescents’ standardized test performance: an

2.50 intervention to reduce the effects of stereotype threat”

Control group Intervention idea 2.25 • Teach students that their difficulties in school are often part of a normal “learning curve” or 2.00 Spring Fall grade 7 Spring grade 7 adjustment process, rather than something grade 6 preintervention postintervention unique to them or their racial group.

Source: Blackwell, Trzesniewski, and Dweck 2007. A related potentially unproductive thought process occurs when students attribute academic struggles criteria. (See summary of quality criteria on this to their intellectual limitations, which may be more study in appendix B.) likely for students who struggle with stereotypes about their group’s intellectual inferiority. To the Conclusions. The researchers suggest that the extent that students attribute normal difficulties— incremental theory intervention “appears to have for instance, those that occur with hard-to-learn succeeded in halting the decline in mathemat topics or concepts—to fixed personal inadequacies, ics achievement” (p. 258). Future research on they may experience more distraction, anxiety, the role of teachers in changing students’ beliefs and pessimism. Thus, interventions might reduce about intelligence is needed, though these results the negative effects of stereotype threat, as well are promising, particularly as the treatment was as other forms of doubt, by encouraging students found to yield a significant effect in a low-income, to attribute difficulty in school to the transitory urban setting where problems associated with struggles all students experience. minority underperformance can be severe. Research question. Can The good, aronson, Study limitations. This study was conducted in teaching students to attri and inzlicht study asks a single school, and thus the uniqueness of the bute academic difficulties whether interventions school context or population as the setting for the to transitory situational might reduce the intervention is unknown. Another limitation in causes rather than to negative effects of generalizing the results of this study is that the stable personal causes im stereotype threat, as sample of students was racially mixed (primarily prove standardized math well as other forms of Hispanic and Black), making it difficult to deter and reading test scores? doubt, by encouraging mine whether the intervention benefited both mi students to attribute nority groups equally. The study authors acknowl Study sample. The difficulty in school to edge that the effects were measured at a single study took place in a the transitory struggles point in time, and it is not known whether the rural school district in all students experience effects of the intervention would hold up for stu Texas serving a largely rather than to fixed dents as they moved to grade 8. The intervention low-income and pre personal inadequacies sessions were delivered by trained undergraduate dominantly minority 8 reducing STereoType ThreaT in claSSroomS

population (63 percent Hispanic, 15 percent Black, reinforcing the message that they had learned and 22 percent White). Study participants were and helping to internalize the message through 138 grade 7 students enrolled in a computer skills a self-persuasion process. A restricted web space class as part of their junior high school curricu was created for each of the four conditions so that lum. Enrollment in the course was randomly students learning a particular message could read determined by the school administration, and all more about their assigned message but not read students in the course participated in the study. the messages for the other three groups and ac As part of the regular course curriculum, students quire additional ideas for polishing their web page. learned a variety of computer skills including using email and designing web pages. The mentors were 25 college students who partici pated in a three-hour training session on mentor What was the intervention? Shortly after the school ing required by the district and then supplemen year began (mid-October), students in the com tary training by the researchers on how to convey puter skills class were randomly assigned a mentor the four messages tied to the four conditions in the with whom they communicated in person and by study. The same mentors delivered the interven email throughout the school year. They were also tion to students in three of the four conditions. randomly assigned to receive one of four types of educational messages from their mentors: Results. At the end of the school year participating students’ scores on statewide standardized tests • Incremental message (40 students). Students in math and reading were analyzed for the four learned about the expandable nature of intelli groups of students. gence (as explored in the previously described study). Math test scores were analyzed using a 2 (gender) by 4 (experimental condition) analysis of variance. • Attribution message (36 students). Students The math analyses are not presented here because learned about the tendency for all students to they focused on understanding gender effects, initially experience difficulty during grade 7 which were not the focus of this report. and about the tendency for this difficulty to sub side with time and for performance to improve. Reading scores on the Texas Assessment of Academic Skills were analyzed using a one-way • Combination of messages (30 students). analysis of variance that compared the per Students received both the incremental and formance of students participating in the four attribution messages. experimental conditions. Although the research statistical tests for the ers were interested in differences between Black good, aronson, and • Control condition (32 stu and White students’ performance in the four inzlicht study showed dents). Students learned about conditions, the samples were not large enough to that scores on the the perils of drug use (an analyze the two groups separately. The analysis state reading test were unrelated topic). of variance conducted on the state reading test significantly higher scores revealed a significant effect (p < .05) of the for students in the The mentors conveyed the con- conditions, F (3,125) = 2.71. Follow-up statisti conditions receiving the tent of their assigned messages in cal tests showed that scores on the state reading incremental (malleable) person to the students during two test were significantly higher for students in the intelligence message and school visits of 90 minutes each. conditions receiving the incremental (malleable) the attributional message After learning this information, intelligence message (mean score of 88.26) and than for students in students created public service the attributional message (89.62) than for students the control group announcements on the web with in the control group (84.38) (table 1). There was guidance from their mentor, findingS of Three experimenTal STudieS of inTervenTionS To reduce STereoType ThreaT 9

Table 1 Reported intervention impacts on the spring grade 7 state reading test (Texas assessment of academic skills) intervention incremental attribution combined control effect condition condition condition condition mean reading score 88.26 89.62 86.71 84.38 Standard deviation 7.17 7.01 8.70 7.79 difference between each t(65) = 2.07 t(61) = 2.72 experimental condition and control p < .041 p < .008 not condition cohen’s d = .52 cohen’s d = .71 significant

Source: Good, Aronson, and Inzlicht 2003.

no significant difference between the combined interventions on state test performance builds on messages condition and the control condition. prior experimental studies showing the effects of similar interventions on college students’ class- Methodological review. Applying the study quality room performance (see Wilson and Linville 1985). review criteria revealed two limitations of the meth odology (see appendix B for complete summary). Study limitations. The sample in this study was mixed. Although it consisted mainly of minority Random assignment process. The study reported students, Hispanic students made up 63 percent of that 6 of the 138 students’ scores were removed the sample and Black students only 15 percent. So, from the analysis, which could be considered a there are limitations in generalizing the findings to disruption in the random assignment process. In Black students alone. As in the first study, teachers addition, no evidence was presented of the equiva did not deliver the intervention and thus it is dif lence of the four groups on baseline achievement. ficult to know under what conditions teachers can Although the authors reported that these six effectively deliver the intervention (for instance, students did not come from any particular experi how much teacher training would be needed, what mental condition or group, it is difficult to know kind of materials would they use). Nevertheless, the how well the random assignment process worked results are interesting, especially the finding of the in creating equivalent groups at baseline without intervention conditions’ significant effect on aca these data. Therefore, the study results showing demic achievement in a low-income school setting. differences between experimental conditions after the treatment should be interpreted with caution. Study 3: Cohen, Garcia, Apfel, and Master 2006, “Reducing the racial achievement gap: Intervention contamination. The same mentors a social-psychological intervention” delivered the intervention to students in three of the four conditions, so the intervention conditions Intervention idea could have been somewhat blurred if the mentors brought knowledge from one condition to their • Help students reflect on other values in their delivery of another. However, under What Works lives beyond school that are sources of self- Clearinghouse review standards, contamination worth for them. such as occurred in this study is not considered grounds for downgrading a study. Another route to alleviating stereotype threat is to allow individuals to affirm an alternative Conclusions. The authors suggest that showing positive identity—one that shores up their sense the positive impact of these attitude-changing of self-worth in the face of threat. Through 10 reducing STereoType ThreaT in claSSroomS

The cohen et al. study self-affirmation people reinforce What was the intervention? The intervention was examined whether their sense of personal worth or intended to engage students in a self-affirmation allowing students to integrity by reflecting on sources process that would alleviate some of the stress affirm an alternative of value and meaning in their Black students might feel from stereotype threat positive identity — one lives (Steele 1988). People are and thereby improve their academic performance. that shores up their sense better able to tolerate psychologi The affirmation intervention was a series of writ of self-worth in the face of cal threat in one domain (such as ing assignments designed to induce feelings of self- threat— would alleviate school) if they can shore up their worth and test whether psychological threat could stereotype threat self-worth in another domain be lessened through asking students to “reaffirm” (such as family). Laboratory their “self-integrity.” The assignments (developed research shows that self-affir by the researchers) were provided to students in an mations can reduce stress (Creswell et al. 2005). envelope and included self-explanatory instruc For example, college students asked to give a tions that required little teacher involvement. The speech in front of a sullen audience displayed teachers’ role in the study was to hand out the en lower levels of the stress hormone cortisol if they velopes containing the writing assignments, pro were first given the opportunity to engage in vide a brief scripted introduction to students, and the self-affirmation exercise of reflecting on an then to remain at their desks and allow students important value, such as their relationships with to independently complete the assignment and friends. return their work to the teacher in the envelope.

Research question. Would Black students perform The envelopes were identical for the intervention significantly better in a targeted course when they condition and the control condition assignments, received a self-affirmation intervention than when so teachers were unaware of which students were they did not? The researchers hypothesized less receiving the self-affirmation intervention. The of an intervention effect for the nonstereotyped self-affirmation assignment was designed to group, as the risk factors (elevated stress and encourage students to think about a personal value psychological threat) were expected to be lower for or values they had singled out as important and its nonstereotyped students, who do not contend with significance in their lives. a negative stereotype about their racial group. Students in both groups received a list of values and Study sample. The researchers report the results were asked to read and think about them. The val of two randomized experiments. The second, a ues were notions such as athletic ability, creativity, replication study, took place a year after the first music, relationships with friends, independence, re study and with a different cohort of students. A ligious values, and sense of humor. The instructions total of 119 Black and 124 White grade 7 students for students in the intervention group asked them participated in the two studies (roughly evenly to select their most important value (or values) and distributed across the two studies). Students were to write a paragraph about its importance to them. from a suburban northeastern middle school. The instructions for students in the control group The three teachers who participated all taught asked them to select their least important value (or the same subject area. At the beginning of the fall values) from the list and write about why it might semester students were randomly assigned to an be important to someone else. The instructions intervention or control condition. Teachers were then asked the students in the intervention group to unaware of which students in their classes were write the top two reasons why the value (or values) assigned to which of the two conditions, and the they selected was important to them. The students two experimental conditions as described below in the control group were instructed to write the were presented to students as part of the regular top two reasons why someone else might consider classroom curriculum. their least important value important. Finally, the findingS of Three experimenTal STudieS of inTervenTionS To reduce STereoType ThreaT 11

instructions asked students to select their level of The mean differences in in the cohen et al. agreement with four statements about the values the outcome measure for study Black students they chose (most important value for the interven Black students and White receiving the affirmation tion condition and least important for control con students by three levels of intervention had higher dition) as a way of reinforcing their value selection prior academic perfor grades in the targeted in the affirmation condition. mance are shown in course in the fall term table 2. The study reports than did Black students Teachers presented the instructions to students that the intervention was in the control condition as a regular classroom assignment. Completing as strong for previously the assignment took students in both interven low-performing Black tion and control conditions about 15 minutes. One students (t(31) = 2.74, p <.01) as for previously structured writing assignment was provided to moderate-performing Black students (t(30) = 2.40, students in the first study, and two were provided p < .02). The previously high-performing Black to students in the replication study. students benefited less from the intervention con dition (t(31) = 1.72, p < .10). Results. The outcome data collected were students’ GPAs from official transcripts in the targeted course The intervention effect on the difference in GPA for the fall term in which the intervention was between Black students receiving the affirma delivered. The data were analyzed using multiple tion intervention and those in the control group regression. The interaction of race (Black or White) was 0.43 point for the previously low-performing and experimental condition (affirmation interven group, 0.44 point for the previously moderate-per tion condition or control condition) was significant forming group, and 0.22 point for the previously for study 1 (b = 0.29, t(98) = 2.00, p < .05) and high-performing group. In all three cases Black study 2 (b = 0.52, t(119) = 2.80, p < .01), as was the students who received the affirmation interven treatment main effect for Black students in study 1 tion had a higher mean GPA in the course than (b = 0.26, t(41) = 2.44, p < .02) and study 2 (b = 0.34, did Black students in the control group. Addition t(60) = 2.69, p < .01). Black students receiving the af ally, the intervention effect for Black students firmation intervention had higher grades in the tar extended to courses beyond the targeted course, as geted course in the fall term than did Black students evidenced in an analysis of students’ mean GPA in in the control condition. The difference in GPA for core academic courses. Black students in the intervention condition and the control condition was 0.26 point in the first study Combining data from studies 1 and 2 shows that and 0.34 point in the second replication study. the intervention reduced the percentage of Black

Table 2 covariate-adjusted mean grade point average (averaged over both studies) for intervention and control groups, by level of preintervention performance low -performing moderate - performing high -performing student group student group student group condition black White black White black White affirmation intervention 1.7 2.2 2.8 3.3 3.5 4.0 control 1.3 2.3 2.4 3.2 3.3 4.0

Note: The grade point average is that received in the fall term in the academic subject in which the experiment was carried out at the beginning of the school year. The academic subject area was not identified in the study except to say that it was not one that was typically related to gender stereotypes (for example, math). Source: Cohen et al. 2006. 12 reducing STereoType ThreaT in claSSroomS

students earning a D or below in the fall term of although brief, can help reduce what many view the course from 20 percent in the control group, a as an intractable disparity in real-world academic rate consistent with historical norms at the school, outcomes” (p. 6). to 9 percent in the intervention group, a significant difference (figure 2). There was no significant dif Study limitations. Limitations of the study include ference between the intervention and the control the fact that it was conducted in only one school and conditions for White students. grade level in a suburban district and that it is dif ficult to determine how representative the sample is Methodological review. No reservations were of the general population from which it was drawn. identified concerning the methodological quality It is thus difficult to know whether the interven of this study on the study quality review protocol tion would yield similar benefits in other schools of criteria (see appendix B for details). varying demographic and socioeconomic charac teristics and in other grade levels. Additionally, as Conclusions. The authors conclude that “our with the other two interventions reported here, it is intervention is among the first aimed purely at unclear whether the intervention would be similarly altering psychological experience to reduce the beneficial when prepared and implemented entirely racial achievement gap.” That is, rather than “lift by teachers rather than trained researchers. Still, all ships,” the intervention benefits those most in the results are promising, as the intervention effect need—low-performing Black students. Addition proved replicable (obtained in two separate stud ally, “the research highlights the importance of ies), and the effect of the intervention on minorities’ situational threats linked to group identity in grades was consistently positive across most of the understanding intellectual achievement in real- range of prior achievement. world, chronically evaluative settings . . . [and] challenge[s] conventional and scientific wisdom by demonstrating that a psychological intervention, concluding ThoughTs on TuRning ReseaRch inTo pRacTice

figure 2 percentage of Black and White students receiving The objective of this report was to conduct a sys a grade of d or lower in targeted course in same tematic search to identify classroom-based strate semester as the intervention, by experimental gies designed to reduce stereotype threat and thus condition to improve the academic performance of Black students. The three studies that were identified Percent 30 found that the following social-psychological strat Black students White students egies had impacts on minority group achievement: 25 • Reinforce for students the idea that intelli 20 20 gence is expandable and, like a muscle, grows stronger when worked.

• Teach students that their difficulties in school 10 9 8 7 are often part of a normal “learning curve” or 6 adjustment process, rather than something unique to them or their racial group. 0 Historical norms Control condition Aﬃrmation condition • Help students reflect on other values in their

Source: Cohen et al. 2006. lives beyond school that are sources of self- worth for them. concluding ThoughTS on Turning reSearch inTo pracTice 13

When considering these studies, several limita of an intervention, key The evidence suggests tions of this review are important. First, the elements may be left out, that strategies such search was very focused, intended to identify only rendering the interven as those analyzed in studies of interventions that had been tried in real tion less effective. the three experiments school settings. For each strategy, there is a larger reported on here might body of social-psychological theory and research For example, the tim reduce the level of that led to the testing of the particular interven ing of interventions is psychological threat tion that is not reviewed. Few social-psychological important. The interven some Black students studies are conducted in classroom settings, but tions in the Blackwell, feel in the classroom it was important to focus only on studies with Trzesniewski, and Dweck and that, combined possible applicability for educators. Another (2007) and Cohen, Garcia, with other efforts, limitation is that these strategies do not repre Apfel, and Master (2006) these strategies could sent all the possible ways of reducing stereotype studies seemed to halt or benefit the performance threat, only those that have been studied with at least slow a downward of Black students rigorous research. There may be other, better ways performance spiral for of reducing stereotype threat that have not been students. All three stud studied. ies were conducted on students in grade 7, which raises the possibility that there may be windows Finally, readers should be aware that the studies of opportunity for influencing student attitudes here are small in scope, and their replicability is and beliefs. For instance, grade 7 is a time when unknown. However, it is clear that the stereotype concerns about race-based stereotype increase for threat phenomenon has been experimentally minority students and is a developmental period shown to exist across a wide variety of studies. when adolescents’ sense of identity is in flux. In Thus, it is important to share ideas for reducing terventions may be particularly influential at such the negative effects of this phenomenon, even if junctures by altering students’ early trajectory and they are in the early stages of knowledge devel preventing a path of compounding failures. opment. For the three experiments reported on here, evidence suggests that such strategies might Thus, the grade level at which the intervention reduce the level of psychological threat some Black ideas are applied is an important consideration, students feel in the classroom and that, combined as is the timing during the year. For example, the with other efforts, these strategies could benefit self-affirmation assignment may be most effec the performance of Black students. tive when given at times of high stress, such as the beginning of the school year, to halt or reverse Although researchers have developed specific a downward slide that could otherwise feed off protocols to follow for the interventions in some itself, with stress worsening performance and with contexts, educators might need to adapt the inter deteriorating performance heightening stress in ventions to fit their classrooms and then moni a repeating cycle. Such downward slides coincide tor them to determine what impact they have. with academic transitions, such as the transition An understanding of the purpose and process to middle school, high school, or college. These involved in using the strategy is important, as are times when performance standards shift is professional wisdom about how to apply the upward, when students’ sense of identity is not process in a given classroom context. Such under yet crystallized, and when social-support circles standing and awareness help ensure that the spirit are disrupted, heightening stress and feelings of of the intervention is not lost when local condi exclusion. If a small psychological intervention tions prevent a teacher from strictly following can interrupt a downward spiral at such times, or the protocols. If school teams or teachers do not prevent it from emerging, there is the possibility of grapple with the underlying rationale or purpose large and long-term effects (Cohen et al. 2006). 14 reducing STereoType ThreaT in claSSroomS

The three studies Social-psychological research not teach a student to spell who does not already reported here suggest suggests that human intellectual know the fundamentals. They will not suddenly that seemingly performance and motivation are motivate an unmotivated student or turn a small actions in the fragile (Aronson and Inzlicht low-performing and underfunded school into a classroom — when well 2004; Aronson and Steele 2005). model school. More generally, the interventions timed, well targeted, The three studies reported here would not work if there were not broader posi and thoughtfully suggest that seemingly small tive forces in the school environment (committed and systematically actions in the classroom—when staff, quality curriculum) operating to facilitate implemented — can well timed, well targeted, and student learning and performance. Without these produce positive results thoughtfully and systematically broader positive forces, social-psychological for minority students implemented—can produce posi interventions, while potentially reducing psycho tive results for minority students. logical threat levels for some students, would be unlikely to boost student learning and achieve It is important to bear in mind, however, that ment. However, when these broader positive none of these interventions would work unless forces are in place, social-psychological interven students already have some ability or motivation tions such as those reported on here may help to improve academically and unless the school Black and other minority students to overcome has the foundational resources to permit students stereotype threat and improve their performance to achieve at a higher level. The interventions will in school. concluding ThoughTS on Turning reSearch inTo pracTice 15

Table 3 summary of effects reported by the three studies analysis description of difference between Study outcome measure technique Treatment effecta intervention and control groups blackwell, predicted math growth curve t(371) = 2.93, p < .05 according to a figure presented in Trzesniewski, grades on a 0 [f] to analysis the study report, the intervention and dweck 4.0 [a] scale group averaged roughly a 0.10 (2007) increase in math course grades from fall to spring. The increase does not represent a letter grade change (such as c+ to b–); it remains within the c+ range. however, the increase for the intervention group from fall to spring, though small, contrasted with a decline in grades for the control group from fall to spring (from roughly a c+ to a c). good, reading analysis of incremental condition: compared with students in the aronson, achievement variance t(65) = 2.07, p < .05 control condition, students in the and inzlicht scores on the attributional condition: incremental condition earned (2003) Texas assessment t(61) = 2.72, p < .01 an average 3.88 points higher of academic on the TaaS and students in Skills (TaaS) effect sizes reported: the attributional intervention standardized tests • between students who condition an average of 5.24 received the message on points higher. the incremental theory of The .52 and .71 effect sizes intelligence and students in the reported are considered moderate control group: cohen’s d = .52 to large effects for educational • between students who interventions. received the attributional message and students in the control group: cohen’s d = .71 cohen, Targeted course multiple Study 1: according to the authors, the garcia, apfel, grade point regression t(41) = 2.44, p < .02 intervention effects translated and master average, on a 0 [f] Study 2 (replication): into an estimated 0.26 point (2006) to 4.33 [a+] grade t(60) = 2.69, p < .01 increase in study 1 and 0.34 point point average scale increase in study 2, respectively, in fall targeted course grades for black students in the intervention condition compared with those in the control condition.

Note: Only the Cohen et al. (2006) study directly analyzed the reduction in the achievement gap between Black and White students. The other two studies reported on positive effects of the intervention on the overall sample of students, which included primarily minority (Black and Hispanic) students. However, the two studies were not able to compare minority student improvement with that of White students. a. The effect size statistic represents the impact of the effect in standard deviation units. Because only the Good, Aronson, and Inzlicht (2003) study calcu lated and reported effect sizes, effect sizes could not be compared across the studies. Instead t-statistics and corresponding p-values are reported. For a given sample size, the t-statistic indicates how often differences in means as large as or larger than those reported would be found when there is no true population difference in means (the null hypothesis). The number in parentheses with the t-statistic indicates the degrees of freedom. Cohen’s d, a type of effect size, represents the standardized mean difference between the intervention and control groups. It is calculated by dividing the difference between the intervention group and control group means by either their average standard deviation or by the standard deviation of the control group. See box 2 for more detailed definitions. Source: Authors’ compilation and calculation from Blackwell, Trzesniewski, and Dweck (2007); Good, Aronson, and Inzlicht (2003); and Cohen et al. (2006). 16 reducing STereoType ThreaT in claSSroomS noTes intervention group that receives the interven tion and a control group that does not. If the 1. This report uses the termB lack students students randomly assigned to the interven throughout, even when the reported study tion group perform significantly better on used a different term. the outcome measure than do students in the control group (a less than 5 percent probabil 2. The No Child Left Behind Act of 2001 refers to ity of the difference between the two groups “scientifically based research” as an impor being due to chance), it is likely that the tant criterion for educators as they consider difference in performance was the result of new interventions or strategies. Randomized the intervention. Random assignment creates controlled experiments are said to be the groups that should be (on average) identical in “gold standard” of the sciences, the highest all dimensions except for receiving the inter standard of evidence or methodology avail vention; thus, any differences in outcomes can able for studying the effectiveness or impact be attributed to the intervention. The three of an intervention. In such experiments published studies identified and examined in participants are randomly assigned to one of this report use this type of research design two or more conditions that differ in a critical for testing interventions to reduce stereotype way that is hypothesized to have a particu threat in classrooms and improve academic lar impact. At the simplest level there is an performance. appendix a. reSearch on The relaTionShip beTWeen STereoType ThreaT and academic performance 17

appendix a items) was presented to one group of college stu ReseaRch on The RelaTionship dents as “diagnostic of intellectual ability.” It was BeTWeen sTeReoType ThReaT and Black hypothesized that Black students in this condition sTudenTs’ academic peRfoRmance would worry that performing poorly could con firm a stereotype about their racial group’s intel Students’ academic performance in classrooms, lectual ability. Black students performed worse in because of processes such as stereotype threat, this condition than when the same test was given can be more variable than people customarily in a second condition that introduced the test as think, fluctuating with changes in the situation one that was “not diagnostic of your ability.” The (Aronson and Steele 2005). For example, studies two ways of introducing the test had no effect on show that women’s performance on math tests the performance of White students. Black stu can be made to rise and fall with surprising ease. dents in the study sample answered roughly 8 of When women were asked to generate a short list 30 test items correctly in the “threat” condition of qualities shared by men and women, their math and roughly 12 of 30 correctly in the “no threat” test performance rose (Rosenthal and Crisp 2006). condition. In another study, when women were reminded that they were students at a selective liberal arts Since this first Steele and Aronson study, the con college and their attention was thus turned away cept of heightened performance stress or anxiety from their gender, women’s spatial-abilities test for certain groups has been found across a variety performance rose and the male-female gap shrank of potential stereotypes and minority groups. (McGlone and Aronson 2006). When women took Experimental studies have shown that detrimental a test in the presence of men, their math perfor stereotype threat affects not only Black students mance declined (Inzlicht and Ben-Zeev 2000). But on verbal tests, but Hispanic students on verbal when women were presented with a female test tests (Aronson 2002), young women on math tests proctor who excelled in math, their performance (Quinn and Spencer 2001; Spencer, Steele, and improved and the male-female gap again shrank Quinn 1999), White men in certain sports situ (Marx and Roman 2002). Such studies underscore ations (Stone et al. 1999), students from socio the degree to which human performance is shaped economically disadvantaged households on school by environmental and psychological forces—not tests (Croizet and Claire 1998), and high-perform simply by how smart a student is or how hard he ing White students on math tests when they are or she works. reminded of the stereotype of Asian superiority in math (Aronson et al. 1999). Research on stereotype threat began with labora tory studies exploring why Black college students Direct and indirect manipulations of stereotype threat seemed to be performing below their potential. Although a test-taking situation may seem objec Experimental manipulations of stereotype threat tively the same for all students, some students, have differed, and these differences can be relevant because of their social identity, may experience it to test-taking instructions used in K–12 settings in a very different way. (Quinn and Spencer 2001). One direct way of in ducing stereotype threat in experiments has been Steele and Aronson (1995) conducted an experi to tell the test-taking group that the test they will ment to explore the negative impact of adminis take has been sensitive to group differences in the tering a test under potentially stereotype-threat past (for example, “this test shows racial differ inducing conditions by randomly assigning study ences”), thus raising the potential relevance of the participants to two different test-taking condi stereotype as an explanation for the test taker’s tions. In one test-taking condition a standardized poor performance. Although drawing attention to test (composed of verbal Graduate Record Exam group differences just before administering a test 18 reducing STereoType ThreaT in claSSroomS

(for example, stating that girls have performed good student, can improve performance (McGlone worse than boys on the math test in the past, or and Aronson 2006). that Black students as a group performed poorly on the test the previous year) could cause a few Mediating mechanisms students to rise to the challenge, the laboratory research suggests that the average performance of Although inducing stereotype threat conditions negatively stereotyped group members decreases. has been shown across multiple studies to result in The fact that some of this laboratory research was poorer performance from the stereotyped group, conducted with college students on elite campuses the research has been less clear on the mediating (Steele and Aronson 1995) suggests that such a mechanisms—on why stereotype threat results in detrimental effect could occur even among the poorer performance. most confident and skilled students. Some researchers have studied mediating mecha- A less direct way of studying the negative ef nisms that might interfere with the quality of fects of stereotype threat has been to inform the the performance under conditions of stereotype students in the study that the test is “diagnostic threat such as increases in stress, anxiety, self- of your ability” (as in Steele and Aronson 1995). consciousness, mental load, or heightened de- This conveys that the test is designed to evaluate mands on working memory—all of which could students’ performance along a stereotype-relevant lead to less focus on the task at hand, suboptimal trait (intellectual ability) and consequently can test-taking strategies (such as guessing more), and bring to the fore concerns about confirming the underperformance (Beilock et al. 2006; Schmader stereotype. Experimental studies have shown that and Johns 2003). Making students aware of the the performance of the stereotyped group tended effects of anxiety from stereotype threat has to be poorer in the group that received the instruc been shown in several studies to improve the tion that the test was diagnostic of ability than in performance of negatively stereotyped students the comparison group that received instructions (Johns, Schmader, and Martens 2005; McGlone emphasizing that the test is not diagnostic of abil and Aronson 2007), presumably because aware ity (Spencer et al. 1999; Steele and Aronson 1995). ness of external pressures reduces the tendency to attribute test anxiety to one’s intellectual short- The power of these direct and indirect ways of comings by providing an alternative attribution. inducing stereotype threat relates to a general The study findings suggest that helping students psychological principle that has been widely understand stereotype threat might inoculate studied—the priming effect. The priming effect them in some way against the extra stress or lack refers to the tendency for people to conform their of focus that might take their attention away from thoughts, feelings, and behaviors to psychologi the performance at hand. cally accessible mental constructs such as stereo types. Thus, when individuals are “primed” with Experiencing stereotype threat over time a negative stereotype, their interpretations of ambiguous stimuli, behaviors, or performances Although difficult to study, some long-term effects are often influenced by the stereotype, even when of repetitively experiencing the extra stress due the priming occurs at the unconscious or sub- to stereotype threat have been suggested. One liminal level. The implication of priming effects consequence might be that as Black students have for teachers trying to encourage their students to the opportunity to make choices in school, some perform to their potential is that subtle events in of them might avoid challenges by selecting easier the classroom can undermine a student’s confi courses or assignments when they are being aca dence, trust, and performance. Studies also show demically evaluated. Studies with middle school that priming positive concepts, such as being a minority students have found that students asked appendix a. reSearch on The relaTionShip beTWeen STereoType ThreaT and academic performance 19

for easier problems to solve when confronted with But there were individual differences that moder the prospect of being intellectually evaluated on ated these findings. Minority students were less the basis of their performance (Aronson and Good likely to avoid a challenge if they believed that the 2002). Compared with White students, the minor challenge could increase their intelligence. Ad ity students showed a strong tendency to take on ditionally, reducing stereotype threat through an less challenging work, presumably because they experimental intervention increased minorities’ were threatened by the prospect of looking less interest in taking challenging rather than easy col intelligent if the challenge proved too great. lege courses (Walton and Cohen 2007). 20 reducing STereoType ThreaT in claSSroomS

appendix B threat.” Forward citation searches using seminal meThodology stereotype threat papers and searches of reference lists in newly published work were also conducted. The methodology for this study included a sys The searches yielded 158 citations (table B1). In tematic search, screening, and review process to addition, a web site on this topic, with an extensive ensure methodological replicability. reference list of peer-reviewed journal articles, was reviewed (www.reducingstereotypethreat.org). Search process Launched on November 28, 2007, the web site was developed by Steve Stroessner (Columbia Univer A systematic search was conducted to identify sity) and Catherine Good (Baruch College), but is empirical studies of classroom-based social- now maintained solely by Stroessner. Until June 26, psychological interventions designed to reduce ste 2008, it was updated monthly or bimonthly. Scan reotype threat and thus to improve the academic ning the web site reference list resulted in an ad performance of Black students. ditional 131 citations, for a total of 289 references.

The broadest search used the Education Resources Screening Information Center (ERIC) and the search term “stereotype threat,” resulting in 44 citations. The references were screened twice, first for Subsequently, narrower search term combinations, content relevance and then for intervention and such as “stereotype threat” and “intervention,” sample relevance (see appendix C for the six and “achievement gap” and “intervention,” were screening criteria). used to search several bibliographic databases. To identify new literature, PsycInfo was used to Initial screening of references. Citation informa search on “stereotype threat” and “social identity tion from these 289 references was entered into an

Table b1 search results number of references Search engine or web site database Search terms identified eric stereotype threat 44 ebScohost psycinfo achievement gap and intervention 0 ebScohost academic Search premier achievement gap and intervention 0 Wilson Web education index racial achievement gap and intervention 0 ebScohost psycinfo stereotype threat and intervention 3 ebScohost academic Search premier stereotype threat and intervention 1 Wilson Web education index stereotype threat and intervention 0 ebScohost eric stereotype threat and intervention 2 first Search dissertation abstracts stereotype and threat 108 first Search dissertation abstracts stereotype threat and intervention 0 first Search dissertation abstracts racial achievement gap and intervention 0 www.reducingstereotypethreat.com na na 131 Total references 289

na is not applicable. Source: Authors’ compilation. appendix b. meThodology 21

internal tracking database for documenting dispo figure b1 sition. These references were first screened for in first- and second-level screening and assessment clusion using three questions on content relevance of the quality of studies (see article screening protocol in appendix C): Search process = 289 citations • Is the article on topic? Included because Excluded because they were on topic they were oﬀ topic • Is the citation an empirical study? (202 of 289) (87 of 289)

• Does the study focus on race-based stereotype Excluded because threat? they were literature Included because they reviews, book chapters, were empirical studies summary articles, or If the title or abstract did not provide enough (182 of 202) nonempirical studies information about the study, the full article was (20 of 202) reviewed for relevance. Table B2 and figure B1 Round 1: initial screening Included because they Excluded because they show the disposition of references. focused on race-based focused on gender- stereotype threat based stereotype threat Applying the first set of three criteria in the article (75 of 182) (107 of 182) screening protocol led to 214 exclusions: References remaining after initial screening (75) • 87 references, as off-topic or irrelevant.

Excluded because they did not • 20 references, which were literature reviews, Included because they study interventions to reduce studied interventions to stereotype threat—most book chapters, or summary articles—not were studies trying to induce reduce stereotype threat stereotype threat to empirical studies. (10 of 75) demonstrate that the phenomenon exists (65 of 75) • 107 references, which focused on gender-based Included because the Excluded because the stereotype threat (conditions under which sample included sample did not include women perform worse than men on math Black students Black students (8 of 10) (2 of 10) tests) rather than race-based stereotype threat.

Round 2: second-level screening screening Round 2: second-level Included because Excluded because K–12 Second-level screening of relevant references. The K–12 students were students were not the remaining 75 references were subject to a second the focus focus—college students (3 of 8) were (5 of 8) round of screening to determine whether the stud ies met the following criteria: References remaining after second-level screening (3) • Examined the effect of a social-psychological intervention (relevant to reducing the in tensity of the psychological experience of stereotype threat) on improvements to student Blackwell, Good, Cohen, Garcia, Trzesniewski, Aronson, and Apfel, and academic performance. and Dweck Inzlicht Master (2007) (2003) (2006) • Included Black students in the sample.

• Included K–12 students as the focus (not college students fulfilling requirements to Results: Three studies included in the report What Works Clearinghouse criteria Clearinghouse criteria Works What participate in experiments). applying process review Eligible for 22 reducing STereoType ThreaT in claSSroomS 1 0 0 0 2 0 0 0 0 0 after Total Total included screening (conTinued) (conTinued) references -– setting/ non K 12 classroom student focus

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 2 0 black Sample students students excludes round 2 5 0 0 0 0 0 0 0 0 36 not an to reduceto intervention stereotype threat 0 0 0 1 1 0 0 0 18 32 gender threat only 2 10 round 1

review) review) (chapter/ literature stereotype nonempirical 10 35 off topic topic 0 0 0 0 0 0 0 0 3 0 0 1 0 0 0 0 0 2 1 0 1 0 0 0 44 108 number identifi ed identifi achievement gap and intervention intervention achievement gap and intervention stereotype threat and intervention intervention threat and intervention stereotype threat and intervention and threat stereotype threat threat and intervention Search terms of studies psycinfo academic Search achievement premier gap and education index racial psycinfo academic Search stereotype premier threat and education index stereotype eric dissertation stereotype abstracts dissertation stereotype abstracts database

Search engine ebScohost ebScohost Wilson Web ebScohost ebScohost Wilson Web ebScohost first Search eric first Search Table b2 Table disposition of references appendix b. meThodology 23

This second round of screening excluded 72 stud ies (see table B2). 0 0 na after Total Total included eferences screening r The majority of the studies (65) were excluded for not meeting the first criterion. The studies explored various aspects of the negative impact of -– stereotype threat on Black students. They did not assroom setting/ non K 12 cl

student focus test a social-psychological intervention aimed at

improving Black student performance by reducing 2 5 0 0 1 3 stereotype threat or mitigating its effects. black xcludes Sample students students e round 2 Two studies were excluded because they did not include Black students. Studies that included Black students as part of their sample were retained. No 0 24 65 not an specific percentage of the sample was stipulated to reduceto ntervention i as having to be Black students. (Also, no criterion stereotype threat was specified for sufficient representation of Black students for analyses of outcomes by ethnicity.) 0 55 107 gender tereotype s threat only Of the three studies that remained after screen ing, only one study (Cohen et al. 2006) specifi cally analyzed race as a factor. In the Blackwell, 7 20 round 1

Trzesniewski, and Dweck (2007) study the review) review) chapter/ ( literature students were from a large urban school district, nonempirical and all were minority (52 percent were Black and 0 0 41 87 45 percent were Hispanic). In the Good, Aronson, off opic opic t and Inzlicht (2003) study, the students were from a rural district in Texas with 70 percent eligible 0 for free or reduced-price lunch (67 percent were 131 f studies number identifi ed identifi

o Hispanic, 13 percent were Black, and 20 percent were White). The researchers noted that previous research had demonstrated stereotype threat ef

na na fects for Black, Hispanic, and low-income students rms chievement and argued that, for this reason, “all of the partici a gap and intervention Search te pants in the sample were potentially susceptible to stereotype threat” (p. 652). In the Cohen, Garcia, na na na Apfel, and Master (2006) study, participants were

issertation racial from a suburban northeastern middle school abstracts d database with a student population equally split between Black and White students. Whereas the other two studies were conducted in socioeconomically disadvantaged settings, this study was conducted in a suburban area. However, race (Black or White) was used as a factor in the analyses (119 Black Authors’ compilation. and 124 White students participating). Interest g ducingstereotypethreat. Search engine or first Search www. re Total references excludedTotal ingly, all three included studies focused on grade 7 na is not applicable. Source: Table b2 (conTinued) b2 Table disposition of references students. 24 reducing STereoType ThreaT in claSSroomS

Five studies were excluded because they did not (in appendix C) were reviewed first by a Regional include K–12 students as their focus. Though the Educational Laboratory (REL) Southeast researcher studies examined the impact of an intervention on using a study quality review protocol (see appendix improving Black student performance, the sample C). The researcher adapted the items on the proto was college students in laboratory settings, not col from one used by REL Central, which provided K–12 students. Thus, these studies lacked external the researcher with background knowledge about validity. Although a common practice in certain the meaning of each item. The completed protocols disciplines, it is difficult to generalize results from for each study and the study articles were then studies conducted with college students to other examined by an external reviewer trained in What populations, especially to populations that are Works Clearinghouse (WWC) criteria. significantly younger. Development of study quality review protocol. Verification search Researchers for this study obtained a copy of a cod ing protocol that REL Central had developed using Because of the small number of studies identified the WWC evidence standards (U.S. Department for inclusion, a second broader, verification search of Education 2008) to code studies included in the was conducted to catch any relevant studies that report Using strategy instruction to help struggling might have been missed in the focused search of high schoolers understand what they read (Apthorp the databases. This verification search used the and Clark 2007). This coding protocol included broadest search term of “stereotype threat” without criteria that WWC indicates are important, such the word “intervention,” searching the literature as adequacy of outcome measure, equivalence of using the terms “stereotype threat,” “stereotype,” groups at baseline, extent of overall and differential and “threat.” The EBSCO host search engine was group attrition, intervention contamination, and used to search the ERIC, PsycINFO, Academic confounding of teacher and intervention. Also Search Premier, and Soc Index with Full Text included were descriptive items to summarize each databases. Also, the Education Index database study, such as independent and dependent variable was searched using Wilson Web, and the Disserta description, summary of analysis and results, and tions Abstracts database was searched using First an overall narrative summary of the study. Search. The entire text of identified documents was searched, not just keywords or title. The only The REL Central coding protocol was simplified limit placed on the search was the publication year, for this study, as the intention was to describe any which was set at between 1990 and 2007 (as the limitations in the methodology of the three studies concept of stereotype threat emerged in the 1990s). based on an interpretation of WWC standards and the researchers’ understanding of good sci This search identified 741 references. Reviews of ence, rather than to conduct a WWC-level review. the titles and abstracts turned up no additional The REL Southeast staff member who developed studies appropriate for inclusion. The reasons for the protocol and who has experience in research exclusion were as follows: 74 percent were off-topic, design used the study quality review protocol to 14 percent were not empirical, and 12 percent were gather information from each study on items in the on-topic but did not test an intervention, occur in protocol: adequacy of outcome measure, random K–12 classrooms, or include Black students. assignment process, overall attrition, differential group attrition, intervention contamination, and Review process: identifying methodological confounding factors. A section was not included limitations of included studies on items related to assessing the quality of quasi- experimental designs in the protocol since all three The three studies identified as meeting the six identified studies used an experimental design. inclusion criteria in the article screening protocol The completed coding protocol on each study was appendix b. meThodology 25 reviewed by the external reviewer, who raised Because these new factors could affect group questions for clarification with the third researcher outcomes, they also could affect the conclu from REL Southeast and the initial coder. sions of the experiment. An example is a teacher in an intervention group sharing the Assessing the quality of identified intervention intervention materials with a teacher in a studies. The three studies were subject to a final control group. One study was noted as having quality review to describe any methodological a possible limitation in this area (Good, Aron limitations, using a study coding protocol (see ap son, and Inzlicht 2003). pendix C) based on the five criteria below from the What Works Clearinghouse Procedures and Stan • Confounding factor. It is important to examine dards Handbook (U.S. Department of Education factors beyond the intervention that might affect 2008) for assessing the internal validity of studies differences between groups, such as the effects examining the effects of interventions: of teachers or of the intervention provider more generally. For example, if each condition of the • Outcome measures. The measures used to assess study involves only one teacher’s classroom, then impact must be shown to actually measure what the effects of the teacher cannot be separated they are intended to measure. For studies in from the effects of the intervention. No studies school settings, common academic achievement were noted as having problems in this area. measures include state- or locally mandated tests and course performance (term grades). The Methodological review. The methodological three studies reported on here used such school limitations reported for each study were identi measures of student achievement. fied through this process. The results of the study quality review process are shown in the individual • Random assignment process. In experimental descriptions of each study below and summarized studies researchers use random assignment to in table B3. Table B4 summarizes the methodology assign participants to experimental condi of the three studies. tions (intervention or control) to ensure that the groups are as similar as possible on all Blackwell, Trzesniewski, and Dweck (2007). No characteristics so that the outcomes measured limitations were noted in applying the quality reflect the influence of the intervention only. review criteria to this study: All three of the studies reported on their random assignment process, so any threats to • Random assignment process. Students were random assignment could be identified. Only randomly assigned by the school to regularly one study had a limitation in this area (Good, scheduled advisory classes (groups of 12–14). Aronson, and Inzlicht 2003). Each pre-existing advisory group was as signed by the research team to an intervention • Attrition of participants. Loss of participants or control condition. The researchers reported can create differences in measured outcomes by baseline equivalence data: fall term math changing the composition of the intervention grades for the students were not significantly or control groups. Both overall attrition and different for the two groups (2.38 for the inter differential attrition (differences between inter vention group and 2.41 for the control group). vention and control groups) are of concern. All three studies were acceptable in this area. • Attrition. The attrition rate (students who did not complete the eight-week sessions) was • Intervention contamination. Intervention 5 percent and roughly equivalent for both contamination can happen when unintended groups (three from the intervention group and events occur after intervention begins. two from the control group). 26 reducing STereoType ThreaT in claSSroomS

Table b3 Quality of final studies included in report adequacy of random assign - differential intervention confounding outcome measure ment process overall attrition attrition contamination factor Study 1 2 3 1 2 3 1 2 3 1 2 3 1 2 1 2 blackwell, Trzesniewski, ✔ ✔ ✔ ✔ ✔ ✔ and dweck (2007) good, aronson, ✔ ✔ ✔ ✔ ✔ ✔ and inzlicht (2003) cohen, garcia, apfel, ✔ ✔ ✔ ✔ ✔ ✔ and master (2006)

Note: 1 = acceptable; 2 = acceptable with reservations; 3 = not acceptable. Source: Authors’ compilation.

• Intervention contamination. There was no achievement. Although the authors reported reporting of any events during the eight that the six excluded students did not come weekly 25-minute periods that might differ- from any particular condition, it is difficult entially affect the two groups. Each advisory to know how well the random assignment group was assigned to a condition, making it process worked in creating equivalent groups less likely students would share information at baseline. Therefore, results showing differ- across conditions. ences between experimental conditions after the intervention should be interpreted with • Confounding factors. The study used un caution. dergraduate assistants to deliver the eight sessions, assigning two undergraduates as • Intervention contamination. The same men- workshop leaders for each advisory class. Dif tors provided the intervention to students ferent workshop leaders were assigned to each in three of the four experimental condi advisory class. Student participants all had the tions, so the intervention conditions could same math teacher during the study period, have been somewhat blurred if the mentors so differences in math teachers could not have brought knowledge from one condition to influenced differences in math grades between their delivery of another. In addition, students the intervention and control students. were all in the same class so they could have discussed or shared their experiences across Good, Aronson, and Inzlicht (2003). Two limita the experimental conditions. Such a problem tions were noted in applying the study quality would work against finding a significant review criteria that might limit the confidence in difference between the control group and the the results of this study. other experimental conditions, thus, perhaps strengthening confidence in the intervention • Random assignment process. Six of the 138 condition effects where found. (Under WWC students’ scores were removed from the analy review standards, contamination such as oc sis. In addition, evidence was not presented on curred in this study is not considered grounds the equivalence of the four groups on baseline for downgrading a study.) appendix b. meThodology 27

No limitations were found relative to the attrition • Attrition. Individual student attrition (ab or confounding factors. sences, missing data, experimenter error) was four students for study 1 (roughly 3 percent • Attrition. Roughly 4 percent of students were attrition), leaving 111 students in the final excluded from the reading test analysis based sample, and seven students from study 2 on an outlier analysis intended to identify (roughly 5 percent), leaving 132 students in students whose test score results represented the final sample. There was no differential at- very limited English speaking skills. This trition as a function of condition, as indicated attrition rate is less than the 20 percent level by the authors in a subsequent correspon determined as significant attrition. The attri dence; baseline covariates were used in the tion was reported as occurring equivalently analysis. across groups. • Intervention contamination. There was no • Confounding factors. The participating stu reporting of events or circumstances that dents were all part of one class, but the teacher might have contributed to contamination. The did not provide the intervention. Students experiment was double-blind, so the teach- were randomly assigned to one of four condi ers did not know what condition the stu tions and also randomly assigned to a mentor dents were assigned to, nor did the students. who provided their condition. Additionally, neither group was aware of the experimental hypothesis, and students were Cohen et al. (2006). No limitations were noted in unaware of the intervention. applying the quality review criteria to this study, as summarized below. • Confounding factors. Students were the unit of analysis for the study and were randomly • Random assignment process. The article assigned to the two conditions in approxi reported on two randomized, double-blind mately equal numbers for each of the three experiments of an affirmation intervention. teachers. Because fall grades in the targeted Students in three teachers’ classrooms were course were the outcome measure and teach- involved. Random assignment to either the ers may grade differently, the regression affirmation intervention or control condi analysis included a teacher variable (dummy tion was at the level of the individual student. codes for the three teachers), a main effect For each teacher/classroom period, there of baseline in-class performance measures, were about equal numbers of students in the and two terms representing the interaction two conditions. Baseline measures for each of baseline in-class performance with each of student (standardized measure of pre-inter the two teacher dummy variables to control vention in-class performance, prior year grade for teacher differences in the predictiveness point average in core courses, and pre-inter of early in-class performance. Thus, teacher vention test score) were collected and used in effects were addressed and did not threaten the analysis as potential covariates. internal validity. 28 reducing STereoType ThreaT in claSSroomS

= 0.47), d = 0.47), (conTinued) (conTinued)

cessfully communicated. indicating that the theory of intelligence message was suc ligence questionnaire than did the control group and scored icantly higher on the mea sign fi sure after the intervention than did the control group ( of the intervention group as showing positive motiva tional change, compared with 9 percent in the control group, a ff ierence. icant signd fi intervention (Time Time 2 to 3; Thus, the p < .05). t = 2.93, b = 53, sample a whole as was decreas ing in grades, but decline this was eliminated for those in the experimental condition. The eredby the decline in grades suff control-group students mirrors that commonly observed over the junior high school transition” 257). (p. group showed a sign fi icantly group showed a sign fi greater change from pretest to posttest on the theory of intel of experimental condition on change in grades across the

findings findings 1. 1. Students in the intervention 2. 27 percent ed identifi Teachers 3. ect icant eff “There was a sign fi modeling modeling

analysis methods 1. 1. analysis of variance 2. chi-square test 3. 3. hierarchical linear scores in math 3) (time classroom motivation spring term intelligence questionnaire

measures and outcomes 1. 1. Theory of 2. changes in 3. grade 7, design the malleability of information on group sessions did not include the intervention and control.bothgroups received eight but the control random assign ment advi by sory class periods weeks of sessions, intelligence. to twoto conditions: hispanic. 45 percent45 were black. 52 percent were 52 Sample Sample • tal condition and in the control 43 condition): • 91 students91 in grade 7 in a large urban school district (48 in the experimen description intervention materials on the brain sions,but they received in the control condition also received eight ses reading, activities, and discussions related intelligence. Students of the sessions rather than the intervention materials on how to grow intelligence. students engaged in Students partici pated in eight weekly 25-minute sessions. theto malleability of and memory for four during the sessions

Study (Study 2) 2) (Study blackwell, Trzesniewski, and dweck (2007) (2007) Table b4 Table methodological summary of the three experimental studies reviewed appendix b. meThodology 29 (conTinued) (conTinued)

= 1.13), attribu d = 1.13),

= 1.50), or combined d = 1.50),

= 0.52) and attributionald = 0.52)

= 1.30) condition achievedd = 1.30) d = 0.64). = 0.71) conditions achieved d = 0.71) vention procedures meaning sizes indicating that the inter n fi ect of gender, icant main eff n fi fi icant gender- signby a ed qualifi condition interaction; girls in the incremental ( tional ( ( icantly higher math scores sign fi than girls in the control condi ect these are largetion. eff “all fully increased females’ math scores compared the to control marginally condition” 656). (p. icant improvement was sign fi found for boys in the incremen tal condition compared with boys in the control condition ( a sign fi ectcondition. of icant eff a sign fi planned comparisons indicated that students in the incremen tal ( ( icantly higher reading sign fi scores than students in the con trol condition.

findings findings 1. 1. math results indicated a sig 2. reading. revealed The anova

(gender) 4 by (experimental con analysisdition) of variance (anova) submitted a to was conducted fol lowed planned by comparisons. test scores were ier examineto d ff one-way anova comparing perfor mance across the four conditions fol lowed planned by comparisons. yses, the sample size was too small ences between ethnic groups.

analysis methods 1. 1. for math, a 2 2. for reading, 3. in both sets of anal measures and outcomes ment of academ mathematics assess(Texas achievement scores on state wide standard ized tests of reading and ic Skills) ic Skills) design blind to the specifi c blind the to specifi The mentors were of diffi culty, combi culty, of diffi incremental intel ligence, attribution hypotheses of the study. random assign ment of students in the computer skills course one to of four conditions: nation,andantidrug (control). hispanic. sample were 67 percent of 67 percent were 13 percent20 were White. black. Sample Sample free or reduced- stereotype threat students, girls in math, and students Texas, 70 percent 70 Texas, of them eligible for • • • The researchers note that previous research demon strated negative students, hispanic from low-income households. Thus, of the partici“all pants in the sample were potentially susceptible to stereotype threat” 652). (p. 138 grade 7 138 students who were enrolled in a computer skills class in a rural district in price lunch: ects on perfor eff mance for black description intervention tions that were hypoth school year through were three interven dangers of drugs (a who taught them mes sages throughout the esized help to students deal with stereotype taught that intelligence malleable,is school culty is normal, or diffi control group students taught them about the threat: students were Studentswere random ly assigned mentors (students from a local college trained de to liver the intervention) personal and email communication. There both. mentors for the topic unrelated the to interventions). Study good, aronson, and inzlicht (2003) Table b4 (conTinued)Table methodological summary of the three experimental studies reviewed 30 reducing STereoType ThreaT in claSSroomS the self-affi rmation condition the self-affi earned a higher grade in the targeted course than students in the control group (average ierence between intervention d ff and control group grades was ierence 0.30 grade no point). d ff was found between the condi tions for White students. The race-condition interaction was icant for both studies. sign fi ies showed that black students participating in the intervention were less likely receive to a d or below in the percent) course (9 than were students in the con trol conditionpercent). (20

findings findings 1. 1. black students participating in 2. combining data from both stud

analysis methods 1. 1. multiple regression measures and outcomes age in the tar geted academic course. The intervention was delivered early in the fall semester in the targeted course. The tar geted course was the same subject area for all ex fall semesterfall grade point aver perimental and control students. design to oneto of two condi rma tions: self-affi tion treatment or control condition, resulting in four racecells: (black or 2 condi by White) rmation tions (affi or control writing assignment) Students were randomly assigned 119 were black. 119 124 were White. 124 Sample Sample on (study 2 was a suburban north eastern middle school with a student body split evenly between black and White students. students 243 par ticipated in the two studies reported participants were seventh graders in a replication of study ierent 1 with d ff students): • • The treatment group had 119 students, and the control group had students. 124 description intervention cise in which students completed an in-class writing assignment early in the school writing aboutyear, an important value they held. The writing ex ercise (provided once in study 1 and twice in the replication study 2) was provided without The intervention was rmationexer a self-affi teacher instruction. Students read instruc tions provided in an the instructions, placed envelope, completed their work in the envelope and returned in the it the to teacher. rmation treatment, affi students were asked to select the most impor tant personal value (or fromvalues) a list and asked write to a para graph about the why value was important them.to control group students were asked to select the least impor tant value and write to about the why value might be important to someone else. Authors’ compilation. Study and master (2006) cohen, garcia, apfel, Table b4 (conTinued)Table methodological summary of the three experimental studies reviewed Source: appendix c. arTicle Screening and STudy qualiTy revieW proTocolS 31

appendix c Relevance screen summary. In order for the study to be aRTicle scReening and sTudy included, it must have passed all relevance screens (content QualiTy RevieW pRoTocols relevance, intervention-type relevance, and sample rel evance). If “yes” for all six items above, the study is eligible Article screening protocol for inclusion in the report if it is judged of sufficient meth odological quality applying What Works Clearinghouse– Coder (name and date): ______based criteria.

APA style citation: ______Study quality review protocol ______Study information and outcome measure Initial-level screening for relevance 1. Study information Content relevance a. Reference citation (author and publication year): 1. Is the article on topic? ______■ Yes ■ No (exclude) b. Title: ______

2. Is the citation an empirical study (not a literature c. Source: review, book chapter, conceptual paper, etc.)? ■ Dissertation ■ Yes ■ Conference presentation ■ No (exclude) ■ Technical report ■ Book or book chapter 3. Is the study focused on race-based stereotype threat (as ■ Journal (specify name): ______opposed to gender or other type of stereotype threat)? ■ Yes 2. Adequacy of outcome measure. List the outcome ■ No (exclude) measures and the validity and reliability evidence as outlined below. Second-level screening for relevance Examples of validity evidence includes test or measure Intervention-type relevance has correlations from studies of concurrent validity, predictive validity, factor analysis; measure is in estab 4. Does the study investigate interventions aimed at lished use as an academic achievement indicator (for reducing stereotype threat? example, a state-developed standardized test adminis ■ Yes tered as part of an annual student testing program or ■ No (exclude) course grades on official transcripts) and thus has face validity as a reliable measure of student achievement. Sample relevance Examples of reliability evidence include internal con sistency, test-retest, or, if measure requires judgment, 5. Does the study include African American students? interrater reliability. ■ Yes ■ No (exclude) Quality review criteria

6. Does the study focus on K–12 students? 3. Random assignment process. In looking at informa ■ Yes tion included in the study on the random assignment ■ No (exclude) process . . . (check one) 32 reducing STereoType ThreaT in claSSroomS

■ There were no reported disruptions of or con ■ There was no information on attrition provided, taminations in random assignment process, and degrees of freedom do not provide adequate and/or baseline equivalence was checked. ➔ information. ➔ (Not acceptable) (Acceptable) 5. Differential sample attrition. In looking at information ■ There was evidence of disruptions or contamina included in the study on sample attrition . . . (check one) tions in random assignment process, but they were minor and/or pre-test differences were checked ■ There was no significant attrition differential and were nonsignificant➔ (Acceptable with between intervention and comparison groups reservations) (<7 percent). ➔ (Acceptable)

■ There was evidence of disruptions or contamina ■ There was significant attrition differential (>7 per tions in random assignment process, and pretest cent), but group comparability was demonstrated. differences were not checked or were checked, ➔ (Acceptable) were significant, and were not corrected statisti cally. ➔ (Not acceptable) ■ There was significant attrition differential (>7 per cent), and group comparability was not demon 4. Attrition. In looking at information included in the strated. ➔ (Acceptable with reservations) study on attrition . . . (check one) ■ There was no information on attrition differential ■ There was no significant attrition (<20 percent provided, but degrees of freedom provide adequate overall). ➔ (Acceptable) information and indicate no significant attrition (<7 percent). ➔ (Acceptable) ■ There was significant attrition (>20 percent overall), but postattrition equivalence was demon ■ There was no information on attrition differential strated. ➔ (Acceptable) provided, but degrees of freedom provide informa tion that indicate significant attrition (>7 percent); ■ There was significant attrition (>20 percent over however, group comparability was demonstrated. all), and postattrition equivalence was not demon ➔ (Acceptable) strated. ➔ (Acceptable with reservations) ■ There was no information on attrition differential ■ There was no information on attrition provided, provided, but degrees of freedom provide informa but degrees of freedom provide adequate informa tion that indicate significant attrition (>7 percent); tion and indicate no significant attrition (<20 per group comparability was not demonstrated. ➔ cent overall). ➔ (Acceptable) (Acceptable with reservations)

■ There was no information on attrition provided, ■ There was no information on attrition differential but degrees of freedom provide information that provided, and degrees of freedom do not provide indicates significant attrition (>20 percent overall), adequate information. ➔ (Not acceptable) and postattrition equivalence was demonstrated. ➔ (Acceptable) 6. Intervention contamination. Was there evidence of something happening after the beginning of the ■ There was no information on attrition provided, intervention that affects the outcomes for the interven but degrees of freedom provide information that tion or control group (affects the outcome of one of the indicate significant attrition (>20 percent overall), groups in an unexpected way)? (check one) and postattrition equivalence was not demon ■ No ➔ (Acceptable) strated. ➔ (Acceptable with reservations) ■ Yes ➔ (Acceptable with reservations) appendix c. arTicle Screening and STudy qualiTy revieW proTocolS 33

7. Confounding factor (teacher or other intervention b. Race/ethnicity of students included (check all that delivery agent confounds). In looking at information apply): included in the study on the assignment or role of ■ African American/Black teachers or other intervention delivery agents (e.g., ■ Asian/Pacific Islander mentors), was there any situation in which one teacher ■ Hispanic/Latino or other intervention delivery agent was assigned to ■ White just one experimental condition? (check one) ■ Other ■ No ➔ (Acceptable) ■ Multiracial ■ Yes ➔ (Not acceptable) c. Percentage of students receiving free or reduced- 8. Summary of randomized controlled trial study quality price lunch: ______review criteria. Look back at your answers for ques tions 2–7 and enter the results in the following box: d. Grade level (check all that apply): ■ K–3 acceptable ■ 3–5 with not ■ criteria acceptable reservations acceptable 6–8 ■ q2. adequacy 9–12 of outcome measure e. Age (mean and/or minimum–maximum): _____ q random3. assignment f. Total sample size: ______process q overall4. attrition g. Achievement outcome variable (if more than one treatment group, specify in parentheses which q differential5. treatment group you are placing in each column) sample attrition achievement q intervention6. outcome outcome variable Standard score mean contamination outcome 1 Treatment 1 (specify) (specify) q confounding7. Treatment 2 (specify) factor Treatment 3 (specify)

control Overall study description. The following items are used to ensure that consistent information was gathered about each overall study. outcome 2 Treatment 1 (specify) (specify) 9. Study population sample. Treatment 2 (specify)

a. School district/local: Treatment 3 (specify) ■ Urban control ■ Suburban ■ Rural overall ■ Missing 34 reducing STereoType ThreaT in claSSroomS

Independent variable/intervention 12. Results. Please fill in the following table for each out come included in the study 10. Intervention description page outcome Statistic notes numbers a. Briefly describe the intervention, including the outcome 1 mean/count/ stated purpose and any required special condi (specify) proportions tions or resources: (include both treatment and control statistics) Sample size (include both treatment and control statistics) b. Subject area (check all that apply): Standard ■ English/language arts deviation ■ Math (include both ■ treatment and Social studies control statistics) ■ Science Test statistic ■ Other (specify course title) Were group ______differences statistically c. Duration of intervention: ______significant? (provide p-value) researcher Analysis and results. reported effect size 11. Unit of assignment and analysis match. (including type, if available) a. Was there a match between unit of assignment outcome 2 mean/count/ (specify) proportions and analysis? (Check one) (include both ■ Matched, both were students treatment and ■ Matched, both were teachers control statistics) ■ Matched, both were schools Sample size ■ Not matched, not addressed in analyses, (include both treatment and group differences not statistically significant control statistics) ■ Not matched, not addressed in analyses, Standard group differences statistically significant deviation (include ■ Not matched, but addressed in analyses both treatment Explain: ______and control statistics) ______Test statistic ______Were groups differences b. Was an effect size reported? statistically ■ No significant ■ Yes (specify pages) ______(provide p-value)? researcher reported effect size (including type, if available) referenceS 35

RefeRences Blackwell, L., Trzesniewski, K., and Dweck, C.S. (2007). Implicit theories of intelligence predict achievement Apthorp, H., and Clark, T. (2007). Using strategy instruc across an adolescent transition: a longitudinal study tion to help struggling high schoolers understand what and an intervention. Child Development, 78(1), 246–63. they read (Issues & Answers Report, REL 2007–No. 038). Washington, DC: U.S. Department of Educa Bowen, W.G., and Bok, D. (1998). The shape of the river: tion, Institute of Education Sciences, National Center Long-term consequences of considering race in college for Education Evaluation and Regional Assistance, and university admissions. Princeton, NJ: Princeton Regional Educational Laboratory Central. University Press.

Aronson, J. (2002). Stereotype threat: contending and cop Brown, R.P., and Lee, M.N. (2005). Stigma consciousness ing with unnerving expectations. In J. Aronson (Ed.), and the race gap in college academic achievement. Self Improving academic achievement: impact of psycho & Identity, 4, 149–57. logical factors on education. San Diego, CA: Academic Press. Chiu, C.Y., Hong, Y.Y., and Dweck, C.S. (1997). Lay disposi tionism and implicit theories of personality. Journal of Aronson, J., Fried, C., and Good, C. (2002). Reducing the ef Personality and Social Psychology, 73(1), 19–30. fects of stereotype threat on African American college students by shaping theories of intelligence. Journal of Cohen, G.L., Garcia, J., Apfel, N., and Master, A. (2006). Experimental Social Psychology, 38, 113–25. Reducing the racial achievement gap: a social- psychological intervention. Science Magazine, Aronson, J., and Good, C. (2002). The development and 313(5791), 1307–10. consequences of stereotype vulnerability in adoles cents. In F. Pajares and T. Urdan (Eds.), Adolescence Cohen, G.L., Steele, C.M., and Ross, L.D. (1999). The men and education. New York: Information Age Publishing. tor’s dilemma: providing critical feedback across the racial divide. Personality and Social Psychology Bul Aronson, J., and Inzlicht, M. (2004). The ups and downs letin, 25(10), 1302–18. of attributional ambiguity: stereotype vulnerability and the academic self-knowledge of African American Creswell, J.D., Welch, W.T, Taylor, S.E., Sherman, D.K, students. Psychological Science, 15(12), 829–36. Gruenewald, T.L., and Mann T. (2005). Affirmation of personal values buffers neuroendocrine and psycho Aronson, J., Lustina, M.J., Good, C., Keough, K., Steele, logical stress responses. Psychological Science, 16(11), C.M., and Brown, J. (1999). When white men can’t do 846–51. math: necessary and sufficient factors in stereotype threat. Journal of Experimental Social Psychology, 35(1), Croizet, J., and Claire, T. (1998). Extending the concept of 29–46. stereotype threat to social class: the intellectual un derperformance of students from low socioeconomic Aronson, J., and Steele, C.M. (2005). Stereotypes and the backgrounds. Personality and Social Psychology Bul fragility of human competence, motivation, and self- letin, 24(6), 588–94. concept. In C. Dweck and E. Elliot (Eds.), Handbook of competence and motivation. New York: Guilford. Dweck, C.S. (1999). Caution—praise can be dangerous. American Educator, 23(1), 4–9. Beilock, S.L., Jellison, W.A., McConnell, A.R., and Carr, T.H. (2006). On the causal mechanisms of stereotype Good, C., Aronson, J., and Inzlicht, M. (2003). Improv threat: can skills that don’t rely heavily on working ing adolescents’ standardized test performance: an memory still be threatened? Personality and Social intervention to reduce the effects of stereotype threat. Psychology Bulletin, 32(8), 1059–71. 36 reducing STereoType ThreaT in claSSroomS

Journal of Applied Developmental Psychology, 24(6), Rosenthal, H., and Crisp. R.J. (2006). Reducing stereotype 645–62. threat by blurring intergroup boundaries. Personality and Social Psychology Bulletin, 32(4), 501–11. Hacker, A. (1995). Two nations: Black and White, separate, hostile, unequal. New York: Ballantine. Rothstein, R. (2002). Classes and schools: using social, eco nomic, and educational reform to close the Black-White Inzlicht, M., and Ben-Zeev, T. (2000). A threatening intel achievement gap. Washington, DC: Economic Policy lectual environment: why females are susceptible to Institute. experiencing problem-solving deficits in the presence of males. Psychological Science, 11, 365–71. Ryan, K.E., and Ryan, A.M. (2005). Psychological processes underlying stereotype threat and standardized math test Jencks, C., and Phillips, M. (1998). The Black-White test performance. Educational Psychologist, 40(1), 53–63. score gap. Washington, DC: The Brookings Institution. Schmader, T., and Johns, M. (2003). Converging evidence that Johns, M., Schmader, T., and Martens, A. (2005). Knowing stereotype threat reduces working memory capacity. Jour is half the battle: teaching stereotype threat as a means nal of Personality and Social Psychology, 85(3), 440–52. of improving women’s math performance. Psychologi cal Science, 16(3), 175–79. Shapiro, J.R., and Neuberg, S.L. (2007). From stereotype threat to stereotype threats: implications of a multi- KewalRamani, A., Gilbertson, L., Fox, M., and Provasnik, threat framework for causes, moderators, mediators, S. (2007). Status and trends in the education of racial consequences, and interventions. Personality and and ethnic minorities (NCES 2007-039). Washing Social Psychology Review, 11(2), 107–30. ton, DC: U.S. Department of Education, Institute of Education Sciences, National Center of Education Southern Regional Education Board. (2003). ACT and Statistics. SAT scores in the south: the challenge to lead. Re trieved December 10, 2007, from www.sreb.org/main/ Marx, D.M., and Roman, J.S. (2002). Female role models: highschools/college/ACT-SAT_2003_.pdf. protecting women’s math test performance. Personality and Social Psychology Bulletin, 28(9), 1183–93. Spencer, S.J., Steele, C.M., and Quinn, D.M. (1999). Stereo type threat and women’s math performance. Journal of McGlone, M., and Aronson, J. (2006). Social identity Experimental Social Psychology, 35(1), 4–28. salience and stereotype threat. Journal of Applied Developmental Psychology, 27(5), 486–93. Steele, C.M. (1988). The psychology of self-affirmation: sustaining the integrity of the self. Advances in Experi McGlone, M.S., and Aronson, J. (2007). Forewarning and mental Social Psychology, 21, 261–302. forearming stereotype-threatened students. Communi cation Education, 56(2), 119–33. Steele, C.M. (1997). A threat in the air: how stereotypes shape the intellectual identities and performance of Mendoza-Denton, R., Downey, G., Purdie, V.J., Davis, A., women and African Americans. American Psycholo and Pietrzak, J. (2002). Sensitivity to status-based gist, 52(6), 613–29. rejection: implications for African American students’ college experience. Journal of Personality and Social Steele, C.M., and Aronson, J. (1995). Stereotype threat and the Psychology, 83(44), 896–918. intellectual test performance of African Americans. Jour nal of Personality and Social Psychology, 69(5), 797–811. Quinn, D.M., and Spencer, S.J. (2001). The interference of stereotype threat with women’s generation of math Steele, C.M., Spencer, S., and Aronson, J. (2002). Contend ematical problem-solving strategies. Journal of Social ing with group image: the psychology of stereotype and Issues, 57(1), 55–71. social identity threat. In M. Zanna (Ed.), Advances in referenceS 37

experimental social psychology Vol. 37. San Diego, CA: Walton, G.M., and Cohen, G.L. (2003). Stereotype lift. Academic Press. Journal of Experimental Social Psychology, 39(5), 456–67. Stone, J., Lynch, C.I., Sjomeling, M., and Darley, J.M. (1999). Stereotype threat effects on Black and White athletic Walton, G.M., and Cohen, G.L. (2007). A question of performance. Journal of Personality and Social Psychol belonging: race, social fit, and achievement. Journal of ogy, 77(6), 1213–27. Personality and Social Psychology, 92(1), 82–96.

U.S. Department of Education. 2008. WWC Procedures Wheeler, S.C., and Petty, R.E. (2001). The effects of ste and Standards Handbook. Version 2.0—December. reotype activation on behavior: a review of possible Washington, DC: U.S. Department of Education, mechanism. Psychological Bulletin, 127(6), 797–826. Institute of Education Sciences, What Works Clearing house. Retrieved December 10, 2007, from http:// Whimbey, A. (1975). Intelligence can be taught. New York: ies.ed.gov/ncee/wwc/references/idocviewer/doc. Dutton. aspx?docid=19&tocid=1. Wilson, T.D., and Linville, P.W. (1985). Improving the Wald, J., and Losen, D. (2005, May 17). Confronting the performance of college freshmen with attributional graduation rate crisis in the south. The Civil Rights techniques. Journal of Personality and Social Psychol Project, Harvard University, Cambridge, MA. ogy, 49, 287–93.