School, Studying, and Smarts:

Gender and Education Across 80 Years of American Print Media,

1930-2009*

Andrei Boutyline

Alina Arseniev-Koehler University of California, Los Angeles

Devin J. Cornell Duke University

May 22, 2020

* We thank Elizabeth Armstrong, Elizabeth Bruch, Erin Cech, Fabiana Silva, and Mark Mizruchi for reading the manuscript and providing insightful commentary. We also thank attendees at the University of Michigan Measuring and Modeling Culture workshop and the University of Chicago Computational Social Science workshop for their feedback. We are grateful to Yinan Wang and Hannah Simon for their help as undergraduate research assistants. We are also grateful to the Diverse Intelligences Summer Institute for bringing us together and providing the opportunity to begin this collaboration. Please direct all correspondence to Andrei Boutyline at Department of Sociology, University of Michigan, Room 3115 LSA Building, 500 S State St, Ann Arbor, MI 48109. Email: [email protected]

ABSTRACT

Gender stereotypes have important consequences for boys’ and girls’ academic outcomes. In this

article, we apply computational word embeddings to a 200-million-word corpus of American

print media (1930-2009) to examine how these stereotypes changed as women’s educational

attainment caught up with and eventually surpassed men’s. This transformation presents a rare

opportunity to observe how stereotypes change alongside the reversal of an important pattern of

stratification. We track six stereotypes that prior work has linked to academic outcomes. Our

results suggest that stereotypes of socio-behavioral skills and problem behaviors—attributes

closely tied to the core stereotypical distinction between women as communal and men as

agentic—remained unchanged. The other four stereotypes, however, became increasingly

gender-polarized: as women’s academic attainment increased, school and studying gained increasingly feminine associations, whereas both intelligence and unintelligence gained increasingly masculine ones. Unexpectedly, we observe that trends in the gender associations of intelligence and studying are near-perfect mirror opposites, suggesting that they may be connected. Overall, the changes we observe appear consistent with contemporary theoretical accounts of the gender system that argue that it persists partly because surface stereotypes shift

to reinterpret social change in terms of a durable hierarchical distinction between men and

women.

School, Studying and Smarts 2

School, Studying, and Smarts: Gender Stereotypes and Education

Across 80 Years of American Print Media, 1930-2009

A large body of recent work illustrates that the ways that girls and boys enact and enforce gender

stereotypes powerfully shapes their academic outcomes. For example, ethnographic and survey

evidence illustrates how putting effort into schoolwork or displaying an affinity for school

appears to often violate boys’ norms of masculinity while fitting girls’ norms of femininity

(Legewie and DiPrete 2012; Morris 2008; Musto 2019). Meanwhile, girls avoid disciplines

which are perceived to require brilliance, such as physical sciences, engineering, and

mathematics (Leslie et al. 2015). Many social and behavioral skills required for academic

success like attentiveness, politeness and diligence also appear to be closely related to

communality, which is a core aspect of hegemonic feminine stereotypes, while problem

behaviors like disobedience and aggressiveness fit hegemonic masculine stereotypes (Jackson

and Dempster 2009; Marrs 2016; Ridgeway 2011; Ridgeway and Correll 2004).

This article leverages eight decades of U.S. print media (1930-2009) to track these gender stereotypes at scale. Despite the wealth of recent work illustrating the importance of gender stereotypes for academic outcomes, we know little about whether and how these stereotypes changed as gender differences in educational attainment drastically transformed since the 1940s1.

While men reached higher levels of education than women for much of the 20th century, in

recent decades this gender gap reversed, with women’s educational attainment increasingly

surpassing men’s (Goldin, Katz, and Kuziemko 2006). Because gender stereotypes are partly the

product of observed material differences, they respond to social change (Koenig and Eagly 2014;

1 Our data covers the years 1930 to 2009, but, as we detail below, we use 20-year-wide sliding windows to partition the corpus. The earliest resulting window is centered on 1940, and the last on 2000. DRAFT School, Studying and Smarts 3

Ridgeway et al. 2009; Seguino 2007). Therefore, the dramatic changes in women’s educational achievement throughout the twentieth century may have been accompanied by changes in the gender associations of school, studiousness, intelligence, and classroom behaviors, among other concepts.

These changes in the educational landscape present a rare opportunity to observe how stereotypes change alongside the reversal of a core pattern of stratification. The longitudinal

examination of these stereotypes enables us to observe which gender beliefs transform along

with corresponding material changes, and which remain despite them. We use these observations

to contribute to theoretical discussions about the persistence of the gender system (Ridgeway

2011; Ridgeway and Correll 2004). Additionally, given the powerful role of education in

stratification and the salience of gender in the classroom, our empirical observations of these

stereotypes are important contributions in their own right. These observations may enable

scholars of gender in contemporary classrooms to contextualize their observations within broader

historical trends. They may also inform future scholarship in the sociology of education on the

gender gap reversal, which has implicitly assumed that these gender stereotypes are an important

but unchanging contributor to the gender gap.

As Ridgeway (2011) notes, the lack of systematic longitudinal data on gender stereotypes

has hampered previous efforts to understand their historical trajectories. To overcome this

problem, we examine a 200-million-word corpus of American print media (1930-2009) with

Word2Vec (Mikolov, Chen, et al. 2013; Mikolov, Sutskever, et al. 2013), which is a recently

developed neural network-based word embedding algorithm that has found wide application

across many disciplines, but has only recently begun to see sociological uses (e.g., Joseph and

Morgan 2020; Kozlowski, Taddy, and Evans 2019; Stoltz and Taylor 2019). Our approach lets

DRAFT School, Studying and Smarts 4

us estimate how often each of over 10,000 words in our vocabulary occurs in feminine versus masculine contexts at different points in our time period. We use this to track changes in stereotypes that prior work has linked to gender differences in contemporary academic outcomes.

To preview, we find that the masculine associations of terms related to disobedience, aggressiveness, and other problematic classroom behaviors were already in place at the start of our time series in 1940, as were the feminine associations for terms for positive social and behavioral skills. We note that these two sets of keywords are closely related to the core stereotypes of women as communal and men as forcefully agentic, which theoretical accounts of the gender system expect to be highly resistant to change (Ridgeway 2011; Ridgeway and

Correll 2004). And indeed, we observe that both sets of associations have remained substantively unchanged to the present day.

Terms for school and studying followed a different trajectory. Because women’s relative educational attainment increased drastically throughout the 20th century, theoretical accounts suggest that these stereotypes should also have shifted (Eagly 1987; Ridgeway 2011; Ridgeway et al. 2009). And indeed, we find that while neither school nor studying had significant gender associations in 1940, both gradually gained strong associations with femininity. In fact, out of the roughly 16,000 words in our vocabulary for both 1940 and 2000, “student” exhibited the

213th-largest shift towards femininity (top 1.5%). The feminine associations of schooling frequently observed in contemporary sociological work may thus be a relatively recent phenomenon.

Our most interesting result concerns the stereotypes of intelligence, which did not have a significant gender association in 1940. Throughout the 20th century, the gender associations of intelligence moved in nearly perfect counterpoint to the associations of school and studying, with

DRAFT School, Studying and Smarts 5

intelligence becoming significantly associated with masculinity just as school and studying

became associated with femininity. Indeed, in the year 2000, we find that “genius” is in the top

0.6% of most masculine terms in our vocabulary—roughly the same as the terms “marines”,

“football”, and “brother”, and roughly as masculine as the terms “motherhood” and “fragrance”

are feminine.

In the Discussion, we propose that the striking synchrony between the trend lines for

studying and intelligence (cor = -0.93), and the conspicuous timing of the change, may suggest

that both stereotypes emerged as part of one cultural transformation where women’s increasing

educational achievements became reframed as a product of effort, whereas men’s were reframed as consequences of effortless intelligence—a substantially higher-status avenue to success.

Taken together, our observations thus appear broadly consistent with the theoretical conception of a gender system that persists partly because surface stereotypes shift to accommodate broad social changes and reinterpret them in terms of a persistent hierarchical distinction between men and women, thereby “reinventing gender inequality for a new era” (Ridgeway 2011:157).

The rest of our manuscript proceeds as follows. We first survey existing work to determine which gender stereotypes are most relevant to the academic performance of boys and girls, and to derive predictions about how each of these stereotypes changed throughout the 20th

century. We next detail our methodological approach, which uses bootstrapped diachronic

Word2Vec embeddings. We then introduce our corpus, validate its coverage of print media

across time, and construct keyword scales to track each of the gender stereotypes identified in

our literature review. We detail the findings from our empirical investigation in the Results

section. Finally, in the Discussion, we interpret our observations through the lens of recent

theoretical work on the gender system.

DRAFT School, Studying and Smarts 6

GENDER STEREOTYPES AND EDUCATION

Gender stereotypes are sets of beliefs about traits that “most people” would attribute to men or

women (Fiske et al. 2002; Ridgeway et al. 2009). Stereotypes can be prescriptive or descriptive.

Both kinds can have important consequences for educational outcomes. Perhaps the most

immediate effects on students’ behavior come from prescriptive gender stereotypes that provide

‘blueprints’ for how boys and girls are expected to act in school. When boys and girls deviate

from these -prescribed behaviors, they may be penalized by other children and

sometimes by adults—especially when they display traits or behaviors that are prescriptive for the opposite gender (Prentice and Carranza 2002; Ridgeway 2011:59; Ruble, Martin, and

Berenbaum 2007). Descriptive gender stereotypes inform children’s understandings of their own capacities, which can affect their school performance and academic choices (e.g., Deigmayr,

Stern, and Schubert 2019; Spencer, Steele, and Quinn 1999). They can also alter how parents and educators perceive and interpret childrens’ abilities and behaviors (e.g., Tiedemann 2000;

Retelsdorf, Schwartz, and Asbrock 2015; Musto 2019). In the next section, we review recent scholarship on gender stereotypes relevant to education to identify which ones we should track across time in our corpus.

Which gender stereotypes affect educational outcomes?

Below, we identify six gender stereotypes that recent work has linked to girls’ and boys’ academic outcomes. In this section, we review the literature that identifies each stereotype’s importance and establishes our expectations about its present-day state. Then, in the following sections, we will review theoretical, longitudinal, and older cross-sectional studies to derive predictions about historical trends in each of these stereotypes.

DRAFT School, Studying and Smarts 7

Contemporary work illustrates adolescents view a wholehearted embrace of school as a stereotypically feminine characteristic (Jackson 2003; Kessels et al. 2014; Morris 2008;

Warrington, Younger, and Williams 2000). This body of work describes, for example, how boys

perceive the student role and academic work as feminine. Since masculinity is partly constructed

in opposition to femininity, an affinity for school is often stigmatizing for boys (Adler, Kleiss,

and Adler 1992; Archer, Pratt, and Phillips 2001; Epstein 1998; Pascoe 2007; Thorne 1993;

Young and Sweeting 2004). Indeed, students who conform more strongly to masculine norms

also tend to be less engaged with school (Marrs 2016; Yavorsky and Buchmann 2019). We will

thus examine the gender associations of (i) schooling.

An even stronger stigma centers around (ii) effort put into schoolwork. While

studiousness may be rewarded for girls (Adler et al. 1992; Entwisle, Alexander, and Olson

2007), boys who are seen as putting in substantial effort into school can be teased, ostracized and

mocked by other boys as effeminate and queer. This is exemplified by boys’ often

interchangeable use of derogatory terms such as “wussy,” “geek,” “sissie,” “nerd”, “pussy,”

“gay,” and “wimp” (Epstein 1998; Morris 2008; Pascoe 2007).

A large body of work shows that academic success requires a range of stereotypically feminine (iii) “socio-behavioral skills”—a term that education researchers use to refer to a broad range of competencies including attentiveness, organization, responsibility, communication, helpfulness, and the ability to get along with others (e.g., Downey and Vogt Yuan 2005). These

competencies affect student outcomes both because they are immediately critical to students’

learning in the classroom environment, and because they are rewarded by teachers and

educational institutions (DiPrete and Buchmann 2013; McMillan, Myran, and Workman 2002).

Prior work demonstrates that girls are socialized into these skills from a young age (DiPrete and

DRAFT School, Studying and Smarts 8

Jennings 2012),2 and that these skills are normative for girls but may be stigmatized for boys because they are considered effeminate (e.g., Cohen 1998; Epstein 1998; Mickelson 1989). Thus,

gender stereotypes of socio-behavioral skills may facilitate girls’ academic success and hinder

boys’ success.

Just as socio-behavioral skills are stereotypical of girls, (iv) problem behaviors such as

interrupting, fighting, disobeying teachers, and more broadly “getting into trouble” are

stereotypical of boys (Jackson and Dempster 2009; Morris 2008; Musto 2019). Much like socio-

behavioral skills tap into deeper understandings of femininity, problem behaviors map onto a

core set of attributes which characterize stereotypical masculinity, as we will see in the next

section. Problem behaviors may thus serve as a way to “do” masculinity in the classroom (Adler

et al. 1992; Jackson 2002). For example, in a recent ethnography, Morris observed that high

school boys took pride in their fights. Fighting was a vehicle to cultivate and display masculinity,

rather than merely a method to resolve conflicts (Morris 2008). However, problem behaviors

also inhibit boys’ academic success. Boys’ higher average levels of behavior problems at ages 4

to 5 years may partly explain the contemporary gender gap in degree attainment at ages 26 to 29

(Owens 2016).

Recent work also points to the importance of two additional stereotypes—(v) intelligence

and (vi) unintelligence. These stereotypes appear linked to the stereotypes of school effort

(studiousness), which, as we discussed earlier, is seen by many adolescents as feminine. In

contrast to academic effort, academic success is not seen as feminine, and often desirable for

both boys and girls. While studiousness is the stereotypically prescribed feminine route to

2 As DiPrete and Jennings (2012) point out, there are class differences in these behaviors, which suggests that they are indeed cultural in origin. DRAFT School, Studying and Smarts 9

academic success, prescriptive stereotypes for masculine educational success instead suggest that boys should effortlessly do well in school, succeeding solely because they are naturally intelligent (Cohen 1998:26; Jackson and Dempster 2009; Morris 2008; Musto 2019).3

These expectations have several consequences for boys. “Effortless success” is rarely a viable strategy for academic success, thus potentially leading boys to struggle in school. At the same time, a noticeable lack intellectual ability—i.e., unintelligence—is also distinctly undesirable for boys (Adler et al. 1992; Jackson 2002). Boys who have serious academic problems may thus face ridicule, such as being teased as “dummies” (Adler et al. 1992). Thus, as

Adler et al. (1992) and Jackson (2002) argued, some boys may excessively signal their own lack of effort to shift focus away from their intellectual ability as a cause for their academic failure

(see also Tagler 2012). For similar reasons, some boys may also “overdo” other displays of masculinity—namely, problem behaviors (e.g., Musto 2019). As Musto describes, boys have two core routes for masculinity in school: being “brilliant” or “bad” (2019).4 In addition, stereotyping intellectual ability as masculine may hinder boys’ success by facilitating a fixed view of intelligence for boys. An extensive research program on “growth mindset” has demonstrated that this kind of fixed view of intelligence can discourage effort and serve as a potent demotivating force with negative effects on educational outcomes (Aditomo 2015; Dweck 2006; Sriram 2014).

In these ways, stereotyping intellectual ability and effortless academic success as masculine may hinder boys’ academic achievement.

3 Some evidence suggests that this pattern may be especially pronounced for exceptional intelligence (Bian, Leslie, and Cimpian 2017; Musto 2019; Prentice and Carranza 2002), rather than generally being smart (Eagly et al. 2020). 4 While we focus on gender stereotypes, we note that other ethnographic work also observes that stereotypes of boys’ intellect varies further by boys’ class and race (Morris 2008; Musto 2019). DRAFT School, Studying and Smarts 10

While the prescriptive stereotype promoting effortless success may have negative

consequences for boys, a large body of research has documented that the descriptive stereotype

of boys’ higher intelligence also has substantial negative effects for girls (Bian, Leslie, and

Cimpian 2018; Deigmayr et al. 2019; Leslie et al. 2015; Storage et al. 2016). Importantly, girls

are less likely to pursue school subjects (and ultimately careers) that are believed to require

exceptional intelligence—namely, careers in science, technology, engineering or mathematics,

i.e., STEM (Deigmayr et al. 2019; Leslie et al. 2015; Storage et al. 2016). Further, as Cohen

notes (1998), while the stereotype of girls as hard-working students has the beneficial effect of

normalizing school effort, it also inadvertently undermines the perception of girls’ intellect: the

harder they work, the less ‘evidence’ there is for their innate intellectual potential (see also

Dweck 2006). Particularly in STEM subjects, girls' success is often attributed to effort rather

than intelligence (e.g., Espinoza, Arêas, and Arms-Chavez 2014), which, as we detail in the

Discussion, is a substantially lower-status route to academic success than effortless intelligence.

The literature reviewed above points to the importance of six different schooling-relevant

constructs, which we will examine in this manuscript: (i) schooling broadly conceived

(stereotypically feminine); (ii) putting effort into schoolwork, e.g., studying (stereotypically feminine); (iii) social and behavioral skills important for academic success, e.g., listening

(stereotypically feminine); (iv) problem behaviors, e.g., fighting, disobedience (stereotypically masculine); and finally intelligence, which we will track as two constructs: (v) high intelligence

(stereotypically masculine) and (vi) low intelligence (unclear: insufficient evidence). In the next section, we will examine existing literature on stereotypes across time to derive predictions about possible longitudinal changes in these six stereotypes.

DRAFT School, Studying and Smarts 11

Changes in stereotypes across time

Despite a wealth of research on the present-day role of gender stereotypes in schooling, existing

work provides little insight about how these education-relevant gender stereotypes emerged or

endured across time. However, we can glean predictions for some of our stereotypes using a combination of (1) theoretical work on why stereotypes change or remain constant across time,

(2) empirical longitudinal studies of stereotype change, (3) older cross-sectional studies that can be compared with present-day results to infer a change, and (4) scholarship tracing the history of education and intelligence.

Schooling across the 20th century

The radical changes in the educational attainment of American women and men over the 20th

century likely led to shifts in the cultural understanding of gendered academic potential. Higher

education rapidly expanded in the decades after World War II, and by the 1960s most high

school graduates attended college (Fischer and Hout 2006:10), thus making college a salient part

of American society. At the same time, the gender composition of college students drastically

transformed. While men made up the majority of college graduates5 for much of the 20th century,

women’s educational attainment grew at faster rates than men’s. By the 1980s, women’s

educational attainment caught up to (and then surpassed) that of men, and women have remained

the majority of college graduates ever since (DiPrete and Buchmann 2013:27).6 As prior scholarship on social role theory and status construction theory has demonstrated, gender beliefs

arise partly from observations of material differences between the social positions of men and

women (Eagly 1987; Ridgeway et al. 2009; Ridgeway and Correll 2004). Given the salience,

5 Specifically, men received more bachelor’s degrees than women, partly because women instead frequently attended two-year teacher training programs (Goldin, Katz, and Kuziemko 2006). 6 Even more strikingly, this reversal occurred across all racial groups (DiPrete and Buchmann 2013:39). DRAFT School, Studying and Smarts 12

scope, and durability of this change to educational attainment, we expect that the stereotypical

image of a being a student—and especially a high-achieving student—may have also feminized

across this period.7

Various scholarship on the history of education, and older cross-sectional studies, offer further support for this prediction. Historically, academic attainment was perceived as a man’s pursuit (Cohen 1998; Solomon 1985). Empirical studies show this perception lasted well through the 20th century: a 1938 study of boys’ and girls’ motivations to get good grades found that boys

were more likely than girls to be motivated by the desire to attend college. Girls, in contrast,

were more frequently motivated by a desire to please teachers and parents (Hicks and Hayes

1938). Even twenty-one years later, a well-cited interview study with 8-11 year old boys again

found that boys believed academic success was more important for boys than girls (Hartley

1959). Indeed, another study of 1,529 12th graders in 1960 similarly finds that more boys than

girls aspire to attend college (Bordua 1960).8 In contrast with these older results, contemporary

work shows that more girls than boys aspire to attend college (Fortin, Oreopoulos, and Phipps

2015). Furthermore, Fortin et al. show that this gender reversal in college aspirations

accompanied the reversal of the educational attainment gap (2015). This evidence is consistent

with a changing gender landscape where schooling and academic achievement came to be

gradually seen as more feminine, and thus reinforces our prior prediction.

7 There are also some possible theoretical reasons that stereotypes could have remained stable across this period. First, there may be cultural lag between changing demographics and changing stereotypes, such that stereotypes remained stable well after the gender gap reversal (Diekman, Eagly, and Johnston 2010). Second, confirmation bias may have slowed or inhibited individuals' changing stereotypes (Ridgeway 2011). However, the change in relative rates of educational attainment has been both long-lasting and far-reaching, which leads us to believe that these reasons would not have been sufficient to stop the gender beliefs about students from transforming along with the changes. 8 Other scholarship during this time suggested that while girls strived for academic success more than boys in elementary school, this pattern reversed in higher grades as boys realize the importance of education for their future success (Clark 1967; Kagan 1964:158–59). DRAFT School, Studying and Smarts 13

As women’s academic performance gained relative to men’s, it is also likely that

studiousness gained feminine associations. Indeed, across the reversal in educational attainment,

women also began to take more advanced courses and college preparatory courses than men

(DiPrete and Buchmann 2013:79; Goldin et al. 2006).9 A study of 9th graders additionally

suggests that between 1935 and 1953 there was a reversal in amount of time reported studying

among high school men and women, whereby women began to study more than men; this gap

continued to widen throughout the 1950s (Jones 1960). Thus, we predict that not only did

academic achievement become seen as more feminine, but, alongside this, effort put into

academic success also gained feminine associations.

Socio-behavioral skills and problem behaviors across the 20th century

While previous work has not directly investigated longitudinal changes in stereotypes around

socio-behavioral skills or problem behaviors, a range of other empirical and theoretical evidence suggests that they may be longstanding and durable. First, as we pointed out earlier, socio-

behavioral skills and problem behaviors closely map onto the core dimensions by which we

distinguish attributes as feminine or masculine: these are communal and forcefully agentic,

respectively (Eagly 1987; Eagly and Wood 1999; Ridgeway 2011). Communality is the orientation towards others’ well-being. For example, traits like emotional expressiveness, nurturance, interpersonal sensitivity, kindness, and responsiveness are all distinctly feminine.

Agency is the orientation towards independence and achieving one’s own goals.10 Agentic

attributes include assertiveness, confidence, independence, forcefulness, and dominance.

9 But see historical scholarship by Clark (1967) and Kagan (1964:158–59), who suggest that academical success in elementary school (but not necessarily in later grades) was historically seen as feminine. 10 Some scholarship also treats competence as a component of agency; however, we follow Ridgeway (2011:168– 69) and others in maintaining these as distinct, although correlated, ways of defining masculinity. DRAFT School, Studying and Smarts 14

As Ridgeway (2011), Eagly and Wood (Eagly and Wood 1999; Wood and Eagly 2002),

and other scholars have argued, these core aspects of gender stereotypes may be particularly

durable because they are reinforced by the distribution of power in society. As women occupy

lower-status and caregiving roles, femininity comes to be associated with the supportive,

agreeable, and conflict-avoidant behavior more characteristic of those roles; and, similarly, as

men occupy higher-status, higher-power positions, masculinity comes to be associated with the

agentic, forceful, dominant behavior more characteristic of those roles. Empirical work confirms

that the stereotype of women as communal (Spence and Buckner 2000) and men as forcefully

agentic are indeed highly persistent across time (Cejka and Eagly 1999; Diekman and Eagly

2000; Haines, Deaux, and Lofaro 2016; Koenig and Eagly 2005)11. We thus expect that the

gender associations of socio-behavioral skills and problem behaviors have also remained fixed

throughout our period.

Second, a large body of older cross sectional studies also suggest that these two stereotypes were in place throughout the 20th century, and that girls and boys enacted corresponding behaviors in gendered ways throughout the 20th century (Blatz and Bott 1927;

Hartley 1959; Hayes 1943; Meyer and Thompson 1956; Peltier 1968; Tuddenham 1952). For

example, in Hartley’s (1959) study mentioned earlier, boys’ self-reports closely echo the stereotypes of boys’ problematic behaviors noted by current work. Subjects stated that boys should be able to fight in case a bully comes along and should play rough games. They also stated that grown-ups expect boys to be noisy, dirty, naughty, and to get into more trouble than girls do. Further evidence comes from a study of 608 elementary school children across 85 first

11 But see Haines, Deaux, and Lofaro (2016) and Lupetow, Garovich, and Lupetow (1996).

DRAFT School, Studying and Smarts 15

to fifth grade classrooms (Tuddenham 1952). Tuddenham (1952) measures students’ opinions of their peers, which he interprets as gender stereotypes. Results resonate with modern stereotypes: compared to boys, girls were more likely to be seen as tidy, quiet, friendly, and “a good sport,” and less likely to fight, be “bossy,” “wiggly,” or “quarrelsome.” These empirical findings are strikingly similar to those of the contemporary work on socio-behavioral skills and problem behaviors we reviewed above. Thus, this empirical and theoretical evidence consistently predicts that the stereotypes of socio-behavioral skills and problem behaviors have likely remained broadly unchanged throughout our time period.

Intelligence across the 20th century

Finally, we can gain insights about changes in gender stereotypes of intelligence from a combination of historical work on public and academic discourses concerning gender and intelligence and various studies of popular stereotypes across time. Because this literature does not study gender stereotypes of unintelligence separately from intelligence, we also do not make separate predictions for unintelligence here. First, we review the historical accounts.

Over the first half of the 20th century, measurement of sex differences in intelligence became a staple of American public and academic discourses (Tannenbaum 2000). These discourses inherited long-standing beliefs that men were intellectually superior to women (Cohen

1998:28; Hegarty 2007; Shields 1975, 1982; Silverman 1989). But, despite this stereotypical distinction, the early 20th-century discourses did not view exceptional intelligence as masculine: rather, the dominant view of a gifted child imagined a weak, sickly, eccentric, effeminate, and possibly gay male (Elfenbein 1999; Fox 1968; Hegarty 2007; Shields 1982; Stadler 1999). This view was widely accepted until Lewis Terman’s much-publicized longitudinal study of gifted

DRAFT School, Studying and Smarts 16

children (1926-1986), which was designed in part to combat these stereotypes.12 Terman and his

successors instead argued that gifted children are healthy and successful—and, in the case of

boys, masculine and virile (Hegarty 2007; Jolly 2008; Warne 2019). Moreover, Terman stressed that both gifted boys and girls exhibit masculine traits, thus inverting the former stereotype of intelligence (Seashore 1931). Given the increasing importance of technology and science to national defense during the Cold War, Terman’s work found a ready audience among mid-

century educators and policy makers, who undertook sustained efforts to combat the negative

stereotypes of gifted children (e.g., Jolly 2009; Tannenbaum 2000; Pritchard 1952; Ely 2010;

Porter 2017). This intellectual history suggests that the association of intelligence and

masculinity may have grown stronger during the 20th century.

In contrast to this intellectual history, other scholarship on gender stereotypes presents a

more mixed picture. While no single longitudinal study tracked gender stereotypes of

intelligence consistently over long periods, a recent meta-analysis sought to synthesize results of sixteen cross-sectional surveys ranging from 1946 to 2018 (Eagly et al. 2020). They show that, between 1946 and 2018, the likelihood of respondents considering women to be more intelligent increased from 34% in a 1946 survey to 65% in a 2018 survey. This result, however, conflicts with an abundance of contemporary literature that finds intelligence is more closely associated with men rather than women (Bian, Leslie, and Cimpian 2017; Bian et al. 2018; Deigmayr et al.

2019; Leslie et al. 2015; Prentice and Carranza 2002; Storage et al. 2016).13 For example, in a

12 In fact, the earliest effort to quantitatively measure gender (non)conformity—the Terman and Miles (1936) Masculinity-Femininity (MF) test—emerged alongside this longitudinal study. 13 This discrepancy may be due to methodological differences. The measures analyzed by Eagly et al. (2020) rest on an explicit question asking the respondent to compare men’s and women’s intelligence. Because this design invites

DRAFT School, Studying and Smarts 17

well-cited recent article, Bian and colleagues find strong support for this stereotype across four

experiments with children, and observe that children endorse this stereotype by the time they are

6 years old (2017). Eagly’s results also conflict with a recent study of film representations of

brilliance in 11,550 films from 1967 to 2016, which found that brilliance was consistently

represented as masculine across this time frame, including among children’s genres (Gálvez,

Tiffenberg, and Altszyler 2019). Because different strands of scholarship point to conflicting

conclusions, we make no predictions regarding changes in the gender stereotypes of intelligence.

Predictions

Above, we identified six gender stereotypes relevant to academic outcomes: schooling, school

effort, socio-behavioral skills, problem behaviors, high intelligence, and low intelligence. We

first identified their present-day gender associations as suggested by contemporary research. We

then surveyed theoretical work, longitudinal studies, and sets of older cross-sectional studies to

arrive at predictions for their across-time change since the 1940s: an increased in the feminine

associations of schooling and school effort, and no change in socio-behavioral skills and problem

behaviors. The literature pointed to conflicting predictions regarding the changing associations of

intelligence. We summarize these predictions in Table 1.

[Table 1 about here]

subjects to explicitly express gender-biased beliefs, it may be subject to substantial social desirability bias (Klonis, Plant, and Devine 2005). In contrast, most of the contemporary work cited here uses more subtle methods for measuring this gender stereotype, such as vignettes, which may reduce the tendency for socially desirable responding (Hughes and Huby 2012). It is thus possible that the trend recorded by Eagly et al. reflects the increasing social unacceptability of sexist attitudes about women’s intelligence. DRAFT School, Studying and Smarts 18

DATA

To investigate these education-relevant gender stereotypes across time, we use the Corpus of

Historical American English (COHA) for the years 1930 to 2009 (Davies 2012). This well-

known corpus is commonly used in historical linguistics to study language change in American

English (e.g., Newberry et al. 2017; Perek 2018). It consists of a purposive sample of full-text

books, magazine articles, and newspaper articles that is balanced across the decades by medium and Library of Congress category. Roughly 50% of the corpus is fiction, with the remainder sourced from popular magazines (e.g., TIME), newspapers (e.g., New York Times), and a small number of nonfiction books. In sum, this corpus captures a wide and balanced variety of

American print media.

While COHA was designed for studying language change and has been validated for this

purpose in historical linguistics (Davies 2012) and been used in other studies using Word2Vec

(Garg et al. 2018; Hamilton, Leskovec, and Jurafsky 2016), its validity as a general measure of

popular culture is less well established. To verify that it provides a good window into the kinds

of works that were widely read by Americans within each time period, we validated the contents

of the corpus against the Publisher’s Weekly list of top-selling fiction for each year in our 80-

year time window. We randomly selected 30% of the titles for each decade, and examined

whether the title and the author were in the corpus. Our results indicated that COHA contained

books by an average of 75% (st. dev. = 9.5%) of the best-selling authors within each decade, and

an average of approximately 30% (st. dev. = 10%) of the specific best-selling novels. This

suggests that COHA indeed contains a collection of some of the works most widely read by

Americans this century.

DRAFT School, Studying and Smarts 19

As Ridgeway (2011) and others argue, popular media are far more likely to reflect the

gender beliefs of dominant groups that have decision-making authority within mass media

companies. Because of their overrepresentation in mass media, these dominant-group stereotypes

then come to be widely known across all social groups and thus acquire a taken-for-granted character (Milkie 1999). While alternate gender beliefs still guide behavior within homogeneous settings consisting of members of one non-dominant subgroup, dominant-group beliefs become

the widely accepted default of behavior within dominant-group and mixed-group settings—i.e.,

they become hegemonic (Ridgeway 2011). Since COHA is a dataset of popular media, we expect that it also largely reflects these hegemonic gender beliefs. Our work may thus be also most accurately understood as an analysis of hegemonic stereotypes.

In the following section, we will discuss how we used this corpus to produce our estimates.

METHODS

To study changes in associations, we partition the corpus into 13 time windows. For each of the

13 windows, we begin with 200 bootstrapped resamples of our corpus, and estimate a Word2Vec word embedding model for each one (thus yielding 2600 estimated embeddings). We then use multi-keyword scales to track the changing gender associations of the six constructs we identified above (see Table 2) across these embeddings. We detail these steps in the remainder of the “Methods” section. First, we provide a general overview of Word2Vec for methodologically minded readers who are unfamiliar with this method. Next, we describe the specifics of our embedding estimation procedures, including chronological partitioning and bootstrapping. We then describe how we extracted a set of bootstrapped “gender axes” which enable us to estimate the relative tendency of different keywords to appear in feminine vs. masculine contexts. We

DRAFT School, Studying and Smarts 20

conclude the methods section by detailing how we selected the six sets of keywords we use to

measure the gender associations of schooling, socio-behavioral skills, problem behaviors, school

effort, intelligence, and unintelligence.

Overview of Word2Vec

Word2Vec is a machine-learning algorithm that models the meaning of words by locating them relative to one another in a high-dimensional vector space (i.e., by representing each word as an

array of n coordinates; in our case, n = 200.) In a trained model,14 semantically or syntactically

similar words are located closer to each other than dissimilar words (as measured by cosine

similarity). Together, the collection of these word vectors forms a semantic space or word

embedding.

To map words to vectors in a 200-dimensional space, the Word2Vec algorithm initially

assigns each word a random vector. It then estimates the embedding model by sliding an 11- word window across the corpus. Each time the window advances, it takes the 5 context words to

the left and right the focal word in the center of the window, and mixes them with a set of words

randomly chosen from the corpus.15 The algorithm then tries to correctly pick out the observed context words from this list. If it makes a mistake, the word vectors are updated to minimize the chance that this error will occur again, thus gradually increasing accuracy. It then slides the window forward by one word and repeats this process, thus using millions of incremental adjustments to gradually learn the semantic space of the corpus (see Kozlowski et al. [2019] for an in-depth overview for a sociological audience.)

14 I.e., a model that has been successfully estimated from the data. 15 Specifically, we are describing Word2Vec with SkipGram and negative sampling (SGNS) with a window metaparameter of 5, which is the method we use here. Note that our description of the algorithm here simplifies some computational details for the purposes of exposition. DRAFT School, Studying and Smarts 21

Word2Vec gained widespread popularity in computer science and computational

linguistics after performing remarkably well on word similarity and analogy tasks (Mikolov,

Chen, et al. 2013; Mikolov, Sutskever, et al. 2013), two standard tests used to evaluate language

models. Word similarity tasks require the models to estimate the similarity between pairs of

words, with the similarity judgements compared to those made by human coders. Analogy tasks

ask the algorithm to solve analogies in the form “A is to B as C is to X”, e.g., “king is to queen

as father is to X”, with X= “mother”. Theoretically, the analogy tasks are designed to test the

extent to which embedding models are able to capture relational properties in texts that go

beyond individual words: predicting the correct solution “X” depends on relating some notion of

the relationship between “king” and “queen” to the meaning of “father”. The classical approach

to testing embedding models represents analogies as and thus the models

𝑏𝑏 𝑎𝑎 𝑥𝑥 𝑐𝑐 provide a prediction by choosing the word vector nearest𝒗𝒗 − to𝒗𝒗 ≈ 𝒗𝒗 − 𝒗𝒗 + . The fact that

𝑥𝑥 𝑏𝑏 𝑎𝑎 𝑐𝑐 these predictions accurately track human responses suggests𝒗𝒗 that≈ the𝒗𝒗 word− 𝒗𝒗 embedding𝒗𝒗 models

carry meaningful information in the linear relations between the word vectors: in other words,

not only are semantically similar words located near one another in this space, but semantically

similar distinctions also parallel one another.

Details of Estimation

In order to study longitudinal change, we segmented the COHA corpus into overlapping 20 year

sliding windows, which we “slide” by 5 years at a time. Each window is centered on January 1 of

a focal year, starting with 1940. This yields the following series of 13 windows, which we

transcribe below using the notation focal year: [range]:

1940 : [1930,1949], 1945 : [1935,1954], . . . , 1995 : [1985,2004], 2000: [1990,2009]

DRAFT School, Studying and Smarts 22

In the remainder of this manuscript, we will often use the focal year as shorthand for the full

window. We then applied Word2Vec to these windows to produce our estimated embeddings.

Word2Vec is a nondeterministic estimator. In pre-testing, we found that Word2Vec was producing unstable estimates from our data, with two estimation runs based on the same dataset sometimes yielding noticeably different results. This problem is consistent with findings reported by Antoniak and Mimno (2018), who observed that instability problems are common in corpora of our size (which, at 200 million words, is still substantially smaller than many used in machine learning.) To improve the quality of our models and ensure they are optimal for our data, we conducted a search through possible combinations of algorithm settings (called hyperparameters) such as word vector dimensionality and context window size, among others,16 and located

models that perform the best on the WordSim353 word similarity test17 and the “family” section of the Google Analogy Test.18 We also widened our corpus windows from 10 years to 20 years,

thus sacrificing some time resolution for improved reliability.

To quantify the remaining uncertainty in our estimates, we used bootstrapping to

resample the documents in our corpus. We then estimated a set of 200 bootstrapped embeddings

for each of the thirteen windows (2600 embeddings total). To estimate each term’s trend line, we

16 Our final embedding models were trained using Skip-Gram in GenSim with 7 negative samples, 5-word context windows (i.e., 5 words to the left, focal keyword, 5 words to the right), and vectors of size 200. 17 The WordSim353 Test includes a series of 353 pairs of words, each of which includes a human-rated similarity based on at least 13 annotators (Finkelstein et al. 2002). Scores are based on the correlation between rated similarities and distances between words in the embedding models (mean correlation across sections = 0.544, s.d.=0.022). 18 The Google Analogy Test (Mikolov, Chen, et al. 2013; Mikolov, Sutskever, et al. 2013) consists of 20,000 analogies, divided into 14 sections, including “family” (e.g., “uncle is to aunt as grandfather is to ___”). Scores reflect proportion of times the embedding model correctly fills in the blank. We optimized on the “family” section of these tests, which includes analogies representing familial relations that directly test gender. This includes relations like mother and father, brother and sister, or aunt and uncle. We use these two benchmarks to select hyperparameters that maximize embedding quality and tailored models to our analysis. Our final score on the Google Analogy Test family section was 0.742 (s.d.=0.038), which is comparable to prior work (Levy and Goldberg 2014; Mu and Viswanath 2018). DRAFT School, Studying and Smarts 23

begin with the first time window, and calculate the gendering of that term within each of the 200 bootstrapped embeddings in that window (details below). We leverage this distribution of scores across the bootstrapped embeddings estimate non-parametric confidence intervals, which we also use to calculate p-values.19 To further stabilize our results, we use the average across these 200

bootstrapped estimates as our main estimate, which greatly reduces the remaining estimation

error and lessens the impact of outlier documents on our estimates. We then repeat this process

again for the remaining twelve time windows to calculate its trend.

Measuring Gender Stereotypes with Word2Vec

Existing work demonstrated that it is possible to use word vector arithmetic to measure cultural associations of terms in embedding models (Bolukbasi et al. 2016; Caliskan, Bryson, and

Narayanan 2017; Garg et al. 2018; Grand et al. 2018; Jones et al. 2020; Joseph and Morgan

2020; Kozlowski et al. 2019). We follow these approaches by (i) “extracting” a gender axis from each embedding, and then (ii) estimating the gender coding of each keyword of interest by

“projecting” it onto this axis. Below, we detail these steps, and then review existing validation

studies for this procedure.

Recall that, in the embedding space, word vectors are more proximal if they correspond

to words that tend to be used in similar contexts—i.e., they tend to co-occur with the same sets of

words. The words with the highest proximity are those that (i) frequently co-occur with one

another, or (ii) or can be used interchangeably—even though they may rarely be used together

19 Our confidence intervals and p-values thus reflect both sampling error and the estimation error stemming from Word2Vec’s stochastic character. As the width of these confidence intervals demonstrates, the steps we took above to stabilize our embeddings were sufficient to produce statistically viable results. DRAFT School, Studying and Smarts 24

(e.g., synonyms) (Mrksic et al. 2016). Different areas of the embedding space thus represent

different contexts.

To differentiate words that occur in feminine versus masculine contexts, we extract a

gender axis from the embedding space. To do this, we identify pairs of anchor terms where the

primary difference between them is gender (e.g., ‘girl’, ‘boy’). Within each pair, we subtract the

vector corresponding to the masculine word (e.g., ‘he’) from the vector corresponding to the

feminine word (e.g., ‘she‘) and average the resulting difference vectors to create a single vector

corresponding to a gender dimension pointing from masculinity to femininity. We used the

following anchor term pairs:

('she', 'he'), ('her', 'him'), ('her', 'his'), ('girl', 'boy'), ('girls', 'boys'), ('daughter', 'son'),

('daughters', 'sons'), ('sister', 'brother'), ('sisters', 'brothers'), ('woman', 'man'),

('women', 'men'), ('mother', 'father'), ('mothers', 'fathers'), and ('female', 'male').

Second, to evaluate the relative extent to which a word tends to appear in feminine- or

masculine-gendered contexts, we project it onto the axis by computing the cosine similarity

between the gender axis vector and the word vector. This projection captures the extent to which

it is more likely to be used around feminine words versus masculine words.20

Prior studies demonstrate that this approach successfully extracts a meaningful semantic

dimension for gender and other socially relevant categories (Bolukbasi et al. 2016; Caliskan et al.

2017; Garg et al. 2018; Grand et al. 2018; Joseph and Morgan 2020; Kozlowski et al. 2019). For

20 More precisely, projection captures frequency of usage in both first- and second-order contexts of the anchor terms. The words we identify as feminine thus consist of (i) “first-order” feminine words that tend to co-occur with the feminine anchor terms; and (ii) “second-order” feminine words that tend to co-occur with the first-order feminine words (and conversely for masculine). Also note that two words that are largely proximate in the overall embedding space may still have very different gender projections if they differ specifically in how often they are used in feminine versus masculine contexts. Conversely, words that are distant in the overall space may have similar projections if they are used with similar frequency in feminine versus masculine contexts. DRAFT School, Studying and Smarts 25

example, Caliskan et al.’s (2017) well-known validation study shows that the projections of

terms estimated by this method corresponds to human participants' implicit gender associations

of these terms as measured by the Implicit Association Test. Following this literature (e.g.,

Bolukbasi et al. 2016), we thus occasionally refer to these projections as gender “associations”.

Previous work also shows that gender associations of occupations estimated via this approach are

correlated with survey self-reports on gender stereotypes of these occupations, as well as with

the changing gender composition of these occupations calculated from historical Census data

(Bolukbasi et al. 2016; Caliskan et al. 2017; Garg et al. 2018).21

To account for potential differences in the overall gendering of language across the

decades (for example, due to shifts away from using “man” to mean “human”), we project each

of over 10,000 unique words that comprise each window’s vocabulary onto the gender axis, and

then separately z-score each window’s set of resulting cosine similarities to calculate our final

projection scores. A projection score of 1 thus indicates that a word is 1 standard deviation more

feminine than the average word in the vocabulary, whereas -1 indicates that it is one standard

deviation more masculine.

To verify the face validity of our gender axis, we examined the ten most feminine and

most masculine words within each time window. Aside from first names (e.g., “Arnold,”

“Alexander,” “Martha,” “Elizabeth”) and terms that were used to construct the axis (e.g., “her,”

“girl,” “mother”), all but 6 of the 260 most gendered words belonged to just two gender-specific

categories within each gender: for men, these categories these were (1) war and conflict (e.g.,

21 These validation studies examined embeddings trained on the COHA dataset we use here, as well as on corpora from Google Books, Google Web Crawl, and Google News. Validation studies suggest that word embedding are especially accurate at detecting gender associations (Joseph and Morgan 2020; Kozlowski, Taddy, and Evans 2019) compared to associations with other social categories. DRAFT School, Studying and Smarts 26

“rifle,” “cavalry,” “platoon,” “commander,” “gen.,” “fighter,” “veteran,” “Eisenhower,”

“opponent,” “rival,” “enemy”), and (2) leadership and politics (e.g., “chief,” “boss,” “king,”

“Tammany,” “G.O.P,” “campaigned,” “elect,” [Ho Chi] “Minh”). For women, they were (1) reproduction (e.g., “pregnancy,” “breast,” “babies,” “infant,”), and (2) clothing or appearance

(e.g., “gown,” “makeup,” “dresses,” “perfume,” “blonde,” “skirt,” “lace,” “lovely,” “pink”).22

The extreme gendering of these terms matches our intuitions.

Constructing the Keyword Scales

We used multi-word scales to measure the gender associations of each of the six education-

relevant constructs we identified above: (i) schooling; (ii) studying (school effort); (iii) socio-

behavioral skills; (iv) problem behaviors; (v) high intelligence; and (vi) low intelligence. Our use

of multi-word scales serves two purposes: first, it reduces measurement error stemming from

eccentric changes to any one term’s usage; and second, it allows us to capture a variety of

different dimensions associated with each concept.

To select these keywords, we began by reviewing the academic and primary literature

relevant to each concept and recording all relevant keywords we encountered (including terms literally present in the text itself and the terms we would use to describe the contents of these texts.) To account for variations in vocabulary, we repeated this exercise with two team members who belonged to different age groups and genders. We then expanded our list of candidate keywords by referencing each term in a thesaurus. To ensure breadth, we used a range of modern-day thesauruses as well as the Historical Thesaurus of English (Kay et al. 2020). After assembling the initial candidate lists of terms for each scale, we applied a minimum word count

22 The three words that did not fit these categories for men were “brotherhood,” “hero,” and “uncle,” while for women, the three words were “aunt,” “princess,” and “flower” DRAFT School, Studying and Smarts 27

filter and relevant word usage filter to ascertain the quality of the scales (see below). We will

now discuss our reasoning behind the choice of terms for each scale.

We begin with scales for “schooling” and “school effort”. The impetus behind our (1) general “schooling” scale comes from accounts arguing that closeness to academic institutions may be stereotyped as feminine. When assembling this scale’s initial term list, we aimed for breadth, and thus included a diverse range of terms including social roles like “student,” abstract concepts like “educational” or “academic”, physical spaces and items like “classroom” and

“blackboard,” educational institutions like “school” and “college,” and important features of schooling like “classes,” “grades” and “graduation.” We excluded terms for teachers and teaching because the ratio of female to male teachers increased during our period. Our general scale for schooling also excluded terms directly related to studying or schoolwork, which we instead used to construct our second scale: (2) “school effort.” For this scale, our initial term list consisted of words directly related to studying (“study,” “studying,” “studious”) and effort

(“effortful”). We also included terms closely related to effortful academic activity (“homework,”

“textbook”).

To select terms for “socio-behavioral skills” and “problem behaviors,” we began with the concepts measured on the Early Child Longitudinal Study – Kindergarten (ECLS-K), which is a widely known multiscore, multimethod study of early education that figures prominently in academic work on gender differences in academic performance (DiPrete and Buchmann 2013;

Owens 2016; Tourangeau et al. 2015). Initial terms for (3) socio-behavioral skills included attributes directly relevant to learning (“attentive,” “listens,” “careful,” “meticulous”), and more general attributes that are typically positively associated with academic performance (“polite,”

“courteous”). Initial terms for (4) problem behaviors captured a range of emotional attributes

DRAFT School, Studying and Smarts 28

(“anger,” “violent,” “rebellious,” “uncontrollable”), actions (“disrupted,” “interrupted,” “attack,”

“fight”), and classifications of deviance (“rebel,” “bully”). These behaviors, as rated by both parents and teachers in the ECLS-K, have been shown to be associated with poor academic

performance from an early age, and to have negative effects on motivation and orientation towards school (DiPrete and Buchmann 2013).

Finally, we constructed scales to capture the gender stereotypes of intelligence and

unintelligence. Our initial term list for (5) intelligence included adjectives describing above-

average intelligence (“smart,” “clever,” “intelligent”) as well as exceptional intelligence

(“brilliant,” “genius”). Initial terms for (6) unintelligence include strongly demeaning terms

(“stupid,” “dumb,” “idiotic”) and terms related to foolishness (“fool,” “fools”).

After assembling the initial term lists described above, we took two filtering steps to

ascertain their quality. First, we eliminated terms that appeared fewer than 100 times in any time

window (which, due to the power-law distribution of word frequencies, is a more constraining

restriction than it may first appear). To account for polysemy and the difference between words’

definitions and empirical usage, we then hand-validated most of the keywords that passed the

frequency test. For the hand-validation, we constructed a codebook based on the dictionary

definition of each keyword, where we indicated which of its meaning fits our scale and which

does not. Research assistants then drew a random sample of corpus passages containing the

candidate keyword, read those passages, and used the codebook to code whether the terms were

being used in intended ways. They also flagged any unexpected patterns of word usage. At the

end of the coding procedure, we excluded terms that fit the intended usage less than 25% of the

DRAFT School, Studying and Smarts 29

time.23 Appendix tables A1 through A6 list the full initial term lists for each scale, and indicate

which (if any) of the criteria led to the rejection of the terms that did not make it into the final

scales. (The terms depicted in each figure and numerical estimate are strictly those that passed

both criteria.)

In the following section, we apply the methods we described above to our data to

examine the changing gender associations of these six stereotypes. We compare our observations

to the theoretical expectations we listed in Table 1. We additionally investigate observed

relationships for which present work offered no clear expectations.

RESULTS

To aid exposition, we analyze our six keyword scales in a different order than they were

discussed above: (1) schooling, (2) socio-behavioral skills, (3) behavioral problems, (4) school effort, (5) intelligence, and finally (6) unintelligence.

Schooling

Our schooling scale averages the gendering of terms for students, educational institutions, classrooms, and academic achievement. The projection score for the schooling scale is reported in the Schooling column of Table 2. As the result in the bottom row indicates, in the most recent time window (centered on 2000) schooling was roughly one standard deviation more feminine

(mean = 0.98, p < 0.01) than the average word in the vocabulary. This is consistent with observations from recent work.

23 While we initially planned to set this threshold at 50%, in pre-testing we found that terms generally retained a high correlation with the remainder of their scale until roughly this 25% point. Using a higher threshold results in smaller term lists and thus wider margins of errors around our estimates, but otherwise does not substantively affect the results. Mean and median percentages of times our included keywords were used in relevant ways were 75% and 82%, respectively. DRAFT School, Studying and Smarts 30

[Figure 1 about here]

[Table 2 about here]

As the trend line in Figure 1 indicates, the gender associations of schooling-related terms were not fixed throughout the time series. Instead, its present state is relatively recent. In the corpus window centered on 1940, the schooling scale was not significantly more feminine or masculine than the average word in the vocabulary (mean = 0.16, n.s.). However, it became increasingly more feminine in every subsequent window, and by 1955 became significantly more feminine than the average word (mean = 0.44, p<.01). Thus, although not initially significantly feminine, schooling steadily gained feminine associations. As the gender gap in educational attainment closed over the 20th century, schooling moved monotonically towards femininity before plateauing in 1985, right as college degree attainment reached gender parity (DiPrete and

Buchmann 2013:27). Since 1985, the schooling scale has remained nearly a full standard deviation more feminine than the average word (in 1985, mean = 0.97, p < 0.01; in 2000, mean =

0.98, p<0.001). This change in gender associations between 1940 and 2000 is consistent with our theoretical predictions (see table 1).

To explore this change in more detail, we examine some of the individual keywords that make up the schooling scale. Among the schooling keywords, “classroom”, “graduation”,

“education”, and “student” underwent the biggest changes. For example, in 1940, “student,” was

0.35 s.d. more masculine than the average word in the vocabulary, indicating that it had moderate masculine associations. By the last window, centered on in 2000, this increased to 1.12 s.d. more feminine than the average word, indicating a strong masculine association. In fact, looking at the changes for all the 16336 words present in both the 1940 and the 2000 windows,

DRAFT School, Studying and Smarts 31

we find that “student” experienced 213th-largest move towards femininity (top 1.5% of words).

The terms “education”, “graduation”, and “classroom” were similarly in this top 1.5%.

To provide context for this change, we can compare it with the changes in the gender

associations of the term “empowered.” Google Books Ngrams first register the use of the phrase

“women’s empowerment” in print in 1968. The phrase remained relatively obscure until 1980,

when its frequency of use began to rapidly increase. It reached a symbolic pinnacle during the

1995 United Nation’s Conference in Beijing, which “marked the apex of 20 years of sustained

endeavor to secure women's empowerment as a central element in international development

discourse” (Eyben and Napier-Moore 2009:258). From 1940 to 2000, the term “empowered”

shifted 1.34 standard deviations towards femininity (top 2%)—a large change which nonetheless

is noticeably smaller than the 1.47 standard deviation shift undergone by “student.” This

contextualizes the magnitude of the cultural shift which resulted in the association of schooling and femininity that we observe today.

Classroom behaviors

We next turn to results from two scales related to classroom behaviors: (iii) socio-behavioral skills and (iv) problem behaviors. In contrast with terms for schooling, figure 2 and table 2 indicate that socio-behavioral skills like attentiveness and politeness are feminine even in the oldest corpus window. Indeed, in 1940, terms for socio-behavioral skills are already nearly a standard deviation (mean = .84, p < 0.01) more feminine than the average word. They remain consistently feminine through the most recent window (2000), albeit at a slightly lower level

(mean = .57, p < 0.01)—however, this change is small in magnitude and lacks statistical significance (difference = -0.27, n.s.). The feminine associations of socio-behavioral skills thus

DRAFT School, Studying and Smarts 32

appear to pre-date the historical period we examine in this paper, as we predicted based on

existing work.

[Figure 2 about here]

Figure 3 contains the scale for problem behaviors like aggression and fighting. We see

that these are masculine in our first window (mean = -0.42, p < 0.01), and remain consistently as

masculine throughout the rest of the time period, ending with a comparable gender association in the final time window (mean = -0.55, p < 0.01). In any given decade, these terms for problem behaviors are thus approximately a half standard deviation more masculine than the average vocabulary word. If these terms underwent any change between 1940 and 2000, this was thus a shift towards even greater masculinity, although this shift is small in magnitude and not statistically significant (difference = -.13, n.s.). The masculine associations of being a poorly- behaved student thus also appear to pre-date the period in our study. This is again consistent with the expectations we derived from existing work.

[Figure 3 about here]

Studying and intelligence

While our above results suggest that the feminine associations of being a well-behaved student

may pre-date our time period, our analyses suggest that the feminine associations of being a

hard-working student may be more recent. In the 1940s, effort in school—captured by terms

such as “studying” and “textbook”—was not significantly gendered (mean = -0.13, n.s.; see

figure 4). The associations of school effort then steadily feminized, first becoming significantly

feminine in 1975 (mean = 0.43, p < 0.05), and reached its current level in 1985 (mean = 0.66, p <

0.01), where it has since remained. This is again consistent with our theoretical expectations

(fourth row of table 1).

DRAFT School, Studying and Smarts 33

[Figure 4 about here]

We will now investigate trends in the gender associations of intelligence and

unintelligence, for which existing work offered no consistent predictions (rows five and six in

table 1.) The gender associations of intelligence, too, appear to be dynamic (see figure 5). As

table 1 indicates, between the 1940 and 1955 windows, intelligence is neutral to slightly

feminine (mean = 0.26, n.s.), but this association is not significant except in 1950 (mean = 0.26,

p < 0.05). Starting 1955, however, the association moves persistently towards masculinity, until

plateauing with a significant masculine association starting 1990 (mean = -0.55, p < 0.01),

indicating that terms for intelligence were 0.55 standard deviations more masculine than the

average word in the vocabulary.

One word in the intelligence scale that deserves special attention is “genius.” This is by

far the most masculine term in the scale across all the corpus windows. Indeed, out of the 23,467 words we examine for the 2000 window, we find that “genius” is the 146th most masculine (top

0.6%) with a standardized projection of -2.39, as compared to -1.26 in the 1940 time window.

For context, that makes “genius” slightly more masculine in 2000 than “brother” (-2.26) or

“football” (-2.26) and roughly as masculine as “marines” (-2.33), “bullet” (-2.39), and “buddy” (-

2.37). Its projection places it between a number of names that unambiguously refer to male word leaders: “genius” is slightly less masculine than “Yeltsin” (-2.45) or “Gorbachev” (-2.44), but slightly more masculine than “Lincoln” (-2.37), “Jefferson” (-2.29), or “Nixon” (-2.29). Looking at terms that are the same distance from zero but in the direction of femininity, we find that the word “genius” is roughly as masculine as the terms “estrogen” (2.33), “fragrance” (2.38),

“quilts” (2.40), “sari” (2.40), and “motherhood” (2.43) are feminine. In our most recent decade,

DRAFT School, Studying and Smarts 34

“genius” thus has exceptionally strong associations with masculinity, which echoes recent

sociological work on the “genius effect” (e.g., Bian et al. 2017; Musto 2019).

[Figure 5 about here]

In Figure 6, we observe that not only is intelligence becoming increasingly masculine,

but so is unintelligence. Unintelligence begins in the 1940 window with a significant feminine

association (mean = 0.49, p<.01). It steadily sheds these associations until it eventually stops

being significantly associated with either gender in 1970. It then continues moving towards

masculinity, becoming significantly masculine in the 1990 window. By our final window (2000),

unintelligence has an average projection of -.55 (p < 0.01), which makes it approximately as

masculine as it was feminine in our first window (1940).

[Figure 6 about here]

Similarities between trend lines

As Figure 7 illustrates, the intelligence and unintelligence scales follow nearly identical

trajectories to one another (r = 0.98). We had not predicted this similarity, as none of the

literature we had reviewed suggested that the gender associations of intelligence and

unintelligence would undergo the same changes (rather than opposite changes). We interpret this unexpected simultaneous movement in detail in the discussion section, where we argue that it is

consistent with the questioning (or judgement) of intelligence becoming gendered masculine

(e.g., “is X smart, or is X stupid”?)

[Figure 7 about here]

Finally, we note that the gender associations of intelligence and school effort also move

with surprising synchrony, albeit in opposite directions. Overall, the Pearson’s correlation

between them is r = -0.93, which indicates that they have more than 86% of their variance in

DRAFT School, Studying and Smarts 35

common. Figure 8, which combines the intelligence and school effort scales onto one plot,

highlights the surprising extent to which the two trends mirror one another. As Table 3 details, in

1940, the intelligence and school effort scales start out slightly feminine (mean = 0.26, n.s.) and

masculine (-0.13, n.s.), respectively (difference = -.39, n.s.). Starting in 1950, both scales move in the direction of the opposite gender. Except for a shared pause in 1960-1965, these changes continue until the scales reach their present associations in 1990 for intelligence (masculine; mean = -0.55, p < 0.01) and 1985 for school effort (feminine; mean = 0.66, p < 0.01), after which point both sets of associations plateau (in 1990, difference = 1.14, p < 0.01; in 2000, difference =

1.17, p < 0.01). This unexpected synchrony leads us to speculate both changes may have occurred as part of one greater cultural realignment in the gender system. We detail this reasoning in the Discussion.

[Figure 8 about here]

[Table 3 about here]

DISCUSSION

Gender stereotypes have powerful consequences for adolescents’ academic outcomes. Because of a lack of systematic repeated measures of gender stereotypes across the years (Ridgeway

2011:167), it has remained largely unknown whether and how these stereotypes transformed

amid radical shifts in men and women’s educational attainment across the 20th century. In this article, we leveraged neural network word embeddings and a large corpus of American print media (1930-2009) to construct a novel historical barometer for these changes. We examined six sets of gender associations that prior work had linked to educational outcomes: (i) schooling; (ii) school effort; (iii) socio-behavioral skills; (iv) problem behaviors; (v) intelligence; and (vi) unintelligence. We found that, while the stereotypes of socio-behavioral skills and problem DRAFT School, Studying and Smarts 36

behaviors remained unchanged since the middle of the 20th century, the other four stereotypes have undergone dramatic changes, gaining substantial associations with either femininity or masculinity. None of the six stereotypes we tracked experienced a substantial decrease in gender associations.

Below, we revisit these empirical results to further develop their theoretical implications.

We begin with trends for which we had made theoretical predictions (see Table 1). We then examine the remaining trends and interrelationships between them, interpreting these unexpected results in light of existing empirical and theoretical work.

Expected Trends

We developed predictions regarding trends in four of these six stereotypes. To arrive at predictions regarding (i) schooling and (ii) school effort, we began by noting that, since the mid- twentieth century, women’s academic attainment rose drastically, and empirical evidence suggests that the amount of effort that girls put into schooling likely increased as well. Various theoretical accounts hold that cultural beliefs about gender are tightly intertwined with material arrangements, such as the proportion of men and women in higher education (Eagly 1987;

Ridgeway 2011). This pointed us to the prediction that schooling and school effort should become more closely associated with femininity.

Our results for scales that track (i) schooling broadly conceived (e.g., school, students) and (ii) specifically focusing on school effort (e.g., studying) support this supposition. Whereas neither scale had a statistically significant association with either gender in our earliest time window, we find that both steadily gained significant feminine associations across the 20th

century. This trend towards femininity is in line with theoretical expectations. Additionally, the

relatively recent emergence of these gender associations suggests that the stereotype of schooling

DRAFT School, Studying and Smarts 37

as “for girls” observed in existing empirical work (Jackson and Dempster 2009; Morris 2008)

may be a relatively recent cultural phenomenon—and thus possibly one that is less durable and

more open to classroom intervention than stereotypes around socio-behavioral skills and problem

behaviors.

Second, we noted that (iii) socio-behavioral skills and (iv) problem behaviors have

substantial overlap with the core stereotypes of women as communal and men as agentic—

stereotypes that have been shown to be especially resistant24 to change (Cejka and Eagly 1999;

Diekman and Eagly 2000; Haines et al. 2016; Koenig and Eagly 2005; Spence and Buckner

2000). This durability may be because these core stereotypes are particularly closely linked to

the basic structure of the gender system, where men are more likely to occupy positions of power

and prestige, and women to occupy lower-power positions that involve deference and caregiving

(Ridgeway 2011). We thus conjectured that socio-behavioral skills and problem behaviors would

also remain relatively fixed in their gender associations.

And indeed, our results in figures 2 and 3 indicate that both scales already had strong

gender associations by our first window (centered on 1940). We observe that these associations

then remained largely unchanged across the 20th century, which is in line with our theoretical

expectations. Differences in these stereotypical associations may have thus potentially provided a

longstanding cultural advantage for girls and disadvantage for boys, as suggested by previous

scholars (DiPrete and Buchmann 2013; Goldin et al. 2006).

24 There is evidence that women have come to see themselves as more agentic. However, gendered self-perceptions play a different psychological role than gender stereotypes (i.e, shared social knowledge of a typical man or woman), and may follow distinct trajectories of change (Ridgeway 2011:168). DRAFT School, Studying and Smarts 38

Other Observations

We will now discuss results for which we did not make predictions, beginning with trends for (v) intelligence and (vi) unintelligence. Our results indicate that intelligence did not have a significant gender association in 1940, whereas unintelligence had an association with

femininity. Across the decades, both stereotypes shifted substantially towards masculinity,

ending in 2000 with significant masculine associations. This invites two interesting questions:

first, why might both antonyms become associated with the same gender (rather than with

different genders); and second, why might intelligence follow the opposite trend from school

effort? We examine these in order.

To interpret the counterintuitive simultaneous movement of both intelligence and

unintelligence towards masculinity, recall that Word2Vec positions words relative one another in

a high-dimensional vector space so that different regions of this space represent different

contexts. There are some contexts that words for intelligence and unintelligence do not share: for

example, terms for intelligence are more likely to occur in the context of flattery or praise, while

terms for unintelligence are more likely to double as insults. Conversely, the context that is

perfectly shared by terms for intelligence and unintelligence—i.e., when terms for either are

equally likely to be observed25—occurs specifically when the intelligence of a person is judged,

compared, or questioned: e.g., X is “smart” or “dumb.” Because trends for intelligence and

unintelligence move in nearly perfect unison, we suspect that this is exactly the context that has

gained strong masculine associations. This means that the judgement of intelligence (whether

high or low) became an increasingly salient attribute of masculinity across time. This result

25 Net of how common the terms are overall (i.e., of their marginal probability of occurring in any context). DRAFT School, Studying and Smarts 39

recalls findings from recent ethnographic work, which has observed that boys are more likely

than girls to be seen as either “super smart” or unintelligent (Morris 2008; Musto 2019).

Perhaps our most striking finding concerns the mirror-opposite trends followed by

intelligence and school effort. This result recalls scholarship about students’ educational

mindsets (Dweck 2006), where the perception that intelligence and school effort are mutually

contradictory is a core aspect of the “fixed mindset”—the popular belief that intelligence is fixed from birth rather than acquired through effort. From the point of view of the fixed mindset,

“either you have ability or you expend effort”; thus, “effort is only for people with deficiencies”

(Dweck 2006:40–42). Indeed, research suggests that the appearance of effortlessness is held in

high esteem across many social domains. For example, perceptions of effortlessness have been

shown to increase subjects’ evaluations of the quality of inventor’s ideas (Elmore and Luna-

Lucero 2017), entrepreneurs’ business proposals (Tsay 2016), and musician’s talents (Tsay and

Banaji 2011). Thus, if intelligence and school effort are seen as competing routes to academic

achievement, intelligence is clearly the higher status and more desirable route.

A growing body of scholarship demonstrates that this opposition between intelligence

and school effort is strongly gendered, with students frequently associating effortless intelligence

with men and school effort with women (Elmore and Luna-Lucero 2017; Heyder and Kessels

2017; Jackson and Nyström 2015). Our results reinforce the conclusions of this work, but also

indicate that these gender associations may be a relatively recent phenomenon. Indeed, the

exceptionally strong negative correlation we observed between the gender associations of

intelligence and school effort seemingly suggests that these two sets of cultural associations are interlinked and may have arisen as part of one broader change to the gender system.

DRAFT School, Studying and Smarts 40

While our empirical analyses do not allow us to observe the details of this social change,

we can draw on contemporary accounts of the gender system (Correll and Ridgeway 2013;

Ridgeway 2011) to offer a tentative explanation. Ridgeway (2011) argues that the core of the

gender system can persist because its surface components are flexible: even amid material and cultural changes, women and men continue to be differentiated, with new social arrangements reframed so that men continue to be understood as higher status than women. From this perspective, the timing of our observed simultaneous change is conspicuous, with men’s intelligence and women’s school effort gaining in emphasis just as women came to challenge men in education. It is thus possible that, as educational attainment—an important social characteristic—stopped being a marker that advantaged men, women’s achievements in school became reframed as a product of effort—a less lauded and lower status avenue to success than

effortless brilliance (Heyder and Kessels 2017; Jackson and Nyström 2015). Meanwhile, men’s

academic performance became reframed as a question of effortless intelligence, thus possibly

allowing men to reap higher social standing and academic self-confidence from equal (or even

slightly lower) academic achievement as women (e.g., Smith et al. 2013; Tormala, Jia, and

Norton 2012).

Taken together, our results appear consistent with a dynamic landscape of gender

stereotypes across the 20th century, anchored by a persistent, underlying gender system whereby

men are afforded higher status than women. Prior work has argued that core aspects of gender

stereotypes remain largely unchanged partly because the distribution of men and women to high- prestige, powerful positions and lower-prestige, caretaking positions remains highly unequal.

Thus, women continue to be seen as more communal and men as more agentic. Accordingly, we

found that the schooling-relevant gender stereotypes that directly relate to these characteristics—

DRAFT School, Studying and Smarts 41

girls’ socio-behavioral skills and boys’ problem behaviors—also did not shift. Conversely, throughout the 20th century, women caught up to and surpassed men in educational attainment.

We observed that two gender stereotypes arose in parallel with this distinction: the association of school and studying with femininity, and of intelligence with masculinity. Since effortless intelligence is a higher-status route to achievement than effortful studying, we conjectured that this gendering of school effort may partly enable the hierarchical distinction at the core of the gender system to persist, even amidst sweeping material changes in women’s educational achievement.

CONCLUSION

As Ridgeway (2011:167) notes, “measures of gender stereotypes have not been systematically administered over the years,” which has limited our understanding of how gender stereotypes have changed across time. In this manuscript, we demonstrated how recently developed word embedding methods can be used to construct new historical measures of these stereotypes—even when no traditional data on these stereotypes are available. Moreover, embedding methods make it possible to investigate larger sets of related stereotypes than would generally be found in a longitudinal survey dataset—and to do so across vast time scales. A similar approach can be used to observe many further “missing” longitudinal cultural variables in other sociological subfields.

We thus hope our work helps further popularize embedding-based investigations in sociology— work that may be able to uncover important social shifts in stereotypes, categories, and schemas that have hitherto seemed empirically intractable.

REFERENCES

Aditomo, Anindito. 2015. “Students’ Response to Academic Setback: ‘Growth Mindset’ as a Buffer against Demotivation.” International Journal of Educational 4(2):198–222.

DRAFT School, Studying and Smarts 42

Adler, Patricia, Steven Kleiss, and Peter Adler. 1992. “Socialization to Gender Roles: Popularity among Elementary School Boys and Girls.” Sociology of Education 65(3):169–87. Antoniak, Maria, and David Mimno. 2018. “Evaluating the Stability of Embedding-Based Word Similarities.” Transactions of the Association for Computational Linguistics 6:107–119. Archer, Louise, Simon D. Pratt, and David Phillips. 2001. “Working-Class Men’s Constructions of Masculinity and Negotiations of (Non)Participation in Higher Education.” Gender and Education 13(4):431–49. Bian, Lin, Sarah-Jane Leslie, and Andrei Cimpian. 2017. “Gender Stereotypes about Intellectual Ability Emerge Early and Influence Children’s Interests.” Science 355(6323):389–91. Bian, Lin, Sarah-Jane Leslie, and Andrei Cimpian. 2018. “Evidence of Bias against Girls and Women in Contexts That Emphasize Intellectual Ability.” American Psychologist 73(9):1139. Blatz, W. E., and E. A. Bott. 1927. “Studies in Mental Hygiene of Children I. Behavior of Public School Children—A Description of Method.” 34. Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, and Adam Kalai. 2016. “Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings.” Adv. Neural Inf. Process. Syst. Bordua, David J. 1960. “Educational Aspirations and Parental Stress on College.” Social Forces 38(3):262–69. Caliskan, Aylin, Joanna J. Bryson, and Arvind Narayanan. 2017. “Semantics Derived Automatically from Language Corpora Contain Human-like Biases.” Science 356(6334):183–186. Cejka, Mary Ann, and Alice H. Eagly. 1999. “Gender-Stereotypic Images of Occupations Correspond to the Sex Segregation of Employment.” Personality and Bulletin 25(4):413–423. Clark, Edward. 1967. “Sex Differences in the Perception of Academic Achievement among Elementary School Children.” The Journal of Psychology 67(2):249–56. Cohen, Michele. 1998. “‘A Habit of Healthy Idleness’: Boys’ Underachievement in Historical Perspective.” in Failing Boys? Issues in gender and achievement. Buckingham: Open University Press. Correll, Shelley J., and Cecilia L. Ridgeway. 2013. “Expectation States Theory.” Pp. 29–51 in Handbook of Social Psychology. Davies, Mark. 2012. “Expanding Horizons in Historical Linguistics with the 400-Million Word Corpus of Historical American English.” Corpora 7(2):121–57. Deigmayr, Anne, Elizabeth Stern, and Renate Schubert. 2019. “Beliefs in ‘Brilliance’ and Belonging Uncertainty in Male and Female STEM Students.” Frontiers in Psychology 10(1114). Diekman, Amanda B., Alice H. Eagly, and Amanda M. Johnston. 2010. “Social Structure.” Pp. 209–24 in The Sage handbook of prejudice, stereotyping, and discrimination. Thousand Oaks, CA: Sage. Diekman, Amanda, and Alice Eagly. 2000. “Stereotypes as Dynamic Constructs: Women and Men of the Past, Present, and Future.” Personality and Social Psychology Bulletin 26(10):1171–88. DiPrete, Thomas A., and Claudia Buchmann. 2013. The Rise of Women: The Growing Gender Gap in Education and What It Means for American Schools. Russell Sage Foundation.

DRAFT School, Studying and Smarts 43

DiPrete, Thomas A., and Jennifer L. Jennings. 2012. “Social and Behavioral Skills and the Gender Gap in Early Educational Achievement.” Social Science Research 41(1):1–15. Downey, Douglas B., and Anastasia S. Vogt Yuan. 2005. “Sex Differences in School Performance During High School: Puzzling Patterns and Possible Explanations.” 46(2):229–321. Dweck, Carol S. 2006. Mindset : The New Psychology of Success. New York: Random House. Eagly, Alice. 1987. Sex Differences in Social Behaivor: A Social-Role Analysis. Hillsdale, NJ: Erlbaum. Eagly, Alice H., Christa Nater, David I. Miller, Michèle Kaufmann, and Sabine Sczesny. 2020. “Gender Stereotypes Have Changed: A Cross-Temporal Meta-Analysis of U.S. Public Opinion Polls from 1946 to 2018.” American Psychologist 75(3):301–15. Eagly, Alice H., and Wendy Wood. 1999. “The Origins of Sex Differences in Human Behavior.” American Psychologist 54(6):408–23. Elfenbein, Andrew. 1999. Romantic Genius: The Prehistory of a Homosexual Role. Columbia University Press. Elmore, Kristen C., and Myra Luna-Lucero. 2017. “Light Bulbs or Seeds? How Metaphors for Ideas Influence Judgments About Genius.” Social Psychological and Personality Science 8(2):200–208. Ely, Kimberly. 2010. “Understanding the Stereotypes Against Gifted Students: A Look at the Social and Emotional Struggles of Stereotyped Students.” Academic Leadership: The Online Journal 8(3). Entwisle, Doris R., Karl L. Alexander, and Linda S. Olson. 2007. “Early Schooling: The Handicap of Being Poor and Male.” Sociology of Education 80(2):114–38. Epstein, Debbie. 1998. “Real Boys Don’t Work:“underachievement’, Masculinity, and the Harassment of ‘Sissies.’” in Failing Boys? Issues in gender and achievement. Philadelpha, PA: Open University Press. Espinoza, Penelope, da Luz Fontes Arêas, and Clarissa Arms-Chavez. 2014. “Attributional Gender Bias: Teachers’ Ability and Effort Explanations for Students’ Math Performance.” Social Psychology of Education 17:105–26. Eyben, Rosalind, and Rebecca Napier-Moore. 2009. “Choosing Words with Care? Shifting Meanings of Women’s Empowerment in International Development.” Third World Quarterly 30(2):285–300. Finkelstein, Lev, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. “Placing Search in Context: The Concept Revisited.” ACM Transactions on Information Systems 20(1). Fischer, Claude S., and Michael Hout. 2006. Century of Difference: Diversity and Unity Among Americans, 1900-2000. Russell Sage Foundation. Fiske, Susan T., Amy J. C. Cuddy, Peter Glick, and Jun Xu. 2002. “A Model of (Often Mixed) Stereotype Content: Competence and Warmth Respectively Follow from Perceived Status and Competition.” Journal of Personality and Social Psychology 82(6):878–902. Fortin, Nicole M., Philip Oreopoulos, and Shelley Phipps. 2015. “Leaving Boys Behind: Gender Disparities in High Academic Achievement.” Journal of Human Resources 50(3):549– 79. Fox, Gudelia. 1968. “The Gifted: How Are They Viewed?” Gifted Child Quarterly 12(1):23–33.

DRAFT School, Studying and Smarts 44

Gálvez, Ramiro H., Valeria Tiffenberg, and Edgar Altszyler. 2019. “Half a Century of Stereotyping Associations between Gender and Intellectual Ability in Films.” Sex Roles 81:643–654. Garg, Nikhil, Londa Schiebinger, Dan Jurafsky, and James Zou. 2018. “Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes.” Proceedings of the National Academy of Sciences 115(16):E3635–44. Goldin, Claudia, Lawrence F. Katz, and Ilyana Kuziemko. 2006. “The Homecoming of American College Women: The Reversal of the College Gender Gap.” Journal of Economic Perspectives 20(4):133–56. Grand, G., I. A. Blank, F. Pereira, and E. Fedorenko. 2018. “Semantic Projection: Recovering Human Knowledge of Multiple, Distinct Object Features from Word Embeddings.” ArXiv Preprint 1802(01241). Haines, Elizabeth, Kay Deaux, and Nicole Lofaro. 2016. “The Times They Are A-Changing… or Are They Not? A Comparison of Gender Stereotypes.” Psychology of Women Quarterly 40(3):353–63. Hamilton, William L., Jure Leskovec, and Dan Jurafsky. 2016. “Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change.” Pp. 1489–1501 in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. Hartley, Ruth E. 1959. “Sex-Role Pressures and the Socialization of the Male Child.” Psychological Reports 5:457–68. Hayes, Margaret Louise. 1943. A Study of the Classroom Disturbances of Eighth Grade Boys and Girls. New York: Teachers College, Columbia University. Hegarty, Peter. 2007. “From Genius Inverts to Gendered Intelligence: Lewis Terman and the Power of the Norm.” History of Psychology 10(2):132. Heyder, Anke, and Ursula Kessels. 2017. “Boys Don’t Work? On the Psychological Benefits of Showing Low Effort in High School.” Sex Roles 77(1):72–85. Hicks, Allan J., and Margaret Hayes. 1938. “Study of the Characteristics of 250 Junior High School Children.” Child Development 9(2):19–242. Hughes, Rhidian, and Meg Huby. 2012. “The Construction and Interpretation of Vignettes in Social Research.” Social Work and Social Sciences Review 11(1):36–51. Jackson, Carolyn. 2002. “‘Laddishness’ as a Self-Worth Protection Strategy.” Gender and Education 14(1):37–50. Jackson, Carolyn. 2003. “Motives for ‘Laddishness’ at School: Fear of Failure and Fear of the ‘Feminine.’” British Educational Research Journal 29(4):583–498. Jackson, Carolyn, and Steven Dempster. 2009. “‘I Sat Back on My Computer … with a Bottle of Whisky next to Me’: Constructing ‘Cool’ Masculinity through ‘Effortless’ Achievement in Secondary and Higher Education.” Journal of Gender Studies 18(4):341–56. Jackson, Carolyn, and Anne-Sofie Nyström. 2015. “‘Smart Students Get Perfect Scores in Tests without Studying Much’: Why Is an Effortless Achiever Identity Attractive, and for Whom Is It Possible?” Research Papers in Education 30(4):393–410. Jolly, Jennifer. 2008. “Historical Perspectives: Lewis Terman: Genetic Study of Genius— Elementary School Students.” Gifted Child Today 31(1):27–33. Jolly, Jennifer. 2009. “A Resuscitation of Gifted Education.” American Educational History Journal 36(1/2):37–52.

DRAFT School, Studying and Smarts 45

Jones, Jason, Mohammad Amin, Jessica Kim, and Steven Skiena. 2020. “Stereotypical Gender Associations in Language Have Decreased Over Time.” Sociological Science 7:1–35. Jones, Mary Cover. 1960. “A Comparison of the Attitudes and Interests of Ninth-Grade Students over Two Decades.” Journal of Educational Psychology 51(4):175–86. Joseph, Kenneth, and Jonathan H. Morgan. 2020. “When Do Word Embeddings Accurately Reflect Surveys on Our Beliefs About People?” ArXiv:2004.12043 [Cs]. Kagan, Jerome. 1964. “Acquisition and Significance of Sex Typing and Sex Role Identity.” Review of Child Development Research 1:137–67. Kay, Christian, Marc Alexander, Fraser Dallachy, Jane Roberts, Michael Samuels, and Irené Wotherspoon, eds. 2020. The Historical Thesaurus of English. 4.21. Glasgow: University of Glasgow. Kessels, Ursula, Anke Heyder, Martin Latsch, and Bettina Hannover. 2014. “How Gender Differences in Academic Engagement Relate to Students’ Gender Identity.” Educational Research 56(2):220–29. Klonis, Suzanne C., E. Ashby Plant, and Patricia G. Devine. 2005. “Internal and External Motivation to Respond Without Sexism.” Personality and Social Psychology Bulletin 31(9):1237–49. Koenig, Anne, and Alice Eagly. 2014. “Evidence for the Social Role Theory of Stereotype Content: Observations of Groups’ Roles Shape Stereotypes.” Journal of Personality and Social Psychology 107(3):371. Koenig, Anne M., and Alice H. Eagly. 2005. “Stereotype Threat in Men on a Test of Social Sensitivity.” Sex Roles 52(7–8):489–496. Kozlowski, Austin C., Matt Taddy, and James A. Evans. 2019. “The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings.” American Sociological Review 84(5). Legewie, Joscha, and Thomas A. DiPrete. 2012. “School Context and the Gender Gap in Educational Achievement.” American Sociological Review 77(3):463–85. Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. 2015. “Expectations of Brilliance Underlie Gender Distributions across Academic Disciplines.” Science 347(6219):262–65. Levy, Omer, and Yoav Goldberg. 2014. “Linguistic Regularities in Sparse and Explicit Word Representations.” Pp. 171–180 in Proceedings of the Eighteenth Conference on Computational Natural Language Learning. Lupetow, Lloyd, Lori Garovich, and Margaret Lupetow. 1996. “The Persistence of Gender Stereotypes in the Face of Changing Sex Roles: Evidence Contrary to the Sociocultural Model.” Ethology and Sociobiology 16(6):509–30. Marrs, Heath. 2016. “Conformity to Masculine Norms and Academic Engagement in College Men.” Psychology of Men & Masculinity 17(2):197–205. McMillan, James H., Steve Myran, and Daryl Workman. 2002. “Elementary Teachers’ Classroom Assessment and Grading Practices.” The Journal of Educational Research 95(4):203–13. Meyer, William J., and George G. Thompson. 1956. Sex Differences in the Distribution of Teacher Approval and Disapproval among Sixth-Grade Children. Vol. 47. Mickelson, Roslyn Arlin. 1989. “Why Does Jane Read and Write so Well? The Anomaly of Women’s Achievement.” Sociology of Education 62(1):47–63.

DRAFT School, Studying and Smarts 46

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Efficient Estimation of Word Representations in Vector Space.” ArXiv [Cs.CL] 1301.3781. Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeffrey Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” Adv. Neural Inf. Process. Syst. 26:3111–3119. Milkie, Melissa A. 1999. “Social Comparisons, Reflected Appraisals, and Mass Media: The Impact of Pervasive Beauty Images on Black and White Girls’ Self-Concepts.” Social Psychology Quarterly 62(2):190–210. Morris, Edward W. 2008. “‘Rednecks,’ ‘Rutters,’ and `Rithmetic: Social Class, Masculinity, and Schooling in a Rural Context.” Gender & Society 22(6):728–51. Mrksic, Nikola, Diarmuid Ó. Séaghdha, Blaise Thomson, Milica Gašić, Lina Maria Rojas- Barahona, Pei-hao Su, David Vandyke, Tsung-Hsien Wen, and Steve J. Young. 2016. “Counter-Fitting Word Vectors to Linguistic Constraints.” Pp. 142–148 in Proceedings of NAACL-HLT. Mu, Jiaqi, and Pramod Viswanath. 2018. “All-but-the-Top: Simple and Effective Postprocessing for Word Representations.” in ICLR. Musto, Michela. 2019. “Brilliant or Bad: The Gendered Social Construction of Exceptionalism in Early Adolescence.” American Sociological Review 84(3):369–93. Newberry, Mitchell G., Christopher A. Ahern, Robin Clark, and Joshua B. Plotkin. 2017. “Detecting Evolutionary Forces in Language Change.” Nature 551(7679):223–26. Owens, Jayanti. 2016. “Early Childhood Behavior Problems and the Gender Gap in Educational Attainment in the United States.” Sociology of Education 89(3):236–58. Pascoe, CJ. 2007. Dude, You’re a Fag: Masculinity and Sexuality in High School. Berkeley: University of California Press. Peltier, Gary. 1968. “Sex Differences in the School: Problem and Proposed Solutions.” The Phi Delta Kappan 50(3):182–85. Perek, Florent. 2018. “Recent Change in the Productivity and Schematicity of the Way- Construction: A Distributional Semantic Analysis.” Corpus Linguistics and Linguistic Theory 14(1):65–97. Porter, Jim Wynter. 2017. “A ‘Precious Minority’: Constructing the ‘Gifted’ and ‘Academically Talented’ Student in the Era of Brown v. Board of Education and the National Defense Education Act.” Isis 108(3):581–605. Prentice, Deborah, and Erica Carranza. 2002. “What Women and Men Should Be, Shouldn’t Be, Are Allowed to Be, and Don’t Have to Be: The Contents of Prescriptive Gender Stereotypes.” Psychology of Women Quarterly 26(4):269–81. Pritchard, Miriam. 1952. “Total School Planning for the Gifted Child.” Exceptional Children 18(4):107–10. Retelsdorf, Jan, Katja Schwartz, and Frank Asbrock. 2015. “‘Michael Can’t Read!’ Teachers’ Gender Stereotypes and Boys’ Reading Self-Concept.” Journal of Educational Psychology 107(1):186–94. Ridgeway, Cecilia L. 2011. Framed by Gender. New York: Oxford University Press. Ridgeway, Cecilia L., Kristen Backor, Yan E. Li, Justine E. Tinkler, and Kristan G. Erickson. 2009. “How Easily Does a Social Difference Become a Status Distinction? Gender Matters.” American Sociological Review 74(1):44–62.

DRAFT School, Studying and Smarts 47

Ridgeway, Cecilia L., and Shelley J. Correll. 2004. “Unpacking the Gender System: A Theoretical Perspective on Gender Beliefs and Social Relations.” Gender & Society 18(4):510–31. Ruble, Diane, Carol Lynn Martin, and Sheri Berenbaum. 2007. “Gender Development.” in Handbook of Child Psychology. Vol. 3. John Wiley & Sons, Inc. Seashore, C. E. 1931. “Review of The Promise of Youth.” Bulletin of the American Association of University Professors (1915-1955) 17(3):243–45. Seguino, Stephanie. 2007. “PlusÇa Change? Evidence on Global Trends in Gender Norms and Stereotypes.” Feminist Economics 13(2):1–28. Shields, Stephanie. 1975. “Functionalism, Darwinism, and the Psychology of Women.” American Psychologist 30(7):739. Shields, Stephanie. 1982. “The Variability Hypothesis: The History of a Biological Model of Sex Differences in Intelligence.” Signs 7(4):769–97. Silverman, Linda. 1989. “It All Began with Leta Hollingsworth: The Story of Giftedness in Women.” Journal for the Education of the Gifted 12(2):86–98. Smith, Jessi L., Karyn L. Lewis, Lauren Hawthorne, and Sara D. Hodges. 2013. “When Trying Hard Isn’t Natural: Women’s Belonging With and Motivation for Male-Dominated STEM Fields As a Function of Effort Expenditure Concerns.” Personality and Social Psychology Bulletin 39(2):131–43. Solomon, Barbara Miller. 1985. In the Company of Educated Women: A History of Women and Higher Education in America. New Haven: Yale University Press. Spence, Janet, and Camille Buckner. 2000. “Instrumental and Expressive Traits, Trait Stereotypes, and Sexist Attitudes: What Do They Signify?” Psychology of Women Quarterly 24(1):44–53. Spencer, Steven J., Claude M. Steele, and Diane M. Quinn. 1999. “Stereotype Threat and Women’s Math Performance.” Journal of Experimental Social Psychology 28:4–28. Sriram, Rishi. 2014. “Rethinking Intelligence: The Role of Mindset in Promoting Success for Academically High-Risk Students.” Journal of College Student Retention: Research, Theory & Practice 15(4):515–36. Stadler, Gustavus. 1999. “Louisa May Alcott’s Queer Geniuses.” American Literature 71(4):657–77. Stoltz, Dustin S., and Marshall A. Taylor. 2019. “Concept Mover’s Distance: Measuring Concept Engagement via Word Embeddings in Texts.” Journal of Computational Social Science 2(2):293–313. Storage, Daniel, Zachary Horne, Andrei Cimpian, and Sarah-Jane Leslie. 2016. “The Frequency of ‘Brilliant’ and ‘Genius’ in Teaching Evaluations Predicts the Representation of Women and African Americans across Fields.” PloS One 11(3). Tagler, Michael. 2012. “Choking under the Pressure of a Positive Stereotype: Gender Identification and Self-Consciousness Moderate Men’s Math Test Performance.” The Journal of Social Psychology 152(4):401–16. Tannenbaum, Abraham. 2000. “A History of Giftedness in School and Society.” Pp. 23–53 in International Handbook of Giftedness and Talent. Elsevier. Terman, L., and C. C. Miles. 1936. Sex and Personality: Studies in Masculinity and Femininity. New York: McGraw-Hill.

DRAFT School, Studying and Smarts 48

Thorne, Barrie. 1993. Gender Play: Girls and Boys in School. New Brunswick, NJ: Rutgers University Press. Tiedemann, Joachim. 2000. “Parents’ Gender Stereotypes and Teachers’ Beliefs as Predictors of Children’s Concept of Their Mathematical Ability in Elementary School.” Journal of Educational Psychology 92(1):144–51. Tormala, Zakary L., Jayson S. Jia, and Michael I. Norton. 2012. “The Preference for Potential.” Journal of Personality and Social Psychology 103(4):567–83. Tourangeau, K., C. Nord, T. Lê, A. G. Sorongon, M. C. Hagedorn, P. Daly, and M. Najarian. 2015. “Early Childhood Longitudinal Study, Kindergarten Class of 2010–11.” Tsay, Chia-Jung. 2016. “Privileging Naturals Over Strivers: The Costs of the Naturalness Bias.” Personality and Social Psychology Bulletin 42(1):40–53. Tsay, Chia-Jung, and Mahzarin R. Banaji. 2011. “Naturals and Strivers: Preferences and Beliefs about Sources of Achievement.” Journal of Experimental Social Psychology 47(2):460– 65. Tuddenham, Read D. 1952. Studies in Reputation: I. Sex and Grade Differences in School Children’s Evaluations of Their Peers. II. The Diagnosis of Social Adjustment. Vol. 66. Warne, Russell. 2019. “An Evaluation (and Vindication?) Of Lewis Terman: What the Father of Gifted Education Can Teach the 21st Century.” Gifted Child Quarterly 63(1):3–21. Warrington, Molly, Mike Younger, and Jacquetta Williams. 2000. “Student Attitudes, Image and the Gender Gap.” British Educational Research Journal 26(3):393–407. Wood, Wendy, and Alice H. Eagly. 2002. “A Cross-Cultural Analysis of the Behavior of Women and Men:Implications for the Origins of Sex Differences.” Psychological Bulletin 128:699–727. Yavorsky, Jill, and Claudia Buchmann. 2019. “Gender Typicality and Academic Achievement among American High School Students.” Sociological Science 6:661–83. Young, Robert, and Helen Sweeting. 2004. “Adolescent Bullying, Relationships, Psychological Well-Being, and Gender-Atypical Behavior: A Gender Diagnosticity Approach.” Sex Roles 50(7–8):525–37.

DRAFT Table 1. Predictions based on the existing literature.

Present associations Change,1940 - 2000 Schooling Feminine Towards femininity Socio-behavioral skills Feminine No change Problem behaviors Masculine No change School effort Feminine Towards femininity High intelligence Masculine X Low intelligence X X

Note: An ‘X’ indicates situations when the existing literature does unambiguously identify a single prediction.

Table 2. Average projection of six keyword scales onto the Feminine (positive) to Masculine (negative) axis across 13 sliding 20-year windows.

Social and School Schooling Behavioral Skills Problem Behaviors Effort Intelligence Unintelligence Window Mean Change Mean Change Mean Change Mean Change Mean Change Mean Change 1940 .16 .84 ** -.42 ** -.13 .26 .49 ** 1945 .20 .03 .84 ** .00 -.31 ** .11 -.12 .01 .23 -.04 .46 ** -.03 1950 .30 .14 .85 ** .02 -.33 ** .09 -.09 .04 .26 * .00 .57 ** .09 1955 .44 ** .28 .73 ** -.11 -.45 ** -.03 -.02 .11 .19 -.07 .42 * -.07 1960 .44 ** .28 .66 ** -.18 -.45 ** -.03 .13 .26 .06 -.20 .46 * -.03 1965 .51 ** .35 .68 ** -.16 -.42 ** .00 .12 .25 .07 -.19 .41 * -.08 1970 .67 ** .51 .72 ** -.11 -.42 ** .00 .28 .41 .02 -.24 .27 -.22 1975 .80 ** .64 * .70 ** -.13 -.45 ** -.03 .43 * .56 -.04 -.31 .13 -.36 1980 .86 ** .70 ** .75 ** -.08 -.44 ** -.02 .43 * .56 -.09 -.35 .01 -.48 * 1985 .97 ** .80 ** .73 ** -.10 -.56 ** -.14 .66 ** .79 ** -.23 -.49 * -.19 -.68 ** 1990 .95 ** .79 ** .61 ** -.23 -.62 ** -.20 .60 ** .73 * -.55 ** -.81 *** -.41 * -0.90 *** 1995 1.01 ** .85 ** .63 ** -.21 -.59 ** -.17 .72 ** .85 ** -.55 ** -.82 *** -.46 * -0.95 *** 2000 .98 ** .82 *** .57 ** -.27 -.55 ** -.13 .63 ** .76 * -.54 ** -.80 *** -.55 ** -1.03 ***

Note: The midpoint of each time window t is the start of the year indicated in the “Window” column, i.e., “1940” corresponds to the time window from January 1st, 1930 to December 31st, 1949. Each result in each Mean column is the average projection of the corresponding keyword scale across embeddings estimated from bootstrap resamples of the corresponding COHA window (200 embeddings per window, 𝑡𝑡 2600 embeddings total). Change is from the 1940 mean. Significance𝑚𝑚𝑚𝑚𝑚𝑚 𝑛𝑛tests for mean against : = 0 are based on CIs from 200 bootstrapped embeddings. Tests for change against : = compare each of 200 bootstrapped embeddings for window 0 t to each embedding for 1940, since embeddings for different windows are independent (40,000𝐻𝐻 𝑚𝑚𝑚𝑚𝑚𝑚comparisons𝑚𝑚 per significance test). 𝐻𝐻0 𝑚𝑚𝑚𝑚𝑚𝑚𝑛𝑛𝑡𝑡 𝑚𝑚𝑚𝑚𝑚𝑚𝑛𝑛1940 * p < .05; ** p < .01; *** p < .001 (two-tailed tests).

Table 3. Differences in the average projections of School Effort and Intelligence onto the Feminine (positive) to Masculine (negative) axis across 13 sliding 20-year windows.

School Difference Window Intelligence Effort (effort – intelligence) 1940 .26 -.13 -.40 1945 .23 -.12 -.35 1950 .26 * -.09 -.36 1955 .19 -.02 -.20 1960 .06 .13 .07 1965 .07 .12 .05 1970 .02 .28 .26 1975 -.04 .43 * .48 1980 -.09 .43 * .53 * 1985 -.23 .66 ** .89 ** 1990 -.55 ** .60 ** 1.15 ** 1995 -.55 ** .72 ** 1.27 ** 2000 -.54 ** .63 ** 1.17 **

Note: The midpoint of each time window is January 1st of the year indicated in the “Window” column. Each result in the Studying and Intelligence column is the mean of that scale across 200 bootstrapped embeddings estimated from resamples of the corresponding part of COHA (total of 2600 embeddings). Difference(year) = Studying(year) – Intelligence(year). * p < .05; ** p < .01 (two-tailed tests); significance is estimated by comparing the mean projection of the Studying scale to that of the Intelligence scale within each bootstrapped embedding for a given window (row) (200 comparisons per significance test).

Appendix Table A1. Construction of the schooling scale. “Keyword” column contains all the keywords we initially attempted to use for this scale, and whether they satisfied the word “Count” criterion and the hand-coded “Usage” criterion. “Overall” column indicates whether the keyword entered the final scale.

Keyword Count Usage Overall Keyword Count Usage Overall

academic ✓ ✓ ✓ classmates X X

education ✓ ✓ ✓ schoolmates X X

educational ✓ ✓ ✓ sophomore X ✓ X

class ✓ ✓ ✓ diploma X ✓ X

classes ✓ ✓ ✓ diplomas X X

school ✓ ✓ ✓ graduation ✓ ✓ ✓

schools ✓ ✓ ✓ graduate ✓ ✓ ✓

schooling ✓ ✓ ✓ graduates ✓ ✓ ✓

semester X X graduated ✓ ✓ ✓

semesters X X educated ✓ ✓ ✓

schoolroom X X grade ✓ ✓ ✓

schoolyard X X grades ✓ ✓ ✓

blackboard ✓ ✓ ✓ exam X ✓ X

blackboards X X exams X X

classroom ✓ ✓ ✓ quiz X ✓ X

classrooms ✓ ✓ ✓ quizzes X X

student ✓ ✓ ✓ college ✓ ✓ ✓

students ✓ ✓ ✓ colleges ✓ ✓ ✓

classmate X X university ✓ ✓ ✓

schoolmate X X universities ✓ ✓ ✓

Note: ✓ = yes, X = no. Blank entries in the “Usage” criterion mean we did not hand-validate that keyword.

Appendix Table A2. Construction of the socio-behavioral skills scale. “Keyword” column contains all the keywords we initially attempted to use for this scale, and whether they satisfied the word “Count” criterion and the hand-coded “Usage” criterion. “Overall” column indicates whether the keyword entered the final scale.

Keyword Count Usage Overall Keyword Count Usage Overall attentive conscientious ✓ ✓ ✓ ✓ ✓ attentively conscientiously X X X X attention polite ✓ ✓ ✓ ✓ ✓ ✓ listening politeness ✓ ✓ ✓ ✓ ✓ listens politely ✓ ✓ ✓ ✓ ✓ listened careful ✓ ✓ ✓ ✓ ✓ ✓ courteous carefully ✓ ✓ ✓ ✓ ✓ courteously meticulous X X ✓ ✓ cooperative meticulously ✓ X X X X cooperatively X X

Note: ✓ = yes, X = no. Blank entries in the “Usage” criterion mean we did not hand-validate that keyword.

Appendix Table A3. Construction of the problem behavior scale. “Keyword” column contains all the keywords we initially attempted to use for this scale, and whether they satisfied the word “Count” criterion and the hand-coded “Usage” criterion. “Overall” column indicates whether the keyword entered the final scale.

Keyword Count Usage Overall Keyword Count Usage Overall Keyword Count Usage Overall rough attacking disobeys ✓ X X ✓ ✓ X X anger roughhouse disobeyed ✓ ✓ X X X X angry roughhousing disobeying ✓ ✓ X X X X aggression quarrelsome disobedience ✓ ✓ X X X X aggressive quarrel disrupt ✓ ✓ ✓ ✓ ✓ X X aggressiveness quarrels disrupting X X X X X X aggressively quarreling disrupts X X X X X X violent interrupt disruption ✓ ✓ ✓ ✓ ✓ X X violence interrupted disrupted ✓ ✓ ✓ ✓ ✓ ✓ violently interrupting disrupter ✓ ✓ ✓ ✓ X X destroys undisciplined impulsive ✓ ✓ ✓ X X ✓ ✓ destruction unruly impulsively ✓ ✓ X X X X destructive unruliest rebel ✓ ✓ X X ✓ ✓ fight disorderly rebels ✓ ✓ X X ✓ ✓ fights disorderliest rebelled ✓ ✓ ✓ X X X X fighting stubborn rebelling ✓ ✓ ✓ ✓ ✓ X X fought stubbornly rebellious ✓ ✓ ✓ ✓ ✓ ✓ ✓ combative stubbornest rebelliously X X X X X X combatively stubbornness rowdy X X X X X X conflict misbehave rowdily ✓ ✓ ✓ X X X X conflicts misbehaved rowdiest ✓ ✓ X X X X conflictual misbehaves uncontrollable X X X X ✓ ✓ conflictually misbehaving unmanageable X X X X X X Keyword Count Usage Overall Keyword Count Usage Overall Keyword Count Usage Overall confrontation misbehavior bully X X X X ✓ ✓ ✓ confrontations misbehaviors bullies X X X X X X confrontationally disobedient bullied X X X X X X attack disobediently bullying ✓ ✓ X X X X disobey attacks ✓ ✓ X X

Note: ✓ = yes, X = no. Blank entries in the “Usage” criterion mean we did not hand-validate that keyword.

Appendix Table A4. Construction of the school effort scale. “Keyword” column contains all the keywords we initially attempted to use for this scale, and whether they satisfied the word “Count” criterion and the hand-coded “Usage” criterion. “Overall” column indicates whether the keyword entered the final scale.

Keyword Count Usage Overall Keyword Count Usage Overall homework studies X ✓ X ✓ ✓ ✓ homeworks studied X X ✓ ✓ ✓ textbook studious ✓ ✓ ✓ X X textbooks effortfully ✓ ✓ ✓ X X study effortful ✓ ✓ ✓ X X studying ✓ ✓ ✓

Note: ✓ = yes, X = no. Blank entries in the “Usage” criterion mean we did not hand-validate that keyword.

Appendix Table A5. Construction of the intelligence scale. “Keyword” column contains all the keywords we initially attempted to use for this scale, and whether they satisfied the word “Count” criterion and the hand-coded “Usage” criterion. “Overall” column indicates whether the keyword entered the final scale.

Keyword Count Usage Overall Keyword Count Usage Overall genius intelligently ✓ ✓ ✓ X X geniuses cleverly X X ✓ ✓ ✓ brilliant ✓ ✓ ✓ smarter ✓ ✓ ✓ brilliance cleverer ✓ ✓ ✓ X X smartest savvier X X X X ingenious intelligence ✓ ✓ ✓ ✓ ✓ ingeniously smarts X X X X cleverest aptitude X X X X savviest intellect X X ✓ ✓ ✓ brainiest acumen X X X X polymath precocious X X X X polymaths precocity X X X X smart brainy ✓ ✓ ✓ X X intelligent brainier ✓ ✓ ✓ X X clever intellectual ✓ ✓ ✓ ✓ savvy intellectually X X ✓ ✓ insightful X X

Note: ✓ = yes, X = no. Blank entries in the “Usage” criterion mean we did not hand-validate that keyword.

Appendix Table A6. Construction of the unintelligence scale. The number of synonyms for “stupid” is staggeringly large; however, most were rejected because they were too rare. This table lists only a part of this mass of rejected terms.

Keyword Count Usage Overall Keyword Count Usage Overall Keyword Count Usage Overall stupid dimwits thickheaded ✓ ✓ ✓ X X X X stupidly blockhead thickheads ✓ ✓ X X X X stupidity blockheaded featherhead ✓ ✓ X X X X stupidest blockheads featherheaded X X X X X X dumb dolt featherheads ✓ ✓ ✓ X X X X dumbest dolts fathead X X X X X X idiotic oaf fatheads ✓ ✓ X X X X idiot oafs fatheaded ✓ ✓ ✓ X X X X idiocy simpleton jackass X X X X X X idiotically simpletons jackasses X X X X X X fool knucklehead cretin ✓ ✓ ✓ X X X X fools knuckleheaded cretins ✓ ✓ ✓ X X X X foolish knuckleheads halfwit ✓ ✓ ✓ X X X X dummies bonehead halfwitted X X X X X X dummy boneheads halfwits ✓ ✓ X X X X dunce boneheaded nitwit X X X X X X dunces ignoramus nitwits X X X X X X moron ignoramuses nitwitted X X X X X X morons numskull nincompoop X X X X X X imbecile numskulls nincompoops X X X X X X imbeciles numbskull harebrain X X X X X X dimwit numbskulls harebrained X X X X X X dimwitted X X thickhead X X

Note: ✓ = yes, X = no. Blank entries in the “Usage” criterion mean we did not hand-validate that keyword. FIGURES

Figure 1. Gendered associations (y) of schooling across time (x). Embedding-based estimates using COHA (200 bootstraps). Lighter CI is for = .05 comparisons against 0. Darker CI is for approx. = .05 comparisons against a different time point or trend line.

𝛼𝛼 𝛼𝛼

Figure 2. Gendered associations (y) of social-behavioral skills across time (x). Embedding-based estimates using COHA (200 bootstraps). Lighter CI is for = .05 comparisons against 0. Darker CI is for approx. = .05 comparisons against a different time point or trend line.

𝛼𝛼 𝛼𝛼

Figure 3. Gendered associations (y) of problem behaviors across time (x). Embedding-based estimates using COHA (200 bootstraps). Lighter CI is for = .05 comparisons against 0. Darker CI is for approx. = .05 comparisons against a different time point or trend line.

𝛼𝛼 𝛼𝛼

Figure 4. Gendered associations (y) of studying across time (x). Embedding-based estimates using COHA (200 bootstraps). Lighter CI is for = .05 comparisons against 0. Darker CI is for approx. = .05 comparisons against a different time point or trend line.

𝛼𝛼 𝛼𝛼

Figure 5. Gendered associations (y) of intelligence (x). Embedding-based estimates using COHA (200 bootstraps). Lighter CI is for = .05 comparisons against 0. Darker CI is for approx. = .05 comparisons against a different time point or trend line. 𝛼𝛼 𝛼𝛼

Figure 6. Gendered associations (y) of unintelligence (x). Embedding-based estimates using COHA (200 bootstraps). Lighter CI is for = .05 comparisons against 0. Darker CI is for approx. = .05 comparisons against a different time point or trend line. 𝛼𝛼 𝛼𝛼

Figure 7. Gendered associations (y) of intelligence and unintelligence across time (x). Lighter CI is for = .05 comparisons against 0. Darker CI is for approx. = .05 comparisons against a different time point, or across the two scales. 𝛼𝛼 𝛼𝛼

.

Figure 8. Gendered associations (y) of intelligence and school effort (studying) across time (x). Lighter CI is for = .05 comparisons against 0. Darker CI is for approx. = .05 comparisons against a different time point, or across the two scales. 𝛼𝛼 𝛼𝛼