Shifting Standards: How Voters Evaluate the Qualifications of Female and Male Candidates

Nichole Bauer, PhD University of Alabama

Abstract: Existing empirical research finds that female candidates, including challenger and incumbent female candidates, have higher levels of qualifications for serving in political office compared to their male counterparts. One assumption behind this finding is that female candidates must have higher levels of qualifications to overcome negative feminine stereotypes that characterize women as ill-qualified for leadership positions. However, it is not clear if voters hold female candidates to higher, or different, qualification standards relative to male candidates. I draw on psychology research to develop a theory that explains how and when voters shift their qualification standards based on candidate gender. I argue that voters will shift the reference points used to evaluate political candidates based on a candidate’s gender. I test this theory using a series of survey experiments that investigate gender differences in voter evaluations of candidate quality. The experiments in this manuscript also examine two electoral factors that can affect how voters evaluate female candidates’ qualifications: candidate productivity and the competitive context of an election. The results show that voters do, in fact, hold female and male candidates to different qualification standards, and these more stringent standards can decrease vote support for female candidates. These findings have broad implications as they explain why women’s underrepresentation continues at every level of elected office in the United States.

Paper prepared for the Rutgers Conference on Resistance to Women in Leadership. Please do not cite without the author’s permission.

A critique faced by Hillary Clinton during the 2016 Democratic primary and the general election was that she lacked the qualifications necessary to serve as president.1 Clinton’s resume included winning election to the Senate, twice, and serving as Secretary of State under the
Obama administration. Moreover, an October 2016 Pew study found that 62% of respondents agreed that Hillary Clinton is well qualified to serve as president.2 Clinton is not alone in facing qualification critiques as female candidates often face scrutiny for lacking the necessary skills for serving in political office (Carroll and Sanbonmatsu 2013). Despite these criticisms, empirical research finds that female candidates, including incumbents and challengers, have more political experience, stronger professional profiles, and higher levels of education relative to male candidates (Fulton 2012, Anzia and Berry 2011). In other words, female candidates are of a higher quality compared to male candidates. An underlying assumption behind this finding is that female candidates must meet higher qualification standards to overcome bias rooted in conventional feminine stereotypes that characterize women as ill suited for political office. I test this assumption.

A long-standing empirical finding is that female candidates win elections at the same rate as male candidates (Darcy and Schramm 1977, Seltzer, Newman, and Leighton 1997, Burrell 1994, Carroll 1994, Fox 2006, Duerst-Lahti 1998), and this finding leads to the conclusion that gender bias is no longer a problem for women in the political arena. Yet, descriptive data about the number of women in elected office at the national, state, and local levels illustrate a striking gender disparity in representation. Current approaches measure differences in how voters ascribe gendered traits to female and male candidates (Bauer 2015, Rosenwasser et al. 1987, Brooks 2013), perceived issue competencies of candidates (Huddy and Terkildsen 1993, Schneider 2014), and voter willingness to support a female candidate (Dolan 2014). A critical omission from this body of research is how candidate gender affects perceptions of candidate qualifications at the voter level.3 Equal outcomes at the ballot do not necessarily mean that the electoral process is gender neutral.

1 http://www.cnn.com/2016/04/06/politics/bernie-sanders-hillary-clinton-qualified/

2 http://www.people-press.org/question-search/?qid=1887942&pid=51&ccid=51

Using a series of innovative survey experiments, I show that voters use different qualification standards to rate the abilities of female and male candidates. These experiments identify not only the role of candidate gender in perceptions of qualifications, but also identify the electoral conditions under which voters are most likely to hold female candidates to higher qualification standards. This project clarifies two empirical puzzles in the gender, political psychology, and political behavior literatures. First, this research confirms that only high quality female candidates can mitigate the potential for gender bias among voters, but that low quality male candidates have an electoral advantage. Second, this research shows that gender bias can affect evaluations of female candidates in different ways throughout the impression formation process.

Identifying gender differences in perceptions of candidate qualifications has critical implications for elections and representation. A high qualification standard based on candidate gender creates electoral obstacles for female candidates. For example, female candidates may win elections by smaller margins than male candidates (Pearson and McGhee 2013), thus opening up female incumbents to more primary and general election challengers (Burrell 1994, Milyo and Schlosberg 2000). Holding female and male candidates to different qualification standards, even if the female candidates who do run for office can meet these standards and win a seat in Congress, means that female candidates may face a subtle and pernicious form of gender bias.

3 Fulton (2012) examines how political activists rate the qualifications of female candidates, and connects these perceptions of candidate quality to vote outcomes, but does not examine how voters, who are not activists, perceive candidate qualifications.

Gender Differences in Candidate Qualifications

Female House and Senate candidates, as challengers and as incumbents, have a stronger set of qualifications relative to their male counterparts. As challengers, female candidates have more experience serving in lower levels of political office (Palmer and Simon 2001, Pearson and McGhee 2013, Maestas et al. 2006). As incumbents, female candidates have stronger legislative records to run on than male incumbents due to higher levels of legislative productivity (Anzia and Berry 2011, Volden, Wiseman, and Wittmer 2013). In short, female candidates outperform their male counterparts at every stage of the candidate emergence process. Female candidates are able to win elections at the same rate as male candidates because of their exceptionally high qualification levels (Fulton 2012, Pearson and McGhee 2013, Fulton 2014). Existing research identifies several sources of the gendered qualification gap, including the role of institutional gatekeepers and perceptions of bias among women in the potential candidate pool.

State and local party networks influence the candidate recruitment process, and these networks often engage in recruitment patterns that favor male candidates (Sanbonmatsu 2006, Carroll 1994). Local party networks are more likely to ask men rather than women to run for office (Lawless 2012), and these dynamics mean female candidates need to be exceptionally well-qualified to get on the radar of local party leaders. There are also differences in recruitment practices across parties that contribute to the gendered qualification gap. The Democratic Party is more likely to recruit and field female candidates compared to the Republican Party (Crowder-Meyer and Lauderdale 2014, Elder 2012). A perception exists that female Republican candidates are more ideologically moderate than male Republicans (Thomsen 2014), and Republican voters express lower levels of support for female Republican candidates compared to male Republican candidates (Matland and King 2002). These factors suggest that recruitment patterns within and across parties create entry barriers that contribute to the gendered qualification gap.

Socialization factors also contribute to the gendered qualification gap. Women in the pool of potential political candidates often downgrade their own qualifications for political office, even if these potential contenders have the same set of objective qualifications as male candidates (Lawless and Fox 2010). Consequently, female candidates spend more time in lower levels of political office establishing their profiles before moving on to a higher level of office (Maestas et al. 2006, Fulton 2006). Potential female candidates also express aversion to electoral competition, conflict, and the power-related goals involved in pursuing political office (Kanthak and Woon 2015, Schneider et al. forthcoming). The perception of gender bias leads to a sex-based candidate selection process where female candidates perceive that they “must be better than their male counterparts to be elected” (Anzia and Berry 2011, p. 481). Together, recruitment patterns among parties as well as perceptions of gender bias among potential female contenders contribute to the gendered qualification gap, but neither of these explanations considers how voters evaluate the qualifications of female candidates compared to male candidates.

Few studies investigate how voters evaluate the qualifications of female and male candidates. Observational research offers comparisons of the qualifications of female and male candidates and connects objective measures of candidate qualifications to vote outcomes to test for evidence of gender bias (Seltzer, Newman, and Leighton 1997, Lawless and Pearson 2008, Pearson and McGhee 2013, Barnes and Cassese 2016). Pearson and McGhee (2013) offer evidence of gendered electoral results showing that women win a much smaller share of the vote given their high levels of qualification, suggesting that voter bias affects how voters make decisions about candidates. Comparisons of vote totals, alone, do not lend insight into how candidate gender affects the way voters assess candidate qualifications during the impression formation process (Bauer 2013). For example, voters are more likely to seek out qualification-related information about female candidates compared to male candidates (Ditonto, Hamilton, and Redlawsk 2014). This information-seeking behavior suggests that voters may be uncertain that female candidates have the qualifications necessary for political office. Thus, voters may evaluate a candidate’s qualifications before deciding whether to support that candidate, and assessments of a candidate’s qualifications may change as voters learn more about a candidate over the course of a campaign.

Research, both observational and experimental, asking voters to evaluate female and male candidates along dimensions such as experience, knowledge, emotionality, and warmth often finds no gender differences or that female candidates receive more positive ratings than male candidates of an equal status, suggesting that gender bias does not affect candidate evaluations (Dolan 2014, Brooks 2013). These conclusions, however, do not comport with the status of women in politics. Women’s political underrepresentation persists at all levels of political office, and in the 2016 elections women did not gain a single seat in Congress (CAWP 2016). In fact, women have only incrementally increased their numbers, from 16% of Congress to 19%, over the last five election cycles (Bos, Schneider, and Utz forthcoming, Carroll and Sanbonmatsu 2013). These marginal gains suggest that voter bias may still limit the descriptive and substantive representation of women. This manuscript uses a series of innovative survey experiments to identify not only whether female candidates face bias but also when during the impression formation process this bias will most likely occur. The next section outlines a theoretical framework based in psychology research that explains what standards voters will use to assess a female candidate’s qualifications for political office, and how these standards can lead to gender bias.

A Shifting Standards Perspective on Evaluations of Candidate Qualifications

I draw on psychology research to develop a framework that explains when voters will shift the standards used to evaluate female and male candidates. I then integrate this shifting standards framework into theories of voter impression formation to develop hypotheses about how voters evaluate candidate qualifications. This framework argues that individuals will use either gender-typicality or role-typicality standards to evaluate the qualifications of women and men, and the standard individuals use depends on whether individuals are forming narrow assessments of minimum qualifications or broad-based performance inferences (Biernat and Manis 1994, Biernat, Manis, and Nelson 1991, Kunda, Sinclair, and Griffin 1997). Delineating the standards voters use to evaluate female and male political candidates clarifies why highly qualified female candidates often do not outperform their less qualified male counterparts.

First, individuals will rely on gender-typicality standards when asked to form narrow assessments about whether a woman or a man has the minimum qualifications needed for a specific position (Foschi 1992, Biernat and Kobrynowicz 1997). Minimum qualification assessments include determining whether an individual has sufficient levels of experience or competency. Using a gender-typicality standard means that individuals will compare the qualifications of a woman or a man relative to the qualifications of a “typical woman” or a “typical man.” These narrow evaluations lead individuals to rely on a “within group standard of judgment,” and this is where stereotypes about women and men will affect decision-making. Stereotypes about women characterize them as not very competent, and this creates a low expectation standard for the minimum qualifications of women. Stereotypes about men, on the other hand, characterize them as highly competent, and this creates a high expectation for a man’s minimum qualifications. Biernat and Kobrynowicz (1997) measured the minimum qualification expectations individuals have for the competency of female and male job applicants, and found that participants had lower competency expectations for female job applicants relative to male job applicants.

Gender-typicality standards set a low minimum bar for female candidates relative to male candidates because being a woman, in general, is incongruent with serving in leadership roles, including political leadership (Eagly and Karau 2002, Bos, Schneider, and Utz forthcoming). This incongruity creates lower expectations for female candidates relative to male candidates on whether a candidate has the minimum qualifications needed for political office. Being male is congruent with political leadership, and this means voters will have high expectations for the qualifications of male candidates (Koenig et al. 2011, Vinkenburg et al. 2011). The gender-typicality hypothesis describes these dynamics:

Gender-Typicality Hypothesis: All else equal, female candidates will receive more positive evaluations relative to male candidates on minimum qualification assessments.

The observable effect of these differing standards is that female candidates will receive more positive evaluations than equally qualified male candidates. However, this does not necessarily mean that voters will evaluate female candidates as being more qualified than male candidates, but rather that voters see female candidates as having higher levels of experience relative to women more generally.

Individuals use role-typicality standards on broad inference judgments because these types of assessments require individuals to think about the ability of a candidate to succeed in a particular capacity (Foschi 1992). Role-typicality standards can lead to gender differences in evaluations of women and men when gendered expectations define a specific role (Biernat and Kobrynowicz 1997). Foschi (1992) explains that gendered expectations in favor of men “will in turn bias the evaluations so that a man's performance at a masculine task will be assessed as a better than the same performance by a woman” (p. 185). For instance, Biernat and Kobrynowicz (1997) asked participants to indicate how well they thought a female or male job applicant would perform in the role of an executive chief of staff, a more masculine role. The authors found that the male applicant received more positive ratings than the female applicant even though participants indicated higher competency expectations for the male applicant.

Masculine expectations define political leadership roles (Holman, Merolla, and Zechmeister 2016, Conroy 2015, Huddy and Terkildsen 1993). Stereotypes about women characterize them as lacking masculine qualities (Huddy and Terkildsen 1993, Schneider and Bos 2014), and this means women will have to work harder to demonstrate their ability to fill a masculine leadership position. Stereotypes about men align with stereotypes of leadership roles (Eagly and Karau 2002, Koenig et al. 2011, Vinkenburg et al. 2011, Rudman et al. 2012), and this means that men will not have to work very hard to demonstrate their ability to fill leadership roles. Voters will assume male candidates will perform well in leadership because of the high level of gender-role congruity. Female candidates, because their gender is incongruent with leadership expectations, will need to meet a higher qualification standard compared to male candidates. The role-typicality hypothesis outlines these differences:

Role-Typicality Hypothesis: All else equal, female candidates will receive less positive evaluations relative to male candidates on broader inference assessments of candidate ability.

This hypothesis means that voters will hold female and male candidates to the same standard of a “typical politician.” But the political role-typicality standard means that a female candidate will receive less positive evaluations compared to an equally qualified male candidate.

In sum, this shifting standards model contributes to theories of stereotyping and addresses a critical gap in the literature on candidate qualification. Voters will shift the reference points used to evaluate female and male candidates according to standards of gender-typicality when developing initial assessments about whether a candidate has the qualifications needed to serve in political office. However, in vote choice decisions voters will evaluate female and male candidates according to standards of role-typicality. These different standards mean that voters may rate a female candidate as highly qualified to serve in political office but decide not to vote for her. Meanwhile, voters may rate a male candidate as not very qualified to serve in political office but decide to vote for the candidate anyway.

Experimental Evidence of Shifting Standards

I use three original survey experiments to test whether voters hold female and male candidates to different qualification standards. Experiments are appropriate because the method offers a high level of internal validity that allows me to isolate the causal mechanisms behind the outcomes measured (Morton and Williams 2010). In other words, I can directly control candidate quality, ensuring that the only factor that varies between two candidates is gender. The three experiments use different designs to test the conditions that lead voters to rely on gender- and role-typicality standards. I start by describing the candidate productivity experiment (study 1) and presenting the results from this first test. I build on these findings with the competitive context experiment, which examines how having a male opponent shifts the standards voters use to evaluate candidate qualifications (study 2). The partisan comparison experiment (study 3) examines differences in the standards co-partisan voters hold for Democratic and Republican female candidates.

Candidate Productivity Experimental Design

The candidate productivity experiment used a 2x2 design manipulating candidate gender (female or male) and candidate productivity (high or low). I manipulated candidate gender with names, Carol or Chris Hartley, and female and male photos.4 Each candidate was an incumbent running for re-election. I control for incumbency status, rather than manipulating incumbency status, because this presents a more stringent test for gender differences in candidate evaluations given the incumbency advantage in elections. I excluded partisan labels in this first study to ensure that partisan affinity does not affect the results.5 Table 1 lists the full set of conditions.6

4 I conducted a separate pre-test of the candidate names and photos using a sample recruited through Amazon’s Mechanical Turk, N=129. There are no significant differences in the average ratings of the female or male candidate in terms of age, p=0.1460, education, p=0.9887, or attractiveness, p=0.3630. See Appendix 3 for the full pre-test information.

5 The experiment asked participants to place each candidate on a liberal-conservative ideological spectrum. Participants, across both conditions, perceive the female candidate as more liberal than the male candidate, but this finding is in line with extant scholarship about the way voters use gender as an ideological cue (McDermott 1998, 1997, Koch 2002, 2000). Participants also indicated which political party they thought each candidate belonged to, and on this question a majority of participants selected Republican for the high productivity female and male candidates, but most participants selected Democrat for the low productivity female candidate.

6 A manipulation check asked participants to recall the name of the candidate they read about in the stimulus, and 90% of participants selected the correct candidate name. Additionally, a multinomial logit shows that participant demographics (gender, education, region, etc.) do not predict group assignment, χ2(9) = 6.25, p=0.7150.

[Table 1 Here]

The high productivity condition mentioned that the candidate chairs committees in the House, is a member of their party’s leadership, and passed multiple bills into law. The low productivity condition mentioned that the candidate just finished a first term in the House, is a member of multiple committees, and sponsored bills that did not become law. I manipulated legislator productivity in this study because campaigns and campaign news coverage generally provide voters with some information about a legislator’s performance in office. Including the high and low productivity conditions lets me examine how the use of gender- and role-typicality standards shifts depending on the legislator’s past performance. The information on candidate quality comes from research analyzing gender differences in legislative productivity, a dimension where female lawmakers outperform male lawmakers (Volden, Wiseman, and Wittmer 2013, Volden, Wiseman, and Wittmer 2016, Pearson and McGhee 2013). Gender- and role-typicality standards may be more likely to lead to biased evaluations of female candidates who are not highly productive.7

7 See Appendix 1 for the full set of stimuli.

The experimental sample comes from Amazon’s Mechanical Turk (MTurk), N=401, and the data collection occurred in June 2016. MTurk is an online recruitment platform where individuals complete small tasks for a nominal fee. While MTurk samples are not random samples of the U.S. population, the results of studies conducted with MTurk samples resemble those of studies conducted with nationally representative samples (Berinsky, Huber, and Lenz 2012, Mullinix et al. 2015, Krupnikov and Levine 2014). Moreover, studies show that MTurk samples are particularly useful for research on gender (Hannagan, Schneider, and Greenlee 2012), in part, because these samples are less likely to misreport preferences for female candidates due to social desirability pressures (Krupnikov, Piston, and Bauer 2016). Appendix 2 includes the sample characteristics of the candidate productivity experiment along with comparisons to other national Internet-based surveys. The partisan distribution of the MTurk sample resembles the Pew Internet sample and the CCES sample. The MTurk sample differs from other samples in that it skews young, with most participants falling in the 25 to 44 age range, and has a slightly higher proportion of female participants compared to other samples at 62%. These characteristics offer a more stringent test of gender bias toward female candidates given the gender affinity effects among women (Sanbonmatsu 2002).
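The 2x2 design and sample size described above can be illustrated with a short sketch. The factor levels and N=401 come from the text; the balanced random assignment procedure, the function name, and the seed are assumptions made for illustration, since the manuscript does not describe how participants were allocated to conditions.

```python
import itertools
import random
from collections import Counter

# Factor levels taken from the design description. The assignment
# procedure below is an assumption; the paper does not describe it.
GENDERS = ("female", "male")        # Carol vs. Chris Hartley
PRODUCTIVITY = ("high", "low")

# The four cells of the 2x2 design.
CONDITIONS = list(itertools.product(GENDERS, PRODUCTIVITY))


def assign_conditions(n_participants, seed=0):
    """Randomly assign participants to the four cells, as evenly as possible."""
    rng = random.Random(seed)
    # Repeat the list of cells enough times to cover everyone, then shuffle.
    repeats = -(-n_participants // len(CONDITIONS))  # ceiling division
    pool = (CONDITIONS * repeats)[:n_participants]
    rng.shuffle(pool)
    return pool


assignments = assign_conditions(401)
cell_sizes = Counter(assignments)  # participants per condition
```

A balanced scheme like this keeps cell sizes within one participant of each other; simple independent randomization would be an equally plausible reading of the design.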

I use an innovative set of measures to test the use of gender-typicality standards in minimum qualification assessments. Participants indicated whether or not a candidate possessed a set of skills identified as critical for serving in Congress: willingness to work hard, good organizational skills, the ability to build consensus, and the ability to manage multiple priorities. I chose these particular skills based on a pre-test that asked individuals to list all the skills a good member of Congress should possess, and on previous research that identifies these skills as contributing to the gendered qualification gap (Fox and Lawless 2011, Kanthak and Woon 2015).8 I randomized the order in which participants evaluated the skills. On each measure, a value of 1 indicates that the candidate possesses the skill and a value of 0 indicates that the candidate does not.

8 See Appendix 3 for information about the skills pre-test.

To measure gender differences in the broad impressions voters form of candidates I use a question that asks how likely it is the candidate will win their election. The response options include highly likely, somewhat likely, somewhat unlikely, and highly unlikely. The viability questions serve as a proxy for asking directly about vote preferences. Asking about vote choice in the context of an experiment can lead participants to over-report support for female candidates due to social desirability pressures (Krupnikov, Piston, and Bauer 2016, Streb et al. 2008).

Questions about candidate viability ask participants to indicate which candidate other voters will support, and shifting the context of the question to what other voters will do can limit the effect of social desirability (Claassen and Ryan 2016). Moreover, the viability question asks participants to consider the future performance of the candidate in the campaign. This measure ranges from 0 to 1 with higher values indicating evaluations that are more positive.
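As an illustration of how the four viability response options might be placed on the 0 to 1 scale, the sketch below assumes equally spaced values. The text states only the range and direction of the measure, so the specific numeric codes, the dictionary name, and the helper function are hypothetical.

```python
# Hypothetical coding of the four response options onto a 0-1 scale.
# The paper states only the range and direction; equal spacing is assumed.
VIABILITY_SCALE = {
    "highly unlikely": 0.0,
    "somewhat unlikely": 1 / 3,
    "somewhat likely": 2 / 3,
    "highly likely": 1.0,
}


def code_viability(responses):
    """Convert response labels to 0-1 scores, ignoring case and whitespace."""
    return [VIABILITY_SCALE[r.strip().lower()] for r in responses]


# Illustrative responses only; these are not the study's data.
example = ["Highly likely", "Somewhat likely", "Somewhat unlikely", "Highly unlikely"]
scores = code_viability(example)
average = sum(scores) / len(scores)
```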

Candidate Productivity Results

The gender-typicality hypothesis argues that, all else equal, female candidates will receive higher evaluations relative to male candidates on minimum qualification assessments, while the role-typicality hypothesis argues that male candidates will receive higher ratings when voters form broad impressions. These gender differences may be more pronounced in the low productivity conditions because participants will compare the female candidate to a typical woman. All the comparisons use two-tailed t-tests and compare female and male candidates within the high and low productivity conditions. To test for gender differences in minimum qualification assessments, I use the set of measures asking participants whether the candidate possesses a specific skill needed for serving in a leadership role. The electoral viability outcome tests the role-typicality hypothesis.
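The within-condition comparisons described above can be sketched as a two-sample test on the 0/1 skill indicators. This is a minimal illustration under stated assumptions: it uses Welch's unequal-variance form (the text says only "two-tailed t-tests"), it approximates the p-value with a normal distribution rather than a t distribution, and the data are made up.

```python
import math
from statistics import mean, variance


def welch_t_test(group_a, group_b):
    """Return (t, df, p_approx) for a two-tailed comparison of two means.

    Uses Welch's unequal-variance form (an assumption; the paper says only
    "two-tailed t-tests"). The p-value uses a normal approximation, which is
    reasonable only for fairly large cells; use a t distribution in practice.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances
    se_sq = var_a / n_a + var_b / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se_sq)
    # Welch-Satterthwaite degrees of freedom
    df = se_sq ** 2 / (
        (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
    )
    p_approx = math.erfc(abs(t) / math.sqrt(2))  # two-tailed, normal approx.
    return t, df, p_approx


# Made-up 0/1 skill indicators for illustration only (not the study's data).
female_low = [1] * 80 + [0] * 20
male_low = [1] * 60 + [0] * 40
t_stat, df, p_value = welch_t_test(female_low, male_low)
```

A positive `t_stat` here would correspond to the female candidate receiving the higher average rating within a productivity condition.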

Figure 1 shows the difference in the percentage of participants indicating that the female candidate possessed a skill relative to the male candidate for the high and low productivity conditions. Each bar displays the difference in the rating of the female candidate relative to the male candidate. Positive values indicate the female candidate received a more positive evaluation compared to the male candidate and negative values indicate the male candidate has an advantage. The less productive female candidate consistently receives higher ratings relative to the less productive male incumbent on all the outcome variables. The gender gap in evaluations is largest on organizational skills where the less productive female candidate receives a rating that is 25% higher than the comparably qualified male candidate, p=0.004. The gender gap is narrower on building consensus where the low productivity female candidate’s rating is only 9% higher than the low productivity male candidate’s rating, p=0.0955. Even though the female and male candidates have the same set of qualifications in the low productivity condition, the female candidate consistently receives ratings that are more positive. This indicates that participants hold the female candidate to a lower gender-typicality standard and not role-typicality standards.
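The quantity plotted in each bar of Figure 1, as described above, can be sketched as the female candidate's percentage minus the male candidate's percentage on a given skill. The function names and the toy responses below are hypothetical and serve only to show the arithmetic.

```python
def percent_with_skill(indicators):
    """Percentage of participants coding the candidate as having the skill (0/1)."""
    return 100.0 * sum(indicators) / len(indicators)


def gender_gap(female_ratings, male_ratings):
    """Female minus male percentage; positive values favor the female candidate."""
    return percent_with_skill(female_ratings) - percent_with_skill(male_ratings)


# Illustrative 0/1 responses only; these are not the study's data.
ratings = {
    "organizational skills": {"female": [1, 1, 1, 0], "male": [1, 1, 0, 0]},
    "building consensus": {"female": [1, 0, 1, 0], "male": [1, 0, 1, 0]},
}
gaps = {skill: gender_gap(r["female"], r["male"]) for skill, r in ratings.items()}
```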

If participants used role-typicality standards, then the female candidate would receive less positive ratings because the masculine role standards for politics create a higher bar for female candidates to meet. In the high productivity conditions, there are no significant gender differences. This null result suggests that participants evaluate highly productive female candidates more in line with role-typicality rather than gender-typicality expectations. Again, if participants in the high productivity conditions used a gender-typicality standard, then the female candidate should outperform the male candidate due to the lower expectations.

[Figure 1 Here]

The first set of results indicates that less productive female candidates are held to different evaluative standards relative to comparably productive male candidates in minimum qualification assessments. Next, I turn to testing the role-typicality hypothesis, which predicts that female candidates will be held to role-typicality and not gender-typicality standards, and this means that female candidates will not have an advantage over male candidates. I use the candidate viability question (see Table 2). In the low productivity conditions, participants rated the female and male candidates as equally likely to win their election, p=0.2687, even though the female candidate received more positive ratings on minimum qualification assessments. In the high productivity conditions, participants rated the male candidate as more likely to win the election compared to the female candidate, p=0.0363, even though the two candidates received equivalent ratings on minimum qualification assessments. These findings reinforce the expectation that female candidates need to be exceptionally more qualified than their male counterparts to achieve parity at the ballot box and suggest a higher bar for female candidates in broad inference decisions.9

9 Appendix 5 includes comparisons across participant gender and participant partisanship. Previous research shows that female voters tend to favor female candidates (Sanbonmatsu 2002, Plutzer and Zipp 1996), and female participants do rate the female candidate in the low productivity condition more positively than the male candidate on being hardworking, building consensus, organizational skills, and managing multiple priorities as well as on the electoral viability outcome. Partisan differences may also occur in the female candidate’s rating as Democrats tend to favor female candidates (King and Matland 2003), but the comparisons in Appendix 5 find no significant differences in the female candidate’s rating across participant partisanship on any of the outcome variables.

[Table 2 Here]

This initial test of the gendered qualification gap finds that female candidates receive more positive ratings than male candidates when voters form assessments of minimum qualifications. Female candidates do not receive more positive ratings when participants form broad inferences about the candidate’s leadership ability. These findings suggest that voters shift their metrics of evaluation based on the specific type of assessment formed about a candidate, and provide evidence of a subtle form of bias that may limit the success of female candidates.

Also notable about these results is that the gender differences in evaluations dissipate in the high productivity conditions, and the findings here reinforce the premise that female candidates have to work exceptionally hard to overcome gender bias throughout the candidate emergence process. The competitive election experiment builds on these initial results to examine how female candidate evaluations shift in the presence of a male opponent in an open race where both candidates are challengers.

Competitive Context Experimental Design

The competitive election experiment tests how candidate standards shift in the context of a race with two candidates running for an open seat in Congress. Presenting participants with two candidates bolsters the external validity of the study as most female candidates run against male candidates (Palmer and Simon 2005). Each article mentioned that the two candidates were vying for their party’s primary nomination. The gender of the manipulated candidate was either female or male and the gender of the second candidate was always male across the two conditions. I manipulated candidate gender with the same names and photos from the productivity study: Carol or Chris Hartley. Carol or Chris Hartley’s opponent was always Tom Larson. I use the same set of female and male candidate photos from the candidate productivity experiment, and include an additional male photo for the opponent, Tom Larson.10 Each condition includes two candidates because the context of a mixed gender race can affect how participants perceive the status of a candidate. In other words, a woman running against a man can highlight the relative absence of women in leadership roles. A female candidate running against a male candidate may prime participants to rely on role-typicality rather than gender-typicality even in minimum qualification assessments. That is, participants may be more likely to compare the female candidate to the male candidate. This component of the design overcomes some of the limitations of the productivity experiment, which often includes single candidate manipulations and makes gender comparisons across treatments.

The manipulation included information about candidate partisanship, and because this was a primary election Hartley and Larson always belonged to the same political party. Holding partisanship constant across the two candidates mimics the conditions of a primary election and increases the salience of gender as an informational cue. I matched participants into conditions based on shared partisanship. This means Democratic participants received information about two Democratic candidates running in the Democratic primary for an open House seat while Republican participants received information about two Republican candidates running in the Republican primary. I sorted participants identifying as Independent into partisan conditions based on the party they leaned most closely toward in a follow-up question about partisan identification.11 With this design, I can be sure that any negative effects in the female candidate condition come from candidate gender, and not from inferred information about candidate partisanship. Table 3 outlines the full set of conditions.12

10 See Appendix 3 for information about the stimuli pre-test.

11 Twenty percent of participants indicated they were a political independent on the first PID question. All these participants selected a party on the second PID question.

[Table 3 Here]

Participants read a newspaper article about a primary election for an open seat to the U.S. House of Representatives featuring two candidates. I use a newspaper article for the manipulation because this treatment resembles how most citizens learn about elections (West 2005), and research shows that media coverage of candidate competency differs for female candidates compared to male candidates (Bligh et al. 2012). The article described an upcoming primary election as a close race with Hartley and Larson evenly tied in the polls. The article mentioned Hartley first and Larson second. The article stated that both candidates had prior experience serving in the state legislature. Unlike in the first study, I hold information about candidate quality and productivity constant across the conditions. I rely on prior political experience as a component of candidate qualifications because this is one of the most common objective measures of qualifications used in observational research (Pearson and McGhee 2013), and because female candidates are more likely to start their political careers in state legislatures compared to male candidates (Maestas et al. 2006).

The experimental sample comes from Survey Sampling International (SSI), N=226, and the data collection occurred in November 2016.13 SSI is a market-based research company that recruits adult participants to complete studies online. Robustness tests of SSI find that these samples include populations often excluded from online survey platforms (Berinsky et al. 2014). Appendix 2 compares the SSI sample to the MTurk sample in the candidate productivity experiment. Overall, the sample over-represents women at 57% but the sample resembles other survey populations on demographics including age, race, political interest, and party identification.

12 See Appendix 1 for the full set of stimuli.

13 The experiment took place approximately one week before the 2016 presidential election. The highly gendered context of the 2016 presidential race may have affected how participants evaluated the female candidates. However, these dynamics may, if anything, lead to more conservative estimates of gender bias. One week before the election, the expectation among pundits and media outlets was that Clinton would win the race, and this expectation can lead to over-reports of support for the female candidate.

In this experiment, I take a slightly different approach to measure minimum qualification assessments. The experiment presented participants with a list of 10 activities members of Congress engage in, and participants indicated how many of the activities, but not which ones, each candidate needed to demonstrate skill on before the participant considered the candidate an effective lawmaker. If participants indicate the female candidate needs to complete fewer activities compared to the male candidate, this suggests the use of a gender-typicality standard, but if participants indicate the female candidate needs to complete more tasks than the male candidate does, this suggests a role-typicality standard. This approach is valuable for several reasons. First, this question uses an objective rather than a subjective metric (Biernat and Kobrynowicz 1997). Moreover, asking participants to indicate how many skills a candidate possesses, rather than which specific skills, can alleviate some of the problems with social desirability bias. Finally, this minimal qualification question asks participants about their expectations for candidates rather than asking for assessments of the actual candidate’s qualifications. To measure broad inferences, I, again, ask participants about the electoral viability of candidates in both the primary election and the general election. However, the response options differ slightly from the viability question used in the candidate productivity study as participants selected either Hartley or Larson as the likely winner. Placing two candidates in an election against one another allows me to assess the perceived viability of female candidates when they are running against a male opponent. I code the viability variable as 1 if participants selected Hartley as the likely winner. The full set of activities is included in Appendix 4.

Competitive Context Results

I start by analyzing how candidate gender influences the minimum qualification assessments and then turn to the electoral viability question. All the comparisons, as with the first experiment, use two-tailed t-tests. Because the study matched participants into shared partisan conditions, I group together the Democratic and Republican responses.14 This approach allows me to measure how participants rate a female candidate’s qualifications when she belongs to their own political party.15 The addition of a male opponent in this experiment can shed light on how the dynamics of electoral competition can lead voters to shift from a gender-typicality to a role-typicality standard because the male comparison can prime the masculine expectations of leadership roles.

14 Appendix 6 includes partisan comparisons and finds no significant differences in ratings of the female candidate between Democratic and Republican participants. The low statistical power may mask partisan differences, however; the partisan differences study includes a more robust comparison across partisanship.

15 A manipulation check asked participants to identify the level of office both Hartley and Larson previously held, and 91% correctly identified that both candidates previously served in the state legislature. Additionally, a multinomial logit shows that participant demographics (gender, education, region, etc.) do not predict group assignment, χ2(9) = 13.42, p=0.1447.

I start with the task number variable to measure minimum qualification assessments. I compare the number of tasks the female Hartley had to complete relative to her male opponent, and then I compare the number of tasks the male Hartley had to complete relative to his male opponent. If participants indicate that the female Hartley needs to demonstrate more skills than her male opponent does, this suggests use of a role-typicality standard, but if the female Hartley needs to demonstrate fewer skills than Larson, this suggests the use of a gender-typicality standard. Women, in general, are not perceived as having the skills needed for political office, and if participants use the “typical woman” standard then it is not unreasonable for the female Hartley to need to demonstrate more skills than the male Hartley needs to demonstrate.

Figure 2 graphs the difference in the number of tasks required for Hartley to complete relative to Larson, the opposing candidate, when Hartley is a woman relative to when Hartley is a man. When Hartley is a woman, participants indicated that she had to demonstrate ability on 1.09 tasks more than Larson. In other words, as a female candidate Hartley had to demonstrate ability on an average of 5.99 tasks (SD=3.26) while Larson only had to demonstrate ability on 4.90 tasks, and this difference is statistically significant, p=0.0242. This higher need to demonstrate minimum ability suggests that running against a male opponent raises the qualification bar for female candidates. In the male candidate condition, there is no statistically significant difference in the number of tasks the male version of Hartley had to complete (M=6.11, SD=3.09) relative to the number of tasks Larson had to complete (M=5.57, SD=3.63), p=0.1900. These findings provide evidence that running against a male opponent leads participants to shift from a gender-typicality to a role-typicality standard.
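As a minimal sketch of how the task-number gap reported above could be computed, the snippet below takes each participant's required task count for Hartley and for Larson and averages the difference. The function names and the four sample responses are hypothetical illustrations, not the study's data or code, and the labelling helper ignores statistical significance, which the actual analysis relies on.

```python
from statistics import mean


def task_gap(hartley_tasks, larson_tasks):
    """Mean number of extra tasks required of Hartley relative to Larson.

    Each list holds one entry per participant (0-10 tasks), in the same order.
    """
    if len(hartley_tasks) != len(larson_tasks):
        raise ValueError("each participant must rate both candidates")
    differences = [h - l for h, l in zip(hartley_tasks, larson_tasks)]
    return mean(differences)


def implied_standard(gap):
    """Label the direction of the gap for the female-candidate condition."""
    if gap > 0:
        return "role-typicality"    # more tasks demanded of Hartley
    if gap < 0:
        return "gender-typicality"  # fewer tasks demanded of Hartley
    return "no difference"


# Hypothetical responses from four participants (not the study's data).
hartley = [7, 6, 5, 8]
larson = [5, 6, 4, 6]
gap = task_gap(hartley, larson)
label = implied_standard(gap)
```

With the reported means, the same subtraction gives 5.99 - 4.90 = 1.09 tasks.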

[Figure 2 Here]

Next, I investigate whether there are gender differences in the broad inferences voters form of candidates using the primary and general election viability questions. The response options for this question differed slightly from the first study. The question asked people to indicate whether Hartley or Larson was more likely to win the election rather than the likelihood of each candidate winning the election. As such, I compare the percentages of participants who selected Hartley as the likely winner of the election based on Hartley’s gender. A gender difference indicating that Hartley is not as likely to win the election relative to Larson suggests that participants hold Hartley to role-typicality standards that create a high bar for female contenders. In the female condition, 49% of participants selected Hartley as the likely winner of the primary, compared with 58% in the male candidate condition; however, this 9-point gap in support is not statistically significant, p=0.1553. These results replicate on the general election viability question. Forty-eight percent of participants selected Hartley as likely to win a general election in the female condition while 53% selected Hartley as the likely winner in the general election in the male condition, and these levels of support do not statistically differ from one another, p=0.461. This absence of gender differences suggests that participants rely on role-typicality standards, but pairing these results with the task number variable suggests that female candidates may have to work harder to persuade voters of their ability.
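The comparison of viability shares across the two gender conditions can be illustrated with a short sketch. The manuscript reports two-tailed t-tests; the snippet below swaps in a two-proportion z-test, a closely related large-sample test for a binary outcome. The function name is hypothetical, and the group sizes (113 per condition) are placeholder assumptions because the per-condition counts appear only in Table 3, so the resulting p-value should not be expected to reproduce the reported one.

```python
import math
from statistics import NormalDist


def two_proportion_test(p1, n1, p2, n2):
    """Two-tailed z-test for a difference between two independent proportions.

    p1, p2 are shares (0-1) choosing Hartley; n1, n2 are group sizes.
    Returns (z statistic, two-tailed p-value).
    """
    # Pooled share under the null hypothesis of no difference between conditions.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if standard_error == 0:
        return 0.0, 1.0
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Shares come from the text; the group sizes are hypothetical placeholders.
z, p = two_proportion_test(0.49, 113, 0.58, 113)
```

Under these assumed group sizes the 9-point gap does not reach conventional significance, in line with the direction of the reported result.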

Adding a male opponent to the electoral context leads voters to shift away from gender-typicality standards and to rely more heavily on role-typicality standards to evaluate candidate qualifications. This shift raises the qualification bar for female candidates. The female candidate had to demonstrate ability on more tasks relative to her male counterparts, but this does not provide the female candidate with any electoral advantage; she only manages to break even while working harder. A limitation of this experiment is the lack of statistical power to discern differences between Democratic and Republican participants, but the third experiment, the partisan differences experiment, builds on these initial findings.

Partisan Differences Experimental Design

Democratic women vastly outnumber Republican women as political candidates and in elected office (Carroll and Sanbonmatsu 2013).16 The dynamics of this partisan disparity are not well understood, but this third experiment tests whether Republican voters hold Republican female candidates to exceptionally high standards. The partisan differences experiment uses the same design and stimulus as the competitive context experiment, but this study has enough statistical power to make comparisons across female candidate partisanship. A key difference between this experiment and the competitive context experiment is that I conducted the study through Amazon’s Mechanical Turk (MTurk) rather than SSI. Table 4 outlines the full design of the study along with the number of participants in each experimental group.

[Table 4 Here]

16 A manipulation check asked participants to recall the level of office the candidates in the article previously held, and 98% of participants correctly answered that both candidates previously served in the state legislature. There were no differences across conditions in responses on the manipulation check. Moreover, a randomization check shows that demographic factors do not jointly predict group assignment, χ2(5) = 3.96, p=0.5552.

Participants compared Hartley and Larson’s levels of experience and knowledge as assessments of minimum qualifications. Experience and knowledge are appropriate assessments of candidate qualification because these are two characteristics that voters place a high level of importance on in leadership evaluations (Miller, Wattenberg, and Malanchuk 1986, Huddy and Terkildsen 1993, Funk 1999). Moreover, experience and knowledge are dimensions where voters often perceive female candidates to be at a deficit (Schneider and Bos 2014). Finally, these variables reflect the more conventional measures of candidate qualifications scholars use to detect gender bias. I recoded both variables to range from 0-1 with higher values indicating evaluations that are more positive. Appendix 4 includes the full question wordings and response options.
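A 0-1 recoding of this kind is commonly done with a min-max rescaling. The sketch below assumes a five-point response scale coded 1 to 5, which is a hypothetical choice for illustration; the actual response options are listed in Appendix 4, and the function name is not from the manuscript.

```python
def rescale_unit(value, low, high):
    """Map a response on the scale [low, high] onto the interval [0, 1]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (value - low) / (high - low)


# Hypothetical five-point scale: 1 = least positive, 5 = most positive.
scores = [rescale_unit(v, 1, 5) for v in (1, 2, 3, 4, 5)]
```

Rescaling leaves the ordering of responses unchanged and puts both variables on a common range, so differences between conditions can be read as shares of the full scale.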

Participants also answered two broad inference questions about candidate viability. One question asked which candidate was more likely to win the primary election and the second question asked which candidate was more likely to win the general election. With both questions, the response options are either Hartley or Larson with the order of the options randomized. Moreover, the primary viability question offers insight into whether female candidates will face bias within their own political party and the general election viability question offers insight into whether female candidates will face bias from within the general electorate. This variable is coded 1 if Hartley was selected as the likely winner and 0 for Larson as the likely winner.

Partisan Differences Results

[Figure 3 Here]

I start with the minimum ability results and then move on to the broad inference questions. I break down all the results across candidate partisanship. Figure 3 shows the difference in the experience and knowledge ratings for the female Hartley compared to the male Hartley.17 Positive values indicate the female Hartley receives a more positive evaluation than the male Hartley while negative values indicate the female Hartley receives more negative evaluations than her male counterpart receives. On both measures, there is no statistically significant difference in how Democrats rate Hartley across the candidate gender conditions. The gender gap in evaluations is significantly wider among Republicans. The Republican female candidate received a significantly lower rating on both experience (p=0.0014) and knowledge (p=0.0179) relative to the Republican male candidate with equivalent qualifications running against the same male opponent. In other words, Republican participants consistently rate Hartley as less qualified relative to her male opponent, Larson, when she is female; but when Hartley is male, Republican participants rate him much more positively. These differences suggest that participants use a role-typicality standard to form these impressions. These results suggest that participants require more evidence of ability and qualifications of the female candidate, and the qualification bar is exceptionally high in the context of a male opponent.

17 In the figures, I collapsed the extremely more and somewhat more experienced/knowledgeable than Larson conditions for ease of presentation. For the full set of group means across the response options see Appendix 7.

Figure 4 graphs the margin of victory, or the gap in the percentage of participants selecting Hartley as the likely winner over Larson, based on Hartley’s gender and partisanship.18 Thus, positive values indicate that Hartley has the electoral advantage, and negative values indicate greater vote support for Larson. The results are consistent with the minimum ability measures: participants perceive the Republican female candidate as less electorally viable in the general or the primary election relative to her equally qualified male counterpart. In the Democratic partisan conditions, Hartley has a large margin of victory, 30 points, over Larson when she is female and a similarly sized margin of victory, 35 points, in the male conditions. Sixty-five percent of participants selected Hartley as the likely winner in the female condition while 67% selected Hartley as the likely winner in the male condition, and these differences are not statistically significant, p=0.8083. There are also no gender differences in Hartley’s margin of victory in the Democratic partisan conditions on the general election viability question, p=0.9784. These results suggest that Democratic voters may be less likely to hold female candidates to an exceptionally high qualification bar, and this lack of differences is not surprising given that the vast majority of female candidates and female office holders identify as Democrats.

18 Appendix 7 includes the full values for the percent selecting Hartley or Larson as the likely winner in each condition.

[Figure 4 Here]
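Because each viability question offers only two response options, the margin of victory described above follows directly from the share of participants selecting Hartley. The sketch below shows that arithmetic; the function name is a hypothetical illustration, and the inputs are the rounded percentages quoted in the text.

```python
def margin_of_victory(share_hartley):
    """Percentage-point gap between Hartley and Larson in a two-option question.

    share_hartley is the percent of participants (0-100) selecting Hartley;
    everyone else is assumed to have selected Larson.
    """
    share_larson = 100.0 - share_hartley
    return share_hartley - share_larson


# Rounded shares quoted in the text.
democratic_female = margin_of_victory(65)  # Democratic primary, female Hartley
republican_male = margin_of_victory(85)    # Republican general, male Hartley
republican_female = margin_of_victory(45)  # Republican general, female Hartley
```

A share below 50% produces a negative margin, meaning Larson is seen as the likelier winner. Margins computed from rounded shares can differ by about a point from margins computed on unrounded data.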

The comparisons in the Republican partisan conditions suggest that Republican female candidates may face a tough time securing support from within their own party. On the primary election question, Hartley has a narrow 5-point margin of victory over Larson, her male opponent, but when Hartley is male, he has a 66-point margin of victory over Larson. These differences are striking. Only 53% of participants selected Hartley as the likely winner in the female condition while 83% of participants selected Hartley as the winner in the male condition, p=0.0010. The gender gap in perceived viability widens even further on the general election viability question. Hartley’s margin of victory in the female condition is negative, indicating that a majority of participants thought Larson was more likely to win the general election. Only 45% of participants selected the female candidate as a viable general election candidate. In the male candidate condition, Chris Hartley has a clear 70-point electoral advantage over Larson with 85% of participants selecting him as the likely winner in a general election contest. This difference in perceptions of general election viability between the female and male candidate conditions among Republicans is statistically significant, p=0.001. These results indicate that Republicans are far less likely to see the female candidate as a viable contender in a primary election compared to Democrats. These findings are consistent with a shifting standards model where the use of a role-typicality standard raises the qualification bar female candidates need to meet in order to be perceived as qualified as their male counterparts.

The partisan differences experiment confirms that Republican female candidates may face an exceptionally high qualification barrier among co-partisan voters. Republican voters consistently rated the Republican female candidate poorly on both the minimum ability and broad inference questions. Democratic female candidates may face a somewhat lower qualification barrier as these results show that the Democratic female and Democratic male candidates receive near equitable ratings.19

Experimental Results Summary

All three experiments uncovered subtle evidence of gender bias in female candidate evaluations. The productivity study found that female candidates without established legislative records received higher ratings compared to the male candidates on the minimum qualification assessments, but these female candidates were not perceived as any more likely to win the election. The competitive context experiment reaffirmed that female candidates need to work harder to prove their qualifications to achieve electoral parity with male candidates at the ballot box. Finally, the partisan differences study showed that Republican female candidates face particularly steep qualification barriers even when facing a co-partisan electorate. Together, these findings provide evidence that voters do, in fact, hold female candidates to a higher qualification standard, and that the standards voters hold female candidates to shift depending on the electoral context and dimension of evaluation.

19 Appendix 7 breaks down all the main results to look at differences across participant gender. Overall, there are no significant differences in how female and male participants rated the co-partisan female candidate, with the exception of the primary viability variable. Here, male participants perceived the female candidate as more electorally viable in the primary compared to female participants. This difference could come from several dynamics. It is possible that female voters hold female candidates to an even higher qualification standard. Alternatively, it may be that female voters are more likely to see other voters as less likely to support a female candidate due to perceptions of gender bias in the electorate.

Discussion

A common finding in the literature on candidate gender is that when women run for political office they win elections at the same rates as male candidates (Seltzer, Newman, and Leighton 1997). Overlooked in this statistic is that voters do not evaluate candidates in a gender-neutral way. Indeed, female candidates need to be much more qualified than male candidates to win support from voters (Pearson and McGhee 2013, Fulton 2012, 2014). This study illustrates how and when voters downgrade the qualifications of female candidates. Voters are more likely to assume male candidates have the qualifications needed to serve in elected office, and voters will be significantly more critical of the qualifications of female candidates. These findings have critical implications for the ability of female candidates to successfully enter into the political pipeline, candidate campaign strategy, the role of gender bias in voter impression formation, and the representation of women in the American electorate.

The candidate productivity experiment uncovered evidence that female candidates face the greatest perceptual bias when these candidates are relatively new to the political arena. While voters found the less productive and more junior female candidate to have higher levels of minimum ability than the equally qualified male candidates, this perception did not carry over into vote choice decisions. This finding comports with previous research identifying that female candidates frequently face high quality challengers at the primary and general election level when they first run for re-election (Palmer and Simon 2001, Milyo and Schlosberg 2000). Once female candidates become more senior, the gender differences in voter perceptions decrease. Earlier in their careers, female candidates may need to work harder to convince voters that they have solid leadership credentials. This perceptual bias can limit the ability of female candidates to move up the political leadership pipeline, and contributes to the underrepresentation of female candidates in senior positions of political power and influence.

The gender gap in perceptions of candidate qualifications means that female candidates not only need to be more qualified to secure an electoral victory, but female candidates must work exceptionally hard to actively persuade voters of their qualifications. Female candidates need to use their campaign messages to persuade voters of their abilities to overcome this qualification bias. Psychology research suggests that women, in general, tend to refrain from touting their qualifications for fear of being perceived negatively (Rudman 1998, Parks-Stamm, Heilman, and Hearns 2008). Conducting further research on how female candidates communicate their qualifications to voters and voter responses to such messages is a critical next step in this line of inquiry. It may be that female candidates undersell their qualifications in campaign communication, and this behavior can limit electoral success.

The competitive context experiment used a primary election to test the gender-typicality and role-typicality hypotheses. Primary elections often pose unique challenges for female candidates (Lawless and Pearson 2008, Barnes, Branton, and Cassese forthcoming), but gender-typicality and role-typicality standards can certainly come into play in competitive general election contexts as well. Developing experimental designs that rely on the context of a general election with female and male candidates, or races with two female candidates, can clarify how voters evaluate the leadership abilities of female candidates. Voters in primaries tend to be more ideologically driven and politically engaged than general election voters. In a general election, partisan identities may play a more prominent role in vote choice compared to primary elections where all the candidates share the same partisan label. The higher standards for female candidates, in either a primary or a general election, explain, in part, why female candidates win elections at smaller margins than male candidates (Pearson and McGhee 2013), and why female candidates face more competitive re-election races (Milyo and Schlosberg 2000, Palmer and Simon 2012). Thus, the role of gender-typicality and role-typicality standards may differ across electoral contexts.

Republican female candidates are vastly under-represented at the state legislative level and at the congressional level. The findings uncovered in this manuscript indicate that Republican female contenders may face gender bias from within their own party. Indeed, a growing body of research shows that Republican female candidates face perceptual obstacles among voters based on the extent to which these candidates ideologically fit with their party (Thomsen 2015, Bauer forthcoming). To win elections, especially Republican primaries, these female candidates need to expend campaign resources extolling their qualifications but also need to persuade voters they are the right ideological fit for the Republican Party. Republican male candidates do not have to overcome these perceptual obstacles.

A novel aspect of the experiments conducted in this manuscript is the use of more objective measures that ask voters about the minimum qualifications of candidates as separate constructs. Subjective metrics of candidate impressions, such as asking about experience or knowledge, often lead to findings that voters rate female candidates as equally qualified as male candidates. However, this null result may mask the use of gender-typicality standards, which create a lower bar for female candidates to meet in minimum qualification assessments. Future work should continue to develop novel measures of candidate evaluations to gain a clearer understanding of the gender similarities and gender differences in how voters form impressions of candidates.

Conclusion

Exceptionally qualified female candidates often run for political office, but fail to win even when running against less qualified male candidates. Indeed, Hillary Clinton possessed ample political experience, but political experience and a long life of public service may not always be enough to overcome deeply ingrained perceptions about female candidates as lacking leadership abilities. The qualification barriers facing female candidates limit the potential candidate pool, and make it more difficult to recruit solid female candidates to stand for political office. State and local parties, because of these qualification dynamics, may choose to spend resources on less qualified male candidates, thereby limiting the ability of female candidates to emerge as viable contenders. Moreover, higher qualification barriers mean that female candidates have no choice but to wait longer before pursuing elected office or moving on to a higher level of office. This extended timeline of progressive candidate ambition limits the ability of female politicians to achieve seniority in legislatures and to move up the political ladder.

These dynamics perpetuate not only the descriptive under-representation of women but the substantive under-representation of women as well. Achieving gender parity in elected office requires shifting how voters, parties, and other gatekeepers evaluate candidate qualifications so that voters do not perceive a female candidate’s gender as a detriment to her future success.

References:

Anzia, Sarah F., and Christopher R. Berry. 2011. "The Jackie (and Jill) Robinson Effect: Why Do Congresswomen Outperform Congressmen?" American Journal of Political Science 55 (3):478-493.
Barnes, Tiffany D., Regina P. Branton, and Erin C. Cassese. forthcoming. "A Re-Examination of Women's Electoral Success in Open Seat Elections: the Conditioning Effect of Electoral Competition." Journal of Women, Politics & Policy.
Barnes, Tiffany D., and Erin C. Cassese. 2016. "American Party Women: A Look at the Gender Gap within Parties." Southern Political Science Association Annual Meeting, San Juan, Puerto Rico.
Bauer, Nichole M. 2013. "Rethinking stereotype reliance: Understanding the connection between female candidates and gender stereotypes." Politics & the Life Sciences 32 (1):22-42.
Bauer, Nichole M. 2015. "Emotional, Sensitive, and Unfit for Office: Gender Stereotype Activation and Support for Female Candidates." Political Psychology 36 (6):691-708. doi: 10.1111/pops.12186.
Bauer, Nichole M. forthcoming. "Untangling the Relationship between Partisanship, Gender Stereotypes, and Support for Female Candidates." Journal of Women, Politics & Policy. doi: 10.1080/1554477X.2016.1268875.
Berinsky, Adam J., Gregory A. Huber, and Gabriel S. Lenz. 2012. "Using Mechanical Turk as a Subject Recruitment Tool for Experimental Research." Political Analysis 20 (3):351-368.
Biernat, Monica, and Diane Kobrynowicz. 1997. "Gender- and Race-Based Standards of Competence: Lower Minimum Standards but Higher Ability Standards for Devalued Groups." Journal of Personality & Social Psychology 72 (3):554-557.
Biernat, Monica, and Melvin Manis. 1994. "Shifting Standards and Stereotype Based Judgments." Journal of Personality and Social Psychology 66 (1):5-20.
Biernat, Monica, Melvin Manis, and Thomas E. Nelson. 1991. "Stereotypes and Standards of Judgment." Journal of Personality and Social Psychology 60 (4):485-499.
Bligh, Michelle C., Michele M. Schlehofer, Bettina J. Casad, and Amber M. Gaffney. 2012. "Competent Enough, but Would you Vote for Her? Gender Stereotypes and Media Influences on Perceptions of Women Politicians." Journal of Applied Social Psychology 42 (3):560-597.
Bos, Angela L., Monica C. Schneider, and Brittany L. Utz. forthcoming. "Gender Stereotypes and Prejudice in U.S. Elections." In APA Handbook of the Psychology of Women, edited by Cheryl Travis and Jackie White.
Brooks, Deborah Jordan. 2013. He Runs, She Runs. Princeton: Princeton University Press.
Burrell, Barbara C. 1994. A Woman's Place is in the House. Michigan: University of Michigan Press.
Carroll, Susan J. 1994. Women as Candidates in American Politics. Bloomington, IN: Indiana University Press.
Carroll, Susan J., and Kira Sanbonmatsu. 2013. More Women Can Run: Gender and Pathways to the State Legislatures. New York: Oxford University Press.

CAWP. 2016. Current Numbers of Women Officeholders. Center for American Women and Politics.
Claassen, Ryan L., and John Barry Ryan. 2016. "Social Desirability, Hidden Biases, and Support for Hillary Clinton." PS: Political Science and Politics 49 (4):730-735.
Conroy, Meredith. 2015. Masculinity, Media, and the American Presidency. New York: Palgrave Macmillan.
Crowder-Meyer, Melody, and Benjamin Lauderdale. 2014. "A partisan gap in the supply of female potential candidates in the United States." Research and Politics 1:1-7.
Darcy, R., and Sarah Slavin Schramm. 1977. "When Women Run Against Men." Public Opinion Quarterly 41:1-12.
Ditonto, Tessa M., Allison J. Hamilton, and David P. Redlawsk. 2014. "Gender Stereotypes, Information Search, and Voting Behavior in Political Campaigns." Political Behavior 36 (2):335-358.
Dolan, Kathleen. 2014. When Does Gender Matter? Women Candidates & Gender Stereotypes in American Elections. New York: Oxford University Press.
Duerst-Lahti, Georgia. 1998. "The Bottleneck, Women Candidates." In Women and Elective Office: Past, Present, and Future, edited by Sue Thomas and Clyde Wilcox. New York: Oxford University Press.
Eagly, Alice H., and Steve J. Karau. 2002. "Role Congruity Theory of Prejudice Toward Female Leaders." Psychological Review 109 (3):573-594. doi: 10.1037//0033-295X.109.3.573.
Elder, Laurel. 2012. "The Partisan Gap Among Women State Legislators." Journal of Women, Politics & Policy 33 (1):65-85.
Foschi, Martha. 1992. "Gender and Double Standards for Competence." In Gender, Interaction, and Inequality, edited by Cecilia L. Ridgeway. New York: Springer-Verlag.
Fox, Richard L. 2006. "Congressional Elections: Where Are We on the Road to Gender Parity." In Gender and Elections, edited by Sue J. Carroll and Richard L. Fox. New York: Cambridge University Press.
Fox, Richard L., and Jennifer L. Lawless. 2011. "Gendered Perceptions and Political Candidacies: A Central Barrier to Women’s Equality in Electoral Politics." American Journal of Political Science 55 (1):59-73.
Fulton, Sarah A. 2006. "The Sense of a Woman: Gender, Ambition, and the Decision to Run for Congress." Political Research Quarterly 59 (2):235-248.
Fulton, Sarah A. 2012. "Running Backwards and in High Heels: The Gendered Quality Gap and Incumbent Electoral Success." Political Research Quarterly 65 (2):303-314.
Fulton, Sarah A. 2014. "When Gender Matters: Macro-dynamics and Micro-mechanisms." Political Behavior 36:605-630.
Funk, Carolyn L. 1999. "Bringing the Candidate into Models of Candidate Evaluation." Journal of Politics 61 (3):700-720.
Hannagan, Rebecca J., Monica C. Schneider, and Jill S. Greenlee. 2012. "Symposium: Data, Methods, and Theoretical Implications." PS: Political Science and Politics 45 (2):232-237.
Holman, Mirya R., Jennifer L. Merolla, and Elizabeth J. Zechmeister. 2016. "Terrorist Threat, Male Stereotypes, and Candidate Evaluations." Political Research Quarterly 69 (1):134-147.

Huddy, Leonie, and Nayda Terkildsen. 1993. "Gender Stereotypes and the Perception of Male and Female Candidates." American Journal of Political Science 37 (1):119-147.
Kanthak, Kristin, and Jonathan Woon. 2015. "Women Don't Run: Election Aversion and Candidate Entry." American Journal of Political Science 59 (3):595-612.
King, David C., and Richard E. Matland. 2003. "Sex and the Grand Old Party: An Experimental Investigation of the Effect of Candidate Sex on Support for a Republican Candidate." American Politics Research 31 (6):595-612.
Koch, Jeffrey W. 2000. "Do Citizens Apply Gender Stereotypes to Infer Candidates' Ideological Orientations?" The Journal of Politics 62 (2):414-429.
Koch, Jeffrey W. 2002. "Gender Stereotypes and Citizens' Impressions of House Candidates' Ideological Orientations." American Journal of Political Science 46 (2):453-462.
Koenig, Anne M., Alice H. Eagly, Abigail A. Mitchell, and Tiina Ristikari. 2011. "Are Leader Stereotypes Masculine? A Meta-Analysis of Three Research Paradigms." Psychological Bulletin 137 (4):616-642.
Krupnikov, Yanna, and Adam Seth Levine. 2014. "Cross Sample Comparisons and External Validity." Journal of Experimental Political Science 1 (1):59-80.
Krupnikov, Yanna, Spencer Piston, and Nichole M. Bauer. 2016. "Saving Face: Identifying Voter Responses to Black and Female Candidates." Political Psychology 37 (2):253-273. doi: 10.1111/pops.12261.
Kunda, Ziva, Lisa Sinclair, and Dale Griffin. 1997. "Equal Ratings but Separate Meanings: Stereotypes and the Construal of Traits." Journal of Personality and Social Psychology 72 (4):720-734.
Lawless, Jennifer L. 2012. Becoming a Candidate: Political Ambition and the Decision to Run for Office. New York: Cambridge University Press.
Lawless, Jennifer L., and Richard L. Fox. 2010. It Still Takes a Candidate: Why Women Don't Run for Office. New York: Cambridge University Press.
Lawless, Jennifer L., and Kathryn Pearson. 2008. "The Primary Reason for Women's Underrepresentation? Reevaluating Conventional Wisdom." Journal of Politics 70 (1):67-82.
Maestas, Cherie D., Sarah A. Fulton, L. Sandy Maisel, and Walter J. Stone. 2006. "When to Risk It? Institutions, Ambition, and the Decision to Run for the U.S. House." American Political Science Review 100 (2):195-208.
Matland, Richard E., and David C. King. 2002. "Women as Candidates in Congressional Elections." In Women Transforming Congress, edited by Cindy Simon Rosenthal, 119-145. Norman, OK: University of Oklahoma Press.
McDermott, Monika L. 1997. "Voting Cues in Low-Information Elections: Candidate Gender as a Social Information Variable in Contemporary United States Elections." American Journal of Political Science 41 (1):270-283.
McDermott, Monika L. 1998. "Race and Gender Cues in Low-Information Elections." Political Research Quarterly 51 (4):895-918.
Miller, Arthur H., Martin P. Wattenberg, and Oksana Malanchuk. 1986. "Schematic Assessments of Presidential Candidates." American Political Science Review 80 (2):521-540.
Milyo, Jeffrey, and Samantha Schlosberg. 2000. "Gender Bias and Selection Bias in House Elections." Public Choice 105 (1/2):41-59.

Morton, Rebecca B., and Kenneth C. Williams. 2010. Experimental Political Science and the Study of Causality: From Nature to the Lab. New York: Cambridge University Press.
Mullinix, Kevin J., Thomas J. Leeper, James N. Druckman, and Jeremy Freese. 2015. "The Generalizability of Survey Experiments." Journal of Experimental Political Science 2:109-138.
Palmer, Barbara, and Dennis Simon. 2001. "The Political Glass Ceiling: Gender, Strategy, and Incumbency in U.S. House Elections, 1978-1998." Women & Politics 23 (1/2):59-78.
Palmer, Barbara, and Dennis M. Simon. 2005. "When Women Run Against Women: The Hidden Influence of Female Incumbents in Elections to the US House of Representatives." Politics & Gender 1:35.
Palmer, Barbara, and Dennis M. Simon. 2012. Women & Congressional Elections: A Century of Change. Boulder, CO: Lynne Rienner Publishers.
Parks-Stamm, Elizabeth, Madeline E. Heilman, and Krystle A. Hearns. 2008. "Motivated to Penalize: Women’s Strategic Rejection of Successful Women." Personality and Social Psychology Bulletin 34 (2):237-247.
Pearson, Kathryn, and Eric McGhee. 2013. "What it Takes to Win: Questioning 'Gender Neutral' Outcomes in U.S. House Elections." Politics & Gender 9:439-462.
Plutzer, Eric, and John F. Zipp. 1996. "Identity Politics and Voting for Women Candidates." The Public Opinion Quarterly 60 (1):30-57.
Rosenwasser, Shirley M., Robyn R. Rogers, Sheila Fling, Kayla Silvers-Pickens, and John Butemeyer. 1987. "Attitudes toward Women and Men in Politics: Perceived Male and Female Candidate Competencies and Participant Personality Characteristics." Political Psychology 8 (2):191-200.
Rudman, Laurie A. 1998. "Self-Promotion as a risk factor for women: The costs and benefits of counter-stereotypic impression management." Journal of Personality and Social Psychology 74:629-645.
Rudman, Laurie A., Corinne A. Moss-Racusin, Julie E. Phelan, and Sanne Nauts. 2012. "Status incongruity and backlash effects: Defending the gender hierarchy motivates prejudice against female leaders." Journal of Experimental Social Psychology 48:165-179.
Sanbonmatsu, Kira. 2002. "Gender Stereotypes and Vote Choice." American Journal of Political Science 46 (1):20-34.
Sanbonmatsu, Kira. 2006. Where Women Run: Gender and Party in the American States. Ann Arbor, MI: University of Michigan Press.
Schneider, Monica C. 2014. "The Effects of Gender-Bending on Candidate Evaluations." Journal of Women, Politics & Policy 35:55-77.
Schneider, Monica C., and Angela L. Bos. 2014. "Measuring Stereotypes of Female Politicians." Political Psychology 35 (2):245-266.
Schneider, Monica C., Mirya R. Holman, Amanda B. Diekman, and Thomas McAndrew. forthcoming. "Power, Conflict, and Community: How Gendered Views of Political Power Influence Women’s Political Ambition." Political Psychology. doi: 10.1111/pops.12268.
Seltzer, Richard, Jody Newman, and Melissa Leighton. 1997. Sex as a Political Variable: Women as Candidates & Voters in U.S. Elections. Boulder, CO: Lynne Rienner Publishers.

Streb, Matthew J., Barbara Burrell, Brian Frederick, and Michael A. Genovese. 2008. "Social Desirability Effects and Support for a Female American President." The Public Opinion Quarterly 72 (1):76-89.
Thomsen, Danielle. 2014. "Ideological Moderates Won't Run: How Party Fit Matters for Partisan Polarization in Congress." Journal of Politics 76 (3):786-797.
Thomsen, Danielle. 2015. "Why So Few (Republican) Women? Explaining the Partisan Imbalance in the US Congress." Legislative Studies Quarterly 50 (2):295-323.
Vinkenburg, Claartje, Marloes L. van Engen, Alice H. Eagly, and Mary C. Johannesen-Schmidt. 2011. "An Exploration of Stereotypical Beliefs about Leadership Styles: Is Transformational Leadership a Route to Women's Promotion?" The Leadership Quarterly 22 (1):10-21.
Volden, Craig, Alan E. Wiseman, and Dana E. Wittmer. 2013. "When are Women More Effective Lawmakers than Men?" American Journal of Political Science 57:326-341.
Volden, Craig, Alan E. Wiseman, and Dana E. Wittmer. 2016.
West, Darrell. 2005. Air Wars: Television Advertising in Election Campaigns, 1952-2004. Vol. 4. Washington, DC: CQ Press.

Tables & Figures

Table 1: Experimental Conditions, Candidate Productivity, N=401

Candidate Gender    Candidate Productivity    n
Carol Hartley       High                      101
Chris Hartley       High                      103
Carol Hartley       Low                       96
Chris Hartley       Low                       101

Note: Study conducted via Amazon’s Mechanical Turk.

Table 2: Candidate Productivity Experiment: Gender Differences in Electoral Viability

            Low Productivity Conditions    High Productivity Conditions
Female      0.18 (0.17)                    0.53 (0.13)
Male        0.16 (0.16)                    0.57 (0.13)
p-value     0.2687                         0.0363

Note: The scales for electoral viability range from 0-1, with values closer to 1 indicating a high level of electoral viability and values closer to 0 indicating a low level of electoral viability.

Table 3: Competitive Context Experimental Design, N=226

Candidate Party    Candidate Gender    Opposing Candidate    n
Democratic         Carol Hartley       Tom Larson            75
Democratic         Chris Hartley       Tom Larson            73
Republican         Carol Hartley       Tom Larson            40
Republican         Chris Hartley       Tom Larson            38

Note: Study conducted via SSI. This experiment matched participants into conditions based on shared partisanship.

Table 4: Partisan Differences Experimental Design, N=223

Candidate Party    Candidate Gender    Opposing Candidate    n
Democratic         Carol Hartley       Tom Larson            63
Democratic         Chris Hartley       Tom Larson            58
Republican         Carol Hartley       Tom Larson            55
Republican         Chris Hartley       Tom Larson            47

Note: Study conducted via Amazon’s Mechanical Turk. This experiment matched participants into conditions based on shared partisanship.

Figure 1: Candidate Productivity Experiment: Minimum Qualification Assessments

Note: 95% confidence intervals included.

Figure 2: Competitive Context Experiment: Gender Differences in Minimum Qualification Expectations

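[Figure 2 image not reproduced. Recoverable labels: panel title "Competitive Context Election"; series or axis labels "Hartley - Larson" and "Hartley"; categories "Female" and "Male"; vertical axis ticks from -0.5 to 2.5.]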

Note: 95% confidence intervals included.

Figure 3. Partisan Differences Experiment: Gender Differences in Experience and Knowledge Ratings

Note: Each bar displays the difference between the average evaluations of the female Hartley and the male Hartley, broken down by partisanship.

Figure 4. Partisan Differences Experiment: Gender Differences in Perceptions of Electoral Viability

Note: Each bar displays the difference in the percentage of participants selecting Hartley over Larson as the likely winner across the partisan conditions. Thus, each bar represents Hartley’s margin of perceived electoral victory.
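As a worked illustration of the margin described in the Figure 4 note: because participants chose between only two candidates, Hartley’s margin equals the share selecting Hartley minus the share selecting Larson. The sketch below applies that arithmetic to the primary viability percentages reported in Table A11. The function name is hypothetical, and treating the Table A11 percentages as the inputs behind the figure is an assumption rather than something the figure itself confirms.

```python
def victory_margin(pct_hartley: float) -> float:
    """Return Hartley's margin over Larson, in percentage points."""
    # Only two response options exist (Hartley or Larson), so Larson's
    # share is the complement of Hartley's share.
    pct_larson = 100.0 - pct_hartley
    return pct_hartley - pct_larson


# Percent selecting Hartley as the likely primary winner (Table A11).
# Assumption: these are the quantities summarized in Figure 4.
primary_viability = {
    ("Female v. Male", "Democrats"): 65.0,
    ("Female v. Male", "Republicans"): 53.0,
    ("Male v. Male", "Democrats"): 67.0,
    ("Male v. Male", "Republicans"): 83.0,
}

margins = {cond: victory_margin(pct) for cond, pct in primary_viability.items()}

for (matchup, party), margin in margins.items():
    print(f"{matchup}, {party}: Hartley margin = {margin:+.0f} points")
```

Under this reading, a bar near zero indicates that participants saw the race as a toss-up, while a large positive bar indicates that most participants expected Hartley to win.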

Appendix 1: Additional Experiment Information

Experimental Stimuli: Candidate Productivity Experiment

High Productivity: Congresswoman/man Carol/Chris Hartley is running for re-election. Hartley’s three terms in the House of Representatives have been productive. Hartley chairs two important committees, is a member of the party leadership, and saw numerous successes as a lawmaker. During the last legislative term, Carol/Chris Hartley secured federal dollars to support repairing roads in the district and building a new community center. Hartley also successfully oversaw the passage of three bills that she/he sponsored into law, and co-sponsored several other pieces of legislation. Carol/Chris Hartley ranks as one of the most productive members of Congress with a reputation of working across the partisan aisle to get things done.

Low Productivity: Congresswoman/man Carol/Chris Hartley is running for re-election. Hartley just finished her/his first term in the House of Representatives. Hartley serves on several committees. During the last legislative term, Carol/Chris Hartley sponsored bills to bring federal dollars to the district to repair roads and build a new community center, but was ultimately not successful. Hartley sponsored no bills that successfully became law. Carol/Chris Hartley ranks as one of the least productive members of Congress with a reputation of refusing to work across the partisan aisle to get things done.

Experimental Stimuli: Competitive Context Study & Partisan Differences Stimuli

Female and Male Candidate Photos

Male Candidate Photos

Democratic (Republican) candidates Carol (Chris) Hartley and Tom Larson continued campaigning throughout the state as they seek the party’s nomination for the open seat to the House of Representatives.

Carol (Chris) Hartley just completed her (his) first term in the state senate. Before serving in the senate, Carol (Chris) worked in finance, and received her (his) bachelor’s degree from the State University.

Hartley’s opponent is Tom Larson who also has experience in the state senate. Larson is certainly a formidable opponent, and also held several major events and rallies throughout the state.

The two candidates are tied in the polls, and the race is expected to remain close right up until the day of the primary.

Appendix 2: Sample Information

Table A1: Sample Demographics

                  Candidate       Competitive     Partisan
                  Productivity    Context         Difference      2012        2014        2010
                  Sample (MTurk)  Sample (SSI)    Sample (MTurk)  Pew         CCES        Census
% Women           62%             58%             62%             55.74%      53%         50.8%
% Democrats       46%             52%             43%             40%         37%         **
% Republicans     23%             28%             22%             24%         27%         **
% Independents    32%             21%             35%             35%         27%         **
% White           80%             72%             74%             80.87%      74%         74.83%
Age
  18-24           17%             19%             15%             10.25%      8%          13.08%
  25-44           58%             42%             59%             23.41%      21%         35.01%
  45-64           22%             35%             18%             39.01%      47%         34.74%
  65+             3%              4%              5%              27.33%      25%         17.17%

Table A2: Competitive Context and Partisan Differences Experiments: Full Breakdown of Partisan Identification

First Party Identification Sorting Question
                                Competitive Context    Partisan Difference
Strong Democrat                 16%                    9%
Democrat                        25%                    17%
Weak Democrat                   11%                    17%
Independent                     20%                    22%
Weak Republican                 8%                     16%
Republican                      12%                    14%
Strong Republican               7%                     4%

Partisan Sorting Question for Independents
                                Competitive Context    Partisan Difference
Strong Democrat                 2%                     0.90%
Democrat                        5%                     0.90%
Weak Democrat                   55%                    1.79%
Independent – Lean Democrat     0%                     7.62%
Independent – Lean Republican   0%                     8.07%
Weak Republican                 0%                     1.35%
Republican                      28%                    1.35%
Strong Republican               4%                     0.45%

Note: The bottom half of the table represents the partisan preferences of those who selected Independent as their partisanship in the top half of the table.

Appendix 3: Experimental Design Pre-Tests

Candidate Names & Photos Pre-Test: The names along with the candidate photos were pre-tested with a separate sample of participants on Amazon’s Mechanical Turk in September 2012, N=129. There were no significant differences in the average ratings of the female and male candidates in terms of age, p=0.1460, education, p=0.9887, or attractiveness, p=0.3630.

Candidate Productivity Stimulus Pre-Test: The productivity stimulus was pre-tested with a sample recruited through Amazon’s Mechanical Turk in March 2016 (N=185). The pre-test asked participants to rate the high and low productivity stimuli. The stimulus intentionally did not mention the candidate’s gender or partisanship. Presenting the text without the gender and partisanship of the candidate allows me to test whether participants make assumptions about the specific characteristics of the presented candidate. The pre-test asked participants to indicate the gender, partisanship, and ideology of the candidate. Here, I compare the perceptions of these candidate characteristics across the high and low productivity conditions. There are no significant differences across the two conditions in perceptions of the candidate’s gender (p=0.1174), partisanship (p=0.8499), or ideology (p=0.3174). The distribution of responses on the partisanship question within the high and low productivity conditions is evenly split among the Democratic, Republican, and Independent choices. In addition, on the ideology measure participants rated the candidate biographies for both conditions at the mid-point of the scale. The partisanship and ideology measures affirm the partisan and ideological neutrality of the stimuli.

On the candidate gender question, participants uniformly indicated that the candidate in both the high and low productivity conditions is male. This is not unexpected given that 80% of members of Congress are male. However, to be sure that the text is gender neutral, I also included feminine and masculine stereotype measures. Each participant rated how well a series of feminine and masculine traits described each treatment. The masculine traits included: assertive, tough, aggressive, masculine, and active. The feminine traits included: warm, gentle, feminine, and sensitive. These traits come from existing research on the trait content of feminine and masculine stereotypes (Huddy and Terkildsen 1993, Prentice and Carranza 2002). I averaged the feminine and masculine trait items and coded the scales to range from 0-1. Masculine traits are the traits voters overwhelmingly identify with political leadership roles, and feminine traits are considered undesirable in leaders (Conroy 2015, Bauer 2015). Given these associations, participants should associate both texts with masculine stereotypes and not feminine stereotypes. Indeed, participants associate both the high and low productivity texts with masculine stereotypes rather than feminine stereotypes, p<0.001.
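The scale construction described above (averaging the trait items and recoding the result to run from 0 to 1) can be sketched as follows. This is a minimal illustration, not the author’s code: the 1-5 response range, the function name, and the example ratings are all assumptions, since the text does not report the original response format.

```python
from statistics import mean

MASCULINE_TRAITS = ["assertive", "tough", "aggressive", "masculine", "active"]
FEMININE_TRAITS = ["warm", "gentle", "feminine", "sensitive"]


def trait_scale(ratings, traits, low=1, high=5):
    """Average the listed trait ratings and rescale the mean to the 0-1 range.

    `low` and `high` are the assumed endpoints of the item response scale.
    """
    raw = mean(ratings[t] for t in traits)
    return (raw - low) / (high - low)


# Hypothetical ratings from one participant for one stimulus text.
example_ratings = {
    "assertive": 4, "tough": 5, "aggressive": 3, "masculine": 4, "active": 4,
    "warm": 2, "gentle": 1, "feminine": 1, "sensitive": 2,
}

masculine_score = trait_scale(example_ratings, MASCULINE_TRAITS)
feminine_score = trait_scale(example_ratings, FEMININE_TRAITS)
print(f"masculine = {masculine_score:.3f}, feminine = {feminine_score:.3f}")
```

Rescaling both indices to the same 0-1 range makes the masculine and feminine scores directly comparable even though the two scales contain different numbers of items.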

Competitive Context/Partisan Differences Stimulus Pre-Test: The competitive context stimuli were pre-tested in June 2016 with an MTurk sample (N=223). The pre-test included all four conditions of the experiment: Democratic candidates with a woman running, Republican candidates with a woman running, Democratic candidates with two men, and Republican candidates with two men. The pre-test asked participants to rate the perceived ideology of the candidates. Here, I compare the female versus male conditions within candidate party. On ideology, there were no significant differences across gender in the Republican conditions, p=0.3314. There were significant differences across gender in the Democratic conditions, with the female candidate rated as more liberal than the male candidate, p<0.001. This difference is not surprising, and it is in line with extant scholarship showing that voters rate Democratic female candidates as more liberal than their male counterparts (McDermott 1998, 1997, Koch 2002, 2000).

Minimum Assessment Measures Pre-Test: The minimum qualification assessment measure in the candidate productivity experiment asked participants to rate whether the candidate possessed the following skills: hardworking, organized, ability to build consensus, and ability to manage multiple priorities. These four items come from a pre-test. The pre-test asked respondents recruited on MTurk (in November 2016, N=80) to list the skills a “typical member of Congress” needs to be an effective legislator. These four traits were the most frequently referenced qualifications participants wanted to see in congressional candidates.

The competitive context experiment used the same MTurk pre-test to identify a broader set of 10 skills that effective legislators should possess. These 10 items include ones listed by fewer than 50% of participants, but encompass nearly all of the skills participants listed in the study. The 10 items are: compromising, building consensus, holding committee hearings, responding to constituent needs, public speaking, managing multiple priorities/organized, hardworking, sharing credit, sponsoring and co-sponsoring legislation, and passing bills. With this measure, participants indicated the number of items, but not which items, effective lawmakers should possess.

Appendix 4: Full Question Wordings and Response Options

Candidate Productivity Questions:

Minimum Ability: Which skills from the list below does Hartley have, based on the brief biography you read? Organized, Hardworking, Able to Manage Multiple Priorities, Able to Build Consensus among Colleagues. Response options: Has this Skill/Does Not Have this Skill

Broad Inference: If this candidate were running for the Senate, how would you rate Hartley's chances of winning? Very Likely, Somewhat Likely, Somewhat Unlikely, Very Unlikely

Competitive Context Questions:

Minimum Ability: Below are ten types of behaviors that legislators typically engage in, how many of these behaviors would Hartley/Larson need to engage in to be perceived as an effective legislator?
1. Compromising with fellow lawmakers
2. Building bipartisan consensus
3. Holding committee hearings
4. Responding to constituents
5. Public speaking
6. Managing multiple priorities
7. Willing to stand ground
8. Sharing credit with others
9. Sponsoring and co-sponsoring legislation
10. Passing bills into laws

Broad Inference: Which candidate do you think has the best chance of winning the primary/general election? Carol/Chris Hartley, Tom Larson

Partisan Differences Questions:

Minimum Ability: In your opinion, which of the two candidates is more experienced/knowledgeable? Hartley is extremely more knowledgeable, Hartley is somewhat more knowledgeable, Larson is somewhat more knowledgeable, Larson is extremely more knowledgeable

Broad Inference: Which candidate do you think has the best chance of winning the primary/general election? Carol/Chris Hartley, Tom Larson

Appendix 5: Candidate Productivity Experiment: Additional Analyses

Table A3: Proportion of Participants indicating Hartley Possesses Each Skill with Standard Deviation in Parentheses

            Hard Working            Organized
            High        Low         High        Low
Female      0.971       0.521       0.881       0.625
            (0.171)     (0.502)     (0.358)     (0.487)
Male        0.942       0.327       0.922       0.376
            (0.235)     (0.471)     (0.269)     (0.487)
p-value     0.3232      0.0057      0.3256      0.0004

            Build Consensus         Manage Multiple Priorities
            High        Low         High        Low
Female      0.792       0.219       0.861       0.396
            (0.408)     (0.416)     (0.347)     (0.492)
Male        0.835       0.129       0.903       0.248
            (0.373)     (0.337)     (0.298)     (0.434)
p-value     0.4341      0.0955      0.3598      0.0257

Note: Each cell represents the proportion of participants selecting Hartley as possessing each of these four minimum qualifications. All p-values represent the results of a two-tailed t-test.
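To illustrate how a two-tailed comparison of this kind can be recovered from the reported summary statistics, the sketch below computes a pooled-variance two-sample t statistic for the "Hard Working" skill in the low productivity conditions. Several assumptions are made here and are not confirmed by the text: the cell sizes are taken from Table 1 (96 for the female candidate, 101 for the male candidate), equal variances are assumed, and the p-value uses a normal approximation to the t distribution so that the example needs only the standard library. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance two-sample t statistic from group means, SDs, and sizes.

    Returns (t, df, p_approx), where p_approx is a two-tailed p-value based on
    a normal approximation; this is reasonable only when df is large.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Table A3, Hard Working, low productivity: female 0.521 (0.502), male 0.327 (0.471).
# Assumed cell sizes from Table 1: Carol Hartley/Low n=96, Chris Hartley/Low n=101.
t_stat, df, p_approx = pooled_t_from_summary(0.521, 0.502, 96, 0.327, 0.471, 101)
print(f"t({df}) = {t_stat:.2f}, approximate two-tailed p = {p_approx:.4f}")
```

Under these assumptions the t statistic is about 2.8 on 195 degrees of freedom, which is in line with the small p-value the table reports for this cell. An exact t-distribution p-value would be slightly larger than the normal approximation.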

Table A4: Gender Differences across Participants, Proportion of Participants indicating Candidate Possesses Each Skill with Standard Deviation in Parentheses

Hard Working
                        Low                                       High
                        Female Cand.  Male Cand.    p-value       Female Cand.  Male Cand.    p-value
Female Participant      0.545         0.344         0.0206        0.967         0.949         0.6374
                        (0.507)       (0.479)                     (0.156)       (0.222)
Male Participant        0.467         0.297         0.1587        0.976         0.931         0.3467
                        (0.507)       (0.478)                     (0.181)       (0.255)
p-value                 0.4791        0.6356                      0.7973        0.7135

Build Consensus
                        Low                                       High
                        Female Cand.  Male Cand.    p-value       Female Cand.  Male Cand.    p-value
Female Participant      0.258         0.125         0.0558        0.767         0.847         0.2683
                        (0.441)       (0.347)                     (0.427)       (0.362)
Male Participant        0.133         0.135         0.9832        0.829         0.818         0.8950
                        (0.346)       (0.347)                     (0.381)       (0.363)
p-value                 0.1759        0.8849                      0.4515        0.6956

Organized
                        Low                                       High
                        Female Cand.  Male Cand.    p-value       Female Cand.  Male Cand.    p-value
Female Participant      0.606         0.359         0.0046        0.850         0.949         0.0736
                        (0.492)       (0.484)                     (0.360)       (0.222)
Male Participant        0.667         0.405         0.0335        0.927         0.886         0.5289
                        (0.479)       (0.498)                     (0.264)       (0.321)
p-value                 0.5744        0.6494                      0.2456        0.2431

Manage Multiple Priorities
                        Low                                       High
                        Female Cand.  Male Cand.    p-value       Female Cand.  Male Cand.    p-value
Female Participant      0.439         0.281         0.0614        0.900         0.932         0.5307
                        (0.500)       (0.453)                     (0.401)       (0.254)
Male Participant        0.300         0.189         0.2972        0.805         0.864         0.4714
                        (0.466)       (0.397)                     (0.303)       (0.347)
p-value                 0.1994        0.3064                      0.1777        0.2492

Note: Each cell represents the proportion of participants, based on candidate gender, selecting Hartley as possessing each of these four minimum qualifications. All p-values represent the results of a two-tailed t-test.

Table A5: Partisan Differences across Participants, Proportion of Participants indicating Candidate Possesses Each Skill with Standard Deviation in Parentheses

Hard Working
                        Low                                       High
                        Female Cand.  Male Cand.    p-value       Female Cand.  Male Cand.    p-value
Democratic Participant  0.500         0.291         0.0462        0.960         0.947         0.7812
                        (0.505)       (0.460)                     (0.198)       (0.226)
Republican Participant  0.553         0.358         0.0515        0.980         0.938         0.2737
                        (0.503)       (0.484)                     (0.140)       (0.242)
p-value                 0.5392        0.4795                      0.5508        0.8540

Build Consensus
                        Low                                       High
                        Female Cand.  Male Cand.    p-value       Female Cand.  Male Cand.    p-value
Democratic Participant  0.224         0.105         0.1127        0.820         0.895         0.3334
                        (0.422)       (0.309)                     (0.428)       (0.311)
Republican Participant  0.213         0.151         0.4270        0.764         0.800         0.6498
                        (0.414)       (0.361)                     (0.388)       (0.403)
p-value                 0.8910        0.4882                      0.4985        0.2153

Organized
                        Low                                       High
                        Female Cand.  Male Cand.    p-value       Female Cand.  Male Cand.    p-value
Democratic Participant  0.633         0.333         0.0029        0.880         0.895         0.8316
                        (0.487)       (0.477)                     (0.328)       (0.311)
Republican Participant  0.617         0.415         0.0443        0.882         0.938         0.2893
                        (0.491)       (0.477)                     (0.325)       (0.311)
p-value                 0.8760        0.4020                      0.9712        0.4287

Manage Multiple Priorities
                        Low                                       High
                        Female Cand.  Male Cand.    p-value       Female Cand.  Male Cand.    p-value
Democratic Participant  0.408         0.250         0.0996        0.880         0.947         0.2815
                        (0.497)       (0.438)                     (0.367)       (0.226)
Republican Participant  0.383         0.245         0.1401        0.843         0.877         0.6042
                        (0.491)       (0.434)                     (0.367)       (0.226)
p-value                 0.8034        0.9568                      0.5963        0.2482

Note: Each cell represents the proportion of participants, based on candidate gender, selecting Hartley as possessing each of these four minimum qualifications. All p-values represent the results of a two-tailed t-test.

Table A6: Differences across Participant Gender in Electoral Viability for High and Low Productive Female and Male Incumbents

            Low Productivity Conditions                     High Productivity Conditions
            Female Cand.    Male Cand.      p-value         Female Cand.    Male Cand.      p-value
Female      0.189 (0.170)   0.129 (0.133)   0.0262          0.533 (0.127)   0.568 (0.138)   0.1570
Male        0.167 (0.178)   0.203 (0.194)   0.4355          0.530 (0.126)   0.574 (0.138)   0.1205
p-value     0.5512          0.0261                          0.9118          0.8200

Note: The scales for electoral viability range from 0-1, with values closer to 1 indicating a high level of electoral viability and values closer to 0 indicating a low level of electoral viability. All p-values represent the results of a two-tailed t-test.

Table A7: Differences across Participant Party in Electoral Viability for High and Low Productive Female and Male Incumbents

            Low Productivity Conditions                     High Productivity Conditions
            Female Cand.    Male Cand.      p-value         Female Cand.    Male Cand.      p-value
Dems.       0.168 (0.187)   0.167 (0.158)   0.9576          0.520 (0.141)   0.605 (0.138)   0.0058
Reps.       0.197 (0.156)   0.146 (0.166)   0.1554          0.544 (0.108)   0.550 (0.127)   0.7920
p-value     0.4208          0.5279                          0.3380          0.0412

Note: The scales for electoral viability range from 0-1, with values closer to 1 indicating a high level of electoral viability and values closer to 0 indicating a low level of electoral viability. All p-values represent the results of a two-tailed t-test.

Appendix 6: Competitive Context Experiment: Additional Analyses

Table A8: Full Group Means on Task Number Assessments, M (SD)

                  Hartley         Larson          p-value
Female vs. Male   5.991 (3.262)   4.904 (3.591)   0.0242
Male vs. Male     6.108 (3.088)   5.568 (3.627)   0.1900
p-value           0.7829          0.1686

Note: Each cell displays the number of tasks a candidate needed to complete to be considered qualified for political office. The variable ranges from 0-10.

                  Primary Viability    General Viability
Female vs. Male   49%                  48%
Male vs. Male     58%                  53%
p-value           0.1553               0.4630

Note: Each cell displays the percentage of participants who selected Hartley as more likely to win the election relative to Larson. All p-values represent the results of a two-tailed t-test.

Table A9: Differences across Participant Gender, Task Number Assessments, M (SD)

                  Hartley                                         Larson
                  Female Part.    Male Part.      p-value         Female Part.    Male Part.      p-value
Female vs. Male   6.042 (3.243)   5.907 (3.330)   0.8314          4.320 (3.665)   5.884 (3.665)   0.0231
Male vs. Male     6.508 (3.181)   5.654 (2.943)   0.1464          5.102 (3.854)   6.096 (3.309)   0.1503
p-value           0.4099          0.6951                          0.2372          0.7551

Note: Each cell displays the number of tasks a candidate needed to complete to be considered qualified for political office. The variable ranges from 0-10.

                  Primary Viability                               General Viability
                  Female Part.    Male Part.      p-value         Female Part.    Male Part.      p-value
Female vs. Male   58%             33%             0.0072          52%             42%             0.3272
Male vs. Male     66%             49%             0.0712          63%             42%             0.0240
p-value           0.3664          0.1089                          0.2007          0.9473

Note: Each cell displays the proportion of participants who selected Hartley as more likely to win the election relative to Larson. All p-values represent the results of a two-tailed t-test.

Table A10: Differences across Participant Party, Task Number Assessments, M (SD)

                  Hartley                                         Larson
                  Dem. Part.      Rep. Part.      p-value         Dem. Part.      Rep. Part.      p-value
Female vs. Male   6.278 (3.277)   6.208 (3.128)   0.9065          4.807 (3.687)   4.438 (3.903)   0.5893
Male vs. Male     5.985 (3.013)   5.300 (3.313)   0.3211          6.154 (3.251)   5.667 (3.367)   0.5036
p-value           0.5776          0.2264                          0.0217          0.1584

Note: Each cell displays the number of tasks a candidate needed to complete to be considered qualified for political office. The variable ranges from 0-10.

                  Primary Viability                               General Viability
                  Dem. Part.      Rep. Part.      p-value         Dem. Part.      Rep. Part.      p-value
Female vs. Male   66.27%          54.17%          0.1722          56.10%          57.45%          0.8829
Male vs. Male     36.92%          51.72%          0.1823          40.62%          43.33%          0.8063
p-value           0.0003          0.8378                          0.0642          0.2322

Note: Each cell displays the proportion of participants who selected Hartley as more likely to win the election relative to Larson. All p-values represent the results of a two-tailed t-test.

Appendix 7: Partisan Differences Experiment: Additional Analyses

Table A11: Group Means on Experience and Knowledge, M (SD)

                  Experience                                 Knowledge
                  Democrats       Republicans     p-value    Democrats       Republicans     p-value
Female v. Male    0.571 (0.243)   0.521 (0.246)   0.2679     0.667 (0.232)   0.564 (0.256)   0.0235
Male v. Male      0.592 (0.226)   0.660 (0.162)   0.0879     0.615 (0.195)   0.667 (0.155)   0.1433
p-value           0.6315          0.0014                     0.1891          0.0179

Note: Each cell includes the full group mean with the standard deviation in parentheses. The p-values reflect the results of a two-tailed t-test.

                  Primary Viability                          General Viability
                  Democrats       Republicans     p-value    Democrats       Republicans     p-value
Female v. Male    65%             53%             0.1759     57%             45%             0.2083
Male v. Male      67%             83%             0.0676     57%             85%             0.0016
p-value           0.8038          0.0010                     0.9784          0.0001

Note: Each cell includes the percentage of participants who selected Hartley as the likely winner of the primary and general election. The response options on these questions were either Hartley or Larson. All p-values represent the results of a two-tailed t-test.

Table A12: Partisan Differences Experiment: Group Means on Experience and Knowledge, M (SD)

Experience
                  Democrats                                  Republicans
                  Female Part.    Male Part.      p-value    Female Part.    Male Part.      p-value
Female v. Male    0.561 (0.252)   0.591 (0.228)   0.6444     0.500 (0.229)   0.569 (0.283)   0.3447
Male v. Male      0.635 (0.214)   0.539 (0.232)   0.1039     0.679 (0.146)   0.633 (0.084)   0.3473
p-value           0.1855          0.4364                     0.0007          0.4087

Knowledge
                  Democrats                                  Republicans
                  Female Part.    Male Part.      p-value    Female Part.    Male Part.      p-value
Female v. Male    0.674 (0.217)   0.652 (0.262)   0.7073     0.570 (0.244)   0.549 (0.287)   0.7796
Male v. Male      0.635 (0.214)   0.590 (0.171)   0.3809     0.667 (0.131)   0.667 (0.187)   1.00
p-value           0.4412          0.3318                     0.0661          0.1434

Primary Viability
                  Democrats                                  Republicans
                  Female Part.    Male Part.      p-value    Female Part.    Male Part.      p-value
Female v. Male    0.561 (0.502)   0.818 (0.394)   0.0419     0.526 (0.506)   0.529 (0.514)   0.9834
Male v. Male      0.688 (0.471)   0.654 (0.485)   0.7904     0.815 (0.396)   0.850 (0.366)   0.7573
p-value           0.2763          0.2100                     0.0162          0.0340

General Viability
                  Democrats                                  Republicans
                  Female Part.    Male Part.      p-value    Female Part.    Male Part.      p-value
Female v. Male    0.536 (0.505)   0.636 (0.492)   0.4536     0.447 (0.504)   0.471 (0.514)   0.8759
Male v. Male      0.594 (0.500)   0.538 (0.508)   0.6789     0.900 (0.321)   0.800 (0.410)   0.4084
p-value           0.6310          0.5034                     0.0002          0.0372

Note: Cells display Hartley's average rating across the candidate gender conditions, broken down by female and male participants. All p-values represent the results of a two-tailed t-test.