The Necessity of Construct and External Validity for Generalized Causal Claims∗

Kevin M. Esterling (Corresponding Author), Professor, School of Public Policy and Department of Political Science, UC–Riverside, [email protected]

David Brady, Professor, School of Public Policy, UC–Riverside and WZB Berlin Social Science Center, [email protected]

Eric Schwitzgebel, Professor, Department of Philosophy, UC–Riverside, [email protected]

June 18, 2021

∗An earlier version was presented as part of the WZB Talks series. We thank Elias Bareinboim, Michael Bates, Shaun Bowler, Nancy Cartwright, Carlos Cinelli, Uljana Feest, Diogo Ferrari, Christian Fong, Francesco Guala, Steffen Huck, Macartan Humphreys, Robert Kaestner, Sampada KC, Jon Krosnick, Dorothea Kubler, Doug Lauen, Joscha Legewie, Michael Neblo, Jörg Peters, Alex Rosenberg, Heike Solga, Jacqueline Sullivan, Nicholas Weller, Bernhard Wessels, Ang Yu and the participants in the MAMA workshop in UCR Psychology for comments.

Abstract

By advancing causal identification, the credibility revolution has facilitated tremendous progress in political science. The ensuing emphasis on internal validity, however, has led to the neglect of construct and external validity. This article develops a framework we call causal specification. The framework formally demonstrates the joint necessity of internal, construct and external validity for causal generalization. Indeed, the lack of any of the three types of validity undermines the credibility revolution's own goal to understand causality deductively. Without construct validity, one cannot accurately label the cause or outcome. Without external validity, one cannot understand the conditions enabling the cause to have an effect. Our framework clarifies the assumptions necessary for each of the three. We show why internal validity should not have lexical priority, and advocate for equally valuing construct and external validity.
Political scientists should use causal specification, not just causal identification, as a framework for the goal of causal generalization.

Introduction

The Gold Standard Lab (GSL) undertakes a study of an intervention aiming to increase juror turnout.1 Inspired by get-out-the-vote (GOTV) studies (e.g., Arceneaux & Nickerson 2010; Gerber & Green 2000), GSL exposes residents to different messages using jury summons reminder postcards (partially replicating Bowler et al. 2014). Fortunately, the researchers in GSL are well-trained in design-based causal inference and so they conduct a gold-plated randomized controlled trial (RCT). GSL asks the Riverside (CA) County Superior Court to mail official government postcards to residents who recently received a jury summons, randomizing so that half receive a standard reminder postcard and the other half receive a postcard indicating that failure to appear could result in fines or imprisonment. The "enforcement" condition results in a statistically significant 10 percent increase in turnout relative to the "control" condition, an effect size more than 20 times that typically found in GOTV postcard experiments. Given these strong results, GSL recommends courts adopt the enforcement message as a policy, and they publish an article containing the causal generalization: "Enforcement messages increase juror turnout."

Eager to demonstrate the efficacy of the enforcement message in other contexts, GSL next collaborates with the superior court in Orange County, California (a more affluent adjacent county) to implement the identical gold-plated evaluation. Much to their surprise, the enforcement postcard shows no treatment effect.

Discouraged, GSL returns to Riverside to replicate the initial results. This time the jury administrator is too busy to collaborate, so GSL uses postcards without the official court seal and return address.
They believe the different postcards will not matter because the enforcement message (the causal variable in their understanding) is the same. Unfortunately, they find no treatment effect in this replication.

1 Jury service is a form of democratic participation, which is a political right that governments can coerce (Rose 2005). Courts routinely seek low-cost methods to increase the yield for jury summonses (Boatright 1999).

GSL understands research best practices in light of the "credibility revolution." The credibility revolution emphasizes formalizing the assumptions necessary for using observed data to identify causal effects. While the credibility revolution has facilitated progress in political science by emphasizing internal validity, the GSL vignette illustrates a critical problem. GSL's gold-plated experiment may have achieved excellent internal validity. But because they ignored construct validity, GSL misattributed the cause (e.g., the enforcement message vs. some other feature of the intervention), and because they ignored external validity they misunderstood the scope of generalization (e.g., that the intervention would also work in Orange County).

To state the problem more generally: our causal identification strategies have primarily aimed at ensuring internal validity (Angrist & Pischke 2010), but they routinely omit considerations of construct and external validity. This leaves a substantial gap between identifying a causal effect of measured variables and arriving at a correct causal generalization about what causes what (Shadish et al. 2002). Without construct validity, one cannot accurately label the cause or outcome. Without external validity, one cannot understand the conditions enabling the cause to have an effect (Falleti & Lynch 2009).
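The logic of the vignette can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes a hypothetical data-generating process in which the enforcement text raises turnout only when the postcard carries the official court seal and the setting has some enabling condition. The function name `simulate_trial`, the baseline turnout of 0.30, the lift of 0.10, the sample size, and the seeds are all illustrative assumptions.

```python
import random


def mean(values):
    """Average of a non-empty list of 0/1 outcomes."""
    return sum(values) / len(values)


def simulate_trial(n, official_seal, enabling_context, seed):
    """Simulate one randomized postcard trial.

    Returns the difference in turnout rates (enforcement minus control).
    All probabilities are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    base_rate = 0.30  # hypothetical baseline turnout
    # Assumed mechanism: the enforcement text matters only when the postcard
    # looks official AND the setting has the enabling condition.
    lift = 0.10 if (official_seal and enabling_context) else 0.0

    turnout = {0: [], 1: []}
    for _ in range(n):
        treated = rng.randint(0, 1)  # coin-flip random assignment
        p_appear = base_rate + lift * treated
        turnout[treated].append(1 if rng.random() < p_appear else 0)
    return mean(turnout[1]) - mean(turnout[0])


# Three trials mirroring the vignette; each one is randomized, so each
# comparison is internally valid on its own terms.
trials = {
    "Riverside, official postcard": (True, True),
    "Orange County, official postcard": (True, False),
    "Riverside, unofficial postcard": (False, True),
}

estimates = {}
for i, (label, (seal, context)) in enumerate(trials.items(), start=1):
    estimates[label] = simulate_trial(50_000, seal, context, seed=i)
    print(f"{label}: estimated effect = {estimates[label]:+.3f}")
```

Under these assumptions, only the first trial should yield an estimate near the assumed lift, while the other two should be close to zero. Randomization alone cannot tell the analyst whether the cause is the message or the official seal, or which conditions in the setting enable the effect.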
The methods inspired by the credibility revolution take the variables that are actually measured as fundamental for understanding causality, and often leave the scope of generalization implicitly local to the experimental setting. Nevertheless, researchers rarely interpret their findings in terms of only the measured variables and are often not explicit about their limited scope of generalization. Instead, scholars typically interpret their findings in terms of general causes ("enforcement messages") and general effects ("juror turnout") that generalize beyond their study's setting. Causes, effects, and the conditions in settings are referents in nature, and claims regarding their relationships are ontological. As a result, a correct generalized causal claim requires not only the identification of a causal effect between measured variables (i.e., internal validity) but also the correct specification of the cause and effect (i.e., construct validity) and correct specification of the scope of the generalization (i.e., external validity).

Readers of social science journals likely recognize that the scenario of the GSL vignette is unfortunately common. The influential textbooks from Morgan & Winship (2015), Imbens & Rubin (2015), and Gerber & Green (2012) have no index entries for either construct or external validity. Likewise, the foundational paper from Angrist et al. (1996) lacks mention of these concepts.2 Partly because so much attention has focused solely on internal validity, we have collectively lost sight of the importance of construct and external validity. In turn, prevailing practices and the singular focus on internal validity are inhibiting collective progress in political science towards causal generalization.

This article makes several unique contributions to address the neglect of construct and external validity. First, we develop a formal framework we call causal specification.
This formal framework synthesizes and lends analytical precision to a variety of interdisciplinary developments and augments prevailing approaches to causal inference. Second, the framework clarifies and illuminates the assumptions necessary for construct and external validity. Within the framework, all three types of validity are founded on assumptions and hence are not established by statistical validation (Morton & Williams 2010, 255). Consequently, our framework does not propose new statistical procedures. However, it clarifies why even those studies with ideal causal identification can never be theory-free (Deaton & Cartwright 2018; Pearl & MacKenzie 2018). Third, the framework demonstrates that all three of internal, construct and external validity are jointly necessary. Internal validity alone is not sufficient for causal generalization. While internal validity assumes the causal effect is not confounded, construct validity assumes correct labeling of the cause and outcome, and external validity assumes correctly specifying the conditions that enable or disable the cause. Absent any of the three validities, the credibility revolution cannot meet its own goal of understanding causality deductively. Fourth, the causal specification framework charts a way forward for social scientists aiming to make causal generalizations. In the process, we advocate for a rebalancing that equally values all three validities. Internal validity has no special status or lexical priority. We encourage political scientists to use causal specification, not just causal identification, as a framework for causal generalization.

2 Morton & Williams (2010, chapter 7) has a sophisticated discussion of both construct and external validity drawing largely from Shadish et al. (2002). Our framework formalizes and extends that discussion and highlights some limitations of existing approaches to these typically neglected aspects of validity.
The Credibility Revolution

The GSL team reported the successful results from the first Riverside trial because their training in design-based
