Are Survey Experiments Externally Valid?

Total Page:16

File Type:pdf, Size:1020Kb

Are Survey Experiments Externally Valid? American Political Science Review Vol. 104, No. 2 May 2010 doi:10.1017/S0003055410000092 Are Survey Experiments Externally Valid? JASON BARABAS and JENNIFER JERIT Florida State University esearchers use survey experiments to establish causal effects in descriptively representative samples, but concerns remain regarding the strength of the stimuli and the lack of realism in R experimental settings. We explore these issues by comparing three national survey experiments on Medicare and immigration with contemporaneous natural experiments on the same topics. The survey experiments reveal that providing information increases political knowledge and alters attitudes. In contrast, two real-world government announcements had no discernable effects, except among people who were exposed to the same facts publicized in the mass media. Even among this exposed subsample, treatment effects were smaller and sometimes pointed in the opposite direction. Methodologically, our results suggest the need for caution when extrapolating from survey experiments. Substantively, we find that many citizens are able to recall factual information appearing in the news but may not adjust their beliefs and opinions in response to this information. ocial science researchers seek to establish causal comparing the effects of two naturally occurring po- relationships that are generalizable—that is, they litical events to contemporaneous survey experiments Stry to maximize internal and external validity. Sur- that delivered similar information to diverse samples vey experiments are becoming more popular among of the American public. In the first study, we examine scholars because they seem to possess both proper- the effects of news coverage of the 2007 Medicare trust ties. The random assignment of respondents to treat- fund warning and compare it to a survey experiment ment and control conditions reveals whether one factor that provided the same key facts about Medicare’s fi- causes another, whereas the use of a representative nancial status. In the second study, we analyze media sample allows generalization to the larger population. coverage of a new citizenship test launched by the U.S. However, even in nationally representative survey Citizenship and Immigration Services (USCIS) in the experiments, external validity may still be a concern if fall of 2008. Once again, we conducted a concurrent the treatments do not resemble the relevant phenom- survey experiment that delivered similar information ena in question or if the experimental setting exagger- to an adult national sample. Taken together, the two ates the effect of the stimulus. This article investigates studies offer a rare opportunity to compare the treat- a question that recent studies have raised (e.g., Gaines, ment effects from survey experiments with the effects Kuklinski, and Quirk 2007; Kinder 2007) but that has of real-world political events. not been examined empirically: are the causal findings This research is timely, if only because survey ex- of survey experiments reliable predictors of how opin- periments have overturned much of the conventional ion changes in the wake of actual political events? wisdom on the nature of public opinion (e.g., Gibson For researchers using survey experiments, the im- 1998; Kinder and Sanders 1996; Krosnick and Schu- plicit assumption is that a significant treatment effect man 1988; Prior and Lupia 2008; Sniderman and Pi- says something about the direction, if not the rough azza 1993; Zaller and Feldman 1992). Indeed, one magnitude, of effects that might be expected to occur scholar recently proclaimed, “survey experiments that in the real world (Gaines, Kuklinski, and Quirk 2007, integrate representative samples with the experimental 5). Our study investigates this assumption. We do so by control of questions represent the most valuable tool for gaining access to the processes that underlie opin- ion formation” (Lavine 2002, 242). With the expansion Jason Barabas is Associate Professor, Department of Political Sci- of Internet surveys and multi-investigator studies such ence, Florida State University, 531 Bellamy Building, Tallahassee, FL 32306 ([email protected]). as Time-sharing Experiments in the Social Sciences, Jennifer Jerit is Associate Professor, Department of Political Sci- researchers from a variety of subfields are now using ence, Florida State University, 531 Bellamy Building, Tallahassee, survey experiments (Druckman et al. 2006). Thus, the FL 32306 ([email protected]). number of scholars affected by this issue is large and The authors appreciate the helpful comments they received from growing.1 Scott Allard, Charles Barrilleaux, Bill Berry, John Bullock, Brian Gaines, Cindy Kam, Gary King, Jon Krosnick, Jim Kuklinski, Skip We draw two lessons from our study—one is pri- Lupia, Cherie Maestas, Rose McDermott, Betsy Sinclair, Mark marily methodological, the other is more substantive. Souva, John Transue, Mathieu Turgeon, Gabor Toka, Paul Quirk, Methodologically, we show that survey experiments Will Shadish, Joe Young, and participants in colloquia at Florida generate effects that are observable among particu- State University, the University of North Carolina at Chapel Hill, lar subgroups, not necessarily the entire population. Texas A&M University, and Vanderbilt University. Earlier ver- sions of this article were presented at the Visions in Methodol- Insofar as researchers keep this point in mind, we be- ogy Conference at Ohio State University, and at annual meetings lieve that survey experiments can be a valuable tool of the American Political Science Association, the Canadian Po- litical Science Association, and the International Society of Politi- cal Psychology. Ben Gaskins provided valuable research assistance. 1 Indeed, scholars in other fields are engaged in an analogous line of Data, replication code, and an online Appendix are available at inquiry (e.g., Benz and Meier 2008; Cook, Shadish, and Wong 2008; http://polisci.fsu.edu/people/faculty/index.htm. Levitt and List 2007). 226 American Political Science Review Vol. 104, No. 2 for studying public opinion. Substantively, we find that findings may be “the product of an unrealistically pow- people in the natural experiments do not integrate new erful manipulation, one that rarely occurs in natural information and adjust their political beliefs to the de- settings.” For example, in a critique of the framing lit- gree that they do in survey experiments. Thus, scholars erature, Sniderman and Theriault (2004) argue that it might come to different conclusions about the nature is unrealistic to examine the effects of a single frame of public opinion depending on the manner in which when citizens typically experience competing frames they study it. (also see Chong and Druckman 2007). To the extent that treatments in survey experiments are overly strong or atypical, the observed effects may not generalize VARIETIES OF EXPERIMENTS beyond the particular study at hand. AND VALIDITY A second and related issue is the manner in which Years ago, scholars rarely undertook social science treatments are received (i.e., the experimental setting). experiments (Kinder and Palfrey 1993). Today, exper- Kinder (2007, 157) is concerned that imental research is conducted in many different ways (McDermott 2002): laboratory experiments, carried experimental results can always be questioned on their out in controlled settings with students or members generalizability, and framing effects are no exception. The of the local community as subjects (e.g., Grosser and major worry in this respect is that framing experiments— Schram 2006; Iyengar and Kinder 1987; Kam, Wilking, like experiments in mass communication generally— typically obliterate the distinction between the supply of and Zechmeister 2007; Mutz and Reeves 2005); information, on the one hand, and its consumption, on field experiments, in which randomized treatments are the other. That is, experiments are normally carried out in delivered outside the lab (e.g., Arceneaux and Kolodny such a way that virtually everyone receives the message. 2009; Gerber and Green 2000); and natural experi- The typical experiment thereby avoids a major obstacle ments, where analysts take advantage of variation in standing in the way of communication effects, namely, an real-world phenomena (e.g., Huber and Arceneaux inattentive audience, lost in the affairs of private life. By 2007; Lassen 2005; Mondak 1995).2 Increasingly, public ensuring that frames reach their intended audiences, ex- opinion researchers employ survey experiments that periments may exaggerate their power. deliver randomized treatments in telephone or Inter- net polls. This approach is common in the area of citizen Likewise, in their framework, Gaines, Kuklinski, and competence, where researchers use survey experi- Quirk (2007, 16) introduce an “inflation parameter” in ments to examine whether people are capable of learn- recognition of the fact that “the artificially clean envi- ing (e.g., Kuklinski et al. 2000) and to analyze how the ronment of the survey question makes treatment easier presentation of information alters beliefs or opinions to receive than in real life.” Unlike the natural world, (e.g., Berinsky 2007; Kuklinski et al. 2001; Lupia et al. which contains competing messages and other distrac- n.d.; Sniderman and Theriault 2004; Turgeon 2009). tions that make exposure to the treatment probabilistic, Because of their randomized treatments, experi- exposure is essentially forced in a survey experiment.
Recommended publications
  • Statistical Inference: How Reliable Is a Survey?
    Math 203, Fall 2008: Statistical Inference: How reliable is a survey? Consider a survey with a single question, to which respondents are asked to give an answer of yes or no. Suppose you pick a random sample of n people, and you find that the proportion that answered yes isp ˆ. Question: How close isp ˆ to the actual proportion p of people in the whole population who would have answered yes? In order for there to be a reliable answer to this question, the sample size, n, must be big enough so that the sample distribution is close to a bell shaped curve (i.e., close to a normal distribution). But even if n is big enough that the distribution is close to a normal distribution, usually you need to make n even bigger in order to make sure your margin of error is reasonably small. Thus the first thing to do is to be sure n is big enough for the sample distribution to be close to normal. The industry standard for being close enough is for n to be big enough so that 1 − p 1 − p n > 9 and n > 9 p p both hold. When p is about 50%, n can be as small as 10, but when p gets close to 0 or close to 1, the sample size n needs to get bigger. If p is 1% or 99%, then n must be at least 892, for example. (Note also that n here depends on p but not on the size of the whole population.) See Figures 1 and 2 showing frequency histograms for the number of yes respondents if p = 1% when the sample size n is 10 versus 1000 (this data was obtained by running a computer simulation taking 10000 samples).
    [Show full text]
  • SAMPLING DESIGN & WEIGHTING in the Original
    Appendix A 2096 APPENDIX A: SAMPLING DESIGN & WEIGHTING In the original National Science Foundation grant, support was given for a modified probability sample. Samples for the 1972 through 1974 surveys followed this design. This modified probability design, described below, introduces the quota element at the block level. The NSF renewal grant, awarded for the 1975-1977 surveys, provided funds for a full probability sample design, a design which is acknowledged to be superior. Thus, having the wherewithal to shift to a full probability sample with predesignated respondents, the 1975 and 1976 studies were conducted with a transitional sample design, viz., one-half full probability and one-half block quota. The sample was divided into two parts for several reasons: 1) to provide data for possibly interesting methodological comparisons; and 2) on the chance that there are some differences over time, that it would be possible to assign these differences to either shifts in sample designs, or changes in response patterns. For example, if the percentage of respondents who indicated that they were "very happy" increased by 10 percent between 1974 and 1976, it would be possible to determine whether it was due to changes in sample design, or an actual increase in happiness. There is considerable controversy and ambiguity about the merits of these two samples. Text book tests of significance assume full rather than modified probability samples, and simple random rather than clustered random samples. In general, the question of what to do with a mixture of samples is no easier solved than the question of what to do with the "pure" types.
    [Show full text]
  • Summary of Human Subjects Protection Issues Related to Large Sample Surveys
    Summary of Human Subjects Protection Issues Related to Large Sample Surveys U.S. Department of Justice Bureau of Justice Statistics Joan E. Sieber June 2001, NCJ 187692 U.S. Department of Justice Office of Justice Programs John Ashcroft Attorney General Bureau of Justice Statistics Lawrence A. Greenfeld Acting Director Report of work performed under a BJS purchase order to Joan E. Sieber, Department of Psychology, California State University at Hayward, Hayward, California 94542, (510) 538-5424, e-mail [email protected]. The author acknowledges the assistance of Caroline Wolf Harlow, BJS Statistician and project monitor. Ellen Goldberg edited the document. Contents of this report do not necessarily reflect the views or policies of the Bureau of Justice Statistics or the Department of Justice. This report and others from the Bureau of Justice Statistics are available through the Internet — http://www.ojp.usdoj.gov/bjs Table of Contents 1. Introduction 2 Limitations of the Common Rule with respect to survey research 2 2. Risks and benefits of participation in sample surveys 5 Standard risk issues, researcher responses, and IRB requirements 5 Long-term consequences 6 Background issues 6 3. Procedures to protect privacy and maintain confidentiality 9 Standard issues and problems 9 Confidentiality assurances and their consequences 21 Emerging issues of privacy and confidentiality 22 4. Other procedures for minimizing risks and promoting benefits 23 Identifying and minimizing risks 23 Identifying and maximizing possible benefits 26 5. Procedures for responding to requests for help or assistance 28 Standard procedures 28 Background considerations 28 A specific recommendation: An experiment within the survey 32 6.
    [Show full text]
  • Survey Experiments
    IU Workshop in Methods – 2019 Survey Experiments Testing Causality in Diverse Samples Trenton D. Mize Department of Sociology & Advanced Methodologies (AMAP) Purdue University Survey Experiments Page 1 Survey Experiments Page 2 Contents INTRODUCTION ............................................................................................................................................................................ 8 Overview .............................................................................................................................................................................. 8 What is a survey experiment? .................................................................................................................................... 9 What is an experiment?.............................................................................................................................................. 10 Independent and dependent variables ................................................................................................................. 11 Experimental Conditions ............................................................................................................................................. 12 WHY CONDUCT A SURVEY EXPERIMENT? ........................................................................................................................... 13 Internal, external, and construct validity ..........................................................................................................
    [Show full text]
  • Evaluating Survey Questions Question
    What Respondents Do to Answer a Evaluating Survey Questions Question • Comprehend Question • Retrieve Information from Memory Chase H. Harrison Ph.D. • Summarize Information Program on Survey Research • Report an Answer Harvard University Problems in Answering Survey Problems in Answering Survey Questions Questions – Failure to comprehend – Failure to recall • If respondents don’t understand question, they • Questions assume respondents have information cannot answer it • If respondents never learned something, they • If different respondents understand question cannot provide information about it differently, they end up answering different questions • Problems with researcher putting more emphasis on subject than respondent Problems in Answering Survey Problems in Answering Survey Questions Questions – Problems Summarizing – Problems Reporting Answers • If respondents are thinking about a lot of things, • Confusing or vague answer formats lead to they can inconsistently summarize variability • If the way the respondent remembers something • Interactions with interviewers or technology can doesn’t readily correspond to the question, they lead to problems (sensitive or embarrassing may be inconsistemt responses) 1 Evaluating Survey Questions Focus Groups • Early stage • Qualitative research tool – Focus groups to understand topics or dimensions of measures • Used to develop ideas for questionnaires • Pre-Test Stage – Cognitive interviews to understand question meaning • Used to understand scope of issues – Pre-test under typical field
    [Show full text]
  • MRS Guidance on How to Read Opinion Polls
    What are opinion polls? MRS guidance on how to read opinion polls June 2016 1 June 2016 www.mrs.org.uk MRS Guidance Note: How to read opinion polls MRS has produced this Guidance Note to help individuals evaluate, understand and interpret Opinion Polls. This guidance is primarily for non-researchers who commission and/or use opinion polls. Researchers can use this guidance to support their understanding of the reporting rules contained within the MRS Code of Conduct. Opinion Polls – The Essential Points What is an Opinion Poll? An opinion poll is a survey of public opinion obtained by questioning a representative sample of individuals selected from a clearly defined target audience or population. For example, it may be a survey of c. 1,000 UK adults aged 16 years and over. When conducted appropriately, opinion polls can add value to the national debate on topics of interest, including voting intentions. Typically, individuals or organisations commission a research organisation to undertake an opinion poll. The results to an opinion poll are either carried out for private use or for publication. What is sampling? Opinion polls are carried out among a sub-set of a given target audience or population and this sub-set is called a sample. Whilst the number included in a sample may differ, opinion poll samples are typically between c. 1,000 and 2,000 participants. When a sample is selected from a given target audience or population, the possibility of a sampling error is introduced. This is because the demographic profile of the sub-sample selected may not be identical to the profile of the target audience / population.
    [Show full text]
  • The Evidence from World Values Survey Data
    Munich Personal RePEc Archive The return of religious Antisemitism? The evidence from World Values Survey data Tausch, Arno Innsbruck University and Corvinus University 17 November 2018 Online at https://mpra.ub.uni-muenchen.de/90093/ MPRA Paper No. 90093, posted 18 Nov 2018 03:28 UTC The return of religious Antisemitism? The evidence from World Values Survey data Arno Tausch Abstract 1) Background: This paper addresses the return of religious Antisemitism by a multivariate analysis of global opinion data from 28 countries. 2) Methods: For the lack of any available alternative we used the World Values Survey (WVS) Antisemitism study item: rejection of Jewish neighbors. It is closely correlated with the recent ADL-100 Index of Antisemitism for more than 100 countries. To test the combined effects of religion and background variables like gender, age, education, income and life satisfaction on Antisemitism, we applied the full range of multivariate analysis including promax factor analysis and multiple OLS regression. 3) Results: Although religion as such still seems to be connected with the phenomenon of Antisemitism, intervening variables such as restrictive attitudes on gender and the religion-state relationship play an important role. Western Evangelical and Oriental Christianity, Islam, Hinduism and Buddhism are performing badly on this account, and there is also a clear global North-South divide for these phenomena. 4) Conclusions: Challenging patriarchic gender ideologies and fundamentalist conceptions of the relationship between religion and state, which are important drivers of Antisemitism, will be an important task in the future. Multiculturalism must be aware of prejudice, patriarchy and religious fundamentalism in the global South.
    [Show full text]
  • A Meta-Analysis of the Effect of Concurrent Web Options on Mail Survey Response Rates
    When More Gets You Less: A Meta-Analysis of the Effect of Concurrent Web Options on Mail Survey Response Rates Jenna Fulton and Rebecca Medway Joint Program in Survey Methodology, University of Maryland May 19, 2012 Background: Mixed-Mode Surveys • Growing use of mixed-mode surveys among practitioners • Potential benefits for cost, coverage, and response rate • One specific mixed-mode design – mail + Web – is often used in an attempt to increase response rates • Advantages: both are self-administered modes, likely have similar measurement error properties • Two strategies for administration: • “Sequential” mixed-mode • One mode in initial contacts, switch to other in later contacts • Benefits response rates relative to a mail survey • “Concurrent” mixed-mode • Both modes simultaneously in all contacts 2 Background: Mixed-Mode Surveys • Growing use of mixed-mode surveys among practitioners • Potential benefits for cost, coverage, and response rate • One specific mixed-mode design – mail + Web – is often used in an attempt to increase response rates • Advantages: both are self-administered modes, likely have similar measurement error properties • Two strategies for administration: • “Sequential” mixed-mode • One mode in initial contacts, switch to other in later contacts • Benefits response rates relative to a mail survey • “Concurrent” mixed-mode • Both modes simultaneously in all contacts 3 • Mixed effects on response rates relative to a mail survey Methods: Meta-Analysis • Given mixed results in literature, we conducted a meta- analysis
    [Show full text]
  • (“Quasi-Experimental”) Study Designs Are Most Likely to Produce Valid Estimates of a Program’S Impact?
    Which Comparison-Group (“Quasi-Experimental”) Study Designs Are Most Likely to Produce Valid Estimates of a Program’s Impact?: A Brief Overview and Sample Review Form Originally published by the Coalition for Evidence- Based Policy with funding support from the William T. Grant Foundation and U.S. Department of Labor, January 2014 Updated by the Arnold Ventures Evidence-Based Policy team, December 2018 This publication is in the public domain. Authorization to reproduce it in whole or in part for educational purposes is granted. We welcome comments and suggestions on this document ([email protected]). Brief Overview: Which Comparison-Group (“Quasi-Experimental”) Studies Are Most Likely to Produce Valid Estimates of a Program’s Impact? I. A number of careful investigations have been carried out to address this question. Specifically, a number of careful “design-replication” studies have been carried out in education, employment/training, welfare, and other policy areas to examine whether and under what circumstances non-experimental comparison-group methods can replicate the results of well- conducted randomized controlled trials. These studies test comparison-group methods against randomized methods as follows. For a particular program being evaluated, they first compare program participants’ outcomes to those of a randomly-assigned control group, in order to estimate the program’s impact in a large, well- implemented randomized design – widely recognized as the most reliable, unbiased method of assessing program impact. The studies then compare the same program participants with a comparison group selected through methods other than randomization, in order to estimate the program’s impact in a comparison-group design.
    [Show full text]
  • How to Design and Create an Effective Survey/Questionnaire; a Step by Step Guide Hamed Taherdoost
    How to Design and Create an Effective Survey/Questionnaire; A Step by Step Guide Hamed Taherdoost To cite this version: Hamed Taherdoost. How to Design and Create an Effective Survey/Questionnaire; A Step by Step Guide. International Journal of Academic Research in Management (IJARM), 2016, 5. hal-02546800 HAL Id: hal-02546800 https://hal.archives-ouvertes.fr/hal-02546800 Submitted on 23 Apr 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. International Journal of Academic Research in Management (IJARM) Vol. 5, No. 4, 2016, Page: 37-41, ISSN: 2296-1747 © Helvetic Editions LTD, Switzerland www.elvedit.com How to Design and Create an Effective Survey/Questionnaire; A Step by Step Guide Authors Hamed Taherdoost [email protected] Research and Development Department, Hamta Business Solution Sdn Bhd Kuala Lumpur, Malaysia Research and Development Department, Ahoora Ltd | Management Consultation Group Abstract Businesses and researchers across all industries conduct surveys to uncover answers to specific, important questions. In fact, questionnaires and surveys can be an effective tools for data collection required research and evaluation The challenge is how to design and create an effective survey/questionnaire that accomplishes its purpose.
    [Show full text]
  • Questionnaire Analysis Using SPSS
    Questionnaire design and analysing the data using SPSS page 1 Questionnaire design. For each decision you make when designing a questionnaire there is likely to be a list of points for and against just as there is for deciding on a questionnaire as the data gathering vehicle in the first place. Before designing the questionnaire the initial driver for its design has to be the research question, what are you trying to find out. After that is established you can address the issues of how best to do it. An early decision will be to choose the method that your survey will be administered by, i.e. how it will you inflict it on your subjects. There are typically two underlying methods for conducting your survey; self-administered and interviewer administered. A self-administered survey is more adaptable in some respects, it can be written e.g. a paper questionnaire or sent by mail, email, or conducted electronically on the internet. Surveys administered by an interviewer can be done in person or over the phone, with the interviewer recording results on paper or directly onto a PC. Deciding on which is the best for you will depend upon your question and the target population. For example, if questions are personal then self-administered surveys can be a good choice. Self-administered surveys reduce the chance of bias sneaking in via the interviewer but at the expense of having the interviewer available to explain the questions. The hints and tips below about questionnaire design draw heavily on two excellent resources. SPSS Survey Tips, SPSS Inc (2008) and Guide to the Design of Questionnaires, The University of Leeds (1996).
    [Show full text]
  • Questions Planned for the 2020 Census and American Community Survey Federal Legislative and Program Uses
    Questions Planned for the 2020 Census and American Community Survey Federal Legislative and Program Uses Issued March 2018 This page is intentionally blank. Contents Introduction . 1 Protecting the Information Collected by These Questions . 2 Questions Planned for the 2020 Census . 3 Age .......................................................................................... 5 Citizenship. 7 Hispanic Origin ................................................................................ 9 Race. 11 Relationship ................................................................................... 13 Sex. 15 Tenure (Owner/Renter) ......................................................................... 17 Operational Questions for use in the 2020 Census. ................................................. 19 Questions Planned for the American Community Survey . 21 Acreage and Agricultural Sales .................................................................. 23 Age .......................................................................................... 25 Ancestry. 27 Commuting (Journey to Work) .................................................................. 29 Computer and Internet Use ..................................................................... 31 Disability. 33 Fertility. 35 Grandparent Caregivers ........................................................................ 37 Health Insurance Coverage and Health Insurance Premiums and Subsidies ........................... 39 Hispanic Origin ...............................................................................
    [Show full text]