<<

CHAPTER NINE

Survey

PBNNY S. VISSBR, JON A. KROSNICK, AND PAUL J. LAVRAWS

Social have long recognized that every Frey, 1989; Lavrakas, 1993; Weisberg, Krosnick, 6 method of scientific inquiry is subject to limitations Bowen, 1996).We begin the chapter by suggestingwhy and that choosing among research methods inherently research may be valuable to sodal psychologists involves trade-offs. With the control of a and then outline the utility of various study designs. , for example, comes an artificiality that Next, we review the basics of survey and raises questions about the generalizability of results. design. Fiy,we describe procedures And yet the 'naturalness" of a field study or an observa- for pretesting questionnairesand for collection. tional study can jeopardize the of causal infer- ences. The inevitability of such limitations has led many REASONS FOR SOCIAL PSYCHOLOGISTS TO methodologists to advocate the use of multiple meth- CONDUCT SURVEY RBSEARCH ods and to insist that substantive conclusions can be most confidently derived by triangulating across mea- Social psychologists are interested in understanding sures and methods that have nonoverlapping strengths how people influence, and are influenced by, their so- and weaknesses (see, e.g., Brewer, this volume, Ch. 1; cial environment. And to the extent that social psy- Cpmpbell 6 Piske, 1959; Campbell 6 Stanley, 1963; chological phenomena are universal across different Cook 6 Campbell, 1969; Crano 6 Brewer, 1986; B. types of people, it makes little difference precisely with Smith, this volume, Ch. 2). whom social is conducted - This chapter describes a research that even data collected from samples that are decidedly un- we believe has much to offer social psychologists in- representative of the general population can be used to terested in a multimethod approach: survey research. draw about that population. Survey research is a specific type of field study that in- In recent years, however, psychologists have be- volves the collection of data from a of ele- come increasingly sensitive to the impact of disposi- ments (e.g., adult women) drawn from a well-defined tional and contextual factors on human thought and population (e.g., all adult women living in the United social behavior. Instead of broad statements about uni- States) through the use of a questionnaire (for more versal processes, social psychologists today are far more lengthy discussions, see Babbie, 1990; Fowler, 1988; likely to offer qualified accounts of which people, un- der which conditions, are likely to exhibit a partic- ular psychological or process. And ac- This chapter was completed while the second author was a cordingly, social psychologists have increasingly turned Fellow at the Center for Advanced Study in the Behavioral their attention to interactions between various social Sciences, supported by National Science Foundation Grant psychological processes and characteristics of the indi- SBR-9022192. Correspondence should be directed to Penny S. vidual, such as personality traits, identification with a Visxr, Department of , Princeton University, Prince- ton, New Jersey 08544 (e-mail:pvisseSPrinceton.edu). or Jon or category, or membership in a distinct A. Krosnick. Department of Psychology, Ohio State University, culture. In many cases, the nature of basic social psy- 1885 Neil Avenue, Columbus, Ohio 43210 (e-mail: Krosnick chological processes has been shown to depend to a @osu.edu). large degree on characteristics of the individual. 2U PENNY S. VISSER, JON A. KROSNICK, AND PAUL J. LAVRAKAS

The process by which attitude change occurs, for 6 Arenberg, 1983; Nesselroade 6 Baltes, 1974), have example, has been shown to differ for people who are more weakly established self-images (Mortimer, Pinch, low and high in what Petty and Cacioppo ( 1986) have 6 Kumka, 1982), and have less well-developed social termed 'need for cognition," a general enjoyment of identities (Alwin et al., 1991) than-do older adults. and preference for effordul thinking. Attitude change Because of these kinds of differences, Sears (1986) among people high in need for cognition tends to be argued, the field's reliance on participant pools of mediated by careful scrutiny of the in a per- college-aged adults raises questions about the gener- suasive appeal, whereas attitude change among people ahability of some findings from social psychological low in need for cognition tends to be based on cues in laboratory research and may have contributed to a dis- the persuasive message or context, such as the attrac- torted portrait of 'human nature." However, the evi- tiveness of the source. dence Sears (1986) cited largely reveals the prevalence Similarly, attributions have been shown to differ de- of certain characteristics (e.g., the frequency of atti- pending on social group membership (see, e.g., Hew- tude change or the firmness of social identities), rather stone, Bond, 6 Wan, 1983). People tend to attribute than differences in the processes by which these char- positive behaviors by members of their own social acteristics or others emerge in different age groups. We group or category to stable, internal causes. Those same currently know so little about the operation of social positive behaviors performed by a member of a differ- psychological processes in other of the popula- ent social group, however, are more likely to be at- tion that it is impossible to assess the extent of in tributed to transitory or external factors. this regard. According to much recent research, culture may Doing so will require studies of samples that are also moderate many social psychological phenomena. representative of the general population, and induc- Markus and Kitayama (1991), for example, have ar- ing most members of such samples to visit a labora- gued that members of different cultures have differ- tory seems practically impossible. Studying a repre- ent construals of the self and that these differences sentative sample through , however, is can have a profound impact on tlye nature of cogni- relatively easy and surprisingly practical. Using the ba- tive, emotional, and mo@vationalpmceses. Similarly, sic tenets of probability , survey researchers have Nisbett and his colleagues'(Coheh, Nisbett, Bowdle, 6 developed a number of effident strategies for drawing Schwan, 1996; Nisbra, 1993; Nisbett 6 Cohen, 1996) representative samples that are easy to contact. And have explored what they tcnned the 'culture of honorw when samples have been selected in such a manner, of the American South and have demonstrated marked social psychologists can confidently generalize findings differences in the cognitive, eimotiod, behavioral, and to the entire population. Furthermore, survey research even physiological reactions af southem men (relative provides conditions for the exploration of Pro- to their northern counterparts) when confronted with cess x Individual Difference interactions because care- insult. fully selected samples reflect the full heterogeneity of These kinds of process-by-indi--difference in- the general population. teractions suggest that precisely who participates in so- There are two primary limitations of survey research cial psychological research can have a profound im- for social psychologists. First, surveys are more expen- pact on what results are obtained. And of course, for sive and time-consuming than most laboratory experi- the vast majority of social psychological research, that ments using captive participant pools. However, many 'who" has been the infamous college sophomore. Sears cost-saving approaches can be implemented. Second is (1986) has argued that the field's overwhelming re- the impracticality of executing elaborate scripted sce- liance on this narrow base of research -ts narios for social interaction, especially ones involving may represent a serious problem for sodal psychol- deception. Whereas these sorts of events can be ae- ogy. Pointing to various attributes that are charac- ated in labs with undergraduate participants, they are teristic of young adults, Sears (1986) suggested that tougher to do in the field. But as .we discuss shortly, the 'college sophomore" partidpant pool is unrep- many experimental procedures and manipulations can resentative of the general population in a num- be incorporated in surveys. ber of important ways. Among other things, young Put simply, can happily proceed adults are more susceptible to attitude change (Alwin, doing most of our research with college sophomores, Cohen, 6 Newcomb, 1991; Glenn, 1980; Krosnick 6 assuming that our findings generalize. And we can live Alwin, 1989; Sears, 1983), exhibit less stable person- with the skepticism of scholars from other disciplines ality traits (Caspi, Bem, 6 Elder, 1989; Costa, McCrae, who question that generalizabiity, having documented the profound impact that context and history have on that can be brought about by respondents' own be- social processes. Or we can accept the challenge and havior (e.g., misreporting true attitudes, failing to pay explore the replicabiity of our findings in the general close attention to a question), interviewer behavior population. Either we will confirm our assumptions of (e.g., misrecording responses, providing cues that lead generalizability or we will reline our by adding partidpants to respond in one way or another), and to them new mediators and moderators. The explica- the questionnaire (e.g., ambiguous or confusing ques- tion of the survey method offered below is intended to tion wording, biased question wording or response help those who accept the challenge. options). The total swey error perspective advocates explic- itly taking into consideration each of these sources of error and making decisions about the allocation of fi- Even researchers who recognize the value of swey nite resources with the goal of reducing the sum of the methodology for social psychological inquiry are some- four. In the sections that follow, we consider each of times reluctant to initiate swey research because of these potential sources of survey error and their im- misconceptions regarding the feasibility of conducting plications for psychologists seeking to balance prag- a survey on a limited budget. And indeed, the cost of matic budget considerations against concerns about prominent large- national sweys conducted by data quality. major swey organhtions are well outside of the R- social search budgets of most psychologists. But survey STUDY DESIGNS methodologists have recently begun to rekindle and expand the early work of Hansen and his colleagues Sweys offer the opportunity to execute studies with (e.g., Hansen 6 Madow, 1953)in thinking about survey various &signs, each of which is suitable for addressing design issues within an explicit cost-benefit framework particular research questions of long-standing interest geared toward helping researchers make design dea- to social psychologists. In this section, we will review sions that maximhe data quality within the constraints several standard designs, including aoss-sectional, re- of a limited budget. This approach to survey methodol- peated cross-sectional, panel, and mixed designs, and ogy, known as the 'total swey error" perspective (d. discuss when each is appropriate for social psychologi- Dillman, 1978, Fowler, 1988; Groves, 1989; Lavrakas, cal investigation. We will also review the incorporation 1993), can provide social psychologists with a broad of within sweys. framework and specific guidelines for making decisions to conduct good sweyson limited budgets while max- Cross-Sectional Surveys imizing data quality. The total survey error perspective recognizes that Cross-sectional surveys involve the collection of the ultimate goal of swey research is to accurately data at a single point in time from a sample drawn from measure particular constructs within a sample of a speci6ed population. This design is most often used people who represent the population of interest. In any to document the prevalence of particular characteristics given survey, the overall deviation from this ideal is the in a population. For example, aoss-sectional sweys cumulative result of several sources of survey error. are routinely conducted to assess the frequency with Spedfically, the total swey error perspective disag- which people perform certain behaviors or the num- gregates overall error into four components: coverage ber of people who hold particular attitudes or beliefs. error, , nonresponse error, and measure- However, documenting prevalence is typically of little ment error. Cwage mor refers to the bias that can interest to social psychologists, who are usually more result when the pool of potential survey partidpants interested in documenting assodations between vari- from which a sample is selected does not include some ables and the causal processes that give rise to those portions of the population of interest. Smnpling mor associations. refers to the random differences that invariably exist Cross-sectional surveys do offer the opportunity to between any sample and the population from whicb it assess relations between variables and differences be- was selected. Nonresponse mris the bias that can re- tween subgroups in a population. But although many sult when data are not collected from all of the mem- scholars believe their value ends there, this is not the bers of a sample. And musumnmt mor refers to all case. Cross-sectional data can be used to teit causal distortions in the assessment of the construct of inter- hypotheses in a number of ways. For example, using est, including systematic and random statistical techniques such as two- least squares 226 PENNY S. YISSBR, JON A. KROSNICK, AND1 PAUL J. LAVRAWS

regression, it is possible to estimate the causal impact most when and -formation processes of variable A on variable B, as well as the effect of vari- are not firmly grounded in past experience and in sup- able B on variable A (Blalock, 1972). Such an anal- porting knowledge bases. From a political viewpoint, ysis rests on important assumptions about causal re- this finding suggests that news priming may not lations among variables, but these assumptions can be especially politically consequential in nations where be tested and revised as necessary (see, e.g., James 6 political expertise is high throughout the population. Singh, 1978). Furthermore, path analytic techniques can be applied to test hypotheses about the mediators of Repeated Cross-Sectional Surveys causal relations (Baron 6 Kenny, 1986; Kenny, 1979), thereby validating or challenging notions of the psy- Additional evidence consistent with a hypothesized chological mechanisms involved. And cross-sectional causal relation would be that changes over time in data can be used to identify the moderators of rela- a dependent variable parallel changes in a proposed tions between variables, thereby also shedding some independent variable. One way to generate such evi- light on the causal processes at work (e.g., Krosnick, dence is to conduct repeated aoss-sectional surveys, 1988b). in which data are collected from independent sam- A single, cross-sectional survey can even be used to ples drawn from the same population at two or more assess the impact of a social event. For example, Kros- points in time. If a hypothesized causal relation exists nick and Kinder ( 1990)studied priming in a real-world between two variables, between-wave changes in the setting by focusing on the IranIContra scandal. On independent variable should be mirrored by between- November 25,1986, the American public learned that wave changes in the dependent variable. Fir example, members of the National Security Council had been if one that interracial contact may reduce inter- funneling funds (earned through arms sales to Iran) to racial prejudice, an increase in interracial contact over the Contras fighting to overthrow the Sandinista gov- a period of years in a sodety should be paralleled by a ernment in Nicaragua. Although there had been almost reduction in interracial prejudice. no national news media attention to Nicaragua and the One study along these lines was reported by Schu- Contras previously, this revelation led to a dramatic in- man, Steeh, and Bobo (1985). Using aoss-sectional crease in the salience of that country in the American surveys conducted between the 1940s and the-1980s press during the following weeks. Krosnick and Kinder in the , these investigators documented suspected that this coverage might have primed Amer- dramatic increases in the prevalence of positive atti- icans' attitudes toward U.S. involvement in Nicaragua tudes toward principles of equal treatment of Whites and thereby increased the impact of these attitudes and Blacks. And there was every reason to believe that on of President Ronald Reagan's job these general principles might be important detenni- performance. nants of people's attitudes toward specific government To test this , Krosnick and Kinder ( 1990) efforts to ensure equality. However, there was almost took advantage of the that for the no shift during these years in public attitudes toward 1986 National Election Study, a' national survey, was specific implementation strategies. This challenges the underway well before November 25 and continued notion that the latter attitudes were shaped powerfully well after that date. So these investigators simply split - by the general principles. the survey sample into one group of respondents who Repeated aoss-sectional surveys can also be used to had been interviewed before November 25 and the study the impact of social events that occurred between others, who had been interviewed afterward. As ex- the surveys (e.g., Weisberg, Haynes, 6- Krosnick, 1995). pected, overall assessments of presidential job perfor- And repeated cross-sectional surveys can be combined mance were based much more strongly on attitudes into a single data set for statistical analysis, using in- toward U.S. involvement in Nicaragua in the second formation from one survey to estimate parameters in group than in the first group. another survey (e.g., Brehm 6 Rahn, 1997). Furthermore, Krosnick and Kinder (1990) found that this priming effect was concentrated primarily Panel Surveys among people who were not especially knowledgeable about (so-called 'political novices"), a finding In a panel survey, data are collected from the same permitted by the heterogeneity in political expertise in people at two or more points in time. Perhaps the most a national sample of adults. From a psychological view- obvious use of panel data is to assess the stability of psy- point, this suggests that news media priming occurs chological constructs and to identify the determinants SURVEY RESEARCH 227 of stability (e.g., Krosnick, 1988a; Krosnick 6 Alwin, such data. Their interest was in the impact of the Gulf ' 1989). But with such data, one can test causal hypothe- War on the ingredients of public evaluations of presi- ses in at least two ways. First, one can examine whether dential job performance. For the 1990-1991 National individual-level changes over time in an independent Election Panel Study of the Political Consequences of variable correspond to individual-level changes in a War, a panel of respondents had been interviewed first dependent variable over the same period of time. So, in late 1990 (before the Gulf War) and then again for example, one can ask whether people who expe- in mid-1991 (after the war). The war brought with rienced increasing interracial contact manifested de- it tremendous news coverage of events in the Gulf, creasing racial prejudice, while at the same time, people and Krosnick and Brannon suspected that this cover- who experienced decreasing interracial contact mani- age might have primed attitudes toward the Gulf War, fested increasing racial prejudice. thereby increasing their impact on public evaluations of Second, one can assess whether changes over time President George Bush's job performance. This hypoth- in a dependent variable can be predicted by prior levels esis was confirmed by comparing the determinants of of an independent variable. So, for example, do people presidential evaluations in 1990 and 1991. Because the who had the highest amounts of interracial contact at same people had been interviewed on both occasions, Time 1 manifest the largest decreases in racial preju- this demonstration is not vulnerable to a possible alter- dice between Time 1 and Time 2. Such a demonstra- native of the Krosnick and Kinder (1990) tion provides relatively strong evidence consistent with results described above: that different sorts of people a causal hypothesis, because the changes in the depen- were interviewed before and after the IranlContra rev- dent variable could not have caused the prior levels elation and their preestablished presidential of the independent variable (see, e.g., Blalock, 1985; strategiesmay have produced the patterns of regression Kessler 6 Greenberg, 1981, on the methods; see Rahn, coefficients that would then have been misdiagnosed Krosnick, 6. Breuning, 1994, for an illusuation of its as evidence of news media priming. application). Panel surveys do have some disadvantages. First, al- One application of this approach occurred in a study though people are often quite willing to participate in of a long-standing social psychological idea called the a single cross-sectional survey, fewer may be willing to projedm hypothesis. Rooted in cognitive consistency complete multiple . Furthermore, with each theories, it proposes that people may overestimate the additional wave of panel data collection, it becomes extent to which they agree with others whom they like, increasingly difficult to locate respondents to reinter- and they may overestimate the extent to which they view them, because some people move to different lo- disagree with others whom they dislike. By the late cations, some die, and so on. This may threaten the 19805, a number of cross-sectional studies by political representativeness of panel surve* samples if the mem- psychologists yielded correlations consistent with the bers of the first-wave sample who agree to participate notion that people's perceptions of the policy stands of in several waves of data collection differ in meaning- presidential candidates were distorted to be consistent ful ways from the people who are interviewed initially with attitudes toward the candidates (e.g., Granberg, but do not agree to partidpate in subsequent waves of 1985; Kinder, 1978). However, there were alterna- interviewing. tive theoretical interpretations of these correlations, so Also, participation in the initial survey may sen- an analysis using panel survey data seemed in order. sitize respondents to the issues under investigation, Krosnick (199 la) did just such an analysis exploring thus changing the phenomena being studied. As a re- whether attitudes toward candidates measured at one sult, respondents may give special attention or thought time point could predict subsequent shifts in percep- to these issues, which rhay have an impact on sub- tions of presidential candidates' issue stands. And he sequent survey responses. For example, Bridge et al. found no projection at all to have occurred, thereby ( 1977) demonstrated that individuals who participated suggesting that the previously documented correla- in a survey about health subsequently con- tions were more likely due to other processes (e.g., sidered the topic to be more important. And this in- deciding how much to like a candidate based on agree- creased importance of the topic can be translated into ment with him or her on policy issues; see Byrne, 1971; changed behavior. For example, people interviewed Krosnlck, 1988b). about politics are subsequently more likely to vote The impact of social events can be gauged especially in elections (Granberg 6. Holmberg, 1992; Kraut 6 powerfully with panel data. For example, Krosnick and McConahay, 1973; Yalch, 1976). Even answering a Brannon (1993) studied news media priming using single survey question about one's intention to vote increases the likelihood that an individual will turn out of affirmative action have characterized it as entailing to vote on election day (Greenwald, Carnot, Beach, 6. reverse disaimination against qualified White candi- Young, 1987). dates; other times, opponents have characterized affir- Finally, panel survey respondents may want to ap- mative action as giving unfair advantages to minority pear consistent in their responses across waves. There- candidates. Did this difference in framing change the fore, people may be reluctant to report opinions or be- way the general public formed opinions on the issue? haviors that appear inconsistent with what they had To answer this question, Kinder and Sanders (1990) reported during earlier interviews. The desire to appear asked White respondents in a national survey about consistent could mask genuine changes over time. whether they favored or opposed affirmative action Researchers can capitalize on the strengths of each programs in hiring and promotions and in college ad- of the designs discussed above by incorporating both missions. Some respondents were randomly assigned cross-sectional and panel surveys into a single study. to receive a description of opposition to affirmative ac- If, for example, a researcher is interested in conduct- tion as emanating from the that it involves re- ing a two-wave panel survey but is concerned about verse dwmmmtion.. .. Other respondents, again selected carry-over effects, he or she could conduct an addi- randomly, were told that opposition to affirmative ac- tional cross-sectional survey at the second wave. That tion emanates from the belief that it provides unfair is, the identical questionnaire could be administered advantages to minorities. to both the panel respondents and to an independent This experimental manipulation of the framing of sample drawn from the same population. Significant opposition did not alter the percentages of people who differences between the data collected from these two said they favored or opposed affirmative action, but it samples would suggest that carry-over effects were, in did alter the processes by which those opinions were fact, a problem in the panel survey. In effect, the cross- formed. When affirmative action was framed as giving sectional survey respondents can serve as a 'control unfair advantage to minorities (thereby making minor- group" against which panel survey respondents can be ity group members salient), it evoked more anger, dis- compared. gust, and fury from respondents, and opinions were based more on general racial prejudice, on intoler- ance of diversity in , and on belief in general Experiments within Surveys moral decay in society. But when affirmative action Additional evidence of causal processes can be doc- was framed as reverse discrimination against qualified umented in surveys by building in experiments. If re- Whites (thereby making Whites more salient), opin- spondents are randomly assigned to 'treatment' and ions were based more on the perceived material inter- 'control" groups, differences between the two groups ests of the respondent and of Whites as a group. can then be attributed to the treatment. Every one of Because Kinder and Sanders (1990) analyzed data the survey designs desaibed above can be modified to from a national survey, respondents varied a great deal incorporate experimental manipulations. Some suryq in terms of their political expertise. Capitalizing on respondents (selected at random) can be exposed to this diversity, Kinder and Sanders found that the im- one version of a questionnaire, whereas other respon- pact of framing was concentrated nearly exclusively dents are exposed to another version. Differences in re- among political novices. This reinforced the implica- sponses can then be attributed to the specific elements tion of Krosnick and Kinder's (1990) finding regard- that were varied. ing political expertise in their research on news media Many social psychologists are aware of examples of priming described earlier. survey research that have incorporated experiments to Sniderman and Tetlock (1986) and Sniderman, explore effects of question order and question word- Tetlock, 6. Peterson ( 1993) have also conducted exper- ing (see, e.g., Schuman 6 Presser, 1981). Less salient iments within surveys to assess whether conservative are the abundant examples of experiments within sur- values encourage racial prejudice in judgments about veys that have been conducted to explore other social who is entitled to public assistance and who is not. In psychological phenomena. their studies, respondents were told about a hypothet- ical person in need of public assistance. Different re- RACISM. One experimental study within a survey spondents were randomly assigned to receive different was reported by Kinder and Sanders (1990),who werv descriptions of the person, varying in terms of previous interested in the impact of public debates on public work history, marital and parental stams, age, and race. opinion on affirmative action. Sometimes, opponents Interestingly, conservatives did not exhibit prejudice against Blacks when deading whether he or she should who were described as having been model prisoners. receive public assistance, even when the person was This suggests that stereotypes can have relatively little said to have violated traditional values (e.g., by being a impact on public policy debates if didcussions focus on single parent or having a history of being an unreliable counterstereotypic perpetrators. worker). And in fact, when presented with an individ- ual who had a history of being a reliable worker, con- MOOD AND LIFE SATISFACTION. SchwaI'Z and servatives were substantially more supportive of public Clore (1983) conducted an experiment in a survey to assistance for Blacks than for Whites. However, con- explore mood and misattribution. They hypothesized servatives were significantly less supportive of public that general affective states can sometimes influence policies designed to assist Blacks as a group and were judgments via misattribution. Specifically, these inves- more likely to believe that Blacks are irresponsible and tigators presumed that weather conditions (sunny vs. lazy. Sniderman and Tetlock (1986) concluded that a cloudy) influence people's moods, which in turn may key condition for the expression of racial discrimina- influence how happy they say they are with their lives. tion is therefore a focus on groups rather than individ- This presumably occurs because people misattribute ual members of the groups and that a generally conser- their current mood to the general conditions of their vative orientation does not encourage individual-level lives, rather than to the weather conditions that hap- discrimination. pen to be occurring when they are asked to make the Peffley and Hurwitz (1997; Peffley, Hurwitz, 6 judgment. As a result, when people are in good moods, Sniderman, 1997) also conducted experiments within they may overstate their happiness with their lives. surveys to explore the impact of racial stereotypes on To test this hypothesis, Schwam and Clore (1983) judgments regarding . These investigators hy- / conducted telephone interviews with people on either pothesized that although some Americans may hold sunny or cloudy days. Among respondents who were negative stereotypes of Blacks, those stereotypes may randomly selected to be asked simply how happy they only be used to make judgments about criminal acts were with their lives, those interviewed on sunny days or public policies regarding crime when the perpetra- reported higher satisfaction than people interviewed tors have characteristics consistent with those negative on cloudy days. But among people randomly selected stereotypes. Therefore, when debates about govern- to be asked first, 'By the way, how's the weather down ment policy focus on counterstereotypic African Amer- there?", those interviewed on sunny days reported ican perpetrators, publicattitudes may not be especially identical leveb of life satisfaction to those interviewed driven by stereotypes. on cloudy days. The question about the weather pre- In one experiment, respondents in a representative sumably led people to properly attribute some of their sample survey were told about a man accused of com- current mood to current weather conditions, thereby mitting a crime and were asked how likely he was insulating subsequent life satisfaction judgments from to have actually committed it and how likely he was influence. to commit a similar crime in the future. Respondents were randomly assigned to be told that the accused aPB BBNBPFFS OF BXPBRlMBNTS WFFHIN SURVEYS. was either Black or White, and they were randomly What is the benefit of doing these experiments in repre- assigned to be told that the crime was either violent or sentative sample surveys? Couldn't they instead have not violent. When the perpetrator was Black and the been done just as well in laboratory settings with col- crime was violent (and thereby consistent with some lege undergraduates? Certainly, the answer to this lat- negative stereotypes of Black criminals), respondents ter question is yes; they could have been done as tradi- who held especially negative stereotypes of Blacks were tional social psychological experiments. But the value more likely than others to say the person had com- of doing the studies within representative sample sur- mitted the crime and would do so again. But when veys is at least three-fold. First, survey evidence docu- the crime was not violent or when the perpetrator ments that the phenomena are widespread enough to was White, stereotypes of Blacks had no impact on be observable in the general population. This bolsters judgments of guilt or likelihood of recidivism. In an- the apparent value of the findings in the eyes of the other experiment, respondents with especially nega- many nonpsychologists who instinctively question the tive stereotypes of Blacks were more opposed than generaliability of laboratory 5ndings regarding under- others to furlough programs for Blacks convicted of graduates. committing violent , but were not especially op- Second, estimates of effect sue from surveys pro- posed to furlough programs for Whites or for Blacks vide a more accurate basis for assessing the significance 230 PENNY S. MSSBR, JON A. KROSNICK, AND PAUL J. LAVRAKAS that any social psychological process is likely to have method has been used. When elements have been se- in the course of daily life. Effects that seem large in lected through other procedures or when portions of the lab (perhaps because undergraduates are easily the population had no chance of being included in the influenced) may actually be quite small and thereby sample, there is no way to know whether the sam- much less socially consequential in the general popu- ple is representative of the population. Generalizations lation. And third, general population samples allow re- beyond the specific elements in the sample are there- searchers to explore whether attributes of people that fore only warranted when probability sampling meth- are homogeneous in the lab but vary dramatically in ods have been used. the general population (e.g., age, educational attain- The second advantage of probability sampling is that ment) moderate the magnitudes of effects or the pro- it permits researchers to precisely estimate the amount cesses producing them (e.g., Kinder 6. Sanders, 1990). of variance present in a given data set that is due to sam- pling error. That is, researchers can calculate the degree to which random differences between the sample and SAMPLING the are likely to have diminished the Once a survey design has been specified, the next step predsion of the obtained estimates. Probability sam- in a survey investigation is selecting a sampling method pling also permits researchers to construct confidence (see, e.g., Henry, 1990; Kalton, 1983; Kish, 1965; Sud- intervals around their parameter estimates, which in- man, 1976). One need not look far in the social sci- dicate the predsion of the point estimates. ence literature to find examples where the conclusions of studies were dramatically altered when proper sam- SIMPLE RANDOM SAMPLING. Simple random sam- pling methods were used (see, e.&, Laurnann, Michael, pling is the most basic form of probability sampling. Gagnon, 6. Michaels, 1994). In &is section, we ex- With this method, elements are drawn from the pop- plain a number of sampling methods and discuss their ulation at random, and all elements have the same strengths and weaknesses. In this discussion, the term chance of being selected. Simple random sampling can elnncnt is used to refer to the individual unit about be done with or without replacement, where replace- which is sought. In most studies, elements ment refers to returning selected elements to the pop- are the people who make up the population of inter- ulation, making them eligible to be selected again. In est, but elements can also be groups of~pple,such as practice, sampling without replacement (i.e., so that families, corporations, or departments. k population is each element has the potential to be selected only once) the complete group of elements to which one wishes is most common. to generalize findings obtained from a sample. Although conceptually a very straightforward pro- cedure, in practice, simple random sampling is rela- tively difficult and costly to execute. Its main disad- Probability Sampling vantage is that it requires that all members of the There are two general classes of sampling methods: population be identified so that elements can be in- nonprobability and probability sampling. NonprAbil- dependently and directly selected from the full popula- ity sampling refers to selection procedures in which el- tion listing (the sampling frame). Once this has been ac- ements are not randomly selected from the population complished, the is drawn from or some elements have unknown probabilities of being the frame by applying a series of random numbers that selected. Probability sampling refers to selection proce- lead to certain elements being chosen and others not. dures in which elements are randomly selected from In many cases, it is impossible or impractical to enumer- the sampling frame and each element has a known, ate every element of the population of interest, which nonzero chance of being selected. This does not require rules out simple random sampling. that all elements have an equal probability, nor does it preclude some elements from having a certain (1.00) SYSTBMATIC SAMPLING. Systematic sampling is a probability of selection. However, it does require that slight variation of simple random sampling that is more the selection of each element must be independent of convenient to execute (e.g., see Ahlgren, 1983). Like the selection of every other element. simpk random sampling, systematic sampling requires Probability sampling affords two important advan- that all elements be identified and listed. Based on the tages. First, researchers can be confident that a selected number of elements in the population and the desired sample is representative of the larger population from sample size, a sampling interval is determined. For ex- which it was drawn only when a probability sampling ample, if a population contains 20,000 elements and a sample of 2,000 is desired, the appropriate sampling in- s- SAMPLING. is a terval would be 10. That is, every 10th element would slight variation of random and systematic sampling, be selected to arrive at a sample of the desired size. where the sampling frame is divided into subgroups To start the sampling process, a random number be- (i.e., strata), and the sampling process is executed sep- tween one and 10 is chosen, and the element on the arately on each stratum (e.g., see Ross, 1988; Stapp list that corresponds to this number is included in the 6- Fulcher, 1983). In the example above, the sampling sample. This randomly selected number is then used as frame could be divided into categories (e.g., husbands the starting point for choosing all other elements. Say, and wives), and elements could be selected from each for example, the randomly selected starting point was category by either a random or systematic method. 7 in a systematic sample with a sampling interval of Stratified sampling provides greater control over the 10. The 7th element on the list would be the first to be composition of the sample, assuring the researcher of included in the sample, followed by the 17th element, representativeness of the sample in terms of the strati- the 27th element, and so forth.' fication variable(s).When the stratification variable is It is important to note that systematic sampling will related to the dependent variable of interest, stratified yield a sample that is representative of the sampling sampling reduces sampling error below what would re- frame from which it was drawn only if the elements sult from simple random sampling. composing the list have been arranged in a random or- Stratification that involves the use of the same sam- der. When the elements are arranged in some nonran- pling fraction in each stratum is referred to as pro- dom pattern, systematic sampling will not necessarily portional stratified sampling. Disproportional stratified yield samples that are representative of the populations sampling - using different sampling fractions in differ- from which they are drawn. This potential problem is ent strata - can also be done. This is typically done exacerbated when the elements are listed in a cyclical when a researcher is interested in reducing the stan- pattern. If the cyclical pattern of elements coincided dard error in a stratum where the is with the sampling interval, one would draw a distinctly expected to be high. By inaeasing the sampling frac- unrepresentativesample. tion in that stratum, he or she can increase the number To illustrate this point, consider a researcher in- of elements allocated to the stratum. This is often done terested in drawing a systematic sample of men and to ensure large enough subsamples for subpopulation women who had sought marital counseling within the analyses. For example, survey researchers sometimes last 5 years. Suppose he br she obtained a sampling increase the (often called oversam- frame consisting of a list of individuals meeting this cri- pling) for minority groups in national surveys so that terion, arranged by couple: each husband's name listed reliable parameter estimates can be generated for such first, followed by the wife's name. If the researcher's , subgroups. randomly chosen sampling interval was an even num- Stratification requires that researchers know in ber, he or she would end up with a sample composed advance which variables represent meaningful dis- exclusively of women or exclusively of men, depend- tinctions between elements in the population. In the ing on the random start value. This problem is referred example above, was assumed to be an impor- to as periodidly, and it can be easily avoided by random- tant dimension, and substantive differences were ex- izing the order of elements within the sampling frame pected to exist between men and woman who had before applying the selection scheme. sought marital counseling in the past 5 years. Other- wise, it wouldn't matter if the sample included only Some have argued that the requirement of independence men or only women. As Kish (1965) pointed out, the among sample elements eliminates systematic sampling as magnitude of the advantage of stratification depends a probability sampling method, because once the sampling on the relation between the stratification variable and interval has been established and a random stan value has been chosen, the selection of elements is no longer inde- the variable(s) of substantive interest in a study; the pendent. Nevertheless, sampling statisticians and survey re- stronger this relation, the greater the gain from using searchers have traditionally regarded systematic sampling as a stratified sampling strategy. a probability sampling method, as long as the sampling frame has been arranged in a random order and the start value has . When a population is dis- been chosen through a random selection mechanism (e.g., Henry, 1990; Kalton, 1983; Kish, 1965).We have therefore in- persed over a broad geographic region, simple random duded systematic sampling as a probability sampling method, sampling and systematic sampling will result in a sam- notwithstanding the potential problem of nonindependence ple that is also dispersed broadly. This presents a prac- of element selection. tical (and costly) challenge in conducting face-to-face 232 PENNY S. YISSBR, JON A. KROSNICK AND PAUL J. LAVRAKAS interviews, because it is expensive and time-consu- clusters. For statistical tests to be unbiased, this sort ming to transport interviewers to widely disparate lo- of nonindependence needs to be statistically modeled cations, collecting data from only a small number of and incorporated in any analysis, thus making the en- respondents in any one place. terprise more cumbersome. To avoid this problem, researchers sometimes irnple- ment cluster sampling, which involves drawing a sam- ple with elements in pups (called 'clusters") rather Threats to Sample Representativeness than one-by-one (e.g., see Roberto 6 Scott, 1986; Ideally, these sampling processes will yield sam- Tziner, 1987). Then all elements within a cluster are ples that are perfectly representative of the populations sampled. From the full geographic region of interest, from which they were drawn. In practice, however, this the researcher might randomly select neighborhoods, virtually never occurs. Two of the four sources of error for example, and collect data from all of the house- within the total survey error perspective can distort sur- holds in each selected neighborhood. In fact, face-to- vey results by compromising representativeness: sam- face interviewing of the American adult population pling error and nonresponse error. is typically done in clusters of 80 to 100 households within randomly selected neighborhoods, keeping the SAMPLING ERROR. Sampling error refers to the dis- cost of national interviewing staffs at a manageable crepancy between the sample data and the true popula- level. tion values that are attributable to random differences Cluster sampling can also be implemented in mul- between the sample and the sampling frame. When tiple stages, with two or more sequential steps of ran- one uses a probabity sample, estimates of sampling dom sampling; this is called (e.g., error can be calculated, representing the magnitude of see Himmelfarb 6 Norris, 1987).To assemble a national uncertainty regarding obtained parameter estimates. sample for an in-person survey, for example, one might Sampling error is typically expressed in terms of the begin by randomly selecting 100 or so counties from of an estimate, which refers to the vari- among the more than 3,000 in the nation. W~thineach ability of sample estimates around the true population selected county, one could then randomly select a cen- value, assuming repeated sampling. That is, the stan- sus tract; and from each selected tract, one could select dard error indicates the probability of observing sample a specific block. Then a certain number of households estimates of varying distances from the true population on each selected block could be randomly selected for value, assuming that an infinite number of samples of inclusion in the sample. To do this, a researcher would a particular size are drawn from the same population. only need a list of all counties in the United States, all Probability theory provides an equation for calculating of the tracts in the selected counties, and all the the standard error for a single sample from a population blocks within the selected tracts, and only then one of "infinite" size: would one need to enumerate all of the households on the selected blocks, from which to 6nally draw the SE = ,/sample variancelsample size. (1) sample elements. Cluster sampling can substantially reduce the time Once the standard error has been calculated, it can and cost of face-to-face data collection, but it also re- be used to construct a around a duces accuracy by increasing sampling error. Members sample estimate, which is informative regarding the of a cluster are likely to share not only proximity, but precision of the parameter estimate. For example, a re- a number of other attributes as well - they are likely searcher can be 95% confident that an observed sample to be more similar to one another along many dimen- (e.g., the sample mean for some variable) falls sions than a sample of randomly selected individuals within 1.96 standard errors of the true population pa- would be. Therefore, interviews with a cluster of re- rameter. A small standard error, then, suggests that the spondents will typically yield less precise information sample statistic provides a relatively precise estimate of about the full population than would the same number the population parameter. of interviews with randomly selected individuals. As Equation 1 shows, one determinant of sam- Furthermore, clustering aeates problems because pling error is sample size - as sample size inaeases, it violates an assumption underlying most statistical sampling error deaeases. This decrease is not linear, tests: independence of . That is, all the peo- however. Moving from a small to a moderate Sam- ple in a particular cluster are likely to be more sim- ple size produces a substantial decrease in sampling ilar to each other than they are to people in other error, but further inaeases in sample size produce SURVEY RESEARCH 233 smaller and smaller decrements in sampling error. based on the assumption that the sample was drawn us- Thus, researchers are faced with a trade-off between ing simple random sampling. When another probabil- the considerable costs associated with inaeases in sam- ity sampling method has been used, the sampling error ple size and the relative gains that such inaeases afford may actually be slightly higher or slightly lower than in accuracy. the standard formula indicates. This impact of sampling The formula in Equation 1 is correct only if the pop- strategy on sampling error is called a design effect. De- ulation size is infinite. If the population is finite, then fined more formally, the design effect associated with a correction factor needs to be added to the formula a probability sample is 'the ratio of the actual variance for the standard error. Thus, the ratio of sample size to of a sample to the variance of a simple random sample population size is another determinant of sampling er- of the same elements" (Kish, 1965, p. 258). ror. Data collected from 500 people will include more Any probability sampling design that uses cluster- sampling error if the sample was drawn from a popu- ing will have a design effect in excess of 1.0. That is, lation of 100,000 people than if the sample was drawn the sampling error for cluster sampling will be higher from a population of only 1,000 people. When sam- than the sampling error for simple random sampling. pling from relatively small populations (i.e., when the Any stratified sampling design, on the other hand, will sample to population ratio is high), the following alter- have a design effect less than 1.0, indicating that the native sampling error formula should be used: sampling error is lower for stratified samples than for simple random samples. Researchers should be atten- tive to design effects, because taking them into account can increase the likelihood of statistical tests detecting population size - sample genuinely significant effects. sample size population size ~oms~o~saERROR. Even when probabiity sampling is done for a survey, it is unlikely that 100% As a general rule of thumb, this correction only of the sampled elements will be successfully contacted needs to be done when the sample contains over 5% and will agree to provide data. Therefore, most survey of the population (Henry, 1990). However, even major samples include some elements from whom no data are differences in the ratio of the sample size to population gathe~d.~A survey's findings may be subject to non- size have only a minor impact on sampling error. For response error to the extent that the sampled elements example, if a dichotomous variable has a 50150 distri- from whom no data are gathered differ systematically bution in the population and a sample of 1,000 ele- from those from whom data are gathered. ments is drawn, the standard sampling error formula To minimize the potential for nonresponse error, re- would lead to a confidence interval of approximately 6 searchers implement various procedures to encourage percentage points in width. If the population were only as many selected respondents as possible to partici- 1,500 in size (i.e., two thirds of the elements were sam~ pate (see, e.g., Dillman, 1978; Fowler, 1988; Lavrakas, pled), the confidence interval width would be redud 1993). Stated generally, the goal here is to minimize to 5 percentage points. the apparent costs of responding, maximize the appar- As Equations 1 and 2 illustrate, sampling error is also ent rewards for doing so, and establish trust that those dependent on the amount of variance in the variable of rewards will be delivered (Dillman, 1978). One con- interest. If there is no variance in the variable of inter- aete approach to accomplishing these goals is sending est, a sample of 1 is sufficient to estimate the population value with no sampling error. And as the variance ion- Most researcher use the term 'sample" to refer both to (a) aeases, sampling error also inaeases. With a sample the set of elements that are sampled from the sampling frame of 1,000, the distribution of a dichotomous variable from which data ideally will be gathered and (b) the final with a 50150 distribution in the population can be es- set of elements on which data actually are gathered. Because timated with a confidence interval 6 percentage points almost no survey has a perfect response rate, a discrepancy in width. However, the distribution of a dichotomous almost always exists between the number of elements that variable with a 10190 distribution would have a confi- are sampled and the number of elements from which data are gathered. Lavrakas (1993) suggested that the term 'sampling 3.7 dence interval of approximately percentage points pool' be used to refer to the elements that are dram from in width. the sampling frame for use in sampling and that the term The standard formula for calculating sampling error, 'sample" be pPServed for that of the sampling pool and that used by most computer statistical programs, is from which data are gathered. 234 PENNY S. MSSER, JON A. KROSNICK, AND PAUL J. LAVRAKAS letters to potential respondents informing them that mail surveys documented the demographic character- they have been selected to participate in a study and istics of voters more accurately than did the telephone will soon be contacted to do so, explaining that their surveys. Therefore, simply having a low response rate participation is essential for the study's success because does not necessarily mean that a survey suffers from a of their expertise on the topic, suggesting reasons why large amount of nonresponse error. participation will be enjoyable and worthwhile, as- Studies exploring the impact of response rates on suring respondents of confidentiality, and informing correlational results have had mixed implications. For them of the study's purpose and its sponsor's aedibil- example, Brehm (1993) found that statistically cor- ity. Researchers also make numerous attempts to con- recting for demographic biases in sample composition tact hard-to-reach people and to convince reluctant re- had very little impact on the substantive implications spondents to participate and sometimes pay people for of correlational analyses. However, Traugott, Groves, participation or give them gifts as inducements (e.g., and Lepkowski (1987) reached a different conclusion. pens, golf balls). Nonetheless, most telephone surveys These investigators conducted identical telephone in- have difficulty achieving response rates much higher terviews with two equivalent samples of respondents. than 60%, and most face-to-face surveys have diffi- A higher response rate was achieved with one of the culty achieving response rates much higher than 70%. samples by mailing letters to them in advance, noti- In even the best academic surveys with such fying them about the survey (this improved response response rates, there are significant biases in the rates from about 56% to about 70%). Correlations be- demographic composition of samples. For example, tween some pairs of variables were the same in the Brehm (1993) showed that in the two leading, re- two samples. Correlations between other pairs of vari- curring academic national surveys of ables were weakly positive in the sample with the lower (the National Election Studies and the General Social response rate and much more strongly positive in the Surveys), certain demographic groups have been sample with the higher response rate. And correlations routinely represented in misleading numbers. For between still other pairs of variables were strongly pos- example, young adults and old adults are underrepre- itive in the sample with the lower'response rate and sented; males are underrepresented; people with the zero in the sample with the higher response rate. Thus, lowest levels of education are overrepresented; and substantive results can change substantiallyas response people with the highest incomes are underrepresented. rates are improved. Likewise, Smith (1983) reported evidence suggesting Consequently, it is worthwhile to assess the degree that people who don't participate in surveys are likely to which nonresponse error is likely to have distorted to have a number of distinguishing demographic char- data from any particular sample of interest. One ap- acteristics (e.g., living in big cities and working long proach to doing so involves making aggressive efforts to hours). recontact a randomly selected sample of people who re- Although a response rate may seem far from loo%, fused to participate in the,survey and collect some data such a high rate of nonresponse does not necessar- from these individuals. One would especially want to ily mean that a study's implications about nondemo- collect data on the key variables of interest in the study, graphic variables are contaminated with error. If the but it can also be useful to collect data on those dimen- constructs of interest are not correlated with the like- sions along which nonrespondents and respondents lihood of participation, then nonresponse would not seem most likely to differ substantially (see Brehm, distort results. So investing large amounts of money 1993). A researcher is then in a position to assess the and staff effort to inaease response rates might not magnitude of differences between people who agreed translate into higher data quality. to participate in the survey and those who refused to A particularly dramatic demonstration of this fact do so. was reported recently by Visser, Krosnick, Marquette, A second strategy rests on the assumption that re- and Curtin (1996). These researchers compared the spondents from whom data were difficult to obtain accuracy of self-administered mail surveys and tele- (either because they were difficult to reach or be- phone surveys forecasting the outcomes of statewide cause they initially declined to participate and were elections in Ohio over a 15-year period. Although the later persuaded to do so) are likely to be more similar mail surveys had response rates of about 20% and the to nonrespondents than are people from whom data telephone surveys had response rates of about 60%, were relatively easy to obtain. Researchers can com- the mail surveys predicted election outcomes much pare responses of people who were immediately will- more accurately (average error = 1.6%) than the tele- ing to participate with those of people who had to be phone surveys (averageerror = 5.2%). In addition, the recontacted and persuaded to participate. The smaller SURVEY RESEARCH 235 the discrepanaes between these groups, the less of a processes (e.g., Kitayama 6 Markus, 1994; Nisbett 6 threat nonresponse error would seem to be (though Cohen, 1996). In a spate of articles published in top see Lin 6 Schaeffer, 1995). journals, a sample of people froni one country has been compared with a sample of people from another coun- COYBRACE ERROR. One other sort of possible er- try, and differences between the samples have been ror deserves mention at this point: . For attributed to the impact of the countries' cultures (e.g., reasons of economy, researchers sometimes draw prob- Benet 6 Waller, 1995; Han 6 Shavitt, 1994; Heine 6 ability samples not from the full set of elements in a Lehman, 1995; Rhee, Uleman, Lee, 6 Roman, 1995). population of interest but rather from more limited In order to convincingly make such comparisons and sampling frames. The greater the discrepancy between properly attribute differences to culture, of course, the the population and the sampling frame, the greater po- sample drawn from each culture must be representa- tential there is for coverage error. Such error may in- tive of it. And for this to be so, one of the probability validate inferences about the population that are made sampling procedures described above must be used. on the basis of data collected from the sample. Alternatively, one might assume that cultural im- By way of illustration, many national surveys these pact is so universal within a country that any arbitrary days involve telephone interviewing. And although sample of people will reflect it. However, hundreds their goal is to represent the entire country's popu- of studies of Americans have documented numerous lation, the sampling methods used restrict the sam- variations between subgroups within the culture in so- pling frame to households with working telephones. dal psychological processes, and even recent work on Although the vast majority of American adults do live the impact of culture has documented variation within in households with working telephones (about 95%; nations (e.g., Nisbett 6 Cohen, 1996). Therefore, it is Congressional Information Semce, 1990), there is a diflidt to have much confidence in the presumption 5% gap between the population of interest and the that any given social psychological process is universal sampling frame. To the extent that people in house- within any given culture, so probability sampling seems holds without telephones are different from those in essential to permit a conclusion about differences be- households with telephones, generalization of sample tween cultures based on differences between samples results may be inappropriate. of them. As compared with residents of households with Instead, however, nearly all recent social psycho- working telephones, those in households with- logical studies of culture have employed nonprobabil- out working telephones tend to earn much less ity sampliug procedures. These are procedures where money, have much less formal education, tend to be some elements in the population have a zero proba- much younger, and are more often racial minorities bility of being selected or have an unknown probabil- (Thornberry 6 Massey, 1988). If attitudes toward ity of being selected. For example, Heine and Lehman government-sponsored social welfare programs to (1995) ampared college students enrolled in psy- help the poor are especially positive in such households chology courses in a public and private university in (as seems quite likely), then telephone surveys may with college students enrolled in a psychology underestimate popular support for such programs. course at a public university in Canada. Rhee et al. Fortunately, however, households with and without (1995) compared students enrolled in introductory telephones tend not to differ much on many behav- psychology courses at New York University with psy- ioral and attitudinal measures that are unrelated to chology majors at Yonsei University in Seoul, Korea. income (Groves 6 Kahn, 1979; Thornberry 6 Maq, Han and Shavitt (1994) compared undergraduates at 1988). Nonetheless, researchers should be aware of the University of Illinois. with students enrolled in coverage error due to inadequate sampling frames introductory communication or classes at and, when possible, should attempt to estimate and a major university in Seoul. And Benet and Waller correct for such error. (1995) compared students enrolled at two universities in Spain with Americans listed in the California 'Ibvin Registry. In all of these studies, the researchers generalized Having considered probability sampling, we tum the findings from the samples of each culture to the now to nonprobabiity sampling methods. Such meth- entire cultures they were presumed to represent. For ods have been used frequently in studies inspired by example, after assessing the extent to which their two the recent surge of interest among social psychologists samples manifested self-enhancing biases, Heine and in the impact of culture on social and psychological Lehman (1995) concluded that 'people from cultures 236 PENNY S. VISSER JON A. KROSNICK. ANDI PAUL J. LAVRAKAS representative of an interdependent construal of the This technique has been used in a number of social self," instantiated by the Japanese students, "do not psychological studies to afford comparisons of what self-enhance to the same extent as people from cultures are called "known groups" (e.g., Hovland, Harvey, 6 characteristic of an independent self," instantiated by Sherif, 1957; Webster 6 Kruglanski, 1994). For exam- the Canadian students (p. 605). Yet the method of re- ple, in order to study people strongly supporting pro- cruiting potential respondents for these studies ren- hibition, Hovland et al. (1957) recruited participants dered zero selection probabilities for large segments of from the Women's Christian Temperance Union, stu- the relevant populations. Consequently, it is impossi- dents preparing for the ministry, and students enrolled ble to know whether the obtained samples were rep- in religious colleges. And to compare people who were resentative of those populations, and it is impossible high and low in need for closure, Webster and Kruglan- to estimate sampling error or to construct confidence ski (1994) studied majors and studio art intervals for parameter estimates. As a result, the statis- majors, respectively. tical calculations used in these articles to compare the In these studies, the groups of participants did in- different samples were invalid, because they presumed deed possess the expected characteristics, but they may simple random sampling. have had other characteristics as well that may have More important, their results are open to alternative been responsible for the studies' results. This is so be- interpretations, as is illustrated by Benet and Waller's cause the selection procedures used typically yield un- (1995) study. One of the authors' conclusions is that in usual homogeneity within the "known groups" in at contrast to Americans, "Spaniards endorse a 'radical' least some regards and perhaps many. For example, form of individualism" (p. 71 5). Justifying this con- accounting majors may have much more training in clusion, ratings of the terms "unconventional," "pecu- mathematics and related thinking styles than studio liar," and "odd" loaded in a factor analysis on the same art majors. Had more heterogeneous groups of people factor as ratings of "admirable" and "high-ranking" in high and low in need for closure been studied by Web- the Spanish sample, but not in the American sample. ster and Kruglanski (1994), it is less likely that they However, Benet and Waller's American college student would have sharply differed in other regards and less sample was significantly younger and more homoge- likely that such factors could provide alternative expla- neous in terms of age than their sample of California nations for the results observed. twins (the average ages were 24 years and 37 years, re- Snowball sampling is a variant of purposive sam- spectively; the standard deviations of ages were 4 years pling, where a few members of a subpopulation are and 16 years, respectively). Among Americans, young located, and each is asked to suggest other members adults most likely value unconventionality more than of the subpopulation for the researcher to contact. older adults, so what may appear in this study to be Judd and Johnson (198 1) used this method in an in- a difference between countries attributable to culture vestigation comparing people with extreme views on may instead simply be an effect of age that would be women's issues to people with moderate views. To as- apparent within both cultures. semble a sample of people with extreme views, these The sampling method used most often in the studies investigators initially contacted undergraduate women described above is called haphazard sampling, because who were members of feminist organizations and then participants were selected solely on the basis of conve- asked them to provide names of other women who nience (e.g., because they were enrolled in a particular were also likely to hold similar views on women's is- course at a particular university). In some cases, no- sues. Like cluster sampling, this sampling method also tices seeking volunteers were widely publicized, and violates the assumption of independence of observa- people who contacted the researchers were paid for tions, complicating analyis. their participation (e.g., Han 6 Shavitt, 1994). This is hobably the best known form of nonprobability problematic because people who volunteer tend to be sampling is quota sampling, which involves selecting more interested in (and sometimes more knowledge- members of various subgroups of the population to as- able about) the survey topic than the general pub- semble a sample that accurately reflects known charac- lic (see, e.g., Bogaert, 1996; Coye, 1985; Dollinger 6 teristics of the population. Predetermined numbers of Leong, 1993), and social psychological processes seem people in each of several categories are recruited to ac- likely to vary with interest and expertise. complish this. For example, one can set out to recruit Yet another nonprobability sampling method is pur- a sample containing half men and half women, and posive sampling, which involves haphazardly selecting one third people with less than high school education, members of a particular subgroup within a population. one third high school graduates, and one third people SURVEY RESEARCH with at least some college education. If quotas are im- documented occur only among select groups of Amer- posed on a probability sampling procedure (e.g., tele- ican college sophomores. phone interviews done by random digit dialing) and if After an initial demonstration of an effect or process the quotas are based on accurate information about a or tendency, subsequent research can assess its gen- population's composition (e.g., the U.S. Census), then erality. Therefore, work such as Heine and Lehrnan's the resulting sample may be more accurate than sim- (1995) is valuable because it shows us that some find- ple random sampling would be, though the gain would ings are not limitlessly generalizable and sets the stage most likely be very small. for research illuminating the relevant limiting con- However, quotas are not usually imposed on prob- ditions. We must be careful, though, about presum- ability sampling procedures but instead are imposed ing that we know what these limiting conditions are on haphazard samples. Therefore, this approach can without proper, direct, and compelling tests of our give an arbitrary sample the patina of representative- conjectures. ness, when in fact only the distributions of the quota criteria match the population. A particularly dramatic illustration of this problem is the failure of preelec- QUESTIONNAIRE DESIGN AND tion polls to predict that Truman would win his bid ERROR for the U.S. Presidency in 1948. Although interviewers Once a sample is selected, the next step for a survey conformed to quotas in selecting respondents, the re- researcher is questionnaire design. When designing a sulting sample was quite unrepresentative in some re- questionnaire, a series of decisions must be made about gards not explicitly addressed by the quotas (Mosteller, each question. First, will it be open-ended or closed- Hyman, McCarthy, Marks, 6 Truman, 1949). A study ended?,And for some closed-ended question tasks, by Katz (1942) illustrated how interviewers tend to should one use rating scales or ranking tasks? If one over-sample residents of one-family houses, American- uses rating scales, how many points should be on the born people, and well-educated people when these di- scales and how should they be labeled with words? mensions are not explicit among the quota criteria. Should respondents be explicitly offered "no-opinion" Given all this, we urge researchers to recognize the response options or should these be omitted? In what inherent limitations of nonprobability sampling meth- order should response alternatives be offered? How ods and to draw conclusions about populations or should question stems be worded? And finally, once about differences between populations tentatively, if at all the questions are written, decisions must be made all, when nonprobability sampling methods are used. about the order in which they will be asked. Furthermore, we encourage researchers to attempt to Every researcher's goal is to maximize the reli- assess the representativeness of samples they study by ability and validity of the data he or she collects. comparing their attributes with known population at- Therefore, each of the above design decisions should tributes in order to bolster confidence in generalization presumably be made so as to maximize these two indi- when appropriate and to temper such confidence when cators of data quality. Fortunately, thousands of empir- necessary. To scholars in disciplines that have come to ical studies provide clear and surprisingly unanimous recognize the necessity of probability sampling in or- advice on the issues listed aboye. Although a detailed der to describe populations (e.g., and polit- review of this literature is far beyond the scope of this ical science), social psychological research attempting chapter (see Bradburn, et al., 1981; J. M. Converse 6 to generalize from a college student sample to a nation Presser, 1986; Krosnick 6 Fabrigar, in press; Schuman looks silly and damages the apparent credibility of our 6 Presser, 1981; Sudrnan, Bradburn, 6 Schwarz, 1996), enterprise. we will provide a brief tour of the implications of these Are we suggesting that all studies of college sopho- studies. mores enrolled in introductory psychology courses are of minimal scientific value? Absolutely not. The Open versus Closed Questions value of the vast majority of social psychological lab- oratory experiments does not hinge on generalizing An open-ended question permits the respondent to their results to a population. Instead, these studies test answer in his or her own words (see, e.g., C. Smith, whether a particular process occurs at all, to explore this volume, Ch. 12; Bartholomew, Henderson, 6 its mechanisms, and to identify its moderators. Any Marcia, this volume, Ch. 11). For example, one com- demonstrations along these lines enhance our under- monly asked open-ended question is "What is the standing of the human mind, even if the phenomena most important problem facing the country today?" In 238 PENNY S. VISSBR JON A. KROSMCK, AND PAUL J. WVRAWS contrast, a closed-ended question requires that the re- not work well for respondents who are not especially spondent select an answer from a set of choices offered articulate, because they might have special difficulty explicitly by the researcher. A closed-ended version of explaining their feelings. However, this seems not to the above question might ask 'What is the most im- be a problem (England, 1948; Geer, 1988). Second, portant facing the country today: inflation, unemploy- some researchers feared that respondents would be ment, crime, the federal budget deficit, or some other especially likely to answer open-ended questions by problem?" mentioning the most salient possible responses, not The biggest challenge in using open-ended ques- those that are truly most appropriate. But this, too, tions is the task of coding responses. In a survey of appears not to be the case (Schuman. Ludwig, 6 Kros- 1,000 respondents, nearly 1,000 different answers will nick, 1986). Thus, open-ended questions seem to be be given to the 'most important problem" question worth the trouble they take to ask and the complexi- if considered word-for-word. But in order to analyze ties in analysis of their answers. these answers, they must be clumped into a relatively small number of categories. This requires that a cod- Rating versus Ranking ing scheme be developed for each open-ended ques- tion. Multiple people must read and code the answers Practical considerations enter into the choice be- into the categories, the level of agreement beenthe tween ranking and rating questions as well. Imagine coders must be ascertained, and the procedure must be that one wishes to determine whether people prefer refined and repeated if agreement is too low. The time to eat carrots or peas. Respondents could be asked this and financial costs of such a procedure, coupled with question directly (a ranking question), or they could the added challenge of requiring interviewers to care- be asked to rate their attitudes toward carrots and peas fully transcribe answers, have led many researchers to separately, and the researcher could infer which is pre- favor closed-ended questions, which in essence ask re- ferred. With this research goal, asking the single rank- spondents to directly code themselves into categories ing question seems preferable and more direct than that the researcher spedfies. asking the two rating questions. But rank-ordering a Unfortunately, when used in certain applications, large set of objects takes much longer and is less en- closed-ended questions have distinct disadvantages. joyed by respondents than a rating task (Elig 6 Frieze, Most important, respondents tend to confine their an- 1979; Taylor 6 Kinnear, 1971). Furthermore, ranking swers to the choices offed even if the researcher might force respondents to make choices between ob- does not wish them to do so (Jenkins, 1935; Lindzey 6 jects toward which they feel identically, and ratings can Guest, 1951). Explicitly offah# the option to specify reveal not only which object a respondent prefers but a different response does little to combat this problem. also how different his or her evaluations of the objects If the list of choices offered by a question is incom- are. plete, even the rank ordering of the choices that are Surprisingly, however, rankings are more effective explicitly offered can be difkrctlt from what would than ratings, because ratings suffer from a significant be obtained from an open-ended question. There- problem: nondifl~~mtiatimt.When rating a large set of fore, a closed-ended question, aan only be used ef- objects on a single scale, a signiticantly number of fectively if its answer choices arr comprehensive, and respondents rate multiple objects identically as a re- this can often be assured only if an open-ended ver- sult of survey satisficing (Krosnick, 1991b). That is, sion of the question is admidstered in a pretest using although these respondents could devote thought to a reasonably large sample. Perhaps, then, researchers the response task, retrieve relevant information from should simply include the open-ended question in memory, and report differentiated attitudes toward the the final questionnaire, because they will otherwise objects, they choose to shortcut this process instead. To have to deal with the challenges of coding during do so, they choose what appears to be a reasonable pretesting. Also supportive of this conclusion is evi- point to rate most objects on the scale and select that dence that open-ended questions have higher reliabil- point over and over (i.e., nondifferentiation), rather ities and validities than closed-ended questions (e.g., than thinking carefully about each object and rat- Hurd, 1932; Remmers, Marschat, Brown, 6 Chapman, ing different objects differently (see Krosnick, 1991b; 1923). Krosnick 6 Alwin, 1988).As a result, the reliability and One might hesitate in implementing this advice validity of ranking &ta are superior to those of rating because open-ended questions may themselves be &ta (e.g., Miethe, 1985; Munson 6 McIntyre, 1979; susceptible to unique problems. For example, some Nathan 6 Alexander, 1985; Rankin 6 Grube, 1980; researchers feared that open-ended questions would Reynolds 6 Jolly, 1980). So although rankings do not SURVEY RESEARCH 139 yield interval-level measures of the perceived distances later in a questionnaire, when respondents are pre- between objects in respondents' minds and are more sumably more fatigued (see Krosnick, 1991b) . A num- statistically cumbersome to analyze (see Alwin 6 Jack- ber of studies now demonstratehow acquiescence can son, 1982). these measures are apparently more useful distort the results of substantive investigations (e.g., when a researcher's goal is to ascertain rank orders of Jackman, 1973; Winkler, Kanouse, 6 Ware, 1982), objects. and in a particularly powerful example, acquiescence undermined the scientific value of The Authoritarian Personality's extensive investigation of faasm and an- Rating Scale Formats tisemitism (Adorno, Frankel-Brunswick, Levinson, 6 When designing a rating scale, one must begin by Sanford, 1950). This damage occurs equally when di- spedfying the number of points on the scale. A great chotanous items offer just two choices (e.g., 'agree" number of studies have compared the reliability and and 'disagree8) as when a rating scale is used (e.g., validity of scales of varying lengths (for a review, see ranging from 'strongly agree' to 'strongly disagree"). Krosnick 6 Fabrigar, in press). For bipolar scales (e.g., It might seem that acquiescence can be controlled running from positive to negative with neutral in the by measuring a COnStruct with a large set of items, half middle), reliability and validity are highest for about of them making assertions opposite to the other half 7 points (e.g., Matell 6 Jacoby, 1971). In contrast, the (called 'item reversals"). This approach is designed to reliability and validity of unipolar scales (e.g., running place acquiescers in the middle of the final dimension from no importance to very high importance) seem to but will do so only if the assertions made in the re- be optimized for a bit shorter scales, approximately 5 versals~areequally extreme as the statements in the points long (e.g, Whan 6 Warneryd, 1990). Tech- original items. This involves extensive pretesting and niques such as magnitude scaling (e.g., Lodge, 1981), is therefore! cumbersome to implement. Furthermore, which offer scales with an infinite number of points, it is difficult to write large sets of item reversals with- yield data of lower quality than more conventional rat- out using the word 'not" or other such negations, and ing scales and should therefore be avoided (e.g., Cooper evaluating assertions that include negations is cogni- 6 Clare, 1981; Miethe, 1985; Patrick, Bush, 6 Chen, tiveIy burdensome and error-laden for respondents, 1973). thus adding measurement error and inaeasing respon- A good number of studies suggest that data quality dent fatigue (e.g., Eifermann, 1961; Wason, 1961). is better when all scale points are labeled with words And even after all this, acquiescers presumably end than when only some are (e.g., Krosnick 6 Berent, up at the midpoint of the resulting measurement di- 1993). Furthermore, respondents are more satisfied mension, which is probably not where most belong when more rating scale points are verbally labeled (e.g., on subtantive grounds anyway. That is, if these indi- Dickinson 6. Zellinger, 1980).When selecting labels, re- viduals were induced not to acquiesce but to answer searchers should strive to select ones that have mean- the items thoughtfully, their final index scores would ings that divide up the continuum into approximately presumably be more valid than placing them at the equal units (e.g., Klockars 6 Yarnadshi, 1988).For ex- midpoint. ample, 'very good, good, and poor" is a combination Most important, answering an agree-disagree, true- that should be avoided, because the terms do not di- false, or yes-no question always involves answering a vide the continuum equally: the meaning of 'good" is comparable rating question in one's mind first. For ex- much closer to the meaning of 'very good" than it is to ample, if a man is asked to agree or disagree with the as- the meaning of 'poor" (Myers 6 Warner, 1968). sertion 'I am not a friendly person," he must first decide Researchers in many fields these days ask people how friendly he is (perhapsconcluding 'very friendly") questions offering response choices such as 'agree- and then translate that conclusion into the appropri- disagree," 'true-false," or 'yes-no" (see, e.g., Bearden, ate selection in order to answer the question he was Netemeyer, 8 Mobley, 1993). Yet a great deal of re- asked ('disagree" to the original item). It would be sim- search suggests that these response choices sets are pler and more direct to ask the person how friendly he problematic because of acquiescence is. In fact, every agree-disagree, true-false, or yes-no (see, e.g., Couch 6 Keniston, 1960; Jackson, 1979; question implicitly requires the respondent to make a Schuman 6. Presser, 1981). That is, some people are mental rating of an object along a continuous dimen- inclined to say 'agree," 'true," or 'yes," regardless of sion, so asking about that dimension is simpler, more the content of the question. Furthermore, these re- direct, and less burdensome. It is not surprising, then, sponses are more common among people with limited that the reliabity and validity of other rating scale cognitive skills, for more difficult items, and for items and forced choice questions are higher than those of 240 PENNY S. VISSBR, JON A. KROSMCK, AND PAUL J. LAVRAKAS

agree-disagree, true-false, and yes-no questions (e.g., not surprising, then, that the quality of data collected is Ebel, 1982; Mirowsky 6 Ross, 1991; Ruch 6 DeGraff, no higher when a 'no opinion" option is offered than 1926; Wesman, 19%). Consequently, it seems best to when it is not (e.g., McClendon 6 Alwin, 1993). mat avoid long batteries of questions in these latter formats is, people who would have selected this option if of- and instead ask just a couple of questions using other fered nonetheless give meaningful opinions when it is rating scales and forced choice formats. not offered. A better way to accomplish the goal of differentiat- ing 'real" opinions from 'nonattitudes" is to measure The Order of Response Alternatives the strength of an attitude using one or more follow- The answers people give to closed-ended questions up questions. Krosnick and Petty (1995)proposed that are sometimes influenced by the order in which the strong attitudes can be defined as those that are resis- alternatives are offered. When categorical response tant to change, are stable over time, and have pow- choices are presented visually, as in self-administered erful impact on cognition and action. Many empiri- , people are inclined toward primacy ef- cal investigations have confirmed that attitudes vary fects, whereby they tend to select answer choices of- in strength, and the respondent's presumed task when fered early in a list (e.g., Krosnick 6 Alwin, 1987; confronting a 'don't know" response option is to de- Sudmaa Bradburn, 6 Schwan, 1996). But when cide whether his or her attitude is sufficiently weak as categorical answer choices are read aloud to peo- to be best described by selecting that option. But be- ple, recency effects tend to appear, whereby peo- cause the appropriate cutpoint along the strength di- ple are inclined to select the options offered last mension seems exceedingly hard to specify, it would (e.g., McClendon, 1991). These effects are most pro- seem preferable to ask people to describe where their nounced among respondents low in cognitive skills attitude falls along the strength continuum. and when questions are more cognitively demand- However, there are many different aspects of at- ing (Krosnick 6 Alwia 1987; Payne, 194911950). titudes related to their strength that are all some- All this is consistent with the theory of satisficing what independent of each other (see, e.g., Krosnick, (Krosnick, 1991b), which posits that response order Boninger, Chuang, Berent, 6 Carnot, 1993). For ex- effects are generated by the confluence of a confirma- ample, people can be asked how important the is- tory bias in evaluation, cognitive fatigue, and a bias sue is to them personally or how much they have in memory favoring respomx choices read aloud most thought about it or how certain they are of their opin- recently. Therefore, it seems best to minrmize the dif- ion or how knowledgeable they are about it (for de- ficulty of questions and to rotate the order of response tails on measuring these and many other dimensions, choices across respondents. see Wegener, Downing, Krosnick, 6 Petty, 1995). Each of these dimensions can help to differentiate attitudes that are crystallized and consequential from those that No-Opinion Filters and Attitude Strength are not. Concerned about the possibility that respondents may feel pressure to offer opinions on issues when they Question Wording truly have no attitudes (e.g., P. E. Converse, 1964), questionnaire designers have often explicitly offered The of questionnaire-based research requires respondents the option to say they have no opinion. that all respondents be confronted with the same stim- And indeed, many more people say they 'don't know" ulus (i.e., question), so any differencesbetween people what their opinion is when this is done than when it in their responses are due to real differences between is not (e.g., Schuman 6 Presser, 1981). People tend to the people. But if the meaning of a question is am- offer this response under conditions that seem sensible biguous, different respondents may interpret it differ- 1 (e.g., when they lack knowledge on the issue; Dono- ently and respond to it differently. Therefore, experi- van 6- Leivers, 1993), and people prefer to be given enced survey researchers advise that questions always I this option in questionnaires (Ehrlich, 1964). However, avoid ambiguity. They also recommend that wordings 6 most 'don't know" responses are due to codicting be easy for respondents to understand (thereby min- I feelings or beliefs (rather than lack of feelings or be- imizing fatigue), and this can presumably be done by liefs all together) and uncertainty about exactly what I using short, simple words that are familiar to people. 1 a question's response alternatives mean or what the When complex or jargony words must be used, it is ? question is asking (e.g., Coombs 6 Coombs, 1976).It is best to define them explicitly. SURVEY RESEARCH 241

Another standard piece of advice from seasoned sur- it is often useful to rotate question order across re- veyers is to avoid double-barreled questions, which spondents so that any question order effects can be actually ask two questions at once. Consider the ques- empirically gauged and statistically controlled for if tion, 'Do you think that parents and teachers should necessary. teach middle school students about birth control op- tions?" If a respondent feels that parents should do such Questions to Avoid teaching and that teachers should not, there is no com- fortable way to say so, because the expected answers It is often of interest to researchers to study trends are simply 'yes" or 'no." Questions of this sort should over time in attitudes or beliefs. To do so usually re- be decomposed into ones that address the two issues quires measuring a construct at repeated time points separately. in the same group of respondents. An appealing short- Sometimes, the particular words used in a ques- cut is to ask people to attempt to recall the attitudes or tion stem can have a big impact on responses. For beliefs they held at specific points in the past. However, example, Smith (1987) found that respondents in a a great deal of evidence suggests that people are quite national survey were much less positive toward 'peo- poor at such recall, usually presuming that they have ple on welfare" than toward 'the poor." But Schuman always believed what they believe at the moment (e.g., and Presser (1981) found that people reacted equiva- Bem 6 McConnell, 1970; Ross, 1989).Therefore, such lently to the of 'abortion' and 'ending preg- questions vastly underestimate change and should be nancy," despite the investigators' intuition that these avoided. concepts would elicit different responses. These inves- Because researchers are often interested in identi- tigators also found that more people say that a con- fying the causes of people's thoughts and actions, it troversial behavior should be 'not allowed" than say it is tempting to ask people directly why they thought should be "forbidden,' despite the apparent conceptual a certain thing or behaved in a certain way. This in- equivalence of the two phrases. Thus, subtle aspects of volves askkg people to introspect and describe their question wording can sometimes make a big difference, own cognitive processes, which was one of modem so researchers should be careful to say exactly what psychology's first core research methods (Hothersall, they want to say when wording questions. Unfortu- 1984).However, it became clear to researchers early in nately, though, this literature does not yet offer general this century that it did not work all that well, and Nis- guidelines or principles about wording selection. bett and Wilson (1977) articulated an about why this is so. Evidence produced since their landmark paper has largely reinforced the conclusion that many Question Order cognitive processes occur very quickly and automati- An important goal when ordering questions is to cally "behind a black curtain" in people's minds, so they help establish a respondent's comfort and motivation are unaware of them and cannot describe them. Con- to provide high-quality data. If a questionnaire begins sequently, questions asking for such descriptions seem with questions about matters that are highly sensitive best avoided as well. or controversial or that require substantial cognitive ef- fort to answer carefully or that seem poorly written, re- PRETESTING spondents may become uncomfortable, uninterested, or unmotivated and may therefore terminate their par- Even the most carefully designed questionnaires some- ticipation. Seasoned questionnairedesigners advise be- times include items that respondents find ambiguous ginning with items that are easy to understand and an- or difficult to comprehend. Questionnairesmay also in- swer on noncontroversial topics. clude items that respondents understand perfectly well, Once a bit into a questionnaire, grouping questions but interpret differently than the researcher intended. by topic may be useful. That is, once a respondent starts Because of this, questionnaire pretesting is conducted thinking about a particular topic, it is presumably easier to detect and repair such problems. Pretesting can also for him or her to continue to do so, rather than hav- provide information about probable response rates of ing to switch back and forth between topics, question a survey, the cost and timeframe of the data collection, by question. However, initial questions in a sequence the effectiveness of the field organization, and the skill can influence responses to later, related questions, level of the data collection staff. A number of pretest- for a variety of reasons (see Tourangeau 6 Rasinski, ing methods have been developed, each of which has 1988). Therefore, within blocks of related questions, advantages and disadvantages, as we review next. PENNY S. WSSBR JON A. KROSNICK, AND PAUL J. LAVRAKAS

Pretesting Methods for that eliat frequent deviations from the script are pre- Interviewer-Administered Questionnaires sumed to require modification. Although-behavior coding provides a more sys- CONVW"ONU p-smG* In cOnVentional tematic, objective approach than conventional pretest face-to-face and telephone survey pretesting, inter- methods, it is also subject to Moa impor- viewers conduct a small number of interviews (usually unS behavior coding is likely to miss problems cmter- between 15 and 25) and then discuss their experiences around rmamnsrmed Which may not with the researcher in a debriefing session (see, e.g., elidt any deviations from the Bischoping, 1989; Nelson, 1985). They describe any they encountered (e.g.,-identifyingquestions COG- ~~G.TO overcome this im- that required further explanation, wording that was portant weakness, researchers employ a third pretest difficult to read or that respondents seemed to find method, borrowed from cognitive psychology. It in- confusing) and their impressions of the respondents' volves administering a questionnaireto a small number experiences in answering the questions. Researchers of people who are asked to 'think aloud," verbalizing might also look for excessive item-nonresponse in the whatever considerations come to mind as they formu- pretest interviews, which might suggest a question late their responses (e.g., Forsyth 6 Lessler, 1991 ) .This is problematic. On the basis of this information, 'think aloud" procedure is designed to assess the cog- researchers can make modifications to the survey nitive processes by which respondents answer ques- instrument to inaease the likelihood that the meaning tions, which presumably provides insight into the way of each item is clear to respondents and that the each item is comprehended and the strategies used to interviews proceed smoothly. devise answers. Interviewers might also ask respon- Conventional pretesting can provide valuable infor- dents about particular elemem of a survey question, mation about the survey instrument, especially when such as interpretations of a specific word or phrase or the interviewers are experienced survey data collec- overall impressions of what a question was designed to tors. But this approach has limitations. For example, assess. what constitutes a 'problem" in the survey interview is often defined rathir loosely, so there is potential for cortwmtnu~~SB PRFI~STMG ~ODS.These considerable variance aaoss interviewers in terms of three methods of pretesting focus on different aspects what is reported during debriefing sessions. Also, de- of the process, and one might briefing interviews are sometimes relatively unstruc- expect that they would detect different types of inter- tured, which might further contribute to variance in view problems. And indeed, sug- interviewers' reports. Of course, researchers can stan- gests that the methods do differ in terms of the kinds of dardize their debriefing interviews, thereby reducing problems they detect, as well as in the reliability with the idiosyncrasies in the reports from pretest inter- which they detect these problems (i.e., the degree to viewers. Nonetheless, interviewers' impressions of re- which repeated pretesting of a particular questionnaire spondent reactions are unavoidably subjective and are consistently detects the same problems). likely to be imprecise indicators of the degree to which Presser and Blair ( 1994) demonstrated that behav- respondents actually had difficulty with the survey ior coding is quite consistent in detecting apparent mstrument. respondent difficulties and interviewer problems. Con- ventional pretesting also detects both sorts of po- BBHAVIOR CODING. A second method, called be- tential problems, but less reliably. In fact, the corre- havior coding, offers a more objective, standardized lation between the apparent problems diagnosed in approach to pretesting. Behavior coding involves mon- independent conventional pretesting trials of the same itoring pretest interviews (either as they take place or questionnaire can be remarkably low. Cognitive inter- via tape recordings of them) and noting events that oc- views also tend to exhibit low reliability aaoss trials, lI cur during interactions between the interviewer and and they tend to detect respondent difficulties almost i the respondent (e.g., Cannell, Miller, 6 Oksenberg, exclusively. I 1981). The coding reflects each deviation from the However, the relative reliability of the various script (caused by the interviewer misreading the ques- pretesting methods is not necessarily informative about i tionnaire, for example, or by the respondent asking for the validity of the insights gained from them. And one I additional information or providing an initial response might even imagine that low reliability actually re- that was not suffidently clear or complete). Questions flects the capadty of a particular method to continue to I SURVEY RESEARCH 243 reveal additional, equally valid problems aaoss pretest- relevant to data-collection mode (face-to-face, tele- ing iterations. But unfortunately, we know of no em- phone, and self-administered) and interviewer selec- pirical studies evaluating or comparing the validity of tion, training, and supelvision (for comprehensive dis- the various pretesting methods. Much research along cussions, see, e.g., Bradburn & Sudman, 1979; Dillman, these lines is clearly needed. 1978; Fowler & Mangione, 1990; Prey, 1989; Lavrakas, 1993). Self-Administered Questionnaire Pretesting Mode Pretesting is especially important when data are to be collected via self-administered questionnaires, FACE-TO-FACEINTBRVIBWS. Face-to-face data ~01- because interviewers will not be available to clarify lection often requires a large staff of well-trained inter- question meaning or probe incomplete answers. Pur- viewers who visit respondents in their homes. But this thermore, with self-administered questionnaires, the mode of data collection is not limited to in-home in- researcher must be as concerned about the layout of the terviews; face-to-face interviews can be conducted in questionnaire as with the content; that is, the format a laboratory or other locations as well. Whatever the must be "user-friendly" for the respondent. A ques- setting, face-to-faceinterviews involve the oral presen- tionnaire that is easy to use can presumably reduce tation of survey questions, sometimes with visual aids. measurement error and may also reduce the potential Until recently, interviewers always recorded responses for nonresponse error by providing a relatively pleasant on paper copies of the questionnaire, which were later task for the respondent. returned to the researcher. Unfortunately, however, pretesting is also most dif- Inaeasingly, however, face-to-faceinterviewers are ficult when self-administered questionnaires are used, being equipped with laptop computers, and the entire because problems with item comprehension or re- data-collection process is being regulated by computer sponse selection are less evident in self-administered programs. In computer-assisted personal interviewing questionnaires than face-to-face or telephone inter- (CAPI), interviewers work from a computer saeen, on views. Some researchers rely on observations of how which the questions to be asked appear one by one pretest respondents fill out a questionnaire to infer in the appropriate order. Responses are typed into the problems in the instrument - an approach analogous to computer, and subsequent questions appear instantly behavior coding in face-to-face or telephone interview- on the screen. This system can reduce some types of in- ing. But this is a less than optimal means of detecting terviewer error, and it permits researchers to vary the weaknesses in the questionnaire. spe~cquestionseach participant is asked based on re- A more effective way to pretest self-administered sponses to previous questions. It also makes the incor- questionnaires is to conduct personal interviews with poration of experimental manipulations into a survey a group of survey respondents drawn from the target easy, because the manipulations can be written directly population. Researchers can use the 'think aloud" pro- into the CAPI program. In addition, this system elimi- cedure described above, asking respondents to verbal- nates the need to enter responses into a computer after ize their thoughts as they complete the questionnaire, the interview has been completed. Alternatively, respondents can be asked to complete the questio~airejust as they would during actual data TBWHONB ~RVIBWS.Instead of interviewing collection, after which they can be interviewed about rkndents in person, researchers rely on telephone the experience. They can be asked about the clarity interviews as their primary mode of data collec- of the instructions, the question wording, and the re- tion. And whereas computerized data collection is a sponse options. They can also be asked about their in- relatively recent development in face-to-face inter- terpretations of the questions or their undemanding viewing most large-scale telephone survey organi- of the response alternatives and about the ease or dif- zations have been using such systems for the past ficulty of responding to the various items. decade. In fact, computer-assisted telephone interview- ing (CATI) has become the industry standard, and several software packages are available to simplify com- DATA COLLECTION puter programming. Like CAPI, CATI involves inter- The survey research process culminates in the collec- viewers reading from a computer screen, on which tion of data, and the careful execution of this final step each question appears in turn. Responses are entered is critical to success. Next, we discuss considerations immediately into the computer. 244 PBNNY S. VISSBR JON A. KROSNICK. AND PAUL J. LAVRAKAS

SELF-ADMINISTERED QUESTIONNAIRBS. Often, THE POPULATION. A number of characteristics of questionnaires are mailed or dropped off to individuals the population are relevant to selecting a mode of at their homes, along with instructions on how to data collection. For example, completion of a self- return the completed surveys. Alternatively, people administered questionnairerequires a basic proficiency can be intercepted on the street or in other public in reading and, depending on the response format, per- places and asked to compete a self-administeredques- haps writing. Clearly this mode of data collection is in- tionnaire, or such questionnaires can be distributed appropriate if a nonnegligible portion of the population to large groups of individuals gathered specifically Wig studied does not meet this minimum proficiency for the purpose of participating in the survey or requirement. Some level of computer literacy is nec- for entirely unrelated purposes (e.g., during a essary if CASAI is to be used, and again, this may be period or at an employee staff meeting). Whatever the an inappropriate mode of data collection if a nonneg- method of distribution, this mode of data collection ligible portion of the population is not computer lit- typically requires respondents to complete a written erate. Motivation is another relevant characteristic - questionnaire and return it to the researcher. researchers who suspect that respondents may be un- Recently, however, even this very simple mode of motivated to participate in the survey should select a data collection has benefited from advances in com- mode of data collection that involves interaction with puter technology and availability. Paper-and-pencil trained interviewers. Skilled interviewers can often in- self-administered questionnaires have sometimes been aease response rates by convincing individuals of the replaced by laptop computers, on which respondents value of the survey and persuading them to partici- proceed through a self-guided program that presents pate and provide high-quality data (Cannell, Oksen- the questionnaire. When a response to each question berg, 6. Converse, 1977; Marquis, Cannell, 6. Laurent, is made, the next question appears on the saeen, per- 1972). mitting respondents to work their wpy through the instrument at their own pace and with complete pri- SAMPLING STRATEGY. The sampling strategy to be vacy. Computer assisted self-administered interviewing used may sometimes suggest a particular mode of data (CASAI), as it is known, thus affords all of the ad- collection. For example, some preelection polling orga- vantages of computerized face-to-face and telephone nizations draw their samples from lists of currently reg- interviewing, along with many of the advantages of istered voters. Such lists often provide only names and self-administered questionnaires. A very new devel- mailing addresses, which limits the mode of data col- opment is audio CdSAI, where the computer 'reads lection to face-to-face interviews or self-administered aloud" questions to respondents, who listen on head- surveys. phones and type their answers on computers. DESIRED RBSPONSB RATB. Self-administered mail surveys typically achieve very low response rates, of- Choosing- a Mode ten less than 50% of the original sample when a sin- gle mailing is used. ~echni~ieshave been developed Face-to-face interviews, telephone interviews, and to yield strikingly high response rates for these sur- certain self-administered questionnaires each afford veys, but they are complex and more costly (see Dill- advantages, and choosing among them requires trade- man, 1978).Face-to-face and telephone interviews of- offs. This choice should be made with several factors ten achieve much higher response rates, which reduces in mind, including cost, characteristics of the popula- the potential for nonresponse error. tion, sampling strategy, desired response rate, ques- tion format, question content, questionnaire length, QUESTION FORM. If a survey includes open-ended length of the data-collection period, and availability of questions, face-to-face or telephone interviewing is of- facilities. ten preferable, because interviewers can, in a standard- ized way, probe incomplete or ambiguous answers to COST. The first factor to be considered when ensure the usefulness and comparability of data aaoss selecting a mode of data collection is cost. Face-to- respondents. face interviews are generally more expensive than tele- phone interviews, which are usually more expensive QUESTION CONTENT. If the issues under investi- than self-administered questionnaire surveys of com- gation are sensitive, self-administered questionnaires parable size. may provide respondents with a greater sense of SURVEY RESEARCH 245 privacy and may therefore elicit more candid responses interviewing task can be done in a standardized way than telephone interviews and face-to-face interviews while still establishing rapport. (e.g., Bishop 6. Fisher, 1995; Cheng, 1988; Wiseman, SBLECTING INTERVIEWERS. It is best to use expe- 1972). rienced, paid interviewers, rather than volunteers or students, because the former approach permits the QUBSTIONNAIRB LENGTH. Face-to-face data col- researcher to be selective and choose only the most lection permits the longest interviews, an hour or more. skilled and qualified individuals. Furthermore, volun- Telephone interviews are typically quite a bit shorter, teers or students often have an interest or stake in the usually lasting no more than 30 min, because respon- substantive outcome of the research, and they may dents are often uncomfortable staying on the phone have expectancies that can inadvertently bias data col- for longer. W~thself-administered questionnaires, re- lection. sponse rates typically decline as questionnaire length Whether they are to be paid for their work or not, inaeases, so they are generally kept even shorter. all interviewers must have good reading and writing skills, and they must speak clearly. Aside from these ba- LENGTH OF DATA COLLBCTION PERIOD. Di~Mbut- sic requirements, few interviewer characteristics have ing questionnaires by mail requires significant amounts been reliably associated with higher data quality (Bass of time, and follow-up mailings to inaease response 6. Tortora, 1988; Sudrnan 6. Bradburn, 1982). How- rates further increase the overall turnaround time. ever, interviewer characteristics can sometimes affect Similarly, face-to-face interview surveys typically re- answers to questions relevant to those characteristics. quire a substantial length of time in the field. In con- One instance where interviewer race may have had trast, telephone interviews can be completed in very impact along these lines involved the 1989 Virginia gu- little time, within a matter of days. bernatorial race. Preelection polls showed Black candi- date Douglas Wilder with a very comfortable lead over AVAILABILITY OF STAFF AND PACILITIBS. Self- his White opponept. On election day, Wilder did win administered mail surveys require the fewest facilities the election, but by a 'slim margin of 0.2%. Accord- and can be completed by a small staff. Face-to-face or ing to Pinkel, Guterbock, and Borg (1991), the over- telephone interview surveys are most easily conducted estimation of support for Wilder was due at least in with a large staff of interviewers and supervisors. And part to social desirability. Some survey participants ap- ideally, telephone surveys are conducted from a central parently believed it was sodally desirable to express location with sufficient office space and telephone lines support for the Black candidate, especially when their to accommodate a staff of interviewers, which need not interviewer was Bladc. Therefore, these respondents be large. overstated their likelihood of voting for Wilder. Likewise, Robinson and Rohde (1946) found that Interviewing the more clearly identifiable an interviewer was as being Jewish, the less likely respondents were to ex- When data are collected face-to-face or via tele- press anti-Jewish sentiments. Schuman and Converse phone, interviewers play key roles. We therefore re- (1971) found more favorable views of Blacks were view the role of interviewers, as well as interviewer se- expressed to B4& interviewers, though no race-of- lection, training, and supervision (see J. M. Converse interviewer effects appeared on numerous items that 6. Schuman 1974; Fowler 6. Mangione, 1986, 1990; did not explicitly ask about liking of Blacks (see also Saris, 1991). Hyman, Feldman, 6. Stember, 1954). It seems impos- sible to eliminate the impact of interviewer race on THE ROLE OF THB INTBRVIHWBR. Survey inter- responses, so it is preferable to randomly assign inter- viewers usually have three responsibilities. First, they viewers to respondents and then statistically control for are often responsible for locating and gaining coop- interview race and the match between interviewer race eration from respondents. Second, interviewers are and respondent race in analyses of data on race-related responsible to 'train and motivate" respondents to pro- topics. More broadly, incorporating interviewer char- vide thoughtful, accurate answers. And third, inter- acteristics in statistical analyses of survey data seems viewers are responsible for executing the survey in a well worthwhile and minimally costly. standardized way. The second and third resp~nsibili- ties do conflict with one another. But providing explicit TRAINING INTERVIEWERS. Interviewer training is cues to the respondent about the requirements of the an important predictor of data quality (Fowler 6. 146 PENNY S. VISSBR, JON A. KROSNICK, AND PAUL J. LAVRAKAS

Mangione, 1986, 1990). Careful interviewer training of the interviews themselves should be supervised. can presumably reduce random and systematic sur- When surveys are conducted by telephone, monitor- vey error due to interviewer mistakes and nonstan- ing the interviews is relatively easy and inexpensive dardized survey implementation across interviewers. and should be done routinely. When interviews are It seems worth the effort, then, to conduct thorough, conducted face-to-face, interviewers can tape record well-designed training sessions, especially when one some of their interviews to permit evaluation of each is using inexperienced and unpaid interviewers (e.g., aspect of the interview. students as part of a class project). aaining programs last 2 days or longer at some survey research organi- v~mo~.When data collection occurs from a zations, because shorter training programs do not ad- single location (e.g., telephone interviews that are equately prepare interviewers, resulting in substantial conducted from a central phone bank), researchers reductions in data quality (Fowler 6 Mangione, 1986, can be relatively certain that the data are authen- 1990). tic. When data collection does not occur from a cen- In almost all cases, training should cover topics such tral location (e.g., face-to-face interviews or telephone as interviews conducted from interviewers' homes), re- searchers might be less certain. It may be tempting how to use all interviewing equipment, for interviewers to falsify some of the questionnaires procedures for randomly selecting respondents that they turn in, and some occasionally do. To guard within households, against this, researchers often establish a procedure for techniques for eliciting survey participation and confirming that a randomly selected subset of all in- avoiding refusals, terviews did indeed occur (e.g., recontacting some re- oppommities to gain familiarity with the survey in- spondents and asking them about whether the inter- strument and to practice administering the ques- view took place and how long it lasted). tionnaire, instructions regarding how and when to probe in- complete responses, CONCLUSIONS instructions on how to record answers to open- and Over the last several decades, social psychological closed-ended questions, and researchers have come to more fully appreciate the guidelines for establishing rapport while maintain- complexity of social thought and behavior. In do- ing a standardized interviewing atmosphere. main after domain, simple 'main effect" theories have been replaced by more sophisticated theories in- aaining procedures can take many forms (e.g., lec- volving moderated effects. Statements such as 'Peo- tures, written training materials, of real or ple behave like this" are being replaced by 'Cer- simulated interviews), but it is important that at least tain types of people behave like this under these part of the training session involve supeMsed practice circumstances." More and more, social psychologists interviewing. Pairs of trainees can take turns playing are recognizing that psychological processes apparent the roles of interviewer and respondent, for example. in one social group may operate differently among And role playing might also involve the use of various other social groups and that personality factors, so- 'respondent scripts" that present potential problems for cial identity, and cultural norms can have a profound the interviewer to practice handling. impact on the nature of many social psychological phenomena. SUPERVISION. Carefully monitoring ongoing data And yet, the bulk of research in the field contin- collection permits early detection of problems and ues to be conducted with a very narrow, homogeneous seems likely to improve data quality. In self- base of participants - the infamous college sophomores. administered surveys, researchers should monitor in- As a result, some scholars have come to question the coming data for signs that respondents are having generalizability of social psychological findings, and trouble with the questionnaire. In face-to-face or tele- some disciplines look with skepticism at the bulk of phone surveys, researchers should maintain running our empirical evidence. Although we know a lot about estimates of each interviewer's average response rate, the way college students behave in contrived labora- level of productivity, and cost per completed interview. tory settings, such critics argue, we know considerably The quality of each interviewer's completed ques- less about the way all other types of people think and tionnaires should be monitored, and if possible, some behave in their real-world environments. SURVEY RESEARCH 247

In this chapter, we have provided an overview of Ahlgren, A. (1983). Sex differences in the correlates of co- a research methodology that pennits social psycholo- operative and competitive school attitudes. Devr&pmrntal gists to address this concern about the meaning and Psycholology, 19,881-888. value of our work. Surveys enable scholars to explore Alwin, D. F., Cohen, R. L., 6. Newcomb, T. M. (1991). The social psychological phenomena with samples that ac- women of Bmningtmr: A of political orientations over the lifr span. Madison: University of Wiscansin Press. curately represent the population about whom gen- Alwin, D. F., 6. Jackson, D. J. (1982).Adult values for chil- eralizations are to be made. And the incorporation of dren: An application of factor analysis to ranked prefer- experimental manipulations into survey designs offers ence data. In K. F. Schuessler (Ed.),Sociologicalmethodology special scientific power. We believe that these advan- 1980. San PransW Jossey-Bass. tages of survey research make it a valuable addition Babbie, E. R. (1990). Survcy research methods. Belmont, CA: to the methodological arsenal available to social psy- : Wadsworth. chologists. The incorporation of this methodology into Bardn, R. M., 6. Kenny, D. A. (1986). The moderator- a full program of research enables researchers to trian- mediator variable distinction in social psychological re- gulate aaoss measures and methods, providing more search: Conceptual, strategic, and statistical considera- compelling evidence of social psychological phenom- tions. Journal of Personalily and Soda1 Psychology, 51.1 173- 1182. ena than any single methodological approach can. R. this Bass, R. T., 6 Tortora, D. (1988). A comparison of cen- Perhaps a first step for many scholars in direc- tralized CAT1 facilities for an agricultural labor survey. In tion might be to make use of the many archived sur- R. M. Groves, P. P. Beimer, L. E. Lyberg, J. T. Massey, W. vey data sets that are suitable for secondary analysis. L. Nicholls, 6. J. Waksberg (Eds.), Trlephonesumy method- The University of Michigan is the home of the Inter- ow(pp; 497-508). New York: Wiley. university Consortium for Political and Bearden, W. Q., Netemeyer, R. G., 6. Mobley, M. F. (ICPSR), which stores and makes available hundreds of (1993). Handbook of scales. Newbury Park, CA: survey data sets on a wide range of topics, dating back sage. at least five decades. Regardless of what topic is of inter- Ban, D. J., 6 McConneU, H. K. (1970.). Testing the self- est to a social , relevant surveys are likely perception explanation of dissonance phenomena: On the to have been done that could be usef'ully reanalyzed. salience of premanipulation attitudes. Journal of Pmmral- 3 ilyandSodalPrydtology, 1123-31. And the cost of accessing these data is quite minimal Benet, V., 6.Waller, N. G. (1995).The big seven factor model to scholars working at academic that are of personality description: Evidence for its cross-cultural members of ICPSR and only slightly more to others. generality in a Spanishhsample.Journal of Personalily and Also, the Roper Center at the University of Connecti- soda1 Prydtology, 69,701-718. cut archives individual survey questions from thou- Bischoping, K. (1989).An evaluation of interviewerdebrief- sands of surveys, some in ICPSR and some not. They ing in survey pretests. In C. F. Cannell, L. Oskenberg, F. J. do computer-based searches for questions containing Fowler, G. Kalton, 6 K. Bischoping (Eds.), New techniques particular key words or addressing particular topics, for ptrUrting survey quemons (pp. 15-29). Ann Arbor, MI: allowing social psychologists to make use of data sets Survey Research Center. collected as long ago as the 1940s. Bishop, G. F., 6 Fisher, B. S. (1995).'Seaet ballots" and self- reports in an exit-poll experiment. Public Opinion Quar- So even when the cost of conducting an original sur- tn&, 59, 568-588. vey is prohibitive, survey data sets have a lot to offer Blalodr. H. M. (1972). Caw1 infffrmcrr in non-mental social psychologists. We hope that more social psychol- march. New York: Norton. ogists will take advantage of the opportunity to col- Blalock, H. M. ( 1985).Causal models in panel and e.kpmmmrntal lect and analyze survey data in order to strengthen our desi~.New York: Aldine. collective enterprise. Doing so may require somewhat Bogaert, A. F. (1996). Volunteer bias in human sexuality higher levels of funding than we have had in the past, research: Evidence for both sexuality and personality dif- but our theoretical richness, scientific aedibiity, and ferences in males. Archives of Stncal Bchaviot. 25, 125- impact aaoss disciplines are likely to grow as a result, 140. in ways that are well worth the price. Bradbum, N. M., 6. Sudman, S. (1979). Improving inter- view method and questionnaire design. San Francisco: Jossey- Bass. Bradburn, N. M., Sudman, S., 6. Associates. (1981 ). Improv- RBPBRBNCBS ing intrrvinvmethodand qwstt'onnairc &sign. San Francisco: Adomo, T. W., Frenkel-Brunmrik, E., Levinson, D. J., 6. Jossey-Bass. Sanford, R. N. (1950). The authoritarian personalily. New Brehm, J. (1993). Thr phantomrrspondntts. Ann Arbor: York: Harper 6. Row. University of Michigan Press. 248 PENNY S. VISSER, JON A. KROSNICK, AND1 PAUL J. LAVRAKAS

Brehm, J., 6 Rahn, W. (1997). Individual-level evidence for Couch, A., 6 Keniston, K. (1960).Yeasayers and naysayers: the causes and consequences of social capital. American Agreeing response set as a personality variable. Journal of Journal of Political Sciena, 41,999-1 023. Abnormal and Soda1 Psychology, @,15 1-1 74. Bridge, R. G., Reeder, L. G., Kanouse, D., Kinder, D. R., Coye, R. W. (1985). Characteristicsof partidpants and non- Nagy, V. T., 6 Judd, C. M. (1977). Interviewing changes partidpants in experimental research. Psychological &- attitudes - sometimes. Public Opinion Quarterly, 41, 56- ports, 56, 19-2 5. 64. Crano, W. D., 6 Brewer, M. B. (1986).Principalsandmethods Byme, D. (1971). Thr amdm paradigm. New York: of social research. Newton, MA: AUyn and Bacon. Academic Press. Dickinson, T. L., 6 Zellinger, P. M. (1980). A compari- Campbell, D. T., 6 Fiske, D. W. (1959). Convergent and di- son of the behaviorally anchored rating and mixed stan- vergent validation by the multitrait-multimethod matrix. dard scale formats. Journal of Applied Psychology, 65, 147- Psychological Bulletin, 56,881-105. 154. Campbell, D. T., 6 Stanley, J. C. (1963). Experimental and Dillman, D. A. (1978). Mail and telephone surwys: Thr total quasi-experimental designs for research. In N. L. Gage design method. New York: Wiley. (Ed.), Handbook of march on teaching (pp. 171-246). Dollinger, S. J., 6 Leong, F. T. (1993). Volunteer bias and Chicago: Rand McNally. the five-factor model. Journal of Psychology, 127,29-36. Cannell, C. F., Miller, P., 6 Oskenberg, L. ( 1981). Research Donovan, R. J., 6 Leivers, S. (1993). Using paid advertising on interviewing techniques. In S. Leinhardt (Ed.), Sodo- to modify racial stereotypebeliefs. Public Opinion Quarterly, logical methodology (pp. 389-4371. San Prandsco: Jossey- 57,205-218. Bass. Ebel, R. L. (1982). Proposed solutions to two problems of Cannell, C. F., Oskenberg, L., 6 Converse, J. M. (1977). test construction. Journal of Educational Measurement, 19, lhprirnmts in intcnkving techniques: PirId orprrinrmts in 267-278. health reporting. 1971-1977. Hyattsville, MD: National Ehrlich, H. J. (1964). Instrument error and the study of prej- Center for Health Services Research. udice. Soda1 PO=, 43,197-206. Caspi, A., Bem, D. J., 6 Elder, G. H., Jr. (1989). Continuities Eifermann, R. R. (1961). Negation: A linguistic variable. and consequences of interactional styles aaoss the life ACra Psychologica, 18,258-273. course. Joumal of Personality, 57,375-406. Elig, T. W., 6 Frieze, I. H. (1979). Measuring causal attribu- Chew S. (1988). Subjective quality of life in the planning tions for success and failwe. Journal ofPmonaltyandSocia1 and evaluation of programs. Bvaluath and l4vgram Plan- psvchology, 37,62 1-634. ning, 11. 123-134. England L. R. (1948). Capital punishment and open-end Cohen, D., Nisbett, R. E., Bowdle, B. P., 6 Schwm, N. questions. Public Opinion Quarterly, 12,412416. (1996). Insult, aggression, and the southern culture of Finkel, S. E., Guterbock, T. M., 6 Borg, M. J. (1991).Race- honor: An 'experimental .." Journal of P.- of-interviewereffects in a preelection poll: Virginia 1989. ality and Soda1 Psychology, 70,945-960. Public Opinion Quarterly, 55, 31 3-330. Congressional Information Service. (1990). Anvrican statis- Forsyth, B. H., 6 Lessler, J. T. (1991). Cognitive laboratory tical inda. Bethesda, MD: Author. methods: A taxonomy. In P. Biemer, R. Groves, L. Lyberg, Converse, J. M., 6 Presser, S. (1986). ~q~:Hand- N. Mathiowetz, 6 S. Sudman (Eds.), Measumnrnt mor in crafh'ng the standardized qwriirmnain. Beverly Wls. CA: surwys (pp. 393-418). New York: Wiley. Sage. Fowler, F. J. (1988). Surwy march methods (2nd 4.). Converse, J. M., b Schuman, H. (1974). CkmmaZionsat ran- Beverly Hills, CA: Sage. dom. New York: Wiley. Fowler, F. J., Jr., 6 Mangione, T. W. (1986). Reducing in- Converse, P. E. (1964). The nature of belief systems in the terviewer effccts on health surwy data. Washington, DC: mass public. In D. E. Apter (Ed.), Zdcology and discontent National Center for Health Statistin. (pp. 206-261). New York: Free Press. Fowler, F. J., Jr., 6 Mangione, T. W. (1990). Standardized Cook, A. R., 6 Campbell, D. T. (1969). Quusi-ts: surwy inm'cwing. Newbury Park, CA: Sage. Design and analjwis busforprld settings. Skokie, IL: Rand Prey, J. H. (1989).Sun*yresemch by telephone (2nd 4.).New- McNally. bury Park, CA: Sage. Coombs, C. H., 6 Coombs, L. C. (1976). 'Don't know': Geer, J. G. (1988). What do open-ended questions measure? Item ambiguity or respondent uncertainty? Public Opin- Public Opinion Quarterly, 52, 365-371. h Qlcarterly,40,497-5 14. Glenn, N. 0. (1980). Values, attitudes, and beliefs. In Cooper, D. R., 6 Clare, D. A. (1981). A majpitude estima- 0. G. Brim 6 J. Kagan (Eds.), Constancy and change in hu- tion scale for human values. Psychologic111Reports, 49,43 1- man development (pp. 596440). Cambridge, MA: Harvard 438. University Press. Costa, P. T., McCrae, R. R., 6 Arenberg, D. (1983). Recent Granberg, D. (1985). An anomaly in political perception. longitudinal research on personality and aging. In K. W. Public Opinion Quarterly, 49, 504-5 16. Schaie (Ed.), Longf'Zudina~sZudbofadult psychological &I- Granberg, D., 6 Holmberg, S. (1992). The Hawthorne ef- opmmt (pp. 222-263). New York: Guilford Press. fect in election studies: The impact of survey partidpation SURVEY RESEARCH 249

on voting. British Journal of Politicul Scim, 22, 240- Kenny, D. A. (1979). Cornlation and carcsaliiy. New York: 247. Wiley. Greenwald, A. G., Carnot, C. G., Beach, R., 6 Young, B. Kessler, R. C., 6 Greenberg, D. P. (1981).Linear panel anal- ( 1987). Increasing voting behavior by asking people if @: Mo&k of quantitatiw change. New York: Academic they expect to vote. Journal of Applied Psychology, 72, 31 5- Press. 318. Kinder, D. R. (1978).Political person perception: The asym- Groves, R. M. (1989). Survey errors andsurvey mts. New York: metrical influence of sentiment and choice on perceptions Wiley. of presidential candidates. Journal of Personaliiy and Social Groves, R. M., 6 Kahn, R. L. (1979). Su~cyrby telephone: Psychology, 36,859-871. A national eonzpariron with personal indrrvinus. New York: Kinder, D. R., 6 Sanders, L. M. (1990). Mimicking polit- Wiley. ical debate within survey questions: The case of White Han, S., 6 Shavitt, S. (1994).Penuasionand culture: Adver- opinion on affirmative action for Blacks. Social Cognition, tising appeals in individualistic and collectivist sodeties. 8, 73-103. Journal of Experimental and Soda1 Psycholo#, 30,326350. Kish, L. (1965).. New York: Wiley. Hansen, M. H., 6 Madow, W. G. (1953).Survey methodsand Kitayama, S., 6 Markus, H. R. (1994). Emotion and CUE themy. New York: Wiley. tun: Empirical studies of mutual injluenac. Washington, DC: Heine, S. J., 6 Lehman, D. R. (1995).Cultural variation in American Psychological Association. unrealistic optimism: Does the west feel more invulnaa- Klockars, A. J., 6 Yamagishi, M. (1988).The influence of ble than the east? Journalof Personality andSonaIPsychology, labeIs and positions in rating scales. Journal of Eduational 68, 595-607. Measumnmt, 25,85-96. Henry, G. T. ( 1990). Pracfhl sampling. Newbury Park, CA: Kraut, R. E., 6 McConahay, J. B. (1973).How being in- Sage. terviewed affects voting: An experiment. Public Opinion Hewstone, M., Bond, M. H., &Wan, K. C. (1983).Sodalfac- Qumtwly, 37, 398406. ton and social attribution: The explanation of intergroup Krosnidc, J. A. (1988a)-Attitude importance and attitude differences in Hong Kong. Social Cognition, 2, 142-157. change. Journal of EXpcrimental Social Psychology, 24,240- Himmelfarb, S., 6 No&, P. H. (1987).An examination of 255. testing effects in a panel study of older persons. PmmIiiy Krosnick, J. A. (1988b). The role of attitude importance and Social Psycho&# Bulletin, 13,188-209. in social evaluation: A study of policy preferences, Hothersall, D. (1984). History of psychology. New York presidential candidate evaluations, and voting behav- Random House. ior. Journal of Pmonaliiy and Social Psycho&#, 55, 196 Hovland, C. I., Harvey, 0. J., 6 Sherif, M. (1957).Asdmi- 210. lation and contrast effects in reactions to communication Krosnick, J. A. ( 1991a). Americans' perceptions of presiden- and attitude change. Journal of Personaliiy and Social m- tial candidates: A test of the projection hypothesis. Journal chology, 55,244252. of Soda1 Zmm, 46,159-182. Hurd, A. W. (1932).Comparisons of short answer and multi- Krosnick, J. A. (199 lb). Response strategies for coping with ple choice tests covering identical subject content. Journal the cognitive demands of attitude measures in surveys. of Educational Psychology, 26,28-30. Applied Cognitive Psychology, 5,213-236. Hyman, H. A., Peldman, J., 6 Stember, C. ( 1954). hWviw- Krosnick, J. A., 6 Alwin, D. P. (1987).An evaluation of a ing in soda1 research. Chicago: University of Chicago Press. cognitive theory of response order effects in survey mea- Jackman, M. R. (1973,June). Education and prejudice or surement. Public Opinion Quarterly, 51,201-219. education and response-set? American Soab&gical Review, Krosnick, J. A., 6 Alwin, D. P. (1988).A test of the form- 38,327-339. resistant correlation hypothesis: Ratings, rankings, and Jackson, J. B. ( 1979). Bias in dosed-ended issue questions. the measurement of values. Public Opinion Quartrrb,52, Political Methodology, 6, 393424. 526538. James, L. R., 6 Singh, B. H. (1978).An introduction to the Krosnick, J. A., 6 Alwin, D. F. (1989).Aging and suscep- logic, assumptions, and the basic analytic procedures of tibility to attitude change. Journal of Pmnaliry and Social two-stage least squares. Psychologiurl Bulletin, 85, 1104- Psychology, 57,416425. 1122. Krosnick, J. A., 6 Berent, M. K. (1993). Comparisons of Jenkins, J. G. (1935).Psydro&#inbusinessandindwtry. New party identification and policy preferences: The impact of York: Wiley. survey question format. American Journal of Political Sd- Judd, C. M., 6 Johnson, J. T. (1981).Attitudes, polarization, ma^, 37,941-964. and diagnosticity: Exploring the effect of affect. Journal of Krosnick. J. A., Boninger, D. S., Chuang, Y. C., Berent, M. K., Personality and Social Psychology, 41,2636. 6 Camot, C. G. (1993).Attitude strength: One construct Kalton, G. (1983).Zntroductitm to survey sampling. Beverly or many related constructs?JournalofPmnaliiy and Soda1 Wls, CA: Sage. Psychology, 65, 1132-1151. Katz, D. (1942). Do interviewen bias poll results? Public Krosnick, J. A., 6 Brannon, L. A. (1993).The impact of the Opinion Quarterly, 6, 248-268. Gulf War on the ingredients of presidential evaluations: 250 PENNY S. VISSRR, JON A. KROSMCK, ANDI PAUL J. LAVRAWS

Multidimensional effects of political involvement. Mosteller, P., Hyman, H., McCarthy, P. J., Marks, B. S., 6 ean RPMRPMcw,87,963-975. Ihunan, D. B. ( 1949). The pre-e&& polls of 1948: Report Krosnick, J. A., 6 Pabrigar, L. R. (in press). Designing great to the Gntunitz2e on Analysis of Pre-EIeftr'onPolls and Forecasts. question~im:Insights from psychology. New York: Oxford New York: Sodal Science Research Council. University Press. Munson, J. M., 6 McIntyre, S. H. ( 1979). Developing prac- Krosnick, J. A., 6 Kinder, D. R. (1990). Altering popu- tical procedures for the measurement of personal values lar support for the president through priming: The Iran- in cross-cultural marketing. Journal of Marketing Research, Contra affair. Political Science 84, 497- 16.48-52. 512. Myers, J. H., 6 Warner, W. G. (1968). Semantic properties Krosnick, J. A., 6 Petty, R. E. (1995).Attitude strength: An of selected evaluation adjectives. Journal of Marketing Re- overview. In R. E. Petty 6 J. A. Krosnick (Eds.), Attitude search, 5,409412. strength: Antccedmts and mequmm (pp. 1-24). We, Nathan, B. R., 6 Alexander, R. A. (1985). The role of infer- NJ: Erlbaum. ential accuracy in performance rating. Acadnny of Manage- Laumann, E. 0.. Michael, R. T., Gagnon, J. H., 6- Michaels, mentRrvinu, 10, loe115. S. ( 1994). Thr soda1 organization of sclacoliiy: Sancal prac- Nelson, D. (1985). Informal testing as a means of question- tices in the United States. Chicago: University of Chicago naire development. Journal of 0malStatistia, 1, 179-1 88. Press. Nesselroade, J. R., 6 Baltes, P. B. (1974). Adolescent per- Lavrakas, P. J. (1993). Telephone survey methods: Sampling, sc- sonality development and historical change: 1970-1972. kction, and supervision (2nd ed.). Newbury Park, CA: Sage. Monographs of the Society for Research in Child Dtwlopmmt, Lin, I. P., 6 Shaeffer, N. C. (1995). Using survey participants 39 (No. 1, Serial No. 154). to estimate the impact of nonpartidpation. Public Opinion Nisbett, R. E. (1993). Violence and U.S. regional culture. Quartrrly, 59,236-258. AmmAmmeanPsychologist, 48,441-449. Lindzey, G. E., 6 Guest, L. (1951).To repeat - chmcan Nisbett, R. E., 6 Cohen, D. (1996). Culture of honor: The be dangerous. Public Opinion Quarterly, 15, 355-358. psycho&gy of Violence in the south. Boulder, CO: Westview Lodge, M. ( 1981). Magnitude sealing: Quantitatiw measurement Press. of opinions. Beverly Hills, CA: Sage. Nisbett, R. B., 6 Wilson, T. D. (1977).Telling more than we Markus, H. R., 6 Kitayama, S. (1991). Culture and the can know: Verbal reports on mental processes. PsychologV

self: Implications for cognition, emotion, and motiva- &, 231-259. I tion. Journal of Pmonali@ and Soda1 PrychoLyy, 98, 224- Patrick, D. L., Bush, J. W., 6 Chen, M. M. (1973). Meth- I 253. ods for measuring levels of well-being for a health status I Marquis, K. H., Cannell, C. P., 6 Laurent, A. (1972). Report- index. Health Scrvim Research, 8,228-245. ing for health events in household interviews: Effects of Payne, S. L. (194911950). in question complex- reinforcement, question length, and reintervim. In Vital ity. Public Opinion Quarterly, 13,653-658. 1 and health statisth (Series 2, No. 45) (pp. 1-70). Washing- Peffley, M., 6 Hurwitz, J. (1997).Public perceptions of race l ton, DC: U.S. Government Printing Offlce. and crime: The role of racial stereotypes. Ameriun Journal I Matell, M. S., 6 Jacoby, J. (1971).Is there an optimal num- of Political Sn'enac, 41, 375-401. ber of alternativesfor items? Study I: Reliabil- Peffley, M., Hurwitz, J., 6 Sniderman, P. M. (1997). Racial ity and validity. Edmatio~Iand mlogiculMeasurrmmt, stereotypes and Whites' political views of Blacks in the 31.657474. context of welfare and crime. American Journal of Political McClendon, M. J. (1991). Acquiescence and recency Scimce, 41, 3060. response-order effects in interview surveys. Sodolqhzl Petty, R. E., 6 Cacioppo, J. T. (1986). Communication and Methoak and Research, 20,60-103. pmuasion: Central and peripheral routes to attitude change. McClendon, M. J., 6 Alwin, D. P. (1993). No-opinion filters New York: Springer-Verlag. and attitude measurement reliability. Sodological Methods Presser, S., 6 Blair, J. (1994). Survey pretesting: Do differ- and Research, 21,438464. ent methods produce different results? In P. V. Marsden Miethe, T. D. (1985). The validity and reliability of value (Ed.), Socblogicul methodology, 1994 (pp. 73-104). Cam- . Journal of Pmonaliv, 119,441453. bridge, MA: Blackwell. Mirowsky, J., 6 Ross, C. E. (1991). Eliminating defense Rahn, W. M., Krosnick, J. A., 6 Breuning, M. (1994). Ra- and agreement bias from measures of the sense of con- tionalization and derivation processes in survey studies of trol: A 2 x 2 index. Soda1 Psychology QwrWly, 54, 127- political candidate evaluation. Amdean Journal of Political 145. Sam,38, 582400. Mortimer, J. T., Pinch, M. D., 6 Kumka, D. (1982). Penis- Rankin. W. L., 6 Grube, J. W. (1980). A comparison of rank- tence and change in development: The multidimensional ing and rating procedures for value system measurement. self-. In P. B. Baltes 6 0. G. Brim, Jr. (Bds.), Lifr- European Journal of Social l?ycho&gy,10,233-246. span development and behavior (Vol. 4, pp. 263-312). New Remmers, H. H., Marschat, L. E., Brown, A., 6 Chapman, York: Academic Press. I. (1923).An experimental study of the relative dif5culty SURVEY RESEARCH 251

of true-false, multiple-choice, and incomplete-sentence Smith, T. W. (1987).That which we call welfare by any other types of examination questions. Journal of Bducational Psy- name would smell sweeter: An analysis of the impact chology, 14,367-372. of question wording on response patterns. Public Opinion Reynolds, T. J., 6 Jolly, J. P. (1980). Measuring personal Quartrrly, 51,7543. values: An evaluation of alternative methods. Journal of Sniderman, P. M.,6 Tetlock, P. E. (1986).: Marketing Research, 17,53 1-536. Problems of motive attribution in political analysis. Jour- Rhee, E., Uleman, J. S., Lee, H. K., 6 Roman, R. J. (1995). nal of Sodal Issues, 42, 129-1 50. Spontaneous self-descriptionsand ethnic identities in in- Snidennan, P. M., 'ktlock, P. E., 6 Peterson, R. S. (1993). dividualistic and collectivist cultures. Journal of Prrsonality Radsm and . Politics and the Individual, 3,

and Social Psjvhology, 69, 142-1 52. I 1-28. Roberto, K. A., 6 Scott, J. P. (1986).Confronting widow- Stapp, J., 6 Fulcher, R. (1983). The employment of hood: The influence of informal supports. A-can &- APA members: 1982. Amerifan Psychologist, 38, 1298- Moral Seimrist, 29,497-5 1 1. 1320. Robinson, D., 6 Rohde, S. (19%). TWO experiments with Sudman, S. (1976). Applied sampling. New York: Academic an anti-Semitism poll. Journal of Abnormal and Soda1 Psy- Press. chology, 41,136-144. Sudman, S., 6 Bradbum, N. M. (1982).Askingquestions. San Ross, M. (1989). Relation of implidt theories to the con- Frandsco: Jossey-Bass. struction of personal histories. Pychological RNinv, 96, Sudman, S., Bradbum, N. M., 6 Schwa, N. (1996).Think- 341-357. ingabout answers: Thr application of cognitiwpr~esto survey Ross, M. W. (1988). Prevalence of classes of risk be- mcrhodology. San Prandsco: Jossey-Bass. haviors for HIV infection in a randomly selected Aus- %ylor, J. R., 6 Kinnear, T. C. (1971).Numerical compari- tralian population. Journal of Sex Research, 2% 441- son of alternative methods for collecting proximity judge- 450. ments. AmmAmmunMarketing Association Proceeding of the Fall Ruch, G. M., 6 DeGraff, M. H. (1926). Corrections for Crf~ttt@,547-550. chance and 'guess" vs. 'do not guess" instructions in Thornberry, 0. T., Jr., 6 Massey, J. T. (1988).Rends in multiple-response tests. Journal of Educational Psycho@, United States telephone coverage across time and sub- 17,368-375. groups. In R. M. Groves, P. P. Biemer, L. E. Lyberg, Saris, W. E. (199 1). Computer-assisted interw'ewing. Newbury J. T. Massey, W. L. Nicholls, 6 J. Waksberg (Eds.), Park, CA: Sage. Tckphone (pp. 25-50). New York: Schuman, H., 6 Converse, J. M. (197 1). The effects of Black Wiley. and White interviewerson Black responses in 1968. Public Tourangeau, R., 6 Rasinski, K. A. (1988).Cognitive pro- Opinion Quarterly,33x44-68. cesses underlying context effects in attitude measure- Schuman, H., Ludwig, J., 6 Krosnick, J. A. (1986).The per- ment. Psychological Bulletin, 103,299-3 14. ceived threat of nuclear war, salience, and open questions. hugott, M. W., Groves, R. M., 6 Lepkowski, J. M. Public Opinion Quarterly, 50, 519-536. (1987).Using dual frame designs to reduce nonresponse Schumau, H., 6 Presser, S. (1981).Qucsrions and ansum in in telephone surveys. Public Opinion Quarterly, 51, 522- attit& sum.San Diego, CA: Academic Press. 539. Schumaa H., Steeh, C., 6 Bobo, L. (1985). Radal atti- Tziner, A. ( 1987). Congruency issues retested using Rine- tudes in America: bmds and interpretations.Cambridge, MA: man's achievement climate notion. Journal of Social Be- Harvard University Press. havior and Pmonality, 2, 63-78. Schwa, N., 6 Clore, G. L. (1983).Mood, misattribution, Visser, P. S., Krosnidc, J. A., Marquette, J., 6 Curtin, M. and judgments of well-being: Informative and directive (1996).Mail surveys for election forecasting? An evalu- functions of affective states. Journal of Pmonality and Sodal ation of the Columbus Dispatch poll. Public Opinion Quar- Psychology, 45, 5 1 3-523. terly, 60,181-227. Sears, D. 0. (1983). The persistence of early political Wason, P. C. ( 1961 ). Response to annative and negative predispositions: The role of attitude object and Me binary statements. British Journal of Psychology, 52, 133- stage. In L. Wheeler (Ed.), of pmmlity and so- 142. dal psychology (Vol. 4, pp. 79-116). Beverly Hills, CA: Webster, D. M., 6 Kruglanski, A. W. (1994).Individual dif- Sage. ferences in need for cognitive closure. Journal ofPmonality Sears, D. 0. (1986).College sophomores in the laboratory: and Soda1 Psycholofl, 67, 1 049-1 062. Influences of a narrow data base on social psychology's Wegener, D. T., Downing, J., Krosnick, J. A., 6 Petty, R. E. view of human nature. Journal of Pmonality and SodalPsy- (1995).Measures and manipulations of strength-related chology, 51,515-530. properties of attitudes: Current practice and future di- Smith, T. W. (1983). The hidden 25 percent: An analysis rections. In R. E. Petty 6 J. A. Krosnick (Eds.),Attitude of nonresponse in the 1980 . Public strength: Antecedmts and mqurnm (pp. 45 5-487).Hills- Opinion Quarterly, 47,386404. dale, NJ: Erlbaum. art PENNY S. VISSBR, JON A. KROSNICK, AND PAUL J. LAVRAKAS

Weisberg, H. F., Haynes, A. A., & Krosnick, J. A. (1995). Wm,A., & Warneryd, B. (1990). Measurement errors Sodal group polarization in 1992. In H. F. Weisberg in survey questions: Explaining response variability. Soda1 (Ed.), Democracy'sfrast:Elebrons in America (pp. 241-249). Indicators Research, 22, 199-2 12. Chatham, NJ: Chatham House. Waer, J. D., Kanouse, D. E., & Ware, J. E., Jr. (1982). Weisberg, H. F., Krosnick, J. A., & Bowen, B. D. (1996). An Controlling for acquiescence response set in scale devel- introduction to survey march, polling, and hta analysis (3rd opment. Journal of Applied Psychology, 67, 555-561. ed.). Newbury Park, CA: Sage. Wiseman, F. (1972). Methodological bias in public opinion Wesman, A. G. ( 19%). The usefulness of correctly spelled surveys. Public Opinion Quarterly,36, 105-108. words in a spelling test. Journal of Educational Psychology, Yalch, R. F. (1976). Pre-election interview effects on voter 37,242-246. turnouts. Public Opinion Quarterb 40, 331-336.