
4-2003

Experiments and Quasi-experiments: Methods for Evaluating Marketing Options

Ann Lynn, Ithaca College

Michael Lynn, Cornell University, [email protected]



Recommended Citation Lynn, A., & Lynn, M. (2003). Experiments and quasi-experiments: Methods for evaluating marketing options [Electronic version]. Cornell Hotel and Restaurant Administration Quarterly, 44(2), 75-84. Retrieved [insert date], from Cornell University, School of Hospitality Administration site: http://scholarship.sha.cornell.edu/articles/109/


Experiments and Quasi-experiments: Methods for Evaluating Marketing Options

Abstract [Excerpt] Hospitality executives have available a number of different research methodologies and tools to aid them in decision making. Each is valuable in its own way, but no single technique can provide all the answers to decision makers' questions. This article advocates the increased use of experiments and quasi-experiments in hospitality-marketing research. To be as effective as possible, marketers should develop and test several potential courses of action before embarking on any of them. The best way to develop a variety of courses of action is to conduct exploratory or descriptive research. The best way to evaluate those options is to conduct a causal-research study that compares consumers' behavior when faced with various options. Experiments and quasi-experiments are under-used, but are nevertheless powerful research tools that allow hospitality marketers to draw strong causal conclusions about the effects of pricing, design, and other changes on the amount of money customers spend, or the number of visits they make to an establishment.

Keywords experiments, methods, hospitality industry

Disciplines Hospitality Administration and Management

Comments Required Publisher Statement © Cornell University. Reprinted with permission. All rights reserved.

Experiments and Quasi-experiments: Methods for Evaluating Marketing Options

Hospitality managers could achieve greater success with marketing initiatives by using experiments or quasi-experiments to test those initiatives.

BY ANN LYNN AND MICHAEL LYNN

Hospitality executives have available a number of different research methodologies and tools to aid them in decision making. Each methodology is valuable in its own way, but no single technique can provide all the answers to decision makers' questions. Exploratory research (such as focus groups and depth interviews1) and descriptive research (such as surveys2 or naturalistic observations3) can provide insight and understanding about business problems and opportunities and thereby guide decision makers' search for promising courses of action. These techniques do not, however, allow researchers to draw conclusions about cause-and-effect relationships. Consequently, these techniques are of limited usefulness in revealing how effective a specific potential action will be. That is the province of causal research methods, such as choice modeling and experimentation, which can help decision makers draw conclusions about the effects, benefits, and influences of their prospective actions. Exploratory, descriptive, and causal research methods have a place in every functional area of hospitality businesses. In this article we focus on the use of causal-research methods in hospitality marketing.

1 See: Robert J. Kwortnik, Jr., "Clarifying 'Fuzzy' Hospitality-management Problems with Depth Interviews and Qualitative Analysis," on pages 117-129 of this issue of Cornell Quarterly.
2 See: Matthew Schall, "Best Practices in the Assessment of Hotel-guest Attitudes," on pages 51-65 of this issue of Cornell Quarterly.
3 See: Kate Walsh, "...: Advancing the Science and Practice of Hospitality," on pages 66-74 of this issue of Cornell Quarterly.

Although systematic data on the use of different types of research in marketing are not available, several informal sources suggest that marketers often rely on exploratory and descriptive research, but rarely use causal research. For example, a web-based search by topic of Quirk's Marketing Research Review from 1986 to 2001 indicated that this magazine has published 162 articles on focus groups and 59 articles on telephone interviewing and mail surveys, but only 22 articles on choice modeling (i.e., conjoint or trade-off analysis).4 "Experiments" was not even listed as a search topic. Assuming that the frequency with which different research methods are written about in marketing-research magazines roughly reflects the frequency with which those methods are used by marketing researchers, the data from Quirk's would indicate that 90 percent of marketing research is exploratory or descriptive and only 10 percent is causal. Similar estimates were obtained from a query of the founders of two large firms engaged in marketing research for the hospitality industry. One estimated that 95 percent of hospitality-marketing-research expenditures are devoted to exploratory or descriptive research, while the other estimated that 80 percent of hospitality-research budgets are for exploratory or descriptive research.5 It is clear to us that causal methods such as experiments are a rarity in hospitality-marketing research today.

4 Quirk's Marketing Research Review can be searched at www.quirks.com/articles/search.asp.

In this article we advocate the increased use of experiments and quasi-experiments in hospitality-marketing research. The article is divided into three sections. Section one contains an explanation of why marketers should use causal research methods to evaluate the effects on consumers of different marketing actions. Section two contains a brief description of two causal-research methods, namely true experiments and quasi-experiments, along with a discussion of their strengths and weaknesses. Section three contains a discussion of issues relating to conducting experiments and quasi-experiments and interpreting their results, which should give the reader an understanding of how to conduct and evaluate this type of research.

The Need for Causal Research

Ideally, hospitality marketers would first conduct exploratory and descriptive research to get an understanding of marketing problems or opportunities and would use this information to develop multiple courses of action that they believe will address those problems or capitalize on those opportunities. The proposed courses of action would then be systematically tested to discover whether they actually influence consumption behavior. Too often, however, marketers conduct only exploratory or descriptive research (as described above) and then develop just one course of action based on what they learn from those exercises. Kevin Clancy and Peter Krieg characterize this failure to develop and test several marketing options as a form of "death-wish marketing."6 The problem with this practice is that the marketplace is so complex that no single course of action, even if well grounded in an understanding of the marketplace, is assured of producing the desired outcomes. In fact, marketers have a history of failing more than they succeed. Consider the following statistics compiled by Clancy and Krieg:
- the average brand loses market share each year,
- 90 percent of new products fail within three years,
- the average advertisement returns only 1 to 4 percent on the investment made in it,
- only 16 percent of trade promotions generate a profit, and
- the average firm satisfies less than 80 percent of its customers.7

5 The experts queried were Stanley Plog, founder of Plog Research, and Peter Yesawich, president and CEO of Yesawich, Pepperdine & Brown. The differences in their estimates probably reflect differences in the work done by their respective firms.
6 Kevin J. Clancy and Peter C. Krieg, Counter-intuitive Marketing (New York: Free Press, 2000).
7 Ibid.


These statistics suggest that there is enormous room for improvement in market research. If marketers rigorously tested various marketing options before settling on a course of action, we believe they could identify where failure will occur before encountering it first hand.

On the rare occasions that marketers do test different marketing options or evaluate specific marketing actions already undertaken, they often use exploratory or descriptive research methods that are poorly suited to support conclusions about the proposed actions' effects on consumer behavior. Focus groups and surveys, for instance, are used to get consumers' opinions about advertisements, frequency programs, new product ideas, and other marketing options under consideration. Those options that consumers report liking best are then implemented or continued. One example of this approach can be found in the Harris Ad Track Service, which surveys a national sample of adults about how much they like various ads being run in the marketplace and how effective they think those ads are.8

8 Sample Ad Track findings are reported in USA Today every Monday and can be found at www.usatoday.com/money/advertising/adtrack/index.htm.

Companies who subscribe to this service are told how consumer attitudes and opinions about their ads compare to the average of consumer attitudes and opinions about the other ads being evaluated. Presumably, companies use this information by continuing ads that score well and by discontinuing ads that score poorly. Among the hospitality brands whose ads have been evaluated using this service in the past several years are Avis, Burger King, Domino's, Hertz, Holiday Inn, KFC, McDonald's, Pizza Hut, Priceline.com, and Red Lobster.9

9 This claim is based on the list of Ad Track findings available at www.usatoday.com/money/advertising/adtrack/index.htm.

One of the problems with this use of descriptive research to evaluate marketing options is that it is based on incorrect assumptions about consumer psychology. Using consumers' attitudes and beliefs to predict how they will react to certain marketing initiatives assumes that their attitudes and beliefs strongly affect their consumption behavior. However, psychologists have found that behavior is affected by many factors and that specific attitudes and beliefs are only weakly predictive of how people will behave in any given situation.10 For example, attitudes towards an ad are only weakly related to purchase intentions and brand choice.11

10 See: David G. Myers, "Behavior and Attitudes," in Social Psychology, third edition (New York: McGraw-Hill, 1990), pp. 33-68; and A.W. Wicker, "Attitudes versus Actions: The Relationship of Verbal and Overt Behavioral Responses to Attitude Objects," Journal of Social Issues, Vol. 25 (1969), pp. 41-78.
11 See: Stephen P. Brown and Douglas M. Stayman, "Antecedents and Consequences of Attitude toward the Ad: A Meta-analysis," Journal of Consumer Research, Vol. 19, June 1992, pp. 34-51; and Gabriel Biehal, Debra Stephens, and Eleonora Curlo, "Attitude toward the Ad and Brand Choice," Journal of Advertising, Vol. 21, September 1992, pp. 19-36.

Quasi-experiments: A class of common field-research techniques in which at least one treatment is manipulated and there is at least one comparison. The difference between quasi- and true experiments is that in quasi-experiments consumers are not randomly assigned to treatments.

Random assignment: Assignment of consumers to treatments in such a way that each consumer has an equal chance of getting each treatment.

True laboratory experiment: A true experiment conducted in a model of the real world (a lab). Laboratory experiments are useful in basic research in consumer behavior because they can identify and explain the general conditions that influence consumer choices. While laboratory experiments are high in internal validity, they tend to be low in external validity.

True field experiment: An experiment conducted in the real world. Field experiments use random assignment, but do not attempt to control all factors extraneous to the ones being manipulated. Field experiments are useful for answering applied hospitality-marketing questions because they have high internal validity and high external validity.

Type-1 error: Concluding that the treatments being tested had an effect when they really did not.

Type-2 error: Concluding that the treatments being tested had no effect when they really did. -A.L. and M.L.
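The last two definitions above are easy to see in a small simulation. The sketch below is illustrative only and is not from the original article; the check amounts, standard deviation, sample size, and the use of Python with scipy are all assumptions chosen for the example.

```python
# Illustrative simulation of Type-1 and Type-2 errors (hypothetical numbers).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
ALPHA = 0.05          # conventional significance threshold
N_PER_GROUP = 30      # subjects per treatment group
TRIALS = 2000         # number of simulated experiments

def share_significant(true_difference):
    """Run many two-group experiments and return the share with p < ALPHA."""
    hits = 0
    for _ in range(TRIALS):
        control = rng.normal(50.0, 12.0, N_PER_GROUP)                   # e.g., average checks
        treated = rng.normal(50.0 + true_difference, 12.0, N_PER_GROUP)
        if stats.ttest_ind(treated, control).pvalue < ALPHA:
            hits += 1
    return hits / TRIALS

# No real effect: every "significant" result is a Type-1 error (about 5 percent).
print("Type-1 error rate:", share_significant(true_difference=0.0))

# A real $4 effect: the share of experiments that miss it is the Type-2 error rate.
print("Type-2 error rate:", round(1 - share_significant(true_difference=4.0), 2))
```

With these particular numbers the simulated Type-2 error rate is large, which anticipates the article's later point that small samples often miss real effects.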


The weak link between attitudes toward an advertisement and purchase behavior explains why popular ad campaigns, like Taco Bell's ads that feature a Chihuahua saying, "Yo quiero Taco Bell," often fail to increase sales.12 Asking consumers to predict their own behavior is also unreliable. Psychologists have found that people are not aware of all the factors that affect their behavior and, most important, that they cannot accurately predict how they will react to events.13 This is one reason that attempts to sell healthful or low-fat menu items in restaurants have generally failed despite consumers' reports that they want and will buy such items.14

To achieve company goals, marketers need to know what the effects of various marketing options or actions are on consumers' purchase behavior. While exploratory and descriptive research can provide information about consumer perceptions of, and attitudes toward, marketing options, these techniques cannot answer questions about how those options will affect consumer behavior. This is because the underlying causes of behavior are too complex to be accurately predicted from attitudes and opinions, even by consumers themselves. Fortunately, causal research methods can answer such questions. In the sections below we describe and discuss two causal research methods, namely experiments and quasi-experiments.

Experiments, Quasi-experiments, and Their Limitations

Experiments are a type of research based on the following logic. If you identify two or more groups that are equivalent, expose those groups to different treatments, and subsequently observe differences between the groups on some dimension of interest, then you can reasonably conclude that those differences must be caused by the treatments. The following are characteristics of true experiments: (1) at least one treatment group and one comparison group, (2) at least one outcome measure, and (3) random assignment of subjects to treatments. Experiments can be used to test the effects of different prices, ad appeals, sales promotions, product changes, or any other marketing actions being considered on consumer attitudes and, most important, behavior.

For example, an experiment examining the effects of two menu designs on a restaurant's sales might randomly assign dining parties to receive one of the two menus by flipping a coin. A party sees one menu design when heads comes up and the other menu design when tails comes up. By keeping track of which dining parties received which menu design, the experimenter can compare the average check achieved with each menu design. Assuming that the samples involved are large, random assignment of dining parties distributes their characteristics evenly across the different groups and ensures that the groups seeing each menu design (or treatment) really are comparable. Thus, any subsequent difference in average check observed between the treatment groups can be attributed to the menu designs, and the experimenter can be confident that she knows which of the two designs will produce the largest sales in that restaurant.
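A minimal sketch of that menu-design experiment in Python appears below. It is not from the original article; the dollar figures, the simulated $4 lift for one menu, and the use of a t-test from scipy are assumptions made for illustration.

```python
# Minimal sketch of the menu-design experiment described above (hypothetical data).
import random
from statistics import mean
from scipy import stats

random.seed(42)

def observed_check(menu):
    """Stand-in for a real dining party's check under a given menu design."""
    base = random.gauss(55.0, 15.0)                 # party-to-party variation
    return base + (4.0 if menu == "B" else 0.0)     # pretend menu B lifts checks by $4

checks = {"A": [], "B": []}
for _ in range(200):                                # 200 dining parties
    menu = "A" if random.random() < 0.5 else "B"    # the coin flip: random assignment
    checks[menu].append(observed_check(menu))

print("average check, menu A:", round(mean(checks["A"]), 2))
print("average check, menu B:", round(mean(checks["B"]), 2))

# A two-sample t-test asks whether the observed difference is larger than
# chance alone would plausibly produce.
print("p value:", round(stats.ttest_ind(checks["B"], checks["A"], equal_var=False).pvalue, 4))
```

In a real experiment the checks would come from the restaurant's point-of-sale records rather than a random-number generator; only the coin-flip assignment and the comparison of averages would stay the same.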
True experiments with random assignment to treatments are sometimes impossible or impractical. In this case, one can conduct a quasi-experiment, a procedure that has the following characteristics: (1) at least one manipulated treatment group and one comparison group,15 (2) at least one outcome measure, and (3) nonrandom assignment of subjects to treatments.

12 Christine R. McLaughlin, "Animals Gone Commercial: Do They Sell Products?," as viewed at animal.discovery.com/convergence/commercials/marketing_print.html.
13 See: David G. Myers, "Intuition: The Power and Limits of Our Inner Knowing," in Exploring Psychology (New York: McGraw-Hill, 1994), pp. 23-30; and R.E. Nisbett and T.D. Wilson, "Telling More Than We Can Know: Verbal Reports on Mental Processes," Psychological Review, Vol. 84 (1977), pp. 231-259.

14 See: Amy Oplinger, "... Says ...," Pr&, Fall 1998-Winter 1999, pp. 82-85; and Wilbert Jones, "New Wealth from Health," Restaurant Hospitality, October 1999, pp. 98-102.
15 The comparison group in both experiments and quasi-experiments can be a different group or the treatment group at a different point in time.


For example, a restaurant-chain executive may want to test the effects on sales of a proposed renovation of the chain's restaurants. Randomly assigning units within the chain to renovation and nonrenovation treatment groups would not be practical because renovating enough restaurants to make such random assignment meaningful would be too costly. In such a case, one could use a quasi-experimental design to test the redesign options.16 For example, the executive could (1) identify a pair of units that are well matched on relevant characteristics such as customer demographics and sales, (2) renovate one unit in the pair, and (3) compare the sales achieved by each unit. To the extent that the matched pair is similar on the characteristics that are most likely to affect the outcome variables, this quasi-experimental design provides a reasonable basis for conclusions about the effects of the renovation without the costs of renovating many units.

16 Interested readers can find a discussion of many quasi-experiment designs in: William R. Shadish, Thomas D. Cook, and Donald T. Campbell, Experimental and Quasi-experimental Designs for Generalized Causal Inference (New York: Houghton Mifflin, 2002).
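The three steps above can be expressed as a short calculation. The sketch below is illustrative rather than taken from the article; the unit names and daily sales figures are invented, and the comparison is deliberately kept to a simple difference in post-renovation averages.

```python
# Sketch of the matched-pair comparison described above (hypothetical daily sales).
# Step 1 (matching on demographics and past sales) is assumed to have been done;
# step 2 renovated unit A; step 3 compares the two units' sales after the work.
from statistics import mean

post_renovation_daily_sales = {
    "unit_A_renovated":  [8200, 7950, 8400, 8100, 8600, 8350, 8050],
    "unit_B_comparison": [7600, 7400, 7800, 7550, 7900, 7700, 7450],
}

avg_renovated = mean(post_renovation_daily_sales["unit_A_renovated"])
avg_comparison = mean(post_renovation_daily_sales["unit_B_comparison"])

print(f"renovated unit, average daily sales:  {avg_renovated:,.0f}")
print(f"comparison unit, average daily sales: {avg_comparison:,.0f}")
print(f"estimated effect of the renovation:   {avg_renovated - avg_comparison:,.0f} per day")

# Because the units were matched rather than randomly assigned, the estimate
# rests on the assumption that nothing else (a competitor closing, a local
# event) changed one unit but not the other during the study period.
```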
Although there are no theoretical limits to the size and complexity of experiments and quasi-experiments, practical considerations such as cost and the availability of suitable subjects generally restrict such studies to only a few conditions. One prominent experimental-marketing researcher reports that the average experiment he does has about three levels per variable studied.17 Direct-mail experiments often involve as many as 12 different treatments, but direct-mail experiments are typically less expensive than others, so this represents the high end of practicable experiment size. Thus, experiments and quasi-experiments are primarily useful in selecting from among a relatively narrow range of options.18

17 Eric Marder, The Laws of Choice (New York: Free Press, 1997).
18 If a marketer is interested in identifying the effects of more than 12 different treatments, experimentation may be less cost-effective than choice modeling. This is not the place for a detailed description of choice modeling. However, Cornell Quarterly readers can find an overview of one type of choice modeling known as discrete-choice analysis in: Rohit Verma, Gerhard Plaschka, and Jordan J. Louviere, "Understanding Customer Choices: A Key to Successful Management of Hospitality Services," Cornell Hotel and Restaurant Administration Quarterly, Vol. 43, No. 6 (December 2002), pp. 15-24. One thing to keep in mind is that experiments provide stronger evidence of cause-and-effect relationships than do choice models and, therefore, are preferable to choice models as long as their costs are not prohibitive. However, if an experiment is not possible or would be too expensive, then choice modeling is another causal method worth considering.

Issues in Designing Experiments and Interpreting Their Results

The important issues that arise when designing marketing experiments and interpreting their results involve three types of validity: statistical-inference validity, internal validity, and external validity.19 These three types of validity refer to the causal conclusions derived from an experiment.20 Such a conclusion has statistical-inference validity if the experimenter can rule out chance as an explanation for the absence or existence of differences between treatment groups. Such a conclusion has internal validity if the experimenter can rule out nonchance causes other than the intended treatments as a source of differences between treatment groups. Finally, such a conclusion has external validity if it can be generalized beyond the experimental sample and context. The requirements for each of these types of validity are briefly discussed below along with the implications for marketers who are designing an experiment or interpreting its results.

19 Academic researchers are concerned about a fourth type of validity, known as construct validity. A conclusion has construct validity if the variables being manipulated and measured in an experiment are correctly identified and labeled. This is a concern in basic science where researchers want to make conclusions about general, abstract constructs based on specific, concrete manipulations and measures. However, in applied marketing research, the variables researchers want to make conclusions about are generally defined by their operationalizations, so construct validity is not a potential problem.
20 For a thorough discussion of these types of validity, see: Shadish et al., op. cit.

The procedure for deciding on a sample size described in the accompanying article is the correct method. However, the required sample sizes indicated by this method are usually large, and marketers often want to avoid the costs of working with such large samples. In those cases, marketers may be tempted to run an experiment with smaller sample sizes than the number recommended by standard procedure, analyze the results, and then add additional subjects if a practically meaningful but not statistically significant effect is found.

We advise against this two-step procedure for two reasons. First, the small initial sample sizes may result in chance reductions of the observed effect such that what is in reality a practically meaningful effect appears not to be so. In that case, marketers will not add subjects, and the statistical power needed to avoid this Type-2 error will not be available. Second, the decision to run the experiment with additional subjects only when there is a sizeable but not significant effect in the initial, small sample biases the final test with the larger sample and increases the probability of a Type-1 error. If marketers can afford the additional subjects required by the second step of this procedure, they should use that larger sample size in the first place. -A.L. and M.L.
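The warning in the boxed note above (originally a sidebar) can be checked with a small simulation. The sketch below is not from the article; the sample sizes, the "sizeable effect" threshold, and the use of numpy and scipy are assumptions chosen to illustrate how peeking and then topping up a sample inflates the Type-1 error rate.

```python
# Simulation of the two-step procedure cautioned against in the note above.
# There is NO real treatment effect here, so every "significant" result is a
# Type-1 error; deciding to add subjects only after a promising first look
# pushes the error rate above the nominal 5 percent.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
ALPHA, SMALL_N, ADDED_N, TRIALS = 0.05, 20, 40, 4000

def significant(a, b):
    return stats.ttest_ind(a, b).pvalue < ALPHA

one_step_errors = two_step_errors = 0
for _ in range(TRIALS):
    a = rng.normal(100, 20, SMALL_N)
    b = rng.normal(100, 20, SMALL_N)            # identical populations: no true effect
    if significant(a, b):
        one_step_errors += 1                    # ordinary single test at n = 20
        two_step_errors += 1
    elif abs(a.mean() - b.mean()) > 5:          # "sizeable but not significant" effect
        a = np.concatenate([a, rng.normal(100, 20, ADDED_N)])
        b = np.concatenate([b, rng.normal(100, 20, ADDED_N)])
        if significant(a, b):                   # second look at the enlarged sample
            two_step_errors += 1

print("Type-1 rate, single test:        ", one_step_errors / TRIALS)
print("Type-1 rate, two-step procedure: ", two_step_errors / TRIALS)
```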


Statistical-inference Validity

A marketer has established statistical-inference validity if he or she has ensured that chance or random variation does not explain the observed differences among the treatment groups. Statistical-inference validity is threatened by random (or chance) variations in the outcome variable of an experiment. Such variation can lead to one of two fundamental errors when interpreting the experiment's results. First, chance can increase differences among treatment groups and lead experimenters to conclude that the treatments had an effect when they really did not. This is known as "Type-1 error." Second, chance can decrease differences among treatment groups and lead experimenters to conclude that the treatments had no effect when they really did. This is known as "Type-2 error." Marketers can reduce these two threats to statistical-inference validity by selecting appropriate acceptable alpha levels, obtaining sufficient sample sizes, and reducing within-treatment-group variability. Each of these methods of reducing statistical error is described below.

Appropriate acceptable alpha levels. The alpha level for a study is the probability of making a Type-1 error. The actual alpha is reported on the output of statistical-analysis programs, and is sometimes referred to as the "p value." Marketers decide what probability of making a Type-1 error is acceptable and conclude that observed differences between treatments reflect real (nonchance) effects only when appropriate statistical tests indicate that the probability of making a Type-1 error is tolerable. The conventionally accepted alpha level is p ≤ .05, meaning that the experimenter is willing to take no more than a 5-percent chance of accepting an observed effect as real when it is not. The reason for accepting some nonzero probability of making a Type-1 error is that lowering this probability increases the probability of making a Type-2 error. Thus, marketers must weigh the relative consequences of making a Type-1 or a Type-2 error when deciding on the acceptable alpha level for a study.

Another thing to keep in mind is that the probability of making a Type-1 error increases with the number of comparisons being made between treatment groups. Assuming that an experiment's treatments have no real effect, the probability of making a Type-1 error could be 5 percent when making one comparison between two treatments, but it may be 50 percent when using the same alpha level to make 10 different comparisons among multiple treatments. Thus marketers making separate comparisons among multiple treatment groups may want to select a more-stringent acceptable alpha level than would those making a single comparison or else choose a statistical analysis that helps control the error rate.
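The arithmetic behind that warning is short enough to show directly. The sketch below is an illustration rather than part of the article; it assumes the comparisons are independent, which is why it arrives at roughly 40 percent for 10 comparisons rather than the 50 percent figure quoted above, and the Bonferroni adjustment it mentions is a standard remedy that the article itself does not name.

```python
# Why many comparisons inflate the chance of at least one Type-1 error
# (assuming, for simplicity, that the comparisons are independent).
alpha = 0.05
for comparisons in (1, 3, 10):
    familywise = 1 - (1 - alpha) ** comparisons
    print(f"{comparisons:>2} comparisons at alpha = .05 -> "
          f"chance of at least one false positive: {familywise:.0%}")

# One crude remedy is to test each comparison at alpha / comparisons
# (the Bonferroni adjustment); another is to use an analysis designed
# for multiple groups, as the text suggests.
```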
Sufficient sample sizes. Large samples are less susceptible to the vagaries of chance than are small samples. Thus, marketers can reduce the probability of making a Type-2 error (using a given alpha level) by increasing the sample size. However, this does not mean that marketers always need to use large samples. If real treatment effects are large or alpha levels are high, then Type-2 errors could be rare even with small samples. Since large samples are expensive to obtain, marketers should make sure they are needed before using them.

To save money, marketers should determine the sample size needed to keep the probabilities of Type-1 and Type-2 errors at desired levels. These calculations can be done by hand or on one of several available computer programs.21 To calculate sample size, marketers must specify (1) the desired probability of Type-2 errors, (2) the real size of treatment effects, (3) the acceptable alpha level, and (4) the within-treatment variability in the outcome measure. Since the size of the real treatment effect is generally unknown (that is why an experiment is being conducted in the first place), marketers should decide what the smallest practically meaningful effect would be and use that as the effect size.
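For readers who prefer code to an online calculator, the same four inputs can be fed to a power routine. The sketch below is not from the article; the dollar figures, the 80 percent power target, and the use of statsmodels are assumptions made for illustration.

```python
# Sample-size sketch for a two-group experiment, using the four inputs listed above.
from statsmodels.stats.power import TTestIndPower

smallest_meaningful_effect = 5.0   # e.g., a $5 lift in the average check
within_group_sd = 15.0             # within-treatment variability, in dollars
effect_size = smallest_meaningful_effect / within_group_sd   # standardized effect

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,        # acceptable Type-1 error probability
    power=0.80,        # 1 minus the acceptable Type-2 error probability
    alternative="two-sided",
)
print(f"dining parties needed per treatment group: {n_per_group:.0f}")
```

With these inputs the answer comes out to roughly 140 subjects per group, which illustrates the boxed note's point that the required samples are usually large.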

21 A free, online program that calculates the needed sample size is available at http://calculators.stat.ucla.edu/powercalc/.


Variability within treatments. While researchers hope for variability in the outcome measure between treatment groups (that is, the effect of the treatment), variability within treatment groups increases the chances of a Type-2 error. Variability within treatments is a measure of the consistency of subjects' responses to the treatment. Ideally, there is low variability within treatments, which indicates that subjects responded to the treatments in the same way. If there is high variability, then observed differences between treatments may be due to chance and not the manipulated treatment. So, one way marketers can reduce the probability of making a Type-2 error is to reduce differences in the outcome variable among the subjects within each treatment group. This can be accomplished by increasing the similarity of the subjects to one another and by increasing the uniformity of the conditions under which data are collected. For example, the restaurant-menu experiment described previously would have less variability in check size if it used only evening dining parties comprising one male and one female as subjects than if it included lunch and evening dining parties of all compositions. Of course, increasing the similarity of subjects and the uniformity of conditions can compromise the generalizability of the results, so one must take this potential shortcoming into account. More will be said about generalizability in a subsequent section.

Internal Validity

Internal validity is the strength with which one can conclude that the manipulated treatment caused the observed changes in the outcome measure. High internal validity occurs when all alternative explanations for the observed treatment effect have been ruled out. Confounded treatments are the threat to internal validity. Confounding occurs when the treatment groups differ prior to the treatments or when the treatments differ in more ways than intended. For example, an experiment in which men get one treatment and women get another confounds the treatment with the sex of the subject. In this case, the researcher cannot tell whether any difference between the treatment groups in the outcome variable was caused by the treatments or by the subjects' sex. Similarly, an experiment in which the experimenter must interact with the subject after personally delivering the treatment may confound the treatment with other experimenter actions. Research has found that experimenters who knew what treatment subjects received and who subsequently interacted with the subjects often unintentionally behaved differently to those in the various treatment groups.22 Confounding of this kind means that researchers cannot tell whether any differences between the treatment groups in the outcome variable were caused by the treatments or by the experimenter's actions. Such post-treatment confounding can be eliminated by keeping experimenters blind to the subject's treatment group. Pre-treatment confounding can be eliminated through random assignment of subjects to treatments and (barring random assignment) can be reduced through the matching of samples and the use of other quasi-experimental designs. Each of these latter means of promoting internal validity is discussed below.

22 Robert Rosenthal, Experimenter Effects in Behavioral Research (New York: Appleton-Century-Crofts, 1966).
Random assignment. Assigning subjects to treatments so that each subject has an equal chance of being in each treatment group provides the greatest assurance that treatment groups are similar prior to the implementation of the treatments. As long as sample sizes are large, this random assignment distributes the subjects' characteristics evenly across the different treatment groups.23 The larger the sample being randomly assigned, the greater the similarity between the resulting groups, but samples of 20 to 30 subjects per treatment group are often sufficient to consider the different treatment groups as equivalent.24

23 More precisely, random assignment makes groups comparable on the expected, pre-treatment level of the outcome variable. Essentially, it distributes the pre-treatment propensity to respond on the outcome variable evenly across groups.
24 Any sample size that produces statistically significant results in an experiment with random assignment is sufficient for random assignment to have worked. As long as subjects are randomly assigned, any pre-treatment differences in propensity to respond on the outcome variable can be due only to chance. Statistical significance means that the post-treatment differences on the outcome variable are too large to be attributed to chance, so the sample size was (by definition) large enough to rule out pre-existing chance differences between treatment groups. Samples of 20 to 30 subjects per treatment are common in academic psychological experiments. However, psychologists are more concerned about the existence of a treatment effect than about its exact size. Marketers interested in reliable estimates of treatment-effect sizes will need to use samples larger than 20 to 30 subjects per treatment. In addition, marketers that are studying insensitive or highly variable outcome measures may need to use large sample sizes.
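A quick simulation makes that claim concrete. The sketch below is illustrative only; the "propensity to spend" scores and their scale are invented, and the point is simply that the chance gap between randomly assigned groups shrinks as the groups grow.

```python
# How random assignment balances a pre-existing characteristic across groups.
import numpy as np

rng = np.random.default_rng(3)
TRIALS = 2000

for n_per_group in (10, 30, 100):
    gaps = []
    for _ in range(TRIALS):
        propensity = rng.normal(50, 10, 2 * n_per_group)  # measured before treatment
        rng.shuffle(propensity)                           # random assignment
        group_a = propensity[:n_per_group]
        group_b = propensity[n_per_group:]
        gaps.append(abs(group_a.mean() - group_b.mean()))
    print(f"n = {n_per_group:>3} per group: typical pre-treatment gap = {np.mean(gaps):.1f} points")
```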


Random assignment of individual consumers to treatments is easy when experiments are conducted in a laboratory or are conducted via post or e-mail. In those cases, the experimenter has control over which subjects get which treatment. In addition, many magazines and television-cable companies now have the ability to deliver distinct content to various (essentially random) subsets of their customers. This allows marketers to expose different people to different ads even though they are reading the same magazine or watching the same television show. Those people can then be contacted and asked to provide information used to compare the effectiveness of the different ads.

In some cases, random assignment of individuals to different treatments is not possible. For example, a restaurateur could not randomly assign individual dining parties in a field experiment that compares the effects on sales of playing two different styles of music over the sound system. In such cases, however, it is possible to use different units of analysis and to conduct true experiments by randomly assigning those units to treatments. A restaurateur could, for example, randomly determine which of two different styles of music are played each day for two months and could then compare the average daily sales under each style of music. In this case, any differences between days in the number and type of customers or other characteristics will be evenly distributed across the two treatment groups and any subsequent difference between the treatment groups in average daily sales can be safely attributed to the different styles of music. In general, researchers can assign many different units (e.g., individual consumers, multi-person dining parties, days, units of a restaurant chain) to treatment groups, but should make sure that those units are the ones described by the outcome measures.25

25 In other words, if the unit being randomly assigned is days or restaurants then the outcome measure should be a daily or restaurant average. There are statistical techniques that allow researchers to correctly analyze experiments where the units of random assignment and the units of outcome measurement are different, but those are new and sophisticated statistical techniques that are likely to be beyond the typical executive or manager's ability to implement. Thus, we advise randomly assigning and measuring the same units.
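The music example above translates directly into code once the day is treated as the unit of both assignment and measurement. The sketch below is illustrative; the music styles, sales figures, and the simulated $250 daily lift are invented.

```python
# Sketch of the music experiment described above: each operating day, not each
# dining party, is randomly assigned to a music style, and daily sales is the outcome.
import random
from statistics import mean
from scipy import stats

random.seed(11)

daily_sales = {"jazz": [], "pop": []}          # hypothetical style labels
for _ in range(60):                            # roughly two months of operating days
    style = random.choice(["jazz", "pop"])     # random assignment of the day itself
    sales = random.gauss(9000, 900)            # that day's total sales, in dollars
    if style == "jazz":
        sales += 250                           # pretend jazz adds $250 a day
    daily_sales[style].append(sales)

for style, sales in daily_sales.items():
    print(f"{style}: {len(sales)} days, average daily sales {mean(sales):,.0f}")

# The test is run on days, the randomly assigned units, which matches the advice
# in the footnote to randomize and measure the same units.
print("p value:", round(stats.ttest_ind(daily_sales["jazz"], daily_sales["pop"]).pvalue, 3))
```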
If random assignment of individuals or other units of analysis is not practical, marketers can use a quasi-experimental design. To do this the marketer must try to anticipate all the variables that might affect the outcome variable and find naturally occurring units matched on those variables. Unfortunately, it is nearly impossible to anticipate all the relevant variables and find units that are perfectly matched thereon. Even if the matched pairs could be found, it is possible that factors outside the experimenter's control could change one of the units during the course of the study and thereby create a new confound. For example, a competitor of one of two matched restaurants in a quasi-experiment could suddenly close, boost the other restaurant's sales, and confound the experiment. The internal validity of this simple quasi-experimental design falls far short of that for a true experiment with random assignment. There are a variety of more-complex quasi-experimental designs that help address different threats to internal validity, and marketers interested in conducting a quasi-experiment should consult experts about the options available. However, the internal validity of quasi-experiments is never as great as that of true experiments, so whenever practical, random assignment is the preferred method of assigning subjects to treatments.


External Validity

External validity is the extent to which an experiment's results apply or generalize to the real marketing environment of interest. External validity is threatened by differences between the real-world and experimental samples, treatments, measured behavior, or contexts. A common example of such a threat can be found in test marketing of new entree items by, for instance, McDonald's (e.g., the McRib sandwich). Restaurants (not only McDonald's) often label such items as special offers that are available "for a limited time only." The problem with that approach stems from the fact that the availability of the items will not remain limited if they are judged a success and permanently added to the menu. In other words, the experimental conditions differ from those to which the marketer wants to generalize the experimental results. This difference is important because limited availability increases demand for products.26 Test markets that describe items as special offers available for a limited time generally inflate the demand for those items and do not provide good estimates of the demand that item would generate as a permanent addition to the menu.

26 See: Laura A. Brannon and Amy E. McCabe, "Time-restricted Sales Appeals: The Importance of Offering Real Value," Cornell Hotel and Restaurant Administration Quarterly, Vol. 42, No. 4 (August-September 2001), pp. 47-52.

The way to ensure external validity is to make the features of the experiment similar to the features of the situation to which the experimental results will be generalized. Marketers should draw a sample that is representative of the actual consumers of the product or service, deliver the treatments to subjects in the same way and in the same context that they will be delivered in the marketplace, and measure the same outcome variable that managers want to affect in the marketplace. However, it is expensive and difficult (if not impossible) to make experiments similar in all respects to real-world situations of interest. Thus, marketers must often conduct experiments that differ in some ways from the situations to which they want to generalize the experimental results. For example, marketers often settle for nonrepresentative samples or measure attitudinal outcome variables when it is consumers' behavior that they ultimately want to affect. How much these differences affect the generalizability of the results depends on the specifics of the case. Some things to keep in mind when evaluating the generalizability of results across samples and measures are discussed below.

People of different ages, sexes, and ethnicities, as well as people from different regions of the country or world, differ in terms of tastes, value priorities, and other factors that may affect their responses to marketing communications and offers. As a result, it is dangerous to draw conclusions about one group of people based on data about a different group of people. However, generalizing findings across groups of people can be reasonable when there are only small differences between the groups or when the differences that exist are unlikely to affect responses to the treatment. For example, researchers have found only small demographic and psychographic differences between the users of different brands within consumer-product and -service categories.27 This suggests that marketers can run experiments on their own customers and safely generalize the results to all users of the product category. In addition, researchers have found that differences between African-American and Caucasian consumers do not affect their responsiveness to point-of-purchase displays or price discounts.28 This suggests that marketers can generalize findings about the effects of these tactics among one ethnic group to the other. The important thing to keep in mind is that differences between two groups of people do not necessarily make it inappropriate to generalize results from one group to the other. Only when those differences affect responsiveness to the experimental treatments is generalizability called into question.

27 Rachel Kennedy and Andrew Ehrenberg, "There Is No Brand Segmentation," Marketing Research, Spring 2001, pp. 4-7.
28 Corliss L. Green, "Differential Responses to Retail Sales Promotion among African-American and Anglo-American Consumers," Journal of Retailing, Vol. 71 (1995), pp. 83-92.


The difficulty and expense of measuring purchase behavior in naturalistic experiments leads many marketing researchers to use self-reported attitudes, beliefs, or purchase intentions as outcome variables in experiments instead of using actual purchase behavior. As mentioned earlier, this practice is flawed because attitudes, beliefs, and intentions are weak predictors of actual behavior.29 Consequently, treatments may affect attitudes, beliefs, and intentions, but not affect purchase behavior. Since purchase behavior is what marketers are ultimately trying to influence, that behavior should be used as the outcome variable in marketing experiments whenever possible.

29 See: Myers, op. cit.; and A.W. Wicker, "Attitudes versus Actions: The Relationship of Verbal and Overt Behavioral Responses to Attitude Objects," Journal of Social Issues, Vol. 25, pp. 41-78.

When actual marketplace behavior cannot be measured in an experiment, researchers should measure consumer choice in an artificial situation that is structurally similar to the choice situation in the marketplace. Eric Marder developed an artificial-choice task (called STEP) that has similar choice options, information about each option, and ease of choosing each option as those that consumers face in the marketplace. He found that consumers' STEP choices closely parallel their marketplace choices.30 Thus, choice tasks like STEP provide a reasonable alternative to marketplace choices when measuring the effects of marketing experiments.

Conclusion

To be as effective as possible, marketers should develop and test several potential courses of action before embarking on any of them. The best way to develop a variety of courses of action is to conduct exploratory or descriptive research. The best way to evaluate those options is to conduct a causal-research study that compares consumers' behavior when faced with various options. Experiments and quasi-experiments are under-used, but are nevertheless powerful research tools that allow hospitality marketers to draw strong causal conclusions about the effects of pricing, design, and other changes on the amount of money customers spend, or the number of visits they make to an establishment.

Experiments should be designed to have statistical-inference validity, internal validity, and external validity. The perfect marketing experiment would have (1) a large sample size that was drawn randomly from the population the marketer wanted to make conclusions about, (2) random assignment of subjects to treatments, (3) treatments that are delivered in the same way and in the same contexts they would be delivered in the marketplace, and (4) measures of consumer choice, or purchase behavior, in the marketplace. Such an experiment would allow a marketer to identify with complete confidence the best option under consideration.

Unfortunately, practical considerations often require compromises in experimental design. Such compromises are not a reason to dismiss the experimental results out of hand. Instead, marketers should assess the level of threat to statistical inference, internal or external validity posed by a compromise in experimental design, and adjust their confidence in the experimental results accordingly.
Even an imperfect experiment can provide useful input into the selection of marketing options as long as the marketer is aware of what conclusions the experiment can support and what conclusions it cannot. We hope that this article will increase such awareness among hospitality marketers and will encourage them to make greater use of this research tool.

30 STEP measurement involves giving subjects a booklet that describes all the major competitors in a product category and instructs subjects to distribute ten stickers among the competing options to reflect the likelihood that the subjects would buy the products as described. Each product description is on a separate page of the booklet. Product descriptions include a brand name, a picture, a price, and a summary of product attributes and benefits (taken from real promotional materials on that product). The number of STEP stickers a person gives a product is related to that person's subsequent purchase behavior. Furthermore, the average shares of STEP stickers products receive correlate at .92 with the products' actual market shares. See: Marder, op. cit.

Ann Lynn, Ph.D., is an assistant professor in the department of psychology at Ithaca College (alynn@ithaca.edu). Michael Lynn, Ph.D., is an associate professor at the School of Hotel Administration at Cornell University (wml3@cornell.edu).
