
International Statistical Institute, 55th Session 2005

Inference from Non-Probability Samples in Marketing Research

Bill Blyth, TNS plc, Westgate, London, W5 1UA, UK, [email protected]

1. Introduction

This paper discusses the manner in which marketing research designers have evolved robust methods for producing reliable results from which inference can confidently be made. It draws on the structure proposed by Groves (1) to argue that market researchers can offer academic statisticians a perspective on survey design and stakeholder value that is different from, and a valuable complement to, their own.

2. Marketing research today

Annual industry data for 2003 (2) estimates global expenditure on marketing research at $18.9 billion. 38% of this was in North America, 44% in Europe and 14% in Asia Pacific. The latter area provided the fastest annual growth, with China growing at 28% and another market at 17%. Contrary to popular belief, very little marketing research consists of either street interviewing or opinion polling. The greatest proportion of activity is in continuous measurement. Market size measurement, be it via retailers or consumers, is the single largest area of activity. This and the majority of other work undertaken on consumer behaviour are invariably continuous, in the sense of being repeated through time. The movement towards evidence-based policy and the need for performance indicator measurement have particularly spurred expenditure by the public sector. Our best estimate is that 7%, or $1.3 billion, came from the public sector excluding utilities.

The largest agencies span the world, with revenues of several billion dollars and workforces many thousands strong. In recent years consolidation has been rapid, and fewer than ten, mainly publicly quoted, companies account for more than half of global revenues. Consolidation has produced increased standardisation between and within countries. Whilst agencies still present a wide range of expertise and resource, competition and standardisation have increasingly undermined poor-quality suppliers. The comments in this paper derive from the practices of these larger successful agencies, rather than from fringe or 'rogue' elements. Standards of staff qualifications, training and fieldwork are high. The draft of an ISO quality standard specifically for market research has recently been published for consultation. It has been drawn up by a global working party with representation from the private sector, official statistics organisations and academia.

3. Research value and survey design considerations

Marketing research survey designers carry out a complex trade-off between speed, accuracy and cost to provide 'research value'. Tender deadlines and budgets are tight, and are an important part of the evaluative mix and the design consideration. I prefer the description 'survey designer' to 'statistician', because the question of sample design is inextricably entwined with other aspects of survey design, not least the method of data collection. Each country has evolved a preferred method, or restricted set of methods, for the conduct of quantitative studies. These are influenced by local geography, population characteristics, relative costs, labour force availability and characteristics, sampling frames, technological infrastructure and GDP per capita.

The timing of the local 'take-off' of marketing research is also crucial. In the USA market research moved rapidly away from personal interviewing, first to postal research, then to telephone, and recently has been in the forefront of the use of the web. Europe, on the other hand, invariably regarded mail surveys negatively, and it was not until the late 1980s or 1990s that telephone surveys became acceptable for general populations of interest (3). Personal in-home interviewing is preferred in many countries, as is the case in Africa, Asia and South America. The early move to the web in the USA has been decelerating, and web take-up elsewhere in the world has been slow, other than in a few northern European countries and a couple of Asian countries with intensive Internet use.

The consequence of this local methodological evolution was a variety of methods and approaches that have converged over time. A general complaint (Kish (4)) is that such methods are not written up. Reported experimental work on non-probability methods is scanty, Moser and Stuart (5) and Marsh and Scarborough (6) being rare and somewhat dated exceptions. However, there are few reasons why commercial researchers should want to write up their work, and increasingly when they carry out experimental work it is unpublished. The reason for this omission is straightforward: competitive advantage. Without academic career pressure to publish, there is no desire, in the face of weak or unenforceable IPR legislation, to tell the world how one makes a better widget! The movement of staff and ideas makes it very difficult to protect the fruits of what are increasingly expensive investments from which shareholders expect a measurable return. This desire for confidentiality is evidenced by the 'black box' nature of much of the discussion elsewhere about propensity weighting systems.

What market researchers will agree on is that their methods are not generally probability methods. That has not always been, and is not always, the case. Most countries have several major commercial surveys using probability methods, or carry out public sector surveys that use such methods. However, whilst marketing researchers do know how to conduct random surveys and, when required, can do so adequately, they or their clients choose not to. Why is that?

4. Inferential structure

Groves summarises a number of constructs that describe differing approaches of survey researchers. To paraphrase:

Describers v modellers
Sampling error persons v non-sampling error persons
Error measurers v error reducers

To this one might add:

Survey topic specialists v generalists
Stand-alone v repeat

Whilst he states they are not mutually exclusive, the marketing researcher and the academic sampling statistician find themselves more often than not at different ends of these scales. Marketing researchers are non-sampling error persons; we are error reducers; we are generalists; we are trackers; we are describers and modellers, and increasingly use modelling to describe. Much of the literature is written as if a survey takes place in a vacuum, totally removed from other surveys about different subjects. Relevant administrative data rarely makes an appearance, and discussion revolves around theoretical or actual single-parameter estimates. In the literature, real time appears to stand still as consideration is given to the matters at hand.

This state of affairs is not the commercial reality, where on-time delivery is of prime importance. In the majority of commercial quantitative studies the survey is invariably repeated at some time interval, be it a month, a quarter, a year, biennially or longer. Other surveys will also have previously asked the majority of the same questions. We are concerned with datasets and inter-relationships, with overall conclusions, with time-dependent decision contexts, with cost and resource constraints. We are concerned with checking results against a wide range of prior data, an increasing amount of which is comprehensive administrative data of some sort emanating from the client. Inference occurs within this context; it is a complex and professional task. Much of this approach is, of course, shared with survey statisticians in public sector organisations, where the pressures on delivery and accuracy of trend are much the same. However, there is still a marked divide between the sectors in the use of probability and non-probability methods. I would argue that where the primary concern is to track change through time, the use of non-probability methods provides greater control of sources of variability and bias. Whilst theoretically random methods may be better, their application is not justified unless response is so high and unbiased as to be ignorable. This unfortunately is rarely the case, even in much public sector research.

Within marketing research organisations, the production process is increasingly mechanical and process-driven. Survey after survey goes through 'the mill' using the same sampling, interviewers, coding teams, CAPI equipment, weighting, back-checking and so on. It is easy, particularly within the context of integrated databases, for the experienced researcher to generalise aspects of the error and bias structures from one survey to another. In many countries the larger organisations are equal in size to, or larger than, many public sector organisations, carrying out many hundreds of surveys and many hundreds of thousands of interviews each year. In parallel, the large amount of material available around the world, collected in the same manner on the same application, enables the researcher to develop relevant knowledge about data relationships. An example of this is the 'Theory of Repeat Buying Behaviour' developed by Ehrenberg (7). The theory establishes empirical generalisations about patterns of buying behaviour and then proceeds to model these within a coherent theory (its core distribution is sketched below). It has been validated across a wide range of markets and choice behaviours around the world. Other law-like relationships have been derived in areas such as attitude and behaviour measurement. Such knowledge enables the market researcher to set prior models of the data structure, and the criteria by which data are 'acceptable'.

A primary objective for our data collection, whether its use is descriptive or modelling, is to initiate a design with minimum error and bias of every type, and then to keep these constant through time on repeat waves. Our concern is primarily with non-sampling errors. Experience has shown these to be the most uncontrollable and the most prone to exogenous factors. Our priority is to understand these errors and keep them constant, in the total error sense. Improvement should only be countenanced within the context of unchanged bias or reworked historic data. It follows that our validation and inference are focused much more widely than any one current survey.
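As an aside on the form such law-like relationships take: the core of Ehrenberg's repeat-buying theory is the negative binomial distribution (NBD), under which purchase rates vary across consumers as a Gamma distribution and each consumer's purchases in a period are Poisson, so the proportion of consumers making r purchases depends only on the mean purchase rate m and an exponent k. The following minimal Python sketch assumes that standard NBD form; the function name and the example parameter values are illustrative, not Ehrenberg's.

    from math import exp, lgamma, log

    def nbd_pmf(r, m, k):
        # Proportion of consumers making r purchases in a period under the
        # NBD model: a Gamma-mixed Poisson with mean m and exponent k.
        return exp(lgamma(k + r) - lgamma(k) - lgamma(r + 1)
                   + k * log(k / (k + m)) + r * log(m / (k + m)))

    # Two of the quantities the theory links together:
    m, k = 1.2, 0.8                      # illustrative category values
    penetration = 1 - nbd_pmf(0, m, k)   # share of consumers buying at all
    buying_rate = m / penetration        # average purchases per buyer

Checking that observed penetrations and buying rates continue to sit on this surface, wave after wave, is one form the 'acceptability' criteria described above can take.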
Many parameters that our surveys report should not change from period to period, and the absence of such change gives us confidence in the use of the data set generally. The dog that does not bark is often more important than the dog that barks. In an era which will see increasing use of synthetic and modelled data, we should concentrate more on tests of the validity of the datasets we are using than on any one specific aspect of their derivation. We need to focus on general data structures, as evidenced by covariance, rather than on individual parameter estimates and their associated error or bias (a simple form such a check could take is sketched below). This is an area where theory needs greater development.
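Purely as an illustration of such a structural check (the paper prescribes no specific statistic, so the form here is an assumption): compare the correlation matrix of a common variable set across two waves and track the largest shift.

    import numpy as np

    def correlation_shift(wave1, wave2):
        # wave1, wave2: observations-by-variables arrays holding the same
        # variables in two waves. A small maximum shift in the correlation
        # structure supports the validity of the dataset as a whole:
        # the dog that does not bark.
        c1 = np.corrcoef(wave1, rowvar=False)
        c2 = np.corrcoef(wave2, rowvar=False)
        return np.abs(c1 - c2).max()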

5. Sampling design and related implications

An adequate description of the variety of sampling methods employed by marketing researchers around the world would be a very long book. As a general description, the common denominator in sampling methods is random selection down to the penultimate units and quota thereafter. Thus in personal interviewing one might have multi-stratified PPS selection of PSUs and then some form of random walk or address blocking. Alternatively, with telephone, we would see the use of RDD for terrestrial numbers and then quota within responding households. Practical quotas can be set for the sample selection, and more complex post-stratification applied at the analysis stage; both steps are sketched below.
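A minimal Python sketch of those two steps: systematic PPS selection of PSUs from a frame of size measures, followed by simple cell-based post-stratification at the analysis stage. The frame, the cells and the population shares are illustrative, not drawn from any actual agency design.

    import random
    from collections import Counter

    def pps_systematic(frame, sizes, n):
        # Systematic probability-proportional-to-size selection: lay n
        # equally spaced points over the cumulated size measures and take
        # the PSU in whose interval each point falls.
        step = sum(sizes) / n
        point = random.uniform(0, step)
        selected, cum, j = [], 0.0, 0
        for _ in range(n):
            while cum + sizes[j] < point and j < len(frame) - 1:
                cum += sizes[j]
                j += 1
            selected.append(frame[j])
            point += step
        return selected

    def post_stratify(cells, pop_shares):
        # Weight respondents so that weighted cell shares match known
        # population shares; the weights average to 1 over the sample.
        counts = Counter(cells)
        return [pop_shares[c] * len(cells) / counts[c] for c in cells]

    psus = pps_systematic(["A", "B", "C", "D", "E"],
                          [500, 300, 150, 40, 10], n=2)
    weights = post_stratify(["f", "f", "f", "m"],
                            {"m": 0.49, "f": 0.51})

The random walk or address blocking, and the quota controls themselves, sit between these two steps in a real design.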

I would disagree with the opinion of Kish (4) that such methods provide very little control. This is a commonplace method, constantly validated by experience. The design of surveys down to the last-stage units is capable of substantial rigour with this approach, and any estimation errors that arise are more probably due to administrative errors, which can occur in any organisation, than to the deviation from purity at the last stage of the process. Our results are judged by their application validity, not within a theoretical framework. Broadly speaking, researchers are not afraid of bias so long as they can measure it and keep it constant.

In much current research, access is available to detailed electronic administrative data. Whilst not comprehensive, this often provides the equivalent of an accurate data biopsy, i.e. an accurate cross-section of part of the world. The availability of disaggregated survey data from a range of sources provides a tremendous opportunity for large-scale research operations to build up a detailed picture of data inter-relationships across a wide range of topics and countries. In turn this provides a design and inference approach which is essentially inductive across a range of activity, rather than deductive within a narrow range.

Within the commercial sector, the most interesting development work in sampling design and estimation currently centres on these issues of combining survey and other data, be they administrative or sourced from other surveys, to produce more accurate modelled estimates. This estimation may be via direct modelling, as in the case of propensity scoring to adjust for sample frame coverage (a sketch follows below), or by the use of fusion and ascription to combine data sources: for example, integrating TV set-top reception data with audience panels. 'Synthetic' data fibres, I believe, will more and more replace 'natural' ones.
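Commercial propensity-weighting systems are, as noted earlier, black boxes; purely as an illustration of the general idea, here is a minimal Python sketch using scikit-learn's logistic regression. The covariates, variable names and normalisation are assumptions, not a description of any agency's system.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def propensity_weights(panel_X, reference_X):
        # Fit P(case comes from the web panel | covariates) on the pooled
        # data, then weight each panel case by its odds of belonging to the
        # reference sample, (1 - p) / p, so that the weighted panel mimics
        # the reference covariate distribution.
        X = np.vstack([panel_X, reference_X])
        y = np.r_[np.ones(len(panel_X)), np.zeros(len(reference_X))]
        p = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
        p_panel = p[: len(panel_X)]
        w = (1.0 - p_panel) / p_panel
        return w * len(panel_X) / w.sum()   # rescale to the panel size

Fusion and ascription take the other route: donor and recipient records are matched on variables common to both sources and the donor's data ascribed to the recipient, with the result judged, as ever, by its fit with independent data.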
In today's competitive international economy, the success achieved by the major research companies would not be possible without substantial and proven confidence in the findings they provide. Future user concern will focus increasingly on 'fit' with multiple independent data sources and on the development of criteria to measure that fit. The schism between the academic theorist and the commercial practitioner need not widen, but current priorities appear to push in different directions, which will sadly inhibit the interchange that would enrich both sides. The challenge for academic survey researchers is to address the methods that are being employed and to work with practitioners in improving them. Currently the commercial and academic worlds often pass like ships in the night.

Our similarity with public sector researchers is in many ways much closer than generally recognised; the greatest methodological difference is the probability/non-probability divide. In the face of falling response rates and improved methods of estimation, one must ask, given the commercial experience, whether the probability approach, with its higher costs and slower delivery, is the best use of scarce resources. Would some other approach provide a better use of public investment in survey research? That is a paper for another time.

REFERENCES

1 Groves, R.M. (1989) Survey Errors and Survey Costs. Wiley-Interscience.
2 ESOMAR (2004) 2003 Annual Statistics. ESOMAR, Amsterdam.
3 Blyth, W.G. (1998) Current and Future Technology Utilization in European Market Research, in Computer Assisted Survey Information Collection, ed. Couper et al. Wiley.
4 Kish, L. (1998) Quota Sampling: Old Plus New Thought. ISR, University of Michigan.
5 Moser, C.A. & Stuart, A. (1953) An Experimental Study of Quota Sampling. JRSS (Series A) 116, 349-405.

6 Marsh, C. & Scarborough, E. (1990) Testing Nine Hypotheses about Quota Sampling. JMRS 32, 485-506.
7 Ehrenberg, A.S.C. (1972) Repeat-Buying: Theory and Applications. North-Holland.
