
SSC – JSM 2010

A Framework for the Meta-Analysis of Complex Survey Data

Karla Fox, Canada

Abstract: This paper presents a framework for the meta-analysis of complex survey data, using the sampling design and a superpopulation model to quantitatively summarize different surveys that represent the same underlying population. Additionally, it compares classical design based methods with a superpopulation model approach.

Keywords: Meta-analysis, observational data, surveys

1. Introduction

The recent increase in straightforward access to data has led to a proliferation of analyses that group or pool multiple study results. This ready access, combined with user-friendly computer software, has led researchers to try to borrow strength from many different observational and non-randomized studies in an attempt to either improve precision or address research questions for which the original studies were not designed. Researchers have started to employ many different techniques, including meta-analysis; however, the analysis is often done without reference to a generalized framework, and often without an understanding of the methodological differences between surveys and experiments.

Although meta-analysis was first used in 1976 for the combination of educational studies, it is now common in the medical literature (Glass, 1976). This integration of studies has expanded from combining randomized control trials and other experimental studies to blending results from epidemiological quasi-experimental studies (cohort and case-control), diagnostic tests and bioassays, and now survey data.

This paper begins with an overview of the differences in the randomization processes in experiments and surveys. Next, we describe model and design based inference and estimation in surveys, followed by an overview of fixed and random effect models in meta-analysis. We then describe the simulations used to illustrate meta-analysis in the survey context, highlighting issues with convergence, inference and small sample properties.

2. Randomization: Allocation compared to Selection

It is important to address the differences between the randomization framework of experimental design and that of survey design. What is random in experimental data is the assignment of individuals. With experimental data, we assign one or more treatments and a control to individuals in a random fashion to create two (or more) probabilistically equivalent groups. Altman (1985) points out, however, that randomization does not guarantee that the treatment groups are comparable with respect to their baseline characteristics, and suggests that comparability should be assessed prior to analysis. We are thus using randomization to reduce bias and to ensure that the results of the experiment are internally valid.

When we combine experiments we make several assumptions about the treatment effect in the different experiments, based on the fact that we are combining estimates that have been generated from a process with random allocation. We assume that the treatment effect follows a distribution, and we either assume that the effects we are combining are identical and independent draws from this distribution or that the differences between the treatment effects from the different experiments can be modeled in an analogous fashion to random effect ANOVA models. We make no assumptions on the population of individuals.

On the other hand, in the case of classical design-based survey methods, what is random is the selection of individuals. The purpose of random selection is to generalize to the entire finite population from which we are drawing our sample. Finite population parameters describe the finite population U, from which we draw our sample. Estimation is then about characteristics of this population. These quantities always portray the population at a given time point (the time of the survey), and are purely descriptive in nature. A finite population parameter has the property that in a census (with no non-response and no measurement error) its value can be established exactly. If the researcher were able to collect information from the entire population he would do so. However, since the researcher can rarely take a census of the population of interest, probability sampling is done to produce estimates of the finite population parameter of sufficient precision at minimum cost.

In probability sampling we select samples that satisfy the following conditions (Sarndal 2003):

1. We can define a distinct set of samples, S1, S2, …, Sv, which the sampling procedure is capable of selecting if applied to a specific population. That is, we can say precisely which units belong to S1, to S2, and so on.
2. Each possible sample Si has assigned to it a known probability of selection P(Si), and the probabilities of the possible samples sum to 1.
3. The procedure gives every unit a known non-zero probability of selection.
4. We select one of the Si by a random process in which each sample has its appropriate probability of selection P(Si).
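These conditions can be made concrete with a toy enumeration; the population of four units and samples of size two below are invented for illustration:

```python
from itertools import combinations

# Toy illustration of the four conditions for simple random sampling
# without replacement (hypothetical population of 4 units, samples of size 2).
population = [1, 2, 3, 4]

# Condition 1: a distinct, enumerable set of possible samples S1, ..., Sv.
samples = list(combinations(population, 2))

# Condition 2: each sample Si has a known probability P(Si), summing to 1.
p = {s: 1.0 / len(samples) for s in samples}
assert abs(sum(p.values()) - 1.0) < 1e-12

# Condition 3: every unit has a known, non-zero inclusion probability.
for unit in population:
    pi = sum(prob for s, prob in p.items() if unit in s)
    print(unit, pi)  # each pi_i = 0.5 under this design

# Condition 4 would be implemented by drawing one Si at random with
# probability P(Si), e.g. random.choices(samples, weights=p.values()).
```

Each unit appears in 3 of the 6 possible samples, so its inclusion probability is 3 × (1/6) = 0.5, satisfying condition 3.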

3. The Model and the Design

Though classical sampling theory is concerned with inference for finite population parameters, we can extend inference to a superpopulation. Superpopulation modeling occurs when the researcher specifies or assumes a random mechanism that generated the finite population (Hartley and Sielken, 1975). We now have what is called model-based as opposed to design-based inference, where the model is the statistical conceptualization of a superpopulation.

For example, if we assume that the vector y of responses in a survey is a realization of a random variable Y = (Y_1, Y_2, …, Y_N), then the finite population vector y is itself a sample from a hypothetical superpopulation model, ξ. Here the model-builder is not interested in the finite population U at a particular time, but in the causal system described by the model. So even in the case where we have a census we still would not be estimating these model parameters exactly.

Figure 1 illustrates the relationship of the superpopulation to the finite population and the sample. If we let θ be the parameter from a superpopulation model, then we can let θ_N be the corresponding parameter computed from the entire finite population, and θ̂^F(F) is the estimator of the finite population parameter based on the sample.

Figure 1: [diagram: superpopulation model parameter θ, finite population parameter θ_N, and sample-based estimator θ̂^F(F)]


We now note that we can approach estimation in either a finite sampling or a superpopulation framework. If we cross the estimation approach used with the level of inference we have the following table (Hartley and Sielken, 1975):

Parameter of interest | Sampling from a fixed finite population | Two-stage sampling from a superpopulation
Finite population (descriptive) parameters | Classical finite population sampling: θ̂^F(F) | Superpopulation theory for the finite population: θ̂^F(S)
Infinite superpopulation (model) parameters | Infeasible | Inference about superpopulation parameters from two-stage sampling: θ̂^S(S)

For clarity the following notation will be used to represent the level of inference and the framework used to estimate a particular parameter. θ̂^S(S) denotes an estimator of a superpopulation parameter, where the first superscript S denotes that the parameter of interest is from the superpopulation and the second, enclosed in parentheses, (S), denotes that the two-stage superpopulation framework was used to estimate the parameter. Hartley and Sielken, and later Sarndal, refer to this as case 3. Likewise, θ̂^F(S) is the finite population parameter estimate under a two-stage superpopulation framework (case 2 of Hartley and Sielken) and θ̂^F(F) is the finite population parameter estimate under classical finite population sampling (case 1 of Hartley and Sielken). As the estimation of a superpopulation parameter under finite population sampling is not possible, the superscript S(F) is not used.

As it is not logical to combine estimates from different studies that are inferentially different or calculated under different estimation methods (without accounting for the differences), we only consider cases where we are combining like estimators.

3.1 Combining estimates of the finite population parameter using sampling from a fixed population – θ̂^F(F)

With this first type of inference the simplest approach is to use a design-based estimator such as the Horvitz-Thompson estimator for θ̂^F(F). Here inference is based on repeated sampling of the population under the same design. We may have an analytical model, but we do not use it in any way. Survey methodologists most frequently encounter this case.
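As an illustrative sketch (the data and function name are hypothetical), a Horvitz-Thompson estimate of a finite population total weights each sampled value by the inverse of its inclusion probability under the design:

```python
import numpy as np

def horvitz_thompson_total(y_sample, pi_sample):
    """Design-based estimate of the finite population total:
    the sum over the sample of y_i / pi_i, where pi_i is unit i's
    inclusion probability under the sampling design."""
    y = np.asarray(y_sample, dtype=float)
    pi = np.asarray(pi_sample, dtype=float)
    return float(np.sum(y / pi))

# Sanity check: a census (every pi_i = 1) recovers the total exactly,
# which is the defining property of a finite population parameter.
print(horvitz_thompson_total([2.0, 4.0, 6.0], [1.0, 1.0, 1.0]))  # 12.0
```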

When we consider combining estimates of θ̂^F(F) from different surveys we can see that several cases commonly discussed in the literature fall under this category. Pooling or cumulating cases as suggested by Kish (1999) is a method of taking different estimates of a finite population characteristic and combining them. Kish uses the term combining when he estimates what he describes as multipopulations or multidomains, and cumulating or pooling for periodic or rolling samples. In both these cases we need not invoke a superpopulation, and inference is for the 'finite' population of interest. Repeated sampling and panel samples of the same population are also methods to combine θ̂^F(F) (Cochran 1977). Wu (2004) suggests a method to combine information from different surveys using an empirical likelihood method similar to the approaches of Zieschang (1990) and Renssen and Nieuwenbroek (1997), the purpose of which is to improve estimates of the census parameter. Merkouris (2004) suggests a regression approach to calculate design-based direct estimators using multiple surveys and contrasts it with the typical small-area estimation indirect approach, which generally uses one sample. We can also consider combining samples from non-overlapping surveys and overlapping frames as special cases of combining estimates of θ̂^F(F). In all these cases inference is about the census parameter θ_N and not the superpopulation parameter.

3.2 Combining estimates of the finite population parameter using superpopulation theory for a finite population parameter – θ̂^F(S)

When we invoke a superpopulation model we would want the estimator of θ_N to take into account both the model and the sampling design. Therefore, with inference of the finite population parameter under a superpopulation model, we would select an estimator for which

E_ξ E_p[(θ̂^F(S) − θ_N)²]

is minimized. Here the subscript ξ implies expectation taken over the model and the subscript p indicates expectation taken over the sampling plan.

Fuller (1975) presents theory and estimation for the case where we assume that the finite population is a random sample with finite fourth moments from the superpopulation and the sample is drawn from this finite population. Hartley and Sielken provide more on the inference of the finite population parameter for regression parameters. Isaki and Fuller (1982) expand these ideas to construct model-unbiased and design-consistent regression predictors.

Under the approach of Isaki and Fuller, θ̂^F(S) is called a predictor. A predictor is conditionally model unbiased for the finite population parameter θ_N if, given the sample S, we have E_ξ[θ̂^F(S) − θ_N | S] = 0. It is important to note that the conditioning is not on a characteristic of the population. The predictor is unbiased with respect to the design P(S) if E_p[θ̂^F(S)] = θ_N. They then define the best model-unbiased predictor of θ_N as the one with minimum model variance.

Although a simple random sample of a simple random sample is also a simple random sample, most surveys have a more complex design. If inclusion probabilities depend on y through more than the covariates, then we will reach erroneous conclusions if we do not consider the design (Pfeffermann 1993). If we approach this estimation problem by assuming that the sampling design P(S) plays no role (Royall 1970), we are stating that the relationship in the model holds for every unit in the population U and that the sample design should have no effect on estimation. As noted above, this can be the case as long as the probability of selection depends on y only through the x's. If the design and inclusion probabilities play no role in estimation then design unbiasedness is unimportant. This is often referred to as a frequentist approach, or as pure model-based inference.

This frequentist approach is related to the idea of ignorability. We consider a design to be non-informative if the sampling mechanism does not depend on Y, and informative if it does. More formally, let the distribution related to a design be f(i_U). This distribution is a function of the sample indicator vector I_U and the finite population U, where I_U denotes the (N × 1) vector of sample indicator variables such that I_j = 1 if unit j is selected into the sample and 0 otherwise. In the case of probability sampling, Pr(I_j = 1) > 0 for all j ∈ U. Let Z = (Z_1, Z_2, …, Z_N) define an (N × q) matrix of population values of q design variables Z(1), …, Z(q) (different from Y). For informative sampling plans we have the case that the sampling plan is dependent on Y, that is f(i_U | Y_U = y_U). Then the joint distribution of the realized variable y and the design is f(i_U | Y_U = y_U) f(y_U; θ). If we can state that the indicators I_U and the values Y_U are conditionally independent given other design variables Z_U, we have a non-informative design, where the sample distribution is f(I_U | Z_U = z_U; φ) and the distribution of Y_U given Z_U = z_U is f(Y_U | Z_U = z_U) (Pfeffermann 2003).
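A small simulation sketch (the population and inclusion probabilities are invented for illustration) shows why informativeness matters: when π_i depends directly on y, an unweighted mean is biased for the population mean, while a design-weighted (Hájek-type) mean is not:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: y ~ N(500, 100^2); units with y above 500
# are five times more likely to be sampled, so the design is
# informative (pi depends on y directly, not through covariates).
N = 50_000
y = rng.normal(500, 100, size=N)
pi = np.where(y > 500, 0.05, 0.01)
sampled = rng.random(N) < pi              # Poisson sampling

naive = y[sampled].mean()                 # ignores the design: biased upward
weighted = (np.sum(y[sampled] / pi[sampled])
            / np.sum(1.0 / pi[sampled]))  # Hajek-type weighted mean

print(f"population {y.mean():.0f}  naive {naive:.0f}  weighted {weighted:.0f}")
```

The unweighted mean over-represents the high-y units, while weighting each sampled unit by 1/π_i restores approximate unbiasedness.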

Binder and Roberts (2001) state that a design is ignorable for a particular analysis if the inference based on all the known information is equivalent to the inference based on the same information excluding the outcomes of the random variables related to the design. They note that ignorability is a concept related to a particular analysis, and one can have ignorability with both informative and non-informative designs. All non-informative designs are ignorable, but not all ignorable analyses come from a non-informative design.

In practice the analyst does not always have the exact sampling plan and should not assume that the sampling plan is independent of y. And like any classical procedure, model misspecification can lead to incorrect conclusions, particularly for large samples. As Sharon Lohr states, 'theoretically derived models known to hold for all observations do not often exist for survey situations'.

Thus if we consider combining estimates of this type we can see that the variance will no longer be just a function of the sampling; it now includes a component for the superpopulation, even though the point estimate is of the finite population parameter. If we naively tried to combine θ̂^F(S) estimates we would need to ensure that we take this extra variance into account. Additionally, we note that combining θ̂^F(F) and θ̂^F(S) type estimators would be problematic, as the variance of the first would always be smaller than that of the second even for exactly the same sample.

3.3 Combining estimates of the superpopulation parameter using a two-stage sampling approach – θ̂^S(S)

If we are interested in the inferences from this model, and we wish to extend our inference to the superpopulation and its parameters, Molina et al. (2001) suggest that inference should be made using the joint process of model and design. They propose a joint model-design process satisfying the following for all the samples s that are possible under p:

E_ξ(Y | I = i_s) = E_ξ(Y)
E_ξ(YY′ | I = i_s) = E_ξ(YY′)

These conditions are satisfied when I and Y are independent. If this condition does not hold, then they suggest modeling the mean and variance conditional not on the observed sample but on all possible samples under p.

Estimation of a superpopulation characteristic can be the objective of a meta-analysis. In this case we need to be combining estimates of θ̂^S(S) that we feel are from different finite populations that are realizations of the same superpopulation, as in Figure 2, in order to converge to the superpopulation parameter.

Figure 2: [diagram: a single superpopulation model generating multiple finite populations]

4. Meta-Analysis: Fixed and Random Effect Models

In traditional meta-analysis, when we combine estimates from experiments we assume that the effect observed in a study estimates the corresponding 'population' effect with a random error that stems only from the chance associated with individual-level 'sampling' (not sampling in the sense of probability sampling, but variation due to the assumed distribution). Differences between the studies lie in their power to detect the outcome of interest, and each experiment is independent of the other experiments. Here we assume that, asymptotically, the estimator of the treatment effect θ_i is a random variable with a normal distribution with mean θ_i and variance v_i. We also assume that there is homogeneity of treatment effects across all n studies, that is θ_1 = θ_2 = … = θ_n = θ.

We can see that linear combinations of unbiased estimates are themselves unbiased estimates, and so the simplest approach is to calculate a weighted average from all the studies. The fixed effect model (the weighted average) takes the general form

g(θ̂) = ( Σ_{i=1}^n w_i θ̂_i ) / ( Σ_{i=1}^n w_i )

where the θ̂_i are the study estimators and the weights w_i are chosen to satisfy some constraint, such as g(θ̂) having minimum variance. Here the weights themselves are generally not known, but are estimated from the data. Hardy (2003) shows that there is no systematic bias caused by the estimation of the weights in the overall fixed effect estimate.

Many researchers have addressed the optimum choice of weights. In general we wish to give the greatest weight to the studies that have the most precise estimates. If we want to minimize the variance, then the minimum variance estimate is the one obtained by taking the weight for the ith study to be the inverse of its variance.
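The inverse-variance weighting just described can be sketched in a few lines; the study estimates below are hypothetical:

```python
import numpy as np

def fixed_effect(theta_hats, variances):
    """Inverse-variance weighted fixed effect estimate and its variance.
    Taking w_i = 1/v_i minimizes the variance of the pooled weighted
    average of the study estimates."""
    theta = np.asarray(theta_hats, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    pooled = float(np.sum(w * theta) / np.sum(w))
    return pooled, float(1.0 / np.sum(w))

# With equal study variances the pooled estimate reduces to the simple
# mean, and the pooled variance is v / n.
est, var = fixed_effect([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, var)  # 2.0, 0.1666...
```

Note that a very precise study dominates the pooled estimate, which is exactly the behaviour the inverse-variance weights are chosen to produce.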

Fixed effect meta-analysis methods do not provide unbiased results when the heterogeneity of the estimates is not explained by known differences in the studies. Here the assumptions are that the true treatment effect of each trial is normally distributed with mean θ and a between-study variance σ_B². Inference is based on the assumption that effects are independently and identically sampled from an overall population of all such effects.

The infinite population is an important assumption in the classical parametric approach. The assumption that the effects are random implies that we must include a between-study as well as a within-study component of variation when estimating an effect size and a confidence interval (DeMets). Then we would assume that the variability beyond the individual 'sampling error' is random, that it stems from random differences among studies whose sources cannot be identified. This will normally mean that the confidence interval is at least as wide as that from a fixed effect model.

We suppose that each trial estimate has a normal distribution with mean θ_i and variance v_i. Here let θ̂ be a pooled estimate of effect size (with weights chosen to be the inverse of the estimated variances). Then to calculate the random-effects estimator (DerSimonian-Laird) we use

θ̂_DL = [ Σ_{i=1}^k θ̂_i / (τ̂² + v_i) ] / [ Σ_{i=1}^k 1 / (τ̂² + v_i) ]

where τ̂² is a point estimate of the heterogeneity variance. This is a generalization of the fixed effect approach, which is recovered when τ̂² = 0. However, it should be noted that the weights are themselves estimates, and that this will lead to an overestimation of the between-study variance (Hardy).
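The text leaves the choice of the point estimate τ̂² open; a common choice is the DerSimonian-Laird moment estimator based on Cochran's Q, sketched below (the study values are made up for illustration):

```python
import numpy as np

def dersimonian_laird(theta_hats, variances):
    """Random effects pooled estimate using the DerSimonian-Laird
    moment estimator of the between-study variance tau^2
    (truncated at zero)."""
    theta = np.asarray(theta_hats, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                   # fixed effect weights
    theta_fe = np.sum(w * theta) / np.sum(w)      # fixed effect estimate
    q = np.sum(w * (theta - theta_fe) ** 2)       # Cochran's Q statistic
    k = len(theta)
    tau2 = max(0.0, (q - (k - 1)) /
               (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_star = 1.0 / (v + tau2)                     # random effects weights
    pooled = float(np.sum(w_star * theta) / np.sum(w_star))
    return pooled, float(tau2)

est, tau2 = dersimonian_laird([1.2, 0.8, 1.0], [0.04, 0.09, 0.04])
```

When the studies are perfectly homogeneous, Q falls below k − 1 and the truncation forces τ̂² = 0, so the estimator collapses to the fixed effect result.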

5. Simulations

We illustrate meta-analysis in the most common case, where we have a single finite population. Using this framework we ran two simulations.

The first was of a finite population of 100,000 individuals generated from a simple superpopulation model. The outcome of interest Y was generated as a normal random variable with mean 500 and variance 10,000. The stratification variable used in this example had three levels and was generated to be strongly correlated with y (ρ = .7). 1000 replicated samples of 1000 were then selected from this large finite population in SAS, using both simple random and stratified sampling. For the stratified case a simple random sample was selected within each stratum proportional to the size of the stratum. Design unbiased estimates were calculated for each of the 1000 replicates. Fixed effect estimates using the inverse of the design variance as the weight were then calculated, along with random effects estimates using the variance between samples as an estimate of τ², and both were compared for convergence to the superpopulation estimate.

The second simulation involved a finite population of 10 million individuals from a simple superpopulation model. The outcome of interest was generated as a normal random variable with mean 500 and variance of 100. The stratification variable used in this example had three levels and was generated to be strongly correlated with y (ρ = .7). 1000 replicated samples of 100 were then selected from this large finite population using proc surveyselect, with stratified sampling and a simple random sample within each stratum proportional to the size of the stratum. Design based and purely model based estimates were calculated for each of the 1000 replicates. Fixed effect estimates using the inverse of the design variance as the weight were then calculated and compared for convergence of the two methods.
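The simulations were run in SAS; the following Python sketch approximates the logic of the first simulation. The construction of the correlated stratification variable and the reduced number of replicates are assumptions made here for illustration:

```python
import numpy as np

rng = np.random.default_rng(2010)

# Approximate sketch of the first simulation.
N, n_samp, n_reps = 100_000, 1000, 300   # fewer replicates than the paper, for speed
y = rng.normal(500, 100, size=N)         # finite population from N(500, 10000)

# Three-level stratification variable correlated with y: adding noise
# with the same variance as y gives a correlation of about 1/sqrt(2) ~ .7,
# then cut at the terciles.
z = y + rng.normal(0, 100, size=N)
strata = np.digitize(z, np.quantile(z, [1 / 3, 2 / 3]))

estimates = []
for _ in range(n_reps):
    est = 0.0
    for h in range(3):
        units = np.flatnonzero(strata == h)
        n_h = round(n_samp * len(units) / N)      # proportional allocation
        samp = rng.choice(units, size=n_h, replace=False)
        est += (len(units) / N) * y[samp].mean()  # design-unbiased stratum term
    estimates.append(est)

# Under proportional allocation the design variances are roughly equal,
# so the inverse-variance weighted combination is close to a simple average.
combined = np.mean(estimates)
print(round(combined, 1), round(y.mean(), 1))
```

The combined estimate settles near the finite population mean, which anticipates the convergence result discussed in the next subsection.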

5.1 Convergence

We can empirically show that when we combine design unbiased estimates of the finite population parameter we converge to the finite population parameter. Here the superpopulation has a mean of 500, but the finite population had a mean of 512. The convergence of the first 500 samples of the first simulation to the finite population mean is shown in Figure 3.

Figure 3: [line plot: combined mean (y-axis, 450 to 630) against number of samples combined (x-axis, 1 to 500), converging to the finite population mean of 512]

This should make intuitive sense to the reader, though when we combine information under a superpopulation framework we often neglect this. If we wish to converge to the superpopulation parameter we must in fact be combining different realizations of the superpopulation, as in Figure 2, not different samples of the same finite population.

5.2 Design or Model based

If we have a very large finite population we know that the census estimate will be closer to the superpopulation estimate. Using a large finite population whose mean is 499, we illustrate that purely model based estimates can be biased when the sampling design is not ignorable. Here the stratification variable was highly correlated with y, and if we look at the first 10 replicates, in Figure 4, the design based estimates are not biased whereas the purely model based estimates are.

Figure 4: [plot: design based and purely model based mean estimates (y-axis, 400 to 540) for the first 10 replicates; the design based estimates centre on the population mean while the model based estimates are biased]

5.3 Small Sample Properties

We used different seeds to explore the behaviour of these estimates when the number of surveys we would combine is small. For many of the simulations these estimation techniques are not particularly robust when the number of samples being combined is small. Figure 5 illustrates a case in which only two of the estimates were within 5% of the true value. This is consistent with findings when combining experimental data.

Figure 5: [plot: combined estimates against number of samples combined (1 to 10); only two of the estimates fall within 5% of the true value]

6. Practical Considerations

Systematic reviews and subsequent meta-analyses are a form of research that uses other primary research studies as its subjects or units of observation. Like other forms of research they have many steps, from planning through implementation to dissemination. It has been widely recognized, as Naylor states, that 'methodology is less important than determining which results are to be aggregated' when combining experimental data. No two studies are identical, and although it is often reported that research experiments should be reproducible, the reality is that research evolves. Individual research is designed based on previous results and findings, not just to reproduce findings but to advance knowledge (Peto 1987).


Establishing the eligibility criteria for inclusion and exclusion is a fundamental step in this process. There has been considerable research on the effect of data quality and publication bias on the results of a meta-analysis (Schulz, 1996). In the case of survey data, as with other systematic reviews, the meta-analysis is not simply the combining of random sets of data. Data sets need to be chosen using the same rigorous standards as other systematic reviews in order to minimize publication and other sources of bias.

The selection of surveys should be a clearly defined process. Whether we use a research-based electronic facility (e.g. PubMed, the Canadian Research Index, the Social Sciences Index, POPLINE, PapersFirst or JSTOR), web-based electronic search engines including Google and Google Scholar, or contact a statistical agency directly, the researcher should set up and document the protocol used to select surveys. These protocols should include the key words, search criteria, the years under study and a description of the databases that were searched. The researcher should then apply clear criteria for inclusion and exclusion to all studies found in the review. These criteria should be based on the comparability and the quality of the studies.

Data quality has many dimensions: accessibility, interpretability, coherence, relevance, unbiasedness, accuracy, precision, timeliness, coverage, completeness, consistency, and some would add methodological soundness. Researchers need to be able to determine the extent to which the accuracy and other quality factors are consistent with their interpretation. They can do this by verifying the conceptual framework and definitions used in the survey design, collection and processing of the data. Without this verification the researcher cannot be certain that the data they analyze meet their needs.

7. Conclusions

As Schenker et al. (2007) point out, when combining complementary survey data the researcher needs to know 'to what extent are the target populations of the surveys mutually exclusive and exhaustive subsets of the overall domain of interest'. Following Schenker's lead, we expand this to take into account that estimation can be done for different levels of inference using different estimation frameworks. So not only would we want to know what the target population is and whether the surveys differentially cover their target populations, we would also need to know why and how these survey estimates were calculated; adding into the framework the inference and the estimation procedure used, and then assessing whether these definitions overlap.

Additionally, we would want to understand the possible biases associated with combining estimates that are purely model based, and recommend that whenever possible design based estimates be used. And like any estimation technique, we would want not only convergence to what we believe we are estimating (be it the superpopulation or the finite population parameter), but also small sample properties that behave in a way which satisfies our estimation needs.

References:

Glass, G.V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8.

Altman, D.G. (1985). Comparability of randomised groups. The Statistician, 34, 125–136.

Sarndal, C.E., Swensson, B., and Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer-Verlag.


Hartley, H.O. and Sielken, R.L. Jr. (1975). A 'super-population viewpoint' for finite population sampling. Biometrics, 31(2), 411–422.

Kish, L. (1999). Cumulating/combining population surveys. Survey Methodology, 25, 129–138.

Cochran, W.G. (1977). Sampling Techniques, 3rd ed. New York: John Wiley & Sons.

Wu, C. (2004). Combining information from multiple surveys through the empirical likelihood method. Canadian Journal of Statistics, 32(1), 15–26.

Zieschang, K.D. (1990). Sample weighting methods and estimation of totals in the Consumer Expenditure Survey. Journal of the American Statistical Association, 85, 986–1001.

Renssen, R.H. and Nieuwenbroek, N.J. (1997). Aligning estimates for common variables in two or more sample surveys. Journal of the American Statistical Association, 92, 368–374.

Fuller, W.A. (1975). Regression analysis for sample survey. Sankhyā, Series C, 37, 117–132.

Isaki, C.T. and Fuller, W.A. (1982). Survey design under the regression superpopulation model. Journal of the American Statistical Association, 77, 89–96.

Merkouris, T. (2004). Combining independent regression estimators from multiple surveys. Journal of the American Statistical Association, 99, 1131–1139.

Graubard, B.I. and Korn, E.L. (2002). Inference for superpopulation parameters using sample surveys. Statistical Science, 17, 73–96.

Pfeffermann, D. (1993). The role of sampling weights when modeling survey data. International Statistical Review/Revue Internationale de Statistique 61(2), 317–337.

Royall, R.M. (1970). On finite population sampling theory under certain regression models. Biometrika 57,377–387.

Binder, D.A. and Roberts, G.R. (2001). Can informative designs be ignorable? Newsletter of the Survey Research Methods Section, American Statistical Association, Issue 12.

Molina, E.A., Smith, T.M.F. and Sugden, R.A. (2001). Modelling overdispersion for complex survey data. International Statistical Review, 69(3), 373–384.

Naylor, D. (1997). Meta-analysis and the meta-epidemiology of clinical research: meta-analysis is an important contribution to research and practice but it's not a panacea. Editorial, BMJ, 315, 617.

Peto, R. (1987). Why do we need systematic overviews of randomized trials? Statistics in Medicine, 6(3), 233–244.

Schulz, K.F. (1996). Randomised trials, human nature, and reporting guidelines. Lancet, 348, 596–598.


Schenker, N. and Raghunathan, T.E. (2007). Combining information from multiple surveys to enhance estimation of measures of health. Statistics in Medicine, 26, 1802–1811.
