Practical Meta-Analysis -- Lipsey & Wilson Overview
Total Page:16
File Type:pdf, Size:1020Kb
Practical Meta-Analysis -- Lipsey & Wilson The Great Debate • 1952: Hans J. Eysenck concluded that there were no favorable Practical Meta-Analysis effects of psychotherapy, starting a raging debate • 20 years of evaluation research and hundreds of studies failed David B. Wilson to resolve the debate American Evaluation Association • 1978: To proved Eysenck wrong, Gene V. Glass statistically Orlando, Florida, October 3, 1999 aggregate the findings of 375 psychotherapy outcome studies • Glass (and colleague Smith) concluded that psychotherapy did indeed work • Glass called his method “meta-analysis” 1 2 The Emergence of Meta-Analysis The Logic of Meta-Analysis • Ideas behind meta-analysis predate Glass’ work by several • Traditional methods of review focus on statistical significance decades testing – R. A. Fisher (1944) • Significance testing is not well suited to this task • “When a number of quite independent tests of significance have been made, it sometimes happens that although few or none can be – highly dependent on sample size claimed individually as significant, yet the aggregate gives an – null finding does not carry to same “weight” as a significant impression that the probabilities are on the whole lower than would often have been obtained by chance” (p. 99). finding • Source of the idea of cumulating probability values • Meta-analysis changes the focus to the direction and – W. G. Cochran (1953) magnitude of the effects across studies • Discusses a method of averaging means across independent studies – Isn’t this what we are interested in anyway? • Laid-out much of the statistical foundation that modern meta-analysis – Direction and magnitude represented by the effect size is built upon (e.g., inverse variance weighting and homogeneity testing) 3 4 1 Overview Practical Meta-Analysis -- Lipsey & Wilson When Can You Do Meta-Analysis? Forms of Research Findings Suitable to Meta-Analysis • Meta-analysis is applicable to collections of research that • Central Tendency Research – are empirical, rather than theoretical – prevalence rates • Pre-Post Contrasts – produce quantitative results, rather than qualitative findings – growth rates – examine the same constructs and relationships • Group Contrasts – have findings that can be configured in a comparable – experimentally created groups statistical form (e.g., as effect sizes, correlation coefficients, • comparison of outcomes between treatment and comparison groups odds-ratios, etc.) – naturally occurring groups • comparison of spatial abilities between boys and girls – are “comparable” given the question at hand • Association Between Variables – measurement research • validity generalization – individual differences research • correlation between personality constructs 5 6 Effect Size: The Key to Meta-Analysis The Replication Continuum • The effect size makes meta-analysis possible – it is the “dependent variable” Pure Conceptual Replications – it standardizes findings across studies such that they can be Replications directly compared You must be able to argue that the collection of studies you are meta- • Any standardized index can be an “effect size” (e.g., analyzing examine the same relationship. This may be at a broad level of abstraction, such as the relationship between criminal justice standardized mean difference, correlation coefficient, odds- interventions and recidivism or between school-based prevention ratio) as long as it meets the following programs and problem behavior. Alternatively it may be at a narrow – is comparable across studies (generally requires level of abstraction and represent pure replications. standardization) – represents the magnitude and direction of the relationship of The closer to pure replications your collection of studies, the easier it interest is to argue comparability. – is independent of sample size • Different meta-analyses may use different effect size indices 7 8 2 Overview Practical Meta-Analysis -- Lipsey & Wilson Which Studies to Include? Searching Far and Wide • It is critical to have an explicit inclusion and exclusion criteria • The “we only included published studies because they have (see handout) been peer-reviewed” argument – the broader the research domain, the more detailed they • Significant findings are more likely to be published than tend to become nonsignificant findings – developed iteratively as you interact with the literature • Critical to try to identify and retrieve all studies that meet your • To include or exclude low quality studies eligibility criteria – the findings of all studies are potentially in error • Potential sources for identification of documents (methodological quality is a continuum, not a dichotomy) – computerized bibliographic databases – being too restrictive may restrict ability to generalize – authors working in the research domain – being too inclusive may weaken the confidence that can be – conference programs – dissertations placed in the findings – review articles – must strike a balance that is appropriate to your research – hand searching relevant journal question – government reports, bibliographies, clearinghouses 9 10 Strengths of Meta-Analysis Weaknesses of Meta-Analysis • Imposes a discipline on the process of summing up research • Requires a good deal of effort findings • Mechanical aspects don’t lend themselves to capturing more • Represents findings in a more differentiated and sophisticated qualitative distinctions between studies manner than conventional reviews • “Apples and oranges”; comparability of studies is often in the • Capable of finding relationships across studies that are “eye of the beholder” obscured in other approaches • Most meta-analyses include “blemished” studies • Protects against over-interpreting differences across studies • Selection bias posses continual threat • Can handle a large numbers of studies (this would overwhelm – negative and null finding studies that you were unable to traditional approaches to review) find – outcomes for which there were negative or null findings that were not reported • Analysis of between study differences is fundamentally correlational 11 12 3 Overview Practical Meta-Analysis -- Lipsey & Wilson Examples of Different Types of Effect Sizes: The Effect Size The Major Leagues • Standardized Mean Difference • The effect size (ES) makes meta-analysis possible. – group contrast research • The ES encodes the selected research findings on a • treatment groups numeric scale. • naturally occurring groups – inherently continuous construct • There are many different types of ES measures, each • Odds-Ratio suited to different research situations. – group contrast research • Each ES type may also have multiple methods of • treatment groups computation. • naturally occurring groups – inherently dichotomous construct • Correlation Coefficient – association between variables research 1 2 Examples of Different Types of Effect Sizes: Two from the Minor Leagues What Makes Something an Effect Size for Meta-Analytic Purposes • Proportion – central tendency research • The type of ES must be comparable across the collection • HIV/AIDS prevalence rates of studies of interest. • Proportion of homeless persons found to be alcohol abusers • This is generally accomplished through standardization. • Standardized Gain Score • Must be able to calculate a standard error for that type of – gain or change between two measurement points on the same ES variable – the standard error is needed to calculate the ES weights, called • reading speed before and after a reading improvement class inverse variance weights (more on this latter) – all meta-analytic analyses are weighted 3 4 4 Effect Size Overheads Practical Meta-Analysis -- Lipsey & Wilson The Standardized Mean Difference The Correlation Coefficient 2 2 X G1 X G2 s1 n1 1 s2 n2 1 s ES pooled ES r s pooled n1 n2 2 • Represents a standardized group contrast on an • Represents the strength of association between two inherently continuous measure. inherently continuous measures. • Uses the pooled standard deviation (some situations use • Generally reported directly as “r” (the Pearson product control group standard deviation). moment coefficient). • Commonly called “d” or occasionally “g”. 5 6 Methods of Calculating the The Odds-Ratio Standardized Mean Difference • The Odds-Ratio is based on a 2 by 2 contingency table, such as the one below. • The standardized mean difference probably has more methods of calculation than any other effect size type. Frequencies Success Failure ad ES Treatment Group a b bc Control Group c d • The Odds-Ratio is the odds of success in the treatment group relative to the odds of success in the control group. 7 8 5 Effect Size Overheads Practical Meta-Analysis -- Lipsey & Wilson The different formulas represent degrees of approximation to the ES value that would be obtained based on the means and Methods of Calculating the standard deviations Standardized Mean Difference – direct calculation based on means and standard deviations – algebraically equivalent formulas (t-test) – exact probability value for a t-test Direction Calculation Method Great – approximations based on continuous data (correlation coefficient) X X X X ES 1 2 1 2 – estimates of the mean difference (adjusted means, regression B 2 2 s weight, gain score means) s1 (n1 1) s2 (n2 1) pooled – estimates of the pooled standard deviation (gain score standard n1 n2 2 deviation, one-way ANOVA with 3 or more groups, ANCOVA) Good – approximations based on dichotomous data Poor 9 10 Methods of