Investigating heterogeneity: Subgroup analysis and meta-regression

Cochrane Statistical Methods Group Training course
Cardiff, 4 March 2010

Roger Harbord
Department of Social Medicine, University of Bristol

Plan of session
• Presentation
• Exercises
• Discussion

Acknowledgements
Many of these slides were written or designed by:
• Julian Higgins (MRC Biostatistics Unit, Cambridge)
• Georgia Salanti (U. of Ioannina School of Medicine)
• Judith Anzures (ex MRC BSU, now Roche)
• Jonathan Sterne (U. of Bristol)
• Much of this talk will be based on the Cochrane Handbook for Systematic Reviews of Interventions, in particular sections 9.5 “Heterogeneity” and 9.6 “Investigating heterogeneity”.
• I will make it clear when I express a personal viewpoint.

Outline of presentation
• What is heterogeneity?
• What can we do about it?
• Measuring and presenting heterogeneity
  – I², τ², predictive intervals
• Subgroup analysis & meta-regression
  – How are they related?
  – Fixed- or random-effects?
  – Problems and pitfalls
  – Practical guidance
  – Extensions
• Summary

What is (statistical) heterogeneity?
• Variation in the true effects underlying the studies
• Observed effects more variable than would be expected by chance (sampling error) alone
May be due to:
• Clinical diversity (variation in participants, interventions, outcomes)
• Methodological diversity (varying degrees of bias)
Hbk: 9.5.1

What can we do about heterogeneity?
• Check the data: incorrect data extraction, unit of analysis errors
• Ignore it: don’t do that!
• Resign to it: do no meta-analysis
• Adjust for it: random effects meta-analysis
• Explore it: subgroup analyses, meta-regression
• Change effect measure: OR, RR, RR(non-event)…
• Exclude studies: as sensitivity analysis, and only if there is an obvious reason, preferably prespecified
Hbk: 9.5.3

Measuring heterogeneity
• Cochran’s Q gives a test for heterogeneity
  – Follows a chi-squared distribution under the null
• I² quantifies the degree of inconsistency
  I² = (Q – df) / Q × 100% (if negative, set to zero)
  – % of variability due to heterogeneity rather than chance
  – Variability due to chance depends on study size
  – So I² depends on size of studies as well as between-study variability
Hbk: 9.5.2

Presenting heterogeneity in random-effects meta-analysis
• τ² is the between-study variance
• So τ is the between-study standard deviation
  – Therefore is measured on the analysis scale
  – This will be on the log scale for ratio estimates (OR, RR), so can be hard to interpret
Hbk: 9.5.4

Prediction intervals
• If τ and μ were known, would expect 95% of the true effects in future studies to lie within μ ± 1.96 τ
[Figure: distribution of true effects, with limits marked at μ – 1.96 τ and μ + 1.96 τ]

Prediction intervals
• In practice, both τ and μ are estimated
• Bayesian analysis would give a rigorous way of taking all sources of uncertainty into account
• A reasonable approximation: μ ± t √(τ² + SE(μ)²)
  – t is the 97.5 percentile of a t-distribution (instead of 1.96)
  – df debatable: compromise on df = #studies – 2
  – Needs at least 3 studies!
• Implemented in the metan command in Stata
  – rfdist option
• Not in RevMan yet but may be in future
Hbk: 9.5.4
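As a rough illustration of the formulas on the slides above, the following Python sketch computes I² from Q and the approximate 95% prediction interval μ ± t √(τ² + SE(μ)²) with df = #studies – 2. It is not the metan implementation. The function names, the hand-rolled t quantile (Simpson's rule plus bisection, used only to stay within the standard library) and the input values are assumptions made for this example.

```python
import math

def i_squared(q, df):
    """I^2 = (Q - df) / Q * 100%, set to zero if negative."""
    return max(0.0, (q - df) / q * 100.0)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, n=2000):
    """CDF for x >= 0, by composite Simpson's rule on [0, x] (n is even)."""
    if x == 0:
        return 0.5
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection on the numerical CDF."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def prediction_interval(mu, se_mu, tau2, n_studies):
    """Approximate 95% prediction interval with df = n_studies - 2."""
    if n_studies < 3:
        raise ValueError("needs at least 3 studies")
    t = t_quantile(0.975, n_studies - 2)
    half_width = t * math.sqrt(tau2 + se_mu ** 2)
    return mu - half_width, mu + half_width

# Hypothetical inputs, chosen only to exercise the functions.
print(i_squared(20.0, 10))
print(prediction_interval(mu=-0.5, se_mu=0.1, tau2=0.09, n_studies=10))
```

If SciPy is available, `scipy.stats.t.ppf(0.975, df)` could replace `t_quantile`. The interval is wider than the confidence interval for μ because it adds τ² to the variance and uses a t quantile instead of 1.96.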

Meta-regression and subgroup analysis
• Methods for investigating possible explanations of heterogeneity in a meta-analysis
• Used to examine associations between study-level characteristics and treatment effects
• Assume the treatment effect is related to one or more covariates
• Estimate the interaction between the covariate and the treatment effect, i.e. how the treatment effect is modified by the covariate
• Test whether this interaction is zero
Hbk: 9.6

However…
• Typically unlikely to obtain useful results (low power)
• Risk of wrong results by chance (false positives)
• Risk of wrong results due to nature of data (confounding)
• Potential for biases
• Meta-regression can’t be done in RevMan

Undertaking subgroup analysis
• Tempting to compare effect estimates between subgroups by considering results from each subgroup separately
• Ok to compare magnitudes of effect informally
• Not ok to compare statistical significance or p-values!
• “It is extremely misleading to compare the statistical significance of the results [in different subgroups]”
Hbk: 9.6.3

Example: exercise for depression
[Forest plot: estimates and 95% confidence intervals for the studies Doyne, Epstein, Hess-Homeier, Klein, Martinsen, McNeil, Mutrie, Reuter, Singh and Veale. x-axis: standardized mean difference, -2 to 1; negative values favour exercise, positive values favour control]
Lawlor DA, Hopker SW. BMJ 2001; 322: 763-7

Example: Heterogeneity
[Forest plot: estimates and 95% confidence intervals for the ten studies, as on the previous slide]
• Test whether different SMDs underlie different studies:
  Q = 35.4 (9 d.f.) (p = 0.00005)
  I² = 75%
Can we explain some or all of the between-study heterogeneity?

Example: random-effects meta-analysis
[Forest plot: estimates and 95% confidence intervals for the ten studies, with summary SMD -1.06 (-1.53 to -0.59)]
• Recognizes that true effects differ between studies but does not explain why
• Summary estimate is the centre of a distribution of true effects

Meta-regression: fixed or random effects?
• “In general, it is an unwarranted assumption that all the heterogeneity is explained by the covariate, and the between-trial variance should be included as well, corresponding to a “random-effects” analysis.” (Thompson 2001 Systematic Reviews in Health Care Ch. 9)
• “Fixed effect meta-regression is likely to produce seriously misleading results in the presence of heterogeneity” (Higgins and Thompson 2004, Statistics in Medicine)
• Fixed-effect meta-regression should not be used!
Hbk: 9.6.4

Tests for subgroup effects
• Cochrane Handbook section 9.6.3.1 describes a test for differences between two or more subgroups based on an ANOVA-like partitioning of Cochran’s Q statistic
• Reported by RevMan 5 for fixed-effect meta-analysis with subgroups
• Equivalent to fixed-effect meta-regression with an indicator (dummy) variable for each group
• Therefore, in my view, this method should not be used either!
• A better method (allowing for unexplained heterogeneity) is likely to be introduced in the next version of RevMan
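To make the ANOVA-like partition of Cochran's Q concrete, here is a minimal Python sketch of the fixed-effect subgroup test: Q for all studies is split into Q within subgroups plus Q between subgroups, and Q between is referred to a chi-squared distribution with (number of subgroups – 1) degrees of freedom. This is the fixed-effect version that the slides caution against when unexplained heterogeneity remains, so it is shown only to clarify what is being computed. The function names and the study data are hypothetical; the final line simply checks the p-value quoted for Q = 35.4 on 9 d.f.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of chi^(2r-1) / (2r-1)!!
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(chi / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total

def cochran_q(effects, variances):
    """Cochran's Q: inverse-variance weighted squared deviations from the pooled estimate."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))

def subgroup_test(effects, variances, groups):
    """Return (Q_between, df, p) from the partition Q_total = Q_within + Q_between."""
    q_total = cochran_q(effects, variances)
    q_within = 0.0
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        q_within += cochran_q([effects[i] for i in idx], [variances[i] for i in idx])
    df = len(set(groups)) - 1
    q_between = q_total - q_within
    return q_between, df, chi2_sf(q_between, df)

# Hypothetical SMDs, within-study variances and subgroup labels.
effects = [-1.4, -1.1, -1.6, -0.7, -0.9, -0.5]
variances = [0.20, 0.15, 0.25, 0.10, 0.12, 0.18]
groups = ["short", "short", "short", "long", "long", "long"]
print(subgroup_test(effects, variances, groups))

# Check of the p-value reported for the heterogeneity test on the slide.
print(chi2_sf(35.4, 9))
```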

Random-effects meta-regression
• Allows for heterogeneity beyond that explained by the explanatory variable(s)
• Allow for a variance component τ² which accounts for unexplained heterogeneity between studies
• Like a random-effects meta-analysis
• By comparing a random-effects meta-analysis with a random-effects meta-regression, can determine how much of the heterogeneity (between-study variance) is explained by the explanatory variable(s)
Hbk: 9.6.4

Subgroup analysis
• Divide up the studies
• For example, by duration of trial
• Compare effects between subgroups
• NB do not compare p-values!
[Forest plot: estimates and 95% confidence intervals by subgroup; x-axis: standardized mean difference, -2 to 1; negative values favour exercise]
  > 8 weeks follow-up (Klein, Martinsen, Reuter, Singh, Veale): -0.82 (-1.46 to -0.19)
  4-8 weeks follow-up (Doyne, Epstein, Hess-Homeier, Mutrie, McNeil): -1.33 (-1.99 to -0.67)

Meta-regression to compare subgroups
• Assumes the between-study variance τ² is the same in all subgroups
  – Sensible when some or all subgroups have few studies
• Estimates the difference in treatment effect between subgroups
• Example: Long duration vs. short duration
  Difference in SMD = 0.5 (95%CI: –0.5 to 1.5), p = 0.32
  – longer duration trials have a less negative SMD
  – i.e. treatment effect is smaller in long duration trials
• Weak statistical evidence for this being a true effect
  – but dichotomization reduces statistical power

Meta-regression with a continuous study characteristic
• Predict effect according to length of follow-up
• SMD decreases by 0.18 (95%CI: 0.02 to 0.34) for each extra week of treatment (p = 0.008)
[Plot: estimates and 95% confidence intervals against follow-up (y-axis, 4 to 12); x-axis: standardized mean difference, -2 to 1; negative values favour exercise]
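The advice to compare effects rather than p-values can be illustrated with a rough calculation on the two subgroup estimates quoted above. The Python sketch below recovers each standard error from its 95% confidence interval and forms a z-test for the difference. It assumes the two subgroup estimates are independent and normally distributed, which is not the method on the slide: the meta-regression there assumes a common τ² across subgroups, so its result (0.5, 95% CI –0.5 to 1.5, p = 0.32) differs somewhat. The function names are made up for this example.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)

def compare_subgroups(est_a, ci_a, est_b, ci_b):
    """Difference between two independent estimates, its 95% CI and two-sided p-value."""
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    diff = est_a - est_b
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return diff, (diff - 1.96 * se_diff, diff + 1.96 * se_diff), p

# Subgroup summaries from the exercise-for-depression example:
# > 8 weeks follow-up versus 4-8 weeks follow-up.
diff, ci, p = compare_subgroups(-0.82, (-1.46, -0.19), -1.33, (-1.99, -0.67))
print(diff, ci, p)
```

Both subgroups are individually "significant", yet the interval for their difference comfortably includes zero, which is why the comparison has to be made on the difference itself.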

‘Bubble plots’
• Circle sizes vary with inverse of within-study variance (weight in a fixed-effect meta-analysis)
[Bubble plot: Has effectiveness of fluoride gels changed over time? Marinho et al (2004). y-axis: prevented fraction, -1 to 1; x-axis: year, 1960 to 2000]

Problems and pitfalls
• We shall consider
  – Choice of explanatory variables and spurious findings
  – Confounding
  – Lack of power
  – Aggregation bias

Selecting study characteristics
• There are typically many study characteristics that might be used as explanatory variables
  – Heterogeneity can always be explained if you look at enough of them
  – Great risk of spurious findings
• Beware ‘prognostic factors’
  – things that predict clinical outcome don’t necessarily affect treatment effects
  – e.g. age may be strongly prognostic, but risk ratios may well be the same irrespective of age
• Explanatory variable data may be missing
  – e.g. no information on dose; unable to assess quality, etc
Hbk: 9.6.5

Confounding
• Meta-regression looks at observational relationships
  – even if the studies are randomized controlled trials
• A relationship may not be causal
• Confounding (due to co-linearity) is common
[Diagram: treatment effect, year of randomization and quality, with quality as the confounder (associated with treatment effect and year)]
Hbk: 9.6.5.6
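The risk of spurious findings when many study characteristics are examined can be put in numbers with a standard back-of-envelope calculation: if m characteristics are each tested at level α and none is truly related to the treatment effect, the chance of at least one false positive is 1 – (1 – α)^m. The short Python sketch below assumes the tests are independent, which is a simplification (characteristics are often correlated), and the function name is hypothetical.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance of at least one false positive among independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

for m in (1, 5, 10, 20):
    print(m, round(prob_any_false_positive(m), 3))
```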

Lack of power
• Unfortunately most meta-analyses in Cochrane reviews do not have many studies
• Meta-regression typically has low power to detect relationships
• Model diagnostics / adequacy difficult to assess
Hbk: 9.6.5.5

Aggregation bias (ecological fallacy)
• Think about study characteristics that summarize patients within a study, e.g.
  – average age
  – % females
  – average duration of follow-up
  – % drop out

[Figure, shown in two builds: Relationship between treatment effect and average age. y-axis: SMD, .01 to 10; x-axis: average age in each study, 30 to 70]

Aggregation bias (ecological fallacy), continued
• Relationships across studies may not reflect relationships within studies
• The relationship between treatment effect and age, sex, etc is best measured within a study
  – Collect individual patient data

Practical guidance
• How do I choose characteristics?
• How many studies do I need?
• How many characteristics can I look at?
• How do I do the meta-regression?
• How do I interpret the results?
• How do I incorporate meta-regression into a Cochrane review?

Selecting explanatory variables
• Specify a small number of characteristics in advance
• Ensure there is scientific rationale for investigating each characteristic
• Make sure the effect of a characteristic can be identified
  – does it differentiate studies?
  – aggregation bias
• Think about whether the characteristic is closely related to another characteristic
  – confounding
Hbk: 9.6.5

How many studies / characteristics?
• Typical guidance for regression is to have at least 10 observations (in our case, studies) for each characteristic examined
• Some say 5 studies is enough

Software
RevMan
• Not available
Stata [recommended]
• metareg: random-effects meta-regression
• vwls: fixed-effect meta-regression
SAS
• See van Houwelingen et al (2002)
Comprehensive Meta-analysis
• Single covariate only in CMA 2; multiple in next version
Other software
• R, WinBUGS

Meta-regression in Stata
• metareg is an easy-to-use Stata command
• For details of obtaining the command, type
  findit metareg
  in Stata and click the links to install
• For explanation of command syntax, then type
  help metareg
  in Stata
• For more explanation and discussion, see: Harbord & Higgins Stata Journal 2008; 8(4):493-519

Should you believe meta-regression results?
• Was the analysis pre-specified or post hoc?
• Is there indirect evidence in support of the findings?
• Is the magnitude of the relationship of practical importance?
• Is there strong statistical evidence of an effect (small p-value)?
Hbk: 9.6.6

Including meta-regression in a Cochrane review
• You are encouraged to use meta-regression if it is appropriate
• ‘Bubble’ plots may be included as Additional Figures
• Results should be presented in Additional tables
• Consider presenting:

Explanatory variable | Slope or Exp(slope) | 95% confidence interval | Proportion of variation explained | Interpretation
Duration             | OR = 1.3            | 0.9 to 1.8              | 14%                               | Weak evidence that increases with duration

Extensions
• Baseline risk of the studied population (measured in the control group) might be considered as an explanatory variable
  – Beware! It is inherently correlated with treatment effects
  – Special methods are needed (Thompson et al 1997)
  Hbk: 9.6.7
• If a statistically significant result is obtained, consider using a permutation test to obtain the ‘correct’ p-value
  – also can be used to ‘adjust’ for multiple testing of several explanatory variables (Higgins and Thompson 2004)
  – implemented as permute() option to metareg

Key messages
• Meta-regression and subgroup analysis examine the relationship between treatment effects and one or more study-level characteristics
• Using meta-regression to explain heterogeneity sounds great in theory, and is straightforward to perform in Stata
• In practice subgroup analysis and meta-regression should be undertaken and interpreted with caution
  – observational relationships
  – few studies
  – many potential sources of heterogeneity
  – confounding and aggregation bias

References
• Berkey CS, Hoaglin DC, Mosteller F and Colditz GA. A random-effects regression model for meta-analysis. Statistics in Medicine 1995; 14: 395-411
• Harbord RM, Higgins JPT. Meta-regression in Stata. Stata Journal 2008; 8(4): 493-519
• Higgins J, Thompson S, Altman D and Deeks J. Statistical heterogeneity in systematic reviews of clinical trials: a critical appraisal of guidelines and practice. Journal of Health Services Research and Policy 2002; 7: 51-61
• Higgins JPT, Thompson SG. Controlling the risk of spurious results from meta-regression. Statistics in Medicine 2004; 23: 1663-1682
• Marinho VCC, Higgins JPT, Logan S, Sheiham A. Fluoride gels for preventing dental caries in children and adolescents (Cochrane Review). In: The Cochrane Library, Issue 3, 2004. Chichester, UK: John Wiley & Sons, Ltd

More references
• Thompson SG, Smith TC and Sharp SJ. Investigating underlying risk as a source of heterogeneity in meta-analysis. Statistics in Medicine 1997; 16: 2741-2758
• Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Statistics in Medicine 1999; 18: 2693-2708
• Thompson SG, Higgins JPT. How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine 2002; 21: 1559-1574
• Thompson SG and Higgins JPT. Can meta-analysis help target interventions at individuals most likely to benefit? The Lancet 2005; 365: 341-346
• van Houwelingen HC, Arends LR, Stijnen T. Tutorial in Biostatistics: Advanced methods in meta-analysis: multivariate approach and meta-regression. Statistics in Medicine 2002; 21: 589-624