
Introduction to Systematic Review and Meta-Analysis: A Health Care Perspective
Sally C. Morton
Department of Biostatistics, University of Pittsburgh

Methods for Research Synthesis: A Cross-Disciplinary Approach, October 2013

Cross-Disciplinary Communication From the Healthcare Systematic Review World
• Terminology
• Institute of Medicine (IOM) standards for systematic reviews
– Quality of individual studies
– Heterogeneity
– Strength of the body of evidence

Research Synthesis, Morton, 10/13, 2 How Did We Get Here? “The Evidence Paradox” (Sean Tunis):

• 18,000+ RCTs published each year
• Tens of thousands of other clinical studies
• Systematic reviews routinely conclude that:

“The available evidence is of poor quality and therefore inadequate to inform decisions of the type we are interested in making.”

Research Synthesis, Morton, 10/13, 3 Terminology
Systematic Review (SR): Review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant research, and to collect and analyze data from the studies that are included in the review
Meta-analysis (MA): Use of statistical techniques in an SR to integrate the results of the included studies

Adapted from the Cochrane Collaboration Glossary

Research Synthesis, Morton, 10/13, 4 Key Points

1. MA should not be used as a synonym for SR
2. An MA should be done in the context of an SR
3. “An MA should not be assumed to always be an appropriate step in an SR. The decision to conduct an MA is neither purely analytical nor statistical in nature.”

[Venn diagram: meta-analyses (MAs) shown as a subset of systematic reviews (SRs)]

Research Synthesis, Morton, 10/13, 5 Steps of a Systematic Review

1. Develop a focused question
2. Define inclusion/exclusion criteria
3. Select the outcomes for your review
4. Find the studies
5. Abstract the data
6. Assess quality of the data
7. Explore the data (heterogeneity)
8. Synthesize the data descriptively and inferentially via meta-analysis if appropriate
9. Summarize the findings

Research Synthesis, Morton, 10/13, 6 Systematic Review Standards A standard is a process, action, or procedure for performing SRs that is deemed essential to producing scientifically valid, transparent, and reproducible results

Research Synthesis, Morton, 10/13, 7 IOM Report on Standards for SRs (2011)

Committee Charge: Recommend methodological standards for SRs of comparative effectiveness research (CER) on health and health care
• Assess potential methodological standards that would assure objective, transparent, and scientifically valid SRs of CER
• Recommend a set of methodological standards for developing and reporting such SRs

Research Synthesis, Morton, 10/13, 8 Committee
• Available research evidence
• Expert guidance from:
— Agency for Healthcare Research and Quality (AHRQ) Effective Health Care Program
— Centre for Reviews and Dissemination (CRD)
— Cochrane Collaboration
— Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group
— Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA)
• Committee’s assessment criteria:
o Acceptability (credibility)
o Scientific rigor
o Applicability (generalizability)
o Timeliness
o Transparency
o Patient-centeredness
• 21 standards and 82 elements recommended

Research Synthesis, Morton, 10/13, 9 IOM standards are categorized into four subgroups:
• Initiating an SR
• Finding and assessing individual studies
• Synthesizing the body of evidence
• Reporting SRs

Research Synthesis, Morton, 10/13, 10 Buscemi et al., 2006 – “Single extraction was faster, but resulted in 21.7% more mistakes.”

AHRQ – Ensure quality control mechanism, usually through use of independent researchers to assess studies for eligibility. Pilot testing is particularly important if there is not dual-review screening.
CRD – Good to have more than one researcher to help minimize bias and error at all stages of the review. Parallel independent assessments should be conducted to minimize the risk of errors.
Cochrane – At least two people, independently. Process must be transparent, and chosen to minimize biases and human error.

Research Synthesis, Morton, 10/13, 11 “Collectively the standards and elements present a daunting task. Few, if any, members of the committee have participated in an SR that fully meets all of them. Yet the evidence and experience are strong enough that it is impossible to ignore these standards or hope that one can safely cut corners. The standards will be especially valuable for SRs of high-stakes clinical questions with broad population impact, where the use of public funds to get the right answer justifies careful attention to the rigor with which the SR is conducted. Individuals involved in SRs should be thoughtful about all of the standards and elements, using their best judgment if resources are inadequate to implement all of them, or if some seem inappropriate for the particular task or question at hand. Transparency in reporting the methods actually used and the reasoning behind the choices are among the most important of the standards recommended by the committee.”

Research Synthesis, Morton, 10/13, 12 IOM Standards Regarding Study Quality and Heterogeneity

STANDARD 3.6 Critically appraise each study
3.6.1 Systematically assess the risk of bias, using predefined criteria
3.6.2 Assess the relevance of the study’s populations, interventions, and outcome measures
3.6.3 Assess the fidelity of the implementation of interventions

STANDARD 4.2 Conduct a qualitative synthesis
4.2.4 Describe the relationships between the characteristics of the individual studies and their reported findings and patterns across studies

STANDARD 4.4 If conducting a meta-analysis, then do the following:
4.4.2 Address the heterogeneity among study effects

Why Assess the Quality of Individual Studies?

• Combining poor quality studies may lead to biased, and therefore misleading, pooled estimates
• Assessment of quality can be controversial and lead to its own form of bias
• A variety of methods exists, including the Cochrane risk of bias tool

• Assessing quality of observational studies is very difficult

Research Synthesis, Morton, 10/13, 14 Cochrane Risk of Bias Tool

1. Sequence generation
Description: Describe the method used to generate the allocation sequence in sufficient detail to allow an assessment of whether it should produce comparable groups.
Review authors’ judgement: Was the allocation sequence adequately generated?

2. Allocation concealment
Description: Describe the method used to conceal the allocation sequence in sufficient detail to determine whether intervention allocations could have been foreseen in advance of, or during, enrolment.
Review authors’ judgement: Was allocation adequately concealed?

3. Blinding of participants, personnel and outcome assessors
Description: Describe all measures used, if any, to blind study participants and personnel from knowledge of which intervention a participant received. Provide any information relating to whether the intended blinding was effective.
Review authors’ judgement: Was knowledge of the allocated intervention adequately prevented during the study?

Research Synthesis, Morton, 10/13, 15

4. Incomplete outcome data
Description: Describe the completeness of outcome data for each main outcome, including attrition and exclusions from the analysis. State whether attrition and exclusions were reported, the numbers in each intervention group (compared with total randomized participants), reasons for attrition/exclusions where reported, and any re-inclusions in analyses performed by the review authors.
Review authors’ judgement: Were incomplete outcome data adequately addressed?

5. Selective outcome reporting
Description: State how the possibility of selective outcome reporting was examined by the review authors, and what was found.
Review authors’ judgement: Are reports of the study free of suggestion of selective outcome reporting?

6. Other sources of bias
Description: State any important concerns about bias not addressed in the other domains in the tool. If particular questions/entries were pre-specified in the review’s protocol, responses should be provided for each question/entry.
Review authors’ judgement: Was the study apparently free of other problems that could put it at a high risk of bias?

Research Synthesis, Morton, 10/13, 16
Research Synthesis, Morton, 10/13, 17 “Heterogeneity is Your Friend” (J. Berlin)

Fruit salad may, or may not, be tasty and interesting. Which are the apples and oranges, and how do they differ?

Research Synthesis, Morton, 10/13, 18 Definitions of Heterogeneity From a Health Care Perspective
Different types of heterogeneity:
• Clinical heterogeneity (diversity): Variability in participants, interventions and outcomes
• Methodological heterogeneity (diversity): Variability in study design and risk of bias
• Statistical heterogeneity: Variability in treatment effects, resulting from clinical and/or methodological diversity

• Statistical heterogeneity is present if the observed treatment effects are more different from each other than would be expected due to chance alone

Research Synthesis, Morton, 10/13, 19

Discuss Clinical/Methodological or “Substantive” Heterogeneity Prior To Analysis
• Think first: Are included studies similar with respect to treatment effect? Study design, subjects, treatments, etc. may affect results.
• Include in protocol: Sources of heterogeneity that you might stratify the analysis on, or that you might include as independent variables in a meta-regression (a minimal sketch follows below)
• Do later: Q statistic to test the hypothesis that the true (population) treatment effect is equal in all studies; and/or I-squared (I2) statistic
• Remember: Tests for heterogeneity have low statistical power
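To make the meta-regression option concrete, here is a minimal sketch, assuming hypothetical study-level log odds ratios, within-study variances, and a mean-age covariate (none of these values or variable names come from the slides). It fits a fixed-effect, inverse-variance-weighted meta-regression; a random-effects meta-regression would additionally model residual between-study heterogeneity.

```python
import numpy as np

# Hypothetical study-level data (illustrative only): log odds ratios,
# their within-study variances, and a study-level covariate (mean age).
log_or = np.array([-0.30, -0.10, -0.45, 0.05, -0.25])
var = np.array([0.04, 0.09, 0.06, 0.12, 0.05])
mean_age = np.array([52.0, 61.0, 48.0, 70.0, 55.0])

w = 1.0 / var                                  # inverse-variance weights
X = np.column_stack([np.ones_like(mean_age), mean_age])

# Weighted least squares: beta = (X' W X)^{-1} X' W y
XtW = X.T * w
beta = np.linalg.solve(XtW @ X, XtW @ log_or)
cov_beta = np.linalg.inv(XtW @ X)              # fixed-effect covariance of beta
se = np.sqrt(np.diag(cov_beta))

print(f"intercept: {beta[0]:.3f} (SE {se[0]:.3f})")
print(f"change in log OR per year of mean age: {beta[1]:.3f} (SE {se[1]:.3f})")
```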

Research Synthesis, Morton, 10/13, 20 Evaluating The Strength of The Body of Evidence

Research Synthesis, Morton, 10/13, 21 IOM Standard Regarding Strength of Evidence

Research Synthesis, Morton, 10/13, 22
• Use consistent language to summarize the conclusions of individual studies as well as the body of evidence:
– Presenting results not sufficient
– Reviews often very long

• Evidence on assessment methods is elusive
• Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach is becoming more popular
• Anecdotal evidence that
– GRADE is difficult to apply
– GRADE is being modified for specific situations
• GRADE starts by downgrading observational studies

Research Synthesis, Morton, 10/13, 23 Reliability Testing of the AHRQ EPC Approach to Grading the Strength of Evidence in Comparative Effectiveness Reviews (Berkman et al., RTI/UNC EPC)

Journal of Clinical Epidemiology 66 (2013) 1105-1117

ORIGINAL ARTICLES

Interrater reliability of grading strength of evidence varies with the complexity of the evidence in systematic reviews
Nancy D. Berkman a,*, Kathleen N. Lohr a, Laura C. Morgan a, Tzy-Mey Kuo b, Sally C. Morton c
a Division of Social Policy, Health, and Economics Research, RTI International (Research Triangle Institute), Research Triangle Park, NC 27709-2194, USA
b Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 101 E. Weaver Street, Carrboro, NC 27599, USA
c Department of Biostatistics, University of Pittsburgh, 130 DeSoto Street, Pittsburgh, PA 15261, USA
Accepted 5 June 2013

Research Synthesis, Morton, 10/13, 24 Study Design
• Inter-rater reliability of 4 required domains (risk of bias; consistency; directness; and precision) for RCTs and observational studies separately
• 10 exercises from 2 published CER SRs on depression and rheumatoid arthritis
– All exercises contained RCTs
– 6 exercises included one or more observational studies
• Eleven pairs of reviewers (10 from 9 EPCs, 1 from AHRQ) participated in each exercise

Research Synthesis, Morton, 10/13, 25 Biologics vs. Oral DMARDs
(Columns: Study/Design; N; Comparison; Quality; Results)

TEMPO, 2005 / RCT; N = 451; ETN 25 mg twice/wk vs. MTX; Quality: Fair
Results: Remission at week 24: DAS < 1.6: 13.0% vs. 13.6% (P = NS); DAS28 < 2.6: 13.9% vs. 13.6% (P = NS). Remission at week 52: DAS < 1.6: 17.5% vs. 14% (P = NS); DAS28 < 2.6: 17.5% vs. 17.1% (P = NS)

PREMIER, 2006 / RCT; N = 531; ADA 40 mg biweekly vs. MTX 20 mg/wk; Quality: Fair
Results: Clinical remission (DAS28 < 2.6) at 1 year: 23% vs. 21% (P = 0.582†)

Listing, 2006 / Prospective; N = 1083; Biologics vs. conventional DMARDs; Quality: Fair
Results: Odds of achieving remission (DAS28 < 2.6) at 12 months: Adjusted* OR, 1.95 (95% CI, 1.20-3.19); (P = 0.006). Matched pairs analysis, DAS28 remission at 12 months: 24.9% vs. 12.4% (P = 0.004)

*Adjusted for age, sex, # of previous DMARDs, DAS28, ESR, FFbH, osteoporosis, previous txt with cyclosporine A.

Research Synthesis, Morton, 10/13, 26 Study Results

Domain or Strength of Evidence (SOE) | Independent Reviewer Agreement | Reconciled Reviewer Pair Agreement
Risk of bias: RCTs | 0.67 (0.61, 0.73) | 0.65 (0.56, 0.73)
Risk of bias: Observational studies | 0.11 (0.05, 0.18) | 0.22 (0.13, 0.32)
SOE: All studies | 0.20 (0.16, 0.25) | 0.24 (0.14, 0.34)
SOE: RCTs only | 0.22 (0.17, 0.28) | 0.30 (0.17, 0.43)

Research Synthesis, Morton, 10/13, 27 Study Conclusions
• Inter-rater reliability is low
• Complex evidence bases, particularly those with a mix of randomized and observational studies, can be extremely difficult to grade
• Dual review with adjudication of differences improves reliability
• Additional methodological guidance needed for reviewers
• More research is needed on reliability, and the advantages and disadvantages of concrete rules for determining strength of evidence

Research Synthesis, Morton, 10/13, 28 Final words

“To do a meta-analysis is easy, to do one well is hard.” - Ingram Olkin

Research Synthesis, Morton, 10/13, 29 Thank You

Sally C. Morton Department of Biostatistics Graduate School of Public Health University of Pittsburgh

[email protected]

Are RCTs Enough?
Austin Bradford Hill, Heberden Oration (1965): “this leads directly to a related criticism of the present controlled trial – that it does not tell the doctor what he wants to know. It may be so constituted as to show without any doubt that treatment A is on the average better than treatment B. On the other hand, that result does not answer the practicing doctor’s question – what is the most likely outcome when this drug is given to a particular patient?”

Research Synthesis, Morton, 10/13, 31 PRISMA Flowchart and Checklist

Endorsed (required) by many top clinical journals

Research Synthesis, Morton, 10/13, 32
Research Synthesis, Morton, 10/13, 33
Research Synthesis, Morton, 10/13, 34 Q Statistic – The Math

Let $\theta_i$ = true treatment effect for study $i$, for $i = 1, \ldots, K$.

Study $i$'s effect is estimated by $T_i$.

The pooled treatment effect $\theta_P$ is estimated by
$$T_P = \frac{\sum_{i=1}^{K} w_i T_i}{\sum_{i=1}^{K} w_i}$$

$$H_0\colon \theta_1 = \theta_2 = \cdots = \theta_K = \theta_P$$

The test statistic is
$$Q = \sum_{i=1}^{K} w_i (T_i - T_P)^2$$
and under $H_0$, $Q \sim \chi^2$ with $K - 1$ degrees of freedom.

I-squared ($I^2$) is a newer heterogeneity statistic that measures the percentage of variation across studies that cannot be explained by chance.
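As a numerical illustration of the formulas above, here is a minimal Python sketch, assuming hypothetical effect estimates $T_i$ and within-study variances (so $w_i = 1/\mathrm{var}_i$); the values are illustrative, not drawn from any study in this talk.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical study effect estimates and within-study variances
# (illustrative values only).
T = np.array([0.20, 0.35, -0.05, 0.48, 0.26])
var = np.array([0.04, 0.02, 0.08, 0.03, 0.05])
w = 1.0 / var                                   # inverse-variance weights

T_pooled = np.sum(w * T) / np.sum(w)            # fixed-effect pooled estimate T_P
Q = np.sum(w * (T - T_pooled) ** 2)             # Cochran's Q statistic
df = len(T) - 1                                 # K - 1 degrees of freedom
p_value = chi2.sf(Q, df)                        # Q ~ chi-square(K - 1) under H0
I2 = max(0.0, (Q - df) / Q) * 100               # percent of variation beyond chance

print(f"T_P = {T_pooled:.3f}, Q = {Q:.2f}, df = {df}, "
      f"p = {p_value:.3f}, I2 = {I2:.1f}%")
```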

Research Synthesis, Morton, 10/13, 35 Assessment of Heterogeneity – Proceed with Meta-analysis, or Not?
First, check your data! Options:
• Do not do meta-analysis
• Change treatment effect measure
• Explore heterogeneity via stratification or meta-regression
• Account for heterogeneity via random effects model (not advisable if heterogeneity large)
• Exclude studies (can do “leave-one-out” or jackknife test to determine individual study effect on heterogeneity; see the sketch below)
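Here is a minimal sketch of the “leave-one-out” option above, again using hypothetical effect estimates and within-study variances: Q and I-squared are recomputed with each study removed, so a study whose removal sharply reduces I-squared stands out as a major source of heterogeneity.

```python
import numpy as np

# Hypothetical study effect estimates and within-study variances
# (same illustrative style as the Q / I^2 sketch above).
T = np.array([0.20, 0.35, -0.05, 0.48, 0.26])
var = np.array([0.04, 0.02, 0.08, 0.03, 0.05])

def q_and_i2(effects, variances):
    """Cochran's Q and I^2 (%) for a set of study effects."""
    w = 1.0 / variances
    pooled = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - pooled) ** 2)
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

q_all, i2_all = q_and_i2(T, var)
print(f"all studies: Q = {q_all:.2f}, I^2 = {i2_all:.1f}%")

# Leave each study out in turn to gauge its influence on heterogeneity.
for i in range(len(T)):
    keep = np.arange(len(T)) != i
    q, i2 = q_and_i2(T[keep], var[keep])
    print(f"without study {i + 1}: Q = {q:.2f}, I^2 = {i2:.1f}%")
```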

Research Synthesis, Morton, 10/13, 36 Example Strength of Evidence Table

Research Synthesis, Morton, 10/13, 37 Example SOE Table, continued

Research Synthesis, Morton, 10/13, 38