
EUROPEAN UROLOGY 71 (2017) 811–819 available at www.sciencedirect.com journal homepage: www.europeanurology.com

Platinum Priority – Guidelines
Editorial by James W.F. Catto, Jane M. Blazeby, Lars Holmberg, Freddie C. Hamdy and Julia Brown on pp. 820–821 of this issue

Conflict of Evidence: Resolving Discrepancies When Findings from Randomized Controlled Trials and Meta-analyses Disagree

Richard J. Sylvester a,*, Steven E. Canfield b, Thomas B.L. Lam c, Lorenzo Marconi d, Steven MacLennan c, Yuhong Yuan e, Graeme MacLennan f, John Norrie f, Muhammad Imran Omar c, Harman M. Bruins g, Virginia Hernández h, Karin Plass i, Hendrik Van Poppel j, James N’Dow c

a EAU Guidelines Office, Brussels, Belgium; b Division of Urology, University of Texas McGovern Medical School, Houston, TX, USA; c Academic Urology Unit, University of Aberdeen, Aberdeen, UK; d Department of Urology, Coimbra University Hospital, Coimbra, Portugal; e Department of , McMaster University, Hamilton, ON, Canada; f Health Services Unit, University of Aberdeen, Aberdeen, UK; g Department of Urology, Radboud University Medical Center, Nijmegen, The Netherlands; h Department of Urology, Hospital Universitario Fundacion Alcorcon, Madrid, Spain; i EAU Central Office, Guidelines Office, Arnhem, The Netherlands; j Department of Urology, University Hospital Gasthuisberg, Katholieke Universiteit Leuven, Leuven, Belgium

Article info

Article history: Accepted November 16, 2016
Associate Editor: James Catto
Keywords: Conflict of evidence; Meta-analyses; Randomized controlled trials; Systematic reviews; Treatment guidelines

Abstract

Context: Clinicians and treatment guideline developers are faced with a dilemma when the results of a new, large, well-conducted randomized controlled trial (RCT) are in direct conflict with the results of a previous systematic review (SR) and meta-analysis (MA).

Objective: To explore and discuss possible reasons for disagreement in results from SRs/MAs and RCTs and to provide guidance to clinicians and guideline developers for making well-informed treatment decisions and recommendations in the face of conflicting data.

Evidence acquisition: The advantages and limitations of RCTs and SRs/MAs are reviewed. Two practical examples that have a direct bearing on European Association of Urology guidelines on treatment recommendations are discussed in detail to illustrate the points to be considered when conflicts exist between the results of large RCTs and SRs/MAs.

Evidence synthesis: RCTs are the gold standard for providing evidence of the effectiveness of interventions. However, concerns regarding the internal and external validity of an RCT may limit its applicability to clinical practice. SRs/MAs synthesize all evidence related to a given research question, but two urologic examples show that the validity of the results depends on the quality of the individual studies, the clinical and methodological heterogeneity of the studies, and publication bias.

Conclusions: Although SRs/MAs can provide a higher level of evidence than RCTs, the quality of the evidence from both RCTs and SRs/MAs should be investigated when their results conflict to determine which source provides the better evidence. Guideline developers should have a well-defined and robust process to assess the evidence from MAs and RCTs when such conflicts exist.

Patient summary: We discuss the advantages and limitations of using data from randomized controlled trials and systematic reviews/meta-analyses in informing clinical practice when there are conflicting results. We provide guidance on how such conflicts should be dealt with by guideline organizations.

© 2016 European Association of Urology. Published by Elsevier B.V. All rights reserved.

* Corresponding author. EAU Guidelines Office, 500 avenue Moliere, Brussels 1050, Belgium. Tel. +322 346 4916.
E-mail address: [email protected] (R.J. Sylvester).
http://dx.doi.org/10.1016/j.eururo.2016.11.023
0302-2838/© 2016 European Association of Urology. Published by Elsevier B.V. All rights reserved.

1. Introduction

The practice of evidence-based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research [1].

Treatment recommendations in European Association of Urology (EAU) guidelines are underpinned, whenever possible, by the results of systematic reviews (SRs)/meta-analyses (MAs) and large randomized controlled trials (RCTs). According to the 2009 Oxford Centre for Evidence-based Medicine, SRs of RCTs (with or without a meta-analysis) that are free of worrisome variations (heterogeneity) in results between individual studies provide the highest level of evidence (LE), 1a, whereas individual RCTs with a narrow confidence interval provide the next highest LE, 1b [2]. As SRs can provide a higher LE than RCTs, the results of SRs are generally considered to take precedence when developing treatment recommendations.

The quality of the results of an SR/MA depends on the quality of the studies included. Kjaergard et al [3] found a correlation between methodologic quality and discrepancies in the results of large and small RCTs included in MAs. Intervention effects were exaggerated in small trials with inadequate allocation sequence generation, inadequate allocation concealment, and no double blinding.

Discrepancies have also been noted between large RCTs and previously published MAs on the same subject [4–6]. For 12 large RCTs carried out subsequent to 19 MAs addressing the same question, LeLorier et al [7] found that the results of subsequent RCTs disagreed with those of earlier MAs 35% of the time.

To illustrate these points and provide guidance to guideline developers in dealing with conflicting data from different sources, two examples that have a direct bearing on EAU Guidelines treatment recommendations are presented.

In the first example, the EAU Guidelines Office has recently been confronted with the results of a large RCT that found no beneficial effect of medical expulsive therapy (MET) on stone passage, contrary to results of previous meta-analyses that formed the basis for treatment recommendations [8]. In the second example, which compares the efficacy of partial and radical nephrectomy for localized renal tumors, discordance between the results of the meta-analysis and the only available RCT is investigated [9,10].

2. Advantages and limitations of RCTs

As summarized in Table 1, RCTs have a number of advantages and limitations.

2.1. Advantages of RCTs

RCTs are the gold standard for providing evidence on the effectiveness of interventions [11,12]. Randomization balances, on average, the distribution of both known and unknown prognostic factors at baseline in the intervention groups, thereby minimizing selection bias when assigning patients to treatments. Although adjusting for baseline covariates used in the randomization process can improve statistical power, complex adjustment procedures such as propensity score weighting are not usually required when comparing outcomes.

Patients are selected, treated, followed, and assessed according to a common protocol testing a specific hypothesis. Blinding of participants and physicians to the allocated intervention may be possible to minimize performance bias, and is especially important when assessing outcomes [13]. Quality control measures and external review of key parameters maximize study quality.

2.2. Limitations of RCTs

RCTs can be challenging to design (randomization and blinding), conduct (poor recruitment, loss to follow-up), analyze (missing data), and report (patient exclusions).

RCTs require an adequate sample size and follow-up to have sufficient power to detect clinically relevant differences between interventions [14]. In practice, many clinical trials do not meet their prespecified power requirements, so a conclusion of "no significant difference" in outcome should not be interpreted as meaning that two or more treatments are equivalent in effect. Sample size estimation requires data about expected differences and variability of the primary outcome. Often these data are unknown or only available from observational studies prone to bias.

Although analyses using the intention-to-treat principle can provide an unbiased estimate of the treatment effect, this assumes that there are no differences in follow-up or missing outcome data that may bias the treatment comparison [15]. In some RCTs, not all participants receive their randomized intervention; they may, for example, cross over to the other randomized treatment, in which case a per-protocol analysis may also provide useful information. Various analysis strategies exist, depending on whether the objective is to estimate treatment efficacy (the intervention effect under perfect conditions, in which case intent to treat can dilute the size of the treatment effect) or effectiveness (the real-world intervention effect with "imperfect" compliance).
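The points about sample size and intent-to-treat dilution in Section 2.2 can be made concrete with a small calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions with equal allocation; the event rates (50% vs 60%), the 80% adherence figure, and the function names are illustrative assumptions, not values from any trial discussed here.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_treated, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided test of two proportions
    (normal approximation, equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_treated - p_control) ** 2)


def diluted_rate(p_control, p_treated, adherence):
    """Event rate observed in the treated arm under intent to treat,
    assuming non-adherent patients respond like controls (an assumption)."""
    return adherence * p_treated + (1 - adherence) * p_control


# Hypothetical scenario: 50% success on control, 60% on treatment.
n_full = n_per_arm(0.50, 0.60)
# Same trial, but only 80% of the treated arm receives the intervention.
n_itt = n_per_arm(0.50, diluted_rate(0.50, 0.60, adherence=0.80))

print(f"per arm, full adherence: {n_full}")
print(f"per arm, 80% adherence:  {n_itt}")
```

Under these assumptions the required sample size rises sharply once the observed difference is diluted, which is one way a trial designed around the efficacy estimate can end up underpowered for the effectiveness question.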

Table 1 – Advantages and limitations of randomized controlled trials (RCTs)

Advantages
Randomization minimizes the influence of both known and unknown prognostic variables on treatment outcome
RCTs can demonstrate causality
Patients are treated according to a common protocol
Quality control of treatment and outcome assessment
RCTs provide the strongest empirical evidence of treatment efficacy

Limitations
It may be difficult to recruit and follow up patients
Ethical considerations may make randomization difficult
Required study power might not be met
Generalizability may be low
RCTs are expensive and resource-intensive

Table 2 – Advantages and limitations of systematic reviews and meta-analyses

Advantages
Focused, well-defined clinical question with a clear objective and explicit, predefined study eligibility criteria
Comprehensive literature search strategy to guarantee the identification of all potentially eligible studies
Critical appraisal of all the included studies that is used to guide the analysis and conclusions
Increases the power to detect differences between interventions
Increases the precision of the estimate of the treatment effect
Allows the comparison of treatment effects across different studies or subgroups of patients, interventions and outcomes

Limitations
Depends on the quality of the studies included
Susceptible to the effects of heterogeneity of the studies included
  • Clinical heterogeneity
      • Participants (eg, age, gender, severity and subtype, study eligibility criteria)
      • Interventions (eg, drug doses, duration/intensity of treatment, delivery, co-interventions, surgeon experience)
      • Outcomes (eg, definition of outcome, outcomes reported, timing and method of measurements, follow-up duration, cutoff points)
  • Methodological heterogeneity (eg, different study designs, across studies)
  • Statistical heterogeneity
Publication bias
Time- and resource-consuming
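The entries in Table 2 on power, precision, and statistical heterogeneity can be illustrated with a minimal pooling sketch. It assumes inverse-variance weighting on the log risk ratio scale, Cochran's Q, the I² statistic, and the DerSimonian-Laird estimate of between-study variance; these are common choices, not necessarily the methods used in the reviews discussed in this paper. The four studies are invented: three small trials with large effects and one large trial near the null.

```python
from math import exp, log

# Hypothetical risk ratios and standard errors of log(RR).
risk_ratios = [1.90, 1.70, 1.60, 1.05]
std_errors = [0.30, 0.25, 0.28, 0.05]


def pool(log_effects, std_errors):
    """Fixed- and random-effects pooling for two or more studies."""
    weights = [1 / se ** 2 for se in std_errors]
    total = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_effects)) / total

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # I²: share of total variation attributed to heterogeneity, in percent.
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - df) / c)

    re_weights = [1 / (se ** 2 + tau2) for se in std_errors]
    random = sum(w * y for w, y in zip(re_weights, log_effects)) / sum(re_weights)
    return {"fixed_rr": exp(fixed), "random_rr": exp(random),
            "q": q, "i2": i2, "tau2": tau2}


result = pool([log(rr) for rr in risk_ratios], std_errors)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With this invented data set I² exceeds 40%, and the random-effects estimate sits further from the null than the fixed-effect estimate because the small, large-effect trials receive relatively more weight once between-study variance is added. That behaviour is one reason a random-effects model does not by itself resolve small-study effects.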

An RCT with double blinding, few missing data, and good compliance will have high internal validity, but if an RCT recruits only a very select population, the external validity (generalizability) may be low. This can happen because of overly restrictive inclusion/exclusion criteria or inclusion of only expert clinicians in select sites [16]. Single-center RCTs typically have lower external validity compared to multicenter RCTs that allow comparison of results between centers.

Finally, robust, adequately powered RCTs with long-term follow-up are difficult to organize, expensive, and resource-intensive. Thus, many RCTs focus on short-term or surrogate outcomes, the clinical significance of which is often uncertain. Any short-term benefits might not be maintained over the longer time horizons that are more relevant to patients, clinicians, and policy-makers [17].

3. Advantages and limitations of SRs and MAs

Table 2 outlines the advantages and limitations of SR/MAs.

3.1. Advantages of SR/MAs

An SR is a literature review focused on a research question that tries to identify, appraise, select, and synthesize all research evidence relevant to that question.

SRs are a priori defined in a participant, intervention, comparator, outcome (PICO)–based protocol outlining the study inclusion criteria. They are the only transparent and replicable form of literature review that provide a rigorous and critical qualitative appraisal of the evidence related to an intervention. SRs explore the findings of individual studies, draw attention to their differences, and identify sources of bias [18].

An MA is a statistical technique for quantitatively combining data from two or more separate RCTs asking the same or a similar question [19]. They should only be performed as part of an SR, otherwise the MA is a combined analysis that is susceptible to study selection bias. Two different types of MAs exist: literature-based or aggregate data MAs, and individual patient data (IPD) MAs [20,21].

MAs provide an overall estimate of the size of the treatment effect, giving due weight to the size of the individual RCTs. They are useful when individual studies are underpowered or yield inconclusive or conflicting results, or when an overall, more precise estimate of the size of the treatment effect is required. MAs increase the power to detect moderate but clinically meaningful differences in treatment outcome and assess if the treatment effect is similar across different studies or types of patients [22]. They are useful in exploring the effects of an intervention in subgroups of patients, especially in IPD MAs [20,21].

SRs and MAs are vital for guideline developers, healthcare providers, patients, researchers, and policy-makers in order to guide clinical practice, research, and healthcare policies [23].

3.2. Limitations of SR/MAs

The validity of an MA depends on the quality of the SR on which it is based. SRs and MAs have a number of potential limitations, including poor quality of the studies included, heterogeneity, and publication bias.

The literature summary provided in an SR and the results of an MA are only as reliable as the quality of the studies included. Although IPD MAs and multicenter RCTs can be analyzed using the same statistical techniques for clustered data, where the clusters are studies and centers, respectively, there may be important clinical and methodological heterogeneity between the studies in an MA since they are not carried out based on a common protocol. The studies may be heterogeneous regarding patients included, the intervention, or the assessment of treatment outcome. Although heterogeneity in treatment effect can be better investigated in IPD MAs, the primary studies should be similar enough to be combined, otherwise genuine differences in effects may be obscured [24,25]. Since institutions participating in a multicenter study are supposed to treat, follow up, and assess patients according to a common protocol, there is potentially a greater degree of standardization and higher quality data in multicenter clinical trials compared to studies included in MAs.

If bias is present in the individual studies included in an MA, MAs will compound these errors and produce a biased result. The risk of bias (RoB) for the outcomes in each study should be systematically assessed and sensitivity analyses performed to examine the effect of RoB on the conclusions. Observational and nonrandomized comparative studies in SRs of interventions should not be included in MAs because the MA may provide very precise but spurious results because of confounding and patient selection bias.

Only a nonrandom proportion of research projects ultimately reach publication in an indexed journal and become readily identifiable for SRs. Statistically significant "positive" results favoring an intervention are more likely to be published, published more quickly, and published in higher impact journals, leading to publication bias [26]. When these trials are pooled together in an MA, this may lead to exaggeration of the treatment effect. Begg and Egger tests, along with funnel graphs and plots, can be applied for detection of publication bias, but these have limited power in small MAs, such as those including fewer than ten studies [27]. To minimize publication bias, authors should not only perform a comprehensive systematic literature search looking for published trials in various electronic databases, but should also search trial registries for unpublished studies and conference abstracts or proceedings [18].

4. When the results of an RCT are in conflict with the results of an SR/MA

It is not uncommon for the results of a large RCT to appear to be inconsistent with evidence from SRs/MAs. The most extreme case is when an intervention thought to be beneficial is demonstrated to be harmful in a large RCT [9,10]. More commonly, an RCT may show that a treatment is ineffective, or less effective than found in a previous MA, or perhaps only effective in a subpopulation of patients. Assuming the conflicting RCT was of high quality, a number of issues should be explored to try to explain the discrepancies.

4.1. Quality of the SR

The starting point is the methodological quality of the SR. Assessment of Multiple Systematic Reviews (AMSTAR) and Documentation and Appraisal Review Tool (DART) checklists [28–30] allow readers to judge a review’s quality by focusing on the essential components of a well-conducted SR. Items include the comprehensiveness of the search strategy, a description of the characteristics of studies included and an assessment of their scientific quality. A poor quality SR/MA may produce biased results that conflict with a large RCT.

4.2. Small study effects and publication bias

Small study effects and publication bias can individually and jointly produce results in an SR/MA that conflict with a large RCT. Studies have shown that small RCTs can exaggerate intervention effects owing to shortcomings in methodological rigor, which may then introduce bias [3]. Small studies that find statistically significant (but unrealistically large) treatment effects are more likely to be published than negative studies, and then included in an SR and MA, leading to publication bias. Both of these phenomena can be investigated using funnel plots [31].

4.3. Heterogeneity

Heterogeneity within an SR/MA can arise from many sources, including the population recruited (age, sex, disease severity, etc), the intervention(s) and control treatments, and the definition and timing of outcome measurements. If studies included in an SR/MA differ substantially from a subsequent large RCT, then judgment is required on whether similar findings should be expected.

Another source of heterogeneity is differences in the methodological quality of the studies included. Deficiencies in the generation and concealment of the allocation sequence, adherence to treatment, handling of missing data, and outcome assessment can all introduce bias in the outcomes reported in the studies included [18]. Bias may then be propagated in MAs through the pooling of biased study effects, thus contributing to different estimates of effectiveness between an SR/MA and subsequent large RCTs. Nevertheless, since an MA is generally seen to have a higher LE than a single RCT, the results of a poor-quality MA may have more impact than a well-conducted RCT.

Heterogeneity should be assessed using both clinical knowledge and statistical methods. If substantial heterogeneity from any source is suspected, random effects models are recommended; however, the pooling of data and estimation of an overall treatment effect may be inappropriate with any statistical model in the presence of heterogeneity. Meta-regression is a useful tool for exploring the relationship between RCT effect sizes and characteristics at a study level [32]; however, IPD MAs are required for assessment at a patient level [21,33]. Appropriate statistical modeling may show that after correcting for sources of bias and heterogeneity, discrepancies between SRs/MAs and definitive RCTs are reduced. Whatever the approach, interpretation of results is less straightforward when heterogeneity is present.

To provide guidance to clinicians and guideline developers when there is a conflict in results between a large RCT and an SR/MA, a practical checklist of points to consider is provided in Table 3.

5. Examples of discrepancies between findings from MAs and large RCTs

5.1. Medical expulsive therapy

Five SRs and MAs on the management of uncomplicated symptomatic ureteric stones using MET were published in the past 10 yr [34–38]. All five suggested that alpha blockers and nifedipine were more effective in increasing spontaneous passage of ureteric stones compared to control (risk ratios ranging from 1.45 to 1.59). The reviews identified numerous sources of potential bias that limited the strength

Table 3 – Checklist of points to consider when the findings from a systematic review and meta-analysis differ from those from a large randomized controlled trial

Each entry lists the criterion to consider, the questions to ask, and the rationale.

Selection bias
Questions to ask: Were the sequence generation and allocation concealment adequate in both the studies included in the SR/MA and the subsequent RCT?
Rationale: If the sequence generation was not truly random or the allocation was not effectively concealed, this can lead to exaggerated estimates in individual studies, and these may be amplified in MAs.

Confounding bias
Questions to ask: Were the groups balanced for known prognostic factors at baseline and were any imbalances controlled for in the analysis?
Rationale: Imbalances in known and unknown prognostic factors are possible even in well-designed RCTs. Baseline imbalances may explain differences in estimates of effect if not controlled for in the analysis.

Performance and detection bias
Questions to ask: Where possible, in all the studies included in the SR/MA and for the new RCT, was there blinding of study participants, clinicians administering the treatment, ancillary care-givers, and outcomes assessors? When blinding is not possible, could knowledge of the treatment received affect interpretation of any of the outcomes?
Rationale: Some objective outcomes are unlikely to be affected by knowledge of the intervention arm, but failure to blind (particularly for subjective outcomes) may lead to an exaggeration of effect sizes in individual studies, and these may be amplified in MAs.

Attrition bias
Questions to ask: Were all dropouts documented and unlikely to be related to the treatment outcome in the studies included in the SR/MA and in the new RCT?
Rationale: If dropout rates differ between the treatment arms, then the reasons may be related to the outcome of interest and may hide important outcome effects.

Reporting bias
Questions to ask: Were all outcomes that were stated in the methods and/or protocol for all the studies included in the SR/MA and in the new RCT documented in the trial report? Were all the outcomes measured appropriately (as defined in the protocol) or were deviations reasonably explained?
Rationale: Selective reporting of outcomes, or selective methods of reporting, may lead to exaggerated estimates of effect.

Publication bias
Questions to ask: Were funnel plots used to investigate publication bias in the SR/MA? Is the funnel plot symmetric or is there reason to believe there is a systematic difference between published and unpublished studies? Note: This is difficult to assess when there are <10 RCTs contributing to an MA.
Rationale: Asymmetric funnel plots raise suspicion that there are systematic differences between published and unpublished studies and that some positive or negative trials may be unpublished. This may lead to exaggerated effect sizes in an MA.

Consistency and heterogeneity of outcome
Questions to ask: Did the studies included in the SR/MA have overlapping 95% CIs for the outcome? Was variation more than would be expected by chance alone? Was the I² statistic <40%? (Cochrane/GRADE rule of thumb) Were subgroups used to explain any observed heterogeneity? Were event rates in the control group similar in the different studies? Note: Subgroups of the population, the intervention/control types, or the outcome measurement may explain heterogeneity.
Rationale: If it can be shown that the outcomes are more effective in certain subgroups, or with variations of an intervention (eg, a higher dose), then this explained heterogeneity may indicate a key difference that may justify the results in the new RCT. Where unexplained heterogeneity exists, then the estimate of effect is likely to be uncertain, even if precise.

Directness
Questions to ask: Do the studies included in the SR/MA and the new RCT both directly assess the research question about the population, interventions, and outcomes?
Rationale: Indirect populations, interventions, surrogate outcome measures, or indirect comparisons may conceal or exaggerate important differences within and between studies, and may impact the estimate of effect.

Precision
Questions to ask: Were the sample sizes for the studies included in the SR/MA and the new RCT powered to address the outcomes of interest? Does the 95% CI in the MA include clinically judged appreciable benefit and harm?
Rationale: If any of the trials in the SR/MA or the new RCT were not powered to detect a clinically meaningful difference in the effect estimate, this may reduce confidence in the estimate of effect. If the lower and upper 95% CI thresholds indicate that the intervention may be beneficial at one end, but harmful at the other, this will probably reduce confidence in the estimate of effect.

Sensitivity analyses
Questions to ask: When some studies included in an SR/MA are judged to be at high RoB and others at low RoB, or extreme variations in the populations or interventions in the studies are apparent, did the authors conduct a sensitivity analysis to ascertain the estimates of effect for only studies judged to be at low RoB?
Rationale: Sensitivity analyses are different from subgroup analyses. Some studies are actively omitted as we are only interested in the results when the biased or "different" studies are omitted.

SR = systematic review; MA = meta-analysis; RCT = randomized controlled trial; CI = confidence interval; RoB = risk of bias.

of the evidence, and the authors concluded that there was an urgent need to conduct a large, robust, multicenter RCT to address these shortcomings. Pickard et al [8] published the results of such an RCT in 1167 patients and found no evidence that either tamsulosin or nifedipine increased the rate of spontaneous stone passage compared to placebo. The results were consistent across subgroup and sensitivity analyses.

We compare the RCT by Pickard et al [8] to the MA with the most studies, by Seitz et al [36], to explore and discuss discordant findings. Most RCTs included in the Seitz MA were small and recruited from a single center; only six of 35 (17%) recruited more than 100 patients. The majority had low internal validity and only one RCT reported allocation concealment. As small RCTs may report larger effect sizes compared to larger RCTs, an MA of small RCTs can lead to biased estimates of treatment effects [39]. Seitz et al also found evidence of publication bias, which can lead to overestimation of treatment effects and compromise the validity of the MA findings [40].

Seitz et al [36] found evidence of clinical heterogeneity concerning the patient inclusion criteria, stone characteristics, intervention, treatment in the control group, and outcome measurement. In the MA, the primary outcome of being stone-free was inconsistently defined, assessed using different imaging modalities, and measured at a variety of time points. In the RCT of Pickard et al [8], the primary outcome was any need for further intervention within 4 wk of randomization, which is compared here to being stone-free. In the control group of the Pickard RCT, 80% of patients were stone-free, whereas in the Seitz review, stone-free rates ranged from 4% to 78%, which highlights the potential impact of heterogeneity in the studies included.

With contrasting primary outcomes and different baseline event rates in the control groups, it is not surprising that the RCT and MA reported discordant findings. The choice of primary outcome is clearly of paramount importance in any trial. Heterogeneity in the conduct, design, and reporting of trials in this MA makes pooled treatment effects difficult, if not impossible, to interpret.

5.2. Partial versus radical nephrectomy

In a European Organization for Research and Treatment of Cancer (EORTC) RCT involving 541 patients with a solitary T1–2N0M0 renal tumor of ≤5 cm, 21 patients progressed, nine after radical nephrectomy (RN) and 12 after partial nephrectomy (PN). An intent-to-treat analysis found an overall survival (OS) advantage in favor of RN (hazard ratio [HR] 1.5, p = 0.03); however, only 12 of the 117 deaths were due to kidney cancer, four after RN and eight after PN [10].

Subsequently, Kim et al [9] published an SR and MA including some 41 000 patients and found statistically significant improvements in both OS (HR 0.81, p < 0.001) and disease-specific survival (HR 0.71, p < 0.001), but this time in favor of PN [9]. How can this discordance be explained?

The Kim MA has a number of limitations. First, the 38 trials included were mostly retrospective, single-center studies. The only RCT was the EORTC study. No information was provided about the distribution of follow-up or patient characteristics by treatment group (T category when >T1, tumor size, tumor grade, cell type, or renal function). Consequently, the differences in survival observed may not be directly due to differences in treatment efficacy. In addition, it is not clear to which patients the results can be generalized. Lastly, there was significant heterogeneity in the size of the treatment effect across the studies, so the overall estimate of the HR is not meaningful. Nevertheless, the EORTC RCT also had limitations and should be interpreted with caution: 55 patients crossed over to the other randomized treatment, 140 patients were clinically or pathologically ineligible, and there were few cancer-related events.

The MA found that PN was associated with a lower risk of severe chronic kidney disease (CKD); however, the EORTC study only found a lower incidence of at least moderate renal dysfunction, not of advanced kidney disease or renal failure, and this was not associated with a corresponding difference in survival [41]. The studies in the MA did not always specify the status of the contralateral kidney, whereas in the EORTC study the contralateral kidney had to be normal.

Critical information regarding the biases of the studies included in the SR was not made explicit, since a Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) approach [42] was not used to assess the quality of the evidence. The quality of the studies in the SR and the heterogeneity of the results call into question the validity of the conclusions of the MA, which should thus be viewed with skepticism. In the same year, another SR suggested that localized renal cell carcinomas are best managed by PN where technically feasible. However, the evidence base had significant limitations owing to studies of low methodological quality and high RoB [43].

Further nonrandomized studies have found improved survival after PN [44,45] and a reduction in the risk of cardiovascular events [46] relative to RN; however, patients chosen for PN had a higher baseline likelihood of long-term survival [47,48]. In another study, only patients with stage II CKD had a lower risk of developing significant renal impairment after PN [49]. More recently, an SR and MA of 21 nonrandomized comparative studies in patients with clinical T1b and T2 renal tumors found better tumor control and survival for PN compared to RN [50], but the SR is subject to the same biases as the Kim MA.

Taking into account all the available efficacy data and a perceived advantage in renal function, the 2016 EAU Guidelines recommend, with several exceptions, that localized renal cancers are better managed by PN than RN.

6. Discussion

It is generally accepted that a high-quality SR of RCTs and an associated MA can provide a higher LE than a single RCT addressing the same question [2]. This can be problematic, however, when the results of the MA are in direct conflict with the RCT, making it difficult for guideline organizations to interpret the evidence and issue recommendations.

Guideline groups should follow well-defined methodological rules to assess the studies in these situations. RCTs should be appraised for their internal and external validity using established tools [51]. The conflicting SR/MA should be appraised in the same fashion to determine the methodological quality of the review, the quality of the studies included, inconsistency within the studies, unexplained heterogeneity, and the likelihood of publication bias using tools such as AMSTAR [28,29] and DART [30]. In some cases, the discrepancy may be due to errors in the MA in applying study eligibility criteria or even in data extraction [52], so there is a need for an SR/MA protocol and strict quality control.

When MAs include many small underpowered studies, especially when combined with the likely presence of publication bias, there is immediate concern that the estimated effect size is inflated or completely erroneous. In addition, when a great degree of heterogeneity exists in the MA that cannot be easily accounted for, the results may be highly unreliable. In this regard, IPD MAs provide a better platform for assessing and explaining heterogeneity than aggregate data MAs do.

Two examples were discussed in this manuscript to illustrate the assessment process. In the case of MET for ureteric stones, a large, high-quality RCT [8] contradicted many well-established MAs that pointed to a benefit of this therapy. Analysis of a representative MA [36] revealed the inclusion of many small RCTs, poor internal validity, significant study heterogeneity, and likely publication bias. When such MA concerns are present, a single high-quality RCT may be considered as having the higher LE. For guideline organizations, this process can be used to justify a change in recommendations based on methodologically sound principles.

Radical versus partial nephrectomy provides a more complex example. The MA [9] included only a single RCT, which was the study in conflict with its own results. The other studies included were all retrospective, which in general provide a lower LE. Risk of bias was poorly assessed, and significant study heterogeneity was present. It is important to reiterate that combining observational studies in general, and even comparative nonrandomized studies with RCTs in an intervention MA, may produce unreliable results and is not considered valid. In light of all this, the single RCT [10] in this circumstance might provide more guidance than the MA if it was of sufficiently high quality. However, this RCT also had some concerns, so the comparison is not so simple.

Instead of automatically assigning a higher LE to SR/MAs that conflict with RCTs, these examples have shown that the quality of the evidence and the RoB of studies included in SRs/MAs should be assessed to determine which source provides the better evidence.

Although non-RCTs can be included in SRs, we have emphasized that only RCTs should be included in intervention MAs. RCTs are not required for MAs of prognostic factors and the accuracy of diagnostic tests; however, the studies included in these MAs should preferably be prospective in nature and based on a protocol to minimize RoB.

Despite the availability of MAs and RCTs, and in cases where high LE does not exist, we may still not know what the best treatment is. The GRADE system, which takes into account the quality of evidence (high, moderate, low, very low) for critical outcomes, provides strengths of recommendations (strong, weak) for or against a treatment to aid clinicians in their practice when consensus is not possible [42,53]. A decision curve approach, which takes into account a patient’s values and preferences, may also be used to help choose between the different treatment options.

7. Conclusions

New or existing RCT data can lead to conflicts with MA data. In this paper, we present examples of and explore reasons for such conflicts. Guidance is provided to guideline developers on how to interpret conflicting data in such circumstances to help assess which source is more reliable. For guideline organizations both within and outside urology, having a well-defined and robust process to deal with such conflicts is essential to improve guideline quality.

Author contributions: Richard J. Sylvester had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Sylvester, N’Dow.
Acquisition of data: Sylvester, Lam, Marconi, S. MacLennan, Yuan, Van Poppel, N’Dow.
Analysis and interpretation of data: Sylvester, Canfield, Lam, Marconi, S. MacLennan, Yuan, G. MacLennan, Norrie, Omar, Bruins, Hernandez, Plass, Van Poppel, N’Dow.
Drafting of the manuscript: Sylvester, Canfield, Lam, Marconi, S. MacLennan, Yuan, G. MacLennan, Norrie, Omar, Bruins, Hernandez, Plass, Van Poppel, N’Dow.
Critical revision of the manuscript for important intellectual content: Sylvester, Canfield, Lam, Marconi, S. MacLennan, Yuan, G. MacLennan, Norrie, Omar, Bruins, Hernandez, Plass, Van Poppel, N’Dow.
Statistical analysis: None.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: Sylvester, N’Dow.
Other: None.

Financial disclosures: Richard J. Sylvester certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

References

[1] Sackett DL, Rosenberg WMC, Gray JAM, Haynes RB, Richardson WS. Evidence based medicine: what it is and what it isn’t. Br Med J 1996;312:71–2.
[2] Oxford Centre for Evidence-based Medicine. Levels of evidence. Oxford, UK: CEBM; 2009. www.cebm.net/oxford-centre-evidence-based-medicine-levels-evidence-march-2009/.
[3] Kjaergard LL, Villumsen J, Gluud C. Reported methodologic quality and discrepancies between large and small randomized trials in meta-analyses. Ann Intern Med 2001;135:982–9.

[4] Borzak S, Ridker PM. Discordance between meta-analyses and large-scale randomized, controlled trials. Ann Intern Med 1995;123:873–7.
[5] Flather MD, Farkouh ME, Pogue JM, Yusuf S. Strengths and limitations of meta-analysis: larger studies may be more reliable. Controlled Clinical Trials 1997;18:568–79.
[6] DerSimonian R, Levine RJ. Resolving discrepancies between a meta-analysis and a subsequent large controlled trial. JAMA 1999;282:644–70.
[7] LeLorier J, Gregoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 1997;337:536–42.
[8] Pickard R, Starr K, MacLennan G, et al. Medical expulsive therapy in adults with ureteric colic: a multicentre, randomised, placebo-controlled trial. Lancet 2015;386:341–9.
[9] Kim SP, Thompson RH, Boorjian SA, et al. Comparative effectiveness for survival and renal function of partial and radical nephrectomy for localized renal tumors: a systematic review and meta-analysis. J Urol 2012;188:51–7.
[10] Van Poppel H, Da Pozzo L, Albrecht W, et al. Prospective, randomised EORTC intergroup phase 3 study comparing the oncologic outcome of elective nephron-sparing surgery and radical nephrectomy for low-stage renal cell carcinoma. Eur Urol 2011;59:543–52.
[11] Byar DP, Simon RM, Friedewald WT, et al. Randomized clinical trials: perspectives on some recent ideas. N Engl J Med 1976;295:74–80.
[12] Sibbald B, Roland M. Understanding controlled trials: why are randomized controlled trials important? Br Med J 1998;316:201.
[13] Hansson L, Hedner T, Dahlöf B. Prospective Randomized Open Blinded End-point (PROBE) study. A novel design for intervention trials. Blood Press 1992;1:113–9.
[14] Julious SA. Sample sizes for clinical trials. Boca Raton, FL: Chapman and Hall/CRC; 2009.
[15] Armijo-Olivo S, Warren S, Magee D. Intention to treat analysis, compliance, drop-outs and how to deal with missing data in clinical research: a review. Phys Ther Rev 2009;14:36–49.
[16] Rothwell PM. External validity of randomised controlled trials: "To whom do the results of this trial apply?" Lancet 2005;365:82–93.
[17] Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med 1996;125:605–13.
[18] Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions, version 5.1.0. The Cochrane Collaboration; 2011. http://handbook.cochrane.org.
[19] Egger M, Smith GD, Phillips AN. Meta-analysis: principles and procedures. Br Med J 1997;315:1533–7.
[20] Riley RD, Lambert PC, Abo-Zaid G. Meta-analysis of individual participant data: rationale, conduct, and reporting. Br Med J 2010;340:c221.
[21] Tudur Smith C, Marcucci M, Nolan SJ, et al. Individual participant data meta-analyses compared with meta-analyses based on aggregate data. Cochrane Database Syst Rev 2016;9:MR000007. http://dx.doi.org/10.1002/14651858.MR000007.pub3.
[22] Mulrow CD. Rationale for systematic reviews. Br Med J 1994;309:597–9.
[23] Graham R, Mancher M, Miller Wolman D, et al., editors. Clinical practice guidelines we can trust. Washington, DC: National Academies Press; 2011.
[24] Higgins J, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539–58.
[25] Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. Br Med J 2003;327:557–60.
[26] Thornton A, Lee P. Publication bias in meta-analysis: its causes and consequences. J Clin Epidemiol 2000;53:207–16.
[27] Egger M, Davey Smith G, Schneider M, Minder C. Bias in meta-analysis detected by a simple graphical test. Br Med J 1997;315:629–34.
[28] Shea BJ, Grimshaw JM, Wells GA. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol 2007;7:10.
[29] Shea BJ, Hamel C, Wells GA, et al. AMSTAR is a reliable and valid measurement tool to assess the methodological quality of systematic reviews. J Clin Epidemiol 2009;62:1013–20.
[30] Diekemper RL, Ireland BK, Merz LR. Development of the Documentation and Appraisal Review Tool for systematic reviews. World J Meta Anal 2015;3:142–50.
[31] Sterne JA, Egger M, Smith GD. Investigating and dealing with publication and other biases in meta-analysis. Br Med J 2001;323:101–5.
[32] Thompson SG, Higgins JP. How should meta-regression analyses be undertaken and interpreted? Stat Med 2002;21:1559–73.
[33] Thompson SG, Higgins JPT. Can meta-analysis help target interventions at individuals most likely to benefit? Lancet 2005;365:341–6.
[34] Hollingsworth JM, Rogers MA, Kaufman SR, et al. Medical therapy to facilitate urinary stone passage: a meta-analysis. Lancet 2006;368:1171–9.
[35] Campschroer T, Zhu Y, Duijvesz D, Grobbee DE, Lock MTWT. Alpha blockers as medical expulsive therapy for ureteral stones. Cochrane Database Syst Rev 2014;4:CD008509.
[36] Seitz C, Liatsikos E, Porpiglia F, Tiselius H-G, Zwergel U. Medical therapy to facilitate the passage of stones: what is the evidence. Eur Urol 2009;56:455–71.
[37] Singh A, Alter HJ, Littlepage A. A systematic review of medical therapy to facilitate passage of ureteral calculi. Ann Emerg Med 2007;50:552–63.
[38] EAU/AUA Nephrolithiasis Guideline Panel. 2007 guideline for the management of ureteral calculi. www.auanet.org/education/guidelines/ureteral-calculi.cfm
[39] Schulz K, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled clinical trials. JAMA 1995;273:408–12.
[40] Driessen E, Hollon SD, Bockting CL, Cuijpers P, Turner EH. Does publication bias inflate the apparent efficacy of psychological treatment for major depressive disorder? A systematic review and meta-analysis of US National Institutes of Health–funded trials. PLoS One 2015;10:e0137864. http://dx.doi.org/10.1371/journal.pone.0137864.
[41] Scosyrev E, Messing EM, Sylvester R, Campbell S, Van Poppel H. Renal function after nephron-sparing surgery versus radical nephrectomy: results from EORTC randomized trial 30904. Eur Urol 2014;65:372–7.
[42] Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. Br Med J 2008;336:924–6.
[43] MacLennan S, Imamura M, Lapitan MC, et al. Systematic review of oncological outcomes following surgical management of localised renal cancer. Eur Urol 2012;61:972–93.
[44] Tan HJ, Norton EC, Ye Z, Hafez KS, Gore JL, Miller DC. Long-term survival following partial versus radical nephrectomy among older patients with early-stage kidney cancer. JAMA 2012;307:1629–35.
[45] Roos FC, Steffens S, Junker K, et al. Survival advantage of partial over radical nephrectomy in patients presenting with localized renal cell carcinoma. BMC Cancer 2014;14:372–8.
[46] Capitanio U, Terrone C, Antonelli A, et al. Nephron-sparing techniques independently decrease the risk of cardiovascular events relative to radical nephrectomy in patients with a T1a–T1b renal mass and normal preoperative renal function. Eur Urol 2015;67:683–9.
[47] Tomaszewski JJ, Kutikov A. Retrospective comparison of cardiovascular risk in preselected patients undergoing kidney cancer surgery: reflection of reality or simply what we want to hear? Eur Urol 2015;67:690–1.
[48] Shuch B, Hanley J, Lai J, et al. Overall survival advantage with partial nephrectomy: a bias of observational data? Cancer 2013;119:2981–9.
[49] Woldu SL, Weinberg AC, Korets R, et al. Who really benefits from nephron-sparing surgery? Urology 2014;84:860–8.
[50] Mir MC, Derweesh I, Porpiglia F, Zargar H, Mottrie A, Autorino R. Partial nephrectomy versus radical nephrectomy for clinical T1b and T2 renal tumors: a systematic review and meta-analysis of comparative studies. Eur Urol 2017;71:606–17.
[51] Guyatt GH, Sackett DL, Cook DJ. Evidence-Based Medicine Working Group. Users’ guides to the medical literature II: how to use an article about therapy or prevention (A): are the results of the study valid? JAMA 1993;270:2598–601.
[52] Ford AC, Guyatt GH, Talley NJ, Moayyedi P. Errors in the conduct of systematic reviews of pharmacological interventions for irritable bowel syndrome. Am J Gastroenterol 2010;105:280–8.
[53] Jaeschke R, Guyatt GH, Dellinger P, et al. Use of GRADE grid to reach decisions on clinical practice guidelines when consensus is elusive. Br Med J 2008;337:a744.