Education and Debate
Total Page:16
File Type:pdf, Size:1020Kb
Education and debate Assessing the quality of research Paul Glasziou, Jan Vandenbroucke, Iain Chalmers Inflexible use of evidence hierarchies confuses practitioners and irritates researchers. So how can we improve the way we assess research? The widespread use of hierarchies of evidence that questions, not just the effect of interventions.2 Other Department of Primary Health grade research studies according to their quality has hierarchies might be necessary to answer questions Care, University of helped to raise awareness that some forms of evidence about aetiology, diagnosis, disease frequency, progno- Oxford, Oxford are more trustworthy than others. This is clearly desir- sis, and adverse effects.3 Thus, although a systematic OX3 7LF able. However, the simplifications involved in creating review of randomised trials would be appropriate for Paul Glasziou reader and applying hierarchies have also led to misconcep- answering questions about the main effects of a Leiden University tions and abuses. In particular, criteria designed to treatment, it would be ludicrous to attempt to use it to Medical School, guide inferences about the main effects of treatment ascertain the relative accuracy of computerised versus Leiden 9600 RC, have been uncritically applied to questions about aeti- human reading of cervical smears, the natural course Netherlands ology, diagnosis, prognosis, or adverse effects. So of prion diseases in humans, the effect of carriership of Jan Vandenbroucke professor of clinical should we assess evidence the way Michelin guides a mutation on the risk of venous thrombosis, or the epidemiology assess hotels and restaurants? We believe five issues rate of vaginal adenocarcinoma in the daughters of James Lind 4 should be considered in any revision or alternative pregnant women given diethylstilboesterol. Initiative, Oxford approach to helping practitioners to find reliable To answer their everyday questions, practitioners OX2 7LG answers to important clinical questions. need to understand the “indications and contraindica- Iain Chalmers 5 editor, James Lind tions” for different types of research evidence. library Randomised trials can give good estimates of Different types of question require Correspondence to: treatment effects but poor estimates of overall progno- P Glasziou different types of evidence sis; comprehensive non-randomised inception cohort paul.glasziou@ dphpc.ox.ac.uk Ever since two American social scientists introduced studies with prolonged follow up, however, might 1 the concept in the early 1960s, hierarchies have been provide the reverse. BMJ 2004;328:39–41 used almost exclusively to determine the effects of interventions. This initial focus was appropriate but has Systematic reviews of research are also engendered confusion. Although interventions always preferred are central to clinical decision making, practice relies on answers to a wide variety of types of clinical With rare exceptions, no study, whatever the type, should be interpreted in isolation. Systematic reviews are required of the best available type of study for answering the clinical question posed.6 A systematic review does not necessarily involve quantitative pooling in a meta-analysis. Although case reports are a less than perfect source of evidence, they are important in alerting us to poten- tial rare harms or benefits of an effective treatment.7 Standardised reporting is certainly needed,8 but too few people know about a study showing that more than half of suspected adverse drug reactions were confirmed by subsequent, more detailed research.9 For reliable evidence on rare harms, therefore, we need a systematic review of case reports rather than a haphaz- ard selection of them.10 Qualitative studies can also be incorporated in reviews—for example, the systematic compilation of the reasons for non-compliance with hip protectors derived from qualitative research.11 References w1-w9 are available on bmj.com BMJ VOLUME 328 3 JANUARY 2004 bmj.com 39 Education and debate Level alone should not be used to question, the hierarchy would suggest first looking for grade evidence a systematic review of randomised controlled trials. However, only a fraction of the hundreds of thousands The first substantial use of a hierarchy of evidence to of reports of randomised trials have been considered grade health research was by the Canadian Task Force 18 12 for possible inclusion in systematic reviews. So when on the Preventive Health Examination. Although such there is no existing review, a busy clinician might next systems are preferable to ignoring research evidence or try to identify the best of several randomised trials. If failing to provide justification for selecting particular the search fails to identify any randomised trials, research reports to support recommendations, they non-randomised cohort studies might be informative. have three big disadvantages. Firstly, the definitions of For non-therapeutic questions, however, search the levels vary within hierarchies so that level 2 will mean strategies should accommodate the need for observa- different things to different readers. Secondly, novel or tional designs that answer questions about aetiology, hybrid research designs are not accommodated in these prognosis, or adverse effects.19 Whatever evidence is hierarchies—for example, reanalysis of individual data found, this should be clearly described rather than from several studies or case crossover studies within simply assigned to a level. Such considerations have led cohorts. Thirdly, and perhaps most importantly, the authors of the BMJ’s Clinical Evidence to use a hier- hierarchies can lead to anomalous rankings. For archy for finding evidence but to forgo grading example, a statement about one intervention may be evidence into levels. Instead, they make explicit the graded level 1 on the basis of a systematic review of a type of evidence on which their conclusions are based. few, small, poor quality randomised trials, whereas a statement about an alternative intervention may be graded level 2 on the basis of one large, well conducted, Balanced assessments should draw on a multicentre, randomised trial. variety of types of research This ranking problem arises because of the objective For interventions, the best available evidence for each of collapsing the multiple dimensions of quality (design, outcome of potential importance to patients is needed.20 conduct, size, relevance, etc) into a single grade. For Often this will require systematic reviews of several example, randomisation is a key methodological feature different types of study. As an example, consider a in research into interventions,13 but reducing the quality woman interested in oral contraceptives. Evidence is of evidence to a single level reflecting proper available from controlled trials showing their contracep- randomisation ignores other important dimensions of tive effectiveness. Although contraception is the main randomised clinical trials. These might include: intended beneficial effect, some women will also be x Other design elements, such as the validity of meas- interested in the effects of oral contraceptives on acne or urements and blinding of outcome assessments dysmenorrhoea. These may have been assessed in short x Quality of the conduct of the study, such as loss to term randomised controlled trials comparing different follow up and success of blinding contraceptives. Any beneficial intended effect needs to x Absolute and relative size of any effects seen be weighed against possible harms, such as increases in x Confidence intervals around the point estimates of thromboembolism and breast cancer. The best evidence effects. for such potential harms is likely to come from None of the current hierarchies of evidence non-randomised cohort studies or case-control studies. includes all these dimensions, and recent methodologi- For example, fears about negative consequences on fer- cal research suggests that it may be difficult for them to tility after long term use of oral contraceptives were do so.14 Moreover, some dimensions are more allayed by such non-randomised studies. The figure important for some clinical problems and outcomes gives an example of how all this information might be than for others, which necessitates a tailored approach amalgamated into a balance sheet.21 22 to appraising evidence.15 Thus, for important recom- Sometimes, rare, dramatic adverse effects detected mendations, it may be preferable to present a brief with case reports or case control studies prompt summary of the central evidence (such as “double- further investigation and follow up of existing blind randomised controlled trials with a high degree randomised cohorts to detect related but less severe of follow up over three years showed that . .”), coupled adverse effects. For example, the case reports and case- with a brief appraisal of why particular quality dimen- control studies showing that intrauterine exposure to sions are important. This broader approach to the diethylstilboestrol could cause vaginal adenocarci- assessment of evidence applies not only to randomised noma led to further investigation and follow up of the trials but also to observational studies. In the final rec- mothers and children (male as well as female) who had ommendations, there will also be a role for other types participated in the relevant randomised trials. These of scientific evidence—for example, on aetiological and investigations showed several less serious but more pathophysiological