RESEARCH/TOOLBOX

GRADING – levels of evidence

Derek Richards, Editor

As this journal changes, it is worth highlighting one of the key elements of the Summaries we publish in Evidence-based Dentistry, namely the assignment of levels of evidence.

We assign levels of evidence to each main Summary published in Evidence-based Dentistry, with the exception of guidelines, which contain a mix of levels and therefore present more of a challenge. The system we use in the journal is based on that employed by the Oxford Centre for Evidence-based Medicine (OCEBM) as shown in Table 1. The level of evidence we assign is highlighted using our evidence graphic (Figure 1). We will continue to use this system for the present, but it is worth mentioning some of the work that has taken place in the area over the past few years that may change the way we assign levels of evidence in the journal.

Figure 1. The Evidence-based Dentistry evidence graphic. Each level is shown highlighted on the scale 3A, 2C, 2B, 2A, 1B, 1A.

| Evidence level | Therapy/Prevention/Aetiology/Harm |
|---|---|
| 1A | SR (with homogeneity*) of RCTs |
| 1B | Individual RCT (with narrow confidence interval) |
| 2A | SR (with homogeneity*) of cohort studies |
| 2B | Individual cohort study (including low quality RCT; e.g. <80% follow-up) |
| 2C | Ecological studies |
| 3A | SR (with homogeneity*) of case-control studies |

* By homogeneity we mean a systematic review that is free of worrisome variations (heterogeneity) in the directions and degrees of results between individual studies. Not all systematic reviews with statistically significant heterogeneity need be worrisome, and not all worrisome heterogeneity need be statistically significant.

One of the first attempts to explicitly characterise a hierarchy of evidence was made by the Canadian Task Force on the Periodic Health Examination in 1979,1 to link their healthcare recommendations with the strength of underlying evidence. Since then, Holger et al.2 have identified more than 100 other groups that have used various systems of codes to communicate grades of evidence and recommendations.

Glasziou and colleagues3 subsequently identified five issues that they believed should be addressed when looking at alternative approaches to identifying reliable evidence:

• Different types of question require different types of evidence. For example, randomised controlled trials can give good estimates of treatment effects but poor estimates of prognosis.
• Systematic reviews are preferable: studies, with rare exceptions, should not be interpreted in isolation, so pooling of study findings using standardised reporting is preferable.
• Level alone should not be used to grade evidence. Although this approach helps to justify study selection, a number of disadvantages were identified by these authors, eg, levels may mean different things to different readers, and novel or hybrid approaches are not easily accommodated. This can lead to anomalous rankings, where a systematic review (usually the highest level) that is based on a few small poor quality trials might be placed above a large, well-conducted, multicentre trial.
• What if there are no systematic reviews? Systematic reviews are only available for a small number of topics so whatever evidence is found should be clearly described.
• Balanced assessment should draw on a variety of research. Even if the effectiveness of any particular treatment has good systematic evidence, data about potential harm is likely to come from cohort or case–control studies: risk–benefit assessments thus need to draw on a variety of research types.

These authors suggested that there were two broad options to address these concerns: to extend and improve existing hierarchies, or to abolish evidence hierarchies and levels of evidence and concentrate instead on teaching practitioners general principles of research so that they can use these principles to appraise the quality and relevance of particular studies. I would suggest that both are necessary.

In 2004, the GRADE (Grading of Recommendations Assessment, Development and Evaluation) working group published a critical appraisal of the six most prominent systems for grading levels of evidence and

© EBD 2009:10.1

Table 1. Simplified version of the Oxford Centre for Evidence-based Medicine levels of evidence table*

Columns other than "Evidence level" give the type of question.

| Evidence level | Therapy/prevention, aetiology/harm | Prognosis | Diagnosis | Differential diagnosis/symptom prevalence study | Economic and decision analyses |
|---|---|---|---|---|---|
| 1a | Systematic review of RCT | Systematic review of cohort studies | Systematic review of level 1 diagnostic studies | Systematic review of prospective cohort studies | Systematic review of level 1 economic studies |
| 1b | Individual RCT with narrow confidence intervals | Individual inception cohort study with ≥80% followup | Validating cohort study with good reference standards | Prospective cohort study with good followup | Analysis based on clinically sensible costs or alternatives; systematic review of the evidence. Multiway sensitivity analysis included |
| 2a | Systematic review of cohort studies | Systematic review of either retrospective cohort studies or untreated control groups in RCT | Systematic review of level >2 diagnostic studies | Systematic review of level ≥2b studies | Systematic review of level >2 economic studies |
| 2b | Individual cohort study (including low quality RCT; eg, <80% followup) | Retrospective cohort study or followup of untreated control patients in RCT | Exploratory cohort study with good reference standards | Retrospective cohort study or poor followup | Analysis based on clinically sensible costs or alternatives; limited review of the evidence, or single studies. Multiway sensitivity analysis included |
| 2c | ‘Outcomes’ research; ecological studies | ‘Outcomes’ research | | Ecological studies | Audit or ‘outcomes’ research |
| 3a | Systematic review of case–control studies | | Systematic review of level ≥3b studies | Systematic review of level ≥3b studies | Systematic review of ≥3b studies |

RCT, Randomised controlled trial. *Full version available from Oxford Centre for Evidence-based Medicine website (www.cebm.net/levels_of_evidence.asp)

strength of recommendations,4 as follows:

• The American College of Chest Physicians
• Australian National Health and Medical Research Council
• OCEBM
• Scottish Intercollegiate Guidelines Network
• US Preventive Services Task Force
• US Task Force on Community Preventive Services

The working group found that there was poor agreement about the sense of the systems; all of the systems used were considered to have important shortcomings when attempting to grade levels of evidence and the strength of clinical recommendations. There was agreement that the OCEBM system worked well for all four types of questions (effectiveness, harm, diagnosis and prognosis) considered for the appraisal, although it was not without its faults.

This critical appraisal examined both the way these six systems rank the evidence and how they then grade the strength of clinical recommendations. A number of key conclusions were drawn, and a new scheme proposed. This has been adopted by the GRADE group to develop a new rating of quality and strength of evidence (Table 2).5,6

Table 2. GRADE: quality of evidence and definitions

| Quality of evidence | Definitions |
|---|---|
| High quality | Further research is very unlikely to change our confidence in the estimate of effect |
| Moderate quality | Further research is likely to have an impact on our confidence in the estimate of effect and may change the estimate |
| Low quality | Further research is very likely to have an impact on our confidence in the estimate of effect and is likely to change the estimate |
| Very low quality | Any estimate of effect is very uncertain |

The GRADE approach to linking evidence and clinical recommendations has much to recommend it and it is likely that this will be an important system in the future, particularly in guideline development. There are of course differences between the role of this journal and guideline development: Evidence-based Dentistry identifies good quality articles and provides a commentary from a practitioner working in the area, whereas guidelines (particularly the better ones) are developed by a group that includes a number of topic-specific and methodology experts. Guideline groups are likely to have access to a very wide knowledge base and are thus well placed to apply the GRADE definitions effectively; more so than the smaller number of people employed in developing and preparing summaries for this journal. Consequently, we will continue to rate studies individually using the OCEBM approach (Table 1) for the foreseeable future. Readers who would like more information on GRADE can find this on their website (www.gradeworkinggroup.org).

1. Canadian Task Force on the Periodic Health Examination. The periodic health examination. Can Med Assoc J 1979; 121: 1193–1254.
2. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. Br Med J 2008; 336: 924–926.
3. Glasziou P, Vandenbroucke JP, Chalmers I. Assessing the quality of research. Br Med J 2004; 328: 39–41.
4. Atkins D, Eccles M, Flottorp S, et al. Systems for grading the quality of evidence and the strength of recommendations. I. Critical appraisal of existing approaches. The GRADE Working Group. BMC Health Services Res 2004; 4: 38.
5. Atkins D, Briss PA, Eccles M, et al. Systems for grading the quality of evidence and the strength of recommendations. II. Pilot study of a new system. BMC Health Services Res 2005; 5: 25.
6. Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. Br Med J 2008; 336: 924–926.

Evidence-Based Dentistry (2009) 10, 24–25. doi:10.1038/sj.ebd.640636