<<

Journal of Physiotherapy 60 (2014) 170–173

Journal of PHYSIOTHERAPY journal homepage: www.elsevier.com/locate/jphys

Appraisal Research Note Forest plots

Introduction Forest plots of randomised trials

A systematic literature review can provide a robust answer to a Continuous measures of treatment outcome clinical question by identifying individual studies that provide evidence relevant to the question and summarising their results. In The forest presented in Figure 1 was generated using the field of physiotherapy, systematic reviews commonly summa- from a of randomised controlled trials of rise the results of randomised trials that test the effect of an rhythmic cueing to improve walking speed in people with intervention.1 However, systematic reviews can also summarise Parkinson’s disease.10 The first column (on the left) lists the studies studies of the accuracy of diagnostic tests,2 studies of the included in the meta-analysis. Next are three columns of data for the prevalence of a clinical condition,3 studies of prognostic factors,4 experimental group in each study: the walking speed (in cm/s), or other study types. If the included studies are sufficiently similar the (to indicate how much walking speed varied and the results can be obtained in the same format, a meta-analysis among the participants) and the number of participants. The same may be performed.5,6 Meta-analysis is a statistical method used to three columns of data are then presented for the control group. These summarise numerical data from the individual studies into one data are then used to generate a mean difference in walking speed overall estimate. The results of the individual studies and the (still in cm/s) between the groups for each study. In the first study, for overall estimate from the meta-analysis are usually presented in a example, walking speed improved by a mean of 6 cm/s in the graph called a forest plot. experimental group and 1 cm/s in the control group. The mean difference is therefore 6 minus 1 = 5 cm/s. This is presented What is a forest plot? numerically and also graphically by plotting a blue square over the horizontal line. Blue squares to the right of the vertical line Although forest plots have been used since the 1970s,7 the indicate that the study favoured cueing, whilst those to left favour name ‘forest plot’ was first used in 2001.8 The name refers to the control. Note that a 95% CI is presented in both the numerical forest of lines produced when the results of multiple individual presentation (by two numbers in parentheses) and the graphical studies are plotted against the same axis. The display (by a horizontal black line). Confidence intervals have been 11,12 Collaboration’s official definition9 states: ‘A forest plot is a discussed previously in this journal. Briefly, each study provides graphical representation of the individual results of each study an estimate of the true effect of cueing on gait speed in people with included in a meta-analysis together with the combined meta- Parkinson’s disease. If any of the studies were repeated, a slightly analysis result. The plot also allows readers to see the heterogene- different result would be expected. Loosely speaking, the confidence ity among the results of the studies.’ The forest plot provides a interval indicates the within which the true effect of cueing quick visual representation of overall effect estimates and study probably lies. The estimate from each study is plotted, with the heterogeneity and is therefore considered to be a very powerful vertical line presenting the line of no effect. The size of the squares tool in meta-analysis.5,6 denotes the weight given to the study, with larger squares reflecting [(Figure_1)TD$IG]

Cueing Control Mean difference Mean difference Study mean SD Total mean SD Total Weight IV, Random, 95% CI IV, Random, 95% CI

Almeida 2012 6 19 28 1 27 14 4.7% 5.00 (−10.80, 20.80)

de Bruin 2010 3 22 11 −2 17 11 4.3% 5.00 (−11.43, 21.43)

Haase 201 −5 26 17 5 20 6 2.8% −10.00 (−30.22, 10.22)

Mak 2008 5 6 19 0 6 14 67.8% 5.00 (0.86, 9.14)

Nieuwboer 2007 8 16 76 2 23 77 17.3% 6.00 (−2.20, 14.20)

Thaut 1996 16 22 15 −5 27 11 3.1% 21.00 (−2.20, 14.20)

Total 166 133 100% 5.24 (1.83, 8.65) Heterogeneity: Tau2 = 0.00; Chi2 = 4.75, df = 5 (p = 0.45); I2 = 0% p Test for overall effect: Z = 3.01 ( = 0.003) −50 −25 0 25 50 Favours control Favours cueing

Figure 1. Meta-analysis of trials of the effect of cueing versus no cueing on gait speed (cm/s) in people with Parkinson’s disease, using a random-effect model. A negative mean gait speed a slower overall gait speed, SD = standard deviation, total = number of participants. Test for heterogeneity: chi2 = 4.75, I2 = 0%, p = 0.45. Modified from the systematic review by Tomlinson and colleagues.10 http://dx.doi.org/10.1016/j.jphys.2014.06.021 1836-9553/ß 2014 Australian Physiotherapy Association. Published by Elsevier B.V. All rights reserved. Appraisal Research Note 171 more weight. The weight given to each study is automatically Interpreting forest plots calculated based on the precision of the study’s estimate (precise estimates receive more weight). All the individual estimates are then Statistical significance statistically pooled using meta-analysis to produce an overall estimate. This is presented graphically as a black diamond, where When a confidence interval includes the vertical line of no the centre of the diamond is the overall estimate and the width of the effect, the result is statistically non-significant. This is evident diamond is the overall confidence interval. The pooled estimate is graphically when the horizontal black line (for an individual study) also presented numerically. or the black diamond (for the pooled estimate) crosses the vertical The format of the forest plot presented in Figure 1 would be line of the forest plot. Therefore, the pooled estimate in Figure 1 is suitable for other continuous outcomes, such as activities of daily statistically significant but several of the individual studies are not. living or quality of life, where higher values are better. For This illustrates the ability of meta-analyses to harness the continuous outcomes where lower values are better (such as pain statistical power of multiple studies to produce a more precise intensity), values to the left of the vertical line would favour the overall estimate, as shown by the narrower confidence interval of experimental group. Also, the studies in Figure 1 all reported gait the pooled estimate than the individual studies. speed in more or less the same way and reported the results in cm/s, so these units can be retained throughout the analysis. If Analysis of subgroups instead the group of studies had measured an outcome using a variety of tests (such as different scales for depression), the data Individual studies in a forest plot can be arranged in groups could still be pooled, but each result would be reported as a according to a characteristic of the patient population, interven- standardised mean difference (SMD). To calculate the SMD, the tion or follow-up. The estimate of the treatment effect in each data in the original units are divided by the standard deviation. subgroup can be compared to the overall effect estimate. Figure 3 This presents the size of the treatment effect independent of presents a hypothetical subgroup analysis according to the age of the scale used. Commonly, effect sizes below 0.3 are considered to participants. Subgroup differences can occur by chance, so a test be small, above 0.5 are considered to be moderate, and above 0.8 for subgroup heterogeneity is presented. Subgroup differences are considered to be large. However, these thresholds can be can occur even when the test for heterogeneity does not indicate misleading if they are not interpreted in relation to the standard heterogeneity. deviation of the study population. Another way that forest plots can reveal the relationship between the treatment effect and a characteristic of the participants, interventions or assessment is to place the studies Dichotomous measures of treatment outcome on a vertical axis that shows the range of that characteristic. A forest plot with an extra vertical axis locating each study by Although forest plots for dichotomous outcomes are similar to a characteristic on a continuous scale (eg, the gender ratio of the those for continuous outcomes, some differences in format are participants, the duration of the intervention, or the time of required. The forest plot presented in Figure 2 is from a systematic assessment) is called a modified forest plot.14 review of trials of surgical techniques in people with chronic neck pain.13 The outcome assessed in the studies is recovery, so the data Clinical relevance columns now present the number of events (ie, people who had recovered during the follow-up period) and the total number of Sometimes, an additional vertical line is added to the forest plot, participants. The contrast between groups is now calculated as a indicating the threshold for clinical relevance; that is, the effect risk difference (ie, the difference in the chance or ‘risk’ of recovery that is large enough to justify the cost, risks and inconvenience of between the groups) and this is again presented numerically and the intervention. A hypothetical example of this is shown by the graphically. Although Figure 2 shows risk difference, other red dotted line in Figure 3. The location of the confidence interval can be calculated and meta-analysed for dichotomous in relation to this line and the line of no effect shows how it should outcomes. One is relative risk, which is the ratio of the probability be interpreted. The effect in children is statistically non-significant, of an event in the treated and control groups. Another is the odds whereas the other subgroups all show a statistically significant ratio, which is the ratio of the odds of recovery in the treatment effect because their confidence intervals are all to the right of the group to the odds of recovery in the control group. line of no effect. Among the subgroups with a statistically

[(Figure_2)TD$IG]

Prosthetic disc Fusion Risk Difference Risk Difference Study, YearEvents Total Events Total Weight M-H, Random, 95% CI M-H, Random, 95% CI Mummameni 2007 44 276 61 265 32.3% −0.07 (−0.14, −0.00)

Murrey 2009 21 103 23 106 13.7% −0.01 (−0.12, 0.10)

Heller 2009 40 242 53 221 27.8% −0.07 (−0.15, −0.00)

Cheng 20096 31 10 34 4.2% −0.10 (−0.31, 0.11)

Jawahar 2010 20 59 9 34 4.9% 0.07 (−0.12, 0.27)

Coric 2011 20 136 39 133 17.1% −0.15 (−0.24, −0.05)

Total(95% CI) 847 793 100.0% −0.07 (−0.11, −0.03)

Total events 151 195 −0.5−0.25 0 0.25 0.5 Heterogeneity: Tau2 = 0.00; Chi2 = 5.64, df = 5 (p = 0.34); I2 = 11% Favours prosthetic disc Favoursfusion Test for overall effect: Z = 3.23 ( p = 0.001)

Figure 2. Meta-analysis of trials of the effect of prosthetic disc versus cervical fusion on recovery in people with chronic disabling neck pain, using a random-effects model and the Mantel-Haenszel method. Test for heterogeneity: chi2 = 5.64, I2 = 11%, p = 0.34. Modified from the systematic review by Verhagen and colleagues.13 172[(Figure_3)TD$IG] Appraisal [(Figure_4)TD$IG] Research Note

Mean difference Study Prevalence (95% CI) 95% CI Children Hanania 0.26 (0.24 to 0.28)

Study A Manen 0.22 (0.16 to 0.29)

Study B Ng 0.23 (0.17 to 0.29)

Subtotal Yohannes 0.46 (0.36 to 0.56)

Omachi 0.27 (0.25 to 0.30) Adolescents

Study C Engstrom 0.07 (0.03 to 0.16)

Study D Aghanwa 0.17 (0.07 to 0.34)

Subtotal Schneider 0.21 (0.21 to 0.22) Pooled 0.25 (0.21 to 0.29) Adults

Study E 0 0.2 0.4 0.6 0.8 1 Study F Prevalence (95% CI)

Subtotal Figure 4. Forest plot of eight studies of the prevalence of depression among people with chronic obstructive pulmonary disease. 3 Elderly Modified from the systematic review by Zhang and colleagues.

Study G

Study H has high reproducibility (intra-class correlation = 0.87) and has a significant association with the presence of heterogeneity, as Subtotal assessed by a statistical test,7 suggesting this is a reasonable approach. Total Forest plots of other study types

-2.5 0 2.5 5 7.5 Prevalence studies Favours control Favours treatment Estimates of the prevalence of a clinical condition from Figure 3. Hypothetical meta-analysis of trials, using a random-effects model, with subgrouping by age categories. Dotted vertical line represents the threshold for multiple observational studies can also be summarised in a clinical relevance. Test for heterogeneity: I2 = 67%, p = 0.04. forest plot. The individual and pooled estimates of prevalence and their confidence intervals are presented in a similar way to the forest plots discussed earlier. However, the significant effect, the effect in adolescents is clearly not worth- estimates are plotted over the x-axis, which extends from 0 while because the confidence interval is below the threshold for (no members of the population have the condition) to 1 clinical relevance. The effect in adults may or may not be clinically (all members of the population have the condition). An example, worthwhile because the confidence interval crosses the threshold. which is summarising studies of the prevalence of depression The effect in the elderly is clearly worthwhile. among people with chronic obstructive pulmonary disease,3 is presented in Figure 4. Note that the confidence intervals are Heterogeneity symmetrical around prevalence estimates near 0.5, but they become increasingly asymmetrical as the estimates approach Heterogeneity refers to variability between studies and can 0or1. affect the ability to combine the data of the individual studies. There are two types of heterogeneity: clinical heterogeneity and Reliability studies statistical heterogeneity. Clinical heterogeneity refers to the variability caused by differences in clinical variables, such as the Multiple reliability studies can also be presented on a forest patient population, interventions, outcome measures or setting of plot. The example shown in Figure 5 is derived from a systematic the included studies. Clinicians determine clinical heterogeneity, review of reliability studies of the Berg Balance Scale.2 which means that it will always be a rather subjective decision. The[(Figure_5)TD$IG] example summarises intra-rater reliability, with uniformly Readers should also consider these differences and subjectively decide whether the clinical heterogeneity is small enough for Study Relative reliability Relative reliability Weight meta-analysis to be appropriate. Statistical heterogeneity is the (random (95% CI) (random (95% CI) (%) variability in effect estimates between the studies and can be quantified by various statistics. Forest plots only present the Berg 0.97 (0.93 to 0.99) 9 statistical heterogeneity. The simplest is the I2,which quantifies the heterogeneity from 0 to 100%. There is no clear cut- Cattaneo 0.97 (0.91 to 0.98) 6 point beyond which there is too much heterogeneity. Some use a rule of thumb stating that around 25% is low heterogeneity, Liaw 0.98 (0.97 to 0.99) 83 around 50% medium and around 75% high heterogeneity.15 Although other statistics, such as the Tau2 or chi,2 are sometimes Pooled (I2 = 0%, p = 0.73) 0.98 (0.97 to 0.99) 100 used, the I2 does not suffer from some of the drawbacks of these other tests.15 Because forest plots provide a visual representation of study 0 0.51 estimates, another approach is to simply view the variation Figure 5. Forest plot of three studies of the intra-rater reliability of the Berg Balance between studies and judge the presence of heterogeneity Scale. (‘eyeball’ analysis). This subjective assessment of heterogeneity Modified from the systematic review by Downs and colleagues.2 Appraisal[(Figure_6)TD$IG] Research Note 173

StudyTPFP FN TN Sensitivity SpecificitySensitivity Specificity

Firooznia 1984 97 4870.92 (0.86 to 0.97) 0.64 (0.31 to 0.89)

Forristall 1988 20 2450.83 (0.63 to 0.95) 0.71 (0.29 to 0.96)

Jackson 1989 I 89 25 36 81 0.71 (0.62 to 0.79) 0.76 (0.67 to 0.84)

Jackson 1989 II 35 8 24 53 0.59 (0.46 to 0.72)0.87 (0.76 to 0.94)

Schaub 1989 13 6550.72 (0.47 to 0.90) 0.45 (0.17 to 0.77)

Schipper 1987 1407 57 31 0.71 (0.64 to 0.77) 0.82 (0.66 to 0.92)

Thornbury 1993 175190.94 (0.73 to 1.00) 0.64 (0.35 to 0.87)

00.410.2 0.6 0.8 00.410.2 0.6 0.8

Figure 6. Forest plot of seven studies of the diagnostic accuracy of computer tomography (CT) scan compared to surgery as reference standard in people with low back pain. TP = true positive, FP = false positive, FN = false negative, TN = true negative. Modified from the systematic review by van Rijn and colleagues.16 excellent results across all of the studies. Again the x-axis extends Summary from 0 to 1 and the confidence intervals are asymmetrical. The same review uses exactly the same format for a forest plot of inter-rater Forest plots are frequently used in meta-analysis to present the reliability.2 results graphically. Without specific knowledge of statistics, a visual assessment of heterogeneity appears to be valid and reproducible. Possible causes of heterogeneity can be explored Diagnostic accuracy studies in modified forest plots. Forest plots in meta-analyses appear to be a valid and useful tool to quickly and efficiently scan and interpret The forest plots of diagnostic accuracy studies differ from the evidence. The expression ‘a picture is worth a thousand words’ forest plots of treatment effectiveness because they show a certainly expresses the value of forest plots. double plot. The sensitivity and specificity estimates are presented together in one graph, as shown in Figure 6.In a b diagnostic test accuracy meta-analyses, the data presented Arianne P Verhagen and Manuela L Ferreira a entail true positive, false positive, false negative and true Department of General Practice, Erasmus Medical Centre University, negative data for each study. Forest plots of diagnostic accuracy Rotterdam, The Netherlands b can provide the same information of pooled summary estimates Musculoskeletal Division, The George Institute for Global Health, and test heterogeneity. However, this is not recommended and Sydney, Australia therefore the program Review Manager of the Cochrane Collaboration does not provide this information together with References the forest plots, but shows that in a receiver operating characteristic (ROC) graph. 1. Moseley A, et al. J Clin Epidemiol. 2009;62:1021–1030. 2. Downs S, et al. J Physiother. 2013;59:93–99. 3. Zhang MWB, et al. Gen Hosp Psychiatry. 2011;33:217–223. 4. Fermont AJ, et al. J Orthop Sports Phys Ther. 2014;33:153–163. Other variations 5. Israel H, Richter RR. J Orthop Sports Phys Ther. 2011;41:496–504. 6. Callcut RA, Branson RD. Respir Care. 2009;54:1379–1385. 7. Bax L, et al. Am J Epidemiol. 2009;169:249–255. Forest plots may also report information about the statistical 8. Lewis D, Clarke M. BMJ. 2001;322:1479–1480. method used in the meta-analysis. For example, the meta-analysis 9. Cochrane glossary. [Accessed May 4 2014]. in Figure 2 used the Mantel-Haenszel method, but other methods 10. Tomlinson CL, et al. Cochrane Database Syst Rev. 2013;9:CD002817. 11. Herbert RD. Aust J Physiother. 2000;46:229. for dichotomous outcomes include the Inverse method 12. Herbert RD. Aust J Physiother. 2000;46:309. and the Peto method. Another possible variation in meta-analyses 13. Verhagen AP, et al. Pain. 2013;154:2388–2396. is whether a fixed-effect model or a random-effects model is used. 14. Groenwold RHH, et al. BMC Med Res Methodol. 2010;65:253–261. 15. Higgins JP, et al. BMJ. 2003;327:557–560. Further information about these methods is available in the 16. van Rijn RM, et al. Eur Spine J. 2012;21:228–239. 17 Cochrane handbook. 17. Cochrane handbook. [Accessed May 5 2014].