Reporting Statistical Information in Medical Journal Articles
EDITORIAL

Statistics is not merely about distributions or probabilities, although these are part of the discipline. In the broadest sense, statistics is the use of numbers to quantify relationships in data and thereby answer questions. Statistical methods allow the researcher to reduce a spreadsheet of data to counts, means, proportions, rates, risk ratios, rate differences, and other quantities that convey information. We believe that the presentation of numerical information will be enhanced if authors keep in mind that their goal is to clarify and explain. We offer suggestions here for the presentation of statistical information to the readers of general medical journals.

NUMBERS THAT CAN BE OMITTED

Most statistical software packages offer a cornucopia of output. Authors need to be judicious in selecting what should be presented. A chi-square test will typically produce the chi-square statistic, the degrees of freedom in the data, and the P value for the test. In general, chi-square statistics, t statistics, F statistics, and similar values should be omitted. Degrees of freedom are not needed. Even the P value can usually be omitted.

In a study that compares groups, it is customary to present a table that allows the reader to compare the groups with regard to variables such as age, sex, or health status. A case-control study typically compares the cases and controls, whereas a cohort study typically compares those exposed and not exposed. Sometimes authors use P values to compare these study groups. We suggest that these P values should be omitted. In a case-control or cohort study, there is no hypothesis that the 2 groups are similar. We are interested in a comparison because differences between groups may confound estimates of association. If the study sample is large, small differences that have little confounding influence may be statistically significant. If the study sample is small, large differences that are not statistically significant may be important confounders.1 The bias due to confounding cannot be judged by statistical significance; we usually judge this based on whether adjustment for the confounding variable changes the estimate of association.2-6

Even in a randomized trial, in which it is hoped that the compared groups will be similar as a result of randomization, the use of P values for baseline comparisons is not appropriate. If randomization was done properly, then the only reason for any differences must be chance.7 It is the magnitude of any difference, not its statistical significance, that may bias study results and that may need to be accounted for in the analysis.8
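To make the dependence on sample size concrete, the sketch below tests the same 5-percentage-point baseline imbalance at two sample sizes. The counts are hypothetical (they are not drawn from any study discussed here), and scipy's chi-square test is used as one common way to obtain such a P value:

```python
# A minimal sketch: the same baseline imbalance tested at two sample sizes.
# Counts are hypothetical; scipy.stats.chi2_contingency is one common way
# to obtain a chi-square P value for a 2 x 2 table.
from scipy.stats import chi2_contingency

def baseline_p_value(n_per_group: int) -> float:
    """P value for a 40% vs 45% imbalance in a binary covariate."""
    a = round(0.40 * n_per_group)   # covariate present in group A
    b = round(0.45 * n_per_group)   # covariate present in group B
    table = [[a, n_per_group - a],
             [b, n_per_group - b]]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value

print(baseline_p_value(100))      # ~0.57: far from "significant," yet a 5-point imbalance could confound
print(baseline_p_value(10_000))   # <0.001: "significant," yet the imbalance is no larger
```

The P value changes dramatically with sample size even though the imbalance, and hence its potential to confound, is identical, which is why whether adjustment changes the estimate of association is the better guide.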
Much regression output serves little purpose in medical research publication; this usually includes the intercept coefficient, R², log likelihood, standard errors, and P values. Estimates of variance explained (such as R²), correlation coefficients, and standardized regression coefficients (sometimes called effect size) are not useful measures of causal associations or agreement and should not be presented as the main results of an analysis.9-12 These measures depend not only on the size of any biological effect of an exposure, but also on the distribution of the exposure in the population. Because this distribution can be influenced by choice of study population and study design, it makes little sense to standardize on a measure that can be made smaller or larger by the investigator. Useful measures for causal associations, such as risk ratios and rate differences, are discussed below. Useful measures of agreement, such as kappa statistics, the intraclass correlation coefficient, the concordance coefficient, and other measures, are discussed in many textbooks and articles.11-20

Global tests of regression model fit are not helpful in most articles. Investigators can use these tests to check that a model does not have major conflicts with the data, but they should be aware that these tests have low power to detect problems.21 If the test yields a small P value, which suggests a problem with the model, investigators need to consider what this means in the context of their study. But a large P value cannot reassure authors or readers that the model presented is correct.5

Complex formulas or mathematical notation, such as log likelihood expressions or symbolic expressions for regression models, are not useful for general medical readers.

NUMBERS THAT SHOULD BE INCLUDED

Several authors, including the International Committee of Medical Journal Editors, have urged that research articles present measures of association, such as risk ratios, risk differences, rate ratios, or differences in means, along with an estimate of the precision for these measures, such as a 95% confidence interval.1,22-29

Imagine that we compared the outcomes of patients who received treatment A with the outcomes of other patients. If we find that the 2-sided P value for this comparison is .02, we conclude that the probability of obtaining the observed difference (or a greater difference) is 1 in 50 if, in the population from which we selected our study subjects, the treatment actually had no effect on the outcomes. But the P value does not tell readers if those who received treatment A did better or worse compared with those who did not. Nor does it tell readers how much better or worse one group did compared with the other. However, if we report that the risk ratio for a bad outcome was 0.5 among those who received treatment A, compared with others, readers can see both the direction (beneficial) and the size (50% reduction in the risk of a bad outcome) of treatment A’s association with bad outcomes. It is also useful, when possible, to show the proportion of each group that had a bad outcome or to use something similar, such as a Kaplan-Meier survival curve. If we report that the 95% confidence interval around the risk ratio of 0.5 was 0.3 to 0.7, readers can see that the null hypothesis of no association (a risk ratio of 1.0) is unlikely and that risk ratios of 0.9 or 0.1 are also unlikely. If we report that the risk ratio was 0.5 (95% confidence interval, 0.2 to 1.3), a reader can see that the estimate of 0.5 is imprecise and the data are compatible with no association between treatment and outcome (a risk ratio of 1.0) and are even compatible with a harmful association (a risk ratio greater than 1.0). A point estimate and confidence interval convey more information than a P value.

In tables of column percentages, do not include a row for counts and percentages of missing data. Doing this will distort the other percentages in the same column, making it difficult for readers to compare known information in different columns. The records with missing data are best omitted for each variable. The investigator hopes that the distribution of information about those with missing data was similar to those with known data. The amount of missing data should be described in the methods section. If there is a lot of missing data for a variable, say more than 5%, a table footnote can point this out (Table).

Table. Data From a Hypothetical Randomized Controlled Trial of Treatment A Compared With Standard Care

                                No. (%) of Patients
                           Treatment A     Standard Care    Risk Ratio
Outcome                    (N = 800)       (N = 1200)       (95% Confidence Interval)
Death                      18 (2.3)        54 (4.5)         0.50 (0.30 to 0.85)
Hospitalization            100 (12.5)      230 (19.2)       0.65 (0.52 to 0.81)
Treatment complications*
  None                     563 (75.1)      615 (55.0)       Reference
  1                        113 (15.1)      280 (25.0)       0.53 (0.44 to 0.65)
  2 or more                74 (9.9)        223 (19.9)       0.44 (0.34 to 0.56)

*Data on complications were missing for 6.3% of the treatment A patients and 6.8% of the standard care patients.
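For readers who want to see how such numbers arise, the risk ratio and confidence interval in the Death row of the table can be reproduced from the counts alone. The sketch below uses the usual normal approximation on the log risk ratio scale, one standard method; the table itself does not state how its intervals were computed:

```python
# Risk ratio and 95% CI for the Death row of the table, computed from raw counts
# with the usual normal approximation on the log risk ratio scale.
from math import exp, log, sqrt

deaths_a, n_a = 18, 800       # treatment A
deaths_b, n_b = 54, 1200      # standard care

risk_a = deaths_a / n_a       # 0.0225
risk_b = deaths_b / n_b       # 0.045
risk_ratio = risk_a / risk_b  # 0.50

# Standard error of log(RR): sqrt(1/a - 1/N_a + 1/c - 1/N_b)
se_log_rr = sqrt(1 / deaths_a - 1 / n_a + 1 / deaths_b - 1 / n_b)
lower = exp(log(risk_ratio) - 1.96 * se_log_rr)  # ~0.30
upper = exp(log(risk_ratio) + 1.96 * se_log_rr)  # ~0.85

print(f"Risk ratio {risk_ratio:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
# -> Risk ratio 0.50 (95% CI, 0.30 to 0.85), matching the table
```

The interval (0.30 to 0.85) conveys both that the null value of 1.0 is unlikely and how precisely the 50% reduction in risk has been estimated.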
ODDS RATIOS VS RISK RATIOS

In a case-control study, associations are commonly estimated using odds ratios. Because case-control studies are typically done when the study outcome is uncommon in the population from which the cases and controls arose, odds ratios will approximate risk ratios.31,32 Logistic regression is typically used to adjust odds ratios to control for potential confounding by other variables.

In clinical trials or cohort studies, however, the outcome may be common. If more than 10% of the study subjects have the outcome, or if the baseline hazard of disease is common in a subgroup that contributes a substantial portion of subjects with the outcome, then the odds ratio may be considerably further from 1.0 than the risk ratio.6,33 This may result in a misinterpretation of the study results by authors, editors, or readers.34-40 One option is to do the analysis using logistic regression and convert the odds ratios to risk ratios.41-43 Another option is to estimate a risk ratio using Poisson regression, negative binomial regression, or a generalized linear model with a log link and binomial error distribution.44-55 Whatever choice is made, we urge authors not to interpret odds ratios as if they were risk ratios in studies where this interpretation is not warranted.
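A brief sketch with the hospitalization counts from the table above, where more than 10% of patients had the outcome, shows the divergence. It also applies one published conversion formula, RR = OR / (1 − p0 + p0 × OR), where p0 is the risk in the reference group; this formula is exact for a crude 2 × 2 table but only approximate when applied to adjusted odds ratios from logistic regression:

```python
# Odds ratio vs risk ratio for the hospitalization outcome in the table above,
# where the outcome is common (>10%), plus one published conversion formula:
# RR = OR / (1 - p0 + p0 * OR), with p0 the risk in the reference group.
hosp_a, n_a = 100, 800        # treatment A
hosp_b, n_b = 230, 1200       # standard care

risk_a, risk_b = hosp_a / n_a, hosp_b / n_b
risk_ratio = risk_a / risk_b                                    # ~0.65
odds_ratio = (risk_a / (1 - risk_a)) / (risk_b / (1 - risk_b))  # ~0.60, further from 1.0

converted = odds_ratio / (1 - risk_b + risk_b * odds_ratio)     # recovers ~0.65

print(f"Risk ratio {risk_ratio:.2f}, odds ratio {odds_ratio:.2f}, converted {converted:.2f}")
```

Here the odds ratio (0.60) is noticeably further from 1.0 than the risk ratio (0.65); describing it as a "40% reduction in risk" would overstate the association, which is precisely the misinterpretation warned against above.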