
Standard Deviation, Standard Error

Which 'Standard' Should We Use?

George W. Brown, MD

Standard deviation (SD) and standard error (SE) are quietly but extensively used in biomedical publications. These terms and notations are used as descriptive statistics (summarizing numerical data), and they are used as inferential statistics (estimating population parameters from samples). I review the use and misuse of SD and SE in several authoritative medical journals and make suggestions to help clarify the usage and meaning of SD and SE in biomedical reports.

(Am J Dis Child 1982;136:937-941)

From the Los Lunas Hospital and Training School, New Mexico, and the Department of Pediatrics, University of New Mexico School of Medicine, Albuquerque.

Reprint requests to Los Lunas Hospital and Training School, Box 1269, Los Lunas, NM 87031 (Dr Brown).

Standard deviation (SD) and standard error (SE) have surface similarities; yet, they are conceptually so different that we must wonder why they are used almost interchangeably in the medical literature. Both are usually preceded by a plus-minus symbol (±), suggesting that they define a symmetric interval or range of some sort. They both appear almost always with a mean (average) of a set of measurements or counts of something. The medical literature is replete with statements like, "The serum cholesterol measurements were distributed with a mean of 180 ± 30 mg/dL (SD)."

In the same journal, perhaps in the same article, a different statement may appear: "The weight gains of the subjects averaged 720 (mean) ± 32 g/mo (SE)." Sometimes, as discussed further, the summary data are presented as the "mean of 120 mg/dL ± 12" without the "12" being defined as SD or SE, or as some other index of dispersion. Eisenhart¹ warned against this "peril of shorthand expression" in 1968; Feinstein² later again warned about the fatuity and confusion contained in any a ± b statements where b is not defined. Warnings notwithstanding, a glance through almost any medical journal will show examples of this usage.

Medical journals seldom state why SD or SE is selected to summarize data in a given report. A search of the three major pediatric journals for 1981 (American Journal of Diseases of Children, Journal of Pediatrics, and Pediatrics) failed to turn up a single article in which the selection of SD or SE was explained. There seems to be no uniformity in the use of SD or SE in these journals or in The Journal of the American Medical Association (JAMA), the New England Journal of Medicine, or Science. The use of SD and SE in the journals will be discussed further.

If these respected, well-edited journals do not demand consistent use of either SD or SE, are there really any important differences between them? Yes, they are remarkably different, despite their superficial similarities. They are so different in fact that some authorities have recommended that SE should rarely or never be used to summarize medical research data. Feinstein² noted the following:

A standard error has nothing to do with standards, with errors, or with the communication of scientific data. The concept is an abstract idea, spawned by the imaginary world of statistical inference and pertinent only when certain operations of that imaginary world are met in scientific reality.²(p336)

Glantz³ also has made the following recommendation:

Most medical investigators summarize their data with the standard error because it is always smaller than the standard deviation. It makes their data look better . . . data should never be summarized with the standard error of the mean.³(p25)

A closer look at the source and meaning of SD and SE may clarify why medical investigators, journal reviewers, and editors should scrutinize their usage with considerable care.

DISPERSION

An essential function of "descriptive statistics" is the presentation of condensed, shorthand symbols that epitomize the important features of a collection of data. The idea of a central value is intuitively satisfactory to anyone who needs to summarize a group of measurements, or counts. The traditional indicators of a central tendency are the mode (the most frequent value), the median (the value midway between the lowest and the highest value), and the mean (the average). Each has its special uses, but the mean has great convenience and flexibility for many purposes.

The dispersion of a collection of values can be shown in several ways; some are simple and concise, and others are complex and esoteric. The range is a simple, direct way to indicate the spread of a collection of values, but it does not tell how the values are distributed. Knowledge of the mean adds considerably to the information carried by the range.

Another index of dispersion is provided by the differences (deviations) of each value from the mean of the values. The trouble with this approach is that some deviations will be positive, and some will be negative, and their sum will be zero. We could ignore the sign of each deviation, ie, use the "absolute mean deviation," but mathematicians tell us that working with absolute numbers is extremely difficult and fraught with technical disadvantages.

A neglected method for summarizing the dispersion of data is the calculation of percentiles (or deciles, or quartiles). Percentiles are used more frequently in pediatrics than in other branches of medicine, usually in growth charts or in other data arrays that are clearly not symmetric or bell shaped. In the general medical literature, percentiles are sparsely used, apparently because of a common, but erroneous, assumption that the mean ± SD or SE is satisfactory for summarizing central tendency and dispersion of all sorts of data.
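As a minimal sketch of the summary measures named in this section, the following Python computes the three indicators of central tendency, the range, the sum of the deviations, and the quartiles for a small skewed data set. The values are hypothetical and chosen only for illustration; they do not come from any of the journals discussed.

```python
import statistics

# Hypothetical, right-skewed measurements (illustrative only).
values = [2, 3, 3, 4, 5, 6, 8, 12, 20, 37]

mean = statistics.mean(values)        # the average
median = statistics.median(values)    # middle of the ordered data
mode = statistics.mode(values)        # most frequent value
low, high = min(values), max(values)  # the range

# Deviations from the mean cancel out, so their sum is (numerically) zero.
deviation_sum = sum(v - mean for v in values)

# Quartiles: three cut points that split the ordered data into four groups.
q1, q2, q3 = statistics.quantiles(values, n=4)

print(f"mean={mean}, median={median}, mode={mode}, range={low}-{high}")
print(f"sum of deviations={deviation_sum}")
print(f"quartiles={q1}, {q2}, {q3}")
```

For these skewed values the mean (10) sits well above the median (5.5), which is the kind of data array for which percentiles may describe the spread better than a mean with a plus-minus figure.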

STANDARD DEVIATION

The generally accepted answer to the need for a concise expression for the dispersion of data is to square the difference of each value from the group mean, giving all positive values. When these squared deviations are added up and then divided by the number of values in the group, the result is the variance.

The variance is always a positive number, but it is in different units than the mean. The way around this inconvenience is to use the square root of the variance, which is the population standard deviation (σ), which for convenience will be called SD. Thus, the SD is the square root of the averaged squared deviations from the mean. The SD is sometimes called by the shorthand term, "root-mean-square."

The SD, calculated in this way, is in the same units as the original values and the mean. The SD has additional properties that make it attractive for summarizing dispersion, especially if the data are distributed symmetrically in the revered bell-shaped, gaussian curve. Although there are an infinite number of gaussian curves, the one for the data at hand is described completely by the mean and SD. For example, the mean ± 1.96 SD will enclose 95% of the values; the mean ± 2.58 SD will enclose 99% of the values. It is this symmetry and elegance that contribute to our admiration of the gaussian curve.

The bad news, especially for biologic data, is that many collections of measurements or counts are not symmetric or bell shaped. Biologic data tend to be skewed or double humped, J shaped, U shaped, or flat on top. Regardless of the shape of the distribution, it is still possible by rote arithmetic to calculate an SD although it may be inappropriate and misleading.

For example, one can imagine throwing a six-sided die several hundred times and recording the score at each throw. This would generate a flat-topped, ie, rectangular, distribution, with about the same number of counts for each score, 1 through 6. The mean of the scores would be 3.5 and the SD would be about 1.7. The trouble is that the collection of scores is not bell shaped, so the SD is not a good summary statement of the true form of the data. (It is mildly upsetting to some that no matter how many times the die is thrown, it will never show its average score of 3.5.)

The SD wears two hats. So far, we have looked at its role as a descriptive statistic for measurements or counts that are representative only of themselves, ie, the data being summarized are not a sample representing a larger (and itself unmeasurable) universe or population.

The second hat involves the use of SD from a random sample as an estimate of the population standard deviation (σ). The formal statistical language says that the sample statistic, SD, is an unbiased estimate of a population parameter, the population standard deviation, σ.

This "estimator SD" is calculated differently than the SD used to describe data that represent only themselves. When a sample is used to make estimates about the population standard deviation, the calculations require two changes, one in concept and the other in arithmetic. First, the mean used to determine the deviations is conceptualized as an estimate of the mean, x̄, rather than as a true and exact population mean (µ). Both means are calculated in the same way, but a population mean, µ, stands for itself and is a parameter; a sample mean, x̄, is an estimate of the mean of a larger population and is a statistic.

The second change in calculation is in the arithmetic: the sum of the squared deviations from the (estimated) mean is divided by n - 1, rather than by N. (This makes sense intuitively when we recall that a sample would not show as great a spread of values as the source population. Reducing the denominator [by one] produces an estimate slightly larger than the sample SD. This "correction" has more impact when the sample is small than when n is large.)

Formulas for the two versions of SD are shown in Fig 1. The formulas follow the customary use of Greek letters for population parameters and English letters for sample statistics. The number in a sample is indicated by the lowercase "n," and the number in a population is indicated by the capital "N."

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \qquad \mathrm{SD} = \sqrt{\frac{\sum (X - \bar{x})^2}{n - 1}}$$

Left, SD of population: µ = mean of population; N = number in population. Right, estimate of population SD from sample: x̄ = mean of sample; n = number in sample.

Fig 1. Standard deviation (SD) of population is shown at left. Estimate of population SD derived from sample is shown at right.

The two-faced nature of the SD has caused tension between medical investigators on the one hand and statisticians on the other. The investigator may believe that the subjects or measurements he is summarizing are self-contained and unique and cannot be thought of as a random sample. Therefore, he may decide to use the SD as a descriptive statement about dispersion of his data. On the other hand, the biostatistician has a tendency, because of his training and widespread statistical practice, to conceive of the SD as an estimator of a parameter of a population. The statistician may hold the view that any small collection of data is a stepping-stone to higher things.

The pervasive influence of statisticians is demonstrated in the program for calculating the SD that is put into many handheld calculators; they usually calculate the estimator SD rather than the "descriptor SD."

In essence, the investigator and his statistical advisor, the journal reviewers, and the editors all confront a critical decision whenever they face the term "standard deviation." Is it a descriptive statistic about a collection of (preferably gaussian) data that stand free and independent of sampling constraints, ie, is it a straightforward indication of dispersion? Or, is the SD being used as an estimate of a population parameter? Although the SD is commonly used to summarize medical information, it is rare that the reports indicate which version of the SD is being used.

STANDARD ERROR

In some ways, standard error is simpler than the SD, but in other ways, it is much more complex. First, the simplicities will be discussed. The SE is always smaller than the SD. This may account for its frequent use in medical publications; it makes the data look "tighter" than does the SD. In the previously cited quotation by Glantz,³ the implication is that the SE might be used in a conscious attempt at distortion or indirection. A more charitable view is that many researchers and clinicians simply are not aware of the important differences between SD and SE.

At first glance, the SE looks like a measure of dispersion, just as the SD does. The trouble is that the dispersion implied by the SE is different in nature than that implied by the SD.

The SE is always an estimator of a population characteristic; it is not a descriptive statistic, it is an inferential statistic. The SE is an estimate of the interval into which a population parameter will probably fall. The SE also enables the investigator to choose the probability that the parameter will fall within the estimated interval, usually called the "confidence interval."

Here is a statement containing the SE: The mean of the sample was 73 mg/dL, with an SE of the mean of 3 mg/dL. This implies that the mean of the population from which the sample was randomly taken will fall, with 95% probability, in the interval of 73 ± (1.96 × 3), which is from 67.12 to 78.88. Technically the statement should be: 95 out of 100 confidence intervals calculated in this manner will include the population parameter. If 99% probability is desired, the confidence interval is 73 ± (2.58 × 3), which is from 65.26 to 80.74.

As Feinstein² notes, the SE has nothing to do with standards or with errors; it has to do with predicting confidence intervals from samples. Up to this point, I have used SE as though it meant only the SE of the mean (SEM). The SE should not be used without indicating what parameter interval is being estimated. (I broke that rule for the sake of clarity in the introduction of the contrast between SD and SE.)

Every sample statistic can be used to estimate an SE; there is an SE for the mean, for the difference between the means of two samples, for the slope of a regression line, and for a correlation coefficient. Whenever the SE is used, it should be accompanied by a symbol that indicates which of the several SEs it represents, eg, SEM for SE of the mean.

Figure 2 shows the formula for calculating the SEM from the sample; the formula requires the estimator SD, ie, the SD calculated using n - 1, not N. It is apparent from the formula for the SEM that the larger the sample size, the smaller the SEM and, therefore, the narrower the confidence interval. Stated differently, if the estimate of a population mean is from a large sample, the interval that probably brackets the population mean is narrower for the same level of confidence (probability). To reduce the confidence interval by half, it is necessary to increase the sample size by a multiple of four. For readers who know that the SD is preferred over the SEM as an index for describing dispersion of gaussian data, the formula for the SEM can be used (in reverse, so to speak) to calculate the SD, if sample size is known.

$$\mathrm{SD}_{\bar{x}} = \frac{\mathrm{SD}}{\sqrt{n}} = \mathrm{SEM} \qquad \mathrm{SE}_p = \sqrt{\frac{pq}{n}}$$

Left, SEM: SD = estimate of population SD; n = sample size. Right, SE of proportion: p = proportion estimated from sample; q = (1 - p); n = sample size.

Fig 2. Standard error of mean (SEM) is shown at left. Note that SD is estimate of population SD (not σ, actual SD of population). Sample size used to calculate SEM is n. Standard error of proportion is shown at right.

The theoretical meaning of the SEM is quite engaging, as an example will show. One can imagine a population that is too large for every element to be measured. A sample is selected randomly, and its mean is calculated, then the elements are replaced. The selection and measuring are repeated several times, each time with replacement. The collection of means of the samples will have a distribution, with a mean and an SD. The mean of the sample means will be a good estimate of the population mean, and the SD of the means will be the SEM. Figure 2 uses the symbol SDx̄ to show that a collection of sample means (x̄) has an SD, and it is the SEM. The interpretation is that the true population mean (µ) will fall, with 95% probability, within ± 1.96 SEM of the mean of the means.

Here, we see the charm and attractiveness of the SEM. It enables the investigator to estimate from a sample, at whatever level of confidence (probability) desired, the interval within which the population mean will fall. If the user wishes to be very confident in his interval, he can set the brackets at ± 3.5 SEM, which would "capture" the mean with about 99.95% probability.

Standard errors in general have other seductive properties. Even when the sample comes from a population that is skewed, U shaped, or flat on top, most SEs are estimators of nearly gaussian distributions for the statistic of interest. For example, for samples of size 30 or larger, the SEM and the sample mean, x̄, define a nearly gaussian distribution (of sample means), regardless of the shape of the population distribution.

These elegant features of the SEM are embodied in a statistical principle called the Central Limit Theorem, which says, among other things:

The mean of the collection of many sample means is a good estimate of the mean of the population, and the distribution of the sample means (if n = 30 or larger) will be nearly gaussian regardless of the distribution of the population from which the samples are taken.

The theorem also says that the collection of sample means from large samples will be better in estimating the population mean than means from small samples.

Given the symmetry and usefulness of SEs in inferential statistics, it is no wonder that some form of the SE, especially the SEM, is used so frequently in technical publications. A flaw occurs, however, when a confidence interval based on the SEM is used to replace the SD as a descriptive statistic; if a description of data spread is needed, the SD should be used. As Feinstein² has observed, the reader of a research report may be interested in the span or range of the data, but the author of the report instead displays an estimated zone of the mean (SEM).

An absolute prohibition against the use of the SEM in medical reports is not desirable. There are situations in which the investigator is using a truly random sample for estimation purposes. Random samples of children have been used, for example, to estimate population parameters of growth. The essential element is that the investigator (and editor) recognize when descriptive statistics should be used, and when inferential (estimation) statistics are required.

SE OF PROPORTION

As mentioned previously, every sample statistic has its SE. With every statistic, there is a confidence interval that can be estimated. Despite the widespread use of SE (unspecified) and of SEM in medical journals and books, there is a noticeable neglect of one important SE, the SE of the proportion.

The discussion so far has dealt with measurement data or counts of elements. Equally important are data reported in proportions or percentages, such as, "Six of the ten patients with zymurgy syndrome had so-and-so." From this, it is an easy step to say, "Sixty percent of our patients with zymurgy syndrome had so-and-so." The implication of such a statement may be that the author wishes to alert other clinicians, who may encounter samples from the universe of patients with zymurgy syndrome, that they may see so-and-so in about 60% of them.

The proportion (six of ten) has an SE of the proportion. As shown in Fig 2, the SEp in this situation is the square root of (0.6 × 0.4)/10, which equals 0.155. The true proportion of so-and-so in the universe of patients with zymurgy syndrome is in the confidence interval that falls symmetrically on both sides of six of ten. To estimate the interval, we start with 0.6 or 60% as the midpoint of the interval. At the 95% level of confidence, the interval is 0.6 ± 1.96 SEp, which is 0.6 ± (1.96 × 0.155), or from 0.3 to 0.9.

If the sample shows six of ten, the 95% confidence interval is between 30% (three of ten) and 90% (nine of ten). This is not a very narrow interval. The expanse of the interval may explain the almost total absence of the SEp in medical reports, even in journals where the SEM and SD are used abundantly. Investigators may be dismayed by the dimensions of the confidence interval when the SEp is calculated from the small samples available in clinical situations.

Of course, as in the measurement of self-contained data, the investigator may not think of his clinical material as a sample from a larger universe. But often, it is clear that the purpose of publication is to suggest to other investigators or clinicians that, when they see patients of a certain type, they might expect to encounter certain characteristics in some estimated proportion of such patients.

JOURNAL USE OF SD AND SE

To get empiric information about pediatric journal standards on descriptive statistics, especially the use of SD and SE, I examined every issue of the three major pediatric journals published in 1981: American Journal of Diseases of Children, Journal of Pediatrics, and Pediatrics. In a less systematic way, I perused several issues of JAMA, the New England Journal of Medicine, and Science.

Every issue of the three pediatric journals had articles, reports, or letters in which SD was mentioned, without specification of whether it was the descriptive SD or the estimate SD. Every issue of the Journal of Pediatrics contained articles using SE (unspecified) and articles using SEM. Pediatrics used SEM in every issue and the SE in every issue except one. Eight of the 12 issues of the American Journal of Diseases of Children used SE or SEM or both. All the journals used SE as if SE and SEM were synonymous.

Every issue of the three journals contained articles that stated the mean and range, without other indication of dispersion. Every journal contained reports with a number ± (another number), with no explanation of what the number after the plus-minus symbol represented.

Every issue of the pediatric journals presented proportions of what might be thought of as samples without indicating that the SEp (standard error of the proportion) might be informative.

In several reports, SE or SEM is used in one place, but SD is used in another place in the same article, sometimes in the same paragraph, with no explanation of the reason for each use. The use of percentiles to describe nongaussian distributions was infrequent. Similar examples of stylistic inconsistency were seen in the haphazard survey of JAMA, the New England Journal of Medicine, and Science.

A peculiar graphic device (seen in several journals) is the use, in illustrations that summarize data, of a point and vertical bars, with no indication of what the length of the bars signifies.

A prevalent and unsettling practice is the use of the mean ± SD for data that are clearly not gaussian or not symmetric. Whenever data are reported with the SD as large as or larger than the mean, the inference must be that several values are zero or negative. The mean ± 2 SDs should embrace about 95% of the values in a gaussian distribution.
If the SD is as large as the mean, then the lower tail of the bell-shaped curve will go below zero. For many biologic data, there can be no negative values; blood chemicals, serum enzymes, and cellular elements cannot exist in negative amounts.

An article by Fletcher and Fletcher⁴ entitled "Clinical Research in General Medical Journals" in a leading publication demonstrates the problem of ± SD in real life. The article states that in 1976 certain medical articles had an average of 4.9 authors ± 7.3 (SD)! If the authorship distribution is gaussian, which is necessary for ± SD to make sense, this statement means that 95% of the articles had 4.9 ± (1.96 × 7.3) authors, or from -9.4 to +19.2. Or stated another way, more than 25% of the articles had . . .

. . . controls and treated subjects, when such a difference exists. This failure is called the "error of the second kind," the Type II error, or the beta error. In laboratory language, this error is called the false-negative result, in which the test result says "normal" but nature reveals "abnormal" or "disease present." (The Type I error, the alpha error, is a more familiar one; it is the error of saying that two groups differ in some important way when they do not. The Type I error is like a false-positive laboratory test in that the test suggests that the subject is abnormal, when in truth he is normal.)

In comparative trials, calculation of . . .

. . . statistical style, the reply often is, "The editors made me do it."

An articulate defender of good statistical practice and usage is Feinstein,² who has regularly and effectively urged the appropriate application of statistics, including SD and SE. In his book, Clinical Biostatistics, he devotes an entire chapter (chap 23, pp 335-352) to "problems in the summary and display of statistical data." He offers some advice to readers who wish to improve the statistics seen in medical publications: "And the best person to help re-orient the editors is you, dear reader, you. Make yourself a one-person vigilante committee."²