The Interpretation of the Probable Error and the Coefficient of Correlation
Total Page:16
File Type:pdf, Size:1020Kb
THE UNIVERSITY OF ILLINOIS LIBRARY 370 IL6 . No. 26-34 sssftrsss^"" Illinois Library University of L161—H41 Digitized by the Internet Archive in 2011 with funding from University of Illinois Urbana-Champaign http://www.archive.org/details/interpretationof32odel BULLETIN NO. 32 BUREAU OF EDUCATIONAL RESEARCH COLLEGE OF EDUCATION THE INTERPRETATION OF THE PROBABLE ERROR AND THE COEFFICIENT OF CORRELATION By Charles W. Odell Assistant Director, Bureau of Educational Research ( (Hi THE UBMM Of MAR 14 1927 HKivERsrrv w Illinois PRICE 50 CENTS PUBLISHED BY THE UNIVERSITY OF ILLINOIS. URBANA 1926 370 lit TABLE OF CONTENTS PAGE Preface 5 i Chapter I. Introduction 7 Chapter II. The Probable Error 9 .Chapter III. The Coefficient of Correlation 33 PREFACE Graduate students and other persons contemplating ed- ucational research frequently ask concerning the need for training in statistical procedures. They usually have in mind training in the technique of making tabulations and calcula- tions. This, as Doctor Odell points out, is only one phase, and probably not the most important phase, of needed train- ing in statistical methods. The interpretation of the results of calculation has not received sufficient attention by the authors of texts in this field. The following discussion of two derived measures, the probable error and the coefficient of correlation, is offered as a contribution to the technique of educational research. It deals with the problems of the reader of reports of research, as well as those of original investigators. The tabulating of objective data and the mak- ing of calculations from the tabulations may be and fre- quently is a tedious task, but it is primarily one of routine. The interpreting of the results of calculation is not a routine task. Many conditions affect their meaning and the research worker constantly encounters new problems of interpretation. It is, however, possible to state certain general principles that will serve as a guide in this phase of educational research. Walter S. Monroe, Director. July 7, 1926 THE INTERPRETATION OF THE PROBABLE ERROR AND THE COEFFICIENT OF CORRELATION CHAPTER I INTRODUCTION Purpose of this bulletin. One of the most noticeable recent devel- opments in the field of education has been the extensive application of statistical methods to the description of educational conditions and the solution of educational problems. Only a comparatively few years ago conditions were portrayed chiefly in terms of adjectives and other ex- pressions of quality or degree, but now these have been superseded to a considerable extent by definite quantitative terms. There are at least two reasons why everyone engaged in educational work, from the class- room teacher to the research expert, should become acquainted with certain commonly used formulae, methods of computation, and other statistical procedures. In the first place, situations frequently are en- countered in which it is desirable to make use of statistical procedures for the purpose of collecting and analyzing data which have a bearing upon practical educational problems. For the great majority of educa- tional workers, however, it is probably more important to be able to interpret correctly the numerous statistical expressions and discussions which are encountered in professional reading and other work. It is almost impossible to peruse a single issue of an educational periodical or a recent book in the field of education or to attend an educational meeting without seeing or hearing many statistical terms employed. Most of the commonly used methods of computation can be mastered by practically any person of average intelligence and arithmetical ability within a rather short time, but the power of interpreting correctly the various measures derived by statistical methods is not so easily ac- quired. The acquisition of this power demands considerable familiarity with the concepts involved and this in turn requires clear and critical thinking. It is with the second of the two purposes mentioned in the preced- ing paragraph chiefly in mind that the writer has attempted in this bulletin to throw some light on the use and interpretation of two of the most frequently used statistical measures, 1 the probable error 2 (com- 'The term "statistical measure," sometimes shortened to "measure," is used in this bulletin to refer to a measure or quantitative expression which has been derived from a number of data such as scores or other measurements and which summarizes [7] monly abbreviated P.E.), and the coefficient of correlation (commonly abbreviated r), in the hope that readers will be helped in their under- standing of the significance of these terms. Since the methods of com- puting them can be found in many places, 3 their actual calculation will not be explained in detail, although the formulae for them will be given. or expresses in a single numerical index some tendency of the original data. All such expressions as means, medians, modes, measures of deviation, measures of relationship, and so on are statistical measures. In order to avoid confusion the term "measure," which is often used to refer to the result obtained by applying a measuring instrument to an individual case, will not be used in this bulletin in that sense, but "score" or "measure- ment" will be used instead. 'The term "probable error" (P.E.) has come to be generally used to include both the probable error proper and the median deviation (abbreviated Md.D.), although, as will be shown later in the discussion, the latter is in no real sense an error. For this reason and also to avoid confusion the term probable error will sometimes be used when median deviation would be preferable from the standpoint of strict accuracy of use. The reader should not obtain the idea, however, that the writer believes it is desirable to use probable error instead of median deviation; he distinctly does not believe so. 3 See: Odell, C. W. Educational Statistics. New York: The Century Company, 1925, p. 138-39, 150-8-, 221-41, or any other standard text on statistics. [8] CHAPTER II THE PROBABLE ERROR 1 Formula for the probable error. Since the probable error, as shown by the substitute term median deviation, is the median of the deviations or differences of the individual scores or measurements from their aver- 2 age, it may be computed simply by determining the median of these deviations or differences. However, the customary method is to deter- 3 4 mine the standard deviation first and then to multiply it by .6745 to obtain the probable error. In other words the usual formula for the probable error is: P.E. = .6745 a. The relationship existing between the probable error and the standard deviation is therefore of the same sort as that existing between a foot and a yard or a pint and a quart, that one always equals the other mul- tiplied by a constant factor. Thus just as .5 quart equals a pint and 2 5 pints a quart, so .6745o- equals 1 P.E. and 1.4826 P.E. equals la. Different uses of the probable error. There are several more or less different uses or meanings of the probable error, at least five of 1 The probable error, often more properly called the median deviation, is only one of several commonly used measures of the same sort. Among the other similar ones are the standard deviation (abbreviated S.D. or a (sigma) ), which in certains uses becomes the standard error, the mean deviation (M.D. or A.D.), the quartile deviation or semi-interquartile range (Q), and the 10-90 percentile range (D). All of these except the last are rather frequently encountered. In general, what- ever is said about the probable error may be applied to these other measures also. The one important exception to this statement is that since these measures, except Q, differ from the probable error in magnitude, their interpretations in numerical statements will, of course, differ. For a discussion of these other measures see: . Odell, op. cit., p. 120-38. 2 The term "average" is used here in a general sense, that is, it includes the arith- metic mean, commonly called the average, the median, the mode, and all other measures of central tendency. Deviations or differences are usually computed from the arithmetic mean but may be taken from any other measure of central tendency. t=tN in which .v denotes the deviation or difference of a particular score from the average, N stands for the total number of cases or scores and 2 (sigma) is the symbol for summation. 4 It is only in the case of a normal distribution or by chance that the probable error is equal to nearly .6745 times the standard deviation. However, most educational data form distributions which approximate normality closely enough that no serious, error is involved in using the given decimal as the multiplier. This number is of course the reciprocal of .6745. [9] which are fairly distinct from one another, and will be dealt with in this discussion. 1. A measure of the spread or variability of a distribution of data about the average. When used in this way it should properly be called the median deviation (Md.D.). If the term probable error is employed it should be followed by the words "of the distribu- tion" and abbreviated P.E. Dis. 2. A unit of measurement. This also is a use for which the term median deviation is really the correct one to employ, since it involves merely a particular use of the median deviation of a dis- tribution. Since no subscript has been agreed upon to denote this use, the writer suggests "U" for "unit." Thus, when designating the median deviation used as a unit of measurement one should write Md.D.