Psychological Bulletin
1961, Vol. 58, No. 4, 305-316

SCALES AND STATISTICS: PARAMETRIC AND NONPARAMETRIC¹

NORMAN H. ANDERSON
University of California, Los Angeles

¹ An earlier version of this paper was presented at the April 1959 meetings of the Western Psychological Association. The author's thanks are due F. N. Jones and J. B. Sidowski for their helpful comments.

The recent rise of interest in the use of nonparametric tests stems from two main sources. One is the concern about the use of parametric tests when the underlying assumptions are not met. The other is the problem of whether or not the measurement scale is suitable for application of parametric procedures. On both counts parametric tests are generally more in danger than nonparametric tests. Because of this, and because of a natural enthusiasm for a new technique, there has been a sometimes uncritical acceptance of nonparametric procedures. By now a certain degree of agreement concerning the more practical aspects involved in the choice of tests appears to have been reached. However, the measurement theoretical issue has been less clearly resolved. The principal purpose of this article is to discuss this latter issue further. For the sake of completeness, a brief overview of practical statistical considerations will also be included.

A few preliminary comments are needed in order to circumscribe the subsequent discussion. In the first place, it is assumed throughout that the data at hand arise from some sort of measuring scale which gives numerical results. This restriction is implicit in the proposal to compare parametric and nonparametric tests since the former do not apply to strictly categorical data (but see Cochran, 1954). Second, parametric tests will mean tests of significance which assume equinormality, i.e., normality and some form of homogeneity of variance. For convenience, parametric test, F test, and analysis of variance will be used synonymously. Although this usage is not strictly correct, it should be noted that the t test and regression analysis may be considered as special applications of F. Nonparametric tests will refer to significance tests which make considerably weaker distributional assumptions as exemplified by rank order tests such as the Wilcoxon T, the Kruskal-Wallis H, and by the various median-type tests. Third, the main focus of the article is on tests of significance with a lesser emphasis on descriptive statistics. Problems of estimation are touched on only slightly although such problems are becoming increasingly important.

Finally, a word of caution is in order. It will be concluded that parametric procedures constitute the everyday tools of psychological statistics, but it should be realized that any area of investigation has its own statistical peculiarities and that general statements must always be adapted to the prevailing practical situation. In many cases, as in pilot work, for instance, or in situations in which data are cheap and plentiful, nonparametric tests, shortcut parametric tests (Tate & Clelland, 1957), or tests by visual inspection may well be the most efficient.

PRACTICAL STATISTICAL CONSIDERATIONS

The three main points of comparison between parametric and nonparametric tests are significance level, power, and versatility. Most of the relevant considerations have been treated adequately by others and only a brief summary will be given here. For more detailed discussion, the articles of Cochran (1947), Savage (1957), Sawrey (1958), Gaito (1959), and Boneau (1960) are especially recommended.

Significance level. The effects of lack of equinormality on the significance level of parametric tests have received considerable study. The two handiest sources for the psychologist are Lindquist's (1953) citation of Norton's work, and the recent article of Boneau (1960) which summarizes much of the earlier work. The main conclusion of the various investigators is that lack of equinormality has remarkably little effect although two exceptions are noted: one-tailed tests and tests with considerably disparate cell n's may be rather severely affected by unequal variances.²

² The split-plot designs (e.g., Lindquist, 1953) commonly used for the analysis of repeated or correlated observations have been subject to some criticism (Cotton, 1959; Greenhouse & Geisser, 1959) because of the additional assumption of equal correlation which is made. However, tests are available which do not require this assumption (Cotton, 1959; Greenhouse & Geisser, 1959; Rao, 1952).

A somewhat different source of perturbation of significance level should also be mentioned. An over-all test of several conditions may show that something is significant but will not localize the effects. As is well known, the common practice of t testing pairs of means tends to inflate the significance level even when the over-all F is significant. An analogous inflation occurs with nonparametric tests. There are parametric multiple comparison procedures which are rigorously applicable in many such situations (Duncan, 1955; Federer, 1955) but analogous nonparametric techniques have as yet been developed in only a few cases.

Power. As Dixon and Massey (1957) note, rank order tests are nearly as powerful as parametric tests under equinormality. Consequently, there would seem to be no pressing reason in most investigations to use parametric techniques for reasons of power if an appropriate rank order test is available (but see Snedecor, 1956, p. 120). Of course, the loss of power involved in dichotomizing the data for a median-type test is considerable.

Although it might thus be argued that rank order tests should be generally used where applicable, it is to be suspected that such a practice would produce negative transfer to the use of the more incisive experimental designs which need parametric analyses. The logic and computing rules for the analysis of variance, however, follow a uniform pattern in all situations and thus provide maximal positive transfer from the simple to the more complex experiments.

There is also another aspect of power which needs mention. Not infrequently, it is possible to use existing data to get a rough idea of the chances of success in a further related experiment, or to estimate the N required for a given desired probability of success (Dixon & Massey, 1957, Ch. 14). Routine methods are available for these purposes when parametric statistics are employed but similar procedures are available only for certain nonparametric tests such as chi square.

Versatility. One of the most remarkable features of the analysis of variance is the breadth of its applicability, a point which has been emphasized by Gaito (1959). For present purposes, the ordinary factorial design will serve to exemplify the issue. Although factorial designs are widely employed, their uses in the investigation and control of minor variables have not been fully exploited. Thus, Feldt (1958) has noted the general superiority of the factorial design in matching or equating groups, an important problem which is but poorly handled in current research (Anderson, 1959). Similarly, the use of replications as a factor in the design makes it possible to test and partially control for drift or shift in apparatus, procedure, or subject population during the course of an experiment. In the same way, taking experimenters or stimulus materials as a factor allows tests which bear on the adequacy of standardization of the experimental procedures and on the generalizability of the results.

An analogous argument could be given for latin squares, largely rehabilitated by the work of Wilk and Kempthorne (1955), which are useful when subjects are given successive treatments; for orthogonal polynomials and trend tests for correlated scores (Grant, 1956) which give the most sensitive tests when the independent variable is scaled; as well as for the multivariate analysis of variance (Rao, 1952) which is applicable to correlated dependent variables measured on incommensurable scales.

The point to these examples and to the more extensive treatment by Gaito is straightforward. ... inadequately or not at all by current nonparametric methods.

It thus seems fair to conclude that parametric tests constitute the standard tools of psychological statistics. In respect of significance level and power, one might claim a fairly even match. However, the versatility of parametric procedures is quite unmatched and this is decisive. Unless and until nonparametric tests are developed to the point where they meet the routine needs of the researcher as exemplified by the above designs, they cannot realistically be considered as competitors to parametric tests. Until that day, nonparametric tests may best be considered as useful minor techniques in the analysis of numerical data.

Too promiscuous a use of F is, of course, not to be condoned since there will be many situations in which the data are distributed quite wildly. Although there is no easy rule with which to draw the line, a frame of reference can be developed by studying the results of Norton (Lindquist, 1953) and of Boneau (1960). It is also quite instructive to compare p values for parametric and nonparametric tests of the same data.

It may be worth noting that one of the reasons for the popularity of nonparametric tests is probably the current obsession with questions of statistical significance to the neglect of the often more important questions of design and power. Certainly some minimal degree of reliability is generally a necessary justification for asking others to spend time in assessing the importance of one's data. However, the question of statistical significance is only a first step, and a relatively minor one at that, in the
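The inflation of significance level produced by t testing pairs of means, noted above under Significance level, can be illustrated by a small simulation. What follows is only a minimal sketch in Python and is not part of the original article: the number of groups, the cell size, the number of simulated experiments, and the seed are arbitrary assumptions, and the preliminary over-all F test is omitted for brevity. All groups are drawn from a single normal population, so every significant pairwise t is a false rejection.

```python
import itertools
import math
import random
import statistics

# Hypothetical settings chosen for illustration; none come from the article.
N_GROUPS = 4          # conditions in each simulated experiment
N_PER_GROUP = 10      # observations per condition
N_EXPERIMENTS = 2000  # simulated experiments, all under the null hypothesis
T_CRIT = 2.101        # two-tailed .05 critical value of t for 18 df (standard tables)


def pooled_t(x, y):
    """Two-sample t statistic using the pooled variance of the two groups."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(
        pooled_var * (1 / nx + 1 / ny))


def simulate(seed=0):
    """Return (per-comparison rate, per-experiment rate) of false rejections."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(N_GROUPS), 2))
    rejected_tests = 0
    experiments_with_rejection = 0
    for _ in range(N_EXPERIMENTS):
        # Every group comes from the same normal population (equinormality holds).
        groups = [[rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
                  for _ in range(N_GROUPS)]
        hits = sum(abs(pooled_t(groups[i], groups[j])) > T_CRIT
                   for i, j in pairs)
        rejected_tests += hits
        experiments_with_rejection += hits > 0
    per_comparison = rejected_tests / (N_EXPERIMENTS * len(pairs))
    per_experiment = experiments_with_rejection / N_EXPERIMENTS
    return per_comparison, per_experiment


if __name__ == "__main__":
    per_comparison, per_experiment = simulate()
    print(f"per-comparison false rejection rate: {per_comparison:.3f}")
    print(f"per-experiment false rejection rate: {per_experiment:.3f}")
```

Under these assumptions the per-comparison rate should stay close to the nominal .05, whereas the proportion of experiments showing at least one significant pair should come out well above it, which is the sense in which unprotected pairwise testing inflates the significance level.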