<<

Psychological Bulletin 1961, Vol. 58, No. 4, 305-316

SCALES AND : PARAMETRIC AND NONPARAMETRIC1 NORMAN H. ANDERSON University of California, Los Angeles

The recent rise of interest in the use since the former do not apply to of nonparametric tests stems from strictly categorical (but see two main sources. One is the concern Cochran, 1954). Second, parametric about the use of parametric tests tests will tests of significance when the underlying assumptions are which assume equinormality, i.e., not met. The other is the problem of normality and some form of homo- whether or not the measurement scale geneity of . For convenience, is suitable for application of paramet- parametric test, F test, and analysis ric procedures. On both counts of variance will be used synony- parametric tests are generally more in mously. Although this usage is not danger than nonparametric tests. strictly correct, it should be noted Because of this, and because of a that the t test and natural enthusiasm for a new tech- may be considered as special applica- nique, there has been a sometimes tions of F. Nonparametric tests will uncritical acceptance of nonparamet- refer to significance tests which make ric procedures. By now a certain considerably weaker distributional degree of agreement concerning the assumptions as exemplified by rank more practical aspects involved in the order tests such as the Wilcoxon T, choice of tests appears to have been the Kruskal-Wallis H, and by the reached. However, the measurement various -type tests. Third, the theoretical issue has been less clearly main focus of the article is on tests of resolved. The principal purpose of significance with a lesser emphasis on this article is to discuss this latter . Problems of issue further. For the sake of com- estimation are touched on only pleteness, a brief overview of practi- slightly although such problems are cal statistical considerations will also becoming increasingly important. be included. Finally, a word of caution is in A few preliminary comments are order. It will be concluded that needed in order to circumscribe the parametric procedures constitute the subsequent discussion. In the first everyday tools of psychological sta- place, it is assumed throughout that tistics, but it should be realized that the data at hand arise from some sort any area of investigation has its own of measuring scale which gives nu- statistical peculiarities and that gen- merical results. This restriction is eral statements must always be implicit in the proposal to compare adapted to the prevailing practical parametric and nonparametric tests situation. In many cases, as in pilot work, for instance, or in situations in which data are cheap and plentiful, 1 An earlier version of this paper was pre- nonparametric tests, shortcut para- sented at the April 1959 meetings of the Western Psychological Association. The au- metric tests (Tate & Clelland, 1957), thor's thanks are due F. N. Jones and J. B. or tests by visual inspection may well Sidowski for their helpful comments. be the most efficient. 305 306 NORMAN H. ANDERSON PRACTICAL STATISTICAL analogous inflation occurs with non- CONSIDERATIONS parametric tests. There are para- The three main points of compari- metric multiple comparison proce- son between parametric and non- dures which are rigorously applicable parametric tests are significance level, in many such situations (Duncan, power, and versatility. Most of the 1955; Federer, 1955) but analogous relevant considerations have been nonparametric techniques have as treated adequately by others and yet been developed in only a few only a brief summary will be given cases. here. For more detailed discussion, Power. As Dixon and Massey the articles of Cochran (1947), Sav- (1957) note, rank order tests are age (1957), Sawrey (1958), Gaito nearly as powerful as parametric (1959), and Boneau (1960) are espe- tests under equinormality. Con- cially recommended. sequently, there would seem to be no Significance level. The effects of pressing reason in most investiga- lack of equinormality on the signifi- tions to use parametric techniques cance level of parametric tests have for reasons of power if an appropriate received considerable study. The rank order test is available (but see two handiest sources for the psy- Snedecor, 1956, p. 120). Of course, chologist are Lindquist's (1953) cita- the loss of power involved in dichoto- tion of Norton's work, and the recent mizing the data for a median-type article of Boneau (1960) which sum- test is considerable. marizes much of the earlier work. Although it might thus be argued The main conclusion of the various that rank order tests should be gen- investigators is that lack of equi- erally used where applicable, it is to normality has remarkably little effect be suspected that such a practice although two exceptions are noted: would produce negative transfer to one-tailed tests and tests with con- the use of the more incisive experi- siderably disparate cell n's may be mental designs which need para- rather severely affected by unequal metric analyses. The logic and com- .2 puting rules for the analysis of vari- ance, however, follow a uniform pat- A somewhat different source of tern in all situations and thus provide perturbation of significance level should also be mentioned. An over- maximal positive transfer from the all test of several conditions may simple to the more complex experi- ments. show that something is significant but will not localize the effects. As is There is also another aspect of power which needs mention. Not in- well known, the common practice of t testing pairs of tends to in- frequently, it is possible to use exist- ing data to get a rough idea of the flate the significance level even when chances of success in a further related the over-all F is significant. An , or to estimate the N re- quired for a given desired probability 2 The split-plot designs (e.g., Lindquist, 1953) commonly used for the analysis of re- of success (Dixon & Massey, 1957, peated or correlated observations have been Ch. 14). Routine methods are avail- subject to some criticism (Cotton, 19S9; able for these purposes when para- Greenhouse & Geisser, 19S9) because of the metric statistics are employed but additional assumption of equal correlation which is made. However, tests are available similar procedures are available only which do not require this assumption (Cotton, for certain nonparametric tests such 1959; Greenhouse & Geisser, 1959; Rao, 1952). as chi square. PARAMETRIC AND 307 Versatility. One of the most re- quately or not at all by current non- markable features of the analysis of parametric methods. variance is the breadth of its ap- It thus seems fair to conclude that plicability, a point which has been parametric tests constitute the stand- emphasized by Gaito (1959). For ard tools of psychological statistics. present purposes, the ordinary fac- In respect of significance level and torial design will serve to exemplify power, one might claim a fairly even the issue. Although factorial designs match. However, the versatility of are widely employed, their uses in the parametric procedures is quite un- investigation and control of minor matched and this is decisive. Unless variables have not been fully ex- and until nonparametric tests are de- ploited. Thus, Feldt (1958) has veloped to the point where they meet noted the general superiority of the the routine needs of the researcher as factorial design in or equat- exemplified by the above designs, ing groups, an important problem they cannot realistically be con- which is but poorly handled in cur- sidered as competitors to parametric rent research (Anderson, 1959). tests. Until that day, nonparametric Similarly, the use of replications as a tests may best be considered as use- factor in the design makes it possible ful minor techniques in the analysis to test and partially control for drift of numerical data. or shift in apparatus, procedure, or Too promiscuous a use of F is, of subject population during the course course, not to be condoned since there of an experiment. In the same way, will be many situations in which the taking experimenters or stimulus ma- data are distributed quite wildly. terials as a factor allows tests which Although there is no easy rule with bear on the adequacy of standardiza- which to draw the line, a frame of tion of the experimental procedures reference can be developed by study- and on the generalizability of the re- ing the results of Norton (Linquist, sults. 1953) and of Boneau (1960). It is An analogous argument could be also quite instructive to compare p given for latin squares, largely re- values for parametric and nonpara- habilitated by the work of Wilk and metric tests of the same data. Kempthorne (1955), which are useful It may be worth noting that one of when subjects are given successive the reasons for the popularity of non- treatments; for orthogonal poly- parametric tests is probably the cur- nomials and trend tests for corre- rent obsession with questions of sta- lated scores (Grant, 1956) which give tistical significance to the neglect of the most sensitive tests when the in- the often more important questions of dependent variable is scaled; as well design and power. Certainly some as for the multivariate analysis of minimal degree of is gen- variance (Rao, 1952) which is appli- erally a necessary justification for cable to correlated dependent vari- asking others to spend time in assess- ables measured on incommensurable ing the importance of one's data. scales. However, the question of statistical The point to these examples and to significance is only a first step, and a the more extensive treatment by relatively minor one at that, in the Gaito is straightforward. Their anal- over-all process of evaluating a set of ysis is more or less routine when results. To say that a result is sta- parametric procedures are used. tistically significant simply gives However, they are handled inade- reasonable ground for believing that 308 NORMAN H. ANDERSON some nonchance effect was obtained. Coombs, 1952) have considered vari- The meaning of a nonchance effect ous scales which lie between the or- rests on an assessment of the design of dinal and interval scales. However, the investigation. Even with judi- it will not be necessary to take this cious design, however, phenomena further refinement of the scale typol- are seldom pinned down in a single ogy into account here. study so that the question of replica- As before, we suppose that we have bility in further work often arises a measuring scale which assigns num- also. The statistical aspects of these bers to events of a certain class. It is these two questions are not without assumed that this measuring scale is importance but tend to be neglected an ordinal scale but not necessarily when too heavy an emphasis is an interval scale. In order to fix placed on p values. As has been ideas, consider the following example. noted, it is the parametric procedures Suppose that we are interested in which are the more useful in both re- studying attitude toward the church. spects. Subjects are randomly assigned to two groups, one of which, reads Com- MEASUREMENT SCALE munication A, while the other reads CONSIDERATIONS Communication B. The subjects' The second and principal part of attitudes towards the church are the article is concerned with the rela- then measured by asking them to tions between types of measurement check a seven category pro-con rating scales and statistical tests. For con- scale. Our problem is whether the venience, therefore, it will be as- data give adequate reason to con- sumed that lack of equinormality clude that the two communications presents no serious problem. Since had different effects. the F ratio remains constant with To ascertain whether the com- changes in unit or zero point of the munications had different effects, measuring scale, we may ignore ratio some statistical test must be ap- scales and consider only ordinal and plied. In some cases, to be sure, the interval scales. These scales are de- effects may be so strong that the test nned following Stevens (1951). can be made by inspection. In most Briefly, an ordinal scale is one in cases, however, some more objective which the events measured are, in method is necessary. An obvious some empirical sense, ordered in the procedure would be to assign the same way as the arithmetic order of numbers 1 to 7, say, to the rating the numbers assigned to them. An scale categories and apply the F test, interval scale has, in addition, an at least if the data presented some equality of unit over different parts semblance of equinormality. How- of the scale. Stevens goes on to char- ever, some writers on statistics (e.g., acterize scale types in terms of Siegel, 1956; Senders, 1958) would permissible transformations. For an object to this on the ground that the ordinal scale, the permissible trans- rating scale is only an ordinal scale, formations are monotone since they the data are therefore not "truly leave rank order unchanged. For an numerical," and hence that the interval scale, only the linear trans- operations of addition and multipli- formations are permissible since cation which are used in computing F only these leave relative distance cannot meaningfully be applied to unchanged. Some workers (e.g., the scores. There are three different PARAMETRIC AND NONPARAMETRIC STATISTICS 309 questions involved in this objection, The case in which equinormality and much of the controversy over does not hold remains to be consid- scales and statistics has arisen from ered. We may still use F, of course, a failure to keep them separate. Ac- and as has been seen in the first part, cordingly, these three questions will we would still have about the same be taken up in turn. significance level in most cases. The Question 1. Can the F test be ap- F test might have less power than a plied to data from an ordinal scale? It rank order test so that the latter is convenient to consider two cases of might be preferable in this simple two this question according as the as- group experiment. However, insofar sumption of equinormality is satis- as we wish to inquire into the relia- fied or not. Suppose first that equi- bility of the difference between the normality obtains. The caveat measured behavior of the two groups against parametric statistics has been in our particular experiment, the stated most explicitly by Siegel choice of statistical test would be (1956) who says: governed by purely statistical consid- The conditions which must be satisfied . . . erations and have nothing to do with before any confidence can be placed in any scale type. probability statement obtained by the use of Question 2. Will statistical results be the t test are at least these: ... 4. The vari- invariant under change of scale? The ables involved must have been measured in at least an interval scale ... (p. 19). (By permis- problem of invariance of result stems sion, from Nonparametric Statistics, by from the work of Stevens (1951) who S. Siegel. Copyright, 1956. McGraw-Hill observes that a computed on Book Company, Inc.) data from a given scale will be invari- This statement of Siegel's is com- ant when the scale is changed accord- pletely incorrect. This particular ing to any given permissible transfor- question admits of no doubt whatso- mation. It is important to be precise ever. The F (or t) test may be about this usage of invariance. It applied without qualm. It will then means that if a statistic is computed answer the question which it was de- from a set of scale values and this signed to answer: can we reasonably statistic is then transformed, the conclude that the difference between identical result will be obtained as the means of the two groups is real when the separate scale values are rather than due to chance? The transformed and the statistic is com- justification for using F is purely puted from these transformed scale statistical and quite straightfor- values. ward ; there is no need to waste space Now our scale of attitude toward on it here. The reader who has the church is admittedly only an doubts on the matter should postpone ordinal scale. Consequently, we them to the discussion of the two would expect it to change in the subsequent questions, or read the direction of an interval scale in future elegant and entertaining article by work. Any such scale change would Lord (1953). As Lord points out, correspond to a monotone transfor- the statistical test can hardly be mation of our original scale since only cognizant of the empirical meaning of such transformations are permissible the numbers with which it deals. with an ordinal scale. Suppose then Consequently, the validity of a sta- that a monotone transformation of tistical inference cannot depend on the scale has been made subsequent the type of measuring scale used. to the experiment on attitude change. 310 NORMAN H. ANDERSON We would then have two sets of data: cooperate in the experimental work, the responses as measured on the making the same observations, except original scale used in the experiment, that they use different measuring and the transformed values of these scales. P decides to measure time responses as measured on the new, intervals. He reasons that it makes transformed scale. (Presumably, sense to speak of one time interval as these transformed scale values would being twice another, that time inter- be the same as the subjects would vals therefore form a ratio scale, and have made had the new scale been hence a fortiori an interval scale. Q used in the original experiment, al- decides to measure the speed of the though this will no doubt depend on process (feet per second, problems per the experimental basis of the new minute). By the same reasoning as scale.) The question at issue then used by P, Q concludes that he has an becomes whether the same signifi- interval scale also. Both P and Q are cance results will be obtained from aware of current strictures about the two sets of data. If rank order tests are used, the same significance results will be found in either case because any permissible transforma- tion leaves rank order unchanged. However, if parametric tests are employed, then different significance Ul statements may be obtained from the 2 two sets of data. It is possible to get a significant F from the original data and not from the transformed data, and vice versa. Worse yet, it is even logically possible that the means of the two groups will lie in reverse order on the two scales. The state of affairs just described is clearly undesirable. If taken uncriti- cally, it would constitute a strong argument for using only rank order tests on ordinal scale data and re- o aUil stricting the use of F to data obtained o_ from interval scales. It is the purpose VI of this section to show that this con- clusion is unwarranted. The basis of the argument is that the naming of the scales has begged the psychologi- cal question. Consider interval scales first, and imagine that two students, P and Q, FIG. 1. Temporal aspects of some process in an elementary lab course are as- obtained from a 2X2 design. (The data are signed to investigate some process. plotted as a function of Variable A with Vari- This process might be a ball rolling on able B as a . Subscripts denote the two levels of each variable. Note that Panel a plane, a rat running an alley, or a P shows an , but that Panel Q does child doing sums. The students not.) PARAMETRIC AND NONPARAMETRIC STATISTICS 311 scales and statistics. However, since each believes (and rightly so) that he has an interval scale, each uses means and applies parametric tests in writ- ing his lab report. Nevertheless, when they compare their reports they find considerable difference in their z descriptive statistics and graphs (Fig- o ure 1), and in their F ratios as well. V) z Consultation with a statistican shows Ul that these differences are direct con- 5 sequences of the difference in the measuring scales. Evidently then, possession of an interval scale does not guarantee invariance of interval scale statistics. For ordinal scales, we would expect to obtain invariance of result by using ordinal scale statistics such as the DIMENSION I median (Stevens, 1951). Let us sup- FIG. 2. The curved line represents the ordi- pose that some future investigator nal scale of attitude toward the church plotted finds that attitude toward the church in the two-dimensional space underlying the is multidimensional in nature and attitude. (Points A and B denote the has, in fact, obtained interval scales of two experimental groups. The graph is for each of the dimensions. In some hypothetical, of course.) of his work he chanced to use our original ordinal scale so that he was contrary to our ordinal scale results. able to find the relation between this Evidently then, possession of an ordinal scale and the multidimen- ordinal scale does not guarantee sional representation of the attitude. invariance of ordinal scale statistics. His results are shown in Figure 2. A rather more drastic loss of invari- Our ordinal scale is represented by ance would occur if the ordinal scale the curved line in the plane of the two were measuring the resultant effect of dimensions. Thus, a greater distance two or more underlying processes. from the origin as measured along the This could happen, for instance, in line stands for a higher value on our the study of approach-avoidance ordinal scale. Points A and B on the conflict, or ambivalent behavior, as curve represent the medians of Groups might be the case with attitude A and B in our experiment, and it is toward the church. In such situa- seen that Group A is more pro-church tions, two people could give identical than Group B on our ordinal scale. responses on the one-dimensional scale The median scores for these two and yet be quite different as regards groups on the two dimensions are the two underlying processes. For obtained simply by projecting Points instance, the same resultant could A and B onto the two dimensions. All occur with two equal opposing tend- is well on Dimension 2 since there encies of any given strength. Repre- Group A is greater than Group B. On senting such data in the space formed Dimension 1, however, a reversal is by the underlying dimensions would found: Group A is less than Group B, yield a smear of points over an entire 312 NORMAN H. ANDERSON region rather than a simple curve as invariance of result over the class of in Figure 2. scale changes which must actually be Although it may be reasonable to considered by the investigator in his think that simple sensory phenomena work. are one-dimensional, it would seem This point is no doubt pretty obvi- that a considerable number of psy- ous, and it should not be thought that chological variables must be con- those who have taken up the scale- ceived of as multidimensional in type ideas are unaware of the prob- nature as, for instance, with "IQ" lem. Stevens, at least, seems to ap- and other personality variables. Ac- preciate the difficulty when, in the cordingly, as the two cited examples concluding section of his 1951 article, show, there is no logical guarantee he distinguishes between psychologi- that the use of ordinal scale statistics cal dimensions and indicants. The will yield invariant results under scale former may be considered as inter- changes. vening variables whereas the latter It is simple to construct analogous are effects or correlates of these vari- examples for nominal scales. How- ables. However, it is evident that an ever, their only relevance would be to indicant may be an interval scale in show that a reduction of all results to the customary sense and yet bear a categorical data does not avoid the complicated relation to the underly- difficulty with invariance. ing psychological dimensions. In such It will be objected, of course, that cases, no procedure of descriptive or the argument of the examples has inferential statistics can guarantee in- violated the initial assumption that variance over the class of scale changes only ' 'permissible" transformations which may become necessary. would be used in changing the meas- It should also be realized that only uring scales. Thus, speed and time a partial list of practical problems of are not linearly related, but rather invariance has been considered. Ef- the one is a reciprocal transformation fects on invariance of improvements of the other. Similarly, Dimension 1 in experimental technique would also of Figure 2 is no monotone transfor- have to be taken into account since mation of the original ordinal scale. such improvements would be ex- This objection is correct, to be sure, pected to purify or change the de- but it simply shows that the problem pendent variable as well as decrease of invariance of result with which one variability. There is, in addition, a is actually faced in science has no problem of invariance over subject particular connection with the invari- population. Most researches are ance of "permissible" statistics. The based on some handy sample of sub- examples which have been cited show jects and leave more or less doubt that knowing the scale type, as deter- about the generality of the results. mined by the commonly accepted Although this becomes in large part criteria, does not imply that future an extrastatistical problem (Wilk & scales measuring the same phenom- Kempthorne, 1955), it is one which ena will be "permissible" transfor- assumes added importance in view of mations of the original scale. Hence Cronbach's (1957) emphasis on the the use of "permissible" statistics, interaction of experimental and sub- although guaranteeing invariance of ject variables. In the face of these result over the class of "permissible" assorted difficulties, it is not easy to transformations, says little about see what utility the scale typology PARAMETRIC AND NONPARAMETRIC STATISTICS 313 has for the practical problems of the mations which must be considered- investigator. In addition, whatever psychological The preceding remarks have been worth the original rating scale pos- intended to put into broader perspec- sesses will limit still further the trans- tive that sort of invariance which is formations which will occur in prac- involved in the use of permissible tice. statistics. They do not, however, Although rank order tests do pos- solve the immediate problem of sess some logical advantage over whether to use rank order tests or F parametric tests when only permissi- in case only permissible transforma- ble transformations are considered, tions need be considered. Although this advantage is, in the writer's invariance under permissible scale opinion, very slight in practice and transformations may be of relatively does not begin to balance the greater minor importance, there is no point in versatility of parametric procedures. taking unnecessary risks without the The problem is, however, an empiri- possibility of compensation. cal one and it would seem that some On this basis, one would perhaps historical analysis is needed to pro- expect to find the greatest use of rank vide an objective frame of reference. order tests in the initial stages of To quote an after-lunch remark of K. inquiry since it is then that measuring MacCorquodale, "Measurement the- scales will be poorest. However, it is ory should be descriptive, not pre- in these initial stages that the possi- scriptive, nor prescriptive." Such an bly relevant variables are not well- inquiry could not fail to be fascinat- known so that the stronger experi- ing because of the light it would mental designs, and hence paramet- throw on the actual progress of ric procedures, are most needed. measurement in . One Thus, it may well be most efficient to investigation of this sort would prob- use parametric tests, balancing any ably be more useful than all the risk due to possible permissible scale speculation which has been written changes against the greater power on the topic of measurement. and versatility of such tests. In the Question 3. Will the use of paramet- later stages of investigation, we ric as opposed to nonparametric would be generally more sure of the statistics affect inferences about under- scales and the use of rank order proce- lying psychological processes? In a dures would waste information which narrow sense, Question 3 is irrelevant the scales by then embody. to this article since the inferences in At the same time, it should be question are substantive, relating to realized that even with a relatively psychological meaning, rather than crude scale such as the rating scale of formal, relating to data reliability. attitude toward the church, the Nevertheless, it is appropriate to possible permissible transformations discuss the matter briefly in order to which are relevant to the present make explicit some of the considera- discussion are somewhat restricted. tions involved because they are often Since the F ratio is invariant under confused with problems arising under change of zero and unit, it is no re- the two previous questions. With no striction to assume that any trans- pretense of covering all aspects of this formed scale also runs from 1 to 7. question, the following two examples This imposes a considerable limita- will at least touch some of the prob- tion on the permissible scale transfor- lems. 314 NORMAN H. ANDERSON The first example concerns the two three degrees of freedom. In such students, P and Q, mentioned above, cases, there will often be other para- who had used time and speed as metric tests involving specific com- dependent variables. We suppose parisons (Snedecor, 1956) or multiple that their experiment was based on a comparisons (Ducan, 1955) which are 2X2 design and yielded means as more appropriate. Occasionally also, plotted in Figure 1. This graph por- an based on a trays main effects of both variables multiplicative model (Williams, 1952) which are seen to be similar in na- will be useful (Jones & Marcus, 1961). ture in both panels. However, our A judicious choice of test may be of principal concern is with the inter- great help in dissecting the results. action which may be visualized as However, the test only answers set measuring the degree of nonparallel- questions concerning the reliability of ism of the two lines in either panel. the results; only the research worker Panel P shows an interaction. The can say which questions are appropri- reciprocals of these same data, plotted ate and meaningful. in Panel Q, show no interaction. It is Inferences based on nonparamet- thus evident in the example, and true ric tests of interaction would pre- in general, that interaction effects sumably be less sensitive to certain will depend strongly on the measur- types of scale changes. However, ing scales used. caution would still be needed in the Assessing an interaction does not interpretation as has been seen in always cause trouble, of course. Had Question 2. The problem is largely the lines in Panel P, say, crossed each academic, however, since few non- other, it would not be likely that any parametric tests of interaction exist.8 change of scale would yield uncrossed It might be suggested that the ques- lines. In many cases also, the scale tion of interaction cannot arise when used is sufficient for the purposes at only the ordinal properties of the hand and future scale changes need data are considered since the interac- not be considered. Nevertheless, it is tion involves a comparison of differ- clear that a measure of caution will ences and such a comparison is illegit- often be needed in making inferences imate with ordinal data. To the from interaction to psychological extent that this suggestion is correct, process. If the investigator envisages a parametric test can be used to the the possibility of future changes in same purposes equally well if not the scale, he should also realize that a better; to the extent that it is not cor- present inference based on significant rect, nonparametric tests will waste interaction may lose credibility in the information. light of the rescaled data. One final comment on the first It is certainly true that the inter- example deserves emphasis. Since pretation of interactions has some- both time and speed are interval times led to error. It may also be scales, it cannot be argued that the noted that the usual factorial design 3 There is a nomenclatural difficulty here. analysis is sometimes incongruent Strictly speaking, nonparametric tests should with the phenomena. In a 2X2 de- be called more-or-less distribution free tests. sign it might happen, for example, For example, the Mood-Brown generalized that three of the four cell means are median test (Mood, 1950) is distribution free, but is based on a of the same equal. The usual analysis is not sort as in the analysis of variance. As noted optimally sensitive to this one real in the introduction, the usual terminology is difference since it is distributed over used in this article. PARAMETRIC AND NONPARAMETRIC STATISTICS 315 difficulty in interpretation arises be- cause we had only ordinal scales. The second example, suggested by J. Kaswan, is shown in Figure 3. The graph, which is hypothetical, plots CO amount of aggressiveness as a func- CO Ul tion of amount of stress. A glance at uC>E the graph leads immediately to the inference that some sort of threshold effect is present. Under increasing stress, the organism remains quies- cent until the stress passes a certain STRESS threshold value, whereupon the or- FIG. 3. Aggressiveness plotted as a function ganism leaps into full scale aggressive of stress. (The curve is hypothetical. Note behavior. the hypothetical threshold effect.) Confidence in this interpretation is shaken when we stop to consider that would be entertained as guides to the scales for stress and aggression future work. Such inferences, how- may not be very good. Perhaps, ever, are made at the judgment of the when future work has given us im- investigator. Statistical techniques proved scales, these same data would may be helpful in evaluating the yield a quite different function such reliability of various features of the as a straight line. data, but only the investigator can One extreme position regarding the endow them with psychological threshold effect would be to say that meaning. the scales give rank order information and no more. The threshold infer- SUMMARY ence, or any inference based on char- This article has compared paramet- acteristics of the curve shape other ric and nonparametric statistics than the uniform upward trend, under two general headings: practical would then be completely disallowed. statistical problems, and measure- At the other extreme, there would be ment theoretical considerations. The complete faith in the scales and all scope of the article is restricted to inferences based on curve shape, situations in which the dependent including the threshold effect, would variable is numerical, thus excluding be made without fear that they would strictly categorical data. be undermined by future changes in Regarding practical problems, it the scales. In practice, one would was noted that the difference between probably adopt a position between parametric and rank order tests was these two extremes, believing, with not great insofar as significance level Mosteller (1958), that our scales and power were concerned. However, generally have some degree of numer- only the versatility of parametric ical information worked into them, statistics meets the everyday needs of and realizing that to consider only the psychological research. It was con- rank order character of the data cluded that parametric procedures would be to ignore the information are the standard tools of psychological that gives the strongest hold on the statistics although nonparametric behavior. procedures are useful minor tech- From this ill-defined middle grou nd, niques. inferences such as the threshold effect Under the heading of measurement 316 NORMAN H. ANDERSON theoretical considerations, three ques- addition, the cited example of time tions were distinguished. The well- and speed showed that interval scales known fact that an interval scale is of a given phenomenon are not not prerequisite to making a statisti- unique. The discussion of the third cal inference based on a parametric question noted that the problem of test was first pointed out. The second psychological meaning is not basi- question took up the important cally a statistical matter. It was thus problem of invariance. It was noted concluded that the type of measuring that the practical problems of invari- scale used had little relevance to the ance or generality of result far trans- question of whether to use paramet- cend measurement scale typology. In ric or nonparametric tests.

REFERENCES ANDERSON, N. H. Education for research in metrika, 1959, 24, 95-112. psychology. Amer. Psychologist, 1959, 14, JONES, F. N., & MARCUS, M. J. The subject 695-696. effect in judgments of subjective magni- BONEAU, C. A. The effects of violations of as- tude. J. exp. Psychol, 1961, 61, 40-44. sumptions underlying the t test. Psychol. LINDQUIST, E. F. Design and analysis of ex- Bull, 1960, 57, 49-64. periments. Boston: Houghton Mifflin, 1953. COCHRAN, W. G. Some consequences when LORD, F. M. On the statistical treatment of the assumptions for the analysis of variance football numbers. Amer. Psychologist, 1953, are not satisfied. Biometrics, 1947, 3, 22-38. 8, 750-751. COCHRAN, W. G. Some methods for strength- MOOD, A. M. Introduction to the theory of sta- ening the common x2 tests. Biometrics, tistics. New York: McGraw-Hill, 1950. 1954, 10, 417-451. MOSTELLER, F. The mystery of the missing COOMBS, C. H. A theory of psychological scal- corpus. Psychometrika, 1958, 23, 279-290. ing. Bull. Engrg. Res. Inst. U. Mich., 1952, RAO, C. R. Advanced statistical methods in No. 34. biometric research. New York: Wiley, 1952. COTTON, J. W. A re-examination of the re- SAVAGE, I. R. Nonparametric statistics. /. peated measurements problem. Paper read Amer. Statist. Ass., 1957, 52, 331-344. at American Statistical Association, Chi- SAWREY, W. L. A distinction between exact cago, December 1959. and approximate nonparametric methods. CRONBACH, L. J. The two disciplines of scien- Psychometrika, 1958, 23, 171-178. tific psychology. Amer. Psychologist, 1957, SENDERS, V. L. Measurement and statistics. 11, 671-684. New York: Oxford, 1958. DIXON, W. J., & MASSEY, F. J., JR. Introduc- SIEGEL, S. Nonparametric statistics. New tion to statistical analysis. (2nd ed.) New York: McGraw-Hill, 1956. York: McGraw-Hill, 1957. SNEDECOR, G. W. Statistical methods. (5th DUNCAN, D. B, Multiple and multiple ed.) Ames: Iowa State Coll. Press, 1956. F tests. Biometrics, 1955, 11, 1-41. STEVENS, S. S. Mathematics, measurement, FEDERER, W. T. Experimental design. New and psychophysics. In S. S. Stevens (Ed.), York: Macmillan, 1955. Handbook of . New FELDT, L. S. A comparison of the precision of York: Wiley, 1951. three experimental designs employing a TATE, M. W., & CLELLAND, R. C. Nonpara- concomitant variable. Psychometrika, 1958, metric and shortcut statistics. Danville, 111.: 23, 335-354. Interstate, 1957. GAITO, J. Nonparametric methods in psycho- WILK, M. B., & KEMPTHORNE, O. Fixed, logical research. Psychol. Rep., 1959, S, 115- mixed, and random models. J. Amer. 125. Statist. Ass., 1955, SO, 1144-1167. GRANT, D. A. Analysis-of-variance tests in WILLIAMS, E. J. The interpretation of inter- the analysis and comparison of curves. Psy- actions in factorial . Bio- chol. Bull, 1956, S3, 141-154. melrika, 1952, 39, 65-81. GREENHOUSE, S. W., & GEISSER, S. On meth- ods in the analysis of profile data. Psycho- (Received April 8, 1960)