A COMPARISON OF METHODS FOR NORMALIZING SCORES

by

SHIRA R. SOLOMON

DISSERTATION

Submitted to the Graduate School

of Wayne State University,

Detroit, Michigan

in partial fulfillment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

2008

MAJOR: EVALUATION AND RESEARCH

Approved by:

______Advisor Date

______

______

______

© COPYRIGHT BY

SHIRA R. SOLOMON

2008

All Rights Reserved

DEDICATION

To my maternal grandmother, Mary Karabenick Brooks, whose love of art, literature, and music has gone hand in hand with her concern for social welfare. To my paternal grandmother, Frances Hechtman Solomon, who played the cards she was dealt with style and wit.


ACKNOWLEDGEMENTS

A dissertation is largely a solitary project, yet it builds on the contributions of many. I have been standing on many shoulders.

First, I would like to thank my major advisor, Professor Shlomo Sawilowsky, whose grand passion for argument made him someone I could relate to and made the work seem worth doing. Dr. Sawilowsky has been generous with his time, technical help, and the spirited exegeses that put this discipline in its true human context.

Professors Gail Fahoome, Judith Abrams, and Leonard Kaplan have brought a great deal to this dissertation and to my graduate experience. Dr. Fahoome has been an excellent teacher, consistently insightful and reassuringly low-key. I lucked into meeting Dr. Abrams through my research assistantship with the medical school. Her assistance and advice have been invaluable. Dr. Kaplan paid me the extraordinary compliment of joining my committee on the brink of his retirement. I am indebted to each of these professors for their intellectual integrity and their simple kindness.

I regret the untimely passing of Professor Donald Marcotte, who would have been proud to see this dissertation completed. Dr. Marcotte provided a wonderful initiation into the world of statistics, with his perennial admonition that the faster you can solve problems, the more time you have to enjoy life.

When it came time to apply for this doctoral program, I reached out to the professors who knew me best. I did not find them, in the end, in the ideological combat zone of my master’s program or in the artful arena of my literary studies. I found them within the seminary walls, among the rabbis and professors who taught me Talmud. Studying Talmud helped me to stop thinking so much and just learn.

For accomplishing this ingenious feat, and for supporting all my educational adventures, I would like to thank Professor David Kraemer, Rabbi Leonard Levy, and Professor Mayer Rabinowitz.

To Bruce Chapman, the teacher who forced inspiration to the forefront, where it belongs: Here’s to you, Captain. To my great friends, Regina DiNunzio, Tom Kilroe, Katy Potter, and Deborah Mougoue, who keep me on my toes.

My parents, Carole and Elliot Solomon, have been the staunchest advocates of this reckless leap. Their unrelenting curiosity and unvarnished pleasure in my pursuits has given me strength. And Mark Sawasky, my constant friend and fan and love, becomes a bigger mensch every day.


TABLE OF CONTENTS

DEDICATION ...... ii

ACKNOWLEDGEMENTS ...... iii

LIST OF TABLES...... vii

LIST OF FIGURES...... ix

CHAPTERS

CHAPTER 1 – Introduction ...... 1

Research problem...... 5

Importance of the problem...... 6

Assumptions and limitations ...... 7

Definitions ...... 8

CHAPTER 2 – Literature review...... 10

Mental testing and the normal distribution ...... 10

Norm-referencing and the T score ...... 11

Nonnormality observed ...... 13

Statistical considerations ...... 14

Standardizing transformations ...... 21

Approaches to creating normal scores ...... 28

CHAPTER 3 – Methodology...... 32

Programming specifications...... 33

Sample sizes...... 33

Number of Monte Carlo repetitions...... 33

Achievement and psychometric distributions...... 33


Presentation of results ...... 34

CHAPTER 4 – Results ...... 43

CHAPTER 5 – Conclusion...... 89

Discussion...... 92

Moment 1—mean ...... 92

Moment 2—standard deviation ...... 92

Moment 3—skewness ...... 95

Moment 4—kurtosis ...... 95

Recommendations...... 96

REFERENCES...... 98

ABSTRACT ...... 110

AUTOBIOGRAPHICAL STATEMENT...... 112


LIST OF TABLES

Table 1. Differences among Ranking Methods in Attaining Target Moments ...... 25

Table 2. Smooth Symmetric—Accuracy of T Scores on Means ...... 45

Table 3. Smooth Symmetric—Accuracy of T Scores on Standard Deviations ...... 46

Table 4. Smooth Symmetric—Accuracy of T Scores on Skewness ...... 47

Table 5. Smooth Symmetric—Accuracy of T Scores on Kurtosis ...... 48

Table 6. Discrete Mass at Zero—Accuracy of T Scores on Means...... 49

Table 7. Discrete Mass at Zero—Accuracy of T Scores on Standard Deviations ...50

Table 8. Discrete Mass at Zero—Accuracy of T Scores on Skewness ...... 51

Table 9. Discrete Mass at Zero—Accuracy of T Scores on Kurtosis...... 52

Table 10. Extreme Asymmetric, Growth—Accuracy of T Scores on Means ...... 53

Table 11. Extreme Asymmetric, Growth—Accuracy of T Scores on Standard Deviations ...... 54

Table 12. Extreme Asymmetric, Growth—Accuracy of T Scores on Skewness...... 55

Table 13. Extreme Asymmetric, Growth—Accuracy of T Scores on Kurtosis ...... 56

Table 14. Digit Preference—Accuracy of T Scores on Means…...... 57

Table 15. Digit Preference—Accuracy of T Scores on Standard Deviations...... 58

Table 16. Digit Preference—Accuracy of T Scores on Skewness...... 59

Table 17. Digit Preference—Accuracy of T Scores on Kurtosis ...... 60

Table 18. Multimodal Lumpy—Accuracy of T Scores on Means...... 61

Table 19. Multimodal Lumpy—Accuracy of T Scores on Standard Deviations ...... 62

Table 20. Multimodal Lumpy—Accuracy of T Scores on Skewness ...... 63

Table 21. Multimodal Lumpy—Accuracy of T Scores on Kurtosis...... 64


Table 22. Mass at Zero with Gap—Accuracy of T Scores on Means...... 65

Table 23. Mass at Zero with Gap—Accuracy of T Scores on Standard Deviations ...... 66

Table 24. Mass at Zero with Gap—Accuracy of T Scores on Skewness ...... 67

Table 25. Mass at Zero with Gap—Accuracy of T Scores on Kurtosis...... 68

Table 26. Extreme Asymmetric, Decay—Accuracy of T Scores on Means...... 69

Table 27. Extreme Asymmetric, Decay—Accuracy of T Scores on Standard Deviations ...... 70

Table 28. Extreme Asymmetric, Decay—Accuracy of T Scores on Skewness ...... 71

Table 29. Extreme Asymmetric, Decay—Accuracy of T Scores on Kurtosis...... 72

Table 30. Extreme Bimodal—Accuracy of T Scores on Means...... 73

Table 31. Extreme Bimodal—Accuracy of T Scores on Standard Deviations ...... 74

Table 32. Extreme Bimodal—Accuracy of T Scores on Skewness ...... 75

Table 33. Extreme Bimodal—Accuracy of T Scores on Kurtosis ...... 76

Table 34. Deviation from Target, Summarized by Moment, Size, and Distribution ...... 90

Table 35. Winning Approximations, Summarized by Moment, Sample Size, and Distribution ...... 91


LIST OF FIGURES

Figure 1. Comparison of Scores in a Normal Distribution ...... 3

Figure 2. Distribution of T Scores Using Blom’s Approximation: Good fit on all four moments ...... 26

Figure 3. Distribution of T Scores Using Blom’s Approximation: Poor fit on second and third moments ...... 27

Figure 4. Distribution of T Scores Using Blom’s Approximation: Poor fit on fourth moment ...... 28

Figure 5. Achievement: Smooth Symmetric ...... 35

Figure 6. Achievement: Discrete Mass at Zero ...... 36

Figure 7. Achievement: Extreme Asymmetric, Growth...... 37

Figure 8. Achievement: Digit Preference...... 38

Figure 9. Achievement: Multimodal Lumpy ...... 39

Figure 10. Psychometric: Mass at Zero with Gap...... 40

Figure 11. Psychometric: Extreme Asymmetric, Decay...... 41

Figure 12. Psychometric: Extreme Bimodal ...... 42

Figure 13. Smooth Symmetric: Power curve for deviation range of standard deviation ...... 78

Figure 14. Smooth Symmetric: Power curve for deviation range of kurtosis...... 78

Figure 15. Discrete Mass at Zero: Power curve for deviation range of standard deviation...... 79

Figure 16. Discrete Mass at Zero: Power curve for deviation range of kurtosis ...... 79


Figure 17. Extreme Asymmetric, Growth: Power curve for deviation range of standard deviation...... 80

Figure 18. Extreme Asymmetric, Growth: Power curve for deviation range of kurtosis...... 80

Figure 19. Digit Preference: Power curve for deviation range of standard deviation...... 81

Figure 20. Digit Preference: Power curve for deviation range of kurtosis...... 81

Figure 21. Multimodal Lumpy: Power curve for deviation range of standard deviation...... 82

Figure 22. Multimodal Lumpy: Power curve for deviation range of kurtosis ...... 82

Figure 23. Mass at Zero with Gap: Power curve for deviation range of standard deviation...... 83

Figure 24. Mass at Zero with Gap: Power curve for deviation range of kurtosis .....83

Figure 25. Extreme Asymmetric, Decay: Power curve for deviation range of standard deviation...... 84

Figure 26. Extreme Asymmetric, Decay: Power curve for deviation range of kurtosis...... 84

Figure 27. Extreme Bimodal: Power curve for deviation range of standard deviation...... 85

Figure 28. Extreme Bimodal: Power curve for deviation range of kurtosis...... 85

Figure 29. Smooth Symmetric: Power curve for deviation range of standard deviation with inclusion of large sample sizes ...... 87


Figure 30. Digit Preference: Power curve for deviation range of standard deviation with inclusion of large sample sizes ...... 87

Figure 31. Mass at Zero with Gap: Power curve for deviation range of kurtosis with inclusion of large sample sizes...... 88


CHAPTER 1

INTRODUCTION

To those who believe that “the purpose of data analysis is to analyze data better” it is clearly wise to learn what a procedure really seems to be telling us about.

(J. W. Tukey, 1962)

Standardized tests can be used to determine aptitude or achievement (Thorndike, 1982). Whether the goal of a test is to measure differences in ability, personality, or mastery of a subject, it is necessary to analyze individual scores relative to others in the group and also to analyze group scores relative to other group scores (Angoff, 1971; AERA, APA, & NCME, 1999; Netemeyer, Bearden, & Sharma, 2003). Scores are ultimately interpreted according to the purpose of the test. For example, academic aptitude tests are likely to be interpreted competitively, with high performing students favored for scholarships or admission to selective programs and low performing students targeted for remediation.

Achievement tests are typically interpreted in the light of performance benchmarks and used to measure the adequacy of teaching methods or school performance.

Analysis for either purpose requires a frame of reference for the interpretation of raw scores (Aiken, 1994).

Standardization and normalization are two ways of defining the frame of reference for a distribution of test scores. Both types of score conversions, or transformations, mathematically modify raw score values (Osborne, 2002). The defining feature of standard scores is that they use standard deviations to describe scores’ distance from the mean, thereby creating equal units of measure within a given score distribution. Standard scores may be modified to change the scale’s number system (Angoff, 1984), but unless distributions of standard scores are normalized, they will retain the shape of the original score distribution. Therefore, standardization may enable effective analysis of individual scores within a single test, but it does not lead to meaningful comparisons between tests.

Normalization surmounts this limitation by equalizing the areas under the curve that correspond with scores’ successive intervals along the curve.

Normalization is considered a type of area transformation because it “redefines the unit separations”(Angoff, 1984, p.36), changing the shape of the distribution itself.

Normalization has two great strengths, the first of which is shared by standardization: 1) it transforms ordinal scales into continuous scales, which are mathematically tractable; and 2) it superimposes a normal curve onto nonnormal distributional shapes, allowing for between-test comparisons.

Normal scores may be scaled to make them easier to interpret. For example, the formula T = 10Z + 50 replaces normalized standard scores with T scores, which have a mean of 50 and a standard deviation of 10. Many normal score systems are assigned means and standard deviations that correspond with the T score. For example, the College Entrance Examination Board’s Scholastic Aptitude Test (SAT) Verbal and Mathematical sections are scaled to a mean of 500 and a standard deviation of 100. Thus, T scores fall between 20 and 80 and SAT scores fall between 200 and 800. Other normalized standard scores include normal curve equivalent (NCE) scores, which have a mean of 50, a standard deviation of 21, and a score range of 1-99; Wechsler scales, which have a mean of 100, a standard deviation of 15, and an effective score range of 55-145 (three standard deviations on either side of the mean); and stanines, which have a mean of 5, a standard deviation of 2, and a finite score range of 1-9.
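To make the arithmetic of these scalings explicit, the short sketch below (a Python illustration added for clarity; it is not part of the original study, and the helper name rescale is my own) converts a single normal deviate Z onto the T, SAT, NCE, and stanine scales using the generic rule score = mean + SD x Z.

```python
def rescale(z, mean, sd):
    """Map a normal deviate z onto a scale with the given mean and standard deviation."""
    return mean + sd * z

z = 1.5                                             # a score 1.5 SDs above the mean
t_score = rescale(z, 50, 10)                        # T scale: 10Z + 50 -> 65.0
sat = rescale(z, 500, 100)                          # SAT scale -> 650.0
nce = rescale(z, 50, 21)                            # NCE scale -> 81.5
stanine = min(9, max(1, round(rescale(z, 5, 2))))   # stanine, clipped to the 1-9 range -> 8
print(t_score, sat, nce, stanine)
```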

Figure 1. Comparison of scores in a normal distribution. (Adapted from Test Service Bulletin of The Psychological Corporation, 1955)

The first step in the process of converting raw scores into T scores or other scaled, normal scores is a ranking of the raw scores according to their relative placement on the unit normal distribution. This means that the raw scores will no longer be used to characterize the test score distribution. Instead, raw scores will be replaced by an estimate of their normal probability deviates. Whereas raw scores originally refer to individual coordinates, they are transformed to become components in the two-dimensional spaces, or categories, which comprise the area under the normal distribution. Once these normal probability deviates, or Z scores, are obtained, the desired mean and standard deviation are applied. In the case of T scores, Z scores are multiplied by 10 and assigned a mean of 50.
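A compact sketch of this two-step process follows (Python, using scipy; the function name t_scores_from_raw and the choice of the Rankit estimate (r - 1/2)/n are illustrative assumptions, not the dissertation's code). Ranks are converted to estimated cumulative proportions, the inverse normal function supplies the Z scores, and the Z scores are scaled to T scores.

```python
import numpy as np
from scipy.stats import rankdata, norm

def t_scores_from_raw(raw, estimate=lambda r, n: (r - 0.5) / n):
    """Normalize raw scores to T scores via ranks and a proportion estimate."""
    raw = np.asarray(raw, dtype=float)
    n = raw.size
    r = rankdata(raw)                # ordinal ranks (ties receive average ranks)
    p = estimate(r, n)               # estimated cumulative proportions
    z = norm.ppf(p)                  # normal probability deviates (Z scores)
    return 10.0 * z + 50.0           # T = 10Z + 50

raw = [3, 7, 7, 12, 15, 18, 22, 22, 24, 30]
print(np.round(t_scores_from_raw(raw), 2))
```

Any of the four proportion estimates compared in this study could be passed in place of the Rankit formula used here.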

A number of ranking methods that improve on the accuracy and reliability of the traditional method have been developed in the last 60 years. These ranking methods are sometimes referred to as proportion estimates because they approximate where the ordinal scores fall along a normal distribution and how much of the corresponding area under the curve the ranked, cumulative proportions occupy. The most prominent of these procedures, based on their inclusion in widely used statistical software (e.g., SPSS, 2006), are those attributed to Van der Waerden (1952, 1953a, 1953b; Lehmann, 1975), Blom (1958), and Tukey (1962), and the Rankit procedure (Ipsen & Jerne, 1944; Bliss, 1956). These proportion estimates have been explored to various degrees in the context of hypothesis testing, where the focus is necessarily on the properties of these estimates in the tails of a distribution. In the context of standardized testing, however, the body of the distribution—that is, the 95% of the curve that lies between the tails—is the focus. To date, there has been no empirical comparison of these ranking methods as they apply to standardized testing.

When normalizing standard scores, practitioners need to know the comparative effects of their selected ranking method on the transformed score outcomes. Specifically, during the transformation of Z scores into T scores, the practitioner would benefit from knowing each method’s potential accuracy and how frequently it is capable of attaining a specific level of accuracy. Conversely, each method’s likely degree and frequency of inaccuracy should be taken into account.


For T scores, the criteria for comparing ranking methods are the accuracy and frequency of random scores’ attainment of a mean of 50 and a standard deviation of 10. The standard deviation of T scores is not, by itself, a useful point of comparison because it is computed about the mean; its degree of accuracy therefore derives from that of the mean and cannot serve as an independent reference point. Once the mean has been shown to be 50, however, the accuracy of the standard deviation (that is, how nearly and how frequently it attains a value of 10) is equally important.

T scores express only the first and second moments of the distribution, central tendency (mean) and variability (standard deviation), but they may also be affected by the third and fourth moments, asymmetry (skewness) and peakedness (kurtosis). Although each of these ranking methods is designed to produce a unit normal score distribution, they may not achieve ideal skewness and kurtosis. A normal curve is perfectly symmetrical, meaning it has zero skew. A kurtosis of three (3) means the shape of the curve is neither more peaked nor flatter than the shape of an idealized normal distribution. It is necessary to examine the skewness and kurtosis of T scores, in addition to their means and standard deviations, in order to fully evaluate each ranking method’s effectiveness in normalizing test scores.
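The evaluation criteria just described can be computed directly; the sketch below (Python, my own illustration rather than the study's code) returns the four moments of a set of T scores and their absolute deviations from the targets of 50, 10, 0, and 3. Note that scipy reports excess kurtosis by default, so fisher=False is passed to obtain the convention in which a normal curve has kurtosis 3.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def moment_deviations(t_scores):
    """Return the four moments of T scores and their distances from target."""
    t = np.asarray(t_scores, dtype=float)
    moments = {
        "mean": t.mean(),                        # target 50
        "sd": t.std(ddof=1),                     # target 10
        "skew": skew(t),                         # target 0
        "kurtosis": kurtosis(t, fisher=False),   # target 3 (normal curve)
    }
    targets = {"mean": 50.0, "sd": 10.0, "skew": 0.0, "kurtosis": 3.0}
    deviations = {k: abs(moments[k] - targets[k]) for k in moments}
    return moments, deviations

m, d = moment_deviations([38.4, 44.7, 48.1, 50.0, 51.9, 55.3, 61.6])
print(d)
```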

Research Problem

Given the importance of transforming Z scores to a scale that preserves a mean of 50 and a standard deviation of 10, this study aims to empirically demonstrate the relative accuracy of the Blom, Tukey, Van der Waerden, and Rankit approximations for the purpose of normalizing test scores. It will compare their accuracy in terms of achieving the T score’s specified mean and standard deviation and unit normal skewness and kurtosis, among small and large sample sizes in an array of real, nonnormal distributions. Although this objective is an applied one, the investigation will benefit the theoretical advancement of area estimation under the normal distribution.

Importance of the Problem

Standardized test scores, even scores abiding by the familiar T score scale, are notoriously difficult to interpret (Micceri, 1990). Most test-takers, parents, and even many educators, would be at a loss to explain exactly what a score of 39, 73, or 428 means in conventional terms, such as pass/fail, percentage of questions answered correctly, or performance relative to other test-takers. The matter is complicated by measurement error. Once error is computed and added to or subtracted from a given test score, it reveals a range of possible true scores.

Thus, a standard error of three would produce a six-point range: it would show the score 52 to be potentially as low as 49 or as high as 55. This example assumes that the mean is 50. However, if a different ranking method produces a mean of 51, the test-taker’s score would be between 50 and 56—or combining the two methods’ results, between 49 and 56. If yet another method produces a mean closer to 49, then theoretically, a test-taker’s true score could lie anywhere between 48 and 56. The potential range of true scores expands with each alternate method of computing the normalized score. Error is not a fixed quantity; it may vary across computational methods as well as sample sizes and statistical distributions.
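A minimal sketch of the arithmetic in this example follows (Python; the function name and the assumed standard error of three are illustrative). It shows how the band of possible true scores widens as alternative normalized means are admitted.

```python
def true_score_band(observed, method_means, baseline_mean=50.0, sem=3.0):
    """Union of observed-score bands after re-centering on each method's mean."""
    low, high = observed - sem, observed + sem
    for m in method_means:
        shifted = observed + (m - baseline_mean)   # the same score under another mean
        low, high = min(low, shifted - sem), max(high, shifted + sem)
    return low, high

# An observed T score of 52 with a standard error of 3; alternative ranking
# methods yield means of 51 and 49, as in the example above.
print(true_score_band(52, [51, 49]))   # -> (48.0, 56.0)
```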

The accuracy, both in terms of degree and frequency, of the four most visible ranking methods has not been established. Blom, Tukey, Van der Waerden, and Rankit each contribute a ranking formula that approximates a normal distribution, given a set of raw scores or nonnormalized standard scores. However, the formulas themselves have not been systematically compared for their first four moments’ accuracy in terms of normally distributed data. Nor have they been compared in the harsher glare of nonnormal distributions, which are prevalent in the fields of education and psychology (Micceri, 1989). Small samples are also common in real data and are known to have different statistical properties than large samples (Conover, 1980). In general, real data can be assumed to behave differently than data that is based on theoretical distributions, even if these are nonnormal (Stigler, 1977).

Assumptions and Limitations

A series of Monte Carlo simulations will draw samples of different sizes from eight different empirically established population distributions. These eight distributions, though extensive in their representation of real achievement and psychometric test scores, do not represent all possible distributions that could occur in educational and psychological testing, or in social and behavioral science investigations more generally. Nor do the sample sizes represent every possible increment. However, both the sample size increments and the range of distributional types are assumed to be sufficient for the purpose of outlining the comparative accuracy and reliability of the ranking methods in real settings.

Although the interpretation of results need not be restricted to educational and psychological data, similar distributional types may be most often found in these domains.

Definitions

Z scores: Raw scores or random variables that have undergone the standardizing transformation Z = (X – μ) / σ, where μ is the population mean and σ is the population standard deviation. Also called unmodified standard scores.

Normal scores: Raw scores or standard scores that have undergone a normalizing transformation such that the ordinal ranks of scores correspond to their probability deviates on the unit normal distribution.

T scores: Raw scores or standard scores that have undergone the scaling transformation T = 10Z + 50, where Z is the normal probability deviate corresponding to the ordinal rank of the original raw or standard score.

Proportion estimates: Approximation formulas estimating the cumulative areas under a unit normal distribution that fall below the ordinal rankings of test scores.

Rankit approximation: A proportion estimate using the formula (r – 1/2) / n.*

Van der Waerden’s approximation: A proportion estimate using the formula r / (n + 1), where r is the rank, ranging from 1 to n.

Blom’s approximation: A proportion estimate using the formula (r – 3/8) / (n + 1/4).

Tukey’s approximation: A proportion estimate using the formula (r – 1/3) / (n + 1/3).

Monte Carlo simulation: A statistical experiment modeled on a computer that uses an iterative random process, usually with replacement of data values, to demonstrate the behavior of statistical methods under specified conditions.

* Notation for these four approximation formulas varies in the literature: 1) r is used interchangeably with i and k; and 2) n is used interchangeably with w.
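Because the four formulas above differ only in small constants, they are easy to state side by side. The following sketch (Python; the function names are mine) computes the estimated cumulative proportion for each rank in a sample of size n and converts it to a normal deviate with the inverse normal function.

```python
from scipy.stats import norm

def rankit(r, n):
    return (r - 0.5) / n

def van_der_waerden(r, n):
    return r / (n + 1.0)

def blom(r, n):
    return (r - 0.375) / (n + 0.25)

def tukey(r, n):
    return (r - 1.0 / 3.0) / (n + 1.0 / 3.0)

# Normal deviates implied by each formula for every rank in a sample of n = 10.
n = 10
for name, formula in [("Rankit", rankit), ("Van der Waerden", van_der_waerden),
                      ("Blom", blom), ("Tukey", tukey)]:
    deviates = [round(norm.ppf(formula(r, n)), 3) for r in range(1, n + 1)]
    print(f"{name:16s}", deviates)
```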


CHAPTER 2

LITERATURE REVIEW

The development of ranking methods stems from two related enterprises: the psychological effort to measure mental phenomena and the statistical effort to calculate the area under the unit normal distribution. Knowledge, intellectual ability, and personality are psychological objects that can only be measured indirectly, not by direct observation (Dunn-Rankin, 1983). The scales that describe them are hierarchical—they result in higher or lower scores—but these scores do not express exact quantities of test-takers’ proficiency or attitudes.

Likert scales, which are ordinal, and multiple choice items, which produce discrete score scales, result in numbers that are meaningless without purposeful statistical interpretation (Nanna & Sawilowsky, 1998). Measures with unevenly spaced increments interfere with the interpretation of test scores against performance benchmarks, the longitudinal linking of test editions, and the equating of parallel forms of large-scale tests (Aiken, 1987). They also threaten the robustness and power of the parametric statistical procedures that are conventionally used to analyze standardized test scores (Friedman, 1937; Sawilowsky & Blair, 1992).

Mental Testing and the Normal Distribution

Standardized test scores present a unique set of statistical considerations because the scoring system may be devised for different purposes. Mehrens and Lehmann (1987) characterized these purposes as instructional, guidance, administrative, or research, but admittedly, these purposes often overlap. If the purpose of a test is to discriminate between test-takers’ ability or achievement levels, the scoring system would create maximum variability between scores. If its purpose is to evaluate students’ progress toward a specified objective, then the degree of variability between scores is less relevant. Apart from the natural range of test-takers’ aptitude, subject-matter proficiency, and range of attitudes or personality characteristics, a test’s design has a strong influence on its score distribution.

Norm-Referencing and the T Score

The history of testing is fraught with incorrect distributional assumptions.

According to Angoff (1984), “the assumption underlying the search for equal units was that mental ability is fundamentally normally distributed and that equal segments on the base line of a normal curve would pace off equal units of mental ability”(p.11). McCall (1939) devised the T score scale on this same assumption, naming it after the educational and psychological measurement pioneers Thorndike and Terman (Walker & Lev, 1969). McCall derived a normal scale by randomly selecting individuals from a population that was presumed to be homogenous, testing them, creating a distribution from their scores, and transforming their percentile ranks to normal deviate scores with a preassigned mean of 50 and standard deviation of 10. Today, this method would be considered appropriate for norm-referencing a test to a target population, but thoroughly inappropriate for determining any true ability distribution. Although there is no reason to assume that cognitive phenomena are normally distributed, norm-referencing can be useful for comparing individuals’ performance to others in the same population.

Even when norming makes correct distributional assumptions, it can be problematic. Angoff (1971) argued against normative scoring systems that have built-in, definitional, or inherent meaning. These meanings are liable to be lost over time or to become irrelevant. Aiken (1994) cautioned that norms can become outdated even more quickly in certain circumstances: “for example, changes in school curricula may necessitate restandardizing and perhaps modifying and reconstructing an achievement test every 5 years or so”(p.78). Furthermore, scales can function independently of direct representation. For example, inches, pounds, and degrees Fahrenheit no longer reference their original object for most Americans, but serve as effective measures nonetheless, due to their familiarity and reliability. Likewise, the T score owes much of its usefulness to its longstanding place as the scale of choice.

Despite these arguments, Mehrens and Lehmann (1987) viewed norm-referencing as the basis for most testing theory and practice. It is “useful in aptitude testing where we wish to make differential predictions. It is also very useful to achievement testing”(p.18). They also noted that standardized tests are often used in both norm-referenced and criterion-referenced contexts; they may be constructed and interpreted to simultaneously compare a student’s performance relative to other students in the target test-taking population as well as to evaluate the student’s absolute knowledge of a subject. Norms may be referenced to national, regional, and local standards; age and grade; mental age; percentile ranks; or standard scores that are a function of a specific group’s performance.

Nonnormality Observed

According to Nunnally (1978), “test scores are seldom normally distributed”(p.160). Micceri (1989) demonstrated the extent of this phenomenon in the social and behavioral sciences by evaluating the distributional characteristics of 440 real data sets collected from the fields of education and psychology. Standardized scores from national, statewide, and districtwide test scores accounted for 40% of them. Sources included the Comprehensive Test of Basic Skills (CTBS), the California Achievement Tests, the Comprehensive Assessment Program, the Stanford Reading tests, the Scholastic Aptitude Tests (SATs), the College Board subject area tests, the American College Tests (ACTs), the Graduate Record Examinations (GREs), Florida Teacher Certification Examinations for adults, and Florida State Assessment Program test scores for 3rd, 5th, 8th, 10th, and 11th grades.

Micceri summarized the tail weights, asymmetry, modality, and digit preferences for the ability measures, psychometric measures, criterion/mastery measures, and gain scores. Over the 440 data sets, Micceri found that only 19 (4.3%) approximated the normal distribution. No achievement measure’s scores exhibited symmetry, smoothness, unimodality, or tail weights that were similar to the Gaussian distribution. Underscoring the conclusion that normality is virtually nonexistent in educational and psychological data, none of the 440 data sets passed the Kolmogorov-Smirnov test of normality at alpha = .01, including the 19 that were relatively symmetric with light tails. The data collected from this study highlight the prevalence of nonnormality in real social and behavioral science data sets:

The great variety of shapes and forms suggests that respondent samples themselves consist of a variety of extremely heterogeneous subgroups, varying within populations on different yet similar traits that influence scores for specific measures. When this is considered in addition to the expected dependency inherent in such measures, it is somewhat unnerving to even dare think that the distributions studied here may not represent most of the distribution types to be found among the true populations of ability and psychometric measures. (Micceri, 1989, p.162)

Furthermore, it is unlikely that the central limit theorem will rehabilitate the demonstrated prevalence of nonnormal data sets in applied settings. Tapia and Thompson (1978) warned against the “fallacious overgeneralization of central limit theorem properties from sample means to individual scores”(cited in Micceri, 1989, p.163). Although sample means may increasingly approximate the normal distribution as sample sizes increase (Student, 1908), it is wrong to assume that the original population of scores is normally distributed. According to Friedman (1937), “this is especially apt to be the case with social and economic data, where the normal distribution is likely to be the exception rather than the rule”(p.675).
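Micceri’s screening, described above, relied on the Kolmogorov-Smirnov test of normality at alpha = .01. The sketch below (Python with scipy) illustrates that kind of check on a single data set; it is an illustration of the test itself, not a reconstruction of Micceri’s procedure, and strictly speaking the use of an estimated mean and standard deviation calls for the Lilliefors correction.

```python
import numpy as np
from scipy.stats import kstest

def ks_normality_check(scores, alpha=0.01):
    """Kolmogorov-Smirnov test of the standardized scores against N(0, 1)."""
    x = np.asarray(scores, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)     # standardize with estimated parameters
    stat, p_value = kstest(z, "norm")
    return stat, p_value, p_value > alpha  # True -> normality not rejected

# Example with strongly skewed data: normality should be rejected.
rng = np.random.default_rng(1)
print(ks_normality_check(rng.exponential(size=500)))
```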

Statistical Considerations

There has been considerable empirical evidence that raw and standardized test scores are nonnormally distributed in the social and behavioral sciences. In addition to Micceri (1989), numerous authors have raised concerns regarding the assumption of normally distributed data (Pearson, 1895; Wilson & Hilferty, 1929; Allport, 1934; Simon, 1955; Tukey & McLaughlin, 1963; Andrews et al., 1972; Pearson & Please, 1975; Stigler, 1977; Bradley, 1978; Tapia & Thompson, 1978; Tan, 1982; Sawilowsky & Blair, 1992). Bradley (1977) summarized the rationale for adopting a statistical approach that responds to the fundamental nonnormality of most real data:

One often hears the objection that if a distribution has a bizarre shape one should simply find and control the variable responsible for it. This outlook is appropriate enough to the area of , but it is inappropriate to the behavioral sciences, and perhaps other areas, where the experimenter, even if he knew about the culprit variable and its influence upon population shape, is generally not interested in eliminating an assignable cause, but rather in coping with (i.e., drawing inferences about) a population in which it is free to vary. (p.149)

The prevalence of nonnormal distributions in education, psychology, and related disciplines calls for a closer look at transformation procedures in the domain of achievement and psychometric test scoring.

Transformations take many forms, ranging from the unadjusted linear transformation to the logarithmic, square root, arc-sine, reciprocal, and inverse normal scores transformations. Percentiles may also be staging a comeback.

Zimmerman and Zumbo (2005) argued that “a transformation to percentiles or deciles is also similar to various normalizing transformations” insofar as those transformations “bring sample values from nonnormal populations closer to a normal distribution”(p.636). Percentile ranks denote the percentage of scores falling below a certain point on the score scale. They compared the assignment of percentile values to raw scores with the assignment of ranks to raw scores.

Traditionally, ranking was done by computing percentile ranks for the raw scores, then finding the corresponding values from a normal probability distribution. Today, statistical ranking formulas such as the Blom, Tukey, Van der Waerden, and Rankit are used to estimate the normal probability deviates. Both percentiles and statistical ranking methods minimize several types of deviations from normality, but according to Zimmerman and Zumbo, “the percentile transformation preserves the relative magnitude of scores between samples as well as within samples”(p.635). This may be advantageous in certain circumstances, but normalizing transformations have enduring appeal due to their familiarity and efficiency.

History of normalizing transformations. An ordinal scale presents only score ranks, without any reference to the distance between those ranks. There is no way of knowing whether the distance between ranks (for example, the second-highest and third-highest scores in a set) is similar to that between other ranks in the set.

Theorists have proposed proportion estimation formulas to deduce the average distance between ranks based on what is known about the properties of the unit normal distribution.

As described by Harter (1961):

The problem of order statistics has received a great deal of attention from statisticians dating at least as far back as a paper by Karl Pearson (1902) giving a solution of a generalization of a problem proposed by Galton (1902). The generalized problem is that of finding the average difference between the pth and the (p+1)th individuals in a sample of size n when the sample is arranged in an order of magnitude. (p.151)

Other early attempts at characterizing distances among ordinal scales include Irwin (1925); Tippet (1925); Thurstone (1928); Pearson and Pearson (1931); Fisher and Yates (1938, 1953); Ipsen and Jerne (1944); Hastings, Mosteller, Tukey, and Winsor (1947); Wilks (1948); Godwin (1949); Federer (1951); Mosteller (1951); Bradley and Terry (1952); Scheffé (1952); Cadwell (1953); Pearson and Hartley (1954); Blom (1954); Kendall (1955); and Harter (1959).

The pursuit of a useful way to characterize the difference between ordinal points on a scale has primarily stemmed from the concerns of hypothesis testing.

This context has driven a focus on interval estimates and the extremes of the normal distribution, because these are the areas that define the null hypothesis.

Standardized testing, on the other hand, is primarily concerned with the differences that characterize the body of the score distribution. In many research settings, ordinal scales are often mathematically transformed into continuous scales in order to be analyzed using parametric methods. According to Tukey (1957):

The analysis of data usually proceeds more easily if (1) effects are additive; (2) the error variability is constant; (3) the error distribution is symmetrical and possibly nearly normal. The conventional purposes of transformation are to increase the degrees of approximation to which these desirable properties hold (p.609).

Transforming scales to a higher level of measurement leads to the problem of gaps. “It is inevitable that gaps occur in the conversions when there are more scale score points than raw score points, and gaps may be more of a problem for some transformation methods and tests than for others”(Chang, 2006, p.927). For this reason, Bartlett advised “that even when measurements are available it may be safer to analyze by use of ranks”(1947, p.50) by transforming them to expected normal scores. “It is reasonable to assume that if the ranked data were replaced by expected normal scores, the validity of the [analysis] would be somewhat improved”(p.50).


Transforming ordinal data into a continuous scale has been popular since Fisher and Yates tabled the normal deviates in 1938. According to Wimberly (1975):

An inherently linear relationship among the T-scores of different variables is free of mismatched kurtoses, [skews], and standard deviations which attenuate correlations or which lead to artificial non-linearities in regressions. Furthermore, the T-score transformation should generally result in a more nearly normal distribution than that provided by other transformations such as those from logarithms, exponents, or roots. (p.694)

T scores also have the advantage of being the most familiar scale, thus facilitating score interpretation. The prime importance of interpretability has been stressed by Petersen et al. (1989), Kolen and Brennan (2004), and Chang (2006).

Blom (1954) observed that “nearly all the transformations used hitherto in the literature for normalization of binomial and related variables can be developed from a common starting point”(p.303). Blom was referring to the use of the normal probability integral to solve tail and confidence problems associated with certain transformations, but this generalization holds conceptual value as well. The fact that test scores are ordinal can be understood as the statistical point of origin for the advantages and liabilities of normalizing transformations.

Transformation controversies. There has been considerable debate about the statistical properties of various data transformations in the context of hypothesis testing. This literature originally concerned the robustness of statistics such as the analysis of variance (ANOVA) to Type I error (Glass, Peckham, & Sanders, 1972). Many early studies concluded that transformations are unnecessary for ANOVA because the F test is impervious to Type I error except in cases of heterogeneity of variance and unequal sample sizes. Srivastava (1959), Games and Lucas (1966), and Donaldson (1968) explored both Type I and Type II error rates for the F test among nonnormally distributed data, suggesting that the test’s power increased in cases of extreme skew and acute kurtosis.

Levine and Dunlap (1982) argued that power can generally be increased by transforming skewed and heteroscedastic data. They took issue with the more conservative approach of Games and Lucas, who “viewed transformation of data as defensible only if it produced Type I error rates closer to the nominal significance level when the null hypothesis was true and a lowered probability of Type II errors (i.e., higher power) when the null hypothesis was false”(p.273). For Levine and Dunlap, data transformations can do more than minimize error under specific violations of ANOVA assumptions. They can be used for the express purpose of increasing power.

Games (1983) proceeded to redefine the argument by repositioning skewness among the other three moments (central tendency, variability, and kurtosis) that are changed by normalizing transformations. Power fluctuations should be seen as resulting from the combination of transformed moments, not skewness alone. Furthermore, Games argued that normalizing transformations should not be undertaken out of a mechanistic desire to correct skew and increase power. In line with Bradley (1978), Games (1983) held that “if Y has been designated as the appropriate scale for psychological interpretation, then the observation that Y is skewed is certainly an inadequate basis to cause one to switch to a curvilinear transformation”(p.385-6).


Games also questioned the process of selecting transformations for variance stabilization and normalization. “It is possible that a variance stabilizing transformation may not be normalizing, and vice versa”(p.386), especially with small samples. Games criticized Levine and Dunlap for not recognizing the complexity of the decision to transform and the difficulty of evaluating the appropriateness of specific transformations for specific purposes. Finally, Games asserted that Levine and Dunlap generated their findings under irrelevant statistical conditions (their sample data were neither skewed nor heteroscedastic), which led to a facile conclusion. “Nobody in the literature has advocated taking such data and applying a transformation”(p.386).

Levine and Dunlap (1983) disputed Games’ (1983) criticism, foremost the assertion that transformations ought to be undertaken exclusively to correct skew.

Claiming that empirical demonstrations are insufficient, they invoked Kendall and Stuart’s (1979) mathematical proof that the independent samples t test is the most powerful statistical test in the case of normal, homoscedastic data. In short order, Games (1984) rebutted Levine and Dunlap based on their “failure to distinguish theoretical models from empirical data”(p.345), resulting in a fatal misrepresentation of the behavior of empirical data.

Levine, Liukkonen, and Levine “partially resolved”(1992, p.680) this debate by developing a technique that identifies the effect of variance-stabilizing, symmetrizing transformations on power. In line with Levine and Dunlap (1982, 1983), they concluded, albeit tentatively, that normalizing transformations could indeed increase power for highly skewed data with equal sample sizes. This represents a concession to Games’ (1983) emphasis on the dictates of observed data: “In the absence of knowledge about the population distribution, we must rely on the data itself to give clues as to which transformation to use”(p.691).

The Games-Levine controversy concerned the implications of transformations for inferential statistical tests such as ANOVA. Here, transformations may help to better meet parametric statistics’ underlying assumptions and thereby reduce Type I and Type II errors. As this exchange demonstrated, however, it is difficult to determine when it is justified to use a transformation. The answer lies in the characteristics of the population, which can only be inferred. Even when egregious assumptions violations seem to warrant a transformation, it is not known to what extent the transformation corrects the condition. Finally, once a transformation has changed the data’s original metric, the resulting test statistic may become unintelligible in terms of the research question (Bradley, 1978; Games, 1983).

In standardized testing, on the other hand, transformations serve to clarify non-intuitive test scores. For example, the normalizing T score transformation takes raw scores from any number of different metrics, few of which would be familiar to a test taker, teacher, or administrator, and gives them a common framework. Therefore, the T score is immune to the restrictions that limit normalizing transformations in hypothesis testing scenarios.

Standardizing Transformations

Although standard scores may be assigned any mean and standard deviation through linear scaling, the Z score transformation, which produces a mean of 0 and a standard deviation of 1 for normally distributed variables, is the baseline standardization technique (Walker & Lev, 1969; Mehrens & Lehmann, 1980; Hinkle, Wiersma, & Jurs, 2003). In the case of normally distributed data, Z scores are produced by dividing the deviation score (the difference between raw scores and the mean of their distribution) by the standard deviation. However, Z scores can be difficult to interpret because they produce decimals and negative numbers. Because virtually all of the scores fall between -3 and +3, small changes in decimals may imply large changes in performance. Also, because half the scores are negative, it gives the impression to the uninitiated that half of the examinees obtained an extremely poor outcome.

Linear scaling techniques. These problems can be remedied by multiplying standard scores by a number sufficiently large to render decimal places trivial, then adding a number large enough to eliminate negative numbers. The most common type of modified standard score is one that multiplies Z scores by 10 to obtain their standard deviation from the scaled mean of 50 (Cronbach, 1976; Kline, 2000). This linear, scaling modification is sometimes confused with the T score formula, which is a nonlinear, normalizing transformation. On the surface, the T score formula resembles the modified standardization formula, but it operates on a different principle. In the modified standard score formula Xm = 10Z + 50, Z is a standard score, the product of the standardizing transformation (X – μ) / σ; in the T score formula T = 10Z + 50, Z refers not to the standard score but to the normal deviate corresponding to that score. McCall used a simple linear transformation to convert a group of norm-referenced standard scores into T scores.
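The difference between the two formulas can be seen numerically. In the sketch below (Python; illustrative variable names, with the Rankit estimate standing in for the ranking step), the linear rescaling 10Z + 50 of standardized scores reproduces the skewness of the raw scores exactly, while the rank-based T score transformation pulls it toward zero.

```python
import numpy as np
from scipy.stats import rankdata, norm, skew

rng = np.random.default_rng(0)
raw = rng.exponential(scale=10.0, size=1000)     # strongly right-skewed raw scores

# Modified standard scores: linear scaling of Z scores, shape is preserved.
z_linear = (raw - raw.mean()) / raw.std(ddof=1)
modified = 10 * z_linear + 50

# T scores: ranks -> proportion estimate (Rankit here) -> normal deviates -> scale.
p = (rankdata(raw) - 0.5) / raw.size
t_scores = 10 * norm.ppf(p) + 50

print(round(skew(raw), 2), round(skew(modified), 2), round(skew(t_scores), 2))
# The first two skewness values match; the T score skewness is approximately zero.
```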


The utility of modified standard scores is severely restricted by the nature of achievement and psychometric test scores. Modified standard scores can only be obtained for continuous data because they require computation of the mean.

However, most educational and psychological test scores are on a discrete scale, not a continuous scale (Lester & Bishop, 2000). Furthermore, linear transformations retain the shape of the original distribution. If a variable’s original distribution is Gaussian, its transformed distribution will also be normal. If an observed distribution manifests substantial skew, excessive or too little kurtosis, or multimodality, these non-Gaussian features will be maintained in the transformed distribution.

This is problematic for a wide range of practitioners because it is common practice for educators to compare or combine scores on separate tests and for testing companies to reference new versions of their tests to earlier versions.

Standard scores such as Z will not suffice for these purposes because they do not account for differing score distributions between tests. Comparing scores from a symmetric distribution with those from a negatively skewed distribution, for example, will give more weight to the scores at the lower range of the skewed curve than to those at the lower range of the symmetric curve (Horst, 1931).

For example, Wright (1973) described a scenario where standardization would lend itself to the unequal weighting of test scores:

Some subjects, such as mathematics, tend to have widely dispersed scores while other subjects, such as English Composition, tend to have narrowly dispersed scores. Thus a student who is excellent in both subjects will find his mathematics grade of more value to his average than his English grade; the converse is of course true for the student who is poor in both subjects. If you wish to have all subjects equally weighted you must perform a transformation that will equate their dispersions (p.4).

This scenario illustrates the necessity of normalizing transformations, which are curvilinear, for rendering standard deviations uniform across test score distributions. However, normalizing transformations may also mitigate the inequitable interpretation of asymmetrical score distributions. A test score distribution that is positively skewed has more variability than normal on the lower end; therefore, cut points that are determined according to a specific standard score or a standard deviation are likely to refer too many students to remedial services.

Using Area Transformations to Normalize Score Distributions

Whereas linear transformations facilitate the interpretation of continuously scaled, normally distributed raw scores, normalizing transformations create a continuously scaled, normal distribution where there was none. According to Petersen, Kolen, and Hoover (1989), there is not a good theoretical rationale for normalizing transformations. They are undertaken for applied objectives. Linear scaling transformations make standard scores easier to interpret, but they retain the limitations of unmodified standard scores. They cannot be used to compare scores from different tests, and they are statistically inappropriate for the analysis of data from ordinally scaled instruments.

Establishing population normality is pivotal to the scoring and interpretation of large-scale tests because it makes uniform the central tendency, variability, symmetry, and peakedness of score distributions. Using area transformations to rank random scores of different variables not only attempts to equate their means and homogenize their variance, it also aims to create conformity in the third and fourth moments, skewness and kurtosis. The following table illustrates the relative accuracy of the Blom, Tukey, Van der Waerden, and Rankit approximations in achieving the target moments of the unit normal distribution, with the first two moments scaled to the T. These four transformations are performed on the same 10 scores from a smooth symmetric distribution.

Table 1

Differences among Ranking Methods in Attaining Target Moments

Computed Value of Moment \ Distance from Target

                 Mean (50)          SD (10)            Skew (0)           Kurt (3)
Blom             50.010 \ 0.010     9.355 \ 0.645      0.008 \ 0.008      2.588 \ 0.412
Tukey            50.009 \ 0.009     9.211 \ 0.789      0.008 \ 0.008      2.559 \ 0.441
Van der W.       50.007 \ 0.007     8.266 \ 1.734      0.009 \ 0.009      2.384 \ 0.616
Rankit           50.011 \ 0.011     9.839 \ 0.161      0.007 \ 0.007      2.696 \ 0.304

All four ranking methods appear to be extremely accurate on the mean, with the average deviation from target only 0.009. The difference between the most and least accurate ranking methods on the mean is 0.004. Similarly, skewness shows only slight deviation from target and negligible variability between methods.

Considerably more variability emerges on standard deviations and kurtosis, however. The average distance from the target standard deviation is 0.832. Van der Waerden’s approximation returns a deviation value that is ten times greater than Rankit’s. Even the most accurate method is still nearly two-tenths of a standard deviation off target. Kurtosis shows a similar pattern to standard deviations, but with less average distance from target and less variability among the deviation values. Rankit again is the most accurate, with half as much distance from target kurtosis as Van der Waerden’s approximation. The average deviation value for all four ranking methods on kurtosis is nearly half a point, 0.443.

Taking several variables from standardized assessment scores of infant characteristics, the following graphs represent score distributions that have been normalized using Blom’s ranking method. In all three examples (Figures 2-4), Blom’s procedure has produced highly accurate means (corresponding to the target T score mean of 50). However, Figure 3 shows a smaller than normal standard deviation and a negative skew, and Figure 4 shows excessive kurtosis.

Figure 2. Distribution of T scores using Blom’s approximation: Good fit on all four moments.


Figure 3. Distribution of T scores using Blom’s approximation: Poor fit on second and third moments.


Figure 4. Distribution of T scores using Blom’s approximation: Poor fit on fourth moment.

Approaches to Creating Normal Scores

Van der Waerden’s approximation. Tarter (2000) described Van der Waerden’s approximation as “a useful nonparametric inferential procedure…based on inverse Normal scores”(p.221). Normal scores are sometimes characterized as quantiles, or equal unit portions of the area under a normal curve corresponding with the number of observations comprising a sample. Van der Waerden (1952, 1953a, 1953b) suggested that quantiles be computed not strictly on the basis of ranks, but according to the rank of a given score value relative to the sample size (Conover, 1980).


Blom’s approximation. Harter (1961) noted that “there has been an argument of long-standing between advocates of the approximations corresponding to α = 0 and α = 0·5, neither of which is correct”(p.154). Blom (1954, 1958) observed the values of alpha to increase as the number of observations increases, with the lowest value being 0.330. “For a given n, α is least for i = 1, rises quickly to a peak for a relatively small value of i, and then drops off slowly”(Harter, 1961, p.154). This reflects a nonlinear relationship between a score’s rank in a sample and its normal deviate. Because “Blom conjectured that α always lies in the interval (0·33, 0·50),” explained Harter, “he suggested the use of α = 3/8 as a compromise value” (1961, p.154). Harter found the “compromise value” of 3/8, or 0.375, appropriate for small samples but otherwise too low.

There is evidence that Blom envisioned a specific application of his normal scores approximation. By his own evaluation: “We find that, in the special case of a normal distribution, the plotting rule Pi = (i – 3/8) / (n + 1/4) leads to a practically unbiased estimate of σ”(Blom, 1958, p.145). Blom understood the empirical phenomenon of a normal distribution to be uncommon, although it is not clear how he viewed the relative benefits of this formula in other circumstances. Blom concurred with Chernoff and Lieberman (1954) that “the plotting rule Pi = (i – 1/2) / n leads to a biased estimate of σ”(Blom, 1958, p.145). He suggested that this rule may be more efficient for large samples, but his own formula promises higher efficiency, along with unbiasedness, with small samples. Brown and Hettmansperger (1996) saw Blom’s approximation as an outgrowth of the quantile function, which “suggests Φ⁻¹(i/n) or Φ⁻¹[i/(n+1)]”(p.1669). They considered Blom’s formula to be the most accurate approximation of the normal deviate.
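Harter’s discussion of competing α values can be summarized, for illustration, by the general plotting position p = (r - α) / (n + 1 - 2α): α = 0 gives Van der Waerden’s estimate, α = 1/3 Tukey’s, α = 3/8 Blom’s, and α = 1/2 the Rankit. The short Python check below (my own addition, not part of the dissertation) verifies that the general form reproduces the four formulas defined in Chapter 1.

```python
def plotting_position(r, n, alpha):
    """General plotting position; alpha selects the named approximation."""
    return (r - alpha) / (n + 1 - 2 * alpha)

named = {
    "Van der Waerden": lambda r, n: r / (n + 1),
    "Tukey":           lambda r, n: (r - 1/3) / (n + 1/3),
    "Blom":            lambda r, n: (r - 3/8) / (n + 1/4),
    "Rankit":          lambda r, n: (r - 1/2) / n,
}
alphas = {"Van der Waerden": 0.0, "Tukey": 1/3, "Blom": 3/8, "Rankit": 0.5}

n = 10
for name, formula in named.items():
    a = alphas[name]
    assert all(abs(formula(r, n) - plotting_position(r, n, a)) < 1e-12
               for r in range(1, n + 1)), name
print("All four formulas match the general form.")
```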

Rankit approximation. Bliss, Greenwood, and White (1956) credited Ipsen and Jerne (1944) with coining the term “rankit,” but Bliss is credited with developing the technique as it is now used. Bliss et al. refined this approximation in their study of the effects of different insecticides and fungicides on the flavor of apples. Its design drew on Scheffé’s advancements in paired comparison research, which sought to account for magnitude and direction of preference, in addition to preference itself. “The transformation of degree of preference to rankits is a simple extension of Scheffé’s analysis in least squares”(Bliss et al., 1956, p.399). In this way, “the proportion of choices…could be transformed to a normal deviate…and the deviates for each sample averaged. These averages or scores would measure the spacing on the hypothetical preference scale”(p.386).

Thus, the term rankit itself shifted in meaning, from the single mean deviate obtained by transforming an array of observations to the normalizing procedure that effects this transformation. Bliss et al. found the Rankit approximation to be more convenient and computationally efficient than the Thurstone-Mosteller, Bradley-Terry, Kendall, and Scheffé techniques, even though, “despite differences in the underlying model and method of analysis, the treatment rankings on a preference scale were substantially the same”(p.401). Rankit is also a plotting method for the comparison of “ordered residuals against normal order statistics, which is used to detect outliers and to check distributional assumptions”(Davison & Gigli, 1989, p.211).


Tukey’s approximation. Tukey (1957) considered normalizing transformations to be the most important type of data “re-expression”(Hoaglin, 2003, p.313). Pearson and Tukey (1965) affirmed their use for the analysis of observed data, “graduating empirical data,” and for methodological investigations, “providing possible parent distributions as foundations for the mathematical study, analytical or empirical, of the properties of statistical procedures”(p.533). They posited the sufficiency of approximations for these purposes, which “are unlikely to require unusually high precision”(p.533). It seems that Tukey may have proposed his approximation, which he characterized as “simple and surely an adequate approximation to what is claimed to be optimum”(1962, p.22), as a refinement of Blom’s.


CHAPTER 3

METHODOLOGY

The purpose of this study is to empirically demonstrate the comparative accuracy of Van der Waerden’s, Blom’s, Tukey’s, and the Rankit approximations for the purpose of normalizing standardized test scores. It will compare their accuracy in terms of achieving the T score’s specified mean and standard deviation and the unit normal distribution’s skewness and kurtosis among small and large sample sizes for a variety of real, nonnormal data sets.

Procedure

A computer program will be written that computes normal scores using the four proportion estimation formulas under investigation. These normal scores will be computed for each successive iteration of randomly sampled raw scores drawn from various real data sets.

The four different sets of normal scores will then be scaled to T scores. The first four moments of the distribution will be calculated from these T scores for each sample size in each population. Absolute deviations will be computed as the absolute difference between each T score mean and 50, each standard deviation and 10, each skewness value and 0, and each kurtosis value and 3. These absolute deviations will be grouped by statistic and then ranked in order of proximity to the target mean, standard deviation, skewness, and kurtosis.

Both the absolute values representing the T scores’ deviations from the target values and the scores’ relative ranks in terms of accuracy on each criterion will be reported.
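To illustrate the procedure end to end, the sketch below (a simplified Python stand-in for the Fortran program described in the next section; the sample data and names are hypothetical) ranks one random sample, converts the ranks to normal scores with Blom’s plotting position, scales them to T scores, and reports the absolute deviation of the first four moments from their targets of 50, 10, 0, and 3. The other three methods differ only in the plotting position used.

    import random
    from statistics import NormalDist, mean, pstdev

    def t_scores(sample, position=lambda i, n: (i - 0.375) / (n + 0.25)):
        """Rank the sample, map each rank through Phi^-1(p_i), and scale to T = 50 + 10z."""
        inv = NormalDist().inv_cdf
        n = len(sample)
        order = sorted(range(n), key=lambda k: sample[k])   # ascending order (ties not averaged here)
        t = [0.0] * n
        for rank, k in enumerate(order, start=1):
            t[k] = 50 + 10 * inv(position(rank, n))
        return t

    def moment_deviations(scores):
        """Absolute deviations of the first four moments from 50, 10, 0, and 3."""
        m, s = mean(scores), pstdev(scores)
        z = [(x - m) / s for x in scores]
        skew = mean(v ** 3 for v in z)
        kurt = mean(v ** 4 for v in z)
        return abs(m - 50), abs(s - 10), abs(skew), abs(kurt - 3)

    raw = [random.randint(0, 27) for _ in range(30)]        # stand-in for one sampled data set
    print([round(d, 3) for d in moment_deviations(t_scores(raw))])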


Programming Specifications

Compaq Visual Fortran Professional Edition 6.6c will be run on a Microsoft Windows XP platform. Fortran was chosen for its large processing capacity and speed of execution. These capabilities are important for Monte Carlo simulations, which typically require from thousands to millions of iterations.

Subroutine POP (Sawilowsky, Blair, & Micceri, 1990) is based on eight distributions described by Micceri (1989). POP uses subroutines RNSET and RNUND (IMSL, 1987). RNUND generates pseudorandom numbers from a uniform distribution, and RNSET initializes a random seed for use in IMSL random number generators (Visual Numerics, 1994). Subroutine RANKS (Sawilowsky, 1987) ranks sorted data.

Sample Sizes

The simulation will be conducted on samples of size n = 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 200, 500, and 1,000 selected from a theoretical normal distribution and from each of the eight Micceri (1989) data sets.

Number of Monte Carlo Repetitions

The goal is to compare the accuracy of four ranking methods. Therefore, 10,000 iterations should suffice to break any ties up to three decimal places.

Achievement and Psychometric Distributions

Micceri (1989) computed three indices of symmetry/asymmetry and two indices of tail weight for each of the 440 large data sets he examined (for 70% of which, n ≥ 1,000), grouped by data type: achievement/ability (accounting for 231 of the measures), psychometric (125), criterion/mastery (35), and gain scores (49).


Eight distributions were identified based on specified levels of symmetry and tail weight contamination. Sawilowsky, Blair, and Micceri (1990) translated these results into a Fortran subroutine using the achievement and psychometric measures that best represented the distributional characteristics described by Micceri (1989).

Achievement distributions. The following five distributions were drawn from achievement measures: Smooth Symmetric; Discrete Mass at Zero; Extreme Asymmetric, Growth; Digit Preference; and Multimodal Lumpy. These distributions are illustrated in Figures 5 through 9.

Psychometric distributions. Mass at Zero with Gap; Extreme Asymmetric, Decay; and Extreme Bimodal were drawn from psychometric measures. These distributions are illustrated in Figures 10 through 12.

All eight achievement and psychometric distributions are nonnormal.

Presentation of Results

Tables will document each ranking method’s performance in terms of attaining the T score’s specified mean (50) and standard deviation (10), and the skewness (0) and kurtosis (3) of the unit normal distribution.

Figure 5. Achievement: Smooth Symmetric (histogram of frequency by score). (Sawilowsky & Fahoome, 2003)

Basic characteristics of this distribution:

Range: (0 ≤ x ≤ 27)

Mean: 13.19

Median: 13.00

Variance: 24.11

Skewness: 0.01

Kurtosis: 2.66

Figure 6. Achievement: Discrete Mass at Zero (histogram of frequency by score). (Sawilowsky & Fahoome, 2003)

Basic characteristics of this distribution:

Range: (0 ≤ x ≤ 27)

Mean: 12.92

Median: 13.00

Variance: 19.54

Skewness: -0.03

Kurtosis: 3.31

Figure 7. Achievement: Extreme Asymmetric – Growth (histogram of frequency by score). (Sawilowsky & Fahoome, 2003)

Basic characteristics of this distribution:

Range: (4 ≤ x ≤ 30)

Mean: 24.50

Median: 27.00

Variance: 33.52

Skewness: -1.33

Kurtosis: 4.11

Figure 8. Achievement: Digit Preference (histogram of frequency by score). (Sawilowsky & Fahoome, 2003)

Basic characteristics of this distribution:

Range: (420 ≤ x ≤ 635)

Mean: 536.95

Median: 535.00

Variance: 1416.77

Skewness: -0.07

Kurtosis: 2.76

Figure 9. Achievement: Multimodal Lumpy (histogram of frequency by score). (Sawilowsky & Fahoome, 2003)

Basic characteristics of this distribution:

Range: (0 ≤ x ≤ 43)

Mean: 21.15

Median: 18.00

Variance: 141.61

Skewness: 0.19

Kurtosis: 1.80

Figure 10. Psychometric: Mass at Zero with Gap (histogram of frequency by score). (Sawilowsky & Fahoome, 2003)

Basic characteristics of this distribution:

Range: (0 ≤ x ≤ 16)

Mean: 1.85

Median: 0

Variance: 14.44

Skewness: 1.65

Kurtosis: 3.98

Figure 11. Psychometric: Extreme Asymmetric – Decay (histogram of frequency by score). (Sawilowsky & Fahoome, 2003)

Basic characteristics of this distribution:

Range: (10 ≤ x ≤ 30)

Mean: 13.67

Median: 11.00

Variance: 33.06

Skewness: 1.64

Kurtosis: 4.52

Figure 12. Psychometric: Extreme Bimodal (histogram of frequency by score). (Sawilowsky & Fahoome, 2003)

Basic characteristics of this distribution:

Range: (0 ≤ x ≤ 5)

Mean: 2.97

Median: 4.00

Variance: 2.86

Skewness: -0.80

Kurtosis: 1.30


CHAPTER 4

RESULTS

The purpose of this study was to compare the accuracy of the Blom, Tukey, Van der Waerden, and Rankit approximations. The following 32 tables present the results. They show the absolute and relative accuracy of the four approximations in attaining the target moments of the normal distribution at the values established by the T score scale. The tables are organized sequentially according to distribution and moment. Study results for the mean, the standard deviation, skewness, and kurtosis appear in the same order for each of the eight distributions described in Chapter 3. All numbers are rounded to the third decimal place.

The accuracy of the four ranking methods on the T score is given in two forms. The first, which appears to the left of the backslash ( \ ), represents the statistic’s rank relative to the other three approximations. The number to the right of the backslash represents an actual value, not a rank. The top half of each table displays the relative ranks and absolute values of the approximated scores’ deviation from the target value of the given moment. For example, the T score’s target standard deviation is 10, so the deviation value represents the absolute value of the distance of each approximation from 10. Two ranking methods that produce standard deviations of 9.8 and 10.2 would have the same deviation value: 0.2. The bottom half of each table displays the ranks and values of the root mean square (RMS). RMS values, which represent the magnitude of difference between scores, are derived by taking the standard deviation of each set of mean, standard deviation, skewness, and kurtosis values. Both deviation from target (the top half of the tables) and RMS (the bottom half) compare the four approximations’ variability. Whereas deviation from target reflects each ranking method’s hit rate, or how frequently it is accurate, RMS evaluates the degree of difference between the methods’ performance. It is possible for an approximation to have different ranks in terms of deviation from target and magnitude of deviation.
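As a minimal numerical illustration of these two summaries (hypothetical replication values; one plausible reading of the computations, not the study’s code), the deviation from target can be taken as the absolute distance of the average obtained statistic from its target, and the RMS as the standard deviation of that statistic across replications:

    from statistics import mean, pstdev

    TARGET_SD = 10.0

    # hypothetical standard deviations of T scores from five Monte Carlo replications
    replication_sds = [9.71, 10.18, 9.88, 10.05, 9.94]

    deviation_from_target = abs(mean(replication_sds) - TARGET_SD)   # accuracy: distance from 10
    rms = pstdev(replication_sds)                                    # variability across replications

    print(round(deviation_from_target, 3), round(rms, 3))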

The rank, which is the first number in each column, is a whole number when the approximation method achieves the same rank over all 10,000 Monte Carlo runs; it is a decimal when this is not the case. However, unlike deviation ranks, RMS ranks correspond to a single statistic: the standard deviation of the respective statistic’s average performance across 10,000 random draws. Therefore, ties are possible between RMS ranks. There are 18 instances of tied RMS ranks distributed among nine tables. Ties are broken by assigning to each tied position the mean of the rank positions the tied methods would jointly occupy. For example, the two-way tie (1, 1, 3, 4) is missing the rank of (2); the first two ranks are reassigned as the mean of (1) and (2), giving (1.5, 1.5, 3, 4). Three-way ties, which are rare, are broken in the same way: (1, 1, 1, 4) becomes (2, 2, 2, 4), the mean of (1), (2), and (3).
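The following short sketch (illustrative Python, not the RANKS subroutine used in this study) reproduces this mid-rank rule: tied values receive the mean of the rank positions they would otherwise occupy, so (1, 1, 3, 4) becomes (1.5, 1.5, 3, 4) and (1, 1, 1, 4) becomes (2, 2, 2, 4).

    def midranks(values):
        """Assign ranks 1..n, giving tied values the mean of the positions they occupy."""
        order = sorted(range(len(values)), key=lambda k: values[k])
        ranks = [0.0] * len(values)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
                j += 1
            avg = (i + 1 + j + 1) / 2          # mean of positions i+1 .. j+1
            for k in range(i, j + 1):
                ranks[order[k]] = avg
            i = j + 1
        return ranks

    print(midranks([0.12, 0.12, 0.30, 0.45]))   # -> [1.5, 1.5, 3.0, 4.0]
    print(midranks([0.12, 0.12, 0.12, 0.45]))   # -> [2.0, 2.0, 2.0, 4.0]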

The final statistic that is provided in the tables is the range for deviation from target and RMS. In both cases, the range represents the difference between the highest and the lowest values (not the ranks) in each row. The larger the range, the more the deviation and RMS ranks are likely to matter. Following the 32 tables documenting accuracy, a series of figures explores the deviation range.


Table 2 Smooth Symmetric—Accuracy of T Scores on Means

Deviation from Target (50) n Rank \ Value B T V R 5 1.686 \ 0.000 1.360 \ 0.000 1.633 \ 0.000 1.358 \ 0.000 10 1.720 \ 0.000 1.619 \ 0.000 1.680 \ 0.000 1.747 \ 0.000 15 1.794 \ 0.000 1.805 \ 0.000 1.776 \ 0.000 1.836 \ 0.000 20 1.892 \ 0.000 1.835 \ 0.000 1.914 \ 0.000 1.963 \ 0.000 25 1.801 \ 0.000 1.845 \ 0.000 1.819 \ 0.000 1.814 \ 0.000 30 1.913 \ 0.000 2.067 \ 0.000 1.928 \ 0.000 1.828 \ 0.000 35 2.006 \ 0.000 2.079 \ 0.000 1.945 \ 0.000 2.115 \ 0.000 40 1.981 \ 0.000 2.074 \ 0.000 2.017 \ 0.000 2.037 \ 0.000 45 1.906 \ 0.000 1.923 \ 0.000 1.903 \ 0.000 1.923 \ 0.000 50 2.043 \ 0.000 2.047 \ 0.000 1.944 \ 0.000 1.955 \ 0.000 100 2.136 \ 0.000 2.157 \ 0.000 2.153 \ 0.000 2.161 \ 0.000 200 2.244 \ 0.000 2.284 \ 0.000 2.310 \ 0.000 2.317 \ 0.000 500 2.429 \ 0.000 2.445 \ 0.000 2.433 \ 0.000 2.425 \ 0.000 1000 2.466 \ 0.000 2.457 \ 0.000 2.465 \ 0.000 2.471 \ 0.000

Magnitude of Deviation (RMS) n Rank \ Value B T V R 5 1.500 \ 0.000 1.500 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 10 1.500 \ 0.000 3.500 \ 0.000 3.500 \ 0.000 1.500 \ 0.000 15 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 20 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 25 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 30 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 35 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 40 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 50 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 100 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000


Table 3 Smooth Symmetric—Accuracy of T Scores on Standard Deviations

Deviation from Target (10) n Rank \ Value Range B T V R 5 1.000 \ 1.987 3.000 \ 2.177 4.000 \ 3.372 2.000 \ 2.089 1.385 10 1.000 \ 1.161 3.000 \ 1.296 4.000 \ 2.185 2.000 \ 1.185 1.024 15 1.998 \ 0.844 3.000 \ 0.951 4.000 \ 1.667 1.002 \ 0.842 0.825 20 2.000 \ 0.671 3.000 \ 0.760 4.000 \ 1.367 1.000 \ 0.659 0.708 25 2.000 \ 0.561 3.000 \ 0.638 4.000 \ 1.168 1.000 \ 0.544 0.624 30 2.000 \ 0.485 3.000 \ 0.554 4.000 \ 1.026 1.000 \ 0.465 0.561 35 2.000 \ 0.429 3.000 \ 0.491 4.000 \ 0.918 1.000 \ 0.408 0.510 40 2.000 \ 0.386 3.000 \ 0.442 4.000 \ 0.833 1.000 \ 0.364 0.469 45 2.000 \ 0.351 3.000 \ 0.403 4.000 \ 0.764 1.000 \ 0.329 0.435 50 2.000 \ 0.323 3.000 \ 0.371 4.000 \ 0.707 1.000 \ 0.300 0.407 100 2.000 \ 0.186 3.000 \ 0.215 4.000 \ 0.421 1.000 \ 0.167 0.254 200 2.000 \ 0.111 3.000 \ 0.128 4.000 \ 0.250 1.000 \ 0.010 0.240 500 2.000 \ 0.006 3.000 \ 0.007 4.000 \ 0.128 1.000 \ 0.005 0.123 1000 2.000 \ 0.004 3.000 \ 0.005 4.000 \ 0.008 1.000 \ 0.004 0.004

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 3.000 \ 0.008 4.000 \ 0.008 2.000 \ 0.007 1.000 \ 0.007 0.001 10 3.000 \ 0.005 4.000 \ 0.005 2.000 \ 0.005 1.000 \ 0.004 0.001 15 3.000 \ 0.004 4.000 \ 0.004 2.000 \ 0.004 1.000 \ 0.003 0.001 20 3.000 \ 0.004 4.000 \ 0.004 2.000 \ 0.004 1.000 \ 0.003 0.001 25 3.000 \ 0.003 4.000 \ 0.003 2.000 \ 0.003 1.000 \ 0.002 0.001 30 4.000 \ 0.003 3.000 \ 0.003 2.000 \ 0.003 1.000 \ 0.002 0.001 35 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001 40 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001 45 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 50 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 100 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.000 0.001 200 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000 500 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000 1000 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000


Table 4 Smooth Symmetric—Accuracy of T Scores on Skewness

Deviation from Target (0) n Rank \ Value Range B T V R 5 3.717 \ 0.000 2.895 \ 0.000 1.295 \ 0.000 2.093 \ 0.000 0.000 10 3.936 \ 0.001 1.914 \ 0.000 1.232 \ 0.001 2.919 \ 0.001 0.001 15 1.013 \ 0.001 2.994 \ 0.001 3.989 \ 0.105 2.004 \ 0.001 0.104 20 2.006 \ 0.140 2.997 \ 0.140 3.987 \ 0.146 1.010 \ 0.140 0.006 25 1.995 \ 0.122 3.000 \ 0.122 4.000 \ 0.127 1.007 \ 0.122 0.005 30 2.000 \ 0.007 3.000 \ 0.007 4.000 \ 0.007 1.000 \ 0.007 0.000 35 1.993 \ 0.001 2.997 \ 0.001 3.994 \ 0.001 1.016 \ 0.001 0.000 40 2.116 \ 0.000 2.908 \ 0.000 3.732 \ 0.000 1.244 \ 0.000 0.000 45 2.007 \ 0.001 3.000 \ 0.001 3.989 \ 0.001 1.008 \ 0.001 0.000 50 2.020 \ 0.141 2.989 \ 0.141 3.965 \ 0.145 1.027 \ 0.141 0.004 100 2.937 \ 0.002 2.055 \ 0.002 1.170 \ 0.002 3.838 \ 0.002 0.000 200 2.930 \ 0.003 2.063 \ 0.003 1.190 \ 0.003 3.817 \ 0.003 0.000 500 2.897 \ 0.003 2.082 \ 0.003 1.233 \ 0.003 3.788 \ 0.003 0.000 1000 2.875 \ 0.003 2.094 \ 0.003 1.288 \ 0.003 3.743 \ 0.003 0.000

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 1.000 \ 0.002 2.000 \ 0.002 4.000 \ 0.002 3.000 \ 0.002 0.000 10 1.000 \ 0.256 3.000 \ 0.259 4.000 \ 0.279 2.000 \ 0.258 0.023 15 4.000 \ 0.521 2.000 \ 0.520 1.000 \ 0.515 3.000 \ 0.520 0.006 20 1.000 \ 0.446 3.000 \ 0.447 4.000 \ 0.456 2.000 \ 0.446 0.010 25 4.000 \ 0.570 2.000 \ 0.567 1.000 \ 0.551 3.000 \ 0.570 0.019 30 4.000 \ 0.453 2.000 \ 0.450 1.000 \ 0.436 3.000 \ 0.452 0.017 35 3.000 \ 0.479 2.000 \ 0.477 1.000 \ 0.461 4.000 \ 0.479 0.018 40 3.000 \ 0.612 2.000 \ 0.611 1.000 \ 0.560 4.000 \ 0.613 0.053 45 3.000 \ 0.587 2.000 \ 0.586 1.000 \ 0.578 4.000 \ 0.587 0.009 50 3.000 \ 0.607 2.000 \ 0.605 1.000 \ 0.593 4.000 \ 0.608 0.015 100 3.000 \ 0.565 2.000 \ 0.565 1.000 \ 0.564 4.000 \ 0.565 0.001 200 3.000 \ 0.555 2.000 \ 0.555 4.000 \ 0.555 1.000 \ 0.555 0.000 500 3.000 \ 0.549 2.000 \ 0.549 4.000 \ 0.549 1.000 \ 0.549 0.000 1000 3.000 \ 0.555 2.000 \ 0.555 1.000 \ 0.555 4.000 \ 0.555 0.000


Table 5 Smooth Symmetric—Accuracy of T Scores on Kurtosis

Deviation from Target (3) n Rank \ Value Range B T V R 5 1.000 \ 1.148 2.000 \ 1.155 4.000 \ 1.195 3.000 \ 1.156 0.047 10 1.000 \ 1.106 3.000 \ 1.111 4.000 \ 1.138 2.000 \ 1.110 0.032 15 1.000 \ 1.092 3.000 \ 1.095 4.000 \ 1.115 2.000 \ 1.093 0.023 20 1.001 \ 1.058 3.000 \ 1.061 4.000 \ 1.079 2.002 \ 1.058 0.021 25 1.922 \ 1.019 3.000 \ 1.022 4.000 \ 1.040 1.078 \ 1.019 0.021 30 2.000 \ 0.981 3.000 \ 0.983 4.000 \ 1.000 1.000 \ 0.980 0.020 35 2.000 \ 0.957 3.000 \ 0.959 4.000 \ 0.975 1.000 \ 0.956 0.019 40 2.000 \ 0.953 3.000 \ 0.956 4.000 \ 0.970 1.000 \ 0.953 0.017 45 2.000 \ 0.979 3.000 \ 0.980 4.000 \ 0.993 1.000 \ 0.978 0.015 50 2.000 \ 1.014 3.000 \ 1.016 4.000 \ 1.028 1.000 \ 1.013 0.015 100 2.000 \ 0.957 3.000 \ 0.960 4.000 \ 0.976 1.000 \ 0.956 0.020 200 2.000 \ 0.948 3.000 \ 0.950 4.000 \ 0.961 1.000 \ 0.947 0.014 500 2.000 \ 0.942 3.000 \ 0.943 4.000 \ 0.949 1.000 \ 0.941 0.008 1000 2.000 \ 0.940 3.000 \ 0.940 4.000 \ 0.944 1.000 \ 0.939 0.005

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 1.000 \ 0.006 2.000 \ 0.006 4.000 \ 0.006 3.000 \ 0.006 0.000 10 2.000 \ 0.310 3.000 \ 0.311 4.000 \ 0.313 1.000 \ 0.310 0.003 15 1.000 \ 0.434 3.000 \ 0.435 4.000 \ 0.438 2.000 \ 0.435 0.004 20 1.000 \ 0.402 3.000 \ 0.403 4.000 \ 0.411 2.000 \ 0.402 0.009 25 4.000 \ 0.470 2.000 \ 0.469 1.000 \ 0.462 3.000 \ 0.470 0.008 30 4.000 \ 0.456 2.000 \ 0.456 1.000 \ 0.452 3.000 \ 0.456 0.004 35 3.000 \ 0.444 2.000 \ 0.444 1.000 \ 0.443 4.000 \ 0.444 0.001 40 3.000 \ 0.462 2.000 \ 0.461 1.000 \ 0.457 4.000 \ 0.462 0.005 45 3.000 \ 0.500 2.000 \ 0.500 1.000 \ 0.498 4.000 \ 0.500 0.002 50 3.000 \ 0.495 2.000 \ 0.495 1.000 \ 0.494 4.000 \ 0.495 0.001 100 2.000 \ 0.476 3.000 \ 0.476 4.000 \ 0.476 1.000 \ 0.476 0.000 200 2.000 \ 0.477 3.000 \ 0.477 4.000 \ 0.478 1.000 \ 0.477 0.001 500 2.000 \ 0.472 3.000 \ 0.472 4.000 \ 0.472 1.000 \ 0.472 0.000 1000 2.000 \ 0.472 3.000 \ 0.472 4.000 \ 0.473 1.000 \ 0.472 0.001


Table 6 Discrete Mass at Zero—Accuracy of T Scores on Means

Deviation from Target (50) n Rank \ Value B T V R 5 1.811 \ 0.000 1.403 \ 0.000 1.594 \ 0.000 1.295 \ 0.000 10 1.761 \ 0.000 1.640 \ 0.000 1.711 \ 0.000 1.700 \ 0.000 15 1.774 \ 0.000 1.827 \ 0.000 1.796 \ 0.000 1.866 \ 0.000 20 1.902 \ 0.000 1.845 \ 0.000 1.934 \ 0.000 1.970 \ 0.000 25 1.796 \ 0.000 1.857 \ 0.000 1.840 \ 0.000 1.791 \ 0.000 30 1.937 \ 0.000 2.066 \ 0.000 1.947 \ 0.000 1.853 \ 0.000 35 1.982 \ 0.000 2.078 \ 0.000 1.957 \ 0.000 2.158 \ 0.000 40 1.987 \ 0.000 2.103 \ 0.000 2.007 \ 0.000 2.021 \ 0.000 45 1.924 \ 0.000 1.932 \ 0.000 1.913 \ 0.000 1.908 \ 0.000 50 2.072 \ 0.000 2.008 \ 0.000 1.975 \ 0.000 1.971 \ 0.000 100 2.127 \ 0.000 2.202 \ 0.000 2.136 \ 0.000 2.187 \ 0.000 200 2.266 \ 0.000 2.292 \ 0.000 2.303 \ 0.000 2.330 \ 0.000 500 2.441 \ 0.000 2.435 \ 0.000 2.415 \ 0.000 2.439 \ 0.000 1000 2.456 \ 0.000 2.458 \ 0.000 2.492 \ 0.000 2.463 \ 0.000

Magnitude of Deviation (RMS) n Rank \ Value B T V R 5 2.000 \ 0.000 2.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 10 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 15 1.000 \ 0.000 4.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 20 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 25 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 30 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 35 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 40 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 50 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 100 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000


Table 7 Discrete Mass at Zero—Accuracy of T Scores on Standard Deviations

Deviation from Target (10) n Rank \ Value Range B T V R 5 1.000 \ 2.049 3.000 \ 2.237 4.000 \ 3.421 2.000 \ 2.149 1.372 10 1.000 \ 1.182 3.000 \ 1.316 4.000 \ 2.202 2.000 \ 1.205 1.020 15 1.997 \ 0.853 3.000 \ 0.959 4.000 \ 1.675 1.003 \ 0.851 0.824 20 2.000 \ 0.675 3.000 \ 0.764 4.000 \ 1.370 1.000 \ 0.663 0.707 25 2.000 \ 0.571 3.000 \ 0.648 4.000 \ 1.176 1.000 \ 0.553 0.623 30 2.000 \ 0.496 3.000 \ 0.564 4.000 \ 1.035 1.000 \ 0.476 0.559 35 2.000 \ 0.440 3.000 \ 0.501 4.000 \ 0.927 1.000 \ 0.418 0.509 40 2.000 \ 0.396 3.000 \ 0.452 4.000 \ 0.842 1.000 \ 0.374 0.468 45 3.000 \ 0.368 3.000 \ 0.412 4.000 \ 0.773 1.000 \ 0.338 0.435 50 2.000 \ 0.333 3.000 \ 0.381 4.000 \ 0.716 1.000 \ 0.310 0.406 100 2.000 \ 0.195 3.000 \ 0.224 4.000 \ 0.429 1.000 \ 0.176 0.253 200 2.000 \ 0.120 3.000 \ 0.137 4.000 \ 0.258 1.000 \ 0.106 0.152 500 2.000 \ 0.007 3.000 \ 0.008 4.000 \ 0.137 1.000 \ 0.006 0.131 1000 2.000 \ 0.005 3.000 \ 0.006 4.000 \ 0.009 1.000 \ 0.005 0.004

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.117 3.000 \ 0.114 2.000 \ 0.113 1.000 \ 0.009 0.108 10 4.000 \ 0.008 3.000 \ 0.008 2.000 \ 0.008 1.000 \ 0.006 0.002 15 3.000 \ 0.005 4.000 \ 0.005 2.000 \ 0.005 1.000 \ 0.004 0.001 20 3.000 \ 0.004 4.000 \ 0.004 2.000 \ 0.004 1.000 \ 0.003 0.001 25 3.000 \ 0.003 4.000 \ 0.003 2.000 \ 0.003 1.000 \ 0.002 0.001 30 4.000 \ 0.004 3.000 \ 0.004 2.000 \ 0.004 1.000 \ 0.003 0.001 35 4.000 \ 0.002 3.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.002 0.000 40 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001 45 4.000 \ 0.002 3.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001 50 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 100 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 200 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000 500 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000 1000 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000


Table 8 Discrete Mass at Zero—Accuracy of T Scores on Skewness

Deviation from Target (0) n Rank \ Value Range B T V R 5 3.740 \ 0.001 2.914 \ 0.001 1.264 \ 0.001 2.083 \ 0.001 0.000 10 1.005 \ 0.004 2.999 \ 0.005 3.996 \ 0.006 2.001 \ 0.005 0.002 15 2.880 \ 0.006 2.302 \ 0.006 2.238 \ 0.004 2.579 \ 0.006 0.002 20 3.983 \ 0.007 2.007 \ 0.006 1.023 \ 0.005 2.987 \ 0.007 0.002 25 2.274 \ 0.003 2.382 \ 0.004 3.197 \ 0.005 2.147 \ 0.003 0.002 30 2.005 \ 0.127 2.994 \ 0.128 3.985 \ 0.134 1.015 \ 0.127 0.007 35 2.017 \ 0.139 2.989 \ 0.140 3.968 \ 0.145 1.026 \ 0.139 0.006 40 2.000 \ 0.119 3.000 \ 0.120 3.999 \ 0.123 1.001 \ 0.119 0.004 45 2.003 \ 0.007 2.997 \ 0.007 3.992 \ 0.007 1.008 \ 0.007 0.000 50 2.007 \ 0.002 2.999 \ 0.002 3.972 \ 0.002 1.023 \ 0.002 0.000 100 2.881 \ 0.001 2.074 \ 0.001 1.339 \ 0.000 3.706 \ 0.001 0.001 200 3.001 \ 0.003 2.019 \ 0.003 1.069 \ 0.003 3.912 \ 0.003 0.000 500 2.965 \ 0.003 2.026 \ 0.003 1.033 \ 0.003 3.975 \ 0.003 0.000 1000 2.863 \ 0.001 2.127 \ 0.001 1.403 \ 0.001 3.607 \ 0.001 0.000

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.010 3.000 \ 0.010 1.000 \ 0.009 2.000 \ 0.010 0.001 10 4.000 \ 0.469 2.000 \ 0.467 1.000 \ 0.459 3.000 \ 0.467 0.010 15 1.000 \ 0.313 3.000 \ 0.313 4.000 \ 0.320 2.000 \ 0.313 0.007 20 1.000 \ 0.473 3.000 \ 0.473 4.000 \ 0.479 2.000 \ 0.473 0.006 25 4.000 \ 0.382 2.000 \ 0.382 1.000 \ 0.382 3.000 \ 0.382 0.000 30 4.000 \ 0.526 2.000 \ 0.525 1.000 \ 0.524 3.000 \ 0.526 0.002 35 3.000 \ 0.535 2.000 \ 0.535 1.000 \ 0.520 4.000 \ 0.535 0.015 40 3.000 \ 0.608 2.000 \ 0.607 1.000 \ 0.598 4.000 \ 0.609 0.011 45 3.000 \ 0.535 2.000 \ 0.559 1.000 \ 0.551 4.000 \ 0.560 0.025 50 3.000 \ 0.566 2.000 \ 0.565 1.000 \ 0.558 4.000 \ 0.567 0.009 100 3.000 \ 0.559 2.000 \ 0.559 1.000 \ 0.558 4.000 \ 0.558 0.001 200 3.000 \ 0.555 2.000 \ 0.555 4.000 \ 0.555 1.000 \ 0.555 0.000 500 4.000 \ 0.552 2.000 \ 0.552 1.000 \ 0.552 3.000 \ 0.552 0.000 1000 2.000 \ 0.542 1.000 \ 0.542 4.000 \ 0.542 3.000 \ 0.542 0.000


Table 9 Discrete Mass at Zero—Accuracy of T Scores on Kurtosis

Deviation from Target (3) n Rank \ Value Range B T V R 5 1.000 \ 1.164 2.000 \ 1.171 4.000 \ 1.208 3.000 \ 1.172 0.044 10 1.000 \ 1.109 2.100 \ 1.114 3.100 \ 1.139 2.000 \ 1.112 0.030 15 1.001 \ 1.070 3.000 \ 1.074 3.999 \ 1.098 2.000 \ 1.072 0.028 20 1.000 \ 1.016 3.000 \ 1.020 4.000 \ 1.047 2.000 \ 1.017 0.031 25 1.001 \ 1.078 3.000 \ 1.081 4.000 \ 1.095 1.999 \ 1.078 0.017 30 2.000 \ 1.075 3.000 \ 1.077 4.000 \ 1.091 1.000 \ 1.075 0.016 35 2.000 \ 1.044 3.000 \ 1.046 4.000 \ 1.060 1.000 \ 1.043 0.017 40 2.000 \ 0.996 3.000 \ 0.999 3.999 \ 1.012 1.000 \ 0.996 0.016 45 2.000 \ 0.953 3.000 \ 0.955 4.000 \ 0.968 1.000 \ 0.953 0.015 50 2.000 \ 0.945 3.000 \ 0.946 4.000 \ 0.959 1.000 \ 0.944 0.015 100 2.000 \ 1.081 3.000 \ 1.082 3.999 \ 1.088 1.001 \ 1.080 0.008 200 2.000 \ 0.949 3.000 \ 0.950 4.000 \ 0.961 1.000 \ 0.947 0.014 500 2.000 \ 0.942 3.000 \ 0.943 4.000 \ 0.949 1.000 \ 0.941 0.008 1000 2.000 \ 1.081 3.000 \ 1.081 3.999 \ 1.082 1.001 \ 1.081 0.001

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.010 3.000 \ 0.010 1.000 \ 0.009 2.000 \ 0.010 0.001 10 1.000 \ 0.304 3.000 \ 0.306 4.000 \ 0.320 2.000 \ 0.305 0.016 15 1.000 \ 0.339 3.000 \ 0.340 4.000 \ 0.348 2.000 \ 0.339 0.009 20 1.000 \ 0.326 3.000 \ 0.327 4.000 \ 0.332 2.000 \ 0.326 0.006 25 1.000 \ 0.397 3.000 \ 0.397 4.000 \ 0.398 2.000 \ 0.397 0.001 30 2.000 \ 0.502 3.000 \ 0.502 4.000 \ 0.505 1.000 \ 0.502 0.003 35 4.000 \ 0.354 2.000 \ 0.354 1.000 \ 0.354 3.000 \ 0.354 0.000 40 2.000 \ 0.468 3.000 \ 0.468 4.000 \ 0.469 1.000 \ 0.468 0.001 45 3.000 \ 0.503 2.000 \ 0.503 1.000 \ 0.500 4.000 \ 0.503 0.003 50 3.000 \ 0.465 2.000 \ 0.465 1.000 \ 0.464 4.000 \ 0.466 0.002 100 2.000 \ 0.494 3.000 \ 0.494 4.000 \ 0.495 1.000 \ 0.494 0.001 200 2.000 \ 0.480 3.000 \ 0.480 4.000 \ 0.480 1.000 \ 0.480 0.000 500 2.000 \ 0.477 3.000 \ 0.477 4.000 \ 0.477 1.000 \ 0.477 0.000 1000 2.000 \ 0.473 3.000 \ 0.473 4.000 \ 0.473 1.000 \ 0.473 0.000


Table 10 Extreme Asymmetric, Growth—Accuracy of T Scores on Means

Deviation from Target (50) n Rank \ Value B T V R 5 1.738 \ 0.000 1.411 \ 0.000 1.579 \ 0.000 1.196 \ 0.000 10 1.796 \ 0.000 1.669 \ 0.000 1.837 \ 0.000 1.630 \ 0.000 15 1.827 \ 0.000 1.810 \ 0.000 1.846 \ 0.000 1.778 \ 0.000 20 2.039 \ 0.000 1.878 \ 0.000 1.936 \ 0.000 1.923 \ 0.000 25 1.822 \ 0.000 1.853 \ 0.000 1.825 \ 0.000 1.880 \ 0.000 30 1.977 \ 0.000 2.051 \ 0.000 2.004 \ 0.000 1.882 \ 0.000 35 2.076 \ 0.000 2.076 \ 0.000 2.049 \ 0.000 2.028 \ 0.000 40 2.091 \ 0.000 2.033 \ 0.000 2.041 \ 0.000 1.988 \ 0.000 45 1.943 \ 0.000 1.958 \ 0.000 1.881 \ 0.000 1.926 \ 0.000 50 2.043 \ 0.000 2.025 \ 0.000 1.967 \ 0.000 1.988 \ 0.000 100 2.143 \ 0.000 2.235 \ 0.000 2.192 \ 0.000 2.141 \ 0.000 200 2.276 \ 0.000 2.310 \ 0.000 2.314 \ 0.000 2.390 \ 0.000 500 2.411 \ 0.000 2.449 \ 0.000 2.474 \ 0.000 2.437 \ 0.000 1000 2.477 \ 0.000 2.474 \ 0.000 2.457 \ 0.000 2.477 \ 0.000

Magnitude of Deviation (RMS) n Rank \ Value B T V R 5 1.500 \ 0.000 1.500 \ 0.000 3.500 \ 0.000 3.500 \ 0.000 10 1.000 \ 0.000 2.000 \ 0.000 3.500 \ 0.000 3.500 \ 0.000 15 1.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 20 1.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 25 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 30 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 35 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 40 4.000 \ 0.000 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 50 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 100 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000


Table 11 Extreme Asymmetric, Growth—Accuracy of T Scores on Standard Deviations

Deviation from Target (10) n Rank \ Value Range B T V R 5 1.000 \ 2.075 3.000 \ 2.263 4.000 \ 3.442 2.000 \ 2.176 1.367 10 1.000 \ 1.243 3.000 \ 1.375 4.000 \ 2.250 2.000 \ 1.265 1.007 15 1.984 \ 0.934 3.000 \ 1.038 4.000 \ 1.739 1.016 \ 0.932 0.807 20 2.000 \ 0.769 3.000 \ 0.855 4.000 \ 1.446 1.000 \ 0.756 0.690 25 2.000 \ 0.666 3.000 \ 0.740 4.000 \ 1.253 1.000 \ 0.649 0.604 30 2.000 \ 0.601 3.000 \ 0.666 4.000 \ 1.120 1.000 \ 0.581 0.539 35 2.000 \ 0.551 3.000 \ 0.609 4.000 \ 1.018 1.000 \ 0.530 0.488 40 2.000 \ 0.524 3.000 \ 0.577 4.000 \ 0.947 1.000 \ 0.502 0.445 45 2.000 \ 0.484 3.000 \ 0.532 4.000 \ 0.874 1.000 \ 0.462 0.412 50 2.000 \ 0.440 3.000 \ 0.485 4.000 \ 0.804 1.000 \ 0.418 0.386 100 2.000 \ 0.320 3.000 \ 0.347 4.000 \ 0.538 1.000 \ 0.302 0.236 200 2.000 \ 0.258 3.000 \ 0.273 4.000 \ 0.384 1.000 \ 0.245 0.139 500 2.000 \ 0.213 3.000 \ 0.220 4.000 \ 0.273 1.000 \ 0.205 0.068 1000 2.000 \ 0.197 3.000 \ 0.201 4.000 \ 0.230 1.000 \ 0.194 0.036

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.204 3.000 \ 0.200 2.000 \ 0.198 1.000 \ 0.163 0.041 10 4.000 \ 0.169 3.000 \ 0.168 2.000 \ 0.165 1.000 \ 0.144 0.025 15 3.000 \ 0.007 4.000 \ 0.007 2.000 \ 0.007 1.000 \ 0.006 0.001 20 3.000 \ 0.005 4.000 \ 0.005 2.000 \ 0.005 1.000 \ 0.004 0.001 25 4.000 \ 0.010 3.000 \ 0.010 2.000 \ 0.010 1.000 \ 0.008 0.002 30 4.000 \ 0.009 3.000 \ 0.009 2.000 \ 0.009 1.000 \ 0.008 0.001 35 4.000 \ 0.006 3.000 \ 0.006 2.000 \ 0.006 1.000 \ 0.005 0.001 40 4.000 \ 0.005 3.000 \ 0.005 2.000 \ 0.005 1.000 \ 0.005 0.000 45 4.000 \ 0.007 3.000 \ 0.007 2.000 \ 0.007 1.000 \ 0.006 0.001 50 4.000 \ 0.005 3.000 \ 0.005 2.000 \ 0.005 1.000 \ 0.005 0.000 100 4.000 \ 0.004 3.000 \ 0.004 2.000 \ 0.004 1.000 \ 0.004 0.000 200 4.000 \ 0.003 3.000 \ 0.003 2.000 \ 0.003 1.000 \ 0.003 0.000 500 4.000 \ 0.002 3.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.002 0.000 1000 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000


Table 12 Extreme Asymmetric, Growth—Accuracy of T Scores on Skewness

Deviation from Target (0) n Rank \ Value Range B T V R 5 4.000 \ 0.005 3.000 \ 0.005 1.000 \ 0.005 2.000 \ 0.005 0.000 10 1.076 \ 0.001 2.973 \ 0.001 3.928 \ 0.001 2.024 \ 0.001 0.000 15 3.424 \ 0.004 2.163 \ 0.004 1.624 \ 0.004 2.790 \ 0.004 0.000 20 1.582 \ 0.107 2.997 \ 0.107 3.988 \ 0.109 1.433 \ 0.107 0.002 25 1.998 \ 0.139 2.998 \ 0.139 3.994 \ 0.143 1.010 \ 0.139 0.004 30 2.000 \ 0.176 2.998 \ 0.177 3.994 \ 0.182 1.008 \ 0.176 0.006 35 2.004 \ 0.142 3.000 \ 0.142 3.992 \ 0.148 1.007 \ 0.141 0.007 40 2.000 \ 0.002 2.999 \ 0.002 3.998 \ 0.003 1.003 \ 0.002 0.001 45 2.149 \ 0.009 2.968 \ 0.009 3.798 \ 0.009 1.085 \ 0.009 0.000 50 3.000 \ 0.010 2.000 \ 0.010 1.000 \ 0.008 4.000 \ 0.010 0.002 100 2.006 \ 0.166 2.996 \ 0.167 3.990 \ 0.168 1.008 \ 0.166 0.002 200 2.591 \ 0.170 2.420 \ 0.169 2.316 \ 0.164 2.677 \ 0.171 0.007 500 2.615 \ 0.174 2.388 \ 0.174 2.193 \ 0.171 2.804 \ 0.175 0.004 1000 2.620 \ 0.176 2.379 \ 0.175 2.160 \ 0.174 2.841 \ 0.176 0.002

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.113 3.000 \ 0.111 1.000 \ 0.010 2.000 \ 0.110 0.103 10 3.000 \ 0.413 2.000 \ 0.412 1.000 \ 0.410 4.000 \ 0.413 0.003 15 4.000 \ 0.481 2.000 \ 0.478 1.000 \ 0.457 3.000 \ 0.480 0.024 20 4.000 \ 0.654 2.000 \ 0.652 1.000 \ 0.640 3.000 \ 0.654 0.014 25 3.000 \ 0.600 2.000 \ 0.597 1.000 \ 0.580 4.000 \ 0.600 0.020 30 4.000 \ 0.504 2.000 \ 0.503 1.000 \ 0.498 3.000 \ 0.504 0.006 35 3.000 \ 0.668 2.000 \ 0.668 1.000 \ 0.665 4.000 \ 0.669 0.004 40 3.000 \ 0.649 2.000 \ 0.648 1.000 \ 0.644 4.000 \ 0.649 0.005 45 3.000 \ 0.666 2.000 \ 0.665 1.000 \ 0.663 4.000 \ 0.666 0.003 50 2.000 \ 0.500 3.000 \ 0.500 4.000 \ 0.505 1.000 \ 0.500 0.005 100 3.000 \ 0.541 2.000 \ 0.540 1.000 \ 0.538 4.000 \ 0.541 0.003 200 3.000 \ 0.596 2.000 \ 0.596 1.000 \ 0.595 4.000 \ 0.596 0.001 500 3.000 \ 0.576 2.000 \ 0.576 1.000 \ 0.576 4.000 \ 0.576 0.000 1000 3.000 \ 0.590 2.000 \ 0.590 1.000 \ 0.590 4.000 \ 0.590 0.000


Table 13 Extreme Asymmetric, Growth—Accuracy of T Scores on Kurtosis

Deviation from Target (3) n Rank \ Value Range B T V R 5 1.000 \ 1.176 2.000 \ 1.182 4.000 \ 1.219 3.000 \ 1.183 0.043 10 1.000 \ 1.063 3.000 \ 1.064 4.000 \ 1.093 2.000 \ 1.066 0.030 15 1.000 \ 1.018 3.000 \ 1.022 4.000 \ 1.042 2.000 \ 1.019 0.024 20 1.000 \ 1.008 3.000 \ 1.010 4.000 \ 1.028 2.000 \ 1.008 0.020 25 2.000 \ 1.025 3.000 \ 1.027 4.000 \ 1.041 1.000 \ 1.025 0.016 30 2.000 \ 1.078 3.000 \ 1.079 4.000 \ 1.091 1.000 \ 1.077 0.014 35 2.000 \ 1.085 3.000 \ 1.087 4.000 \ 1.098 1.000 \ 1.085 0.013 40 2.000 \ 1.116 3.000 \ 1.117 4.000 \ 1.124 1.000 \ 1.115 0.009 45 2.000 \ 1.066 3.000 \ 1.067 4.000 \ 1.079 1.000 \ 1.065 0.014 50 2.000 \ 1.082 3.000 \ 1.083 4.000 \ 1.094 1.000 \ 1.081 0.013 100 2.001 \ 1.044 2.999 \ 1.045 3.998 \ 1.051 1.001 \ 1.043 0.008 200 2.001 \ 1.023 3.000 \ 1.024 3.999 \ 1.030 1.002 \ 1.022 0.008 500 2.001 \ 1.019 3.000 \ 1.020 3.999 \ 1.023 1.002 \ 1.019 0.004 1000 2.001 \ 1.018 3.000 \ 1.018 3.999 \ 1.020 1.002 \ 1.018 0.002

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.010 3.000 \ 0.009 1.000 \ 0.008 2.000 \ 0.009 0.002 10 1.000 \ 0.254 3.000 \ 0.256 4.000 \ 0.263 2.000 \ 0.255 0.009 15 4.000 \ 0.371 2.000 \ 0.369 1.000 \ 0.355 3.000 \ 0.371 0.016 20 2.000 \ 0.456 3.000 \ 0.456 4.000 \ 0.457 1.000 \ 0.456 0.001 25 3.000 \ 0.423 2.000 \ 0.423 1.000 \ 0.421 4.000 \ 0.423 0.002 30 4.000 \ 0.556 2.000 \ 0.556 1.000 \ 0.554 3.000 \ 0.556 0.002 35 2.000 \ 0.544 3.000 \ 0.545 4.000 \ 0.548 1.000 \ 0.544 0.004 40 4.000 \ 0.552 2.000 \ 0.552 1.000 \ 0.551 3.000 \ 0.552 0.001 45 1.000 \ 0.569 3.000 \ 0.569 4.000 \ 0.570 2.000 \ 0.569 0.001 50 1.000 \ 0.433 3.000 \ 0.433 4.000 \ 0.435 2.000 \ 0.433 0.002 100 3.000 \ 0.474 2.000 \ 0.474 4.000 \ 0.475 1.000 \ 0.474 0.001 200 3.000 \ 0.516 2.000 \ 0.516 1.000 \ 0.516 4.000 \ 0.516 0.000 500 3.000 \ 0.503 2.000 \ 0.503 1.000 \ 0.503 4.000 \ 0.503 0.000 1000 3.000 \ 0.511 2.000 \ 0.511 1.000 \ 0.511 1.000 \ 0.511 0.000


Table 14 Digit Preference—Accuracy of T Scores on Means

Deviation from Target (50) n Rank \ Value B T V R 5 1.570 \ 0.000 1.371 \ 0.000 1.690 \ 0.000 1.184 \ 0.000 10 1.678 \ 0.000 1.644 \ 0.000 1.648 \ 0.000 1.771 \ 0.000 15 1.815 \ 0.000 1.816 \ 0.000 1.725 \ 0.000 1.838 \ 0.000 20 1.883 \ 0.000 1.825 \ 0.000 1.877 \ 0.000 1.986 \ 0.000 25 1.797 \ 0.000 1.870 \ 0.000 1.794 \ 0.000 1.799 \ 0.000 30 1.893 \ 0.000 2.072 \ 0.000 1.957 \ 0.000 1.775 \ 0.000 35 1.997 \ 0.000 2.048 \ 0.000 1.965 \ 0.000 2.075 \ 0.000 40 1.966 \ 0.000 2.048 \ 0.000 2.041 \ 0.000 2.025 \ 0.000 45 1.861 \ 0.000 1.954 \ 0.000 1.863 \ 0.000 1.922 \ 0.000 50 2.049 \ 0.000 2.026 \ 0.000 1.918 \ 0.000 1.922 \ 0.000 100 2.119 \ 0.000 2.185 \ 0.000 2.129 \ 0.000 2.094 \ 0.000 200 2.277 \ 0.000 2.272 \ 0.000 2.254 \ 0.000 2.292 \ 0.000 500 2.427 \ 0.000 2.418 \ 0.000 2.420 \ 0.000 2.429 \ 0.000 1000 2.459 \ 0.000 2.453 \ 0.000 2.456 \ 0.000 2.471 \ 0.000

Magnitude of Deviation (RMS) n Rank \ Value B T V R 5 1.000 \ 0.000 2.500 \ 0.000 2.500 \ 0.000 4.000 \ 0.000 10 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 15 1.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 1.000 \ 0.000 20 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 1.000 \ 0.000 25 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 30 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 35 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 40 1.000 \ 0.000 2.500 \ 0.000 2.500 \ 0.000 4.000 \ 0.000 45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 50 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 100 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000


Table 15 Digit Preference—Accuracy of T Scores on Standard Deviations

Deviation from Target (10) n Rank \ Value Range B T V R 5 1.000 \ 1.975 3.000 \ 2.166 4.000 \ 3.361 2.000 \ 2.077 1.386 10 1.000 \ 1.130 3.000 \ 1.265 4.000 \ 2.159 2.000 \ 1.153 1.029 15 2.000 \ 0.819 3.000 \ 0.926 4.000 \ 1.645 1.000 \ 0.817 0.828 20 2.000 \ 0.652 3.000 \ 0.742 4.000 \ 1.350 1.000 \ 0.640 0.710 25 2.000 \ 0.543 3.000 \ 0.620 4.000 \ 1.152 1.000 \ 0.526 0.626 30 2.000 \ 0.468 3.000 \ 0.537 4.000 \ 1.010 1.000 \ 0.448 0.562 35 2.000 \ 0.413 3.000 \ 0.474 4.000 \ 0.903 1.000 \ 0.391 0.512 40 2.000 \ 0.372 3.000 \ 0.428 4.000 \ 0.820 1.000 \ 0.349 0.471 45 2.000 \ 0.336 3.000 \ 0.388 4.000 \ 0.750 1.000 \ 0.314 0.436 50 2.000 \ 0.309 3.000 \ 0.357 4.000 \ 0.695 1.000 \ 0.287 0.408 100 2.000 \ 0.176 3.000 \ 0.205 4.000 \ 0.411 1.000 \ 0.156 0.255 200 2.000 \ 0.102 3.000 \ 0.119 4.000 \ 0.241 1.000 \ 0.009 0.232 500 2.000 \ 0.005 3.000 \ 0.006 4.000 \ 0.119 1.000 \ 0.004 0.115 1000 2.000 \ 0.003 3.000 \ 0.004 4.000 \ 0.007 1.000 \ 0.003 0.004

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.118 3.000 \ 0.115 2.000 \ 0.114 1.000 \ 0.009 0.109 10 4.000 \ 0.002 3.000 \ 0.007 2.000 \ 0.007 1.000 \ 0.006 0.005 15 3.000 \ 0.002 4.000 \ 0.002 1.500 \ 0.002 1.500 \ 0.002 0.000 20 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001 25 3.000 \ 0.002 4.000 \ 0.003 2.000 \ 0.002 1.000 \ 0.002 0.000 30 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.002 0.000 35 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001 40 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 45 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 50 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 100 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 200 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000 500 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000 1000 3.000 \ 0.000 4.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000


Table 16 Digit Preference—Accuracy of T Scores on Skewness

Deviation from Target (0) n Rank \ Value Range B T V R 5 3.414 \ 0.000 2.817 \ 0.000 1.580 \ 0.000 2.189 \ 0.000 0.000 10 3.998 \ 0.000 2.000 \ 0.000 1.003 \ 0.000 2.999 \ 0.000 0.000 15 3.998 \ 0.002 2.000 \ 0.002 1.003 \ 0.000 2.999 \ 0.002 0.002 20 1.088 \ 0.010 3.000 \ 0.100 4.000 \ 0.109 1.912 \ 0.010 0.099 25 2.002 \ 0.135 2.999 \ 0.136 3.996 \ 0.141 1.002 \ 0.135 0.006 30 2.004 \ 0.122 2.998 \ 0.123 3.997 \ 0.127 1.000 \ 0.122 0.005 35 2.001 \ 0.008 2.999 \ 0.008 3.997 \ 0.008 1.003 \ 0.008 0.000 40 2.020 \ 0.003 2.973 \ 0.003 3.938 \ 0.003 1.069 \ 0.003 0.000 45 2.336 \ 0.003 2.732 \ 0.003 3.184 \ 0.003 1.748 \ 0.003 0.000 50 2.001 \ 0.009 2.983 \ 0.009 3.995 \ 0.010 1.021 \ 0.009 0.001 100 2.958 \ 0.003 2.045 \ 0.003 1.128 \ 0.003 3.869 \ 0.003 0.000 200 2.924 \ 0.003 2.055 \ 0.003 1.169 \ 0.003 3.851 \ 0.003 0.000 500 2.839 \ 0.003 2.094 \ 0.003 1.287 \ 0.003 3.781 \ 0.003 0.000 1000 2.883 \ 0.001 2.110 \ 0.001 1.345 \ 0.001 3.662 \ 0.001 0.000

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.001 3.000 \ 0.001 1.000 \ 0.001 2.000 \ 0.001 0.000 10 4.000 \ 0.313 2.000 \ 0.311 1.000 \ 0.298 3.000 \ 0.311 0.015 15 4.000 \ 0.562 2.000 \ 0.561 1.000 \ 0.557 3.000 \ 0.561 0.005 20 2.000 \ 0.371 3.000 \ 0.372 4.000 \ 0.385 1.000 \ 0.371 0.014 25 4.000 \ 0.533 2.000 \ 0.530 1.000 \ 0.518 3.000 \ 0.532 0.015 30 3.000 \ 0.520 2.000 \ 0.520 1.000 \ 0.520 4.000 \ 0.520 0.000 35 3.000 \ 0.420 2.000 \ 0.418 1.000 \ 0.410 4.000 \ 0.420 0.010 40 3.000 \ 0.486 2.000 \ 0.485 1.000 \ 0.477 4.000 \ 0.486 0.009 45 3.000 \ 0.587 2.000 \ 0.586 1.000 \ 0.574 4.000 \ 0.588 0.014 50 3.000 \ 0.620 2.000 \ 0.618 1.000 \ 0.606 4.000 \ 0.621 0.015 100 3.000 \ 0.564 2.000 \ 0.564 1.000 \ 0.563 4.000 \ 0.564 0.001 200 2.000 \ 0.553 3.000 \ 0.553 4.000 \ 0.554 1.000 \ 0.553 0.001 500 2.000 \ 0.552 3.000 \ 0.552 4.000 \ 0.553 1.000 \ 0.552 0.001 1000 3.000 \ 0.539 2.000 \ 0.539 1.000 \ 0.539 4.000 \ 0.539 0.000


Table 17 Digit Preference—Accuracy of T Scores on Kurtosis

Deviation from Target (3) n Rank \ Value Range B T V R 5 1.000 \ 1.153 2.000 \ 1.160 4.000 \ 1.200 3.000 \ 1.162 0.047 10 1.000 \ 1.057 3.000 \ 1.062 4.000 \ 1.095 2.000 \ 0.061 1.034 15 1.000 \ 1.072 3.000 \ 1.076 4.000 \ 1.099 2.000 \ 1.074 0.027 20 1.000 \ 1.082 3.000 \ 1.085 4.000 \ 1.102 2.000 \ 1.083 0.020 25 1.949 \ 1.054 3.000 \ 1.056 4.000 \ 1.073 1.051 \ 1.054 0.019 30 2.000 \ 1.020 3.000 \ 1.023 4.000 \ 1.039 1.000 \ 1.020 0.019 35 2.000 \ 0.976 3.000 \ 0.978 4.000 \ 0.993 1.000 \ 0.975 0.018 40 2.000 \ 0.947 3.000 \ 0.949 4.000 \ 0.964 1.000 \ 0.947 0.017 45 2.000 \ 0.956 3.000 \ 0.958 4.000 \ 0.971 1.000 \ 0.955 0.016 50 2.000 \ 0.969 3.000 \ 0.971 4.000 \ 0.983 1.000 \ 0.968 0.015 100 2.000 \ 0.955 3.000 \ 0.958 4.000 \ 0.974 1.000 \ 0.954 0.020 200 2.000 \ 0.946 3.000 \ 0.940 4.000 \ 0.959 1.000 \ 0.945 0.019 500 2.000 \ 0.940 3.000 \ 0.941 4.000 \ 0.947 1.000 \ 0.939 0.008 1000 2.000 \ 1.083 3.000 \ 1.083 3.999 \ 1.084 1.001 \ 1.083 0.001

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.121 3.000 \ 0.121 1.000 \ 0.117 2.000 \ 0.121 0.004 10 1.000 \ 0.328 3.000 \ 0.329 4.000 \ 0.333 2.000 \ 0.329 0.005 15 1.000 \ 0.321 2.000 \ 0.321 4.000 \ 0.323 3.000 \ 0.321 0.002 20 1.000 \ 0.344 3.000 \ 0.345 4.000 \ 0.354 2.000 \ 0.344 0.010 25 4.000 \ 0.433 2.000 \ 0.431 1.000 \ 0.425 3.000 \ 0.432 0.008 30 2.000 \ 0.467 3.000 \ 0.467 4.000 \ 0.472 1.000 \ 0.466 0.006 35 4.000 \ 0.355 3.000 \ 0.355 1.000 \ 0.355 2.000 \ 0.355 0.000 40 3.000 \ 0.433 2.000 \ 0.433 1.000 \ 0.432 4.000 \ 0.434 0.001 45 4.000 \ 0.459 2.000 \ 0.459 1.000 \ 0.458 3.000 \ 0.459 0.001 50 3.000 \ 0.496 2.000 \ 0.496 1.000 \ 0.492 4.000 \ 0.497 0.005 100 2.000 \ 0.481 3.000 \ 0.481 4.000 \ 0.481 1.000 \ 0.481 0.000 200 2.000 \ 0.480 3.000 \ 0.480 4.000 \ 0.480 1.000 \ 0.480 0.000 500 2.000 \ 0.476 3.000 \ 0.476 4.000 \ 0.476 1.000 \ 0.476 0.000 1000 2.000 \ 0.472 3.000 \ 0.472 4.000 \ 0.472 1.000 \ 0.472 0.000


Table 18 Multimodal Lumpy—Accuracy of T Scores on Means

Deviation from Target (50) n Rank \ Value B T V R 5 1.506 \ 0.000 1.383 \ 0.000 1.615 \ 0.000 1.300 \ 0.000 10 1.824 \ 0.000 1.619 \ 0.000 1.677 \ 0.000 1.667 \ 0.000 15 1.809 \ 0.000 1.842 \ 0.000 1.716 \ 0.000 1.839 \ 0.000 20 1.878 \ 0.000 1.786 \ 0.000 1.877 \ 0.000 1.948 \ 0.000 25 1.724 \ 0.000 1.876 \ 0.000 1.799 \ 0.000 1.850 \ 0.000 30 1.863 \ 0.000 2.072 \ 0.000 1.955 \ 0.000 1.750 \ 0.000 35 1.957 \ 0.000 2.054 \ 0.000 1.944 \ 0.000 2.050 \ 0.000 40 1.962 \ 0.000 2.014 \ 0.000 2.065 \ 0.000 1.967 \ 0.000 45 1.845 \ 0.000 1.938 \ 0.000 1.819 \ 0.000 1.905 \ 0.000 50 2.032 \ 0.000 1.988 \ 0.000 1.943 \ 0.000 1.937 \ 0.000 100 2.103 \ 0.000 2.170 \ 0.000 2.120 \ 0.000 2.090 \ 0.000 200 2.247 \ 0.000 2.268 \ 0.000 2.219 \ 0.000 2.265 \ 0.000 500 2.381 \ 0.000 2.413 \ 0.000 2.443 \ 0.000 2.401 \ 0.000 1000 2.460 \ 0.000 2.457 \ 0.000 2.443 \ 0.000 2.444 \ 0.000

Magnitude of Deviation (RMS) n Rank \ Value B T V R 5 1.500 \ 0.000 1.500 \ 0.000 3.500 \ 0.000 3.500 \ 0.000 10 1.000 \ 0.000 2.500 \ 0.000 2.500 \ 0.000 4.000 \ 0.000 15 1.000 \ 0.000 3.000 \ 0.000 3.000 \ 0.000 3.000 \ 0.000 20 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 25 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 30 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 35 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 40 1.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 2.000 \ 0.000 45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 50 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 100 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000


Table 19 Multimodal Lumpy—Accuracy of T Scores on Standard Deviations

Deviation from Target (10) n Rank \ Value Range B T V R 5 1.000 \ 1.963 3.000 \ 2.154 4.000 \ 3.351 2.000 \ 2.065 1.388 10 1.000 \ 1.123 3.000 \ 1.258 4.000 \ 2.151 2.000 \ 1.146 1.028 15 2.000 \ 0.810 3.000 \ 0.917 4.000 \ 1.636 1.000 \ 0.809 0.827 20 2.000 \ 0.643 3.000 \ 0.733 4.000 \ 1.341 1.000 \ 0.631 0.710 25 2.000 \ 0.537 3.000 \ 0.615 4.000 \ 1.145 1.000 \ 0.520 0.625 30 2.000 \ 0.464 3.000 \ 0.533 4.000 \ 1.005 1.000 \ 0.444 0.561 35 2.000 \ 0.410 3.000 \ 0.471 4.000 \ 0.898 1.000 \ 0.388 0.510 40 2.000 \ 0.368 3.000 \ 0.424 4.000 \ 0.815 1.000 \ 0.346 0.469 45 2.000 \ 0.334 3.000 \ 0.386 4.000 \ 0.747 1.000 \ 0.311 0.436 50 2.000 \ 0.307 3.000 \ 0.355 4.000 \ 0.691 1.000 \ 0.285 0.406 100 2.000 \ 0.178 3.000 \ 0.206 4.000 \ 0.411 1.000 \ 0.158 0.253 200 2.000 \ 0.105 3.000 \ 0.122 4.000 \ 0.242 1.000 \ 0.009 0.233 500 2.000 \ 0.006 3.000 \ 0.006 4.000 \ 0.123 1.000 \ 0.005 0.118 1000 2.000 \ 0.004 3.000 \ 0.004 4.000 \ 0.008 1.000 \ 0.004 0.004

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.118 3.000 \ 0.115 2.000 \ 0.114 1.000 \ 0.009 0.109 10 4.000 \ 0.003 3.000 \ 0.003 2.000 \ 0.003 1.000 \ 0.002 0.001 15 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.002 0.000 20 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001 25 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.002 0.000 30 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001 35 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 40 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 45 4.000 \ 0.002 3.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001 50 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 100 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000 200 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000 500 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000 1000 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000


Table 20 Multimodal Lumpy—Accuracy of T Scores on Skewness

Deviation from Target (0) n Rank \ Value Range B T V R 5 3.989 \ 0.000 2.996 \ 0.000 1.012 \ 0.000 2.003 \ 0.000 0.000 10 1.399 \ 0.003 2.912 \ 0.003 3.742 \ 0.003 1.948 \ 0.003 0.000 15 1.000 \ 0.009 3.000 \ 0.009 4.000 \ 0.009 2.000 \ 0.009 0.000 20 1.565 \ 0.128 2.999 \ 0.129 3.996 \ 0.134 1.439 \ 0.128 0.006 25 1.994 \ 0.121 2.999 \ 0.122 3.998 \ 0.128 1.009 \ 0.121 0.007 30 2.005 \ 0.006 2.998 \ 0.006 3.993 \ 0.007 1.004 \ 0.006 0.001 35 2.853 \ 0.118 2.104 \ 0.115 1.429 \ 0.009 3.614 \ 0.118 0.109 40 3.024 \ 0.120 2.001 \ 0.118 1.004 \ 0.100 3.972 \ 0.121 0.021 45 2.191 \ 0.005 2.735 \ 0.005 3.402 \ 0.006 1.673 \ 0.005 0.001 50 2.001 \ 0.144 2.999 \ 0.145 3.998 \ 0.149 1.002 \ 0.144 0.005 100 2.928 \ 0.003 2.054 \ 0.003 1.168 \ 0.003 3.851 \ 0.003 0.000 200 2.925 \ 0.003 2.062 \ 0.003 1.186 \ 0.003 3.827 \ 0.003 0.000 500 2.917 \ 0.003 2.079 \ 0.003 1.207 \ 0.003 3.797 \ 0.003 0.000 1000 2.884 \ 0.003 2.104 \ 0.003 1.286 \ 0.003 3.726 \ 0.003 0.000

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.010 3.000 \ 0.009 1.000 \ 0.009 2.000 \ 0.009 0.001 10 4.000 \ 0.572 2.000 \ 0.567 1.000 \ 0.534 3.000 \ 0.568 0.038 15 4.000 \ 0.490 2.000 \ 0.486 1.000 \ 0.461 3.000 \ 0.489 0.029 20 4.000 \ 0.684 2.000 \ 0.682 1.000 \ 0.672 3.000 \ 0.684 0.012 25 4.000 \ 0.566 2.000 \ 0.564 1.000 \ 0.553 3.000 \ 0.566 0.013 30 2.000 \ 0.399 3.000 \ 0.399 4.000 \ 0.401 1.000 \ 0.399 0.002 35 2.000 \ 0.477 3.000 \ 0.478 4.000 \ 0.482 1.000 \ 0.477 0.005 40 2.000 \ 0.588 3.000 \ 0.589 4.000 \ 0.596 1.000 \ 0.588 0.008 45 2.000 \ 0.448 3.000 \ 0.448 4.000 \ 0.449 1.000 \ 0.447 0.002 50 2.000 \ 0.484 3.000 \ 0.484 4.000 \ 0.485 1.000 \ 0.480 0.005 100 3.000 \ 0.559 2.000 \ 0.559 1.000 \ 0.559 4.000 \ 0.559 0.000 200 2.000 \ 0.556 3.000 \ 0.556 4.000 \ 0.557 1.000 \ 0.556 0.001 500 2.000 \ 0.550 3.000 \ 0.550 4.000 \ 0.550 1.000 \ 0.550 0.000 1000 2.000 \ 0.561 3.000 \ 0.561 4.000 \ 0.561 1.000 \ 0.561 0.000


Table 21 Multimodal Lumpy—Accuracy of T Scores on Kurtosis

Deviation from Target (3) n Rank \ Value Range B T V R 5 1.000 \ 1.150 2.000 \ 1.157 4.000 \ 1.197 3.000 \ 1.158 0.047 10 1.000 \ 1.043 3.000 \ 1.047 4.000 \ 1.077 2.000 \ 1.046 0.034 15 1.000 \ 1.022 3.000 \ 1.026 4.000 \ 1.050 2.000 \ 1.024 0.028 20 1.000 \ 1.042 3.000 \ 1.045 4.000 \ 1.064 2.000 \ 1.043 0.022 25 1.999 \ 1.083 3.000 \ 1.086 4.000 \ 1.101 1.001 \ 1.083 0.018 30 2.000 \ 1.108 3.000 \ 1.110 4.000 \ 1.123 1.000 \ 1.107 0.016 35 2.000 \ 1.049 3.000 \ 1.052 4.000 \ 1.070 1.000 \ 1.048 0.022 40 2.000 \ 1.021 3.000 \ 1.024 4.000 \ 1.044 1.000 \ 1.020 0.024 45 2.000 \ 1.105 3.000 \ 1.107 4.000 \ 1.117 1.000 \ 1.105 0.012 50 2.000 \ 1.072 3.000 \ 1.074 3.999 \ 1.084 1.001 \ 1.072 0.012 100 2.000 \ 0.960 3.000 \ 0.962 4.000 \ 0.978 1.000 \ 0.958 0.020 200 2.000 \ 0.950 3.000 \ 0.952 4.000 \ 0.962 1.000 \ 0.949 0.013 500 2.000 \ 0.943 3.000 \ 0.944 4.000 \ 0.950 1.000 \ 0.942 0.008 1000 2.000 \ 0.942 3.000 \ 0.942 4.000 \ 0.942 1.000 \ 0.941 0.001

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.121 3.000 \ 0.121 1.000 \ 0.117 2.000 \ 0.121 0.004 10 4.000 \ 0.410 2.000 \ 0.403 1.000 \ 0.390 3.000 \ 0.403 0.020 15 4.000 \ 0.426 2.000 \ 0.424 1.000 \ 0.415 3.000 \ 0.425 0.011 20 1.000 \ 0.480 3.000 \ 0.481 4.000 \ 0.485 2.000 \ 0.480 0.005 25 2.000 \ 0.416 3.000 \ 0.416 4.000 \ 0.420 1.000 \ 0.416 0.004 30 2.000 \ 0.348 3.000 \ 0.348 4.000 \ 0.355 1.000 \ 0.347 0.008 35 2.000 \ 0.435 3.000 \ 0.435 4.000 \ 0.439 1.000 \ 0.435 0.004 40 2.000 \ 0.463 3.000 \ 0.464 4.000 \ 0.473 1.000 \ 0.463 0.010 45 2.000 \ 0.399 3.000 \ 0.399 4.000 \ 0.400 1.000 \ 0.399 0.001 50 2.000 \ 0.432 3.000 \ 0.432 4.000 \ 0.434 1.000 \ 0.432 0.002 100 1.000 \ 0.472 3.000 \ 0.472 4.000 \ 0.472 2.000 \ 0.472 0.000 200 2.000 \ 0.475 3.000 \ 0.475 4.000 \ 0.475 1.000 \ 0.474 0.001 500 2.000 \ 0.471 3.000 \ 0.471 4.000 \ 0.471 1.000 \ 0.471 0.000 1000 2.000 \ 0.478 3.000 \ 0.478 4.000 \ 0.478 1.000 \ 0.478 0.000


Table 22 Mass at Zero with Gap—Accuracy of T Scores on Means

Deviation from Target (50) n Rank \ Value B T V R 5 1.190 \ 0.000 1.342 \ 0.000 1.561 \ 0.000 1.001 \ 0.000 10 1.989 \ 0.000 1.520 \ 0.000 2.024 \ 0.000 2.105 \ 0.000 15 1.619 \ 0.000 2.125 \ 0.000 2.031 \ 0.000 1.882 \ 0.000 20 1.535 \ 0.000 1.665 \ 0.000 2.191 \ 0.000 2.268 \ 0.000 25 2.016 \ 0.000 1.550 \ 0.000 1.807 \ 0.000 2.747 \ 0.000 30 2.103 \ 0.000 2.111 \ 0.000 2.396 \ 0.000 2.036 \ 0.000 35 2.618 \ 0.000 1.833 \ 0.000 2.457 \ 0.000 1.926 \ 0.000 40 2.503 \ 0.000 1.926 \ 0.000 1.804 \ 0.000 2.093 \ 0.000 45 2.514 \ 0.000 2.011 \ 0.000 1.863 \ 0.000 2.242 \ 0.000 50 2.410 \ 0.000 2.026 \ 0.000 1.932 \ 0.000 2.659 \ 0.000 100 2.350 \ 0.000 2.218 \ 0.000 2.594 \ 0.000 2.267 \ 0.000 200 2.352 \ 0.000 2.365 \ 0.000 2.273 \ 0.000 2.695 \ 0.000 500 2.542 \ 0.000 2.393 \ 0.000 2.161 \ 0.000 2.814 \ 0.000 1000 2.538 \ 0.000 2.331 \ 0.000 2.547 \ 0.000 2.523 \ 0.000

Magnitude of Deviation (RMS) n Rank \ Value B T V R 5 2.000 \ 0.000 2.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 10 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 15 2.500 \ 0.000 4.000 \ 0.000 2.500 \ 0.000 1.000 \ 0.000 20 3.000 \ 0.000 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 25 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 30 3.000 \ 0.000 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 35 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 1.000 \ 0.000 40 2.000 \ 0.000 1.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 45 1.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 2.000 \ 0.000 50 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 100 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 500 2.000 \ 0.000 1.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 1000 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 1.000 \ 0.000


Table 23 Mass at Zero with Gap—Accuracy of T Scores on Standard Deviations

Deviation from Target (10) n Rank \ Value Range B T V R 5 1.000 \ 6.281 2.186 \ 6.372 2.779 \ 6.937 1.593 \ 6.331 0.656 10 1.000 \ 4.589 2.721 \ 4.678 3.581 \ 5.255 1.860 \ 4.608 0.666 15 1.382 \ 3.898 2.909 \ 3.976 3.864 \ 4.491 1.572 \ 3.900 0.593 20 1.957 \ 3.255 2.987 \ 3.323 3.981 \ 3.786 1.036 \ 3.246 0.540 25 2.000 \ 2.976 3.000 \ 3.035 4.000 \ 3.444 1.000 \ 2.962 0.482 30 2.000 \ 2.940 3.000 \ 2.993 4.000 \ 3.358 1.000 \ 2.925 0.433 35 1.999 \ 3.215 2.999 \ 3.262 3.998 \ 3.589 1.000 \ 3.199 0.390 40 2.000 \ 3.163 2.999 \ 3.206 3.999 \ 3.506 1.000 \ 3.146 0.360 45 2.000 \ 3.113 3.000 \ 3.153 4.000 \ 3.430 1.000 \ 3.096 0.334 50 2.000 \ 3.078 3.000 \ 3.114 4.000 \ 3.373 1.000 \ 3.060 0.313 100 2.000 \ 2.909 3.000 \ 2.931 4.000 \ 3.089 1.000 \ 2.893 0.196 200 2.000 \ 2.802 3.000 \ 2.815 4.000 \ 2.909 1.000 \ 2.790 0.119 500 2.000 \ 2.759 3.000 \ 2.765 4.000 \ 2.811 1.000 \ 2.752 0.059 1000 1.957 \ 2.746 2.810 \ 2.750 4.000 \ 2.776 1.233 \ 2.743 0.033

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 2.750 3.000 \ 2.717 2.000 \ 2.687 1.000 \ 2.283 0.467 10 4.000 \ 2.676 3.000 \ 2.668 2.000 \ 2.634 1.000 \ 2.362 0.314 15 3.000 \ 0.904 4.000 \ 0.906 2.000 \ 0.899 1.000 \ 0.867 0.037 20 4.000 \ 0.645 3.000 \ 0.644 2.000 \ 0.642 1.000 \ 0.626 0.019 25 4.000 \ 0.520 3.000 \ 0.519 2.000 \ 0.519 1.000 \ 0.511 0.009 30 4.000 \ 0.524 3.000 \ 0.523 2.000 \ 0.523 1.000 \ 0.517 0.007 35 4.000 \ 0.826 3.000 \ 0.824 2.000 \ 0.824 1.000 \ 0.814 0.012 40 4.000 \ 0.623 3.000 \ 0.623 2.000 \ 0.622 1.000 \ 0.617 0.006 45 4.000 \ 0.681 3.000 \ 0.680 2.000 \ 0.680 1.000 \ 0.677 0.004 50 4.000 \ 0.662 3.000 \ 0.661 2.000 \ 0.661 1.000 \ 0.658 0.004 100 1.000 \ 0.348 2.000 \ 0.348 4.000 \ 0.348 3.000 \ 0.348 0.000 200 1.000 \ 0.264 2.000 \ 0.265 3.000 \ 0.265 4.000 \ 0.265 0.001 500 1.000 \ 0.169 2.000 \ 0.169 3.000 \ 0.169 4.000 \ 0.170 0.001 1000 1.000 \ 0.119 2.000 \ 0.119 4.000 \ 0.119 3.000 \ 0.119 0.000


Table 24 Mass at Zero with Gap—Accuracy of T Scores on Skewness

Deviation from Target (0) n Rank \ Value Range B T V R 5 4.000 \ 0.719 3.000 \ 0.718 1.000 \ 0.715 2.000 \ 0.718 0.004 10 4.000 \ 0.688 2.000 \ 0.687 1.000 \ 0.684 3.000 \ 0.687 0.004 15 3.999 \ 0.675 2.000 \ 0.675 1.000 \ 0.672 3.000 \ 0.675 0.003 20 4.000 \ 0.761 2.000 \ 0.760 1.000 \ 0.753 3.000 \ 0.761 0.008 25 3.998 \ 0.774 2.000 \ 0.772 1.000 \ 0.764 3.002 \ 0.774 0.010 30 3.056 \ 0.808 2.000 \ 0.806 1.000 \ 0.794 3.944 \ 0.808 0.014 35 2.999 \ 0.692 2.000 \ 0.691 1.001 \ 0.689 4.000 \ 0.692 0.003 40 3.002 \ 0.676 2.000 \ 0.676 1.001 \ 0.674 3.997 \ 0.676 0.002 45 2.999 \ 0.676 1.999 \ 0.676 1.002 \ 0.674 3.999 \ 0.676 0.002 50 2.997 \ 0.694 2.003 \ 0.694 1.001 \ 0.692 3.997 \ 0.695 0.003 100 3.000 \ 0.701 2.000 \ 0.701 1.000 \ 0.699 4.000 \ 0.701 0.002 200 3.000 \ 0.747 2.000 \ 0.747 1.000 \ 0.743 4.000 \ 0.748 0.005 500 3.000 \ 0.749 2.000 \ 0.749 1.000 \ 0.746 4.000 \ 0.749 0.003 1000 2.999 \ 0.701 2.000 \ 0.700 1.001 \ 0.700 3.999 \ 0.701 0.001

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 4.000 \ 0.491 3.000 \ 0.491 1.000 \ 0.491 2.000 \ 0.491 0.000 10 4.000 \ 0.575 2.000 \ 0.574 1.000 \ 0.573 3.000 \ 0.574 0.002 15 1.000 \ 0.676 3.000 \ 0.676 4.000 \ 0.677 2.000 \ 0.676 0.001 20 1.000 \ 0.692 3.000 \ 0.692 4.000 \ 0.693 2.000 \ 0.692 0.001 25 2.000 \ 0.626 3.000 \ 0.626 4.000 \ 0.632 1.000 \ 0.625 0.007 30 1.000 \ 0.751 3.000 \ 0.753 4.000 \ 0.764 2.000 \ 0.751 0.002 35 2.000 \ 0.684 3.000 \ 0.685 4.000 \ 0.686 1.000 \ 0.684 0.002 40 2.000 \ 0.658 3.000 \ 0.658 4.000 \ 0.659 1.000 \ 0.658 0.001 45 2.000 \ 0.686 3.000 \ 0.686 4.000 \ 0.687 1.000 \ 0.686 0.001 50 2.000 \ 0.693 3.000 \ 0.693 4.000 \ 0.694 1.000 \ 0.693 0.001 100 2.000 \ 0.714 3.000 \ 0.714 4.000 \ 0.714 1.000 \ 0.714 0.000 200 2.000 \ 0.681 3.000 \ 0.682 4.000 \ 0.685 1.000 \ 0.681 0.004 500 2.000 \ 0.691 3.000 \ 0.691 4.000 \ 0.693 1.000 \ 0.691 0.002 1000 2.000 \ 0.715 3.000 \ 0.715 4.000 \ 0.715 1.000 \ 0.715 0.000


Table 25 Mass at Zero with Gap—Accuracy of T Scores on Kurtosis

Deviation from Target (3) n Rank \ Value Range B T V R 5 1.000 \ 0.144 2.000 \ 0.145 4.000 \ 0.150 3.000 \ 0.145 0.006 10 1.000 \ 0.183 2.500 \ 0.184 4.000 \ 0.190 2.500 \ 0.184 0.007 15 1.001 \ 0.199 3.000 \ 0.200 3.999 \ 0.206 2.000 \ 0.200 0.007 20 1.001 \ 0.517 3.000 \ 0.518 3.999 \ 0.528 2.000 \ 0.517 0.011 25 1.000 \ 0.846 3.000 \ 0.848 4.000 \ 0.858 2.000 \ 0.846 0.012 30 1.742 \ 0.779 3.000 \ 0.782 4.000 \ 0.797 1.258 \ 0.780 0.018 35 2.000 \ 0.180 2.999 \ 0.181 4.000 \ 0.185 1.000 \ 0.180 0.005 40 1.997 \ 0.202 3.000 \ 0.203 3.999 \ 0.206 1.003 \ 0.202 0.004 45 2.000 \ 0.199 3.000 \ 0.200 3.997 \ 0.203 1.002 \ 0.199 0.004 50 2.001 \ 0.172 2.999 \ 0.172 3.998 \ 0.175 1.002 \ 0.171 0.004 100 2.000 \ 0.163 3.000 \ 0.164 3.998 \ 0.166 1.001 \ 0.163 0.003 200 2.000 \ 0.423 3.000 \ 0.424 4.000 \ 0.430 1.000 \ 0.423 0.007 500 2.000 \ 0.421 3.000 \ 0.422 4.000 \ 0.425 1.000 \ 0.421 0.004 1000 2.000 \ 0.164 2.999 \ 0.164 3.999 \ 0.164 1.002 \ 0.164 0.000

Magnitude of Deviation (RMS) n Rank \ Value Range B T V R 5 1.000 \ 0.807 2.000 \ 0.809 4.000 \ 0.819 3.000 \ 0.809 0.012 10 1.000 \ 0.675 3.000 \ 0.679 4.000 \ 0.705 2.000 \ 0.678 0.030 15 1.000 \ 0.667 3.000 \ 0.670 4.000 \ 0.691 2.000 \ 0.669 0.024 20 1.000 \ 0.615 3.000 \ 0.617 4.000 \ 0.628 2.000 \ 0.615 0.013 25 2.000 \ 0.945 3.000 \ 0.945 4.000 \ 0.951 1.000 \ 0.945 0.006 30 1.000 \ 0.900 3.000 \ 0.902 4.000 \ 0.913 2.000 \ 0.900 0.013 35 2.000 \ 0.740 3.000 \ 0.740 4.000 \ 0.745 1.000 \ 0.740 0.005 40 2.000 \ 0.754 3.000 \ 0.755 4.000 \ 0.756 1.000 \ 0.754 0.002 45 2.000 \ 0.715 3.000 \ 0.716 4.000 \ 0.724 1.000 \ 0.715 0.009 50 2.000 \ 0.611 3.000 \ 0.613 4.000 \ 0.622 1.000 \ 0.611 0.011 100 2.000 \ 0.687 3.000 \ 0.688 4.000 \ 0.691 1.000 \ 0.687 0.004 200 2.000 \ 0.940 3.000 \ 0.940 4.000 \ 0.942 1.000 \ 0.940 0.002 500 2.000 \ 0.945 3.000 \ 0.945 4.000 \ 0.946 1.000 \ 0.945 0.001 1000 2.000 \ 0.657 3.000 \ 0.657 4.000 \ 0.657 1.000 \ 0.656 0.001


Table 26 Extreme Asymmetric, Decay—Accuracy of T Scores on Means

Deviation from Target (50) n Rank \ Value B T V R 5 1.872 \ 0.000 1.520 \ 0.000 1.575 \ 0.000 1.032 \ 0.000 10 1.748 \ 0.000 1.604 \ 0.000 1.930 \ 0.000 2.158 \ 0.000 15 1.703 \ 0.000 1.945 \ 0.000 1.967 \ 0.000 1.854 \ 0.000 20 2.455 \ 0.000 1.720 \ 0.000 2.076 \ 0.000 1.546 \ 0.000 25 1.918 \ 0.000 1.684 \ 0.000 1.823 \ 0.000 2.114 \ 0.000 30 2.150 \ 0.000 1.895 \ 0.000 2.018 \ 0.000 1.992 \ 0.000 35 2.203 \ 0.000 2.284 \ 0.000 2.063 \ 0.000 2.092 \ 0.000 40 1.705 \ 0.000 2.564 \ 0.000 1.867 \ 0.000 2.313 \ 0.000 45 2.109 \ 0.000 2.300 \ 0.000 1.912 \ 0.000 1.831 \ 0.000 50 1.890 \ 0.000 2.431 \ 0.000 2.297 \ 0.000 1.955 \ 0.000 100 2.272 \ 0.000 2.329 \ 0.000 2.166 \ 0.000 2.466 \ 0.000 200 2.503 \ 0.000 2.275 \ 0.000 2.392 \ 0.000 2.389 \ 0.000 500 2.293 \ 0.000 2.640 \ 0.000 2.485 \ 0.000 2.444 \ 0.000 1000 2.528 \ 0.000 2.582 \ 0.000 2.447 \ 0.000 2.379 \ 0.000

Magnitude of Deviation (RMS)
(cell entries: rank \ value)

n      B                T                V                R
5      2.000 \ 0.000    2.000 \ 0.000    2.000 \ 0.000    4.000 \ 0.000
10     1.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000    2.000 \ 0.000
15     3.000 \ 0.000    4.000 \ 0.000    1.000 \ 0.000    2.000 \ 0.000
20     3.000 \ 0.000    4.000 \ 0.000    1.500 \ 0.000    1.500 \ 0.000
25     1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
30     1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
35     1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
40     3.000 \ 0.000    2.000 \ 0.000    1.000 \ 0.000    4.000 \ 0.000
45     1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
50     1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
100    1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
200    1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
500    1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
1000   1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000


Table 27 Extreme Asymmetric, Decay—Accuracy of T Scores on Standard Deviations

Deviation from Target (10)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      1.000 \ 3.410    2.873 \ 3.566    3.810 \ 4.547    1.937 \ 3.494    1.137
10     1.000 \ 2.171    2.995 \ 2.287    3.993 \ 3.059    1.998 \ 2.190    0.888
15     1.974 \ 1.801    3.000 \ 1.892    4.000 \ 2.511    1.026 \ 1.798    0.713
20     2.000 \ 1.611    3.000 \ 1.687    4.000 \ 2.207    1.000 \ 1.599    0.608
25     2.000 \ 1.495    3.000 \ 1.560    4.000 \ 2.011    1.000 \ 1.480    0.531
30     2.000 \ 1.319    3.000 \ 1.376    4.000 \ 1.779    1.000 \ 1.301    0.478
35     2.000 \ 1.260    3.000 \ 1.312    4.000 \ 1.674    1.000 \ 1.241    0.433
40     2.000 \ 1.228    3.000 \ 1.275    4.000 \ 1.604    1.000 \ 1.209    0.395
45     2.000 \ 1.203    3.000 \ 1.246    4.000 \ 1.548    1.000 \ 1.184    0.364
50     2.000 \ 1.184    3.000 \ 1.224    4.000 \ 1.504    1.000 \ 1.165    0.339
100    2.000 \ 1.129    3.000 \ 1.152    4.000 \ 1.317    1.000 \ 1.114    0.203
200    2.000 \ 1.055    3.000 \ 1.068    4.000 \ 1.162    1.000 \ 1.044    0.118
500    2.000 \ 1.017    3.000 \ 1.022    4.000 \ 1.066    1.000 \ 1.010    0.056
1000   2.022 \ 1.003    2.911 \ 1.006    4.000 \ 1.030    1.067 \ 1.001    0.029

Magnitude of Deviation (RMS)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      4.000 \ 0.307    3.000 \ 0.303    2.000 \ 0.300    1.000 \ 0.257    0.050
10     4.000 \ 0.460    3.000 \ 0.459    2.000 \ 0.454    1.000 \ 0.414    0.046
15     4.000 \ 0.365    3.000 \ 0.365    2.000 \ 0.362    1.000 \ 0.340    0.025
20     4.000 \ 0.451    3.000 \ 0.450    2.000 \ 0.447    1.000 \ 0.428    0.023
25     4.000 \ 0.301    3.000 \ 0.300    2.000 \ 0.299    1.000 \ 0.287    0.014
30     4.000 \ 0.283    3.000 \ 0.283    2.000 \ 0.282    1.000 \ 0.273    0.010
35     4.000 \ 0.251    3.000 \ 0.250    2.000 \ 0.249    1.000 \ 0.240    0.011
40     4.000 \ 0.251    3.000 \ 0.250    2.000 \ 0.249    1.000 \ 0.239    0.012
45     4.000 \ 0.215    3.000 \ 0.215    2.000 \ 0.214    1.000 \ 0.209    0.006
50     4.000 \ 0.215    3.000 \ 0.215    2.000 \ 0.214    1.000 \ 0.209    0.006
100    4.000 \ 0.176    3.000 \ 0.175    2.000 \ 0.175    1.000 \ 0.174    0.002
200    4.000 \ 0.121    3.000 \ 0.120    2.000 \ 0.120    1.000 \ 0.119    0.002
500    4.000 \ 0.008    3.000 \ 0.008    2.000 \ 0.008    1.000 \ 0.008    0.000
1000   2.000 \ 0.005    3.000 \ 0.005    4.000 \ 0.005    1.000 \ 0.005    0.000


Table 28 Extreme Asymmetric, Decay—Accuracy of T Scores on Skewness

Deviation from Target (0)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      4.000 \ 0.668    3.000 \ 0.666    1.000 \ 0.653    2.000 \ 0.665    0.015
10     4.000 \ 0.647    2.000 \ 0.645    1.000 \ 0.635    3.000 \ 0.645    0.012
15     3.998 \ 0.635    2.000 \ 0.633    1.001 \ 0.625    3.001 \ 0.634    0.010
20     3.994 \ 0.605    2.000 \ 0.604    1.000 \ 0.597    3.006 \ 0.605    0.008
25     3.000 \ 0.578    2.000 \ 0.577    1.000 \ 0.571    3.999 \ 0.578    0.007
30     3.000 \ 0.323    2.000 \ 0.322    1.000 \ 0.315    4.000 \ 0.323    0.008
35     3.000 \ 0.235    2.000 \ 0.234    1.001 \ 0.226    3.999 \ 0.235    0.009
40     3.000 \ 0.156    2.000 \ 0.155    1.000 \ 0.146    4.000 \ 0.157    0.011
45     3.000 \ 0.101    2.001 \ 0.010    1.001 \ 0.010    4.000 \ 0.102    0.092
50     3.000 \ 0.009    2.000 \ 0.009    1.000 \ 0.008    3.999 \ 0.009    0.001
100    3.000 \ 0.591    2.000 \ 0.590    1.000 \ 0.588    4.000 \ 0.591    0.003
200    3.000 \ 0.501    2.000 \ 0.500    1.000 \ 0.493    4.000 \ 0.502    0.009
500    3.000 \ 0.505    2.000 \ 0.504    1.000 \ 0.501    4.000 \ 0.505    0.004
1000   3.000 \ 0.505    2.000 \ 0.505    1.000 \ 0.503    4.000 \ 0.506    0.003

Magnitude of Deviation (RMS)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      4.000 \ 0.208    3.000 \ 0.208    1.000 \ 0.207    2.000 \ 0.208    0.001
10     1.000 \ 0.587    3.000 \ 0.588    4.000 \ 0.599    2.000 \ 0.588    0.012
15     4.000 \ 0.612    3.000 \ 0.612    1.000 \ 0.611    2.000 \ 0.612    0.001
20     3.000 \ 0.611    2.000 \ 0.611    4.000 \ 0.611    1.000 \ 0.611    0.000
25     2.000 \ 0.594    3.000 \ 0.594    4.000 \ 0.596    1.000 \ 0.594    0.002
30     2.000 \ 0.551    3.000 \ 0.551    4.000 \ 0.551    1.000 \ 0.551    0.000
35     3.000 \ 0.654    2.000 \ 0.653    1.000 \ 0.649    4.000 \ 0.654    0.005
40     4.000 \ 0.695    2.000 \ 0.694    1.000 \ 0.693    3.000 \ 0.695    0.002
45     3.000 \ 0.717    2.000 \ 0.717    1.000 \ 0.713    4.000 \ 0.717    0.004
50     2.000 \ 0.682    3.000 \ 0.682    4.000 \ 0.682    1.000 \ 0.682    0.000
100    2.000 \ 0.652    3.000 \ 0.652    4.000 \ 0.654    1.000 \ 0.652    0.002
200    2.000 \ 0.562    3.000 \ 0.562    4.000 \ 0.566    1.000 \ 0.561    0.005
500    2.000 \ 0.573    3.000 \ 0.573    4.000 \ 0.576    1.000 \ 0.573    0.003
1000   2.000 \ 0.571    3.000 \ 0.571    4.000 \ 0.573    1.000 \ 0.571    0.002


Table 29 Extreme Asymmetric, Decay—Accuracy of T Scores on Kurtosis

Deviation from Target (3)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      1.000 \ 0.857    2.000 \ 0.861    4.000 \ 0.886    3.000 \ 0.862    0.029
10     1.000 \ 0.848    3.000 \ 0.851    4.000 \ 0.865    2.000 \ 0.850    0.017
15     1.001 \ 0.851    3.000 \ 0.853    3.999 \ 0.864    2.000 \ 0.852    0.013
20     1.000 \ 0.877    3.000 \ 0.878    4.000 \ 0.887    2.000 \ 0.877    0.010
25     2.000 \ 0.903    3.000 \ 0.904    4.000 \ 0.912    1.005 \ 0.903    0.009
30     2.000 \ 1.047    3.000 \ 1.048    4.000 \ 1.055    1.000 \ 1.046    0.009
35     2.000 \ 1.119    3.000 \ 1.120    4.000 \ 1.127    1.000 \ 1.119    0.008
40     2.000 \ 1.171    3.000 \ 1.172    4.000 \ 1.180    1.000 \ 1.171    0.009
45     2.000 \ 1.190    3.000 \ 1.191    3.999 \ 1.198    1.001 \ 1.190    0.008
50     2.001 \ 1.178    3.000 \ 1.179    3.999 \ 1.184    1.000 \ 1.178    0.006
100    2.001 \ 0.879    2.999 \ 0.879    4.000 \ 0.882    1.000 \ 0.879    0.003
200    2.000 \ 0.936    3.000 \ 0.937    4.000 \ 0.942    1.000 \ 0.936    0.006
500    2.000 \ 0.933    3.000 \ 0.934    4.000 \ 0.936    1.000 \ 0.933    0.003
1000   2.000 \ 0.933    3.000 \ 0.933    4.000 \ 0.934    1.000 \ 0.932    0.002

Magnitude of Deviation (RMS)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      4.000 \ 0.006    3.000 \ 0.006    1.000 \ 0.005    2.000 \ 0.005    0.001
10     1.000 \ 0.726    3.000 \ 0.729    4.000 \ 0.749    2.000 \ 0.728    0.023
15     2.000 \ 0.815    3.000 \ 0.816    4.000 \ 0.818    1.000 \ 0.815    0.003
20     1.000 \ 0.716    3.000 \ 0.717    4.000 \ 0.723    2.000 \ 0.716    0.007
25     2.000 \ 0.786    3.000 \ 0.787    4.000 \ 0.790    1.000 \ 0.786    0.004
30     2.000 \ 0.783    3.000 \ 0.784    4.000 \ 0.787    1.000 \ 0.783    0.004
35     1.000 \ 0.581    2.000 \ 0.581    4.000 \ 0.582    3.000 \ 0.581    0.001
40     4.000 \ 0.662    2.000 \ 0.661    1.000 \ 0.660    3.000 \ 0.662    0.002
45     3.000 \ 0.649    2.000 \ 0.648    1.000 \ 0.647    4.000 \ 0.649    0.002
50     2.000 \ 0.633    3.000 \ 0.633    4.000 \ 0.634    1.000 \ 0.633    0.001
100    2.000 \ 0.831    3.000 \ 0.831    4.000 \ 0.831    1.000 \ 0.831    0.000
200    2.000 \ 0.541    3.000 \ 0.541    4.000 \ 0.543    1.000 \ 0.541    0.002
500    2.000 \ 0.556    3.000 \ 0.556    4.000 \ 0.557    1.000 \ 0.556    0.001
1000   2.000 \ 0.563    3.000 \ 0.563    4.000 \ 0.564    1.000 \ 0.563    0.001


Table 30 Extreme Bimodal—Accuracy of T Scores on Means

Deviation from Target (50)
(cell entries: rank \ value)

n      B                T                V                R
5      1.550 \ 0.000    1.590 \ 0.000    1.514 \ 0.000    1.075 \ 0.000
10     1.817 \ 0.000    1.602 \ 0.000    2.090 \ 0.000    1.741 \ 0.000
15     1.775 \ 0.000    2.050 \ 0.000    1.920 \ 0.000    1.693 \ 0.000
20     2.088 \ 0.000    1.717 \ 0.000    2.187 \ 0.000    1.928 \ 0.000
25     1.930 \ 0.000    1.722 \ 0.000    1.963 \ 0.000    2.033 \ 0.000
30     2.135 \ 0.000    1.913 \ 0.000    2.035 \ 0.000    2.069 \ 0.000
35     2.196 \ 0.000    2.195 \ 0.000    2.118 \ 0.000    2.131 \ 0.000
40     1.983 \ 0.000    2.133 \ 0.000    2.112 \ 0.000    2.226 \ 0.000
45     1.903 \ 0.000    2.309 \ 0.000    1.921 \ 0.000    1.936 \ 0.000
50     2.152 \ 0.000    2.085 \ 0.000    2.109 \ 0.000    2.057 \ 0.000
100    2.226 \ 0.000    2.390 \ 0.000    2.135 \ 0.000    2.351 \ 0.000
200    2.443 \ 0.000    2.348 \ 0.000    2.373 \ 0.000    2.346 \ 0.000
500    2.451 \ 0.000    2.501 \ 0.000    2.468 \ 0.000    2.450 \ 0.000
1000   2.476 \ 0.000    2.515 \ 0.000    2.511 \ 0.000    2.441 \ 0.000

Magnitude of Deviation (RMS)
(cell entries: rank \ value)

n      B                T                V                R
5      2.000 \ 0.000    2.000 \ 0.000    2.000 \ 0.000    4.000 \ 0.000
10     1.000 \ 0.000    2.500 \ 0.000    2.500 \ 0.000    4.000 \ 0.000
15     1.000 \ 0.000    2.000 \ 0.000    3.500 \ 0.000    3.500 \ 0.000
20     1.000 \ 0.000    4.000 \ 0.000    3.000 \ 0.000    2.000 \ 0.000
25     1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
30     1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
35     1.000 \ 0.000    3.000 \ 0.000    2.000 \ 0.000    4.000 \ 0.000
40     2.000 \ 0.000    1.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
45     1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
50     1.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000    2.000 \ 0.000
100    1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
200    1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
500    1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000
1000   1.000 \ 0.000    2.000 \ 0.000    3.000 \ 0.000    4.000 \ 0.000


Table 31 Extreme Bimodal—Accuracy of T Scores on Standard Deviations

Deviation from Target (10)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      1.000 \ 2.639    2.993 \ 2.811    3.989 \ 3.899    1.996 \ 2.730    1.260
10     1.000 \ 1.865    3.000 \ 1.982    4.000 \ 2.761    2.000 \ 1.882    0.896
15     1.998 \ 1.570    3.000 \ 1.659    4.000 \ 2.271    1.000 \ 1.565    0.701
20     2.000 \ 1.413    3.000 \ 1.486    4.000 \ 1.994    1.000 \ 1.400    0.594
25     2.000 \ 1.318    3.000 \ 1.381    4.000 \ 1.820    1.000 \ 1.302    0.518
30     2.000 \ 1.269    3.000 \ 1.324    4.000 \ 1.708    1.000 \ 1.252    0.456
35     2.000 \ 1.218    3.000 \ 1.266    4.000 \ 1.611    1.000 \ 1.200    0.411
40     2.000 \ 1.178    3.000 \ 1.222    4.000 \ 1.534    1.000 \ 1.160    0.374
45     2.000 \ 1.142    3.000 \ 1.182    4.000 \ 1.468    1.000 \ 1.123    0.345
50     2.000 \ 1.078    3.000 \ 1.115    4.000 \ 1.382    1.000 \ 1.060    0.322
100    2.000 \ 0.996    3.000 \ 1.018    4.000 \ 1.174    1.000 \ 0.981    0.193
200    2.000 \ 0.931    3.000 \ 0.943    4.000 \ 1.035    1.000 \ 0.921    0.114
500    2.000 \ 0.886    3.000 \ 0.892    4.000 \ 0.936    1.000 \ 0.879    0.057
1000   1.956 \ 0.869    2.986 \ 0.872    4.000 \ 0.897    1.058 \ 0.866    0.031

Magnitude of Deviation (RMS)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      4.000 \ 0.379    3.000 \ 0.371    2.000 \ 0.367    1.000 \ 0.298    0.081
10     4.000 \ 0.421    3.000 \ 0.421    2.000 \ 0.416    1.000 \ 0.379    0.042
15     3.000 \ 0.270    4.000 \ 0.270    2.000 \ 0.266    1.000 \ 0.240    0.030
20     4.000 \ 0.273    3.000 \ 0.273    2.000 \ 0.269    1.000 \ 0.246    0.027
25     4.000 \ 0.206    3.000 \ 0.205    2.000 \ 0.202    1.000 \ 0.184    0.022
30     4.000 \ 0.185    3.000 \ 0.184    2.000 \ 0.182    1.000 \ 0.167    0.018
35     4.000 \ 0.181    3.000 \ 0.180    2.000 \ 0.178    1.000 \ 0.162    0.019
40     4.000 \ 0.162    3.000 \ 0.161    2.000 \ 0.159    1.000 \ 0.145    0.017
45     4.000 \ 0.130    3.000 \ 0.129    2.000 \ 0.127    1.000 \ 0.115    0.015
50     4.000 \ 0.156    3.000 \ 0.155    2.000 \ 0.153    1.000 \ 0.140    0.016
100    4.000 \ 0.106    3.000 \ 0.105    2.000 \ 0.104    1.000 \ 0.010    0.096
200    4.000 \ 0.009    3.000 \ 0.009    2.000 \ 0.009    1.000 \ 0.008    0.001
500    4.000 \ 0.006    3.000 \ 0.005    2.000 \ 0.005    1.000 \ 0.005    0.001
1000   4.000 \ 0.004    3.000 \ 0.004    2.000 \ 0.004    1.000 \ 0.004    0.000


Table 32 Extreme Bimodal—Accuracy of T Scores on Skewness

Deviation from Target (0)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      3.463 \ 0.003    2.813 \ 0.003    1.552 \ 0.003    2.172 \ 0.003    0.000
10     1.001 \ 0.162    3.000 \ 0.162    3.999 \ 0.166    2.000 \ 0.162    0.004
15     1.003 \ 0.155    3.000 \ 0.155    3.997 \ 0.159    2.000 \ 0.155    0.004
20     1.987 \ 0.149    2.999 \ 0.150    3.997 \ 0.154    1.018 \ 0.149    0.005
25     2.038 \ 0.136    2.768 \ 0.136    3.313 \ 0.138    1.882 \ 0.136    0.002
30     2.990 \ 0.307    2.002 \ 0.307    1.013 \ 0.306    3.995 \ 0.307    0.001
35     2.996 \ 0.304    2.002 \ 0.304    1.003 \ 0.303    3.999 \ 0.304    0.001
40     2.873 \ 0.309    2.018 \ 0.309    1.171 \ 0.309    3.939 \ 0.310    0.001
45     2.999 \ 0.293    2.002 \ 0.293    1.005 \ 0.292    3.995 \ 0.293    0.001
50     2.012 \ 0.170    2.984 \ 0.170    3.956 \ 0.171    1.047 \ 0.170    0.001
100    2.372 \ 0.006    2.645 \ 0.006    2.953 \ 0.006    2.031 \ 0.006    0.000
200    2.377 \ 0.006    2.627 \ 0.006    2.900 \ 0.006    2.099 \ 0.006    0.000
500    2.390 \ 0.005    2.608 \ 0.005    2.838 \ 0.006    2.164 \ 0.005    0.001
1000   2.032 \ 0.318    2.968 \ 0.318    3.905 \ 0.318    1.095 \ 0.318    0.000

Magnitude of Deviation (RMS)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      4.000 \ 0.365    3.000 \ 0.363    1.000 \ 0.352    2.000 \ 0.363    0.013
10     1.000 \ 0.571    3.000 \ 0.572    4.000 \ 0.575    2.000 \ 0.571    0.004
15     2.000 \ 0.664    3.000 \ 0.664    4.000 \ 0.666    1.000 \ 0.663    0.003
20     2.000 \ 0.595    3.000 \ 0.595    4.000 \ 0.597    1.000 \ 0.595    0.002
25     2.000 \ 0.723    3.000 \ 0.723    4.000 \ 0.724    1.000 \ 0.723    0.001
30     4.000 \ 0.562    2.000 \ 0.561    1.000 \ 0.559    3.000 \ 0.562    0.003
35     3.000 \ 0.577    2.000 \ 0.576    1.000 \ 0.576    4.000 \ 0.577    0.001
40     3.000 \ 0.624    2.000 \ 0.623    1.000 \ 0.622    4.000 \ 0.623    0.002
45     3.000 \ 0.584    2.000 \ 0.583    1.000 \ 0.582    4.000 \ 0.584    0.002
50     3.000 \ 0.567    2.000 \ 0.566    1.000 \ 0.563    4.000 \ 0.567    0.004
100    3.000 \ 0.661    2.000 \ 0.661    1.000 \ 0.660    4.000 \ 0.661    0.001
200    3.000 \ 0.654    2.000 \ 0.654    1.000 \ 0.654    4.000 \ 0.654    0.000
500    2.000 \ 0.665    3.000 \ 0.665    4.000 \ 0.665    1.000 \ 0.665    0.000
1000   3.000 \ 0.646    2.000 \ 0.646    1.000 \ 0.646    4.000 \ 0.646    0.000


Table 33 Extreme Bimodal—Accuracy of T Scores on Kurtosis

Deviation from Target (3)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      1.000 \ 1.235    2.000 \ 1.238    4.000 \ 1.259    3.000 \ 1.239    0.024
10     1.004 \ 1.209    2.999 \ 1.211    3.996 \ 1.219    2.001 \ 1.210    0.010
15     1.002 \ 1.198    3.000 \ 1.199    3.996 \ 1.203    2.003 \ 1.199    0.005
20     1.010 \ 1.203    2.996 \ 1.203    3.988 \ 1.206    2.006 \ 1.203    0.003
25     1.333 \ 1.125    2.994 \ 1.126    3.982 \ 1.130    1.691 \ 1.125    0.005
30     2.000 \ 1.119    3.000 \ 1.120    4.000 \ 1.123    1.000 \ 1.119    0.004
35     2.000 \ 1.136    3.000 \ 1.136    4.000 \ 1.139    1.000 \ 1.136    0.003
40     2.000 \ 1.128    3.000 \ 1.128    4.000 \ 1.131    1.000 \ 1.128    0.003
45     2.000 \ 1.107    3.000 \ 1.107    4.000 \ 1.110    1.000 \ 1.107    0.003
50     2.000 \ 1.089    3.000 \ 1.089    4.000 \ 1.092    1.000 \ 1.089    0.003
100    1.936 \ 1.115    2.994 \ 1.115    3.971 \ 1.118    1.093 \ 1.115    0.003
200    1.953 \ 1.109    2.934 \ 1.109    3.817 \ 1.112    1.292 \ 1.109    0.003
500    2.039 \ 1.104    2.960 \ 1.105    3.867 \ 1.106    1.133 \ 1.104    0.002
1000   1.999 \ 1.085    2.998 \ 1.085    3.996 \ 1.085    1.007 \ 1.085    0.000

Magnitude of Deviation (RMS)
(cell entries: rank \ value)

n      B                T                V                R                Range
5      4.000 \ 0.310    3.000 \ 0.307    1.000 \ 0.290    2.000 \ 0.306    0.020
10     4.000 \ 0.593    2.000 \ 0.592    1.000 \ 0.591    3.000 \ 0.592    0.002
15     2.000 \ 0.725    3.000 \ 0.725    4.000 \ 0.726    1.000 \ 0.725    0.001
20     1.000 \ 0.557    3.000 \ 0.557    4.000 \ 0.557    2.000 \ 0.557    0.000
25     2.000 \ 0.657    3.000 \ 0.657    4.000 \ 0.658    1.000 \ 0.657    0.001
30     2.000 \ 0.500    3.000 \ 0.500    4.000 \ 0.501    1.000 \ 0.500    0.001
35     4.000 \ 0.589    2.000 \ 0.589    1.000 \ 0.589    3.000 \ 0.589    0.000
40     3.000 \ 0.600    2.000 \ 0.600    1.000 \ 0.599    4.000 \ 0.600    0.001
45     3.000 \ 0.634    2.000 \ 0.634    1.000 \ 0.633    4.000 \ 0.634    0.001
50     3.000 \ 0.550    2.000 \ 0.550    1.000 \ 0.547    4.000 \ 0.551    0.004
100    3.000 \ 0.645    2.000 \ 0.645    1.000 \ 0.644    4.000 \ 0.645    0.001
200    1.000 \ 0.636    3.000 \ 0.636    4.000 \ 0.636    2.000 \ 0.636    0.000
500    2.000 \ 0.631    3.000 \ 0.631    4.000 \ 0.631    1.000 \ 0.631    0.000
1000   3.000 \ 0.667    2.000 \ 0.667    1.000 \ 0.667    4.000 \ 0.667    0.000


The 16 figures that follow plot the range of deviation values for each distribution against a power curve. The power curve is a regression model that follows the formula Y = b0 * t^b1, where t is the sample size. Curve fitting is only possible for the deviation range on the second and fourth moments, standard deviation and kurtosis. The first and third moments, mean and skewness, either contain zeros, which make transformations impossible, or lack sufficient variability to make curve fitting worthwhile.

Only the first 10 sample sizes, which increase in increments of five from n = 5 to n = 50, are used for this initial set of figures. Typically, more statistical variability occurs among smaller samples. This Monte Carlo study was designed to comprehensively document the ranking methods' performance at small sample sizes and to evaluate these trends at larger sample sizes. To serve this end, several of the small-sample regression models are fitted a second time with the addition of four sample sizes: n = 100, n = 200, n = 500, and n = 1,000.
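To make the fitting procedure concrete, the sketch below fits a curve of this form to the standard deviation deviation-range values for the Extreme Bimodal distribution at the 10 small sample sizes (Table 31). It is a minimal illustration only: it assumes an ordinary least-squares fit on the log-log scale and does not reproduce the curve-estimation routine actually used in the study.

    # Minimal illustration (not the study's own code): fit the power model
    # Y = b0 * t^b1 by least squares on the log-log scale, using the
    # deviation-range values for standard deviation in the Extreme Bimodal
    # distribution at n = 5 through n = 50 (Table 31).
    import numpy as np

    n = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float)
    deviation_range = np.array(
        [1.260, 0.896, 0.701, 0.594, 0.518, 0.456, 0.411, 0.374, 0.345, 0.322])

    # log Y = log b0 + b1 * log t is linear, so a straight-line fit on the
    # logs recovers the power-curve coefficients directly.
    b1, log_b0 = np.polyfit(np.log(n), np.log(deviation_range), 1)
    b0 = np.exp(log_b0)

    print(f"fitted curve: Y = {b0:.3f} * t^{b1:.3f}")
    print("fitted values:", np.round(b0 * n ** b1, 3))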


Figure 13. Smooth Symmetric: Power curve for deviation range of standard deviation.

Figure 14. Smooth Symmetric: Power curve for deviation range of kurtosis.


Figure 15. Discrete Mass at Zero: Power curve for deviation range of standard deviation.

Figure 16. Discrete Mass at Zero: Power curve for deviation range of kurtosis.


Figure 17. Extreme Asymmetric, Growth: Power curve for deviation range of standard deviation.

Figure 18. Extreme Asymmetric, Growth: Power curve for deviation range of kurtosis.


Figure 19. Digit Preference: Power curve for deviation range of standard deviation.

Figure 20. Digit Preference: Power curve for deviation range of kurtosis.


Figure 21. Multimodal Lumpy: Power curve for deviation range of standard deviation.

Figure 22. Multimodal Lumpy: Power curve for deviation range of kurtosis.


Figure 23. Mass at Zero with Gap: Power curve for deviation range of standard deviation.

Figure 24. Mass at Zero with Gap: Power curve for deviation range of kurtosis.


Figure 25. Extreme Asymmetric, Decay: Power curve for deviation range of standard deviation.

Figure 26. Extreme Asymmetric, Decay: Power curve for deviation range of kurtosis.


Figure 27. Extreme Bimodal: Power curve for deviation range of standard deviation.

Figure 28. Extreme Bimodal: Power curve for deviation range of kurtosis.


Selected power curves are fitted a second time with the larger sample sizes added. Figure 29 shows that the Smooth Symmetric power curve remains intact when the larger sample sizes are included; this curve can be compared with Figure 13. Figure 30 shows how the fit of the Digit Preference power curve improves when larger sample sizes are included; compare this curve with Figure 19. Figure 31 shows the Mass at Zero with Gap distribution, which achieves an extremely poor fit when only small samples are included (see Figure 24) but assumes the basic shape of the power curve with the addition of larger samples.

Together, these three large-sample curves illustrate that, to whatever extent predictive patterns are established when n ≤ 50, the fitted regression slopes either improve in fit or continue to hold as sample size increases. It therefore does not seem warranted to present a complete set of power curves for the larger sample sizes.


Figure 29. Smooth Symmetric: Power curve for deviation range of standard deviation with inclusion of large sample sizes.

Figure 30. Digit Preference: Power curve for deviation range of standard deviation with inclusion of large sample sizes.


Figure 31. Mass at Zero with Gap: Power curve for deviation range of kurtosis with inclusion of large sample sizes.


CHAPTER 5

CONCLUSION

The purpose of this study was to compare the accuracy of the Blom, Tukey, Van der Waerden, and Rankit approximations in attaining the target moments of the normal distribution. Means and standard deviations were scaled to the T score metric to facilitate interpretation in the context of standardized testing in education.
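For reference, the sketch below shows the commonly published forms of the four proportion estimates and their conversion to the T score metric. The formulas given here are the standard textbook versions and are assumed to match the definitions presented in an earlier chapter of this study; the code is illustrative only and is not the study's own implementation.

    # Sketch of the four rank-based approximations in their commonly
    # published forms (assumed to match the definitions given in an
    # earlier chapter).  Each maps the rank i of n observations to an
    # estimated cumulative proportion, which is converted to a z score
    # and then to the T metric (mean 50, standard deviation 10).
    from scipy.stats import norm

    def proportion(i, n, method):
        if method == "Blom":
            return (i - 0.375) / (n + 0.25)
        if method == "Tukey":
            return (i - 1.0 / 3.0) / (n + 1.0 / 3.0)
        if method == "Van der Waerden":
            return i / (n + 1.0)
        if method == "Rankit":
            return (i - 0.5) / n
        raise ValueError(method)

    def t_score(i, n, method):
        z = norm.ppf(proportion(i, n, method))  # normal quantile of the proportion
        return 50.0 + 10.0 * z                  # rescale to the T metric

    # Example: T score assigned to the largest of n = 10 observations.
    for method in ("Blom", "Tukey", "Van der Waerden", "Rankit"):
        print(method, round(t_score(10, 10, method), 3))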

Accuracy was conceptualized in both relative and absolute terms, as expressed in ranks and absolute values throughout the results tables. Deviation from target and magnitude of deviation framed the comparison of accuracy measures.

A Monte Carlo simulation allowed the ranking methods’ performance to be experimentally evaluated under a variety of real distributional conditions. Each entry in the tables is the product of 10,000 iterations of a random selection process. Replicating this experiment would produce slightly different numerical values due to the random processes it involves. However, the design is sufficiently powerful that the outcome of the comparisons would be identical.
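A greatly simplified sketch of one such iteration loop follows. It is hypothetical: it draws samples from a placeholder skewed population rather than from the study's eight real data sets, applies only Blom's proportion formula, and tracks only the standard deviation of the resulting T scores. It is intended solely to illustrate how deviation from target and an RMS-type magnitude of deviation can be accumulated over 10,000 replications.

    # Greatly simplified sketch of the Monte Carlo design (not the study's
    # own code): draw 10,000 samples, normalize each with a ranking method,
    # and track how far the resulting T score standard deviation falls
    # from its target of 10.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    target_sd = 10.0

    def t_scores_blom(sample):
        # Rank the observations and convert Blom proportions to T scores.
        n = len(sample)
        ranks = np.argsort(np.argsort(sample)) + 1     # ranks 1..n
        p = (ranks - 0.375) / (n + 0.25)               # Blom's proportion estimate
        return 50.0 + 10.0 * norm.ppf(p)

    deviations = []
    for _ in range(10_000):
        # Placeholder population: a skewed variable standing in for the
        # study's real achievement and psychometric data sets.
        sample = rng.exponential(scale=1.0, size=30)
        t = t_scores_blom(sample)
        deviations.append(np.std(t, ddof=1) - target_sd)

    deviations = np.asarray(deviations)
    print("mean deviation from target SD:", round(float(deviations.mean()), 3))
    print("RMS of the deviations:", round(float(np.sqrt(np.mean(deviations ** 2))), 3))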

The final two tables summarize the major findings according to moment, sample size, and distribution. Table 34 presents the average deviation ranks and values and Table 35 identifies the winning approximations by name. In Table 35, hyphens ( - ) indicate that all values for the mean are zero. Forward slashes ( / ) indicate that three out of four values for skewness are tied.


Table 34 Deviation from Target, Summarized by Moment, Sample Size, and Distribution

                           Blom             Tukey            Van der Waerden  Rankit
                           Rank    Value    Rank    Value    Rank    Value    Rank    Value    Range

Moment
  Mean                     2.045   0.000    2.022   0.000    2.034   0.000    2.026   0.000    0.000
  Standard Deviation       1.859   1.142    2.985   1.186    3.982   1.603    1.146   1.119    0.484
  Skewness                 2.668   0.192    2.477   0.192    2.269   0.191    2.586   0.192    0.001
  Kurtosis                 1.687   0.947    2.915   0.941    3.988   0.952    1.394   0.930    0.022

Sample Size
  5 ≤ n ≤ 50               1.976   0.609    2.585   0.628    3.103   0.769    1.720   0.603    0.166
  100 ≤ n ≤ 1,000          2.231   0.435    2.599   0.423    2.962   0.447    1.883   0.416    0.031

Distribution
  Smooth Symmetric         2.007   0.393    2.643   0.411    3.196   0.531    1.653   0.391    0.140
  Discrete Mass at Zero    2.033   0.404    2.608   0.421    3.136   0.539    1.715   0.403    0.136
  Asymmetric – Growth      1.995   0.453    2.670   0.470    3.257   0.583    1.596   0.452    0.131
  Digit Preference         2.039   0.390    2.622   0.408    3.131   0.527    1.692   0.370    0.158
  Multimodal Lumpy         1.987   0.412    2.624   0.396    3.126   0.510    1.737   0.376    0.134
  Mass at Zero with Gap    2.239   1.129    2.465   1.126    2.747   1.204    2.103   1.113    0.092
  Asymmetric – Decay       2.238   0.726    2.528   0.739    2.765   0.835    2.046   0.725    0.109
  Extreme Bimodal          1.980   0.655    2.649   0.669    3.190   0.765    1.753   0.654    0.112


Table 35 Winning Approximations, Summarized by Moment, Sample Size, and Distribution

                                  1st Place       2nd Place       3rd Place       4th Place
                                  Rank \ Value    Rank \ Value    Rank \ Value    Rank \ Value

Moment
  Mean                            T \ -           V \ -           R \ -           B \ -
  Standard Deviation              R \ R           B \ B           T \ T           V \ V
  Skewness                        V \ V           T \ B/T/R       R \ B/T/R       B \ B/T/R
  Kurtosis                        R \ R           B \ T           T \ B           V \ V

Sample Size
  5 ≤ n ≤ 50                      R \ R           B \ B           T \ T           V \ V
  100 ≤ n ≤ 1,000                 R \ R           B \ T           T \ B           V \ V

Distribution
  Achievement
    Smooth Symmetric              R \ R           B \ B           T \ T           V \ V
    Discrete Mass at Zero         R \ R           B \ B           T \ T           V \ V
    Asymmetric – Growth           R \ R           B \ B           T \ T           V \ V
    Digit Preference              R \ R           B \ B           T \ T           V \ V
    Multimodal Lumpy              R \ R           B \ T           T \ B           V \ V
  Psychometric
    Mass at Zero with Gap         R \ R           B \ T           T \ B           V \ V
    Asymmetric – Decay            R \ R           B \ B           T \ T           V \ V
    Extreme Bimodal               R \ R           B \ B           T \ T           V \ V


Discussion

Moment 1—Mean

All four ranking methods attain the target value of 50 for the mean. Differences appear in the numerical results only after the third decimal place and are therefore meaningless in terms of practical application. Most mean deviation values are machine-constant zeros, meaning they are zero to at least the sixth decimal place. Although these differences are reflected in the deviation and magnitude ranks, they do not merit further summary statistics, such as deviation or RMS range.

Moment 2—Standard Deviation

The absolute and relative accuracy of the four ranking methods in attaining the target standard deviation differ substantially. Their average absolute deviation from the target T score standard deviation is 1.263. This means that a practitioner who uses any of the four ranking methods to normalize test scores, without reference to sample size or distribution, can expect an estimated standard deviation between 8.737 and 11.263. Adding the test instrument's standard error to this compounds the problem. An instrument with a standard error of three (± 3) and a Z score of two would yield a T score between 67.474 and 72.526, whose true range would be between 64.474 and 75.526, a total width of 11.052. Even a standard error half this size would lead to a true score range of 65.974 to 74.026, or 8.052.

Thus, a standard deviation that is off target by 1.263 would combine with a standard error of ± 1.5 to nearly triple the size of the true score range, from a theorized range of three to an actual range of more than eight. This is an increase of 268%. As the standard error increases, the estimated difference between the theorized and actual score range diminishes. At a standard error of three, the increase is 184%; at a standard error of four, it is 163%.
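The arithmetic behind these figures can be reproduced directly from the T score definition (mean 50, standard deviation 10). The short sketch below recomputes the ranges quoted above for a Z score of two, a standard deviation that misses its target by 1.263, and standard errors of 1.5, 3, and 4; the percentages reported in the text correspond to the ratio of the widened true score range to the range implied by the standard error alone.

    # Reproducing the worked example above: a Z score of 2 scored with a
    # standard deviation that misses the T score target of 10 by 1.263,
    # combined with standard errors of 1.5, 3, and 4.
    z = 2.0
    sd_deviation = 1.263

    t_low = 50 + z * (10 - sd_deviation)    # 67.474
    t_high = 50 + z * (10 + sd_deviation)   # 72.526

    for se in (1.5, 3.0, 4.0):
        true_low, true_high = t_low - se, t_high + se
        actual_range = true_high - true_low
        theorized_range = 2 * se            # range implied by +/- SE alone
        print(f"SE = {se}: true scores {true_low:.3f} to {true_high:.3f}, "
              f"range {actual_range:.3f} "
              f"({actual_range / theorized_range:.0%} of the theorized range)")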

The smallest observed deviation from the target standard deviation occurred in the Multimodal Lumpy distribution, for which Rankit obtained an average deviation value of 0.509. Van der Waerden’s method performed at its worst in the Mass at Zero with Gap distribution, obtaining a 3.768 deviation value.

In applied terms, this means that a practitioner using Van der Waerden's formula for normalizing a standardized test score of Z = 2 could obtain a T score as low as 62.464 or as high as 77.536. Adding in a relatively modest standard error of two and rounding to the nearest whole number, a test-taker's strong performance could result in an actual score as low as 60 or as high as 80. This range would indicate that the test-taker's true performance falls somewhere between the 74th and the 99th percentile. Such information would be useless for any real testing purpose. On the other hand, the practitioner who uses Rankit with a Multimodal Lumpy distribution (Z = 2) would obtain a T score between 68.982 and 71.018. Including a standard error of two and rounding, the test-taker would see a final score of 67 to 73. A true score range of six is clearly preferable to a range of 20. However, even the best ranking method produces an estimated half-point deviation from the T score's target standard deviation. In one of the best applied scenarios, this means that the true score range would still be 151% higher than the standardized test instrument's stated standard error.
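The same arithmetic reproduces the two applied scenarios just described, using the deviation values reported above (3.768 for Van der Waerden in the Mass at Zero with Gap distribution and 0.509 for Rankit in the Multimodal Lumpy distribution).

    # Recomputing the two applied scenarios above for Z = 2 with a
    # standard error of 2: Van der Waerden's worst case (SD deviation of
    # 3.768, Mass at Zero with Gap) versus Rankit's best case (0.509,
    # Multimodal Lumpy).
    z, se = 2.0, 2.0
    for method, sd_dev in (("Van der Waerden", 3.768), ("Rankit", 0.509)):
        t_low = 50 + z * (10 - sd_dev)
        t_high = 50 + z * (10 + sd_dev)
        score_low = round(t_low - se)     # rounded to the nearest whole number
        score_high = round(t_high + se)
        print(f"{method}: T between {t_low:.3f} and {t_high:.3f}; "
              f"possible reported scores {score_low} to {score_high}")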


When assessing the potential impact of selecting a ranking method on the outcome of T scores, the average deviation score range (Table 34) may be misleading. It indicates that the difference between the highest and the lowest deviation values from the target standard deviation is less than half a point (0.484). However, the gulf between the highest and lowest deviation values across all distributions is vast: 3.259. By the same token, the average deviation range among samples of 5 through 50 is 0.614, compared to 0.117 among samples of 100 through 1,000. Much more variability in the extent of the ranking methods' deviation from target occurs among small samples than among large samples.

RMS values may provide additional insight here. As anticipated, the magnitude of deviation from target as expressed in RMS values is highest for samples of n = 5, with only one exception: the highest RMS for Extreme Asymmetric – Decay is found at sample size 10. The highest RMS among all sample sizes is 2.750, which is found at sample size 5 in the psychometric distribution Mass at Zero with Gap. Four of the five achievement distributions attain an RMS of zero at sample size 200, and this lack of deviation magnitude holds for the larger sample sizes as well. Among achievement distributions, only Extreme Asymmetric – Growth does not attain an RMS of zero; at sample size 200 it reaches 0.003, which tapers off to 0.001 by n = 1,000. The average RMS range among the five achievement distributions is 0.111 and among the three psychometric distributions, 1.154. The average RMS range among all eight distributions is 0.158, with most RMS variability found among the psychometric distributions and the smaller samples. Curiously, all the worst RMS values belong to Blom, yet Blom achieves second place in terms of relative and absolute deviation from target (Table 35). This suggests that Blom's approximation may work by sacrificing some technical precision for reliability.

Moment 3—Skewness

The four ranking methods' average deviation from the target skewness value of the normal distribution is 0.192. The psychometric distribution Mass at Zero with Gap contains the worst deviation value for skewness and the achievement distribution Digit Preference contains the best. Blom and Tukey tie at 0.719 for the worst skewness performance in a given distribution, and Blom and Rankit tie at 0.022 for the best. Ranking methods should not be selected on the basis of their deviation from target skewness values because the deviation quantities are small and the differences between them are negligible. Table 35 shows a three-way tie for second place between Blom, Tukey, and Rankit; Van der Waerden scores its only first-place finish in this case, with a margin of only 0.001. Furthermore, it is not clear how deviations from normal skewness may affect test scoring or interpretation.

Moment 4—Kurtosis

Kurtosis values show greater deviation from target than skewness values but less than standard deviations. The average deviation value for kurtosis across all sample sizes and distributions is 0.943. The average deviation is higher by 0.222 for the smaller samples (n ≤ 50) than for the larger sample sizes. Although the difference between the highest deviation value on any distribution and the lowest (1.145 on Extreme Bimodal and 0.328 on Mass at Zero with Gap, respectively) appears substantial at 0.817, the overall difference between the best-performing and worst-performing ranking methods is only 0.022 on kurtosis. RMS values support the conclusion that differences on kurtosis are likely to have little practical meaning. The psychometric distributions have higher RMS values than the achievement distributions by an average of 0.176. The highest RMS value, 0.946, occurs at sample size 500 in the Mass at Zero with Gap distribution. The third-highest RMS value, 0.726, occurs at sample size 15 in the Extreme Bimodal distribution. The highest RMS value among achievement distributions, 0.556, occurs at sample size 30 in the Extreme Asymmetric – Growth distribution. However, all the lowest RMS scores on distributions occur at the smallest sample size, n = 5. Aside from the question of how kurtosis considerations may actually affect test scoring or interpretation, RMS values display irregularity and nonconformity with the patterns established by the second and third moments.

Recommendations

The Blom, Tukey, Van der Waerden, and Rankit approximations display considerable variability on the even moments, standard deviation and kurtosis. Only standard deviation, however, has known practical implications for test scoring and interpretation. Results for the odd moments, mean and skewness, may contribute to the analytical pursuit of area estimation under the normal distribution.

The great variability between and within ranking methods on the standard deviation suggests that practitioners should consider both sample size and distribution when selecting a normalizing procedure.


Small samples and skewed distributions aggravate the inaccuracy of all ranking methods. However, substantial differences between methods and deviations from target are found among large samples and relatively symmetric distributions as well. Therefore, scores from large samples should be plotted to observe population variance, in addition to propensity scores, tail weight, modality, and symmetry. Practitioners, including analysts, educators, and administrators, should also be advised that most test scores are less accurate than they appear. Caution should be exercised when making decisions based on standardized test performance.

Table 35 simplifies this selection. Rankit is the most accurate method on the standard deviation and on kurtosis when sample size and distribution are not taken into account; it is the most accurate method among both small and large samples; and it is the most accurate method among both achievement and psychometric distributions. Van der Waerden’s approximation consistently performs the worst across sample sizes and distributions. In most cases, Blom’s method comes in second place and Tukey’s, third. The exceptions are trivial for applied purposes.

It would be useful to perform a more exhaustive empirical study of these ranking methods to better describe their patterns. It would also be of theoretical value to analyze the mathematical properties of their differences. More research can be done in both theoretical and applied domains. However, for the purpose of normalizing test scores in the social and behavioral sciences, these results suffice.


REFERENCES

Aiken, L. R. (1987). Formulas for equating ratings on different scales. Educational and Psychological Measurement, 47(1): 51-54.

Aiken, L. R. (1994). Psychological Testing and Assessment, 8th Ed. Boston: Allyn and Bacon.

Allport, F. M. (1934). The J-curve hypothesis of conforming behavior. Journal of Social Psychology, 5: 141-183.

American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME) (1999). Standards for Educational and Psychological Testing. Washington, D.C.: AERA.

Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., & Tukey, J. W. (1972). Robust Estimates of Location: Survey and Advances. Princeton: Princeton University Press.

Angoff, W. H. (1971). Scales, norms, and equivalent scores. In Thorndike, R. L., Ed. Educational Measurement, 2nd Ed. Washington, D.C.: American Council on Education.

Angoff, W. H. (1984). Scales, Norms, and Equivalent Scores. Princeton: Educational Testing Service.

Bartlett, M. S. (1947). The use of transformations. Biometrics, 3(1): 39-52. Retrieved August 6, 2007 from JSTOR database.

Blair, R. C. & Higgins, J. J. (1980). A comparison of the power of the Wilcoxon's rank-sum statistic to that of the Student's t statistic under various non-normal distributions. Journal of Educational Statistics, 5: 309-35.

Blair, R. C. & Higgins, J. J. (1985). Comparison of the power of the paired samples t test to that of Wilcoxon's signed-ranks test under various population shapes. Psychological Bulletin, 97: 119-28.

Bliss, C. I., Greenwood, M. L., & White, E. S. (1956). A rankit analysis of paired comparisons for measuring the effect of sprays on flavor. Biometrics, 12(4): 381-403. Retrieved March 26, 2007 from JSTOR database.

Blom, G. (1954). Transformation of the binomial, negative binomial, Poisson and χ2 distributions. Biometrika, 41(3/4): 302-316.

Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables. New York: John Wiley & Sons.

Box, G. E. P. & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, 26: 211-252.

Bradley, R. A. & Terry, M. E. (1952). Rank analysis of incomplete block designs I. The method of paired comparisons. Biometrika, 39: 324-345.

Bradley, J. V. (1977). A common situation conducive to bizarre distribution shapes. The American Statistician, 31: 147-150.

Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31: 144-152.

Brown, B. M. & Hettmansperger, T. P. (1996). Normal scores, normal plots, and tests for normality. Journal of the American Statistical Association, 91(436): 1668-1675. Retrieved August 6, 2007 from JSTOR database.

Cadwell, J. H. (1953). The distribution of quasi-ranges in samples from a normal population. Annals of Mathematical Statistics, 24: 603-13.

Chang, S. W. (2006). Methods in scaling the basic competence test. Educational and Psychological Measurement, 66: 907-929.

Conover, W. J. (1980). Practical Nonparametric Statistics. New York: John Wiley & Sons.

Cronbach, L. J. (1976). Essentials of Psychological Testing, 3rd Ed. New York: Harper & Row.

Davison, A. C. & Gigli, A. (1989). Deviance residuals and normal scores plots. Biometrika, 76(2): 211-221. Retrieved August 3, 2007 from JSTOR database.

Donaldson, T. S. (1968). Robustness of the F-test to errors of both kinds and the correlation between the numerator and the denominator of the F-ratio. Journal of the American Statistical Association, 63: 660-676.

Dunn-Rankin, P. (1983). Scaling Methods. Hillsdale: Lawrence Erlbaum Associates.

Federer, W. T. (1951). Evaluation of variance components from a group of experiments with multiple classifications. Iowa Agricultural Experiment Station Research Bulletin, 380.

Fisher, R. A. & Yates, F. (1938). Statistical Tables for Biological, Agricultural and Medical Research. Edinburgh: Oliver and Boyd.

Fisher, R. A. & Yates, F. (1953). Statistical Tables for Biological, Agricultural and Medical Research, 4th Ed. London: Oliver and Boyd.

Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200): 675-701.

Galton, F. (1902). The most suitable proportion between the value of first and second prizes. Biometrika, 1(4): 385-90.

Games, P. A. (1983). Curvilinear transformations of the dependent variable. Psychological Bulletin, 93(2): 382-387.

Games, P. A. (1984). Data transformations, power, and skew: A rebuttal to Levine and Dunlap. Psychological Bulletin, 95(2): 345-347.

Games, P. A. & Lucas, P. A. (1966). Power of the analysis of variance of independent groups on non-normal and normally transformed data. Educational and Psychological Measurement, 26: 311-327.

Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet the assumptions underlying the fixed effect analysis of variance and covariance. Review of Educational Research, 42: 237-288.

Godwin, H. J. (1949). On the estimation of dispersion by linear systematic statistics. Biometrika, 36: 92-100.

Gosset, W. S. ("Student") (1908). The probable error of a mean. Biometrika, 6(1): 1-25.

Harter, H. L. (1959). The use of sample quasi-ranges in estimating population standard deviation. Annals of Mathematical Statistics, 30: 980-99.

Harter, H. L. (1961). Expected values of normal order statistics. Biometrika, 48(1/2): 151-165. Retrieved August 3, 2007 from JSTOR database.

Hastings, C., Mosteller, F., Tukey, J. W., & Winsor, C. P. (1947). Low moments for small samples: A comparative study of order statistics. Annals of Mathematical Statistics, 18: 413-26.

Hinkle, D. E., Wiersma, W., & Jurs, S. G. (2003). Applied Statistics for the Behavioral Sciences, 5th Ed. Boston: Houghton Mifflin.

Hoaglin, D. C. (2003). John W. Tukey and data analysis. Statistical Science, 18(3): 311-318. Retrieved August 3, 2007 from JSTOR database.

Horst, P. (1931). Obtaining comparable scores from distributions of dissimilar shape. Journal of the American Statistical Association, 26(176): 455-460. Retrieved August 23, 2007 from JSTOR database.

Ipsen, J. & Jerne, N. (1944). Graphical evaluation of the distribution of small experimental series. Acta Pathologica, Microbiologica et Immunologica Scandinavica, 21: 343-361.

Irwin, J. O. (1925). The further theory of Francis Galton's individual difference problem. Biometrika, 17: 100-28.

Kendall, M. G. (1955). Further contributions to the theory of paired comparisons. Biometrics, 11: 43-62.

Kendall, M. G. & Stuart, A. (1979). The Advanced Theory of Statistics, 4th Ed., Vol. 2. New York: MacMillan.

Kline, P. (2000). Handbook of Psychological Testing, 2nd Ed. London: Routledge.

Kolen, M. J. & Brennan, R. L. (2004). Test Equating, Scaling, and Linking: Methods and Practices, 2nd Ed. New York: Springer Science+Business Media.

Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on Ranks. San Francisco: Holden-Day.

Lester, P. E. & Bishop, L. K. (2000). Handbook of Tests and Measurement in Education and the Social Sciences, 2nd Ed. Lanham, MD: Scarecrow Press.

Levine, D. W. & Dunlap, W. P. (1982). Power of the F test with skewed data: Should one transform or not? Psychological Bulletin, 92(1): 272-280.

Levine, D. W. & Dunlap, W. P. (1983). Data transformation, power, and skew: A rejoinder to Games. Psychological Bulletin, 93(3): 596-599.

Levine, A., Liukkonen, J., & Levine, D. W. (1992). Predicting power changes under transformations in ANOVA tests. Communications in Statistics, 21: 679-92.

McCall, W. A. (1939). Measurement. New York: MacMillan.

Mehrens, W. A. & Lehmann, I. J. (1980). Standardized Tests in Education, 3rd Ed. New York: Holt, Rinehart and Winston.

Mehrens, W. A. & Lehmann, I. J. (1987). Using Standardized Tests in Education, 4th Ed. New York: Longman.

Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1): 156-166.

Micceri, T. (1990). Proportions, pitfalls and pendulums. Educational and Psychological Measurement, 50(4): 769-74.

Mosteller, F. (1951). Remarks on the method of paired comparisons: I. The least squares solution assuming equal standard deviations and equal correlations. Psychometrika, 16: 3-9.

Netemeyer, R. G., Bearden, W. O., & Sharma, S. (2003). Scaling Procedures: Issues and Applications. Thousand Oaks: Sage Publications.

Nanna, M. J. & Sawilowsky, S. S. (1998). Analysis of Likert scale data in disability and medical rehabilitation research. Psychological Methods, 3(1): 55-67.

Nunnally, J. C. (1978). Psychometric Theory. New York: McGraw-Hill.

Osborne, J. W. (2002). Normalizing data transformations. ERIC Digest, ED470204. Available online: www.eric.ed.gov

Pearson, K. (1895). Contributions to the mathematical theory of evolution: II. Skew variation in homogeneous material. Philosophical Transactions of the Royal Society, Series A, 186: 343-414.

Pearson, K. (1902). Note on Francis Galton's problem. Biometrika, 1(4): 390-9.

Pearson, K. & Hartley, H. O. (1954). Biometrika Tables for Statisticians, I. Cambridge University Press for the Biometrika Trustees.

Pearson, K. & Pearson, M. (1931). On the mean character and variance of a ranked individual, and on the mean and variance of the intervals between ranked individuals. Biometrika, 23: 364-87.

Pearson, E. S. & Please, N. W. (1975). Relation between the shape of a population distribution and the robustness of four simple test statistics. Biometrika, 62: 223-241.

Pearson, E. S. & Tukey, J. W. (1965). Approximate means and standard deviations based on distances between percentage points of frequency curves. Biometrika, 52(3/4): 533-546. Retrieved August 6, 2007 from JSTOR database.

Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.), Educational Measurement, 3rd Ed. New York: American Council on Education; and Macmillan.

The Psychological Corporation (1955). Methods of expressing test scores. Test Service Bulletin, 48: 7-10.

Sawilowsky, S., Blair, R. C., & Micceri, T. (1990). A PC FORTRAN subroutine library of psychology and education data sets. Psychometrika, 55: 729.

Sawilowsky, S. & Blair, R. C. (1992). A more realistic look at the robustness and Type II error properties of the t test to departures from population normality. Psychological Bulletin, 111(2): 352-360.

Sawilowsky, S. & Fahoome, G. (2003). Statistics Through Monte Carlo Simulation with Fortran. Oak Park: JMASM.

Scheffé, H. (1952). An analysis of variance for paired comparisons. Journal of the American Statistical Association, 47: 381-400.

Simon, H. A. (1955). On a class of skew distribution functions. Biometrika, 42: 425-440.

Srivastava, A. B. L. (1959). Effect of non-normality on the power of the analysis of variance test. Biometrika, 46: 114-122.

SPSS (2006). Statistical Package for the Social Sciences (SPSS) 15.0 for Windows. Author.

Stigler, S. M. (1977). Do robust estimators work with real data? The Annals of Statistics, 5(6): 1055-1098.

Student—See Gosset, W. S.

Tan, W. Y. (1982). Sampling distributions and robustness of t, F and variance-ratio in two samples and ANOVA models with respect to departures from normality. Communications in Statistics, A11: 2485-2511.

Tapia, R. A. & Thompson, J. R. (1978). Nonparametric Probability Density Estimation. Baltimore: Johns Hopkins University Press.

Tarter, M. E. (2000). Statistical Curves and Parameters: Choosing an Appropriate Approach. Natick: A K Peters.

Thissen, D. & Wainer, H. (2001). Test Scoring. Mahwah: Lawrence Erlbaum Associates.

Thorndike, R. L. (1982). Applied Psychometrics. Boston: Houghton Mifflin.

Thurstone, L. L. (1928). Attitudes can be measured. The American Journal of Sociology, 22(4): 529-554.

Tindal, G. (1987). The effect of different metrics on interpretations of change in program evaluation. Remedial and Special Education, 8(5): 19-28.

Tippett, L. H. C. (1925). On the extreme individuals and the range of samples taken from a normal population. Biometrika, 17: 364-87.

Tukey, J. W. (1957). On the comparative anatomy of transformations. The Annals of Mathematical Statistics, 28(3): 602-632. Retrieved March 26, 2007 from JSTOR database.

Tukey, J. W. (1962). The future of data analysis. The Annals of Mathematical Statistics, 33(1): 1-67. Retrieved August 3, 2007 from JSTOR database.

Tukey, J. W. & McLaughlin, D. H. (1963). Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/Winsorization. Indian Journal of Statistics, 25: 331-351.

Van der Waerden, B. L. (1952/1953a). Order tests for the two-sample problem and their power. Proceedings Koninklijke Nederlandse Akademie van Wetenschappen (A), 55 (Indagationes Mathematicae 14): 453-458, & 56 (Indagationes Mathematicae 15): 303-316.

Van der Waerden, B. L. (1953b). Testing a distribution function. Proceedings Koninklijke Nederlandse Akademie van Wetenschappen (A), 56 (Indagationes Mathematicae 15): 201-207.

Visual Numerics (1994). IMSL Stat/Library: FORTRAN Subroutines for Statistical Applications, Vol. 1. Houston: Author.

Walker, H. M. & Lev, J. (1969). Elementary Statistical Methods, 3rd Ed. New York: Holt, Rinehart and Winston.

Wilks, S. S. (1948). Order statistics. Bulletin of the American Mathematical Society, 54: 6-50.

Wilson, E. B. & Hilferty, M. M. (1929). Note on C. S. Peirce's experimental discussion of the law of errors. Proceedings of the National Academy of Sciences, 15: 120-125.

Wimberley, R. C. (1975). A program for the T-score normal standardizing transformation. Educational and Psychological Measurement, 35: 693-695.

Wright, E. N. (1973). Examinations, marks, grades and scales: A working paper. Ontario: Toronto Board of Education.

Zimmerman, D. W. & Zumbo, B. D. (2005). Can percentiles replace raw scores in the statistical analysis of test data? Educational and Psychological Measurement, 65: 613-638. Retrieved March 7, 2007 from http://epm.sagepub.com


ABSTRACT

A COMPARISON OF RANKING METHODS FOR NORMALIZING SCORES

by

SHIRA R. SOLOMON

May 2008

Advisor: Shlomo S. Sawilowsky

Major: Evaluation and Research

Degree: Doctor of Philosophy

Normalizing transformations define the frame of reference for standardized test score distributions, allowing for meaningful comparisons between tests.

Normalization equalizes the intervals between data points by approximating where ordinal scores fall along a normal distribution and how much of the corresponding area under the curve the ranked, cumulative proportions occupy. The most prominent among such ranking methods are the Blom, Tukey, Van der Waerden, and Rankit approximations. The purpose of this study was to provide an empirical comparison of these ranking methods as they apply to standardized test scoring and interpretation.

A series of Monte Carlo simulations was performed to compare their accuracy in terms of achieving the T score's specified mean and standard deviation and unit normal skewness and kurtosis. Eight nonnormal distributions of real achievement and psychometric data were used at 10 small and four large sample sizes. All four ranking methods were found to be accurate on the odd moments but displayed considerable deviation from target values on the even moments. Standard deviation showed the most variability on both accuracy measures: deviation from target and magnitude of deviation.

The substantial variability between and within ranking methods on the standard deviation suggests that practitioners should consider both sample size and distribution when selecting a normalizing procedure. However, Rankit is the most accurate method among small and large samples, achievement and psychometric distributions, and overall. Van der Waerden’s approximation consistently performs the worst across sample sizes and distributions. These results indicate that Rankit should be the default selection for score normalization in the social and behavioral sciences.


AUTOBIOGRAPHICAL STATEMENT

SHIRA R. SOLOMON

Prior to her doctoral work in Educational Evaluation and Research at Wayne State University, Shira Solomon received a Bachelor of Arts in Comparative Literature from Columbia University (1994), a Bachelor of Arts in Talmud and Rabbinic Literature from the Jewish Theological Seminary of America (1994), and a Master of Science in Teaching from the New School for Social Research (1997).

She has taught English Language Arts in New York City public schools, English as a Foreign Language at National Taiwan University, English as a Second Language at Wayne State University's English Language Institute, and research methods for allied health and human services at Madonna University and University of Detroit – Mercy. She has also worked as a technical writer, a communications writer, and an entertainment writer. She trained in social and behavioral health research at the Institute of Gerontology and the School of Medicine at Wayne State University.

Ms. Solomon’s research interests include assessment methodology and the epidemiological investigation of literacy.