DECONSTRUCTING THE COMPARATIVE POWER OF THE INDEPENDENT SAMPLES t AND THE WILCOXON MANN-WHITNEY FOR SHIFT AND SLIGHT HETEROSCEDASTICITY

by

TANA J. BRIDGE

DISSERTATION

Submitted to the Graduate School

of Wayne State University,

Detroit, Michigan

in partial fulfillment of the requirements

for the degree of

DOCTOR OF PHILOSOPHY

2007

MAJOR: THEORETICAL EVALUATION AND RESEARCH

Approved by: ______Advisor Date ______

UMI Number: 3284037


© COPYRIGHT BY

TANA JOLEE BRIDGE

2007

All Rights Reserved

DEDICATION

I dedicate this to my husband, Patrick, and my children, Kayla and Evan, who make this journey through life so sweet! I am blessed!

ACKNOWLEDGMENTS

It is with deepest gratitude that I would like to thank those who have been a part of my educational journey.

I wish to thank Dr. Shlomo Sawilowsky, who was committed to maximizing every opportunity for me to learn, from my enrollment through the final phases of this dissertation. Convinced that it would all become clear, through patience and perseverance he successfully taught me more than I anticipated.

Dr. Gail Fahoome was amazing throughout this process; I would have never arrived at this date without her generosity. Giving countless hours of personal time in the program development, she not only was instrumental in the success of the Monte Carlo program but also in ensuring I kept a sense of humor along the way. Her sacrifice was great, for which I am so thankful.

Although Dr. Donald Marcotte is no longer with us, his teaching, humor, and support motivated me through completion of my course work and ultimately have shaped the way I teach. I appreciated his high expectations, attention to detail, and patience. I feel blessed to have had him for several courses.

I am also grateful for my other two Committee members, Dr. Matthew Jackson for his commitment to ensuring I move through this process in a supported manner and Dr. James Moseley who unselfishly filled the position previously held by Dr. Marcotte.

Finally, I would like to thank my family for their encouragement throughout this process. To Patrick, my terrific husband, who ensured that home life never missed a beat and whose continual support and encouragement were unfailing.

To Kayla and Evan, who have always been excited about this process, I am thankful for their patience and encouragement. I also would like to acknowledge my parents, Jon and Karen Mattson, who have removed countless barriers to my achieving degrees in higher education and who have supported me in tasks big and small to achieve this degree. To all of them I remain grateful.

TABLE OF CONTENTS

Chapter Page

DEDICATION ...... ii

ACKNOWLEDGMENTS ...... iii

LIST OF TABLES ...... viii

LIST OF FIGURES ...... xiii

CHAPTERS

CHAPTER 1 – Introduction ...... 1

The Relevance of the Problem ...... 4

The Purpose of the Study ...... 5

Limitations ...... 7

CHAPTER 2 – Literature Review ...... 9

The Student t test ...... 9

The Wilcoxon Mann-Whitney Test ...... 11

The Power and Robustness of the t test & Wilcoxon ...... 12

The Function of the Mean & Median in Inferential Testing ...... 14

Scale Change and ...... 16

Shift in Location Parameters ...... 18

Scale Change and Shift in Location Parameters ...... 20

CHAPTER 3 – Method ...... 23

Monte Carlo Studies ...... 23

Methodology ...... 24


Study Parameters ...... 24

Sampling Distribution ...... 24

Sampling Size and Nominal Alpha ...... 26

Study Design ...... 27

CHAPTER 4 – Results ...... 29

Homogeneity of Variance with Shift in Location ...... 42

Equal Shift, Equal Sample Size ...... 42

Equal Shift, Unequal Sample Size ...... 42

Unequal Shift, Equal Sample Size ...... 43

Unequal Shift, Unequal Sample Size ...... 43

Heterogeneity of Variance with No Shift ...... 44

Gaussian Distribution ...... 44

Equal Variance, Equal Sample Size ...... 44

Unequal Variance, Equal Sample Size...... 45

Unequal Variance, Unequal Sample Size ...... 48

Smooth Symmetric Distribution ...... 53

Unequal Variance, Equal Sample Size ...... 53

Unequal Variance, Unequal Sample Size ...... 56

Extreme Asymmetric, Achievement Distribution...... 61

Unequal Variance, Equal Sample Size ...... 61


Unequal Variance, Unequal Sample Size ...... 64

Heterogeneity of Variance with Shift in Location ...... 69

CHAPTER 5 – Conclusion ...... 72

REFERENCES ...... 80

ABSTRACT ...... 87

AUTOBIOGRAPHICAL STATEMENT ...... 89


List of Tables

TABLE PAGE

Table 1 Rejections (×10⁻⁶) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 30

Table 2 Rejections (×10⁻⁶) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions ...... 31

Table 3 Rejections (×10⁻⁶) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions ...... 32

Table 4 Rejections (×10⁻⁶) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 33

Table 5 Rejections (×10⁻⁶) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions ...... 34

Table 6 Rejections (×10⁻⁶) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions ...... 35

Table 7 Rejections (×10⁻⁶) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 36

Table 8 Rejections (×10⁻⁶) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions ...... 37

Table 9 Rejections (×10⁻⁶) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions ...... 38

Table 10 Rejections (×10⁻⁶) for Shift (L,U) = (0.5,.0) for t and Wilcoxon, Gaussian distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 39

Table 11 Rejections (×10⁻⁶) for Shift (L,U) = (0.5,.0) for t and Wilcoxon, Smooth Symmetric distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 40

Table 12 Rejections (×10⁻⁶) for Shift (L,U) = (0.5,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 41

Table 13 Range of rejection change in location/shift: (.05, .1, .15, .2) with equal and unequal sample sizes ...... 43

Table 14 Range of rejection change in location/shift: (.0-.20 (.05)) ...... 44

Table 15 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions ...... 46

Table 16 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (30,30), α = .05; 1,000,000 repetitions ...... 47

Table 17 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 49

Table 18 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (30,10), α = .05; 1,000,000 repetitions ...... 50

Table 19 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions ...... 51

Table 20 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (45,15), α = .05; 1,000,000 repetitions ...... 52

Table 21 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions ...... 54

Table 22 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (30,30), α = .05; 1,000,000 repetitions ...... 55

Table 23 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 57

Table 24 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (30,10), α = .05; 1,000,000 repetitions ...... 58

Table 25 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions ...... 59

Table 26 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (45,15), α = .05; 1,000,000 repetitions ...... 60

Table 27 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions ...... 62

Table 28 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (30,30), α = .05; 1,000,000 repetitions ...... 63

Table 29 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 65

Table 30 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (30,10), α = .05; 1,000,000 repetitions ...... 66

Table 31 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions ...... 67

Table 32 Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (45,15), α = .05; 1,000,000 repetitions ...... 68

Table 33 Gaussian Distribution: Average rejection and range of rejection across all increments of scale change and sample sizes ...... 70

Table 34 Smooth Symmetric Distribution: Average rejection and range of rejection across all increments of scale change and sample sizes ...... 70

Table 35 Extreme Asymmetric Achievement Distribution: Average rejection and range of rejection across all increments of scale change and sample sizes ...... 71

Table 36 Rejections for Shift (L,U) = (.0,.0) for t and Wilcoxon, σ1 & σ2 = (1.0,1.0), sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 74

Table 37 Rejections for Shift (L,U) = (.0,.0) for t and Wilcoxon, σ1 & σ2 = (1.0,1.0), sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions ...... 74

Table 38 Rejections for Shift (L,U) = (.0,.0) for t and Wilcoxon, σ1 & σ2 = (1.0,1.0), sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions ...... 74

Table 39 Rejections (×10⁻⁶) for Shift (L,U) = (.0,.0) for t and Wilcoxon, ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 75

Table 40 Rejections (×10⁻⁶) for Shift (L,U) = (.0,.0) for t and Wilcoxon, ratios of σ1 & σ2, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions ...... 75

Table 41 Range of rejection change in location/shift: (.05, .10, .15, .20) ...... 76

Table 42 Rejections (power×10⁻⁶) for Shift (L,U) = (.05,.0) for t and Wilcoxon, Gaussian distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 77

Table 43 Rejections (power×10⁻⁶) for Shift (L,U) = (.05,.0) for t and Wilcoxon, Smooth Symmetric distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 77

Table 44 Rejections (power×10⁻⁶) for Shift (L,U) = (.05,.0) for t and Wilcoxon, Extreme Asymmetric, Achievement distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions ...... 78

List of Figures

FIGURE PAGE

Figure 1 Gaussian Distribution ...... 27

Figure 2 Smooth Symmetric Distribution ...... 28

Figure 3 Extreme Asymmetric, Achievement Distribution ...... 29


CHAPTER ONE

INTRODUCTION

The appropriate use of parametric and nonparametric tests has been debated since the late 1800s, and the debate continues through the present. Specifically, researchers continue to debate the comparative power properties of parametric and nonparametric tests when the underlying assumptions of the tests are violated. This study focuses on that debate as it applies to the parametric t test and the nonparametric Wilcoxon Mann-Whitney test.

The parametric t test is the most widely used procedure in education and medical research (Blair & Higgins, 1980b; Bridge & Sawilowsky, 1999). It is defined as “a statistical test of hypotheses in which the null hypothesis includes a specific value for the population parameter and in which certain assumptions have been met” (Hinkle, Wiersma & Jurs, 1998, p. 620). The assumptions of the t test are that (1) the samples are randomly and independently selected; (2) the population distribution from which the samples were selected is normal (i.e., Gaussian or DeMoivrian); (3) the populations have the same variance (homogeneity); and (4) the data are measured on at least an interval scale. Parametric tests are often used when these assumptions are violated; however, an extreme violation of one assumption, or a violation of multiple assumptions, can affect Type I and Type II error rates (Blair, 1991; Conover & Iman, 1981; Zimmerman, 1987; Gibbons & Chakraborti, 1992).

The nonparametric Wilcoxon Mann-Whitney test differs from parametric tests in that it has fewer underlying assumptions. It is regarded as robust regardless of the treatment alternative and as powerful for treatment alternatives of shift in location when the assumption of normality is violated (Mansfield, 1986; Bridge & Sawilowsky, 1999; Sawilowsky & Fahoome, 2003). Further, the Wilcoxon Mann-Whitney test maintains its robustness and power properties whether the data are at an ordinal, interval, or ratio level. Therefore, the Wilcoxon Mann-Whitney test is applicable to many different types of data analyses.

Recently, a debate specific to the comparative power of parametric and nonparametric tests resurfaced in the medical literature (William, Cheung, Russell, Cohen, Longo & Lervy, 2000; Barber & Thompson, 2000). Specifically, William et al. used the Wilcoxon Mann-Whitney test rather than the independent samples t test to analyze cost data, which are often characterized by an extreme skew. The outcomes of the William et al. (2000) study allowed them to reject the null hypothesis.

In reviewing the methods and outcomes of the William et al. (2000) study, Barber and Thompson (2000) replicated the analysis using the independent samples t test. Their analysis failed to reject the null hypothesis. Barber and Thompson (2000) argued that the t test is more appropriate when analyzing cost data and that the shift in location of the mean associated with the t test could not be exchanged for the shift in the median associated with the Wilcoxon Mann-Whitney. Thus, Barber and Thompson (2000) questioned the validity of the William et al. (2000) study, stating that the analysis used by William et al. “can lead to seriously misleading conclusions, which could influence important policy decisions in health care” (Barber & Thompson, 2000, p. 1730).

Barber and Thompson have maintained this position and had previously made claims as to the value of the mean and the lack of usefulness of nonparametric tests. In a 1998 study, Barber and Thompson reviewed 45 clinical trials/studies. They acknowledged possible concerns over the validity of the t test under the violations of normality that are common with highly skewed distributions of cost data. However, they reiterated that when looking at cost data, the statistical methods must be based on the mean (as tested by the t test), because it offers a more precise measure. They advanced the following arguments:

1. Although cost data are recognized as often highly skewed (breaching the assumption of normality and perhaps homogeneity), the t test is the most precise measure of analysis.

2. “Economic analysis is mainly concerned with means,” thus implying that a shift in location of a median (Wilcoxon Mann-Whitney) is less useful than a shift in mean (t test) when completing an economic analysis (Barber and Thompson, 2000, p. 1730).

3. Heterogeneity (scale change/variance) would not be a consideration. Rather, the focus was specific to change in location of the mean.

Relevance of the Problem to Education and the Social Sciences

Although the previous example is specific to medical data, within the disciplines of education and the social sciences the most common two-sample test utilized is the t test (Blair & Higgins, 1980b). Further, it is understood that within these disciplines data sets are most often nonnormal and that researchers frequently neglect the issue of nonnormality. For example, Micceri (1989) analyzed 440 distributions in psychology and education and found none to be normally distributed, and only 3% that could be classified as relatively symmetric with light tails. Breckler (1990) reviewed 72 articles in social science journals that used multivariate tests. Only 19% of the authors acknowledged the assumption of multivariate normality, and fewer than 10% tested whether this assumption was violated.

Nonnormality of data in education, the social sciences, and medicine abounds, yet the t test remains the test of choice, which raises concern as to the validity of many studies. Although numerous studies have assessed the comparative power of the t test and the Wilcoxon Mann-Whitney, the recent debate in the medical literature warrants a specific focus on the effects of shift in location parameters.

The Purpose of this Study

It is recognized that researchers often defer to parametric tests without regard for, or understanding of, their limitations (Siegel, 1956; Gibbons, 1985; Zimmerman, 1987). The purpose of this study is to assess whether, in the presence of a slight scale change, the t test fails to reject and the Wilcoxon Mann-Whitney rejects because of the scale change rather than because of a shift in location.

The use of Monte Carlo methods will allow several parameters of the Wilcoxon Mann-Whitney and t test to be evaluated and controlled, specific to shift in location. Those parameters will include sample size, alpha level, and distribution. Further, scale change (as it relates to shift in location) will be assessed by adding a constant and altering the variance (see “Definitions” for more expansive definitions, pp. 6-8, and “Study Parameters” for operationalization of parameters, pp. 27-32).
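The design described above can be sketched in a few lines of code. The following is a minimal illustration only, not the program used in this study: it draws Gaussian data only, uses far fewer repetitions than the 1,000,000 reported in the tables, fixes both sample sizes at 20, hard-codes the t critical value for 38 degrees of freedom, and uses the large-sample normal approximation for the Wilcoxon rank sum. All function and variable names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

N = 20                  # per-group sample size (assumption; fixes df at 38)
T_CRIT_DF38 = 2.024     # two-sided .05 critical value of t with 38 df (hard-coded)
Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided .05 critical value, normal approximation


def pooled_t(x, y):
    """Independent samples t statistic using the pooled variance."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def rank_sum_z(x, y):
    """Wilcoxon rank sum of x, standardized by its null mean and variance.

    Continuous data are assumed, so no correction for ties is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma


def rejection_rates(shift, scale, reps=2000, seed=1):
    """Return the proportion of two-sided rejections for (t, Wilcoxon).

    Group 1 is N(0, 1). Group 2 is N(0, 1) multiplied by `scale`
    (scale change) and then increased by `shift` (shift in location).
    """
    rng = random.Random(seed)
    t_rej = w_rej = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(N)]
        y = [scale * rng.gauss(0, 1) + shift for _ in range(N)]
        t_rej += int(abs(pooled_t(x, y)) > T_CRIT_DF38)
        w_rej += int(abs(rank_sum_z(x, y)) > Z_CRIT)
    return t_rej / reps, w_rej / reps
```

Calling `rejection_rates(0.0, 1.0)` estimates the Type I error rate of each test, `rejection_rates(0.0, 1.5)` isolates the effect of a scale change with no shift, and `rejection_rates(0.2, 1.5)` combines the two treatment alternatives.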

Definitions

In the stated purpose of this study, the following terms are defined.

Monte Carlo. Monte Carlo studies are computer simulations that allow for the measuring of mathematical properties of statistical tests (Harwell, 1990).

Wilcoxon Mann-Whitney. The Wilcoxon Mann-Whitney is a “nonparametric test for ordinal data with [independent] samples” (Hinkle, et al., 1998, p. 622).


Independent Samples t Test. The independent samples t test can be defined as “a test statistic for determining the significance of a difference between means (for a two sample case)” (Runyon & Haber, 1991, p. 337).

Scale Change. Scale change refers to a change in the variance of the sample scores, that is, in the “spread of the distribution of observations” (Hutchinson, 2002, p. 248).

Shift in Location. Shift in location is the systematic difference between groups of measures (Kerlinger & Lee, 2000, p. 107). For the t test, shift in location refers specifically to the mean, whereas for the Wilcoxon Mann-Whitney it refers to the median.

Robustness. Hunter and May (1993) defined robustness of a statistical test as “the extent that violating its assumptions does not appreciably affect the probability of its Type I error” (p. 386). Sawilowsky (1990) added that robustness also speaks to Type II error, the complement of the power of the statistical test.

Alpha level. The alpha level is often referred to as the level of significance and refers to “the probability of making a Type I error if Ho is rejected” (Hinkle, et al., 1998, p. 618).

Distribution. A distribution is defined as an arrangement of values/outcomes that demonstrates their observed frequency (Hinkle, et al., 1998).

Skewness. Skewness refers to the lack of symmetry of a distribution of scores, seen as elongation of either the left or the right tail (Wilcox, 1996).

Power. Power is “the probability of rejecting the null hypothesis when it is false” (Hinkle, et al., 1998, p. 620). It is the complement of the probability of a Type II error (power = 1 − β).
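The definitions of shift in location and scale change given above can be illustrated with a short sketch. The helper names and the example scores are hypothetical and are not taken from the study: adding a constant moves the mean and the median by that constant and leaves the spread unchanged, whereas multiplying deviations from the mean by a constant changes the spread and leaves the mean unchanged.

```python
from statistics import mean, median, pstdev


def shift_location(scores, c):
    """Shift in location: add the constant c to every score."""
    return [s + c for s in scores]


def change_scale(scores, k):
    """Scale change: multiply each deviation from the mean by k.

    The mean is unchanged and the standard deviation is multiplied by k.
    """
    m = mean(scores)
    return [m + k * (s - m) for s in scores]


scores = [2.0, 4.0, 4.0, 5.0, 10.0]    # illustrative, right-skewed scores
shifted = shift_location(scores, 0.5)  # location moves, spread does not
rescaled = change_scale(scores, 1.5)   # spread changes, mean does not
```

For an asymmetric set of scores such as this one, rescaling about the mean also moves the median, which is one reason the two treatment alternatives are difficult to separate in skewed data.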

Limitations

Limitations to this study are listed below.

1. The three data sets will be computer simulated from one theoretical (Gaussian) and two real world distributions (Micceri, 1989).

2. Distributions are limited to those listed under parameters. These distributions are not representative of all possible distributions.

3. This study is limited to addressing the underlying assumptions of normality and homogeneity of variance and excludes the other underlying assumptions. Specifically, it will offer conclusions about shift in location and scale change.

4. This study is limited to the Wilcoxon Mann-Whitney and the independent samples t test and does not consider the dependent samples t test or the Wilcoxon Signed Rank test.

CHAPTER TWO

LITERATURE REVIEW

Statistical tests were developed for the purpose of studying the relationship between variables. Researchers must exercise careful choice in test selection to ensure that the power and robustness properties of the test are preserved. When studying two independent samples, the parametric t test and the nonparametric Wilcoxon Mann-Whitney test are popular options, with each requiring that underlying assumptions be met with minimal violations. Because violations of the test assumptions occur, comparative power studies give guidance on the best test selection.

The Student t test

Gosset's t test (Student, 1908) has been used extensively and studied widely since its introduction. The t test remains one of the most popular statistics used in the 21st century. The t test can be defined as “a test statistic for determining the significance of difference between means (for a two-sample case)” (Runyon & Haber, 1991, p. 337) and “whose distribution is equal to the square root of the F distribution with one degree of freedom in the numerator” (Sawilowsky, 1990, p. 99). According to Kerlinger and Lee (2000), the t test's underlying assumptions include (pp. 414-418):

1. The Assumption of Normality. The t test assumes that the sample is drawn from a population that is normally distributed.

2. Independence of Observation. It is assumed that all observations are randomly and independently sampled from the population.

3. Homogeneity of Variance. This assumes that the variance between groups is the same; thus, all populations have the same variance.

4. Continuity and Equal Intervals of Measure. Based on the arithmetic activities of statistical operations, this assumption requires the data to be either interval or ratio scale data.
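For reference, the statistic to which these assumptions apply can be written out. The block below gives the standard pooled-variance form of the independent samples t statistic; the notation (sample means, sample variances, and sample sizes n1 and n2) is supplied here for illustration and is not taken from the sources cited above.

```latex
% Pooled-variance independent samples t statistic
\[
  t \;=\; \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
  \qquad
  s_p^{2} \;=\; \frac{(n_1 - 1)\,s_1^{2} + (n_2 - 1)\,s_2^{2}}{n_1 + n_2 - 2}
\]
% Under normality, independence, and homogeneity of variance,
% t follows Student's t distribution with n_1 + n_2 - 2 degrees of freedom,
% and its square follows an F distribution with one numerator degree of freedom:
\[
  t^{2} \;\sim\; F(1,\; n_1 + n_2 - 2)
\]
```

The pooled variance in the denominator is where the homogeneity assumption enters: the two sample variances are averaged as if they estimated a single common population variance.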

In the initial years it was believed that violating the assumptions of the t test could lead to erroneous findings and, therefore, that the underlying assumptions of the test must be met. However, research studies have indicated that under some conditions the t test maintains its power properties when the violations are not extreme or are few in number (Conover & Iman, 1981; Zimmerman, 1987; Blair, 1991; Gibbons & Chakraborti, 1992; Sawilowsky & Blair, 1992; Bridge & Sawilowsky, 1999). For example, the t test is the uniformly most powerful unbiased (UMPU) test when the data are normally distributed, but some researchers have found that the t test maintains its power properties when violations of normality are minimal (Blair, 1981; Bridge & Sawilowsky, 1999). Further, the test may retain its robustness properties under some violations of homogeneity (Boneau, 1960; Glass, Peckham & Sanders, 1972; Sawilowsky & Blair, 1992). The t test is thus less rigid than first believed, yet the conditions for using it must be understood and applied appropriately.

The Wilcoxon Mann-Whitney Test

Frank Wilcoxon first developed the test in 1945. In 1947, Henry B. Mann and Donald R. Whitney, using a different procedure, arrived at the same results as Wilcoxon; thus, the test is often referred to as the Wilcoxon Mann-Whitney test. The Wilcoxon Mann-Whitney is a statistic used for two independent samples and is recognized for its sensitivity to location parameters. The test utilizes stochastic ordering of scores; therefore ordinal, interval, or ratio data can be used. Unlike the t test, the Wilcoxon Mann-Whitney utilizes the median rather than the mean. The Wilcoxon Mann-Whitney has two underlying assumptions: (1) samples must be independent of one another, and (2) the population distributions of the dependent variable share a similar, unspecified shape.
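Because the test depends only on the ordering of the pooled scores, its computation is brief. The sketch below is an illustration under stated assumptions rather than the procedure used in this study: tied scores receive the average of the ranks they span (midranks), and the function names and example scores are hypothetical. It returns Wilcoxon's rank sum W for the first sample and the equivalent Mann-Whitney U.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_and_u(x, y):
    """Return (W, U): the rank sum of x in the pooled sample and the
    Mann-Whitney U obtained from it, U = W - n1(n1 + 1)/2."""
    ranks = midranks(list(x) + list(y))
    n1 = len(x)
    w = sum(ranks[:n1])
    u = w - n1 * (n1 + 1) / 2
    return w, u


# Illustrative scores only; 2.3 appears in both groups and is a tie.
w, u = rank_sum_and_u([1.1, 2.3, 4.0], [2.3, 5.2, 6.8, 7.5])
```

W and U differ only by a constant that depends on the first sample size, which is why the procedures of Wilcoxon and of Mann and Whitney lead to the same decision.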

The Wilcoxon test may be considered the most powerful nonparametric test (Runyon & Haber, 1991). According to Bradley (1968), “In comparison to other distribution free statistics, the Wilcoxon test typically either ranks first or, when the set of tests being compared includes the optimum test for the conditions of comparison, ranks a close second” (pp. 109-110). The strength of the Wilcoxon Mann-Whitney is its ability to remain both powerful and robust under conditions of non-normality. According to Kerlinger and Lee (2000), “Nonparametric methods are virtually inexhaustible” (p. 242); “given the relatively simple principles involved and the various properties of data that can be exploited: range, periodicity, distributions, and rank” (p. 425).

The Power and Robustness of the t test and Wilcoxon Mann-Whitney

One of the underlying assumptions of the t test is normality, yet the t test was developed at a time when the concept of normality in real world distributions was under scrutiny. Researchers argued that the t test maintained acceptable power when the assumption of normality was violated. Bradley (1978) summarized their views (p. 145):

'this assumption may be violated almost with impunity…may be safely ignored' (Hays, 1963, pp. 322, 380); 'nearly immune to violation…invulnerability... functionally a distribution-free test' (Boneau, 1960, pp. 50, 51, 60); 'assumption may be waived' (Dinham, 1976, p. 174); 'assumptions whose failure does not much matter' (Wright, 1976, p. 992); 'this remarkable property of robustness to nonnormality' (Box, 1953, p. 318); 'remarkable robust' (Walker & Lev, 1969, p. 286); 'amazingly sensitive …extremely gratifying' (Lindquist, 1953, pp. 81, 86).

Because the t test was recognized for its power under normal theory, it was believed that compromising normality would result in only a minor decrease in power, if any. However, according to Blair (1981):

one might assume that because the t test is the uniformly most powerful (UMPU) unbiased test under normal theory, it will naturally be more powerful than other tests in the non-normal situation, provided that its normal theory power is preserved. But this is fallacious reasoning because the optimal power properties associated with the t test under normal theory are no longer in force once the normality stipulation has been abandoned (p.500).

In fact, a study by Micceri (1989) indicated that many data sets in education and psychology are nonnormal. As previously mentioned, Micceri (1989) studied 440 distributions in psychology and education and found that none were normally distributed and only 3% were identified as relatively symmetric with light tails. Data sets in other disciplines are also recognized as non-normal. For example, distributions in the field of medicine often demonstrate extreme skewness (Bridge & Sawilowsky, 1999; Barber & Thompson, 1998). Therefore, if non-normal distributions are the rule rather than the exception, researchers must take a close look at the shape of their data and the tests they are applying.

Although the nonparametric counterpart to the t test, specifically the Wilcoxon Mann-Whitney, has existed since 1945, it did not gain quick popularity. This lack of popularity was attributed to the beliefs that the t test would maintain its power properties under nonnormality, that the Wilcoxon Mann-Whitney was less powerful, and that there were few nonparametric options for complex designs (Sawilowsky, 1990). However, over the previous two decades, concerns over proper test selection have accelerated the study of the comparative power of the t test and the Wilcoxon Mann-Whitney test, and the outcomes of these studies have been controversial.

Hodges and Lehmann (1956) indicated a larger power advantage with use of the Wilcoxon over the t test with asymptotic/nonasymptotic conditions. A subsequent study by Boneau (1962) discounted those findings. Boneau affirmed the power advantage of the t test over the Wilcoxon Mann-Whitney with asymptotic/nonasymptotic conditions. Subsequently, Neave and Granger (1968) reported that the Wilcoxon rank-sum was only slightly inferior to the t test with normal distributions and was “much superior in the cases of non-normal populations” (p. 509).

Researchers also compared the t test and the Wilcoxon Mann-Whitney with respect to asymptotic relative efficiency (ARE), which applies when sample sizes are large. Blair & Higgins (1980a) indicated a “very large potential power advantage of the Wilcoxon statistic over the t statistic promised by asymptotic theory obtainable with nonasymptotic conditions” (p. 654). Developing this field of study, Blair & Higgins (1981) completed a subsequent study of the differences between the t test and the Wilcoxon by examining the characteristics of the asymptotic relative efficiency and again concluded that the Wilcoxon had large ARE, or power, advantages over the t test.
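The asymptotic comparison discussed above has a closed form for the pure shift model. The block below states the standard textbook expression for the Pitman asymptotic relative efficiency of the Wilcoxon Mann-Whitney relative to the t test; the notation (a population density f with finite variance) is supplied here for illustration and is not drawn from the studies cited in this section.

```latex
% Pitman ARE of the Wilcoxon Mann-Whitney (W) relative to the t test
% under a shift in location, for a density f with variance \sigma^2:
\[
  \mathrm{ARE}(W, t) \;=\; 12\,\sigma^{2}
  \left( \int_{-\infty}^{\infty} f(x)^{2}\,dx \right)^{2}
\]
% Normal population: \int f(x)^2 dx = 1 / (2\sigma\sqrt{\pi}), so
\[
  \mathrm{ARE}(W, t) \;=\; 12\,\sigma^{2} \cdot \frac{1}{4\pi\sigma^{2}}
  \;=\; \frac{3}{\pi} \;\approx\; 0.955
\]
% Lower bound over all continuous densities (Hodges & Lehmann, 1956):
\[
  \mathrm{ARE}(W, t) \;\ge\; \frac{108}{125} \;=\; 0.864
\]
```

The expression shows why the potential advantage runs in one direction: the Wilcoxon Mann-Whitney can lose at most about 14% in asymptotic efficiency, while for heavy-tailed densities the value can exceed 1 (for example, 1.5 for the double exponential distribution).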

This point is reemphasized in research by Bridge and Sawilowsky (1997, 1999). They found that although the t test can maintain its power properties when distributions vary slightly from normality, the Wilcoxon Rank Sum is more powerful when distributions are skewed. Further studies of sample size and power suggest that the power advantage of the Mann-Whitney is maintained even when the sample sizes increase; “therefore the Mann-Whitney test should not be thought of as a 'small sample' procedure” (Blair & Higgins, 1985, p. 119).

The robustness of the t test and Wilcoxon Mann-Whitney, specific to distribution, sample size and one/two tailed test, has also been extensively studied. Researchers contend that the t test is robust “insofar as Type I errors are concerned, to non-Gaussian population shape so long as (a) sample sizes are equal or nearly so, (b) sample sizes are fairly large (25-30), and (c) tests are two-tailed rather than one tailed” (Sawilowsky & Blair, 1992, p. 352).

However, in reference to normality this generalization is only true if violations of normality are minimal. According to Sawilowsky and Blair (1992), previous studies of the power properties and robustness of the t test under violations of normality erred by not recognizing and using the extreme distributions found in real world data. They add that with extremely skewed distributions, the results were “non-robust” when selecting the independent samples t test (p. 352).

The last few decades have provided research addressing the comparative power and robustness properties of the t test and Mann-Whitney test specific to normality/non-normality. Undoubtedly the t test is the UMPT under normal distributions; however, its advantage decreases as distributions depart from normality.

The Function of the Mean and Median in Inferential Statistics

According to Cohen (1988), researchers most often use the mean in hypothesis testing. The t test remains the most popular test for assessing mean differences in education and in the medical field (Blair & Higgins, 1980a; Bridge & Sawilowsky, 1999). In fact, Barber and Thompson (2000) assert that the mean is very important in inferential testing. Specifically, they argue that economic data in the medical field are primarily concerned with the mean; therefore the best statistic is the t test. In a recent study, they actively chose the t test over the Wilcoxon Mann-Whitney out of the desire to prioritize a test that specifically uses the mean rather than the Mann-Whitney, which uses the median (Barber & Thompson, 2000). This is a common misconception among researchers and indicates a limited understanding of the efficiency of the median in inferential testing. Therefore a brief explanation of the mean and median as they relate to inferential testing is warranted.

The t test relies on the mean to assess a change in location parameters; the Wilcoxon Mann-Whitney uses the median, which is derived from the data. However, analysis with the Wilcoxon Mann-Whitney can take place at any of the percentiles and is not restricted to the median. Thus, comparing the mean and the median at the 50th percentile with symmetrical data renders a similar outcome specific to shift in location.

The t test requires that the data be normally distributed or relatively symmetric, while the Wilcoxon Mann-Whitney does not. One of the controversies surrounding these two tests is that many parametricians feel tests of the median are not as sensitive as tests of the mean. They argue that in ranking the data, information is lost (Sawilowsky, 1993). Conover and Iman (1976) conclude that although the distribution of the mean approaches normality quickly as sample sizes increase, this may not be the case when data are highly skewed, when outliers impact the mean, or when sample sizes are small. Additionally, recalling that the Wilcoxon Mann-Whitney utilizes stochastic/rank ordering, the “theoretical mean and variance of the ranks are known” (Harwell, 1988, p. 35). Unlike the median in ranked data, the mean is highly sensitive to both skewed distributions and the presence of outliers. Because it operates on ranks, outliers have little impact on the Wilcoxon Mann-Whitney as compared to the t test.
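The contrast between the mean and rank-based statistics described above can be illustrated with a minimal sketch. The scores and the helper name `rank_sum` are hypothetical and chosen only for illustration; they are not data or code from this study, and ties are assumed to be absent.

```python
from statistics import mean, median

def rank_sum(sample_a, sample_b):
    """Sum of the ranks of sample_a within the pooled data (assumes no ties)."""
    pooled = sorted(sample_a + sample_b)
    # Rank 1 is assigned to the smallest pooled value.
    return sum(pooled.index(x) + 1 for x in sample_a)

# Hypothetical scores for two groups (illustration only).
control = [10, 12, 13, 15, 16]
treated = [11, 14, 17, 18, 19]
# The same treated group with its largest score replaced by an extreme outlier.
treated_outlier = [11, 14, 17, 18, 190]

# The mean, which drives the t test, moves a great deal.
print(mean(treated), mean(treated_outlier))
# The median is unchanged.
print(median(treated), median(treated_outlier))
# The rank sum, which drives the Wilcoxon Mann-Whitney, is unchanged.
print(rank_sum(treated, control), rank_sum(treated_outlier, control))
```

In this example the outlier changes the treated group's mean but leaves both its median and its rank sum untouched, because the largest score keeps the largest rank no matter how extreme it becomes.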

Scale Change/Variance

Statistical tests are also sensitive to scale change or homogeneity of variance. Homogeneity speaks to the “extent to which the members of the group tend to be the same on the variables being investigated” (Hinkle et al., 1998, p. 618). Homogeneity of variance ensures that the variance between groups is the same or equal with respect to random variations. Thus, with equal starting points, a study that looks at shift in location between two groups will allow for a valid comparison of outcomes.

The power and robustness of the t test and Mann-Whitney specific to the violation of homogeneity of variance have also been a source of debate (Zimmerman, 1987, 1991; Gibbons & Chakraborti, 1991, 1992). According to Bradstreet (1997), when population variances are not equal (heteroscedasticity), error is likely to occur and the findings can be misleading. Data sets in education and psychology are recognized as lacking homogeneity, thus impacting the mean and scale change (Howell & Games, 1974). Sawilowsky and Blair (1992) supported this position after their review of large numbers of data sets. They stated that they have “never encountered a treatment or other naturally occurring condition that produces heterogeneous variances while leaving population means exactly equal” (Sawilowsky & Blair, 1992, p. 358). Therefore, the integrity of outcomes is in question when this violation occurs.

Studies looking at the t test and Wilcoxon Mann-Whitney tests specific to normality found that the t test is robust under violations of homogeneity, provided that sample sizes are equal and reasonably large (Zimmerman & Williams, 1989; Gibbons & Chakraborti, 1991). Further, Zimmerman (1987) found that “when the small sample size is associated with small variance, the Wilcoxon Mann-Whitney was more powerful in detecting differences between population means” (p. 171). However, if sample sizes are equal or if samples are small with a large variance, the t test is more powerful (Zimmerman, 1987). Further, “when sample sizes are unequal…the t test is decidedly not robust and Type I and Type II error probabilities are sometimes substantially altered” (Zimmerman, 1991, p. 360). Zimmerman offers these conclusions: “It is well known that when assumptions are satisfied, the Mann-Whitney is just slightly less powerful than the t test, having asymptotic relative efficiency of .955. Monte Carlo simulations have disclosed that violations of assumptions alter the entire power functions of the two tests in a similar way (Zimmerman, 1987). As long as the Student t test performs well, the Mann-Whitney test is only slightly less effective. But whenever the t test fails, because of unequal variance, the Mann-Whitney test also fails” (Zimmerman, 1991, p. 360).
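The value of .955 quoted above corresponds to 3/π, the standard asymptotic relative efficiency of the Wilcoxon rank-sum test relative to the t test under normality for a shift alternative. As a hedged illustration, the sketch below evaluates the general expression ARE = 12σ²(∫f(x)² dx)² numerically for the normal density. The function names and the integration settings are illustrative assumptions and are not taken from the cited studies.

```python
import math

def normal_pdf(x, sigma=1.0):
    """Density of a normal distribution with mean 0 and standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def are_wilcoxon_vs_t(pdf, sigma, lo=-12.0, hi=12.0, steps=20_000):
    """ARE = 12 * sigma^2 * (integral of pdf(x)^2 dx)^2, using the midpoint rule."""
    width = (hi - lo) / steps
    integral = sum(pdf(lo + (i + 0.5) * width) ** 2 for i in range(steps)) * width
    return 12.0 * sigma ** 2 * integral ** 2

are_normal = are_wilcoxon_vs_t(normal_pdf, sigma=1.0)
# Both printed values round to 0.955 under normality.
print(round(are_normal, 3), round(3.0 / math.pi, 3))
```

Passing a different density and its standard deviation to `are_wilcoxon_vs_t` gives the corresponding efficiency for that distribution, provided the integration limits cover essentially all of its mass.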

The t test uses original data to arrive at the mean and is highly influenced by heterogeneity of scores. Thus, comparatively it is less likely than the Wilcoxon Mann-Whitney to pick up true variance if there are violations of homogeneity, especially with smaller sample sizes. One recognized solution is to increase sample size; as sample size increases, scores become more homogeneous, although outliers can still impact the mean. In the fields of education and the social sciences samples are often moderate to small, thus heterogeneity and error remain a potential threat (Sawilowsky & Blair, 1992).

Another suggested solution is to assess the heterogeneity of the data prior to executing the t test. However, prudent researchers recognize that although several statistical software packages offer this option, the use of these preliminary tests may increase Type I error and should not be regarded as a plausible solution (Hutchinson, 2002; Zimmerman, 2004). Rather, tests that are sensitive to both location and scale change without impacting Type I error should be investigated and utilized (Hutchinson, 2002).

Shift in Location Parameters

Parametric and nonparametric tests are sensitive to a shift in location when a treatment effect is present. A study by Neave & Granger (1968) compared eight different two-sample tests using a Monte Carlo design, examining both normal and nonnormal data sets with 500 pairs of samples. They found similarities between the t test and the Wilcoxon Mann-Whitney; when populations were normally distributed, the Wilcoxon Mann-Whitney was “only very slightly inferior to the t test and naturally were much superior in the cases of non-normal populations” (Neave & Granger, 1968, p. 509).


Studies also suggest that with shift in location, other changes may also be produced. “Treatment often produces changes in mean as well as variance, skew, tail weight, and other population parameters” (Sawilowsky & Blair, 1992, p. 353). Sawilowsky and Blair (1992) studied the robustness of the t test and Wilcoxon Mann-Whitney test. Their research found that the t test retains its robust properties when sample sizes exceed 30 per group, groups have equal sample sizes, and a two-tailed test is being applied (Sawilowsky, 1990; Sawilowsky & Blair, 1992). They suggest that when normality is met (or violated only slightly) the t test maintains a small power advantage over the Wilcoxon. Further, when normality is violated, the Wilcoxon can be three to four times more powerful than the t test (Sawilowsky, 1992).

Bridge and Sawilowsky (1999) studied shift in location for three real data sets. The results suggest “the t test was more powerful only under a distribution that was relatively symmetric although the magnitude of the differences was trivial.” “In contrast, the Wilcoxon Rank Sum test held huge power advantages for data sets which presented skewness or heavy tails” (pp. 232-233). Further, their findings indicate that the Wilcoxon Rank Sum is powerful with the small or unequal sample sizes often found in medicine, an advantage not shared with the t test. Additionally, according to Sawilowsky (2001) one strength of the Wilcoxon Mann-Whitney is its sensitivity to shift in location compared with the power of the independent samples t test. However, when the treatment alternative is a shift in location and a change in scale, neither test is recommended.

Scale Change and Shift in Location Parameters

Scale change and shift in location are critical to a test’s ability to maintain its power properties. Sawilowsky (2002) argues that scale change and shift in location properties should not be studied in isolation, but rather together. Recognizing the imperfections of any data set specific to not only the distribution, but also to the impact of treatment/intervention on scores, it is realistic to assume that scale change and shift in location between groups may not be harmonious. It is not uncommon that a treatment effect is limited to only an increase (or decrease) in the scores or values for each “n”. What may be more common is that some scores/values may increase while others decrease. Perhaps the scores will become more spread out or heterogeneous, or they may respond by becoming more similar, bunched or homogeneous.

Therefore, the homogeneity of groups, critical to the function of the t test, may be compromised as variance of scores becomes negatively impacted. In this case, the shift in location (mean, median) may become distorted and the test no longer robust (Sawilowsky & Blair, 1992; Sawilowsky & Fahoome, 2003). What may appear as a shift in location may actually be the result of great variances between scores within the group. Ultimately the mean could demonstrate a change; however, it would not be without a great deal of internal variance. Plausibly, the change could be a combination of both scale change and shift (Sawilowsky & Blair, 1992; Hart, 2001). Accordingly, the nature of these differences will affect which test is most powerful (Hart, 2001).

Specific to change in location, a violation of homogeneity may not impact the Wilcoxon Mann-Whitney as greatly. The Wilcoxon Mann-Whitney’s process of stochastic ordering eliminates the variance between scores and is sensitive enough to detect slight shifts in location, and even when no shift exists, it can detect variance. Research conducted by Hutchinson (2002) compared three tests specific to scale change and shift in location: the t test, the Mann-Whitney-Wilcoxon, and the Lepage. His findings suggest that the independent t test had the least power and the Lepage demonstrated an advantage over the Wilcoxon.

Perhaps the most difficult part of studying the power properties of the independent samples t test and the Wilcoxon Mann-Whitney is the fact that many data sets may have multiple characteristics and/or multiple violations. For example, a data set may come from a skewed distribution and have a large but uneven number per group. In this case there may be extreme violations or perhaps only minor violations of one or multiple assumptions.

Although some characteristics of data sets are known (sample size, numbers per group, etc.), many times the distribution characteristics are unknown, leading to a violation of the underlying assumptions and potentially erroneous findings. As discussed, when violations are present the likelihood of committing either a Type I or Type II error increases. Recognizing that a small percentage (3%) of distributions in education and psychology approach normality (Micceri, 1989; Zumbo & Zimmerman, 1993), and that medical data are recognized for their skewed properties (Barber & Thompson, 1998; Bridge & Sawilowsky, 1999), the effort to establish theoretical support for test selection must continue.


CHAPTER THREE

METHODS

Monte Carlo Design

With the introduction of computers and advanced technology, assessing the robustness and power properties of statistical tests has been simplified. The study of the robustness and comparative power properties of statistical tests has evolved from the application of theoretical distributions, sample sizes and treatment effects to the use of real world distributions. In the forefront of this technology is the application of computer simulations or Monte Carlo studies. Harwell (1990) defined Monte Carlo studies as computer simulations measuring the mathematical properties of a statistical test, allowing for the simulation and control of variables under study. According to Harwell (1990), a Monte Carlo study is conducted in the following manner:

In the typical MC study of a given statistical test the following process is repeated for a large number of samples: data are simulated which reflect a specific relationship among variables (but which do not usually conform to the assumptions required for correct application of the test), the statistical test is computed for the data, and the value of the statistical test is recorded. The values of the statistical test provide information on its properties (e.g., the proportion of the “significant” values on the test). If the underlying assumption of the test were satisfied, exact would guarantee that the test would have a specified type I error rate and would permit the probability of rejecting a false statistical hypothesis to be computed. Monte Carlo studies permit these characteristics to be examined when underlying assumptions are violated (p.4).
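The process Harwell describes can be sketched in a few lines. The following is a minimal illustration only and is not the FORTRAN program used in this study. It assumes normal data, equal groups of 20, Python's built-in pseudo-random generator, a hard-coded two-tailed critical value for Student's t with 38 degrees of freedom, and the large-sample normal approximation for the Wilcoxon rank-sum statistic (ties are assumed absent). All names in the sketch are hypothetical.

```python
import math
import random

N = 20                 # scores per group (assumed; fixes df = 38 below)
T_CRIT_DF38 = 2.0244   # two-tailed .05 critical value of Student's t, df = 38
Z_CRIT = 1.96          # two-tailed .05 critical value of the standard normal

def pooled_t(x, y):
    """Independent samples t statistic with pooled variance."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def rank_sum_z(x, y):
    """Normal approximation to the Wilcoxon rank-sum statistic (no ties assumed)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(rank for rank, (_, grp) in enumerate(pooled, start=1) if grp == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w

def monte_carlo(reps, shift=0.0, seed=12345):
    """Simulate data, compute both tests, and record the proportion of rejections."""
    rng = random.Random(seed)
    t_rej = w_rej = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) + shift for _ in range(N)]
        y = [rng.gauss(0, 1) for _ in range(N)]
        t_rej += abs(pooled_t(x, y)) > T_CRIT_DF38
        w_rej += abs(rank_sum_z(x, y)) > Z_CRIT
    return t_rej / reps, w_rej / reps

# With no shift, the rejection proportions estimate the Type I error rate.
t_rate, w_rate = monte_carlo(reps=2000)
print(t_rate, w_rate)
```

Calling `monte_carlo` with a nonzero `shift` turns the same loop into a power estimate. The small number of repetitions here keeps the sketch fast; it yields far less precise estimates than a full-scale simulation.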

This study will utilize Monte Carlo techniques using a Dell L500r computer and Essential Lahey FORTRAN 90 v 4.0 software. A program will be written in the FORTRAN language to assess the power and robustness of the t test and the Wilcoxon Mann-Whitney, specific to shift in location and scale change with shift in location. Data will be generated using a pseudo-random number generator provided through the Essential Lahey FORTRAN 90 v 4.0 software. Subroutines will draw from FORTRAN programs previously developed in other Monte Carlo studies (Sawilowsky & Fahoome, 2003).

METHODOLOGY

Applying Monte Carlo techniques, the comparative power and robustness of the independent samples t test and the Wilcoxon Mann-Whitney specific to 1) change in scale and 2) change in scale with shift in location will be investigated. Each test will be compared using various distributions, and sample sizes. The following summarizes the parameters utilized.

Study Parameters

Sampling Distributions

This study is specific to small scale change and change in location; however, it is recognized that the power of the test is also impacted by the distribution. For points of comparison, three distributions were identified. Each distribution is distinguished by its skewness, kurtosis, mean, median and standard deviation. In addition to a normal (Gaussian) distribution, two distributions studied by Micceri (1989) as representative of “real world data” within education and psychology will be used. These distributions include Smooth Symmetric and Extreme Asymmetry, Achievement. Descriptions of the distributions are as follows (Sawilowsky & Blair, 1992):

1. Gaussian Distribution. This bell shaped distribution has equally weighted tails and distributions of scores. The mean and median = 0.00, standard deviation = 1.00, skewness = 0.00, and kurtosis = 3.00.

2. Smooth Symmetric Distribution. The Smooth Symmetric distribution is similar to the Gaussian distribution; however, it is distinguished by a light skew and a small variance in kurtosis from the normal distribution. This distribution has a mean = 13.91, median = 13.00, standard deviation = 4.91, skewness = 0.01 and kurtosis = 2.66. This distribution demonstrates an 11.3% variance from normal kurtosis, thus slightly platykurtic.

[Figure: histogram; x-axis “Scores” (0 to 26), y-axis “Frequency of Scores” (0 to 450)]

3. Extreme Asymmetry, Achievement Distribution. This distribution has a mean = 24.5, median = 27.00, standard deviation = 5.79, skewness = 1.64, and kurtosis = 4.11. This distribution demonstrates a 37% variance (leptokurtic) from normal kurtosis.
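The kurtosis percentages quoted in the descriptions above can be reproduced as the relative departure of each kurtosis value from the normal value of 3.00. The short sketch below shows the arithmetic; the function name is an illustrative assumption.

```python
NORMAL_KURTOSIS = 3.00

def kurtosis_departure(kurtosis):
    """Relative departure from the normal kurtosis of 3.00, as a percentage."""
    return 100.0 * (kurtosis - NORMAL_KURTOSIS) / NORMAL_KURTOSIS

distributions = [
    ("Gaussian", 3.00),
    ("Smooth Symmetric", 2.66),
    ("Extreme Asymmetry, Achievement", 4.11),
]
for name, k in distributions:
    # A negative value indicates a platykurtic shape; a positive value, leptokurtic.
    print(f"{name}: {kurtosis_departure(k):+.1f}%")
```

A negative result corresponds to the 11.3% platykurtic departure of the Smooth Symmetric distribution, and a positive result to the 37% leptokurtic departure of the Extreme Asymmetry, Achievement distribution.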

Sample Size and Nominal Alpha

An array of six equal and unequal sample sizes was selected. These include: (n1, n2) = (10, 30), (30, 10), (20, 20), (15, 45), (45, 15) and (30, 30). Sample sizes were chosen based on their representation of real world data often used in clinical trials, psychology, and education. The nominal alpha level will be set at .05. One million repetitions will be run.


Study Design

The study design will allow for a comparison of the power of the t test and Wilcoxon Mann-Whitney specific to both shift in location and scale change. Utilizing the three distributions and six sample sizes, constants will be added to study treatment effects specific to shift in location parameters of the independent samples t test and Wilcoxon Mann-Whitney. These treatment effects are a function of the standard deviation multiplied by a constant (Sawilowsky & Fahoome, 2003). Constants 0-.2 (increasing in increments of .05) will be added in separate runs rendering a p-value. The obtained p-values for each distribution will then be used to compare the power of the test statistics by subtracting the larger p-value from the smaller p-value. Additionally, to assess the robustness (Type I error) of the two tests, actual alpha will be compared with the nominal alpha of .05.
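As a minimal sketch of the shift treatment described above, the amount added to each treatment score can be written as the constant multiplied by the standard deviation of the distribution. The standard deviations are those reported for the three distributions; the function names and the example scores are hypothetical.

```python
# Standard deviations reported for the three study distributions.
SD = {
    "Gaussian": 1.00,
    "Smooth Symmetric": 4.91,
    "Extreme Asymmetry, Achievement": 5.79,
}
# Constants 0 to .2 in increments of .05.
CONSTANTS = [0.00, 0.05, 0.10, 0.15, 0.20]

def shift_amount(sd, c):
    """Treatment effect expressed as a constant multiplied by the standard deviation."""
    return c * sd

def apply_shift(scores, sd, c):
    """Add the same shift to every score in the treatment group."""
    return [s + shift_amount(sd, c) for s in scores]

for name, sd in SD.items():
    shifts = [round(shift_amount(sd, c), 4) for c in CONSTANTS]
    print(name, shifts)

# Hypothetical treatment scores shifted by .10 standard deviations.
print(apply_shift([10, 12], SD["Extreme Asymmetry, Achievement"], 0.10))
```

Expressing the shift in standard deviation units keeps the treatment effect comparable across distributions whose raw score scales differ.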

The shape of various distributions and violations of the assumption of homoscedasticity could result in scale change without a shift in location (Sawilowsky and Fahoome, 2002). Therefore, it is important to look at the impact of small treatment effects, where the assumption of homogeneity is not in question, but where distribution of scores or variance within scores may be impacted. This part of the study will look at the comparative power of the independent samples t test and the Wilcoxon Mann-Whitney in detecting the impact of scale change on shift in location. The ratio of variance for group one and group two will range from 1.0-1.2 (increasing in increments of .05) for homogeneous variance with shift and heterogeneous variance, no shift. Sizes of 0.0–1.1 (increasing in increments of .05) will be used where shift and scale change are both present (see appendix 1). Treatment scores will be converted to z scores and multiplied by the treatment effect.

One million repetitions will be performed for each distribution, sample size, and treatment effect. Power comparisons will be conducted by subtracting the more powerful statistic from the less powerful statistic. Robustness of the two tests will also be assessed for each condition. In addition to narratives, charts and bar diagrams will be used to illustrate the comparisons.
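One plausible reading of the z score step described above is sketched below: scores are standardized, the z scores are multiplied by a scale factor, and the result is mapped back to the original metric so that the mean is preserved while the spread changes. This is an assumption made for illustration; the text does not specify whether the factor applies to the standard deviation or to the variance, and the function name and scores are hypothetical.

```python
from statistics import mean, pstdev

def rescale(scores, k):
    """Change the scale of a group by the factor k while preserving its mean.

    Assumption: k multiplies the standard deviation (not the variance).
    """
    m, s = mean(scores), pstdev(scores)
    # Convert each score to a z score, stretch it by k, and map it back.
    return [m + k * s * ((x - m) / s) for x in scores]

# Hypothetical scores for one group.
scores = [8, 10, 12, 14, 16]
rescaled = rescale(scores, 1.2)

print(mean(scores), mean(rescaled))          # the mean is unchanged
print(pstdev(rescaled) / pstdev(scores))     # the spread grows by the factor k
```

If the factor were instead meant as a ratio of variances, the square root of the ratio would be used in place of `k` under the same construction.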


CHAPTER FOUR

RESULTS

Using a Monte Carlo simulation, the Type I and Type II error rates of the t test and Wilcoxon were examined, controlling for scale change/variance and shift in location. The results produced outcomes that support previous findings and offer new insights specific to the power and robustness of the t test and Wilcoxon. Three populations were studied: one theoretical distribution (Gaussian) and two real prototypical data sets found in education and psychology (Smooth Symmetric and Extreme Asymmetry, Achievement; Micceri, 1989). Six equal and unequal sample sizes were used: (n1, n2) = (10, 30), (30, 10), (20, 20), (15, 45), (45, 15), (30, 30). Further controlling for change in scale and shift in location, both homogeneity and heterogeneity were studied. Incrementing equal and unequal amounts of variance to n1 and n2, a slight shift in location was added to both the upper and lower tails for a total of 25 study combinations for each of the six sample sizes. The outcomes were organized into 450 tables by distribution. Listed below are the tables (Tables 1-12) used to illustrate the findings in the conclusion.


Table 1. Rejections (10^-6) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

            t test            Wilcoxon
σ1    σ2    L=.025   U=.025   L=.025   U=.025
1     1     25344    24692    25211    24717
1     1.05  22584    22212    23499    23301
1     1.1   20166    19928    22010    21845
1     1.15  18207    17945    20963    20674
1     1.2   16134    16326    19338    19781
1.05  1     27617    27765    26390    26467
1.05  1.05  25141    25183    25056    25095
1.05  1.1   22273    22225    23258    23325
1.05  1.15  20288    20343    22214    22095
1.05  1.2   18484    18157    20893    20897
1.1   1     30697    30716    28088    28083
1.1   1.05  28000    27738    26708    26324
1.1   1.1   25098    24970    25026    25087
1.1   1.15  22798    22596    23819    23511
1.1   1.2   20405    20249    22250    22103
1.15  1     33819    33417    29964    29666
1.15  1.05  30575    30670    28032    28123
1.15  1.1   27401    27748    26208    26662
1.15  1.15  24761    24886    24647    24841
1.15  1.2   22790    22776    23650    23588
1.2   1     36292    36588    30969    31187
1.2   1.05  33411    33234    29508    29376
1.2   1.1   30070    30109    27774    27595
1.2   1.15  27560    27533    26509    26392
1.2   1.2   25108    24898    25001    24729


Table 2. Rejections (10^-6) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions.

            t test            Wilcoxon
σ1    σ2    L=.025   U=.025   L=.025   U=.025
1     1     24981    24737    24672    24417
1     1.05  24996    25116    24519    24539
1     1.1   24953    24954    24542    24593
1     1.15  25193    24846    24892    24549
1     1.2   24856    25150    24525    24718
1.05  1     25031    24961    24593    24472
1.05  1.05  25025    25029    24554    24741
1.05  1.1   25056    25085    24716    24722
1.05  1.15  25041    24920    24602    24617
1.05  1.2   25051    25028    24868    24472
1.1   1     24877    25235    24584    24977
1.1   1.05  25070    25209    24579    24559
1.1   1.1   24851    25211    24350    24752
1.1   1.15  25105    24790    24718    24513
1.1   1.2   25161    24782    24758    24532
1.15  1     25102    24998    24954    24675
1.15  1.05  25076    24622    24617    24408
1.15  1.1   25278    25177    24677    24852
1.15  1.15  25081    24841    24575    24446
1.15  1.2   24942    24980    24574    24577
1.2   1     25153    25159    24979    25066
1.2   1.05  25301    25080    24812    24739
1.2   1.1   25134    24706    24756    24452
1.2   1.15  25144    24904    24689    24490
1.2   1.2   24835    24855    24521    24333


Table 3. Rejections (10^-6) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions.

            t test            Wilcoxon
σ1    σ2    L=.025   U=.025   L=.025   U=.025
1     1     24980    25148    25459    25598
1     1.05  22322    22084    23866    23640
1     1.1   20089    20098    22701    22686
1     1.15  17950    18007    21197    21159
1     1.2   16289    16159    20250    20158
1.05  1     27861    27965    27041    27202
1.05  1.05  24863    25147    25327    25515
1.05  1.1   22687    22528    24206    24100
1.05  1.15  20007    20059    22397    22488
1.05  1.2   18521    18002    21581    21220
1.1   1     30736    30680    28723    28646
1.1   1.05  27568    27919    27004    27120
1.1   1.1   24753    25089    25147    25390
1.1   1.15  22504    22513    23963    24095
1.1   1.2   20602    20506    22880    22836
1.15  1     33488    33592    30059    30133
1.15  1.05  30669    30417    28666    28376
1.15  1.1   27824    27665    27123    27062
1.15  1.15  25098    25372    25600    25738
1.15  1.2   22348    22817    23683    24329
1.2   1     36842    36605    32108    31738
1.2   1.05  33318    33140    30108    29875
1.2   1.1   30005    30194    28304    28238
1.2   1.15  27332    27573    26898    26735
1.2   1.2   25230    25198    25539    25623


Table 4. Rejections (10^-6) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

            t test            Wilcoxon
σ1    σ2    L=.025   U=.025   L=.025   U=.025
1     1     24991    25016    24980    24815
1     1.05  21374    23409    20411    22492
1     1.1   18297    21895    20371    22462
1     1.15  15669    20657    18877    22344
1     1.2   13461    19511    18153    21122
1.05  1     29022    26654    30390    27875
1.05  1.05  24991    25016    24980    24815
1.05  1.1   21533    23487    20411    22492
1.05  1.15  18585    22036    20365    22488
1.05  1.2   16062    20825    19628    22093
1.1   1     33307    28283    30460    27914
1.1   1.05  28816    26562    30390    27875
1.1   1.1   24991    25016    24980    24815
1.1   1.15  21693    23564    20411    22492
1.1   1.2   18875    22171    20401    22482
1.15  1     37750    29927    32609    27968
1.15  1.05  32874    28128    30467    27873
1.15  1.1   28647    26482    30390    27875
1.15  1.15  24991    25016    24980    24815
1.15  1.2   21810    23618    20411    22492
1.2   1     42427    31536    33684    29487
1.2   1.05  37099    29720    31497    28310
1.2   1.1   32493    27998    30410    27889
1.2   1.15  28473    26424    30390    27875
1.2   1.2   24991    25016    24980    24815


Table 5. Rejections (10^-6) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions.

            t test            Wilcoxon
σ1    σ2    L=.025   U=.025   L=.025   U=.025
1     1     25129    25044    25490    25182
1     1.05  23891    26325    25292    28052
1     1.1   22732    27656    25287    28072
1     1.15  21730    29021    24392    29116
1     1.2   20828    30344    24779    28931
1.05  1     26448    23778    28150    24920
1.05  1.05  25129    25044    25490    25182
1.05  1.1   23945    26251    25292    28052
1.05  1.15  22834    27539    25254    28091
1.05  1.2   21871    28830    25051    28364
1.1   1     27821    22615    28163    24908
1.1   1.05  26388    23827    28150    24920
1.1   1.1   25129    25044    25490    25182
1.1   1.15  24008    26191    25292    28052
1.1   1.2   22932    27432    25286    28061
1.15  1     29191    21566    29277    24043
1.15  1.05  27681    22737    28189    24884
1.15  1.1   26333    23876    28150    24920
1.15  1.15  25129    25044    25490    25182
1.15  1.2   24055    26136    25292    28052
1.2   1     30564    20689    29137    24462
1.2   1.05  28980    21704    28482    24658
1.2   1.1   27568    22845    28154    24915
1.2   1.15  26278    23934    28150    24920
1.2   1.2   25129    25044    25490    25182


Table 6. Rejections (10^-6) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions.

            t test            Wilcoxon
σ1    σ2    L=.025   U=.025   L=.025   U=.025
1     1     25281    25171    25366    25033
1     1.05  21303    23817    21043    23413
1     1.1   18046    22634    20981    23388
1     1.15  15196    21473    19207    23550
1     1.2   12851    20441    18591    22196
1.05  1     29683    26560    31707    28199
1.05  1.05  25281    25171    25366    25033
1.05  1.1   21457    23877    21043    23413
1.05  1.15  18343    22726    20970    23419
1.05  1.2   15581    21639    20141    23109
1.1   1     34375    27876    31781    28226
1.1   1.05  29474    26489    31707    28199
1.1   1.1   25281    25171    25366    25033
1.1   1.15  21607    23933    21043    23413
1.1   1.2   18632    22833    21017    23403
1.15  1     39270    29278    34115    28087
1.15  1.05  33888    27755    31794    28184
1.15  1.1   29276    26421    31707    28199
1.15  1.15  25281    25171    25366    25033
1.15  1.2   21757    23989    21043    23413
1.2   1     44318    30582    35057    29722
1.2   1.05  38516    29066    32826    28589
1.2   1.1   33464    27643    31737    28205
1.2   1.15  29109    26361    31707    28199
1.2   1.2   25281    25171    25366    25033


Table 7. Rejections (10^-6) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

            t test            Wilcoxon
σ1    σ2    L=.025   U=.025   L=.025   U=.025
1     1     31540    15464    25355    24800
1     1.05  29804    12383    39795    8410
1     1.1   28193    9957     38941    8447
1     1.15  26765    8044     37764    8530
1     1.2   25488    6531     36436    8658
1.05  1     33441    19073    17143    60291
1.05  1.05  31540    15464    25355    24800
1.05  1.1   29894    12533    39795    8410
1.05  1.15  28355    10169    39084    8446
1.05  1.2   26964    8312     37854    8527
1.1   1     35223    23066    17553    59912
1.1   1.05  33352    18898    17143    60291
1.1   1.1   31540    15464    25355    24800
1.1   1.15  29964    12640    39795    8410
1.1   1.2   28485    10356    39142    8440
1.15  1     37039    27444    18188    59334
1.15  1.05  35042    22647    17529    59977
1.15  1.1   33279    18745    17143    60291
1.15  1.15  31540    15464    25355    24800
1.15  1.2   30021    12750    39795    8410
1.2   1     38799    32039    18961    58662
1.2   1.05  36781    26811    18142    59375
1.2   1.1   34882    22308    17452    59999
1.2   1.15  33203    18601    17143    60291
1.2   1.2   31540    15464    25355    24800


Table 8. Rejections (10^-6) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions.

            t test            Wilcoxon
σ1    σ2    L=.025   U=.025   L=.025   U=.025
1     1     24566    24604    25337    25172
1     1.05  25966    23273    56928    11616
1     1.1   27327    22160    56135    11813
1     1.15  28760    21129    54974    12140
1     1.2   30086    20254    53633    12531
1.05  1     23287    25984    11670    56365
1.05  1.05  24566    24604    25337    25172
1.05  1.1   25896    23336    56928    11616
1.05  1.15  27187    22259    56178    11776
1.05  1.2   28573    21290    55067    12103
1.1   1     22191    27382    11890    55600
1.1   1.05  23348    25923    11670    56365
1.1   1.1   24566    24604    25337    25172
1.1   1.15  25826    23386    56928    11616
1.1   1.2   27063    22350    56331    11761
1.15  1     21166    28744    12202    54536
1.15  1.05  22297    27236    11856    55634
1.15  1.1   23388    25872    11670    56365
1.15  1.15  24566    24604    25337    25172
1.15  1.2   25771    23457    56928    11616
1.2   1     20251    30112    12598    53265
1.2   1.05  21309    28567    12177    54629
1.2   1.1   22377    27127    11846    55795
1.2   1.15  23455    25813    11670    56365
1.2   1.2   24566    24604    25337    25172


Table 9. Rejections (10^-6) for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions.

            t test            Wilcoxon
σ1    σ2    L=.025   U=.025   L=.025   U=.025
1     1     30580    18055    25593    25015
1     1.05  28603    14863    47749    6946
1     1.1   26879    12324    46577    7015
1     1.15  25273    10214    44995    7147
1     1.2   23881    8488     43250    7331
1.05  1     32592    21673    14656    69801
1.05  1.05  30580    18055    25593    25015
1.05  1.1   28690    14987    47749    6946
1.05  1.15  27005    12529    46730    6996
1.05  1.2   25487    10482    45113    7136
1.1   1     34675    25595    15111    69156
1.1   1.05  32488    21486    14656    69801
1.1   1.1   30580    18055    25593    25015
1.1   1.15  28778    15110    47749    6946
1.1   1.2   27136    12724    46832    6991
1.15  1     36652    29815    15760    68141
1.15  1.05  34451    25223    15080    69204
1.15  1.1   32412    21328    14656    69801
1.15  1.15  30580    18055    25593    25015
1.15  1.2   28851    15249    47749    6946
1.2   1     38546    34080    16573    67034
1.2   1.05  36374    29224    15715    68224
1.2   1.1   34276    24841    15010    69300
1.2   1.15  32326    21180    14656    69801
1.2   1.2   30580    18055    25593    25015


Table 10. Rejections (10^-6) for Shift (L,U) = (0.5,.0) for t and Wilcoxon, Gaussian distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

            t test            Wilcoxon
σ1    σ2    L=.025   U=.025   L=.025   U=.025
1     1     34045    18432    33786    18262
1     1.05  30634    16455    31630    17392
1     1.1   27449    14733    29509    16277
1     1.15  24621    13117    27910    15350
1     1.2   22338    11828    26386    14550
1.05  1     36872    20301    35004    19497
1.05  1.05  33257    18273    32858    18380
1.05  1.1   30148    16408    31162    17393
1.05  1.15  27241    14840    29310    16459
1.05  1.2   24709    13577    27790    15554
1.1   1     40407    23082    36810    21259
1.1   1.05  36450    20604    34601    19664
1.1   1.1   33286    18429    32835    18451
1.1   1.15  29785    16808    30729    17757
1.1   1.2   27369    15445    29626    17013
1.15  1     43792    25494    38660    22666
1.15  1.05  39258    23107    36019    21313
1.15  1.1   35796    20753    34166    20065
1.15  1.15  32600    19004    32157    19038
1.15  1.2   29657    17370    30238    18119
1.2   1     47072    28299    39975    24028
1.2   1.05  42641    25642    37508    22893
1.2   1.1   39023    23344    35800    21607
1.2   1.15  35348    21317    33765    20437
1.2   1.2   32516    19245    32117    19023


Table 11. Rejections (10^-6) for Shift (L,U) = (0.5,.0) for t and Wilcoxon, Smooth Symmetric distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

            t test            Wilcoxon
σ1    σ2    L=.025   U=.025   L=.025   U=.025
1     1     32319    19014    44536    13203
1     1.05  27693    17817    35395    14486
1     1.1   23804    16768    27838    16671
1     1.15  20498    15826    23730    18092
1     1.2   17609    14942    23227    16436
1.05  1     36984    20511    42831    15287
1.05  1.05  31936    19257    44536    13203
1.05  1.1   27549    18093    35395    14486
1.05  1.15  23859    17085    27797    16820
1.05  1.2   20689    16144    23730    18092
1.1   1     41958    22066    41665    17914
1.1   1.05  36357    20726    42831    15287
1.1   1.1   31587    19485    44536    13203
1.1   1.15  27426    18352    35395    14486
1.1   1.2   23901    17388    27802    16820
1.15  1     47029    23574    42111    20259
1.15  1.05  41023    22194    40908    18019
1.15  1.1   35760    20906    42831    15287
1.15  1.15  31226    19714    44536    13203
1.15  1.2   27308    18609    35395    14486
1.2   1     52096    25114    41048    23651
1.2   1.05  45858    23632    42185    20096
1.2   1.1   40176    22312    40908    18019
1.2   1.15  35246    21075    42831    15287
1.2   1.2   30961    19911    44536    13203

Table 12. Rejections (10-6) for Shift (L,U) = (0.5,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, various ratios of 1 & 2, sample size (n1,n2) = (10,30),  = .05; 1,000,000 repetitions. t test Wilcoxon

1 2 L=.025 U=.025 L=.025 U=.025 1 1 41094 9562 51118 7186 1 1.05 38702 7592 48152 7345 1 1.1 36589 6069 44949 7626 1 1.15 34647 4907 42598 7818 1 1.2 32838 3968 54659 3362 1.05 1 42978 12259 51386 7179 1.05 1.05 40574 9817 51118 7186 1.05 1.1 38326 7847 48152 7345 1.05 1.15 36343 6352 44949 7626 1.05 1.2 34484 5186 42651 7816 1.1 1 44927 15332 44158 26154 1.1 1.05 42400 12390 51300 7181 1.1 1.1 40110 10005 51118 7186 1.1 1.15 38010 8123 48152 7345 1.1 1.2 36125 6634 45028 7623 1.15 1 46790 18734 51412 22727 1.15 1.05 44223 15336 44310 25882 1.15 1.1 41868 12524 51207 7185 1.15 1.15 39722 10197 51118 7186 1.15 1.2 37719 8358 48152 7345 1.2 1 48597 22445 55438 21000 1.2 1.05 45961 18594 51471 22634 1.2 1.1 43565 15339 44311 25880 1.2 1.15 41384 12651 51168 7185 1.2 1.2 39351 10399 51118 7186

Homogeneity of Variance with Shift in Location

Equal shift, equal sample size

Equal sample sizes (n1=20, n2=20; n1=30, n2=30) and equal amounts of shift (.05, .10, .15, and .20) were studied. With the Gaussian distribution, the t test maintained a slight power advantage over the Wilcoxon across all additions of shift. This advantage held for both sample sizes studied; the t test rejected 4.97-5.01% of the time whereas the Wilcoxon rejected 4.89-4.97%. The advantage held by the t test was not maintained with the Smooth Symmetric distribution, but, as expected, the rejection rates were very similar. The range of rejection for the t test was 4.97-5.00%, whereas the range for the Wilcoxon was 5.02-5.50%.

Continuing with equal sample sizes and equal amounts of shift, slight differences were noted with the Extreme Asymmetric, Achievement distribution. The t test and Wilcoxon rejection ranges were 4.92-4.89% and 5.01-5.50%, respectively (see Table 13).

Equal shift, unequal sample size

Unequal sample sizes (n1=10, n2=30; n1=30, n2=10; n1=15, n2=45; n1=45, n2=15) and equal amounts of shift (.05, .10, .15, and .20) were studied. With the Gaussian distribution, the range of rejection for the t test was 4.96-5.03%, whereas the Wilcoxon rejected 4.92-5.12%. The Smooth Symmetric distribution produced rejection rates similar to those of the Gaussian distribution: the t test rejected 4.99-5.05% whereas the Wilcoxon rejected 4.98-5.04%. With the Extreme Asymmetric, Achievement distribution, the t test rejected at 4.70-4.86%, whereas the Wilcoxon rejected at a higher rate, 5.01-5.07%.

Table 13: Range of rejection for change in location/shift (.05, .1, .15, .2) with equal and unequal sample sizes.

Distribution                      Test       Equal sample sizes   Unequal sample sizes
Gaussian                          t test     4.91-5.01%           4.96-5.03%
                                  Wilcoxon   4.89-4.97%           4.92-5.12%
Smooth Symmetric                  t test     4.97-5.00%           4.99-5.05%
                                  Wilcoxon   5.02-5.50%           4.98-5.04%
Extreme Asymmetric, Achievement   t test     4.92-4.89%           4.70-4.86%
                                  Wilcoxon   5.01-5.50%           5.01-5.07%

Unequal shift, equal sample sizes

Using equal sample sizes (n1=20, n2=20; n1=30, n2=30) and unequal amounts of shift (constant .0-.20 (.05)), power was studied. With the Gaussian distribution, the t test maintained a slight power advantage over the Wilcoxon across all additions of shift. The t test rejected 5.27-11.87% whereas the Wilcoxon rejected 5.16-11.45%. The advantage held by the t test was inconsistent with the Smooth Symmetric distribution: the t test rejected 5.22-9.88% whereas the Wilcoxon rejected 6.33-6.75%. With the Extreme Asymmetric, Achievement distribution, the t test rejected 5.20-12.13%, with the Wilcoxon maintaining a large power advantage with rejection rates ranging from 7.77-34.52% (see Table 14).

Unequal shift, unequal sample size

With unequal sample sizes (n1=10, n2=30; n1=30, n2=10; n1=15, n2=45; n1=45, n2=15) and unequal amounts of shift (constant .0-.20 (.05)), the power differential was apparent. With the Gaussian distribution, the t test maintained a slight power advantage: the t test rejected 5.20-10.16% whereas the Wilcoxon rejected 8.10-10.01%. The advantage held by the t test was maintained with the Smooth Symmetric distribution: the t test rejected 5.13-8.67% and the Wilcoxon rejected 5.76-6.37%. With the Extreme Asymmetric, Achievement distribution, the Wilcoxon rejected at a higher rate (5.82-29.02%) than the t test (4.75-10.39%) (see Table 14).

Table 14: Range of rejection for change in location/shift (.0-.20 (.05)) with equal and unequal sample sizes.

Distribution                      Test       Equal sample sizes   Unequal sample sizes
Gaussian                          t test     5.27-11.87%          5.20-10.16%
                                  Wilcoxon   5.16-11.45%          8.10-10.01%
Smooth Symmetric                  t test     5.22-9.88%           5.13-8.67%
                                  Wilcoxon   6.33-6.75%           5.76-6.37%
Extreme Asymmetric, Achievement   t test     5.20-12.13%          4.75-10.39%
                                  Wilcoxon   7.77-34.52%          5.82-29.02%

Heterogeneity of Variance with No Shift

Gaussian Distribution

Equal variance, equal sample size

Specific to the Gaussian distribution, the findings indicate that, with no shift added, the t test and Wilcoxon reject differentially depending on sample size equality/inequality and the weighted variance between groups. Scale change produced similar rejection rates when equal variance (1.0,1.0; 1.05,1.05; 1.1,1.1; 1.15,1.15; 1.2,1.2) was added to each side. This finding was consistent across all six sample sizes. When the upper and lower tails are added together, the t test has an average rejection rate of 4.95% (range 4.96-5.03) and the Wilcoxon has a 4.91% rejection rate (range 4.94-5.01) (see Tables 15-20).
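To judge whether a difference between two tabled rejection rates exceeds simulation noise, the Monte Carlo standard error of a rejection rate can be computed. The following Python sketch assumes that each repetition is an independent Bernoulli outcome; the function name is hypothetical and is not part of the program used in this study.

```python
import math


def mc_standard_error(p: float, reps: int) -> float:
    """Binomial standard error of a rejection rate estimated from `reps` repetitions."""
    return math.sqrt(p * (1.0 - p) / reps)


# Nominal alpha of .05 with 1,000,000 repetitions, as used throughout the tables.
se = mc_standard_error(0.05, 1_000_000)
print(f"standard error as a proportion: {se:.6f}")
print(f"standard error in rejections per million: {se * 1_000_000:.0f}")
```

Under this assumption the standard error is roughly 218 rejections per million, so differences between tabled counts of only a few hundred are within a few standard errors of zero.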

Unequal variance, equal sample size

With unequal variance, the performance of each test comes into question. With equal sample sizes (n1, n2 = 20,20; 30,30), each test demonstrates internal consistency. As illustrated in Table 15, the rejection rates of the two tests were close; however, the t test rejected at a slightly higher rate than the Wilcoxon across all increments of variance with sample sizes (n1, n2 = 20,20). With slightly larger sample sizes (n1, n2 = 30,30) and larger differences in variance (σ1, σ2 = 1, 1.2; 1.2, 1), the Wilcoxon outperformed the t test (see Table 16).
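The entries in Tables 15 and 16 are counts of rejections per 1,000,000 repetitions, and the Power Difference column is the Wilcoxon count minus the t test count. A minimal Python sketch of that arithmetic follows, using the first row of Table 15; the function names are hypothetical.

```python
def to_percent(count: int, reps: int = 1_000_000) -> float:
    """Convert a rejection count into a percentage of all repetitions."""
    return 100.0 * count / reps


def power_difference(t_count: int, wilcoxon_count: int) -> int:
    """Wilcoxon rejections minus t test rejections (negative favors the t test)."""
    return wilcoxon_count - t_count


# First row of Table 15: sigma1 = 1, sigma2 = 1, (n1, n2) = (20, 20).
t_count, wilcoxon_count = 49718, 49089
print(f"t test:   {to_percent(t_count):.4f}%")
print(f"Wilcoxon: {to_percent(wilcoxon_count):.4f}%")
print(f"Power difference: {power_difference(t_count, wilcoxon_count)}")
```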

Table 15. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions.

2 t test Wilcoxon Power 1 Difference 1 1 49718 49089 -629 1 1.05 50112 49058 -1054 1 1.1 49907 49135 -772 1 1.15 50039 49441 -598 1 1.2 50006 49243 -763 1.05 1 49992 49065 -927 1.05 1.05 50054 49295 -759 1.05 1.1 50141 49438 -703 1.05 1.15 49961 49219 -742 1.05 1.2 50079 49340 -739 1.1 1 50112 49561 -551 1.1 1.05 50279 49138 -1141 1.1 1.1 50062 49102 -960 1.1 1.15 49895 49231 -664 1.1 1.2 49943 49290 -653 1.15 1 50100 49629 -471 1.15 1.05 49698 49025 -673 1.15 1.1 50455 49529 -926 1.15 1.15 49922 49021 -901 1.15 1.2 49922 49151 -771 1.2 1 50312 50045 -267 1.2 1.05 50381 49551 -830 1.2 1.1 49840 49208 -632 1.2 1.15 50048 49179 -869 1.2 1.2 49690 48854 -836

Table 16. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (30,30), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 49953 49371 -582 1 1.05 49706 49526 -180 1 1.1 49956 49913 -43 1 1.15 50060 49848 -212 1 1.2 50193 50243 50 1.05 1 50052 49742 -310 1.05 1.05 50015 49789 -226 1.05 1.1 50336 49801 -535 1.05 1.15 50181 49980 -201 1.05 1.2 50207 50016 -191 1.1 1 50149 49770 -379 1.1 1.05 49742 49472 -270 1.1 1.1 50206 49608 -598 1.1 1.15 49831 49174 -657 1.1 1.2 49984 49617 -367 1.15 1 49795 49740 -55 1.15 1.05 50172 49753 -419 1.15 1.1 50108 49569 -539 1.15 1.15 50261 49772 -489 1.15 1.2 49824 49401 -423 1.2 1 50331 50547 216 1.2 1.05 50216 49371 -845 1.2 1.1 50290 49526 -764 1.2 1.15 49978 49913 -65 1.2 1.2 50296 49848 -448

Unequal variance and Unequal sample size

Unequal sample sizes (n1, n2 = 10,30; 30,10; 15,45; 45,15) combined with unequal variance increased the rejection rates of the t test and Wilcoxon, although with less consistency than with equal sample sizes. As noted in Tables 17 and 18, as the variance increased and each group became more heterogeneous, the rejection rates for the t test increased at a greater rate when the variance weight was greater in the lower tail. Under some conditions, specifically as the sides increased in variance and sample size, the Wilcoxon outperformed the t test (see Tables 19 & 20).
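The kind of experiment summarized in Tables 17-20 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program used in this study: the scale values are treated as standard deviation multipliers, the Wilcoxon test uses the large-sample normal approximation without a continuity correction, the t critical value is hard-coded for 38 degrees of freedom (so it is valid only when n1 + n2 = 40), and the number of repetitions is far smaller than 1,000,000. All function names are hypothetical.

```python
import math
import random

T_CRIT_DF38 = 2.0244  # assumed two-sided .05 critical value of Student's t, 38 df
Z_CRIT = 1.96         # two-sided .05 critical value of the standard normal


def pooled_t(x, y):
    """Independent samples t statistic with pooled variance."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def rank_sum_z(x, y):
    """Normal approximation to the Wilcoxon rank-sum statistic (continuous data, no ties)."""
    n1, n2 = len(x), len(y)
    labeled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(rank for rank, (_, group) in enumerate(labeled, start=1) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def rejection_rates(n1, n2, sd1, sd2, reps=5000, seed=1):
    """Two-sided rejection rates of both tests for Gaussian data with no shift."""
    rng = random.Random(seed)
    t_rejections = w_rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        t_rejections += abs(pooled_t(x, y)) > T_CRIT_DF38
        w_rejections += abs(rank_sum_z(x, y)) > Z_CRIT
    return t_rejections / reps, w_rejections / reps


# Larger scale in the smaller group, then larger scale in the larger group.
t_small, w_small = rejection_rates(10, 30, 1.2, 1.0)
t_large, w_large = rejection_rates(10, 30, 1.0, 1.2)
print(f"sd = (1.2, 1.0): t = {t_small:.4f}, Wilcoxon = {w_small:.4f}")
print(f"sd = (1.0, 1.2): t = {t_large:.4f}, Wilcoxon = {w_large:.4f}")
```

With so few repetitions the estimates are noisy, but the direction of the effect should resemble Table 17: both tests reject more often when the larger scale is paired with the smaller sample, and the t test moves further from the nominal level than the Wilcoxon.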

Table 17. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 50036 49928 -108 1 1.05 44796 46800 2004 1 1.1 40094 43855 3761 1 1.15 36152 41637 5485 1 1.2 32460 39119 6659 1.05 1 55382 52857 -2525 1.05 1.05 50324 50151 -173 1.05 1.1 44498 46583 2085 1.05 1.15 40631 44309 3678 1.05 1.2 36641 41790 5149 1.1 1 61413 56171 -5242 1.1 1.05 55738 53032 -2706 1.1 1.1 50068 50113 45 1.1 1.15 45394 47330 1936 1.1 1.2 40654 44353 3699 1.15 1 67236 59630 -7606 1.15 1.05 61245 56155 -5090 1.15 1.1 55149 52870 -2279 1.15 1.15 49647 49488 -159 1.15 1.2 45566 47238 1672 1.2 1 72880 62156 -10724 1.2 1.05 66645 58884 -7761 1.2 1.1 60179 55369 -4810 1.2 1.15 55093 52901 -2192 1.2 1.2 50006 49730 -276

Table 18. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (30,10), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 49906 49563 -343 1 1.05 56052 53287 -2765 1 1.1 61299 56129 -5170 1 1.15 67114 59129 -7985 1 1.2 72965 61787 -11178 1.05 1 44675 46576 1901 1.05 1.05 50041 49972 -69 1.05 1.1 55554 52917 -2637 1.05 1.15 60846 56058 -4788 1.05 1.2 66499 59026 -7473 1.1 1 40117 44158 4041 1.1 1.05 44671 46720 2049 1.1 1.1 50291 50009 -282 1.1 1.15 55089 52783 -2306 1.1 1.2 60621 55854 -4767 1.15 1 36063 41599 5536 1.15 1.05 40654 44399 3745 1.15 1.1 45472 47125 1653 1.15 1.15 50344 50173 -171 1.15 1.2 54630 52086 -2544 1.2 1 32374 39086 6712 1.2 1.05 36582 41714 5132 1.2 1.1 40998 44484 3486 1.2 1.15 45032 47017 1985 1.2 1.2 50068 49764 -304

Table 19. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 50128 51057 929 1 1.05 44406 47506 3100 1 1.1 40187 45387 5200 1 1.15 35957 42356 6399 1 1.2 32448 40408 7960 1.05 1 55826 54243 -1583 1.05 1.05 50010 50842 832 1.05 1.1 45215 48306 3091 1.05 1.15 40066 44885 4819 1.05 1.2 36523 42801 6278 1.1 1 61416 57369 -4047 1.1 1.05 55487 54124 -1363 1.1 1.1 49842 50537 695 1.1 1.15 45017 48058 3041 1.1 1.2 41108 45716 4608 1.15 1 67080 60192 -6888 1.15 1.05 61086 57042 -4044 1.15 1.1 55489 54185 -1304 1.15 1.15 50470 51338 868 1.15 1.2 45165 48012 2847 1.2 1 73447 63846 -9601 1.2 1.05 66458 59983 -6475 1.2 1.1 60199 56542 -3657 1.2 1.15 54905 53633 -1272 1.2 1.2 50428 51162 734

Table 20. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Gaussian distribution, sample size (n1,n2) = (45,15), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 49949 50884 935 1 1.05 55562 53907 -1655 1 1.1 61480 57224 -4256 1 1.15 67504 60489 -7015 1 1.2 73287 63304 -9983 1.05 1 44571 47869 3298 1.05 1.05 50070 50793 723 1.05 1.1 55335 53687 -1648 1.05 1.15 60912 57100 -3812 1.05 1.2 66860 60272 -6588 1.1 1 40273 45380 5107 1.1 1.05 44808 48044 3236 1.1 1.1 50323 51257 934 1.1 1.15 55416 54195 -1221 1.1 1.2 60361 56551 -3810 1.15 1 36103 42541 6438 1.15 1.05 40637 45426 4789 1.15 1.1 45116 48145 3029 1.15 1.15 49982 51037 1055 1.15 1.2 55232 53573 -1659 1.2 1 32208 40170 7962 1.2 1.05 36429 42987 6558 1.2 1.1 40704 45514 4810 1.2 1.15 45441 48398 2957 1.2 1.2 49862 50687 825

Smooth Symmetric Distribution

Unequal variance and equal sample sizes

The Smooth Symmetric distribution is markedly similar to the Gaussian distribution and is the real data set identified by Micceri (1989) that most resembles normality. Review of these outcomes demonstrates similarities to and differences from the Gaussian distribution. As with the Gaussian, the tests maintain internal consistency, with the t test rejecting 5.0-5.1% of the time and the Wilcoxon rejecting 5.0-5.3% of the time. With equal sample sizes, the differences include slightly higher rejection rates for the Wilcoxon. The Wilcoxon rejects consistently across all increments of variance as compared to the Gaussian (see Tables 21 and 22).

Table 21. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 50173 50672 499 1 1.05 50216 53344 3128 1 1.1 50388 53359 2971 1 1.15 50751 53508 2757 1 1.2 51172 53710 2538 1.05 1 50226 53070 2844 1.05 1.05 50173 50672 499 1.05 1.1 50196 53344 3148 1.05 1.15 50373 53345 2972 1.05 1.2 50701 53415 2714 1.1 1 50436 53071 2635 1.1 1.05 50215 53070 2855 1.1 1.1 50173 50672 499 1.1 1.15 50199 53344 3145 1.1 1.2 50364 53347 2983 1.15 1 50757 53320 2563 1.15 1.05 50418 53073 2655 1.15 1.1 50209 53070 2861 1.15 1.15 50173 50672 499 1.15 1.2 50191 53344 3153 1.2 1 51253 53599 2346 1.2 1.05 50684 53140 2456 1.2 1.1 50413 53069 2656 1.2 1.15 50212 53070 2858 1.2 1.2 50173 50672 499

Table 22. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (30,30), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 49772 50158 386 1 1.05 49899 52206 2307 1 1.1 50270 52183 1913 1 1.15 50656 52517 1861 1 1.2 51306 52693 1387 1.05 1 49881 52079 2198 1.05 1.05 49772 50158 386 1.05 1.1 49886 52206 2320 1.05 1.15 50261 52188 1927 1.05 1.2 50531 52349 1818 1.1 1 50219 52080 1861 1.1 1.05 49851 52079 2228 1.1 1.1 49772 50158 386 1.1 1.15 49884 52206 2322 1.1 1.2 50230 52184 1954 1.15 1 50688 52432 1744 1.15 1.05 50168 52084 1916 1.15 1.1 49833 52079 2246 1.15 1.15 49772 50158 386 1.15 1.2 49872 52206 2334 1.2 1 51211 52632 1421 1.2 1.05 50630 52308 1678 1.2 1.1 50120 52089 1969 1.2 1.15 49824 52079 2255 1.2 1.2 49772 50158 386

Unequal variance and unequal sample sizes

Unequal variance with unequal sample sizes (n1, n2 = 10,30; 30,10; 15,45; 45,15) produced an increase in rejection rates across the four sample sizes for both tests. As demonstrated in Table 23, as the variance became more unbalanced, the rejection rates were affected. Both tests became less consistent in their rejection rates (as compared to equal sample sizes).

Further, with increased weights of variance, the rejection rates increased differentially. The t test is slightly more powerful when the variances are equal or differ only slightly, but the Wilcoxon becomes more powerful as the variance difference increases. This finding was consistent across all the sample sizes (see Tables 24-26).

Table 23. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 50007 49795 -212 1 1.05 44783 42903 -1880 1 1.1 40192 42833 2641 1 1.15 36326 41221 4895 1 1.2 32972 39275 6303 1.05 1 55676 58265 2589 1.05 1.05 50007 49795 -212 1.05 1.1 45020 42903 -2117 1.05 1.15 40621 42853 2232 1.05 1.2 36887 41721 4834 1.1 1 61590 58374 -3216 1.1 1.05 55378 58265 2887 1.1 1.1 50007 49795 -212 1.1 1.15 45257 42903 -2354 1.1 1.2 41046 42883 1837 1.15 1 67677 60577 -7100 1.15 1.05 61002 58340 -2662 1.15 1.1 55129 58265 3136 1.15 1.15 50007 49795 -212 1.15 1.2 45428 42903 -2525 1.2 1 73963 63171 -10792 1.2 1.05 66819 59807 -7012 1.2 1.1 60491 58299 -2192 1.2 1.15 54897 58265 3368 1.2 1.2 50007 49795 -212

Table 24. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (30,10), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 49987 50140 153 1 1.05 55799 58665 2866 1 1.1 61778 58755 -3023 1 1.15 67935 60985 -6950 1 1.2 74235 63429 -10806 1.05 1 44696 43011 -1685 1.05 1.05 49987 50140 153 1.05 1.1 55497 58665 3168 1.05 1.15 61218 58730 -2488 1.05 1.2 67025 60170 -6855 1.1 1 40202 42929 2727 1.1 1.05 44936 43011 -1925 1.1 1.1 49987 50140 153 1.1 1.15 55226 58665 3439 1.1 1.2 60710 58693 -2017 1.15 1 36281 41330 5049 1.15 1.05 40584 42950 2366 1.15 1.1 45172 43011 -2161 1.15 1.15 49987 50140 153 1.15 1.2 54985 58665 3680 1.2 1 32888 39279 6391 1.2 1.05 36811 41830 5019 1.2 1.1 40969 42989 2020 1.2 1.15 45365 43011 -2354 1.2 1.2 49987 50140 153

Table 25. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 50452 50399 -53 1 1.05 45120 44456 -664 1 1.1 40680 44369 3689 1 1.15 36669 42757 6088 1 1.2 33292 40787 7495 1.05 1 56243 59906 3663 1.05 1.05 50452 50399 -53 1.05 1.1 45334 44456 -878 1.05 1.15 41069 44389 3320 1.05 1.2 37220 43250 6030 1.1 1 62251 60007 -2244 1.1 1.05 55963 59906 3943 1.1 1.1 50452 50399 -53 1.1 1.15 45540 44456 -1084 1.1 1.2 41465 44420 2955 1.15 1 68548 62202 -6346 1.15 1.05 61643 59978 -1665 1.15 1.1 55697 59906 4209 1.15 1.15 50452 50399 -53 1.15 1.2 45746 44456 -1290 1.2 1 74900 64779 -10121 1.2 1.05 67582 61415 -6167 1.2 1.1 61107 59942 -1165 1.2 1.15 55470 59906 4436 1.2 1.2 50452 50399 -53

Table 26. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Smooth Symmetric distribution, sample size (n1,n2) = (45,15), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power Differenc 1 2 e 1 1 49949 50884 935 1 1.05 55562 53907 -1655 1 1.1 61480 57224 -4256 1 1.15 67504 60489 -7015 1 1.2 73287 63304 -9983 1.05 1 44571 47869 3298 1.05 1.05 50070 50793 723 1.05 1.1 55335 53687 -1648 1.05 1.15 60912 57100 -3812 1.05 1.2 66860 60272 -6588 1.1 1 40273 45380 5107 1.1 1.05 44808 48044 3236 1.1 1.1 50323 51257 934 1.1 1.15 55416 54195 -1221 1.1 1.2 60361 56551 -3810 1.15 1 36103 42541 6438 1.15 1.05 40637 45426 4789 1.15 1.1 45116 48145 3029 1.15 1.15 49982 51037 1055 1.15 1.2 55232 53573 -1659 1.2 1 32208 40170 7962 1.2 1.05 36429 42987 6558 1.2 1.1 40704 45514 4810 1.2 1.15 45441 48398 2957 1.2 1.2 49862 50687 825

Extreme Asymmetric, Achievement Distribution

Unequal variance and equal sample sizes

The Extreme Asymmetric, Achievement distribution is recognized for its extreme negative skew. The difference between the two tests becomes evident as the Wilcoxon begins to consistently reject at a higher rate. With equal variance, the Wilcoxon rejects at a higher rate than the t test. Further, as the variance becomes greater, the rejection rate increases notably for the Wilcoxon (see Tables 27 and 28).

Table 27. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 49170 50509 1339 1 1.05 49239 68544 19305 1 1.1 49487 67948 18461 1 1.15 49889 67114 17225 1 1.2 50340 66164 15824 1.05 1 49271 68035 18764 1.05 1.05 49170 50509 1339 1.05 1.1 49232 68544 19312 1.05 1.15 49446 67954 18508 1.05 1.2 49863 67170 17307 1.1 1 49573 67490 17917 1.1 1.05 49271 68035 18764 1.1 1.1 49170 50509 1339 1.1 1.15 49212 68544 19332 1.1 1.2 49413 68092 18679 1.15 1 49910 66738 16828 1.15 1.05 49533 67490 17957 1.15 1.1 49260 68035 18775 1.15 1.15 49170 50509 1339 1.15 1.2 49228 68544 19316 1.2 1 50363 65863 15500 1.2 1.05 49876 66806 16930 1.2 1.1 49504 67641 18137 1.2 1.15 49268 68035 18767 1.2 1.2 49170 50509 1339

Table 28. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (30,30), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 48883 50070 1187 1 1.05 49035 75301 26266 1 1.1 49168 74367 25199 1 1.15 49358 73087 23729 1 1.2 49678 71606 21928 1.05 1 48987 74784 25797 1.05 1.05 48883 50070 1187 1.05 1.1 49036 75301 26265 1.05 1.15 49163 74403 25240 1.05 1.2 49325 73187 23862 1.1 1 49180 73865 24685 1.1 1.05 48993 74784 25791 1.1 1.1 48883 50070 1187 1.1 1.15 49016 75301 26285 1.1 1.2 49135 74572 25437 1.15 1 49451 72599 23148 1.15 1.05 49146 73892 24746 1.15 1.1 48984 74784 25800 1.15 1.15 48883 50070 1187 1.15 1.2 49002 75301 26299 1.2 1 49796 71258 21462 1.2 1.05 49406 72691 23285 1.2 1.1 49108 74091 24983 1.2 1.15 48965 74784 25819 1.2 1.2 48883 50070 1187

Unequal variance and unequal sample sizes

The rejection rates when both tails are combined for the Extreme Asymmetric distribution are listed in Tables 29, 30, 31, and 32. Although the margin of rejection is narrow when the variances are equal, the Wilcoxon holds an advantage over the t test when the variances are unequal.

Table 29. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 47004 50155 3151 1 1.05 42187 48205 6018 1 1.1 38150 47388 9238 1 1.15 34809 46294 11485 1 1.2 32019 45094 13075 1.05 1 52514 77434 24920 1.05 1.05 47004 50155 3151 1.05 1.1 42427 48205 5778 1.05 1.15 38524 47530 9006 1.05 1.2 35276 46381 11105 1.1 1 58289 77465 19176 1.1 1.05 52250 77434 25184 1.1 1.1 47004 50155 3151 1.1 1.15 42604 48205 5601 1.1 1.2 38841 47582 8741 1.15 1 64483 77522 13039 1.15 1.05 57689 77506 19817 1.15 1.1 52024 77434 25410 1.15 1.15 47004 50155 3151 1.15 1.2 42771 48205 5434 1.2 1 70838 77623 6785 1.2 1.05 63592 77517 13925 1.2 1.1 57190 77451 20261 1.2 1.15 51804 77434 25630 1.2 1.2 47004 50155 3151

Table 30. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (30,10), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 47133 50431 3298 1 1.05 61483 80480 18997 1 1.1 98119 206506 108387 1 1.15 154023 305591 151568 1 1.2 222749 364169 141420 1.05 1 54602 57555 2953 1.05 1.05 47133 50431 3298 1.05 1.1 60215 80480 20265 1.05 1.15 93689 202845 109156 1.05 1.2 145130 305628 160498 1.1 1 76349 141959 65610 1.1 1.05 53861 57555 3694 1.1 1.1 47133 50431 3298 1.1 1.15 59122 80480 21358 1.1 1.2 89914 202845 112931 1.15 1 107821 233556 125735 1.15 1.05 73810 136942 63132 1.15 1.1 53237 57555 4318 1.15 1.15 47133 50431 3298 1.15 1.2 58180 80480 22300 1.2 1 147889 307464 159575 1.2 1.05 102868 233592 130724 1.2 1.1 71637 136942 65305 1.2 1.15 52699 57555 4856 1.2 1.2 47133 50431 3298

Table 31. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 48635 50608 1973 1 1.05 43466 54695 11229 1 1.1 39203 53592 14389 1 1.15 35487 52142 16655 1 1.2 32369 50581 18212 1.05 1 54265 84457 30192 1.05 1.05 48635 50608 1973 1.05 1.1 43677 54695 11018 1.05 1.15 39534 53726 14192 1.05 1.2 35969 52249 16280 1.1 1 60270 84267 23997 1.1 1.05 53974 84457 30483 1.1 1.1 48635 50608 1973 1.1 1.15 43888 54695 10807 1.1 1.2 39860 53823 13963 1.15 1 66467 83901 17434 1.15 1.05 59674 84284 24610 1.15 1.1 53740 84457 30717 1.15 1.15 48635 50608 1973 1.15 1.2 44100 54695 10595 1.2 1 72626 83607 10981 1.2 1.05 65598 83939 18341 1.2 1.1 59117 84310 25193 1.2 1.15 53506 84457 30951 1.2 1.2 48635 50608 1973

Table 32. Rejection rates (×10⁻⁶): Combined upper and lower tails for Shift (L,U) = (.0,.0) for t and Wilcoxon, Extreme Asymmetric Achievement distribution, sample size (n1,n2) = (45,15), α = .05; 1,000,000 repetitions.

t test Wilcoxon Power 1 2 Difference 1 1 48103 49913 1810 1 1.05 53762 84175 30413 1 1.1 59732 83973 24241 1 1.15 65691 83689 17998 1 1.2 71871 83239 11368 1.05 1 43012 53873 10861 1.05 1.05 48103 49913 1810 1.05 1.1 53485 84175 30690 1.05 1.15 59130 83990 24860 1.05 1.2 64845 83703 18858 1.1 1 38698 52795 14097 1.1 1.05 43248 53873 10625 1.1 1.1 48103 49913 1810 1.1 1.15 53233 84175 30942 1.1 1.2 58611 84018 25407 1.15 1 34957 51405 16448 1.15 1.05 39072 52947 13875 1.15 1.1 43470 53873 10403 1.15 1.15 48103 49913 1810 1.15 1.2 52981 84175 31194 1.2 1 31897 49743 17846 1.2 1.05 35436 51529 16093 1.2 1.1 39425 53042 13617 1.2 1.15 43661 53873 10212 1.2 1.2 48103 49913 1810

Heterogeneity of Variance with Shift in Location

The Gaussian produced consistent results; combined lower and upper tail rejections rendered a higher total across the different variance combinations (see Table 33). Specifically, with the smaller sample sizes (n1=10, n2=30; n1=30, n2=10), the t test held a modest advantage, rejecting 56% over the Wilcoxon. However, with the increase in sample sizes, the modest advantage passed to the Wilcoxon, which rejected 56% of the time while the t test rejected 44%. These rejection rates were consistent with both .0 and .10 shift. Noticeable are the rejection rates when the shift increases to .15 and .20. Under these conditions, the t test holds a slight advantage when sample sizes are smaller (52-56% rejection rate); however, when sample sizes increase to n1=15, n2=45 and n1=45, n2=15, the Wilcoxon rejected 56% of the time.

The Smooth Symmetric distribution rendered results similar to those of the Gaussian; however, the power of each test depended heavily on the amount and spread of shift (see Table 34).

With non-normal distributions, specifically the Extreme Asymmetric, Achievement distribution, with both equal and unequal sample sizes, the power of the Wilcoxon was up to 20-30 times greater than that of the t test. The Wilcoxon was consistently more powerful across all levels of variance. When sample sizes were larger and equal, the gap between the rejection rates of the Wilcoxon and the t test closed, yet the Wilcoxon maintained greater power (see Table 35).

Table 33: Gaussian Distribution: Average rejection and range of rejection across all increments of scale change and sample sizes.

Shift      Test       Average rejection (%)   Range of rejection (%)
.0_.05     t test     5.29                    3.41-7.64
           Wilcoxon   5.25                    4.09-6.60
.0_.10     t test     5.87                    3.81-8.50
           Wilcoxon   5.94                    4.50-7.39
.0_.15     t test     7.22                    4.55-10.02
           Wilcoxon   7.10                    5.29-8.70
.0_.20     t test     8.93                    5.54-12.08
           Wilcoxon   8.71                    6.36-10.46
.05_.10    t test     5.27                    3.41-6.94
           Wilcoxon   5.25                    4.09-6.60
.05_.15    t test     6.01                    3.83-7.97
           Wilcoxon   5.99                    4.54-7.37
.05_.20    t test     7.27                    4.96-10.00
           Wilcoxon   7.17                    5.27-8.64
.10_.15    t test     5.28                    3.40-7.61
           Wilcoxon   5.25                    4.08-6.62
.10_.20    t test     6.01                    4.31-7.76
           Wilcoxon   5.96                    4.50-7.33
.15_.20    t test     5.29                    3.40-7.63
           Wilcoxon   5.25                    4.08-6.61

Table 34: Smooth Symmetric Distribution: Average rejection and range of rejection across all increments of scale change and sample sizes.

Shift      Test       Average rejection (%)   Range of rejection (%)
.0_.05     t test     5.25                    3.25-8.00
           Wilcoxon   5.74                    3.96-6.76
.0_.10     t test     5.77                    3.84-8.95
           Wilcoxon   6.09                    5.65-7.49
.0_.15     t test     6.65                    3.54-10.32
           Wilcoxon   6.47                    4.11-9.47
.0_.20     t test     7.69                    4.34-12.12
           Wilcoxon   7.04                    5.25-11.13
.05_.10    t test     5.25                    3.25-8.01
           Wilcoxon   5.73                    3.96-6.75
.05_.15    t test     5.83                    3.41-8.95
           Wilcoxon   6.01                    4.21-8.26
.05_.20    t test     6.61                    3.76-10.32
           Wilcoxon   6.49                    4.63-9.48
.10_.15    t test     5.25                    3.26-8.01
           Wilcoxon   5.73                    3.97-6.79
.10_.20    t test     5.77                    3.41-8.95
           Wilcoxon   6.09                    4.21-8.26
.15_.20    t test     5.25                    3.25-8.01
           Wilcoxon   5.73                    3.96-6.75

Table 35: Extreme Asymmetric Achievement Distribution: Average rejection and range of rejection across all increments of scale change and sample sizes.

Shift      Test       Average rejection (%)   Range of rejection (%)
.0_.05     t test     5.75                    3.68-24.92
           Wilcoxon   8.53                    5.04-33.10
.0_.10     t test     6.50                    4.41-27.75
           Wilcoxon   8.44                    5.76-16.37
.0_.15     t test     7.74                    5.43-30.72
           Wilcoxon   11.98                   5.82-48.86
.0_.20     t test     9.51                    6.72-12.12
           Wilcoxon   17.48                   7.25-34.52
.05_.10    t test     5.75                    3.68-24.92
           Wilcoxon   8.51                    5.04-42.09
.05_.15    t test     6.50                    4.38-27.75
           Wilcoxon   9.78                    5.30-22.36
.05_.20    t test     7.74                    5.42-30.72
           Wilcoxon   11.98                   5.82-37.26
.10_.15    t test     5.75                    3.68-24.92
           Wilcoxon   8.48                    5.04-42.04
.10_.20    t test     6.50                    4.42-27.75
           Wilcoxon   9.67                    5.54-46.37
.15_.20    t test     5.75                    3.68-24.92
           Wilcoxon   8.45                    5.04-34.66

CHAPTER FIVE

CONCLUSION

This study was stimulated by debate over the appropriateness of the t test versus the Wilcoxon when the test assumptions are violated. The purpose was to determine whether, when the t test fails to reject and the Wilcoxon does reject in the presence of a slight scale change and shift in location, it is because the latter is actually detecting a change in location, as opposed to a change in scale. This long-standing debate was reinvigorated by Barber & Thompson (2000) and furthered by Long, Feng & Cliff (2003), who stated:

the Wilcoxon-Mann-Whitney test provides a test of the null hypothesis that δ = zero, assuming that the two population distributions are identical. However, the distributions could differ in shape or spread, with or without having δ equal to 0. In other words, the Wilcoxon-Mann-Whitney test is testing the hypothesis that two groups represent random samples from the same distribution, but rejection of the hypothesis is sometimes taken as reflecting a difference in the location of the two distributions. Differences between the two distributions in shape or in spread may invalidate inferences from the Wilcoxon-Mann-Whitney test. (p. 650).

Inherent in this debate is the lack of recognition of the fundamental properties and functions of the t test and the Wilcoxon test. The t test's superiority is unquestionable when data are normally distributed with homogeneous variances, and the hypothesis is a shift in means. However, real data sets present the fundamental problem that violations of population normality occur as frequently as 97% of the time with real education and psychology data sets (Micceri, 1989; Sawilowsky & Blair, 1992). An ancillary outcome of this study is to confirm that the superior properties of the t test are not preserved in the presence of non-normality. When data are non-normally distributed, the t test's Type I error is adversely affected, and the Wilcoxon rank-sum test maintains a power advantage.

Additionally, there is an apparent neglect of the fact that the most commonly occurring treatment outcome in any field affects both location and scale. Even within the most controlled environment, some level of variability will exist. As further articulated by Sawilowsky (2002),

the most prolific treatment outcome in applied studies is known. It is where a change in scale is concomitant with a shift in means. As an intervention is implemented, the means increase or decrease according to the context. Simultaneously, the treatment group may become more homogeneous on the outcome variable due to sharing the same intervention, method, condition etc. Alternately, the groups may become more heterogeneous, as some respond to the treatment while others do not respond or even regress (p. 466).

Although neither the t test nor the Wilcoxon is a good choice if the hypothesis is designed to test both change in location and change in scale, prudent researchers must be mindful that, although alternative hypotheses are designed to identify change in mean/location, variance is also affected by treatment.

Outcomes of this study allowed for a comparison of the Type I error rates. Outcomes indicate that under normality, with no change in shift or scale, the t test is slightly more robust than the Wilcoxon. However, with non-normal distributions, robust properties can be studied. With the t test, the Type I error rates become more conservative, with Type II error rates increasing. As anticipated, with the Wilcoxon, the error rates remain consistent across distributions. This is illustrated across several equal and unequal sample sizes (see Tables 36, 37, and 38).


Table 36: Rejections for Shift (L,U) = (.0,.0) for t and Wilcoxon, σ1 & σ2 = (1.0,1.0), sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

Distribution                       t test                      Wilcoxon
                                   Lower Tail   Upper Tail     Lower Tail   Upper Tail
Gaussian                           .025344      .024692        .025211      .024717
Smooth Symmetric                   .024991      .025016        .024980      .024815
Extreme Asymmetric, Achievement    .031540      .015464        .025355      .024800

Table 37: Rejections for Shift (L,U) = (.0,.0) for t and Wilcoxon, σ1 & σ2 = (1.0,1.0), sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions.

Distribution                       t test                      Wilcoxon
                                   Lower Tail   Upper Tail     Lower Tail   Upper Tail
Gaussian                           .024981      .024737        .024672      .024417
Smooth Symmetric                   .025129      .025044        .025490      .025182
Extreme Asymmetric, Achievement    .024566      .024604        .025337      .025172

Table 38: Rejections for Shift (L,U) = (.0,.0) for t and Wilcoxon, σ1 & σ2 = (1.0,1.0), sample size (n1,n2) = (15,45), α = .05; 1,000,000 repetitions.

Distribution                       t test                      Wilcoxon
                                   Lower Tail   Upper Tail     Lower Tail   Upper Tail
Gaussian                           .024980      .025148        .025459      .025598
Smooth Symmetric                   .025281      .025171        .025366      .025033
Extreme Asymmetric, Achievement    .030580      .018055        .025593      .025015
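As a rough check on the tail rates reported in Tables 36 through 38, the following minimal Python sketch computes the binomial standard error of a rejection rate estimated from 1,000,000 repetitions and expresses each t test tail rate from Table 36 as a number of standard errors from the nominal .025. The function names are illustrative and are not part of the study's Monte Carlo program; treating each repetition as an independent Bernoulli trial is an assumption of this sketch.

```python
import math


def mc_standard_error(p, reps):
    """Binomial standard error of a rejection rate estimated from `reps` trials."""
    return math.sqrt(p * (1.0 - p) / reps)


def deviation_in_se(observed, nominal=0.025, reps=1_000_000):
    """Distance of an observed tail rate from the nominal rate, in standard errors."""
    return (observed - nominal) / mc_standard_error(nominal, reps)


# t test (lower tail, upper tail) rates copied from Table 36, (n1, n2) = (10, 30)
TABLE_36_T_TEST = {
    "Gaussian": (0.025344, 0.024692),
    "Smooth Symmetric": (0.024991, 0.025016),
    "Extreme Asymmetric, Achievement": (0.031540, 0.015464),
}

for name, (lower, upper) in TABLE_36_T_TEST.items():
    print(f"{name}: lower {deviation_in_se(lower):+.1f} SE, "
          f"upper {deviation_in_se(upper):+.1f} SE")
```

Under these assumptions the standard error at .025 is about .00016, so the t test tail rates for the Extreme Asymmetric, Achievement distribution sit dozens of standard errors from nominal, far more than simulation noise alone would produce.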

Recognized as the Behrens-Fisher problem, scale change without change in location allows for further examination of each test's robustness properties. Although the likelihood of this happening in applied research is minimal, it is germane to this study. Underlying the debate is the assumption that if the t test fails to reject and the Wilcoxon does reject, it must be due to something other than a shift in location. However, findings suggest that neither test performs well (see Tables 39 and 40).


Table 39: Rejections (×10^-6) for Shift (L,U) = (.0,.0) for t and Wilcoxon, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

Distribution                       Scale Change     t test                      Wilcoxon
                                   σ1     σ2        Lower Tail   Upper Tail     Lower Tail   Upper Tail
Gaussian                           1      1.05      22584        22212          23499        23301
                                   1      1.1       20166        19928          22010        21845
                                   1      1.15      18207        17945          20963        20674
                                   1      1.2       16134        16326          19338        19781
Smooth Symmetric                   1      1.05      21374        23409          20411        22492
                                   1      1.1       18297        21895          20371        22462
                                   1      1.15      15669        20657          18877        22344
                                   1      1.2       13461        19511          18153        21122
Extreme Asymmetric, Achievement    1      1.05      29804        12383          39795        8410
                                   1      1.1       28193        9957           38941        8447
                                   1      1.15      26765        8044           37764        8530
                                   1      1.2       25488        6531           36436        8658

Table 40: Rejections (×10^-6) for Shift (L,U) = (.0,.0) for t and Wilcoxon, various ratios of σ1 & σ2, sample size (n1,n2) = (20,20), α = .05; 1,000,000 repetitions.

Distribution                       Scale Change     t test                      Wilcoxon
                                   σ1     σ2        Lower Tail   Upper Tail     Lower Tail   Upper Tail
Gaussian                           1      1.05      24996        25116          24519        24539
                                   1      1.1       24953        24954          24542        24593
                                   1      1.15      25193        24846          24892        24549
                                   1      1.2       24856        25150          24525        24718
Smooth Symmetric                   1      1.05      23891        26325          25292        28052
                                   1      1.1       22732        27656          25287        28072
                                   1      1.15      21730        29021          24392        29116
                                   1      1.2       20828        30344          24779        28931
Extreme Asymmetric, Achievement    1      1.05      25966        23273          56928        11616
                                   1      1.1       27327        22160          56135        11813
                                   1      1.15      28760        21129          54974        12140
                                   1      1.2       30086        20254          53633        12531
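The scale-change-only condition summarized in Tables 39 and 40 can be mimicked on a small scale. The Python sketch below draws two Gaussian samples, applies the pooled-variance t test and the Wilcoxon rank-sum test, and tallies lower- and upper-tail rejections. It is not the study's program: it covers only the Gaussian case, uses the large-sample normal approximation for the rank-sum statistic rather than exact critical values, hard-codes the tabled two-tailed .05 critical value of t for 38 degrees of freedom (2.024), and assumes the shift is added to the first group. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

T_CRIT_DF38 = 2.024                   # tabled two-tailed .05 critical t; valid only when n1 + n2 - 2 = 38
Z_CRIT = NormalDist().inv_cdf(0.975)  # about 1.96, used for the rank-sum approximation


def pooled_t(x, y):
    """Independent samples t statistic with pooled variance."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def rank_sum_z(x, y):
    """Normal approximation to the Wilcoxon rank-sum statistic (assumes no ties)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    mu = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sd


def simulate(n1, n2, sigma1, sigma2, shift, reps, t_crit=T_CRIT_DF38, seed=1):
    """Estimate lower- and upper-tail rejection rates for both tests (Gaussian data only)."""
    rng = random.Random(seed)
    counts = {"t_lower": 0, "t_upper": 0, "w_lower": 0, "w_upper": 0}
    for _ in range(reps):
        x = [rng.gauss(shift, sigma1) for _ in range(n1)]  # assumption: group 1 receives the shift
        y = [rng.gauss(0.0, sigma2) for _ in range(n2)]
        t, z = pooled_t(x, y), rank_sum_z(x, y)
        counts["t_lower"] += int(t < -t_crit)
        counts["t_upper"] += int(t > t_crit)
        counts["w_lower"] += int(z < -Z_CRIT)
        counts["w_upper"] += int(z > Z_CRIT)
    return {key: value / reps for key, value in counts.items()}


if __name__ == "__main__":
    # Scale change without shift, (n1, n2) = (20, 20), sigma ratio 1 : 1.2
    print(simulate(20, 20, 1.0, 1.2, shift=0.0, reps=2000))
```

Setting shift to a small positive value while keeping unequal sigmas corresponds loosely to the shift-plus-scale conditions. Far more repetitions than the 2,000 used here are needed before tail rates can be compared at the precision of the tables.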


In an effort to study power, the following table illustrates the differences among distributions, tests, and equal/unequal sample sizes while holding the variance constant and adding equal increments of change in location/shift (.05, .10, .15, .20) (see Table 41).

Table 41: Range of rejection change in location/shift: (.05, .10, .15, .20)

Distribution                       Test        Rejection rates
Gaussian                           t test      4.91-5.03%
                                   Wilcoxon    4.89-5.12%
Smooth Symmetric                   t test      4.97-5.05%
                                   Wilcoxon    4.98-5.50%
Extreme Asymmetric, Achievement    t test      4.70-4.89%
                                   Wilcoxon    5.01-5.50%

The outcomes of this power study are as anticipated: with the Gaussian distribution, the t test is slightly more powerful. Further, the power of the t test and the Wilcoxon is comparable for both the Gaussian and Smooth Symmetric distributions, with the Wilcoxon maintaining a slight advantage with the non-normal Smooth Symmetric distribution. Noticeably, when extreme skew is present, the Wilcoxon maintains greater power.

The primary focus of this research is a slight change in location accompanied by a change in scale. The motivating argument is that the t test is more powerful than the Wilcoxon and not likely to pick up error, whereas the Wilcoxon, due to its nonparametric functions, is likely to pick up change in both location and variance. As illustrated in the findings, under normality, the t test is more reactive, rejecting more often than the rank-based Wilcoxon. Further, as the variance difference increases, both tests' rejection rates increase. With the introduction of non-normality, both tests reject at a higher rate, with the Wilcoxon rejecting more frequently than the t test (see Tables 42, 43, and 44).

Table 42: Rejections (power ×10^-6) for Shift (L,U) = (.05,.0) for t and Wilcoxon, Gaussian distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

                 t test                Wilcoxon
σ1      σ2       L=.025    U=.025      L=.025    U=.025
1.05    1        36872     20301       35004     19497
1.1     1        40407     23082       36810     21259
1.1     1.05     36450     20604       34601     19664
1.15    1        43792     25494       38660     22666
1.15    1.05     39258     23107       36019     21313
1.15    1.1      35796     20753       34166     20065
1.2     1        47072     28299       39975     24028
1.2     1.05     42641     25642       37508     22893
1.2     1.1      39023     23344       35800     21607
1.2     1.15     35348     21317       33765     20437
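The pattern described for the Gaussian case can be read directly off Table 42. As a small worked example, the Python snippet below takes the L-tail counts for the four rows with σ2 = 1, converts them to rates, and reports the gap between the t test and the Wilcoxon. Only the table values come from the study; the variable and function names are illustrative.

```python
REPS = 1_000_000  # repetitions per condition, as stated in the table caption

# (sigma1, t test L-tail count, Wilcoxon L-tail count) for the rows with sigma2 = 1
TABLE_42_SIGMA2_1 = [
    (1.05, 36872, 35004),
    (1.10, 40407, 36810),
    (1.15, 43792, 38660),
    (1.20, 47072, 39975),
]


def to_rates(rows, reps=REPS):
    """Convert rejection counts to rates and append the t-minus-Wilcoxon gap."""
    return [(s1, t / reps, w / reps, (t - w) / reps) for s1, t, w in rows]


for sigma1, t_rate, w_rate, gap in to_rates(TABLE_42_SIGMA2_1):
    print(f"sigma1={sigma1:.2f}  t={t_rate:.4f}  Wilcoxon={w_rate:.4f}  gap={gap:+.4f}")
```

In these four rows both tests' L-tail rates rise as σ1 grows, and the t test's rate exceeds the Wilcoxon's by a widening margin.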

Table 43: Rejections (power ×10^-6) for Shift (L,U) = (.05,.0) for t and Wilcoxon, Smooth Symmetric distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

                 t test                Wilcoxon
σ1      σ2       L=.025    U=.025      L=.025    U=.025
1.05    1        36984     20511       42831     15287
1.1     1        41958     22066       41665     17914
1.1     1.05     36357     20726       42831     15287
1.15    1        47029     23574       42111     20259
1.15    1.05     41023     22194       40908     18019
1.15    1.1      35760     20906       42831     15287
1.2     1        52096     25114       41048     23651
1.2     1.05     45858     23632       42185     20096
1.2     1.1      40176     22312       40908     18019
1.2     1.15     35246     21075       42831     15287


Table 44: Rejections (power ×10^-6) for Shift (L,U) = (.05,.0) for t and Wilcoxon, Extreme Asymmetric, Achievement distribution, various ratios of σ1 & σ2, sample size (n1,n2) = (10,30), α = .05; 1,000,000 repetitions.

                 t test                Wilcoxon
σ1      σ2       L=.025    U=.025      L=.025    U=.025
1.05    1        42978     12259       51386     7179
1.1     1        44927     15332       26154     44158
1.1     1.05     42400     12390       51300     7181
1.15    1        46790     18734       22727     51412
1.15    1.05     44223     15336       25882     44310
1.15    1.1      41868     12524       51207     7185
1.2     1        48597     22445       55438     21000
1.2     1.05     45961     18594       51471     22634
1.2     1.1      43565     15339       44311     25880
1.2     1.15     41384     12651       51168     7185

The outcomes of this study further confirm theoretical understanding about the power parameters when shift and scale are present. When the treatment impacts location, researchers can maintain confidence that if the Wilcoxon rejects the null and the t test does not, this rejection reflects a shift in location.

The latest iteration in the search for a reason to prefer the t test over the Wilcoxon when testing for shift was the claim that, in small treatment conditions impacting both scale and shift, the Wilcoxon was erroneously rejecting the null hypothesis due to the scale change instead of the apparently more desirable shift in location.

The t test may be a preferred choice when testing for shift in location under normality. However, alternative methods, such as the robust and more powerful Wilcoxon rank-sum test, should be used when non-normality is identified and must be respected when testing for a shift in location parameters, as illustrated in these findings. The findings indicate that the Wilcoxon is not extremely sensitive to variance and is not likely to reject the null hypothesis based on a treatment that primarily impacts the variance.

Altman (1982) raised concerns that when researchers (and professional journals) set precedent for poor methodology, they directly influence other researchers' use of poor methodology in their work. This study challenges researchers to review the literature to ensure proper test selection and further charges journal editors to review design carefully. Although this study aimed at dispelling continuing myth and misinformation about the comparative power of the t test versus the Wilcoxon test, only three of the eight social and behavioral science prototypical data sets provided by Micceri (1989) were studied. Expansion to include other distributions that capture other disciplines, should they be demonstrated to differ, would be prudent.


REFERENCES

Barber, J.A. & Thompson, S.G. (1998). Analysis and interpretation of cost data in randomized controlled trials: review of published findings. British Medical Journal, 317, 1195-1200.

Barber, J.A. & Thompson, S.G. (2000). Would have been better to use t test than Mann-Whitney U test. British Medical Journal, 320, 1730-1731.

Blair, R.C. (1981). A reaction to “Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance”. Review of Educational Research, 51(4), 499-507.

Blair, R.C. (1991). New critical values for the generalized t and generalized rank-sum procedures. Communications in Statistics, 20, 981-994.

Blair, R.C. & Higgins, J.J. (1980a). The power of t and Wilcoxon statistics: A comparison. Evaluation Review, 4(5), 645-656.

Blair, R.C. & Higgins, J.J. (1980b). A comparison of the power of Wilcoxon's rank-sum statistic to that of Student's t statistic under various non-normal distributions. Journal of Educational Statistics, 5(4), 309-335.

Blair, R.C. & Higgins, J.J. (1981). A note on the asymptotic relative efficiency of the Wilcoxon rank-sum test relative to the independent means t test under mixtures of two normal distributions. British Journal of Mathematical and Statistical Psychology, 34, 124-128.

Blair, R.C. & Higgins, J.J. (1985). Comparison of the power of the paired samples t test to that of Wilcoxon's signed ranks test under various population shapes. Psychological Bulletin, 97, 119-128.

Bradley, J.V. (1968). Distribution-free statistical tests. Englewood Cliffs, NJ: Prentice Hall.

Boneau, C.A. (1962). A comparison of the power of the U and t-test. Psychological Review, 69, 246-256.

Boneau, C.A. (1960). The effects of violations of assumptions underlying the t-test. Psychological Bulletin, 57, 49-64.

Bradstreet, T.E. (1997). A Monte Carlo study of Type I error rates for the two-sample Behrens-Fisher problem with and without rank transformation. Computational Statistics & Data Analysis, 25, 167-179.

Breckler, S.J. (1990). Application of covariance structure modeling to psychology: Cause for concern? Psychological Bulletin, 107, 260-273.

Bridge, P.D. & Sawilowsky, S.S. (1997). Revisiting the t-test on ranks as an alternative to the Wilcoxon rank-sum test. Perceptual and Motor Skills, 85, 399-402.

Bridge, P.D. & Sawilowsky, S.S. (1999). Increasing physicians' awareness of the impact of statistics on research outcomes: Comparative power of the t-test and Wilcoxon Rank-Sum test in small samples applied research. Journal of Clinical Epidemiology, 52(3), 229-236.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.

Conover, W.J. & Iman, R.L. (1976). On some alternative procedures using ranks for the analysis of experimental designs. Communications in Statistics, A5(14), 1349-1368.

Conover, W.J. & Iman, R.L. (1981). Rank transformations as a bridge between parametric and nonparametric statistics. American Statistician, 35, 124-133.

Gibbons, J.D. (1985). Nonparametric methods for quantitative analysis (2nd ed.). Columbus, OH: American Sciences.

Gibbons, J.D. & Chakraborti, S. (1991). Comparisons of the Mann-Whitney, Student's t, and Alternate t tests for means of normal distributions. Journal of Experimental Education, 59, 258-267.

Gibbons, J.D. & Chakraborti, S. (1992). Response to Zimmerman. Journal of Experimental Education, 60(4), 365-366.

Glass, G., Peckham, P., & Sanders, J.R. (1972). Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance. Review of Educational Research, 42, 237-288.


Hart, A. (2001). Mann-Whitney test is not just a test of medians: differences in spread can be important. British Medical Journal, 323, 391-393.

Harwell, M.R. (1988). Choosing between parametric and nonparametric tests. Journal of Counseling and Development, 67, 35-38.

Harwell, M.R. (1990). Summarizing Monte Carlo results in methodological research. Paper presented at the annual meeting of the American Educational Association, Boston.

Hinkle, D.E., Wiersma, W., & Jurs, S.G. (1998). Applied Statistics for the Behavioral Sciences (4th ed.). New York: Houghton Mifflin Company.

Hodges, J.L. & Lehmann, E.L. (1956). The efficiency of some nonparametric competitors of the t-test. Annals of Mathematical Statistics, 27, 324-335.

Howell, J.F. & Games, P.A. (1974). The effects of variance heterogeneity on simultaneous multiple-comparison procedures with equal sample size. British Journal of Mathematical and Statistical Psychology, 27, 72-81.

Hunter, M.A. & May, R.B. (1993). Some myths concerning parametric and nonparametric tests. Canadian Psychology, 34(4), 384-389.

Hutchinson, T.P. (2002). Should we routinely test for simultaneous location and scale changes? Ergonomics, 45(3), 248-251.

Kerlinger, F.N. & Lee, H.B. (2000). Foundations of Behavioral Research (4th ed.). New York: Harcourt College Publishers.


Long, J.D., Feng, D. & Cliff, N. (2003). Ordinal analysis of behavioral data. In J.A. Schinka, W.F. Velicer, & I.B. Weiner (Eds.), Handbook of Psychology, Vol 2: Research Methods in Psychology. New York: Wiley.

Mansfield, E. (1986). Basic statistics with applications. New York: W.W. Norton & Company.

Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156-166.

Neave, H.R. & Granger, C.W.J. (1968). A Monte Carlo study comparing various two-sample tests for differences in mean. Technometrics, 10(3), 509-522.

Runyon, R.P. & Haber, A. (1991). Fundamentals of Behavioral Statistics (7th ed.). New York: McGraw Hill.

Sawilowsky, S.S. (1990). Nonparametric tests of interaction in experimental design. Review of Educational Research, 60(1), 91-126.

Sawilowsky, S.S. (1993). Comments on using alternatives to normal theory statistics in social and behavioural science. Canadian Psychology, 34(4), 432-439.

Sawilowsky, S.S. (2002). Fermat, Schubert, Einstein, and Behrens-Fisher: The probable difference between two means when σ1² ≠ σ2². Journal of Modern Applied Statistical Methods, 1(2), 461-472.

Sawilowsky, S.S. & Blair, R.C. (1992). A more realistic look at the robustness and type II error properties of the t test to departures from population normality. Psychological Bulletin, 111(2), 352-360.

Sawilowsky, S.S. & Fahoome, G.F. (2003). Statistics through Monte Carlo Simulation with Fortran. Michigan: JMASM, Inc.

Siegel, S. (1956). Nonparametric statistics for the behavioral sciences. New York: McGraw Hill.

Wilcox, R.R. (1996). Statistics for the Social Sciences. San Diego: Academic Press.

William, J.G., Cheung, W.Y., Russell, I.T., Cohen, D.R., Longo, M. & Lervy, B. (2000). Open access follow up for inflammatory bowel disease: pragmatic randomized trial and cost effective study. British Medical Journal, 320, 544-5.

Zimmerman, D.W. (1987). Comparative power of Student t test and Mann-Whitney U test for Unequal Sample Sizes and Variances. Journal of Experimental Education, 55, 171-174.

Zimmerman, D.W. (1991). Failure of the Mann-Whitney Test: A Note on the Simulation study of Gibbons and Chakraborti (1991). Journal of Experimental Education, 60, 359-364.

Zimmerman, D.W. (2004). A note on preliminary test of equality of variance. British Journal of Mathematical and Statistical Psychology, 57(1), 173-182.

Zimmerman, D.W. & Williams, R.H. (1989). Power comparisons of the Student t test and two approximations when variances and sample sizes are unequal. Journal of Indian Society of Agricultural Statistics, 41, 206-217.

Zumbo, B.D. & Zimmerman, D.W. (1993). Introduction to the Symposium. Canadian Psychology, 34(4), 381-383.


ABSTRACT

DECONSTRUCTING THE COMPARATIVE POWER OF THE INDEPENDENT SAMPLES t AND THE WILCOXON MANN-WHITNEY FOR SHIFT AND SLIGHT HETEROSCEDASTICITY

by

TANA J. BRIDGE

December 2007

Advisor: Dr. Shlomo Sawilowsky
Major: Theoretical Evaluation and Research
Degree: Doctor of Philosophy

Ongoing debate about the power properties of the independent samples t test and the Wilcoxon Mann-Whitney test created the need for this study. Researchers chose the t test over the Wilcoxon when testing for shift, claiming that in small treatment conditions the Wilcoxon was erroneously rejecting the null hypothesis due to scale change. Therefore, the purpose of this study was to assess whether, in the presence of a slight scale change, the reason the t test fails to reject and the Wilcoxon does reject is the scale change and not a shift in location.

Applying Monte Carlo techniques, the comparative power and robustness of the t test and the Wilcoxon were investigated. In addition to the Gaussian distribution, two real prototypical data sets, Smooth Symmetric and Extreme Asymmetric, Achievement (Micceri, 1989), were applied. Sample sizes included (n1, n2) = (10, 30), (30, 10), (20, 20), (15, 45), (45, 15), and (30, 30). The ratio of variance for group one and group two ranged from 1.0 to 1.2 (in increments of .05). Shift/change in location parameters increased from 0.0 to 1.1 (in increments of .05). Nominal alpha was set at .05.

Outcomes compared the robustness and power of each test. For scale change without change in location, recognized as the Behrens-Fisher problem, outcomes confirm that neither test is robust. In studying shift while holding variance constant, the power of both tests is comparable for the Gaussian and Smooth Symmetric distributions; however, with extreme skew, the Wilcoxon maintains much greater power.

The primary focus of this research is a slight change in location accompanied by a change in scale. Under normality, the t test rejects more often than the Wilcoxon. Further, as the variance difference increases, both tests' rejection rates increase. With the introduction of non-normality, both tests reject at a higher rate, with the Wilcoxon rejecting more frequently than the t test. The outcomes of this study confirm the strength of the t test under normality; however, when the treatment impacts location, researchers can maintain confidence that if the Wilcoxon rejects the null and the t test does not, this rejection reflects a shift in location.


AUTOBIOGRAPHICAL STATEMENT

Tana J. Bridge

EDUCATION:

1999-2007 Doctor of Philosophy
Wayne State University, Detroit, Michigan
Major: Theoretical Evaluation and Research
Minor: Urban Studies – Conflict Resolution and Mediation

1987-1988 Master of Social Work
University of Michigan, Ann Arbor, Michigan
Major: Families and Groups
Minor: Administration

1982-1987 Bachelor of Science
Eastern Michigan University, Ypsilanti, Michigan
Major: Psychology
Minor: Social Work

FACULTY APPOINTMENTS:

Eastern Michigan University Department of Social Work, Ypsilanti, Michigan

2004-present Assistant Professor

2002-2004 Full Lecturer

1993-2002 Adjunct Lecturer

PROFESSIONAL APPOINTMENTS:

Boysville of Michigan, Clinton, Michigan

6/1993-8/1994 Clinical Supervisor

1/1994-6/1994 Home Developer/Licensing Manager

8/1990-6/1993 Treatment Coordinator

9/1988-12/1990 Family Therapist