Type I Error Rates of the Two-Sample Pseudo-Median Procedure," Journal of Modern Applied Statistical Methods: Vol

Journal of Modern Applied Statistical Methods Volume 10 | Issue 2 Article 3 11-1-2011 Type I Error Rates of the Two-Sample Pseudo- Median Procedure Nor Aishah Ahad Universiti Utara Malaysia Abdul Rahman Othman Universiti Sains Malaysia Sharipah Soaad Syed Yahaya Universiti Utara Malaysia Follow this and additional works at: http://digitalcommons.wayne.edu/jmasm Part of the Applied Statistics Commons, Social and Behavioral Sciences Commons, and the Statistical Theory Commons Recommended Citation Ahad, Nor Aishah; Othman, Abdul Rahman; and Yahaya, Sharipah Soaad Syed (2011) "Type I Error Rates of the Two-Sample Pseudo-Median Procedure," Journal of Modern Applied Statistical Methods: Vol. 10 : Iss. 2 , Article 3. DOI: 10.22237/jmasm/1320120120 Available at: http://digitalcommons.wayne.edu/jmasm/vol10/iss2/3 This Regular Article is brought to you for free and open access by the Open Access Journals at DigitalCommons@WayneState. It has been accepted for inclusion in Journal of Modern Applied Statistical Methods by an authorized editor of DigitalCommons@WayneState. Journal of Modern Applied Statistical Methods Copyright © 2011 JMASM, Inc. November 2011, Vol. 10, No. 2, 418-423 1538 – 9472/11/$95.00 Regular Articles Type I Error Rates of the Two-Sample Pseudo-Median Procedure Nor Aishah Ahad Abdul Rahman Othman Sharipah Soaad Syed Yahaya Universiti Utara Malaysia, Universiti Sains Malaysia, Universiti Utara Malaysia, Kedah, Malaysia Penang, Malaysia Kedah, Malaysia The performance of the pseudo-median based procedure is examined in terms of controlling Type I error for a two independent groups test. The procedure is a modification of the one-sample Wilcoxon statistic using the pseudo-median of differences between group values as the central measure of location. The proposed procedure was shown to have good control of Type I error rates under the study conditions regardless of distribution type. Key words: Mann-Whitney-Wilcoxon, pseudo-median, t-test, type I error. Introduction similar to parametric tests (Pratt, 1964; Testing the equality of central tendency Zimmerman & Zumbo, 1992). Further, parameters between two independent samples by nonparametric methods are more appropriate for controlling Type I error is a common statistical non-normal symmetric data. Many attempts problem. If an underlying distribution is have been made to deal with asymmetric normally distributed with equal population distributions. In this study, a method to handle variances, the most suitable test statistic to use is the problem of asymmetric data, as well as the Student’s t-test. Student’s t, however, is heterogeneity of variances, is suggested. The sensitive to non-normal data and heterogeneity method is known as the pseudo-median of variances. Under these situations, Welch’s procedure, where the pseudo-median of approximate test (Welch, 1938) usually offers differences between group values are employed the best practical solution, but this statistic does as the central measure of location with the one- not adequately control Type I error probabilities sample nonparametric Wilcoxon procedure in a under non-normal distributions. two group setting. The pseudo-median of a To surmount the problem of non- distribution F is defined to be the median of the normality, researchers typically seek distribution (Z1 + Z2)/2, where Z1 and Z2 are all nonparametric test alternatives, such as the possible differences between two observations Mann-Whitney-Wilcoxon, which is believed to from each group. Z1 and Z2 are independent and be effective against violations of normality. have the same distribution as F (Hoyland, 1965; Although ranking methods are often useful when Hollander & Wolfe, 1999). samples are obtained from heavy-tailed The pseudo-median is a location distributions, they are influenced by unequal parameter. The estimation of this parameter is variances accomplished using the Hodges-Lehmann estimator. According to Hollander and Wolfe (1999), the Hodges-Lehmann estimator ()θˆ is a Nor Aishah Ahad is an Academician in the consistent estimator of the pseudo-median, School of Quantitative Sciences. Email: which in general may differ from the median. [email protected]. Abdul Rahman Othman is However, when F is symmetric, the median and a Professor in the School of Distance Education. pseudo-median coincide. The pseudo-median is Email: [email protected]. Sharipah Soaad selected as the central measure of location Syed Yahaya is an Associate Professor in the because it is convenient and the asymptotic School of Quantitative Sciences. Email: properties of the pseudo-median are the same as [email protected]. 418 AHAD, OTHMAN & SYED YAHAYA median. In this study, the performance of the nn12 = pseudo-medians procedure in terms of Type I WRe ij ij . (4) == error was measured via Monte Carlo simulation. ij1 Because the sampling distribution of this pseudo-median procedure is intractable, the The modification of the Wilcoxon bootstrap method was used to arrive at the procedure is performed by adding the pseudo- significant values. median value to the second sample to form a + ˆ new sample, Xd2 . The aligned difference, Methodology based on the location-aligned samples, becomes: This study addresses both symmetric and asymmetric distribution and the methods applied DXˆ =−() X +=− dDdˆˆ. (5) to the two types of distributions are very ij12 i j ij different. Let XXX= (), , ..., X and 111121n1 Let Rˆ denote the rank of Dˆ . The indicator = ij ij XXXX221222(), , ..., n be samples from 2 function and the aligned statistic are expressed distributions F1 and F2 respectively. The as: pseudo-median is defined as: 0, Dˆ < 0 ij == ˆ DD+ eDˆij0.5, ij 0 (6) ˆ ij i′′ j d = Median 2 1, Dˆ > 0 ij (1) ()XX−+−() X X and 12ij 12ij'' = Median nn12 ˆˆ= ˆ 2 WRe ij ij . (7) ij==1 ≠ ≠ where ii' and j j ' . When F1 and F2 are Because the second sample was symmetric, d can be defined as the difference realigned with the estimate d, it is necessary to between the centers of symmetry. Hence, the find the pseudo sampling distribution for the hypothesis is given as: estimate W. Use of a bootstrap procedure is proposed in order to construct the hypothesis = test. Separately bootstrap ni observations from Hd0 :0 X group and n observations from Xd+ ˆ versus (2) 1 j 2 group to obtain bootstrap samples, X * and X * . Hd:0.≠ 1 2 1 The bootstrap difference becomes ***=− * = = DXij12 i X j where Rij denotes the rank of Let DXXij12 i- j , in1,2,..., 1 , * = = Dij . The indicator function and the bootstrap j 1,2,...,n2 and Nnn12. The statistic is a one-sample Wilcoxon statistic based on the statistic can be defined as: NDij ’s. Let Rij denote the rank of Dij . The * < indicator function and the statistic are expressed 0, Dij 0 as: **== eDij0.5, ij 0 (8) < 0, Dij 0 * > 1, Dij 0 == eDij0.5, ij 0 (3) and 1, D > 0 ij nn12 WRe***= . (9) and ij ij ij==1 419 TYPE I ERROR RATES OF THE TWO-SAMPLE PSEUDO-MEDIAN PROCEDURE The steps to obtain the p value using the example, let X1 and X 2 be two skewed bootstrap method for symmetric distribution are distributions where the standard deviations need as follows: not be the same. Let YYY= (), and 11112 YYY= (), represent the new generated 1. Calculate W from X1 and X 2 . 22122 samples of size two, which have the same ˆ distribution with X1 and X 2 , respectively. 2. Calculate d from X1 and X 2 . Compute ai as follows: ˆ 3. Add d to X 2 to form a new sample, ()()YY−+− YY Xd+ ˆ . = 11 21 12 22 2 amediani (10) 2 ˆ 4. Calculate W from X1 and the new sample in step 3. Repeat the process of generating new samples of size two 9,999 times and repeat the computation 5. Generate bootstrap samples by randomly of ai to obtain aa1, 2 ,..., a 10,000 . Therefore, the sampling with replacement n observations i median of aa1, 2 ,..., a 10,000 is the value of a . from the X1 group, and nj observations For asymmetric distributions, the steps * to obtain the p value using a bootstrap method from the new sample in step 3 yielding X1 are the same except for one small alteration in and X * . 2 step 1. In this step, a constant a is introduced to the members of the second sample (X2) to form * 6. Calculate W from the bootstrap samples, a new sample, Xnew2 . Steps 2-10 proceed as X * and X * . 1 2 noted, with the one difference that X 2 has become Xnew2 . 7. Find ()WW* − ˆ . To study the robustness of this procedure, four variables were manipulated to create conditions known to highlight the 8. Repeat Steps 5 - 7 B times. strengths and weaknesses of the test for the equality of location parameters. The variables 9. Compare the value of ()WW* − ˆ with are (1) types of distributions, (2) degree of variance inequality, (3) balanced/unbalanced ()WEWH− ()| . 0 sample sizes, and (4) pairings of unequal group variance and sample sizes. In this study, =−>−* ˆ () empirical Type 1 error rates were collected and Let UWW()() WEWH| 0 and later compared under various study conditions. =−<−* ˆ () LWW()() WEWH| 0 . The number of groups and sample sizes were fixed. This study covered only the two = 2 groups case with total sample size of N 40 . 10. Calculate the p value as × min() #LU , # . This value was later divided into two groups B forming the balanced and unbalanced design. For the balanced design, the value is equally For asymmetric distributions, the divided into nn==20 , and for the difference between the centers of symmetry 12 between the two groups cannot be assumed to be unbalanced design the groups were divided into = and = . To investigate the zero; therefore, to ensure the setting for the null n1 15 n2 25 condition, a constant a must be determined and distribution types, this study focused on (1) added to the members of the second sample.

Load more