On the Kumaraswamy Fisher Snedecor Distribution
Mathematics and Statistics 4(1): 1-14, 2016
http://www.hrpub.org
DOI: 10.13189/ms.2016.040101

Adepoju, K.A*, Chukwu, A.U, Shittu, O.I
Department of Statistics, University of Ibadan, Nigeria

Copyright©2016 by authors, all rights reserved. Authors agree that this article remains permanently open access under the terms of the Creative Commons Attribution License 4.0 International License

Abstract We propose the Kumaraswamy-F (KUMAF) distribution, which is a generalization of the conventional Fisher-Snedecor distribution (F-distribution). The new distribution can be used even when one or more of the regular assumptions are violated. It is obtained by adding two shape parameters to the continuous F-distribution, which is commonly used to test the null hypothesis in the Analysis of Variance (ANOVA test). The statistical properties of the proposed distribution, such as moments, the moment generating function and the asymptotic behavior, among others, were investigated. The method of maximum likelihood is used to estimate the model parameters and the observed information matrix is derived. The distribution is found to be more flexible and robust to the regular assumptions of the conventional F-distribution. In future research, the flexibility of this distribution as well as its robustness will be examined using a real data set. The new distribution is recommended for use in most applications where the assumptions underlying the use of the conventional F-distribution for one-way analysis of variance, such as homogeneity of variance or normality, are violated, probably as a result of the presence of outlier(s). It is instructive to note that the new distribution preserves the originality of the data without transformation.

Keywords Fisher-Snedecor Distribution, Kumaraswamy-F Distribution, One Way ANOVA, Outlier, Maximum Likelihood Method

1. Introduction

An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact F-tests mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s.

The efficient use of the classical F-test to compare several (k) population means relies heavily on a good number of assumptions of the Analysis of Variance (ANOVA). For the F-ratio statistic there are two fundamental assumptions: the variances of the compared populations are the same, and the estimates of the population variance are independent. Therefore, before we proceed with an analysis of the data we have collected, we have to make sure that these assumptions have been met.

These assumptions include independence of the k populations being tested, equality of the population variances, and absence of outliers, among others. When a number of the above assumptions are violated, the use of the F-test to test for equality of the population means becomes incorrect or misleading. For example, if the assumption of independence is violated, then the one-way ANOVA is simply not appropriate. Similarly, when the assumption of normality or of equal variances is violated, the classical F-test may fail to reject the null hypothesis even if the data actually provide strong evidence against it. A potentially more damaging violation occurs when one or more of the populations being tested are not normally distributed, probably due to the presence of outliers. This occurs especially when the sample sizes are not equal (unbalanced). Often, the effect of an assumption violation on the result of one-way ANOVA depends on the extent of the violation (such as how unequal the population variances are, or how heavy-tailed one or another population distribution is). Some small violations may have little practical effect on the analysis, while other violations may render the one-way ANOVA result uselessly incorrect.

Krutchkoff [10] discussed some misconceptions about the F-test and provided a simulation-based solution to overcome drawbacks of the test. ANOVA under heteroscedasticity is a Behrens-Fisher type problem. The Behrens-Fisher problem, named after Ronald Fisher and W. V. Behrens, is the problem of interval estimation and hypothesis testing concerning the difference between the means of two normally distributed populations when the variances of the two populations are not assumed to be equal, based on two independent samples. Tsui and Weerahandi [14] generalized the conventional definition of the p-value from the F-distribution so that problems such as the Behrens-Fisher problem can be resolved. Weerahandi [15] discussed the numerical equivalence of this test with the representation of the Behrens-Fisher solution given in Barnard [1]. Rice and Gaines [12] extended the p-value given in Barnard [1] to the one-way ANOVA case. Samaradasa and Weerahandi [16] extended the representation of the two-sample test given in Tsui and Weerahandi [14] to the one-way ANOVA case and provided exact tests for making multiple comparisons for means and variances. This test, referred to as the generalized F-test for one-way ANOVA, is numerically equivalent in performance to the test in Rice and Gaines [12].

This paper therefore focuses on developing a generalized F-distribution that is capable of handling data that are non-normal (non-Gaussian) and probably contaminated with outliers. The proposed distribution will be used to develop a p-value that is less sensitive to any serious violation of the model assumptions.

2. The Kumaraswamy-F Distribution (KUMA-F)

In this section the Kumaraswamy-F distribution is developed by compounding the tractable link function developed by Kumaraswamy [8] with the conventional F-distribution of Fisher and Snedecor. A random variable X is distributed as the Kumaraswamy-F distribution if its PDF is derived as follows. The PDF and CDF of the F-distribution are

$$f(x) = \frac{\left(\frac{r_1}{r_2}\right)^{r_1/2} x^{\frac{r_1}{2}-1}}{B\!\left(\frac{r_1}{2},\frac{r_2}{2}\right)\left(1+\frac{r_1 x}{r_2}\right)^{\frac{r_1+r_2}{2}}} \tag{1}$$

$$F(x) = I_{r_1 x/r_2} = \frac{\beta\!\left(\frac{r_1 x}{r_2};\frac{r_1}{2},\frac{r_2}{2}\right)}{B\!\left(\frac{r_1}{2},\frac{r_2}{2}\right)} = A \tag{2}$$

where $B(\cdot,\cdot)$ is the beta function and $\beta(\cdot\,;\cdot,\cdot)$ the incomplete beta function.

The Kumaraswamy generator (link function), as given by Jones [7], is

$$g(x) = bc\,[F(x)]^{c-1}\,[1-F(x)^{c}]^{b-1} f(x) \tag{3}$$

Using (1) and (2) in (3), we obtain the probability density function of the Kumaraswamy-F distribution as:

$$g(x) = \frac{bc\,\left[I_{r_1 x/r_2}\right]^{c-1}\left[1-I_{r_1 x/r_2}^{\,c}\right]^{b-1}\left(\frac{r_1}{r_2}\right)^{r_1/2} x^{\frac{r_1}{2}-1}}{B\!\left(\frac{r_1}{2},\frac{r_2}{2}\right)\left(1+\frac{r_1 x}{r_2}\right)^{\frac{r_1+r_2}{2}}} \tag{4}$$

$$b>0,\quad c>0,\quad x>0,\quad r_1>0,\quad r_2>0.$$

The graph of the above pdf is given in Fig. 1 below:

Figure 1. The PDF of Kumaraswamy F-Distribution at r1=3, r2=22, b=2.5

Using the appropriate transformation, one can show that

$$\int_{-\infty}^{\infty} g(x)\,dx = 1.$$

Let

$$A = I_{r_1 x/r_2} \quad\text{such that}\quad \frac{dA}{dx} = \frac{\left(\frac{r_1}{r_2}\right)^{r_1/2} x^{\frac{r_1}{2}-1}}{B\!\left(\frac{r_1}{2},\frac{r_2}{2}\right)\left(1+\frac{r_1 x}{r_2}\right)^{\frac{r_1+r_2}{2}}}.$$

Then

$$g(x) = bc\,A^{c-1}\left(1-A^{c}\right)^{b-1}\frac{dA}{dx},$$

and, since $A$ increases from 0 to 1 as $x$ runs over $(0,\infty)$,

$$\int_{-\infty}^{\infty} g(x)\,dx = \int_{0}^{1} bc\,A^{c-1}\left(1-A^{c}\right)^{b-1} dA.$$

Let $M = A^{c}$, so that $\dfrac{dM}{dA} = cA^{c-1}$, $dA = \dfrac{dM}{cA^{c-1}}$ and $A = M^{1/c}$. Then

$$\int_{-\infty}^{\infty} g(x)\,dx = bc\int_{0}^{1} A^{c-1}(1-M)^{b-1}\frac{dM}{cA^{c-1}} = b\int_{0}^{1}(1-M)^{b-1}\,dM = 1.$$

This verifies that g(x) is indeed a probability density function of a continuous distribution.

3. Cumulative Distribution Function

Lemma 1: Given that $X \sim \text{KUMA-F}(b, c, r_1, r_2)$, its distribution function is expressed as

$$G(x) = 1-\left[1-\left(\frac{\beta\!\left(\frac{r_1 x}{r_2};\frac{r_1}{2},\frac{r_2}{2}\right)}{B\!\left(\frac{r_1}{2},\frac{r_2}{2}\right)}\right)^{c}\right]^{b}.$$

Proof

$$G(x) = P(X \le x) = \int_{0}^{x} bc\,\left[I_{r_1 t/r_2}\right]^{c-1}\left[1-I_{r_1 t/r_2}^{\,c}\right]^{b-1}\frac{\left(\frac{r_1}{r_2}\right)^{r_1/2} t^{\frac{r_1}{2}-1}}{B\!\left(\frac{r_1}{2},\frac{r_2}{2}\right)\left(1+\frac{r_1 t}{r_2}\right)^{\frac{r_1+r_2}{2}}}\,dt.$$

Let

$$P = I_{r_1 t/r_2} \quad\text{and}\quad \frac{dP}{dt} = \frac{\left(\frac{r_1}{r_2}\right)^{r_1/2} t^{\frac{r_1}{2}-1}}{B\!\left(\frac{r_1}{2},\frac{r_2}{2}\right)\left(1+\frac{r_1 t}{r_2}\right)^{\frac{r_1+r_2}{2}}}.$$

With $M = P^{c}$, the integral simplifies as follows, where the upper limits are evaluated at $t = x$:

$$G(x) = bc\int_{0}^{P} P^{c-1}\left(1-P^{c}\right)^{b-1} dP = b\int_{0}^{M}(1-M)^{b-1}\,dM \tag{5}$$

$$G(x) = 1-\left(1-P^{c}\right)^{b}.$$

Recall that $P = \dfrac{\beta\!\left(\frac{r_1 x}{r_2};\frac{r_1}{2},\frac{r_2}{2}\right)}{B\!\left(\frac{r_1}{2},\frac{r_2}{2}\right)}$; then

$$G(x) = 1-\left[1-\left(\frac{\beta\!\left(\frac{r_1 x}{r_2};\frac{r_1}{2},\frac{r_2}{2}\right)}{B\!\left(\frac{r_1}{2},\frac{r_2}{2}\right)}\right)^{c}\right]^{b} \tag{6}$$

The CDF for various values of b, c, r1 and r2 is plotted below.

Figure 2. CDF of Kumaraswamy distribution

4. Limit Behaviour

In this section, we investigate the limit behavior of the Kumaraswamy-F distribution as $x \to \infty$ and as $x \to 0$. This can be achieved by taking the limit of equation (4). For $x \to \infty$,

$$\lim_{x\to\infty} g(x) = \lim_{x\to\infty}\frac{bc\,\left[I_{r_1 x/r_2}\right]^{c-1}\left[1-I_{r_1 x/r_2}^{\,c}\right]^{b-1}\left(\frac{r_1}{r_2}\right)^{r_1/2} x^{\frac{r_1}{2}-1}}{B\!\left(\frac{r_1}{2},\frac{r_2}{2}\right)\left(1+\frac{r_1 x}{r_2}\right)^{\frac{r_1+r_2}{2}}}.$$

This tends to zero because

$$\lim_{x\to\infty}\frac{1}{\left(1+\frac{r_1 x}{r_2}\right)^{\frac{r_1+r_2}{2}}} = 0,$$

and this factor decays faster than $x^{\frac{r_1}{2}-1}$ grows.

Similarly, as $x \to 0$, $\lim_{x\to 0} g(x) = 0$, because $\lim_{x\to 0} x^{\frac{r_1}{2}-1} = 0$ (which holds when $r_1 > 2$). The above indicates that the proposed distribution has a mode.
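The density (4) and the distribution function (6) can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes that $A$ in (2) is the ordinary F-distribution CDF (as the relation $dA/dx = f(x)$ implies) and therefore evaluates it with SciPy's `scipy.stats.f`; in terms of the standard regularized incomplete beta function this corresponds to the argument $r_1x/(r_1x+r_2)$. The function names `kumaf_pdf` and `kumaf_cdf` are illustrative, and the value c = 2 is a hypothetical choice, since the caption of Figure 1 gives only r1 = 3, r2 = 22 and b = 2.5.

```python
import numpy as np
from scipy import integrate, stats


def kumaf_pdf(x, b, c, r1, r2):
    """Equation (4): g(x) = b*c*F^(c-1) * (1 - F^c)^(b-1) * f(x),
    where F and f are the CDF and PDF of the F(r1, r2) distribution."""
    x = np.asarray(x, dtype=float)
    F = stats.f.cdf(x, r1, r2)
    f = stats.f.pdf(x, r1, r2)
    return b * c * F ** (c - 1) * (1 - F ** c) ** (b - 1) * f


def kumaf_cdf(x, b, c, r1, r2):
    """Equation (6): G(x) = 1 - (1 - F(x)^c)^b."""
    F = stats.f.cdf(np.asarray(x, dtype=float), r1, r2)
    return 1 - (1 - F ** c) ** b


# Parameters: r1, r2 and b follow the caption of Figure 1;
# c = 2.0 is an assumed value used only for illustration.
b, c, r1, r2 = 2.5, 2.0, 3, 22

# The density should integrate to 1 over (0, infinity).
total, _ = integrate.quad(kumaf_pdf, 0, np.inf, args=(b, c, r1, r2))
print("integral of g over (0, inf):", total)

# Integrating the density up to x should agree with the closed form (6).
x0 = 1.5
partial, _ = integrate.quad(kumaf_pdf, 0, x0, args=(b, c, r1, r2))
print("numerical G(x0):", partial, " closed form:", float(kumaf_cdf(x0, b, c, r1, r2)))
```

Setting b = c = 1 in either function reduces it to the conventional F-distribution, which gives a quick sanity check on the generalization.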