Externally Studentized Normal Midrange Distribution Distribuição Da Midrange Estudentizada Externamente
Total Page:16
File Type:pdf, Size:1020Kb
Ciência e Agrotecnologia 41(4):378-389, Jul/Aug. 2017 http://dx.doi.org/10.1590/1413-70542017414047716 Externally studentized normal midrange distribution Distribuição da midrange estudentizada externamente Ben Dêivide de Oliveira Batista1*, Daniel Furtado Ferreira2, Lucas Monteiro Chaves2 1Universidade Federal de São João del-Rei/UFSJ, Departamento de Matemática e Estatística/DEMAT, São João del-Rei, MG, Brasil 2Universidade Federal de Lavras/UFLA, Departamento de Estatística/DES, Lavras, MG, Brasil *Corresponding author: [email protected] Received in December 15, 2016 and approved in February 15, 2017 ABSTRACT The distribution of externally studentized midrange was created based on the original studentization procedures of Student and was inspired in the distribution of the externally studentized range. The large use of the externally studentized range in multiple comparisons was also a motivation for developing this new distribution. This work aimed to derive analytic equations to distribution of the externally studentized midrange, obtaining the cumulative distribution, probability density and quantile functions and generating random values. This is a new distribution that the authors could not find any report in the literature. A second objective was to build an R package for obtaining numerically the probability density, cumulative distribution and quantile functions and make it available to the scientific community. The algorithms were proposed and implemented using Gauss-Legendre quadrature and the Newton-Raphson method in R software, resulting in the SMR package, available for download in the CRAN site. The implemented routines showed high accuracy proved by using Monte Carlo simulations and by comparing results with different number of quadrature points. Regarding to the precision to obtain the quantiles for cases where the degrees of freedom are close to 1 and the percentiles are close to 100%, it is recommended to use more than 64 quadrature points. Index terms: Distribution function; density function; Gauss-Legendre quadrature; Newton-Raphson method; R. RESUMO A distribuição da midrange estudentizada externamente foi criada com base nos procedimentos de estudentização de Student e foi inspirada na distribuição da amplitude estudentizada externamente. O amplo uso da amplitude estudentizada externamente em comparações múltiplas também foi uma das motivações para desenvolver esta nova distribuição. Neste trabalho objetivou-se derivar expressões analíticas da distribuição da midrange estudentizada externamente, obtendo a função de distribuição, função densidade de probabilidade, função quantil e geradores de números aleatórios. Essa é uma nova distribuição que os não há relatos na literatura especializada. Um segundo objetivo foi construir um pacote R para obter numericamente as funções mencionadas e torná-las disponíveis para a comunidade científica. Os algoritmos foram propostos e implementados usando os métodos de quadratura Gauss-Legendre e Newton-Raphson no software R, resultando no pacote SMR, disponível para baixar na página do CRAN. As rotinas implementadas apresentaram alta acurácia, sendo verificadas usando simulação Monte Carlo e pela comparação com diferentes pontos de quadratura. Quanto a precisão para se obter os quantis da distribuição da midrange estudentizada externamente para 1 grau de liberdade ou percentis próximo de 100%, é sugerido utilizar mais do que 64 pontos de quadratura. Termos para indexação: Função de distribuição; função densidade; quadratura Gauss-Legendre; método Newton- Raphson; algoritmo; R. INTRODUCTION Gumbel, 1944, 1946; Wilks, 1948; Yalcin et. al., 2014; Wan et. al., 2014; Barakat et. al., 2015; Bland, 2015; Mansouri, Many problems on statistical investigations are 2015; Li and Mansouri, 2016) and some of these studies based on studies of sample order statistics. Important cases are discussed in the sequence. of the order statistics are the minimum and maximum value Tippet (1925) studied the first four moments of of a sample. Among other functions of order statistics, the range. Pearson and Hartley (1942) obtained tabulated the range and midrange are of special interest here, which values of the cumulative probabilities for several range correspond to the difference between the maximum and values in small samples (n = 2 to 20) drawn from normal the minimum and to the average from these two extreme populations. Gumbel (1944, 1946) established the values, respectively. Several authors studied and applied independence of extreme values for large samples from this subject (Tippet, 1925; Pearson and Hartley, 1942; several continuous distribution, as well as the distribution 2017 | Lavras | Editora UFLA | www.editora.ufla.br | www.scielo.br/cagro All the contents of this journal, except where otherwise noted, is licensed under a Creative Commons Attribution License attribuition-type BY. Externally studentized normal midrange distribution 379 of the range and midrange. Wilks (1948) reviewed several ∞∞ articles relating to the order statistics and suggested several f(;,) q nν=∫∫ n (1)()( n − xφφ y xq +× y ) examples of their applications to statistical inference. 0 −∞ (4) Studies related with studentization were initially ×[ Φ (xq + y ) −Φ ( y )]n−2 f ( x ;ν ) dydx , proposed by Student (1927) and the particular case of the studentized range distribution have been largely used where ϕ(y) and Ф(y) are the probability density and in multiple comparison procedures in different areas cumulative distribution functions from a standard normal of scientific research since the pioneering works in this population evaluated at y, with y ]−∞,∞[ and f(x;ν) is area (Duncan, 1952, 1955; Tukey, 1949, 1953). The the probability density function of X = S/σ. Considering studentized range refers to the random variable defined that X is obtained in a sample of size∈ ν+1 from the normal ν S 2 simply as the range divided by the sample standard ∼ χ 2 distribution, then it is well known the fact that σ 2 ν , i.e., deviation, considering that both terms of this ratio are it has a chi-square distribution with ν degrees of freedom 22 random variables independently distributed and computed ν SS 2 (Mood et al., 1974). Hence, = ∼ χνν / . Therefore, it can in samples drawn from the normal distribution. νσ22 σ S 2 Let Y , Y , …, Y be the order statistics in a sample of be concluded that X = ∼ χνν / . 1 2 n σ size , that are defined by sorting the original sample variables X1, X2, …, Xn in increasing order. The sample X1, X2, …, Theorem 1. If X=S/σ is computed in a sample of size ν+1 Xn are drawn from a population with distribution function from a normal distribution with mean μ and variance F(x). The range is defined by W = Yn - Y1. The cumulative σ², then its probability density function is given by distribution and the probability density functions (cdf and Equation 5, pdf) of the range are the Equations 1 and 2, ν /2 ν νν−−1x2 /2 ∞ fx(;)ν = x e, x≥ 0. (5) F( w )= n f ( z )[ F ( w +− z ) F ( z )]n−1 dz , Γ ν ν /2− 1 W ∫ (1) ( / 2)2 −∞ Proof. If U ∼ χ 2,, as discussed above, then the distribution of and ν X = S/σ can be obtained from the transformation U = v X 2. ∞ The Jacobian of this transformation is given by f() w= nn ( − 1)()( f z f w +× z ) W ∫ −∞ (2) du Jx= = 2,ν n−2 ×[F( w +− z ) F () z] dz , dx respectively, as showed in David and Nagaraja (2003) and for x > 0. in Gumbel (1947). The density function of X, from samples of a Considering now samples of size n from the normal normal population, is obtained from the chi-square distribution by distribution with standard deviation σ and mean µ ∈ , the externally studentized range is defined by the ratio WW′ 1 Q = = where W’ = W/σ is the sample standard range νν= = ν /2−− 1u /2 XS fx(;) fU (;)|| u J ν /2 u e| J | and S 2 is an independent and unbiased estimator of σ2, 2Γ (ν / 2) associated with ν degrees of freedom. The cumulative 1 2 = ()ννxe2νν /2−− 1x /2 2 x distribution and the probability density functions, according ν /2 2Γ (ν / 2) to David and Nagaraja (2003), are given by Equations 3 and 4, ν /2− 1 2νν 2 = 2νν /2−− 1x /2 ∞∞ ν /2 xx() e n−1 2Γ (ν / 2) F( q ; n ,νφ )=∫∫ n ( y )[ Φ ( xq + y ) −Φ ( y )] × 0 −∞ (3) resulting in Equation 5. × f(; xν ) dydx , Another very interesting order statistic is the midrange and introduced, among others, by Gumbel (1947) and and Rider (1957). Ciência e Agrotecnologia 41(4):378-389, Jul/Aug. 2017 380 BATISTA, B. D. de O. et al. Definition 1. The midrange is defined as the mean between X2, … , Xn are drawn from a population with distribution the minimum and the maximum order statistics by Equation 6, function F(x). Therefore, the distribution of the midrange is given by the following theorem. Theorem 2. The probability density function and the R=( YY1 + n ) / 2. (6) cumulative distribution function of R , Definition 1, from a random sample X , X , … , X , of size n, where X has The externally studentized midrange is defined 1 2 n j distribution function F(x) and probability density function considering the original studentization procedures f(x), j = 1, 2, … , n given by Equations 7 and 8, of Student (1927) and was inspired in the externally studentized range definition. r f() r= 2( nn − 1)()(2 f z f r −× z ) Definition 2. The externally studentized midrange are R ∫−∞ (7) defined by n−2 ×[F (2 r −− z ) F ( z )] dz , Q= RS/, and where R is expressed in Equation 6 and S is an estimator of r the population standard deviation σ with ν degrees of freedom F( r ) = n f ( z )[ F (2 r−− z ) F ( z )]n−1 dz , (8) R ∫−∞ obtained independently from R. It should be noticed that Q is a random variable. respectively. However, few studies address the midrange Proof.