<<

Edited by Emilio Gómez-Déniz

Printed the of SpecialEdition Issue Published in Symmetry

www.mdpi.com/journal/symmetry Distributions

Asymmetric Symmetric and

Symmetric and Asymmetric Distributions • Emilio Gómez-Déniz Symmetric and Asymmetric Distributions

Symmetric and Asymmetric Distributions Theoretical Developments and Applications

Editor Emilio Gomez´ -Deni´ z

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin Editor Emilio Gomez-D´´ eniz Department of Quantitative Methods and TIDES Institute, University of Las Palmas de Gran Canaria Spain

Editorial Office MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal Symmetry (ISSN 2073-8994) (available at: https://www.mdpi.com/journal/symmetry/special issues/Symmetric Asymmetric Distributions Theoretical Developments Applications).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. Journal Name Year, Article Number, Page .

ISBN 978-3-03936-646-0 (Hbk) ISBN 978-3-03936-647-7 (PDF)

c 2020 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications. The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND. Contents

About the Editor ...... vii

Preface to ”Symmetric and Asymmetric Distributions” ...... ix

Emilio G´omez-D´eniz, Yuri A. Iriarte, Enrique Calder´ın-Ojeda and H´ector W. G´omez Modified Power-Symmetric Distribution Reprinted from: Symmetry 2019, 11, 1410, doi:10.3390/sym11111410 ...... 1

F´abio V. J. Silveira, Frank Gomes-Silva, C´ıcero C. R. Brito, Moacyr Cunha-Filho, Felipe R. S. Gusm˜ao and S´ılvio F. A. Xavier-Junior ´ Normal-G Class of Probability Distributions: Properties and Applications Reprinted from: Symmetry 2019, 11, 1407, doi:10.3390/sym11111407 ...... 17

H´ector J. G´omez, Diego I. Gallardo and Osvaldo Venegas Generalized Truncation Positive Reprinted from: Symmetry 2019, 11, 1361, doi:10.3390/sym11111361 ...... 35

Tiago M. Magalh˜aes, Diego I. Gallardo and H´ector W. G´omez of Maximum Likelihood Estimators in the Weibull Censored Data Reprinted from: Symmetry 2019, 11, 1351, doi:10.3390/sym11111351 ...... 53

Guillermo Mart´ınez-Fl´orez, Inmaculada Barranco-Chamorro, Heleno Bolfarine and H´ector W. Gomez ´ Flexible Birnbaum–Saunders Distribution Reprinted from: Symmetry 2019, 11, 1305, doi:10.3390/sym11101305 ...... 63

Neveka M. Olmos, Osvaldo Venegas, Yolanda M. G´omez and Yuri Iriarte An Asymmetric Distribution with Heavy Tails and Its Expectation–Maximization (EM) Algorithm Implementation Reprinted from: Symmetry 2019, 11, 1150, doi:10.3390/sym11091150 ...... 81

Yolanda M. G´omez, Emilio G´omez-D´eniz, Osvaldo Venegas, Diego I. Gallardo and H´ector W. Gomez ´ An Asymmetric Bimodal Distribution with Application to Quantile Regression Reprinted from: Symmetry 2019, 11, 899, doi:10.3390/sym11070899 ...... 95

Barry C. Arnold, H´ector W. G´omez, H´ector Varela and Ignacio Vidal and Bivariate Models Related to the GeneralizedEpsilon–Skew– Reprinted from: Symmetry 2019, 11, 794, doi:10.3390/sym11060794 ...... 107

V. J. Garc´ıa, M. Martel–Escobar and F. J. V´azquez–Polo A Note on Ordering Probability Distributions by Skewness Reprinted from: Symmetry 2018, 10, 286, doi:10.3390/sym10070286 ...... 121

v

About the Editor

Emilio Gomez´ -D´eniz (M.S. in Mathematics, M.S. in Economics, and Ph.D. in Economics) is professor of Mathematics at the University of Las Palmas de Gran Canaria (Spain), where he has taught and conducted research for more than 30 years. He is also member of the Institute of Tourism and Sustainable Economic Development (TIDES). His main lines of research focus on distribution theory, Bayesian , robustness, and Bayesian applications in economics, with emphasis in actuarial statistics, tourism, education, sports, electronic engineering, health economics, etc. He has published more than 150 articles, most of them with JCR impact factor according to Web of Science and his work has been cited more than 1000 times on Google Scholar with an h-index of 16 and an i10 index of 35. He is currently the editor or member of the editorial board of the Spanish Journal of Statistics (SJS), Chilean Journal of Statistics, Mathematical Problems in Engineering, Risks, Risk Magazine, and Journal of Financial and Risk Management (Reviewer Board) and Statistical Methodology and Mathematical Reviews in the past. He has also acted and regularly acts as reviewer for more than 100 international journals. His academic leadership is further reflected through collaborative research with more than 50 leading international scientists worldwide. He is the author of many books on mathematics education and research, including ”Actuarial Statistics. Theory and Applications”, Prentice Hall (in Spanish). He has organized and participated in many international conferences and seminars. In addition, he has visited, as guest researcher and speaker, the University of Melbourne, the University of Kuwait, the University of Cantabria, the University of Antofagasta (Chile), the University of Barcelona, and the University of Granada, among others. In 2008, he was awarded the prestigious IV Julio Castelo Matran´ International Insurance Award, sponsored by the Mapfre Foundation and also recognized by his university for excellence in teaching.

vii

Preface to ”Symmetric and Asymmetric Distributions”

This Special Issue contains nine chapters selected after a comprehensive peer review process. Each chapter exclusively adheres to the topic specified in this Special Issue. The Associate Editor wants to especially thank all the authors who made this Special Issue possible, and all the anonymous reviewers for their altruistic effort and help in reviewing and for their excellent suggestions and critical reviews of the submitted manuscripts. All the chapters are written in the format of a research article for scholarly journals, i.e., title, abstract, and development, with an extensive bibliography at the end of the paper. All of them include real applications that will undoubtedly be useful for other researchers and graduate students who conduct similar research. I express my gratitude to MPDI for publishing this book, to Ms. Celina Si, Section Managing Editor, for her work and patience, and last but not least to my family for their and cooperation.

Emilio Gomez´ -D´eniz Editor

ix

S symmetry S

Article Modified Power-Symmetric Distribution

Emilio Gómez-Déniz 1, Yuri A. Iriarte 2, Enrique Calderín-Ojeda 3 and Héctor W. Gómez 2,* 1 Department of Quantitative Methods in Economics and TIDES Institute, University of Las Palmas de Gran Canaria, 35017 Las Palmas de Gran Canaria, Spain; [email protected] 2 Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile; [email protected] 3 Centre for Actuarial Studies, Department of Economics, The University of Melbourne, Melbourne, VIC 3010, Australia; [email protected] * Correspondence: [email protected]

Received: 2 October 2019; Accepted: 13 November 2019; Published: 15 November 2019

Abstract: In this paper, a general class of modified power-symmetric distributions is introduced. By choosing as symmetric model the normal distribution, the modified power-normal distribution is obtained. For the latter model, some of its more relevant statistical properties are examined. Parameters estimation is carried out by using the method of moments and maximum likelihood estimation. A simulation analysis is accomplished to study the performance of the maximum likelihood estimators. Finally, we compare the efficiency of the modified power-normal distribution with other existing distributions in the literature by using a real dataset.

Keywords: maximum likelihood; ; power-normal distribution

1. Introduction Over the last few years, the search for flexible probabilistic families capable of modeling different levels of bias and kurtosis has been an issue of great interest in the field of distributions theory. This was mainly motivated by the seminal work of Azzalini [1]. In that paper, the probability density function (pdf) of a skew-symmetric distribution was introduced. The expression of this density is given by

g(z; λ)=2 f (z)G(λz), z, λ ∈ R, (1) where f (·) is a symmetric pdf about zero; G(·) is an absolutely continuous distribution function, which is also symmetric about zero; and λ is a parameter of asymmetry. For the case where f (·) is the standard normal density (from now on, we reserve the symbol φ for this function), and G(·) is the standard normal cumulative distribution function (henceforth, denoted by Φ), the so-called skew-normal (SN) distribution with density

φ ( λ)= φ( )Φ(λ ) λ ∈ R Z z; 2 z z , z, , (2) is obtained. We use the notation Z ∼SN(λ) to denote the Z with pdf given by Equation (2). A generalization of the SN distribution is introduced by Arellano-Valle et al. [ 2] and Arellano-Valle et al. [ 3]; they study Fisher’s information matrix of this generalization. For further details about theSN distribution, the reader is referred to Azzalini4 []. Martínez-Flórez et al. [5] used generalizations of the SN distribution to extend the Birnbaum-Saunders model, and Contreras-Reyes and Arellano-Valle [6] utilized the Kullback–Leibler measure to compare the multivariate normal distribution with the skew-multivariate normal. One of the main limitations of working with the family given by Equation (1) is that the information matrix could be singular for some of its particular models (see Azzalini [ 1]). This might

Symmetry 2019, 11, 1410; doi:10.3390/sym111114101 www.mdpi.com/journal/symmetry Symmetry 2019, 11, 1410 lead to some difficulties in the estimation, due to the asymptotic convergence of the maximum likelihood (ML) estimators. To overcome this issue, some authors (see Chiogna [ 7] or Arellano-Valle and Azzalini [8]) have used a reparametrization of theSN model to obtain a nonsingular information matrix. However, this methodology cannot be extended to all type of skew-symmetric models which suffers of this convergence problem. On the other hand, the family of power-symmetric ( PS) distributions does not have this problem of singularity in the information matrix (see, Pewsey et al.9]). [ The pdf of this family of distribution is given by

ϕ ( α)=α ( ){ ( )}α−1 ∈ R α ∈ R+ F z; f z F z , z , . (3) where F(·) is itself a cumulative distribution function (cdf) and α is the . For the particular case thatF(·)=Φ(·), the power-normal ( PN) distribution is obtained, with density given by

α− + f (z; α)=αφ(z){Φ(z)} 1, z ∈ R, α ∈ R . (4)

For some references where this family is discussed, the reader is referred to Lehmann10 [ ], Durrans [11], Gupta and Gupta [ 12], and Pewsey et al. [ 9], among other papers. Other extensions of this model are given in Martínez-Flórez et al. [13], where a multivariate version from the model is introduced; also, Martínez-Flórez et al. [14] carried out applications by using regression models; finally, Martínez-Flórez et al. [15] examined the exponential transformation of the model , and Martínez-Flórez et al. [16] examined a version of the model doubly censored with inflation in a regression context. Truncations of the PN distribution were considered by Castillo el al. [17]. In this paper, a modification in the pdf of the PS probabilistic family is implemented to increase the degree of kurtosis. This methodology is later used to explain datasets that include atypical observations. Usually, this methodology is accomplished by increasing the number of parameters in the model. The paper is organized as follows. In Section 2, first, we introduce the modified power symmetric distribution. Then, the particular case of the modified power normal distribution is derived. Some of the most relevant statistical properties of this model, including moments and kurtosis coefficient, are presented. Next, in Section 3, some methods of estimation are discussed. Later, a simulation study is provided to illustrate the behavior of the shape parameter. A numerical application where the modified power normal distribution is compared to the SN and PN distributions is given in Section 4. Finally, Section 5 concludes the paper.

2. Genesis and Properties of Modified Power-Normal Distribution In this section, we introduce a new family of probability distributions. The idea is to make a transformation to a given probability density, as the skew-symmetric or power-symmetric distributions does. As there exists a certain resemblance between our formula (Equation (6)) and the formula for the power-symmetric distributions (Equation (3)), we agree to name these new distributions as modified power-symmetric (MPS) distributions. From the standard normal distribution, we obtain the so-called Modified Power-Normal (MPN ) distribution. The main parameters and properties of this particular distribution will be studied throughout this work.

2.1. Probability Density Function Definition 1. Let Z be a continuous and symmetric random variable with cdfG(z; η) and pdf g(z; η), where η denotes a vector of parameters. We say that, a random variable, X, follows a MPS distribution, denoted as X ∼MPS(η, α), if its cdf is given by

α [ 1 + G(x; η) ] − 1 F(x; η, α)= , (5) 2α − 1

2 Symmetry 2019, 11, 1410 and its pdf is given by α α− f (x; η, α)= g(x; η) [ 1 + G(x; η) ] 1 . (6) 2α − 1

where x ∈ R and α > 0.

Remark 1. In the case α = 1, the transformation given by Equation (6) is the identity. That is, the MPS distribution for α = 1 always provides the input probability density function.

Thereforeforth, we proceed to examine the MPN distribution, whose cdf is provided by

α x−μ 1 + Φ σ − 1 F(x; μ, σ, α)= , (7) 2α − 1 and whose pdf is given by

α− α x − μ x − μ 1 f (x; μ, σ, α)= φ 1 + Φ , (8) (2α − 1)σ σ σ where x ∈ R, μ ∈ R is the , σ > 0 is the , and α > 0 is the shape parameter. Hereafter, this will be denoted asX ∼MPN(μ, σ, α). Figure 1 depicts some different shapes of the pdf of this model, for selected values of the parameterα with μ = −1, 1 and σ = 1. The MPN class of distributions is applicable for the change point problem, due to its favorable properties (see Maciak et al. [18]); moreover, the MPN model can be utilized in calibration (see Pestaˇ [19]).

Remark 2. Here, μ ∈ R and σ > 0 are location and scale parameters of the MPN distribution, respectively. For the particular case α = 1, these are not only location and scale parameters but also the and of the standard normal distribution.

Figure 1. of the pdf of MPN distribution for selected values of the parameters.

3 Symmetry 2019, 11, 1410

2.2. Statistical Properties

2.2.1. Shape of the Density The MPN distribution exhibits a bell-shaped form, which can be symmetric or positively or negatively skewed depending on the value of the parameter α. Now, we derive some analytical expressions that are useful to obtain approximations of modal values and inflection points of this model. In the following, it will be assumed that μ = 0 and σ = 1.

∼MPN( α) ( ( α)) Proposition 1. The pdf of X 0, 1, has a local maximum at x1, f x1; and two inflection ( ( α)) ( ( α)) points at x2, f x2; and x3, f x3; , respectively, where x1 is the root of the equation ∗ ∗ (α − 1)φ(x ) x = , (9) 1 + Φ(x∗) and x2 and x3 are two solutions of the equation (α − 1)φ(x) 2 (α − 1)φ(x) φ(x) 1 = −x + − x + . (10) 1 + Φ(x) 1 + Φ(x) 1 + Φ(x)

Proof. The proof consists of simple derivatives of the function f . From the equation (8), we calculate

∂ α α− (α − 1)φ(x) f (x; α)= φ(x)[1 + Φ(x)] 1 −x + . ∂x 2α − 1 1 + Φ(x) 2 2 ∂ α α− (α − 1)φ(x) (α − 1)φ(x) f (x; α)= φ(x)[1 + Φ(x)] 1 −x + − 1 + ∂x2 2α − 1 1 + Φ(x) 1 + Φ(x) φ(x) × x + . 1 + Φ(x)

By setting Equations (9) and (10) to be equal to zero, the results are obtained after some algebra. Figure 2 displays the graph of the first derivative off (·), where it is observed that the maximum exists and it is unique. Therefore, the MPN distribution is unimodal. 

Figure 2. Plot of the first derivative of MPN distribution for selected values of the parameters.

4 Symmetry 2019, 11, 1410

Remark 3. The solutions of Equations (9) and (10) can be numerically obtained by using the built-in function “uniroot” in the software package R. Table 1 below illustrates some approximations of the roots x1, x2, and x3, and the corresponding figures of the pdf evaluated at these values.

Table 1. Approximations of the roots of Equations (9) and (10) for some values of α, and the corresponding figures of the pdf of the MPN evaluated at these roots.

α x x x f (x α) f (x α) f (x α) 1 2 3 1; 2; 3; 0.5 −0.136 −1.135 0.886 0.397 0.239 0.241 1.0 0.000 −1.000 1.000 0.399 0.242 0.242 2.0 0.243 −0.691 1.173 0.412 0.261 0.251 3.0 0.435 −0.414 1.299 0.433 0.282 0.266 4.0 0.586 −0.203 1.396 0.457 0.298 0.284 5.0 0.706 −0.041 1.475 0.481 0.316 0.301

2.2.2. Moments Proposition 2. The rth moments of X ∼MPN(0, 1, α) for r = 1, 2, 3, . . . , are given by α E(Xr)= a (α), (11) 2α − 1 r (α) where ar is defined as 1 −1 r α−1 ar(α)= [Φ (u)] (1 + u) du. (12) 0 − Here, Φ 1(·) is the of the standard normal distribution.

Proof. By using the change of variable u = Φ(x), it follows that ∞ α E( r)= r φ( )[ + Φ( )]α−1 X x α x 1 x dx −∞ 2 − 1 α 1 = [Φ−1( )]r( + )α−1 α u 1 u du 2 − 1 0 α = a (α).  2α − 1 r

Corollary 1. The mean and of X are given by α E(X)= a (α) and 2α − 1 1 α α Var(X)= a (α) − a2(α) , 2α − 1 2 2α − 1 1 respectively.

β β Corollary 2. The skewness ( 1) and kurtosis ( 2) coefficients are, respectively, given by

(α) − 3α (α) (α)+ 2α2 3(α) a3 α− a a2 α 2 a β = 2 1 1 (2 −1) 1 1 and α 3/2 [ (α) − α 2(α)]3/2 2α−1 a2 2α−1 a1 α α2 α3 (α) − α4 (α) (α)+ 6 2(α) (α) − 3 4(α) a4 2 −1 a1 a3 (2α−1)2 a1 a2 (2α−1)3 a1 β = . 2 α [ (α) − α 2(α)]2 2α−1 a2 2α−1 a1

Remark 4. Observe that the integral in Equation (12) can be numerically approximated by using the built-in function integrate available in the software package R. Below, in Table 2, some approximations of the mean and

5 Symmetry 2019, 11, 1410 variance for the MPN distribution for different values of α are displayed. Figure 3 illustrates the behavior of the E(X) and Var(X) of the MPN distribution for different values of α. It is observable that when α grows, the mean increases and the variance decreases. Figure 4 displays the curves associated with the coefficients of skewness (left panel) and kurtosis (right) of the MPN and PN distributions. It is shown that, depending on the values of α, the MPN distribution exhibits equal, greater, or lesser values for these coefficients compared to thePN model. In general, the MPN distribution has a smaller range of skewness than thePN distribution. On the other hand, whenα < 13.05, the MPN distribution has a greater kurtosis coefficient than the PN model.

Table 2. Approximations of E(X) and Var(X) of the MPN distribution for different values of α.

α E(X) Var(X)

0.5 −0.097 1.006 1.0 0.000 1.000 5.0 0.659 0.770 10.0 1.119 0.521 100.0 2.247 0.218

Figure 3. Plot of the E(X) and Var(X) of the MPN distribution.

Figure 4. Graphs of the skewness and kurtosis coefficients for the MPN and PN distributions.

6 Symmetry 2019, 11, 1410

2.2.3. Stochastic Ordering Stochastic ordering is an important tool to compare continuous random variables. It is well-known ≤ that random variable X1 is smaller than random variable X2 in stochastic ordering ( X1 st X2)if ( ) ≥ ( ) ≤ ( ) ( ) FX1 x FX2 x for all x, and in likelihood ratio order (X1 lr X2)if fX1 x / fX2 x decreases with x. Using Theorem 1.C.1 and Theorem 2.A.1 of Shaked and Shanthikumar20 [ ], the above stochastic orders hold according to the following implications,

≤ ⇒ ≤ X1 lr X2 X1 st X2. (13)

The proposition shows that the members of the MPN family can be stochastically ordered according to parameters values.

∼MPN( α ) ∼MPN( α ) α > α ≤ Proposition 3. Let X1 0, 1, 1 and X2 0, 1, 2 .If 1 2, then X1 lr X2 and, ≤ therefore, X1 st X2.

Proof. From the quotient of both densities, it follows that

α f (x; α ) α 2 1 − 1 α −α X2 2 = 2 [ + Φ( )] 2 1 α 1 x , ( α ) α 2 − fX1 x; 1 1 2 1 is non-decreasing if and only if μ (x) ≥ 0 for x ∈ (−∞, ∞), where

α −α μ(x)=[1 + Φ(x)] 2 1 .

After some calculations, it is shown that

α −α − μ ( )=(α − α )φ( )[ + Φ( )] 2 1 1 x 2 1 x 1 x .

α > α μ ( ) < ∈ (−∞ ∞) ( α ) ( α ) It is straightforward that for 1 2, then x 0 for x , . Therefore, fX2 x; 2 / fX1 x; 1 ≤ is decreasing in x, and consequently X1 lr X2. The other implication follows immediately from (13). 

3. Inference In this section, parameters estimation for theMPN distribution is discussed by using the method of moments and ML estimation. Additionally, a simulation analysis is carried out to illustrate the behavior of the ML estimators.

3.1. Method of Moments The following proposition illustrates the derivation of the estimates of the MPN distribution.

∼MPN(μ σ α) Proposition 4. Let x1, ..., xn be a random sample obtained from the random variable X , , , θ =(μ σ α ) θ =(μ σ α) then the moment estimates M M, M, M for , , are given by α σ = Sx μ = − σ M (α ) M , M x M α a1 M (14) α α 2 M − 1 M (α ) − M 2(α ) α a2 M α a M 2 M − 1 2 M − 1 1 and α α2 (α ) − 3 M (α ) (α )+ 2 M 3(α ) a3 M α a M a2 M α a M 2 M −1 1 (2 M −1)2 1 − Ax = 0, (15) α 3/2 α M [ (α ) − M 2(α )]3/2 α a2 M α a M 2 M −1 2 M −1 1

7 Symmetry 2019, 11, 1410

where x, Sx and Ax denote the sample mean, sample standard deviation and sample Fisher’s skewness coefficient respectively.

Proof. As μ and σ are location and scale parameters respectively, the skewness coefficient does not depend on these parameters. Thus, the result in (15) is directly obtained from matching the sample skewness coefficient with population counterpart given in Corollary 2. In addition, by considering that X = σZ + μ, where Z ∼MPN(0, 1, α), and again by equating sample mean and sample variance to the mean and variance respectively, it follows that

= σ E( )+μ x M X M, α = σ M (α )+μ M α a1 M M, 2 M − 1 and 2 = σ2 V ( ) Sx M ar X α α = σ2 M (α ) − M 2(α ) M α a2 M α a1 M , 2 M − 1 2 M − 1 α μ where M satisfies expression (15). Then, (14) is obtained by solving the latter equations for M and σ  M, respectively.

3.2. Maximum Likelihood Estimation MPN (μ σ α) For a random sample x1, ..., xn derived from the , , distribution, the log- can be written as n n − μ (μ σ α)= (σ α) − 1 ∑( − μ)2 +(α − ) ∑ + Φ xi , , nc , σ2 xi 1 log 1 σ , (16) 2 i=1 i=1

(σ α)= (α) − ( α − ) − (σ) − 1 ( π) where c , log log 2 1 log 2 log 2 . The score equations are given by

n μ + σ(α − ) κ( )= n 1 ∑ xi nx, (17) i=1 n n σ2 + σ(α − ) ( − μ)κ( )= ( − μ)2 n 1 ∑ xi xi ∑ xi , (18) = = i 1 i 1 n − μ α ( ) n + + Φ xi = 2 log 2 n α ∑ log 1 σ α − , (19) i=1 2 1

φ( w−μ ) κ( )=κ( μ σ)= σ where w w; , w−μ . 1+Φ( σ )

Solutions for these Equations (17)–(19) can be obtained by using numerical procedures such as Newton–Raphson algorithm. Alternatively, these estimates can be found by directly maximizing the log-likelihood surface given by (16) and using the subroutine “optim” in the software package21 [].

3.3. Simulation Study To examine the behavior of the proposed approach, a simulation study is carried out to assess the performance of the estimation procedure for the parameters μ, σ, and α in the MPN model. The simulation analysis is conducted by considering 1000 generated samples of sizes n = 50, 100, and 200 from the MPN distribution. The goal of this simulation is to study the behavior of the ML

8 Symmetry 2019, 11, 1410 estimators of the parameters by using our proposed procedure. To generate X ∼MPN(μ, σ, α), the following algorithm is used, ∼ ( ) 1. Step 1: Generate W Uniform 0, 1 . − α α 2. Step 2: Compute X = μ + σΦ 1 (2 W − W + 1)1/ − 1 .

− where μ ∈ R, σ > 0, α > 0 and Φ 1(·) is the quantile function of the standard normal distribution. For each generated sample of the MPN distribution, the ML estimates and corresponding standard deviation (SD) were computed for each parameter. As it can be seen in Table 3, the performance of the estimates improves when n and α increases.

Table 3. Maximum likelihood (ML) estimates and standard deviation (SD) for the parametersμ, σ and α of the MPN model for different generated samples of sizes n = 50, 100, and 200.

n = 50 μσ α μ (SD) σ (SD) α (SD) 0 1 0.1 −0.352478 (0.149214) 0.994441 (0.091321) 0.190243 (0.175202) 0.5 −0.19534 (0.14501) 0.990622 (0.094550) 0.613052 (0.272096) 0.8 −0.083183 (0.144587) 0.990286 (0.098669) 0.854338 (0.164924) 1 −0.009586 (0.141691) 0.995312 (0.0997256) 1.007328 (0.122688) 5 0.004225 (0.100001) 0.997408 (0.088254) 5.030272 (0.229064) 10 0.001108 (0.066610) 0.999124 (0.068611) 10.060478 (0.475019) 100 0.002171 (0.017362) 1.001152 (0.029604) 100.437990 (2.668190) n = 100 0 1 0.1 −0.351446 (0.104552) 0.998513 (0.070831) 0.180054 (0.111930) 0.5 −0.19268 (0.101786) 0.997622 (0.068806) 0.576957 (0.223378) 0.8 −0.08140 (0.099360) 0.997674 (0.069451) 0.830318 (0.152995) 1 0.002786 (0.097411) 0.996444 (0.069648) 1.002200 (0.088749) 5 0.002014 (0.099305) 0.996788 (0.085987) 5.023032 (0.221756) 10 0.002897 (0.046109) 1.000515 (0.050192) 10.032857 (0.339106) 100 0.000623 (0.012137) 1.000185 (0.019759) 100.168752 (1.866302) n = 200 0 1 0.1 −0.348177 (0.072732) 0.999433 (0.047548) 0.170978 (0.076165) 0.5 −0.196617 (0.072015) 0.999142 (0.047896) 0.562935 (0.218890) 0.8 −0.076657 (0.069510) 0.997719 (0.050718) 0.824700 (0.127661) 1 0.001158 (0.06877) 0.998408 (0.050586) 1.003651 (0.058344) 5 −0.000165 (0.053006) 1.000623 (0.044182) 5.005130 (0.115719) 10 −0.000239 (0.033615) 1.000017 (0.035902) 10.014958 (0.246652) 100 0.000514 (0.008452) 1.000491 (0.014599) 100.104380 (1.295144)

Fisher’s Information Matrix X−μ Let us now consider X ∼MPN(μ, σ, α) and Z = σ ∼MPN(0, 1, α). For a single observation x of X, the log-likelihood function for θ =(μ, σ, α) is given by 1 x − μ (θ)=log f (θ, x)=c(σ, α) − (x − μ)2 +(α − 1) log 1 + Φ . X 2σ2 σ

The corresponding first and second partial derivatives of the log-likelihood function are derived in the Appendix A. It can be shown that the Fisher’s information matrix for theMPN distribution is provided by ⎛ ⎞ μμ μσ μα ⎜ I I I ⎟ (θ)= IF ⎝ Iσσ Iσα ⎠ Iαα

9 Symmetry 2019, 11, 1410

with the following entries,

1 α − 1 μμ = + ( + ) I σ2 σ2 b11 b02 , 2 α − 1 μσ = E( ) − ( − − ) I σ2 Z σ2 b01 b21 b12 ,

= 1 Iμα σ b01, 1 3 α − 1 σσ = − + E( 2) − ( − − ) I σ2 σ2 Z σ2 2b11 b31 b22 ,

= 1 Iσα σ b11,

α 1 2 log2(2) Iαα = − , α2 (2α − 1)2   = E iκj( ) where bij Z Z;0,1 must be numerically computed. The Fisher’s (expected) information matrix can be obtained by computing the expected values of the above expressions. By taking in this matrix, α = 1, we have that Z ∼ N(μ, σ) and ⎛ ⎞ 1 0 1 d ⎜ σ2 σ 02 ⎟ (μ σ α = )=⎝ 2 1 ⎠ IF , , 1 0 σ2 σ d12 , 1 1 − 2( ) σ d02 σ d12 1 2 log 2  = ∞ ziφj(z) where dij −∞ 1+Φ(z) dz must be numerically obtained. (μ σ α = ) ( − 2( ) − 2 − 2 ) σ4 = σ4 = The determinant of IF , , 1 is 2 4 log 2 b12 2b02 / 0.003357435/ 0, consequently, the Fisher’s information matrix is nonsingular at α = 1. Therefore, for large samples, the ML estimators, θ,ofθ are asymptotically normal, that is, √ L θ − θ −→ ( (θ)−1) n N3 0, I , resulting in the asymptotic variance of the ML estimators θˆ being the inverse of Fisher’s information matrix I(θ). As the parameters are unknown, the observed information matrix is usually considered, where the unknown parameters are estimated by ML.

4. Application In this section, a numerical illustration based on a real dataset is presented. The goal of this application is to show empirical evidence that the MPN yields a better fit to data than the PN, SN, and t-student (TS) with α degrees of freedom distributions. For that reason, we consider a set of 3848 observations of the variable “density” included in the dataset verb “POLLEN5.DA” available at http://lib.stat.cmu.edu/datasets/pollen.data. This variable measures a geometric characteristic of a specific type of pollen. This dataset was previously used by Pewsey et al. [ 9]to compare thePN and SN distributions. A summary of some are displayed in Table 4 below.

10 Symmetry 2019, 11, 1410

Table 4. Summary of descriptive statistics for the pollen density dataset.

Mean Variance Skewness Kurtosis 0.000 −0.030 9.887 0.109 3.193

By using the results derived in Proposition 4, we have computed the moment estimates for the parameters (μ, σ, α) of the MPN distribution, obtaining(−5.609, 4.576, 11.857). Then, by taking these numbers as initial values, the ML estimates are derived. In Table 5, the ML estimates for the parameters of the MPN , PN, SN, and TS distributions. The figures between brackets are the asymptotic standard errors of the estimates obtained by inverting the Fisher’s information matrices for the three models evaluated at their respective ML estimates. Additionally, for each model, the values of the maximum of the log-likelihood function ( max) are reported. TheMPN distribution attains the largest value, and consequently provides a better fit to data.

Table 5. Parameter estimates; standard errors (SE); and maximum of the log-likelihood function,max, for the TS, SN, PN, and MPN corresponding to the pollen density dataset.

Parameters TS(SE) SN(SE) PN(SE) MPN (SE) μ −0.010 (0.05) −2.04 (0.24) −1.74 (0.68) −5.73 (0.43) σ 3.037 (0.05) 3.75 (0.14) 3.69 (0.21) 4.62 (0.14) α 29.995 (13.01) 0.93 (0.14) 1.77 (0.37) 12.13 (1.21)

max −9864.99 −9863.42 −9863.37 −9861.98

To compare the fit achieved by each distribution, the values of several measures of , i.e., Akaike’s information criterion (AIC) (see Akaike [ 22]) and Bayesian information criterion (BIC) (see Schwarz [23]) are reported in Table 6. A model with lower numbers in these measures of model selection is preferable. It can be seen that the MPN is preferable in terms of these two measures of model validation. In addition, the Kolmogorov–Smirnov test statistics and the correspondingp-values has been included in this table for all the models considered. It can be observed that none of the models is rejected at the usual significance levels. However, the MPN distribution has a higher p-value and is rejected later than the other two models. Alternative methods of model selection to the Kolmogorov–Smirnov test that can be applied here can be found in J a¨ntschi and Bolboac a˘ [24] and Ja¨ntschi [25]. Furthermore, the associated to the empirical distribution of the variable “density” in the pollen dataset is illustrated in the left hand side of Figure 5. In addition, the densities of TS, SN, PN, and MPN , by using the maximum likelihood estimates of their parameters, have been superimposed. Similarly, on the right hand side of Figure 5, the fit in both tails is shown. It is observable that, for this dataset, theMPN has thicker tails than the other three distributions. Finally, the QQ-plots for each distribution considered have been illustrated in Figure 6. Here, note that the ◦ MPN distribution exhibits an almost perfect alignment with the 45 line, and therefore it provides a better fit for extreme quantiles. Finally, Figure 7 displays the profile log-likelihood ofμ, σ, and α of the MPN distribution. It is noticeable that the estimates are unique.

Table 6. Akaike’s information criterion (AIC), Bayesian information criterion (BIC), Kolmogorov– Smirnov (KSS) test, and the corresponding p-values for all the models considered.

Criteria TS SN PN MPN AIC 19,735.98 19,732.84 19,732.74 19,729.96 BIC 19,754.74 19,751.61 19,751.50 19,748.72 KSS (p-value) 0.014 (0.516) 0.013 (0.559) 0.012 (0.627) 0.010 (0.820)

11 Symmetry 2019, 11, 1410

031 031 31 31 61 61 76 76

031 31

'HQVLW\ 61

76 'HQVLW\ 'HQVLW\             H Hí Hí Hí Hí Hí

í í    í í í 3ROOHQGHQVLW\

3ROOHQGHQVLW\ 3ROOHQGHQVLW\

Figure 5. Left panel: Histogram of the empirical distribution and fitted densities by ML superimposed for pollen dataset. Right panel: Plots of the tails for the four models.

D E 6DPSOH4XDQWLOHV 6DPSOH4XDQWLOHV í í    í í   

í í    í í   

7KHRUHWLFDO4XDQWLOHV 7KHRUHWLFDO4XDQWLOHV F G 6DPSOH4XDQWLOHV 6DPSOH4XDQWLOHV í í    í í   

í í    í í   

7KHRUHWLFDO4XDQWLOHV 7KHRUHWLFDO4XDQWLOHV

Figure 6. QQ-plots: (a) MPN model; (b) PN model; (c) SN model; (d) TSmodel.

12 Symmetry 2019, 11, 1410

Figure 7. Profile log-likelihood of μ, σ and α for the MPN distribution.

5. Concluding Remarks In this paper, a modification of the continuous symmetric-power distribution has been introduced. The particular case of the normal distribution the MPN distribution has been examined in detail. This distribution arises by modifying the distribution function of the symmetrical powers family. After carrying out this modification, a more flexible family of probability distributions is obtained, allowing for the kurtosis coefficient to take a certain range of values in the parameter space. For this model, its basic properties, different method of estimation and Fisher’s information matrix were studied. By using a real dataset, we showed that theMPN distribution provides a better fit than other existing models in the literature such as the TS, SN, and PN distributions.

Author Contributions: The authors contributed equally to this work. Funding: This work was partially completed while Héctor W. Gómez visited the Universidad de Las Palmas de Gran Canaria, supported by MINEDUC-UA project, code ANT1855. This research was also funded by (EGD) [Ministerio de Economía y Competitividad, Spain] grant number [ECO2013–47092]; (EGD)[Ministerio de Economía, Industria y Competitividad. Agencia Estatal de Investigación] grant number [ECO2017–85577–P ]; Acknowledgments: We also acknowledge the referee’s suggestions that helped us to improve this work. Conflicts of Interest: The authors declare no conflicts of interest.

Appendix A The first derivatives of (θ) are given by

∂(θ) − μ = 1 x − (α − )κ( ) ∂μ σ σ 1 x   ∂(θ) − μ 2 − μ = − 1 − x +(α − ) x κ( ) ∂σ σ 1 σ 1 σ x α ∂(θ) 1 2 log(2) x − μ = − + log 1 + Φ ∂α α 2α − 1 σ

13 Symmetry 2019, 11, 1410

The second derivatives of l(θ) are ∂2(θ) α − − μ = − 1 − 1 x κ( )+κ2( ) ∂μ2 σ2 σ2 σ x x   ∂2(θ) − μ α − − μ 2 − μ = − 2 x + 1 κ( ) − x − x κ( ) ∂μ∂σ σ2 σ σ2 x 1 σ σ x

∂2(θ) ( ) = − k x ∂μ∂α σ   ∂2(θ) − μ 2 α − − μ − μ 2 − μ = 1 − 3 x + 1 x κ( ) − x − x κ( ) ∂σ2 σ2 σ2 σ σ2 σ x 2 σ σ x ∂2(θ) − μ = − x ( ) ∂σ∂α σ2 k x

∂2(θ) α 2( ) = − 1 + 2 log 2 ∂α2 α2 (2α − 1)2

References

1. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178. 2. Arellano-Valle, R.B.; Gómez, H.W.; Quintana, F.A. A New Class of Skew-Normal Distributions. Commun. Stat. Theory Methods 2004, 33, 1465–1480. [CrossRef] 3. Arellano-Valle, R.B.; Gómez, H.W.; Salinas, H.S. A note on the Fisher information matrix for the skew-generalized-normal model. Stat. Oper. Res. Trans. 2013, 37, 19–28. 4. Azzalini, A. The Skew-Normal and Related Families; IMS monographs; Cambridge University Press: New York, NY, USA 2014. 5. Martínez-Flórez, G.; Barranco-Chamorro, I.; Bolfarine, H.; Gómez, H.W. Flexible Birnbaum–Saunders Distribution. Symmetry 2019, 11, 1305. [CrossRef] 6. Contreras-Reyes, J.E.; Arellano-Valle, R.B. Kullback–Leibler Divergence Measure for Multivariate Skew-Normal Distributions. Entropy 2012, 14, 1606–1626. [CrossRef] 7. Chiogna, M. A note on the asymptotic distribution of the maximum likelihood estimator for the scalar skew-normal distribution. Stat. Methods Appl. 2005, 14, 331–341. [CrossRef] 8. Arellano-Valle, R.B.; Azzalini, A. The centred parametrization for the multivariate skew-normal distribution. J. Multivar. Anal. 2008, 99, 1362–1382. [CrossRef] 9. Pewsey, A.; Gómez, H.W.; Bolfarine, H. Likelihood-based inference for power distributions. Test 2012, 21, 775–789. [CrossRef] 10. Lehmann, E.L. The power of rank tests. Ann. Math. Statist. 1953, 24, 23–43. [CrossRef] 11. Durrans, S.R. Distributions of fractional order statistics in hydrology. Water Resour. Res.1992, 28, 1649–1655. [CrossRef] 12. Gupta, D.; Gupta, R.C. Analyzing skewed data by power normal model. Test 2008, 17, 197–210. [CrossRef] 13. Martínez-Flórez, G.; Arnold, B.C.; Bolfarine, H.; Gómez, H.W. The Multivariate Alpha-power Model. J. Stat. Plan. Inference 2013, 143, 1236–1247. [CrossRef] 14. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. Asymmetric regression models with limited responses with an application to antibody response to vaccine. Biom. J. 2013, 55, 156–172. [CrossRef][PubMed] 15. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. The log alpha-power asymmetric distribution with application to air pollution. Environmetrics 2014, 25, 44–56. [CrossRef] 16. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. Doubly censored power-normal regression models with inflation. Test 2015, 24, 265–286. [CrossRef] 17. Castillo, N.O.; Gallardo, D.I.; Bolfarine, H.; Gómez, H.W. Truncated Power-Normal Distribution with Application to Non-Negative Measurements. Entropy 2018, 20, 433. [CrossRef]

14 Symmetry 2019, 11, 1410

18. Maciak, M.; Pesˇtová, B.; Pesˇta, M. Structural breaks in dependent, heteroscedastic, and extremal panel data. Kybernetika 2018, 54, 1106–1121. [CrossRef] 19. Pesˇta, M. Total least squares and bootstrapping with application in calibration. Statistics 2013, 47, 966–991. [CrossRef] 20. Shaked, M.; Shanthikumar, J.G. Stochastic Orders; Springer Science & Business Media: New York, NY, USA, 2007. 21. R Development Core Team. A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019. 22. Akaike, H. A new look at the identification. IEEE Trans. Autom. Control 1974, 19, 716–723. [CrossRef] 23. Schwarz, G. Estimating the dimension of a model. Ann. Statist. 1978, 6, 461–464. [CrossRef] 24. Ja¨ntschi, L.; Bolboac a˘, S.D. Computation of Probability Associated with Anderson–Darling . Mathematics 2018, 6, 88. [CrossRef] 25. Ja¨ntschi, L. A Test Detecting the Outliers for Continuous Distributions Based on the Cumulative Distribution Function of the Data Being Tested. Symmetry 2019, 11, 835. [CrossRef]

c 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

15

S symmetry S

Article Normal-G Class of Probability Distributions: Properties and Applications

Fábio V. J. Silveira 1, Frank Gomes-Silva 1,*, Cícero C. R. Brito 2, Moacyr Cunha-Filho 1, Felipe R. S. Gusmão 1 and Sílvio F. A. Xavier-Júnior 3 1 Department of Statistics and Informatics, Rural Federal University of Pernambuco, Recife 52171900, Pernambuco, Brazil; [email protected] (F.V.J.S.); [email protected] (M.C.-F.); [email protected] (F.R.S.G.) 2 Federal Institute of Education, Science and Technology of Pernambuco, Recife 50740545, Pernambuco, Brazil; [email protected] 3 Department of Statistics, Paraíba State University, Campina Grande 58429500, Paraíba, Brazil; [email protected] * Correspondence: [email protected]

Received: 28 September 2019; Accepted: 12 November 2019; Published: 15 November 2019

Abstract: In this paper, we propose a novel class of probability distributions called Normal-G. It has the advantage of demanding no additional parameters besides those of the parent distribution, thereby providing parsimonious models. Furthermore, the class enjoys the property of identifiability whenever the baseline is identifiable. We present special Normal- G sub-models, which can fit asymmetrical data with either positive or negative skew. Other important mathematical properties are described, such as the series expansion of the probability density function (pdf), which is used to derive expressions for the moments and the moment generating function (mgf). We bring Monte Carlo simulation studies to investigate the behavior of the maximum likelihood estimates (MLEs) of two distributions generated by the class and we also present applications to real datasets to illustrate its usefulness.

Keywords: probabilistic distribution class; normal distribution; identifiability; maximum likelihood; moments

1. Introduction For many purposes, statistical distributions are used in a plethora of science fields. They are regularly useful tools to describe natural and social phenomena, providing suitable models which can help dealing with real problems, such as for instance, those concerning the prediction of an event of interest. Recent works have focused attention at formulating and describing new classes of probability distributions, which are defined generally as extensions of widely known models by adding a single or more parameters to the cumulative distribution function (cdf). Hopefully, the new models will provide more flexibility and better fitting to real data. Some examples are [ 1,2] where a shape parameter is added to the model by exponentiating the cdf. A general method of introducing a parameter to expand a family of distributions was presented by3 [ ]; they applied the method to create a new two-parameter extension of the and a new three-parameter . A natural generalization of the Normal pdf was proposed by4 [] and perhaps it is the most widely known generalized Normal distribution. The power 2 appearing in the original pdf was replaced by a shape parameter s > 0. Therewith, the new pdf becomes:     − μ s ( |μ σ )= −  x  f x , , s K exp  σ  ,

Symmetry 2019, 11, 1407; doi:10.3390/sym1111140717 www.mdpi.com/journal/symmetry Symmetry 2019, 11, 1407 where K is a normalizing constant, which depends onσ and s. One can see that the is a particular case of the generalized Normal of Nadarajah [4] when s = 1. Azzalini [5] defined a mathematically tractable class that includes strictly (not just asymptotically) the Normal distribution. The general pdf of the class is 2 G(λy) f (y) for −∞ < y < ∞, where λ ∈ R, d = Φ G is an absolutely continuous cdf, dy G and f are pdfs symmetric about 0. Making G and f = φ, namely the standard normal cdf and pdf respectively, one gets to the well-known skew-normal distribution, whose pdf isφ(y; λ)=2φ(y)Φ(λy). It is easy to see that φ(y;0)=φ(y), but when λ = 0, the distribution is asymmetric and its coefficient of skewness has the same sign as λ. A generalization denoted by compressed normal distribution was introduced by [ 6], whose objective was dealing with negatively skewed data (specifically with human longevity data); in this way, they induced a skew by addingkx to the denominator of the location-scale transformation, that is,

x − μ t(x)= σ + kx and when k < 0, the curve presents a negative skew; for k > 0, a positive skew occurs. Classes with one or more additional parameters usually generalize existing classes as particular cases. The McDonald-Weibull distribution7 [ ] is an important sub-model of the McDonald class; it has three extra parameters and includes the Beta-Weibull [8] and the Kumaraswamy-Weibull [9] as special cases. A technique to derive families of continuous distributions using a pdf as a generator was introduced by [10] and the models emerged from such method are called members of the T-X family. In other words, if r(t) is the pdf of a random variable T ∈ [a, b], for −∞ ≤ a < b ≤ ∞ and W(G(x)) is a function of the cdf G(x) of a random variable X so that: • W(G(x)) ∈ [a, b]; • W(G(x)) is differentiable and monotonically non-decreasing; • W(G(x)) → a as x →−∞ and W(G(x)) → b as x → ∞;  ( )= W(G(x)) ( ) then F x a r t dt is the cdf of a new family of distributions. An example of a T-X family member is the Gompertz- G class [11]; to define its cdf, the chosen γ − θ ( γt− ) functions wereW[G(x)] = − log[1 − G(x)] and r(t)=θe te γ e 1 for t > 0, given that θ > 0, γ > 0. Varying G(x), one can get different sub-models of the class. The procedure to define a T-X family member is indeed capable to generalize a large number of distributions. Even though it can be regarded as a particular case described by the method of generating classes of probability distributions presented in the recent work of [12]. This new method has a high power of generalization. It consists of creating distribution classes by integrating a cdf, such that the limits of the integration are special functions that satisfy some conditions. Thus, the cdf of the general class is given by:

( ) ( ) n Uj x n Vj x F(x)=ζ(x) ∑ dH(t) − ν(x) ∑ dH(t) (1) ( ) ( ) j=1 Lj x j=1 Mj x

∈ N ζ ν R → R R → R ∪{±∞} where H is a cdf, n , , : and Lj, Uj, Mj, Vj : are the aforementioned special functions that will be discussed in the next section. Based on this innovative method, we introduce the Normal-G class of distributions. We consider that this extension will yield good submodels. This paper aims to investigate and compare some of them with other competitive extended probability distributions.

2. The Normal-G Class and Some Mathematical Properties ζ ν R → R R → R ∪{±∞} The method established by [ 12] states that if H, , : and Lj, Uj, Mj, Vj : for j = 1, 2, 3, . . . , n are monotonic and right continuous functions such that: (c1) H is a cdf and ζ and ν are non-negative;

18 Symmetry 2019, 11, 1407

ζ( ) ( ) ( ) ν( ) ( ) ( ) ∀ = (c2) x , Uj x and Mj x are non-decreasing and x , Vj x , Lj x are non-increasing j 1, 2, 3, . . . , n; ζ( ) = ν( ) ζ( )= ( )= ( ) ∀ = (c3) If lim x lim x , then lim x 0 or lim Uj x lim Lj x j x→−∞ x→−∞ x→−∞ x→−∞ x→−∞ ν( )= ( )= ( ) ∀ = 1, 2, 3, . . . , n, and lim x 0 or lim Mj x lim Vj x j 1, 2, 3, . . . , n; x→−∞ x→−∞ x→−∞ ζ( )= ν( ) = ( )= ( ) ( )= (c4) If lim x lim x 0, then lim Uj x lim Vj x and lim Mj x x→−∞ x→−∞ x→−∞ x→−∞ x→−∞ ( ) ∀ = lim Lj x j 1, 2, 3, . . . , n; x→−∞ ( ) ≤ ( ) ν( ) = ( ) ≤ ( ) ∀ = (c5) lim Lj x lim Uj x and if lim x 0, then lim Mj x lim Vj x j x→−∞ x→−∞ x→−∞ x→+∞ x→+∞ 1, 2, 3, . . . , n; ( ) ≥ { ∈ R ( ) < } ( ) ≤ { ∈ R ( ) > } (c6) lim Un x sup x : H x 1 and lim L1 x inf x : H x 0 ; x→+∞ x→+∞ (c7) lim ζ(x)=1; x→+∞ ν( )= ( )= ( ) ∀ = ≥ (c8) lim x 0 or lim Mj x lim Vj x j 1, 2, 3, . . . , n and n 1; x→+∞ x→+∞ x→+∞ ( )= ( ) ∀ = − ≥ (c9) lim Uj x lim Lj+1 x j 1, 2, 3, . . . , n 1 and n 2; x→+∞ x→+∞ ( ) ( ) (c10) H is a cdf without points of discontinuity or all functions Lj x and Vj x are constant at the right of the vicinity of points whose image are points of discontinuity of H, being also continuous in that points. Moreover, H does not have any point of discontinuity in the set ( ) ( ) ( ) ( ) = lim Lj x , lim Uj x , lim Mj x , lim Vj x for some j 1, 2, 3, . . . , n; x→±∞ x→±∞ x→±∞ x→±∞ then Equation (1) is a cdf. = ζ( )= ν( )= ( )=−∞ ( )=[ ( ) − ] ( ( )[ − ( )]) Let n 1, x 1, x 0, L1 x , U1 x 2G x 1 / G x 1 G x and H(t)=Φ(t); the function in Equation (1) turns into:

2G(x)−1 G(x)(1−G(x)) F (x)= dΦ(t) , (2) G −∞

( ) ν( )= ( ) ( ) where G x is a cdf. Since x 0, there is no need to specify M1 x and V1 x . The conditions (c1), (c7), (c8) and (c10) are straightforward; clearly (c4), (c5) and (c9) do not need to be verified in this case. Given that G(x) is non-decreasing:

< ⇒ ( ) ≤ ( ) ⇒ 1 ≤ 1 x1 x2 G x1 G x2 − ( ) − ( ) 1 G x1 1 G x2 ⇒ 1 − 1 ≤ 1 − 1 − ( ) ( ) − ( ) ( ) 1 G x1 G x1 1 G x2 G x2 ( ) − ( ) − ⇒ 2G x1 1 ≤ 2G x2 1 ⇒ ( ) ≤ ( ) ( )( − ( )) ( )( − ( )) U1 x1 U1 x2 , G x1 1 G x1 G x2 1 G x2 ( ) ζ( ) ( ) so U1 x is non-decreasing, as well as x ; and since L1 x is non-increasing, (c2) is true. Considering ( )= [ − ( )] − ( ) ( )=−∞ = ( ) that U1 x 1/ 1 G x 1/G x , it is easy to verify that lim U1 x lim L1 x ; and x→−∞ x→−∞ ν( )= ( )=+∞ = sincelim x 0, (c3) is satisfied. The condition (c6) is also true because lim U1 x x→−∞ x→+∞ { ∈ R ( ) < } ( )=−∞ = { ∈ R ( ) > } sup x : H x 1 and lim L1 x inf x : H x 0 . x→+∞ Therefore, according to the method exposed above, Equation (2) is a cdf and, from now on, we will denote it by Normal- G class of probability distributions. The new cdf can be viewed as a composed function of G(x), which will be referred as parent distribution or baseline; in agreement with12 [ ], if the baseline is continuous (discrete), then the Normal-G will generate a continuous (discrete) distribution, whose support will be the same as G(x). It is worth remarking that the proposed class demands no additional parameters other than the ones of the parent distribution. Although the Normal- G class has been defined as a composed function of a single G(x),itis possible to formulate classes that depend on more than one baseline; see [12] for further details.

19 Symmetry 2019, 11, 1407

We can rewrite Equation (2) as:

2G(x)−1 ( )= G(x)(1−G(x)) √1 −t2/2 FG x e dt , (3) −∞ 2π  √1 −t2/2 x and since φ(t)= e , and Φ(x)= −∞ φ(t)dt, we get to: 2π 2G(x) − 1 F (x)=Φ . (4) G G(x)[1 − G(x)]

In case of continuous G(x), we can take the derivative of Equation (4) with respect to x: 2G(x) − 1 1 − 2G(x)[1 − G(x)] f (x)=φ g(x) . (5) G G(x)[1 − G(x)] G(x)2[1 − G(x)]2

The expression in Equation (5) is the pdf of the class Normal- G, whose hazard rate function (hrf) is given by: φ 2G(x)−1 ( )[ − ( )] − ( )[ − ( )] τ ( )= G x 1 G x 1 2G x 1 G x ( ) G x g x . − Φ 2G(x)−1 G(x)2[1 − G(x)]2 1 G(x)[1−G(x)] Many distributions presented in the statistical literature undergo the problem of non-identifiability. One cannot assume that the parameters of a non-identifiable model will be uniquely determined from a set of observed random variables; in other words, inferences on the parameters may not be reliable. As the Theorem 1 states, the Normal- G class is exempt from this problem, whenever the parent distribution G satisfies the property of identifiability.

Theorem 1. If the cdf FG belongs to the Normal-G class and the cdf G is identifiable, then FG is identifiable.

< ( |ξ ) < = ξ Proof of Theorem 1. Given that 0 G x j 1 for j 1, 2, where j is a parametric vector and ( |ξ )= ( |ξ ) assuming that FG x 1 FG x 2 , we have: ( |ξ ) − ( |ξ ) − Φ 2G x 1 1 = Φ 2G x 2 1 ( |ξ )[ − ( |ξ )] ( |ξ )[ − ( |ξ )] . G x 1 1 G x 1 G x 2 1 G x 2

Since the function Φ is injective, we can write:

1 − 1 = 1 − 1 − ( |ξ ) ( |ξ ) − ( |ξ ) ( |ξ ) 1 G x 1 G x 1 1 G x 2 G x 2 ( |ξ ) − ( |ξ ) ( |ξ ) − ( |ξ ) G x 1 G x 2 = G x 1 G x 2 [ − ( |ξ )][ − ( |ξ )] − ( |ξ ) ( |ξ ) 1 G x 1 1 G x 2 G x 1 G x 2 ( |ξ ) = ( |ξ ) If G x 1 G x 2 , then:

[ − ( |ξ )][ − ( |ξ )] = − ( |ξ ) ( |ξ ) 1 G x 1 1 G x 2 G x 1 G x 2 (6)

The left-hand side of Equation (6) is necessarily positive for almost all x ∈ R, whereas the right-hand ( |ξ )= ( |ξ ) ⇒ ξ = ξ side is negative, a contradiction. Thereby, G x 1 G x 2 1 2.

2.1. Special Normal-G Sub-Models Here we present two distributions from the Normal-G class.

20 Symmetry 2019, 11, 1407

2.1.1. The Normal-Weibull Distribution Weibull is one of the most used models to describe natural phenomena and failure of several kinds of components. It is extensively used in and reliability. In recent times, many authors have focused on new extensions for it, such as13 [ ,14]. The two-parameter Weibull cdf is given ( | λ)= − −(x/λ)k ≥ λ > by GW x k, 1 e for x 0, where k, 0. Replacing the baseline G in Equation (4)by GW, we get to the Normal-Weibull cdf, namely:   (x/λ)k − ( )=Φ e 2 FNW x , (7) 1 − e−(x/λ)k for x ≥ 0. Using Equation (5) to write the corresponding pdf, we have:   −( λ)k −( λ)k ( λ)k − − − x/ x/ e x/ − 2 kxk 1 1 2 1 e e f (x)=φ . (8) NW − −(x/λ)k λk 2 1 e e−(x/λ)k 1 − e−(x/λ)k

Plots of pdf and hrf of the Normal-Weibull distribution for different values of the parameters are portrayed in Figure 1. The different shapes of the hrf curve evince the flexibility of the model. Particularly for k = 1, the Weibull distribution is equivalent to an Exponential distribution, so the hrf is constant; in contrast, the Normal-Exponential model has an increasing hrf in some left-bounded interval.

k = 1.0, λ = 1.0 k = 1.0, λ = 1.0 k = 1.5, λ = 0.5 k = 4.0, λ = 1.0 k = 2.5, λ = 1.5 k = 0.4, λ = 0.7 k = 0.4, λ = 0.7 k = 0.22, λ = 1.0 k = 4.0, λ = 0.9 hrf pdf 012345 0123456

0.0 0.5 1.0 1.5 0.0 0.2 0.4 0.6 0.8

x x

Figure 1. Plots of pdf and hrf for the Normal-Weibull distribution.

In Figure 2, the vertical axis shows the range of values of Pearson’s moment coefficient of skewness, which depends on the parametersk and λ. We can see in the graph that the Normal-Weibull distribution is also able to fit data with either positive or negative skew.

Figure 2. Skewness of the Normal-Weibull distribution.

21 Symmetry 2019, 11, 1407

2.1.2. The Normal-Log-

The Log-logistic distribution is commonly applied to reliability and oftentimes it works well −1 ( |α β)= − +( α)β ≥ α β > as a lifetime model. Its cdf is given by GLL x , 1 1 x/ for x 0, where , 0. The Normal-log-logistic cdf is easily obtained replacing the parent distribution G in Equation (4)by GLL. Thus: β −β ( )=Φ x − x FNLL x α α , (9) for x ≥ 0. Taking the derivative of Equation (9) with respect to x, we get to the pdf: β −β β x x x 2 β −β− f (x)=φ − 1 + βα x 1 . (10) NLL α α a

Figure 3 shows plots of pdf and hrf for different values of α and β. It is worth noting that the Normal-log-logistic distribution may have a decreasing hrf of early failure. It is also possible for the hrf to be increasing or unimodal.

α = 1.0, β = 1.0 α = 1.0, β = 1.0 α = 0.4, β = 1.0 α = 0.4, β = 1.0 α = 0.9, β = 0.3 α = 1.3, β = 4.6 α = 1.3, β = 4.6 α = 2.7, β = 5.0 α = 2.7, β = 5.0 hrf pdf 0123456 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5

0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.0 0.1 0.2 0.3 0.4 0.5 0.6

x x

Figure 3. Plots of pdf and hrf for the Normal-log-logistic distribution.

Pearson’s moment coefficient of skewness for the Normal-log-logistic distribution is depicted in Figure 4.

Figure 4. Skewness of the Normal-log-logistic distribution.

2.2. Series Representation The normal cdf is related to the erf as follows: 1 z Φ(z)= 1 + erf √ , (11) 2 2

22 Symmetry 2019, 11, 1407

 √ ( )= √2 z −t2 ( ) where erf z π 0 e dt. Provided that erf z/ 2 can be linearly represented by: √ ∞ (− )n · ( )2n+1 √z = √2 1 z/ 2 erf ∑ ( + ) 2 π = n! 2n 1 n 0 ∞ n 2n+1 = 2 · − 1 z π ∑ ( + ) , (12) n=0 2 n! 2n 1 replacing Equation (12) in Equation (11), we obtain:

∞ + 1 1 1 n z2n 1 Φ(z)= + √ ∑ − . (13) π ( + ) 2 2 n=0 2 n! 2n 1

Now, considering |G(x)| < 1, we can write: ( ) − ( ) − ∞ 2G x 1 = 2G x 1 · 1 = − 1 ( )k ( )[ − ( )] ( ) − ( ) 2 ( ) ∑ G x (14) G x 1 G x G x 1 G x G x k=0 and replacing z of the right member of Equation (13) by the expression in Equation (14), we have:   ∞ ∞ 2n+1 2G(x) − 1 1 1 (−1/2)n 1 Φ = + √ ∑ 2 − ∑ G(x)k ( )[ − ( )] π ( + ) ( ) G x 1 G x 2 2 n=0 n! 2n 1 G x = k 0  2n+1 ∞ (− )n 2n+1 ∞ = 1 + √1 1/2 − 1 ( )k ∑ ( + ) 2 ( ) ∑ G x . (15) 2 2π = n! 2n 1 G x = n 0 ! "# $ ! k 0 "# $ A1 A2

The right member of Equation (15) has two factors, namely, A1 and A2, that can be rewritten as power series. Concerning to A1, the binomial theorem allows us to write: 2n+1 2n+1 + j − 1 = 2n 1 2n+1−j − 1 2 ( ) ∑ 2 ( ) G x j=0 j G x 2n+1 2n + 1 + − − = ∑ (−1)j · 22n 1 j · G(x) j j=0 j 2n+1 = δ · ( )−j ∑ j G x (16) j=0

It is a known result related to power series raised to powers that:   ∞ N ∞ ( )k = ( )k ∑ akG x ∑ ckG x , (17) k=0 k=0

N 1 k where c = a , c = ∑ = (sN − k + s)asc − for k ≥ 1 and N ∈ N. Setting N = 2n + 1 and a = 1 0 0 k ka0 s 1 k s k for all k ≥ 0, we get to the expression A2 in Equation (15) and we can use the result in Equation (17)to write as follows:   ∞ 2n+1 ∞ ( )k = · ( )k ∑ G x ∑ ck G x , (18) k=0 k=0

23 Symmetry 2019, 11, 1407

= = 1 ∑k ( [ + ] − + ) ≥ + ∈ N such that c0 1, ck k s=1 s 2n 1 k s ck−s for k 1 and 2n 1 . Now replacing A1 and A2 of the Equation (15) by the right members of the Equations (16) and (18) respectively, we obtain the result below: ∞ 2n+1 ∞ 2G(x) − 1 1 1 (−1/2)n − Φ = + √ ∑ · ∑ δ · G(x) j · ∑ c · G(x)k ( )[ − ( )] π ( + ) j k G x 1 G x 2 2 n=0 n! 2n 1 j=0 k=0 ∞ 2n+1 ∞ + + − 1 2n + 1 (−1)n j · 2n 1 j − = + ∑ ∑ ∑ √ c ·G(x)k j 2 j ( + ) π k n=0 j=0 k=0 ! n! "#2n 1 2 $ η j,n,k ∞ 2n+1 = 1 + η · ( )k−j ∑ ∑ j,n,k G x . (19) 2 n,k=0 j=0

The Fubini’s theorem on differentiation allows us to write the derivative of Equation (19) with respect to x as follows:

∞ 2n+1 ( )= η · ( − ) ( ) ( )k−j−1 fG x ∑ ∑ j,n,k !k j g x"#G x $ . (20) n,k=0 j=0 ( ) gk−j x

( ) Since gk−j x is the pdf of a random variable of the exponentiated family, as described in [15,16], one can say that (20) is the Normal- G pdf (5) expressed as a linear combination of pdfs of exponentiated distributions. Such useful property is typically found and detailed in works on new classes of distributions; see for instance: [17–20].

2.3. Quantile Function By inverting Equation (4), the quantile function associated with the Normal- G class is obtained. = ( ) For simplification, let us write v FG x . From Equation (4) we have:

− 2G(x) − 1 − − Φ 1(v)= ⇒ Φ 1(v)G(x)2 +[2 − Φ 1(v)]G(x) − 1 = 0, G(x)[1 − G(x)] that is, a quadratic equation for G(x), that admits the following two solutions: % % − − Φ 1(v) − 2 − 4 + Φ−1(v)2 Φ 1(v) − 2 + 4 + Φ−1(v)2 and . 2Φ−1(v) 2Φ−1(v)

If the first solution above is picked, then G(x) might assume values lesser than 0 (see v = 0.95 for example). On the other hand, the second one allows us to verify that 0 < G(x) < 1 is valid for all x ∈ R. Finally, we can write the quantile function of Equation (4) as follows:  %  − Φ 1(v) − 2 + 4 + Φ−1(v)2 Q (v)=Q , (21) F G 2Φ−1(v)

(·) such that QG is the quantile function of the baseline G. A uniform random number generator and (21) make the simulation of random variables following (3) quite simple. Namely, if Z ∼U(0, 1), ( ) then QF Z follows a Normal-G distribution.

24 Symmetry 2019, 11, 1407

2.4. Raw Moments, Incomplete Moments and Moment Generating Function Provided that X follows a Normal- G distribution, the rth raw moment of X is E(Xr)=  ∞ r ( ) ( ) ∈ Z∗ −∞ x fG x dx, where fG x is given in Equation (20) and r +. Using Fubini’s theorem to change the order of integration and series, we have: ∞ 2n+1 ∞ E(Xr)= ∑ ∑ η xr g − (x)dx (22) j,n,k −∞ k j n,k=0 j=0 ∞ 2n+1 = η ( r ) ∑ ∑ j,n,kE Yk−j (23) n,k=0 j=0

( ) where Yk−j follows the exponentiated distribution whose pdf is gk−j x shown in Equation (20). Despite the upper infinity limit in the sums, expressions like Equation (23) are not intractable. According to [21], one can get fairly accurate results truncating each infinite sum by 20; they used numerical routines to compute accurately similar expressions for the moments of some Kumaraswamy generalized distributions. The rth moment can also be represented in terms of the quantile function of the baseline. Defining = ( ) 1/(k−j) u Gk−j x and replacing x in Equation (22)byQG u , we have:

∞ + 2n 1 1 1 r ( r)= η k−j E X ∑ ∑ j,n,k QG u du . n,k=0 j=0 0

The rth incomplete moment of X is given by the following expression: z ∞ 2n+1 r ∗ Tr(z)= x f (x)dx = ∑ ∑ η T (z) (24) −∞ G j,n,k r n,k=0 j=0

∗( ) where Tr z is the rth incomplete moment of Yk−j. One can also write Equation (24) in terms of the quantile function of G: ∞ + k−j 2n 1 [G(x)] 1 r ( )= η k−j Tr z ∑ ∑ j,n,k QG u du . n,k=0 j=0 0

The mgf is a function associated with a random variable, whose moments can be straightforwardly derived using it. It is also useful to check whether two functions of random variables are equal since ( ) there is a bijection between pdfs and mgfs (when they exist). The mgf MX t of X is the tX ∈ (−ι ι) ι > ( ) − of e , where t , , 0. Given that M Yk−j t is the mgf of Yk j, on the lines of Equation (23), we can write: ∞ 2n+1 ∞ ( )= η tx ( ) MX t ∑ ∑ e g − x dx j,n,k −∞ k j n,k=0 j=0 ∞ 2n+1 = ∑ ∑ η ( ) j,n,kMYk−j t . n,k=0 j=0

2.5. Estimation and Inference Attractive asymptotic properties, such as efficiency and consistency, are some of the reasons that make the maximum likelihood method the most usually applied method of parametric . The MLEs are the points that maximize the likelihood function over the domain of the parameter space. Since the logarithmic function is increasing, performing the maximization of the log-likelihood function, besides being a more convenient task, also provides the MLEs.

25 Symmetry 2019, 11, 1407

ξ =(ξ ξ ) × Given that 1, ..., r is the r 1 parametric vector of a random variable X that follows a Normal-G distribution, G(x|ξ)=Gξ (x) is the baseline, g(x|ξ)=gξ (x) is its corresponding pdf and =( ) X x1,...,xm is a complete random sample of size m from X, then the log-likelihood function is:  m ξ ( ) − m 2G xj 1 2 (ξ| )=∑ φ   + ∑ − ξ ( )+ ξ ( ) X log ( ) − ( ) log 1 2G xj 2G xj j=1 Gξ xj 1 Gξ xj j=1 (25) m m   m − ( ) − − ( ) + ( ) 2 ∑ log Gξ xj 2 ∑ log 1 Gξ xj ∑ log gξ xj . j=1 j=1 j=1

Thanks to powerful functions available within the software for statistical computing, it is possible to use numerical methods to maximize (25); for this purpose, R [22] brings the function optim in package stats. The MLEs can also be obtained by solving the system of equationsU(ξ|X)=0r, where U(ξ|X)= ∇ (ξ| )=( ) ξ X ui 1≤i≤r is the score vector, such that:

m [ ( ) − ]4 − 4( ) m ( ) − Gξ xj 1 Gξ xj ∂ 4Gξ xj 2 ∂ u = ∑ · Gξ (x )+∑ · Gξ (x ) i 3( )[ − ( )]3 ∂ξ j − ( )+ 2( ) ∂ξ j j=1 Gξ xj 1 Gξ xj i j=1 1 2Gξ xj 2Gξ xj i m ∂ m ∂ m ∂ − 1 · ( )+ 1 · ( )+ 1 · ( ) 2 ∑ Gξ xj 2 ∑ Gξ xj ∑ gξ xj ξ ( ) ∂ξ − ξ ( ) ∂ξ ξ ( ) ∂ξ j=1 G xj i j=1 1 G xj i j=1 g xj i and 0r is a r × 1 vector of zeros. The information matrixJ(ξ|X) is essential to construct confidence intervals and to test hypotheses ξ (ξ| ) I on . The expectation of √J X is the expected Fisher information matrix ξ and under certain conditions of regularity, m(ξ − ξ) follows approximately a multivariate normal distribution −1 Nr 0r, Iξ . The expression for J(ξ|X) is presented in Appendix A.

3. Numerical Analysis

3.1. Simulation Study We used the free software R version 3.4.4 [22] to carry out the Monte Carlo simulation study; the number of replications was 10,000. The pseudo-random samples were generated via Von Neumann’s acceptance-rejection method [ 23]. This simple procedure requires the corresponding pdf y = f (x), a minorant and a majorant forx and a majorant for y; it is not necessary to implement the quantile function in this case. Four sample sizes, namely n = 50, 100, 200 and 500, and five different values for the vector of parameters were considered. For each scenario, we calculated the bias and the mean squared error (MSE) as follows: 10000 10000 2 = 1 ξ − ξ = 1 ξ − ξ Biasi ∑ ij i , MSEi ∑ ij i 10000 j=1 10000 j=1

ξ ξ =(ξ ξ ) ξ ξ where i is the i-th element of the vector of parameters 1, ..., r and ij is the estimate for i at the j-th . The log-likelihood function was maximized using the technique of simulated ξ annealing, available by theoptim subroutine, for which the user has to pass a vector 0 of initial values. ξ = × At first, we took 0 1r, namely a r 1 vector of ones, then we run one single replication considering = ξ sample size n 50; the obtained estimates from this procedure were assigned to 0 and used in all of the aforementioned scenarios. The results for both parameters of the Normal-Weibull density (8), shown in Table 1, indicate that the estimates are fairly close to the actual values. Moreover, as it would be expected, the bigger the sample size, the smaller the MSEs.

26 Symmetry 2019, 11, 1407

Table 1. Bias and MSE of the estimates under the maximum likelihood method for the Normal-Weibull model.

Actual Value Bias MSE nk λ k λ k λ

1.0 1.7 0.02707850 −0.00948110 0.00822299 0.00659991 0.5 2.0 0.01446186 −0.02037125 0.00209051 0.03647395 50 3.0 0.5 0.07878343 −0.00096778 0.07352625 0.00006357 0.9 4.0 0.02586182 −0.02752101 0.00675452 0.04497248 7.1 5.8 0.19086399 −0.00540407 0.41402178 0.00153179 1.0 1.7 0.01306919 −0.00453981 0.00377883 0.00332042 0.5 2.0 0.00726917 −0.01412373 0.00095786 0.01838681 100 3.0 0.5 0.03914672 −0.00057766 0.03400929 0.00003204 0.9 4.0 0.01167774 −0.01403219 0.00305903 0.02273089 7.1 5.8 0.08363335 −0.00279451 0.18835856 0.00077190 1.0 1.7 0.00651588 −0.00253820 0.00181409 0.00166703 0.5 2.0 0.00358578 −0.00678178 0.00045681 0.00923355 200 3.0 0.5 0.01901041 −0.00029362 0.01628677 0.00001604 0.9 4.0 0.00658567 −0.00745541 0.00148059 0.01138373 7.1 5.8 0.03656102 −0.00066316 0.09041610 0.00038519 1.0 1.7 0.00317837 −0.00127234 0.00071164 0.00066754 0.5 2.0 0.00195165 −0.00609800 0.00017967 0.00370983 500 3.0 0.5 0.00748033 −0.00008804 0.00636008 0.00000641 0.9 4.0 0.00297109 −0.00200533 0.00057744 0.00455810 7.1 5.8 0.01427116 −0.00045889 0.03550063 0.00015444

The results given in Table 2 suggest that the estimates of the parameters of the Normal-log-logistic model (10) have similar behavior of those shown in Table 1, that is to say, the biases are quite small and the MSE decreases as the sample size increases.

Table 2. Bias and MSE of the estimates under the maximum likelihood method for the Normal-log-logistic model.

Actual Value Bias MSE n αβ α β α β

2.7 5.0 0.00054404 0.13424700 0.00109551 0.20695579 0.4 1.2 0.00024567 0.03275857 0.00041864 0.01195999 50 6.0 2.5 −0.00002168 0.06835325 0.02161847 0.05192641 4.0 3.4 0.00185659 0.08997153 0.00521251 0.09539035 1.0 8.0 0.00010181 0.21835854 0.00005868 0.53164800 2.7 5.0 0.00012046 0.06377031 0.00055694 0.09509012 0.4 1.2 0.00005101 0.01519026 0.00021246 0.00547655 100 6.0 2.5 0.00191769 0.03297888 0.01099844 0.02386620 4.0 3.4 −0.00004923 0.04701171 0.00263820 0.04439236 1.0 8.0 0.00009827 0.11057419 0.00002980 0.24582360 2.7 5.0 0.00015504 0.03299212 0.00028047 0.04585339 0.4 1.2 0.00006551 0.00866507 0.00010677 0.00265802 200 6.0 2.5 −0.00106277 0.01625314 0.00553891 0.01145699 4.0 3.4 −0.00002407 0.02295567 0.00133064 0.02123514 1.0 8.0 0.00000852 0.05090923 0.00001503 0.11715066 2.7 5.0 0.00021558 0.01327469 0.00011277 0.01789692 0.4 1.2 −0.00008941 0.00435633 0.00004284 0.00104215 500 6.0 2.5 −0.00042358 0.00648252 0.00222639 0.00447192 4.0 3.4 −0.00007789 0.00855609 0.00053513 0.00826466 1.0 8.0 0.00003954 0.01663711 0.00000604 0.04559331

27 Symmetry 2019, 11, 1407

3.2. Applications The first data to be considered is related to the soil fertility influence and the characterization of the biologic fixation of N2 for the Dimorphandra wilsonii Rizz growth. It was originally studied by [24] and it also figures in the work of [25]. For 128 plants, the phosphorus concentration in the leaves was quantified. Here are the numbers: 0.22, 0.17, 0.11, 0.10, 0.15, 0.06, 0.05, 0.07, 0.12, 0.09, 0.23, 0.25, 0.23, 0.24, 0.20, 0.08, 0.11, 0.12, 0.10, 0.06, 0.20, 0.17, 0.20, 0.11, 0.16, 0.09, 0.10, 0.12, 0.12, 0.10, 0.09, 0.17, 0.19, 0.21, 0.18, 0.26, 0.19, 0.17, 0.18, 0.20, 0.24, 0.19, 0.21, 0.22, 0.17, 0.08, 0.08, 0.06, 0.09, 0.22, 0.23, 0.22, 0.19, 0.27, 0.16, 0.28, 0.11, 0.10, 0.20, 0.12, 0.15, 0.08, 0.12, 0.09, 0.14, 0.07, 0.09, 0.05, 0.06, 0.11, 0.16, 0.20, 0.25, 0.16, 0.13, 0.11, 0.11, 0.11, 0.08, 0.22, 0.11, 0.13, 0.12, 0.15, 0.12, 0.11, 0.11, 0.15, 0.10, 0.15, 0.17, 0.14, 0.12, 0.18, 0.14, 0.18, 0.13, 0.12, 0.14, 0.09, 0.10, 0.13, 0.09, 0.11, 0.11, 0.14, 0.07, 0.07, 0.19, 0.17, 0.18, 0.16, 0.19, 0.15, 0.07, 0.09, 0.17, 0.10, 0.08, 0.15, 0.21, 0.16, 0.08, 0.10, 0.06, 0.08, 0.12, 0.13. Table 3 brings some descriptive statistics.

Table 3. Descriptive statistics for soil fertility dataset.

n mean median min max Variance Skewness Kurtosis 128 0.14078 0.13 0.05 0.28 0.00296 0.45438 −0.64478

We fitted the Normal-Weibull distribution (NW) (7) to the soil fertility dataset and compared it to the fits of Weibull (W), Exponentiated Weibull (ExpW) [ 1], Marshall-Olkin Extended Weibull (MOEW)26 [ ], Kumaraswamy-Weibull (KwW) [ 9], Beta-Weibull (BW) [ 8] and McDonald-Weibull (McW) [7]. The function goodness.fit of the R package AdequacyModel provides, besides the MLEs and the standard errors (SE), some criteria for model selection (AIC, CAIC, BIC and HQIC); they are shown in Table 4.

Table 4. Fitted distributions to the soil fertility dataset (estimates and information criteria).

Distribution Parameters Estimates (SE) AIC CAIC BIC HQIC NW k 0.8398477 (0.0445182) −395.7584 −395.6624 −390.0544 −393.4408 λ 0.2049909 (0.0074017) W k 2.8185566 (0.1919639) −385.6297 −385.5337 −379.9256 −383.3121 λ 0.1584836 (0.0052564) ExpW k 1.5321145 (0.5023377) −387.4361 −387.2426 −378.8801 −383.9598 λ 0.0939938 (0.0374090) a 3.5076974 (2.6763009) MOEW k 3.9300962 (0.2426080) −377.9475 −377.754 −369.3914 −374.4711 λ 8.9163819 (4.5940983) a 0.0031628 (0.0007240) KwW k 1.1503912 (0.3443931) −384.1491 −383.8239 −372.7409 −379.5139 λ 0.1953371 (0.1291154) a 3.3444607 (1.5352029) b 7.5480698 (10.206142) BW k 0.8477957 (0.2166409) −385.7589 −385.4337 −374.3508 −381.1237 λ 0.3304922 (0.4395169) a 9.0436364 (4.5271059) b 15.211970 (22.481984) McW k 5.6665646 (8.3928707) −384.6657 −384.1739 −370.4055 −378.8717 λ 0.5912941 (0.5124852) a 13.441193 (23.051917) b 14.363802 (18.264058) c 0.0870787 (0.1234075)

28 Symmetry 2019, 11, 1407

Information criteria may be used as relative goodness-of-fit measures, such that the lowest values will characterize the best fitted models. In this sense, the Normal-Weibull distribution outperforms the other ones. Figure 5 shows the histogram of soil fertility data and the fitted densities with the three lowest values of AIC among the distributions in the first column of Table 4. Although the Normal-Weibull and Exponentiated Weibull curves appear to be very close, the blue one (NW) seems to be closer to the histogram.

NW W ExpW Density 02468

0.05 0.10 0.15 0.20 0.25

Phosphorus concentration in leaves

Figure 5. Histogram of soil fertility dataset and fitted densities.

∗ ∗ The modified versions of Anderson-Darling (A ) and Cramér-von Mises (W ) statistics (more details in [27]) are typically used to investigate the quality of fit of probabilistic models. Table 5 brings these statistics concerning the fitted models to soil fertility data.

Table 5. Goodness-of-fit test statistics.

∗ ∗ Distribution A W NW 0.454008 0.079841 W 1.156994 0.207118 ExpW 0.784451 0.138403 MOEW 1.123759 0.183128 KwW 0.907239 0.163617 BW 0.750593 0.130501 McW 0.758296 0.137509

The measures portrayed in Table 5 represent the difference between the empirical distribution ∗ function and the real underlying cdf; hence we will consider that the models with lower values of Aand ∗ W fit the data better. Therefore, once again the Normal-Weibull distribution beats the competing models. The second application concerns to a dataset representing waiting times (in seconds) between 65 successive eruptions of water through a hole in the cliff at the coastal town of Kiama (New South Wales, Australia), known as the Blowhole. These data can be obtained in17 [ ,28]. Here are they: 83, 51, 87, 60, 28, 95, 8, 27, 15, 10, 18, 16, 29, 54, 91, 8, 17, 55, 10, 35,47, 77, 36, 17, 21, 36, 18, 40, 10, 7, 34, 27, 28, 56, 8, 25, 68, 146, 89, 18, 73, 69, 9, 37, 10, 82, 29, 8, 60, 61, 61, 18, 169, 25, 8, 26, 11, 83, 11, 42, 17, 14, 9, 12. Table 6 provides descriptive statistics.

Table 6. Descriptive statistics for eruption dataset.

n Mean Median min max Variance Skewness Kurtosis 64 39.82812 28 7 169 1139.097 1.54641 2.77108

We fitted the Normal-log-logistic distribution (NLL) (9) to the eruption dataset and compared it to the fits of Log-logistic (LL), Exponentiated Log-logistic (ExpLL), Beta-log-logistic (BLL),

29 Symmetry 2019, 11, 1407

Kumaraswamy-log-logistic (KwLL) and Gompertz-log-logistic (GoLL); the four latter along the lines of [1,8,9,11] respectively. Table 7 brings the MLEs, SEs and information criteria.

Table 7. Fitted distributions to the eruption dataset (estimates and information criteria).

Distribution Parameters Estimates (SE) AIC CAIC BIC HQIC NLL α 28.71747 (2.751091) 587.5681 587.7649 591.8859 589.2691 β 0.568200 (0.042998) LL α 28.27831 (3.203986) 597.1497 597.3464 601.4674 598.8506 β 1.969345 (0.198878) ExpLL α 7.394859 (6.479904) 597.3629 597.7629 603.8396 599.9144 β 1.461528 (0.197256) a 4.572569 (4.470086) BLL α 7.445103 (10.72863) 596.186 596.864 604.8215 599.588 β 0.484528 (0.223626) a 17.28664 (13.82502) b 9.285354 (9.566756) KwLL α 2.107772 (5.325557) 596.68 597.358 605.3156 600.082 β 0.511324 (0.130629) a 12.14489 (12.25210) b 11.42477 (8.749787) GoLL α 5.667167 (1.680598) 591.7172 592.3952 600.3528 595.1192 β 4.348435 (1.450980) a 0.035617 (0.017111) b 0.234894 (0.100666)

Since the Normal-log-logistic fitted model presents the smallest values of AIC, CAIC, BIC and HQIC compared to the fits of the other distributions, selecting it rather than the others is a reasonable decision in this case. In Figure 6 the histogram of eruption data and the fitted densities with the three lowest values of AIC among the distributions in the first column of Table 7 are depicted. By a visual comparison, the three curves are apparently good approximations to the histogram, but the Normal-log-logistic’s seems to explain the behavior of the data more accurately.

NLL LL GoLL Density 0.000 0.005 0.010 0.015 0.020 0.025 0 50 100 150

Waiting times

Figure 6. Histogram of eruption dataset and fitted densities.

∗ ∗ Table 8 provides the values of A and W of the distributions in the first column of Table 7. These statistics suggest that GoLL and NLL models fit the eruption dataset very closely. Nonetheless, in order to pick a more parsimonious model, one should prefer the NLL, since it has fewer parameters than GoLL.

30 Symmetry 2019, 11, 1407

Table 8. Goodness-of-fit tests.

∗ ∗ Distribution A W NLL 0.612291 0.0803799 LL 1.019129 0.1413872 ExpLL 1.138218 0.1617136 BLL 0.837211 0.1141264 KwLL 0.818072 0.1118931 GoLL 0.605822 0.0805111

It is worth mentioning that [ 17] proposed the new class Exponentiated Kumaraswamy- G and fitted one of its submodels (with Weibull as baseline) to the same eruption dataset. It presented ∗ ∗ A = 0.7594 and W = 0.1037, whereas NLL presented lower values of these statistics as one can check in Table 8.

4. Concluding Remarks Based on the method of generating classes of probability distributions presented by [ 12], we introduce a new class called Normal-G. It has the advantage of demanding no additional parameters besides the baseline ones. We demonstrate that the proposed class generates identifiable sub-models as long as the parent distribution is identifiable. The pdf of the class can be written as a linear combination of pdfs of exponentiated distributions; it allows us to easily derive the raw moments, the incomplete moments and the moment generating function. We bring Monte Carlo simulation studies to attest the good performance of the MLEs of two distributions generated by the class and to illustrate its usefulness, applications to real datasets are made. The fitted models are compared to other competitive distributions regarding the Anderson-Darling and the Cramér-von Mises statistics, as well as commonly used information criteria as goodness-of-fit measures. The general results indicate that the Normal- G outperforms the other distributions in comparison. The new class is powerful and provides parsimonious models, which may hopefully interest practitioners of statistics, soil science, oceanography and other fields.

Author Contributions: All of the authors contributed relevantly to this research article. Funding: This research received no external funding. Conflicts of Interest: The authors declare no conflict of interest.

31 Symmetry 2019, 11, 1407

Appendix A The information matrix mentioned in Section 2.5 is given by (ξ| )=−∇ ∇ (ξ| )=−( ) J X ξ ξ X uih 1≤i≤r,1≤h≤r, where:   m ( ) − ( )+ ∂ ∂ = 2Gξ xj 3 − 2Gξ xj 1 · ( ) · ( ) uih ∑ Gξ xj Gξ xj 4( ) [ − ξ ( )]4 ∂ξ ∂ξ j=1 Gξ xj 1 G xj i h [ − ( )]4 − 4( ) m 1 Gξ xj Gξ xj ∂2 + ∑ · Gξ (x ) 3( )[ − ( )]3 ∂ξ ∂ξ j j=1 Gξ xj 1 Gξ xj i h m [ − ( )] ( ) 8 1 Gξ xj Gξ xj ∂ ∂ + ∑ · Gξ (x ) · Gξ (x ) − ( )+ 2( ) ∂ξ j ∂ξ j j=1 1 2Gξ xj 2Gξ xj i h m ( ) − 2 4Gξ xj 2 ∂ + ∑ · Gξ (x ) − ( )+ 2( ) ∂ξ ∂ξ j j=1 1 2Gξ xj 2Gξ xj i h m ∂ ∂ m ∂2 + 2 · ( ) · ( ) − 2 · ( ) ∑ Gξ xj Gξ xj ∑ Gξ xj 2( ) ∂ξ ∂ξ ξ ( ) ∂ξ ∂ξ j=1 Gξ xj i h j=1 G xj i h m ∂ ∂ m ∂2 + 2 · ( ) · ( )+ 2 · ( ) ∑ Gξ xj Gξ xj ∑ Gξ xj [ − ξ ( )]2 ∂ξ ∂ξ − ξ ( ) ∂ξ ∂ξ j=1 1 G xj i h j=1 1 G xj i h m ∂ ∂ m ∂2 − 1 · ( ) · ( )+ 1 · ( ) ∑ gξ xj gξ xj ∑ gξ xj . 2( ) ∂ξ ∂ξ ξ ( ) ∂ξ ∂ξ j=1 gξ xj i h j=1 g xj i h

References

1. Mudholkar, G.S.; Srivastava, D.K.; Freimer M. The exponentiated Weibull family: A reanalysis of the bus motor failure data. Technometrics 1995, 37, 436–445. [CrossRef] 2. Gupta, R.D.; Kundu D. Generalized Exponential Distributions. Aust. N. Z. J. Stat. 1999, 41, 173–188. [CrossRef] 3. Marshall, A.W.; Olkin I. A new method for adding a parameter to a family of distributions with application to the exponential and Weibull families. Biometrika 1997, 84, 641–652. [CrossRef] 4. Nadarajah, S. A Generalized Normal Distribution. J. Appl. Stat. 2005, 32, 685–694. [CrossRef] 5. Azzalini, A. A Class of Distributions which includes the Normal ones. Scand. J. Stat. 1985, 12, 171–178. 6. Robertson, H.T.; Allison, D.B. A Novel Generalized Normal Distribution for Human Longevity and other Negatively Skewed Data. PLoS ONE 2012, 7, e37025. [CrossRef] 7. Cordeiro, G.M.; Hashimoto, E.M.; Ortega, E.M.M. The McDonald Weibull Model. Statistics2014, 48, 256–278. [CrossRef] 8. Famoye, F.; Lee, C.; Olumolade, O. The Beta-Weibull distribution. J. Stat. Theory Appl. 2005, 4, 121–136. 9. Cordeiro, G.M.; Castro, M. A new family of generalized distributions. J. Stat. Comput. Simul. 2010, 81, 883–898. [CrossRef] 10. Alzaatreh, A.; Lee, C.; Famoye, F. A new method for generating families of continuous distributions. Metron 2013, 71, 63–79. [CrossRef] 11. Alizadeh, M.; Cordeiro, G.M.; Pinho, L.G.B.; Ghosh, I. The Gompertz-G family of distributions. J. Stat. Theory Pract. 2017, 11, 179–207. [CrossRef] 12. Brito, C.R.; Rêgo, L.C.; Oliveira, W.R.; Gomes-Silva F. Method for Generating Distributions and Classes of Probability Distributions: The Univariate Case. Hacet. J. Math. Stat. 2019, 48, 897–930. 13. Xie, M.; Tang, Y.; Goh T.N. A modified Weibull extension with bathtub-shaped function. Reliab. Eng. Syst. Safe. 2002, 76, 279–285. [CrossRef] 14. Bebbington, M.; Lai, C.D.; Zitikis, R. A flexible Weibull extension. Reliab. Eng. Syst. Safe. 2007, 92, 719–726. [CrossRef] 15. Tahir, M.H.; Nadarajah, S. Parameter induction in continuous univariate distributions: Well-established G families. An. Acad. Bras. Ciênc. 2015, 87, 539–568. [CrossRef]

32 Symmetry 2019, 11, 1407

16. Cordeiro, G.M.; Ortega, E.M.M.; Cunha, D.C.C. The Exponentiated Generalized Class of Distributions. J. Data Sci. 2013, 11, 1–27. 17. Silva, R.; Gomes-Silva F.; Ramos, M.; Cordeiro, G.; Marinho, P.; Andrade, T.A.N. The Exponentiated Kumaraswamy-G Class: General Properties and Application. Rev. Colomb. Estad. 2019, 42, 1–33. [CrossRef] 18. Cakmakyapan, S.; Ozel, G. The Lindley Family of Distributions: Properties and Applications. Hacet. J. Math. Stat. 2017, 46, 1113–1137. [CrossRef] 19. Barreto-Souza, W.; Cordeiro, G.M.; Simas, A.B. Some Results for Beta Fréchet Distribution. Commun. Stat. Theory Methods 2011, 40, 798–811. [CrossRef] 20. Huang, S.; Oluyede, B.O. Exponentiated Kumaraswamy- with applications to income and lifetime data. J. Stat. Dist. Appl. 2014, 1, 1–20. [CrossRef] 21. Cordeiro, G.M.; Bager, R.S.B. Moments for Some Kumaraswamy Generalized Distributions. Comm. Statist. Theory Methods 2015 44, 2720–2737. [CrossRef] 22. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2018. 23. Von Neumann, J. Various techniques used in connection with random digits. In Applied Mathematics Series 12; National Bureau of Standards: Washington, DC, USA, 1951; pp. 36–38.

24. Fonseca, M.B. A influência da fertilidade do solo e caracterização da fixação biológica de N 2 para o crescimento de Dimorphandra wilsonii Rizz. Master’s Thesis, Federal University of Minas Gerais, Belo Horizonte, Brazil, 2007. 25. Silva, R.B.; Bourguignon, M.; Dias, C.R.B.; Cordeiro, G.M. The compound class of extended Weibull power series distributions. Comput. Stat. Data Anal. 2013, 58, 352–367. [CrossRef] 26. Cordeiro, G.M.; Lemonte, A.J. On the Marshall–Olkin extended Weibull distribution. Stat. Pap. 2013, 54, 333–353. [CrossRef] 27. Chen, G.; Balakrishnan, N. A general purpose approximate goodness-of-fit test. J. Qual. Technol. 1995, 27, 154–161. [CrossRef] 28. StatSci.org. Available online: http://www.statsci.org/data/oz/kiama.html (accessed on 21 October 2019).

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

33

S symmetry S

Article Generalized Truncation Positive Normal Distribution

Héctor J. Gómez 1,*, Diego I. Gallardo 2 and Osvaldo Venegas 1 1 Departamento de Ciencias Matemáticas y Físicas, Facultad de Ingeniería, Universidad Católica de Temuco, Temuco 4780000, Chile; [email protected] 2 Departamento de Matemática, Facultad de Ingeniería, Universidad de Atacama, Copiapó 1530000, Chile; [email protected] * Correspondence: [email protected]; Tel.: +56-45-2205410

Received: 27 September 2019; Accepted: 29 October 2019; Published: 3 November 2019

Abstract: In this article we study the properties, inference, and statistical applications to a parametric generalization of the truncation positive normal distribution, introducing a new parameter so as to increase the flexibility of the new model. For certain combinations of parameters, the model includes both symmetric and asymmetric shapes. We study the model’s basic properties, maximum likelihood estimators and Fisher information matrix. Finally, we apply it to two real data sets to show the model’s good performance compared to other models with positive support: the first, related to the height of the drum of the roller and the second, related to daily cholesterol consumption.

Keywords: truncation; half-normal distribution; likelihood

1. Introduction The half-normal (HN) distribution is a very important model in the statistical literature. Its density function has a closed-form and its cumulative distribution function (cdf) depends on the cdf of the standard normal model (or the error function), which is implemented in practically all mathematical and statistical software. Pewsey [1,2] provides the maximum likelihood (ML) estimation for the general location-scale HN distribution and its asymptotic properties. Wiper et al. [ 3] and Khan and Islam [4] perform analysis and applications for the HN model from a Bayesian framework. Moral et al. [5] also present the hnp R package, which produces half-normal plots with simulated envelopes using different diagnostics from a range of different fitted models. The HN model is also presented in the stochastic representation of the skew-normal distribution in Azzalini [ 6,7] and Henze [8]. In recent years this distribution has been used to model positive data, and it is becoming an important model in reliability theory despite the fact that it accommodates only decreasing hazard rates. Some of the generalizations of this distribution can be found in Cooray and Ananda [ 9], Olmos et al. [ 10], Cordeiro et al. [ 11], Gómez and Bolfarine [12], among others. In particular, we focused on the extension of Cooray and Ananda [ 9]. The authors provided a α motivation related to static life to consider the transformation Z = σY1/ , where Y ∼ HN(1). This model was named the generalized half-normal (GHN) distribution. An alternative way to extend the HN model was introduced by Gómez et al. [ 13] considering a normal distribution with mean and standard deviationμ and σ, respectively, truncated to the interval (0, +∞) and considering the reparametrization λ = μ/σ. This model was named the truncated positive normal (TPN) distribution with density function given by ( σ λ)= 1 φ z − λ σ > λ ∈ R f z; , σΦ(λ) σ , z, 0, , (1)

Symmetry 2019, 11, 1361; doi:10.3390/sym1111136135 www.mdpi.com/journal/symmetry Symmetry 2019, 11, 1361 where φ(·) and Φ(·) denote the density and cdf of the standard normal models, respectively. We use TPN(σ, λ) to refer to a random variable (r.v.) with density function as in Equation (1). Note that TPN(σ,0) ≡ HN(σ). In this work we consider a similar idea to that used in Cooray and Ananda [ 9] to extend the α TPN model including the transformationY = σX1/ , where X ∼ TPN(1, λ). We will refer to this distribution as the generalized truncation positive normal (GTPN). The rest of the manuscript is organized as follows. Section 2 is devoted to study of some important properties of the model, as well as its moments, quantile and hazard functions and its entropy. In Section 3 we perform an inference and present the Fisher information matrix for the proposed model. Section 4 discusses the selection model in nested and non-nested models for the GTPN distribution. In Section 5 we carry out a simulation study in order to study properties of the ML estimators in finite samples for the proposed distribution. Section 6 presents two applications to real data-sets to illustrate that the proposed model is competitive versus other common models for positive data in the literature. Finally, in Section 7, we present some concluding remarks.

2. Model Properties In this section we introduce the main properties of the GTPN model such as density, quantile and hazard functions, moments, among others.

2.1. Stochastic Representation and Particular Cases α As mentioned previously, we say that a r.v.Z has GTPN(σ, λ, α) distribution if Z = σY1/ , where Y ∼ TPN(1, λ). By construction, the following models are particular cases for the GTPN distribution:

• GTPN(σ, λ, α = 1) ≡ TPN(σ, λ). • GTPN(σ, λ = 0, α) ≡ GHN(σ, α). • GTPN(σ, λ = 0, α = 1) ≡ HN(σ).

Figure 1 summarizes the relationships among the GTPN and its particular cases. We highlight that λ = 0 and α = 1 are within the parametric space (not on the boundary). Therefore, to decide between the GTPN versus the TPN, GHN, or HN distributions we can use classical hypothesis tests such as the likelihood ratio test (LRT), (ST), or gradient test (GT).

GTPN(σ, λ, α)

α=1 λ=0

| " TPN(σ, λ) λ=0,α=1 GHN(σ, α)

λ=0 α=1 "  | HN(σ)

Figure 1. Particular cases for the GTPN distribution.

2.2. Density, Cdf and Hazard Functions Proposition 1. For Z ∼ GTPN(σ, λ, α), the density function is given by α α ( σ λ α)= α−1φ z − λ ≥ f z; , , σαΦ(λ) z σ , z 0. (2)

36 Symmetry 2019, 11, 1361 where σ, α > 0 and λ ∈ R

Proof. Considering the stochastic representation discussed in Section 2.1, we have that z = g(y)= 1/α σy and    −  α− −  dg 1(z)  1 − αz 1 f (z)= f (g 1(z))   = φ(g 1(z) − λ) . Z Y dz Φ(λ) σα − α Replacing g 1(z)=(z/σ) the result is obtained.

Proposition 2. For Z ∼ GTPN(σ, λ, α), the cdf and hazard function are given by α α α−1 z Φ(( z ) − λ)+Φ(λ) − ( ) αz φ σ − λ ( σ λ α)= σ 1 ( )= f z = F z; , , and h z α , Φ(λ) 1 − F(z) α z σ 1 − Φ σ − λ respectively, for all z ≥ 0.

Figure 2 shows the density and hazard functions for the GTPN (σ = 1, λ, α) model, considering some combinations for(λ, α). Note that the GTPN model can assume decreasing and unimodal shapes for the density function and decreasing or increasing shapes for the hazard function.

*731 í  *731  *731 í *731 í  *731  *731 í *731  *731  *731   *731  *731  *731    XQFWLRQ XQFWLRQ I I   KD]DUG GHQVLW\

 

               ] ]

Figure 2. Density and hazard functions for the GTPN(σ = 1, λ, α) model with different combinations of λ and α.

2.3. Proposition 3. The mode of the GTPN(σ, λ, α) model is attained: 1/α σ 2 1 1. at z = α λ + λ + 4 1 − whenever α ≥ 1 or 0 < α < 1 and λ > 0, 21/ α 2. at z = 0 in otherwise.

Proof. Let l = log( f ), where f is the density function defined in (2), of a direct computation we have ∂l 1 z 2α z α = l = − α − αλ − (α − 1) . ∂z z z σ σ α α z 2 z Note that lz vanishes when α σ − αλ σ − (α − 1)=0. Using the auxiliary variable w = α z σ the last equation is rewritten as follows:

37 Symmetry 2019, 11, 1361

αw2 − αλw − (α − 1)=0. (3)

In the rest of the proof we use the discriminant of the quadratic equation in Equation (3), Δ = α2λ2 + α(α − )=α2[λ2 + ( − 1 )] = which& is given by 4 1 4 1 α and its zeros are given by: w 2 1 (λ ± λ + 4(1 − α ))/2. α ≥ Δ ≥ λ2 = If 1 then . In consequence, the mode is attained at z1 1/α σ 2 1 α λ + λ + 4 1 − . 21/ α < α < Δ < λ2 < Δ < λ2 If 0 1 then . Here, two cases may occur. The first when 0 , in which 1/α σ 2 1 λ > 0 its mode is attained at z = α λ + λ + 4 1 − , since l < 0, and if λ < 0, case if 21/ α zz then the zeros of Equation (3) are negative, implying that function l is strictly decreasing. Its mode is therefore attained at zero. The other case is when Δ < 0, then we have that αw2 − αλw − (α − 1) > 0 for all w ≥ 0, implying that lz < 0 for all z ≥ 0. Therefore, l is strictly decreasing and thus its mode is zero.

Remark 1. Note that α ≥ 1 or λ > 0 implies that the mode of the GTPN model is attached in a positive value.

2.4. Quantiles Proposition 4. The quantile function for the GTPN(σ, λ, α) is given by

1 − α Q(p)=σ Φ 1(1 − (1 − p)Φ(λ)) + λ .

Proof. Follows from a direct computation, applying the definition of quantile function.

Corollary 1. The quartiles of the GTPN distribution are

− 1 σ[Φ 1( − 3 Φ(λ)) + λ] α 1. First quartile 1 4 . − Φ(λ) 1 σ[Φ 1( − )+λ] α 2. Median 1 2 . − 1 σ[Φ 1( − 1 Φ(λ)) + λ] α 3. Third quartile 1 4 .

2.5. Central Moments Proposition 5. Let Z ∼ GTPN(σ, λ, α) and r = 1, 2, . . .. The r-th non-central moment is given by r 1/α σλ br(λ, α) r √ μr = E(Z )= , 2Φ(λ) π

− −k k 1 (λ α)=∑∞ (r/α) λ Γ k+1 λ2 (r/α) = 1 ∏( α − ) where br , k=0 k 2 2 , 2 and k k! r/ n is the generalized binomial n=1 coefficient. When r/α ∈ N, the sum in br(λ, α) stops at r/α.

Proof. Considering the stochastic representation of the GTPN model in Section 2.1, it is immediate that r E(Zr)=σrE(Y α ), where Y ∼ TPN(1, λ). This expected value can be computed using Proposition 2.2 in Gómez et al. [13].

Remark 2. When r/α ∈ N, Closed Forms Can Be Obtained for μr

Figure 3 illustrates the mean, variance, skewness, and kurtosis coefficients for the GTPN (σ = 1, λ, α) model for some combinations of its parameters.

38 Symmetry 2019, 11, 1361

(a) Expectation (b) Variance

(c) Skewness (d) Kurtosis

Figure 3. Plots of the (a) expectation, ( b) variance, (c) skewness and ( d) kurtosis for GTPN (σ = 1, λ, α) for α ∈{0.75, 1, 1.5} as a function of λ.In(d), the dashed line represents the kurtosis of the normal distribution.

2.6. Bonferroni and Lorenz Curves In this subsection we present the Bonferroni and Lorenz curves (see Bonferroni14 [ ]). These curves have applications not only in economics to study income and poverty, but also in medicine, reliability, etc. The Bonferroni curve is defined as 1 q B(p)= zf(z)dz,0≤ p < 1. pμ 0

− where μ = E(Z), q = F 1(p). The Lorenz curve is obtained by the relationL(p)=pB(p). Particularly, it can be checked that for the GTPN model the Bonferroni curve is given by   ∞ 1 ( q )α − λ 2 1 σ α 1 − k−1 k + 1 σ B(p)= E(Z) − √ ∑ λ α k2 2 Γ( , ) . μ Φ(λ) π p 2 k=0 k 2 2

39 Symmetry 2019, 11, 1361

These curves serve as graphic methods for analysis and comparison, e.g., the inequality of non-negative distributions. See, for example, for a more detailed discussion [15]. Figure 4 shows the Bonferroni curve for the GTPN (σ = 1, λ, α) model, considering different values for λ and α.

 *731  *731  *731  *731 í  *731 í



S

 %





       S

Figure 4. Bonferroni curve for the generalized truncation positive normal (GTPN)(σ, λ, α) model.

2.7. Shannon Entropy Shannon entropy (see Shannon [16]) measures the amount of uncertainty for a random variable. It is defined as: S(Z)=−E (log f (Z)) .

Therefore, it can be checked that the Shannon entropy for the GTPN model is  % α α 2πσΦ(λ) μ2 λμ λ2 S(Z)=log − (α − 1)E (log(Z)) + − + , α 2σ2α σα 2

Figure 5 shows the entropy curve for the GTPN (σ = 1.5, λ, α) model, considering different values forλ and α. We note that this function is increasing in λ and α. where E (log(Z)) = ∞ log(z) f (z; σ, λ, α)dz. For α = 1, the Shannon entropy is reduced to 0 & μ2 λμ λ2 S(Z)=log 2πσΦ(λ) + − + , 2σ2 σ 2 which corresponds to the Shannon entropy for the TPN model; and for α = 1 and λ = 0, S(Z) is reduced to √ μ2 S(Z)=log πσ + , 2σ2 which corresponds to the Shannon entropy for the HN distribution.

40 Symmetry 2019, 11, 1361

Figure 5. Entropy for the GTPN(σ = 1, α, λ) model.

3. Inference In this section we discuss the ML method for parameter estimation in the GTPN model.

3.1. Maximum Likelihood Estimators (σ λ α) For a random sample z1, ..., zn from the GTPN , , model, the log-likelihood function for θ =(σ, λ, α) is given by

n n α 2 (θ)= (α)+(α − ) ( ) − α (σ) − (Φ(λ)) − n ( π) − 1 zi − λ n log 1 ∑ log zi n log n log log 2 ∑ σ . i=1 2 2 i=1

Therefore, the score assumes the form S(θ)=(Sσ(θ), Sλ(θ), Sα(θ)), where α n α α−1 n zi zi zi σ(θ)=− + ∑ − λ α S σ σ σ σ2 , i=1 n α (θ)=− ξ(λ)+ zi − λ Sλ n ∑ σ and i=1 n n α α (θ)=n + − (σ) − zi − λ zi zi Sα α ∑ log zi n log ∑ σ σ log σ , i=1 i=1

ξ(λ)= φ(λ) where Φ(λ) is the negative of the inverse Mills ratio. The ML estimators are obtained by solving (θ)= the equation S 03, where 0p denotes a vector of zeros with dimension p. This equation has the following solution for λ n 2α zi − ∑ σ n λ(σ α)= i=1 , n α . (4) zi ∑ σ i=1

Replacing Equation (4)inSλ(θ)=0 and Sα(θ)=0, the problem is reduced to two equations. The solution of this problem needs to be solved by numerical methods such as Newton-Raphson. Below we discuss initial values for the vector θ to initialize the algorithm.

41 Symmetry 2019, 11, 1361

3.2. Initial Point to Obtain the Maximum Likelihood Estimators In this subsection, we discuss the initial points for the iterative methods to find the ML estimators in the GTPN distribution.

3.2.1. A Naive Point Based on the HN Model In Section 2 we discuss that GTPN(σ, λ = 0, α = 1) ≡HN(σ). Based on this fact, and considering that the ML estimator for σ in the HN distribution has a closed-form, we can consider as an initial n θ = 1 2 point naive ∑ zi ,0,1 . n i=1

3.2.2. An Initial Point Based on Centiles = θ Let qt, t 1, ..., 99, the t-th sample centile based on z1, ..., zn. An initial point to can be ∈{ } obtained by matchingqu, q50 and q100−u, with u 1, 2, ..., 48, 49 , with their respective theoretical counterparts. Defining p = u/100, the equations obtained are

1 1 = σ Φ−1( − ( − )Φ(λ)) + λ α = σ Φ−1( − Φ(λ)) + λ α qu 1 1 p , q50 1 0.5 and

1 = σ Φ−1( − Φ(λ)) + λ α q100−u 1 p .

The solutions for σ and α are − ' ' Φ 1(1−(1−p)Φ(λ))+λ log Φ−1( − Φ(λ'))+λ' 'σ = 'σ(λ')= q100−u 'α = 'α(λ')= 1 p ' and , 1/'α(λ) qu −1 ' ' log Φ 1 − pΦ(λ) + λ q100−u

' where λ is obtained from the non-linear equation

' 1/'α(λ) 'σ(λ') Φ−1 − Φ(λ') + λ' = 1 0.5 q50. ' ' ' Therefore, the initial point based on this method is given by θcent = σ, λ, α .

3.3. An Initial Point Based on the Method of Moments A more robust initial point can be obtained using the method of moments. The equations to solve r are μr = z , r = 1, 2, 3. The solution for σ is √ ' 2 πzΦ(λ ) 'σ = 'σ (λ' 'α )= , 1/'α . λ' (λ' 'α) b1 ,

' The solution for λ and α (say λ and 'α , respectively) are obtained from the non-linear equations

α α ('σ (λ, α)λ1/ )2b (α, λ) ('σ (λ, α)λ1/ )3b (α, λ) √ 2 = z2 and √ 3 = z3. 2Φ(λ) π 2Φ(λ) π ' ' ' Therefore, the initial point based on this method is given by θmom = σ , λ , α .

42 Symmetry 2019, 11, 1361

3.4. Fisher Information Matrix (θ)= The Fisher information (FI) matrix for the GTPN distribution is given byIF Iab . a,b ∈{σ,λ,α} Consider the notation ∞ = (λ)= [ ( )]k jφ( − λ) ∈{ } ∈{ } Tjk Tjk log w w w dw, j 1, 2 , k 0, 1, 2 . 0

Therefore,

α α(2α + 1) λα(α + 1) σσ = − + − I σ2 σ2Φ(λ) T20 σ2Φ(λ) T10, α = Iσλ σΦ(λ) T10,

= 1 + 1 [λ( + ) − − ] Iσα σ σΦ(λ) T10 T11 T20 2T21 , 2 Iλλ = 1 − λξ(λ) − ξ (λ), = − 1 Iλα αΦ(λ) T11, 1 2 λ αα = + − I α2 α2Φ(λ) T22 α2Φ(λ) T12

(λ) (λ)= (λ) < +∞ We observe that Tjk is a continuous function and lim|λ|→+∞ Tjk 0, then Tjk . Note that for λ = 0 and α = 1 this matrix is reduced to ⎛ & ⎞ 2 1 2 − 4 ⎜ σ2 σ π σ T21 ⎟ (σ )=⎝ · − 2 − ⎠ FI ,0,1 1 π 2T11 , ·· + 1 4T22

= ( ) where Tjk Tjk 0 . Additionally, we note that   (π − ) ( (σ )) = 2 3 ( + + ) − ( − 2 )2 ≈ 0.1021506 > ∀σ > det FI ,0,1 σ2 π 1 4T22 4T21 4 T11 π T21 σ2 0, 0, where det(·) denotes the determinant operator. Therefore, the FI matrix for the reduced model (HN) is invertible.

4. Model Discrimination In this section we discuss some techniques to discriminate among the GTPN distribution and other models.

4.1. GTPN versus Submodels An interesting problem to solve is the discrimination between GTPN and the three submodels represented in Figure 1. In other words, we are interested in testing the following hypotheses:

• (1) α = (1) α = H0 : 1 versus H1 : 1 (TPN versus GTPN distribution). • (2) λ = (2) λ = H0 : 0 versus H1 : 0 (GHN versus GTPN distribution). • (3) (α λ)=( ) (3) (α λ) =( ) H0 : , 1, 0 versus H1 : , 1, 0 (HN versus GTPN distribution). The three hypotheses can be tested considering the LRT, ST, and GT. Below we present the statistics for the three tests considered and for the three hypotheses of interest.

43 Symmetry 2019, 11, 1361

4.1.1. Likelihood Ratio Test (j) = The statistic for the LRT (say SLR) to tests H0 , j 1, 2, 3, is defined as = σ λ α − σ λ α SLRj 2 , , 0j, 0j, 0j ,

σ λ α σ λ α (j) = (j) where 0j, 0j and 0j denote the ML estimators for , and restricted to H0 , j 1, 2, 3. Under H0 , ( ) = ∼ χ2 3 ∼ χ2 χ2 j 1, 2, SLR j (1) and under H0 , SLR3 (2), where (p) denotes the Chi-squared distribution (3) with p degrees of freedom. For H0 , we obtain n σ = 1 2 λ = α = 03 ∑ zi , 03 0 and 03 1, n i=1

(1) (2) whereas to test H0 and H0 the ML estimators under the null hypotheses need to be computed numerically. However, in both cases the problem is reduced to a unidimensional maximization. For details see Cooray and Ananda [9] and Gómez et al. [13], respectively.

4.1.2. Score Test (j) = The statistic for the ST (say SR) to test H0 , j 1, 2, 3, is defined as  −1 = θ θ θ SRj S 0j IF 0j S 0j ,

( ) ( ) ( ) θ j j = ∼ χ2 3 ∼ χ2 where j is the ML estimator under H0 . Under H0 , j 1, 2, SRj (1) and under H0 ,SR3 (2).

4.1.3. Gradient Test (j) = The statistic for the GT (say ST) to tests H0 , j 1, 2, 3, is defined as = (θ ) θ − θ STj S j j .

( ) ( ) j = ∼ χ2 3 ∼ χ2 Again, under H0 , j 1, 2, ST j (1) and under H0 ,ST3 (2). After some algebraic manipulations, we obtain that ( n = (α − ) [ − (σ )] + − zi − λ zi zi ST1 1 n 1 log 01 ∑ log zi σ 01 σ log σ . = 01 01 01  i 1 α n 02 = λ − 2 + zi ST2 n π ∑ σ . i=1 02  (   n 2 n = (α − ) [ − (σ )] + − zi zi + λ − 2 + zi ST3 1 n 1 log 03 ∑ log zi σ log σ n π ∑ σ . i=1 03 03 i=1 03

44 Symmetry 2019, 11, 1361

4.2. Non-Nested Models The comparison of non-nested models can be performed based on the AIC criteria (Akaike [17]), where the model with a lower AIC is preferred. However, in practice we can have a set of inappropriate models for a certain data set. For this reason, we also need to perform a goodness-of-fit validation. This can be performed, for instance, based on the quantile residuals (QR). For more details see Dunn and Smyth [18]. These residuals are defined as

= ( ψ) = QRi G zi; , i 1,...,n, where G(; ψ) is the cdf of the specified distribution evaluated in the estimator for ψ. If the model is correctly specified, such residuals are a random sample from the standard normal distribution. This can be assessed using, for instance, the Anderson–Darling (AD), Cramer-Von-Mises (CVM) and Shapiro–Wilks (SW) tests. A discussion of these tests can be seen in Yazici and Yocalan [19].

5. Simulation In this section we present a Monte Carlo (MC) simulation study in order to illustrate the behavior of the ML estimators. We consider three sample sizes: n = 50, 150 and 300; two values for σ:1 and 2; two values forλ:3 and 4; and three values for α: 0.8, 1 and 2. For each combination of n, σ, λ and α, we draw 10,000 samples of size n from the GTPN (σ, λ, α) model. To simulate a value from this distribution, we consider the following scheme:

1. Simulate X ∼ Uniform(0, 1). − 2. Compute Y = Φ 1(1 +(X + 1)Φ(λ)) + λ. 1 3. Compute Z = σY α .

For each sample generated, ML estimators were computed numerically using the Newton-Raphson algorithm. Table 1 presents and standard deviations for each parameter in each case. Notice that bias and standard deviations are reduced as the sample size increases, suggesting that the ML estimators are consistent.

45 Symmetry 2019, 11, 1361 α . The results α 1000 λ = and n λ , σ σ α model in 12 combinations of ) α 300 , λ λ = , n σ ( σ α 150 λ = n σ α 50 λ = n σ Monte Carlo (MC) simulation study for the maximum likelihood (ML) estimators in the GTPN mean sd mean sd mean sd mean sd mean sd mean sd mean sd mean sd mean sd mean sd mean sd mean sd 0.81 1.5232 1.573 1.286 3.089 1.006 1.105 1.797 0.510 3.118 0.987 3.220 1.8430.8 1.029 2.036 1.2271 3.006 1.227 2.432 0.7052 3.131 1.029 2.512 0.309 1.111 3.157 1.999 3.000 1.011 2.198 0.705 1.876 1.039 1.052 3.215 0.309 3.027 0.974 3.318 0.875 3.019 1.942 1.040 1.996 2.223 0.350 1.207 1.055 1.083 2.423 2.417 1.081 2.171 1.437 0.425 1.996 0.631 0.614 2.237 0.850 1.044 3.018 2.017 3.017 1.001 1.437 0.444 1.045 0.631 0634 3.028 0.196 3.020 0.869 3.035 0.830 3.022 1.077 0.629 0.340 1.096 1.086 0.202 0.614 1.031 2.152 2.171 1.011 2.058 0.431 0.250 1.185 0.862 2.094 0.233 0.481 1.006 3.023 2.000 3.014 0.998 0.897 0.185 0.624 0.405 3.017 0.267 0.093 3.012 0.823 3.027 0.803 3.013 0.642 0.266 0.194 0.638 1.033 0.071 0.266 1.004 2.034 2.061 2.008 0.250 0.089 0.472 0.504 2.009 0.177 3.007 1.995 0.373 0.269 0.186 3.015 0.805 3.016 0.269 0.072 0.267 1.004 2.006 0.089 0.177 0.81 1.4892 1.799 1.236 4.391 0.962 1.219 2.010 0.543 4.439 0.971 4.642 2.1820.8 0.860 2.495 1.2031 2.880 1.087 2.360 0.6312 3.544 0.860 2.402 0.321 1.018 4.562 1.913 4.280 0.957 2.434 0.631 2.283 1.132 1.199 4.659 0.321 4.295 0.970 4.802 0.817 4.323 2.419 1.222 1.714 2.848 0.282 1.176 1.326 1.017 2.161 2.378 1.017 2.033 1.283 0.343 1.714 0.668 0.513 2.033 0.696 0.986 4.326 1.898 4.164 0.974 1.283 0.418 1.288 0.668 0.786 4.344 0.223 4.187 0.813 4.393 0.798 4.168 1.346 0.826 0.282 1.469 0.148 1.015 0.839 0.995 2.024 2.024 1.003 2.000 0.355 0.191 1.078 0.708 0.301 1.974 0.382 0.993 4.194 1.938 4.049 0.993 0.835 0.230 0.849 0.460 0.413 4.190 0.118 4.055 0.796 4.198 0.799 4.045 0.844 0.404 0.160 0.901 0.082 0.995 0.404 0.998 2.003 1.993 2.000 0.191 0.099 0.573 0.391 1.990 0.198 4.053 1.984 0.465 0.404 0.236 4.053 0.799 4.047 0.408 0.079 0.401 0.998 2.000 0.100 0.198 3 3 4 4 summarize the mean and the standard deviation (sd) of the respective estimators obtained in the 10,000 replicates. Table 1. True Valor σλα 1 2

46 Symmetry 2019, 11, 1361

6. Applications In this section we present two real data applications to illustrate the better performance of the GTPN model over other well known models in the literature. For these comparisons we also consider the Weibull (WEI) and the Generalized Lindley (GL, Zakerzadeh20 [ ]) models. The density function of the Weibull distribution is given by

σ σ ( σ λ)= σ−1 −x /λ f x; , λ x e , with x > 0, σ > 0 and λ > 0, whereas for the GL model is given by

α− −θ θ2(θx) 1(α + γx)e x f (x; α, θ, γ)= , (γ + θ)Γ(α + 1) with x > 0, θ > 0 α > 0 and γ > 0.

6.1. Application 1 The data set was taken from Laslett [21], and consisted of n = 115 heights measured at 1 micron intervals along the drum of a roller (i.e., parallel to the axis of the roller). This was part of an extensive study of the surface roughness of the rollers. A statistical summary of the data set is presented in Table 2.

Table 2. Descriptive statistics of the Laslett data set. √ n XS2 b b Dataset 1 2 Heights measured 115 3.48 0.52 −1.24 6.30

Initially, we calculate the estimators based on the centiles, naive, and moments of the θ =( ) θ =( ) θ = GTPN distribution, which are cent 2.326, 2.741, 2.376 , naive 3.557, 0, 1 , and mom (3.075, 1.572, 3.203 ) . We used these estimations as initial values in computing the ML estimators for the GTPN model. Results are presented in Table 3. Note the high estimated for the γ parameter in the GL model. In addition, note that, based on the AIC criteria and BIC criteria22 [ ], the GTPN model is preferred (among the fitted models) for this data set. Figure 6 shows the estimated density for each model in this data set, where the GTPN model appears to provide a better fit. Finally, Figure 7 also presents the qq-plot for the QR in the same models and the p − values for the three normality tests discussed in Section 4.2. Results suggest that the GTPN model is an appropriate model for this data set while the rest of models are not.

Table 3. Estimation of the parameters and their standard errors (in parentheses) for the GTPN, TPN, WEI, and GL models for the data set. The AIC and BIC criteria are also included.

Estimated GTPN TPN WEI GL σ 2.354(0.379) 0.720(0.047) 5.855(0.435) - λ 2.512(0.495) 4.842(0.333) 3.744(0.062) 4.093(0.539) α 2.227(0.401) - - 13.467(1.902) γ - - - 15.528 (29.638) AIC 238.43 254.63 251.74 308.67 BIC 246.67 260.12 257.23 316.90

47 Symmetry 2019, 11, 1361

 *731 731  :(, /*

 'HQVLW\



  9DULDEOH Figure 6. Fit of the distributions for the Laslett data set.

(a) GTPN (b) TPN

(c) WEI (d) GL

Figure 7. QR for the fitted models in the Laslett data set. The p − values for the Anderson–Darling (AD), Cramer-Von-Mises (CVM) and Shapiro–Wilks (SW) normality tests are also presented to check if the RQ came from the standard normal distribution.

48 Symmetry 2019, 11, 1361

6.2. Application 2 The data set to be investigated was taken from Nierenberg et al. [ 23], and is related to a study on plasma retinol and betacarotene levels from a sample of n = 315 subjects. More specifically, the response variable observed is grams of cholesterol consumed per day. Descriptive statistics for the variable are provided in Table 4. Initially, we calculate the estimators based on the centiles, naive, and moments of the θ =( ) θ =( ) θ = GTPN distribution, which are cent 2.326, 2.741, 2.376 , naive 275.960, 0, 1 , and mom (0.003, −86.024, 1.854). We used these estimations as initial values in computing the ML estimators for the GTPN model. Table 5 summarizes the fit for this data set. As in last application, we noted the high estimated standard error for the γ parameter in the GL model. Again, based on the AIC criteria and BIC criteria, the preferred model is the GTPN. Figure 8 shows the estimated density for each model in the cholesterol data set. The GTPN model appears to provide a better fit. For this data set, we also present the hypothesis tests for the three particular models of the GTPN distribution (1) α = (1) α = − discussed in Section 4.1. Specifically, for H0 : 1 versus H1 : 1 we obtained p values < 0.0001, 0.0157, and< 0.0001 for the SLR, SR, and ST tests, respectively. Therefore, with a significance (2) λ = (2) λ = of 5% we preferred the GTPN over the TPN model. For H0 : 0 versus H1 : 0 we obtained p − values < 0.0001, 0.0008 and < 0.0001 for the SLR, SR and ST tests, respectively. Hence, with a (3) (α λ)=( ) significance of 5% we preferred the GTPN over the GHN model. Finally, H0 : , 1, 0 versus (3) (α λ) =( ) − < H1 : , 1, 0 we obtained p values 0.0001 for the three tests. Therefore, with a significance of 5% we preferred the GTPN over the HN model. Finally, Figure 9 also presents the qq-plot for the QR in the fitted models and thep − values for the three normality tests. Results suggest that the GTPN model is appropriate for this data set while the rest of the models are not.

Table 4. Descriptive statistics for the cholesterol data set. √ n XS2 b b Data Set 1 2 Heights measured 315 242.46 17421.79 1.47 6.34

Table 5. Estimated parameters and their standard errors (in parentheses) for the GTPN, TPN, WEI, and GL models for the cholesterol data set. The AIC and BIC criteria are also presented.

Estimated GTPN TPN WEI GL σ 0.040(0.027) 151.106(8.734) 1.964(0.080) - λ 7.888(0.459) 1.455(0.138) 274.717(8.353) 0.016(0.001) α 0.240(0.013) - - 2.831(0.293) γ - - - 2.067 (6.720) AIC 3874.92 3943.25 3905.68 3879.07 BIC 3886.17 3950.75 3913.18 3890.42

*731  731 :(,

 /*

'HQVLW\



     

9DULDEOH Figure 8. Fit of the distributions for the cholesterol data set.

49 Symmetry 2019, 11, 1361

(a) GTPN (b) TPN

(c) WEI (d) GL

Figure 9. RQ for the fitted models in the cholesterol data set. Thep − values for the AD, CVM, and SW normality tests are also presented to check if the QR came from the standard normal distribution.

7. Conclusions In this work we introduce a new distribution for positive data named the GTPN model. This new distribution includes as particular cases three models well-known in the literature: the TPN, GHN, and HN models. The basic properties of the model and ML estimation were studied. We performed a simulation study in finite samples and two real data applications, showing the good performance of the model compared with other usual models in the literature.

Author Contributions: All of the authors contributed significantly to this research article. Funding: The research of H.J.G. was supported by Proyecto de Investigación de Facultad de Ingeniería. Universidad Católica de Temuco. UCT-FDI012019. The research of O.V. was supported by Vicerrectoría de Investigación y Postgrado de la Universidad Católica de Temuco, Projecto FEQUIP 2019-INRN-03. Conflicts of Interest: The authors declare no conflict of interest.

50 Symmetry 2019, 11, 1361

References

1. Pewsey, A. Large-sample inference for the general half-normal distribution. Commun. Stat. Theory Methods 2002, 31, 1045–1054. [CrossRef] 2. Pewsey, A. Improved likelihood based inference for the general half-normal distribution. Commun. Stat. Theory Methods 2004, 33, 197–204. [CrossRef] 3. Wiper, M. P.; Girón, F. J.; Pewsey, A. Objective for the half-normal and half-t distributions. Commun. Stat. Theory Methods 2008, 37, 3165–3185. [CrossRef] 4. Khan, M.A.; Islam, H.N. Bayesian analysis of system availability with half-normal life time. Math. Sci.2012, 9, 203–209. [CrossRef] 5. Moral, R.A.; Hinde, J.; Demétrio, C.G.B. Half-normal plots and overdispersed models in R: The hnp package. J. Stat. Softw. 2017, 81, 1–23. [CrossRef] 6. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178. 7. Azzalini, A. Further results on a class of distributions which includes the normal ones. Statistics 1986, 12, 199–208. 8. Henze, N. A probabilistic representation of the skew-normal distribution. Scand. J. Stat. 1986, 13, 271–275. 9. Cooray, K.; Ananda, M.M.A. A generalization of the half-normal distribution with applications to lifetime data. Commun. Stat. Theory Methods 2008, 10, 195–224. [CrossRef] 10. Olmos, N.M.; Varela, H.; Gómez, H.W.; Bolfarine, H. An extension of the half-normal distribution. Stat. Pap. 2012, 53, 875–886. [CrossRef] 11. Cordeiro, G.M.; Pescim, R.R.; Ortega, E.M.M. The Kumaraswamy generalized half-normal distribution for skewed positive data. J. Data Sci. 2012, 10, 195–224. 12. Gómez, Y.M.; Bolfarine, H. Likelihood-based inference for power half-normal distribution. J. Stat. Theory Appl. 2015, 14, 383–398. [CrossRef] 13. Gómez, H.J.; Olmos, N.M.; Varela, H.; Bolfarine, H. Inference for a truncated positive normal distribution. Appl. Math. J. Chin. Univ. 2015, 33, 163–176. [CrossRef] 14. Bonferroni, C.E. Elementi di Statistica Generale; Libreria Seber: Firenze, Italy, 1930. 15. Arcagnia, A.; Porrob, F. The Graphical Representation of Inequality. Rev. Colomb. Estad. 2014, 37, 419–436. [CrossRef] 16. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [CrossRef] 17. Akaike, H. A new look at the statistical model identification. IEEE Trans. Auto. Contr. 1974, 19, 716–723. [CrossRef] 18. Dunn, P.K.; Smyth, G.K. Randomized Quantile Residuals. J. Comput. Graph. Stat. 1996, 5, 236–244. 19. Yazici, B.; Yocalan, S. A comparison of various tests of normality. J. Stat. Comput. Simul. 2007, 77, 175–183. [CrossRef] 20. Zakerzadeh, H.; Dolati, A. Generalized Lindley Distribution. J. Math. Ext. 2009, 3, 1–17. 21. Laslett, G.M. and Splines: An Empirical Comparison of Their Predictive Performance in Some Applications. J. Am. Stat. Assoc. 1994, 89, 391–400. [CrossRef] 22. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464. [CrossRef] 23. Nierenberg, D.W.; Stukel, T.A.; Baron, J.A.; Dain, B.J.; Greenberg, E.R. Determinants of plasma levels of beta-carotene and retinol. Am. J. Epidemiol. 1989, 130, 511–521. [CrossRef][PubMed]

c 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

51

S symmetry S

Article Skewness of Maximum Likelihood Estimators in the Weibull Censored Data

Tiago M. Magalhães 1, Diego I. Gallardo 2 and Héctor W. Gómez 3,* 1 Department of Statistics, Institute of Exact Sciences, Federal University of Juiz de Fora, Juiz de Fora 36000-000, Brazil; [email protected] 2 Departamento de Matemática, Facultad de Ingeniería, Universidad de Atacama, Copiapó 1530000, Chile; [email protected] 3 Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile * Correspondence: [email protected]

Received: 29 September 2019; Accepted: 21 October 2019; Published: 1 November 2019

− Abstract: In this paper, we obtain a matrix formula of order n 1/2, where n is the sample size, for the skewness coefficient of the distribution of the maximum likelihood estimators in the Weibull censored data. The present result is a nice approach to verify if the assumption of the normality of the regression parameter distribution is satisfied. Also, the expression derived is simple, as one only has to define a few matrices. We conduct an extensive Monte Carlo study to illustrate the behavior of the skewness coefficient and we apply it in two real datasets.

Keywords: maximum likelihood estimates; type I and II censoring; skewness coefficient; Weibull censored data

1. Introduction In its first appearance, the Weibull distribution1 [] claimed its wide applicability. Survival analysis, , and are some of its applicability. To amplify the relevance of the Weibull, a regression structure is added to one of the parameters, i.e., the behavior of the distribution may be explained from covariates (explanatory variables) and unknown parameters to be estimated from observable data. In , it is often desirable to test if there are regression parameters statistically significant and the is commonly performed. Under standard regularity conditions, the null distribution of the Wald statistic is asymptotically chi-squared, a consequence of the maximum likelihood estimators (MLE) distribution. Therefore, the Wald test must be avoided if the sample size is not large enough, because the distribution of the MLE will be poorly approximated by the normal distribution. Preventing the complexity of the statistical tests, the skewness coefficient (sayγ) of the distribution of the MLE is an easy way to verify if the approximation to normality is adequate. A value of γ far from zero indicates a departure from the normal distribution. Pearson’s standardized third cumulant γ = κ κ3/2 κ defined by 3/ 2 , where r is the rth cumulant of the distribution, is the most well-known measure of skewness. When γ > 0(γ < 0) the distribution is positively (negatively) skewed and will have a longer (shorter) right tail and a shorter (longer) left tail. If the distribution is symmetrical, γ equals zero. However, there are in [ 2] (Exercise 3.26) asymmetrical distributions with as many zero-odd order central moments as desired, so, the value of γ must be interpreted with some caution. In the statistical literature, there is not a closed-form for the skewness coefficient ofγ of the MLE −1/2 γ γ in several regression models. Ref. [3] obtained a generaln expression (say 1) for the distribution of the MLE, where n is the sample size. Following [ 3], several works have been developed in order

Symmetry 2019, 11, 1351; doi:10.3390/sym1111135153 www.mdpi.com/journal/symmetry Symmetry 2019, 11, 1351

γ to obtain the 1 coefficient. In the first, Ref. [4] determined its expression for the class of generalized γ linear models and, the last one, Ref. [5] defined the 1 for the varying dispersion beta regression model and showed that this coefficient for the distribution of the MLE of the precision parameter is relatively large in small to moderate sample sizes. This paper is the first focused on a censored model. γ In this work, we derive the 1 coefficient of the distribution of the MLE of the linear parameters in the Weibull censored data, assuming σ known, as σ = 1/2 and 1, the Rayleigh and exponential models, respectively. We discuss the situation when σ is unknown, however, it can be replaced by a consistent estimator, and then we can turn back to the original situation. This type of procedure was performed, for instance, by [4]. The remainder of the paper is organized as follows. Section 2 defines the Weibull censored data. − In Section 3, we obtain a simple matrix expression, of order n 1/2, for the skewness coefficients of the distributions of the MLEs of the parameters. In Section 4, some Monte Carlo simulations are performed. Two applications are presented in Section 5. Concluding remarks are offered in Section 6.

2. The Weibull Censored Data We say that a continuous random variableT has Weibull distribution with scale parameterθ and shape parameter σ,orT ∼ WE(θ, σ), if its probability density function (pdf) is given by ) * 1 σ− σ f (t; θ, σ)= t1/ 1 exp − (t/θ)1/ , (1) σθ1/σ

with t > 0, σ > 0 and θ > 0. From (1), we can observe two particular distributions: the exponential and the Rayleigh, where σ = 1 and σ = 1/2, respectively. In lifetime data, there is the censoring restriction, i.e, if T1, ..., Tn are a random sample from (1), instead of Ti, we observe, under right = ( ) = censoring, ti min Ti, Li , where Li is the censoring time, independent ofTi, i 1, ..., n. In this work, we consider an hybrid censoring scheme, where the study is finalized when a pre-fixed number, r ≤ n, = = = out of n observations have failed, as well as when a prefixed time, say L1 ... Ln L, has been reached. The type I censoring is a particular case forr = n and the type II censoring appears when =+∞ L1, ..., Ln . Additionally, we add the non-informative censoring assumption, i.e., the random θ variables Li does not depend on . Under this setup, the log-likelihood function has the form ) * −r (θ σ)=σθ1/σ 1 − − 1 L , exp σ 1 A1 θ1/σ A2 ,

= ∑n δ = ∑n δ = ∑n 1/σ δ = ≤ δ = where r i=1 i, A1 i=1 i log ti, A2 i=1 ti , i 1, if Ti Li and i 0, otherwise. Usually, = ( ) the regression modeling considers the distribution ofYi log Ti instead of Ti. The distribution of Yi is of the extreme value form with pdf given by  − μ − μ ( x )= 1 yi i − yi i −∞ < < ∞ f yi; i σ exp σ exp σ , yi , (2) μ = θ θ = xβ where i log i. The regression structure can be incorporated in (2) by making i exp i , β x where is a p-vector of unknown parameters and i is a vector of regressors related to the ith observation. From this moment, we assume thatσ is known, then, the log-likelihood function derived from (2) is given by n − μ − μ (β)= δ − σ + yi i − yi i ∑ i n log σ exp σ . i=1

The total score function and the total Fisher information matrix for β are, respectively, Uβ = σ−1XW1/2v K = σ−2XWX X =(x x ) and ββ , where 1, ..., n , the model matrix, assuming

54 Symmetry 2019, 11, 1351

−μ (X)= W = ( ) = E yi i v =( ) = rank) p, * diag w1, ..., wn , wi exp σ and v1, ..., vn , vi −μ − −δ + exp yi i w 1/2. It can observed that the value of w depends on the mechanism of i σ i ) *i 1/σ w = q × 1 − exp −L exp(−μ /σ) +(1 − q) × (r/n), where W( ) censoring. That means i i i r = P ≤ = = denotes the rth from W1, ..., Wn and q W(r) log Li . Note that q 1 and q 0 for types I and II censoring, respectively. The proof is presented in the Appendix A. The MLE of β, β, is the solution of Uβ = 0. The β can not be expressed in closed-form. It is typically obtained by numerically maximizing the log-likelihood function using a Newton or quasi-Newton nonlinear optimization algorithm. Under mild regularity conditions and in large samples, −1 β ∼ Np β, Kββ ,

approximately.

3. Skewness Coefficient As discussed, the skewness coefficient is simple way to verify whether the approximation to normality is adequate. The model presented in (2) does not has a closed-form for this coefficient. − The alternative is to apply the [3] result. These authors derived an approximation of orderO(n 2) for the third cumulant of the MLE of the a-th regressor, i.e., )  * κ (β )=E β − E(β ) 3 3 ˆ a ˆ a ˆ a ,

a = 1, . . . , p, which can be expressed as

κ (βˆ )=∑ κa,bκa,cκa,d (d) 3 a mbc , (3)

( ) ( ) ( ) ( ) d = κ d − (κ b + κ c + κ ) = ∑ where mbc 5 bc cd bd bcd , a 1, ..., p. Here, represents the summation over all combinations of parameters and over all the observations. From(3), after some algebra, we can express the third cumulant of the distribution of βˆ for the Weibull censored data as κ (β)=−σ−3P(3) W + σW 3 3 1, (4)

W = ( ) = −σ−1 1/σ {− 1/σ (−μ σ) − μ σ} P = K−1X = where diag w1, ..., wn , wi Li exp Li exp i/ i/ , ββ −  1  ( ) σ2 X WX X , P 3 = P  P  P,  represents a direct product of matrices and 1 is a n-dimensional vector of ones. Finally, by (4) and the Fisher information matrix, the asymmetry − coefficient of the distribution of β to order n 1/2 is given by ) * −3/2 γ (βˆ )=−σ−3P(3) W + σW  K−1 1 3 1 diag ββ 1 , (5) ) * −3/2 W = γ (βˆ )=−σ−3P(3)W  K−1 for type II censoring, 0, then (5) reduces to 1 1 diag ββ 1 . More details about the involved expressions are presented in Appendix A. The study of asymptotic properties of the Weibull censored data was the goal of many papers. Refs. [ 6,7] derived the Bartlett and the Bartlett-type correction factors for likelihood ratio and score tests, respectively, for the exponential censored data. Ref. [8] generalized these previous for the Weibull censored data and also derived the Bartlett-type correction factors for the gradient test. Ref.9 [] presented the asymptotic expansions up to − order n 1/2 of the non null distribution functions of the likelihood ratio, Wald, Rao score and gradient statistics also for the censored exponential data. The result in expression (5) can be incorporated in this gallery.

55 Symmetry 2019, 11, 1351

4. Simulation Study − In this section, we compare the sample skewness coefficient (ρ) and the n 1/2 skewness γ γ coefficients evaluated in the true and estimated parameters (1 and 1, respectively) of the distributions of the MLEs in the Weibull censored model. To draw the data, we consider three values forσ: 0.5, 1 and 3; five sample sizes: 20, 30, 40, 60 and 100; three values for the percent of censoring C: 10%, 25% and 50%; and two number of regressors p: 3 and 5, where we consider two vectors for β in each case: (−2, 0.5, 1) and (1, −0.75, 0.5) for p = 3 and (−2, 0.5, 1,−0.3, −0.5) and (1, −0.75, 0.5,−1, 0.8) for p = 5. For each combination of σ, β, % of censoring and sample size we considered 20,000 Monte Carlo x − replicates. Each vector of covariates i considers an intercept term and the p 1 remaining covariates were drawn independently from the standard normal distribution. Values from the Weibull model are drawn considering the inverse transformation method. Therefore, the greater n × C/100 values were censored at the observed (1 − C/100)-th quantile (a type II censoring scheme). For each sample, σ σ γ γ we considered the jackknife estimator for , say J. Therefore, the computation of 1 and 1 was (β σ) (β σ ) performed considering , and , J , the true and estimated parameters, respectively. Additionally, ρ is computed based on the 20,000 (marginal) skewness coefficient for the components of β. Table 1 summarizes the case β =(−2, 0.5, 1) (with p = 3 regressors) and C = 10%. The main conclusions are the following:

− Table 1. The n 1/2 and sample skewness coefficients of the distributions of the MLEs in the Weibull censored data with p = 3 regressors and β =(−2, 0.5, 1).

β β β 0 1 2 σ n ρ γ γ ρ γ γ ρ γ γ C 1 1 1 1 1 1 20 −0.235 −0.080 −0.095 0.052 0.207 0.197 0.071 0.245 0.234 30 −0.169 0.009 0.001 0.037 0.040 0.041 0.097 0.212 0.204 0.5 40 −0.114 −0.013 −0.019 0.038 0.107 0.104 0.075 0.196 0.193 60 −0.139 −0.082 −0.083 0.011 0.061 0.061 0.033 0.171 0.170 100 −0.059 −0.055 −0.055 0.054 0.074 0.073 −0.005 0.087 0.087 20 −0.198 0.017 0.004 −0.008 0.225 0.197 0.068 0.298 0.284 30 −0.219 −0.101 −0.107 0.110 0.200 0.192 0.216 0.304 0.299 10% 1.0 40 −0.210 0.004 −0.002 0.083 0.140 0.137 0.109 0.185 0.182 60 −0.147 −0.005 −0.010 0.085 0.143 0.137 0.057 0.173 0.171 100 −0.092 −0.013 −0.015 0.019 0.048 0.047 0.116 0.132 0.131 20 −0.232 0.006 0.005 0.094 0.143 0.131 0.022 0.093 0.087 30 −0.178 −0.041 −0.040 0.004 0.054 0.048 −0.007 0.147 0.136 3.0 40 −0.185 0.005 0.003 0.002 0.024 0.023 0.064 0.156 0.151 60 −0.128 −0.020 −0.020 0.044 0.049 0.047 0.073 0.124 0.120 100 −0.117 −0.028 −0.028 0.084 0.100 0.095 0.039 0.077 0.075

• γ γ γ The terms 1 and 1 are closer in all the considered combinations, suggesting that 1 approaches γ 1 in a reasonable way, even when the sample size is small. • γ ρ β β β In general terms, 1 approaches well for 1 and 2. However, for 0 the terms seem discrepant even for n = 100. • Considering the 90 cases for p = 3, ρ ranges from (−0.245, 0.255), (−0.429, 0.340) and (−0.819, 1.181) for C 10%, 25% and 50%, respectively. For p = 5, ρ ranges from (−0.373, 0.252), (−0.402, 0.198) and (−0.787, 0.495) for C 10%, 25% and 50%, respectively. This suggest that a higher percentage of censorship produce a higher skewness in the MLE estimators for the components of β. • Considering the 90 cases forp = 3, ρ ranges from (−0.819, 1.181), (−0.363, 0.867), (−0.351, 0.411), (−0.305, 0.346) and (−0.273, 0.255), for n = 20, 30, 40, 60 and 100, respectively. Forp = 5, ρ ranges

56 Symmetry 2019, 11, 1351

from (−1.015, 0.740), (−0.529, 0.426), (−0.372, 0.413), (−0.318, 0.320) and (−0.225, 0.243) for n = 20, 30, 40, 60 and 100, respectively. This suggest that, as expected, when n increases the skewness coefficient of the MLE estimators for the components of β will be more symmetric.

Results suggest that, even with a moderate percentage of censored observations and small sample sizes, the distribution of the MLE for the components of β in the Weibull censored model are closer to the symmetry. The combinations ofβ, p and C not seem to affect the results. A simulation study showing this finding was omitted for the sake of brevity.

5. Applications In this section we illustrate with two real dataset the application of the estimated skewness coefficient for the MLE estimators in the Weibull censored regression model. All the routine was performed in the statistical software R, [10]. Codes can be found in the personal website from the first author https://www.ufjf.br/tiago_magalhaes/downloads/.

5.1. Smokers Dataset This dataset is related to a on the effectiveness of triple-combination pharmacotherapy for tobacco dependence treatment conducted by the Cancer Institute of New Jersey and Robert Wood Johnson Foundation. The trial recruited 127 smokers 18 years or older with predefined medical illnesses from the local community. The outcome were the time (in days) to first relapse (return to smoking). The study lasted 182 days (26 weeks). Therefore, the times are subject to a censoring type I (32% of times were censored). We only considered the 113 patients where such observed time was positive (non-zero). Other measures were assigned randomly treatment group with levels combination or patch onlygrp ( ), age in years at time of randomization (age) and employment (full-time or non-full-time). We consider ∼ (θ σ) θ = Xβ β =(β β β β ) that timei WE i; , where log i i , intercept, grp, age, employment and X =( ) i 1, grpi, agei, employmenti σ = We estimated J 1.617008 based on the jackknife method, which was used as known in all the computations. Table 2 shows the parameters estimates, their standard errors and the estimated skewness coefficients and Figure 1 shows the estimated density function based on 1000 bootstrap samples for the coefficients related to the covariatesgrp, age and employment. Note that the estimated skewness for all parameters were closer to zero, suggesting a symmetric distribution for the estimators which is corroborated by the estimated density based on the bootstrap.

Figure 1. Estimated density function based on 1000 bootstrap samples and the asymptotic distribution for βgrp (left panel), βage (center panel) and βemployment (right panel). The red line denotes the estimated parameter.

57 Symmetry 2019, 11, 1351

Table 2. Estimates for parameters and skewness coefficient in smokers dataset.

γ Parameter Estimate s.e. 1

βintercept 3.1690 0.8136 −0.0478 βgrp −1.0303 0.3694 −0.0529 βage 0.0541 0.0167 0.1251 βemployment −1.1460 0.3935 −0.0753

5.2. Insulating Fluids Dataset This dataset was presented in [ 11] on insulating fluids and it is related an accelerated test performed in order to determine the relationship between time (in minutes) to breakdown and voltage (in kilovolts). The authors assumed a regression structure based on the Weibull model and = ∼ (θ σ) θ = Xβ a common censoring time at L 200 (type I censoring), i.e., timei WEI i, , where log i i , = X =(β β ) σ = i 1, ...,76, i Intercept, log-voltage . We estimated J 1.296704 based on the jackknife method, which was used as known. Table 3 shows the estimates, standard errors and estimated skewness coefficient for the MLE estimators and Figure 2 shows the estimated density function for βIntercept and βlog-voltage. Newly, the estimated skewness for both parameters are closer to zero, suggesting a symmetric distribution for the estimators as also suggest the estimated density based on bootstrap.

Table 3. Estimates for parameters and skewness coefficient in insulating fluids dataset.

γ Parameter Estimate s.e. 1

βintercept 20.4342 1.8772 0.1451 βlog-voltage −0.5311 0.0557 −0.1517

Figure 2. Estimated density function based on 1000 bootstrap samples and the asymptotic distribution for βIntercept (left panel) and βlog-voltage (right panel). The red line denotes the estimated parameter.

6. Concluding Remarks Since its beginning, the Weibull distribution and regression showed how it is important. In the frequentist context, this model depends strongly on the asymptotic properties of the MLE. Here, we presented an expression of the skewness that, in practical applications, can be used as an indicator of departure from the normal distribution of the MLE. Although the expression(3) entails a great deal of algebra, the final expression (5) of the skewness of the MLE distribution has a very nice form only

58 Symmetry 2019, 11, 1351 involving simple operations on diagonal matrices and can be easily implemented into a statistical software, for instance, R, [10].

Author Contributions: All the authors contributed significantly to the present paper. Funding: The research of D.I. Gallardo was supported by Grant from FONDECYT (Chile) 11160670. H.W. Gómez was supported by Grant SEMILLERO UA-2019 (Chile). Acknowledgments: We also acknowledge the referee’s suggestions that helped us to improve this work. Conflicts of Interest: The authors declare no conflict of interest.

Appendix A In this Section we provided some additional details related to the computation of the manuscript.

Appendix A.1. W’s Quantities −μ = E yi i = In order to compute wi exp σ , i 1, ..., n, we first study the case type I censoring. Note that ⎧ −μ − μ ⎨ yi i ≤ yi i exp σ ,if yi log Li exp = −μ σ ⎩ log Li i exp σ , otherwise

Therefore, log L ( − μ ) − μ − μ = i 1 2 yi i − yi i + log Li i P( > ) wi σ exp σ exp σ dyi exp σ Ti Li −∞ σ −μ σ σ −μ σ σ −μ σ σ −μ σ = 1 − exp −L1/ e i/ 1 + L1/ e i/ + L1/ e i/ exp −L1/ e i/ i i i i σ −μ σ = − − 1/ i/ 1 exp Li e .

Direct computation also shows that ( − μ ) = E 2 yi i vi exp σ log L ( − μ ) − μ ( − μ ) = i 1 3 yi i − yi i + 2 log Li i P( > ) σ exp σ exp σ dyi exp σ Ti Li −∞ σ −μ σ σ −μ σ σ − μ σ σ − μ σ σ −μ σ = 2 − exp −L1/ e i/ 2 + 2L1/ e i/ + L2/ e 2 i/ + L2/ e 2 i/ exp −L1/ e i/ ) i i *i i i σ −μ σ σ −μ σ = 2 1 − exp −L1/ e i/ 1 + L1/ e i/ . / i i = + σ 2 wi wi . −μ = yi i ∼ ( ) On the other hand, for the type II censoring note that Wi exp σ E 1 .By[12], we have that i =D Zj W(i) ∑ − + , (A1) j=1 n j 1

59 Symmetry 2019, 11, 1351

=D where W(i) is the ith order statistic from W1, ..., Wn, denotes “equal in distribution” and Z1, ..., Zn ( ) E( )= ( )= are independent and identically distributedE 1 random variables. Therefore, W(j) Var W(j) ∑j ( − + )−1 k=1 n k 1 and ⎧ ⎪ ⎪ W(1), with probability 1/n ⎨⎪ . W = . i ⎪ ⎪ W(r−1), with probability 1/n ⎩ ( − + ) W(r), with probability n r 1 /n

Therefore,

r−1 j ( − + ) r = E( )= 1 ( − + )−1 + n r 1 ( − + )−1 wi Wi ∑ ∑ n k 1 ∑ n k 1 . n j=1 k=1 n k=1

E( )= E( 2 )=E( )+ With some manipulations, we obtain that Wi r/n. Also, we have that W(j) W(j) E2( ) W(j) and ⎧ ⎪ W2 , with probability 1/n ⎪ (1) ⎨⎪ . = . Vi ⎪ ⎪ W2 , with probability 1/n ⎪ (r−1) ⎩ 2 ( − + ) W(r), with probability n r 1 /n

Therefore,     2 r−1 j ( − + ) r 2 = + 1 ( − + )−1 + n r 1 ( − + )−1 vi wi ∑ ∑ n k 1 ∑ n k 1 . n j=1 k=1 n k=1

Algebraic manipulations shows that   r ( − )+ = 1 + 2 r k 1 vi r ∑ − + . n k=1 n k 1

Finally, as the hybrid scheme can be seen as a mixture between type I and II censoring, we obtain directly that σ −μ σ = × − − 1/ i/ +( − ) × ( ) wi q 1 exp L e 1 q r/n , i   ) * r σ −μ σ σ −μ σ 1 2(r − k)+1 = × − − 1/ i/ + 1/ i/ +( − ) × + ∑ vi q 2 1 exp Li e 1 Li e 1 q r − + , n k=1 n k 1 = P( ≤ ) where q is the mixing probability given byq W(r) log L .By(A1),W(r) has hypoexponential [13] λ =(λ λ ) λ =( − + )−1 distribution with vector of parameters 1,..., n , where j n j 1 . Therefore,

−λ n L j q = 1 − ∑ , j=1 Pj

= ∏n ( − ) ( − + ) where Pj k=1,k =j k j / n j 1 .

60 Symmetry 2019, 11, 1351

Appendix A.2. Derivatives and Cumulants

Let Y1, ..., Yn a random sample from Weibull censored data, the of the likelihood function is given by  n − μ − μ (β)= δ − σ + yi i − yi i ∑ i n log σ exp σ . (A2) i=1

The first four derivatives of (A2) can be expressed, respectively, for  ∂ n − μ (β)= 1 −δ + yi i ∂β σ ∑ i exp σ xri; r = i 1 ∂2 n − μ (β)=− 1 ∑ yi i ∂β ∂β 2 exp σ xrixsi; r s σ = i 1 ∂3 n − μ (β)= 1 ∑ yi i ∂β ∂β ∂β σ3 exp σ xrixsixti. r s t i=1

The second- to third-order cumulants are

n n κ = − 1 ∑ κ = −κ = 1 ∑ rs σ2 wixrixsi; r,s rs σ2 wixrixsi; i=1 i=1

n n κ = 1 ∑ κ(t) = − 1 ∑ rst σ3 wixrixsixti; rs σ2 wi xrixsixti; i=1 i=1 ) * −μ = E yi i where wi exp σ , ) * = − − 1/σ (−μ σ) wi 1 exp Li exp i/ , = − 1 1/σ {− 1/σ (−μ σ) − μ σ} wi σ Li exp Li exp i/ i/ ,

= It can be observed that wi 0 for type II censoring.

References

1. Weibull, W. A statistical distribution function of wide applicability. J. Appl. Mech. 1951, 18, 293–297. 2. Kendall, M.G.; Stuart, A. The Advanced Theory of Statistic, 4th ed.; Griffin: London, UK, 1977. 3. Bowman, K.O.; Shenton, L.R. Asymptotic skewness and the distribution of maximum likelihood estimators. Commun. Stat. Theory Methods 1998, 27, 2743–2760. 4. Cordeiro, H.H.; Cordeiro, G.M. Skewness for parameters in generalized linear models. Commun. Stat. Theory Methods 2001, 30, 1317–1334. 5. Magalhães, T.M.; Botter, D.A.; Sandoval, M.C.; Pereira, G.H.A.; Cordeiro, G.M. Skewness of maximum likelihood estimators in the varying dispersion beta regression model. Commun. Stat. Theory Methods 2019, 48, 4250–4260. 6. Cordeiro, G.M.; Colosimo, E.A. Improved likelihood ratio tests for exponential censored data. J. Stat. Comput. Sim. 1997, 56, 303–315. 7. Cordeiro, G.M.; Colosimo, E.A. Corrected score tests for exponential censored data. Stat. Probab. Lett. 1999, 44, 365–373. 8. Magalhães, T.M.; Gallardo, D.I. Bartlett and Bartlett-type corrections for Weibull censored data. Stat. Oper. Res. Trans. 2019, Submitted.

61 Symmetry 2019, 11, 1351

9. Lemonte, A.J. Non null asymptotic distribution of some criteria in censored exponential regression. Commun. Stat. Theory Methods 2014, 43, 3314–3328. 10. R Development Core Team. A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019. 11. Nelson, W.; Meeker, W.Q. Theory for optimum accelerated censored life tests for Weibull and extreme value distributions. Technometrics 1978, 20, 171–177. 12. Renyi, A. On the theory of order statistics. Acta Math. Hung. 1953, 4, 191–231. 13. Devianto, D.; Oktasari, L.; Maiyastri. Some properties of hypoexponential distribution with stabilizer constant. Appl. Math. Sci. 2015, 142, 7063–7070.

c 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

62

S symmetry S

Article Flexible Birnbaum–Saunders Distribution

Guillermo Martínez-Flórez 1, Inmaculada Barranco-Chamorro 2, Heleno Bolfarine 3 and Héctor W. Gómez 4,* 1 Departamento de Matemáticas y Estadística, Facultad de Ciencias Básicas, Universidad de Córdoba, Córdoba 230027, Colombia; [email protected] 2 Departamento de Estadística e Investigación Operativa, Universidad de Sevilla, 41000 Sevilla, Spain; [email protected] 3 Departamento de Estatítica, IME, Universidade de São Paulo, São Paulo 01000, Brazil; [email protected] 4 Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile * Correspondence: [email protected]

Received: 5 September 2019; Accepted: 4 October 2019; Published: 16 October 2019

Abstract: In this paper, we propose a bimodal extension of the Birnbaum–Saunders model by including an extra parameter. This new model is termed flexible Birnbaum–Saunders (FBS) and includes the ordinary Birnbaum–Saunders (BS) and the skew Birnbaum–Saunders (SBS) model as special cases. Its properties are studied. Parameter estimation is considered via an iterative maximum likelihood approach. Two real applications, of interest in environmental sciences, are included, which reveal that our proposal can perform better than other competing models.

Keywords: flexible skew-normal distribution; skew Birnbaum–Saunders distribution; bimodality; maximum likelihood estimation; Fisher information matrix

1. Introduction The BS distribution was originally introduced in [1] to model the fatigue in the lifetime of certain materials. During the last decades, mainly due to its good properties, the use of this model spread out to other fields, such as economics and environmental sciences. In these applied scenarios, quite often, departures of the BS model are found, and therefore it is necessary to introduce some improvements. In this paper, we focus on those situations in which extra asymmetry or bimodality are present in our data, and a generalization of the BS model should be considered to deal with these issues. To reach this end, a flexible BS model is introduced. Our proposal is based on the flexible skew-normal distribution introduced in [2], and includes, as particular cases, the BS and skew BS distribution. Next, we briefly describe the key aspects that properly combined result in the flexible BS model. These are asymmetry, bimodality and main features of the basic BS model.

1.1. Asymmetry Earlier results on asymmetric models started with the pioneering works by [ 1,3,4]. This topic regained interest with the study in5 [], which from a Bayesian point of view developed a new asymmetric model which was later studied in depth by Azzalini [ 6], from a classical point of view. Azzalini model was termed the skew-normal distribution. Following Azzalini’s method, a general family of asymmetric models termed skew-symmetric models appeared in the literature. The following lemma, originally presented in [6], can be considered as the starting point for the development of these asymmetric models.

Symmetry 2019, 11, 1305; doi:10.3390/sym1110130563 www.mdpi.com/journal/symmetry Symmetry 2019, 11, 1305

Lemma 1. Let f0 be a probability density function (pdf) which is symmetric around zero, and G a cumulative distribution function (cdf) such that G exists and is a symmetric pdf around zero. Then

( λ)= ( ) (λ ) ∈ R fZ z; 2 f0 z G z , z , (1) is a pdf for λ ∈ R.

(·) (·) λ Equation (1) provides the skew version of f0 with skewing function G and the skewness (·)=φ(·) (·)=Φ(·) ( ) parameter. If f0 and G , the pdf and cdf, respectively, of the N 0, 1 distribution, then the skew-normal is obtained, whose pdf is

( )= φ( )Φ(λ ) ∈ R λ ∈ R fZ z 2 z z , z , . (2)

Other examples of skew models are: skew-t, skew-Cauchy, skew-elliptical, and generalized skew-elliptical. We highlight that all of them are unimodal distributions.

1.2. Bimodality Another fundamental result in our proposal will be the following lemma, which was given in δ Gómez et al. [2]. These authors extended (1), by introducing a parameter in f0, in such a way that for certain values of δ the resulting distribution is bimodal.

Lemma 2. Let f be a symmetric pdf around zero, F the corresponding cdf and G an absolutely continuous cdf such that G exists and is symmetric around zero. Then

g(z; δ, λ)=cδ f (|z| + δ)G(λz), z ∈ R, λ, δ ∈ R (3)

−1 is a pdf and cδ = 1 − F(δ).

Taking f (·)=φ(·) and G(·)=Φ(·),in(3), the flexible skew-normal (FSN) model was obtained and studied in detail in [2]. There, it was proved that the FSN model can be bimodal for certain values of δ. Notice that the FSN model is obtained by adding an extra parameter, δ, to the skew-normal distribution proposed in [6]. That is a random variable (rv)Z follows a FSN distribution, Z ∼ FSN(δ, λ), if its pdf is given by

f (z; δ, λ)=cδφ(|z| + δ)Φ(λz), z ∈ R, λ, δ ∈ R (4)

−1 where φ and Φ are the pdf and cdf of the N(0, 1) distribution, respectively, and cδ = 1 − Φ(δ). Other recent proposals in the contemporary literature dealing with bimodality are the extended two-pieces skew-normal model (ETN), introduced in [ 7] and the uni-bi-modal asymmetric power normal model given in [8] whose properties are based on results given in [ 9,10]. Applications of interest in Economics are given in11 [ ]. All these references show the interest in the latest years for modelling bimodality.

1.3. BS Model The BS or fatigue life distributions was proposed for modelling survival time data and material lifetime subject to stress in [12,13]. This model is asymmetric and only fits positive data. The pdf of a BS distribution is given by

−3/2( + β) ( )=φ( ) t %t > fT t at , t 0, (5) 2α β

64 Symmetry 2019, 11, 1305 where  1 t β a = a (α, β)= − , (6) t t α β t

α > 0 is a shape parameter,β > 0 is a scale parameter and the median of this distribution. (5) is denoted as T ∼ BS(α, β). It is well known that α is the parameter that controls asymmetry. Specifically, (5) becomes more asymmetric as α increases and symmetric around β as α gets close to zero. It can be seen in [13] that (5) can be obtained as the distribution of the random variable   2 α α 2 T = β Z + Z + 1 , (7) 2 2 where Z ∼ N(0, 1). The BS model has been applied to a variety of practical situations. However, quite often, although the data suggest a BS distribution, some deficiencies are observed in the fitted BS model. This problem has motivated an increasing interest in its generalizations. We highlight that, recently, this model was extended by [ 14] to the family of elliptical distributions, this is known in the literature as the generalized Birnbaum–Saunders (GBS) distribution. Later, [ 15] proposed an extension based on the elliptical asymmetric distributions, known as the doubly generalized Birnbaum–Saunders model. On the other hand, [ 16] presents the asymmetric BS distribution with five parameters called the extended Birnbaum–Saunders (EBS) distribution. Other types of extensions are the asymmetric epsilon-Birnbaum–Saunders model given in [ 17], models in [18] based on the slash-elliptical family of distributions, and the generalized modified slash Birnbaum–Saunders (GMSBS) proposed in [ 19], which is based on [20]. In these extensions, we find that the asymmetric BS models previously cited, such as [ 15,21], are designed to fit data with greater or smaller asymmetry (or kurtosis) than that of the ordinary BS model, but they are not appropriate for fitting bimodal data. On the other hand, we highlight that the extension given in [21], which can become bimodal for certain combination of parameters is unable to capture bimodality unless it is accentuated enough. Therefore there exists a real need for an asymmetric model, based on the BS distribution, and able to fit data presenting bimodal features, which is not uncommon in the literature. So the present paper presents a flexible BS distribution able to model skewness and to fit data with and without bimodality. The paper is organized as follows. Section 2 is devoted to the development of an asymmetric uni-bimodal BS model. Its properties are studied in depth. Specifically, a closed expression for the cumulative distribution function (cdf) is given in terms of the cdf of a bivariate normal distribution. Some of the models proposed in [ 15,22] are obtained as particular cases. The shape and bimodality of the distribution are studied. It is shown that this model is closed under a change of scale and reciprocity. Survival and hazard functions are also obtained. Section 3 deals with moments derivation and iterative maximum likelihood estimation methods for the new model. Section 4 is devoted to real data applications of interest in environmental sciences. The first one deals with a bimodal situation in which our proposal performs better than other BS models and a mixture of normal distributions. The second one is taken from [ 16], where the extended BS model was proposed as the best for this dataset. It is shown that the FBS outperforms the extended BS model.

2. Results in Flexible Birnbaum-Saunders Based on the flexible skew-normal model proposed in [ 2], we extend the Birnbaum–Saunders. The main idea is to apply (7) withZ ∼ FSN(δ, λ) introduced in (4). This new model is called the flexible Birnbaum–Saunders (FBS) distribution whose pdf is given by

− t 3/2(t + β) f (t; α, β, δ, λ)= φ(|at| + δ)Φ(λat), (8) 2αβ1/2(1 − Φ(δ))

65 Symmetry 2019, 11, 1305

with at defined in (6), t > 0, α > 0, β > 0, δ ∈ R, λ ∈ R, φ(·) and Φ(·) the pdf and cdf of a N(0, 1), respectively. We use the notationT ∼ FBS(α, β, δ, λ). The inclusion of parameters δ and λ makes our approach more flexible than the extensions previously discussed. λ is a parameter that controls asymmetry (skewness) and δ is a shape parameter related to bimodality of our proposal. If λ = 0 then we obtain, as a particular case, the model introduced by [22]. Figures 1 and 2 depict the behaviour of (8) for some values of parameters, illustrating that it can be bimodal for some combinations of them.

2.1. Interpretation of Parameters. In both figures the values of parameters α and β are fixed. We study the effects of

(i) λ positive versus λ negative. (ii) Increasing δ > 0 in Figure 1. Decreasing δ < 0 in Figure 2.

Figure 1 suggests that, for α and β fixed, if a positive value of δ is considered then we have a unimodal distribution and the peak of the distribution increases whenδ increases: δ = 0.75 (red solid line), δ = 1.5 (green dashed line), ..., δ = 3 (blue dashed dotted line). This happens for positive and negative values of λ. On the other hand, in Figure 2, we have different situations. This plot suggests that, for α and β fixed, if a negative value of δ is considered then a bimodal distribution can be obtained. For positive λ,ifδ decreases: δ = −0.75 (red solid line), δ = −1.5 (green dashed line), ..., δ = −3 (blue dashed dotted line), then the peaks decrease and bimodality becomes more accentuated. For negative λ,ifδ decreases, then main peak increases and bimodality becomes less accentuated. Also, note in Figures 1 and 2, that in the FBS model the pdf for negativeλ is no longer the specular image of plot for positive λ.

)%6SGI VODPEGD  )%6SGI VODPEGD í

GHOWD  GHOWD  GHOWD  GHOWD  GHOWD  GHOWD  GHOWD  GHOWD  I W I W      

             

W W

(a) (b) Figure 1. FBS distributions forα = 0.75, β = 1 (both fixed). In (a) λ = 1 versus (b) λ = −1. Increasing values of δ > 0: δ = 0.75 (red solid line), 1.5 (green dashed line), 2.25 (black dotted line) and 3.0 (blue dashed and dotted line).

66 Symmetry 2019, 11, 1305

)%6SGI VODPEGD  )%6SGI VODPEGD í

GHOWD í GHOWD í GHOWD í GHOWD í GHOWD í GHOWD í GHOWD í GHOWD í I W I W  

 

W W

(a) (b) Figure 2. Flexible Birnbaum–Saunders (FBS) distributions for α = 0.30, β = 0.75 (both fixed). In ( a) λ = 0.5 versus (b) λ = −0.5. Decreasing values ofδ < 0: δ = −0.75 (red solid line),−1.5 (green dashed line), −2.25 (black dotted line) and −3.0 (blue dashed and dotted line).

2.2. Properties Next, important properties of the FBS model are presented. First an explicit expression for the cdf is given in terms of the cdf of a bivariante normal distribution.

Proposition 1. Let T ∼ FBS(α, β, δ, λ). Then the cdf of T is ⎧ ⎪ Φ √ λδ − δ < < β ⎪ cδ λ , at , if 0 t ⎨ BN 1+λ2 F (t)= (9) T ⎪ ⎩ Φ √ λδ −δ + Φ − √ λδ + δ − Φ − √ λδ δ ≥ β cδ λ , λ , at λ , , if t , BN 1+λ2 BN 1+λ2 BN 1+λ2 Φ ( ) μ =( ) where BNλ x, y is the cdf of a bivariate normal distribution, with mean vector 0, 0 and covariance matrix  1 ρλ λ Ωλ = where ρλ = − √ . (10) ρλ 1 1+λ2

Proof. It can be seen in Appendix A.

Next some particular cases of interest for λ and δ parameters are discussed. Results about the (·) shape of fT are included.

2.2.1. Effect of λ. Corollary 1. Let T ∼ FBS(α, β, δ, λ).Ifλ = 0 then the cdf of T is cδ Φ( − δ) < < β ( )= 2 at , if 0 t FT t cδ {Φ( + δ)+ − Φ(δ)} ≥ β (11) 2 at 1 2 , if t .

Proof. If λ = 0 then ρλ, defined in (10), is equal to zero, and since in the bivariate normal distribution uncorrelation implies independence, we have that

Φ ( )=Φ( ) Φ( ) ∀( ) BNλ=0 x, y x y , x, y .

Taking into account thatΦ(0)=1/2 and Φ(−δ)=1 − Φ(δ), we have that (9) reduces to (11).

Result in Corollary 1 corresponds to the model studied in [22].

2.2.2. Effect of δ. ∼ (α β δ λ) δ = ( )= Φ ( ) > Corollary 2. Let T FBS , , , .If 0 then FT reduces to FT t 2 BNλ 0, at , for t 0.

67 Symmetry 2019, 11, 1305

Corollary 2 is a particular case of models studied in [15]. (·) 2.2.3. Shape of fT . Proposition 2. Let T ∼ FBS(α, β, δ, λ). Then the pdf given in (8) is nondifferentiable at t = β.

Proof. It follows from (8), by noting that if t = β then at = 0 and the absolute value function is not differentiable at zero.

Proposition 3. Let T ∼ FBS(α, β, δ, λ). The pdf given in (8) can be bimodal. The modes are the solution of the following non-linear equations. ∗ 1. 0 < t < β solution of 1 φ(λ ) a at1 t1 at = δ + λ + ) * . (12) 1 Φ(λa ) 2 t1 a t1 ∗ 2. t > β solution of 2 φ(λ ) a at2 t2 at = −δ + λ + ) * , (13) 2 Φ(λa ) 2 t2 a t2

With at given in (6), at and at the first and second derivatives of at with respect to t, respectively.

Proof. It is given in Appendix A.

Comments on the use of (12) and (13) are included in Appendix A, Remark A1.

Remark 1. Equations obtained in (12) and (13) are similar to those we have in the skew normal and BS model. ∗ 1. Let Z ∼ SN(λ), λ ∈ R. Then Z is unimodal and the mode, z , is given by the solution of the non-linear equation φ(λz) z = λ . Φ(λz) ∗ 2. Let T ∼ BS(α, β), α, β > 0. Then T is unimodal and the mode, t , is given by the solution of the non-linear equation . / − 2 + = at at at 0.

Next it is shown that the p-th quantile of T can be given in terms of the pth quantile of the FSN(δ, λ). Also it is proved that the FBS model is closed under change of scale and reciprocity.

+ Theorem 1. Let T ∼ FBS(α, β, δ, λ), with α, β ∈ R and δ, λ ∈ R. Then

(i) Let tp be the pth quantile of T , 0 < p < 1.  2 α α 2 t = β z + z + 1 (14) p 2 p 2 p

where zp denotes the pth quantile of Z ∼ FSN(δ, λ). (ii) kT ∼ FBS(α, kβ, δ, λ) for k > 0. − − (iii) T 1 ∼ FBS(α, β 1, δ, −λ).

Proof. It can be seen in Appendix A.

68 Symmetry 2019, 11, 1305

2.2.4. Lifetime Analysis The BS model is commonly used to explain survival and material resistance data. The survival and risk (or hazard) functions are important indicators in such fields. For the FBS model these functions are given next.

+ Proposition 4. Let T ∼ FBS(α, β, δ, λ) with α, β ∈ R and δ, λ ∈ R. Then ( )= [ > ]= − ( ) (·) (i) The is S t P T t 1 FT t with FT given in (9). (ii) The hazard function, r(t)= f (t)/S(t),is ⎧ φ(| |+δ)Φ(λ ) ⎪ cδ at at at , if 0 < t < β ⎨⎪ λδ 1−cδΦ √ ,a −δ BNλ +λ2 t ( )= 1 r t ⎪ cδ a φ(|a |+δ)Φ(λa ) ⎪ t t t , if t ≥ β ⎩ − Φ √ λδ −δ +Φ − √ λδ +δ −Φ − √ λδ δ 1 cδ BNλ , BNλ ,at BNλ , 1+λ2 1+λ2 1+λ2 Φ (·) with BNλ the cdf of the bivariate normal given in Proposition 1.

In Figure 3, the hazard function for those pdf’s considered in Figures 1 and 2 are plotted. These graphs show that, for the FBS distribution, the hazard function admits a variety of shapes, which is interesting from an applied point of view.

Figure 1 (a) Figure 1 (b)

3 5 4 2 3 2 1 1

0.5 1.0 1.5 2.0 2.5 3.0 0.5 1.0 1.5 2.0 2.5 3.0 Figure 2 (a) Figure 2 (b) 6 5 8 4 6 3 4 2 1 2 0.5 1.0 1.5 2.0 2.5 3.0 0.5 1.0 1.5 2.0 2.5 3.0

Figure 3. Hazard function of the FBS distribution for plots corresponding to Figure 1 (a), (b):α = 0.75, β = 1 (both fixed), δ = 0.75 (red solid line), 1.5 (green dashed line), 2.25 (black dotted line) and 3.0 (blue dashed dotted line), in (a) λ = 1 versus (b) λ = −1. For plots corresponding to Figure 2 (a), (b): α = 0.30, β = 0.75 (both fixed), δ = −0.75 (red solid line), –1.5 (green dashed line), –2.25 (black dotted line) and –3.0 (blue dashed dotted line), in (a) λ = 0.5 versus (b) λ = −0.5.

Remark 2. More complicated hazard functions than the traditional ones are obtained when we are dealing with models with complex structure, as it happens with the FBS. For instance, in Figure 3, we have two situations:

1. r(t) corresponding to Figure 1a,b. These are, first, quickly increasing, later decreasing more slowly or even in a flat way. It can be applied in practical situations in which the risk of failure increases quickly until certain point in which its behaviour becomes flatter. As [23] points out, the flat area is very interesting in survival analysis and reliability contexts.

69 Symmetry 2019, 11, 1305

2. r(t) corresponding to Figure 2a,b are increasing-decreasing-increasing. This kind of hazard functions has been recently introduced and discussed in literature, due to its interest in reliability of systems, see for instance [23]or[24] (and references therein). In plot for Figure 2b,r(t) is (quickly) increasing—or (quickly) decreasing. On the other hand, for Figure 2a the initial effect increasing—decreasing is less accentuated.

3. Moments and Maximum Likelihood Estimation Moments of the FBS model can be obtained from the moments of the flexible skew-normal model given in [2]. The following results present important properties relating those distributions, and the expressions for the first moment and variance in the FBS model.

Theorem 2. Let T ∼ FBS(α, β, δ, λ) and Z ∼ FSN(δ, λ). Then E(Tr), r = 0, 1, ..., always exists. Moreover   βr 2r 2r−k E( r)= ∑ 2r E (α )k α2 2 + 2 = T r Z Z 4 , r 0, 1, . . . . (15) 4 k=0 k

Proof. From (7), we can write  β 1/2 2 T = αZ + α2Z2 + 4 . 4

Taking expectation of the rth-power of T    βr 1/2 2r E(Tr)= E αZ + α2Z2 + 4 . (16) 4r

From (16), note that forr = 0, 1, ..., E(Tr) exists if and only ifE(Z2r) exists. On the other hand, it can be seen in [2] that E(Z2r) always exists, and therefore E(Tr) too. Finally note that (15) is the result of applying the binomial formula to (16). ∼ (α β δ λ) Next, explicit expressions for the√ expected value and variance of T FBS , , , are given. κ = E Zj α2 2 + ∼ (δ λ) In these expressions, j SFN 2 Z 4 with Z FSN , .

Theorem 3. Let T ∼ FBS(α, β, δ, λ). Then α2 ) * 2 E(T)=β 1 + ακ + cδ (1 + δ )(1 − Φ(δ)) − δφ(δ) , (17) 1 2

α4 E( 2)=β2 7 cδ ( + δ2 + δ4)( − Φ(δ)) − δ( + δ2)φ(δ) + α3κ + ακ + T 3 6 1 5 3 2 1 1 16 2 2 2 + 2α β cδ (1 + δ )(1 − Φ(δ)) − δφ(δ) and 7α4cδ Var(T)=β2 (3 + 6δ2 + δ4)(1 − Φ(δ)) − δ(5 + δ2)φ(δ) + α3κ − α2κ + 1 16 3 1 α2β2 cδ 2 2 2 − (1 + δ )(1 − Φ(δ)) − δφ(δ) α cδ (1 + δ )(1 − Φ(δ)) − δφ(δ) + 4(1 − ακ ) 4 1 √ κ = E Zj α2 2 + ∼ (δ λ) where j FSN 2 Z 4 and Z FSN , .

70 Symmetry 2019, 11, 1305

κ Proof. These results follows from Theorem 2 and the expressions of j, which have been computed by using the results for moments of Z ∼ FSN(δ, λ) obtained in [2]. As illustration, note that for the case r = 1, (15) reduces to β % E(T)= E(α2Z2 + 4)+2E (αZ) α2Z2 + 4) + E(α2Z2) 4 α2 = β 1 + ακ + E(Z2) , 1 2 it can be seen in [2] that E(Z2)=cδ(1 + δ2) (1 − Φ(δ)) − δφ(δ)), and so (17) is obtained.

3.1. Maximum Likelihood Estimators Parameter estimation in the BS model has been the topic of interest in many papers. Among others, we mentioned [25–27]. To estimate the parameters in the usual BS model, the modified moment method (MME) and maximum likelihood (MLE) are commonly used. To start the maximum likelihood approach moment estimators are used which are given by √ β = s ˆ M sr, αˆ = 2 − 1 . M r −1 1 n 1 n 1 where s = ∑ = t and r = ∑ = are the sample (arithmetic) and , n i 1 i n i 1 ti respectively. Relevant aspects of this distribution such as its robustness with respect to parameter − estimation and O(n 1) bias corrections for MLEs, can be seen in [25–27]. In the following, we discuss MLE estimation for the FBS model in depth. Thus, given ∼ (α β δ λ) n observations independent and identically distributed, T1, T2, ..., Tn, with Ti FBS , , , , the log-likelihood function for the parameter vector θ =(α, β, δ, λ) is given by n n (θ)=− (α)+1 (β)+ ( − Φ(δ)) − 3 ( )+ ( + β) n log log log 1 ∑ log ti ∑ log ti 2 2 i=1 i=1 n n − 1 ∑( 2 + δ| | + δ2)+∑ (Φ(λ )) ati 2 ati log ati . (18) 2 i=1 i=1

To maximize l(θ) in θ, consider the first derivatives of l(θ) with respect to α, β, δ and λ, denoted as l˙α, l˙β, l˙δ and l˙λ, respectively. From l˙α = 0, l˙β = 0, l˙δ = 0 and l˙λ = 0, the likelihood equations are given by n n n φ(λat ) − n + ∑ a2 − δ ∑ |a |−λ ∑ a i = 0 (19) ti ti ti Φ(λ ) i=1 i=1 i=1 ati n n n 1 1 φ(λat ) t + β − + ∑ + ∑ sgn(a ) (|a | + δ) − λ i i√ = 0 (20) β + β αβ3/2 ti ti Φ(λ ) 2 i=1 ti 2 i=1 ati ti φ(δ) n δ − = − 1 ∑ | | − Φ(δ) ati (21) 1 n i=1 n φ(λa ) ∑ ti = ati Φ(λ ) 0 (22) i=1 ati in which sgn(·) denotes the sign function. The solution to the previous system of equations must be obtained by iterative methods such as the Newton-Raphson or quasi-Newton procedures, which can be implemented using the statistical software R, [28].

71 Symmetry 2019, 11, 1305

As initial estimates of α and β can be proposed the estimates of these parameters obtained in the α β basic BS model, denoted as 0 and 0. These estimates can be plugged into (21) and (22) to obtain δ λ δ λ preliminar estimates of and , 0 and 0, and so, start the recursion.

3.2. Expected and Observed Information Matrices Recall that, the Fisher information matrix is given by (θ)= I ji,j i,j=α,β,δ,λ , which entries are equal to minus the second partial derivatives of the log-likelihood function given = − ∂2 (θ) in (18) with respect to the parameters in the model. They are denoted as jαα ∂α2 l , and so on. So we have n 1 n λ n jαα = − + ∑ 3a2 + 2δ|a | + ∑ a w (2 + λa B ) , α2 α2 ti ti α2 ti i ti i i=1 i=1

 n β2 − 2 n + β 1 t 1 ti jβα = − ∑ i + ∑ (δsgn(a )+λw (−1 + λa B )) √ , α3β2 α2β3/2 ti i ti i i=1 ti 2 i=1 ti

n n n + β n 1 1 1 3ti jββ = − + ∑ + ∑ t + ∑ (δsgn(a ) − λw ) √ β2 + β α2β3 i αβ5/2 ti i 2 = ti = 4 = ti i 1 i 1 i 1 λ2 n t + β 2 + ∑ w B i√ , α2β3/2 i i 4 i=1 ti n n + β 1 1 ti jδα = − ∑ |a |, jδβ = − ∑ sgn(a ) √ , α ti α2β3/2 ti i=1 2 i=1 ti n n + β 1 1 ti jλα = ∑ a w (1 − λa B ) , jλβ = ∑ w (1 + λa B ) √ , α ti i ti i αβ3/2 i ti i i=1 2 i=1 ti n 2 = ( (δ − )+ ) = = = λλ = ∑ jδδ n wδ wδ 1 , jδα jδβ jλδ 0, j ati wiBi i=1 φ(λ ) = at = φ(δ) ( − Φ(δ)) = λ + where w Φ(λ ) , wδ / 1 and B at w. at The Fisher (expected) information matrix would be obtained by computing the expected values of the above second derivatives. Taking in this matrixδ = λ = 0, that is, T ∼ BS(α, β), and, using results in [21], we have ⎛ & ⎞ 2 − 1 2 ⎜ α2 0 α π 0 ⎟ ⎜ α (α) ⎟ ⎜ α−2β−2 + √q √1 1 ( ) ⎟ ⎜ 0 1 π 0 π αβ3/2 A1 t ⎟ I(θ)=⎜ & 2 2 ⎟ , ⎜ − 1 2 − 2 ⎟ ⎝ α π 01π 0 ⎠ √1 1 2 0 A (t) 0 π 2π αβ3/2 1 & +β π exp( 2 ) ( )=E t√i (α)=α 2 − α2 ( 2 ) (·) where A1 t t , q π 2 er f c α , with er f c the error function, i.e.,  ∞ i ( )= √2 (− 2) er f c x π x exp t dt, see [29]. It can be shown that |I(θ)| = 0, so the Fisher information matrix is not singular at δ = λ = 0.

72 Symmetry 2019, 11, 1305

Hence, for large samples, the MLE, θ,ofθ is asymptotically normal, that is, √ L θ − θ −→ ( (θ)−1) n N4 0, I , resulting that the asymptotic variance of the MLE, θˆ, is the inverse of Fisher information matrix I(θ). Since the parameters are unknown, usually the observed information matrix is considered where the unknown parameters are estimated by ML. Asymptotic confidence intervals for the parameters in the FBS model can be obtained from these results.

4. Numerical Illustrations The numerical illustrations introduced next are aimed to show that the FBS model can be an alternative to modelling unimodal or bimodal data from different areas. First illustration is related to nickel content in soil samples analyzed at the Mining Department (Departamento de Minas) of Universidad de Atacama, Chile. We start by showing that both, BS and skew-BS (SBS) models are not able to capture bimodality present in this data set. Thus, the FBS model turned out to be a good option, to fit the data even better than a mixture of two normal distributions, which is another competing alternative to fit bimodal data. Second illustration is related to air pollution in New York city in USA, which was previously analyzed in [ 16,30]. In this case, it is shown that FBS model again provides a better fit than BS and SBS. As competing model the extended Birnbaum-Saunders (EBS) is also considered. Recall that the EBS(α, β, σ, ν, λ) is a five-parameter model proposed in [ 16] where the parameter σ affects the kurtosis;ν and λ affect the skewness; andα and β the shape and scale as in the usual BS model. We highlight that, for this dataset, theFBS(α, β, δ, λ) model provides a better fit than that given by the EBS(α, β, σ, ν, λ) in [16] with the merit of using less parameters.

4.1. Nickel Concentration For illustrative purposes, we apply the FBS model to a data set related to nickel content in soil samples. This data set encompasses 85 observations of the variable concentration of nickel with sample mean = 21.588, sample standard deviation = 16.573, sample asymmetry = 2.392 and sample kurtosis = 8.325, much higher than expected with the ordinary BS distribution.

4.1.1. FBS versus the BS and SBS distributions To fit the nickel concentration variable, we use the BS, skew BS (SBS) and FBS models. Using function optim from the R-package, [28], the following point estimates (and their standard errors) are obtained for each of the three models under consideration BS model: αˆ = 0.789 (0.060) and βˆ = 16.382 (1.296). SBS model: αˆ = 1.073 (0.201), βˆ = 8.841 (1.998) and λˆ = 1.252 (0.590). FBS model: αˆ = 0.870 (0.104), βˆ = 5.072 (0.763), δˆ = −1.520 (0.282) and λˆ = 1.405 (0.341). The bimodal hypothesis can be formally tested as follows

δ = δ = H0 : 0 versus H1 : 0, which is equivalent to compare models SBS versus FBS. Given the nonsigularity of the Fisher information matrix, and since these models are nested, we can consider the likelihood ratio statistics, namely Λ = (α β λ) (α β δ λ) 1 LSBS , , /LFBS , , , . − (Λ )= It is obtained 2 log 1 5.618, which is greater than the 5% chi-square critical value with one degree of freedom (df), which is equal to 3.84. Therefore, the null hypothesis of no-bimodality is rejected at the 5% critical level, leading to the conclusion that FBS model fits better than the unimodal SBS model to the nickel concentration data.

73 Symmetry 2019, 11, 1305

To compare the FBS model with the BS model, consider to test the null hypothesis of a BS distribution versus a FBS distribution, that is

(δ λ)=( ) (δ λ) =( ) H0 : , 0, 0 vs H1 : , 0, 0

Λ = (α β) (α β δ λ) using the likelihood ratio statistics based on the ratio 2 LBS , /LFBS , , , . After − (Λ )= substituting the estimated values, we obtain 2 log 2 7.628, which is greater than the 5% chi-square critical value with 2 df, which is 5.99. Therefore the FBS is preferred to BS model for this data set.

4.1.2. FBS versus a Mixture of Normal Distributions Another model widely applied in such situations of bimodality is the mixture of two normal distributions. The normal mixture model is given by:

( μ σ μ σ )= ( μ σ )+( − ) ( μ σ ) f x; 1, 1, 2, 2, p pf1 x, 1, 1 1 p f2 x; 2, 2 (23)

(μ σ ) = < < where fj is a normal distribution with parameters j, j , j 1, 2 and 0 p 1. (23) is denoted by (μ σ μ σ ) MN 1, 1, 2, 2, p . To compare FBS model with the MN model, we propose the Akaike information criterion (AIC), see [31], namely AIC = −2ˆ(·)+2k, the modified AIC criterion (CAIC), typically called the consistent AIC, namely CAIC = −2ˆ(·)+(1 + log(n))k and the Bayesian Information Criterion, BIC, BIC = −2ˆ(·)+log(n)k, where k is the number of parameters andˆ(·) is the log-likelihood function evaluated at the MLEs of parameters. The best model is the one with the smallest AIC or CAIC or BIC. (μ σ μ σ ) Now we compare the FBS with MN 1, 1, 2, 2, p . The estimated mixture model is

MN(15.348, 6.622, 40.908, 21.960, 0.755) with AIC = 674.849, CAIC = 692.061 and BIC = 687.062. On the other hand, for the FBS model, we have AIC = 671.859, CAIC = 685.628 and BIC = 681.630. According to these criteria, the FBS model provides a better fit to the data of nickel concentration.

4.1.3. FBS versus a Mixture of Log-Normal Distributions Following reviewer’s recommendations, a mixture of two log-normal distributions is also considered. The log-normal mixture model will be given by (23) with fj the pdf of a log-normal (μ σ ) = < < (μ σ μ σ ) distribution with parameters j, j , j 1, 2 and 0 p 1, and it is denoted by MLN 1, 1, 2, 2, p . The estimated mixture model is

MLN(2.829, 0.177, 2.8275, 0.877, 0.327) with AIC = 663.571, CAIC = 680.784 and BIC = 675.784. All of them are less than those corresponding to FBS. So, according to these criteria, the mixture ot two log-normal distributions provides a better fit to this dataset than the FBS model. This discussion illustrates that, quite often, the final selection of a model is a matter of choice. FBS model can be considered as appropriate if we want to use a more parsimonious model, and this is better than other BS models and a mixture of two normal distributions. On the other hand, based on AIC, CAIC and BIC, the mixture of two log-normal would be preferred but this model has more parameters than the FBS distribution and may present problems of identifiability. Anyway, the final choice must be properly justified. Figure 4 depicts maximized likelihoods and empirical cdf for variable nickel concentration revealing that FBS model fitting is quite good.

74 Symmetry 2019, 11, 1305     &XPXODWLYHGLVWULEXWLRQ                     W  (a) (b) Figure 4. (a) Plots for FBS, (solid line), MLN (dashed line), BS (dotted line) and SBS (dotted and dashed line) models . (b) Empirical cdf with estimated FBS cdf (dashed line) and estimated BS cdf (dotted line).

Remark 3. Going through the origin of this data set, the bimodal behavior of the nickel concentration statistical model seems to be due to the fact that the samples were taken according to different lithologies. Lithology classifies according to the physical and chemical elements in rock formation. Mining operations found different lithologies in these samples, as it is depicted in Figure 4.

4.2. Air Pollution The concentration of average air pollutants has been used in epidemiological surveillance as an indicator of the level of atmospheric contamination. Among its associated adverse effects in humans, diseases such as bronchitis are found. The distribution of this concentration has a bias to the right, and is always positive. It is typically assumed that the data on air pollutant concentrations are uncorrelated and independent and thus they do not require the diurnal or cyclic trend analysis, see [ 32]. The data set studied in this section corresponds to daily measures of ozone levels (inppb = ppm × 1000) in the city of New York, USA, from May to September, 1973, collected by the New York State Conservation Department. The sample mean, standard deviation, asymmetry and kurtosis coefficients are given, respectively, by 42.129, 32.987, 1.209 and 1.112.

4.2.1. FBS versus the BS and SBS Distributions Maximum likelihood estimators, their estimated standard errors (in parenthesis), for the BS, SBS and FBS models, were computed, the results are: BS model: αˆ = 0.982 (0.064) and βˆ = 28.031 (2.265). SBS model: αˆ = 1.270 (0.235), βˆ = 14.831 (4.019) and λˆ = 1.066 (0.533). FBS: αˆ = 5.160 (0.481), βˆ = 78.000 (0.008), δˆ = 3.991 (0.050) and λˆ = −9.135 (2.417). For this data set, the log-likelihood ratio statistics to test BS vs FBS and SBS vs FBS are given by

− (Λ )= − (Λ )= 2 log 1 12.812 and 2 log 2 5.828, which are greater than the corresponding 5% critical values from the chisquare distribution, which are 5.99 (with 2 df) and 3.84 (with one df), respectively. So, we can conclude that the unimodal FBS model provides a better fit to this dataset than BS and FBS models.

75 Symmetry 2019, 11, 1305

4.2.2. FBS versus the Extended BS (EBS) Model Fitting the five-parameter EBS model, EBS(α, β, σ, ν, λ), proposed in [ 16] as the best for this dataset, and whose point estimates for the parameters and summaries for comparison are equal to

EBS(0.780, 0.596, 3.618, −3.539, −0.096),

CAIC = 1111.154 and BIC = 1106.154. On the other hand, for the FBS model, it was obtained CAIC = 1108.396 and BIC = 1104.396. Then, according to the CAIC and BIC criteria, FBS model presents the best fit to this data set dealing with the daily ozone level concentration in the atmosphere. Figure 5 depicts the and the fitted density curves for the data set studied and empirical cdf for variable daily ozone level concentration in the atmosphere, while the dashed line corresponds to the cfd for FBS model.

 GHQVLW\ &XPXODWLYHGLVWULEXWLRQ            

       

W W

(a) (b) Figure 5. (a) Plots for FBS, (solid line), BS (dotted line), SBS (dashed line) and EBS (dotted and dashed line) models. (b) Empirical cdf with estimated FBS cdf (dashed line).

5. Conclusions We have introduced a new family of distributions able to model skewness, and bimodality in the BS distribution. We have discussed several of its properties. Explicit expressions for the cdf are given in terms of the cdf of a bivariate normal variable. Non-linear equations to obtain the modes of this distribution are provided. The estimation of parameters is carried out via maximum likelihood. We highlight that the ML equations must be solved by using iterative methods. The information matrix is non-singular and therefore likelihood ratio tests to compare this model with other nested models can be implemented. The interest and flexibility of our proposal is supported with two illustrations to real data in which we show that:

(i) the FBS model provides consistently better fits than the BS and SBS models (they can be considered relevant precedents of our proposal) (ii) the FBS distribution can improve the fit provided by other competing models designed to deal with bimodality (such as a mixture of normal distributions). It can also perform better for unimodal situations in which a generalized BS model with skewness parameters must be applied, such as the EBS model proposed in [ 16]. We highlight that in both situations FBS provides a better fit with a more parsimonious model (less number of parameters), and the problem of identifiability of mixtures can be circumvented.

Therefore the outcome of these practical demonstrations show that the new family is very flexible and widely applicable.

76 Symmetry 2019, 11, 1305

Author Contributions: All the authors contributed significantly to the present paper. Funding: The research of G. Martínez-Flórez was supported by Grant Proyecto Universidad de Córdoba: Distribuciones de probabilidad asimétrica bimodal con soporte positivo (Colombia). The research of I. Barranco-Chamorro was supported by Grant CTM2015-68276-R (Spain). H.W. Gómez was supported by Grant SEMILLERO UA-2019 (Chile). Acknowledgments: The authors would like to thank the editor and the anonymous referees for their comments and suggestions, which significantly improved our manuscript. Conflicts of Interest: The authors declare no conflict of interest.

Appendix A Proof of Proposition 1 (cdf in the FBS model). From Equation (7), T is a monotonically increasing function of Z ∼ FSN(δ, λ). Therefore the cdf of T is given by

( )= ( ) FT t FZ at (A1) (·) where FZ denotes the cdf of Z and at was given in (6). (i) First, we obtain the cdf of Z ∼ FSN(δ, λ) It can be seen in Gómez et al. [2], Proposition 4, that the pdf of Z ∼ FSN(δ, λ) is φ( − δ)Φ(λ ) < ( )= cδ z z ,ifz 0 fZ z cδφ(z + δ)Φ(λz),ifz ≥ 0.

Let us consider the case for z < 0 z z ( )= ( ) = φ( − δ)Φ(λ ) FZ z fZ t dt cδ t t dt −∞ −∞

By making the change of variablev = t − δ, and later, taking into account thatΦ(·) is the cdf of a N(0, 1) distribution, we have that z−δ z−δ λ(v+δ) ( )= φ( )Φ(λ( + δ)) = φ( )φ( ) FZ z cδ v v dv cδ v s dsdv (A2) −∞ −∞ −∞ The integrand in (A2) is the joint pdf of two independent N(0, 1) rv’s, (S, V), i.e.,    S 0 10 ∼ N , V 2 0 01

Note that (A2) can be rewritten as

F (z)=cδ Pr [S − λV ≤ λδ, V ≤ z − δ] Z S − λV λδ = cδ Pr √ ≤ √ , V ≤ z − δ + λ2 + λ2 1 1 λδ = Φ √ − δ cδ BNλ , z (A3) 1 + λ2 Φ ( ) μ =( ) where BNλ x, y denotes the cdf of a bivariate normal distribution, with mean vector 0, 0 and covariance matrix Ωλ given in (10). For z > 0, we have that 0 z ( )= ( ) + ( ) FZ z fZ t dt fZ t dt −∞ 0 z = ( )+ φ( + δ)Φ(λ ) FZ 0 cδ t t dt . (A4) 0

77 Symmetry 2019, 11, 1305

From (A3), it follows that λδ ( )= ( )= Φ √ −δ FZ 0 lim FZ z cδ BNλ , (A5) z→0− 1 + λ2

On the other hand, proceeding similarly to the previous case (change of variablev = t + δ), it can be proved that z S − λV λδ φ(t + δ)Φ(λt)dt = Pr √ ≤−√ , δ < V ≤ z + δ + λ2 + λ2 0 1 1 λδ λδ = Φ − √ + δ − Φ − √ δ BNλ , z BNλ , (A6) 1 + λ2 1 + λ2

Therefore, from (A3)–(A6), we have just proved that the cdf of Z is ⎧ ⎪ Φ √ λδ − δ < ⎪ cδ λ , z , if z 0 ⎨ BN 1+λ2 F (z)= (A7) Z ⎪ ⎩ Φ √ λδ −δ + Φ − √ λδ + δ − Φ − √ λδ δ ≥ cδ λ , λ , z λ , , if z 0 BN 1+λ2 BN 1+λ2 BN 1+λ2

(ii) Finally, the expression for the cdf of T ∼ FBS(α, β, δ, λ) given in (9) follows from (A1) and (A7).

Proof of Proposition 3 (Modes in the FBS model). Recall that, from (A1), the pdf of T is given by

( )= ( ) = φ(| | + δ)Φ(λ ) fT t fZ at at cδat at at

(·) ∼ (δ λ) where fZ denotes the pdf of Z FSN , , at was given in (6) and

− − ∂ t 3/2β 1/2 a = a = (t + β) . (A8) t ∂t t 2α < < < β (·) For at 0, or equivalently 0 t , consider the first derivative with respect to t of fT and equating to zero, we have . / ∂ f (t)=cδ a φ(−a + δ)Φ(λa ) = 0 (A9) T ∂t t t t

By using that φ (z)=−zφ(z), it can be proved that (A9) is equivalent to

. / 2 [(δ − )Φ(λ )+λφ(λ )] + Φ(λ )= at at at at at at 0 (A10)

> ∀ > β > Since at 0, t 0( 0), we have that (A10) is equivalent to (12). ∂ > > β ( )= δ { φ( + δ)Φ(λ )} = Similarly, for at 0, i.e., t ,from fT t c ∂t at at at 0,(13) is obtained.

Remark A1 (Comments to Proposition 3). In order to illustrate the use of Equations (12) and (13) next cases are considered.

1. Consider the pdf given in Figure 1a, case α = 0.75, β = 1, λ = 1, δ = 0.75. In this setting there do not ∗ ∈ ( β) ∗ > β exist t1 0, and t2 satisfying (12) and (13), respectively. It can be checked than the distribution is unimodal and the mode is at β. α = β = λ = − δ = ∗ ∈ ( β) 2. Figure 1b, case 0.75, 1, 1, 0.75. There exists t1 0, satisfying (12) and there ∗ > β ∗ does not exists t2 satisfying (13). Then the distribution is unimodal and the mode is at t1. ∗ ∈ ( β) ∗ > β 3. Figure 2a,b, in all cases under consideration, there exist t1 0, and t2 satisfying (12) and (13). ∗ ∗ Then the distribution is bimodal and the modes are t1 and t2.

78 Symmetry 2019, 11, 1305

Proof of Theorem 1 (pth quantile, change of scale and reciprocity). (i) (14) follows from the fact + that (7) is one-to-one function preserving the order from R to R . (ii) Note that the pdf of T can be rewritten as

( )= (α β)φ(| (α β)| + δ)Φ(λ (α β)) fT t cδat , at , at , (A11)

−1 with cδ =(1 − Φ(δ)) , at = at(α, β) given in (6) and

− − ∂ t 3/2β 1/2 a = a (α, β)= a (α, β)= (t + β) . (A12) t t ∂t t 2α

= > ( )=| | ( y α β δ λ) Let Y kT with k 0. By applying the Jacobian technique fY y J fT k ; , , , with | | = 1 (α β)= (α β) J k .From(6), ay/k , ay , k , and from (A12)

− − y 3/2(kβ) 1/2 |J|a (α, β)= (y + kβ)=a (α, kβ) . y/k 2α y Therefore ( )= (α β)φ(| (α β)| + δ)Φ(λ (α β)) fY y cδay , k ay , k ay , k , i.e., Y ∼ FBS(α, kβ, δ, λ). − − − − = 1 | | = 2 − (α β)=− (α β 1) | | (α β)= (α β 1) (iii) Let be Y T . In this case J Y , ay 1 , ay , , and J ay−1 , ay , . Therefore ( )=| | ( −1 α β δ λ)= (α β−1)φ | (α β−1)| + δ Φ −λ (α β−1) fY y J fT y ; , , , cδay , ay , ay , ,

− − i.e., Y = T 1 ∼ FBS(α, β 1, δ, −λ).

References

1. Birnbaum, Z.W. Effect of linear truncation on a multinormal population. Ann. Math. Statist.1950, 21, 272–279. [CrossRef] 2. Gómez, H.W.; Elal-Olivero, D.; Salinas, H.S.; Bolfarine, H. Bimodal Extension Based on the Skew-normal Distribution with Application to Pollen Data. Environmetrics 2009, 22, 50–62. [CrossRef] 3. Lehmann, E.L. The power of rank tests. Ann. Mathematical Stat. 1953, 24, 23–43. [CrossRef] 4. Roberts, C. A Correlation Model Useful in the Study of Twins. J. Am. Stat. Assoc. 1966, 61, 1184–1190. [CrossRef] 5. O’Hagan, A.; Leonard, T. Bayes estimation subject to uncertainty about parameter constraints. Biometrika 1976, 63, 201–203. [CrossRef] 6. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178. 7. Arnold, B.C.; Gómez, H.W.; Salinas, H.S. On multiple contraint skewed models’. Statistics2009, 43, 279–293. [CrossRef] 8. Bolfarine, H.; Martínez–Flórez, G.; Salinas, H.S. Bimodal symmetric-asymmetric families. Commun. Stat. Theory Methods 2018, 47(2), 259–276. [CrossRef] 9. Durrans, S.R. Distributions of fractional order statistics in hydrology. Water Resour. Res.1992, 28, 1649–1655. [CrossRef] 10. Pewsey, A.; Gómez, H.W.; Bolfarine, H. Likelihood-based inference for power distributions. Test 2012, 21, 775–789. [CrossRef] 11. Chan, J.; Choy, S.; Makov, U.; Landsman, Z. Modelling insurance losses using contaminated generalised beta type-II distribution. ASTIN Bull. 2018, 48, 871–904. 12. Birnbaum, Z.W.; Saunders, S.C. Estimation for a family of life distributions with applications to fatigue. J. Appl. Probab. 1969, 6, 328–347. [CrossRef]

79 Symmetry 2019, 11, 1305

13. Birnbaum, Z.W.; Saunders, S.C. A New Family of Life Distributions. J. Appl. Probab. 1969, 6, 319–327. [CrossRef] 14. Díaz-García, J.A.; Leiva-Sánchez, V. A new family of life distributions based on the elliptically contoured distributions. J. Statist. Plann. Inference 2005, 128, 445–457. [CrossRef] 15. Vilca-Labra, F.; Leiva-Sanchez, V. A new fatigue life model based on the family of skew-elliptical distributions. Commun. Stat. Theory Methods 2006, 35, 229–244. [CrossRef] 16. Leiva, V.; Vilca-Labra, F.; Balakrishnan, N.; Sanhueza, A. A skewed sinh-normal distribution and its properties and application to air pollution. Commun. Stat. Theory Methods 2010, 39, 426–443. [CrossRef] 17. Castillo, N.O.; Gómez, H.W.; Bolfarine, H. Epsilon Birnbaum-Saunders distribution family: Properties and inference. Stat. Pap. 2011, 52, 871–883. [CrossRef] 18. Gómez, H.W.; Olivares, J.; Bolfarine, H. An extension of the generalized Birnbaum- Saunders distribution. Stat. Probab. Lett. 2009, 79, 331–338. [CrossRef] 19. Reyes, J.; Barranco-Chamorro, I.; Gallardo, D.I.; Gómez, H.W. Generalized Modified Slash Birnbaum-Saunders Distribution. Symmetry 2018, 10, 724. [CrossRef] 20. Reyes, J.; Barranco-Chamorro, I.; Gómez, H.W. Generalized modified with applications. Commun. Stat. Theory Methods 2019, doi:10.1080/03610926.2019.1568484. [CrossRef] 21. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. An alpha-power extension for the Birnbaum-Saunders distribution. Statistics 2014, 48, 896–912. [CrossRef] 22. Olmos, N.M.; Martínez-Flórez, G.; Bolfarine, H. Bimodal Birnbaum-Saunders Distribution with Application to Corrosion Data. Commun. Stat. Theory Methods 2017, 46, 6240-6257. [CrossRef] 23. Shakhatreh, M.; Lemonte, A.; Moreno-Arenas, G. The log-normal modified Weibull distribution and its reliability implications. Reliab. Eng. Syst. Saf. 2019, 188 , 6–22. [CrossRef] 24. Prataviera, F.; Ortega, E.M.M.; Cordeiro, G.M.; Pescim, R.R. A new generalized odd log-logistic flexible Weibull regression model with applications in repairable systems. Reliab. Eng. Syst. Saf. 2018, 176, 13–26. [CrossRef] 25. Ng, H.K.T.; Kundu, D.; Balakrishnan, N. Modified moment estimation for the two-parameter Birnbaum-Saunders distribution. Comput. Statist. Data Anal. 2003, 43, 283–298. [CrossRef] 26. Barros, M.; Paula, G.A.; Leiva, V. A new class of survival regression models with heavy-tailed errors: Robustness and diagnostics. Lifetime Data Anal. 2008, 14, 316–332. [CrossRef] 27. Lemonte, A.; Cribari-Neto, F.; Vasconcellos, K. Improved statistical inference for the two-parameter Birnbaum-saunders distribution. Comput. Stat. Data Anal. 2007, 51, 4656–4681. [CrossRef] 28. R Development Core Team. A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2017. 29. Gradshteyn, I.S.; Ryzhik, I.M. Table of Integrals, Series, and Products; Academic Press: New York, NY, USA, 2007. 30. Nadarajah, S. A truncated inverted with application to air pollution data. Stoch. Environ. Res. Risk Assess. 2008, 22, 285–289. [CrossRef] 31. Akaike, H. A new look at statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723. [CrossRef] 32. Gokhale, S.; Khare, M. Statistical behavior of carbon monoxide from vehicular exhausts in urban environments. Environ. Model. Softw. 2007, 22, 526–535. [CrossRef]

c 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

80

S symmetry S

Article An Asymmetric Distribution with Heavy Tails and Its Expectation–Maximization (EM) Algorithm Implementation

Neveka M. Olmos 1, Osvaldo Venegas 2,*, Yolanda M. Gómez 3 and Yuri A. Iriarte 1 1 Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile; [email protected] (N.M.O.); [email protected] (Y.A.I.) 2 Departamento de Ciencias Matemáticas y Físicas, Facultad de Ingeniería, Universidad Católica de Temuco, Temuco 4780000, Chile 3 Departamento de Matemática, Facultad de Ingeniería, Universidad de Atacama, Copiapó 1530000, Chile; [email protected] * Correspondence: [email protected]; Tel.: +56-45-2205411

Received: 8 August 2019 ; Accepted: 2 September 2019 ; Published: 10 September 2019

Abstract: In this paper we introduce a new distribution constructed on the basis of the quotient of two independent random variables whose distributions are the half-normal distribution and a power of the exponential distribution with parameter 2 respectively. The result is a distribution with greater kurtosis than the well known half-normal and slashed half-normal distributions. We studied the general density function of this distribution, with some of its properties, moments, and its coefficients of asymmetry and kurtosis. We developed the expectation–maximization algorithm and present a simulation study. We calculated the moment and maximum likelihood estimators and present three illustrations in real data sets to show the flexibility of the new model.

Keywords: slashed half-normal distribution; kurtosis; likelihood; EM algorithm

1. Introduction In recent years, for data with positive support, specifically, lifetime, or reliability, the half-normal (HN) model has been widely used. The probability density function (pdf) is given by ( σ)= 2 φ x { > } f x; σ σ I x 0 , where σ > 0 is the scale parameter and φ(·) represents the standard normal pdf. We denote this by writing X ∼ HN(σ). Some generalizations for this model are proposed by Cooray and Ananda [1], Cordeiro et al. [2], Bolfarine and Gómez [3] and Gómez and Vidal [4]. Olmos et al. [5] extended the HN distribution by incorporating a kurtosis parameter q, with the purpose of obtaining heavier tails, i.e., it has greater kurtosis than the base model. They called this model the slashed half-normal (SHN) distribution. Its construction is based on considering the quotient of two independent random variables, with random variable X ∼ HN(σ) in the numerator and the U ∼ U(0, 1) in the denominator (See Rogers and Tukey [ 6] and Mosteller and Tukey [ 7] for more details). Thus a model is obtained that has more flexible coefficients of asymmetry and kurtosis than the HN model. We say that a random variable T follows a SHN if its pdf is given by 2q −( + ) 1 f (t; σ, q)=q σqΓ((q + 1)/2)t q 1 G t2; (q + 1)/2, , t > 0, (1) T π 2σ2

Symmetry 2019, 11, 1150; doi:10.3390/sym1109115081 www.mdpi.com/journal/symmetry Symmetry 2019, 11, 1150

 σ > > ( )= 2 ( ) where 0 is a scale parameter, q 0 is a kurtosis parameter, G z; a, b 0 g x; a, b dx is the cumulative distribution function (cdf) of the andg(·; a, b) is the pdf of the gamma model with shape and rate parameters a and b, respectively. Reyes et al. [8] introduced the modified slash (MS) distribution. We say that M has a MS distribution if 1 M = Z/E q , (2) the construction of which is based on considering an exponential (Exp) distribution with parameter 2 in the denominator, i.e., they consider that E ∼ Exp(2). The motivation of the selection of the Exp(2) distribution is given in Reyes et al. [8]. The result of this work shows that the MS model has a greater coefficient of kurtosis and this characteristic is very important for modeling data sets when they contain atypical observations. The principal goal of this article is to use the idea published by Reyes et al. [ 8] to construct an extension of the half-normal model with a greater range in the coefficient of kurtosis than the SHN model, in order to use it to model atypical data. This will allow us obtain a new model generated on the basis of a scale mixture between an HN and a Weibull (Wei) distribution. The rest of the paper is organized as follows: Section 2 contains the representation of this model and we generate the density of the new family, its basic properties and moments, and its coefficients of asymmetry and kurtosis. In Section 3 we make inferences using the moments and maximum likelihood (ML) methods. In Section 4 we implement the expectation–maximization (EM) algorithm. In Section 5 we carry out a simulation study for parameter recovery. We show three illustrations in real datasets in Section 6 and finally in Section 7 we present our conclusions.

2. An Asymmetric Distribution In this section we introduce the representation, its pdf, and some important properties and graphs to show the flexibility of the new model.

2.1. New Distribution The representation of this new distribution is

X T = , (3) Y1/q where X ∼ HN(σ) and Y ∼ Exp(2) are independent, σ > 0, q > 0. We call the distribution of T the modified slashed half-normal (MSHN) distribution. This is denoted by T ∼ MSHN(σ, q).

2.2. Density Function The following Proposition shows the pdf of the MSHN distribution with scale parameter σ and kurtosis parameter q, generated using the representation given in (3).

Proposition 1. Let T ∼ MSHN(σ, q). Then, the pdf of T is given by + ( σ )=√ 2q q 1 2 q 1 fT t; , q + N , , , , (4) 2πσ2tq 1 2 tq 2 2σ2 where t > 0, σ > 0,q> 0, and N(·, ·, ·, ·) is defined in Lemma 1 in the Appendix A.

Proof. Using the stochastic representation given in (3) and the Jacobian method, we obtain that the density function associated with T is given by  ∞ 2 2 ( σ )= √ 4q q − t w + q fT t; , q πσ2 w exp 2w dw. 2 0 2σ2

82 Symmetry 2019, 11, 1150

Making the change of variable u = t2w2 we have,  ( ∞ − q/2 2q q 1 u 2u ( σ )=√ 2 − + fT t; , q πσ2 q+1 u exp q du. 2 t 0 2σ2 t

Hence, applying the Lemma 1 as set forth in the Appendix A, we obtain the result.

Figure 1 depicts plots of the density of the MSHN distribution for different values of parameterq.

0.10 q = 1 q = 3 0.08 q = 5 on i 0.06 unct f ty i 0.04 ens D

0.02

0.00

0 1020304050 t MSHN(σ, q)

Figure 1. The density function for different values of parameterq and σ = 1 in the MSHN distribution.

We perform a brief comparison illustrating that the tails of the MSHN distribution are heavier than those of the SHN distribution. Table 1 shows the tail probability for different values in the SHN and MSHN models. It is immediately apparent that the MSHN tails are heavier than those of the SHN distribution.

Table 1. Tails comparison for different slashed half-normal (SHN) and modified slashed half-normal (MSHN) distributions.

Distribution P(T > 3) P(T > 4) P(T > 5) P(T > 6) P(T > 7) SHN(1, 0.5) 0.3781 0.3497 0.3239 0.3009 0.2805 MSHN(1, 0.5) 0.5304 0.48289 0.4466 0.4176 0.3936 SHN(1, 1) 0.1777 0.1570 0.1385 0.1224 0.1086 MSHN(1, 1) 0.3678 0.2992 0.2519 0.2173 0.19102 SHN(1, 3) 0.0350 0.0205 0.0120 0.0044 0.0034 MSHN(1, 3) 0.0901 0.0438 0.0238 0.0142 0.0091

2.3. Properties In this sub-section we study some properties of the MSHN distribution.

Proposition 2. Let T ∼ MSHN(σ, q), then when σ = q = 1 the density is ( )= 4 √1 − 2 ( 2)Φ − 2 > fT t exp 2/t , t 0, (5) t2 2π t t

83 Symmetry 2019, 11, 1150 where Φ(·) is the cdf of the standard normal.

Proof. Using Proposition 1 for σ = q = 1, we have, ∞ ( )=√ 2 2 1 1 = √ 2 − 2 1/2 − 1 > fT t N 1, , , exp x x dx, t 0. (6) 2πt2 z 2 2 2πt2 0 z 2σ2

Changing the variable x = u2 we obtain the result. | = ∼ σ 1/q = ∼ ( ) ∼ (σ ) Proposition 3. If T W w HN w and Y W Wei q,1/2 then T MSHN , q .

Proof. Since the marginal pdf of T is given by

∞ ∞ 4q − w2t2 − q ( σ )= ( | ) ( ) = √ q σ2 2w fT t; , q fT|W t w fW w dw w e 2 dw, 0 σ 2π 0 and using the Lemma 1 in the Appendix A, the result is obtained.

Proposition 4. Let T ∼ MSHN(σ, q).Ifq → ∞ then T converges in law to a random variableT ∼ HN(σ).

Proof. Let T ∼ MSHN(σ, q) and T = X , where X ∼ HN(σ) and Y ∼ Exp(2). Y1/q We study the convergence in law of T, since Y ∼ Exp(2) then W = Y1/q ∼ Wei(q,1/2), we have thatE (W − 1)2 = 1 Γ(1 + 2/q) − 2 Γ(1 + 1/q)+1. If q → ∞ then E (W − 1)2 → 0, i.e., we have 22/q 21/q P W −→ 1 (see Lehmann [9]). ∼ (σ ) = X Since T MSHN , q , by applying Slutsky’s Lemma (see Lehmann [9]) to T W , we have

L T −→ X ∼ HN(σ), q → ∞, (7) that is, for increasing values of q, T converges in law to a HN(σ) distribution.

Remark 1. Proposition 2 shows us that the MSHN(1, 1) distribution has a closed-form expression. Proposition 3 shows that an MSHN(σ, q) distribution can also be obtained as a scale mixture of an HN and a Wei distribution. This property is very important since it makes it possible to generate random numbers and implement the EM algorithm. Proposition 4 implies that, if q → ∞ then the cdf of an MSHN(σ, q) model approaches to the cdf of a HN(σ) distribution.

2.4. Moments In this sub-section, the following proposition shows the computation of the moments of a random variable T ∼ MSHN(σ, q). Hence, it also displays the coefficients of asymmetry and kurtosis.

Proposition 5. Let T ∼ MSHN(σ, q). Then the r-th moment of T is given by r 1 + 1 2 q 2 r + 1 q − r μ = E(Tr)= √ σrΓ Γ , q > r, (8) r π 2 q where Γ(·) denotes the .

Proof. Let W ∼ Wei(q,1/2) and using Proposition 3, we have  2r r + 1 − 2r r + 1 − μ = E(Tr)=E (E(Xr|Wr)) = E Γ σrW r = Γ σrE W r , r π 2 π 2

84 Symmetry 2019, 11, 1150

− where E(W r)=2r/qΓ ((q − r)/q), q > r is the r-th moment of the inverse Weibull distribution .

Corollary 1. Let T ∼ MSHN(σ, q). Then the expectation and variance of T are given respectively by

1 + 1 2 q 2 q − 1 E(T)= √ σΓ , q > 1, and π q 2 +1 1 q − 2 1 q − 1 Var(T)=2 q σ2 Γ − Γ2 , q > 2. 2 q π q

∼ (σ ) β β Corollary 2. Let T MSHN , q . Then the coefficients of asymmetry ( 1) and kurtosis ( 2) are given by − − − − √1 Γ q 3 − √3 Γ q 1 Γ q 2 + √2 Γ3 q 1 π q 2 π q q π3 q β = , q > 3, and 1 − − 3/2 1 Γ q 2 − 1 Γ2 q 1 2 q π q − − − − − − 3 Γ q 4 − 4 Γ q 1 Γ q 3 + 3 Γ2 q 1 Γ q 2 − 3 Γ4 q 1 4 q π q q π q q π2 q β = , q > 4. 2 − − 2 1 Γ q 2 − 1 Γ2 q 1 2 q π q

Remark 2. Figure 2 shows graphs of the coefficients of the MSHN distribution compared with those of the SHN distribution. Note that the MSHN distribution presents higher asymmetry and kurtosis values than the SHN distribution.√ Furthermore, in both distributions when q → ∞ the coefficients of asymmetry and kurtosis − − converge to 2(4 − π)(π − 2) 3/2 and (3π2 − 4π − 12)(π − 2) 2, respectively; they coincide with the coefficients of the HN distribution.

MSHN MSHN SHN SHN HN HN s i urtos K symmetry A 4 6 8 10 12 14 12345

4 6 8 10 12 14 468101214 q q

Figure 2. Graph of the coefficients of asymmetry and kurtosis for the MSHN and SHN distributions.

3. Inference ∼ (σ ) Proposition 6. Let T1, ..., Tn be a random sample of size n of the T MSHN , q distribution. Then for q > 2, the moment estimators of σ and q are given by √ π σ = T M 1 + 1 − , (9) qM 1 2 q 2 Γ qM − − π 2Γ qM 2 − 2Γ2 qM 1 = T 2T 0, (10) qM qM

85 Symmetry 2019, 11, 1150 where T is the mean of the sample and T2 is the mean of the sample for the square of the observations.

Proof. From Proposition 5, and considering the first two equations in the moments method, we have

1 + 1 2 q 2 q − 1 2 q − 2 T = √ σΓ and T2 = 2 q σ2Γ . π q q σ σ σ Solving the first equation above for we obtain M given in (9). Substituting M in the second equation above, we obtain the result given in (10).

4. Em Algorithm The EM algorithm (Dempster et al. [10]) is a useful method for ML estimation in the presence of latent variables. To facilitate the estimation process, we introduce latent variablesW1, ..., Wn through the following hierarchical representation of the MSHN model: σ T | W = w ∼ HN and W ∼ Wei(q,1/2). i i i w i In this setting, we have that  w2t2 f (w|t) ∝ wq exp − + 2wq . c 2σ2

Therefore, the complete log-likelihood function can be expressed as

n w2t2 (θ| ) ∝ − (σ) − ∑ i i + ( | ) lc tc n log σ2 lc q wc , i=1 2 n n ( | )= ( )+ ( ) − q where lc q wc n log q q ∑ log wi 2 ∑ wi . i=1 i=1 = E[ | θ = θ] Letting wi Wi ti, , it follows that the conditional expectation of the complete log-likelihood function has the form

n w2t2 (θ|θ) ∝ − (σ) − ∑ i i + ( |θ) Q n log σ2 Q q , (11) i=1 2

n n ( |θ)= ( )+ − = E[ ( )| ] = E[ q| ] where Q q n log q qS1n 2S2n,q, with S1n ∑ log Wi ti and S2n,q ∑ Wi ti . i=1 i=1 As both quantities S1n and S2n,q have no explicit forms in the context of the MSHN model, they have to be computed numerically. Thus to compute Q(q|θ) we use an approach similar to that of Lee and Xu ([11], Section 3.1), i.e., considering{wr; r = 1, ..., R} to be a random sample from the conditional distribution W|(T = t, θ = θ), then Q(q|θ) can be approximated as

R 1 Q(q|θ) ≈ ∑ c(q|wr). R r=1 Therefore, the EM algorithm for the MSHN model is given by ( ) θ = θ k =(σ(k) (k)) (k) = E-step: Given , q , calculate wi , for i 1,...,n. ( ) CM-step I: Update σ k

(k) ( + ) S σ2 k 1 = u , 2

86 Symmetry 2019, 11, 1150

( ) (k+1) (k) (k+1) (k+1) k CM-step II: Fix α = σ , update q by optimizing q = arg max qQ(σ , q| θ ), ( ) n ( ) k = 1 k where Su n ∑ wi ti. i=1 (k+1) The E, CM-I and CM-II steps are repeated until a convergence rule is satisfied, say |l(θ ) − (k) l(θ )| is sufficiently small. Finally, standard errors (SE) can be estimated using the inverse of the observed information matrix.

Remark 3. i. For q → ∞, σ in M-step reduces to those obtained when the HN distribution is used; ii. An alternative to the CM-Steps II is obtained considering the idea in Lin et al. ([12], Section 3), by using the following estimation: ( ) CML-step: Update q k by maximizing the constrained actual log-likelihood function, i.e.

(k+1) (k+1) q = arg max q (σ , q).

5. Simulation We present a simulation study to assess the performance of the EM algorithm for the parameters σ and q in the MSHN model. We consider 1000 samples of three sample sizes generated from the MSHN model: n = 30, 50 and 100. To generate T ∼ MSHN(σ; q) the following algorithm was used: 1. Simulate X ∼ N(0, σ2) and Y ∼ Exp(2). | | 2. Compute T = X . Y1/q For each sample generated, the ML estimates were computed using the EM algorithm. Table 2 shows the mean of the bias estimated for each parameter (bias), its SE and the estimated root of the mean squared error (RMSE). From Table 2, we conclude that the ML estimates are quite stable. The bias is reasonable and diminishes as the sample size is increased. As expected, the terms SE and RMSE are closer when the sample size is increased, suggesting that the SE of the estimators is well estimated.

Table 2. Maximum likelihood (ML) estimations for parameters σ and q of the MSHN distribution. Standard error (SE), root of the mean squared error (RMSE).

True Value n = 30 n = 50 n = 100 σ q Estimator Bias SE RMSE Bias SE RMSE Bias SE RMSE σ 0.178 0.430 0.528 0.122 0.320 0.378 0.089 0.206 0.279 1 q 0.199 0.422 0.668 0.097 0.219 0.263 0.059 0.138 0.163 σ 0.111 0.355 0.407 0.078 0.258 0.295 0.042 0.172 0.186 1 2 q 1.006 2.500 2.603 0.480 1.105 1.519 0.182 0.458 0.562 σ 0.026 0.277 0.239 0.033 0.222 0.189 0.023 0.159 0.149 5 q 2.227 8.833 3.871 2.012 6.743 3.550 1.333 4.092 2.905 σ 0.284 0.835 0.973 0.192 0.617 0.665 0.104 0.414 0.481 1 q 0.168 0.356 0.571 0.094 0.215 0.263 0.058 0.141 0.166 σ 0.294 0.727 0.815 0.122 0.507 0.572 0.074 0.343 0.383 2 2 q 1.210 2.821 2.835 0.465 1.067 1.534 0.174 0.454 0.623 σ 0.057 0.544 0.454 0.066 0.441 0.371 0.044 0.305 0.290 5 q 2.456 8.991 3.934 2.089 6.712 3.615 1.545 4.150 3.075 σ 0.834 2.111 2.548 0.494 1.527 1.826 0.386 1.038 1.233 1 q 0.217 0.414 0.740 0.119 0.225 0.287 0.083 0.144 0.174 σ 0.658 1.782 2.065 0.293 1.285 1.475 0.209 0.872 0.966 5 2 q 1.218 2.872 2.836 0.413 1.018 1.414 0.188 0.489 0.694 σ 0.094 1.379 1.160 0.146 1.096 0.950 0.123 0.779 0.731 5 q 2.266 8.894 3.880 1.952 6.526 3.557 1.370 4.118 2.948

87 Symmetry 2019, 11, 1150

6. Aplications In this section we provide three applications to real datasets that illustrate the flexibility of the proposed model.

6.1. Application 1 Lyu [13] presents a data set related 104 times with programming in the Centre for Software Reliability (CSR). Some descriptive statistics are: mean = 147.8, variance = 60, 071.7, skewness = 3, σ = = and kurtosis = 14.6. The moment estimators for the MSHN model were M 74.085 and qM 2.402, which were used as initial values to compute the ML estimator in Table 3. For each distribution we report the estimated log-likelihood. To compare the competing models, we consider the Akaike information criterion (AIC) (Akaike [ 14]) and the Bayesian information criterion (BIC) (Schwarz [15]), which are defined asAIC = 2k − 2 log lik and BIC = k log(n) − 2 log lik, respectively, where k is the number of parameters in the model, n is the sample size and log lik is the maximum value for the log-likelihood function. Table 4 displays the AIC and BIC for each model fitted. Figure 3 presents the histogram of the data fitted with the HN, SHN and MSHN distributions, provided with the ML estimations. The QQ-plot for the MSHN and SHN distributions are presented in Figure 4.

Table 3. ML estimations with the corresponding SE for the models fitted. Half-normal (HN).

Parameters HN (SE) SHN (SE) MSHN (SE) σ 285.191 (19.774) 20.977 (5.674) 19.874 (4.867) q - 0.687 (0.118) 0.872 (0.115) Log-likelihood −663.411 −605.102 −600.876

Table 4. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) for each model fitted.

Criterion HN SHN MSHN AIC 1328.822 1214.204 1205.752 BIC 1331.466 1219.493 1211.041

MSHN SHN HN ty i ens D 0.000 0.001 0.002 0.003 0.004 0.005 0.006 0.007

0 500 1000 1500 Time failures

Figure 3. Histogram fitted with the HN, SHN and MSHN distributions provided with the ML estimations.

88 Symmetry 2019, 11, 1150 Sample Quantiles Sample Quantiles 0 500 1000 1500 0 500 1000 1500

0 500 1500 0 1000 2000

(a) Theorical Quantiles (b) Theorical Quantiles

Figure 4. QQ plots: (a) MSHN distribution and (b) SHN distribution.

6.2. Application 2 The second dataset is taken from Von Alven [ 16], and represents 46 instances of active repairs (in hours) for an airborne communication transceiver. Some descriptive statistics are: mean = 3.607, variance = 24.445, skewness = 2.888, and kurtosis = 11.802. Initially we computed the moment estimators for the MSHN distribution, obtaining the following σ = = estimations: M 2.407 and qM 2.635. We used these estimations as initial values in computing the ML estimators presented in Table 5. For each distribution we report the estimated log-likelihood.

Table 5. ML estimations with the corresponding SE for the models fitted.

Parameters HN (SE) SHN (SE) MSHN (SE) σ 6.07 (0.6335) 1.6251 (0.4777) 1.5108 (0.3179) q - 1.3539 (0.4347) 1.6365 (0.3425) Log-likelihood −116.3881 −103.1834 −102.65

Table 6 displays the AIC and BIC for each model fitted. Figure 5 presents the histogram of the data fitted with the HN, SHN and MSHN distributions, provided with the ML estimations.

Table 6. AIC and BIC for each model fitted.

Criterion HN SHN MSHN AIC 234.7762 210.3668 209.302 BIC 236.6048 214.0241 212.9573

89 Symmetry 2019, 11, 1150

MSHN SHN HN ty i ens D 0.00 0.05 0.10 0.15 0.20 0.25

0 5 10 15 20 25 Repair times (in hours)

Figure 5. Histogram fitted with the HN, SHN and MSHN distributions provided with the ML estimations.

6.3. Application 3 The third data set (Linhart and Zucchini [ 17]) represents 31 times of air conditioning system failure of an aeroplane. Some descriptive statistics are: mean = 55.35, variance = 5132.503, skewness = 1.805, and kurtosis = 5.293. Initially we computed the moment estimators for the MSHN distribution, and obtained the σ = = following estimations: M 38.125 and qM 2.743. We used these estimations as initial values in computing the ML estimators presented in Table 7. For each distribution we report the estimated log-likelihood.

Table 7. ML estimations with the corresponding SE for the models fitted.

Parameters HN (SE) SHN (SE) MSHN (SE) σ 89.616 (11.381) 13.785 (6.047) 16.148(5.128) q - 0.859 (0.285) 1.233 (0.251) Log-likelihood -161.861 -154.857 -153.954

Table 8 displays the AIC and BIC for each model fitted. Figure 6 presents the histogram of the data fitted with the HN, SHN and MSHN distributions, provided with the ML estimations.

Table 8. AIC and BIC for each model fitted.

Criterion HN SHN MSHN AIC 325.7224 313.715 311.908 BIC 327.1564 316.583 314.776

90 Symmetry 2019, 11, 1150

06+1 06+1 6+1 6+1 +1 +1 W\ W\ L L HQV HQV ' '            

          7LPHVRIDLUFRQGLWLRQLQJIDLOXUH 7LPHVRIDLUFRQGLWLRQLQJIDLOXUH

Figure 6. Histogram fitted with the HN, SHN and MSHN distributions provided with the ML estimations.

7. Conclusions In this paper, we have introduced a new and more flexible model, as it increases kurtosis and contains, as a particular case, the HN distribution. The EM algorithm is implemented, obtaining acceptable results for the maximum likelihood estimators. In applications using real data it performs very well, better than competing models. Some further characteristics of the MSHN distribution are:

• The MSHN distribution has a greater kurtosis than the SHN distribution, as is clearly reflected in Table 1. • The proposed model has a closed-form expression and presents more flexible asymmetry and kurtosis coefficients than that of the HN model. • Two stochastic representations for the MSHN model are presented. One is defined as the quotient between two independent random variables: An HN in the numerator and Exp(2) in the denominator. The other shows that the MSHN distribution is a scale mixture of an HN and a Wei distribution. • Using the mixed scale representation, the EM algorithm was implemented to calculate the ML estimators. • Results from a simulation study indicate that with a reasonable sample size, an acceptable bias is obtained. • Three illustrations using real data show that the MSHN model achieves a better fit in terms of the AIC and BIC criteria.

Author Contributions: N.M.O., O.V., Y.M.G. and Y.A.I. contributed significantly to this research article. Funding: The research of Neveka M. Olmos and Yuri A. Iriarte was supported by SEMILLERO UA-2019. The research of Yolanda M. Gómez was supported by proyecto DIUDA programa de inserción No. 22367 of the Universidad de Atacama. Acknowledgments: The authors would like to thank the editor and the anonymous referees for their comments and suggestions, which significantly improved our manuscript. Conflicts of Interest: The authors declare no conflict of interest.

Appendix A Density function of the gamma, exponential and Weibull distributions, respectively, are given by

91 Symmetry 2019, 11, 1150

Gamma distribution: βα ( α β)= α−1 −βx f x; , Γ(α) x e ,

with, x > 0, α > 0 and β > 0. Exponential distribution: ( β)= 1 −x/β f x; β e ,

with, x > 0 and β > 0. Weibull distribution: γ γ ( γ β)= γ−1 −x /β f x; , β x e ,

with, x > 0, γ > 0 and β > 0.

In the following, Lemma presents an important result used in the derivation of the pdf for the MSHN distribution.

Lemma A1. Prudnikov et al. [18], Equation (2.3.1.13) For γ > 0 ,a> 0,r> 0 and s > 0. Then

∞ γ− x 1 exp (−axr − sx)dx = N(γ, a, r, s), (A1) 0 where ⎧ ⎪ q−1 ⎪ (−a)j ⎪ ∑ Γ(γ + ) + ( Δ( γ + ) Δ( + ) (− )q ) < < ⎪ γ+rj rj p 1Fq 1, p, rj ; q,1 j ; 1 z , if 0 r 1 ⎨⎪ j=0 j!s − N = p 1 (− )h γ + γ + (− )p ⎪ s Γ( h ) ( Δ( h ) Δ( + ) 1 ) > ⎪ ∑ q+1Fp 1, q, ; p,1 h ; , if r 1 ⎪ rh!a(γ+h)/r r r z ⎪ h=0 ⎩ Γ(γ) = (a+s)γ , if r 1,

p a considering γ = p/q, p ≥ 1 and q ≥ 1 are coprime integers, where z =( )p( )q, Δ(k, a)= s q a (a+1) (a+k−1) ( ) k , k ,..., k and pFq ., ., . is the generalized hypergeometric function defined by ∞ ( ) ( ) ( ) k ( )= a1 k a2 k ... ap kx pFq a1,...,ap; b1,...,bq; x ∑ ( ) ( ) ( ) k=0 b1 k b2 k ... bp kk! ( ) = ( + ) ( + − ) where c k c c 1 ... c k 1 .

References

1. Cooray, K.; Ananda, M.M.A. A Generalization of the Half-Normal Distribution with Applications to Lifetime Data. Commun. Stat. Theory Methods 2008, 37, 1323–1337. [CrossRef] 2. Cordeiro, G.M.; Pescim, R.R.; Ortega, E.M.M. The Kumaraswamy Generalized Half-Normal Distribution for Skewed Positive Data. J. Data Sci. 2012, 10, 195–224. 3. Gómez, Y.M.; Bolfarine, H. Likelihood-based inference for power half-normal distribution. J. Stat. Theory Appl. 2015, 14, 383–398. [CrossRef] 4. Gómez, Y.M.; Vidal, I.. A generalization of the half-normal distribution. Appl. Math. J. Chin. Univ. 2016, 31, 409–424. [CrossRef] 5. Olmos, N.M.; Varela, H.; Gómez, H.W.; Bolfarine, H. An extension of the half-normal distribution. Stat. Pap. 2012, 53, 875–886. [CrossRef] 6. Rogers, W.H.; Tukey, J.W. Understanding some long-tailed symmetrical distributions. Stat. Neerl. 1972, 26, 211–226. [CrossRef] 7. Mosteller, F.; Tukey, J.W. Data Analysis And Regression; Addison-Wesley: Reading, MA, USA, 1977. 8. Reyes, J.; Gómez, H.W.; Bolfarine, H. Modified slash distribution. Statistics 2013, 47, 929–941. [CrossRef] 9. Lehmann, E.L. Elements of Large-Sample Theory; Springer: New York, NY, USA, 1999.

92 Symmetry 2019, 11, 1150

10. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. Ser. B 1977, 39, 1–38. [CrossRef] 11. Lee, S.Y.; Xu, L. Influence analyses of nonlinear mixed-effects models. Comput. Stat. Data Anal. 2004, 45, 321–341. [CrossRef] 12. Lin, T.I.; Lee, J.C.; Yen, S.Y. Finite mixture modeling using the skew-normal distribution. Stat. Sin. 2007, 17, 909–927. 13. Lyu, M. Handbook of Software Reliability Engineering; IEEE Computer Society Press: Washington, DC, USA, 1996. 14. Akaike, H. A new look at the statistical model identification. IEEE Trans. Auto. Contr. 1974, 19, 716–723. [CrossRef] 15. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464. [CrossRef] 16. Von Alven, W.H. Reliability Engineering by ARINC; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1964. 17. Linhart, H.; Zucchini, W. Model Selection; Wiley Series in Probability and Statistics; Wiley: Hoboken, NJ, USA, 1986. 18. Prudnikov, A.P.; Brychkov, Y.A.; Marichev, O.I. Integrals and Series; Gordon & Breach Science Publishers: Amsterdam, The Netherlands, 1986.

c 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

93

S symmetry S

Article An Asymmetric Bimodal Distribution with Application to Quantile Regression

Yolanda M. Gómez 1, Emilio Gómez-Déniz 2, Osvaldo Venegas 3,*, Diego I. Gallardo 1 and Héctor W. Gómez 4 1 Departamento de Matemática, Facultad de Ingeniería, Universidad de Atacama, Copiapó 1530000, Chile 2 Department of Quantitative Methods in Economics and TIDES Institute, University of Las Palmas de Gran Canaria, 35017 Las Palmas de Gran Canaria, Spain 3 Departamento de Ciencias Matemáticas y Físicas, Facultad de Ingeniería, Universidad Católica de Temuco, Temuco 4780000, Chile 4 Departamento de Matemática, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile * Correspondence: [email protected]; Tel.: +56-2205411

Received: 27 May 2019; Accepted: 25 June 2019; Published: 10 July 2019

Abstract: In this article, we study an extension of the sinh Cauchy model in order to obtain asymmetric bimodality. The behavior of the distribution may be either unimodal or bimodal. We calculate its cumulative distribution function and use it to carry out quantile regression. We calculate the maximum likelihood estimators and carry out a simulation study. Two applications are analyzed based on real data to illustrate the flexibility of the distribution for modeling unimodal and bimodal data.

Keywords: asymmetric bimodal distribution; bimodal; maximum likelihood

1. Introduction It frequently occurs in real life that we find continuous data that are bimodal; these cannot be modeled by known unimodal distributions. It is therefore of interest to investigate more flexible distributions in modes that will be useful for professionals working in different areas of knowledge. In unimodal distributions, the flexibility is based on the asymmetry and kurtosis of the data. In this context, Azzalini [1] introduced the skew-normal (SN) distribution, with asymmetry parameter λ. It has a probability density function (pdf) given by − μ ( − μ) ( μ σ λ) = 2 φ y Φ λ y μ λ ∈ R σ > f y; , , σ σ σ , y, , , 0, (1)

where φ and Φ denote, respectively, the density and cumulative distribution functions of the N (0, 1) distribution. This is denoted as Y ∼ SN(λ). SN(0) becomes the standard normal distribution. Bimodal distributions generated from skew distributions can be found in Ma and Genton [ 2], Kim [3], Lin et al. [4,5], Elal-Olivero et al. [6], Arnold et al. [7], Arnold et al. [8], and Venegas et al. [9], among others. The importance of studying these distributions is based on the fact that they do not have identifiability problems and can be used as alternative parametric models to replace the use of mixtures of distributions that present estimation problems from either the classical or the Bayesian point of view (see McLachlan and Peel10 [ ]; Marin et al. [11]). One difficulty with these distributions is that in general, there is no closed-form expression for their cumulative distribution function (cdf). This makes it more difficult to generate data from these distributions for simulation studies or to carry out quantile regression. Additionally, many such bimodal distributions have complicated expressions for a general quantile (say, the q-th).

Symmetry 2019, 11, 899; doi:10.3390/sym1107089995 www.mdpi.com/journal/symmetry Symmetry 2019, 11, 899

A variety of bimodal data sets and appropriate models have been presented by many authors. For example, Cobb et al. [ 12] used the quartic exponential density presented by Fisher [ 13]to model crude birth rates data; Rao et al.14 [ ] used a bimodal distribution to analyze fish length data; Famoye et al. [15] used the beta-normal distribution to analyze egg diameter data; Everitt and Hand [16] discussed some mixture distributions for modeling bimodal data; Chatterjee et al. [ 17] and Weisberg18 [ ] presented two bimodal data sets on the eruption and interruption times of the Old Faithful geyser; Bansal et al. [ 19] discussed the bimodality of quantum dot size distribution; Famoye et al. [15] cited a variety of bimodal distributions that arise from different areas of science. On the other hand, the sinh Cauchy (SC) distribution is given by

λ cosh(z) f (z; Λ)= , σπ(1 + {λ sinh(z)}2)

y−μ where Λ =(λ, μ, σ), z = σ , z ∈ R, μ ∈ R is a location parameter, σ > 0 is a scale parameter, and λ > 0 is a symmetric parameter. The SC distribution produces unimodal and bimodal densities. The disadvantage of the SC distribution is that it is symmetric, which limits it to modeling only symmetric bimodal data. The main objective of this article is therefore to study a bimodal skew-symmetric model with closed cdf, in order to apply it to quantile regression. To do this, we used an extension of the SC distribution that we call the gamma–sinh Cauchy (GSC) distribution, which presents flexibility in its modes and also closed-form expression in its cdf. The GSC distribution belongs to the (gamma-G generator) family introduced by Zografos and Balakrishnan [20]. For any baseline cdf G(y; Λ), x ∈ R, they defined the gamma-G generator by the pdf and cdf given by ( Λ) ( φ Λ)= g y; {− [ − ( Λ)]}φ−1 f y; , Γ(φ) log 1 G y; , (2) and − [ − ( Λ)] γ(− log[1 − G(y; Λ)], φ) 1 log 1 G y; φ− − F(y; φ, Λ)= = u 1e u du, (3) Γ(φ) Γ(φ) 0

φ > Λ ( )= d ( ) respectively, where 0 is a skewness parameter, is a vector of parameters, g y dy G y , γ( )= y a−1 −t Γ( )=γ(+∞ ) y, a 0 t e dt is the incomplete gamma function, and a , a is the usual gamma function. We remark that in the literature, there are many models that can accommodate bimodal distributions. However, in only a few of them do the parameters have an interpretation in terms of measures of (mean, median, for instance) or a generalq-th quantile. As we will show in Section 3, the main advantage of the GSC is that the location parameter represents the respective q-th quantile under a certain restriction overφ, which is very convenient for the use of this model in a quantile regression framework. The paper is organized as follows. Section 2 develops the GSC distribution, its basic properties, and quantile regression. In Section 3, we perform a small-scale simulation study of the maximum likelihood (ML) estimators for parameters. Two applications to real data are discussed in Section 4, which illustrate the usefulness of the proposed model. Finally, conclusions are given in Section 5.

2. Gamma–Sinh Cauchy Distribution The GSC distribution is obtained considering G in (2) as the cdf of the SC distribution. The pdf can be written as  φ− λ cosh(z) 1 1 f (z; Θ)= − log 0.5 − arctan {λ sinh(z)} , (4) σπΓ(φ)(1 + {λ sinh(z)}2) π where Θ =(φ, λ, μ, σ), and φ > 0 is an asymmetric parameter. We denoted this byZ ∼ GSC(φ, λ, μ, σ).

96 Symmetry 2019, 11, 899

The cdf is given by ( φ λ μ σ)= 1 γ − − 1 {λ ( )} φ F z; , , , Γ(φ) log 0.5 π arctan sinh z , . (5)

Particular cases:

1. φ = 1 SC distribution, 2. φ = 1, λ = 1 hyperbolic secant distribution (Talacko [21]).

The following proposition states conditions for the symmetry of the GSC distribution.

Proposition 1. The density of the GSC(φ, λ, μ, σ) model is symmetric if and only if φ = 1.

Proof. Without loss of generality, we considerμ = 0 and σ = 1. For φ = 1, the density of the model is

λ cosh(y) f (y;1,λ,0,1)= . σπΓ(φ)(1 + {λ sinh(y)}2)

This function is clearly even because cosh(y) and sinh(y)2 are even. To prove the reciprocal, φ = ( φ λ )= we will argue by contradiction. Let 0 1 such that the density is symmetric, i.e. f y; 0, ,0,1 (− φ λ ) ∀ ∈ R ∀λ > f y; 0, ,0,1 , y , 0. This implies that

⎛ ⎞φ − 0 1 log 1 + 1 arctan {λ sinh(y)} ⎝ 2 π ⎠ = 1. 1 − 1 {λ ( )} log 2 π arctan sinh y

From the latter equality, and jointly with the fact that the logarithmic function is injective, we find that arctan(λ sinh(y)) = 0, ∀y ∈ R, which implies that λ = 0, producing a contradiction.

The unimodal and bimodal regions for GSC(φ, λ, μ, σ) are illustrated in Figure 1. We can see that for allφ, there is λ such that GSC is bimodal. Figure 2 shows the density function for some values of the parameters φ and λ, considering the location and scale parameters fixed at 0 and 1, respectively. The distribution assumes symmetric unimodal and bimodal shapes and asymmetric unimodal and bimodal shapes. Figure 3 shows the skewness and kurtosis coefficients for the GSC model under different values of λ and φ (such coefficients do not depend onμ and σ). As illustrated previously, the model can assume positive and negative values for the skewness coefficient and can also accommodate kurtosis coefficients lower than, equal to, and greater than the normal model (<3, =3 and >3, respectively).

Figure 1. Unimodal and bimodal regions for GSC(φ, λ, μ, σ).

97 Symmetry 2019, 11, 899

(a) (b)

(c) (d) Figure 2. Plots for the gamma–sinh Cauchy (GSC) model for different values of the parameters with μ = 0, σ = 1(a) φ = 1(b) λ = 0.5, and (c,d) λ = 0.2

Figure 3. Skewness and kurtosis coefficients for the GSC (φ, λ, μ, σ) model with different values for λ and φ.

98 Symmetry 2019, 11, 899

The GSC Model for Quantile Regression From Equation (5), it follows that the cdf of the GSC distribution evaluated in μ is given by γ( ( ) φ) ( ≤ μ φ λ μ σ)= (μ φ λ μ σ)= log 2 , = ( ( ) φ) P Y ; , , , F ; , , , Γ(φ) G log 2 , , (6) where G(y, a)=γ(y, a)/Γ(a) corresponds to the cdf of the gamma distribution with shape and scale parameters a and 1, respectively. Note that G (log(2), φ) depends only on φ (and not on σ or λ). As G(log(2), φ) is an increasing function in terms of φ and G(·, φ) ∈ (0, 1), the equation G(log(2), φ)=q has an unique solution for q ∈ (0, 1), which can also be written as

( ) 1 log 2 φ− − u 1e udu = q. (7) Γ(φ) 0

Equation (7) can be solved numerically. For instance, in R the uniroot function can be used. Table 1 shows some values for φ(q) with different values for q.

Table 1. Some values for φ(q) in terms of q.

q 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 φ(q) 2.301 1.802 1.475 1.219 1.000 0.802 0.613 0.427 0.230

For this reason, for a fixed q, if we take φ = φ(q) satisfying (7), by (6) the parameter μ directly represents the q-th quantile, allowing regression to be performed conveniently even thoughμ. Under x =( ) = this setting, a set of available p-covariates, say i xi1, ..., xip , for i 1, ..., n, can be introduced as follows: μ = xβ = i i , i 1, . . . , n. This is a convenient property of the GSC distribution because it provides a simple way to performing quantile regression in a model that can be unimodal or bimodal, depending only on parameter λ (because φ = φ(q) is considered as fixed in this setting). As far as we know, there is no model in the literature that is parameterized conveniently in terms of the q-th quantile and can also be unimodal or bimodal. Figure 1 shows that for any φ = φ(q) fixed Λ(1) λ Λ(2) in the GSC, there is an interval φ(q) for where the distribution is unimodal and an interval φ(q) where the distribution is bimodal.

3. ML Estimation for the GSC Distribution

3.1. ML Estimation (φ λ μ σ) Consider y1, y2, ..., yn as a size n random sample from the pdf GSC , , , . Hence, the log-likelihood function is given by

n n ) * (φ λ μ σ)= (λ)+ { ( )}− + λ2 2( ) l , , , n log ∑ log cosh zi ∑ log 1 sinh zi = = i 1  i 1 n +(φ − ) − 1 − 1 {λ ( )} 1 ∑ log log π arctan sinh zi i=1 2 − n log(σ) − n log(π) − n log(Γ(φ)), (8)

99 Symmetry 2019, 11, 899

−μ = yi Θ =(φ λ μ σ) where zi σ . To compute the ML estimation for , , , ,(8) must be maximized. That δl δl δl δl is, we have to solve the following system of equations: δφ , δλ , δμ , and δσ . More precisely, we have to solve

n (− ( )) − Ψ(φ)= ∑ log log ti n 0, i=1 n 2λ sinh2(z ) n sinh(z ) n ∑ i +(φ − 1) ∑ i = , + λ2 2( ) π ( + λ2 2( )) ( ) λ i=1 1 sinh zi i=1 ti 1 sinh zi log ti n λ2 sinh(2z ) λ(φ − 1) n cosh(z ) n ∑ i + ∑ i = ∑ tanh(z ), + λ2 2( ) π ( + λ2 2( )) ( ) i i=1 1 sinh zi i=1 ti 1 sinh zi log ti i=1 n λ2z sinh(2z ) λ(φ − 1) n z cosh(z ) n ∑ i i + ∑ i i − n = ∑ z tanh(z ), + λ2 2( ) π ( + λ2 2( )) ( ) i i i=1 1 sinh zi i=1 ti 1 sinh zi log ti i=1

= 1 − 1 {λ ( )} Ψ(·) where ti 2 π arctan sinh zi , and is the digamma function. The system of equations given above can be solved using numerical procedures such as the Newton–Raphson procedure. An alternative is to use the NumDeriv routine with the R software (R Core Team [22]).

3.2. Simulation Study In this Section, we present a brief simulation study to assess the performance of MLE in the GSC model. To draw values from the model, we used the inversion method. If U ∼ U(0, 1), then − − − Z = μ + σ sinh 1 λ 1 tan π/2 − π exp −G 1 (U; φ) ∼ GSC(φ, λ, μ, σ),

− where G 1(·; φ) is the inverse of the cdf of the gamma distribution with shape and rate parameters equal to φ and 1, respectively. In all scenarios, we considered μ = 0 and σ = 1, three values for φ—0.75, 1, and 1.5—and three values for λ—0.5, 1, and 2. We also considered three sample sizes: 100, 200, and 500. For each combination of the parameters, we drew 1,000 samples and computed the ML estimates. Table 2 summarizes the results considering the average of the bias (bias), the root of the estimated mean squared error (RMSE), and the 95% coverage probability (CP). Note that in all cases, the bias and RMSE decreased when the sample size was increased, suggesting that the estimators are consistent. Finally, we also remark that the coverage probability converged to the nominal values used for the construction of the confidence intervals when the sample size was increased, suggesting that the normality for the ML estimates is reasonable in sample sizes.

Table 2. Simulation study for the GSC(φ, λ, μ = 0, σ = 1) model.

n = 100 n = 200 n = 500 φλparameter bias RMSE CP bias RMSE CP bias RMSE CP 0.75 0.5 μ −0.094 0.663 0.846 −0.055 0.578 0.900 −0.014 0.457 0.932 σ −0.037 0.392 0.880 −0.014 0.337 0.904 −0.006 0.268 0.933 λ −0.031 0.372 0.869 −0.013 0.317 0.905 −0.006 0.252 0.930 φ 0.044 0.411 0.867 0.023 0.349 0.907 0.006 0.272 0.934 1.0 μ −0.015 0.607 0.831 0.021 0.532 0.880 0.016 0.428 0.925 σ −0.035 0.460 0.848 −0.024 0.396 0.886 −0.009 0.320 0.927 λ −0.004 0.544 0.831 −0.010 0.459 0.889 −0.003 0.365 0.927 φ 0.018 0.424 0.843 −0.004 0.363 0.890 −0.006 0.290 0.926 2.0 μ 0.011 0.361 0.934 0.004 0.298 0.944 0.001 0.233 0.944 σ −0.011 0.504 0.911 −0.007 0.419 0.932 −0.003 0.333 0.945 λ 0.055 0.789 0.928 0.017 0.650 0.939 0.010 0.513 0.945 φ −0.007 0.337 0.940 −0.003 0.279 0.949 −0.001 0.220 0.947

100 Symmetry 2019, 11, 899

Table 2. Cont.

n = 100 n = 200 n = 500 φλparameter bias RMSE CP bias RMSE CP bias RMSE CP 1.0 0.5 μ −0.060 0.666 0.846 −0.036 0.580 0.899 −0.012 0.460 0.933 σ −0.044 0.373 0.877 −0.022 0.318 0.915 −0.009 0.253 0.933 λ −0.033 0.366 0.867 −0.018 0.308 0.913 −0.007 0.245 0.936 φ 0.045 0.458 0.874 0.023 0.388 0.909 0.007 0.304 0.938 1.0 μ −0.052 0.622 0.816 −0.024 0.543 0.881 −0.002 0.428 0.940 σ −0.040 0.431 0.832 −0.024 0.367 0.887 −0.006 0.293 0.936 λ −0.017 0.535 0.811 −0.015 0.448 0.875 0.000 0.354 0.934 φ 0.060 0.493 0.842 0.027 0.417 0.890 0.005 0.324 0.946 2.0 μ −0.001 0.357 0.937 0.000 0.291 0.946 −0.001 0.229 0.951 σ −0.014 0.494 0.896 −0.006 0.413 0.929 0.000 0.328 0.942 λ 0.033 0.795 0.916 0.017 0.658 0.937 0.011 0.521 0.944 φ 0.007 0.372 0.942 0.002 0.303 0.947 0.002 0.238 0.949 1.5 0.5 μ 0.015 0.683 0.850 0.014 0.597 0.894 0.008 0.480 0.925 σ −0.045 0.354 0.890 −0.021 0.300 0.926 −0.009 0.239 0.935 λ −0.022 0.366 0.877 −0.014 0.308 0.916 −0.006 0.245 0.937 φ 0.026 0.533 0.876 0.008 0.455 0.906 0.002 0.364 0.930 1.0 μ −0.134 0.688 0.835 −0.138 0.612 0.856 −0.076 0.492 0.904 σ -0.038 0.413 0.836 −0.025 0.347 0.856 −0.011 0.275 0.896 λ 0.091 0.581 0.840 −0.001 0.459 0.860 −0.010 0.360 0.896 φ 0.211 0.681 0.886 0.166 0.579 0.880 0.076 0.446 0.912 2.0 μ −0.029 0.383 0.933 −0.009 0.309 0.945 −0.004 0.241 0.948 σ −0.014 0.472 0.895 −0.007 0.395 0.927 −0.003 0.313 0.942 λ 0.065 0.784 0.932 0.023 0.646 0.942 0.010 0.510 0.952 φ 0.070 0.484 0.948 0.023 0.379 0.954 0.009 0.294 0.950

4. Applications In this section, we carry out two applications to real data, the first using the GSC model without covariates and the second applying quantile regression to uni- and bimodal data.

4.1. Application 1: Without Covariates The first application reported is for the data set consisting of 1150 heights measured at 1 micron intervals along the drum of a roller (i.e., parallel to the axis of the roller). This was part of an extensive study of roller surface roughness. It is available for downloading at http://lib.stat,emu.edu/jasadata/ laslett. Table 3 presents summary statistics for the data set where b1 and b2 correspond to the sample asymmetry and kurtosis coefficients, respectively.

Table 3. Descriptive statistics for the data set

ys2 b b n 1 2 1150 3.535 0.422 −0.986 4.855

We fitted this using the SN model, the exponentiated sinh Cauchy (ESC) model (see Cooray23 [ ]), and the GSC model. A summary of these fits is presented in Table 4. Based on the AIC criteria, the GSC provided the best fit for the height data set. Figure 4 shows plots of the density functions for the fitted models using the MLEs for SN, ECG and GSC distributions.

101 Symmetry 2019, 11, 899

Table 4. Maximum likelihood (ML) estimates for the GSC, exponentiated sinh Cauchy (ECG), and skew-normal (SN) models for the roller data set.

Parameter GSC ECG SN μ 4.1115 (0.0388) 4.0460 (0.0482) 4.2475 (0.0276) σ 0.2053 (0.0172) 0.1903 (0.0205) 0.9644 (0.0286) λ 0.5353 (0.0682) 0.5535 (0.0860) −2.7578 (0.2426) φ 0.3621 (0.0373) 0.3322 (0.0445) − log-likelihood −1053.8 −1056.0 −1071.3 AIC 2115.7 2119.9 2148.7

 *6& 61 (6&



 'HQVLW\





 +HLJKW

Figure 4. Fitted models for roller data set.

4.2. Data Set 2: Quantile Regression to Bimodal Data The second application we consider is the Australian data set available in the sn package in R. This data set is related to 102 male and 100 female athletes collected at the Australian Institute of Sport. The linear model considered is

= β ( )+β ( ) + β ( ) +  ( ) = Bfati 0 q 1 q bmii 2 q lbmi i q , i 1, . . . , 202, (9) where Bfati is the body fat percentage for the i-th athlete and bmii and lbmi are the covariates body  ( ) ∼ ( σ λ φ( )) mass index and lean body mass for thei-th athlete, respectively. In addition, i q GSC 0, , , q , φ(q) satisfies Equation (7), and q ∈ (0, 1) is the fixed quantile that is being modeled. This data set was also analyzed in Martínez-Flórez et al. [24] using a bimodal regression model. However, the authors modeled the mean of the distribution. In our approach, we model the 0.1, 0.25, 0.5, 0.75, and 0.9 of the distribution, which provides a more informative scenario to explain body fat in terms of the body mass index and lean body mass. Our approach is compared with the skewed Laplace (SKL) and skewed Student-t (SKT) models discussed in Galarza et al. [25], where the authors proposed a flexible model in a quantile regression model context. Table 5 shows the AIC for those models considering different quantiles. We also present the p-value for the Kolmogorov–Smirnov (K–S) test of the hypothesis that the respective quantile residuals came from the standard normal distribution. P-values greater than 5% suggest that with this significance level, the standard normal assumption is reasonable for those residuals, in which case the model would be appropriate for this

102 Symmetry 2019, 11, 899 data set. Note that based on the AIC criteria, the GSC presents a better fit for this data set, except for the median regression ( q = 0.5). On the other hand, based on the p-value for the K–S test applied to the quantile residuals, we conclude that GSC, SKL, and SKT are appropriate models for q = 0.25 and q = 0.5 (the GSC and SKT provide a better fit based on the AIC criteria). However, for q = 0.1 and 0.75, GSC provides the better fit because the p-values are (significantly) greater than 0.05. Finally, for q = 0.9 no model seems appropriate, but based on thep-values, the GSC provides a better fit than SKL and SKT distributions.

Table 5. AIC and p-value for K–S test in the ais data set for the GSC, skewed Laplace (SKL), and skewed Student-t (SKT) models and different quantiles.

AIC p-value for K–S test q GSC SKL SKT GSC SKL SKT 0.10 1156.54 1194.28 1166.74 0.79 0.06 0.02 0.25 1160.72 1172.70 1153.15 0.65 0.71 0.96 0.50 1162.74 1182.66 1161.80 0.32 0.13 0.31 0.75 1159.50 1221.65 1200.87 0.87 <0.001 0.02 0.90 1211.71 1280.45 1253.48 0.02 <0.001 <0.001

Figure 5 shows the regression coefficients for the quantile regression presented in Equation (9) and their respective 95% confidence intervals. Note that body mass index and lean body mass are significant in explaining all the quantiles modeled. Figure 6 shows the profile density for the q-th quantile of body fat percentage for q = 0.1 and q = 0.75. Note that the distribution of the 0.1 quantile is unimodal, and the distribution of the 0.9 quantile is bimodal.

Figure 5. Estimates for regression coefficients (and 95% confidence interval)s for variables bmi (left panel) and lbm (right panel) in different quantile regression models with quantiles equal to 0.1, 0.25, 0.5, 0.75, and 0.9 and response variable Bfat.

103 Symmetry 2019, 11, 899

T   T  

 

 

 

 

  GHQVLW\IXQFWLRQ GHQVLW\IXQFWLRQ  

 

 

             

%IDW %IDW

Figure 6. Distribution for 0.1 and 0.75 quantiles of body fat percentage considering body mass index and lean body mass equal to 22.96 and 64.87, respectively. Curves in black, red, and green represent the density functions estimated by the GSC, SKL, and SKT models, respectively.

5. Final Comments This paper proposes the GSC distribution, which is flexible in its modes and contains the SC distribution as a special case. We implemented ML estimation in a quantile regression, obtaining significant results. In the applications to real data it performs very well, better than potential rival models. Some further characteristics of the GSC distribution are:

• The GSC distribution contains the SC and hyperbolic secant models as special cases. • The GSC distribution presents great flexibility in its modes, as can be observed in Figure 1. • The proposed model has a closed-form expression for its cdf. • In the two applications, we show that the GSC model fits better than the other models.

Author Contributions: All of the authors contributed significantly to this research article. Funding: The research of Yolanda M. Gómez was supported by proyecto DIUDA programa de inserción No. 22367 of the Universidad de Atacama. The research of Héctor W. Gómez was supported by Grant SEMILLERO UA-2019 (Chile). Acknowledgments: The authors would like to thank the editor and the anonymous referees for their comments and suggestions, which significantly improved our manuscript. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178. 2. Ma, Y.; Genton, M.G. Flexible class of skew-symmetric distributions. Scand. J. Stat. 2004, 31, 459–468. [CrossRef] 3. Kim, H.J. On a class of two-piece skew-normal distributions. Statistics 2005, 39, 537–553. [CrossRef] 4. Lin, T.I.; Lee, J.C.; Hsieh, W.J. Robust mixture models using the skew-t distribution. Stat. Comput. 2007, 17, 81–92. [CrossRef] 5. Lin, T.I.; Lee, J.C.; Yen, S.Y. Finite mixture modeling using the skew-normal distribution. Stat. Sin. 2007, 17, 909–927. 6. Elal-Olivero, D.; Gómez, H.W.; Quintana, F.A. Bayesian Modeling using a class of Bimodal skew-Elliptical distributions. J. Stat. Plan. Infer. 2009, 139, 1484–1492. [CrossRef] 7. Arnold, B.C.; Gómez, H.W.; Salinas, H.S. On multiple constraint skewed models. Statistics2009, 43, 279–293. [CrossRef]

104 Symmetry 2019, 11, 899

8. Arnold, B.C.; Gómez, H.W.; Salinas, H.S. A doubly skewed normal distribution. Statistics 2015, 49, 842–858. [CrossRef] 9. Venegas, O.; Salinas, H.S.; Gallardo, D.I.; Bolfarine, H.; Gómez, H.W. Bimodality based on the generalized skew-normal distribution. J. Stat. Comput. Simul. 2018, 88, 156–181. [CrossRef] 10. McLachlan, G.J.; Peel, D. Finite Mixture Models; Wiley Interscience: New York, NY, USA, 2000. 11. Marin, J.M.; Mengersen, K.; Robert, C. Bayesian modeling and inference on mixtures of distributions. Handbook Stat. 2005, 25, 459–503. 12. Cobb, L.; Koppstein, P.; Chen, N.H. Estimation and moment recursion relations for multimodal distributions of the exponential families. J. Arner. Stat. Assoc. 1983, 78, 124–130. [CrossRef] 13. Fisher, R.A. On the Mathematical Foundations of Theoretical Statistics. Philos. Trans. R. Soc. London Ser. A 1922, 222, 309–368. [CrossRef] 14. Rao, K.S.; Narayana, J.L.; Sastry, V.P. A bimodal distribution. Bull. Calcutta Math. Soc. 1988, 80, 238–240. 15. Famoye, F.; Lee, C.; Eugene, N. Beta-normal distribution: Bimodality properties and application. J. Modern Appl. Stat. Method 2004, 3, 85–103. [CrossRef] 16. Everitt, B.S.; Hand, D.J. Finite Mixture Distributions; Chapman & Hall: London, UK, 1981. 17. Chatterjee, S.; Handcock, M.S.; Simonoff, J.S. A Casebook for a First Course in Statistics and Data Analysis; John Wiley & Sons: New York, NY, USA, 1995. 18. Weisberg, S. Applied Linear Regression; John Wiley & Sons: New York, NY, USA, 2005. 19. Bansal, B.; Gokhale, M.R.; Bhattacharya, A.; Arora, B.M. InAs/InP quantum dots with bimodal size distribution: Two evolution pathways. J. Appl. Phys. 2007, 101, 1–6. [CrossRef] 20. Zografos, K.; Balakrishnan, N. On families of beta- and generalized gamma-generated distributions and associated inference. Stat. Methodol. 2009, 6, 344–362. [CrossRef] 21. Talacko, J. Perks’ distributions and their role in the theory of Wiener’s stochastic variables. Trabajos de Estadistica 1956, 7, 159–174. [CrossRef] 22. Team, R.C. R: A Language and Environment for Statistical Computing. Available online: http://www.R- project.org (accessed on 26 May 2019). 23. Cooray, K. Exponentiated Sinh Cauchy distribution with applications. Commun. Stat. Theor. Method 2013, 42, 3838–3852. [CrossRef] 24. Martínez-Flórez, G.; Salinas, H.S.; Bolfarine, H. Bimodal regression model. Colombian J. Stat.2017, 40, 65–83. 25. Galarza, C.E.; Lachos, V.H.; Barbosa, C.; Castro, L.M. Robust quantile regression using a generalized class of skewed distributions. Stat 2017, 6, 113–130.

c 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

105

S symmetry S

Article Univariate and Bivariate Models Related to the Generalized Epsilon–Skew–Cauchy Distribution

Barry C. Arnold 1, Héctor W. Gómez 2, Héctor Varela 2,* and Ignacio Vidal 3 1 Statistics Department, University of California Riverside, Riverside, CA 92521, USA; [email protected] 2 Departamento de Matemática, Universidad de Antofagasta, Antofagasta 1240000, Chile; [email protected] 3 Instituto de Matemática y Física, Universidad de Talca, Casilla 747, Talca, Chile; [email protected] * Correspondence: [email protected]

Received: 8 May 2019; Accepted: 10 June 2019; Published: 14 June 2019

Abstract: In this paper, we consider a stochastic representation of the epsilon–skew–Cauchy distribution, viewed as a member of the family of skewed distributions discussed in Arellano-Valle et al. (2005). The stochastic representation facilitates derivation of distributional properties of the model. In addition, we introduce symmetric and asymmetric extensions of the Cauchy distribution, together with an extension of the epsilon–skew–Cauchy distribution. Multivariate versions of these distributions can be envisioned. Bivariate examples are discussed in some detail.

Keywords: Epsilon-skew-Normal; Epsilon-skew-Cauchy; bivariate densities; generalized Cauchy distributions

1. Introduction Mudholkar and Hutson (2000) [ 1] studied an asymmetric normal distribution that they called the epsilon–skew–normal {ESN(ε); |ε| < 1}, with asymmetry or skewness parameter ε. When the parameter ε assumes the value 0, the distribution simplifies to become a standard normal distribution. The family thus consists of a parameterized set of usually asymmetric distributions that includes the symmetric standard normal density as a special case. Specifically, we say that X ∼ ESN(ε) if its density is of the form: x g (x; ε) = φ 1 − sgn (x) ε where x ∈ R, φ is the standard normal density and sgn(·) is the sign function. Arellano-Valle et al. (2005) [2] discuss extension of this model, together with associated inference procedures. They consider a class of Epsilon-skew-symmetric distributions associated with a particular symmetric density f (·) that is indexed by an asymmetry parameter ε with densities given by x h(x; ε)= f (1) 1 − sgn (x) ε where x ∈ R and |ε| < 1. If X has density of the form (1), then we say that X is an epsilon-skew-symmetric random variable and we writeX ∼ ES f (ε). Arellano-Valle et al. (2005) [ 2] extend this family to the model epsilon-skew-exponential-power, a model that has major and minor asymmetry and kurtosis that the ESN model. On the other hand Gómez et al. (2007) [3] study the Fisher information matrix for epsilon-skew-t model, which was used before in the study a financial series by Hansen (1994) [ 4]; see also Gómez et al. (2008)5 []. Note that if in (1) we set f (t)=1/ π 1 + t2 , we obtain the epsilon–skew–Cauchy model.

Symmetry 2019, 11, 794; doi:10.3390/sym11060794107 www.mdpi.com/journal/symmetry Symmetry 2019, 11, 794

We will write X ∼ N(0, 1) to indicate that X has a standard normal distribution, and we will write Y ∼ HN(0, 1) to indicate that Y has a standard half-normal distribution, i.e., thatY = |X| where X ∼ N(0, 1). The distribution of the ratio X/Y of two random variables is of interest in problems in biological and physical sciences, , and and selection. It is well known that if X ∼ N(0, 1), ∼ ( ) ∼ ( ) Y1 N 0, 1 and Y2 HN 0, 1 are independent, then the random variables X/Y1 and X/Y2 both have Cauchy distributions; see Johnson et al. (1994, [6] Chapter 16). Behboodian, et al. (2006) [7] and Huang and Chen (2007) [ 8] study the distribution of such quotients when the component random variables are skew-normal (of the form studied in Azzalini (1985) [ 9]). The principle objective of the present paper is to study the behavior of such quotients when the component random variables have epsilon-skew-normal distributions. The paper is organized in the following manner. In Section 2, we describe a representation of the epsilon–skew–Cauchy model. In Section 3, we consider the distribution of the ratio of two independent random variables, one of which has an ESN (ε) distribution and the other a standard normal distribution. In addition, an extension of the epsilon–skew–Cauchy (ESC) distribution is introduced. Bivariate versions of these distributions are discussed in Section 4. Extensions to higher dimensions can be readily envisioned, but are not discussed here. In Section 5, some of the bivariate distributions introduced in this paper are considered as possible models for a particular real-world data set.

2. Representation of the ESC (Epsilon–Skew–Cauchy) Model

∼ (ε) ∼ ( ) =d X Proposition 1. If X ESN and Y HN 0, 1 are independent random variables, then Z1 Y has an epsilon–skew–Cauchy distribution with asymmetry parameter ε and density given by

1 f (z; ε) = , Z1 2 π + z 1 1−sgn(z)ε

∈ R |ε| < ∼ (ε) where z , 1 and we write Z1 ESC .

= = | | = Proof. With the transformation Z1 X/Y and W Y, whose Jacobian J w, we obtain   ( 2 ( ) = w − 1 z + 2 fZ ,W z, w exp 1 w , 1 π 2 (1 − sgn (z) ε)2 where z ∈ R, w > 0. It follows directly that ∞ 1 f (z; ε) = f (z, w) dw = . Z1 Z1,W 2 0 π + z 1 1−sgn(z)ε

Figure 1 depicts the behavior of the ESC density for a variety of parameter values.

108 Symmetry 2019, 11, 794 'HQVLW\        í í í    

]

Figure 1. Examples of the ESC(ε) density for : ε = 0 (green line), ε = −0.5 (blue line), ε = −0.8 (black line), ε = 0.5 (red line) and ε = 0.8 (pink line).

3. Generalized Cauchy Distribution

∼ (ε ) ∼ (ε ) =d X Proposition 2. If X ESN 1 and Y ESN 2 are independent random variables, thenZ Y has what ε ε we call a generalized epsilon–skew–Cauchy (GESC) distribution with parameters 1 and 2 and with density given by

1 1 f (z; ε , ε ) = + , Z 1 2 2 2 2 2 π 1 + z π 1 + z 2 +ε + ( )ε 2 −ε − ( )ε 1 2 1 sgn z 1 1 2 1 sgn z 1

∈ R |ε | < |ε | < ∼ (ε ε ) where z , 1 1, 2 1 and we write Z GESC 1, 2 .

Proof. Arguing as in Proposition 1, we have that   ( ∞ | | 2 ( ε ε ) = w − 1 z + 1 2 fZ z; , 2 exp w dw. 1 −∞ π ( − ( ) ε )2 ( − ( ) ε )2 2 2 1 sgn zw 1 1 sgn w 2

By considering separately the cases in which z ≥ 0 and z < 0, result follows directly.

From Proposition 2 two special cases are obtained directly, ε = ∼ (ε) 1. If 2 0 a generalized Cauchy (GC) distribution is obtained. In this case we write Z GC , and its pdf is given by

1 1 f (z; ε) = + , Z 2 2 π + z π + z 2 1 1+sgn(z)ε 2 1 1−sgn(z)ε

109 Symmetry 2019, 11, 794

d where z ∈ R,0< ε < 1 and Z = X/Y where X ∼ ESN (ε) and Y ∼ N(0, 1) are independent random variables. ε = ε = ε ∼ 2. If 1 2 an epsilon–skew–Cauchy distribution is obtained. In this case we write Z ESC2(ε), and its pdf is given by

1 1 f (z; ε) = + , Z 2 2 2 2 π 1 + z π 1 + z 2 1+ε 1+sgn(z)ε 2 1−ε 1−sgn(z)ε

d where z ∈ R, |ε| < 1 and Z = X/Y where X ∼ ESN (ε) and Y ∼ ESN (ε) are independent random variables.

4. General Bivariate Mudholkar-Hutson Distribution = Define Z1, Z2, Z3 to be i.i.d. standard normal random variables. For i 1, 2, 3 define −α γ = i with probability i Ui β − γ i with probability 1 i

α α α β β β < γ γ γ < α where 1, 2, 3, 1, 2, 3 are positive numbers and 0 1, 2, 3 1. So, the parameters i and β i indicate the propensity in which the discrete random variable takes negative and positive values, γ respectively. The parameters i control how often negative and positive values are taken by Ui. In addition assume that all six random variables Z1, Z2, Z3, U1, U2 and U3 are independent, and define | | | | ( )= U1 Z1 U2 Z2 X, Y | | , | | . (2) U3 Z3 U3 Z3 The model (2) is highly flexible since it allows for different behavior in each of the four quadrants of the plane. From (2) it may be recognized that only fractional moments of X and Y exist. | | | | ( ) = Z1 Z2 Note that if we define W1, W2 | | , | | , it is readily verified that Z3 Z3 2 −3/2 f (w , w ) = 1 + w2 + w2 I (w > 0, w > 0) , (3) W1,W2 1 2 π 1 2 1 2 ( ) in which case we say that W1, W2 has a standard bivariate half-Cauchy distribution.

110 Symmetry 2019, 11, 794

( ) Using (3) and conditioning on U1, U2 and U3 we obtain the density of X, Y in the albeit complicated form: ( −3/2 α2 α 2 α 2 2 3 3 3 f (x, y)= π γ γ γ α α 1 + α x + α y I(x > 0, y > 0) 1 2 3 1 2 1 2 ( −3/2 β2 β 2 β 2 2 3 3 3 + π γ γ (1 − γ ) α α 1 + α x + α y I(x < 0, y < 0) 1 2 3 1 2 1 2 ( −3/2 α2 α 2 α 2 2 3 3 3 + π γ (1 − γ )γ α β 1 + α x + β y I(x > 0, y < 0) 1 2 3 1 2 1 2 ( −3/2 β2 β 2 β 2 2 3 3 3 + π γ (1 − γ )(1 − γ ) α β 1 + α x + β y I(x < 0, y > 0) 1 2 3 1 2 1 2 ( −3/2 (4) α2 α 2 α 2 2 3 3 3 + π (1 − γ )γ γ β α 1 + β x + α y I(x < 0, y > 0) 1 2 3 1 2 1 2 ( −3/2 β2 β 2 β 2 2 3 3 3 + π (1 − γ )γ (1 − γ ) β α 1 + β x + α y I(x > 0, y < 0) 1 2 3 1 2 1 2 ( −3/2 α2 α 2 α 2 2 3 3 3 + π (1 − γ )(1 − γ )γ β β 1 + β x + β y I(x < 0, y < 0) 1 2 3 1 2 1 2 ( −3/2 β2 β 2 β 2 2 3 3 3 + π (1 − γ )(1 − γ )(1 − γ ) β β 1 + β x + β y I(x > 0, y > 0) . 1 2 3 1 2 1 2

Some special cases which might be considered include the following:

α = + ε β = − ε γ = ( + ε ) 1. Mudholkar and Hutson type. For this we set: i 1 i, i 1 i and i 1 i /2 for i = 1, 2, 3. In this case the density (4) simplifies somewhat to become: ( −3/2 +ε 2 +ε 2 1 3 1 3 1 3 f (x, y)= π (1 + ε ) 1 + +ε x + +ε y I(x > 0, y > 0) 4 3 1 1 1 2 ( −3/2 −ε 2 −ε 2 1 3 1 3 1 3 + π (1 − ε ) 1 + +ε x + +ε y I(x < 0, y < 0) 4 3 1 1 1 2 ( −3/2 +ε 2 +ε 2 1 3 1 3 1 3 + π (1 + ε ) 1 + +ε x + −ε y I(x > 0, y < 0) 4 3 1 1 1 2 ( −3/2 −ε 2 −ε 2 1 3 1 3 1 3 + π (1 − ε ) 1 + +ε x + −ε y I(x < 0, y > 0) 4 3 1 1 1 2 ( −3/2 (5) +ε 2 +ε 2 1 3 1 3 1 3 + π (1 + ε ) 1 + −ε x + +ε y I(x < 0, y > 0) 4 3 1 1 1 2 ( −3/2 −ε 2 −ε 2 1 3 1 3 1 3 + π (1 − ε ) 1 + −ε x + +ε y I(x > 0, y < 0) 4 3 1 1 1 2 ( −3/2 +ε 2 +ε 2 1 3 1 3 1 3 + π (1 + ε ) 1 + −ε x + −ε y I(x < 0, y < 0) 4 3 1 1 1 2 ( −3/2 −ε 2 −ε 2 1 3 1 3 1 3 + π (1 − ε ) 1 + −ε x + −ε y I(x > 0, y > 0) . 4 3 1 1 1 2

A further specialization of the density (5) can be considered, as follows.

111 Symmetry 2019, 11, 794

α = + ε β = − ε 2. Homogenous Mudholkar and Hutson type. For this we set: i 1 , i 1 and γ = ( + ε) = i 1 /2 for i 1, 2, 3. This homogeneity results in a little simplification of (4), thus: ) *   − ( ε) = 1 ( + ε)3 +( − ε)3 + 2 + 2 3/2 ( > > ) f x, y; 4π 1 1 1 x y I x 0, y 0 ( − 2 2 3/2 + 1 ( − ε)3 + 1−ε + 1−ε ( < < ) 4π 1 1 1+ε x 1+ε y I x 0, y 0 ( − 2 3/2 + 1 ( + ε)3 + 2 + 1+ε ( > < ) 4π 1 1 x 1−ε y I x 0, y 0 ( − 2 3/2 + 1 ( − ε)3 + 1−ε + 2 ( < > ) 4π 1 1 1+ε x y I x 0, y 0 ( (6) − 2 3/2 + 1 ( + ε)3 + 1+ε + 2 ( < > ) 4π 1 1 1−ε x y I x 0, y 0 ( − 2 3/2 + 1 ( − ε)3 + 2 + 1−ε ( > < ) 4π 1 1 x 1+ε y I x 0, y 0 ( − 2 2 3/2 + 1 ( + ε)3 + 1+ε + 1+ε ( < < ) 4π 1 1 1−ε x 1−ε y I x 0, y 0 .

It is easy to see that the parameter ε is not identifiable in (6) because f (x, y; ε) = f (x, y; −ε). An adjustment to ensure identifiability involves introducing the constraint ε ≥ 0. α α α β β β γ = 3. Equal weights. In this case we assume that 1, 2, 3, 1, 2, 3 are positive numbers and i 1/2 for i = 1, 2, 3. ( −3/2 −3/2 α2 α 2 α 2 β2 β 2 β 2 1 3 3 3 3 3 3 f (x, y) = π α α 1 + α x + α y + β β 1 + β x + β y 4 1 2 1 2 1 2 1 2 × ( > > ) I x 0, y 0 ( −3/2 −3/2 β2 β 2 β 2 α2 α 2 α 2 1 3 3 3 3 3 3 + π α α 1 + α x + α y + β β 1 + β x + β y 4 1 2 1 2 1 2 1 2 × ( < < ) I x 0, y 0 ( −3/2 −3/2 (7) α2 α 2 α 2 β2 β 2 β 2 1 3 3 3 3 3 3 + π α β 1 + α x + β y + β α 1 + β x + α y 4 1 2 1 2 1 2 1 2 × ( > < ) I x 0, y 0 ( −3/2 −3/2 α2 α 2 α 2 β2 β 2 β 2 1 3 3 3 3 3 3 + π β α 1 + β x + α y + α β 1 + α x + β y 4 1 2 1 2 1 2 1 2 ×I (x < 0, y > 0) .

α β The pdf (7) is not identifiable because the values of i can be interchanged with those of i and f (x, y) does not change. Moreover, multiplying all of the α’s and β’s by a constant does not ( ) α = β ( = ) change f x, y . So, one way to get identifiability in the model (7)istoset i i i 1, 2, 3 and α 3 equal to 1. In that case, Equation (7) takes the form

 − 2 2 3/2 ( ) = 1 1 + x + y f x, y π α α 1 α α . 2 1 2 1 2

However, this is then recognizable as being simply a scaled version of the standard bivariate Cauchy density (compare with Equation (3)).

We now consider the marginal densities for the random variable (X, Y) defined by (2). From (2) | | | | | |   = U1 Z1 = U2 Z2 = Z1 = Z1 we have X | | and Y | | and the density of W1 | |   is a standard half-Cauchy U3 Z3 U3 Z3 Z3 Z3 ( )= 2 ( + 2)−1 density, i.e., fW1 w1 π 1 w1 .

112 Symmetry 2019, 11, 794

Consequently, the density of X is of the form −1 −1 α α 2 β β 2 ( ) = 2 γ γ 3 + 3 ( > ) + 2 γ ( − γ ) 3 + 3 ( < ) fX x π 1 3 α 1 α x I x 0 π 1 1 3 α 1 α x I x 0 1 1 1 1 ( −1 −1 α α 2 β β 2 2 3 3 3 3 + π (1 − γ ) γ β 1 + β x I (x < 0) +(1 − γ ) β 1 + β x I (x > 0) , 1 3 1 1 3 1 1

α β which is a mixture of half-Cauchy densities. The density of Y is then of the same form with 1, 1 α β replaced by 2, 2. To get a more direct bivariate version of the original Mudholkar-Hutson density in which the α’s, β’s and γ’s were related to each other, we go back to the original representation, i.e., Equation (2), α = ( + ε ) β = ( − ε ) γ = ( + ε ) α = ( + ε ) β = ( − ε ) but now we set 1 1 1 , 1 1 1 , 1 1 1 /2, 2 1 2 , 2 1 2 , γ = ( + ε ) α = β = γ = 2 1 2 /2 and 3 3 3 1. So the density is of the form

 − 2 2 3/2 ( ) = 1 + x + y ( > > ) f x, y π 1 + ε + ε I x 0, y 0 2 1 1 1 2  − 2 2 3/2 + 1 + x + y ( > < ) π 1 + ε − ε I x 0, y 0 2 1 1 1 2  − 2 2 3/2 + 1 + x + y ( < > ) π 1 − ε + ε I x 0, y 0 2 1 1 1 2  − 2 2 3/2 + 1 + x + y ( < < ) π 1 − ε − ε I x 0, y 0 2 1 1 1 2 with marginal density

 −  − 2 1 2 1 ( ) = 1 + x ( > ) + 1 + x ( < ) fX x π 1 + ε I x 0 π 1 − ε I x 0 . 1 1 1 1

γ = A more general version of the density with 3 1, is of the form: (without loss of generality set α = 3 1) ⎧ ⎫  − ⎨ 2 2 3/2 ⎬ ( ) = 2 γ γ 1 + x + y ( > > ) f x, y π ⎩ 1 2 α α 1 α α I x 0, y 0 ⎭ 1 2 1 2 ⎧ ⎫  − ⎨ 2 2 3/2 ⎬ + 2 γ ( − γ ) 1 + x + y ( > < ) π ⎩ 1 1 2 α β 1 α β I x 0, y 0 ⎭ 1 2 1 2 ⎧ ⎫  − ⎨ 2 2 3/2 ⎬ + 2 ( − γ )γ 1 + x + y ( < > ) π ⎩ 1 1 2 β α 1 β α I x 0, y 0 ⎭ 1 2 1 2 ⎧ ⎫   ⎨ 2 2 3/2 ⎬ + 2 ( − γ )( − γ ) 1 + x + y ( < < ) π ⎩ 1 1 1 2 β β 1 β β I x 0, y 0 ⎭ 1 2 1 2 with corresponding marginal

 −  − 2 1 2 1 ( ) = 2 γ 1 + x ( > ) + 2 ( − γ ) 1 + x ( < ) fX x π 1 α 1 α I x 0 π 1 1 β 1 β I x 0 . 1 1 1 1

113 Symmetry 2019, 11, 794

Needless to say we can consider analogous models in which, instead of assuming that | | | | ( )= Z1 Z2 W1, W2 |Z | , |Z | has a density of the form (3) we assume that it has another bivariate density +3 3 + + + with support R × R , e.g., bivariate normal restricted to R × R , or bivariate Pareto, etc. Another general bivariate MH model which includes the bivariate skew-Cauchy distribution | | ∼ χ2 given by (2) is the bivariate skew t model. This model can be obtained replacing Z3 by V3 ν in iid∼ ( ) ∼ χ2 Equation (2). Thus, if we assume that all six random variables Z1, Z2 N 0, 1 , V3 ν; U1, U2 and U3 are independent, and define | | | | ( )= U√1 Z1 U√2 Z2 X, Y ν , ν , (8) U3 V3/ U3 V3/ | | | | ( ) = √ Z1 √ Z2 then, because W1, W2 ν , ν has density V3/ V3/  −(ν+2)/2 2 w2 + w2 f (w , w ) = 1 + 1 2 I (w > 0, w > 0) , W1,W2 1 2 π ν 1 2 the density of (X, Y) is  f (x, y) = f (x, y |u , u , u ) f (u , u , u ) du du du  1 2 3 1 2 3 1 2 3 u2  = 3 f u3 x, u3 y u , u , u f (u , u , u ) du du du u1u2 W1,W2 u1 u2 1 2 3 1 2 3 1 2 3 −(ν+ )  2 2 2 2 /2 = 2 u3 + 1 u3 + 1 u3 ( ) π u u 1 ν u x ν u y f u1, u2, u3 du1du2du3 1 2 1 2 ( −(ν+2)/2 α2 α 2 α 2 2 3 1 3 1 3 = π γ γ γ α α 1 + ν α x + ν α y I(x > 0, y > 0) 1 2 3 1 2 1 2 ( −(ν+2)/2 β2 β 2 β 2 2 3 1 3 1 3 + π γ γ (1 − γ ) α α 1 + ν α x + ν α y I(x < 0, y < 0) 1 2 3 1 2 1 2 ( −(ν+2)/2 α2 α 2 α 2 2 3 1 3 1 3 + π γ (1 − γ )γ α β 1 + ν α x + ν β y I(x > 0, y < 0) 1 2 3 1 2 1 2 ( −(ν+2)/2 (9) β2 β 2 β 2 2 3 1 3 1 3 + π γ (1 − γ )(1 − γ ) α β 1 + ν α x + ν β y I(x < 0, y > 0) 1 2 3 1 2 1 2 ( −(ν+2)/2 α2 α 2 α 2 2 3 1 3 1 3 + π (1 − γ )γ γ β α 1 + ν β x + ν α y I(x < 0, y > 0) 1 2 3 1 2 1 2 ( −(ν+2)/2 β2 β 2 β 2 2 3 1 3 1 3 + π (1 − γ )γ (1 − γ ) β α 1 + ν β x + ν α y I(x > 0, y < 0) 1 2 3 1 2 1 2 ( −(ν+2)/2 α2 α 2 α 2 2 3 1 3 1 3 + π (1 − γ )(1 − γ )γ β β 1 + ν β x + ν β y I(x < 0, y < 0) 1 2 3 1 2 1 2 ( −(ν+2)/2 β2 β 2 β 2 2 3 1 3 1 3 + π (1 − γ )(1 − γ )(1 − γ ) β β 1 + ν β x + ν β y I(x > 0, y > 0) . 1 2 3 1 2 1 2

The pair of variables in (8) allows one to model a wider variety of paired data sets than that given in (2) because it also can model light and heavy “tails” in a different way for each quadrant of the coordinate axis. Additionally, it is possible to compute the r−order moments.

Proposition 3. The expected value of (X, Y) in (8) is given by √ ν − γ γ Γ (ν − ) E ( ) = √ (β ( − γ ) − α γ β ( − γ ) − α γ ) 1 3 − 3 /2 1/2 X, Y 1 1 1 1 1, 2 1 2 2 2 β α Γ (ν ) , π 3 3 /2 provided that ν > 1.

114 Symmetry 2019, 11, 794

√ √ = E ( ) E (| |) = (β ( − γ ) − α γ ) π E −1 = Proof. For i 1, 2 it is immediate that Ui Zi i 1 i i i 2/ and U3 ( − γ ) β − γ α 1 3 / 3 3/ 3. ∼ χ2 On the other hand, since V3 ν then √ √ √ ∞ ν/2−1 ν ν x − ν>1 νΓ (ν/2 − 1/2) E √ = √ e x/2dx = √ . ν/2Γ (ν ) V3 0 x 2 /2 2Γ (ν/2)

Therefore, from (8) we have √ ν E ( ) = (E ( ) E (| |) E ( ) E (| |)) E 1 E √ X, Y U1 Z1 , U2 Z2 U3 V3 and the result is obtained straightforward.

Following the proof of the previous proposition it is possible to obtain the r−order moments provided that ν > r. In applications, it will usually be appropriate to augment these models by the introduction of location, scale and rotation parameters, i.e., to consider ' ' X, Y = μ + Σ1/2 (X, Y) , where μ ∈ (−∞, ∞)2 and Σ is positive definite.

5. Application The data that we will use were collected by the Australian Institute of Sport and reported by Cook and Weisberg (1994) [ 10]. The data set consists of values of several variables measured on n = 202 Australian athletes. Specifically, we shall consider the pair of variables (Ht,Wt) which are the height (cm) and the weight (Kg) measured for each athlete. We fitted the bivariate Mudholkar-Hutson distributions for five different cases. In addition to ( α β γ ν μ Σ) = the general case which is given by the pdf f x, y; i, i, i, , , , based on (9), where i 1, 2, 3, μ = (μ μ ) and 1, 2 is the location parameter and  Σ Σ Σ = 11 12 Σ Σ 21 22 is the symmetric positive definite scale matrix, we also consider four special cases:

1. When the pdf is given by Equation (4). That is taking ν → ∞ in the bivariate skew t MH model. 2. When the pdf is given by 1 + ε f x, y; ε , μ , Σ , Σ , Σ = f x, y;1+ ε ,1− ε , i , μ, Σ , i j 11 22 12 i i 2 |ε | < which is the bivariate MH distribution specified using the special case (1.), where i 1 for all i = 1, 2, 3. 3. When the pdf is given by 1 + ε f x, y; ε, μ , Σ , Σ , Σ = f x, y;1+ ε,1− ε, , μ, Σ , j 11 22 12 2

which is the bivariate MH distribution specified using the special case (2.), where |ε| < 1.

115 Symmetry 2019, 11, 794

4. When the pdf is given by 1 f x, y; μ , Σ , Σ , Σ = f x, y;1,1, , μ, Σ , j 11 22 12 2

which is the bivariate Cauchy distribution (as in (3)).

All fits were done by maximizing the likelihood through numerical methods which combine algorithms based on the Hessian matrix and the Simulated Annealing algorithm. Standard errors of the estimations were computed based on 1000 bootstrap data samples. To compare model fits, we used the Akaike criterion (see Akaike, 1974 [11]), namely   AIC = −2ln L θˆ + 2k, where k is the dimension of θ which is the vector of parameters of the model being considered. Table 1 displays the results of the fits for 100 women. In Table 1, we see the results of fitting the five competing models. They are the general bivariate skew−t MH model with pdf given in Equation (9), General bivariate MH with pdf given in Equation (4), Bivariate MH (1.) with pdf given in Equation (5), Bivariate MH (2.) with pdf given in Equation (6) and Bivariate Cauchy. It shows the maximum likelihood estimates (mle’s) of the five models. The last column shows the estimated standard errors (se) of the estimates. Non identifiability of the bivariate skew−t MH model is evidenced by the huge value of the estimated standard error of the estimate of ν. However, the AIC criterion indicates that data are better fitted by the general bivariate skew−t MH (9) model. Figure 2 shows the contour lines of the fitted pdf.

Table 1. Bivariate Mudholkar-Hutson fits for women.

Bootstrap Standard Errors Model AIC (αi, βi, γi, μ, Σ) Estimates of the Estimates (α α α ) = ( ) ( ) 1, 2, 3 13.48, 16.62, 1.25 10.02, 29.96, 8.05 (β β β ) = ( ) ( ) 1, 2, 3 10.65, 36.47, 3.00 1.53, 17.79, 19.75 Bivariate (γ , γ , γ ) = (1.00, 0.97, 0.94) (0.006, 0.02, 0.03) 1354.5 1 2 3 skew t MH ν = 10.93 168.56 (μ μ ) = ( ) ( ) 1, 2 161.97, 52.50 2.18, 2.22 (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 1.82, 1.67, 1.58 17.80, 14.36, 14.33 (α α α ) = ( ) ( ) 1, 2, 3 13.26, 16.04, 1.05 2.70, 4.30, 0.90 (β β β ) = ( ) ( ) 1, 2, 3 10.64, 36.50, 4.99 0.92, 3.85, 3.64 General bivariate (γ γ γ ) = ( ) ( ) 1413.5 1, 2, 3 1.00, 0.98, 0.94 0.003, 0.02, 0.02 MH (μ μ ) = ( ) ( ) 1, 2 161.69, 50.90 1.79, 2.39 (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 1.13, 1.17, 1.09 1.57, 1.59, 1.47 (ε ε ε ) = ( − ) ( ) 1, 2, 3 0.85, 0.83, 1.00 0.07, 0.08, 0.003 Bivariate (μ μ ) = ( ) ( ) 1429.4 1, 2 186.30, 83.80 1.45, 1.47 MH (1.) (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 169.37, 334.65, 224.88 33.30, 51.34, 36.83 ε = 0.50 0.15 Bivariate (μ μ ) = ( ) ( ) 1453.8 1, 2 171.60, 62.30 1.86, 2.80 MH (2.) (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 32.49, 61.40, 33.96 8.59, 16.41, 10.87 (μ μ ) = ( ) ( ) Bivariate 1, 2 177.21, 69.08 1.64, 2.01 1665.5 (Σ Σ Σ ) = ( ) ( ) Cauchy 11, 22, 12 77.84, 124.86, 41.53 2.23, 1.39, 8.32

Table 2 displays the results of the fits for 102 men. In Table 2 we again compare five competing models. The maximum likelihood estimates (mle’s) of the five models, the corresponding Akaike criterion values and estimated standard errors of the estimates are displayed in the table. Again, the AIC indicates that the data are better fitted by the general bivariate skew−t MH (9) model. Figure 3 shows the contour lines of the fitted pdf. Non identifiability of the general bivariate skew −t MH (9)

116 Symmetry 2019, 11, 794 and General bivariate MH (4) models is shown by the large values of the estimated standard errors for some estimates.

Table 2. Bivariate Mudholkar-Hutson fits for men.

Bootstrap Standard Errors Model AIC (αi, βi, γi, μ, Σ) Estimates of the Estimates (α α α ) = ( ) ( ) 1, 2, 3 13.43, 20.16, 2.62 14.66, 5.34, 4.12 (β β β ) = ( ) ( ) 1, 2, 3 35.02, 11.84, 6.01 8.42, 1.82, 7.83 Bivariate (γ , γ , γ ) = (0.94, 0.96, 0.88) (0.02, 0.02, 0.04) 1396.9 1 2 3 skew t MH ν = 9.99 64.18 (μ μ ) = ( ) ( ) 1, 2 177.92, 70.30 0.65, 1.22 (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 4.77, 4.71, 4.04 1.89, 6.12, 1.83 (α α α ) = ( ) ( ) 1, 2, 3 52.54, 52.17, 14.58 0.89, 1.28, 0.43 (β β β ) = ( ) ( ) 1, 2, 3 29.83, 62.76, 31.76 1.52, 1.39, 0.59 General bivariate (γ γ γ ) = ( ) ( ) 1441.8 1, 2, 3 0.96, 0.98, 0.96 0.01, 0.01, 0.02 MH (μ μ ) = ( ) ( ) 1, 2 174.60, 66.6 1.13, 1.16 (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 9.32, 17.40, 11.78 0.76, 0.50, 0.55 (ε ε ε ) = (− − − ) ( ) 1, 2, 3 0.87, 0.91, 1.0 0.06, 0.05, 0.00 Bivariate (μ μ ) = ( ) ( ) 1456.2 1, 2 172.70, 61.00 1.39, 1.94 MH (1.) (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 195.26, 445.26, 279.93 32.88, 62.62, 38.52 ε = 0.17 0.21 Bivariate (μ μ ) = ( ) ( ) 1512.6 1, 2 185.10, 80.54 2.73, 3.54 MH (2.) (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 36.15, 70.27, 38.53 14.66, 26.93, 19.09 (μ μ ) = ( ) ( ) Bivariate 1, 2 184.45, 79.36 3.01, 3.43 1775.2 (Σ Σ Σ ) = ( ) ( ) Cauchy 11, 22, 12 65.57, 155.19, 59.31 3.62, 1.54, 8.29

&RQWRXUOLQHVRIILWWHGSGI 







 :W .J 





        +W FP

Figure 2. Female Australian athletes data: (Ht, Wt) and fitted General bivariate MH.

117 Symmetry 2019, 11, 794

&RQWRXUOLQHVRIILWWHGSGI









 :W .J 







      +W FP

Figure 3. Male Australian athletes data: scatter plot (Ht, Wt) and fitted General bivariate MH.

Table 3 displays the results of the fits for the full set of n = 202 Australian athletes, regardless of gender. Table 3 shows the maximum likelihood estimates (mle’s) of the five models together with the corresponding Akaike criterion values and estimated standard errors of the estimates. The AIC here also indicates that the data are better fitted by the general bivariate skew −t MH model (9). Figure 4 shows the contour lines of the fitted pdf. Thus, in all three cases, males, females and combined, the best fitting model was the general bivariate skew−t MH.

Table 3. Bivariate Mudholkar-Hutson fits.

Bootstrap Standard Errors Model AIC (αi, βi, γi, μ, Σ) Estimates of the Estimates (α α α ) = ( ) ( ) 1, 2, 3 14.15, 17.35, 1.09 8.18, 10.35, 3.57 (β β β ) = ( ) ( ) 1, 2, 3 49.35, 67.05, 1.62 5.22, 5.57, 6.86 Bivariate (γ , γ , γ ) = (0.98, 0.96, 0.91) (0.01, 0.01, 0.02) 2803.3 1 2 3 skew t MH ν = 10.93 9.42 (μ μ ) = ( ) ( ) 1, 2 171.09, 61.00 0.24, 0.76 (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 1.01, 1.32, 1.04 5.74, 7.75, 6.07 (α α α ) = ( ) ( ) 1, 2, 3 15.42, 17.67, 2.25 10.74, 12.92, 2.61 (β β β ) = ( ) ( ) 1, 2, 3 49.28, 66.53, 3.36 5.00, 5.97, 5.10 General bivariate (γ γ γ ) = ( ) ( ) 2901.4 1, 2, 3 0.98, 0.96, 0.88 0.01, 0.02, 0.02 MH (μ μ ) = ( ) ( ) 1, 2 170.54, 60.95 0.54, 1.02 (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 2.16, 3.02, 2.35 1.55, 1.84, 1.32 (ε ε ε ) = ( − ) ( ) 1, 2, 3 1.00, 0.39, 0.13 0.01, 0.07, 0.06 Bivariate (μ μ ) = ( ) ( ) 2984.0 1, 2 181.06, 74.48 0.76, 0.84 MH (1.) (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 11.41, 50.90, 18.73 1.59, 5.00, 2.17 ε = 0.70 0.10 Bivariate (μ μ ) = ( ) ( ) 3022.2 1, 2 171.37, 61.32 2.42, 3.07 MH (2.) (Σ Σ Σ ) = ( ) ( ) 11, 22, 12 48.62, 103.69, 64.24 53.89, 89.24, 68.07 (μ μ ) = ( ) ( ) Bivariate 1, 2 178.84, 71.81 0.83, 1.55 3536.9 (Σ Σ Σ ) = ( ) ( ) Cauchy 11, 22, 12 106.75, 199.84, 83.64 2.16, 1.16, 5.53

118 Symmetry 2019, 11, 794

&RQWRXUOLQHVRIILWWHGSGI











:W .J 









       +W FP

Figure 4. Australian athletes data: scatter plot (Ht, Wt) and fitted General bivariate MH. Red point for Men and sign + for women.

For the three real cases analyzed the general bivariate skew-t MH model (9) was indicated as the best fitted. That means that it seems worth considering the more general model to explain the variability of these data sets.

6. Concluding Remarks The Mudholkar–Hutson skewing mechanism admits flexible extensions in both the univariate and multivariate cases. Stochastic representations of such extended models typically have corresponding likelihood functions that are somewhat complicated (for example Equations (4) and (9)). This is partly compensated for by the ease of simulation for such models using the representation in terms of latent variables ( the Ui’s and Zi’s in (2)or(8)). The data set analyzed in Section 5 illustrates the potential advantage of considering the extended bivariate Mudholkar-Hutson model, since the basic bivariate M-H model does not provide an acceptable fit to the “athletes” data. Needless to say, applying Occam’s razor, it would always be desirable to consider the hierarchy of the five nested bivariate models that were considered in Section 5, in order to determine whether one of the simpler models might be adequate to describe a particular data set. Indeed there may be other special cases of the model (4), intermediate between (4) and (5) say, that might profitably be considered for some data sets. However, it should be remarked that selection of such sub-models for consideration should be done prior to inspecting the data. The addition of the GBMH model to the data analysts tool-kit should provide desirable flexibility for modeling data sets which exhibit behavior somewhat akin to, but not perfectly adapted to, more standard bivariate Cauchy models.

Author Contributions: All the authors contributed significantly to this research article. Funding: This research received no external funding. Acknowledgments: The research of H. Varela and H.W. Gómez was supported by Grant SEMILLERO UA-2019 (Chile). Conflicts of Interest: The authors declare no conflict of interest.

119 Symmetry 2019, 11, 794

References

1. Mudholkar, G.S.; Hutson A.B. The epsilon-skew-normal distribution for analyzing near-normal data. J. Stat. Plan. Inference 2000, 83, 291–309. [CrossRef] 2. Arellano-Valle, R.B.; Gómez, H.W.; Quintana, F.A. Statistical inference for a general class of asymmetric distributions. J. Stat. Plan. Inference 2005, 128, 427–443. [CrossRef] 3. Gómez, H.W.; Torres, F.J.; Bolfarine, H. Large-Sample Inference for the Epsilon-Skew-t Distribution. Commun. Stat. Theory Methods 2007, 36, 73–81. [CrossRef] 4. Hansen, B.E. Autoregressive conditional . Int. Econ. Rev. 1994, 35, 705–730. [CrossRef] 5. Gómez, H.W.; Torres, F.J.; Bolfarine, H. Letter to the Editor. Commun. Stat. Theory Methods 2008, 37, 986. [CrossRef] 6. Johnson, N.; Kotz, S.L.; Balakrishnan, N. Continuous Univariate Distributions, 2nd ed.; John Wiley & Sons: New York, NY, USA, 1994; Volume 1. 7. Behboodian, J.; Jamalizaded, A.; Balakrishnan, N. A new class of skew-Cauchy distributions.Stat. Probab. Lett. 2006, 76, 1488–1493. [CrossRef] 8. Huang, W.J.; Chen, Y.H. Generalized skew-Cauchy distribution. Stat. Probab. Lett. 2007, 77, 1137–1147. [CrossRef] 9. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178. 10. Cook, R.D.; Weisberg, S. An Introduction to Regression Graphics; John Wiley & Sons: New York, NY, USA, 1994. 11. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723. [CrossRef]

c 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

120

S symmetry S

Article A Note on Ordering Probability Distributions by Skewness

V. J. García 1,†, M. Martel-Escobar 2,† and F. J. Vázquez-Polo 2,*,† 1 Department of Statistics and Operations Research, University of Cádiz, 11002 Cádiz, Spain; [email protected] 2 Department of Quantitative Methods & TiDES Institute, University of Las Palmas de Gran Canaria, 35017 Las Palmas de Gran Canaria, Las Palmas, Spain; [email protected] or [email protected] * Correspondence: [email protected]; Tel.: +34-928-451-800 † These authors contributed equally to this work.

Received: 15 June 2018; Accepted: 10 July 2018; Published: 16 July 2018

Abstract: This paper describes a complementary tool for fitting probabilistic distributions in data analysis. First, we examine the well known bivariate index of skewness and the aggregate skewness function, and then introduce orderings of the skewness of probability distributions. Using an example, we highlight the advantages of this approach and then present results for these orderings in common uniparametric families of continuous distributions, showing that the orderings are well suited to the intuitive conception of skewness and, moreover, that the skewness can be controlled via the parameter values.

Keywords: positive and negative skewness; ordering; fitting distributions

MSC: 62E10; 62P99

1. Introduction Detailed knowledge of the characteristics of probability models is desirable (if not essential) if data are to be modeled properly. In studying these properties, many authors have considered orderings within families, according to diverse measuring criteria. The usual approach taken by researchers in this field is to evaluate or measure one or more theoretical characteristics of a given distribution and to study the effect produced by the value of its parameters on this measurement. In , stochastic orders are widely used in order to make risk comparisons [1]. Some parametric distributions can be ordered according to the evaluation made of a given property, merely by comparing some of its parameters. Although most related orders are actually preorders, each one presents interesting applications. Many studies have been conducted in this area, and the following are particularly significant: Lehmann (1955) [ 2], which is of seminal importance; Arnold (1987)3 [], who compared random variables according to stochastic ordering in a particular Lorenz order; Shaked and Shanthikumar (2006)1 [ ], on stochastic orders; Nanda and Shaked (2001)4 [ ], on reversed hazard rate orders; Ramos-Romero and Sordo-Díaz (2001)5 [], on the likelihood ratio order; and Gupta and Aziz (2010) [6], on convex orders. In this paper, we study the relationship between the skewness of some parametric distributions and the value of one of their parameters. The first question to be addressed is that of measuring the skewness. In this respect, Oja (1981) [7] introduced a set of axioms to be verified by any measurement of skewness considered. These axioms were established for indexes of skewness with one main constraint: that the skewness of a distribution should be evaluated by a single real number. This point is discussed below.

Symmetry 2018, 10, 286; doi:10.3390/sym10070286121 www.mdpi.com/journal/symmetry Symmetry 2018, 10, 286

Many authors have proposed and obtained different descriptive elements to measure skewness (see, for instance, [8–13]). Ref [10] suggested a measurement of skewness corresponding to the (unique) mode, M, given by the following index:

γ ( ) = − ( ) M F 1 2F M . (1)

Ref [10] applied this index to ordering the gamma, log-logistic, lognormal and Weibull families of distributions by their skewness, taking into account the feasible values of their respective parameters. Index (1), which is proven to satisfy those axioms derived from Oja (1981) [ 7], is also recommended in [14] as a (very) good index of skewness. However, notice that (1) only compares the probability weight on the left side of a central point (the mode) with the value/ 12, but it does not account for how the weights are distributed to each side of the centre. García et al. (2015) [ 15] introduced some further elements to be incorporated into the list of skewness measurements of a probability distribution. According to these authors, given a unimodal probability distribution F (x), its skewness is considered to be a local function of a given distance, z, from the mode, M. For such a distance, and given the interval[M − z, M + z], the aggregate skewness ν ( ) function, F z , compares the probability weight of F at either side of the interval:

ν ( ) = ( > + ) − ( < − ) F z Pr X M z Pr X M z , (2) where z ≥ 0. Thus, the (maximum) right skewness of the distribution F and its (minimum) left skewness are respectively given by

+ ( ) = ν ( ) − ( ) = ν ( ) S F max F z , S F min F z . (3) z≥0 z≥0

The distances,zp and zn, where these extreme values are achieved, are termed the critical distances [− ] ν (∞) = to the mode. As the skewness function is bounded inside the interval 1, 1 and F 0, − + the bivariate indexS (F) , S (F) belongs to [−1, 0] × [0, 1]. A given distribution function F such ν ( ) ≥ ≥ ν ( ) ≤ ≥ thatF z 0 for all z 0 is said to be only skewed to the right; and if F z 0 for all z 0, it is said to be only skewed to the left. < − −1 [ ( )] The relationship F c G (Fc precedes G) means that G F x is a convex function. For a − + continuous distributionF, the bivariate measurement of skewness S (F) , S (F) verifies the following properties, whereaF + b and −F mean the distributions of the corresponding transformation of a random variable that is F−distributed: − + 1. S (F) , S (F) = 0, for any symmetric distribution F. − + − + 2. S (aF + b) , S (aF + b) = S (F) , S (F) , for all a > 0, −∞ < b < ∞. − + + − 3. S (−F) , S (− F) = −S (F) , −S (−F) . − + − + 4. If F

These properties can be considered as a vectorial interpretation of the axioms given by Oja (1981) [7]. ν ( ) = γ ( ) As it is easily proven that F 0 M F , we can establish that (2) and (3) give considerably clearer and more complete information than (1) about the skewness of any distribution function. Most families of continuous distributions are only skewed to the right (or only to the left), while doubles-sign skewness is abundant within the discrete families, as shown in [ 15]. Nevertheless, the joint use of the function (2) and the bivariate index (3) makes it possible to improve the ordering of the skewness-based distribution discussed in [10], as can be seen in the following example.

122 Symmetry 2018, 10, 286

Example 1. Assume the following random variable X ∈ [−2, ∞) with PDF given by: ⎧ ⎪ 1 1 ⎨⎪ x + , −2 ≤ x < 0; 4 2 f (x) = ⎪ ⎩⎪ 1 exp (−x) , x ≥ 0. 2 ( ) = − γ ( ) = = γ ( ) Assume also the PDF, g y of Y X. Then, M F 0 M G . That is, according to coefficient γ (·) M , both distributions have the same null skewness, although they do not even have a symmetric support set. However, using expression (2), we find that ⎧ ⎪ 1 1 2 ⎨⎪ exp (−z) − (z − 2) ,0≤ z < 2; 2 8 ν (z) = F ⎪ ⎩⎪ 1 exp (−z) , z ≥ 2, 2 ν ( ) = −ν ( ) ≥ and G z F z , for all z 0. These functions are plotted in Figure 1, where it can be seen that ν ( ) ≥ ν ( ) ≥ + ( ) = − − ( ) = − ( ) = = + ( ) F z G z for all z 0, S F S G 0 and S F 0 S G . Clearly, the information + − about skewness obtained from the aggregate skewness function ν (z) and the indices S (·) and S (·) is γ (·) considerably more comprehensive than that obtained from M .

ν ( ) ν ( ) Figure 1. Skewness functions F z and G z in Example 1.

Outline In applied statistical analysis, it is useful to have a large catalogue of plausible distributions with which to fit the data. According to García et al. (2015) [ 15], common measures of skewness can be complemented with a bivariate index of positive–negative skewness, and the authors show that the mode is the relevant central value to study both right and left skewness. In this paper, we extend the tool-box approach to fit data from probability distributions, introducing two orderings that are deduced from the skewness measures given in15 [ ]. The first of those orderings is based on the positive γ ( ) part of the bivariate index of skewness, which in many instances coincides with the well known M F . Nevertheless, the differences can be highly significant, as in the previous example. The second, more

123 Symmetry 2018, 10, 286

ν ( ) noteworthy, order is based on the skewness function F z and meets the first of the conditions, but not the reciprocal. There are two reasons for ordering a family of distributions according to a given measurement of skewness. Firstly, as a property of the distribution, this ordering allows us to control its skewness by the appropriate selection of the parameter. When this is done (and the parameter is readily determined), the theoretical results have immediate applications in the data-fitting process. Secondly, when a given family of distributions is conceived as being more or less skewed according to the value of a parameter, and a measurement of skewness ratifies the ordering, it may be concluded that the functioning of this measurement provides a reasonably good fit with an intuitive conception of skewness. The rest of this paper is organized as follows. In Section 2, we study the aggregate skewness function and the resultant skewness-based ordering of the gamma, log-logistic, lognormal, Weibull and asymmetric Laplace families of continuous probability distributions. In Section 3, we study the ordering of two of the most well-known distributions commonly used in PERT methods: the beta and the asymmetric triangular distributions. Finally, conclusions are presented in Section 4.

2. Families of Uniparametric Distributions Ordered by Skewness Let F and G be unimodal distribution functions, with no centre or scale parameters, and modes MF and MG, respectively. We compare their respective skewness by two different criteria.

Definition 1. If ν ( ) − ν ( ) ≥ ∀ ≥ F z G z 0, z 0, (4) then we say that F has equal or more aggregate skewness to the right at any point than G. We denote this by F ≥ν G.

Definition 2. If F and G are both skewed only to the right, we say thatF has equal or more maximum aggregate skewness to the right than G when + + S (F) − S (G) ≥ 0, (5) and we denote this by F ≥+ G.

With these definitions, it immediately follows that:

Proposition 1. If F ≥ν G, then F ≥+ G. The reverse implication is not true in general.

Proof. The proof follows immediately from the definitions given in (4) and (5).

In the next section, we consider some well known uniparametric families of continuous distributions, with no centre or scale parameters but depending on a skewness parameter, and examine whether they are ordered by aggregate skewness, or by maximum aggregate skewness. The gamma family is a very broad one, which includes many other well known distributions as particular cases. A study of the log-logistic, lognormal, Weibull and asymmetric Laplace families, one by one and in turn, when not included inside the previous one, will produce widely varying results.

2.1. Uniparametric Gamma Distributions Let X be a uniparametric gamma distributed random variable, G(α). That is, its CDF G (x; α) is given by 1 x α − G (x; α) = t e tdt, (6) Γ (1 + α) 0

124 Symmetry 2018, 10, 286 for x > 0, where −1 < α < ∞, and the mode is given by M = max {α,0}. Then, for −1 < α ≤ 0, the density function decreases on x along the positive real line and we obtain that

∞ ν ( α) = − ( α) = 1 α −t G z; 1 G z; t e dt. Γ (1 + α) z ν ( α) + ( (α )) = ν ( α ) = In these cases, G z; is a decreasing function on z, and S G 1 G 0; 1 1.

(α ) (α ) Proposition 2. Let G 1 and G 2 be gamma distributions with CDF as in (6). Then: − < α < α < (α ) ≥ (α ) 1. If 1 1 2 0, then G 2 ν G 1 . < α < α (α ) ≥ (α ) 2. If 0 1 2, then G 1 + G 2 .

Proof. Part 1. We can write

ν ( α ) − ν ( α ) = ( α ) − ( α ) G z; 1 G z; 2 G z; 2 G z; 1 .

d By denoting α = α + ε, ε > 0, and then considering u (z) = [ν (z; α ) − ν (z; α )] , we obtain 2 1 dz G 1 G 2 α −z α −z ε z 2 e z 1 e α − z Γ (1 + α ) − Γ (1 + α ) ( ) = − = 1 z 1 2 u z Γ ( + α ) Γ ( + α ) z e Γ ( + α ) Γ ( + α ) . 1 2 1 1 1 2 1 1 ( ) = Therefore, u z 0 when Γ ( + α ) 1/ε = = 1 2 z z0 Γ ( + α ) , 1 1 ( ) < < > ν ( + α ) − ν ( + α ) = u z is negative for 0 z z0, and positive for z z0. Also, G 0 ; 1 G 0 ; 2 0, ν (∞ α ) − ν (∞ α ) = G ; 1 G ; 2 0. Then, 1 x0 α − ε ν ( α ) − ν ( α ) = 1 t [Γ ( + α ) − Γ ( + α )] G z0; 1 G z0; 2 Γ ( + α ) Γ ( + α ) t e 1 1 t 1 2 dt, 1 1 1 2 0 is the integral of a negative function, so it is negative, and the proof is complete. Part 2. For 0 < α < ∞, we have that ⎧  ∞  α− ⎪ 1 α −t − z α −t ≤ < α ⎨ Γ(1+α) z+α t e dt 0 t e dt ,0 z , ν ( α) = G z; ⎪  ∞ ⎩ 1 α −t ≥ α Γ(1+α) z+α t e dt, z . and, ⎧ ⎪ α −(α− ) α −(α+ ) ⎨⎪ 1 (α − z) e z − (α + z) e z ,0≤ z < α, ν Γ(1+α) d G = dz ⎪ ⎩ −1 α −z ≥ α Γ(1+α) z e , z . ν < ≥ α ≤ < α Then, clearly we have that d G/dz 0 for all z . For 0 z , if we denote

α α − w (z) = (α − z) ez − (α + z) e x,

ν ( ) ( ) = (α) = − ( α)α −α < then the sign of d G/dz is the sign of w x .Asw 0 0, w 2 e 0, and

dw α− − α− = z (z + α) 1 e z − z (α − z) 1 ez ≤ 0, dz

125 Symmetry 2018, 10, 286

ν ≥ + ( (α)) = ν ( α) we conclude that G is a decreasing function on z 0 and S G G 0; .

+ Γ (1 + α, α) S (G (α)) = , Γ (1 + α)

+ where Γ (1 + α, α) is the incomplete Gamma function, and then S (G (α)) is a decreasing function α α → ∞ ν ( α ) < α < α on , when . Nevertheless, a simple plotting of the functionals G z; i for any 0 1 2 shows that both functionals cross each other and that they are not ordered by ” ≥ν”. Thus, the proof is completed.

2.2. Log–Logistic Distributions The CDF of a uniparametric log–logistic distributed random variable X is given by −1 ( θ) = + −θ FLL x; 1 x , (7) for x > 0, with θ > 0. The mode of these distributions depends on θ.If0< θ ≤ 1, then M = 0, and

1 ν (z; θ) = , LL 1 + zθ + ( (θ)) = ν ( θ) θ and S FLL 1. The functionals LL z; for different values of inside the rank cross each other at z = 1, and these distributions are ordered neither by skewness function nor by skewness indexes. Nevertheless, for θ > 1, the mode is

θ θ − 1 1/ 0 < M = < 1. (8) θ + 1

Notice that M is an increasing function of θ when θ > 1, because dM 2 1 θ − 1 = M · − ln > 0. (9) dθ (θ2 − 1) θ θ2 θ + 1

When θ > 1, it is also known from Arnold and Groeneveld (1995) that

ν ( θ) = 1 LL 0; θ .

ν ( θ) < θ < θ (θ ) ≥ (θ ) As LŁ z; is a decreasing function, it is then stated that 1 1 2 implies FLL 1 + FLL 2 . Furthermore, the skewness functions are ordered, as we prove below.

(θ ) (θ ) < θ < θ Proposition 3. Let be FLL 1 and FLL 2 log-logistic distributions with CDF as in (7), where 1 1 2. Then, (θ ) ≥ (θ ) FLL 1 ν FLL 2 . (10)

Proof. Let θ > 1. Then,

⎧ θ ⎪ 1 − M2 − z2 ⎪ ≤ ≤ ⎨⎪ θ θ θ ,0 z M, 1 + (M + z) + (M − z) + (M2 − z2) ν ( θ) = LL z; ⎪ ⎪ ⎪ 1 > ⎩ θ , z M, 1 + (M + z)

126 Symmetry 2018, 10, 286

< θ < θ < < < If we consider 1 1 2, such that the respective modes verify 0 M1 M2 1, we can then denote θ θ = ( − ) 1 < = ( + ) 1 a M1 z b M1 z , θ θ = ( − ) 2 < = ( + ) 2 c M2 z d M2 z . and consider the function h given by

θ h (θ) = (M ± z) ,0≤ z ≤ M, with M as in (8). Then, ( ± )θ−1 θ θ + dh = M z 2 + θ ( + θ)2 ( ± ) ( ± ) + ( + θ)2 1 θ− 1 M z ln M z 1 M ln , dθ θ (θ + 1)2 M 1 θ − 1 < < < ν ( θ ) − ν ( θ ) For z M, this implies that a c, b d. With this notation, we can write LL z; 1 LL z; 2 as follows. ≤ ≤ Firstly, for 0 z M1,

1 − ab 1 − cd ν (z; θ ) − ν (z; θ ) = − LL 1 LL 2 (1 + a)(1 + b) (1 + c)(1 + d) (c − a) + (d − b) + ac (d − b) + bd (c − a) + 2 (cd − ab) = > 0. (a + 1)(b + 1)(c + 1)(d + 1) < ≤ − Secondly, for M1 z M2, we only need to compare d b, because

1 1 − cd ν (z; θ ) − ν (z; θ ) = − LL 1 LL 2 1 + b (1 + c)(1 + d) c + (d − b) + 2cd + bcd = > 0. (1 + b)(1 + c)(1 + d) > Finally, when z M2,

1 1 d − b ν (z; θ ) − ν (z; θ ) = − = > 0. LL 1 LL 2 1 + b 1 + d (1 + b)(1 + d)

Hence, the proof is completed.

2.3. Lognormal Variance Distributions ( σ) = Φ ln x LN x; σ , (11) σ > Φ (·) for x, 0, where is the standard normal distribution function. The mode is given by Mσ = exp −σ2 and       + −σ2 −σ2 − ν ( σ) = − Φ ln z exp − Φ ln exp z LN z; 1 σ σ .

(σ ) (σ ) < σ < σ Proposition 4. Let LN 1 and LN 2 be lognormal distributions with CDF as in (11), where 0 1 2. Then, (σ ) ≥ (σ ) LN 2 ν LN 1 .

127 Symmetry 2018, 10, 286

< σ < σ > Proof. For 0 1 2, the corresponding modes are M1 M2, and ( + ) ( + ) Φ ln z M1 > Φ ln z M2 σ σ , 1 2 (− + ) (− + ) Φ ln z M1 > Φ ln z M2 σ σ , 1 2 Φ ν ( σ ) > ν ( σ ) > because is a strictly increasing function. Thus, we obtain that LN z; 1 LN z; 2 for all z 0 and the proof is completed.

2.4. Uniparametric Weibull Distributions Consider the uniparametric Weibull distributions family given by the CDF

W (x; c) = 1 − exp (−xc) , x > 0, c > 0. (12)

The mode is known to be at 0, for c ≤ 1 (as a limit, when c < 1) and at c − 1 1/c 0 < M = < 1, c c > ν for c 1. The expression for W is given by ⎧     ⎪ c c ⎨ exp − (Mc + z) + exp − (Mc − z) − 1, 0 < z < Mc ν ( ) = W z; c ⎪   ⎩ c exp − (Mc + z) , z ≥ Mc

< ν ( ) = −1 On the one hand, when c 1, note that W(c) 1 e , so all these functions intersect at this + point. Graphically, it can be seen that there is no ordering by≥ “ ν”, and also thatS (W (c)) = 1, when < ≤ < c 1. On the other hand, for 1 c1 c2, the following result is obtained.

( ) ( ) ≤ < Proposition 5. Let W c1 and W c2 be Weibull distributions with 1 c1 c2 and CDF as in (12). Then,

( ) ≥ ( ) W c1 ν W c2 .

≤ < < < < Proof. For 1 c1 c2, the corresponding modes are M1 M2. Then, for 0 z M1, .    / ν ( ) − ν ( ) = − ( + )c1 − − ( + )c2 W z; c1 W z; c2 exp M1 z exp M2 z .    / + − ( − )c1 − − ( − )c2 > exp M1 z exp M2 z 0, {·} ≤ < because each part of the expression inside brackets is positive. If we take M1 z M2, then .    / ν ( ) − ν ( ) = − ( + )c1 − − ( + )c2 W z; c1 W z; c2 exp M1 z exp M2 z .  / + − − ( − )c2 > 1 exp M2 z 0, > for a similar reason. Finally, if we take z M2, then     ν ( ) − ν ( ) = − ( + )c1 − − ( + )c2 > W z; c1 W z; c2 exp M1 z exp M2 z 0, and the proof is completed.

128 Symmetry 2018, 10, 286

2.5. Asymmetric Laplace Distributions The asymmetric Laplace distribution has been introduced in the literature by different ways16 ([ ,17]). In this paper we will use Kozubowski and Podgórski (2002) [ 18] (later refined in [ 19]) to refer it. This distribution is obtained by using the scheme introduced by Fernández and Steel (1998) [20]to produce skewness on a symmetric distribution. In this way, the pdf of a skewed or asymmetric Laplace distribution can be written in the form ⎧ √ √ ⎨ κ 2 exp − 2 (μ − x) , x < μ, σ 1+κ2 κσ f (x; μ, σ, κ) = √ √ ⎩ 2 κ − κ 2 ( − μ) ≥ μ σ 1+κ2 exp σ x , x , where σ, κ > 0, and −∞ < μ < ∞. Then, we assign values (0, 1) to the centre and scale parameters (μ and σ, respectively) in order to study the aggregate skewness function, and the extreme right and left skewness indices then depend only on the skewness parameterκ > 0. Thus, it is easily proven that: 1. The aggregate skewness function of an AL (κ) distribution can be written as   √  √ 1 2 ν (z; κ) = exp − 2κz − κ2 exp − z . AL 1 + κ2 κ

ν ( κ) κ > 2. AL z; is an increasing negative function of z when 1, and it is a decreasing positive < κ < ν ( ) = ≥ function ofz when 0 1. AL,1 z;1 0, for all z 0. That is, any AL distribution is skewed only to the right or to the left, depending onκ. In any case, the function verifies ν ( κ) = κ = limz→∞ AL z; 0 but, when 1, the function never reaches that limit value. To prove these results, it is sufficient to note that √   √  √ dν (z; κ) 2κ 2 AL = exp − z − exp − 2κz . dz κ2 + 1 κ

3. At z = 0, the skewness function takes the following value:

1 − κ2 ν (0; κ) = . AL 1 + κ2 ν ( κ) + ( (κ)) − ( (κ)) Then, AL 0; is the value for S FAL or S FAL , depending on its sign. ν ( κ) κ 4. AL z; is a strictly decreasing function on . This is easily shown by means of √ √ ν ( κ) κ2 + κ + √ √ d AL z; = − 2z 2 2z − z + − κ < κ 2 exp 2κ exp 2 z 0, d (κ2 + 1)

for all z > 0, and all κ > 0. As a conclusion, we can enunciate the following Proposition, whose proof is straightforward and hence omitted.

< κ < κ < ∞ (κ ) (κ ) Proposition 6. Assume 0 1 2 , and let FAL 1 and FAL 2 be the respective asymmetric Laplace distributions. Then: (κ ) ≥ (κ ) 1. FAL 1 ν FAL 2 . < κ < (κ ) 2. If 0 1 1, then FAL 1 is skewed only to the right. κ > (κ ) 3. If 1 1, then FAL 1 is skewed only to the left.

3. The Beta and the AST Distributions The methods for Project Management and Review Technique (PERT) are well known and widely applied when the needed activities for a given project must be ordered according to precedence in

129 Symmetry 2018, 10, 286 time. Some of these methods require modelling the time length of each activity as a random variable, following an expert’s opinion. The beta and the asymmetric triangular distributions are commonly used by engineers to describe these time lengths. In any case, the indications of the experts can be related to a maximum and a minimum values and a mode, often completed with further considerations about the shape and skewness of the PDF of the time random variable. Then, a deep study of the skewness of both families of probability distributions would be welcome to improve the model fit. On the one hand, the asymmetric standard (ASTD) , free of center and scale parameters, depends on only one parameter 0 ≤ θ ≤ 1, and has the pdf: ⎧ − ⎨⎪ 2xθ 1,0≤ x ≤ θ, − f (x|θ) = 2 (1 − x)(1 − θ) 1 , θ ≤ x ≤ 1, ⎩⎪ 0, elsewhere.

There is a large body of literature that shows the use of the ASTD in PERT methods (see [ 21] and [19] and cites therein). Note that cases θ = 0, 1 are members of the beta family of distributions. For 0 < θ < 1, the ASTD(θ) CDF can be written as follows: − x2θ 1,0≤ x ≤ θ, F (x|θ) = − 2x − x2 − θ (1 − θ) 1 , θ ≤ x ≤ 1.

As the mode is found to be at x = θ, its skewness function is found to be

(1 − 2θ) ν (z; θ) = (1 − 2θ) − z2, ASTD θ (1 − θ) for 0 ≤ z ≤ min {θ,1− θ} . In the case θ = 0.5, the skewness function is null. Then, for 0 < θ < 0.5 and θ < z ≤ 1 − θ, (z − 1 + θ)2 ν (z; θ) = . ASTD 1 − θ In the case 0.5 < θ < 1, for 1 − θ < z ≤ θ,

(θ − )2 ν ( θ) = − z ASTD z; θ , and it is easily found that ν ( θ) = −ν ( − θ) ASTD z; ASTD z;1 , (13) for 0 ≤ z < ∞. < θ < θ < Some algebra allows to prove that, being 0 1 2 1, (θ ) ≥ (θ ) 1. ASTD 1 ν ASTD 2 . < θ < + (θ ) = ν ( θ ) = − θ − (θ ) = 2. If 0 i 0.5, then SASTD i ASTD 0; i 1 2 i, and SASTD i 0. < θ < + (θ ) = − (θ ) = ν ( θ ) = − θ 3. If 0.5 i 1, then SASTD i 0 and SASTD i ASTD 0; i 1 2 i.

Therefore, the skewness of the ASTD distributions is completely controlled by the parameter θ.

On the other hand, the pdf of a beta distribution is given by

α− β− x 1 (1 − x) 1 f (x; α, β) = ,0≤ x ≤ 1, B B (α, β) where α, β > 0, and B (·, ·) is the beta function. Given that its CDF F (x; α, β) verifies that F (x; α, β) = 1 − F (x; β, α) and the sign of its skewness depends only on the condition β ≥ α or β ≤ α, we can study only the case β > α.

130 Symmetry 2018, 10, 286

We are interested on the cases α, β > 1, where there is an unique mode M,

α − 1 . M = = b (α, β) . α + β − 2

Hence, we only consider cases where 1 < α < β, where there exists a right skewness; the cases 1 < β < α, with left skewness, can be immediately deducted by taking the parameters in reverse. Notice that Pr (X > M + z) > 0 requires 0 ≤ z ≤ b (β, α), and that Pr (X < M − z) > 0 requires 0 ≤ z ≤ b (α, β). Then, ⎧ ⎪ − (α β) − (α β) > ≤ ≤ (α β) ⎨ 1 IM+z , IM−z , 0, 0 z b , ν (z; α, β) = 1 − I + (α, β) > 0, b (α, β) < z ≤ b (β, α) (14) B ⎩⎪ M z 0, z > b (β, α) , where, α− β− z t 1 (1 − t) 1 dt Iz (α, β) = 0 B (α, β) is the well known Beta Regularized function. Firstly, observe that M ν ( α β) = − 2 α−1 ( − )β−1 B 0; , 1 x 1 x dx, B (α, β) 0 and 1 M α− β− 1 m α− β− 1 x 1 (1 − x) 1 dx < x 1 (1 − x) 1 dx ≈ , B (α, β) 0 B (α, β) 0 2 where α − 1 m = 3 α + β − 2 3 is the approximate median of the distribution. Secondly, if 0 ≤ z ≤ b (α, β) < b (β, α) , then

dν (z; α, β) α− β− α− β− B (α, β) · B = − [b (α, β) + z] 1 [b (β, α) − z] 1 − [b (α, β) − z] 1 [b (β, α) + z] 1 , dz which is negative within the rank of z. For b (α, β) < z ≤ b (β, α),

dν (z; α, β) α− β− B (α, β) · B = − [b (α, β) + z] 1 [b (β, α) − z] 1 < 0. dz ≤ ≤ (β α) ν ( α β) Hence, for 0 z b , , B z; , is a strictly decreasing continuous function with ν ( α β) > ν ( (β α) α β) = B 0; , 0 and B b , ; , 0. Now we focus on the family of Beta distributions with given mode, M. That is, we consider the subfamily of Beta distributions: 1 − M B α + 1, 1 + α , M with α > 0. Then, with the aid of a proper software (we have used Wolfram Mathematica 10), one can obtain the derivative ∂ 1 − M ν z; α + 1, 1 + α , ∂α B M and maximize this function, in two cases: First case, the constrains are α ≥ 1, 0 < m < 1/2, 0 ≤ z ≤ b (α + 1, 1 + (1 − M) α/M) . The maximum value of the function is 0, and it is achieved when M = 0.5, α  3.54147, z  0.309936.

131 Symmetry 2018, 10, 286

Second case, the constrain are α ≥ 1, 0 < m < 1/2, b (α + 1, 1 + (1 − M) α/M) < z ≤ − b (1 + (1 − M) α/M, α + 1) . The maximum value of the function is−5.07056× 10 6, and it is achieved when M = 0.123564, α  1.62726, z  0.632457. ν ( α + + ( − ) α ) With these results, we can conclude that B z; 1, 1 1 M /M decreases with the feasible values of α. That way, the subsets of Beta distributions with fixed mode are ordered on skewness (see Figure 2). As the parameter values increase, these Beta distributions become less skewed.

Figure 2. Beta distributions with common given modeM = 0.2 (left panel) for α = 2, 4 and 9 and their ν skewness functions B (right panel).

4. Conclusions In this paper two main objectives are achieved: on the one hand, the given examples show that the skewness function orders the mesh in good accordance with the intuitive conception of skewness. Moreover, these examples show that the skewness of a distribution obtained from certain parametric families can be controlled by reference to their parameters. ν ( ) As we show, the function F z facilitates the description of a random variable by means of a probability distribution, by making any skewness in the model easily observable and should be undertaken to examine the use of these properties in data fitting. In practice, much can be learned from this model, but there remains the risk that it may be wrongly specified in real applications. Thus, in practice we must be willing to assume that the underlying distribution has a unique mode and belongs to a uniparametric family of distributions. γ ( ) In many practical situations, the maximum skewness index coincides with the well knownM F , but this second index only takes into account the difference of probability weights at each side of the mode, while the first takes a value from the point where this difference is maximum. Moreover, the aggregate skewness function gives more accurate information about how the probability weight is distributed along both sides of the mode. Accordingly, the conditionF ≥ν G provides highly valuable information.

Author Contributions: All authors have contributed equally to this paper. Funding: This research received no external funding. Acknowledgments: This research was partially funded by MINECO (Spain) grant number EC02017–85577–P. The authors are grateful for helpful suggestions made by two reviewers. Conflicts of Interest: The authors declare no conflict of interest.

References

1. Shaked, M.; Shanthikumar, J.G. Stochastic Orders. In Springer Series in Statistics 43; Springer: New York, NY, USA, 2007. 2. Lehmann, E.L. Ordered families of distributions. Ann. Math. Statist. 1955, 26, 399–419. [CrossRef]

132 Symmetry 2018, 10, 286

3. Arnold, B.C. Majorization and the Lorenz Order: A brief introduction. In Lecture Notes in Statistics 43; Springer: New York, NY, USA, 1987. 4. Nanda, A.K.; Shaked, M. The hazard rate and reverse hazard rate orders, with applications to order statistics. Ann. Inst. Statist. Math. 2001, 53, 853–864. [CrossRef] 5. Ramos–Romero, H.M.; Sordo–Díaz, M.A. The proportional likelihood ratio order and applications. Questiio 2001, 25, 211–223. 6. Gupta, A.K.; Aziz, M.A.S. Convex Ordering of Random Variables and its Applications in Econometrics and Actuarial Science. Eur. J. Pure Appl. Math. 2010, 3, 79–85. 7. Oja, H. On location, scale, skewness and kurtosis of univariate distributions. Scand. J. Stat.1981, 8, 154–168. 8. Van Zwet, W.R. Mean, Median, Mode II. Stat. Neerl. 1979, 33, 1–5. [CrossRef] 9. MacGillivray, H.L. Skewness and asymmetry: measures and orderings. Ann. Stat. 1986, 14, 994–1011. [CrossRef] 10. Arnold, B.C.; Groeneveld, R.A. Measuring Skewness with respect to the Mode. Am. Stat. 1995, 49, 34–38. 11. Sato, M. Some remarks on the mean, median, mode and skewness. Aust. J. Stat.1997, 39, 219–224. [CrossRef] 12. Von Hippel, P.T. Mean, Median and Skew: Correcting a Textbook Rule. J. Stat. Educ. 2005, 13.[CrossRef] 13. Das, S.; Mandal, P.K.; Ghosh, D. On Homogeneous Skewness of Unimodal Distributions. Indian J. Stat.2009, 71-B, 187–205. 14. Rubio, F.J.; Steel, M. On the Marshall–Olkin transformation as a skewing mechanism. Comput. Stat. Data Anal. 2012, 56, 2251–2257. [CrossRef] 15. García, V.J.; Martel–Escobar, M.; Vázquez–Polo, F.J. Complementary information for skewness measures. Stat. Neerl. 2015, 69, 442–459. [CrossRef] 16. Mc Gill, W.J. Random fluctuations of response rate. Psychometrika 1962, 27, 3–17. [CrossRef] 17. Holla, M.S.; Bhattacharya, S.K. On a compound Gaussian distribution. Ann. Instit. Stat. Math. 1968, 20, 331–336. [CrossRef] 18. Kozubowski, T.J.; Podgórski, K. Maximum likelihood estimation of asymmetric Laplace parameters. Ann. Inst. Stat. Math. 2002 54, 816–826. 19. Kotz, S.; van Dorp, J.R. Uneven two-sided power distribution: with applications in econometric models. Stat. Methods Appl. 2004, 13, 285–313. [CrossRef] 20. Fernández, C.; Steel, M.F.J. On Bayesian Modeling of Fat Tails and Skewness. J. Am. Stat. Assoc. 1998, 93, 359–371. 21. Johnson, D. The triangular distribution as a proxy for the beta distribution in risk analysis. The Statistician 1997, 46, 387–398. [CrossRef]

c 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

133

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

Symmetry Editorial Office E-mail: [email protected] www.mdpi.com/journal/symmetry

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel: +41 61 683 77 34 Fax: +41 61 302 89 18 www.mdpi.com ISBN 978-3-03936-647-7