THE ENERGY GOODNESS-OF-FIT TEST FOR THE INVERSE GAUSSIAN DISTRIBUTION

Patrick Ofosuhene

A Dissertation

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

December 2020 Committee:

Maria Rizzo, Advisor

Neil Baird, Graduate Faculty Representative

Wei Ning

Junfeng Shang Copyright c December 2020 Patrick Ofosuhene All rights reserved iii ABSTRACT

Maria Rizzo, Advisor

The inverse Gaussian distribution is one of the most widely used distributions for modelling positively skewed data. Areas of its application includes lifetime models, cardiology, linguistics, employment service, labor dispute, finance, electrical networks hydrology, demography, and me- teorology (Folks and Chhikara, 1978; Seshadri, 1993; Norman, Kotz, and Balakrishnan, 1994; Bardsley, 1980). When making statistical inference with the inverse Gaussian distribution, it is an important practice to determine if the data fits the inverse Gaussian family. In this dissertation two new goodness-of-fit test based on energy are proposed. The first test called the energy goodness-of-fit test is consistent against general alternatives. Under the null hypothesis, the test statistic converges to a weighted sum of Chi-square . The second proposed test is based on the independence characterization of the inverse Gaussian distri- bution. The proposed test utilize the as a measure of dependence. The distance correlation test is implemented as non-parametric permutation test and requires the observations to be exchangeable under the null hypothesis. Simulation results indicates that, the energy goodness-of-fit test is more powerful compared to other test considered. However, the distance correlation based test outperforms other tests for large shape parameter values of the alternative distributions. The distance correlation based test is extended to the Pareto distribution, which is characterized by the independence of of the rth order

statistic X(r) and the ratio X(s)/X(r). Simulation results indicate that, the distance correlation based test has high power when the effective sample size is large. iv

This dissertation is dedicated to my parents. v ACKNOWLEDGMENTS

I am sincerely thankful to my advisor Professor Maria Rizzo for her kindness and unwavering support from start to finish of this dissertation. But for her admirable patience, this work could not have been completed. I am also grateful for her support and encouragement during my job search. Certainly, words cannot express how grateful I am for being her student. I would also like to thank my dissertation committee Professor Wei Ning, Professor Junfeng Shang, and Professor Neil Baird for their time and advice. Special thanks to Professor Craig Zirbel for making my transition to Bowling Green stress free. Thanks to all the Professors in the Mathematics and department for the invaluable wealth of knowledge imparted on me. Thanks to my brother and mentor Professor Henry de-Graft Acquah for his unselfish desire to see me achieve academically. Your prayers are very much appreciated. Thanks to my parents and siblings for their support. Special thanks to my wife for her support and prayers. Thanks to the BGSU Ghanaian community for making my stay at Bowling Green a memorable one. But above all, I thank God for strength and grace through out graduate school. I couldn’t have come this far without God’s grace. I am indeed grateful. To God be the glory. vi TABLE OF CONTENTS Page

CHAPTER 1 INTRODUCTION ...... 1 1.1 Introduction ...... 1 1.2 Inverse Gaussian distribution ...... 2 1.2.1 Transformations and characterization of the inverse Gaussian ...... 9 1.2.2 Similarities with the ...... 14 1.2.3 Point estimation of parameters ...... 15 1.2.4 Simulating inverse Gaussian distributions ...... 16 1.3 Organization of dissertation ...... 17

CHAPTER 2 EXISTING GOODNESS-OF-FIT TEST FOR THE INVERSE GAUSSIAN DISTRIBUTION ...... 18 2.1 Test based on empirical distribution function ...... 18 2.2 Test based on empirical Laplace transformation...... 21 2.3 Test based on independence characterization...... 22 2.4 Other goodness-of-fit tests for inverse Gaussian distribution ...... 24

CHAPTER 3 THE ENERGY GOODNESS-OF-FIT TEST FOR THE INVERSE GAUS- SIAN DISTRIBUTION ...... 25 3.1 Introduction ...... 25 3.2 Univariate energy goodness-of-fit statistic ...... 26 3.3 Energy statistic for inverse Gaussian distribution ...... 28 3.4 Energy statistic for standard Half-Normal distribution ...... 32

CHAPTER 4 DISTANCE CORRELATION AS A MEASURE OF DEPENDENCE . . . . 36 4.1 Distance correlation ...... 36 4.2 Empirical distance and correlation ...... 39 4.3 Distance covariance/correlation goodness-of-fit test ...... 42 vii 4.4 Distance covariance test for the inverse Gaussian ...... 44

CHAPTER 5 EMPIRICAL RESULTS ...... 45 5.1 Composite hypothesis ...... 45 5.2 Simulation design ...... 46 5.3 Empirical significance ...... 47 5.4 Empirical power comparison ...... 48 5.5 Applications to real data ...... 69

CHAPTER 6 OTHER APPLICATIONS OF THE DISTANCE COVARIANCE BASED GoF TEST ...... 72 6.1 Introduction ...... 72 6.2 Characterization of univariate distribution by independence of two statistics . . . . 72 6.3 Pareto distribution ...... 74 6.4 Goodness-of-fit test test for the Pareto distribution ...... 74 6.5 Empirical power and significance of tests ...... 77

CHAPTER 7 SUMMARY ...... 83

BIBLIOGRAPHY ...... 84

APPENDIX A SELECTED R PROGRAMS ...... 91 viii LIST OF FIGURES Figure Page

1.1 Inverse Gaussian densities with µ = 1 for five values of λ ...... 4 1.2 Inverse Gaussian densities with λ = 1 for five values of µ ...... 5 1.3 Probability densities for Inverse Gaussian reciprocals with µ = 1 for seven values of λ ...... 8 1.4 Probability densities for Inverse Gaussian reciprocals with λ = 1 for four values of µ 9 1.5 Probability densities for Y = pλ/X(X − µ)/µ with µ = 1 for six values of λ . . 11 1.6 Probability densities for |Y | with µ = 1 for six values of λ ...... 13

2.1 Comparison of the EDF (blue) of a random sample of 50 observations from IG(1, 2) to the true CDF(red) ...... 19

3.1 Sampling distribution of the energy goodness-of-fit statistic for univariate inverse Gaussian, n = 50 ...... 35

4.1 Illustration of empirical Pearson product correlation and distance correla- tion for different dependence structures...... 43

5.1 Estimated power of tests against Weibull(1, κ) with varying shape parameter κ, n = 20, 50 and α = 0.1 ...... 49 5.2 Estimated power of tests against Weibull(1, 1) with sample varying size and α = 0.1 50 5.3 Estimated power of tests against Weibull(1, 2) with sample varying size and α = 0.1 51 5.4 Estimated power of tests against Weibull(1, 3) with sample varying size and α = 0.1 52 5.5 Estimated power of tests against Beta(α, 1) with varying shape parameter α, n = 20, 50, 100 and α = 0.1 ...... 53 5.6 Estimated power of tests against Beta(1, 1) with varying sample size and α = 0.1 . 54 5.7 Estimated power of tests against Beta(2, 1) with varying sample size and α = 0.1 . 55 5.8 Estimated power of tests against Beta(3, 1) with varying sample size and α = 0.1 . 56 ix 5.9 Estimated power of tests against Pareto(1, θ) with varying shape parameter θ, n = 20, 50, 100 and α = 0.1 ...... 57 5.10 Estimated power of tests against Pareto(1, 1) with varying sample size and α = 0.1 58 5.11 Estimated power of tests against Pareto(1, 2) with varying sample size and α = 0.1 59 5.12 Estimated power of tests against Pareto(1, 3) with varying sample size and α = 0.1 60 5.13 Estimated power of tests against Gamma(α, 1) with varying shape parameter α, n = 20, 50, 100 and α = 0.1 ...... 61 5.14 Estimated power of tests against Gamma(1, 1) with varying sample size and α = 0.1 62 5.15 Estimated power of tests against Gamma(2, 1) with varying sample size and α = 0.1 63 5.16 Estimated power of tests against Gamma(3, 1) with varying sample size and α = 0.1 64 5.17 Estimated power of tests against Gamma(4, 1) with varying sample size and α = 0.1 65 5.18 Estimated power of tests against LN(1, σ) for different values of σ, n = 20, 50, 100 and α = 0.1 ...... 66 5.19 Estimated power of tests against LN(1, 1) with varying sample size and α = 0.1 . . 67 5.20 Estimated power of tests against LN(1, 2) with varying sample size and α = 0.1 . . 68 5.21 Active repair times(hours) for an airborne communication transceiver...... 70 5.22 Precipitation (in inches) at Jug Bridge, Maryland...... 71

6.1 Pareto Type I probability density functions for various shape parameter value with σ fixed at 1 ...... 75 6.2 Empirical Type I error rate of testing composite Pareto hypothesis with varying shape parameter, α = 0.10, n = 20...... 78 6.3 Empirical power of testing composite Pareto hypothesis against Lognormal(0,1) alternative with varying sample size, α = 0.10 ...... 79 6.4 Empirical power of testing composite Pareto hypothesis against Weibull(1,1) alter- native with varying sample size, α = 0.10 ...... 80 6.5 Empirical power of testing composite Pareto hypothesis against Weibull(2,1) alter- native with varying sample size, α = 0.10 ...... 81 x 6.6 Empirical power of testing composite Pareto hypothesis against Gamma(2,1) alter- native with varying sample size, α = 0.10 ...... 82 xi LIST OF TABLES Table Page

1.1 Parameter values for f(x, µ, λ)...... 6 1.2 Parameter values for f(w, µ, λ)...... 8

3.1 A comparison of mean deviation formula (3.3.12) and with the corresponding means estimated by simulation ...... 32

5.1 Empirical estimates of size of test with α = 0.10 ...... 48 5.2 Active repair times(hours) for an airbone communication transreceiver...... 70 5.3 p-values for the various tests...... 70 5.4 Precipitation (in inches) at Jug Bridge, Maryland...... 71 1

CHAPTER 1 INTRODUCTION

1.1 Introduction

Probability distributions are useful tools to model variants of events. There are a vast number of distributions for modelling skewed data. The inverse Gaussian distribution is one of the alternative models used for modeling positively skewed data. Its applications are well documented in diverse areas including but not limited to lifetime models, cardiology, linguistics, employment service, labor dispute, finance, electrical networks hydrology, demography, and meteorology (Folks and Chhikara, 1978; Seshadri, 1993; Norman et al., 1994; Bardsley, 1980). Huberman, Pirolli, Pitkow, and Lukose (1998) proposed the use of the inverse Gaussian model to determine the number of website links a user follows before the page value first reaches the stopping threshold. Before applying the inverse Gaussian model, it is imperative to establish the extent of agree- ment between the distribution of sample realizations and the theoretical distribution. Failure to do so may lead to wrong assumption and inferences about a given sample data. Determining whether a hypothesized distribution generates a sample data is usually a distribution fit problem and are resolved in the field of goodness-of-fit test. Distribution fit still remains an important problem in statistics and has attracted investigations for the past century. Several techniques have been de- veloped to test the hypothesis that the random sample distribution coincides with a hypothesized inverse Gaussian distribution. These includes empirical distribution function (edf) test such as Kolmogorov-Sminorv test, Anderson-Darling test, and the Cramer-von Mises test. Modifications of these methods for the composite inverse Gaussian hypothesis also exist. (Pavur, Edgeman, and Scott, 1992). Other goodness of fit techniques employ the use of characterization properties of the hypoth- esized distribution. For instance, Mudholkar and Tian (2002) applied the entropy characterization of the inverse Gaussian distribution. Characterization by independence of two statistics is an at- tractive and simple alternative approach to develop a goodness of fit test (Mudholkar, Natarajan, 2 and Chaubey, 2001). For distributions characterized by the independence of two statistics, the goodness-of-fit testing problem is equivalent to testing the independence of the two statistics. Mud- holkar et al. (2001) used the Pearson product moment correlation as a measure of independence. Using the correlation coefficient is valid if the two statistics are bivariate normal. However, this is not the case in most problems. Moreover, the Pearson correlation can be zero for dependent vari- ables. Due to these limitations, a more general measure of independence is required. The distance correlation comes in handy since a zero distance correlation implies the random variables X and Y are independent. In this dissertation, two new goodness-of-fit tests based on energy are developed for the inverse Gaussian distribution . The distance correlation which is analogous to, but more general than the product moment correlation, is used as a measure of independence. The second method we propose is called the energy test which is based on . En- ergy distance is a nonnegative that measures the distance between distributions of random vectors. The energy distance is zero if and only if the distributions are identical. Thus, the energy distance provides a characterization of equality of distributions and as such can be used to measure the difference between sample and hypothesized distribution. 
The name ‘energy’ is motivated by the close analogy of Newton’s gravitational potential energy. Applications of the energy distance includes a consistent univariate goodness-of-fit test (Szekely´ and Rizzo, 2005; Rizzo, 2009; Yang, 2012; Rizzo and Haman, 2016), multivariate test of independence (Szekely,´ Rizzo, and Bakirov, 2007), change point analysis (Kim, Marzban, Percival, and Stuetzle, 2009), nonparametric exten- sion of analysis of (Rizzo and Szekely,´ 2010) etc.

1.2 Inverse Gaussian distribution

Schrodinger (1951), studied the with positive drift and derived the distribu- tion of first passage time. Surprisingly, in the same year, Smolucwski also derived this distribution but in a different way. Suppose W (t) is a one dimensional Weiner process with positive drift v, variance σ2, and W (0) = 0. Then, the time required for W (t) to reach a for the first time is a 3 random variable with density

( ) a −(a − vt)2 f(t) = √ exp , t > 0, v > 0 (1.1) σ 2πt3 2σ2t

(Folks and Chhikara, 1978). Several derivations of the first passage time distribution have been established (Stephens, 1976). In an attempt to extend the results of Schrodinger¨ , Tweedie (1941) discovered that the cummulant generating function of the first passage distribution is the inverse of the cummulant generating function of the Gaussian distribution. Because of this relationship, Tweedie (1956) adopted the name ‘inverse Gaussian’ for the first passage time d istribution. The definition of the model is given below.

Definition 1.1. The family of the inverse Gaussian distribution, IG(µ, λ), with location parameter µ > 0 and shape parameter λ > 0, is defined by the probability density function

r ( ) λ −λ(x − µ)2 f(x|µ, λ) = exp , x > 0. (1.2) 2πx3 2µ2x

a a 2 Note that with µ = v and λ = ( σ ) , (1.2) is a parametrization of the first passage time dis- tribution (1.1) (Folks and Chhikara, 1978). Other useful alternative reparameterization of (1.2) proposed by Tweedie (957a) are defined below (Shuster, 1968; Chhikara and Folks, 1974)

1 2 λ Definition 1.2. Given the relationship 2 α = µ = φ , the probability density (1.2) can be written in the forms:

r λ 1 f(x, α, λ) = exp{αλx − λ(2α) 2 − λ/2x}, x > 0; (1.3a) 2πx3 r ( ) µφ −φx µφ f(x, µ, φ) = exp + φ − , x > 0; (1.3b) 2πx3 2µ 2x r ( ) λ −φ2x λ f(x, φ, λ) = exp + φ − , x > 0. (1.3c) 2πx3 2λ 2x

The density is skewed to the right, unimodal, and has shape which depends on φ. Figure 1 shows the density curves with µ = 1 and for five values of λ. These ranges from highly skewed 4

Figure 1.1 Inverse Gaussian densities with µ = 1 for five values of λ to symmetrical distribution. The inverse Gaussian distribution becomes more symmetrical, As λ tends to infinity. The corresponding cumulative distribution function (CDF) is:

(r ) ( r ) λx  n2λo λx  F (x|µ, λ) = Φ − 1 + exp Φ − − 1 , (1.4) x µ µ x µ

where Φ(·) is the CDF of the standard normal distribution (Shuster, 1968; Chhikara and Folks, 1974)

Definition 1.3. Let X be IG(µ, λ) random variable. The characteristic function (CF) of X is given by

Z ∞ ( " r 2 #) itX itX λ 2itµ φx(t) = E[e ] = e f(x, µ, λ)dx = exp 1 − 1 − . (1.5) 0 µ λ

The corresponding moment generating function (MGF) and cummulant generating function 5

Figure 1.2 Inverse Gaussian densities with λ = 1 for five values of µ

(CGF),which is the log of the MGF, are

( " r #) λ 2µ2t exp 1 − 1 − , (1.6) µ λ and " r # λ 2µ2t 1 − 1 − , (1.7) µ λ respectively.

The CF, MGF and CGF are useful in finding the moments of X. The rth moment of X can be obtained by evaluating the rth derivative of (1.5), (1.6) or (1.7) at t = 0. For instance, using (1.5), the rth moment of X can be expressed in a power series representation form as

r−1 −s X (r − 1 + s)! 2λ E[Xr] = µr . (1.8) s!(r − 1 + s)! µ s=0 6 From (1.8), the first four moments of X are

µ µ3 µ2 + λ µ4 µ5 µ3 + 3 + 3 λ λ2 µ5 µ6 µ7 µ4 + 6 + 15 + 15 λ λ2 λ3

Table 1.1 summarizes the parameter values of the distribution of w. In the table, φ = λ/mu, K3

and k2 are the third and second cumulants, m2 and m4 are the second and fourth sample moments about the mean. Table 1.1 Parameter values for f(x, µ, λ)

Parameter Representation Value Mean E(x) µ

2 µ3 Variance E(x − E(x)) λ

3/2 p µ k3/k2 3 λ

2 15µ Kurtosis m4/m2 λ

 1/2  9µ2 3µ Mode µ 1 + 4λ2 − 2λ

Definition 1.4. (Single parameter form) The probability density of the inverse Gaussian distribu-tion with single parameter is defined as

µ  (x − µ)2  f(x, µ, µ2) = √ exp − . (1.9) 2πx3 2x

µ2x µ3 For this parametrization, the mean and the variance are the same. By letting y = λ and µ0 = λ the double parameter inverse Gaussian density f(x, µ, λ) can be transformed into single parameter

2 density f(y, µ0, µ0). 7 Definition 1.5. (The three-parameter inverse Gaussian, Chhikara (1988, p. 19)). The random variable X is said to follow the three-parameter inverse Gaussian distribution if its density is defined as:  λ 1/2  λ[(x − θ) − δ]2  f(x) = exp − , x > θ, (1.10) 2π(x − θ)3 2δ2(x − θ)

with −∞ < θ < ∞, δ > 0, and λ > 0. In this parametrization, θ is the location parameter, λ is the scale parameter and θ + δ is the mean.

Definition 1.6. (Generalized inverse Gaussian Distribution, Chhikara (1988, p. 20)) A general class of distributions for which the inverse Gaussian is a special case is defined by the pdf

 λ−1  (λx−1 + (λ/µ2)x) f(x; γ, µ, λ) = 2µγK xγ−1 exp − , −∞ < γ < ∞ (1.11) r µ 2

The reciprocals of the inverse Gaussian random variable may sometimes be convenient to use. Its density is defined below.

Definition 1.7. (Reciprocals of inverse Gaussian Distribution, Chhikara (1988, p. 43)). Let X ∼ IG(µ, λ). Then the density function of W = X−1 is given by

r −λ(1 − µw)2  λ f(w, µ, λ) = exp , w > 0. (1.12) 2µ2w 2πw

Table 1.2 summarizes the parameter values of the distribution of w. In the table, φ = λ/mu,

K3 and k2 are the third and second cummulants, m2 and m4 are the second and fourth sample moments about the mean. The density of w is positively skewed and unimodal. Figures 1.3 and 1.4 shows the density plots of the inverse Gaussian reciprocals for different parameter settings. The mode is less pronounced for small values of λ when m = 1 but more pronounced for large values of λ. 8 Table 1.2 Parameter values for f(w, µ, λ)

Parameter Representation Value 1 1 Mean E(W ) µ + λ

2 1 2 Variance E(w − E(w)) λµ + λ2

3/2 3φ+8 Skewness k3/k2 (φ+2)3/2

2 3(5φ+16) Kurtosis m4/m2 (φ+2)2

 1/2  1 µ2 µ Mode µ 1 + 4λ2 − 2λ

Figure 1.3 Probability densities for Inverse Gaussian reciprocals with µ = 1 for seven values of λ 9

Figure 1.4 Probability densities for Inverse Gaussian reciprocals with λ = 1 for four values of µ

1.2.1 Transformations and characterization of the inverse Gaussian

The inverse Gaussian density can be transformed to certain densities. In this section we present the relationship between the inverse Gaussian and the normal, standard half-normal and the chi- square distribution. We begin by finding the distribution of Y = pλ/X(X − µ)/µ. The result is stated in the theorem below.

p Theorem 1.1. Given that X ∼ IG(µ, λ). Then Y = λ/X(X − µ)/µ is a nonlinear weighted normal with density

 y  1  f(y) = 1 − √ exp(−y2/2) , −∞ < y < ∞. (1.13) p4λ/µ + y2 2π 10 Proof. The proof due to Seshadri (1993) is summarized below. Let Y = pλ/X(X − µ)/µ. Then √ dy sqrtλ (x−µ) √ 2 dx = 2µ x−3/2 . Let t = x. Then we have Y = λ(t − µ)/tµ. This leads to the quadratic √ √ equation: 0 = λt2 − yµt − λµ. Since t assumes positive values, we have

yµ + py2µ2 + 4λµ t = √ , (1.14) 2 λ

so that

yµ + py2µ2 + 4λµ2 x + µ = µ + √ 2 λ y2µ2 + yµpy2µ2 + 4λµ + y2µ2 + 4λµ = µ + 4λ 4λµ + y2µ2 + yµpy2µ2 + 4λµ = . (1.15) 2λ

Reciprocating (1.15) and multiplying by 2µ, we have

2µ 4λ = x + µ 4λµ + y2µ2 + yµpy2µ2 + 4λµ 4λ = (py2µ2 + 4λµ + yµ)py2µ2 + 4λµ 4λ(py2µ2 − 4λµ − yµ) = (y2µ2 + 4λµ − yµ)py2µ2 + 4λµ y = 1 − q (1.16) 2 4λ y + µ

Substituting these algebraic results yields

r λ  λ(x − µ)2 dx f(y) = exp − 2πx3 2µ2x dy  y  1  = 1 − √ exp(−y2/2) , −∞ < y < ∞. (1.17) p4λ/µ + y2 2π 11 Figure 1.5 illustrates the densities of Y for various values of λ and µ = 1. We see that the distribution tends to standard normal for large values of λ/µ. From this relation, one can make inferences about the inverse Gaussian by using normal properties. The next result gives the distri- bution of |Y |.

Figure 1.5 Probability densities for Y = pλ/X(X − µ)/µ with µ = 1 for six values of λ

Theorem 1.2. Given that X ∼ IG(µ, λ) with Y as defined in theorem 1.1. Then W = |Y | is distributed as the standard half-normal (SHN) with pdf

2 2 fw(w) = √ exp(−w /2). (1.18) 2π 12 Proof. Let W = |Y |. Then

Z w 1 2 FW (w) = P (−w ≤ Y ≤ w) = (1 + g(y))√ exp (−y /2)dy, (1.19) −w 2π

where g(y) = −y/py2 + 4λ/µ. Note that g(y) is an odd function since g(−y) = −g(y). Hence

 Z w  d 1 2 2 2 fW (w) = √ exp(−y /2)dy = √ exp (−w /2) (1.20) dw 2π −w 2π

Figure 1.6 illustrates the density of |Y | for µ = 1 and several values of λ. We find this trans- formation useful in the sense that it does not depend on the inverse Gaussian parameters. We apply this transformation in the derivation of a computing formula of the energy statistic which is independent of the inverse Gaussian parameter values.

Theorem 1.3. Given that X ∼ IG(µ, λ) with Y as defined in Theorem 1.1. Then Y 2 has a chi- square distribution with one degree of freedom. Consequently,

n 2 X λ(Xi − µ) ∼ χ2 . X µ2 n i=1 i

Proof. Since Y 2 = W 2, it follows directly that Y 2 is chi-square random variable with one degree of freedom.

Interesting characterization results of the inverse Gaussian model have been established over the years. Some of these results are useful in developing a goodness-of-fit test. Theorems 1.4 to 1.7 summarizes these results. The first result is based on independence of two statistics. If

¯ −1 P −1 ¯ −1 X1 ··· ,Xn are independent inverse Gaussian random variables, X and V = n (Xi − X ) are statistically independent (Tweedie, 957a). The converse, due to Khatri (1962), also holds. The theorem below stated in Chhikara (1988), summarizes Khatri’s characterization.

Theorem 1.4. (Independence characterization, Khatri (1962)) Let X1, X2, ..., Xn be indepen- 13

Figure 1.6 Probability densities for |Y | with µ = 1 for six values of λ

dently and identically distributed positive random variables such that the expected values of X,X2 P and 1/ Xi exist and are different from zero. Then the necessary and sufficient condition that P ¯ P X1,X2, ..., Xn are distributed as inverse Gaussian is that Xi or X = Xi/n and V = P ¯ (1/Xi − 1/X )/n are independently distributed.

Theorem 1.5. (Constant regression characterization, Seshadri (1983)) Let X1, X2, ..., Xn be in-

2 dependent and identically distributed positive random variables such that E[X], E[X ], E[1/Xi] Pn −1 P and E[ i=1 Xi] exist and are different from zero. If the regression of V on S = Xi is constant

then each Xi has inverse Gaussian distribution.

Theorem 1.6. (Entropy characterization, Mudholkar and Tian (2002)) The random variable X √ with IG(µ, λ) distribution is characterized by the property that 1/ X attains maximum entropy 14 among all nonnegative, absolutely continuous random variables Y subject to restrictions E[Y −2] = µ, E[Y 2] = 1/µ + 1/λ

Theorem 1.7. (Martingale characterization, Seshadri and Wesołowski (2004)) Let {Xn}n≥1 be Pn a sequence of positive, non degenerate random variables. For n ≥ 1, let Sn = Xi and i=1

consider the σ− algebra, Fn = σ(Sn, Sn+1, . . . ). Let b > 0. Then the sequence  n 1  − , Fn Sn 2bn n≥1

is a backward martingale if and only if X1 has the IG(a, b) distribution for some a > 0.

These characterizations are useful in developing goodness-of-fit test for the inverse Gaussian. We will use Theorem 1.4 to propose a goodness-of-fit test in chapter 4.

1.2.2 Similarities with the normal distribution

The inverse Gaussian distribution has other useful properties which are analogous to the prop- erties of normal distributions. The common ones are summarized below.

Proposition 1.1. (Similarities with the normal distribution). If X ∼ IG(µ, λ), then

(a) the sample mean has an inverse Gaussian distribution. Thus, X¯ ∼ IG(µ, nλ).

2 λ 2 2 2  x−µ  2 (b) µ2x (x − µ) ∼ χ1. Note that if X ∼ N(µ, σ ), then σ ∼ χ1.

(c) For the inverse Gaussian distribution, we have

n 2 n ¯ 2 λ X (Xi − µ) X (X − µ) = λ (X−1 − X¯ −1) + nλ , µ X i µ2X¯ i=1 i i=1

¯ P −1 ¯ −1 ¯ −1 and X and V = (Xi − X ) are independent. It should be noted that X and n V are the maximum likelihood estimators of µ and λ−1. In the case of Gaussian distribution,

n n −2 X 2 −2 X ¯ 2 −2 ¯ 2 σ (Xi − µ) = σ (Xi − X) + nσ (X − µ) , i=1 i=1 15 ¯ P ¯ 2 and X and (Xi − X) are independent.

(d) kX ∼ IG(kµ, kλ) for any k > 0. Thus, the family of inverse Gaussian distribution is closed under a change of scale.

P 2 (e) Xi ∼ IG(nµ, n λ).

(f) The family of inverse Gaussian distribution is complete (Wasan, 1968).

1.2.3 Point estimation of parameters

Let X1, ..., Xn be a random sample from IG(µ, λ). The likelihood function is

1 n n ! 2 n n !  λ  2 Y 1 nλ λ X λ X 1 L(µ, λ) = exp − X − . (1.21) 2π X3 µ 2µ2 i 2 X i=1 i i=1 i=1 i

The log likelihood l(µ, λ) is proportional to

n n n nλ λ X λ X 1 l(µ, λ) ∝ log λ + − X − . (1.22) 2 µ 2µ2 i 2 X i=1 i=1 i The likelihood equations are obtained by setting the derivative of (1.22) with respect to µ and λ to zero. Solving the likelihood equation, we obtain the maximum likelihood estimators of µ and λ as

Pn n   Xi 1 1 X 1 1 µ = i=1 = X,¯ = − . b n n X µ λb i=1 i b

respectively. Note that µb and λb are statistically independent since the conditional moment gener- ating function of V given X¯ =x ¯, which is (1 − 2t)−(n−1)/2, is the same for all X¯. The maximum likelihood estimator of the variance is

n µ3 1 X X¯ 3  σ2 = b = − nX¯ 2 . b n X λb i=1 i

The two parameter inverse Gaussian density belongs to the exponential family with canonical 16 representation (Seshadri, 1993)

−3/2   x θ1 f(x) = √ = exp + θ2x − k(θ1, θ2) , (1.23) 2π x

where p 1  λ λ  K(θ , θ ) = −2 θ θ − log(−θ ), (θ , θ ) = − , − . 1 2 1 2 2 1 1 2 2 2µ2

For X ∼ IG(µ, λ) which belongs to the two parameter exponential family, the statistic T =

Pn Pn 1 ( Xi, ) is minimal sufficient and complete. By the Lehmann-Scheffe theorem, the i=1 i=1 Xi −1 ¯ V uniformly minimum variance unbiased estimators of µ and λ are X and n−1 respectively.

1.2.4 Simulating inverse Gaussian distributions

The inverse CDF approach is widely used to generate random observations from a distribution of interest. For distributions with closed form expression of the inverted CDF, we can generate the random variable x as x = F −1(u), where u is a standard uniform random variable. The inverse CDF approach is not applicable to the inverse Gaussian distribution since its cdf is not easily invertible. Michael, Schucany, and Haas (1976) suggested the use of transformations with

λ(X−µ)2 multiple roots. Let v = µ2X be the transformed variable where X ∼ IG(µ, λ). Note that 2 v ∼ χ1. This transformation has two roots:

µ p X = [2λ + µv − 4λµv + µ2v2] 1 2λ

and µ2 X2 = X1

Michael et al. (1976) suggested using binomial trial to select the roots as the inverse Gaussian observation. Here is the algorithm.

2 1. Generate v which is χ1 random variable with 1 degrees of freedom.

µ p 2 2 2. Compute X1 = 2λ [2λ + µv − 4λµv + µ v ] 17 3. A Bernoulli trial is performed with success probability p = µ µ+X1

4. X1 is chosen to be the inverse Gaussian random observation if the trial results in “success”,

2 else, µ is chosen. X1

1.3 Organization of dissertation

The object of this research is to present a new goodness-of-fit test for the inverse Gaussian dis- tribution based on the energy statistic and distance correlation. Chapter 2 summarizes the existing goodness-of-fit techniques. In Chapter 3 we present the theory of energy distance and the energy statistic. A goodness-of fit test is developed for the univariate inverse Gaussian distribution based on the energy statistics. A computing formula for the test statistic is derived based on the transformation of the inverse Gaussian to standard half-normal random variable. In Chapter 4 we give a brief overview and properties of distance correlation. The distance correlation based goodness-of-fit test for the inverse Gaussian is proposed. Performance of the proposed methods are presented in Chapter 5. Simulations are conducted to show how the proposed tests controls the Type-I error of the composite inverse Gaussian hypothesis. Also, an extensive power study is done to compare the power of the proposed test. In Chapter 6, we summarize other distributions with independence characterizations. We then assess performance of the distance correlation based goodness-of-fit test for the Pareto distribution. Finally, in Chapter 7, we provide a discussion , summary, and future work. 18

CHAPTER 2 EXISTING GOODNESS-OF-FIT TEST FOR THE INVERSE GAUSSIAN DISTRIBUTION

Several goodness-of-fit test have been developed for the inverse Gaussian distribution. Some techniques applies to simple hypothesis while others are meant for the composite inverse Gaussian distribution. In this chapter we review some existing goodness-of-fit techniques for the inverse Gaussian distribution.

2.1 Test based on empirical distribution function

The empirical cumulative distribution function often referred to as the empirical distribution function (EDF) is a step function of the sample that increases by 1/n at each sample realization. The empirical distribution function estimates the cumulative distribution function that generated the sample data points. Let X1,...,Xn be independent and identically distributed random variable from some continuous distribution with distribution function F (X). Define X(1),X(2),...,X(n) as the order statistic of X1,...,Xn. The empirical distribution function Fn(x)is defined as

n 1 X F (x) I(X ≤ x), −∞ < x < ∞ (2.1) n n (i) i=1

where I(X(i) ≤ x) = 1 if X(i) ≤ x and zero otherwise. From this definition we have that

  0, for x < X(1)   Fn(x) = 1/n, for X(i) ≤ x ≤ X(i+1) i = 1, ··· , n − 1    1, for X(n) ≤ x

Figure 2.1 illustrates the empirical CDF and the true CDF of 50 random samples from the

IG(1, 2). By the central limit theorem, Fn(x) is asymptotically normal with mean 0 and variance

F (x)(1 − F (x)). The empirical distribution Fn(x) is a consistent estimator of the true CDF F (x).

Thus |Fn(x)−F (x)| converges to 0 with probability 1 as n −→ ∞. The Glivenko-Cantelli theorem 19

Figure 2.1 Comparison of the EDF (blue) of a random sample of 50 observations from IG(1, 2) to the true CDF(red)

provides a uniform convergence from Fn(x) to F (x). Thus,

a.s ||Fn(x) − F (x)|| = sup|Fn(x) − F (x)| → 0 (2.2) x

EDF tests are based on the difference between Fn(x) and F (x). Stephens (1974) and DAgostino and Stephens (1986) gives a thorough description and application of the various EDF tests. EDF test of the form (2.2) are called the supremum statistics. Some EDF test for the inverse Gaus- sian distribution includes the Kolmogorov-Smirnov test, Cramer-von Mises test and the Anderson- Darling test. The most popular test among the EDF tests is the Komogorov-Smirnov test which belongs to 20 the class of supremum statistics. The test statistic of the Komogorov-Smirnov test is

D = sup|Fn(x) − F (x)| (2.3) x

The test statistic D converges to 0 almost surely if Fn(x) and F (x) coincide. Critical values can be obtained by a Monte carlo simulation. Other well known EDF tests belongs to the quadratic statistics or the Cramer-von Mises family.

For this type of tests the sup norm in (2.2) is replaced with L2 − norm and some weight function:

Z ∞  2 n Fn(x) − F (x) φ(x)dF (x), (2.4) −∞ where φ(x) is a suitable weight function. Equation (2.4) is the average of the squared discrepancy  2 Fn(x) − F (x) weighted by some nonnegative function φ(x). When φ(x) = 1, we have the Cramer-von Mises test statistc. When φ(x) = [F (x)(F (x) − F (x))]−1 we have the Anderson- Darling test statistic (Anderson and Darling, 1954). The computing formulas for the EDF tests are obtained by the probability integral transform Z = F (X) which is distributed uniformly between 0 and 1. The computing formula for each test is given below: Kolmogorov-Smirnov statistic:

D = max(D+,D−), (2.5)

+ − where D = max{i/n − Z(i)} and D = max{Z(i) − (i − 1)/n} Cramer-von Mises statistic:

n X  2i − 12 W 2 = Z − (2.6) (i) 2n i=1 Anderson-Darling statistic:

n 1 X A2 = −n − (2i − 1)logZ + (2n + 1 − 2i)log(1 − Z ) (2.7) n (i) (i) i=1 21 Edgeman (1990) described the use Kolmogorov-Smirnov test for testing an inverse Gaussian hypothesis with specified parameters. The Komogorov-Smirnov test cannot be applied directly when the parameters are unknown. A modification of the Komorgorv-Smirnov test for the com- posite hypothesis is described by Edgeman, Scott, and Pavur (1988). O’Reilly and Rueda (1992) employs the Anderson-Darling test for the composite inverse Gaussian hypothesis. Estimations were done directly and by the Rao-Blackwell distribution estimator. A modification of EDF tests where Monte carlo methods are used to generate the critical values are developed by Gunes, Dietz, Auclair, and Moore (1997). The modified Anderson darling test are usually powerful compared to other EDF tests.

2.2 Test based on empirical Laplace transformation.

Henze and Klar (2002) proposed empirical Laplace transform based goodness-of-test for the inverse Gaussian distribution via a parametric bootstrap approach. The test is described as follows. Let X ∼ IG(µ, λ) and define

( " r #) λ 2µ2t L(t) = E[exp(−tX)] = exp 1 − 1 − (2.8) µ λ

as the Laplace transformation of X where L(t) satisfies the differential equation

p µL(t) + L0(t) + 1 + 2µ2t/λ = 0, t > 0, (2.9)

subject to the initial condition L(0) = 1. Let

q 0 2 ˆ ˜n(t) =µL ˆ n(t) + Ln(t) + 1 + 2ˆµ t/λ (2.10)

be an estimate of the left hand side of equation (2.9) where µˆ and λˆ are the maximum likelihood estimates of µ and λ and

−1 X Ln(t) = n exp(−txi) 22 is the empirical laplace transform of X. The proposed test statistic is the weighted L2-distance:

Z ∞ n 2 Tn,a = ˜n(t) exp(−aµˆnt)dt. (2.11) µˆn 0

Henze and Klar (2002) proposed a second test statistic for testing the composite inverse Gaussian ˆ ˆ which is based on the use of a measure of deviation between Ln(t) and L(t). In this setting, Ln(t) ˆ −1 P is the Laplace transform of X and Ln(t) = n exp(−txi) is the nonparametric estimate of the Laplace transform of X. The test statistic is defined as

Z ∞ 2  ˆ  Vn,a = nµˆn Ln(t) − Ln(t) exp(−aµˆnt)dt. (2.12) 0

In both test statistic, a is the weight parameter. The null hypothesis is rejected for large values of

Tn,a and Vn,a.

2.3 Test based on independence characterization.

Let X1,...,Xn be i.i.d with distribution function F , S1 = S1(X1,...,Xn) and

S2 = S2(X1,...,Xn) be two measurable single-valued functions of the observations. One can characterize a population based on the independence of S1 and S2. Thus, S1(X1,...,Xn) and

S2(X1,...,Xn) are independent if and only if F belongs to some population F0. The most com- mon population with such characterization is the normal where the sample mean and the sample

variance are independent (Lukacs, 1956). Testing the hypothesis H0 : X1,...,Xn is from F is

equivalent to testing the Independence of S1(X1,...,Xn) and S2(X1, ··· ,Xn). Thus, the hypoth-

esis H0 : X1, ··· ,Xn is from F can be formulated as:

H0 : H(y, z) = G1(y)G2(z), y, z ∈ R (2.13)

where G1, G2, and H(y, z) are the marginal and the joint distribution functions of S1 and S2 respectively. Milosevicˇ (2017) suggest a natural choice of test statistic: 23

Z   In = Hn(y, z) − Gn1(y)Gn2(z) dFn(y)dFn(z) (2.14)

Kn = sup Hn(y, z) − Gn1(y)Gn2(z) (2.15)

1 P 1 P where Gn1(y) = n I(S1(X1, ··· ,Xn) < y), G21(y) = n I(S2(X1, ··· ,Xn) < z) and 1 P Hn(y, z) = n I(S1(X1, ··· ,Xn) < y, S1(X1, ··· ,Xn) < z) are V -empirical distribution func- tions. Miloseviˇ c´ and Obradovic´ (2016) applied these methods to the Pareto and logistic distribu- tion. Baringhaus and Gaigall (2015) adapted the BlumKiefer-Rosenblatt independence criterion to test for the independence of the the bivariate random vector (Y,Z). The test rejects the hypothesis of independence for large values of

Z 2 Tn = n (Hn(y, z) − Fn(y)Gn(z)) dHn(y, z), (2.16)

1 P Pn Pn where Hn(y, z) = n I(Yj ≤ y, Zj ≤ z), Fn(y) = j=1 I(Yj ≤ y) and Gn(z) = j=1 I(Zj ≤ z) are the empirical distribution functions of the joint and the marginal sample variables Lin and Mudholkar (1980) constructed a simple Z test based on independence characterization for normality. The Z test is asymptotically distributed as N(0, 3). The inverse Gaussian, just as in the case of normal distribution is characterized by the independence of X¯ and V = P(X−1 − X¯ −1). Mudholkar et al. (2001) developed a Z test premised on this characterization. The test statistic based on Fisher’s transformation is is:

1 + r  Z = 0.5log , (2.17) 1 − r ¯ ¯ where r is the correlation coefficient between X−1 and V−1. The n replicates of X−1 and V−1 are computed by the leave-one-out approach. Note that the Z test has the same asymptotic null distribution as the Z test for normality. The Z test controls the nominal Type I error rate and has 24 reasonable power.

2.4 Other goodness-of-fit tests for inverse Gaussian distribution

Several other goodness-of-fit test for the inverse Gaussian distribution have been considered in literature. Nguyen and Dinh (2003) proposed exact EDF based goodness-of-fit test. Gracia- Medrano and O’Reilly (2005) commented and pointed out that the results by Nguyen and Dinh (2003) were incorrect. A smooth test proposed by Ducharme (2001) is based on the use of the reciprocals of inverse Gaussian random variable whose distribution is also known as the Random- walk distribution. Vexler, Shan, Kim, Tsai, Tian, and Hutson (2011) proposed an empirical likelihood ratio based goodness-of-fit test for inverse Gaussian distributions. This test improves the entropy based test by Mudholkar and Tian (2002). A variance ratio test is also proposed by Vasicek (1976). 25

CHAPTER 3 THE ENERGY GOODNESS-OF-FIT TEST FOR THE INVERSE GAUSSIAN DISTRIBUTION

3.1 Introduction

In this Chapter, a new goodness-of-fit test based on empirical characteristic function for the inverse Gaussian is presented.

Let F denote the cumulative distribution function of a random variable and Fn denote the empirical CDF of a sample of size n. One can test if the the distributions coincide by checking if the distance between F and Fn converges to zero. An L2-distance such as the Cramer’s distance defined as Z ∞ 2 (Fn(x) − F (x)) dx (3.1.1) ∞

can be used as a measure of discrepancy between F and Fn. However, this distance is not distribu- tion free. Szekely (1989) proposed energy distance. The energy distance is the distance between probability distributions. Suppose that X and Y are independent random variables with cumula- tive distribution functions F and G respectively. If X0 and Y 0 are iid copies of X and Y , then the energy distance between F and G is defined as (Szekely, 1989)

2E|X − Y | − E|X − X0| − E|Y − Y 0| ≥ 0. (3.1.2)

where | · | denotes Euclidean norm and E denotes . The energy distance is nonzero and satisfies all properties of a metric. In particular, the energy distance between F and G is zero if and only if F and G are identical. Thus, the energy distance provides a characterization of equality of distributions and as a result, can be used to measure the difference between sample and hypothesized distribution. The energy distance is rotation invariant and the rotation invariant property extends to higher dimensions. The expectation E|X −Y | is taken with respect to the joint 26 distribution. By independence, FXY (x, y) = FX (x)FY (y). So,

ZZ E|X − Y | = |x − y|dFX (x)dFY (y). (3.1.3)

Definition 3.1. (Energy distance). The energy distance between the d-dimensional independent random vectors X and Y is defined as

0 0 E(X,Y ) = 2E|X − Y |d − E|X − X |d − E|Y − Y |d ≥ 0, (3.1.4)

0 0 where E|X|d < ∞, E|Y |d < ∞, X and Y are iid copies of X and Y respectively.

Note that E(X, Y ) is non-negative with equality to zero if and only if F = G. The energy distance can be expressed in terms of characteristic functions.

Proposition 3.1. (Szekely, 1989) If the d-dimensional random variables X and Y are independent ˆ with E|X|d + E|Y |d < ∞, and f , gˆ denote their respective characteristic functions, then their energy distance is

Z ˆ 2 0 0 1 |f(t) − gˆ(t)| dt 2E|X − Y |d − E|X − X |d − E|Y − Y |d = d+1 , (3.1.5) cd Rd |t|d

π(d+1)/2 where cd = d+1 and Γ(.) is the complete gamma function. Thus, E(X,Y ) ≥ 0 with equality to Γ( 2 ) zero if and only if X and Y are identically distributed.

3.2 Univariate energy goodness-of-fit statistic

Given the sample x1, . . . , xn from the distribution F , we wish to test if the hypothesized dis-

tribution F0 coincides with F . The energy goodness-of-fit statistic for testing H0 : F = F0 vs

H1 : F 6= F0 is

n n  2 X 1 X  Q = n E|x − X| − E|X − X0| − |x − x | , (3.2.1) n n i n2 i j i=1 i,j=1 27 0 where X and X are iid with CDF F0. The energy goodness-of-fit statistic is a V − statistic with kernel defined as

h(x, y) = E|x − X| + E|y − X| − E|X − X0| − |x − y| (3.2.2)

Large Values of Qn supports the alternative F 6= F0. The limiting distribution of Qn un-

P∞ 2 der the null hypothesis is a quadratic form k=1 λkZk where Zk, k = 1, 2, .. are identically and

independent standard normal random variable, and λk are non-negative constants. The energy

goodness-of-fit test could be implemented by evaluating the constants λ k. However, we use the

empirical critical values of Qn which is cr = P (Qn > cr) = α. The energy goodness-of-test rejects the null if Qn exceeds the empirical critical value cr. The test based on Qn is a consistent

0 goodness-of-fit test (Sz´ekely and Rizzo, 2005). The expected value of Qn under Ho is E|X − X |.

Proposition 3.2. Let x1, x2, . . . , xn be identically and independently distributed as X. Then under

H0

0 E[Qn] = E|X − X | (3.2.3)

Proof.

n n  2 X 1 X  E[Q ] = n E|x − X| − E|X − X0| − E|x − x | (3.2.4) n n i n2 i j i=1 i,j=1 n n  2 X (n − 1) X  = n E|X − X| − E|X − X0| − E|X − X0| n n i=1 i6=j = 2nE|X − X| − nE|X − X0| − nE|X − X0| + E|X − X0|

= E|X − X0| (3.2.5)

Following Rizzo (2002), the third term of Qn can be linearized to reduce computational time.

Proposition 3.3. Let X1, X2, · · · , Xn be a random sample and X(1), X(2), · · · , X(n) be the ordered 28 sample. Then,

n n X X |xi − xj| = 2 ((2k − 1) − n)X(k). (3.2.6) i,j=1 k=1 Proof.

n n X X |xi − xj| = 2 (x(i) − x(j)) i,j=1 i,j=1 h i = 2 (n − 1)X(n) + (n − 2)X(n−1) + (n − 3)X(n−2) + ··· + (1)X(2) h i −2 (n − 1)X(1) + (n − 2)X(2) + (n − 3)X(3) + ··· + (1)X(n) h i = 2 (n − 1)X(1) + (n − 3)X(2) + (5 − n)X(3) + ··· + (n − 1)X(n) n X = 2 ((2k − 1) − n)X(k). k=1

3.3 Energy statistic for inverse Gaussian distribution

Let X1,...,Xn be independent and identically distributed random sample from a population with distribution F and let x1 . . . , xn be the observed values of the random samples. The proposed energy test statistic is defined as

n n  2 X 1 X  Q = n E|x − X| − E|X − X0| − |x − x | , (3.3.1) n n i n2 i j i=1 i,j=1

0 where X and X are independent and identically distributed with distribution F0, and the expec- tations are taken with respect to the null distribution F0. In order to use (3.3.1), we will need to

0 evaluate E|xi − X| and E|X − X |. The derivations of these expected values are derived in the following propositions.

Proposition 3.4. If X and Y are independent and identically distributed IG(µ, λ) and X = x is 29 fixed,

r r ! h λ x i h λx i E|x − Y | = 2xF (x) + (µ − x) − 2 µΦ − 1 − µe2λ/µΦ − + 1 , (3.3.2) Y x µ x µ

where FY is the CDF of the inverse Gaussian distribution and Φ is the standard normal CDF.

If X and Y are independent and identically distributed IG(µ, λ) and X = x is fixed, then

Proof.

Z ∞ E|x − Y | = |x − y|f(y)dy 0 Z x Z ∞ = (x − y)f(y)dy + (y − x)f(y)dy 0 x Z x Z x Z ∞ Z ∞ = xf(y)dy − yf(y)dy + yf(y)dy − xf(y)dy 0 0 x x Z x h Z ∞ Z x i = xFY (x) − yf(y)dy + yf(y)dy − yf(y)dy − x[1 − FY (x)] 0 0 0 Z x = 2xFY (x) − x + E[Y ] − 2 yf(y)dy, (3.3.3) 0

where f(x) and F (x) are the density and CDF of IG(µ, λ). Now,

s ( ) Z x Z x λ −λ(y − µ)2 yf(y)dy = y 3 exp 2 dy 0 0 2πy 2µ y

x r ( 2 ) x Z −1 λ −λ(y − µ) Z 2 = y exp 2 dy = h(y, µ, λ)dy (3.3.4) 0 2π 2µ y 0

q  3/2 Following Chhikara and Folks (1974), let w = λ y − 1 , then dy = √2µy . Note that y µ dw λ(y+u) as y varies from 0 to ∞, w varies from −∞ to ∞. From this transformation, we have 0 = 30 λy2 − (2µλ + w2µ2)y + λµ2 and since y assumes positive values we have

2µλ + w2µ2 + p(2µλ + w2µ2)2 − 4λµ2 y = 2λ µ h p i = (2λ + µw2) + w 4µλ + µ2w2 . (3.3.5) 2λ

The density for w then becomes

dy g(w, µ, λ) = h(y, µ, λ) dw r ( 2 ) 3/2 −1 λ −w 2µy = y 2 exp √ 2π 2 λ(y + u) ( ) A −w2 = √ exp (3.3.6) 2π 2

where

h i µ 2 2 p 2 2 2µy 2y 2 2λ (2λ + µw ) + w 4µλ + µ w A = = y = 1 µ h p i y + µ µ + 1 2 2 2 2 µ 2λ (2λ + µw ) + w 4µλ + µ w + 1 h i 2µ (2λ + µw2) + wp4µλ + µ2w2 = (3.3.7) (4λ + µw2) + wp4µλ + µ2w2

Multiplying the numerator and denominator of (3.3.7) by (4λ + µw2) − wp4µλ + µ2w2 and simplifying yields

h w i A = µ 1 + q . (3.3.8) 4λ 2 µ + w

The density g(w, µ, λ) then becomes

( ) 1 h w i −w2 g(w, µ, λ) = √ µ 1 + q exp . (3.3.9) 2π 4λ 2 2 µ + w 31 Since w is one-to-one transformation, R yf(y)dy = R g(w, µ, λ)dw.

Z x Z ∞ yf(y)dy = g(w, µ, λ)dw 0 0 ( ) Z w 1 h t i −t2 = √ µ 1 + q exp dt −∞ 2π 4λ 2 2 µ + t ( ) ( ) Z w µ −t2 Z w µ t −t2 = √ exp dt + √ q exp dt.(3.3.10) −∞ 2π 2 −∞ 2π 4λ 2 2 µ + t

q 4λ 2 Note that to evaluate the second term, we substitute v = µ + t . Simplifying (3.3.10) yields

r r Z x h λ x i h λx i yf(y)dy = µΦ − 1 − µe2λ/µΦ − + 1 (3.3.11) 0 x µ x µ

From (3.3.3) and (3.3.11) we have

r r ! h λ x i h λx i E|x − Y | = 2xF (x) − x + E[Y ] − 2 µΦ − 1 − µe2λ/µΦ − + 1 Y x µ x µ r r ! h λ x i h λx i = 2xF (x) + (µ − x) − 2 µΦ − 1 − µe2λ/µΦ − + 1 , Y x µ x µ where

r r h λx i h λ xi F (x) = Φ − 1 + e2λ/µΦ − 1 + . Y x µ x µ

and Φ is the standard normal (standard Gaussian) distribution CDF.

Table 3.1 compares the mean deviation formulas E|x − Y | of Proposition 3.4 with the corre- sponding means estimated by simulation (n = 105) for varying values of λ. The exact values are in very close agreement with the simulated means. 32 Table 3.1 A comparison of mean deviation formula (3.3.12) and with the corresponding means estimated by simulation

µ λ E|x − Y | sim 2 1 1.473 1.447 2 2 1.287 1.267 2 3 1.467 1.446 2 4 1.009 1.003 2 5 0.890 0.908 2 6 0.828 0.823 p 2 7 1.197 1.197 2 8 0.744 0.731 2 9 0.761 0.768 2 10 0.700 0.704

Proposition 3.5. If X and Y are identically distributed as IG(µ, λ) then the mean difference is

−y2 Z ∞ 8e erf(y) E|X − Y | = µ √ dy (3.3.12) p 2 2 0 π y + 2φ

y p µ √2 R 2 where φ = λ , and erf(y) = π 0 exp(−t )dt

See Girone and D’Uggento (2016) for proof. The closed form expression of E|X − Y | is not known but its values can be numerically computed.

3.4 Energy statistic for standard Half-Normal distribution

From Section 3.3, we noticed that the energy statistics for the IG(µ, λ) relies on numerical approximation. Although, our numerical simulations are close to the actual, it is always convenient to work with a closed form expression of any statistics. For this reason, we derive an alternative formula for testing the IG(µ, λ) hypothesis. From Theorem 1.2, we have that if X ∼ IG(µ, λ),

q λ then the random variable R = | X (X − µ)/µ| is distributed as standard half-normal (SHN). This transformation provides an alternative way of deriving a goodness of fit test for IG(µ, λ). Let F

denote the CDF of X and G denote the CDF of R. Thus, the testing problem H0 : F = IG(µ, λ)

vs H1 : F 6= IG(µ, λ) is equivalent to H0 : G = SHN vs H1 : G 6= SHN. Rejecting

H0 : G = SHN implies H0 : F = IG(µ, λ) is rejected. We therefore derive the energy statistics 33 for testing H0 : G = SHN.

Proposition 3.6. If X and Y are independent and identically distributed SHN and X = x is fixed,

 x  r 2 r 2 −x2  E|x − Y | = 2x erf √ − x − − 2 exp (3.4.1) 2 π π 2

where erf() is the error function.

Proof. The proof is similar to the proof of Proposition 3.4 except that here we will be using the SHN density. From (3.3.3), we only need to find R yf(y)dy using the SHN density. Letting u = −x2/2, we have

Z x r 2 Z x  x2  yf(y)dy = y exp − dy (3.4.2) 0 π 0 2 2 r 2 Z −x /2 = − exp(u)du π 0 r 2  −x2  = 1 − exp π 2

q Substituting F (X) = erf( √x ), E[X] = 2 , and equation (3.4.2) into equation (3.3.3) yields 2 π

 x  r 2 r 2 −x2  E|x − Y | = 2x erf √ − x − + 2 exp . (3.4.3) 2 π π 2

Proposition 3.7. If X and X0 are iid SHN random variable, then √ 2(2 − 2) E|X − X0| = √ . (3.4.4) π 34 Proof.

! x r 2 r 2 −x2  E|X − X0| = E 2x erf(√ ) − x − + 2 exp (3.4.5) 2 π π 2 ! Z ∞ x r 2 r 2 −x2  r 2  x2  = 2x erf(√ ) − x − + 2 exp exp − dy. 0 2 π π 2 π 2

We break this into four components as follows:

Z ∞  x r 2 x2 2 2x erf √ exp(− )dx = √ (3.4.6) 0 2 π 2 π

Z ∞ r 2  x2  r 2 x exp − dx = (3.4.7) 0 π 2 π

r 2 Z ∞ r 2  x2  r 2 exp − dx = (3.4.8) π 0 π 2 π

r 2 Z ∞ r 2 2 2 exp(−x2)dx = √ (3.4.9) π 0 π π

Putting all together, we have

√ 2 r 2 r 2 2 2(2 − 2) E|X − X0| = √ − − + √ = √ (3.4.10) π π π π π

q λ  X−µ  Let Y = µ µ where X ∼ IG(µ, λ). Then testing H0 : X = IG(µ, λ) is equivalent to testing H0 : Y = SHN. Using equation (3.4.3) and (3.4.10), the computing formula for testing 35

H0 : Y = SHN is

√ n   r r  2 ! n  2 X Yi 2 2 −Y 2(2 − 2) 1 X  Q = n 2 Y erf √ − Y − + 2 exp i − √ − |Y −Y | n n i i π π 2 π n2 i j i=1 2 i,j=1 (3.4.11)

Figure 3.1 shows replicates of energy test statistic Qn as shown in (3.4.11) assuming IG(1, 1) model. The shape of the distribution is similar to that of gamma.

Figure 3.1 Sampling distribution of the energy goodness-of-fit statistic for univariate inverse Gaus- sian, n = 50 36

CHAPTER 4 DISTANCE CORRELATION AS A MEASURE OF DEPENDENCE

4.1 Distance correlation

The distance correlation introduced by Szekely´ et al. (2007) is used as measure of dependence between random vectors of arbitrary, but not necessarily equal, dimension. Interest in application and theory of the concept of distance correlation has risen since its emergence. Some of the use cases of distance correlation includes feature screening (Li, Zhong, and Zhu, 2012), measuring non linear dependence in time series (Zhou, 2012), and detecting associations in large astrophysical database (Mart´ınez-Gomez,´ Richards, and Richards, 2014). See Edelmann, Fokianos, and Pitsillou (2019) for an updated review of applications of distance correlation. The classical measure of dependence, the Pearson product moment correlation ρ, only mea- sures the linear dependence between two random variables X and Y . Also, ρ = 0 characterizes independence of X and Y only if (X,Y ) is bivariate normal. Moreover, the Pearson correlation can be zero for dependent variables. For all distributions with finite first moments, the distance correlation R(X,Y ) generalizes the Pearson-product moment correlation in the following ways.

1. R(X,Y ) is defined for arbitrary random vectors X and Y.

2. R(X,Y ) = 0 characterizes independence.

3. R(X,Y ) does not require distributional assumption.

4. R(X,Y ) measures both linear and nonlinear association between two random variables or random vectors.

The distance correlation is bounded between zero and one and equals zero if and and only if the random vectors are independent. Analogous to the Pearson correlation, the distance correlation is computed from other quantities which includes, the distance variance, distance , and distance covariance. These quantities are defined as follows. 37 Definition 4.1. The distance covariance between random vectors X and Y with finite first moment is the non negative number

Z 2 V(X,Y ) = |ϕX,Y (t, s) − ϕX (t)ϕY (s)| w(t, s) dt ds, (4.1.1) Rp+q

where ϕX,Y (t, s) = exp{iht, Xi + ihs, Y i }, ϕX (t) = exp{iht, Xi}, and ϕY (s) = exp{ihs, Y i } are the characteristic functions of the random vectors (X,Y), X and Y respectively, w(t, s) =

1+p 1+q p cpcq|t|p |s|q , | · | is the Euclidean norm in R , p, q denote the Euclidean dimension of X and Y, π(1+k)/2 and thus of s and t, ck = Γ((1+k)/2) , and Γ is the complete gamma function.

The distance covariance defined above is a weighted L2 norm measuring the distance between the joint characteristic function of X and Y and the product of the marginal characteristic functions of X and Y . The weight function w(t, s) is chosen to produce a scale equivariant and rotation

invariant measure that doesn’t go to zero for dependent variables. If E|X|p and E|Y |q are both finite then, (4.1.1) can be expressed as (Szekely´ and Rizzo, 2009)

2 0 0 0 00 0 00 V (X,Y ) = E[|X − X |p|Y − Y |q + E|X − X |pE|Y − Y |q − 2E|X − X ||Y − Y |], (4.1.2)

where X,X0,X00 are iid.

Define the joint empirical characteristic function of the sample {(X1,Y1),..., (Xn,Yn)} as

n 1 X ϕˆ (t, s) = exp{iht, X i + ihs, Y i}, X,Y n k k k=1

and the marginal empirical characteristic functions of the X sample and the Y samples as:

n n 1 X 1 X ϕˆ (t) = exp{iht, X i}, ϕˆ (s) = exp{ihs, Y i} X n k Y n k k=1 k=1

We can define the distance covariance as a measure of distance between ϕˆX (s, t) and ϕˆX (x)ϕ ˆY (t). 38 Thus

Z 2 2 Vn(X,Y ) = ||ϕˆX (s, t) − ϕˆX (s)ϕ ˆY (t)|| = |ϕˆX (s, t) − ϕˆX (s)ϕ ˆY (t)| w(t, s) dt ds (4.1.3) Rp+q

An important result presented by Sz´ekely et al. (2007) is summarised as follows:

Theorem 4.1. If (X, Y) is a random sample from the joint distribution of (X,Y), then

2 ||ϕˆX (s, t) − ϕˆX (s)ϕ ˆY (t)|| = S1 + S2 − 2S3, (4.1.4)

where

n 1 X S = |X − X | |Y − Y | 1 n2 k l p k l q k,l=1 n n 1 X 1 X S = |X − X | |Y − Y | 2 n2 k l p n2 k l q k,l=1 k,l=1 n n 1 X X S = |X − X | |Y − Y | 3 n3 k l p k m q k=1 l,m=1 (4.1.5)

Definition 4.2. The distance variance of a random vector X with a finite first moment, is defined as the nonnegative number

Z 2 V(X) = |ϕˆX,X (t, s) − ϕˆX (t)ϕ ˆX (s)| w(t, s)dt ds (4.1.6) Rp+q provided the integrals is finite.

Let X, X0 and X00 be independent and identically distributed and let E(|X|)2 < ∞. Then the distance variance can be written as

V2(X) = E[|X − X0|2] + [E|X − X0|]2 − 2E|X − X0||X − X00|] (4.1.7) 39 Note that if X ∈ R then one-half of E[|X − X0|2], which is the first term in the right hand side of (4.1.7) equals the variance of X (Edelmann, Richards, and Vogel, 2017):

1 1 E[|X − X0|2] = E(X2 − 2XX0 + X02) = E(X2) − E(X)E(X0) 2 2

The second term (E|X −X0|)2 is the square of the Gini mean difference (Gerstenberger and Vogel, 2015). The following properties of the distance variance hold:

1. V(X) = 0 implies X = E[X], almost surely.

2. V(X) = 0 if and only if every sample realization is identical.

3. If X and Y are independent, then V(X + Y ) ≤ V(X) + V(X) with equality if one of the random vectors X or Y is constant.

4. V(a + bCX) = |b|V(X), for all constant vectors a in Rp, scalars b, and p × p orthonormal matrices C.

Definition 4.3. The nonnegative square root of V(X) is called the distance standard deviation of X

Definition 4.4. The distance correlation between random vectors X and Y with finite first moments is the nonnegative number R(X, Y ) defined by

 2 √ V (X,Y ) , V2(X)V2(Y ) > 0;  2 2 R2(X,Y ) = V (X)V (Y ) (4.1.8)  0, V2(X)V2(Y ) = 0.

4.2 Empirical distance covariance and correlation

The empirical distance covariance and correlation are functions of the double-centered distance matrices $\hat{A}$ and $\hat{B}$ of the random samples $(\mathbf{x}, \mathbf{y}) = \{(x_i, y_i) : i = 1, \dots, n\}$ from the joint distribution of random vectors $X \in \mathbb{R}^p$ and $Y \in \mathbb{R}^q$. To obtain the double-centered matrices, first compute the Euclidean distance matrices $(a_{ij}) = (|x_i - x_j|)$ for the $X$ sample and $(b_{ij}) = (|y_i - y_j|)$ for the $Y$ sample, respectively. The $ij$-th entry of $\hat{A}$ is

$$\hat{A}_{ij} = a_{ij} - \bar{a}_{i.} - \bar{a}_{.j} + \bar{a}_{..}, \qquad i, j = 1, \dots, n,$$

where $\bar{a}_{i.} = \frac{1}{n}\sum_{j=1}^{n} a_{ij}$, $\bar{a}_{.j} = \frac{1}{n}\sum_{i=1}^{n} a_{ij}$, and $\bar{a}_{..} = \frac{1}{n^2}\sum_{i,j=1}^{n} a_{ij}$. Similarly, the $ij$-th entry of $\hat{B}$ is

$$\hat{B}_{ij} = b_{ij} - \bar{b}_{i.} - \bar{b}_{.j} + \bar{b}_{..}, \qquad i, j = 1, \dots, n.$$

Definition 4.5. The empirical distance covariance Vn(X, Y ) is the nonnegative number defined by

$$\mathcal{V}_n^2(X,Y) = \frac{1}{n^2}\sum_{i,j=1}^{n} \hat{A}_{ij}\hat{B}_{ij}. \quad (4.2.1)$$

Similarly, the distance variance Vn(X) is the nonnegative number defined by

$$\mathcal{V}_n^2(X) = \mathcal{V}_n^2(X,X) = \frac{1}{n^2}\sum_{i,j=1}^{n} \hat{A}_{ij}^2. \quad (4.2.2)$$
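As a small illustrative check (not part of the original derivation), the double-centering in (4.2.1)-(4.2.2) can be computed directly in R; the helper names below (dcenter, dcov2) are ours, and the result should agree with the square of energy::dcov applied to the same samples.

# Sketch: empirical squared distance covariance via double-centering (4.2.1)-(4.2.2)
dcenter <- function(m) {
  # double-center a distance matrix: A_ij = a_ij - abar_i. - abar_.j + abar_..
  sweep(sweep(m, 1, rowMeans(m)), 2, colMeans(m)) + mean(m)
}
dcov2 <- function(x, y) {
  A <- dcenter(as.matrix(dist(x)))   # (a_ij) = |x_i - x_j|
  B <- dcenter(as.matrix(dist(y)))   # (b_ij) = |y_i - y_j|
  mean(A * B)                        # (1/n^2) * sum_ij A_ij B_ij
}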

Note that $\mathcal{V}_n^2(X,Y)$ is not an unbiased estimator of $\mathcal{V}^2(X,Y)$, since

$$E\big[\mathcal{V}_n^2(X,Y)\big] = \frac{n-1}{n^3}\Big[(n-2)^2\,\mathcal{V}^2(X,Y) + 2(n-1)\mu - (n-2)\alpha\beta\Big], \quad (4.2.3)$$

where $\alpha = E|X - X'|$, $\beta = E|Y - Y'|$, and $\mu = E\big[|X - X'||Y - Y'|\big]$. An unbiased estimator of $\mathcal{V}^2(X,Y)$ (Székely and Rizzo, 2013) is the inner product of the U-centered distance matrices defined as
$$(\tilde{A}\cdot\tilde{B}) := \frac{1}{n(n-3)}\sum_{i\neq j} \tilde{A}_{i,j}\tilde{B}_{i,j}, \quad (4.2.4)$$
where $\tilde{A}$ is the U-centered matrix with $(i,j)$-th entry

$$\tilde{A}_{i,j} = \begin{cases} a_{ij} - \dfrac{1}{n-2}\displaystyle\sum_{l=1}^{n} a_{lj} - \dfrac{1}{n-2}\displaystyle\sum_{l=1}^{n} a_{il} + \dfrac{1}{(n-1)(n-2)}\displaystyle\sum_{k,l=1}^{n} a_{kl}, & i \neq j; \\[6pt] 0, & i = j. \end{cases} \quad (4.2.5)$$
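Similarly, the U-centering in (4.2.4)-(4.2.5) can be implemented directly; the sketch below assumes $n \geq 4$ and uses illustrative names (ucenter, dcov2U).

# Sketch: U-centered matrices (4.2.5) and the unbiased estimator (4.2.4)
ucenter <- function(m) {
  n <- nrow(m)
  u <- m - outer(rowSums(m), rep(1, n)) / (n - 2) -
           outer(rep(1, n), colSums(m)) / (n - 2) +
           sum(m) / ((n - 1) * (n - 2))
  diag(u) <- 0                        # U-centered matrices have zero diagonal
  u
}
dcov2U <- function(x, y) {
  n <- NROW(x)
  A <- ucenter(as.matrix(dist(x)))
  B <- ucenter(as.matrix(dist(y)))
  sum(A * B) / (n * (n - 3))          # inner product (A~ . B~)
}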

Definition 4.6. The empirical distance correlation between random vectors $X$ and $Y$ with finite first moments is the nonnegative number $\mathcal{R}_n(X, Y)$ defined by

$$\mathcal{R}_n^2(X,Y) = \begin{cases} \dfrac{\mathcal{V}_n^2(X,Y)}{\sqrt{\mathcal{V}_n^2(X)\,\mathcal{V}_n^2(Y)}}, & \mathcal{V}_n^2(X)\mathcal{V}_n^2(Y) > 0; \\[4pt] 0, & \mathcal{V}_n^2(X)\mathcal{V}_n^2(Y) = 0. \end{cases} \quad (4.2.6)$$

The empirical distance covariance and correlation are computationally simple and are implemented in the R package energy (Rizzo and Székely, 2019). Properties of the distance covariance and correlation include (Székely et al., 2007):

1. $\mathcal{V}_n(X,Y)$ and $\mathcal{R}_n(X,Y)$ converge almost surely to $\mathcal{V}(X,Y)$ and $\mathcal{R}(X,Y)$, respectively. Thus, with probability 1,

$$\lim_{n\to\infty}\mathcal{V}_n(X,Y) = \mathcal{V}(X,Y), \qquad \lim_{n\to\infty}\mathcal{R}_n(X,Y) = \mathcal{R}(X,Y).$$

2. If $E(|X|_p + |Y|_q) < \infty$, then $0 \leq \mathcal{R}(X,Y) \leq 1$, and $\mathcal{R}(X,Y) = 0$ if and only if $X$ and $Y$ are independent.

3. Vn(X,Y ) ≥ 0 and Vn(X) = 0 if and only if every sample observation is identical.

4. 0 ≤ Rn(X,Y ) ≤ 1.

5. If Rn(X,Y ) = 1, then there exist a vector a, a nonzero real number b, and an orthogonal matrix C such that Y = a + bXC.

6. R(X,Y ) is scale invariant and also invariant to shift and orthogonal transformations of X and Y.

7. If $p = q = 1$ and $(X, Y)$ has a bivariate standard normal distribution with correlation $\rho$, then $\mathcal{R}(X,Y) \leq |\rho|$ and

$$\mathcal{R}^2(X,Y) = \frac{\rho\arcsin(\rho) + \sqrt{1-\rho^2} - \rho\arcsin(\rho/2) - \sqrt{4-\rho^2} + 1}{1 + \pi/3 - \sqrt{3}}. \quad (4.2.7)$$

Figure 4.1 illustrates the empirical Pearson and distance correlations for different dependence structures. The left graph represents linear dependence between $X$ and $Y$, while the middle and right graphs represent cubic and quadratic dependence between $X$ and $Y$, respectively. In each setting the sample size is fixed at $n = 1000$. The linear relationship is defined as $Y_i = 1.5X_i + e_i$, where $X$ is uniformly distributed from $-3$ to $3$ and $e_i$ is normal with mean 0 and standard deviation 0.5. The quadratic relationship is defined as $Y_i = 0.5X_i^2 + e_i$, where $X_i$ and $e_i$ are simulated from the standard normal and standard uniform distributions, respectively. The cubic trend is obtained from $Y_i = 0.5X_i^3 + 0.9X_i^2 - X_i + e_i$, where $X_i$ is normal with mean 0 and standard deviation 0.75 and $e_i$ is simulated from the standard uniform distribution. The empirical distance correlations for the nonlinear relationships are significantly greater than zero. This is in contrast to the empirical Pearson correlation, which only detects linear association between the random variables $X$ and $Y$.
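The comparison in Figure 4.1 can be reproduced along the following lines; this is a sketch using the dcor function from the energy package, with the sample size and coefficients taken from the description above (the random seed is illustrative).

library(energy)

set.seed(1)
n <- 1000
# Linear: Y = 1.5 X + e, X ~ Unif(-3, 3), e ~ N(0, 0.5)
x1 <- runif(n, -3, 3);   y1 <- 1.5*x1 + rnorm(n, 0, 0.5)
# Quadratic: Y = 0.5 X^2 + e, X ~ N(0, 1), e ~ Unif(0, 1)
x2 <- rnorm(n);          y2 <- 0.5*x2^2 + runif(n)
# Cubic: Y = 0.5 X^3 + 0.9 X^2 - X + e, X ~ N(0, 0.75), e ~ Unif(0, 1)
x3 <- rnorm(n, 0, 0.75); y3 <- 0.5*x3^3 + 0.9*x3^2 - x3 + runif(n)

rbind(linear    = c(pearson = cor(x1, y1), dcor = dcor(x1, y1)),
      quadratic = c(pearson = cor(x2, y2), dcor = dcor(x2, y2)),
      cubic     = c(pearson = cor(x3, y3), dcor = dcor(x3, y3)))

The Pearson correlation is near zero for the quadratic relationship, while the distance correlation remains clearly positive for all three structures.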

4.3 Distance covariance/correlation goodness-of-fit test

Given random vectors $X$ and $Y$, let $f_X$ and $f_Y$ denote the characteristic functions of $X$ and $Y$, respectively, and let $f_{XY}$ denote the joint characteristic function of $X$ and $Y$. A hypothesis of independence between $X$ and $Y$ can be formulated in terms of the characteristic functions as

$$H_0: f_{XY} = f_X f_Y \quad \text{vs} \quad H_1: f_{XY} \neq f_X f_Y.$$

Figure 4.1 Illustration of empirical Pearson product moment correlation and distance correlation for different dependence structures.

The distance correlation can be used as a measure of dependence between $X$ and $Y$. A test based on the statistic $n\mathcal{V}_n^2(X,Y)$, or the normalized version $n\mathcal{V}_n^2(X,Y)/S_2$, rejects the null hypothesis of independence between $X$ and $Y$ for large values of $n\mathcal{V}_n^2(X,Y)$. Under the null hypothesis of independence between $X$ and $Y$, $n\mathcal{V}_n^2(X,Y)$ converges in distribution to a quadratic form

$$Q = \sum_{j=1}^{\infty} \lambda_j Z_j^2, \quad (4.3.1)$$

where the $Z_j$ are iid standard normal random variables and the $\lambda_j$ are nonnegative coefficients that depend on the distribution of the underlying random variables. When $X$ and $Y$ are dependent, $n\mathcal{V}_n^2(X,Y) \to \infty$ stochastically as $n \to \infty$. Hence a test that rejects independence for large values of $n\mathcal{V}_n^2(X,Y)$ is consistent against dependent alternatives (Székely et al., 2007). The energy package (Rizzo and Székely, 2019) contains the functions dcor and dcor.test for computation and significance testing of the distance correlation coefficient.

4.4 Distance covariance test for the inverse Gaussian

We propose a new goodness-of-fit test for the inverse Gaussian distribution using the distance covariance. Let $X_1, X_2, \dots, X_n$ be an iid random sample. We wish to test whether the sample is from the inverse Gaussian family:

H0 : The sampled population has inverse Gaussian distribution,

H1 : The sampled population does not have inverse Gaussian distribution

On the basis of Khatri's independence characterization of the inverse Gaussian distribution (Khatri, 1962), the above hypothesis is equivalent to testing the independence of $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ and $V = n^{-1}\sum_{i=1}^{n}(1/X_i - 1/\bar{X})$. However, we only have one sample to work with. One way to obtain several replicates of $\bar{X}$ and $V$ is to use a leave-one-out algorithm; however, these replicates are not independent. We instead suggest dividing the sample into non-overlapping blocks, each of size $k$. Since Khatri's theorem is valid for $n \geq 2$, we recommend $k = 2$ to obtain as many replicates of $\bar{X}$ and $V$ as possible when the sample size is moderate. With this choice, we end up working with $n/2$ bivariate replicates. One can increase the block size for sufficiently large sample sizes. The consistent test of independence for the bivariate replicates $(\bar{X}, V)$ is based on the distance covariance $Q_v = n\mathcal{V}_n^2$. The test is implemented as a nonparametric permutation test.

CHAPTER 5 EMPIRICAL RESULTS

In this chapter, the empirical power results for the energy test and the distance covariance goodness-of-fit test are presented. The critical values are estimated via Monte Carlo simulation.

5.1 Composite hypothesis

We focus on testing whether a random sample is from the inverse Gaussian family. For this type of hypothesis testing problem, the parameters of the inverse Gaussian distribution are not specified and must be treated as unknown. However, the unknown parameters can be estimated by some appropriate method. This type of hypothesis testing problem is called a composite hypothesis. From Theorem 1.2 we know that if $X \sim IG(\mu, \lambda)$, then $Y = |\sqrt{\lambda/X}\,(X - \mu)/\mu| \sim$ SHN, the standard half-normal distribution. Thus, we obtain a standard half-normal random variable irrespective of the parameter values $\mu$ and $\lambda$. In testing the inverse Gaussian composite hypothesis, we substitute the unknown parameters

with the maximum likelihood estimators of $(\mu, \lambda)$, which are $\hat{\mu} = \bar{X}$ and $\hat{\lambda} = n/\sum_{i=1}^{n}(X_i^{-1} - \bar{X}^{-1})$, and then proceed as if the parameters were fully specified. Consequently, if $Z = |\sqrt{\hat{\lambda}/X}\,(X - \hat{\mu})/\hat{\mu}|$, then $F(Z \mid \mu = \hat{\mu}, \lambda = \hat{\lambda})$ is approximately standard half-normal. We then test the hypothesis that, given the estimated parameters, $Y$ is from the standard half-normal distribution, which is equivalent to testing the hypothesis that $X$ is from the inverse Gaussian family. The energy test statistic for testing the composite inverse Gaussian hypothesis is

$$Q_n = n\left[\frac{2}{n}\sum_{i=1}^{n}\left(2Y_i\,\mathrm{erf}\!\left(\frac{Y_i}{\sqrt{2}}\right) - Y_i - \sqrt{\frac{2}{\pi}} + 2\sqrt{\frac{2}{\pi}}\exp\!\left(\frac{-Y_i^2}{2}\right)\right) - \frac{2(2-\sqrt{2})}{\sqrt{\pi}} - \frac{2}{n^2}\sum_{k=1}^{n}(2k-1-n)\,Y_{(k)}\right], \quad (5.1.1)$$

where $Y_i$ are the transformed sample values and $Y_{(k)}$ are the ordered transformed values.

5.2 Simulation design

Using the bootstrap approach, the energy test and the EDF tests can be implemented as follows (an R sketch of this procedure is given after the list):

For any positive realizations $\{X_i\}_{i=1}^{n}$, let $S = S_n(X_1, \dots, X_n)$ denote one of the test statistics:

Qn(X1, ··· ,Xn), ADn(X1, ··· ,Xn), KSn(X1, ··· ,Xn), and CVMn(X1, ··· ,Xn).

1. Generate independent samples X1, ··· ,Xn of size n from the alternative distribution.

2. Compute the maximum likelihood estimators µˆ and λˆ

3. Transform the samples using $Y_i = \left|\sqrt{\hat{\lambda}/X_i}\,(X_i - \hat{\mu})/\hat{\mu}\right|$ and compute the value $S_n(Y_1, \dots, Y_n)$ of the test statistic.

4. Generate $b$ independent bootstrap samples, each of size $n$, from the standard inverse Gaussian distribution.

5. For each bootstrap sample $b$, compute $\hat{\mu}^*$, $\hat{\lambda}^*$, $Y_i^*$, and $S^* = S_n(Y_1^*, \dots, Y_n^*)$.

6. Let $C^*_{1-\alpha}$ be the empirical $1-\alpha$ quantile of $S^*_1, \dots, S^*_b$.

7. Reject $H_0$ if $S_n(Y_1, \dots, Y_n) > C^*_{1-\alpha}$.
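The following is a minimal R sketch of steps 1-7 for the energy statistic, assuming the energystat function listed in Appendix A (which requires erf from the VGAM package) and rinvGauss from SuppDists; the Weibull(1, 1) alternative is just one of the alternatives used in the power study.

library(SuppDists)   # rinvGauss
library(VGAM)        # erf, used inside energystat (Appendix A)

ig_energy_boot <- function(n, b = 500, alpha = 0.10) {
  x <- rweibull(n, shape = 1, scale = 1)                  # step 1: alternative sample
  mu.hat <- mean(x)                                       # step 2: MLEs
  lambda.hat <- n / sum(1/x - 1/mu.hat)
  y <- abs(sqrt(lambda.hat / x) * (x - mu.hat) / mu.hat)  # step 3: half-normal transform
  Q.obs <- energystat(y)
  Q.star <- replicate(b, {                                # steps 4-5: bootstrap under H0
    xs <- rinvGauss(n, nu = 1, lambda = 1)
    ms <- mean(xs); ls <- n / sum(1/xs - 1/ms)
    energystat(abs(sqrt(ls / xs) * (xs - ms) / ms))
  })
  Q.obs > quantile(Q.star, 1 - alpha)                     # steps 6-7: reject H0?
}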

The power of a test is the probability of rejecting a false null hypothesis. In our problem, the null hypothesis is that the random sample is from an inverse Gaussian distribution, and the alternative hypothesis is that the random sample is from some other specified distribution. The power of each test is estimated by dividing the number of times the null is rejected by the number of runs. If the null is true, the empirical Type I error rate should coincide with the specified level of significance. In our simulations we perform 500 bootstrap replicates and 5000 Monte Carlo runs. All simulation experiments were executed using R.

The distance covariance test is implemented as a randomization test. Let $Z = (\bar{X}_p, V_q)$ be the $(p+q)$-dimensional sample with $w_1$ and $w_2$ as row labels of $\bar{X}$ and $V$, respectively, and let $(Z, w_1, w_2)$ be the sample from the joint distribution of $\bar{X}$ and $V$. For the independence test, a permutation replicate is generated by permuting either $w_1$ or $w_2$. The achieved significance level of the permutation test is obtained as follows. Let $\hat{\theta}$ be the distance covariance test statistic.

1. Compute the observed test statistic $\hat{\theta}(Z, w_1, w_2)$.

2. For each replicate, indexed b = 1, ··· ,B:

(a) Generate a random permutation πb = π(w1)

(b) Compute the statistic $\hat{\theta}^{(b)} = \hat{\theta}^*(Z, \pi_b)$.

3. The achieved level of significance (ASL) is computed as follows:

$$\hat{p} = \frac{1 + \sum_{b=1}^{B} I\big(\hat{\theta}^{(b)} \geq \hat{\theta}\big)}{B + 1}.$$

We can now test the independence of $\bar{X}$ and $V$. The permutation test version of the distance covariance goodness-of-fit test is implemented as follows (an R sketch is given after the list).

1. Partition the sample into non-overlapping blocks, each of size 2. Thus, for $X_1, \dots, X_n$ let

$$Y_1 = (X_1, X_2),\; Y_2 = (X_3, X_4),\; Y_3 = (X_5, X_6),\; \dots,\; Y_{n/2} = (X_{n-1}, X_n).$$

2. For each sub-sample $Y_1, Y_2, \dots, Y_{n/2}$ compute $(\bar{X}_1, V_1), (\bar{X}_2, V_2), \dots, (\bar{X}_{n/2}, V_{n/2})$.

3. Compute the distance covariance test statistic $\frac{n}{2}\mathcal{V}^2(\bar{X}, V)$.

4. Find the ASL using the procedure above.

5. Reject H0 at significance level α if pˆ ≤ α
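A minimal R sketch of steps 1-5, equivalent to the tst1 function in Appendix A, is given below; it uses dcor.test from the energy package to obtain the permutation p-value.

library(energy)

ig_dcov_test <- function(x, R = 999, alpha = 0.10) {
  m <- floor(length(x) / 2)
  blocks <- matrix(x[1:(2 * m)], ncol = 2, byrow = TRUE)      # step 1: blocks of size 2
  xbar <- rowMeans(blocks)                                    # step 2: block means
  v <- apply(blocks, 1, function(b) mean(1/b) - 1/mean(b))    #         block statistic V
  out <- dcor.test(xbar, v, R = R)                            # steps 3-4: permutation test
  list(p.value = out$p.value, reject = out$p.value <= alpha)  # step 5
}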

5.3 Empirical significance

This section presents the empirical test size results for the various tests. The empirical results are shown for the $IG(\mu, \lambda)$ distribution with $\mu = 1$ and $\lambda = 0.5, 1, 2, 4, 8, 20$. Sample sizes of 10 and 20 are considered. Table 5.1 shows the empirical results for $\alpha = 0.1$. The results indicate that our proposed methods, the energy test and the distance covariance test, control the nominal significance level. Except in the case of $IG(1, 20)$ with $n = 10$, where the estimated test size is slightly higher for the Kolmogorov-Smirnov and Anderson-Darling tests, all EDF tests also control the nominal significance level.

Table 5.1 Empirical estimates of size of test with α = 0.10

Distribution   n    Energy   DC       KS       AD       CVM
IG(1, 0.5)     10   0.1020   0.0950   0.1074   0.1016   0.1008
IG(1, 0.5)     20   0.1074   0.0978   0.0990   0.0966   0.1024
IG(1, 1)       10   0.0960   0.0998   0.1000   0.0950   0.0962
IG(1, 1)       20   0.1004   0.0998   0.0972   0.0962   0.0988
IG(1, 2)       10   0.1026   0.0944   0.1012   0.1000   0.1024
IG(1, 2)       20   0.1052   0.1076   0.1072   0.1046   0.1056
IG(1, 4)       10   0.1010   0.0972   0.1024   0.1040   0.1070
IG(1, 4)       20   0.1096   0.0966   0.1044   0.0990   0.1004
IG(1, 8)       10   0.1048   0.0950   0.1048   0.1060   0.1076
IG(1, 8)       20   0.1040   0.1012   0.1074   0.1066   0.1076
IG(1, 20)      10   0.0998   0.1010   0.1120   0.1110   0.1088
IG(1, 20)      20   0.0940   0.1008   0.0968   0.0968   0.0980

5.4 Empirical power comparison

In this section we compare the empirical power of the proposed method with other tests. We consider several alternative distributions including those that are similar to the inverse Gaussian distribution. For the power simulations, the following alternatives are considered as examples.

Example 5.1 (Empirical Power of testing IG(µ, λ) against Weibull alternative).

The Weibull distribution is one of the most widely used alternatives to the inverse Gaussian distribution. By definition, the Weibull density $W(\lambda, \kappa)$ with scale parameter $\lambda$ and shape parameter $\kappa$ is given by

$$f(x; \lambda, \kappa) = \begin{cases} \dfrac{\kappa}{\lambda}\left(\dfrac{x}{\lambda}\right)^{\kappa-1} e^{-(x/\lambda)^{\kappa}}, & x \geq 0 \\[4pt] 0, & x < 0 \end{cases} \quad (5.4.1)$$

For this example, we consider a fixed scale parameter, λ = 1 and then vary the shape parameter κ. The empirical power of the various tests for the composite inverse Gaussian hypothesis are shown in Figures 5.1 to 5.4 . We see that the energy test competes with the EDF tests and performs really

Figure 5.1 Estimated power of tests against Weibull(1, κ) with varying shape parameter κ, n = 20, 50 and α = 0.1

well when the shape parameter is small (κ = 1, 2). However, these tests become less powerful when the shape parameter value is high. On the other hand, the distance correlation based test is more powerful for high values of the shape parameter and less powerful for small values of the shape parameter.

Figure 5.2 Estimated power of tests against Weibull(1, 1) with varying sample size and α = 0.1

Figure 5.3 Estimated power of tests against Weibull(1, 2) with varying sample size and α = 0.1

Figure 5.4 Estimated power of tests against Weibull(1, 3) with varying sample size and α = 0.1

Example 5.2 (Empirical Power of testing IG(µ, λ) against Beta alternative).

The beta density is defined as

$$f(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 < x < 1, \quad (5.4.2)$$

where $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ and $\Gamma$ is the gamma function. For this alternative we consider $\alpha = 1, 1.5, 2, 2.5, 3, 4, 6$ and $\beta = 1$. Figures 5.5 to 5.8 show the empirical power results for testing the composite inverse Gaussian hypothesis against the Beta alternative for the various tests under different settings. These plots show a similar pattern to that observed for the Weibull alternative: the energy test competes with the EDF tests, and these tests are powerful (with the energy test being the most powerful) for small shape parameter values. However, it is apparent that these tests perform well as the sample size increases, which was not the case for the Weibull alternative. The distance correlation based test outperforms the EDF and energy tests for large shape parameter values.

Figure 5.5 Estimated power of tests against Beta(α, 1) with varying shape parameter α, n = 20, 50, 100 and α = 0.1

Figure 5.6 Estimated power of tests against Beta(1, 1) with varying sample size and α = 0.1

Figure 5.7 Estimated power of tests against Beta(2, 1) with varying sample size and α = 0.1

Figure 5.8 Estimated power of tests against Beta(3, 1) with varying sample size and α = 0.1

Example 5.3 (Empirical Power of testing IG(µ, λ) against Pareto alternative).

The Pareto density with scale parameter η and shape parameter θ is defined as

$$f(x; \eta, \theta) = \frac{\theta\eta^{\theta}}{x^{\theta+1}}, \quad x \geq \eta. \quad (5.4.3)$$

For the Pareto alternative, we fix the scale parameter at 1 and vary the shape parameter. The empirical power values are shown in Figures 5.9 to 5.12. It can be observed that the distance correlation based test outperforms the other tests for shape parameter values of at least 2. For θ = 1, the energy test compares favourably to the other tests. We also notice, just as in the case of the Beta alternative, that the power of each test improves as the sample size gets large.

Figure 5.9 Estimated power of tests against Pareto(1, θ) with varying shape parameter θ, n = 20, 50, 100 and α = 0.1

Figure 5.10 Estimated power of tests against Pareto(1, 1) with varying sample size and α = 0.1

Figure 5.11 Estimated power of tests against Pareto(1, 2) with varying sample size and α = 0.1

Figure 5.12 Estimated power of tests against Pareto(1, 3) with varying sample size and α = 0.1

Example 5.4 (Empirical Power of testing IG(µ, λ) against Gamma alternative).

The gamma density function with parameters α and β is defined as

$$f(x; \alpha, \beta) = \frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}, \quad x > 0,\; \alpha, \beta > 0, \quad (5.4.4)$$

where $\Gamma(\alpha)$ is the gamma function. Figures 5.13 to 5.17 show the empirical power values. The energy test outperforms the other tests for the alternatives Gamma(1,1) and Gamma(2,1). For a sample size of at least 50, the distance correlation based test compares favourably to the other tests when the shape parameter is large, as shown in the case of Gamma(4,1).

Figure 5.13 Estimated power of tests against Gamma(α, 1) with varying shape parameter α, n = 20, 50, 100 and α = 0.1

Figure 5.14 Estimated power of tests against Gamma(1, 1) with varying sample size and α = 0.1

Figure 5.15 Estimated power of tests against Gamma(2, 1) with varying sample size and α = 0.1

Figure 5.16 Estimated power of tests against Gamma(3, 1) with varying sample size and α = 0.1

Figure 5.17 Estimated power of tests against Gamma(4, 1) with varying sample size and α = 0.1

Example 5.5 (Empirical Power of testing IG(µ, λ) against Log-normal alternative).

The lognormal density is defined as

$$f(x; \mu, \sigma) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\!\left(\frac{-(\ln x - \mu)^2}{2\sigma^2}\right). \quad (5.4.5)$$

Figure 5.18 shows that the energy test competes with the EDF tests. However, the distance correlation based test performs poorly compared to the other tests; its empirical power values are usually around the nominal significance level. This suggests that the distance correlation test is unable to distinguish between the log-normal and the inverse Gaussian distributions.

Figure 5.18 Estimated power of tests against LN(1, σ) for different values of σ, n = 20, 50, 100 and α = 0.1

Figure 5.19 Estimated power of tests against LN(1, 1) with varying sample size and α = 0.1

Figure 5.20 Estimated power of tests against LN(1, 2) with varying sample size and α = 0.1

5.5 Applications to real data

In this section we apply the proposed methods to two different data sets. The procedure for determining the p-values is described as follows:

1. Compute the maximum likelihood estimates of the parameters of the inverse Gaussian distribution and the test statistic $\hat{\theta}$.

2. For each bootstrap replicates, indexed b = 1, ··· ,B

a) Generate a sample $x^{*(b)}$ of size $n$ from the inverse Gaussian distribution with the parameters estimated in the first step.

b) Compute the test statistic $\hat{\theta}_b$ for the $b$th replicate.

3. Compute the p-value as $\frac{1}{B}\sum_{b=1}^{B} I\big(\hat{\theta}_b \geq \hat{\theta}\big)$.

The first data set consists of active repair times (hours) for an airborne communication transceiver; see Chhikara and Folks (1989) and references therein. We wish to check whether the inverse Gaussian model fits these data. Figure 5.21 shows the density plot of the data. The maximum likelihood estimates of $\mu$ and $\lambda$ are $\hat{\mu} = 3.606522$ and $\hat{\lambda} = 1.658853$. The p-values for the various tests are summarized in Table 5.3. At the 10% level of significance, the empirical p-values indicate that the inverse Gaussian distribution fits the data.

The second data set, obtained from Ang and Tang (1975), consists of 25 measurements of precipitation (in inches) at the Jug Bridge, Maryland. Folks and Chhikara (1978) suggested the inverse Gaussian model. Table 5.4 shows the data, while Figure 5.22 shows the distribution of the data. The parameter estimates are $\hat{\mu} = 2.1556$ and $\hat{\lambda} = 8.081986$. Except for the Kolmogorov-Smirnov test, all of the tests are significant at the 10% level of significance, so the claim that the data come from the inverse Gaussian law is questionable.

Table 5.2 Active repair times (hours) for an airborne communication transceiver.

0.2 0.3 0.5 0.5 0.5 0.5 0.6 0.6 0.7 0.7 0.7 0.8 0.8 1.0 1.0 1.0 1.0 1.1 1.3 1.5 1.5 1.5 1.5 2.0 2.0 2.2 2.5 2.7 3.0 3.0 3.3 3.3 4.0 4.0 4.5 4.7 5.0 5.4 5.4 7.0 7.5 8.8 9.0 10.3 22.0 24.5
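As a small check (a sketch only), the reported maximum likelihood estimates for the repair-times data can be reproduced in R from the values in Table 5.2:

repair <- c(0.2, 0.3, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.7, 0.7, 0.7, 0.8, 0.8,
            1.0, 1.0, 1.0, 1.0, 1.1, 1.3, 1.5, 1.5, 1.5, 1.5, 2.0, 2.0, 2.2,
            2.5, 2.7, 3.0, 3.0, 3.3, 3.3, 4.0, 4.0, 4.5, 4.7, 5.0, 5.4, 5.4,
            7.0, 7.5, 8.8, 9.0, 10.3, 22.0, 24.5)
mu.hat     <- mean(repair)                               # reported as 3.606522
lambda.hat <- length(repair) / sum(1/repair - 1/mu.hat)  # reported as 1.658853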

Table 5.3 p-values for the various tests.

Data            Energy   DCor     AD       CvM      KS
Transceiver     0.7808   0.3080   0.8259   0.8554   0.9036
Precipitation   0.0651   0.0310   0.0843   0.0657   0.1659

Figure 5.21 Active repair times (hours) for an airborne communication transceiver.

Table 5.4 Precipitation (in inches) at Jug Bridge, Maryland.

1.01 1.11 1.13 1.15 1.16 1.17 1.17 1.20 1.52 1.54 1.54 1.57 1.64 1.73 1.79 2.09 2.09 2.57 2.75 2.93 3.19 3.54 3.57 5.11 5.62

Figure 5.22 Precipitation (in inches) at Jug Bridge, Maryland.

CHAPTER 6 OTHER APPLICATIONS OF THE DISTANCE COVARIANCE BASED GoF TEST

6.1 Introduction

In this chapter, other applications of the distance correlation based goodness-of-fit test are considered. Specifically, the method is applied to the Pareto population. A summary of distributions characterized by the independence of two statistics is presented.

6.2 Characterization of univariate distribution by independence of two statistics

Consider a random sample $X_1, \dots, X_n$ from a population with cumulative distribution function $F_X$. Let $T_1 = T_1(X_1, \dots, X_n)$ and $T_2 = T_2(X_1, \dots, X_n)$ be two statistics, each of which could be a linear or second-degree polynomial statistic. For some populations, the independence of $T_1$ and $T_2$ determines the population. We have already indicated in Chapter 1 that the normal and the inverse Gaussian populations have such characterizations. Other distributions with such characterizations include the gamma, exponential, and Pareto. Recall that, in the case of the normal population, the two statistics are the sample mean and variance. This characterization result, due to Geary (1936), is summarized in the theorem below.

Theorem 6.1 (Normal). Let $X_1, X_2, \dots, X_n$ be a sample from a population, with sample mean $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ and sample variance $S^2 = n^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. A necessary and sufficient condition for the normality of the population is the stochastic independence of $\bar{X}$ and $S^2$.

Fisher (1925) proved the necessity of the result, while Geary (1936) proved the sufficiency by assuming moments of all orders. Lukacs (1942) established the above result by assuming the existence of only the first two moments. A generalization of Theorem 6.1 replaces the variance with symmetric and homogeneous polynomial statistics of higher degree, otherwise known as the k-statistic of order p (Lukacs, 1956).

Theorem 6.2. Let $X_1, X_2, \dots, X_n$ be a sample of size $n$ from a population with distribution function $F(x)$. Assume that the $p$th ($p > 1$) moment of $F(x)$ exists. The population is normal if, and only if, the k-statistic of order $p$ is independent of the sample mean.

For a proof see Basu and Laha (1954).

A similar result holds for the gamma distribution. Let $X_1$ and $X_2$ be iid from the gamma density. The gamma population has the property that $X_1 + X_2$ and $X_1/X_2$ are independent (Pitman, 1937). Lukacs (1955) used this property to establish a characterization result for the gamma population similar to that for the normal. Theorem 6.3 summarizes Lukacs's result.

Theorem 6.3. Let X and Y be two nondegenerate and positive random variables, and assume that they are independently distributed. The random variables U = X + Y and V = X/Y are independent if, and only if, both X and Y have Gamma distributions with the same scale parameter.

Tata (1969) characterized the exponential distribution by the independence of $X_{(1)}$ and $X_{(2)} - X_{(1)}$. The result is given in the following theorem.

Theorem 6.4. Let $\{X_n, n \geq 1\}$ be i.i.d. with common distribution function $F(x)$ which is absolutely continuous. Then for the $X_k$ to have an exponential distribution it is necessary and sufficient that $X_{(1)}$ and $X_{(2)} - X_{(1)}$ are independent.

Ahsanullah and Kabir (1973) characterized the Pareto distribution by the independence of the $r$th order statistic $X_{(r)}$ and the ratio $X_{(s)}/X_{(r)}$.

Theorem 6.5. Let $X$ be a random variable having an absolutely continuous (with respect to Lebesgue measure) distribution $F(x)$. A necessary and sufficient condition that $X$ follows the Pareto distribution is that, for some $r$ and $s$ ($1 \leq r \leq s \leq n$), the statistics $X_{(r)}$ and $X_{(s)}/X_{(r)}$ are independent.

These characterization results provide a flexible way of testing the hypothesis that a random sample is from any of the distributions above. In the subsequent sections we restrict our attention to the independence characterization of the Pareto distribution. The distance correlation based goodness-of-fit test is used to test the hypothesis that a random sample is from the Pareto family.

6.3 Pareto distribution

A random variable X is said to have Pareto distribution of the first type with scale parameter σ > 0 and shape parameter α > 0 if its pdf is

$$P(\sigma, \alpha): \quad f(x; \sigma, \alpha) = \frac{\alpha\sigma^{\alpha}}{x^{\alpha+1}}, \quad x \geq \sigma > 0. \quad (6.3.1)$$

The survival function is given by $\bar{F}(x) = (\sigma/x)^{\alpha}$, $x \geq \sigma > 0$. The shape parameter $\alpha$, which is the tail index, measures the heaviness of the upper tail. Figure 6.1 shows the density of the Pareto distribution for various shape parameter values; small values of $\alpha$ indicate heavier tails. The moment generating function $E[e^{tX}]$ is defined only for $t \leq 0$. The raw moments of the Pareto distribution are given by
$$E[X^k] = \frac{\alpha\sigma^k}{\alpha - k}, \quad \alpha > k. \quad (6.3.2)$$

The mean and variance are $E[X] = \frac{\alpha\sigma}{\alpha-1}$ (for $\alpha > 1$) and $\operatorname{Var}(X) = \frac{\alpha\sigma^2}{(\alpha-1)^2(\alpha-2)}$ (for $\alpha > 2$), respectively.

6.4 Goodness-of-fit test for the Pareto distribution

We test the composite hypothesis that a random sample is from the Pareto family. On the basis of Ahsanullah and Kabir's (1973) independence characterization of the Pareto distribution, we propose a distance correlation based goodness-of-fit test. Thus, we test the independence of the statistics $X_{(r)}$ and $X_{(s)}/X_{(r)}$ for some $r$ and $s$ ($1 \leq r \leq s \leq n$), where $X_{(r)}$ is the $r$th order statistic. The distance correlation is used as the measure of dependence; a distance correlation of zero indicates that the two statistics are independent and hence that the sample is from the Pareto family. The simulation procedure for the distance correlation based goodness-of-fit test is similar to the one described in Section 5.2 (an R sketch is given after the list below).

1. Partition the sample into non-overlapping blocks, each of size 2. Thus, for $X_1, \dots, X_n$ let

$$Y_1 = (X_1, X_2),\; Y_2 = (X_3, X_4),\; Y_3 = (X_5, X_6),\; \dots,\; Y_{n/2} = (X_{n-1}, X_n).$$

2. For each sub-sample $Y_1, Y_2, \dots, Y_{n/2}$ compute $(s_1, t_1), (s_2, t_2), \dots, (s_{n/2}, t_{n/2})$, where $s_i = Y_{(i1)}$ and $t_i = Y_{(i2)}/Y_{(i1)}$ for $i = 1, 2, \dots, n/2$.

Figure 6.1 Pareto Type I probability density functions for various shape parameter values with σ fixed at 1

3. Compute the distance covariance test statistic $\frac{n}{2}\mathcal{V}^2(S, T)$.

4. Find the ASL using the permutation test described in section 5.2

5. Reject H0 at significance level α if pˆ ≤ α
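A minimal R sketch of this blocked procedure for the Pareto characterization is given below; it mirrors the inverse Gaussian version in Section 5.2, again using dcor.test from the energy package, and the function name is illustrative.

library(energy)

pareto_dcor_test <- function(x, R = 999, alpha = 0.10) {
  m <- floor(length(x) / 2)
  blocks <- matrix(x[1:(2 * m)], ncol = 2, byrow = TRUE)   # step 1: blocks of size 2
  s <- pmin(blocks[, 1], blocks[, 2])                      # s_i = Y_(i1), block minimum
  t <- pmax(blocks[, 1], blocks[, 2]) / s                  # t_i = Y_(i2) / Y_(i1)
  out <- dcor.test(s, t, R = R)                            # steps 3-4: permutation test
  list(p.value = out$p.value, reject = out$p.value <= alpha)
}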

The distance correlation test is compared with the energy test and other EDF tests. For the energy test we follow the procedure described in Rizzo (2009). Note that if $X \sim \mathrm{Pareto}(\sigma, \alpha)$, then $T = \log(X)$ has a two-parameter exponential distribution. Hence, testing the hypothesis that $X \sim \mathrm{Pareto}(\sigma, \alpha)$ is equivalent to testing that $T \sim \mathrm{Exp}(\mu, \alpha)$, where $\mu = \log(\sigma)$ is the location parameter and $\alpha$ is the rate parameter. The energy statistic is

$$Q_n = n\left[\frac{2}{n}\sum_{i=1}^{n} E|x_i - X| - E|X - X'| - \frac{1}{n^2}\sum_{i,j=1}^{n}|x_i - x_j|\right], \quad (6.4.1)$$
where $X$ and $X'$ are iid copies. If $X$ and $Y$ are independent and identically distributed $\mathrm{Exp}(\mu, \alpha)$ and $X = x$ is fixed,

$$E|x - Y| = \int_{0}^{\infty} |x - y|\, f(y)\, dy = 2xF_Y(x) - x + E[Y] - 2\int_{0}^{x} y f(y)\, dy. \quad (6.4.2)$$

Using equation 6.4.2, we have

$$E|x - X| = x - \mu + \frac{1}{\alpha}\big(1 - 2F_X(x)\big), \quad x \geq \mu, \quad (6.4.3)$$
and
$$E|X - X'| = 1/\alpha. \quad (6.4.4)$$

Plugging the above expected values into equation (6.4.1), we obtain the computing formula for the test statistic:

$$Q_n = n\left[\frac{2}{n}\sum_{i=1}^{n}\left(x_i - \mu + \frac{1}{\alpha}\big(1 - 2F_X(x_i)\big)\right) - \frac{1}{\alpha} - \frac{1}{n^2}\sum_{i,j=1}^{n}|x_i - x_j|\right]. \quad (6.4.5)$$

The simulation design for the energy test is described below; 5,000 simulation runs are performed to estimate the empirical power and Type I error rate. An R sketch of the resulting test statistic follows the list.

1. Generate independent samples X1, ··· ,Xn of size n from the alternative distribution.

2. Compute the maximum likelihood estimators $\hat{\alpha} = n\left[\sum_j \log(X_j/\hat{\sigma})\right]^{-1}$, $\hat{\sigma} = X_{(1)}$, and $\hat{\mu} = \log\hat{\sigma}$.

3. Transform the samples using $T_i = \log(X_i)$ and compute the value $S_n(T_1, \dots, T_n)$ of the test statistic.

4. Generate $b$ independent bootstrap samples, each of size $n$, from $\mathrm{Pareto}(\hat{\sigma}, \hat{\alpha})$.

5. For each bootstrap sample $b$, compute $\hat{\mu}^*$, $\hat{\alpha}^*$, $\hat{\sigma}^*$, $T_i^*$, and $S^* = S_n(T_1^*, \dots, T_n^*)$.

6. Let $C^*_{1-\alpha}$ be the empirical $1-\alpha$ quantile of $S^*_1, \dots, S^*_b$.

7. Reject $H_0$ if $S_n(T_1, \dots, T_n) > C^*_{1-\alpha}$.
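A sketch of steps 2-3, including the computing formula (6.4.5) evaluated at the maximum likelihood estimates, is given below; the function name is illustrative.

pareto_energy_stat <- function(x) {
  n <- length(x)
  sigma.hat <- min(x)                           # step 2: sigma-hat = X_(1)
  alpha.hat <- n / sum(log(x / sigma.hat))      #         alpha-hat
  mu.hat <- log(sigma.hat)                      #         mu-hat = log(sigma-hat)
  t <- log(x)                                   # step 3: T_i = log(X_i)
  Fx <- 1 - exp(-alpha.hat * (t - mu.hat))      # Exp(mu, alpha) cdf evaluated at T_i
  term1 <- (2 / n) * sum(t - mu.hat + (1 - 2 * Fx) / alpha.hat)  # average of E|T_i - T|
  term2 <- 1 / alpha.hat                                          # E|T - T'|
  term3 <- mean(as.matrix(dist(t)))                               # (1/n^2) sum |T_i - T_j|
  n * (term1 - term2 - term3)                   # statistic (6.4.5)
}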

In this section, we compare the distance correlation based goodness-of-fit test and the energy test with the Kolmogorov-Smirnov and Cramér-von Mises tests. The comparison is done for sample sizes $n = 10, 20, 50$, and 100 at the 10% level of significance. The following alternative distributions are considered: log-normal(0, 1), Weibull(1,1), Weibull(2,1), and Gamma(2,1). Figure 6.2 shows the empirical Type I error rate; all tests control the Type I error rate reasonably well. Figures 6.3 to 6.6 show the empirical power values for these alternatives. For all the alternatives considered, the energy test outperforms the other tests for small sample sizes. With sample size 100, the energy test and the other two EDF tests are competitive. The distance correlation based test achieves low power compared to the other tests; however, its power improves reasonably well with increasing sample size. Unlike in the composite inverse Gaussian hypothesis, the distance correlation based test is able to distinguish between the Pareto distribution and the log-normal distribution.

Figure 6.2 Empirical Type I error rate of testing composite Pareto hypothesis with varying shape parameter, α = 0.10, n = 20.

Figure 6.3 Empirical power of testing composite Pareto hypothesis against Lognormal(0,1) alternative with varying sample size, α = 0.10

Figure 6.4 Empirical power of testing composite Pareto hypothesis against Weibull(1,1) alternative with varying sample size, α = 0.10

Figure 6.5 Empirical power of testing composite Pareto hypothesis against Weibull(2,1) alternative with varying sample size, α = 0.10

Figure 6.6 Empirical power of testing composite Pareto hypothesis against Gamma(2,1) alternative with varying sample size, α = 0.10

CHAPTER 7 SUMMARY

The inverse Gaussian distribution belongs to the exponential family and has wide applications in generalized linear models, lifetime models, and other areas. In this dissertation, two new goodness-of-fit tests for the univariate inverse Gaussian distribution are developed. The first is a consistent energy distance test; the computing formula for the test statistic $Q_n$ is derived through the transformation of the inverse Gaussian random variable to the standard half-normal. The second is a goodness-of-fit test based on an independence characterization. The inverse Gaussian is characterized by the independence of two statistics, $\bar{X}$ and $V$. On the basis of this characterization, we use the distance correlation as a measure of dependence, so that testing the independence between $\bar{X}$ and $V$ is equivalent to testing the inverse Gaussian hypothesis.

We have explored the empirical power of the tests of the composite inverse Gaussian hypothesis. These tests behave differently under various alternatives. The energy test was found to be more powerful than the other tests when the shape parameter of the alternative distribution is small. Except for the log-normal alternative, the distance correlation based test was more powerful for large shape parameter values. Our simulations suggest that the distance correlation based test is unable to distinguish between the log-normal and the inverse Gaussian distributions.

We extended the application of the distance correlation based goodness-of-fit test to the Pareto distribution, which is also characterized by the independence of two statistics. Our simulation results indicate that the distance correlation test performs better with increasing sample size. The distance correlation test is, however, able to distinguish between the log-normal and the Pareto distributions. In this dissertation we worked only with univariate distributions; we intend to explore the application of the methods discussed to multivariate distributions.

BIBLIOGRAPHY

Ahsanullah, M. and A. B. M. L. Kabir (1973). A characterization of the Pareto distribution. Canadian Journal of Statistics 1(1-2), 109–112.

Anderson, T. W. and D. A. Darling (1954). A test of goodness of fit. Journal of the American statistical association 49(268), 765–769.

Ang, H.-S. A. and H. W. Tang (1975). Probability concepts in engineering planning and design, Volume 1.

Bardsley, W. (1980). Note on the use of the inverse gaussian distribution for wind energy applications. Journal of Applied Meteorology 19(9), 1126–1130.

Baringhaus, L. and D. Gaigall (2015). On an independence test approach to the goodness-of-fit problem. Journal of Multivariate Analysis 140, 193–208.

Basu, D. and R. G. Laha (1954). On some characterizations of the normal distribution. Sankhyā: The Indian Journal of Statistics (1933-1960) 13(4), 359–362.

Chhikara, R. (1988). The Inverse Gaussian Distribution: Theory, Methodology, and Applications, Volume 95. CRC Press.

Chhikara, R. and J. Folks (1989). The Inverse Gaussian Distribution: Theory, Methodology and Applications. Marcel Dekker.

Chhikara, R. S. and J. L. Folks (1974). Estimation of the inverse gaussian distribution function. Journal of the American Statistical Association 69(345), 250–254.

Ducharme, G. R. (2001). Goodness-of-fit tests for the inverse gaussian and related distributions. Test 10(2), 271–290.

D'Agostino, R. and M. Stephens (1986). Goodness-of-Fit Techniques. New York: Marcel Dekker.

Edelmann, D., K. Fokianos, and M. Pitsillou (2019). An updated literature review of distance correlation and its applications to time series. International Statistical Review 87(2), 237–262.

Edelmann, D., D. Richards, and D. Vogel (2017). The distance standard deviation. arXiv preprint arXiv:1705.05777.

Edgeman, R. (1990). Assessing the inverse gaussian distribution assumption. IEEE transactions on reliability 39(3), 352–355.

Edgeman, R. L., R. C. Scott, and R. J. Pavur (1988). A modified kolmogorov-smirnov test for the inverse gaussian density with unknown parameters. Communications in Statistics-Simulation and Computation 17(4), 1203–1212.

Fisher, R. A. (1925). Application of student’s distribution. Metron 5(part 3), 90–104.

Folks, J. L. and R. S. Chhikara (1978). The inverse gaussian distribution and its statistical application–a review. Journal of the Royal Statistical Society. Series B (Methodological), 263– 289.

Geary, R. (1936). The distribution of "Student's" ratio for non-normal samples. Supplement to the Journal of the Royal Statistical Society 3(2), 178–184.

Gerstenberger, C. and D. Vogel (2015). On the efficiency of Gini's mean difference. Statistical Methods & Applications 24(4), 569–596.

Girone, G. and A. M. D’Uggento (2016). About the mean difference of the inverse normal distri- bution. Appl Math 7(14), 1504–1509.

Gracia-Medrano, L. and F. O’Reilly (2005). Transformations for testing the fit of the inverse- gaussian distribution. Communications in Statistics-Theory and Methods 33(4), 919–924.

Gunes, H., D. C. Dietz, P. F. Auclair, and A. H. Moore (1997). Modified goodness-of-fit tests for the inverse gaussian distribution. Computational Statistics & Data Analysis 24(1), 63–77.

Henze, N. and B. Klar (2002). Goodness-of-fit tests for the inverse gaussian distribution based on the empirical laplace transform. Annals of the Institute of Statistical Mathematics 54(2), 425–444.

Huberman, B. A., P. L. Pirolli, J. E. Pitkow, and R. M. Lukose (1998). Strong regularities in world wide web surfing. Science 280(5360), 95–97.

Khatri, C. G. (1962). A characterization of the inverse gaussian distribution. The Annals of Math- ematical Statistics 33(2), 800–803.

Kim, A. Y., C. Marzban, D. B. Percival, and W. Stuetzle (2009). Using labeled data to evaluate change detectors in a multivariate streaming environment. Signal Processing 89(12), 2529– 2536.

Li, R., W. Zhong, and L. Zhu (2012). Feature screening via distance correlation learning. Journal of the American Statistical Association 107(499), 1129–1139.

Lin, C.-C. and G. S. Mudholkar (1980). A simple test for normality against asymmetric alterna- tives. Biometrika 67(2), 455–461.

Lukacs, E. (1942). A characterization of the normal distribution. The Annals of Mathematical Statistics 13(1), 91–93.

Lukacs, E. (1955). A characterization of the gamma distribution. The Annals of Mathematical Statistics 26(2), 319–324.

Lukacs, E. (1956). Characterization of populations by properties of suitable statistics. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 2, pp. 195–214.

Martínez-Gómez, E., M. T. Richards, and D. S. P. Richards (2014). Distance correlation methods for discovering associations in large astrophysical databases. The Astrophysical Journal 781(1), 39.

Michael, J. R., W. R. Schucany, and R. W. Haas (1976). Generating random variates using transformations with multiple roots. The American Statistician 30(2), 88–90.

Milošević, B. (2017). Some recent characterization based goodness of fit tests. In European Young Statisticians Meeting, pp. 67.

Milošević, B. and M. Obradović (2016). Two-dimensional kolmogorov-type goodness-of-fit tests based on characterisations and their asymptotic efficiencies. Journal of Nonparametric Statistics 28(2), 413–427.

Mudholkar, G. S., R. Natarajan, and Y. Chaubey (2001). A goodness-of-fit test for the inverse gaussian distribution using its independence characterization. Sankhya:¯ The Indian Journal of Statistics, Series B, 362–374.

Mudholkar, G. S. and L. Tian (2002). An entropy characterization of the inverse gaussian dis- tribution and related goodness-of-fit test. Journal of statistical planning and inference 102(2), 211–221.

Nguyen, T. T. and K. T. Dinh (2003). Exact edf goodness-of-fit tests for inverse gaussian distribu- tions. Communications in Statistics-Simulation and Computation 32(2), 505–516.

Norman, L., S. Kotz, and N. Balakrishnan (1994). Continuous univariate distributions.

O’Reilly, F. J. and R. Rueda (1992). Goodness of fit for the inverse gaussian distribution. Canadian Journal of Statistics 20(4), 387–397.

Pavur, R., R. Edgeman, and R. Scott (1992). Quadratic statistics for the goodness-of-fit test of the inverse gaussian distribution. IEEE Transactions on Reliability 41(1), 118–123.

Pitman, E. J. (1937). The closest estimates of statistical parameters. In Mathematical Proceedings of the Cambridge Philosophical Society, Volume 33, pp. 212–222. Cambridge University Press.

Rizzo, M. L. (2002). A new rotation invariant goodness-of-fit test.

Rizzo, M. L. (2009). New goodness-of-fit tests for pareto distributions. ASTIN Bulletin: The Journal of the IAA 39(2), 691–715.

Rizzo, M. L. and J. T. Haman (2016). Expected distances and goodness-of-fit for the asymmetric laplace distribution. Statistics & Probability Letters 117, 158–164.

Rizzo, M. L. and G. J. Szekely´ (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics 4(2), 1034–1055.

Rizzo, M. L. and G. J. Szekely´ (2019). energy: E-Statistics: Multivariate Inference via the Energy of Data. R package version 1.7.6.

Schrödinger, E. (1951). Zur theorie der fall-und steigversuche an teilchen mit brownscher bewegung. Phys. Ze 16(1), 289–295.

Seshadri, V. (1983). The inverse gaussian distribution: some properties and characterizations. Canadian Journal of Statistics 11(2), 131–136.

Seshadri, V. (1993). The inverse Gaussian distribution : a case study in exponential families. Oxford University Press.

Seshadri, V. and J. Wesołowski (2004). Martingales defined by reciprocals of sums and related characterizations.

Shuster, J. (1968). On the inverse gaussian distribution function. Journal of the American Statisti- cal Association 63(324), 1514–1516.

Stephens, M. A. (1974). Edf statistics for goodness of fit and some comparisons. Journal of the American statistical Association 69(347), 730–737.

Stephens, M. A. (1976). Asymptotic results for goodness-of-fit statistics with unknown parameters. Annals of Statistics 4, 357–369.

Székely, G. J. (1989). Potential and kinetic energy in statistics. Lecture Notes, Budapest Institute.

Székely, G. J. and M. L. Rizzo (2005). A new test for multivariate normality. Journal of Multivariate Analysis 93(1), 58–80.

Székely, G. J. and M. L. Rizzo (2009). Brownian distance covariance. The Annals of Applied Statistics, 1236–1265.

Székely, G. J. and M. L. Rizzo (2013). The distance correlation t-test of independence in high dimension. Journal of Multivariate Analysis 117, 193–213.

Székely, G. J., M. L. Rizzo, and N. K. Bakirov (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics 35(6), 2769–2794.

Tata, M. N. (1969). On outstanding values in a sequence of random variables. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 12(1), 9–20.

Tweedie, M. C. K. (1941). mathematical investigation of some electrophoretic measurements. Master’s thesis, University of Reading, England.

Tweedie, M. C. K. (1956). Some statistical properties of the inverse gaussian distributions. Virginia J. Sci 7, 160–165.

Tweedie, M. C. K. (1957a). Statistical properties of inverse gaussian distributions. i. The Annals of Mathematical Statistics 28(2), 362–377.

Vasicek, O. (1976). A test for normality based on sample entropy. Journal of the Royal Statistical Society: Series B (Methodological) 38(1), 54–59.

Vexler, A., G. Shan, S. Kim, W.-M. Tsai, L. Tian, and A. D. Hutson (2011). An empirical likelihood ratio based goodness-of-fit test for inverse gaussian distributions. Journal of Statistical Planning and Inference 141(6), 2128–2140.

Wasan, M. T. (1968). On an inverse gaussian process. Scandinavian Actuarial Journal 1968(1-2), 69–96.

Yang, G. (2012). The energy goodness-of-fit test for univariate stable distributions. Ph.D. thesis, Bowling Green State University.

Zhou, Z. (2012). Measuring nonlinear dependence in time-series, a distance correlation approach. Journal of Time Series Analysis 33(3), 438–457.

APPENDIX A SELECTED R PROGRAMS

• Function to compute Kolmogorov-Smirnov statistic

library(SuppDists)   # rinvGauss, pinvGauss
library(energy)      # dcor.test
library(VGAM)        # erf

# Kolmogorov-Smirnov statistic for the (transformed) standard half-normal null
KS <- function(x){
  x <- sort(x)
  #z <- pinvGauss(x, mu, lambda)
  z <- erf(x/sqrt(2))                      # SHN cdf of the sorted sample
  k <- 1:length(x)
  Dpos <- max(k/length(x) - z)
  Dneg <- max(z - (k - 1)/length(x))
  #s <- sqrt(length(x)) + .12 + .11/sqrt(length(x))
  D <- max(c(Dneg, Dpos))
  return(D)
}

• Function to compute the Anderson-Darling statistic

# Anderson-Darling statistic for the standard half-normal null
AD <- function(x){
  x <- sort(x)
  #z <- pinvGauss(x, mu, lambda)
  z <- erf(x/sqrt(2))
  J <- 1:length(x)
  a <- (2*J - 1) * log(z)
  b <- (2*length(x) + 1 - 2*J) * log(1 - z)
  A2 <- -length(x) - sum(a + b)/length(x)
  return(A2)
}

• Function to compute the Cramer-von Mises statistic

# Cramer-von Mises statistic for the standard half-normal null
CVM <- function(x){
  x <- sort(x)
  #z <- pinvGauss(x, mu, lambda)
  z <- erf(x/sqrt(2))
  J <- 1:length(x)
  cvm <- sum((z - (2*J - 1)/(2*length(x)))^2) + 1/(12*length(x))
  return(cvm)
}

• Function for the distance covariance test.

# Block statistic V = mean(1/X) - 1/mean(X)
varr <- function(x){
  (1/length(x)) * sum((1/x) - (1/mean(x)))
}

## distance covariance test: pair the sample into blocks of size 2,
## compute (mean, V) for each block, and run the dcor permutation test
tst1 <- function(X){
  indx <- seq.int(1L, 2*floor(length(X)/2), 2L)
  b <- 2*floor(length(X)/2)
  X <- X[1:b]
  XX <- as.matrix(cbind(X[indx], X[-indx]))
  m <- apply(XX, 1, mean)
  v <- apply(XX, 1, varr)
  dct.p <- energy::dcor.test(m, v, R = 999)$p.value
  return(dct.p)
}

• Function for the energy test

# Energy goodness-of-fit statistic Q_n for the standard half-normal null (5.1.1)
energystat <- function(x){
  y <- sort(x)
  k <- 1:length(x)
  exp1 <- 2*sum(x*(2*erf(x/sqrt(2)) - 1) - sqrt(2/pi) +
                2*sqrt(2/pi)*exp((-x^2)/2))                 # 2 * sum of E|x_i - X|
  exp2 <- 2*length(x)*(2 - sqrt(2))/sqrt(pi)                # n * E|X - X'|
  exp3 <- (2/(length(x))) * sum((2*k - 1 - length(x)) * y)  # n * mean pairwise distance
  stat <- exp1 - exp2 - exp3
  return(stat)
}

• Function for Type I error and power simulation

# Type I error / power simulation for the composite inverse Gaussian hypothesis
sim <- function(p1, p2, n, B1){

  test_result_energy1 <- c()
  test_result_dc1 <- c()
  test_result_ad1 <- c()
  test_result_ks1 <- c()
  test_result_cvm1 <- c()

  for(i in 1:B1){
    # Sample from the alternative (uncomment the alternative of interest)
    #X <- rinvGauss(n, p1, p2)
    #X <- rbeta(n, p1, p2)
    #X <- rexp(n, p1)
    #X <- rgamma(n, p1, p2)
    #X <- rlnorm(n, p1, p2)
    #X <- runif(n, p1, p2)
    #X <- rweibull(n, p1, p2)
    X <- rpareto(n, p1, p2)

    # MLEs and half-normal transformation of the observed sample
    mu.hat <- mean(X)
    lambda.hat <- 1/mean(1/X - 1/mu.hat)
    w <- abs(sqrt(lambda.hat/X) * ((X - mu.hat)/mu.hat))

    alpha1 <- .1

    # Parametric bootstrap critical values under the null
    repl <- replicate(500, simplify = "matrix", expr = {
      y <- rinvGauss(n, 1, 1)
      mu <- mean(y)
      lambda <- 1/mean(1/y - 1/mu)
      z <- abs(sqrt(lambda/y) * ((y - mu)/mu))
      c(rep_ks = KS(z), rep_ad = AD(z), rep_cvm = CVM(z),
        rep_energy = energystat(z))
    })
    dff <- as.data.frame(t(repl))
    cv_ks     <- quantile(dff[, 1], c(.95, .90))
    cv_ad     <- quantile(dff[, 2], c(.95, .90))
    cv_cvm    <- quantile(dff[, 3], c(.95, .90))
    cv_energy <- quantile(dff[, 4], c(.95, .90))

    ### alpha = .10
    test_result_energy1[i] <- energystat(w) > cv_energy[2]
    test_result_ad1[i]     <- AD(w)  > cv_ad[2]
    test_result_ks1[i]     <- KS(w)  > cv_ks[2]
    test_result_cvm1[i]    <- CVM(w) > cv_cvm[2]
    test_result_dc1[i]     <- tst1(X) < alpha1
  }

  return(c(mean(test_result_energy1), mean(test_result_dc1),
           mean(test_result_ad1), mean(test_result_cvm1),
           mean(test_result_ks1)))
}