
STATISTICAL ANALYSIS OF SKEW AND ITS APPLICATIONS

Grace Ngunkeng

A Dissertation

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

August 2013

Committee:

Wei Ning, Advisor

Jane Y. Chang, Graduate Faculty Representative

Arjun K. Gupta

John T. Chen

Copyright © August 2013 Grace Ngunkeng. All rights reserved.

ABSTRACT

Wei Ning, Advisor

In many practical applications it has been observed that real data sets are not symmetric: they exhibit some skewness and therefore do not conform to the normal distribution, popular and easy to handle as it is. Azzalini (1985) introduced a new class of distributions, named the skew normal distribution, which is mathematically tractable and includes the normal distribution as a special case with skewness being zero. The skew normal distribution family is well known for modeling and analyzing skewed data. It extends the normal distribution family by adding a shape parameter to regulate the skewness, which gives it higher flexibility in fitting real data where some skewness is present. In this dissertation, we explore statistical analysis related to this distribution family. In the first part of the dissertation, we develop a nonparametric goodness-of-fit test based on the empirical likelihood method for the skew normal distribution. The empirical likelihood, proposed by Owen (1988), is a method which combines the reliability of the canonical nonparametric method with the flexibility and effectiveness of the likelihood approach. The asymptotic null distribution of the test statistic is derived. Simulations indicate that the proposed test can control the type I error within a given nominal level, and it has competitive power compared to the other available tests. The test is applied to IQ scores and the Australian Institute of Sport data set to illustrate the testing procedure. In the second part we focus on the change point problem for the skew normal distribution. The world is filled with changes, which can lead to unnecessary losses if people are not aware of them. Thus, in many practical applications, statisticians are faced with the problem of detecting the number of change points or jumps and their locations. In this part, we address this problem for the standard skew normal family.
We focus on the test based on the Schwarz information criterion (SIC) to detect the number and the positions of change points for the shape parameter. The likelihood ratio test and the Bayesian method, as two alternative approaches, are introduced briefly. The asymptotic null distribution of the SIC test statistic is derived, and the critical values for different sample sizes and nominal levels are computed for the adjusted SIC. A simulation study indicates the performance of the proposed test. In the third part of the dissertation, we extend the methods in the second part by studying different types of change point problems for the general skew normal distribution, which include the simultaneous change of the location and scale parameters and the simultaneous change of the location, scale and shape parameters. We derive the test statistic based on SIC to detect and estimate the number of possible change points. First, we consider the change point problem for simultaneous changes of the location and scale parameters, assuming that the shape parameter is unknown and has to be estimated. Second, we explore the change point problem for simultaneous changes of the location, scale and shape parameters. The asymptotic null distribution and the corresponding adjustment for the test statistic are established. Simulations for each proposed test are conducted to indicate its performance. Power comparisons with the available tests are investigated to indicate the advantage of the proposed test. Applications to real data are provided to illustrate the test procedure.

This work is dedicated to my beloved grandmother Ngunkeng Mariana and my parents Ashu Alexander and Monica Fuabe Ashu, for their constant love and support.

ACKNOWLEDGMENTS

To God be the honor and glory. I wish to express my sincere gratitude to my advisor, Dr. Wei Ning, for his continuous support, guidance and patience throughout this research, and from whom I have acquired a great deal of skills. I also want to extend my gratitude to my committee members, Dr. Arjun K. Gupta, Dr. John T. Chen and Dr. Jane Chang, for taking the time to serve on my committee and for their constructive comments. I would like to thank the Mathematics and Statistics Department and the Graduate College for providing me with financial support during my studies at BGSU. I would like to thank all the professors in the Mathematics and Statistics Department for their vast knowledge that has impacted me. I would also like to thank all my fellow graduate students for their friendship. I would like to especially thank Marcia Lynn Seubert, Mary Jane Busdeker and Barbara J. Berta for all their assistance. I would like to thank Professor Reinaldo B. Arellano-Valle, Professor Luis M. Castro and Professor Rosangela H. Loschi for providing us with the Latin American stock market data used in chapters 3 and 4. I owe special thanks to Dr. Lisa Chyvonne Chavers, Mr. Sidney Robert Childs, Dr. Nkem Khumbah and Mrs. Prudence Nojang for making it possible for me to continue my studies at BGSU and for their continuous moral and financial support. Finally, my deepest gratitude goes to my parents, family and friends for their constant love and spiritual support throughout my studies.

Grace Ngunkeng
Bowling Green, Ohio, USA
August 2013

Table of Contents

CHAPTER 1: SKEW NORMAL DISTRIBUTION
1.1 Introduction
1.1.1 Properties of skew normal distribution (SN)
1.2 Literature Review
1.2.1 Thesis Structure

CHAPTER 2: EMPIRICAL LIKELIHOOD RATIO BASED GOODNESS-OF-FIT TEST FOR SKEW NORMALITY
2.1 Introduction
2.2 Empirical Likelihood Based Test
2.2.1 Empirical Likelihood Method
2.2.2 Test Statistic
2.3 Asymptotic Results
2.4 Calculations of Critical Values and P-values
2.4.1 Critical Values

2.4.2 Approximations to the p-value of SNn
2.5 Simulations
2.6 Application
2.6.1 Otis IQ Scores for Non-whites
2.6.2 Australian Institute of Sport Data
2.7 Conclusion

CHAPTER 3: CHANGE POINT PROBLEM FOR STANDARD SKEW NORMAL DISTRIBUTION
3.1 Introduction
3.1.1 Literature Review
3.2 Change of the Shape Parameter λ
3.2.1 Information Approach
3.2.2 Likelihood Ratio Based Test
3.2.3 Bayesian Approach
3.3 Simulation
3.4 Application
3.5 Conclusion

CHAPTER 4: CHANGE POINT PROBLEM FOR GENERAL SKEW NORMAL DISTRIBUTION
4.1 Location and Scale Change
4.1.1 Information Approach (SIC)
4.1.2 Power Simulation
4.2 Application to Biomedical Data
4.3 The Change of Location, Scale and Shape
4.3.1 Test Statistics
4.3.2 Power Simulation
4.4 Applications to Latin American Emerging Market Stock Returns
4.4.1 Argentina Weekly Stock Market
4.4.2 Brazilian Stock Return
4.4.3 Chile Stock Return Market
4.4.4 Mexico Stock Return Market
4.5 Conclusion

BIBLIOGRAPHY

List of Figures

2.1 Histogram of IQ scores with a skew normal fit and normal fit.
2.2 The histogram with a skew normal fit and normal fit for the body mass index (BMI) of 50 females.

3.1 The graph of the data for the weekly stock returns and return rate for Brazil with the corresponding change points.

4.1 Left: The SIC values for every locus on chromosome 4 of the fibroblast cell line GM13330; Right: Chromosome 4 of the fibroblast cell line GM13330.
4.2 The graphs of the time series data for the weekly stock returns and return rate Rt for Argentina market with the corresponding change points.

4.3 Left: The graph of the acf values of the transformed data Rt; Right: Test for normality.

4.4 The graphs of the time series data for the weekly return rate Rt and stock returns for Brazil market with the corresponding change points.

4.5 Left: The acf of Brazil Rt series data; Right: Test for normality.

4.6 The graphs of the time series data for the weekly return rate Rt and stock returns for Chile market with the corresponding change points.

4.7 Left: Graph of the acf of the Chile Rt series; Right: Test for normality.

4.8 The graphs of the time series data for the weekly return rate Rt and stock returns for Mexico market with the corresponding change point.

4.9 The ACF of Mexico stock return rate Rt and Q-Q plot to test for the normality assumption.

List of Tables

2.1 Type I error with SN(0, 1, λ), α = 0.05
2.2 Power comparison with n = 20, 25, 50 and 100
2.3 Power comparison with n = 20, 25, 50 and 100
2.4 Power of test with alternative distribution N(0, 1)
2.5 Empirical power evaluation of the statistic (2.2.11) with different δ at α = 0.05
2.6 Otis IQ Scores for Non-whites
2.7 Estimated values for N(µ, σ) and SN(µ, σ, λ)
2.8 Australian Institute of Sport, body mass index of 50 females

3.1 Approximate critical values of SIC
3.2 Power simulation for SN(λ) with n = 100, 150 and 200
3.3 Power simulation (continued)
3.4 Power simulation (continued)

4.1 Critical values with α and sample size n
4.2 Power simulation for SN(µ, σ, λ) with n = 100, 150, 200 and different values of µ1, σ1 and µ2, σ2
4.3 Approximate critical values with α and sample size n
4.4 Power simulation for SN(µ, σ, λ) with n = 100, 150 and 200

CHAPTER 1

SKEW NORMAL DISTRIBUTION

1.1 Introduction

The skew normal distribution (SN) is an extension of the normal distribution allowing the presence of skewness. Azzalini (1985) introduced a new class of distributions which shares similar properties with the widely used normal distributions. His motivation was to find an ideal class of distributions satisfying three properties: "strict inclusion of the normal density", "mathematical tractability" and "wide range of the indices of skewness and kurtosis". Azzalini showed that the skew normal class fulfills the first two properties: the strict inclusion of the normal density follows directly from the definition, as does its mathematical tractability. Unfortunately, the SN does not fully meet the third property, although he showed that it is met to some extent; the limitation of the skew normal distribution is therefore in moderating the shape parameter. We first give the basic definitions and properties of this distribution family.

Definition 1.1.1. A random variable X is said to have a standard skew normal distribution with shape parameter λ if it has the following density:

\[ f(x, \lambda) = 2\phi(x)\Phi(\lambda x), \qquad (1.1.1) \]

where λ and x are real numbers, and φ(·) and Φ(·) are the standard normal density function (pdf) and cumulative distribution function (cdf) respectively. We denote this by X ∼ SN(λ). The density function (1.1.1) was introduced by Azzalini (1985), and it shares some foremost properties with those of the normal distribution. The general version of the skew normal distribution is given as follows.

Definition 1.1.2. Let X ∼ SN(λ) and consider the linear transformation Y = µ + σX. Then the random variable Y is said to have a skew normal distribution, denoted by Y ∼ SN(µ, σ, λ), if it has the pdf given by:

\[ f_Y(y; \mu, \sigma, \lambda) = \frac{2}{\sigma}\, \phi\!\left(\frac{y-\mu}{\sigma}\right) \Phi\!\left(\lambda\, \frac{y-\mu}{\sigma}\right), \qquad (1.1.2) \]

where φ(·) and Φ(·) are the pdf and cdf of the standard normal distribution respectively, and µ, σ and λ are the location, scale and shape parameters respectively.
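As a quick numerical sanity check of (1.1.2) (an illustrative Python sketch, not part of the original text; the parameter values are arbitrary), the density can be evaluated with the standard library's `math.erf` and integrated by the trapezoidal rule:

```python
import math

def sn_pdf(y, mu, sigma, lam):
    """Skew normal density (1.1.2): (2/sigma) * phi(z) * Phi(lam*z), z = (y - mu)/sigma."""
    z = (y - mu) / sigma
    phi = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)    # standard normal pdf
    Phi = 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))     # standard normal cdf
    return 2.0 / sigma * phi * Phi

# Trapezoidal rule over a wide interval: a proper density should integrate to ~1.
mu, sigma, lam = 1.0, 2.0, 3.0
a, b, n = mu - 12 * sigma, mu + 12 * sigma, 20000
h = (b - a) / n
area = (0.5 * (sn_pdf(a, mu, sigma, lam) + sn_pdf(b, mu, sigma, lam))
        + sum(sn_pdf(a + i * h, mu, sigma, lam) for i in range(1, n))) * h
print(round(area, 6))   # ≈ 1
```

With λ = 0 the factor Φ(λz) is identically 1/2 and the density collapses to the N(µ, σ²) density, consistent with Property 1 below.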

We now give some basic properties of the skew normal distribution. For more properties, the readers are referred to Azzalini (1985, 2005).

Definition 1.1.3. The cumulative distribution function of X is given by

\[ F(x, \lambda) = 2 \int_{-\infty}^{x} \int_{-\infty}^{\lambda t} \phi(t)\,\phi(u)\, du\, dt = \Phi(x) - 2T(x, \lambda), \]

where T is Owen's T function, defined by

\[ T(h, a) = \frac{1}{2\pi} \int_{0}^{a} \frac{\exp[-h^2(1 + x^2)/2]}{1 + x^2}\, dx. \]

This function was studied and tabulated by Owen (1956). For more details about the function T, the readers are referred to Young and Minder (1974), Hill (1978) and Thomas (1979).
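Owen's T function is straightforward to approximate by quadrature. The sketch below (Python with Simpson's rule; the step count is an arbitrary choice) uses F(x, λ) = Φ(x) − 2T(x, λ) to verify two identities stated as Properties 6 and 7 in the next subsection, namely 1 − F(−x, λ) = F(x, −λ) and F(x, 1) = Φ²(x):

```python
import math

def Phi(x):
    """Standard normal cdf via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def owens_t(h, a, n=4000):
    """Owen's T(h, a): Simpson's rule on exp(-h^2(1+x^2)/2) / (1+x^2) over [0, a]."""
    if a == 0.0:
        return 0.0
    g = lambda x: math.exp(-h * h * (1.0 + x * x) / 2.0) / (1.0 + x * x)
    step = a / n                                             # n must be even
    odd = sum(g((2 * i - 1) * step) for i in range(1, n // 2 + 1))
    even = sum(g(2 * i * step) for i in range(1, n // 2))
    return (g(0.0) + g(a) + 4.0 * odd + 2.0 * even) * step / (3.0 * 2.0 * math.pi)

def sn_cdf(x, lam):
    """F(x, lambda) = Phi(x) - 2 T(x, lambda)."""
    return Phi(x) - 2.0 * owens_t(x, lam)

# Property 7: F(x, 1) = Phi(x)^2; Property 6: 1 - F(-x, lam) = F(x, -lam).
print(round(sn_cdf(0.7, 1.0) - Phi(0.7) ** 2, 8))                  # ≈ 0
print(round((1.0 - sn_cdf(-0.5, 2.0)) - sn_cdf(0.5, -2.0), 8))     # ≈ 0
```

Both differences vanish up to quadrature error, since T(h, a) is even in h and odd in a.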

1.1.1 Properties of skew normal distribution (SN)

Let X ∼ SN(λ), then X has the following properties.

Property 1. When λ = 0, the SN becomes the standard normal distribution. That is SN(0) = N(0, 1).

Property 2. As λ → ±∞, the skew normal density converges to the half normal density function.

Property 3. The skewness of the distribution increases as the value of λ increases in absolute value.

Property 4. If X ∼ SN(λ), then −X ∼ SN(−λ). The proofs of Properties 1, 2 and 4 follow directly from the definition (1.1.1).

Property 5. If X ∼ SN(λ), then X² ∼ χ₁².

Property 6. 1 − F(−x, λ) = F(x, −λ).

Property 7. F(x, 1) = Φ²(x).

Property 8. The moment generating function (mgf) of X is given by

\[ M(t) = 2 \exp(t^2/2)\, \Phi(\delta t), \qquad \text{where } \delta = \frac{\lambda}{\sqrt{1 + \lambda^2}}. \]

From the moment generating function of X given in Property 8, it is easy to derive the mean and the variance of X.

Property 9. The mean and variance of X are given by

\[ E[X] = \sqrt{\frac{2}{\pi}}\, \frac{\lambda}{\sqrt{1+\lambda^2}}, \qquad \mathrm{Var}[X] = 1 - \frac{2}{\pi}\left(\frac{\lambda}{\sqrt{1+\lambda^2}}\right)^{2}. \]

Property 10. The measure of skewness of X, denoted by γ₁(X), ranges from −0.9953 to 0.9953, and the measure of kurtosis of X, denoted by γ₂(X), ranges from 0 to 0.869; they are defined by

\[ \gamma_1(X) = \frac{4-\pi}{2}\, \frac{(E[X])^{3}}{(\mathrm{Var}[X])^{3/2}} = \frac{4-\pi}{2}\, \frac{\left(\sqrt{2/\pi}\, \frac{\lambda}{\sqrt{1+\lambda^2}}\right)^{3}}{\left(1 - \frac{2}{\pi}\left(\frac{\lambda}{\sqrt{1+\lambda^2}}\right)^{2}\right)^{3/2}}, \]
\[ \gamma_2(X) = 2(\pi - 3)\, \frac{(E[X])^{4}}{(\mathrm{Var}[X])^{2}} = 2(\pi - 3)\, \frac{\left(\sqrt{2/\pi}\, \frac{\lambda}{\sqrt{1+\lambda^2}}\right)^{4}}{\left(1 - \frac{2}{\pi}\left(\frac{\lambda}{\sqrt{1+\lambda^2}}\right)^{2}\right)^{2}}. \]

Property 11. The even moments of X are equal to the even moments of the standard normal distribution.
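The bounds in Property 10 can be checked numerically: as |λ| → ∞ the skewness coefficient approaches ±0.9953 (an illustrative Python sketch, not from the dissertation):

```python
import math

def sn_skewness(lam):
    """gamma_1(lambda) from Property 10."""
    d = lam / math.sqrt(1.0 + lam * lam)
    ex = math.sqrt(2.0 / math.pi) * d            # E[X], Property 9
    var = 1.0 - (2.0 / math.pi) * d * d          # Var[X], Property 9
    return (4.0 - math.pi) / 2.0 * ex ** 3 / var ** 1.5

# Skewness is 0 at lambda = 0, is odd in lambda, and saturates near 0.9953.
for lam in (0.0, 1.0, 5.0, 1e6):
    print(lam, round(sn_skewness(lam), 4))
```

The λ = 10⁶ value is within 10⁻⁴ of the stated supremum 0.9953, illustrating why the shape parameter can only produce moderate skewness.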

Property 12. The odd moments of X are given by:

\[ E[X^{2k+1}] = \sqrt{\frac{2}{\pi}}\, \frac{\lambda}{\sqrt{1+\lambda^2}}\, \big[2(1+\lambda^2)\big]^{-k}\, (2k+1)! \sum_{i=0}^{k} \frac{i!\,(2\lambda)^{2i}}{(2i+1)!\,(k-i)!}, \qquad k = 0, 1, \cdots. \]
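Property 12 can be verified numerically for k = 1 by comparing the closed form for E[X³] against a trapezoidal integral of x³f(x, λ) (a Python sketch; the grid and the value of λ are arbitrary choices):

```python
import math

def sn_pdf(x, lam):
    """Standard skew normal density 2*phi(x)*Phi(lam*x)."""
    return (2.0 * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
            * 0.5 * (1.0 + math.erf(lam * x / math.sqrt(2.0))))

def odd_moment(lam, k):
    """E[X^(2k+1)] from Property 12."""
    d = lam / math.sqrt(1.0 + lam * lam)
    s = sum(math.factorial(i) * (2.0 * lam) ** (2 * i)
            / (math.factorial(2 * i + 1) * math.factorial(k - i))
            for i in range(k + 1))
    return (math.sqrt(2.0 / math.pi) * d
            * (2.0 * (1.0 + lam * lam)) ** (-k) * math.factorial(2 * k + 1) * s)

# Compare the closed form for E[X^3] (k = 1) with a trapezoidal integral.
lam = 1.5
a, b, n = -12.0, 12.0, 40000
h = (b - a) / n
f = lambda x: x ** 3 * sn_pdf(x, lam)
numeric = (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))) * h
print(round(numeric, 5), round(odd_moment(lam, 1), 5))   # the two values agree
```

For k = 0 the formula reduces to E[X] = √(2/π)·λ/√(1+λ²), matching Property 9.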

Property 13. (Stochastic representation). If U and V are independently and identically distributed (i.i.d.) N(0, 1) random variables, then

\[ \frac{\lambda}{\sqrt{1+\lambda^2}}\, |U| + \frac{1}{\sqrt{1+\lambda^2}}\, V \sim SN(\lambda). \]
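Property 13 gives a direct way to simulate SN(λ) variates. The Monte Carlo sketch below (standard library only; the sample size, seed and λ are arbitrary) also checks the mean and variance from Property 9:

```python
import math, random

def sn_sample(lam, n, seed=1):
    """Draw n variates from SN(lambda) via Property 13:
    delta*|U| + sqrt(1 - delta^2)*V with U, V i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    delta = lam / math.sqrt(1.0 + lam * lam)
    c = math.sqrt(1.0 - delta * delta)           # equals 1/sqrt(1 + lam^2)
    return [delta * abs(rng.gauss(0.0, 1.0)) + c * rng.gauss(0.0, 1.0)
            for _ in range(n)]

lam, n = 2.0, 200000
x = sn_sample(lam, n)
delta = lam / math.sqrt(1.0 + lam * lam)
mean_theory = math.sqrt(2.0 / math.pi) * delta           # Property 9
var_theory = 1.0 - (2.0 / math.pi) * delta ** 2
m = sum(x) / n
v = sum((xi - m) ** 2 for xi in x) / n
print(round(m, 3), round(mean_theory, 3))
print(round(v, 3), round(var_theory, 3))
```

Note that 1/√(1+λ²) = √(1−δ²), so the two coefficients in the representation are δ and √(1−δ²) with δ as in Property 8.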

Property 14. If X ∼ SN(λ), then its characteristic function is

2 ΨX (t) = exp(−t /2)(1 + iτ(δt)),

where, for y ≥ 0,

\[ \tau(y) = \sqrt{2/\pi} \int_{0}^{y} \exp(u^2/2)\, du, \qquad \tau(-y) = -\tau(y), \qquad \delta = \frac{\lambda}{\sqrt{1+\lambda^2}}. \]

1.2 Literature Review

The skew normal distribution has received a lot of attention in the literature, due to the fact that it shares some important properties of the standard normal distribution and is mathematically tractable. Azzalini (1985, 1986) studied the basic mathematical properties of the skew normal distribution. Henze (1986) presented a probabilistic representation of the skew normal distribution family in terms of a normal and a truncated normal random variable (Property 13). The probabilistic representation helps us understand the structure of the SN class and in particular its departure from normality. Azzalini and Dalla Valle (1996) extended the univariate case to the multivariate case. Gupta and Chen (2004) provided another possible extension of the univariate skew normal model to vector skew normal models. Recently, Ning and Gupta (2012) generalized the univariate extended skew normal distribution family to the matrix variate case by adopting Chen and Gupta (2005) and Harrar and Gupta (2008). Pewsey (2000) studied problems of inference for Azzalini's skew normal distribution; for example, he applied the method of moments with the centered parameterization, instead of the direct parameterization, for estimation. Gupta et al. (2004) also presented two characterization results based on quadratic statistics for the skew normal distribution. The statistical inference aspect of this distribution family has been addressed by many authors. For example, Gupta and Chen (2001) provided a table for the cumulative distribution function of SN(λ) and applied it in the study of the goodness-of-fit test of this distribution. Azzalini (2005) studied the skew normal distribution and related multivariate families. As noted in the literature, both parametric and nonparametric test statistics have been applied to test the goodness of fit of this family of distributions.

1.2.1 Thesis Structure

In chapter 2, we develop a nonparametric test statistic based on the empirical likelihood ratio test for the goodness-of-fit purpose. We introduce a distribution-free, density-based likelihood technique to test the goodness of fit of the skew normal distribution. We first give a brief description of the empirical likelihood method. Then the test statistic based on the empirical likelihood method is proposed, and the asymptotic results of the statistic are derived. Simulations and comparisons are conducted to illustrate the performance and advantage of the test. In chapter 3, we focus on the change point problem for the standard skew normal distribution. We provide a brief introduction to the change point problem in general and then narrow it down to detecting and estimating the change of the shape parameter. We construct the test statistic based on the Schwarz information criterion (SIC) to detect and estimate the possible change points. Two alternative methods, the likelihood ratio test (LRT) and the Bayesian method, are also introduced briefly for readers who may be interested. In chapter 4, we discuss the change point problem for the general skew normal distribution. Assuming that the random variable follows a skew normal distribution with three parameters, SN(µ, σ, λ), where µ, σ and λ are the location, scale and shape parameters respectively, we consider the change point problem in the following cases: (i) simultaneous changes of the location and scale parameters, assuming that the shape parameter is fixed and unknown; (ii) simultaneous changes of the location, scale and shape parameters. Simulations are conducted to show the performance of the tests. The proposed tests are also applied to a biomedical data set and the stock price data sets of four Latin American countries to illustrate the procedure.

CHAPTER 2

EMPIRICAL LIKELIHOOD RATIO BASED GOODNESS-OF-FIT TEST FOR SKEW NORMALITY

2.1 Introduction

Let X be a skew normal random variable. Its probability density function is given by

\[ f(x) = \frac{2}{\sigma}\, \phi\!\left(\frac{x-\mu}{\sigma}\right) \Phi\!\left(\lambda\, \frac{x-\mu}{\sigma}\right), \qquad (2.1.1) \]

where µ, σ and λ are the location, scale and shape parameters respectively. Specifically, when µ = 0 and σ = 1 it becomes a standard skew normal random variable. Some basic properties of the skew normal distribution family were introduced in chapter 1. The skew normal distribution family is well known for modeling and analyzing skewed data. It is a distribution family that extends the normal distribution family by adding a shape parameter to regulate the skewness, which gives it higher flexibility in fitting real data where some skewness is present. It can be applied to different fields of science such as finance, statistical calibration and medical research (see Chen et al. (2003); Figueiredo et al. (2010); Guolo (2013)). This distribution family also shares some common properties with the standard normal distribution family, such as unimodality and the fact that the square of a standard skew normal variable follows the χ² distribution with one degree of freedom. Because of these attractive characteristics, it is meaningful to develop a corresponding goodness-of-fit test with satisfactory properties. Several goodness-of-fit tests have been considered in the literature for the skew normal distribution family. For example, Gupta and Chen (2001) applied the Kolmogorov-Smirnov test and the Pearson χ² test, with specified values of the parameter λ, to test the null hypothesis of a skew normal distribution. To facilitate the implementation of these test statistics, they provided a probability table of the skew normal distribution for specific values of λ. These two test statistics are defined as follows. (1) The χ² test statistic:

\[ \chi^2 = \sum_{i=1}^{k} \frac{(N_i - m_i)^2}{m_i}, \]

where N_i is the number of outcomes that fall in the i-th interval and m_i is the expected count for the i-th interval. Also, m_i = np_i, where n is the number of repetitions and p_i is the probability that an outcome falls in the i-th interval. One drawback of this method is that it requires a large data set. (2) The Kolmogorov-Smirnov test:

\[ D = \sup_{x} \left| F_n(x) - F_0(x) \right|, \]

where F0(.) is the cumulative distribution function of a skew normal distribution with known skew parameter λ0 and Fn(.) is the empirical cumulative distribution function (ECDF), which is defined by

\[ F_n(x) = \begin{cases} 0, & x < X_{(1)}, \\ i/n, & X_{(i)} \le x < X_{(i+1)},\ i = 1, \cdots, n-1, \\ 1, & x \ge X_{(n)}, \end{cases} \qquad (2.1.2) \]

where X₍₁₎, ···, X₍ₙ₎ are the order statistics of X₁, ···, Xₙ. Dalla Valle (2007) proposed a test of skew normality based on the Anderson-Darling goodness-of-fit test. In that paper, the general Anderson-Darling test statistic is defined as

\[ W_n^2 = n \int_{-\infty}^{\infty} \big(F_n(x) - F(x, \theta)\big)^2\, \psi(x)\, dF(x, \theta), \]

where F_n(·) is the ECDF, F(x, θ) is the hypothesized distribution function, and ψ(x) is the weight function, given by ψ(x) = [F(x, θ)(1 − F(x, θ))]⁻¹. Thus the test statistic can be rewritten as

\[ AD = n \int_{-\infty}^{\infty} \frac{\big(F_n(x) - F(x, \theta)\big)^2}{F(x, \theta)\big(1 - F(x, \theta)\big)}\, dF(x, \theta). \]

The corresponding computational formula is given by:

\[ A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} \Big[ (2i-1) \log F(x_{(i)}, \theta) + \{2(n-i)+1\} \log\big(1 - F(x_{(i)}, \theta)\big) \Big], \]

where x₍ᵢ₎ is the i-th order statistic from a sample of size n. Meintanis (2010) provided a goodness-of-fit test through the moment generating function (mgf). The mgf of the standard skew normal distribution is given by

\[ M(t) = 2 \exp(t^2/2)\, \Phi(vt), \qquad \text{where } v = \lambda/\sqrt{1+\lambda^2}. \]

The author proposed the following test statistic:

\[ T_n = n \int_{-\infty}^{\infty} D_n^2(t)\, w(t)\, dt, \qquad t \in \mathbb{R}, \]

where w(t) is a nonnegative weight function and

\[ D_n(t) = M_n'(t) - t\, M_n(t) - \hat{\nu}_n \sqrt{\frac{2}{\pi}} \exp\!\left( \frac{t^2}{2}\big(1 - \hat{\nu}_n^2\big) \right), \]

where \(\hat{\nu}_n = \sqrt{\pi/2}\, \bar{X}_n\), \(\bar{X}_n = \frac{1}{n}\sum_{j=1}^{n} X_j\) and \(M_n(t) = \frac{1}{n}\sum_{j=1}^{n} \exp(t X_j)\).

When both the null distribution and the alternative distribution are completely known, the Neyman-Pearson lemma shows that the likelihood ratio test is the most powerful test. However, the alternative distribution is usually unknown in practice. There are nonparametric goodness-of-fit tests available in the literature. For example, Mateu-Figueras et al. (2007) proposed five different data-driven tests: the Anderson-Darling statistic (A²), the Cramér-von Mises statistic (W²), Watson's statistic (U²), the Kolmogorov-Smirnov statistic (D) and the Kuiper statistic (V), which allow the parameters to be estimated from the data directly. Stephens (1986) classified these five statistics into two families: the Cramér-von Mises family (W², U² and A²) and the Kolmogorov-Smirnov family (D and V). Mateu-Figueras et al. (2007) used these empirical distribution function (EDF) based test statistics to measure the difference between the two distribution functions F(·) and F_n(·), where F_n(·) is defined by (2.1.2) and, under the null hypothesis, the random sample X₁, ···, Xₙ comes from a population with distribution F(·). These five test statistics are defined as follows, where we denote p₍ᵢ₎ = F(x₍ᵢ₎) and p̄ = Σᵢ p₍ᵢ₎/n. (1) The Cramér-von Mises statistic W²:

\[ W^2 = n \int_{-\infty}^{\infty} \big(F_n(x) - F(x)\big)^2\, dF(x). \]

The corresponding computational formula is given by

\[ W^2 = \sum_{i=1}^{n} \left( p_{(i)} - \frac{2i-1}{2n} \right)^{2} + \frac{1}{12n}. \]

(2) Watson's statistic U².

\[ U^2 = n \int_{-\infty}^{\infty} \left[ F_n(x) - F(x) - \int_{-\infty}^{\infty} \big(F_n(t) - F(t)\big)\, dF(t) \right]^{2} dF(x), \]

and the corresponding computational formula is given by:

\[ U^2 = W^2 - n\left( \bar{p} - \frac{1}{2} \right)^{2}. \]

(3) The Anderson-Darling statistic A².

\[ A^2 = n \int_{-\infty}^{\infty} \big(F_n(x) - F(x)\big)^2 \big[ F(x)\big(1 - F(x)\big) \big]^{-1}\, dF(x), \]

and the corresponding computational formula is:

\[ A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1)\big[ \log p_{(i)} + \log\big(1 - p_{(n+1-i)}\big) \big]. \]

The Kolmogorov-Smirnov family consists of D⁺ and D⁻, defined as follows:

\[ D^+ = \sup_x \big(F_n(x) - F(x)\big), \qquad D^+ = \max_i \left( \frac{i}{n} - p_{(i)} \right), \]
\[ D^- = \sup_x \big(F(x) - F_n(x)\big), \qquad D^- = \max_i \left( p_{(i)} - \frac{i-1}{n} \right), \]

where the second expression in each line is the corresponding computational formula.

(4) Kolmogorov-Smirnov Statistic D.

D = max(D+,D−).

(5) Kuiper Statistic V .

\[ V = D^+ + D^-. \]
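The five computational formulas above can be packaged in a few lines. The Python sketch below (illustrative, not from the dissertation; it is exercised on an arbitrary sample against the uniform cdf, so the hypothesized F is fully specified) computes W², U², A², D and V from the p₍ᵢ₎:

```python
import math

def edf_statistics(x, F):
    """W^2, U^2, A^2, D and V from their computational formulas,
    for data x and a fully specified hypothesised cdf F."""
    n = len(x)
    p = sorted(F(v) for v in x)                     # p_(i) = F(x_(i))
    pbar = sum(p) / n
    W2 = sum((p[i] - (2 * (i + 1) - 1) / (2 * n)) ** 2 for i in range(n)) + 1 / (12 * n)
    U2 = W2 - n * (pbar - 0.5) ** 2
    A2 = -n - sum((2 * (i + 1) - 1) * (math.log(p[i]) + math.log(1 - p[n - 1 - i]))
                  for i in range(n)) / n
    Dp = max((i + 1) / n - p[i] for i in range(n))  # D+
    Dm = max(p[i] - i / n for i in range(n))        # D-
    return W2, U2, A2, max(Dp, Dm), Dp + Dm

# Illustrative sample tested against the uniform cdf on (0, 1).
data = [0.05, 0.22, 0.31, 0.48, 0.55, 0.63, 0.74, 0.81, 0.92, 0.97]
W2, U2, A2, D, V = edf_statistics(data, lambda t: t)
print(round(D, 4), round(V, 4))   # 0.18 0.23
```

Note that U² ≤ W² always holds, since U² subtracts a nonnegative correction from W².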

Vexler and Gurevich (2010) constructed an empirical likelihood ratio based goodness-of-fit test to approximate the optimal Neyman-Pearson likelihood ratio test with an unknown alternative density function. The test statistic is given by:

\[ G_n = \min_{1 \le m < n^{1-\delta}} \frac{\prod_{i=1}^{n} \frac{2m}{n\left(X_{(i+m)} - X_{(i-m)}\right)}}{\prod_{i=1}^{n} f\big(X_i; \hat{\theta}\big)}, \qquad 0 < \delta < 1, \]

where θ̂ is the maximum likelihood estimate of θ under H₀. Vexler et al. (2011) proposed the following similar goodness-of-fit test statistic, based on the empirical likelihood method, to test the null hypothesis of an inverse Gaussian distribution:

\[ T_{Knm} = \frac{\min_{1 \le m < \sqrt{n}} \prod_{j=1}^{n} \frac{2m}{n\left(Y_{(j+m)} - Y_{(j-m)}\right)}}{\left( \hat{\lambda} / (2\pi e) \right)^{n/2}}, \]

where \(\hat{\lambda} = n\hat{\mu}^{2} \left[ \sum_{j=1}^{n} (Y_j - \hat{\mu})^{2}\, Y_j^{-1} \right]^{-1}\) and \(\hat{\mu} = n^{-1} \sum_{i=1}^{n} Y_i\). In this chapter, we follow an idea similar to that of Vexler and Gurevich (2010) to construct an empirical likelihood ratio based goodness-of-fit test for skew normality. The rest of the chapter is organized as follows. In section 2.2, a brief introduction to the empirical likelihood method is given, and the test statistic based on this method is proposed. Asymptotic results for the test statistic under the null hypothesis and the alternative hypothesis are derived in section 2.3. In section 2.4, we provide the procedure for calculating the critical values and approximating the p-values based on the estimated values of the parameters from a given data set. In section 2.5, we conduct simulations to illustrate that the Type I error can be well controlled for a given nominal level, and power comparisons with the five tests proposed by Mateu-Figueras et al. (2007) indicate that the proposed test is a very competitive candidate for the goodness-of-fit test of skew normality. Applications to two real data sets are provided in section 2.6. Some discussion is given in section 2.7.
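Both statistics above are built from the same m-spacings product, ∏ᵢ 2m/(n(X₍ᵢ₊ₘ₎ − X₍ᵢ₋ₘ₎)), which estimates the maximized nonparametric likelihood. A short Python sketch of (the log of) this numerator, assuming the boundary convention X₍ⱼ₎ = X₍₁₎ for j < 1 and X₍ⱼ₎ = X₍ₙ₎ for j > n (a convention common in this literature, assumed here rather than taken from the text):

```python
import math, random

def log_spacings_numerator(x, m):
    """log of prod_{i=1}^n 2m / (n (X_(i+m) - X_(i-m))).
    Boundary convention (assumed): X_(j) = X_(1) for j < 1, X_(j) = X_(n) for j > n."""
    n = len(x)
    s = sorted(x)
    total = 0.0
    for i in range(1, n + 1):
        lo = s[max(i - m, 1) - 1]
        hi = s[min(i + m, n) - 1]
        total += math.log(2.0 * m / (n * (hi - lo)))
    return total

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(100)]
delta = 0.5
m_max = int(len(x) ** (1.0 - delta))             # minimize over 1 <= m < n^(1 - delta)
log_num = min(log_spacings_numerator(x, m) for m in range(1, m_max))
print(round(log_num, 3))
```

Minimizing the log numerator over m and subtracting the log of the fitted null likelihood then gives log G_n; for an n-point sample the numerator behaves like minus n times an entropy estimate, hence the large negative value printed here.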

2.2 Empirical Likelihood Based Test

2.2.1 Empirical Likelihood Method

The empirical likelihood (EL) method for statistical inference was proposed by Owen (1988). It is a nonparametric method that uses the likelihood approach for data analysis without the assumption that the data come from a specific family of distributions. As with parametric methods, the shape of the confidence regions is established automatically. As established in the literature, the improved generality of nonparametric methods such as the bootstrap often comes at the cost of reduced power; however, simulations and theory suggest that EL tests have good power properties. The EL method has the advantage over the bootstrap that it is a nonparametric method of inference based on a data-driven likelihood, although optimizing the likelihood function can be computationally challenging. In what follows, we introduce the empirical likelihood method briefly; readers are referred to Owen (2001) for more details and results. Let X be a random variable and F(·) its cumulative distribution function (CDF). We define F(x) = Pr(X ≤ x) for −∞ < x < ∞ and F(x⁻) = Pr(X < x), so that Pr(X = x) = F(x) − F(x⁻). We also define the indicator function 1_A(x) as follows.

\[ 1_A(x) = \begin{cases} 1, & \text{if } A(x) \text{ is true}, \\ 0, & \text{otherwise}. \end{cases} \]

Definition 2.2.1. Let X1, ··· ,Xn be real random variables. The empirical cumu- lative distribution function (ECDF) of x1, ··· , xn is

\[ F_n(x) = \frac{1}{n} \sum_{i=1}^{n} 1_{X_i \le x}, \qquad -\infty < x < \infty. \]

Definition 2.2.2. Let X1, ··· ,Xn be real independent random variables, with CDF

F0. The nonparametric likelihood of the CDF F is

\[ L(F) = \prod_{i=1}^{n} \big( F(X_i) - F(X_i^-) \big). \]

Owen (2001, Theorem 2.1) proved that such a nonparametric likelihood is maximized by the ECDF. Thus the ECDF is the nonparametric maximum likelihood estimate (NPMLE) of F. Like the MLE, the NPMLE has the invariance property. Suppose we are interested in F through θ = T(F), where T is a real-valued function of distributions. Let θ₀ = T(F₀) be the true unknown parameter; then the NPMLE of θ is θ̂ = T(F_n).

• Nonparametric likelihood ratios. In the parametric approach, the likelihood ratio is used for inference in hypothesis tests and confidence regions. Thus we reject the null hypothesis η = η₀, and exclude η₀ from the confidence region for η, when L(η₀) is much smaller than L(η̂). Wilks' theorem provides that −2 log(L(η₀)/L(η̂)) tends to a chi-squared distribution as n → ∞ under mild regularity conditions, with degrees of freedom equal to the dimension of the set of η values. The confidence region for η is given by {η | L(η)/L(η̂) ≥ c}, where c is chosen using Wilks' theorem with degrees of freedom equal to the dimension of the set of η values. Similarly, for a distribution F, the nonparametric likelihood ratio for hypothesis tests and confidence regions is defined as

\[ R(F) = \frac{L(F)}{L(F_n)}. \]

Now suppose we are interested in a parameter θ = T(F), where T(·) is a function of distributions and F is a member of a set of distributions 𝓕. Then the profile likelihood ratio function is defined as:

\[ \mathcal{R}(\theta) = \sup\left\{ \frac{L(F)}{L(F_n)} \,:\, T(F) = \theta,\ F \in \mathcal{F} \right\}. \]

The empirical likelihood (EL) hypothesis test rejects H₀ : T(F₀) = θ₀ when ℜ(θ₀) < γ₀, for some threshold γ₀. EL confidence regions are of the form {θ | ℜ(θ) ≥ γ₀}.

• Ties in the data. When there are no ties in the data, the ELR is defined as:

\[ R(F) = \frac{L(F)}{L(F_n)} = \prod_{i=1}^{n} n p_i, \]

where the distribution F places probability p_i ≥ 0 on each observation X_i, with \(\sum_{i=1}^{n} p_i \le 1\) and \(L(F) = \prod_{i=1}^{n} p_i\). When there are ties in the data, say with k distinct values, the ELR is defined as:

\[ R(F) = \frac{L(F)}{L(F_n)} = \prod_{j=1}^{k} \left( \frac{p_j}{\hat{p}_j} \right)^{n_j} = \prod_{j=1}^{k} \left( \frac{n p_j}{n_j} \right)^{n_j}, \]

where n_j is the number of observations equal to the j-th distinct value and p̂_j = n_j/n.

We observe that the computation is simpler when there are no ties in the data. However, even when ties are present, we obtain the same profile likelihood function ℜ(θ) as in the case without ties. See Owen (2001) for more details.

Theorem 2.2.1. (Univariate ELT)

Let X₁, ···, Xₙ be independent random variables with the common distribution F₀.

bution F0. Let µ0 = E(Xi), and suppose that 0 < V ar(X) < ∞. Then

2 −2 log(<(µ0)) converges in distribution to χ(1) as n → ∞. We note that in the theorem, the chi-squared limited distribution is the same

as in the parametric counterpart and Xi does not need to be bounded. The theorem provides an asymptotic justification for tests that reject the value

2,1−α µ0 at the α level when −2 log(<(µ0)) > χ(1) . The unrejected values of µ0 form the 100(1 − α)% confidence region with the same asymptotic results. EL confidence region for a mean is always an interval.
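Theorem 2.2.1 can be illustrated numerically. The sketch below (plain Python, not from the dissertation; names and tolerances are illustrative) profiles the empirical likelihood for a mean by solving the standard Lagrange-multiplier equation by bisection:

```python
import math, random

def el_log_ratio(x, mu0, tol=1e-10):
    """-2 log R(mu0) for the mean: solve sum z_i / (1 + t z_i) = 0 (z_i = x_i - mu0)
    for the Lagrange multiplier t, then -2 log R = 2 sum log(1 + t z_i),
    corresponding to weights p_i = 1 / (n (1 + t z_i))."""
    z = [xi - mu0 for xi in x]
    # t must keep every 1 + t*z_i positive
    lo = max(-1.0 / zi for zi in z if zi > 0) + 1e-9
    hi = min(-1.0 / zi for zi in z if zi < 0) - 1e-9
    g = lambda t: sum(zi / (1.0 + t * zi) for zi in z)
    while hi - lo > tol:                   # g is strictly decreasing in t
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return 2.0 * sum(math.log(1.0 + t * zi) for zi in z)

rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]
xbar = sum(x) / len(x)
print(round(el_log_ratio(x, xbar), 8))        # ~ 0 at the NPMLE
print(round(el_log_ratio(x, xbar + 0.3), 3))  # grows as mu0 moves away from xbar
```

At µ₀ = X̄ the multiplier is t = 0 and the statistic vanishes, while away from X̄ it grows roughly quadratically, as the curvature discussion below describes; comparing it with the χ²₍₁₎ quantile gives the EL test of Theorem 2.2.1.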

• Coverage accuracy: A 100(1 − α)% EL confidence interval for µ is {µ | −2 log ℜ(µ) ≤ χ²₍₁₎,₁₋α}, which is equivalent to {µ | ℜ(µ) ≥ exp(−χ²₍₁₎,₁₋α/2)}. As n → ∞,

\[ \Pr\!\left[-2 \log \mathcal{R}(\mu) \le \chi^2_{(1), 1-\alpha}\right] - (1 - \alpha) \to 0. \]

These statements hold only asymptotically. The O(1/n) rate of convergence of the coverage error is the same as for confidence intervals based on the parametric likelihood, the jackknife and the simpler bootstrap methods. For parametric likelihood intervals the coverage error is O(1/n) if the model is true; if the model is not true, the error need not converge to 0 as n increases. A Bartlett correction can be applied to reduce the coverage error to O(1/n²).

• Power and efficiency: For the data to be fully utilized, the test must have good power, the confidence interval must not be too wide, and the procedure must be efficient. The power of the EL test can be assessed through the curvature of R at the NPMLE X̄. For large sample sizes, log R(μ) ≈ −nσ_0^{−2}(μ − X̄)²/2 for μ close to X̄, where σ_0² = Var(X_i). The greater the (absolute) curvature of this quadratic, the shorter the confidence interval for a given level of coverage, and therefore the greater the power. It can be shown that −2 log R(μ_0 + τσ_0 n^{−1/2}) converges in distribution to χ²_(1)(τ²), where τ² is the non-centrality parameter. Therefore, when μ ≠ μ_0, the empirical likelihood (nonparametric) inferences have roughly the same power as parametric inferences.

Suppose x_1, ..., x_n are independently and identically distributed p-dimensional observations following an unknown population distribution function F with a finite variance matrix of rank r > 0. The main idea of the empirical likelihood method, proposed and systematically developed by Owen (1988, 1990), is to place an unknown probability mass at each observation. Let p_i = P(X = x_i), and define the empirical likelihood function of F as

\[
L(F) = \prod_{i=1}^{n} p_i.
\]

It is clear that L(F), subject to the constraints

\[
p_i \ge 0 \quad \text{and} \quad \sum_i p_i = 1,
\]

is maximized at p_i = 1/n, i.e., the likelihood L(F) attains its maximum n^{−n} under the full nonparametric model. When a population parameter θ identified by E(X) = θ is of interest, the empirical log-likelihood maximum, when θ has the true value θ_0, is obtained subject to the additional constraint

\[
\sum_i x_i p_i = \theta_0.
\]

The empirical log-likelihood ratio (ELR) statistic to test θ = θ_0 is given by

\[
R(\theta_0) = \max\Big\{ \sum_i \log(n p_i) : p_i \ge 0,\ \sum_i p_i = 1,\ \sum_i p_i x_i = \theta_0 \Big\},
\]

where R(θ) is the empirical log-likelihood ratio function defined through the empirical likelihood ratio function of Owen (1988). Owen (1988) showed that, similar to the log-likelihood ratio test statistic in a parametric model, under mild regularity conditions −2R(θ_0) → χ²_r in distribution as n → ∞ under the null hypothesis θ = θ_0, where r is the rank of the variance matrix of X_1, ..., X_n. Readers are referred to Owen (2001) for more details.

2.2.2 Test Statistic

We will test the following hypotheses:

\[
H_0 : f = f_0 \in SN(\mu, \sigma, \lambda)
\qquad \text{versus} \qquad
H_1 : f = f_1 \notin SN(\mu, \sigma, \lambda),
\]

where f_0 and f_1 are both unknown. The likelihood ratio test statistic for this hypothesis is

\[
LR = \frac{\prod_{i=1}^{n} f_{H_1}(X_i)}{\prod_{i=1}^{n} f_{H_0}(X_i)}
   = \frac{\prod_{i=1}^{n} f_{H_1}(X_i)}{\prod_{i=1}^{n} \frac{2}{\sigma}\,\phi\big(\frac{X_i-\mu}{\sigma}\big)\,\Phi\big(\lambda\frac{X_i-\mu}{\sigma}\big)},
\]

where under the null hypothesis X_1, X_2, ..., X_n follow a skew normal distribution with unknown parameter vector θ = (μ, σ, λ). The Neyman-Pearson lemma guarantees that such a test is the UMP test when f_0 and f_1 are both completely known. If they are both unknown but estimable, the maximum likelihood method is applied to estimate the parameters θ̂ = (μ̂, σ̂, λ̂) of the skew normal distribution under the null hypothesis, and the empirical likelihood method is applied to estimate the numerator. We rewrite

\[
L_f = \prod_{i=1}^{n} f_{H_1}(X_i) = \prod_{i=1}^{n} f_{H_1}(X_{(i)}) = \prod_{i=1}^{n} f_i,
\]

where X_{(1)} ≤ X_{(2)} ≤ ··· ≤ X_{(n)} are the order statistics based on the observations X_1, ..., X_n. We apply the empirical likelihood method introduced in Section 2.2.1 to derive the values of f_i that maximize L_f under the constraint ∫ f(s) ds = 1 corresponding to the alternative hypothesis. We first give the following lemma by Vexler et al. (2011) to express this constraint more explicitly.

Lemma 2.2.1. Let f(x) be a density function. Then

\[
\sum_{j=1}^{n} \int_{X_{(j-m)}}^{X_{(j+m)}} f(x)\,dx
= 2m \int_{X_{(1)}}^{X_{(n)}} f(x)\,dx
- \sum_{k=1}^{m-1} (m-k) \int_{X_{(n-k)}}^{X_{(n-k+1)}} f(x)\,dx
- \sum_{k=1}^{m-1} (m-k) \int_{X_{(k)}}^{X_{(k+1)}} f(x)\,dx
\approx 2m \int_{X_{(1)}}^{X_{(n)}} f(x)\,dx - \frac{m(m-1)}{n},
\tag{2.2.1}
\]

where X_{(j)} = X_{(1)} if j ≤ 1, X_{(j)} = X_{(n)} if j ≥ n, and m ∈ Z⁺.

Proof of Lemma 2.2.1. Since f(x) is a density function, we have

\[
\begin{aligned}
\sum_{j=1}^{n} \int_{X_{(j-m)}}^{X_{(j+m)}} f(x)\,dx
&= \sum_{j=1}^{n} \int_{X_{(j-m)}}^{X_{(j-m+1)}} f(x)\,dx
 + \sum_{j=1}^{n} \int_{X_{(j-m+1)}}^{X_{(j+m-1)}} f(x)\,dx
 + \sum_{j=1}^{n} \int_{X_{(j+m-1)}}^{X_{(j+m)}} f(x)\,dx \\
&= \sum_{j=1}^{n} \int_{X_{(j)}}^{X_{(j+1)}} f(x)\,dx
 - \int_{X_{(n-m+1)}}^{X_{(n)}} f(x)\,dx
 + \sum_{j=1}^{n} \int_{X_{(j-m+1)}}^{X_{(j+m-1)}} f(x)\,dx
 + \sum_{j=1}^{n} \int_{X_{(j)}}^{X_{(j+1)}} f(x)\,dx
 - \int_{X_{(1)}}^{X_{(m)}} f(x)\,dx \\
&= \sum_{j=1}^{n} \int_{X_{(j-(m-1))}}^{X_{(j+(m-1))}} f(x)\,dx
 + 2 \sum_{j=1}^{n} \int_{X_{(j)}}^{X_{(j+1)}} f(x)\,dx
 - \int_{X_{(n-(m-1))}}^{X_{(n)}} f(x)\,dx
 - \int_{X_{(1)}}^{X_{(m)}} f(x)\,dx.
\end{aligned}
\]

By applying induction, we can rewrite the following expressions:

\[
\sum_{j=1}^{n} \int_{X_{(j)}}^{X_{(j+1)}} f(x)\,dx = \int_{X_{(1)}}^{X_{(n)}} f(x)\,dx,
\qquad
\sum_{k=1}^{m} \int_{X_{(n-(k-1))}}^{X_{(n)}} f(x)\,dx = \sum_{k=1}^{m-1} (m-k) \int_{X_{(n-k)}}^{X_{(n-k+1)}} f(x)\,dx,
\]
\[
\sum_{k=1}^{m} \int_{X_{(1)}}^{X_{(k)}} f(x)\,dx = \sum_{k=1}^{m-1} (m-k) \int_{X_{(k)}}^{X_{(k+1)}} f(x)\,dx.
\]

The next approximation follows from Vexler et al. (2011), replacing F by the empirical distribution function F_n (so that each increment F_n(X_{(k+1)}) − F_n(X_{(k)}) equals 1/n):

\[
\begin{aligned}
\sum_{j=1}^{n} \int_{X_{(j-m)}}^{X_{(j+m)}} f(x)\,dx
&= 2m \int_{X_{(1)}}^{X_{(n)}} f(x)\,dx
 - \sum_{k=1}^{m-1} (m-k)\big[F(X_{(n-k+1)}) - F(X_{(n-k)})\big]
 - \sum_{k=1}^{m-1} (m-k)\big[F(X_{(k+1)}) - F(X_{(k)})\big] \\
&\approx 2m \int_{X_{(1)}}^{X_{(n)}} f(x)\,dx
 - \sum_{k=1}^{m-1} (m-k)\big[F_n(X_{(n-k+1)}) - F_n(X_{(n-k)})\big]
 - \sum_{k=1}^{m-1} (m-k)\big[F_n(X_{(k+1)}) - F_n(X_{(k)})\big] \\
&= 2m \int_{X_{(1)}}^{X_{(n)}} f(x)\,dx - \frac{m(m-1)}{n}. \qquad \square
\end{aligned}
\]

We should note here that the value of m plays an important role in the test statistics proposed later. To achieve good asymptotic properties of the test statistic, Vexler and Gurevich (2010) discussed a modified version obtained by choosing m ∈ (1, n^{1−δ}), where 0 < δ < 1. See Section 2 of Vexler and Gurevich (2010) for details.

Since ∫_{X_{(1)}}^{X_{(n)}} f(x) dx ≤ ∫_{−∞}^{∞} f(x) dx = 1, from Lemma 2.2.1 we have

\[
\frac{1}{2m} \sum_{j=1}^{n} \int_{X_{(j-m)}}^{X_{(j+m)}} f(x)\,dx
\approx \int_{X_{(1)}}^{X_{(n)}} f(x)\,dx - \frac{1}{2m}\cdot\frac{m(m-1)}{n} \le 1.
\tag{2.2.2}
\]

We denote Δ_m = (1/(2m)) Σ_{j=1}^{n} ∫_{X_{(j−m)}}^{X_{(j+m)}} f(x) dx; therefore Δ_m ≤ 1. We also observe from (2.2.2) that Δ_m → 1 as m/n → 0. By approximating each integral on the left side of equation (2.2.2), we have

\[
\sum_{j=1}^{n} \int_{X_{(j-m)}}^{X_{(j+m)}} f(x)\,dx
\approx \sum_{j=1}^{n} (X_{(j+m)} - X_{(j-m)}) f(x_{(j)})
= \sum_{j=1}^{n} (X_{(j+m)} - X_{(j-m)}) f_j.
\]

Thus,

\[
\widetilde{\Delta}_m := \frac{1}{2m} \sum_{j=1}^{n} (X_{(j+m)} - X_{(j-m)}) f_j \approx \Delta_m,
\qquad \widetilde{\Delta}_m \le 1.
\]

To maximize Σ log f_j under the constraint Δ̃_m ≤ 1, we apply the Lagrange multiplier method and have

\[
l(f_1, \cdots, f_n, \eta) = \sum_{j=1}^{n} \log f_j
+ \eta \Big( \frac{1}{2m} \sum_{j=1}^{n} (X_{(j+m)} - X_{(j-m)}) f_j - 1 \Big),
\tag{2.2.3}
\]

where η is a Lagrange multiplier. Taking the partial derivatives of equation (2.2.3) with respect to each f_j and η, we obtain

\[
\frac{\partial l}{\partial f_j} = 0 \;\Rightarrow\; \frac{1}{f_j} + \frac{\eta}{2m}\,(X_{(j+m)} - X_{(j-m)}) = 0,
\tag{2.2.4}
\]
\[
\frac{\partial l}{\partial \eta} = 0 \;\Rightarrow\; \frac{1}{2m}\sum_{j=1}^{n} (X_{(j+m)} - X_{(j-m)}) f_j - 1 = 0.
\tag{2.2.5}
\]

From equations (2.2.4) and (2.2.5) we have

\[
\frac{1}{f_j} + \frac{\eta}{2m}\,(X_{(j+m)} - X_{(j-m)}) = 0
\;\Rightarrow\; f_j = -\frac{2m}{\eta\,(X_{(j+m)} - X_{(j-m)})},
\tag{2.2.6}
\]

and, multiplying (2.2.4) by f_j and summing over j,

\[
\sum_j f_j \cdot \frac{1}{f_j} + \frac{\eta}{2m} \sum_j f_j\,(X_{(j+m)} - X_{(j-m)}) = 0
\;\Rightarrow\; n + \eta = 0 \;\Rightarrow\; \eta = -n.
\]

Hence we obtain the estimates of f_j that maximize Σ log f_j, and therefore also ∏ f_j, as

\[
f_j = \frac{2m}{n\,(X_{(j+m)} - X_{(j-m)})},
\tag{2.2.7}
\]

where X_{(j)} = X_{(1)} if j ≤ 1, and X_{(j)} = X_{(n)} if j ≥ n. Therefore, the ELR based goodness of fit test statistic for skew normality is defined as

\[
SN_{mn} = \frac{\prod_{j=1}^{n} \frac{2m}{n(X_{(j+m)} - X_{(j-m)})}}{\max_{\theta} \prod_{j=1}^{n} f_{H_0}(X_j \mid \theta)},
\tag{2.2.8}
\]

where θ = (μ, σ, λ) is the parameter vector of a skew normal distribution, and the maximum of the denominator is obtained by plugging in the maximum likelihood estimate (MLE) of θ based on the observations. We notice that the test statistic SN_{mn} depends strongly on the integer m. To make the test more efficient, we follow the argument of Vexler and Gurevich (2010) and reconstruct the test statistic in (2.2.8) according to the properties of the empirical likelihood method. The constraint of Lemma 2.2.1 is taken into account in the derivation of the empirical likelihood. On the contrary, if for some value of m, say m_0, we have

\[
\Delta_{m_0} = \frac{1}{2m_0} \sum_{j=1}^{n} \int_{X_{(j-m_0)}}^{X_{(j+m_0)}} f(x)\,dx > 1,
\]

then this implies ∫_{X_{(1)}}^{X_{(n)}} f(x) dx > 1, which violates the condition of Lemma 2.2.1. Thus we restrict the values of f_j so that the conditions of Lemma 2.2.1 are satisfied for all values of m. Hence for some m_1 we have

\[
\max_{f_1,\cdots,f_n:\ \widetilde{\Delta}_m \le 1 \text{ for all } m}\ \prod_{i=1}^{n} f_i
\;\le\;
\max_{f_1,\cdots,f_n:\ \widetilde{\Delta}_{m_1} \le 1}\ \prod_{i=1}^{n} f_i;
\]

therefore,

\[
\max_{f_1,\cdots,f_n:\ \widetilde{\Delta}_m \le 1 \text{ for all } m}\ \prod_{i=1}^{n} f_i
\;\le\;
\min_{1 \le m_1 < n}\ \max_{f_1,\cdots,f_n:\ \widetilde{\Delta}_{m_1} \le 1}\ \prod_{i=1}^{n} f_i.
\tag{2.2.9}
\]

Now, based on the constraint that

\[
1 \;\ge\; \frac{1}{2r} \sum_{j=1}^{n} f_j\,(X_{(j+r)} - X_{(j-r)})
\;\cong\; \frac{1}{2r} \sum_{j=1}^{n} \int_{X_{(j-r)}}^{X_{(j+r)}} f(v)\,dv
\;\cong\; \int_{X_{(1)}}^{X_{(n)}} f(v)\,dv
\;\cong\; \frac{1}{2k} \sum_{j=1}^{n} \int_{X_{(j-k)}}^{X_{(j+k)}} f(v)\,dv
\;\cong\; \widetilde{\Delta}_k,
\]

which implies Δ̃_k ≤ 1. By virtue of the constraints Δ̃_m ≤ 1 and (2.2.9), we can approximate (2.2.8) by

\[
SN^{*}_{n} = \frac{\min_{1 \le m < n^{1-\delta}} \prod_{j=1}^{n} \frac{2m}{n(X_{(j+m)} - X_{(j-m)})}}{\max_{\theta} \prod_{j=1}^{n} f_{H_0}(X_j \mid \theta)},
\qquad 0 < \delta < 1.
\tag{2.2.10}
\]

Taking δ = 0.5, the test statistic we use is

\[
SN_n = \frac{\min_{1 \le m < \sqrt{n}} \prod_{j=1}^{n} \frac{2m}{n(X_{(j+m)} - X_{(j-m)})}}{\prod_{j=1}^{n} f_{H_0}(X_j \mid \hat{\theta})}.
\tag{2.2.11}
\]
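As a computational companion (our own sketch, not code from the dissertation), the statistic in (2.2.11) can be evaluated on the log scale. We use SciPy's skewnorm for the denominator; its parameterization (a, loc, scale) corresponds to (λ, μ, σ), and skewnorm.fit is a generic numerical MLE, so fitted values may differ slightly from those produced by the R package sn.

```python
import numpy as np
from scipy.stats import skewnorm

def log_sn_statistic(x):
    """log SN_n of (2.2.11): the entropy-type numerator, minimized over
    1 <= m < sqrt(n), minus the skew normal log-likelihood at the MLE."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    idx = np.arange(n)
    log_num = np.inf
    for m in range(1, max(2, int(np.ceil(np.sqrt(n))))):
        upper = x[np.minimum(idx + m, n - 1)]   # X_(j+m), with X_(j) = X_(n) for j >= n
        lower = x[np.maximum(idx - m, 0)]       # X_(j-m), with X_(j) = X_(1) for j <= 1
        log_num = min(log_num, np.sum(np.log(2.0 * m / (n * (upper - lower)))))
    a_hat, loc_hat, scale_hat = skewnorm.fit(x)  # numerical MLE of (lambda, mu, sigma)
    log_den = np.sum(skewnorm.logpdf(x, a_hat, loc=loc_hat, scale=scale_hat))
    return log_num - log_den

# under H0, (1/n) log SN_n should be close to 0 (Theorem 2.3.1 below)
x = skewnorm.rvs(2.0, size=100, random_state=np.random.default_rng(1))
val = log_sn_statistic(x)
```

The loop over m mirrors the min operator in (2.2.11); the clipping of indices implements the boundary convention X_{(j)} = X_{(1)} for j ≤ 1 and X_{(j)} = X_{(n)} for j ≥ n.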

2.3 Asymptotic Results

In this section, we will derive asymptotic properties of the test statistic proposed in (2.2.11) under the null hypothesis and the alternative hypothesis. First we denote

\[
h_i(x, \theta) = \frac{\partial \log f_{H_0}(x; \theta)}{\partial \theta_i}, \qquad i = 1, 2, 3,
\]

and θ = (θ_1, θ_2, θ_3) = (μ, σ, λ). We assume the following conditions hold:

C1. E(log f(X_1))² < ∞.

C2. Under the null hypothesis, |θ̂ − θ| = max_{1≤i≤3} |θ̂_i − θ_i| → 0 in probability.

C3. Under the alternative hypothesis, θ̂ → θ_0 in probability, where θ_0 is a constant vector with finite components.

C4. There are open sets Θ_0 ⊆ R³ and Θ_1 ⊆ R³ containing θ and θ_0 respectively. There also exists a function s(x) such that |h_i(x, ξ)| ≤ s(x) for all x ∈ R, i = 1, 2, 3, and ξ ∈ Θ_0 ∪ Θ_1.

Theorem 2.3.1. Assume that conditions (C1)-(C4) hold. Then under H_0,

\[
\frac{1}{n} \log(SN_n) \to 0
\tag{2.3.1}
\]

in probability as n → ∞.

Proof of Theorem 2.3.1. We consider the following statistic,

\[
T_n = \frac{1}{n} \log \min_{1 \le m < \sqrt{n}} \prod_{j=1}^{n} \frac{2m}{n(X_{(j+m)} - X_{(j-m)})}
= - \max_{1 \le m < \sqrt{n}} t_{mn},
\]

where t_{mn} = (1/n) Σ_{j=1}^{n} log( (n/(2m)) (X_{(j+m)} − X_{(j−m)}) ). It is a part of n^{−1} log(SN_n), where SN_n is the test statistic defined in (2.2.11). Similar to the work by Vasicek (1976), we rewrite

\[
t_{mn} = -\frac{1}{n}\sum_{i=1}^{n} \log f(x_i) + V_{mn} + U_{mn},
\tag{2.3.2}
\]

where

\[
V_{mn} = -\frac{1}{n}\sum_{i=1}^{n} \log\left(\frac{F(x_{(i+m)}) - F(x_{(i-m)})}{f(x_{(i)})\,(x_{(i+m)} - x_{(i-m)})}\right),
\qquad
U_{mn} = \frac{1}{n}\sum_{i=1}^{n} \log\left[\frac{n}{2m}\big(F(X_{(i+m)}) - F(X_{(i-m)})\big)\right],
\]

where F is the distribution function of the X_i's. Denote by F_n the empirical distribution function of the X_i's. Combining the first two terms in (2.3.2), we obtain

\[
\begin{aligned}
-\frac{1}{n}\sum_{i=1}^{n} \log f(x_i) + V_{mn}
&= -\frac{1}{n}\sum_{i=1}^{n} \log f(x_{(i)})
 - \frac{1}{n}\sum_{i=1}^{n} \log\left(\frac{F(x_{(i+m)}) - F(x_{(i-m)})}{f(x_{(i)})\,(x_{(i+m)} - x_{(i-m)})}\right) \\
&= -\frac{1}{n}\sum_{i=1}^{n} \log\left(\frac{F(x_{(i+m)}) - F(x_{(i-m)})}{x_{(i+m)} - x_{(i-m)}}\right) \\
&= -\frac{1}{2m}\sum_{j=1}^{2m}\ \sum_{i \equiv j\ (\mathrm{mod}\ 2m)} \log\left(\frac{F(x_{(i+m)}) - F(x_{(i-m)})}{x_{(i+m)} - x_{(i-m)}}\right)\big(F_n(x_{(i+m)}) - F_n(x_{(i-m)})\big).
\end{aligned}
\]

Let

\[
S_j = - \sum_{i \equiv j\ (\mathrm{mod}\ 2m)} \log\left(\frac{F(x_{(i+m)}) - F(x_{(i-m)})}{x_{(i+m)} - x_{(i-m)}}\right)\big(F_n(x_{(i+m)}) - F_n(x_{(i-m)})\big);
\]

then

\[
t_{mn} = \frac{1}{2m}\sum_{j=1}^{2m} S_j + U_{mn}.
\]

With the argument in Theorem 1 of Vasicek (1976),

\[
\frac{1}{2m}\sum_{j=1}^{2m} S_j \to H(f), \quad a.s.,
\]

as m/n → 0, uniformly for all 1 ≤ m ≤ √n, where H(f) = E(−log f(X_1)). Since the statistic U_{mn} is non-positive by definition and is independent of F, U_{mn} → 0 in probability as m → ∞ and n → ∞ by Lemma 1 of Vasicek (1976). Therefore,

\[
T_n \le -t_{\sqrt{n},n} \stackrel{p}{\to} E_f(\log f(X_1)),
\qquad
T_n \ge - \max_{1 \le m < \sqrt{n}} \frac{1}{2m}\sum_{j=1}^{2m} S_j \stackrel{p}{\to} E_f(\log f(X_1)),
\]

as n → ∞. It implies that

\[
T_n \to E_f(\log f(X_1)), \quad \text{as } n \to \infty.
\tag{2.3.3}
\]

Now, we rewrite the test statistic SN_n as

\[
\frac{1}{n}\log(SN_n) = T_n - \frac{1}{n}\sum_{i=1}^{n} \log f_{H_0}(X_i \mid \theta)
+ \left(\frac{1}{n}\sum_{i=1}^{n} \log f_{H_0}(X_i \mid \theta) - \frac{1}{n}\sum_{i=1}^{n} \log f_{H_0}(X_i \mid \hat{\theta}_n)\right),
\tag{2.3.4}
\]

where θ̂_n = (μ̂, σ̂, λ̂). Under the null hypothesis, T_n → E_{f_{H_0}}(log f_{H_0}(X_1)) in probability by (2.3.3). With condition (C1), we have

\[
\frac{1}{n}\sum_{i=1}^{n} \log f_{H_0}(X_i \mid \theta) \stackrel{p}{\to} E_{f_{H_0}}(\log f_{H_0}(X_1 \mid \theta)).
\tag{2.3.5}
\]

With condition (C2) holding and applying a one-term Taylor expansion to the third part of equation (2.3.4), we obtain

\[
\frac{1}{n}\left[\sum_{i=1}^{n} \log f_{H_0}(X_i \mid \theta) - \sum_{i=1}^{n} \log f_{H_0}(X_i \mid \hat{\theta}_n)\right]
\cong \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{3} h_j(X_i; \hat{\theta}_n)(\theta_j - \hat{\theta}_{nj}),
\]

where h_j(·) is defined at the beginning of this section. By condition (C4), we obtain

\[
\left|\frac{1}{n}\sum_{i=1}^{n} \log f_{H_0}(X_i \mid \theta) - \frac{1}{n}\sum_{i=1}^{n} \log f_{H_0}(X_i \mid \hat{\theta}_n)\right|
\cong \left|\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{3} h_j(X_i; \xi_i)(\theta_j - \hat{\theta}_{nj})\right|
\le \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{3} |h_j(X_i; \xi_i)|\,|\theta_j - \hat{\theta}_{nj}|
\stackrel{p}{\to} 0,
\tag{2.3.6}
\]

where |ξ_i − θ| ≤ |θ − θ̂_n|. Thus under the null hypothesis, equations (2.3.4), (2.3.5) and (2.3.6) provide

\[
\frac{1}{n}\log(SN_n) \stackrel{p}{\to} 0, \quad \text{as } n \to \infty.
\tag{2.3.7}
\]

This completes the proof of Theorem 2.3.1. □

Theorem 2.3.2. Assume that conditions (C1)-(C4) hold. Then under H_1,

\[
\frac{1}{n}\log(SN_n) \to E\left[\log\frac{f_{H_1}(X_1)}{f_{H_0}(X_1; \theta_0)}\right]
\tag{2.3.8}
\]

in probability as n → ∞.

In fact, Theorem 2.3.2 shows that the power of the test goes to 1 as n → ∞ under the alternative hypothesis, that is, the test is consistent.

Proof of Theorem 2.3.2. Under H_1, we have

\[
\frac{1}{n}\log(SN_n) = T_n - \frac{1}{n}\sum_{i=1}^{n}\log f_{H_1}(X_i)
+ \frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{f_{H_1}(X_i)}{f_{H_0}(X_i \mid \theta_0)}\right)
+ \frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{f_{H_0}(X_i \mid \theta_0)}{f_{H_0}(X_i \mid \hat{\theta}_n)}\right).
\tag{2.3.9}
\]

By (2.3.3), similarly under H_1,

\[
T_n \to E_{f_{H_1}}(\log f_{H_1}(X_1)), \quad \text{as } n \to \infty,
\tag{2.3.10}
\]

and condition (C1) leads to

\[
\frac{1}{n}\sum_{i=1}^{n}\log f_{H_1}(X_i) \stackrel{p}{\to} E_{f_{H_1}}(\log f_{H_1}(X_1)).
\tag{2.3.11}
\]

With condition (C3),

\[
\frac{1}{n}\sum_{i=1}^{n}\log\left(\frac{f_{H_0}(X_i \mid \theta_0)}{f_{H_0}(X_i \mid \hat{\theta}_n)}\right) \stackrel{P}{\to} 0
\tag{2.3.12}
\]

as n → ∞. With equations (2.3.10), (2.3.11) and (2.3.12), we obtain

\[
\frac{1}{n}\log(SN_n) \stackrel{P}{\to} E\left[\log\frac{f_{H_1}(X_1)}{f_{H_0}(X_1 \mid \theta_0)}\right] > 0, \quad \text{as } n \to \infty.
\tag{2.3.13}
\]

This completes the proof of Theorem 2.3.2. □

2.4 Calculations of Critical Values and P-values

2.4.1 Critical Values

For a skew normal distribution with location parameter μ, scale parameter σ and shape parameter λ, the maximum likelihood estimators μ̂, σ̂ and λ̂ have the property that the distribution of (μ̂ − μ)/σ̂ and λ̂ depends only on the sample size n and the true value of λ (Mateu-Figueras et al. (2007)). As a consequence, the distribution of the test statistic proposed in Section 2.2 is invariant to changes of the location and scale parameters. Therefore, for simplicity, we simulate 5,000 samples from SN(0, 1, λ) with various values of λ. For each simulated sample, we obtain the maximum likelihood estimates of an assumed skew normal distribution under the null hypothesis, that is μ̂, σ̂ and λ̂, by using the function sn.mle in the R package sn. Then we calculate the test statistic for each sample based on (2.2.11). After we obtain all 5,000 test statistics, we order them and choose the 90th, 95th and 99th percentiles to be the critical values corresponding to the significance levels α = 0.1, 0.05 and 0.01.
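The simulation just described can be sketched generically (our own illustration, not the dissertation's code): given any statistic computed from a sample, simulate from SN(0, 1, λ) (SciPy's skewnorm, with shape a playing the role of λ) and take upper percentiles of the simulated statistics as critical values. The dissertation uses 5,000 replications of the statistic (2.2.11); a cheap placeholder statistic is used below purely to keep the sketch fast.

```python
import numpy as np
from scipy.stats import skewnorm

def mc_critical_values(stat_fn, n, lam, n_sim=2000,
                       alphas=(0.10, 0.05, 0.01), seed=0):
    """Simulate n_sim samples of size n from SN(0, 1, lam), evaluate
    stat_fn on each, and return the upper 100*(1 - alpha) percentiles."""
    rng = np.random.default_rng(seed)
    stats = np.array([stat_fn(skewnorm.rvs(lam, size=n, random_state=rng))
                      for _ in range(n_sim)])
    return {alpha: float(np.quantile(stats, 1.0 - alpha)) for alpha in alphas}

# placeholder statistic (the sample mean) purely to illustrate the mechanics
cv = mc_critical_values(lambda x: float(np.mean(x)), n=50, lam=2.0)
```

In practice stat_fn would be the statistic (2.2.11) itself, and the location-scale invariance noted above is what justifies simulating from SN(0, 1, λ) only.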

2.4.2 Approximations to the p-value of SNn

From (2.2.11) in Section 2.2, we observe that the value of the test statistic depends on the estimated parameters based on the data, and the asymptotic null distribution is not available in closed form. Therefore, we provide an empirical procedure, through simulations, to approximate the p-value asymptotically as follows.

1. Fit the original data x_1, x_2, ..., x_n with a SN(μ, σ, λ) distribution and obtain the estimated values (μ̂, σ̂, λ̂) with the function sn.mle in the R package sn.

2. Simulate skew normal distributed data y_1, y_2, ..., y_n following SN(0, 1, λ̂); this is justified by the distribution of the statistic being invariant to changes of the location and scale parameters, as described in the previous section.

3. Calculate the test statistic (2.2.11) for the original data and denote it by SN_n^1. Calculate the test statistic (2.2.11) for the simulated sample y_1, ..., y_n and denote it by SN_n^{1B}.

4. Repeat the above simulation procedure M times and obtain M test statistics SN_n^{1B}, ..., SN_n^{MB}.

5. The p-value is then approximated by

\[
\hat{p} = \frac{1}{M}\sum_{i=1}^{M} I\big(SN_n^{iB} \ge SN_n^{1}\big),
\]

where I(·) is an indicator function taking the value 1 when SN_n^{iB} ≥ SN_n^{1} and the value 0 when SN_n^{iB} < SN_n^{1}.
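The five steps above can be sketched as follows (our own illustration, not the dissertation's code). The helper log_sn_stat is a compact log-scale version of (2.2.11) built on SciPy's skewnorm, whose (a, loc, scale) parameterization stands in for (λ, μ, σ); M is kept small here only to keep the sketch fast.

```python
import numpy as np
from scipy.stats import skewnorm

def log_sn_stat(x):
    # compact log-scale version of the statistic (2.2.11)
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    idx = np.arange(n)
    best = np.inf
    for m in range(1, max(2, int(np.ceil(np.sqrt(n))))):
        upper = x[np.minimum(idx + m, n - 1)]
        lower = x[np.maximum(idx - m, 0)]
        best = min(best, np.sum(np.log(2.0 * m / (n * (upper - lower)))))
    a, loc, scale = skewnorm.fit(x)              # step 1: numerical MLE
    return best - np.sum(skewnorm.logpdf(x, a, loc=loc, scale=scale)), a

def approximate_p_value(x, M=100, seed=0):
    """Steps 2-5: simulate from SN(0, 1, lambda_hat), recompute the
    statistic for each simulated sample, and count exceedances."""
    rng = np.random.default_rng(seed)
    obs, a_hat = log_sn_stat(x)                  # SN_n^1 (log scale)
    n = len(x)
    sims = np.array([log_sn_stat(skewnorm.rvs(a_hat, size=n, random_state=rng))[0]
                     for _ in range(M)])         # SN_n^{1B}, ..., SN_n^{MB}
    return float(np.mean(sims >= obs))           # p-hat

x = skewnorm.rvs(1.0, size=50, random_state=np.random.default_rng(2))
p_hat = approximate_p_value(x, M=60)
```

Because the statistic is used on the log scale on both sides of the comparison, the exceedance count is unchanged, and M should be taken much larger (the dissertation uses M = 5000) for a stable p-value.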

Table 2.1: Type I error with SN(0, 1, λ), α = 0.05

  n \ λ     1      2      3      4      5      7      10
  20      0.047  0.044  0.049  0.046  0.047  0.045  0.036
  25      0.053  0.051  0.049  0.048  0.048  0.052  0.044
  50      0.052  0.043  0.047  0.047  0.048  0.039  0.048
  100     0.053  0.045  0.033  0.048  0.045  0.044  0.051

2.5 Simulations

To investigate the performance of the proposed test in controlling the Type I error at the given nominal level α = 0.05, we conduct 5000 simulations under SN(0, 1, λ) with different values of λ = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and sample sizes n = 20, 25, 50, 100. For each sample, we calculate the test statistic based on (2.2.11) and compare it to the critical value. The percentage of rejections of the null hypothesis is the empirical size of the proposed test. The simulated results are listed in Table 2.1. We observe that the Type I error of the proposed test is well controlled. We also conduct power simulations to illustrate the performance of the test under the alternative hypothesis. Mateu-Figueras et al. (2007) proposed five different data-driven empirical tests: the Anderson-Darling statistic (A²), the Cramér-von Mises statistic (W²), Watson's statistic (U²), the Kolmogorov-Smirnov statistic (D) and the Kuiper statistic (V). See Mateu-Figueras et al. (2007) for the detailed expressions of these five statistics. They concluded that all five tests are powerful if the alternative distributions are lognorm(0, 1), exp(1) and χ²₂, but have poor performance for the distributions lognorm(0, 0.3), Weibull(1, 2), Gamma(4, 1), Gamma(5, 1) and χ²₄ across different sample sizes. In Table 2.2, we compare the performance of their five tests and the proposed ELR based goodness of fit test under the alternative distributions lognorm(0, 1), exp(1) and χ²₂. The results show that the ELR based test performs better than any of those five tests for the small sample sizes n = 20, 25. When the sample size increases to medium or large, n = 50, 100, the ELR based test performs better than some of those five tests and worse than the others. However, the differences between the powers are still within acceptable ranges. In Table 2.3, we compare the proposed test with those five tests under the alternative distributions lognorm(0, 0.3), Weibull(1, 2), Gamma(4, 1), Gamma(5, 1) and χ²₄. In most cases, the proposed test shows better performance from the small sample size to the large sample size, except for being slightly worse in a few cases. We also observe that when the alternative distribution is Weibull(1, 2), the proposed test shows much higher power than those five tests, which indicates that the proposed test is more sensitive in distinguishing between a Weibull distribution and a skew normal distribution. From the comparison of the powers, we can see that the ELR based test is a very competitive candidate for the goodness of fit test of skew normality. While maintaining the distribution-free property, it avoids having to decide which of the five tests proposed by Mateu-Figueras et al. (2007) should be used to achieve the highest power, a choice which may not be practical since the alternative distribution is usually unknown. Since a normal distribution is the special case of a skew normal distribution with λ = 0, we also investigate the performance of the proposed test under the alternative distribution being a normal distribution. Without loss of generality, we choose N(0, 1) to be the alternative distribution. We conduct 5000 simulations with the sample sizes n = 20, 25, 50, 100 and the significance levels α = 0.1, 0.05, 0.01. Table 2.4 lists the results. From the results, we observe that the powers of the test

Table 2.2: Power comparison with n = 20, 25, 50 and 100

  n     Distribution    ELR     A²      W²      U²      D       V
  20    Exp(1)          0.621   0.367   0.358   0.255   0.333   0.256
  20    lognorm(0, 1)   0.746   0.613   0.634   0.508   0.609   0.514
  20    χ²₂             0.603   0.386   0.375   0.255   0.346   0.258
  25    Exp(1)          0.589   0.446   0.437   0.297   0.394   0.297
  25    lognorm(0, 1)   0.770   0.704   0.720   0.596   0.686   0.591
  25    χ²₂             0.611   0.469   0.460   0.307   0.413   0.304
  50    Exp(1)          0.667   0.744   0.722   0.533   0.666   0.525
  50    lognorm(0, 1)   0.859   0.925   0.929   0.863   0.915   0.855
  50    χ²₂             0.639   0.770   0.749   0.547   0.700   0.543
  100   Exp(1)          0.882   0.952   0.940   0.811   0.905   0.808
  100   lognorm(0, 1)   0.910   0.995   0.995   0.987   0.993   0.983
  100   χ²₂             0.861   0.959   0.950   0.836   0.922   0.833

in all the scenarios are quite low and are around the given nominal levels. This is reasonable, since the normal distribution can be approximated reasonably closely by fitted skew normal distributions. Therefore, the proposed test is not expected to discriminate well between the standard normal distribution and the skew normal distribution. The test statistic proposed in (2.2.10) involves δ ∈ (0, 1). We take δ = 0.5 in the final form of the test statistic in (2.2.11). To investigate the performance of the test statistic with different values of δ, we conducted an extensive Monte Carlo study; the simulation estimates are based on 5000 replications. We list part of the results in Table 2.5. It illustrates that the power of the test statistic does not depend significantly on the value of δ. As Vexler et al. (2011) pointed out, this is due in part to the fact that the operator min in (2.2.11) has the functional ability to detect well the preferable values of m that are a component of the entropy-based statistic in (2.2.8). Therefore, similar to

Table 2.3: Power comparison with n = 20, 25, 50 and 100

  n     Distribution      ELR     A²      W²      U²      D       V
  20    Weibull(1, 2)     0.063   0.009   0.016   0.027   0.016   0.032
  20    Gamma(4, 1)       0.042   0.020   0.026   0.030   0.028   0.035
  20    Gamma(5, 1)       0.035   0.016   0.021   0.031   0.023   0.035
  20    lognorm(0, 0.3)   0.027   0.013   0.021   0.031   0.023   0.034
  20    χ²₄               0.067   0.055   0.066   0.058   0.069   0.058
  25    Weibull(1, 2)     0.068   0.014   0.016   0.028   0.017   0.035
  25    Gamma(4, 1)       0.042   0.020   0.027   0.028   0.017   0.035
  25    Gamma(5, 1)       0.043   0.019   0.023   0.031   0.026   0.036
  25    lognorm(0, 0.3)   0.038   0.020   0.027   0.031   0.023   0.040
  25    χ²₄               0.089   0.063   0.076   0.059   0.073   0.059
  50    Weibull(1, 2)     0.078   0.020   0.020   0.030   0.030   0.036
  50    Gamma(4, 1)       0.043   0.030   0.028   0.039   0.034   0.043
  50    Gamma(5, 1)       0.052   0.028   0.025   0.040   0.032   0.044
  50    lognorm(0, 0.3)   0.041   0.037   0.035   0.048   0.040   0.051
  50    χ²₄               0.101   0.100   0.106   0.083   0.107   0.074
  100   Weibull(1, 2)     0.126   0.071   0.070   0.067   0.070   0.061
  100   Gamma(4, 1)       0.062   0.048   0.046   0.056   0.049   0.058
  100   Gamma(5, 1)       0.053   0.042   0.041   0.051   0.046   0.053
  100   lognorm(0, 0.3)   0.047   0.062   0.056   0.063   0.054   0.062
  100   χ²₄               0.121   0.138   0.139   0.130   0.131   0.112

Table 2.4: Power of Test with Alternative Distribution N(0, 1)

  n \ α    0.01     0.05     0.1
  20      0.0075   0.0580   0.1005
  25      0.0095   0.0505   0.0995
  50      0.0065   0.0490   0.0965
  100     0.0135   0.0495   0.1075

Table 2.5: Empirical Power Evaluation of the statistic (2.2.11) with different δ at α = 0.05

  Distribution     n     δ = 0.4   δ = 0.5   δ = 0.6
  Weibull(1, 2)    20    0.0585    0.0630    0.0695
                   25    0.0705    0.0685    0.0672
                   50    0.0800    0.0780    0.0810
                   100   0.1275    0.1260    0.1240
  Gamma(5, 1)      20    0.0349    0.0350    0.0352
                   25    0.0455    0.0425    0.0405
                   50    0.0505    0.0515    0.0480
                   100   0.0535    0.0525    0.0505

the argument of Vexler et al. (2011), we can hypothesize that the preferable values of m are mostly located below or around n^{0.5}.

Table 2.6: Otis IQ Scores for Non-whites

91, 102, 100, 117, 122, 115, 97, 109, 108, 104, 108, 118, 103, 123, 123, 103, 106, 102, 118, 100, 103, 107, 108, 107, 97, 95, 119, 102, 108, 103, 102, 112, 99, 116, 114, 102, 111, 104, 122, 103, 111, 101, 91, 99, 121, 97, 109, 106, 102, 104, 107, 95

2.6 Application

2.6.1 Otis IQ Scores for Non-whites

The IQ data set is taken from Roberts (1988). The Roberts data sets give the Otis IQ scores for 87 white males and 52 non-white males hired by a large insurance company in 1971. Arnold et al. (1993) applied the skew normal distribution to a part of the data set. Gupta and Brown (2001) used the full data set to illustrate their proposed estimation method for a skew normal distribution. In this section,

we apply the proposed goodness of fit test to the IQ scores for non-whites. The data are given in Table 2.6. We obtain the estimated parameters of an assumed skew normal distribution for the data as μ̂ = 106.62, σ̂ = 8.26 and λ̂ = 0.58. Following the procedure in Section 2.4.2, we generate 5000 samples from SN(0, 1, 0.58) with sample size n = 52. The test statistic is calculated from (2.2.11), with an approximated p-value of 0.012, which leads to the rejection of the null hypothesis of a skew normal distribution at the significance level α = 0.05. In fact, the data can be fitted well enough by a normal distribution. The estimated values for the normal distribution and the skew normal distribution are listed in Table 2.7, together with the values of the Schwarz information criterion and the results of the Shapiro-Wilk and Pearson χ² normality tests. All the results match our conclusion. We also analyze this data with the five statistics studied by Mateu-Figueras et al. (2007). With the same data, the five statistics are W² = 15.04, U² = 9.17, A² = 61.63, D = 0.96 and V = 1.02. With the critical values provided in Mateu-Figueras et al. (2007), all five statistics reject the null hypothesis, which also matches the conclusion obtained above.

Table 2.7: Estimated values for N(μ, σ) and SN(μ, σ, λ)

             Skew normal   Normal
  μ          106.621       106.654
  σ          8.266         8.230
  λ          0.585
  2 log L    −182.1399     −183.3872
  SIC        376.132       374.676

  Shapiro-Wilk test: 0.957 (p-value = 0.05794); Pearson χ² test: 12.6154 (p-value = 0.08205)

Figure 2.1: Histogram of IQ scores with a skew normal fit and normal fit.

Table 2.8: Australian Institute of Sport, Body mass index of 50 females 24.47 23.99 26.24 20.04 25.72 25.64 19.87 23.35 22.42 20.42 22.13 25.17 23.72 21.28 20.87 19.00 22.04 20.12 21.35 28.57 26.95 28.13 26.85 25.27 31.93 16.75 19.54 20.42 22.76 20.12 22.35 19.16 20.77 19.37 22.37 17.54 19.06 20.30 20.15 25.36 22.12 21.25 20.53 17.06 18.29 18.37 18.93 17.79 17.05 20.31

2.6.2 Australian Institute of Sport Data

The second data set is taken from the Australian Institute of Sport data of Cook and Weisberg (1994). The data include 100 females and 102 males, with 13 variables such as height, weight and body mass index (BMI). We choose the BMI values for the second 50 females to apply the proposed method. Table 2.8 gives the body mass index (BMI) of these 50 Australian female athletes. First, we estimate the parameters of an assumed skew normal distribution fitting these data. We obtain the estimated parameters by the function sn.mle in the R package sn as μ̂ = 21.875, σ̂ = 3.288 and λ̂ = 0.826. Then we calculate the test statistic (2.2.11) as 636.3881. We follow the procedure in Section 2.4.2 to generate 5000 samples from SN(0, 1, λ̂) = SN(0, 1, 0.826) with sample size n = 50. The approximated p-value is 0.395, which is greater than the nominal level 0.05. It indicates that we fail to reject the null hypothesis; that is, a skew normal distribution can provide a reasonable fit for the data. The data can be fitted by the skew normal distribution SN(21.875, 3.288, 0.826). We also analyze this data with the five statistics studied by Mateu-Figueras et al. (2007). With the same data, the five statistics are W² = 11.47, U² = 7.09, A² = 41.15, D = 0.99 and V = 1.02. With the critical values provided in Mateu-Figueras et al. (2007), all five statistics reject the null hypothesis and conclude that the skew normal distribution does not provide a good fit.
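As an illustration (not the dissertation's R code), the skew normal fit in this application can be reproduced approximately with SciPy, whose skewnorm.fit returns (a, loc, scale) as analogues of (λ, μ, σ); because this is a generic numerical MLE in a different parameterization, the estimates need not agree exactly with the sn.mle values μ̂ = 21.875, σ̂ = 3.288, λ̂ = 0.826 reported above.

```python
import numpy as np
from scipy.stats import skewnorm

# BMI values of the 50 female athletes from Table 2.8
bmi = np.array([
    24.47, 23.99, 26.24, 20.04, 25.72, 25.64, 19.87, 23.35, 22.42, 20.42,
    22.13, 25.17, 23.72, 21.28, 20.87, 19.00, 22.04, 20.12, 21.35, 28.57,
    26.95, 28.13, 26.85, 25.27, 31.93, 16.75, 19.54, 20.42, 22.76, 20.12,
    22.35, 19.16, 20.77, 19.37, 22.37, 17.54, 19.06, 20.30, 20.15, 25.36,
    22.12, 21.25, 20.53, 17.06, 18.29, 18.37, 18.93, 17.79, 17.05, 20.31,
])

a_hat, mu_hat, sigma_hat = skewnorm.fit(bmi)   # (lambda, mu, sigma) analogues
log_lik = np.sum(skewnorm.logpdf(bmi, a_hat, loc=mu_hat, scale=sigma_hat))
```

The fitted density can then be overlaid on a histogram of the data, as in Figure 2.2.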

Figure 2.2: The histogram with a skew normal fit and normal fit for the body mass index (BMI) of 50 females.

2.7 Conclusion

In this chapter, we propose a goodness-of-fit test for skew normality based on the empirical likelihood ratio method. When both the null and alternative distributions are completely known, the Neyman-Pearson lemma shows that the likelihood ratio test is a UMP test. When the alternative distribution is unknown, which is usually the case in practice, it needs to be estimated in an appropriate way so that the maximum likelihood ratio test can approximate the optimal UMP test. Compared to parametric methods, which provide robust and consistent estimators, nonparametric methods have the advantage of being distribution free. The empirical likelihood method proposed by Owen (1988, 1990) combines both advantages: it is a data-driven method from the nonparametric side, and at the same time it can be used to find efficient estimators and to construct tests with good power properties by using likelihood methods from the parametric side. We also derive the asymptotic behavior of the proposed test statistic under the null and the alternative hypotheses. Simulations indicate that the test controls the Type I error well. The power comparisons with other available data-driven methods show that it is a very competitive goodness of fit test for skew normality.

CHAPTER 3

CHANGE POINT PROBLEM FOR STANDARD SKEW NORMAL DISTRIBUTION

3.1 Introduction

In statistics, a change point can be viewed as an unknown point, or time point, such that the observations follow different distributions before and after that point. In general, the change point problem involves hypothesis testing and parameter estimation. That is, first we need to test the null hypothesis of no change point against the alternative hypothesis of having a single change point. Second, we need to estimate the corresponding location of the change point, if there is any. The change point problem was first introduced by Page (1954, 1955) to detect a single change point, and since then it has received considerable attention in the literature. The change point problem can be defined as follows.

Let X_1, ..., X_n be a sequence of independent random variables with cumulative distribution functions F_1, F_2, ..., F_n, respectively. The change point problem can then be viewed as testing the following hypotheses:

H0 : F1 = F2 = ··· = Fn, (3.1.1)

versus the alternative hypothesis (H_1)

H1 : F1 = F2 = ··· = Fk1 6= Fk1+1 = ··· = Fk2 6= Fk2+1 = ··· = Fkq 6= Fkq+1 = ··· = Fn (3.1.2)

where 1 < k_1 < k_2 < ··· < k_q < n are the change points, and the locations k_1, ..., k_q and their number q are unknown and must be estimated. If the distribution functions belong to a common parametric family F(θ), where θ ∈ R^p, then the change point problem simplifies to testing hypotheses about the population parameters θ_i, i = 1, 2, ..., n:

H0 : θ1 = θ2 = ··· = θn, (3.1.3) versus the alternative

H1 : θ1 = θ2 = ··· = θk1 6= θk1+1 = ··· = θk2 6= θk2+1 = ··· = θkq 6= θkq+1 = ··· = θn, (3.1.4)

where q and k_1, k_2, ..., k_q are unknown and need to be estimated.

3.1.1 Literature Review

In practical situations, statisticians are faced with the problem of detecting the number of change points, or jumps, and their locations. In the literature, the frequently used methods for testing for change points are the maximum likelihood ratio test, Bayesian tests, nonparametric tests, stochastic process methods, and methods based on information criteria. As noted in the literature, the earliest studies of the change point problem concentrated on testing for a single change point in a random sequence of variables. For example, Chernoff and Zacks (1964) derived the Bayes estimator for the current mean of a normal distribution subjected to changes in time, for an a priori uniform distribution on the real line and a quadratic loss function. Sen and Srivastava (1975a,b) derived exact and asymptotic results for some Bayesian test statistics for detecting a change in mean in a sequence of normal distributions. Hawkins (1977) derived the null distributions of the likelihood ratio test for testing a mean change with known and unknown variance; however, the test statistic for the case of unknown variance was incorrect. Worsley (1979) derived the null distribution for a single change point with known and unknown variance (corrected version). Kim and Siegmund (1989) derived a likelihood ratio test to detect a single change point in a simple linear regression model. Change point problems for variance changes have also been considered in the literature. Hsu (1977) used maximum likelihood to estimate the times of variance changes across different periods of a time series. Davis (1979) and Inclan (1993) developed Bayesian approaches to detect multiple change points of variance in observations using posterior odds. Chen and Gupta (1997) studied the variance change problem for the univariate Gaussian model using the information-theoretic approach. Horváth et al. (2004) considered the change point problem for linear models. Gurevich and Vexler (2005) studied the change point problem for the logistic regression model.
Jarušková (2007) studied the asymptotic behavior of the log-likelihood ratio statistic for testing a change in a three-parameter Weibull distribution. Vexler et al. (2009) studied classification problems in the context of the change point problem. Ning and Gupta (2009) investigated the change point problem for the generalized lambda distribution. Ning et al. (2012) and Ning (2012) proposed nonparametric methods to detect different types of changes in the mean. The multiple change point problem has also been widely considered in the literature. To deal with it, Vostrikova (1981) proposed the binary segmentation procedure, which reduces the multiple change point problem to a sequence of single change point problems. This method can detect the number of change points and the corresponding locations simultaneously, so it has a computational advantage, and Vostrikova (1981) also showed that it is consistent. Due to these advantages, the method has been used widely for change point analysis. The binary segmentation procedure, for testing the hypotheses in (3.1.3) and (3.1.4), can be summarized in the following steps:

Step 1: Screen the whole data set for a single change point. That is, test the null hypothesis

H0 : θ1 = θ2 = ··· = θn, (3.1.5)

versus the alternative hypothesis

H1 : θ1 = ··· = θk ≠ θk+1 = ··· = θn, (3.1.6)

where k is the location of the single change point. If we fail to reject H0, we stop. If we reject H0, there is a change point, and we proceed to Step 2.

Step 2: Test the two subsequences before and after the change point found in Step 1 separately for possible change points.

Step 3: Repeat the process until no subsequence contains a further change point.

Step 4: Collect all the change point positions from Steps 1 to 3 and denote them by {k̂1, ··· , k̂q}, where q is the number of change points.

Locating the change points and estimating their positions can be treated as a model selection problem; that is, we select the best model to fit the data with q possible change points. In the following sections, we mainly focus on the method based on the Schwarz information criterion to deal with the multiple change point problem for the standard skew normal distribution and discuss the corresponding properties of this testing procedure. As two alternative options, we also briefly introduce the likelihood ratio test and the Bayesian method for the same problem for the readers' reference.
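The recursion behind Steps 1 to 4 can be sketched as a short driver. This is a minimal Python sketch under one assumption: a user-supplied helper `test_single(segment)` (hypothetical; any single change point test, such as the SIC procedure developed below, could be plugged in) that returns the local index of a detected change point or `None`:

```python
def binary_segmentation(x, test_single):
    """Recursively locate change points via binary segmentation.

    test_single(segment) -> local index of a single change point, or None.
    Returns the sorted global positions of all detected change points.
    """
    found = []

    def recurse(lo, hi):                 # examine x[lo:hi]
        seg = x[lo:hi]
        if len(seg) < 4:                 # too short to split further
            return
        k = test_single(seg)
        if k is None:                    # fail to reject H0: stop here
            return
        found.append(lo + k)             # record the global position
        recurse(lo, lo + k)              # left subsequence
        recurse(lo + k, hi)              # right subsequence

    recurse(0, len(x))
    return sorted(found)
```

The recursion stops on a subsequence as soon as the single change point test fails to reject, mirroring Steps 2 and 3.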

3.2 Change of the Shape Parameter λ

Let X1, ··· , Xn be a sequence of independent random variables from a standard skew normal distribution SN(λ) with shape parameter λ. We are interested in testing for a change in the shape. Hence we consider testing the null hypothesis:

H0 : λ1 = λ2 = ··· = λn = λ, (3.2.1)

versus the alternative hypothesis

H1 : λ1 = λ2 = ··· = λk ≠ λk+1 = λk+2 = ··· = λn, (3.2.2)

where the common shape parameter before the change is denoted λ1 and the one after the change is denoted λ2.

With the binary segmentation method introduced in the previous section, we only need to consider a single change point in the random sequence at each stage.

3.2.1 Information Approach

The information approach was first introduced by Akaike (1973), known as the "Akaike information criterion (AIC)", for model selection in statistics. Since then, many authors have further developed new criteria and applied them to different fields such as decision theory and control theory. Akaike (1973) defined the AIC as follows.

AIC(k) = −2 log L(Θ̂k) + 2k, for k = 1, 2, ··· , K, (3.2.3)

where k is the number of parameters in model k and L(Θ̂k) is the corresponding maximized likelihood. In model selection, the model that minimizes the AIC is considered the best model. Unfortunately, the AIC does not give an asymptotically consistent estimator of the true model order (see Schwarz (1978)). Thus many authors have tried to modify this method, among them Schwarz (1978), who proposed the Schwarz information criterion (SIC) and proved that it gives an asymptotically consistent estimate of the true model order. The Schwarz information criterion is also known as the Bayesian information criterion (BIC). The SIC is defined as follows:

SIC(k) = −2 log L(Θ̂k) + k log n, for k = 1, 2, ··· , K, (3.2.4)

where k is the number of parameters in model k and L(Θ̂k) is the maximized likelihood for the model. Note that the only difference between AIC and SIC is that the penalty term in SIC is k log n instead of 2k. Since SIC is asymptotically consistent, we propose a method based on SIC to detect a change point in the shape parameter of the standard skew normal distribution SN(λ) and estimate its location. To define the SIC under the null and alternative hypotheses (3.2.1) and (3.2.2), we first need to obtain the MLEs of the parameters λ, λ1, λ2. The likelihood functions under the null and alternative hypotheses are:

L_H0(λ) = ∏_{i=1}^n 2φ(xi)Φ(λxi) = 2^n ∏_{i=1}^n φ(xi) ∏_{i=1}^n Φ(λxi), (3.2.5)

L_H1(λ1, λ2) = ∏_{i=1}^k 2φ(xi)Φ(λ1xi) ∏_{i=k+1}^n 2φ(xi)Φ(λ2xi) (3.2.6)
             = 2^n ∏_{i=1}^n φ(xi) ∏_{i=1}^k Φ(λ1xi) ∏_{i=k+1}^n Φ(λ2xi). (3.2.7)

To obtain the MLEs of λ, λ1 and λ2, we take the derivatives of the log-likelihood functions with respect to the parameters and set them equal to zero:

∂ log L_H0(λ)/∂λ = Σ_{i=1}^n xiφ(λxi)/Φ(λxi) = 0, (3.2.8)

∂ log L_H1(λ1, λ2)/∂λ1 = Σ_{i=1}^k xiφ(λ1xi)/Φ(λ1xi) = 0, (3.2.9)

∂ log L_H1(λ1, λ2)/∂λ2 = Σ_{i=k+1}^n xiφ(λ2xi)/Φ(λ2xi) = 0. (3.2.10)

Solving the three equations above yields the MLEs of λ, λ1, λ2. However, the equations have no explicit solutions for these parameters. First, we note that finite solutions do not exist if the observed xi's all have the same sign: if all the observed xi's are positive (negative), then the MLEs of the λ's are plus (minus) infinity. Azzalini (1985, 2005) used a centered parameterization to avoid this situation. Second, even when both positive and negative observations are present, the MLE is quite unstable, with a positive probability of being infinite, especially when |λ| is large and the sample size is small. Taking this into consideration, a numerical approach based on the R package sn (version 0.4-7, Azzalini, 2011) will be applied to obtain λ̂, λ̂1, λ̂2. Now, under the null hypothesis, the SIC is defined as:

SICt(n) = −2 log L(λ̂) + t log n, t = 1,

where λ̂ is the MLE of λ under H0. Under the alternative hypothesis, the SIC is defined as:

SICt(k) = −2 log L(λ̂1, λ̂2) + t log n, t = 2,

where λ̂1, λ̂2 are the MLEs of λ1, λ2, respectively, under H1. Now, based on the principle of minimum information criterion, we can draw the following conclusions.

We accept H0 (there is no change point) if

SICt(n) < min_{2≤k≤n−2} SICt(k),

and we accept H1 (there is a change point) if

SICt(n) > SICt(k) for some k, and we estimate the position of the change point by k̂ such that

SICt(k̂) = min_{2≤k≤n−2} SICt(k).

In change point problems, the method may not detect changes occurring at the very beginning or the very end of the observations. Meanwhile, we need to guarantee sufficient observations to estimate the parameters. Therefore, we propose the trimmed version of the method:

SICt(n) < min_{[log n]≤k≤n−[log n]} SICt(k).

To make our conclusion more statistically convincing, we introduce a critical-level parameter, as suggested by Gupta and Chen (2012), since SICt(n) being only slightly smaller than min SICt(k) may simply reflect fluctuations in the data. Following the same idea, we introduce a significance level α and a critical value cα.

So we will accept H0 if

SICt(n) < min_{[log n]≤k≤n−[log n]} SICt(k) + cα,

where cα is determined from 1 − α = P(SIC(n) < min_{[log n]≤k≤n−[log n]} SIC(k) + cα | H0).
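The whole SIC procedure, including the trimming and the critical value, can be sketched compactly. This is a minimal Python sketch, assuming scipy in place of the R package sn used in the text; the shape MLE is obtained by bounded one-dimensional optimization, which also keeps the optimizer finite when the MLE diverges:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def max_loglik(x, bound=20.0):
    """Maximized SN(lambda) log-likelihood, found numerically since the
    score equation has no closed-form solution.  The bound guards
    against the infinite MLE that occurs when all observations share
    the same sign."""
    nll = lambda lam: -np.sum(np.log(2) + norm.logpdf(x) + norm.logcdf(lam * x))
    return -minimize_scalar(nll, bounds=(-bound, bound), method="bounded").fun

def sic_change_point(x, c_alpha=0.0):
    """Trimmed SIC rule: SIC_t(n) uses t = 1 parameter, SIC_t(k) uses t = 2.
    Returns the estimated change position, or None when H0 is retained."""
    n = len(x)
    trim = max(2, int(np.log(n)))
    sic_null = -2.0 * max_loglik(x) + np.log(n)
    sic = {k: -2.0 * (max_loglik(x[:k]) + max_loglik(x[k:])) + 2.0 * np.log(n)
           for k in range(trim, n - trim + 1)}
    k_hat = min(sic, key=sic.get)
    return k_hat if sic_null > sic[k_hat] + c_alpha else None
```

Passing c_alpha=0 gives the plain minimum-SIC rule; passing a value from Table 3.1 gives the adjusted rule at level α.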

Theorem 3.2.1 (Csörgő and Horváth (1997)):

Under the null hypothesis H0, as n → ∞, for all x ∈ ℝ,

lim_{n→∞} P[a(log n)λn − b(log n) ≤ x] = exp{−2e^{−x}}, (3.2.11)

where a(log n) = (2 log log n)^{1/2}, b(log n) = 2 log log n + (1/2) log log log n − log Γ(1/2), and λn² = max_{[log n]≤k≤n−[log n]} (−2 log L(λ̂) + 2 log L(λ̂1, λ̂2)).

Now, using Theorem 3.2.1, we derive the critical value cα as follows.

1 − α = P(SIC(n) < min_{[log n]≤k≤n−[log n]} SIC(k) + cα | H0)
      = P(max_{[log n]≤k≤n−[log n]} (SIC(n) − SIC(k)) < cα | H0)
      = P(max_{[log n]≤k≤n−[log n]} {−2(log L(λ̂) − log L(λ̂1, λ̂2))} − log n < cα | H0)
      = P(λn² < log n + cα | H0)
      = P(0 < λn² < log n + cα | H0)
      = P(0 < λn < (log n + cα)^{1/2} | H0)
      = P(−b(log n) < a(log n)λn − b(log n) < a(log n)(log n + cα)^{1/2} − b(log n) | H0)
      = P(a(log n)λn − b(log n) < a(log n)(log n + cα)^{1/2} − b(log n))
        − P(a(log n)λn − b(log n) < −b(log n)).

Thus, by (3.2.11), we have

1 − α ≈ exp{−2 exp{−[a(log n)(log n + cα)^{1/2} − b(log n)]}} − exp{−2 exp{b(log n)}}

⇒ 1 − α + exp{−2 exp{b(log n)}} ≈ exp{−2 exp{b(log n) − a(log n)(log n + cα)^{1/2}}}

⇒ log log [1 − α + exp{−2 exp{b(log n)}}]^{−1/2} ≈ b(log n) − a(log n)(log n + cα)^{1/2}.

cα can then be obtained as

cα ≈ [b(log n)/a(log n) − (1/a(log n)) log log [1 − α + exp{−2 exp{b(log n)}}]^{−1/2}]² − log n.
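The closed-form expression above can be checked directly against Table 3.1. A sketch, assuming Python/NumPy rather than the software actually used to produce the dissertation's tables:

```python
import numpy as np
from math import lgamma

def c_alpha(n, alpha):
    """Critical value for the one-parameter (t = 1) SIC test,
    following the closed form derived in the text."""
    a = np.sqrt(2 * np.log(np.log(n)))
    b = (2 * np.log(np.log(n))
         + 0.5 * np.log(np.log(np.log(n)))
         - lgamma(0.5))                       # log Gamma(1/2) = log sqrt(pi)
    eps = np.exp(-2 * np.exp(b))
    # log log (1 - alpha + eps)^(-1/2)
    ll = np.log(-0.5 * np.log(1 - alpha + eps))
    return ((b - ll) / a) ** 2 - np.log(n)
```

For example, c_alpha(100, 0.05) reproduces the tabulated value 8.625775 up to rounding.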

We compute the critical values cα of the SIC for sample sizes n = 13, ··· , 200 and significance levels α = 0.01, 0.025, 0.05, and 0.1; the results are given in Table 3.1.

3.2.2 Likelihood Ratio Based Test

Consider testing the hypotheses (3.2.1) and (3.2.2). We introduce the likelihood ratio based test to detect the change point and estimate its location. When there is a change point at k = k*, we would like to reject H0 for sufficiently small values of Λk, where Λk is defined as:

Λk = max_{H0} f(x1, ··· , xn|λ) / max_{H0∪H1} f(x1, ··· , xn|λ1, λ2)
   = max_{H0} ∏_{i=1}^n f(xi|λ) / max_{H0∪H1} ∏_{i=1}^n f(xi|λ1, λ2)
   = max_{H0} L(λ) / max_{H0∪H1} L(λ1, λ2), (3.2.12)

Table 3.1: Approximate Critical Values of SIC

n     α=0.01    α=0.025   α=0.05    α=0.1
13    20.92675  14.56992  10.49569  6.94636
14    20.43065  14.34047  10.37501  6.89469
15    20.07721  14.16477  10.27905  6.852219
16    19.80693  14.02261  10.19892  6.815508
17    19.58877  13.90274  10.12958  6.782654
18    19.40541  13.79859  10.06803  6.75255
19    19.24663  13.70609  10.01237  6.724527
20    19.10608  13.62261  9.961366  6.698157
21    18.97964  13.54637  9.914163  6.673154
22    18.86451  13.47609  9.870141  6.649319
23    18.75870  13.41085  9.828841  6.626502
24    18.66073  13.34992  9.789907  6.604593
25    18.56949  13.29276  9.753056  6.583501
26    18.48412  13.23891  9.718059  6.563154
27    18.4039   13.18800  9.684725  6.54349
28    18.32826  13.13974  9.652894  6.524459
29    18.25673  13.09386  9.62243   6.506014
30    18.18890  13.05013  9.593215  6.48811
35    17.89456  12.85800  9.462688  6.405716
40    17.65577  12.69917  9.352132  6.332946
45    17.45567  12.56395  9.256123  6.26767
50    17.28396  12.44632  9.171203  6.208415
55    17.13390  12.34228  9.095028  6.154117
60    17.00085  12.24907  9.02593   6.103975
70    16.77339  12.08754  8.904353  6.013831
80    16.58389  11.95085  8.799708  5.934417
90    16.42182  11.83244  8.707783  5.863376
100   16.28045  11.72800  8.625775  5.799058
120   16.04298  11.55018  8.484195  5.686098
140   15.84840  11.40228  8.364673  5.589008
150   15.76296  11.33671  8.3112    5.545099
160   15.68388  11.2757   8.261179  5.503777
170   15.61031  11.21865  8.214187  5.464746
180   15.54154  11.16508  8.169873  5.427759
200   15.41621  11.06686  8.088153  5.359113

where L(·) is the likelihood function. The log-likelihood functions under the null and alternative hypotheses are:

l_H0(λ) = log L_H0(λ) = n log 2 + Σ_{i=1}^n log φ(xi) + Σ_{i=1}^n log Φ(λxi),

l_H1(λ1, λ2) = log L_H1(λ1, λ2) = n log 2 + Σ_{i=1}^n log φ(xi) + Σ_{i=1}^k log Φ(λ1xi) + Σ_{i=k+1}^n log Φ(λ2xi).

Let λ̂, λ̂1, λ̂2 be the MLEs of λ, λ1, λ2. When the change point k is unknown, it is natural to use the maximally selected test statistic, which is defined by

Tn = max_{1≤k≤n} {−2 log Λk}. (3.2.13)

We reject H0 for sufficiently large values of Tn. The estimate k̂ of the change point is the value of k at which the test statistic reaches its maximum. Under mild conditions, Csörgő and Horváth (1997) showed that Tn has an asymptotic extreme value distribution.
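The maximally selected statistic (3.2.13) can be sketched as follows; again a Python/scipy sketch under stated assumptions (bounded numerical MLE in place of the R package sn, and an illustrative trimming of the boundary k values):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def max_loglik(x, bound=20.0):
    """Maximized SN(lambda) log-likelihood over a bounded interval."""
    nll = lambda lam: -np.sum(np.log(2) + norm.logpdf(x) + norm.logcdf(lam * x))
    return -minimize_scalar(nll, bounds=(-bound, bound), method="bounded").fun

def lrt_scan(x, trim=2):
    """T_n = max_k {-2 log Lambda_k} together with the maximizing k."""
    n = len(x)
    l0 = max_loglik(x)                     # fit under H0, computed once
    stats = {k: 2.0 * (max_loglik(x[:k]) + max_loglik(x[k:]) - l0)
             for k in range(trim, n - trim + 1)}
    k_hat = max(stats, key=stats.get)
    return stats[k_hat], k_hat
```

Since splitting the sample can only increase the maximized likelihood, each −2 log Λk is non-negative, and the scan returns the largest one.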

3.2.3 Bayesian Approach

In this section we briefly present the Bayesian approach to testing for a change point in SN(λ). Let x1, ··· , xn be a sequence of independent random variables from the skew normal distribution for which x1, ··· , xk ∼ SN(λ1) and xk+1, ··· , xn ∼ SN(λ2), where λ1 and λ2 are unknown shape parameters and k is the unknown position of the change point. Detecting the change point is equivalent to testing the following hypotheses:

H0 : k = n versus H1 : 1 ≤ k ≤ n − 1

We consider the following priors for the parameters k, λ1 and λ2. Let k have the prior pdf g0(k) given by

g0(k) = p if k = n, and g0(k) = (1 − p)/(n − 1) if k ≠ n,

where p is a known constant with 0 ≤ p ≤ 1. When k = n, λ1 = λ2, and the prior density of λ1 is given by

g1(λ1) = √(π/2) (1 + 2λ1²/(π²/4))^{−3/4} ≈ t(0, π²/4, 1/2),

where 0 is the location (center), π²/4 is the scale, and 1/2 is the degrees of freedom.

When k ≠ n, we assume λ1 and λ2 are independent, and the prior of λ2 is given by

g2(λ2) = √(π/2) (1 + 2λ2²/(π²/4))^{−3/4} ≈ t(0, π²/4, 1/2).

Note that the density g(·) is an approximation of Jeffreys' prior, that is,

g(λ) = f^J(λ) ∝ √( ∫ 2x²φ(x) φ²(λx)/Φ(λx) dx ).

Bayes and Branco (2007) used the following approximation to show that Jeffreys' prior approximates the Student's t distribution t(0, π²/4, 1/2). Consider the approximation

(1/π) φ(x)/√(Φ(x)(1 − Φ(x))) ≈ (1/(√(2π)(π/2))) exp{−x²/(2(π²/4))}; (3.2.14)

see Bayes and Branco (2007) for a detailed proof. Now the Fisher information can be written as:

I(λ) = ∫_{−∞}^{∞} 2x²φ(x) φ²(λx)/Φ(λx) dx
     = ∫_0^{∞} 2x²φ(x) φ²(λx)/Φ(λx) dx + ∫_0^{∞} 2x²φ(x) φ²(λx)/[1 − Φ(λx)] dx
     = ∫_0^{∞} 2x²φ(x) φ²(λx)/{Φ(λx)[1 − Φ(λx)]} dx.

Now using (3.2.14) we obtain,

I(λ) = ∫_{−∞}^{∞} 2x²φ(x) φ²(λx)/Φ(λx) dx ≈ (2/π)(1 + 2λ²/(π²/4))^{−3/2}.

Therefore,

f^J(λ) ≈ √(π/2) (1 + 2λ²/(π²/4))^{−3/4} ≈ t(0, π²/4, 1/2).

The density function of x1, ··· , xn given k, λ1, λ2 is

  n Qn 2 i=1 φ(xi)Φ(λ1xi), if k = n f(x1, ··· xn|k, λ1, λ2) =  n Qn Qk Qn 2 i=1 φ(xi) i=1 Φ(λ1xi) i=k+1 Φ(λ2xi), 1 ≤ k ≤ n − 1

The joint posterior density function of k, λ1, λ2 is given by

h0(k, λ1, λ2|x1, ··· , xn) = f(x1, ··· , xn|k, λ1, λ2) g(k, λ1, λ2) / Σ_{k=1}^n ∫∫ f(x1, ··· , xn|k, λ1, λ2) g(k, λ1, λ2) dλ1 dλ2.

Let h0(λ) = h0(k, λ1, λ2|x1, ··· xn) for convenience.

 !−3/2 √ 2 n Qn π 2λ1  2 φ(x )Φ(λ1x )∗p∗ 1+  i=1 i i 2 π2  4  !−3/2 , k = n  √ 2λ2  Pn R n Qn π 1  2 φ(xi)Φ(λ1xi)∗p∗ 1+ dλ1  k=1 i=1 2 π2  4  !−3/2 !−3/2  √ 2 √ 2 n Qn Qk Qn 1−p π 2λ1 π 2λ2 2 φ(x ) Φ(λ1x ) Φ(λ2x )∗ 1+ 1+ h0(λ) = i=1 i i=1 i i=k+1 i n−1 2 π2 2 π2 4 4  !−3/2 !−3/2 , √ 2λ2 √ 2λ2  Pn RR n Qn Qk Qn 1−p π 1 π 2  2 φ(x ) Φ(λ1x ) Φ(λ2x )∗ 1+ 1+ dλ1dλ2  k=1 i=1 i i=1 i i=k+1 i n−1 2 π2 2 π2  4 4    1 ≤ k ≤ n − 1.

The above equation becomes

−3/2  2 ! 2λ1 Qn  p 1+ Φ(λ1x )  π2 i=1 i  4  !−3/2 , k = n  2  Pn R 2λ1 Qn  p 1+ Φ(λ1xi)dλ1  k=1 π2 i=1  4  !−3/2 !−3/2  2 2 1−p 2λ1 2λ2 Qk Qn 1+ 1+ Φ(λ1x ) Φ(λ2x ) h0(λ) = n−1 π2 π2 i=1 i i=k+1 i 4 4  −3/2 −3/2 ,  2 ! 2 !  Pn RR 1−p 2λ1 2λ2 Qk Qn  1+ 1+ Φ(λ1x ) Φ(λ2x )dλ1dλ2  k=1 n−1 π2 π2 i=1 i i=k+1 i  4 4    1 ≤ k ≤ n − 1

h0(k, λ1, λ2|x1, ··· xn) can be simplified to

h0(λ) = p · const · (1 + 2λ1²/(π²/4))^{−3/4} ∏_{i=1}^n Φ(λ1xi), k = n;
h0(λ) = ((1 − p)/(n − 1)) · const · (1 + 2λ1²/(π²/4))^{−3/4} (1 + 2λ2²/(π²/4))^{−3/4} ∏_{i=1}^k Φ(λ1xi) ∏_{i=k+1}^n Φ(λ2xi), 1 ≤ k ≤ n − 1.

Next we obtain the posterior density of the change point k by integrating h0(k, λ1, λ2|x1, ··· , xn) over λ1 and λ2:

h(k|x) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h0(k, λ1, λ2|x1, ··· , xn) dλ1 dλ2,

  −3/2 ∞ 2λ2 p ∗ const ∗ R 1 + 1 Qn Φ(λ x )dλ , k = n  −∞ π2 i=1 1 i 1  4   −3/2  ∞ 2λ2 h(k|x) = 1−p R 1 Qk n−1 ∗ const. ∗ −∞ 1 + π2 i=1 Φ(λ1xi)dλ1∗  4   −3/2  ∞ 2λ2 n R 1 + 2 Q Φ(λ x )dλ , 1 ≤ k ≤ n − 1.  −∞ π2 i=k+1 2 i 2 4

Suppose that, instead of the approximate Jeffreys' prior for the shape parameters, we use the non-informative improper priors g(λi) ∝ 1/λi (i = 1, 2) for λ1 and λ2 and assume λ1 and λ2 are independent. Then the marginal posterior density of the change point k is given by

h(k|x) = p · const · ∫_{−∞}^{∞} (1/λ1) ∏_{i=1}^n Φ(λ1xi) dλ1, k = n;
h(k|x) = ((1 − p)/(n − 1)) · const · ∫_{−∞}^{∞} (1/λ1) ∏_{i=1}^k Φ(λ1xi) dλ1 · ∫_{−∞}^{∞} (1/λ2) ∏_{i=k+1}^n Φ(λ2xi) dλ2, 1 ≤ k ≤ n − 1. (3.2.15)

To detect the change point, we can use a numerical approach.

Proof of the Approximation of Jeffreys' Prior

The Jeffrey’s prior is given by:

f^J(λ) ∝ √(I(λ)) = √( ∫ 2x²φ(x) φ²(λx)/Φ(λx) dx ),

where I(λ) = ∫ 2x²φ(x) φ²(λx)/Φ(λx) dx is the Fisher information. In this section we show that Jeffreys' prior distribution for λ is approximately equal to the Student's t distribution t(0, π²/4, 1/2). Consider the approximation

(1/π) φ(x)/√(Φ(x)(1 − Φ(x))) ≈ (1/(√(2π)(π/2))) exp{−x²/(2(π²/4))},

which can be rewritten as:

φ(x)/√(Φ(x)(1 − Φ(x))) ≈ (π/(√(2π)(π/2))) exp{−x²/(2(π²/4))} = (2/√(2π)) e^{−2x²/π²}. (3.2.16)
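A quick numerical check of (3.2.16) is straightforward; a Python/scipy sketch (an assumption). The two sides agree exactly at x = 0 and to within a few percent for |x| ≤ 2:

```python
import numpy as np
from scipy.stats import norm

def lhs(x):
    """phi(x) / sqrt(Phi(x) * (1 - Phi(x)))"""
    return norm.pdf(x) / np.sqrt(norm.cdf(x) * norm.sf(x))

def rhs(x):
    """(2 / sqrt(2*pi)) * exp(-2 x^2 / pi^2)"""
    return 2 / np.sqrt(2 * np.pi) * np.exp(-2 * x**2 / np.pi**2)
```

Both sides are symmetric in x, which is why only x ≥ 0 needs to be examined in the derivation that follows.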

Now the Fisher information can be written as:

I(λ) = ∫_{−∞}^{∞} 2x²φ(x) φ²(λx)/Φ(λx) dx
     = ∫_{−∞}^0 2x²φ(x) φ²(λx)/Φ(λx) dx + ∫_0^{∞} 2x²φ(x) φ²(λx)/Φ(λx) dx
     = ∫_0^{∞} 2x²φ(x) φ²(λx)/Φ(λx) dx − ∫_0^{∞} 2(−x)²φ(−x) φ²(−λx)/Φ(−λx) d(−x)
     = ∫_0^{∞} 2x²φ(x) φ²(λx)/Φ(λx) dx + ∫_0^{∞} 2x²φ(x) φ²(λx)/[1 − Φ(λx)] dx
     = ∫_0^{∞} 2x²φ(x) φ²(λx)/{Φ(λx)[1 − Φ(λx)]} dx.

Let z = λx; then dx = (1/λ) dz. Substituting into the above equation, we obtain

I(λ) = ∫_0^{∞} 2(z/λ)²φ(z/λ) φ²(z)/{Φ(z)[1 − Φ(z)]} (1/λ) dz
     = (2/λ³) ∫_0^{∞} z²φ(z/λ) ((2/√(2π)) e^{−2z²/π²})² dz
     = (4/(λ³π)) ∫_0^{∞} z²φ(z/λ) e^{−4z²/π²} dz
     = (4/(λ³π)) ∫_0^{∞} z² (1/√(2π)) e^{−z²/(2λ²)} e^{−4z²/π²} dz
     = (4/(λ³π)) ∫_0^{∞} z² (1/√(2π)) e^{−(z²/2)[1/λ² + 8/π²]} dz
     = (4[1/λ² + 8/π²]^{−1/2}/(λ³π)) ∫_0^{∞} z² (1/(√(2π)[1/λ² + 8/π²]^{−1/2})) e^{−z²/(2([1/λ² + 8/π²]^{−1/2})²)} dz
     = (4[1/λ² + 8/π²]^{−1/2}/(λ³π)) · [1/λ² + 8/π²]^{−1}/2
     = (2/(λ³π)) [1/λ² + 8/π²]^{−3/2}
     = (2/(λ³π)) (1/λ²)^{−3/2} (1 + 2λ²/(π²/4))^{−3/2}
     = (2/π) (1 + 2λ²/(π²/4))^{−3/2}.

Therefore

I(λ) = ∫_{−∞}^{∞} 2x²φ(x) φ²(λx)/Φ(λx) dx ≈ (2/π)(1 + 2λ²/(π²/4))^{−3/2}.

Hence

f^J(λ) ≈ √(π/2) (1 + 2λ²/(π²/4))^{−3/4} ≈ t(0, π²/4, 1/2).

3.3 Simulation

In this section we conduct power simulations to illustrate the performance of the SIC test under the alternative hypothesis. We run 1000 replications under SN(λ) with values λ1, λ2 = 0, ±1, ±2, ±3, ±4, ±5, ±6, sample sizes n = 100, 150, 200, and different change locations. The results are shown in

Tables 3.2, 3.3 and 3.4. We observe that when the difference between λ1 and λ2 is small, the power of the test is low; as the difference increases, the power increases correspondingly. For example, in Table 3.2, for sample size n = 100, k = 30, λ1 = 1, λ2 = 0, the power of the SIC test is 0.293, while for λ1 = 1, λ2 = −6 the power is 0.847. We also observe that the power of the SIC test increases with the sample size.

3.4 Application

In this section, we apply our method to a real data set concerning the Brazilian stock exchange, recorded weekly from October 31, 1995 to October 31, 2000. This time series consists of 263 observations. Since the weekly stock returns may not be independent, we first transform the data into the return rate series Rt by

Rt = (P_{t+1} − Pt)/Pt, t = 1, 2, ··· , 262, (3.4.1)

where Pt is the stock value at the t-th week. Hsu (1977) provided different methods to verify independence. Later, in Chapter 4, we check the independence and normality of Rt. We assume the Rt series data are independent and identically

Table 3.2: Power Simulation for SN(λ) with n = 100, 150 and 200

n=100
λ2\λ1    1      2      3      4      5      6
k=30
 0     0.293  0.460  0.583  0.627  0.677  0.703
-1     0.573  0.563  0.620  0.697  0.713  0.700
-2     0.670  0.750  0.750  0.847  0.807  0.823
-3     0.763  0.820  0.820  0.853  0.883  0.863
-4     0.807  0.870  0.907  0.853  0.863  0.890
-5     0.783  0.910  0.860  0.893  0.923  0.907
-6     0.847  0.903  0.923  0.910  0.907  0.897

k=50
 0     0.253  0.470  0.640  0.597  0.650  0.670
-1     0.527  0.580  0.663  0.677  0.723  0.710
-2     0.680  0.737  0.823  0.847  0.827  0.843
-3     0.740  0.810  0.860  0.860  0.853  0.890
-4     0.790  0.857  0.880  0.880  0.883  0.863
-5     0.790  0.883  0.920  0.920  0.917  0.907
-6     0.823  0.890  0.907  0.907  0.940  0.930

k=70
 0     0.290  0.483  0.607  0.663  0.660  0.670
-1     0.520  0.603  0.650  0.727  0.723  0.663
-2     0.683  0.753  0.800  0.807  0.823  0.833
-3     0.733  0.827  0.867  0.853  0.883  0.863
-4     0.846  0.863  0.863  0.883  0.907  0.887
-5     0.836  0.883  0.907  0.927  0.913  0.933
-6     0.840  0.880  0.903  0.913  0.943  0.947

k=80
 0     0.247  0.463  0.623  0.583  0.630  0.627
-1     0.487  0.697  0.703  0.613  0.690  0.690
-2     0.630  0.720  0.787  0.790  0.803  0.803
-3     0.720  0.817  0.863  0.853  0.840  0.863
-4     0.797  0.833  0.877  0.870  0.903  0.913
-5     0.840  0.883  0.907  0.907  0.910  0.907
-6     0.810  0.887  0.933  0.940  0.910  0.930

Table 3.3: Power Simulation Cont.

n=150
λ2\λ1    1      2      3      4      5      6
k=30
-1     0.523  0.660  0.680  0.710  0.713  0.663
-2     0.673  0.783  0.807  0.857  0.833  0.800
-3     0.703  0.833  0.863  0.830  0.907  0.867
-4     0.777  0.876  0.883  0.907  0.917  0.910
-5     0.830  0.863  0.893  0.893  0.947  0.907
-6     0.800  0.887  0.880  0.910  0.930  0.937
k=75
-1     0.590  0.663  0.677  0.690  0.703  0.713
-2     0.703  0.776  0.807  0.813  0.817  0.833
-3     0.777  0.853  0.883  0.853  0.870  0.857
-4     0.820  0.866  0.873  0.863  0.910  0.897
-5     0.847  0.910  0.940  0.957  0.950  0.907
-6     0.840  0.883  0.923  0.913  0.920  0.930
k=100
-1     0.523  0.643  0.650  0.720  0.700  0.710
-2     0.720  0.810  0.780  0.820  0.800  0.847
-3     0.800  0.857  0.850  0.880  0.903  0.893
-4     0.707  0.833  0.890  0.890  0.913  0.913
-5     0.820  0.887  0.903  0.907  0.910  0.903
-6     0.857  0.880  0.917  0.910  0.933  0.927
k=130
-1     0.483  0.637  0.663  0.693  0.727  0.673
-2     0.637  0.790  0.813  0.810  0.820  0.833
-3     0.756  0.830  0.817  0.843  0.870  0.880
-4     0.777  0.860  0.913  0.850  0.900  0.903
-5     0.857  0.907  0.883  0.903  0.917  0.927
-6     0.810  0.920  0.927  0.800  0.903  0.943

Table 3.4: Power Simulation Cont.

n=200
λ2\λ1    1      2      3      4      5      6
k=20
 0     0.280  0.520  0.573  0.643  0.623  0.653
-1     0.527  0.626  0.653  0.700  0.700  0.763
-2     0.673  0.783  0.760  0.800  0.853  0.817
-3     0.743  0.753  0.853  0.867  0.867  0.860
-4     0.757  0.857  0.863  0.893  0.903  0.903
-5     0.760  0.860  0.890  0.913  0.937  0.917
-6     0.820  0.880  0.877  0.926  0.933  0.920
k=60
 0     0.240  0.520  0.767  0.620  0.677  0.700
-1     0.556  0.663  0.703  0.676  0.680  0.693
-2     0.677  0.693  0.776  0.833  0.816  0.807
-3     0.757  0.803  0.833  0.880  0.887  0.873
-4     0.847  0.873  0.653  0.930  0.893  0.897
-5     0.807  0.870  0.923  0.903  0.927  0.847
-6     0.856  0.897  0.943  0.913  0.940  0.953
k=100
 0     0.346  0.560  0.613  0.746  0.337  0.680
-1     0.510  0.613  0.707  0.717  0.733  0.767
-2     0.710  0.793  0.833  0.817  0.817  0.833
-3     0.753  0.823  0.900  0.846  0.867  0.857
-4     0.816  0.863  0.893  0.923  0.903  0.913
-5     0.807  0.897  0.917  0.930  0.910  0.937
-6     0.830  0.886  0.963  0.948  0.937  0.947
k=140
 0     0.350  0.560  0.530  0.613  0.660  0.690
-1     0.860  0.633  0.680  0.653  0.747  0.740
-2     0.717  0.780  0.833  0.797  0.823  0.870
-3     0.783  0.843  0.833  0.887  0.873  0.890
-4     0.760  0.843  0.897  0.903  0.937  0.873
-5     0.830  0.890  0.917  0.867  0.933  0.923
-6     0.860  0.927  0.927  0.913  0.927  0.943
k=170
 0     0.263  0.497  0.547  0.623  0.570  0.610
-1     0.503  0.593  0.657  0.700  0.753  0.750
-2     0.677  0.733  0.817  0.803  0.777  0.830
-3     0.763  0.807  0.870  0.833  0.873  0.877
-4     0.780  0.870  0.907  0.903  0.880  0.930
-5     0.833  0.887  0.883  0.893  0.763  0.730
-6     0.860  0.837  0.877  0.810  0.737  0.833

distributed (iid) from a SN(λ), and test the following hypotheses:

H0 : λ1 = λ2 = ··· = λ262 = λ (unknown) versus the alternative:

H1 : λ1 = ··· = λk1 ≠ λk1+1 = ··· = λk2 ≠ ··· ≠ λkq+1 = ··· = λ262,

where 1 < k1 < k2 < ··· < kq < 262 are the unknown positions of the change points and q is the unknown number of change points. We apply the binary segmentation procedure along with the SIC test statistic to detect all the possible change points and their locations. We compute SICt(n) and SICt(k) for the stock return


Figure 3.1: Time series plots of the weekly stock returns and the return rate for Brazil, with the corresponding change points.

rate (Rt) at different stages and obtain the following results:

• At the first stage, we consider the whole data set, t from 1 to 262, and test for a change point. We obtain SICt(n) = SICt(262) = −784.2962 and min_{5≤k≤257} SICt(k) = SICt(88) = −836.6715, so SICt(262) > min_{5≤k≤257} SICt(k). Using Table 3.1, we still observe that SICt(262) > SICt(88) + cα for all cα. Thus there is a change point at the 88th position of the stock return rate series, which corresponds to the 89th position of the weekly stock returns. This change occurred on July 11, 1997, which indicates that it was due to the 1997 Asian financial crisis.

• At the second stage, we consider the first 87 return rates, that is, t from 1 to 87, and obtain SIC(n) = SIC(87) = −362.1493 and min_{4≤k≤83} SIC(k) = SIC(18) = −373.0621. We observe that SIC(n) > min_{4≤k≤83} SIC(k); however, the difference is only 10.9128, so to make a decision we use Table 3.1: for α ≥ 0.05 there is a change point, because SIC(n) > min_{4≤k≤83} SIC(k) + cα. Here the change occurs at the 19th position, on March 8, 1996; it may have been a result of the 1995 Mexican crisis.

• At the third stage, we consider the second half of the data, t from 89 to 262 (here n = 174). We have SIC(n) = SIC(174) = −462.6885 and min_{5≤k≤169} SIC(k) = SIC(151) = −479.8763, so SIC(174) > SIC(151). Using Table 3.1, we observe that SIC(n) > min_{5≤k≤169} SIC(k) + cα. Hence there is a change point at the 239th position, which corresponds to the 240th position in the original series, in June 2000; this change was due to the 2000 Dot-com bubble, the collapse of a technology bubble.

• At the fourth stage, we check for a change point in the middle section of the data, t from 89 to 238 (n = 150). We have SIC(n) = SIC(150) = −383.8182 and min_{5≤k≤145} SIC(k) = SIC(88) = −392.3135, so SIC(n) > min_{5≤k≤145} SIC(k), but the difference is only 8.4953. Using Table 3.1 to determine whether there is a change point, we find that there is one at the 176th position for α = 0.05. This change occurred on March 19, 1999.

• Finally, we check for a change point in the data for t from 89 to 175 (n = 87). We have SIC(n) = −199.6188 and min_{4≤k≤83} SIC(k) = SIC(55) = −214.0178, so SIC(n) > min_{4≤k≤83} SIC(k); using Table 3.1, for any α ≥ 0.025 we have cα > 0 and SIC(n) > min_{4≤k≤83} SIC(k) + cα. Hence there is a change point at the 143rd position, which corresponds to the 144th position in the stock return market data. This change occurs on July 31, 1998, and may have been caused by the 1998 Russian crisis.
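The return-rate transformation (3.4.1) used throughout the analysis above amounts to one vectorized line; a Python/NumPy sketch (an assumption, since the dissertation's computations use R):

```python
import numpy as np

def return_rate(prices):
    """R_t = (P_{t+1} - P_t) / P_t  for t = 1, ..., len(prices) - 1."""
    p = np.asarray(prices, dtype=float)
    return (p[1:] - p[:-1]) / p[:-1]
```

A 263-observation price series therefore yields 262 return rates, matching the index range t = 1, ··· , 262 in (3.4.1).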

Therefore, using our testing procedure with nominal level α = 0.05, we detect five possible change points in the Brazilian stock return market data, located at the 19th, 89th, 144th, 177th and 240th positions, which occurred on March 8, 1996, July 11, 1997, July 31, 1998, March 19, 1999 and June 2, 2000, respectively. These changes may have been caused by events such as the 1995 Mexican crisis, the 1997 Asian financial crisis, the 1998 Russian financial crisis and the 2000 Dot-com bubble. The graphs of the stock return market data and the transformed data Rt with the corresponding change points are given in Figure 3.1.

3.5 Conclusion

In this chapter, the change point problem for the shape parameter of SN(λ) is investigated. We propose a test based on the SIC testing procedure to test for changes of the shape parameter and estimate the change location. Simulations are conducted to illustrate the performance of the proposed test. The critical values for various sample sizes and nominal levels are computed for the adjusted SIC test statistic. The proposed testing procedure is applied to analyze the Brazilian stock exchange market. Two alternative testing procedures, based on the likelihood ratio test (LRT) and the Bayesian method, are briefly introduced.

CHAPTER 4

CHANGE POINT PROBLEM FOR GENERAL SKEW NORMAL DISTRIBUTION

In this chapter we focus on the change point problem of the general skew normal distribution SN(µ, σ, λ). First we apply the Schwarz information criterion (SIC) method to test for a simultaneous change of the location and scale parameters, with the assumption that the shape parameter is fixed but unknown. Next we consider the change point problem for all three parameters. Power simulations are conducted to investigate the performance of the test statistics. Lastly, we apply the proposed test statistics to real data sets.

4.1 Location and Scale Change

In this section we consider testing for a location and scale change using the SIC method. Let x1, ··· , xn be a sequence of independent random variables from skew normal distributions SN(µ, σ, λ) with parameters (µ1, σ1, λ), (µ2, σ2, λ), ··· , (µn, σn, λ), respectively. Assume that the shape parameter is constant but unknown and needs to be estimated. Consider testing the following hypotheses:

H0 : µ1 = µ2 = ··· = µn = µ; σ1 = σ2 = ··· = σn = σ, (4.1.1) where µ and σ are unknown, versus the alternative:

H1 : µ1 = ··· = µk1 ≠ µk1+1 = ··· = µk2 ≠ ··· ≠ µkq+1 = ··· = µn,
     σ1 = ··· = σk1 ≠ σk1+1 = ··· = σk2 ≠ ··· ≠ σkq+1 = ··· = σn,

where 1 < k1 < k2 < ··· < kq < n are the unknown change point positions to be estimated, with q unknown change points. We apply the binary segmentation procedure to test the above hypotheses, so we test for and estimate the position of a single change point at each stage. That is, we test (4.1.1) versus

H1 : µ1 = µ2 = ··· = µk ≠ µk+1 = µk+2 = ··· = µn, (4.1.2)
     σ1 = σ2 = ··· = σk ≠ σk+1 = σk+2 = ··· = σn, (4.1.3)

where 1 < k < n and k is the unknown position of the change point, with the common location (scale) parameters before and after the change denoted µ1, µ2 (σ1, σ2).

4.1.1 Information Approach (SIC)

The likelihood functions for the above hypotheses are:

L_H0(µ, σ, λ) = ∏_{i=1}^n (2/σ) φ((xi − µ)/σ) Φ(λ(xi − µ)/σ),

L_H1(µ1, µ2, σ1, σ2, λ) = ∏_{i=1}^k (2/σ1) φ((xi − µ1)/σ1) Φ(λ(xi − µ1)/σ1) ∏_{i=k+1}^n (2/σ2) φ((xi − µ2)/σ2) Φ(λ(xi − µ2)/σ2).

The log-likelihood functions are:

l_H0(µ, σ, λ) = n log 2 − n log σ + Σ_{i=1}^n [log φ((xi − µ)/σ) + log Φ(λ(xi − µ)/σ)],

l_H1(µ1, µ2, σ1, σ2, λ) = n log 2 − k log σ1 + Σ_{i=1}^k [log φ((xi − µ1)/σ1) + log Φ(λ(xi − µ1)/σ1)]
                          − (n − k) log σ2 + Σ_{i=k+1}^n [log φ((xi − µ2)/σ2) + log Φ(λ(xi − µ2)/σ2)].

To obtain the MLEs of µ, µ1, µ2, σ, σ1, σ2 and λ, we first take the derivatives of the log-likelihood functions with respect to the parameters and set the equations equal to zero:

∂l_H0(µ, σ, λ)/∂µ = Σ_{i=1}^n [−(1/σ) φ′((xi − µ)/σ)/φ((xi − µ)/σ) − (λ/σ) φ(λ(xi − µ)/σ)/Φ(λ(xi − µ)/σ)]
                  = Σ_{i=1}^n [(xi − µ)/σ² − (λ/σ) φ(λ(xi − µ)/σ)/Φ(λ(xi − µ)/σ)] = 0, (4.1.4)

∂l_H0(µ, σ, λ)/∂σ = Σ_{i=1}^n [−((xi − µ)/σ²) φ′((xi − µ)/σ)/φ((xi − µ)/σ) − (λ(xi − µ)/σ²) φ(λ(xi − µ)/σ)/Φ(λ(xi − µ)/σ)] (4.1.5)
                  = Σ_{i=1}^n [(xi − µ)²/σ³ − (λ(xi − µ)/σ²) φ(λ(xi − µ)/σ)/Φ(λ(xi − µ)/σ)] = 0. (4.1.6)

∂l_H0(µ, σ, λ)/∂λ = Σ_{i=1}^n ((xi − µ)/σ) φ(λ(xi − µ)/σ)/Φ(λ(xi − µ)/σ) = 0. (4.1.7)

Similarly we have

∂l_H1/∂µ1 = Σ_{i=1}^k [(xi − µ1)/σ1² − (λ/σ1) φ(λ(xi − µ1)/σ1)/Φ(λ(xi − µ1)/σ1)] = 0, (4.1.8)

∂l_H1/∂µ2 = Σ_{i=k+1}^n [(xi − µ2)/σ2² − (λ/σ2) φ(λ(xi − µ2)/σ2)/Φ(λ(xi − µ2)/σ2)] = 0, (4.1.9)

∂l_H1(µ1, µ2, σ1, σ2, λ)/∂σ1 = Σ_{i=1}^k [(xi − µ1)²/σ1³ − (λ(xi − µ1)/σ1²) φ(λ(xi − µ1)/σ1)/Φ(λ(xi − µ1)/σ1)] = 0, (4.1.10)

∂l_H1(µ1, µ2, σ1, σ2, λ)/∂σ2 = Σ_{i=k+1}^n [(xi − µ2)²/σ2³ − (λ(xi − µ2)/σ2²) φ(λ(xi − µ2)/σ2)/Φ(λ(xi − µ2)/σ2)] = 0, (4.1.11)

∂l_H1(µ1, µ2, σ1, σ2, λ)/∂λ = Σ_{i=1}^k ((xi − µ1)/σ1) φ(λ(xi − µ1)/σ1)/Φ(λ(xi − µ1)/σ1) + Σ_{i=k+1}^n ((xi − µ2)/σ2) φ(λ(xi − µ2)/σ2)/Φ(λ(xi − µ2)/σ2) = 0.

Secondly, we solve equations (4.1.4) to (4.1.11) to obtain the MLEs of µ, µ1, µ2, σ, σ1, σ2 and λ. However, there is no explicit form for the solutions to these equations, so a numerical approach (the R package sn, version 0.4-7, by Azzalini, 2011) will be applied to obtain the MLEs of these parameters. Let µ̂, µ̂1, µ̂2, σ̂, σ̂1, σ̂2, λ̂ denote the MLEs of µ, µ1, µ2, σ, σ1, σ2 and λ, respectively. Under the null hypothesis, the SIC is given by

SICt(n) = −2 log L(µ̂, σ̂, λ̂) + t log n, (4.1.12)

where t = 3 is the number of parameters in the model under H0. Under the alternative hypothesis, the SIC is given by

SICt(k) = −2 log L(µ̂1, µ̂2, σ̂1, σ̂2, λ̂) + t log n, (4.1.13)

where t = 5 is the number of parameters in the model under H1 and we choose [log n] ≤ k ≤ n − [log n]. Thus we reject the null hypothesis if

SICt(n) > min_{[log n]≤k≤n−[log n]} SICt(k),

and conclude that there is a change point at k̂ such that SICt(k̂) = min_{[log n]≤k≤n−[log n]} SICt(k).
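A sketch of this location-scale SIC procedure in Python/scipy (an assumption; the text uses the R package sn). One simplification is stated in the code: the shape parameter is refit within each segment rather than held common across segments:

```python
import numpy as np
from scipy.stats import skewnorm

def sic_location_scale(x):
    """SIC_t(n) with t = 3 versus min_k SIC_t(k) with t = 5.

    Each segment is fitted by skewnorm.fit (full MLE of shape, location
    and scale).  As a simplification, the shape is refit in each segment
    rather than held common, a mild departure from the text's formulation.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    trim = max(5, int(np.log(n)))       # keep enough points per segment

    def neg2ll(xs):
        a, loc, scale = skewnorm.fit(xs)
        return -2.0 * np.sum(skewnorm.logpdf(xs, a, loc, scale))

    sic_null = neg2ll(x) + 3.0 * np.log(n)
    sic_alt = {k: neg2ll(x[:k]) + neg2ll(x[k:]) + 5.0 * np.log(n)
               for k in range(trim, n - trim + 1)}
    k_hat = min(sic_alt, key=sic_alt.get)
    return sic_null, k_hat, sic_alt[k_hat]
```

The change point is declared when the returned sic_null exceeds the minimized alternative SIC (optionally by more than a critical value cα from Table 4.1).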

Theorem 4.1.1 (Csörgő and Horváth (1997)). Under the null hypothesis, for all x ∈ ℝ,

lim_{n→∞} P[a(log n)λn − b(log n) ≤ x] = exp{−2e^{−x}}, (4.1.14)

where a(log n) = (2 log log n)^{1/2}, b(log n) = 2 log log n + log log log n, and

λn² = max_{[log n]≤k≤n−[log n]} {2 log L(µ̂1, µ̂2, σ̂1, σ̂2, λ̂) − 2 log L(µ̂, σ̂, λ̂)}.

As we mentioned in Chapter 3, a small difference between SICt(k) and SICt(n) may be a result of data fluctuation when in fact there is no change point. To make our conclusion more statistically convincing, we introduce a significance level α and the corresponding critical value cα, and we conclude there is a change point if

SICt(n) > min_{[log n]≤k≤n−[log n]} SICt(k) + cα,

where cα can be computed by

1 − α = P(SICt(n) < min_{[log n]≤k≤n−[log n]} SICt(k) + cα | H0). (4.1.15)

Thus, from (4.1.12) and (4.1.13), we have

1 − α = P( SICt(n) < min_{[log n] ≤ k ≤ n−[log n]} SICt(k) + cα | H0 )
      = P( SICt(n) − min_{[log n] ≤ k ≤ n−[log n]} SICt(k) < cα | H0 )
      = P( max_{[log n] ≤ k ≤ n−[log n]} (SICt(n) − SICt(k)) < cα | H0 )
      = P( max_{[log n] ≤ k ≤ n−[log n]} {−2[log L(µ̂, σ̂, λ̂) − log L(µ̂1, µ̂2, σ̂1, σ̂2, λ̂)]} − 2 log n < cα | H0 )
      = P( λn² < 2 log n + cα | H0 )
      = P( 0 < λn² < 2 log n + cα | H0 )
      = P( 0 < λn < (2 log n + cα)^{1/2} | H0 )
      = P( −b(log n) < a(log n)λn − b(log n) < a(log n)(2 log n + cα)^{1/2} − b(log n) | H0 )
      = P( a(log n)λn − b(log n) < a(log n)(2 log n + cα)^{1/2} − b(log n) | H0 )
        − P( a(log n)λn − b(log n) < −b(log n) | H0 ).

Now, with the approximation in Theorem 4.1.1, we solve for cα as follows.

1 − α ≅ exp{−2 exp{−[a(log n)(2 log n + cα)^{1/2} − b(log n)]}} − exp{−2 exp{b(log n)}}

⇒ 1 − α + exp{−2 exp{b(log n)}} ≅ exp{−2 exp{−[a(log n)(2 log n + cα)^{1/2} − b(log n)]}}

⇒ log log[1 − α + exp{−2 exp{b(log n)}}]^{−1/2} ≅ b(log n) − a(log n)(2 log n + cα)^{1/2}

⇒ cα ≅ [ b(log n)/a(log n) − (1/a(log n)) log log[1 − α + exp{−2 exp{b(log n)}}]^{−1/2} ]² − 2 log n.
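This closed form is easy to evaluate. A small Python helper (our own naming, not code from the dissertation) reproduces the entries of Table 4.1:

```python
import math

def critical_value(n, alpha):
    """Approximate c_alpha for the location-scale change test, with
    a(log n) = (2 log log n)^(1/2) and
    b(log n) = 2 log log n + log log log n."""
    x = math.log(math.log(n))
    a = math.sqrt(2.0 * x)
    b = 2.0 * x + math.log(x)
    eps = math.exp(-2.0 * math.exp(b))   # negligible for moderate n
    inner = math.log(-0.5 * math.log(1.0 - alpha + eps))
    return ((b - inner) / a) ** 2 - 2.0 * math.log(n)
```

For instance, critical_value(100, 0.05) returns roughly 7.486 and critical_value(50, 0.1) roughly 5.293, matching Table 4.1.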

Adjusted critical values for different sample sizes at given nominal levels are given in Table 4.1.

Table 4.1: Critical values with α and Sample size n

n     α=0.01     α=0.025    α=0.05     α=0.1
7     35.69935   19.63085   12.90938   7.757992
8     25.97584   17.23230   11.92526   7.404845
9     23.94784   16.42328   11.54044   7.262061
10    23.07060   15.99423   11.31283   7.168499
11    22.52369   15.69148   11.13858   7.087391
12    22.10831   15.44547   10.98893   7.010367
13    21.76289   15.23288   10.85445   6.935751
14    21.46347   15.04386   10.73120   6.863355
15    21.19818   14.87308   10.61709   6.793235
16    20.95987   14.71714   10.51070   6.725433
17    20.74363   14.57361   10.41098   6.659935
18    20.54582   14.44062   10.31712   6.596686
19    20.36366   14.31671   10.22843   6.535604
20    20.19494   14.20073   10.14437   6.476595
21    20.03788   14.09171   10.06445   6.419556
22    19.89103   13.98886   9.988275   6.364386
23    19.75319   13.89152   9.915503   6.310986
24    19.62336   13.79911   9.845834   6.259258
25    19.50068   13.71117   9.779008   6.209112
26    19.38444   13.62728   9.714797   6.209112
27    19.27401   13.54708   9.652998   6.113227
28    19.16885   13.47026   9.593433   6.067332
29    19.06850   13.39655   9.535943   6.022706
30    18.97255   13.32569   9.480385   5.979285
35    18.54758   13.00757   9.227490   5.778242
40    18.19266   12.73666   9.007971   5.599685
45    17.88832   12.50071   8.813923   5.439112
50    17.62215   12.29170   8.639973   5.293224
55    17.38579   12.10408   8.482294   5.159545
60    17.17331   11.93387   8.338068   5.036173
65    16.98042   11.77811   8.205151   4.921615
70    16.80384   11.63453   8.081879   4.814683
80    16.49016   11.37717   7.859242   4.620012
90    16.21778   11.15145   7.662302   4.446292
100   15.97721   10.95041   7.485684   4.289397
120   15.56699   10.60421   7.179053   4.014778
140   15.22548   10.31289   6.918813   3.779721
150   15.07403   10.18286   6.802049   3.673718
160   14.93309   10.06140   6.692662   3.574131
180   14.67758   9.840132   6.492633   3.391355
200   14.45073   9.642588   6.313270   3.226777
300   13.59074   8.885006   5.619338   2.584701

4.1.2 Power Simulation

In this section, a power simulation is conducted to illustrate the performance of the proposed testing procedure for different changes of location and scale. We ran the simulation 1000 times under SN(µ, σ, 1) with different change point locations k, sample sizes n = 100, 150 and 200, and location and scale parameters µ1 = σ1 = 1, 2, 3 and µ2 = σ2 = 2, 3, 4, 5, 6. We notice that as the difference between the parameters increases, the power of the test also increases. For instance, for sample size n = 100 and k = 20, when (µ1 = σ1, µ2 = σ2) = (1, 2) the power is 0.597, while for (µ1 = σ1, µ2 = σ2) = (1, 4) the power is 0.917. We observe that the power of the test is within an acceptable range. The simulation also indicates that the proposed testing procedure controls the type I error within a given nominal level. The results are given in Table 4.2.

4.2 Application to Biomedical Data

We applied the SIC method to test for change points in the array Comparative Genomic Hybridization (aCGH) data set; see Snijders et al. (2001) for more details. We consider Chromosome 4 of the fibroblast cell line GM13330. This chromosome consists of 167 genomic positions, at which the log base 2 ratios of the intensities were recorded. Using the test criteria in (4.1.12) and (4.1.13), we compute the SIC for all the genomic positions. The value of SICt(n) = −55.86854, and min_{6 ≤ k ≤ 163} SICt(k) = SICt(150) = −301.2888. Clearly, SICt(n) is larger than the minimum of SICt(k), and the minimum SIC occurs at the 150th position. The graphs of the SIC values and the log base 2 ratios of the fibroblast cell line are given in Figure 4.1.

Table 4.2: Power Simulation for SN(µ, σ, λ) with n = 100, 150, 200 and different values of µ1, σ1 and µ2, σ2

        µ1=σ1 \ µ2=σ2     2       3       4       5       6
n=100   k=20    1         0.597   0.783   0.917   0.930   0.933
                2         0.050   0.240   0.530   0.740   0.780
                3         0.250   0.050   0.353   0.760   0.653
        k=50    1         0.597   0.820   0.927   0.967   0.9567
                2         0.050   0.300   0.643   0.757   0.830
                3         0.433   0.050   0.220   0.407   0.603
        k=75    1         0.723   0.923   0.933   0.987   0.977
                2         0.050   0.397   0.693   0.883   0.883
                3         0.038   0.050   0.693   0.543   0.570

n=150   k=50    1         0.603   0.860   0.907   0.930   0.960
                2         0.050   0.435   0.790   0.753   0.840
                3         0.300   0.050   0.593   0.437   0.697
        k=75    1         0.690   0.787   0.940   0.953   0.970
                2         0.050   0.443   0.623   0.737   0.827
                3         0.293   0.050   0.257   0.487   0.617
        k=120   1         0.650   0.867   0.930   0.967   0.970
                2         0.050   0.200   0.630   0.800   0.867
                3         0.180   0.050   0.300   0.480   0.603

n=200   k=20    1         0.490   0.810   0.903   0.940   0.950
                2         0.050   0.307   0.597   0.710   0.813
                3         0.423   0.050   0.177   0.477   0.760
        k=50    1         0.603   0.860   0.907   0.930   0.960
                2         0.050   0.443   0.790   0.753   0.840
                3         0.250   0.050   0.593   0.437   0.697
        k=100   1         0.600   0.790   0.890   0.950   0.970
                2         0.050   0.397   0.680   0.800   0.893
                3         0.278   0.050   0.516   0.677   0.780

We observe that the change point is visible in Figure 4.1 at the 150th position. This result matches the one obtained by Chen and Gupta (2012).

Figure 4.1: Left: The SIC values for every locus on chromosome 4 of the fibroblast cell line GM13330; Right: Chromosome 4 of the fibroblast cell line GM13330.

4.3 The Change of Location, Scale and Shape

In this section, we consider testing for simultaneous changes of the location, scale and shape parameters of the skew normal distribution. Suppose x1, ..., xn is a sequence of independent random variables from skew normal distributions with parameters (µ1, σ1, λ1), (µ2, σ2, λ2), ..., (µn, σn, λn), respectively. Consider testing the following hypothesis:

H0 : µ1 = µ2 = ··· = µn = µ;

and σ1 = σ2 = ··· = σn = σ;

and λ1 = λ2 = ··· = λn = λ,

versus the alternative:

H1 : µ1 = ··· = µk1 ≠ µk1+1 = ··· = µk2 ≠ ··· ≠ µkq+1 = ··· = µn,

and σ1 = ··· = σk1 ≠ σk1+1 = ··· = σk2 ≠ ··· ≠ σkq+1 = ··· = σn,

and λ1 = ··· = λk1 ≠ λk1+1 = ··· = λk2 ≠ ··· ≠ λkq+1 = ··· = λn,

where 1 < k1 < k2 < ··· < kq < n are the unknown change point positions to be estimated.

4.3.1 Test Statistics

The likelihood functions under H0 and H1 are defined as

L_{H0}(µ, σ, λ) = ∏_{i=1}^{n} (2/σ) φ((xi − µ)/σ) Φ(λ(xi − µ)/σ),

L_{H1}(µ1, µ2, σ1, σ2, λ1, λ2) = ∏_{i=1}^{k} (2/σ1) φ((xi − µ1)/σ1) Φ(λ1(xi − µ1)/σ1) · ∏_{i=k+1}^{n} (2/σ2) φ((xi − µ2)/σ2) Φ(λ2(xi − µ2)/σ2).

The SIC test statistics are given as follows.

Under the null hypothesis H0

SICt(n) = −2 log L(µ̂, σ̂, λ̂) + t log n, (4.3.1)

where t = 3 is the number of parameters in the model under H0.

Under the alternative hypothesis H1, the SIC is defined as:

SICt(k) = −2 log L(µ̂1, µ̂2, σ̂1, σ̂2, λ̂1, λ̂2) + t log n, (4.3.2)

where t = 6 is the number of parameters in the model under H1, [log n] ≤ k ≤ n − [log n], and (µ̂1, µ̂2, σ̂1, σ̂2, λ̂1, λ̂2) are the MLEs of the parameters µ1, µ2, σ1, σ2, λ1, λ2. The R package sn, version 0.4-7 (Azzalini, 2011), is applied to compute these estimates. Thus we reject the null hypothesis if

SICt(n) > min_{[log n] ≤ k ≤ n−[log n]} SICt(k), (4.3.3)

where k̂ is the estimated change point position such that SICt(k̂) = min_{[log n] ≤ k ≤ n−[log n]} SICt(k).
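The full scan over k can be sketched as follows, again with scipy's skewnorm.fit standing in for the R sn-package MLEs; the stability floor min_seg on the segment length is our own addition, not part of the dissertation's procedure.

```python
import numpy as np
from scipy import stats

def seg_ll(x):
    # Unrestricted skew normal MLE for one segment; scipy's skewnorm.fit
    # is a stand-in for the R sn-package estimates used in the text.
    lam, mu, sigma = stats.skewnorm.fit(x)
    return stats.skewnorm.logpdf(x, lam, mu, sigma).sum()

def sic_scan(x, min_seg=8):
    """SIC under H0 (t = 3) and its minimum over k under H1 (t = 6),
    as in (4.3.1)-(4.3.3); min_seg is a stability floor on [log n]."""
    n = len(x)
    lo = max(int(np.log(n)), min_seg)
    sic_n = -2.0 * seg_ll(x) + 3.0 * np.log(n)
    sic_k = {k: -2.0 * (seg_ll(x[:k]) + seg_ll(x[k:])) + 6.0 * np.log(n)
             for k in range(lo, n - lo + 1)}
    k_hat = min(sic_k, key=sic_k.get)
    return sic_n, k_hat, sic_k[k_hat]
```

The null hypothesis is rejected when sic_n exceeds the returned minimum (plus cα in the refined rule), with the change point estimated at k_hat.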

Theorem 4.3.1. (Csörgő and Horváth (1997)). Let

λn² = max_{[log n] ≤ k ≤ n−[log n]} {−2[log L(µ̂, σ̂, λ̂) − log L(µ̂1, µ̂2, σ̂1, σ̂2, λ̂1, λ̂2)]},

Under the null hypothesis, as n → ∞, we have

lim_{n→∞} P[a(log n)λn − b(log n) ≤ x] = exp{−2e^{−x}}, (4.3.4)

where a(log n) = (2 log log n)^{1/2} and b(log n) = 2 log log n + (3/2) log log log n − log Γ(3/2).

Approximation of Critical Values cα

As mentioned in Chapter 3, we introduce a significance level α and compute the corresponding critical value cα. Thus, instead of concluding that there is a change point based on (4.3.3), we reject the null hypothesis if

SICt(n) > min_{[log n] ≤ k ≤ n−[log n]} SICt(k) + cα. (4.3.5)

Here α and cα have the following relationship:

1 − α = P( SICt(n) < min_{[log n] ≤ k ≤ n−[log n]} SICt(k) + cα | H0 ). (4.3.6)

Now, using (4.3.1) and (4.3.2), we compute cα as follows:

1 − α = P( SICt(n) < min_{[log n] ≤ k ≤ n−[log n]} SICt(k) + cα | H0 )
      = P( max_{[log n] ≤ k ≤ n−[log n]} (SICt(n) − SICt(k)) < cα | H0 )
      = P( max_{[log n] ≤ k ≤ n−[log n]} {−2[log L(µ̂, σ̂, λ̂) − log L(µ̂1, µ̂2, σ̂1, σ̂2, λ̂1, λ̂2)]} − 3 log n < cα | H0 )
      = P( λn² < 3 log n + cα | H0 )
      = P( 0 < λn < (3 log n + cα)^{1/2} | H0 )
      = P( −b(log n) < a(log n)λn − b(log n) < a(log n)(3 log n + cα)^{1/2} − b(log n) | H0 )
      = P( a(log n)λn − b(log n) < a(log n)(3 log n + cα)^{1/2} − b(log n) | H0 )
        − P( a(log n)λn − b(log n) < −b(log n) | H0 ).

Now, with the approximation in Theorem 4.3.1, we solve for cα as follows.

1 − α ≅ exp{−2 exp{−[a(log n)(3 log n + cα)^{1/2} − b(log n)]}} − exp{−2 exp{b(log n)}}

⇒ 1 − α + exp{−2 exp{b(log n)}} ≅ exp{−2 exp{−[a(log n)(3 log n + cα)^{1/2} − b(log n)]}}

⇒ log log[1 − α + exp{−2 exp{b(log n)}}]^{−1/2} ≅ b(log n) − a(log n)(3 log n + cα)^{1/2}

⇒ cα ≅ [ b(log n)/a(log n) − (1/a(log n)) log log[1 − α + exp{−2 exp{b(log n)}}]^{−1/2} ]² − 3 log n.
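As before, the closed form can be evaluated directly; math.lgamma supplies log Γ(3/2). The helper below (our own naming, not code from the dissertation) reproduces Table 4.3:

```python
import math

def critical_value_3(n, alpha):
    """Approximate c_alpha for the three-parameter change test, with
    b(log n) = 2 log log n + (3/2) log log log n - log Gamma(3/2)."""
    x = math.log(math.log(n))
    a = math.sqrt(2.0 * x)
    b = 2.0 * x + 1.5 * math.log(x) - math.lgamma(1.5)
    eps = math.exp(-2.0 * math.exp(b))   # negligible for moderate n
    inner = math.log(-0.5 * math.log(1.0 - alpha + eps))
    return ((b - inner) / a) ** 2 - 3.0 * math.log(n)
```

For instance, critical_value_3(100, 0.05) returns roughly 4.471 and critical_value_3(300, 0.1) roughly −1.478, matching Table 4.3.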

Table 4.3 lists the approximated critical values for different sample sizes n and α.

Table 4.3: Approximate Critical values with α and Sample size n

n/α   0.010     0.025     0.050     0.100
10    20.9849   13.8935   9.1920    5.0242
11    20.5379   13.6583   9.0647    4.9705
12    20.1894   13.4555   8.9428    4.9064
13    19.8868   13.2691   8.8229    4.8350
14    19.6146   13.0948   8.7053    4.7591
15    19.3660   12.9308   8.5906    4.6809
16    19.1370   12.7760   8.4792    4.6018
17    18.9246   12.6295   8.3712    4.5226
18    18.7266   12.4904   8.2667    4.4442
19    18.5412   12.3580   8.1656    4.3667
20    18.3668   12.2319   8.0679    4.2905
25    17.6235   11.6768   7.6245    3.9220
30    17.0314   11.2171   7.2438    3.6122
35    16.5394   10.8248   6.9113    3.3260
40    16.1184   10.4826   6.6165    3.0682
45    15.7506   10.1792   6.3520    2.8340
50    15.4240   9.9066    6.1120    2.6196
60    14.8636   9.4324    5.6902    2.2389
70    14.3937   9.0294    5.3277    1.9085
80    13.9893   8.6787    5.0098    1.6166
90    13.6342   8.3683    4.7267    1.3552
100   13.3178   8.0899    4.4715    1.1184
125   12.6511   7.4978    3.9252    0.6087
150   12.1091   7.0119    3.4736    0.1848
175   11.6525   6.5996    3.0885    -0.1785
200   11.2581   6.2414    2.7527    -0.4963
250   10.6009   5.6409    2.1872    -1.0336
300   10.0655   5.1485    1.7214    -1.4778

Table 4.4: Power Simulation for SN(µ, σ, λ) with n = 100, 150 and 200

n       k       (µ1,σ1,λ1) \ (µ2,σ2,λ2):   (3,3,0)   (4,4,-1)   (5,5,-2)   (3,4,-3)
n=100   k=20    (2,2,1)                    0.328     0.349      0.4762     0.620
        k=50    (2,2,1)                    0.302     0.448      0.446      0.724
        k=70    (2,2,1)                    0.320     0.468      0.490      0.714
n=150   k=75    (2,2,1)                    0.320     0.476      0.502      0.766
        k=100   (2,2,1)                    0.328     0.588      0.604      0.792
n=200   k=20    (2,2,1)                    0.264     0.460      0.501      0.770
        k=100   (2,2,1)                    0.386     0.426      0.602      0.802
        k=140   (2,2,1)                    0.410     0.580      0.603      0.822

4.3.2 Power Simulation

In this section, we conducted simulations 1000 times to investigate the performance of the proposed SIC test statistic. We ran the simulation for sample sizes n = 100, 150, 200 with parameter values (µ1, σ1, λ1) = (2, 2, 1) and (µ2, σ2, λ2) = (3, 3, 0), (4, 4, −1), (5, 5, −2), (3, 4, −3). We observe that as the difference in the parameters increases, the power of the test also increases. For example, for n = 200, k = 100, and (µ1, σ1, λ1) = (2, 2, 1), (µ2, σ2, λ2) = (3, 3, 0), the power is 0.386, while for (µ2, σ2, λ2) = (5, 5, −2) the power is 0.602. The power of the test in the different scenarios is given in Table 4.4.

4.4 Applications to Latin American Emerging Market Stock Returns

In this section, we apply the proposed SIC testing procedure to analyze the stock returns of four Latin American markets, namely the Argentinean, Brazilian, Chilean, and Mexican markets. The stock returns for each of these countries were recorded weekly from October 31, 1995 to October 31, 2000. For each data set, we let Pt be the stock return index value at week t. To guarantee independence in the time series data, we first transform the data into the Rt series as follows.

Rt = (Pt+1 − Pt)/Pt, for t = 1, 2, ..., n. (4.4.1)
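In code the transform is a one-liner; the index values below are hypothetical numbers for illustration, not data from the study:

```python
import numpy as np

# Return-rate transform (4.4.1): R_t = (P_{t+1} - P_t) / P_t,
# turning n index values into n - 1 return rates.
p = np.array([100.0, 102.0, 101.0, 105.0])   # hypothetical P_t values
r = np.diff(p) / p[:-1]
```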

Hsu (1979) proposed several methods to check independence and normality of the transformed data set. Here we use the portmanteau test statistic given by,

Qk = n Σ_{i=1}^{k} ri², (4.4.2)

where ri is the sample autocorrelation coefficient (acf) at lag i and k is the largest lag up to which the autocorrelation function is considered, to test the independence of the Rt series. The portmanteau testing procedure is as follows:

• Use acf() to obtain the graph of the sample autocorrelation function.

• Extract the autocorrelation values r1, ..., rk up to lag k from the acf() output.

• Calculate n Σ_{i=1}^{k} ri² and compare it with χ²(k). If n Σ_{i=1}^{k} ri² < χ²_{1−α}(k), accept H0 of independence.

4.4.1 Argentina Weekly Stock Market

We consider the time series data for the Argentina weekly stock returns, with sample size n = 222, and use (4.4.1) to transform the data into the Rt series. We assume the Rt series data are iid from a skew normal distribution; we check the independence and normality assumptions later. The graph of this data set is given in Figure 4.2. We apply the binary segmentation procedure along with the SIC to test for a possible change point in the data at each stage.
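The stagewise loop can be sketched as follows. To keep the example fast and deterministic we fit normal segments (λ = 0, two parameters each) in place of the skew normal MLE, so the SIC penalties use t = 2 and t = 4; the structure of the recursion, not the segment model, is the point of the sketch.

```python
import numpy as np
from scipy import stats

def _ll(x):
    # Normal segment likelihood as a fast stand-in for the skew
    # normal fit used in the dissertation.
    return stats.norm.logpdf(x, x.mean(), x.std()).sum()

def binary_segmentation(x, offset=0, found=None):
    """Test the stretch with SIC, split at the minimizing k,
    then recurse on both halves (the 'stages' of the text)."""
    if found is None:
        found = []
    n = len(x)
    lo = max(int(np.log(n)), 2)
    if n < 4 * lo:                     # stretch too short to test
        return sorted(found)
    sic_n = -2.0 * _ll(x) + 2.0 * np.log(n)
    ks = np.arange(lo, n - lo)
    sic_k = np.array([-2.0 * (_ll(x[:k]) + _ll(x[k:])) + 4.0 * np.log(n)
                      for k in ks])
    if sic_n > sic_k.min():            # change point detected
        k = int(ks[sic_k.argmin()])
        found.append(offset + k)
        binary_segmentation(x[:k], offset, found)
        binary_segmentation(x[k:], offset + k, found)
    return sorted(found)
```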

Stage 1. We first consider the whole data set, t from 1 to 221, and compute SICt(n) and SICt(k). According to our computations, min_{5 ≤ k ≤ 216} SICt(k) = SICt(102) = −737.6657 < SICt(221) = −721.5246. Using cα in Table 4.3, we still have SICt(221) > SICt(102) + cα. Hence there is a change point at t = 102 for the Rt series. Thus, for the stock returns, the change point occurs at the 103rd position, with a stock return value of 1923.789 on October 17, 1997. This change may have occurred as a result of the 1997 Asian financial crisis.

Stage 2. We consider the subsequences t from 1 to 101 and t from 103 to 221. Our results indicate no further change points in these subsequences. Hence we find only one change point in the Argentina stock return market data set. This change point is shown in Figure 4.2.

Figure 4.2: The graphs of the time series data for the weekly stock returns and return rate Rt for Argentina market with the corresponding change points.

Diagnostic of Independence and Normality

We apply the portmanteau test (4.4.2) to verify the independence of the Rt series data for the Argentinean stock return rate. We obtain the following result:

Q23 = 221 × Σ_{i=1}^{23} ri² = 13.15613 < χ²_{0.95}(23) = 35.172.

Hence we fail to reject H0 and conclude that the Rt series data are independent.
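The diagnostic above can be reproduced as follows; the acf computation and the function name are our own, written to mirror (4.4.2):

```python
import numpy as np
from scipy import stats

def portmanteau(r_series, k=23, alpha=0.05):
    """Q_k = n * sum_{i=1..k} r_i^2 compared with chi^2_{1-alpha}(k);
    True in the last slot means 'do not reject independence'."""
    x = np.asarray(r_series, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    # sample autocorrelations r_1, ..., r_k
    r = np.array([np.sum(xc[:-i] * xc[i:]) / denom
                  for i in range(1, k + 1)])
    q = n * np.sum(r ** 2)
    crit = stats.chi2.ppf(1.0 - alpha, df=k)
    return q, crit, q < crit
```

With k = 23 and α = 0.05 the critical value is χ²_{0.95}(23) ≈ 35.17, the figure quoted above.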

The left panel of Figure 4.3 shows the graph of the acf values of the Rt data, which indicates that the data are uncorrelated. We notice from the normal Q-Q plot in the right panel of Figure 4.3 that the normality assumption is violated. Therefore the skew normal distribution is appropriate for the analysis of this data set.


Figure 4.3: Left: The graph of the acf values of the transformed data Rt ; Right: Test for normality.

4.4.2 Brazilian Stock Return

We apply the binary segmentation method along with the SIC procedure to test for possible change points in the Brazilian stock return market. The data set consists of 263 observations; we first transform the data into the Rt series. We assume the Rt series data are independent and identically distributed from a skew normal distribution. Our computations yield the following results at the various stages:

Stage 1. Consider the sequence t from 1 to 262: SICt(262) = −784.2962 > min_{5 ≤ k ≤ 257} SICt(k) = SICt(88) = −825.5348. Using cα in Table 4.3, we observe that SICt(262) > SICt(88) + cα. Hence there is a change point in the stock return price at the 89th position, with a stock return value of 1278.702 on July 11, 1997, which may have been caused by the Asian financial crisis of July 1997.

Stage 2. Next we consider the subsequence t from 1 to 87: SICt(87) = −362.1493 > min_{4 ≤ k ≤ 83} SICt(k) = SICt(18) = −364.1303. Using cα in Table 4.3, we still have SICt(87) > SICt(18) + c0.1. Thus there is a change point in the stock return at the 19th position, with a stock return value of 615.728 on March 8, 1996. This change may have occurred as a result of Mexico's financial crisis in 1995.

Next, we consider the subsequence t from 89 to 262: SICt(174) = −462.6885 > the minimum, SICt(151) = −469.5582, and we observe that SICt(174) > SICt(151) + c0.025. Hence the change occurs at the 151 + 88 = 239th position in the Rt series. Thus the change point occurs in the stock return at the 240th position, with a return price of 917.296 on June 2, 2000. This change may have been caused by the 2000 dot-com bubble, the collapse of a technology bubble which reached its climax on March 10, 2000.

Stage 3. Furthermore, we consider the four sub-subsequences: t from 1 to 17, t from 19 to 87, t from 89 to 175, and t from 176 to 262. Our results indicate no further change points in the first, second, and fourth of these. However, for t from 89 to 175, there is a change point in the stock return at the 144th position, with a return value of 881.204, which occurred on July 31, 1998. This change was due to the 1998 Russian financial crisis. Therefore, using our SIC test statistic (4.3.5) with nominal level α = 0.05, we detect four change points in the Brazilian stock return data. These changes are located at the 19th, 88th, 144th and 240th positions. The graphs of the Brazilian stock return and return rate Rt with the corresponding change points are shown in Figure 4.4.


Figure 4.4: The graphs of the time series data for the weekly return rate Rt and stock returns for the Brazil market with the corresponding change points.

Diagnostic of Independence and Normality

We apply the portmanteau test (4.4.2) to check for independence in the Brazil Rt series as follows:

Q24 = 262 × Σ_{i=1}^{24} ri² = 262 × 0.081383 = 21.32235 < χ²_{0.95}(24) = 36.415.

Hence, we fail to reject H0, which implies the Rt time series data are independent. The left panel of Figure 4.5 shows the acf values of the data, which indicates that the observations are uncorrelated. The right panel of Figure 4.5 shows the normal Q-Q plot, which shows that the normality assumption is not valid. Therefore the skew normal distribution is appropriate for the analysis of this data set.


Figure 4.5: Left: The acf plot of Brazil Rt series data; Right: Test for Normality.

4.4.3 Chile Stock Return Market

We apply our SIC testing procedure to test for possible change points in the stock return market for Chile. This data set consists of 262 observations. We first transform the data into the Rt series and assume that the Rt series data are iid from a skew normal distribution with parameters (µ, σ, λ). Using the binary segmentation procedure and the SIC method, we were able to detect the change points in the Rt time series data. Our computations yield the following results:

Stage 1. Consider the whole data set, t from 1 to 261: SIC(n) = SICt(261) = −1016.994 > min_{5 ≤ k ≤ 256} SICt(k) = SICt(112) = −1051.2. If we use cα from Table 4.3, we still have SICt(261) > SICt(112) + cα. Hence there is a change point in the Rt series at the 112th position. Thus, for the stock return market, the change occurs at the t + 1 = 113th position, with a stock return value of 736.133 on December 26, 1997. This change was due to the 1997 Asian financial crisis, which reached its climax in the October 27, 1997 mini-crash.

Stage 2. Consider the subsequence t from 1 to 111: SICt(111) = −528.1872 > min_{4 ≤ k ≤ 107} SICt(k) = SICt(100) = −530.6044. Using cα from Table 4.3 with α = 0.1, we have SICt(111) > SICt(100) + c0.1. Therefore, there is a change point at the 100th position, which corresponds to the 101st position in the stock return data, with a return value of 935.116 on October 3, 1997. This change may have occurred as a result of the Asian financial crisis.

Stage 3. We continue to test the subsequence t from 113 to 261, and our computation indicates that there is a change point at t = 169, which corresponds to the 170th position in the stock return market. The stock return value at the 170th position was 519.747, which occurred on January 29, 1999. For t from 170 to 261, there is a change point at the 180th position of the Rt series. This corresponds to the 181st position in the stock return, with a return price of 687.862 on August 16, 1999. These changes were due to the 1998 Russian financial crisis, which led to the devaluation of the ruble and the Russian government's suspension of payments to foreign creditors. Moreover, we continue the testing process, and our computational results indicate no further change in the subsequence t from 113 to 168. Using the SIC testing procedure in (4.3.5) with nominal level α = 0.1, we detect four change points, at the 101st, 112th, 170th and 181st positions, for the Chilean stock return market. These change points are shown in Figure 4.6. However, if we use nominal level α = 0.05, we can detect only two change points for this data set, at positions 112 and 170.


Figure 4.6: The graphs of the time series data for the weekly return rate Rt and stock returns for the Chile market with the corresponding change points.

Diagnostic of Independence and Normality for Chile Rt Series

Figure 4.7 shows the graph of the acf values of the Rt series data and the normal Q-Q plot, respectively. From the left panel we notice that the data are uncorrelated, and from the right panel, that the normality assumption fails. Using the portmanteau test, we have

Q24 = 261 × Σ_{i=1}^{24} ri² = 31.63503 < χ²_{0.95}(24) = 36.415.

Hence the Rt series data are independent. Therefore the skew normal distribution is valid for the analysis of this data set.

Figure 4.7: Left: Graph of the acf of the Chile Rt series; Right: Test for normality.

4.4.4 Mexico Stock Return Market

The Mexico stock return data set consists of 262 observations. We first transform the data set into the Rt series and assume they are iid from a skew normal distribution with parameters (µ, σ, λ). We apply the binary segmentation procedure and the SIC test statistic to test for all possible change points and their corresponding locations in this data set. Based on our computations, we obtain the following results at each stage.

Stage 1. Consider the whole data set, t from 1 to 261: SICt(n) = SICt(261) = −817.8252 > min_{5 ≤ k ≤ 256} SICt(k) = SICt(94) = −839.7652. If we use the value of cα in Table 4.3, we still have SICt(261) > SICt(94) + cα. Hence there is a change point in the Rt series data at the 94th position. This change occurs in the stock return market at the 95th position, with a stock return value of 1284.851 on August 22, 1997. Thus we see that the change began at the 96th position, with a stock return value of 1188.216 on August 29, 1997. The change may have been caused by the 1997 mini-crash in the stock market, which occurred on October 27, 1997.

Stage 2. Next we consider the two subsequences, t from 1 to 93 and t from 95 to 261. Our results indicate that there is no further change in the data. The graphs of the stock return and the return rate of the Mexican market with the corresponding change point are given in Figure 4.8.


Figure 4.8: The graphs of the time series data for the weekly return rate Rt and stock returns for Mexico market with the corresponding change point.

Independence and Normality Check for Mexico Rt Series.

Using the portmanteau test along with the normal Q-Q plot, we check the independence and normality assumptions. Figure 4.9 shows the graphs of the acf of the transformed data Rt and the normal Q-Q plot. From Figure 4.9, we observe that the Rt data are uncorrelated and that the normality assumption is violated. With the portmanteau test, we have

Q24 = 261 × Σ_{i=1}^{24} ri² = 23.74004 < χ²_{0.95}(24) = 36.415.

So the Rt series for the Mexican market is independent. Hence the skew normal distribution is appropriate for the analysis.


Figure 4.9: The ACF of Mexico stock return rate Rt and Q-Q plot to test for normality assumption.

4.5 Conclusion

In this chapter, we address the change point problem for the general skew normal distribution using the Schwarz information criterion (SIC) testing procedure. First, we propose an SIC-based testing procedure for a simultaneous change point of the location and scale parameters under the assumption that the shape parameter is fixed but unknown. A simulation study shows that the power of the testing procedure is within an acceptable range. We apply our testing procedure to analyze a biomedical data set, and our result matches preexisting results by other authors. Secondly, we apply the SIC testing procedure to test for simultaneous changes in the location, scale and shape parameters. We apply this procedure to analyze four Latin American markets. Our results indicate that the skew normal distribution is reasonable for the analysis, and simulations indicate that our test statistic has good power. Here we only consider the SIC method to study the change point problem for this distribution family. In future work we will consider two alternative methods, the likelihood ratio test and the Bayesian method, to detect change points for this distribution family.

BIBLIOGRAPHY

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. Second International Symposium on Information Theory, Petrov, B. N. and Csaki, E. (Eds.), Akademiai Kiado, Budapest, 267–281.

Arnold, B., R. Beaver, R. Groeneveld, and W. Meeker (1993). The nontruncated marginal of a truncated bivariate normal distribution. Psychometrika 58 (3), 471– 488.

Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 171–178.

Azzalini, A. (1986). Further results on a class of distributions which includes the normal ones. Statistica 46, 199–208.

Azzalini, A. (2005). The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics 32 (2), 159–188.

Azzalini, A. and A. Dalla Valle (1996). The multivariate skew-normal distribution. Biometrika 83 (4), 715–726.

Bayes, C. L. and M. Branco (2007). Bayesian inference for the skewness parameter of the skew-normal distribution. Brazilian Journal of Probability and Statistics 21 (2), 141–163.

Chen, J. and A. K. Gupta (1997). Testing and locating variance changepoints with application to stock prices. Journal of the American Statistical Association 92 (438), 739–747.

Chen, J. and A. K. Gupta (2012). Parametric statistical change point analysis. Birkhauser Boston.

Chen, J. T. and A. K. Gupta (2005). Matrix variate skew normal distributions. Statistics 39 (3), 247–253.

Chen, J. T., A. K. Gupta, and C. G. Troskie (2003). The distribution of stock returns when the market is up. Communications in Statistics - Theory and Methods 32 (8), 1541–1558.

Chernoff, H. and S. Zacks (1964). Estimating the current mean of a normal distribution which is subjected to changes in time. The Annals of Mathematical Statistics 35 (3), 999–1018.

Cook, R. D. and S. Weisberg (1994). An Introduction to Regression Graphics. John Wiley & Sons.

Csörgő, M. and L. Horváth (1997). Limit Theorems in Change-Point Analysis. John Wiley & Sons, New York.

Dalla Valle, A. (2007). A test for the hypothesis of skew-normality in a population. Journal of Statistical Computation and Simulation 77, 63–77.

Davis, W. W. (1979). Robust methods for detection of shifts of the innovation variance of a time series. Technometrics 21 (3), 313–320.

Figueiredo, C. C., H. Bolfarine, M. C. Sandoval, and C. Lima (2010). On the skew-normal calibration model. Journal of Applied Statistics 37 (3), 435–451.

Guolo, A. (2013). Flexibly modeling the baseline in meta-analysis. Statistics in Medicine 32 (1), 40–50.

Gupta, A. K. and J. T. Chen (2001). Goodness-of-fit tests for the skew-normal distribution. Communications in Statistics - Simulation and Computation 30 (4), 907–930.

Gupta, A. K. and J. T. Chen (2004). A class of multivariate skew-normal models. Annals of the Institute of Statistical Mathematics 56 (2), 305–315.

Gupta, A. K., T. T. Nguyen, and J. A. T. Sanqui (2004). Characterization of the skew-normal distribution. Annals of the Institute of Statistical Mathematics 56 (2), 351–360.

Gupta, R. C. and N. Brown (2001). Reliability studies of the skew-normal distribution and its application to a strength-stress model. Communications in Statistics - Theory and Methods 30 (11), 2427–2445.

Gurevich, G. and A. Vexler (2005). Change point problems in the model of logistic regression. Journal of Statistical Planning and Inference 131 (2), 313–331.

Harrar, S. W. and A. K. Gupta (2008). On matrix variate skew normal distributions. Statistics 42 (2), 179–194.

Hawkins, D. M. (1977). Testing a sequence of observations for a shift in location. Journal of the American Statistical Association 72 (357), 180–186.

Henze, N. (1986). A probabilistic representation of the ‘skew-normal’ distribution. Scandinavian Journal of Statistics 13, 271–275.

Hill, I. D. (1978). Remark AS R26: A remark on Algorithm AS 76: an integral useful in calculating non-central t and bivariate normal probabilities. Applied Statistics, 379.

Horváth, L., M. Hušková, P. Kokoszka, and J. Steinebach (2004). Monitoring changes in linear models. Journal of Statistical Planning and Inference 126 (1), 225–251.

Hsu, D. A. (1977). Tests for variance shift at an unknown time point. Applied Statistics, 279–284.

Hsu, D. A. (1979). Detecting shifts of parameter in gamma sequences with applica- tions to stock price and air traffic flow analysis. Journal of the American Statistical Association 74 (365), 31–40.

Inclan, C. (1993). Detection of multiple changes of variance using posterior odds. Journal of Business & Economic Statistics 11 (3), 289–300.

Jarušková, D. (2007). Maximum log-likelihood ratio test for a change in three-parameter Weibull distribution. Journal of Statistical Planning and Inference 137 (6), 1805–1815.

Kim, H.-J. and D. Siegmund (1989). The likelihood ratio test for a change-point in simple linear regression. Biometrika 76 (3), 409–423.

Mateu-Figueras, G., P. Puig, and A. Pewsey (2007). Goodness-of-fit tests for the skew-normal distribution when the parameters are estimated from the data. Communications in Statistics - Theory and Methods 36 (9), 1735–1755.

Meintanis, S. G. (2010). Testing skew normality via the moment generating function. Mathematical Methods of Statistics 19 (1), 64–72.

Ning, W. (2012). Empirical likelihood ratio test for a mean change point model with a linear trend followed by an abrupt change. Journal of Applied Statistics 39 (5), 947–961.

Ning, W. and A. K. Gupta (2009). Change point analysis for generalized lambda distribution. Communications in Statistics - Simulation and Computation 38 (9), 1789–1802.

Ning, W. and A. K. Gupta (2012). Matrix variate extended skew normal distributions. Random Operators and Stochastic Equations 20, 299–310.

Ning, W., J. Pailden, and A. K. Gupta (2012). Empirical likelihood ratio test for the epidemic change model. Journal of Data Science 10, 107–127.

Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75 (2), 237–249.

Owen, A. B. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics 18 (1), 90–120.

Owen, A. B. (2001). Empirical likelihood, Volume 92. Chapman & Hall/CRC.

Owen, D. B. (1956). Tables for computing bivariate normal probabilities. Annals of Mathematical Statistics 27, 1075–1090.

Page, E. S. (1954). Continuous inspection schemes. Biometrika 41 (1/2), 100–115.

Page, E. S. (1955). A test for a change in a parameter occurring at an unknown point. Biometrika 42 (3/4), 523–527.

Pewsey, A. (2000). Problems of inference for Azzalini's skew-normal distribution. Journal of Applied Statistics 27 (7), 859–870.

Roberts, H. V. (1988). Data Analysis for Managers with Minitab. Scientific Press: Redwood City, CA.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statis- tics 6 (2), 461–464.

Sen, A. and M. S. Srivastava (1975a). Some one-sided tests for change in level. Technometrics 17 (1), 61–64.

Sen, A. and M. S. Srivastava (1975b). On tests for detecting change in mean when variance is unknown. Annals of the Institute of Statistical Mathematics 27 (1), 479–486.

Snijders, A. M., N. Nowak, R. Segraves, S. Blackwood, N. Brown, J. Conroy, G. Hamilton, A. K. Hindle, B. Huey, K. Kimura, et al. (2001). Assembly of microarrays for genome-wide measurement of DNA copy number. Nature Genetics 29 (3), 263–264.

Stephens, M. A. (1986). Tests based on EDF statistics. In R. B. D'Agostino and M. A. Stephens (Eds.), Goodness-of-Fit Techniques. Marcel Dekker, New York and Basel.

Thomas, G. E. (1979). Remark AS R30: A remark on Algorithm AS 76: An integral useful in calculating non-central t and bivariate normal probabilities. Applied Statistics 28, 113.

Vasicek, O. (1976). A test for normality based on sample entropy. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 54–59.

Vexler, A. and G. Gurevich (2010). Empirical likelihood ratios applied to goodness- of-fit tests based on sample entropy. Computational Statistics & Data Analy- sis 54 (2), 531–545.

Vexler, A., G. Shan, S. Kim, W. M. Tsai, L. Tian, and A. D. Hutson (2011). An empirical likelihood ratio based goodness-of-fit test for inverse Gaussian distributions. Journal of Statistical Planning and Inference 141 (6), 2128–2140.

Vexler, A., C. Wu, A. Liu, B. W. Whitcomb, and E. F. Schisterman (2009). An extension of a change-point problem. Statistics 43 (3), 213–225.

Vostrikova, L. J. (1981). Detecting "disorder" in multidimensional random processes. Soviet Mathematics Doklady 24, 55–59.

Worsley, K. J. (1979). On the likelihood ratio test for a shift in location of normal populations. Journal of the American Statistical Association 74 (366a), 365–367.

Young, J. C. and C. E. Minder (1974). Algorithm AS 76: An integral useful in calculating non-central t and bivariate normal probabilities. Journal of the Royal Statistical Society. Series C (Applied Statistics) 23 (3), 455–457.