
A CLASS OF MULTIVARIATE SKEW DISTRIBUTIONS: PROPERTIES AND INFERENTIAL ISSUES

Deniz Akdemir

A Dissertation

Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

May 2009

Committee:

Arjun K. Gupta, Advisor

James M. McFillen, Graduate Faculty Representative

John T. Chen

Hanfeng Chen

ABSTRACT

Arjun K. Gupta, Advisor

Flexible parametric distribution models that can represent both skewed and symmetric distributions, namely skew symmetric distributions, can be constructed by skewing symmetric densities with weighting distributions. In this dissertation, we study a multivariate skew family that has either a centrally symmetric or a spherically symmetric kernel. Specifically, we define multivariate skew symmetric forms of the uniform, normal, Laplace, and logistic distributions by using the cdf's of these same distributions as weighting distributions. Matrix variate extensions of these distributions are also introduced herein. To bypass the unbounded likelihood problem related to inference about this model, we propose an estimation procedure based on the maximum product of spacings method. This idea also leads to bounded criteria that can be considered as alternatives to Akaike's and other likelihood based criteria when the unbounded likelihood may be a problem. Applications of skew symmetric distributions are also considered.

AMS 2000 subject classification: Primary 60E05, Secondary 62E10

Key words and phrases: Multivariate Skew-Symmetric Distribution, Matrix Variate Skew-Symmetric Distribution, Inference, Maximum Product of Spacings, Unbounded Likelihood, Model Selection Criterion

To little Pelin

ACKNOWLEDGMENTS

This is a great opportunity to express my respect and thanks to Prof. Arjun K. Gupta for his continuous and invaluable input for this work. It was a pleasure to be his student. I am also grateful to my wife, my parents and my children.

Table of Contents

CHAPTER 1: Notation, Introduction and Preliminaries
1.1 Notation
1.2 Preliminaries
1.2.1 Vectors and Matrices
1.2.2 Random Vectors and Matrices
1.2.3 Univariate, Multivariate, Matrix Variate Normal Distribution
1.2.4 Univariate Normal Distribution
1.2.5 Multivariate Normal Distribution
1.2.6 Matrix Variate Normal Distribution
1.2.7 Symmetric Distributions
1.2.8 Centrally Symmetric Distributions
1.2.9 Spherically Symmetric Distributions
1.2.10 Location-Scale Families from Spherically Symmetric Distributions

CHAPTER 2: A Class of Multivariate Skew Symmetric Distributions
2.1 Background: Multivariate Skew Normal Distributions
2.2 Skew-Centrally Symmetric Densities
2.2.1 Independent Observations, Location-Scale Family
2.3 Multivariate Skew-Normal Distribution
2.3.1 Motivation: Skew-Spherically Symmetric Densities
2.3.2 Multivariate Skew-Normal Distribution
2.3.3 Generalized Multivariate Skew-Normal Distribution
2.3.4 Stochastic Representation, Random Number Generation
2.4 Extensions: Some Other Skew Symmetric Forms

CHAPTER 3: A Class of Matrix Variate Skew Symmetric Distributions
3.1 Distribution of Random Sample
3.2 Matrix Variable Skew Symmetric Distribution
3.3 Matrix Variate Skew-Normal Distribution
3.4 Generalized Matrix Variate Skew-Normal Distribution
3.5 Extensions

CHAPTER 4: Estimation and Some Applications
4.1 Maximum products of spacings (MPS) estimation
4.2 A Bounded Information Criterion for Model Selection
4.3 Estimation of parameters of sn_k(α, µ, Σ)
4.3.1 Case 1: MPS estimation of sn_1(α, 0, 1)
4.3.2 Case 2: Estimation of µ, Σ_c^{1/2} given α
4.3.3 Case 3: Simultaneous estimation of µ, Σ_c^{1/2} and α

BIBLIOGRAPHY

List of Figures

1.1 Centrally symmetric t distribution
2.1 Surface of skew uniform density in two dimensions
2.2 Surface of skew uniform density in two dimensions
2.3 Surface of skew uniform density in two dimensions
2.4 The contours for the 2-dimensional skew normal pdf
2.5 The surface of the 2-dimensional pdf for a skew normal variable
2.6 The surface of the 2-dimensional pdf for a skew Laplace variable
2.7 The surface of the 2-dimensional pdf for a skew-normal variable
2.8 The surface of the 2-dimensional pdf for a skew-normal variable
2.9 The surface of the 2-dimensional pdf for a skew-normal variable
2.10 The contours of the 2-dimensional pdf for a skew-normal variable
2.11 The surface of the 2-dimensional pdf for a skew-normal variable
2.12 The contours of the 2-dimensional pdf for a skew-normal variable
4.1 Frontier data
4.2 Histogram of 5000 simulated values of α by MPS
4.3 1150 height measurements
4.4 The scatter plot for 100 generated observations
4.5 The scatter plot for LBM and BMI in Australian Athletes Data
4.6 The scatter plot for LBM and SSF in Australian Athletes Data

List of Tables

4.1 MPSIC, MPSIC2, and MPSIC3 are compared
4.2 MPS estimation of α
4.3 ML estimation of α
4.4 Estimation of α, µ and σ
4.5 The simulation results about µ and α
4.6 The simulation results about Σ_c^{1/2}

CHAPTER 1

Notation, Introduction and Preliminaries

1.1 Notation

We denote matrices by capital letters, vectors by small bold letters, scalars by small letters.

We usually use letters from the beginning of the alphabet, and sometimes Greek letters, to denote constants; for random variables we choose letters from the end of the alphabet. We do not follow the convention of making a notational distinction between a random variable and its values. The following notation is used throughout this dissertation:

R^k: k-dimensional real space.

(A)_i.: the ith row vector of the matrix A.

(A)_.j: the jth column vector of the matrix A.

(A)_ij: the element located at the ith row and jth column of matrix A.

A': transpose of matrix A.

A^+: Moore-Penrose inverse of matrix A.

|A|: determinant of square matrix A.

tr(A): trace of square matrix A.

etr(A): exp(tr(A)) for square matrix A.

A^{1/2}: symmetric square root of symmetric matrix A.

A_c^{1/2}: square root of positive definite matrix A by Cholesky decomposition.

I_k: k-dimensional identity matrix.

S^k: the surface of the unit sphere in R^k.

0_{k×n}: k × n dimensional matrix of zeros.

e_j (or sometimes c_j): a vector whose elements are all zero except for a single 1 at its jth row.

1_k: k-dimensional vector of ones.

0_k: k-dimensional vector of zeros.

x ~ F: x is distributed as F.

x =^d y: x and y have the same distribution.

N_k(µ, Σ^{1/2}): k-dimensional normal distribution with mean µ and covariance matrix Σ.

φ_k(x, µ, Σ): density of the N_k(µ, Σ^{1/2}) distribution evaluated at x.

Φ(x, µ, σ): cumulative distribution function of the N_1(µ, σ) distribution evaluated at x.

φ_k(x): density of the N_k(0_k, I_k) distribution evaluated at x.

Φ(x): cumulative distribution function of the N_1(0, 1) distribution evaluated at x.

Φ_k(x, µ, Σ): cumulative distribution function of the N_k(µ, Σ^{1/2}) distribution evaluated at x.

Φ_k(x): cumulative distribution function of the N_k(0_k, I_k) distribution evaluated at x.

χ²_k: central χ² distribution with k degrees of freedom.

χ²_k(δ): non-central χ² distribution with k degrees of freedom and non-centrality parameter δ.

W_k(n, Σ): Wishart distribution with parameters n and Σ.

1.2 Preliminaries

In this section, we provide certain definitions and results involving vectors and matrices as well as random vectors and matrices. Most of these results are stated here without proof; proofs can be obtained from ([4], [25]) or from most other textbooks on matrix algebra and multivariate statistics.

1.2.1 Vectors and Matrices

A k-dimensional real vector a is an ordered array of real numbers a_i, i = 1, 2, ..., k, organized in a single column, written as a = (a_i). The set of all k-dimensional real vectors is denoted by R^k. A real matrix A of dimension k × n is an ordered rectangular array of real numbers a_ij arranged in rows i = 1, 2, ..., k and columns j = 1, 2, ..., n, written as A = (a_ij).

Two matrices A and B of the same dimension are equal if a_ij = b_ij for i = 1, 2, ..., k and j = 1, 2, ..., n. If all a_ij = 0, we write A = 0. A vector of dimension k that has all zero components except for a single component equal to 1 at its ith row is called the ith elementary vector of R^k and is denoted by e_i^(k); we drop the superscript k when no confusion is possible. For two k × n matrices A and B, the sum is defined as A + B = (a_ij + b_ij). For a real number c, scalar multiplication is defined as cA = (c a_ij). The product of two matrices A of dimension k × n and B of dimension n × m is defined as AB = (c_ij), where c_ij = Σ_{ℓ=1}^{n} a_iℓ b_ℓj.

A matrix is called a square matrix of order k if its numbers of rows and columns are both equal to k. The transpose of a matrix A is obtained by interchanging the rows and columns of A and is represented by A'. When A = A', we say A is symmetric. A k × k symmetric matrix A for which a_ij = 0, i ≠ j, for i, j = 1, 2, ..., k is called a diagonal matrix of order k and is represented by A = diag(a_11, a_22, ..., a_kk); in this case, if a_ii = 1 for all i, then we denote it by I_k and call it the identity matrix. A square matrix A of order k is orthogonal if A'A = AA' = I_k.

A symmetric matrix A of order k is positive definite, denoted by A > 0, if a'Aa > 0 for all nonzero vectors a ∈ R^k. A symmetric matrix A of order k is positive semi-definite, denoted by A ≥ 0, if a'Aa ≥ 0 for all vectors a ∈ R^k.

Given a square matrix A of order k, the equation Ax = λx can be solved for k pairs (λ, x); the first element of a pair of solutions is called an eigenvalue, and the second element is called a corresponding eigenvector. Eigenvalues and eigenvectors of a matrix are in general complex, but a real symmetric matrix always has real eigenvalues and eigenvectors. The additional assumption of positive definiteness requires positive eigenvalues; the assumption of nonnegative definiteness requires nonnegative eigenvalues. The number of nonzero eigenvalues of a square matrix A is the rank of A and is written as rk(A). We can define the rank of a k × n matrix A as rk(A) = rk(AA') or, equivalently, as rk(A) = rk(A'A); if rk(A) = min(k, n), then A is called a full rank matrix.

The determinant of a square matrix A of order k is defined as the product of its eigenvalues and is written as |A|. If |A| ≠ 0, A is called nonsingular, and we can calculate the unique matrix B such that AB = BA = I_k; B is called the inverse of A and is written as A^{-1}. A positive definite matrix is nonsingular and has positive determinant. The trace of a square matrix A of order k is defined as the sum of its eigenvalues and is written as tr(A).

A symmetric matrix A of order k can be written as A = GDG', where G is an orthogonal matrix of order k and D is a diagonal matrix of order k. The symmetric square root matrix A^{1/2} is defined as A^{1/2} = GD^{1/2}G', where D^{1/2} is the diagonal matrix whose diagonal elements are the square roots of the elements of D. If A is also positive definite, there exists a unique lower triangular matrix L with positive diagonal elements such that A = LL'. This is called the Cholesky decomposition of A. In the following, we denote the square root of A obtained through the Cholesky decomposition by A_c^{1/2}.

Suppose that A is a symmetric matrix of order k and rank r ≤ k. In this case, the pseudo inverse of A can be written as A^- = GD^-G', where D^- is the diagonal matrix of order r whose diagonal elements are the inverses of the nonzero eigenvalues of A, and G is a k × r matrix whose jth column is the normalized eigenvector corresponding to the jth diagonal element of D^-; A and A^- satisfy AA^-A = A and A^-AA^- = A^-. Let B be a k × n matrix of rank r, r ≤ k ≤ n. The unique Moore-Penrose inverse of B is defined as B^+ = B'(BB')^-. A symmetric matrix A of order k is idempotent if it satisfies AA' = A'A = A. BB^+ and B^+B are idempotent matrices.

If A is idempotent of order k and rank q, then it can be written as A = G diag(I_q, 0) G', where G is an orthogonal matrix of order k and 0 denotes a zero matrix of appropriate dimensions. If A_1, A_2, ..., A_m are each idempotent of order k with ranks r_1, r_2, ..., r_m such that Σ_{j=1}^{m} r_j = k, and if A_i A_j = 0 for i ≠ j, then we can find an orthogonal matrix G of order k such that A_1 = G diag(I_{r_1}, 0, ..., 0) G', A_2 = G diag(0, I_{r_2}, 0, ..., 0) G', ..., A_m = G diag(0, ..., 0, I_{r_m}) G', where the 0's denote zero matrices of appropriate dimensions.
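As a concrete illustration (ours, not part of the original text), a minimal NumPy sketch of the two matrix square roots used throughout, the symmetric square root A^{1/2} and the Cholesky square root A_c^{1/2}:

import numpy as np

def symmetric_sqrt(A):
    # A = G D G' (spectral decomposition); the symmetric square root is G D^{1/2} G'.
    eigvals, G = np.linalg.eigh(A)
    return G @ np.diag(np.sqrt(eigvals)) @ G.T

def cholesky_sqrt(A):
    # Lower triangular L with positive diagonal such that A = L L'.
    return np.linalg.cholesky(A)

A = np.array([[2.0, 0.5], [0.5, 1.0]])
S, L = symmetric_sqrt(A), cholesky_sqrt(A)
print(np.allclose(S @ S, A), np.allclose(L @ L.T, A))  # True True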

1.2.2 Random Vectors and Matrices

k random variables x_1, x_2, ..., x_k observable on a unit are represented by a k-dimensional column vector x, which is called a multivariate random variable. The matrix variate random variable obtained by considering n sampling units selected into a random sample and observed on k variables is represented by a k × n matrix, say X.

The distribution of a random variable x can be characterized by its cumulative distribution function (cdf) F_x(.), defined as F_x(c) = P(x ≤ c). Given a cdf F_x(.), if there exists a nonnegative function f(.) such that F_x(x) = ∫_{−∞}^{x} f(t) dt for every x ∈ R, we say that the random variable x with cdf F_x(.) has probability density function (pdf) dF(x) = f(x). We can also represent a distribution using the moment generating function (mgf) of the random variable, defined as M_x(t) = E(e^{tx}), if it exists for all t in some neighborhood of 0. The mgf of a random variable does not always exist, but the characteristic function (cf) always exists; it is defined as Ψ_x(t) = E(e^{itx}).

More generally, the distribution of a random variable x in R^k can be characterized by its joint cumulative distribution function (jcdf) F_x(.). For c in R^k, it is defined as F_x(c) = P(x_1 ≤ c_1, x_2 ≤ c_2, ..., x_k ≤ c_k). Given a jcdf F_x(.), if there exists a nonnegative function f(.) such that F_x(x) = ∫_{−∞}^{x_1} ∫_{−∞}^{x_2} ... ∫_{−∞}^{x_k} f(t_1, t_2, ..., t_k) dt_1 dt_2 ... dt_k for every x ∈ R^k, we say that the random vector x with jcdf F_x(.) has joint probability density function (jpdf) dF(x) = f(x). The joint moment generating function (jmgf) of the random variable is defined as M_x(t) = E(e^{t'x}), if it exists for all t in some neighborhood of 0_k in R^k. The joint characteristic function (jcf) is defined as Ψ_x(t) = E(e^{it'x}) for t ∈ R^k. We write x ∼ F(x), x ∼ f(x), x ∼ M_x(t), or x ∼ Ψ_x(t) equivalently to say that a random vector has a certain distribution. When we say that two random vectors x and y have the same distribution, we refer to the equivalence of either the jcdf's, jpdf's, jmgf's, or jcf's of these variables; in this case we write x =^d y.

Let x be a k-dimensional random variable. Suppose we know any one of the following: x ∼ F(x), x ∼ f(x), x ∼ M_x(t), or x ∼ Ψ_x(t). Let k denote the set of indices of the random variables in x, and let m be a subset of k with complement m^c. The marginal distribution of the m-dimensional vector of random variables with indices in m, say x^(m), can be found in one of the following ways: x^(m) ∼ F(x)|_{x^(m^c) = ∞}, x^(m) ∼ f(x^(m)) = ∫...∫_{R^{k−m}} f(x) dx^(m^c), x^(m) ∼ M_x(t)|_{t^(m^c) = 0}, or x^(m) ∼ Ψ_x(t)|_{t^(m^c) = 0}, where x^(m^c) is the corresponding sub-vector of x and, similarly, t^(m^c) denotes the sub-vector of t corresponding to indices in m^c. If the random variable x has a jpdf, the conditional jpdf of x^(m) given x^(m^c) is f(x^(m) | x^(m^c)) = f(x)/f(x^(m^c)). When x does not have a density, the distribution of x^(m) | x^(m^c) is still defined from the definition of conditional probability. Two random vectors x and y are said to be independent if F(x, y) = G(x)H(y), where F(x, y), G(x) and H(y) are the jcdf's of (x, y), x, and y correspondingly. If they have densities f(x, y), g(x), and h(y), then x and y are independent if and only if f(x, y) = g(x)h(y).

The expectation of a random variable x with cdf F(x) is defined by E(x) = ∫_R x dF(x). For r = 1, 2, ..., µ_a^r = ∫_R (x − a)^r dF(x) is called the moment of order r about a. In general, the expected value of an arbitrary function g(x) of x with respect to the probability density function dF(x) is given by E(g(x)) = ∫_R g(x) dF(x). We can write E(x) = µ_0^1 for the mean. The moment of order 2 about E(x) is called the variance of the random variable x, and we write it as var(x). Let x = (x_1, x_2, ..., x_k)' be a k-dimensional random vector. If every element x_j of x has finite expectation µ_j, we write E(x) = (E(x_1), E(x_2), ..., E(x_k))' = µ = (µ_1, µ_2, ..., µ_k)' for the mean vector. A k-variable moment µ_a^{r_1 r_2 ... r_k} about an origin a = (a_1, a_2, ..., a_k)' is defined as µ_a^{r_1 r_2 ... r_k} = E((x_1 − a_1)^{r_1}(x_2 − a_2)^{r_2} ... (x_k − a_k)^{r_k}) when it exists. The expectation of an arbitrary function g(x) of x is defined as E(g(x)) = ∫_{R^k} g(x) dF(x).

The covariance matrix Σ of a k-dimensional random vector x is the expectation of the k × k matrix (x − µ)(x − µ)'. Obviously, Σ is symmetric and positive semi-definite; in fact, the class of covariance matrices coincides with the class of positive semi-definite matrices. The following is very useful: if x is a k-dimensional random vector with mean vector µ and covariance matrix Σ, then the variable y = Ax + ξ, for A an m × k matrix and ξ ∈ R^m, has mean vector Aµ + ξ and covariance matrix AΣA'.

Assume x ∈ χ is a k-dimensional random vector described by its jcdf F(x). Let h(x) be a measurable function from χ to Υ ⊂ R^m. Then y = h(x) is an m-dimensional random vector with its distribution described by H(y) = ∫_Θ dF(x), where Θ = {x : h(x) ≤ y}. The jcf of y can be obtained as Ψ_y(t) = ∫_{R^k} e^{it'h(x)} dF(x) for t ∈ R^m. Similarly, when it exists, the jmgf of y is given by M_y(t) = ∫_{R^k} e^{t'h(x)} dF(x) for t ∈ R^m.

Let x ∈ χ be a k-dimensional random vector having density function f(x). Let h(x) be an invertible function from χ to Υ ⊂ R^m such that h^{-1} has continuous partial derivatives in Υ. Then y = h(x) is an m-dimensional random vector with density function f(h^{-1}(y)) |J(y)|, where |J(y)| is the Jacobian of the transformation h^{-1}: the absolute value of the determinant of the k × k matrix whose ijth component is the partial derivative of the ith component of h^{-1}(y) with respect to the jth component of y. For example, the Jacobian of the linear transformation y = Ax + µ, for A a nonsingular matrix of order k and a constant vector µ ∈ R^k, is |A|^{-1}. Also, for the k × n matrix variate random variable X, the Jacobian of the transformation Y = AXB + M, for nonsingular matrices A and B of orders k and n and a k × n constant matrix M, is |A|^{-n}|B|^{-k}. We also have the following useful results for transformations of variables: if x ∼ M_x(t), then y = Ax + µ ∼ e^{µ't} M_x(A't) for any m × k matrix A and µ ∈ R^m; if X ∼ M_X(T), then Y = AXB + M ∼ etr(M'T) M_X(A'TB') for any m × k matrix A, n × l matrix B and m × l matrix M.

We define a model to be a k-dimensional random vector x taking values in a set χ ⊂ R^k and having jcdf F(x; θ), where θ is a p-dimensional vector of parameters taking values in Θ ⊂ R^p. χ is called the sample space, and Θ is called the parameter space. In parametric statistical inference, we assume that the sample space, the parameter space, and the jcdf F(x; θ) are all known. Usually we refer to a model family in one of the following ways:

F = [χ, F_x(x; θ), Θ] or F = [χ, f_x(x; θ), Θ]. We try to make some inference about θ after we observe a sample of observations, say X.

A group of transformations G is a collection of transformations from χ to χ that is closed under inversion and under composition. The model F = [χ, f_x(x; θ), Θ] is invariant with respect to G when x ∼ f_x(x; θ) implies g(x) ∼ f_{g(x)}(g(x); g*(θ)), where g ∈ G and G* is a corresponding group of transformations from Θ to Θ. The family of normal distributions, and more generally the family of elliptical distributions, is closed under nonsingular linear transformations.

Invariance under a group of transformations G can be useful for reducing the dimension of the data, since inference about θ should not depend on whether the simple random sample X = (x_1, x_2, ..., x_n) is observed or g(X) = (g(x_1), g(x_2), ..., g(x_n)) is observed for g ∈ G. Suppose that F is invariant with respect to G and that the statistic T = T(X) is an estimator of θ; that is, T is a function that maps χ^n to Θ. Then T is an equivariant estimator if T(g(X)) = g*(T(X)) for all g ∈ G, g* ∈ G*, and all X ∈ χ^n. It can also be shown that if F is invariant with respect to G and the mle is unique, then the mle of θ is equivariant; if the mle is not unique, then the mle can be chosen to be equivariant. A statistic T(X) is invariant under a group of transformations G if T(g(X)) = T(X) for all g ∈ G and all X ∈ χ^n. A function M(X) is maximal invariant under a group of transformations G if M(g(X)) = M(X) for all g ∈ G and all X ∈ χ^n, and if M(X_1) = M(X_2) for some X_1 ∈ χ^n and X_2 ∈ χ^n implies X_1 = g(X_2) for some g ∈ G. Suppose that T is a statistic and that M is a maximal invariant. Then T is invariant if and only if T is a function of M. The invariance property is usually utilized to obtain suitable test statistics for hypothesis testing problems that stay invariant under a group of transformations.

1.2.3 Univariate, Multivariate, Matrix Variate Normal Distribution

The normal distribution arises in many areas of theoretical and applied statistics. The use of the normal model is usually justified by assuming many small, independent effects additively contributing to each observation. Normal distributions also arise as the limiting distributions of several continuous and discrete families of distributions. In addition, the normal distribution maximizes information entropy among all distributions with known mean and variance, which makes it the natural choice of underlying distribution for data summarized in terms of sample mean and variance. In the following, we give the definition and review some important results about the normal distribution for the univariate, multivariate and matrix variate cases.

1.2.4 Univariate Normal Distribution

Each member of the univariate normal distribution family is defined by two parameters, location and scale: the mean ("average", µ) and the variance (standard deviation squared, σ²), respectively. To indicate that a real-valued random variable x is normally distributed with mean µ and variance σ², we write x ∼ N(µ, σ). The pdf of the normal distribution is the Gaussian function

φ(x; µ, σ) = (1/(√(2π)σ)) e^{−(x−µ)²/(2σ²)},

where σ > 0 is the standard deviation and the real parameter µ is the expected value. φ(z; 0, 1) = φ(z) is the density function of the "standard" normal random variable. The cumulative distribution function of the normal distribution is expressed in terms of the density function as

Φ(x; µ, σ) = ∫_{−∞}^{x} (1/σ) φ((u − µ)/σ) du.

The standard normal cdf, Φ(z), is just the general cdf evaluated with µ = 0 and σ = 1. The values Φ(z) may be approximated very accurately by a variety of methods, such as numerical integration, Taylor series, asymptotic series and continued fractions. The mgf exists for x ∼ N(µ, σ) and is given by M_x(t) = e^{µt + t²σ²/2}. The characteristic function is h_x(t) = e^{iµt − t²σ²/2}.
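For reference (an illustration added here, not part of the original text), the univariate density φ and cdf Φ used repeatedly below are available in SciPy; scipy.stats.norm is parameterized by the mean and the standard deviation:

from scipy.stats import norm

mu, sigma = 1.0, 2.0
print(norm.pdf(0.5, loc=mu, scale=sigma))   # phi(x; mu, sigma)
print(norm.cdf(0.5, loc=mu, scale=sigma))   # Phi(x; mu, sigma)
print(norm.cdf(0.5))                        # standard normal cdf Phi(z)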

1.2.5 Multivariate Normal Distribution

In classical multivariate analysis the central distribution is the multivariate normal distribution. There seem to be at least two reasons for this: first, due to the effect of the multivariate central limit theorem, in most cases multivariate observations are at least approximately normal; second, the multivariate normal distribution and its sampling distributions are usually tractable. A k-dimensional random vector x is said to have a k-dimensional normal distribution if the distribution of α'x is univariate normal for every α ∈ R^k. A random vector x with k components that has the normal distribution with parameters µ ∈ R^k and k × k positive definite matrix A has jpdf

φ_k(x; µ, A) = (1/((2π)^{k/2} |A|)) e^{−(1/2)(x−µ)'(AA')^{-1}(x−µ)}.   (1.1)

We say that x is distributed according to N_k(µ, A), and write x ∼ N_k(µ, A). The mgf corresponding to the density in (1.1) is

M_x(t) = e^{t'µ + (1/2)t'AA't}.   (1.2)

If we define an m-dimensional random variable y as y = Bx + ξ, where B is any m × k matrix and ξ ∈ R^m, then by the moment generating function we can write y ∼ N_m(Bµ + ξ, BA); note that a pdf representation does not exist when BB' is singular, i.e. when rk(B) = r < m. If x ∼ N_k(µ, A) where A is nonsingular, then z = A^{-1}(x − µ) has the distribution

Nk(0k,Ik) with density

φ_k(z) = (1/(2π)^{k/2}) e^{−(1/2)z'z}   (1.3)

and moment generating function

E(e^{t'z}) = e^{(1/2)t't}.   (1.4)

We denote the jpdf of z by φ_k(z) and its jcdf by Φ_k(z). v = z'z has the chi-square distribution with k degrees of freedom, denoted by χ²_k; the density of v can be written as

dF(v) = (1/(2^{k/2}Γ(k/2))) v^{(k/2)−1} e^{−v/2}.   (1.5)

Let u have the uniform distribution on the k-dimensional unit sphere, and let r =^d v^{1/2} be independent of u. Then z =^d ru, and x = Az + µ satisfies x =^d µ + rAu; in general, if y has the N_m(ξ, B) distribution for any m × k matrix B and any m-dimensional vector ξ, we have y =^d ξ + rBu.
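A minimal sampling sketch (ours; the function name is not from the original) of the stochastic representation x = µ + rAu, with u uniform on the unit sphere and r² ∼ χ²_k, which reproduces draws from N_k(µ, A):

import numpy as np

def rvs_normal_stochastic(mu, A, n, rng=None):
    # x = mu + r A u with u uniform on S^k and r = sqrt(chi^2_k), independent of u.
    rng = rng or np.random.default_rng()
    mu, A = np.asarray(mu, float), np.asarray(A, float)
    k = A.shape[1]
    g = rng.standard_normal((n, k))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)   # uniform on the unit sphere
    r = np.sqrt(rng.chisquare(df=k, size=(n, 1)))
    return mu + (r * u) @ A.T

X = rvs_normal_stochastic([0.0, 0.0], [[1.0, 0.0], [0.5, 1.0]], n=20000)
print(np.cov(X, rowvar=False))  # approaches AA'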

The parameters of the N_k(µ, A) distribution have a direct interpretation as the expectation and the covariance of a variable x with this distribution, since E(x) = µ and E((x − µ)(x − µ)') = AA'. Any odd moments of x − µ are zero. The fourth order moments are E[(x_i − µ_i)(x_j − µ_j)(x_k − µ_k)(x_l − µ_l)] = σ_ij σ_kl + σ_ik σ_jl + σ_il σ_jk, where AA' = (σ_ij). The family of normal distributions is closed under linear transformations, marginalization and conditioning. A characterization of the multivariate normal distribution within the elliptical family of distributions is due to the fact that it is the only distribution in this class for which a zero covariance matrix between two partitions of a random vector implies independence.

The following are important for x ∼ N_k(µ, A), where A is nonsingular:

1. (x − µ)'(AA')^{-1}(x − µ) ∼ χ²_k,

2. x'(AA')^{-1}x ∼ χ²_k(δ), where χ²_k(δ) denotes the non-central χ² distribution with non-centrality parameter δ = (1/2)µ'(AA')^{-1}µ.

When x ∼ N_k(µ, A) where A is singular of rank r, we can prove similar results:

1. (x − µ)'(AA')^+(x − µ) ∼ χ²_r,

2. x'(AA')^+x ∼ χ²_r(δ), where χ²_r(δ) denotes the non-central χ² distribution with non-centrality parameter δ = (1/2)µ'(AA')^+µ.

Assume that x ∼ N_k(µ, A). Let y = Bx + b and z = Cx + c, where B is m × k, C is l × k, b is m × 1 and c is l × 1. Then y and z are independent if and only if BAA'C' = 0.

1.2.6 Matrix Variate Normal Distribution

If a random sample of n observations is independently selected from a N_k(µ, A) distribution, the joint density of X = (x_1, x_2, ..., x_n) can be written as

dF(X) = ∏_{j=1}^{n} |A|^{-1} φ_k((AA')^{-1/2}(x_j − µ)) = (1/((2π)^{nk/2} |A|^n)) e^{−(1/2)Σ_{j=1}^{n}(x_j − µ)'(AA')^{-1}(x_j − µ)}.

A more general matrix variate normal distribution can be defined through linear transformations, from both the right and the left, of an iid random sample of k-dimensional standard normal vectors. Let z ∼ φ_k(z), and let Z = (z_1, z_2, ..., z_n) denote an iid random sample from this distribution. The joint density of Z is given by

dF(Z) = ∏_{j=1}^{n} φ_k(z_j) = (1/(2π)^{nk/2}) e^{−(1/2)Σ_{j=1}^{n} z_j'z_j} = (1/(2π)^{nk/2}) etr(−(1/2)ZZ').

We denote this by Z ∼ φ_{k×n}(Z). Now let X = AZB + M for nonsingular matrices A and B of orders k and n and a k × n constant matrix M. The Jacobian of this transformation is |A|^{-n}|B|^{-k}. The density of X is

φ_{k×n}(X; M, A, B) = (1/((2π)^{nk/2} |A|^n |B|^k)) etr(−(1/2)(AA')^{-1}(X − M)(B'B)^{-1}(X − M)').
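A short sampling sketch (ours) of the construction X = AZB + M that defines φ_{k×n}(X; M, A, B):

import numpy as np

def rvs_matrix_normal(M, A, B, rng=None):
    # X = A Z B + M with Z a k x n matrix of iid N(0, 1) entries.
    rng = rng or np.random.default_rng()
    M = np.asarray(M, float)
    Z = rng.standard_normal(M.shape)
    return np.asarray(A) @ Z @ np.asarray(B) + M

X = rvs_matrix_normal(M=np.zeros((2, 3)), A=np.array([[1.0, 0.0], [0.5, 1.0]]), B=np.eye(3))
print(X.shape)  # (2, 3)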

1.2.7 Symmetric Distributions

A random variable x is symmetric if x =^d −x. Multivariate symmetry is usually defined in terms of invariance of the distribution of a centered random vector x in R^k under a suitable family of transformations. A random vector x has a centrally symmetric distribution about the origin if x =^d −x. An equivalent description of central symmetry requires v'x =^d −v'x for every vector v ∈ R^k; that is, any projection of x onto a line through the origin has a symmetric distribution. The joint distribution of k independent, symmetrically distributed random variables is centrally symmetric. A random vector x has a spherically symmetric distribution about the origin 0_k if an orthogonal rotation of x about 0_k by any orthogonal matrix A does not alter the distribution, i.e., x =^d Ax for all orthogonal k × k matrices A. A random vector that has the uniform distribution on a k-cube of the form [−c, c]^k is centrally but not spherically symmetric; central symmetry is more general than spherical symmetry. Both of these definitions of symmetry reduce to the usual notion of symmetry in the univariate case.

1.2.8 Centrally Symmetric Distributions

The most obvious way of constructing centrally symmetric distributions over R^k is to consider the joint distribution of k independent symmetric random variables. Because of independence, the joint density is the product of the univariate marginals. Some examples of densities of this kind follow (a small numerical check of item 4 is given after the list):

1. Multivariate uniform distribution on the k-cube [−1, 1]^k. Density:

g(x) = 1/2^k,  x ∈ [−1, 1]^k.

2. Multivariate symmetric Beta distribution on the k-cube [−1, 1]^k. Density:

g(x) = (Γ(2θ)/(2(Γ(θ))²))^k ∏_{j=1}^{k} ( ((e_j'x + 1)/2)(1 − (e_j'x + 1)/2) )^{θ−1},  x ∈ [−1, 1]^k.

3. Multivariate centrally symmetric t-distribution with v degrees of freedom. Density:

g(x) = ( Γ((v+1)/2)/(√(vπ)Γ(v/2)) )^k ∏_{j=1}^{k} (1 + (e_j'x)²/v)^{−(v+1)/2},  x ∈ R^k.

4. Multivariate Laplace distribution. Density:

g(x) = (1/2^k) exp(−Σ_{j=1}^{k} |e_j'x|),  x ∈ R^k.
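As a small numerical illustration (ours), the multivariate Laplace density in item 4 and a check of its central symmetry, g(x) = g(−x):

import numpy as np

def laplace_centrally_symmetric_pdf(x):
    # g(x) = 2^{-k} exp(-sum_j |x_j|): a product of k iid standard Laplace marginals.
    x = np.asarray(x, float)
    return 0.5**x.size * np.exp(-np.abs(x).sum())

x = np.array([0.3, -1.2, 0.7])
print(laplace_centrally_symmetric_pdf(x), laplace_centrally_symmetric_pdf(-x))  # equal values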

1.2.9 Spherically Symmetric Distributions

A spherically symmetric random vector x has a jcf of the form h(t't) for some cf h(.), and a density, if it exists, of the form f(x'x) for some pdf f(.). For a spherically symmetric random vector x, the random variable defined as u^(k) = x/√(x'x) is always distributed uniformly over the surface of the unit sphere in R^k, denoted by S^k.

A stochastic representation for a random vector x which has a spherically symmetric distribution is given by x =^d ru, where r is a nonnegative random variable with cdf K(r) that is independent of u, which is uniformly distributed over S^k. In this case r =^d √(x'x), u =^d x/√(x'x), and they are independent. If both K'(r) = k(r), the pdf of r, and f(x'x), the jpdf of x, exist, then they are related as follows:

k(r) = (2π^{k/2}/Γ(k/2)) r^{k−1} f(r²),   (1.6)

or, sometimes more conveniently, the expression for the pdf of v = r²:

k(v) = (π^{k/2}/Γ(k/2)) v^{(k/2)−1} f(v).   (1.7)

Some examples follow:

(a) The standard multivariate normal N_k(0_k, I_k) distribution with jpdf

φ_k(x) = (1/(2π)^{k/2}) e^{−(1/2)x'x};

we denote the jcdf of x by Φ_k(x).

(b) The multivariate spherical t distribution with v degrees of freedom, with density function

dF(x) = Γ((v + k)/2) / ( Γ(v/2)(vπ)^{k/2} (1 + x'x/v)^{(v+k)/2} ).

When v = 1 the multivariate t distribution is called the spherical Cauchy distribution.

Both the centrally symmetric and the spherically symmetric multivariate t distributions approach the multivariate normal density as v grows. In Figure 1.1 we display the centrally symmetric t distribution for various choices of the degrees of freedom v. As v gets larger the contours approach the contours of the multivariate normal density.

1.2.10 Location-Scale Families from Spherically Symmetric Distributions

Elliptical distributions that are obtained from an affine transform of a spherically symmetric random vector are usually utilized in the study of robustness of procedures developed under the normal distribution assumption. The family of elliptical distributions is closed under linear transformations, marginalization and conditioning.

Figure 1.1: Centrally symmetric t distribution for various choices of the degrees of freedom v (panels for v = 1, 5, 15, and 100). As v gets larger the contours approach the contours of the multivariate normal density.

Let z be a k-dimensional random vector with spherical distribution F. A random vector x =^d Az + µ, for A an m × k constant matrix and µ an m-dimensional constant vector, has an elliptically contoured distribution with parameters µ and A and kernel F; we denote this distribution by E_m(µ, A, F). x has a jcf of the form e^{iµ't} h_z(A't), where h_z(t) = h(t't) is the jcf of the spherically distributed variable z. When it exists, the jmgf is given by e^{µ't} M_z(A't). When AA' is nonsingular and the jpdf F' = f exists, the jpdf of x is written as follows:

E_k(x; µ, A, f) = |A|^{-1} f((x − µ)'(AA')^{-1}(x − µ)).   (1.8)

A characterization of x is given by x =^d Az + µ =^d rAu^(k) + µ, where r is a nonnegative random variable with cdf K(r) that is independent of u^(k), which is uniformly distributed over S^k. If x ∼ E_k(µ, A, f) has a finite first moment, then it is given by E(x) = µ. Any odd moments of x − µ are zero. The covariance matrix, if it exists, is given by E[(x − µ)(x − µ)'] = (E(r²)/k) AA'. If fourth order moments exist, they are E[(x_i − µ_i)(x_j − µ_j)(x_k − µ_k)(x_l − µ_l)] = (E(r⁴)/(k(k+2)))(σ_ij σ_kl + σ_ik σ_jl + σ_il σ_jk), where AA' = (σ_ij).

If a random sample of n observations is independently selected from an E_k(µ, A, f) distribution, the density of X = (x_1, x_2, ..., x_n) can be written as

E_{k×n}(X; µ, A, f) = |A|^{-n} ∏_{j=1}^{n} f((x_j − µ)'(AA')^{-1}(x_j − µ)).

Gupta and Varga give a generalization of this model ([37]):

E_{k×n}(X; M, A, B, f) = |A|^{-n} |B|^{-k} f(tr((X − M)'(AA')^{-1}(X − M)(B'B)^{-1})).

CHAPTER 2

A Class of Multivariate Skew Symmetric Distributions

The interest in flexible distribution models that can adequately accommodate asymmetric populations has increased during the last few years. Many distributions are skewed and so do not satisfy some standard assumptions such as normality, or even elliptical symmetry. A reasonable approach is to develop models that can represent both skewed and symmetric distributions. The book edited by Genton supplies a variety of applications for such models in areas such as economics, engineering, and the biological sciences ([27]).

2.1 Background: Multivariate Skew Normal Distributions

The univariate skew normal density first appears in Roberts ([55]) as an example of weighted densities. In Aigner et al., the skew normal model is obtained in the context of stochastic frontier production function models ([1]). Azzalini provides a formal study of this distribution ([10]). The density of this model is given by

SN_1(y, σ², α) = f(y, σ², α) = 2 φ(y, σ²) Φ(αy),   (2.1)

where α is a real scalar, φ(., σ²) is the univariate normal density function with variance σ², and Φ(.) denotes the cumulative distribution function of the univariate standard normal variable.

Some basic properties of the univariate skew normal distribution are given in ([10]):

(a) SN_1(y, σ², α = 0) = φ(y, σ²),

(b) if y ∼ SN_1(y, σ², α), then −y ∼ SN_1(y, σ², −α),

(c) as α → ±∞ the density SN_1(y, σ², α) approaches the half normal density, i.e. the distribution of ±|z| when z ∼ φ(z, σ²),

(d) if y ∼ SN_1(y, σ² = 1, α), then y² ∼ χ²_1.

Properties (a), (b), and (c) follow directly from the definition, while property (d) follows immediately from the following lemma:

Lemma 2.1.1. ([55]) w² ∼ χ²_1 if and only if the p.d.f. of w has the form f(w) = h(w) exp(−w²/2), where h(w) + h(−w) = √(2/π).

The univariate skew normal distribution family extends the widely employed family of normal distributions by introducing an additional parameter α. The advantage of this distribution family is that it maintains many statistical properties of the normal distribution family. The study of skew normal distributions explores an approach to statistical analysis without the assumption of symmetry for the underlying population distribution. The skew normal distribution family emerges to take the skewness property into account, and skew symmetric distributions are useful in many practical situations. A more general form of the density in (2.1) arises as

f(y, α) = 2 g(y) H(αy),   (2.2)

where α is a real scalar, g(.) is a univariate density function symmetric around 0, and

H(.) is an absolutely continuous cumulative distribution function with H'(.) symmetric around 0. This family of densities is called the skew symmetric family. Taking α equal to zero in (2.2) gives a symmetric density; we obtain a skewed density for any other value of α. The parameter α controls both skewness and kurtosis.
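To make the construction in (2.2) concrete, here is a minimal sketch (ours; the pairing of a normal kernel g with a logistic weighting cdf H is just one admissible choice) that evaluates f(y, α) = 2 g(y) H(αy) and checks numerically that it integrates to one:

import numpy as np
from scipy.stats import norm, logistic
from scipy.integrate import quad

def skew_symmetric_pdf(y, alpha, g=norm.pdf, H=logistic.cdf):
    # f(y, alpha) = 2 g(y) H(alpha y); g a symmetric density, H' a symmetric density.
    return 2.0 * g(y) * H(alpha * y)

total, _ = quad(lambda y: skew_symmetric_pdf(y, alpha=3.0), -np.inf, np.inf)
print(total)  # approximately 1 for any alpha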

When applying the skew normal distribution family in statistical inference, frequently we need to discuss the joint distribution of a random sample from the population. This consequently necessitates the study of multivariate skew normal distribution.

Several generalizations of densities (2.1) and (2.2) to the multivariate case have been studied. For example, Azzalini introduced the k-dimensional skew-normal density in two alternative forms ([10], [12]):

f(y, α, Σ) = c φ(y, Σ) ∏_{i=1}^{k} Φ(α_i y_i),   (2.3)

f(y, α, Σ) = 2 φ(y, Σ) Φ(α'y),   (2.4)

where α = (α_1, α_2, ..., α_k)' is a k-dimensional real vector, φ(., Σ) is the k-dimensional normal density function with covariance matrix Σ, Φ(.) denotes the cumulative distribution function of the univariate standard normal variable, and c is the normalizing constant.

Chang and Gupta generalize the idea in (2.4) for cases other than normal ([31]). A function

f(y, α) = 2 g(y) H(α'y)   (2.5)

defines a probability density function for y ∈ R^k when α is a k-dimensional real vector, g(.) is a k-dimensional density function symmetric around 0_k, and H(.) is an absolutely continuous cumulative distribution function with H'(.) symmetric around 0.

The skew symmetric family generated by Equation (2.5) has been studied by many authors. For instance, one can define skew-t distributions ([16], [41], [56]), skew-Cauchy distributions ([9]), skew-elliptical distributions ([13], [6]), or other skew symmetric distributions ([60]). The models in ([35]) are obtained by taking both g(.) and H(.) in (2.2) to belong to one of the normal, Cauchy, Laplace, logistic or uniform families. In ([49]), g(.) is taken to be a normal pdf while the cumulative distribution function H(.) is taken as the cumulative distribution function of the normal, Student's t, Cauchy, Laplace, logistic or uniform distribution. Alternatively, Gomez, Venegas, and Bolfarine consider the situation where H(.) is fixed to be the cumulative distribution function of the normal distribution while g(.) is taken as the density function of the normal, Student's t, logistic, Laplace, or uniform distribution ([30]).

Univariate and multivariate skew symmetric families of distributions appear as a subset of the so-called selection models. Suppose x is a k-dimensional random vector with density f(x). The usual statistical analysis involves using a random sample x_1, x_2, ..., x_n to make inferences about f(x). Nevertheless, there are situations in which a weighted sample, instead of a random sample, is selected from f(x) because it is either difficult, costly, or even impossible to observe certain parts of the distribution. If we assume that a weight function, say g(x), is used in selection, then the sample of observations may be thought of as coming from the following weighted density:

h(x) = f(x)g(x) / ∫_{R^k} g(x)f(x) dx.   (2.6)

When the sample is only a subset of the population, the associated model is called a selection model. This kind of density originates from Fisher ([26]). Patil, Rao and Zelen provide a survey of this family of distributions ([51]). In their recent article, Arellano-Valle, Branco, and Genton show that a very general class of skew symmetric densities emerges from selection models, called fundamental skew distributions ([5]).

2.2 Skew-Centrally Symmetric Densities

In this section, we study a family of multivariate skew symmetric densities generated by multivariate centrally symmetric densities.

Theorem 2.2.1. Let g(.) be the k-dimensional jpdf of k independent variables, centrally symmetric around 0_k; let H(.) be an absolutely continuous cumulative distribution function with H'(.) symmetric around 0; let α = (α_1, α_2, ..., α_k)' be a k-dimensional real vector; and let e_j, j = 1, 2, ..., k, be the elementary vectors of R^k. Then

f(y, α) = 2^k g(y) ∏_{j=1}^{k} H(α_j e_j'y)   (2.7)

defines a probability density function.

Proof. First note that f(y) ≥ 0 for all y ∈ R^k. We need to show that

k(α_1, α_2, ..., α_k) = ∫_{R^k} 2^k g(y) ∏_{j=1}^{k} H(α_j e_j'y) dy = 1.

Observe that

∂k(α_1, α_2, ..., α_k)/∂α_ℓ = ∫_{R^k} 2^k g(y) (d/dα_ℓ) ∏_{j=1}^{k} H(α_j e_j'y) dy
= ∫_{R^k} 2^k y_ℓ H'(α_ℓ e_ℓ'y) ∏_{j≠ℓ} H(α_j e_j'y) g(y) dy
= 0.

The first equality is true because of the Lebesgue dominated convergence theorem; the last equality follows from independence, through

E_{g(y)}( y_ℓ H'(α_ℓ e_ℓ'y) ∏_{j≠ℓ} H(α_j e_j'y) ) = E_{g(y)}( y_ℓ H'(α_ℓ e_ℓ'y) ) E_{g(y)}( ∏_{j≠ℓ} H(α_j e_j'y) ) = 0,

since g(.) is centrally symmetric around 0_k and y_ℓ H'(α_ℓ e_ℓ'y) is an odd function of y_ℓ. Hence k(α_1, α_2, ..., α_k) is constant as a function of α_j for all j = 1, 2, ..., k, and when all α_j = 0, k(α_1, α_2, ..., α_k) = 1. This concludes the proof.

In the next theorem, we relate the distribution of the even powers of a skew symmetric random variable to those of its kernel’s.

Theorem 2.2.2. Let x be a random variable with probability density function g(x), and let y be the random variable with probability density function

f(y, α) = 2^k g(y) ∏_{j=1}^{k} H(α_j e_j'y),

where g(.), H(.) and α are defined as in Theorem 2.2.1. Then,

(a) the even moments of y and x are the same, i.e. E((yy')^p) = E((xx')^p) for p even and E((y'y)^m) = E((x'x)^m) for any natural number m,

(b) y'y and x'x have the same distribution.

Proof. It suffices to show that

E(x_1^{n_1} x_2^{n_2} ... x_k^{n_k}) = E(y_1^{n_1} y_2^{n_2} ... y_k^{n_k})

for n_1, n_2, ..., n_k even.

Let Ψ_y(t) be the characteristic function of y. Then

Ψ_y(t) = ∫_{R^k} e^{it'y} 2^k g(y) ∏_{j=1}^{k} H(α_j e_j'y) dy.   (2.8)

Let n_1 + n_2 + ... + n_k = n. Taking the n_j-th partial derivatives of (2.8) with respect to t_j for j = 1, 2, ..., k and putting t = 0_k,

∂^n Ψ_y(t)/(∂t_1^{n_1} ∂t_2^{n_2} ... ∂t_k^{n_k}) |_{t=0_k}
= ∫_{R^k} ∂^n/(∂t_1^{n_1} ∂t_2^{n_2} ... ∂t_k^{n_k}) [e^{it'y}] 2^k ∏_{j=1}^{k} H(α_j e_j'y) g(y) dy |_{t=0_k}
= ∫_{R^k} [e^{it'y} 2^k i^n ∏_{j=1}^{k} H(α_j e_j'y)] [∏_{ℓ=1}^{k} y_ℓ^{n_ℓ}] g(y) dy |_{t=0_k}
= ∫_{R^k} [2^k i^n ∏_{j=1}^{k} H(α_j e_j'y)] [∏_{ℓ=1}^{k} y_ℓ^{n_ℓ}] g(y) dy.   (2.9)

Taking the derivative of (2.9) with respect to α_m,

∂/∂α_m ( ∫_{R^k} [2^k i^n ∏_{j=1}^{k} H(α_j e_j'y)] [∏_{ℓ=1}^{k} y_ℓ^{n_ℓ}] g(y) dy )
= 2^k i^n E_{g(y)}[ y_m^{n_m+1} H'(α_m e_m'y) ∏_{j≠m} y_j^{n_j} H(α_j e_j'y) ]
= 2^k i^n E_{g(y)}[ y_m^{n_m+1} H'(α_m e_m'y) ] E_{g(y)}[ ∏_{j≠m} y_j^{n_j} H(α_j e_j'y) ]
= 0.

The first equality is true because of the Lebesgue dominated convergence theorem, and the second equality follows from the independence of the components. The last equality is due to the fact that

y_m^{n_m+1} H'(α_m y_m)

is an odd function of y_m and g(.) is centrally symmetric around 0_k.

Therefore E(y_1^{n_1} y_2^{n_2} ... y_k^{n_k}), for n_1, n_2, ..., n_k even, is constant as a function of α_m. If all α_m = 0, then f(y) = g(y), and therefore

E(x_1^{n_1} x_2^{n_2} ... x_k^{n_k}) = E(y_1^{n_1} y_2^{n_2} ... y_k^{n_k})

holds when all α_m = 0; hence it is true for all α_m. The required results follow immediately.

The first theorem aids in constructing families of densities. As an example, take the family of jpdf's generated by the uniform kernel.

Example 2.2.1. Multivariate skew uniform distribution. Take the uniform density on the interval (−1, 1),

u(z) = 1/2,  z ∈ (−1, 1).   (2.10)

A k-dimensional extension of this density is obtained by considering k independent variables z = (z_1, z_2, ..., z_k)', each with density u(.). The density of z is

z ∼ u*(z) = (1/2)^k,  z ∈ (−1, 1)^k.   (2.11)

A skew symmetric density can be obtained by using the cdf of the standard normal distribution, Φ(.). In Theorem 2.2.1 above, with g(.) = u*(.) and H(.) = Φ(.), the skew symmetric density becomes

f(x, α) = ∏_{j=1}^{k} Φ(α_j e_j'x) for x ∈ (−1, 1)^k, and 0 elsewhere.

Figure 2.1: Surface of skew uniform density in two dimensions; α_1 = 2, α_2 = −2, H(.) = Φ(.).

Figure 2.2: Surface of skew uniform density in two dimensions; α_1 = 2, α_2 = 0.

Figure 2.3: Surface of skew uniform density in two dimensions; α_1 = 2, α_2 = 10.

Figures 2.1 to 2.3 show the flexibility of the skew uniform distribution.
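A small sketch (ours) of the bivariate skew uniform density above, of the kind that could be used to reproduce surfaces such as those in Figures 2.1 to 2.3:

import numpy as np
from scipy.stats import norm

def skew_uniform_pdf(x, alpha):
    # f(x, alpha) = prod_j Phi(alpha_j x_j) on (-1, 1)^k, 0 elsewhere;
    # the 2^k of Theorem 2.2.1 cancels the (1/2)^k uniform kernel.
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    if np.any(np.abs(x) >= 1.0):
        return 0.0
    return float(np.prod(norm.cdf(alpha * x)))

print(skew_uniform_pdf([0.5, -0.5], alpha=[2.0, -2.0]))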

A skew normal density is obtained from the normal kernel in the following example.

Example 2.2.2. Multivariate skew normal densities. In Theorem 2.2.1 above, let g(.) = φ_k(.), where φ_k(.) is the k-dimensional standard normal density function. Also let H(.) and α be defined as in Theorem 2.2.1. We can construct a density for k-dimensional joint p.d.f.'s of the form

f(y, α) = 2^k φ_k(y) ∏_{j=1}^{k} H(α_j e_j'y).   (2.12)

Figure 2.4: The contours of the 2-dimensional skew normal pdf for different values of α (α_1, α_2 ∈ {−5, 0, 5}) and for H(.) = Φ(.).

Using Theorem 2.2.2 we can relate some properties of the density introduced in Example 2.2.2 to those of its kernel density φ_k(.). Let x ∼ φ_k(x), and let y be the random variable with probability density function

f(y, α) = 2^k φ_k(y) ∏_{i=1}^{k} H(α_i e_i'y).

Then,

(a) the even moments of y and x are the same, i.e. E((yy')^p) = E((xx')^p) for p even and E((y'y)^m) = E((x'x)^m) for any natural number m,

(b) y'y and x'x both have the χ²_k distribution.

Figure 2.5: The surface of the 2-dimensional pdf of a skew normal variable for α_1 = 3, α_2 = 5, H(.) = Φ(.).

Example 2.2.3. Multivariate skew-Laplace densities. Take g(.) as the joint density of k iid Laplace variables,

g(x) = (1/2^k) exp(−Σ_{j=1}^{k} |x_j|).

Let H(.) and α be defined as in Theorem 2.2.1. We get the following density:

f(x, α) = exp(−Σ_{j=1}^{k} |x_j|) ∏_{j=1}^{k} H(α_j e_j'x).

2.2.1 Independent Observations, Location-Scale Family

In this section, we consider a location scale family of skew symmetric random variables. But first observe the following:

Remark 2.2.1. The family of densities introduced by Equation (2.5) comprises different models from the family of densities introduced in Theorem 2.2.1, except for the case k = 1.

Remark 2.2.2. (Joint Density of a Random Sample). Let g(.) be a density function symmetric about 0 and H(.) an absolutely continuous cumulative distribution function with H'(.) symmetric around 0. Let y_1, y_2, ..., y_n constitute a random sample from a distribution with density f(y, α) = 2 g(y) H(αy). Then the joint p.d.f. of y = (y_1, y_2, ..., y_n)' is

f(y, α) = 2^n g*(y) ∏_{i=1}^{n} H(α y_i) = 2^n g*(y) ∏_{i=1}^{n} H(α e_i'y),

where g*(y) = ∏_{i=1}^{n} g(y_i), and it belongs to the family of densities introduced in Theorem 2.2.1. Observe that the density of the random sample cannot be represented by the family of multivariate skew symmetric densities introduced by Equation (2.5), i.e., in the form f(y, α) = 2 g(y) H(α'y). This casts doubt on the usefulness of the latter.

Figure 2.6: The surface of the 2-dimensional pdf of a skew Laplace variable for α_1 = 3, α_2 = −5, H(.) = Φ(.).

Remark 2.2.3. (Joint Density of Independent Skew-Symmetric Variables). Let g(.) be a density function symmetric about 0 and H(.) an absolutely continuous cumulative distribution function with H'(.) symmetric around 0. Let

z_j ∼ f(z, α_j) = 2 g(z) H(α_j z),

with mean µ(α_j) and variance σ²(α_j), for j = 1, 2, ..., k be independent variables. Then the joint p.d.f. of z = (z_1, z_2, ..., z_k)' is

f(z, α) = 2^k g*(z) ∏_{j=1}^{k} H(α_j e_j'z),

where α = (α_1, α_2, ..., α_k)', g*(z) = ∏_{j=1}^{k} g(z_j), and the e_j are the elementary vectors of the coordinate system R^k. This density belongs to the family of densities introduced in Theorem 2.2.1. Note that the density of z cannot be represented by the family of multivariate skew symmetric densities introduced by Equation (2.5).

Let z ∼ f(z, α) = 2^k g*(z) ∏_{j=1}^{k} H(α_j e_j'z). By construction, the mean of z is

E(z) = µ_0 = (µ(α_1), µ(α_2), ..., µ(α_k))',

and the covariance is

C(z) = Σ_0 = diag(σ²(α_1), σ²(α_2), ..., σ²(α_k)).

Now let Σ be a positive definite covariance matrix of dimension k, and let y = Σ^{1/2}z. Then

y ∼ f(y, α, Σ) = (2^k/|Σ|^{1/2}) g*(Σ^{-1/2}y) ∏_{j=1}^{k} H(α_j e_j'Σ^{-1/2}y).

By letting α_j Σ^{-1/2}e_j = λ_j for j = 1, 2, ..., k, we get

f(y, λ_1, λ_2, ..., λ_k, Σ) = (2^k/|Σ|^{1/2}) g*(Σ^{-1/2}y) ∏_{j=1}^{k} H(λ_j'y).

This time, the mean of y is

E(y) = Σ^{1/2}µ_0,

and the covariance matrix is

C(y) = Σ^{1/2}Σ_0Σ^{1/2}.

Next, let x = y + µ, where µ ∈ R^k. The probability density function of x is

f(x, α, Σ, µ) = (2^k/|Σ|^{1/2}) g*(Σ^{-1/2}(x − µ)) ∏_{j=1}^{k} H(α_j e_j'Σ^{-1/2}(x − µ)).

The mean of x is

E(x) = Σ^{1/2}µ_0 + µ,

and the covariance matrix of x is

C(x) = Σ^{1/2}Σ_0Σ^{1/2}.

The results in the previous section lead to the following definition:

Definition 2.2.1. (Multivariate Skew-Symmetric Density). Let g(.) be a density function symmetric about 0 and H(.) an absolutely continuous cumulative distribution function with H'(.) symmetric around 0. A variable x has a skew-symmetric distribution if it has probability density function

f(x, α, Σ, µ) = (2^k/|Σ|^{1/2}) g*(Σ^{-1/2}(x − µ)) ∏_{j=1}^{k} H(α_j e_j'Σ^{-1/2}(x − µ)),   (2.13)

where the α_j are scalars, α = (α_1, α_2, ..., α_k)', µ ∈ R^k, Σ is a positive definite covariance matrix, and g*(y) = ∏_{j=1}^{k} g(y_j) is a k-dimensional density function symmetric around 0_k. We call this density the skew-symmetric density with location parameter µ, scale parameter Σ^{1/2}, and shape parameter α, and denote it by ss_k^{g,H}(µ, Σ^{1/2}, α).

Note that we could have taken the matrix Σ^{1/2} in the linear form x = Σ^{1/2}z + µ to be any (m × k) matrix (with µ ∈ R^m). Then we do not always have a density for the random variable x; nevertheless, the distribution of x is defined by the moment generating function if it exists.

Let z ∼ ss_k^{g,H}(0_k, I_k, α). Then the moment generating function of z evaluated at t_k ∈ R^k, M_z(t_k), can be obtained as follows:

M_z(t_k) = E(e^{t_k'z}) = ∫_{R^k} e^{t_k'z} 2^k g*(z) ∏_{j=1}^{k} H(α_j e_j'z) dz = 2^k E_{g*(z)}(e^{t_k'z} ∏_{j=1}^{k} H(α_j e_j'z)).

Let x = Az + µ for a constant (m × k) matrix A and an m-dimensional constant vector µ. The moment generating function of x evaluated at t_m ∈ R^m, M_x(t_m), can be obtained as follows:

M_x(t_m) = e^{t_m'µ} M_z(A't_m) = e^{t_m'µ} 2^k E_{g*(z)}(e^{z'A't_m} ∏_{j=1}^{k} H(α_j e_j'z)).

Definition 2.2.2. (Multivariate Skew Symmetric Random Variable). Let g(.) be a density function symmetric about 0 and H(.) an absolutely continuous cumulative distribution function with H'(.) symmetric around 0. Let z_j ∼ f(z, α_j) = 2 g(z) H(α_j z) for j = 1, 2, ..., k be independent variables. Then

z = (z_1, z_2, ..., z_k)' ∼ f(z, α) = 2^k g*(z) ∏_{j=1}^{k} H(α_j e_j'z),

where g*(z) = ∏_{j=1}^{k} g(z_j) and the e_j for j = 1, 2, ..., k are the elementary vectors of the coordinate system R^k. Let A be an m × k constant matrix, and let µ be an m-dimensional constant real vector. The random variable x = Az + µ is distributed according to the multivariate skew symmetric distribution with location parameter µ, scale parameter A, and shape parameter α. We denote this by x ∼ SS_{m,k}^{g,H}(µ, A, α).

By this definition we can write the following properties:

Property 2.2.1. Let z_j ∼ f(z, α_j) = 2 g(z) H(α_j z), with mean µ(α_j) and variance σ²(α_j), for j = 1, 2, ..., k be independent variables. Let z = (z_1, z_2, ..., z_k)', x = Az + µ, and α = (α_1, α_2, ..., α_k)'. Then x ∼ SS_{m,k}^{g,H}(µ, A, α). By construction, the mean of x is E(x) = Aµ_0 + µ, and the covariance matrix of x is C(x) = AΣ_0A', where µ_0 = (µ(α_1), µ(α_2), ..., µ(α_k))' and Σ_0 = diag(σ²(α_1), σ²(α_2), ..., σ²(α_k)).

2.3 Multivariate Skew-Normal Distribution

2.3.1 Motivation: Skew-Spherically Symmetric Densities

Theorem 2.3.1. Let g(.) be a k-dimensional spherical jpdf, i.e. one that satisfies g(x) = k(x'x) for all x ∈ R^k for some density k(r²) defined for r ≥ 0; let H(.) be an absolutely continuous cumulative distribution function with H'(.) symmetric around 0; let α = (α_1, α_2, ..., α_k)' be a k-dimensional real vector; and let e_j, j = 1, 2, ..., k, be the elementary vectors of R^k. Then

f(y, α) = c^{-1} g(y) ∏_{j=1}^{k} H(α_j e_j'y)   (2.14)

defines a probability density function, where c = E_x( ∏_{j=1}^{k} P(z_j ≤ α_j x_j | x) ) and z_1, z_2, ..., z_k are identically distributed random variables with cdf H(.), independent of x ∼ g(x).

Proof. f(y) is nonnegative, so we only have to prove that ∫_{R^k} f(y) dy = 1. Write

c ∫_{R^k} f(y) dy = ∫_{R^k} g(y) ∏_{j=1}^{k} H(α_j e_j'y) dy = E_x( ∏_{j=1}^{k} P(z_j ≤ α_j x_j | x) ).

The last equality holds because z_1, z_2, ..., z_k are identically distributed random variables with cdf H(.), independent of x ∼ g(x). This concludes the proof.

When we choose g(.) = φ_k(.) and (z_1, z_2, ..., z_k)' independent, this reduces to the density introduced in Example 2.2.2. To see this, observe that

E_x( ∏_{j=1}^{k} P(z_j ≤ α_j x_j | x) ) = ∏_{j=1}^{k} P(z_j ≤ α_j x_j) = ∏_{j=1}^{k} P(z_j − α_j x_j ≤ 0) = 1/2^k,

since both H'(.) and φ_1(.) are symmetric, so that z_j − α_j x_j, j = 1, 2, ..., k, have iid symmetric distributions.

2.3.2 Multivariate Skew-Normal Distribution

Definition

In a recent article, Chen and Gupta ([32]) pointed out that neither of the multivariate skew-normal models (2.3), (2.4) coheres with the joint distribution of a random sample from a univariate skew-normal distribution, and they introduced an alternative multivariate skew-normal model which overcomes this problem ([32]):

SN_k(y, α, Σ) = f(y, α, Σ) = 2^k φ_k(y, Σ) ∏_{i=1}^{k} Φ(λ_i'y),   (2.15)

where α ∈ R^k and Λ = (λ_1, λ_2, ..., λ_k) = Σ^{-1/2} diag(α_1, α_2, ..., α_k), φ_k(., Σ) is the k-dimensional normal density function with covariance Σ, and Φ(.) denotes the cumulative distribution function of the univariate standard normal variable. Chen and Gupta's skew normal model ([32]) is of the form of the skew symmetric density introduced in Definition 2.2.1.

First note that a random variable z with probability density function

f(z, α) = 2 φ(z) Φ(αz),

where α is a real scalar, φ(.) is the univariate standard normal density function, and Φ(.) denotes the cumulative distribution function of the univariate standard normal variable, has mean µ(α) = √(2/π) α/√(1 + α²) and variance σ²(α) = 1 − 2α²/(π(1 + α²)). (Skewness: γ_1 = ((4 − π)/2)(µ(α))³/(σ²(α))^{3/2}; kurtosis: γ_2 = 2(π − 3)(µ(α))⁴/(σ²(α))².)

Let z_j ∼ f(z, α_j) = 2 φ(z) Φ(α_j z) for j = 1, 2, ..., k be independent variables. Then the joint p.d.f. of z = (z_1, z_2, ..., z_k)' is

f(z, α_1, α_2, ..., α_k) = 2^k φ_k(z) ∏_{j=1}^{k} Φ(α_j e_j'z),

where φ_k(z) = ∏_{j=1}^{k} φ(z_j) is the k-dimensional standard normal density and the e_j are the elementary vectors of the coordinate system R^k. The mean of z is

E(z) = µ_0 = (µ(α_1), µ(α_2), ..., µ(α_k))',

and the covariance is

C(z) = Σ_0 = diag(σ²(α_1), σ²(α_2), ..., σ²(α_k)).

Now let Σ be a positive definite covariance matrix, and let y = Σ^{1/2}z. Then

y ∼ f(y, α, Σ) = (2^k/|Σ|^{1/2}) φ_k(Σ^{-1/2}y) ∏_{j=1}^{k} Φ(α_j e_j'Σ^{-1/2}y).

Let x = y + µ, where µ ∈ R^k. The probability density function of x is

f(x, α, Σ, µ) = (2^k/|Σ|^{1/2}) φ_k(Σ^{-1/2}(x − µ)) ∏_{j=1}^{k} Φ(α_j e_j'Σ^{-1/2}(x − µ)).

Density

Definition 2.3.1. (Multivariate Skew Normal Density). We call the density

f(x, α, Σ, µ) = (2^k/|Σ|^{1/2}) φ_k(Σ^{-1/2}(x − µ)) ∏_{j=1}^{k} Φ(α_j e_j'Σ^{-1/2}(x − µ))   (2.16)

the density of a multivariate skew normal variable with location parameter µ, scale parameter Σ^{1/2}, and shape parameter α = (α_1, α_2, ..., α_k)' ∈ R^k, and denote it by sn_k(µ, Σ^{1/2}, α).
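An illustrative evaluation (ours; the function name and argument order are not from the original) of the sn_k(µ, Σ^{1/2}, α) density in (2.16), using the symmetric square root of Σ:

import numpy as np
from scipy.stats import norm, multivariate_normal

def sn_pdf(x, mu, Sigma, alpha):
    # (2.16): (2^k / |Sigma|^{1/2}) phi_k(z) prod_j Phi(alpha_j z_j), z = Sigma^{-1/2}(x - mu).
    alpha = np.asarray(alpha, float)
    k = alpha.size
    w, G = np.linalg.eigh(np.asarray(Sigma, float))
    Sigma_isqrt = G @ np.diag(1.0 / np.sqrt(w)) @ G.T          # symmetric Sigma^{-1/2}
    z = Sigma_isqrt @ (np.asarray(x, float) - np.asarray(mu, float))
    return (2.0**k / np.sqrt(np.linalg.det(np.asarray(Sigma, float)))) \
        * multivariate_normal.pdf(z, mean=np.zeros(k)) * np.prod(norm.cdf(alpha * z))

Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
print(sn_pdf([0.5, -0.2], mu=[0.0, 0.0], Sigma=Sigma, alpha=[3.0, 5.0]))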

Remark 2.3.1. Neither of the models (2.3), (2.4) can represent the joint density of independent univariate variables x_j ∼ SN_1(σ_j, α_j), except in some special cases. The joint density of independent univariate skew normal variables belongs to the new family of densities introduced by Definition 2.3.1.

We can take the matrix Σ^{1/2} in the linear form x = Σ^{1/2}z + µ to be any (m × k) matrix. Then we do not have a density for the random variable x in all cases; nevertheless, the distribution of x is defined by the moment generating function if it exists.

Moment Generating Function

We need the following lemmas; see Zacks ([61]) and Chen and Gupta ([17]).

Lemma 2.3.1. Let z ∼ φ_k(z). For a scalar b, a ∈ R^k, and Σ a positive definite matrix of order k, E(Φ(b + a'Σ^{1/2}z)) = Φ( b/(1 + a'Σa)^{1/2} ).

Lemma 2.3.2. Let Z ∼ φ_{k×n}(Z). For a scalar b, a ∈ R^k, and A and B positive definite matrices of order k and n respectively, E(Φ_n(b + a'AZB)) = Φ_n(b, (1 + a'AA'a)^{1/2}B).

Let z ∼ sn_k(0_k, I_k, α). The moment generating function of z evaluated at t_k ∈ R^k, M_z(t_k), can be obtained as follows:

M_z(t_k) = E_{φ_k(z)}(e^{t_k'z} 2^k ∏_{j=1}^{k} Φ(α_j e_j'z))
= (2^k/(2π)^{k/2}) ∫_{R^k} e^{−(1/2)z'z + t_k'z} ∏_{j=1}^{k} Φ(α_j e_j'z) dz
= 2^k e^{(1/2)t_k't_k} (1/(2π)^{k/2}) ∫_{R^k} e^{−(1/2)(z − t_k)'(z − t_k)} ∏_{j=1}^{k} Φ(α_j e_j'z) dz
= 2^k e^{(1/2)t_k't_k} ∏_{j=1}^{k} (1/√(2π)) ∫_{R} e^{−(1/2)(z_j − (t_k)_j)²} Φ(α_j z_j) dz_j
= 2^k e^{(1/2)t_k't_k} ∏_{j=1}^{k} (1/√(2π)) ∫_{R} e^{−(1/2)y_j²} Φ(α_j y_j + α_j(t_k)_j) dy_j
= 2^k e^{(1/2)t_k't_k} ∏_{j=1}^{k} Φ( α_j(t_k)_j/√(1 + α_j²) ).

Let x = Az + µ for constant (m k) matrix A and m dimensional constant vector µ. × m Then the moment generating function of x evaluated at t R is Mx(t ) can be m ∈ m 41 obtained as follows.

k ′ ′ 1 ′ ′ α (A′t ) tmµ k tmµ+ tmAA tm j m j Mx(tm)= e Mz(A′tm) = 2 e 2 Φ( ) 2 Yj=1 (1 + αj ) p

Hence the following definition and theorem.

Definition 2.3.2. (Multivariate Skew Normal Random Variable) Let $z_j \sim 2\phi(z)\Phi(\alpha_j z)$ for $j = 1, 2, \ldots, k$ be independent univariate skew normal random variables. Then
$$z = (z_1, z_2, \ldots, z_k)' \sim 2^k \phi_k(z) \prod_{j=1}^k \Phi(\alpha_j e_j' z)$$
where $\phi_k(z) = \prod_{j=1}^k \phi(z_j)$ and the $e_j$ are the elementary vectors of the coordinate system $\mathbb{R}^k$. Let $A$ be an $m \times k$ constant matrix, and let $\mu$ be an $m$-dimensional constant real vector. The $m$-dimensional random variable $x = Az + \mu$ is distributed according to the multivariate skew normal distribution with location parameter $\mu$, scale parameter $A$, and $k$-dimensional shape parameter $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)'$. We denote this by $x \sim SN_{m,k}(\mu, A, \alpha)$.

Theorem 2.3.2. If $x$ has the multivariate skew-normal distribution $SN_{m,k}(\mu, A, \alpha)$ then the moment generating function of $x$ evaluated at $t_m \in \mathbb{R}^m$ is given by
$$M_x(t_m) = 2^k e^{t_m'\mu + \frac{1}{2}t_m'AA't_m} \prod_{j=1}^k \Phi\!\left(\frac{\alpha_j (A't_m)_j}{\sqrt{1 + \alpha_j^2}}\right). \qquad (2.17)$$

Moments

The expectation and covariance of a multivariate skew normal variable $x$ with distribution $SN_{m,k}(\mu, A, \alpha)$ are easily calculated from the expectation and variance of the univariate skew normal variable. Let $z \sim sn_k(0_k, I_k, \alpha)$. Then,
$$E(x) = AE(z) + \mu = \sqrt{\frac{2}{\pi}}\, A \left(\frac{\alpha_1}{\sqrt{1+\alpha_1^2}}, \frac{\alpha_2}{\sqrt{1+\alpha_2^2}}, \ldots, \frac{\alpha_k}{\sqrt{1+\alpha_k^2}}\right)' + \mu,$$
$$\mathrm{Cov}(x) = A\,\mathrm{Cov}(z)\,A' = A\left(I_k - \frac{2}{\pi}\,\mathrm{diag}\!\left(\frac{\alpha_1^2}{1+\alpha_1^2}, \frac{\alpha_2^2}{1+\alpha_2^2}, \ldots, \frac{\alpha_k^2}{1+\alpha_k^2}\right)\right)A'.$$
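As a small sketch, the closed-form mean and covariance above can be computed directly; the function name `sn_mk_moments` is an illustrative assumption.

```python
import numpy as np

def sn_mk_moments(mu, A, alpha):
    """Mean and covariance of x = A z + mu, with independent z_j ~ sn_1(0, 1, alpha_j):
    E(z_j) = sqrt(2/pi) * alpha_j / sqrt(1 + alpha_j^2),
    Var(z_j) = 1 - (2/pi) * alpha_j^2 / (1 + alpha_j^2)."""
    alpha = np.asarray(alpha, dtype=float)
    delta = alpha / np.sqrt(1.0 + alpha**2)
    mean = A @ (np.sqrt(2.0 / np.pi) * delta) + mu
    cov = A @ np.diag(1.0 - (2.0 / np.pi) * delta**2) @ A.T
    return mean, cov
```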

Given its jmgf, the cumulants of the multivariate skew normal distribution with µ = 0 can be calculated. The cumulant generating function is given by

$$K_x(t) = \log(M_x(t)) = k\log(2) + \frac{1}{2}t'AA't + \sum_{j=1}^k \log\Phi\!\left(\frac{\alpha_j (A't)_j}{\sqrt{1+\alpha_j^2}}\right).$$

$$\frac{\partial K_x(t)}{\partial t}\bigg|_{t=0_k} = \left[AA't + \sum_{j=1}^k \frac{\alpha_j}{\sqrt{1+\alpha_j^2}}\, \frac{\phi\!\big(\tfrac{\alpha_j e_j'A't}{\sqrt{1+\alpha_j^2}}\big)}{\Phi\!\big(\tfrac{\alpha_j e_j'A't}{\sqrt{1+\alpha_j^2}}\big)}\, Ae_j\right]_{t=0_k} = \sqrt{\frac{2}{\pi}}\, A \left(\frac{\alpha_1}{\sqrt{1+\alpha_1^2}}, \frac{\alpha_2}{\sqrt{1+\alpha_2^2}}, \ldots, \frac{\alpha_k}{\sqrt{1+\alpha_k^2}}\right)'.$$

$$\frac{\partial^2 K_x(t)}{\partial t\,\partial t'}\bigg|_{t=0_k} = \left[AA' + \sum_{j=1}^k \frac{\alpha_j^2}{1+\alpha_j^2}\left(\frac{\phi'\!\big(\tfrac{\alpha_j e_j'A't}{\sqrt{1+\alpha_j^2}}\big)}{\Phi\!\big(\tfrac{\alpha_j e_j'A't}{\sqrt{1+\alpha_j^2}}\big)} - \left(\frac{\phi\!\big(\tfrac{\alpha_j e_j'A't}{\sqrt{1+\alpha_j^2}}\big)}{\Phi\!\big(\tfrac{\alpha_j e_j'A't}{\sqrt{1+\alpha_j^2}}\big)}\right)^2\right) Ae_j \otimes e_j'A'\right]_{t=0_k} = A\left(I_k - \frac{2}{\pi}\,\mathrm{diag}\!\left(\frac{\alpha_1^2}{1+\alpha_1^2}, \ldots, \frac{\alpha_k^2}{1+\alpha_k^2}\right)\right)A'.$$

$$\frac{\partial^3 K_x(t)}{\partial t\,\partial t'\,\partial t}\bigg|_{t=0_k} = \left(\frac{8}{\pi} - 2\right)\frac{1}{\sqrt{2\pi}}\, \sum_{j=1}^k \left(\frac{\alpha_j}{\sqrt{1+\alpha_j^2}}\right)^3 Ae_j \otimes e_j'A' \otimes Ae_j.$$

Linear Forms

By Definition 2.3.2 we can write $z \sim SN_{k,k}(0_k, I_k, \alpha)$, and prove the following theorems.

Theorem 2.3.3. Assume that $y \sim SN_{m,k}(\mu, A, \alpha)$ and $x = By + d$ with $B$ an $(l \times m)$ matrix and $d$ an $l$-dimensional real vector. Then $x \sim SN_{l,k}(B\mu + d, BA, \alpha)$.

Proof. From the assumption we have $y = Az + \mu$, and so $x = (BA)z + (B\mu + d)$, i.e., $x \sim SN_{l,k}(B\mu + d, BA, \alpha)$.

Corollary 2.3.1. Assume that $y \sim sn_k(\mu, A, \alpha)$ where $A$ is a full rank square matrix of order $k$. Let $\Sigma = AA'$, and let $\Sigma^{1/2}$ be the square root of $\Sigma$ as defined in Section 2.1.1. Assume $x \sim SN_{k,k}(\mu, \Sigma^{1/2}, \alpha)$. Then $x - \mu \stackrel{d}{=} O(y - \mu)$ where $O = \Sigma^{1/2} A^{-1}$.

Corollary 2.3.2. Assume that $y \sim SN_{m,k}(\mu, A, \alpha)$ and let
$$y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu^{(1)} \\ \mu^{(2)} \end{pmatrix}, \qquad A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$
where $y^{(1)}$, $\mu^{(1)}$ are $l$-dimensional and $A_{11}$ is $(l \times l)$. Then $y^{(1)} \sim SN_{l,k}\big(\mu^{(1)}, (A_{11}\ A_{12}), \alpha\big)$.

Proof. In the theorem above put $B = (I_l\ \ 0_{l \times (m-l)})$ and $d = 0_l$.

Corollary 2.3.3. If $y^{(1)} \sim SN_{m,k}(\mu^{(1)}, A, \alpha^{(1)})$ and $y^{(2)} \sim SN_{m,k}(\mu^{(2)}, B, \alpha^{(2)})$ are independent, then $x = y^{(1)} + y^{(2)}$ has the $SN_{m,2k}\big(\mu^{(1)} + \mu^{(2)}, [A, B], (\alpha^{(1)\prime}, \alpha^{(2)\prime})'\big)$ distribution.

Corollary 2.3.4. Let $y^{(i)} \sim SN_{m,k}(\mu, A, \alpha)$ be independent for $i = 1, 2, \ldots, n$. Then $S = \sum_{i=1}^n y^{(i)}$ has the $SN_{m,nk}\big(n\mu, [A, A, \ldots, A], (\alpha', \alpha', \ldots, \alpha')'\big)$ distribution. The jmgf can be written as
$$M_S(t_m) = 2^{nk} e^{t_m'n\mu + \frac{n}{2} t_m'AA't_m} \prod_{j=1}^k \left(\Phi\!\left(\frac{\alpha_j (A't_m)_j}{\sqrt{1+\alpha_j^2}}\right)\right)^n.$$
The jmgf of $S/n$ is
$$M_{S/n}(t_m) = 2^{nk} e^{t_m'\mu + \frac{1}{2n} t_m'AA't_m} \prod_{j=1}^k \left(\Phi\!\left(\frac{\alpha_j (A't_m)_j}{n\sqrt{1+\alpha_j^2}}\right)\right)^n.$$

Some Illustrations

To illustrate the preceding theory, we consider the bivariate skew normal distribution.

Assume $(x_1, x_2)' \sim SN_{2,k}\left((\mu_1, \mu_2)', \begin{pmatrix} a_{11} & \ldots & a_{1k} \\ a_{21} & \ldots & a_{2k} \end{pmatrix}, (\alpha_1, \ldots, \alpha_k)'\right)$. The mean vector is
$$E((x_1, x_2)') = \begin{pmatrix} a_{11} & \ldots & a_{1k} \\ a_{21} & \ldots & a_{2k} \end{pmatrix} (\mu(\alpha_1), \ldots, \mu(\alpha_k))' + (\mu_1, \mu_2)' = \begin{pmatrix} a_{11}\mu(\alpha_1) + \ldots + a_{1k}\mu(\alpha_k) + \mu_1 \\ a_{21}\mu(\alpha_1) + \ldots + a_{2k}\mu(\alpha_k) + \mu_2 \end{pmatrix}, \qquad (2.18)$$
and the covariance can be written as
$$C((x_1, x_2)') = \begin{pmatrix} a_{11} & \ldots & a_{1k} \\ a_{21} & \ldots & a_{2k} \end{pmatrix} \mathrm{diag}(\sigma^2(\alpha_1), \ldots, \sigma^2(\alpha_k)) \begin{pmatrix} a_{11} & \ldots & a_{1k} \\ a_{21} & \ldots & a_{2k} \end{pmatrix}'$$
$$= \begin{pmatrix} a_{11}^2\sigma^2(\alpha_1) + \ldots + a_{1k}^2\sigma^2(\alpha_k) & a_{11}a_{21}\sigma^2(\alpha_1) + \ldots + a_{1k}a_{2k}\sigma^2(\alpha_k) \\ a_{11}a_{21}\sigma^2(\alpha_1) + \ldots + a_{1k}a_{2k}\sigma^2(\alpha_k) & a_{21}^2\sigma^2(\alpha_1) + \ldots + a_{2k}^2\sigma^2(\alpha_k) \end{pmatrix} \qquad (2.19)$$
where $\mu(\alpha) = \sqrt{\tfrac{2}{\pi}}\,\tfrac{\alpha}{\sqrt{1+\alpha^2}}$ and $\sigma^2(\alpha) = 1 - \tfrac{2\alpha^2}{\pi(1+\alpha^2)}$.

First, let us take $k = 1$.

Then the mean vector becomes
$$E((x_1, x_2)') = (a_{11}, a_{21})'\mu(\alpha_1) + (\mu_1, \mu_2)' = (\mu(\alpha_1)a_{11} + \mu_1,\ \mu(\alpha_1)a_{21} + \mu_2)' \qquad (2.20)$$
$$= \left(\sqrt{\frac{2}{\pi}}\frac{\alpha_1 a_{11}}{\sqrt{1+\alpha_1^2}} + \mu_1,\ \sqrt{\frac{2}{\pi}}\frac{\alpha_1 a_{21}}{\sqrt{1+\alpha_1^2}} + \mu_2\right)', \qquad (2.21)$$
and the covariance can be written as
$$C((x_1, x_2)') = (a_{11}, a_{21})'\sigma^2(\alpha_1)(a_{11}, a_{21}) = \left(1 - \frac{2\alpha_1^2}{\pi(1+\alpha_1^2)}\right)\begin{pmatrix} a_{11}^2 & a_{11}a_{21} \\ a_{11}a_{21} & a_{21}^2 \end{pmatrix}. \qquad (2.22)$$

For $k = 2$, the mean vector becomes
$$E((x_1, x_2)') = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}(\mu(\alpha_1), \mu(\alpha_2))' + (\mu_1, \mu_2)' \qquad (2.23)$$
$$= \left(\sqrt{\frac{2}{\pi}}\frac{\alpha_1 a_{11}}{\sqrt{1+\alpha_1^2}} + \sqrt{\frac{2}{\pi}}\frac{\alpha_2 a_{12}}{\sqrt{1+\alpha_2^2}} + \mu_1,\ \sqrt{\frac{2}{\pi}}\frac{\alpha_1 a_{21}}{\sqrt{1+\alpha_1^2}} + \sqrt{\frac{2}{\pi}}\frac{\alpha_2 a_{22}}{\sqrt{1+\alpha_2^2}} + \mu_2\right)', \qquad (2.24)$$
and the covariance can be written as
$$C((x_1, x_2)') = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \mathrm{diag}(\sigma^2(\alpha_1), \sigma^2(\alpha_2)) \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}' \qquad (2.25)$$
$$= \begin{pmatrix} \sigma^2(\alpha_1)a_{11}^2 + \sigma^2(\alpha_2)a_{12}^2 & a_{11}a_{21}\sigma^2(\alpha_1) + a_{12}a_{22}\sigma^2(\alpha_2) \\ a_{11}a_{21}\sigma^2(\alpha_1) + a_{12}a_{22}\sigma^2(\alpha_2) & \sigma^2(\alpha_1)a_{21}^2 + \sigma^2(\alpha_2)a_{22}^2 \end{pmatrix}. \qquad (2.26)$$

For $k = 3$, the mean is
$$\begin{pmatrix} a_{11}\mu(\alpha_1) + a_{12}\mu(\alpha_2) + a_{13}\mu(\alpha_3) + \mu_1 \\ a_{21}\mu(\alpha_1) + a_{22}\mu(\alpha_2) + a_{23}\mu(\alpha_3) + \mu_2 \end{pmatrix}, \qquad (2.27)$$
and the covariance is
$$\begin{pmatrix} a_{11}^2\sigma^2(\alpha_1) + a_{12}^2\sigma^2(\alpha_2) + a_{13}^2\sigma^2(\alpha_3) & a_{11}a_{21}\sigma^2(\alpha_1) + a_{12}a_{22}\sigma^2(\alpha_2) + a_{13}a_{23}\sigma^2(\alpha_3) \\ a_{11}a_{21}\sigma^2(\alpha_1) + a_{12}a_{22}\sigma^2(\alpha_2) + a_{13}a_{23}\sigma^2(\alpha_3) & a_{21}^2\sigma^2(\alpha_1) + a_{22}^2\sigma^2(\alpha_2) + a_{23}^2\sigma^2(\alpha_3) \end{pmatrix}. \qquad (2.28)$$

Independence, Conditional Distributions

Theorem 2.3.4. Assume that $y \sim sn_k(\mu, \Sigma^{1/2}, \alpha)$ and let
$$y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu^{(1)} \\ \mu^{(2)} \end{pmatrix}, \qquad \alpha = \begin{pmatrix} \alpha^{(1)} \\ \alpha^{(2)} \end{pmatrix}, \qquad \Sigma^{1/2} = \begin{pmatrix} \Sigma_{11}^{1/2} & \Sigma_{12}^{1/2} \\ \Sigma_{21}^{1/2} & \Sigma_{22}^{1/2} \end{pmatrix}$$
where $y^{(1)}$, $\mu^{(1)}$, $\alpha^{(1)}$ are $l$-dimensional and $\Sigma_{11}^{1/2}$ is $(l \times l)$. If $\Sigma_{12}^{1/2} = 0$ and $\Sigma_{21}^{1/2} = 0$ then $y^{(1)} \sim sn_l(\mu^{(1)}, \Sigma_{11}^{1/2}, \alpha^{(1)})$ is independent of $y^{(2)} \sim sn_{k-l}(\mu^{(2)}, \Sigma_{22}^{1/2}, \alpha^{(2)})$.

Proof. The inverse of $\Sigma^{1/2}$ is
$$\Sigma^{-1/2} = \begin{pmatrix} \Sigma_{11}^{-1/2} & 0 \\ 0 & \Sigma_{22}^{-1/2} \end{pmatrix}.$$
Note that the quadratic form in the exponent of $sn_k(\mu, \Sigma^{1/2}, \alpha)$ is
$$Q = (y - \mu)'\Sigma^{-1}(y - \mu) = (y^{(1)} - \mu^{(1)})'\Sigma_{11}^{-1}(y^{(1)} - \mu^{(1)}) + (y^{(2)} - \mu^{(2)})'\Sigma_{22}^{-1}(y^{(2)} - \mu^{(2)}).$$
Also
$$\prod_{j=1}^k \Phi(\alpha_j e_j'\Sigma^{-1/2}(y - \mu)) = \prod_{j=1}^l \Phi(\alpha_j e_j'\Sigma_{11}^{-1/2}(y^{(1)} - \mu^{(1)})) \prod_{j=l+1}^k \Phi(\alpha_j e_j'\Sigma_{22}^{-1/2}(y^{(2)} - \mu^{(2)}))$$
and $|\Sigma| = |\Sigma_{11}||\Sigma_{22}|$. The density of $y$ can therefore be written as
$$sn_k(\mu, \Sigma^{1/2}, \alpha) = \frac{2^l}{|\Sigma_{11}|^{1/2}}\phi_l(\Sigma_{11}^{-1/2}(y^{(1)} - \mu^{(1)})) \prod_{j=1}^l \Phi(\alpha_j e_j'\Sigma_{11}^{-1/2}(y^{(1)} - \mu^{(1)}))$$
$$\times \frac{2^{k-l}}{|\Sigma_{22}|^{1/2}}\phi_{k-l}(\Sigma_{22}^{-1/2}(y^{(2)} - \mu^{(2)})) \prod_{j=1}^{k-l} \Phi(\alpha_{l+j} e_j'\Sigma_{22}^{-1/2}(y^{(2)} - \mu^{(2)}))$$
$$= sn_l(\mu^{(1)}, \Sigma_{11}^{1/2}, \alpha^{(1)})\, sn_{k-l}(\mu^{(2)}, \Sigma_{22}^{1/2}, \alpha^{(2)}).$$

Since the ordering of the variables is irrelevant, the above discussion also proves the following corollary.

Corollary 2.3.5. Let $y \sim sn_k(\mu, \Sigma^{1/2}, \alpha)$ and assume that $[\mathbf{i}, \mathbf{j}]$ partitions the indices $\{1, 2, \ldots, k\}$ such that $(\Sigma^{1/2})_{ij} = 0$ for all $i \in \mathbf{i}$ and all $j \in \mathbf{j}$. Then the marginal joint distribution of the variables with indices in $\mathbf{i}$ is skew normal, with location, scale, and shape parameters obtained by taking the corresponding components of $\mu$, $\Sigma^{1/2}$ and $\alpha$, respectively.

We can also prove the following:

Corollary 2.3.6. Let $y \sim sn_k(\mu, A, \alpha)$, write $A = (a_1', a_2', \ldots, a_k')'$, and assume that $[\mathbf{i}, \mathbf{j}]$ are two disjoint subsets of the indices $\{1, 2, \ldots, k\}$ such that $a_i'a_j = 0$ for all $i \in \mathbf{i}$ and all $j \in \mathbf{j}$. Let $A_{\mathbf{i}}$ denote the matrix of rows of $A$ with indices in $\mathbf{i}$ and $A_{\mathbf{j}}$ the matrix of rows of $A$ with indices in $\mathbf{j}$. Then there exists a $k$-dimensional skew normal variable, say $w$, that can be obtained from $y$ by a nonsingular linear transformation, such that for $w$ the variables with indices in $\mathbf{i}$ are independent of the variables with indices in $\mathbf{j}$.

Proof. The hypothesis about the matrix of the linear transformation $A$ requires $AA' = \Sigma$ to be such that $(\Sigma)_{ij} = 0$ for all $i \in \mathbf{i}$ and all $j \in \mathbf{j}$. Let $x \sim sn_k(\mu, \Sigma^{1/2}, \alpha)$; for this distribution the variables with indices in $\mathbf{i}$ are independent of the variables with indices in $\mathbf{j}$ by Corollary 2.3.5. By Corollary 2.3.1, we can write $y - \mu \stackrel{d}{=} O(x - \mu)$ for $O = \Sigma^{1/2}A^{-1}$. This concludes the proof.

Corollary 2.3.7. Assume that $x \sim sn_k(\mu, A, \alpha)$ and let
$$x = \begin{pmatrix} x^{(1)} \\ x^{(2)} \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu^{(1)} \\ \mu^{(2)} \end{pmatrix}, \qquad A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$
where $x^{(1)}$, $\mu^{(1)}$ are $l$-dimensional and $A_{11}$ is $(l \times l)$. Let
$$C = \begin{pmatrix} I_l & -A_{12}A_{22}^{-1} \\ 0 & I_{k-l} \end{pmatrix}.$$
Consider the variable
$$y = (CAA'C')^{1/2}(CA)^{-1}(Cx - C\mu), \qquad y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix}$$
where $y^{(1)}$ is $l$-dimensional. Then $y \sim sn_k(0_k, (CAA'C')^{1/2}, \alpha)$, and $y^{(1)}$ is independent of $y^{(2)}$.

Corollary 2.3.8. Assume that $x \sim sn_k(\mu, \Sigma_c^{1/2}, \alpha)$ and let
$$x = \begin{pmatrix} x^{(1)} \\ x^{(2)} \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu^{(1)} \\ \mu^{(2)} \end{pmatrix}, \qquad \alpha = \begin{pmatrix} \alpha^{(1)} \\ \alpha^{(2)} \end{pmatrix}, \qquad \Sigma_c^{1/2} = \begin{pmatrix} A_{11} & 0 \\ A_{21} & A_{22} \end{pmatrix}$$
where $x^{(1)}$, $\mu^{(1)}$ are $l$-dimensional, $A_{11}$ is an $(l \times l)$ lower triangular matrix with positive diagonal elements, and $A_{22}$ is a $((k-l) \times (k-l))$ lower triangular matrix with positive diagonal elements. Let
$$C = \begin{pmatrix} I_l & 0 \\ -A_{21}A_{11}^{-1} & I_{k-l} \end{pmatrix}.$$
Consider the variable
$$y = Cx = \begin{pmatrix} x^{(1)} \\ y^{(2)} \end{pmatrix}.$$
Then $x^{(1)}$ is independent of $y^{(2)} = x^{(2)} - A_{21}A_{11}^{-1}x^{(1)}$, which has the $sn_{k-l}(\mu^{(2)} - A_{21}A_{11}^{-1}\mu^{(1)}, A_{22}, \alpha^{(2)})$ distribution. The joint distribution of $y$ is given by
$$sn_l(\mu^{(1)}, A_{11}, \alpha^{(1)})\, sn_{k-l}(\mu^{(2)} - A_{21}A_{11}^{-1}\mu^{(1)}, A_{22}, \alpha^{(2)}).$$
The density of $x$ can then be obtained by substituting $y^{(2)} = x^{(2)} - A_{21}A_{11}^{-1}x^{(1)}$; the density of $x$ is
$$\frac{2^l}{|A_{11}|}\phi_l(A_{11}^{-1}(x^{(1)} - \mu^{(1)})) \prod_{j=1}^l \Phi(\alpha_j^{(1)} e_j'A_{11}^{-1}(x^{(1)} - \mu^{(1)}))$$
$$\times \frac{2^{k-l}}{|A_{22}|}\phi_{k-l}\big(A_{22}^{-1}(x^{(2)} - (\mu^{(2)} + A_{21}A_{11}^{-1}(x^{(1)} - \mu^{(1)})))\big)$$
$$\times \prod_{j=1}^{k-l} \Phi\big(\alpha_j^{(2)} e_j'A_{22}^{-1}(x^{(2)} - (\mu^{(2)} + A_{21}A_{11}^{-1}(x^{(1)} - \mu^{(1)})))\big).$$

The conditional distribution of $x^{(2)}$ given $x^{(1)}$ is $sn_{k-l}(\mu^{(2)} + A_{21}A_{11}^{-1}(x^{(1)} - \mu^{(1)}), A_{22}, \alpha^{(2)})$.

2.3.3 Generalized Multivariate Skew-Normal Distribution

We can define a multivariate skew normal density with any choice of skewing cdf $H$ that has the properties in Theorem 2.2.1. In this section, we define and study some properties of this generalized multivariate skew normal density.

Definition 2.3.3. (Generalized Multivariate Skew Normal Density) Let $x$ have a multivariate skew symmetric distribution with kernel $\phi_k(\cdot)$ and skewing distribution $H(\cdot)$ defined as in Definition 2.2.1, with location parameter $\mu = 0_k$, scale parameter $\Sigma^{1/2}$, and shape parameter $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)' \in \mathbb{R}^k$; i.e., the density of $x$ is
$$f(x, \alpha, \Sigma, \mu = 0_k) = \frac{2^k}{|\Sigma|^{1/2}}\, \phi_k(\Sigma^{-1/2}x) \prod_{j=1}^k H(\alpha_j e_j'\Sigma^{-1/2}x).$$
We call this density the generalized skew normal density, and denote it by $gsn_k^H(\mu, \Sigma, \alpha)$.

Definition 2.3.4. (Generalized Multivariate Skew Normal Random Variable) Let $H(\cdot)$ be defined as in Theorem 2.2.1. Let $z_j \sim 2\phi(z)H(\alpha_j z)$ for $j = 1, 2, \ldots, k$ be independent univariate skew symmetric random variables. Then
$$z = (z_1, z_2, \ldots, z_k)' \sim 2^k \phi_k(z) \prod_{j=1}^k H(\alpha_j e_j'z)$$
where $\phi_k(z) = \prod_{j=1}^k \phi(z_j)$ and the $e_j$ are the elementary vectors of the coordinate system $\mathbb{R}^k$. Let $A$ be an $m \times k$ constant matrix, and let $\mu$ be an $m$-dimensional constant real vector. The $m$-dimensional random variable $x = Az + \mu$ is distributed according to the generalized multivariate skew normal distribution with location parameter $\mu$, scale parameter $A$, and $k$-dimensional shape parameter $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_k)'$. We denote this by $x \sim GSN_{m,k}^H(\mu, A, \alpha)$.

Theorem 2.3.5. If $x$ has the generalized multivariate skew normal distribution $GSN_{m,k}^H(\mu, A, \alpha)$ then the moment generating function of $x$ evaluated at $t_m \in \mathbb{R}^m$ is given by
$$M_x(t_m) = 2^k e^{t_m'\mu + \frac{1}{2}t_m'AA't_m} \prod_{j=1}^k E_{\phi(y)}\big(H(\alpha_j y_j + \alpha_j (A't_m)_j)\big). \qquad (2.29)$$

Proof. If $z$ has the generalized multivariate skew symmetric density $gsn_k^H(0_k, I_k, \alpha)$ then the moment generating function of $z$ evaluated at $t_k \in \mathbb{R}^k$ is given by
$$
\begin{aligned}
M_z(t_k) &= E_{\phi_k(z)}\Big(e^{t_k'z} \prod_{j=1}^k H(\alpha_j e_j'z)\Big)
= \frac{2^k}{(2\pi)^{k/2}} \int_{\mathbb{R}^k} e^{-\frac{1}{2}z'z + t_k'z} \prod_{j=1}^k H(\alpha_j e_j'z)\, dz \\
&= \frac{2^k e^{\frac{1}{2}t_k't_k}}{(2\pi)^{k/2}} \int_{\mathbb{R}^k} e^{-\frac{1}{2}(z - t_k)'(z - t_k)} \prod_{j=1}^k H(\alpha_j e_j'z)\, dz \\
&= 2^k e^{\frac{1}{2}t_k't_k} \prod_{j=1}^k \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}y_j^2}\, H(\alpha_j y_j + \alpha_j (t_k)_j)\, dy_j
= 2^k e^{\frac{1}{2}t_k't_k} \prod_{j=1}^k E_{\phi(y)}\big(H(\alpha_j y_j + \alpha_j (t_k)_j)\big).
\end{aligned}
$$
Let $x = Az + \mu$ for a constant $(m \times k)$ matrix $A$ and an $m$-dimensional constant vector $\mu$. Then the moment generating function of $x$ evaluated at $t_m \in \mathbb{R}^m$ is
$$M_x(t_m) = e^{t_m'\mu} M_z(A't_m) = 2^k e^{t_m'\mu + \frac{1}{2}t_m'AA't_m} \prod_{j=1}^k E_{\phi(y)}\big(H(\alpha_j y_j + \alpha_j (A't_m)_j)\big).$$

By Definition 2.3.4 we can write $z \sim gsn_{k,k}^H(0_k, I_k, \alpha)$, and prove the following theorem.

Theorem 2.3.6. Assume that $y \sim GSN_{m,k}^H(\mu, A, \alpha)$ and $x = By + d$ with $B$ an $(l \times m)$ matrix and $d$ an $l$-dimensional real vector. Then $x \sim GSN_{l,k}^H(B\mu + d, BA, \alpha)$.

Proof. From the assumption we have $y = Az + \mu$, and so $x = (BA)z + (B\mu + d)$, i.e., $x \sim GSN_{l,k}^H(B\mu + d, BA, \alpha)$.

Theorem 2.3.7. Assume that $y \sim GSN_{m,k}^H(\mu, A, \alpha)$ and let
$$y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu^{(1)} \\ \mu^{(2)} \end{pmatrix}, \qquad A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$
where $y^{(1)}$, $\mu^{(1)}$ are $l$-dimensional and $A_{11}$ is $(l \times l)$. Then $y^{(1)} \sim GSN_{l,k}^H\big(\mu^{(1)}, (A_{11}\ A_{12}), \alpha\big)$.

Proof. In the theorem above put $B = (I_l\ \ 0_{l \times (m-l)})$ and $d = 0_l$. Then we obtain $y^{(1)} \sim GSN_{l,k}^H\big(\mu^{(1)}, (A_{11}\ A_{12}), \alpha\big)$.

In the rest of this section we study some members of the generalized multivariate skew normal family. Different choices of the skewing function $H(\cdot)$ give different distribution families. We will take $H(\cdot)$ to be the cdf of the Laplace, logistic, or uniform distribution in the sequel, and some properties of these distributions are studied.

Multivariate Skew Normal-Laplace Distribution

The cdf of the Laplace distribution is given by
$$H_1(x) = \begin{cases} 1 - \frac{1}{2}\exp(-x), & x \ge 0, \\ \frac{1}{2}\exp(x), & x < 0. \end{cases}$$

If we put $H(\cdot) = H_1(\cdot)$ in the definition of the generalized skew normal density in Definition 2.3.3 we get the skew normal-Laplace density. To evaluate the jmgf of a skew normal-Laplace random vector we use the following lemma ([40]):

Lemma 2.3.3. Let $z \sim \phi(z)$. Then
$$E_z(H_1(a + bz)) = \frac{1}{2}\exp\!\left(a + \frac{b^2}{2}\right)\Phi\!\left(\frac{-(a + b^2)}{|b|}\right) + \Phi\!\left(\frac{a}{|b|}\right) - \frac{1}{2}\exp\!\left(-a + \frac{b^2}{2}\right)\Phi\!\left(\frac{a - b^2}{|b|}\right).$$

Corollary 2.3.9. The jmgf of a $k$-dimensional skew normal-Laplace random vector $x$ with location parameter $\mu$, scale parameter $A$, and shape parameter $\alpha$ is given by
$$M_x(t_m) = 2^k e^{t_m'\mu + \frac{1}{2}t_m'AA't_m} \prod_{j=1}^k \left(\frac{1}{2} e^{\alpha_j + \frac{1}{2}(A't_m)_j^2}\,\Phi\!\left(\frac{-(\alpha_j + (A't_m)_j^2)}{|(A't_m)_j|}\right) + \Phi\!\left(\frac{\alpha_j}{|(A't_m)_j|}\right) - \frac{1}{2} e^{-\alpha_j + \frac{1}{2}(A't_m)_j^2}\,\Phi\!\left(\frac{\alpha_j - (A't_m)_j^2}{|(A't_m)_j|}\right)\right).$$

Proof. By Theorem 2.3.5 and Lemma 2.3.3.

The odd moments of the standard univariate skew normal-Laplace random variable are given in ([50]); the even moments are the same as the even moments of the univariate standard normal random variable. Suppose $x$ is a random variable with the standard skew normal-Laplace distribution with shape parameter $\alpha$. The first moment of $x$ is given by
$$\mu(\alpha) = E(x) = 2\alpha e^{\frac{\alpha^2}{2}}\Phi(-\alpha),$$
and we can calculate the variance from the first and second moments:
$$\sigma^2(\alpha) = \mathrm{var}(x) = E(x^2) - E^2(x) = 1 - \big(2\alpha e^{\frac{\alpha^2}{2}}\Phi(-\alpha)\big)^2.$$
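A short numerical sketch of these two moments follows. The function name is an illustrative assumption; working on the log scale (via SciPy's `norm.logcdf`) is only a suggested device to avoid overflow of $e^{\alpha^2/2}$ for large $|\alpha|$.

```python
import numpy as np
from scipy.stats import norm

def sn_laplace_moments(alpha):
    """mu(alpha) = 2*alpha*exp(alpha^2/2)*Phi(-alpha) and sigma^2(alpha) = 1 - mu(alpha)^2,
    evaluated as exp(alpha^2/2 + log Phi(-alpha)) for numerical stability."""
    alpha = np.asarray(alpha, dtype=float)
    mu = 2.0 * alpha * np.exp(0.5 * alpha**2 + norm.logcdf(-alpha))
    return mu, 1.0 - mu**2
```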

Let $x$ have the $k$-dimensional skew normal-Laplace distribution with location parameter $\mu$, scale parameter $A$, and shape parameter $\alpha$. Then
$$E(x) = A(\mu(\alpha_1), \mu(\alpha_2), \ldots, \mu(\alpha_k))' + \mu,$$
$$\mathrm{Cov}(x) = A\big(I_k - \mathrm{diag}(\mu(\alpha_1)^2, \mu(\alpha_2)^2, \ldots, \mu(\alpha_k)^2)\big)A'.$$

Multivariate Skew Normal-Logistic Distribution

The cdf of the logistic distribution is given by
$$H_2(x) = \frac{1}{1 + e^{-x}} = \begin{cases} \displaystyle\sum_{\ell=0}^{\infty} \binom{-1}{\ell} e^{-\ell x}, & x \ge 0, \\[2ex] \displaystyle e^{x}\sum_{\ell=0}^{\infty} \binom{-1}{\ell} e^{\ell x}, & x < 0. \end{cases}$$

If we put $H(\cdot) = H_2(\cdot)$ in the definition of the generalized skew normal density in Definition 2.3.3 we get the skew normal-logistic density. To evaluate the jmgf of a skew normal-logistic random vector we use the following lemma ([40]):

Lemma 2.3.4. Let $z \sim \phi(z)$. Then
$$E_z(H_2(a + bz)) = \sum_{\ell=0}^{\infty} \binom{-1}{\ell}\left[e^{-\ell a + \frac{(\ell b)^2}{2}}\,\Phi\!\left(\frac{a - \ell b^2}{|b|}\right) + e^{(\ell+1)a + \frac{((\ell+1)b)^2}{2}}\,\Phi\!\left(\frac{-a - (\ell+1)b^2}{|b|}\right)\right].$$

Corollary 2.3.10. The jmgf of a $k$-dimensional skew normal-logistic random vector $x$ with location parameter $\mu$, scale parameter $A$, and shape parameter $\alpha$ is given by
$$M_x(t_m) = 2^k e^{t_m'\mu + \frac{1}{2}t_m'AA't_m} \prod_{j=1}^k \sum_{\ell=0}^{\infty} \binom{-1}{\ell}\left[e^{-\ell\alpha_j + \frac{(\ell (A't_m)_j)^2}{2}}\,\Phi\!\left(\frac{\alpha_j - \ell (A't_m)_j^2}{|(A't_m)_j|}\right) + e^{(\ell+1)\alpha_j + \frac{((\ell+1)(A't_m)_j)^2}{2}}\,\Phi\!\left(\frac{-\alpha_j - (\ell+1)(A't_m)_j^2}{|(A't_m)_j|}\right)\right].$$

Proof. By Theorem 2.3.5 and Lemma 2.3.4.

The odd moments of the standard univariate skew normal-logistic random variable are given in ([50]); the even moments are the same as the even moments of the univariate standard normal random variable. Suppose $x$ is a random variable with the standard skew normal-logistic distribution with shape parameter $\alpha$. The first moment of $x$ is given by
$$\mu(\alpha) = E(x) = 2\alpha\sum_{\ell=0}^{\infty}\binom{-1}{\ell}\left[\ell\, e^{\frac{(\ell\alpha)^2}{2}}\Phi(-\ell\alpha) - (\ell+1)\, e^{\frac{((\ell+1)\alpha)^2}{2}}\Phi(-(\ell+1)\alpha)\right],$$
and we can calculate the variance from the first and second moments:
$$\sigma^2(\alpha) = \mathrm{var}(x) = E(x^2) - E^2(x) = 1 - (\mu(\alpha))^2.$$

Let $x$ have the $k$-dimensional skew normal-logistic distribution with location parameter $\mu$, scale parameter $A$, and shape parameter $\alpha$. Then
$$E(x) = A(\mu(\alpha_1), \mu(\alpha_2), \ldots, \mu(\alpha_k))' + \mu,$$
$$\mathrm{Cov}(x) = A\big(I_k - \mathrm{diag}(\mu(\alpha_1)^2, \mu(\alpha_2)^2, \ldots, \mu(\alpha_k)^2)\big)A'.$$

Multivariate Skew Normal-uniform Distribution

The cdf of the uniform distribution is given by
$$H_3(x) = \begin{cases} 0, & x < -h, \\ \frac{x + h}{2h}, & -h \le x \le h, \\ 1, & h < x. \end{cases}$$

If we put $H(\cdot) = H_3(\cdot)$ in the definition of the generalized skew normal density in Definition 2.3.3 we get the skew normal-uniform density. To evaluate the jmgf of a skew normal-uniform random vector we use the following lemma ([40]):

Lemma 2.3.5. Let $z \sim \phi(z)$. Then
$$E_z(H_3(a + bz)) = \frac{|b|}{2h\sqrt{2\pi}}\exp\!\left(-\frac{(h + a)^2}{2b^2}\right) - \frac{|b|}{2h\sqrt{2\pi}}\exp\!\left(-\frac{(h - a)^2}{2b^2}\right) + \left(\frac{a}{2h} - \frac{1}{2}\right)\left(\Phi\!\left(\frac{h - a}{|b|}\right) - 1\right) + \left(\frac{a}{2h} + \frac{1}{2}\right)\Phi\!\left(\frac{h + a}{|b|}\right).$$

Corollary 2.3.11. The jmgf of a $k$-dimensional skew normal-uniform random vector $x$ with location parameter $\mu$, scale parameter $A$, and shape parameter $\alpha$ is given by
$$M_x(t_m) = 2^k e^{t_m'\mu + \frac{1}{2}t_m'AA't_m} \prod_{j=1}^k \left[\frac{|(A't_m)_j|}{2h\sqrt{2\pi}}\exp\!\left(-\frac{(h + \alpha_j)^2}{2(A't_m)_j^2}\right) - \frac{|(A't_m)_j|}{2h\sqrt{2\pi}}\exp\!\left(-\frac{(h - \alpha_j)^2}{2(A't_m)_j^2}\right) + \left(\frac{\alpha_j}{2h} - \frac{1}{2}\right)\left(\Phi\!\left(\frac{h - \alpha_j}{|(A't_m)_j|}\right) - 1\right) + \left(\frac{\alpha_j}{2h} + \frac{1}{2}\right)\Phi\!\left(\frac{h + \alpha_j}{|(A't_m)_j|}\right)\right].$$

Proof. By Theorem 2.3.5 and Lemma 2.3.5.

The odd moments of the standard univariate skew normal-uniform random variable are given in ([50]); the even moments are the same as the even moments of the univariate standard normal random variable. Suppose $x$ is a random variable with the standard skew normal-uniform distribution with shape parameter $\alpha$. The first moment of $x$ is given by
$$\mu(\alpha) = E(x) = \frac{\alpha}{h}\left(2\Phi\!\left(\frac{h}{\alpha}\right) - 1\right),$$
and we can calculate the variance from the first and second moments:
$$\sigma^2(\alpha) = \mathrm{var}(x) = E(x^2) - E^2(x) = 1 - (\mu(\alpha))^2.$$

Let $x$ have the $k$-dimensional skew normal-uniform distribution with location parameter $\mu$, scale parameter $A$, and shape parameter $\alpha$. Then
$$E(x) = A(\mu(\alpha_1), \mu(\alpha_2), \ldots, \mu(\alpha_k))' + \mu,$$
$$\mathrm{Cov}(x) = A\big(I_k - \mathrm{diag}(\mu(\alpha_1)^2, \mu(\alpha_2)^2, \ldots, \mu(\alpha_k)^2)\big)A'.$$

2.3.4 Stochastic Representation, Random Number Generation

We have defined the $k$-dimensional multivariate skew symmetric random variable through a location-scale family based on $k$ independent univariate skew symmetric random variables. This fact can be utilized to give a stochastic representation for the multivariate skew symmetric random variable, which in turn can be used for random number generation.

By putting $k = 1$ in Theorem 2.2.1, we get the density of a univariate skew symmetric variable
$$f(y, \alpha) = 2g(y)H(\alpha y) \qquad (2.30)$$
where $\alpha$ is a real scalar, $g(\cdot)$ is a univariate density function symmetric around 0, and $H(\cdot)$ is an absolutely continuous cumulative distribution function with $H'(\cdot) = h(\cdot)$ symmetric around 0.

A probabilistic representation of a univariate skew symmetric random variable is given in the following ([10]).

Theorem 2.3.8. Let $h(\cdot)$ and $g(\cdot)$ be defined as above. If $v \sim h(v)$ and $u \sim g(u)$ are independently distributed random variables then the random variable
$$y = \begin{cases} u, & v \le \alpha u, \\ -u, & v > \alpha u \end{cases}$$
has density $f(y, \alpha)$ as given in (2.30) above.

Corollary 2.3.12. Let $z_j \sim f(z, \alpha_j) = 2g(z)H(\alpha_j z)$ for $j = 1, 2, \ldots, k$ be independent variables. The joint p.d.f. of $z = (z_1, z_2, \ldots, z_k)'$ is
$$f(z, \alpha) = 2^k g^*(z) \prod_{j=1}^k H(\alpha_j e_j'z)$$
where $g^*(z) = \prod_{j=1}^k g(z_j)$ and the $e_j$ are the elementary vectors of the coordinate system $\mathbb{R}^k$. For $\mu \in \mathbb{R}^k$ and a positive definite matrix $\Sigma^{1/2}$, $x = \Sigma^{1/2}z + \mu$ has p.d.f.
$$f(x, \alpha, \Sigma^{1/2}, \mu) = \frac{2^k}{|\Sigma|^{1/2}}\, g^*(\Sigma^{-1/2}(x - \mu)) \prod_{j=1}^k H(\alpha_j e_j'\Sigma^{-1/2}(x - \mu)). \qquad (2.31)$$
A random variable $y$ with density $f(y, \alpha, \Sigma^{1/2}, \mu)$ has the same distribution as $x$.

Using Theorem 2.3.8 and Corollary 2.3.12, we can generate skew symmetric random vectors from the $sn_k(\mu, \Sigma^{1/2}, \alpha)$ density, as sketched below.
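The following is a minimal generator based on Theorem 2.3.8 with $g = \phi$ and $H = \Phi$ (so both $u$ and $v$ are standard normal), followed by the location-scale step of Corollary 2.3.12. The function name and the use of NumPy are illustrative assumptions; the symmetry of $\Sigma^{1/2}$ is assumed as in the definition of the matrix square root.

```python
import numpy as np

def rvs_sn_k(n, mu, sigma_sqrt, alpha, rng=None):
    """Draw n vectors from sn_k(mu, Sigma^{1/2}, alpha) via the representation of
    Theorem 2.3.8: z_j = u_j if v_j <= alpha_j * u_j, else -u_j, with u, v ~ N(0,1)
    independent; then x = Sigma^{1/2} z + mu."""
    rng = np.random.default_rng(rng)
    alpha = np.asarray(alpha, dtype=float)
    k = alpha.size
    u = rng.standard_normal((n, k))
    v = rng.standard_normal((n, k))
    z = np.where(v <= alpha * u, u, -u)        # componentwise skewing
    return z @ sigma_sqrt.T + mu               # rows are x_i' = (Sigma^{1/2} z_i + mu)'
```

A quick check is to compare the sample mean and covariance of a large draw with the closed forms given earlier in this section.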

2.4 Extensions: Some Other Skew Symmetric Forms

In Theorems 2.2.1 and 2.2.2 we have seen that a function of the form
$$f(y, \alpha) = c^{-1} g(y) \prod_{j=1}^k H(\alpha_j y_j)$$
is a jpdf for a $k$-dimensional random variable if certain assumptions are met. Genton and Ma ([45]) obtain a flexible family of densities by replacing $\alpha y$ in this density by an odd function, say $w(y)$. This motivates the following theorem.

Theorem 2.4.1. Let $g(\cdot)$ be a $k$-dimensional density centrally symmetric around $0_k$ (i.e. $g(x) = g(-x)$), let $H(\cdot)$ be an absolutely continuous cumulative distribution function with $H'(\cdot)$ symmetric around 0, and let $w(\cdot)$ be an odd function. Then
$$f(y) = c^{-1} g(y) \prod_{j=1}^k H(w(e_j'y)) \qquad (2.32)$$
is a jpdf for a $k$-dimensional random variable, where $c = E_x\big(\prod_{j=1}^k P(z_j \le w(x_j) \mid x)\big)$ for $(z_1, z_2, \ldots, z_k)'$ iid random variables with cdf $H(\cdot)$ independent of $x \sim g(x)$.

Proof. $g(y)\prod_{j=1}^k H(w(y_j))$ is positive. We need to calculate $c$:
$$c = \int_{\mathbb{R}^k} g(y) \prod_{j=1}^k H(w(y_j))\, dy = E_x\Big(\prod_{j=1}^k P(z_j \le w(x_j) \mid x)\Big),$$
since $(z_1, z_2, \ldots, z_k)'$ are iid random variables with cdf $H(\cdot)$ independent of $x \sim g(x)$.

Observe that when $g(x)$ is a density for $k$ independent variables $x_1, x_2, \ldots, x_k$, for example when we choose $g(x) = \frac{1}{2^k}$, $g(x) = \phi_k(x)$, or $g(x) = \frac{1}{2^k}\exp(-\sum_{j=1}^k |x_j|)$ as in Examples 2.2.1, 2.2.2, and 2.2.3 respectively, $\prod_{j=1}^k P(z_j \le w(x_j)) = \prod_{j=1}^k P(z_j - w(x_j) \le 0)$ becomes $\frac{1}{2^k}$, since assuming that both $H'(x)$ and $\phi_1(x)$ are symmetric implies that $w(x_j)$, and therefore $z_j - w(x_j)$, for $j = 1, 2, \ldots, k$ have independent and identical symmetric distributions.

Some examples of models motivated by 2.4.1 are presented next:

Example 2.4.1. Take $w(x) = \frac{\lambda_1 x}{\sqrt{1 + \lambda_2 x^2}}$; we obtain a multivariate form of the skew symmetric family introduced by Arellano-Valle et al. ([7]). This density is illustrated in two dimensions in Figure 2.7.

Example 2.4.2. Take $w(x) = \alpha x + \beta x^3$; we obtain a multivariate form of the skew symmetric family introduced by Ma and Genton ([45]). This density is illustrated in two dimensions in Figure 2.8.

Figure 2.7: The surface of the 2-dimensional pdf for the skew-normal variable in Example 2.4.1 (λ11 = 2, λ12 = −4, λ21 = 50, λ22 = 15).

Figure 2.8: The surface of the 2-dimensional pdf for the skew-normal variable in Example 2.4.2 with α1 = 2, α2 = 2, β1 = 5, β2 = 10.

Figure 2.9: The surface of the 2-dimensional pdf for the skew-normal variable in Example 2.4.3 with α1 = 2, α2 = 4, λ1 = 1, λ2 = 2.

Example 2.4.3. Take $w(x) = \mathrm{sign}(x)|x|^{\alpha/2}\lambda(2/\alpha)^{1/2}$; we obtain a multivariate form of the skew symmetric family introduced by DiCiccio and Monti ([23]). This density is illustrated in two dimensions in Figures 2.9, 2.10, 2.11 and 2.12.
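As a small sketch, the three odd functions of Examples 2.4.1-2.4.3 and the corresponding two-dimensional density of Theorem 2.4.1 (with $g = \phi_2$, $H = \Phi$, so that $c = 1/2^k$ by the remark after the theorem) can be coded as follows. The Python function names are illustrative assumptions, and the parameterization of Example 2.4.3 follows the reconstruction given in that example.

```python
import numpy as np
from scipy.stats import norm

def w_arellano(x, lam1, lam2):          # Example 2.4.1
    return lam1 * x / np.sqrt(1.0 + lam2 * x**2)

def w_ma_genton(x, a, b):               # Example 2.4.2: alpha*x + beta*x^3
    return a * x + b * x**3

def w_diciccio_monti(x, a, lam):        # Example 2.4.3 (parameterization as reconstructed)
    return np.sign(x) * np.abs(x)**(a / 2.0) * lam * np.sqrt(2.0 / a)

def skew_symmetric_pdf_2d(y1, y2, w):
    """f(y) = 2^2 * phi(y1) phi(y2) * H(w(y1)) H(w(y2)) with g = phi_2 and H = Phi."""
    return 4.0 * norm.pdf(y1) * norm.pdf(y2) * norm.cdf(w(y1)) * norm.cdf(w(y2))
```

For a surface such as Figure 2.8, one would evaluate `skew_symmetric_pdf_2d` on a grid with, e.g., `w = lambda x: w_ma_genton(x, 2.0, 5.0)`.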

If we take a linear transformation of this variable we get the graphs shown in Figures 2.11 and 2.12.

By the definition of the multivariate weighted distribution in Equation 2.6, the skewing function of the multivariate skew normal density in Definition 2.3.1, i.e.
$$\prod_{j=1}^k \Phi(\alpha_j e_j'(A^{-1}(x - \mu))),$$
can be replaced by
$$\Phi_m(\Gamma A^{-1}(x - \mu))$$

Figure 2.10: The contours of the 2-dimensional pdf for the skew-normal variable in Example 2.4.3 with α1 = 2, α2 = 4, λ1 = 1, λ2 = 2.

Figure 2.11: The surface of the 2-dimensional pdf for the skew-normal variable in Example 2.4.3 with α1 = 2, α2 = 4, λ1 = 1, λ2 = 2, location $\mu = (0, 0)'$, and $\Sigma^{1/2} = \begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix}$.

Figure 2.12: The contours of the 2-dimensional pdf for the skew-normal variable in Example 2.4.3 with α1 = 2, α2 = 4, λ1 = 1, λ2 = 2, location $\mu = (0, 0)'$, and $\Sigma^{1/2} = \begin{pmatrix} 1 & 0 \\ 1 & 2 \end{pmatrix}$.

where $\Gamma$ is a matrix of dimension $m \times k$. In this case, the normalizing constant $2^k$ will have to be changed to

$$E_z\big(P(y < \Gamma A^{-1}(z - \mu) \mid z)\big),$$
and for $y \sim \phi_m(0_m, I_m)$ and $z \sim \phi_k(\mu, AA')$ this is equal to
$$\Phi_m(0; I_m + \Gamma\Gamma').$$

This extension of the multivariate skew normal density is given by Gupta et al. [34] and is similar to the fundamental skew distribution of Arellano-Valle, Branco, and Genton ([5]). Note that if $k = m$ and $\Gamma$ is diagonal then we are back to the density in Definition 2.3.1.

CHAPTER 3

A Class of Matrix Variate Skew Symmetric Distributions

3.1 Distribution of Random Sample

In order to make inferences about the population parameters of a $k$-dimensional distribution, we work with a random sample of $n$ individuals from this population, which can be represented by a $k \times n$ matrix $X$. The typical multivariate analysis assumes independence among the individuals. However, in many situations this assumption may be too restrictive. Matrix variable distributions can account for dependence among individual observations.

In the normal case, the matrix variate density of $X$ is written as ([36])
$$\phi_{k \times n}(X; M, AA', B'B) = \frac{\mathrm{etr}\big(-\frac{1}{2}(AA')^{-1}(X - M)(B'B)^{-1}(X - M)'\big)}{(2\pi)^{nk/2}|AA'|^{n/2}|B'B|^{k/2}}. \qquad (3.1)$$

As in the multivariate case, the matrix $A$ determines how the variables are related; the matrix $B$ is introduced to account for dependence among individuals. Note that, for the density in Equation 3.1 to be defined, $AA'$ and $B'B$ should be positive definite. In practice, however, there will be natural restrictions on the parameter space. First, knowledge about the sample design through which the variable $X$ is observed can be reflected in the parameter space. Second, simplifying assumptions about the relationship of the $k$ variables, such as independence or zero correlation assumptions, can be utilized to achieve model parsimony. Finally, if there are no restrictions on $AA'$ or $B'B$ then these parameters are confounded ([37]).

A detailed investigation of the sources, magnitude and impact of errors is necessary to identify how survey design and procedures may be improved. For example, most survey and sample designs involve some overlap between interviewer workloads and the sampling units (clusters). In those cases, a proportion of the measurement variance which is due to interviewers is reflected to some degree in the sampling variance calculations. The variable effects that interviewers have on respondent answers are sometimes labeled the correlated response variance ([14]). In the next example, we illustrate the use of the matrix variate normal distribution to account for the correlated response variance.

Example 3.1.1. Suppose that the $k$-dimensional random variable $x$ has a spherical multivariate normal density over a population, i.e. $x \sim \frac{1}{\sigma^k}\phi_k((\sigma I_k)^{-1}x)$. In order to make inferences about the scale parameter $\sigma^2$, a random sample of individuals from this population is to be observed. In a realistic setting, there will be more than one interviewer to accomplish the work; each interviewer observes a partition of the random sample of individuals. It is reasonable to assume that each interviewer introduces a degree of correlation to their observations and that observations from different interviewers are uncorrelated. Now, for simplicity, assume that there are only two interviewers available to observe the random sample of $n$ individuals, say I-1 and I-2.

I-1 is going to observe $n_1$ of the individuals and introduces a correlation $\rho_1$, and I-2 is responsible for observing the remainder of the sample and introduces a correlation of $\rho_2$. As the following results show, the measurement variance which is due to the interviewers is reflected to some degree in the sampling variance calculations.

The likelihood function given the observations can be represented with the matrix variate normal density: $X = (x_1, x_2, \ldots, x_n) \sim \phi_{k \times n}(X; M, AA', B'B)$ with $M = 0_{k \times n}$, $AA' = \sigma^2 I_k$ and
$$B'B = \begin{pmatrix} \Psi_1 & 0_{n_1 \times (n - n_1)} \\ 0_{(n - n_1) \times n_1} & \Psi_2 \end{pmatrix} = \Psi.$$
$\Psi_1$ is an $n_1 \times n_1$ symmetric matrix with all diagonal entries equal to one and all its off-diagonal entries equal to $\rho_1$. Similarly, $\Psi_2$ is an $(n - n_1) \times (n - n_1)$ symmetric matrix with all diagonal entries equal to one and all its off-diagonal entries equal to $\rho_2$.

The log-likelihood is
$$l(\sigma, \rho_1, \rho_2; X) = \log\!\left(\frac{1}{(2\pi)^{nk/2}}\right) - \frac{nk}{2}\log(\sigma^2) - \frac{k}{2}\log|\Psi_1| - \frac{k}{2}\log|\Psi_2| - \frac{\sigma^{-2}}{2}\mathrm{tr}(X'X\Psi^{-1}).$$

In order to find the ML estimators of $\rho_1$, $\rho_2$ and $\sigma^2$, we equate the partial derivatives of $l(\sigma, \rho_1, \rho_2; X)$ to zero. According to this, if both $\rho_1$ and $\rho_2$ are known, an asymptotically unbiased maximum likelihood estimator of $\sigma^2$ is given by
$$\hat\sigma^2 = \frac{\mathrm{tr}(X'X\Psi^{-1})}{nk}. \qquad (3.2)$$
The usual maximum likelihood estimator from the uncorrelated model, $\hat\sigma^2 = \frac{\mathrm{tr}(X'X)}{nk}$, is asymptotically biased for $\sigma^2$, where the size of the bias depends on the magnitudes of $\rho_1$ and $\rho_2$. A comparison of the two estimators is sketched below.

Chen and Gupta extend the matrix normal distribution to accommodate skewness in the following form ([17]):
$$f_1(X, \Sigma \otimes \Psi, b) = c_1^*\, \phi_{k \times n}(X; \Sigma \otimes \Psi)\,\Phi_n(X'b, \Psi) \qquad (3.3)$$
where $c_1^* = (\Phi_n(0, (1 + b'\Sigma b)\Psi))^{-1}$. A drawback of this definition is that it allows independence only over its rows or columns, but not both. Harrar ([39]) gives two more definitions for the matrix variate skew normal density:
$$f_2(X, \Sigma, \Psi, b, \Omega) = c_2^*\, \phi_{k \times n}(X; \Sigma \otimes \Psi)\,\Phi_n(X'b, \Omega) \qquad (3.4)$$
and
$$f_3(X, \Sigma, \Psi, b, B) = c_3^*\, \phi_{k \times n}(X; \Sigma \otimes \Psi)\,\Phi(\mathrm{tr}(B'X)) \qquad (3.5)$$
where $c_2^* = (\Phi_n(0, (\Omega + b'\Sigma b)\Psi))^{-1}$, $c_3^* = 2$; $\Sigma$, $\Psi$, and $\Omega$ are positive definite covariance matrices of dimensions $k$, $n$ and $n$ respectively, and $B$ is a matrix of dimension $k \times n$. Note that if $\Omega = \Psi$ then $f_2$ is the same as $f_1$. Although more general than $f_1$, the densities $f_2$ and $f_3$ still do not permit independence of rows and columns simultaneously.

A very general definition of a skew symmetric variable for the matrix case can be obtained from matrix variate selection models. Suppose $X$ is a $k \times n$ random matrix with density $f(X)$, and let $g(X)$ be a weight function. A weighted form of the density $f(X)$ is given by
$$h(X) = \frac{f(X)g(X)}{\int_{\mathbb{R}^{k \times n}} g(X)f(X)\, dX}. \qquad (3.6)$$
When the sample is only a subset of the population, the associated model is called a selection model. In their recent article, Domínguez-Molina, González-Farías, Ramos-Quiroga and Gupta introduced the matrix variable closed skew normal distribution in this form ([8]).

In this chapter, a construction for a family of matrix variable skew-symmetric densities that allows for independence among both variables and individuals is studied. We also consider certain extensions and the relation to the matrix variable closed skew normal distribution.

3.2 Matrix Variable Skew Symmetric Distribution

To define a matrix variate distribution from the multivariate skew symmetric distribution, first assume that $z_i \sim ss_k^{g,H}(0_k, I_k, \alpha_i)$ for $i = 1, 2, \ldots, n$ are independently distributed random variables. Write $Z = (z_1, z_2, \ldots, z_n)$. We can write the density of $Z$ as a product as follows:
$$\prod_{i=1}^n \Big(2^k g^*(z_i) \prod_{j=1}^k H(\alpha_{ji} e_j'z_i)\Big),$$
which is equal to
$$2^{nk}\, g^{**}(Z) \prod_{i=1}^n \prod_{j=1}^k H(\alpha_{ji} e_j'Zc_i).$$

Let $A$ and $B$ be nonsingular symmetric matrices of order $k$ and $n$ respectively, and assume $M$ is a $k \times n$ matrix. Define the matrix variate skew symmetric variable as $X = AZB + M$.

Definition 3.2.1. (Matrix Variate Skew-Symmetric Density) Let $g(\cdot)$ be a density function symmetric about 0, and $H(\cdot)$ an absolutely continuous cumulative distribution function with $H'(\cdot)$ symmetric around 0. A variable $X$ has a matrix variate skew symmetric distribution if it has probability density function
$$\frac{2^{nk}\, g^{**}(A^{-1}(X - M)B^{-1}) \prod_{i=1}^n \prod_{j=1}^k H(\alpha_{ji} e_j'(A^{-1}(X - M)B^{-1})c_i)}{|A|^n |B|^k} \qquad (3.7)$$
where the $\alpha_{ji}$ are real scalars, $M \in \mathbb{R}^{k \times n}$, and $A$ and $B$ are nonsingular symmetric matrices of order $k$ and $n$ respectively. Finally, $g^{**}(Y) = \prod_{i=1}^n \prod_{j=1}^k g(y_{ij})$. The density is called the matrix variate skew-symmetric density with location parameter $M$, scale parameters $(A, B)$, and shape parameter $\Delta = (\alpha_{ji})$, and it is denoted by $mss_{k \times n}^{g,H}(M, A, B, \Delta)$.

Let $Z \sim mss_{k \times n}^{g,H}(0_{k \times n}, I_k, I_n, \Delta)$. Then the moment generating function of $Z$ evaluated at $T_{k \times n} \in \mathbb{R}^{k \times n}$ is $M_Z(T_{k \times n})$ and can be obtained as follows:
$$
\begin{aligned}
M_Z(T_{k \times n}) &= E(\mathrm{etr}(T_{k \times n}'Z))
= \int_{\mathbb{R}^{k \times n}} \mathrm{etr}(T_{k \times n}'Z)\, 2^{nk} g^{**}(Z) \prod_{i=1}^n \prod_{j=1}^k H(\alpha_{ji} e_j'Zc_i)\, dZ \\
&= 2^{nk}\, E_{g^{**}(Z)}\Big(\mathrm{etr}(T_{k \times n}'Z) \prod_{i=1}^n \prod_{j=1}^k H(\alpha_{ji} e_j'Zc_i)\Big).
\end{aligned}
$$

Let $X = AZB + M$ for a constant $(k \times k)$ matrix $A$, an $(n \times n)$ matrix $B$ and a $k \times n$-dimensional constant matrix $M$. Then the moment generating function of $X$ evaluated at $T_{k \times n}$ is
$$M_X(T_{k \times n}) = \mathrm{etr}(T_{k \times n}'M)\, M_Z(A'T_{k \times n}B') = \mathrm{etr}(T_{k \times n}'M)\, 2^{nk}\, E_{g^{**}(Z)}\Big(\mathrm{etr}((BT_{k \times n}'A)Z) \prod_{i=1}^n \prod_{j=1}^k H(\alpha_{ji} e_j'Zc_i)\Big).$$

Definition 3.2.2. (Matrix Variate Skew Symmetric Random Variable) Let $g(\cdot)$ be a density function symmetric about 0, and $H(\cdot)$ an absolutely continuous cumulative distribution function with $H'(\cdot)$ symmetric around 0. Let $z_{ij} \sim f(z_{ij}, \alpha_{ji}) = 2g(z_{ij})H(\alpha_{ji}z_{ij})$ for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$ be independent variables. Then the matrix variate random variable $Z = (z_{ij})$ has density $2^{nk}\, g^{**}(Z) \prod_{i=1}^n \prod_{j=1}^k H(\alpha_{ji} e_j'Zc_i)$, where $g^{**}(Z) = \prod_{i=1}^n \prod_{j=1}^k g(z_{ij})$, and $e_j$ and $c_i$ are the elementary vectors of the coordinate systems $\mathbb{R}^k$ and $\mathbb{R}^n$ respectively. Let $A$ be a $k \times k$ constant matrix, $B$ an $n \times n$ constant matrix and $M$ a $k \times n$-dimensional constant matrix. The random variable $X = AZB + M$ is distributed according to the matrix variate skew symmetric distribution with location parameter $M$, scale parameters $(A, B)$, and shape parameter $\Delta = (\alpha_{ji})$. We denote this by $X \sim MSS_{k \times n}^{g,H}(M, A, B, \Delta)$.

3.3 Matrix Variate Skew-Normal Distribution

Definition 3.3.1. (Matrix Variate Skew Normal Density). We call the density
$$f(X, M, A, B, \Delta) = \frac{2^{kn}\, \phi_{k \times n}(A^{-1}(X - M)B^{-1}) \prod_{j=1}^k \prod_{i=1}^n \Phi(\alpha_{ji} e_j'(A^{-1}(X - M)B^{-1})c_i)}{|A|^n |B|^k} \qquad (3.8)$$
the density of a matrix variate skew normal variable with location parameter $M$, scale parameters $(A, B)$, and shape parameter $\Delta$. We denote it by $msn_{k \times n}(M, A, B, \Delta)$.

Let $Z \sim msn_{k \times n}(0_{k \times n}, I_k, I_n, \Delta)$. Then the moment generating function of $Z$ evaluated at $T_{k \times n}$ can be obtained as follows, writing $t_i$ for the $i$th column of $T_{k \times n}$:
$$
\begin{aligned}
M_Z(T_{k \times n}) &= E_{\phi_{k \times n}(Z)}\Big(\mathrm{etr}(T_{k \times n}'Z) \prod_{i=1}^n \prod_{j=1}^k \Phi(\alpha_{ji} e_j'Zc_i)\Big)
= \prod_{i=1}^n \frac{2^k}{(2\pi)^{k/2}} \int_{\mathbb{R}^k} e^{-\frac{1}{2}z_i'z_i + t_i'z_i} \prod_{j=1}^k \Phi(\alpha_{ji} e_j'z_i)\, dz_i \\
&= \prod_{i=1}^n 2^k e^{\frac{1}{2}t_i't_i} \prod_{j=1}^k \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}y_{ij}^2}\, \Phi(\alpha_{ji} y_{ij} + \alpha_{ji}(t_i)_j)\, dy_{ij} \\
&= 2^{nk}\,\mathrm{etr}\!\left(\frac{1}{2}T_{k \times n}'T_{k \times n}\right) \prod_{i=1}^n \prod_{j=1}^k \Phi\!\left(\frac{\alpha_{ji}(T_{k \times n})_{ji}}{\sqrt{1 + \alpha_{ji}^2}}\right).
\end{aligned}
$$

Let $X = AZB + M$ for a constant $(k \times k)$ matrix $A$, an $(n \times n)$ matrix $B$ and a $k \times n$-dimensional constant matrix $M$. Then the moment generating function of $X$ evaluated at $T_{k \times n} \in \mathbb{R}^{k \times n}$ is
$$M_X(T_{k \times n}) = 2^{nk}\,\mathrm{etr}\!\left(T_{k \times n}'M + \frac{1}{2}(A'T_{k \times n}B')'A'T_{k \times n}B'\right) \prod_{i=1}^n \prod_{j=1}^k \Phi\!\left(\frac{\alpha_{ji}(A'T_{k \times n}B')_{ji}}{\sqrt{1 + \alpha_{ji}^2}}\right).$$

Hence the following definition and theorems.

Definition 3.3.2. (Matrix Variate Skew Normal Random Variable) Let
$$z_{ij} \sim 2\phi(z_{ij})\Phi(\alpha_{ji}z_{ij})$$
for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$ be independent univariate skew normal random variables. Then the matrix variate random variable $Z = (z_{ij})$ has density
$$2^{nk}\,\phi_{k \times n}(Z) \prod_{i=1}^n \prod_{j=1}^k \Phi(\alpha_{ji} e_j'Zc_i)$$
where $\phi_{k \times n}(Z) = \prod_{i=1}^n \prod_{j=1}^k \phi(z_{ij})$, and $e_j$ and $c_i$ are the elementary vectors of the coordinate systems $\mathbb{R}^k$ and $\mathbb{R}^n$ respectively. Let $A$ be a $k \times k$ constant matrix, $B$ an $n \times n$ constant matrix and $M$ a $k \times n$-dimensional constant matrix. A random variable $X = AZB + M$ is distributed according to the matrix variate skew normal distribution with location parameter $M$, scale parameters $(A, B)$, and shape parameter $\Delta = (\alpha_{ji})$. We denote this by $X \sim MSN_{k \times n}(M, A, B, \Delta)$. If the density exists it is given in Equation 3.8; we denote this case by writing $X \sim msn_{k \times n}(M, A, B, \Delta)$.
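A minimal generator based on this definition is sketched below: the entries $z_{ij}$ are drawn as independent univariate skew normals and then the linear transformation $X = AZB + M$ is applied. The function name is an illustrative assumption; `scipy.stats.skewnorm` has density $2\phi(z)\Phi(az)$, so its shape parameter plays the role of $\alpha_{ji}$ here.

```python
import numpy as np
from scipy.stats import skewnorm

def rvs_msn(M, A, B, Delta, rng=None):
    """One draw from MSN_{k x n}(M, A, B, Delta) via Definition 3.3.2.
    Delta is the k x n array with entry [j, i] = alpha_ji."""
    rng = np.random.default_rng(rng)
    k, n = M.shape
    Z = skewnorm.rvs(a=np.asarray(Delta, dtype=float), size=(k, n), random_state=rng)
    return A @ Z @ B + M
```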

Theorem 3.3.1. If $X$ has the matrix variate skew-normal distribution $MSN_{k \times n}(M, A, B, \Delta)$ then the moment generating function of $X$ evaluated at $T_{k \times n}$ is given by
$$M_X(T_{k \times n}) = 2^{nk}\,\mathrm{etr}\!\left(T_{k \times n}'M + \frac{1}{2}(A'T_{k \times n}B')'A'T_{k \times n}B'\right) \prod_{i=1}^n \prod_{j=1}^k \Phi\!\left(\frac{\alpha_{ji}(A'T_{k \times n}B')_{ji}}{\sqrt{1 + \alpha_{ji}^2}}\right). \qquad (3.9)$$

By Definition 3.3.2 we can write $Z \sim MSN_{k \times n}(0, I_k, I_n, \Delta)$, and prove the following theorems.

Theorem 3.3.2. Assume that $Y \sim MSN_{k \times n}(M, A, B, \Delta)$ and $X = CYD + N$. Then $X \sim MSN_{k \times n}(CMD + N, CA, BD, \Delta)$.

Proof. From the assumption we have $Y = AZB + M$, and so $X = CAZBD + (CMD + N)$, i.e., $X \sim MSN_{k \times n}(CMD + N, CA, BD, \Delta)$.

Theorem 3.3.3. Let $x_1, x_2, \ldots, x_n$ be independent, where $x_i$ is distributed according to $sn_k(0_k, \Sigma^{1/2}, \alpha)$. Then
$$\sum_{j=1}^n x_j'\Sigma^{-1}x_j \sim \chi^2_{kn}.$$

Proof. Let $y \sim N_k(\mu = 0, \Sigma)$. Then $y'\Sigma^{-1}y \sim \chi^2_k$, and $x_j'\Sigma^{-1}x_j$ and $y'\Sigma^{-1}y$ have the same distribution by Theorem 2.2.2. Moreover, the $x_j'\Sigma^{-1}x_j$ are independent. The desired property then follows from the addition property of the $\chi^2$ distribution.

It is well known that if $X \sim \phi_{k \times n}(M, AA', \Psi = I_n)$ then the matrix variable $XX'$ has the Wishart distribution with moment generating function $|I - 2(AA')T|^{-n/2}$, where $(AA')^{-1} - 2T$ is a positive definite matrix. The following theorem implies that the decomposition of a Wishart matrix is not unique.

Theorem 3.3.4. If a $k \times n$ matrix variate random variable $X$ has the $msn_{k \times n}(0_{k \times n}, A, I_n, \Delta)$ distribution for a constant positive definite matrix $A$ of order $k$ and $\Delta = \alpha 1'$, then
$$XX' \sim W_k(n, AA').$$

Proof. The moment generating function of the quadratic form $XX'$ can be obtained as follows, for any $T \in \mathbb{R}^{k \times k}$ with $(AA')^{-1} - 2T$ a positive definite matrix:
$$
\begin{aligned}
E(\mathrm{etr}(XX'T)) &= \int_{\mathbb{R}^{k \times n}} \mathrm{etr}(XX'T)\, dF_X \\
&= \int_{\mathbb{R}^{k \times n}} \frac{2^{nk}\,\mathrm{etr}\big(-\frac{1}{2}(AA')^{-1}XX' + XX'T\big) \prod_{i=1}^n \prod_{j=1}^k \Phi(\alpha_{ji}e_j'A^{-1}Xc_i)}{(2\pi)^{nk/2}|A|^n}\, dX \\
&= \int_{\mathbb{R}^{k \times n}} \frac{2^{nk}\,\mathrm{etr}\big(-\frac{1}{2}X'((AA')^{-1} - 2T)X\big) \prod_{i=1}^n \prod_{j=1}^k \Phi(\alpha_{ji}e_j'A^{-1}Xc_i)}{(2\pi)^{nk/2}|A|^n}\, dX \\
&= \int_{\mathbb{R}^{k \times n}} \frac{2^{nk}\,\mathrm{etr}\big(-\frac{1}{2}Z'Z\big) \prod_{i=1}^n \prod_{j=1}^k \Phi(\alpha_{ji}e_j'A^{-1}((AA')^{-1} - 2T)^{-1/2}Zc_i)}{(2\pi)^{nk/2}|I - 2(AA')T|^{n/2}}\, dZ \\
&= \frac{2^{nk} \prod_{i=1}^n \prod_{j=1}^k E_z\big(\Phi(\alpha_{ji}e_j'A^{-1}((AA')^{-1} - 2T)^{-1/2}z)\big)}{|I - 2(AA')T|^{n/2}}
= |I - 2(AA')T|^{-n/2},
\end{aligned}
$$
since the argument of each $\Phi$ is a linear function of $z \sim \phi_k(z)$ that is symmetric about zero, so each of the $nk$ expectations equals $\frac{1}{2}$.

3.4 Generalized Matrix Variate Skew Normal Distribution

Definition 3.4.1. (Generalized Matrix Variate Skew Normal Density). We call the density
$$f(X, M, A, B, \Delta) = \frac{2^{kn}\, \phi_{k \times n}(A^{-1}(X - M)B^{-1}) \prod_{j=1}^k \prod_{i=1}^n H(\alpha_{ji} e_j'(A^{-1}(X - M)B^{-1})c_i)}{|A|^n |B|^k} \qquad (3.10)$$
the density of a generalized matrix variate skew normal variable with location parameter $M$, scale parameters $(A, B)$, and shape parameter $\Delta$. We denote it by $gmsn_{k \times n}^H(M, A, B, \Delta)$.

Let $Z \sim gmsn_{k \times n}^H(0_{k \times n}, I_k, I_n, \Delta)$. The moment generating function of $Z$ evaluated at $T_{k \times n}$ can be obtained as follows, writing $t_i$ for the $i$th column of $T_{k \times n}$:
$$
\begin{aligned}
M_Z(T_{k \times n}) &= E_{\phi_{k \times n}(Z)}\Big(\mathrm{etr}(T_{k \times n}'Z) \prod_{i=1}^n \prod_{j=1}^k H(\alpha_{ji} e_j'Zc_i)\Big)
= \prod_{i=1}^n \frac{2^k}{(2\pi)^{k/2}} \int_{\mathbb{R}^k} e^{-\frac{1}{2}z_i'z_i + t_i'z_i} \prod_{j=1}^k H(\alpha_{ji} e_j'z_i)\, dz_i \\
&= \prod_{i=1}^n 2^k e^{\frac{1}{2}t_i't_i} \prod_{j=1}^k \int_{\mathbb{R}} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}y_{ij}^2}\, H(\alpha_{ji} y_{ij} + \alpha_{ji}(t_i)_j)\, dy_{ij} \\
&= 2^{nk}\,\mathrm{etr}\!\left(\frac{1}{2}T_{k \times n}'T_{k \times n}\right) \prod_{i=1}^n \prod_{j=1}^k E_{\phi(y)}\big(H(\alpha_{ji}y + \alpha_{ji}(T_{k \times n})_{ji})\big),
\end{aligned}
$$
in analogy with (2.29).

Let $X = AZB + M$ for a constant $(k \times k)$ matrix $A$, an $(n \times n)$ matrix $B$ and a $k \times n$-dimensional constant matrix $M$. Then the moment generating function of $X$ evaluated at $T_{k \times n} \in \mathbb{R}^{k \times n}$ is
$$M_X(T_{k \times n}) = 2^{nk}\,\mathrm{etr}\!\left(T_{k \times n}'M + \frac{1}{2}(A'T_{k \times n}B')'A'T_{k \times n}B'\right) \prod_{i=1}^n \prod_{j=1}^k E_{\phi(y)}\big(H(\alpha_{ji}y + \alpha_{ji}(A'T_{k \times n}B')_{ji})\big).$$

Hence the following definition and theorems.

Definition 3.4.2. (Matrix Variate Generalized Skew Normal Random Variable) Let
$$z_{ij} \sim 2\phi(z_{ij})H(\alpha_{ji}z_{ij})$$
for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$ be independent univariate generalized skew normal random variables. The matrix variate random variable $Z = (z_{ij})$ has density
$$2^{nk}\,\phi_{k \times n}(Z) \prod_{i=1}^n \prod_{j=1}^k H(\alpha_{ji} e_j'Zc_i)$$
where $\phi_{k \times n}(Z) = \prod_{i=1}^n \prod_{j=1}^k \phi(z_{ij})$, and $e_j$ and $c_i$ are the elementary vectors of the coordinate systems $\mathbb{R}^k$ and $\mathbb{R}^n$ respectively. Let $A$ be a $k \times k$ constant matrix, $B$ an $n \times n$ constant matrix and $M$ a $k \times n$-dimensional constant matrix. A random variable $X = AZB + M$ is distributed according to the generalized matrix variate skew symmetric distribution with location parameter $M$, scale parameters $(A, B)$, and shape parameter $\Delta = (\alpha_{ji})$. We denote this by $X \sim GMSN_{k \times n}^H(M, A, B, \Delta)$. If the density exists it is given in Equation 3.10; we denote this case by writing $X \sim gmsn_{k \times n}^H(M, A, B, \Delta)$.

Theorem 3.4.1. If $X$ has the generalized matrix variate skew-normal distribution

H ∆=(αji). We denote this by X GMSNk n(M, A, B, ∆). If the density exists it is ∼ × H given in Equation 3.10. We denote this case by writing X gsnk n(M, A, B, ∆). ∼ × Theorem 3.4.1. If X has generalized matrix variate skew-normal distribution,

H GMSNk n(M, A, B, ∆), then the moment generating function of X evaluated at Tk n × × is given by

nk 1 MX (Tk n) = 2 etr(Tk′ nM + (A′Tk nB′)′A′Tk nB′) × × 2 × × n k αji(A′Tk nB′)ij H( × ). (3.11) × 2 Yi=1 Yj=1 (1 + αji ) p

By Definition 3.4.2 we can write $Z \sim GMSN_{k \times n}^H(0, I_k, I_n, \Delta)$, and prove the following theorems.

Theorem 3.4.2. Assume that $Y \sim GMSN_{k \times n}^H(M, A, B, \Delta)$ and $X = CYD + N$. Then $X \sim GMSN_{k \times n}^H(CMD + N, CA, BD, \Delta)$.

Proof. From the assumption we have $Y = AZB + M$, and so $X = CAZBD + (CMD + N)$, i.e., $X \sim GMSN_{k \times n}^H(CMD + N, CA, BD, \Delta)$.

Theorem 3.4.3. Let $x_1, x_2, \ldots, x_n$ be independent, where $x_i$ is distributed according to $gsn_k^H(0_k, \Sigma^{1/2}, \alpha)$. Then
$$\sum_{j=1}^n x_j'\Sigma^{-1}x_j \sim \chi^2_{kn}.$$

Proof. Let $y \sim N_k(\mu = 0, \Sigma)$. Then $y'\Sigma^{-1}y \sim \chi^2_k$, and $x_j'\Sigma^{-1}x_j$ and $y'\Sigma^{-1}y$ have the same distribution by Theorem 2.2.2. Moreover, the $x_j'\Sigma^{-1}x_j$ are independent. The desired property then follows from the addition property of the $\chi^2$ distribution.

3.5 Extensions

As in Chapter 2, an odd function, say $w(x)$, can be used to replace the term of the form $\alpha_{ji}x$ in the skewing function to give more flexible families of densities. As before, we can take $w(x) = \frac{\lambda_1 x}{\sqrt{1 + \lambda_2 x^2}}$ to obtain a matrix variable form of the skew symmetric family introduced by Arellano-Valle et al. ([7]); take $w(x) = \alpha x + \beta x^3$ to obtain a matrix variate form of the skew symmetric family introduced by Ma and Genton ([45]); or take $w(x) = \mathrm{sign}(x)|x|^{\alpha/2}\lambda(2/\alpha)^{1/2}$ to obtain a matrix variate form of the skew symmetric family introduced by DiCiccio and Monti ([23]).

Also note that, by the definition of the matrix variate weighted distribution in Equation 3.6, the skewing function of the matrix variate skew normal density in Equation 3.8, i.e.
$$\prod_{j=1}^k \prod_{i=1}^n \Phi(\alpha_{ji} e_j'(A^{-1}(X - M)B^{-1})c_i),$$
can be replaced by
$$\Phi_{k^* \times n^*}(\Gamma A^{-1}(Z - M)B^{-1}\Lambda)$$
where $\Gamma$ and $\Lambda$ are matrices of dimensions $k^* \times k$ and $n \times n^*$ respectively. In this case, the normalizing constant $2^{kn}$ will have to be changed to $E_Z\big(P(Y < \Gamma A^{-1}(Z - M)B^{-1}\Lambda \mid Z)\big)$ for $Y \sim \phi_{k \times n}(0_{k \times n}, I_k, I_n)$ and $Z \sim \phi_{k \times n}(M, AA', B'B)$. This extension of the matrix variate skew normal density is similar to the matrix variable closed skew normal distribution ([8]). Also, if $k^* = k$, $n^* = n$ and both $\Gamma$ and $\Lambda$ are diagonal such that $\alpha_{ij} = \Gamma_{ii}\Lambda_{jj}$, we return to the family introduced in Definition 3.4.1.

CHAPTER 4

Estimation and Some Applications

Pewsey discusses some of the problems related to inference about the parameters of the skew normal distribution ([52]). The method of moments approach fails for samples in which the sample skewness index is outside the admissible range of the skew normal model. Also, the maximum likelihood estimator of the shape parameter α may take infinite values with positive probability. To deal with this problem certain methods have been proposed: Azzalini and Capitanio recommend that maximization of the log-likelihood function be stopped when the likelihood reaches a value not significantly lower than the maximum ([12]). Sartori uses bias prevention of maximum likelihood estimates for the scalar skew normal and skew-t distributions and obtains finite valued estimators ([57]). Bayesian estimation of the shape parameter is studied by Liseo and Loperfido ([44]). A minimum χ² estimation method and an asymptotically equivalent maximum likelihood method are proposed by Monti ([46]). Chen and Gupta discuss goodness of fit procedures for the skew normal distribution ([33]).

Maximum products of spacings (MPS) estimation was independently introduced by Cheng and Amin ([18]) and Ranneby ([53]). It is a general method for estimating parameters in univariate continuous distributions and is known to give consistent and asymptotically efficient estimates under general conditions. In this chapter, we consider estimation of the parameters of the $sn_k(\alpha, \mu, \Sigma_c^{1/2})$ distribution introduced in Chapter 2 using the MPS procedure. We also show that the MPS estimation procedure can be extended to give a class of bounded statistical model selection criteria which is suitable even for cases where Akaike's and other likelihood based model selection criteria do not exist.

4.1 Maximum products of spacings (MPS) estimation

Consider the common statistical inference problem of estimating a parameter $\theta$ from observed data $x_1, \ldots, x_n$, where the $x_i$'s are mutually independent observations from a distribution function $F_{\theta_0}(x)$. The distribution function $F_\theta(x)$ is assumed to belong to a family $\mathcal{F} = \{F_\theta(x) : \theta \in \Theta\}$ of mutually different distribution functions, where the parameter $\theta$ may be vector-valued.

Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the ordered random sample from $F_{\theta_0}(x)$. The MLE method arises from maximization of the log product of the density functions,
$$l(\theta; X) = \frac{1}{n}\sum_{i=1}^n \log f_\theta(x_{(i)}) = \frac{1}{n}\sum_{i=1}^n l_i(\theta). \qquad (4.1)$$

Let the maximum of this be denoted by $l(\hat\theta)$. The MPS method arises from maximization of the log product of spacings,
$$S(\theta; X) = \frac{1}{n+1}\sum_{i=1}^{n+1} s_i(\theta) = \frac{1}{n+1}\sum_{i=1}^{n+1} \log\big(F_\theta(x_{(i)}) - F_\theta(x_{(i-1)})\big), \qquad (4.2)$$
with respect to $\theta$. It is assumed that $F_\theta(x_{(0)}) = 0$ and $F_\theta(x_{(n+1)}) = 1$. Let the maximum of this function be denoted by $S(\hat\theta)$. Under very general conditions, Cheng and Amin ([18]) prove that the estimators produced by this method have the desirable properties of being consistent, asymptotically normal, and asymptotically efficient, since the ML method and the MPS method are asymptotically equivalent. For the details of this proof and the necessary conditions see ([18]). A minimal numerical sketch of the objective in (4.2) is given below.

An intuitive comparison of the log likelihood with the log product of spacings is helpful in understanding the similarities and differences. By the mean value theorem,
$$\log(s_{(i)}) = \log\!\left(\int_{x_{(i-1)}}^{x_{(i)}} f_\theta(x)\, dx\right) = l_i(\theta) + \log(x_{(i)} - x_{(i-1)}) + R(x_{(i-1)}, x_{(i)}, \theta) \qquad (4.3)$$
for each $i$. Here, $R(x_{(i-1)}, x_{(i)}, \theta)$ is essentially of $O(x_{(i)} - x_{(i-1)})$. Now, let the notation $y = O_p(g(n))$ mean that, given $\epsilon > 0$, $K$ and $N$ can be found, depending possibly on $\theta_0$, such that $P(|y| < Kg(n)) > 1 - \epsilon$ for all $n > N$. Similarly, let $y = o_p(g(n))$ mean that, given $\epsilon > 0$ and $\delta > 0$, $N$ can be found, depending possibly on $\theta_0$, such that $P(|y| < \delta g(n)) > 1 - \epsilon$ for all $n > N$. Under standard regularity assumptions, the following result holds ([22]): given any $\epsilon > 0$, there is a $K = K(\epsilon, \theta_0)$ and $N$ such that $\sup_i (x_{(i)} - x_{(i-1)}) < K\frac{\log(n)}{n}$ with probability $1 - \epsilon$ for all $n \ge N$. Therefore, the difference between $l(\theta; X)$ and $S(\theta; X)$ is approximately $\sum_{i=1}^{n+1}\log(x_{(i)} - x_{(i-1)})/(n+1)$, which is $O_p\!\left(\frac{\log(n)}{n}\right)$. Consequently, MPS and ML estimation are asymptotically equivalent and have the same asymptotic sufficiency, efficiency, and consistency properties. However, since $S(\theta; X)$ is always bounded above by $\log\!\left(\frac{1}{n+1}\right)$, the MPS approach gives consistent estimators even when MLE fails.

The following theorem is proved under mild conditions in Cheng and Stephens ([19]); it justifies the use of the MPS approach through its relation to the likelihood and to Moran's statistic.

Theorem 4.1.1. Let $\breve\theta$ be the MPS estimator of $\theta$. Then the MPS estimator $\breve\theta$ exists in probability, and the following holds:

1. $n^{1/2}(\breve\theta - \theta_0) \sim N\!\left(0, \left[-E\!\left(\frac{\partial^2 l(\theta)}{\partial\theta^2}\Big|_{\theta_0}\right)\right]^{-1}\right)$ asymptotically.

2. If $\hat\theta$ is the maximum likelihood estimator, then $\breve\theta - \hat\theta = o_p(n^{-1/2})$.

3. Also
$$S(\breve\theta) = S(\theta_0) + \frac{1}{2n}Q + o_p(n^{-1}),$$
where $Q$ is distributed as $\chi^2_k$.

The ML method and the MPS method can be seen as objective functions that are approximations to the Kullback-Leibler information ([24]). Ranneby ([53]) gives approximations of information measures other than the Kullback-Leibler information. In Ranneby & Ekström ([54]), a class of estimation methods, called generalized maximum spacing (GMPS) methods, is derived from approximations based on simple spacings of so-called ϕ-divergences. Ghosh & Jammalamadaka ([28]) show in a simulation study that GMPS methods other than the MPS method can perform better in terms of mean square error. Ekström ([24]) gives strong consistency theorems for GMPS estimators.

There are two disadvantages to MPS estimation. First, the MPS procedure fails if there are ties in the sample. Although this happens with probability zero for continuous densities, it may become an issue when data are grouped or rounded. Titterington ([59]) and Ekström & Ranneby ([54]) give methods to deal with ties in data using higher order spacings. The second disadvantage is that the optimization related to the MPS approach usually requires computational methods for its solution.

4.2 A Bounded Information Criterion for Statistical Model Selection

A fundamental problem in statistical analysis is the choice of an appropriate model. The area of statistical model identification, model evaluation, or model selection deals with choosing the best approximating model among a set of competing models to describe a given data set. In a series of papers, Akaike ([2] and [3]) develops the field of statistical data modeling from the principle of Boltzmann's ([15]) generalized entropy, or Kullback-Leibler ([42]) divergence. As a measure of the goodness of fit of an estimated statistical model among several competing models, Akaike proposes the so-called Akaike information criterion (AIC). AIC provides a trade-off between goodness of fit and complexity of a model; given a data set, competing models may be ranked according to their AIC, and the model with the smallest AIC is considered the best approximating model.

The Kullback-Leibler divergence between the true density $f_{\theta_0}(x)$ and its estimate $f_\theta(x)$ is given by
$$KLD(\theta_0, \theta) = \int f_{\theta_0}(x)\log\!\left(\frac{f_{\theta_0}(x)}{f_\theta(x)}\right) dx. \qquad (4.4)$$

The most obvious estimator of (4.4) is given by
$$\widehat{KLD}(\theta_0, \theta) = \frac{1}{n}\sum_{i=1}^n \log\!\left(\frac{f_{\theta_0}(x_{(i)})}{f_\theta(x_{(i)})}\right). \qquad (4.5)$$

Minimization of the expression in (4.5) with respect to $\theta$ is equivalent to maximization of the log likelihood in Equation 4.1. The maximum log likelihood method can thus be used to estimate the values of the parameters. However, it cannot be used to compare different models without some correction because it is biased. An unbiased estimate of $-2$ times the mean expected maximum value of the log likelihood yields the famous AIC:
$$AIC = -2\, l(\hat\theta) + 2k/n. \qquad (4.6)$$
For small sample sizes, the penalty term of AIC, i.e. $2k/n$, is usually not sufficient and may be replaced in search of improvement. For example, the Bayes information criterion (BIC) of Schwarz ([58]) is obtained for $2k\log(n)/n$, and the Hannan-Quinn ([38]) information criterion is obtained for $2k\log(\log(n))/n$. Many other model selection criteria are obtained by similar adjustments to the penalty term. However, all these different criteria share the same problem: there are many cases for which the likelihood function is unbounded as a function of the parameters (for example, see ([21])), and therefore in such situations likelihood based inference and model selection are not appropriate. Next, we develop a model selection criterion, based on MPS estimation, which overcomes the unbounded likelihood problem.

In Akaike ([2]), AIC is recommended as an approximation to the Kullback-Leibler divergence between the true density and its estimate. The connection of $S(\theta; X)$ to the Kullback-Leibler divergence and the minimum information principle is studied by Ranneby ([53]) and Lind ([43]). Ranneby introduces MPS from an approximation to the Kullback-Leibler divergence ([53]). Under mild assumptions the following holds:
$$P\Big(S(\hat\theta) < \sup_\theta\big(-\gamma - KLD(\theta_0, \theta) + \epsilon\big)\Big) \to 1$$
as $n \to \infty$ for every $\epsilon > 0$, where $\gamma \approx 0.57722$ is Euler's constant.

In Cheng and Stephens ([19]), the relation of the MPS method to Moran's goodness of fit statistic ([47]) is studied. Moran's statistic is the negative of $S(\theta; X)$, and its distribution evaluated at the true value of the parameter does not depend on the particular distribution function it is calculated for. The percentage points of Moran's statistic are given by Cheng and Thornton ([20]).

Cheng and Stephens extend Moran's test ([48]) to the situation where parameters have to be estimated from sample data. Under mild assumptions, when $\hat\theta$ is an efficient estimate of $\theta$ (this may be the maximum likelihood estimate, or the value which maximizes $S(\theta; X)$), then asymptotically the following holds:
$$S(\hat\theta) = S(\theta_0; X) + \frac{1}{2n}Q + o_p(n^{-1}) \qquad (4.7)$$
where $Q$ is a $\chi^2_k$ random variable with expected value $k$. As $n \to \infty$, the expected value of the statistic $-S(\hat\theta) + \frac{k}{2n}$ does not depend on the particular distribution function it is calculated for.

For these reasons, we propose the statistic
$$MPSIC = -S(\hat\theta) + \frac{k}{2n} \qquad (4.8)$$
as a model selection criterion, and we refer to it as the maximum products of spacings information criterion (MPSIC) from here on. Note that the penalty term in (4.8) can be replaced with $\frac{k\log(n)}{2n}$ or $\frac{k\log(\log(n))}{2n}$ to obtain other criteria of this kind; let us call these variants MPSIC2 and MPSIC3. A minimal sketch of the three criteria is given below.

In Section 4.1, we argued that $S(\theta_0; X)$ should be close to the log likelihood in (4.1) when the latter exists; however, the main difference between $S(\theta_0; X)$ and the log likelihood in (4.1) is the term $\sum_{i=1}^{n+1}\log(x_{(i)} - x_{(i-1)})/(n+1)$, which depends on the unknown true parameter value $\theta_0$. Because of this term the distribution of the log likelihood depends on the unknown true parameter value and is therefore not suitable for model identification purposes.

As mentioned earlier, in the skew normal model the likelihood function is often unbounded. Therefore, estimation in the skew normal model leaves a lot of room for testing MPSIC as a model selection criterion. In Table 4.1, we display the results of the following experiment:

We generate $n$ observations from $sn_1(\alpha, 0, 1)$, and then we compare the fit of this model to the standard normal and half normal models with MPSIC, MPSIC2, and MPSIC3. We record the percentage of times the skew normal model is selected by these model selection criteria.

n α MPSIC MPSIC2 MPSIC3 20 0 0.272 0.056 0.251 50 0 0.265 0.030 0.190 100 0 0.294 0.030 0.182 20 2 0.961 0.953 0.961 50 2 1.000 1.000 1.000 100 2 1.000 1.000 1.000 20 5 0.806 0.777 0.803 50 5 0.960 0.959 0.960 100 5 0.999 0.999 0.999 20 10 0.536 0.483 0.529 50 10 0.842 0.822 0.835 100 10 0.972 0.966 0.972

Table 4.1: MPSIC, MPSIC2, and MPSIC3 are compared for samples of size 20, 50, and 100 and different levels of α.
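The experiment above can be sketched as follows (Python with SciPy; scipy.stats.skewnorm is used for sn_1(α, 0, 1), the MPS fit of α is replaced by a crude grid search, and only MPSIC selections are counted, so this is an illustration of the design rather than the exact simulation behind Table 4.1):

```python
import numpy as np
from scipy.stats import norm, halfnorm, skewnorm

def neg_S(cdf, x):
    """-S(theta; X): negative mean log spacing under the candidate cdf."""
    u = cdf(np.sort(x))
    sp = np.diff(np.concatenate(([0.0], u, [1.0])))
    return -np.mean(np.log(np.maximum(sp, 1e-300)))   # floor protects models with zero mass on some observations

def selected_model(n, alpha, rng):
    x = skewnorm.rvs(a=alpha, size=n, random_state=rng)
    # crude grid search standing in for the MPS fit of the shape parameter (k = 1)
    grid = np.linspace(-30.0, 30.0, 241)
    a_hat = grid[np.argmin([neg_S(lambda t, a=a: skewnorm.cdf(t, a=a), x) for a in grid])]
    crit = {"skew normal":     neg_S(lambda t: skewnorm.cdf(t, a=a_hat), x) + 1 / (2 * n),
            "standard normal": neg_S(norm.cdf, x),
            "half normal":     neg_S(halfnorm.cdf, x)}
    return min(crit, key=crit.get)                     # smallest MPSIC wins

rng = np.random.default_rng(0)
picks = [selected_model(50, 2.0, rng) for _ in range(100)]
print(sum(p == "skew normal" for p in picks) / len(picks))
```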

4.3 Estimation of parameters of snk(α, µ, Σ)

We need the following result in this section: let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the ordered sample from $sn_1(\alpha, 0, 1)$, and let the cdf of this density be denoted by $F_\alpha(\cdot)$. Then

$$\frac{\partial}{\partial\alpha}S(\alpha; X) = -\frac{\sqrt{2/\pi}\,\phi\big(x_{(1)}\sqrt{\alpha^2+1}\big)}{F_\alpha(x_{(1)})\,(\alpha^2+1)} + \frac{\sqrt{2/\pi}\,\phi\big(x_{(n)}\sqrt{\alpha^2+1}\big)}{\big(1 - F_\alpha(x_{(n)})\big)(\alpha^2+1)} + \sum_{i=2}^{n}\frac{\sqrt{2/\pi}\,\big(\phi\big(x_{(i-1)}\sqrt{\alpha^2+1}\big) - \phi\big(x_{(i)}\sqrt{\alpha^2+1}\big)\big)}{\big(F_\alpha(x_{(i)}) - F_\alpha(x_{(i-1)})\big)(\alpha^2+1)}. \qquad (4.9)$$

4.3.1 Case 1: MPS estimation of sn_1(α, 0, 1)

Assume $X \sim sn_{k\times n}(\mu 1_k',\ \Sigma_c^{1/2},\ I_k,\ \alpha 1_k')$. When $\mu$ and $\Sigma_c^{1/2}$ are known, we can write $z_i = \Sigma_c^{-1/2}(x_i - \mu)$ and estimate the components $\alpha_j$ of $\alpha$ by using the corresponding i.i.d. $sn_1(\alpha_j, 0, 1)$ random variables $z_{j1}, z_{j2}, \ldots, z_{jn}$.
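A minimal sketch of this MPS step is shown below (Python with SciPy); it assumes the data have already been standardized with the known µ and Σ_c^{1/2}, uses SciPy's skewnorm as the sn_1(α, 0, 1) model, and maximizes S(α; z) with a bounded one-dimensional search rather than the analytic derivative (4.9), which could equally be supplied to a gradient-based optimizer. The function name is illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import skewnorm

def mps_shape(z, lo=-50.0, hi=50.0):
    """MPS estimate of alpha for an i.i.d. sn1(alpha, 0, 1) sample z."""
    z = np.sort(z)

    def neg_S(alpha):
        u = skewnorm.cdf(z, a=alpha)
        sp = np.diff(np.concatenate(([0.0], u, [1.0])))
        return -np.mean(np.log(np.maximum(sp, 1e-300)))

    # bounded search over alpha; the derivative (4.9) could instead drive a gradient method
    return minimize_scalar(neg_S, bounds=(lo, hi), method="bounded").x

z = skewnorm.rvs(a=5.0, size=100, random_state=2)   # simulated sn1(5, 0, 1) sample
print(mps_shape(z))
```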

Example 4.3.1. The "frontier data", 50 observations generated from an sn_1(0, 1, α = 5) distribution and reported in Azzalini and Capitanio ([12]), is an example for which the usual estimation methods such as the method of moments or maximum likelihood fail. Figure 4.1 gives the histogram, kernel density estimate, and density curves for the sn_1(0, 1, α = 5) and sn_1(0, 1, α̂ = 4.9632) distributions. For the frontier data, the Bayesian estimate recommended by Liseo and Loperfido ([44]) gives α̂ = 2.12, and the estimate obtained through Sartori's bias prevention method ([57]) gives α̂ = 6.243. Moreover, the MPSIC for the skew normal model is 4.3999, while the MPSIC for the half normal model is 5.0495. Therefore, MPSIC identifies the correct model.

For any value of the shape parameter α, the absolute value of a skew normal variable has a half normal distribution. If we take the absolute values of the frontier data, the appropriate model should therefore be the half normal model rather than a skew normal model with a large shape parameter. This is supported by MPSIC: the MPS estimate of α is 38.3906, and the MPSIC for this model is 4.4136, while the MPSIC for the half normal model is 4.4057.

Table 4.2 gives summary statistics for the distribution of the MPS estimator for different values of n and α, obtained from 5000 simulations. These results can be compared with the simulation results for the ML estimator in Table 4.3.

Although MPS estimation provides an improvement on ML estimation, the highly skewed distribution of the estimator, as seen in Figure 4.2, points to the tendency in some samples to give extremely large estimates for the shape parameter. Since the density sn_1(y, α) approaches the half normal density as α → ±∞, the question arises whether the skew normal model is appropriate for the data.


Figure 4.1: The histogram, kernel density estimate (blue), sn_1(0, 1, α = 5) curve (red), and sn_1(0, 1, α̂ = 4.9681) curve (black) for the "frontier data".


n     α    Mean(α̂)   std(α̂)    Median(α̂)  MAD(α̂)
20    0    0.0028    0.2702    0.0045     0.1578
50    0    0.0040    0.1572    0.0094     0.1048
100   0    0.0057    0.1193    0.0065     0.0849
20    2    2.9409    7.9874    1.6648     0.4516
50    2    1.9662    1.1007    1.8120     0.2981
100   2    1.9363    0.3932    1.8913     0.2446
20    5    10.2334   19.1683   3.6313     1.1751
50    5    5.5383    6.7330    4.2047     0.9611
100   5    4.7936    1.4561    4.4462     0.7202
20   10    21.5252   29.5315   6.2980     2.6961
50   10    11.4908   15.1492   7.3934     1.9900
100  10    10.0525   7.0268    8.3873     1.8476

Table 4.2: MPS estimation of α: mean, standard deviation, median, and mean absolute deviation of α̂ from 5000 simulations.

n     α    Mean(α̂)   std(α̂)    Median(α̂)  MAD(α̂)
20    0    0.0108    0.3469    0.0105     0.1969
50    0    0.0110    0.1827    0.0156     0.1142
100   0    0.0017    0.1308    0.0010     0.0895
20    2    3.3901    6.5156    2.2049     0.6616
50    2    2.2586    0.8532    2.1192     0.3817
100   2    2.1468    0.4518    2.0744     0.2737
20    5    16.6467   25.4334   5.7100     2.2305
50    5    7.2979    7.8039    5.5378     1.4243
100   5    5.7956    3.5357    5.2114     0.8775
20   10    30.9972   32.6160   12.0381    6.5854
50   10    16.0803   16.9440   10.8762    3.5005
100  10    12.8935   9.2488    10.4143    2.3308

Table 4.3: ML estimation of α: mean, standard deviation, median, and mean absolute deviation of α̂ from 5000 simulations.

b normal model is appropriate for the data arises. When α is large, perhaps it is better

to use the half normal model with one less parameterb for sake of model parsimony. This can always be checked by comparing half normal model and skew normal model

by one of the model selection criteria, MPSIC, MPSIC2, or MPSIC3.

4.3.2 Case 2: Estimation of µ and Σ_c^{1/2} given α

For $X \sim sn_{k\times n}(\mu 1_k',\ \Sigma_c^{1/2},\ I_k,\ \alpha 1_k')$, when $\alpha$ is known, we can use the method of moments to estimate the parameters $\mu$ and $\Sigma_c^{1/2}$. Note that

$$E(x) = \sqrt{\frac{2}{\pi}}\,\Sigma_c^{1/2}\begin{pmatrix}\frac{\alpha_1}{\sqrt{1+\alpha_1^2}}\\ \frac{\alpha_2}{\sqrt{1+\alpha_2^2}}\\ \vdots\\ \frac{\alpha_k}{\sqrt{1+\alpha_k^2}}\end{pmatrix} + \mu = \Sigma_c^{1/2}\mu_0(\alpha) + \mu,$$

$$\mathrm{Cov}(x) = \Sigma_c^{1/2}\Big(I_k - \frac{2}{\pi}\,\mathrm{diag}\Big(\frac{\alpha_1^2}{1+\alpha_1^2},\ \frac{\alpha_2^2}{1+\alpha_2^2},\ \ldots,\ \frac{\alpha_k^2}{1+\alpha_k^2}\Big)\Big)\Sigma_c^{1/2\prime} = \Sigma_c^{1/2}\,\Sigma_0(\alpha)\,\Sigma_c^{1/2\prime}.$$


Figure 4.2: Histogram of 5000 simulated values of α̂ obtained by MPS. The MPS estimator is highly skewed, suggesting that the half normal model might be more suitable for cases where extreme values of α̂ are observed.

Therefore, given α, we can estimate Σ_c^{1/2} and µ by solving the moment equations

$$S = \frac{1}{n}\sum_{j=1}^{n}(x_j - \bar{x})(x_j - \bar{x})' = \Sigma_c^{1/2}\,\Sigma_0(\alpha)\,\Sigma_c^{1/2\prime} \qquad (4.10)$$

$$\bar{x} = \frac{1}{n}\sum_{j=1}^{n}x_j = \Sigma_c^{1/2}\mu_0(\alpha) + \mu. \qquad (4.11)$$

The solutions to these equations are

$$\hat{\Sigma}_c^{1/2} = S_c^{1/2}\Big(I_k - \frac{2}{\pi}\,\mathrm{diag}\Big(\frac{\alpha_1^2}{1+\alpha_1^2},\ \frac{\alpha_2^2}{1+\alpha_2^2},\ \ldots,\ \frac{\alpha_k^2}{1+\alpha_k^2}\Big)\Big)^{-1/2}, \qquad (4.12)$$

$$\hat{\mu} = \bar{x} - S_c^{1/2}\Big(I_k - \mathrm{diag}\big(\mu(\alpha_1)^2,\ \mu(\alpha_2)^2,\ \ldots,\ \mu(\alpha_k)^2\big)\Big)^{-1/2}\mu_0(\alpha), \qquad (4.13)$$

where $S_c^{1/2}$ is the lower triangular square root of $S$ and $\mu(\alpha_j) = \sqrt{2/\pi}\,\alpha_j/\sqrt{1+\alpha_j^2}$ is the $j$-th component of $\mu_0(\alpha)$.

Note that $(\hat{\Sigma}_c^{1/2}, \hat{\mu})$ is an equivariant estimator of $(\Sigma_c^{1/2}, \mu)$ under the group of affine transformations obtained by scaling with a lower triangular matrix with positive diagonal elements.
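A sketch of the moment estimators (4.12) and (4.13) is given below (Python with NumPy); the function name moment_estimates is illustrative, and the lower triangular Cholesky factor of S is used for S_c^{1/2}, consistent with the lower triangular convention above.

```python
import numpy as np

def moment_estimates(X, alpha):
    """Estimates of (Sigma_c^{1/2}, mu) given alpha via (4.10)-(4.13).

    X     : n x k data matrix
    alpha : length-k shape vector (assumed known)
    """
    alpha = np.asarray(alpha, dtype=float)
    n, k = X.shape
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar) / n                          # (4.10), with divisor n
    S_half = np.linalg.cholesky(S)                             # lower triangular S_c^{1/2}
    mu0 = np.sqrt(2 / np.pi) * alpha / np.sqrt(1 + alpha**2)   # mu_0(alpha)
    Sigma0_inv_half = np.diag(1.0 / np.sqrt(1.0 - mu0**2))     # (I_k - diag(mu0_j^2))^{-1/2}
    Sigma_half = S_half @ Sigma0_inv_half                      # (4.12)
    mu_hat = xbar - Sigma_half @ mu0                           # (4.13)
    return Sigma_half, mu_hat

# Usage on placeholder data, just to exercise the function
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2))
print(moment_estimates(X, alpha=[1.0, 1.0]))
```

In Case 3 below, these moment estimates are alternated with an MPS update of α.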

4.3.3 Case 3: Simultaneous estimation of µ, Σ_c^{1/2}, and α

Given an initial guess of α, the following estimation procedure gives reasonably good results.

(a) Start with an initial estimate for α, say α0.

(b) Given an estimate of α, estimate Σ_c^{1/2} and µ by (4.12) and (4.13).

(c) Given estimates of µ and Σ_c^{1/2}, estimate α by MPS.

(d) Repeat steps (b) and (c) until convergence; a sketch of this iteration is given below.
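The following is a minimal univariate (k = 1) sketch of this iteration (Python with SciPy); it assumes SciPy's skewnorm(a = α, loc = µ, scale = σ) coincides with sn_1 up to parametrization, uses the univariate versions of (4.12) and (4.13) for the moment step and a bounded search for the MPS step, and the function names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import skewnorm

def neg_S(alpha, z):
    """-S(alpha; z) for a standardized sample z modelled as sn1(alpha, 0, 1)."""
    u = skewnorm.cdf(np.sort(z), a=alpha)
    sp = np.diff(np.concatenate(([0.0], u, [1.0])))
    return -np.mean(np.log(np.maximum(sp, 1e-300)))

def fit_sn1(x, alpha0=10.0, n_iter=50, tol=1e-6):
    """Alternate the moment step (mu, sigma given alpha) and the MPS step (alpha given mu, sigma)."""
    alpha = alpha0
    xbar, s = x.mean(), x.std()                       # (1/n)-based moments, as in (4.10)-(4.11)
    for _ in range(n_iter):
        mu0 = np.sqrt(2 / np.pi) * alpha / np.sqrt(1 + alpha**2)   # mean of sn1(alpha, 0, 1)
        sigma = s / np.sqrt(1 - mu0**2)               # univariate version of (4.12)
        mu = xbar - sigma * mu0                       # univariate version of (4.13)
        z = (x - mu) / sigma
        alpha_new = minimize_scalar(neg_S, args=(z,), bounds=(-50, 50), method="bounded").x
        if abs(alpha_new - alpha) < tol:
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha, mu, sigma

x = skewnorm.rvs(a=3.0, loc=4.0, scale=1.0, size=500, random_state=4)
print(fit_sn1(x))
```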

First, we apply this technique to univariate data.

Example 4.3.2. Bolfarine et al. ([29]) consider 1150 height measurements taken at 1 millisecond intervals along the drum of a roller (i.e., parallel to the axis of the roller). The data are part of a study on the surface roughness of the rollers. Since there are ties in the sample, we adjusted the data for ties before the analysis. We apply the procedure described above; Figure 4.3 summarizes the results.


Figure 4.3: The histogram, kernel density estimate (blue), and estimated density curve sn_1(µ̂ = 4.2865, σ̂^{1/2} = 0.9938, α̂ = −2.9928) (black) for the 1150 height measurements.

Table 4.4 gives summary statistics for the distributions of the estimators of α, µ = 0, and σ = 1 for different values of n and α.

n     α     mean(α̂)  std(α̂)   median(α̂)  MAD(α̂)   mean(µ̂)   std(µ̂)   mean(σ̂)  std(σ̂)
30    1.5   2.6013   2.1136   1.9277     1.4755   -0.0722   0.2303   1.0560   0.2003
50    1.5   2.0914   1.2355   1.7920     0.8614   -0.0536   0.1971   1.0432   0.1634
100   1.5   1.8096   0.6817   1.6584     0.5292   -0.0372   0.1443   1.0228   0.1157
30    3.0   4.0288   2.7718   3.2239     2.0730    0.0229   0.1848   0.9836   0.1840
50    3.0   3.7638   2.0545   3.2340     1.5469    0.0072   0.1426   0.9934   0.1509
100   3.0   3.5131   1.5431   3.1676     1.1380    0.0012   0.1090   0.9981   0.1090

Table 4.4: Summary statistics for the distributions of the estimators of α, µ = 0, and σ = 1 for different values of n and α.

Example 4.3.3. We generate 100 observations from the $sn_2\big(\mu = (0, 0)',\ \Sigma_c^{1/2} = \begin{pmatrix}1 & 0\\ 0.5 & 2\end{pmatrix},\ \alpha = (5, -3)'\big)$ distribution. We estimate the parameters with the procedure described above, using the initial estimate $\alpha_0 = (10, -10)'$. Figure 4.4 gives the scatter plot and the contour plots of the true and estimated densities.

Example 4.3.4. Australian Athletes Data: These data were collected on 202 athletes and are mentioned in Azzalini ([11]). We take the variables LBM and BMI and fit a bivariate skew normal model with the procedure described above, using the initial estimate $\alpha_0 = (10, 10)'$. Figure 4.5 gives the scatter plot and the contour plot of the estimated density.

Example 4.3.5. Australian Athletes Data 2: These data were collected on 202 athletes and are mentioned in Azzalini ([11]). We take the variables LBM and SSF and fit a bivariate skew normal model with the procedure described above, using the initial estimate $\alpha_0 = (10, 10)'$. Figure 4.6 gives the scatter plot and the contour plot of the estimated density.


Figure 4.4: The scatter plot of the 100 generated observations (black dots), contours of $sn_2\big(\mu = (0, 0)',\ \Sigma_c^{1/2} = \begin{pmatrix}1 & 0\\ 0.5 & 2\end{pmatrix},\ \alpha = (5, -3)'\big)$ (blue), and contours of the estimated density $sn_2\big(\hat{\mu} = (0.0795, 0.0579)',\ \hat{\Sigma}_c^{1/2} = \begin{pmatrix}0.9404 & 0\\ 0.5845 & 1.1092\end{pmatrix},\ \hat{\alpha} = (5.6825, -3.6566)'\big)$ (red).


Figure 4.5: The scatter plot for LBM and BMI in the Australian Athletes Data and contours of the estimated density $sn_2\big(\hat{\mu} = (19.7807, 48.8092)',\ \hat{\Sigma}_c^{1/2} = \begin{pmatrix}4.2760 & 0\\ 13.9306 & 10.7933\end{pmatrix},\ \hat{\alpha} = (2.5429, 0.8070)'\big)$ (red).


Figure 4.6: The scatter plot for LBM and SSF in the Australian Athletes Data and contours of the estimated density $sn_2\big(\hat{\mu} = (19.7807, 17.3347)',\ \hat{\Sigma}_c^{1/2} = \begin{pmatrix}4.2760 & 0\\ 15.6127 & 50.5834\end{pmatrix},\ \hat{\alpha} = (2.5429, 8.6482)'\big)$ (red).

n = 10:   mean(α̂) = (3.9836, 3.7503)'    cov(α̂) = [17.6933, -0.1927; -0.1927, 16.2678]
          mean(µ̂) = (0.8143, 0.7769)'    cov(µ̂) = [0.2056, 0.0879; 0.0879, 0.2981]
n = 20:   mean(α̂) = (2.6878, 2.5243)'    cov(α̂) = [16.1777, -0.77; -0.77, 12.7421]
          mean(µ̂) = (0.8812, 0.8466)'    cov(µ̂) = [0.1227, 0.0597; 0.0597, 0.167]
n = 30:   mean(α̂) = (2.0305, 1.7223)'    cov(α̂) = [8.983, -0.3011; -0.3011, 3.9441]
          mean(µ̂) = (0.9149, 0.9044)'    cov(µ̂) = [0.0854, 0.0424; 0.0424, 0.1235]
n = 50:   mean(α̂) = (1.2905, 1.3733)'    cov(α̂) = [1.1753, -0.0115; -0.0115, 1.2555]
          mean(µ̂) = (0.9553, 0.9345)'    cov(µ̂) = [0.0644, 0.0347; 0.0347, 0.0859]
n = 100:  mean(α̂) = (1.184, 1.1238)'     cov(α̂) = [0.4377, -0.0227; -0.0227, 0.3385]
          mean(µ̂) = (0.9506, 0.9537)'    cov(µ̂) = [0.0354, 0.0178; 0.0178, 0.05]
n = 200:  mean(α̂) = (1.0341, 1.0575)'    cov(α̂) = [0.1308, 0.0064; 0.0064, 0.1485]
          mean(µ̂) = (0.9796, 0.963)'     cov(µ̂) = [0.0187, 0.0093; 0.0093, 0.0261]

Table 4.5: Simulation results for $\hat{\alpha}$ and $\hat{\mu}$ in the estimation of $sn_2\big(\mu = (1, 1)',\ \Sigma_c^{1/2} = \begin{pmatrix}1 & 0\\ 0.5 & 1\end{pmatrix},\ \alpha = (1, 1)'\big)$.

n = 10:   mean(vec(Σ̂_c^{1/2})) = (1.1266, 0.5627, 1.0634)'
          cov(vec(Σ̂_c^{1/2}))  = [0.1179, 0.046, -0.0015; 0.046, 0.1867, 0.0108; -0.0015, 0.0108, 0.1107]
n = 20:   mean(vec(Σ̂_c^{1/2})) = (1.0797, 0.5229, 1.056)'
          cov(vec(Σ̂_c^{1/2}))  = [0.0656, 0.0349, -0.0042; 0.0349, 0.0841, 0.0024; -0.0042, 0.0024, 0.0621]
n = 30:   mean(vec(Σ̂_c^{1/2})) = (1.0582, 0.5038, 1.0338)'
          cov(vec(Σ̂_c^{1/2}))  = [0.044, 0.0226, -0.0002; 0.0226, 0.0514, 0.0018; -0.0002, 0.0018, 0.0396]
n = 50:   mean(vec(Σ̂_c^{1/2})) = (1.0319, 0.5201, 1.0236)'
          cov(vec(Σ̂_c^{1/2}))  = [0.0293, 0.0147, 0.0029; 0.0147, 0.03, 0.0014; 0.0029, 0.0014, 0.029]
n = 100:  mean(vec(Σ̂_c^{1/2})) = (1.0268, 0.5168, 1.0162)'
          cov(vec(Σ̂_c^{1/2}))  = [0.0159, 0.0078, 0.0003; 0.0078, 0.0147, 0.0011; 0.0003, 0.0011, 0.0159]
n = 200:  mean(vec(Σ̂_c^{1/2})) = (1.0179, 0.5112, 1.0198)'
          cov(vec(Σ̂_c^{1/2}))  = [0.0085, 0.0039, -0.0003; 0.0039, 0.0068, 0.0002; -0.0003, 0.0002, 0.009]

Table 4.6: Simulation results for $\hat{\Sigma}_c^{1/2}$ in the estimation of $sn_2\big(\mu = (1, 1)',\ \Sigma_c^{1/2} = \begin{pmatrix}1 & 0\\ 0.5 & 1\end{pmatrix},\ \alpha = (1, 1)'\big)$.

BIBLIOGRAPHY

[1] D.J. Aigner, C.A.K. Lovell, and P. Schmidt. Formulation and estimation of stochastic frontier production function models. Rand Corp., 1976.

[2] H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723, 1974.

[3] H. Akaike. On entropy maximization principle. Applications of Statistics, 1(1):27–41, 1977.

[4] T.W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley Series in Probability and Statistics. Wiley-Interscience, 2003.

[5] R.B. Arellano-Valle, M.D. Branco, and M.G. Genton. A unified view on skewed distributions arising from selections. Canadian Journal of Statistics, 34(4):581, 2006.

[6] R.B. Arellano-Valle and M.G. Genton. On fundamental skew distributions. Journal of Multivariate Analysis, 96(1):93–116, 2005.

[7] R.B. Arellano-Valle, H.W. Gomez, and F.A. Quintana. A new class of skew-normal distributions. Communications in Statistics - Theory and Methods, 33(7):1465–1480, 2004.

[8] J.A. Domínguez-Molina, G. González-Farías, R. Ramos-Quiroga, and A.K. Gupta. A matrix variate closed skew-normal distribution with applications to stochastic frontier analysis. Communications in Statistics - Theory and Methods, 36(9):1691–1703, 2007.

[9] B.C. Arnold and R.J. Beaver. The skew-Cauchy distribution. Statistics & Probability Letters, 49(3):285–290, 2000.

[10] A. Azzalini. A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12:171–178, 1985.

[11] A. Azzalini. The skew-normal distribution and related multivariate families. Scandinavian Journal of Statistics, 32(2):159–188, 2005.

[12] A. Azzalini and A. Capitanio. Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 61(3):579–602, 1999.

[13] A. Azzalini and A. Capitanio. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 65(2):367–389, 2003.

[14] L. Bailey, T.F. Moore, and B.A. Bailar. An interviewer variance study for the eight impact cities of the national crime survey cities sample. Journal of the American Statistical Association, 73(1):16–23, 1978.

[15] L. Boltzmann. On Zermelo's paper "On the mechanical explanation of irreversible processes". Annalen der Physik, 60:392–398, 1897.

[16] M.D. Branco and D.K. Dey. A general class of multivariate skew-elliptical distributions. Journal of Multivariate Analysis, 79(1):99–113, 2001.

[17] J.T. Chen and A.K. Gupta. Matrix variate skew normal distributions. Statistics, 39(3):247–253, 2005.

[18] R.C.H. Cheng and N.A.K. Amin. Estimating parameters in continuous univariate distributions with a shifted origin. Journal of the Royal Statistical Society. Series B. Methodological, 45(3):394–403, 1983.

[19] R.C.H. Cheng and M.A. Stephens. A goodness-of-fit test using Moran's statistic with estimated parameters. Biometrika, 76(2):385–392, 1989.

[20] R.C.H. Cheng and K.M. Thornton. Selected percentage points of the Moran statistic. Journal of Statistical Computation and Simulation, 30(3):189–194, 1988.

[21] R.C.H. Cheng and L. Traylor. Non-regular maximum likelihood problems. Journal of the Royal Statistical Society. Series B. Methodological, 57(1):3–44, 1995.

[22] D.A. Darling. On a class of problems related to the random division of an interval. The Annals of Mathematical Statistics, 24:239–253, 1953.

[23] T.J. DiCiccio and A.C. Monti. Inferential aspects of the skew exponential power distribution. Journal of the American Statistical Association, 99(466):439–450, 2004.

[24] M. Ekström. Alternatives to maximum likelihood estimation based on spacings and the Kullback–Leibler divergence. Journal of Statistical Planning and Inference, 138(6):1778–1791, 2008.

[25] K. Fang. Generalized Multivariate Analysis. Science Press, Beijing, 1990.

[26] R.A. Fisher. The effect of methods of ascertainment upon the estimation of frequencies. Annals of Eugenics, 6:13–25, 1934.

[27] M.G. Genton. Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman & Hall/CRC, 2004.

[28] K. Ghosh and S.R. Jammalamadaka. A general estimation method using spacings. Journal of Statistical Planning and Inference, 93(1-2):71–82, 2001.

[29] H.W. Gomez, H.S. Salinas, and H. Bolfarine. Generalized skew-normal models: properties and inference. Statistics, 40(6):495–505, 2006.

[30] H.W. Gomez, O. Venegas, and H. Bolfarine. Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics, 18(4):395–407, 2007.

[31] A.K. Gupta and F.C. Chang. Multivariate skew-symmetric distributions. Applied Mathematics Letters, 16(5):643–646, 2003.

[32] A.K. Gupta and J.T. Chen. A class of multivariate skew-normal models. Annals of the Institute of Statistical Mathematics, 56(2):305–315, 2004.

[33] A.K. Gupta and T. Chen. Goodness-of-fit tests for the skew-normal distribution. Communications in Statistics - Simulation and Computation, 30(4):907–930, 2001.

[34] A.K. Gupta, G. González-Farías, and J.A. Domínguez-Molina. A multivariate skew normal distribution. Journal of Multivariate Analysis, 89(1):181–190, 2004.

[35] A.K. Gupta and W.J. Huang. Quadratic forms in skew normal variates. Journal of Mathematical Analysis and Applications, 273(2):558–564, 2002.

[36] A.K. Gupta and D.K. Nagar. Matrix Variate Distributions. Chapman & Hall/CRC, 2000.

[37] A.K. Gupta and T. Varga. Elliptically Contoured Models in Statistics. Kluwer Series in Mathematics and Its Applications, 1993.

[38] E.J. Hannan and B.G. Quinn. The determination of the order of an autoregression. Journal of the Royal Statistical Society, 41(2):190–195, 1979.

[39] S.W. Harrar and A.K. Gupta. On matrix variate skew-normal distributions. Statistics, 42(2):179–194, 2008.

[40] W.J. Huang and Y.H. Chen. Quadratic forms of multivariate skew normal-symmetric distributions. Statistics & Probability Letters, 76(9):871–879, 2006.

[41] M.C. Jones and M.J. Faddy. A skew extension of the t-distribution, with applications. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 65(1):159–174, 2003.

[42] S. Kullback and R.A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79–86, 1951.

[43] N.C. Lind. Information theory and maximum product of spacings estimation. Journal of the Royal Statistical Society. Series B. Methodological, 56(2):341–343, 1994.

[44] B. Liseo and N. Loperfido. Default Bayesian analysis of the skew-normal distribution. Journal of Statistical Planning and Inference, 136(2):373–389, 2004.

[45] Y.Y. Ma and M.G. Genton. Flexible class of skew-symmetric distributions. Scandinavian Journal of Statistics, 31(3):459–468, 2004.

[46] A.C. Monti. A note on the estimation of the skew normal and the skew exponential power distributions. Metron, 61(2):205–219, 2003.

[47] P.A.P. Moran. The random division of an interval. Supplement to the Journal of the Royal Statistical Society, pages 92–98, 1947.

[48] P.A.P. Moran. The random division of an interval. Part II. Journal of the Royal Statistical Society, 13(1):92–98, 1951.

[49] S. Nadarajah and S. Kotz. Skewed distributions generated by the normal kernel. Statistics & Probability Letters, 65(3):269–277, 2003.

[50] S. Nadarajah and S. Kotz. Skew distributions generated from different families. Acta Applicandae Mathematicae, 91(1):1–37, 2006.

[51] G.P. Patil, C.R. Rao, and M. Zelen. Weighted distributions. Encyclopedia of Statistical Sciences, 9:565–571, 1988.

[52] A. Pewsey. Problems of inference for Azzalini’s skewnormal distribution. Journal of Applied Statistics, 27(7):859–870, 2000.

[53] B. Ranneby. The maximum spacing method. An estimation method related to the maximum likelihood method. Scandinavian Journal of Statistics, 11(2):93–112, 1984.

[54] B. Ranneby and M. Ekström. Maximum spacing estimates based on different metrics. Institute of Mathematical Statistics, Umeå University, 1997.

[55] C. Roberts and S. Geisser. A necessary and sufficient condition for the square of a random variable to be gamma. Biometrika, 53(1/2):275–278, 1966.

[56] S.K. Sahu, D.K. Dey, and M.D. Branco. A new class of multivariate skew distributions with applications to Bayesian regression models. The Canadian Journal of Statistics, 31(2):129–150, 2003.

[57] N. Sartori. Bias prevention of maximum likelihood estimates for scalar skew normal and skew t distributions. Journal of Statistical Planning and Inference, 136(12):4259–4275, 2006.

[58] G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464, 1978.

[59] D.M. Titterington. Comment on estimating parameters in continuous univariate distributions. Journal of the Royal Statistical Society. Series B. Methodological, 47(1):115–116, 1985.

[60] J.Z. Wang, J. Boyer, and M.G. Genton. A skew-symmetric representation of multivariate distributions. Statistica Sinica, 14(4):1259–1270, 2004.

[61] S. Zacks. Parametric Statistical Inference. Pergamon Press, New York, 1981.