Statistical Analysis of Skew Normal Distribution and Its Applications

Grace Ngunkeng

A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

August 2013

Committee:
Wei Ning, Advisor
Jane Y. Chang, Graduate Faculty Representative
Arjun K. Gupta
John T. Chen

Copyright © August 2013 Grace Ngunkeng. All rights reserved.

ABSTRACT

Wei Ning, Advisor

In many practical applications, real data sets are observed to be asymmetric. They exhibit some skewness and therefore do not conform to the normal distribution, which is popular and easy to handle. Azzalini (1985) introduced a new class of distributions, named the skew normal distribution, which is mathematically tractable and includes the normal distribution as the special case in which the skewness parameter is zero. The skew normal distribution family is well known for modeling and analyzing skewed data. It extends the normal distribution family by adding a shape parameter that regulates the skewness, which gives it greater flexibility in fitting real data where some skewness is present. In this dissertation, we explore statistical analysis related to this distribution family.

In the first part of the dissertation, we develop a nonparametric goodness-of-fit test based on the empirical likelihood method for the skew normal distribution. Empirical likelihood, proposed by Owen (1988), is a method that combines the reliability of the canonical nonparametric method with the flexibility and effectiveness of the likelihood approach. Statistical inference for the test statistic is derived. Simulations indicate that the proposed test controls the type I error within a given nominal level and has competitive power compared with the other available tests.
The test is applied to the IQ scores data set and the Australian Institute of Sport data set to illustrate the testing procedure.

In the second part, we focus on the change point problem for the skew normal distribution. The world is filled with changes, which can lead to unnecessary losses if people are not aware of them. Thus, in many practical applications, statisticians face the problem of detecting the number of change points, or jumps, and their locations. In this part, we address this problem for the standard skew normal family. We focus on the test based on the Schwarz information criterion (SIC) to detect the position and the number of change points for the shape parameter. The likelihood ratio test and Bayesian methods are introduced briefly as two alternative approaches. The asymptotic null distribution of the SIC test statistic is derived, and critical values for different sample sizes and nominal levels are computed for the adjusted SIC test statistic. A simulation study indicates the performance of the proposed test.

In the third part of the dissertation, we extend the methods of the second part by studying different types of change point problems for the general skew normal distribution: simultaneous changes of the location and scale parameters, and simultaneous changes of the location, scale and shape parameters. We derive the test statistic based on the SIC to detect and estimate the number of possible change points. First, we consider the change point problem for simultaneous changes of the location and scale parameters, assuming that the shape parameter is unknown and has to be estimated. Second, we explore the change point problem for simultaneous changes of the location, scale and shape parameters. The asymptotic null distribution and the corresponding adjustment for the test statistic are established. Simulations for each proposed test are conducted to indicate the performance of the test.
Power comparisons with the available tests are investigated to indicate the advantage of the proposed test. Applications to real data are provided to illustrate the test procedure.

This work is dedicated to my beloved grandmother Ngunkeng Mariana and my parents Ashu Alexander and Monica Fuabe Ashu, for their constant love and support.

ACKNOWLEDGMENTS

To God be the honor and glory.

I wish to express my sincere gratitude to my advisor, Dr. Wei Ning, for his continuous support, guidance and patience throughout this research, and from whom I have acquired a great deal of skills. I also want to extend my gratitude to my committee members, Dr. Arjun K. Gupta, Dr. John T. Chen and Dr. Jane Chang, for taking the time to serve on my committee and for their constructive comments.

I would like to thank the Mathematics and Statistics Department and the Graduate College for providing me with financial support during my studies at BGSU. I would like to thank all the professors in the Mathematics and Statistics Department for sharing their vast knowledge, which has had a lasting impact on me. I would also like to thank all my fellow graduate students for their friendship. I would like to especially thank Marcia Lynn Seubert, Mary Jane Busdeker and Barbara J Berta for all their assistance.

I would like to thank Professor Reinaldo B. Arellano-Valle, Professor Luis M. Castro and Professor Rosangela H. Loschi for providing us with the Latin American stock market data used in Chapters 3 and 4.

I owe special thanks to Dr. Lisa Chyvonne Chavers, Mr. Sidney Robert Childs, Dr. Nkem Khumbah and Mrs. Prudence Nojang for making it possible for me to continue my studies at BGSU and for their continuous moral and financial support.

Finally, my deepest gratitude goes to my parents, family and friends for their constant love and spiritual support throughout my studies.

Grace Ngunkeng
Bowling Green, Ohio, USA
August 2013

Table of Contents

CHAPTER 1: SKEW NORMAL DISTRIBUTION
1.1 Introduction
1.1.1 Properties of skew normal distribution (SN)
1.2 Literature Review
1.2.1 Thesis Structure

CHAPTER 2: EMPIRICAL LIKELIHOOD RATIO BASED GOODNESS-OF-FIT TEST FOR SKEW NORMALITY
2.1 Introduction
2.2 Empirical Likelihood Based Test
2.2.1 Empirical Likelihood Method
2.2.2 Test Statistic
2.3 Asymptotic Results
2.4 Calculations of Critical Values and P-values
2.4.1 Critical Values
2.4.2 Approximations to the p-value of SN_n
2.5 Simulations
2.6 Application
2.6.1 Otis IQ Scores for Non-whites
2.6.2 Australian Institute of Sport Data
2.7 Conclusion

CHAPTER 3: CHANGE POINT PROBLEM FOR STANDARD SKEW NORMAL DISTRIBUTION
3.1 Introduction
3.1.1 Literature Review
3.2 Change of the Shape Parameter λ
3.2.1 Information Approach
3.2.2 Likelihood Ratio Based Test
3.2.3 Bayesian Approach
3.3 Simulation
3.4 Application
3.5 Conclusion

CHAPTER 4: CHANGE POINT PROBLEM FOR GENERAL SKEW NORMAL DISTRIBUTION
4.1 Location and Scale Change
4.1.1 Information Approach (SIC)
4.1.2 Power Simulation
4.2 Application to Biomedical Data
4.3 The Change of Location, Scale and Shape
4.3.1 Test Statistics
4.3.2 Power Simulation
4.4 Applications to Latin American Emerging Market Stock Returns
4.4.1 Argentina Weekly Stock Market
4.4.2 Brazilian Stock Return
4.4.3 Chile Stock Return Market
4.4.4 Mexico Stock Return Market
4.5 Conclusion

BIBLIOGRAPHY

List of Figures

2.1 Histogram of IQ scores with a skew normal fit and normal fit.
2.2 The histogram with a skew normal fit and normal fit for the body mass index (BMI) of 50 females.
3.1 The graph of the time series data for the weekly stock returns and return rate for Brazil with the corresponding change points respectively.
4.1 Left: The SIC values for every locus on chromosome 4 of the fibroblast cell line GM13330; Right: Chromosome 4 of the fibroblast cell line GM13330.
4.2 The graphs of the time series data for the weekly stock returns and return rate R_t for the Argentina market with the corresponding change points.
4.3 Left: The graph of the ACF values of the transformed data R_t; Right: Test for normality.
4.4 The graphs of the time series data for the weekly return rate R_t and stock returns for the Brazil market with the corresponding change points.
4.5 Left: The ACF plot of the Brazil R_t series data; Right: Test for normality.
4.6 The graphs of the time series data for the weekly return rate R_t and stock returns for the Chile market with the corresponding change points.
4.7 Left: Graph of the ACF of the Chile R_t series; Right: Test for normality.
4.8 The graphs of the time series data for the weekly return rate R_t and stock returns for the Mexico market with the corresponding change point.
4.9 The ACF of the Mexico stock return rate R_t and Q-Q plot to test the normality assumption.

List of Tables

2.1 Type I error with SN(0, 1, λ), α = 0.05
2.2 Power comparison with n = 20, 25, 50 and 100
2.3 Power comparison with n = 20, 25, 50 and 100
2.4 Power of Test with Alternative Distribution N(0, 1)
2.5 Empirical Power Evaluation of the statistic (2.2.11) with different δ at α = 0.05
2.6 Otis IQ Scores for Non-whites
2.7 Estimated values for N(µ, σ) and SN(µ, σ, λ)
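
As a brief illustration of the distribution family described in the abstract, the following is a minimal sketch, not taken from the dissertation, of the skew normal density in Azzalini's (1985) form, f(x; λ) = 2 φ(x) Φ(λx), where φ and Φ are the standard normal density and distribution function. The function names (skew_normal_pdf, std_normal_pdf, std_normal_cdf, integrate) and the location-scale form SN(µ, σ, λ) obtained through z = (x − µ)/σ are assumptions made here for illustration. Setting λ = 0 recovers the normal density, which is the special case the abstract mentions.

```python
import math


def std_normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(x: float, lam: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of SN(mu, sigma, lam): (2 / sigma) * phi(z) * Phi(lam * z), z = (x - mu) / sigma.

    The location-scale parametrization is an assumption of this sketch.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return 2.0 / sigma * std_normal_pdf(z) * std_normal_cdf(lam * z)


def integrate(f, a: float, b: float, n: int = 20000) -> float:
    """Composite trapezoidal rule on [a, b] with n subintervals (illustrative only)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


if __name__ == "__main__":
    # lam = 0 reduces the skew normal density to the standard normal density.
    print(skew_normal_pdf(0.3, lam=0.0), std_normal_pdf(0.3))
    # The density integrates to approximately 1 for any shape parameter.
    print(integrate(lambda x: skew_normal_pdf(x, lam=3.0), -10.0, 10.0))
    # Numerical mean versus the closed form sqrt(2 / pi) * lam / sqrt(1 + lam^2).
    mean = integrate(lambda x: x * skew_normal_pdf(x, lam=3.0), -10.0, 10.0)
    print(mean, math.sqrt(2.0 / math.pi) * 3.0 / math.sqrt(1.0 + 3.0 ** 2))
```

The sketch uses only the Python standard library so that it can be run without a statistics package. A positive λ moves probability mass to the right and a negative λ mirrors the density about the location, which is how the single shape parameter regulates skewness.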