Student's T-Distribution


Student's t-distribution
STAT 587 (Engineering), Iowa State University
March 30, 2021

Probability density function

The random variable $X$ has a Student's t distribution with degrees of freedom $\nu > 0$ if its probability density function is

$$p(x \mid \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

where $\Gamma(\alpha)$ is the gamma function,

$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx.$$

We write $X \sim t_\nu$.

[Figure: Student's t probability density functions for 1, 10, and 100 degrees of freedom.]

Mean and variance

If $X \sim t_\nu$, then

$$E[X] = \int_{-\infty}^{\infty} x\, p(x \mid \nu)\,dx = \cdots = 0, \quad \nu > 1$$

and

$$Var[X] = \int_{-\infty}^{\infty} (x - 0)^2\, p(x \mid \nu)\,dx = \cdots = \frac{\nu}{\nu - 2}, \quad \nu > 2.$$

Cumulative distribution function

[Figure: Student's t cumulative distribution functions for 1, 10, and 100 degrees of freedom.]

Location-scale t distribution

If $X \sim t_\nu$, then

$$Y = \mu + \sigma X \sim t_\nu(\mu, \sigma^2)$$

for parameters: degrees of freedom $\nu > 0$, location $\mu$, and scale $\sigma > 0$. By properties of expectations and variances, we can find that

$$E[Y] = \mu, \quad \nu > 1 \qquad \text{and} \qquad Var[Y] = \frac{\nu}{\nu - 2}\,\sigma^2, \quad \nu > 2.$$

Generalized Student's t probability density function

The random variable $Y$ has a generalized Student's t distribution with degrees of freedom $\nu > 0$, location $\mu$, and scale $\sigma > 0$ if its probability density function is

$$p(y) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}\,\sigma} \left(1 + \frac{1}{\nu}\left(\frac{y - \mu}{\sigma}\right)^2\right)^{-\frac{\nu+1}{2}}.$$

We write $Y \sim t_\nu(\mu, \sigma^2)$.

[Figure: Generalized Student's $t_{10}$ probability density functions for scale $\sigma \in \{1, 2\}$ and location $\mu \in \{0, 2\}$.]

t with 1 degree of freedom

If $T \sim t_1(\mu, \sigma^2)$, then $T$ has a Cauchy distribution and we write $T \sim Ca(\mu, \sigma^2)$. If $T \sim t_1(0, 1)$, then $T$ has a standard Cauchy distribution. A Cauchy random variable has no mean or variance.

As degrees of freedom increase

If $T_\nu \sim t_\nu(\mu, \sigma^2)$, then

$$\lim_{\nu \to \infty} T_\nu \stackrel{d}{=} X \sim N(\mu, \sigma^2).$$

t distribution arising from a normal sample

Let $X_i \stackrel{iid}{\sim} N(\mu, \sigma^2)$. We calculate the sample mean

$$\overline{X} = \frac{1}{n}\sum_{i=1}^n X_i$$

and the sample variance

$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \overline{X})^2.$$

Then

$$T = \frac{\overline{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.$$

Inverse-gamma scale mixture of a normal

If

$$X \mid \sigma^2 \sim N(\mu, \sigma^2/n) \qquad \text{and} \qquad \sigma^2 \sim IG\!\left(\frac{\nu}{2}, \frac{\nu}{2}s^2\right)$$

then

$$X \sim t_\nu(\mu, s^2/n),$$

which is obtained by

$$p_x(x) = \int p_{x \mid \sigma^2}(x \mid \sigma^2)\, p_{\sigma^2}(\sigma^2)\, d\sigma^2$$

where $p_x$ is the marginal density for $x$, $p_{x \mid \sigma^2}$ is the conditional density for $x$ given $\sigma^2$, and $p_{\sigma^2}$ is the marginal density for $\sigma^2$.

Summary

Student's t random variable:
$T \sim t_\nu(\mu, \sigma^2)$ with $\nu, \sigma > 0$;
$E[T] = \mu$ for $\nu > 1$;
$Var[T] = \frac{\nu}{\nu - 2}\sigma^2$ for $\nu > 2$.
Relationships to other distributions: $t_1(\mu, \sigma^2)$ is the Cauchy distribution, and $t_\nu(\mu, \sigma^2)$ converges to $N(\mu, \sigma^2)$ as $\nu \to \infty$.
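These results are easy to check numerically. Below is a minimal simulation sketch (assuming NumPy and SciPy are available; the seed, sample size, and parameter values are arbitrary illustrations) that draws normal samples, forms $T = (\overline{X} - \mu)/(S/\sqrt{n})$, and compares its quantiles with the exact $t_{n-1}$ quantiles.

```python
import numpy as np
from scipy import stats

# Simulate T = (Xbar - mu) / (S / sqrt(n)) from iid N(mu, sigma^2) samples
# and compare its quantiles with the exact t_{n-1} quantiles.
rng = np.random.default_rng(587)
mu, sigma, n, reps = 2.0, 3.0, 10, 100_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)               # sample standard deviation (divides by n-1)
t_sim = (xbar - mu) / (s / np.sqrt(n))

probs = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
print(np.quantile(t_sim, probs))        # simulated quantiles
print(stats.t.ppf(probs, df=n - 1))     # exact t_{n-1} quantiles
```

The two printed rows should agree to a couple of decimal places, in line with the sampling distribution result above.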
Recommended publications
  • 1. How Different Is the T Distribution from the Normal?
    Statistics 101–106, Lecture 7 (20 October 98), © David Pollard. Read M&M §7.1 and §7.2, ignoring starred parts. Reread M&M §3.2. The effects of estimated variances on normal approximations; t-distributions; comparison of two means: pooling of estimates of variances, or paired observations. In Lecture 6, when discussing comparison of two Binomial proportions, I was content to estimate unknown variances when calculating statistics that were to be treated as approximately normally distributed. You might have worried about the effect of variability of the estimate. W. S. Gosset ("Student") considered a similar problem in a very famous 1908 paper, where the role of Student's t-distribution was first recognized. Gosset discovered that the effect of estimated variances could be described exactly in a simplified problem where $n$ independent observations $X_1, \ldots, X_n$ are taken from a normal distribution, $N(\mu, \sigma)$. The sample mean, $\overline{X} = (X_1 + \cdots + X_n)/n$, has a $N(\mu, \sigma/\sqrt{n})$ distribution. The random variable $Z = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}}$ has a standard normal distribution. If we estimate $\sigma^2$ by the sample variance, $s^2 = \sum_i (X_i - \overline{X})^2/(n-1)$, then the resulting statistic, $T = \frac{\overline{X} - \mu}{s/\sqrt{n}}$, no longer has a normal distribution. It has a t-distribution on $n-1$ degrees of freedom. Remark: I have written $T$, instead of the $t$ used by M&M page 505. I find it causes confusion that $t$ refers to both the name of the statistic and the name of its distribution. As you will soon see, the estimation of the variance has the effect of spreading out the distribution a little beyond what it would be if $\sigma$ were used.
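    One quick way to see the "spreading out" Pollard mentions is to compare quantiles directly. A minimal sketch, assuming SciPy is available (the 0.975 probability level and the sample sizes are arbitrary choices):

```python
from scipy import stats

# For small n, the t distribution on n-1 degrees of freedom has wider
# quantiles than the standard normal; the gap shrinks as n grows.
for n in (3, 10, 30):
    print(n, stats.t.ppf(0.975, df=n - 1), stats.norm.ppf(0.975))
```

As $n$ grows, the t quantile approaches the normal value 1.96.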
  • 5. The Student t Distribution
    Virtual Laboratories > 4. Special Distributions > 5. The Student t Distribution. In this section we will study a distribution that has special importance in statistics. In particular, this distribution will arise in the study of a standardized version of the sample mean when the underlying distribution is normal. The Probability Density Function. Suppose that $Z$ has the standard normal distribution, $V$ has the chi-squared distribution with $n$ degrees of freedom, and that $Z$ and $V$ are independent. Let

$$T = \frac{Z}{\sqrt{V/n}}.$$

In the following exercise, you will show that $T$ has probability density function given by

$$f(t) = \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \quad t \in \mathbb{R}.$$

1. Show that $T$ has the given probability density function by using the following steps. a. Show first that the conditional distribution of $T$ given $V = v$ is normal with mean 0 and variance $n/v$. b. Use (a) to find the joint probability density function of $(T, V)$. c. Integrate the joint probability density function in (b) with respect to $v$ to find the probability density function of $T$. The distribution of $T$ is known as the Student t distribution with $n$ degrees of freedom. The distribution is well defined for any $n > 0$, but in practice, only positive integer values of $n$ are of interest. This distribution was first studied by William Gosset, who published under the pseudonym Student. In addition to supplying the proof, Exercise 1 provides a good way of thinking of the t distribution: the t distribution arises when the variance of a mean 0 normal distribution is randomized in a certain way.
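    The construction $T = Z/\sqrt{V/n}$ is easy to simulate. A minimal sketch, assuming NumPy and SciPy are available (the seed, $n$, and evaluation points are arbitrary choices):

```python
import numpy as np
from scipy import stats

# Simulate T = Z / sqrt(V/n) with Z standard normal and V chi-squared(n),
# Z and V independent; compare the empirical CDF with the exact t_n CDF.
rng = np.random.default_rng(0)
n, reps = 5, 200_000
z = rng.standard_normal(reps)
v = rng.chisquare(n, reps)
t_sim = z / np.sqrt(v / n)

for x in (-2.0, 0.0, 1.5):
    print(x, np.mean(t_sim <= x), stats.t.cdf(x, df=n))
```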
  • A Study of Non-Central Skew T Distributions and Their Applications in Data Analysis and Change Point Detection
    A STUDY OF NON-CENTRAL SKEW T DISTRIBUTIONS AND THEIR APPLICATIONS IN DATA ANALYSIS AND CHANGE POINT DETECTION. Abeer M. Hasan. A Dissertation Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY, August 2013. Committee: Arjun K. Gupta, Co-advisor; Wei Ning, Advisor; Mark Earley, Graduate Faculty Representative; Junfeng Shang. Copyright © August 2013, Abeer M. Hasan. All rights reserved. ABSTRACT: Over the past three decades there has been a growing interest in searching for distribution families that are suitable to analyze skewed data with excess kurtosis. The search started with numerous papers on the skew normal distribution. Multivariate t distributions started to catch attention shortly after the development of the multivariate skew normal distribution. Many researchers proposed alternative methods to generalize the univariate t distribution to the multivariate case. Recently, the skew t distribution has become popular in research. Skew t distributions provide more flexibility and better ability to accommodate long-tailed data than skew normal distributions. In this dissertation, a new non-central skew t distribution is studied and its theoretical properties are explored. Applications of the proposed non-central skew t distribution in data analysis and model comparisons are studied. An extension of our distribution to the multivariate case is presented, and properties of the multivariate non-central skew t distribution are discussed. We also discuss the distribution of quadratic forms of the non-central skew t distribution. In the last chapter, the change point problem of the non-central skew t distribution is discussed under different settings.
  • 1 One Parameter Exponential Families
    1 One parameter exponential families. The world of exponential families bridges the gap between the Gaussian family and general distributions. Many properties of Gaussians carry through to exponential families in a fairly precise sense. • In the Gaussian world, there are exact small sample distributional results (i.e. t, F, χ²). • In the exponential family world, there are approximate distributional results (i.e. deviance tests). • In the general setting, we can only appeal to asymptotics. A one-parameter exponential family, $\mathcal{F}$, is a one-parameter family of distributions of the form

$$P_\eta(dx) = \exp(\eta \cdot t(x) - \Lambda(\eta))\, P_0(dx)$$

for some probability measure $P_0$. The parameter $\eta$ is called the natural or canonical parameter and the function $\Lambda$ is called the cumulant generating function, and is simply the normalization needed to make

$$f_\eta(x) = \frac{dP_\eta}{dP_0}(x) = \exp(\eta \cdot t(x) - \Lambda(\eta))$$

a proper probability density. The random variable $t(X)$ is the sufficient statistic of the exponential family. Note that $P_0$ does not have to be a distribution on $\mathbb{R}$, but these are of course the simplest examples. 1.0.1 A first example: Gaussian with linear sufficient statistic. Consider the standard normal distribution

$$P_0(A) = \int_A \frac{e^{-z^2/2}}{\sqrt{2\pi}}\, dz$$

and let $t(x) = x$. Then, the exponential family is

$$P_\eta(dx) \propto \frac{e^{\eta \cdot x - x^2/2}}{\sqrt{2\pi}}$$

and we see that $\Lambda(\eta) = \eta^2/2$.

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot the cumulant generating function Lambda(eta) = eta^2 / 2.
eta = np.linspace(-2, 2, 101)
CGF = eta**2 / 2.
plt.plot(eta, CGF)
A = plt.gca()
A.set_xlabel(r'$\eta$', size=20)
A.set_ylabel(r'$\Lambda(\eta)$', size=20)
f = plt.gcf()
```

Thus, the exponential family in this setting is the collection $\mathcal{F} = \{N(\eta, 1) : \eta \in \mathbb{R}\}$. 1.0.2 Normal with quadratic sufficient statistic on $\mathbb{R}^d$. As a second example, take $P_0 = N(0, I_{d \times d})$, i.e.
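    Since $\Lambda$ is the log normalizer, here $\Lambda(\eta) = \log E_{P_0}[e^{\eta X}]$, and the identity $\Lambda(\eta) = \eta^2/2$ can be checked by Monte Carlo. A minimal sketch, assuming NumPy is available (the seed, sample size, and $\eta$ values are arbitrary):

```python
import numpy as np

# Monte Carlo check that log E_{P0}[exp(eta * X)] equals eta^2 / 2
# when P0 is the standard normal distribution.
rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)
for eta in (-1.0, 0.5, 2.0):
    cgf_mc = np.log(np.mean(np.exp(eta * z)))   # Monte Carlo estimate
    print(eta, cgf_mc, eta**2 / 2)              # estimate vs. exact value
```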
  • Random Variables and Probability Distributions
    RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS. 1. DISCRETE RANDOM VARIABLES. 1.1. Definition of a Discrete Random Variable. A random variable X is said to be discrete if it can assume only a finite or countably infinite number of distinct values. A discrete random variable can be defined on either a countable or an uncountable sample space. 1.2. Probability for a discrete random variable. The probability that X takes on the value x, P(X = x), is defined as the sum of the probabilities of all sample points in Ω that are assigned the value x. We may denote P(X = x) by p(x) or p_X(x). The expression p_X(x) is a function that assigns probabilities to each possible value x; thus it is often called the probability function for the random variable X. 1.3. Probability distribution for a discrete random variable. The probability distribution for a discrete random variable X can be represented by a formula, a table, or a graph, which provides p_X(x) = P(X = x) for all x. The probability distribution for a discrete random variable assigns nonzero probabilities to only a countable number of distinct x values. Any value x not explicitly assigned a positive probability is understood to be such that P(X = x) = 0. The function p_X(x) = P(X = x) for each x within the range of X is called the probability distribution of X. It is often called the probability mass function for the discrete random variable X. 1.4. Properties of the probability distribution for a discrete random variable.
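    As a concrete illustration of a probability mass function, here is a minimal sketch (the random variable, the number of heads in two fair coin flips, is an invented example, not from the text):

```python
# PMF of X = number of heads in two fair coin flips.
p_X = {0: 0.25, 1: 0.50, 2: 0.25}

print(p_X[1])             # P(X = 1)
print(p_X.get(3, 0.0))    # any value not explicitly assigned has probability 0
print(sum(p_X.values()))  # the probabilities sum to 1
```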
  • On the Meaning and Use of Kurtosis
    Psychological Methods, 1997, Vol. 2, No. 3, 292-307. Copyright 1997 by the American Psychological Association, Inc. On the Meaning and Use of Kurtosis. Lawrence T. DeCarlo, Fordham University. For symmetric unimodal distributions, positive kurtosis indicates heavy tails and peakedness relative to the normal distribution, whereas negative kurtosis indicates light tails and flatness. Many textbooks, however, describe or illustrate kurtosis incompletely or incorrectly. In this article, kurtosis is illustrated with well-known distributions, and aspects of its interpretation and misinterpretation are discussed. The role of kurtosis in testing univariate and multivariate normality; as a measure of departures from normality; in issues of robustness, outliers, and bimodality; in generalized tests and estimators; as well as limitations of and alternatives to the kurtosis measure $\beta_2$, are discussed. It is typically noted in introductory statistics courses that distributions can be characterized in terms of central tendency, variability, and shape. With respect to shape, virtually every textbook defines and illustrates skewness. On the other hand, another aspect of shape, which is kurtosis, is either not discussed or, worse yet, is often described or illustrated incorrectly. Kurtosis is also frequently not reported in research articles, in spite of the fact that virtually every statistical package provides a measure of kurtosis. [...] standard deviation. The normal distribution has a kurtosis of 3, and $\beta_2 - 3$ is often used so that the reference normal distribution has a kurtosis of zero ($\beta_2 - 3$ is sometimes denoted as $\gamma_2$). A sample counterpart to $\beta_2$ can be obtained by replacing the population moments with the sample moments, which gives

$$b_2 = \frac{\sum (X_i - \overline{X})^4 / n}{\left(\sum (X_i - \overline{X})^2 / n\right)^2},$$
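    The sample counterpart $b_2$ is straightforward to compute. A minimal sketch, assuming NumPy is available (the seed and sample sizes are arbitrary; the $t_5$ comparison is an added illustration of heavy tails):

```python
import numpy as np

# Sample kurtosis b2: fourth central sample moment divided by the
# squared second central sample moment.
def b2(x):
    x = np.asarray(x)
    m = x.mean()
    return np.mean((x - m)**4) / np.mean((x - m)**2)**2

rng = np.random.default_rng(1)
print(b2(rng.standard_normal(100_000)))  # ~3 for normal data
print(b2(rng.standard_t(5, 100_000)))    # around 9 for t_5 (heavy tails; noisy estimate)
```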
  • Random Processes
    Chapter 6: Random Processes. Random Process. • A random process is a time-varying function that assigns the outcome of a random experiment to each time instant: X(t). • For a fixed outcome (sample path): a random process is a time-varying function, e.g., a signal. • For fixed t: a random process is a random variable. • If one scans all possible outcomes of the underlying random experiment, we shall get an ensemble of signals. • A random process can be continuous or discrete. • A real random process is also called a stochastic process. Example: a noise source (noise can often be modeled as a Gaussian random process). An Ensemble of Signals. Remember: an RV maps events → constants; an RP maps events → functions f(t). RP: Discrete and Continuous. The set of all possible sample functions {v(t, Eᵢ)} is called the ensemble and defines the random process v(t) that describes the noise source. [Figure: sample functions of a binary random process.] RP Characterization. • Random variables x₁, x₂, ..., xₙ represent amplitudes of sample functions at t = t₁, t₂, ..., tₙ. A random process can, therefore, be viewed as a collection of an infinite number of random variables. RP Characterization – First Order. • CDF • PDF • Mean • Mean-Square. Statistics of a Random Process. RP Characterization – Second Order. • The first order does not provide sufficient information as to how rapidly the RP is changing as a function of time → we use second-order estimation.
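    A minimal sketch of the ensemble view, assuming NumPy is available (the Gaussian noise model, time grid, and ensemble size are arbitrary illustrations):

```python
import numpy as np

# An ensemble of Gaussian noise sample functions, viewed both ways:
# fix the outcome to get a signal, or fix t to get a random variable.
rng = np.random.default_rng(6)
t = np.linspace(0.0, 1.0, 200)                       # time instants
ensemble = rng.normal(0.0, 1.0, size=(500, t.size))  # 500 sample functions X(t)

one_signal = ensemble[0]       # fixed outcome: a time-varying function (a signal)
at_fixed_t = ensemble[:, 100]  # fixed time instant: a random variable

print(at_fixed_t.mean(), at_fixed_t.var())  # first-order statistics at t[100]
```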
  • Beta-Cauchy Distribution: Some Properties and Applications
    Journal of Statistical Theory and Applications, Vol. 12, No. 4 (December 2013), 378-391. Beta-Cauchy Distribution: Some Properties and Applications. Etaf Alshawarbeh, Felix Famoye, and Carl Lee, Department of Mathematics, Central Michigan University, Mount Pleasant, MI 48859, USA. Emails: [email protected], [email protected], [email protected]. Abstract: Some properties of the four-parameter beta-Cauchy distribution, such as the mean deviation and Shannon's entropy, are obtained. The method of maximum likelihood is proposed to estimate the parameters of the distribution. A simulation study is carried out to assess the performance of the maximum likelihood estimates. The usefulness of the new distribution is illustrated by applying it to three empirical data sets and comparing the results to some existing distributions. The beta-Cauchy distribution is found to provide great flexibility in modeling symmetric and skewed heavy-tailed data sets. Keywords and Phrases: Beta family, mean deviation, entropy, maximum likelihood estimation. 1. Introduction. Eugene et al.¹ introduced a class of new distributions using the beta distribution as the generator and pointed out that this can be viewed as a family of generalized distributions of order statistics. Jones² studied some general properties of this family. This class of distributions has been referred to as "beta-generated distributions".³ This family of distributions provides some flexibility in modeling data sets with different shapes. Let F(x) be the cumulative distribution function (CDF) of a random variable X; then the CDF G(x) for the "beta-generated distributions" is given by

$$G(x) = [B(\alpha, \beta)]^{-1} \int_0^{F(x)} t^{\alpha-1}(1-t)^{\beta-1}\, dt, \quad \alpha, \beta > 0, \qquad (1)$$

where $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$.
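    Equation (1) with a Cauchy base CDF can be evaluated directly, since the integral is the regularized incomplete beta function. A minimal sketch, assuming SciPy is available (the parameter values are arbitrary illustrations):

```python
from scipy import stats
from scipy.special import betainc

# CDF of the beta-Cauchy distribution via equation (1): G(x) is the
# regularized incomplete beta function evaluated at the Cauchy CDF F(x).
alpha, beta_, loc, scale = 2.0, 3.0, 0.0, 1.0
x = 1.5
Fx = stats.cauchy.cdf(x, loc=loc, scale=scale)
print(betainc(alpha, beta_, Fx))   # G(x)
```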
  • 11. Parameter Estimation
    11. Parameter Estimation. Chris Piech and Mehran Sahami, May 2017. We have learned many different distributions for random variables, and all of those distributions had parameters: the numbers that you provide as input when you define a random variable. So far when we were working with random variables, we either were explicitly told the values of the parameters, or we could divine the values by understanding the process that was generating the random variables. What if we don't know the values of the parameters and we can't estimate them from our own expert knowledge? What if instead of knowing the random variables, we have a lot of examples of data generated with the same underlying distribution? In this chapter we are going to learn formal ways of estimating parameters from data. These ideas are critical for artificial intelligence. Almost all modern machine learning algorithms work like this: (1) specify a probabilistic model that has parameters; (2) learn the value of those parameters from data. Parameters. Before we dive into parameter estimation, first let's revisit the concept of parameters. Given a model, the parameters are the numbers that yield the actual distribution. In the case of a Bernoulli random variable, the single parameter was the value p. In the case of a Uniform random variable, the parameters are the a and b values that define the min and max value. Here is a list of random variables and the corresponding parameters. From now on, we are going to use the notation θ to be a vector of all the parameters:

Distribution: Parameters
Bernoulli(p): θ = p
Poisson(λ): θ = λ
Uniform(a, b): θ = (a, b)
Normal(μ, σ²): θ = (μ, σ²)
Y = mX + b: θ = (m, b)

In the real world often you don't know the "true" parameters, but you get to observe data.
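    As a small illustration of learning parameters from data, here is a minimal sketch (an invented example, not from the chapter; it assumes NumPy and uses the standard closed-form normal maximum likelihood estimates):

```python
import numpy as np

# Maximum likelihood estimation for Normal(mu, sigma^2): the MLEs are the
# sample mean and the (divide-by-n) sample variance.
rng = np.random.default_rng(42)
data = rng.normal(5.0, 2.0, size=10_000)   # data with known "true" parameters

mu_hat = data.mean()
sigma2_hat = np.mean((data - mu_hat)**2)
print(mu_hat, sigma2_hat)                  # close to 5.0 and 4.0
```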
  • Volatility Modeling Using the Student's T Distribution
    Volatility Modeling Using the Student's t Distribution. Maria S. Heracleous. Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Economics. Aris Spanos, Chair; Richard Ashley; Raman Kumar; Anya McGuirk; Dennis Yang. August 29, 2003, Blacksburg, Virginia. Keywords: Student's t Distribution, Multivariate GARCH, VAR, Exchange Rates. Copyright 2003, Maria S. Heracleous. (ABSTRACT) Over the last twenty years or so the Dynamic Volatility literature has produced a wealth of univariate and multivariate GARCH type models. While the univariate models have been relatively successful in empirical studies, they suffer from a number of weaknesses, such as unverifiable parameter restrictions, existence of moment conditions and the retention of Normality. These problems are naturally more acute in the multivariate GARCH type models, which in addition have the problem of overparameterization. This dissertation uses the Student's t distribution and follows the Probabilistic Reduction (PR) methodology to modify and extend the univariate and multivariate volatility models viewed as alternatives to the GARCH models. Its most important advantage is that it gives rise to internally consistent statistical models that do not require ad hoc parameter restrictions, unlike the GARCH formulations. Chapters 1 and 2 provide an overview of my dissertation and recent developments in the volatility literature. In Chapter 3 we provide an empirical illustration of the PR approach for modeling univariate volatility. Estimation results suggest that the Student's t AR model is a parsimonious and statistically adequate representation of exchange rate returns and Dow Jones returns data.
  • Chapter 7 “Continuous Distributions”.pdf
    CHAPTER 7: Continuous distributions. 7.1. Basic theory. 7.1.1. Definition, PDF, CDF. We start with the definition of a continuous random variable. Definition (Continuous random variables). A random variable X is said to have a continuous distribution if there exists a non-negative function $f = f_X$ such that

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

for every a and b. The function f is called the density function for X or the PDF for X. (More precisely, such an X is said to have an absolutely continuous distribution.) Note that $\int_{-\infty}^{\infty} f(x)\,dx = P(-\infty < X < \infty) = 1$. In particular, $P(X = a) = \int_a^a f(x)\,dx = 0$ for every a. Example 7.1. Suppose we are given that $f(x) = c/x^3$ for $x \ge 1$ and 0 otherwise. Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$ and

$$\int_{-\infty}^{\infty} f(x)\,dx = c\int_1^{\infty} \frac{1}{x^3}\,dx = \frac{c}{2},$$

we have c = 2. PMF or PDF? Probability mass function (PMF) and (probability) density function (PDF) are two names for the same notion in the case of discrete random variables. We say PDF or simply a density function for a general random variable, and we use PMF only for discrete random variables. Definition (Cumulative distribution function (CDF)). The distribution function of X is defined as

$$F(y) = F_X(y) := P(-\infty < X \le y) = \int_{-\infty}^{y} f(x)\,dx.$$

It is also called the cumulative distribution function (CDF) of X. We can define the CDF for any random variable, not just continuous ones, by setting $F(y) := P(X \le y)$.
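    Example 7.1 can be checked numerically. A minimal sketch, assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.integrate import quad

# Numerical check of Example 7.1: with c = 2, f(x) = 2/x^3 on [1, inf)
# integrates to 1, and P(X = a) = 0 for any single point a.
f = lambda x: 2.0 / x**3
total, _ = quad(f, 1, np.inf)
print(total)                # ~1.0
point, _ = quad(f, 2, 2)    # integral over the single point a = 2
print(point)                # 0.0
```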
  • (Introduction to Probability at an Advanced Level) - All Lecture Notes
    Fall 2018 Statistics 201A (Introduction to Probability at an advanced level) - All Lecture Notes. Aditya Guntuboyina, August 15, 2020. Contents: 0.1 Sample spaces, Events, Probability; 0.2 Conditional Probability and Independence; 0.3 Random Variables; 1 Random Variables, Expectation and Variance; 1.1 Expectations of Random Variables; 1.2 Variance; 2 Independence of Random Variables; 3 Common Distributions; 3.1 Ber(p) Distribution; 3.2 Bin(n, p) Distribution; 3.3 Poisson Distribution; 4 Covariance, Correlation and Regression; 5 Correlation and Regression; 6 Back to Common Distributions; 6.1 Geometric Distribution; 6.2 Negative Binomial Distribution; 7 Continuous Distributions; 7.1 Normal or Gaussian Distribution; 7.2 Uniform Distribution; 7.3 The Exponential Density; 7.4 The Gamma Density; 8 Variable Transformations; 9 Distribution Functions and the Quantile Transform; 10 Joint Densities; 11 Joint Densities under Transformations; 11.1 Detour to Convolutions...