Lecture 3. Inference About Multivariate Normal Distribution



3.1 Point and Interval Estimation

Let $X_1, \dots, X_n$ be i.i.d. $N_p(\mu, \Sigma)$. We are interested in the maximum likelihood estimates of $\mu$ and $\Sigma$. Recall that the density of $X_1$ is
$$ f(x) = |2\pi\Sigma|^{-1/2} \exp\left\{ -\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu) \right\}, \quad x \in \mathbb{R}^p. $$
The negative log-likelihood function, given observations $x_1^n = \{x_1, \dots, x_n\}$, is then
$$ \ell(\mu, \Sigma \mid x_1^n) = \frac{n}{2} \log|\Sigma| + \frac{1}{2} \sum_{i=1}^n (x_i - \mu)' \Sigma^{-1} (x_i - \mu) + c $$
$$ = \frac{n}{2} \log|\Sigma| + \frac{n}{2} (\bar{x} - \mu)' \Sigma^{-1} (\bar{x} - \mu) + \frac{1}{2} \sum_{i=1}^n (x_i - \bar{x})' \Sigma^{-1} (x_i - \bar{x}). $$
To this end, denote the centered data matrix by $\tilde{X} = [\tilde{x}_1, \dots, \tilde{x}_n]_{p \times n}$, where $\tilde{x}_i = x_i - \bar{x}$, and let
$$ S_0 = \frac{1}{n} \tilde{X} \tilde{X}' = \frac{1}{n} \sum_{i=1}^n \tilde{x}_i \tilde{x}_i'. $$

Proposition 1. The m.l.e.'s of $\mu$ and $\Sigma$, which jointly minimize $\ell(\mu, \Sigma \mid x_1^n)$, are
$$ \hat{\mu}_{\mathrm{MLE}} = \bar{x}, \qquad \hat{\Sigma}_{\mathrm{MLE}} = S_0. $$

Note that $S_0$ is a biased estimator of $\Sigma$; the sample variance-covariance matrix $S = \frac{n}{n-1} S_0$ is unbiased.

For interval estimation of $\mu$, we largely follow Section 7.1 of Härdle and Simar (2012). First note that since $\mu \in \mathbb{R}^p$, we need to generalize the notion of an interval (primarily defined for $\mathbb{R}^1$) to higher dimensions. A simple extension is a direct product of marginal intervals: for intervals $a < x < b$ and $c < y < d$, we obtain the rectangular region $\{(x, y) \in \mathbb{R}^2 : a < x < b,\ c < y < d\}$.

A confidence region $A \subset \mathbb{R}^p$ is composed of values of a function of the (random) observations $X_1, \dots, X_n$. $A = A(X_1^n)$ is a confidence region of size $1 - \alpha \in (0, 1)$ for the parameter $\mu$ if
$$ P(\mu \in A) \ge 1 - \alpha \quad \text{for all } \mu \in \mathbb{R}^p. $$

(Elliptical confidence region) Corollary 7 in Lecture 2 provides a pivot which paves the way to a confidence region for $\mu$. Since $\frac{n-p}{p} (\bar{X} - \mu)' S_0^{-1} (\bar{X} - \mu) \sim F_{p, n-p}$, and hence
$$ P\left( (\bar{X} - \mu)' S_0^{-1} (\bar{X} - \mu) < \frac{p}{n-p} F_{1-\alpha; p, n-p} \right) = 1 - \alpha, $$
the set
$$ A = \left\{ \mu \in \mathbb{R}^p : (\bar{X} - \mu)' S_0^{-1} (\bar{X} - \mu) < \frac{p}{n-p} F_{1-\alpha; p, n-p} \right\} $$
is a confidence region of size $1 - \alpha$ for $\mu$.
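As a numerical illustration of Proposition 1 and the elliptical confidence region, the following is a minimal sketch on simulated data (not from the lecture; the use of NumPy/SciPy and all variable names are my own assumptions):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(0)
n, p, alpha = 50, 3, 0.05
mu_true = np.array([1.0, -2.0, 0.5])
X = rng.multivariate_normal(mu_true, np.eye(p), size=n)  # i.i.d. N_p(mu, Sigma)

xbar = X.mean(axis=0)
S0 = (X - xbar).T @ (X - xbar) / n          # biased m.l.e. of Sigma
S = n / (n - 1) * S0                        # unbiased sample covariance

def in_confidence_region(mu0):
    """Check whether mu0 lies inside the elliptical 1-alpha region for mu."""
    d = xbar - mu0
    stat = d @ np.linalg.solve(S0, d)       # (xbar-mu0)' S0^{-1} (xbar-mu0)
    return stat < p / (n - p) * f.ppf(1 - alpha, p, n - p)

print(in_confidence_region(mu_true))
```

The region is centered at $\bar{x}$, so `in_confidence_region(xbar)` is always true, while points far from $\bar{x}$ are rejected.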
(Simultaneous confidence intervals) Simultaneous confidence intervals for all linear combinations $a'\mu$ of the elements of $\mu$, for arbitrary $a \in \mathbb{R}^p$, provide confidence of size $1 - \alpha$ for all intervals covering $a'\mu$, including the marginal means $\mu_1, \dots, \mu_p$. We seek lower and upper bounds $L(a)$ and $U(a)$ satisfying
$$ P(L(a) < a'\mu < U(a) \text{ for all } a \in \mathbb{R}^p) \ge 1 - \alpha, \quad \text{for all } \mu \in \mathbb{R}^p. $$

First consider a single confidence interval, fixing a particular vector $a$. To evaluate a confidence interval for $a'\mu$, write the new random variables $Y_i = a' X_i \sim N_1(a'\mu, a'\Sigma a)$, $i = 1, \dots, n$, whose squared t-statistic is
$$ t^2(a) = \frac{n (a'\mu - a'\bar{X})^2}{a' S a} \sim F_{1, n-1}. $$
Thus, for any fixed $a$,
$$ P(t^2(a) \le F_{1-\alpha; 1, n-1}) = 1 - \alpha. \qquad (1) $$

Next, consider many projection vectors $a_1, a_2, \dots, a_M$ ($M$ is finite only for convenience). Simultaneous confidence intervals of the type in (1) then require
$$ P\left( \bigcap_{i=1}^M \{ t^2(a_i) \le h(\alpha) \} \right) \ge 1 - \alpha $$
for some $h(\cdot)$. Collecting some facts:
1. $\max_a t^2(a) \le h(\alpha)$ implies $t^2(a_i) \le h(\alpha)$ for all $i$;
2. $\max_a t^2(a) = n (\mu - \bar{X})' S^{-1} (\mu - \bar{X})$;
3. Corollary 7 in Lecture 2;
we have, for $h(\alpha) = \frac{(n-1) p}{n-p} F_{1-\alpha; p, n-p}$,
$$ P\left( \bigcap_{i=1}^M \{ t^2(a_i) \le h(\alpha) \} \right) \ge P\left( \max_a t^2(a) \le h(\alpha) \right) = 1 - \alpha. $$

Proposition 2. Simultaneously for all $a \in \mathbb{R}^p$, the interval
$$ a'\bar{X} \pm \sqrt{h(\alpha)\, a' S a / n} $$
contains $a'\mu$ with probability $1 - \alpha$.

Example 1. From the Golub gene expression data, with dimension $d = 7129$, take the first and the 1674th variables (genes), to focus on the bivariate case ($p = 2$). There are two populations: 11 observations from AML and 27 from ALL. Figure 1 illustrates the elliptical confidence regions of size 95% and 99%. Figure 2 compares the elliptical confidence region with the simultaneous confidence intervals for $a_1 = (1, 0)'$ and $a_2 = (0, 1)'$.

[Figure 1: Golub data, ALL (red) and AML (blue); gene #1 vs. gene #1674. Elliptical confidence regions of size 95% and 99%.]
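The interval of Proposition 2 can be computed directly. Below is a hedged sketch on simulated data (the data, seed, and variable names are my own; the coordinate directions mimic $a_1$ and $a_2$ of Example 1):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
n, p, alpha = 40, 2, 0.05
X = rng.multivariate_normal([0.0, 5.0], [[2.0, 0.5], [0.5, 1.0]], size=n)

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                        # unbiased sample covariance
h = (n - 1) * p / (n - p) * f.ppf(1 - alpha, p, n - p)   # h(alpha)

def simultaneous_ci(a):
    """Interval a'xbar +/- sqrt(h(alpha) a'Sa / n), valid for all a at once."""
    half = np.sqrt(h * (a @ S @ a) / n)
    return a @ xbar - half, a @ xbar + half

for a in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    lo, hi = simultaneous_ci(a)
    print(f"a = {a}: ({lo:.3f}, {hi:.3f})")
```

Note that $h(\alpha) > F_{1-\alpha;1,n-1}$, so each simultaneous interval is wider than the corresponding marginal t-interval; this is the price of simultaneity over all $a$.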
[Figure 2: Golub data, ALL (red) and AML (blue); gene #1 vs. gene #1674. Simultaneous confidence intervals of size 95%.]

3.2 Hypotheses Testing

Consider testing a null hypothesis $H_0: \theta \in \Omega_0$ against an alternative hypothesis $H_1: \theta \in \Omega_1$. The principle of the likelihood ratio test is as follows. Let $L_0$ be the maximized likelihood under $H_0: \theta \in \Omega_0$, and let $L_1$ be the maximized likelihood under $\theta \in \Omega_0 \cup \Omega_1$. The likelihood ratio statistic, sometimes called the Wilks statistic, is then
$$ W = -2 \log(L_0 / L_1) \ge 0. $$
The null hypothesis is rejected if the observed value of $W$ is large. In some cases the exact distribution of $W$ under $H_0$ can be evaluated. In other cases, Wilks' theorem states that for large $n$ (sample size),
$$ W \xrightarrow{L} \chi^2_\nu, $$
where $\nu$ is the number of free parameters in $H_1$ that are not in $H_0$: if the degrees of freedom of $\Omega_0$ is $q$ and that of $\Omega_0 \cup \Omega_1$ is $r$, then $\nu = r - q$.

Consider testing hypotheses on $\mu$ and $\Sigma$ of the multivariate normal distribution, based on the $n$-sample $X_1, \dots, X_n$.

Case I: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$; $\Sigma$ is known. In this case, we know the exact distribution of the likelihood ratio statistic:
$$ W = n (\bar{x} - \mu_0)' \Sigma^{-1} (\bar{x} - \mu_0) \sim \chi^2_p \quad \text{under } H_0. $$

Case II: $H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$; $\Sigma$ is unknown. The m.l.e.'s under $H_1$ are $\hat{\mu} = \bar{x}$ and $\hat{\Sigma} = S_0$. The restricted m.l.e. of $\Sigma$ under $H_0$ is
$$ \hat{\Sigma}_{(0)} = \frac{1}{n} \sum_{i=1}^n (x_i - \mu_0)(x_i - \mu_0)' = S_0 + \frac{1}{n} \delta \delta', \quad \text{where } \delta = \sqrt{n} (\bar{x} - \mu_0). $$
The likelihood ratio statistic is then
$$ W = n \log\left|S_0 + \tfrac{1}{n} \delta \delta'\right| - n \log|S_0|. $$
It turns out that $W$ is a monotone increasing function of
$$ \delta' S^{-1} \delta = n (\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0), $$
which is Hotelling's $T^2(n-1)$ statistic.

Case III: $H_0: \Sigma = \Sigma_0$, $H_1: \Sigma \ne \Sigma_0$; $\mu$ is unknown. We have the likelihood ratio statistic
$$ W = -n \log|\Sigma_0^{-1} S_0| - np + n\, \mathrm{trace}(\Sigma_0^{-1} S_0). $$
This is a case where the exact distribution of $W$ is difficult to evaluate. For large $n$, use Wilks' theorem to approximate the distribution of $W$ by $\chi^2_\nu$ with degrees of freedom $\nu = p(p+1)/2$.
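A quick numerical sketch of Cases I and II on data simulated under $H_0$ (NumPy/SciPy and all variable names are my own assumptions, not part of the lecture):

```python
import numpy as np
from scipy.stats import chi2, f

rng = np.random.default_rng(2)
n, p = 30, 4
Sigma = np.diag([1.0, 2.0, 0.5, 1.5])              # known covariance for Case I
mu0 = np.zeros(p)
X = rng.multivariate_normal(mu0, Sigma, size=n)    # data generated under H0

xbar = X.mean(axis=0)
d = xbar - mu0

# Case I (Sigma known): W = n (xbar-mu0)' Sigma^{-1} (xbar-mu0) ~ chi^2_p exactly
W = n * d @ np.linalg.solve(Sigma, d)
pval_I = chi2.sf(W, df=p)

# Case II (Sigma unknown): Hotelling's T^2 with the unbiased S, calibrated via
# (n-p)/((n-1)p) * T^2 ~ F_{p, n-p}
S = np.cov(X, rowvar=False)
T2 = n * d @ np.linalg.solve(S, d)
pval_II = f.sf((n - p) / ((n - 1) * p) * T2, p, n - p)
print(pval_I, pval_II)
```

With the data drawn under $H_0$, both p-values should typically be non-small; rerunning with a shifted mean drives them toward zero.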
Next, consider testing the equality of two mean vectors. Let $X_{11}, \dots, X_{1 n_1}$ be i.i.d. $N_p(\mu_1, \Sigma)$ and $X_{21}, \dots, X_{2 n_2}$ be i.i.d. $N_p(\mu_2, \Sigma)$.

Case IV: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$; $\Sigma$ is unknown. Since
$$ \bar{X}_1 - \bar{X}_2 \sim N_p\left( \mu_1 - \mu_2, \frac{n_1 + n_2}{n_1 n_2} \Sigma \right), \qquad (n_1 + n_2 - 2) S_P \sim W_p(n_1 + n_2 - 2, \Sigma), $$
we have Hotelling's $T^2$ statistic for the two-sample problem,
$$ T^2(n_1 + n_2 - 2) = \frac{n_1 n_2}{n_1 + n_2} (\bar{X}_1 - \bar{X}_2)' S_P^{-1} (\bar{X}_1 - \bar{X}_2), $$
and, by Theorem 5 in Lecture 2,
$$ \frac{n_1 + n_2 - p - 1}{(n_1 + n_2 - 2) p} T^2(n_1 + n_2 - 2) \sim F_{p, n_1 + n_2 - p - 1}. $$
Similar to Case II above, the likelihood ratio statistic is a monotone function of $T^2(n_1 + n_2 - 2)$.

3.3 Hypothesis Testing when p > n

In the high-dimensional situation where the dimension $p$ is larger than the sample size ($p > n - 1$ or $p > n_1 + n_2 - 2$), the sample covariance $S$ is not invertible, so Hotelling's $T^2$ statistic, which is essential in the testing procedures above, cannot be computed. We survey important proposals for testing hypotheses on means for high dimension, low sample size data.

A basic idea in generalizing a test procedure to the $p > n$ case is to base the test on a computable test statistic that is also an estimator of $\|\mu - \mu_0\|$ or $\|\mu_1 - \mu_2\|$.

In Case II (one sample), Dempster (1960) proposed to replace $S^{-1}$ in Hotelling's statistic by $(\mathrm{trace}(S) I_p)^{-1}$. He showed that under $H_0: \mu = 0$,
$$ T_D = \frac{n \bar{X}' \bar{X}}{\mathrm{trace}(S)} \sim F_{r, (n-1) r}, \quad \text{approximately}, $$
for $r = (\mathrm{trace}(\Sigma))^2 / \mathrm{trace}(\Sigma^2)$, a measure of the sphericity of $\Sigma$. An estimator $\hat{r}$ of $r$ is used in testing.

Bai and Saranadasa (1996) proposed to simply replace $S^{-1}$ in Hotelling's statistic by $I_p$, yielding $T_B = n \bar{X}' \bar{X}$. However, $\bar{X}' \bar{X}$ is not an unbiased estimator of $\mu' \mu$, since $E(\bar{X}' \bar{X}) = \frac{1}{n} \mathrm{trace}(\Sigma) + \mu' \mu$. They showed that the standardized statistic
$$ M_B = \frac{n \bar{X}' \bar{X} - \mathrm{trace}(S)}{\widehat{\mathrm{sd}}(n \bar{X}' \bar{X} - \mathrm{trace}(S))} = \frac{n \bar{X}' \bar{X} - \mathrm{trace}(S)}{\sqrt{\frac{2 (n-1) n}{(n-2)(n+1)} \left( \mathrm{trace}(S^2) - \frac{1}{n} (\mathrm{trace}(S))^2 \right)}} $$
has an asymptotic $N(0, 1)$ distribution as $p, n \to \infty$.
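The Bai–Saranadasa statistic $M_B$ stays computable even when $p > n$, since it never inverts $S$. A minimal sketch on simulated null data (the seed, sizes, and variable names are my own assumptions; the formula follows the standardization displayed above):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 20, 100                              # high dimension, low sample size: p > n
X = rng.standard_normal((n, p))             # H0: mu = 0, Sigma = I_p

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                 # p x p, rank-deficient since p > n

num = n * xbar @ xbar - np.trace(S)
# estimated variance of the numerator, from the displayed formula
var_hat = 2 * (n - 1) * n / ((n - 2) * (n + 1)) * (
    np.trace(S @ S) - np.trace(S) ** 2 / n
)
MB = num / np.sqrt(var_hat)                 # approximately N(0, 1) under H0
print(MB)
```

The variance estimate is always positive here: since $\mathrm{rank}(S) \le n - 1$, Cauchy–Schwarz gives $\mathrm{trace}(S^2) \ge (\mathrm{trace}\,S)^2 / (n-1) > (\mathrm{trace}\,S)^2 / n$.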