A Primer on Mathematical Statistics and Univariate Distributions; the Normal Distribution; the GLM with the Normal Distribution
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Mathematical Statistics
Mathematical Statistics MAS 713 Chapter 1.2 Previous subchapter 1 What is statistics ? 2 The statistical process 3 Population and samples 4 Random sampling Questions? Mathematical Statistics (MAS713) Ariel Neufeld 2 / 70 This subchapter Descriptive Statistics 1.2.1 Introduction 1.2.2 Types of variables 1.2.3 Graphical representations 1.2.4 Descriptive measures Mathematical Statistics (MAS713) Ariel Neufeld 3 / 70 1.2 Descriptive Statistics 1.2.1 Introduction Introduction As we said, statistics is the art of learning from data However, statistical data, obtained from surveys, experiments, or any series of measurements, are often so numerous that they are virtually useless unless they are condensed ; data should be presented in ways that facilitate their interpretation and subsequent analysis Naturally enough, the aspect of statistics which deals with organising, describing and summarising data is called descriptive statistics Mathematical Statistics (MAS713) Ariel Neufeld 4 / 70 1.2 Descriptive Statistics 1.2.2 Types of variables Types of variables Mathematical Statistics (MAS713) Ariel Neufeld 5 / 70 1.2 Descriptive Statistics 1.2.2 Types of variables Types of variables The range of available descriptive tools for a given data set depends on the type of the considered variable There are essentially two types of variables : 1 categorical (or qualitative) variables : take a value that is one of several possible categories (no numerical meaning) Ex. : gender, hair color, field of study, political affiliation, status, ... 2 numerical (or quantitative) -
Relationships Among Some Univariate Distributions
IIE Transactions (2005) 37, 651–656 Copyright C “IIE” ISSN: 0740-817X print / 1545-8830 online DOI: 10.1080/07408170590948512 Relationships among some univariate distributions WHEYMING TINA SONG Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu, Taiwan, Republic of China E-mail: [email protected], wheyming [email protected] Received March 2004 and accepted August 2004 The purpose of this paper is to graphically illustrate the parametric relationships between pairs of 35 univariate distribution families. The families are organized into a seven-by-five matrix and the relationships are illustrated by connecting related families with arrows. A simplified matrix, showing only 25 families, is designed for student use. These relationships provide rapid access to information that must otherwise be found from a time-consuming search of a large number of sources. Students, teachers, and practitioners who model random processes will find the relationships in this article useful and insightful. 1. Introduction 2. A seven-by-five matrix An understanding of probability concepts is necessary Figure 1 illustrates 35 univariate distributions in 35 if one is to gain insights into systems that can be rectangle-like entries. The row and column numbers are modeled as random processes. From an applications labeled on the left and top of Fig. 1, respectively. There are point of view, univariate probability distributions pro- 10 discrete distributions, shown in the first two rows, and vide an important foundation in probability theory since 25 continuous distributions. Five commonly used sampling they are the underpinnings of the most-used models in distributions are listed in the third row. -
This History of Modern Mathematical Statistics Retraces Their Development
BOOK REVIEWS GORROOCHURN Prakash, 2016, Classic Topics on the History of Modern Mathematical Statistics: From Laplace to More Recent Times, Hoboken, NJ, John Wiley & Sons, Inc., 754 p. This history of modern mathematical statistics retraces their development from the “Laplacean revolution,” as the author so rightly calls it (though the beginnings are to be found in Bayes’ 1763 essay(1)), through the mid-twentieth century and Fisher’s major contribution. Up to the nineteenth century the book covers the same ground as Stigler’s history of statistics(2), though with notable differences (see infra). It then discusses developments through the first half of the twentieth century: Fisher’s synthesis but also the renewal of Bayesian methods, which implied a return to Laplace. Part I offers an in-depth, chronological account of Laplace’s approach to probability, with all the mathematical detail and deductions he drew from it. It begins with his first innovative articles and concludes with his philosophical synthesis showing that all fields of human knowledge are connected to the theory of probabilities. Here Gorrouchurn raises a problem that Stigler does not, that of induction (pp. 102-113), a notion that gives us a better understanding of probability according to Laplace. The term induction has two meanings, the first put forward by Bacon(3) in 1620, the second by Hume(4) in 1748. Gorroochurn discusses only the second. For Bacon, induction meant discovering the principles of a system by studying its properties through observation and experimentation. For Hume, induction was mere enumeration and could not lead to certainty. Laplace followed Bacon: “The surest method which can guide us in the search for truth, consists in rising by induction from phenomena to laws and from laws to forces”(5). -
Applied Time Series Analysis
Applied Time Series Analysis SS 2018 February 12, 2018 Dr. Marcel Dettling Institute for Data Analysis and Process Design Zurich University of Applied Sciences CH-8401 Winterthur Table of Contents 1 INTRODUCTION 1 1.1 PURPOSE 1 1.2 EXAMPLES 2 1.3 GOALS IN TIME SERIES ANALYSIS 8 2 MATHEMATICAL CONCEPTS 11 2.1 DEFINITION OF A TIME SERIES 11 2.2 STATIONARITY 11 2.3 TESTING STATIONARITY 13 3 TIME SERIES IN R 15 3.1 TIME SERIES CLASSES 15 3.2 DATES AND TIMES IN R 17 3.3 DATA IMPORT 21 4 DESCRIPTIVE ANALYSIS 23 4.1 VISUALIZATION 23 4.2 TRANSFORMATIONS 26 4.3 DECOMPOSITION 29 4.4 AUTOCORRELATION 50 4.5 PARTIAL AUTOCORRELATION 66 5 STATIONARY TIME SERIES MODELS 69 5.1 WHITE NOISE 69 5.2 ESTIMATING THE CONDITIONAL MEAN 70 5.3 AUTOREGRESSIVE MODELS 71 5.4 MOVING AVERAGE MODELS 85 5.5 ARMA(P,Q) MODELS 93 6 SARIMA AND GARCH MODELS 99 6.1 ARIMA MODELS 99 6.2 SARIMA MODELS 105 6.3 ARCH/GARCH MODELS 109 7 TIME SERIES REGRESSION 113 7.1 WHAT IS THE PROBLEM? 113 7.2 FINDING CORRELATED ERRORS 117 7.3 COCHRANE‐ORCUTT METHOD 124 7.4 GENERALIZED LEAST SQUARES 125 7.5 MISSING PREDICTOR VARIABLES 131 8 FORECASTING 137 8.1 STATIONARY TIME SERIES 138 8.2 SERIES WITH TREND AND SEASON 145 8.3 EXPONENTIAL SMOOTHING 152 9 MULTIVARIATE TIME SERIES ANALYSIS 161 9.1 PRACTICAL EXAMPLE 161 9.2 CROSS CORRELATION 165 9.3 PREWHITENING 168 9.4 TRANSFER FUNCTION MODELS 170 10 SPECTRAL ANALYSIS 175 10.1 DECOMPOSING IN THE FREQUENCY DOMAIN 175 10.2 THE SPECTRUM 179 10.3 REAL WORLD EXAMPLE 186 11 STATE SPACE MODELS 187 11.1 STATE SPACE FORMULATION 187 11.2 AR PROCESSES WITH MEASUREMENT NOISE 188 11.3 DYNAMIC LINEAR MODELS 191 ATSA 1 Introduction 1 Introduction 1.1 Purpose Time series data, i.e. -
Mathematical Statistics the Sample Distribution of the Median
Course Notes for Math 162: Mathematical Statistics The Sample Distribution of the Median Adam Merberg and Steven J. Miller February 15, 2008 Abstract We begin by introducing the concept of order statistics and ¯nding the density of the rth order statistic of a sample. We then consider the special case of the density of the median and provide some examples. We conclude with some appendices that describe some of the techniques and background used. Contents 1 Order Statistics 1 2 The Sample Distribution of the Median 2 3 Examples and Exercises 4 A The Multinomial Distribution 5 B Big-Oh Notation 6 C Proof That With High Probability jX~ ¡ ¹~j is Small 6 D Stirling's Approximation Formula for n! 7 E Review of the exponential function 7 1 Order Statistics Suppose that the random variables X1;X2;:::;Xn constitute a sample of size n from an in¯nite population with continuous density. Often it will be useful to reorder these random variables from smallest to largest. In reordering the variables, we will also rename them so that Y1 is a random variable whose value is the smallest of the Xi, Y2 is the next smallest, and th so on, with Yn the largest of the Xi. Yr is called the r order statistic of the sample. In considering order statistics, it is naturally convenient to know their probability density. We derive an expression for the distribution of the rth order statistic as in [MM]. Theorem 1.1. For a random sample of size n from an in¯nite population having values x and density f(x), the probability th density of the r order statistic Yr is given by ·Z ¸ ·Z ¸ n! yr r¡1 1 n¡r gr(yr) = f(x) dx f(yr) f(x) dx : (1.1) (r ¡ 1)!(n ¡ r)! ¡1 yr Proof. -
Nonparametric Multivariate Kurtosis and Tailweight Measures
Nonparametric Multivariate Kurtosis and Tailweight Measures Jin Wang1 Northern Arizona University and Robert Serfling2 University of Texas at Dallas November 2004 – final preprint version, to appear in Journal of Nonparametric Statistics, 2005 1Department of Mathematics and Statistics, Northern Arizona University, Flagstaff, Arizona 86011-5717, USA. Email: [email protected]. 2Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75083- 0688, USA. Email: [email protected]. Website: www.utdallas.edu/∼serfling. Support by NSF Grant DMS-0103698 is gratefully acknowledged. Abstract For nonparametric exploration or description of a distribution, the treatment of location, spread, symmetry and skewness is followed by characterization of kurtosis. Classical moment- based kurtosis measures the dispersion of a distribution about its “shoulders”. Here we con- sider quantile-based kurtosis measures. These are robust, are defined more widely, and dis- criminate better among shapes. A univariate quantile-based kurtosis measure of Groeneveld and Meeden (1984) is extended to the multivariate case by representing it as a transform of a dispersion functional. A family of such kurtosis measures defined for a given distribution and taken together comprises a real-valued “kurtosis functional”, which has intuitive appeal as a convenient two-dimensional curve for description of the kurtosis of the distribution. Several multivariate distributions in any dimension may thus be compared with respect to their kurtosis in a single two-dimensional plot. Important properties of the new multivariate kurtosis measures are established. For example, for elliptically symmetric distributions, this measure determines the distribution within affine equivalence. Related tailweight measures, influence curves, and asymptotic behavior of sample versions are also discussed. -
Why Distinguish Between Statistics and Mathematical Statistics – the Case of Swedish Academia
Why distinguish between statistics and mathematical statistics { the case of Swedish academia Peter Guttorp1 and Georg Lindgren2 1Department of Statistics, University of Washington, Seattle 2Mathematical Statistics, Lund University, Lund 2017/10/11 Abstract A separation between the academic subjects statistics and mathematical statistics has existed in Sweden almost as long as there have been statistics professors. The same distinction has not been maintained in other countries. Why has it been kept so for long in Sweden, and what consequences may it have had? In May 2015 it was 100 years since Mathematical Statistics was formally estab- lished as an academic discipline at a Swedish university where Statistics had existed since the turn of the century. We give an account of the debate in Lund and elsewhere about this division dur- ing the first decades after 1900 and present two of its leading personalities. The Lund University astronomer (and mathematical statistician) C.V.L. Charlier was a lead- ing proponent for a position in mathematical statistics at the university. Charlier's adversary in the debate was Pontus Fahlbeck, professor in political science and statis- tics, who reserved the word statistics for \statistics as a social science". Charlier not only secured the first academic position in Sweden in mathematical statistics for his former Ph.D. student Sven Wicksell, but he also demonstrated that a mathematical statistician can be influential in matters of state, finance, as well as in different nat- ural sciences. Fahlbeck saw mathematical statistics as a set of tools that sometimes could be useful in his brand of statistics. After a summary of the organisational, educational, and scientific growth of the statistical sciences in Sweden that has taken place during the last 50 years, we discuss what effects the Charlier-Fahlbeck divergence might have had on this development. -
Field Guide to Continuous Probability Distributions
Field Guide to Continuous Probability Distributions Gavin E. Crooks v 1.0.0 2019 G. E. Crooks – Field Guide to Probability Distributions v 1.0.0 Copyright © 2010-2019 Gavin E. Crooks ISBN: 978-1-7339381-0-5 http://threeplusone.com/fieldguide Berkeley Institute for Theoretical Sciences (BITS) typeset on 2019-04-10 with XeTeX version 0.99999 fonts: Trump Mediaeval (text), Euler (math) 271828182845904 2 G. E. Crooks – Field Guide to Probability Distributions Preface: The search for GUD A common problem is that of describing the probability distribution of a single, continuous variable. A few distributions, such as the normal and exponential, were discovered in the 1800’s or earlier. But about a century ago the great statistician, Karl Pearson, realized that the known probabil- ity distributions were not sufficient to handle all of the phenomena then under investigation, and set out to create new distributions with useful properties. During the 20th century this process continued with abandon and a vast menagerie of distinct mathematical forms were discovered and invented, investigated, analyzed, rediscovered and renamed, all for the purpose of de- scribing the probability of some interesting variable. There are hundreds of named distributions and synonyms in current usage. The apparent diver- sity is unending and disorienting. Fortunately, the situation is less confused than it might at first appear. Most common, continuous, univariate, unimodal distributions can be orga- nized into a small number of distinct families, which are all special cases of a single Grand Unified Distribution. This compendium details these hun- dred or so simple distributions, their properties and their interrelations. -
Probability Distributions Used in Reliability Engineering
Probability Distributions Used in Reliability Engineering Probability Distributions Used in Reliability Engineering Andrew N. O’Connor Mohammad Modarres Ali Mosleh Center for Risk and Reliability 0151 Glenn L Martin Hall University of Maryland College Park, Maryland Published by the Center for Risk and Reliability International Standard Book Number (ISBN): 978-0-9966468-1-9 Copyright © 2016 by the Center for Reliability Engineering University of Maryland, College Park, Maryland, USA All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from The Center for Reliability Engineering, Reliability Engineering Program. The Center for Risk and Reliability University of Maryland College Park, Maryland 20742-7531 In memory of Willie Mae Webb This book is dedicated to the memory of Miss Willie Webb who passed away on April 10 2007 while working at the Center for Risk and Reliability at the University of Maryland (UMD). She initiated the concept of this book, as an aid for students conducting studies in Reliability Engineering at the University of Maryland. Upon passing, Willie bequeathed her belongings to fund a scholarship providing financial support to Reliability Engineering students at UMD. Preface Reliability Engineers are required to combine a practical understanding of science and engineering with statistics. The reliability engineer’s understanding of statistics is focused on the practical application of a wide variety of accepted statistical methods. Most reliability texts provide only a basic introduction to probability distributions or only provide a detailed reference to few distributions. -
Mathematical Expectation
Mathematical Expectation 1 Introduction 6 Product Moments 2 The Expected Value of a Random 7 Moments of Linear Combinations of Random Variable Variables 3 Moments 8 Conditional Expectations 4 Chebyshev’s Theorem 9 The Theory in Practice 5 Moment-Generating Functions 1 Introduction Originally, the concept of a mathematical expectation arose in connection with games of chance, and in its simplest form it is the product of the amount a player stands to win and the probability that he or she will win. For instance, if we hold one of 10,000 tickets in a raffle for which the grand prize is a trip worth $4,800, our mathemati- cal expectation is 4,800 1 $0 .48. This amount will have to be interpreted in · 10,000 = the sense of an average—altogether the 10,000 tickets pay $4,800, or on the average $4,800 $0 .48 per ticket. 10,000 = If there is also a second prize worth $1,200 and a third prize worth $400, we can argue that altogether the 10,000 tickets pay $4,800 $1,200 $400 $6,400, or on + + = the average $6,400 $0 .64 per ticket. Looking at this in a different way, we could 10,000 = argue that if the raffle is repeated many times, we would lose 99.97 percent of the time (or with probability 0.9997) and win each of the prizes 0.01 percent of the time (or with probability 0.0001). On the average we would thus win 0(0.9997 ) 4,800 (0.0001 ) 1,200 (0.0001 ) 400 (0.0001 ) $0 .64 + + + = which is the sum of the products obtained by multiplying each amount by the corre- sponding probability. -
Some Statistical Aspects of Spatial Distribution Models for Plants and Trees
STUDIA FORESTALIA SUECICA Some Statistical Aspects of Spatial Distribution Models for Plants and Trees PETER DIGGLE Department of Operational Efficiency Swedish University of Agricultural Sciences S-770 73 Garpenberg. Sweden SWEDISH UNIVERSITY OF AGRICULTURAL SCIENCES COLLEGE OF FORESTRY UPPSALA SWEDEN Abstract ODC 523.31:562.6/46 The paper is an account of some stcctistical ur1alysrs carried out in coniur~tionwith a .silvicultural research project in the department of Oi_leruriotzal Efficiency. College of Forestry, Gcwpenberg. Sweden (Erikssorl, L & Eriksson. 0 in prepuratiotz 1981). 1~ Section 2, u statbtic due ro Morurz (19501 is irsed to detect spafiul intercccriorl amongst coutlts of the ~u~rnbersof yo14119 trees in stimple~~lo~slaid out in a systelnntic grid arrangement. Section 3 discusses the corlstruction of hivuriate disrrtbutions for the numbers of~:ild and plunted trees in a sample plot. Sectiorc 4 comiders the relutionship between the distributions of the number oj trees in a plot arld their rota1 busal area. Secrion 5 is a review of statistical nlerhodc for xte in cot~tlectiorz w~thpreliminai~ surveys of large areas of forest. In purticular, Secrion 5 dzscusses tests or spurrnl randotnness for a sillgle species, tests of independence betwren two species, and estimation of the number of trees per unit urea. Manuscr~ptrecelved 1981-10-15 LFIALLF 203 82 003 ISBN 9 1-38-07077-4 ISSN 0039-3150 Berllngs, Arlov 1982 Preface The origin of this report dates back to 1978. At this time Dr Peter Diggle was employed as a guest-researcher at the College. The aim of Dr Diggle's work at the College was to introduce advanced mathematical and statistical models in forestry research. -
Multivariate Distributions
IEOR E4602: Quantitative Risk Management Spring 2016 c 2016 by Martin Haugh Multivariate Distributions We will study multivariate distributions in these notes, focusing1 in particular on multivariate normal, normal-mixture, spherical and elliptical distributions. In addition to studying their properties, we will also discuss techniques for simulating and, very briefly, estimating these distributions. Familiarity with these important classes of multivariate distributions is important for many aspects of risk management. We will defer the study of copulas until later in the course. 1 Preliminary Definitions Let X = (X1;:::Xn) be an n-dimensional vector of random variables. We have the following definitions and statements. > n Definition 1 (Joint CDF) For all x = (x1; : : : ; xn) 2 R , the joint cumulative distribution function (CDF) of X satisfies FX(x) = FX(x1; : : : ; xn) = P (X1 ≤ x1;:::;Xn ≤ xn): Definition 2 (Marginal CDF) For a fixed i, the marginal CDF of Xi satisfies FXi (xi) = FX(1;:::; 1; xi; 1;::: 1): It is straightforward to generalize the previous definition to joint marginal distributions. For example, the joint marginal distribution of Xi and Xj satisfies Fij(xi; xj) = FX(1;:::; 1; xi; 1;:::; 1; xj; 1;::: 1). If the joint CDF is absolutely continuous, then it has an associated probability density function (PDF) so that Z x1 Z xn FX(x1; : : : ; xn) = ··· f(u1; : : : ; un) du1 : : : dun: −∞ −∞ Similar statements also apply to the marginal CDF's. A collection of random variables is independent if the joint CDF (or PDF if it exists) can be factored into the product of the marginal CDFs (or PDFs). If > > X1 = (X1;:::;Xk) and X2 = (Xk+1;:::;Xn) is a partition of X then the conditional CDF satisfies FX2jX1 (x2jx1) = P (X2 ≤ x2jX1 = x1): If X has a PDF, f(·), then it satisfies Z xk+1 Z xn f(x1; : : : ; xk; uk+1; : : : ; un) FX2jX1 (x2jx1) = ··· duk+1 : : : dun −∞ −∞ fX1 (x1) where fX1 (·) is the joint marginal PDF of X1.