Statistics for Data Science



MSc Data Science, WiSe 2019/20
Prof. Dr. Dirk Ostwald

(2) Random variables

Random variables
• Definition and notation
• Cumulative distribution functions
• Probability mass and density functions

Definition and notation

Random variables and distributions
• Let (Ω, A, P) be a probability space and let X : Ω → 𝒳 be a function.
• Let S be a σ-algebra on 𝒳.
• For every S ∈ S, let the preimage of S be

  X⁻¹(S) := {ω ∈ Ω | X(ω) ∈ S}.   (1)

• If X⁻¹(S) ∈ A for all S ∈ S, then X is called measurable.
• Let X : Ω → 𝒳 be measurable. Every S ∈ S is allocated the probability

  P_X : S → [0, 1], S ↦ P_X(S) := P(X⁻¹(S)) = P({ω ∈ Ω | X(ω) ∈ S}).   (2)

• X is called a random variable and P_X is called the distribution of X.
• (𝒳, S, P_X) is a probability space.
• With 𝒳 = ℝ and S = B, the Borel σ-algebra, the probability space (ℝ, B, P_X) takes center stage.

Definition (Random variable)
Let (Ω, A, P) denote a probability space. A (real-valued) random variable is a mapping

  X : Ω → ℝ, ω ↦ X(ω),   (3)

with the measurability property

  {ω ∈ Ω | X(ω) ∈ S} ∈ A for all S ∈ S.   (4)

Remarks
• Random variables are neither "random" nor "variables".
• Intuitively, ω ∈ Ω gets randomly selected according to P and X(ω) is realized.
• The distributions (probability measures) of random variables are central.

Random variables and distributions
• Let (Ω, A, P) and (𝒳, S, P_X) denote probability spaces for a measurable X : Ω → 𝒳.
• The following notations for events A ∈ A with respect to X are conventional:

  {X ∈ S} := {ω ∈ Ω | X(ω) ∈ S}, S ⊂ 𝒳
  {X = x} := {ω ∈ Ω | X(ω) = x}, x ∈ 𝒳
  {X ≤ x} := {ω ∈ Ω | X(ω) ≤ x}, x ∈ 𝒳
  {X < x} := {ω ∈ Ω | X(ω) < x}, x ∈ 𝒳

• These conventions entail the following conventions for distributions:

  P_X(X ∈ S) = P({X ∈ S}) = P({ω ∈ Ω | X(ω) ∈ S}), S ⊂ 𝒳
  P_X(X ≤ x) = P({X ≤ x}) = P({ω ∈ Ω | X(ω) ≤ x}), x ∈ 𝒳

• Often, the random variable subscript in distribution symbols is omitted:

  P(X ∈ S) = P_X(X ∈ S), S ⊂ 𝒳
  P(X ≤ x) = P_X(X ≤ x), x ∈ 𝒳

• Distributions can be defined using cumulative distribution functions, probability mass functions, and probability density functions.

Cumulative distribution functions

Definition (Cumulative distribution function)
The cumulative distribution function (CDF) of a random variable X is defined as

  P : ℝ → [0, 1], x ↦ P(x) := P(X ≤ x).   (5)

Remarks
• CDFs can be used to define distributions.
• CDFs exist for both discrete and continuous random variables.

Example (Cumulative distribution function)
Consider a random variable with outcome space 𝒳 = {0, 1, 2} and distribution defined by

  P(X = 0) = 1/4, P(X = 1) = 1/2, P(X = 2) = 1/4.   (6)

Then its distribution function is given by

  P : ℝ → [0, 1], x ↦ P(x) :=
    0    for x < 0,
    1/4  for 0 ≤ x < 1,
    3/4  for 1 ≤ x < 2,
    1    for x ≥ 2.   (7)

Remarks
• P is right-continuous.
• P is defined for all x ∈ ℝ, while X ∈ {0, 1, 2}.

Identity of CDFs
Let X have CDF P and let Y have CDF Q. If P(x) = Q(x) for all x, then P(X ∈ S) = P(Y ∈ S) for all events S ∈ S.

Properties of CDFs
A function P : ℝ → [0, 1] is a CDF for some probability P if and only if P satisfies the following conditions:
(1) P is non-decreasing: x₁ < x₂ implies P(x₁) ≤ P(x₂).
(2) P is normalized: lim_{x→−∞} P(x) = 0 and lim_{x→∞} P(x) = 1.
(3) P is right-continuous: P(x) = P(x⁺) for all x, where P(x⁺) := lim_{y→x, y>x} P(y).
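The step CDF of the three-point example above, eq. (7), can be written down directly. A minimal sketch in plain Python (the helper name `cdf` is illustrative, not part of the course materials):

```python
def cdf(x: float) -> float:
    """CDF of the example distribution P(X=0)=1/4, P(X=1)=1/2, P(X=2)=1/4."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.25   # P(X = 0)
    if x < 2:
        return 0.75   # P(X = 0) + P(X = 1)
    return 1.0        # all probability mass accumulated for x >= 2

# right-continuity at the jump points: cdf(1) already includes P(X = 1)
print(cdf(0.5), cdf(1.0), cdf(2.0))
```

Note that `cdf` is defined for every real `x`, even though X only takes values in {0, 1, 2}, mirroring the remark above.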
Probability mass and density functions

Definition (Probability mass function, discrete random variables)
A random variable X is discrete if it takes on countably many values in 𝒳 := {x₁, x₂, ...}. The probability mass function (PMF) of X is defined as

  p : 𝒳 → [0, 1], x ↦ p(x) := P(X = x).   (8)

Remarks
• A set is countable if it is finite or bijectively related to ℕ.
• A PMF is non-negative: p(x) ≥ 0 for all x ∈ 𝒳.
• A PMF is normalized: Σᵢ p(xᵢ) = 1.
• The CDF of a PMF is P(x) = P(X ≤ x) = Σ_{xᵢ ≤ x} p(xᵢ).
• The CDF of a PMF is also referred to as a cumulative mass function (CMF).

Example (Bernoulli random variable)
Let X be a random variable with outcome set 𝒳 = {0, 1} and probability mass function

  p : 𝒳 → [0, 1], x ↦ p(x) := μˣ(1 − μ)^(1−x) for μ ∈ [0, 1].   (9)

Then X is said to be distributed according to a Bernoulli distribution with parameter μ ∈ [0, 1], for which we write X ∼ Bern(μ). We denote the probability mass function of a Bernoulli random variable by

  Bern(x; μ) := μˣ(1 − μ)^(1−x).   (10)

Remarks
• A Bernoulli random variable can be used to model a single biased coin flip with outcomes "failure" (0) and "success" (1).
• μ is the probability for X to take the value 1:

  P(X = 1) = μ¹(1 − μ)^(1−1) = μ.   (11)

Definition (Probability density function, continuous random variables)
A random variable X is continuous if there exists a function

  p : ℝ → ℝ≥0, x ↦ p(x)   (12)

such that
• p(x) ≥ 0 for all x ∈ ℝ,
• ∫_{−∞}^{∞} p(x) dx = 1,
• P(a ≤ X ≤ b) = ∫_a^b p(x) dx for all a, b ∈ ℝ with a ≤ b.

Remarks
• PDFs can take on values larger than 1, and P(X = a) = ∫_a^a p(x) dx = 0.
• Probabilities are obtained from PDFs by integration:
  (probability) mass = (probability) density × (set) volume.
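The Bernoulli PMF of eqs. (9)–(10) is a one-liner; a minimal sketch (the function name `bern_pmf` is illustrative):

```python
def bern_pmf(x: int, mu: float) -> float:
    """Bern(x; mu) = mu**x * (1 - mu)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("Bernoulli outcomes are 0 or 1")
    return mu ** x * (1.0 - mu) ** (1 - x)

# a biased coin with success probability mu = 0.3
print(bern_pmf(1, 0.3))                    # P(X = 1) = mu
print(bern_pmf(0, 0.3))                    # P(X = 0) = 1 - mu
print(bern_pmf(1, 0.3) + bern_pmf(0, 0.3)) # normalization: sums to 1
```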
• The CDF of a PDF is P(x) = ∫_{−∞}^{x} p(ξ) dξ, and thus p(x) = (d/dx) P(x).
• The CDF of a PDF is also referred to as a cumulative density function.

Example (Gaussian random variable, standard normal variable)
Let X be a random variable with outcome set ℝ and probability density function

  p : ℝ → ℝ>0, x ↦ p(x) := (1/√(2πσ²)) exp(−(1/(2σ²))(x − μ)²).   (13)

Then X is said to be distributed according to a Gaussian distribution with parameters μ ∈ ℝ and σ² > 0, for which we write X ∼ N(μ, σ²). We abbreviate the PDF of a Gaussian random variable by

  N(x; μ, σ²) := (1/√(2πσ²)) exp(−(1/(2σ²))(x − μ)²).   (14)

A Gaussian random variable with μ = 0 and σ² = 1 is said to be distributed according to a standard normal distribution and is often referred to as a Z variable.

Remarks
• The parameter μ specifies the location of highest probability density.
• The parameter σ² specifies the width of the distribution.
• The term 1/√(2πσ²) is the normalization constant for exp(−(1/(2σ²))(x − μ)²).

Example (Uniform random variables)
Let X be a discrete random variable with a finite outcome set 𝒳 and probability mass function

  p : 𝒳 → ℝ≥0, x ↦ p(x) := 1/|𝒳|.   (15)

Then X is said to be distributed according to a discrete uniform distribution, for which we write X ∼ U(|𝒳|). We abbreviate the PMF of a discrete uniform random variable by

  U(x; |𝒳|) := 1/|𝒳|.   (16)

Similarly, let X be a continuous random variable with probability density function

  p : ℝ → ℝ≥0, x ↦ p(x) := 1/(b − a) for x ∈ [a, b] and p(x) := 0 for x ∉ [a, b].   (17)

Then X is said to be distributed according to a continuous uniform distribution with parameters a and b, for which we write X ∼ U(a, b).
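The Gaussian density of eqs. (13)–(14) can be checked numerically: a midpoint Riemann sum recovers the normalization ∫ p(x) dx = 1, and shrinking σ² shows that a density can exceed 1. A minimal sketch (helper name `gauss_pdf` is illustrative):

```python
import math

def gauss_pdf(x: float, mu: float = 0.0, var: float = 1.0) -> float:
    """N(x; mu, sigma^2) = exp(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# midpoint-rule check that the standard normal density integrates to ~1 on [-8, 8]
n = 16000
dx = 16.0 / n
total = sum(gauss_pdf(-8.0 + (i + 0.5) * dx) * dx for i in range(n))
print(total)                         # close to 1

# densities are not probabilities: for small variance the peak exceeds 1
print(gauss_pdf(0.0, 0.0, 0.01))     # roughly 3.99
```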
We abbreviate the PDF of a continuous uniform random variable by

  U(x; a, b) := 1/(b − a).   (18)

Properties of cumulative density functions
• P(X > x) = 1 − P(x) (exceedance distribution function)
• P(x < X ≤ y) = P(y) − P(x) (interval probability)
• With the properties of the Riemann integral, we have

  P(y) − P(x) = P(x < X < y) = P(x ≤ X < y) = P(x < X ≤ y) = P(x ≤ X ≤ y).   (19)

Definition (Inverse cumulative distribution function)
Let X be a random variable with CDF P. Then the inverse cumulative distribution function, or quantile function, of X is defined as

  P⁻¹ : [0, 1] → ℝ, q ↦ P⁻¹(q) := inf{x | P(x) > q}.   (20)

If P is invertible, i.e., strictly increasing and continuous, then P⁻¹(q) is the unique real number x such that P(x) = q.

Remarks
• P⁻¹(0.25) is called the first quartile.
• P⁻¹(0.50) is called the median or second quartile.
• P⁻¹(q) is also referred to as the qth quantile.

Example (CDF and inverse CDF for Gaussian random variables)
Let X be a univariate Gaussian random variable with expectation parameter μ and variance parameter σ².
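For the standard normal variable, the CDF has no closed form but is available via the error function, and the quantile function of eq. (20) can be evaluated by bisection, since this CDF is continuous and strictly increasing. A minimal sketch (helper names and the search interval [-10, 10] are illustrative choices):

```python
import math

def std_normal_cdf(x: float) -> float:
    """P(x) = P(Z <= x) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_quantile(q: float) -> float:
    """P^{-1}(q): smallest x with P(x) >= q, found by bisection on [-10, 10]."""
    lo, hi = -10.0, 10.0
    for _ in range(200):  # interval shrinks to ~20 / 2**200
        mid = 0.5 * (lo + hi)
        if std_normal_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(std_normal_quantile(0.5))    # the median of Z, close to 0
print(std_normal_quantile(0.975))  # close to 1.96
```

For invertible CDFs like this one, the infimum in eq. (20) coincides with the unique solution of P(x) = q, which is what the bisection converges to.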