1. Preface 2. Introduction 3. Sampling Distribution
Recommended publications
-
Appendix A Basic Statistical Concepts for Sensory Evaluation
Appendix A Basic Statistical Concepts for Sensory Evaluation

This chapter provides a quick introduction to statistics used for sensory evaluation data including measures of central tendency and dispersion. The logic of statistical hypothesis testing is introduced. Simple tests on pairs of means (the t-tests) are described with worked examples. The meaning of a p-value is reviewed.

Contents
A.1 Introduction
A.2 Basic Statistical Concepts
A.2.1 Data Description
A.2.2 Population Statistics
A.3 Hypothesis Testing and Statistical Inference
A.3.1 The Confidence Interval
A.3.2 Hypothesis Testing
A.3.3 A Worked Example
A.3.4 A Few More Important Concepts
A.3.5 Decision Errors
A.4 Variations of the t-Test
A.4.1 The Sensitivity of the Dependent t-Test for Sensory Data
A.5 Summary: Statistical Hypothesis Testing
A.6 Postscript: What p-Values Signify and What They Do Not
A.7 Statistical Glossary
References

A.1 Introduction
The main body of this book has been concerned with using good sensory test methods that can generate quality data in well-designed and well-executed studies. Now we turn to summarize the applications of statistics to sensory data analysis. Although statistics are a necessary part of sensory research, the sensory scientist would do well to keep in mind O’Mahony’s admonishment: statistical analysis, no matter how clever, cannot be used to save a poor experiment.

The techniques of statistical analysis do, however, serve several useful purposes, mainly in the efficient summarization of data and in allowing the sensory scientist to make reasonable conclusions from the

It is important when taking a sample or designing an experiment to remember that no matter how powerful the statistics used, the inferences made from a sample are only as good as the data in that sample.
-
Programming in Stata
Programming and Post-Estimation
• Bootstrapping
• Monte Carlo
• Post-Estimation Simulation (Clarify)
• Extending Clarify to Other Models
– Censored Probit Example

What is Bootstrapping?
• A computer-simulated nonparametric technique for making inferences about a population parameter based on sample statistics.
• If the sample is a good approximation of the population, the sampling distribution of interest can be estimated by generating a large number of new samples from the original.
• Useful when no analytic formula for the sampling distribution is available.

How do I do it?
1. Obtain a sample from the population of interest. Call this $x = (x_1, x_2, \ldots, x_n)$.
2. Re-sample based on $x$ by randomly sampling with replacement from it.
3. Generate many such samples, $x^1, x^2, \ldots, x^B$, each of length $n$.
4. Estimate the desired parameter in each sample, $s(x^1), s(x^2), \ldots, s(x^B)$.
5. For instance, the bootstrap estimate of the standard error is the standard deviation of the bootstrap replications:
$$\widehat{SE}_B = \left\{ \sum_{b=1}^{B} \left[ s(x^b) - \sum_{b=1}^{B} s(x^b)/B \right]^2 / (B-1) \right\}^{1/2}$$

Example: Standard Error of a Sample Mean Canned in Stata
use "I:\general\PRISM Programming\auto.dta", clear
Type: sum mpg
Type: bootstrap "summarize mpg" r(mean), reps(1000)
≈ 21.2973 ± 1.96*.6790806 x B - x

Example: Difference of Medians Test
Type: sum length, detail
Type: return list
Type: bootstrap "mymedian" r(diff), reps(1000)
The medians are not very different.
-
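The five resampling steps in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the Stata `bootstrap` command the slides use: the function name `bootstrap_se`, the seed, and the ten data values are hypothetical and chosen only for demonstration.

```python
import random
import statistics


def bootstrap_se(sample, stat=statistics.mean, reps=1000, seed=0):
    """Bootstrap standard error of `stat` (hypothetical helper, not from the slides)."""
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    n = len(sample)
    replications = []
    for _ in range(reps):
        # Steps 2-3: draw a resample of length n, with replacement
        resample = rng.choices(sample, k=n)
        # Step 4: estimate the statistic in each resample
        replications.append(stat(resample))
    # Step 5: standard deviation of the replications (divisor B - 1)
    return statistics.stdev(replications)


# Hypothetical data, for illustration only
data = [21, 22, 18, 30, 25, 17, 19, 24, 28, 16]

se_boot = bootstrap_se(data)
# Textbook standard error of the mean, s / sqrt(n), for comparison
se_formula = statistics.stdev(data) / len(data) ** 0.5
print(se_boot, se_formula)
```

For the sample mean the two numbers should be close, which is a useful sanity check before applying the same function to a statistic, such as a median, that has no simple analytic standard error.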
The Cascade Bayesian Approach: Prior Transformation for a Controlled Integration of Internal Data, External Data and Scenarios
risks
Article
The Cascade Bayesian Approach: Prior Transformation for a Controlled Integration of Internal Data, External Data and Scenarios
Bertrand K. Hassani 1,2,3,* and Alexis Renaudin 4
1 University College London Computer Science, 66-72 Gower Street, London WC1E 6EA, UK
2 LabEx ReFi, Université Paris 1 Panthéon-Sorbonne, CESCES, 106 bd de l’Hôpital, 75013 Paris, France
3 Capgemini Consulting, Tour Europlaza, 92400 Paris-La Défense, France
4 Aon Benfield, The Aon Centre, 122 Leadenhall Street, London EC3V 4AN, UK; [email protected]
* Correspondence: [email protected]; Tel.: +44-(0)753-016-7299
Received: 14 March 2018; Accepted: 23 April 2018; Published: 27 April 2018

Abstract: According to the last proposals of the Basel Committee on Banking Supervision, banks or insurance companies under the advanced measurement approach (AMA) must use four different sources of information to assess their operational risk capital requirement. The fourth includes ’business environment and internal control factors’, i.e., qualitative criteria, whereas the three main quantitative sources available to banks for building the loss distribution are internal loss data, external loss data and scenario analysis. This paper proposes an innovative methodology to bring together these three different sources in the loss distribution approach (LDA) framework through a Bayesian strategy. The integration of the different elements is performed in two different steps to ensure an internal data-driven model is obtained. In the first step, scenarios are used to inform the prior distributions and external data inform the likelihood component of the posterior function. In the second step, the initial posterior function is used as the prior distribution and the internal loss data inform the likelihood component of the second posterior function.
-
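The two-step structure described in this abstract (scenarios give the prior, external data update it, and the resulting posterior becomes the prior for the internal data) can be illustrated with a toy conjugate model. The sketch below is an assumption-laden illustration, not the authors' method: it uses a Gamma prior on a Poisson loss frequency purely because the update has a closed form, and the function name and every number are hypothetical.

```python
def update_gamma_poisson(shape, rate, counts):
    """Conjugate update: Gamma(shape, rate) prior with Poisson counts.

    Hypothetical helper; returns the posterior (shape, rate).
    """
    return shape + sum(counts), rate + len(counts)


# Step 0 (assumed): scenario analysis is summarised as a Gamma prior
scenario_prior = (4.0, 2.0)        # prior mean 4 / 2 = 2 losses per year

# Hypothetical yearly loss counts
external_counts = [3, 1, 4, 2, 2]
internal_counts = [0, 1, 1]

# Step 1: scenario prior + external data -> first posterior
first_posterior = update_gamma_poisson(*scenario_prior, external_counts)

# Step 2: first posterior reused as prior + internal data -> second posterior
second_posterior = update_gamma_poisson(*first_posterior, internal_counts)

posterior_mean = second_posterior[0] / second_posterior[1]
print(first_posterior, second_posterior, posterior_mean)
```

In this conjugate toy the cascade gives the same result as pooling both data sets in one update, so it only shows the mechanics of reusing a posterior as a prior; the abstract does not say how the paper's prior transformation controls the weight of each source, and the sketch does not attempt to reproduce that.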
STATS 305 Notes1
STATS 305 Notes [1]
Art Owen [2]
Autumn 2013

[1] The class notes were beautifully scribed by Eric Min. He has kindly allowed his notes to be placed online for stat 305 students. Reading these at leisure, you will spot a few errors and omissions due to the hurried nature of scribing and probably my handwriting too. Reading them ahead of class will help you understand the material as the class proceeds.
[2] Department of Statistics, Stanford University.

Contents
1 Overview
1.1 The Math of Applied Statistics
1.2 The Linear Model
1.2.1 Other Extensions
1.3 Linearity
1.4 Beyond Simple Linearity
1.4.1 Polynomial Regression
1.4.2 Two Groups
1.4.3 k Groups
1.4.4 Different Slopes
1.4.5 Two-Phase Regression
1.4.6 Periodic Functions
1.4.7 Haar Wavelets
1.4.8 Multiphase Regression
1.5 Concluding Remarks
2 Setting Up the Linear Model
2.1 Linear Model Notation
2.2 Two Potential Models
2.2.1 Regression Model
2.2.2 Correlation Model
2.3 The Linear Model
2.4 Math Review
2.4.1 Quadratic Forms
3 The Normal Distribution
3.1 Friends of $N(0, 1)$
3.1.1 $\chi^2$
3.1.2 t-distribution
3.1.3 F-distribution
3.2 The Multivariate Normal
3.2.1 Linear Transformations
3.2.2 Normal Quadratic Forms
-
What Is Bayesian Inference? Bayesian Inference Is at the Core of the Bayesian Approach, Which Is an Approach That Allows Us to Represent Uncertainty As a Probability
Learn to Use Bayesian Inference in SPSS With Data From the National Child Measurement Programme (2016–2017)
© 2019 SAGE Publications Ltd. All Rights Reserved. This PDF has been generated from SAGE Research Methods Datasets.

Student Guide

Introduction
This example dataset introduces Bayesian Inference. Bayesian statistics (the general name for all Bayesian-related topics, including inference) has become increasingly popular in recent years, due predominantly to the growth of ever more powerful and sophisticated statistical software. However, Bayesian statistics grew from the ideas of an English mathematician, Thomas Bayes, who lived and worked in the first half of the 18th century, and those ideas have been refined and adapted by statisticians and mathematicians ever since. Despite its longevity, the Bayesian approach did not become mainstream: the Frequentist approach was and remains the dominant means to conduct statistical analysis. However, there is a renewed interest in Bayesian statistics, prompted partly by software development and partly by a growing critique of the limitations of the null hypothesis significance testing which dominates the Frequentist approach. This renewed interest can be seen in the incorporation of Bayesian analysis into mainstream statistical software, such as IBM® SPSS®, and in many major statistics textbooks.

Bayesian Inference is at the heart of Bayesian statistics and is different from Frequentist approaches due to how it views probability. In the Frequentist approach, probability is the product of the frequency of random events occurring
-
Basic Econometrics / Statistics Statistical Distributions: Normal, T, Chi-Sq, & F
Basic Econometrics / Statistics
Statistical Distributions: Normal, T, Chi-Sq, & F
Course: Basic Econometrics: HC43 / Statistics, B.A. Hons Economics, Semester IV / Semester III, Delhi University
Course Instructor: Siddharth Rathore, Assistant Professor, Economics Department, Gargi College

APPENDIX C
SOME IMPORTANT PROBABILITY DISTRIBUTIONS

In Appendix B we noted that a random variable (r.v.) can be described by a few characteristics, or moments, of its probability function (PDF or PMF), such as the expected value and variance. This, however, presumes that we know the PDF of that r.v., which is a tall order since there are all kinds of random variables. In practice, however, some random variables occur so frequently that statisticians have determined their PDFs and documented their properties. For our purpose, we will consider only those PDFs that are of direct interest to us. But keep in mind that there are several other PDFs that statisticians have studied which can be found in any standard statistics textbook. In this appendix we will discuss the following four probability distributions:
1. The normal distribution
2. The t distribution
3. The chi-square ($\chi^2$) distribution
4. The F distribution

These probability distributions are important in their own right, but for our purposes they are especially important because they help us to find out the probability distributions of estimators (or statistics), such as the sample mean and sample variance. Recall that estimators are random variables. Equipped with that knowledge, we will be able to draw inferences about their true population values.
-
Practical Meta-Analysis -- Lipsey & Wilson Overview
Practical Meta-Analysis -- Lipsey & Wilson

Practical Meta-Analysis
David B. Wilson
American Evaluation Association
Orlando, Florida, October 3, 1999

The Great Debate
• 1952: Hans J. Eysenck concluded that there were no favorable effects of psychotherapy, starting a raging debate
• 20 years of evaluation research and hundreds of studies failed to resolve the debate
• 1978: To prove Eysenck wrong, Gene V. Glass statistically aggregated the findings of 375 psychotherapy outcome studies
• Glass (and colleague Smith) concluded that psychotherapy did indeed work
• Glass called his method “meta-analysis”

The Emergence of Meta-Analysis
• Ideas behind meta-analysis predate Glass’ work by several decades
– R. A. Fisher (1944)
• “When a number of quite independent tests of significance have been made, it sometimes happens that although few or none can be claimed individually as significant, yet the aggregate gives an impression that the probabilities are on the whole lower than would often have been obtained by chance” (p. 99).
• Source of the idea of cumulating probability values
– W. G. Cochran (1953)
• Discusses a method of averaging means across independent studies

The Logic of Meta-Analysis
• Traditional methods of review focus on statistical significance testing
• Significance testing is not well suited to this task
– highly dependent on sample size
– null finding does not carry the same “weight” as a significant finding
• Meta-analysis changes the focus to the direction and magnitude of the effects across studies
– Isn’t this what we are
-
One Sample T Test
Copyright © October 17, 2020 by NEH

One Sample t Test
Nathaniel E. Helwig
University of Minnesota

1 How Beer Changed the World

William Sealy Gosset was a chemist and statistician who was employed as the Head Experimental Brewer at Guinness in the early 1900s. As a part of his job, Gosset was responsible for taking samples of beer and performing various analyses to ensure that the beer was of the proper composition and quality. While at Guinness, Gosset taught himself experimental design and statistical analysis (which were new and exciting fields at the time), and he even spent some time in 1906–1907 studying in Karl Pearson's laboratory. Gosset's work often involved collecting a small number of samples, and testing hypotheses about the mean of the population from which the samples were drawn. For example, in the brewery, Gosset may want to test whether the population mean amount of some ingredient (e.g., barley) was equal to the intended value. And, on the farm, Gosset may want to test how different farming techniques and growing conditions affect the barley yields. Through this work, Gosset noticed that the normal distribution was not adequate for testing hypotheses about the mean in small samples of data with an unknown variance.

As a reminder, if $X \sim N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \sigma^2/n)$ where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, which implies
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
i.e., the Z test statistic follows a standard normal distribution. However, in practice, the population variance $\sigma^2$ is often unknown, so the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ must be used in its place.
-
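To make the substitution of $s$ for $\sigma$ concrete, the sketch below computes both the Z statistic (known $\sigma$) and the one-sample t statistic (estimated $s$) with the Python standard library. The function names, the five measurements, the hypothesised mean of 5.0, and the "known" $\sigma$ of 0.25 are all hypothetical values chosen for illustration; the excerpt itself gives no data.

```python
import math
import statistics


def z_statistic(x, mu0, sigma):
    """Z = (xbar - mu0) / (sigma / sqrt(n)), with sigma assumed known."""
    return (statistics.mean(x) - mu0) / (sigma / math.sqrt(len(x)))


def t_statistic(x, mu0):
    """T = (xbar - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    s = statistics.stdev(x)        # uses the n - 1 divisor, matching s^2 above
    return (statistics.mean(x) - mu0) / (s / math.sqrt(len(x)))


# Hypothetical measurements of some ingredient, for illustration only
sample = [4.9, 5.1, 5.3, 4.8, 5.4]
mu0 = 5.0                          # hypothesised population mean (assumed)

z = z_statistic(sample, mu0, sigma=0.25)   # sigma = 0.25 is an assumption
t = t_statistic(sample, mu0)
print(z, t)
```

The two statistics share a numerator and differ only in the denominator. With a sample this small, $s$ is itself a noisy estimate, which is the reason the t statistic is referred to a t distribution with $n - 1$ degrees of freedom rather than to the standard normal.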
Statistical Power in Meta-Analysis Jin Liu University of South Carolina - Columbia
University of South Carolina Scholar Commons
Theses and Dissertations
12-14-2015
Statistical Power in Meta-Analysis
Jin Liu, University of South Carolina - Columbia
Part of the Educational Psychology Commons
Recommended Citation: Liu, J. (2015). Statistical Power in Meta-Analysis. (Doctoral dissertation). Retrieved from https://scholarcommons.sc.edu/etd/3221
This Open Access Dissertation is brought to you by Scholar Commons. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of Scholar Commons. For more information, please contact [email protected].

STATISTICAL POWER IN META-ANALYSIS
by Jin Liu
Bachelor of Arts, Chongqing University of Arts and Sciences, 2009
Master of Education, University of South Carolina, 2012
Submitted in Partial Fulfillment of the Requirements For the Degree of Doctor of Philosophy in Educational Psychology and Research, College of Education, University of South Carolina, 2015
Accepted by: Xiaofeng Liu, Major Professor; Christine DiStefano, Committee Member; Robert Johnson, Committee Member; Brian Habing, Committee Member; Lacy Ford, Senior Vice Provost and Dean of Graduate Studies
© Copyright by Jin Liu, 2015. All Rights Reserved.

ACKNOWLEDGEMENTS
I would like to express my thanks to all committee members who continuously supported me in the dissertation work. My dissertation chair, Xiaofeng Liu, always inspired and trusted me during this study. I could not finish my dissertation at this time without his support. Brian Habing helped me with R coding and research design. I could not finish the coding process so quickly without his instruction. I appreciate the encouragement from Christine DiStefano and Robert Johnson.
-
Descriptive Statistics Frequency Distributions and Their Graphs
Chapter 2 Descriptive Statistics
§ 2.1 Frequency Distributions and Their Graphs
Larson & Farber, Elementary Statistics: Picturing the World, 3e

Frequency Distributions
A frequency distribution is a table that shows classes or intervals of data with a count of the number in each class. The frequency f of a class is the number of data points in the class.

Class      Frequency, f
1 – 4      4
5 – 8      5
9 – 12     3
13 – 16    4
17 – 20    2

(Callouts in the slide mark the lower class limits, the upper class limits, and the frequencies.)

Frequency Distributions
The class width is the distance between lower (or upper) limits of consecutive classes.

Class      Frequency, f    Difference of consecutive lower limits
1 – 4      4
5 – 8      5               5 – 1 = 4
9 – 12     3               9 – 5 = 4
13 – 16    4               13 – 9 = 4
17 – 20    2               17 – 13 = 4

The class width is 4. The range is the difference between the maximum and minimum data entries.

Constructing a Frequency Distribution
Guidelines
1. Decide on the number of classes to include. The number of classes should be between 5 and 20; otherwise, it may be difficult to detect any patterns.
2. Find the class width as follows. Determine the range of the data, divide the range by the number of classes, and round up to the next convenient number.
3. Find the class limits. You can use the minimum entry as the lower limit of the first class. To find the remaining lower limits, add the class width to the lower limit of the preceding class. Then find the upper class limits.
4.
-
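Guidelines 1 to 3 in this excerpt can be followed mechanically, as in the Python sketch below. It assumes integer-valued data, so that each upper class limit is one less than the next lower limit. The function name and the eighteen data values are hypothetical, with the values chosen so that the output matches the class table in the slides.

```python
import math


def frequency_distribution(data, num_classes):
    """Build (lower limit, upper limit, frequency) rows for integer data.

    Hypothetical helper following guidelines 1-3; not from the slides.
    """
    lo, hi = min(data), max(data)
    # Guideline 2: range / number of classes, rounded up
    width = math.ceil((hi - lo) / num_classes)
    # If the division was exact, the last class would miss the maximum,
    # so move to the next whole number
    if width * num_classes <= hi - lo:
        width += 1
    rows = []
    for i in range(num_classes):
        # Guideline 3: start at the minimum, then add the class width
        lower = lo + i * width
        upper = lower + width - 1          # integer data assumed
        freq = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, freq))
    return rows


# Hypothetical data chosen to reproduce the slide's table
data = [1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 20]

table = frequency_distribution(data, num_classes=5)
for lower, upper, freq in table:
    print(f"{lower} – {upper}\t{freq}")
```

Here the range is 20 − 1 = 19, and 19 / 5 = 3.8 rounds up to the class width of 4 shown in the slides. For non-integer data the upper limits would need a different convention, which the sketch does not cover.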
Pivotal Quantities with Arbitrary Small Skewness
Pivotal Quantities with Arbitrary Small Skewness
Masoud M. Nasari∗
School of Mathematics and Statistics of Carleton University, Ottawa, ON, Canada
arXiv:1605.05985v1 [stat.ME] 19 May 2016

Abstract
In this paper we present randomization methods to enhance the accuracy of the central limit theorem (CLT) based inferences about the population mean µ. We introduce a broad class of randomized versions of the Student t-statistic, the classical pivot for µ, that continue to possess the pivotal property for µ and their skewness can be made arbitrarily small, for each fixed sample size n. Consequently, these randomized pivots admit CLTs with smaller errors. The randomization framework in this paper also provides an explicit relation between the precision of the CLTs for the randomized pivots and the volume of their associated confidence regions for the mean for both univariate and multivariate data. This property allows regulating the trade-off between the accuracy and the volume of the randomized confidence regions discussed in this paper.

1 Introduction
The CLT is an essential tool for inferring on parameters of interest in a nonparametric framework. The strength of the CLT stems from the fact that, as the sample size increases, the usually unknown sampling distribution of a pivot, a function of the data and an associated parameter, approaches the standard normal distribution. This, in turn, validates approximating the percentiles of the sampling distribution of the pivot by those of the normal distribution, in both univariate and multivariate cases. The CLT is an approximation method whose validity relies on large enough samples.
-
Stat 3701 Lecture Notes: Bootstrap Charles J
Stat 3701 Lecture Notes: Bootstrap
Charles J. Geyer
April 17, 2017

1 License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).

2 R
• The version of R used to make this document is 3.3.3.
• The version of the rmarkdown package used to make this document is 1.4.
• The version of the knitr package used to make this document is 1.15.1.
• The version of the bootstrap package used to make this document is 2017.2.

3 Relevant and Irrelevant Simulation

3.1 Irrelevant
Most statisticians think a statistics paper isn’t really a statistics paper or a statistics talk isn’t really a statistics talk if it doesn’t have simulations demonstrating that the methods proposed work great (at least in some toy problems). IMHO, this is nonsense. Simulations of the kind most statisticians do prove nothing. The toy problems used are often very special and do not stress the methods at all. In fact, they may be (consciously or unconsciously) chosen to make the methods look good. In scientific experiments, we know how to use randomization, blinding, and other techniques to avoid biasing the results. Analogous things are never AFAIK done with simulations. When all of the toy problems simulated are very different from the statistical model you intend to use for your data, what could the simulation study possibly tell you that is relevant? Nothing. Hence, for short, your humble author calls all of these millions of simulation studies statisticians have done irrelevant simulation.