Confidence Intervals
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Confidence Intervals for the Population Mean Alternatives to the Student-T Confidence Interval
Journal of Modern Applied Statistical Methods Volume 18 Issue 1 Article 15 4-6-2020 Robust Confidence Intervals for the Population Mean Alternatives to the Student-t Confidence Interval Moustafa Omar Ahmed Abu-Shawiesh The Hashemite University, Zarqa, Jordan, [email protected] Aamir Saghir Mirpur University of Science and Technology, Mirpur, Pakistan, [email protected] Follow this and additional works at: https://digitalcommons.wayne.edu/jmasm Part of the Applied Statistics Commons, Social and Behavioral Sciences Commons, and the Statistical Theory Commons Recommended Citation Abu-Shawiesh, M. O. A., & Saghir, A. (2019). Robust confidence intervals for the population mean alternatives to the Student-t confidence interval. Journal of Modern Applied Statistical Methods, 18(1), eP2721. doi: 10.22237/jmasm/1556669160 This Regular Article is brought to you for free and open access by the Open Access Journals at DigitalCommons@WayneState. It has been accepted for inclusion in Journal of Modern Applied Statistical Methods by an authorized editor of DigitalCommons@WayneState. Robust Confidence Intervals for the Population Mean Alternatives to the Student-t Confidence Interval Cover Page Footnote The authors are grateful to the Editor and anonymous three reviewers for their excellent and constructive comments/suggestions that greatly improved the presentation and quality of the article. This article was partially completed while the first author was on sabbatical leave (2014–2015) in Nizwa University, Sultanate of Oman. He is grateful to the Hashemite University for awarding him the sabbatical leave which gave him excellent research facilities. This regular article is available in Journal of Modern Applied Statistical Methods: https://digitalcommons.wayne.edu/ jmasm/vol18/iss1/15 Journal of Modern Applied Statistical Methods May 2019, Vol. -
Applied Biostatistics Mean and Standard Deviation the Mean the Median Is Not the Only Measure of Central Value for a Distribution
Health Sciences M.Sc. Programme Applied Biostatistics Mean and Standard Deviation The mean The median is not the only measure of central value for a distribution. Another is the arithmetic mean or average, usually referred to simply as the mean. This is found by taking the sum of the observations and dividing by their number. The mean is often denoted by a little bar over the symbol for the variable, e.g. x . The sample mean has much nicer mathematical properties than the median and is thus more useful for the comparison methods described later. The median is a very useful descriptive statistic, but not much used for other purposes. Median, mean and skewness The sum of the 57 FEV1s is 231.51 and hence the mean is 231.51/57 = 4.06. This is very close to the median, 4.1, so the median is within 1% of the mean. This is not so for the triglyceride data. The median triglyceride is 0.46 but the mean is 0.51, which is higher. The median is 10% away from the mean. If the distribution is symmetrical the sample mean and median will be about the same, but in a skew distribution they will not. If the distribution is skew to the right, as for serum triglyceride, the mean will be greater, if it is skew to the left the median will be greater. This is because the values in the tails affect the mean but not the median. Figure 1 shows the positions of the mean and median on the histogram of triglyceride. -
STAT 22000 Lecture Slides Overview of Confidence Intervals
STAT 22000 Lecture Slides Overview of Confidence Intervals Yibi Huang Department of Statistics University of Chicago Outline This set of slides covers section 4.2 in the text • Overview of Confidence Intervals 1 Confidence intervals • A plausible range of values for the population parameter is called a confidence interval. • Using only a sample statistic to estimate a parameter is like fishing in a murky lake with a spear, and using a confidence interval is like fishing with a net. We can throw a spear where we saw a fish but we will probably miss. If we toss a net in that area, we have a good chance of catching the fish. • If we report a point estimate, we probably won’t hit the exact population parameter. If we report a range of plausible values we have a good shot at capturing the parameter. 2 Photos by Mark Fischer (http://www.flickr.com/photos/fischerfotos/7439791462) and Chris Penny (http://www.flickr.com/photos/clearlydived/7029109617) on Flickr. Recall that CLT says, for large n, X ∼ N(µ, pσ ): For a normal n curve, 95% of its area is within 1.96 SDs from the center. That means, for 95% of the time, X will be within 1:96 pσ from µ. n 95% σ σ µ − 1.96 µ µ + 1.96 n n Alternatively, we can also say, for 95% of the time, µ will be within 1:96 pσ from X: n Hence, we call the interval ! σ σ σ X ± 1:96 p = X − 1:96 p ; X + 1:96 p n n n a 95% confidence interval for µ. -
Statistics for Dummies Cheat Sheet from Statistics for Dummies, 2Nd Edition by Deborah Rumsey
Statistics For Dummies Cheat Sheet From Statistics For Dummies, 2nd Edition by Deborah Rumsey Copyright © 2013 & Trademark by John Wiley & Sons, Inc. All rights reserved. Whether you’re studying for an exam or just want to make sense of data around you every day, knowing how and when to use data analysis techniques and formulas of statistics will help. Being able to make the connections between those statistical techniques and formulas is perhaps even more important. It builds confidence when attacking statistical problems and solidifies your strategies for completing statistical projects. Understanding Formulas for Common Statistics After data has been collected, the first step in analyzing it is to crunch out some descriptive statistics to get a feeling for the data. For example: Where is the center of the data located? How spread out is the data? How correlated are the data from two variables? The most common descriptive statistics are in the following table, along with their formulas and a short description of what each one measures. Statistically Figuring Sample Size When designing a study, the sample size is an important consideration because the larger the sample size, the more data you have, and the more precise your results will be (assuming high- quality data). If you know the level of precision you want (that is, your desired margin of error), you can calculate the sample size needed to achieve it. To find the sample size needed to estimate a population mean (µ), use the following formula: In this formula, MOE represents the desired margin of error (which you set ahead of time), and σ represents the population standard deviation. -
Random Variables and Applications
Random Variables and Applications OPRE 6301 Random Variables. As noted earlier, variability is omnipresent in the busi- ness world. To model variability probabilistically, we need the concept of a random variable. A random variable is a numerically valued variable which takes on different values with given probabilities. Examples: The return on an investment in a one-year period The price of an equity The number of customers entering a store The sales volume of a store on a particular day The turnover rate at your organization next year 1 Types of Random Variables. Discrete Random Variable: — one that takes on a countable number of possible values, e.g., total of roll of two dice: 2, 3, ..., 12 • number of desktops sold: 0, 1, ... • customer count: 0, 1, ... • Continuous Random Variable: — one that takes on an uncountable number of possible values, e.g., interest rate: 3.25%, 6.125%, ... • task completion time: a nonnegative value • price of a stock: a nonnegative value • Basic Concept: Integer or rational numbers are discrete, while real numbers are continuous. 2 Probability Distributions. “Randomness” of a random variable is described by a probability distribution. Informally, the probability distribution specifies the probability or likelihood for a random variable to assume a particular value. Formally, let X be a random variable and let x be a possible value of X. Then, we have two cases. Discrete: the probability mass function of X specifies P (x) P (X = x) for all possible values of x. ≡ Continuous: the probability density function of X is a function f(x) that is such that f(x) h P (x < · ≈ X x + h) for small positive h. -
Calculating Variance and Standard Deviation
VARIANCE AND STANDARD DEVIATION Recall that the range is the difference between the upper and lower limits of the data. While this is important, it does have one major disadvantage. It does not describe the variation among the variables. For instance, both of these sets of data have the same range, yet their values are definitely different. 90, 90, 90, 98, 90 Range = 8 1, 6, 8, 1, 9, 5 Range = 8 To better describe the variation, we will introduce two other measures of variation—variance and standard deviation (the variance is the square of the standard deviation). These measures tell us how much the actual values differ from the mean. The larger the standard deviation, the more spread out the values. The smaller the standard deviation, the less spread out the values. This measure is particularly helpful to teachers as they try to find whether their students’ scores on a certain test are closely related to the class average. To find the standard deviation of a set of values: a. Find the mean of the data b. Find the difference (deviation) between each of the scores and the mean c. Square each deviation d. Sum the squares e. Dividing by one less than the number of values, find the “mean” of this sum (the variance*) f. Find the square root of the variance (the standard deviation) *Note: In some books, the variance is found by dividing by n. In statistics it is more useful to divide by n -1. EXAMPLE Find the variance and standard deviation of the following scores on an exam: 92, 95, 85, 80, 75, 50 SOLUTION First we find the mean of the data: 92+95+85+80+75+50 477 Mean = = = 79.5 6 6 Then we find the difference between each score and the mean (deviation). -
Understanding Statistical Hypothesis Testing: the Logic of Statistical Inference
Review Understanding Statistical Hypothesis Testing: The Logic of Statistical Inference Frank Emmert-Streib 1,2,* and Matthias Dehmer 3,4,5 1 Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, 33100 Tampere, Finland 2 Institute of Biosciences and Medical Technology, Tampere University, 33520 Tampere, Finland 3 Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Steyr Campus, 4040 Steyr, Austria 4 Department of Mechatronics and Biomedical Computer Science, University for Health Sciences, Medical Informatics and Technology (UMIT), 6060 Hall, Tyrol, Austria 5 College of Computer and Control Engineering, Nankai University, Tianjin 300000, China * Correspondence: [email protected]; Tel.: +358-50-301-5353 Received: 27 July 2019; Accepted: 9 August 2019; Published: 12 August 2019 Abstract: Statistical hypothesis testing is among the most misunderstood quantitative analysis methods from data science. Despite its seeming simplicity, it has complex interdependencies between its procedural components. In this paper, we discuss the underlying logic behind statistical hypothesis testing, the formal meaning of its components and their connections. Our presentation is applicable to all statistical hypothesis tests as generic backbone and, hence, useful across all application domains in data science and artificial intelligence. Keywords: hypothesis testing; machine learning; statistics; data science; statistical inference 1. Introduction We are living in an era that is characterized by the availability of big data. In order to emphasize the importance of this, data have been called the ‘oil of the 21st Century’ [1]. However, for dealing with the challenges posed by such data, advanced analysis methods are needed. -
Confidence Intervals
Confidence Intervals PoCoG Biostatistical Clinic Series Joseph Coll, PhD | Biostatistician Introduction › Introduction to Confidence Intervals › Calculating a 95% Confidence Interval for the Mean of a Sample › Correspondence Between Hypothesis Tests and Confidence Intervals 2 Introduction to Confidence Intervals › Each time we take a sample from a population and calculate a statistic (e.g. a mean or proportion), the value of the statistic will be a little different. › If someone asks for your best guess of the population parameter, you would use the value of the sample statistic. › But how well does the statistic estimate the population parameter? › It is possible to calculate a range of values based on the sample statistic that encompasses the true population value with a specified level of probability (confidence). 3 Introduction to Confidence Intervals › DEFINITION: A 95% confidence for the mean of a sample is a range of values which we can be 95% confident includes the mean of the population from which the sample was drawn. › If we took 100 random samples from a population and calculated a mean and 95% confidence interval for each, then approximately 95 of the 100 confidence intervals would include the population mean. 4 Introduction to Confidence Intervals › A single 95% confidence interval is the set of possible values that the population mean could have that are not statistically different from the observed sample mean. › A 95% confidence interval is a set of possible true values of the parameter of interest (e.g. mean, correlation coefficient, odds ratio, difference of means, proportion) that are consistent with the data. 5 Calculating a 95% Confidence Interval for the Mean of a Sample › ‾x‾ ± k * (standard deviation / √n) › Where ‾x‾ is the mean of the sample › k is the constant dependent on the hypothesized distribution of the sample mean, the sample size and the amount of confidence desired. -
4. Descriptive Statistics
4. Descriptive statistics Any time that you get a new data set to look at one of the first tasks that you have to do is find ways of summarising the data in a compact, easily understood fashion. This is what descriptive statistics (as opposed to inferential statistics) is all about. In fact, to many people the term “statistics” is synonymous with descriptive statistics. It is this topic that we’ll consider in this chapter, but before going into any details, let’s take a moment to get a sense of why we need descriptive statistics. To do this, let’s open the aflsmall_margins file and see what variables are stored in the file. In fact, there is just one variable here, afl.margins. We’ll focus a bit on this variable in this chapter, so I’d better tell you what it is. Unlike most of the data sets in this book, this is actually real data, relating to the Australian Football League (AFL).1 The afl.margins variable contains the winning margin (number of points) for all 176 home and away games played during the 2010 season. This output doesn’t make it easy to get a sense of what the data are actually saying. Just “looking at the data” isn’t a terribly effective way of understanding data. In order to get some idea about what the data are actually saying we need to calculate some descriptive statistics (this chapter) and draw some nice pictures (Chapter 5). Since the descriptive statistics are the easier of the two topics I’ll start with those, but nevertheless I’ll show you a histogram of the afl.margins data since it should help you get a sense of what the data we’re trying to describe actually look like, see Figure 4.2. -
Annex : Calculation of Mean and Standard Deviation
Annex : Calculation of Mean and Standard Deviation • A cholesterol control is run 20 times over 25 days yielding the following results in mg/dL: 192, 188, 190, 190, 189, 191, 188, 193, 188, 190, 191, 194, 194, 188, 192, 190, 189, 189, 191, 192. • Using the cholesterol control results, follow the steps described below to establish QC ranges . An example is shown on the next page. 1. Make a table with 3 columns, labeled A, B, C. 2. Insert the data points on the left (column A). 3. Add Data in column A. 4. Calculate the mean: Add the measurements (sum) and divide by the number of measurements (n). Mean= ∑ x +x +x +…. x 3809 = 190.5 mg/ dL 1 2 3 n N 20 5. Calculate the variance and standard deviation: (see formulas below) a. Subtract each data point from the mean and write in column B. b. Square each value in column B and write in column C. c. Add column C. Result is 71 mg/dL. d. Now calculate the variance: Divide the sum in column C by n-1 which is 19. Result is 4 mg/dL. e. The variance has little value in the laboratory because the units are squared. f. Now calculate the SD by taking the square root of the variance. g. The result is 2 mg/dL. Quantitative QC ● Module 7 ● Annex 1 A B C Data points. 2 xi −x (x −x) X1-Xn i 192 mg/dL 1.5 2.25 mg 2/dL 2 188 mg/dL -2.5 6.25 mg 2/dL2 190 mg/dL -0.5 0.25 mg 2/dL2 190 mg/dL -0.5 0.25 mg 2/dL2 189 mg/dL -1.5 2.25 mg 2/dL2 191 mg/dL 0.5 0.25 mg 2/dL2 188 mg/dL -2.5 6.25 mg 2/dL2 193 mg/dL 2.5 6.25 mg 2/dL2 188 mg/dL -2.5 6.25 mg 2/dL2 190 mg/dL -0.5 0.25 mg 2/dL2 191 mg/dL 0.5 0.25 mg 2/dL2 194 mg/dL 3.5 12.25 mg 2/dL2 194 mg/dL 3.5 12.25 mg 2/dL2 188 mg/dL -2.5 6.25 mg 2/dL2 192 mg/dL 1.5 2.25 mg 2/dL2 190 mg/dL -0.5 0.25 mg 2/dL2 189 mg/dL -1.5 2.25 mg 2/dL2 189 mg/dL -1.5 2.25 mg 2/dL2 191 mg/dL 0.5 0.25 mg 2/dL2 192 mg/dL 1.5 2.25 mg 2/dL2 2 2 2 ∑x=3809 ∑= -1 ∑ ( x i − x ) Sum of Col C is 71 mg /dL 2 − 2 ∑ (X i X) 2 SD = S = n−1 mg/dL SD = S = 71 19/ = 2mg / dL The square root returns the result to the original units . -
P Values and Confidence Intervals Friends Or Foe
P values and Confidence Intervals Friends or Foe Dr.L.Jeyaseelan Dept. of Biostatistics Christian Medical College Vellore – 632 002, India JPGM WriteCon March 30-31, 2007, KEM Hospital, Mumbai Confidence Interval • The mean or proportion observed in a sample is the best estimate of the true value in the population. • We can combine these features of estimates from a sample from a sample with the known properties of the normal distribution to get an idea of the uncertainty associated with a single sample estimate of the population value. Confidence Interval • The interval between Mean - 2SE and Mean + 2 SE will be 95%. • That is, we expect that 95% CI will not include the true population value 5% of the time. • No particular reason for choosing 95% CI other than convention. Confidence Interval Interpretation: The 95% CI for the sample mean is interpreted as a range of values which contains the true population mean with probability 0.95%. Ex: Mean serum albumin of the population (source) was 35 g/l. Calculate 95% CI for the mean serum albumin using each of the 100 random samples of size 25. Scatter Plot of 95% CI: Confidence intervals for mean serum albumin constructed from 100 random samples of size 25. The vertical lines show the range within which 95% of sample means are expected to fall. Confidence Interval • CI expresses a summary of the data in the original units or measurement. • It reflects the variability in the data, sample size and the actual effect size • Particularly helpful for non-significant findings. Relative Risk 1.5 95% CI 0.6 – 3.8 Confidence Interval for Low Prevalence • Asymptotic Formula: p ± 1.96 SE provides symmetric CI. -
Student's T Distribution
Student’s t distribution - supplement to chap- ter 3 For large samples, X¯ − µ Z = n √ (1) n σ/ n has approximately a standard normal distribution. The parameter σ is often unknown and so we must replace σ by s, where s is the square root of the sample variance. This new statistic is usually denoted with a t. X¯ − µ t = n√ (2) s/ n If the sample is large, the distribution of t is also approximately the standard normal. What if the sample is not large? In the absence of any further information there is not much to be said other than to remark that the variance of t will be somewhat larger than 1. If we assume that the random variable X has a normal distribution, then we can say much more. (Often people say the population is normal, meaning the distribution of the rv X is normal.) So now suppose that X is a normal random variable with mean µ and variance σ2. The two parameters µ and σ are unknown. Since a sum of independent normal random variables is normal, the distribution of Zn is exactly normal, no matter what the sample size is. Furthermore, Zn has mean zero and variance 1, so the distribution of Zn is exactly that of the standard normal. The distribution of t is not normal, but it is completely determined. It depends on the sample size and a priori it looks like it will depend on the parameters µ and σ. With a little thought you should be able to convince yourself that it does not depend on µ.