TABLES and FORMULAS for MOORE Basic Practice of Statistics


Exploring Data: Distributions

• Look for overall pattern (shape, center, spread) and deviations (outliers).
• Mean (use a calculator): $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{1}{n}\sum x_i$
• Standard deviation (use a calculator): $s = \sqrt{\dfrac{1}{n-1}\sum (x_i - \bar{x})^2}$
• Median: Arrange all observations from smallest to largest. The median $M$ is located $(n+1)/2$ observations from the beginning of this list.
• Quartiles: The first quartile $Q_1$ is the median of the observations whose position in the ordered list is to the left of the location of the overall median. The third quartile $Q_3$ is the median of the observations to the right of the location of the overall median.
• Five-number summary: Minimum, $Q_1$, $M$, $Q_3$, Maximum.
• Standardized value of $x$: $z = \dfrac{x - \mu}{\sigma}$

Exploring Data: Relationships

• Look for overall pattern (form, direction, strength) and deviations (outliers, influential observations).
• Correlation (use a calculator): $r = \dfrac{1}{n-1}\sum \left(\dfrac{x_i - \bar{x}}{s_x}\right)\left(\dfrac{y_i - \bar{y}}{s_y}\right)$
• Least-squares regression line (use a calculator): $\hat{y} = a + bx$ with slope $b = r s_y/s_x$ and intercept $a = \bar{y} - b\bar{x}$
• Residuals: residual = observed $y$ − predicted $y$ = $y - \hat{y}$

Producing Data

• Simple random sample: Choose an SRS by giving every individual in the population a numerical label and using Table B of random digits to choose the sample.
• Randomized comparative experiments: random allocation divides the subjects into Group 1, which receives Treatment 1, and Group 2, which receives Treatment 2; the response is then observed in both groups and compared.

Probability and Sampling Distributions

• Probability rules: Any probability satisfies $0 \le P(A) \le 1$. The sample space $S$ has probability $P(S) = 1$. If events $A$ and $B$ are disjoint, $P(A \text{ or } B) = P(A) + P(B)$. For any event $A$, $P(A \text{ does not occur}) = 1 - P(A)$.
• Sampling distribution of a sample mean: $\bar{x}$ has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$; $\bar{x}$ has a Normal distribution if the population distribution is Normal.
• Central limit theorem: $\bar{x}$ is approximately Normal when $n$ is large.

Basics of Inference

• $z$ confidence interval for a population mean ($\sigma$ known, SRS from Normal population): $\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}}$, with $z^*$ from $N(0,1)$.
• Sample size for desired margin of error $m$: $n = \left(\dfrac{z^*\sigma}{m}\right)^2$
• $z$ test statistic for $H_0\colon \mu = \mu_0$ ($\sigma$ known, SRS from Normal population): $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, with $P$-values from $N(0,1)$.

Inference About Means

• $t$ confidence interval for a population mean (SRS from Normal population): $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$, with $t^*$ from $t(n-1)$.
• $t$ test statistic for $H_0\colon \mu = \mu_0$ (SRS from Normal population): $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with $P$-values from $t(n-1)$.
• Matched pairs: To compare the responses to the two treatments, apply the one-sample $t$ procedures to the observed differences.
• Two-sample $t$ confidence interval for $\mu_1 - \mu_2$ (independent SRSs from Normal populations): $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$, with conservative $t^*$ from $t$ with df the smaller of $n_1 - 1$ and $n_2 - 1$ (or use software).
• Two-sample $t$ test statistic for $H_0\colon \mu_1 = \mu_2$ (independent SRSs from Normal populations): $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, with conservative $P$-values from $t$ with df the smaller of $n_1 - 1$ and $n_2 - 1$ (or use software).

Inference About Proportions

• Sampling distribution of a sample proportion: when the population and the sample size are both large and $p$ is not close to 0 or 1, $\hat{p}$ is approximately Normal with mean $p$ and standard deviation $\sqrt{p(1-p)/n}$.
• Large-sample $z$ confidence interval for $p$: $\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$, with $z^*$ from $N(0,1)$. Plus four to greatly improve accuracy: use the same formula after adding two successes and two failures to the data.
• $z$ test statistic for $H_0\colon p = p_0$ (large SRS): $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$, with $P$-values from $N(0,1)$.
• Sample size for desired margin of error $m$: $n = \left(\dfrac{z^*}{m}\right)^2 p^*(1-p^*)$, where $p^*$ is a guessed value for $p$ or $p^* = 0.5$.
• Large-sample $z$ confidence interval for $p_1 - p_2$: $(\hat{p}_1 - \hat{p}_2) \pm z^*\,\mathrm{SE}$, with $z^*$ from $N(0,1)$, where the standard error of $\hat{p}_1 - \hat{p}_2$ is $\mathrm{SE} = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$. Plus four to greatly improve accuracy: use the same formulas after adding one success and one failure to each sample.
• Two-sample $z$ test statistic for $H_0\colon p_1 = p_2$ (large independent SRSs): $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $\hat{p}$ is the pooled proportion of successes.
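The routine calculations above are easy to check in software. Here is a minimal sketch in Python (assuming SciPy is installed; the data and counts are hypothetical, chosen only for illustration) that computes a one-sample $t$ confidence interval and a large-sample $z$ interval for a proportion exactly as the formulas read:

```python
import math
from scipy import stats

# One-sample t confidence interval: x_bar +/- t* s / sqrt(n)
data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 4.7, 5.0]   # hypothetical sample
n = len(data)
x_bar = sum(data) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))
t_star = stats.t.ppf(0.975, df=n - 1)              # 95% confidence, t(n-1)
moe = t_star * s / math.sqrt(n)
print(f"95% t CI for mu: {x_bar - moe:.3f} to {x_bar + moe:.3f}")

# Large-sample z interval for p: p_hat +/- z* sqrt(p_hat(1-p_hat)/n)
successes, trials = 54, 120                        # hypothetical counts
p_hat = successes / trials
z_star = stats.norm.ppf(0.975)                     # 1.96 for 95% confidence
moe_p = z_star * math.sqrt(p_hat * (1 - p_hat) / trials)
print(f"95% z CI for p: {p_hat - moe_p:.3f} to {p_hat + moe_p:.3f}")
```

SciPy's ready-made routines (for example, `stats.ttest_1samp`) give the matching test statistics and $P$-values; spelling the intervals out once makes clear that the software is doing nothing more than these formulas.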
The Chi-Square Test

• Expected count for a cell in a two-way table: expected count = (row total × column total) / table total.
• Chi-square test statistic for testing whether the row and column variables in an $r \times c$ table are unrelated (expected cell counts not too small): $X^2 = \sum \dfrac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$, with $P$-values from the chi-square distribution with df $= (r-1) \times (c-1)$.
• Describe the relationship using percents, comparison of observed with expected counts, and the terms of $X^2$.

Inference for Regression

• Conditions for regression inference: $n$ observations on $x$ and $y$. The response $y$ for any fixed $x$ has a Normal distribution with mean given by the true regression line $\mu_y = \alpha + \beta x$ and standard deviation $\sigma$. The parameters are $\alpha$, $\beta$, and $\sigma$.
• Estimate $\alpha$ by the intercept $a$ and $\beta$ by the slope $b$ of the least-squares line. Estimate $\sigma$ by the regression standard error $s = \sqrt{\dfrac{1}{n-2}\sum \text{residual}^2}$. Use software for all standard errors in regression.
• $t$ confidence interval for the regression slope $\beta$: $b \pm t^* SE_b$, with $t^*$ from $t(n-2)$.
• $t$ test statistic for no linear relationship, $H_0\colon \beta = 0$: $t = \dfrac{b}{SE_b}$, with $P$-values from $t(n-2)$.
• $t$ confidence interval for the mean response $\mu_y$ when $x = x^*$: $\hat{y} \pm t^* SE_{\hat{\mu}}$, with $t^*$ from $t(n-2)$.
• $t$ prediction interval for an individual observation $y$ when $x = x^*$: $\hat{y} \pm t^* SE_{\hat{y}}$, with $t^*$ from $t(n-2)$.

One-Way Analysis of Variance: Comparing Several Means

• The ANOVA $F$ test assesses whether all of $I$ populations have the same mean, based on independent SRSs from $I$ Normal populations with the same $\sigma$. $P$-values come from the $F$ distribution with $I-1$ and $N-I$ degrees of freedom, where $N$ is the total number of observations in all samples.
• Describe the data using the $I$ sample means and standard deviations and side-by-side graphs of the samples.
• The ANOVA $F$ test statistic (use software) is $F = \mathrm{MSG}/\mathrm{MSE}$, where $\mathrm{MSG} = \dfrac{n_1(\bar{x}_1 - \bar{x})^2 + \cdots + n_I(\bar{x}_I - \bar{x})^2}{I-1}$ and $\mathrm{MSE} = \dfrac{(n_1-1)s_1^2 + \cdots + (n_I-1)s_I^2}{N-I}$.
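As a sanity check on the ANOVA formulas, the sketch below (again Python with SciPy, on made-up data) computes MSG and MSE directly from the group sizes, means, and variances and verifies that $F = \mathrm{MSG}/\mathrm{MSE}$ matches SciPy's built-in one-way ANOVA:

```python
from scipy import stats

# Three hypothetical samples (I = 3 groups)
groups = [
    [23.1, 25.4, 24.8, 26.0],
    [20.2, 21.9, 22.5, 19.8, 21.0],
    [24.5, 26.1, 25.3, 27.2],
]

I = len(groups)
N = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / N

means = [sum(g) / len(g) for g in groups]
variances = [sum((x - m) ** 2 for x in g) / (len(g) - 1)
             for g, m in zip(groups, means)]

# MSG = [n1(x1bar - xbar)^2 + ... + nI(xIbar - xbar)^2] / (I - 1)
msg = sum(len(g) * (m - grand_mean) ** 2
          for g, m in zip(groups, means)) / (I - 1)
# MSE = [(n1 - 1)s1^2 + ... + (nI - 1)sI^2] / (N - I)
mse = sum((len(g) - 1) * v for g, v in zip(groups, variances)) / (N - I)

F = msg / mse
p = stats.f.sf(F, I - 1, N - I)   # P-value from F(I-1, N-I)
print(f"by hand: F = {F:.3f}, P = {p:.4f}")
print("scipy:  ", stats.f_oneway(*groups))  # should agree
```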
TABLE A  Standard Normal cumulative proportions

Table entry for z is the area under the standard Normal curve to the left of z.

z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
−3.4  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0002
−3.3  .0005  .0005  .0005  .0004  .0004  .0004  .0004  .0004  .0004  .0003
−3.2  .0007  .0007  .0006  .0006  .0006  .0006  .0006  .0005  .0005  .0005
−3.1  .0010  .0009  .0009  .0009  .0008  .0008  .0008  .0008  .0007  .0007
−3.0  .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
−2.9  .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014
−2.8  .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
−2.7  .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
−2.6  .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
−2.5  .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
−2.4  .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064
−2.3  .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
−2.2  .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
−2.1  .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
−2.0  .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
−1.9  .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
−1.8  .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
−1.7  .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
−1.6  .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
−1.5  .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
−1.4  .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681
−1.3  .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
−1.2  .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
−1.1  .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
−1.0  .1587  .1562  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
−0.9  .1841  .1814  .1788  .1762  .1736  .1711  .1685  .1660  .1635  .1611
−0.8  .2119  .2090  .2061  .2033  .2005  .1977  .1949  .1922  .1894  .1867
−0.7  .2420  .2389  .2358  .2327  .2296  .2266  .2236  .2206  .2177  .2148
−0.6  .2743  .2709  .2676  .2643  .2611  .2578  .2546  .2514  .2483  .2451
−0.5  .3085  .3050  .3015  .2981  .2946  .2912  .2877  .2843  .2810  .2776
−0.4  .3446  .3409  .3372  .3336  .3300  .3264  .3228  .3192  .3156  .3121
−0.3  .3821  .3783  .3745  .3707  .3669  .3632  .3594  .3557  .3520  .3483
−0.2  .4207  .4168  .4129  .4090  .4052  .4013  .3974  .3936  .3897  .3859
−0.1  .4602  .4562  .4522  .4483  .4443  .4404  .4364  .4325  .4286  .4247
−0.0  .5000  .4960  .4920  .4880  .4840  .4801  .4761  .4721  .4681  .4641
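Each table entry is simply a value of the standard Normal cumulative distribution function, so any entry can be reproduced in software. A quick sketch (Python with SciPy assumed):

```python
from scipy import stats

# Table entry for z is the area to the left of z under the N(0,1) curve.
print(f"{stats.norm.cdf(-1.96):.4f}")  # 0.0250 -- row -1.9, column .06
print(f"{stats.norm.cdf(-3.40):.4f}")  # 0.0003 -- row -3.4, column .00
```

Calling `stats.norm.cdf` with positive z gives the entries of the table's second page (cumulative proportions above one-half), and `stats.norm.ppf` inverts the lookup, recovering z from a cumulative proportion.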