Statistics and Epidemiology

Slide 1: Statistics and Epidemiology
Review of Book 7, SEER Program Self Instructional Manual for Cancer Registrars
Cancer Information Management Program
Outcomes, Data Quality and Data Utilization Course
Statistics, Epidemiology and Data Utilization Module

When or where do you use statistics in your everyday operations? Annual reports, data requests? How about evaluating quality or productivity, cancer conference documentation, survey documentation, or budgets, just to name a few? If you stop and think about it, you may find that you are already using many of the statistical techniques discussed in Book 7; you may just not think of yourself as a "statistician". This presentation provides a brief overview of the statistical and epidemiological methods introduced in Book 7: Statistics and Epidemiology for Cancer Registries. You may find it helpful to follow along in your book.

Slide 2: Statistics
- Branch of mathematics
- Collection, summarization, analysis, interpretation, and presentation of masses of numerical data

We will begin with statistics. Statistics is the science of gathering and interpreting facts and figures. It can be further divided into descriptive and inferential statistics. Descriptive statistics will be discussed in more detail in the following slides. Inferential statistics refers to the procedures used to make a statement about a population based on the results of a sample; it is related to the methods applied in hypothesis testing. Cancer registrars would usually seek assistance from a statistician or researcher before attempting a study that uses hypothesis testing.

Slide 3: Statistical Analysis
- Summarize the essential features and relationships of the data
- Reveal the major characteristics of the patient group
- Determine broad patterns of behavior or tendencies

Statistical analysis is the process of:
- summarizing data
- identifying the major characteristics of the patient group based on the data
- using the data and the major characteristics identified to determine broad patterns of behavior of a group of people.

Slide 4: Descriptive Statistics (SEER Book 7, Section B)
Descriptive statistics uses numerical summaries to describe an observed frequency distribution.

Study Hint: Have you ever read the definition of a term or phrase and then had to look up the definitions of the words it contained? Try to think of the principles of statistics as stepping stones: just as you need to know how to add and subtract before you can do algebra, you need a clear understanding of the basic terminology. You will see these terms used again and again as we get into the more complicated side of statistics and epidemiology. For example, once you clearly understand the term "mean", the same principle applies to any other definition or phrase that includes the word "mean": a mean tumor size or a mean survival time is still based on calculating an average.

Slide 5: Shorthand notation
X = value of an observed measurement (X=…)
∑ = sum of the values of X
n = number (count) of observations in our group
n - 1 = degrees of freedom (the number of observations that are free to vary)
X̄ = mean, average value
SD = standard deviation
√ = square root

This slide contains a reference key of the symbols used in the formulas on the upcoming slides. It is not necessary to memorize these symbols; the key is for your own use during independent study and for understanding the shorthand used in the workbook. The calculation of variance introduces the term "degrees of freedom".
If you have 10 observations and their sum is 100, the first 9 (10 - 1) can be any number, but the last one cannot: it has to be the number that, when added to the first nine, equals 100. Therefore, 9 of the numbers are free to vary.

Slide 6: How do we summarize a set of data?
Characterize a set of data in terms of:
1) Central values about which the data tend to cluster
2) The amount of spread or the dispersion of the observations

Remember that statistics is the collection and summarization of masses of numerical data. If measurable characteristics did not vary among individuals, describing a set of data would be complete after the first observation. Example: if everyone's blood pressure were 110/70, there would be no need to take blood pressure at each doctor visit. We will see this concept again when we look at the normal curve.

Slide 7: Measures of Central Tendency
- Central values about which the data tend to cluster
- "Typical" values
- Example: Average tumor size

Measures of central tendency are important because measurable characteristics (such as age and stage) vary from individual to individual, so we need to summarize the data in order to analyze the results. Typical values are the values you see most often. The graphic at the bottom of the slide is a visual reminder: when you think of measures of central tendency, think about how the data cluster in the middle of the observations.

Slide 8: Measures of Central Tendency
Widely used measures of central tendency:
- Mean
- Median
- Mode

The tools we use to calculate typical values are the three M's: mean, median, and mode. To help describe them, we will use an example of five tumor sizes: 8, 5, 3, 6, 3.

Slide 9: Mean
- Average (X̄)
- Influenced more by extreme values

$$\bar{X} = \frac{\text{sum of all values}}{\text{number of values}} = \frac{\sum X}{n} = \frac{8+5+3+6+3}{5} = \frac{25}{5} = 5$$

Mean is the statistical term for average.
Q: What is the mean in our example of tumor sizes?
A: Add all of the values together (25) and divide by the number of values (5). The mean is 5.
Q: What would happen if we added an extreme value (20) to our set of tumor sizes?
A: Add 20 to the list of tumor sizes; the mean becomes 45 / 6 = 7.5. As you can see, the mean is more influenced by extreme values than the median and mode are.

Note: It is not necessary to memorize the formulas in this presentation. They are provided only to help demonstrate the terms and definitions and how they are used.

Slide 10: Median
- Middle value
- Sort the observations in order from smallest to largest
- Stable measure
Unsorted: 8, 5, 3, 6, 3
Sorted: 3, 3, 5, 6, 8

The median is the 50th percentile: half of the values are smaller and half are larger.
Q: What is the median in our example of tumor sizes?
A: First sort the values from smallest to largest, then take the middle value. The median is 5.
Q: What would happen to the median if we added an extreme value, again 20?
A: Add 20 to the list of tumor sizes: 3, 3, 5, 6, 8, 20. The middle now falls between 5 and 6. When the middle falls between two values, average the two middle values: (5 + 6) / 2 = 5.5. The median is now 5.5.
Stable measure: adding extreme values to a series of observations tends to cause only a limited change in the median.

Slide 11: Mode
- Most frequently seen value: 3, 3, 5, 6, 8 (the mode is 3)
- There may be no modal value (3, 5, 6, 8)
- There may be more than one modal value (3, 3, 5, 6, 6, 8)

The mode is the value that occurs most frequently. A distribution with two most-common values is called a bimodal distribution.

Slide 12: Measures of Variation
- Amount of spread or dispersion of the observations
- Example: Fluctuation of tumor sizes

Again, the graphic at the bottom of the slide is a visual reminder: when you think of measures of variation, think about how the data spread out across the observations.

Slide 13: Measures of Variation
Widely used measures of variation:
- Range
- Standard Deviation (SD)

The tools we use to measure variation are range and standard deviation.
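Before going further with variation, the central-tendency arithmetic from Slides 9 to 11 can be checked with a short Python sketch. It is not part of Book 7; it assumes Python 3 and its standard `statistics` module. The helper `modes` is a hypothetical name written here to follow the slide's convention that a set with no repeated value has no mode (the standard library's `statistics.multimode` would instead return every value in that case).

```python
from collections import Counter
from statistics import mean, median


def modes(values):
    """Return every value that occurs most often, sorted.

    Follows the slide's convention: if no value repeats, there is no
    modal value, so an empty list is returned.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


tumor_sizes = [8, 5, 3, 6, 3]        # the five tumor sizes from Slide 9
with_extreme = tumor_sizes + [20]    # same data with the extreme value 20 added

print(mean(tumor_sizes), mean(with_extreme))      # 5 7.5
print(median(tumor_sizes), median(with_extreme))  # 5 5.5
print(modes(tumor_sizes))                         # [3]
print(modes([3, 5, 6, 8]))                        # []
print(modes([3, 3, 5, 6, 6, 8]))                  # [3, 6]
```

Comparing the two columns shows the point made on the slides: adding 20 moves the mean from 5 to 7.5 but the median only from 5 to 5.5.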
The standard deviation is a companion to the mean.
Q: What type of measure is the mean?
A: Central tendency (typical values, clustering in the middle).
Q: And what type of measure is the standard deviation?
A: Variation (spread, dispersion).
So the standard deviation expresses the spread of the data about the mean. Make a mental note of this; you will see this concept again when we talk about the normal curve.

Slide 14: Range
- Difference between the highest and lowest values
- Easiest measure of variation
- Greatly influenced by extreme values
Highest # - Lowest # = 8 - 3 = 5

The range is the easiest measure: it is just simple subtraction.
Q: What is the range in our example of tumor sizes?
A: The highest number is 8 and the lowest is 3, so the range is 8 - 3 = 5.
Q: What would happen to the range if we added an extreme value, again 20?
A: 20 - 3 = 17. As you can see, the range is greatly influenced by extreme values.

Study Hint: An exam question may not be worded exactly as it appears on the slides; it may be asked with a slight twist, so to speak. We have talked mostly about which measures are most influenced by extreme values. A good exam question that comes to mind is:
Q: Given a set of values, which is least likely to be influenced by an extreme value: mean, median, mode, or range?
A: Median

Slide 15: Standard Deviation
- How far the observations tend to vary from the mean
- Square root of the variance

$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{18}{5 - 1}} = \sqrt{\frac{18}{4}} = \sqrt{4.5} = 2.12$$

The standard deviation sounds and looks complicated, but the calculation is based on fairly simple mathematics.
Q: First of all, how did we get 18 in the example above?
A: Remember, we are still using our example of five tumor sizes (8, 5, 3, 6, 3), and the symbols used here are from our reference key. Subtract the mean (5) from each value, square each difference, and add the squares: 9 + 0 + 4 + 1 + 4 = 18.
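The range and standard deviation calculations from Slides 14 and 15 can be checked with a short Python sketch. This is a minimal illustration, not part of Book 7, assuming Python 3; `value_range` and `sample_sd` are hypothetical helper names. `sample_sd` follows the Slide 15 formula step by step and divides by the degrees of freedom (n - 1) from the Slide 5 reference key; the standard library's `statistics.stdev`, which uses the same n - 1 divisor, is included only as a cross-check.

```python
import math
from statistics import stdev


def value_range(values):
    """Range = highest value minus lowest value (Slide 14)."""
    return max(values) - min(values)


def sample_sd(values):
    """Sample standard deviation using n - 1 degrees of freedom (Slide 15)."""
    n = len(values)
    x_bar = sum(values) / n                             # the mean, X-bar
    squared_devs = [(x - x_bar) ** 2 for x in values]   # (X - X-bar)^2 for each value
    variance = sum(squared_devs) / (n - 1)              # divide by degrees of freedom
    return math.sqrt(variance)                          # SD = square root of variance


tumor_sizes = [8, 5, 3, 6, 3]

print(value_range(tumor_sizes))                  # 5
print(value_range(tumor_sizes + [20]))           # 17
print(sum((x - 5) ** 2 for x in tumor_sizes))    # 18
print(round(sample_sd(tumor_sizes), 2))          # 2.12
print(round(stdev(tumor_sizes), 2))              # 2.12
```

Because the divisor is n - 1 rather than n, this is the sample standard deviation; dividing by n instead (as `statistics.pstdev` does) would give the square root of 18 / 5, about 1.90, for the same data.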