Introduction to Biostatistics - Lecture 3: Statistical Inference for Proportions

University of Massachusetts Medical School
eScholarship@UMMS
PEER Liberia Project, UMass Medical School Collaborations in Liberia
2019-2

Introduction to Biostatistics - Lecture 3: Statistical Inference for Proportions
Jonggyu Baek, University of Massachusetts Medical School

Follow this and additional works at: https://escholarship.umassmed.edu/liberia_peer
Part of the Biostatistics Commons, Family Medicine Commons, Infectious Disease Commons, Medical Education Commons, and the Public Health Commons

Repository Citation
Baek J. (2019). Introduction to Biostatistics - Lecture 3: Statistical Inference for Proportions. PEER Liberia Project. https://doi.org/10.13028/9g0p-am33. Retrieved from https://escholarship.umassmed.edu/liberia_peer/11

This material is brought to you by eScholarship@UMMS. It has been accepted for inclusion in PEER Liberia Project by an authorized administrator of eScholarship@UMMS. For more information, please contact [email protected].

Introduction to Biostatistics
2/29/2019
Jonggyu Baek, PhD

Lecture 3: Statistical Inference for proportions
CTS605A - Lecture Notes, Jonggyu Baek, PhD

Statistical Inference

Two broad areas of statistical inference:
• Estimation: Use sample statistics to estimate the unknown population parameter.
  – Point Estimate: the best single value to describe the unknown parameter.
  – Standard Error (SE): the standard deviation of the sample statistic. It indicates how precise the point estimate is.
  – Confidence Interval (CI): a range of plausible values for the unknown parameter, with a 100(1-α)% level of confidence.
• Hypothesis Testing: Test a specific statement (assumption) about the unknown parameter.

Statistical Inference for proportions

Suppose X is a discrete (binary) variable with:

\[ X = \begin{cases} 1, & \text{event } A, \text{ with probability } p \\ 0, & \text{otherwise, with probability } 1 - p \end{cases} \]

We are interested in estimating the probability p of the event A in a population of size N:

\[ P(\text{event } A) = \frac{\#\text{ of favorable outcomes}}{\text{sample space}} = \frac{\#\text{ of successes (1)}}{\text{total } \#\text{ of units in the population}} \]

Suppose $Y = \#\text{ of successes} = \sum_{i=1}^{N} X_i$. Then $Y \sim \text{Binomial}(N, p)$ and

\[ P(\text{event } A) = \frac{\sum_{i=1}^{N} X_i}{N} = p \]

What is this?
• A proportion
• A population mean

Hence, all the statistical inference procedures we learned for means also apply to proportions.

Statistical Inference for proportions
• Case 1: single population (one-sample)
• Case 2: two-independent populations (two-samples)
• Case 3: two-dependent populations (paired or matched samples)

One Sample

Case 1: single population (one-sample)
Suppose X = 'stroke' from a population with mean $\mu = p$ and standard deviation $\sigma = \sqrt{p(1-p)}$.

Estimation
• Point Estimates:
  – of p: $\bar{x} = \hat{p}$
  – of σ: $s = \sqrt{\hat{p}(1-\hat{p})}$
  – precision of $\bar{x}$: standard error (s.e.) of $\hat{p}$ → $\sqrt{\hat{p}(1-\hat{p})/n}$
• 100(1-α)% CI:

\[ \left[\, \hat{p} - Z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\;,\;\; \hat{p} + Z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \,\right] \]
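
The one-sample estimation formulas can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `proportion_ci` and the 0/1 data (15 events among 100 subjects) are hypothetical and are not taken from the lecture or from the FHS data.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, alpha=0.05):
    """Point estimate, standard error and Wald CI for a proportion.

    x is a sequence of 0/1 values (1 = event, 0 = no event).
    """
    n = len(x)
    p_hat = sum(x) / n                        # sample mean of 0/1 data = sample proportion
    se = sqrt(p_hat * (1 - p_hat) / n)        # standard error of p_hat
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{1-alpha/2}
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


# Hypothetical data: 15 events among 100 subjects.
x = [1] * 15 + [0] * 85
p_hat, se, (lo, hi) = proportion_ci(x)
print(f"p_hat={p_hat:.3f}  se={se:.4f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Because the data are coded 0/1, the sample mean and the sample proportion are the same number, which is why the interval has the same form as the CI for a mean.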
• Point Estimates:
  – If the binary variable is coded as 1 = yes, 0 = no, we can also calculate the mean, which equals $\hat{p}$.
• What about $\sigma = \sqrt{p(1-p)}$? It is estimated by $s = \sqrt{\hat{p}(1-\hat{p})}$.

Hypothesis Testing
• Null hypothesis (H0): $p = p_0$
• Alternative hypothesis (H1):
  – $p \neq p_0$ (two-sided test), or
  – $p < p_0$ (one-sided test), or
  – $p > p_0$ (one-sided test)
• Test statistic:

\[ Z_0 = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}} \;\overset{H_0}{\sim}\; N(0, 1) \]

Why? From the CLT (for large n, writing the second argument as the standard deviation):

\[ \bar{x} \sim N\!\left(\mu, \frac{\sigma}{\sqrt{n}}\right), \quad \text{i.e.,} \quad \hat{p} \sim N\!\left(p, \sqrt{\frac{p(1-p)}{n}}\right) \]
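
The CLT statement can be checked informally by simulation. The sketch below is an assumption-laden illustration, not part of the lecture: the values p = 0.12, n = 200 and the number of replications are arbitrary choices. It draws many samples, computes the sample proportion in each, and compares the spread of those proportions with $\sqrt{p(1-p)/n}$.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)           # fixed seed so the run is repeatable
p, n, reps = 0.12, 200, 5000     # hypothetical true proportion, sample size, replications

# One sample proportion per simulated sample of n Bernoulli(p) draws.
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

sim_mean = mean(p_hats)              # should be close to p
sim_sd = stdev(p_hats)               # should be close to sqrt(p(1-p)/n)
theory_sd = sqrt(p * (1 - p) / n)

print(f"mean of p_hat: {sim_mean:.4f} (p = {p})")
print(f"sd of p_hat:   {sim_sd:.4f} (theory = {theory_sd:.4f})")
```

The simulated mean and standard deviation should land near p and the theoretical standard error; a histogram of `p_hats` would look roughly bell-shaped for a sample size this large.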
Hypothesis Testing
• Test statistic:

\[ Z_0 = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}} \;\overset{H_0}{\sim}\; N(0, 1) \]

• Decision rules by H1, testing H0: $p = p_0$ vs. H1. Reject H0 if:
  – H1: $p \neq p_0$: $Z_0 < Z_{\alpha/2}$ or $Z_0 > Z_{1-\alpha/2}$
  – H1: $p < p_0$: $Z_0 < Z_{\alpha}$
  – H1: $p > p_0$: $Z_0 > Z_{1-\alpha}$

One Sample
• Case 1: single population (one-sample)
Example (FHS):
  – Calculate the 95% CI for the proportion of strokes in the population
  – Test whether this proportion differs from 0.12 = 12%, i.e., test H0: $p = p_0 = 0.12$

One Sample (summary)
• Case 1: single population (one-sample)
X: discrete (binary) variable (e.g., 'stroke')
Statistical inference about: p (proportion of strokes in the population)

ESTIMATION
• Point estimate: $\hat{p}$
• Standard error: $\sqrt{\hat{p}(1-\hat{p})/n}$
• 100(1-α)% CI: $\hat{p} \pm Z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

HYPOTHESIS TESTING (H0: $p = p_0$)
• Test statistic: $Z_0 = \dfrac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$
• Decision rules: reject H0 against H1:
  – $p \neq p_0$: $Z_0 < Z_{\alpha/2}$ or $Z_0 > Z_{1-\alpha/2}$
  – $p < p_0$: $Z_0 < Z_{\alpha}$
  – $p > p_0$: $Z_0 > Z_{1-\alpha}$

Two Independent Samples
• Case 2: two-independent populations (two-samples)
Suppose Y = 'stroke' and X = 'prevchd' (both binary variables). There are two independent populations, one with coronary heart disease (CHD) and the other without CHD. We want to compare the proportions of strokes between those two populations.
• $p_1$ is the proportion of strokes in the population with CHD.
• $p_2$ is the proportion of strokes in the population without CHD.
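
The decision rules for the one-sample test can be written as a small function. This is a sketch under assumptions: the function name `z_test_proportion`, its `alternative` labels, and the counts (90 events among 1000 subjects) are hypothetical and are not the FHS results. The standard error uses $\hat{p}$, as in the formula above; some textbooks use $p_0$ in the denominator instead.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, p0, alternative="two-sided", alpha=0.05):
    """One-sample Z test for a proportion; returns (Z0, reject_H0)."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)    # s.e. based on p_hat, as in the lecture formula
    z0 = (p_hat - p0) / se
    q = NormalDist().inv_cdf              # standard normal quantile function
    if alternative == "two-sided":        # H1: p != p0
        reject = z0 < q(alpha / 2) or z0 > q(1 - alpha / 2)
    elif alternative == "less":           # H1: p < p0
        reject = z0 < q(alpha)
    elif alternative == "greater":        # H1: p > p0
        reject = z0 > q(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z0, reject


# Hypothetical counts: 90 events among 1000 subjects, tested against p0 = 0.12.
z0, reject = z_test_proportion(90, 1000, 0.12)
print(f"Z0 = {z0:.3f}, reject H0 at the 5% level: {reject}")
```

Calling the function with `alternative="less"` or `alternative="greater"` applies the corresponding one-sided rule to the same statistic.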

Two Independent Samples (summary)
• Case 2: two-independent populations (two-samples)
Y = 'stroke' and X = 'prevchd' (both binary variables)
Statistical inference about: $p_1 - p_2$ (compare proportions of strokes between the two populations)

ESTIMATION
• Point estimate: $\hat{p}_1 - \hat{p}_2$
• Standard error: $\text{s.e.} = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
• 100(1-α)% CI: $\hat{p}_1 - \hat{p}_2 \pm Z_{1-\alpha/2}\,\text{s.e.}$

HYPOTHESIS TESTING (H0: $p_1 - p_2 = 0$)
• Test statistic: $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\text{s.e.}}$
• Decision rules: reject H0 against H1 (for 'small' $n_1$, $n_2$ use the Binomial distribution, exact test)
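
The two-sample quantities can be sketched the same way. The function name `two_sample_proportions` and the counts (30 strokes among 200 subjects with CHD, 80 among 1000 without) are made up for illustration and are not taken from the lecture's data. The standard error is the unpooled form from the summary above; a pooled standard error is a common alternative for the test and is not shown here.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_proportions(x1, n1, x2, n2, alpha=0.05):
    """Difference in proportions with unpooled s.e., Z statistic and Wald CI."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2                                         # point estimate of p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)     # unpooled standard error
    z = diff / se                                          # test statistic for H0: p1 - p2 = 0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)           # Z_{1-alpha/2}
    return diff, se, z, (diff - z_crit * se, diff + z_crit * se)


# Hypothetical counts: 30/200 strokes with CHD, 80/1000 without CHD.
diff, se, z, (lo, hi) = two_sample_proportions(30, 200, 80, 1000)
print(f"diff={diff:.3f}  se={se:.4f}  Z={z:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

With small group sizes the normal approximation is unreliable, which is where the exact test mentioned in the summary comes in.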