Targeted Maximum Likelihood Estimation in Safety Analysis

Sam Lendle (UC Berkeley), Bruce Fireman (Kaiser Permanente), Mark van der Laan (UC Berkeley)
ISPE Advanced Topics Session, Barcelona, August 2012

Outline
1 Introduction
2 Super learning
3 TMLE and collaborative TMLE
4 Kaiser Permanente data example
5 Simulations based on KP data

Traditional approach in epidemiology and clinical medicine
• Fit several parametric logistic regression models, and select a favorite one.
• Report the point estimate of the coefficient in front of treatment, confidence intervals, and p-value, as if this parametric model had been specified a priori.
• Problems:
  • The parametric model is misspecified, but parameter estimates are interpreted as if the model were correct.
  • Estimates of variance do not account for model selection, so confidence intervals and p-values are wrong, even if the final model is somehow correct!

The statistical estimation problem
• Observed data: realizations of random variables with a probability distribution.
• Statistical model: the set of possible data-generating distributions, defined by actual knowledge about the data; e.g. in an RCT we know the probability of each subject receiving treatment.
• Statistical target parameter: a function of the data-generating distribution that we wish to learn from the data.
• Estimator: an a priori-specified algorithm that takes the observed data and returns an estimate of the target parameter, benchmarked by a dissimilarity measure (e.g., MSE) with respect to the target parameter.

Causal inference
• Relies on non-testable assumptions in addition to the assumptions defining the statistical model (e.g. the "no unmeasured confounders" assumption).
• Allows a causal interpretation of statistical parameter estimates.
• Even if we don't believe the non-testable causal assumptions, the statistical estimation problem is still the same, and estimates still have valid statistical interpretations.

Targeted learning
• Define statistical models that reflect actual knowledge, and interesting target parameters.
• Avoid reliance on human art and unrealistic parametric models.
• Target the fit of the data-generating distribution to the parameter of interest.
• Provide statistical inference.
• Has been applied to: static or dynamic treatments, direct and indirect effects, parameters of MSMs, variable importance analysis, longitudinal/repeated-measures data with time-dependent confounding, censoring/missingness, case-control studies, and RCTs.

Two-stage estimation methodology
• Super learning (SL) (van der Laan et al. 2007)
  • Uses a library of candidate estimators (e.g. multiple parametric models, machine learning algorithms such as neural networks, random forests, etc.)
  • Builds a data-adaptive weighted combination of the estimators using cross-validation.
• Targeted maximum likelihood estimation (TMLE) (van der Laan and Rubin 2006)
  • Updates the initial estimate, often a super learner, to remove bias for the parameter of interest.
  • Calculates the final parameter from the updated fit of the data-generating distribution.

Super learning
• No need to choose a priori a particular parametric model or machine learning algorithm for a particular problem.
• Allows one to combine many data-adaptive estimators into one improved estimator.
• Grounded by oracle results for loss-function-based cross-validation (van der Laan and Dudoit 2003); the loss function needs to be bounded.
• Performs asymptotically as well as the best (oracle) weighted combination, or achieves the parametric rate of convergence. A minimal sketch of the weighting step follows below.

[Figure: relative cross-validated mean squared error of the super learner and candidate learners, compared to main-terms least squares regression.]
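To make the cross-validation weighting concrete, here is a minimal sketch in Python. It is not the authors' SuperLearner implementation; it assumes a regression problem with squared-error loss, uses two stand-in candidate learners from scikit-learn, and picks a convex combination by non-negative least squares on the out-of-fold predictions.

```python
# Minimal super learner sketch (illustrative, not the SuperLearner package).
import numpy as np
from scipy.optimize import nnls
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

def cv_predictions(learners, X, y, n_splits=10, seed=0):
    """Out-of-fold predictions: one column per candidate learner."""
    Z = np.zeros((len(y), len(learners)))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        for k, make_learner in enumerate(learners):
            Z[test, k] = make_learner().fit(X[train], y[train]).predict(X[test])
    return Z

def super_learner_weights(Z, y):
    """Non-negative least squares weights on the CV predictions, normalized to sum to 1."""
    w, _ = nnls(Z, y)
    return w / w.sum() if w.sum() > 0 else np.full(Z.shape[1], 1.0 / Z.shape[1])

# Hypothetical usage on simulated data: the weights reflect how well each candidate
# predicts out of fold, and the final predictor is the weighted blend of the candidates.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(size=500)
learners = [LinearRegression,
            lambda: RandomForestRegressor(n_estimators=200, random_state=0)]
print("super learner weights:", super_learner_weights(cv_predictions(learners, X, y), y))
```

A convex combination chosen against the cross-validated loss is one common metalearner; the oracle results above are what justify selecting the combination this way.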
TMLE algorithm

[Figure: schematic of the TMLE algorithm.]

Targeted MLE
1 Identify the least favorable parametric model for fluctuating the initial estimate P̂: a small "fluctuation" produces the maximum change in the target.
2 Identify the optimal amount of fluctuation by maximum likelihood.
3 Apply the optimal fluctuation to P̂ to obtain the first-step targeted maximum likelihood estimator.
4 Repeat until the incremental "fluctuation" is zero; in some important cases one step suffices for convergence.
5 The final probability distribution solves the efficient score equation for the target parameter, so the TMLE is a double robust and locally efficient plug-in estimator. A worked sketch for the average treatment effect follows below.
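To ground steps 1–3, here is a hedged sketch of TMLE for the average treatment effect with a binary outcome (the parameter used later in the KP example). The least favorable fluctuation of the outcome regression runs along the "clever covariate" H(A, W) = A/g(W) − (1 − A)/(1 − g(W)), the fluctuation size ε is fit by maximum likelihood with the initial fit as an offset, and the updated fit is plugged into the parameter mapping. Plain logistic regressions stand in for the super learner fits, and the clipping bounds are illustrative choices, not the authors' implementation.

```python
# Minimal TMLE sketch for the ATE with a binary outcome (illustrative only).
import numpy as np
import statsmodels.api as sm
from scipy.special import expit, logit
from sklearn.linear_model import LogisticRegression

def tmle_ate(Y, A, W):
    """Y, A: 1-D 0/1 arrays; W: 2-D covariate matrix. Returns the targeted plug-in ATE."""
    # 1. Initial estimate of the outcome regression Qbar(A, W) = P(Y = 1 | A, W)
    #    (a super learner would normally be used here).
    Q_fit = LogisticRegression(max_iter=1000).fit(np.column_stack([A, W]), Y)
    Q1 = np.clip(Q_fit.predict_proba(np.column_stack([np.ones_like(A), W]))[:, 1], 1e-6, 1 - 1e-6)
    Q0 = np.clip(Q_fit.predict_proba(np.column_stack([np.zeros_like(A), W]))[:, 1], 1e-6, 1 - 1e-6)
    QA = np.where(A == 1, Q1, Q0)

    # 2. Estimate the treatment mechanism g(W) = P(A = 1 | W); clip to guard against positivity problems.
    g = LogisticRegression(max_iter=1000).fit(W, A).predict_proba(W)[:, 1]
    g = np.clip(g, 0.01, 0.99)

    # 3. Fluctuate the initial fit along the clever covariate H(A, W),
    #    fitting epsilon by MLE with the initial fit as an offset.
    H = A / g - (1 - A) / (1 - g)
    eps = sm.GLM(Y, H.reshape(-1, 1), family=sm.families.Binomial(),
                 offset=logit(QA)).fit().params[0]

    # 4. Targeted (updated) predictions and the plug-in estimate of the ATE.
    Q1_star = expit(logit(Q1) + eps / g)
    Q0_star = expit(logit(Q0) - eps / (1 - g))
    return float(np.mean(Q1_star - Q0_star))
```

For this parameter the logistic fluctuation solves the efficient score equation after a single update, one of the "important cases" in which the algorithm converges in one step.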
Collaborative TMLE (CTMLE) algorithm
• Like TMLE, but chooses an estimate ĝ of the treatment mechanism/propensity score based on how well it helps estimate Ψ(Q0), instead of how well it estimates the true g0.
• Builds the estimate of g0 in a stepwise fashion:
  • The strongest confounders are adjusted for first.
  • Instrumental variables and weak confounders tend to be excluded.
• The order in which terms are added to ĝ is chosen via a penalized log-likelihood, and the number of terms is chosen via cross-validation.

Kang and Schafer (2007) simulations
• The outcome Y is continuous and subject to missingness; there are 4 covariates W1, W2, W3, W4.
• The true population mean (the target parameter) is 210; the mean among the non-missing is 200.
• Positivity violations: g0(∆ = 1 | W) as small as 0.01.
• Modification 1: stronger positivity violations, with g0(∆ = 1 | W) as small as 1.1 × 10^-5.
• Modification 2: the same as Modification 1, but one covariate no longer affects Y, so it is an instrumental variable.

[Figure: boxplots of estimates in the Kang and Schafer simulation for OLS, WLS, TMLE, A-IPCW, and C-TMLE.]

[Figure: boxplots of estimates under Modification 1 for OLS, WLS, TMLE, A-IPCW, and C-TMLE.]

[Figure: boxplots of estimates under Modification 2 for OLS, WLS, TMLE, A-IPCW, and C-TMLE.]

Description of dataset
• A subset of data from Kaiser Permanente, part of which is used in FDA's Mini-Sentinel drug safety surveillance.
• Population: diabetic patients without prior cardiovascular disease who are new users of pioglitazone or a sulfonylurea (two anti-diabetic drugs) and who are followed up for at least 6 months without also starting the other drug. (We found that adjusting for missing outcomes had no effect on the results in this case, so we suppress those results and ignore missingness in this example.)
• Treatment arm (in this example): pioglitazone (treatment variable A = 1).
• Comparator: sulfonylurea (A = 0).
• Outcome (Y): acute myocardial infarction (AMI) in the first 6 months of new anti-diabetic drug use.
• Baseline covariates (W): fifty covariates including demographics, comorbidities, and other drug use.

Causal model, counterfactual outcomes, and parameter of interest
• Non-parametric structural equation model: each variable is an unknown deterministic function of the past and an exogenous error.
  • W = fW(UW)
  • A = fA(W, UA)
  • Y = fY(A, W, UY)
• Counterfactual outcomes: substitute a fixed treatment for A in fY: Ya = fY(W, a, UY) for a ∈ {0, 1}.
• Causal parameter of interest: the average treatment effect (ATE), E(Y1 − Y0).
• Statistical parameter of interest: Ψ(P0) = E[E(Y | A = 1, W) − E(Y | A = 0, W)], which equals E(Y1 − Y0) under the randomization assumption ("no unmeasured confounders") and the positivity assumption.

Analysis results
• Summary of outcome by treatment:

              Treatment     Comparator      Total
  Total       2146          25022           27168
  AMI         5 (0.233%)    86 (0.3437%)    91 (0.335%)

• Estimates:

                Estimate    p-value
  Unadjusted    −0.0011     0.3943
  G-comp        −0.0007     0.6134
  PS matching   −0.0013     0.4512
  IPTW          −0.0005     0.7476
  AIPTW         −0.0003     0.8585
  TMLE          −0.0004     0.8042

• Though the sample size is large, there are so few AMIs in this subset of the Kaiser Permanente data that it is hard to tell whether adjustment for potential confounders is important.

Strategy
Simulate datasets based on the real study data where the true effect is known, to highlight properties of the estimators; a sketch of the scheme follows the list below.
• Start with the KP data set, including additional new users of three other anti-diabetic drugs.
• Sample W with replacement from the empirical distribution of the baseline covariates.
• Simulate treatment assignments A from a known function of the baseline covariates.
• Simulate the outcome Y from a function of W, adjusted so that Y is not too rare.
• Because Y is simulated from a function of the baseline covariates only, and not the treatment, the true average treatment effect is known to be zero.
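A sketch of that data-generating scheme, with made-up coefficients standing in for the functions actually used in the study: covariates are resampled from observed rows, treatment follows a known logistic model in W, and the outcome depends on W only, so the true ATE is zero and any systematic departure from zero in an estimator is bias.

```python
# Hypothetical sketch of the simulation strategy; coefficients are made up.
import numpy as np
from scipy.special import expit

def simulate_dataset(W_observed, beta_treat, beta_out, intercept_out, seed=0):
    """Resample covariates, assign treatment from a known model, generate Y from W only."""
    rng = np.random.default_rng(seed)
    n = W_observed.shape[0]
    W = W_observed[rng.integers(0, n, size=n)]     # bootstrap the empirical covariate distribution
    A = rng.binomial(1, expit(W @ beta_treat))     # known treatment mechanism
    Y = rng.binomial(1, expit(intercept_out + W @ beta_out))  # outcome ignores A: true ATE = 0
    return W, A, Y

# Illustrative usage: with coefficients like these the unadjusted difference in means
# is biased away from the true ATE of 0, because W drives both treatment and outcome.
W_obs = np.random.default_rng(2).normal(size=(5000, 3))      # stand-in for the KP covariates
W, A, Y = simulate_dataset(W_obs,
                           beta_treat=np.array([0.8, 0.4, 0.0]),
                           beta_out=np.array([0.9, 0.5, 0.2]),
                           intercept_out=-3.0)
print("unadjusted estimate:", Y[A == 1].mean() - Y[A == 0].mean())
```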
Simulation 1
• The treatment mechanism is a function of 12 covariates strongly predictive of the outcome.
• The outcome and propensity score models are known and can be correctly specified.
• Misspecified versions of the outcome and propensity score models leave out half of the important confounders.
• The results demonstrate the double robustness of TMLE and AIPTW: when either the model for the outcome regression or the PS is specified correctly, the parameter estimate is consistent, which is not the case for the G-computation estimator or IPTW.

Bias and MSE of each estimator:

                                     Bias                MSE
  Estimator                      n=1000   n=5000    n=1000   n=5000
  Unadjusted                     0.0584   0.0575    0.0038   0.0034
  G-comp                         0.0015   0.0000    0.0003   0.0001
  PSM                            0.0012   0.0003    0.0006   0.0001
  IPTW                           0.0017   0.0002    0.0005   0.0001
  AIPTW                          0.0013   0.0002    0.0004   0.0001
  TMLE                           0.0014   0.0002    0.0004   0.0001
  G-comp, misspecified           0.0183   0.0168    0.0007   0.0004
  PSM, misspecified              0.0179   0.0167    0.0008   0.0004
  IPTW, misspecified             0.0180   0.0166    0.0007   0.0004
  AIPTW, outcome misspecified    0.0016   0.0002    0.0004   0.0001
  AIPTW, PS misspecified         0.0014   0.0001    0.0004   0.0001
  TMLE, outcome misspecified     0.0015   0.0002    0.0004   0.0001
  TMLE, PS misspecified          0.0015   0.0001    0.0004   0.0001

Simulation 2
• The treatment mechanism now depends on a covariate that is very predictive of treatment, resulting in positivity violations, but that is not a confounder; the sketch below illustrates why such a variable causes trouble.
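As a small, purely hypothetical illustration of why that matters: a variable that strongly predicts treatment but not the outcome pushes propensity scores toward 0 or 1, so inverse-probability weights explode and IPTW-type estimators become unstable, while adjusting for it cannot remove confounding. This is the situation in which CTMLE's tendency to exclude instrumental variables from ĝ pays off.

```python
# Hypothetical illustration of positivity violations caused by an instrument.
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(0)
n = 10_000
W_conf = rng.normal(size=n)                 # a true confounder
W_inst = rng.normal(size=n)                 # instrument: strongly predicts treatment only
pA = expit(0.5 * W_conf + 4.0 * W_inst)     # propensity scores pushed toward 0 and 1
A = rng.binomial(1, pA)

# Inverse-probability weights based on the (known) treatment probabilities.
weights = np.where(A == 1, 1.0 / pA, 1.0 / (1.0 - pA))
print("propensity score range:", pA.min(), "to", pA.max())
print("largest IPTW weight:", weights.max())
# CTMLE tends to exclude W_inst from the fitted treatment mechanism, because
# adjusting for it increases variance without helping to estimate the target parameter.
```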