Sampling Algorithms, from Survey Sampling to Monte Carlo Methods: Tutorial and Literature Review


Benyamin Ghojogh* [email protected]
Department of Electrical and Computer Engineering, Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada

Hadi Nekoei* [email protected]
MILA (Montreal Institute for Learning Algorithms) – Quebec AI Institute, Montreal, Quebec, Canada

Aydin Ghojogh* [email protected]

Fakhri Karray [email protected]
Department of Electrical and Computer Engineering, Centre for Pattern Analysis and Machine Intelligence, University of Waterloo, Waterloo, ON, Canada

Mark Crowley [email protected]
Department of Electrical and Computer Engineering, Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada

* The first three authors contributed equally to this work.

arXiv:2011.00901v1 [stat.ME] 2 Nov 2020

Abstract

This paper is a tutorial and literature review on sampling algorithms. There are two main types of sampling in statistics. The first type is survey sampling, which draws samples from a set or population. The second type is sampling from a probability distribution, where we have a probability density or mass function. In this paper, we cover both types of sampling. First, we review some required background on mean squared error, variance, bias, maximum likelihood estimation, the Bernoulli, Binomial, and Hypergeometric distributions, the Horvitz–Thompson estimator, and the Markov property. Then, we explain the theory of simple random sampling, bootstrapping, stratified sampling, and cluster sampling. We also briefly introduce multistage sampling, network sampling, and snowball sampling. Afterwards, we switch to sampling from distributions. We explain sampling from a cumulative distribution function, Monte Carlo approximation, simple Monte Carlo methods, and Markov Chain Monte Carlo (MCMC) methods. For simple Monte Carlo methods, whose iterations are independent, we cover importance sampling and rejection sampling. For MCMC methods, we cover the Metropolis algorithm, the Metropolis-Hastings algorithm, Gibbs sampling, and slice sampling. Then, we explain the random walk behaviour of Monte Carlo methods and more efficient Monte Carlo methods, including Hamiltonian (or hybrid) Monte Carlo, Adler's overrelaxation, and ordered overrelaxation. Finally, we summarize the characteristics, pros, and cons of the sampling methods compared to each other. This paper can be useful for different fields of statistics, machine learning, reinforcement learning, and computational physics.

1. Introduction

Sampling is a fundamental task in statistics. However, this terminology is used for two different tasks in statistics. On one hand, sampling refers to survey sampling, which is selecting instances from a population or set:

    $\mathcal{D} := \{x_1, x_2, \ldots, x_N\},$   (1)

where the population size is $N := |\mathcal{D}|$. Note that some of the instances of this population may be repetitive numbers/vectors. Survey sampling draws $n$ samples from the population $\mathcal{D}$ to obtain a set of samples $\mathcal{S}$, where $n := |\mathcal{S}|$. There are several articles and books on survey sampling, such as (Barnett, 1974; Smith, 1976; Foreman, 1991; Schofield, 1996; Nassiuma, 2001; Chaudhuri & Stenger, 2005; Tillé, 2006; Mukhopadhyay, 2008; Scheaffer et al., 2011; Fuller, 2011; Tillé & Matei, 2012; Hibberts et al., 2012; Singh & Mangat, 2013; Kalton, 2020).
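As a minimal illustration of this notation (a sketch, not taken from the paper), the following Python snippet draws a sample S of size n from a small toy population D as in Eq. (1), both without replacement (as in simple random sampling) and with replacement (as used later in bootstrapping); the population values and the sample size here are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy population D of size N; some values repeat, as noted above.
D = np.array([2.0, 3.5, 3.5, 4.1, 5.0, 5.0, 6.2, 7.8, 9.0, 9.0])
N = len(D)

n = 4  # desired sample size
S_srs = rng.choice(D, size=n, replace=False)   # simple random sampling
S_boot = rng.choice(D, size=n, replace=True)   # sampling with replacement

print("population size N =", N, ", sample size n =", n)
print("sample without replacement:", S_srs)
print("sample with replacement:   ", S_boot)

Stratified, cluster, and multistage sampling, covered later in the paper, replace this single draw with draws inside strata or over groups of instances.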
Survey sampling is a field of research in statistics, with many possible future developments (Brick, 2011), especially in distributed networks and graphs (Frank, 2011a; Heckathorn & Cameron, 2017). Some of the popular methods in survey sampling are Simple Random Sampling (SRS) (Barnett, 1974), bootstrapping (Efron & Tibshirani, 1994), stratified sampling, cluster sampling (Barnett, 1974), multistage sampling (Lance & Hattori, 2016), network sampling (Frank, 2011b), and snowball sampling (Goodman, 1961).

On the other hand, sampling can refer to drawing samples from probability distributions. Usually, in real-world applications, distributions of data are complicated to sample from; for example, they can be a mixture of several distributions (Ghojogh et al., 2019a). One can approximate samples from a complicated distribution by sampling from some other simple-to-sample distribution. The sampling methods which perform this sampling approximation are referred to as Monte Carlo methods (Mackay, 1998; Bishop, 2006; Kalos & Whitlock, 2009; Hammersley, 2013; Kroese et al., 2013). Monte Carlo approximation (Kalos & Whitlock, 2009) can be used for estimating the expectation or probability of a function of data over the data distribution. Monte Carlo methods can be divided into two main categories, i.e., simple methods and Markov Chain Monte Carlo (MCMC) (MacKay, 2003). Note that Monte Carlo methods are iterative. In simple Monte Carlo methods, every iteration is independent of previous iterations and drawing samples is performed blindly. Importance sampling (Glynn & Iglehart, 1989) and rejection sampling (Casella et al., 2004; Bishop, 2006; Robert & Casella, 2013) are examples of simple Monte Carlo methods. In MCMC (Murray, 2007), however, every iteration depends on its previous iteration because the chain has the memory of the Markov property (Koller & Friedman, 2009). Some examples of MCMC are the Metropolis algorithm (Metropolis et al., 1953), the Metropolis-Hastings algorithm (Hastings, 1970), Gibbs sampling (Geman & Geman, 1984), and slice sampling (Neal, 2003; Skilling & MacKay, 2003). The Metropolis algorithms are usually slow because of their random walk behaviour (Spitzer, 2013). Some efficient methods for faster exploration of the range of data by sampling are the Hamiltonian (or hybrid) Monte Carlo method (Duane et al., 1987), Adler's overrelaxation (Adler, 1981), and ordered overrelaxation (Neal, 1998). Monte Carlo methods were originally developed in computational physics (Newman, 2013); hence, they have application in physics (Binder et al., 2012). They also have application in other fields such as finance (Glasserman, 2013) and reinforcement learning (Barto & Duff, 1994; Wang et al., 2012; Sutton & Barto, 2018).
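As a rough sketch of the simple Monte Carlo idea described above (independent iterations, no Markov chain), the following snippet approximates an expectation E[f(X)] first by averaging over samples drawn directly from the target distribution and then by importance sampling from a different proposal; the Gaussian target and proposal, the function f, and the sample size are toy assumptions chosen only for illustration and are not from the paper.

import numpy as np

rng = np.random.default_rng(1)
M = 100_000                      # number of independent Monte Carlo iterations
f = lambda x: x ** 2             # function whose expectation we approximate

def normal_pdf(x, mu, sigma):
    # Density of a univariate Gaussian, used for the importance weights.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Plain Monte Carlo: sample directly from the target p = N(1, 2^2).
x_p = rng.normal(1.0, 2.0, size=M)
mc_estimate = f(x_p).mean()      # approximates E_p[f(X)] = 1 + 4 = 5

# Importance sampling: sample from a proposal q = N(0, 3^2) and
# reweight each sample by the density ratio w(x) = p(x) / q(x).
x_q = rng.normal(0.0, 3.0, size=M)
w = normal_pdf(x_q, 1.0, 2.0) / normal_pdf(x_q, 0.0, 3.0)
is_estimate = np.mean(w * f(x_q))

print(mc_estimate, is_estimate)  # both should be close to 5

The MCMC algorithms listed above instead generate dependent samples from a Markov chain whose stationary distribution is the target.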
In this tutorial and literature review paper, we cover both areas of sampling, i.e., survey sampling and sampling from distributions using Monte Carlo methods. The remainder of this paper is organized as follows. Section 2 reviews some required background on mean squared error, variance, bias, estimation using maximum likelihood estimation, the Bernoulli, Binomial, and Hypergeometric distributions, the Horvitz–Thompson estimator, and the Markov property. We introduce, in detail, the methods of survey sampling and Monte Carlo methods in Sections 3 and 4, respectively. Finally, we provide a summary of the methods, their pros and cons, and conclusions in Section 5.

2. Background

2.1. Mean Squared Error, Variance, and Bias

The materials of this subsection are taken from our previous tutorial paper (Ghojogh & Crowley, 2019). Assume we have a variable $X$ and we estimate it. Let the random variable $\hat{X}$ denote the estimate of $X$. Let $\mathbb{E}(\cdot)$ and $\mathbb{P}(\cdot)$ denote expectation and probability, respectively. The variance of estimating this random variable is defined as:

    $\mathrm{Var}(\hat{X}) := \mathbb{E}\big[(\hat{X} - \mathbb{E}(\hat{X}))^2\big],$   (2)

which means the average deviation of $\hat{X}$ from the mean of our estimate, $\mathbb{E}(\hat{X})$, where the deviation is squared for symmetry of the difference. This variance can be restated as:

    $\mathrm{Var}(\hat{X}) = \mathbb{E}(\hat{X}^2) - (\mathbb{E}(\hat{X}))^2.$   (3)

See Appendix A for proof.

Our estimation can have a bias. The bias of our estimate is defined as:

    $\mathrm{Bias}(\hat{X}) := \mathbb{E}(\hat{X}) - X,$   (4)

which means how much the mean of our estimate deviates from the original $X$.

Definition 1 (Unbiased Estimator). If the bias of an estimator is zero, i.e., $\mathbb{E}(\hat{X}) = X$, the estimator is unbiased.

The Mean Squared Error (MSE) of our estimate, $\hat{X}$, is defined as:

    $\mathrm{MSE}(\hat{X}) := \mathbb{E}\big[(\hat{X} - X)^2\big],$   (5)

which means how much our estimate deviates from the original $X$.

The relation of MSE, variance, and bias is as follows:

    $\mathrm{MSE}(\hat{X}) = \mathrm{Var}(\hat{X}) + (\mathrm{Bias}(\hat{X}))^2.$   (6)

See Appendix A for proof.

If we have two random variables $\hat{X}$ and $\hat{Y}$, we can say:

    $\mathrm{Var}(a\hat{X} + b\hat{Y}) = a^2\,\mathrm{Var}(\hat{X}) + b^2\,\mathrm{Var}(\hat{Y}) + 2ab\,\mathrm{Cov}(\hat{X}, \hat{Y}),$   (7)

where $\mathrm{Cov}(\hat{X}, \hat{Y})$ is the covariance, defined as:

    $\mathrm{Cov}(\hat{X}, \hat{Y}) := \mathbb{E}(\hat{X}\hat{Y}) - \mathbb{E}(\hat{X})\,\mathbb{E}(\hat{Y}).$   (8)

See Appendix A for proof.

If the two random variables are independent, i.e., $X \perp\!\!\!\perp Y$, we have:

    $\mathbb{E}(\hat{X}\hat{Y}) = \mathbb{E}(\hat{X})\,\mathbb{E}(\hat{Y}) \implies \mathrm{Cov}(\hat{X}, \hat{Y}) = 0.$   (9)

See Appendix A for proof. Note that the reverse implication of Eq. (9) does not hold (we can prove this by counterexample).

We can extend Eqs. (7) and (8) to multiple random variables:

    $\mathrm{Var}\Big(\sum_{i=1}^{k} a_i X_i\Big) = \sum_{i=1}^{k} a_i^2\,\mathrm{Var}(X_i) + \sum_{i=1}^{k} \sum_{j=1,\, j \neq i}^{k} a_i a_j\,\mathrm{Cov}(X_i, X_j),$   (10)

    $\mathrm{Cov}\Big(\sum_{i=1}^{k_1} a_i X_i,\; \sum_{j=1}^{k_2} b_j Y_j\Big) = \sum_{i=1}^{k_1} \sum_{j=1}^{k_2} a_i b_j\,\mathrm{Cov}(X_i, Y_j),$   (11)

where the $a_i$'s and $b_j$'s are not random. According to Eq. (9), if the random variables are independent, Eq. (10) simplifies to:

    $\mathrm{Var}\Big(\sum_{i=1}^{k} a_i X_i\Big) = \sum_{i=1}^{k} a_i^2\,\mathrm{Var}(X_i).$   (12)

Lemma 2. The variance of the estimate of the mean is:

    $\mathrm{Var}(\hat{\mu}) = \frac{1}{N}\,\sigma^2.$   (16)

Proof. See Appendix A for proof.

Proposition 1. An unbiased estimator for the variance is:

    $\hat{\sigma}^2 = \frac{1}{N-1} \sum_{j=1}^{N} (x_j - \hat{\mu})^2.$   (17)

Proof. See Appendix A for proof.

Note that Eq. (14) is a biased estimate of the variance because its expectation is:

    $\mathbb{E}(\hat{\sigma}^2) = \mathbb{E}\Big(\frac{1}{N} \sum_{j=1}^{N} (x_j - \hat{\mu})^2\Big) = \frac{N-1}{N}\,\sigma^2.$

2.3. Bernoulli, Binomial, and Hypergeometric Distributions

The Bernoulli distribution is a discrete distribution over one and zero with probabilities $p$ and $1 - p$, respectively. Its expected value and variance are:

    $\mathbb{E}(X) = p,$   (18)

    $\mathrm{Var}(X) = p\,(1 - p),$   (19)

respectively.
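To make these quantities concrete, the following small simulation (a sketch, not part of the paper) draws repeated Bernoulli samples and numerically checks Eq. (16), the unbiased estimator of Eq. (17), the bias of the 1/N variance estimate discussed above, and Eqs. (18)-(19); the choices of p, N, and the number of repetitions are arbitrary.

import numpy as np

rng = np.random.default_rng(2)
p, N, trials = 0.3, 10, 200_000
sigma2 = p * (1 - p)                        # true Bernoulli variance, Eq. (19)

X = rng.binomial(1, p, size=(trials, N))    # trials x N matrix of Bernoulli draws
mu_hat = X.mean(axis=1)                     # estimate of the mean in each trial

print("Var(mu_hat):", mu_hat.var(), "vs sigma^2 / N =", sigma2 / N)   # Eq. (16)

var_unbiased = X.var(axis=1, ddof=1)        # 1/(N-1) estimator of Eq. (17)
var_biased = X.var(axis=1, ddof=0)          # 1/N estimator (biased)

print("mean of unbiased estimator:", var_unbiased.mean(), "vs sigma^2 =", sigma2)
print("mean of biased estimator:  ", var_biased.mean(), "vs (N-1)/N * sigma^2 =", (N - 1) / N * sigma2)

With these settings the unbiased estimator should average close to sigma^2 = 0.21, while the 1/N version should concentrate near (N-1)/N * sigma^2 = 0.189.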
Recommended publications
  • Survey Sampling
  • IBM SPSS Complex Samples Business Analytics
  • Sampling and Evaluation
  • Sampling Methods (Gatsby ML1 2017)
  • “Target Population” – Do Not Use “Sample Population.”
  • Slice Sampling for General Completely Random Measures
  • Sampling Techniques Third Edition
  • Multilevel Modeling of Complex Survey Data
  • Efficient Sampling for Gaussian Linear Regression with Arbitrary Priors
  • CHAPTER 5 Choosing the Type of Probability Sampling
  • An Overview of Primary Sampling Units (PSUs) in Multi-Stage Samples for Demographic Surveys
  • Approximate Slice Sampling for Bayesian Posterior Inference