Stratified Random Sampling for Dependent Inputs
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Latin Hypercube Sampling in Bayesian Networks
From: FLAIRS-00 Proceedings. Copyright © 2000, AAAI (www.aaai.org). All rights reserved. Latin Hypercube Sampling in Bayesian Networks Jian Cheng & Marek J. Druzdzel Decision Systems Laboratory School of Information Sciences and Intelligent Systems Program University of Pittsburgh Pittsburgh, PA 15260 ~j cheng,marek}@sis, pitt. edu Abstract of stochastic sampling algorithms is less dependent on the topology of the network and is linear in the num- We propose a scheme for producing Latin hypercube samples that can enhance any of the existing sampling ber of samples. Computation can be interrupted at any algorithms in Bayesian networks. Wetest this scheme time, yielding an anytime property of the algorithms, in combinationwith the likelihood weightingalgorithm important in time-critical applications. and showthat it can lead to a significant improvement In this paper, we investigate the advantages of in the convergence rate. While performance of sam- applying Latin hypercube sampling to existing sam- piing algorithms in general dependson the numerical pling algorithms in Bayesian networks. While Hen- properties of a network, in our experiments Latin hy- rion (1988) suggested that applying Latin hypercube percube sampling performed always better than ran- sampling can lead to modest improvements in per- dom sampling. In some cases we observed as much formance of stochastic sampling algorithms, neither as an order of magnitude improvementin convergence rates. Wediscuss practical issues related to storage re- he nor anybody else has to our knowledge studied quirements of Latin hypercube sample generation and this systematically or proposed a scheme for gener- propose a low-storage, anytime cascaded version of ating Latin hypercube samples in Bayesian networks. -
(MLHS) Method in the Estimation of a Mixed Logit Model for Vehicle Choice
Transportation Research Part B 40 (2006) 147–163 www.elsevier.com/locate/trb On the use of a Modified Latin Hypercube Sampling (MLHS) method in the estimation of a Mixed Logit Model for vehicle choice Stephane Hess a,*, Kenneth E. Train b,1, John W. Polak a,2 a Centre for Transport Studies, Imperial College London, London SW7 2AZ, United Kingdom b Department of Economics, University of California, 549 Evans Hall # 3880, Berkeley CA 94720-3880, United States Received 22 August 2003; received in revised form 5 March 2004; accepted 14 October 2004 Abstract Quasi-random number sequences have been used extensively for many years in the simulation of inte- grals that do not have a closed-form expression, such as Mixed Logit and Multinomial Probit choice prob- abilities. Halton sequences are one example of such quasi-random number sequences, and various types of Halton sequences, including standard, scrambled, and shuffled versions, have been proposed and tested in the context of travel demand modeling. In this paper, we propose an alternative to Halton sequences, based on an adapted version of Latin Hypercube Sampling. These alternative sequences, like scrambled and shuf- fled Halton sequences, avoid the undesirable correlation patterns that arise in standard Halton sequences. However, they are easier to create than scrambled or shuffled Halton sequences. They also provide more uniform coverage in each dimension than any of the Halton sequences. A detailed analysis, using a 16- dimensional Mixed Logit model for choice between alternative-fuelled vehicles in California, was conducted to compare the performance of the different types of draws. -
Efficient Experimental Design for Regularized Linear Models
Efficient Experimental Design for Regularized Linear Models C. Devon Lin Peter Chien Department of Mathematics and Statistics Department of Statistics Queen's University, Canada University of Wisconsin-Madison Xinwei Deng Department of Statistics Virginia Tech Abstract Regularized linear models, such as Lasso, have attracted great attention in statisti- cal learning and data science. However, there is sporadic work on constructing efficient data collection for regularized linear models. In this work, we propose an experimen- tal design approach, using nearly orthogonal Latin hypercube designs, to enhance the variable selection accuracy of the regularized linear models. Systematic methods for constructing such designs are presented. The effectiveness of the proposed method is illustrated with several examples. Keywords: Design of experiments; Latin hypercube design; Nearly orthogonal design; Regularization, Variable selection. 1 Introduction arXiv:2104.01673v1 [stat.ME] 4 Apr 2021 In statistical learning and data sciences, regularized linear models have attracted great at- tention across multiple disciplines (Fan, Li, and Li, 2005; Hesterberg et al., 2008; Huang, Breheny, and Ma, 2012; Heinze, Wallisch, and Dunkler, 2018). Among various regularized linear models, the Lasso is one of the most well-known techniques on the L1 regularization to achieve accurate prediction with variable selection (Tibshirani, 1996). Statistical properties and various extensions of this method have been actively studied in recent years (Tibshirani, 2011; Zhao and Yu 2006; Zhao et al., 2019; Zou and Hastie, 2015; Zou, 2016). However, 1 there is sporadic work on constructing efficient data collection for regularized linear mod- els. In this article, we study the data collection for the regularized linear model from an experimental design perspective. -
Some Large Deviations Results for Latin Hypercube Sampling
Some Large Deviations Results For Latin Hypercube Sampling Shane S. Drew Tito Homem-de-Mello Department of Industrial Engineering and Management Sciences Northwestern University Evanston, IL 60208-3119, U.S.A. Abstract Large deviations theory is a well-studied area which has shown to have numerous applica- tions. Broadly speaking, the theory deals with analytical approximations of probabilities of certain types of rare events. Moreover, the theory has recently proven instrumental in the study of complexity of approximations of stochastic optimization problems. The typical results, how- ever, assume that the underlying random variables are either i.i.d. or exhibit some form of Markovian dependence. Our interest in this paper is to study the validity of large deviations results in the context of estimators built with Latin Hypercube sampling, a well-known sampling technique for variance reduction. We show that a large deviation principle holds for Latin Hy- percube sampling for functions in one dimension and for separable multi-dimensional functions. Moreover, the upper bound of the probability of a large deviation in these cases is no higher under Latin Hypercube sampling than it is under Monte Carlo sampling. We extend the latter property to functions that are monotone in each argument. Numerical experiments illustrate the theoretical results presented in the paper. 1 Introduction Suppose we wish to calculate E[g(X)] where X = [X1;:::;Xd] is a random vector in Rd and g(¢): Rd 7! R is a measurable function. Further, suppose that the expected value is ¯nite and cannot be written in closed form or be easily calculated, but that g(X) can be easily computed for a given value of X. -
A Tail Quantile Approximation Formula for the Student T and the Symmetric Generalized Hyperbolic Distribution
A Service of Leibniz-Informationszentrum econstor Wirtschaft Leibniz Information Centre Make Your Publications Visible. zbw for Economics Schlüter, Stephan; Fischer, Matthias J. Working Paper A tail quantile approximation formula for the student t and the symmetric generalized hyperbolic distribution IWQW Discussion Papers, No. 05/2009 Provided in Cooperation with: Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics Suggested Citation: Schlüter, Stephan; Fischer, Matthias J. (2009) : A tail quantile approximation formula for the student t and the symmetric generalized hyperbolic distribution, IWQW Discussion Papers, No. 05/2009, Friedrich-Alexander-Universität Erlangen-Nürnberg, Institut für Wirtschaftspolitik und Quantitative Wirtschaftsforschung (IWQW), Nürnberg This Version is available at: http://hdl.handle.net/10419/29554 Standard-Nutzungsbedingungen: Terms of use: Die Dokumente auf EconStor dürfen zu eigenen wissenschaftlichen Documents in EconStor may be saved and copied for your Zwecken und zum Privatgebrauch gespeichert und kopiert werden. personal and scholarly purposes. Sie dürfen die Dokumente nicht für öffentliche oder kommerzielle You are not to copy documents for public or commercial Zwecke vervielfältigen, öffentlich ausstellen, öffentlich zugänglich purposes, to exhibit the documents publicly, to make them machen, vertreiben oder anderweitig nutzen. publicly available on the internet, or to distribute or otherwise use the documents in public. Sofern die Verfasser die Dokumente unter Open-Content-Lizenzen (insbesondere CC-Lizenzen) zur Verfügung gestellt haben sollten, If the documents have been made available under an Open gelten abweichend von diesen Nutzungsbedingungen die in der dort Content Licence (especially Creative Commons Licences), you genannten Lizenz gewährten Nutzungsrechte. may exercise further usage rights as specified in the indicated licence. www.econstor.eu IWQW Institut für Wirtschaftspolitik und Quantitative Wirtschaftsforschung Diskussionspapier Discussion Papers No. -
Algorithms for Operations on Probability Distributions in a Computer Algebra System
W&M ScholarWorks Dissertations, Theses, and Masters Projects Theses, Dissertations, & Master Projects 2001 Algorithms for operations on probability distributions in a computer algebra system Diane Lynn Evans College of William & Mary - Arts & Sciences Follow this and additional works at: https://scholarworks.wm.edu/etd Part of the Mathematics Commons, and the Statistics and Probability Commons Recommended Citation Evans, Diane Lynn, "Algorithms for operations on probability distributions in a computer algebra system" (2001). Dissertations, Theses, and Masters Projects. Paper 1539623382. https://dx.doi.org/doi:10.21220/s2-bath-8582 This Dissertation is brought to you for free and open access by the Theses, Dissertations, & Master Projects at W&M ScholarWorks. It has been accepted for inclusion in Dissertations, Theses, and Masters Projects by an authorized administrator of W&M ScholarWorks. For more information, please contact [email protected]. Reproduced with with permission permission of the of copyright the copyright owner. owner.Further reproductionFurther reproduction prohibited without prohibited permission. without permission. ALGORITHMS FOR OPERATIONS ON PROBABILITY DISTRIBUTIONS IN A COMPUTER ALGEBRA SYSTEM A Dissertation Presented to The Faculty of the Department of Applied Science The College of William & Mary in Virginia In Partial Fulfillment Of the Requirements for the Degree of Doctor of Philosophy by Diane Lynn Evans July 2001 Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. UMI Number: 3026405 Copyright 2001 by Evans, Diane Lynn All rights reserved. ___ ® UMI UMI Microform 3026405 Copyright 2001 by Bell & Howell Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code. -
Generating Random Samples from User-Defined Distributions
The Stata Journal (2011) 11, Number 2, pp. 299–304 Generating random samples from user-defined distributions Katar´ına Luk´acsy Central European University Budapest, Hungary lukacsy [email protected] Abstract. Generating random samples in Stata is very straightforward if the distribution drawn from is uniform or normal. With any other distribution, an inverse method can be used; but even in this case, the user is limited to the built- in functions. For any other distribution functions, their inverse must be derived analytically or numerical methods must be used if analytical derivation of the inverse function is tedious or impossible. In this article, I introduce a command that generates a random sample from any user-specified distribution function using numeric methods that make this command very generic. Keywords: st0229, rsample, random sample, user-defined distribution function, inverse method, Monte Carlo exercise 1 Introduction In statistics, a probability distribution identifies the probability of a random variable. If the random variable is discrete, it identifies the probability of each value of this vari- able. If the random variable is continuous, it defines the probability that this variable’s value falls within a particular interval. The probability distribution describes the range of possible values that a random variable can attain, further referred to as the sup- port interval, and the probability that the value of the random variable is within any (measurable) subset of that range. Random sampling refers to taking a number of independent observations from a probability distribution. Typically, the parameters of the probability distribution (of- ten referred to as true parameters) are unknown, and the aim is to retrieve them using various estimation methods on the random sample generated from this probability dis- tribution. -
Random Variables Generation
Random Variables Generation Revised version of the slides based on the book Discrete-Event Simulation: a first course L.L. Leemis & S.K. Park Section(s) 6.1, 6.2, 7.1, 7.2 c 2006 Pearson Ed., Inc. 0-13-142917-5 Discrete-Event Simulation Random Variables Generation 1/80 Introduction Monte Carlo Simulators differ from Trace Driven simulators because of the use of Random Number Generators to represent the variability that affects the behaviors of real systems. Uniformly distributed random variables are the most elementary representations that we can use in Monte Carlo simulation, but are not enough to capture the complexity of real systems. We must thus devise methods for generating instances (variates) of arbitrary random variables Properly using uniform random numbers, it is possible to obtain this result. In the sequel we will first recall some basic properties of Discrete and Continuous random variables and then we will discuss several methods to obtain their variates Discrete-Event Simulation Random Variables Generation 2/80 Basic Probability Concepts Empirical Probability, derives from performing an experiment many times n and counting the number of occurrences na of an event A The relative frequency of occurrence of event is na/n The frequency theory of probability asserts thatA the relative frequency converges as n → ∞ n Pr( )= lim a A n→∞ n Axiomatic Probability is a formal, set-theoretic approach Mathematically construct the sample space and calculate the number of events A The two are complementary! Discrete-Event Simulation Random -
On Half-Cauchy Distribution and Process
International Journal of Statistika and Mathematika, ISSN: 2277- 2790 E-ISSN: 2249-8605, Volume 3, Issue 2, 2012 pp 77-81 On Half-Cauchy Distribution and Process Elsamma Jacob1 , K Jayakumar2 1Malabar Christian College, Calicut, Kerala, INDIA. 2Department of Statistics, University of Calicut, Kerala, INDIA. Corresponding addresses: [email protected], [email protected] Research Article Abstract: A new form of half- Cauchy distribution using sin ξ cos ξ Marshall-Olkin transformation is introduced. The properties of () = − ξ, () = ξ, ≥ 0 the new distribution such as density, cumulative distribution ξ ξ function, quantiles, measure of skewness and distribution of the extremes are obtained. Time series models with half-Cauchy Remark 1.1. For the HC distribution the moments do distribution as stationary marginal distributions are not not exist. developed so far. We develop first order autoregressive process Remark 1.2. The HC distribution is infinitely divisible with the new distribution as stationary marginal distribution and (Bondesson (1987)) and self decomposable (Diedhiou the properties of the process are studied. Application of the (1998)). distribution in various fields is also discussed. Keywords: Autoregressive Process, Geometric Extreme Stable, Relationship with other distributions: Half-Cauchy Distribution, Skewness, Stationarity, Quantiles. 1. Let Y be a folded t variable with pdf given by 1. Introduction: ( ) > 0, (1.5) The half Cauchy (HC) distribution is derived from 2Γ( ) () = 1 + , ν ∈ , the standard Cauchy distribution by folding the ν ()√νπ curve on the origin so that only positive values can be observed. A continuous random variable X is said When ν = 1 , (1.5) reduces to 2 1 > 0 to have the half Cauchy distribution if its survival () = , function is given by 1 + (x)=1 − tan , > 0 (1.1) Thus, HC distribution coincides with the folded t distribution with = 1 degree of freedom. -
Quantile Estimation with Latin Hypercube Sampling
Submitted to Operations Research manuscript OPRE-2014-08-465-final Authors are encouraged to submit new papers to INFORMS journals by means of a style file template, which includes the journal title. However, use of a template does not certify that the paper has been accepted for publication in the named jour- nal. INFORMS journal templates are for the exclusive purpose of submitting to an INFORMS journal and should not be used to distribute the papers in print or online or to submit the papers to another publication. Quantile Estimation With Latin Hypercube Sampling Hui Dong Supply Chain Management and Marketing Sciences Dept., Rutgers Univ., Newark, NJ 07102, [email protected] Marvin K. Nakayama Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, [email protected] Quantiles are often used to measure risk of stochastic systems. We examine quantile estimators obtained using simulation with Latin hypercube sampling (LHS), a variance-reduction technique that efficiently extends stratified sampling to higher dimensions and produces negatively correlated outputs. We consider single- sample LHS (ssLHS), which minimizes the variance that can be obtained from LHS, and also replicated LHS (rLHS). We develop a consistent estimator of the asymptotic variance of the ssLHS quantile estimator's central limit theorem, enabling us to provide the first confidence interval (CI) for a quantile when applying ssLHS. For rLHS, we construct CIs using batching and sectioning. On average, our rLHS CIs are shorter than previous rLHS CIs and only slightly wider than the ssLHS CI. We establish the asymptotic validity of the CIs by first proving that the quantile estimators satisfy Bahadur representations, which show that the quantile estimators can be approximated by linear transformations of estimators of the cumulative distribution function (CDF). -
Handbook on Probability Distributions
R powered R-forge project Handbook on probability distributions R-forge distributions Core Team University Year 2009-2010 LATEXpowered Mac OS' TeXShop edited Contents Introduction 4 I Discrete distributions 6 1 Classic discrete distribution 7 2 Not so-common discrete distribution 27 II Continuous distributions 34 3 Finite support distribution 35 4 The Gaussian family 47 5 Exponential distribution and its extensions 56 6 Chi-squared's ditribution and related extensions 75 7 Student and related distributions 84 8 Pareto family 88 9 Logistic distribution and related extensions 108 10 Extrem Value Theory distributions 111 3 4 CONTENTS III Multivariate and generalized distributions 116 11 Generalization of common distributions 117 12 Multivariate distributions 133 13 Misc 135 Conclusion 137 Bibliography 137 A Mathematical tools 141 Introduction This guide is intended to provide a quite exhaustive (at least as I can) view on probability distri- butions. It is constructed in chapters of distribution family with a section for each distribution. Each section focuses on the tryptic: definition - estimation - application. Ultimate bibles for probability distributions are Wimmer & Altmann (1999) which lists 750 univariate discrete distributions and Johnson et al. (1994) which details continuous distributions. In the appendix, we recall the basics of probability distributions as well as \common" mathe- matical functions, cf. section A.2. And for all distribution, we use the following notations • X a random variable following a given distribution, • x a realization of this random variable, • f the density function (if it exists), • F the (cumulative) distribution function, • P (X = k) the mass probability function in k, • M the moment generating function (if it exists), • G the probability generating function (if it exists), • φ the characteristic function (if it exists), Finally all graphics are done the open source statistical software R and its numerous packages available on the Comprehensive R Archive Network (CRAN∗). -
Sampling Based on Sobol Sequences for Monte Carlo Techniques
Proceedings of Building Simulation 2011: 12th Conference of International Building Performance Simulation Association, Sydney, 14-16 November. SAMPLING BASED ON SOBOL0 SEQUENCES FOR MONTE CARLO TECHNIQUES APPLIED TO BUILDING SIMULATIONS Sebastian Burhenne1;∗, Dirk Jacob1, and Gregor P. Henze2 1Fraunhofer Institute for Solar Energy Systems, Freiburg, Germany 2University of Colorado, Boulder, USA ∗Corresponding author. E-mail address: [email protected] ABSTRACT numerically expensive when the number of analyzed Monte Carlo (MC) techniques are commonly used to parameters (k) is large as its volume increases dramat- perform uncertainty and sensitivity analyses. A key el- ically with k. ement of MC methods is the sampling of input param- In this paper different sampling techniques are ana- eters for the simulation, where the goal is to explore lyzed with respect to the estimator of the mean of the the entire input space with a reasonable sample size result and how quick this estimator converges to the (N). The sample size determines the computational true mean (i.e., expected value) with respect to the cost of the analysis since N is equal to the required sample size. Another analyzed measure of the perfor- number of simulation runs. Quasi-random (QR) se- mance of the sampling strategy is its robustness. Ro- quences such as the Sobol0 sequences are designed to bustness can be measured via the standard error of the generate a sample that is uniformly distributed over estimated mean. This is done using multiple MC sim- the unit hypercube. In this paper, sampling based ulations and analyzing their results. A way to visualize on Sobol0 sequences is compared with other standard the robustness is to compare the empirical cumulated sampling procedures with respect to typical building density functions (CDFs) of several repetitions of the simulation applications.