Selected Topics in Statistical Computing

April 27, 2020 0:25 StatComputingArticle ENCYCLOPEDIA WITH SEMANTIC COMPUTING Vol. 1, No. 1 (2016) article id (17 pages) c The Authors Selected Topics in Statistical Computing Suneel Babu Chatla y , Chun-houh Chen z, and Galit Shmueliy yInstitute of Service Science, National Tsing Hua University, Hsinchu 30013, Taiwan R.O.C zInstitute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan R.O.C Received Day Month Year; Revised Day Month Year; Accepted Day Month Year; Published Day Month Year The field of computational statistics refers to statistical methods or tools that are computationally intensive. Due to the recent advances in computing power some of these methods have become prominent and central to modern data analysis. In this article we focus on several of the main methods including density estimation, kernel smoothing, smoothing splines, and additive models. While the field of computational statistics includes many more methods, this article serves as a brief introduction to selected popular topics. Keywords: Histogram, Kernel density , Local regression, Additive models, splines, MCMC, Bootstrap Introduction highlights major computational methods for estimation and - “Let the data speak for themselves” for inference. We do not aim to provide a comprehensive review of each of these methods, but rather a brief introduction. In 1962, John Tukey76 published an article on ”the fu- However, we compiled a list of references for readers inter- ture of data analysis”, which turns out to be extraordinarily ested in further information on any of these methods. For each clairvoyant. Specifically, he accorded algorithmic models as of the methods, we provide the statistical definition and prop- the same foundation status as algebraic models that statis- erties, as well as a brief illustration using an example dataset. ticians had favored at that time. More than a three decades In addition to the aforementioned topics, the twenty first later, in 1998 Jerome Friedman delivered a keynote speech23 century has witnessed tremendous growth in statistical com- in which he stressed the role of data driven or algorithmic putational methods such as functional data analysis, lasso, models in the next revolution of statistical computing. In re- and machine learning methods such as random forests, neu- sponse, the field of statistics has seen tremendous growth in ral networks, deep learning and support vector machines. Al- some research areas related to computational statistics. though most of these methods have roots in the machine According to the current Wikipedia entry on “Compu- learning field, they have become popular in the field of statis- tational Statistics”a: “Computational statistics or statistical tics as well. The recent book by Ref. 15 describes many of computing refers to the interface between statistics and com- these topics. puter science. It is the area of computational science (or sci- The article is organized as follows. In Section 1, we open entific computing) specific to the mathematical science of with nonparametric density estimation. Sections 2 and 3 dis- statistics.” Two well known examples of statistical computing cuss smoothing methods and their extensions. Specifically, methods are the bootstrap and Markov Chain Monte Carlo Section 2 focuses on kernel smoothing while Section 3 in- (MCMC). These methods are prohibitive with insufficient troduces spline smoothing. Section 4 covers additive models, computing power. While the bootstrap has gained significant and Section 5 introduces Markov chain Monte Carlo methods arXiv:2004.11816v1 [stat.ME] 24 Apr 2020 popularity both in academic research and in practical appli- (MCMC). The final Section 6 is dedicated to the two most cations its feasibility still relies on efficient computing. Sim- popular resampling methods: the bootstrap and jackknife. ilarly, MCMC, which is at the core of Bayesian analysis, is computationally very demanding. A third method which has 1. Density Estimation also become prominent in both academia and practice is nonparametric estimation. Today, nonparametric models are pop- A basic characteristic describing the behavior of any random ular data analytic tools due to their flexibility despite being variable X is its probability density function. Knowledge of very computationally intensive, and even prohibitively inten- the density function is useful in many aspects. By looking sive with large datasets. at the density function chart we can get a clear picture of In this article, we provide summarized expositions for whether the distribution is skewed, multi-modal, etc. In the some of these important methods. The choice of methods simple case of a continuous random variable X over an inter- y The corresponding author can be reached at [email protected]. The first and third authors were supported in part by grant 105-2410-H-007-034- MY3 from the Ministry of Science and Technology in Taiwan. This is an Open Access article published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution 3.0 (CC-BY) License. Further distribution of this work is permitted, provided the original work is properly cited. ahttps://en.wikipedia.org/wiki/Computational_statistics, accessed August 24, 2016 article id-1 April 27, 2020 0:25 StatComputingArticle Chatla, Chen and Shmueli ESC 1, article id (2016) val X 2 (a; b), the density is defined as idea of the accuracy and precision of the estimator. If we de- Z b fine P(a < X < b) = f (x)dx: h a B j = x0 + ( j − 1)h; x0 + jh ; j 2 Z; In most practical studies the density of X is not directly avail- able. Instead, we are given a set of n observations x1;:::; xn with x0 being the origin of the histogram, then the histogram that we assume are iid realizations of the random variable. estimator can be formally written as We then aim to estimate the density on the basis of these observations. There are two basic estimation approaches: the Xn X ˆ −1 2 2 parametric approach, which consists of representing the den- fh(x) = (nh) I(Xi B j)I(x B j): (4) sity with a finite set of parameters, and the nonparametric ap- i=1 j proach, which does not restrict the possible form of the den- We now define the bias of histogram estimator. Assume sity function by assuming it belongs to a pre-specified family that the origin of the histogram x is zero and x 2 B . Since of density functions. 0 j X are identically distributed In parametric estimation only the parameters are un- i known. Hence, the density estimation problem is equivalent n −1 X to estimating the parameters. However, in the nonparametric E[ fˆh(x)] = (nh) E[I(Xi 2 B j)] approach, one must estimate the entire distribution. This is i=1 −1 because we make no assumptions about the density function. = (nh) nE[I(X 2 B j)] Z jh = h−1 f (u)du: 1.1. Histogram ( j−1)h The oldest and most widely used density estimator is the histogram. Detailed discussions are found in Refs. 65 and 34. This last term is not equal to f (x) unless f (x) is constant in Using the definition of derivatives we can write the density in B j. For simplicity, assume f (x) = a + cx; x 2 B j and a; c 2 R. the following form: Therefore d F(x + h) − F(x) ˆ ˆ f (x) ≡ F(x) ≡ lim : (1) Bias[ fh(x)] = E[ fh(x)] − f (x) dx h!0 h Z −1 where F(x) is the cumulative distribution function of the ran- = h ( f (u) − f (x))du B j dom variable X. A natural finite sample analog of equation Z (1) is to divide the real line into K equi-sized bins with small = h−1 (a + cu − a − cx)du bin width h and replace F(x) with the empirical cumulative B j ! ! distribution function 1 = h−1hc j − h − x #fx ≤ xg 2 Fˆ(x) = i : ! ! n 1 = c j − h − x : This leads to the empirical density function estimate 2 (#fx ≤ b g − #fx ≤ b g)=n ˆ i j+1 i j Instead of slope c we may write the first derivative of the den- f (x) = ; x 2 (b j; b j+1]; 1 h sity at the midpoint ( j − 2 )h of the bin B j where (b ; b ] defines the boundaries of the jth bin and j j+1 ! ! ! ! h b − b n fx ≤ b g − fx ≤ b g 0 1 1 = j+1 j. If we define j = # i j+1 # i j Bias( fˆ (x)) = f j − h j − h − x then h 2 2 n j fˆ(x) = : (2) = O(1)O(h) nh = O(h); h ! 0: The same histogram estimate can also be obtained using maximum likelihood estimation methods. Here, we try to find When f is not linear, a Taylor expansion of f to the first order a density fˆ maximizing the likelihood in the observations reduces the problem to the linear case. Hence the bias of the n ˆ histogram is given by Πi=1 f (xi): (3) ! ! ! ! Since the above likelihood (or its logarithm) cannot be 1 0 1 Bias( fˆ (x)) = j − h − x f j − h + o(h); h ! 0: maximized directly, penalized maximum likelihood estima- h 2 2 tion can be used to obtain the histogram estimate. (5) Next, we proceed to calculate the bias, variance and MSE of the histogram estimator. These properties give us an Similarly, the variance for the histogram estimator can be cal- article id-2 April 27, 2020 0:25 StatComputingArticle Chatla, Chen and Shmueli ESC 1, article id (2016) culated as der nh ! 1; h ! 0, the histogram estimator is consistent; p Xn fˆ (x) −! f (x).

Selected Topics in Statistical Computing

Applications of Computational Statistics with Multiple Regressions

Isye 6416: Computational Statistics Lecture 1: Introduction

Factor Dynamic Analysis with STATA

The Stata Journal Publishes Reviewed Papers Together with Shorter Notes Or Comments, Regular Columns, Book Reviews, and Other Material of Interest to Stata Users

Computational Tools for Economics Using R

Robust Statistics Part 1: Introduction and Univariate Data General References

Distributed Statistical Inference for Massive Data

Review of Stata 7

A User-Friendly Computational Framework for Robust Structured Regression Using the L2 Criterion Arxiv:2010.04133V1 [Stat.CO] 8

Introduction to Computational Statistics 1 Overview 2 Course Materials

Bayesian Linear Regression Models with Flexible Error Distributions

Bayesian Linear Regression Ahmed Ali, Alan N