[.3Cm] Part IV: Statistical Inference

Introduction to Statistics Part IV: Statistical inference Achim Ahrens Anna Babloyan [email protected] [email protected] Erkal Ersoy [email protected] Heriot-Watt University, Edinburgh September 2015 Outline 1. Descriptive statistics I Sample statistics (mean, variance, percentiles) I Graphs (box plot, histogram) I Data transformations (log transformation, unit of measure) I Correlation vs. Causation 2. Probability theory I Conditional probabilities and independence I Bayes’ theorem 3. Probability distributions I Discrete and continuous probability functions I Probability density function & cumulative distribution function I Binomial, Poisson and Normal distribution I E[X] and V[X] 4. Statistical inference I Population vs. sample I Law of large numbers I Central limit theorem I Confidence intervals I Hypothesis testing and p-values 1 / 41 Introduction Recall, in the last lecture we assumed that we know the probability distribution of the random variable in question as well as the parameters of the distribution (e.g. µ and σ2 for the normal distribution). Under these assumptions we were able to obtain the probability that the random variable would take values within a particular interval (e.g. P(X≤8)). 0.5 N(0, 1) N(0, 2) 0.4 N(0, 3) 0.3 ) x ( f 0.2 0.1 0 8 6 4 2 0 2 4 6 8 − − − − x What if we don’t know µ? 2 / 41 Population vs. sample Suppose we are interested in the distribution of heights in the UK. The residents of the UK are the population; the parameter µ is the true average height of UK residents and σ2 the true variance. If we were to measure the height of all UK residents, we would conduct a census. However, measuring the height of every individual is hardly feasible, or only at an exorbitant cost. Instead, we can randomly select a sample from the population and make inferences from the sample to the population. In particular, we can use the sample statistics (e.g. sample mean and sample variance) to make inferences about the true, but unknown population parameters (µ and σ2). 3 / 41 Population vs. sample We randomly select a sample from the UK population and measure the heights of the individuals in the sample. Simple random sample A random sample is given if each individual in the population has an equal chance of being chosen. Since the draws are random, the height of the first, second, third, . nth selected individual is random, too. That is, X1, X2,..., Xn are random variables. I.I.D. Suppose we draw n items (X1, X2,..., Xn) at random from the same population. Since X1, X2,..., Xn are drawn from the same population, they are identically distributed. Furthermore, since the distribution of Xi does not depend on the distribution of Xj (for i, j = 1,..., n; i 6= j), we can say that they are independently distributed. We say that X1, X2,..., Xn are independently and identically distributed (i.i.d.). 4 / 41 Population vs. sample Now, we draw (n=10, in cm) 182 197 183 171 171 162 152 157 192 174 Given this sample, what is our best guess about µ? It’s just the sample mean. n X 1 x¯ = x = (182 + ··· + 174) = 174.1 i 10 i=1 The sample mean is an unbiased and consistent estimator of the unknown population mean µ. Unbiasedness vs. consistency To understand unbiasedness, note that the sampling distribution of x¯ is centered at µ. When we repeatedly sample (more on this in a bit), x¯ is sometimes above the true value of the parameter µ and sometimes below it. However, the key aspect here is that there is no systematic tendency to overestimate or underestimate the true parameter. This makes x¯ an unbiased estimator of the parameter µ. Unbiasedness vs. consistency An estimator is consistent if, as the sample size increases, the estimator 5 / 41 converges to the true population parameter. The Law of Large Numbers Although x¯ is rarely exactly right and varies from sample to sample, it is still a reasonable (and in fact, the best) estimate of the population mean, µ. This is because it is guaranteed to get closer to the population parameter µ as the sample size increases. Therefore, we know that if we could keep taking measurements from more subjects, eventually we would estimate the true population mean very accurately. This fact is usually referred to as the law of large numbers. It is a remarkable fact because it holds for any population. Law of Large Numbers If we randomly draw independent observations from any population with finite mean µ, the sample mean, x¯, of the observed values approaches the true mean, µ, of the population as the number of observations, n, goes to ∞. 6 / 41 LLN in Action 101 100 99 Mean of first n observations 98 97 1 5 10 50 100 500 1000 5000 10000 Number of observations, n In the diagram on the previous slide (reproduced below), we have plotted the mean of the first n observations in our (artificially 7 / 41 generated) data set of 10,000 observations. More specifically, we generated a normally distributed variable with mean, µ, 100 and standard deviation, σ, 4. Then, to obtain each plotted point, we calculated the mean of the generated figures up to each n. Population vs. sample The sample mean estimator X¯ is a function of X1,..., Xn, n 1 X X¯ = X . N i i=1 Therefore, it is a random variable, whereas the sample mean of our sample, x¯ = 174.1, is a realisation. Estimator vs. estimate An estimator is a function of random variables which represent draws from a population. Thus, the estimator is a random variable. The estimator works like a method or formula for "guessing" population parameters. An estimate, on the other hand, is the numerical value that you obtain from a specific sample. An estimate is not a random variable, it’s just a number. As any other random variable, X¯ follows a distribution. What does the distribution look like? To answer this question, we consider one of the most remarkable theorems in statistics, the central limit theorem. 8 / 41 Central limit theorem Let’s demonstrate the CLT using a simulation. We assume that Xi ∼ i.i.d. uniform(160, 180). That is, we assume that the Xi’s are uniformly distributed within the interval [160, 180]. We proceed as follows: 1. We draw n = 50 observations (x1,..., x50) from our “population” (in this case: from our uniform distribution). 2. We obtain and write down the sample mean (i.e. x¯). 3. Repeat step (1) and (2) 10,000 times. This gives us 10,000 sample means (x¯(1), x¯(2),..., x¯(10,000)). This large set of sample means should give us an idea how the theoretical distribution of X¯ looks like. 9 / 41 CLT in Action 100 repetitions .6 .5 .4 .3 Density .2 .1 0 166 168 170 172 174 Sample mean 10 / 41 CLT in Action 5000 repetitions .6 .5 .4 .3 Density .2 .1 0 166 168 170 172 174 Sample mean 10 / 41 CLT in Action 10000 repetitions .6 .5 .4 .3 Density .2 .1 0 166 168 170 172 174 Sample mean 10 / 41 Central limit theorem 10000 repetitions .6 .5 .4 .3 Density .2 .1 0 (1) (2) (10,000) The sample166 mean of x¯ 168, x¯ ,..., x¯ 170is 170.0007 and172 the standard 174 deviation is 0.8139225. Sample mean 11 / 41 The mean and the standard deviation of x¯ If x¯ is the mean of a sample of size n, which are drawn randomly from a large sample with mean µ and standard deviation σ, then the mean of √σ the sampling distribution of x¯ is µ and its standard deviation is n . More formally, the central limit theorem can be described as follows. Central limit theorem Suppose you draw n random numbers from an arbitrary (discrete or continuous) distribution with mean µ and variance σ2. If n is sufficiently large, then σ2 X¯ ∼ N µ, n ¯ 1 Pn where X = n i=1 Xi. 12 / 41 Short digression: The expected value of X¯ N 1 1 1 1 1 X¯ = X X = X + X + X + ··· + X N i N 1 N 2 N 3 N N i=1 From last lecture, we know that the expectation of a sum is the sum of the expectations and thus: N 1 1 1 1 (X¯ ) = [X X ] = [ X ] + [ X ] + ··· + [ X ] E E N i E N 1 E N 2 E N N i=1 1 1 1 = [X ] + [X ] + ··· + [X ] N E 1 N E 2 N E N 1 1 1 = µ + µ + ··· + µ N N N = µ 13 / 41 Short digression: The variance of X¯ N 1 1 1 1 1 [X¯ ] = [X X ] = [ X + X + X + ··· + X ] V V N i V N 1 N 2 N 3 N N i=1 1 1 1 = [Y ] + [Y ] + ··· + [Y ] N 2 V 1 N 2 V 2 N 2 V N 1 1 1 = σ2 + σ2 + ··· + σ2 N 2 N 2 N 2 σ2 = N This result tells us that the sample variance decreases as the sample size increases. 14 / 41 Making statistical inferences Confidence intervals Recall the following diagram from the first lecture, where we indicated that 95% of the values in a given data set tends to lie within 2 standard deviations from the mean (for normally distributed variables). 68% mean mean - one SD mean + one SD 95% mean mean - two SDs mean + two SDs We can use this observation to make statistical inferences.

[.3Cm] Part IV: Statistical Inference

Statistical Inferences Hypothesis Tests

The Focused Information Criterion in Logistic Regression to Predict Repair of Dental Restorations

What Is Statistic?

Structured Statistical Models of Inductive Reasoning

Statistical Inference: Paradigms and Controversies in Historic Perspective

Understanding Statistical Hypothesis Testing: the Logic of Statistical Inference

Introduction to Statistical Methods Lecture 1: Basic Concepts & Descriptive Statistics

Principles of Statistical Inference

DRAFT Statistical Inference and Meta-Analysis

Introduction to Biostatistics - Lecture 3: Statistical Inference for Proportions

Abstract Title Page Title: a Bayesian Perspective on Methodologies For

Fundamentals of Biostatistical Inference Fall 2019