
Simulation and Resampling

Maribel Borrajo García and Mercedes Conde Amboage
[email protected]; [email protected]

Introduction

What is simulation?

The imitation of a real-world process or system using mathematical models and computational resources.

(Diagram: a real system, together with a mathematical model and computational power, feeds the SIMULATION, whose output leads to conclusions.)


Natural sample: set of observations from the population obtained through field work.
Artificial sample (lab sample): set of observations from the population NOT obtained through field work.

To obtain artificial samples we need to know the population.


Advantages

It is the only tool when mathematical methods are not available.
It works even when the existing mathematical models are too complex.
It allows comparison of alternative designs.

Drawbacks

It demands computational power.
For stochastic models, simulation only estimates the output, while the analytical method, if available, produces exact solutions.
Inappropriate models may lead to wrong conclusions.

Applications of simulation

Approximation of the distribution (or of characteristics such as bias and variability) of a certain estimator with Monte Carlo methods.

Comparison of two confidence intervals by approximating their coverage with Monte Carlo.

Comparison of two hypothesis tests by approximating their power functions with Monte Carlo.

A wider variety of applications builds on these ones.

What is bootstrap?

“Pull yourself up by your own bootstraps”


Bradley Efron (Stanford University, 1979). He defined the method, combining Monte Carlo ideas with problem solving in a very general way.

Peter Hall (1951-2016). He was one of the most prolific statisticians of recent times, and he devoted a huge part of his work to the bootstrap from the 1980s on.

Hypotheses on the population distribution are not required.


Part I: Simulation


Module I: Simulation of univariate distributions

Univariate simulation

Continuous variables:
The inverse transform method.
Acceptance-rejection methods.
...

Discrete variables:
The generalized inverse method, or quantile transformation method.
Truncation methods.
...

Cao, R. (2002). Introducción a la simulación y a la teoría de colas. NetBiblio.

Continuous univariate simulation

The inverse transform method

Probability Integral Transform
Any random variable can be transformed into a uniform random variable and, more importantly, vice versa.

That is, if a random variable X has density f and cumulative distribution function F, then
F(x) = ∫_{-∞}^{x} f(t) dt,
and if we set U = F(X), then U is a random variable distributed as a uniform U(0,1). Conversely, X = F⁻¹(U) has distribution function F, which is the key to random number generation.

Example: Let us consider a random variable that follows a Maxwell distribution, whose density is given by
f(x) = x e^{-x²/2} if x > 0,
and as a consequence
F(x) = ∫_{-∞}^{x} f(t) dt = ∫_{0}^{x} t e^{-t²/2} dt = 1 − e^{-x²/2} for x > 0.
In this case, it is easy to compute the inverse of this distribution function:
y = F(x) ⇔ y = 1 − e^{-x²/2} ⇔ x = √(−2 ln(1 − y)).


> n <- 10000
> u <- runif(n)
> maxw <- sqrt(-2*log(u))
> fmaxw <- function(x){
+   x*exp(-x^2/2)
+ }
> sec <- seq(0, 4, by = 0.05)
> fsec <- fmaxw(sec)
> plot(density(maxw), lwd = 2)        # estimated density
> lines(sec, fsec, col = 2, lwd = 2)  # theoretical density

(Figure: estimated vs. theoretical Maxwell density.)


Acceptance-rejection methods

Assumptions: the functional form of the density f of interest (called the target density) is known up to a multiplicative constant.

Idea: we use a simpler (to simulate) density g, called the instrumental or candidate density, to generate the random variable for which the simulation is actually done.

Constraints:
f and g have compatible supports, i.e., g(x) > 0 when f(x) > 0.
There is a constant M with f(x)/g(x) ≤ M for all x.

Algorithm
1. Generate Y ∼ g and U ∼ U(0,1).
2. Accept X = Y if U ≤ f(Y)/(M g(Y)).
3. Return to Step 1 otherwise.

> u <- runif(1)*M
> y <- randg(1)
> while (u > f(y)/g(y)){
+   u <- runif(1)*M
+   y <- randg(1)
+ }

Here, randg is a function that delivers generations from the density g.

Discrete univariate simulation

Generalized inverse method
Idea: given a random variable X with distribution function F, consider the quantile function Q(u) = inf{ xj : Σ_{i=1}^{j} pi ≥ u }. If we take a random variable U ∼ U(0,1), then the variable Q(U) has the same distribution as X. This method is also called the quantile transformation method.

Problem: computing the quantile function Q is not an easy problem in some cases, so a sequential search is needed.

Algorithm:
1. Generate U ∼ U(0,1).
2. Set I = 1 and S = p1.
3. While U > S, compute I = I + 1 and S = S + pI.
4. Return X = xI.


Example: Let us consider a discrete variable that takes values x1,..., xn with the same probability 1/n.

> gdv <- function(x, prob) {
+   i <- 1
+   Fx <- prob[1]
+   U <- runif(1)
+   while (Fx < U) {
+     i <- i + 1
+     Fx <- Fx + prob[i]
+   }
+   return(x[i])
+ }

(Figure: bar plot of the relative frequencies of the simulated values 1, ..., 5, all close to 1/5.)


Truncation methods
Use a continuous distribution similar to the discrete one:

X:  x1  x2  x3  ···  xn
p:  p1  p2  p3  ···  pn

F(x) = Σ_{xi ≤ x} pi.

We define Y ∼ G and values −∞ = a0 < a1 < a2 < ··· < a_{n−1} < an = ∞ such that
F(xi) − F(xi⁻) = pi = G(ai) − G(a_{i−1}), i = 1, ..., n.

Then P(Y ∈ [a_{i−1}, ai)) = pi. If the continuous r.v. Y is easy to simulate, we can generate values from it and transform them into values of X.


Algorithm
1. Generate Y ∼ G.
2. Find i such that a_{i−1} < Y < ai.
3. Return xi.

Cao, R. (2002). Introducción a la simulación y a la teoría de colas. NetBiblio.

Part I: Simulation

Module II: Simulation of Multidimensional Distributions

Introduction

Given a pair of variables (X, Y), let us define the distribution function as
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).

Discrete case:
F_{X,Y}(x, y) = Σ_{xi ≤ x} Σ_{yj ≤ y} P(X = xi, Y = yj) = Σ_{xi ≤ x} Σ_{yj ≤ y} pij.

Continuous case:
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) = ∫_{-∞}^{x} ∫_{-∞}^{y} f_{X,Y}(u, v) du dv.


Definition
Given a pair of continuous variables X and Y, if X and Y are independent then it follows that
f_{X,Y}(x, y) = f_X(x) · f_Y(y) for all x, y ∈ R.


Definition
Given a pair of continuous variables X and Y, it follows that
f_{X,Y}(x, y) = f_X(x) · f_{Y|X}(y|x) = f_Y(y) · f_{X|Y}(x|y),
where
f_{Y|X}(y | X = x) = f_{X,Y}(x, y) / f_X(x).

Conditional distribution methods

Idea: this method is based on the following decomposition of the density function
f(x1, x2, ..., xd) = f1(x1) · f2(x2|x1) · ... · fd(xd|x1, x2, ..., x_{d−1}),
where the conditional densities can be obtained as a function of the marginal densities:
fi(xi | x1, x2, ..., x_{i−1}) = f_{1,...,i}(x1, x2, ..., xi) / f_{1,...,i−1}(x1, x2, ..., x_{i−1}).

Algorithm:
1. Generate X1 ∼ f1.
2. From i = 2 to d, generate Xi ∼ fi(·|X1, X2, ..., X_{i−1}).
3. Return X = (X1, X2, ..., Xd).

Example: Generate 1000 values from a bivariate Gaussian distribution with covariance matrix
Σ = [ 4  1 ]
    [ 1  9 ]
using the conditional distribution method.

Theoretical ideas
Given
(X1, X2)ᵗ ∼ N2( (0, 0)ᵗ, [ σ1²  σ12 ; σ12  σ2² ] ),
then X1 ∼ N(0, σ1) and
X2 | X1 ∼ N( (σ12/σ1²) X1, √((σ1² σ2² − σ12²)/σ1²) ).

If X ∼ N(0, σ), then X + µ ∼ N(µ, σ).

> library(ks); library(mvtnorm); library(lattice)
> set.seed(1234)
> n=1000; sigma1=2; mu1=3; sigma2=3; mu2=5; sigma12=1
> sigma=matrix(c(sigma1^2,sigma12,sigma12,sigma2^2),ncol=2)
> z1=rnorm(n); z2=rnorm(n)
> y1=sigma1*z1
> y2=sigma12/sigma1^2*y1+z2*sqrt((sigma1^2*sigma2^2
+   -sigma12^2)/sigma1^2)
> x1=y1+mu1; x2=y2+mu2
>
> d <- expand.grid('x3'=seq(0,6,.05),'x4'=seq(0,10,.05))
> fhat <- kde(cbind(x1,x2), eval.points=d)
> levelplot(fhat$estimate ~ d$x3*d$x4, main='Estimated
+   density', xlab=expression(X[1]), ylab=expression(X[2]))
>
> d$dens <- dmvnorm(as.matrix(d), mean=c(mu1,mu2), sigma=sigma)
> levelplot(dens ~ x3*x4, data=d, xlab=expression(X[1]),
+   ylab=expression(X[2]), main='Real density')

(Figure: level plots of (a) the estimated density and (b) the theoretical density of the simulated bivariate Gaussian sample.)

Acceptance-rejection methods

Idea: we want to simulate a random variable X from a distribution with density f and distribution function F. We start with a random variable Y with density g such that f(x) ≤ M g(x), with M < ∞, for all x. Then, given Y = x, one accepts Y and lets X = Y with probability f(x)/(M g(x)). Otherwise, a new Y is generated, and one continues until eventual acceptance.

Algorithm:
1. Generate Y ∼ g and U ∼ U(0,1).
2. Accept X = Y if U ≤ f(Y)/(M g(Y)).
3. Return to Step 1 otherwise.

Remark! When the dimension increases, it is difficult to find the auxiliary density g, and the number of iterations of the method grows dramatically.

Example: In Cao (2002), we can find an example that shows how to generate points uniformly distributed on a d-dimensional sphere.

Cao, R. (2002). Introducción a la simulación y a la teoría de colas. NetBiblio.

Factorization of the covariance matrix

It is well known that if cov(X) = Σ, then cov(AX) = A Σ Aᵗ. Idea: simulate independent data and then transform them linearly in order to obtain the wanted covariance.

This method is used to simulate data from a multidimensional Gaussian or Student's t distribution. If X ∼ Nd(µ, Σ) and A is a matrix of dimension p × d, then AX ∼ Np(Aµ, A Σ Aᵗ).

If Z ∼ Nd(0, Id) and Σ = H Λ Hᵗ = (H Λ^{1/2})(H Λ^{1/2})ᵗ (spectral decomposition), then
Y = µ + H Λ^{1/2} Z ∼ Nd(µ, Σ).

If Z ∼ Nd(0, Id) and Σ = L Lᵗ using a Cholesky factorization, then
Y = µ + L Z ∼ Nd(µ, Σ).


Algorithm:
1. Obtain the Cholesky factorization Σ = L Lᵗ.
2. Simulate Z = (Z1, Z2, ..., Zd), a random sample of a standard Gaussian distribution.
3. Compute X = µ + L Z.
4. Repeat Steps 2 and 3 several times.

Remark! Note that the algorithm depends on the Cholesky factorization. For example, if using a Cholesky factorization we obtain Σ = Uᵗ U, then we have to use L = Uᵗ.

Example: Let us consider the functional variable
Y(x) = sin(2πx) + ε(x),
where 0 ≤ x ≤ 1 and Cov(ε(x), ε(y)) = e^{−‖x−y‖}. Obtain a sample of this variable (sample size n = 100) using 50 discretization points.

> n <- 100; p <- 50   # FUNCTIONAL DATA
> x <- seq(0, 1, length = p)
> mu <- sin(2*pi*x)
> x.dist <- as.matrix(dist(x))
> x.cov <- exp(-x.dist)
> # Cholesky factorization
> U <- chol(x.cov)
> L <- t(U)
> set.seed(1234)
> # Simulation
> z <- matrix(rnorm(n*p), nrow = p)
> y <- mu + L %*% z
> matplot(x, y, type = "l")
> lines(x, mu, lwd = 2)

(Figure: the n = 100 simulated curves with the mean function sin(2πx) overlaid.)

Simulation based on copulas

Definition
A copula is a multidimensional distribution function each of whose marginal distributions follows a uniform distribution.

This kind of function is used to build multivariate distributions based on marginal distributions.

Sklar Theorem (1959)
Let (X, Y) be a random vector with joint distribution F(·,·), and let F1(·) and F2(·) denote the corresponding marginal distributions. Then, there exists a copula C(·,·) for which
F(x, y) = C(F1(x), F2(y)), ∀x, y ∈ R.
Moreover, if F1(·) and F2(·) are continuous, then C(·,·) is unique. The converse also holds.

Nelsen, R.B. (2006). An Introduction to Copulas, 2nd ed., Springer.

Idea: If (U, V) ∼ C(·,·) (uniform marginal distributions), then
(F1⁻¹(U), F2⁻¹(V)) ∼ F(·,·).

In most cases, we know an explicit expression for Cu(v) = C2(v|u) and for its inverse Cu⁻¹(w). As a consequence, it is easy to generate values from (U, V) using the conditional distribution method.

Algorithm
1. Generate U, W ∼ U(0,1).
2. Obtain V = Cu⁻¹(W).
3. Return (F1⁻¹(U), F2⁻¹(V)).


Example: Let us consider a bidimensional random variable whose marginal distributions follow an Exp(1) and an Exp(2), respectively. Moreover, the joint distribution is determined by a Clayton copula
Cα(u, v) = max{ (u^{−α} + v^{−α} − 1)^{−1/α}, 0 },
where α ∈ [−1, +∞) \ {0}. In this case, it follows that
Cu⁻¹(w) = ( u^{−α} (w^{−α/(α+1)} − 1) + 1 )^{−1/α}.
Taking α = 2, design an R code to generate samples from this distribution.


Generate values from the copula.

> set.seed(1234)
> rcclayton <- function(alpha, n) {
+   val <- cbind(runif(n), runif(n))
+   val[, 2] <- (val[, 1]^(-alpha) *
+     (val[, 2]^(-alpha/(alpha + 1)) - 1) + 1)^(-1/alpha)
+   return(val)
+ }
> rcunif <- rcclayton(2, 10000)
> plot(rcunif, xlab = 'u', ylab = 'v')

(Figure: scatter plot of the simulated copula values (u, v).)


Generate values from the copula: another way.

> library(copula)
> clayton.cop <- claytonCopula(2, dim = 2)
> y <- rCopula(10000, clayton.cop)
> plot(y, xlab = 'u', ylab = 'v')

(Figure: scatter plot of the values simulated with the copula package.)


Generate the bidimensional distribution.

> rcexp <- cbind(qexp(rcunif[,1], 1), qexp(rcunif[,2], 2))
> plot(rcexp, xlab = 'Exp(1)', ylab = 'Exp(2)')

(Figure: scatter plot of the simulated bivariate sample with Exp(1) and Exp(2) marginals.)


Estimated density.

> library(sm)
> sm.density(rcexp)

(Figure: perspective plot of the estimated bivariate density of rcexp.)

Marginal distributions.

> hist(rcexp[,1], freq = FALSE, main = '', xlab = 'Variable Exp(1)', col = 4)
> curve(dexp(x, 1), add = TRUE, lwd = 3, col = 2)
>
> hist(rcexp[,2], freq = FALSE, main = '', xlab = 'Variable Exp(2)', col = 4)
> curve(dexp(x, 2), add = TRUE, lwd = 3, col = 2)

(Figure: histograms of the two marginals with the Exp(1) and Exp(2) densities overlaid.)

Discrete multivariate simulation

Idea: reduce the problem to simulating several one-dimensional discrete random variables:
Without dependence structure.
With a certain dependence structure.


Conditional distributions

Let X be a discrete d-dimensional random vector with joint probability function p(x1, ..., xd). We can write
p(x1, ..., xd) = p1(x1) p2(x2|x1) ... pd(xd|x1, ..., x_{d−1}).

1. Generate X1 from p1.
2. From i = 2 to d, generate Xi from pi(·|X1, ..., X_{i−1}).
3. Return (X1, ..., Xd).

Remark! It is useful to note that pi(xi|x1, ..., x_{i−1}) = pi(x1, ..., xi) / p_{i−1}(x1, ..., x_{i−1}), where pi denotes the joint probability function of the first i components.


Code-labelling methods

Build a bijective function h codifying all the possible d-tuples of the random vector, assigning a different natural number to each of them.

Example
X = (X1, X2) ∈ Z⁺ × Z⁺,
h(i, j) = (i + j)(i + j + 1)/2 + i.

Remark! The computational cost of h⁻¹ can be high.

Part I: Simulation

Module III: Applications of Simulation

Application to statistical inference

Distribution of statistics or point estimators:
Distribution approximations.
Approximation of characteristics of the distribution.
Validity of the asymptotic distribution.
Comparison of estimators.

Estimation based on confidence intervals:
Obtaining confidence intervals.
Analysis of an estimator using confidence intervals.

Hypothesis tests:
Approximation of the p-value.
Analysis of the hypothesis test.

Bootstrap methods.

Example: Given {X1, ..., Xn} a random sample of a variable X ∼ N(µ, σ), the sampling distribution of the sample mean is
µ̂ = X̄ = (1/n) Σ_{i=1}^{n} Xi ∼ N(µ, σ/√n).
Check this result using a simulation study.

> set.seed(1234); nsim <- 500; nx <- 10; mux <- 1; sdx <- 2
> samples <- as.data.frame(matrix(rnorm(nsim*nx, mean=mux,
+   sd=sdx), ncol=nx))
> rownames(samples) <- paste('sample', 1:nsim, sep='')
> colnames(samples) <- paste('obs', 1:nx, sep='')
> samples$mean <- rowMeans(samples[,1:nx])
> samples$sd <- apply(samples[,1:nx], 1, sd)
> hist(samples$mean, freq=FALSE, breaks='FD', main='Sample mean
+   distribution', xlab='Mean values', ylab='Density', col=4)
> lines(density(samples$mean), lwd=3, col=2)
> curve(dnorm(x, mux, sdx/sqrt(nx)), lwd=3, col='green', add=TRUE)

(Figure: histogram of the 500 sample means with the estimated and theoretical densities overlaid.)

Monte Carlo Integration

Monte Carlo integration is designed to compute multidimensional integrals of the form
I = ∫ ··· ∫ h(x1, ..., xn) dx1 ... dxn.

Example: We are interested in computing
I = ∫_a^b h(x) dx = (b − a) ∫_a^b h(x) · 1/(b − a) dx.

If x1, x2, ..., xn represents a random sample from a U(a, b), then
I = E[h(U(a, b))] · (b − a) ≈ (1/n) Σ_{i=1}^{n} h(xi) · (b − a).

Example: Estimate the integral ∫_0^1 4x³ dx = 1.

> mc.integral <- function(fun, a, b, n) {
+   x <- runif(n, a, b)
+   fx <- sapply(x, fun)
+   return(mean(fx) * (b - a))
+ }
> fun <- function(x) ifelse((x > 0) & (x < 1), 4*x^3, 0)
> set.seed(1234)
> mc.integral(fun, 0, 1, 100)
[1] 0.760426
> mc.integral(fun, 0, 1, 1000)
[1] 1.057621
> mc.integral(fun, 0, 1, 10000)
[1] 0.9964717

(Figure: the Monte Carlo approximation of the integral as a function of n, converging to 1.)

Importance sampling

Given X ∼ f, we are interested in computing
θ = E(h(X)) = ∫ h(x) f(x) dx.
Then, if x1, x2, ..., xn represents a random sample of X,
θ̂ ≈ (1/n) Σ_{i=1}^{n} h(xi).

Idea
In order to approximate the integral θ = E(h(X)), it can sometimes be simpler to generate observations from a density g that is similar to the product h·f. That is, if Y ∼ g,
θ = ∫ h(x) f(x) dx = ∫ [h(x) f(x)/g(x)] g(x) dx = E(q(Y)),
where q(x) = h(x) f(x)/g(x).

Procedure

Given Y1, Y2, ..., Yn a random sample of Y ∼ g, then
θ ≈ (1/n) Σ_{i=1}^{n} q(Yi) = (1/n) Σ_{i=1}^{n} w(Yi) h(Yi) = θ̂g,
where w(x) = f(x)/g(x).

In this case Var(θ̂g) = Var(q(Y))/n, and this can be reduced significantly with respect to the classical method if
g(x) ∝ h(x) f(x) (approximately).

How to select the density g?

Condition 1: The variance of the estimator θ̂g should be finite so that the Central Limit Theorem can be applied, that is,
E(q²(Y)) = ∫ h²(x) f²(x)/g(x) dx < ∞.

Condition 2: The density g should have heavier tails than the density f, because this leads to estimators with finite variance.

Condition 3: The distribution of the weights w(Yi) should be homogeneous in order to avoid influential observations.


Condition 4: If f and/or g are quasi-densities, then the following approximation is used,
θ ≈ Σ_{i=1}^{n} w(Yi) h(Yi) / Σ_{i=1}^{n} w(Yi),
in order to avoid the normalizing constants.

Condition 5: If we resample values from {Y1, Y2, ..., Yn} with probabilities proportional to the weights w(Yi), then we obtain an approximate sample from the density f.

Rubin, D. B. (1987). The calculation of posterior distributions by data augmentation: Comment: A noniterative sampling/importance resampling alternative to the data augmentation algorithm for creating a few imputations when fractions of missing information are modest: The SIR algorithm. Journal of the American Statistical Association, 82(398), 543-546.

Example: Generate 1000 values from a standard Gaussian distribution using importance sampling methods, based on 10^5 values from a Cauchy(0,1) distribution.

> nsim <- 10^3
> nsim2 <- 10^5   # nsim2 should be bigger than nsim
> set.seed(1234)
> y <- rcauchy(nsim2)
> w <- dnorm(y)/dcauchy(y)
>
> rx <- sample(y, nsim, prob = w/sum(w))
>
> hist(rx, freq = FALSE, col = 'gray', xlab = '',
+   main = 'Histogram of the simulated data')
> curve(dnorm, add = TRUE, lwd = 3, col = 2)

(Figure: histogram of the simulated data with the standard Gaussian density overlaid.)

Monte Carlo Optimization

Goal: we are interested in the following optimization problem:
min_{x ∈ D} f(x).

There exist several methods in the literature to solve this kind of problem, like Newton-Raphson methods.

Original idea: we focus on looking for zeros of the first derivative of f using an iterative approximation
x_{i+1} = xi − [Hf(xi)]⁻¹ ∇f(xi),
where Hf(xi) represents the Hessian matrix (second derivatives) and ∇f(xi) the gradient vector (first derivatives).


Problem: these methods work really well when the objective function has no local minima. In other cases (e.g. maximum likelihood estimation, where the objective function can be multimodal), the choice of the initial point is crucial.

Alternative idea: we can generate random values so that regions where the objective function is low receive high probability, and regions where it is high receive low probability:

Methods using random gradient.
Simulated annealing.
Genetic algorithms.
...

Simulated annealing (SA)

SA is based on physical/chemical processes referred to as tempering certain alloys of metal, glass, or crystal by heating above its melting point, holding its temperature, and then cooling it very slowly until it solidifies into a perfect crystalline structure.

SA is basically composed of two stochastic processes: one process for the generation of solutions and the other for the acceptance of solutions. The generation temperature is responsible for the correlation between generated probing solutions and the original solution.

SA is a descent algorithm modified by random ascent moves in order to escape local minima which are not global minima. The annealing algorithm simulates a nonstationary finite state Markov chain whose state space is the domain of the cost function to be minimized. Importance sampling is the main principle that underlies SA.


Algorithm
1. temp = init-temp;
2. place = init-placement;
3. while (temp > final-temp) do
4.   while (inner loop criterion = FALSE) do
5.     new_place = PERTURB(place, temp);
6.     ΔC = COST(new_place) - COST(place);
7.     if (ΔC < 0) then
8.       place = new_place;
9.     else if (RANDOM(0,1) < e^(-ΔC/temp)) then
10.      place = new_place;
11.  temp = SCHEDULE(temp);

Robert, C. P., and Casella, G. (1999). The Metropolis-Hastings Algorithm. In Monte Carlo Statistical Methods (pp. 231-283). Springer, New York, NY.

http://www.ecs.umass.edu/ece/labs/vlsicad/ece665/slides/SimulatedAnnealing.ppt

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 57 > optim(50, fw, method = "SANN", + control = list(maxit = 20000, + temp = 20, parscale = 20)) $par $value [1] -15.81429 [1] 67.47401

$counts $convergence function gradient [1] 0 20000 NA

$message NULL

> fw <- function (x){ + 10*sin(0.3*x)*sin(1.3*xˆ2) + + 0.00001*xˆ4 + 0.2*x+80 + }

Simulated annealing (SA)

Example: Find the minimum of the function f (x) = 10 sin(0.3x) sin(1.3x 2) + 0.00001x 4 + 0.2x + 80.

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 58 > fw <- function (x){ + 10*sin(0.3*x)*sin(1.3*xˆ2) + + 0.00001*xˆ4 + 0.2*x+80 + } > optim(50, fw, method = "SANN", + control = list(maxit = 20000, + temp = 20, parscale = 20)) $par $value [1] -15.81429 [1] 67.47401

$counts $convergence function gradient [1] 0 20000 NA

$message NULL

Simulated annealing (SA)

Example: Find the minimum of the function f (x) = 10 sin(0.3x) sin(1.3x 2) + 0.00001x 4 + 0.2x + 80. f(x) 80 100 120 140 160

−40 −20 0 20 40

x

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 58 Simulated annealing (SA)

Example: Find the minimum of the function f (x) = 10 sin(0.3x) sin(1.3x 2) + 0.00001x 4 + 0.2x + 80. > fw <- function (x){ + 10*sin(0.3*x)*sin(1.3*xˆ2) + + 0.00001*xˆ4 + 0.2*x+80 + } > optim(50, fw, method = "SANN", + control = list(maxit = 20000, + temp = 20, parscale = 20)) f(x) $par $value [1] -15.81429 [1] 67.47401

$counts $convergence 80 100 120 140 160 function gradient [1] 0 (−15.96 , 67.50) 20000 NA −40 −20 0 20 40 x $message NULL Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 58 Genetic algorithms

This algorithm reflects the process of natural selection, where the fittest individuals are selected for reproduction in order to produce the offspring of the next generation.

The population is {x1, ..., xn}.
An individual is characterized by a set of parameters (variables) known as Genes. Genes are joined into a string to form a Chromosome (solution).
The fitness function determines how fit an individual is (the ability of an individual to compete with other individuals). It gives a fitness score to each individual.
The probability that an individual will be selected for reproduction is based on its fitness score.
The idea of the selection phase is to select the fittest individuals and let them pass their genes to the next generation. Two pairs of individuals (parents) are selected based on their fitness scores. Individuals with high fitness have more chance to be selected for reproduction.


Crossover is the most significant phase in a genetic algorithm. For each pair of parents to be mated, a crossover point is chosen at random from within the genes.
In certain new offspring formed, some of their genes can be subjected to a mutation with a low random probability. This implies that some of the bits in the bit string can be flipped.
The algorithm terminates if the population has converged (it does not produce offspring significantly different from the previous generation). Then it is said that the genetic algorithm has provided a set of solutions to our problem.

Remark! The R packages DEoptim and gafit implement this kind of method.


Algorithm:
1. START.
2. Generate the initial population.
3. Compute fitness.
4. REPEAT.
5. Selection.
6. Crossover.
7. Mutation.
8. Compute fitness.
9. UNTIL population has converged.
10. STOP.

Mitchell, M. (1998). An Introduction to Genetic Algorithms. The MIT Press. ISBN 0262631857.

Example: Find the minimum of the function f(x) = cos(x) + sin(x) on the interval [0, 6].

> library(gafit)
> e <- expression(cos(theta) + sin(theta))
> guess.1 <- list(theta = 3)
> guess.2 <- gafit(e, guess.1, step = 1e-3)
> guess.2
$theta
[1] 3.927359
> gafit(e, guess.2, step = 1e-5)
$theta
[1] 3.926993

(Figure: plot of f(x) on [0, 6]; the minimum is at approximately (3.90, −1.41).)

Part I: Simulation

Module IV: Variance reduction techniques

Variance reduction

These techniques are used when we want to offer the most precise answers possible with low computational cost.

Never use these techniques if we are trying to estimate the variability.

There are a lot of different procedures; we will detail the following:

Common random numbers.

Antithetic variables.

Stratified sampling.

Control variables.

Conditioning.

Var reduction - Common random numbers

Idea: we have two different procedures to perform the same tasks and we want to compare them, for example, in terms of mean time.


Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 65 θ = E[TOM] − E[TNM] ⇒ θb = T OM − T NM

Var reduction - Common random numbers

Obective: we have changed the production system at a certain point and we want to compare the new method with the old one, for example, in terms of mean time.

TNM time for processing one product with the new method.

TOM time for processing one product with the old method. Both are random variables depending on several elements. For simplification: imagine the final time depends on the arriving times of the products to process.

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 66 Var reduction - Common random numbers

Obective: we have changed the production system at a certain point and we want to compare the new method with the old one, for example, in terms of mean time.

TNM time for processing one product with the new method.

TOM time for processing one product with the old method. Both are random variables depending on several elements. For simplification: imagine the final time depends on the arriving times of the products to process.

θ = E[TOM] − E[TNM] ⇒ θb = T OM − T NM


Shall we generate the two vectors of arrival times of the products independently or not?

Let us compute the variance of the statistic of interest:
Var(θ̂) = Var(T̄OM) + Var(T̄NM) − 2 Cov(T̄OM, T̄NM).

If we generate the sequences independently, then Cov(T̄OM, T̄NM) = 0, and
Var(T̄OM − T̄NM) = Var(T̄OM) + Var(T̄NM).

If we generate the sequences with Cov(T̄OM, T̄NM) > 0, then
Var(θ̂)_dependence ≤ Var(θ̂)_independence.


> nsim <- 1000  # number of simulations
> N <- 10       # number of products to be processed
> # New method: the processing times are exponential with rate=1
> u <- runif(N); t1 <- qexp(u, rate=1)
> # Old method: the processing times are exponential with rate=0.8
> v <- runif(N); t2 <- qexp(v, rate=0.8); theta <- mean(t2)-mean(t1)
> # Simulate NOT using common random numbers
> theta.not <- numeric(nsim)
> for(i in 1:nsim){
+   u <- runif(N); t1 <- qexp(u, rate=1)
+   v <- runif(N); t2 <- qexp(v, rate=0.8)
+   theta.not[i] <- mean(t2)-mean(t1)}
> # Simulate using common random numbers
> theta.yes <- numeric(nsim)
> for(i in 1:nsim){
+   u <- runif(N)
+   t1 <- qexp(u, rate=1); t2 <- qexp(u, rate=0.8)
+   theta.yes[i] <- mean(t2)-mean(t1)}
> # Summary of the results
> m <- c(mean(theta.not), mean(theta.yes))
> d <- c(sd(theta.not), sd(theta.yes))
> res <- cbind(m, d); colnames(res) <- c('Mean','Std Dev')
> rownames(res) <- c('Not CRN','CRN')
> res
          Mean Std Dev
Not CRN 0.2447  0.5150
CRN     0.2479  0.0774

(Figure: histograms of theta, classical MC vs. MC with common random numbers.)

Variance reduction - Antithetic variables

Problem: we want to study a quantity θ = h(X1, ..., Xn) of our population, distributed following F.

Initial idea: we simulate a sample from our population (which needs to be known) using one of the simulation methods previously explained, for example, the inverse method:
1. Generate U1 ∼ U(0,1).
2. Compute X1 = F⁻¹(U1).
3. Return X1.
4. Repeat Steps 1-3 M times.

Introduction of antithetic variables: to reduce the variance, we can generate a sample of twice the size of the previous one in the following way:
1. Generate U1 ∼ U(0,1).
2. Compute X1 = F⁻¹(U1).
3. Compute X2 = F⁻¹(1 − U1) instead of using a new U2 ∼ U(0,1).
4. Return X1 and X2.
5. Repeat Steps 1-4 M/2 times.

Justification: we want to estimate by simulation a quantity θ = h(X1, ..., Xn), for example, the sample mean.

We generate n pairs (X1, Y1), ..., (Xn, Yn) from X ∼ F and Y ∼ F.

The combined estimator θ̂ = (X̄ + Ȳ)/2 is our proposed statistic, where
Var(θ̂) = (1/4) [Var(X̄) + Var(Ȳ) + 2 Cov(X̄, Ȳ)] = (σ²/2n) (1 + ρ(X̄, Ȳ)),
with ρ denoting the correlation.

Then, with a negative correlation or covariance between X̄ and Ȳ, we obtain a variance reduction of −100ρ(X̄, Ȳ)% using a sample of size 2n with antithetic variables instead of a sample of size 2n generated in the classical way.

Example: dice toss > nsim<-5000 #number of simulations #Naive calculation > toss <- runif(nsim) < 1/6 > mu=mean(toss);SE=sd(toss) > print(c(Mean=mu,SE=SE)) Mean SE 0.1660000 0.3721179 #Antithetic variates > unif.vec<-runif(nsim/2) #Normal process > x 1<- unif.vec<1/6 #Antithetic process > x 2<- (1-unif.vec)<1/6 > toss.av <- (x 1 + x 2)/2 > mu=mean(toss.av);SE=sd(toss.av) > print(c(Mean=mu,SE=SE)) Mean SE 0.1730000 0.2378942


Exercise
Use this technique to approximate I = ∫_a^b g(x) dx with Monte Carlo, and apply it to approximate, for example, E(e^{U(0,2)}) = ∫_0^2 (1/2) e^x dx.

Variance reduction - Stratified sampling

Idea: divide the population into strata and obtain our sample with a number of observations in each stratum proportional to its probability. We need to ensure that the whole domain is covered.

Problem: we want to estimate by simulation a quantity θ = E(X), and suppose there is a discrete r.v. Y taking values yi with probabilities pi, i = 1, ..., k, such that:

- pi = P(Y = yi), i = 1, ..., k, are known;

- for each i we can simulate X | Y = yi.

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 74 Var(X) = E(Var(X|Y )) + Var(E(X|Y ))

⇓ 1 Var(X) − Var(ε ) = Var(E(X|Y ) X n

Variance reduction - Stratifying sampling

Simple random sampling Stratifying sampling

E(X). Pk E(X) = i=1 E(X|Y = yi )pi . 1 Pn ε = Pk X p X = n i=1 Xi . X i=1 i i with X i is the average of npi values of X generated conditional on Y = yi . Pk 2 1 Var(εX ) = i=1 pi Var(X i )) = Var(X) = n Var(X). 1 n E(Var(X|Y )).

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 75 Variance reduction - Stratifying sampling

Simple random sampling Stratifying sampling

E(X). Pk E(X) = i=1 E(X|Y = yi )pi . 1 Pn ε = Pk X p X = n i=1 Xi . X i=1 i i with X i is the average of npi values of X generated conditional on Y = yi . Pk 2 1 Var(εX ) = i=1 pi Var(X i )) = Var(X) = n Var(X). 1 n E(Var(X|Y )).

Var(X) = E(Var(X|Y )) + Var(E(X|Y ))

⇓ 1 Var(X) − Var(ε ) = Var(E(X|Y ) X n

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 75 Variance reduction - Stratifying sampling

Example
We want to approximate the theoretical mean of X ∼ Exp(1) with a sample of size n = 10 using 3 strata: 40% of the smallest values, 50% of the intermediate ones and 10% of the highest.

(Figure: density of Exp(1) with the three strata marked.)

Recall: how to simulate Exp(1)
1. U ∼ U(0,1).
2. X = −ln(U).

Step 1: use stratified sampling
1. Simulate 4 values from U[0.6, 1) for the first stratum.
2. Simulate 5 values from U[0.1, 0.6) for the second stratum.
3. Simulate 1 value from U[0, 0.1) for the third stratum.

Step 2: algorithm
For i = 1, ..., 10:
1. Generate Ui:
  1a. U ∼ U(0,1).
  1b. If i ≤ 4, compute Ui = 0.4U + 0.6.
  1c. If 4 < i ≤ 9, compute Ui = 0.5U + 0.1.
  1d. If i = 10, compute Ui = 0.1U.
2. Xi = −ln(Ui).

Var(X̄) = (1/10²) Σ_{i=1}^{10} Var(Xi) = 0.022338, smaller than the variance under simple random sampling, Var(X)/n = 1/10.

Variance reduction - Control variables

We want to estimate by simulation a quantity θ = E(X). Suppose we have another variable Y whose mean µ_Y is known.

T = X + c(Y − µ_Y) is an unbiased estimator of θ.

Which value of c is best? Minimising
$$\min_c \mathrm{Var}\big(X + c(Y - \mu_Y)\big) = \min_c \big\{\mathrm{Var}(X) + c^2\,\mathrm{Var}(Y) + 2c\,\mathrm{Cov}(X, Y)\big\}$$
gives
$$c^* = \frac{-\mathrm{Cov}(X, Y)}{\mathrm{Var}(Y)} \;\Longrightarrow\; \mathrm{Var}(T) = \mathrm{Var}(X)\big(1 - \rho^2(X, Y)\big).$$

Y is called a control variable, and using it provides a variance reduction of 100 ρ²(X, Y)% (not known in advance; it is computed through simulation).

How do we compute c in practice? Fit a linear regression model of X over Y with the simulated data (X_i, Y_i), obtaining the fit $\hat{x} = \hat{\beta}_0 + \hat{\beta}_1 y$. Since $\hat{\beta}_1$ estimates $\mathrm{Cov}(X, Y)/\mathrm{Var}(Y) = -c^*$, it follows that
$$c^* = -\hat{\beta}_1.$$

Additionally, to approximate θ we can proceed as follows:
$$\hat{\theta} = \bar{T} = \bar{X} - \hat{\beta}_1(\bar{Y} - \mu_Y) = \hat{\beta}_0 + \hat{\beta}_1 \mu_Y.$$

Approximate the mean of e^{U(0,2)} using control variables

> a <- 0; b <- 2
> mean.theo <- (exp(b)-exp(a))/(b-a) #theoretical mean
> mean.theo
[1] 3.194528
#Classical Monte Carlo approximation
> set.seed(54321); nunif <- 10000; u <- runif(nunif, a, b)
> expu <- exp(u)
> theta1 <- mean(expu); theta1
[1] 3.200529
#With control variable (Y = U, whose mean is 1)
> mod <- lm(expu ~ u)
> mod.coef <- mod$coef
> theta2 <- mod.coef[1] + mod.coef[2]*1; theta2
[1] 3.193739
#Estimation of the variance reduction
> expuc <- expu - mod.coef[2]*(u-1)
> 100*(var(expu)-var(expuc))/var(expu)
[1] 93.8147

Approximate the mean of e^{U(0,2)} using control variables (continued)

#Simulation proof of the variance reduction
> nsim <- 1000
> theta1 <- numeric(nsim); theta2 <- numeric(nsim)
> for(i in 1:nsim){
+   u <- runif(nunif, a, b); expu <- exp(u)
+   theta1[i] <- mean(expu)
+   mod <- lm(expu ~ u); mod.coef <- mod$coef
+   theta2[i] <- mod.coef[1] + mod.coef[2]*1}
#Std dev of the approximation without control variable
> sd(theta1)
[1] 0.01683754
#Std dev of the approximation with control variable
> sd(theta2)
[1] 0.00438511

Variance reduction - Conditioning

We want to estimate by simulation a quantity θ = E(X). We select a random variable Y such that E(X|Y) takes a value that can be determined by simulation.

Justification:
$$\mathrm{Var}(X) = \mathrm{E}\big(\mathrm{Var}(X|Y)\big) + \mathrm{Var}\big(\mathrm{E}(X|Y)\big) \;\Longrightarrow\; \mathrm{Var}(X) \ge \mathrm{Var}\big(\mathrm{E}(X|Y)\big).$$

Hence E(X|Y) is an unbiased estimator of θ with smaller variance.

Example: Approximate the value of π using the variable
$$X = \begin{cases} 1 & \text{if } V_1^2 + V_2^2 \le 1, \\ 0 & \text{otherwise,} \end{cases}$$
where V_1 and V_2 follow a U(−1, 1).

First of all we need to realise that E(X) = π/4, so we are interested in estimating the mean of X.

Option 1: traditional Monte Carlo approximation
1. Generate X_i ∼ X.
2. Repeat Step 1 M times.
3. Approximate the mean of our variable by (X_1 + ... + X_M)/M.

Option 2: Monte Carlo approximation with conditioning
We can derive that E(X|V_1) = (1 − V_1²)^{1/2}, and this estimator verifies E((1 − V_1²)^{1/2}) = π/4. Moreover,
$$\mathrm{Var}(X) = \frac{\pi}{4}\left(1 - \frac{\pi}{4}\right) \simeq 0.1686, \qquad \mathrm{Var}\big((1 - V_1^2)^{1/2}\big) = \frac{2}{3} - \left(\frac{\pi}{4}\right)^2 \simeq 0.0498.$$

1. Generate V_i ∼ V_1.
2. Compute T_i = (1 − V_i²)^{1/2}.
3. Repeat Step 1 and Step 2 M times.
4. Approximate the mean of X by (T_1 + ... + T_M)/M.

Example of π approximation

#Traditional MC approximation
> M=10000
> v1=runif(M,-1,1)
> v2=runif(M,-1,1)
> aux=v1^2+v2^2
> aux[aux<=1]=1 #indicator of V1^2+V2^2<=1 (this order matters)
> aux[aux>1]=0
> X=mean(aux)*4; X
[1] 3.1348
#MC approximation with conditioning
> T=(1-v1^2)^0.5
> mean(T)*4
[1] 3.132865

#Proof through simulation of the variance reduction
> nsim<-10000
> X<-numeric(nsim); T<-numeric(nsim)
> for (i in 1:nsim){
+   v1=runif(M,-1,1); v2=runif(M,-1,1)
+   aux=v1^2+v2^2
+   aux[aux<=1]=1; aux[aux>1]=0
+   X[i]=mean(aux)*4
+   T[i]=mean((1-v1^2)^0.5)*4}
#Std dev of the classical MC approximation
> sd(X)
[1] 0.01655588
#Std dev of the approximation with conditioning
> sd(T)
[1] 0.008933938

Summary of R functions for simulation

Discrete distributions
rbinom(n,size,prob)    #Ber(prob) or Bi(size,prob)
rnbinom(n,size,prob)   #NB(size,prob) or Ge(prob)
rpois(n,lambda)        #Pois(λ)
rmultinom(n,size,prob) #Multinom(size,prob)

Continuous distributions
runif(n,min,max)                      #U(min,max)
rnorm(n,mu,sigma)                     #N(µ,σ)
rexp(n,rate)                          #Exp(rate), with mean 1/rate
rgamma(n,shape,rate=1,scale=1/rate)   #Gamma(shape,scale)
rbeta(n,shape1,shape2)                #Be(shape1,shape2)
rweibull(n,shape,scale)               #We(shape,scale)
library(mvtnorm); rmvnorm(n,mu,sigma) #N_d(µ, Σ)
library(mnormt); rmt(n,mean,S,df)     #multivariate t-Student, with covariance df·S/(df − 2)

Part II: Resampling

Module I: Introduction to the bootstrap. Uniform bootstrap

Introduction to the bootstrap

Definition
A statistical procedure that tries to model a population through resampling from just one available sample.

Scenario
Given a sample (X_1, ..., X_n), we want to analyse different characteristics of a statistic T ≡ T(X_1, ..., X_n).

Real world
- X = (X_1, ..., X_n) is an independent random sample with distribution F.
- We are interested in making inference on θ = θ(F).
- We want to know the distribution of T(X).
- Sometimes this distribution can be computed exactly; otherwise, we can only approximate it when n → ∞.

Bootstrap world
- The theoretical (unknown) distribution is replaced by an estimate F̂.
- Conditional on the original sample, we obtain resamples X* = (X_1*, ..., X_n*).
- We consider the distribution of T* = T(X*).
- We approximate the distribution of T by the distribution of T*.
- Rarely is the bootstrap distribution computed exactly; it is usually approximated using Monte Carlo methods.

General algorithm
1. Select a sample X = (X_1, ..., X_n).
2. Construct an estimator F̂ of the distribution function (many possibilities).
3. Draw a sample X* = (X_1*, ..., X_n*) from F̂ (resample).
4. Calculate the statistic of interest, T* = T(X_1*, ..., X_n*).
5. Compute the exact distribution of the statistic in the bootstrap world; or
5*. Repeat Step 3 and Step 4 B times, with B as large as possible (B = 500, 1000, ...).
6. Analyse the distribution of the statistic of interest in the bootstrap world, T*(1), ..., T*(B).

Advantages
- No hypothesis on the unknown distribution (unknown population) is required.
- Simplicity.
- Adaptability.
- Reduction of field work and costs.

Drawbacks
- Based on independent samples.
- Computational cost.
- Computers introduce rounding error.
- Large amounts of resampled data need to be generated.

A precedent to the bootstrap: the Jackknife
First introduced by Quenouille in 1949 and named by Tukey in 1958. It was used not to approximate the distribution of a statistic but some of its characteristics, mainly bias and variance. The resamples are obtained from the original sample by leaving one value out. In the Jackknife framework there are n different resamples, while in the bootstrap there are $\binom{2n-1}{n}$ distinct ones.

Tukey, J. (1958) Bias and confidence in not quite large samples. Annals of Mathematical Statistics, 29, 614.

Quenouille, M. (1949) Approximate tests of correlation in time-series. Journal of the Royal Statistical Society: Series B, 11, 68-84.
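A short R sketch of the jackknife bias and variance estimates (our own illustration; the data, seed and choice of statistic are hypothetical):

#Jackknife bias and variance of a statistic (here the sample variance)
set.seed(1)
x <- rnorm(20); n <- length(x)
theta.hat <- var(x)
theta.i <- sapply(1:n, function(i) var(x[-i])) #the n leave-one-out replicates
bias.jack <- (n-1)*(mean(theta.i) - theta.hat)
var.jack <- ((n-1)/n)*sum((theta.i - mean(theta.i))^2)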

Uniform bootstrap (naïve bootstrap)

F̂ is taken to be the empirical distribution function,
$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} 1_{\{X_i \le x\}}.$$

Algorithm
1. For every i = 1, ..., n draw X_i* from F_n, i.e., P(X_i* = X_j) = 1/n, j = 1, ..., n.
2. Obtain the resample X* = (X_1*, ..., X_n*).
3. Compute the statistic of interest in the bootstrap world, T* = T(X_1*, ..., X_n*).

Remark! In Step 1, we need to simulate a univariate discrete distribution. There are many possibilities; an efficient one (due to the equiprobability of the values) is the quantile transformation with direct search. In this case, the step can be rewritten as
1. For every i = 1, ..., n draw U_i ∼ U(0, 1) and compute X_i* = X_{⌈nU_i⌉}.
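A minimal R sketch of the remark's quantile transformation (our own illustration with hypothetical data); it produces resamples with the same distribution as R's sample() with replacement:

set.seed(1)
x <- rnorm(15); n <- length(x)
u <- runif(n)
resample <- x[ceiling(n*u)]               #X*_i = X_{ceiling(n U_i)}
resample2 <- sample(x, n, replace = TRUE) #built-in equivalent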

Example 1 (part I): Inference on the mean with known variance.

> sample.orig<-c(0.52, 0.61, -0.07, -1.23, 0.55, -1.34, -0.10,
+ 1.33, 1.52, 0.79, 1.14, 0.06, -0.74, 0.42, 1.08)
> n=length(sample.orig)
> sigma=0.5 #assumed to be known
> x.bar=mean(sample.orig) #statistic in the original sample
> B=10000 #number of bootstrap replicates
> resample<-numeric(n)
> stat.boot.unif<-numeric(B) #T*
> for (k in 1:B) {
+   resample<-sample(sample.orig,n,replace=T) #resamples
+   stat.boot.unif[k]=mean(resample)}

Example 1 (part II): Inference on the mean with known variance, now using the standardised statistic.

> sample.orig<-c(0.52, 0.61, -0.07, -1.23, 0.55, -1.34, -0.10,
+ 1.33, 1.52, 0.79, 1.14, 0.06, -0.74, 0.42, 1.08)
> n=length(sample.orig)
> sigma=1 #assumed to be known
> x.bar=mean(sample.orig)
> stat<-sqrt(n)*(x.bar-0)/sigma #statistic in the original sample
> B=10000 #number of bootstrap replicates
> resample<-numeric(n)
> stat.boot.unif<-numeric(B) #T*
> for (k in 1:B) {
+   resample<-sample(sample.orig,n,replace=T) #resamples
+   x.bar.boot<-mean(resample)
+   stat.boot.unif[k]=sqrt(n)*(x.bar.boot-x.bar)/sigma} #centred at x.bar

Exact bootstrap distribution

In principle, it is always possible to compute the exact distribution of the bootstrap statistic in the bootstrap world, at least for the uniform bootstrap, which is the most popular one.

Reason: In the bootstrap world, the distribution of the original sample is discrete and takes the values X_1, X_2, ..., X_n. Then each bootstrap observation X_i* takes one of these n values, and as a consequence the number of possible (ordered) bootstrap samples X* = (X_1*, X_2*, ..., X_n*) is n^n (repeated values are allowed).

Remark! The number of possible bootstrap samples is finite, but it is really big even for small sample sizes. For instance, for n = 10 we have 10^10 possible bootstrap samples, and for n = 20 this number increases up to 20^20 ≈ 1.05 × 10^26. That means that computing the exact bootstrap distribution quickly becomes an intractable problem.

Example: Let us consider a random sample of a population with distribution F. If the sample size is n = 3, the parameter of interest is θ(F) = µ and the statistic is T = T(X_1, ..., X_n) = X̄, then the distribution of the bootstrap statistic T* = T(X_1*, ..., X_n*) = X̄* can be computed as follows:

X* (+ permutations)  (m_1, m_2, m_3)  (p_1, p_2, p_3)   3!/(m_1! m_2! m_3! 3^3)  X̄*
(X_1, X_1, X_1)      (3, 0, 0)        (1, 0, 0)         1/27                     X_1
(X_2, X_2, X_2)      (0, 3, 0)        (0, 1, 0)         1/27                     X_2
(X_3, X_3, X_3)      (0, 0, 3)        (0, 0, 1)         1/27                     X_3
(X_1, X_1, X_2)      (2, 1, 0)        (2/3, 1/3, 0)     1/9                      (2X_1 + X_2)/3
(X_1, X_1, X_3)      (2, 0, 1)        (2/3, 0, 1/3)     1/9                      (2X_1 + X_3)/3
(X_1, X_2, X_2)      (1, 2, 0)        (1/3, 2/3, 0)     1/9                      (X_1 + 2X_2)/3
(X_2, X_2, X_3)      (0, 2, 1)        (0, 2/3, 1/3)     1/9                      (2X_2 + X_3)/3
(X_1, X_3, X_3)      (1, 0, 2)        (1/3, 0, 2/3)     1/9                      (X_1 + 2X_3)/3
(X_2, X_3, X_3)      (0, 1, 2)        (0, 1/3, 2/3)     1/9                      (X_2 + 2X_3)/3
(X_1, X_2, X_3)      (1, 1, 1)        (1/3, 1/3, 1/3)   2/9                      (X_1 + X_2 + X_3)/3
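The enumeration above can be checked by brute force in R (our own sketch, with hypothetical values for X_1, X_2, X_3):

x <- c(1, 2, 4)           #hypothetical sample (X1, X2, X3)
g <- expand.grid(x, x, x) #all 3^3 = 27 ordered bootstrap samples
means <- rowMeans(g)
table(means)/27           #exact bootstrap distribution of the mean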

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 98 Remark 1! In order to complete Step 1, we can use the algorithms seen to generate discrete random variables. In R can be solved thanks to the function sample. Remark 2! The number B should be big in order to obtain a good approximation. Example: B = 100, 200, 500 to estimate bias or variance and B = 500, 1000, 5000 for build confidence intervals or testing hypothesis.

Approximated bootstrap distribution

Idea: In a bootstrap scenario, the mechanism to generate the data is known and as a result it can be simulated using Monte Carlo methods.

Algorithm: general structure for a uniform bootstrap ? 1. For each i = 1,..., n generate Xi from Fn. ∗ ∗ ∗ 2. Obtain X = (X1 ,..., Xn ). ∗ ∗ ∗ 3. Compute T = T (X1 ,..., Xn ). 4. Repeat Step 1-3 B times in order to obtain the bootstrap replications T ∗(1),..., T ∗(B). 5. Use the bootstrap replications to approximate the distribution of T .

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 99 Approximated bootstrap distribution

Idea: In a bootstrap scenario, the mechanism to generate the data is known and as a result it can be simulated using Monte Carlo methods.

Algorithm: general structure for a uniform bootstrap ? 1. For each i = 1,..., n generate Xi from Fn. ∗ ∗ ∗ 2. Obtain X = (X1 ,..., Xn ). ∗ ∗ ∗ 3. Compute T = T (X1 ,..., Xn ). 4. Repeat Step 1-3 B times in order to obtain the bootstrap replications T ∗(1),..., T ∗(B). 5. Use the bootstrap replications to approximate the distribution of T .

Remark 1! In order to complete Step 1, we can use the algorithms seen to generate discrete random variables. In R can be solved thanks to the function sample. Remark 2! The number B should be big in order to obtain a good approximation. Example: B = 100, 200, 500 to estimate bias or variance and B = 500, 1000, 5000 for build confidence intervals or testing hypothesis.

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 99 MC   E ψbB(u) = ψb(u) MC   1 Var ψbB(u) = ψb(u) (1 − ψb(u)) B

Remark! The Monte Carlo error, that is given by the Monte Carlo variance, can be constrained by √1 . 2 B

Approximated bootstrap distribution

The distribution function of the estimator of interest

ψ(u) = P(T ≤ u) can be estimated using the empirical distribution of the bootstrap replica- tions, n 1 X ψbB(u) = 1 ∗(i) . n {T ≤u} i=1

The empirical distribution ψbB(u) is an estimator of the exact bootstrap ∗ ∗ distribution ψbB(u) = P (T ≤ u).

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 100 Remark! The Monte Carlo error, that is given by the Monte Carlo variance, can be constrained by √1 . 2 B

Approximated bootstrap distribution

The distribution function of the estimator of interest

ψ(u) = P(T ≤ u) can be estimated using the empirical distribution of the bootstrap replica- tions, n 1 X ψbB(u) = 1 ∗(i) . n {T ≤u} i=1

The empirical distribution ψbB(u) is an estimator of the exact bootstrap ∗ ∗ distribution ψbB(u) = P (T ≤ u). MC   E ψbB(u) = ψb(u) MC   1 Var ψbB(u) = ψb(u) (1 − ψb(u)) B

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 100 Approximated bootstrap distribution

The distribution function of the estimator of interest

ψ(u) = P(T ≤ u) can be estimated using the empirical distribution of the bootstrap replica- tions, n 1 X ψbB(u) = 1 ∗(i) . n {T ≤u} i=1

The empirical distribution ψbB(u) is an estimator of the exact bootstrap ∗ ∗ distribution ψbB(u) = P (T ≤ u). MC   E ψbB(u) = ψb(u) MC   1 Var ψbB(u) = ψb(u) (1 − ψb(u)) B

Remark! The Monte Carlo error, that is given by the Monte Carlo variance, can be constrained by √1 . 2 B
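This bound gives a simple rule for choosing B: to guarantee a Monte Carlo standard deviation below a target ε, take B ≥ 1/(4ε²). A one-line check in R (our own illustration):

eps <- 0.005         #target Monte Carlo standard deviation
ceiling(1/(4*eps^2)) #smallest B with 1/(2*sqrt(B)) <= eps: here 10000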

Example: Let us consider a random sample of a variable X with distribution F. The distribution of the sample median, as an estimator of θ = Med(X), can be estimated using a bootstrap approximation.

In a simple simulation example, let us consider the random sample

X = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5)

and a uniform bootstrap procedure.

> set.seed(1234)
> ori.sample<-seq(0.1,1.5,length=15)
> n<-length(ori.sample)
> B=1000000; boot.median<-numeric(B)
> for (k in 1:B){
+   boot.sample<-sample(ori.sample,n,replace=T)
+   sort.boot.sample=sort(boot.sample)
+   boot.median[k]=sort.boot.sample[8] #sample median (8th of 15 ordered values)
+ }
> var.boot.median=(1/B)*sum((boot.median-
+   mean(boot.median))^2)
> var.boot.median
[1] 0.03395916
> bias.boot.median=mean(boot.median)-median(ori.sample) #here median = mean = 0.8
> bias.boot.median
[1] -9.63e-05
> ECM.median=(bias.boot.median)^2+var.boot.median #mean squared error
> ECM.median
[1] 0.03395917

Part II: Resampling

Module II: Bootstrap variations

Bootstrap variations

- The uniform bootstrap uses the empirical distribution.
- We may have some additional information on the population.
- Modifications of the bootstrap take advantage of this information to improve the results.

Parametric bootstrap

$F \in \{F_\theta : \theta \in \Theta \subset \mathbb{R}^d\}$: we estimate θ from the sample by $\hat{\theta}$ (using, for example, maximum likelihood) and we obtain resamples from $F_{\hat{\theta}}$.

Algorithm
1. Given the sample X = (X_1, ..., X_n), estimate $\hat{\theta}$.
2. For every i = 1, ..., n draw X_i* from $F_{\hat{\theta}}$.
3. Obtain the resample X* = (X_1*, ..., X_n*).
4. Compute the statistic of interest in the bootstrap world, T* = T(X_1*, ..., X_n*).

Remark! In Step 2, we need to simulate values from $F_{\hat{\theta}}$. We may use the inversion method (if possible), and in this case the step can be rewritten as
2. For every i = 1, ..., n draw U_i ∼ U(0, 1) and compute $X_i^* = F_{\hat{\theta}}^{-1}(U_i)$.
When the inverse function cannot be computed (Gaussian distribution, for example), statistical software usually provides specific generating functions, or we can use other methods.

Example 1 (part III): Inference on the mean with known variance.

> sample.orig<-c(-1.21,0.28,1.08,-2.35,0.43,0.51,-0.57,
+ -0.55,-0.56,-0.89,-0.48,-1.00,-0.78,0.06,0.96)
> n=length(sample.orig)
> sigma=1 #assumed to be known
> x.bar=mean(sample.orig)
> B=10000 #number of bootstrap replicates
> resample<-numeric(n)
> stat.boot<-numeric(B) #T*
> for (k in 1:B) {
+   u=rnorm(n)
+   resample=u*sigma+x.bar #equivalent: resample=rnorm(n,x.bar,sigma)
+   stat.boot[k]=mean(resample)
+   #or: stat.boot[k]=sqrt(n)*(mean(resample)-x.bar)/sigma
+ }

Symmetrised bootstrap

We know that the distribution function of the population is symmetric.

That is, there is a value c such that F(c − h) = 1 − F(c + h) ∀h > 0, and it can be proved that c = µ, the mean of the distribution (whenever it exists). Equivalently, this symmetry property can be expressed as F(x) = 1 − F(2µ − x) ∀x ∈ ℝ.

The distribution in the bootstrap world is then estimated with a symmetrised version of the empirical distribution function, $\hat{F} = F_n^{sim}$.

How can we define this estimator? We first build a new sample by symmetrising the original one around the sample mean:
$$Y_i = \begin{cases} X_i & i = 1, \ldots, n, \\ 2\bar{X} - X_{i-n} & i = n+1, \ldots, 2n. \end{cases}$$

Then $F_n^{sim}$ is the empirical distribution of this new sample, i.e., it assigns an equal probability of 1/2n to each value Y_i:
$$F_n^{sim}(x) = \frac{1}{2n}\sum_{i=1}^{2n} 1_{\{Y_i \le x\}} \iff F_n^{sim}(x) = \frac{1}{2}\Big(F_n(x) + 1 - F_n(2\bar{X} - x)\Big).$$

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 108 Remark! In Step 1, we may proceed in two different ways: compute the symmetri- sed version of the original sample and draw values from there with equiprobability, and then it would be rewritten as ∗ 1. For every i=1,. . . ,n draw Ui ∼ U(0, 1) and compute Xi = Xd2nUi e. sim or use the fact that Fn is the distribution of a random variable built in two steps, in the first one sampling from Fn and in the second decide with equiprobability if we remain with that value or its symmetric around the sample mean. In this case it may be rewritten as ∗ 1. For every i=1,. . . ,n draw Ui , Vi ∼ U(0, 1). If Vi ≤ 0.5 then Xi = XdnUi e ∗ and if not, Xi = 2X − XdnUi e.

Symmetrised bootstrap

Algorithm ∗ sim ∗ 1 1. For every i=1,. . . ,n draw Xi from Fn , i.e., P(Xi = Yj ) = 2n , j = 1,..., 2n.

∗ ∗ ∗ 2. Obtain the resample X = (X1 ,..., Xn ). 3. Compute the statistic of interest in the bootstrap world ∗ ∗ ∗ T = T (X1 ,..., Xn ).

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 109 Symmetrised bootstrap

Algorithm ∗ sim ∗ 1 1. For every i=1,. . . ,n draw Xi from Fn , i.e., P(Xi = Yj ) = 2n , j = 1,..., 2n.

∗ ∗ ∗ 2. Obtain the resample X = (X1 ,..., Xn ). 3. Compute the statistic of interest in the bootstrap world ∗ ∗ ∗ T = T (X1 ,..., Xn ).

Remark! In Step 1, we may proceed in two different ways: compute the symmetri- sed version of the original sample and draw values from there with equiprobability, and then it would be rewritten as ∗ 1. For every i=1,. . . ,n draw Ui ∼ U(0, 1) and compute Xi = Xd2nUi e. sim or use the fact that Fn is the distribution of a random variable built in two steps, in the first one sampling from Fn and in the second decide with equiprobability if we remain with that value or its symmetric around the sample mean. In this case it may be rewritten as ∗ 1. For every i=1,. . . ,n draw Ui , Vi ∼ U(0, 1). If Vi ≤ 0.5 then Xi = XdnUi e ∗ and if not, Xi = 2X − XdnUi e.
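A minimal R sketch of the second variant (our own illustration with hypothetical data); Example 1 (part IV) below uses the first variant through the symmetrised sample:

set.seed(1)
x <- rnorm(15); n <- length(x); x.bar <- mean(x)
u <- runif(n); v <- runif(n)
keep <- x[ceiling(n*u)]                            #resample from Fn
resample <- ifelse(v <= 0.5, keep, 2*x.bar - keep) #reflect around the mean w.p. 1/2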

Example 1 (part IV): Inference on the mean with known variance.

> sample.orig<-c(0.52, 0.61, -0.07, -1.23, 0.55, -1.34, -0.10,
+ 1.33, 1.52, 0.79, 1.14, 0.06, -0.74, 0.42, 1.08)
> n=length(sample.orig)
> sigma=0.5 #assumed to be known
> x.bar=mean(sample.orig)
> sample.symm<-c(sample.orig,2*x.bar-sample.orig)
> B=10000 #number of bootstrap replicates
> resample<-numeric(n)
> stat.boot.symm<-numeric(B) #T*
> for (k in 1:B) {
+   resample<-sample(sample.symm,n,replace=T) #resamples
+   stat.boot.symm[k]=mean(resample)}

Smooth bootstrap

We know that the distribution function of the population is continuous.

This means that the distribution has an associated density function, f(x) = F′(x), so we need a bootstrap resampling scheme within a continuous framework, for which we will use kernel estimators.

The estimator of the distribution function is based on the well-known kernel density estimator:
$$\hat{F}_h(x) = \int_{-\infty}^{x} \hat{f}_h(t)\,dt = \frac{1}{nh}\sum_{i=1}^{n}\int_{-\infty}^{x} K\!\left(\frac{t - X_i}{h}\right)dt = \frac{1}{n}\sum_{i=1}^{n}\mathbb{K}\!\left(\frac{x - X_i}{h}\right),$$
where h is the bandwidth parameter and $\mathbb{K}(t) = \int_{-\infty}^{t} K(u)\,du$.

[Figure: two kernel distribution estimates.]

Algorithm
1. From the original sample (X_1, ..., X_n) and a bandwidth value h > 0, compute the kernel density estimator $\hat{f}_h$.
2. Obtain the resample X* = (X_1*, ..., X_n*) from $\hat{f}_h$.
3. Compute the statistic of interest in the bootstrap world, T* = T(X_1*, ..., X_n*).

Remark! To draw values from $\hat{f}_h$ we may regard it as a linear combination of the density functions $K_h(\cdot - X_i)$ with coefficients 1/n, and do the simulation in two steps: first select i ∈ {1, ..., n} randomly and equiprobably, and then draw a value from $K_h(\cdot - X_i)$, which is equivalent to drawing V from K and computing X_i + hV. Step 2 may be rewritten as
2. For every i = 1, ..., n draw U_i ∼ U(0, 1), V_i ∼ K and compute X_i* = X_{⌈nU_i⌉} + hV_i.

Example 1 (part V): Inference on the mean with known variance.

> sample.orig<-c(0.52, 0.61, -0.07, -1.23, 0.55, -1.34, -0.10,
+ 1.33, 1.52, 0.79, 1.14, 0.06, -0.74, 0.42, 1.08)
> n=length(sample.orig)
> sigma=0.5 #assumed to be known
> x.bar=mean(sample.orig)
> B=10000 #number of bootstrap replicates
> h=0.0001 #bandwidth value
> resample<-numeric(n)
> stat.boot.smooth<-numeric(B) #T*
> for (k in 1:B) {
+   resample<-sample(sample.orig,n,replace=T) #resample from the empirical distribution
+   resample=resample+h*rnorm(n,0,1) #rnorm draws from a Gaussian distribution (K is Gaussian)
+   stat.boot.smooth[k]=mean(resample)}

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 114 Weighted bootstrap derives in biased bootstrap when the vector of weights p = (p1,..., pn) minimises the distance to the uniform 1 1  bootstrap n ,..., n under some restrictions derived from the pro- blem we are analysing.

Hall, P., and Presnell, B. (1999) Intentionally biased bootstrap methods. Journal of the Royal Statistical Society: Series B, 61(1), 143-158

Weighted and biased bootstrap

The distribution used in the resampling procedure is discrete and assign probabilities only to the data of the sample

− Fb(Xi ) − Fb(Xi ) = pi , i = 1,..., n with p ≥ 0 and Pn p = 1. i i=1 i

1 Particularly when pi = n ∀i ∈ {i,..., n} we get uniform bootstrap. Applications: censored data and also dependent data.

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 115 Weighted and biased bootstrap

The distribution used in the resampling procedure is discrete and assign probabilities only to the data of the sample

− Fb(Xi ) − Fb(Xi ) = pi , i = 1,..., n with p ≥ 0 and Pn p = 1. i i=1 i

1 Particularly when pi = n ∀i ∈ {i,..., n} we get uniform bootstrap. Applications: censored data and also dependent data.

Weighted bootstrap derives in biased bootstrap when the vector of weights p = (p1,..., pn) minimises the distance to the uniform 1 1  bootstrap n ,..., n under some restrictions derived from the pro- blem we are analysing.

Hall, P., and Presnell, B. (1999) Intentionally biased bootstrap methods. Journal of the Royal Statistical Society: Series B, 61(1), 143-158 Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 115 It is straightforward to show that the distribution of n(θ −θb) is exponential with parameter 1/θ which implies in particular that, as for any continuous distribution, the value 0 is assumed with zero probability. The bootstrap distribution for this statistic n(θ − θb) is the distribution of ∗ n(θb− θb ) where θb = Xi for a certain i. Then, we obtain a value of 0 with that probability for which the element Xi is being resampled. 1 It is well known that this probability converges toward 1 − e hence not towards the value 0 provided by the exponential distribution.
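In R, weighted resampling only requires the prob argument of sample; a minimal sketch with hypothetical data and weights:

set.seed(1)
x <- rnorm(10); n <- length(x)
p <- (1:n)/sum(1:n) #any p_i >= 0 with sum(p) = 1
resample <- sample(x, n, replace = TRUE, prob = p)
#p = rep(1/n, n) recovers the uniform bootstrap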

Bootstrap validation

The uniform bootstrap method FAILS: "Frankfurt taxis example"
In Frankfurt, taxis are presumably numbered from 1 to N. After arriving at the train station you observe n taxi cabs numbered (X_1, ..., X_n). The maximum likelihood estimate for the total number of taxis N is $\hat{\theta}_{MLE} = \hat{\theta} = \max(X_1, \ldots, X_n)$.

It is straightforward to show that the distribution of $n(\theta - \hat{\theta})$ is asymptotically exponential with parameter 1/θ, which implies in particular that, as for any continuous distribution, the value 0 is assumed with zero probability.

The bootstrap distribution of this statistic is the distribution of $n(\hat{\theta} - \hat{\theta}^*)$, where $\hat{\theta}^* = \max(X_1^*, \ldots, X_n^*)$. This bootstrap statistic takes the value 0 exactly when the sample maximum is drawn into the resample, which happens with probability $1 - (1 - 1/n)^n$.

It is well known that this probability converges to $1 - e^{-1} \simeq 0.632$, hence not to the value 0 provided by the exponential limit distribution.
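A small R experiment illustrating this failure (our own sketch; n, N and the seed are arbitrary): the bootstrap statistic puts mass close to 1 − e^{-1} at 0.

set.seed(1)
n <- 100; N <- 1000
x <- sample(1:N, n) #observed taxi numbers
theta.hat <- max(x)
B <- 10000; zero <- numeric(B)
for (k in 1:B) {
  xs <- sample(x, n, replace = TRUE)
  zero[k] <- (max(xs) == theta.hat) #statistic is 0 iff the maximum is resampled
}
mean(zero) #close to 1-(1-1/n)^n = 0.634, not 0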

Suppose that we want to estimate the distribution function
$$G_{F,n}(q) = P[Q(Y_1, \ldots, Y_n; F) \le q \mid F],$$
where the conditioning on F indicates that Y_1, ..., Y_n is a random sample from F. The bootstrap estimate of $G_{F,n}$ is
$$G_{\hat{F},n}(q) = P[Q(Y_1^*, \ldots, Y_n^*; \hat{F}) \le q \mid \hat{F}].$$

In order for $G_{\hat{F},n}$ to approach $G_{F,n}$ as n → ∞, three conditions must hold. Suppose that the true distribution F is surrounded by a neighbourhood N in a suitable space of distributions, and that, as n → ∞, F̂ eventually falls into N with probability one. Then the conditions are:
1. For any A ∈ N, $G_{A,n}$ must converge weakly to a limit $G_{A,\infty}$.
2. This convergence must be uniform on N.
3. The function mapping A to $G_{A,\infty}$ must be continuous.

Under these conditions the bootstrap is consistent, meaning that for any q and ε,
$$P\Big(\big|G_{\hat{F},n}(q) - G_{F,\infty}(q)\big| > \varepsilon\Big) \longrightarrow 0 \quad \text{as } n \to \infty.$$

Consistency is a weak property: standard normal approximation methods are also consistent in this sense.

Once consistency is established, meaning that the resampling method is 'valid', we need to know whether the method is 'good' relative to other possible methods.

This involves looking at the rate of convergence to nominal properties: asymptotic accuracy.

Idea: to achieve valid conclusions, F and F̂ (in any of its forms) must be close to each other, at least in some asymptotic sense. There are several theoretical procedures to prove this idea:

- Proving that the bootstrap distribution converges to the same limit as the theoretical one we are interested in.
- Proving that a functional distance between both distributions converges to zero.
- Building Edgeworth expansions of both distributions and proving that the difference between them converges to zero.
- Proving that analogous characteristics of both distributions have the same structure, replacing population moments by empirical ones (imitation).

Bickel, P.J. and Freedman, D.A. (1981) Some asymptotic theory for the bootstrap. The Annals of Statistics, 9(6), 1196-1217.

Hall, P. (1988) Theoretical comparison of bootstrap confidence intervals. The Annals of Statistics, 16(3), 927-953.

Cao, R. and Prada, J.M. (1993) Bootstrapping the mean of a symmetric population. Statistics and Probability Letters, 17, 43-48.

We illustrate the Edgeworth-expansion method with the statistic
$$T = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma},$$
where X = (X_1, ..., X_n) is a random sample from a distribution F with mean µ and standard deviation σ.

Applying the Central Limit Theorem we obtain the asymptotic distribution (when n → ∞) of T:
$$\lim_{n \to \infty} P(T \le u) = \Phi(u), \quad \forall u \in \mathbb{R},$$
with Φ the distribution function of a standard Gaussian, whose density will be denoted by φ.

Cramér's theorem
Let X_1, ..., X_n, ... be independent, identically distributed random variables from a distribution F with mean µ and standard deviation σ. Assume there is j ∈ ℕ such that $\mathrm{E}(|X|^{j+2}) < \infty$ and $\limsup_{|t| \to \infty} |\alpha(t)| < 1$, where $\alpha(t) = \mathrm{E}(e^{itX})$ is the characteristic function of the population. Then
$$P(T \le u) = P\left(\sqrt{n}\,\frac{\bar{X} - \mu}{\sigma} \le u\right) = \Phi(u) + n^{-1/2} p_1(u)\varphi(u) + \cdots + n^{-(j-1)/2} p_{j-1}(u)\varphi(u) + O\big(n^{-j/2}\big),$$
where the $p_i(u)$ are polynomials of degree 3i − 1 whose coefficients depend on the moments of order less than i + 2 of X.

                      CLT approximation   Uniform bootstrap
Order of convergence  O(n^{-1/2})         O(n^{-1})

The results above, based on Edgeworth expansions, can be applied to many common statistics:

- Smooth functions of sample moments, such as means, variances and higher moments.
- Smooth functions of solutions to smooth estimating equations, such as most maximum likelihood estimators and estimators in linear and generalized linear models.
- Some robust estimators.
- Many statistics calculated from time series.

Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and their Application. Cambridge University Press.

Part II: Resampling

Module III: Bootstrap applications to build confidence intervals

Definition
A confidence interval for a parameter θ will be defined by limits $\hat{\theta}_{\alpha_1}$ and $\hat{\theta}_{1-\alpha_2}$ such that, for any α,
$$P(\theta < \hat{\theta}_{\alpha}) = \alpha.$$

The coverage of the interval $(\hat{\theta}_{\alpha_1}, \hat{\theta}_{1-\alpha_2})$ is 1 − (α_1 + α_2), and α_1 and α_2 are respectively the left- and right-tail error probabilities.

In principle we can choose any α_1 and α_2, as long as they sum to the overall error probability α. The simplest way to do this, which we adopt for the general discussion, is to set α_1 = α_2 = α/2. Then the interval is equi-tailed with coverage probability 1 − α.

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 124 Confidence interval The confidence interval for µ of level (1 − α) based on asymptotic gaus- sianity is given by

 σ σ  I = X − z √ , X + z √ , 1 α/2 n α/2 n

where zα/2 represents the quantile 1 − α/2 of a standard Gaussian distri- bution.

Classical method

Scenario: Let us consider a random variable X with mean µ and variance σ2. We want to build a two sided confidence interval for µ when σ2 is known.

Statistic: The statistic used in this case is √ X − µ T = n −→ N(0, 1), 1 σ when n → ∞.

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 125 Classical method

Scenario: Let us consider a random variable X with mean µ and variance σ2. We want to build a two sided confidence interval for µ when σ2 is known.

Statistic: The statistic used in this case is √ X − µ T = n −→ N(0, 1), 1 σ when n → ∞. Confidence interval The confidence interval for µ of level (1 − α) based on asymptotic gaus- sianity is given by

 σ σ  I = X − z √ , X + z √ , 1 α/2 n α/2 n where zα/2 represents the quantile 1 − α/2 of a standard Gaussian distri- bution.

Generalization
Given a random variable X with distribution function F, a confidence interval for a parameter θ = θ(F) of level (1 − α) is given by
$$I_1 = \left(\hat{\theta} - z_{\alpha/2}\,\frac{\hat{\sigma}_{\theta}}{\sqrt{n}},\; \hat{\theta} + z_{\alpha/2}\,\frac{\hat{\sigma}_{\theta}}{\sqrt{n}}\right),$$
where $z_{\alpha/2}$ represents the quantile 1 − α/2 of a standard Gaussian distribution.
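For θ the mean, a minimal R sketch of I_1 (our own illustration with hypothetical data and with σ̂_θ = sd(x)):

set.seed(1)
x <- rnorm(50, mean = 2); n <- length(x); alpha <- 0.05
z <- qnorm(1 - alpha/2)
c(mean(x) - z*sd(x)/sqrt(n), mean(x) + z*sd(x)/sqrt(n)) #classical interval I1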

Percentile method

This method is based on the statistic
$$T_2 = \sqrt{n}\,\big(\hat{\theta} - \theta\big).$$

Once we have performed a bootstrap procedure, given F̂ an estimator of F, the distribution of T_2 can be approximated by that of
$$T_2^* = \sqrt{n}\,\big(\hat{\theta}^* - \theta(\hat{F})\big).$$

Then, if we denote by $q_{\beta}$ the β-quantile of $T_2^*$, that is, $P^*(T_2^* \le q_{\beta}) = \beta$, it follows that
$$1 - \alpha = P^*\big(q_{\alpha/2} < T_2^* < q_{1-\alpha/2}\big) \simeq P\big(q_{\alpha/2} < T_2 < q_{1-\alpha/2}\big).$$

Confidence interval of level (1 − α)
$$\hat{I}_2 = \left(\hat{\theta} - \frac{q_{1-\alpha/2}}{\sqrt{n}},\; \hat{\theta} - \frac{q_{\alpha/2}}{\sqrt{n}}\right)$$
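A minimal R sketch of Î_2 for θ the mean with the uniform bootstrap (our own illustration; the data and seed are hypothetical):

set.seed(1)
x <- rnorm(50); n <- length(x); alpha <- 0.05; B <- 2000
T2.star <- replicate(B, sqrt(n)*(mean(sample(x, n, replace = TRUE)) - mean(x)))
q <- quantile(T2.star, c(alpha/2, 1 - alpha/2))   #bootstrap quantiles
c(mean(x) - q[2]/sqrt(n), mean(x) - q[1]/sqrt(n)) #percentile interval I2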

Percentile-t methods

This method is based on the statistic ! √ θb− θ T3 = n . σbθ

Once we have performed a bootstrap procedure, given Fb an estimator of F , the distribution of T3 can be approximated by

∗ ! ∗ √ θb − θ(Fb) T3 = n ∗ . σbθ ∗ Then, if us denote by qβ the β-quantile of T3 , it follows that ∗ ∗   1 − α = P qα/2 < T3 < q1−α/2 ' P qα/2 < T3 < q1−α/2 .

Maribel Borrajo and Mercedes Conde (Modestya) Simulation and Resampling 128 Percentile-t methods

This method is based on the statistic ! √ θb− θ T3 = n . σbθ

Once we have performed a bootstrap procedure, given Fb an estimator of F , the distribution of T3 can be approximated by

∗ ! ∗ √ θb − θ(Fb) T3 = n ∗ . σbθ ∗ Then, if us denote by qβ the β-quantile of T3 , it follows that ∗ ∗   1 − α = P qα/2 < T3 < q1−α/2 ' P qα/2 < T3 < q1−α/2 .

Confidence interval of level (1 − α)   σbθ σbθ bI3 = θb− √ q1−α/2 , θb− √ qα/2 n n

Adjusted percentile-t method

This method is quite similar to the previous one, but the choice of the quantiles is different because now they have to be symmetric. Given the statistic $T_3$ and its bootstrap version $T_3^*$, let us consider the quantile $q^*_{1-\alpha}$ that verifies
$$\begin{aligned}
1 - \alpha &= P^*\big( |T_3^*| \le q^*_{1-\alpha} \big) = P^*\big( -q^*_{1-\alpha} \le T_3^* \le q^*_{1-\alpha} \big) \\
&= P^*\left( -q^*_{1-\alpha} \le \sqrt{n}\,\frac{\hat\theta^* - \theta(\hat F)}{\hat\sigma^*_\theta} \le q^*_{1-\alpha} \right) \simeq P\big( -q^*_{1-\alpha} \le T_3 \le q^*_{1-\alpha} \big).
\end{aligned}$$

Confidence interval of level (1 − α):
$$\hat I_4 = \left( \hat\theta - \frac{\hat\sigma_\theta}{\sqrt{n}}\, q^*_{1-\alpha},\ \hat\theta + \frac{\hat\sigma_\theta}{\sqrt{n}}\, q^*_{1-\alpha} \right).$$

Comparison

Coverage error

                Classical        Percentile       Percentile-t     Adj. percentile-t
                                 (bootstrap)      (bootstrap)      (bootstrap)
Two sided       O(n^{-1})        O_p(n^{-1/2})    O_p(n^{-1})      O_p(n^{-3/2})
One sided       O(n^{-1/2})      O_p(n^{-1/2})    O_p(n^{-1})      O_p(n^{-1})

These rates follow from Edgeworth expansions and the Bhattacharya-Ghosh theorem.

Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman and Hall.

Example

Simulation study: Design a Monte Carlo simulation study in order to check the coverage error associated with the percentile-t method, supposing that the original variable follows an exponential distribution with rate 0.01 (so the true mean is µ = 1/0.01 = 100). Fix the confidence level at 0.90.

Example

set.seed(1234)
alpha <- 0.1; n <- 100; N <- 100; B <- 100
percentilet <- numeric(B); coverageI3 <- numeric(N)
for (z in 1:N) {
  ori.sample <- rexp(n, rate = 0.01)              # true mean: 1/0.01 = 100
  bar.x <- mean(ori.sample); hat.sd <- sd(ori.sample)
  for (k in 1:B) {
    boot.sample <- sample(ori.sample, n, replace = TRUE)
    percentilet[k] <- sqrt(n) * (mean(boot.sample) - bar.x) / sd(boot.sample)
  }
  percentiletord <- sort(percentilet)
  quantile1 <- percentiletord[floor(B * (1 - alpha/2))]
  quantile2 <- percentiletord[floor(B * (alpha/2))]
  I3 <- c(bar.x - hat.sd * quantile1 / sqrt(n),   # percentile-t interval
          bar.x - hat.sd * quantile2 / sqrt(n))
  if ((I3[1] < 100) & (100 < I3[2])) { coverageI3[z] <- 1 }  # does I3 cover mu = 100?
}
mean(coverageI3)

Empirical coverage: 0.88

Bootstrap in R

library(bootstrap)
bootstrap(x, nboot, theta, func = NULL)
boott(x, theta, sdfun = sdfunb, nbootsd = 25, nboott = 200, ...)
jackknife(x, theta)
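A short usage sketch, assuming the bootstrap package's documented interface; the data and the statistic (median of an exponential sample) are illustrative.

# Bootstrap and jackknife standard errors of the sample median.
library(bootstrap)
x <- rexp(100, rate = 0.01)
res <- bootstrap(x, nboot = 1000, theta = median)  # 1000 bootstrap replicates
sd(res$thetastar)                                  # bootstrap standard error estimate
jackknife(x, median)$jack.se                       # jackknife counterpart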

Efron, B. and Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman and Hall.

Bootstrap in R

library(boot)
boot(data, statistic, R, sim, strata, weights, ...)
boot.ci(boot.out, conf = 0.95, type = "all", ...)
censboot(data, statistic, R, F.surv, G.surv, ...)
envelope(boot.out, level = 0.95, ...)
norm.ci(boot.out, conf = 0.95, ...)
plot.boot(boot.object); print.boot(boot.object)
print.bootci(bootci.object)

Davison, A.C. and Hinkley, D.V. (1997). Bootstrap Methods and Their Applications. Cambridge University Press.

Bootstrap in R

Example

# Bootstrap 95% CI for regression coefficients
# function to obtain regression coefficients
bs <- function(formula, data, indices) {
  d <- data[indices, ]                 # resampled rows
  fit <- lm(formula, data = d)
  return(coef(fit))
}
# bootstrapping with 1000 replications
results <- boot(data = mtcars, statistic = bs, R = 1000,
                formula = mpg ~ wt + disp)
# view results
plot(results, index = 1)  # intercept
plot(results, index = 2)  # wt
plot(results, index = 3)  # disp
# get 95% confidence intervals
boot.ci(results, type = "all", index = 1)  # intercept
boot.ci(results, type = "all", index = 2)  # wt
boot.ci(results, type = "all", index = 3)  # disp

A brief introduction to R

What is R?

R is a free software environment for statistical computing and graphics.

Main webpage: www.r-project.org.

Available for almost any operating system.

Collaborative program.

It has a base package with basic statistical functionality.

Extra functionality is provided by packages (free to download).

Latest released version: 3.4.4.

Main characteristics of R

Object-oriented programming language.

Case sensitive: it distinguishes lowercase and capital letters.

<- is the assignment operator.

# introduces comments.

; separates several commands on one line.

+ (at the console prompt) indicates that a command continues on the next line.

? or help(function.name) or help.start() for help. These conventions are illustrated in the snippet below.
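A minimal base-R snippet with illustrative names:

x <- 3; y <- 4   # '<-' assigns; ';' separates two commands on one line
# this whole line is a comment
X <- x + y       # case sensitive: 'X' and 'x' are different objects
help(mean)       # help page of mean(); equivalent to ?mean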

Some curiosities to work with R

Arithmetic/logical operations work element by element, except the matrix product %*%.

Loops: for and while.

Functions: the general template is

function.name <- function(arg1, arg2, ...) {
  expr
  return(argument.name.to.be.returned)
}
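A concrete example following this template; the function name and computation are illustrative.

# Mean squared error between two vectors.
squared.error <- function(x, y) {
  err <- (x - y)^2          # element-by-element arithmetic
  return(mean(err))
}
squared.error(c(1, 2, 3), c(1, 1, 4))   # returns 0.667 approximately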

R variations

[Screenshots: R Commander, RStudio, Notepad++.]

Some ideas on Random Variables

Random variables

Qualitative variables:

Ordinal variables: level of education, socioeconomic class.
Nominal variables: gender, eye color.

Quantitative variables:

Discrete variables: result of flipping a coin, number of patients in a doctor's surgery.
Continuous variables: height of students in a class, time it takes a computer to complete a task.

Discrete random variables

A discrete random variable (denoted by X) is one which may take on only a countable number of distinct values.

Support: Sup(X) = {x_1, x_2, ...} (finite or countable).

Probability mass function:
$$p_i = P(X = x_i).$$

Cumulative distribution function:
$$F(x) = \sum_{x_i \le x} P(X = x_i) = \sum_{x_i \le x} p_i.$$

Discrete random variables: examples

Be(p): Bernoulli distribution with success probability p.
Bi(n, p): binomial distribution with parameters n and p.
Ge(p): geometric distribution with success probability p.
Pois(λ): Poisson distribution with parameter λ.
...
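All of these are available in base R through the d*/p*/q*/r* naming convention (mass function, distribution function, quantile function, random generation); for instance, for the binomial:

dbinom(2, size = 5, prob = 0.3)   # P(X = 2) for X ~ Bi(5, 0.3)
pbinom(2, size = 5, prob = 0.3)   # F(2) = P(X <= 2)
rbinom(10, size = 5, prob = 0.3)  # ten simulated values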

[Figure: probability mass function (left) and distribution function (right) of a discrete random variable.]

Continuous random variables

A continuous random variable (denoted by X) is one which takes an uncountably infinite number of possible values.

Support:
Sup(X) = (a, b), or (a, b) ∪ (c, d), or ℝ.

Cumulative distribution function:
$$F(x) = P(X \le x).$$

Density function:
$$f(x) = F'(x), \qquad \text{or equivalently} \qquad F(x) = \int_{-\infty}^{x} f(z)\,dz.$$

Remark! P(X = x₀) = 0 for all x₀ ∈ ℝ.

Continuous random variables: examples

U(a, b): uniform distribution on the interval (a, b).
N(µ, σ): Gaussian distribution with mean µ and standard deviation σ.
Exp(λ): exponential distribution with mean 1/λ.
Γ(k, λ): gamma distribution with parameters k and λ.
...
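The same d*/p*/q*/r* convention applies to continuous distributions; for instance, for the Gaussian:

dnorm(0)                      # density of N(0, 1) at x = 0
pnorm(1.96)                   # F(1.96), approximately 0.975
qnorm(0.975)                  # quantile function, approximately 1.96
rnorm(10, mean = 5, sd = 2)   # ten simulated values from N(5, 2)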

[Figure: distribution function (left) and density function (right) of a standard Gaussian distribution.]

Some ideas on Kernel density estimation

Kernel density estimation

Parametric vs Nonparametric estimation

Let us start with an example...

[Figure: parametric estimation, histogram, and kernel estimation of the same density.]

Kernel density estimation

$$\hat f_{nh}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left( \frac{x - X_i}{h} \right)$$

Fixed kernel function, K.
Bandwidth parameter, h; classical bandwidth selectors: Silverman (1986), Bowman (1984), Sheather & Jones (1991).
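A minimal sketch using base R's density(), whose built-in bandwidth selectors correspond to the references above: bw.nrd0 is Silverman's rule of thumb, bw.ucv is cross-validation in the spirit of Bowman (1984), and bw.SJ is the Sheather & Jones plug-in; the simulated sample is illustrative.

x <- rnorm(200, mean = 3, sd = 1)
fit <- density(x, bw = bw.SJ(x), kernel = "gaussian")     # kernel estimate of f
plot(fit)
c(silverman = bw.nrd0(x), cv = bw.ucv(x), sj = bw.SJ(x))  # compare the three bandwidths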


Generate random numbers

Random numbers are numbers that occur in a sequence such that two conditions are met:
1. the values are uniformly distributed over a defined interval or set;
2. it is impossible to predict future values based on past or present ones.

Different methods (a minimal example is sketched below):
Lehmer generator.
Mid-square method.
Combined generators.
Shuffled generators.
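A minimal sketch of a Lehmer (multiplicative congruential) generator; the constants a = 16807 and m = 2^31 − 1 are the classical "minimal standard" choice, and the seed is illustrative.

lehmer <- function(n, seed = 12345, a = 16807, m = 2^31 - 1) {
  u <- numeric(n)
  x <- seed
  for (i in 1:n) {
    x <- (a * x) %% m   # x_{k+1} = a * x_k mod m
    u[i] <- x / m       # rescale to (0, 1)
  }
  return(u)
}
lehmer(5)   # five pseudo-random numbers in (0, 1)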

Gentle, J.E. (2003). Random Number Generation and Monte Carlo Methods. Springer.

Convergence notation

$A_n = O(a_n)$ if there exist constants C and $n_0$ such that $|A_n| \le C a_n$ for $n \ge n_0$. Equivalently,
$$\limsup_{n\to\infty} \frac{|A_n|}{a_n} < \infty.$$

$A_n = o(a_n)$ if for every ε > 0 there exists $n_\varepsilon$ such that $|A_n| \le \varepsilon a_n$ for every $n \ge n_\varepsilon$.

$X_n = O_p(a_n)$ if for every ε > 0 there exist constants $C_\varepsilon$ and $n_\varepsilon$ such that $P(|X_n| \le C_\varepsilon a_n) > 1 - \varepsilon$ for every $n \ge n_\varepsilon$.

$X_n = o_p(a_n)$ if for every ε > 0 there exists $n_\varepsilon$ such that $P(|X_n| \le \varepsilon a_n) > 1 - \varepsilon$ for every $n \ge n_\varepsilon$.
