
Lecture 20: Randomness and Monte Carlo

J. Chaudhry

Department of Mathematics and Statistics, University of New Mexico

What we'll do:

- Random number generators
- Monte-Carlo integration
- Monte-Carlo simulation

Our testing and our methods often rely on "randomness" and the impact of finite precision:
- Random input
- Random perturbations
- Random sequences

Randomness

- Randomness ≈ unpredictability
- One view: a sequence is random if it has no shorter description
- Physical processes, such as flipping a coin or tossing dice, are in principle deterministic, given enough information about the governing equations and initial conditions
- But even for deterministic systems, sensitivity to the initial conditions can render the behavior practically unpredictable
- So we need random simulation methods

Quasi-Random Sequences

- For some applications, reasonably uniform coverage of the sample space is more important than "randomness"
- True random samples often exhibit clumping
- Perfectly uniform samples use a uniform grid, but this does not scale well to high dimensions
- Quasi-random sequences attempt randomness while maintaining coverage

Quasi-Random Sequences

- Quasi-random sequences are not random, but give a random appearance
- By design, the points avoid each other, resulting in no clumping

http://dilbert.com/strips/comic/2001-10-25/ http://www.xkcd.com/221/

Randomness is easy, right?

In May 2008, Debian announced a vulnerability in OpenSSL: the OpenSSL pseudo-random number generator
- the seeding process had been compromised (for 2 years)
- only 32,767 possible keys
- seeding was based on the process ID (this is not entropy!)
- all SSL and SSH keys from 9/2006 - 5/2008 had to be regenerated
- all certificates had to be recertified

Cryptographically secure pseudorandom number generators (CSPRNGs) are necessary for some applications; other applications rely less on true randomness.

Repeatability

- With unpredictability, true randomness is not repeatable
- ...but lack of repeatability makes testing/debugging difficult
- So we want repeatability, but also independence of the trials

>> rand('seed',1234)
>> rand(10,1)
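Note: recent versions of Matlab recommend seeding through the rng function rather than the legacy rand('seed',...) syntax; a minimal equivalent of the snippet above:

    rng(1234)      % seed the default generator for repeatability
    rand(10,1)     % the same 10 values are produced after every reseed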

Pseudorandom Numbers

Computer algorithms for random number generation are deterministic

- ...but they may have long periods (a long time before an apparent pattern emerges)
- Such sequences are called pseudorandom
- Pseudorandom sequences are predictable and reproducible (this is mostly good)

Random Number Generators

Properties of a good random number generator:
- Random pattern: passes statistical tests of randomness
- Long period: goes a long time before repeating
- Efficiency: executes rapidly and with low storage
- Repeatability: the same sequence is generated when using the same initial state
- Portability: the same sequence is generated on different architectures

Random Number Generators

- Early attempts relied on complexity to ensure randomness
- "midsquare" method: square each member of a sequence and take the middle portion of the result as the next member of the sequence
- ...simple methods with a statistical basis are preferable

Linear Congruential Generators (LCGs)

Congruential random number generators are of the form:

x_k = (a x_{k-1} + c) \bmod m

where a and c are integers given as input.

- x_0 is called the seed
- m is typically chosen so that m − 1 is the largest representable integer
- Quality depends on a and c
- The period will be at most m

Example

Let a = 13, c = 0, m = 31, and x_0 = 1.

1, 13, 14, 27, 10, 6,...

This is a permutation of integers from 1,..., 30, so the period is m − 1.
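A minimal Matlab sketch of this LCG, using the parameters from the example (the script is ours, not a library routine):

    % LCG: x_k = (a*x_{k-1} + c) mod m
    a = 13; c = 0; m = 31;
    x = 1;                      % seed x_0
    seq = zeros(1, 10);
    for k = 1:10
        x = mod(a*x + c, m);    % next member of the sequence
        seq(k) = x;
    end
    disp(seq)                   % 13 14 27 10 6 16 22 7 29 5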

History (from C. Moler, NCM)

- IBM's Scientific Subroutine Package (SSP), used on mainframes in the 1960s, included the random generator RANDU with a = 65539, c = 0, and m = 2^31
- arithmetic mod 2^31 is done quickly with 32-bit words
- multiplication by a = 2^16 + 3 can be done quickly with a shift and a short add
- but notice that (mod m): x_{k+2} = 6 x_{k+1} − 9 x_k
- ...a strong correlation among three successive integers

History (from C. Moler, NCM)

- Matlab used a = 7^5, c = 0, and m = 2^31 − 1 for a while
- the period is m − 1
- this is no longer sufficient

Linear Congruential Generators (LCGs) – Summary

- sensitive to the choice of a and c
- be careful with the supplied random functions on your system
- the period is at most m
- division by m is needed to generate floating-point numbers in [0, 1)

What's used?

Two popular methods:

1. rand() in Unix uses an LCG with a = 1103515245, c = 12345, m = 2^31.

2. Method of Marsaglia (period ≈ 2^1430):

   Initialize x_0, ..., x_3 and c to random values given a seed
   s = 2111111111 x_{n-4} + 1492 x_{n-3} + 1776 x_{n-2} + 5115 x_{n-1} + c
   x_n = s mod 2^32
   c = floor(s / 2^32)

In general, the digits in random numbers are not themselves random... some patterns recur much more often.
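A Matlab sketch of Marsaglia's recurrence (our own illustration, not library code). It uses uint64 so the intermediate sum s, at most about 9.1e18, stays below the uint64 limit and is computed exactly; a real implementation would seed x and c properly:

    % Marsaglia's generator: s = 2111111111*x(n-4) + 1492*x(n-3)
    %                            + 1776*x(n-2) + 5115*x(n-1) + c
    coef = uint64([2111111111, 1492, 1776, 5115]);
    x = uint64([1; 2; 3; 4]);   % x_{n-4} ... x_{n-1}: toy seed values
    c = uint64(0);              % the carry
    M = uint64(2^32);
    out = zeros(1, 8, 'uint64');
    for n = 1:8
        s  = coef(1)*x(1) + coef(2)*x(2) + coef(3)*x(3) + coef(4)*x(4) + c;
        xn = mod(s, M);         % x_n = s mod 2^32
        c  = (s - xn) / M;      % c = floor(s / 2^32); this division is exact
        x  = [x(2:4); xn];      % slide the four-term lag window
        out(n) = xn;
    end
    disp(out)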

Fibonacci

Fibonacci generators produce floating-point random numbers directly, using differences, sums, or products. A typical subtractive generator:

x_k = x_{k-17} - x_{k-5}

- with "lags" of 17 and 5
- lags must be chosen very carefully
- negative results need fixing (e.g., by adding 1; see the sketch below)
- more storage is needed than for congruential generators
- no division needed
- very, very good statistical properties
- long periods, since repetition of a single value does not imply a period
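A Matlab sketch of this subtractive generator on floating-point values in (0, 1), with the standard fix-up of adding 1 when a difference goes negative (initializing the 17-value buffer from rand here is just for illustration; a real generator seeds it differently):

    % Lagged Fibonacci generator: x_k = x_{k-17} - x_{k-5}
    buf = rand(17, 1);             % the 17 most recent values
    vals = zeros(100, 1);
    for k = 1:100
        v = buf(1) - buf(13);      % x_{k-17} - x_{k-5}
        if v < 0
            v = v + 1;             % keep the result in (0, 1)
        end
        vals(k) = v;
        buf = [buf(2:17); v];      % slide the window
    end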

Discrete random variables

random variable: x
- values: x_1, ..., x_n (the set of values is called the sample space)
- probabilities: p_1, ..., p_n with \sum_{i=1}^{n} p_i = 1

Example: throwing a die (1-based index)
- values: x_1 = 1, x_2 = 2, ..., x_6 = 6
- probabilities: p_i = 1/6

Expected value and variance

expected value: average value of the variable

E[x] = \sum_{j=1}^{n} x_j p_j

variance: variation from the average

\sigma^2[x] = E[(x - E[x])^2] = E[x^2] - (E[x])^2

Standard deviation: the square root of the variance, denoted by σ[x]

Example: throwing a die
- expected value: E[x] = (1 + 2 + ··· + 6)/6 = 3.5
- variance: σ^2[x] = E[x^2] − (E[x])^2 = 91/6 − 3.5^2 ≈ 2.9167

Estimated E[x]

to estimate the expected value, choose a set of random values based on the probability and average the results

E[x] \approx \frac{1}{N} \sum_{i=1}^{N} x_i

Bigger N gives better estimates.

Example: throwing a die
- 3 rolls: 3, 1, 6 → E[x] ≈ (3 + 1 + 6)/3 ≈ 3.33
- 9 rolls: 3, 1, 6, 2, 5, 3, 4, 6, 2 → E[x] ≈ (3 + 1 + 6 + 2 + 5 + 3 + 4 + 6 + 2)/9 ≈ 3.56
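This is easy to check in Matlab with randi:

    % Estimate E[x] for a fair die by averaging N rolls
    N = 100000;
    rolls = randi(6, N, 1);    % N uniform integers from 1 to 6
    mean(rolls)                % approaches 3.5 as N grows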

Law of large numbers

By taking N to ∞, the error between the estimate and the expected value is statistically zero. That is, the estimate will converge to the correct value:

P\left( E[x] = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i \right) = 1

This is the basic idea behind Monte Carlo estimation.

Continuous random variable and density function

random variable: x
- values: x ∈ [a, b] (the set of values is called the sample space, which in this case is [a, b])
- probability: density function ρ(x) > 0 with \int_a^b \rho(x) \, dx = 1
- probability that the variable has a value between (c, d) ⊂ [a, b]: \int_c^d \rho(x) \, dx

Uniformly distributed random variable

- ρ(x) is constant
- \int_a^b \rho(x) \, dx = 1 means ρ(x) = 1/(b − a)

Continuous extensions to expectation and variance

Expected value

E[x] = \int_a^b x \rho(x) \, dx

E[g(x)] = \int_a^b g(x) \rho(x) \, dx

Variance of a random variable and of a function of a random variable:

\sigma^2[x] = \int_a^b (x - E[x])^2 \rho(x) \, dx = E[x^2] - (E[x])^2

\sigma^2[g(x)] = \int_a^b (g(x) - E[g(x)])^2 \rho(x) \, dx = E[g(x)^2] - (E[g(x)])^2

Estimating the expected value of some g(x)

E[g(x)] \approx \frac{1}{N} \sum_{i=1}^{N} g(x_i)

The law of large numbers is valid for the continuous case too.
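A small Matlab illustration with g(x) = x^2 and x uniform on (0, 1), where the exact value is \int_0^1 x^2 \, dx = 1/3:

    % Monte Carlo estimate of E[g(x)] with g(x) = x^2, x ~ U(0, 1)
    N = 100000;
    x = rand(N, 1);
    mean(x.^2)     % should be close to 1/3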

Over intervals

- Almost all systems provide a uniform random number generator on (0, 1)
- Matlab: rand

If we need a uniform distribution over (a, b), then we modify x_k drawn from (0, 1) by

(b − a) x_k + a

Matlab: generate 100 uniformly distributed values in (a, b):

r = a + (b-a).*rand(100,1)

- Uniformly distributed integers: randi
- Random permutations: randperm

Non-uniform distributions (Not on Exam)

- Sampling non-uniform distributions is much more difficult
- If the cumulative distribution function is invertible, we can generate the non-uniform sample from a uniform one

Consider the probability density function ρ(t) = λe^{−λt}, t > 0, giving the cumulative distribution function

F(x) = \int_0^x \lambda e^{-\lambda t} \, dt = \left[ -e^{-\lambda t} \right]_0^x = 1 - e^{-\lambda x}

Thus, inverting the CDF:

x_k = -\log(1 - y_k)/\lambda

where y_k is uniform in (0, 1). ...this is not so easy in general.
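A Matlab sketch of this inverse-CDF sampling (the exact mean of the exponential distribution is 1/λ, a useful sanity check):

    % Sample the exponential distribution via the inverted CDF
    lambda = 2;
    y = rand(100000, 1);          % uniform samples in (0, 1)
    x = -log(1 - y) / lambda;     % exponential samples with rate lambda
    mean(x)                       % should be close to 1/lambda = 0.5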

Multidimensional extensions

- difficult domains (complex geometries)

Expected value:

E[g(x)] = \int_{x \in \Omega} g(x) \rho(x) \, dx

with Ω ⊂ R^d. Density function for the uniform distribution:

\int_\Omega \rho \, dx = 1 \implies \rho = \frac{1}{V}

where V = \int_\Omega 1 \, dx is the volume of Ω.

Monte Carlo

- Central to PageRank (and many, many other applications in finance, science, informatics, etc.) is that we randomly process something
- What we want to know is what is likely to happen "on average"
- What would happen if we had an infinite number of samples?
- Let's take a look at integration (a discrete limit, in a sense)

(Deterministic) numerical integration

- split the domain into a set of fixed segments
- sum the function values weighted by the segment sizes (a Riemann sum!)

Monte Carlo Integration

For a random sequence x_1, ..., x_n uniformly distributed in (0, 1):

\int_0^1 f(x) \, dx \approx \frac{1}{n} \sum_{i=1}^{n} f(x_i)

f = @(x) exp(x);   % example integrand; f was left unspecified on the slide
n = 100;
x = rand(n,1);
a = f(x);
s = sum(a)/n;

Monte Carlo over random domains

For a uniform random sequence x_1, ..., x_n in Ω:

I = \int_\Omega f(x) \, dx \approx Q_n = V \frac{1}{n} \sum_{i=1}^{n} f(x_i)

where V is the "volume" given by

V = \int_\Omega dx

The law of large numbers guarantees that Q_n → I as n → ∞. Why? Recall that the density for the uniform distribution is ρ = 1/V. By the law of large numbers,

\frac{1}{n} \sum_{i=1}^{n} f(x_i) \approx E[f(x)] = \int_\Omega \rho f(x) \, dx = \frac{1}{V} \int_\Omega f(x) \, dx

Monte Carlo over random domains

Works in any dimension! In one dimension:

\int_a^b f(x) \, dx \approx (b - a) \frac{1}{n} \sum_{i=1}^{n} f(x_i)

where x_1, ..., x_n are uniformly distributed in (a, b).
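A Matlab sketch over an interval (a, b), checked against \int_0^\pi \sin x \, dx = 2:

    % Monte Carlo integration of sin(x) over (0, pi)
    a = 0; b = pi; n = 100000;
    x = a + (b - a)*rand(n, 1);    % uniform samples in (a, b)
    (b - a) * mean(sin(x))         % should be close to 2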

2d example: computing π

Use the unit square [0, 1]^2 with a quarter-circle:

f(x, y) = \begin{cases} 1 & (x, y) \in \text{circle} \\ 0 & \text{else} \end{cases}

A_{\text{quarter circle}} = \int_0^1 \int_0^1 f(x, y) \, dx \, dy = \frac{\pi}{4}

2d example: computing π

Estimate the area of the quarter circle by randomly evaluating f(x, y):

A_{\text{quarter circle}} \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i, y_i)

2d example: computing π

By definition

A_{\text{quarter circle}} = \pi/4

so

\pi \approx \frac{4}{N} \sum_{i=1}^{N} f(x_i, y_i)

2d example: computing π algorithm

input N
call rand in 2d
for i = 1:N
    sum = sum + f(x_i, y_i)
end
sum = 4 * sum / N
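A vectorized Matlab version of the same algorithm:

    % Estimate pi by sampling the unit square
    N = 100000;
    x = rand(N, 1); y = rand(N, 1);
    inside = (x.^2 + y.^2 <= 1);    % 1 when (x, y) is in the quarter circle
    4 * sum(inside) / N             % should be close to pi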

2d example: computing π algorithm

- The expected value of the error is O(\sigma / \sqrt{N})
- convergence does not depend on the dimension
- deterministic integration is very hard in higher dimensions
- deterministic integration is very hard for complicated domains
- Trapezoid rule in dimension d: O(1 / N^{2/d})

Error in Monte Carlo Estimation

The distribution of an average is close to being normal, even when the distribution from which the average is computed is not normal.

What?

Let x_1, ..., x_n be independent random variables drawn from any PDF.

- Each x_i has expected value μ and standard deviation σ
- Consider the average A_n = (x_1 + ··· + x_n)/n
- The expected value of A_n is μ and its standard deviation is σ/\sqrt{n}
- This follows from the Central Limit Theorem

What? The sample mean has an error of σ/\sqrt{n}.
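A quick Matlab experiment showing the σ/\sqrt{n} scaling for uniform (0, 1) samples, where σ = 1/\sqrt{12} ≈ 0.2887:

    % The spread of sample means shrinks like sigma/sqrt(n)
    n = 10000; trials = 500;
    A = mean(rand(n, trials));    % each column mean is one average A_n
    std(A)                        % observed standard deviation of the means
    1/sqrt(12)/sqrt(n)            % predicted sigma/sqrt(n), about 0.0029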

Simulation

- Stochastic simulation mimics physical behavior through random sampling
- ...also known as Monte Carlo simulation (there is no single MC method!)
- Usefulness:
  - nondeterministic processes
  - deterministic models that are too complicated to model directly
  - deterministic models with high dimensionality

Stochastic Simulation

Two requirements for MC:
- knowing which probability distributions are needed
- generating a sufficient quantity of random numbers

The probability distribution depends on the problem (theoretical or empirical evidence). The probability distribution can be approximated well by simulating a large number of trials.

Stochastic Simulation: The Monty Hall Problem

Question: You are on a game show. There are three doors that you have the option of opening. There is a prize behind one of the doors, and there are goats behind the other two. You select one of the doors. One of the remaining two doors is opened to reveal a goat. You are then given the option of opening your door or the other remaining unopened door. Should you switch? Determine the probability that you will win the prize if you switch.

Answer: Yes, always switch. The probability of success is 2/3, which can be shown using conditional probability.

Stochastic Simulation: The Monty Hall Problem

Solution via random simulation:

input N
wins = 0
for i = 1:N
    generate a random permutation of goats and prize
    you randomly choose a door
    if your door has a goat, the host chooses the other door with a goat
    else if your door has the prize, the host randomly chooses one of the doors with a goat
    end
    you switch your choice
    if the switched door has the prize
        wins = wins + 1
    end
end
probability of success = wins/N
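A runnable Matlab version of this simulation (a sketch; the door-numbering scheme is ours):

    % Monty Hall: estimate the probability of winning when you switch
    N = 100000;
    wins = 0;
    for i = 1:N
        prize  = randi(3);                    % door hiding the prize
        choice = randi(3);                    % contestant's initial pick
        opts = setdiff(1:3, [choice prize]);  % goat doors the host may open
        host = opts(randi(numel(opts)));      % host opens one of them
        choice = setdiff(1:3, [choice host]); % switch to the remaining door
        wins = wins + (choice == prize);
    end
    wins / N                                  % should be close to 2/3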
