A Note on Random Number Generation
Christophe Dutang and Diethelm Wuertz
September 2009

"Nothing in Nature is random... a thing appears random only through the incompleteness of our knowledge."
Spinoza, Ethics I¹.

1 Introduction

Random simulation has long been a very popular and well studied field of mathematics. There exists a wide range of applications in biology, finance, insurance, physics and many others, so simulations of random numbers are crucial. In this note, we describe the most common random number algorithms.

Let us recall that the only truly random things are measurements of physical phenomena, such as the thermal noise of semiconductor chips or radioactive sources².

On computers, randomness can only be simulated by deterministic algorithms. Excluding true randomness³, there are two kinds of random generation: pseudo and quasi random number generators.

The package randtoolbox provides R functions for pseudo and quasi random number generation, as well as statistical tests to quantify the quality of the generated random numbers.

¹ Quote taken from Niederreiter (1978).
² For more details, go to http://www.random.org/randomness/.
³ For true random number generation in R, use the random package of Eddelbuettel (2007).

2 Overview of random generation algorithms

In this section, we present first the pseudo random number generation and second the quasi random number generation. By "random numbers", we mean random variates of the uniform $U(0, 1)$ distribution. More complex distributions can be generated with uniform variates and rejection or inversion methods. Pseudo random number generation aims to seem random, whereas quasi random number generation aims to be deterministic but well equidistributed.

Readers familiar with algorithms such as linear congruential generation, Mersenne-Twister type algorithms, and low discrepancy sequences should go directly to the next section.

2.1 Pseudo random generation

At the beginning of the nineties, there were no state-of-the-art algorithms to generate pseudo random numbers, and the article of Park & Miller (1988), entitled Random number generators: good ones are hard to find, is a clear proof.

Despite this fact, most users thought the rand function they used was good, even though it suffered from a short period and a term-to-term dependence. But in 1998, the Japanese mathematicians Matsumoto and Nishimura invented the first algorithm whose period ($2^{19937} - 1$) exceeds the number of electron spin changes since the creation of the Universe ($10^{6000}$ against $10^{120}$). It was a big breakthrough.

As described in L'Ecuyer (1990), a (pseudo) random number generator (RNG) is defined by a structure $(S, \mu, f, U, g)$ where

• $S$ is a finite set of states,
• $\mu$ is a probability distribution on $S$, called the initial distribution,
• $f : S \to S$ is a transition function,
• $U$ is a finite set of output symbols,
• $g : S \to U$ is an output function.

Then the generation of random numbers is as follows:

1. generate the initial state (called the seed) $s_0$ according to $\mu$ and compute $u_0 = g(s_0)$,
2. iterate for $i = 1, \dots$, $s_i = f(s_{i-1})$ and $u_i = g(s_i)$.

Generally, the seed $s_0$ is determined using the machine clock, and so the random variates $u_0, \dots, u_n, \dots$ seem to be "real" i.i.d. uniform random variates. The period of an RNG, a key characteristic, is the smallest integer $p \in \mathbb{N}$ such that $\forall n \in \mathbb{N}, s_{p+n} = s_n$.

2.1.1 Linear congruential generators

There are many families of RNGs: linear congruential, multiple recursive, ... and "computer operation" algorithms. Linear congruential generators have a transfer function of the following type¹

$$f(x) = (a x + c) \bmod m,$$

where $a$ is the multiplier, $c$ the increment and $m$ the modulus, with $x, a, c, m \in \mathbb{N}$ (i.e. $S$ is the set of (positive) integers). $f$ is such that

$$x_n = (a x_{n-1} + c) \bmod m.$$

Typically, $c$ and $m$ are chosen to be relatively prime and $a$ such that $\forall x \in \mathbb{N}, a x \bmod m \neq 0$. The cycle length of linear congruential generators will never exceed the modulus $m$, but it can be maximised with the three following conditions

• the increment $c$ is relatively prime to $m$,
• $a - 1$ is a multiple of every prime dividing $m$,
• $a - 1$ is a multiple of 4 when $m$ is a multiple of 4,

see Knuth (2002) for a proof.

When $c = 0$, we have the special case of the Park-Miller algorithm or Lehmer algorithm (see Park & Miller (1988)). Let us note that the $(n + j)$th term can be easily derived from the $n$th term by replacing $a$ with $a^j \bmod m$ (still when $c = 0$).

Finally, we generally use one of the three types of output function:

• $g : \mathbb{N} \to [0, 1[$, and $g(x) = \frac{x}{m}$,
• $g : \mathbb{N} \to ]0, 1]$, and $g(x) = \frac{x}{m - 1}$,
• $g : \mathbb{N} \to ]0, 1[$, and $g(x) = \frac{x + 1/2}{m}$.

Linear congruential generators are implemented in the R function congruRand.

¹ This representation could easily be generalized to matrices, see L'Ecuyer (1990).

2.1.2 Multiple recursive generators

Multiple recursive generators are a generalisation of linear congruential generators. They are based on the following recurrence

$$x_n = (a_1 x_{n-1} + \dots + a_k x_{n-k} + c) \bmod m,$$

where $k$ is a fixed integer. Hence the $n$th term of the sequence depends on the $k$ previous ones. A particular case of this type of generator is

$$x_n = (x_{n-37} + x_{n-100}) \bmod 2^{30},$$

which is a Fibonacci-lagged generator². The period is around $2^{129}$. This generator was invented by Knuth (2002) and is generally called "Knuth-TAOCP-2002" or simply "Knuth-TAOCP"³.

An integer version of this generator is implemented in the R function runif (see RNG). We include in the package the latest double version, which corrects an undesirable deficiency. As described on Knuth's webpage⁴, the previous version of Knuth-TAOCP fails randomness tests if we generate a few sequences with several seeds. The cure for this problem is to discard the first 2000 numbers.

² See L'Ecuyer (1990).
³ TAOCP stands for The Art Of Computer Programming, Knuth's famous book.
⁴ Go to http://www-cs-faculty.stanford.edu/~knuth/news02.html#rng.
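The linear congruential recurrence of section 2.1.1 can be sketched in a few lines of Python. This is a minimal illustration and not the congruRand implementation: the names lcg, period, park_miller_uniforms and jump_ahead are hypothetical, and the Park-Miller constants a = 16807 and m = 2^31 - 1 are the usual "minimal standard" values, which the text does not state. The sketch checks the three full-period conditions on a toy modulus and the jump-ahead property for c = 0.

```python
def lcg(a, c, m, seed):
    """Yield successive states x_n = (a * x_{n-1} + c) mod m (seed excluded)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def period(a, c, m, seed=0):
    """Cycle length of the state sequence, by brute force (small m only)."""
    first_seen = {}
    x, n = seed, 0
    while x not in first_seen:
        first_seen[x] = n
        x = (a * x + c) % m
        n += 1
    return n - first_seen[x]


# Toy modulus m = 16: c = 3 is coprime to 16 and a - 1 = 4 is a multiple of
# 2 and of 4, so the three conditions hold and the period equals m.
FULL = period(a=5, c=3, m=16)
# With a = 3, a - 1 = 2 is not a multiple of 4 and the cycle is shorter.
SHORT = period(a=3, c=3, m=16)

# Park-Miller / Lehmer case (c = 0). Constants are assumed, not from the text.
PM_A, PM_M = 16807, 2**31 - 1


def park_miller_uniforms(seed, count):
    """First `count` variates u_i = g(x_i), with output function g(x) = x / m."""
    states = lcg(PM_A, 0, PM_M, seed)
    return [next(states) / PM_M for _ in range(count)]


def jump_ahead(x, j, a=PM_A, m=PM_M):
    """State x_{n+j} computed from x_n when c = 0: multiply by a^j mod m."""
    return (pow(a, j, m) * x) % m
```

Iterating the generator 10000 times from seed 1 and calling jump_ahead(1, 10000) reach the same state, which is what makes skipping ahead in a Lehmer sequence cheap.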
2.1.3 Mersenne-Twister

These two types of generators are in the big family of matrix linear congruential generators (cf. L'Ecuyer (1990)). But so far, no algorithm exploits the binary structure of computers (i.e. uses binary operations). In 1994, Matsumoto and Kurita invented the TT800 generator using binary operations. But Matsumoto & Nishimura (1998) greatly improved the use of binary operations and proposed a new random number generator called Mersenne-Twister.

Matsumoto & Nishimura (1998) work on the finite set $N_2 = \{0, 1\}$, so a variable $x$ is represented by a vector of $\omega$ bits (e.g. 32 bits). They use the following linear recurrence for the $(n + i)$th term:

$$x_{i+n} = x_{i+m} \oplus (x_i^{upp} | x_{i+1}^{low}) A,$$

where $n > m$ are constant integers, $x_i^{upp}$ (respectively $x_i^{low}$) means the upper $\omega - r$ (lower $r$) bits of $x_i$ and $A$ is an $\omega \times \omega$ matrix of $N_2$. $|$ is the operator of concatenation, so $x_i^{upp} | x_{i+1}^{low}$ appends the upper $\omega - r$ bits of $x_i$ to the lower $r$ bits of $x_{i+1}$. After a right multiplication with the matrix $A$¹, $\oplus$ adds the result to $x_{i+m}$ bit by bit modulo two (i.e. $\oplus$ denotes the exclusive-or, called xor).

Once provided with an initial seed $x_0, \dots, x_{n-1}$, Mersenne-Twister produces random integers in $0, \dots, 2^{\omega} - 1$. All operations used in the recurrence are bitwise operations, thus it is a very fast computation compared to the modulus operations used in previous algorithms.

To increase the equidistribution, Matsumoto & Nishimura (1998) added a tempering step:

$$y_i \leftarrow x_{i+n} \oplus (x_{i+n} >> u),$$
$$y_i \leftarrow y_i \oplus ((y_i << s) \,\&\, b),$$
$$y_i \leftarrow y_i \oplus ((y_i << t) \,\&\, c),$$
$$y_i \leftarrow y_i \oplus (y_i >> l),$$

where $>> u$ (resp. $<< s$) denotes a rightshift (leftshift) of $u$ ($s$) bits and $\&$ denotes the bitwise AND with the masks $b$ and $c$. At last, we transform random integers to reals with one of the output functions $g$ proposed above.

Details of the order of the successive operations used in the Mersenne-Twister (MT) algorithm can be found on page 7 of Matsumoto & Nishimura (1998). However, the main point to retain is that all these (bitwise) operations can be easily done in many computer languages (e.g. in C), ensuring a very fast algorithm.

The set of parameters used is

• $(\omega, n, m, r) = (32, 624, 397, 31)$,
• $a$ = 0x9908B0DF, $b$ = 0x9D2C5680, $c$ = 0xEFC60000,
• $u = 11$, $l = 18$, $s = 7$ and $t = 15$.

These parameters ensure a good equidistribution and a period of $2^{n\omega - r} - 1 = 2^{19937} - 1$.

The great advantages of the MT algorithm are a far longer period than any previous generator (greater than the period of the Park & Miller (1988) sequence, $2^{31} - 2$, or the period of Knuth (2002), around $2^{129}$), a far better equidistribution (since it passed the DieHard tests) as well as a very good computation time (since it uses binary operations and not the costly modulus operation).

The MT algorithm is already implemented in R (function runif). However, the package randtoolbox provides functions to compute a new version of Mersenne-Twister (the SIMD-oriented Fast Mersenne Twister algorithm) as well as the WELL (Well Equidistributed Long-period Linear) generator.

¹ Matrix $A$ equals $\begin{pmatrix} 0 & I_{\omega-1} \\ \multicolumn{2}{c}{a} \end{pmatrix}$, whose right multiplication can be done with a bitwise rightshift operation and an addition with the integer $a$. See section 2 of Matsumoto & Nishimura (1998) for explanations.

2.1.4 Well Equidistributed Long-period Linear generators

The MT recurrence can be rewritten as

$$x_i = A x_{i-1},$$

A usual measure of uniformity is the sum of dimension gaps

$$\Delta_1 = \sum_{l=1}^{\omega} \delta_l.$$

Panneton et al.
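The Mersenne-Twister recurrence and tempering step of section 2.1.3 can be put together as follows. This is a sketch under stated assumptions rather than the reference implementation: the seeding routine seed_state (with multiplier 1812433253) and the default seed 5489 are taken from the authors' 2002 reference code and are not part of the text, all function names are hypothetical, and the masks b and c are applied with a bitwise AND as in Matsumoto & Nishimura (1998).

```python
W, N, M, R = 32, 624, 397, 31            # (omega, n, m, r)
A = 0x9908B0DF                           # last row of the matrix A
U, L, S, T = 11, 18, 7, 15               # tempering shifts
B, C = 0x9D2C5680, 0xEFC60000            # tempering masks
MASK32 = 0xFFFFFFFF
UPPER = 0x80000000                       # upper omega - r = 1 bit
LOWER = 0x7FFFFFFF                       # lower r = 31 bits


def seed_state(seed):
    """Fill x_0, ..., x_{n-1} from one integer (assumed 2002 seeding rule)."""
    x = [seed & MASK32]
    for i in range(1, N):
        prev = x[-1]
        x.append((1812433253 * (prev ^ (prev >> 30)) + i) & MASK32)
    return x


def twist(x):
    """Apply x_{i+n} = x_{i+m} xor (x_i^upp | x_{i+1}^low) A, in place."""
    for i in range(N):
        y = (x[i] & UPPER) | (x[(i + 1) % N] & LOWER)
        # Right multiplication by A: shift right, xor with a if the low bit is set.
        y_a = (y >> 1) ^ (A if y & 1 else 0)
        x[i] = x[(i + M) % N] ^ y_a


def temper(y):
    """Tempering step applied to one 32-bit word."""
    y ^= y >> U
    y ^= (y << S) & B
    y ^= (y << T) & C
    y ^= y >> L
    return y & MASK32


def mt19937(seed=5489):
    """Yield tempered 32-bit integers indefinitely."""
    x = seed_state(seed)
    while True:
        twist(x)
        for word in x:
            yield temper(word)


def mt19937_uniform(seed=5489):
    """Yield reals in [0, 1[ using the output function g(x) = x / 2^32."""
    for y in mt19937(seed):
        yield y / 2**32
```

Only shifts, masks and xor appear in twist and temper, which is the property the text credits for the speed of the algorithm.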