<<

A Mathematical formulation of the

Hiroshi SUGITA

Department of Mathematics, Graduate School of Science, Osaka University

1 Introduction

Although admitting that the Monte Carlo method has been producing practical results in many fields, a number of researchers have the following serious suspicion about it.

The Monte Carlo method needs random numbers. But since no computer program can generate them, we execute a Monte Carlo method using a pseu- dorandom number generated by a computer program. Of course, such an expedient can never have any mathematical justification.

As a matter of fact, this suspicion is a misunderstanding caused by prejudice. In this article, we propose a proper mathematical formulation of the Monte Carlo method, which resolves this suspicion. The mathematical definitions of the notions of random number and pseudorandom number have been known for quite some time. Indeed, the notion of random number was defined in terms of computational complexity by Kolmogorov and others in 1960’s ([3, 4, 6]), while the notion of pseudorandom number was defined in the context of by Blum and others in 1980’s ([1, 2, 14]). But unfortunately, those definitions have been thought to be useless until now in understanding and executing the Monte Carlo method. It is not because their definitions are improper, but because our way of thinking about the Monte Carlo method has been improper. In this article, we propose a mathematical formulation of the Monte Carlo method which is based on and theoretically compatible with those definitions of random number and pseudorandom number. Under our formulation, as a result, we see the following.

The Monte Carlo method may not need random numbers; pseudorandom numbers may suffice. As a matter of fact, for the purpose of the Monte Carlo integration, there exist pseudorandom numbers which can serve as complete substitutes for random numbers.

The Monte Carlo integration is a numerical integration method making use of the law of large numbers. Since most of important applications of the Monte Carlo method are actually Monte Carlo integrations, the above fact is very significant.

∗ Ver.20151111 / This article is a digest of [11]. For details, see the full version [11]. 2 Overview

The Monte Carlo method is a numerical method to solve mathematical problems by computer-aided sampling of random variables. For each individual problem, we set up a probability space (Ω, F , P) — in this article, for simplicity, we assume it to be a proba- bility space of finite coin tosses

L {0,1}L + ({0, 1} , 2 , PL = the uniform probability measure), L ∈ N ,

— and a random variable S defined on it; S : {0, 1}L → R. We evaluate S (ω) by a computer for some chosen ω ∈ {0, 1}L, which procedure is called sampling.

Figure 1: Distribution and generic value of S (Conceptual figure)

an exceptional value a generic value

The Monte Carlo method is a kind of gambling as its name indicates. The aim of the player, say Alice, is to get a generic value (Figure 1) — a typical value or not an exceptional value — of S by sampling (§ 3.1). Suppose that she has no idea about which ω she should choose to get a generic value of S . She chooses an ω ∈ {0, 1}L of her own will, realizing the risk of getting an exceptional value of S . Her risk is measured by the probability PL(A) of a set { } A := ω ∈ {0, 1}L S (ω) is an exceptional value of S .

†1 Since S seldom takes exceptional values, we have PL(A) ≪ 1, and hence, Alice may well think that she will almost certainly get a generic value of S . Of course she will, when L is small. But when L ≫ 1, say L = 108, it is not sure that she will even though PL(A) ≪ 1. In case L ≫ 1, how Alice chooses an ω ∈ {0, 1}L comes into question. Indeed, to specify an ω ∈ {0, 1}L, when L = 108 for example, she cannot help using a computer because of its huge amount of information; ω is approximately a 12MByte data. Obvi- ously, a 12MByte data is too huge for her to input directly from a keyboard to a computer. So, she needs some device. But whatever device she may use, those ω’s∈ {0, 1}L she can choose of her own will are very limited (§ 4). By this reason, even though PL(A) ≪ 1, we cannot say that Alice is able to get a generic value of S almost certainly. Therefore, to let †1a ≪ b stands for “a is much less that b”, while a ≫ b stands for “a is much greater than b ”.

2 the risk evaluation PL(A) have a substantial meaning, it is natural to think that Alice needs an ω ∈ {0, 1}L which she cannot choose of her own will — namely, a random number.†2 On the other hand, S in an actual question must not be an arbitrary random variable, but one which represents some significant quantity. Thus S is very special among all the functions {0, 1}L → R. Therefore Alice may possibly be able to get a generic value of the special random variable S for a special ω — not a random number — which she can choose of her own will. It is that is planned to realize this idea. A pseudorandom generator is a mapping which stretches short {0, 1}-sequences into long {0, 1}-sequences. For example, suppose that Alice uses a pseudorandom generator g : {0, 1}n → {0, 1}L, n < L, to sample S . First, she chooses a seed ω′ ∈ {0, 1}n of her own will. Here n should be so small that she may input ω′ directly from a keyboard to a computer. Then the computer generates a pseudorandom number g(ω′) ∈ {0, 1}L from her seed ω′, and she finally gets a sample S (g(ω′)) of S . As a result, she is now betting on whether S (g(ω′)) is a generic value of S or not. Her risk is now measured by the ′ probability Pn (g(ω ) ∈ A). This probability naturally depends on g, but if there exists a ′ pseudorandom generator g such that Pn(g(ω ) ∈ A) ≪ 1 holds, Alice can actually get a generic value of S with high probability. Such a g is said to be secure against A (§ 5.2). The problem of sampling in a Monte Carlo method is solved by finding such a secure pseudorandom generator. For a general random variable S , although there are many candidates, there is no pseudorandom generator which is proved to be secure against the set of those ω’s ∈ {0, 1}L that yield exceptional values of S (§ 5.3). However, if we restrict the use of pseudorandom generator to the Monte Carlo integration, i.e., if S in question is a sample mean of i.i.d.†3 random variables, there exists a pseudorandom generator which is secure against the set of those ω’s ∈ {0, 1}L that yield exceptional values of S . Such a pseudorandom generator has already been used in practice for not too large Monte Carlo integrations (§ 6.2).

3 Monte Carlo method as gambling

Mathematical problems should be solved by sure methods, if it is possible. But some problems such as extremely complicated ones or those that lack a lot of information can only be solved stochastically. Those treated by the Monte Carlo method are such prob- lems.

3.1 Player’s aim As a mathematical formulation, it is proper to think the Monte Carlo method as stochastic game, i.e., gambling. The aim of the player, Alice, is to get a generic value of a given random variable. (The mathematical problem in question is assumed to be solved by a generic value of the random variable. See examples below.) Of course, Alice has a risk to get an exceptional value, which risk should be measured in terms of probability. The following is a very small example of the Monte Carlo method (without computer).

†2By this reason, some people use physically generated random numbers. But this article deals with only solutions by pseudorandom generators. †3i.i.d. stands for independently identically distributed.

3 Example 1 An urn contains 100 balls, 99 of which are numbered r and one of which is numbered r + 1. Alice draws a ball from the urn, and guesses the number r to be the number of her ball. The probability that she fails to guess the number r correctly is 1/100.

If we state this example in terms of gambling, it goes that “Alice wins if she draws a generic ball, i.e., a ball numbered r, and loses otherwise. The probability that she loses is 1/100.” In general, the player cannot know whether the aim has been attained or not even after the sampling. Indeed, in the above example, although the risk is measured, Alice cannot tell if her guess is correct, even after she draws a ball.†4 There are some cases where exceptional values are needed. For example, we of- ten seek for the minimum value of a complicated random variable X by a Monte Carlo method. In such a case, we define another random variable S so that a generic value of S is an exceptional value of X. Look at the following example.

Example 2 Suppose that Pr(X < c) = 1/10000. So X can be less than c, but the { }40000 probability is very small. Take a sequence Xk k=1 of independent copies of X, and †5 define S := min1≤k≤40000 Xk. Then we have ( ) 1 40000 Pr(S < c) = 1 − 1 − ≈ 1 − e−4 = 0.981 .... 10000

Thus S takes a value less than c with high probability 0.98.

3.2 An exercise To implement Example 1, we need only an urn and 100 numbered balls and nothing else. But actual Monte Carlo methods are implemented on such large scales that we need high- powered computers. In this article, we are going to solve the following exercise by a Monte Carlo method.

Exercise When we toss a coin 100 times, what is the probability p that Heads comes up at least 6 times in succession?

We apply the interval estimation in mathematical statistics. Repeat independent trials of “100 coin tosses” N times, and let S N be the number of the occurrences of “Heads comes up at least 6 times in succession” among the trials. Then by the law of large numbers, the sample mean S N/N is a good estimator for p when N is large. More concretely;

6 6 Example 3 Let N := 10 = 1, 000, 000. Then the mean and the variance of S 106 /10 are [ ] [ ] S 6 S 6 p(1 − p) 1 E 10 = p, V 10 = ≤ , 106 106 106 4 · 106

†4Many selections of our life are certainly gambles. It is often the case where we do not know whether the selection was correct or not . . . . †5 x ≈ y means that x and y are approximately equal to each other.

4 respectively. Hence by Chebyshev’s inequality, ( )

S 106 1 1 1 Pr − p ≥ ≤ · 2002 = (1) 106 200 4 · 106 100

6 holds. In other words, a generic value of S 106 /10 is an approximate value of p.

In Example 3, we may think that the inequality (1) measures the risk. Of course, we do not toss a coin 100 × 106 = 108 times, instead we use a computer. In order to formulate things mathematically, we realize the random variable S 106 of Ex- 108 Ω ample 3 on the coin tossing probability space (Ω := {0, 1} , 2 , P108 ) as follows. We first define a function X : {0, 1}100 → {0, 1} by

∏l+5 100 X(ξ1, . . . , ξ100):= max ξi, (ξ1, . . . , ξ100) ∈ {0, 1} . 1≤l≤100−5 i=l

This means that X = 1 if there are 6 successive 1’s in (ξ1, . . . , ξ100) and X = 0 otherwise. 108 6 Next we define Xk : {0, 1} → {0, 1}, k = 1, 2,..., 10 , by

108 Xk(ω):= X(ω100(k−1)+1, . . . , ω100k), ω = (ω1, . . . , ω108 ) ∈ {0, 1} ,

108 and S 106 : {0, 1} → Z by

∑106 108 S 106 (ω):= Xk(ω), ω ∈ {0, 1} . k=1

Defining A0 of ω’s that yield exceptional values of S 106 by { } ω 108 S 106 ( ) 1 A0 := ω ∈ {0, 1} − p ≥ , (2) 106 200 we have P108 (A0) ≤ 1/100 according to (1). Then we can regard Example 3 as a gamble in the following way.

108 Example 4 Alice chooses an ω ∈ {0, 1} . If ω < A0 holds, she wins, if ω ∈ A0, she loses. The probability that she loses is less than or equal to 1/100.

4 Problem of random number

There are no theoretical differences between Example 1 and Example 4 except scales. But the difference of scales yields an essential difference in practice. Now, suppose that Alice is going to choose an ω ∈ {0, 1}108 to play the gamble of Example 4. But as a matter of fact, those ω’s ∈ {0, 1}108 she can choose of her own will are very few and hence limited. Consequently, even though P108 (A0) ≪ 1, it is not necessarily easy for her to win the game. Let us look more closely at the situation. Since each element of {0, 1}108 is a very large data, i.e., about 12MByte, Alice cannot help using a computer to choose an ω ∈

5 {0, 1}108 . But it is quite impossible to input it directly from a keyboard to the computer. Suppose that Alice can input at most 1,000 bit data directly from the keyboard, and that a certain computer program generates an element of {0, 1}108 from her input.†6 Then the number of those ω’s ∈ {0, 1}108 that she can choose is at most 21,001. (This is because the number of all the l bit data is 2l, and hence the number of all data of at most l bit is 20 + 21 + 22 + ··· + 2l = 2l+1 − 1.) Since the number of all the elements of {0, 1}108 is 2108 , we see how few those ω’s ∈ {0, 1}108 she can choose are. Even if she were able to input at most 108 − 10 bit data, the number of ω’s ∈ {0, 1}108 she can choose would be at most 2108−9, which is only 1/512 of all the ω’s ∈ {0, 1}108 . In other words, at least 511/512 of them require more than 108 − 9 bit input to be specified. If an ω ∈ {0, 1}L, L ≫ 1, requires an input which is almost as long as ω itself to be specified, it is called a random number.†7 To specify a random number, there is no more efficient way than inputting it as it is. When L ≫ 1, most of elements of {0, 1}L are random numbers. The risk evaluation P108 (A0) ≤ 1/100 of Example 4 assumes that Alice can choose any ω ∈ {0, 1}108 with equal probability. This means that she should choose it among random numbers, because they account for nearly all sequences. This is the reason why random number is needed for the Monte Carlo method. However, although there are so many random numbers, she cannot choose any one of them of her own will. This is the most essential problem of sampling in the Monte Carlo method.

5 Pseudorandom generator

Pseudorandom generator is a device to get generic values of random variables with high probability without using random numbers. An important property that pseudorandom generators should have is discussed here.

5.1 Definition and role To play the gamble of Example 4, anyhow, Alice has to choose an ω ∈ {0, 1}108 . Let us suppose that she uses the most used device to do it, namely, a pseudorandom generator.

Definition 5 A function g : {0, 1}n → {0, 1}L is called a pseudorandom generator if n < L. The input ω′ ∈ {0, 1}n of g is called a seed,†8 and the output g(ω′) ∈ {0, 1}L a pseudorandom number.

To produce a pseudorandom number, we need to choose a seed ω′ ∈ {0, 1}n of g : {0, 1}n → {0, 1}L, which procedure is called initialization.†9 For practical use, n should be so small that we may input any seed ω′ ∈ {0, 1}n directly from a keyboard, and the program of the function g should work sufficiently fast.

†6Such a computer program is called a pseudorandom generator. †7In the IT terminology, if an ω ∈ {0, 1}L can be specified by a shorter input ω′ ∈ {0, 1}n, L > n, we say that ω is compressed into ω′. Thus a random number is an incompressible element of {0, 1}L. †8We also call it an initial value. †9It is also called randomization.

6 Example 6 In Example 4, suppose that Alice uses a pseudorandom generator g : {0, 1}238 → {0, 1}108 , for instance.†10 She chooses a seed ω′ ∈ {0, 1}238 of g and inputs it from a keyboard to a computer. Since ω′ is only a 238 bit data (≈ 30 letters of alpha- ′ bet), it is easy to input from a keyboard. Then the computer produces S 106 (g(ω )).

The reason why Alice uses a pseudorandom generator is because her input ω ∈ {0, 1}108 is too large. If it is short enough, a pseudorandom generator is not necessary. For example, when drawing a ball from the urn in Example 1, who on earth uses a pseu- dorandom generator ?

5.2 Security Let us continue to consider the case of Example 6. Alice can choose any seed ω′ ∈ {0, 1}238 of the pseudorandom generator g freely of her own will. Her risk is measured by ( ) ω′ S 106 (g( )) 1 P238 − p ≥ , (3) 106 200 which we need to calculate. Of course, the probability (3) depends on g. If this probability ′ — i.e., the probability that her sample S 106 (g(ω )) is an exceptional value of S — is large, then it is difficult for her to win the game, which is not desirable. So we give the following (somewhat vague) definition; we say that a pseudorandom generator g : {0, 1}n → {0, 1}L, n < L, is secure against a set A ⊂ {0, 1}L if it holds that

′ PL(ω ∈ A) ≈ Pn(g(ω ) ∈ A).

238 108 In Example 6, if g : {0, 1} → {0, 1} is secure against A0 of (2), for the majority of the seeds ω′ ∈ {0, 1}238, which Alice can choose of her own will, the samples S (g(ω′)) will be generic values of S . In this case, random numbers are not necessary. In other words, in sampling a value of S , using g does not make Alice’s risk big, and so g is said to be secure. The problem of sampling in each Monte Carlo method — i.e., the problem of random number — will be resolved by finding a suitable secure pseudorandom generator. In general, such a pseudorandom generator that is secure against very many sets A’s is desirable. But there is no pseudorandom generator that is against every set A. Indeed, for a given pseudorandom generator g : {0, 1}n → {0, 1}L, set

n L Ag := g({0, 1} ) ⊂ {0, 1} .

n−L ′ Then we have PL(ω ∈ Ag) ≤ 2 and Pn(g(ω ) ∈ Ag) = 1, which means that g is not secure against Ag. Thus, when we consider the security of pseudorandom generator, we must restrict a class of sets A’s.

5.3 Computationally secure pseudorandom generator Suppose L ≫ 1, e.g., L = 108. For each set A ⊂ {0, 1}L, let us think about how to judge if a pseudorandom generator g is secure against it. To do this, first of all, we must write

†10The origin of the number 238 will soon be clear in Example 9.

7 a program which judges if a given ω ∈ {0, 1}L belongs to A. Since the number of all the sets A ⊂ {0, 1}L is 22L , the same number of such programs are needed. Then by the same argument in § 4, we know there are very many sets A’s for which we need about 2L bit long programs. Obviously, such long programs cannot be realized by computers in practice. Consequently, it is practically impossible to judge if ω ∈ A for most of A’s⊂ {0, 1}L. A pseudorandom generator g needs to be secure against only A for which ω ∈ A is practically judge-able by small amount of computations. Such a g is said to be compu- tationally secure (or cryptographically secure). In cryptography, however, the security of pseudorandom generator is not defined in terms of space complexity (i.e., length of program, as is discussed above), but it is defined in terms of time complexity (i.e., CPU time).†11 In Example 6, if the pseudorandom generator g : {0, 1}238 → {0, 1}108 is computa- ′ tionally secure, the distributions of S 106 (ω) and S 106 (g(ω )) will be so close to each other. Indeed, the fact that the function S 106 can be computed in practice means that for any c1 < c2 ∈ N, the set A(c1, c2):= {ω | c1 ≤ S 106 (ω) ≤ c2} is a computationally judge-able set. Therefore whatever p′ may be, the computational security of g implies that ( ) ( ) ω ω′ S 106 ( ) − ′ ≥ 1 ≈ S 106 (g( )) − ′ ≥ 1 . P108 p P238 p 106 200 106 200 Computationally secure pseudorandom generator is, theoretically speaking, the most complete multi-purpose pseudorandom generator. But, its existence has not been proved yet, which is one of the hardest problems in computer science. Moreover, the notion of computational security is, rigorously speaking, an asymptotic property, and hence, it is not clear that a computationally secure pseudorandom generator is surely useful for a practical use.

6 Monte Carlo integration

Let X be a function of m coin tosses, i.e., X : {0, 1}m → R, and let us consider to calculate the mean ∑ 1 E[X] = X(ξ) 2m ξ∈{0,1}m of X numerically. When m is small, we can directly calculate the finite sum of the right hand side. But when m is large, e.g., m = 100, the direct calculation becomes impossible in practice because of the huge amount of computation. In such a case, we estimate the mean of X applying the law of large numbers, which is called the Monte Carlo integration (Example 3). Most of scientific Monte Carlo methods aim at calculating some character- istics of distributions of random variables, which are actually Monte Carlo integrations.

6.1 i.i.d.-sampling { }N If we formulate Example 3 in a general setting, it goes as follows. Let Xk k=1 be a { , }m → R { }N sequence of independent copies of X : 0 1 , and S N be their sum. Namely, Xk k=1 †11Since a short program can take time if it includes many loops, it is practically significant to define the security in terms of time complexity.

8 and S N are functions of Nm coin tosses, which are written down as m Nm Xk(ω):= X(ωk), ωk ∈ {0, 1} , ω = (ω1, . . . , ωN) ∈ {0, 1} , (4) ∑N S N(ω):= Xk(ω). (5) k=1

Then if N is large enough, a generic value of S N/N becomes an approximated value of E[X] by the law of large numbers. The estimation of the mean of X by sampling the random variable S N/N is called the i.i.d.-sampling. Since the means of S N/N and X (i.e., the integrations with respect to PNm and Pm, respectively) are the same (i.e., E[S N/N] = E[X]), and their variances satisfy V[S N/N] = V[X]/N, we have the following inequality due to Chebyshev ;†12 ( ) ω S N( ) V[X] PNm − E[X] ≥ δ ≤ . (6) N Nδ2 This is the risk measurement of the following gamble; when Alice chooses an ω ∈ Nm {0, 1} , she wins if ω < A1, and she loses if ω ∈ A1, where { } ω Nm S N( ) A1 := ω ∈ {0, 1} − E[X] ≥ δ . (7) N

6.2 Random Weyl sampling

In the Monte Carlo integration, the object random variable S N to sample has a very special form. Utilizing this fact, we can concretely construct a secure pseudorandom generator to sample it. Before doing it, we need a couple of notations; let

−m m 1 Dm := { i2 | i = 0, ..., 2 − 1 } ⊂ T . (8)

Let Bm be the algebra generated by the collection of sets Im := {[a, b)|a, b ∈ Dm}. Namely, each element of Bm is a finite union of some elements of Im. Let P(m) be the uniform 1 probability measure on Dm. For each m ≥ 1 and each x ∈ T , let m m ⌊x⌋m := ⌊2 x⌋/2 ∈ Dm. Definition 7 (cf. [7, 13]) Let j ∈ N+, and set

′ ′ j+1 Zk(ω ):= ⌊x + kα⌋m ∈ Dm, ω = (x, α) ∈ Dm+ j × Dm+ j, k = 1, 2, 3,..., 2 . Then we define a pseudorandom generator g : {0, 1}2m+2 j → {0, 1}Nm, N ≤ 2 j+1, by

′ ′ ′ ′ N Nm g(ω ):= (Z1(ω ), Z2(ω ),..., ZN(ω )) ∈ (Dm)  {0, 1} , (9) ′ 2m+2 j ω = (x, α) ∈ Dm+ j × Dm+ j  {0, 1} . The numerical integration method using the pseudorandom generator (9) for the sampling †13 of S N is called the random Weyl sampling (abbreviated as RWS). †12In general, as E[X] is unknown, V[X] is unknown, too. In this sense, this risk measurement is not complete. But under some circumstances, an upper bound V[X] can be obtained (e.g., Example 3), the risk measurement then becomes complete. †13RWS presented here is slightly improved from the one introduced by [7, 13].

9 Theorem 8 The pseudorandom generator g : {0, 1}2m+2 j → {0, 1}Nm of (9) satisfies that for S N of (5), ′ E[S N(g(ω ))] = E[S N(ω)] (= NE[X]), ′ V[S N(g(ω ))] = V[S N(ω)] (= NV[X]). Here ω′ and ω are assumed to be distributed uniformly in {0, 1}2m+2 j and in {0, 1}Nm, respectively. From this, as is seen in (6), Chebyshev’s inequality ( ) ω′ ′ S N(g( )) V[X] P2m+2 j(g(ω ) ∈ A1) = P2m+2 j − E[X] ≥ δ ≤ N Nδ2 follows. In this sense, g is secure against A1 of (7).

Proof. Step 1. (cf. [7, 13]) Under the uniform (direct product) probability measure P(m+ j)⊗ × { }2 j+1 P(m+ j) on Dm+ j Dm+ j, we show that Zk k=1 are pairwise independent, and that each Zk is distributed uniformly in Dm. To this end, we prove that for any Bm-measurable functions F, G : T1 → R and for any 1 ≤ k < k′ ≤ 2 j+1, it holds that ∫ ∫ 1 1 E[F(Zk′ )G(Zk)] = F(t)dt G(s)ds. (10) 0 0

Here E denotes the mean under P(m+ j) ⊗ P(m+ j). By the Bm-measurability of F and G, we have

+ + ( ) ( ) ∑2m j ∑2m j ′ 1 p k q p kq E[F(Z ′ )G(Z )] = F + G + k k 22m+2 j 2m+ j 2m+ j 2m+ j 2m+ j q=1 p=1 + m+ j ( ) ∑2m j 2 ∑+kq ′ ( ) 1 p (k − k)q p = F + G 22m+2 j 2m+ j 2m+ j 2m+ j q=1 p=1+kq + + ( ) ∑2m j ∑2m j ′ ( ) 1 p (k − k)q p = F + G . (11) 22m+2 j 2m+ j 2m+ j 2m+ j q=1 p=1

Now, let us assume that 0 < k′ − k = 2il ≤ 2 j+1 − 1, where 0 ≤ i ≤ j and l is an odd number. Then we have

+ ( ) + ( ) ∑2m j ′ ∑2m j 1 p (k − k)q 1 p lq F + = F + . (12) 2m+ j 2m+ j 2m+ j 2m+ j 2m+ j 2m+ j−i q=1 q=1

m+ j−i m+ j−i For each r = 1, 2, 3,..., 2 , there exists a qr such that lqr ≡ r (mod 2 ). Since l is odd,

#{1 ≤ q ≤ 2m+ j | lq ≡ r (mod 2m+ j−i)} m+ j m+ j−i = #{1 ≤ q ≤ 2 | lq ≡ lqr (mod 2 )} m+ j m+ j−i = #{1 ≤ q ≤ 2 | l(q − qr) ≡ 0 (mod 2 )} m+ j m+ j−i = #{1 ≤ q ≤ 2 | q ≡ qr (mod 2 )} = 2i.

10 From this, it follows that + ( ) + − ∑2m j 2∑m j i ( ) 1 p lq 1 p r F + = F + 2m+ j 2m+ j 2m+ j−i 2m+ j−i 2m+ j 2m+ j−i q=1 r=1 2∑m+ j−i ( ) 1 r = F 2m+ j−i 2m+ j−i ∫ r=1 1 = F(t)dt. (13) 0 By (11), (12) and (13), we see   + + ( ) ∑2m j  ∑2m j ′  ( ) 1  1 p (k − k)q  p ′ =  +  E[F(Zk )G(Zk)] m+ j  m+ j F m+ j m+ j  G m+ j 2 = 2 = 2 2 2 p 1  q 1  + + ( ) ∑2m j  ∑2m j  ( ) 1  1 p lq  p =  F +  G 2m+ j 2m+ j 2m+ j 2m+ j−i 2m+ j p=1 q=1 ∫ ∑2m+ j ( ) ∫ ∫ 1 1 p 1 1 = F(t)dt · G = F(t)dt G(s)ds. 2m+ j 2m+ j 0 p=1 0 0 Thus (10) is proved. ′ m Step 2. First, since each Zk(ω ) is distributed uniformly in {0, 1} , we see ′ E[S N(g(ω ))] = NE[X]. Next, the pairwise independence implies that    2 ∑N ( )  ω′ =  ω′ −   V[S N(g( ))] E  X(Zk( )) E[X]   k=1 ∑N ∑N [( ′ )( ′ )] = E X(Zk(ω )) − E[X] X(Zk′ (ω )) − E[X] k=1 k′=1 ∑N [( ) ] ′ 2 = E X(Zk(ω )) − E[X] k=1 ∑ [( ′ )( ′ )] +2 E X(Zk(ω )) − E[X] X(Zk′ (ω )) − E[X] 1≤k 106 = N and 2m + 2 j = 238. In a practical Monte Carlo integration, the sample size N is not determined in advance, but it is usually determined in doing numerical experiments. To be ready for such situations, it is a good idea to let j be a little on the big side.

11 Since Alice can easily choose any seed ω′ ∈ {0, 1}238 of her own will, she no longer needs a long random number. Here is a concrete example. Instead of her, the author chose the ′ 238 following seed ω = (x, α) ∈ D119 × D119  {0, 1} written in dyadic expansion;

x = 0.1110110101 1011101101 0100000011 0110101001 0101000100 0101111101 1010000000 1010100011 0100011001 1101111101 1101010011 111100100, α = 0.1100000111 0111000100 0001101011 1001000001 0010001000 1010101101 1110101110 0010010011 1000000011 0101000110 0101110010 010111111.

′ Then the computer calculated as S 106 (g(ω )) = 546, 177. In this case, ′ S 6 (g(ω )) 10 = 0.546177 106 is the estimated value of the probability p.

Remark 10 When Alice executes RWS, we can advise her a little in choosing a seed ω′ = (x, α) ∈ {0, 1}2m+2 j. That is, she should not choose a particularly simple α. Indeed, if she chooses an extremely simple one, such as α = (0, 0,..., 0) ∈ {0, 1}m+ j, the sampling will certainly end in failure.

Remark 11 In case 2m + 2 j ≫ 1, the problem of random number again prevents Alice even from choosing a seed ω′ ∈ {0, 1}2m+2 j of RWS of her own will. In such a case, she is forced to choose it by an auxiliary pseudorandom generator g′ : {0, 1}n → {0, 1}2m+2 j. Then we do not know whether or not the composite generator g◦g′ : {0, 1}n → 2m+2 j Nm {0, 1} → {0, 1} is secure against A1 of (7).

Example 12 Let us apply a pairwise independent sampling such as RWS to search the minimum value of the random variable X which was dealt with in Example 2. Suppose < = / ′ , ′ ,..., ′ that Pr(X c) 1 10000∑ and that X1 X2 X40000 are pairwise independent copies of ′ 40000 X. Then setting S := = 1{ ′ < }, we have k 1 Xk c ( ) ′ ′ 1 1 E[S ] = 4, V[S ] = 40000V[1{X′

Therefore Chebyshev’s inequality implies that ( ) 4 3 Pr(S ′ ≥ 1) ≥ Pr |S ′ − 4| < 4 ≥ 1 − = . 42 4 Thus the random variable ′ min Xk 1≤k≤40000 takes values less than c with probability at least 3/4.

12 7 From viewpoint of mathematical statistics

7.1 Random sampling We have formulated the Monte Carlo method as gambling and we have considered that the seed ω′ ∈ {0, 1}n of a pseudorandom generator g is chosen by Alice of her own will. But from the viewpoint of mathematical statistics, this is not a good formulation, because sampling should be done randomly in order to guarantee the objectivity of the result. Indeed, in the case of RWS, as we mentioned in Remark 10, Alice can choose a bad seed on purpose, i.e., the result may depend on the player’s will. Of course, it is impossible to discuss the objectivity of sampling rigorously. We here simply assume that Heads or Tails of coin tosses do not depend on anyone’s will. Then, for instance, when we choose a seed ω′ ∈ {0, 1}n, we toss a coin n times, record 1 if Heads comes up and 0 if Tails does at each coin toss, define ω′ as the recorded {0, 1}-sequence, and finally compute S (g(ω′)), which completes an objective sampling. As a matter of fact, Example 9 was performed in this way. This method cannot be used to choose a very long ω ∈ {0, 1}L. The point is that the pseudorandom generator g : {0, 1}n → {0, 1}L makes the input shorter so that this method may become executable.

7.2 Test for pseudorandom generator Many statistical tests for pseudorandom generators have been done in the following man- ner. Let g : {0, 1}n → {0, 1}L be a pseudorandom generator to test.

(1) Decide what test to do. (Test of run, Poker test, . . . )

(2) For randomly chosen seeds ω′ ∈ {0, 1}n, generate pseudorandom numbers g(ω′) by g, calculate the portion of rejected ones.

(3) If the portion of rejected ones is close to the significance level of the test, accept g, if it is much greater than the significance level, reject g.

In the above procedure, if we let A denote the rejection region of the test chosen at (1), ′ what is done at (2) is the estimation of the probability Pn(g(ω ) ∈ A). And at (3), we check if it is close to the significance level PL(ω ∈ A). As a result, this test can be said to be a test for the security of g against A.

8 Concluding remarks

As is seen in this article, needless to say, secure pseudorandom generators are useful. Here we mention about the relation between random number and probability theory. In Kolmogorov’s modern probability theory, a random quantity is expressed as a ran- L {0,1}L dom variable X, which is a function defined on a probability space, say ({0, 1} , 2 , PL), i.e., X : {0, 1}L → R. We think X random by the following interpretation; an ω ∈ {0, 1}L is randomly chosen and as a result X(ω) becomes random. But in probability theory, we

13 always deal with X as just a function and we never mind how ω is chosen. Since ran- domness does lie in the process how ω is chosen, probability theory does not mind what is. Suffering from Parkinson’s disease in his later years, Kolmogorov devoted himself to the question “What is randomness?”, which his probability theory had been avoiding. Finally, he answered the question, by establishing the notion of random number. Theo- retically, we do not need the notion of random number in studying probability theory, but recognizing it brings us deep understanding of not only the Monte Carlo method but also probability theory itself. Let us explain it below. According to Kolmogorov, studying randomness is equivalent to studying random number. The existence of random numbers becomes prominent, only when the sample space {0, 1}L in question is huge. We therefore know that it is important to study the case L ≫ 1. So, let us suppose L ≫ 1. Then, we cannot choose any one of random numbers in {0, 1}L of our own will, although they account for nearly all sequences. This means that L assuming the uniform probability measure PL implies that ω ∈ {0, 1} is not assumed to be chosen by anyone’s will, but by some method beyond man’s will, i.e., at random. Thus L {0,1}L the probability space ({0, 1} , 2 , PL) provides a framework to study randomness. It is difficult to get any knowledge about individual random numbers. But since ran- dom numbers account for nearly all sequences, it is a good idea to study characteristic properties that nearly all sequences share. Such properties have been minutely studied in probability theory — properties described in various limit theorems, such as law of large numbers, central limit theorems, etc. Probably, limit theorems are the only mathematical formulation that enables us to investigate randomness concretely. This explains why limit theorems are so much studied in probability theory. What is really amazing is that since a long time before the discovery of the notion of random number, the probabilists of great insight had recognized the importance of limit theorems and had made a lot of efforts to study them.

References

[1] L. Blum, M. Blum and M. Shub, A simple unpredictable pseudorandom number generator, SIAM J. Comput., 15-2 (1986), 364–383.

[2] M. Blum and S. Macali, How to generate cryptographically strong sequences of pseudorandom bits, SIAM J. on Computing, vol. 13, (1984) 850–864. A preliminary version appears in Proceedings of the IEEE Foundations of Comput. Sci. (1982), 112–117.

[3] G.J. Chaitin, Algorithmic information theory, IBM J. Res. Develop., 21 (1977), 350– 359.

[4] A.N. Kolmogorov, Selected works of A. N. Kolmogorov. Vol. III. Information theory and the theory of algorithms, Edited by A. N. Shiryayev [A. N. Shiryaev]. Translated from the 1987 Russian original by A. B. Sossinsky. Mathematics and its Applica- tions (Soviet Series), 27. Kluwer Academic Publishers Group, Dordrecht, (1993) xxvi+275 pp.

14 [5] M. Luby, Pseudorandomness and cryptographic applications, Princeton Computer Science Notes, Princeton University Press, (1996).

[6] P. Martin-Lof,¨ The definition of random sequences, Inform. Control, 9 (1966), 602– 619.

[7] H. Sugita, Robust numerical integration and pairwise independent random variables, Jour. Comput. Appl. Math., 139 (2002), 1–8.

[8] H. Sugita, Dynamic random Weyl sampling for drastic reduction of randomness in Monte Carlo integration, Math. Comput. Simulation, 62 (2003), 529–537.

[9] H. Sugita, Numerical integration of complicated functions and random sampling (in Japanese), “Sugaku”, 56-1, Iwanami-shoten (2004), 1–17.(English translation, SUGAKU EXPOSITIONS, 19-2, AMS, December 2006, 153–169.)

[10] H. Sugita, Security of Pseudo-random Generator and Monte Carlo Method, Monte Carlo Methods and Appl., 10-3, VSP, (2004), 609–615.

[11] H. Sugita, Monte Carlo method, Random number, and Pseudorandom number, MSJ Memoirs vol.25 (2011), xiv+133 pp.

[12] H. Sugita, The Random Sampler, available at; http://www.math.sci.osaka-u.ac.jp/˜sugita/mathematics.html.

[13] H. Sugita and S. Takanobu, Random Weyl sampling for robust numerical integration of complicated functions, Monte Carlo Methods and Appl., 6-1, VSP, (1999), 27–48.

[14] A. Yao, Theory and applications of trapdoor functions, Proceedings of the IEEE Foundations of Comput. Sci., (1982), 80–91.

15