Monte Carlo Automatic Integration with Dynamic Parallelism in CUDA

Total Page:16

File Type:pdf, Size:1020Kb

Monte Carlo Automatic Integration with Dynamic Parallelism in CUDA Monte Carlo Automatic Integration with Dynamic Parallelism in CUDA Elise de Doncker, John Kapenga and Rida Assaf Abstract The rapidly evolving CUDA environment is well suited for numerical in- tegration of high dimensional integrals, in particular by Monte Carlo or quasi-Monte Carlo methods. With some care, near peak performance can be obtained on impor- tant applications. A basis for efficient numerical integration using kernels in CUDA GPUs is presented, showing several ways to use CUDA features, provide automatic error control and prevention or detection of roundoff errors. This framework allows easy extension to multiple GPUs, clusters and clouds for addressing problems that were impractical to attack in the past, and is the basis for an update to the PARINT numerical integration package. 1 Introduction Problems in computational geometry, computational physics, computational fi- nance, and other fields give rise to computationally expensive integrals. This chap- ter will address applications of CUDA programming for Monte Carlo integration. A Monte Carlo method approximates the expected value of a stochastic process by sampling, i.e., by performing function evaluations over a large set of random points, and returns the sample average of the evaluations as the end result. The functions that are found in the application areas mentioned above are usually not smooth and may have singularities within the integration domain, which enforces the generation of large sets of random numbers. The algorithms described here will be incorporated in the PARINT multivariate integration package, where we are concerned with the automatic computation of an integral approximation Elise de Doncker, John Kapenga and Rida Assaf Western Michigan University, 1903 W. Michigan Avenue, Kalamazoo MI 49008 e-mail: felise.dedoncker,john.kapenga,[email protected] 1 2 Elise de Doncker, John Kapenga and Rida Assaf Z Q[D] f ≈ I [D] f = f (x) dx; D and an estimate or bound E [D] f for the error E[D] f = jI [D] f − Q[D] f j: Thus PARINT is set up as a black box, to which the user specifies the integration problem including a requested accuracy and a limit on the number of integrand evaluations. In its current form, PARINT supports the computation of multivariate integrals over hyper-rectangular and simplex regions. The user specifies the region D by the upper and lower integration limits in case of a hyper-rectangular region, or by the vertex coordinates of a simplex. The integrand f : D ! Rk is supplied as a function written in C (or which consists of C callable code). The desired accuracy is determined by an absolute and a relative error tolerance, ea and er; respectively, and it is the objective to either satisfy the prescribed accuracy, jQ[D] f − I [D] f j ≤ E [D] f ≤ maxfea;erjI [D] f jg; (1) within the allowed number of function evaluations, or flag an error condition if the limit has been reached. In the remainder of this chapter we will assume that the integrand f is a single-valued function (k = 1), and D is assumed to be the d- dimensional unit hypercube, D = [0;1]d (and will generally be omitted from the notation). The available approaches in PARINT include adaptive, quasi-Monte Carlo and Monte Carlo integration. For N uniform random points xi; the Monte Carlo approx- imation, N ¯ 1 Q f = f = ∑ fi; (2) N i=1 with fi = f (xi); is suitable for moderate to high dimensions d; or an irregular inte- grand behavior or integration domain. In the latter case, the domain can be enclosed in a cube, where the integrand is set to zero outside of D: The main elements of MC methods are the error bounding and the underlying Pseudo-Random Number Generators (PRNGs), which are discussed in Sections 2 and 3, respectively. In particular, Section 3 covers the implementation and use of parallel PRNGs in CUDA. Section 5 describes dynamic parallelism, which became available with CUDA- 5.0, for GPU cards supporting compute capability 3.0 and up. The recursive or “vertical” version in Section 5.1, and the iterative or “horizontal” version in Sec- tion 5.2 are utilized for an efficient GPU implementation of automatic integration in Section 5.3, which alleviates problems with memory restrictions and kernel launch overhead. Section 6 further addresses issues with accuracy and stability of the sum- mations involved in the MC approximation of the result and the sample variance. In Appendix A we include results showing speedups and accuracy for a Feynman loop integral arising in high energy physics [17, 7]. Previously we covered problems from computational geometry in [5]. Monte Carlo Integration with Dynamic Parallelism in CUDA 3 2 MC Convergence In the following we give a derivation of the error estimate for the one-dimensional case [4]; a similar formulation can be given for d > 1: Let x1;x2;::: be random variables drawn from a probability distribution with R ¥ density function m(x); −¥ m(x) dx = 1; and assume that the expected value of f ; R ¥ I = −¥ f (x) m(x) dx exists. For example, x1;x2;::: may be selected at random from a uniform random distribution in [0;1]; m(x) = 1 for 0 ≤ x ≤ 1; and 0 otherwise. According to the Central Limit Theorem, the variance Z ¥ Z ¥ s 2 = ( f (x) − I )2 m(x) dx = f 2(x) m(x) dx − I 2 (3) −¥ −¥ allows for an integration error bound in terms of the probability ls 1 P E f ≤ p = PI + O(p ) (4) N N 2 where PI represents the confidence level PI = p1 R l e−x =2 dx as a function of 2p −l l: For example, PI = 50% for l = 0:6745; PI = 90% for l = 1:645; PI = 95% for l = 1:96; PI = 99% for l = 2:576: In other words we canp state that, e.g., with probability or confidence level of 99% thep error E ≤ 2:567s= N: The behavior of the error bound in (4) represents the 1= N law of MC integration. Note that (4) assumes that s exists, which relies on the integrand being square-integrable, even though that is not required for the convergence of the MC method [4]. To calculate the error estimate we employ the sample variance estimate N N N 2 1 ¯ 2 1 2 ¯ ¯2 s¯ = ∑( fi − f ) = (∑ fi − 2 ∑ fi f + N f ) N − 1 i=1 N − 1 i=1 i=1 N 1 2 ¯2 ¯2 = (∑ fi − 2N f + N f ) N − 1 i=1 N 1 2 ¯2 = (∑ fi − N f ): (5) N − 1 i=1 We also make use of simple antithetic variates, which is known as a variance reduc- tion method, where for each sample point x in (2), an evaluation is also performed at the point x∗ = 1 − x (symmetric with respect to the centroid of the cube) [9]. Thus (2) becomes 1 N f + f ∗ Q f = ∑ i i ; (6) N i=1 2 and we update the variance estimate of (5) as 4 Elise de Doncker, John Kapenga and Rida Assaf N ∗ 2 N ∗ 2! 1 fi + f ∑ ( fi + f ) s¯ 2 = ∑ i − N i=1 i : N − 1 i=1 2 2N In view of (4) we can then use 1 N ∗ 2 N ∗ 2! 2 l fi + fi ∑i=1( fi + fi ) E ≤ p ∑ − N (7) N(N − 1) i=1 2 2N as an error estimate or bound (under certain conditions). There are a number of additional variance reduction techniques that can be ap- plied to Monte Carlo integration in addition to antithetic variates, such as control variates, importance sampling and stratified sampling. These can be included in a numerical integration package, such as PARINT, using the framework presented here, but are not the focus of the chapter and will not be discussed here. 3 Pseudo-Random Number Generators (PRNGs) A pseudo-random number generator (PRNG) is a deterministic procedure to gener- ate a sequence of numbers that behaves like a random sequence and can be tested for adherence to relevant statistical properties. Several PRNG testing packages are available [20, 2, 18]. Marsaglia’s DIEHARD [20, 2] is nowadays being superseded by the more stringent tests in SmallCrush, Crush and BigCrush by L’Equyer [18] in the TESTU01 framework. Whereas function sampling for Monte Carlo integration is considered embarrass- ingly parallel, some applications will involve large sets of evaluations, thus requir- ing efficient high quality random number streams on various multilevel architectures containing distributed nodes, multicore processors and many-core accelerators. In particular, good pseudo-random sequences are required for multiple threads or pro- cesses, and a much larger period of the sequence may be needed in view of much intensified sampling and larger problems. Choosing new PRNGs may also necessi- tate validation of streams of random numbers generated by different methods being used together. The brief overview of parallel PRNGs in this section is based on our AIAA SciTech 2014 paper [6]. Solutions for associating good quality streams of random numbers to many threads can be sought by [19]: (i) employing a PRNG with a long period and gen- erating subsequences started with different seeds; (ii) dividing a random number sequence with a long period into non-overlapping subsequences; and (iii) using dif- ferent PRNGs of the same class with different parameters. Continuous progress has been made in this area, and there are several open source PRNG packages in wide use, such as the Scalable Parallel Random Number Gen- erators Library, (SPRNG) [32]. To assure statistically independent streams of ran- dom numbers, the SPRNG library incorporates particular parametrizations of: ad- Monte Carlo Integration with Dynamic Parallelism in CUDA 5 ditive lagged-Fibonacci, prime modulus linear congruential, and maximal-period shift-register generators.
Recommended publications
  • Analisis Dan Perbandingan Berbagai Algoritma Pembangkit Bilangan Acak
    Analisis dan Perbandingan berbagai Algoritma Pembangkit Bilangan Acak Micky Yudi Utama - 13514011 Program Studi Teknik Informatika Sekolah Teknik Elektro dan Informatika Institut Teknologi Bandung, Jl. Ganesha 10 Bandung 40132, Indonesia [email protected] Abstrak—Makalah ini bertujuan untuk membandingkan dan Terdapat banyak algoritma yang telah dibuat untuk menganalisis berbagai pembangkit bilangan acak yang sudah membangkitkan bilangan acak, seperti Mersenne Twister dan dibangun. Pembangkit bilangan acak memiliki peranan yang luas variannya, HC-256, ChaCha20, ISAAC64, PCG dan dalam berbagai bidang. Oleh karena itu, perlu diketahui variannya, xoroshiro128+, xorshift+, Random123, dll. Setiap kelebihan dan kekurangan dari setiap pembangkit bilangan acak, agar dapat ditentukan algoritma yang akan digunakan pada algoritma tentu memiliki kelebihan dan kelemahan domain tertentu. Algoritma yang akan dianalisis adalah masing-masing. Oleh karena itu, pada makalah ini, akan Mersenne Twister, xoroshiro128+, PCG, SplitMix, dan Lehmer. dilakukan eksperimen dan analisis terhadap beberapa algoritma Untuk masing-masing algoritma, akan dilakukan pengukuran PRNG. Algoritma yang dipilih adalah dua varian dari kualitas statistik dengan TestU01 dan kecepatannya. Selain itu, Mersenne Twister yakni MT19937 dan SFMT dikarenakan akan dianalisis kompleksitas, penggunaan memori, dan periode banyaknya penggunaan Mersenne Twister dalam berbagai dari masing-masing algoritma. aplikasi. Selain itu, dipilih juga xoroshiro128+ dan PCG yang dianggap sebagai dua algoritma terbaik saat ini. Kemudian, Kata kunci—Pembangkit bilangan acak, kelebihan, dipilih SplitMix sebagai salah satu algoritma yang cukup baru kekurangan, TestU01 dan terkenal. Yang terakhir, dipilih Lehmer64 yang merupakan I. PENDAHULUAN pengembangan dari LCG. Tujuan dari makalah ini adalah untuk mengetahui kelebihan Pembangkit bilangan acak merupakan salah satu aplikasi dan kekurangan dari masing-masing algoritma.
    [Show full text]
  • CUDA Toolkit 4.2 CURAND Guide
    CUDA Toolkit 4.2 CURAND Guide PG-05328-041_v01 | March 2012 Published by NVIDIA Corporation 2701 San Tomas Expressway Santa Clara, CA 95050 Notice ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS". NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation. Trademarks NVIDIA, CUDA, and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated. Copyright Copyright ©2005-2012 by NVIDIA Corporation. All rights reserved. CUDA Toolkit 4.2 CURAND Guide PG-05328-041_v01 | 1 Portions of the MTGP32 (Mersenne Twister for GPU) library routines are subject to the following copyright: Copyright ©2009, 2010 Mutsuo Saito, Makoto Matsumoto and Hiroshima University.
    [Show full text]
  • A Random Number Generator for Lightweight Authentication Protocols: Xorshiftr+
    Turkish Journal of Electrical Engineering & Computer Sciences Turk J Elec Eng & Comp Sci (2017) 25: 4818 { 4828 http://journals.tubitak.gov.tr/elektrik/ ⃝c TUB¨ ITAK_ Research Article doi:10.3906/elk-1703-361 A random number generator for lightweight authentication protocols: xorshiftR+ Umut Can C¸ABUK, Omer¨ AYDIN∗, G¨okhanDALKILIC¸ Department of Computer Engineering, Faculty of Engineering, Dokuz Eyl¨ulUniversity, Izmir,_ Turkey Received: 30.03.2017 • Accepted/Published Online: 05.09.2017 • Final Version: 03.12.2017 Abstract: This paper presents the results of research that aims to find a suitable, reliable, and lightweight pseudorandom number generator for constrained devices used in the Internet of things. Within the study, three reduced versions of the xorshift+ generator are built. They are tested using the TestU01 suite as well as the NIST suite to measure their ability to produce randomness and performance values along with some other existing generators. The best of our reduced variations according to our tests, called the xorshiftR+, demonstrated great suitability for lightweight devices considering its randomness, performance, and resource usage. Key words: TestU01, xorshift, lightweight cryptography, Internet of things 1. Introduction The rapidly emerging concept of the Internet of things (IoT) brings new approaches to everyday problems as well as industrial applications. These approaches rely on bundles of cheap, efficient, and dedicated networked devices that work and communicate continuously. These so-called lightweight devices of the IoT have limited power, space, and computation resources; hence, there is a huge need for developing suitable security protocols and methodologies tailored for those. Furthermore, most of the contemporary security protocols are not optimized for lightweight environments and require more sources than IoT devices may efficiently provide.
    [Show full text]
  • High-Performance Pseudo-Random Number Generation on Graphics Processing Units
    High-Performance Pseudo-Random Number Generation on Graphics Processing Units Nimalan Nandapalan1, Richard P. Brent1;2, Lawrence M. Murray3, and Alistair Rendell1 1 Research School of Computer Science, The Australian National University 2 Mathematical Sciences Institute, The Australian National University 3 CSIRO Mathematics, Informatics and Statistics Abstract. This work considers the deployment of pseudo-random num- ber generators (PRNGs) on graphics processing units (GPUs), devel- oping an approach based on the xorgens generator to rapidly produce pseudo-random numbers of high statistical quality. The chosen algorithm has configurable state size and period, making it ideal for tuning to the GPU architecture. We present a comparison of both speed and statistical quality with other common parallel, GPU-based PRNGs, demonstrating favourable performance of the xorgens-based approach. Keywords: Pseudo-random number generation, graphics processing units, Monte Carlo 1 Introduction Motivated by compute-intense Monte Carlo methods, this work considers the tailoring of pseudo-random number generation (PRNG) algorithms to graph- ics processing units (GPUs). Monte Carlo methods of interest include Markov chain Monte Carlo (MCMC) [5], sequential Monte Carlo [4] and most recently, particle MCMC [1], with numerous applications across the physical, biological and environmental sciences. These methods demand large numbers of random variates of high statistical quality. We have observed in our own work that, after acceleration of other components of a Monte Carlo program on GPU [13, 14], the PRNG component, still executing on the CPU, can bottleneck the whole pro- cedure, failing to produce numbers as fast as the GPU can consume them. The aim, then, is to also accelerate the PRNG component on the GPU, without com- promising the statistical quality of the random number sequence, as demanded by the target Monte Carlo applications.
    [Show full text]
  • Accelerating Probabilistic Computing with a Stochastic Processing Unit
    Accelerating Probabilistic Computing with a Stochastic Processing Unit by Xiangyu Zhang Department of Electrical and Computer Engineering Duke University Date: Approved: Alvin R. Lebeck, Advisor Hai Li Sayan Mukherjee Daniel J. Sorin Lisa Wu Wills Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Electrical and Computer Engineering in the Graduate School of Duke University 2020 Abstract Accelerating Probabilistic Computing with a Stochastic Processing Unit by Xiangyu Zhang Department of Electrical and Computer Engineering Duke University Date: Approved: Alvin R. Lebeck, Advisor Hai Li Sayan Mukherjee Daniel J. Sorin Lisa Wu Wills An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Electrical and Computer Engineering in the Graduate School of Duke University 2020 Copyright c 2020 by Xiangyu Zhang All rights reserved except the rights granted by the Creative Commons Attribution-Noncommercial Licence Abstract Statistical machine learning becomes a more important workload for computing sys- tems than ever before. Probabilistic computing is a popular approach in statistical machine learning, which solves problems by iteratively generating samples from pa- rameterized distributions. As an alternative to Deep Neural Networks, probabilistic computing provides conceptually simple, compositional, and interpretable models. However, probabilistic algorithms are often considered too slow on the conventional processors due to sampling overhead to 1) computing the parameters of a distribu- tion and 2) generating samples from the parameterized distribution. A specialized architecture is needed to address both the above aspects. In this dissertation, we claim a specialized architecture is necessary and feasible to efficiently support various probabilistic computing problems in statistical machine learning, while providing high-quality and robust results.
    [Show full text]
  • Arxiv:1811.04035V1 [Cs.CR] 3 Nov 2018 Generated Random Numbers for Various Purposes
    A Search for Good Pseudo-random Number Generators : Survey and Empirical Studies a a a, Kamalika Bhattacharjee , Krishnendu Maity , Sukanta Das ∗ aDepartment of Information Technology, Indian Institute of Engineering Science and Technology, Shibpur, West Bengal, India 711103 Abstract In today’s world, several applications demand numbers which appear random but are generated by a background algorithm; that is, pseudo-random num- bers. Since late 19th century, researchers have been working on pseudo-random number generators (PRNGs). Several PRNGs continue to develop, each one de- manding to be better than the previous ones. In this scenario, this paper targets to verify the claim of so-called good generators and rank the existing genera- tors based on strong empirical tests in same platforms. To do this, the genre of PRNGs developed so far has been explored and classified into three groups – linear congruential generator based, linear feedback shift register based and cellular automata based. From each group, well-known generators have been chosen for empirical testing. Two types of empirical testing has been done on each PRNG – blind statistical tests with Diehard battery of tests, TestU01 library and NIST statistical test-suite and graphical tests (lattice test and space- time diagram test). Finally, the selected 29 PRNGs are divided into 24 groups and are ranked according to their overall performance in all empirical tests. Keywords: Pseudo-random number generator (PRNG), Diehard, TestU01, NIST, Lattice Test, Space-time Diagram I. Introduction History of human race gives evidence that, since the ancient times, people has arXiv:1811.04035v1 [cs.CR] 3 Nov 2018 generated random numbers for various purposes.
    [Show full text]
  • A Guideline on Pseudorandom Number Generation (PRNG) in the Iot
    If you cite this paper, please use the ACM CSUR reference: P. Kietzmann, T. C. Schmidt, M. Wählisch. 2021. A Guideline on Pseudorandom Number Generation (PRNG) in the IoT. ACM Comput. Surv. 54, 6, Article 112 (July 2021), 38 pages. https://dl.acm.org/doi/10.1145/3453159 112 A Guideline on Pseudorandom Number Generation (PRNG) in the IoT PETER KIETZMANN and THOMAS C. SCHMIDT, HAW Hamburg, Germany MATTHIAS WÄHLISCH, Freie Universität Berlin, Germany Random numbers are an essential input to many functions on the Internet of Things (IoT). Common use cases of randomness range from low-level packet transmission to advanced algorithms of artificial intelligence as well as security and trust, which heavily rely on unpredictable random sources. In the constrained IoT, though, unpredictable random sources are a challenging desire due to limited resources, deterministic real-time operations, and frequent lack of a user interface. In this paper, we revisit the generation of randomness from the perspective of an IoT operating system (OS) that needs to support general purpose or crypto-secure random numbers. We analyse the potential attack surface, derive common requirements, and discuss the potentials and shortcomings of current IoT OSs. A systematic evaluation of current IoT hardware components and popular software generators based on well-established test suits and on experiments for measuring performance give rise to a set of clear recommendations on how to build such a random subsystem and which generators to use. CCS Concepts: • Computer systems organization → Embedded systems; • Mathematics of computing → Random number generation; • Software and its engineering → Operating systems; Additional Key Words and Phrases: Internet of Things, hardware random generator, cryptographically secure PRNG, physically unclonable function, statistical testing, performance evaluation, survey 1 INTRODUCTION Random numbers are essential in computer systems to enfold versatility and enable security.
    [Show full text]
  • A PCG: a Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation
    A PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation MELISSA E. O’NEILL, Harvey Mudd College This paper presents a new uniform pseudorandom number generation scheme that is both extremely practical and statistically good (easily passing L’Ecuyer and Simard’s TestU01 suite). It has a number of important properties, including solid mathematical foundations, good time and space performance, small code size, multiple random streams, and better cryptographic properties than are typical for a general-purpose generator. The key idea is to pass the output of a fast well-understood “medium quality” random number generator to an efficient permutation function (a.k.a. hash function), built from composable primitives, that enhances the quality of the output. The algorithm can be applied at variety of bit sizes, including 64 and 128 bits (which provide 32- and 64-bit outputs, with periods of 264 and 2128). Optionally, we can provide each b-bit generator b 1 with a b 1 bit stream-selection constant, thereby providing 2 ¡ random streams, which are full period and ¡ b entirely distinct. An extension adds up to 2b-dimensional equidistribution for a total period of 2b2 . The construction of the permutation function and the period-extension technique are both founded on the idea of permutation functions on tuples. In its standard variants, b-bit generators use a 2b/2-to-1 function to produce b/2 bits of output. These functions can be designed to make it difficult for an adversary to discover the generator’s internal state by examining its output, and thus make it challenging to predict.
    [Show full text]
  • Fast Splittable Pseudorandom Number Generators
    Fast Splittable Pseudorandom Number Generators Guy L. Steele Jr. Doug Lea Christine H. Flood Oracle Labs SUNY Oswego Red Hat Inc [email protected] [email protected] [email protected] Abstract 1. Introduction and Background We describe a new algorithm SPLITMIX for an object- Many programming language environments provide a pseu- oriented and splittable pseudorandom number generator dorandom number generator (PRNG), a deterministic algo- (PRNG) that is quite fast: 9 64-bit arithmetic/logical opera- rithm to generate a sequence of values that is likely to pass tions per 64 bits generated. A conventional linear PRNG ob- certain statistical tests for “randomness.” Such an algorithm ject provides a generate method that returns one pseudoran- is typically described as a finite-state machine defined by dom value and updates the state of the PRNG, but a splittable a transition function τ, an output function µ, and an ini- PRNG object also has a second operation, split, that replaces tial state (or seed) s0; the generated sequence is zj where the original PRNG object with two (seemingly) independent sj = τ(sj−1) and zj = µ(sj) for all j ≥ 1. (An alternate PRNG objects, by creating and returning a new such object formulation is sj = τ(sj−1) and zj = µ(sj−1) for all j ≥ 1, and updating the state of the original object. Splittable PRNG which may be more efficient when implemented on a super- objects make it easy to organize the use of pseudorandom scalar processor because it allows τ(sj−1) and µ(sj−1) to numbers in multithreaded programs structured using fork- be computed in parallel.) Because the state is finite, the se- join parallelism.
    [Show full text]
  • It Is High Time We Let Go of the Mersenne Twister∗
    It Is High Time We Let Go Of The Mersenne Twister∗ SEBASTIANO VIGNA, Università degli Studi di Milano, Italy When the Mersenne Twister made his first appearance in 1997 it was a powerful example of how linear maps on F2 could be used to generate pseudorandom numbers. In particular, the easiness with which generators with long periods could be defined gave the Mersenne Twister a large following, in spite of the fact thatsuch long periods are not a measure of quality, and they require a large amount of memory. Even at the time of its publication, several defects of the Mersenne Twister were predictable, but they were somewhat obscured by other interesting properties. Today the Mersenne Twister is the default generator in C compilers, the Python language, the Maple mathematical computation system, and in many other environments. Nonetheless, knowledge accumulated in the last 20 years suggests that the Mersenne Twister has, in fact, severe defects, and should never be used as a general-purpose pseudorandom number generator. Many of these results are folklore, or are scattered through very specialized literature. This paper surveys these results for the non-specialist, providing new, simple, understandable examples, and it is intended as a guide for the final user, or for language implementors, so that they can take an informed decision about whether to use the Mersenne Twister or not. 1 INTRODUCTION “Mersenne Twister” [22] is the collective name of a family of PRNGs (pseudorandom number 1 generators) based on F2-linear maps. This means that the state of the generator is a vector of bits of size n interpreted as an n-dimensional vector on F2, the field with two elements, and the next-state function of the generator is an F2-linear map.
    [Show full text]
  • Curand Library
    cuRAND Library Programming Guide PG-05328-050_v11.4 | September 2021 Table of Contents Introduction.........................................................................................................................viii Chapter 1. Compatibility and Versioning..............................................................................1 Chapter 2. Host API Overview...............................................................................................2 2.1. Generator Types........................................................................................................................ 3 2.2. Generator Options..................................................................................................................... 3 2.2.1. Seed..................................................................................................................................... 3 2.2.2. Offset....................................................................................................................................3 2.2.3. Order....................................................................................................................................4 2.3. Return Values............................................................................................................................ 6 2.4. Generation Functions................................................................................................................ 7 2.5. Host API Example......................................................................................................................8
    [Show full text]
  • Survey on Hardware Implementation of Random
    Survey on hardware implementation of random number generators on FPGA: Theory and experimental analyses Mohammed Bakiri, Christophe Guyeux, Jean Couchot, Abdelkrim Oudjida To cite this version: Mohammed Bakiri, Christophe Guyeux, Jean Couchot, Abdelkrim Oudjida. Survey on hardware im- plementation of random number generators on FPGA: Theory and experimental analyses. Computer Science Review, Elsevier, 2018, 27, pp.135 - 153. hal-02182827 HAL Id: hal-02182827 https://hal.archives-ouvertes.fr/hal-02182827 Submitted on 13 Jul 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Survey on Hardware Implementation of Random Number Generators on FPGA: Theory and Experimental Analyses Mohammed Bakiria,b,∗, Christophe Guyeuxa, Jean-Fran¸coisCouchota, Abdelkrim Kamel Oudjidab aFemto-ST Institute, DISC Department, UMR 6174 CNRS, University of Bourgogne Franche-Comt´e,Belfort, 90010, France bCentre de D´eveloppement des Technologies Avanc´ees,ASM/DMN Department, Cit 20 aot 1956 Baba Hassen, B. P 17,16303, Alger, Algeria Abstract Random number generation refers to many applications such as simulation, nu- merical analysis, cryptography etc. Field Programmable Gate Array (FPGA) are reconfigurable hardware systems, which allow rapid prototyping. This re- search work is the first comprehensive survey on how random number generators are implemented on Field Programmable Gate Arrays (FPGAs).
    [Show full text]