The LUT-SR Family of Uniform Random Number Generators for FPGA Architectures

David B. Thomas, Member, IEEE, and Wayne Luk, Fellow, IEEE

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

Abstract: Field-programmable gate array (FPGA) optimized random number generators (RNGs) are more resource-efficient than software-optimized RNGs because they can take advantage of bitwise operations and FPGA-specific features. However, it is difficult to concisely describe FPGA-optimized RNGs, so they are not commonly used in real-world designs. This paper describes a type of FPGA RNG called a LUT-SR RNG, which takes advantage of bitwise XOR operations and the ability to turn lookup tables (LUTs) into shift registers of varying lengths. This provides a good resource–quality balance compared to previous FPGA-optimized generators, between the previous high-resource high-period LUT-FIFO RNGs and low-resource low-quality LUT-OPT RNGs, with quality comparable to the best software generators. The LUT-SR generators can also be expressed using a simple C++ algorithm contained within this paper, allowing 60 fully-specified LUT-SR RNGs with different characteristics to be embedded in this paper, backed up by an online set of very high speed integrated circuit hardware description language (VHDL) generators and test benches.

Index Terms: Equidistribution, field-programmable gate array (FPGA), uniform random number generator (RNG).

Manuscript received January 24, 2012; accepted March 21, 2012. This work was supported in part by the U.K. Engineering and Physical Sciences Research Council under Grant EP/D062322/1 and Grant EP/C549481/1.
D. B. Thomas is with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7, U.K. (e-mail: [email protected]).
W. Luk is with the Department of Computing, Imperial College London, London SW7, U.K. (e-mail: [email protected]).
Digital Object Identifier 10.1109/TVLSI.2012.2194171
1063–8210/$31.00 © 2012 IEEE

I. INTRODUCTION

Monte Carlo applications are ideally suited to field-programmable gate arrays (FPGAs) because of the highly parallel nature of the applications, and because it is possible to take advantage of hardware features to create very efficient random number generators (RNGs). In particular, uniform random bits are extremely cheap to generate in an FPGA, as large numbers of bits can be generated per cycle at high clock rates using lookup tables [1], or first-in-first-out (FIFO) queues [2]. In addition, these generators can be customized to meet the exact requirements of the application, both in terms of the number of bits required per cycle, and for the FPGA architecture of the target platform.

Despite these advantages, FPGA-optimized generators are not widely used in practice, as the process of constructing a generator for a given parameterization is time consuming, in terms of both developer man hours and CPU time. While it is possible to construct all possible generators ahead of time, the resulting set of cores would require many megabytes, and be difficult to integrate into existing tools and design flows. Faced with these unpalatable choices, engineers under time constraints understandably choose less efficient methods, such as combined Tausworthe generators [3] or parallel linear feedback shift registers (LFSRs).

This paper describes a family of generators which makes it easier to use FPGA-optimized generators by providing a simple method for engineers to instantiate an RNG that meets the specific needs of their application. Specifically, it shows how to create a family of generators called LUT-SR RNGs, which use LUTs as shift registers to achieve high quality and long periods, while requiring very few resources. The main contributions of this paper are as follows:

1) a type of FPGA-optimized uniform RNG called a LUT-SR generator is presented which uses LUT-based shift registers to implement generators with periods of $2^{1024} - 1$ or more, using two LUTs and two flip flops (FFs) per generated random bit;
2) an algorithm for describing LUT-SR RNGs using five integers is given, along with a set of open-source test benches and tools;
3) tables of 60 LUT-SR RNGs are presented, covering output widths from 32 up to 624, with periods from $2^{1024} - 1$ up to $2^{19937} - 1$;
4) a theoretical quality analysis of the given RNGs in terms of equidistribution and a comparison with other software and hardware RNGs are carried out.

The LUT-SR family was first presented in a conference paper [4], which concentrated on the practical aspects of constructing and using these generators. This paper adds Section V, which describes the method used to find maximum period generators, Section VI, which describes the process used to select the highest quality generators, and Section VIII, which gives a rigorous theoretical quality analysis in terms of equidistribution.

II. OVERVIEW OF BINARY LINEAR RNGS

The LUT-SR RNGs are part of a large family of RNGs, all of which are based on binary linear recurrences. This family includes many of the most popular contemporary software generators, such as the Mersenne Twister (MT-19937) [5], the Combined Tausworthe (TAUS-113) [3], SFMT [6], WELL [7], and TT-800 [8]. This section gives an overview of the underlying maths, and describes existing binary linear RNGs used in FPGAs.
A. Binary Linear RNGs

Binary linear recurrences operate on bits (binary digits), where addition and multiplication of bits is implemented using exclusive-or (⊕) and bitwise-and (⊗). The recurrence of an RNG with n-bit state and r-bit outputs is defined as

$x_{i+1} = A x_i$    (1)
$y_{i+1} = B x_{i+1}$    (2)

where $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,n})^T$ is the n-bit state of the generator, $y_i = (y_{i,1}, y_{i,2}, \ldots, y_{i,r})^T$ is the r-bit output of the generator, $A$ is an $n \times n$ binary transition matrix, and $B$ is an $r \times n$ binary output matrix. Because the state is finite, and the recurrence is deterministic, eventually the state sequence $x_0, x_1, x_2, \ldots$ must start to repeat.

The minimum value $p$ such that $x_{i+p} = x_i$ is called the period of the generator, and one goal in designing RNGs is to achieve the maximum period of $p = 2^n - 1$. A period of $2^n$ cannot be achieved because it is impossible to choose $A$ such that $x_0 = 0$ maps to anything other than $x_1 = 0$. This leads to two sequences in a maximum period generator: a degenerate sequence of length 1 which contains only zero, and the main sequence which iterates through every possible nonzero n-bit pattern before repeating. A necessary and sufficient condition for a generator to have maximum period is that the characteristic polynomial $P(z)$ of the transition matrix $A$ must be primitive [1].

The matrix $B$ is used to transform the internal RNG state into the random output bits produced by the generator. In the simplest case, we have $r = n$ and $B = I$, which means the state bits are used directly as random output bits, but in many generators most of the internal state bits are not sufficiently random. In these cases, $r < n$, and either some state bits are not propagated through to the output, or multiple state bits are XOR'd together to produce each output bit.

The quality of a generator is measured in two ways: empirical tests, which look at generated sequences of numbers, and theoretical tests, which consider mathematical properties of the entire number sequence.

[...] the output bit width, and it is initialized to an unknown random state $x_0$, then observing $y_1, \ldots, y_{d-1}$ gives no information at all about what $y_d$ will be. However, observing $y_1, \ldots, y_d$ may allow us to make predictions about the value of $y_{d+1}$, or in some cases allow us to predict it precisely.

There are three stages when designing such a generator.

1) Describe a family of generators $G_F$, such that each member of $G_F$ can be efficiently implemented in the target architecture. However, only some members of $G_F$ will have the maximum period property.
2) Extract a maximum period subset $G_M \subset G_F$, such that all members of $G_M$ implement a matrix $A$ with a primitive characteristic polynomial. This is achieved either by randomly selecting and testing members of $G_F$, or by exhaustive enumeration if $|G_F|$ is small.
3) Find the generator $g_I \in G_M$ which produces the output stream with highest statistical quality, either by considering multiple members with different $A$ matrices, or by trying many different $B$ matrices for a single transition matrix.

The selected RNG instance $g_I \in G_F$ can then be expressed as code [e.g., C or VHDL] and used in the target architecture.

B. LUT-Optimized (LUT-OPT) RNGs

LUT-OPT generators [1] are a family of generators with a matrix $A$ where each row and column contains $t - 1$ or $t$ 1s. In hardware terms, this means that each row maps to a $t - 1$ or $t$ input XOR gate, and so can be implemented in a single $t$ input LUT. Thus if the current vector state is held in a register, each bit of the new vector state can be calculated in a single LUT, and an r-bit generator can be implemented in r fully utilized LUT-FFs. The basic structure of a LUT-OPT generator is shown in Fig. 1(a).

A simple example of a maximum period LUT-OPT generator with $r = 6$ and $t = 3$ is given by the recurrence [...]
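
As an illustration of recurrences (1) and (2) and of the maximum period property described in Section II-A, the following Python sketch iterates a small binary linear generator and measures its period. It is not one of the generators from this paper: the 4-bit matrix A is an assumed example (the companion matrix of $z^4 + z + 1$, which is primitive over GF(2)), B is a hypothetical output matrix that simply exposes two state bits, and all function names are chosen for illustration only.

```python
# Illustrative sketch only: this 4-bit generator is NOT one of the paper's RNGs.
# A is the companion matrix of z^4 + z + 1, a primitive polynomial over GF(2).
A = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
# Hypothetical output matrix B (r = 2): expose state bits 1 and 3 unchanged.
B = [
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]


def mat_vec_gf2(matrix, vec):
    """Matrix-vector product over GF(2): AND for multiply, XOR for add."""
    result = []
    for row in matrix:
        bit = 0
        for m, v in zip(row, vec):
            bit ^= m & v
        result.append(bit)
    return result


def step(state):
    """Apply (1) then (2): return the pair (x_{i+1}, y_{i+1})."""
    next_state = mat_vec_gf2(A, state)
    return next_state, mat_vec_gf2(B, next_state)


def period(start):
    """Smallest p with x_{i+p} = x_i, or None if start never recurs."""
    state = start
    for p in range(1, 2 ** len(start) + 1):
        state = mat_vec_gf2(A, state)
        if state == start:
            return p
    return None


def ones_per_row_and_column(matrix):
    """Count the 1s in every row and every column of a binary matrix."""
    rows = [sum(row) for row in matrix]
    cols = [sum(col) for col in zip(*matrix)]
    return rows, cols


if __name__ == "__main__":
    print(period([1, 0, 0, 0]))   # main sequence: 2^4 - 1 = 15
    print(period([0, 0, 0, 0]))   # degenerate all-zero sequence: 1
    print(ones_per_row_and_column(A))
```

For this assumed matrix, every nonzero start state lies on the single main sequence of length $2^4 - 1 = 15$, while the all-zero state maps to itself. Each row and column of this A also happens to contain one or two 1s, so it satisfies the row and column condition stated for LUT-OPT matrices with $t = 2$; the sketch does not attempt to assess output quality.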
