UNIVERSIDAD POLITÉCNICA DE MADRID

ESCUELA TÉCNICA SUPERIOR DE INGENIEROS DE TELECOMUNICACIÓN

HARDWARE ACCELERATION OF MONTE CARLO-BASED SIMULATIONS

TESIS DOCTORAL

PEDRO ECHEVERRÍA ARAMENDI

INGENIERO EN TELECOMUNICACIÓN

2011

DEPARTAMENTO DE INGENIERÍA ELECTRÓNICA

ESCUELA TÉCNICA SUPERIOR DE INGENIEROS DE TELECOMUNICACIÓN

UNIVERSIDAD POLITÉCNICA DE MADRID

Ph.D. THESIS

HARDWARE ACCELERATION OF MONTE CARLO-BASED SIMULATIONS

Author: Pedro Echeverría Aramendi, Telecommunication Engineer

Advisor: María Luisa López Vallejo, Profesor Titular del Dpto. de Ingeniería Electrónica, Universidad Politécnica de Madrid

2011

Ph.D. THESIS: Hardware Acceleration of Monte Carlo-Based Simulations

AUTHOR: Pedro Echeverría Aramendi

ADVISOR: María Luisa López Vallejo

El tribunal nombrado por el Mgfco. y Excmo. Sr. Rector de la Universidad Politécnica de Madrid el día 21 de Noviembre de 2011, para juzgar la Tesis arriba indicada, compuesto por los siguientes doctores:

PRESIDENTE: D. Carlos Alberto López Barrio

VOCALES: D. Javier Díaz Bruguera

D. Florent Dupont de Dinechin

D. Luis Entrena Arrontes

SECRETARIO: D. Carlos Carreras Vaquer

Realizado el acto de lectura y defensa de la Tesis el día 21 de Noviembre de 2011 en la E.T.S. de Ingenieros de Telecomunicación, acuerda otorgarle la calificación de:

El Secretario del tribunal

A mi familia

Contents

Abstract
Resumen
Acknowledgments
List of Figures
List of Tables

1 Introduction
  1.1 Motivation
    1.1.1 Acceleration Features of FPGAs
    1.1.2 Applications
    1.1.3 Designing with FPGAs. Challenges
  1.2 Objectives and Thesis Structure
    1.2.1 Monte Carlo Simulations and Target Application: LIBOR Market Model
    1.2.2 Objectives
    1.2.3 PhD Thesis Structure

2 Random Number Generation
  2.1 Random Number Generation: Overall Introduction
  2.2 Uniform Random Number Generation
    2.2.1 Linear Congruential Generators (LCG)
    2.2.2 Combined Generator Rand2
    2.2.3 Tausworthe Generators
    2.2.4 Mersenne Twister
  2.3 N(0,1) Gaussian Random Number Generation
    2.3.1 Generation methods
    2.3.2 Monte Carlo Implications and Hardware Implementation
    2.3.3 Inversion Method with Quintic Hermite Interpolation
  2.4 Variance Reduction Techniques
    2.4.1 Stratified Sampling and Latin Hypercube
  2.5 Developed Gaussian Random Number Generator
    2.5.1 Uniform Random Number Generator
    2.5.2 N(0,1) Gaussian Random Number Generator
    2.5.3 Stratified Sampling and Latin Hypercube
    2.5.4 Complete GRNG and SW-HW comparison
  2.6 Extending N(0,1) RNG
    2.6.1 Parameterisable RNG based on N(0,1) RNG
  2.7 Conclusions

3 Implementing Floating-Point Arithmetic in Configurable Logic
  3.1 Related Works
  3.2 Floating Point Format IEEE 754
    3.2.1 Format Complexity
  3.3 Floating-point Units for FPGAs. Adapting the Format and Standard Compliance
    3.3.1 Simplification of Denormalized Numbers
    3.3.2 Truncation Rounding
    3.3.3 Hardware Representation
    3.3.4 Global Approach Analysis
  3.4 Operators Architecture
    3.4.1 Adder/subtracter
    3.4.2 Multiplication
    3.4.3 Division
    3.4.4 Square Root
    3.4.5 Exponential and Logarithm Units
  3.5 Libraries Evaluation and Comparison
    3.5.1 Comparison with respect to a Commercial Library
    3.5.2 Operators Evaluation
    3.5.3 Replicability
  3.6 Towards Standard Compliance and Performance
    3.6.1 Simplification of Denormalized Numbers: One Bit Exponent Extension
    3.6.2 Truncation Rounding: Mantissa Extension
    3.6.3 FPGA-oriented floating-point library
  3.7 Conclusions

4 Exponentiation Operator
  4.1 Exponentiation function
    4.1.1 Related Work
  4.2 Range and error analysis
    4.2.1 Input-output range analysis
    4.2.2 General Error analysis
    4.2.3 Error Analysis for accurate x^y
  4.3 Variable precision implementation with FloPoCo
    4.3.1 Logarithm
    4.3.2 Multiplier
    4.3.3 Exponential
    4.3.4 Exceptions Unit
  4.4 Experimental Results
    4.4.1 Results Analysis
    4.4.2 Comparison with previous work
    4.4.3 Exceptions Unit
  4.5 Conclusions

5 LIBOR Market Model Hardware Core
  5.1 LIBOR Market Model
    5.1.1 LIBOR Market Model as base to compute financial products
  5.2 Model Analysis
    5.2.1 Variables' Range
    5.2.2 Simplifications to the model: Factorization
    5.2.3 Operators' complexity
    5.2.4 Model Summary
    5.2.5 Qualitative Profiling
    5.2.6 Data dependencies
  5.3 Adapting the model to Hardware
    5.3.1 Simulation order
    5.3.2 Tailored Arithmetic
  5.4 FPGA Monte Carlo Libor Market Model Engine
    5.4.1 General Architecture
    5.4.2 Gaussian RNG Core
    5.4.3 LMM Core
    5.4.4 Product Valuation Core
    5.4.5 Control Unit
  5.5 LMM Engine Implementation
    5.5.1 Operators' Features
    5.5.2 LMM Core. Precision-Accuracy-Performance
    5.5.3 Cores Implementation
  5.6 Conclusions

6 Hardware-Software Integration
  6.1 Hardware-Software Partitioning
    6.1.1 Tasks Stability Characteristics
    6.1.2 Communication overheads
    6.1.3 Achieve maximum possible acceleration
    6.1.4 Partitioning Policy
  6.2 System Architecture and Communications
    6.2.1 Why PCI-Express?
    6.2.2 Communications Model
    6.2.3 Communications Requirements
  6.3 PCI Express Core
    6.3.1 Within FPGA Communications
  6.4 Software
    6.4.1 Driver and Low Level Functions
    6.4.2 Application modification
  6.5 Experimental Results
    6.5.1 Complete Accelerator Implementation Results
    6.5.2 Software Profiling
    6.5.3 Hardware-Software Solution Results
  6.6 Conclusions

7 Conclusions
  7.1 Contributions and Conclusions of this Thesis
    7.1.1 Random Number Generators
    7.1.2 Floating-Point Arithmetic Operators and FPGAs
    7.1.3 LMM Hardware Accelerator
    7.1.4 Capacity and Performance of FPGAs. Accelerator design
    7.1.5 Hardware-Software co-design and Integration
  7.2 Future Lines of Work
    7.2.1 Research lines related to Improvements
    7.2.2 New Research Lines

Bibliography

Abstract

During the last years there has been an enormous advance in FPGAs. Traditionally, FPGAs have been used mainly for prototyping as they offer significant advantages at a suitably low cost: flexibility and ease of verification. Their flexibility allows the implementation of different generations of a given application and gives designers room to modify implementations until the very last moment, or even to correct mistakes once the product has been released. Second, the verification of a design mapped onto an FPGA is easier and simpler than in ASICs, which require a huge verification effort. In addition to these advantages, technological advances have added great capability and performance to FPGAs, and even though FPGAs are not as efficient as ASICs in terms of performance, area or power, nowadays they can provide better performance than standard or digital signal processor (DSP) based systems. This fact, in conjunction with the enormous logic capacity allowed by today's technologies, makes FPGAs an attractive choice for the implementation of complex digital systems. Furthermore, with their newly acquired digital signal processing capabilities, FPGAs are now expanding their traditional prototyping role to help offload computationally intensive functions from standard processors.

This Thesis is focused on the last point, the use of FPGAs to accelerate computationally intensive applications. The use of FPGAs for hardware acceleration is an active research field. However, there are still several challenges concerning the use of FPGAs as accelerators:

• Availability of Cores.
• Capability and performance of FPGAs.
• Methods, algorithms and techniques suited for FPGAs.

• Design tools.
• Hardware-Software co-design and integration.

Studying in depth each one of these five challenges related to hardware acceleration is not feasible in just one Thesis. The great variety of applications that can be accelerated and the different features among them imply that the complexity of each task is high. Therefore, in this Thesis we have chosen one subset of applications to be studied, dealing with the implementation of a real application of this subset.

Selecting a complex subset of applications, in our case Monte Carlo simulations, allows us to make a general analysis of the main topic, hardware acceleration, from the study, analysis and design of a particular application, since this subset shares several features with many other applications. Specifically, we have selected a financial application, the Monte Carlo based LIBOR Market Model.

Developing an FPGA application from scratch is almost impossible, and the availability of cores is a must to shorten development time. Following this idea, one of the main objectives is to study the common elements that play a key role in Monte Carlo simulations and in our target application (and shared with many other applications). Two common elements stand out:

• The random number generators that are required for the underlying random variables.
• Floating-point operators, which are the base elements for implementing the mathematical models that are evaluated.

In this way, the first objective of this Ph.D. Thesis is the study, design and implementation of random number generators. In particular, we have focused on Gaussian random number generation and the implementation of a complete generator compatible with variance reduction techniques that can be used for our target application and for other applications.

In this field we have developed a high-quality, high-performance Gaussian random number generator which is parameterizable and compatible with the also developed parameterizable Latin Hypercube core and a high-performance Mersenne Twister generator. Research results in this field demonstrate that random number generation is ideal for hardware acceleration, as an isolated core or within bigger accelerators.

Meanwhile, the second objective has dealt with the implementation of efficient and FPGA-oriented mathematical operators (both basic and complex and using floating-point arithmetic). We focused on the design, development and characterization of libraries of components. Instead of focusing on the algorithms of the operators, our approach has been to study how the format can be simplified to obtain operators that are better suited for FPGAs and present better performance. One important goal pursued here was to achieve libraries of general-purpose components that can be reused in several applications and not just in a particular target application.

Different design decisions have been studied and analyzed, and from this analysis the impact of the overhead due to some of the floating-point standard features has been determined. The format overhead implies a greater use of resources, and reducing it is a must to obtain operators that are better suited for FPGAs and present better performance, independently of the underlying calculation algorithm. In particular, the handling of denormalized numbers has a major impact on FPGA operators. Following the results obtained in that study, we have discussed and selected a set of features that implies improved performance and reduced resource usage. This set has been chosen to design two additional FPGA-oriented hardware libraries that ensure (or even improve) the accuracy and resolution given by the standard. The operators of these libraries are the base components for the implementation of the target application.

Additionally, a second analysis has been carried out to study the capability of FPGAs to implement complex datapaths. This analysis shows the huge capacity of current FPGAs, which allows up to hundreds of single-precision floating-point operators. Despite this capacity, this second analysis has also demonstrated how the working frequency of the operators is severely affected by the routing of their elements when the operators are not isolated and a high percentage of the resources of an FPGA is used.

Related to the target application, a third objective of this work was to go deeper into the implementation of a particular operator, the exponentiation function. This operator is required in many scientific and financial simulations. Its complexity and the lack of previous general-purpose implementations have deserved special attention. We have developed and presented an accurate exponentiation operator for FPGAs based on the straightforward translation of x^y into a chain of sub-operators and on the FPGA flexibility which allows tailored precisions. Taking advantage of this flexibility, the provided error analysis focused on determining which precisions are needed in the partial results and in the internal architectures of the sub-operators to obtain an accurate operator with a maximum error of one ulp. Finally, the integration of this error analysis and the development of the operator within the FloPoCo project have made it possible to automate the generation of exponentiation operators with variable precisions.

The next objective we tackled was related to the global purpose of the Thesis: validating all the previously developed elements through the implementation of a complex Monte Carlo simulation which involves all the features that can be found in Monte Carlo simulations. In this way, we have dealt with the implementation of the target application, the LIBOR Market Model (LMM). Special attention was devoted to all the features, requirements and circumstances that affect the performance of the accelerator. A complete LMM hardware core has been developed and its results validated against the original software implementation. Three main features were analyzed:

• Correctness of the results obtained.

• Accuracy.
• Speedup factors obtained by the global application and by each of the main components.

Finally, the last objective was the integration of the hardware accelerator within the original software application. All issues related to the communication mechanism are studied, putting special focus on how performance is affected by data transfers and by the hardware-software partitioning policy implemented.

Following the partitioning policy selected, we have developed the infrastructure (both hardware and software) required to make possible the integration of our accelerator within a software application. A mechanism, based on the use of two RAM memory zones and a PCI-E core with Bus Master capabilities in the FPGA, has been proposed and implemented, and it has allowed us to extend the intrinsic parallelism of Monte Carlo simulations to the way the CPU and the FPGA work together. In this way, we exploit the CPU to work in parallel with the FPGA, overlapping their execution times. Hence, the software execution time affecting the performance is reduced to the initial and final processing and to the product valuation in case it is slower than the LMM plus the random number generator in the FPGA. With this scheme we have achieved high speedups, around 18 times, close to the theoretical limit for our case, in which all the software either has been ported to hardware or has its execution overlapped with the FPGA execution (the achievable speedup of the LMM plus the RNG). This speedup could be considerably improved using newer FPGAs and several LMM cores in parallel.

Resumen

Durante los últimos años ha habido un enorme avance en la tecnología y capacidades de las FPGAs. Tradicionalmente, las FPGAs se han utilizado principalmente para el desarrollo de prototipos, ya que ofrecen importantes ventajas a un bajo coste: flexibilidad y facilidad de verificación. Su flexibilidad permite la implementación de las diferentes versiones de una aplicación determinada y permite a los diseñadores modificar las implementaciones hasta el último momento, o incluso corregir errores una vez que el producto está siendo utilizado. En segundo lugar, la verificación de un diseño en una FPGA es más fácil y más sencilla que en un ASIC, donde se requiere un esfuerzo de verificación enorme. Además de estas ventajas, los avances tecnológicos han permitido FPGAs con grandes capacidades a la vez que se ha aumentado su rendimiento. Y aunque las FPGAs no sean tan eficientes como los ASIC en términos de rendimiento, recursos o consumo de potencia, hoy en día pueden ofrecer un mejor rendimiento que un sistema estándar o que uno basado en procesadores digitales de señal (DSP). Esto, junto con la enorme capacidad de recursos lógicos alcanzada por las tecnologías de hoy, hace de las FPGAs una opción atractiva para la implementación de sistemas digitales complejos. Además, con su recientemente adquirida capacidad de procesamiento digital de señal, las FPGAs están ampliando su rol tradicional de prototipado al rol de coprocesador para descargar de cálculos intensivos a los procesadores estándar.

Esta tesis se centra en el último punto, el uso de FPGAs para acelerar las aplicaciones computacionalmente intensivas. El uso de FPGAs para la aceleración de hardware es un área activa de investigación. Sin embargo, todavía hay varios desafíos relativos al uso de FPGAs como aceleradores:

• Disponibilidad de cores de implementación.

• Capacidad y rendimiento de las FPGAs.
• Necesidad de métodos, algoritmos y técnicas adecuadas para FPGAs.
• Herramientas de diseño.
• Co-diseño de Hardware-Software y su integración.

El estudio detallado de cada uno de estos cinco desafíos relacionados con la aceleración de hardware no es factible en tan sólo una tesis. La gran variedad de aplicaciones que pueden ser aceleradas y las diferentes características entre ellas implican que la complejidad de cada tarea es alta. Por lo tanto, en esta tesis se ha elegido un conjunto de aplicaciones a estudiar, y se ha llevado a cabo la implementación de una aplicación real de este subgrupo.

La selección de un subconjunto de aplicaciones complejas, en nuestro caso las simulaciones Monte Carlo, nos permite hacer un análisis general de la aceleración de hardware, nuestro campo principal, desde el estudio, análisis y diseño de una aplicación en particular, ya que este conjunto de aplicaciones comparte varias características con muchas otras aplicaciones. En concreto, hemos seleccionado una aplicación financiera, la simulación del LIBOR Market Model basada en Monte Carlo.

El desarrollo de aplicaciones en FPGAs a partir de cero es casi imposible y la disponibilidad de los cores es una necesidad para acortar el tiempo de desarrollo. Siguiendo esta idea, uno de nuestros principales objetivos es el estudio de los elementos comunes que juegan un papel clave en las simulaciones de Monte Carlo y en la aplicación seleccionada (y compartidos con muchas otras aplicaciones). Dos elementos comunes han sido destacados:

• Los generadores de números aleatorios que se requieren para las variables aleatorias subyacentes.
• Los operadores de punto flotante, que son los elementos base para implementar los modelos matemáticos que se evalúan.

De esta manera, el primer objetivo de esta Tesis es el estudio, diseño e implementación de generadores de números aleatorios. En particular, nos hemos centrado en la generación de números aleatorios con distribución Gaussiana y en la implementación de un generador completo y compatible con técnicas de reducción de varianza que se utilizan en la aplicación seleccionada y en otras aplicaciones.

En este campo de investigación hemos desarrollado un generador de números aleatorios gaussianos de alta calidad y alto rendimiento. A su vez, este generador es parametrizable y compatible con el módulo parametrizable de hipercubo latino también desarrollado y con un generador Mersenne Twister de alto rendimiento. Los resultados de investigación en este campo demuestran que la generación de números aleatorios es idónea para la aceleración de hardware, tanto como núcleo aislado como integrada en aceleradores mayores.

El segundo objetivo se ha ocupado del desarrollo de operadores matemáticos eficientes y orientados a FPGAs (tanto básicos como complejos y con aritmética de punto flotante). Nos hemos centrado en el diseño, desarrollo y caracterización de las librerías de componentes. En lugar de centrarnos en los algoritmos de los operadores, nuestro enfoque ha sido el de estudiar cómo el formato se puede simplificar para obtener operadores más adecuados para FPGAs y que a su vez presenten un mejor rendimiento. Un objetivo importante aquí buscado ha sido lograr librerías de componentes de propósito general que puedan ser reutilizadas en varias aplicaciones y no sólo en la aplicación seleccionada en esta tesis.

Diferentes decisiones de diseño se han estudiado y analizado. De este análisis, hemos determinado el impacto de la sobrecarga debida a algunas de las características del estándar de punto flotante. Las sobrecargas que presenta este formato implican un mayor uso de los recursos y su reducción es una necesidad para obtener operadores más adecuados para FPGAs y con mejor rendimiento, independientemente del algoritmo de cálculo subyacente. En particular, el manejo de los números denormalizados tiene un gran impacto en los operadores de FPGA. Con los resultados obtenidos en ese estudio, hemos analizado y seleccionado un conjunto de características que implican un mejor rendimiento y una reducción de los recursos. Este conjunto ha sido elegido para diseñar dos librerías adicionales para FPGA orientadas a garantizar (o incluso mejorar) la precisión y la resolución dada por el estándar. Los operadores de estas librerías son los componentes básicos para la implementación de la aplicación seleccionada.

Además, un segundo análisis se ha llevado a cabo para estudiar las capacidades de las FPGAs para implementar arquitecturas de datos complejas. Este análisis muestra las enormes capacidades de las FPGAs actuales, que permiten la implementación de cientos de operadores de punto flotante en la misma FPGA. A pesar de esta capacidad, este segundo análisis también demuestra cómo la frecuencia de trabajo de los operadores se ve gravemente afectada por el interconexionado de sus elementos cuando los operadores no están aislados y se está utilizando un alto porcentaje de los recursos de la FPGA.

Relacionado con la aplicación de destino, un tercer objetivo de este trabajo ha sido profundizar en la implementación de un operador en particular, la función de exponenciación. Este operador es utilizado en muchas simulaciones científicas y financieras. Su complejidad y la falta de implementaciones previas de propósito general han merecido una atención especial. Hemos desarrollado y presentado un operador de exponenciación exacto para FPGAs basado en la traducción directa de x^y en una cadena de sub-operadores y en la flexibilidad de las FPGAs, que permite precisiones a medida. Aprovechando esta flexibilidad, el análisis de error se centró en determinar qué precisiones son necesarias en los resultados parciales y en la arquitectura interna de los sub-operadores para obtener un operador exacto con un error máximo de un ulp. Por último, la integración de este análisis de error y el desarrollo del operador en el proyecto FloPoCo han permitido automatizar la generación de los operadores de exponenciación con precisiones variables.

El siguiente objetivo ha sido abordar, en relación con el objetivo global de la Tesis, la validación de todos los elementos desarrollados anteriormente con la implementación de un modelo complejo de simulación de Monte Carlo que incluye todas las características que se pueden encontrar en este tipo de simulaciones. De esta manera, abordamos la implementación de la aplicación seleccionada, el LIBOR Market Model (LMM). Se prestó especial atención a todas las características, requisitos y circunstancias que afectan al rendimiento del acelerador. Un core completo del LMM ha sido desarrollado en hardware y validado contra los resultados del software original. Tres características principales se han analizado:

• La exactitud de los resultados obtenidos.
• La precisión necesaria para el hardware.
• Los factores de aceleración obtenidos por la aplicación global y por cada uno de los componentes principales.

Finalmente, el último objetivo ha sido la integración del acelerador hardware con la aplicación de software original. Todas las cuestiones relacionadas con los mecanismos de comunicación se han estudiado poniendo especial énfasis en cómo el rendimiento se ve afectado por las transferencias de datos y por la política de particionamiento hardware-software implementada.

Siguiendo la política de particionamiento seleccionada, hemos desarrollado la infraestructura (hardware y software) necesaria para hacer posible la integración de nuestro acelerador dentro de una aplicación de software. Un mecanismo, basado en el uso de dos zonas de memoria RAM y un core PCI-E con capacidad bus master en la FPGA, se ha propuesto e implementado, y nos ha permitido extender el paralelismo intrínseco de las simulaciones de Monte Carlo a la forma en que la CPU y la FPGA trabajan juntas. De esta manera, se aprovecha la CPU para trabajar en paralelo con la FPGA, superponiendo sus tiempos de ejecución. Por lo tanto, el tiempo de ejecución de software que afecta al rendimiento se reduce al tratamiento inicial y final y a la valoración del producto en caso de que ésta sea más lenta que el LMM más el generador de números aleatorios en la FPGA. Con este esquema hemos logrado incrementos de velocidad altos, de alrededor de 18 veces, muy cerca del límite teórico para nuestro caso: cuando no hay software que no haya sido portado al hardware o cuya ejecución no se superponga con la ejecución de la FPGA (la máxima aceleración alcanzable teniendo en cuenta sólo el LMM más la generación de números aleatorios). En este caso, la aceleración lograda podría ser considerablemente mejorada con FPGAs más nuevas y con varios núcleos de LMM en paralelo.

Acknowledgments

I would like to thank Marisa, who has been guiding my research since my final degree project and working with me since then. This work is also hers. Thanks for giving me the opportunity of working on this Thesis, for all her advice and contributions that have enriched this Thesis, and for these years of collaboration and friendship. Thanks also to Carlos López Barrio for his support, for his advice and for sharing his experience with me. I would also like to thank all the people from the LSI research group for all these years of sharing good moments, meals and conversations. Special thanks to Miguel Angel and Pablo, who have collaborated with me in some research fields and have become good friends. I would also like to thank my friend Paco, who has helped me in the development of the driver and shared with me many conversations about this Thesis and its research. And Florent, without whom chapter four would not have been possible. Thanks to Pedro and Rocio, who have shared many hours and conversations with me at the office. I would also like to acknowledge BBVA and the New Products Department for funding and supporting this Thesis through the project P060920579. Especially, I would like to thank Miguel Ángel, Javier, Manuel, José María and Antonio. To my parents and family, this book is for them. And finally, thanks to my wife Marta, for all her support, love and patience during these years.


List of Figures

1.1 Thesis Structure

2.1 Plot of 1/√m
2.2 Inversion Method
2.3 Dimension Impact: Stratified Sampling & Latin Hypercube
2.4 Mersenne Twister General FPGA Architecture
2.5 MT work area Storage
2.6 UNU.RAN segmentation
2.7 Inversion N(0,1) RNG architecture
2.8 Search unit
2.9 Variance reduction general hardware architecture
2.10 Stratified Sampling Control Unit
2.11 Latin Hypercube Control Unit
2.12 Stratified Sampling and Latin Hypercube results
2.13 Inversion based GRNG with Variance Reduction technique
2.14 Parameterisable RNG N(µ,σ)-LogN(µ,σ) RNG

3.1 Floating-Point word

3.2 Floating-Point Operator
3.3 Adder-Subtracter
3.4 Multiplier Architecture
3.5 Division step
3.6 Square Root
3.7 Exponential function unit
3.8 Logarithm function unit
3.9 Operators Evaluation
3.10 Slices and Stages per type for each library
3.11 Operators Replicability (HP Library)
3.12 Synthetic Datapath
3.13 Adder Replicability Results
3.14 Divider Replicability Results
3.15 Multiplier Replicability Results
3.16 Logarithm Replicability Results
3.17 Towards Standard Operators Evaluation

4.1 Simplified overview of a power function unit
4.2 Power function architecture

5.1 LIBOR Forward Rates
5.2 Monte Carlo LIBORs simulation
5.3 LMM Monte Carlo Simulation with Latin Hypercube
5.4 Engine Architecture
5.5 LMM Core unit
5.6 Parallel Correlation
5.7 Sequential Correlation
5.8 Drift Calculation
5.9 LIBOR Calculation
5.10 LMM datapath
5.11 Product Valuation Core
5.12 Architecture for LMM accuracy measurement

5.13 SW-HW difference in average
5.14 Maximum SW-HW difference
5.15 SW-HW difference in the last time step

6.1 Integration Architecture
6.2 Communications Flow
6.3 Dataflows
6.4 PCI-Express Core
6.5 PCI-Express & Accelerator Interface
6.6 Detailed-view Software-Hardware modified dataflow


List of Tables

1.1 Speedups obtained in different case studies

2.1 Gaussian Generation Methods-Selection Criteria
2.2 Gaussian ICDF Implementation Requirements
2.3 Virtex-4 XC4VFX140-11. Table of resources
2.4 URNG Implementation Results
2.5 Accuracy and Segmentation
2.6 Maximum segment size - Number of segments (searched 21 mantissa bits of accuracy)
2.7 FPGA N(0,1) RNG results
2.8 Variance Reduction implementation Results
2.9 Complete N(0,1) RNG results
2.10 Hardware-Software Comparison
2.11 Parameterisable RNG N(µ,σ)-LogN(µ,σ) RNG results

3.1 Floating-Point Operators Libraries. Four Basic Operators
3.2 Types of floating-point numbers
3.3 Logic Reduction due to Design Simplifications
3.4 Operators Results. Commercial Library

3.5 Operators Results
3.6 Slices type of logic per operator
3.7 Split Slices comparison
3.8 Split Pipeline Stages comparison
3.9 FPGA Resources
3.10 Operators results with the final proposed features
3.11 Required Interfaces

4.1 Exception handling for the exponentiation function in the IEEE-754 standard
4.2 Sub-operators Range Analysis
4.3 Powering function Relative Error (ulp)
4.4 Synthesis results for Virtex-4 (4vfx100ff1152-12) for pow function
4.5 Separate synthesis results for the sub-component (targeting 200MHz)
4.6 Separate synthesis results for Exception unit
4.7 Exception's control results

5.1 Parameters Range
5.2 Implementation Results for the Modified operators
5.3 Implementation Results of the LMM Engine for V5-FX200

6.1 Types of Data Transfers
6.2 Implementation results of the complete accelerator for a V5-FX200
6.3 Test Simulation Features
6.4 Software Profiling
6.5 Main variables to compute per subpath
6.6 Extrapolated Profile (5000 grouped paths)
6.7 LMM, GRNG and LMM+RNG Speedups
6.8 Extrapolated achievable speedup
6.9 Hardware-Software Profiling
6.10 Extrapolated times and measured speedup
6.11 Resources of the Different Xilinx FPGA Families
6.12 Advanced FPGAs extrapolation

1 Introduction

Since the first digital computers were designed and developed, one of the main necessities that have boosted research in science has been the need for higher performance. There has always been a huge number of applications pushing the limits of computer technologies and demanding more performance: scenarios with real-time requirements or excessively long execution times which make the application unfeasible. On the other side, never-ending improvements in silicon technologies keep bringing higher and higher performance: deep nanometer technologies (currently 20 nm is almost available [Ete11] and 14 nm is expected in 2012) which allow the integration of billions of transistors in a single die, different types of logic core devices with different oxide thicknesses and threshold voltages to meet the requirements of high-performance, low-standby-power, or low-operating-power circuit applications, etc.

In addition to those continuous improvements, designers have relied on solutions based on special architectures to accelerate the performance of these applications, with processing units exploiting their common features such as parallelism, repetitive tasks or intensive mathematical processing. Traditionally, these solutions have been of two types:

• Parallel processing computers with parallel processors and/or datapaths.
• Dedicated hardware, accelerators, specialized in computing one type of processing task which complements conventional architectures.

However, this situation has changed. Computers based on conventional architectures and processors have always been much more competitive in price than parallel processing computers. Therefore, as the technology to interconnect different computers to work like just one big supercomputer has been continuously improving, parallel processing computers have been gradually replaced by clusters of conventional computers and multicore processors. Furthermore, as power consumption has become a main concern in recent years [KAB+03], parallelism has become the dominant paradigm in computer architecture and has been incorporated into conventional architectures, mainly in the form of the above mentioned multicore processors. In this new scenario dominated by conventional multicore computers and clusters built with them, acceleration continues to be a great necessity due to the following reasons:

• There are applications that cannot be accelerated by a cluster or a multicore architecture:
  – Single-thread applications.
  – Embedded systems.
  – Single computer environments.
• Energy and space required by large clusters are key limiting factors.
• Multicore processors are based on general purpose cores. Consequently, they are not optimal for all types of computations.

Therefore, while the use of specific parallel processing computers has declined, new solutions continue to appear in the field of hardware accelerators. In particular, the use of complementary hardware accelerators is blossoming and becoming more and more important. Focusing on those problems, acceleration can be provided using different technologies, mainly three of them:

• ASICs, Application Specific Integrated Circuits.
• GPUs, Graphics Processing Units.
• FPGAs, Field Programmable Gate Arrays.

ASICs share the same technology as general purpose microprocessors but they are specifically designed for a particular application and are not controlled by software. Therefore, they can be the most efficient technology for any computational task. However, their use for acceleration purposes is very limited. On the one hand, they do not offer any flexibility, as the task that they perform cannot be modified. On the other hand, the high cost of designing and manufacturing any integrated circuit restricts their use to applications where millions of circuits can be sold.

In the last years, GPUs, which are already accelerators for personal computers that handle all graphics processing, have started to be used to accelerate applications with similar characteristics to graphics processing due to the significant performance they have achieved. Furthermore, the addition of programmable stages in the GPU datapath has allowed generalizing the use of GPUs, leading to general-purpose computing on graphics processing units, GPGPU [RHS+08]. However, applications with complex feedback loops and control or extensive bit handling are not suitable for GPU implementation. Meanwhile, the high power consumption of GPUs restricts their use in certain environments.

As with GPUs, during the last years there has been an enormous advance in FPGAs. Traditionally, FPGAs have been used mainly for prototyping as they offer significant advantages at a suitably low cost [Hau98]: flexibility and ease of verification. Their flexibility allows the implementation of different generations of a given application and provides space to designers to modify implementations until the very last moment, or even correct mistakes once the product has been released. Second, the verification of a design mapped into an FPGA is easier and simpler than in ASICs, which require a huge verification effort.

In addition to these advantages, the technological advances have added great capabilities and performance to FPGAs, and even though FPGAs are not as efficient as ASICs in terms of performance, area or power, it is true that nowadays they can provide better performance than standard or digital signal processor (DSP) based systems. This fact, in conjunction with the enormous logic capacity allowed by today's technologies, makes FPGAs an attractive choice for the implementation of complex digital systems. Furthermore, with their newly acquired digital signal processing capabilities [Xilf], FPGAs are now expanding their traditional prototyping roles to help offload computationally intensive functions from standard processors.

1.1. Motivation

This Thesis is focused on the last point, the use of FPGAs to accelerate computationally intensive applications. Current deep sub-micron technologies allow manufacturing FPGAs with extraordinary logic density and speed. The initial challenges related to FPGA programmability and large interconnection capacitances (poor performance, low logic density and high power dissipation) have been overcome [KR07] while providing attractive low cost and flexibility.

Additionally, nowadays FPGAs provide not only configurable logic. They also provide powerful DSP units (mainly for multiplication and accumulation operations) and embedded RAM memory [Altb, Xild], while high speed elements are also provided for interconnections. Subsequently, the use of FPGAs in the implementation of complex applications, see Section 1.1.2, is increasingly common thanks to the set of features that make FPGAs a powerful alternative for acceleration, see Section 1.1.1. However, several challenges and concerns still remain related to hardware acceleration with FPGAs, Section 1.1.3.

1.1.1. Acceleration Features of FPGAs

Even though the clock frequencies that can be achieved with an FPGA are low when compared to a high-end microprocessor (roughly ten times slower), accelerating applications with FPGAs can potentially deliver enormous performance [HVG+07a] due to the intrinsic nature of the FPGA's architecture, which allows:

• Parallel architectures.
• Cascaded datapaths.
• Deeply pipelined architectures.

When designing an accelerator with an FPGA, two levels of parallelism can be achieved. Firstly, multi-thread parallelism, as the application datapath can be replicated in parallel as many times as possible, the number of resources of the FPGA being the only a priori limit. Secondly, the datapath itself can be a parallel datapath, executing several instructions and operations in parallel, taking into account only data dependencies. Furthermore, datapaths in an FPGA can also be implemented as a complete chain, as all overhead instructions needed in software related to indexes, memory access, conditional loops, etc. can be done in parallel and in advance. Thereby, control instructions and instructions related to moving data do not affect the datapath. Finally, the FPGA's combination of combinational and register logic allows deeply pipelined architectures that process new data every clock cycle, with the only limitations implied by data dependencies or communications.
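To make these two levels of parallelism concrete, the sketch below (illustrative only, not code from this Thesis) shows the typical structure of a Monte Carlo kernel. The outer loop iterations are mutually independent, so the whole loop body can be replicated as parallel datapaths, while the arithmetic chain inside the body maps to a deep pipeline that accepts one new sample per clock cycle; the index and control bookkeeping that costs instructions in software is handled in parallel by the hardware control logic.

    #include <cmath>
    #include <random>

    // Illustrative Monte Carlo kernel structure (a sketch, not code from the Thesis).
    // Outer loop: independent replications -> candidates for datapath replication
    //             (in hardware, each replicated datapath would use its own RNG).
    // Inner loop: chained arithmetic       -> candidate for a deep pipeline.
    double simulate(long n_paths, long n_steps) {
        std::mt19937 gen(123);
        std::normal_distribution<double> gauss(0.0, 1.0);
        double acc = 0.0;
        for (long p = 0; p < n_paths; ++p) {       // independent paths
            double x = 1.0;
            for (long t = 0; t < n_steps; ++t)     // arithmetic chain per step
                x *= std::exp(0.01 * gauss(gen) - 5e-5);
            acc += x;                              // accumulate the per-path result
        }
        return acc / n_paths;                      // Monte Carlo average
    }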

1.1.2. Applications

Computationally intensive applications can be found in almost every field where a computer is used. Nevertheless, the advances in computer technologies have provided solutions to a great number of these applications, making their execution possible with up-to-date conventional computers. However, there exist other complex applications requiring higher performance than the one provided by conventional computers. Additionally, new more complex applications continue to appear. These applications can be classified into three interrelated groups:

• High Performance Computing.
• Real time applications.
• Intensive applications in embedded systems.

The range of applications where FPGAs can be used as accelerators includes any application where some degree of parallelism can be exploited or where a large datapath is required. Related research fields include bioinformatics [EH09], where FPGAs are used for DNA sequencing and dot plotting, or molecular dynamics simulations [SGTP08, CHL08, AAS+07], where the simulation of the motion models for the time evolution of a set of interacting particles is accelerated. Another outstanding research field is the processing of medical images, where real-time features are desirable and can be achieved with FPGAs [GMDH08, DCPS07]. FPGA acceleration also includes fields such as financial simulation [Fel11], geophysical simulation for oil research, aerodynamics research, fluid dynamics, etc. Another interesting research field is the implementation of basic subsystems [ZP08]. While they are not applications by themselves, these linear algebra subsystems are computationally intensive routines common to many applications.

1.1.2.1. Monte Carlo Applications

Finally, one subset of applications stands out: Monte Carlo simulations. Monte Carlo acceleration with FPGAs has been an active research field in recent years, as the intrinsic nature of the Monte Carlo approach can be exploited by FPGA properties. On the one hand, Monte Carlo models repeat thousands of times the same calculations, only varying the values of the underlying random variables. Therefore, they fit perfectly with the FPGA's parallelism capability of replicating the datapath many times. On the other hand, Monte Carlo datapaths are mainly composed of mathematical operations with a small load of control instructions, and therefore are very well suited for exploiting datapath capabilities of deeply pipelined chained datapaths.

This is the case of physical simulations like [PF06], where an FPGA has been used to approach a real-time solution for a radiation transportation problem, [LRL+09], where Monte Carlo is used to calculate light propagation in tissues for medical photodynamic therapy, or radiotherapy treatment planning [FFM+10]. Another remarkable field is financial simulation [ZLHea05, KCL08, MA07, WV08, TTTL10], where the pricing calculation of different financial products is accelerated. Furthermore, circuit design can take advantage of Monte Carlo simulations for timing analysis methods in VLSI design [YTOS11].

In Table 1.1 the speedup factors obtained using FPGAs are summarized for some of these applications, with a maximum speedup of 650. Additionally, not only is simulation time significantly reduced when FPGAs are used; energy consumption is also decreased. This feature is not usually measured; however, some results can be found in the literature where energy consumption is reduced by a factor of 45 [LRL+09].

Table 1.1: Speedups obtained in different case studies

Reference   Field                   Speedup (×)
[EH09]      Bioinformatics          45
[PF06]      Physical Monte Carlo    650
[LRL+09]    Physical Monte Carlo    80
[ZLHea05]   Financial Monte Carlo   26
[KCL08]     Financial Monte Carlo   63
[WV08]      Financial Monte Carlo   50
[TTTL10]    Financial Monte Carlo   24

1.1.3. Designing with FPGAs. Challenges

As just seen, the use of FPGAs for hardware acceleration is an active research field. However, there are still several challenges concerning the use of FPGAs as accelerators. We have identified the following key challenges:

• Availability of Cores.
• Capability and performance of FPGAs.
• Methods, algorithms and techniques suited for FPGAs.
• Design tools.
• Hardware-Software co-design and integration.

1.1.3.1. Availability of Cores

Designing complex applications from scratch with FPGAs makes the design cycle extremely long. It is necessary to analyze the application to design the architecture required for the datapath with the target of achieving the highest possible performance. Furthermore, it must be analyzed how to integrate the control requirements into that datapath. Finally, it is necessary to develop all the required basic elements, such as the RNG or the arithmetic operators, which implies studying the type of arithmetic to be used and the resolution and precision of the numbers (data representation).

Thus, the availability of complete and fully characterized basic elements (i.e. operators) targeting FPGAs has become essential to shorten the time needed to design an application while making its design easier. In this way, when designing any application, it is essential for the designer to have at their disposal libraries of mathematical operators and other components, and this topic, the analysis and design of mathematical operators, is one of the foci of this Thesis. Other components, like communication cores, also play a key role in the development of any application because the hardware accelerator must interact with the host system. Hence, this topic will also be analyzed in this Thesis.

1.1.3.2. Capability and performance of FPGAs

FPGA resources are mainly programmable logic plus interconnections between the logic. Once a design is completed, it is mapped and routed into these logic elements to configure the FPGA with the design functionality. In this way, the results obtained for an application can vary depending on which logic the application requires and how this logic is used (programmed). Additionally, other factors can determine the performance of a design:

• Routing easiness of the design (related to the percentage of the FPGA being used).
• Use of embedded elements.
• Code quality.
• Hard dependencies in the logic.

These issues affect both the capability and performance of FPGAs. With respect to the first issue, as nowadays FPGAs have a huge amount of resources, it usually does not represent a big technical problem, but it can be of great concern with respect to economic issues due to the high cost of the largest FPGAs [CA07].

With respect to the performance that can be achieved, it cannot be exactly predicted before the design is completed, making it difficult to determine whether an application is suited for FPGAs or not. However, if the design only requires characterized cores or well-known structures, the performance can be inferred through the expected clock frequency and throughput.

1.1.3.3. Methods, algorithms and techniques suited for FPGAs

The FPGA's unique combination of features and resources makes it necessary to reevaluate the methods, algorithms and techniques used for computing to decide which ones are the most suited for FPGAs [HVG+07b]. FPGAs offer the designer total flexibility for the selection of signal bit-widths, the arithmetic used, the operations or tasks carried out, the design of the datapaths, etc. Furthermore, the inherent parallel nature of FPGA architectures opens up the design possibilities, allowing chained datapaths with control instructions carried out in parallel.

Adding these features to the set of resources available in current FPGAs, we find that the well-established software computing paradigms have to be reevaluated. The techniques and methods considered optimal for software may not be optimal for FPGAs, or may require changes or even improvements to take advantage of FPGA features. This way, the full exploitation of FPGAs will not only require the adaptation of methods and algorithms but also the development of new techniques specially suited for FPGAs.

1.1.3.4. Design Tools

Hand-written RTL development and debugging are too time-consuming and error prone. The limited availability of good high-level design tools is one of the major shortcomings that a designer has to face when developing a hardware accelerator. Their use can substantially shorten the development cycle, abstracting the designer from low level details, helping in the debugging of the design, and automatically carrying out tedious tasks. Additionally, these tools should provide useful features such as:

• Quick design space exploration.
• Verification aids (test bench generation, software-hardware comparison).
• Support for different arithmetics.
• Automated hardware-software integration.

Currently, the availability of this kind of tools is very limited. The focus is on translating C code to synthesizable RTL code [Tec, Gra]. The possible loss of acceleration associated with the use of these tools is a price worth paying in order to shorten the design time. It is not an objective of this Ph.D. Thesis to develop this kind of tools, but particular attention will be devoted to putting forward the special needs that the development and design of FPGA-based accelerators require.

1.1.3.5. Hardware-Software co-design and integration

When developing a hardware accelerator we are dealing with all the major issues related to hardware-software co-design. First, it has to be decided which parts of the code are going to be executed in the accelerator and which ones remain in software. Second, the communication mechanism between software and hardware has to be defined and implemented. Third, software and hardware have to be integrated. These tasks are complex and require expert designers because they have a strong impact on the performance of the accelerated application.

With respect to the original code, the designer has to evaluate not only which parts of the code are most suited for the accelerator and which ones should remain in software, but also the impact of the data transfers associated with the tasks carried out in the hardware. These data transfers may become a bottleneck in the system, degrading the global performance.

An efficient communication mechanism is required for the synchronization of software and hardware and for the data transfers. From the software point of view, this implies a computational overhead that must be as small as possible. From the hardware point of view, if the communication mechanism is not efficient it could imply that the hardware does not work at its maximum performance, as the hardware accelerator has to wait to transfer data.

Finally, the software and the hardware must be integrated, involving complex and tedious tasks (the modification of the software to invoke the hardware replacing the accelerated code, a driver, a low-level library to control the driver from the software and the hardware itself, etc).
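One common way to keep both sides busy and hide the transfer overhead is to overlap CPU work with accelerator work through double buffering. The sketch below is only an illustration of that pattern under assumed names (prepare_batch, accelerator_run and post_process are hypothetical stand-ins); the actual infrastructure developed in this Thesis, based on two RAM memory zones and a bus-master PCI-Express core, is described in Chapter 6.

    #include <future>
    #include <vector>

    // Hypothetical stand-ins for the CPU-side work and the accelerator call.
    std::vector<double> prepare_batch(int i) { return std::vector<double>(1024, i); }
    std::vector<double> accelerator_run(std::vector<double> in) { return in; }
    void post_process(const std::vector<double>& out) { (void)out; }

    int main() {
        const int n_batches = 8;
        std::vector<double> next = prepare_batch(0);
        for (int i = 0; i < n_batches; ++i) {
            // Hand the current batch to the accelerator (buffer A)...
            auto hw = std::async(std::launch::async, accelerator_run, std::move(next));
            // ...and let the CPU fill buffer B for the next batch in the meantime.
            if (i + 1 < n_batches) next = prepare_batch(i + 1);
            post_process(hw.get());   // synchronize and consume the results
        }
        return 0;
    }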

1.2. Objectives and Thesis Structure

Studying in depth each one of these five challenges related to hardware acceleration is not feasible in just one Thesis. The great variety of applications that can be accelerated and the different features among them imply that the complexity of each task is high. Therefore, in this Thesis we have chosen one subset of applications to be studied, dealing with the implementation of a real application of this subset. Selecting a complex subset of applications, in our case Monte Carlo simulations, allows us to make a general analysis of the main topic, hardware acceleration, from the study, analysis and design of a particular application, since this subset shares several features with many other applications. Specifically, we have selected a financial application, the Monte Carlo based LIBOR Market Model.

Financial simulation is a remarkable research field for FPGA acceleration of Monte Carlo simulations, where obtaining quick and accurate results is essential. However, most financial models are computationally intensive and acceleration is necessary to obtain their results with the required speed. Additionally, there is a great variety of financial models, allowing the selection of a complete model where different hardware acceleration issues can be studied.

The focus of this work is on identifying and providing the key elements of a hardware accelerator and incorporating them into the target application, characterized by high complexity and hard timing requirements. Furthermore, the integration of the hardware and software parts will also be addressed in depth. In this section we will start by presenting the Monte Carlo basics and the target application to identify the elements that play a key role in its hardware acceleration. Next, these elements will be used as a base to formulate the main objectives of this Thesis.

1.2.1. Monte Carlo Simulations and Target Application: LIBOR Market Model

Monte Carlo simulation is often the only tool for dealing with otherwise intractable problems in the areas of scientific calculation or stochastic problems. A particular case of these intractable problems is financial simulation (the simulation of the pricing of financial derivatives), characterized by its extremely complex models and hard execution time constraints. Both requirements have been identified as especially challenging in the case of FPGA-based acceleration, and this is therefore the application case used as a benchmark in this Thesis.

1.2.1.1. Monte Carlo Basis

Monte Carlo simulations rely on the use of random numbers to evaluate mathematical expressions. In these simulations, the main state variables of the system under study are sampled using random number generation and the evaluation of a mathematical expression is repeated several times with different random numbers. Finally, the results are generally obtained as measures of the probability distribution of the system properties. The accuracy of these methods is ensured by the Law of Large Numbers, which guarantees the convergence of the method as the number of simulations grows to infinity. For a finite number of replications an error is introduced due to the existence of variability in the final result.

Monte Carlo methods were first developed [MU49] by physicists at Los Alamos Laboratory in the context of the Manhattan project. Now, they are widely used to solve three types of problems: mathematical problems whose analytical expressions are very complex (such as certain multidimensional integrals), modeling of physical phenomena where there is uncertainty, and finally, systems with a large number of coupled degrees of freedom. Among the last two types of problems, special attention has been paid to the simulation of physical systems such as molecular dynamics [NBK+09], quantum systems [WAG09] or ray tracing, and to the case on which we have focused, the simulation of financial systems [Gla04], where Monte Carlo is used to value portfolios, interest rate options and other financial products, or insurance simulations [PHT+07].
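As a minimal generic illustration of these ideas (an example assumed for clarity, not taken from the Thesis), the sketch below estimates E[exp(Z)] for a standard normal variable Z, whose exact value is exp(0.5). The statistical error of the estimate decreases only as sigma/sqrt(N), which is precisely why so many replications, and hence so much computation, are needed.

    #include <cmath>
    #include <cstdio>
    #include <random>

    // Generic Monte Carlo sketch (not from the Thesis): estimate E[exp(Z)], Z ~ N(0,1).
    // The exact value is exp(0.5); the statistical error shrinks as sigma / sqrt(N).
    int main() {
        std::mt19937 urng(42);                              // uniform pseudo-random source
        std::normal_distribution<double> gauss(0.0, 1.0);   // N(0,1) samples
        const long N = 1000000;                             // number of replications
        double sum = 0.0, sum2 = 0.0;
        for (long i = 0; i < N; ++i) {
            double f = std::exp(gauss(urng));               // evaluate the expression once
            sum += f;
            sum2 += f * f;
        }
        double mean = sum / N;                              // Monte Carlo estimate
        double var = sum2 / N - mean * mean;                // sample variance
        std::printf("estimate %.6f (exact %.6f), std. error %.2e\n",
                    mean, std::exp(0.5), std::sqrt(var / N));
        return 0;
    }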

1.2.1.2. Specific Design Challenges

Even though Monte Carlo methods are used to solve very different problems, all these simulations share three features that turn them into complex applications that are also computationally very intensive:

• Use of random samples.
• Use of complex mathematical expressions.
• A large number of replications.

These features make them ideal candidates for hardware acceleration. Furthermore, the complexity may increase even further due to:

• Complex distributions for the random samples (Gaussian or Log-normal).
• Variance reduction techniques.
• Complex floating-point operations.
• Datapaths requiring control with added complexity (data dependencies).

1.2.1.3. LIBOR Market Model

The LIBOR Market Model (LMM) [BGM97] is a model for pricing interest rate derivatives. This model involves several variables that have to be calculated before obtaining its main variables, the LIBORs. The calculation of these variables can be done with different models and complexities, requiring even exponentiation operators and vector products. Therefore, it provides a perfect scenario to explore different solutions for this type of complex operators and structures. In Chapter 5 the LMM will be explained in detail. However, we can advance the main features of the model from the perspective of an FPGA implementation:

• High quality Gaussian random variables are required.
• The use of variance reduction techniques is mandatory to reduce the simulation time. In particular, we will study Latin Hypercube.
• High accuracy requirements: a huge number of replications and floating-point arithmetic.
• Complex floating-point operations, such as the exponentiation function, are required.
• There are computationally intensive routines, such as the correlation of variables.
• Presence of complicated data dependencies.
• Complex control due to the synchronization of different simulation scenarios.

In this way, the LIBOR Market Model is a demanding benchmark, ideal for our purpose of studying the identified design challenges related to FPGA hardware acceleration and, in addition, a good application to investigate whether FPGA acceleration is a feasible solution for financial simulation.

1.2.2. Objectives

As just mentioned, Monte Carlo simulations are especially well suited for being accelerated using reconfigurable hardware. Nevertheless, developing an FPGA Monte Carlo simulation from scratch is almost impossible, or at least very time consuming. However, if the designer can use predesigned libraries of the most common elements, the development time can be substantially reduced, and this is the first focus of this Ph.D. Thesis.

Following this scheme, one of the main objectives is to study the common elements that play a key role in Monte Carlo simulations and in our target application. Two common elements stand out: first, the random number generators that are required for the underlying random variables, and second, the floating-point operators that are the base elements for implementing the mathematical models that are evaluated.

In this way, the first objective of this Ph.D. Thesis is the study, design and implementation of random number generators. In particular, we focus on Gaussian random number generation and the implementation of a complete generator compatible with variance reduction techniques that can be used for our target application and for other applications.

Meanwhile, the second objective deals with the implementation of efficient and FPGA-oriented mathematical operators (complex and using floating-point arithmetic). We focus on the design, development and characterization of libraries of components. Instead of focusing on the algorithms of the operators, our approach is to study how the format can be simplified to obtain operators that are better suited for FPGAs and present better performance. One important goal here is to achieve libraries of general purpose components that can be reused in several applications and not just in a particular target application.

Related to the target application, a third objective of this work is to study in depth the implementation of a particular operator, the exponentiation function. This operator is required in many scientific and financial simulations. Its complexity and the lack of previous general purpose implementations deserve special attention.

The next objective is related to the global purpose of the Thesis: validating all the previously developed elements through the implementation of a complex Monte Carlo simulation that involves all the features that can be found in Monte Carlo simulations. In this way, we deal with the implementation of the target application, the LIBOR Market Model. Special attention is devoted to all the features, requirements and circumstances that affect the performance of the accelerator.

To validate all the research done, the next objective is to obtain the experimental results provided by the developed FPGA accelerator, which are validated against the original software implementation. Three main features will be analyzed:

• Correctness of the results obtained.

• Accuracy.

• Speedup factors obtained by the global application and by each of the main components.

Finally, the last objective is the integration of the hardware accelerator within the original software application. All issues related to the communication mechanism are studied, putting special focus on how performance is affected by data transfers and by the hardware-software partitioning policy implemented.

Figure 1.1: Thesis Structure.

1.2.3. PhD Thesis Structure

As mentioned before, the scheme followed in this Thesis is a bottom-up methodology with respect to the target application, starting with the components and cores needed, following with the implementation of an accelerator, and finishing with its integration with the original software application.

The structure of this Thesis follows this methodology, and we have organized this document in five main chapters, each one corresponding to one of the key research objectives previously identified:

2. Random number generation.
3. Libraries of floating-point operators.
4. Exponentiation function.
5. Development of the target application (LIBOR Market Model).
6. Hardware-software integration.

Figure 1.1 shows how these objectives interact and in which chapters they are discussed. The first three objectives tackle the study, design and implementation of the common elements that are required in Monte Carlo simulations, random number generators and mathematical operators (Chapters 2, 3 and 4). Three main features are sought: high performance cores, reusability and specific design to take advantage of the FPGA set of resources.

In particular, Chapter 2 focuses on Gaussian random number generators (they also comprise uniform random generators) as they are widely used in Monte Carlo simulations. Three main topics are studied with respect to FPGA acceleration: uniform random number generators, Gaussian random number generators and variance reduction techniques. Finally, Chapter 2 ends with the implementation of a parameterizable Gaussian random number generator compatible with variance reduction techniques that will be the base for the random generation of the target application.

Regarding mathematical operators, Chapter 3 concentrates on floating-point arithmetic, as this is the arithmetic required for many scientific and financial Monte Carlo simulations. The floating-point standard and format are studied and several simplifications are proposed. Additionally, the capabilities of FPGAs to implement data chains with a high number of operators are studied. Meanwhile, in Chapter 4 we analyze in further detail a complex operator, the exponentiation function, focusing on how we can take advantage of FPGA flexibility to ensure an accurate result.

Once we have studied the key elements for any Monte Carlo simulation, random generators and mathematical operators, the next issue we face is the implementation of an accelerator corresponding to a real application and the challenges this implies (Chapter 5). In this chapter, we study how a complex model can be implemented in a hardware accelerator and all the limitations and restrictions that we have to face. Finally, the accelerator has been integrated within the original application in a personal computer system (Chapter 6). In this chapter, the software-hardware partition policy that we have followed for the implementation of the LMM core can be found.

Each chapter is almost a complete study of the topic under analysis, comprising theory, study, design, implementation and results. In the same way, each chapter also includes its own review of related works.

2 Random Number Generation

As in other stochastic simulations, a key element for any Monte Carlo simulation is a good random number generator to sample the variables under study. Moreover, good quality random numbers are a must, since the quality of the results obtained is directly related to the quality of the random numbers used.

Depending on the nature of the model simulated using the Monte Carlo method, the required random numbers will follow specific probability distribution functions. However, almost all random number generation methods rely on the use of a base uniform random number generator (URNG) whose samples are transformed into the target distribution following some method or equation. Therefore, using a good URNG is a key issue for any random number generator (RNG).

When random numbers are related to the Monte Carlo method, besides the quality of the numbers and the specific distribution needed, one more fact has to be taken into account: the compatibility with variance reduction techniques. The huge number of replications of the model needed in Monte Carlo simulations impacts directly on the total simulation time required. These techniques have been developed to reduce the number of replications and hence the total simulation time, so random number generation has to be studied taking these techniques into account.

In this chapter, random number generation is studied from the global perspective of Monte Carlo simulation and considering FPGA implementation issues. First, an overall introduction to RNGs is provided, followed by an analysis of URNGs. Afterwards, Gaussian RNGs are studied, as the Gaussian distribution is one of the most common distributions for Monte Carlo simulations and is the one required by our target application, the LMM. Then, variance reduction techniques are introduced. Hardware-related issues are discussed for all these topics while some generators for FPGAs are developed.

The main objective of this chapter is focused on this last point: the study and implementation of FPGA generators oriented to hardware acceleration. In this way, special attention is devoted to the implementation of a parameterizable Gaussian RNG compatible with variance reduction techniques and designed specifically to be used in accelerators. This element is not only a key component for our selected application and many other Monte Carlo simulations; its quality is also decisive for the accuracy of the simulation results. Hence, fulfilling high quality requirements is another important issue in this chapter. As exposed in Section 2.3.2, in the literature we cannot find any FPGA Gaussian RNG that fulfils this criterion in combination with all the other criteria that we have identified for a Gaussian RNG. Therefore, a new Gaussian generator is developed, focusing on three main components:

• The gaussian generation method selected, the inversion method, and how we adapt it to FPGAs.

• A high-performance uniform RNG, a Mersenne Twister one, to be used as base generator for the inversion-based gaussian RNG.

• A parameterizable variance reduction techniques core to be used in combination with the uni- form RNG.

2.1. Random Number Generation: Overall Introduction

An ideal RNG fulfills two main characteristics. Firstly, it generates random numbers whose distribution follows exactly the target distribution. Secondly, the random numbers should be uncorrelated, in other words, mutually independent.

However, except for some rare generators based on physical events like radioactivity or the electron's spin, and thereby not feasible to be used here, both hardware and software conventional random number generators cannot completely fulfil the second characteristic. Generators rely on equations which use previously generated random numbers or on events that are not completely random. Consequently, it is more adequate to talk about pseudo-random generators, and taking this into account, the main feature that a good generator should accomplish is that, for the given quantity of random numbers needed, the generator behavior resembles the behavior of an ideal generator.

Obviously, this is not a measurable characteristic, but an idea of how good a random number generator is can be obtained by looking at:

1. Periodicity: algorithm-based generators have a period. The sequence of generated numbers starts repeating after a period since algorithms are deterministic and have a finite number of states.

2. Randomness: the generated sequence must behave as a truly random sequence. Randomness is not a quantifiable parameter but there are two methods to evaluate it:

• Theoretical properties of the algorithm.
• Statistical tests.

Additionally, other important features to consider for a random number generator are:

1. Reproducibility: the capacity of generating the same sequence again. In algorithm-based generators, if the same seed is used, the sequence will always be the same.

2. Speed: number generation throughput. Generation speed is very important, as many applications demand a huge number of random values for their simulations.

3. Portability: the same generator should produce the same sequences of numbers on different computing platforms (either Hardware-based or Software-based platforms).

2.2. Uniform Random Number Generation

Uniform Random Number Generators (URNG), which provide a uniform distribution, in particular over the interval [0,1), are the most common generators, as generators following other distributions almost always need a URNG as a base generator. Computer URNGs are mainly based on algorithms relying on integer operations that generate a uniform distribution of integers in the interval [0,m) and then scale them to [0,1) by dividing by m. More complex generators follow the same scheme, except that they combine the results from several basic generators before the scaling in order to improve the theoretical properties of the basic algorithms.

Developing good URNGs is quite easy (both in hardware and software) and has been studied in depth [L'E97]. Multiple URNGs can be found in the literature based on different methods: linear congruential [Leh51], Mersenne Twister [MN98] or Tausworthe [Tau65] generators, to name a few. Most of them, although only involving bitwise operations or simple equations, present good quality (high period and good randomness), so the random samples are efficiently generated without increasing the complexity of the Monte Carlo simulation.

In the research field of specific URNGs for FPGAs, several works by D. B. Thomas et al. stand out, such as [TL07, TL10], where several URNGs are proposed, studied and developed exploiting the specific resources and architecture of FPGAs. The development of new URNGs has been widely studied in the last years and is out of the scope of this Thesis. However, a brief explanation of the most common methods is presented next to explain why we have selected the Mersenne Twister generator and its advantages with respect to other generators.

2.2.1. Linear Congruential Generators (LCG)

LCGs are recurrences of the following form:

x_{i+1} = (a x_i + b) mod m    (2.1)
u_{i+1} = x_{i+1} / m    (2.2)

where a, b and m are positive integer constants. In this method, as in all others that use a recurrence, an initial value known as the seed, x_0 (between 0 and m−1), is necessary to start generating values. As a recurrence-based method, the sequence of random numbers is generated from the seed, in this case following the previous equations. The use of a seed and an equation gives these generators two common features:

• For the same seed, the generated sequence is always the same, as the algorithm is deterministic.

• As a generated value only depends on the previously generated number, when the seed is repeated the sequence of numbers starts repeating from that value. In this way, these generators are periodic (m is not infinite, and it is the maximum number of different values) and this period is independent of the seed used.

There are two types of LCG which differ in the value of the constant b. When b is equal to 0 the generator is pure, while if b is not 0 it is a mixed LCG. For each type there are some conditions to ensure that the LCG has a full period of m numbers (all numbers between 0 and m − 1 are generated before any value is repeated).

2.2.1.1. Pure LCG

x_{i+1} = (a x_i) mod m,    u_{i+1} = x_{i+1} / m    (2.3)

As b is 0, pure LCGs generate uniform values in the interval (0,1). Zero is not reached because, once it is obtained, all subsequent values would also be zero. Hence the full period has m − 1 values. Pure LCGs are also known as multiplicative LCGs because the conditions between a and m to ensure the full period are multiplicative conditions:

• m is a prime number.

• a is a primitive root of m:

– a^{m−1} − 1 is a multiple of m.
– a^{j} − 1 is not a multiple of m for j = 1, 2, ..., m−2.

• x_0 ≠ 0 (if the seed is 0, all generated numbers are 0).
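For illustration only, the following C sketch (not taken from this Thesis) implements a pure LCG satisfying these conditions, using the well-known Park-Miller "minimal standard" parameters m = 2^{31} − 1 (prime) and a = 16807 (a primitive root of m):

    #include <stdint.h>

    /* Pure (multiplicative) LCG: x_{i+1} = (a * x_i) mod m, u = x / m.
     * Park-Miller parameters give full period m - 1.                     */
    static uint64_t lcg_state = 1;           /* seed, must be non-zero    */

    double lcg_next(void)
    {
        const uint64_t a = 16807u;
        const uint64_t m = 2147483647u;      /* 2^31 - 1                  */
        lcg_state = (a * lcg_state) % m;     /* 64-bit product avoids the
                                                overflow problem discussed
                                                below                     */
        return (double)lcg_state / (double)m;    /* uniform in (0,1)      */
    }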

2.2.1.2. Mixed LCG (MLCG)

x_{i+1} = (a x_i + b) mod m,    u_{i+1} = x_{i+1} / m    (2.4)

Now b is not zero, so the full period includes zero as a valid value [0,1). The conditions to ensure the full period are:

• b and m are relatively prime (their only common divisor is 1, i.e., gcd(b,m) = 1).

• Every prime number that divides m divides a − 1.

• a − 1 is divisible by 4 if m is divisible by 4.

2.2.1.3. Problems related to LCG

LCGs have two main problems. The first one relates to the randomness properties of the method. Each value of the sequence is obtained directly from the previous one, so the randomness achieved is not very good, as the correlation between numbers can be high. The second problem is computational: to achieve very high periods, very high values of m are necessary, and this can create overflow problems in the multiplication.

To solve both problems, it is common to employ more complex generators, combined generators, based on the LCG recurrence. There are two types of combined generators. One type uses several previous values to generate the next one (instead of using only the previous one). The other type generates the random variable by combining several LCGs (simple or combined ones).

2.2.2. Combined Generator Rand2

One of the combined generators with the best quality and most extensively used is the one known as rand2 (as it is referred to in [PTVF88]). This generator is based on the combination of several multiplicative LCGs [L'E88] according to the following equation:

x_i = (∑_{j=1}^{l} (−1)^{j−1} s_{j,i}) mod (m_1 − 1)    (2.5)

where l generators are combined, with a period

ρ ≤ (∏_{j=1}^{l} (m_j − 1)) / 2^{l−1}    (2.6)

In the specific case of rand2, two multiplicative LCGs are combined, while the algorithm is additionally improved with a mixing technique to scramble the sequence of the generator: the Bays-Durham shuffle [BD76]. This technique consists in storing a group of calculated random variables in a table and randomly reading one of them. In this way, in each iteration a random variable is read from a position of the table while the random variable calculated in that iteration is stored in the same position. Thus, the scrambling affects both the sequence of output numbers and the generation of the next value, as it depends on the value read in the previous iteration.
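A simplified sketch of the Bays-Durham shuffle is shown below. It is wrapped around a generic base generator next_base(), which is a placeholder and not a routine from this Thesis; rand2 itself additionally combines two multiplicative LCGs as described above.

    #define SHUFFLE_SIZE 32

    /* Bays-Durham shuffle: a table of previously generated values is read
     * at a position chosen by the previous output; the value read becomes
     * the output and the freshly generated value takes its place.         */
    static unsigned long table[SHUFFLE_SIZE];
    static unsigned long last_output;        /* drives the table index     */

    extern unsigned long next_base(void);    /* placeholder base generator */

    void shuffle_init(void)
    {
        for (int i = 0; i < SHUFFLE_SIZE; i++)
            table[i] = next_base();          /* fill the table             */
        last_output = next_base();
    }

    unsigned long shuffled_next(void)
    {
        int j = last_output % SHUFFLE_SIZE;  /* position from previous read */
        last_output = table[j];              /* value read is the output    */
        table[j] = next_base();              /* new value stored in its place */
        return last_output;
    }

Note how the index of the next read depends on the value read in the previous iteration, which is the property highlighted in the text.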

2.2.3. Tausworthe Generators

2.2.3.1. Tausworthe-LFSR (basic)

The basic Tausworthe generator is an LCG which combines several values of the recurrence to generate the next value, following the equation:

b_i = (a_1 b_{i−1} + a_2 b_{i−2} + ... + a_k b_{i−k}) mod 2    (2.7)

where the a_i coefficients are equal to 0 or 1 and the b_i values are also binary values [Tau65]. As the modulus is a prime number, the linear recurrence is determined by the polynomial:

P(z) = z^k − (a_1 z^{k−1} + ... + a_k)    (2.8)

and, if P(z) is a primitive polynomial, the linear recurrence will have the full period ρ = 2^k − 1. In this way, uniform numbers can be obtained with the equation:

u_n = ∑_{i=1}^{L} b_{n+i−1} 2^{−i}    (2.9)

where L is the precision of the number that we want to generate. The problem with this type of implementation is that the uniform variable is generated from only one step of one linear recurrence, and thereby all its bits keep some correlation between them.

2.2.3.2. Combined Tausworthe

To solve the above mentioned problem, several options can be considered. First, each bit for obtaining u_n can come from a different, independent linear recurrence:

u_n = ∑_{i=1}^{l} b_{n,i} 2^{−i}    (2.10)

for l independent linear recurrences.

A second option is to obtain u_n from only one linear recurrence, but from s steps of that recurrence:

u_n = ∑_{i=1}^{L} b_{ns+i−1} 2^{−i}

In this case, if ρ = 2^k − 1 and s are coprime, then u_n will have a full period equal to ρ. To generate u_n from u_{n−1}, s steps of the linear recurrence are needed. But if some conditions are fulfilled [L'E96]:

• P(z) is a primitive trinomial with the form:

P(z) = z^k − z^q − 1  =>  b_i = (b_{i−k+q} + b_{i−k}) mod 2

• 0 < 2q < k.

• 0 < s ≤ k − q < k ≤ L.

then these s steps can be obtained very quickly, see Section 2.5.1.1. Finally, several generators following this scheme can be additionally combined into just one generator by performing a bitwise XOR of the u_n of each generator [L'E96]. If their polynomials are coprime, the period of the combined generator will be the least common multiple of the periods of the individual generators.

2.2.4. Mersenne Twister

The Mersenne Twister [MN98] (MT) URNG presents very high quality while achieving a huge period, 2^{19937} − 1 in its most used configuration. Nowadays, this URNG is widely used for Monte Carlo simulations due to its high quality and high performance. In this generator, groups of w bits are handled as vectors and a linear recurrence is applied to those vectors instead of to single bits:

x_{k+n} = x_{k+m} ⊕ (x_k^u | x_{k+1}^l) A    (2.11)

where the x vectors are formed of w bits. Finally, to improve the statistical properties of the generator, the output random numbers of the generator are not the vectors of the linear recurrence. Instead, the numbers generated in the recurrence are modified, tempered, with a bitwise multiplication by a w × w binary matrix. In Section 2.5.1.2, a more in-depth analysis of this generator can be found.

2.3. N(0,1) Gaussian Random Number Generation

Most computationally intensive Monte Carlo simulations require a high quality Gaussian Random Number Generator (GRNG), in particular with Normal distribution N(0,1). Furthermore, the software complexity of this type of generators makes them ideal candidates for hardware acceleration. Therefore, developing a high-quality, high-performance hardware GRNG is essential for any Monte Carlo hardware accelerator.

2.3.1. Generation methods

Implementing an N(0,1) Gaussian random number generator can be done using several methods, such as acceptance-rejection, Wallace [Wal96], Box-Muller [BM58] or inversion [BFS83]. Furthermore, all of them are suitable for FPGA implementation [ZLL+05, LLZ+05, LLVC06, LCVL06] (a comparison of results between them can be found in [LCVL06]). Next, the main features of these methods are briefly introduced.

2.3.1.1. Acceptance-Rejection Methods

These methods are based on the generation of another distribution, similar to the target one, which can be easily generated. The samples from this base distribution are candidates for generating the target distribution and are accepted or rejected (subsampled) according to a mechanism designed to select candidates of the target distribution.

The main features of these methods are determined by the inherent nature of the method. On the one hand, the complexity of generating some distributions is reduced, as a more easily generated base distribution is used for obtaining the candidates. On the other hand, the samples of the base distribution are rejected with a probability directly dependent on the difference of density between both distributions and, therefore, a constant throughput of generated samples cannot be ensured.

2.3.1.2. Wallace Method

The Wallace method is based on the idea of obtaining Gaussian variables in the same way as uniform variables are obtained from previous uniform variables, following some recurrence. In particular, this method transforms a vector of K Gaussian variables into K new ones using an orthogonal transformation with a matrix. One of the main features of the Wallace method is that it requires an initial pool of normalized Gaussian variables whose average square value is one. This feature implies the need of a scaling factor to correct the value of the generated variables. The other main feature is that the correlation between variables has to be carefully handled, as new Gaussian variables are generated from previous Gaussian variables.

2.3.1.3. Box-Muller Method

The Box-Muller method employs a straightforward transformation to convert two uniform variables, u_1 and u_2, into two Gaussian ones, x_1 and x_2, following the equations:

x_1 = √(−2 ln u_1) cos(2π u_2),    x_2 = √(−2 ln u_1) sin(2π u_2)    (2.12)

These equations determine the main feature of this method, the use of complex operators to obtain the Gaussian variables, and its main difference with respect to other methods: in each iteration two Gaussian variables are obtained instead of one.
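For reference, a direct C transcription of Equation 2.12 is sketched below; uniform01() is a hypothetical U(0,1) source assumed never to return exactly zero, so that the logarithm is well defined.

    #include <math.h>

    extern double uniform01(void);   /* hypothetical U(0,1) source, u1 > 0 */

    /* Box-Muller transform (Eq. 2.12): two uniforms -> two N(0,1) samples. */
    void box_muller(double *x1, double *x2)
    {
        const double two_pi = 6.283185307179586;
        double u1 = uniform01();
        double u2 = uniform01();
        double r  = sqrt(-2.0 * log(u1));    /* common radius term          */
        *x1 = r * cos(two_pi * u2);
        *x2 = r * sin(two_pi * u2);
    }

The square root, logarithm and trigonometric functions are the "complex operators" referred to above.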

2.3.1.4. Inversion Method

The inversion method is a general method to generate any probability distribution using the inverse function of its corresponding cumulative distribution function (CDF) and uniform variables. The uniform variables correspond to values of the cumulative probability, so the variables of the desired distribution are obtained by calculating, with the inverse function (ICDF), the values that generate those probabilities. This conversion will be addressed in more detail in Section 2.3.3.

Therefore, this method implies a direct transformation of uniform variables into the desired variables, and its main features cannot be generalized, as they will depend on the target distribution and on how the inverse function is implemented. However, one very important feature stands out: as the basis of this method is a one-to-one transformation using the ICDF, the obtained sequence of variables with the desired distribution will keep the structural properties of the uniform sequence.

2.3.2. Monte Carlo Implications and Hardware Implementation

Due to the stochastic nature of the Monte Carlo simulation and according to the Law of Large Numbers, the convergence of a Monte Carlo simulation is ensured when the number of replications of the system under study grows to infinity. For a finite number of paths an error can be introduced due to the variability of the final result. This variability strongly depends on the variance of the underlying variables. For example, if the arithmetic mean of a function f of the system is to be obtained, its expression, for m simulations, would be:

µ = E[f] ≈ (1/m) ∑_{i=1}^{m} f_i    (2.13)

where f_i is the evaluation of f for each replication. According to the Central Limit Theorem, the standard deviation of the approximated value of µ would be:

σ = Std[µ] = Std[f] / √m    (2.14)

The standard deviation implies a confidence interval for the approximated value obtained: the smaller the interval, the more accurate the value. The confidence interval can then be reduced either by increasing the number of replications m or by reducing the variance of f.

Increasing the number of replications m has a very important drawback. The statistical nature of these methods makes this type of simulations computationally very intensive. The calculation time grows linearly with the number of replications, but the standard deviation decreases only as the square root of it. Therefore, reducing the result deviation by increasing m has an important impact on the execution time. Moreover, this increase has very little impact on the standard error when m is already big, because 1/√m decreases very smoothly (Figure 2.1).

The second alternative, reducing the variance of f, relies on a set of methods known as variance reduction techniques [Gen98, Gla04]. These techniques are based on smart sampling of the space of the underlying random variables so that the variance of f is reduced, thus reducing the number of replications required and the execution time needed, while having a limited impact on the calculation time. In Section 2.4 variance reduction techniques will be presented in depth. As the main problem we are dealing with is calculation time, the compatibility of the Gaussian generation method with these techniques becomes a must.
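As a concrete illustration of this trade-off, which follows directly from Equation 2.14:

σ(m) = Std[f] / √m    ⟹    σ(4m) / σ(m) = √(m / 4m) = 1/2,

i.e., halving the confidence interval by brute force requires four times as many replications and, since the calculation time grows linearly with m, roughly four times the execution time, whereas a variance reduction technique that halves Std[f] achieves the same effect at a much smaller cost.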


Figure 2.1: Plot of 1/√m.

Also related to the calculation time, two more issues have to be handled. To fully exploit the FPGA's capacity for large pipelined datapaths, one important feature for a hardware GRNG is that it must generate one random number per clock cycle, so the datapath is not halted waiting for new random numbers. Furthermore, the frequency at which the random numbers are generated should be high, so the performance of the whole datapath is not limited by the GRNG. In a hardware implementation, this is closely related to whether a generation method can be pipelined or not.

Finally, one more issue has to be studied: which arithmetic is more suited to generate the random numbers. Gaussian distribution variables are concentrated around zero, reaching values like 5.8 × 10^{−8} (32-bit integer uniform input with the pre-scaled value of 2^{31} + 20), while in the tails the Gaussian distribution reaches up to 6.23. Therefore, the arithmetic employed must provide enough resolution to represent those values with high accuracy and precision, considering that the most significant bit can be displaced by more than 20 bits. Hence, we have considered that the generation method should be compatible with floating-point arithmetic, as it adapts better to this range of values than fixed-point arithmetic (which would require a high precision to guarantee high quality samples).

In summary, four implications or criteria have to be considered when selecting the most suited method for Gaussian random generation:

1. Compatibility with variance reduction techniques.
2. Possibility of generating one sample per cycle.
3. High clock rate, or possibility of pipelining to achieve it.
4. Compatibility with floating-point arithmetic.

Due to the combination of these four criteria, we have selected the inversion method as the generation method most suited for Monte Carlo based simulations, as it is the only method which can fulfill the four criteria, see Table 2.1. Additionally, the inversion method has another advantage: it is a general method suitable for other target distributions, so the framework developed for an inversion GRNG can be reused for RNGs of other distributions.

Table 2.1: Gaussian Generation Methods - Selection Criteria.

                     Acc-Rej   Wallace   Box-Muller   Inversion
Variance Reduction   Hard      Hard      Hard         Easy
Sample/Cycle         No        Yes       Yes          Yes
Pipeline             Yes       No        Yes          Yes
Floating-point       Yes       No        Yes          Yes

With respect to the first criterion, the inversion method is the only one where variance reduction techniques can be applied on the base uniform variables, while in the other three methods they should be applied over the Gaussian variables, see Sections 2.3.3 and 2.4. Regarding the second criterion, the acceptance-rejection method cannot ensure one sample per cycle, as some samples are rejected. Furthermore, the application of variance reduction techniques over the Gaussian variables instead of over the base uniform ones can have an impact on this criterion.

The third criterion is fulfilled by all the methods except the Wallace one. Due to the nature of the recursion it uses, if pipelining is introduced, it cannot achieve one sample per cycle. In the other three methods, all the involved mathematical operations can be pipelined, their base generators (uniform for inversion and Box-Muller, or the one selected in acceptance-rejection) being the only part that cannot be pipelined in case they depend on a recursion based on the last generated value. As these base generators are usually very fast, they do not compromise the global clock frequency. Finally, again the Wallace method is the only one that does not fulfill the last criterion: its recursion is between fixed-point Gaussian samples whose conversion to floating-point samples implies a degradation of the distribution quality.

Although the inversion method can fulfil all the criteria, this is not the case for the inversion-based Gaussian RNG found in the literature, [LCVL06]. That particular implementation is a fixed-point implementation whose samples cannot be converted to high quality floating-point samples due to its limited quality and resolution, see more details in Section 2.5.2.3. Therefore, we have developed a new inversion-based generator. Next, the theoretical basis of the approach we have followed is studied.

2.3.3. Inversion Method with Quintic Hermite Interpolation

As stated before, inversion is a general method to generate any probability distribution using the inverse function of the cumulative distribution function (CDF) of the desired distribution: 2.3 N(0,1) Gaussian Random Number Generation 27

Figure 2.2: Inversion Method.

CDF(X ≤ x) = F(x) = ∫_{−∞}^{x} f(t) dt    (2.15)
X = F^{−1}(U)    (2.16)

where U is a uniform distribution from which the desired distribution, f(x), is obtained, F(x) being the cumulative distribution of f(x). In Figure 2.2 the method is depicted for N(0,1). The generated uniform variables correspond to values of the cumulative probability (the y axis in the GCDF graphic of the figure). Then the distribution variables are obtained by calculating, with the inverse function, the x values that generate those probabilities.

Although this method relies on this simple idea, it has a great advantage: this kind of transformation of uniform samples into the target distribution samples keeps the structural properties of the uniform sequence, and thereby variance reduction techniques can be directly applied to the uniform variables. With this method, any distribution can be generated if its F^{−1} is known. However, F^{−1} can be a very complex function, as is the case for the Normal Gaussian distribution N(0,1), so it has to be approximated and numerical algorithms are needed for its computation.

Table 2.2: Gaussian ICDF Implementation Requirements.

                            +/−    ×    /    √    exp    ln
erfc^{−1} [Mor83]            37   34    3    2     1      2
Interpolation (degree n)      n    n    n    −     −      −

2.3.3.1. Direct Inversion

The inverse function approximation can be done in several ways, and the most direct one is to perform a global approximation, where F^{−1} is approximated by just one function valid for any value in the range (0,1). This method is the most intuitive one, and for any uniform value in (0,1), the target distribution variable is obtained just by computing F^{−1}. The problem is how to obtain F^{−1}; in our case, the inverse function of the CDF of N(0,1), where the CDF is:

CDF(x) = (1/2)(1 + erf(x/√2)) = (1/2) erfc(−x/√2)    (2.17)

where erf and erfc are the error function and the complementary error function. To obtain the inverse function of the CDF it is necessary to obtain the inverse of the erf or the erfc functions. However, both functions are very complex to calculate. In particular, and as can be seen in Table 2.2, the inverse erfc function calculated with the method developed in [Mor83] requires a huge number of operations, among which we can find even exponential and logarithm functions.

2.3.3.2. Interpolated-Segments Inversion

Another way to compute the inverse function is to approximate it by segments or intervals. In this case, the range of the inverse function is divided into several segments, and then the inverse function is approximated in each segment by interpolation following a polynomial equation. Unlike the previous method, interpolation inversion is not a direct method. Before the inversion is calculated for any uniform value, the segment that corresponds to the uniform input value has to be determined. Then, the inversion is made by applying the interpolation of that segment to the uniform value.

The most common technique to carry out interpolated-segments inversion is to use the same type of interpolation for all the segments. Thereby, the same general inversion function is used for all the segments, with the polynomial coefficients differing for each segment but not the operations. A very important advantage of this method is that the coefficients, for any type of interpolation, do not change as long as the starting and ending points of each segment do not change. This means that they do not have to be calculated at execution time, so we can calculate them in advance and store them in memory tables. At execution time, for each uniform value, the memory tables are accessed to read the coefficients corresponding to its calculation segment, which are then applied to the interpolated inverse function.

The overall characteristics of the inversion by interpolation will depend on the interpolation method, on the selected degree and on the segmentation policy. These features will determine the quality of the inversion (how accurate the approximation is) and how many segments are needed for a given accuracy. There exist several methods for spline interpolation, such as:

• Chebyshev

• Legendre

• Jacobi

• Hermite

Among them, Hermite interpolation stands out due to three facts [HL03]. Firstly, the better results obtained for the same interpolation degree with respect to the other methods, as it takes into account the density of the function. Secondly, Hermite interpolation is a local approximation and its accuracy can be improved just by introducing more segmentation points where needed, without recalculating all the segments. And finally, F^{−1} is a monotonically increasing function, and the monotonicity properties of Hermite interpolation ensure that the interpolation will also be monotonically increasing. One last advantage is that the calculation of the polynomial constants is easier than with the other methods.

2.3.3.3. Quintic Hermite Interpolation Equations

In [HL03] several polynomial degrees of Hermite interpolation (linear, cubic and quintic) are studied. The quintic interpolation obtains the best results in terms of accuracy (almost exact) and number of segments. Therefore, we have selected quintic interpolation to develop a GRNG.

For each segment of the GCDF [p_i, p_{i+1}], where u_i = CDF(p_i), the inverse function is interpolated with a degree-5 polynomial:

H_i^5(ū) = a_{i0} + a_{i1} ū + a_{i2} ū^2 + a_{i3} ū^3 + a_{i4} ū^4 + a_{i5} ū^5    (2.18)

where ū = u − u_i, with u being the uniform variable to invert, u_i the starting point of the segment and u_{i+1} the starting point of the next segment. For each segment the values of the coefficients are [DEH89]:

a_{i0} = p_i    (2.19)
a_{i1} = 1/f_i    (2.20)
a_{i2} = −f'_i / (2 f_i^3)    (2.21)
a_{i3} = (3 f'_i/f_i^3 − f'_{i+1}/f_{i+1}^3) / (2∆u) + (10∆s − 6/f_i − 4/f_{i+1}) / ∆u^2    (2.22)
a_{i4} = (−3 f'_i/f_i^3 + 2 f'_{i+1}/f_{i+1}^3) / (2∆u^2) + (−15∆s + 8/f_i + 7/f_{i+1}) / ∆u^3    (2.23)
a_{i5} = (f'_i/f_i^3 − f'_{i+1}/f_{i+1}^3) / (2∆u^3) + (6∆s − 3/f_i − 3/f_{i+1}) / ∆u^4    (2.24)

where f_i = f(p_i), f'_i = f'(p_i), ∆u = u_{i+1} − u_i, ∆p = p_{i+1} − p_i, ∆s = ∆p/∆u and (F^{−1}(u_i))'' = −f'(p_i)/f(p_i)^3.

2.4. Variance Reduction Techniques

As introduced in Section 2.3.2, variance reduction techniques are essential to reduce the total simulation time required by a Monte Carlo simulation, as they reduce the total number of replications required to obtain the result with a given confidence interval. To reduce the variance between the results of the different replications, these techniques modify the sequence of random numbers provided by the random number generator to cover the space of the underlying variables in a better way than with a pure random sequence. To do so, variance reduction techniques introduce some dependencies in the random numbers. These dependencies must be introduced across replications, since if they were introduced within replications the result obtained would be meaningless. Among these techniques, the following stand out:

• Control Variates.

• Antithetic Variables.

• Stratified Sampling.

• Latin HyperCube.

• Importance Sampling.

An extensive description of these techniques can be found in [Gen98, Gla04]. However, in this Thesis we are going to focus on two of them, Stratified Sampling and Latin Hypercube, as they are the most general of these techniques and can be applied in a similar way to all Monte Carlo simulations.

2.4.1. Stratified Sampling and Latin Hypercube

The stratified sampling technique is based on the idea of better covering the space of a random variate by dividing it into n subsets or strata A_i. Let us assume that f depends on the random variate r. The expected value of f is then calculated according to:

E[f] = ∑_{i=0}^{n−1} E[f(r) | r ∈ A_i] · P(r ∈ A_i)    (2.25)

where E[f(r) | r ∈ A_i] is calculated by Monte Carlo simulations, P() being the probability distribution. The number and definition of the strata are chosen to minimize the variance of the average estimator. If we choose equally likely strata, the expected value is now given by:

E[f] = (1/n) ∑_{i=0}^{n−1} E[f(r) | r ∈ A_i]    (2.26)

In this case, if the needed random samples follow a uniform distribution [0,1), their generation to calculate E[f(r)|r ∈ Ai] is easily carried out by means of the following equation:

u_i^s = i/n + u_i/n,    with i = 0, ..., n − 1    (2.27)

where each i/n is the starting point of each stratum and the uniform variables u_i are scaled to the stratum size.

When r does not follow a uniform distribution, the calculation of E[f(r) | r ∈ A_i] by means of the Monte Carlo method is more involved due to the complexity of sampling r according to its distribution function P[r ∈ A_i].

Stratified sampling can also be extended to multidimensional variates. Each dimension is partitioned into n strata, giving a total number of n^d strata, d being the number of dimensions. This makes the method unfeasible when the number of dimensions is high, as the required number of multidimensional variates also follows this exponential growth.

The Latin Hypercube [Gen98] can be understood as an alternative extension of stratified sampling that does not suffer from the problem of dimensionality. Latin Hypercube consists in stratifying the one-dimensional marginals of a joint distribution instead of stratifying the whole multidimensional space. Therefore, the number of multidimensional variates grows linearly with the number of dimensions. For the sake of illustration we will outline the sampling procedure for a multidimensional independent uniform variate. Let us call Π_0, Π_1, ..., Π_{d−1} d permutations of {0,1,...,n−1} drawn independently, assuming that the n! different permutations are equally likely. Random samples of the multidimensional uniform variate can be obtained according to:

(a) Stratified Sampling (b) Latin Hypercube.

Figure 2.3: Dimension Impact: Stratified Sampling & Latin Hypercube.

v_i^j = (u_i^j + Π_j(i)) / n,    j = 0, ..., d − 1,  i = 0, ..., n − 1    (2.28)

where the u_i^j are independent draws of a one-dimensional uniform random variate.

In Figure 2.3 the difference between both techniques is graphically depicted for two dimensions and four strata. As can be seen, with stratified sampling and d = 2 and n = 4 it is necessary to work in groups of 16 replications, as 16 random variables are required. Meanwhile, with Latin Hypercube only groups of four replications are needed. This feature has a great importance in Monte Carlo simulations of temporal models, where the simulation of each time step requires different random variables. In these cases we need to work with groups of x replications for the total number of time steps, t, requiring x × t random numbers to which variance reduction techniques have been applied. This imposes high memory requirements and will determine how to simulate, see Section 5.3.1.
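As an illustrative software sketch of Equation 2.28 (not part of the original text), the following C routine fills an n × d array of Latin Hypercube samples; the permutations Π_j are drawn with a Fisher-Yates shuffle, and uniform01() is again a placeholder for the base URNG returning values in [0,1).

    #include <stdlib.h>

    extern double uniform01(void);              /* placeholder base URNG    */

    /* Latin Hypercube sampling (Eq. 2.28): fills v[i][j] for the n strata
     * (i = 0..n-1) and d dimensions (j = 0..d-1), stratifying each
     * one-dimensional marginal independently.                              */
    void latin_hypercube(int n, int d, double v[n][d])
    {
        int *perm = malloc(n * sizeof *perm);

        for (int j = 0; j < d; j++) {
            for (int i = 0; i < n; i++)          /* identity permutation    */
                perm[i] = i;
            for (int i = n - 1; i > 0; i--) {    /* Fisher-Yates -> Pi_j    */
                int k = (int)(uniform01() * (i + 1));
                int tmp = perm[i]; perm[i] = perm[k]; perm[k] = tmp;
            }
            for (int i = 0; i < n; i++)          /* v_i^j = (u + Pi_j(i))/n */
                v[i][j] = (uniform01() + perm[i]) / n;
        }
        free(perm);
    }

Note that only n replications per group are needed regardless of d, which is the property exploited in Chapter 5.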

2.5. Developed Gaussian Random Number Generator

2.5.1. Uniform Random Number Generator

Inversion-based RNGs inherit the statistical properties of their base uniform random generator. Thereby, one of the characteristics that a URNG used in a Monte Carlo simulation must fulfill is the high quality of the generated random sequences: good randomness, high period and good statistical properties. Among the generators mentioned in Section 2.2, Rand2, MT and the combination of several basic generators fulfill these characteristics.

As exposed in Section 2.3.2, one of the features we are looking for in the hardware implementation is the generation of one random value per clock cycle, at the highest possible frequency. Rand2 and the combined generators require the currently generated value to compute the next value and, therefore, if one value is generated per cycle, no pipelining can be introduced. This circumstance is not a problem for basic combined generators, as they are composed of very fast bitwise operations, but it discards the use of Rand2, since all the operations to be performed in just one cycle are complex and slow (two multiplications, two modulus operations, two additions, one division and one subtraction).

Therefore we have selected two URNGs to develop and use as base uniform random generators: a combined Tausworthe generator and the MT.

2.5.1.1. FPGA Tausworthe 88

The Tausworthe 88 [L'E96] is a well known URNG which combines three of the basic Tausworthe generators presented in Section 2.2.3.2. The sequence of uniform variables is obtained by combining, using an XOR operation, the three variables obtained from the basic generators. The three basic generators combined in Tausworthe 88 (or Taus88) are three trinomials for sequences of 31, 29 and 28 bits respectively, with full period and the following sets of parameters (k, q, s), see Section 2.2.3.2:

• P 1(31, 13, 12)

• P 2(29, 2, 4)

• P 3(28, 3, 17)

Therefore, as the trinomials used are pairwise relatively prime, the period of the combined generator is 2^{88}.

In software, the recurrence of each of the generators can be quickly calculated; given the vector A holding u_{n−1}, the vector B for temporary storage and the vector C containing a mask composed of k ones followed by L − k zeros, u_n can be calculated in six operations:

1. B ← A left shifted q bits.

2. B ← A XOR B

3. B ← B right shifted k − s bits

4. A ← A&C

5. A ← A left shifted s bits

6. A ← A XOR B

This is further simplified in hardware, due to the possibility of working at bit level, so these six operations are reduced to just an XOR and the concatenation of bits:

U^i_{n+1}(31 : 32−(k−s)) = U^i_n(31−s : 32−k)    (2.29)
U^i_{n+1}(31−(k−s) : 0) = U^i_n(31 : k−s−1) ⊕ U^i_n(31−q : k−s−q−1)    (2.30)

while the complete generator implies two more XOR operations:

U_{n+1} = U^1_{n+1} ⊕ U^2_{n+1} ⊕ U^3_{n+1}    (2.31)
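For comparison with the hardware equations above, a software sketch of the combined step is shown below, using the published taus88 parameter sets (k, q, s) = (31, 13, 12), (29, 2, 4) and (28, 3, 17) from [L'E96]; the masks keep only the k valid state bits of each 32-bit word, and the seed values are arbitrary examples that leave those bits non-zero.

    #include <stdint.h>

    /* Tausworthe 88 (Taus88): three combined basic Tausworthe generators. */
    static uint32_t s1 = 12345, s2 = 67890, s3 = 13579;

    uint32_t taus88(void)
    {
        uint32_t b;
        b  = ((s1 << 13) ^ s1) >> 19;                 /* (k,q,s) = (31,13,12) */
        s1 = ((s1 & 0xFFFFFFFEu) << 12) ^ b;
        b  = ((s2 << 2) ^ s2) >> 25;                  /* (k,q,s) = (29, 2, 4) */
        s2 = ((s2 & 0xFFFFFFF8u) << 4) ^ b;
        b  = ((s3 << 3) ^ s3) >> 11;                  /* (k,q,s) = (28, 3,17) */
        s3 = ((s3 & 0xFFFFFFF0u) << 17) ^ b;
        return s1 ^ s2 ^ s3;                          /* combination, Eq. 2.31 */
    }

In hardware, each pair of shift/XOR lines collapses into the fixed bit selections and single XOR of Equations 2.29 and 2.30.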

2.5.1.2. FPGA Mersenne Twister

In the literature, some previous work focused on the hardware implementation of the MT URNG can be found in [SK06, CA08, TB09]. The MT generator is highly parameterizable and these works focus on the most used configuration, MT19937, due to its high quality and the simplifications that its set of parameters introduces. Hence, we also consider that set of parameters as the ideal one for FPGA. However, none of the previous implementations fulfills all the characteristics we are looking for. [SK06, CA08] present a low frequency and [SK06] does not provide one sample per cycle. Meanwhile, in [TB09] the first part of the algorithm is not implemented in hardware and therefore it is not a complete implementation; it requires a HW-SW communication overhead that rules this implementation out for our purposes. Instead, we are looking for a hardware MT with the following features:

1. All in hardware.
2. Capable of generating one sample per cycle.
3. Efficient in area and performance.

In the following, the general MT algorithm is described, introducing the simplifications due to the MT19937 set of parameters. The algorithm is split into three different tasks:

1. Initialization: the generation, from a seed, of the first n vectors of the recurrence.
2. Obtaining the linear recurrence.
3. The tempering of the generated variables from the linear recurrence.

Firstly, an initialization from the seed is needed, as the linear recurrence requires a work area of n variables. This initialization takes the seed as the first element of the recurrence, x_0, while the other n − 1 variables are generated following the recurrence:

x_i = 1812433253 × (x_{i−1} ⊕ (x_{i−1} >> 30)) + i    (2.32)
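For reference, a software sketch of this initialization for MT19937 (n = 624, w = 32 bits) is shown below; it is a direct transcription of Equation 2.32, and the multiplication by 1812433253 is the operation that maps to the DSP blocks discussed later.

    #include <stdint.h>

    #define MT_N 624

    /* MT19937 work-area initialization from a 32-bit seed (Eq. 2.32):
     * x_0 = seed, x_i = 1812433253 * (x_{i-1} ^ (x_{i-1} >> 30)) + i.     */
    void mt_init(uint32_t x[MT_N], uint32_t seed)
    {
        x[0] = seed;
        for (uint32_t i = 1; i < MT_N; i++)
            x[i] = 1812433253u * (x[i - 1] ^ (x[i - 1] >> 30)) + i;
    }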

Figure 2.4: Mersenne Twister General FPGA Architecture.

Once the first work area, the initial n variables, is obtained, the random numbers are calculated following Equation 2.11:

x_{k+n} = x_{k+m} ⊕ (x_k^u | x_{k+1}^l) A    (2.33)

where x_k^u denotes the w − r most significant bits of x_k, and x_{k+1}^l corresponds to the r least significant bits of x_{k+1}. To make the computation of the multiplication fast, the matrix A is selected in such a way that (x_k^u | x_{k+1}^l) A is reduced to:

(x_k^u | x_{k+1}^l) >> 1            when x_{k+1}(0) = 0    (2.34)
((x_k^u | x_{k+1}^l) >> 1) ⊕ a      when x_{k+1}(0) = 1    (2.35)

where a is the w-th row of matrix A. Therefore, the equation is mostly reduced to XOR operations. Finally, each random number obtained in the linear recurrence is tempered, i.e., modified to improve its statistical properties, by multiplying the random sample by a matrix T, z = x_{k+n} T. Again, matrix T is selected in such a way that this multiplication is simplified to several logical bitwise operations:

y = x_{k+n} ⊕ (x_{k+n} >> u)    (2.36)
y_1 = y ⊕ ((y << s) & b)    (2.37)
y_2 = y_1 ⊕ ((y_1 << t) & c)    (2.38)
z = y_2 ⊕ (y_2 >> l)    (2.39)
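As a software transcription of this tempering step with the MT19937 constants (u, s, b, t, c, l) = (11, 7, 0x9D2C5680, 15, 0xEFC60000, 18), the following sketch shows that only shifts, ANDs and XORs are involved, which is why this stage maps so well to FPGA logic:

    #include <stdint.h>

    /* MT19937 tempering (Eqs. 2.36-2.39): improves the statistical
     * properties of the raw recurrence output with shifts, ANDs and XORs. */
    uint32_t mt_temper(uint32_t x)
    {
        uint32_t y = x ^ (x >> 11);              /* u = 11                  */
        y = y ^ ((y << 7)  & 0x9D2C5680u);       /* s = 7,  b               */
        y = y ^ ((y << 15) & 0xEFC60000u);       /* t = 15, c               */
        return y ^ (y >> 18);                    /* l = 18                  */
    }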

As just seen, the MT URNG depends on multiple parameters (w, n, m, r, a, u, s, b, t, c, l), corresponding in MT19937 to the set (32, 624, 397, 31, 9908B0DF, 11, 7, 9D2C5680, 15, EFC60000, 18).

(a) MT Three port Table. (b) MT Circular Buffer.

Figure 2.5: MT work area Storage.

The hardware implementation of the MT generator has to deal with the three tasks (initialization, recurrence, tempering), while a storage element is needed for the samples composing the work area, as can be seen in Figure 2.4. From the previous equations it can be easily deduced that the linear recurrence and the tempering tasks fulfill the criteria of obtaining one sample per cycle while achieving a high clock rate, as they are mostly composed of just XOR bitwise operations. Furthermore, due to the depth of the work area and the dependencies among samples, the logic of both tasks can be pipelined to increase the clock rate.

Meanwhile, the initialization task also requires a multiplication and an addition. Although this task will not be working once the first n-sample work area is generated, in reconfigurable hardware the slowest task determines the maximum clock rate. Hence, the clock rate of the MT generator is also determined by the more complex and slower initialization logic, and therefore pipelining this task is a must if a high clock rate is desired.

Another important fact to take into account is the storage element. An n-word storage area is needed for the linear recurrence equation and, due to the nature of the linear recurrence, two storage options are suitable, see Figure 2.5: a storage table and a register buffer. In the first case, a three-port table is needed. It can be implemented with two dual-port block RAMs or distributed logic, plus the logic required for the indexes. The three ports required are one for reading x_{k+1}, another for reading x_{k+m} and another for writing x_{k+n}, while x_k is obtained from x_{k+1}, see Figure 2.5(a). In the second case, a buffer of registers can be used, as the relationship between the indexes of the words is fixed and in each step of the recurrence x_k is replaced by x_{k+n} in the work area. In this way, the linear recurrence and the buffer of registers can be considered as a circular buffer where the linear recurrence is some combinational logic between the input and the output of the buffer, see Figure 2.5(b).

Table 2.3: Virtex-4 XC4VFX140-11. Table of resources.

Slices: 63168    DSP blocks: 192    18 Kb BRAMs: 552

Table 2.4: URNG Implementation Results.

                                 Slices   BRAM   DSP   Start Cycles     MHz
Taus88                               77      -     -              1   943.4
MT19937 Logic                        73      -     -              -   646.8
MT19937 Init. 1 Cycle, CB           807      -     3            625   108.2
MT19937 Init. 1 Cycle, Table        161      4     3            625   105.1
MT19937 Init. 2 Cycle, CB           819      -     3           1249   214.5
MT19937 Init. 2 Cycle, Table        158      4     3           1249   185.3
MT19937 Init. 3 Cycle, CB           816      -     3           1873   256.3
MT19937 Init. 3 Cycle, Table        153      4     3           1873   314.2
MT19937 Init. 4 Cycle, CB           816      -     3           2497   339.3
MT19937 Init. 4 Cycle, Table        183      4     3           2497   345.9
[TB09] (Virtex-4 FX100)             128      4     -            624   265

2.5.1.3. Implementation Results

The Xilinx Virtex-4 XC4VFX140-11 will be the reference FPGA for the implementation results of the subcomponents and cores. Meanwhile, for the whole accelerator implementation the reference FPGA will be the Xilinx Virtex-5 XC5VFX200T-2, Chapters 5 and 6. In Table 2.3 a summary of the resources of the Virtex-4 is available.

In Table 2.4, the results for the developed URNGs are summarized. The abbreviations CB and Table refer to the circular buffer and the three-port table MT implementations respectively. Additionally, the results for the most representative FPGA MT in the literature, [TB09], are presented (results from [SK06, CA08] are not included since they are incomplete and their clock frequency is below 40 MHz). The Logic entry of the MT generator comprises the results for just the combinational logic of the linear recurrence and the output tempering (introducing a register between them), without taking into account the logic needed for the initialization or the storage element. Hence, this frequency result represents the maximum clock frequency achievable for an MT with a throughput of one sample per cycle.

In the complete generator, this performance is reduced due to two facts: the above mentioned delay of the initialization logic and the need for storage elements. As can be seen in the table, the clock frequency improves as more pipeline stages are introduced in the initialization, but only up to a certain limit given by the storage elements. For both implementations and four stages, the slowest path is determined by the storage elements. This circumstance especially affects the circular buffer implementation, as the complex routing of the shift register limits the frequency at which the registers are shifted in the buffer. Meanwhile, for the table implementation, the frequency limit is given by the update of the addresses of the table. Tausworthe 88 presents a really high clock frequency for an FPGA, close to 1 GHz, with an almost negligible use of resources.

2.5.2. N(0,1) Gaussian Random Number Generator

Following the discussion in Section 2.3, the inversion-based N(0,1) GRNG with quintic Hermite interpolation for the approximation of the ICDF has been considered the most suited GRNG for a Monte Carlo simulation. Clearly, two different tasks are required for any inversion-based RNG relying on this technique: a first setup stage, where the calculation of the function approximation is carried out, and a second task where the random variables are generated. In the first task, the segmentation of the function range and the calculation of the coefficients for each segment are carried out. In the second task, the random variables are generated from uniform random variables:

1. Generation of a uniform random variable u.
2. Search for the segment corresponding to u.
3. Extraction of the coefficients for that segment.
4. Calculation of GCDF^{−1}(u) applying H_i^5(ū).

This methodology fits very well with a hardware architecture. As long as no changes are made in the segmentation policy or in the type of interpolation used, the setup stage always produces the same results (the polynomial coefficients and the starting points, u_i, of each segment). These are the data required by the generator and, as they remain unchanged, they can be stored in tables as ROM data. Hence, the setup stage only needs to be computed once. Furthermore, there is no need to do it in the hardware RNG. To simplify the complexity of the RNG, the interpolation can be carried out on a software platform which is also in charge of generating the necessary text files in a hardware description language containing the ROM tables with the polynomial coefficients and the segments' starting points.

A software implementation of a GRNG with Hermite interpolation has been previously studied by Hörmann et al. in [HL03] and can be found in the software tool UNU.RAN [LH02]. We have used some of the framework provided by this tool for developing our hardware GRNG.
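As an illustrative software sketch of steps 2-4 above for a single uniform input (not the actual hardware description), assume the setup stage has produced the arrays seg_start[] (the u_i) and coef[][6] (the a_{i0}..a_{i5}); both names are hypothetical. The binary segment search and the Horner evaluation shown here have direct hardware counterparts: an address decoder plus ROM tables, and a multiply-add chain.

    /* Inversion by interpolated segments: given a uniform u in (0,1), find
     * its segment, fetch the quintic Hermite coefficients and evaluate
     * H_i^5(u - u_i) with Horner's scheme (Eq. 2.18).                      */
    double invert_icdf(double u, int n_seg,
                       const double seg_start[], const double coef[][6])
    {
        /* Binary search for the segment i such that u_i <= u < u_{i+1}.    */
        int lo = 0, hi = n_seg - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) / 2;
            if (seg_start[mid] <= u) lo = mid;
            else                     hi = mid - 1;
        }

        double ubar = u - seg_start[lo];          /* local coordinate       */
        const double *a = coef[lo];

        /* Horner evaluation of the degree-5 polynomial.                    */
        return ((((a[5] * ubar + a[4]) * ubar + a[3]) * ubar + a[2]) * ubar
                + a[1]) * ubar + a[0];
    }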

2.5.2.1. Base Software Tool: UNU.RAN

The UNU.RAN tool was developed by Hörmann and Leydold to provide a software tool capable of automatically generating software random number generators for several distribution functions, using the inversion method with Hermite interpolation as generation method. Additionally, in [HL03] the theoretical framework of the tool is explained and the results obtained for several distributions, such as Gaussian, exponential, Cauchy, etc., with different interpolation degrees (linear, cubic and quintic) and approximation error bounds are provided. These results demonstrate that quintic interpolation provides the best results, both in the error bounds that can be achieved and in the number of segments required to obtain those bounds.

Figure 2.6: UNU.RAN segmentation.

2.5.2.1.1. Segmentation

In the UNU.RAN implementation, the segmentation algorithm depends on the accuracy. The desired accuracy is a configurable parameter that determines when a segment is accepted as definitive. In UNU.RAN the accuracy is related to the x axis of the inverse function (the y axis of the CDF) and is defined as the maximum error allowed on that axis.

ε̂_u = |CDF(H_i^n(u)) − u|

where u is the uniform value and n the interpolation degree. The input uniform value is compared with the CDF of the result of inverting u. As the CDF is an exact function, this accuracy policy also ensures the accuracy of the inversion.

Segmentation is done taking into account the whole range of the inverse function, [0,1], and of the CDF, (−∞, ∞). In addition to the accuracy, two more parameters are required for the segmentation: the maximum size of each interval, measured as u_i − u_{i-1}, and the probability to be chopped in the tails of the CDF. Several functions, such as the Gaussian N(0,1), spread to ±∞. However, no interpolation can be done when reaching ±∞, while the probability of these tails is insignificant. Therefore, the interpolation can be restricted to a much smaller range (the range that concentrates almost all the probability). In this way, the starting and ending points are selected by configuring the probability we want to chop.

Once the CDF range is delimited, the segmentation is done according to the selected accuracy and the maximum segment size by an iterative method, see Figure 2.6. In each pass, the segmentation algorithm considers the segment corresponding to all the CDF range that has not been segmented yet (initially all the CDF minus the chopped tails). If the segment is bigger than the maximum segment size, it is halved as many times as necessary until it is smaller than the maximum size. Then, the obtained segment is interpolated and the coefficients of the interpolation are calculated. To check whether the segment fulfills the selected accuracy, the function is evaluated at the middle point of the segment (on the uniform axis) using the coefficients calculated for the segment. The middle point is selected because the interpolation is exact at the endpoints, so the largest error is expected in the middle of the segment. While the error measured at the middle point is bigger than the maximum error allowed, the right endpoint of the calculation segment is modified in order to divide the calculation segment by two. In this way the calculation segment gets smaller and smaller as its interpolation becomes more accurate, until the error at its middle point is below the maximum error allowed. At this point the segment is considered a valid segment, and the segmentation continues with the rest of the range.
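The iterative procedure can be sketched in a few lines of C. To keep the sketch self-contained it segments the exponential ICDF, F^-1(u) = −ln(1 − u), with linear interpolation and an absolute error test instead of the Gaussian ICDF with quintic Hermite interpolation; the chopped range, maximum segment size and error bound are purely illustrative values.

    #include <stdio.h>
    #include <math.h>

    static double icdf(double u) { return -log(1.0 - u); }  /* stand-in ICDF */

    int main(void)
    {
        const double u_max   = 0.99;   /* range left after chopping the tail  */
        const double max_len = 0.05;   /* maximum segment size (uniform axis) */
        const double eps_max = 1e-3;   /* error allowed at the middle point   */
        double left = 0.0;
        int nseg = 0;

        while (left < u_max) {
            /* candidate: all the range not segmented yet, halved until it is
             * no larger than the maximum segment size                        */
            double right = u_max;
            while (right - left > max_len)
                right = left + (right - left) / 2.0;

            /* shrink the candidate until the mid-point interpolation error
             * (here: linear interpolation) is below the accuracy bound       */
            for (;;) {
                double mid    = 0.5 * (left + right);
                double approx = 0.5 * (icdf(left) + icdf(right));
                if (fabs(approx - icdf(mid)) <= eps_max)
                    break;
                right = left + (right - left) / 2.0;
            }
            nseg++;          /* accept the segment                            */
            left = right;    /* and continue with the rest of the range       */
        }
        printf("%d segments\n", nseg);
        return 0;
    }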

2.5.2.1.2. Search algorithm and generation of gaussian variables

The first task to transform a uniform variable, u, into a Gaussian one is finding the segment to which u belongs. The Gaussian inverse CDF is monotonically increasing, and so is the resulting segmentation, so ordered search methods can be used. In this case, an indexed search is used, with an index table which contains pointers to the table containing the starting points of the segments, the search table. With the u value an address to the index table is obtained, which returns a pointer to the search table. This pointer points to the correct segment or to some close segment below it, with u_i < u. If the pointed segment is not the correct one, the pointer is increased by one as many times as needed until the correct segment is reached. Finally, once the corresponding segment is found, the coefficients and the starting point of the segment are extracted and the Gaussian variable is obtained following Equation 2.18.

2.5.2.2. Hardware GRNG

The software UNU.RAN implementation presents a challenge for a hardware implementation given the features we have defined for a Monte Carlo GRNG. The UNU.RAN search algorithm is multicycle, with a variable number of cycles required for each search. This is in clear conflict with the requirement of one random sample per cycle at a high clock frequency, as a variable number of cycles implies that the throughput will no longer be one sample per cycle. Therefore, developing a hardware-oriented search algorithm that can provide one search per cycle is a must for the hardware implementation, see Section 2.5.2.2.3. Additionally, multiple modifications can be introduced to the UNU.RAN method to adapt it to hardware:

1. Segmentation and architecture oriented to hardware URNGs, which typically provide uniform samples as 32-bit integers.

2. Accuracy and architecture oriented to the desired output arithmetic.

3. Simplification of equations to reduce the number of operators and the resources used.

4. Tailored internal arithmetic.

In the following sections our floating-point GRNG will be presented, and these modifications, together with other improvements and changes, will be explained. To develop the hardware GRNG the following steps have been carried out:

1. Adaptation of the segmentation algorithm to hardware GRNG.

2. Analysis of the segmentation results and of their impact on the architecture.

3. Development of a hardware oriented search algorithm.

4. Architecture design and general improvement of the setup stage in order to simplify the architecture.

2.5.2.2.1. Accuracy-Adaptive Segmentation Policy

Given a desired accuracy, the range of a function can be segmented in two different ways according to two objectives: to obtain the smallest number of segments, or to achieve a segmentation that is efficient in terms of the segment search. As explained before, the segment search can compromise the performance of the GRNG, given that search algorithms are usually multicycle. To avoid this problem, the segmentation points can be chosen in such a way that the segment search is restricted to analyzing the value of some bits of the uniform variable. One example is the hierarchical segmentation used in [LCVL06]. However, selecting the segmentation points based on their search easiness has the negative effect of increasing the number of segments needed to achieve the desired accuracy, and for very high accuracies the number of segments becomes too large.

To obtain the lowest possible number of segments (and therefore the smallest tables), an accuracy-adaptive segmentation method, based on the iterative one employed in [LH02], has been used. Our method introduces several modifications to the original one. First, we have adapted the method to the uniform values used in the hardware architecture, 32-bit integer values (from 0x00000001 to 0xFFFFFFFF; the zero value is not considered, as F^-1(0) = −∞), instead of double-precision floating-point values. Hence, the segments' initial points will always match a uniform number of the hardware range and will allow a search algorithm based on integer search. Second, we have changed how the accuracy related to the interpolation error is measured: from an absolute error measured at the middle point on the uniform axis, to a relative error on the Gaussian axis using direct inversion [Mor83]. In this way, the true error is measured and not approximated as before. By using a relative error we can ensure that, for all segments, all generated floating-point Gaussian variables have the same number of accurate mantissa bits.

With this method we obtain non-uniform, non-hierarchical segments adapted to the desired accuracy (number of accurate mantissa bits).
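As a small illustration of the relative-error criterion, the helper below converts the relative error of an approximated value (here, a value that would come from the interpolation, compared against a direct-inversion reference) into a number of accurate mantissa bits; the sample values are arbitrary.

    #include <math.h>
    #include <stdio.h>

    /* Accurate mantissa bits of x_approx with respect to a reference x_exact,
     * obtained from the relative error.                                       */
    static double accurate_bits(double x_approx, double x_exact)
    {
        double rel = fabs(x_approx - x_exact) / fabs(x_exact);
        return -log2(rel);
    }

    int main(void)
    {
        /* a relative error of 2^-22 corresponds to about 22 accurate bits */
        printf("%.1f accurate bits\n", accurate_bits(1.0 + 0x1p-22, 1.0));
        return 0;
    }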

2.5.2.2.2. Segmentation and Coefficient Analysis

The analysis of the results obtained for both the original segmentation method and ours has been carried out using double-precision floating-point arithmetic and different target error bounds. This analysis has focused on three aspects: accuracy, number of segments and value of the coefficients.

With respect to the first two aspects, the analysis proved that the final accuracy obtained (for a reasonable number of segments) is limited by the interpolation at the boundaries of GCDF^-1, where the function tends to infinity, and that its value can be represented using single-precision floating-point arithmetic. Hence, this precision can be adopted without significant accuracy loss. At the boundaries, GCDF^-1 is hard to interpolate and the last segments are very small, containing only one of the possible uniform values.

Regarding the coefficient values, it has to be considered that GCDF^-1 is an odd function around 0.5, so the Gaussian random variables generated from u and 1 − u only differ in their sign. In this way, only half of GCDF^-1 needs to be segmented to implement the inversion. The coefficient values obtained can be clearly differentiated between the segments before and after 0.5. All segments before 0.5 have negative coefficients, while for the segments after 0.5 only the segment starting at 0.5 (a single coefficient very close to zero) and the segments at the GCDF^-1 boundary (those containing only one possible uniform value) have negative coefficients. Additionally, considering single precision, all the coefficients are normalized numbers and their multiplication by the minimum ū value, 2^-32 (zero is not considered), also yields normalized numbers.

Negative coefficients imply subtractions in the polynomial, requiring adder/subtracters, which are more expensive than adders. Thus the upper half of GCDF^-1 has been selected to implement the inversion and its negative coefficients have been eliminated. Replacing the negative coefficient of the segment starting at 0.5 by a zero has no consequences on the global accuracy (its value is very close to zero and much smaller than the other segment coefficients), while the segmentation at the boundary has been recalculated with linear interpolations including two uniform values. In this way, since GCDF^-1 is monotonically increasing, the interpolations of these segments have only positive values (a_i0 and a_i1, the rest being zero), and the accuracy at the boundary is improved while the number of segments is reduced.
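A minimal sketch, in 32-bit fixed point, of how this symmetry can be exploited: samples below 0.5 are folded into the implemented upper half and the sign of the resulting Gaussian sample is flipped (this anticipates the Fold Unit of the architecture; names are illustrative and u = 0 is assumed to be excluded by the URNG).

    #include <stdint.h>

    /* Fold a 32-bit uniform sample into [0.5, 1) using the odd symmetry of
     * GCDF^-1 around u = 0.5; the sign is applied to the final sample.        */
    static uint32_t fold(uint32_t u, int *sign)
    {
        if (u & 0x80000000u) {      /* msb set: u already lies in [0.5, 1)     */
            *sign = +1;
            return u;
        }
        *sign = -1;                 /* u in (0, 0.5): map it to 1 - u, which   */
        return ~u + 1u;             /* in fixed point is bitwise negation + 1  */
    }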

2.5.2.2.3. Search Unit. Search Algorithm

The search algorithm we have developed is the key factor for obtaining one sample per cycle and high throughput. A non-uniform, non-hierarchical segmentation makes some kind of hardware-adapted search algorithm necessary to overcome the multicycle search of software algorithms.

The direct adaptation of a software search algorithm would have a negative impact on the FPGA implementation, because it requires multiple accesses to a search table. Handling multiple accesses requires either stalls, compromising the pipeline throughput of one sample per cycle, or the duplication of search tables (as many as the maximum number of searches needed) through different pipeline stages, increasing the resources used.

Taking as basis the search algorithm of UNU.RAN, and taking advantage of the characteristics of current FPGAs, whose dual-port RAM memory blocks allow two different positions to be read simultaneously, a pipelined hardware-oriented search method relying on an index table and a search table has been developed. The idea is simple: we construct the index table in such a way that, for any u, the obtained pointer to the search table always points to the correct segment or to the one just below it. In this way, and thanks to the availability of dual-port RAMs in FPGAs, each search can be finished with just one access to the index table to obtain the pointer and another one to the search table, reading simultaneously the segment starting points at pointer and pointer+1. A subsequent comparison of the searched value with the segment starting point at pointer+1 determines to which of the two segments the searched value belongs.

To obtain the desired index table (in the setup stage), a local search scheme based on fixed-point arithmetic and multiple local index tables was developed (the searched variables are 32-bit fixed-point variables). The value range of the searched variables is divided into several parts, each one with its own local index table that ensures a single-access search. The range division into local tables is done in an iterative way:

1. Select a subset of bits from u that can be used as an address.
2. With that subset, build a local table until a value is found that would point to a segment that is neither the correct one nor the one below it.
3. Set the range up to that incorrect value as the range covered by that local table.
4. Update the range still to be divided by subtracting the range just covered.
5. Repeat the previous steps until the whole range is indexed by local tables.

A global index table comprises all the local index tables. To obtain the correct address to the global table, a bit comparison scheme is used. First, the input u is compared against the values that define the boundaries between the local tables. Once it is determined to which local table u belongs, the address is formed by adding the position at which the local table starts within the global one to the corresponding subset of u bits.
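The following C model illustrates the resulting query under simplifying assumptions: a single global index table addressed with the upper bits of u, toy segment boundaries chosen so that every index bucket overlaps at most two segments, and a sentinel entry closing the search table (the real design uses several local index tables, but the one index read, one dual-port table read and one comparison per query are the same).

    #include <stdint.h>
    #include <stdio.h>

    #define INDEX_BITS 4
    #define N_SEG      8

    /* Toy segment starting points (32-bit uniform integers) plus a sentinel. */
    static const uint32_t seg_start[N_SEG + 1] = {
        0x00000000u, 0x20000000u, 0x50000000u, 0x70000000u,
        0x90000000u, 0xC0000000u, 0xD8000000u, 0xF0000000u,
        0xFFFFFFFFu
    };
    static uint32_t index_table[1u << INDEX_BITS];

    static void build_index(void)      /* setup stage */
    {
        for (uint32_t b = 0; b < (1u << INDEX_BITS); b++) {
            uint32_t start = b << (32 - INDEX_BITS);
            uint32_t p = 0;
            while (p + 1 < N_SEG && seg_start[p + 1] <= start)
                p++;
            index_table[b] = p;        /* correct segment or the one below it */
        }
    }

    static uint32_t find_segment(uint32_t u)
    {
        uint32_t p  = index_table[u >> (32 - INDEX_BITS)]; /* index table read */
        uint32_t up = seg_start[p + 1];    /* dual-port read of p and p+1      */
        uint32_t s  = (u >= up) ? p + 1 : p;               /* one comparison   */
        return (s < N_SEG) ? s : N_SEG - 1;    /* forced last-segment address  */
    }

    int main(void)
    {
        build_index();
        printf("u=0x68000000 -> segment %u\n", find_segment(0x68000000u));
        return 0;
    }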

2.5.2.2.4. Architecture

The architecture needed for an inversion-based RNG is basically the same regardless of the generated distribution. Only one modification can be introduced. As explained in Section 2.5.2.2.2, when the desired distribution is an odd function with respect to its middle point, only half of the CDF^-1 has to be implemented. In this way we can halve the number of segments required and consequently the resources required for storing the tables.

Figure 2.7: Inversion N(0,1) RNG architecture.

Figure 2.8: Search unit.

In these cases, some logic must be introduced to transform the uniform random samples from the range (0,1) to the range [0.5,1). Figure 2.7 shows the architecture of our N(0,1) RNG. As can be observed, the 5-degree polynomial is calculated with Horner's rule¹, reducing the calculation of the polynomial to five multiplications and additions. Apart from the polynomial calculation and the base URNG, the architecture is composed of four main units. The Fold Unit represents the extra logic to transform the uniform samples to the range [0.5,1) (bitwise negation plus '1'). The segment corresponding to the uniform variable is obtained in the Search Unit, Figure 2.8, which implements the algorithm explained in Section 2.5.2.2.3. Finally, a Format Unit transforms the fixed-point representation of ū into a tailored floating-point representation. The use of standard floating-point arithmetic instead of fixed-point arithmetic introduces a very heavy computational overhead, see Chapter 3. However, in the implemented N(0,1) RNG, infinities, NaNs or denormalized numbers can never be produced due to the coefficients we have computed and the operations involved. Hence, the floating-point arithmetic needed is greatly simplified, as it only needs to handle normalized numbers and zeros (handling denormalized numbers accounts for most of the prenormalization and normalization logic) and the adders do not need the logic required for subtraction.

¹ p(x) = a0 + x(a1 + x(a2 + x(...)))

Table 2.5: Accuracy and Segmentation.

  Searched Accuracy    21    31    34     44      47
  Obtained Accuracy    20    20    19     24      20
  Segments            185   464   779   3939   11519

Table 2.6: Maximum segment size vs. number of segments (searched accuracy: 21 mantissa bits).

  Segment size (bits)   28    27    26    25    24    23
  Segments             113   112   122   147   202   327

Finally, the floating-point representation is also based on the arithmetic used for ū = u − u_i. ū is one of the operands of the five multipliers, and the way it is represented significantly affects the resources used by the pipeline and the logic needed in the multipliers. Thereby, the tailored floating-point format is selected according to the range of values of ū, whose maximum value is determined by the largest segment (equal to or smaller than the maximum segment size selected for the segmentation) while its minimum value is zero.

2.5.2.3. GRNG Implementation Results

The target accuracy in the setup stage has been 21 mantissa bits (22 bits of accuracy counting the floating-point hidden bit). This accuracy has been determined by experimental trials with our modified software but with the original maximum segment size, 0.05 in the u range; see Table 2.5, where the searched and obtained accuracies are given in mantissa bits.

The table shows some representative accuracy results. Although the searched accuracy is increased, the results show that, while the biggest part of the range fulfills the searched accuracy, there are always several segments close to the tail where the accuracy diminishes to around 20 mantissa bits. This is because the accuracy-driven segmentation algorithm measures the accuracy at the middle point of a segment, but the approximation error can be slightly larger at points close to the middle. According to these results, and as the accuracy obtained is in the order of single precision, we have also decided to use single-precision coefficients instead of double-precision ones, given the impact of double precision on FPGA resources and performance; the accuracy obtained remains of the same order, 20 mantissa bits. In Table 2.6 the segmentation results for 21 mantissa bits, single-precision segmentation and several maximum segment sizes, now measured in bit-widths, are summarized.

To take advantage of the maximum capacity of FPGA Block RAMs (256 32-bit words for a Virtex-4), while improving the segmentation and minimizing the number of bits needed to represent ū in the hardware architecture, we have selected 24 bits as the maximum segment size. Additionally, the segmentation has been expanded to 256 segments. The extra segments come from the use of linear interpolations at the extremes to ensure the accuracy in the tails.

With this segmentation the segment search is split in two. The function range is easily approximated in most of the range, starting at 0.5, so all the first segments have the maximum segment size. Meanwhile the last segments, the ones belonging to the tails of the function, comprise just two points with a linear interpolation. In both cases, a direct search based on some bits of the numbers is possible (forcing the first and the last segment search):

Logic search:
• Segment 0: direct search < 0x80000002 → address 0
• Segments 1-126: direct address → '0' & bits(30-24) + 1
• Segments 178-254: direct address → bits(8-1)
• Segment 255: forced address → address 254

Meanwhile, for the remaining range the developed search algorithm is employed with local index search tables of 128 positions each:

• Local table 1: Segments 127-144. Search bits (25-19) • Local table 2: Segments 145-160. Search bits (20-14) • Local table 3: Segments 161-176. Search bits (14-8) • Local table 4: Segments 177-182. Search bits (9-3)

Another inversion-based GRNG can be found in previous work [LCVL06]. This GRNG is based on a degree-two spline approximation with Chebyshev coefficients and a hierarchical segmentation of the function range. However, this GRNG is not suited for Monte Carlo simulations: it does not ensure very high accuracy due to the use of 16-bit fixed-point arithmetic and the selected interpolation degree and coefficients. Additionally, the 16-bit fixed-point representation rules it out for a wide range of Monte Carlo applications. In Table 2.7, we reproduce the results of [LCVL06] together with those of our GRNG. [LCVL06] was implemented on a Xilinx Virtex-II XC2V4000-6, so the results of our GRNG are also given for that FPGA (VX-II GRNG).

The combination of a higher interpolation degree and the use of Hermite coefficients and floating-point arithmetic allows the CDF^-1 to be approximated much more accurately: in particular, seven more bits of absolute accuracy² with respect to [LCVL06] in our worst case, while in terms of relative accuracy³ the improvement achieved is more than five orders of magnitude.

² Maximum error value between GCDF^-1 and the polynomial interpolation.
³ Maximum percentage error in a Gaussian variable.

Table 2.7: FPGA N(0,1) RNG results.

                Slices   Block   DSP    Clock    Throughput    Max    Absolute      Relative    Sample
                         RAM     Mult   [MHz]    [MSample/s]   σ      Accuracy      Accuracy    Format
  Lee [LCVL06]    585    1        4     231.0    231.0         8.2    0.3 × 2^-11   30%         fixed 16b
  VX-II GRNG     1954    5       20     179.4    179.4         6.23   0.5 × 2^-18   0.000047%   32b single f.p.
  VX-4 CDF^-1    1684    5       20     236.1    236.1         6.23   0.5 × 2^-18   0.000047%   32b single f.p.
  VX-4 GRNG      1757    5       20     220.8    220.8         6.23   0.5 × 2^-18   0.000047%   32b single f.p.


Figure 2.9: Variance reduction general hardware architecture.

However, the improved accuracy comes at the cost of an increase in resources: the tailored single-precision floating-point arithmetic used instead of the fixed-point one, the higher interpolation degree and the generation of 32-bit samples (24 bits of mantissa) instead of 16-bit ones are the reasons for the increase with respect to [LCVL06]. In particular, the difference between interpolation degrees seriously impacts the resource usage, as each additional degree implies one multiplier (4 DSPs), half a Block RAM and one adder. The use of samples with more bits and the complexity of floating-point arithmetic also explain the loss of speed and throughput (around 20%). However, this is a very small performance penalty considering the accuracy improvement, as the floating-point complexity is handled efficiently with a very deeply pipelined architecture of 48 stages.

2.5.3. Stratified Sampling and Latin Hypercube

The devised hardware architecture for both Stratified Sampling and Latin Hypercube is composed of two main parts. The first part is the control unit, which controls the generation of the stratas in the form of integers corresponding to the strata, while the second part involves the arithmetic operations needed to transform uniform random variables into stratified ones. A simplified scheme of the architecture is depicted in Figure 2.9. Besides the control unit, two fixed-point operators, an adder and a divisor, are needed to accomplish the stratification of the random variable:

u_i^s = i/n + u/n ;   where i = 0, ..., n − 1    (2.40)
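A software model of Equation 2.40 with the zero value excluded (the replacement logic discussed next) can be as simple as the sketch below; the number of stratas and the base generator are illustrative.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const int n = 8;                 /* number of stratas */
        srand(42);
        for (int i = 0; i < n; i++) {
            /* base uniform sample in (0,1), never exactly zero */
            double u  = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
            double us = ((double)i + u) / n;     /* Eq. 2.40: i/n + u/n    */
            printf("stratum %d: %f\n", i, us);   /* lies in [i/n, (i+1)/n) */
        }
        return 0;
    }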

Figure 2.10: Stratified Sampling Control Unit.

Additional logic is needed to handle the generation of a zero. To preserve the symmetry of some distributions (such as the Gaussian), and also because a zero can lead to −∞ when using the inverse generation method with unbounded distributions (as the Gaussian), the zero value has to be removed in order to obtain uniform random variables in (0,1). In this case, an alternative uniform variable belonging to the same strata as the zero has to be generated. One important feature for any architecture implementing a variance reduction technique is its configurability, to allow simulations with different requirements in the number of stratas, n, or the number of dimensions, d. To achieve this, the whole architecture is parameterizable by selecting a maximum value for both n and d.

2.5.3.1. Stratified Sampling. Control

The control unit deals with the generation of the strata starting points. One key factor has to be considered for this generation: usually each simulation does not use only one random variate, but rather a group of them. In this case, stratified sampling has to be applied to each of the random variates used, across a group of simulations. This fact introduces a new requirement: the generation of u_i^s will require randomness in the selection of the strata starting points, so as to obtain non-ordered stratified samples. If an ordered generation is used, all the random variates for the same simulation will belong to the same strata and the results will be distorted. A combination of a URNG with a shuffle technique over a memory table can effectively handle this requirement. While the strata starting points can be managed as a uniform distribution between 0 and n − 1, because they are scaled later dividing by n, the shuffle technique introduces the randomness in the strata starting points. The shuffle technique is applied to the integer values stored in the memory table by swapping pairs of values. The randomness is introduced in the selection of one of the memory positions to be swapped, while the other one is determined by a counter that ensures that all memory positions are swapped at least once, following the algorithm below:

    for (i = n-1; i >= 0; i--) {
        addr_fixed  = i;
        addr_random = U(0,1) * (i + 1);   /* random position in [0, i] */
        swap(table[addr_fixed], table[addr_random]);
    }

Figure 2.11: Latin Hypercube Control Unit.

The hardware implementation of this algorithm is shown in Figure 2.10. The shuffle algorithm is implemented in a table which starts with shuffle_table[i] = i. When the strata counter reaches zero, the algorithm is completed and the generated random order is copied to a second table. The use of this second table allows the generated order to be read from it while the next random order is being generated in the shuffle table. With this procedure, one strata starting point can be read per cycle, after an initial latency of n cycles, in a pipelined implementation.

2.5.3.2. Latin Hypercube Extension. Control

The Latin Hypercube control works in a similar way but with one pair of tables per dimension, as can be seen in Figure 2.11. The shuffle technique is applied sequentially among the shuffle tables, generating the random order of each dimension after the random order of the previous dimension is finished. When both counters, for strata and dimension, reach zero, the generated orders are copied to their corresponding reading tables. Reading from these tables is also sequential, but now reading is carried out across all the tables, since we are generating a multidimensional variable: the same memory position is read sequentially for all the dimensions from their corresponding reading table. In this way, after d cycles, the d strata starting points needed for the uniform variables of a multidimensional variable will have been generated. With this type of control, as in stratified sampling, one uniform variable is generated per cycle in a pipelined architecture. Now the initial latency needed to generate the first random order of each dimension is n × d cycles.
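A software model of the resulting sampling pattern, with one independently shuffled permutation of the strata indices per dimension and one stratum read per dimension for every multidimensional sample, is sketched below (table sizes and seed are illustrative).

    #include <stdio.h>
    #include <stdlib.h>

    #define N 8        /* stratas per dimension */
    #define D 3        /* dimensions            */

    static void shuffle(int *t, int n)        /* software shuffle of one table */
    {
        for (int i = n - 1; i >= 0; i--) {
            int j = rand() % (i + 1);
            int tmp = t[i]; t[i] = t[j]; t[j] = tmp;
        }
    }

    int main(void)
    {
        int perm[D][N];
        srand(7);
        for (int k = 0; k < D; k++) {         /* one permutation per dimension */
            for (int i = 0; i < N; i++) perm[k][i] = i;
            shuffle(perm[k], N);
        }
        for (int j = 0; j < N; j++) {         /* sample j reads position j of  */
            for (int k = 0; k < D; k++) {     /* every dimension's table       */
                double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
                printf("%.4f ", (perm[k][j] + u) / N);
            }
            printf("\n");
        }
        return 0;
    }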

Table 2.8: Variance Reduction implementation Results

            Sb=1          Sb=2          Sb=3          Sb=4           Sb=5           Sb=6
          Slc.   MHz    Slc.   MHz    Slc.   MHz    Slc.    MHz    Slc.    MHz    Slc.    MHz
  SS       209  321.3    300  321.3    440  321.3    385   318.6    631   277.8   1268   244.6
  LH-1     219  321.3    326  321.3    470  321.3    535   274.2   1125   248.0   1851   200.1
  LH-2     232  321.3    368  321.3    553  274.2    929   245.1   1624   203.2   3390   174.0
  LH-3     255  321.3    431  274.2    785  244.6   1285   201.0   2938   175.7   6446   149.7

2.5.3.3. Implementation Results

The results obtained for both techniques are summarized in Table 2.8 and are graphically depicted in Figure 2.12 with respect to the strata bit-width, Sb, for both techniques, and with respect to the dimension bit-width for the Latin Hypercube (denoted as LH-Db, where Db is the dimension bit-width). For Sb > 3 a DSP is required for the multiplier, which was previously implemented in logic. These results are determined by the slices required for the tables. These tables must be implemented in logic, as the use of Block RAMs is not possible: the shuffle table must be completely copied to the read table in just one cycle to allow a throughput of one sample per clock cycle. The number of words in the tables increases exponentially with the strata and dimension bit-widths, while the number of bits per word increases linearly with the strata bit-width. Initially, the tables are very small and almost half of the logic is due to the divisor (101 slices, 48.5% in the worst case, stratified sampling with stratas of only one bit). However, as the bit-widths increase, the use of resources is mainly due to the tables and the divisor resources become negligible in comparison. For LH-3 and Sb = 6, the divisor represents just 3.4% of the LH unit, 221 slices, while the whole LH unit requires almost 10% of the FPGA.

With respect to the working frequency, the same circumstance happens, as can be seen graphically in Figure 2.12. Initially the slowest path is due to the divisor, but for tables of more than eight positions the access to read and write the shuffle table becomes the slowest stage, and the speed of this access decreases as the table length grows.

With respect to the latency, before a throughput of one sample per cycle is achieved there is an initial latency of n cycles (stratified sampling) or n × d cycles (Latin Hypercube), needed to generate the first random order, plus 32 cycles introduced by the pipelined divisor and one additional cycle for the rest of the logic.

Figure 2.12: Stratified Sampling and Latin Hypercube results.

Figure 2.13: Inversion based GRNG with Variance Reduction technique.

2.5.4. Complete GRNG and SW-HW comparison

The architecture of the GRNG with the variance reduction technique is shown in Figure 2.13. As can be seen, employing variance reduction techniques with an inversion-based generator can be done directly, just by introducing the variance reduction technique between the URNG and the inverse function. Only some control issues must be handled between the URNG and the variance reduction control, see Section 5.4.5.2. Hence, it is just like replacing one URNG with another, only that now the URNG has more logic and complexity; the inverse function simply transforms the uniform random samples into samples of the target distribution, preserving their statistical features. How these uniform samples are generated is irrelevant to the inverse function module.

In Table 2.9 (Sb stands for the strata bit-width and Db for the dimension bit-width) the results of the complete generator are shown for several variance reduction implementations, using the MT generator as the base URNG and a basic Tausworthe URNG for the strata shuffling technique. As expected, for a high number of table positions the clock frequency of the generator decreases while the use of slices grows significantly, up to 13% of the FPGA.

Table 2.9: Complete N(0,1) RNG results.

  Sb   Db   Table size   Slices   Block RAMs   DSP Mult   Clock [MHz]   % FPGA
  1    0    2            2107     9            23         220.8         11% (DSP)
  4    2    64           2712     9            24         220.8         12% (DSP)
  6    2    256          5249     9            24         174.0         12% (DSP)
  6    3    512          8304     9            24         149.7         13% (Slices)

Table 2.10: Hardware-Software Comparison

                  URNG                      N(0,1) CDF^-1           VR                 GRNG
                  Rand2   Taus88   MT       Direct     Hi^5        (4,2)    (6,3)     MT      MT-(4,2)   MT-(6,3)
                                            [Mor83]
  SW MSS          38.3    77.9     65.4     3.8        15.9        33.9     35.5      13.0    9.1        9.0
  HW MSS          -       943.4    345.9    -          236.1       321.3    149.7     220.8   220.8      149.7
  Speed Up (x)    -       12.1     5.3      -          14.8        9.5      4.3       17.0    24.2       16.7

2.5.4.1. Summary of Throughput Results. Hardware and Software Comparison

A hardware-software comparison for all the previous modules is provided in Table 2.10, using an Intel i7 920 at 2.67 GHz as counterpart. 10^9 samples are generated for the URNG and 10^8 samples for the other modules. Hardware and software throughput is measured in number of samples generated per second (mega samples per second, MSS).

With respect to the results obtained for the URNG, a first difference should be emphasized: the different cost of the operations in hardware and software, and the control cost in software. As previously mentioned, Rand2 was not suitable for the FPGA: it requires arithmetic operations which imply a slow datapath, since pipelining Rand2 is incompatible with achieving one sample per clock cycle. However, in software the Rand2 throughput is only half that of Tausworthe 88. The first reason for such a small difference is that the microprocessor ALUs work at a fixed clock rate even though bitwise operations could be executed much faster, which harms their relative performance. A second reason is that the control cost (a for loop executing the URNG as many times as samples are required) is a common overhead for the three URNGs. On the other hand, in the FPGA, the less logic there is in the combinational path between two registers, the lower the delay of that path. Hence, Tausworthe 88 presents the highest clock rate, as it only implies two XOR operations in the FPGA path, achieving almost one gigasample per second with a speedup factor of 12.1 with respect to software. Meanwhile, for the MT (4-cycle initialization and memory-table implementation), whose performance is limited by the address update, the speedup factor decreases to 5.3, although the software implementation is still slower.

Regarding how the N(0,1) CDF^-1 is implemented, in software the developed interpolation-based inversion is only four times faster than direct inversion, while the hardware interpolation presents a speedup factor of almost 15 due to the deep hardware pipeline. Finally, we can focus on variance reduction and the GRNG results with MT as base URNG.

Studied alone, the speedup factors of the Latin Hypercube, identified in the table as (Sb,Db), are not very high. Software can generate the stratas quickly, as each permutation requires few instructions, while the subsequent logic poses no problem for a microprocessor; meanwhile, the clock rate of the FPGA implementation suffers from the size of the tables. However, if the GRNG is studied as a whole, we can observe that the hardware achieves a speedup factor of 17 when no variance reduction technique is employed. When variance reduction is used, the speedup factor reaches up to 24.2, although when the tables slow down the processing the speedup factor decreases to 16.7. All these results highlight the capabilities of FPGAs for hardware acceleration of high-quality random number generation. These speedup factors even make hardware RNGs suitable as accelerators by themselves, and not only as components integrated in a hardware accelerator for a complete Monte Carlo simulation.

2.6. Extending N(0,1) RNG

The N(0,1) RNG is not the only useful generator for Monte Carlo simulation. Using the inversion method and quintic Hermite interpolation, and following the same methodology and architecture as for N(0,1), RNGs for a wide variety of distributions can be implemented. However, for some related distributions, other alternatives can be considered. For example, we have considered the general Gaussian distribution N(µ,σ) and the Log-normal LogN(µ,σ) in addition to N(0,1). With respect to N(µ,σ), it is not feasible to have, for each pair (µ,σ), a specific hardware generator with its own segmentation, search algorithm and coefficients. However, due to the mathematical properties of Gaussian distributions, variables v1 from one distribution N1(µ1,σ1) can be transformed directly into variables v2 from N2(µ2,σ2):

v2 = (v1 − µ1) × σ2/σ1 + µ2    (2.41)

This allows the design of a parameterisable N(µ,σ) generator just by having one base N1(µ1,σ1) and allowing the parameterisation of µ and σ. Furthermore, from Equation 2.41 it is clear that the simplest way to generate the parameterisable N(µ,σ) is to use as base Gaussian RNG one with the standard normal distribution, N(0,1), so the previous transformation is reduced to:

v2 = vn × σ2 + µ2 (2.42)

On the other hand, we have the Log-normal distribution, which is key for some Monte Carlo simulations as it captures random changes in relative magnitude, rather than the absolute magnitude changes provided by the Gaussian distribution. For example, some financial models like the Black-Scholes-Merton model are based on Log-normal distributions [BS73, Mer73].

Figure 2.14: Parameterisable RNG N(µ,σ)-LogN(µ,σ) RNG

                    Slices   Block RAMs   DSP Mult   Speed [MHz]   Throughput [Msample/s]   Stages   Speed Up (x)
  Complete RNG      2736     10           31         220.8         220.8                    78       -
  LogN(µ,σ) RNG     2732     10           31         220.8         220.8                    78       29.0
  N(µ,σ) RNG        2313     9            27         220.8         220.8                    62       18.1
  N(0,1) RNG        1909     9            23         220.8         220.8                    48       17.0

Table 2.11: Parameterisable RNG N(µ,σ)-LogN(µ,σ) RNG results.

As its name implies, a Log-normal distribution is a probability distribution whose logarithm follows a normal distribution. Hence, it can be obtained by combining an exponential function unit with a Gaussian RNG: if X follows a Log-normal distribution, then Y = ln X follows a Gaussian distribution and, inversely, X can be obtained from the Gaussian variable as X = e^Y.
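A software model of both output stages is a one-line transformation each, as sketched below with arbitrary parameter values: Equation 2.42 for N(µ,σ) and its exponentiation for LogN(µ,σ).

    #include <math.h>
    #include <stdio.h>

    /* Scale-and-shift stage (Eq. 2.42) and its exponentiation. */
    static double to_gauss(double z, double mu, double sigma)
    {
        return z * sigma + mu;              /* N(mu, sigma) sample    */
    }
    static double to_lognormal(double z, double mu, double sigma)
    {
        return exp(z * sigma + mu);         /* LogN(mu, sigma) sample */
    }

    int main(void)
    {
        double z = 1.5;                     /* a N(0,1) sample from the base RNG */
        printf("N(2, 0.5):    %f\n", to_gauss(z, 2.0, 0.5));
        printf("LogN(2, 0.5): %f\n", to_lognormal(z, 2.0, 0.5));
        return 0;
    }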

2.6.1. Parameterisable RNG based on N(0,1) RNG

Hence, the inversion-based N(0,1) RNG can be extended with a multiplier, an adder and an exponential function to obtain a parameterisable random generator capable of generating any N(µ,σ) or LogN(µ,σ), while the high statistical quality and the compatibility with variance reduction techniques remain unchanged, as they are determined exclusively by the base uniform RNG. The required architecture is shown in Figure 2.14.

As the base N(0,1) RNG generates single-precision floating-point samples, the required operators must comply with this arithmetic. In particular, the three new operators are the exponential, multiplier and adder/subtracter of the FPGA-oriented library, HW, of Section 3.6.3. In Table 2.11 the implementation results of the parameterisable RNG can be found for the reference FPGA, taking the Mersenne Twister as base URNG. As can be seen, the addition of arithmetic operators does not affect the clock frequency, as the complete generator works at the same clock rate as the base N(0,1) RNG. However, from the point of view of the resources and the depth of the pipeline, the operators have a strong impact.

Focusing on the speedup factor, the introduction of new operators has a performance cost for the software, as the computational load per random variable increases. The adder and the multiplier imply only a small computational overhead (the speedup factor increases to 18.1, with the hardware performance remaining unchanged), as modern processors have dedicated hardware for these floating-point operations. However, the exponential function has to be emulated with a mathematical library, implying a big computational overhead. In this way the speedup factor increases considerably.

2.7. Conclusions

In this chapter, RNGs have been studied from the perspective of their FPGA implementation and of hardware acceleration. We have focused on four topics: uniform and Gaussian random number generation, the inversion method with quintic Hermite interpolation for transforming uniform samples into samples of the target distribution, and variance reduction techniques. A parameterizable Gaussian RNG has been developed, which is the basis for our target application and has been designed to be reusable in many other applications. Our RNG is based on three main components, which are related to the main contributions of our research in this field:

• The first contribution is a high-performance Mersenne Twister uniform RNG. The developed MT uniform RNG outperforms any previous implementation in the literature while implementing all the tasks related to this uniform RNG. Furthermore, we have proposed a new architecture, specifically designed for FPGAs, which takes advantage of characteristic resources of some FPGAs such as the shift registers.
• Our second contribution comprises the methods and techniques developed to implement the spline approximation of an inverse cumulative distribution function on FPGAs with acceleration features. A hardware-oriented segmentation and a specific search algorithm have been studied and designed with the goal of avoiding the intrinsic multicycle search related to classic interpolations with non-uniform, non-hierarchical segmentation. In our case, we have focused on the Gaussian ICDF, to use it as the key component of an inversion-based Gaussian RNG.
• Finally, the last contribution is the research related to variance reduction techniques for FPGAs and hardware acceleration. A hardware core for the Latin Hypercube technique has been studied and implemented.

With these three contributions, we have developed a high-quality, high-performance Gaussian RNG which can be extended with a Latin Hypercube core. The developed generator is parameterizable in two ways: in the distribution obtained, which can be any Gaussian distribution or even any Log-normal distribution with the addition of three floating-point operators, and in the number of stratas and dimensions for the Latin Hypercube. The obtained results show the important speedup factors that can be achieved with hardware RNGs, even for just uniform random number generation, making RNGs ideal candidates for hardware acceleration, integrated in more complex accelerators or by themselves. Meanwhile, our complete generator can be used in our target application or in other Monte Carlo accelerators. The work done in this chapter has been published in several conferences and workshops [ELV07, ELVLB08, ELVP08, ETLVL08]. In particular:

• Pedro Echeverría and Marisa López-Vallejo. FPGA Gaussian random number generator based on quintic Hermite interpolation inversion. 2007.
• Pedro Echeverría, Marisa López-Vallejo, and Carlos López-Barrio. Inversion-based FPGA random number generation using quintic Hermite interpolation. 2008.
• Pedro Echeverría, Marisa López-Vallejo, and José María Pesquero. Variance reduction techniques for Monte Carlo simulations. A parameterizable FPGA approach. 2008.
• Pedro Echeverría, D. B. Thomas, Marisa López-Vallejo, and Wayne Luk. An FPGA run-time parameterisable log-normal random number generator. 2008.

Meanwhile, a short paper is currently under review:

• Pedro Echeverría and Marisa López-Vallejo. High performance FPGA-oriented Mersenne Twister uniform random number generator. 2011.

and a long journal paper is under preparation:

• Pedro Echeverría and Marisa López-Vallejo. FPGA parameterizable Gaussian random number generator for hardware acceleration.

3 Implementing Floating-Point Arithmetic in Configurable Logic

Most scientific and financial simulations rely on floating-point arithmetic due to the huge range and accuracy that can be obtained with this arithmetic and its most used precisions. However, floating-point arithmetic is characterized by a high complexity (see Section 3.2.1) which makes the design and implementation of operators expensive in terms of time and, moreover, in terms of silicon. Due to this, floating-point units in current general-purpose microprocessors are expensive and, therefore, only units for the most common arithmetic operations, multiplication and addition, can be found (Intel Core i7). For all the other mathematical operations and functions there is no dedicated silicon: they are emulated in software using the general operations provided by the floating-point ALU, involving a great number of instructions and, consequently, long execution times.

Thus, the implementation of dedicated hardware units, as in FPGAs, provides a clear advantage in terms of performance [dDDCT08], making FPGAs an ideal platform to accelerate mathematical algorithms. In this context, the availability of complete and fully characterized floating-point operators is essential to implement those applications targeting FPGAs. This is a real challenge due to the complexity of floating-point arithmetic and its associated cost: performance, silicon, deep pipelines, etc.

In this chapter, we study in depth how floating-point operators can be adapted and implemented in FPGAs when the design focus is hardware acceleration. Taking advantage of the flexibility and the resources provided by FPGAs, the floating-point format is adapted. We have chosen a general approach for all operators: a well-known algorithm suited for FPGAs is selected for each operator and, relying on those algorithms, a standard-compliant library is developed. Then, we analyze slight deviations that reduce the complexity of the standard with the goal of increasing the performance of the operators while reducing their resources. These deviations are gradually introduced in the previously developed library, obtaining a set of libraries. Through the comparison of the results obtained for these libraries, we can address one of the main objectives of this chapter: to study the impact of these deviations, both on the improvement of the operators and on their standard compliance and accuracy. In this way, the performance-accuracy-standard compliance tradeoffs are analyzed in depth. Afterwards, a second analysis is carried out: how to recover the accuracy and precision lost with some of the deviations. The objective of this study is to provide a set of recommendations/changes for the FPGA implementation of floating-point operators and, finally, to develop a library of operators following the changes that provide the best results. One last objective of this chapter is the study of FPGA capabilities for implementing datapaths with a huge number of operators, and how the performance of the operators is affected in these cases.

3.1. Related Works

Being key elements for many applications, the implementation of floating-point FPGA operators is a field of increasing research activity, especially if we consider that peak FPGA floating-point performance is growing significantly faster than its CPU counterpart [Und04] and that the set of floating-point operations directly implemented in microprocessors is very reduced. Furthermore, the new FPGA architectures have embedded resources which can simplify the implementation of floating-point operators, as is the case of multipliers.

The research in this field comprises two different perspectives. The first one is focused on the internal architecture of these operators, searching for highly optimized operators that take advantage of the FPGA architecture. Therefore, this perspective comprises the research focused on exploring the algorithms and architectures that are more suitable for FPGAs in general or for specific applications on FPGAs. In Section 3.4 some of these works will be introduced.

The second perspective is focused on how the floating-point standard and format can be adapted to FPGAs with slight simplifications. These simplifications can affect some features of the standard; however, implementations with slight deviations from the standard can be of great interest, since many applications can afford some accuracy reduction [CHL08, ZLHea05] given the important advantages that can be achieved: reduced hardware resources and increased performance.

Table 3.1: Floating-Point Operators Libraries. Four Basic Operators

                          [LB02]    [WBL06]            [GSP05]            [DdD]              [MFHAR10]
  Precision               32, 64    Parameterizable    Parameterizable    Parameterizable    24, 32, 43, 64
  Divisor                 SRT4      Taylor             SRT4 and NR        SRT4               Several
  Square Root             NR        Taylor             NR                 NR                 Several
  Denormalized as zero    Yes       Yes                Yes                Yes                -
  Standard Compliant      No        No                 Yes                No                 -
  Improved features       No        No                 No                 Yes                No

In both fields, most of the research is focused on the four most common operators: adder/subtracter, multiplier, divisor and square root. The adder/subtracter and multiplier algorithms are basically the same for all units in the literature, so the research focuses on other aspects of the units, such as how to handle some features of the standard. Meanwhile, for the divisor and the square root operators the calculation unit can be implemented with several algorithms, see Section 3.4. Regarding advanced operators (exponential, logarithm and exponentiation) few works can be found, standing out [DR04, DdD05a, DdD05b, DdDP07a], with implementations of the exponential and of the logarithm.

Several works have focused on providing libraries for the four basic operators. In Table 3.1, the most outstanding libraries are summarized. As can be seen, all of these libraries present at least one modification to the standard, the replacement of denormalized numbers with zeros, see Section 3.2. The approach we follow in this chapter was also discussed in [GSP05], but we extend it here in several ways:

• We focus on studying the overhead that the floating-point standard and format imply for FPGA operators, not just on providing a library of operators.

• We have included advanced operators.

• A more complete set of deviations is studied.

• We perform an in-depth analysis of the implications of the deviations.

• We study the replicability of the operators.

• We provide a set of recommendations to achieve the resolution and accuracy of the standard with high performance.

Figure 3.1: Floating-Point word.

3.2. Floating Point Format IEEE 754

The IEEE standard [IEE85] is mainly designed for software architectures, usually employing 32-bit words (single precision) or 64-bit words (double precision). Each word (Figure 3.1) is composed of a sign (s), a mantissa (mnt) and an exponent (exp), the value of a number being:

s × mnt′ × 2^exp′ = s × h.mnt × 2^(exp − bias)    (3.1)

where h is an implicit bit known as the hidden bit, the bias is a constant that depends on the bit width of the exponent (wE), its value being 2^(wE − 1) − 1, and the mantissa bit width is wF. Typical values for wE and wF in software implementations are 8 and 23 respectively for single precision, and 11 and 52 respectively for double precision.

The combination of an exponent (the reference value, locating the number within the range) with a mantissa (the significant part of the number, determining its accuracy and exact value) produces an arithmetic that covers a very wide range of numbers in an adaptive way. The difference between two consecutive representable floating-point numbers depends on the exponent value and, therefore, the range is not covered uniformly: there are more numbers around zero and fewer as the range tends to ±∞. Therefore, this scheme ensures that accurate numbers are obtained in any part of the range.

The word format used in floating-point arithmetic allows the representation of four different types of numbers plus exceptions:

• Normalized numbers: the most common numbers.
• Denormalized numbers: numbers very close to zero.
• Zeros: signed zeros → ±0.
• Infinities: with sign → ±∞.
• Not a Number (NaN): to indicate exceptions.

The differentiation among these five types is based on the exponent and mantissa values, while the value of the hidden bit is 1 for normalized numbers and 0 for denormalized ones. Table 3.2 depicts all possible combinations of exponent and mantissa values and their resulting type of number. Finally, although denormalized numbers present a 0 as their exponent value, their values correspond to an exponent value of 1 instead of 0.

Table 3.2: Types of floating-point numbers.

  Type           Exponent          Mantissa   h   Value
  Zero           0                 0          -   ±0
  Denormalized   0                 ≠ 0        0   Eq. 3.1
  Normalized     1 to 2^wE − 2     -          1   Eq. 3.1
  Infinities     2^wE − 1          0          -   ±∞
  NaN            2^wE − 1          ≠ 0        -   -
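For single precision (wE = 8, wF = 23) the classification of Table 3.2 can be reproduced with a few lines of C; the helper below only illustrates the field decoding and is not part of the hardware units.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static const char *classify(float f)
    {
        uint32_t w;
        memcpy(&w, &f, sizeof w);              /* reinterpret the 32-bit word */
        uint32_t exp = (w >> 23) & 0xFFu;      /* 8 exponent bits             */
        uint32_t mnt = w & 0x007FFFFFu;        /* 23 mantissa bits            */
        if (exp == 0)    return (mnt == 0) ? "zero" : "denormalized";
        if (exp == 0xFF) return (mnt == 0) ? "infinity" : "NaN";
        return "normalized";                   /* hidden bit h = 1            */
    }

    int main(void)
    {
        printf("%s %s %s\n", classify(1.0f), classify(1e-40f), classify((float)NAN));
        return 0;
    }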

Figure 3.2: Floating-Point Operator.

3.2.1. Format Complexity

The standard is specifically designed to handle the five types of numbers sharing a common format while maximizing the total amount of numbers that can be represented. The combination of these two facts has a high cost in terms of the complexity of the format and, therefore, of the complexity of the floating-point operators. In addition to the calculation unit itself, a preprocessing stage (also known as prenormalization) of the input numbers and a postprocessing stage (also known as postnormalization) of the output numbers are needed, see Figure 3.2.

Therefore, when implementing a floating-point operator, the required hardware is not only devoted to the calculation unit itself: additional logic is needed just to handle the complexity of the format, representing a significant fraction of the area of a floating-point unit. In general, prenormalization logic includes:

• Analysis of the type of number of the inputs. Exponent and mantissa analysis.

• Determination of operation exceptions due to the input type of number or sign.

• Normalization of inputs. Denormalized numbers are converted into normalized ones (leading one detection, mantissa shifting and exponent correction).

• Conversion of inputs into the format required by the calculation unit.

In the same way, postnormalization logic includes:

• Rounding of the result. Mantissa and exponent correction.
• Determination of result exceptions.
• Formatting of the output to fit the type of number of the result. This includes logic for each type of result, such as shifters, adders or multiplexors.

In addition to the complexity due to the five types of numbers sharing a common format, another source of complexity is found in floating-point arithmetic: the rounding of the result of any operation. As in any other arithmetic, the exact result can need more mantissa bits than those provided by the format. In this case, the result needs to be rounded to comply with the format, as its value will lie between two consecutive representable floating-point numbers. However, the IEEE floating-point standard defines four different rounding policies, leading to four different rounding methods and, in this way, adding extra complexity. The rounding methods are:

• Nearest: rounding to the nearest value.
• Up: towards +∞.
• Down: towards −∞.
• Zero: towards 0.

Handling the four rounding methods implies a new degree of complexity and requires more resources.

3.3. Floating-point Units for FPGAs. Adapting the Format and Standard Compliance

When adapting floating-point units to FPGAs, one way of improving their performance is to simplify the complexity of the format so that its associated processing is reduced. Following this idea, several modifications can be introduced in floating-point units, leading to custom formats more suited for FPGA implementation where the complexity of the format is reduced. However, depending on the modifications introduced, the standard compliance of the results obtained can be affected, leading to several trade-offs between performance and standard compliance. In this Thesis, three key design decisions have been thoroughly studied:

1. Simplification of denormalized numbers.
2. Limitation of rounding to just truncation towards 0.
3. Introduction of specific hardware representations for the different types of numbers.

These decisions affect three features of the format that are responsible for most of the pre- and postnormalization overhead: handling denormalized numbers (1), rounding (2) and handling the five types of numbers with a common format (3). Next, the implications of these design decisions are studied.

3.3.1. Simplification of Denormalized Numbers

The use of denormalized numbers is responsible for most of the logic needed during prenormalization and postnormalization. However, most floating-point arithmetic algorithms work with a normalized mantissa (leading bit with a value of 1) in the calculation unit. Thus, denormalized numbers have to be converted to normalized ones during prenormalization. This requires the detection of the leading one of the mantissa, the left shift of the mantissa and the adjustment of the exponent value depending on the position of the leading one. During postnormalization, the result of the calculation unit has to be analyzed to determine the type of number it corresponds to and to make some adjustments to its exponent and mantissa. Again, most of the logic related to these two tasks is required for handling the case of results that are denormalized numbers. Consequently, the use of denormalized numbers requires significant resources, negatively affecting the performance of the arithmetic units. If non-pipelined operators are used, this logic is in the critical path, while if pipelined operators are used, the stages handling denormalized numbers are usually on the slowest paths so extra stages must be introduced. However, denormalized numbers contribute little to a huge number of applications, because these numbers:

• Represent a small part of the format. Most of the floating-point format is reserved for normalized numbers. For example, and considering single precision¹, there are 2^23 × 254 different normalized numbers while only 2^23 denormalized ones.

• Are infrequent: the value of denormalized numbers (< 2^-126) also makes their use very infrequent except for some special applications.

• Compromise the accuracy of results: while normalized numbers have a precision of 24 bits, the precision of denormalized numbers varies from 23 bits to 1 depending on the position of the msb. This compromises the accuracy of the result of any operation where denormalized numbers are involved.

Therefore, one way to simplify the format and to reduce logic is to handle denormalized numbers as zeros, so that all the logic needed to normalize a denormalized input or to format a denormalized result is eliminated. In fact, all commercial FPGA floating-point libraries [Alta, Xilb] follow this scheme, and previous works such as [GSP05, WBL06] also deviate this way from the standard.

¹ Also the base precision for the rest of the chapter.
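As an illustration of this flush-to-zero policy, the following C sketch shows how a single-precision input whose exponent field is zero can be replaced by a signed zero before it reaches the calculation unit. The function name and the software framing are only illustrative; they do not correspond to any particular library.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    /* Illustrative sketch: a denormalized single-precision input (exponent
       field 0, non-zero mantissa) is replaced by a signed zero, which is the
       simplification discussed in this section. */
    static uint32_t flush_denormal_to_zero(uint32_t bits)
    {
        uint32_t exponent = (bits >> 23) & 0xFFu;
        uint32_t mantissa = bits & 0x7FFFFFu;
        if (exponent == 0 && mantissa != 0)      /* denormalized input */
            return bits & 0x80000000u;           /* keep only the sign */
        return bits;
    }

    int main(void)
    {
        float tiny = 1e-41f;                     /* denormalized in single precision */
        uint32_t bits;
        memcpy(&bits, &tiny, sizeof bits);
        bits = flush_denormal_to_zero(bits);
        memcpy(&tiny, &bits, sizeof tiny);
        printf("%g\n", tiny);                    /* prints 0 */
        return 0;
    }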

3.3.1.1. Deviation from the Standard

The standard cost of this solution is related to resolution and accuracy. First, when denormalized numbers are replaced by zeros we lose resolution around zero, as now 2^23 numbers with values between 2^-149 and 2^-126 − 2^-149 have been removed. Second, accuracy is reduced because we operate with zeros at the inputs, or obtain a zero result, whenever we have a denormalized input or output. However, this loss of accuracy is relative, as the resolution of a denormalized number depends on the position of its leading 1. Due to the lack of resolution or due to a rounding from a previous operation, the maximum relative error of a denormalized number can reach even 100% of its value², being much larger than the maximum relative error for a normalized number, 2^-23.

3.3.2. Truncation Rounding

As stated before in Section 3.2.1, the floating-point standard involves four rounding methods requiring extra logic. In the calculation stage, the result of any operation is generated with more mantissa bits than those required by the floating-point format:

• Guard bits (overflow and underflow): bits necessary for operations where the leading one of the result can be found shifted one position right or left.

• Round bit: the reference bit for deciding the value of the rounding.

• Sticky bit: an additional bit for rounding that groups the value of all the extra remaining bits of the calculated mantissa.

At the postnormalization stage, the rounding methods analyze these extra bits of the result and the sign of the result (for the rounding methods towards +∞ and −∞) to carry out the rounding. The first method, rounding to nearest, provides the most accurate rounding, as it ensures a maximum error of 1/2 ulp (unit in the last place, that is, the least significant mantissa bit, lsb), while for the other three methods the maximum error is up to 1 ulp. Meanwhile, rounding to zero is the method that needs the least logic, because it is equivalent to truncating the result without taking into account the round bit and the sticky bit. This last method is useful for hardware floating-point units, specifically for units that implement iterative algorithms like division or square root. If only rounding towards zero is implemented, these units can be reduced as they do not need to generate the round bit and the sticky bit, consuming fewer cycles and resources and, in the case of pipelined architectures, fewer stages.

² When the leading 1 of the mantissa corresponds to the lsb and the rounding mode is up or down.
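The difference between the two extreme rounding methods can be sketched in a few lines of C. The sketch assumes that the calculation stage delivers the normalized significand already followed by a round bit and a sticky bit (guard-bit handling is omitted); the tie-to-even rule shown for rounding to nearest is the usual choice but is only an assumption of the example.

    #include <stdint.h>
    #include <stdio.h>

    /* Assumed input packing for the example: 24-bit significand, then the
       round bit, then the sticky bit, packed as m<<2 | r<<1 | s. */
    static uint32_t round_nearest_even(uint32_t packed)
    {
        uint32_t m      = packed >> 2;
        uint32_t round  = (packed >> 1) & 1u;
        uint32_t sticky =  packed       & 1u;
        if (round && (sticky || (m & 1u)))   /* ties go to the even significand */
            m += 1;                          /* a carry here may also bump the exponent */
        return m;
    }

    static uint32_t round_toward_zero(uint32_t packed)
    {
        return packed >> 2;                  /* round and sticky bits are simply dropped */
    }

    int main(void)
    {
        uint32_t x = (0xC00000u << 2) | 0x3u;   /* significand 1.5, round = sticky = 1 */
        printf("%06X %06X\n", round_nearest_even(x), round_toward_zero(x));
        return 0;
    }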

3.3.2.1. Deviation from the Standard

Truncation rounding slightly affects the accuracy of a single operation, and only when compared with round to nearest, which ensures a rounding error of up to half an ulp. If we take into account just one operation, this small loss of accuracy can be considered negligible, as we are in the range of a relative error of 2^-23. However, for applications with a large number of chained operators, error propagation must be considered, as the errors introduced in the first operations spread along the chain while the following operators also introduce error in the partial results. In these cases, the error in the final result can be much higher than if round to nearest is used. Finally, another problem can compromise the accuracy of the results obtained if truncation rounding is used: the results are biased, as truncation always rounds in the same direction, diminishing the absolute value of the result for each operation.

3.3.3. Hardware Representation

As explained before, the first task in any operation using a floating-point number is to analyze the values of the exponent and the mantissa of the operands to determine the type of number. Meanwhile, the last task of the operation is to compose the exponent and mantissa of the result taking into account the type of number of the result. These two tasks are necessary to make a software architecture with a fixed word-length (mainly 32 or 64 bits) compatible with the use of the floating-point standard and its different types of numbers. However, in an FPGA architecture the word-length can be flexible and configurable by the designer and can be extended with some flags to indicate the type of number. To represent the five types of numbers, three flag bits would be necessary. However, if denormalized numbers are converted to zeros, two flag bits are enough. This scheme follows the internal use of flags of the FPLibrary [DdD], which we have extended with a systematic approach using interface blocks. Two interfaces are required: first, the input interface, which calculates the value of the flags from a standard floating-point number; second, the output interface, which composes a standard floating-point number from our custom floating-point number and its corresponding flags. The two interfaces carry out some of the pre and postnormalization tasks, so the logic needed in the arithmetic unit can be reduced. With our systematic approach, these interfaces are required every time a piece of input data has standard floating-point format or when a piece of output data needs the standard floating-point format. When a datapath involves chained operations, the advantages of this scheme are clear, as the input interface is only needed for each input number while the output interface applies only to the final result. For all the intermediate operations there is no need for interfaces, so all the operators involved are reduced.
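The following C sketch mimics the role of the two interfaces for the flag-extended representation. The 2-bit flag encoding (00 zero, 01 normal, 10 infinity, 11 NaN), the struct layout and the function names are assumptions made for the example, not the actual encoding of the hardware library.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    typedef struct { uint8_t flag; uint8_t sign; uint8_t exp; uint32_t mant; } hw_float;

    /* Input interface: derive the type-of-number flag from a standard
       single-precision word (denormalized inputs are treated as zeros). */
    static hw_float input_interface(uint32_t bits)
    {
        hw_float h = { 1, (uint8_t)(bits >> 31), (uint8_t)(bits >> 23), bits & 0x7FFFFFu };
        if (h.exp == 0)         h.flag = 0;               /* zero or denormalized */
        else if (h.exp == 0xFF) h.flag = h.mant ? 3 : 2;  /* NaN or infinity      */
        return h;
    }

    /* Output interface: compose a standard single-precision word again. */
    static uint32_t output_interface(hw_float h)
    {
        switch (h.flag) {
        case 0:  return (uint32_t)h.sign << 31;                          /* zero */
        case 2:  return ((uint32_t)h.sign << 31) | 0x7F800000u;          /* inf  */
        case 3:  return 0x7FC00000u;                                     /* NaN  */
        default: return ((uint32_t)h.sign << 31) | ((uint32_t)h.exp << 23) | h.mant;
        }
    }

    int main(void)
    {
        float x = -3.5f;
        uint32_t in, out;
        memcpy(&in, &x, sizeof in);
        out = output_interface(input_interface(in));
        printf("%08X %08X\n", in, out);   /* round trip: both words are equal */
        return 0;
    }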

3.3.3.1. Deviation from the Standard

This design decision has no standard cost in terms of resolution or accuracy, while the use of interfaces makes the transformation between the standard format and the extended FPGA format transparent.

3.3.4. Global Approach Analysis

The three design decisions just explained address the same point: the simplification of the floating-point format complexity. Handling complexity requires logic resources; thus, as the complexity is reduced, so are the logic resources needed for an operator. Additionally, as fewer resources are used, the operator implementations become more efficient, and speed is also improved or the number of pipeline stages is reduced. Therefore, our approach has focused on determining those features of floating-point arithmetic that require a heavy processing overhead while having minimal relevance in the format. Three design decisions are under study, and while the use of dedicated flags does not have any effect on the standard compliance, the other two design decisions have an impact on the compliance in terms of accuracy and of resolution around zero. Finally, and regarding floating-point standard compliance, one last issue should be addressed when using FPGAs: the non-associativity of floating-point operations. FPGA datapaths are commonly designed taking advantage of FPGA intrinsic parallelism by placing parallel operators in the datapath. However, the results obtained with parallel operations may differ from the ones obtained with an equivalent sequential implementation [KD07]. The parallel architecture implies a different datapath with different partial results that can lead to a different result. Therefore, two issues affect standard compliance: how the floating-point operators are implemented and how the datapath is designed. The second issue depends on the architecture and cannot be analyzed in a general way, being out of the scope of this work. To study the impact of the three design decisions on floating-point operator performance, a series of libraries of operators following those design decisions has been implemented, see Section 3.5. All libraries are single precision arithmetic, as double precision requires at least twice the resources of single precision, while requiring more pipeline stages and achieving lower frequencies [Alta]. Although double precision presents higher accuracy, the accuracy required in the results of many applications can be achieved with single precision or a tailored precision which is much closer to single than to double precision [LLK+06]. Furthermore, the conclusions of the study we perform here for single precision can be easily generalized to double precision. The developed libraries are composed of six operators: the four basic mathematical operators (addition/subtraction, multiplication, division and square root), which traditionally composed previous floating-point libraries, and two more complex operators, the exponential and the logarithm.
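The non-associativity mentioned above is easy to reproduce in software. The following minimal C example sums the same three single-precision values in two different orders and obtains two different results, which is exactly why a parallel (tree-shaped) datapath may not match a sequential one bit for bit; the values chosen are arbitrary.

    #include <stdio.h>

    int main(void)
    {
        float a = 1.0e8f, b = -1.0e8f, c = 1.0f;
        float left  = (a + b) + c;   /* 1.0: a and b cancel first          */
        float right = a + (b + c);   /* 0.0: c is absorbed when added to b */
        printf("left=%g right=%g\n", left, right);
        return 0;
    }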

Table 3.3: Logic Reduction due to Design Simplifications.

                           Prenormalization                            Postnormalization
Denormalized Numbers       Leading one detection (∗, /, √, ln, x^y)    Mantissa shifting (+, ∗, /, e^x, x^y)
Simplification             Mantissa left shifting (∗, /, √, ln, x^y)   Exponent correction (+, ∗, /, e^x, x^y)
                           Exponent correction (+, ∗, /, √, ln, x^y)
Truncation Rounding        -                                           Rounding bits evaluation (All)
                                                                       Rounding (All)
Hardware Flags             Mantissa analysis (All)                     Format exponent (All)
                           Exponent analysis (All)                     Format mantissa (All)

For each operator, we have gradually applied the above mentioned design simplifications, from standard operators to operators including the three simplifications. These simplifications imply major changes in the pre and postnormalization logic of each operator, reducing the resources needed. Table 3.3 summarizes the tasks that are eliminated for each design decision and each operator. The calculation stage of the operators remains almost unchanged: only the logic needed for computing the extra bits required for rounding is removed when truncation rounding is introduced. In the following, we briefly analyze how we have implemented each operator.

3.4. Operators Architecture

The goal of this work has not been to develop new algorithms for floating-point operators but to study how the format complexity can be better handled when the target hardware is an FPGA. Therefore, known algorithms that are suited for FPGAs have been selected. In this section, the architecture of the implemented floating-point operators is summarized focusing on the impact of the complexity simplifications on their architectures.

3.4.1. Adder/subtracter

The calculation stage of a floating-point adder/subtracter unit does not present any particular complexity. For calculating the result's mantissa, only a mantissa shifter to align both mantissas, an OR reduction to compute a sticky bit from the shifted-out bits, and a fixed-point adder and subtracter (depending on the sign of the inputs and on which one is bigger) are needed. Meanwhile, the exponent of the result is considered equal to the exponent of the biggest operand, and the sign depends on the input signs and on which operand is bigger. With respect to prenormalization and postnormalization, the adder/subtracter can be considered an exception with respect to the rest of the operators, as most of the logic needed to handle denormalized numbers is subsumed in the logic needed to align the mantissas or in the analysis of the result.

Figure 3.3: Adder-Subtracter.

As can be seen in Figure 3.3, during prenormalization, in addition to analyzing the type of the numbers at the inputs, the bigger operand is determined by comparing exponents and mantissas, swapping them in case the biggest operand is op_b (the calculation stage expects op_a to be the biggest operand). The difference between both input exponents is calculated to align both operands. In postnormalization, firstly, the position of the leading one of the obtained mantissa is detected, as a carry can be obtained if an addition is done or a cancellation of the most significant bits can happen in a subtraction. Afterwards, the mantissa is shifted and rounded while the exponent is corrected depending on the position of the leading one and on whether there has been a carry in the rounding. Finally, the result is formatted. A particular case happens in the adder-subtracter when the flags for the type of number are used. A zero operand generated in another operation unit of the library can have an exponent and a mantissa with values different from zero, as the zero value will only be indicated by the flags. As a zero input does not generate an exception in this unit, the value of the exponent and the mantissa must be set to zero to obtain the correct result.
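The alignment of the smaller operand and the associated sticky-bit computation can be sketched as follows in C; the 24-bit significand width and the function signature are assumptions for the example.

    #include <stdint.h>
    #include <stdio.h>

    /* Shift the significand of the smaller operand right by the exponent
       difference and OR-reduce the shifted-out bits into a sticky bit. */
    static uint32_t align_significand(uint32_t mant, uint32_t exp_diff, uint32_t *sticky)
    {
        if (exp_diff >= 32) { *sticky = (mant != 0); return 0; }
        uint32_t lost = mant & ((1u << exp_diff) - 1u);
        *sticky = (lost != 0);
        return mant >> exp_diff;
    }

    int main(void)
    {
        uint32_t sticky;
        uint32_t aligned = align_significand(0xC00001u, 5, &sticky);
        printf("aligned=%06X sticky=%u\n", aligned, sticky);   /* 060000, 1 */
        return 0;
    }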

3.4.2. Multiplication

Nowadays FPGAs include embedded multipliers that can be directly used to multiply the input mantissas. Since current embedded multipliers (18x18 multipliers for Virtex 4 and Stratix III and IV, 25x18 multipliers for Virtex 5 and 6) have at least one of their two operand inputs with a bit width smaller than the 24-bit mantissas required for single floating-point precision, our architecture takes advantage of the distributive property of multiplication:

(a + b) × (c + d) = a × c + a × d + b × c + b × d

and splits the input mantissas into two parts, one corresponding to the most significant bits (upper parts, xu and yu) and the other corresponding to the least significant bits (lower parts, xl and yl), multiplying these subparts in parallel. Afterwards, the results of the partial multiplications are added taking into account the necessary alignment between operands³. The sign of the result is calculated with an XOR gate while the exponent is obtained by adding the input exponents⁴. As in the adder-subtracter, a carry may be obtained in the mantissa result, so an additional exponent adjustment is needed in postnormalization. Apart from analyzing the input types of numbers, the prenormalization tasks are related to converting denormalized inputs into normalized format. The calculation stage expects input mantissas with a 1 as their most significant bit, requiring leading one detection, mantissa left shifting and exponent correction in case there is a denormalized input. As the multiplication is carried out with two mantissas with a one in their most significant bit, the mantissa obtained will have its leading one in bit wF·2+1 (the bit corresponding to the calculated exponent) or in bit wF·2+2⁵. When denormalized numbers are handled, prior to the rounding, the obtained mantissa has to be right shifted to align denormalized numbers. This shifting is also used for normalized numbers to shift right one position in case the leading one is in bit wF·2+2. Afterwards the mantissa is rounded. As a carry can be obtained, a final exponent and mantissa correction is needed. When only normalized numbers are used, rounding can be done directly, only taking into account where the leading one of the mantissa is. In Figure 3.4(a), an architecture including denormalized numbers is shown, while in Figure 3.4(b) the architecture has been simplified for handling denormalized numbers as zeros.
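A software analogue of this split is shown below. The 12/12 split of the 24-bit significands is chosen only to keep the example simple; the actual partitioning follows the width of the embedded multipliers of the target device.

    #include <stdint.h>
    #include <stdio.h>

    /* Multiply two 24-bit significands by splitting each one into an upper
       and a lower part and adding the four partial products with the proper
       alignment, as in (a + b) x (c + d) = ac + ad + bc + bd. */
    static uint64_t split_multiply(uint32_t x, uint32_t y)
    {
        uint32_t xu = x >> 12, xl = x & 0xFFFu;
        uint32_t yu = y >> 12, yl = y & 0xFFFu;
        return ((uint64_t)xu * yu << 24)
             + ((uint64_t)xu * yl << 12)
             + ((uint64_t)xl * yu << 12)
             +  (uint64_t)xl * yl;
    }

    int main(void)
    {
        uint32_t x = 0xC00000u, y = 0xA00000u;          /* significands of 1.5 and 1.25 */
        printf("%llX %llX\n",
               (unsigned long long)split_multiply(x, y),
               (unsigned long long)((uint64_t)x * y));  /* both prints match */
        return 0;
    }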

3.4.3. Division

The architecture of the divider is quite similar to the multiplier architecture, as division is the inverse function of multiplication (the exponent of the result is now calculated by subtraction). However, since there are no embedded dividers in current FPGAs, the mantissa division has to be implemented with logic. Binary division can be implemented using several algorithms [OF97], like digit recurrence methods such as restoring, non-restoring or SRT algorithms [Fre61], or algorithms based on Taylor series [HFMF99].

³ For Virtex 5 and 6, this scheme corresponds to the equation a × (b + c), where only one input mantissa needs to be split.
⁴ We will not go into the details of the handling of the exponent bias. The same applies to the rest of the units.
⁵ Considering the least significant bit as bit 0.

Figure 3.4: Multiplier Architecture. (a) Handling Denormalized Numbers. (b) Only Normalized Numbers.

Among these methods, the most common implementations are the digit recurrence methods, where one bit (restoring, non-restoring and radix-2 SRT algorithms) or more bits (radix-4 SRT, radix-8 SRT, etc.) of the mantissa result are calculated per division step. We have selected the non-restoring algorithm for our libraries due to its better performance when compared to other digit recurrence methods, as it has the simplest division step, and due to its logic requirements when compared to Taylor series methods, as these methods require multiplications and look-up tables [WBL06]. This choice implies a division with the highest number of division steps needed to obtain the result, and therefore more clock cycles than with SRT algorithms with a radix bigger than two [HU07]. However, it presents the best performance per division step, which makes this algorithm the most suited for high clock rates. In Figure 3.5 the architecture of the non-restoring division step is depicted. The sign q_i of the partial result PR_i from the previous step, inverted, gives bit i of the mantissa result (the inverter gate is not shown in the figure). If PR_i was positive, the operation to carry out is PR_{i+1} = PR_i × 2 − divisor, while if it was negative it is PR_{i+1} = PR_i × 2 + divisor.

Both operations are carried out in the adder, as the divisor, in case PR_i was positive, is converted into a negative number bit by bit with XOR gates, while the plus 1 is introduced in the lsb of the other operand. Inversely to the multiplication overflow, an underflow is now generated when the dividend mantissa is smaller than the divisor mantissa. In this case, the obtained exponent has to be decreased by one, while two options can be considered to obtain the correct mantissa. The first one is to divide the dividend mantissa directly, so the mantissa result requires a left shift of one position in case of underflow. The second one is to handle the underflow in the prenormalization (shifting left the dividend mantissa when necessary) or in the first iteration of the binary division. These two last options obtain the mantissa result directly with a leading one in its most significant bit and make the use of a guard bit unnecessary.

Figure 3.5: Division step.

In our case we have followed this solution making a dual first iteration step, computing in parallel Dividend − Divisor and (Dividend × 2) − Divisor. For the cases where the dividend mantissa is bigger than or equal to the divisor mantissa we continue the division with the result of the first subtraction, and with the result of the second subtraction when the divisor mantissa is bigger. This way we can ensure that there is no underflow. With respect to prenormalization and postnormalization, these tasks are the same as in the multiplier with two differences: the previously mentioned underflow instead of a carry, and the fact that no carry can be obtained in the rounding (consequently, there is no need for exponent correction besides subtracting one in case of underflow).
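The following C sketch follows the non-restoring recurrence and the dual first step just described, producing the truncated quotient significand. The 24-bit operand width and the function interface are assumptions of the example; the hardware computes the same bits with one conditional add/subtract per iteration step.

    #include <stdint.h>
    #include <stdio.h>

    /* Inputs: normalized 24-bit significands (integers in [2^23, 2^24)).
       Output: truncated 24-bit quotient significand; *exp_adjust is -1 when
       the dividend is smaller than the divisor. */
    static uint32_t nr_divide(uint32_t a, uint32_t b, int *exp_adjust)
    {
        int64_t d0 = (int64_t)a - b;          /* Dividend - Divisor      */
        int64_t d1 = 2 * (int64_t)a - b;      /* (Dividend*2) - Divisor  */
        int64_t pr = (d0 >= 0) ? d0 : d1;     /* dual first iteration    */
        *exp_adjust = (d0 >= 0) ? 0 : -1;

        uint32_t q = 1;                       /* leading one of the result  */
        for (int i = 0; i < 23; i++) {        /* 23 truncated fraction bits */
            pr = (pr >= 0) ? 2 * pr - b : 2 * pr + b;
            q  = (q << 1) | (pr >= 0 ? 1u : 0u);
        }
        return q;
    }

    int main(void)
    {
        int adj;
        uint32_t q = nr_divide(0xC00000u, 0xA00000u, &adj);   /* 1.5 / 1.25 */
        printf("q=%06X adj=%d (%.7f)\n", q, adj, (double)q / (1 << 23));
        return 0;
    }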

3.4.4. Square Root

The square root calculation is based on one of its mathematical properties:

if x = a × b  =>  √x = √(a × b) = √a × √b

This property perfectly fits the values of floating-point numbers (x = s × m × 2^e). For positive values (the only valid inputs for the square root) the square root of a floating-point number can be calculated as:

√x = √m × √(2^e)    (3.2)

where √(2^e) is very easy to calculate. When e is even, the result is 2^(e/2), as 2^e = 2^(e/2) × 2^(e/2). For an odd e, e needs to be converted into an even value by subtracting one (and multiplying m by two):

• Even e: √x = 2^(e/2) × √m

• Odd e: √x = 2^((e−1)/2) × √(2 × m)

The calculation of √m is also simplified due to the floating-point nature of the operand. For normalized numbers, m will have a value in the range [1, 4) (due to the multiplication by two for odd e), so the output is limited to [1, 2).
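The even/odd exponent handling can be expressed compactly in C using double arithmetic; the sqrt() call stands in for the digit-recurrence core of the hardware unit, so this is only an illustration of the exponent/significand split.

    #include <math.h>
    #include <stdio.h>

    /* Square root through the decomposition x = m * 2^e with m in [1, 2):
       fold one factor of 2 into m when e is odd so that e/2 is exact. */
    static double fp_sqrt_split(double x)
    {
        int e;
        double m = 2.0 * frexp(x, &e);       /* x = m * 2^(e-1), m in [1, 2) */
        e -= 1;
        if (e & 1) { m *= 2.0; e -= 1; }     /* odd exponent: m now in [2, 4) */
        return ldexp(sqrt(m), e / 2);        /* sqrt(m) in [1, 2)             */
    }

    int main(void)
    {
        printf("%.17g %.17g\n", fp_sqrt_split(2.0), sqrt(2.0));
        printf("%.17g %.17g\n", fp_sqrt_split(0.03), sqrt(0.03));
        return 0;
    }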

Figure 3.6: Square Root.

When the input number is denormalized, the calculation method is the same: it is only necessary to transform the denormalized mantissa into a normalized one by multiplying it by a power of two and subtracting the corresponding exponent from e. Then the previous method can be applied directly. The square root of m can be computed very similarly to the division when a digit recurrence algorithm is implemented, and thus a non-redundant algorithm has been selected again. The calculation unit follows the non-restoring algorithm presented in [LC97]. This algorithm is especially well suited for FPGAs as it works with reduced bit-width operands. As in the divider, each step is composed of a conditional negation and an addition, but now the bit-width of each step (and of those operations) is determined by the bit-width of the partial result calculated before that step. Regarding pre and postnormalization, prenormalization includes the normalization of the input number and its analysis, while postnormalization is simplified with respect to other operators: no denormalized number can be obtained as the result of a square root and, additionally, rounding cannot produce a carry, so no exponent correction is needed since the mantissa is directly aligned after rounding. The architecture of the square root is shown in Figure 3.6.

3.4.5. Exponential and Logarithm Units

Our exponential and logarithm operators are based on the previous work of Detrey and de Dinechin [DdD05a][DdD05b][DdD07]. The technique used in both works is to reduce the input range and then use table-driven methods to calculate the function in the reduced range [Tan89, Tan90]. Although both operators were designed to suit FPGA flexibility (using internal tailored fixed-point arithmetic and exploiting the parallelism features of the FPGA), their automatic generation presents some inefficiencies. We have improved the original implementations, redesigning them and including the following features:

• Redesign of units to deal only with single precision. The feature of bit-width configurability of the base designs has been removed with the corresponding saving of resources.

• Simplification of constant multiplications. Conventional multipliers have been removed where the multiplications involved constant coefficients, being replaced by KCM multipliers [BDM08]. As KCMs are FPGA-oriented constant multipliers, the performance is improved while logic resources are reduced.

• Use of unsigned fixed-point arithmetic. In [DdD05a, DdD05b] internal signed fixed-point arithmetic is used. However, some operations (like the ones involving range reduction and the calculation of the exponent of the result in e^x) are consecutive and related, and the sign of the result can be inferred from the input sign. For such operations signed arithmetic has been replaced by unsigned arithmetic, with the corresponding reduction in logic.

• Improved pipelining. The speed is enhanced by systematically introducing pipeline stages to the datapath of the exponential and logarithm units and their subunits.

3.4.5.1. Exponential unit

The algorithm [DdD05a] for e^x is as follows. Using the following expression for x, the calculation of the exponential function of a floating-point number is greatly simplified:

x ≈ k ln 2 + y  →  e^x ≈ 2^k e^y    (3.3)

where y is the number that verifies the transformation. This way, the computation of the exponent and of the mantissa (e^y) of the result is split. On the one hand, the exponent of the result is calculated approximately, with a deviation of ±1 (k in Figure 3.7), while an input range reduction technique is applied to obtain a smaller number range for the mantissa computation, which is an exponential computation, e^y. The exponential unit works internally with a fixed-point number, x_fixed, obtained from the floating-point x in an input shifter, as can be seen in Figure 3.7 (to fixed box). k is calculated by multiplying this fixed-point number by 1/ln 2 and rounding the result, while y is obtained by subtracting k × ln 2 from x_fixed. The calculation of e^y generates the mantissa of the result, and also involves a second range reduction, as y can be split by using the exponential property e^(a+b) = e^a e^b. In this way, e^y = e^y1 e^y2 is split, with y1 corresponding to the most significant bits of y and y2 to the least significant bits of y. The calculation of both exponentials is based on table-driven methods: while e^y1 is calculated directly, e^y2 is calculated indirectly using the Taylor formula, reconstructing it from T(y2):

T(y2) ≈ e^y2 − 1 − y2  →  e^y2 ≈ 1 + y2 + T(y2),  e^y = e^y1 e^y2 = e^y1 + e^y1 (y2 + T(y2))    (3.4)
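The first range reduction of Equation (3.3) translates directly into software. In the sketch below doubles stand in for the internal fixed-point datapath, and the library exp() call replaces the table-driven evaluation of e^y of Equation (3.4); the constants and names are chosen only for the example.

    #include <math.h>
    #include <stdio.h>

    /* e^x = 2^k * e^y with x = k*ln2 + y and |y| <= ln(2)/2. */
    static double exp_range_reduction(double x)
    {
        const double ln2 = 0.6931471805599453;
        double k = rint(x / ln2);            /* k is the (approximate) result exponent */
        double y = x - k * ln2;              /* reduced argument                       */
        return ldexp(exp(y), (int)k);        /* 2^k * e^y                              */
    }

    int main(void)
    {
        printf("%.17g %.17g\n", exp_range_reduction(3.7), exp(3.7));
        return 0;
    }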

During prenormalization, denormalized numbers are handled as zeros by all the libraries, as the exponential of any denormalized number gives the same result as e^0, 1. However, during postnormalization, denormalized numbers can be obtained and must be handled (right shifted). Finally, the exponent has to be adjusted depending on the value of the mantissa to eliminate the possible ±1 deviation.

Figure 3.7: Exponential function unit.

Figure 3.8: Logarithm function unit.

3.4.5.2. Logarithm Unit

The algorithm used for ln x [DdD05b] is shown in Figure 3.8. Due to the logarithm properties ln(a·b) = ln(a) + ln(b) and ln(a^b) = b·ln(a), the logarithm of a normalized number can be computed as:

y = ln x → y = ln(1.mnt) + (exp − bias) × ln 2 (3.5)

During the direct computation of this formula a catastrophic cancellation of the two terms may happen. Consequently, the internal architecture of the unit would need very large bit widths for the internal variables and very large operators to maintain the error bounds. To avoid this problem, the algorithm centers the output range of the function around zero:

• y = ln(1.mnt) + (exp − bias) × ln 2, when 1.mnt ∈ [1, √2)

• y = ln(1.mnt/2) + (1 + exp − bias) × ln 2, when 1.mnt ∈ [√2, 2)

where the first element can be denoted as ln M and the second element (before it is multiplied by ln 2) as E, see Figure 3.8. In this way, the input range of ln M is reduced to M ∈ [√2/2, √2), with an output range of [−ln 2/2, ln 2/2). The calculation of ln M is done through polynomial methods and, to achieve less than one ulp of error at a smaller cost, ln M is not calculated in one step but in two. When M is close to 1, ln M is close to M − 1, so instead of computing ln M, fewer resources are needed if f(M) = ln M / (M − 1) is calculated and ln M is then reconstructed by multiplying by M − 1. Once reconstructed, ln M is added to E × ln 2, obtaining the result in a fixed-point format which is transformed into a floating-point number in the postnormalization unit. This unit is composed of a leading one detector and a shifter to obtain the mantissa of the result. The exponent of the result will correspond to the exponent of the leading one of the fixed-point number. As the logarithm function is the inverse of the exponential one, denormalized numbers now only need to be handled at prenormalization, as no denormalized number can be obtained as a result.
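The decomposition and the centring of the output range can be mimicked in C as follows. Doubles again replace the internal fixed-point datapath and log() replaces the two-step polynomial evaluation of ln M, so this is only a sketch of the range reduction itself.

    #include <math.h>
    #include <stdio.h>

    /* ln x = ln M + E*ln2 with M in [sqrt(2)/2, sqrt(2)), which keeps ln M
       in [-ln2/2, ln2/2) and avoids the catastrophic cancellation. */
    static double log_range_reduction(double x)
    {
        const double ln2 = 0.6931471805599453;
        int e;
        double m = 2.0 * frexp(x, &e);       /* x = m * 2^(e-1), m in [1, 2) */
        e -= 1;
        double M = (m < sqrt(2.0)) ? m : m * 0.5;
        int    E = (m < sqrt(2.0)) ? e : e + 1;
        return log(M) + E * ln2;
    }

    int main(void)
    {
        printf("%.17g %.17g\n", log_range_reduction(123.456), log(123.456));
        return 0;
    }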

3.5. Libraries Evaluation and Comparison

Four libraries have been developed to study and compare the impact of each design decision (Section 3.3), where the design simplifications have been introduced gradually. The resulting libraries are:

• Std: operators without any significant change with respect to the floating-point standard (supporting the four rounding methods and denormalized numbers). Exception handling and NaN signaling are not provided.

• Norm: operators handling denormalized numbers as zeros.

• 0_Rnd: operators with only the truncation rounding mode and also handling denormalized numbers as zeros.

• HP (High Performance): operators designed including all three design decisions: denormalized numbers as zeros, truncation rounding and use of flags for the type of number.

As previously mentioned, high performance floating-point operators require deeply pipelined implementations, where the optimal number of stages for each operator depends on which design features or properties are prioritized. The criterion followed to determine the number of pipeline stages of each operator has been to achieve a high clock frequency with a reasonable number of stages. Thus, we have determined which basic calculation or iteration in any of the operators of the HP library is in the critical path, its delay being the maximum delay allowed for all other operators of HP. In our case, the critical path is found in the multiplier operator, specifically in the embedded DSPs. As we cannot pipeline it, the speed of this path has been set as the minimum one for all operators.

Table 3.4: Operators Results. Commercial Library.

             Xilinx [Xilb]      Norm               NormXlx
     Stages   Slices   MHz      Slices   MHz       Slices   MHz
+/-     8      418    256.1      402    252.9       395    257.9
*       6      150    246.9      148    253.5       144    255.0
/      26      668    175.7      802    276.9       777    276.9
/      27      868    290.9      821    293.7       781    296.8
√      18      418    190.7      411    250.2       389    250.6

For the Std and Norm libraries the same design criterion has been followed. Meanwhile, for 0_Rnd we have chosen the same number of pipeline stages as for the operators of the HP library, to study the benefits of the use of internal flags without any other changes. The evaluation of the libraries has been carried out on a Xilinx Virtex-4 XC4VFX140-11 FPGA with the ISE 10.1 environment. Results obtained are post place & route, using the same settings (balanced mapping, and normal place & route with high effort). The evaluation has been carried out in three main parts.

1. Comparison of operators with a commercial library to verify the quality of the designed libraries.

2. Study and comparison of the results for each component and for each of the libraries, analyzing the impact of each design decision on the pre and postnormalization logic.

3. Analysis of the capabilities of current FPGAs to implement floating-point operators.

3.5.1. Comparison with respect to a Commercial Library

We have selected the Xilinx floating-point operators (logic core Floating-Point Operator v4.0 [Xilb]) as the reference commercial library. This library is parameterizable (variable exponent and mantissa bit-widths and number of pipeline stages) and presents two deviations from the floating-point standard: denormalized numbers are not supported and rounding is restricted to round to nearest. Consequently, the Xilinx operators are almost equivalent to the ones of our Norm library, only differing in the calculation of the rounding, as Xilinx does not support the four rounding modes of the floating-point standard, see Section 3.3.2. Therefore, to make an exact comparison we have tuned our Norm operators restricting rounding to round to nearest, called NormXlx operators in the following. Finally, we have configured the Xilinx and NormXlx operators with the same number of stages, also the same as the ones of Norm, except in the divider, where an additional stage has also been added to meet the Xilinx optimum, 27 stages. Some common features for the four basic operators can be observed in Table 3.4.

Table 3.5: Operators Results. All Libraries.

                       Std                Norm               0_Rnd              HP
     DSP  BRAM     Slc  Stg   MHz     Slc  Stg   MHz     Slc  Stg   MHz     Slc  Stg   MHz
+/-   -    -       414    9  251.6    402    8  252.9    369    6  267.5    344    6  286.4
*     4    -       457   10  250.9    148    6  253.5    109    5  242.7    102    5  250.0
/     -    -      1037   29  256.6    802   26  276.9    753   24  274.3    717   24  287.9
√     -    -       515   20  250.0    411   18  250.2    342   16  250.4    328   16  256.8
e^x   4    1       590   18  250.6    482   16  244.4    463   15  258.6    449   15  253.9
ln    5    2       878   18  234.0    777   16  250.6    737   14  250.3    732   14  250.3

Our operators achieve better frequencies, with a remarkable increase in both the divider (for our optimum number of stages) and the square root. In the first case, the divider, the 100 MHz increase is due to the design improvement of a dual first division step, which makes the calculation of the guard bit unnecessary. As a consequence, in this comparison the Xilinx operator is configured with one stage less than its optimum. However, if the dividers are configured with the Xilinx optimum number of stages, 27, our operator is still faster. In the second case, the square root, there is a 60 MHz increase due to the algorithm selected or to how it is implemented, as the Xilinx operator stays below 200 MHz until it is configured with 26 stages. With respect to the resources used, all our Norm operators are slightly smaller even though they implement the four rounding methods (except for the division with 26 stages, which is not comparable as ours is 101 MHz faster). When comparing the Xilinx operators with the NormXlx ones, the improvements increase, as we are removing logic from our operators. The biggest impacts can be observed in the divider and in the square root, due to the removal of part of the sticky bit calculation logic. This is possible in both operators because the calculation of the sticky bit differs between the round-to-nearest and the round-up/down methods, requiring independent logic, as the sticky bit is calculated in parallel during the last iteration step to avoid an extra stage. No comparison is possible for the two advanced operators, the exponential and the logarithm, as this commercial library, like almost all other libraries, only provides the four basic operators.

3.5.2. Operators Evaluation

Table 3.5 and Table 3.6 show the experimental results (clock frequency, number of pipeline stages, and number and type of logic resources) obtained for the implemented operators, while Figure 3.9 depicts them graphically. Regarding all figures of merit, the final library, HP, outperforms the standard library Std, each operator being faster while requiring fewer resources and fewer pipeline stages. As each design decision is introduced, each library outperforms or equals the previous one, except for the clock frequency. For each design decision we are removing logic (completely or partially), and therefore the clock frequency increases if the pipelined architectures are not changed, as is our case.

Table 3.6: Slices type of logic per operator.

          Std           Norm          0_Rnd         HP
        LUT    FF     LUT    FF     LUT    FF     LUT    FF
+/-     587   438     535   421     521   356     481   362
*       663   337     203   153     127   147     114   153
/      1301  1409     909  1317     806  1259     733  1257
√       651   661     477   610     418   537     375   527
e^x     936   594     783   501     795   477     746   478
ln     1415   947    1255   878    1206   799    1189   805

Figure 3.9: Operators Evaluation. (a) Clock Frequency. (b) Pipeline Stages. (c) Logic Resources.

However, there are several exceptions to this general trend, mainly in operators using DSPs. We have found that this is due to the P&R algorithms, which do not ensure that the optimum implementation is achieved. The importance of all the improvements can be analyzed together by comparing the HP operators to Std. The reduction of slices is between 77.7% (multiplier) and 16.6% (logarithm), while the pipeline stages are reduced by between 5 (multiplier and divider) and 3 (adder and logarithm) stages. We have analyzed each stage separately: prenormalization (pre), operator calculation (cal) and postnormalization (pst), to study in detail the impact of the simplifications introduced on the operators and their impact on the pre and postnormalization overheads. In Tables 3.7 and 3.8 the results obtained in this way are shown, focusing on two metrics: the number of pipeline stages required and the number of resources needed. To have a global perspective, we can imagine a synthetic datapath composed of the six operators. The results obtained for that datapath would be equivalent to the ones in Figure 3.10, where we have grouped the results from both tables per library of operators.

Table 3.7: Split Slices comparison.

          Std                     Norm                    0_Rnd                   HP
      Unit Pre. Cal. Pst.     Unit Pre. Cal. Pst.     Unit Pre. Cal. Pst.     Unit Pre. Cal. Pst.
+/-    414  146  128  198      402  146  128  192      369  146  124  155      344  138  124  134
*      457  262   75  179      148   43   75  113      109   43   63   51      102    4   63   50
/     1037  252  729  127      802   47  729   64      753   47  695   46      717    4  695   12
√      515  150  355   49      411   39  355   39      342   39  311   16      328   29  311    0
e^x    590  124  301  186      482  124  301  105      463  124  301   74      449  119  301   38
ln     878  130  581  231      777   25  581  231      737   25  581  178      732    3  581  173

Table 3.8: Split Pipeline Stages comparison.

          Std                     Norm                    0_Rnd                   HP
      Unit Pre. Cal. Pst.     Unit Pre. Cal. Pst.     Unit Pre. Cal. Pst.     Unit Pre. Cal. Pst.
+/-      9    2    2    5        8    2    2    4        6    2    2    2        6    2    2    2
*       10    2    4    4        6    1    4    3        5    1    4    1        5    1    4    1
/       29    2   25    3       26    1   25    2       24    1   24    1       24    1   24    1
√       20    2   17    1       18    1   17    1       16    1   16    1       16    1   16    0
e^x     18    1   13    4       16    1   13    3       15    1   13    2       15    1   13    2
ln      18    2   12    4       16    1   12    4       14    1   12    2       14    1   12    2

In Figure 3.10(b), the keys Pre-Cal and Cal-Pst correspond to the pipeline stages shared between the prenormalization and calculation stages and between the calculation and postnormalization stages, respectively.

3.5.2.1. Denormalized

Comparing Std with Norm operators we can observe a reduction of the number of slices required from 67.6% (multiplier) to 11.5% (logarithm). The adder can be considered a special case as almost all the extra logic in Std is subsumed in the logic of Norm.

Figure 3.10: Slices and Stages per type for each library. (a) Slices. (b) Stages.

The overhead due to the handling of denormalized numbers can mainly be found in the prenormalization stage. As seen in Table 3.3, the prenormalization stage of almost every operator has several tasks only related to denormalized numbers. These numbers need to be converted to the normalized number format, with a one in the first bit, requiring a leading one detection and a mantissa left shift, while their exponent values have to be corrected. These tasks have a major impact on the resources required, Table 3.7, mainly in the multiplier and the divider, as in both operators the resources for these tasks have to be replicated for their two inputs. On the other hand, the impact on the postnormalization stage is much smaller because there is only one output to handle and part of the logic needed for handling denormalized numbers is shared with the logic required for handling the result. With respect to pipeline stages, not handling denormalized numbers reduces the stages needed in two ways:

• Elimination of stages: as some tasks with dedicated logic are removed from the datapath.

• Overlap of stages: when denormalized numbers are no longer handled, the calculation stage of some operators can work directly with the input numbers. In parallel, the prenormalization stage analyzes the type of the input numbers and processes the exceptions due to non-normalized inputs.

3.5.2.2. Rounding

Introducing the limitation of the rounding methods to just rounding towards zero, 0_Rnd, implies an additional reduction of slices and stages. Rounding mainly affects the postnormalization stage, as resources are required to evaluate the rounding bits, to carry out the rounding (which implies two adders, for the exponent and the mantissa) and for the multiplexing logic needed when a carry can be obtained after rounding the mantissa. Regarding the calculation stage, the major impact is on the operators with digit recurrence methods, as extra iterations are needed for computing the rounding bits and our architectures use extra resources to calculate the sticky bit in parallel with the last iteration step.

3.5.2.3. Types of Numbers

Again, as happens when not handling denormalized numbers, the major impact of introducing hardware flags for the type of number can be found in the operators with two inputs. The logic needed to determine the type of number by analyzing the mantissa and exponent values (comparators) needs to be replicated. Meanwhile, the result is formatted in the postnormalization by using a multiplexer.

Figure 3.11: Operators Replicability (HP Library).

Table 3.9: FPGA Resources.

            Virtex 4            Virtex 5
          SX55    FX140       FX200    SX240
Slices    24576   63168       30720    37440
BRAM        320     552         384     1056
DSP         512     192         456      516

When this logic is removed, the resources saved are equivalent to the resources needed for the interface units. Therefore, the reduction of logic resources with respect to 0_Rnd (up to 6.8% in the multiplier) is only effective when implementing chained operators.

3.5.3. Replicability

Replicability is a key issue to address when analyzing the suitability and capacity of modern FPGAs to implement floating-point applications. We can define replicability in a simple way as the number of operators that can be implemented in an FPGA, considering that 100% of the resources are available and taking as reference the resources used by one operator. The results obtained following this definition are depicted in Figure 3.11 for the HP operators (as they are the ones requiring fewer resources) and for four FPGAs which present a good mix of elements, two Virtex-4 (SX55, FX140) and two Virtex-5 (FX200, SX240), see Table 3.9. The plot shows the great capacity of these FPGAs to implement complex single precision floating-point algorithms, even those involving hundreds of operators. However, other facts have to be taken into account in a real scenario, such as:

• Routing stress: for large implementations, routing congestion can seriously affect the final performance or make it impossible to route already mapped operators.

Figure 3.12: Synthetic Datapath.

• Use of logic resources instead of embedded elements: when no more embedded elements are available, they can be substituted by slices.

• The datapath: replicated logic could be removed by implementation tools for operators sharing the same inputs. Additionally, the input or output registers should be removed for chained operators.

Taking all these features into account, we have designed a synthetic datapath, Figure 3.12, to analyze how many times an operator can be replicated. The datapath is composed of n levels of 10 operators each, while 9 additional operators provide the final output. For coarse grain configurability, n can be increased adding more levels, while for fine grain configurability, operators can be added at the output. The datapath has been designed to prevent the implementation tools from removing duplicated logic: there are no two equal inputs, using Zij (the output of each operator) and Z'ij (Zij with its bits reordered), while the operators are registered only at their outputs. We have used as reference FPGA a Virtex-4 XC4VFX140, while the design goal & strategy selected has been "balanced" in the Xilinx ISE environment. Among the operators not using embedded elements, we have chosen as operators under study the adder and the divider, as the adder can be considered a purely combinational operator while the divider keeps registered data, the divisor mantissa, through all the iteration steps. Among the ones using embedded elements, again two operators are studied in more detail: the multiplier, using only DSPs, and the logarithm, using both DSPs and Block RAMs. Figure 3.13 shows the results obtained for the adder (the slices results are normalized, the value 1 corresponding to 100% of the slices of the FPGA). As can be seen, the expected number of operators from Figure 3.11 (first vertical line from the left in Figure 3.13) is widely exceeded, it being possible to implement up to 241 adders. The first reason for this is that now only the outputs are registered. The second reason is that, when reaching the limits of the FPGA, the implementation tools focus their work on area optimization (although the results are obtained targeting the balanced goal).

Figure 3.13: Adder Replicability Results.

Figure 3.14: Divider Replicability Results.

Therefore, two more theoretical limits have been obtained: the first one with a balanced implementation (203, second vertical line) and the second with an area-oriented implementation (250, third vertical line). It can be seen that 100% of the resources of the FPGA are used before reaching the 241 operators. The number of slices used grows linearly with the number of operators until 100% is reached. Then, the implementation tools reduce the slices needed per operator. At this point the speed of the datapath (MHz curve of the graph) is seriously affected, as routing becomes more difficult for each new operator, harming the performance. The results for the divider are shown in Figure 3.14. In this case, the area-oriented and balanced limits are the same, as the limiting resource of the divider is the slice flip-flops. The experimental result obtained is even better than the limit and, unlike the adder, the speed of the datapath is not affected when reaching the limit. The situation changes when the embedded elements are the limiting components, as can be seen in Figures 3.15 and 3.16.

Figure 3.15: Multiplier Replicability Results.

Figure 3.16: Logarithm Replicability Results.

In both cases, the DSPs are the limiting resource, and it can be observed that when 100% of the DSPs are reached, the implementation tools replace some 18x18 multipliers with logic, and they do it for all the operators. If we focus on the multiplier unit, we can see that for 48 multipliers the 192 DSPs are used, each multiplier being configured with its optimum number of DSPs (4). But when a new multiplier is introduced, the implementation tools replace one DSP per multiplier with logic⁶ and, consequently, there is a big increase in the number of slices used while a big decrease is observed in the number of 18x18 multipliers. The same happens when reaching the next limits, 64 and 96 multipliers. Finally, the slices become the limiting resource, not allowing the next limit to be reached. The same behavior can be observed for the logarithm operator. However, now the limiting resources are both the DSPs and the slices, so when the third limit is reached, 64 logarithms, 100% of both are used, making it impossible to replace one DSP per logarithm.

⁶ For 49 multipliers, strictly only 45 multipliers with four 18x18 multipliers and four with three would be necessary.

3.6. Towards Standard Compliance and Performance

As just seen in Section 3.5, improved performance can be obtained using sub-standard operators. However, how can we obtain a standard compliant library while trying to preserve the improvements of the sub-standard libraries? Since using dedicated flags for the type of number has no cost in terms of standard compliance, we can always apply them by using interfaces. The other two decisions cannot be directly introduced if compliance with the standard is a must, so trying to fulfill the standard will require additional modifications.

3.6.1. Simplification of Denormalized Numbers: One Bit Exponent Extension

Whether a number is denormalized for a given floating-point precision depends on the exponent bit-width of that precision, wE. It will be denormalized if the value of its leading one corresponds to a 2^x with x in [−2^(wE−1) + 2 − wF, −2^(wE−1) + 1], while it will be normalized if x is in [−2^(wE−1) + 2, 2^(wE−1)). Consequently, a denormalized number for a given precision becomes a normalized number if that precision is extended by one bit. Therefore, the handling of denormalized numbers as zeros in the operators can be applied in combination with an extension of the number precision by one exponent bit. Now the interfaces between standard numbers and hardware numbers will be in charge of transforming the numbers, while the operators will have to handle numbers with an extra bit in the exponent. However, handling exponents has a small hardware cost in the operators, much smaller than handling denormalized numbers, see Section 3.6.3. Regarding standard compliance, this solution ensures that we are not losing accuracy or resolution, as the denormalized numbers of the new format will always be zero in the standard precision. Furthermore, more accurate results can be obtained with the extended precision. In datapaths involving several operations we can find that partial results that were zeros or infinities with the standard precision are normalized numbers with the new precision, so subsequent operations are more accurate. Furthermore, final results that were a zero, an infinity or a NaN can now have a numeric value with the new precision.
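The effect of the one-bit exponent extension can be checked numerically. The small C program below takes the smallest single-precision denormal, 2^-149, and verifies that its exponent falls outside the normalized range of wE = 8 but inside the normalized range of wE = 9; the bounds used are the standard minimum normalized exponents for those exponent widths.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int e;
        double x = ldexp(1.0, -149);     /* smallest single-precision denormal */
        frexp(x, &e);                    /* x = m * 2^e with m in [0.5, 1)     */
        int lead = e - 1;                /* exponent with the significand in [1, 2) */
        printf("leading-one exponent: %d\n", lead);
        printf("normalized with wE = 8 (min -126)? %s\n", lead >= -126 ? "yes" : "no");
        printf("normalized with wE = 9 (min -254)? %s\n", lead >= -254 ? "yes" : "no");
        return 0;
    }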

3.6.2. Truncation Rounding: Mantissa Extension

A one-bit mantissa extension can be applied to try to obtain the same accuracy with truncation rounding as with round to nearest. The ulp of the new precision with the extended mantissa, ulp', has a value of half the standard ulp. So the maximum error with extended precision and truncation rounding, 1 ulp', equals the maximum error obtained with standard precision and round to nearest, 0.5 ulp.

Table 3.10: Operators results with the final proposed features.

                        +/-       *       /       √      e^x      ln
Slices                  378     132     742     366     470     760
Slices (+1)             385     140     742     368     505     805
Speed [MHz]           270.0   254.6   286.2   250.1   246.7   258.5
Speed [MHz] (+1)      273.9   250.6   284.4   250.6   248.3   258.5
Stages                    8       6      26      17      16      16

This solution requires that each operator compute the input mantissas with one extra bit and also generate outputs with the extra bit. Part of the advantages of truncation rounding, not having to compute the round and sticky bits, is lost, while the calculation unit has to work with input mantissas of one more bit, requiring more resources. Concerning the bias introduced by truncation rounding, this is a source of non-compliance with the standard that cannot be corrected. The generated bias can be reduced by extending the precision with more mantissa bits, making the relative error per operation smaller and thus also the bias. Nevertheless, the bias can have an important impact on the accuracy of the results, for example, in the statistical analysis of large quantities of data for applications requiring a high degree of accuracy. Therefore, as the bias cannot be corrected, round to nearest must be kept for standard compliance and to avoid its impact on certain applications.

3.6.3. FPGA-oriented floating-point library

From the previous analysis and discussions, we consider the following features as good choices for almost standard operators while still taking advantage of the FPGA flexibility:

• Use of dedicated flags for the type of number.

• Handling denormalized numbers as zeros while the exponent is extended by one bit.

• Round to nearest should be kept as the rounding method.

Following these recommendations we have developed two hardware libraries: with the one-bit exponent extension, HW+1 (+1 indicates the exponent extension), or without the exponent extension, HW. Table 3.10 summarizes the results for the operators developed with those features, while Table 3.11 shows the results for the required interfaces. From the results in both tables we can see that the exponent extension has a major impact on the interfaces, as now they have to handle the conversion between denormalized and normalized numbers as well as the bias correction, whose value is related to the exponent bit-width (one more stage, more resources and a lower frequency). With respect to the operators, the complex ones are the most affected, as in both of them the input exponent is involved in several computations. Thereby, it would be better to use operators with no bit extension if, for a given application, denormalized numbers are not in the range of partial or final results.

Table 3.11: Required Interfaces.

                    SW-HW              HW-SW
                 HW      HW+1       HW      HW+1
Slices           42       124       35       120
Speed [MHz]   434.2     323.5    700.8     295.6
Stages            1         2        1         2

Figure 3.17: Standard Operators Evaluation. (a) Clock Frequency. (b) Pipeline Stages. (c) Logic Resources.

When we compare the results of the HW and HW+1 libraries with respect to the other libraries, we can see that, even though not all the improvements achieved with the HP operators can be preserved, the HW and HW+1 results are closer to the results of HP than to Std, see Figure 3.17. The HW and HW+1 operators present fewer pipeline stages than the Std operators, while the clock frequency is also closer to HP. Finally, considering the use of slices, the improvements with respect to the Std operators are between 12% (exponential) and 67.2% (multiplier) for the HW operators and between 10% and 66.1% for the HW+1 operators.

3.7. Conclusions

The integration density of current sub-micron technologies has allowed the implementation of complex floating-point applications in a single FPGA device. Nevertheless, the complexity of these operators (deep pipelines, computationally intensive algorithms and format overhead) makes their design especially long and complicated. The availability of complete FPGA-oriented libraries significantly simplifies the design of complex floating-point applications. In particular, we have chosen the following operators: adder/subtracter, multiplier, divider, square root, logarithm and exponential functions. In this chapter we have presented and studied the following design decisions that can be made to improve the performance of floating-point operators implemented in FPGA architectures:

• Simplification of denormalized numbers.

• Limitation of rounding to just truncation towards 0.

• Introduction of specific hardware representations for the different types of numbers.

An in-depth analysis of the performance-accuracy trade-offs of those decisions has been carried out through the development and comparison of a complete set of floating-point libraries. The following libraries have been studied and developed:

• Std: operators without any significant change with respect to the floating-point standard (supporting the four rounding methods and denormalized numbers). Exception handling and NaN signaling are not provided.

• Norm: operators handling denormalized numbers as zeros.

• 0_Rnd: operators with only the truncation rounding mode and also handling denormalized numbers as zeros.

• HP (High Performance): operators designed including all three design decisions: denormalized numbers as zeros, truncation rounding and use of flags for the type of number.

In this analysis we have not focused on the underlying computation algorithms, but on the overhead due to some of the floating-point standard features. The format overhead implies a significant use of resources; reducing it is therefore essential to obtain operators that are better suited for FPGAs while still offering good performance. In particular, it has been shown that the handling of denormalized numbers has a major impact on the FPGA operators. Following the results obtained in the previous study, we have discussed and selected a set of features that provides improved performance and reduced resource usage. This set has been chosen to design two additional FPGA-oriented hardware libraries that preserve (or even improve) the accuracy and resolution given by the standard. The operators of these libraries are the base components for the implementation of the target application.

Finally, a second analysis has been carried out to study the capabilities of FPGAs to implement complex datapaths. It has been shown that the huge capacity of current FPGAs allows up to hundreds of single-precision floating-point operators. Despite this capacity, this second analysis has also demonstrated how the working frequency of the operators is severely affected by the routing of their elements when the operators are not isolated and a high percentage of the resources of the FPGA are used. The work done in this chapter has been published in a journal paper [ELV11b]:

• Pedro Echeverría and Marisa López-Vallejo. Customizing floating-point units for FPGAs: Area-performance-standard trade-offs. Microprocessors and Microsystems - Embedded Hardware Design, 35(6):535–546, 2011.

and in a conference paper [SEMLV08]:

• Miguel Angel Sánchez, Pedro Echeverría, Francisco Mansilla, and Marisa López-Vallejo. Designing highly parameterized hardware using XHDL. In Forum on Specification and Design Languages, pages 78–83, 2008.

4

Exponentiation Operator

In the previous chapter, we focused on the design of the most common mathematical operators with the best features for obtaining accelerators with the best performance-resource combination. The chosen operators have been studied in depth in the literature, and therefore the focus was not on the algorithms implemented inside the operators but on analyzing how to lighten the arithmetic overhead. However, there are other operators that have not been studied so deeply, basically due to their enormous complexity. This is the case of the exponentiation function x^y, a computationally intensive function widely used in scientific computing, 3D computer graphics or signal processing. In this chapter, this operator is thoroughly studied and we analyze how it can be implemented. The main objective is the implementation of an accurate exponentiation operator taking advantage of FPGA features and flexibility. An architecture based on the use of sub-operators, an error analysis of the partial results and the use of tailored precision is proposed. Finally, a variable precision exponentiation function is developed in collaboration with the FloPoCo project [dDP] led by Professor Florent de Dinechin.

4.1. Exponentiation function

The complexity of the exponentiation function x^y (where x is the base and y the exponent) makes it very difficult to implement an efficient and accurate operator in a direct way without any range reduction. However, it can be reduced to a combination of other operations and calculated straightforwardly with the transformation:

z = x^y = e^{y × ln x}     (4.1)

as can be seen in Figure 4.1. The revision of the IEEE-754 standard that defines floating-point implementations [IEE08] chose to offer two independent and consistent functions for the exponentiation:

• pown(x,n) is defined only for integer n. This is the function that should be used in polynomial evaluators, for instance. The standard also includes rootn(x,n), defined as x^{1/n}.
• powr(x,y) is defined by equation (4.1), and in particular is undefined for negative x.

while a third function merges both:

• pow(x,y) is defined for x and y real numbers, and special handling is provided for negative x in combination with integer y.

Table 4.1 summarizes how each of the three functions handles the special cases that can lead to exceptions or particular values. We will focus on powr(x,y) and pow(x,y), as pown(x,n) is a special case of pow(x,y). As shown in Figure 4.1, the straightforward translation of equation (4.1) requires three sub-operators (a logarithm, a multiplier and an exponential). To develop our exponentiation operator we will follow this approach (a short software sketch of this decomposition follows the list below), which presents three main problems that have to be effectively handled:

• The complexity of both the exponential and the logarithm functions. However, as seen in the previous chapter, the use of table-driven methods in combination with range reduction algorithms [Tan89, Tan90] makes their implementation possible.
• The computation with a negative base results in Not a Number, even though the exponentiation function is defined for negative bases and integer exponents as implemented in the standard for the pow(x,y) function.

Figure 4.1: Simplified overview of a power function unit.

Table 4.1: Exception handling for the exponentiation function in the IEEE-754 standard.

            pow                                                   powr                           pown
            Case                                   Result         Case               Result      Case                          Result
x^±0        any x                                  1              finite x > 0       1           any x                         1
0^y         y < 0 and odd integer                  ∞              finite y < 0       +∞          y < 0 and odd integer         ∞
            finite y < 0 and not odd integer       +∞                                            y < 0 and even integer        +∞
            finite y > 0 and odd integer           0              finite y > 0       +0          y > 0 and odd integer         0
            finite y > 0 and not odd integer       +0                                            y > 0 and even integer        +0
0^−∞        -                                      +∞             -                  +∞
(−0)^+∞     -                                      +0             -                  NaN
(+0)^+∞     -                                      +0             -                  +0
(−1)^±∞     -                                      1              -                  NaN
1^y         any y                                  1              finite y           1
                                                                  y = ∞              NaN
0^±0        -                                      1              -                  NaN
(+∞)^±0     -                                      1              -                  NaN
x^y         finite x < 0, finite non-integer y     NaN            finite x < 0       NaN

• Equation (4.1) can lead to a large relative error in the result. Even if the sub-operators were almost exact (up to half an ulp of maximum error), the relative error of each sub-operator spreads through the equation, generating a large final relative error [Mul06], see Section 4.2.2. Extending the precision of the partial results is an effective way to minimize these relative errors, see Section 4.2.3.
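As an illustration of this decomposition and of its special cases, the following software sketch mirrors the datapath of Figure 4.1 in double precision: the pow semantics is recovered from powr by detecting a negative base with an integer exponent and feeding |x| to the logarithm, in the same spirit as the exceptions unit of Section 4.3.4. It is a minimal reference model under the assumption of finite, non-zero inputs (the special values of Table 4.1 are left to the exception handling), not the hardware implementation.

import math

def powr_ref(x, y):
    # powr(x, y) = e^(y * ln x); undefined (NaN) for a negative base.
    # Assumes finite, non-zero x (zeros/infinities handled separately, see Table 4.1).
    if x < 0.0:
        return float('nan')
    return math.exp(y * math.log(x))

def pow_ref(x, y):
    # pow(x, y): additionally allows a negative base when y is an integer,
    # computing |x|^y and restoring the sign for odd y.
    if x < 0.0 and y == int(y):
        sign = -1.0 if int(y) % 2 else 1.0
        return sign * math.exp(y * math.log(-x))
    return powr_ref(x, y)

# The error spreading of Section 4.2.2 already shows up in software:
# pow_ref(1.5, 1000.0) and math.pow(1.5, 1000.0) typically differ in the last
# bits of the mantissa, since here y * ln(x) is about 405.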

4.1.1. Related Work

Only a few approximations to the x^y function can be found for FPGAs in the literature. A technique and a hardware architecture for the exponentiation function are presented in [PBM01]. This implementation only deals with a previously known constant exponent. This work is extended in [PEB04], allowing configurable exponents that can be integers or the inverse of an integer. It presents a detailed architecture based on equation (4.1), where the exponential and the logarithm are computed relying on high-radix iterative algorithms with redundant number representations. It is unclear whether this work was implemented, and the choices made (in particular the use of redundant arithmetic) are more suited to VLSI than to FPGAs, where both efficient additions (through fast-carry lines) and efficient multiplications are available.

4.2. Range and error analysis

For an elementary function, ensuring the round-to-nearest 0.5 ulp error bound requires evaluating the function with a very large intermediate precision, typically twice the result precision wF [Mul06], where wF is the mantissa bit width. A solution, sometimes used in software, is to compute with such precision only when needed. This solution does not allow a fixed-latency hardware implementation. Therefore (and in line with most software implementations), our goal is to allow a slightly larger error, with an error bound that remains smaller than 1 ulp, which is known as faithful rounding. In other words, the hardware operator shall return one of the two floating-point numbers surrounding the exact result. This error bound also ensures that if the exact result is a representable floating-point number, the operator will return it.

Table 4.2: Sub-operators Range Analysis.

             Input Range                                          Output Range
             function     single            double               function     single             double
e^x          (−∞, ∞)      (−87.3, 88.72)    (−708.4, 709.78)     (0, ∞)       (0, ∞)             (0, ∞)
ln x         (0, ∞)       (0, ∞)            (0, ∞)               (−∞, ∞)      (−87.33, 88.72)    (−708.4, 709.78)
×            (−∞, ∞)      (−∞, ∞)           (−∞, ∞)              (−∞, ∞)      (−∞, ∞)            (−∞, ∞)

4.2.1. Input-output range analysis

Before analyzing the exponentiation function implemented with equation (4.1), its sub-operators have to be studied. The operators' underflow and overflow situations will define the intervals of possible values for the intermediate variables. In this analysis we have not taken denormalized numbers into account, as we focus on the features selected in the previous chapter. In an implementation based on x^y = e^{y×ln x}, the result comes from the exponential function. First, we can focus on its output. Defining α and ω as the smallest and largest positive floating-point numbers respectively, if the exact value of x^y is smaller than α or greater than ω, we have to return +0 or +∞ respectively. This defines the useful range of the product p = y × ln x, which is the input to the exponential function:

m_p = log(α), rounded up

M_p = log(ω), rounded down

For practical floating-point formats, this range is very small, as can be seen in Table 4.2 for single and double precision. Analytically, we have ω < 2^{2^{wE−1}}, therefore M_p < 2^{wE−1}, where wE is the exponent bit width of the given precision. Additionally, for small values of p close to zero we have e^p ≈ 1 + p + p²/2, so the exponential will return 1 for all inputs p smaller than 2^{−wF−2}, where wF is the mantissa bit-width. Given the exponential property:

e^{a+b} = e^a × e^b     (4.2)

the bits of the input number with a weight smaller than 2^{−wF−2} will have no impact on the result. Combining these limits with the reduced input range, the exponential function in floating-point format only has resolution for a reduced number of bits. In the case of the most common precisions, this resolution comprises bits with weights between 2^{−25} and 2^{6} for single precision, and between 2^{−54} and 2^{9} for double precision. For the logarithm function, as the inverse of the exponential, the analysis is exactly the same except that inputs and outputs are exchanged. Looking at the operation ranges from Table 4.2 and equation (4.1), it is clear that the straightforward architecture has to be completed with some extra component to adapt the (0, ∞) domain of the logarithm to the range of the exponentiation function, which includes the full range, (−∞, ∞), for both the base and the exponent. For the case of a negative base and an integer exponent, the result is a real number for the pow(x,y) and pown(x,n) functions. However, the straightforward implementation would lead to a Not a Number result, making it necessary to handle this case as a special case.
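These ranges are easy to reproduce numerically. The short sketch below (plain Python, independent of any hardware tool) computes m_p and M_p for single and double precision from α and ω, matching the useful exponential input ranges listed in Table 4.2; it assumes denormalized numbers are not generated, as in the previous chapter.

import math

def useful_exp_range(wE, wF):
    # Smallest positive normal number and largest finite number of the format
    bias  = 2**(wE - 1) - 1
    alpha = 2.0**(1 - bias)                    # smallest normal number
    omega = (2.0 - 2.0**(-wF)) * 2.0**bias     # largest finite number
    # Useful input range of e^p: outside it the result is +0 or +infinity
    return math.log(alpha), math.log(omega)

print(useful_exp_range(8, 23))    # single: about (-87.34, 88.72)
print(useful_exp_range(11, 52))   # double: about (-708.40, 709.78)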

4.2.2. General Error analysis

Let us define our total error bound as ϵ̄_total = 1u, where u = 2^{−wF} is the relative value of the ulp; our goal is to ensure that ϵ_total < ϵ̄_total. The total relative error of the operator is defined by

ϵ_total = (R − x^y) / x^y = R / x^y − 1.     (4.3)

where R is the result obtained and x^y represents the exact result. We compute x^y using the formula x^y = e^{y×ln x}, implemented as three sub-operators: a logarithm, a multiplication and an exponential. Obviously, the errors introduced by each sub-operator spread through the equation, harming the accuracy of the exponentiation unit, and we have to control this issue. One possible solution is to use tailored sub-operators whose inputs and/or outputs have extended precision, generating more accurate partial results with more bits of precision so that their relative error is reduced and, hence, so is the global relative error. We will denote by mul, E and Ln the implementations of the multiplication, exponential and logarithm used here. They entail the following errors:

ϵ_mul(a, b) = mul(a, b) / (ab) − 1 < ϵ̄_mul,     (4.4)

ϵ_exp(x) = E(x) / e^x − 1 < ϵ̄_exp,     (4.5)
ϵ_log(x) = Ln(x) / ln x − 1 < ϵ̄_log.     (4.6)

The purpose of this error analysis is to define the relationship between ϵ_total, ϵ_log, ϵ_exp and ϵ_mul, and to deduce from it the architectural parameters that enable faithful rounding at a minimum cost. Rewriting (4.3), we obtain:

ϵ_total = E(mul(y, Ln(x))) / e^{y×ln x} − 1

For the first sub-operator, the logarithm, we have, following equation (4.6):

Ln(x) = ln(x)(1 + ϵ_log)

while for the multiplier, following equation (4.4):

mul(y, Ln(x)) = y Ln(x)(1 + ϵ_mul)
              = y ln(x)(1 + ϵ_log)(1 + ϵ_mul)
              = y ln(x)(1 + ϵ_log + ϵ_mul + ϵ_log ϵ_mul)
              = y ln(x) + y ln(x)(ϵ_log + ϵ_mul + ϵ_log ϵ_mul)

and finally for the exponential:

R = E(mul(y, Ln(x)))
  = e^{mul(y, Ln(x))} (1 + ϵ_exp)
  = e^{y ln(x) + y ln(x)(ϵ_log + ϵ_mul + ϵ_log ϵ_mul)} (1 + ϵ_exp)
  = e^{y ln(x)} · e^{y ln(x)(ϵ_log + ϵ_mul + ϵ_log ϵ_mul)} (1 + ϵ_exp)

which leads to the final relative error for the computation of x^y with equation (4.1):

ϵ_total = e^{y ln(x)(ϵ_log + ϵ_mul + ϵ_log ϵ_mul)} (1 + ϵ_exp) − 1     (4.7)

Following equation (4.7), Table 4.3 shows ϵ_total calculated for single and double precision exponentiation units using sub-operators of the same precision (all ϵ are given in ulps, the maximum relative value of the ulp¹ being listed in the "ulp value" column). As the exponential and logarithm operators that can be found in the literature are not exact implementations, two scenarios have been considered depending on the maximum relative error of their faithful rounding:

¹ When all the bits of the mantissa are zeros.

Table 4.3: Powering function Relative Error (ulp).

            ulp value            ϵ_log    ϵ_mul    ϵ_exp    ϵ_total
Single      2^−23     Best       0.5      0.5      0.5      89.22
                      Worst      1        0.5      1        134.08
Double      2^−52     Best       0.5      0.5      0.5      710.05
                      Worst      1        0.5      1        1065.5

• Best case: with maximum ϵ_log and ϵ_exp equal to 0.5 ulp.

• Worst case: with maximum ϵ_log and ϵ_exp equal to 1 ulp.

In both cases, the product of the multiplier can be computed exactly and rounded to nearest.

Given the M_p values (the exponential input range in Table 4.2), we can obtain the magnitude of ϵ_total introduced by an exponentiation implementation whose sub-operators' precision is not extended.

For single precision, ϵ_total lies between 89.22 and 134.08 ulps, affecting up to the eight least significant bits of the significand; in terms of maximum error, this implies that the result can be off by up to 1.06 × 10^−3 %.

For double precision, ϵ_total, in terms of ulps, increases due to the higher M_p, reaching 1065.5 ulps in the worst case and affecting up to the 11 least significant bits of the significand of the result. However, due to the higher precision the impact of each ulp is much smaller, and the maximum error is now reduced to 2.36 × 10^−11 % of the result.
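Since all the quantities in equation (4.7) are known, the single precision rows of Table 4.3 can be checked with a few lines of Python. The sketch below simply propagates the sub-operator error bounds through equation (4.7); it is a numerical sanity check of the analysis, not part of the hardware design.

import math

u  = 2.0**-23                                  # relative value of the ulp, single precision
Mp = math.log((2.0 - 2.0**-23) * 2.0**127)     # ~88.72, see Table 4.2

def eps_total(eps_log, eps_mul, eps_exp):
    # Equation (4.7) evaluated with |y ln x| at its maximum M_p
    return math.exp(Mp * (eps_log + eps_mul + eps_log * eps_mul)) * (1.0 + eps_exp) - 1.0

print(eps_total(0.5 * u, 0.5 * u, 0.5 * u) / u)   # best case:  ~89.22 ulps
print(eps_total(1.0 * u, 0.5 * u, 1.0 * u) / u)   # worst case: ~134.08 ulps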

4.2.3. Error Analysis for accurate x^y

In equation (4.7), all the terms in the exponent must be small compared to 1, otherwise the operator could not be accurate. Therefore, using the Taylor series e^z ≈ 1 + z + z²/2, this equation becomes

ϵ_total = y ln(x) ϵ_log + y ln(x) ϵ_mul + ϵ_exp + ϵ_order2     (4.8)

where ϵ_order2 is a term that gathers all the terms of order 2 or higher in the development of (4.7). Each of these terms is of the order of u², and for practical values of wF (i.e. wF > 8) we have

ϵ_order2 < ϵ̄_order2 = u/16 (a tighter bound is not needed).

Replacing all the other errors by their upper bounds in (4.8), and using the bound M_p on y ln(x) defined in Section 4.2.1, we obtain

ϵ_total = M_p ϵ_log + M_p ϵ_mul + ϵ_exp + ϵ_order2     (4.9)

thus the bound of 1u on ϵ_total leads to the constraint

M_p ϵ_log + M_p ϵ_mul + ϵ_exp < 1u − ϵ_order2 = (1 − 1/16) u     (4.10)

A rule of thumb for an efficient implementation is to try and balance the contributions of the three sub-operators to the total error, i.e. balance the impact of ϵ_mul, ϵ_exp and ϵ_log in the previous equation. Indeed, if one sub-operator contributes a much smaller error than another, we should try to degrade its accuracy in order to save resources. This means here that we should aim at

M_p ϵ_log ≈ M_p ϵ_mul ≈ ϵ_exp.     (4.11)

Each of these terms is analyzed in the following.

4.2.3.1. Exponential error analysis

We start with the exponential, which produces the output. Typically, a hardware implementation of the exponential uses internal guard bits to control its output accuracy, which is expressed as

ϵ_exp = (1/2 + t·2^{−g_exp}) u     (4.12)

where the 1/2 term is due to the final rounding, g_exp is the number of guard bits that controls the accuracy of the internal datapath, and t is a factor determined by the error analysis of the chosen implementation. For example, t = 18 in [DdD07] and t = 7 in the FloPoCo implementation [dDP10] that will be used. Equation (4.10) now becomes

M_p ϵ_log + M_p ϵ_mul + t·2^{−g_exp} u < (1 − 1/2 − 1/16) u     (4.13)

It should be noted that in publications related to the exponential, such as [DdD07] and [dDP10], the error analysis assumes an exact input to the exponential. In the implementation of x^y, however, the input is mul(y, Ln(x)), which is not exact. Its error has been taken into account in the previous analysis. However, the exponential begins with a shift left of the mantissa by up to wE bit positions, which could scale up this error by up to 2^{wE}. To avoid that, we have to modify the architecture of the exponential slightly so that the least significant bit after this shift has weight 2^{−wF−g_exp}, ensuring that the rest of the error analysis of the exponential remains valid. Instead of using a wF-bit mantissa input for the exponential, a (wF + wE + g_exp)-bit input has to be used for the product mul(y, Ln(x)).

4.2.3.2. Logarithm error analysis

To satisfy equation (4.11), we have to implement a logarithm with a relative error ϵ_log ≈ 2^{−g_exp} u / M_p.

The simplest way is to use a faithful logarithm for a (wF + wE − 1 + g_exp)-bit mantissa. Its error will be bounded by 2^{−wE+1−g_exp} u, and as M_p < 2^{wE−1} we have M_p ϵ_log < 2^{−g_exp} u. Architecturally, the mantissa of the input x is simply padded right with zeroes, then fed to this logarithm unit.

4.2.3.3. Multiplier error analysis

We first remark that this multiplier has asymmetrical input widths: y is an input with a wF-bit mantissa, while Ln(x) is slightly larger; we just saw that its mantissa will have wF + wE − 1 + g_exp bits.

Multiplying these two inputs leads to a (2wF + wE − 1 + g_exp)-bit mantissa, which may be computed exactly. Three options are possible:

• Round this product to wF + wE − 1 + g_exp bits, which (as for the logarithm) entails a relative error such that M_p ϵ_mul < 2^{−g_exp−1} u.
• Truncate the multiplier result instead of rounding it, which saves a cycle but doubles the error.
• Use a truncated multiplier [WSM01, BdDPT10] faithful to 2^{−wE−g_exp} u.

This last option will entail the lowest resource consumption, especially in terms of embedded multipliers. In addition, the output mantissa size of this multiplier perfectly matches the extended input to the exponential unit discussed above. This is the choice made for the current implementation.

4.2.3.4. Error Analysis Summing up

With the implementation choices detailed above, we have only one parameter left, g_exp, and equation (4.13) now becomes

2^{−g_exp} u + 2^{−g_exp} u + t·2^{−g_exp} u < (1 − 1/2 − 1/16) u     (4.14)

which defines the constraint on g_exp:

g_exp > −log_2 ( 0.4375 / (t + 2) )     (4.15)

With the chosen implementation of the exponential [dDP10] (t = 7), we deduce that g_exp = 4 ensures faithful rounding. A larger g_exp may also be used to obtain a higher percentage of correctly rounded results.

4.3. Variable precision implementation with FloPoCo

This analysis has been integrated in the FloPoCo tool [dDKP09]. FloPoCo is an open-source core generator framework that is well suited for the construction of complex pipelines such as the exponentiation unit described here. Its salient feature, compared to traditional design flows based on VHDL and vendor cores, is to enable the modular design of flexible pipelines targeting a user-specified frequency on a user-specified target FPGA. The presented error analysis is relatively independent of the implementation of the underlying exponential, logarithm and multiplier cores. However, the selected implementations have been the ones already available in FloPoCo. This is due to two reasons. On the one hand, the ease of integration, as they are already available in FloPoCo and no extra work is required. On the other hand, and much more importantly, the features of the exponential and the logarithm provided by FloPoCo. Their combination of features matches those required for the implementation of our variable precision exponentiation [dDP10, dD10]:

• They scale up to double precision, contrary to the exponential and logarithm analyzed in the previous chapter [DdD07].

• They are pipelined, as opposed to [DdDP07b].

• They are open, as opposed to the Altera cores [Alt08a, Alt08b], which are the only other option fulfilling the previous features.

The selected logarithm and exponential functions are currently under research to improve some of their features, such as their pipelined datapath or the reduction of resources in parts of their algorithms. Hence, the results obtained in Section 4.4 are expected to improve as these sub-components themselves are improved. With respect to the multiplier, the ones provided by FloPoCo have been used as well. Figure 4.2 depicts the architecture of the exponentiation function unit x^y. As can be seen, an exceptions unit completes the logic, while only positive bases are fed to the logarithm. This combination of features avoids obtaining a Not a Number result at the logarithm unit for the case of a negative base and an integer exponent. As seen in the figure, each sub-operator has a different precision, as exposed in the error analysis. Even each input of the multiplier has its own precision, allowing the relative error of each operation to be controlled. Finally, the result obtained from the exponential function is rounded and formatted following the exception handling determined by the exceptions unit.

Figure 4.2: Power function architecture.

4.3.1. Logarithm

This is the most time- and resource-consuming part of the operator. The problem is that in the architecture used [DdDP07b, dD10] each iteration involves reading a table with a k-bit input and performing a multiplication of a large number by a k-bit number. Values of k close to 10 should be used to match the table size with the embedded RAM size, but then the embedded multipliers (18×18 bits, or larger) are not fully exploited. A value of k = 18 is optimal for multiplier usage, but leads to prohibitively large tables. In the present FloPoCo architecture, a k between 10 and 12 is used as a trade-off, but it can be concluded that this iterative range reduction, designed before the arrival of embedded multipliers and RAM blocks, is not well suited to them. Future FloPoCo releases may include the replacement of these iterations by a polynomial approximation designed for embedded memories and multipliers [dDJP10] to improve the logarithm.

4.3.2. Multiplier

As mentioned in Section 4.2.2, we need a rectangular multiplier, and we have the option of using a truncated version [WSM01, BdDPT10] to save DSP blocks. The saving is only noticeable for large precisions, so at present standard round-to-nearest multipliers are still used.

4.3.3. Exponential

The exponential currently used [dDP10] is based on a table-based reduction [Tan89] followed by a polynomial approximation [dDJP10]. The single precision version consumes only one 18×18 multiplier and an 18 Kbit RAM, which matches very well current FPGAs from both Xilinx and Altera. For larger precisions, the resource consumption remains moderate, as shown in Table 4.5. Compared to a standard exponential unit, there have been two modifications. At the input, the precision is extended from the standard wF to wF + wE + g_exp, as detailed in Section 4.2.3.1. At the output, information from the exceptions unit is taken into account.

Table 4.4: Synthesis results for Virtex-4 (4vfx100ff1152-12) for the pow function.

              precision     performance           resources
              (wE, wF)      cycles     MHz        slices     BRAMs     DSP48
[ELV08]       (8, 23)       34         210        1,508      3         13
              (8, 23)       19         99         1,149      11        13
              (8, 23)       32         200        1,249      11        13
              (8, 23)       57         262        1,726      11        13
              (11, 52)      42         101        4,337      28        54
              (11, 52)      70         157        4,380      30        54
              (11, 52)      117        262        6,096      30        54

4.3.4. Exceptions Unit

This unit is in charge of analyzing the types and values of the input data to detect the exception cases summarized in Table 4.1. As seen in Section 4.1, the standard [IEE08] defines three possible implementations of the power function, and we implement the two most general ones, pow and powr. In addition to the exceptions related to infinity, zeros, NaN or x = 1, this unit is in charge of detecting whether y is an integer, as in that case a negative x is allowed for the pow function. This comes down to determining the binary weight of the least significant '1' of the mantissa of y. Let us define e:

e = E_y − E_0 − wF + z

where E_y is y's exponent, E_0 the bias and z is the count of '0' bits to the right of the rightmost '1' bit in y's mantissa. If e is negative, y is fractional. Otherwise, y is an integer, and we have to extract its parity bit to determine whether y is even or odd.
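The same test is easy to express in software over the IEEE-754 encoding. The sketch below (Python, double precision, using the standard struct module) computes e and the parity exactly as described above; signs, zeros, infinities and NaNs are assumed to be filtered beforehand by the rest of the exceptions unit.

import struct

def integer_and_parity(y):
    # Unpack the IEEE-754 double precision encoding of |y|
    bits = struct.unpack('<Q', struct.pack('<d', abs(y)))[0]
    Ey = (bits >> 52) & 0x7FF            # biased exponent
    m  = bits & ((1 << 52) - 1)          # stored mantissa (wF = 52 bits)
    E0, wF = 1023, 52
    # z: '0' bits to the right of the rightmost '1' of the significand
    sig = m | (1 << wF)                  # add the implicit leading '1'
    z = (sig & -sig).bit_length() - 1
    e = Ey - E0 - wF + z                 # weight of the least significant '1'
    if e < 0:
        return False, None               # y is fractional
    # y is an integer; it is odd only if its least significant '1' has weight 2^0
    return True, (e == 0)

print(integer_and_parity(6.0))    # (True, False): even integer
print(integer_and_parity(7.0))    # (True, True):  odd integer
print(integer_and_parity(2.5))    # (False, None): fractional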

4.4. Experimental Results

Tables 4.4 and 4.5 summarize the experimental results obtained for the two most commonly used floating-point precisions, single and double. The results of the generated architectures for three target frequencies (120, 200, 350 MHz) are shown for pow(x,y) in Table 4.4. Meanwhile, in Table 4.5 the results for the sub-components are summarized for the target frequency of 200 MHz.

Table 4.5: Separate synthesis results for the sub-components (targeting 200 MHz).

Operator (wE, wF)             latency    freq [MHz]    slices    BRAM    DSP
FPPowr(8, 23)                 32         200           1,249     11      13
FPLog(8, 33)                  16         196           798       10      9
FPMult(8, 23×33→34)           4          294           109       0       4
FPExp(8, 23)                  9          243           390       1       1
FPPowr(11, 52)                70         157           4,380     30      54
FPLog(11, 66)                 34         157           2,491     25      24
FPMult(11, 52×66→67)          7          263           496       0       16
FPExp(11, 52)                 26         230           1,253     5       14

Table 4.6: Separate synthesis results for the exceptions unit.

             (8, 23)             (11, 52)
             pow      powr       pow      powr
Slices       23       9          111      13

4.4.1. Results Analysis

Several facts can be remarked. First, the extra precision needed by the operators: for the logarithm, 10 extra bits for single precision and 14 for double; for the exponential, in addition to the four extra guard bits of its internal architecture, the input requires 11 (single) and 15 (double) extra bits. Second, the deep pipelines required due to the complexity of the logarithm and the exponential operators. At this point, it has to be taken into account that the FloPoCo project is under development and the pipeline is generated automatically with general rules. Hence, a careful hand-tuning of the datapath could reduce the number of stages while improving the frequency. In the same way, the automatic pipelining of the subcomponents is responsible for not achieving the desired frequency (see the 350 MHz implementations).

4.4.2. Comparison with previous work

Table 4.4 also includes the results of a previous approach we carried out [ELV08] (configured with the same features as the FloPoCo operators). In that approach, we developed a first exponentiation operator, only for single precision, based on the exponential and logarithm units from Chapter 3. It is also based on extending the precision of the partial results, but it is restricted to the guard bits of those single precision operators and therefore it does not ensure an error below one u. This implementation can be directly compared with the single precision implementation targeting 200 MHz due to their similar working frequencies. The results of both units are very similar, clearly differing only in the use of BRAMs (due to the different algorithms used for the logarithm and the

Table 4.7: Exception’s control results. 8 23 11 52 pow powr pow powr Slices 23 9 111 13 exponential functions), although [ELV08] was hand-tuned looking for the optimal implementations for single precision of each subcomponent and for an optimal and balanced pipeline. Furthermore, in the comparison [ELV08] it is also benefited from the fact that it uses a logarithm with less input/output precision, (8,23), instead of the values used now (8,33). Therefore, this variable precision architecture, in addition to being accurate, it can be generated with FloPoCo for any precision and it is also more efficient and with better performance than [ELV08].

4.4.3. Exceptions Unit

Finally, Table 4.7 summarizes the cost of the exceptions control for pow(x,y) and powr(x,y), which is the only difference between both operators. The largest resource cost for pow(x,y) comes from determining whether y is an integer. This operation implies determining the weight of the least significant '1' of y's mantissa: searching for that '1' and obtaining its weight given y's exponent. Therefore, its resource usage is highly dependent on the mantissa bit-width, mainly due to the search for the least significant one, as can be seen in Table 4.7, where the slices required are more than quadrupled for pow(x,y) in double precision.

4.5. Conclusions

The availability of advanced functions for FPGAs is essential for developing hardware co-processors to enhance the performance of computationally heavy applications such as scientific computing or financial and physics simulations. In this chapter, an accurate exponentiation operator for FPGAs has been presented. The developed exponentiation operator, based on the straightforward translation of x^y into a chain of sub-operators, relies on the FPGA flexibility which allows tailored precisions. Taking advantage of this flexibility, the provided error analysis focuses on determining which precisions are needed in the partial results and in the internal architectures of the sub-operators to obtain an accurate operator with a maximum error of one u. Finally, the integration of this error analysis and the development of the operator within the FloPoCo project has allowed us to automate the generation of exponentiation operators with variable precisions.

An initial work on the exponentiation operator was published in [ELV08]:

• Pedro Echeverría and Marisa López-Vallejo. An FPGA implementation of the powering func- tion with single precision floating point arithmetic. In 8th Conference on Real Numbers and Computers, pages 17–26, 2008.

A journal paper (P. Echeverría, F. de Dinechin, B. Pasca, and M. López-Vallejo. Flexible floating-point power function for reconfigurable architectures) is currently under development.

5

LIBOR Market Model Hardware Core

As presented in the introductory chapter, financial simulation is one of the fields where the use of Monte Carlo simulation is most widespread. In this PhD we have selected one of the best known financial models based on the Monte Carlo approach, the LIBOR¹ Market Model (LMM) [BGM97], as the validation application for hardware acceleration. This model is an ideal example for several reasons:

• The complex floating-point equations it requires, which imply a high computational load and the use of sophisticated operators.
• The need for high quality random variables.
• The use of variance reduction techniques to diminish the total simulation time.
• The high accuracy it demands.
• The complexity introduced by the different scenarios, which requires a complex control.
• Its complex data dependencies.

¹ London Interbank Offered Rate.

• The impact of the parameterization of the model on the core architecture.

• It can be considered as a complete model or be part of a bigger simulation.

All these features allow us to study different aspects of hardware acceleration and to analyze the different implications of using it. Furthermore, this model allows us to thoroughly check whether complex financial simulations can be accelerated with FPGAs and with what results. Previous works on Monte Carlo financial simulations, mentioned in Section 1.1.2.1, focus on models which either are not very complex or are simplifications of real models. In our case we focus on a much more complex model, without simplifications. With the LMM implementation, two objectives are pursued. The first, general objective is the study and analysis of hardware acceleration with FPGAs, using the LMM as a benchmark. The second main objective is to analyze whether complex financial simulations are suited for FPGA acceleration. Therefore, in this chapter we study the implications related to the hardware implementation of the model itself: how the model is adapted to hardware, which elements of the model have an impact on the final performance, the development of a performance-oriented architecture, etc. Another noteworthy objective is the study of which precision is enough to implement a complex application such as the LMM. We leave for the next chapter the implications related to the integration of a hardware accelerator within a software application.

5.1. LIBOR Market Model

The LMM, also known as the BGM Model (Brace Gatarek Musiela Model), is a financial model of interest rates for pricing interest rate derivatives. The LMM defines the time evolution of the curve of interest rates according to no-arbitrage settings. The main variables of the model, see Figure 5.1, are the LIBOR market forward rates, L_i(t, T_i, T_{i+1}) or LIBORs. These rates are reference rates, in the interest rate market, for the Forward Rate Agreements: contracts at time t to borrow money from T_i,

the fixing (also known as reset) date, to T_{i+1}, the maturity date, with the simple interest rate determined by

L_i. Furthermore, from the LIBORs it is possible to calculate all the other variables of the interest rate curve that may be needed. The LMM assumes that the evolution of each forward rate is lognormal and that each forward rate has a time-dependent volatility (a measure of the variation of the price of a financial instrument over time) and time-dependent correlations with the other forward rates (how the price of one instrument affects the price of another). After specifying these volatilities, σ_i(t), and correlations,

ρ_ij, the LIBORs evolve following the equation:

Figure 5.1: LIBOR Forward Rates.

dL_i(t, T_i, T_{i+1}) / L_i(t, T_i, T_{i+1}) = µ_i(t) dt + σ_i(t) dW_i(t)     (5.1)

with i = 0, ..., N − 1, where T_N is the maturity date of the last LIBOR, and dW_i(t) is a standard Wiener process (also known as Brownian motion): a time-continuous stochastic process (the random component of the model) related to the correlations between LIBORs, ρ_ij:

E[dW_i(t) · dW_j(t)] = ρ_ij dt     (5.2)

where the different ρ_ij compose the correlation matrix of the model, [C] = [ρ_ij], with ρ_ii = 1.

In equation (5.1), besides the LIBORs L_i(t) and the Brownian motion dW_i(t), we find the other two main components of the model:

• µ_i(t) is the drift. It represents the change in the average value of L_i(t) and it is determined by no-arbitrage conditions.

• σ_i(t) is the volatility of each LIBOR (its values are determined by a calibration of the model).

If small time steps are considered, see Figure 5.2, equation (5.1) can be solved with stochastic methods relying on the Monte Carlo approach, while the Brownian motion, now N_i(t_k), is obtained from Gaussian random variables:

L_i(t_{k+1}) = L_i(t_k) + µ_i(t_k) Δt_k + σ_i(t_k) N_i(t_k) √Δt_k,    i = 0, ..., N − 1     (5.3)

for S simulation times, k = 0, ..., S − 1, t_k being the simulation dates.

The σ_i(t_k) and ρ_ij values are determined by the calibration of the model, based on market data at present and past dates, and they are known before starting the simulation of the stochastic model. Meanwhile, µ_i(t_k) is state dependent and has to be calculated for each time step and each LIBOR

Figure 5.2: Monte Carlo LIBORs simulation.

as its value depends on L_i(t_k). Using Ito's lemma [Ito51], µ_i(t_k) follows the equations:

µ_i(t_k) = −σ_i(t_k) Σ_{j=i+1}^{Q−1} [ τ_j L_j(t_k) / (1 + τ_j L_j(t_k)) ] σ_j(t_k) ρ_ij,        0 ≤ i < Q − 1
µ_i(t_k) = 0,                                                                                    i = Q − 1          (5.4)
µ_i(t_k) = σ_i(t_k) Σ_{j=Q}^{i} [ τ_j L_j(t_k) / (1 + τ_j L_j(t_k)) ] σ_j(t_k) ρ_ij,             Q − 1 < i ≤ N − 1

where τ_j is the accrual factor of L_j: the number of days during which L_j is alive divided by the number of days in a year.

Index Q ∈ [1, ..., N] identifies the maturity date, T_Q, of the zero coupon bond (a debt security which represents a financial value referred to a type of interest rate) used as numeraire (the reference value for the simulation); T_Q is the maturity date of one of the LIBORs, L_{Q−1}, and the last simulation date, t_{S−1}, see Figure 5.2.

The randomness of the model lies in the N_i(t_k) component. For each simulation step we need N random variables² with Gaussian marginal distribution N(0,1) and a joint multivariate distribution with correlation matrix [C]:

N_i(t_k) = Σ_{j=0}^{N−1} ã_{i,j} N(0,1)_j(t_k)     (5.5)

where ã_{i,j} are the coefficients resulting from the decomposition of matrix [C] into a pseudo square root matrix [Ã] via the Cholesky method:

[C] = [Ã][Ã]^T     (5.6)
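Equations (5.3) to (5.6) can be condensed into a very small software reference model, useful as a golden model against which hardware results are compared. The sketch below (Python with NumPy) advances all the LIBORs one time step for a single path, assuming for simplicity that every LIBOR is still active and that σ_i(t_k), τ_j, ρ_ij and Δt_k are already provided by the calibration; the function and variable names are illustrative, not those of the hardware implementation.

import numpy as np

def lmm_time_step(L, sigma_k, tau, C, dt, Q, rng):
    """One Euler step of equation (5.3) for all N LIBORs of a single path."""
    N = len(L)
    A = np.linalg.cholesky(C)                 # [C] = [A][A]^T, equation (5.6)
    gauss = rng.standard_normal(N)            # N(0,1) samples for this time step
    corr = A @ gauss                          # correlated noise N_i(t_k), equation (5.5)

    # Drift of equation (5.4): mu_i depends on the current LIBORs
    w = tau * L / (1.0 + tau * L)             # tau_j L_j / (1 + tau_j L_j)
    mu = np.zeros(N)
    for i in range(N):
        if i < Q - 1:
            mu[i] = -sigma_k[i] * np.sum(w[i+1:Q] * sigma_k[i+1:Q] * C[i, i+1:Q])
        elif i > Q - 1:
            mu[i] = sigma_k[i] * np.sum(w[Q:i+1] * sigma_k[Q:i+1] * C[i, Q:i+1])

    # LIBOR update, equation (5.3)
    return L + mu * dt + sigma_k * corr * np.sqrt(dt)

In the hardware engine the decomposition of the correlation matrix (and the factorized matrix [A] of Section 5.2.2) is computed once during the calibration stage, so only the matrix-vector product and the drift accumulation remain in the datapath.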

5.1.1. LIBOR Market Model as base to compute financial products

The LMM itself is a complete model that obtains a curve of interest rates through time, represented by the LIBORs. This curve is used to value, at the present date, different financial products whose underlying variables are the interest rates at future dates. The valuation of these products follows the Monte Carlo simulation of the LMM: for each replication of the LMM a different interest rate curve is obtained and the product is valued for that curve.

² One Gaussian variable per LIBOR.

The final value of the product is obtained from some statistical measure of the product valuations, such as the arithmetic mean. There is a great variety of products that are valued relying on the LMM, from very simple ones to others that are extremely complex in terms of the calculations they require. Furthermore, and unlike the LMM itself, which is a very stable model, the equations for those products can vary frequently. However, some common equations shared by a wide range of products can be found. The most representative one is the numeraire valuation at the main dates of the simulation, N(T_i), which is the first step to value some products:

N(T_i) = Π_{j=i}^{Q−1} 1 / (1 + τ_j L_j(T_i))     (5.7)

Then, the measures of the numeraire are used to value the pay-offs of the product, P_i, and from them the value of the product, known as the net present value, NPV. For example, for one of the simplest products, a cap [Cap]:

P_i = max{L_i(T_i) − K, 0} / N(T_i),    i = 1, ..., R     (5.8)

NPV = Σ_{i=1}^{R} (τ_i P_i)^α     (5.9)

where there are R payments at dates T_i, which coincide with the fixing dates of the simulated LIBORs, K is the strike (the cap value of the product) and α is one or a value close to one. As mentioned above, there is a wide variety of products. It is out of the scope of this Thesis to study the different product valuations in depth, although we are going to consider this example, the cap, to illustrate a more complete model with the addition of a post-processing stage to the LMM. In the next chapter we study in more detail how this post-processing affects the whole accelerator.
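A software sketch of this post-processing stage helps to see why it maps well onto a pipeline fed by the LIBOR engine: for each path it only consumes the LIBORs fixed at the payment dates, accumulates one term of equation (5.9) per payment, and involves the division and exponentiation operators of Chapters 3 and 4. The code below (Python with NumPy) is a plain reference formulation of the cap example; the array layout is an assumption for illustration, not the hardware datapath.

import numpy as np

def cap_npv(L_fix, tau, Q, K, alpha=1.0):
    """Net present value of a cap, equations (5.7)-(5.9), for one simulated path.

    Row i of L_fix holds the LIBOR curve observed at date T_i (row 0 is t = 0),
    so payments run over i = 1, ..., R with R = L_fix.shape[0] - 1."""
    R = L_fix.shape[0] - 1
    npv = 0.0
    for i in range(1, R + 1):
        # Numeraire at T_i, equation (5.7): product over j = i, ..., Q-1
        numeraire = np.prod(1.0 / (1.0 + tau[i:Q] * L_fix[i, i:Q]))
        # Pay-off at T_i, equation (5.8)
        P_i = max(L_fix[i, i] - K, 0.0) / numeraire
        # One term of the net present value, equation (5.9)
        npv += (tau[i] * P_i) ** alpha
    return npv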

5.2. Model Analysis

The equations detailed in Section 5.1 are the core of the LMM and the basis of the hardware implementation we have carried out, while the equations of Section 5.1.1 can be considered an optional module. However, a global perspective of the model will not be complete if, in addition to the equations themselves, other facts are not taken into account, such as:

• The range of the variables which determine the computational load of the simulation.

• Simplifications that can be introduced into the model.

• Complexity of the operators involved.

• Data dependencies.

5.2.1. Variables’ Range

Taking a first look at the equations of the model, it can be observed that, besides the complexity of the equations and the number of replications required, the computational load of the model is determined by the number of LIBORs to simulate, N, and the number of simulation times, S. For each replication of the model, from now on a path, there are S − 1 time steps to simulate³, with N LIBORs, N drifts and N correlated Gaussian variables. To get an idea of the number of variables that the model can handle per path, two examples with the largest values of the variables can be used. For the S variable, a 25 year simulation with weekly time steps can be considered, resulting in 1303 time steps. For N, we can consider monthly forward rates during 30 years, requiring 360 LIBORs, drifts and Gaussian variables per time step. Combining both leads to a model with 469,080 LIBORs per replication. However, one circumstance lightens the computational load of each replication: not all the LIBORs have to be computed at every time step. As LIBORs reach their fixing dates, their value is fixed and there is no need to compute those LIBORs again. Thereby, the number of LIBORs to be computed per time step, Nact (active LIBORs), decreases by one each time a fixing date is reached in the path. Again, a real example can help us understand this impact: for a simulation with 159 monthly LIBORs and 144 weekly time steps there are a total of 22,896 LIBORs to be computed. However, as there is approximately one fixing date every four simulation dates, the total number of LIBORs to be computed is reduced to 20,214, which corresponds to 11.9% fewer variables.

5.2.1.1. Use of Variance Reduction Techniques. Paths required

As seen in Section 2.3.2, the result of a Monte Carlo simulation converges to the exact result when the number of replications tends to infinity, so a huge number of paths is required to obtain the accuracy demanded in financial models. Due to the high computational load of each path, and to the huge number of paths required to obtain the desired accuracy, it is necessary to use variance reduction techniques to reduce the total amount of paths. In the LMM case we need random variables with Gaussian marginal distribution N(0,1) and a joint multivariate distribution with correlation matrix [C]. Thereby, Latin Hypercube is the technique best suited for the LMM. In principle, an N-dimensional Latin Hypercube would be needed, although factorization, see the next section, considerably reduces the number of dimensions required. Meanwhile, the number of stratas can be as large as wanted, with

³ Time 0 corresponds to the initial date, where the values of the LIBORs are known.

typical configuration values between 4 and 32 stratas. This fact has a direct impact on determining how to simulate, due to the implications of stratified variables on memory storage, see Section 5.3. Even taking into account the use of Latin Hypercube, the number of paths that need to be simulated can be in the order of one or two hundred thousand.

5.2.2. Simplifications to the model: Factorization

As just seen, the number of LIBORs, N, can be in the order of hundreds. Consequently, the dimension of matrixes [C] and [Ã], N × N, and the number of elements of the summations in equations (5.4), the drift, and (5.5), the Brownian motion, are also in the order of hundreds. However, a reduction factor F < N can be applied, evolving the forward rates with F independent Wiener processes [Jae02]. In this case, a new coefficient matrix is obtained, [A], with dimension N × F. Therefore, equation (5.5) can now be calculated with F Gaussian random variables according to the new factor F:

N_i(t_k) = Σ_{h=0}^{F−1} a_{i,h} N(0,1)_h(t_k)     (5.10)

Additionally, with the decomposition of matrix [C] into [Ã] and the subsequent factorization of [Ã] into [A], the drift equation can be rewritten as:

 ∑ ∑  F −1 Q−1 τj Lj (tk)  −σi(tk) aih σj(tk)ajh, 0 ≤ i < Q − 1  h=0 j=i+1 1+τj Lj (tk) µi(tk) = 0, i = Q − 1 (5.11)  ∑ ∑  F −1 i τj Lj (tk) σi(tk) aih σj(tk)ajh,Q − 1 < i ≤ N − 1 h=0 j=Q 1+τj Lj (tk)

5.2.2.1. Equations Analysis. Factorization Impact

There are three variables to compute per time step and per LIBOR (L_i(t_{k+1}), µ_i(t_k) and N_i(t_k)), and only the L_i(t_{k+1}) equation can be quickly computed, as it implies two additions and three multiplications. The other two variables are computationally very intensive. The correlated Gaussian variables equation is equivalent to a vector multiplication, requiring as many multiplications as vector elements and as many additions as vector elements minus one when one result per cycle is targeted. The importance of using factorization is easily observed, since factor F is in the order of tens (10-20) while N can reach hundreds, meaning that, just for the calculation of each N_i(t_k) per time step, factorization saves N − F multiplications and additions. Furthermore, it also reduces the required number of N(0,1) random variables per time step from

N to F, reducing the computational load of all the components of the Gaussian random number generator and the number of dimensions required for the Latin Hypercube (and the memory resources it needs).

With respect to the drift, µ_i(t_k), analyzing equation (5.4) we observe a summation whose number of operands depends on the index i of the drift and can be in the order of hundreds. In software this situation implies an important computational cost, but in hardware two other problems arise: how to handle an equation with a variable number of operands, and how that large number of operands can be efficiently managed on a hardware architecture. A first look at the alternative equation for the drift after factorization, equation (5.11), makes the problem even worse, as a second summation appears, now of F operands. However, when analyzed in more detail, it can be observed that in equation (5.11) the operands of the first summation do not depend on a variable with index i. This makes it possible to reuse the partial results among different drifts when the index i is covered by an appropriate scheduling (see the sketch below). Now the problem is reduced to the second summation, with a fixed number of operands, F, whose value is in the order of tens. In Section 5.4.3.2 the adopted solution is explained.
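The following sketch (Python with NumPy) illustrates the reuse of partial results for the drifts of the branch Q − 1 < i ≤ N − 1 of equation (5.11): for each factor h, a single running accumulator over j is updated once per LIBOR and shared by all subsequent drifts, so each drift only needs the final F-operand combination. The scheduling and the accumulator naming are illustrative, not the hardware datapath of Section 5.4.3.2.

import numpy as np

def drifts_with_reuse(L, sigma_k, tau, A, Q):
    """Drifts mu_i(t_k) for Q-1 < i <= N-1, reusing the inner sums of (5.11)."""
    N, F = A.shape
    w = tau * L / (1.0 + tau * L)            # tau_j L_j / (1 + tau_j L_j)
    acc = np.zeros(F)                        # running inner sums, one per factor h
    mu = np.zeros(N)
    for i in range(Q, N):
        # Extend every inner sum with the single new operand j = i
        acc += w[i] * sigma_k[i] * A[i, :]
        # Final F-operand combination for drift i
        mu[i] = sigma_k[i] * np.dot(A[i, :], acc)
    return mu

The branch 0 ≤ i < Q − 1 can be computed analogously by walking i downwards from Q − 2 and accumulating the operand j = i + 1 before each drift.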

5.2.3. Operators’ complexity

Most of the operations in the equations are additions and multiplications. In software, these operations are very efficient, as current CPUs contain dedicated silicon for them with floating-point arithmetic. However, we can find three more complex operations: division, square root and exponentiation, which have a major impact in software, as they are usually emulated through subroutines. As seen in Chapters 3 and 4, high performance implementations of all the operators can be achieved in FPGAs if deep pipelines are not a problem. Again, the adder and the multiplier are the most efficient operators in terms of resources and pipeline stages.

Of the three complex operators, the square root can be avoided in the FPGA architecture. It is only used to compute √ΔT, and this can be done just once during the calibration of the model and stored in the memory of the FPGA. However, the divisions and the exponentiation (in case a product valuation is included) are mandatory.

5.2.4. Model Summary

Summarizing the previous sections, a complete LMM is characterized by the following parameters:

• N: the number of LIBORs of the system.
• Q: the index of the time corresponding to the numeraire.
• S: the number of simulation times.

Table 5.1: Parameters Range.

             N           S            F         LH       P
Typical      100-200     hundreds     10-20     4-32     40000-200000
Maximum      ≈ 360       ≈ 1300       20        32       400000

• F: the reduction factor.
• LH: the number of stratas of the Latin Hypercube.
• P: the number of replications (paths) required.

whose typical and maximum values are summarized in Table 5.1, based on real market simulations. Also, the following initial data are required:

• [L(0)]: the LIBORs at the initial date t = 0.
• [τ]: the accrual factors.
• ρ_ij ([C], [Ã], [A]): the correlations between LIBORs and their associated correlation matrixes.

• [σ(t_k)]: the volatility matrix.
• [ΔT] and [√ΔT]: the sizes of the time steps.

And finally, the following operations will be necessary:

• Addition.
• Multiplication.
• Division.
• Exponentiation.

Hence, before developing an architecture it is necessary to determine the maximum values of these parameters: F, S, LH and N. In our case, and based on real market simulations, we have:

• N: 30 years of monthly forward rates, which imply 360 LIBORs.
• S: a 25 year simulation with weekly time steps, which sets a limit of 1303 time steps.
• F: 20 correlation factors.
• LH: 32 Latin Hypercube stratas.

5.2.5. Qualitative Profiling

The last part of the model analysis addresses whether this model can be accelerated with an FPGA and how.

From the equations of the model, the range of the parameters and the results obtained for the required Gaussian RNG in Chapter 2, it is easily deduced that the classic approach to Monte Carlo acceleration, the parallel computation of many independent replications, cannot be applied. The required Gaussian RNG consumes many resources, while the equations of the model imply summations of multiplications that should be implemented in parallel, thus requiring a high number of operators (determined by variable F) and therefore resources. Consequently, we have to focus on the acceleration of just one replication (maybe two replications in parallel). From a brief analysis of the model equations, acceleration can only be achieved by exploiting the parallelism of the model and building datapaths with many operators working at the same time, while a major concern has to be resolved or minimized: the impact of data dependencies, which can be seen as feedbacks.

5.2.5.1. Model Parallelism

Three degrees of parallelism can be exploited:

• Between the model core and the Gaussian RNG core. The RNG core has to feed the model with random numbers and no other dependency exists between both cores. The developed Gaussian RNG is capable of generating one Latin Hypercube while a previously generated one is being used, allowing both cores to work in parallel: the RNG works one time step in advance, preparing the Latin Hypercube for the next time step while the model core is using the random numbers of the current time step.
• Between variables of the LMM. The drift and the Brownian motion are independent, so their calculation can be done in parallel.
• Within the variables' equations. The three main equations of the model have operations that can be done in parallel. Especially relevant are the correlation of the factors of the Brownian motion and the drift term.

5.2.6. Data dependencies

In the model two data dependencies can be found:

• Drift calculation at a given time requires the values of the active LIBORs at that time.
• LIBOR calculation at a given time requires the LIBOR at the previous time.

In hardware, these dependencies can be seen as data feedbacks whose impact is related to the number of pipeline stages between the stage where the data are calculated and the stage where those data are needed.

Figure 5.3: LMM Monte Carlo Simulation with Latin Hypercube.

However, an additional data dependency has a strong impact on the performance of the accelerator. As mentioned in Section 5.2.2, the drift term implemented following equation (5.11) can skip the first summation by reusing partial results, which introduces a new data feedback. Thereby, data dependencies will have a strong impact on the control of the hardware accelerator and on the working clock frequency.

5.3. Adapting the model to Hardware

5.3.1. Simulation order

The use of Latin Hypercube introduces dependencies among the random variables of a group of paths whose size equals the number of stratas used. From now on we will refer to these related paths as a grouped path, and to each of its individual paths as a subpath. For all the time steps of the simulation, the random variables of a grouped path are calculated together, and thereby replications are not handled alone, see Figure 5.3, where each subpath is denoted as LH_i. Two options arise to handle this new situation:

• Independent simulation of each subpath. Before simulating the first subpath of a group, this type of simulation requires the generation of all the random variables for every time step and every subpath of the group.

• Joint simulation of a group of subpaths. Subpaths are simulated together, so for each time step all the LIBORs are computed for all the subpaths. Now, the generation of random variables can be done for each time step. Two new options are now possible:

– To compute all the LIBORs of a subpath before continuing with the next subpath.

– To compute the same LIBOR for all the subpaths before continuing with the next LIBOR.

The main implications of choosing between independent and joint simulation are the memory resources required and the design of the pipelined datapath. In the first scheme, F × LH ×

(S − 1) memory words are needed for the random variables, while N_act memory words are required for the feedback of LIBORs between two consecutive times. In the second case, we only need to store F × LH random variables, while now N_act × LH LIBORs are required for the feedback. As embedded memory resources are scarce in FPGAs, the second option is better, as it requires less memory. Furthermore, there is another important issue that should be taken into account: the impact that both schemes have on the pipelined datapath. As will be seen in Section 5.4.3, a deep pipeline is required to achieve high performance. The data dependencies of the model demand a minimum number of clock cycles between two time steps due to the feedback of LIBORs, equal to the number of feedback stages (Fb_stg from now on): the number of pipeline stages between the first stage where a feedback LIBOR is used (the drift) and the stage where it is calculated.

This circumstance implies that if there are fewer LIBORs to simulate per time step than Fb_stg, the hardware engine has to wait for the feedback LIBORs to be computed, harming the throughput. As with the memory, the second scheme is better, since between two time steps there are N_act × LH LIBORs to be computed, while in the first scheme there are only N_act, as the short calculation after this paragraph illustrates.
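The trade-off is easy to quantify with the typical parameters of Table 5.1. The snippet below compares the random-variable storage of both schemes and checks the stall condition of the joint scheme; the Fb_stg value used is only a placeholder, since the real figure depends on the pipeline of Section 5.4.3.

F, LH, S = 20, 32, 1300          # typical upper values from Table 5.1
N_act = 360                      # active LIBORs at the start of a path
Fb_stg = 40                      # placeholder pipeline depth (illustrative only)

words_independent = F * LH * (S - 1)   # random variables stored, first scheme
words_joint = F * LH                   # random variables stored, second scheme
print(words_independent, words_joint)  # 831360 vs 640 memory words

# Stall condition of the joint scheme: only when very few LIBORs remain active
print(N_act * LH < Fb_stg)             # False: no stalls with these values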

5.3.2. Tailored Arithmetic

Financial simulation demands highly accurate results, as any small deviation from the exact results can mean the loss of a large amount of money. Hence, almost all financial simulations rely on floating-point arithmetic, as it is the only arithmetic capable of ensuring the expected accuracy given the simulation requirements:

• Wide value range of the numbers involved in the simulation.
• High error propagation of intermediate results due to data and time dependencies.

Of the two most common floating-point precisions (single and double), financial simulation relies on double precision because, with its 53-bit mantissa resolution, the propagation error is smaller than with the 24-bit mantissa of single precision. However, in FPGAs this precision has a high cost in terms of processing time, resources and design complexity, requiring at least twice the resources of single precision and more pipeline stages, and achieving lower frequencies [Alta]. Furthermore, the accuracy required in the results of many applications can be achieved with single precision, or with a tailored precision which is much closer to single than to double precision [LLK+06]. Therefore, the ideal arithmetic to adopt has to offer an accuracy which ensures that its results can be considered equivalent to those obtained with double precision, but with a use of resources closer to that of single precision operators. In this way, a key factor for the engine architecture is the arithmetic used and its precision, as FPGAs work at bit level, allowing the use of any tailored arithmetic. Choosing the adequate arithmetic (fixed-point, floating-point, logarithmic, ...) with the adequate bit-width resolution is essential to obtain results with the expected accuracy while achieving the best performance (fewer FPGA resources and higher clock frequencies).

Hence, precision and area must be traded off while still obtaining highly accurate results. However, how do we decide which precision is enough for an application? A mathematical analysis of the maximum error introduced by reducing the precision leads to a huge error bound due to the combination of three features of the model: first, the data feedback of the previous LIBORs to the drift calculation and to the next LIBOR calculation; second, the huge number of operations per equation (summations); and third, the huge number of simulation times that can be involved per replication of the model. Additionally, any FPGA implementation will exploit the parallelism of the equations of the mathematical model. Due to the non-associativity of floating-point operations, the results obtained with parallel operations may differ from those obtained with an equivalent sequential implementation [KD07], making a mathematical analysis even more unfeasible. In our case we have decided to choose the precision following an experimental methodology: we set an accuracy criterion, a maximum difference allowed between the software obtained LIBORs and the hardware obtained ones, and then we search for the precision that fulfills the criterion (the sketch after this paragraph illustrates the idea in software). In Section 5.5.2 the criterion followed, how we measured it and the results obtained are exposed.
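As an illustration of this experimental methodology, the sketch below reruns the reference time step of Section 5.1 while rounding the LIBOR state to a reduced precision (NumPy's float32 stands in for the tailored hardware arithmetic) with the same random numbers, and reports the maximum relative deviation with respect to the double precision golden run. The lmm_time_step function is the reference sketch given earlier, and the comparison only emulates a reduced storage precision per step, not a bit-accurate hardware datapath.

import numpy as np

def max_relative_deviation(L0, sigma, tau, C, dt, Q, steps, seed=0):
    """Maximum relative difference between a double precision golden run and a
    run whose LIBOR state is rounded to float32 after every time step."""
    rng_hi = np.random.default_rng(seed)
    rng_lo = np.random.default_rng(seed)          # same random numbers in both runs
    L_hi = np.asarray(L0, dtype=np.float64)
    L_lo = np.asarray(L0, dtype=np.float64)
    worst = 0.0
    for k in range(steps):
        L_hi = lmm_time_step(L_hi, sigma[k], tau, C, dt[k], Q, rng_hi)
        L_lo = lmm_time_step(L_lo, sigma[k], tau, C, dt[k], Q, rng_lo)
        L_lo = L_lo.astype(np.float32).astype(np.float64)   # inject the precision loss
        worst = max(worst, float(np.max(np.abs(L_lo - L_hi) / np.abs(L_hi))))
    return worst   # to be compared against the chosen accuracy criterion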

5.4. FPGA Monte Carlo Libor Market Model Engine

5.4.1. General Architecture

As introduced in Section 5.1, the Monte Carlo LIBOR Market Model simulation requires a calibration stage to determine the volatilities of the model, σi(t), and the correlation factors (and their related matrices). This task, although complex and time consuming, is done only once and represents just a negligible part of the total simulation time. Thereby, it is more suitable to perform it in software. Additionally, calibration can be used to compute all the other variables that remain constant through all the different P paths, such as Δtk, √Δtk, τj and the values of the LIBORs at t0, and to signal which simulation times correspond to a fixing or a maturity date of any LIBOR. All these data, together with the volatilities σi(t) and the correlation matrix used, must be stored in the FPGA for the simulation, and this can be done using Block RAMs.

Figure 5.4: Engine Architecture.

Hence, the LMM engine will be composed of several memories storing the data which is constant through all the replications and of the units implementing the repetitive tasks of the mathematical model and the random number generation. Figure 5.4 shows the top view of the engine. The RNG core provides the random variables with the Gaussian distribution required by the LMM core and with the statistical properties defined by the variance reduction technique employed, the Latin Hypercube. Between the RNG core and the LMM core, where the LIBORs are calculated, an interface element is needed to adapt the way the RNG core provides random samples (one per clock cycle) to the way the LMM core consumes them (as a vector of F variables, see Section 5.4.3.1). On the right side of the figure we find the optional module, the product valuation, whose inputs are the computed LIBORs and their associated τi. These four elements are managed by a general control unit that also handles the synchronization (see Section 5.4.5), the storage of the data calculated during software calibration, and the storage of the data which parameterizes the simulation of the model: N, Q, S, F, LH and P.

5.4.1.1. Implementation strategy

The main goal of any hardware accelerator is to carry out a computational task in the minimum possible time, searching for the most efficient way to compute the task while doing it at the highest clock frequency. In a model without data dependencies this would imply the use of as many pipeline stages as possible to achieve a high clock rate. However, the design of an efficient hardware accelerator has to take into account the data dependencies of the model, if they exist, as is the case of the LIBOR Market Model.

As seen in Section 5.3.1, the feedback of the LIBORs calculated in the previous time step makes it necessary to stall the engine when the number of LIBORs per time step, Nact ∗ LH, is smaller than the number of pipeline stages, Fbstg, between the stage where a LIBOR is obtained and the stage where it is needed. The importance of these stalls for the throughput depends on the values of Fbstg and Nact ∗ LH, and in most cases, due to the high values of Nact ∗ LH, these stalls are never produced or their impact is negligible. Hence, the pipeline could be as deep as desired if this implies an improvement in the working frequency.

However, two critical elements of the system limit the maximum achievable frequency: the working frequency of the Latin Hypercube and the data dependency found in the drift. With respect to the Latin Hypercube, the copy of the permutation tables to the tables from which the strata are read limits the working frequency of the whole system, due to the size and capacity of the tables needed and the requirement of copying all permuted strata in just one cycle, see Section 2.5.3. On the other hand, due to the reuse of partial results in the drift calculation, Section 5.4.3.2, an accumulator is needed to replace the summation in equation (5.11), creating a data dependency between the output of the accumulator and one of its inputs. This situation implies that the floating-point addition of the accumulator must be carried out in just one cycle, preventing the pipelining of this operator.

Due to this last limitation, the acceleration will not come from achieving a high frequency. It will come from a pipelined datapath capable of calculating one LIBOR per cycle with a high number of operators and elements working in parallel. To achieve this datapath, the RNG core and the LMM core (plus the product valuation) must work in parallel: while the LMM core is computing the LIBORs for a time step, the RNG core is feeding the LMM core with the random variables needed for that time step and calculating the Latin Hypercube for the random samples of the next time step.
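As a back-of-the-envelope illustration of the stall condition just described (my own example numbers, not figures from the thesis), the stalls per time step caused by the LIBOR feedback can be estimated as:

```python
def feedback_stall_cycles(n_act, lh, f_bstg):
    """Stall cycles per time step caused by the LIBOR feedback.
    n_act  : LIBORs still active in this time step
    lh     : Latin Hypercube subpaths simulated together
    f_bstg : pipeline stages between where a LIBOR is produced and where it is fed back
    """
    return max(0, f_bstg - n_act * lh)

# Example: 40 active LIBORs and LH = 32 give 1280 LIBORs per time step,
# far above a feedback depth of, say, 60 stages, so no stalls are produced.
print(feedback_stall_cycles(40, 32, 60))   # -> 0
```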

5.4.1.2. Adaptable Architecture

In hardware, the parameterization due to N, Q, S, F, LH and P has a direct impact on the architecture of the engine itself:

1. F determines the number of elements of the summations of equations (5.10) and (5.11).

2. F , N and S determine the number of words needed to store the calibration data and the other data common to all paths.

3. LH and N determine the number of words needed to store the LIBORs for feedbacks.

4. LH and F determine the number of words of the Latin Hypercube tables and their configuration.

The first and fourth facts have the biggest impact on the architecture. The number of elements of the summations determined by F has a direct translation into the architecture of the engine when those summations are handled with partial additions in parallel, see Section 5.4.3.1. Additionally, the synchronization of the RNG core with the LMM core and the control signals also depend on F, as the LMM core requires the F Gaussian samples at the same time.

Meanwhile, the fourth fact determines the resources required by the Gaussian RNG while affecting its performance, as studied in Section 2.5.3. Hence, the values of the parameters affect the design of the architecture itself and have a direct impact on the resources needed for storage and for the Latin Hypercube. However, the hardware accelerator must be able to simulate any set of parameters. It is therefore necessary to develop an architecture whose datapath and components are flexible enough to adapt to the different sets of parameters. To meet this requirement, our approach has been to develop a maximum-bound architecture with a control mechanism: an architecture sized for a simulation whose parameters take their maximum values, in combination with the necessary logic and control mechanisms to adapt all the elements affected by the parameters when simulating smaller values. Hence, prior to developing the architecture it is necessary to determine the maximum values for those parameters: F, S, LH and N. Our architecture will thus be designed according to the maximum values summarized in Table 5.1.

5.4.2. Gaussian RNG Core

F random numbers with normal distribution, N(0,1)j(tk), j = 0, ..., F − 1, are required to compute the random component Ni(tk) of the model, see equation (5.10). The Gaussian RNG with Latin Hypercube developed in Chapter 2 will be used as the base RNG core, as it fulfills the main features required for the LMM accelerator. Four criteria to decide which generation method is better suited for a RNG and Monte Carlo acceleration were exposed in Section 2.3.2. In the case of the LIBOR Market Model engine, three of those criteria are essential.

First, the compatibility with variance reduction techniques is a must, as the computational load of the model, in combination with the huge number of paths needed to obtain the desired accuracy, makes the use of these techniques necessary to reduce the total number of paths and hence the total simulation time.

Second, the capability of generating one sample per cycle is essential to obtain the maximum throughput from the engine. The LMM core is able to compute one LIBOR per cycle, see Section 5.4.3, and if there were no factorization it would require one Gaussian sample per cycle. When factorization is applied, only F Gaussian variables are needed for each time step of each subpath, so it could be thought that one random sample per cycle is not necessary. However, as a simulation reaches the final time steps, the number of remaining LIBORs to be computed, Nact, can be equal to or smaller than F. In this case the LMM engine would have to be stalled until the required random numbers from the RNG core are computed.

Third, the high accuracy demanded in financial simulations requires high-quality random samples, and floating-point arithmetic is the most suitable to provide this quality while developing the LMM core. The RNG has to provide floating-point samples, as the LMM core is implemented using this arithmetic due to the accuracy demanded and the value range handled. With respect to the last criterion considered, a high clock rate or the possibility of pipelining to achieve it, it will not be essential in the case of the LMM, as the LMM core will be designed to compute one LIBOR per cycle. Due to the way the drift is computed, an accumulator is needed and it will limit the working clock frequency of the whole system, see Section 5.4.3.2.

5.4.2.1. Random sequences reproducibility

When developing a RNG for a financial application, or any other application, a practical feature has to be considered: the random samples must be generated on demand, and the core should not be working when random samples are not needed. There are two reasons for this, the reproducibility of results and low power consumption. The outcome of any application should be reproducible to ensure that the obtained results are correct and, in particular, as we are dealing with a hardware accelerator, to ensure that the same results are obtained in hardware and software and to ease the debugging of the hardware. Thereby, random number generation must be fully synchronized between hardware and software, using exactly the same base random sequences (the uniform sequence for the Gaussian generation and the uniform sequence for the random index of the Latin Hypercube permutation). This implies providing the logic necessary to stall the generator. The alternative of letting the base random generator run all the time, discarding the random numbers generated while they are not needed, would make the hardware and software results differ.

When using factorization and F is smaller than Nact, fewer random samples are needed than LIBORs to compute, so a free-running hardware base random number generator would continue working for Nact − F cycles, generating Nact − F samples which are not used and breaking the synchronization with software, hence requiring stalls. With respect to low-power issues, stalling the RNG while random numbers are not needed avoids the dynamic power consumption associated with the stalled operations, decreasing the total power consumed by the system.

5.4.3. LMM Core

The LMM core is basically a datapath in charge of calculating equations (5.3), (5.10) and (5.11). It is based on floating-point operators and relies on storage elements (memories and registers) for the model inputs, the calibration data and the feedback data, while its control signals are provided by an external control unit.

Figure 5.5: LMM Core unit.

Figure 5.5 shows the general architecture of the core with its main components, the calculation units and the main storage elements. Five storage elements are required for the initial and calibration data, while another storage element is needed to store the calculated LIBORs, as they will be used in the next time step.

5.4.3.1. Correlation: Random Variables and Drift

The correlation of the Gaussian variables, equation (5.10), corresponds to the basic linear algebra subroutine of vector multiplication. The same subroutine can be found in the drift equation (5.11) when the second summation is reduced to a factor depending on the index of the first summation.

In both equations, for each LIBOR to compute, F factors, from now on Zh, are correlated according to the corresponding row of the reduced correlation matrix [A]:

\sum_{h=0}^{F-1} a_{i,h}\, Z_h(t_k) \qquad (5.12)

with each calculated variable requiring F multiplications and F − 1 additions. Two possible architectures can handle this equation while obtaining one variable per clock cycle: a parallel architecture and a sequential one.

The first option, the parallel vector multiplication (Figure 5.6), implies the minimum logic depth of the architecture, 1 + log2(F) levels of operators. The F multiplications are carried out in parallel and a tree of adders reduces the multipliers' results in pairs. In this case, the F Zh factors are required in the same cycle.

The second option, the serial vector multiplication (Figure 5.7), implies a first level composed of two multiplications, while the rest of the levels imply one addition with one multiplication, and a final level

Figure 5.6: Parallel Correlation.

Figure 5.7: Sequential Correlation.

with one addition, requiring F levels of operators. Now the F Zh factors are not required in the same cycle; only the first two are required together. We have chosen the first option for the LMM engine due to several advantages:

• The lower logic depth, which implies fewer pipeline stages in the global datapath. The correlation of the drift is in the feedback datapath, so its architecture has a direct impact on the value of Fbstg, which is responsible for introducing stalls.

• The F Zh factors of the drift are calculated in parallel, see next section. Meanwhile, the interface between the GRNG core and the LMM core control is in charge of providing the F Gaussian variables in the same cycle, simplifying the design of the LMM core.

• Control and logic simplification. This option allows homogeneous access to the rows of the correlation matrix, reading a complete row with the same control signal. Meanwhile, the synchronization required between signals in the adder tree is minimal.

• The sequential implementation requires extra logic and a more complicated control to correctly synchronize all the signals (for example, registering the F Zh factors of the drift).

In contrast with these advantages, only one possible drawback of choosing the parallel option with respect to the sequential one can be found: it cannot be ensured that the results obtained are exactly the same as in software. Due to the non-associativity of floating-point operations, the results obtained with parallel operations may differ from the ones obtained with an equivalent sequential implementation. However, due to the use of a tailored precision instead of the standard precision used in software, the results may differ from software even if the sequential option is used, so there is no advantage for the hardware sequential implementation.
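A quick numerical illustration of this non-associativity (my own sketch in single precision with NumPy, not part of the thesis): a pairwise adder-tree reduction and a sequential left-to-right sum of the same products can already differ in the last bits.

```python
import numpy as np

rng = np.random.default_rng(0)
F = 20
a = rng.standard_normal(F).astype(np.float32)   # one row of the reduced correlation matrix
z = rng.standard_normal(F).astype(np.float32)   # the F factors Z_h(t_k)
prod = a * z                                    # the F parallel multiplications

def tree_sum(x):
    """Pairwise reduction, mimicking the adder tree of the parallel correlation."""
    x = [np.float32(v) for v in x]
    while len(x) > 1:
        x = [np.float32(x[i] + x[i + 1]) if i + 1 < len(x) else x[i]
             for i in range(0, len(x), 2)]
    return x[0]

seq = np.float32(0)
for p in prod:                                  # sequential (serial) correlation
    seq = np.float32(seq + p)

print(tree_sum(prod), seq)                      # often equal, but may differ by a few ulps
```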

5.4.3.1. Adaptation to smaller F

The architecture depends on the maximum value of F, as it determines the number of elements of the correlation and consequently the depth of the adder tree. The developed architecture is valid for the maximum value of F, but it is oversized for a smaller F that would not require all the operators. To adapt it to a smaller F, two options arise:

• The use of multiplexers and bypasses to have all the sub-architectures for smaller F in the same architecture. An output multiplexer finally selects the correct result from the different paths of the sub-architectures.

• Forcing the operators that are not used to output zero. For a smaller F, all the operators that are not required lead to additions with zero, so the result for the smaller F is unchanged by the computations of the unneeded operators.

We have opted for the second option as it has a considerably lower impact on the needed resources, thanks to the use of operators with hardware flags: changing the flags of the unused ai,h coefficients to the flags corresponding to a zero number is enough to force zero outputs in all the operators that are not required, as illustrated in the sketch below.
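A behavioural sketch of this zero-forcing idea (illustrative only; the real core forces zeros through the hardware flags of the operators rather than by padding data in software):

```python
import numpy as np

def tree_sum(x):
    """Pairwise adder-tree reduction, as in the parallel correlation."""
    x = [np.float32(v) for v in x]
    while len(x) > 1:
        x = [np.float32(x[i] + x[i + 1]) if i + 1 < len(x) else x[i]
             for i in range(0, len(x), 2)]
    return x[0]

F_MAX, F = 20, 7                           # tree sized for F_MAX, simulation uses F
prod = np.random.default_rng(1).standard_normal(F).astype(np.float32)

padded = np.zeros(F_MAX, dtype=np.float32)
padded[:F] = prod                          # unused operators see zero inputs

# Floating-point additions with +0.0 are exact, so the oversized tree
# reproduces the F-term result unchanged.
assert tree_sum(padded) == tree_sum(prod)
```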

5.4.3.2. Drift Calculation

The drift calculation is the key element of the LMM core. As mentioned in different sections, its complexity and how it is handled have a deep impact on the performance of the whole system (the restriction of completing the accumulator operation in just one cycle) and on the control of the engine (the scheduling of indexes to skip the second summations). Additionally, the drift calculation largely determines the depth of the LMM engine's datapath.

The drift equation has three terms, depending on the index of the corresponding LIBOR with respect to the numeraire index Q. It is zero for the LIBOR previous to the one corresponding to the numeraire, while the other two terms differ in the indexes of the second summation and in the sign.

\mu_i(t_k) =
\begin{cases}
-\sigma_i(t_k) \sum_{h=0}^{F-1} a_{ih} \sum_{j=i+1}^{Q-1} \dfrac{\tau_j L_j(t_k)}{1+\tau_j L_j(t_k)}\, \sigma_j(t_k)\, a_{jh}, & 0 \le i < Q-1 \\[6pt]
0, & i = Q-1 \\[6pt]
\sigma_i(t_k) \sum_{h=0}^{F-1} a_{ih} \sum_{j=Q}^{i} \dfrac{\tau_j L_j(t_k)}{1+\tau_j L_j(t_k)}\, \sigma_j(t_k)\, a_{jh}, & Q-1 < i \le N-1
\end{cases}
\qquad (5.13)

As just seen in the correlation section, once these second summations, from now on Di,h(tk), are obtained, the drift equation reduces to a vector multiplication and a multiplication by the volatility σi(tk). For the sake of simplicity, the following equations will not take into account this σi(tk) multiplication nor the different signs of the drift terms. In this way we define:

\bar{\mu}_i(t_k) = \pm \frac{\mu_i(t_k)}{\sigma_i(t_k)} \qquad (5.14)

so we can rewrite (5.13) as:

\bar{\mu}_i(t_k) = \sum_{h=0}^{F-1} a_{i,h}\, D_{i,h}(t_k) \qquad (5.15)

Apart from the operators, the complexity of the drift calculation comes from those second summations (F for each LIBOR):

 ∑  Q−1 τj Lj (tk)  σj(tk)ajh, 0 ≤ i < Q − 1, 0 ≤ h < F  j=i+1 1+τj Lj (tk) Di,h(tk) = 0, i = Q − 1 (5.16)  ∑  i τj Lj (tk) σj(tk)ajh,Q − 1 < i ≤ N − 1, 0 ≤ h < F j=Q 1+τj Lj (tk)

The complexity of these equations is due to two facts:

1. The number of components of the summations is variable and related to the value of i.

2. The dependency of the summations on index h.

With respect to the different number of components of the summations, a direct implementation would imply an architecture (or a control) capable of handling a variable number of operators. However, as introduced in previous sections, these summations can be replaced by accumulators if an appropriate scheduling of the LIBORs in the simulation is carried out. For the two non-zero terms of the drift equation, the number of components of the summations is incremental and starts with just one component (for i = Q − 2 and i = Q), so no summations are needed in these cases. For example, for i = Q we have:

D_{Q,h}(t_k) = \frac{\tau_Q L_Q(t_k)}{1 + \tau_Q L_Q(t_k)}\, \sigma_Q(t_k)\, a_{Qh}, \quad 0 \le h < F \qquad (5.17)

If after i = Q we continue simulating with i = Q + 1, we will have summations with two components, where the first one corresponds to the summation for i = Q.

D_{Q+1,h}(t_k) = D_{Q,h}(t_k) + \frac{\tau_{Q+1} L_{Q+1}(t_k)}{1 + \tau_{Q+1} L_{Q+1}(t_k)}\, \sigma_{Q+1}(t_k)\, a_{Q+1,h}, \quad 0 \le h < F \qquad (5.18)

The same happens with the following drifts:

D_{Q+2,h}(t_k) = D_{Q+1,h}(t_k) + \frac{\tau_{Q+2} L_{Q+2}(t_k)}{1 + \tau_{Q+2} L_{Q+2}(t_k)}\, \sigma_{Q+2}(t_k)\, a_{Q+2,h}, \quad 0 \le h < F \qquad (5.19)

as long as we continue simulating, incrementing the index by one. For the other term of the drift the same circumstance happens, but now starting at i = Q − 2 and decreasing the index i by one. These equations correspond to accumulators, as we are adding a new result to the previous one, with the summations computed in different steps.

With respect to the other fact, the dependency of the summations on the index h, for each LIBOR we have F components to compute. However, the index h only affects one element of Di,h(tk), the ajh. As this element multiplies all the other elements of Di,h(tk), we can rewrite Di,h(tk) in the following way:

D_{i,h}(t_k) = \sum d_i(t_k)\, a_{jh} \qquad (5.20)

d_i(t_k) = \frac{\tau_j L_j(t_k)}{1 + \tau_j L_j(t_k)}\, \sigma_j(t_k), \quad j = i+1 \text{ or } j = i \qquad (5.21)

Considering the transformation of the summation Di,h(tk) and this common factor di(tk), the main components of the drift equation (the two summations) can be expressed in a general way as:

\bar{\mu}_i(t_k) = \sum_{h=0}^{F-1} a_{ih} \left( d_i(t_k)\, a_{jh} + D_{i \pm 1,h}(t_k) \right), \quad j = i+1 \text{ or } j = i \qquad (5.22)

Summarizing, three elements can be found in this equation. First, the component common to all h, di(tk), and its multiplication by the F correlation factors ajh. Second, the accumulations substituting the second summation, i.e., the additions with Di±1,h(tk). And third, the correlation of the drift components.

Figure 5.8: Drift Calculation.

Figure 5.8 depicts the architecture designed following this equation. The computation of the drift starts with the calculation of the common factor. It is then multiplied in parallel by all the ajh elements to compute in parallel the F components for the correlation. Afterwards, the accumulators replace the summations, and finally the correlation is carried out as defined in the previous section. In addition to the operators in the figure, some extra logic is necessary to handle the reset of the accumulators when we have to swap between one term of the drift equation and another, and to set the correct sign of the result.
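The accumulator-based evaluation of equations (5.17)-(5.19) can be modelled behaviourally as follows (an illustrative Python sketch of my own; the sign handling and the final σi(tk) multiplication are omitted, as in the equations above, and only the third drift term is shown):

```python
import numpy as np

def drift_components_third_term(L, tau, sigma, A, Q):
    """Incremental computation of D_{i,h}(t_k) for i >= Q, mimicking the hardware
    accumulators: each step adds one new product to the previous value instead of
    re-evaluating the whole summation.
    L, tau, sigma : length-N arrays with L_j(t_k), tau_j and sigma_j(t_k)
    A             : reduced correlation matrix of shape (N, F)
    Q             : numeraire index
    """
    N, F = A.shape
    D = np.zeros(F)                       # one accumulator per h
    for i in range(Q, N):                 # schedule: i = Q, Q+1, ..., N-1
        j = i                             # the third term uses j = i
        d = tau[j] * L[j] / (1.0 + tau[j] * L[j]) * sigma[j]   # common factor d_i(t_k)
        D = D + d * A[j, :]               # accumulate the new term for all F values of h
        yield i, D.copy()                 # D_{i,h}(t_k) for this LIBOR
```

Each yielded vector is then correlated with row i of [A] (the adder tree of the previous section) and multiplied by σi(tk) to recover µi(tk).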

5.4.3.3. LIBORs Calculation

The final computation of the LIBORs, equation (5.3), does not present any difficulty and its implementation corresponds to a direct translation of the equation, parallelizing the operations when possible. Only one modification is introduced.

In the drift equation, the last operation is a multiplication by σi(tk). In equation (5.3) it can be seen that the random component is also multiplied by σi(tk), making it a common factor. Hence, equation (5.3) can be rewritten as:

L_i(t_{k+1}) = L_i(t_k) + \sigma_i(t_k) \left( \bar{\mu}_i(t_k)\, \Delta t_k + N_i(t_k) \sqrt{\Delta t_k} \right) \qquad (5.23)

The proposed architecture is depicted in Figure 5.9. In a first stage, the drift and the random component are multiplied by their respective time factors. Second, the results of the multiplications are added so they can be multiplied by their common factor, σi(tk). Finally, the LIBOR obtained in the previous time step is added, obtaining the LIBOR of the next time step.
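For reference, a direct behavioural translation of equation (5.23) (the numbers in the example are made up):

```python
import math

def libor_step(L_i, mu_bar_i, sigma_i, N_i, dt):
    """One LIBOR update, equation (5.23): the volatility multiplies the sum of the
    drift and random contributions, and the previous LIBOR is added last."""
    return L_i + sigma_i * (mu_bar_i * dt + N_i * math.sqrt(dt))

print(libor_step(0.03, 0.0005, 0.15, 0.7, 0.25))   # e.g. a 3% rate, one Gaussian draw
```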

Figure 5.9: LIBOR Calculation.

Figure 5.10: LMM datapath.

5.4.3.4. Complete datapath

The complete LMM engine datapath is depicted in Figure 5.10. The calculations start with the common part of all the summations of the drift. After the accumulators of the drift, the correlation of the Gaussian variables and of the drift elements are carried out in parallel, obtaining the random component of one LIBOR and its drift in the same cycle. Finally, the LIBOR for the next time step is computed. Thanks to the accumulators, and provided that they obtain their addition result in just one cycle, the only remaining dependency is the feedback of the LIBORs for the next time step. Hence, while no stall is produced, the datapath is capable of working at full capacity, obtaining one computed LIBOR per cycle, as all the operators work in parallel.

Figure 5.11: Product Valuation Core.

The datapath, for the maximum value of F (20), is composed of 65 multipliers, 61 adders and one divider, i.e., 127 operations in parallel.

5.4.4. Product Valuation Core

The product valuation implementation for the selected example, a cap, can be seen in Figure 5.11. On the bottom left of the figure we find the numeraire valuation. A data dependency analogous to that of the drift can be found, but now with multiplications instead of additions.

N(T_i) = \prod_{j=i}^{Q-1} \frac{1}{1 + \tau_j L_j(T_i)} \qquad (5.24)

The impact of this dependency is the same: the multiplication operator cannot be internally pipelined and must complete in just one clock cycle. On the top left, we have the comparison between the LIBOR and the strike of the product, which is afterwards divided by the numeraire. These computations are only carried out for the LIBOR whose fixing date corresponds to Ti.

P_i = \frac{\max(L_i(T_i) - K,\, 0)}{N(T_i)}, \quad i = 1, \ldots, R \qquad (5.25)

Finally, the payoff obtained is accumulated with all the previous ones to obtain the net present value.

NPV = \sum_{i=1}^{R} \left( \tau_i P_i \right)^{\alpha} \qquad (5.26)

where the value of α is 1 or very close to one.
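The cap valuation of equations (5.24)-(5.26) can be sketched in software as follows (illustrative only; the matrix layout of the simulated LIBORs and the flat α = 1 default are my assumptions, not the thesis'):

```python
import numpy as np

def cap_npv(L, tau, K, fixing_idx, Q, alpha=1.0):
    """Cap payoff for one simulated path.
    L          : matrix with L[i, j] = L_j(T_i), i.e. LIBOR j observed at fixing date T_i
    tau        : accrual factors tau_j
    K          : strike of the cap
    fixing_idx : indexes i = 1..R of the caplets to value
    Q          : numeraire index
    """
    npv = 0.0
    for i in fixing_idx:
        # Numeraire N(T_i): product over j = i .. Q-1, equation (5.24)
        numeraire = np.prod(1.0 / (1.0 + tau[i:Q] * L[i, i:Q]))
        payoff = max(L[i, i] - K, 0.0) / numeraire       # equation (5.25)
        npv += (tau[i] * payoff) ** alpha                # equation (5.26)
    return npv
```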

5.4.5. Control Unit

The control core is in charge of generating all the control signals and the indexes to read from or write to the different storage elements of both the RNG core and the LMM core. Additionally, it controls the mechanisms to adapt the architecture to the different values of F, and it is in charge of synchronizing all the elements. One key element of the synchronization is the stalling of the RNG core and the LMM core. These stalls are not only due to the architecture itself and the data feedbacks previously discussed: another source of stalls in an LMM simulation is the existence of different scenarios.

5.4.5.1. Simulation Scenarios and Stalls

The control must handle two different scenarios. At the beginning, t0, all the LIBORs remain active and F will always be smaller than Nact. But as the simulation advances in time, Nact starts decreasing and eventually can become smaller than F. Hence, two scenarios are defined:

• When F < Nact.

• When F > Nact.

F = Nact is the borderline between the two scenarios and implies no special treatment. Each scenario will require its own stalls, affecting different units. The origin of these stalls is the difference between F and Nact, as for the simulation of each time step F ∗ LH random variables are needed while Nact ∗ LH data are computed. Not all the stalls have the same importance, as they affect different units. On the one hand, in the F < Nact scenario, the RNG has to be stalled to synchronize the generation of the random samples in the RNG core with their use in the LMM unit. These stalls affect just the RNG, not harming the global throughput, and their importance is related to result reproducibility and hardware-software synchronization, as previously introduced.

On the other hand, if due to the combination of parameters the simulation enters the F > Nact scenario, the LMM core needs to be stalled (F − Nact) ∗ LH cycles to allow the RNG to generate all the required samples. In this way, the throughput of the system is harmed, as in this scenario we cannot calculate one LIBOR per clock cycle. This type of stall, together with the architectural ones, are the most important stalls and limit the objective of calculating one result per cycle.

5.4.5.2. Control design

The control has been designed following three main guidelines:

• It is oriented to obtain the highest possible performance, avoiding any stalls that are not strictly necessary, i.e., anything beyond the architectural stalls and the ones due to the simulation scenarios.

• To simplify the control, all the control signals are generated referring to the same stage. Then, to synchronize the signals with the pipelined architecture, buffers of registers are used to delay the control signals by the necessary number of stages.

• Independent and synchronized sub-controls, one for the RNG core and another for the LMM plus the product valuation. In this way, the design of each part is simplified, while the debugging and testing of the hardware is made easier. Only one dependency cannot be avoided, the initial synchronization, as the LMM core cannot start computing until the RNG core is ready and has calculated the first Latin Hypercube.

With this scheme the control unit is composed of four elements: the RNG and LMM sub-control units, a bank of registers to store the data which parameterize the simulation of the model (N, Q, S, F, LH and P), and a memory element that stores the information about the simulation times as flags ('1' for simulation times corresponding to fixing or maturity dates, '0' for the others).

5.4.5.2. RNG control unit

Its main task is to handle the synchronization with the LMM control unit and to control and synchronize three elements of the RNG:

• The Mersenne Twister URNG.

• The Variance reduction tables and their associated Tausworthe URNG.

• The interface element between the RNG core and the LMM core.

With respect to the first two elements, the RNG control unit integrates the control required by the Latin Hypercube with the stall control required for both elements. The control for both elements also handles the start synchronization. As exposed in Section 2.5.1.2, the Mersenne Twister URNG requires an initialization of 624 cycles. Meanwhile, the Latin Hypercube needs F ∗ LH cycles to generate the strata of the first time step and another cycle before the strata are available from the read tables. With respect to the interface element, it is a bank of registers that delays the generated Gaussian samples until the last one of the group of F is generated and the F Gaussian samples can be fed together to the LMM core. The synchronization with the LMM control unit is carried out through the start signal, with an initial latency that delays the start of the LMM core until the RNG core is prepared: it has generated the Latin Hypercube for the random samples of the first time step and the random numbers for the first subpath of the first time step.

5.4.5.2. LMM control unit

The LMM control unit has two main tasks, the synchronization of the data in the LMM datapath and the control of all the logic associated with the drift terms, both tasks being related. As seen in Section 5.4.3.2, a specific scheduling of the order of the LIBORs is necessary to make it possible to compute one LIBOR per cycle. For LIBORs corresponding to the third term (LIBORs with index i > Q − 1) it is necessary to start with i = Q and then increase i by one until reaching N. For LIBORs of the first term (LIBORs with index i < Q − 1) the starting index has to be i = Q − 2, decreasing i by one until reaching the first active LIBOR. Following these requirements, the control unit has been designed to simulate each time step starting with the LIBORs of the third term, continuing with the LIBOR corresponding to index i = Q − 1, and ending with the LIBORs of the first term. The reason for this order is that it simplifies the control and the architecture: simulating i = Q − 1 before i = Q − 2 resolves in a natural way the problem of the different indexes between the third term (j = i, the LIBOR used in the drift is the same as the one being calculated) and the first term (j = i + 1, the LIBOR used in the drift is the one previous to the one being calculated), requiring only a delay of the input LIBORs for the drift in the first term. Following this order, the indexes for the storage elements are generated, together with the control signals needed to control the drift (a sketch of the resulting index schedule is shown after the list below):

• Reset the accumulators.
• Working with the first, second or third term of the drift equation.
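A sketch of the index schedule implied by this ordering (illustrative Python; the real control unit generates these indexes with counters and comparators):

```python
def libor_schedule(Q, N, first_active=0):
    """Order in which LIBORs are simulated within one time step: third term
    (i = Q ... N-1, increasing), then i = Q-1, then first term
    (i = Q-2 ... first_active, decreasing)."""
    order = list(range(Q, N))                           # third term
    order.append(Q - 1)                                 # zero-drift LIBOR
    order += list(range(Q - 2, first_active - 1, -1))   # first term
    return order

print(libor_schedule(Q=5, N=10))   # -> [5, 6, 7, 8, 9, 4, 3, 2, 1, 0]
```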

5.5. LMM Engine Implementation

5.5.1. Operators’ Features

To implement the architecture of the LMM core, Section 5.4.3, we have taken as a base the single-precision operators of the final hardware library, Section 3.6.3, which are characterized by the following features:

• Handling denormalized numbers as zeros.

• Rounding is always done to nearest.

• Hardware flags are used to indicate the type of number.

Meanwhile, from the two final libraries we have selected the one without the one-bit exponent extension, as the variables of the model take normalized values.

The operators from these libraries have to be modified in three ways:

• Mantissa bit-width extension: the precision of the mantissa of the base operators must be extended to achieve the high accuracy demanded by any financial simulation, see Section 5.3.2.

• Modification of the pipeline stages: the datapath has the architectural restriction of the one-cycle drift accumulators. Consequently, we are forced to use single-stage adders, which determine the maximum working frequency. This fact, in combination with the data feedback that makes it desirable to have as few pipeline stages as possible while the clock frequency is not harmed, implies the modification of the pipeline stages of all the other operators.

• Stall option: stall logic must be introduced in the operators due to the stalls required by the model and those required for communications.

5.5.2. LMM Core. Precision-Accuracy-Performance

The first task is to define an accuracy criterion to determine by how many bits the precision of the operators should be extended. The criterion we have followed is based on two facts. The first one is that software implementations of scientific and financial simulations rely on double-precision arithmetic to ensure the accuracy of the results but, as exposed in [LLK+06], for most applications the required accuracy can be achieved with a tailored precision much closer to single than to double precision. The second fact is the magnitude of the errors related to single precision. If we focus on the relative error, we can see that, in the most accurate scenario, the maximum relative error associated with a single-precision number is 0.5 ulp (the maximum relative error for an operator implementing round to nearest), a value that can be considered negligible:

\varepsilon = \frac{\Delta Z}{Z} = \frac{\hat{Z} - Z}{Z} = \frac{\left(2^0 + 0.5 \cdot 2^{-23}\right) - 2^0}{2^0} = 2^{-24} = 5.6\mathrm{e}{-6}\,\% \qquad (5.27)

affecting only the eighth digit of a decimal number. Based on these two facts, we have set the following accuracy criterion: the maximum difference between a LIBOR calculated in the FPGA core using tailored arithmetic and the same LIBOR computed in software with double-precision arithmetic must not be bigger than one single-precision ulp when both LIBORs are converted to single precision using round to nearest. Since both numbers are rounded to single precision from their previous precision, this implies in the worst case a maximum difference of two ulps between the SW and FPGA results, meaning a maximum relative error of 2.3e−5%. Once this criterion is set, the task of finding the precisions that fulfill it has to be carried out experimentally, replicating the same simulation in hardware and software and comparing the results from both.

Figure 5.12: Architecture for LMM accuracy measurement.

This implies the use of exactly the same Gaussian samples, as they are an input to the LMM. However, the Gaussian samples generated in hardware also differ from the ones generated in software: firstly, because of the same circumstance of comparing the double-precision samples (software) with the single-precision samples of the GRNG; and secondly, because the GRNG is based on the interpolation of a function with its own accuracy criteria. To solve this problem, the GRNG is replaced by a ROM memory containing all the Gaussian samples of one subpath, as seen in Figure 5.12, in the same way as all the other calibration data required for a simulation. Finally, the precision determined must provide enough accuracy for any set of parameters. Hence, the criterion must be fulfilled in the worst scenario. Due to error propagation, the more operations are needed to obtain a LIBOR, the worse the accuracy obtained. We can thus consider that the maximum error will be reached in the last time step of a path of a simulation whose parameters affecting the number of operations per LIBOR are close or equal to their maximum:

• N = 360

• F = 20

• S = 1303

and therefore we will determine the precision needed using a simulation with those parameters.
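The accuracy check itself can be expressed as a small sketch (my own illustration; ulp distances are measured on the float32 bit patterns, which is valid for positive finite values such as the LIBORs):

```python
import numpy as np

def single_ulp_distance(sw_double, hw_tailored):
    """Distance in single-precision ulps between the double-precision software LIBOR
    and the tailored-precision hardware LIBOR, both rounded to float32 first."""
    a = np.float32(sw_double).view(np.int32)
    b = np.float32(hw_tailored).view(np.int32)
    return abs(int(a) - int(b))

def criterion_fulfilled(sw_libors, hw_libors, max_ulps=1):
    return all(single_ulp_distance(s, h) <= max_ulps
               for s, h in zip(sw_libors, hw_libors))

one_ulp_up = float(np.nextafter(np.float32(1.0), np.float32(2.0)))
print(single_ulp_distance(1.0, one_ulp_up))   # -> 1, exactly at the criterion boundary
```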

5.5.2.1. Precision-Accuracy

To set the appropriate precision we have started from single precision and progressively increased the mantissa bit-width.

Figure 5.13: SW-HW difference in average

5.5.2.1. Small Simulation

Prior to testing the simulations with the maximum parameter values, a small test has been carried out to get a first idea of the impact of the precision on the accuracy of the results of the LMM model. The parameters of this test have been:

• S = 35 → 34 time steps
• N = 10
• F = 7
• Q = 8

Meanwhile, the number of Latin Hypercubes is irrelevant, as subpaths do not share operations. Figures 5.13 and 5.14 graphically depict the results obtained simulating one path with the LMM core and different mantissa precisions. The tag 24b corresponds to single floating-point precision (23 mantissa bits plus the hidden bit, 24 bits), while the others are extensions of single precision with up to four more mantissa bits (28b). As can be seen, the accuracy is strongly compromised with single precision as the time steps evolve, reaching up to eight ulps of difference. The errors are introduced due to the smaller resolution of the operators with respect to double-precision software. In the following time steps these errors propagate, while each operation may introduce new errors for the same reason. In this way, the general trend, see Figure 5.13, is that with each new time step the divergence between the hardware LIBORs and the software ones increases, although in some time steps the divergence is reduced as the newly introduced errors neutralize the previous ones.

Figure 5.14: Maximum SW-HW difference

As the precision is extended the divergence decreases. On the one hand, a larger precision implies a smaller maximum error introduced per operation. On the other hand, the inputs of the model are fed with more precision bits and hence with a lower error.

5.5.2.1. Maximum Simulations

To explore which precision fulfills the defined accuracy criterion, we have started with a simulation close to the maximum parameters:

• S = 1254 → 1253 time steps

• N=363

• F =20

• Q=243

Now we have used as the base precision the previous 28b, single precision with four extra mantissa bits. Figure 5.15 depicts the results obtained in the last time step, where the biggest errors are expected, for the first LIBORs remaining active. As can be seen, the precision that was previously enough for a small simulation is now far from obtaining accurate results, reaching up to 14 ulps of difference between the software LIBORs and the 28b hardware ones. However, if the precision is extended by another four mantissa bits the defined criterion is fulfilled again, as can be seen in the figure for the plotted LIBORs.

Figure 5.15: SW-HW difference in the last time step

Table 5.2: Implementation Results for the Modified Operators.

          |         +/-          |          *           |          /           |         x^y
          | V4 32b V5 32b V5 40b | V4 32b V5 32b V5 40b | V4 32b V5 32b V5 40b | V4 32b V5 32b V5 40b
Slices    |  412    256    373   |   94     39    168   |  487    261    410   | 1098    660    749
DSP       |   -      -      -    |    4      2      2   |   -      -      -    |   11      6      8
BRAM      |   -      -      -    |    -      -      -   |   -      -      -    |   13     11     21
Stages    |   1      1      1    |    1      1      1   |   5      5      6    |    7      6      8
MHz       | 59.4   84.5   59.0   | 71.7  101.4   78.5   | 60.7   89.2   66.3   | 59.6   61.9   62.7

5.5.2.2. Precision-Resources-Performance

The increase in operator precision has a direct impact on the resources of the operators and on their performance. As the precision is increased, the needed resources also increase: the operators have to handle bigger operands, all the logic related to the mantissas has to work with more bits, and the storage elements and registers work with wider data. Additionally, operators relying on digit-recurrence methods require more steps to compute the extra precision bits, further increasing their resource usage. Regarding the performance of the operators, the tasks related to the mantissas imply longer and slower logic paths as the mantissa size increases, harming the performance. With respect to digit-recurrence methods, new calculation steps imply new pipeline stages to avoid performance penalties. Table 5.2 summarizes these circumstances with the post place and route implementation results of the operators once they are modified according to the features needed for the LMM core, Section 5.5.1. Two different FPGAs have been used, the base Xilinx Virtex-4 XC4VFX140-11 and a Xilinx Virtex-5 XC5VFX200T-2. This second FPGA has been necessary as the huge number of operators of the LMM core exceeds the resources provided by the base Virtex-4, forcing the upgrade to a bigger FPGA. Single-precision operators are denoted as 32b and the operators with the 8-bit extended mantissa are denoted as 40b.

Table 5.3: Implementation Results of the LMM Engine for V5-FX200.

                   Slices          DSPs        BRAM        Stages   MHz
RNG                 4703 (15%)      13 (13%)     7 (1%)     22      85.2
LMM                22783 (74%)     130 (33%)   124 (27%)    21      50.1
LMM + Product      23732 (77%)     157 (40%)   133 (29%)    46      50.1
Engine             25291 (85%)     144 (37%)   134 (29%)    43      50.1
Engine + Product   27242 (88%)     171 (44%)   143 (31%)    58      50.1

In addition to the impact of the mantissa extension (compare the V5 32b operators with the V5 40b operators), it can be seen that the FPGA family upgrade has a major impact on resources and performance (compare the Virtex-4 (V4) 32b operators with the Virtex-5 (V5) 32b ones), due to technology improvements (from 90 nm (V4) to 65 nm (V5) base technology) and architectural ones (from 4-input LUTs, 18x18 multipliers and 18 KB BRAMs (V4) to 6-input LUTs, 25x18 multipliers and 36 KB BRAMs (V5)).

5.5.3. Cores Implementation

Table 5.3 summarizes the post place and route implementation results (with and without product valuation) for the RNG adapted to the architecture parameters, for the LMM core, and for the complete engine integrating both cores, their interface and their controls. Due to the huge number of floating-point operators involved, the LMM is extremely resource-demanding, reaching up to 74% (77%) of the slices. For the selected FPGA this implies that no Monte Carlo replications can be done in parallel, as another core cannot be introduced. With respect to the working frequency, as seen in Section 3.5.3, the performance is harmed as more resources are used, due to routing issues. In this way, the 59.0 MHz working frequency of the adder, which represents the slowest path in the system, is not achievable. In this case it falls to 50.1 MHz, which has been achieved only when timing constraints were used for the place and route tasks; otherwise the clock frequency falls even more. Finally, with respect to the product valuation, the large increase in pipeline stages it represents stands out, in contrast to the small increase in resources and number of operators. The use of two dividers and the exponentiation unit is the main reason for this.

5.6. Conclusions

Once all the basic elements needed for the design and implementation of an FPGA accelerator are available, many other issues and tasks must be addressed. In this chapter we have implemented a complete accelerator engine for the LMM, carrying out many of these tasks, notably:

• Algorithms analysis.

• Adaptation of the model and specific modification for FPGAs.

• Precision-accuracy analysis.

• Architecture design.

• Basic elements modifications.

• Integration within different cores.

• Control design.

Understanding all these tasks and analyzing how they impact the design and implementation of hardware acceleration is essential to achieve a reliable, high-performance accelerator. Two options have been considered and fully implemented: an accelerator comprising the Monte Carlo simulation of the LMM together with the random number generation, and the same accelerator also including the product valuation. To match the parameterization required by LMM Monte Carlo simulations, we have developed an adaptable architecture based on the maximum expected values of the parameters affecting the architecture. Finally, and due to the limitations introduced by data feedbacks, to fulfill the highest-performance goal of any accelerator we have oriented the architecture to implement as many operations in parallel as possible and to compute one variable of the system per clock cycle.

In this chapter we have demonstrated that complex applications can be implemented in an FPGA, although a careful design has to be carried out after a deep analysis of the implemented model and of its relation with the other cores that are needed, in this case a Gaussian RNG with Latin Hypercube. Finally, we have also seen that FPGAs can take advantage of tailored precisions: in a complex model with several data dependencies, and thereby error propagation, such as the LMM, a single-precision floating-point format extended with eight mantissa bits has been enough to fulfill the exacting criterion we have adopted.

While a full paper on the implementation of the LMM engine is under preparation, part of the work carried out in this chapter has been published in a DATE 2011 workshop [ELV11a]:

• Pedro Echeverría and Marisa López-Vallejo. FPGA acceleration of Monte Carlo-based financial simulation: Design challenges and lessons learnt. 2011.

6

Hardware-Software Integration

The development of a hardware accelerator is incomplete without its integration within the system and application where it is going to be embedded. Data computed in the hardware accelerator must be fed to the target application through the system where both are running. The integration comprises both software and hardware elements, and how it is done is a key part of the global performance of the accelerator. In this way, the real performance of any accelerator is not only given by how fast it can work in isolation, but also by how fast it can communicate the computed data. Likewise, the global performance of the application also depends on how fast the application can access the data computed by the accelerator. Another key issue in the application's global performance is how the application is divided into hardware and software, the system partitioning. One of the objectives of this chapter is to study the above-mentioned issues through the integration of the developed LMM engine (described in the previous chapter) within a software application. An analysis of how it can be done and which elements are required is presented. Then we detail the hardware-software integration carried out, which is based on a personal computer and the PCI-Express (PCI-E) bus. We will focus not only on the implementation itself but also on the integration features

(both software and hardware) that have an impact on the total acceleration. The second main objective is to complete the evaluation of the developed engine in a real scenario, to measure the real speedup achieved and to validate the complete accelerator. Additionally, in this chapter the hardware-software partitioning policy we have followed prior to the development of the accelerator is explained.

6.1. Hardware-Software Partitioning

To decide how hardware-software partitioning can be carried out, we have identified three key features that have to be analyzed:

• Tasks stability characteristics.

• Communication overheads.

• Achieve maximum possible acceleration.

6.1.1. Tasks Stability Characteristics

Developing a hardware accelerator requires a long design cycle. Thus, one key rule in hardware-software partitioning is to port to hardware those tasks that, in addition to presenting heavy computations, are very stable, in the sense that they will not require frequent code changes and modifications. Meanwhile, software is much more flexible, and any task subject to frequent changes should remain in software. In our particular case, this key rule is related to how the LMM is used within financial simulations. As introduced in Section 5.5.2, on the one side the LMM itself is a complete model whose output is a curve of interest rates. The LMM (including the random number generator) can be defined as an almost static task, as the model it implements is very stable and changes to it are infrequent. In this way, it presents ideal features for a reliable FPGA accelerator. On the other side, the LMM interest rate curves are mostly used to value different financial products by means of Monte Carlo simulations, the product valuation. In this case, there are plenty of products with very different features, as traders buy or sell products tailored to the necessities of their clients. This implies that, in addition to the complexity of implementing a set of many different products in hardware, the products are also frequently changed to introduce new features, or even new products are created. In this way, product valuation requires a flexibility which is more suitable for software.

6.1.2. Communication overheads

Splitting an application between two processing cores requires that the computed data be communicated between the cores. These data transfers imply a communication task which originally was not present in the application. Depending on how this communication task is carried out, the performance can be affected by the communication overhead. Furthermore, communication transfers between a host PC and a hardware accelerator imply the use of mechanisms that can affect the performance of both hardware and software. In this way, it is necessary to carefully select where we split the application, to minimize both the computational overhead of the transfers and the number of transfers required.

6.1.3. Achieve maximum possible acceleration

Amdahl's Law [Amd67] limits the total speedup achievable when parallel computations are carried out (as the use of an accelerator can be understood), according to the longest task which is not parallelized:

Speedup = \frac{1}{r_s + \dfrac{r_p}{n}} \qquad (6.1)

where rs is the ratio of the application which remains sequential, rp the ratio of the application that is parallelized (rs + rp = 1), and n the speedup obtained for the parallelized part. In this way, the speedup obtained by the use of any accelerator will be equal to or smaller than the maximum achievable value, which is determined by the part of the application remaining in software:

Speedup = \frac{1}{r_s} \qquad (6.2)

for n → ∞.

Following Amdahl's Law, it is desirable to make rs as small as possible. In our case, this would imply implementing the product valuation in the FPGA.
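A quick numerical illustration of equations (6.1) and (6.2) (the figures are my own example, not measurements):

```python
def amdahl_speedup(r_s, n):
    """Equation (6.1): overall speedup when a fraction r_p = 1 - r_s is accelerated n times."""
    return 1.0 / (r_s + (1.0 - r_s) / n)

# If the product valuation kept in software accounted for 10% of the runtime (r_s = 0.1),
# even an infinitely fast LMM engine could not exceed a 10x overall speedup.
print(amdahl_speedup(0.1, 50))   # ~8.5x with a 50x faster accelerated part
print(1.0 / 0.1)                 # 10x upper bound, equation (6.2)
```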

6.1.4. Partitioning Policy

In the case of the LMM we have a clear conflict between the task stability characteristics and the aim of obtaining the maximum possible acceleration from the partitioning, due to the non-stable nature of the product valuation. In our case, we have prioritized task stability, so the base partitioning leaves the product valuation in software. In the following sections, the hardware-software integration will be focused on this partitioning. However, the implementation results also comprise the case where the product valuation of the product implemented as an example in Chapter 5 is included in hardware.

Figure 6.1: Integration Architecture.

6.2. System Architecture and Communications

The adopted integration solution is based on the use of an FPGA on a board with a PCI-E connector as communication interface. The different elements and types of data transfers involved in the hardware-software integration phase are shown in Figure 6.1. In the middle of the figure, between the two processing elements, the CPU and the FPGA, the three different forms of data transfer we have used are shown:

• PIO: Programmed Input/Output.

• DMA: Direct Memory Access.

• IRQ: Interrupts.

The RAM memory of the system is shared between the CPU and the FPGA through the DMA communication subsystem, which is in charge of most of the data transfers. In the bottom part of Figure 6.1 we can find the prototyping board, a HiTech Global HTG-V5-PCIE-200 [Glo], with the selected FPGA (Xilinx Virtex-5 FX200T) and a PCI-E endpoint with 8 lanes. The FPGA has to implement not only the hardware accelerator, but also a communication core to

manage the PCI-Express endpoint, and an interface element between that core and the developed accelerator engine.

Figure 6.2: Communications Flow.

Meanwhile, in the top part of Figure 6.1 we can see the software application executed in the CPU of the host computer. In any application relying on hardware acceleration, the accelerated code has to be replaced by calls to the accelerator and by structures to send data to and receive data from the accelerator. These calls or routines form a low-level library that is in charge of communicating the application with the computer program that handles the bus, the driver. This communication flow between the software application and the hardware engine is depicted in Figure 6.2.

6.2.1. Why PCI-Express?

The choice of PCI-E [PS] as the communication bus is due to several reasons:

• High speed and flexible high bandwidth. PCI-E is a full-duplex, high-speed serial connection that is adaptable to different requirements. The key to its flexibility and high speed are the lanes, as a PCI-E interconnection can be formed of 1, 2, 4, 8, 16 or 32 lanes, where each lane can transfer 2.5 Gbits per second in its slowest version, corresponding to a data transfer rate of 250 MB/s (there is an 8b/10b encoding for the physical transfer, meaning a 20% overhead).

• Bus Master DMA. The use of Direct Memory Access communications with the hardware as Bus Master opens many possibilities. On the one hand, it allows transparent transfer of data using RAM memory: data is stored directly in the RAM memory by the producer, where it is accessed by the consumer when it needs it. On the other hand, as the hardware can be Bus Master, it can initiate DMA transfers on its own. Both features reduce the computational load of the software, as the only direct communication needed for DMA transfers is the one that signals that the data are available and where they are.

• Evolution. Although the data transfer bit rate is not a hard constraint for our accelerator, other applications can demand extremely high bit rates. The first version of PCI-E is capable of a

data transfer rate of 250 MB/s per lane, which is doubled to 500 MB/s in version 2 and doubled again in version 3, the latest version. Hence, it offers a massive bit rate, ensuring that if other applications are implemented or our implementation is improved (or faster subcomponents are implemented as accelerators themselves), we can continue using PCI-E.

• "Almost everywhere, almost standard". PCI-E is becoming a de facto standard nowadays and can be found in almost every system, especially in the motherboards of PCs and servers. Its combination of high bandwidth and flexibility has made PCI-E a universal bus for hardware expansion slots, displacing other high-speed, high-bandwidth buses such as PCI-X or AGP.
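As a back-of-the-envelope check of the available bandwidth (my own arithmetic from the figures above, not a measurement):

```python
# PCI-E 1.x payload bandwidth for the 8-lane endpoint used here: 2.5 Gbit/s per lane
# with 8b/10b encoding (20% overhead) leaves 2 Gbit/s = 250 MB/s of payload per lane.
lanes = 8
line_rate_bps = 2.5e9
payload_per_lane_MBps = line_rate_bps * (8 / 10) / 8 / 1e6   # = 250 MB/s
print(lanes * payload_per_lane_MBps)   # -> 2000 MB/s theoretical payload for x8
```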

6.2.2. Communications Model

The communications infrastructure between the application and the accelerator must be designed to minimize its impact on the global performance. From the software perspective, communications must imply as low a computational overhead as possible, while for the hardware, communications must allow the accelerator to work at its maximum throughput; otherwise the accelerator has to be stopped, see Section 6.3.1. Following these ideas, several facts have to be analyzed to define the communication model:

• The data flow of the application and the impact of the data transfers on it.
• The different types of data or events to transfer and the quantity of data of each type.

With respect to the first point, Figure 6.3(a) shows a simplified dataflow representation of the software application we are accelerating. As explained in the previous chapter, before starting the Monte Carlo simulation a calibration of the model is needed to compute the variables which remain constant through all the Monte Carlo replications. Afterwards, the Monte Carlo simulation starts, composed of two main tasks, the LIBOR calculation and the product valuation using the generated LIBOR interest rate curves. These two tasks are carried out subpath by subpath once the random numbers for the group of subpaths have been generated. When the accelerator is introduced, the dataflow changes as shown in Figure 6.3(b). First, the simulation parameters and the data from the calibration have to be transferred to the FPGA, as they are required for the LIBOR simulation. Second, during the Monte Carlo simulation itself, the LIBORs calculated in the FPGA have to be transferred to the software for the product valuation. However, now we have two computing elements that should work in parallel to obtain the maximum performance from the system. In this way, the FPGA should work ahead of the software: while one group of LIBORs is being processed in software, the FPGA must be computing the next one. In the software dataflow this group of LIBORs is a complete subpath; however, as explained in Section 5.3, the hardware does not follow the same simulation order as software, since it simulates all related subpaths together.

(a) Top-view Software Dataflow.

(b) Top-view Software-Hardware Dataflow.

Figure 6.3: Dataflows.

Table 6.1: Types of Data Transfers.

                Massive                Individual              Events
SW to HW        Calibration Data       DMA control data        Control Signals
                Constant Data          Simulation Parameters
HW to SW        Calculated LIBORs                              Control Signals

Hence, the FPGA now transfers a grouped path with all the LH subpaths. Additionally, the communications model has to take into account that the LIBORs' transfer is not done at once but as P/LH transfers. With respect to the different types of data to transfer, several classifications can be made. First, we can distinguish based on the transfer direction, from the application to the accelerator and from the accelerator to the application. Second, based on the amount of data to transfer, we can distinguish between massive data transfers, suitable for DMA, and small or individual transfers such as the simulation parameters, suitable for PIO communications. Finally, based on the kind of information transferred, a further distinction has to be made for the data signaling events. All these data transfers are summarized in Table 6.1. In addition to the data transfers of the model itself, in the table we can also find two more types of data: the DMA control data and the control signals. For the transfers which involve massive data, DMA communication is the fastest option as it requires minimum software overhead: the data to transfer is copied to RAM, from where it is accessed by the element that requires that data. In the case of the CPU, that data can be used directly from the RAM, while in the case of the FPGA a copy of it into the FPGA internal memory elements is needed. To avoid overloading the software with most of the communications handling, the PCI-E of the hardware can be configured to be Bus Master with the necessary control data (RAM initial address and data size). Finally, control signals are needed for the synchronization between the software and the hardware. The main task of this synchronization is signaling when a DMA transfer is completed. In these cases, the hardware (the Bus Master) has to notify the software through an interrupt. Signaling these events with interrupts allows the software to continue working instead of being stalled waiting for the event.
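A minimal sketch of what the DMA control data of Table 6.1 could look like on the software side is shown below; the field names and widths are hypothetical, but the content matches the control data mentioned above (RAM initial address and data size):

#include <stdint.h>

/* Hypothetical layout of the DMA control data written to the hardware
 * through PIO so that it can act as Bus Master. */
struct dma_control {
    uint64_t ram_phys_addr;   /* physical start address of the RAM area */
    uint32_t length_dw;       /* transfer length in 32-bit Data Words   */
    uint32_t area_id;         /* which of the RAM areas is the target   */
};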

6.2.3. Communications Requirements

To make the parallel scheme possible, where the FPGA is calculating one grouped path while the CPU is processing the previous one, and to get the maximum performance, the CPU should not be stalled. In this way, when the CPU finishes processing one grouped path, the LIBORs of the next path should be ready in the RAM memory. Thereby, the FPGA must transfer to RAM the LIBORs of the path it is calculating while the CPU is accessing the LIBORs of the previous path. If the FPGA always transfers the calculated LIBORs to the same area of the RAM memory, this working scheme will lead to overwriting LIBORs not yet processed by the CPU. To solve this situation a dual-area memory scheme is needed, defining two different areas of the RAM memory for the LIBORs transfers while the FPGA and the CPU swap between them. Following this scheme, the FPGA will transfer one complete data set to one of the two RAM areas while the CPU is working with the previous path, which was transferred to the other area. When the CPU finishes with the area it is processing, it will swap to the other area where the FPGA has transferred the next path, freeing the area it was using and letting the FPGA transfer a new path to the freed area. This scheme requires synchronization signals to communicate when the FPGA has completed a path transfer (interrupt) and a notification from the software when the CPU frees one of the RAM areas for the FPGA (PIO access). To ensure the maximum performance, another requirement must be fulfilled: the communications bus and the selected scheme must be capable of transferring the calculated data as fast as they are generated. If the transfer bit rate is slower than the throughput of the accelerator, the accelerator performance will be harmed and it has to be stopped, see Section 6.3.1.
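The following C sketch illustrates the CPU side of the dual-area scheme just described. The helper functions are hypothetical stand-ins for the low level functions presented in Section 6.4.1 (waiting on the completion interrupt and freeing an area through PIO):

#include <stddef.h>

/* Hypothetical helpers: wait_path_ready() blocks until the FPGA interrupt
 * signals that a grouped path is complete in the given area, and
 * release_area() performs the PIO write that frees an area for the FPGA. */
extern void wait_path_ready(int area);                 /* interrupt-driven */
extern void release_area(int area);                    /* PIO access       */
extern void value_products(const double *libors, size_t n);

void consume_grouped_paths(const double *area_buf[2], size_t n_libors,
                           int n_grouped_paths)
{
    int area = 0;
    for (int p = 0; p < n_grouped_paths; p++) {
        wait_path_ready(area);                 /* FPGA finished this area    */
        value_products(area_buf[area], n_libors);
        release_area(area);                    /* FPGA may overwrite it now  */
        area ^= 1;                             /* swap to the other RAM area */
    }
}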

6.3. PCI Express Core

The developed accelerator must be integrated within the FPGA with a PCI-E core capable of managing the data transfers between the accelerator core and the software. For the selected FPGA and board, the Xilinx Endpoint Block Plus for PCI Express [Xilc] is available. This core implements the logic needed to receive and transmit the Transaction Layer Packets (TLPs) that encapsulate the data to transfer together with the protocol information. Additionally, it provides an interface for the user logic (transaction interface) on one side, and the connection with the FPGA RocketIO transceivers connected to the PCI-Express pins on the other side. Additional hardware is needed to control the communications (Bus Master DMA, PIO and interrupts) and to process the TLPs, extracting the data from them when they are received and forming the TLPs from the data to be transmitted. Again, Xilinx provides a reference design [Xila] for this purpose. However, as a demonstrator it is far from being a complete core, although it is a good base to develop one. In this way, we have modified it to be used in a real application:

• Interface for a real application. Management of input and output data.

• Completion of the Bus Master control.

• Adaptation to the requirements of the target application.

• Interrupt manager adapted to the interrupts needed.

Figure 6.4: PCI-Express Core.

Figure 6.4 depicts a simplified scheme of the complete PCI-E core we have integrated. On the top part, the Xilinx Endpoint in charge of managing the TLPs is found. The interface of the endpoint with the rest of the core is composed of three main units: the Receiver (RX), the Transmitter (TX) and the Interruption Manager. The TLPs addressed to the hardware are processed by the Receiver to extract the data and determine the type of input communication. The Transmitter composes the TLPs with the data to transmit to the software and generates control TLPs with the Bus Master DMA instructions. The third main unit, the Interruption Manager, starts the interrupt requests and handles all the interrupt signals. The DMA and PIO control is in charge of all the information related to data transfers other than the data itself (addresses, state of the DMA, etc.) and coordinates all the control signals between the endpoint and the three interface units and within these units. As the hardware is Bus Master, this control also handles how the TX has to compose DMA TLPs (data requests for DMA reads and write requests with their associated data for writes). The last main component is the BAR (Base Address Register), a set of registers which can be accessed by the software through PIO instructions.

6.3.1. Within FPGA Communications

As seen in Figure 6.4, the PCI-E core sends and receives data from the accelerator core. However, some kind of interface adapter between both cores is necessary for two main reasons:

• Different data handling.

• Cores with different working frequencies.

First, the PCI-E (and therefore the PCI-E core) works with just one class of data, the Data Word (DW), which corresponds to a 32-bit word. Although the communications through the bus are based on bytes, the minimum data unit to transmit is a DW and bigger transmissions should be multiples of a DW (a scheme based on masks is used for data smaller than a DW). A PIO transfer corresponds to just one DW, while in a DMA transfer several DWs are handled over several clock cycles. In each clock cycle we can find one data DW (the first and/or last data DW) or a pair of data DWs (all the other data DWs). However, data to and from the accelerator are not restricted to 32-bit words. Furthermore, the way data are generated may not correspond to the PCI-E core data handling, requiring a data adapter interface. Second, the clock frequency of both cores does not have to be the same. The PCI-E core frequency is fixed to two possible frequencies, 100 or 250 MHz (in our case, the latter is used), related to the bus frequency, while the accelerator should have the highest possible frequency. Thereby, the interface adapter not only has to solve the different data handling but also the synchronization problems between two cores working at different clock frequencies. These circumstances can be found not only in our case study application, but in any other application. Hence, we have chosen to develop a general interface that can be used for our accelerator or any other.

6.3.1.1. PCI-Express & Accelerator Interface

Developing a general interface requires splitting the data handling into two adapters: while the data adaptation to the PCI-E core is handled by the interface adapter, the specific data handling for the accelerator is not carried out in the adapter. In this way, the interface is prepared for a single DW for PIO transfers and for one or two DWs for DMA transfers. The general solution adopted is shown in Figure 6.5. Asynchronous FIFOs and registers are required (in the center of the figure) to solve the synchronization between the PCI-E clock and the accelerator clock. For the DMA communications, each asynchronous FIFO is composed of two FIFOs with a bit-width of one DW each. Associated to them, two adaptation FIFOs are required to ensure correct transmissions and receptions, as DMA transmissions or receptions can finish before a TLP is completed due to a problem in the bus or in the Xilinx core. The adaptation FIFOs handle these situations, freeing the asynchronous FIFOs where they would be much more difficult to handle due to the two clock domains. In addition to the data itself, it is required to send the DMA information (data type and format, number of words and destination), shown on the left side of the figure.

Figure 6.5: PCI-Express & Accelerator Interface.

This information corresponds to a few bytes and can be handled by an asynchronous register. In the figure we have only represented the mechanism for DMA transfers to the accelerator, although it may also be needed for DMA transfers to the PCI-E. In our specific case it is not needed, as only one type of data is sent to the application, the calculated LIBORs. The specific data handler of the application will use the DMA information to determine which data is being received, its format and size, and will carry out the necessary tasks to store it in the appropriate storage element and with the appropriate format. In the case of the calculated LIBORs, it will transform the hardware format (float plus 8 extra mantissa bits) into the software one, double-precision floating-point format. The working scheme is the same for the PIO transfers, although instead of asynchronous FIFOs we have opted for just an asynchronous register for each direction, as they are isolated transfers. Nevertheless, a small synchronous FIFO is attached to the registers for security, so as not to lose any PIO communication.
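As an illustration of the format conversion mentioned above, the sketch below converts the hardware number format (single-precision float with 8 extra mantissa bits) into a double. The 40-bit packing assumed here (sign, 8-bit exponent with the IEEE single bias, 31-bit mantissa) is our assumption, not necessarily the exact hardware layout, and the infinity/NaN exceptions are omitted:

#include <stdint.h>
#include <string.h>

/* Hedged sketch of the float-plus-8-extra-mantissa-bits to double conversion
 * performed by the software. Denormalized numbers are assumed to have been
 * flushed to zero by the hardware. */
static double hw40_to_double(uint64_t w)
{
    uint64_t sign  = (w >> 39) & 0x1;
    uint64_t exp8  = (w >> 31) & 0xFF;     /* bias 127, as in IEEE single   */
    uint64_t man31 = w & 0x7FFFFFFFULL;    /* 23 + 8 extended mantissa bits */

    uint64_t bits;
    if (exp8 == 0) {
        bits = sign << 63;                             /* +/- 0.0           */
    } else {
        uint64_t exp11 = exp8 - 127 + 1023;            /* re-bias to double */
        bits = (sign << 63) | (exp11 << 52) | (man31 << (52 - 31));
    }

    double d;
    memcpy(&d, &bits, sizeof d);                       /* safe type-pun     */
    return d;
}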

6.3.1.2. Communications Stalls and Performance

When the asynchronous FIFOs are full, the source has to wait until the receiver processes the data. These situations have an impact on the total performance of the system. Stalls of the PCI-E Receiver have a limited impact, as DMA communications to the FPGA are required only once, when preparing the hardware accelerator with the calibration and the constant data. However, stalls of the hardware accelerator can have a bigger impact on performance and harm the throughput of the accelerator. Two circumstances can provoke these stalls:

• The way the PCI-E bus works.

• The software application itself.

The PCI-E bus requires synchronization and other control TLPs, which means that data TLPs cannot always be sent. Moreover, it is a shared bus working with a credit-based system to decide which hardware (in case more hardware systems want to use the bus at the same time) has the right to use the bus, with the same effect of not allowing our hardware to send data TLPs. These two situations can be effectively handled by providing asynchronous FIFOs with enough depth, as the clock rate of the PCI-E (250 MHz) is much higher than the accelerator clock rate. When data TLPs are not allowed, the FIFOs will absorb all data from the accelerator, and when data TLPs are allowed again the higher frequency of the PCI-E will free the FIFOs. On the other hand, the application itself can force the accelerator to stall when the software is not able to process the LIBORs at the same rate as they are produced, so the accelerator has to wait for the software. In this case, the accelerator has to wait until the software frees the memory zone it is using once the accelerator has finished writing the next path to the other memory zone. This situation cannot be avoided and it means we are wasting part of the performance of the accelerator.

6.4. Software

To make possible the use of a hardware accelerator within a software application, the part of the program which is accelerated has to be replaced by calls or subroutines to the hardware accelerator. These calls or subroutines form a group of low level functions that allows the programmer to handle the accelerator through general commands in an easy way. The complexity of handling the accelerator is left to the driver. Finally, the software application also has to be modified in case the hardware accelerator does not work with exactly the same dataflow as the software, as is our case.

6.4.1. Driver and Low Level Functions

The low level functions form an interface to handle the driver from the application, and must be as independent as possible of the operating system (OS) where the application is executed. We have developed these functions to make them as general as possible, relying on a set of arguments to make the details of the hardware implementation as transparent as possible (a list of defines encapsulates the hardware details). The main routines carried out are the following (a sketch of such an interface is given after the list):

• Initialization of the driver.

• Initialization of the two memory areas.

• Routine for asking if there is data available for the SW.

• Freeing of memory areas for the HW.

• Sending of DMA control data to the HW so the HW can act as Bus Master.

• PIO data write and read.

• Start and reset instructions to the HW.
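A possible C prototype of this interface is sketched below. The names, return conventions and arguments are illustrative only; in the real implementation the hardware details are hidden behind a list of defines:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical low level function interface wrapping the driver. */
int accel_open(const char *dev_path);                /* driver initialization    */
int accel_init_areas(size_t area_bytes);             /* the two RAM memory areas */
int accel_data_ready(int *area);                     /* is there data for SW?    */
int accel_free_area(int area);                       /* give the area back to HW */
int accel_send_dma_ctrl(uint64_t addr, uint32_t dw); /* HW can act as Bus Master */
int accel_pio_write(uint32_t reg, uint32_t value);   /* PIO write to a BAR reg   */
int accel_pio_read(uint32_t reg, uint32_t *value);   /* PIO read from a BAR reg  */
int accel_start(void);                               /* start instruction to HW  */
int accel_reset(void);                               /* reset instruction to HW  */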

On the other hand, the driver is the program used by the OS to handle some specific hardware, in our case our accelerator. In this way it is completely tied to the specific OS and the hardware used. We have selected a Linux OS, Ubuntu 10 [Ubu] with a 2.6.32.8 kernel, for the prototype. In Linux, only the part of the driver required to control the hardware and transfer information to it using general system instructions needs to be developed. All the other tasks, such as the control of the bus, are carried out by other drivers which are part of the kernel, allowing us to develop a driver focused only on the functionality needed. In this way the driver handles:

• The connection to the low level driver which handles the bus and to the hardware previously found by the low level driver.

• Memory allocation for the driver.

• Translation of the low level functions into the operating system hardware instructions.

• Interrupt handling.

Of these tasks, the most relevant one for the performance of the system is the interrupt handling. The handling of the hardware interrupts implies a software computational overhead that harms the performance. Therefore, the number of interrupts generated by the hardware should be minimized.
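A hedged sketch of the interrupt path of such a Linux driver (kernel 2.6.32 style) is shown below. The accel_dev structure and its fields are hypothetical driver state, and the real driver may differ, but the principle is the same: the handler only records the "grouped path ready" event and wakes the waiting application, keeping the per-interrupt overhead minimal.

#include <linux/interrupt.h>
#include <linux/pci.h>
#include <linux/wait.h>

/* Hypothetical per-device state used by the "is data ready" low level function. */
struct accel_dev {
    wait_queue_head_t wait_queue;
    int path_ready;
    int msi_enabled;
};

static irqreturn_t accel_irq_handler(int irq, void *dev_id)
{
    struct accel_dev *dev = dev_id;

    /* The FPGA raises one interrupt per completed grouped-path DMA write. */
    dev->path_ready = 1;
    wake_up_interruptible(&dev->wait_queue);
    return IRQ_HANDLED;
}

/* Registration during probe, using MSI if available to lower the overhead. */
static int accel_setup_irq(struct pci_dev *pdev, struct accel_dev *dev)
{
    if (pci_enable_msi(pdev) == 0)
        dev->msi_enabled = 1;
    return request_irq(pdev->irq, accel_irq_handler, 0, "accel", dev);
}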

6.4.2. Application Modification

As previously introduced, some changes in the application are required to replace the part of the software which is being accelerated with the subroutines that obtain the results from the hardware. In our case, some extra tasks are needed. The first is the preparation of the hardware with the calibration and constant data and with the parameters needed to carry out the simulation. In this task all the data transfers to the hardware are performed and, finally, the start instruction is sent to the hardware. Once these tasks are done, the replaced software is substituted by the routine that asks whether hardware data is available or waits until it is. Then the data is accessible by the software, which can process it as if it were software generated.

Figure 6.6: Detailed-view Software-Hardware modified dataflow.

However, as advanced in Section 6.2.2, the hardware calculates a grouped path instead of one subpath at a time, computing the LIBORs of all the subpaths for each time step of the simulation before advancing to the next time step, see Section 5.3.1. Hence, it is necessary to transform the order in which data have been transferred from the hardware into the order expected by the software LIBORs processing. In this way, a routine has been introduced to reorder in software the data returned by the FPGA so as to completely adapt the software application. As the software processes just one subpath at a time, this routine extracts all the data of the corresponding subpath before starting to process it. The complete software-hardware dataflow with these modifications is shown in Figure 6.6. The three new software tasks imply a computational overhead harming the total acceleration, although their impact is rather different. The preparation of the hardware is done just once, implying a small computational overhead. Meanwhile, the impact of the routine that checks whether the data is ready is negligible, as it is just a wait state if data is not ready. However, although the reorder routine mainly implies fast instructions like data reads and writes and index handling, it can have a major impact as it is repeated for each subpath while the data involved in the reorder can represent several megabytes.
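A minimal sketch of such a reorder routine is given below. It assumes the FPGA returns the grouped path ordered as [time step][subpath][LIBOR] with a fixed number of LIBORs per time step; in the real model the number of alive LIBORs shrinks over time, so the index arithmetic is more involved:

#include <stddef.h>

/* Extract one subpath from a grouped path returned by the FPGA,
 * reordering it into the [time step][LIBOR] layout expected by the
 * software product valuation. */
void reorder_subpath(const double *grouped,   /* FPGA ordering                  */
                     double *subpath_buf,     /* [steps][libors] of one subpath */
                     int subpath, int lh,     /* subpath index, LH subpaths     */
                     int steps, int libors)
{
    for (int s = 0; s < steps; s++) {
        const double *src = grouped + ((size_t)s * lh + subpath) * libors;
        double *dst = subpath_buf + (size_t)s * libors;
        for (int l = 0; l < libors; l++)
            dst[l] = src[l];
    }
}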

6.5. Experimental Results

To test the speedup that can be achieved with the integration of the accelerator within a software application, we have run a set of two Monte Carlo simulations which differ mainly in the product valuation. Additionally, for both case studies, three different ratios between time steps and fixing dates are analyzed. The experimental environment is composed of the prototyping board (HiTech Global HTG-V5-PCIE-200 with a Xilinx Virtex 5 FX200T) and the host system CPU, an Intel i7 920 [Cor] (4 cores, 2.67 GHz and 8 MB cache) with 4 GB of RAM. The base software used is a financial software package that we have adapted to integrate the FPGA.

Table 6.2: Implementation results of the complete accelerator for a V5-FX200.

                            Slices        DSPs       BRAM       Working Freq. (MHz)
PCI-E Core  Endpoint        768 (2%)      -          3 (0.6%)   250
            BMD             1259 (4%)     -          1 (0.2%)
Interface                   186 (0.6%)    -          6 (1%)     250-50
Complete    Accelerator     25493 (82%)   144 (37%)  144 (31%)  50
            + Product       27204 (88%)   171 (44%)  153 (33%)

This section is split in three parts. First, the implementation results for the complete core in the target FPGA are summarized. Second, a profiling of the software execution of both simulations is carried out, with a theoretical approximation to the achievable speedup based on those profiles. The profiling has been carried out using timers within the software with microsecond precision (a minimal sketch is given below). Finally, the speedup results obtained for the simulations when the hardware accelerator is used are presented and analyzed.
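The microsecond-precision timers mentioned above can be implemented with standard POSIX calls; a minimal sketch of the kind of timer used for the profiling is:

#include <stdio.h>
#include <sys/time.h>

/* Elapsed time in microseconds between two gettimeofday() samples. */
static double elapsed_us(struct timeval start, struct timeval end)
{
    return (end.tv_sec - start.tv_sec) * 1e6 + (end.tv_usec - start.tv_usec);
}

/* Usage (run_task() is a placeholder for the task being profiled):
 *   struct timeval t0, t1;
 *   gettimeofday(&t0, NULL);
 *   run_task();
 *   gettimeofday(&t1, NULL);
 *   printf("%.0f us\n", elapsed_us(t0, t1));
 */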

6.5.1. Complete Accelerator Implementation Results

Table 6.2 summarizes the implementation results for the complete accelerator and the new modules required for communications, the PCI-E core and the adapter interface. The complete implementation relies on two clock signals. The first clock (250 MHz) is obtained from the PCI-E bus and the prototyping board. Its frequency ensures that communication transfers will not become a bottleneck harming the performance. The second clock, a 50 MHz clock, is derived from the 250 MHz one and is selected following the results obtained in Chapter 5. The frequencies of these clocks have to be used as timing constraints for the post-place & route to ensure a correct behavior of the FPGA logic. Additionally, it has been necessary to introduce area constraints, since otherwise the timing constraints were not achievable. As can be seen in Table 6.2, the PCI-E core (divided into the Xilinx PCI endpoint and the Bus Master DMA logic (BMD)) and the interface adapter require a reduced number of resources, having very little impact on the global figures.

6.5.2. Software Profiling

We have selected two real case studies with different product valuations which differ in their degree of complexity and hence in their associated computational load. On the one hand we have the cap used as product valuation example in the previous chapter. On the other hand, we have a triple range accrual [Ran] (TRA from now on), which involves more complex computations than the cap as it is based on three factors.

Table 6.3: Test Simulation Features.

        Nodes    S      LH    F     N      Q
Cap     4        109    20    20    224    108
        12       365
        24       730
TRA     4        103    20    20    133    93
        12       289
        24       569

We have simulated three scenarios for both products, based on the ratio of time steps versus fixing and maturity dates. The LIBORs for both cases are quarterly LIBORs (four fixing dates per year) and the three scenarios consider the following time steps per year (from now on nodes):

• Quarterly: 4 time steps per year.

• Monthly: 12 time steps per year.

• Fortnightly: 24 time steps per year.

Thereby, we have an increase from 1 time step per fixing date and payoff date (these dates are related to fixing dates) to 6 time steps per fixing date. Consequently, as time steps are increased, the execution time devoted to the LMM will increase while the time devoted to the product valuation remains almost stable. The main features of the two case studies are summarized in Table 6.3 for the three selected time steps. In addition to the different product valuation (and its associated complexity), the TRA presents a smaller number of LIBORs (the simulated interest rate curve spans fewer years). These three circumstances, complexity, number of LIBORs to simulate and the different number of nodes, lead to extremely different software profiles when the time devoted to LMM and GRNG on one side and the product valuation on the other side are considered. One last consideration is the number of LIBORs remaining alive in both simulations in the last time steps. In both cases there are more LIBORs than the value of F (which determines the number of Gaussian variables per time step) and hence the number of alive LIBORs will be the dominant parameter in those last time steps. Table 6.4 summarizes the software profiling carried out. First, it can be seen that the Pre & Post process, which includes the initial calibration and the final statistical measures, remains almost constant as the number of paths and the number of nodes per year increase. As mentioned before, the cap example presents a bigger number of LIBORs and therefore has more related data to handle in the calibration and initialization, requiring a longer time for these tasks. The other three measured tasks correspond to the Monte Carlo simulation itself and are proportional to the number of paths implemented. The times required for the random number generation and the LMM are related to the number of variables to be computed (summarized in Table 6.5).

Table 6.4: Software Profiling.

              Grouped   Total      Pre & Post    GRNG     LMM       Product   (LMM+RNG)/
      Nodes   Paths     Time (s)   Process (s)   (s)      (s)       (s)       Product
Cap   4       100       21.31      9.03          1.53     10.62     0.12      89.37
              200       33.39      8.84          3.09     21.21     0.24      88.66
              500       70.23      8.88          7.71     53.03     0.59      89.15
      12      100       45.96      8.93          4.61     32.27     0.15      220.25
              200       83.30      9.02          9.25     64.72     0.29      220.37
              500       194.20     9.15          23.09    161.21    0.74      216.92
      24      100       83.44      8.97          9.20     65.10     0.15      427.55
              200       157.85     9.05          18.41    130.07    0.31      421.76
              500       381.62     9.50          46.14    325.20    0.76      427.81
TRA   4       100       10.47      2.25          1.45     4.6       2.17      2.12
              200       18.64      2.20          2.91     9.22      4.32      2.13
              500       43.31      2.22          7.24     23.04     10.81     2.13
      12      100       21.95      2.24          4.06     13.46     2.19      6.15
              200       42.56      2.26          8.70     27.20     4.40      6.18
              500       100.80     2.36          20.29    67.33     10.82     6.22
      24      100       39.72      2.31          8.07     27.09     2.25      12.02
              200       77.16      2.43          16.11    54.11     4.50      12.03
              500       189.46     2.79          40.16    135.22    11.28     11.99

Table 6.5: Main variables to compute per subpath.

             Cap                        TRA
Nodes        4       12      24         4       12      24
LIBORs       18530   55583   111275     8554    24736   49143
Gaussians    2180    6520    13040      2060    5780    11380

Table 6.6: Extrapolated Profile (5000 grouped paths).

             Total      Pre & Post      GRNG             LMM               Product          LMM+RNG
     Nodes   Time (s)   Process (s)     (s)              (s)               (s)              FPGA (s)
Cap  4       622.18     8.88 (1.49%)    77.1 (12.39%)    530.3 (85.23%)    5.9 (0.95%)      37.06
     12      1859.55    9.15 (0.49%)    230.9 (12.42%)   1612.1 (86.69%)   7.4 (0.40%)      111.2
     24      3730.50    9.5 (0.25%)     461.4 (12.37%)   3552.1 (87.17%)   7.6 (0.20%)      222.5
TRA  4       413.12     2.22 (0.54%)    72.4 (17.53%)    230.4 (55.77%)    108.1 (26.17%)   17.1
     12      986.76     2.36 (0.24%)    202.9 (20.56%)   673.3 (68.23%)    108.2 (10.97%)   49.5
     24      1869.39    2.79 (0.15%)    401.6 (21.48%)   1352.2 (72.33%)   112.8 (6.04%)    98.2

Meanwhile, the product valuation is related to Q and the number of fixing dates. As both remain fixed for the three nodes, the time required for the valuation should remain stable when the nodes are increased. However, a small increase can be observed due to the memory requirements in software: the data structures to be handled grow with the number of nodes, harming the performance due to memory accesses.

6.5.2.1. Achievable Speedup

To estimate what theoretical speedup can be achieved when the accelerator is employed, on the one hand we can extrapolate the profiling carried out to consider a realistic number of paths (see Table 5.1). On the other hand, we can use the working frequency of the LMM engine of the FPGA (as it is capable of producing one calculated LIBOR per clock cycle). Targeting a typical value of 100000 paths (corresponding to 5000 grouped paths, as the simulations' LH parameter is 20) and considering the increase of time in the Pre & Post Process negligible, we obtain the profile shown in Table 6.6. When computed in software, the random number generation and the LMM are sequential tasks, but when computed in the accelerator they are parallel tasks. In this way, the FPGA execution time is determined by the LIBORs computation. Before estimating the achievable speedup, we have to focus on how the driver and the communication mechanism have been designed to take advantage of the intrinsic Monte Carlo parallelism. As two memory zones are used to return the FPGA-computed LIBORs to the CPU, software and hardware can work in parallel whenever there are calculated LIBORs to be processed by the software and a free memory zone to return LIBORs from the FPGA. The theoretical achievable speedup (in x times) is summarized in Table 6.7 for the migrated code, taking the 50 MHz FPGA clock as reference.

Table 6.7: LMM, GRNG and LMM+RNG Speedups.

               Cap                  TRA
Nodes          4      12     24     4      12     24
GRNG           17.7   17.7   17.7   17.6   17.6   17.6
LMM            14.3   14.5   14.6   13.5   13.6   13.8
LMM+GRNG       16.4   16.6   16.7   17.7   17.7   17.8

Table 6.8: Extrapolated achievable speedup.

                     Cap                  TRA
Nodes                4      12     24     4     12    24
Speedup (x times)    13.5   15.5   16.1   3.7   8.9   17.4

These results represent the maximum achievable speedups, as they are isolated from any other code remaining in software. Considering all these features, the total achievable speedup follows the simplified model below:

\[
\mathrm{Speedup} = \frac{T_{SW}}{T_{Pre\&Post} + \max\left(T_{FPGA},\,T_{PV}\right) + \min\left(T^{1path}_{FPGA},\,T^{1path}_{PV}\right)} \qquad (6.3)
\]

where $T_{Pre\&Post}$ is the pre & post process software time, $T_{FPGA}$ is the computation time for the FPGA, $T_{PV}$ is the software execution time for the product valuation, and $T^{1path}$ corresponds to the time devoted to one grouped path in the FPGA or in the software product valuation. The equation comprises two different situations:

• The CPU Monte Carlo task is faster than the FPGA Monte Carlo task. When the CPU processes one grouped path it has to wait until the FPGA finishes with the next one. Hence, the simulation time is dominated by the FPGA execution time, as all the CPU Monte Carlo execution time is overlapped with the FPGA execution time except for the last grouped path.

• The FPGA Monte Carlo task is faster than the CPU Monte Carlo task. In this case, the FPGA has to be stalled until the software processes one grouped path and frees the memory zone where its data was allocated, so the FPGA can return another grouped path. Now all the FPGA execution time is overlapped with the CPU execution time except for the first grouped path.

In our case study simulations and their software profiling, all the cap simulations correspond to the case where the CPU has to wait for the FPGA, while all the TRA simulations correspond to the case where the FPGA is stalled waiting for the CPU to process the paths. Table 6.8 summarizes the theoretical maximum achievable speedups following Equation (6.3). In both cases, when the number of nodes increases, the accelerated tasks represent a bigger part of the total execution time (see the (LMM+RNG)/Product column in Table 6.6), and hence the achievable speedup also increases. Meanwhile, for the TRA, the effects of Amdahl's law are easily observed. The product valuation remaining in software requires a considerable part of the total execution time. In this way, although the LMM plus the RNG are highly accelerated, the total speedup is poor until the percentage of the time devoted to the product valuation decreases with the increase of the number of nodes, increasing the ratio between simulation times and payoff times. Comparing the theoretical speedups for the total simulation (Table 6.8) with the ones corresponding to the partial tasks (Table 6.7), we can see that, when the payoff execution time overlaps completely or almost completely with the FPGA execution time, the achievable speedup tends to that of the migrated tasks, thanks to the technique of using two memory zones to let software and hardware work in parallel.
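A direct coding of Equation (6.3), useful to reproduce the figures in Table 6.8 from the profile in Table 6.6, could look as follows (all arguments in seconds; the per-path times are simply the totals divided by the 5000 grouped paths):

#include <math.h>

/* Simplified speedup model of Equation (6.3). t_fpga and t_pv are the total
 * FPGA and product-valuation times for the whole simulation; t_fpga_1p and
 * t_pv_1p are the same times for a single grouped path. */
static double model_speedup(double t_sw, double t_prepost,
                            double t_fpga, double t_pv,
                            double t_fpga_1p, double t_pv_1p)
{
    double overlapped  = fmax(t_fpga, t_pv);        /* slower side dominates   */
    double non_overlap = fmin(t_fpga_1p, t_pv_1p);  /* first or last path only */
    return t_sw / (t_prepost + overlapped + non_overlap);
}

/* Example (Cap, 24 nodes, data from Table 6.6):
 *   model_speedup(3730.50, 9.5, 222.5, 7.6, 222.5/5000, 7.6/5000)  ->  ~16.1x */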

6.5.3. Hardware-Software Solution Results

However, the theoretical approximation does not take into account several tasks related to the integration of the hardware with the software. In previous sections we have introduced the task related to the reordering of the data sent by the FPGA. Another task is the preparation and sending of all the initial data to the FPGA. To measure the speedup correctly it is necessary to run the application with the FPGA accelerator. Table 6.9 summarizes the profiling carried out (now some measures are shown in milliseconds). FPGA execution times cannot be directly measured as they overlap with some software tasks. In the table we can find the two new tasks: FPGA Pre, which corresponds to the initial configuration of the FPGA with the market and initial data, and Reorder, which corresponds to the software task needed to reorder the FPGA data. Finally, the Wait task is the task implemented to know in software when the FPGA data is ready. In this way, it represents all the FPGA execution time not overlapped with any other software task. As can be seen, the new tasks represent a very small overhead (less than 0.2 s in the worst case). With these results we can extrapolate the data to 100000 simulations (5000 grouped paths). Table 6.10 summarizes the speedups obtained. Comparing the obtained speedups with the ones we computed theoretically, we can see that they are exactly the same. The two new tasks required for the hardware-software integration either overlap with the FPGA execution time (cap) or represent such a small overhead (TRA) that their effect is negligible. Other related issues, such as the software handling of interrupts, are also overlapped with other execution times.

6.5.3.1. All in Hardware: Cap Valuation in Hardware

To test the LMM plus product valuation we have run a simulation with the cap product valuation in the LMM engine. In this case, instead of returning the whole interest rate curve, only one value is returned per subpath: the NPV computed for the cap product for the corresponding subpath curve.

Table 6.9: Hardware-Software Profiling.

              Grouped   Total      Pre & Post    FPGA Pre   Wait       Reorder   Product
      Nodes   Paths     Time (s)   Process (s)   (ms)       (s)        (ms)      (s)
Cap   4       100       9.69       8.95          0.31       0.61       21.17     0.11
              200       10.34      8.86          0.31       1.21       42.18     0.22
              500       12.6       8.89          0.32       3.03       103.74    0.57
      12      100       11.21      8.99          0.93       2.00       25.89     0.19
              200       13.46      9.01          0.95       4.00       52.41     0.39
              500       20.27      9.45          0.95       10.0       129.91    0.98
      24      100       13.25      8.80          1.79       4.24       35.11     0.17
              200       19.94      9.03          1.83       8.44       71.22     0.38
              500       31.76      9.51          1.8        21.12      175.6     0.95
TRA   4       100       4.44       2.24          0.35       3.43 ms    31.43     2.15
              200       6.63       2.23          0.26       3.44 ms    64.31     4.33
              500       13.20      2.23          0.23       3.44 ms    158.23    10.80
      12      100       4.48       2.25          0.66       9.90 ms    34.57     2.19
              200       6.73       2.26          0.66       9.92 ms    69.34     4.38
              500       13.48      2.37          0.66       9.91 ms    172.62    10.92
      24      100       4.54       2.31          1.31       19.69 ms   40.37     2.16
              200       6.88       2.44          1.28       19.68 ms   80.68     4.33
              500       13.87      2.78          1.29       19.69 ms   198.76    10.87

Table 6.10: Extrapolated times and measured speedup.

                      Cap                          TRA
Nodes                 4        12        24        4        12       24
SW (s)                622.18   1859.55   3730.50   413.12   986.76   1869.39
HW-SW (s)             45.95    120.32    232.21    111.85   113.36   113.59
Speedup (x times)     13.5     15.5      16.1      3.7      8.9      17.4

As we have 20 Latin Hypercubes, for each grouped path we now return a vector of 20 NPVs. In this way the software only carries out a minimal part of the Monte Carlo simulation, the accumulation of the results. The speedup results obtained correspond exactly to the ones obtained when the accelerator is used without product valuation. This result was expected once we analyzed the results obtained from the accelerator without the product valuation for this cap case study. The FPGA execution time was clearly the dominant time, and thereby reducing the Monte Carlo software execution time does not improve the whole speedup. We can only highlight that the measured wait time almost matches the theoretical FPGA execution time, as the software Monte Carlo time was so small that we could not measure it (we are using timers with microsecond precision).

Table 6.11: Resources of the Different Xilinx FPGA Families.

              Slices     DSP      BRAM
V5-FX200      30,720     384      456
V6-SX550      89,520     864      1,264
V7-VX1140     178,000    1,880    3,360

6.5.3.2. Results Extrapolation Following FPGA’s Roadmap

The FPGA that was chosen for prototyping this research work can only hold one LMM engine, as one engine requires more than 80% of all available resources. However, for newer FPGAs this circumstance changes. If we follow the technological roadmap of the Virtex family from Xilinx, two FPGA families have appeared after Virtex 5: Virtex 6 [Xild] and Virtex 7 [Xile]. Studying the different FPGA families, the Virtex 5 FX200 used in this work corresponds to an SX550 in the Virtex 6 family and to a VX1140 in the Virtex 7 family. In Table 6.11 the comparison of resources among the three FPGA families is summarized. As can be seen, with each new generation the FPGA resources have roughly doubled with the use of smaller submicron technologies (65 nm for Virtex 5, 40 nm for Virtex 6 and 28 nm for Virtex 7). Additionally, the capabilities of the Slices and the DSPs have also been increased (e.g. Virtex 5 slices are made of four LUTs and four Flip-Flops, while in Virtex 6 and 7 the Flip-Flops per slice have been increased to eight). In our case, this resource increase implies two circumstances that could enhance the speedup of our accelerator. In the first place, we could increase the application parallelism by implementing more than one LMM Engine in the same FPGA and, abstracting from the changes required in the communications infrastructure, we can assume that there would be no bottleneck in the PCI Express communications. In this way, several independent interest rate curves would be computed in parallel and more than one LIBOR per cycle (from different curves) could be obtained, with the corresponding improvement in the system throughput.

Table 6.12: Advanced FPGAs extrapolation.

                             Slices          LUTs            Flip-Flops     DSP           BRAM         MHz
V6-SX550 (Synthesis)         -               86871 (24.3%)   23903 (3.3%)   144 (16.7%)   144 (7.7%)   80.9
V6-SX550 (Place & Route)     28176 (31.5%)   81590 (22.8%)   25909 (3.6%)   131 (15.2%)   144 (7.7%)   61.0
V7-VX1140 (Synthesis)        -               87085 (12.2%)   23885 (1.7%)   144 (7.7%)    144 (4.3%)   85.6

This solution would affect the cases where the FPGA accelerator is the slowest part in the parallel computation of the interest curves and the product valuation, as is the case for the Cap product. However, for the TRA case no speedup improvement would be observed. In the second place, improvements in the Engine working frequency could be expected due to the use of smaller submicron technologies. Again, these improvements would only affect the Cap product case. In Table 6.12 the implementation results for one LMM Engine core are summarized for the V6-SX550 and the V7-VX1140. As can be seen in the Post Place & Route results for the V6-SX550, one core now requires less than a third of the FPGA resources while the working frequency is improved by 22% with respect to the 50 MHz in Virtex 5. Hence, if these metrics do not worsen when new LMM cores are introduced, we could instantiate up to three cores in the V6-SX550 FPGA with an improvement of 11 MHz in the clock frequency. To study the impact of these improvements we can focus on the 24-node Cap results. In the last column of Table 6.6 the FPGA time required was 222.5 seconds, which would now be reduced to 60.79 seconds, saving 161.71 seconds; the calculation below works this example out. If we introduce these savings in the experimental measures of Table 6.10, the HW-SW time decreases to 67.56 seconds and the speedup increases up to 52.9. For Virtex 7 this speedup will continue increasing: as can be seen from the synthesis results, up to eight LMM Engines could be implemented in just one V7-VX1140. Given that in Post Place & Route the slice usage is always bigger than that corresponding to the maximum of LUTs and Flip-Flops, we could consider six LMM Engines. In this case (and not considering the additional improvement in clock frequency, four MHz in the synthesis results with respect to Virtex 6) the FPGA time would be reduced to 30.39 seconds (and thereby the simulation would continue to be limited by the FPGA), obtaining a speedup of 93.01.
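As a worked instance of this extrapolation, using only the figures already given above (222.5 s for one 50 MHz engine, three engines at 61 MHz in the V6-SX550):

\[
T_{FPGA}^{V6} \approx \frac{222.5\,\mathrm{s}}{3} \cdot \frac{50\,\mathrm{MHz}}{61\,\mathrm{MHz}} \approx 60.8\,\mathrm{s},
\qquad
\Delta T \approx 222.5\,\mathrm{s} - 60.8\,\mathrm{s} \approx 161.7\,\mathrm{s}.
\]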

6.6. Conclusions

Integrating any hardware accelerator within a software application is a complex and delicate task that can lead to losing most of the achievable speedup when the accelerator is used. In this chapter we have studied the partitioning policy followed for the accelerator implementation, focusing on the FPGA's advantages and disadvantages. We have opted for a more reliable approach, although some performance is sacrificed, leaving the product valuation in software. Following the selected partitioning policy, we have developed the infrastructure (both hardware and software) required to make possible the integration of our accelerator within a software application. A mechanism based on the use of two RAM memory zones and a PCI-E core with Bus Master capabilities in the FPGA has been proposed and implemented. It has allowed us to extend the intrinsic parallelism of Monte Carlo simulations to the real way the CPU and the FPGA work together. In this way, we exploit the CPU to work in parallel with the FPGA, overlapping their execution times. Hence, the software execution time affecting the performance is reduced to the initial and final processing and to the product valuation in case it is slower than the LMM plus RNG in the FPGA. With this scheme we have achieved high speedups, around 18 times, close to the theoretical limit for our cases: when there is no software ported to hardware or when its execution is overlapped by the FPGA execution (the LMM plus RNG achievable speedup). In this case, the achieved speedup could be considerably improved by using new FPGAs and several LMM cores in parallel. When the speedup is not limited by the software running in parallel, the achieved speedup is limited by the working clock frequency of the LMM engine, as no significant communications overhead has been measured. This frequency is limited by the drift calculation, and improving this critical path would improve the global speedup. Additionally, we can also exploit new FPGAs to implement several Engines in parallel, increasing the number of LIBORs computed per cycle (from different curves) and, in this way, the global speedup.

7

Conclusions

The main topic of this Thesis is hardware acceleration with FPGAs. Nowadays, high performance computing is dominated by conventional multicore computers and clusters built with them. However, acceleration continues to be a great necessity. FPGAs are positioned as ideal candidates to provide acceleration thanks to their huge improvement in the last years and their expected technology roadmap. However, dealing with hardware acceleration and FPGAs presents plenty of difficulties that make developing an FPGA hardware accelerator a highly challenging task. This is an extremely wide topic that makes it impossible to study in depth in just one Thesis. On the one hand, there exists a great variety of applications suited for hardware acceleration and they are very diverse. On the other hand, hardware acceleration involves plenty of different and complex tasks. To accomplish an in-depth and as general as possible research and analysis of hardware acceleration with FPGAs, we have focused on a complex subset of applications, Monte Carlo simulations, and in particular on financial Monte Carlo simulations. These simulations present many features that make them ideal candidates for our purpose: they involve complex mathematical operations, rely on specific tasks such as random number generation, require complex control and integration with software applications, need the adaptation of software algorithms to specific hardware ones, require hardware-software partitioning, etc. In this Thesis we have opted to face all these tasks from a practical point of view. In the first steps of this work we have focused on the basic building blocks, and then we have carried out the whole design and implementation cycle of a complete hardware accelerator for one of the selected subset of applications, the LIBOR Market Model. The design and implementation of a complete LMM accelerator, a big challenge in itself, has allowed us to explore and study many other smaller FPGA challenges: due to its complexity, this model is a perfect benchmark, as it comprises specific Monte Carlo and financial design challenges (random number generation, complex equation controls, data feedbacks, etc.) while sharing many elements and features with many other applications (floating-point operators, required accuracy, methods and techniques for FPGAs, hardware-software integration, etc.).

7.1. Contributions and Conclusions of this Thesis

In Chapter 1 we introduced the main objectives of this Thesis and the design challenges that have to be tackled when developing a hardware accelerator. In the following, the main contributions of this Thesis are summarized in relation to the objectives we highlighted.

7.1.1. Random Number Generators

The first objective we mentioned was to study the common elements that play a key role in Monte Carlo simulations and in our target application. One of the most important problems that FPGA designers have to deal with is the lack of good-quality and fully characterized cores that can be reused, allowing the designer to abstract from the particular implementation of all the cores required by an FPGA accelerator. Thereby, we have first focused on the key core for any Monte Carlo simulation, Random Number Generators, and on four main points: uniform and Gaussian random number generation, the inversion method with quintic Hermite interpolation for transforming uniform samples into samples of the target distribution, and variance reduction techniques. With these four elements we have developed a complete Gaussian RNG which is parameterizable and integrates variance reduction techniques. Our contributions in the field are:

• A high-performance Mersenne Twister uniform RNG. We have developed a MT URNG that outperforms previous works and implements the whole algorithm in an FPGA.

• A new FPGA-oriented architecture for the MT URNG based on a circular buffer of shift registers.

• A high-quality FPGA spline approximation of the Gaussian ICDF. We have developed methods and techniques (such as the non-multicycle search algorithm) to make possible a high-quality quintic Hermite interpolation of the ICDF specially suited for hardware acceleration. With our approximation we can see that high-quality generators based on complex mathematical equations are reliable when implemented in FPGAs.

• The developed architecture for this Quintic Hermite interpolation can be reused for any other distribution approximation.

• A parameterizable variance reduction core with Latin Hypercube. We have studied and implemented a Latin Hypercube core for uniform samples that can be integrated within random number generators.

• A high-quality, high-performance Gaussian RNG which is compatible with variance reduction techniques. All previous elements are integrated together to implement a Gaussian RNG based on the inversion method. Additionally, it can be parameterized to modify the mean and the standard deviation and even be converted into a Log-Normal RNG.

These contributions can be seen as a library of elements for Monte Carlo random number generation with an FPGA. They have been developed with two goals, reusability and performance. Hence, their use is not restricted to just our target application, and they can be considered independent cores that can be used as black boxes in any implementation. Regarding performance, the results show the important speedup factors (up to 24 times) that can be achieved with hardware RNGs, even for just uniform RNG. Thus, RNGs are ideal candidates for hardware acceleration, and they can be used in two ways: by themselves or integrated in a more complex accelerator.

7.1.2. Floating-Point Arithmetic Operators and FPGAs

The second key element for most Monte Carlo applications (and for many other applications) is the mathematical operators, as they are the basic elements needed to implement the equations of the models. Our objective in this field was to study how the format can be adapted in FPGAs to reduce its complexity, requiring fewer resources per operator while enhancing performance. For this study we have developed four libraries with different features related to the format (different sets of format simplifications). The contributions of this study are the following:

• An in-depth study and analysis of the impact on floating-point operators of different design decisions on the format. This analysis includes, apart from the implementation results, an analysis of the impact on the accuracy and resolution of the different features with respect to the floating-point standard.

• A set of four single-precision, high-performance libraries composed of six operators: addition, multiplication, division, square root, exponential function and logarithm function.

• An analysis of which design decisions have the best performance-accuracy-standard trade-offs and how these design decisions have to be implemented to have the minimum standard-accuracy impact. The design decisions evaluated are: handling denormalized numbers as zeros, allowing only truncation rounding, and using a hardware representation for the type of number.

• Two final libraries following this last analysis which are specifically suited for FPGAs and high performance. These two libraries include the hardware representation for the type of number and denormalized numbers handled as zeros, and only implement round to nearest.

From the study we can conclude that the format overhead has a major impact on resources and performance, and reducing it is a must to obtain operators suited for FPGAs. In particular, the handling of denormalized numbers has a major impact on the FPGA operators, while the use of the hardware representation also saves logic with no associated trade-off. A second objective in this field was to study the capabilities of current FPGAs to implement datapaths involving many operators. We have carried out a second analysis to identify the number of operators that can be implemented in an FPGA and to identify how the performance of the operators is affected when they are not isolated. Our contribution in this field is the following:

• A theoretical and experimental study of replicability. We have analyzed the number of operators of each type that can be implemented in a target FPGA while characterizing how the introduction of more operators affects the performance and the resource usage.

This study shows the huge capabilities of current FPGAs, which allow up to hundreds of single-precision floating-point operators. However, it also demonstrates how the working frequency of the operators is severely affected by the routing of their elements when the operators are not isolated and a high percentage of the resources of an FPGA is used. Finally, we have faced a third objective in the field of floating-point operators and FPGAs: the development of an accurate exponentiation function taking advantage of FPGA flexibility. The availability of elementary functions for FPGAs is essential for developing hardware co-processors. We have identified the exponentiation function as a complex operator required in some applications but without a complete implementation in the previous literature. In this area, we can summarize the contributions in the following points:

• A study of the accuracy effects of the straightforward implementation of the exponentiation function. A general error analysis has been carried out to study the error propagation within the sub-operators of this implementation.

• An architecture for accurate exponentiation functions on FPGAs based on three chained sub-operators and an exception unit to implement the three different functions provided by the standard for the exponentiation function. Tailored precision for each sub-operator is a key point in this architecture, with a specific error analysis related to the features of each sub-operator.

• Integration of the developed architecture within the FloPoCo tool. The developed architecture has been integrated into this tool to automate the generation of exponentiation operators with variable precisions.

With this work we have seen that very complex operators such as the exponentiation can be implemented in an FPGA, although requiring a high amount of resources and long pipelined datapaths. Moreover, FPGA flexibility allows tailored implementations to overcome the problem of error propagation within the exponentiation function.

7.1.3. LMM Hardware Accelerator

One of the research objectives of this Thesis has been oriented to determining whether FPGAs are capable accelerators for Monte Carlo simulations and, if that is the case, whether they are good accelerators for these particular applications. A complex financial simulation has been selected for this purpose, the LIBOR Market Model. Its complexity (complex equations, high accuracy requirements, complex control, data feedbacks, etc.) has allowed us to face many issues related to hardware acceleration through the design and implementation of an FPGA core for this model. The main contributions in this area are the following:

• Analysis of the selected application from the perspective of an FPGA implementation and hardware acceleration. We have analyzed the equations of the model to identify how it can be implemented in an FPGA, which elements imply restrictions, and finally how the model has to be adapted to an FPGA.

• Design of a complete hardware accelerator for the LIBOR Market Model. This implementation is based on the previous model analysis and a complex control to ensure the maximum possible performance. In this design, all the previously developed cores have been integrated.

• Adaptation and integration of the elementary elements for a real application. The previously designed cores have been adapted to the specific requirements of the target application.

• Study of the tailored precision required by the implementation to ensure highly accurate results. An experimental methodology has been followed to determine which precision fulfills the accuracy requirements.

A complete hardware accelerator has been designed, implemented and integrated within a real application, proving the feasibility of using FPGAs for hardware acceleration in the studied applications. To obtain valid conclusions, we have opted for a real application without any simplifications, dealing with all the problems and constraints of a complex model. Finally, we have validated that FPGAs are capable of implementing complex applications such as financial simulations.

7.1.4. Capacity and Performance of FPGAs. Accelerator Design

When this Thesis started we had several main concerns about FPGA capacities and the selected model: will an FPGA be able to implement a complete complex model such as the LMM? Will it be possible to migrate all the model components to a hardware accelerator? If so, what will the speedup results be? These issues have been resolved, on the one hand thanks to the technological improvements that FPGAs have experienced, as the implementation of the accelerator would not have been possible without the upgrade to an FPGA family that did not exist when the Thesis started. On the other hand, the whole application design has been carried out to minimize the required resources while improving the performance, and an exhaustive analysis of methods and techniques has allowed us to implement all the required cores. Following these ideas, on the side of performance and resources we find the use of extended-precision floating-point operators and the precision-accuracy study, how the LMM model has been adapted to FPGAs, and the whole parallelization of the hardware oriented to obtaining one LIBOR computed per cycle. On the side of methods and techniques, we can highlight the Gaussian RNG, the exponentiation operator and the whole integration of the different cores. Summing up, we have achieved an accelerator that fits in a Virtex 5 FX200 and whose LMM Engine working clock frequency is high enough to achieve a considerable speedup of the migrated tasks, although the model data dependencies make it impossible to use deeply pipelined operators.

7.1.5. Hardware-Software Co-design and Integration

One last contribution of this Thesis has been the final integration and test of the accelerator within a real software application and the analysis of its implications. Practical issues like this are not usually studied at the research stage; however, the design of any accelerator is incomplete if the tasks related to its integration within a software application are not carried out. Additionally, this integration can have a major impact on the real results achieved and is required for a complete validation of the developed hardware. In our case, we had to develop almost all the communications infrastructure, and an adaptation of the software has been required for the correct use of the accelerator within the software application. Our main contributions are the following:

• We have analyzed how the target application has to be split between hardware and software. A partitioning policy based on task stabilities has been followed.

• We have developed the infrastructure (both hardware and software) required to make possible the integration of our accelerator within a software application. A mechanism based on the use of two RAM memory zones and a PCI-E core with Bus Master capabilities in the FPGA has been proposed and implemented. It has allowed us to extend the intrinsic parallelism of Monte Carlo simulations to how the CPU and the FPGA work together.

• A general hardware infrastructure has been developed to isolate the application engine from the communication core based on the use of an interface adapter. This allows a general communications infrastructure to be developed and reused for other applications.

• We have carried out an in-depth profiling of the application, both for the software and the hardware-software implementations.

• We have carried out an extrapolation of the results obtained to newer FPGAs.

The implemented solution has allowed us to exploit the CPU in parallel with the FPGA, overlapping their execution times. In this way, and for our benchmark application, the software execution time that affects overall performance is reduced to the initial and final processing plus the product valuation, in case the latter is slower than the LMM plus RNG computation in the FPGA. With this scheme we have achieved high speedups, around 18 times. This speedup is mainly determined by the data feedbacks in the model; however, it can be considerably improved by using newer FPGAs and by parallelizing LMM cores within the same FPGA.
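The CPU-FPGA overlap can be summarized with the following minimal C++ sketch of a ping-pong (two-buffer) scheme: while the FPGA fills one buffer, the CPU values the batch already delivered in the other one. The functions standing in for the accelerator interface and for the product valuation are placeholder stubs, not the actual driver API or valuation code developed in this Thesis.

    #include <algorithm>
    #include <cstddef>
    #include <numeric>
    #include <vector>

    using Batch = std::vector<float>;

    // Placeholder stubs so the sketch compiles; they stand in for the real
    // Bus Master DMA interface and the real product valuation.
    void fpga_start_batch(Batch& dst) { std::fill(dst.begin(), dst.end(), 1.0f); }
    void fpga_wait_done() {}
    double cpu_value_product(const Batch& b) { return std::accumulate(b.begin(), b.end(), 0.0); }

    double simulate(int num_batches, std::size_t batch_len) {
        Batch buf[2] = {Batch(batch_len), Batch(batch_len)};
        double payoff_sum = 0.0;

        fpga_start_batch(buf[0]);                         // prime the first batch
        for (int b = 0; b < num_batches; ++b) {
            fpga_wait_done();                             // batch b is now in buf[b & 1]
            if (b + 1 < num_batches)
                fpga_start_batch(buf[(b + 1) & 1]);       // FPGA fills the other buffer...
            payoff_sum += cpu_value_product(buf[b & 1]);  // ...while the CPU values this one
        }
        return num_batches > 0 ? payoff_sum / num_batches : 0.0;
    }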

7.2. Future Lines of Work

The extent of the work presented in this Thesis has only allowed us to develop what we consider to be a first version of everything carried out here. This means we have identified plenty of action points that can be addressed, including both improvements and new research lines.

7.2.1. Research Lines Related to Improvements

During the development of the Thesis we have identified several improvements that would enhance both the performance and the quality of the accelerator:

• LMM engine working frequency.
• Gaussian ICDF precision.

Firstly, the achievable speedup is clearly limited by the LMM engine working frequency, which in turn is determined by the data dependencies found in the implemented model, particularly in the accumulator adders. To improve it, two research lines can be followed. The first option is to redesign the whole LMM Core to make the pipelining of the accumulators possible. This could be achieved if the simulation order is changed so that the same LIBOR is computed for all the subpaths before continuing with the next LIBOR, see Section 5.3.1. However, it would imply deep changes in the control and in the architecture, and it would make the adder pipelining dependent on the number of Latin Hypercubes.

The second option is to redesign the accumulator to be faster. One possibility is to carry out a complete profile of the precision and accuracy required by the accumulators, in order to enable the use of a faster and simpler arithmetic, such as fixed-point, just for those operations.

The second main improvement concerns the precision of the ICDF of the Gaussian generator. This generator provides samples with floating-point precision. Extending that precision may open new research lines due to the difficulty of interpolating the ICDF in the tails.
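The reordering idea behind the first option can be illustrated with the following minimal C++ sketch: keeping one partial sum per sub-path means that consecutive additions target different accumulators, so adjacent operations become independent. This is only a behavioural illustration of why the adder pipeline could then accept a new term every cycle, not a description of the actual hardware.

    #include <cstddef>
    #include <vector>

    // One partial sum per sub-path; terms are assumed to arrive interleaved,
    // i.e. term i belongs to sub-path (i % num_subpaths).
    std::vector<float> interleaved_sums(const std::vector<float>& term,
                                        std::size_t num_subpaths) {
        std::vector<float> acc(num_subpaths, 0.0f);
        for (std::size_t i = 0; i < term.size(); ++i) {
            // Successive iterations touch different accumulators, so there is no
            // dependency between adjacent additions and a pipelined adder could
            // accept one of them per cycle.
            acc[i % num_subpaths] += term[i];
        }
        return acc;
    }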

7.2.2. New Research Lines

In addition to these improvements, new research lines can be pursued, building both on the work carried out and on how quickly the related state of the art advances:

• Product Valuation core.
• New FPGAs with new capabilities.
• Integration of the developed cores within design tools to facilitate their reuse.
• New issues related to the LMM.
• Library of random generators based on Quintic Hermite Interpolation.

The first one could be to study in depth sets of different product valuations in order to analyze the feasibility of developing a modular architecture with a pool of computational resources and submodules which, combined with each other, can lead to the implementation of different products.

Second, the continuous advances of FPGAs make new solutions possible. As seen in the results section of Chapter 6, the complete core requires 88% of the resources of the prototyping FPGA. However, nowadays we can find new FPGA families, such as Virtex 6 and Virtex 7, which offer many more resources and higher performance. As we saw in Section 6.5.3.2, the speedup achieved could be improved by the intrinsic speed improvement of these FPGAs and, mainly, by the implementation of LMM cores in parallel within the same FPGA. This opens new research lines to explore the use of several LMM engines in the same FPGA, sharing common resources between them, and to develop a communications infrastructure able to work with several cores.

Third, as mentioned above in the contributions, we have developed many cores that can be directly used or easily adapted to other applications. Another research line could be the integration of these cores within existing CAD tools, or the development of a new CAD tool oriented to the use of these (or other) cores and focused on abstracting away their specific implementation.

Another research line is to continue exploring the use of the LMM for other purposes. In this Thesis we have focused only on the generation of the interest rate curves and on the product valuation. However, there are other tasks related to the Monte Carlo LMM that we have not explored, such as the computation of the Greeks: the sensitivities of the price of derivatives to changes in the underlying parameters on which the value of an instrument or a portfolio of financial instruments depends.
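As a purely illustrative sketch of what such a task involves, the following C++ code shows one standard way of estimating a first-order sensitivity, a bump-and-revalue central difference re-using the same random draws (common random numbers). The toy valuation function is a placeholder only and does not correspond to the LMM engine or to any method proposed in this Thesis.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Toy one-step valuation standing in for a full Monte Carlo run; its only
    // purpose is that both calls below re-use the same vector of normal draws.
    double price_with(double initial_rate, const std::vector<double>& normals) {
        const double vol = 0.2, strike = 0.03;
        double sum = 0.0;
        for (double z : normals)
            sum += std::max(initial_rate * std::exp(vol * z - 0.5 * vol * vol) - strike, 0.0);
        return normals.empty() ? 0.0 : sum / normals.size();
    }

    // Central finite difference with common random numbers (bump-and-revalue).
    double estimate_delta(double initial_rate, double bump,
                          const std::vector<double>& normals) {
        double up   = price_with(initial_rate + bump, normals);  // bumped up, same draws
        double down = price_with(initial_rate - bump, normals);  // bumped down, same draws
        return (up - down) / (2.0 * bump);
    }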

Finally, one last research line is to extend the architecture and the methods used to implement the Gaussian ICDF to other distributions, by means of quintic Hermite interpolation of the corresponding inverse CDF.
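The core of such an inversion-based generator can be sketched as follows in C++: a uniform sample selects a segment and a degree-5 polynomial is evaluated on its local coordinate. The uniform segmentation and the coefficient table are placeholders assumed to be precomputed offline from the target inverse CDF and its derivatives; they do not reproduce the segmentation of the Gaussian generator developed in this Thesis.

    #include <array>
    #include <cstddef>
    #include <vector>

    // One segment of a piecewise degree-5 (quintic) interpolant of an inverse CDF:
    // value = c[0] + c[1]*t + ... + c[5]*t^5 on the local coordinate t in [0,1).
    struct Segment { std::array<double, 6> c; };

    double icdf_sample(double u, const std::vector<Segment>& table) {
        // Uniform segmentation used here only for simplicity.
        std::size_t idx = static_cast<std::size_t>(u * table.size());
        if (idx >= table.size()) idx = table.size() - 1;
        double t = u * table.size() - static_cast<double>(idx);   // local coordinate
        const std::array<double, 6>& c = table[idx].c;
        // Horner evaluation of the quintic polynomial on the selected segment.
        return ((((c[5] * t + c[4]) * t + c[3]) * t + c[2]) * t + c[1]) * t + c[0];
    }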
