Variable Long-Precision Arithmetic (VLPA) for Reconfigurable Coprocessor Architectures

UNIVERSITY OF CALIFORNIA
Los Angeles

Variable Long-Precision Arithmetic (VLPA) for Reconfigurable Coprocessor Architectures

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science

by Alexandre Ferreira Tenca

© Copyright by Alexandre Ferreira Tenca

The dissertation of Alexandre Ferreira Tenca is approved.

Prof. Dr. Willian Newman
Prof. Dr. David Rennels
Prof. Dr. Jason Cong
Prof. Dr. Milos D. Ercegovac, Committee Chair

University of California, Los Angeles

ABSTRACT OF THE DISSERTATION

Variable Long-Precision Arithmetic (VLPA) for Reconfigurable Coprocessor Architectures
by Alexandre Ferreira Tenca
Doctor of Philosophy in Computer Science
University of California, Los Angeles
Professor Prof. Dr. Milos D. Ercegovac, Chair

This is the abstract.

Contents

Introduction
  The need for VLPA
  Alternative Arithmetic Systems
  Software Languages and Libraries for VLPA
  Existing Coprocessors for Long Precision Computations
  Chow's VP Processor
  CADAC Controlled Precision Decimal Arithmetic Unit
  Coprocessor for Pascal-XSC
  Interval Arithmetic Coprocessor
  JANUS
  VLP Coprocessor for the TM
  VLP Computation and RCArs
  Research Objectives
  Dissertation Outline

Reconfigurable Coprocessor Architecture
  Reconfigurable Coprocessor Model
  FPGA Architecture
  FPGA Array
  Software and Hardware Interface

Variable Long-precision Arithmetic (VLPA) Algorithms
  VLP Algorithms used in Software
  Notation and Conventions
  Software Algorithms for VLP Addition
  Software Algorithms for VLP Multiplication
  Software Algorithms for VLP Division
  Software Algorithms for VLP Square-Root
  Hardware Implementation of VLP Algorithms
  Multiplication in VLP Coprocessors
  Division in VLP Coprocessors
  Square-root in VLP Coprocessors
  Online Algorithms for VLP Computations
  General Concepts and Scheduling Strategies
  Summary

VLP Multiplier
  The VLP Multiplication Algorithm
  Data Path for VLP Multiplication
  Data Arrangement for Serial Computation
  Serial Computation of the Residual
  Pipelined Data Path
  VLP Multiplication with Precision less than m
  Truncation point to Satisfy Output Precision
  VLP Multiplication Algorithm for Truncated Results
  Gain in Performance
  Operands with Different Precision
  Execution time of the VLP Multiplier

VLP Divider
  VLP Division Algorithm
  Selection Function
  Scaling Factor M
  Computation of the Scaling Factor at the Host
  Prescaling of Operands
  Online Prescaler
  Selection Circuit
  Reducing the Number of Cycles
  Pipelined Operation
  Execution Time

VLP Square Root
  VLP Square-Root Algorithm
  Convergence conditions for Output Selection
  Selection function with Compensated Residual
  Selection Circuit
  Performance Evaluation
  Optimization of the Number of Cycles
  Execution time

Implementation Aspects and Host Tasks for VLP Operation
  Digit Code Conversion
  CS to BS Converter
  BS to NR Converter
  Tasks Performed at the Host
  VLP Number Format
  On-the-Fly Conversion
  Digit Expansion and Compression
  VLP Floating Point Operations

VLP Circuit Design for FPGAs
  Important Design Aspects
  FPGA Time Parameters
  Pipeline Degree
  Digit Representation
  Design of Arithmetic Operators for FPGAs
  Addition
  Multiplication
  Summary of Results
  Recoder Circuit for the VLP Multiplier
  VLP Data Path
  Area Estimates
  Delay of Selection Functions
  Delay of VLP Division Selection
  Delay of VLP Square Root Selection

Performance Evaluation
  Coprocessor Reconfiguration
  Coprocessor Model
  Model Parameters
  Measurements
  Performance Estimate
  Coprocessor Area-Time Model
  Simulation

Conclusion and Future Research
  Research Contributions
  Future Research

A Timing Characteristics of the XC FPGAs
B Test Program for LP Operations using GMP version
  B Test Program for LP Integer Multiplication
  B Test Program for Floating Point Operations
C Digit radix transformation using BS Code

List of Tables

  Performance measurement of interval arithmetic on VPI with and without the arithmetic coprocessor VPIAC SSJa
  Cycle counts for various operations in the VPIAC SSJa
  Recurrence equations for online arithmetic
  Number of stages in the digit by vector multiplier
  Example of high-radix online division using prescaled operands
  Example of VLP Square Root in radix
  Truth table for the FS function
  FPGA Timing Parameters adapted from Xilinx data book
  Area-delay of digit-parallel adders in the XC
  Array Multiplier with Booth Recoding CS output
  Extra area for the Linear-Array Multiplier
  LSA Multiplication radix
  Area and time estimates for addition and multiplication of n-bit operators using input LUT FPGAs n
  Area of the VLP division selection function digits in radix n
  Area of the VLP square root selection function digits in radix n
  Data Path area for digits in radix pipelined
  Area CLBs of the VLP data path for some values of n
  Maximum number of digits read simultaneously in each iteration
  Maximum area required to implement the VLP algorithms in FPGAs
  Number of cycles for long-precision operations in GMP C prog
  Significant and exponent manipulation time in GMP
  Other tasks performed by the host during VLP operations
  Coprocessor parameters
  Parameters for the system
  Host-coprocessor operation Integer Multiplication
  Host-coprocessor operation VLP FP multiplication
  Host-coprocessor operation VLP FP division prescaling at the host
  Host-coprocessor operation VLP FP division prescaling at the coprocessor
  Host-coprocessor operation VLP FP square root
  Variation in the Speedup with the Host speed

List of Figures

  Solutions given to Very Long Precision Computation
  Coprocessor model
  Coprocessor Organization for VLP operation
  Model for a Lookup Table based configurable cell
  FPGA structure
  Structure of a CLB in the Xilinx XC series
  Linear Array of FPGAs
  Software-Hardware interface for LP addition
  Flowchart of Newton-Raphson algorithm for division
  Flowchart of square-root computation using Newton-Raphson method
  Multiplication method in the VPIAC
  Spatial representation of online recurrence equation computation
  Multiple-precision computation of the recurrence equation
  Digit-slices for VLP multiplication
  Online computation of the recurrence equation for multiplication
  Data path for VLP multiplication
  Digit by vector multiplier in online mode
  One layer of the reduction structure
  Data path delays
