Algorithms and Architectures for Decimal Transcendental Function Computation

A Thesis Submitted to the College of Graduate Studies and Research in Partial Fulfillment of the Requirements for the degree of Doctor of Philosophy in the Department of Electrical and Computer Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada

By Dongdong Chen

© Dongdong Chen, January, 2011. All rights reserved.

Permission to Use

In presenting this thesis in partial fulfilment of the requirements for a Postgraduate degree from the University of Saskatchewan, I agree that the Libraries of this University may make it freely available for inspection. I further agree that permission for copying of this thesis in any manner, in whole or in part, for scholarly purposes may be granted by the professor or professors who supervised my thesis work or, in their absence, by the Head of the Department or the Dean of the College in which my thesis work was done. It is understood that any copying or publication or use of this thesis or parts thereof for financial gain shall not be allowed without my written permission. It is also understood that due recognition shall be given to me and to the University of Saskatchewan in any scholarly use which may be made of any material in my thesis.

Requests for permission to copy or to make other use of material in this thesis in whole or part should be addressed to:

Head of the Department of Electrical and Computer Engineering
57 Campus Drive
University of Saskatchewan
Saskatoon, Saskatchewan
Canada S7N 5A9

Abstract

Nowadays, there are many commercial demands for decimal floating-point (DFP) arithmetic operations in areas such as financial analysis, tax calculation, currency conversion, Internet-based applications, and e-commerce. This trend has spurred further development of DFP arithmetic units that can perform accurate computations with exact decimal operands.
Due to the significance of DFP arithmetic, the IEEE 754-2008 standard for floating-point arithmetic includes it in its specifications. Basic decimal arithmetic units, such as the decimal adder, subtracter, multiplier, divider, and square-root unit, form a main part of a decimal microprocessor and are attracting increasing attention from researchers. Recently, decimal-encoded formats and DFP arithmetic units have been implemented in IBM’s System z900, POWER6, and z10 microprocessors. Increasing chip densities and transistor counts give designers more room to add essential application-domain functions to upcoming microprocessors. Decimal transcendental functions, such as the DFP logarithm, antilogarithm, exponential, reciprocal, and trigonometric functions, are useful arithmetic operations in many areas of science and engineering and have been specified as recommended arithmetic in the IEEE 754-2008 standard. Thus, virtually all computing systems that are compliant with the IEEE 754-2008 standard could include a DFP mathematical library providing transcendental function computation. Building on the development of basic decimal arithmetic units, more complex DFP transcendental arithmetic will be the next building block in microprocessors. In this dissertation, we researched and developed several new decimal algorithms and architectures for DFP transcendental function computation. These designs are based on several different methods: 1) decimal transcendental function computation based on the table-based first-order polynomial approximation method; 2) DFP logarithmic and antilogarithmic converters based on the decimal digit-recurrence algorithm with selection by rounding; 3) a decimal reciprocal unit using efficient table look-up based on Newton-Raphson iterations; and 4) a first radix-100 division unit based on the non-restoring algorithm with a pre-scaling method.
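
To make method 3) more concrete, the following is a minimal software sketch of a table look-up seed refined by Newton-Raphson iterations, written with Python's standard `decimal` module. It is an illustration under stated assumptions, not the hardware architecture developed in this dissertation: the 90-entry midpoint seed table, the four guard digits, and the names `RECIP_TABLE` and `reciprocal` are all hypothetical choices made for this example.

```python
from decimal import Context, Decimal, localcontext

# Assumed seed table (hypothetical sizing): one 3-digit reciprocal per
# leading-digit pair 10..99, taken at the midpoint of each interval.
RECIP_TABLE = [
    (Decimal(20) / Decimal(2 * i + 1)).quantize(Decimal("0.001"))
    for i in range(10, 100)
]


def reciprocal(d: Decimal, digits: int = 16) -> Decimal:
    """Approximate 1/d for a significand d in [1, 10) to `digits` digits."""
    if not (Decimal(1) <= d < Decimal(10)):
        raise ValueError("expected a significand normalized to [1, 10)")
    # Index the table with the two leading decimal digits of d.
    lead = d.as_tuple().digits
    index = lead[0] * 10 + (lead[1] if len(lead) > 1 else 0)
    with localcontext() as ctx:
        ctx.prec = digits + 4            # guard digits (an assumed margin)
        x = RECIP_TABLE[index - 10]      # seed: at least one correct digit
        correct = 1
        while correct < digits + 2:
            x = x * (2 - d * x)          # Newton-Raphson step for f(x) = 1/x - d
            correct *= 2                 # quadratic convergence doubles the digits
    return Context(prec=digits).plus(x)  # round to the target precision
```

For example, `reciprocal(Decimal("4"))` compares equal to `Decimal("0.25")`. The sketch aims at a result within about one unit in the last place and makes no claim about correct rounding in every case; the number of iterations follows from the assumed one-digit seed accuracy doubling at each step.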
Most of the decimal algorithms and architectures for DFP transcendental function computation developed in this dissertation are the first attempts to analyze and implement DFP transcendental arithmetic so as to achieve faithful results for DFP operands, as specified in IEEE 754-2008. To help researchers evaluate the hardware performance of DFP transcendental arithmetic units, the proposed architectures based on the different methods are modeled, verified, and synthesized using FPGAs or CMOS standard-cell libraries in ASIC. Some of the implementation results are compared with those of binary radix-16 logarithmic and exponential converters, a recently developed high-performance decimal CORDIC-based architecture, and Intel’s DFP transcendental function computation software library. The comparison results show that the proposed architectures achieve a significant speed-up over these designs in terms of latency. The algorithms and architectures developed in this dissertation provide a useful starting point for future hardware-oriented research on DFP transcendental function computation.

Acknowledgements

It is a pleasure to express my sincere appreciation to the people who have assisted me throughout the years of research that have led to this Ph.D. dissertation at the University of Saskatchewan.

First and foremost, I sincerely thank my supervisor, Dr. Seok-Bum Ko, who has supported me throughout this dissertation with his expertise, understanding, and patience whilst allowing me the room to work in my own way. Without his constant guidance, advice, and encouragement, this dissertation would not have been possible.

Second, I would like to thank the members of my dissertation committee, Dr. Daniel Teng, Dr. Khan Wahid, and Dr. Derek Eager, for their invaluable suggestions for my Ph.D. dissertation. Also, I would like to thank Dr.
Tor Aamodt from the University of British Columbia for taking time out from his busy schedule to serve as my external examiner.

Third, I thank all the people in the Department of Electrical and Computer Engineering for providing such a great academic environment in which to carry out this dissertation. In particular, I would like to thank the faculty and staff of the VLSI research group, and I thank all my workmates for their collaboration on several research projects.

Fourth, I thank the anonymous reviewers from the IEEE Symposium on Computer Arithmetic and IEEE Transactions on Computers for their invaluable comments. I am very grateful to Dr. Ivan Godard for his insightful advice and brilliant ideas for several future research projects. I would like to mention Dr. Liang-Kai Wang, Dr. Mark A. Erle, and Dr. Álvaro Vázquez for their impressive Ph.D. work in the area of DFP computer arithmetic, which continually inspired me in my Ph.D. research.

Fifth, I would like to express my thanks to my friends in 2C60 for their friendship and help, and special thanks to my beloved family for their love and support. They always encouraged me and asked me to be patient and work harder, and that is what I kept in mind.

Finally, I am grateful to the College of Graduate Studies and the Department of Electrical and Computer Engineering for providing financial assistance through scholarships that were invaluable to me.

Dedicated to my beloved family

Contents

Permission to Use
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
List of Abbreviations

I Preface
1 Introduction
1.1 Why Decimal Arithmetic
1.2 Motivation
1.3 Research Overview
1.4 Research Contributions

II Research Background
2 Decimal Transcendental Arithmetic
2.1 DFP Formats in IEEE 754-2008 Standard
2.1.1 DFP Formats and Encodings
2.1.2 DFP Arithmetic Operations
2.1.3 DFP Rounding Modes
2.1.4 Exception Handling
2.2 Decimal Transcendental Unit Design
2.2.1 Some Details of DFP Transcendental Operations
2.2.2 Classification of Hardware Approaches in BFP
2.2.3 Considerations of Hardware Implementation
2.2.4 Related Basic Decimal Arithmetic

III Table-based First-Order Polynomial Approximation
3 A Dynamic Non-Uniform Segmentation Method
3.1 Introduction
3.2 Minimax Polynomial Approximation
3.2.1 Notations
3.2.2 Minimax Error Analysis in One Segment
3.3 A Non-Uniform Segmentation Method
3.3.1 Determination of Initial UFB by MiniBit Approach
3.3.2 Evaluation of Bit-Width of Segment Boundary
3.3.3 Partition of Non-Uniform Segments by BSPS
3.4 Hardware Architecture
3.4.1 Segment Index Encoder
3.4.2 Estimation of Memory Sizes
3.5 Experimental Results
3.5.1 Comparison Results
3.5.2 Evaluation Results for More Functions
3.5.3 Memory Sizes for Two Methods
3.5.4 CPU Time Consumed
3.6 Summary
4 Decimal Logarithmic and Antilogarithmic Converters
4.1 Introduction
4.2 Decimal Logarithm and Antilogarithm Conversion
4.2.1 Binary-based Decimal Logarithm Conversion (Alg. 1)
4.2.2 Decimal Logarithm Conversion (Alg. 2)
4.2.3 Decimal Antilogarithm Conversion (Alg. 3)
4.3 Piecewise Linear Approximation Method
4.3.1 Notations
4.3.2 Decimal Minimax Error Analysis in One Segment
4.3.3 Decimal Dynamic Non-Uniform Segmentation Method
